├── .gitignore
├── DESCRIPTION
├── LICENSE
├── NAMESPACE
├── R
    ├── compute_moments.R
    ├── construct_priors.R
    ├── create_dists.R
    ├── get_data_dists.R
    ├── helpers.R
    ├── metrics.R
    ├── plotting.R
    ├── run_simulation.R
    ├── simulate_data.R
    ├── simulation_study.R
    ├── stopping_functions.R
    └── update_priors.R
├── README.md
├── inst
    └── abayes
    │   ├── app.R
    │   ├── dependencies.R
    │   ├── how_to.md
    │   ├── server_funs.R
    │   └── ui_elements.R
├── man
    ├── ab_arguments.Rd
    ├── approx_solver.Rd
    ├── b_gt_a.Rd
    ├── bernoulli_dist.Rd
    ├── beta_cdf.Rd
    ├── beta_dist.Rd
    ├── calc_beta_dist.Rd
    ├── calc_gamma_dist.Rd
    ├── calc_normal_gamma_dist.Rd
    ├── create_empty_dt.Rd
    ├── expected_loss_b.Rd
    ├── gamma_cdf.Rd
    ├── gamma_dist.Rd
    ├── get_actual_loss.Rd
    ├── get_data_dists.Rd
    ├── get_losses.Rd
    ├── get_metrics.Rd
    ├── get_supported_beta.Rd
    ├── get_supported_gamma.Rd
    ├── get_supported_normal_gamma.Rd
    ├── investigate_simulations.Rd
    ├── normal_dist.Rd
    ├── normal_gamma_dist.Rd
    ├── plot_beta.Rd
    ├── plot_gamma.Rd
    ├── plot_normal.Rd
    ├── plot_relative_gain.Rd
    ├── poisson_dist.Rd
    ├── sim_effect_size.Rd
    ├── simulate_ab_test.Rd
    ├── simulate_data.Rd
    ├── single_loss_stop.Rd
    ├── update_prior.Rd
    ├── update_priors.Rd
    ├── validate_dt.Rd
    └── validate_inputs.Rd
├── shiny_app_screen_shot.png
├── tests
    ├── testthat.R
    └── testthat
    │   ├── test-construct_priors.R
    │   ├── test-create_dists.R
    │   ├── test-get_data_dists.R
    │   ├── test-metrics.R
    │   ├── test-run_simulation.R
    │   ├── test-simulation_study.R
    │   ├── test-stopping_functions.R
    │   └── test-update_priors.R
└── vignettes
    └── abayes_vignette.Rmd


/.gitignore:
--------------------------------------------------------------------------------
1 | .DS_Store
2 | .Rhistory
3 | inst/abayes/rsconnect
4 | 


--------------------------------------------------------------------------------
/DESCRIPTION:
--------------------------------------------------------------------------------
 1 | Package: abayes
 2 | Type: Package
 3 | Title: Bayesian A/B Testing
 4 | Version: 0.0.1
 5 | Authors@R: c(
 6 |     person("Michael", "Frasco", email = "mfrasco@convoy.com", role = c("aut", "cre")),
 7 |     person("Convoy Inc.", role = c("cph")))
 8 | Maintainer: Michael Frasco <mfrasco@convoy.com>
 9 | Description:
10 |     Simulate Bayesian A/B testing algorithms and evaluate the results of A/B
11 |     tests with a shiny app.
12 | Depends:
13 |     R (>= 3.3.0)
14 | Imports:
15 |     data.table,
16 |     ggplot2,
17 |     gridExtra,
18 |     parallel,
19 |     purrr,
20 |     stats
21 | Suggests:
22 |     knitr,
23 |     testthat,
24 |     rmarkdown
25 | License: Apache License 2.0 | file LICENSE
26 | LazyData: TRUE
27 | RoxygenNote: 6.0.1
28 | 


--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
 1 | Apache License, Version 2.0
 2 | ===========================
 3 | 
 4 | Copyright 2018 Convoy, Inc.
 5 | 
 6 | Licensed under the Apache License, Version 2.0 (the "License");
 7 | you may not use this file except in compliance with the License.
 8 | You may obtain a copy of the License at
 9 | 
10 |     http://www.apache.org/licenses/LICENSE-2.0
11 | 
12 | Unless required by applicable law or agreed to in writing, software
13 | distributed under the License is distributed on an "AS IS" BASIS,
14 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
15 | See the License for the specific language governing permissions and
16 | limitations under the License.


--------------------------------------------------------------------------------
/NAMESPACE:
--------------------------------------------------------------------------------
 1 | # Generated by roxygen2: do not edit by hand
 2 | 
 3 | S3method(b_gt_a,beta_dist)
 4 | S3method(b_gt_a,gamma_dist)
 5 | S3method(b_gt_a,normal_gamma_dist)
 6 | S3method(compute_moments,beta_dist)
 7 | S3method(compute_moments,gamma_dist)
 8 | S3method(compute_moments,normal_gamma_dist)
 9 | S3method(expected_loss_b,beta_dist)
10 | S3method(expected_loss_b,gamma_dist)
11 | S3method(expected_loss_b,normal_gamma_dist)
12 | S3method(get_data_dists,beta_dist)
13 | S3method(get_data_dists,gamma_dist)
14 | S3method(get_data_dists,normal_gamma_dist)
15 | S3method(simulate_data,bernoulli_dist)
16 | S3method(simulate_data,beta_dist)
17 | S3method(simulate_data,gamma_dist)
18 | S3method(simulate_data,normal_dist)
19 | S3method(simulate_data,normal_gamma_dist)
20 | S3method(simulate_data,poisson_dist)
21 | S3method(update_prior,beta_dist)
22 | S3method(update_prior,gamma_dist)
23 | S3method(update_prior,normal_gamma_dist)
24 | export(approx_solver)
25 | export(b_gt_a)
26 | export(bernoulli_dist)
27 | export(beta_cdf)
28 | export(beta_dist)
29 | export(calc_beta_dist)
30 | export(calc_gamma_dist)
31 | export(calc_normal_gamma_dist)
32 | export(compute_moments)
33 | export(create_empty_dt)
34 | export(expected_loss_b)
35 | export(gamma_cdf)
36 | export(gamma_dist)
37 | export(get_actual_loss)
38 | export(get_data_dists)
39 | export(get_losses)
40 | export(get_metrics)
41 | export(get_supported_beta)
42 | export(get_supported_gamma)
43 | export(get_supported_normal_gamma)
44 | export(investigate_simulations)
45 | export(normal_dist)
46 | export(normal_gamma_dist)
47 | export(plot_beta)
48 | export(plot_gamma)
49 | export(plot_normal)
50 | export(plot_relative_gain)
51 | export(poisson_dist)
52 | export(sim_effect_size)
53 | export(simulate_ab_test)
54 | export(simulate_data)
55 | export(single_loss_stop)
56 | export(update_prior)
57 | export(update_priors)
58 | export(validate_dt)
59 | export(validate_inputs)
60 | importFrom(data.table,as.data.table)
61 | importFrom(data.table,between)
62 | importFrom(data.table,data.table)
63 | importFrom(data.table,rbindlist)
64 | importFrom(data.table,set)
65 | importFrom(data.table,setnames)
66 | importFrom(ggplot2,geom_density)
67 | importFrom(ggplot2,ggplot)
68 | importFrom(ggplot2,ggtitle)
69 | importFrom(ggplot2,scale_color_manual)
70 | importFrom(ggplot2,scale_fill_manual)
71 | importFrom(ggplot2,theme)
72 | importFrom(ggplot2,xlab)
73 | importFrom(ggplot2,ylab)
74 | importFrom(gridExtra,arrangeGrob)
75 | importFrom(gridExtra,grid.arrange)
76 | importFrom(parallel,clusterApplyLB)
77 | importFrom(parallel,clusterExport)
78 | importFrom(parallel,makeCluster)
79 | importFrom(parallel,stopCluster)
80 | importFrom(purrr,map)
81 | importFrom(purrr,map_dbl)
82 | importFrom(stats,pbeta)
83 | importFrom(stats,pgamma)
84 | importFrom(stats,pnorm)
85 | importFrom(stats,quantile)
86 | importFrom(stats,rbeta)
87 | importFrom(stats,rbinom)
88 | importFrom(stats,rgamma)
89 | importFrom(stats,rnorm)
90 | importFrom(stats,rpois)
91 | importFrom(stats,uniroot)
92 | 


--------------------------------------------------------------------------------
/R/compute_moments.R:
--------------------------------------------------------------------------------
 1 | # these functions compute the mean and standard deviation of distribution objects
 2 | 
 3 | #' @export
 4 | compute_moments.beta_dist <- function(dist) {
 5 |     a <- dist[['alpha']]; b <- dist[['beta']]
 6 |     return(list(mu = a / (a + b), sigma = sqrt(a * b / (a + b) ^ 2 / (a + b + 1))))
 7 | }
 8 | 
 9 | #' @export
10 | compute_moments.normal_gamma_dist <- function(dist) {
11 |     mu <- dist[['mu']]; lambda <- dist[['lambda']]
12 |     a <- dist[['alpha']]; b <- dist[['beta']]
13 |     return(list(x = list(mu = mu, sigma = sqrt(b / lambda / (a - 1))), tau = list(mu = a / b, sigma = sqrt(a / b ^ 2))))
14 | }
15 | 
16 | #' @export
17 | compute_moments.gamma_dist <- function(dist) {
18 |     a <- dist[['alpha']]; b <- dist[['beta']]
19 |     return(list(mu = a / b, sigma = sqrt(a / b ^ 2)))
20 | }
21 | 
22 | #' @title Simulate Data According to Some Distribution
23 | #' @name simulate_data
24 | #' @description Simulate a vector of data from a given distribution object.
25 | #' @export
26 | #' @param dist An object of class \code{'beta_dist'}, \code{'normal_gamma_dist'},
27 | #'             \code{'gamma_dist'} that specifies the parameters of some distribution
28 | #' @return A list
29 | compute_moments <- function(dist) {
30 |     UseMethod('compute_moments')
31 | }


--------------------------------------------------------------------------------
/R/construct_priors.R:
--------------------------------------------------------------------------------
  1 | #' @title Calculate Parameters For Beta Distribution
  2 | #' @name calc_beta_dist
  3 | #' @description Calculate the parameters for a beta distribution parameterized
  4 | #'              by the expected value and standard deviation.
  5 | #' @export
  6 | #' @param mu The expected value of x (see details)
  7 | #' @param sigma The standard deviation of x (see details)
  8 | #' @details If x ~ Beta(alpha, beta), then \code{mu} = E[x] and \code{sigma^2} = Var(x)
  9 | #' @return An object of class \code{beta_dist}
 10 | calc_beta_dist <- function(mu, sigma) {
 11 |     alpha <- ((1 - mu) * mu ^ 2 - mu * sigma ^ 2) / sigma ^ 2
 12 |     beta <- alpha / mu - alpha
 13 |     return(beta_dist('alpha' = alpha, 'beta' = beta))
 14 | }
 15 | 
 16 | #' @title CDF of Parameterized Beta Distribution
 17 | #' @name beta_cdf
 18 | #' @description Calculate the area to the left or right of \code{bound} for
 19 | #'              a beta distribution parameterized by the expected value and
 20 | #'              standard deviation.
 21 | #' @export
 22 | #' @importFrom stats pbeta
 23 | #' @inheritParams calc_beta_dist
 24 | #' @param bound The quantile of the distribution of interest
 25 | #' @param lower.tail A logical that indicates whether to find the area of the upper or lower tail.
 26 | #'                   Default is \code{TRUE}.
 27 | #' @details If x ~ Beta(A, B), the \code{mu} = E[x] and \code{sigma^2} = Var(x)
 28 | #' @return The area under the desired tail.
 29 | beta_cdf <- function(mu, sigma, bound, lower.tail = TRUE) {
 30 |     beta_dist <- calc_beta_dist(mu, sigma)
 31 |     return(stats::pbeta(bound, beta_dist[['alpha']], beta_dist[['beta']], lower.tail = lower.tail))
 32 | }
 33 | 
 34 | #' @title Calculate Parameters For Normal Gamma Distribution
 35 | #' @name calc_normal_gamma_dist
 36 | #' @description Calculate the parameters for a normal gamma distribution parameterized
 37 | #'              by the expected value and standard deviation.
 38 | #' @export
 39 | #' @param mu The expected value of x (see details)
 40 | #' @param tau The expected value of T (see details)
 41 | #' @param sigma_mu The standard deviation of x (see details)
 42 | #' @param sigma_tau The standard deviation of T (see details)
 43 | #' @details If (x, T) ~ NormalGamma(mu, lambda, alpha, beta), then \code{mu} = E[x],
 44 | #'          \code{tau} = E[T], \code{sigma_mu^2} = Var(x), and \code{sigma_tau^2} = Var(T)
 45 | #' @return An object of class \code{normal_gamma_dist}
 46 | calc_normal_gamma_dist <- function(mu, tau, sigma_mu, sigma_tau) {
 47 |     beta <- tau / (sigma_tau ^ 2)
 48 |     alpha <- beta * tau
 49 |     lambda <- beta / (sigma_mu ^ 2 * (alpha - 1))
 50 |     return(normal_gamma_dist('mu' = mu, 'lambda' = lambda, 'alpha' = alpha, 'beta' = beta))
 51 | }
 52 | 
 53 | #' @title Calculate Parameters For Gamma Distribution
 54 | #' @name calc_gamma_dist
 55 | #' @description This function determines the parameters for a gamma distribution
 56 | #'              from the desired mean
 57 | #' @export
 58 | #' @inheritParams calc_beta_dist
 59 | #' @details If x ~ Gamma(alpha, beta), then \code{mu} = E[x] and \code{sigma^2} = Var(x)
 60 | #' @return A named list
 61 | calc_gamma_dist <- function(mu, sigma) {
 62 |     beta <- mu / (sigma ^ 2)
 63 |     alpha <- beta * mu
 64 |     return(gamma_dist('alpha' = alpha, 'beta' = beta))
 65 | }
 66 | 
 67 | #' @title CDF of Parameterized Gamma Distribution
 68 | #' @name gamma_cdf
 69 | #' @description Calculate the area to the left or right of \code{bound} for
 70 | #'              a gamma distribution parameterized by the expected value and
 71 | #'              standard deviation.
 72 | #' @export
 73 | #' @importFrom stats pgamma
 74 | #' @inheritParams calc_beta_dist
 75 | #' @inheritParams beta_cdf
 76 | #' @details If x ~ Gamma(alpha, beta), then \code{mu} = E[x] and \code{sigma^2} = Var(x)
 77 | #' @return The area under the desired tail.
 78 | gamma_cdf <- function(mu, sigma, bound, lower.tail = TRUE) {
 79 |     gamma_dist <- calc_gamma_dist(mu, sigma)
 80 |     return(stats::pgamma(bound, gamma_dist[['alpha']], gamma_dist[['beta']], lower.tail = lower.tail))
 81 | }
 82 | 
 83 | #' @title find_percentile
 84 | 
 85 | #' @title Approximate Distribution Solver
 86 | #' @name approx_solver
 87 | #' @description Determine the standard deviation of a distribution that will provide
 88 | #'              the desired amount of support at a certain bound.
 89 | #' @export
 90 | #' @inheritParams gamma_cdf
 91 | #' @importFrom data.table between
 92 | #' @param bound The desired maximum of the distribution
 93 | #' @param desired_support The amount of probability mass to be more extreme than \code{bound}
 94 | #' @param cdf_fun A CDF
 95 | #' @param lower_sigma A guess of the lower bound on the standard deviation
 96 | #' @param upper_sigma A guess of the upper bound on the standard deviation
 97 | #' @param tolerance Return \code{NULL}, if we cannot find a value that gives support
 98 | #'                  within \code{tolerance} of \code{desired_support}
 99 | #' @return The optimal standard deviation
100 | approx_solver <- function(mu
101 |                           , bound
102 |                           , desired_support
103 |                           , cdf_fun
104 |                           , lower_sigma
105 |                           , upper_sigma
106 |                           , lower.tail = TRUE
107 |                           , tolerance = 0.01
108 | ) {
109 |     
110 |     # get a reasonable range of lower and upper values
111 |     sigma_range <- c(lower_sigma, upper_sigma)
112 |     prob_range <- cdf_fun(mu = mu, sigma = sigma_range, bound = bound, lower.tail = lower.tail)
113 |     
114 |     if (!data.table::between(desired_support, prob_range[1], prob_range[2])) {
115 |         # lower and upper do not contain desired support
116 |         # return(approx_solver(expected, bound, desired_support, cdf_fun, lower.tail
117 |         #        , lower_multipler * 0.5, upper_multiplier * 2, tolerance))
118 |         return(NULL)
119 |     }
120 |     
121 |     candidates <- seq(sigma_range[1], sigma_range[2], length.out = 1000)
122 |     probs <- cdf_fun(mu = mu, sigma = candidates, bound = bound, lower.tail = lower.tail)
123 |     best <- which.min(abs(probs - desired_support))
124 |     if (abs(probs[best] - desired_support) > tolerance) {
125 |         return(NULL)
126 |     } else {
127 |         return(candidates[best])
128 |     }
129 | }
130 | 
131 | #' @title Get Beta Distribution With Desired Support
132 | #' @name get_supported_beta
133 | #' @description Determines the parameters for a beta distribution given an expected value
134 | #'              and desired support at some bound.
135 | #' @inheritParams calc_beta_dist
136 | #' @inheritParams approx_solver
137 | #' @details If x ~ Beta(alpha, beta), then \code{mu} = E[x] and \code{sigma^2} = Var(x)
138 | #' @export
139 | #' @return An object of class \code{beta_dist}
140 | get_supported_beta <- function(mu, bound, desired_support = 0.05) {
141 |     if (bound <= mu) {
142 |         return('The maximum possible rate must be greater than expected rate.')
143 |     }
144 |     
145 |     if (!all(data.table::between(c(mu, bound), 0, 1, incbounds = FALSE))) {
146 |         return('Expected and Maximum must be between 0 and 1.')
147 |     }
148 |     
149 |     # beta distribution has a maximum standard deviation of mu - mu ^ 2
150 |     sigma_diff <- min(sqrt(mu - mu ^ 2), bound - mu)
151 |     
152 |     sigma <- approx_solver(mu = mu
153 |                            , bound = bound
154 |                            , desired_support = desired_support
155 |                            , cdf_fun = beta_cdf
156 |                            , lower_sigma = 0.1 * sigma_diff
157 |                            , upper_sigma = 0.99 * sigma_diff
158 |                            , lower.tail = FALSE
159 |                            , tolerance = 0.1)
160 |     
161 |     if (is.null(sigma)) {
162 |         return('Please choose less extreme expected and maximum rates')
163 |     } else {
164 |         return(calc_beta_dist(mu = mu, sigma = sigma))
165 |     }
166 | }
167 | 
168 | #' @title Get Normal Gamma Distribution With Desired Support
169 | #' @name get_supported_normal_gamma
170 | #' @description Determines the parameters for a normal gamma distribution given the expected values
171 | #'              of the distribution and bounds on their maximum values
172 | #' @importFrom stats pnorm rgamma rnorm uniroot
173 | #' @export
174 | #' @inheritParams calc_normal_gamma_dist
175 | #' @inheritParams get_supported_beta
176 | #' @param bound_mu An upper bound on the likely value of x (see details)
177 | #' @param sigma The standard deviation of x (see details)
178 | #' @param bound_sigma The upper bound on the likely value of T (see details)
179 | #' @details If (x, T) ~ NormalGamma(mu, lambda, alpha, beta), then \code{mu} = E[x],
180 | #'          \code{tau} = E[T], \code{sigma_mu^2} = Var(x), and \code{sigma_tau^2} = Var(T)
181 | #' @return An object of class \code{normal_gamma_dist}
182 | get_supported_normal_gamma <- function(mu, bound_mu, sigma, bound_sigma, desired_support = 0.05) {
183 |     if ((bound_mu <= mu) || (bound_sigma <= sigma) || (bound_sigma <= 0)) {
184 |         return('The maximum possible average must be greater than the expected average.')
185 |     }
186 |     
187 |     # first, we need to solve for alpha and beta, then we can use that to solve for lambda
188 |     tau <- 1 / sigma ^ 2
189 |     bound_tau <- 1 / bound_sigma ^ 2
190 |     
191 |     gamma_dist <- get_supported_gamma(mu = tau
192 |                                       , bound = bound_tau
193 |                                       , desired_support = desired_support
194 |                                       , lower.tail = TRUE)
195 |     if (!inherits(gamma_dist, 'gamma_dist')) {
196 |         return('Please choose less extreme difference between expected and maximum.')
197 |     }
198 |     
199 |     sigma_tau <- sqrt(gamma_dist[['alpha']] / gamma_dist[['beta']] ^ 2)
200 |     
201 |     n_samps <- 1e4
202 |     sample_tau <- stats::rgamma(n = n_samps, shape = gamma_dist[['alpha']], rate = gamma_dist[['beta']])
203 |     
204 |     f <- function(sigma_mu) {
205 |         ng_dist <- calc_normal_gamma_dist(mu, tau, sigma_mu, sigma_tau)
206 |         sample_mu <- stats::rnorm(n = n_samps, mean = ng_dist[['mu']], sd = sqrt(1 / (ng_dist[['lambda']] * sample_tau)))
207 |         return(quantile(sample_mu, 1 - desired_support) - bound_mu)
208 |     }
209 |     
210 |     sigma_mu <- tryCatch(stats::uniroot(f, interval = c(0.2, 1) * (bound_mu - mu))$root
211 |                          , error = function(e) NULL)
212 |     if (is.null(sigma_mu)) {
213 |         return('Choose a less extreme difference between the maximum possible average and expected average.')
214 |     }
215 |     ng_dist <- calc_normal_gamma_dist(mu, tau, sigma_mu, sigma_tau)
216 |     
217 |     mu_tail <- mean(stats::pnorm(bound_mu, ng_dist[['mu']], sqrt(1 / (ng_dist[['lambda']] * sample_tau)), lower.tail = FALSE))
218 |     if (abs(mu_tail - desired_support) > 0.1) {
219 |         return('Choose a less extreme difference between the maximum possible average and expected average.')
220 |     }
221 |     
222 |     return(ng_dist)
223 | }
224 | 
225 | #' @title Get parameters of gamma distribution from expected and maximum rate
226 | #' @name get_supported_gamma
227 | #' @description This function determines the parameters for a gamma distribution
228 | #' @inheritParams calc_gamma_dist
229 | #' @inheritParams get_supported_beta
230 | #' @param lower.tail A boolean that indicates whether we want to get the support on the upper or lower tail
231 | #' @export
232 | #' @details If x ~ Gamma(alpha, beta), then \code{mu} = E[x] and \code{sigma^2} = Var(x)
233 | #' @return A \code{gamma_dist} object
234 | get_supported_gamma <- function(mu, bound, desired_support = 0.05, lower.tail = FALSE) {
235 |     if (!lower.tail) {
236 |         if ((bound <= mu) || (bound <= 0)) {
237 |             return('The maximum must be greater than the expected and positive.')
238 |         }
239 |         sigma_diff <- bound - mu
240 |     } else {
241 |         sigma_diff <- mu - bound
242 |     }
243 |     
244 |     sigma <- approx_solver(mu = mu
245 |                            , bound = bound
246 |                            , desired_support = desired_support
247 |                            , cdf_fun = gamma_cdf
248 |                            , lower_sigma = sigma_diff * 0.1
249 |                            , upper_sigma = sigma_diff * 0.99
250 |                            , lower.tail = lower.tail
251 |                            , tolerance = 0.1)
252 |     
253 |     if (is.null(sigma)) {
254 |         return('Please choose less extreme difference between expected and maximum.')
255 |     } else {
256 |         return(calc_gamma_dist(mu, sigma))
257 |     }
258 | }
259 | 
260 | 


--------------------------------------------------------------------------------
/R/create_dists.R:
--------------------------------------------------------------------------------
  1 | #' @title Beta Distribution
  2 | #' @name beta_dist
  3 | #' @description Create a beta distribution object, used as the prior for metrics that
  4 | #'              have a bernoulli distribution.
  5 | #' @export
  6 | #' @param alpha The non-negative alpha parameter of the beta distribution
  7 | #' @param beta The non-negative beta parameter of the beta distribution
  8 | #' @return An object of class \code{'beta'}.
  9 | beta_dist <- function(alpha, beta) {
 10 |     if (!is.numeric(c(alpha, beta))) {
 11 |         stop('alpha and beta must be numeric')
 12 |     }
 13 |     if (alpha < 0 || beta < 0) {
 14 |         stop('alpha and beta must be non-negative')
 15 |     }
 16 |     x <- list('alpha' = alpha, 'beta' = beta)
 17 |     class(x) <- 'beta_dist'
 18 |     return(x)
 19 | }
 20 | 
 21 | #' @title Normal-Gamma Distribution
 22 | #' @name normal_gamma_dist
 23 | #' @description Create a normal_gamma distribution object, used as the prior for
 24 | #'              metrics that have a normal distribution.
 25 | #' @export
 26 | #' @param mu The mean of the normal-gamma distribution
 27 | #' @param lambda The non-negative lambda parameter of the normal-gamma distribution
 28 | #' @param alpha The non-negative alpha parameter of the normal-gamma distribution
 29 | #' @param beta The non-negative beta parameter of the normal-gamma distribution
 30 | #' @return A normal_unknown dist object
 31 | normal_gamma_dist <- function(mu, lambda, alpha, beta) {
 32 |     if (!is.numeric(c(mu, lambda, alpha, beta))) {
 33 |         stop('mu, lambda, alpha, and beta must be numeric')
 34 |     }
 35 |     if (lambda < 0 || alpha < 0 || beta < 0) {
 36 |         stop('lambda, alpha, and beta must be non-negative')
 37 |     }
 38 |     x <- list('mu' = mu, 'lambda' = lambda, 'alpha' = alpha, 'beta' = beta)
 39 |     class(x) <- 'normal_gamma_dist'
 40 |     return(x)
 41 | }
 42 | 
 43 | #' @title Gamma Distribution
 44 | #' @name gamma_dist
 45 | #' @description Create a gamma distribution object, used as the prior for
 46 | #'              metrics that have a poisson distribution.
 47 | #' @export
 48 | #' @param alpha The non-negative alpha (shape) parameter of the gamma distribution
 49 | #' @param beta The non-negative beta (rate) parameter of the gamma distribution
 50 | #' @return A normal_unknown dist object
 51 | gamma_dist <- function(alpha, beta) {
 52 |     if (!is.numeric(c(alpha, beta))) {
 53 |         stop('alpha and beta must be numeric')
 54 |     }
 55 |     if (alpha < 0 || beta < 0) {
 56 |         stop('alpha and beta must be non-negative')
 57 |     }
 58 |     x <- list('alpha' = alpha, 'beta' = beta)
 59 |     class(x) <- 'gamma_dist'
 60 |     return(x)
 61 | }
 62 | 
 63 | #' @title Bernoulli Distribution
 64 | #' @name bernoulli_dist
 65 | #' @description Create a bernoulli distribution object, used as the data generating
 66 | #'              distribution in tests where the metric is a probability of an
 67 | #'              event happening.
 68 | #' @export
 69 | #' @param rate The rate of the bernoulli distribution (between 0 and 1)
 70 | #' @return A \code{bernoulli_dist} object
 71 | bernoulli_dist <- function(rate) {
 72 |     if (!is.numeric(rate)) {
 73 |         stop('rate must be numeric')
 74 |     }
 75 |     if (rate < 0 || rate > 1) {
 76 |         stop('rate must be between 0 and 1')
 77 |     }
 78 |     x <- list('rate' = rate)
 79 |     class(x) <- 'bernoulli_dist'
 80 |     return(x)
 81 | }
 82 | 
 83 | #' @title Normal Distribution
 84 | #' @name normal_dist
 85 | #' @description Create a normal distribution object, used as the data generating
 86 | #'              distribution in tests where the metric is a continuous value.
 87 | #' @export
 88 | #' @param mu The mean of the normal distribution
 89 | #' @param sigma The non-negative standard deviation of the normal distribution
 90 | #' @return A \code{normal_dist} object
 91 | normal_dist <- function(mu, sigma) {
 92 |     if (!is.numeric(c(mu, sigma))) {
 93 |         stop('mu and sigma must be numeric')
 94 |     }
 95 |     if (sigma < 0) {
 96 |         stop('sigma must be non-negative')
 97 |     }
 98 |     x <- list('mu' = mu, 'sigma' = sigma)
 99 |     class(x) <- 'normal_dist'
100 |     return(x)
101 | }
102 | 
103 | #' @title Poisson Distribution
104 | #' @name poisson_dist
105 | #' @description Create a poisson distribution object, used as the data generating
106 | #'              distribution in tests where the metric is a count of an event happening.
107 | #' @export
108 | #' @param rate The non-negative mean of the poisson distribution
109 | #' @return A \code{poisson_dist} object
110 | poisson_dist <- function(rate) {
111 |     if (!is.numeric(rate)) {
112 |         stop('rate must be numeric')
113 |     }
114 |     if (rate < 0) {
115 |         stop('rate must be non-negative')
116 |     }
117 |     x <- list('rate' = rate)
118 |     class(x) <- 'poisson_dist'
119 |     return(x)
120 | }
121 | 


--------------------------------------------------------------------------------
/R/get_data_dists.R:
--------------------------------------------------------------------------------
 1 | #' @export
 2 | #' @importFrom stats rbeta
 3 | get_data_dists.beta_dist <- function(dist, n, num_variants, ...) {
 4 |     rates <- stats::rbeta(n = n * num_variants, dist[['alpha']], dist[['beta']])
 5 |     return(purrr::map2(rates[1:n], rates[(n + 1):length(rates)]
 6 |                        , function(x, y) list(a = bernoulli_dist(rate = x)
 7 |                                              , b = bernoulli_dist(rate = y))))
 8 | }
 9 | 
10 | #' @export
11 | #' @importFrom stats rgamma rnorm
12 | get_data_dists.normal_gamma_dist <- function(dist, n, num_variants, ...) {
13 |     tau <- stats::rgamma(n = n * num_variants, shape = dist[['alpha']], rate = dist[['beta']])
14 |     mu <- stats::rnorm(n = n * num_variants, mean = dist[['mu']], sd = sqrt(1 / (dist[['lambda']] * tau)))
15 |     joint <- purrr::map2(mu, tau, c)
16 |     return(purrr::map2(joint[1:n], joint[(n + 1):length(joint)]
17 |                        , function(x, y) list(a = normal_dist(mu = x[1], sigma = sqrt(1 / x[2]))
18 |                                              , b = normal_dist(mu = y[1], sigma = sqrt(1 / y[2])))))
19 | }
20 | 
21 | #' @export
22 | #' @importFrom stats rgamma
23 | get_data_dists.gamma_dist <- function(dist, n, num_variants, ...) {
24 |     rates <- stats::rgamma(n = n * num_variants, dist[['alpha']], dist[['beta']])
25 |     return(purrr::map2(rates[1:n], rates[(n + 1):length(rates)]
26 |                        , function(x, y) list(a = poisson_dist(rate = x)
27 |                                              , b = poisson_dist(rate = y))))
28 | }
29 | 
30 | #' @title Sample Data Distributions
31 | #' @name get_data_dists
32 | #' @description Given a prior distribution, sample many data generating distributions
33 | #' @param dist A prior distribution
34 | #' @param n The number of distributions to sample
35 | #' @param num_variants The number of variants to use
36 | #' @param ... Arguments to be used by other methods
37 | #' @export
38 | #' @return A list of data generating distributions for multiple variants
39 | get_data_dists <- function(dist, n, num_variants, ...) {
40 |     UseMethod('get_data_dists')
41 | }
42 | 


--------------------------------------------------------------------------------
/R/helpers.R:
--------------------------------------------------------------------------------
 1 | #' @title Bayesian A/B Arguments
 2 | #' @name ab_arguments
 3 | #' @description Provide documentation for arguments commonly used in this package
 4 | #' @param data_dists A named list of distribution objects. This list specifies the
 5 | #'                   distributions that are used to generate data in simulations.
 6 | #'                   Currently, the list must only have elements named \code{'a'}
 7 | #'                   and \code{'b'}. See Details for more information.
 8 | #' @param priors A named list of distribution objects. This list specifies the
 9 | #'               distributions that are used as priors when estimating some parameter
10 | #'               from the data generating distribution. Currently, the list must only
11 | #'               have elements named \code{'a'} and \code{'b'}. See Details for more information.
12 | #' @param loss_threshold A positive number that identifies a bound for the expected
13 | #'                       loss for each variant. Once the expected loss is below this
14 | #'                       bound, the experiment is concluded.
15 | #' @param obs_per_round A positive number that represents how many observations, across both variants,
16 | #'                      are generated before we update the prior distributions and
17 | #'                      evaluate the expected loss. This number must be divisible by
18 | #'                      the number of variants used. Default is \code{1000}.
19 | #' @param max_rounds A positive integer that specifies the maximum number of
20 | #'                   times that we will evaluate the expected loss on both experiments.
21 | #'                   Default is \code{100}.
22 | #' @param sim_batch_size A positive integer that specifies how much data is simulated
23 | #'                       when evaluating the expected loss for variants that do
24 | #'                       not have an analytic solution (i.e. normal data).
25 | NULL
26 | 
27 | 
28 | #' @title Create an Empty Data Table
29 | #' @name create_empty_dt
30 | #' @description Create an empty data table with a specified number of rows and columns.
31 | #'              You can also choose the value that fills the data table.
32 | #' @export
33 | #' @importFrom data.table as.data.table setnames
34 | #' @param num_rows A positive integer that specifies the number of rows for the data table
35 | #' @param column_names A character vector that specifies the number and names of the columns.
36 | #' @param fill_value Default is -1. Specifies what value should fill the data table
37 | #' @return A data.table
38 | create_empty_dt <- function(num_rows, column_names, fill_value = -1) {
39 |     dt <- data.table::as.data.table(matrix(fill_value, nrow = num_rows, ncol = length(column_names)))
40 |     data.table::setnames(dt, column_names)
41 |     return(dt)
42 | }
43 | 


--------------------------------------------------------------------------------
/R/metrics.R:
--------------------------------------------------------------------------------
  1 | use_exact_beta <- function(dist_a, dist_b, cutoff = 1000) {
  2 |     is_small_beta <- function(dist, cutoff) {
  3 |         return(dist[['alpha']] + dist[['beta']] < cutoff)
  4 |     }
  5 |     return(is_small_beta(dist_a, cutoff) || is_small_beta(dist_b, cutoff))
  6 | }
  7 | 
  8 | use_exact_gamma <- function(dist_a, dist_b, cutoff = 1000) {
  9 |     is_small_gamma <- function(dist, cutoff) {
 10 |         return(dist[['alpha']] < cutoff)
 11 |     }
 12 |     return(is_small_gamma(dist_a, cutoff) || is_small_gamma(dist_b, cutoff))
 13 | }
 14 | 
 15 | #' @export
 16 | #' @importFrom stats pnorm
 17 | b_gt_a.beta_dist <- function(dist_a, dist_b, theta_a, theta_b, exact = NULL, ...) {
 18 |     alpha_a <- dist_a[['alpha']]
 19 |     beta_a <- dist_a[['beta']]
 20 |     alpha_b <- dist_b[['alpha']]
 21 |     beta_b <- dist_b[['beta']]
 22 |     
 23 |     if (is.null(exact)) {
 24 |         exact <- use_exact_beta(dist_a, dist_b, ...)
 25 |     }
 26 |     
 27 |     if (exact) {
 28 |         iter <- seq(0, round(alpha_b) - 1)
 29 |         alpha_a <- round(alpha_a)
 30 |         w <- lbeta(alpha_a + iter, beta_a + beta_b)
 31 |         x <- log(beta_b + iter)
 32 |         y <- lbeta(1 + iter, beta_b)
 33 |         z <- lbeta(alpha_a, beta_a)
 34 |         return(sum(exp(w - x - y - z)))
 35 |     } else {
 36 |         mu_a <- alpha_a / (alpha_a + beta_a)
 37 |         mu_b <- alpha_b / (alpha_b + beta_b)
 38 |         var_a <- alpha_a * beta_a / ((alpha_a + beta_a) ^ 2 * (alpha_a + beta_a + 1))
 39 |         var_b <- alpha_b * beta_b / ((alpha_b + beta_b) ^ 2 * (alpha_b + beta_b + 1))
 40 |         return(stats::pnorm(0, mu_b - mu_a, sqrt(var_a + var_b), lower.tail = FALSE))
 41 |     }
 42 | }
 43 | 
 44 | #' @export
 45 | #' @importFrom stats pnorm
 46 | b_gt_a.normal_gamma_dist <- function(dist_a, dist_b, theta_a, theta_b, exact = NULL, ...) {
 47 |     return(mean(theta_b > theta_a))
 48 | }
 49 | 
 50 | #' @export
 51 | #' @importFrom stats pnorm
 52 | b_gt_a.gamma_dist <- function(dist_a, dist_b, theta_a, theta_b, exact = NULL, ...) {
 53 |     alpha_a <- dist_a[['alpha']]
 54 |     beta_a <- dist_a[['beta']]
 55 |     alpha_b <- dist_b[['alpha']]
 56 |     beta_b <- dist_b[['beta']]
 57 |     
 58 |     if (is.null(exact)) {
 59 |         exact <- use_exact_gamma(dist_a, dist_b)
 60 |     }
 61 |     
 62 |     if (exact) {
 63 |         iter <- seq(0, round(alpha_b) - 1)
 64 |         alpha_a <- round(alpha_a)
 65 |         x1 <- iter * log(beta_b)
 66 |         x2 <- alpha_a * log(beta_a)
 67 |         x3 <- log(iter + alpha_a)
 68 |         x4 <- lbeta(iter + 1, alpha_a)
 69 |         x5 <- (iter + alpha_a) * log(beta_a + beta_b)
 70 |         return(sum(exp(x1 + x2 - (x3 + x4 + x5))))
 71 |     } else {
 72 |         mu_a <- alpha_a / beta_a
 73 |         mu_b <- alpha_b / beta_b
 74 |         var_a <- alpha_a / beta_a ^ 2
 75 |         var_b <- alpha_b / beta_b ^ 2
 76 |         return(stats::pnorm(0, mu_b - mu_a, sqrt(var_a + var_b), lower.tail = FALSE))
 77 |     }
 78 | }
 79 | 
 80 | #' @title Probability Variant B is Greater Than Variant A
 81 | #' @name b_gt_a
 82 | #' @description Given two distributions, find the probability that the expected
 83 | #'              value of variant B is greater than the expected value of variant A
 84 | #' @param dist_a Some distribution object (see examples)
 85 | #' @param dist_b Some distribution object (see examples)
 86 | #' @param theta_a A vector of simulated values from \code{dist_a}
 87 | #' @param theta_b A vector of simulated values from \code{dist_b}
 88 | #' @param exact A boolean that indicates whether the calculation should be approximated
 89 | #'              using the normal distribution. Default is \code{NULL}, which means
 90 | #'              that it will use the normal distribution if there is sufficient data.
 91 | #' @param ... Arguments to be passed onto other methods
 92 | #' @export
 93 | #' @return A numeric value
 94 | b_gt_a <- function(dist_a, dist_b, theta_a, theta_b, exact = NULL, ...) {
 95 |     UseMethod('b_gt_a')
 96 | }
 97 | 
 98 | #' @export
 99 | expected_loss_b.beta_dist <- function(dist_a
100 |                                       , dist_b
101 |                                       , theta_a
102 |                                       , theta_b
103 |                                       , method = c('absolute', 'percent')
104 |                                       , ...
105 | ) {
106 |     method <- match.arg(method)
107 |     
108 |     alpha_a <- dist_a[['alpha']]
109 |     beta_a <- dist_a[['beta']]
110 |     alpha_b <- dist_b[['alpha']]
111 |     beta_b <- dist_b[['beta']]
112 |     
113 |     if (method == 'absolute') {
114 |         x1 <- lbeta(alpha_a + 1, beta_a)
115 |         y1 <- log(1 - b_gt_a(beta_dist(alpha_a + 1, beta_a), dist_b, ...))
116 |         z1 <- lbeta(alpha_a, beta_a)
117 |         
118 |         x2 <- lbeta(alpha_b + 1, beta_b)
119 |         y2 <- log(1 - b_gt_a(dist_a, beta_dist(alpha_b + 1, beta_b), ...))
120 |         z2 <- lbeta(alpha_b, beta_b)
121 |         
122 |         return(exp(x1 + y1 - z1) - exp(x2 + y2 - z2))
123 |     } else {
124 |         prob_1 <- 1 - b_gt_a(dist_a, dist_b, ...)
125 |         
126 |         x <- lbeta(alpha_a - 1, beta_a)
127 |         y <- lbeta(alpha_a, beta_a)
128 |         z <- lbeta(alpha_b + 1, beta_b)
129 |         w <- lbeta(alpha_b, beta_b)
130 |         prob_2 <- log(1 - b_gt_a(beta_dist(alpha_a - 1, beta_a)
131 |                                  , beta_dist(alpha_b + 1, beta_b), ...))
132 |         return(prob_1 - exp(x - y + z - w + prob_2))
133 |     }
134 | }
135 | 
136 | #' @export
137 | expected_loss_b.normal_gamma_dist <- function(dist_a
138 |                                               , dist_b
139 |                                               , theta_a
140 |                                               , theta_b
141 |                                               , method = c('absolute', 'percent')
142 |                                               , ...
143 | ) {
144 |     return(mean(pmax(theta_a - theta_b, 0)))
145 | }
146 | 
147 | #' @export
148 | expected_loss_b.gamma_dist <- function(dist_a
149 |                                        , dist_b
150 |                                        , theta_a
151 |                                        , theta_b
152 |                                        , method = c('absolute', 'percent')
153 |                                        , ...
154 | ) {
155 |     alpha_a <- dist_a[['alpha']]
156 |     beta_a <- dist_a[['beta']]
157 |     alpha_b <- dist_b[['alpha']]
158 |     beta_b <- dist_b[['beta']]
159 |     
160 |     x1 <- lgamma(alpha_a + 1)
161 |     y1 <- log(1 - b_gt_a(gamma_dist(alpha_a + 1, beta_a), dist_b))
162 |     z1 <- log(beta_a)
163 |     w1 <- lgamma(alpha_a)
164 |     
165 |     x2 <- lgamma(alpha_b + 1)
166 |     y2 <- log(1 - b_gt_a(dist_a, gamma_dist(alpha_b + 1, beta_b)))
167 |     z2 <- log(beta_b)
168 |     w2 <- lgamma(alpha_b)
169 |     return(exp(x1 + y1 - z1 - w1) - exp(x2 + y2 - z2 - w2))
170 | }
171 | 
172 | #' @title Expected Loss of Choosing Variant B
173 | #' @name expected_loss_b
174 | #' @description This function calculates the expected loss of choosing variant B
175 | #' @export
176 | #' @param dist_a The distribution object for variant A
177 | #' @param dist_b The distribution object for variant B
178 | #' @param theta_a A vector of simulated values from \code{dist_a}
179 | #' @param theta_b A vector of simulated values from \code{dist_b}
180 | #' @param method One of \code{'absolute'} or \code{'percent'} that indicates
181 | #'               whether the loss function takes the absolute difference
182 | #'               or the percent difference between \code{theta_a} and \code{theta_b}
183 | #' @param ... Arguments to be passed onto other methods
184 | #' @return A numeric
185 | expected_loss_b <- function(dist_a
186 |                             , dist_b
187 |                             , theta_a
188 |                             , theta_b
189 |                             , method = c('absolute', 'percent')
190 |                             , ...
191 | ) {
192 |     UseMethod('expected_loss_b')
193 | }
194 | 
195 | #' @title Expected Losses For Variants
196 | #' @name get_losses
197 | #' @description Evaluate the expected loss for each variant
198 | #' @export
199 | #' @param posteriors A list of distribution objects that identify the posterior distributions
200 | #'                   for each variant
201 | #' @inheritParams ab_arguments
202 | #' @inheritParams expected_loss_b
203 | #' @importFrom purrr map
204 | #' @inheritParams ab_arguments
205 | #' @return A list of expected losses for each variant
206 | get_losses <- function(posteriors, sim_batch_size, method = c('absolute', 'percent')) {
207 |     dist_name <- class(posteriors[['a']])
208 |     dist_a <- posteriors[['a']]
209 |     dist_b <- posteriors[['b']]
210 |     if (dist_name %in% c('normal_gamma_dist')) {
211 |         thetas <- purrr::map(posteriors, function(dist) simulate_data(dist = dist, n = sim_batch_size))
212 |     } else {
213 |         thetas <- NULL
214 |     }
215 |     
216 |     loss_a <- expected_loss_b(dist_a = dist_b, dist_b = dist_a
217 |                               , theta_a = thetas[['b']], theta_b = thetas[['a']]
218 |                               , method = method)
219 |     loss_b <- expected_loss_b(dist_a = dist_a, dist_b = dist_b
220 |                               , theta_a = thetas[['a']], theta_b = thetas[['b']]
221 |                               , method = method)
222 |     
223 |     return(list(a = loss_a, b = loss_b))
224 | }
225 | 
226 | #' @title Get Bayesian A/B Testing Metrics
227 | #' @name get_metrics
228 | #' @description Calculate various metrics
229 | #' @export
230 | #' @importFrom purrr map
231 | #' @param posteriors A list of distribution objects
232 | #' @param sim_batch_size How many observations of data to simulate
233 | #' @param method What type of loss to calculate?
234 | #' @return A named list with the metrics
235 | get_metrics <- function(posteriors, sim_batch_size, method = c('absolute', 'percent')) {
236 |     dist_a <- posteriors[['a']]
237 |     dist_b <- posteriors[['b']]
238 |     
239 |     # simulate data for effect size credible interval and normal gamma
240 |     thetas <- purrr::map(posteriors, function(dist) simulate_data(dist = dist, n = sim_batch_size))
241 |     
242 |     # probability that one metric is greater than another
243 |     prob_b_gt_a <- b_gt_a(dist_a = dist_a
244 |                           , dist_b = dist_b
245 |                           , theta_a = thetas[['a']]
246 |                           , theta_b = thetas[['b']])
247 |     
248 |     # risk for each variant
249 |     loss_a <- expected_loss_b(dist_a = dist_b
250 |                               , dist_b = dist_a
251 |                               , theta_a = thetas[['b']]
252 |                               , theta_b = thetas[['a']]
253 |                               , method = method)
254 |     loss_b <- expected_loss_b(dist_a = dist_a
255 |                               , dist_b = dist_b
256 |                               , theta_a = thetas[['a']]
257 |                               , theta_b = thetas[['b']]
258 |                               , method = method)
259 |     
260 |     # effect size for variant b
261 |     effect <- sim_effect_size(theta_a = thetas[['a']], theta_b = thetas[['b']])
262 |     
263 |     return(list('loss_a' = loss_a, 'loss_b' = loss_b, 'prob_b_gt_a' = prob_b_gt_a
264 |                 , 'effect_lower' = effect[['lower']], 'effect_expected' = effect[['avg']]
265 |                 , 'effect_upper' = effect[['upper']]))
266 | }
267 | 
268 | #' @title Actual Loss of an Experiment
269 | #' @name get_actual_loss
270 | #' @description Once the experiment has concluded, measure the loss of the decision
271 | #' @inheritParams ab_arguments
272 | #' @param selected_variant A string that identifies the winning variant.
273 | #' @importFrom purrr map
274 | #' @export
275 | #' @return Numeric
276 | get_actual_loss <- function(data_dists, selected_variant) {
277 |     dist_name <- class(data_dists[['a']])
278 |     metric_mapping <- list(bernoulli_dist = 'rate'
279 |                            , normal_dist = 'mu'
280 |                            , poisson_dist = 'rate')
281 |     truth <- purrr::map(data_dists, metric_mapping[[dist_name]])
282 |     if (selected_variant == 'a') {
283 |         return(pmax(truth[['b']] - truth[['a']], 0))
284 |     } else {
285 |         return(pmax(truth[['a']] - truth[['b']], 0))
286 |     }
287 | }
288 | 
289 | #' @title Simulate the effect size of variant B - variant A
290 | #' @name sim_effect_size
291 | #' @description This function simulates the effect size (B - A)
292 | #' @export
293 | #' @importFrom stats quantile
294 | #' @param theta_a A vector of draws from the posterior of a
295 | #' @param theta_b A vector of draws from the posterior of b
296 | #' @return A list containing the 2.5%, mean, and 97.5% of the effect size
297 | sim_effect_size <- function(theta_a, theta_b) {
298 |     diffs <- theta_b - theta_a
299 |     credible_interval <- stats::quantile(diffs, c(0.025, 0.975))
300 |     return(list('lower' = credible_interval['2.5%']
301 |                 , 'avg' = mean(diffs)
302 |                 , 'upper' = credible_interval['97.5%']))
303 | }
304 | 
305 | 
306 | 
307 | 
308 | 


--------------------------------------------------------------------------------
/R/plotting.R:
--------------------------------------------------------------------------------
  1 | #' @title Plot Beta Distributions
  2 | #' @name plot_beta
  3 | #' @description This function plots the densities of multiple beta distributions
  4 | #' @export
  5 | #' @importFrom data.table between data.table rbindlist
  6 | #' @importFrom stats rbeta
  7 | #' @importFrom ggplot2 ggplot geom_density ggtitle xlab ylab scale_color_manual scale_fill_manual theme
  8 | #' @param betas A list of lists of beta distributions
  9 | #' @param title The title of the plot
 10 | #' @param xlab The title of the x axis
 11 | #' @param ylab The title of the y axis
 12 | #' @param color The color for the plot
 13 | #' @param support_level The desired amount of area between the lower and upper bounds. Default is \code{0.99}.
 14 | #' 
 15 | #' @return NULL. A plot is generated
 16 | plot_beta <- function(betas
 17 |                       , title = 'Beta Distribution'
 18 |                       , xlab = 'Rate that the Event Occurs'
 19 |                       , ylab = 'Density of that Rate'
 20 |                       , color = '#f65335'
 21 |                       , support_level = 0.99
 22 | ) {
 23 |     
 24 |     n_samp <- 1e5
 25 |     beta_dt <- NULL
 26 |     
 27 |     for (i in seq_along(betas)) {
 28 |         x <- names(betas)[i]
 29 |         var_name <- paste('variant', x)
 30 |         
 31 |         # generate some data
 32 |         beta_vec <- stats::rbeta(n_samp, betas[[x]][['alpha']], betas[[x]][['beta']])
 33 |         
 34 |         # remove the lower and upper extremes of the data
 35 |         beta_vec <- beta_vec[data.table::between(beta_vec
 36 |                                                  , quantile(beta_vec, (1 - support_level) / 2)
 37 |                                                  , quantile(beta_vec, support_level + (1 - support_level) / 2))]
 38 |         
 39 |         beta_dt <- data.table::rbindlist(list(beta_dt, data.table::data.table('variant' = rep(var_name, length(beta_vec))
 40 |                                                                               , 'betas' = beta_vec)))
 41 |     }
 42 |     
 43 |     if (length(betas) > 1) {
 44 |         col_vals <- c('darkred', 'darkblue')
 45 |     } else {
 46 |         col_vals <- color
 47 |     }
 48 |     
 49 |     rate_plot <- ggplot(beta_dt, aes(x = betas, colour = variant, fill =variant)) + 
 50 |         geom_density(size = 1, alpha = 0.1) +
 51 |         ggtitle(title) + xlab(xlab) + ylab(ylab) +
 52 |         scale_color_manual(values = col_vals) +
 53 |         scale_fill_manual(values = col_vals) +
 54 |         theme(plot.title = element_text(hjust = 0.5, size = 22)
 55 |               , axis.title = element_text(size = 18)
 56 |               , axis.text = element_text(size = 14)
 57 |               , legend.title = element_blank()
 58 |               , legend.text = element_text(size = 16)
 59 |               , legend.position = if(length(betas) == 2) 'bottom' else 'none')
 60 |     return(rate_plot)
 61 | }
 62 | 
 63 | #' @title Plot Normal Distributions
 64 | #' @name plot_normal
 65 | #' @description This function allows you to visualize the densities of a normal distribution
 66 | #' @export
 67 | #' @importFrom stats rgamma rnorm
 68 | #' @importFrom data.table between data.table rbindlist
 69 | #' @importFrom gridExtra grid.arrange arrangeGrob
 70 | #' @param normals A list of lists of normal distributions
 71 | #' @inheritParams plot_beta
 72 | #' @return NULL. A plot is generated
 73 | plot_normal <- function(normals
 74 |                         , title = 'Normal Distribution'
 75 |                         , color = '#f65335'
 76 |                         , support_level = 0.99
 77 | ) {
 78 |     n_samp <- 1e5
 79 |     
 80 |     sd_dt <- NULL
 81 |     mu_dt <- NULL
 82 |     
 83 |     # sample some data
 84 |     for (i in seq_along(normals)) {
 85 |         x <- names(normals)[i]
 86 |         var_name <- paste('variant', x)
 87 |         
 88 |         # generate some data
 89 |         sd_vec <- sqrt(1 / stats::rgamma(n_samp
 90 |                                          , normals[[x]][['alpha']]
 91 |                                          , normals[[x]][['beta']]))
 92 |         mu_vec <- stats::rnorm(n_samp
 93 |                                , normals[[x]][['mu']]
 94 |                                , sqrt(1 / normals[[x]][['lambda']]) * sd_vec)
 95 |         
 96 |         # remove the lower and upper extremes of the data
 97 |         sd_vec <- sd_vec[data.table::between(sd_vec
 98 |                                              , quantile(sd_vec, (1 - support_level) / 2)
 99 |                                              , quantile(sd_vec, support_level + (1 - support_level) / 2))]
100 |         n_sd <- length(sd_vec)
101 |         mu_vec <- mu_vec[data.table::between(mu_vec
102 |                                              , quantile(mu_vec, (1 - support_level) / 2)
103 |                                              , quantile(mu_vec, support_level + (1 - support_level) / 2))]
104 |         n_mu <- length(mu_vec)
105 |         
106 |         sd_dt <- data.table::rbindlist(list(sd_dt, data.table::data.table('variant' = rep(var_name, n_sd)
107 |                                                                           , 'sds' = sd_vec)))
108 |         mu_dt <- data.table::rbindlist(list(mu_dt, data.table::data.table('variant' = rep(var_name, n_mu)
109 |                                                                           , 'mus' = mu_vec)))
110 |     }
111 |     
112 |     if (length(normals) > 1) {
113 |         col_vals <- c('darkred', 'darkblue')
114 |     } else {
115 |         col_vals <- color
116 |     }
117 |     
118 |     mu_plot <- ggplot(mu_dt, aes(x = mus, colour = variant, fill = variant)) + 
119 |         geom_density(size = 1, alpha = 0.1) +
120 |         ggtitle("What We Believe About The Mean") +
121 |         xlab('Mean') +
122 |         ylab('Probability') +
123 |         scale_color_manual(values = col_vals) +
124 |         scale_fill_manual(values = col_vals) +
125 |         theme(plot.title = element_text(hjust = 0.5, size = 22)
126 |               , axis.title = element_text(size = 18)
127 |               , axis.text = element_text(size = 14)
128 |               , legend.title = element_blank()
129 |               , legend.text = element_text(size = 16)
130 |               , legend.position = 'none')
131 |     
132 |     sd_plot <- ggplot(sd_dt, aes(x = sds, colour = variant, fill = variant)) + 
133 |         geom_density(size = 1, alpha = 0.1) +
134 |         ggtitle("What We Believe About The Standard Deviation") +
135 |         xlab('Standard Deviation') +
136 |         ylab('Probability') +
137 |         scale_color_manual(values = col_vals) +
138 |         scale_fill_manual(values = col_vals) +
139 |         theme(plot.title = element_text(hjust = 0.5, size = 22)
140 |               , axis.title = element_text(size = 18)
141 |               , axis.text = element_text(size = 14)
142 |               , legend.title = element_blank()
143 |               , legend.text = element_text(size = 16)
144 |               , legend.position = if(length(normals) == 2) 'bottom' else 'none')
145 |     
146 |     if (length(normals) == 2) {
147 |         g_legend <- function(a.gplot) {
148 |             tmp <- ggplot_gtable(ggplot_build(a.gplot))
149 |             leg <- which(sapply(tmp$grobs, function(x) x$name) == "guide-box")
150 |             legend <- tmp$grobs[[leg]]
151 |             return(legend)
152 |         }
153 |         
154 |         my_legend <- g_legend(sd_plot)
155 |         
156 |         results_plot <- gridExtra::grid.arrange(gridExtra::arrangeGrob(mu_plot + theme(legend.position="none"),
157 |                                                                        sd_plot + theme(legend.position="none"),
158 |                                                                        nrow = 2)
159 |                                                 , my_legend, nrow = 2, heights=c(10, 1))
160 |     } else {
161 |         results_plot <- gridExtra::grid.arrange(mu_plot, sd_plot, ncol = 1, heights=c(1, 1))
162 |     }
163 |     
164 |     return(NULL)
165 | }
166 | 
167 | #' @title Plot Gamma Distributions
168 | #' @name plot_gamma
169 | #' @description This function plots the densities of multiple gamma distributions
170 | #' @export
171 | #' @importFrom purrr map map_dbl
172 | #' @param gammas A list of lists of gamma distributions
173 | #' @inheritParams plot_beta
174 | #' @return NULL. A plot is generated
175 | plot_gamma <- function(gammas
176 |                        , title = 'Density of Gamma Distribution'
177 |                        , color = '#f65335'
178 |                        , support_level = 0.99
179 | ) {
180 |     
181 |     n_samp <- 1e5
182 |     gamma_dt <- NULL
183 |     
184 |     for (i in seq_along(gammas)) {
185 |         x <- names(gammas)[i]
186 |         var_name <- paste('variant', x)
187 |         
188 |         # generate some data
189 |         gamma_vec <- rgamma(n_samp, gammas[[x]][['alpha']], gammas[[x]][['beta']])
190 |         
191 |         # remove the lower and upper extremes of the data
192 |         gamma_vec <- gamma_vec[data.table::between(gamma_vec
193 |                                                    , quantile(gamma_vec, (1 - support_level) / 2)
194 |                                                    , quantile(gamma_vec, support_level + (1 - support_level) / 2))]
195 |         
196 |         gamma_dt <- data.table::rbindlist(list(gamma_dt, data.table::data.table('variant' = rep(var_name, length(gamma_vec))
197 |                                                                                 , 'gammas' = gamma_vec)))
198 |     }
199 |     
200 |     if (length(gammas) > 1) {
201 |         col_vals <- c('darkred', 'darkblue')
202 |     } else {
203 |         col_vals <- color
204 |     }
205 |     
206 |     lambda_plot <- ggplot(gamma_dt, aes(x = gammas, colour = variant, fill = variant)) + 
207 |         geom_density(size = 1, alpha = 0.1) +
208 |         ggtitle(title) +
209 |         xlab('Expected Amount of Times Event Occurrs') +
210 |         ylab('Probability of That Rate') +
211 |         scale_color_manual(values = col_vals) +
212 |         scale_fill_manual(values = col_vals) +
213 |         theme(plot.title = element_text(hjust = 0.5, size = 22)
214 |               , axis.title = element_text(size = 18)
215 |               , axis.text = element_text(size = 14)
216 |               , legend.title = element_blank()
217 |               , legend.text = element_text(size = 16)
218 |               , legend.position = if(length(gammas) == 2) 'bottom' else 'none')
219 |     return(lambda_plot)
220 | }
221 | 
222 | #' @title Plot Relative Gain
223 | #' @name plot_relative_gain
224 | #' @description Plot the cumulative density of the ratio of the metric under variant B to the metric under variant A
225 | #' @importFrom purrr map
226 | #' @export
227 | #' @param dists A list of distribution objects, with elements named \code{'a'} and \code{'b'}
228 | #' @param sim_batch_size The number of objects to simulate
229 | #' @return A plot
230 | plot_relative_gain <- function(dists, sim_batch_size = 1e5, title = 'Cumulative Density of B / A') {
231 |     thetas <- purrr::map(dists, function(dist) simulate_data(dist = dist, n = sim_batch_size))
232 |     ratios <- thetas[['b']] / thetas[['a']]
233 |     df <- data.frame(x = ratios)
234 |     ecdf_plot <- ggplot(df, aes(x, colour = '#f65335')) + stat_ecdf(size = 1.5) +
235 |         ggtitle(title) + xlab('Relative Gain (B / A)') + ylab('Cumulative Density') +
236 |         theme(plot.title = element_text(hjust = 0.5, size = 22)
237 |               , axis.title = element_text(size = 18)
238 |               , axis.text = element_text(size = 14)
239 |               , legend.position = 'none')
240 |     return(ecdf_plot)
241 | }
242 | 


--------------------------------------------------------------------------------
/R/run_simulation.R:
--------------------------------------------------------------------------------
  1 | #' @title Simulate a Bayesian A/B Test
  2 | #' @name simulate_ab_test
  3 | #' @description Given true data generating distributions and prior distributions for
  4 | #'              variants A and B, simulate the data, calculate the necessary statistics
  5 | #'              and declare one of the tests a winner.
  6 | #' @inheritParams ab_arguments
  7 | #' @export
  8 | #' @importFrom purrr map
  9 | #' @importFrom data.table set
 10 | #' @details In order to create \code{data_dists} and \code{priors}, you need to use the following
 11 | #'          for data generating distributions: \code{\link{bernoulli_dist}},
 12 | #'          \code{\link{normal_dist}}, \code{\link{poisson_dist}} and the following for prior
 13 | #'          distributions \code{\link{beta_dist}}, \code{\link{normal_gamma_dist}},
 14 | #'          \code{\link{beta_dist}},
 15 | #' @seealso \code{\link{bernoulli_dist}} \code{\link{normal_dist}} \code{\link{poisson_dist}}
 16 | #'          \code{\link{beta_dist}} \code{\link{normal_gamma_dist}} \code{\link{beta_dist}}
 17 | #' @return A list that contains the name of the winning variant, the number of observations used,
 18 | #'         the loss of the decision, whether the test finished, the metrics from each round, and
 19 | #'         the raw data.
 20 | simulate_ab_test <- function(data_dists
 21 |                              , priors
 22 |                              , loss_threshold
 23 |                              , obs_per_round = 1000
 24 |                              , max_rounds = 100
 25 |                              , sim_batch_size = 1e5
 26 | ) {
 27 |     if (any(c(loss_threshold, obs_per_round, max_rounds, sim_batch_size) <= 0)) {
 28 |         stop('loss_threshold, obs_per_round, max_rounds, and sim_batch_size must be positive')
 29 |     }
 30 |     validate_inputs(data_dists, priors, obs_per_round)
 31 |     variants <- names(data_dists)
 32 |     num_variants <- length(data_dists)
 33 |     obs_per_variant <- obs_per_round / num_variants
 34 |     
 35 |     evidence_dt <- create_empty_dt(num_rows = max_rounds * obs_per_variant, column_names = variants)
 36 |     loss_dt <- create_empty_dt(num_rows = max_rounds, column_names = variants)
 37 |     best_variant <- NULL
 38 |     
 39 |     for (round_num in 1:max_rounds) {
 40 |         
 41 |         # simulate new set of data
 42 |         which_rows <- (1 + (round_num - 1) * obs_per_variant):(round_num * obs_per_variant)
 43 |         new_evidence <- purrr::map(data_dists, function(dist) simulate_data(dist = dist, n = obs_per_variant))
 44 |         data.table::set(evidence_dt, i = which_rows, j = names(evidence_dt), value = new_evidence)
 45 |         
 46 |         # update priors with new evidence
 47 |         posteriors <- update_priors(priors = priors
 48 |                                     , evidence_dt = evidence_dt[1:max(which_rows)])
 49 |         
 50 |         # evaluate metrics
 51 |         losses <- get_losses(posteriors = posteriors
 52 |                              , sim_batch_size = sim_batch_size)
 53 |         
 54 |         data.table::set(loss_dt, i = round_num, j = names(loss_dt), value = losses)
 55 | 
 56 |         # determine if we can stop the experiment
 57 |         decision <- single_loss_stop(losses = unlist(losses)
 58 |                                      , loss_threshold = loss_threshold)
 59 |         if (decision[['stop']]) {
 60 |             best_variant <- decision[['variant']]
 61 |             break
 62 |         }
 63 |     }
 64 |     test_finished <- as.numeric(!is.null(best_variant))
 65 |     if (test_finished) {
 66 |         actual_loss <- get_actual_loss(data_dists = data_dists
 67 |                                        , selected_variant = best_variant)
 68 |     } else {
 69 |         actual_loss <- NA_real_
 70 |     }
 71 |     
 72 |     
 73 |     out <- list(best_variant = best_variant
 74 |                 , num_obs = obs_per_round * round_num
 75 |                 , actual_loss = actual_loss
 76 |                 , test_finished = test_finished
 77 |                 , loss_dt = loss_dt[1:round_num]
 78 |                 , evidence_dt = evidence_dt[1:(round_num * obs_per_variant)])
 79 |     return(out)
 80 | }
 81 | 
 82 | #' @title Validate Inputs to Bayesian A/B Simulation
 83 | #' @name validate_inputs
 84 | #' @description Check the validity of the parameters being used in the simulation.
 85 | #' @export
 86 | #' @inheritParams ab_arguments
 87 | #' @return NULL if all of the tests pass. Else, it will fail loudly.
 88 | validate_inputs <- function(data_dists
 89 |                             , priors
 90 |                             , obs_per_round
 91 | ) {
 92 |     # check the number and names of the variants
 93 |     num_variants <- length(data_dists)
 94 |     if (num_variants != 2) {
 95 |         stop('Currently, only A/B tests with two variants are supported')
 96 |     }
 97 |     true_names <- names(data_dists)
 98 |     prior_names <- names(priors)
 99 |     if (is.null(true_names) || !identical(true_names, prior_names) || !identical(true_names, letters[1:num_variants])) {
100 |         stop('data_dists and priors must be named lists. And the elements must be named "a" and "b"')
101 |     }
102 |     
103 |     supported_data_dists <- c('bernoulli_dist', 'normal_dist', 'poisson_dist')
104 |     supported_priors <- c('beta_dist', 'normal_gamma_dist', 'gamma_dist')
105 |     data_a <- class(data_dists[['a']])
106 |     data_b <- class(data_dists[['b']])
107 |     prior_a <- class(priors[['a']])
108 |     prior_b <- class(priors[['b']])
109 |     
110 |     if (!identical(data_a, data_b) | !(data_a %in% supported_data_dists)) {
111 |         stop(paste('All data distributions need to be identical. Supported:'
112 |                    , paste(supported_data_dists, collapse = ', ')))
113 |     }
114 |     
115 |     if (!identical(prior_a, prior_b) | !(prior_a %in% supported_priors)) {
116 |         stop(paste('All prior distributions need to be identical. Supported:'
117 |                    , paste(supported_priors, collapse = ', ')))
118 |     }
119 |     
120 |     # check that the number of obs to sample each round is divisible by the number of variants
121 |     if (obs_per_round %% num_variants != 0) {
122 |         stop('Number of obs must be divisible by number of variants')
123 |     }
124 |     
125 |     return(NULL)
126 | }
127 | 
128 | 


--------------------------------------------------------------------------------
/R/simulate_data.R:
--------------------------------------------------------------------------------
 1 | #' @export
 2 | #' @importFrom stats rbeta
 3 | simulate_data.beta_dist <- function(dist, n, ...) {
 4 |     return(stats::rbeta(n = n, dist[['alpha']], dist[['beta']]))
 5 | }
 6 | 
 7 | #' @export
 8 | #' @importFrom stats rgamma rnorm
 9 | simulate_data.normal_gamma_dist <- function(dist, n, ...) {
10 |     tau <- stats::rgamma(n = n, shape = dist[['alpha']], rate = dist[['beta']])
11 |     return(stats::rnorm(n = n, mean = dist[['mu']], sd = sqrt(1 / (dist[['lambda']] * tau))))
12 | }
13 | 
14 | #' @export
15 | #' @importFrom stats rgamma
16 | simulate_data.gamma_dist <- function(dist, n, ...) {
17 |     return(stats::rgamma(n = n, shape = dist[['alpha']], rate = dist[['beta']]))
18 | }
19 | 
20 | #' @export
21 | #' @importFrom stats rbinom
22 | simulate_data.bernoulli_dist <- function(dist, n, ...) {
23 |     return(as.numeric(stats::rbinom(n, size = 1, prob = dist[['rate']])))
24 | }
25 | 
26 | #' @export
27 | #' @importFrom stats rnorm
28 | simulate_data.normal_dist <- function(dist, n, ...) {
29 |     return(stats::rnorm(n, mean = dist[['mu']], sd = dist[['sigma']]))
30 | }
31 | 
32 | #' @export
33 | #' @importFrom stats rpois
34 | simulate_data.poisson_dist <- function(dist, n, ...) {
35 |     return(as.numeric(stats::rpois(n, lambda = dist[['rate']])))
36 | }
37 | 
38 | #' @title Simulate Data According to Some Distribution
39 | #' @name simulate_data
40 | #' @description Simulate a vector of data from a given distribution object.
41 | #' @export
42 | #' @param dist An object of class \code{'beta_dist'}, \code{'normal_gamma_dist'},
43 | #'             \code{'gamma_dist'}, \code{'bernouilli_dist'}, \code{'normal_dist'},
44 | #'             or \code{'poisson_dist'} that specifies the parameters of some distribution
45 | #' @param n How many data points to simulate
46 | #' @param ... Arguments to be passed onto other methods
47 | #' @return A vector of simulated data.
48 | simulate_data <- function(dist, n, ...) {
49 |     UseMethod('simulate_data')
50 | }
51 | 
52 | 


--------------------------------------------------------------------------------
/R/simulation_study.R:
--------------------------------------------------------------------------------
 1 | #' @title Investigate Properties of Bayesian A/B Testing
 2 | #' @name investigate_simulations
 3 | #' @description Run multiple simulations of an A/B test in order to evaluate various properties
 4 | #' @export
 5 | #' @inheritParams ab_arguments
 6 | #' @param num_sims A positive integer that specifies how many simulations to perform.
 7 | #' @param sampling_distribution An list of distribution objects that specifies how the data generating
 8 | #'                              distributions should be created.
 9 | #' @param num_cores How many cores to use in the parallelization of tests. Default is \code{NULL}, which
10 | #'                  means no parallelization.
11 | #' @importFrom data.table setnames set rbindlist
12 | #' @importFrom purrr map_dbl
13 | #' @importFrom parallel makeCluster clusterExport clusterApplyLB stopCluster
14 | #' @return A list containing two data.tables: one with summary statistics for each simulation
15 | #'         and one with the averages over all of the simulations.
16 | investigate_simulations <- function(num_sims
17 |                                     , priors
18 |                                     , loss_threshold
19 |                                     , data_dists = NULL
20 |                                     , sampling_distribution = NULL
21 |                                     , obs_per_round = 1000
22 |                                     , max_rounds = 100
23 |                                     , sim_batch_size = 1e5
24 |                                     , num_cores = NULL
25 | ) {
26 |     
27 |     if (!is.null(sampling_distribution)) {
28 |         distributions <- get_data_dists(dist = sampling_distribution
29 |                                         , n = num_sims
30 |                                         , num_variants = 2)
31 |     } else {
32 |         distributions <- rep(list(data_dists), num_sims)
33 |     }
34 |     
35 |     if (!is.null(num_cores)) {
36 |         cluster <- parallel::makeCluster(num_cores)
37 |         parallel::clusterExport(cluster
38 |                                 , varlist = c('simulate_ab_test', 'priors'
39 |                                               , 'loss_threshold', 'obs_per_round'
40 |                                               , 'max_rounds', 'sim_batch_size')
41 |                                 , envir = environment())
42 |         result_list <- parallel::clusterApplyLB(
43 |             cluster
44 |             , x = distributions
45 |             , fun = function(dist) {
46 |                 result <- simulate_ab_test(data_dists = dist
47 |                                            , priors = priors
48 |                                            , loss_threshold = loss_threshold
49 |                                            , obs_per_round = obs_per_round
50 |                                            , max_rounds = max_rounds
51 |                                            , sim_batch_size = sim_batch_size)
52 |                 return(list(num_obs_seen = result[['num_obs']]
53 |                             , test_finished = result[['test_finished']]
54 |                             , bad_decision = as.numeric(result[['actual_loss']] > 0)
55 |                             , loss = result[['actual_loss']]))
56 |             })
57 |         parallel::stopCluster(cluster)
58 |         sim_dt <- data.table::rbindlist(result_list)
59 |     } else {
60 |         sim_columns <- c('num_obs_seen', 'test_finished', 'bad_decision', 'loss')
61 |         sim_dt <- create_empty_dt(num_rows = num_sims, column_names = sim_columns)
62 |         for (i in 1:num_sims) {
63 |             result <- simulate_ab_test(data_dists = distributions[[i]]
64 |                                        , priors = priors
65 |                                        , loss_threshold = loss_threshold
66 |                                        , obs_per_round = obs_per_round
67 |                                        , max_rounds = max_rounds
68 |                                        , sim_batch_size = sim_batch_size)
69 |             result_list <- list(result[['num_obs']]
70 |                                 , result[['test_finished']]
71 |                                 , as.numeric(result[['actual_loss']] > 0)
72 |                                 , result[['actual_loss']])
73 |             
74 |             data.table::set(sim_dt, i = i, j = names(sim_dt), value = result_list)
75 |         }
76 |     }
77 |     summary_dt <- sim_dt[, .(avg_obs_seen = mean(num_obs_seen)
78 |                              , p80_obs_seen = quantile(num_obs_seen, 0.8)
79 |                              , max_obs_seen = max(num_obs_seen)
80 |                              , test_finish_rate = mean(test_finished)
81 |                              , wrong_decision_rate = mean(bad_decision)
82 |                              , avg_loss = mean(loss))]
83 |     
84 |     return(list(sim_dt = sim_dt, summary_dt = summary_dt))
85 | }
86 | 


--------------------------------------------------------------------------------
/R/stopping_functions.R:
--------------------------------------------------------------------------------
 1 | #' @title Decide whether to stop the experiment using the last loss values
 2 | #' @name single_loss_stop
 3 | #' @description This function decides whether or not to stop the experiment and
 4 | #'              declare one of the variants as the winner.
 5 | #' @param losses A named vector of the losses for each variant.
 6 | #' @param loss_threshold Choose a variant once the loss goes beneath this value.
 7 | #' @export
 8 | #' @return A list containg two named elements: \code{'stop'} (a boolean) and
 9 | #'         \code{'winner'}, which is \code{NULL} if \code{'stop'} is \code{FALSE}
10 | #'         else it is the name of the winning variant
11 | single_loss_stop <- function(losses, loss_threshold) {
12 |     if (any(losses < loss_threshold)) {
13 |         best_variant <- names(losses)[which.min(losses)]
14 |         return(list(stop = TRUE, variant = best_variant))
15 |     } else {
16 |         return(list(stop = FALSE, variant = NULL))
17 |     }
18 | }


--------------------------------------------------------------------------------
/R/update_priors.R:
--------------------------------------------------------------------------------
  1 | #' @export
  2 | update_prior.beta_dist <- function(prior, evidence = NULL, stats = NULL, ...) {
  3 |     if (!is.null(evidence)) {
  4 |         if (!is.numeric(evidence) || any(!(evidence %in% c(0, 1)))) {
  5 |             stop('data passed to update_prior for beta distribution must be a binary vector')
  6 |         }
  7 |         num_success <- sum(evidence == 1)
  8 |         num_failure <- sum(evidence == 0)
  9 |     } else {
 10 |         if (is.null(stats)) {
 11 |             stop('If evidence is NULL, stats cannot be NULL')
 12 |         }
 13 |         num_success <- stats[['num_obs']] * stats[['observed_rate']]
 14 |         num_failure <- stats[['num_obs']] - num_success
 15 |     }
 16 |     return(beta_dist(alpha = prior[['alpha']] + num_success
 17 |                      , beta = prior[['beta']] + num_failure))
 18 | }
 19 | 
 20 | #' @export
 21 | update_prior.normal_gamma_dist <- function(prior, evidence = NULL, stats = NULL, ...) {
 22 |     if (!is.null(evidence)) {
 23 |         if (!is.numeric(evidence)) {
 24 |             stop('evidence must be a numeric vector')
 25 |         }
 26 |         num_obs <- length(evidence)
 27 |         avg <- mean(evidence)
 28 |         sse <- sum((evidence - avg) ^ 2)
 29 |     } else {
 30 |         if (is.null(stats)) {
 31 |             stop('If evidence is NULL, stats cannot be NULL')
 32 |         }
 33 |         num_obs <- stats[['num_obs']]
 34 |         avg <- stats[['avg']]
 35 |         std_dev <- stats[['std_dev']]
 36 |         sse <- (std_dev ^ 2) * (num_obs - 1)
 37 |     }
 38 |     
 39 |     beta_term <- (prior[['lambda']] * num_obs * (avg - prior[['mu']]) ^ 2) / (prior[['lambda']] + num_obs)
 40 |     return(normal_gamma_dist(mu = (prior[['lambda']] * prior[['mu']] + num_obs * avg) / (prior[['lambda']] + num_obs)
 41 |                              , lambda = prior[['lambda']] + num_obs
 42 |                              , alpha = prior[['alpha']] + num_obs / 2
 43 |                              , beta = prior[['beta']] + (sse + beta_term) / 2))
 44 | }
 45 | 
 46 | #' @export
 47 | update_prior.gamma_dist <- function(prior, evidence = NULL, stats = NULL, ...) {
 48 |     if (!is.null(evidence)) {
 49 |         if (any(evidence < 0)) {
 50 |             stop('evidence must be a positive integer vector')
 51 |         }
 52 |         num_sessions <- length(evidence)
 53 |         observed_count <- sum(evidence)
 54 |     } else {
 55 |         if (is.null(stats)) {
 56 |             stop('If evidence is NULL, stats cannot be NULL')
 57 |         }
 58 |         num_sessions <- stats[['num_sessions']]
 59 |         observed_count <- stats[['observed_count']]
 60 |     }
 61 |     return(gamma_dist(alpha = prior[['alpha']] + observed_count
 62 |                       , beta = prior[['beta']] + num_sessions))
 63 | }
 64 | 
 65 | #' @title Update Prior Parameters
 66 | #' @name update_prior
 67 | #' @description Use observed data to update parameters of prior distribution
 68 | #' @export
 69 | #' @importFrom purrr map
 70 | #' @param prior An object of class \code{'beta_dist'}, \code{'normal_gamma_dist'}
 71 | #'             , or \code{'gamma_dist'} that specifies the parameters of some distribution
 72 | #' @param evidence A numeric vector that contains observed data. Default is \code{NULL},
 73 | #'                 will override \code{stats} if specified.
 74 | #' @param stats An object of class \code{'beta_stats'}, \code{'normal_gamma_stats'}, or
 75 | #'              \code{'gamma_stats'} that contains sufficient statistics for the
 76 | #'              update. Default is \code{NULL}, will be ignored if \code{evidence}
 77 | #'              is specified.
 78 | #' @param ... Arguments to be passed onto other methods
 79 | update_prior <- function(prior, evidence = NULL, stats = NULL, ...) {
 80 |     UseMethod('update_prior')
 81 | }
 82 | 
 83 | #' @title Update the hyper-parameters of prior distributions
 84 | #' @name update_priors
 85 | #' @description This function updates the hyper-parameters of our prior distributions
 86 | #' @inheritParams ab_arguments
 87 | #' @param evidence_dt A data.table containing the raw data generated by each variant.
 88 | #'                    Column names must be identical to variant names in \code{priors}
 89 | #' @param stats_dt A data.table containing statistics about the raw data generated by each
 90 | #'                 variant.
 91 | #' @export
 92 | #' @importFrom purrr map
 93 | #' @return A list of distribution objects that represented the updated prior distributions
 94 | update_priors <- function(priors, evidence_dt = NULL, stats_dt = NULL) {
 95 |     variants <- names(priors)
 96 |     
 97 |     if (!is.null(evidence_dt)) {
 98 |         validate_dt(evidence_dt, expected_cols = variants)
 99 |         posteriors <- purrr::map(variants, function(x) {
100 |             update_prior(prior = priors[[x]]
101 |                          , evidence = evidence_dt[[x]])
102 |         })
103 |     } else {
104 |         validate_stats_dt(stats_dt, class(priors[['a']]))
105 |         posteriors <- purrr::map(variants, function(x) {
106 |             update_prior(prior = priors[[x]]
107 |                          , stats = as.list(stats_dt[variant == x, !'variant']))
108 |         })
109 |     }
110 |     
111 |     names(posteriors) <- names(priors)
112 |     return(posteriors)
113 | }
114 | 
115 | #' @title Check That Data Table Matches Expectations
116 | #' @name validate_dt
117 | #' @description Use this function to assert certain properties of a data.table
118 | #' @export
119 | #' @param dt A data.table
120 | #' @param expected_cols A vector of column names that \code{dt} must have
121 | #' @return \code{NULL} if successful, else a fatal error
122 | validate_dt <- function(dt, expected_cols) {
123 |     if (!inherits(dt, 'data.table')) {
124 |         stop('dt must be a data.table')
125 |     }
126 |     
127 |     cols <- names(dt)
128 |     missing_cols <- setdiff(expected_cols, cols)
129 |     if (length(missing_cols) > 0) {
130 |         stop(paste('missing columns:', paste(missing_cols, collapse = ', ')))
131 |     }
132 |     return(NULL)
133 | }
134 | 
135 | validate_stats_dt <- function(dt, dist_name) {
136 |     if (!inherits(dt, 'data.table')) {
137 |         stop('stats_dt must be a data.table')
138 |     }
139 |     
140 |     cols <- names(dt)
141 |     if (identical(dist_name, 'beta_dist')) {
142 |         expected <- c('variant', 'num_obs', 'observed_rate')
143 |     } else if (identical(dist_name, 'normal_gamma_dist')) {
144 |         expected <- c('variant', 'num_obs', 'avg', 'std_dev')
145 |     } else if (identical(dist_name, 'gamma_dist')) {
146 |         expected <- c('variant', 'num_sessions', 'observed_count')
147 |     } else {
148 |         stop('Unsupported prior distribution type')
149 |     }
150 |     if (!identical(sort(cols), sort(expected))) {
151 |         stop(paste0('bad column names in stats_dt provided to update_priors. expected:'
152 |                     , paste(expected, collapse = ', '), 'received:'
153 |                     , paste(cols, collapse = ', ')))
154 |     }
155 | }
156 | 
157 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | # abayes
 2 | 
 3 | ## What is this package?
 4 | 
 5 | abayes is an R package that implements Bayesian methods for A/B testing. You can use this package to perform simulations of A/B tests in order to evaluate the Bayesian methods. You can also use the functions in this package to evaluate the results of a particular A/B test or use [this shiny app](https://convoyds.shinyapps.io/abayes/).
 6 | 
 7 | ### Other Packages on Bayesian A/B Testing
 8 | 
 9 | I wrote most of this code before investigating the existence of open source alternatives. However, it is important to recognize them now. In order to evaluate A/B tests, a popular option is to use the [bayesAB](https://github.com/FrankPortman/bayesAB) package created by Frank Portman. In order to run simulations, David Robinson wrote code to perform simulations for [this blog post](http://varianceexplained.org/r/bayesian-ab-testing/). abayes lies somewhere in between these two packages. bayesAB can handle more complicated experiment settings and performs inferences faster, but is not designed for simulating data to analyze the performance of Bayesian methods. David's code is only designed for evaluating tests of binomial data and makes stronger assumptions about the simulation procedure than this package.
10 | 
11 | ## Why did I write this package?
12 | 
13 | As a data scientist at Convoy, I investigated the performance of Bayesian A/B testing compared to Convoy's existing experimentation framework. After initial simulations showed that the speed and accuracy of Bayesian methods were promising, I wrote more simulation code and expanded the set of functions to support various experimental settings. As Bayesian A/B testing began to be used more and more throughout Convoy, I put all of the functions into a package and built [a shiny app](https://convoyds.shinyapps.io/abayes/) on top of that package. This allowed my co-workers to use my functions easily.
14 | 
15 | ## Why did I open source this package?
16 | 
17 | As part of the blog post about Bayesian A/B testing, I included a visualization of the guarantees about controlling the expected loss. In the spirit of full transparency, I wanted to provide the script and the code that produced the visualization.
18 | 
19 | ## How to install
20 | 
21 | This package is not on CRAN, and there is no plan on submitting it to CRAN. In order to install this package, you can use `devtools::install_github('convoyinc/abayes')` or you can clone the package locally and use `devtools::install_local('abayes')`.
22 | 
23 | ## What can this package do?
24 | 
25 | For a detailed demonstration of the simulation capabilities, please see [the vignette](https://github.com/convoyinc/abayes/blob/master/vignettes/abayes_vignette.Rmd) included in this repo. There, I provide an example of the simulations that demonstrate the guarantees that A/B testing makes.
26 | 
27 | However, this package can also be used to evaluate the results of a single A/B test. Below is some code that demonstrates how to use the package for that purpose.
28 | 
29 | ``` r
30 | library(abayes)
31 | library(purrr)
32 | library(data.table)
33 | n <- 1000
34 | prior_dist <- list(a = beta_dist(alpha = 1, beta = 1), b = beta_dist(alpha = 1, beta = 1))
35 | data_dist <- list(a = bernoulli_dist(rate = 0.1), b = bernoulli_dist(rate = 0.12))
36 | evidence_dt <- data.table::as.data.table(purrr::map(data_dist, function(x) simulate_data(x, n)))
37 | posterior_dist <- update_priors(prior_dist, evidence_dt)
38 | 
39 | b_gt_a(posterior_dist[['a']], posterior_dist[['b']])
40 | expected_loss_b(posterior_dist[['a']], posterior_dist[['b']])
41 | plot_beta(posterior_dist)
42 | ```
43 | 
44 | ## We welcome other open source contributions!
45 | 
46 | There are many ways that we can make this package better. If you have any ideas, please fork this repo and submit a pull request. Thanks for your interest in Bayesian A/B testing.
47 | 


--------------------------------------------------------------------------------
/inst/abayes/app.R:
--------------------------------------------------------------------------------
  1 | library(shiny)
  2 | library(data.table)
  3 | library(purrr)
  4 | library(ggplot2)
  5 | library(gridExtra)
  6 | library(markdown)
  7 | library(abayes)
  8 | 
  9 | source('server_funs.R')
 10 | source('ui_elements.R')
 11 | 
 12 | ui <- navbarPage('Bayesian A/B Testing'
 13 |     , tabPanel('Basic',
 14 |         sidebarLayout(
 15 |             sidebarPanel(
 16 |                 
 17 |                 h3('Choose Your Metric', align = 'center', style="color:#f65335")
 18 |                 , radioButtons(inputId = 'dist_name_basic', label = 'Which option describes your metric?'
 19 |                                , choices = dist_name_choices)
 20 |                 
 21 |                 , h3('Enter What You Believe Before The Experiment', align = 'center', style="color:#f65335")
 22 |                 
 23 |                 , conditionalPanel('input.dist_name_basic == "binomial"'
 24 |                     , choose_binomial_rate(variant = NULL, id_type = 'basic')
 25 |                     , choose_binomial_bound(variant = NULL, id_type = 'basic')
 26 |                     , br()
 27 |                     , h3('Enter The Results of Your Experiment', align = 'center', style="color:#f65335")
 28 |                     , h4('Variant A', align = 'center')
 29 |                     , set_binomial_obs(variant = 'a', id_type = 'basic')
 30 |                     , set_binomial_rate(variant = 'a', id_type = 'basic')
 31 |                     , h4('Variant B', align = 'center')
 32 |                     , set_binomial_obs(variant = 'b', id_type = 'basic')
 33 |                     , set_binomial_rate(variant = 'b', id_type = 'basic')
 34 |                     , br()
 35 |                     , h3('Choose Your Loss Function', align = 'center', style="color:#f65335")
 36 |                     , radioButtons(inputId = 'loss_type_basic', label = 'Choose between the absolute loss or the percent loss'
 37 |                                    , choices = list('absolute' = 'absolute', 'percent' = 'percent'))
 38 |                 )
 39 |                 
 40 |                 , conditionalPanel('input.dist_name_basic == "normal"'
 41 |                     , choose_normal_mean(variant = NULL, id_type = 'basic')
 42 |                     , choose_normal_mean_bound(variant = NULL, id_type = 'basic')
 43 |                     , choose_normal_sd(variant = NULL, id_type = 'basic')
 44 |                     , choose_normal_sd_bound(variant = NULL, id_type = 'basic')
 45 |                     , br()
 46 |                     , h3('Enter The Results of Your Experiment', align = 'center', style="color:#f65335")
 47 |                     , h4('Variant A', align = 'center')
 48 |                     , set_normal_count(variant = 'a', id_type = 'basic')
 49 |                     , set_normal_mean(variant = 'a', id_type = 'basic')
 50 |                     , set_normal_sd(variant = 'a', id_type = 'basic')
 51 |                     , h4('Variant B', align = 'center')
 52 |                     , set_normal_count(variant = 'b', id_type = 'basic')
 53 |                     , set_normal_mean(variant = 'b', id_type = 'basic')
 54 |                     , set_normal_sd(variant = 'b', id_type = 'basic')
 55 |                 )
 56 |                 
 57 |                 , conditionalPanel('input.dist_name_basic == "poisson"'
 58 |                    , choose_poisson_lambda(variant = NULL, id_type = 'basic')
 59 |                    , choose_poisson_max(variant = NULL, id_type = 'basic')
 60 |                    , br()
 61 |                    , h3('Enter The Results of Your Experiment', align = 'center', style="color:#f65335")
 62 |                    , h4('Variant A', align = 'center')
 63 |                    , set_poisson_num(variant = 'a', id_type = 'basic')
 64 |                    , set_poisson_total(variant = 'a', id_type = 'basic')
 65 |                    , h4('Variant B', align = 'center')
 66 |                    , set_poisson_num(variant = 'b', id_type = 'basic')
 67 |                    , set_poisson_total(variant = 'b', id_type = 'basic')
 68 |                 )
 69 |             ),
 70 |             
 71 |             # Main panel for displaying outputs ----
 72 |             mainPanel(
 73 |                 
 74 |                 h1('Before The Experiment', align = 'center', style="color:#f65335")
 75 |                 , plotOutput(outputId = "plot_prior_basic")
 76 |                 , textOutput(outputId = 'prior_description_basic')
 77 |                 , h1('After Observing Some Data', align = 'center', style="color:#f65335")
 78 |                 , conditionalPanel('(input.dist_name_basic == "binomial" && (input.obs_a_basic != 0 || input.obs_b_basic != 0))
 79 |                                     || (input.dist_name_basic == "normal" && (input.count_a_basic != 0 || input.count_b_basic != 0))
 80 |                                     || (input.dist_name_basic == "poisson" && (input.num_a_basic != 0 || input.num_b_basic != 0))'
 81 |                     , tableOutput(outputId = 'metrics_basic')
 82 |                     , plotOutput(outputId = "plot_posterior_basic")
 83 |                 )
 84 |                 , conditionalPanel('(input.dist_name_basic == "binomial" && input.obs_a_basic == 0 && input.obs_b_basic == 0)
 85 |                                     || (input.dist_name_basic == "normal" && input.count_a_basic == 0 && input.count_b_basic == 0)
 86 |                                     || (input.dist_name_basic == "poisson" && input.num_a_basic == 0 && input.num_b_basic == 0)'
 87 |                                    , br()
 88 |                                    , br()
 89 |                                    , h4(paste('In order to determine which variant is better, you need to enter'
 90 |                                               , 'the results of your experiment. Use the panel on the left.')
 91 |                                         , style="color:black")
 92 |                 )
 93 |             )
 94 |         )
 95 |     )
 96 |     , tabPanel('Advanced',
 97 |         sidebarLayout(
 98 |             sidebarPanel(
 99 |                        
100 |                 h3('Choose Your Metric', align = 'center', style="color:#f65335")
101 |                 , radioButtons(inputId = 'dist_name_adv', label = 'Which option describes your metric?'
102 |                                , choices = dist_name_choices)
103 |                    
104 |                 , h3('Enter What You Believe Before The Experiment', align = 'center', style="color:#f65335")
105 |                    
106 |                 , conditionalPanel('input.dist_name_adv == "binomial"'
107 |                        , h4('Variant A', align = 'center')
108 |                        , choose_binomial_rate(variant = 'a', id_type = 'adv')
109 |                        , choose_binomial_bound(variant = 'a', id_type = 'adv')
110 |                        , h4('Variant B', align = 'center')
111 |                        , choose_binomial_rate(variant = 'b', id_type = 'adv')
112 |                        , choose_binomial_bound(variant = 'b', id_type = 'adv')
113 |                        , br()
114 |                        , h3('Enter The Results of Your Experiment', align = 'center', style="color:#f65335")
115 |                        , h4('Variant A', align = 'center')
116 |                        , set_binomial_obs(variant = 'a', id_type = 'adv')
117 |                        , set_binomial_rate(variant = 'a', id_type = 'adv')
118 |                        , h4('Variant B', align = 'center')
119 |                        , set_binomial_obs(variant = 'b', id_type = 'adv')
120 |                        , set_binomial_rate(variant = 'b', id_type = 'adv')
121 |                        , br()
122 |                        , h3('Choose Your Loss Function', align = 'center', style="color:#f65335")
123 |                        , radioButtons(inputId = 'loss_type_adv', label = 'Choose between the absolute loss or the percent loss'
124 |                                       , choices = list('absolute' = 'absolute', 'percent' = 'percent'))
125 |                    )
126 |                    
127 |                 , conditionalPanel('input.dist_name_adv == "normal"'
128 |                        , h4('Variant A', align = 'center')
129 |                        , choose_normal_mean(variant = 'a', id_type = 'adv')
130 |                        , choose_normal_mean_bound(variant = 'a', id_type = 'adv')
131 |                        , choose_normal_sd(variant = 'a', id_type = 'adv')
132 |                        , choose_normal_sd_bound(variant = 'a', id_type = 'adv')
133 |                        , h4('Variant B', align = 'center')
134 |                        , choose_normal_mean(variant = 'b', id_type = 'adv')
135 |                        , choose_normal_mean_bound(variant = 'b', id_type = 'adv')
136 |                        , choose_normal_sd(variant = 'b', id_type = 'adv')
137 |                        , choose_normal_sd_bound(variant = 'b', id_type = 'adv')
138 |                        , br()
139 |                        , h3('Enter The Results of Your Experiment', align = 'center', style="color:#f65335")
140 |                        , h4('Variant A', align = 'center')
141 |                        , set_normal_count(variant = 'a', id_type = 'adv')
142 |                        , set_normal_mean(variant = 'a', id_type = 'adv')
143 |                        , set_normal_sd(variant = 'a', id_type = 'adv')
144 |                        , h4('Variant B', align = 'center')
145 |                        , set_normal_count(variant = 'b', id_type = 'adv')
146 |                        , set_normal_mean(variant = 'b', id_type = 'adv')
147 |                        , set_normal_sd(variant = 'b', id_type = 'adv')
148 |                    )
149 |                    
150 |                 , conditionalPanel('input.dist_name_adv == "poisson"'
151 |                        , h4('Variant A', align = 'center')
152 |                        , choose_poisson_lambda(variant = 'a', id_type = 'adv')
153 |                        , choose_poisson_max(variant = 'a', id_type = 'adv')
154 |                        , h4('Variant B', align = 'center')
155 |                        , choose_poisson_lambda(variant = 'b', id_type = 'adv')
156 |                        , choose_poisson_max(variant = 'b', id_type = 'adv')
157 |                        , br()
158 |                        , h3('Enter The Results of Your Experiment', align = 'center', style="color:#f65335")
159 |                        , h4('Variant A', align = 'center')
160 |                        , set_poisson_num(variant = 'a', id_type = 'adv')
161 |                        , set_poisson_total(variant = 'a', id_type = 'adv')
162 |                        , h4('Variant B', align = 'center')
163 |                        , set_poisson_num(variant = 'b', id_type = 'adv')
164 |                        , set_poisson_total(variant = 'b', id_type = 'adv')
165 |                 )
166 |             ),
167 |                
168 |                # Main panel for displaying outputs ----
169 |            mainPanel(
170 |                
171 |                h1('Before The Experiment', align = 'center', style="color:#f65335")
172 |                , plotOutput(outputId = "plot_prior_adv")
173 |                , textOutput(outputId = 'prior_description_adv')
174 |                , h1('After Observing Some Data', align = 'center', style="color:#f65335")
175 |                , conditionalPanel('(input.dist_name_adv == "binomial" && (input.obs_a_adv != 0 || input.obs_b_adv != 0))
176 |                                   || (input.dist_name_adv == "normal" && (input.count_a_adv != 0 || input.count_b_adv != 0))
177 |                                   || (input.dist_name_adv == "poisson" && (input.num_a_adv != 0 || input.num_b_adv != 0))'
178 |                                   , tableOutput(outputId = 'metrics_adv')
179 |                                   , plotOutput(outputId = "plot_posterior_adv")
180 |                                   , textOutput(outputId = 'posterior_description_adv')
181 |                                   # , plotOutput(outputId = 'relative_gain_adv')
182 |                )
183 |                , conditionalPanel('(input.dist_name_adv == "binomial" && input.obs_a_adv == 0 && input.obs_b_adv == 0)
184 |                                   || (input.dist_name_adv == "normal" && input.count_a_adv == 0 && input.count_b_adv == 0)
185 |                                   || (input.dist_name_adv == "poisson" && input.num_a_adv == 0 && input.num_b_adv == 0)'
186 |                                   , br()
187 |                                   , br()
188 |                                   , h4(paste('In order to determine which variant is better, you need to enter'
189 |                                              , 'the results of your experiment. Use the panel on the left.')
190 |                                        , style="color:black")
191 |                )
192 |            )
193 |         )
194 |     )
195 |     , tabPanel('How To Use',
196 |         fluidRow(column(12, includeMarkdown('how_to.md')))
197 |     )
198 | )
199 | 
200 | # generate the outputs
201 | server <- function(input, output) {
202 |     
203 |     prior_dist_basic <- reactive({
204 |         prior_dist <- create_prior_dist(dist_name = input$dist_name_basic
205 |                                         , expected_rate = input$expected_rate_basic
206 |                                         , upper_bound_rate = input$upper_bound_basic
207 |                                         , expected_mean = input$expected_mean_basic
208 |                                         , upper_bound_mean = input$upper_bound_mean_basic
209 |                                         , expected_sd = input$expected_sd_basic
210 |                                         , upper_bound_sd = input$upper_bound_sd_basic
211 |                                         , expected_lambda = input$expected_lambda_basic
212 |                                         , upper_bound_lambda = input$max_lambda_basic
213 |         )
214 |         prior_dist
215 |     })
216 |     
217 |     prior_dist_adv <- reactive({
218 |         prior_dist_a <- create_prior_dist(dist_name = input$dist_name_adv
219 |                                           , expected_rate = input$expected_rate_a_adv
220 |                                           , upper_bound_rate = input$upper_bound_a_adv
221 |                                           , expected_mean = input$expected_mean_a_adv
222 |                                           , upper_bound_mean = input$upper_bound_mean_a_adv
223 |                                           , expected_sd = input$expected_sd_a_adv
224 |                                           , upper_bound_sd = input$upper_bound_sd_a_adv
225 |                                           , expected_lambda = input$expected_lambda_a_adv
226 |                                           , upper_bound_lambda = input$max_lambda_a_adv
227 |         )
228 |         prior_dist_b <- create_prior_dist(dist_name = input$dist_name_adv
229 |                                           , expected_rate = input$expected_rate_b_adv
230 |                                           , upper_bound_rate = input$upper_bound_b_adv
231 |                                           , expected_mean = input$expected_mean_b_adv
232 |                                           , upper_bound_mean = input$upper_bound_mean_b_adv
233 |                                           , expected_sd = input$expected_sd_b_adv
234 |                                           , upper_bound_sd = input$upper_bound_sd_b_adv
235 |                                           , expected_lambda = input$expected_lambda_b_adv
236 |                                           , upper_bound_lambda = input$max_lambda_b_adv
237 |         )
238 |         if (is.character(prior_dist_a)) {
239 |             paste('Variant A:', prior_dist_a)
240 |         } else if (is.character(prior_dist_b)) {
241 |             paste('Variant B:', prior_dist_b)
242 |         } else {
243 |             list('a' = prior_dist_a, 'b' = prior_dist_b)
244 |         }
245 |     })
246 |     
247 |     results_dt_basic <- reactive({
248 |         results_dt <- create_results_dt(dist_name = input$dist_name_basic
249 |                                         , obs_a = input$obs_a_basic, obs_b = input$obs_b_basic
250 |                                         , rate_a = input$rate_a_basic, rate_b = input$rate_b_basic
251 |                                         , count_a = input$count_a_basic, count_b = input$count_b_basic
252 |                                         , mean_a = input$mean_a_basic, mean_b = input$mean_b_basic
253 |                                         , sd_a = input$sd_a_basic, sd_b = input$sd_b_basic
254 |                                         , num_a = input$num_a_basic, num_b = input$num_b_basic
255 |                                         , total_a = input$total_a_basic, total_b = input$total_b_basic)
256 |         results_dt
257 |     })
258 |     
259 |     results_dt_adv <- reactive({
260 |         results_dt <- create_results_dt(dist_name = input$dist_name_adv
261 |                                         , obs_a = input$obs_a_adv, obs_b = input$obs_b_adv
262 |                                         , rate_a = input$rate_a_adv, rate_b = input$rate_b_adv
263 |                                         , count_a = input$count_a_adv, count_b = input$count_b_adv
264 |                                         , mean_a = input$mean_a_adv, mean_b = input$mean_b_adv
265 |                                         , sd_a = input$sd_a_adv, sd_b = input$sd_b_adv
266 |                                         , num_a = input$num_a_adv, num_b = input$num_b_adv
267 |                                         , total_a = input$total_a_adv, total_b = input$total_b_adv)
268 |         results_dt
269 |     })
270 |     
271 |     posterior_basic <- reactive({
272 |         prior_dist <- prior_dist_basic()
273 |         results_dt <- results_dt_basic()
274 |         
275 |         if (is.character(prior_dist)) {
276 |             return(prior_dist)
277 |         }
278 |             
279 |         if (is.character(results_dt)) {
280 |             return(results_dt)
281 |         }
282 |         priors <- list('a' = prior_dist, 'b' = prior_dist)
283 |         update_priors(priors = priors, stats_dt = results_dt)
284 |     })
285 |     
286 |     posterior_adv <- reactive({
287 |         prior_dist <- prior_dist_adv()
288 |         results_dt <- results_dt_adv()
289 |         
290 |         if (is.character(prior_dist)) {
291 |             return(prior_dist)
292 |         }
293 |         
294 |         if (is.character(results_dt)) {
295 |             return(results_dt)
296 |         }
297 |         update_priors(priors = prior_dist, stats_dt = results_dt)
298 |     })
299 |     
300 |     metrics_dt_basic <- reactive({
301 |         posteriors <- posterior_basic()
302 |         if (is.character(posteriors)) {
303 |             return(data.table::data.table())
304 |         }
305 |         
306 |         create_metric_dt(posteriors = posteriors, method = input$loss_type_basic)
307 |     })
308 |     
309 |     metrics_dt_adv <- reactive({
310 |         posteriors <- posterior_adv()
311 |         if (is.character(posteriors)) {
312 |             return(data.table::data.table())
313 |         }
314 |         create_metric_dt(posteriors = posteriors, method = input$loss_type_adv)
315 |     })
316 |     
317 |     # prior plot
318 |     output$plot_prior_basic <- renderPlot({
319 |         prior_dist <- prior_dist_basic()
320 |         if (!is.character(prior_dist)) {
321 |             prior_dist <- list('a' = prior_dist)
322 |         }
323 |         plot_dists(dists = prior_dist, input$dist_name_basic)
324 |     })
325 |     
326 |     output$plot_prior_adv <- renderPlot({
327 |         plot_dists(dists = prior_dist_adv(), input$dist_name_adv)
328 |     })
329 |     
330 |     # prior description
331 |     output$prior_description_basic <- renderText({
332 |         create_description(dist_name = input$dist_name_basic, dist = prior_dist_basic())
333 |     })
334 |     
335 |     output$prior_description_adv <- renderText({
336 |         prior_dist <- prior_dist_adv()
337 |         if (is.character(prior_dist)) {
338 |             return('')
339 |         }
340 |         
341 |         a <- create_description(dist_name = input$dist_name_adv, dist = prior_dist[['a']])
342 |         b <- create_description(dist_name = input$dist_name_adv, dist = prior_dist[['b']])
343 |         return(paste0('Variant A: ', a, '. Variant B: ', b))
344 |     })
345 |     
346 |     # posterior plot
347 |     output$plot_posterior_basic <- renderPlot({
348 |         plot_dists(dists = posterior_basic(), input$dist_name_basic)
349 |     })
350 |     
351 |     output$plot_posterior_adv <- renderPlot({
352 |         plot_dists(dists = posterior_adv(), input$dist_name_adv)
353 |     })
354 |     
355 |     output$metrics_basic <- renderTable({
356 |         metrics_dt_basic()
357 |     }, digits = 7, striped = TRUE, hover = FALSE, bordered = FALSE
358 |      , spacing = 'm', width = '90%', size = '40', align = 'c')
359 |     
360 |     output$metrics_adv <- renderTable({
361 |         metrics_dt_adv()
362 |     }, digits = 7, striped = TRUE, hover = FALSE, bordered = FALSE
363 |     , spacing = 'm', width = '90%', size = '40', align = 'c')
364 |     
365 |     output$posterior_description_adv <- renderText({
366 |         posterior <- posterior_adv()
367 |         if (is.character(posterior)) {
368 |             return('')
369 |         }
370 |         
371 |         a <- create_description(dist_name = input$dist_name_adv, dist = posterior[['a']])
372 |         b <- create_description(dist_name = input$dist_name_adv, dist = posterior[['b']])
373 |         return(paste0('Variant A: ', a, '. Variant B: ', b))
374 |     })
375 |     
376 |     # output$relative_gain_adv <- renderPlot({
377 |     #     dists <- posterior_adv()
378 |     #     plot_relative_gain(dists = dists)
379 |     # })
380 |     
381 | }
382 | 
383 | shinyApp(ui = ui, server = server)
384 | 
385 | 
386 | 
387 | 
388 | 


--------------------------------------------------------------------------------
/inst/abayes/dependencies.R:
--------------------------------------------------------------------------------
1 | require(abayes)
2 | require(data.table)
3 | require(ggplot2)
4 | require(gridExtra)
5 | require(markdown)
6 | require(purrr)
7 | require(stats)


--------------------------------------------------------------------------------
/inst/abayes/how_to.md:
--------------------------------------------------------------------------------
1 | # Bayesian A/B Testing
2 | 
3 | A brief explanation of Bayesian A/B testing and how to use this shiny app.
4 | 


--------------------------------------------------------------------------------
/inst/abayes/server_funs.R:
--------------------------------------------------------------------------------
  1 | library(data.table)
  2 | 
  3 | MAX_VAL <- 1e6
  4 | MAX_OBS <- 1e8
  5 | 
  6 | create_prior_dist <- function(dist_name
  7 |                               , expected_rate
  8 |                               , upper_bound_rate
  9 |                               , expected_mean
 10 |                               , upper_bound_mean
 11 |                               , expected_sd
 12 |                               , upper_bound_sd
 13 |                               , expected_lambda
 14 |                               , upper_bound_lambda
 15 | ) {
 16 |     if (dist_name == 'binomial') {
 17 |         in_vals <- c(expected_rate, upper_bound_rate)
 18 |         if (any(is.na(in_vals))) {
 19 |             return('Please enter data')
 20 |         }
 21 |         if (!all(data.table::between(in_vals, 0, 1))) {
 22 |             return('Rates must be between 0 and 1')
 23 |         }
 24 |         return(get_supported_beta(mu = expected_rate, bound = upper_bound_rate, desired_support = 0.05))
 25 |     } else if (dist_name == 'normal') {
 26 |         means <- c(expected_mean, upper_bound_mean)
 27 |         sds <- c(expected_sd, upper_bound_sd)
 28 |         if (any(is.na(c(means, sds)))) {
 29 |             return('Please enter data')
 30 |         }
 31 |         if (!all(data.table::between(means, -MAX_VAL, MAX_VAL))) {
 32 |             return('Average must be less extreme')
 33 |         }
 34 |         if (!all(data.table::between(sds, 0, MAX_VAL, incbounds = FALSE))) {
 35 |             return('Standard deviations must be less extreme')
 36 |         }
 37 |         return(get_supported_normal_gamma(mu = expected_mean, bound_mu = upper_bound_mean
 38 |                                           , sigma = expected_sd, bound_sigma = upper_bound_sd
 39 |                                           , desired_support = 0.05))
 40 |     } else {
 41 |         in_vals <- c(expected_lambda, upper_bound_lambda)
 42 |         if (any(is.na(in_vals))) {
 43 |             return('Please enter data')
 44 |         }
 45 |         if (!all(data.table::between(in_vals, 0, MAX_VAL))) {
 46 |             return('Values must be less extreme')
 47 |         }
 48 |         return(get_supported_gamma(mu = expected_lambda, bound = upper_bound_lambda, desired_support = 0.05))
 49 |     }
 50 | }
 51 | 
 52 | create_results_dt <- function(dist_name
 53 |                               , obs_a, obs_b, rate_a, rate_b
 54 |                               , count_a, count_b, mean_a, mean_b, sd_a, sd_b
 55 |                               , num_a, num_b, total_a, total_b
 56 | ) {
 57 |     if (dist_name == 'binomial') {
 58 |         obs <- c(obs_a, obs_b)
 59 |         rates <- c(rate_a, rate_b)
 60 |         if (any(is.na(c(obs, rates)))) {
 61 |             return('Please enter data')
 62 |         }
 63 |         if (!all(data.table::between(obs, 0, MAX_OBS))) {
 64 |             return('Please use less extreme number of observations')
 65 |         }
 66 |         if (!all(data.table::between(rates, 0, 1))) {
 67 |             return('Rates must be between 0 and 1')
 68 |         }
 69 |         return(data.table::data.table('variant' = c('a', 'b')
 70 |                                       , 'num_obs' = obs
 71 |                                       , 'observed_rate' = rates))
 72 |     } else if (dist_name == 'normal') {
 73 |         counts <- c(count_a, count_b)
 74 |         means <- c(mean_a, mean_b)
 75 |         sds <- c(sd_a, sd_b)
 76 |         if (any(is.na(c(counts, means, sds)))) {
 77 |             return('Please enter data')
 78 |         }
 79 |         if (!all(data.table::between(counts, 0, MAX_OBS))) {
 80 |             return('Please use less extreme number of observations')
 81 |         }
 82 |         if (!all(data.table::between(means, -MAX_VAL, MAX_VAL))) {
 83 |             return('Average must be less extreme')
 84 |         }
 85 |         if (!all(data.table::between(sds, 0, MAX_VAL, incbounds=FALSE))) {
 86 |             return('Standard deviations must be less extreme')
 87 |         }
 88 |         return(data.table::data.table('variant' = c('a', 'b')
 89 |                                       , 'num_obs' = counts
 90 |                                       , 'avg' = means
 91 |                                       , 'std_dev' = sds))
 92 |     } else {
 93 |         nums <- c(num_a, num_b)
 94 |         totals <- c(total_a, total_b)
 95 |         if (any(is.na(c(nums, totals)))) {
 96 |             return('Please enter data')
 97 |         }
 98 |         if (!all(data.table::between(nums, 0, MAX_VAL))) {
 99 |             return('Please use less extreme number of sessions')
100 |         }
101 |         if (!all(data.table::between(totals, 0, MAX_VAL))) {
102 |             return('Please use less extreme number of observations')
103 |         }
104 |         return(data.table::data.table('variant' = c('a', 'b')
105 |                                       , 'num_sessions' = c(num_a, num_b)
106 |                                       , 'observed_count' = c(total_a, total_b)))
107 |     }
108 | }
109 | 
110 | create_description <- function(dist_name, dist) {
111 |     if (is.character(dist)) {
112 |         return('')
113 |     } else {
114 |         if (dist_name == 'binomial') {
115 |             moments <- abayes::compute_moments(dist)
116 |             mu <- moments[['mu']]; sigma <- moments[['sigma']]
117 |             return(paste0('The distribution is a beta distribution with parameters: alpha = '
118 |                          , round(dist[['alpha']], 2), ', beta = ', round(dist[['beta']], 2)
119 |                          , '. Mean: ',  signif(mu, 3), ' and Standard Deviation: ', signif(sigma, 3)))
120 |         } else if (dist_name == 'normal') {
121 |             moments <- abayes::compute_moments(dist)
122 |             x_mu <- moments[['x']][['mu']]; x_sigma <- moments[['x']][['sigma']]
123 |             tau_mu <- moments[['tau']][['mu']]; tau_sigma <- moments[['tau']][['sigma']]
124 |             return(paste0('The distribution is a normal gamma distribution with parameters: mu = '
125 |                          , round(dist[['mu']], 2), ', lambda = ', round(dist[['lambda']], 2)
126 |                          , ', alpha = ', round(dist[['alpha']], 2), ', beta = ', round(dist[['beta']], 2)
127 |                          , '. Mean: ', signif(x_mu, 3), ' and Standard Deviation: ', signif(x_sigma, 3)))
128 |         } else {
129 |             moments <- abayes::compute_moments(dist)
130 |             mu <- moments[['mu']]; sigma <- moments[['sigma']]
131 |             return(paste0('The distribution is a gamma distribution with parameters: alpha = '
132 |                           , round(dist[['alpha']], 2), ', beta = ', round(dist[['beta']], 2)
133 |                           , '. Mean: ', signif(mu, 3), ' and Standard Deviation: ', signif(sigma, 3)))
134 |         }
135 |     }
136 | }
137 | 
138 | create_metric_dt <- function(posteriors, method) {
139 |     
140 |     metrics <- abayes::get_metrics(posteriors = posteriors, sim_batch_size = 1e5, method = method)
141 |     dt <- data.table::data.table(x = c('A', 'B')
142 |                                  , y = as.character(signif(c(metrics[['loss_a']], metrics[['loss_b']]), 3))
143 |                                  , z = as.character(signif(c(1 - metrics[['prob_b_gt_a']], metrics[['prob_b_gt_a']]), 3))
144 |                                  , w = c('-', as.character(signif(metrics[['effect_lower']], 3)))
145 |                                  , a = c('-', as.character(signif(metrics[['effect_expected']], 3)))
146 |                                  , b = c('-', as.character(signif(metrics[['effect_upper']], 3)))
147 |     )
148 |     data.table::setnames(dt, c('Variant', 'Risk Of Choosing Variant', 'Prob Variant is Larger'
149 |                                , '95% CI Lower Bound', 'Expected Effect Size (B - A)', '95% CI Upper Bound'))
150 |     return(dt)
151 | }
152 | 
153 | plot_dists <- function(dists, dist_name) {
154 |     if (is.character(dists)) {
155 |         df <- data.frame()
156 |         return(ggplot(df) + geom_point() + xlim(0, 1) + ylim(0, 1) +
157 |                    annotate('text', x = 0.5, y = 0.5, label = dists, size = 5))
158 |     } else {
159 |         if (dist_name == 'binomial') {
160 |             return(plot_beta(betas = dists, title = 'What We Believe About the Rate'))
161 |         } else if (dist_name == 'normal') {
162 |             return(plot_normal(normals = dists))
163 |         } else {
164 |             return(plot_gamma(gammas = dists, title = 'What We Believe About the Expected Count'))
165 |         }
166 |     }
167 | }
168 | 
169 | 


--------------------------------------------------------------------------------
/inst/abayes/ui_elements.R:
--------------------------------------------------------------------------------
  1 | dist_name_choices <- list('Rate: Your metric is the frequency that an event happens.' = 'binomial'
  2 |                           , 'Quantity: Your metric is a normally distributed, continuous quantity.' = 'normal'
  3 |                           , 'Count: Your metric is the number of times an event happens.' = 'poisson')
  4 | 
  5 | get_input_id <- function(id_string, variant, id_type) {
  6 |     if (is.null(variant)) {
  7 |         return(paste(id_string, id_type, sep = '_'))
  8 |     } else {
  9 |         return(paste(id_string, variant, id_type, sep = '_'))
 10 |     }
 11 | }
 12 | 
 13 | choose_binomial_rate <- function(variant, id_type) {
 14 |     return(numericInput(inputId = get_input_id("expected_rate", variant, id_type)
 15 |                         , label = "Expected Rate of the Event:"
 16 |                         , value = 0.1, min = 0, max = 1, step = 0.01))
 17 | }
 18 | 
 19 | choose_binomial_bound <- function(variant, id_type) {
 20 |     return(numericInput(inputId = get_input_id("upper_bound", variant, id_type)
 21 |                         , label = "Maximum Possible Rate:"
 22 |                         , value = 0.2, min = 0, max = 1, step = 0.01))
 23 | }
 24 | 
 25 | set_binomial_obs <- function(variant, id_type) {
 26 |     return(numericInput(inputId = get_input_id("obs", variant, id_type)
 27 |                         , label = "Number of Observations:"
 28 |                         , value = 0, min = 0, max = 1e8, step = 1))
 29 | }
 30 | 
 31 | set_binomial_rate <- function(variant, id_type) {
 32 |     return(numericInput(inputId = get_input_id("rate", variant, id_type)
 33 |                         , label = "Observed Frequency of the Event:"
 34 |                         , value = 0.1, min = 0, max = 1, step = 0.01))
 35 | }
 36 | 
 37 | choose_normal_mean <- function(variant, id_type) {
 38 |     return(numericInput(inputId = get_input_id('expected_mean', variant, id_type)
 39 |                         , label = "Expected Average:"
 40 |                         , value = 0, min = -1e6, max = 1e6, step = 1))
 41 | }
 42 | 
 43 | choose_normal_mean_bound <- function(variant, id_type) {
 44 |     return(numericInput(inputId = get_input_id("upper_bound_mean", variant, id_type)
 45 |                         , label = "Maximum Possible Average:"
 46 |                         , value = 1, min = -1e6, max = 1e6, step = 1))
 47 | }
 48 | 
 49 | choose_normal_sd <- function(variant, id_type) {
 50 |     return(numericInput(inputId = get_input_id("expected_sd", variant, id_type)
 51 |                         , label = "Expected Standard Deviation:"
 52 |                         , value = 1, min = 0, max = 1e6, step = 1))
 53 | }
 54 | 
 55 | choose_normal_sd_bound <- function(variant, id_type) {
 56 |     return(numericInput(inputId = get_input_id("upper_bound_sd", variant, id_type)
 57 |                         , label = "Maximum Possible Standard Deviation:"
 58 |                         , value = 2, min = 0, max = 1e6, step = 1))
 59 | }
 60 | 
 61 | set_normal_count <- function(variant, id_type) {
 62 |     return(numericInput(inputId = get_input_id("count", variant, id_type)
 63 |                         , label = "Number of Observations:"
 64 |                         , value = 0, min = 0, max = 1e8, step = 1))
 65 | }
 66 | 
 67 | set_normal_mean <- function(variant, id_type) {
 68 |     return(numericInput(inputId = get_input_id("mean", variant, id_type)
 69 |                         , label = "Observed Average:"
 70 |                         , value = 0, min = -1e6, max = 1e6, step = 1))
 71 | }
 72 | 
 73 | set_normal_sd <- function(variant, id_type) {
 74 |     return(numericInput(inputId = get_input_id("sd", variant, id_type)
 75 |                         , label = "Observed Standard Deviation:"
 76 |                         , value = 1, min = 0, max = 1e6, step = 1))
 77 | }
 78 | 
 79 | choose_poisson_lambda <- function(variant, id_type) {
 80 |     return(numericInput(inputId = get_input_id("expected_lambda", variant, id_type)
 81 |                         , label = "Expected Number of Events in One Session:"
 82 |                         , value = 1, min = 0, max = 1e3, step = 1))
 83 | }
 84 | 
 85 | choose_poisson_max <- function(variant, id_type) {
 86 |     return(numericInput(inputId = get_input_id("max_lambda", variant, id_type)
 87 |                         , label = "Maximum Possible Events in One Session:"
 88 |                         , value = 3, min = 0, max = 1e3, step = 1))
 89 | }
 90 | 
 91 | set_poisson_num <- function(variant, id_type) {
 92 |     return(numericInput(inputId = get_input_id("num", variant, id_type)
 93 |                         , label = "Number of Sessions:"
 94 |                         , value = 0, min = 0, max = 1e5, step = 1))
 95 | }
 96 | 
 97 | set_poisson_total <- function(variant, id_type) {
 98 |     return(numericInput(inputId = get_input_id("total", variant, id_type)
 99 |                         , label = "Observed Number of Events Across All Sessions:"
100 |                         , value = 0, min = 0, max = 1e8, step = 1))
101 | }
102 | 


--------------------------------------------------------------------------------
/man/ab_arguments.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/helpers.R
 3 | \name{ab_arguments}
 4 | \alias{ab_arguments}
 5 | \title{Bayesian A/B Arguments}
 6 | \arguments{
 7 | \item{data_dists}{A named list of distribution objects. This list specifies the
 8 | distributions that are used to generate data in simulations.
 9 | Currently, the list must only have elements named \code{'a'}
10 | and \code{'b'}. See Details for more information.}
11 | 
12 | \item{priors}{A named list of distribution objects. This list specifies the
13 | distributions that are used as priors when estimating some parameter
14 | from the data generating distribution. Currently, the list must only
15 | have elements named \code{'a'} and \code{'b'}. See Details for more information.}
16 | 
17 | \item{loss_threshold}{A positive number that identifies a bound for the expected
18 | loss for each variant. Once the expected loss is below this
19 | bound, the experiment is concluded.}
20 | 
21 | \item{obs_per_round}{A positive number that represents how many observations, across both variants,
22 | are generated before we update the prior distributions and
23 | evaluate the expected loss. This number must be divisible by
24 | the number of variants used. Default is \code{1000}.}
25 | 
26 | \item{max_rounds}{A positive integer that specifies the maximum number of
27 | times that we will evaluate the expected loss on both experiments.
28 | Default is \code{100}.}
29 | 
30 | \item{sim_batch_size}{A positive integer that specifies how much data is simulated
31 | when evaluating the expected loss for variants that do
32 | not have an analytic solution (i.e. normal data).}
33 | }
34 | \description{
35 | Provide documentation for arguments commonly used in this package
36 | }
37 | 


--------------------------------------------------------------------------------
/man/approx_solver.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/construct_priors.R
 3 | \name{approx_solver}
 4 | \alias{approx_solver}
 5 | \title{find_percentile}
 6 | \usage{
 7 | approx_solver(mu, bound, desired_support, cdf_fun, lower_sigma, upper_sigma,
 8 |   lower.tail = TRUE, tolerance = 0.01)
 9 | }
10 | \arguments{
11 | \item{mu}{The expected value of x (see details)}
12 | 
13 | \item{bound}{The desired maximum of the distribution}
14 | 
15 | \item{desired_support}{The amount of probability mass to be more extreme than \code{bound}}
16 | 
17 | \item{cdf_fun}{A CDF}
18 | 
19 | \item{lower_sigma}{A guess of the lower bound on the standard deviation}
20 | 
21 | \item{upper_sigma}{A guess of the upper bound on the standard deviation}
22 | 
23 | \item{lower.tail}{A logical that indicates whether to find the area of the upper or lower tail.
24 | Default is \code{TRUE}.}
25 | 
26 | \item{tolerance}{Return \code{NULL}, if we cannot find a value that gives support
27 | within \code{tolerance} of \code{desired_support}}
28 | }
29 | \value{
30 | The optimal standard deviation
31 | }
32 | \description{
33 | Determine the standard deviation of a distribution that will provide
34 |              the desired amount of support at a certain bound.
35 | }
36 | 


--------------------------------------------------------------------------------
/man/b_gt_a.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/metrics.R
 3 | \name{b_gt_a}
 4 | \alias{b_gt_a}
 5 | \title{Probability Variant B is Greater Than Variant A}
 6 | \usage{
 7 | b_gt_a(dist_a, dist_b, theta_a, theta_b, exact = NULL, ...)
 8 | }
 9 | \arguments{
10 | \item{dist_a}{Some distribution object (see examples)}
11 | 
12 | \item{dist_b}{Some distribution object (see examples)}
13 | 
14 | \item{theta_a}{A vector of simulated values from \code{dist_a}}
15 | 
16 | \item{theta_b}{A vector of simulated values from \code{dist_b}}
17 | 
18 | \item{exact}{A boolean that indicates whether the calculation should be approximated
19 | using the normal distribution. Default is \code{NULL}, which means
20 | that it will use the normal distribution if there is sufficient data.}
21 | 
22 | \item{...}{Arguments to be passed onto other methods}
23 | }
24 | \value{
25 | A numeric value
26 | }
27 | \description{
28 | Given two distributions, find the probability that the expected
29 |              value of variant B is greater than the expected value of variant A
30 | }
31 | 


--------------------------------------------------------------------------------
/man/bernoulli_dist.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/create_dists.R
 3 | \name{bernoulli_dist}
 4 | \alias{bernoulli_dist}
 5 | \title{Bernoulli Distribution}
 6 | \usage{
 7 | bernoulli_dist(rate)
 8 | }
 9 | \arguments{
10 | \item{rate}{The rate of the bernoulli distribution (between 0 and 1)}
11 | }
12 | \value{
13 | A \code{bernoulli_dist} object
14 | }
15 | \description{
16 | Create a bernoulli distribution object, used as the data generating
17 |              distribution in tests where the metric is a probability of an
18 |              event happening.
19 | }
20 | 


--------------------------------------------------------------------------------
/man/beta_cdf.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/construct_priors.R
 3 | \name{beta_cdf}
 4 | \alias{beta_cdf}
 5 | \title{CDF of Parameterized Beta Distribution}
 6 | \usage{
 7 | beta_cdf(mu, sigma, bound, lower.tail = TRUE)
 8 | }
 9 | \arguments{
10 | \item{mu}{The expected value of x (see details)}
11 | 
12 | \item{sigma}{The standard deviation of x (see details)}
13 | 
14 | \item{bound}{The quantile of the distribution of interest}
15 | 
16 | \item{lower.tail}{A logical that indicates whether to find the area of the upper or lower tail.
17 | Default is \code{TRUE}.}
18 | }
19 | \value{
20 | The area under the desired tail.
21 | }
22 | \description{
23 | Calculate the area to the left or right of \code{bound} for
24 |              a beta distribution parameterized by the expected value and
25 |              standard deviation.
26 | }
27 | \details{
28 | If x ~ Beta(A, B), the \code{mu} = E[x] and \code{sigma^2} = Var(x)
29 | }
30 | 


--------------------------------------------------------------------------------
/man/beta_dist.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/create_dists.R
 3 | \name{beta_dist}
 4 | \alias{beta_dist}
 5 | \title{Beta Distribution}
 6 | \usage{
 7 | beta_dist(alpha, beta)
 8 | }
 9 | \arguments{
10 | \item{alpha}{The non-negative alpha parameter of the beta distribution}
11 | 
12 | \item{beta}{The non-negative beta parameter of the beta distribution}
13 | }
14 | \value{
15 | An object of class \code{'beta'}.
16 | }
17 | \description{
18 | Create a beta distribution object, used as the prior for metrics that
19 |              have a bernoulli distribution.
20 | }
21 | 


--------------------------------------------------------------------------------
/man/calc_beta_dist.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/construct_priors.R
 3 | \name{calc_beta_dist}
 4 | \alias{calc_beta_dist}
 5 | \title{Calculate Parameters For Beta Distribution}
 6 | \usage{
 7 | calc_beta_dist(mu, sigma)
 8 | }
 9 | \arguments{
10 | \item{mu}{The expected value of x (see details)}
11 | 
12 | \item{sigma}{The standard deviation of x (see details)}
13 | }
14 | \value{
15 | An object of class \code{beta_dist}
16 | }
17 | \description{
18 | Calculate the parameters for a beta distribution parameterized
19 |              by the expected value and standard deviation.
20 | }
21 | \details{
22 | If x ~ Beta(alpha, beta), then \code{mu} = E[x] and \code{sigma^2} = Var(x)
23 | }
24 | 


--------------------------------------------------------------------------------
/man/calc_gamma_dist.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/construct_priors.R
 3 | \name{calc_gamma_dist}
 4 | \alias{calc_gamma_dist}
 5 | \title{Calculate Parameters For Gamma Distribution}
 6 | \usage{
 7 | calc_gamma_dist(mu, sigma)
 8 | }
 9 | \arguments{
10 | \item{mu}{The expected value of x (see details)}
11 | 
12 | \item{sigma}{The standard deviation of x (see details)}
13 | }
14 | \value{
15 | A named list
16 | }
17 | \description{
18 | This function determines the parameters for a gamma distribution
19 |              from the desired mean
20 | }
21 | \details{
22 | If x ~ Gamma(alpha, beta), then \code{mu} = E[x] and \code{sigma^2} = Var(x)
23 | }
24 | 


--------------------------------------------------------------------------------
/man/calc_normal_gamma_dist.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/construct_priors.R
 3 | \name{calc_normal_gamma_dist}
 4 | \alias{calc_normal_gamma_dist}
 5 | \title{Calculate Parameters For Normal Gamma Distribution}
 6 | \usage{
 7 | calc_normal_gamma_dist(mu, tau, sigma_mu, sigma_tau)
 8 | }
 9 | \arguments{
10 | \item{mu}{The expected value of x (see details)}
11 | 
12 | \item{tau}{The expected value of T (see details)}
13 | 
14 | \item{sigma_mu}{The standard deviation of x (see details)}
15 | 
16 | \item{sigma_tau}{The standard deviation of T (see details)}
17 | }
18 | \value{
19 | An object of class \code{normal_gamma_dist}
20 | }
21 | \description{
22 | Calculate the parameters for a normal gamma distribution parameterized
23 |              by the expected value and standard deviation.
24 | }
25 | \details{
26 | If (x, T) ~ NormalGamma(mu, lambda, alpha, beta), then \code{mu} = E[x],
27 |          \code{tau} = E[T], \code{sigma_mu^2} = Var(x), and \code{sigma_tau^2} = Var(T)
28 | }
29 | 


--------------------------------------------------------------------------------
/man/create_empty_dt.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/helpers.R
 3 | \name{create_empty_dt}
 4 | \alias{create_empty_dt}
 5 | \title{Create an Empty Data Table}
 6 | \usage{
 7 | create_empty_dt(num_rows, column_names, fill_value = -1)
 8 | }
 9 | \arguments{
10 | \item{num_rows}{A positive integer that specifies the number of rows for the data table}
11 | 
12 | \item{column_names}{A character vector that specifies the number and names of the columns.}
13 | 
14 | \item{fill_value}{Default is -1. Specifies what value should fill the data table}
15 | }
16 | \value{
17 | A data.table
18 | }
19 | \description{
20 | Create an empty data table with a specified number of rows and columns.
21 |              You can also choose the value that fills the data table.
22 | }
23 | 


--------------------------------------------------------------------------------
/man/expected_loss_b.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/metrics.R
 3 | \name{expected_loss_b}
 4 | \alias{expected_loss_b}
 5 | \title{Expected Loss of Choosing Variant B}
 6 | \usage{
 7 | expected_loss_b(dist_a, dist_b, theta_a, theta_b, method = c("absolute",
 8 |   "percent"), ...)
 9 | }
10 | \arguments{
11 | \item{dist_a}{The distribution object for variant A}
12 | 
13 | \item{dist_b}{The distribution object for variant B}
14 | 
15 | \item{theta_a}{A vector of simulated values from \code{dist_a}}
16 | 
17 | \item{theta_b}{A vector of simulated values from \code{dist_b}}
18 | 
19 | \item{method}{One of \code{'absolute'} or \code{'percent'} that indicates
20 | whether the loss function takes the absolute difference
21 | or the percent difference between \code{theta_a} and \code{theta_b}}
22 | 
23 | \item{...}{Arguments to be passed onto other methods}
24 | }
25 | \value{
26 | A numeric
27 | }
28 | \description{
29 | This function calculates the expected loss of choosing variant B
30 | }
31 | 


--------------------------------------------------------------------------------
/man/gamma_cdf.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/construct_priors.R
 3 | \name{gamma_cdf}
 4 | \alias{gamma_cdf}
 5 | \title{CDF of Parameterized Gamma Distribution}
 6 | \usage{
 7 | gamma_cdf(mu, sigma, bound, lower.tail = TRUE)
 8 | }
 9 | \arguments{
10 | \item{mu}{The expected value of x (see details)}
11 | 
12 | \item{sigma}{The standard deviation of x (see details)}
13 | 
14 | \item{bound}{The quantile of the distribution of interest}
15 | 
16 | \item{lower.tail}{A logical that indicates whether to find the area of the upper or lower tail.
17 | Default is \code{TRUE}.}
18 | }
19 | \value{
20 | The area under the desired tail.
21 | }
22 | \description{
23 | Calculate the area to the left or right of \code{bound} for
24 |              a gamma distribution parameterized by the expected value and
25 |              standard deviation.
26 | }
27 | \details{
28 | If x ~ Gamma(alpha, beta), then \code{mu} = E[x] and \code{sigma^2} = Var(x)
29 | }
30 | 


--------------------------------------------------------------------------------
/man/gamma_dist.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/create_dists.R
 3 | \name{gamma_dist}
 4 | \alias{gamma_dist}
 5 | \title{Gamma Distribution}
 6 | \usage{
 7 | gamma_dist(alpha, beta)
 8 | }
 9 | \arguments{
10 | \item{alpha}{The non-negative alpha (shape) parameter of the gamma distribution}
11 | 
12 | \item{beta}{The non-negative beta (rate) parameter of the gamma distribution}
13 | }
14 | \value{
15 | A normal_unknown dist object
16 | }
17 | \description{
18 | Create a gamma distribution object, used as the prior for
19 |              metrics that have a poisson distribution.
20 | }
21 | 


--------------------------------------------------------------------------------
/man/get_actual_loss.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/metrics.R
 3 | \name{get_actual_loss}
 4 | \alias{get_actual_loss}
 5 | \title{Actual Loss of an Experiment}
 6 | \usage{
 7 | get_actual_loss(data_dists, selected_variant)
 8 | }
 9 | \arguments{
10 | \item{data_dists}{A named list of distribution objects. This list specifies the
11 | distributions that are used to generate data in simulations.
12 | Currently, the list must only have elements named \code{'a'}
13 | and \code{'b'}. See Details for more information.}
14 | 
15 | \item{selected_variant}{A string that identifies the winning variant.}
16 | }
17 | \value{
18 | Numeric
19 | }
20 | \description{
21 | Once the experiment has concluded, measure the loss of the decision
22 | }
23 | 


--------------------------------------------------------------------------------
/man/get_data_dists.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/get_data_dists.R
 3 | \name{get_data_dists}
 4 | \alias{get_data_dists}
 5 | \title{Sample Data Distributions}
 6 | \usage{
 7 | get_data_dists(dist, n, num_variants, ...)
 8 | }
 9 | \arguments{
10 | \item{dist}{A prior distribution}
11 | 
12 | \item{n}{The number of distributions to sample}
13 | 
14 | \item{num_variants}{The number of variants to use}
15 | 
16 | \item{...}{Arguments to be used by other methods}
17 | }
18 | \value{
19 | A list of data generating distributions for multiple variants
20 | }
21 | \description{
22 | Given a prior distribution, sample many data generating distributions
23 | }
24 | 


--------------------------------------------------------------------------------
/man/get_losses.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/metrics.R
 3 | \name{get_losses}
 4 | \alias{get_losses}
 5 | \title{Expected Losses For Variants}
 6 | \usage{
 7 | get_losses(posteriors, sim_batch_size, method = c("absolute", "percent"))
 8 | }
 9 | \arguments{
10 | \item{posteriors}{A list of distribution objects that identify the posterior distributions
11 | for each variant}
12 | 
13 | \item{sim_batch_size}{A positive integer that specifies how much data is simulated
14 | when evaluating the expected loss for variants that do
15 | not have an analytic solution (i.e. normal data).}
16 | 
17 | \item{method}{One of \code{'absolute'} or \code{'percent'} that indicates
18 | whether the loss function takes the absolute difference
19 | or the percent difference between \code{theta_a} and \code{theta_b}}
20 | }
21 | \value{
22 | A list of expected losses for each variant
23 | }
24 | \description{
25 | Evaluate the expected loss for each variant
26 | }
27 | 


--------------------------------------------------------------------------------
/man/get_metrics.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/metrics.R
 3 | \name{get_metrics}
 4 | \alias{get_metrics}
 5 | \title{Get Bayesian A/B Testing Metrics}
 6 | \usage{
 7 | get_metrics(posteriors, sim_batch_size, method = c("absolute", "percent"))
 8 | }
 9 | \arguments{
10 | \item{posteriors}{A list of distribution objects}
11 | 
12 | \item{sim_batch_size}{How many observations of data to simulate}
13 | 
14 | \item{method}{What type of loss to calculate?}
15 | }
16 | \value{
17 | A named list with the metrics
18 | }
19 | \description{
20 | Calculate various metrics
21 | }
22 | 


--------------------------------------------------------------------------------
/man/get_supported_beta.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/construct_priors.R
 3 | \name{get_supported_beta}
 4 | \alias{get_supported_beta}
 5 | \title{Get Beta Distribution With Desired Support}
 6 | \usage{
 7 | get_supported_beta(mu, bound, desired_support = 0.05)
 8 | }
 9 | \arguments{
10 | \item{mu}{The expected value of x (see details)}
11 | 
12 | \item{bound}{The desired maximum of the distribution}
13 | 
14 | \item{desired_support}{The amount of probability mass to be more extreme than \code{bound}}
15 | }
16 | \value{
17 | An object of class \code{beta_dist}
18 | }
19 | \description{
20 | Determines the parameters for a beta distribution given an expected value
21 |              and desired support at some bound.
22 | }
23 | \details{
24 | If x ~ Beta(alpha, beta), then \code{mu} = E[x] and \code{sigma^2} = Var(x)
25 | }
26 | 


--------------------------------------------------------------------------------
/man/get_supported_gamma.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/construct_priors.R
 3 | \name{get_supported_gamma}
 4 | \alias{get_supported_gamma}
 5 | \title{Get parameters of gamma distribution from expected and maximum rate}
 6 | \usage{
 7 | get_supported_gamma(mu, bound, desired_support = 0.05, lower.tail = FALSE)
 8 | }
 9 | \arguments{
10 | \item{mu}{The expected value of x (see details)}
11 | 
12 | \item{bound}{The desired maximum of the distribution}
13 | 
14 | \item{desired_support}{The amount of probability mass to be more extreme than \code{bound}}
15 | 
16 | \item{lower.tail}{A boolean that indicates whether we want to get the support on the upper or lower tail}
17 | }
18 | \value{
19 | A \code{gamma_dist} object
20 | }
21 | \description{
22 | This function determines the parameters for a gamma distribution
23 | }
24 | \details{
25 | If x ~ Gamma(alpha, beta), then \code{mu} = E[x] and \code{sigma^2} = Var(x)
26 | }
27 | 


--------------------------------------------------------------------------------
/man/get_supported_normal_gamma.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/construct_priors.R
 3 | \name{get_supported_normal_gamma}
 4 | \alias{get_supported_normal_gamma}
 5 | \title{Get Normal Gamma Distribution With Desired Support}
 6 | \usage{
 7 | get_supported_normal_gamma(mu, bound_mu, sigma, bound_sigma,
 8 |   desired_support = 0.05)
 9 | }
10 | \arguments{
11 | \item{mu}{The expected value of x (see details)}
12 | 
13 | \item{bound_mu}{An upper bound on the likely value of x (see details)}
14 | 
15 | \item{sigma}{The standard deviation of x (see details)}
16 | 
17 | \item{bound_sigma}{The upper bound on the likely value of T (see details)}
18 | 
19 | \item{desired_support}{The amount of probability mass to be more extreme than \code{bound}}
20 | }
21 | \value{
22 | An object of class \code{normal_gamma_dist}
23 | }
24 | \description{
25 | Determines the parameters for a normal gamma distribution given the expected values
26 |              of the distribution and bounds on their maximum values
27 | }
28 | \details{
29 | If (x, T) ~ NormalGamma(mu, lambda, alpha, beta), then \code{mu} = E[x],
30 |          \code{tau} = E[T], \code{sigma_mu^2} = Var(x), and \code{sigma_tau^2} = Var(T)
31 | }
32 | 


--------------------------------------------------------------------------------
/man/investigate_simulations.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/simulation_study.R
 3 | \name{investigate_simulations}
 4 | \alias{investigate_simulations}
 5 | \title{Investigate Properties of Bayesian A/B Testing}
 6 | \usage{
 7 | investigate_simulations(num_sims, priors, loss_threshold, data_dists = NULL,
 8 |   sampling_distribution = NULL, obs_per_round = 1000, max_rounds = 100,
 9 |   sim_batch_size = 1e+05, num_cores = NULL)
10 | }
11 | \arguments{
12 | \item{num_sims}{A positive integer that specifies how many simulations to perform.}
13 | 
14 | \item{priors}{A named list of distribution objects. This list specifies the
15 | distributions that are used as priors when estimating some parameter
16 | from the data generating distribution. Currently, the list must only
17 | have elements named \code{'a'} and \code{'b'}. See Details for more information.}
18 | 
19 | \item{loss_threshold}{A positive number that identifies a bound for the expected
20 | loss for each variant. Once the expected loss is below this
21 | bound, the experiment is concluded.}
22 | 
23 | \item{data_dists}{A named list of distribution objects. This list specifies the
24 | distributions that are used to generate data in simulations.
25 | Currently, the list must only have elements named \code{'a'}
26 | and \code{'b'}. See Details for more information.}
27 | 
28 | \item{sampling_distribution}{An list of distribution objects that specifies how the data generating
29 | distributions should be created.}
30 | 
31 | \item{obs_per_round}{A positive number that represents how many observations, across both variants,
32 | are generated before we update the prior distributions and
33 | evaluate the expected loss. This number must be divisible by
34 | the number of variants used. Default is \code{1000}.}
35 | 
36 | \item{max_rounds}{A positive integer that specifies the maximum number of
37 | times that we will evaluate the expected loss on both experiments.
38 | Default is \code{100}.}
39 | 
40 | \item{sim_batch_size}{A positive integer that specifies how much data is simulated
41 | when evaluating the expected loss for variants that do
42 | not have an analytic solution (i.e. normal data).}
43 | 
44 | \item{num_cores}{How many cores to use in the parallelization of tests. Default is \code{NULL}, which
45 | means no parallelization.}
46 | }
47 | \value{
48 | A list containing two data.tables: one with summary statistics for each simulation
49 |         and one with the averages over all of the simulations.
50 | }
51 | \description{
52 | Run multiple simulations of an A/B test in order to evaluate various properties
53 | }
54 | 


--------------------------------------------------------------------------------
/man/normal_dist.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/create_dists.R
 3 | \name{normal_dist}
 4 | \alias{normal_dist}
 5 | \title{Normal Distribution}
 6 | \usage{
 7 | normal_dist(mu, sigma)
 8 | }
 9 | \arguments{
10 | \item{mu}{The mean of the normal distribution}
11 | 
12 | \item{sigma}{The non-negative standard deviation of the normal distribution}
13 | }
14 | \value{
15 | A \code{normal_dist} object
16 | }
17 | \description{
18 | Create a normal distribution object, used as the data generating
19 |              distribution in tests where the metric is a continuous value.
20 | }
21 | 


--------------------------------------------------------------------------------
/man/normal_gamma_dist.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/create_dists.R
 3 | \name{normal_gamma_dist}
 4 | \alias{normal_gamma_dist}
 5 | \title{Normal-Gamma Distribution}
 6 | \usage{
 7 | normal_gamma_dist(mu, lambda, alpha, beta)
 8 | }
 9 | \arguments{
10 | \item{mu}{The mean of the normal-gamma distribution}
11 | 
12 | \item{lambda}{The non-negative lambda parameter of the normal-gamma distribution}
13 | 
14 | \item{alpha}{The non-negative alpha parameter of the normal-gamma distribution}
15 | 
16 | \item{beta}{The non-negative beta parameter of the normal-gamma distribution}
17 | }
18 | \value{
19 | A normal_unknown dist object
20 | }
21 | \description{
22 | Create a normal_gamma distribution object, used as the prior for
23 |              metrics that have a normal distribution.
24 | }
25 | 


--------------------------------------------------------------------------------
/man/plot_beta.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/plotting.R
 3 | \name{plot_beta}
 4 | \alias{plot_beta}
 5 | \title{Plot Beta Distributions}
 6 | \usage{
 7 | plot_beta(betas, title = "Beta Distribution",
 8 |   xlab = "Rate that the Event Occurs", ylab = "Density of that Rate",
 9 |   color = "#f65335", support_level = 0.99)
10 | }
11 | \arguments{
12 | \item{betas}{A list of lists of beta distributions}
13 | 
14 | \item{title}{The title of the plot}
15 | 
16 | \item{xlab}{The title of the x axis}
17 | 
18 | \item{ylab}{The title of the y axis}
19 | 
20 | \item{color}{The color for the plot}
21 | 
22 | \item{support_level}{The desired amount of area between the lower and upper bounds. Default is \code{0.99}.}
23 | }
24 | \value{
25 | NULL. A plot is generated
26 | }
27 | \description{
28 | This function plots the densities of multiple beta distributions
29 | }
30 | 


--------------------------------------------------------------------------------
/man/plot_gamma.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/plotting.R
 3 | \name{plot_gamma}
 4 | \alias{plot_gamma}
 5 | \title{Plot Gamma Distributions}
 6 | \usage{
 7 | plot_gamma(gammas, title = "Density of Gamma Distribution",
 8 |   color = "#f65335", support_level = 0.99)
 9 | }
10 | \arguments{
11 | \item{gammas}{A list of lists of gamma distributions}
12 | 
13 | \item{title}{The title of the plot}
14 | 
15 | \item{color}{The color for the plot}
16 | 
17 | \item{support_level}{The desired amount of area between the lower and upper bounds. Default is \code{0.99}.}
18 | }
19 | \value{
20 | NULL. A plot is generated
21 | }
22 | \description{
23 | This function plots the densities of multiple gamma distributions
24 | }
25 | 


--------------------------------------------------------------------------------
/man/plot_normal.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/plotting.R
 3 | \name{plot_normal}
 4 | \alias{plot_normal}
 5 | \title{Plot Normal Distributions}
 6 | \usage{
 7 | plot_normal(normals, title = "Normal Distribution", color = "#f65335",
 8 |   support_level = 0.99)
 9 | }
10 | \arguments{
11 | \item{normals}{A list of lists of normal distributions}
12 | 
13 | \item{title}{The title of the plot}
14 | 
15 | \item{color}{The color for the plot}
16 | 
17 | \item{support_level}{The desired amount of area between the lower and upper bounds. Default is \code{0.99}.}
18 | }
19 | \value{
20 | NULL. A plot is generated
21 | }
22 | \description{
23 | This function allows you to visualize the densities of a normal distribution
24 | }
25 | 


--------------------------------------------------------------------------------
/man/plot_relative_gain.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/plotting.R
 3 | \name{plot_relative_gain}
 4 | \alias{plot_relative_gain}
 5 | \title{Plot Relative Gain}
 6 | \usage{
 7 | plot_relative_gain(dists, sim_batch_size = 1e+05,
 8 |   title = "Cumulative Density of B / A")
 9 | }
10 | \arguments{
11 | \item{dists}{A list of distribution objects, with elements named \code{'a'} and \code{'b'}}
12 | 
13 | \item{sim_batch_size}{The number of objects to simulate}
14 | }
15 | \value{
16 | A plot
17 | }
18 | \description{
19 | Plot the cumulative density of the ratio of the metric under variant B to the metric under variant A
20 | }
21 | 


--------------------------------------------------------------------------------
/man/poisson_dist.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/create_dists.R
 3 | \name{poisson_dist}
 4 | \alias{poisson_dist}
 5 | \title{Poisson Distribution}
 6 | \usage{
 7 | poisson_dist(rate)
 8 | }
 9 | \arguments{
10 | \item{rate}{The non-negative mean of the poisson distribution}
11 | }
12 | \value{
13 | A \code{poisson_dist} object
14 | }
15 | \description{
16 | Create a poisson distribution object, used as the data generating
17 |              distribution in tests where the metric is a count of an event happening.
18 | }
19 | 


--------------------------------------------------------------------------------
/man/sim_effect_size.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/metrics.R
 3 | \name{sim_effect_size}
 4 | \alias{sim_effect_size}
 5 | \title{Simulate the effect size of variant B - variant A}
 6 | \usage{
 7 | sim_effect_size(theta_a, theta_b)
 8 | }
 9 | \arguments{
10 | \item{theta_a}{A vector of draws from the posterior of a}
11 | 
12 | \item{theta_b}{A vector of draws from the posterior of b}
13 | }
14 | \value{
15 | A list containing the 2.5%, mean, and 97.5% of the effect size
16 | }
17 | \description{
18 | This function simulates the effect size (B - A)
19 | }
20 | 


--------------------------------------------------------------------------------
/man/simulate_ab_test.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/run_simulation.R
 3 | \name{simulate_ab_test}
 4 | \alias{simulate_ab_test}
 5 | \title{Simulate a Bayesian A/B Test}
 6 | \usage{
 7 | simulate_ab_test(data_dists, priors, loss_threshold, obs_per_round = 1000,
 8 |   max_rounds = 100, sim_batch_size = 1e+05)
 9 | }
10 | \arguments{
11 | \item{data_dists}{A named list of distribution objects. This list specifies the
12 | distributions that are used to generate data in simulations.
13 | Currently, the list must only have elements named \code{'a'}
14 | and \code{'b'}. See Details for more information.}
15 | 
16 | \item{priors}{A named list of distribution objects. This list specifies the
17 | distributions that are used as priors when estimating some parameter
18 | from the data generating distribution. Currently, the list must only
19 | have elements named \code{'a'} and \code{'b'}. See Details for more information.}
20 | 
21 | \item{loss_threshold}{A positive number that identifies a bound for the expected
22 | loss for each variant. Once the expected loss is below this
23 | bound, the experiment is concluded.}
24 | 
25 | \item{obs_per_round}{A positive number that represents how many observations, across both variants,
26 | are generated before we update the prior distributions and
27 | evaluate the expected loss. This number must be divisible by
28 | the number of variants used. Default is \code{1000}.}
29 | 
30 | \item{max_rounds}{A positive integer that specifies the maximum number of
31 | times that we will evaluate the expected loss on both experiments.
32 | Default is \code{100}.}
33 | 
34 | \item{sim_batch_size}{A positive integer that specifies how much data is simulated
35 | when evaluating the expected loss for variants that do
36 | not have an analytic solution (i.e. normal data).}
37 | }
38 | \value{
39 | A list that contains the name of the winning variant, the number of observations used,
40 |         the loss of the decision, whether the test finished, the metrics from each round, and
41 |         the raw data.
42 | }
43 | \description{
44 | Given true data generating distributions and prior distributions for
45 |              variants A and B, simulate the data, calculate the necessary statistics
46 |              and declare one of the tests a winner.
47 | }
48 | \details{
49 | In order to create \code{data_dists} and \code{priors}, you need to use the following
50 |          for data generating distributions: \code{\link{bernoulli_dist}},
51 |          \code{\link{normal_dist}}, \code{\link{poisson_dist}} and the following for prior
52 |          distributions \code{\link{beta_dist}}, \code{\link{normal_gamma_dist}},
53 |          \code{\link{beta_dist}},
54 | }
55 | \seealso{
56 | \code{\link{bernoulli_dist}} \code{\link{normal_dist}} \code{\link{poisson_dist}}
57 |          \code{\link{beta_dist}} \code{\link{normal_gamma_dist}} \code{\link{beta_dist}}
58 | }
59 | 


--------------------------------------------------------------------------------
/man/simulate_data.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/compute_moments.R, R/simulate_data.R
 3 | \name{simulate_data}
 4 | \alias{simulate_data}
 5 | \alias{compute_moments}
 6 | \alias{simulate_data}
 7 | \title{Simulate Data According to Some Distribution}
 8 | \usage{
 9 | compute_moments(dist)
10 | 
11 | simulate_data(dist, n, ...)
12 | }
13 | \arguments{
14 | \item{dist}{An object of class \code{'beta_dist'}, \code{'normal_gamma_dist'},
15 | \code{'gamma_dist'} that specifies the parameters of some distribution}
16 | 
17 | \item{n}{How many data points to simulate}
18 | 
19 | \item{...}{Arguments to be passed onto other methods}
20 | 
21 | \item{dist}{An object of class \code{'beta_dist'}, \code{'normal_gamma_dist'},
22 | \code{'gamma_dist'}, \code{'bernouilli_dist'}, \code{'normal_dist'},
23 | or \code{'poisson_dist'} that specifies the parameters of some distribution}
24 | }
25 | \value{
26 | A list
27 | 
28 | A vector of simulated data.
29 | }
30 | \description{
31 | Simulate a vector of data from a given distribution object.
32 | 
33 | Simulate a vector of data from a given distribution object.
34 | }
35 | 


--------------------------------------------------------------------------------
/man/single_loss_stop.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/stopping_functions.R
 3 | \name{single_loss_stop}
 4 | \alias{single_loss_stop}
 5 | \title{Decide whether to stop the experiment using the last loss values}
 6 | \usage{
 7 | single_loss_stop(losses, loss_threshold)
 8 | }
 9 | \arguments{
10 | \item{losses}{A named vector of the losses for each variant.}
11 | 
12 | \item{loss_threshold}{Choose a variant once the loss goes beneath this value.}
13 | }
14 | \value{
15 | A list containg two named elements: \code{'stop'} (a boolean) and
16 |         \code{'winner'}, which is \code{NULL} if \code{'stop'} is \code{FALSE}
17 |         else it is the name of the winning variant
18 | }
19 | \description{
20 | This function decides whether or not to stop the experiment and
21 |              declare one of the variants as the winner.
22 | }
23 | 


--------------------------------------------------------------------------------
/man/update_prior.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/update_priors.R
 3 | \name{update_prior}
 4 | \alias{update_prior}
 5 | \title{Update Prior Parameters}
 6 | \usage{
 7 | update_prior(prior, evidence = NULL, stats = NULL, ...)
 8 | }
 9 | \arguments{
10 | \item{prior}{An object of class \code{'beta_dist'}, \code{'normal_gamma_dist'}
11 | , or \code{'gamma_dist'} that specifies the parameters of some distribution}
12 | 
13 | \item{evidence}{A numeric vector that contains observed data. Default is \code{NULL},
14 | will override \code{stats} if specified.}
15 | 
16 | \item{stats}{An object of class \code{'beta_stats'}, \code{'normal_gamma_stats'}, or
17 | \code{'gamma_stats'} that contains sufficient statistics for the
18 | update. Default is \code{NULL}, will be ignored if \code{evidence}
19 | is specified.}
20 | 
21 | \item{...}{Arguments to be passed onto other methods}
22 | }
23 | \description{
24 | Use observed data to update parameters of prior distribution
25 | }
26 | 


--------------------------------------------------------------------------------
/man/update_priors.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/update_priors.R
 3 | \name{update_priors}
 4 | \alias{update_priors}
 5 | \title{Update the hyper-parameters of prior distributions}
 6 | \usage{
 7 | update_priors(priors, evidence_dt = NULL, stats_dt = NULL)
 8 | }
 9 | \arguments{
10 | \item{priors}{A named list of distribution objects. This list specifies the
11 | distributions that are used as priors when estimating some parameter
12 | from the data generating distribution. Currently, the list must only
13 | have elements named \code{'a'} and \code{'b'}. See Details for more information.}
14 | 
15 | \item{evidence_dt}{A data.table containing the raw data generated by each variant.
16 | Column names must be identical to variant names in \code{priors}}
17 | 
18 | \item{stats_dt}{A data.table containing statistics about the raw data generated by each
19 | variant.}
20 | }
21 | \value{
22 | A list of distribution objects that represented the updated prior distributions
23 | }
24 | \description{
25 | This function updates the hyper-parameters of our prior distributions
26 | }
27 | 


--------------------------------------------------------------------------------
/man/validate_dt.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/update_priors.R
 3 | \name{validate_dt}
 4 | \alias{validate_dt}
 5 | \title{Check That Data Table Matches Expectations}
 6 | \usage{
 7 | validate_dt(dt, expected_cols)
 8 | }
 9 | \arguments{
10 | \item{dt}{A data.table}
11 | 
12 | \item{expected_cols}{A vector of column names that \code{dt} must have}
13 | }
14 | \value{
15 | \code{NULL} if successful, else a fatal error
16 | }
17 | \description{
18 | Use this function to assert certain properties of a data.table
19 | }
20 | 


--------------------------------------------------------------------------------
/man/validate_inputs.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/run_simulation.R
 3 | \name{validate_inputs}
 4 | \alias{validate_inputs}
 5 | \title{Validate Inputs to Bayesian A/B Simulation}
 6 | \usage{
 7 | validate_inputs(data_dists, priors, obs_per_round)
 8 | }
 9 | \arguments{
10 | \item{data_dists}{A named list of distribution objects. This list specifies the
11 | distributions that are used to generate data in simulations.
12 | Currently, the list must only have elements named \code{'a'}
13 | and \code{'b'}. See Details for more information.}
14 | 
15 | \item{priors}{A named list of distribution objects. This list specifies the
16 | distributions that are used as priors when estimating some parameter
17 | from the data generating distribution. Currently, the list must only
18 | have elements named \code{'a'} and \code{'b'}. See Details for more information.}
19 | 
20 | \item{obs_per_round}{A positive number that represents how many observations, across both variants,
21 | are generated before we update the prior distributions and
22 | evaluate the expected loss. This number must be divisible by
23 | the number of variants used. Default is \code{1000}.}
24 | }
25 | \value{
26 | NULL if all of the tests pass. Else, it will fail loudly.
27 | }
28 | \description{
29 | Check the validity of the parameters being used in the simulation.
30 | }
31 | 


--------------------------------------------------------------------------------
/shiny_app_screen_shot.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/convoyinc/abayes/6e74f8288ad84cf58ffea58dd396a5d2f8ff10e8/shiny_app_screen_shot.png


--------------------------------------------------------------------------------
/tests/testthat.R:
--------------------------------------------------------------------------------
1 | Sys.setenv('R_TESTS' = '')
2 | 
3 | library(abayes)
4 | testthat::test_check('abayes')


--------------------------------------------------------------------------------
/tests/testthat/test-construct_priors.R:
--------------------------------------------------------------------------------
 1 | context('Testing functions that construct priors')
 2 | 
 3 | test_that('calc_beta_dist can get a beta distribution from mu and sigma', {
 4 |     mu <- 0.5
 5 |     sigma <- 0.01
 6 |     dist <- calc_beta_dist(mu = mu, sigma = sigma)
 7 |     expected <- beta_dist(alpha = 1249.5, beta = 1249.5)
 8 |     expect_identical(dist, expected)
 9 | })
10 | 
11 | test_that('beta_cdf calculates the correct integral', {
12 |     mu <- 0.5
13 |     sigma <- 0.01
14 |     bound <- 0.51
15 |     lower.tail <- FALSE
16 |     area <- beta_cdf(mu = mu, sigma = sigma, bound = bound, lower.tail = lower.tail)
17 |     expected <- pbeta(0.51, 1249.5, 1249.5, lower.tail = FALSE)
18 |     expect_identical(area, expected)
19 | })
20 | 
21 | test_that('calc_normal_gamma_dist can get a normal gamma distribution', {
22 |     mu <- 10
23 |     tau <- 2
24 |     sigma_mu <- 2
25 |     sigma_tau <- 0.5
26 |     dist <- calc_normal_gamma_dist(mu = mu, tau = tau, sigma_mu = sigma_mu, sigma_tau = sigma_tau)
27 |     expected <- normal_gamma_dist(mu = 10, lambda = 2/15, alpha = 16, beta = 8)
28 |     expect_identical(dist, expected)
29 | })
30 | 
31 | test_that('calc_gamma_dist can get a gamma distribution from expected value and standard deviation', {
32 |     mu <- 10
33 |     sigma <- 5
34 |     dist <- calc_gamma_dist(mu = mu, sigma = sigma)
35 |     expected <- gamma_dist(alpha = 4, beta = 0.4)
36 |     expect_identical(dist, expected)
37 | })
38 | 
39 | test_that('gamma_cdf calculates the correct integral', {
40 |     mu <- 3
41 |     sigma <- 2
42 |     bound <- 4
43 |     lower.tail <- FALSE
44 |     area <- gamma_cdf(mu = mu, sigma = sigma, bound = bound, lower.tail = lower.tail)
45 |     expected <- pgamma(4, 2.25, 0.75, lower.tail = FALSE)
46 |     expect_identical(area, expected)
47 | })
48 | 
49 | test_that('get_supported_beta returns correct distribution', {
50 |     run_test <- function(mu, bound, desired_support) {
51 |         beta_dist <- get_supported_beta(mu = mu, bound = bound, desired_support = desired_support)
52 |         expect_true(inherits(beta_dist, 'beta_dist'))
53 |         actual_support <- pbeta(bound, beta_dist[['alpha']], beta_dist[['beta']], lower.tail = FALSE)
54 |         expect_true(abs(actual_support - desired_support) < 0.01)
55 |     }
56 |     
57 |     run_test(mu = 0.1, bound = 0.2, desired_support = 0.05)
58 |     run_test(mu = 0.01, bound = 0.011, desired_support = 0.05)
59 |     run_test(mu = 0.3, bound = 0.5, desired_support = 0.05)
60 | })
61 | 
62 | test_that('get_supported_normal_gamma returns correct distribution', {
63 |     run_test <- function(mu, bound_mu, sigma, bound_sigma, desired_support) {
64 |         ng_dist <- get_supported_normal_gamma(mu = mu, bound_mu = bound_mu
65 |                                               , sigma = sigma, bound_sigma = bound_sigma
66 |                                               , desired_support = desired_support)
67 |         expect_true(inherits(ng_dist, 'normal_gamma_dist'))
68 |         sample_tau <- rgamma(n = 1e4, shape = ng_dist[['alpha']], rate = ng_dist[['beta']])
69 |         actual_support <- mean(pnorm(bound_mu
70 |                                      , ng_dist[['mu']]
71 |                                      , sqrt(1 / (ng_dist[['lambda']] * sample_tau))
72 |                                      , lower.tail = FALSE))
73 |         expect_true(abs(actual_support - desired_support) < 0.01)
74 |     }
75 |     
76 |     run_test(mu = 0, bound_mu = 2, sigma = 2, bound_sigma = 4, desired_support = 0.05)
77 |     run_test(mu = 10, bound_mu = 25, sigma = 100, bound_sigma = 150, desired_support = 0.05)
78 | })
79 | 
80 | test_that('get_supported_gamma returns correct distribution', {
81 |     run_test <- function(mu, bound, desired_support) {
82 |         gamma_dist <- get_supported_gamma(mu = mu, bound = bound, desired_support = desired_support)
83 |         expect_true(inherits(gamma_dist, 'gamma_dist'))
84 |         actual_support <- pgamma(bound, gamma_dist[['alpha']], gamma_dist[['beta']], lower.tail = FALSE)
85 |         expect_true(abs(actual_support - desired_support) < 0.01)
86 |     }
87 |     
88 |     run_test(mu = 10, bound = 30, desired_support = 0.05)
89 |     run_test(mu = 25, bound = 30, desired_support = 0.05)
90 | })
91 | 
92 | 


--------------------------------------------------------------------------------
/tests/testthat/test-create_dists.R:
--------------------------------------------------------------------------------
 1 | context('Testing functions that create distribution objects')
 2 | 
 3 | test_that('beta_dist fails on bad inputs', {
 4 |     expect_error(beta_dist(alpha = 'a', beta = 0), regexp = 'must be numeric')
 5 |     expect_error(beta_dist(alpha = -1, beta = 0), regexp = 'must be non-negative')
 6 | })
 7 | 
 8 | test_that('beta_dist creates a valid object', {
 9 |     dist <- beta_dist(alpha = 1, beta = 2)
10 |     expect_equal(class(dist), 'beta_dist')
11 |     expect_equal(names(dist), c('alpha', 'beta'))
12 |     expect_equal(dist[['alpha']], 1)
13 |     expect_equal(dist[['beta']], 2)
14 | })
15 | 
16 | test_that('normal_gamma_dist fails on bad inputs', {
17 |     expect_error(normal_gamma_dist(mu = 'a', lambda = 0, alpha = 1, beta = 2)
18 |                  , regexp = 'must be numeric')
19 |     expect_error(normal_gamma_dist(mu = -1, lambda = -1, alpha = 1, beta = 2)
20 |                  , regexp = 'must be non-negative')
21 | })
22 | 
23 | test_that('normal_gamma_dist creates a valid object', {
24 |     dist <- normal_gamma_dist(mu = 0, lambda = 3, alpha = 1, beta = 2)
25 |     expect_equal(class(dist), 'normal_gamma_dist')
26 |     expect_equal(names(dist), c('mu', 'lambda', 'alpha', 'beta'))
27 |     expect_equal(dist[['mu']], 0)
28 |     expect_equal(dist[['lambda']], 3)
29 |     expect_equal(dist[['alpha']], 1)
30 |     expect_equal(dist[['beta']], 2)
31 | })
32 | 
33 | test_that('gamma_dist fails on bad inputs', {
34 |     expect_error(gamma_dist(alpha = 'a', beta = 0), regexp = 'must be numeric')
35 |     expect_error(gamma_dist(alpha = -1, beta = 0), regexp = 'must be non-negative')
36 | })
37 | 
38 | test_that('gamma_dist creates a valid object', {
39 |     dist <- gamma_dist(alpha = 1, beta = 2)
40 |     expect_equal(class(dist), 'gamma_dist')
41 |     expect_equal(names(dist), c('alpha', 'beta'))
42 |     expect_equal(dist[['alpha']], 1)
43 |     expect_equal(dist[['beta']], 2)
44 | })
45 | 
46 | test_that('bernoulli_dist fails on bad inputs', {
47 |     expect_error(bernoulli_dist(rate = 'a'), regexp = 'must be numeric')
48 |     expect_error(bernoulli_dist(rate = 1.5), regexp = 'must be between 0 and 1')
49 | })
50 | 
51 | test_that('bernoulli_dist creates a valid object', {
52 |     dist <- bernoulli_dist(rate = 0.5)
53 |     expect_equal(class(dist), 'bernoulli_dist')
54 |     expect_equal(names(dist), 'rate')
55 |     expect_equal(dist[['rate']], 0.5)
56 | })
57 | 
58 | test_that('normal_dist fails on bad inputs', {
59 |     expect_error(normal_dist(mu = 'a', sigma = 1), regexp = 'must be numeric')
60 |     expect_error(normal_dist(mu = 1.5, sigma = -1), regexp = 'must be non-negative')
61 | })
62 | 
63 | test_that('normal_dist creates a valid object', {
64 |     dist <- normal_dist(mu = 0, sigma = 1)
65 |     expect_equal(class(dist), 'normal_dist')
66 |     expect_equal(names(dist), c('mu', 'sigma'))
67 |     expect_equal(dist[['mu']], 0)
68 |     expect_equal(dist[['sigma']], 1)
69 | })
70 | 
71 | test_that('poisson_dist fails on bad inputs', {
72 |     expect_error(poisson_dist(rate = 'a'), regexp = 'must be numeric')
73 |     expect_error(poisson_dist(rate = -1), regexp = 'must be non-negative')
74 | })
75 | 
76 | test_that('poisson_dist creates a valid object', {
77 |     dist <- poisson_dist(rate = 5)
78 |     expect_equal(class(dist), 'poisson_dist')
79 |     expect_equal(names(dist), 'rate')
80 |     expect_equal(dist[['rate']], 5)
81 | })
82 | 
83 | 


--------------------------------------------------------------------------------
/tests/testthat/test-get_data_dists.R:
--------------------------------------------------------------------------------
 1 | context('Test ability to generate data distributions')
 2 | 
 3 | test_that('get_data_dists can create a list of distributions', {
 4 |     set.seed(123)
 5 |     sampling_distribution <- beta_dist(alpha = 1, beta = 1)
 6 |     distributions <- get_data_dists(sampling_distribution, n = 3, num_variants = 2)
 7 |     set.seed(123)
 8 |     rates <- rbeta(6, 1, 1)
 9 |     expect_equal(distributions[[1]][['b']][['rate']], rates[4])
10 |     
11 |     set.seed(123)
12 |     sampling_distribution <- normal_gamma_dist(mu = 0, lambda = 1, alpha = 1, beta = 1)
13 |     distributions <- get_data_dists(sampling_distribution, n = 3, num_variants = 2)
14 |     set.seed(123)
15 |     tau <- rgamma(6, 1, 1)
16 |     mu <- rnorm(6, 0, sqrt(1 / tau))
17 |     expect_equal(distributions[[2]][['a']][['mu']], mu[2])
18 |     expect_equal(distributions[[3]][['b']][['sigma']], 1 / sqrt(tau[6]))
19 |     
20 |     set.seed(123)
21 |     sampling_distribution <- gamma_dist(alpha = 1, beta = 1)
22 |     distributions <- get_data_dists(sampling_distribution, n = 3, num_variants = 2)
23 |     set.seed(123)
24 |     rates <- rgamma(6, 1, 1)
25 |     expect_equal(distributions[[1]][['b']][['rate']], rates[4])
26 | })


--------------------------------------------------------------------------------
/tests/testthat/test-metrics.R:
--------------------------------------------------------------------------------
 1 | context('Testing functions that calculate metrics on the posterior distribution work')
 2 | 
 3 | test_that('b_gt_a and expected_loss_b work correctly for beta distribution', {
 4 |     dist_a <- beta_dist(alpha = 3, beta = 5)
 5 |     dist_b <- beta_dist(alpha = 4, beta = 4)
 6 |     prob <- b_gt_a(dist_a = dist_a, dist_b = dist_b)
 7 |     loss <- expected_loss_b(dist_a = dist_a, dist_b = dist_b)
 8 |     theta_a <- simulate_data(dist_a, 1e7)
 9 |     theta_b <- simulate_data(dist_b, 1e7)
10 |     sim_prob <- mean(theta_b > theta_a)
11 |     sim_loss <- mean(pmax(theta_a - theta_b, 0))
12 |     expect_true(abs(prob - sim_prob) < 0.01)
13 |     expect_true(abs(loss - sim_loss) < 0.01)
14 |     
15 |     dist_a <- beta_dist(alpha = 3300, beta = 5000)
16 |     dist_b <- beta_dist(alpha = 3350, beta = 5000)
17 |     prob <- b_gt_a(dist_a = dist_a, dist_b = dist_b)
18 |     loss <- expected_loss_b(dist_a = dist_a, dist_b = dist_b)
19 |     theta_a <- simulate_data(dist_a, 1e7)
20 |     theta_b <- simulate_data(dist_b, 1e7)
21 |     sim_prob <- mean(theta_b > theta_a)
22 |     sim_loss <- mean(pmax(theta_a - theta_b, 0))
23 |     expect_true(abs(prob - sim_prob) < 0.01)
24 |     expect_true(abs(loss - sim_loss) < 0.01)
25 | })
26 | 
27 | test_that('b_gt_a and expected_loss_b work correctly for normal gamma distribution', {
28 |     set.seed(123)
29 |     dist_a <- normal_gamma_dist(mu = 0, lambda = 1, alpha = 2, beta = 3)
30 |     dist_b <- normal_gamma_dist(mu = 0, lambda = 3, alpha = 2, beta = 1)
31 |     theta_a <- simulate_data(dist_a, 5)
32 |     theta_b <- simulate_data(dist_b, 5)
33 |     prob <- b_gt_a(dist_a, dist_b, theta_a, theta_b)
34 |     loss <- expected_loss_b(dist_a, dist_b, theta_a, theta_b)
35 |     expect_equal(prob, 0.6)
36 |     expect_true(abs(loss - 0.318) < 0.01)
37 | })
38 | 
39 | test_that('b_gt_a works correctly for gamma distribution', {
40 |     dist_a <- gamma_dist(alpha = 3, beta = 5)
41 |     dist_b <- gamma_dist(alpha = 4, beta = 4)
42 |     prob <- b_gt_a(dist_a = dist_a, dist_b = dist_b)
43 |     loss <- expected_loss_b(dist_a = dist_a, dist_b = dist_b)
44 |     theta_a <- simulate_data(dist_a, 1e7)
45 |     theta_b <- simulate_data(dist_b, 1e7)
46 |     sim_prob <- mean(theta_b > theta_a)
47 |     sim_loss <- mean(pmax(theta_a - theta_b, 0))
48 |     expect_true(abs(prob - sim_prob) < 0.01)
49 |     expect_true(abs(loss - sim_loss) < 0.01)
50 |     
51 |     dist_a <- gamma_dist(alpha = 3300, beta = 400)
52 |     dist_b <- gamma_dist(alpha = 4650, beta = 550)
53 |     prob <- b_gt_a(dist_a = dist_a, dist_b = dist_b)
54 |     loss <- expected_loss_b(dist_a = dist_a, dist_b = dist_b)
55 |     theta_a <- simulate_data(dist_a, 1e6)
56 |     theta_b <- simulate_data(dist_b, 1e6)
57 |     sim_prob <- mean(theta_b > theta_a)
58 |     sim_loss <- mean(pmax(theta_a - theta_b, 0))
59 |     expect_true(abs(prob - sim_prob) < 0.01)
60 | })
61 | 
62 | test_that('get metrics produces correct output', {
63 |     dist_a <- gamma_dist(alpha = 3, beta = 5)
64 |     dist_b <- gamma_dist(alpha = 4, beta = 4)
65 |     metrics <- get_metrics(posteriors = list(a = dist_a, b = dist_b), sim_batch_size = 1e4)
66 |     expect_equal(metrics[['prob_b_gt_a']], b_gt_a(dist_a, dist_b))
67 | })
68 | 
69 | 


--------------------------------------------------------------------------------
/tests/testthat/test-run_simulation.R:
--------------------------------------------------------------------------------
 1 | context('Test that we can run a whole simulation')
 2 | 
 3 | test_that('A simulation is successful with beta', {
 4 |     
 5 |     data_dists <- list(a = bernoulli_dist(rate = 0.3), b = bernoulli_dist(rate = 0.8))
 6 |     priors <- list(a = beta_dist(alpha = 1, beta = 1), b = beta_dist(alpha = 1, beta = 1))
 7 |     
 8 |     result <- simulate_ab_test(data_dists = data_dists
 9 |                                , priors = priors
10 |                                , loss_threshold = 1e-4
11 |                                , obs_per_round = 10
12 |                                , max_rounds = 100)
13 |     expect_equal(result[['best_variant']], 'b')
14 | })
15 | 
16 | test_that('A simulation is successful with normal gamma', {
17 |     
18 |     data_dists <- list(a = normal_dist(mu = 2, sigma = 1), b = normal_dist(mu = 1, sigma = 2))
19 |     priors <- list(a = normal_gamma_dist(mu = 0, lambda = 1, alpha = 1, beta = 1)
20 |                    , b = normal_gamma_dist(mu = 0, lambda = 1, alpha = 1, beta = 1))
21 |     
22 |     result <- simulate_ab_test(data_dists = data_dists
23 |                                , priors = priors
24 |                                , loss_threshold = 1e-4
25 |                                , obs_per_round = 10
26 |                                , max_rounds = 100)
27 |     expect_equal(result[['best_variant']], 'a')
28 | })
29 | 
30 | test_that('A simulation is successful with gamma', {
31 |     
32 |     data_dists <- list(a = poisson_dist(rate = 1), b = poisson_dist(rate = 2))
33 |     priors <- list(a = gamma_dist(alpha = 1, beta = 1), b = gamma_dist(alpha = 1, beta = 1))
34 |     
35 |     result <- simulate_ab_test(data_dists = data_dists
36 |                                , priors = priors
37 |                                , loss_threshold = 1e-4
38 |                                , obs_per_round = 10
39 |                                , max_rounds = 100)
40 |     expect_equal(result[['best_variant']], 'b')
41 | })
42 | 


--------------------------------------------------------------------------------
/tests/testthat/test-simulation_study.R:
--------------------------------------------------------------------------------
 1 | context('Testing the simulation study')
 2 | 
 3 | test_that('Test that frequentist simulation works', {
 4 |     data_dists <- list('a' = bernoulli_dist(0.4), 'b' = bernoulli_dist(0.5))
 5 |     priors <- list('a' = beta_dist(1, 1), 'b' = beta_dist(1, 1))
 6 |     result <- investigate_simulations(num_sims = 10
 7 |                                       , data_dists = data_dists
 8 |                                       , priors = priors
 9 |                                       , loss_threshold = 5e-4
10 |                                       , obs_per_round = 10)
11 |     expect_identical(result[['summary_dt']][['avg_loss']], 0)
12 |     
13 |     data_dists <- list('a' = normal_dist(0, 1), 'b' = normal_dist(1, 1))
14 |     priors <- list('a' = normal_gamma_dist(0, 1, 1, 1), 'b' = normal_gamma_dist(0, 1, 1, 1))
15 |     result <- investigate_simulations(num_sims = 10
16 |                                       , data_dists = data_dists
17 |                                       , priors = priors
18 |                                       , loss_threshold = 5e-4
19 |                                       , obs_per_round = 10)
20 |     expect_identical(result[['summary_dt']][['avg_loss']], 0)
21 |     
22 |     data_dists <- list('a' = poisson_dist(1), 'b' = poisson_dist(2))
23 |     priors <- list('a' = gamma_dist(1, 1), 'b' = gamma_dist(1, 1))
24 |     result <- investigate_simulations(num_sims = 10
25 |                                       , data_dists = data_dists
26 |                                       , priors = priors
27 |                                       , loss_threshold = 5e-4
28 |                                       , obs_per_round = 10)
29 |     expect_identical(result[['summary_dt']][['avg_loss']], 0)
30 | })


--------------------------------------------------------------------------------
/tests/testthat/test-stopping_functions.R:
--------------------------------------------------------------------------------
 1 | context('Testing functions that stop the tests')
 2 | 
 3 | test_that('single_loss_stop provides the best variant if there is a winner', {
 4 |     losses <- c(a = 10, b = 2, c = 30)
 5 |     out <- single_loss_stop(losses = losses, loss_threshold = 25)
 6 |     expect_equal(out, list(stop = TRUE, variant = 'b'))
 7 | })
 8 | 
 9 | test_that('single_loss_stop provides NULL if no variant has a small loss', {
10 |     losses <- c(a = 10, b = 2, c = 30)
11 |     out <- single_loss_stop(losses = losses, loss_threshold = 1)
12 |     expect_equal(out, list(stop = FALSE, variant = NULL))
13 | })
14 | 


--------------------------------------------------------------------------------
/tests/testthat/test-update_priors.R:
--------------------------------------------------------------------------------
 1 | context('Testing functions that update prior distributions')
 2 | 
 3 | test_that('update_prior.beta_dist fails on bad inputs', {
 4 |     prior <- beta_dist(alpha = 5, beta = 10)
 5 |     expect_error(update_prior(prior = prior, evidence = 1:3), regexp = 'must be a binary vector')
 6 |     expect_error(update_prior(prior = prior), regexp = 'cannot be NULL')
 7 | })
 8 | 
 9 | test_that('update_prior.beta_dist produces proper output', {
10 |     prior <- beta_dist(alpha = 5, beta = 10)
11 |     post <- update_prior(prior = prior, evidence = c(1, 1, 0, 0))
12 |     expected <- beta_dist(alpha = 7, beta = 12)
13 |     expect_equal(post, expected)
14 |     post <- update_prior(prior = prior, stats = list('num_obs' = 10, 'observed_rate' = 0.5))
15 |     expected <- beta_dist(alpha = 10, beta = 15)
16 |     expect_equal(post, expected)
17 | })
18 | 
19 | test_that('update_prior.normal_gamma_dist fails on bad inputs', {
20 |     prior <- normal_gamma_dist(mu = 0, lambda = 1, alpha = 5, beta = 10)
21 |     expect_error(update_prior(prior = prior, evidence = c('a', 'b')), regexp = 'must be a numeric vector')
22 |     expect_error(update_prior(prior = prior), regexp = 'cannot be NULL')
23 | })
24 | 
25 | test_that('update_prior.normal_gamma_dist produces proper output', {
26 |     prior <- normal_gamma_dist(mu = 0, lambda = 1, alpha = 5, beta = 10)
27 |     post <- update_prior(prior = prior, evidence = c(1, 2, 3, 4))
28 |     expected <- normal_gamma_dist(mu = 2, lambda = 5, alpha = 7, beta = 15)
29 |     expect_equal(post, expected)
30 |     post <- update_prior(prior = prior, stats = list('num_obs' = 4, 'avg' = 2.5, 'std_dev' = sd(c(1, 2, 3, 4))))
31 |     expect_equal(post, expected)
32 | })
33 | 
34 | test_that('update_prior.gamma_dist fails on bad inputs', {
35 |     prior <- gamma_dist(alpha = 5, beta = 10)
36 |     expect_error(update_prior(prior = prior, evidence = -1:3), regexp = 'positive integer vector')
37 |     expect_error(update_prior(prior = prior), regexp = 'cannot be NULL')
38 | })
39 | 
40 | test_that('update_prior.gamma_dist produces proper output', {
41 |     prior <- gamma_dist(alpha = 5, beta = 10)
42 |     post <- update_prior(prior = prior, evidence = 1:4)
43 |     expected <- gamma_dist(alpha = 15, beta = 14)
44 |     expect_equal(post, expected)
45 |     post <- update_prior(prior = prior, stats = list('num_sessions' = 3, 'observed_count' = 30))
46 |     expected <- gamma_dist(alpha = 35, beta = 13)
47 |     expect_equal(post, expected)
48 | })
49 | 
50 | test_that('validate_dt fails where it should', {
51 |     df <- data.frame(x = 1:4, y = 2:5)
52 |     expect_error(validate_dt(df), regexp = 'data.table')
53 |     dt <- data.table::data.table(x = 1:4, y = 2:5)
54 |     expect_error(validate_dt(dt, c('x', 'y', 'z')), 'z')
55 |     expect_null(validate_dt(dt, c('x', 'y')))
56 | })
57 | 
58 | test_that('update priors can update a list of priors', {
59 |     priors <- list('a' = beta_dist(alpha = 1, beta = 1)
60 |                    , 'b' = beta_dist(alpha = 1, beta = 1))
61 |     evidence_dt <- data.table::data.table(a = c(1, 1, 1, 0, 0, 0)
62 |                                           , b = c(1, 1, 1, 1, 0, 0))
63 |     posteriors <- update_priors(priors = priors, evidence_dt = evidence_dt)
64 |     expected <- list('a' = beta_dist(alpha = 4, beta = 4)
65 |                      , 'b' = beta_dist(alpha = 5, beta = 3))
66 |     expect_equal(posteriors, expected)
67 |     stats_dt <- data.table::data.table(variant = c('a', 'b')
68 |                                        , num_obs = c(6, 6)
69 |                                        , observed_rate = c(1 / 2, 2 / 3))
70 |     posteriors <- update_priors(priors = priors, evidence_dt = evidence_dt)
71 |     expect_equal(posteriors, expected)
72 | })
73 | 
74 | 
75 | 


--------------------------------------------------------------------------------
/vignettes/abayes_vignette.Rmd:
--------------------------------------------------------------------------------
  1 | ---
  2 | title: "abayes Vignette"
  3 | author: "Michael Frasco"
  4 | date: "`r Sys.Date()`"
  5 | output: rmarkdown::html_vignette
  6 | vignette: >
  7 |   %\VignetteIndexEntry{Vignette Title}
  8 |   %\VignetteEngine{knitr::rmarkdown}
  9 |   \usepackage[utf8]{inputenc}
 10 | ---
 11 | 
 12 | ## What is this package?
 13 | 
 14 | For background information, please read the README.md at the root of the git repository. The rest of this document assumes that you have read that README.
 15 | 
 16 | ## Scope
 17 | 
 18 | This document supports [this blog post](convoy.com).
 19 | 
 20 | ```{r, message=FALSE}
 21 | library(ggplot2)
 22 | library(data.table)
 23 | library(parallel)
 24 | library(purrr)
 25 | library(abayes)
 26 | ```
 27 | 
 28 | ## Loss Function
 29 | 
 30 | The loss function that we are using in these simulations is shown below as a function of the difference between $\theta_b$ and $\theta_a$. We lose nothing if $\theta_a$ is larger than $\theta_b$, else we lose the difference. There are many variants that we could apply to this function. We'll stick with this function because it is the simplest.
 31 | 
 32 | ```{r, fig.height=4, fig.width=6.67}
 33 | dt <- data.table(theta_diff = seq(-3, 3, length.out = 100))
 34 | dt[, loss_fun := pmax(theta_diff, 0)]
 35 | 
 36 | loss_plot <- ggplot(dt, aes(x = theta_diff, y = loss_fun)) +
 37 |     geom_line(colour = '#f65335', size = 1.5) +
 38 |     labs(title = 'Loss Function when Choosing Variant A', aes(colour = '#f65335')) +
 39 |     xlab(expression(beta*' - '*alpha)) +
 40 |     ylab('Loss Function') +
 41 |     theme(plot.title = element_text(hjust = 0.5, size = 22)
 42 |           , axis.title = element_text(size = 18)
 43 |           , axis.text = element_text(size = 14))
 44 | loss_plot
 45 | ```
 46 | 
 47 | ## Simulations
 48 | 
 49 | I want to demonstrate how we can control our average loss over a sequence of experiments by ending the experiment once the expected loss for either variant drops below $\epsilon$. This is the guarantee about reliability that Bayesian methods provide.
 50 | 
 51 | I use 20 different values of $\epsilon$ for each setup. In practice, the choice of $\epsilon$ is very context dependent. In a bernoulli experiment with a small expected rate, $\epsilon$ is going to be very small. In a normal experiment with a large variance, $\epsilon$ is going to be much larger. You can also use a loss function that takes the percent difference between $\theta_a$ and $\theta_b$ in order to eliminate some of this variability from experiment to experiment.
 52 | 
 53 | The last lever that controls the settings of the experiment is how frequently we evaluate the stopping condition. We could check the results after every single observation. However, for the sake of experimentation speed, I try to setup the simulations so that we evaluate the stopping condition on average 25 times over the entire experiment. In my head, this corresponding to checking the results once a day for a month. Of course, evaluating the stopping condition fewer times will lead to fewer mistakes. Therefore, one might consider a stopping rule that makes sure that the expected loss has been below $\epsilon$ for a consecutive period of time.
 54 | 
 55 | ### Choosing a Prior
 56 | 
 57 | The prior distribution that I am going to use for this experiment is shown below.
 58 | 
 59 | ```{r, fig.height=4, fig.width=6.67}
 60 | prior <- beta_dist(alpha = 70, beta = 7000)
 61 | sampling_plot <- plot_beta(betas = list('a' = prior), title = 'Prior Distribution', xlab = 'Rate that Events Occur')
 62 | sampling_plot
 63 | ```
 64 | 
 65 | This distribution was chosen because it is centered around $0.01$ and has a reasonable range of values from $0.007$ to $0.013$. We are going to use these values as our prior for both variants and as our sampling distribution for the data generating distribution in each simulation.
 66 | 
 67 | ### Running the Simulation
 68 | 
 69 | ```{r, eval=FALSE}
 70 | num_cores <- detectCores() - 1
 71 | 
 72 | num_thresholds <- 20
 73 | num_sims <- 250 # doing more simulations would lead to more accurate results
 74 | max_rounds <- 10000
 75 | 
 76 | numeric_seed <- as.numeric(paste(charToRaw('ab'), collapse = '')) # 6162
 77 | set.seed(numeric_seed)
 78 | 
 79 | priors <- list('a' = prior, 'b' = prior)
 80 | thresholds <- 10 ^ seq(-5.5, -3, length.out = num_thresholds)
 81 | obs_to_see <- seq(3000, 50, length.out = num_thresholds)
 82 | obs_to_see <- ceiling(obs_to_see) - ceiling(obs_to_see) %% 2
 83 | results <- vector('list', num_thresholds)
 84 | 
 85 | for (i in 1:num_thresholds) {
 86 |     print(paste('performing simulation', i))
 87 |     results[[i]] <- investigate_simulations(num_sims = num_sims
 88 |                                             , priors = priors
 89 |                                             , loss_threshold = thresholds[i]
 90 |                                             , sampling_distribution = prior
 91 |                                             , obs_per_round = obs_to_see[i]
 92 |                                             , max_rounds = max_rounds
 93 |                                             , num_cores = num_cores)
 94 | }
 95 | ```
 96 | 
 97 | ```{r, eval=FALSE}
 98 | avg_losses <- purrr::map_dbl(results, function(x) mean(x[['sim_dt']][['loss']], na.rm = TRUE))
 99 | dt <- data.table(log_thresh = thresholds, avg_loss = avg_losses, expected = thresholds)
100 | sim_plot <- ggplot(dt, aes(log_thresh)) +
101 |     geom_line(aes(y = avg_loss, colour = 'observed average loss'), size = 1.25) +
102 |     geom_line(aes(y = expected, colour = 'expected loss'), size = 1.5) +
103 |     labs(title = expression('Observed Loss is Controlled by '*epsilon), aes(colour = '#f65335')) +
104 |     xlab(expression(epsilon*': Threshold Used in Simulations')) +
105 |     ylab('Simulated Average Loss') +
106 |     scale_colour_manual(values=c('black', '#f65335')) +
107 |     scale_x_log10() + 
108 |     scale_y_log10() + 
109 |     theme(plot.title = element_text(size = 16, hjust = 0.5)
110 |           , plot.subtitle = element_text(size = 14, hjust = 0.5)
111 |           , axis.text = element_text(size = 12)
112 |           , axis.title = element_text(size = 14)
113 |           , legend.title = element_blank()
114 |           , legend.position = 'bottom'
115 |           , legend.text = element_text(size = 12))
116 | sim_plot
117 | ```
118 | 
119 | 
120 | ## Derivations for Bayesian A/B Testing
121 | 
122 | ### Calculating Metrics Between Two Variants
123 | 
124 | The following derivations were taken from [Evan Miller's website](https://www.evanmiller.org/bayesian-ab-testing.html) and [Chris Stucchio's website](https://www.chrisstucchio.com/blog/2014/bayesian_ab_decision_rule.html). I am re-creating their derivations for my own sake.
125 | 
126 | #### Beta Distribution
127 | 
128 | ##### Probability Variant B Greater than Variant A
129 | 
130 | We aim to provide an analytic solution for calculating $P(p_b > p_a)$ where
131 | 
132 | $$
133 | \begin{eqnarray}
134 | p_a & \sim & Beta(\alpha_a, \beta_a) \\
135 | p_b & \sim & Beta(\alpha_b, \beta_b)
136 | \end{eqnarray}
137 | $$
138 | 
139 | Here $Beta(., .)$ represents the [Beta distribution](https://en.wikipedia.org/wiki/Beta_distribution).
140 | 
141 | ###### Background Information Needed For the Derivation
142 | 
143 | Before, we begin the actual derivation, we are going to provide some established facts about the Beta distribution and the [Beta function](https://en.wikipedia.org/wiki/Beta_function).
144 | 
145 | First, for any random variable $X$, if $X \sim B(\alpha, \beta)$, then the probability density function (i.e. the probability that $X$ takes on some value $x$) can be written as follows.
146 | 
147 | $$
148 | \begin{eqnarray}
149 | f(x) = P(X = x) & = & \frac{x^{\alpha - 1} (1 - x)^{\beta - 1}}{B(\alpha, \beta)}
150 | \end{eqnarray}
151 | $$
152 | Where $B(., .)$ represents the Beta function.
153 | 
154 | Next, the cumulative density function of $X$ (i.e. the probability that $X$ takes on some value less than $x$) is known as the regularized incomplete beta function.
155 | 
156 | $$
157 | \begin{eqnarray}
158 |     P(X < x) = I_x(\alpha, \beta) & = & \frac{B(x; \alpha, \beta)}{B(\alpha, \beta)}
159 | \end{eqnarray}
160 | $$
161 | 
162 | Where $B(x; \alpha, \beta)$ is the incomplete beta function and can be expressed as
163 | 
164 | $$
165 | \begin{eqnarray}
166 | B(x; \alpha, \beta) = \int_0^x t^{\alpha - 1} (1 - t)^{\beta - 1} dt
167 | \end{eqnarray}
168 | $$
169 | 
170 | Lastly, we have the [following equalities](https://en.wikipedia.org/wiki/Beta_function#Properties_2)
171 | 
172 | $$
173 | \begin{eqnarray}
174 | I_x(1, \beta) & = & 1 - (1 - x)^{\beta} \\
175 | I_x(\alpha, \beta) & = & I_x(\alpha - 1, \beta) - \frac{x^{\alpha - 1} (1 - x)^\beta}{(\alpha - 1) B(\alpha - 1, \beta)}
176 | \end{eqnarray}
177 | $$
178 | 
179 | If we apply that second equation recursively and use the first equation as the base case, we obtain the following equation.
180 | 
181 | $$
182 | \begin{eqnarray}
183 | I_x(\alpha, \beta) & = & 1 - (1 - x)^b - \sum_{j=1}^{\alpha - 1} \frac{x^{\alpha - j} (1 - x)^b}{(\alpha - j) B(\alpha - j, \beta)}
184 | \end{eqnarray}
185 | $$
186 | Now, if change the iterating variable so that $i = \alpha - j$ and let $(1 - x)^b$ represent the case where $i = 0$, then we have the following identities
187 | 
188 | $$
189 | \begin{eqnarray}
190 | x^{\alpha - j} & = & x^i \\
191 | (\alpha - j) * B(\alpha - 1, \beta) & = & (\beta + i) * B(1 + i, \beta)
192 | \end{eqnarray}
193 | $$
194 | And this allows us to write
195 | 
196 | $$
197 | \begin{eqnarray}
198 | I_x(\alpha, \beta) = 1 - \sum_{i=0}^{\alpha - 1} \frac{x^i (1 - x)^\beta}{(\beta + i) B(1 + i, \beta)}
199 | \end{eqnarray}
200 | $$
201 | 
202 | ###### The actual derivation
203 | 
204 | Given all of the information above, we can write
205 | 
206 | $$
207 | \begin{eqnarray}
208 | P(p_b > p_a) & = & \int_0^1 \int_0^1 1_{p_b > p_a} f(p_a) f(p_b) dp_b dp_a \\
209 | & = & \int_0^1 \int_{p_a}^1 f(p_a) f(p_b) dp_b dp_a \\
210 | & = & \int_0^1 f(p_a) \Big[ \int_{p_a}^1 f(p_b) dp_b \Big] dp_a \\
211 | & = & \int_0^1 f(p_a) \Big[ 1 - I_{p_a}(\alpha_b, \beta_b) \Big] dp_a \\
212 | & = & \int_0^1 f(p_a) dp_a - \int_0^1 f(p_a) I_{p_a}(\alpha_b, \beta_b) dp_a \\
213 | & = & 1 - \int_0^1 f(p_a) \Big[ 1 - \sum_{i=0}^{\alpha_b - 1} \frac{p_a^i (1 - p_a)^{\beta_b}}{(\beta_b + i) B(1 + i, \beta_b)} \Big] dp_a \\
214 | & = & 1 - 1 + \int_0^1 f(p_a) \Big[ \sum_{i=0}^{\alpha_b - 1} \frac{p_a^i (1 - p_a)^{\beta_b}}{(\beta_b + i) B(1 + i, \beta_b)} \Big] dp_a \\
215 | & = & \sum_{i=0}^{\alpha_b - 1} \int_0^1 f(p_a) \Big[  \frac{p_a^i (1 - p_a)^{\beta_b}}{(\beta_b + i) B(1 + i, \beta_b)} \Big] dp_a \\
216 | & = & \sum_{i=0}^{\alpha_b - 1} \int_0^1  \frac{p_a^{\alpha_a + i - 1} (1 - p_a)^{\beta_a + \beta_b - 1}}{(\beta_b + i) B(\alpha_a, \beta_a) B(1 + i, \beta_b)} dp_a \\
217 | \end{eqnarray}
218 | $$
219 | 
220 | Lastly, if we multiple the numerator and denominator by $B(\alpha_a + i, \beta_a + \beta_b)$ and bring all of the terms that do not depend on $p_a$ out of the integral, we get
221 | 
222 | $$
223 | \begin{eqnarray}
224 | P(p_b > p_a) & = & \sum_{i=0}^{\alpha_b - 1} \frac{B(\alpha_a + i, \beta_a + \beta_b)}{(\beta_b + i)B(\alpha_a, \alpha_b)B(1 + i, \beta_b)} \int_0^1 \frac{p_a^{\alpha_a + i - 1}(1 - p_a)^{\beta_a + \beta_b - 1}}{B(\alpha_a + i, \beta_a + \beta_b)} dp_a \\ 
225 | & = & \sum_{i=0}^{\alpha_b - 1} \frac{B(\alpha_a + i, \beta_a + \beta_b)}{(\beta_b + i)B(\alpha_a, \alpha_b)B(1 + i, \beta_b)} \equiv h(\alpha_a, \beta_a, \alpha_b, \beta_b)
226 | \end{eqnarray}
227 | $$
228 | 
229 | Thus, we have a closed form expression for the desired probability that only depends on the parameters of the posterior distributions.
230 | 
231 | ##### $E[L](A)$ and $E[L](B)$
232 | 
233 | Next, we want to derive the formula for the bayesian A/B testing decision rule.
234 | 
235 | We assume that the loss function for each variant is the absolute loss, which is shown below.
236 | 
237 | $$
238 | \begin{eqnarray}
239 | L(p_a, p_b, A) & = & max(p_b - p_a, 0) \\
240 | L(p_a, p_b, B) & = & max(p_a - p_b, 0)
241 | \end{eqnarray}
242 | $$
243 | 
244 | This derivation depends on the derivation for $P(p_b > p_a) \equiv h(\alpha_a, \beta_a, \alpha_b, \beta_b)$, which is shown above.
245 | 
246 | $$
247 | \begin{eqnarray}
248 | E[L](B) & = & \int_0^1 \int_0^1 max(p_a - p_b, 0) f(p_a) f(p_b) dp_a dp_b \\
249 | & = & \int_0^1 \int_{p_b}^1 (p_a - p_b) f(p_a) f(p_b) dp_a dp_b \\
250 | & = & \int_0^1 \int_{p_b}^1 \frac{p_a^{\alpha_a} (1 - p_a)^{\beta_a - 1}}{B(\alpha_a, \beta_a)} \frac{p_b^{\alpha_b - 1} (1 - p_b)^{\beta_b - 1}}{B(\alpha_b, \beta_b)} dp_a dp_b - \int_0^1 \int_{p_b}^1 \frac{p_a^{\alpha_a - 1} (1 - p_a)^{\beta_a - 1}}{B(\alpha_a, \beta_a)} \frac{p_b^{\alpha_b} (1 - p_b)^{\beta_b - 1}}{B(\alpha_b, \beta_b)} dp_a dp_b \\
251 | \end{eqnarray}
252 | $$
253 | Now, we can multiply the first term by $\frac{B(\alpha_a + 1, \beta_a)}{B(\alpha_a + 1, \beta_a)}$ and the second term by $\frac{B(\alpha_b + 1, \beta_b)}{B(\alpha_b + 1, \beta_b)}$.
254 | 
255 | $$
256 | \begin{eqnarray}
257 | E[L](B) & = & \frac{B(\alpha_a + 1, \beta_a)}{B(\alpha_a + 1, \beta_a)} \int_0^1 \int_{p_b}^1 \frac{p_a^{\alpha_a} (1 - p_a)^{\beta_a - 1}}{B(\alpha_a, \beta_a)} \frac{p_b^{\alpha_b - 1} (1 - p_b)^{\beta_b - 1}}{B(\alpha_b, \beta_b)} dp_a dp_b - \frac{B(\alpha_b + 1, \beta_b)}{B(\alpha_b + 1, \beta_b)} \int_0^1 \int_{p_b}^1 \frac{p_a^{\alpha_a - 1} (1 - p_a)^{\beta_a - 1}}{B(\alpha_a, \beta_a)} \frac{p_b^{\alpha_b} (1 - p_b)^{\beta_b - 1}}{B(\alpha_b, \beta_b)} dp_a dp_b \\
258 | & = & \frac{B(\alpha_a + 1, \beta_a)}{B(\alpha_a, \beta_a)} \int_0^1 \int_{p_b}^1 \frac{p_a^{\alpha_a} (1 - p_a)^{\beta_a - 1}}{B(\alpha_a + 1, \beta_a)} \frac{p_b^{\alpha_b - 1} (1 - p_b)^{\beta_b - 1}}{B(\alpha_b, \beta_b)} dp_a dp_b - \frac{B(\alpha_b + 1, \beta_b)}{B(\alpha_b, \beta_b)} \int_0^1 \int_{p_b}^1 \frac{p_a^{\alpha_a - 1} (1 - p_a)^{\beta_a - 1}}{B(\alpha_a, \beta_a)} \frac{p_b^{\alpha_b} (1 - p_b)^{\beta_b - 1}}{B(\alpha_b + 1, \beta_b)} dp_a dp_b \\
259 | & = & \frac{B(\alpha_a + 1, \beta_a)}{B(\alpha_a, \beta_a)} \big(1 - h(\alpha_a + 1, \beta_a, \alpha_b, \beta_b)\big) - \frac{B(\alpha_b + 1, \beta_b)}{B(\alpha_b, \beta_b)} \big(1 - h(\alpha_a, \beta_a, \alpha_b + 1, \beta_b)\big)
260 | \end{eqnarray}
261 | $$
262 | 
263 | Since the loss function is symmetrical, we can solve for $E[L](A)$ by switching the roles of the two variants in the equation above.
264 | 
265 | 
266 | ### Constructing Parameterized Distributions From Desired Properties
267 | 
268 | #### Beta Distribution
269 | 
270 | Suppose that $x \sim Beta(\alpha, \beta)$.
271 | 
272 | We know that $E[x] = \mu = \frac{\alpha}{\alpha + \beta}$ and $Var(x) = \frac{\alpha \beta}{(\alpha + \beta)^2 (\alpha + \beta + 1)}$.
273 | 
274 | Then, we can write $\alpha + \beta = \frac{\alpha}{\mu}$ and then $\beta = \frac{\alpha}{\mu} - \alpha$. We can use this value to solve for $\alpha$. We have that $\sigma^2 = \frac{\alpha (\frac{\alpha}{\mu} - \alpha)}{(\frac{\alpha}{\mu})^2 (\frac{\alpha}{\mu} + 1)}$.
275 | 
276 | Simplifying the above, we have
277 | 
278 | $$
279 | \begin{eqnarray}
280 | \sigma^2 & = & \frac{\alpha^2 (1 - \mu) \frac{1}{\mu}}{\alpha^2 (\alpha + \mu) \frac{1}{\mu^3}} \\
281 | & = & \frac{(1 - \mu)}{(\alpha + \mu) \frac{1}{\mu^2}} \\
282 | \end{eqnarray}
283 | $$
284 | 
285 | This implies that $(1 - \mu) \mu^2 = \sigma^2 \alpha + \sigma^2 \mu$, so that we get $\alpha = \frac{(1 - \mu)\mu^2 - \mu \sigma^2}{\sigma^2}$.
286 | 
287 | #### Normal Gamma Distribution
288 | 
289 | Suppose that $\mu \sim N(\mu_0, \frac{1}{\lambda \tau})$ and that $\tau \sim Gamma(\alpha, \beta)$, then we have that $(\mu, \tau) \sim NormalGamma(\mu_0, \lambda, \alpha, \beta)$.
290 | 
291 | We know that $E[\mu] = \mu_0$, $E[\tau] = \tau_0 = \frac{\alpha}{\beta}$, $Var(\mu) = \sigma^2_{\mu} = \frac{\beta}{\lambda (\alpha - 1)}$, and $Var(\tau) = \sigma^2_{\tau} \frac{\alpha}{\beta^2}$.
292 | 
293 | Then, we can write $\beta \tau_0 = \alpha$. We can use this to solve for $\beta$. We have that $\sigma^2_{\tau} = \frac{\beta \tau_0}{\beta^2} = \frac{\tau_0}{\beta}$, so that $\beta = \frac{\tau_0}{\sigma^2_{\tau}}$.
294 | 
295 | Finally, we can use the values for $\alpha$ and $\beta$ to solve for $\lambda$. We have that $\lambda = \frac{\beta}{\sigma^2_{\mu} (\alpha - 1)}$.
296 | 
297 | #### Gamma Distribution
298 | 
299 | Suppose that $x \sim Gamma(\alpha, \beta)$.
300 | 
301 | We know that $E[x] = \mu = \frac{\alpha}{\beta}$ and that $Var(x) = \sigma^2 = \frac{\alpha}{\beta^2}$.
302 | 
303 | Then, we can write $\beta \mu = \alpha$. We can use this to solve for $\beta$. We have that $\sigma^2 = \frac{\beta \mu}{\beta^2} = \frac{\mu}{\beta}$. This allows us to write $\beta = \frac{\mu}{\sigma ^ 2}$.
304 | 
305 | Therfore, given $\mu$ and $\sigma$, we can first solve for $\beta$ and then use this to solve for $\alpha$.
306 | 
307 | 
308 | 
309 | 


--------------------------------------------------------------------------------