├── .gitattributes ├── .gitignore ├── 1var-categorical-bar-bayesian_template.Rmd ├── 1var-continuous-line-bayesian_template.Rmd ├── 1var-ordinal-line-bayesian_template.Rmd ├── 2var-categorical-bar-bayesian_template.Rmd ├── 2var-categorical_ordinal-line-bayesian_template.Rmd ├── 2var-continuous_categorical-line-bayesian_template.Rmd ├── LICENSE ├── README.md ├── bayesian-template.Rproj ├── datasets ├── choc_cleaned_data.csv ├── feel-the-movement_simulated-data.csv ├── lab-in-the-wild_subset.csv └── stigmatized_campaigns_simulated-data.csv ├── html_outputs ├── 1var-categorical-bar-bayesian_template.html ├── 1var-categorical-bar-bayesian_template_(strong).html ├── 1var-categorical-bar-bayesian_template_(weak).html ├── 1var-continuous-line-bayesian_template.html ├── 1var-continuous-line-bayesian_template_(strong).html ├── 1var-continuous-line-bayesian_template_(weak).html ├── 1var-ordinal-line-bayesian_template.html ├── 1var-ordinal-line-bayesian_template_(strong).html ├── 1var-ordinal-line-bayesian_template_(weak).html ├── 2var-categorical-bar-bayesian_template.html ├── 2var-categorical-bar-bayesian_template_(strong).html ├── 2var-categorical-bar-bayesian_template_(weak).html ├── 2var-categorical_ordinal-line-bayesian_template.html ├── 2var-categorical_ordinal-line-bayesian_template_(strong).html ├── 2var-categorical_ordinal-line-bayesian_template_(weak).html ├── 2var-continuous_categorical-line-bayesian_template.html ├── 2var-continuous_categorical-line-bayesian_template_(strong).html └── 2var-continuous_categorical-line-bayesian_template_(weak).html ├── images ├── generic_2bar_chart.png ├── generic_2line-cont_chart.png ├── generic_2line_chart.png ├── generic_2line_chart_old.png ├── generic_bar_chart.png ├── generic_line-ord_chart.png └── generic_line_chart.png ├── plotting_functions.R └── quizlet ├── check_understanding_priors.Rmd └── check_understanding_priors.html /.gitattributes: -------------------------------------------------------------------------------- 1 | * text=auto 2 | 3 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | *~ 2 | .Rhistory 3 | .RData 4 | .Rproj* 5 | .Rproj.user 6 | .DS_Store 7 | 1var-categorical-bar-bayesian_template_cache/ 8 | 1var-categorical-bar-bayesian_template_files/ 9 | 1var-continuous-line-bayesian_template_cache/ 10 | 1var-continuous-line-bayesian_template_files/ 11 | 1var-ordinal-line-bayesian_template.html_cache/ 12 | 1var-ordinal-line-bayesian_template.html_files/ 13 | 2var-categorical-bar-bayesian_template_cache/ 14 | 2var-categorical-bar-bayesian_template_files/ 15 | 2var-categorical_ordinal-line-bayesian_template_cache/ 16 | 2var-categorical_ordinal-line-bayesian_template_files/ 17 | 2var-continuous_categorical-line-bayesian_template_cache/ 18 | 2var-continuous_categorical-line-bayesian_template_files/ -------------------------------------------------------------------------------- /1var-categorical-bar-bayesian_template.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title:
Bayesian analysis template
3 | author:
Phelan, C., Hullman, J., Kay, M. & Resnick, P.
4 | output: 5 | html_document: 6 | theme: flatly 7 | highlight: pygments 8 | --- 9 |
10 |
*Template 1:* 11 | 12 | ![](images/generic_bar_chart.png) 13 | 14 | **Single categorical independent variable (bar chart)**
15 | 16 | 17 | ##Introduction 18 | Welcome! This template will guide you through a Bayesian analysis in R, even if you have never done Bayesian analysis before. There are a set of templates, each for a different type of analysis. This template is for data with a **single categorical independent variable** and will produce a **bar chart**. If your analysis includes a **t-test** or **one-way ANOVA**, this might be the right template for you. 19 | 20 | This template assumes you have basic familiarity with R. Once complete, this template will produce a summary of the analysis, complete with parameter estimates and credible intervals, and two animated HOPs (see Hullman, Resnick, Adar 2015 DOI: 10.1371/journal.pone.0142444 and Kale, Nguyen, Kay, and Hullman VIS 2018 for more information) for both your prior and posterior estimates. 21 | 22 | This Bayesian analysis focuses on producing results in a form that are easily interpretable, even to nonexperts. The credible intervals produced by Bayesian analysis are the analogue of confidence intervals in traditional null hypothesis significance testing (NHST). A weakness of NHST confidence intervals is that they are easily misinterpreted. Many people naturally interpret an NHST 95% confidence interval to mean that there is a 95% chance that the true parameter value lies somewhere in that interval; in fact, it means that if the experiment were repeated 100 times, 95 of the resulting confidence intervals would include the true parameter value. The Bayesian credible interval sidesteps this complication by providing the intuitive meaning: a 95% chance that the true parameter value lies somewhere in that interval. To further support intuitive interpretations of your results, this template also produces animated HOPs, a type of plot that is more effective than visualizations such as error bars in helping people make accurate judgments about probability distributions. 23 | 24 | This set of templates supports a few types of statistical analysis. (In future work, this list of supported statistical analyses will be expanded.) For clarity, each type has been broken out into a separate template, so be sure to select the right template before you start! A productive way to choose which template to use is to think about what type of chart you would like to produce to summarize your data. Currently, the templates support the following: 25 | 26 | *One independent variable:* 27 | 28 | 1. **Categorical; bar graph (e.g. t-tests, one-way ANOVA)** 29 | 30 | 2. Ordinal; line graph (e.g. t-tests, one-way ANOVA) 31 | 32 | 3. Continuous; line graph (e.g. linear regression) 33 | 34 | *Two interacting independent variables:* 35 | 36 | 4. Two categorical; bar graph (e.g. two-way ANOVA) 37 | 38 | 5. One categorical, one ordinal; line graph (e.g. two-way ANOVA) 39 | 40 | 6. One categorical, one continuous; line graph (e.g. linear regression with multiple lines) 41 | 42 | Note that this template fits your data to a model that assumes normally distributed error terms. (This is the same assumption underlying t-tests, ANOVA, etc.) This template requires you to have already run diagnostics to determine that your data is consistent with this assumption; if you have not, the results may not be valid. 43 | 44 | Once you have selected your template, to complete the analysis, please follow along this template. For each code chunk, you may need to make changes to customize the code for your own analysis. 
In those places, the code chunk will be preceded by a list of things you need to change (with the heading "What to change"), and each line that needs to be customized will also include the comment `#CHANGE ME` within the code chunk itself. You can run each code chunk independently during debugging; when you're finished, you can knit the document to produce the complete document. 45 | 46 | Good luck! 47 | 48 | ###Tips before you start 49 | 50 | 1. Make sure you have picked the right template! (See above.) 51 | 52 | 2. Use the pre-knitted HTML version of this template as a reference as you work (we've included all the HTML files, in the folder `html_outputs`. The formatting makes the template easier to follow. You can also knit this document as you work once you have completed set up. 53 | 54 | 3. Make sure you are using the most recent version of the templates. Updates can be found at https://github.com/cdphelan/bayesian-template. 55 | 56 | ###Sample dataset 57 | This template comes prefilled with an example dataset from Moser et al. (DOI: 10.1145/3025453.3025778), which examines choice overload in the context of e-commerce. The study examined the relationship between choice satisfaction (measured at a 7-point Likert scale), the number of product choices presented on a webpage, and whether the participant is a decision "maximizer" (a person who examines all options and tries to choose the best) or a "satisficer" (a person who selects the first option that is satisfactory). In this template, we analyze the relationship between type of decision-making (maximizer or satisficer), a two-level categorical variable; and choice satisfaction, which we treat as a continuous variable with values that can fall in the range [1,7]. 58 | 59 | 60 | ##Set up 61 | ###Requirements 62 | To run this template, we assume that you are using RStudio, and you have the most recent version of R installed. (This template was built with R version 3.5.1.) 63 | 64 | This template works best if you first open the file `bayesian-template.Rproj` from the code repository as a project in RStudio to get started, and then open the individual `.Rmd` template files after this. 65 | 66 | ###Libraries 67 | **Installation:** 68 | If this is your first time using the template, you may need to install libraries. 69 | 70 | 1. **If you are using Windows,** first you will need to manually install RStan and Rtools. Follow the instructions [here](https://github.com/stan-dev/rstan/wiki/Installing-RStan-on-Windows) to install both. 71 | 72 | 2. On both Mac and Windows, uncomment the line with `install.packages()` to install the required packages. This only needs to be done once. 73 | 74 | **Troubleshooting:** 75 | You may have some trouble installing the packages, especially if you are on Windows. Regardless of OS, if you have any issues installing these packages, try one or more of the following troubleshooting options: 76 | 77 | 1. Restart R. 78 | 79 | 2. Make sure you are running the most recent version of R (3.5.1, as of the writing of this template). 80 | 81 | 3. Manually install RStan and Rtools, following the instructions [here](https://github.com/stan-dev/rstan/wiki/RStan-Getting-Started). 82 | 83 | 4. If you have tried the above and you are still getting error messages like `there is no package called [X]`, try installing the missing package(s) manually using the RStudio interface under Tools > Install Packages... 
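If the menu route still fails, the same thing can be done from the R console. The chunk below is only a sketch of that fallback (the package name `rstanarm` is a placeholder; substitute whichever package the error message says is missing), and it is marked `eval=FALSE` so it will not run when you knit this document.

```{r manual_install, eval=FALSE}
# Install one missing package by name, then load it to confirm the install worked.
# Replace "rstanarm" with the package named in the "there is no package called ..." error.
install.packages("rstanarm")
library(rstanarm)
```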
84 | 85 | ```{r libraries, message=FALSE, warning=FALSE} 86 | 87 | knitr::opts_chunk$set(fig.align="center") 88 | 89 | # install.packages(c("ggplot2", "rstanarm", "tidyverse", "tidybayes", "modelr", "gganimate")) 90 | 91 | library(rstanarm) #bayesian analysis package 92 | library(tidyverse) #tidy datascience commands 93 | library(tidybayes) #tidy data + ggplot workflow 94 | library(modelr) #tidy pipelines for modeling 95 | library(ggplot2) #plotting package 96 | library(gganimate) #animate ggplots 97 | 98 | # We import all of our plotting functions from this separate R file to keep the code in 99 | # this template easier to read. You can edit this file to customize aesthetics of the plots 100 | # if desired. Just be sure to run this line again after you make edits! 101 | source('plotting_functions.R') 102 | 103 | theme_set(theme_light()) # set the ggplot theme for all plots 104 | 105 | ``` 106 | 107 | 108 | ###Read in data 109 | **What to change** 110 | 111 | 1. mydata: Read in your data. 112 | 113 | ```{r data_prep} 114 | 115 | mydata = read.csv('datasets/choc_cleaned_data.csv') #CHANGE ME 1 116 | 117 | ``` 118 | 119 | 120 | ## Specify model 121 | We'll fit the following model: `stan_glm(y ~ x)`. As $x$ is a categorical variable in this template, this specifies a linear regression with dummy variables for each level in categorical variable $x$. **This is equivalent to ANOVA.** So for example, for a regression where $x$ has three levels, each $y_i$ is drawn from a normal distribution with mean equal to $a + b_1dummy_1 + b_2dummy_2$ and standard deviation equal to `sigma` ($\sigma$): 122 | 123 | $$ 124 | y_i \sim Normal(a + b_1dummy_1 + b_2dummy_2, \sigma) 125 | $$ 126 | 127 | Choose your independent and dependent variables. These are the variables that will correspond to the x and y axis on the final plots. 128 | 129 | **What to change** 130 | 131 | 2. mydata\$x: Select which variables will appear on the x-axis of your plots. 132 | 133 | 3. mydata\$y: Select which variables will appear on the y-axis of your plots. 134 | 135 | 4. x_lab: Label your plots' x-axes. 136 | 137 | 5. y_lab: Label your plots' y-axes. 138 | 139 | ```{r specify_model} 140 | 141 | #select your independent and dependent variable 142 | mydata$x = mydata$sat_max #CHANGE ME 2 143 | mydata$y = mydata$satis_Q1 #CHANGE ME 3 144 | 145 | # label the axes on the plots 146 | x_lab = "Decision-making type" #CHANGE ME 4 147 | y_lab = "Satisfaction" #CHANGE ME 5 148 | 149 | ``` 150 | 151 | 152 | ### Set priors 153 | In this section, you will set priors for your model. Setting priors thoughtfully is important to any Bayesian analysis, especially if you have a small sample of data that you will use for fitting for your model. The priors express your best prior belief, *before seeing any data*, of reasonable values for the model parameters. 154 | 155 | Ideally, you will have previous literature from which to draw these prior beliefs. If no previous studies exist, you can instead assign "weakly informative priors" that only minimally restrict the model, excluding only values that are implausible or impossible. We have provided examples of how to set both weak and strong priors below. 156 | 157 | To check the plausibility of your priors, use the code section after this one to generate a graph of five sample draws from your priors to check if the values generated are reasonable. 158 | 159 | Our model has the following parameters: 160 | 161 | a. the overall mean y-value across all levels of categorical variable x 162 | 163 | b. 
the mean y-value for each of the individual levels 164 | 165 | c. the standard deviation of the normally distributed error term 166 | 167 | To simplify things, we limit the number of different prior beliefs you can have. Think of the first level of the categorical variable as specifying the control condition of an experiment, and all of the other levels being treatment conditions in the experiment. We let you specify a prior belief about the plausible values of mean in the control condition (a), and then we let you set a prior belief about the plausible effect size (b). You have to specify the same plausible effect sizes for all conditions, unless you dig deeper into our code. 168 | 169 | To simplify things further, we only let you specify beliefs about these parameters in the form of a normal distribution. Thus, you will specify what you think is the most likely value for the parameter (the mean), and a standard deviation. You will be expressing a belief that you were 95% certain (before looking at any data) that the true value of the parameter is within two standard deviations of the mean. 170 | 171 | Finally, our modeling system, `stan_glm()`, will automatically set priors for the last parameter, the standard deviation of the normally distributed error term for the model overall (c). 172 | 173 | To explore more about priors, you can experiment with different values for these parameters and use the following section, *Checking priors with visualizations*, to see how different parameter values change the prior distribution. 174 | 175 | Want more examples? Check your understanding of how to set priors in this [quizlet](https://cdphelan.shinyapps.io/check_understanding_priors/), which includes several more examples of how to set both strong and weak priors. 176 | 177 | **What to change** 178 | 179 | **If you are using weakly informative priors (i.e. priors not informed by previous literature):** 180 | 181 | *Remember: **do not** use any of your data from the current study to inform prior values.* 182 | 183 | 6. a_prior: Select the control condition mean. 184 | 185 | 7. a_prior_max: Select the maximum plausible value of the control condition data. (We will use this to calculate the sd of `a`.) 186 | 187 | 8. b1_prior: Select the effect size mean. 188 | 189 | 9. b1_sd: Select the effect size standard deviation. 190 | 191 | 10. You should also change the comments in the code below to explain your choice of priors. 192 | 193 | **If you are using strong priors (i.e. priors from previous literature):** 194 | 195 | Skip this code chunk and set your priors in the next code chunk. For clarity, comment out everything in this code chunk. 196 | 197 | ```{r} 198 | 199 | # CHANGE THIS COMMENT EXPLAINING YOUR CHOICE OF PRIORS (10) 200 | # In our example dataset, y-axis scores can be in the range [1, 7]. 201 | # In the absence of other information, we set the parameter mean as 4 202 | # (the mean of the range [1,7]) and the maximum possible value as 7. 203 | # From exploratory analysis, we know the mean score and sd for y in our 204 | # dataset but we *DO NOT* use this information because priors *CANNOT* 205 | # include any information from the current study. 206 | 207 | a_prior = 4 # CHANGE ME 6 208 | a_prior_max = 7 # CHANGE ME 7 209 | 210 | # With a normal distribution, we can't completely rule out 211 | # impossible values, but we choose an sd that assigns less than 212 | # 5% probability to those impossible values. 
Remember that in a normal 213 | # distribution, 95% of the data lies within 2 sds of the mean. Therefore, 214 | # we calculate the value of 1 sd by finding the maximum amount our data 215 | # can vary from the mean (a_prior_max - a_prior) and divide that in half. 216 | 217 | a_sd = (a_prior_max - a_prior) / 2 # do not change 218 | 219 | # CHANGE THIS COMMENT EXPLAINING YOUR CHOICE OF PRIORS (10) 220 | # In our example dataset, we do not have a strong hypothesis that the treatment 221 | # conditions will be higher or lower than the control, so we set the mean of 222 | # the effect size parameters to be 0. In the absence of other information, we 223 | # set the sd to be the same as for the control condition. 224 | 225 | b1_prior = 0 # CHANGE ME 8 226 | b1_sd = a_sd # CHANGE ME 9 227 | 228 | ``` 229 | 230 | 231 | **What to change** 232 | 233 | **If you are using weakly informative priors:** 234 | 235 | Do not use this code chunk; use the code chunk above to set your priors instead. Make sure everything in this code chunk is commented out so that your priors are not overwritten. 236 | 237 | **If you are using strong priors (i.e. priors from previous literature):** 238 | 239 | *Remember: **do not** use any of your data from the current study to set prior values.* 240 | 241 | First, make sure to uncomment all four variables set in this code chunk. 242 | 243 | 6. a_prior: Select the control condition mean. 244 | 245 | 7. a_sd: Select the control condition standard deviation. 246 | 247 | 8. b1_prior: Select the effect size mean. 248 | 249 | 9. b1_sd: Select the effect size standard deviation. 250 | 251 | 10. You should also change the comments in the code below to explain your choice of priors. 252 | 253 | ```{r} 254 | 255 | # CHANGE THIS COMMENT EXPLAINING YOUR CHOICE OF PRIORS (10) 256 | # In our example dataset, y-axis scores can be in the range [1, 7]. 257 | # To choose our priors, we use the results from a previous study 258 | # where participants completed an identical task (choosing between 259 | # different chocolate bars). For our overall prior mean, we pool the mean 260 | # satisfaction scores from all conditions in the previous study to get 261 | # an overall mean of 5.86. We set a_sd so that 5.86 +/- 2 sds encompasses 262 | # the 95% confidence intervals from the previous study results. 263 | 264 | # a_prior = 5.86 # CHANGE ME 6 265 | # a_sd = 0.6 # CHANGE ME 7 266 | 267 | # CHANGE THIS COMMENT EXPLAINING YOUR CHOICE OF PRIORS (10) 268 | # In our example dataset, we do not have guidance from previous literature 269 | # to set an exact effect size, but the literature does tell us that satisficers 270 | # (the "treatment" condition) are likely to have higher mean satisfaction 271 | # than the maximizers (the "control" condition), so we set an effect size 272 | # parameter mean that results in a 1 point increase in satisfaction for satisficers. 273 | # To reflect the uncertainty in this effect size, we select a broad sd so 274 | # that there is a ~20% chance that the effect size will be negative. 275 | 276 | # b1_prior = 1 # CHANGE ME 8 277 | # b1_sd = 1 # CHANGE ME 9 278 | 279 | ``` 280 | 281 | 282 | ### Checking priors with visualizations 283 | Next, you'll want to check your priors by running this code chunk. It will produce a set of five sample plots drawn from the priors you set in the previous section, so you can check to see if the values generated are reasonable. 
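If it helps to see what a single "draw" means before running the chunk, the sketch below builds one by hand: it samples an intercept and an effect size from the normal priors set above and turns them into the implied group means. This is purely illustrative and is not part of the template's pipeline (the chunk below uses `stan_glm()` and `add_fitted_draws()` to do this properly); it also assumes a two-level categorical x, as in the example dataset. It is wrapped in `eval=FALSE` so it stays out of the knitted output.

```{r prior_draw_by_hand, eval=FALSE}
# One hand-rolled prior draw, assuming a two-level categorical x
# (e.g. maximizer vs. satisficer in the example dataset).
a_draw  = rnorm(1, mean = a_prior,  sd = a_sd)   # control-condition mean
b1_draw = rnorm(1, mean = b1_prior, sd = b1_sd)  # effect of the second level
c(control = a_draw, treatment = a_draw + b1_draw)
# Repeating this many times gives the kind of spread the plots below visualize.
```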
284 | 285 | You'll also want to run the code chunk after this one, `HOPs_priors`, which presents plots of sample prior draws in an animated format called HOPs (Hypothetical Outcomes Plots). HOPs are a type of plot that visualizes uncertainty as sets of draws from a distribution, and has been demonstrated to improve multivariate probability estimates (Hullman et al. 2015) and increase sensitivity to the underlying trend in data (Kale et al. 2018) over static representations of uncertainty like error bars. 286 | 287 | #### Static visualization of priors 288 | **What to change** 289 | 290 | Nothing! Just run this code to check your priors, adjusting prior values above as needed until you find reasonable prior values. Note that you may get a couple of very implausible or even impossible values because our assumption of normally distributed priors assigns a small probability to even very extreme values. If you are concerned by the outcome, you can try re-running it a few more times to make sure that any implausible values you see don't come up very often. 291 | 292 | **Troubleshooting** 293 | 294 | * In rare cases, you may get a warning that the Markov chains have failed to converge. Chains that fail to converge are a sign that your model is not a good fit to the data. If you get this warning, you should adjust your priors. Your prior distribution may be too narrow, and/or your prior mean is very far from the data. 295 | 296 | * If you get any other errors, first double-check the values you have changed in the code chunks above (i.e. `mydata`, `mydata$x`, `mydata$y`, and prior values). Problems with these values can cause confusing errors downstream. 297 | 298 | ```{r check_priors, results="hide"} 299 | 300 | # generate the prior distribution 301 | m_prior = stan_glm(y ~ x, data = mydata, 302 | prior_intercept = normal(a_prior, a_sd, autoscale = FALSE), 303 | prior = normal(b1_prior, b1_sd, autoscale = FALSE), 304 | prior_PD = TRUE 305 | ) 306 | 307 | # Create the dataframe with fitted draws 308 | prior_draws = mydata %>% #pipe mydata to datagrid() 309 | data_grid(x) %>% #create a fit grid with each level in x, and pipe it to add_fitted_draws() 310 | add_fitted_draws(m_prior, n = 5, seed = 12345) #add n fitted draws from the model to the fit grid 311 | # the seed argument is for reproducibility: it ensures the pseudo-random 312 | # number generator used to pick draws has the same seed on every run, 313 | # so that someone else can re-run this code and verify their output matches 314 | 315 | # Plot the five sample draws 316 | # this function is defined in 'plotting_functions.R', if you wish to customize the aesthetics. 317 | static_prior_plot_1(prior_draws) 318 | ``` 319 | 320 | #### Animated HOPs visualization of priors 321 | 322 | The five static draws above give use some idea of what the prior distribution might look like. Even better, we can animate this graph using HOPs, which are better for visualizing uncertainty and identifying underlying trends. HOPs visualizes the same information as the static plot generated above. However, with HOPs we can visualize more draws: with the static plot, we run out of room after only about five draws! 323 | 324 | In this code chunk, we add more draws to the `prior_draws` dataframe, so we have a total of 50 draws to visualize, and then create the animated plot. Each frame of the animation shows a different draw from the prior, starting with the same five draws as the static image above. 325 | 326 | **What to change:** Nothing! 
Just run the code to check your priors. 327 | 328 | ```{r HOPs_priors} 329 | # Animation parameters 330 | n_draws = 50 # the number of draws to visualize in the HOPs (more draws == longer rendering time) 331 | frames_per_second = 2.5 # the speed of the HOPs 332 | # 2.5 frames per second (400ms) is the recommended speed for the HOPs visualization. 333 | # Faster speeds (100ms) have been demonstrated to not work as well. 334 | # See Kale et al. VIS 2018 for more info. 335 | 336 | # Add more prior draws to the data frame for the visualization 337 | more_prior_draws = prior_draws %>% 338 | rbind( 339 | mydata %>% 340 | data_grid(x) %>% 341 | add_fitted_draws(m_prior, n = n_draws - 5, seed = 12345)) 342 | 343 | # Animate the prior draws with HOPs 344 | # this function is defined in 'plotting_functions.R', if you wish to customize the aesthetics. 345 | prior_HOPs = animate(HOPs_plot_1(more_prior_draws), nframes = n_draws * 2, fps = frames_per_second) 346 | prior_HOPs 347 | ``` 348 | 349 | In most cases, your prior HOPs will show a lot of uncertainty: the bars will jump around to a lot of different possible values. At the end of the template, you'll see how this uncertainty is affected when study data is added to the estimates. 350 | 351 | Even when you see a lot of uncertainty in the graph, the individual HOPs frames should mostly show plausible values. You will see some implausible values (usually represented as empty graphs, or bars that reach/exceed the plot's maximum y-value), but if you see many implausible values, it may be a sign that you should adjust your priors in the "Set priors" section. 352 | 353 | ## Run the model 354 | **What to change:** Nothing! Just run the model. 355 | 356 | **Troubleshooting:** If this code produces errors, check the troubleshooting section under the "Check priors" heading above for a few troubleshooting options. 357 | 358 | ```{r results = "hide", message = FALSE, warning = FALSE} 359 | 360 | m = stan_glm(y ~ x, data = mydata, 361 | prior_intercept = normal(a_prior, a_sd, autoscale = FALSE), 362 | prior = normal(b1_prior, b1_sd, autoscale = FALSE) 363 | ) 364 | 365 | ``` 366 | 367 | 368 | ### Model summary 369 | Here is a summary of the model fit. 370 | 371 | The summary reports diagnostic values that can help you evaluate whether your model is a good fit for the data. For this template, we can keep diagnostics simple: check that your `Rhat` values are very close to 1.0. Larger values mean that your model is not a good fit for the data. This is usually only a problem if the `Rhat` values are greater than 1.1, which is a warning sign that the Markov chains have failed to converge. In this happens, Stan will warn you about the failure, and you should adjust your priors. 372 | 373 | ```{r} 374 | summary(m, digits=3) 375 | ``` 376 | 377 | 378 | ## Visualizing results 379 | #### Static visualization 380 | To plot the results, we again create a fit grid using `data_grid()`, just as we did when we created the HOPs for the prior. Given this fit grid, we can then create any number of visualizations of the results. One way we might want to visualize the results is a static graph with error bars that represent a 95% credible interval. 
For each x position in the fit grid, we can get the posterior mean estimates and 95% credible intervals from the model: 381 | 382 | ```{r static_graph} 383 | 384 | # Create the dataframe with fitted draws 385 | fit = mydata %>%#pipe mydata to datagrid() 386 | data_grid(x) %>% #create a fit grid with each level in x, and pipe it to add_fitted_draws() 387 | add_fitted_draws(m) %>% #add n fitted draws from the model to the fit grid 388 | mean_qi(.width = .95) #add 95% credible intervals 389 | 390 | # Plot the posterior draws 391 | # this function is defined in 'plotting_functions.R', if you wish to customize the aesthetics. 392 | static_post_plot_1(fit) 393 | ``` 394 | 395 | #### Animated HOPs visualization 396 | To get a better visualization of the uncertainty remaining in the posterior results, we can use animated HOPs for this graph as well. The code to generate the posterior plots is identical to the HOPs code for the priors, except we replace `m_prior` with `m`: 397 | 398 | ```{r} 399 | 400 | p = mydata %>% #pipe mydata to datagrid() 401 | data_grid(x) %>% #create a fit grid with each level in x, and pipe it to add_fitted_draws() 402 | add_fitted_draws(m, n = n_draws, seed = 12345) #add n fitted draws from the model to the fit grid 403 | 404 | # Animate the posterior draws with HOPs 405 | # this function is defined in 'plotting_functions.R', if you wish to customize the aesthetics. 406 | post_HOPs = animate(HOPs_plot_1(p), nframes = n_draws * 2, fps = frames_per_second) 407 | post_HOPs 408 | 409 | ``` 410 | 411 | ### Comparing the prior and posterior 412 | If we look at our two HOPs plots together - one of the prior distribution, and one of the posterior - we can see how adding information to the model (i.e. the study data) adds more certainty to our estimates, and produces a posterior graph that is more "settled" than the prior graph. 413 | 414 |
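If you would also like to put a number on this change, one optional check (not a required step of the template) is to compare 95% intervals from the prior-only model and the fitted model using rstanarm's `posterior_interval()`. The chunk uses `eval=FALSE` to keep it out of the knitted report.

```{r compare_intervals, eval=FALSE}
# Optional numeric companion to the visual comparison below.
# For m_prior (fit with prior_PD = TRUE) these are intervals under the prior alone;
# for m they are posterior intervals. Expect the posterior intervals to be much narrower.
posterior_interval(m_prior, prob = 0.95)
posterior_interval(m, prob = 0.95)
```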
**Prior draws**
415 | ```{r echo=F} 416 | prior_HOPs 417 | ``` 418 | 419 |
**Posterior draws**
420 | ```{r echo=F} 421 | post_HOPs 422 | ``` 423 | 424 | ## Finishing up 425 | 426 | **Congratulations!** You made it through your first Bayesian analysis. We hope our templates helped demystify the process. 427 | 428 | If you're interested in learning more about Bayesian statistics, we suggest the following textbooks: 429 | 430 | - Statistical Rethinking, by Richard McElreath.(Website: https://xcelab.net/rm/statistical-rethinking/, including links to YouTube lectures.) 431 | - Doing Bayesian Analysis, by John K. Kruschke. (Website: https://sites.google.com/site/doingbayesiandataanalysis/, including R code templates.) 432 | 433 | 434 | The citation for the paper that reports the process of developing and user-testing these templates is below: 435 | 436 | Chanda Phelan, Jessica Hullman, Matthew Kay, and Paul Resnick. 2019. Some Prior(s) Experience Necessary: Templates for Getting Started with Bayesian Analysis. In CHI Conference on Human Factors in Computing Systems Proceedings (CHI 2019), May 4–9, 2019, Glasgow, Scotland UK. ACM, New York, NY, USA, 12 pages. https: //doi.org/10.1145/3290605.3300709 437 | 438 | -------------------------------------------------------------------------------- /1var-continuous-line-bayesian_template.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title:
Bayesian analysis template
3 | author:
Phelan, C., Hullman, J., Kay, M. & Resnick, P.
4 | output: 5 | html_document: 6 | theme: flatly 7 | highlight: pygments 8 | --- 9 |
10 |
*Template 3:* 11 | 12 | ![](images/generic_line_chart.png) 13 | 14 | **Single continuous independent variable (line graph)**
15 | 16 | 17 | ##Introduction 18 | Welcome! This template will guide you through a Bayesian analysis in R, even if you have never done Bayesian analysis before. There are a set of templates, each for a different type of analysis. This template is for data with a **single continuous independent variable** and will produce a **line chart**. If your analysis includes a **linear regression**, this might be the right template for you. 19 | 20 | This template assumes you have basic familiarity with R. Once complete, this template will produce a summary of the analysis, complete with parameter estimates and credible intervals, and two animated HOPs (see Hullman, Resnick, Adar 2015 DOI: 10.1371/journal.pone.0142444 and Kale, Nguyen, Kay, and Hullman VIS 2018 for more information) for both your prior and posterior estimates. 21 | 22 | This Bayesian analysis focuses on producing results in a form that are easily interpretable, even to nonexperts. The credible intervals produced by Bayesian analysis are the analogue of confidence intervals in traditional null hypothesis significance testing (NHST). A weakness of NHST confidence intervals is that they are easily misinterpreted. Many people naturally interpret an NHST 95% confidence interval to mean that there is a 95% chance that the true parameter value lies somewhere in that interval; in fact, it means that if the experiment were repeated 100 times, 95 of the resulting confidence intervals would include the true parameter value. The Bayesian credible interval sidesteps this complication by providing the intuitive meaning: a 95% chance that the true parameter value lies somewhere in that interval. To further support intuitive interpretations of your results, this template also produces animated HOPs, a type of plot that is more effective than visualizations such as error bars in helping people make accurate judgments about probability distributions. 23 | 24 | This set of templates supports a few types of statistical analysis. (In future work, this list of supported statistical analyses will be expanded.) For clarity, each type has been broken out into a separate template, so be sure to select the right template before you start! A productive way to choose which template to use is to think about what type of chart you would like to produce to summarize your data. Currently, the templates support the following: 25 | 26 | *One independent variable:* 27 | 28 | 1. Categorical; bar graph (e.g. t-tests, one-way ANOVA) 29 | 30 | 2. Ordinal; line graph (e.g. t-tests, one-way ANOVA) 31 | 32 | 3. **Continuous; line graph (e.g. linear regression)** 33 | 34 | *Two interacting independent variables:* 35 | 36 | 4. Two categorical; bar graph (e.g. two-way ANOVA) 37 | 38 | 5. One categorical, one ordinal; line graph (e.g. two-way ANOVA) 39 | 40 | 6. One categorical, one continuous; line graph (e.g. linear regression with multiple lines) 41 | 42 | Note that this template fits your data to a model that assumes normally distributed error terms. (This is the same assumption underlying t-tests, ANOVA, etc.) This template requires you to have already run diagnostics to determine that your data is consistent with this assumption; if you have not, the results may not be valid. 43 | 44 | Once you have selected your template, to complete the analysis, please follow along this template. For each code chunk, you may need to make changes to customize the code for your own analysis. 
In those places, the code chunk will be preceded by a list of things you need to change (with the heading "What to change"), and each line that needs to be customized will also include the comment `#CHANGE ME` within the code chunk itself. You can run each code chunk independently during debugging; when you're finished, you can knit the document to produce the complete document. 45 | 46 | Good luck! 47 | 48 | ###Tips before you start 49 | 50 | 1. Make sure you have picked the right template! (See above.) 51 | 52 | 2. Use the pre-knitted HTML version of this template as a reference as you work (we've included all the HTML files, in the folder `html_outputs`. The formatting makes the template easier to follow. You can also knit this document as you work once you have completed set up. 53 | 54 | 3. Make sure you are using the most recent version of the templates. Updates can be found at https://github.com/cdphelan/bayesian-template. 55 | 56 | ###Sample dataset 57 | This template comes prefilled with an example dataset from Moser et al. (DOI: 10.1145/3025453.3025778), which examines choice overload in the context of e-commerce. The study examined the relationship between choice satisfaction (measured at a 7-point Likert scale), the number of product choices presented on a webpage, and whether the participant is a decision "maximizer" (a person who examines all options and tries to choose the best) or a "satisficer" (a person who selects the first option that is satisfactory). In this template, we analyze the relationship between the number of choices presented, which we treat as a continuous variable in this template with values that can fall in the range [12,72]; and choice satisfaction, which we treat as a continuous variable with values that can fall in the range [1,7]. 58 | 59 | 60 | ##Set up 61 | ###Requirements 62 | To run this template, we assume that you are using RStudio, and you have the most recent version of R installed. (This template was built with R version 3.5.1.) 63 | 64 | This template works best if you first open the file `bayesian-template.Rproj` from the code repository as a project in RStudio to get started, and then open the individual `.Rmd` template files after this. 65 | 66 | ###Libraries 67 | **Installation:** 68 | If this is your first time using the template, you may need to install libraries. 69 | 70 | 1. **If you are using Windows,** first you will need to manually install RStan and Rtools. Follow the instructions [here](https://github.com/stan-dev/rstan/wiki/Installing-RStan-on-Windows) to install both. 71 | 72 | 2. On both Mac and Windows, uncomment the line with `install.packages()` to install the required packages. This only needs to be done once. 73 | 74 | **Troubleshooting:** 75 | You may have some trouble installing the packages, especially if you are on Windows. Regardless of OS, if you have any issues installing these packages, try one or more of the following troubleshooting options: 76 | 77 | 1. Restart R. 78 | 79 | 2. Make sure you are running the most recent version of R (3.5.1, as of the writing of this template). 80 | 81 | 3. Manually install RStan and Rtools, following the instructions [here](https://github.com/stan-dev/rstan/wiki/RStan-Getting-Started). 82 | 83 | 4. If you have tried the above and you are still getting error messages like `there is no package called [X]`, try installing the missing package(s) manually using the RStudio interface under Tools > Install Packages... 
84 | 85 | ```{r libraries, message=FALSE, warning=FALSE} 86 | 87 | knitr::opts_chunk$set(fig.align="center") 88 | 89 | # install.packages(c("ggplot2", "rstanarm", "tidyverse", "tidybayes", "modelr", "gganimate")) 90 | 91 | library(rstanarm) #bayesian analysis package 92 | library(tidyverse) #tidy datascience commands 93 | library(tidybayes) #tidy data + ggplot workflow 94 | library(modelr) #tidy pipelines for modeling 95 | library(ggplot2) #plotting package 96 | library(gganimate) #animate ggplots 97 | 98 | # We import all of our plotting functions from this separate R file to keep the code in 99 | # this template easier to read. You can edit this file to customize aesthetics of the plots 100 | # if desired. Just be sure to run this line again after you make edits! 101 | source('plotting_functions.R') 102 | 103 | theme_set(theme_light()) # set the ggplot theme for all plots 104 | 105 | ``` 106 | 107 | ###Read in data 108 | **What to change** 109 | 110 | 1. mydata: Read in your data. 111 | 112 | ```{r data_prep} 113 | 114 | mydata = read.csv('datasets/choc_cleaned_data.csv') #CHANGE ME 1 115 | 116 | ``` 117 | 118 | 119 | ## Specify model 120 | We'll fit the following model: `stan_glm(y ~ x)`, which specifies a linear regression where each $y_i$ is drawn from a normal distribution with mean equal to $a + bx_i$ and standard deviation equal to `sigma` ($\sigma$): 121 | 122 | $$ 123 | y_i \sim Normal(a + bx_i, \sigma) 124 | $$ 125 | 126 | Choose your independent and dependent variables. These are the variables that will correspond to the x and y axis on the final plots. 127 | 128 | **What to change** 129 | 130 | 2. mydata\$x: Select which variables will appear on the x-axis of your plots. 131 | 132 | 3. mydata\$y: Select which variables will appear on the y-axis of your plots. 133 | 134 | 4. x_lab: Label your plots' x-axes. 135 | 136 | 5. y_lab: Label your plots' y-axes. 137 | 138 | ```{r specify_model} 139 | 140 | #select your independent and dependent variable 141 | mydata$x = mydata$num_products_displayed #CHANGE ME 2 142 | mydata$y = mydata$satis_Q1 #CHANGE ME 3 143 | 144 | # label the axes on the plots 145 | x_lab = "Choices" #CHANGE ME 4 146 | y_lab = "Satisfaction" #CHANGE ME 5 147 | 148 | ``` 149 | 150 | 151 | ### Set priors 152 | In this section, you will set priors for your model. Setting priors thoughtfully is important to any Bayesian analysis, especially if you have a small sample of data that you will use for fitting for your model. The priors express your best prior belief, *before seeing any data*, of reasonable values for the model parameters. 153 | 154 | Ideally, you will have previous literature from which to draw these prior beliefs. If no previous studies exist, you can instead assign "weakly informative priors" that only minimally restrict the model, excluding only values that are implausible or impossible. We have provided examples of how to set both weak and strong priors below. 155 | 156 | To check the plausibility of your priors, use the code section after this one to generate a graph of five sample draws from your priors to check if the values generated are reasonable. 157 | 158 | Our model has the following parameters: 159 | 160 | a. the intercept; functionally, this is often the mean of the control condition 161 | 162 | b. the slope; i.e, the effect size 163 | 164 | c. the standard deviation of the normally distributed error term 165 | 166 | To simplify things, we limit the number of different prior beliefs you can have. 
Think of the intercept as specifying the control condition of an experiment, and the slope as specifying the effect size. We let you specify a prior belief about the plausible values of mean in the control condition (a), and then we let you set a prior belief about the plausible effect size (b). You have to specify the same plausible effect sizes for all conditions, unless you dig deeper into our code. 167 | 168 | To simplify things further, we only let you specify beliefs about these parameters in the form of a normal distribution. Thus, you will specify what you think is the most likely value for the parameter (the mean), and a standard deviation. You will be expressing a belief that you were 95% certain (before looking at any data) that the true value of the parameter is within two standard deviations of the mean. 169 | 170 | Finally, our modeling system, `stan_glm()`, will automatically set priors for the last parameter, the standard deviation of the normally distributed error term for the model overall (c). 171 | 172 | To explore more about priors, you can experiment with different values for these parameters and use the following section, *Checking priors with visualizations*, to see how different parameter values change the prior distribution. 173 | 174 | Want more examples? Check your understanding of how to set priors in this [quizlet](https://cdphelan.shinyapps.io/check_understanding_priors/), which includes several more examples of how to set both strong and weak priors. 175 | 176 | **What to change** 177 | 178 | **If you are using weakly informative priors (i.e. priors not informed by previous literature):** 179 | 180 | *Remember: **do not** use any of your data from the current study to inform prior values.* 181 | 182 | 6. a_prior: Select the intercept (likely the control condition mean). 183 | 184 | 7. a_prior_max: Select the maximum plausible value of the intercept (maximum plausible value of control condition data). (We will use this to calculate the sd of `a`.) 185 | 186 | 8. b1_prior: Select the effect size mean. 187 | 188 | 9. b1_sd: Select the effect size standard deviation. 189 | 190 | 10. You should also change the comments in the code below to explain your choice of priors. 191 | 192 | **If you are using strong priors (i.e. priors from previous literature):** 193 | 194 | Skip this code chunk and set your priors in the next code chunk. For clarity, comment out everything in this code chunk. 195 | 196 | ```{r} 197 | 198 | # CHANGE THIS COMMENT EXPLAINING YOUR CHOICE OF PRIORS (10) 199 | # In our example dataset, y-axis scores can be in the range [1, 7]. 200 | # In the absence of other information, we set the parameter mean as 4 201 | # (the mean of the range [1,7]) and the maximum possible value as 7. 202 | # From exploratory analysis, we know the mean score and sd for y in our 203 | # dataset but we *DO NOT* use this information because priors *CANNOT* 204 | # include any information from the current study. 205 | 206 | a_prior = 4 # CHANGE ME 6 207 | a_prior_max = 7 # CHANGE ME 7 208 | 209 | # With a normal distribution, we can't completely rule out 210 | # impossible values, but we choose an sd that assigns less than 211 | # 5% probability to those impossible values. Remember that in a normal 212 | # distribution, 95% of the data lies within 2 sds of the mean. Therefore, 213 | # we calculate the value of 1 sd by finding the maximum amount our data 214 | # can vary from the mean (a_prior_max - a_prior) and divide that in half. 
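# For example, with a_prior = 4 and a_prior_max = 7 (the values set above),
# this works out to a_sd = (7 - 4) / 2 = 1.5, so the prior puts roughly 95%
# of its weight on intercept values between 4 - 2*1.5 = 1 and 4 + 2*1.5 = 7.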
215 | 216 | a_sd = (a_prior_max - a_prior) / 2 # do not change 217 | 218 | # CHANGE THIS COMMENT EXPLAINING YOUR CHOICE OF PRIORS (10) 219 | # In this example, we will say we do not have guidance from literature 220 | # about the effect of choice set size on satisfaction, so we set the mean 221 | # of the effect size parameters to be 0. In the absence of other information, 222 | # we set the sd so that a change from the minimum choice set size (12) 223 | # to the maximum choice set size (72) could plausibly result 224 | # in a +6/-6 change in satisfaction, the maximum possible change. 225 | 226 | b1_prior = 0 # CHANGE ME 8 227 | b1_sd = (6/(72-12))/2 # CHANGE ME 9 228 | 229 | ``` 230 | 231 | 232 | **What to change** 233 | 234 | **If you are using weakly informative priors:** 235 | 236 | Do not use this code chunk; use the code chunk above to set your priors instead. Make sure everything in this code chunk is commented out so that your priors are not overwritten. 237 | 238 | **If you are using strong priors (i.e. priors from previous literature):** 239 | 240 | *Remember: **do not** use any of your data from the current study to set prior values.* 241 | 242 | First, make sure to uncomment all four variables set in this code chunk. 243 | 244 | 6. a_prior: Select the control condition mean. 245 | 246 | 7. a_sd: Select the control condition standard deviation. 247 | 248 | 8. b1_prior: Select the effect size mean. 249 | 250 | 9. b1_sd: Select the effect size standard deviation. 251 | 252 | 10. You should also change the comments in the code below to explain your choice of priors. 253 | 254 | ```{r} 255 | 256 | # CHANGE THIS COMMENT EXPLAINING YOUR CHOICE OF PRIORS (10) 257 | # In our example dataset, y-axis scores can be in the range [1, 7]. 258 | # To choose our priors, we use the results from a previous study 259 | # where participants completed an identical task (choosing between 260 | # different chocolate bars). For our overall prior mean, we pool the mean 261 | # satisfaction scores from all conditions in the previous study to get 262 | # an overall mean of 5.86. We set a_sd so that 5.86 +/- 2 sds encompasses 263 | # the 95% confidence intervals from the previous study results. 264 | 265 | # a_prior = 5.86 # CHANGE ME 6 266 | # a_sd = 0.6 # CHANGE ME 7 267 | 268 | # CHANGE THIS COMMENT EXPLAINING YOUR CHOICE OF PRIORS (10) 269 | # In this example, we do not have guidance from previous literature 270 | # to set an exact effect size, but literature does tell us that 271 | # satisfaction is likely to decline as choice size increases, so we set 272 | # an effect size parameter mean so that a change from the minimum 273 | # choice set size (12) to the maximum choice set size (72) results in 274 | # a 2-point decrease in satisfaction. To reflect the uncertainty in 275 | # this effect size, we set the sd so that a change from the minimum 276 | # to maximum choice set size could plausibly result in a +6/-6 change 277 | # in satisfaction, the maximum possible change. 278 | 279 | # b1_prior = -2/(72-12) # CHANGE ME 8 280 | # b1_sd = (6/(72-12))/2 # CHANGE ME 9 281 | 282 | ``` 283 | 284 | 285 | ### Checking priors with visualizations 286 | Next, you'll want to check your priors by running this code chunk. It will produce a set of 100 sample draws from the priors you set in the previous section, so you can check to see if the values generated are reasonable. 
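As in the categorical template, it can help to see what one "draw" from these priors means before running the chunk. The sketch below (illustrative only, not part of the template's pipeline) samples an intercept and a slope from the normal priors set above and evaluates the implied regression line at a few choice-set sizes from the example dataset's [12, 72] range; it is marked `eval=FALSE` so it is skipped when knitting.

```{r prior_line_by_hand, eval=FALSE}
# One hand-rolled prior draw for the linear model y = a + b*x.
a_draw  = rnorm(1, mean = a_prior,  sd = a_sd)   # intercept
b1_draw = rnorm(1, mean = b1_prior, sd = b1_sd)  # slope (effect of one extra choice)
x_vals  = c(12, 42, 72)                          # example choice-set sizes
data.frame(x = x_vals, prior_mean_y = a_draw + b1_draw * x_vals)
# The check_priors chunk below does this properly, for 100 draws at once.
```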
287 | 288 | You'll also want to run the code chunk after this one, `HOPs_priors`, which presents plots of sample prior draws in an animated format called HOPs (Hypothetical Outcomes Plots). HOPs are a type of plot that visualizes uncertainty as sets of draws from a distribution, and has been demonstrated to improve multivariate probability estimates (Hullman et al. 2015) and increase sensitivity to the underlying trend in data (Kale et al. 2018) over static representations of uncertainty like error bars. 289 | 290 | #### Static visualization of priors 291 | **What to change** 292 | 293 | Nothing! Just run this code to check your priors, adjusting prior values above as needed until you find reasonable prior values. Note that you may get a couple of very implausible or even impossible values because our assumption of normally distributed priors assigns a small probability to even very extreme values. If you are concerned by the outcome, you can try rerunning it a few more times to make sure that any implausible values you see don't come up very often. 294 | 295 | **Troubleshooting** 296 | 297 | * In rare cases, you may get a warning that the Markov chains have failed to converge. Chains that fail to converge are a sign that your model is not a good fit to the data. If you get this warning, you should adjust your priors. Your prior distribution may be too narrow, and/or your prior mean is very far from the data. 298 | 299 | * If you get any other errors, first double-check the values you have changed in the code chunks above (i.e. `mydata`, `mydata$x`, `mydata$y`, and prior values). Problems with these values can cause confusing errors downstream. 300 | 301 | ```{r check_priors, results="hide"} 302 | 303 | # generate the prior distribution 304 | m_prior = stan_glm(y ~ x, data = mydata, 305 | prior_intercept = normal(a_prior, a_sd, autoscale = FALSE), 306 | prior = normal(b1_prior, b1_sd, autoscale = FALSE), 307 | prior_PD = TRUE 308 | ) 309 | 310 | # Create the dataframe with fitted draws 311 | prior_draws = mydata %>% #pipe mydata to datagrid() 312 | data_grid(x) %>% #create a fit grid with each level in x, and pipe it to add_fitted_draws() 313 | add_fitted_draws(m_prior, n = 100, seed = 12345) #add n fitted draws from the model to the fit grid 314 | # the seed argument is for reproducibility: it ensures the pseudo-random 315 | # number generator used to pick draws has the same seed on every run, 316 | # so that someone else can re-run this code and verify their output matches 317 | 318 | # Plot the five sample draws 319 | # this function is defined in 'plotting_functions.R', if you wish to customize the aesthetics. 320 | static_prior_plot_3(prior_draws) 321 | ``` 322 | 323 | #### Animated HOPs visualization of priors 324 | The static draws above give use some idea of what the prior distribution might look like. Even better, we can animate this graph using HOPs. HOPs visualizes the same information as the static plot generated above, but are better for visualizing uncertainty and identifying underlying trends. 325 | 326 | In this code chunk, we create the animated plot using the 50 of the 100 draws we used in the plot above. Each frame of the animation shows a different draw from the prior. 327 | 328 | **What to change:** Nothing! Just run the code to check your priors. 
329 | 330 | ```{r HOPs_priors} 331 | # Animation parameters 332 | n_draws = 50 # the number of draws to visualize in the HOPs (more draws == longer rendering time) 333 | frames_per_second = 2.5 # the speed of the HOPs 334 | # 2.5 frames per second (400ms) is the recommended speed for the HOPs visualization. 335 | # Faster speeds (100ms) have been demonstrated to not work as well. 336 | # See Kale et al. VIS 2018 for more info. 337 | 338 | # Animate the prior draws with HOPs 339 | # this function is defined in 'plotting_functions.R', if you wish to customize the aesthetics. 340 | prior_HOPs = animate(HOPs_plot_3(prior_draws), nframes = n_draws * 2, fps = frames_per_second) 341 | prior_HOPs 342 | ``` 343 | 344 | In most cases, your prior HOPs will show a lot of uncertainty: the lines will jump around to a lot of different possible values. At the end of the template, you'll see how this uncertainty is affected when study data is added to the estimates. 345 | 346 | Even when you see a lot of uncertainty in the graph, the individual HOPs frames should mostly show plausible values. You will see some implausible values (usually represented as empty graphs), but if you see many implausible values, it may be a sign that you should adjust your priors in the "Set priors" section. 347 | 348 | 349 | ## Run the model 350 | **What to change:** Nothing! Just run the model. 351 | 352 | **Troubleshooting:** If this code produces errors, check the troubleshooting section under the "Check priors" heading above for a few troubleshooting options. 353 | 354 | ```{r results = "hide", message = FALSE, warning = FALSE} 355 | m = stan_glm(y ~ x, data = mydata, 356 | prior_intercept = normal(a_prior, a_sd, autoscale = FALSE), 357 | prior = normal(b1_prior, b1_sd, autoscale = FALSE) 358 | ) 359 | ``` 360 | 361 | 362 | ### Model summary 363 | Here is a summary of the model fit. 364 | 365 | The summary reports diagnostic values that can help you evaluate whether your model is a good fit for the data. For this template, we can keep diagnostics simple: check that your `Rhat` values are very close to 1.0. Larger values mean that your model is not a good fit for the data. This is usually only a problem if the `Rhat` values are greater than 1.1, which is a warning sign that the Markov chains have failed to converge. In this happens, Stan will warn you about the failure, and you should adjust your priors. 366 | 367 | ```{r} 368 | summary(m, digits=3) 369 | ``` 370 | 371 | 372 | ## Visualizing results 373 | #### Static visualizations 374 | To plot the results, we again create a fit grid using `data_grid()`, just as we did when we created the HOPs for the prior. Given this fit grid, we can then create any number of visualizations of the results. One way we might want to visualize the results is a static graph with a 95% confidence band. 
To do this, we use the grid and draw samples from the posterior mean evaluated at each x position in the grid using the `add_fitted_draws` function, and then summarize these samples in ggplot using a `stat_lineribbon`: 375 | 376 | ```{r static_graph} 377 | 378 | # Create the dataframe with fitted draws 379 | fit = mydata %>%#pipe mydata to datagrid() 380 | data_grid(x = seq_range(x, n = 20)) %>% #create a fit grid with each level in x, and pipe it to add_fitted_draws() 381 | add_fitted_draws(m) #add n fitted draws from the model to the fit grid 382 | 383 | # Plot the posterior draws with a confidence band 384 | # this function is defined in 'plotting_functions.R', if you wish to customize the aesthetics. 385 | static_post_plot_3a(fit) 386 | 387 | ``` 388 | 389 | But what we really want is to display a selection of plausible fit lines, say 100 of them. To do that, we instead ask `add_fitted_draws` for only 50 draws, which we plot separately as lines: 390 | 391 | ```{r} 392 | 393 | fit = mydata %>% 394 | data_grid(x = seq_range(x, n = 101)) %>% 395 | # the seed argument is for reproducibility: it ensures the pseudo-random 396 | # number generator used to pick draws has the same seed on every run, 397 | # so that someone else can re-run this code and verify their output matches 398 | add_fitted_draws(m, n = 100, seed = 12345) 399 | 400 | # Plot the posterior draws with a selection of fit draws 401 | # this function is defined in 'plotting_functions.R', if you wish to customize the aesthetics. 402 | static_post_plot_3b(fit) 403 | 404 | ``` 405 | 406 | 407 | #### Animated HOPs visualization 408 | To get a better visualization of the uncertainty remaining in the posterior results, we can use animated HOPs for this graph as well. The code to generate the posterior plots is identical to the HOPs code for the priors, except we replace `m_prior` with `m`: 409 | 410 | ```{r} 411 | 412 | p = mydata %>% #pipe mydata to datagrid() 413 | data_grid(x) %>% #create a fit grid with each level in x, and pipe it to add_fitted_draws() 414 | add_fitted_draws(m, n = n_draws, seed = 12345) #add n fitted draws from the model to the fit grid 415 | 416 | # animate the data from p, using the graph aesthetics set in the graph aesthetics code chunk 417 | # this function is defined in 'plotting_functions.R', if you wish to customize the aesthetics. 418 | post_HOPs = animate(HOPs_plot_3(p), nframes = n_draws * 2, fps = frames_per_second) 419 | post_HOPs 420 | 421 | ``` 422 | 423 | ### Comparing the prior and posterior 424 | If we look at our two HOPs plots together - one of the prior distribution, and one of the posterior - we can see how adding information to the model (i.e. the study data) adds more certainty to our estimates, and produces a posterior graph that is more "settled" than the prior graph. 425 | 426 |
**Prior draws**
427 | ```{r echo=F} 428 | prior_HOPs 429 | ``` 430 | 431 |
**Posterior draws**
432 | ```{r echo=F} 433 | post_HOPs 434 | ``` 435 | 436 | ## Finishing up 437 | 438 | **Congratulations!** You made it through your first Bayesian analysis. We hope our templates helped demystify the process. 439 | 440 | If you're interested in learning more about Bayesian statistics, we suggest the following textbooks: 441 | 442 | - Statistical Rethinking, by Richard McElreath. (Website: https://xcelab.net/rm/statistical-rethinking/, including links to YouTube lectures.) 443 | - Doing Bayesian Data Analysis, by John K. Kruschke. (Website: https://sites.google.com/site/doingbayesiandataanalysis/, including R code templates.) 444 | 445 | 446 | The citation for the paper that reports the process of developing and user-testing these templates is below: 447 | 448 | Chanda Phelan, Jessica Hullman, Matthew Kay, and Paul Resnick. 2019. Some Prior(s) Experience Necessary: Templates for Getting Started with Bayesian Analysis. In CHI Conference on Human Factors in Computing Systems Proceedings (CHI 2019), May 4–9, 2019, Glasgow, Scotland UK. ACM, New York, NY, USA, 12 pages. https://doi.org/10.1145/3290605.3300709 449 | -------------------------------------------------------------------------------- /1var-ordinal-line-bayesian_template.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title:
Bayesian analysis template
3 | author:
Phelan, C., Hullman, J., Kay, M. & Resnick, P.
4 | output: 5 | html_document: 6 | theme: flatly 7 | highlight: pygments 8 | --- 9 |
10 |
*Template 2:* 11 | 12 | ![](images/generic_line-ord_chart.png) 13 | 14 | **Single ordinal independent variable (line graph)**
15 | 16 | 17 | ##Introduction 18 | Welcome! This template will guide you through a Bayesian analysis in R, even if you have never done Bayesian analysis before. There are a set of templates, each for a different type of analysis. This template is for data with a **single ordinal independent variable** and will produce a **line chart**. If your analysis includes a **t-test** or **one-way ANOVA**, this might be the right template for you. In many cases, a bar chart will be a better option for visualizing your results; for bar charts with this type of data, use template 1 instead. 19 | 20 | This template assumes you have basic familiarity with R. Once complete, this template will produce a summary of the analysis, complete with parameter estimates and credible intervals, and two animated HOPs (see Hullman, Resnick, Adar 2015 DOI: 10.1371/journal.pone.0142444 and Kale, Nguyen, Kay, and Hullman VIS 2018 for more information) for both your prior and posterior estimates. 21 | 22 | This Bayesian analysis focuses on producing results in a form that are easily interpretable, even to nonexperts. The credible intervals produced by Bayesian analysis are the analogue of confidence intervals in traditional null hypothesis significance testing (NHST). A weakness of NHST confidence intervals is that they are easily misinterpreted. Many people naturally interpret an NHST 95% confidence interval to mean that there is a 95% chance that the true parameter value lies somewhere in that interval; in fact, it means that if the experiment were repeated 100 times, 95 of the resulting confidence intervals would include the true parameter value. The Bayesian credible interval sidesteps this complication by providing the intuitive meaning: a 95% chance that the true parameter value lies somewhere in that interval. To further support intuitive interpretations of your results, this template also produces animated HOPs, a type of plot that is more effective than visualizations such as error bars in helping people make accurate judgments about probability distributions. 23 | 24 | This set of templates supports a few types of statistical analysis. (In future work, this list of supported statistical analyses will be expanded.) For clarity, each type has been broken out into a separate template, so be sure to select the right template before you start! A productive way to choose which template to use is to think about what type of chart you would like to produce to summarize your data. Currently, the templates support the following: 25 | 26 | *One independent variable:* 27 | 28 | 1. Categorical; bar graph (e.g. t-tests, one-way ANOVA) 29 | 30 | 2. **Ordinal; line graph (e.g. t-tests, one-way ANOVA)** 31 | 32 | 3. Continuous; line graph (e.g. linear regression) 33 | 34 | *Two interacting independent variables:* 35 | 36 | 4. Two categorical; bar graph (e.g. two-way ANOVA) 37 | 38 | 5. One categorical, one ordinal; line graph (e.g. two-way ANOVA) 39 | 40 | 6. One categorical, one continuous; line graph (e.g. linear regression with multiple lines) 41 | 42 | Note that this template fits your data to a model that assumes normally distributed error terms. (This is the same assumption underlying t-tests, ANOVA, etc.) This template requires you to have already run diagnostics to determine that your data is consistent with this assumption; if you have not, the results may not be valid. 43 | 44 | Once you have selected your template, to complete the analysis, please follow along this template. 
For each code chunk, you may need to make changes to customize the code for your own analysis. In those places, the code chunk will be preceded by a list of things you need to change (with the heading "What to change"), and each line that needs to be customized will also include the comment `#CHANGE ME` within the code chunk itself. You can run each code chunk independently during debugging; when you're finished, you can knit the document to produce the complete document. 45 | 46 | Good luck! 47 | 48 | ###Tips before you start 49 | 50 | 1. Make sure you have picked the right template! (See above.) 51 | 52 | 2. Use the pre-knitted HTML version of this template as a reference as you work (we've included all the HTML files, in the folder `html_outputs`. The formatting makes the template easier to follow. You can also knit this document as you work once you have completed set up. 53 | 54 | 3. Make sure you are using the most recent version of the templates. Updates can be found at https://github.com/cdphelan/bayesian-template. 55 | 56 | ###Sample dataset 57 | This template comes prefilled with an example dataset from Moser et al. (DOI: 10.1145/3025453.3025778), which examines choice overload in the context of e-commerce. The study examined the relationship between choice satisfaction (measured at a 7-point Likert scale), the number of product choices presented on a webpage, and whether the participant is a decision "maximizer" (a person who examines all options and tries to choose the best) or a "satisficer" (a person who selects the first option that is satisfactory). In this template, we analyze the relationship between the number of choices presented, which we treat as an ordinal variable in this template with possible values [12,24,40,50,60,72]; and choice satisfaction, which we treat as a continuous variable with values that can fall in the range [1,7]. 58 | 59 | 60 | ##Set up 61 | ###Requirements 62 | To run this template, we assume that you are using RStudio, and you have the most recent version of R installed. (This template was built with R version 3.5.1.) 63 | 64 | This template works best if you first open the file `bayesian-template.Rproj` from the code repository as a project in RStudio to get started, and then open the individual `.Rmd` template files after this. 65 | 66 | ###Libraries 67 | **Installation:** 68 | If this is your first time using the template, you may need to install libraries. 69 | 70 | 1. **If you are using Windows,** first you will need to manually install RStan and Rtools. Follow the instructions [here](https://github.com/stan-dev/rstan/wiki/Installing-RStan-on-Windows) to install both. 71 | 72 | 2. On both Mac and Windows, uncomment the line with `install.packages()` to install the required packages. This only needs to be done once. 73 | 74 | **Troubleshooting:** 75 | You may have some trouble installing the packages, especially if you are on Windows. Regardless of OS, if you have any issues installing these packages, try one or more of the following troubleshooting options: 76 | 77 | 1. Restart R. 78 | 79 | 2. Make sure you are running the most recent version of R (3.5.1, as of the writing of this template). 80 | 81 | 3. Manually install RStan and Rtools, following the instructions [here](https://github.com/stan-dev/rstan/wiki/RStan-Getting-Started). 82 | 83 | 4. 
If you have tried the above and you are still getting error messages like `there is no package called [X]`, try installing the missing package(s) manually using the RStudio interface under Tools > Install Packages... 84 | 85 | ```{r libraries, message=FALSE, warning=FALSE} 86 | 87 | knitr::opts_chunk$set(fig.align="center") 88 | 89 | # install.packages(c("ggplot2", "rstanarm", "tidyverse", "tidybayes", "modelr", "gganimate")) 90 | 91 | library(rstanarm) #bayesian analysis package 92 | library(tidyverse) #tidy datascience commands 93 | library(tidybayes) #tidy data + ggplot workflow 94 | library(modelr) #tidy pipelines for modeling 95 | library(ggplot2) #plotting package 96 | library(gganimate) #animate ggplots 97 | 98 | # We import all of our plotting functions from this separate R file to keep the code in 99 | # this template easier to read. You can edit this file to customize aesthetics of the plots 100 | # if desired. Just be sure to run this line again after you make edits! 101 | source('plotting_functions.R') 102 | 103 | theme_set(theme_light()) # set the ggplot theme for all plots 104 | 105 | ``` 106 | 107 | ###Read in data 108 | **What to change** 109 | 110 | 1. mydata: Read in your data. 111 | 112 | ```{r data_prep} 113 | 114 | mydata = read.csv('datasets/choc_cleaned_data.csv') #CHANGE ME 1 115 | 116 | ``` 117 | 118 | 119 | ## Specify model 120 | We'll fit the following model: `stan_glm(y ~ x)`. As $x$ is an ordinal variable in this template, this specifies a linear regression with dummy variables for each level in ordinal variable $x$. **This is equivalent to ANOVA.** So for example, for a regression where $x$ has three levels, each $y_i$ is drawn from a normal distribution with mean equal to $a + b_1dummy_1 + b_2dummy_2$ and standard deviation equal to `sigma` ($\sigma$): 121 | 122 | $$ 123 | y_i \sim Normal(a + b_1dummy_1 + b_2dummy_2, \sigma) 124 | $$ 125 | 126 | Choose your independent and dependent variables. These are the variables that will correspond to the x and y axis on the final plots. 127 | 128 | **What to change** 129 | 130 | 2. mydata\$x: Select which variables will appear on the x-axis of your plots. 131 | 132 | 3. mydata\$y: Select which variables will appear on the y-axis of your plots. 133 | 134 | 4. x_lab: Label your plots' x-axes. 135 | 136 | 5. y_lab: Label your plots' y-axes. 137 | 138 | ```{r specify_model} 139 | 140 | #select your independent and dependent variable 141 | mydata$x = as.factor(mydata$num_products_displayed) #CHANGE ME 2 142 | mydata$y = mydata$satis_Q1 #CHANGE ME 3 143 | 144 | # label the axes on the plots 145 | x_lab = "Choices" #CHANGE ME 4 146 | y_lab = "Satisfaction" #CHANGE ME 5 147 | 148 | ``` 149 | 150 | 151 | ### Set priors 152 | In this section, you will set priors for your model. Setting priors thoughtfully is important to any Bayesian analysis, especially if you have a small sample of data that you will use for fitting for your model. The priors express your best prior belief, *before seeing any data*, of reasonable values for the model parameters. 153 | 154 | Ideally, you will have previous literature from which to draw these prior beliefs. If no previous studies exist, you can instead assign "weakly informative priors" that only minimally restrict the model, excluding only values that are implausible or impossible. We have provided examples of how to set both weak and strong priors below. 
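As an aside (not part of the template's own code chunks), it can help to look at the normal distribution implied by a candidate prior before plugging it into the model. The quick base-R sketch below uses arbitrary illustration values, not recommendations:

```{r candidate_prior_sketch}
# Sketch only: the density of a normal prior with mean 4 and sd 1.5.
# Most of its mass falls between 1 and 7, i.e. within a 7-point Likert range.
curve(dnorm(x, mean = 4, sd = 1.5), from = 0, to = 8,
      xlab = "candidate parameter value", ylab = "prior density")
```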
155 | 156 | To check the plausibility of your priors, use the code section after this one to generate a graph of five sample draws from your priors to check if the values generated are reasonable. 157 | 158 | Our model has the following parameters: 159 | 160 | a. the overall mean y-value across all levels of ordinal variable x 161 | 162 | b. the mean y-value for each of the individual levels 163 | 164 | c. the standard deviation of the normally distributed error term 165 | 166 | To simplify things, we limit the number of different prior beliefs you can have. Think of the first level of the ordinal variable as specifying the control condition of an experiment, and all of the other levels being treatment conditions in the experiment. We let you specify a prior belief about the plausible values of mean in the control condition (a), and then we let you set a prior belief about the plausible effect size (b). You have to specify the same plausible effect sizes for all conditions, unless you dig deeper into our code. 167 | 168 | To simplify things further, we only let you specify beliefs about these parameters in the form of a normal distribution. Thus, you will specify what you think is the most likely value for the parameter (the mean), and a standard deviation. You will be expressing a belief that you were 95% certain (before looking at any data) that the true value of the parameter is within two standard deviations of the mean. 169 | 170 | Finally, our modeling system, `stan_glm()`, will automatically set priors for the last parameter, the standard deviation of the normally distributed error term for the model overall (c). 171 | 172 | To explore more about priors, you can experiment with different values for these parameters and use the following section, *Checking priors with visualizations*, to see how different parameter values change the prior distribution. 173 | 174 | Want more examples? Check your understanding of how to set priors in this [quizlet](https://cdphelan.shinyapps.io/check_understanding_priors/), which includes several more examples of how to set both strong and weak priors. 175 | 176 | **What to change** 177 | 178 | **If you are using weakly informative priors (i.e. priors not informed by previous literature):** 179 | 180 | *Remember: **do not** use any of your data from the current study to inform prior values.* 181 | 182 | 6. a_prior: Select the control condition mean. 183 | 184 | 7. a_prior_max: Select the maximum plausible value of the control condition data. (We will use this to calculate the sd of `a`.) 185 | 186 | 8. b1_prior: Select the effect size mean. 187 | 188 | 9. b1_sd: Select the effect size standard deviation. 189 | 190 | 10. You should also change the comments in the code below to explain your choice of priors. 191 | 192 | **If you are using strong priors (i.e. priors from previous literature):** 193 | 194 | Skip this code chunk and set your priors in the next code chunk. For clarity, comment out everything in this code chunk. 195 | 196 | ```{r} 197 | 198 | # CHANGE THIS COMMENT EXPLAINING YOUR CHOICE OF PRIORS (10) 199 | # In our example dataset, y-axis scores can be in the range [1, 7]. 200 | # In the absence of other information, we set the parameter mean as 4 201 | # (the mean of the range [1,7]) and the maximum possible value as 7. 202 | # From exploratory analysis, we know the mean score and sd for y in our 203 | # dataset but we *DO NOT* use this information because priors *CANNOT* 204 | # include any information from the current study. 
205 | 206 | a_prior = 4 # CHANGE ME 6 207 | a_prior_max = 7 # CHANGE ME 7 208 | 209 | # With a normal distribution, we can't completely rule out 210 | # impossible values, but we choose an sd that assigns less than 211 | # 5% probability to those impossible values. Remember that in a normal 212 | # distribution, 95% of the data lies within 2 sds of the mean. Therefore, 213 | # we calculate the value of 1 sd by finding the maximum amount our data 214 | # can vary from the mean (a_prior_max - a_prior) and divide that in half. 215 | 216 | a_sd = (a_prior_max - a_prior) / 2 # do not change 217 | 218 | # CHANGE THIS COMMENT EXPLAINING YOUR CHOICE OF PRIORS (10) 219 | # In this example, we will say we do not have a strong hypothesis about the effect 220 | # of choice set size on satisfaction, so we set the mean of the effect size 221 | # parameters to be 0. In the absence of other information, we set the sd 222 | # to be the same as for the control condition. 223 | 224 | b1_prior = 0 # CHANGE ME 8 225 | b1_sd = a_sd # CHANGE ME 9 226 | 227 | ``` 228 | 229 | 230 | **What to change** 231 | 232 | **If you are using weakly informative priors:** 233 | 234 | Do not use this code chunk; use the code chunk above to set your priors instead. Make sure everything in this code chunk is commented out so that your priors are not overwritten. 235 | 236 | **If you are using strong priors (i.e. priors from previous literature):** 237 | 238 | *Remember: **do not** use any of your data from the current study to set prior values.* 239 | 240 | First, make sure to uncomment all four variables set in this code chunk. 241 | 242 | 6. a_prior: Select the control condition mean. 243 | 244 | 7. a_sd: Select the control condition standard deviation. 245 | 246 | 8. b1_prior: Select the effect size mean. 247 | 248 | 9. b1_sd: Select the effect size standard deviation. 249 | 250 | 10. You should also change the comments in the code below to explain your choice of priors. 251 | 252 | ```{r} 253 | 254 | # CHANGE THIS COMMENT EXPLAINING YOUR CHOICE OF PRIORS (10) 255 | # In our example dataset, y-axis scores can be in the range [1, 7]. 256 | # To choose our priors, we use the results from a previous study 257 | # where participants completed an identical task (choosing between 258 | # different chocolate bars). For our overall prior mean, we pool the mean 259 | # satisfaction scores from all conditions in the previous study to get 260 | # an overall mean of 5.86. We set a_sd so that 5.86 +/- 2 sds encompasses 261 | # the 95% confidence intervals from the previous study results. 262 | 263 | # a_prior = 5.86 # CHANGE ME 6 264 | # a_sd = 0.6 # CHANGE ME 7 265 | 266 | # CHANGE THIS COMMENT EXPLAINING YOUR CHOICE OF PRIORS (10) 267 | # In this example, we do not have a strong hypothesis about the effect 268 | # of choice set size on satisfaction, so we set the mean of the effect size 269 | # parameters to be 0. In the absence of other information, we set the sd 270 | # to be the same as for the control condition. 271 | 272 | # b1_prior = 0 # CHANGE ME 8 273 | # b1_sd = a_sd # CHANGE ME 9 274 | 275 | ``` 276 | 277 | 278 | ### Checking priors with visualizations 279 | Next, you'll want to check your priors by running this code chunk. It will produce a set of five sample plots drawn from the priors you set in the previous section, so you can check to see if the values generated are reasonable. 
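Before the visual checks below, you can also run a quick numerical sanity check on a weakly informative prior like the one set above. This is a side calculation rather than a required template step, and it hard-codes the example values a_prior = 4 and a_prior_max = 7 rather than referencing the variables directly:

```{r prior_sanity_check}
# With a_prior = 4 and a_prior_max = 7, a_sd = (7 - 4) / 2 = 1.5.
# Probability such a prior places on impossible satisfaction scores (below 1 or above 7):
pnorm(1, mean = 4, sd = 1.5) + (1 - pnorm(7, mean = 4, sd = 1.5))
# about 0.046, i.e. under 5%, consistent with the two-standard-deviation reasoning above
```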
280 | 281 | You'll also want to run the code chunk after this one, `HOPs_priors`, which presents plots of sample prior draws in an animated format called HOPs (Hypothetical Outcome Plots). HOPs are a type of plot that visualizes uncertainty as sets of draws from a distribution, and has been demonstrated to improve multivariate probability estimates (Hullman et al. 2015) and increase sensitivity to the underlying trend in data (Kale et al. 2018) over static representations of uncertainty like error bars. 282 | 283 | #### Static visualization of priors 284 | 285 | **What to change** 286 | 287 | Nothing! Just run this code to check your priors, adjusting prior values above as needed until you find reasonable prior values. Note that you may get a couple of very implausible or even impossible values because our assumption of normally distributed priors assigns a small probability to even very extreme values. If you are concerned by the outcome, you can try rerunning it a few more times to make sure that any implausible values you see don't come up very often. 288 | 289 | **Troubleshooting** 290 | 291 | * In rare cases, you may get a warning that the Markov chains have failed to converge. Chains that fail to converge are a sign that your model is not a good fit to the data. If you get this warning, you should adjust your priors. Your prior distribution may be too narrow, and/or your prior mean is very far from the data. 292 | 293 | * If you get any other errors, first double-check the values you have changed in the code chunks above (i.e. `mydata`, `mydata$x`, `mydata$y`, and prior values). Problems with these values can cause confusing errors downstream. 294 | 295 | ```{r check_priors, results="hide"} 296 | 297 | # generate the prior distribution 298 | m_prior = stan_glm(y ~ x, data = mydata, 299 | prior_intercept = normal(a_prior, a_sd, autoscale = FALSE), 300 | prior = normal(b1_prior, b1_sd, autoscale = FALSE), 301 | prior_PD = TRUE 302 | ) 303 | 304 | # Create the dataframe with fitted draws 305 | prior_draws = mydata %>% #pipe mydata to datagrid() 306 | data_grid(x) %>% #create a fit grid with each level in x, and pipe it to add_fitted_draws() 307 | add_fitted_draws(m_prior, n = 5, seed = 12345) #add n fitted draws from the model to the fit grid 308 | # the seed argument is for reproducibility: it ensures the pseudo-random 309 | # number generator used to pick draws has the same seed on every run, 310 | # so that someone else can re-run this code and verify their output matches 311 | 312 | # Plot the five sample draws 313 | # this function is defined in 'plotting_functions.R', if you wish to customize the aesthetics. 314 | static_prior_plot_2(prior_draws) 315 | ``` 316 | 317 | #### Animated HOPs visualization of priors 318 | 319 | The five static draws above give us some idea of what the prior distribution might look like. Even better, we can animate this graph using HOPs, which are better for visualizing uncertainty and identifying underlying trends. HOPs visualize the same information as the static plot generated above. However, with HOPs we can visualize more draws: with the static plot, we run out of room after only about five draws! 320 | 321 | In this code chunk, we add more draws to the `prior_draws` dataframe, so we have a total of 50 draws to visualize, and then create the animated plot. Each frame of the animation shows a different draw from the prior, starting with the same five draws as the static image above. 322 | 323 | **What to change:** Nothing! 
Just run the code to check your priors. 324 | 325 | ```{r HOPs_priors} 326 | # Animation parameters 327 | n_draws = 50 # the number of draws to visualize in the HOPs (more draws == longer rendering time) 328 | frames_per_second = 2.5 # the speed of the HOPs 329 | # 2.5 frames per second (400ms) is the recommended speed for the HOPs visualization. 330 | # Faster speeds (100ms) have been demonstrated to not work as well. 331 | # See Kale et al. VIS 2018 for more info. 332 | 333 | # Add more prior draws to the data frame for the visualization 334 | more_prior_draws = prior_draws %>% 335 | rbind( 336 | mydata %>% 337 | data_grid(x) %>% 338 | add_fitted_draws(m_prior, n = n_draws - 5, seed = 12345)) 339 | 340 | # Animate the prior draws with HOPs 341 | # this function is defined in 'plotting_functions.R', if you wish to customize the aesthetics. 342 | prior_HOPs = animate(HOPs_plot_2(more_prior_draws), nframes = n_draws * 2, fps = frames_per_second) 343 | prior_HOPs 344 | ``` 345 | 346 | In most cases, your prior HOPs will show a lot of uncertainty: the lines will jump around to a lot of different possible values. At the end of the template, you'll see how this uncertainty is affected when study data is added to the estimates. 347 | 348 | Even when you see a lot of uncertainty in the graph, the individual HOPs frames should mostly show plausible values. You will see some implausible values (usually represented as empty graphs), but if you see many implausible values, it may be a sign that you should adjust your priors in the "Set priors" section. 349 | 350 | 351 | ### Run the model 352 | **What to change:** Nothing! Just run the model. 353 | 354 | **Troubleshooting:** If this code produces errors, check the troubleshooting section under the "Check priors" heading above for a few troubleshooting options. 355 | 356 | ```{r results = "hide", message = FALSE, warning = FALSE} 357 | 358 | m = stan_glm(y ~ x, data = mydata, 359 | prior_intercept = normal(a_prior, a_sd, autoscale = FALSE), 360 | prior = normal(b1_prior, b1_sd, autoscale = FALSE) 361 | ) 362 | 363 | ``` 364 | 365 | 366 | ### Model summary 367 | Here is a summary of the model fit. 368 | 369 | The summary reports diagnostic values that can help you evaluate whether your model is a good fit for the data. For this template, we can keep diagnostics simple: check that your `Rhat` values are very close to 1.0. Larger values mean that your model is not a good fit for the data. This is usually only a problem if the `Rhat` values are greater than 1.1, which is a warning sign that the Markov chains have failed to converge. If this happens, Stan will warn you about the failure, and you should adjust your priors. 370 | 371 | ```{r} 372 | summary(m, digits=3) 373 | ``` 374 | 375 | 376 | ## Visualizing results 377 | #### Static visualization 378 | To plot the results, we again create a fit grid using `data_grid()`, just as we did when we created the HOPs for the prior. Given this fit grid, we can then create any number of visualizations of the results. One way we might want to visualize the results is a static graph with error bars that represent a 95% credible interval. 
For each x position in the fit grid, we can get the posterior mean estimates and 95% credible intervals from the model: 379 | 380 | ```{r static_graph} 381 | 382 | # Create the dataframe with fitted draws 383 | fit = mydata %>%#pipe mydata to datagrid() 384 | data_grid(x) %>% #create a fit grid with each level in x, and pipe it to add_fitted_draws() 385 | add_fitted_draws(m) %>% #add n fitted draws from the model to the fit grid 386 | mean_qi(.width = .95) #add 95% credible intervals 387 | 388 | # Plot the posterior draws 389 | # this function is defined in 'plotting_functions.R', if you wish to customize the aesthetics. 390 | static_post_plot_2(fit) 391 | ``` 392 | 393 | 394 | #### Animated HOPs visualization 395 | To get a better visualization of the uncertainty remaining in the posterior results, we can use animated HOPs for this graph as well. The code to generate the posterior plots is identical to the HOPs code for the priors, except we replace `m_prior` with `m`: 396 | 397 | ```{r} 398 | 399 | p = mydata %>% #pipe mydata to datagrid() 400 | data_grid(x) %>% #create a fit grid with each level in x, and pipe it to add_fitted_draws() 401 | add_fitted_draws(m, n = n_draws, seed = 12345) #add n fitted draws from the model to the fit grid 402 | 403 | #animate the data from p, using the graph aesthetics set in the graph aesthetics code chunk 404 | # this function is defined in 'plotting_functions.R', if you wish to customize the aesthetics. 405 | post_HOPs = animate(HOPs_plot_2(p), nframes = n_draws * 2, fps = frames_per_second) 406 | post_HOPs 407 | 408 | ``` 409 | 410 | ### Comparing the prior and posterior 411 | If we look at our two HOPs plots together - one of the prior distribution, and one of the posterior - we can see how adding information to the model (i.e. the study data) adds more certainty to our estimates, and produces a posterior graph that is more "settled" than the prior graph. 412 | 413 |
**Prior draws**
414 | ```{r echo=F} 415 | prior_HOPs 416 | ``` 417 | 418 |
**Posterior draws**
419 | ```{r echo=F} 420 | post_HOPs 421 | ``` 422 | 423 | ## Finishing up 424 | 425 | **Congratulations!** You made it through your first Bayesian analysis. We hope our templates helped demystify the process. 426 | 427 | If you're interested in learning more about Bayesian statistics, we suggest the following textbooks: 428 | 429 | - Statistical Rethinking, by Richard McElreath. (Website: https://xcelab.net/rm/statistical-rethinking/, including links to YouTube lectures.) 430 | - Doing Bayesian Data Analysis, by John K. Kruschke. (Website: https://sites.google.com/site/doingbayesiandataanalysis/, including R code templates.) 431 | 432 | 433 | The citation for the paper that reports the process of developing and user-testing these templates is below: 434 | 435 | Chanda Phelan, Jessica Hullman, Matthew Kay, and Paul Resnick. 2019. Some Prior(s) Experience Necessary: Templates for Getting Started with Bayesian Analysis. In CHI Conference on Human Factors in Computing Systems Proceedings (CHI 2019), May 4–9, 2019, Glasgow, Scotland UK. ACM, New York, NY, USA, 12 pages. https://doi.org/10.1145/3290605.3300709 436 | -------------------------------------------------------------------------------- /2var-categorical-bar-bayesian_template.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title:
Bayesian analysis template
3 | author:
Phelan, C., Hullman, J., Kay, M. & Resnick, P.
4 | output: 5 | html_document: 6 | theme: flatly 7 | highlight: pygments 8 | --- 9 |
10 |
*Template 4:* 11 | 12 | ![](images/generic_2bar_chart.png) 13 | 14 | **Interaction of two categorical independent variables (bar chart)**
15 | 16 | 17 | ##Introduction 18 | Welcome! This template will guide you through a Bayesian analysis in R, even if you have never done Bayesian analysis before. There are a set of templates, each for a different type of analysis. This template is for data with **two interacting categorical independent variables** and will produce a **bar chart**. If your analysis includes a **two-way ANOVA**, this might be the right template for you. 19 | 20 | This template assumes you have basic familiarity with R. Once complete, this template will produce a summary of the analysis, complete with parameter estimates and credible intervals, and two animated HOPs (see Hullman, Resnick, Adar 2015 DOI: 10.1371/journal.pone.0142444 and Kale, Nguyen, Kay, and Hullman VIS 2018 for more information) for both your prior and posterior estimates. 21 | 22 | This Bayesian analysis focuses on producing results in a form that are easily interpretable, even to nonexperts. The credible intervals produced by Bayesian analysis are the analogue of confidence intervals in traditional null hypothesis significance testing (NHST). A weakness of NHST confidence intervals is that they are easily misinterpreted. Many people naturally interpret an NHST 95% confidence interval to mean that there is a 95% chance that the true parameter value lies somewhere in that interval; in fact, it means that if the experiment were repeated 100 times, 95 of the resulting confidence intervals would include the true parameter value. The Bayesian credible interval sidesteps this complication by providing the intuitive meaning: a 95% chance that the true parameter value lies somewhere in that interval. To further support intuitive interpretations of your results, this template also produces animated HOPs, a type of plot that is more effective than visualizations such as error bars in helping people make accurate judgments about probability distributions. 23 | 24 | This set of templates supports a few types of statistical analysis. (In future work, this list of supported statistical analyses will be expanded.) For clarity, each type has been broken out into a separate template, so be sure to select the right template before you start! A productive way to choose which template to use is to think about what type of chart you would like to produce to summarize your data. Currently, the templates support the following: 25 | 26 | *One independent variable:* 27 | 28 | 1. Categorical; bar graph (e.g. t-tests, one-way ANOVA) 29 | 30 | 2. Ordinal; line graph (e.g. t-tests, one-way ANOVA) 31 | 32 | 3. Continuous; line graph (e.g. linear regression) 33 | 34 | *Two interacting independent variables:* 35 | 36 | 4. **Two categorical; bar graph (e.g. two-way ANOVA)** 37 | 38 | 5. One categorical, one ordinal; line graph (e.g. two-way ANOVA) 39 | 40 | 6. One categorical, one continuous; line graph (e.g. linear regression with multiple lines) 41 | 42 | Note that this template fits your data to a model that assumes normally distributed error terms. (This is the same assumption underlying t-tests, ANOVA, etc.) This template requires you to have already run diagnostics to determine that your data is consistent with this assumption; if you have not, the results may not be valid. 43 | 44 | Once you have selected your template, to complete the analysis, please follow along this template. For each code chunk, you may need to make changes to customize the code for your own analysis. 
In those places, the code chunk will be preceded by a list of things you need to change (with the heading "What to change"), and each line that needs to be customized will also include the comment `#CHANGE ME` within the code chunk itself. You can run each code chunk independently during debugging; when you're finished, you can knit the document to produce the complete document. 45 | 46 | Good luck! 47 | 48 | ###Tips before you start 49 | 50 | 1. Make sure you have picked the right template! (See above.) 51 | 52 | 2. Use the pre-knitted HTML version of this template as a reference as you work (we've included all the HTML files, in the folder `html_outputs`. The formatting makes the template easier to follow. You can also knit this document as you work once you have completed set up. 53 | 54 | 3. Make sure you are using the most recent version of the templates. Updates can be found at https://github.com/cdphelan/bayesian-template. 55 | 56 | ###Sample dataset 57 | This template comes prefilled with an example dataset from Moser et al. (DOI: 10.1145/3025453.3025778), which examines choice overload in the context of e-commerce. The study examined the relationship between choice satisfaction (measured at a 7-point Likert scale), the number of product choices presented on a webpage, and whether the participant is a decision "maximizer" (a person who examines all options and tries to choose the best) or a "satisficer" (a person who selects the first option that is satisfactory). In this template, we analyze the relationship between choice set size, which we treat as a categorical variable in this template with possible values [12,24,40,50,60,72]; type of decision-making (maximizer or satisficer), a two-level categorical variable; and choice satisfaction, which we treat as a continuous variable with values that can fall in the range [1,7]. 58 | 59 | 60 | ##Set up 61 | ###Requirements 62 | To run this template, we assume that you are using RStudio, and you have the most recent version of R installed. (This template was built with R version 3.5.1.) 63 | 64 | This template works best if you first open the file `bayesian-template.Rproj` from the code repository as a project in RStudio to get started, and then open the individual `.Rmd` template files after this. 65 | 66 | ###Libraries 67 | **Installation:** 68 | If this is your first time using the template, you may need to install libraries. 69 | 70 | 1. **If you are using Windows,** first you will need to manually install RStan and Rtools. Follow the instructions [here](https://github.com/stan-dev/rstan/wiki/Installing-RStan-on-Windows) to install both. 71 | 72 | 2. On both Mac and Windows, uncomment the line with `install.packages()` to install the required packages. This only needs to be done once. 73 | 74 | **Troubleshooting:** 75 | You may have some trouble installing the packages, especially if you are on Windows. Regardless of OS, if you have any issues installing these packages, try one or more of the following troubleshooting options: 76 | 77 | 1. Restart R. 78 | 79 | 2. Make sure you are running the most recent version of R (3.5.1, as of the writing of this template). 80 | 81 | 3. Manually install RStan and Rtools, following the instructions [here](https://github.com/stan-dev/rstan/wiki/RStan-Getting-Started). 82 | 83 | 4. If you have tried the above and you are still getting error messages like `there is no package called [X]`, try installing the missing package(s) manually using the RStudio interface under Tools > Install Packages... 
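Optionally (this check is an addition, not part of the original template), you can confirm that everything installed correctly before moving on. The snippet below simply loops over the package names loaded in the next chunk and reports any that are still missing:

```{r check_installation}
# Report which of the required packages (loaded in the next chunk) are missing, if any.
required_pkgs = c("ggplot2", "rstanarm", "tidyverse", "tidybayes", "modelr", "gganimate")
missing_pkgs = required_pkgs[!sapply(required_pkgs, requireNamespace, quietly = TRUE)]
if (length(missing_pkgs) > 0) {
  message("Still missing: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All required packages are installed.")
}
```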
84 | 85 | ```{r libraries, message=FALSE, warning=FALSE} 86 | 87 | knitr::opts_chunk$set(fig.align="center") 88 | 89 | # install.packages(c("ggplot2", "rstanarm", "tidyverse", "tidybayes", "modelr", "gganimate")) 90 | 91 | library(rstanarm) #bayesian analysis package 92 | library(tidyverse) #tidy datascience commands 93 | library(tidybayes) #tidy data + ggplot workflow 94 | library(modelr) #tidy pipelines for modeling 95 | library(ggplot2) #plotting package 96 | library(gganimate) #animate ggplots 97 | 98 | # We import all of our plotting functions from this separate R file to keep the code in 99 | # this template easier to read. You can edit this file to customize aesthetics of the plots 100 | # if desired. Just be sure to run this line again after you make edits! 101 | source('plotting_functions.R') 102 | 103 | theme_set(theme_light()) # set the ggplot theme for all plots 104 | 105 | ``` 106 | 107 | 108 | ###Read in data 109 | **What to change** 110 | 111 | 1. mydata: Read in your data. 112 | 113 | ```{r data_prep} 114 | 115 | mydata = read.csv('datasets/choc_cleaned_data.csv') #CHANGE ME 1 116 | 117 | ``` 118 | 119 | 120 | ## Specify model 121 | We'll fit the following model: `stan_glm(y ~ x1 * x2)`, where both $x_1$ and $x_2$ are categorical variables. This specifies a linear regression with dummy variables for each level in $x_1$ and $x_2$, plus interaction terms for each combination of $x_1$ and $x_2$. **This is equivalent to ANOVA.** So for example, for a regression where $x_1$ has three levels and $x_2$ has two levels, each $y_i$ is drawn from a normal distribution with mean equal to $a + b*dummy$ (where $b*dummy$ is the appropriate dummy term) and standard deviation equal to `sigma` ($\sigma$): 122 | 123 | 124 | $$ 125 | \begin{aligned} 126 | y_i \sim Normal(a + b_{x1a}dummy_{x1a} + b_{x1b}dummy_{x1b} + \\ 127 | b_{x2}dummy_{x2} + \\ 128 | b_{x2}dummy_{x2} * b_{x1a}dummy_{x1a} + \\ 129 | b_{x2}dummy_{x2} * b_{x1b}dummy_{x1b}, \\\sigma) 130 | \end{aligned} 131 | $$ 132 | 133 | Choose your independent and dependent variables. These are the variables that will correspond to the x and y axis on the final plots. 134 | 135 | **What to change** 136 | 137 | 2. mydata\$x1: Select which variables will appear on the x-axis of your plots. 138 | 139 | 3. mydata\$x2: Select the second independent variable. Each level of this variable will correspond to a different bar color in the output plot. 140 | 141 | 4. mydata\$y: Select which variables will appear on the y-axis of your plots. 142 | 143 | 5. x_lab: Label your plots' x-axes. 144 | 145 | 6. y_lab: Label your plots' y-axes. 146 | 147 | ```{r specify_model} 148 | 149 | #select your independent and dependent variables 150 | mydata$x1 = as.factor(mydata$num_products_displayed) #CHANGE ME 2 151 | mydata$x2 = mydata$sat_max #CHANGE ME 3 152 | mydata$y = mydata$satis_Q1 #CHANGE ME 4 153 | 154 | # label the axes on the plots 155 | x_lab = "Choices" #CHANGE ME 5 156 | y_lab = "Satisfaction" #CHANGE ME 6 157 | 158 | ``` 159 | 160 | 161 | ### Set priors 162 | In this section, you will set priors for your model. Setting priors thoughtfully is important to any Bayesian analysis, especially if you have a small sample of data that you will use for fitting for your model. The priors express your best prior belief, *before seeing any data*, of reasonable values for the model parameters. 163 | 164 | Ideally, you will have previous literature from which to draw these prior beliefs. 
If no previous studies exist, you can instead assign "weakly informative priors" that only minimally restrict the model, excluding only values that are implausible or impossible. We have provided examples of how to set both weak and strong priors below. 165 | 166 | To check the plausibility of your priors, use the code section after this one to generate a graph of five sample draws from your priors to check if the values generated are reasonable. 167 | 168 | Our model has the following parameters: 169 | 170 | a. the overall mean y-value across all levels of categorical variable x 171 | 172 | b. the mean y-value for each of the individual levels 173 | 174 | c. the standard deviation of the normally distributed error term 175 | 176 | To simplify things, we limit the number of different prior beliefs you can have. Think of the first level of the categorical variable as specifying the control condition of an experiment, and all of the other levels being treatment conditions in the experiment. We let you specify a prior belief about the plausible values of mean in the control condition (a), and then we let you set a prior belief about the plausible effect size (b). You have to specify the same plausible effect sizes for all conditions, unless you dig deeper into our code. 177 | 178 | To simplify things further, we only let you specify beliefs about these parameters in the form of a normal distribution. Thus, you will specify what you think is the most likely value for the parameter (the mean), and a standard deviation. You will be expressing a belief that you were 95% certain (before looking at any data) that the true value of the parameter is within two standard deviations of the mean. 179 | 180 | Finally, our modeling system, `stan_glm()`, will automatically set priors for the last parameter, the standard deviation of the normally distributed error term for the model overall (c). 181 | 182 | To explore more about priors, you can experiment with different values for these parameters and use the following section, *Checking priors with visualizations*, to see how different parameter values change the prior distribution. 183 | 184 | Want more examples? Check your understanding of how to set priors in this [quizlet](https://cdphelan.shinyapps.io/check_understanding_priors/), which includes several more examples of how to set both strong and weak priors. 185 | 186 | **What to change** 187 | 188 | **If you are using weakly informative priors (i.e. priors not informed by previous literature):** 189 | 190 | *Remember: **do not** use any of your data from the current study to inform prior values.* 191 | 192 | 7. a_prior: Select the control condition mean. 193 | 194 | 8. a_prior_max: Select the maximum plausible value of the control condition data. (We will use this to calculate the sd of `a`.) 195 | 196 | 9. b1_prior: Select the effect size mean. 197 | 198 | 10. b1_sd: Select the effect size standard deviation. 199 | 200 | 11. You should also change the comments in the code below to explain your choice of priors. 201 | 202 | **If you are using strong priors (i.e. priors from previous literature):** 203 | 204 | Skip this code chunk and set your priors in the next code chunk. For clarity, comment out everything in this code chunk. 205 | 206 | ```{r} 207 | 208 | # CHANGE THIS COMMENT EXPLAINING YOUR CHOICE OF PRIORS (11) 209 | # In our example dataset, y-axis scores can be in the range [1, 7]. 
210 | # In the absence of other information, we set the parameter mean as 4 211 | # (the mean of the range [1,7]) and the maximum possible value as 7. 212 | # From exploratory analysis, we know the mean score and sd for y in our 213 | # dataset but we *DO NOT* use this information because priors *CANNOT* 214 | # include any information from the current study. 215 | 216 | a_prior = 4 # CHANGE ME 7 217 | a_prior_max = 7 # CHANGE ME 8 218 | 219 | # With a normal distribution, we can't completely rule out 220 | # impossible values, but we choose an sd that assigns less than 221 | # 5% probability to those impossible values. Remember that in a normal 222 | # distribution, 95% of the data lies within 2 sds of the mean. Therefore, 223 | # we calculate the value of 1 sd by finding the maximum amount our data 224 | # can vary from the mean (a_prior_max - a_prior) and divide that in half. 225 | 226 | a_sd = (a_prior_max - a_prior) / 2 # do not change 227 | 228 | # CHANGE THIS COMMENT EXPLAINING YOUR CHOICE OF PRIORS (11) 229 | # In our example dataset, we do not have a strong hypothesis that the treatment 230 | # conditions will be higher or lower than the control, so we set the mean of 231 | # the effect size parameters to be 0. In the absence of other information, we 232 | # set the sd to be the same as for the control condition. 233 | 234 | b1_prior = 0 # CHANGE ME 9 235 | b1_sd = a_sd # CHANGE ME 10 236 | 237 | ``` 238 | 239 | 240 | **What to change** 241 | 242 | **If you are using weakly informative priors:** 243 | 244 | Do not use this code chunk; use the code chunk above to set your priors instead. Make sure everything in this code chunk is commented out so that your priors are not overwritten. 245 | 246 | **If you are using strong priors (i.e. priors from previous literature):** 247 | 248 | *Remember: **do not** use any of your data from the current study to set prior values.* 249 | 250 | First, make sure to uncomment all four variables set in this code chunk. 251 | 252 | 7. a_prior: Select the control condition mean. 253 | 254 | 8. a_sd: Select the control condition standard deviation. 255 | 256 | 9. b1_prior: Select the effect size mean. 257 | 258 | 10. b1_sd: Select the effect size standard deviation. 259 | 260 | 11. You should also change the comments in the code below to explain your choice of priors. 261 | 262 | ```{r} 263 | 264 | # CHANGE THIS COMMENT EXPLAINING YOUR CHOICE OF PRIORS (11) 265 | # In our example dataset, y-axis scores can be in the range [1, 7]. 266 | # To choose our priors, we use the results from a previous study 267 | # where participants completed an identical task (choosing between 268 | # different chocolate bars). For our overall prior mean, we pool the mean 269 | # satisfaction scores from all conditions in the previous study to get 270 | # an overall mean of 5.86. We set a_sd so that 5.86 +/- 2 sds encompasses 271 | # the 95% confidence intervals from the previous study results. 272 | 273 | # a_prior = 5.86 # CHANGE ME 7 274 | # a_sd = 0.6 # CHANGE ME 8 275 | 276 | # CHANGE THIS COMMENT EXPLAINING YOUR CHOICE OF PRIORS (11) 277 | # In our example dataset, we do not have guidance from previous literature 278 | # to set an exact effect size, but we do know that satisficers (the "treatment" 279 | # condition) are likely to have higher mean satisfaction than the maximizers 280 | # (the "control" condition), so we set an effect size parameter mean that 281 | # results in a 1 point increase in satisfaction for satisficers. 
To reflect 282 | # the uncertainty in this effect size, we select a broad sd so that there is 283 | # a ~20% chance that the effect size will be negative. 284 | 285 | # b1_prior = 1 # CHANGE ME 9 286 | # b1_sd = 1 # CHANGE ME 10 287 | 288 | ``` 289 | 290 | 291 | ### Checking priors with visualizations 292 | Next, you'll want to check your priors by running this code chunk. It will produce a set of five sample plots drawn from the priors you set in the previous section, so you can check to see if the values generated are reasonable. 293 | 294 | **What to change** 295 | 296 | Nothing! Just run this code to check your priors, adjusting prior values above as needed until you find reasonable prior values. Note that you may get a couple of very implausible or even impossible values because our assumption of normally distributed priors assigns a small probability to even very extreme values. If you are concerned by the outcome, you can try rerunning it a few more times to make sure that any implausible values you see don't come up very often. 297 | 298 | **Troubleshooting** 299 | 300 | * In rare cases, you may get a warning that the Markov chains have failed to converge. Chains that fail to converge are a sign that your model is not a good fit to the data. If you get this warning, you should adjust your priors. Your prior distribution may be too narrow, and/or your prior mean is very far from the data. 301 | 302 | * If you get any other errors, first double-check the values you have changed in the code chunks above (i.e. `mydata`, `mydata$x1`, `mydata$x2`, `mydata$y`, and prior values). Problems with these values can cause confusing errors downstream. 303 | 304 | ```{r check_priors, results="hide"} 305 | 306 | # generate the prior distribution 307 | m_prior = stan_glm(y ~ x1*x2, data = mydata, 308 | prior_intercept = normal(a_prior, a_sd, autoscale = FALSE), 309 | prior = normal(b1_prior, b1_sd, autoscale = FALSE), 310 | prior_PD = TRUE 311 | ) 312 | 313 | # Create the dataframe with fitted draws 314 | prior_draws = mydata %>% #pipe mydata to datagrid() 315 | data_grid(x1, x2) %>% #create a fit grid with each level in x, and pipe it to add_fitted_draws() 316 | add_fitted_draws(m_prior, n = 5, seed = 12345) #add n fitted draws from the model to the fit grid 317 | # the seed argument is for reproducibility: it ensures the pseudo-random 318 | # number generator used to pick draws has the same seed on every run, 319 | # so that someone else can re-run this code and verify their output matches 320 | 321 | # Plot the five sample draws 322 | # this function is defined in 'plotting_functions.R', if you wish to customize the aesthetics. 323 | static_prior_plot_4(prior_draws) 324 | ``` 325 | 326 | #### Animated visualization of priors 327 | 328 | The five static draws above give use some idea of what the prior distribution might look like. Even better, we can animate this graph using HOPs, which are better for visualizing uncertainty and identifying underlying trends. HOPs visualizes the same information as the static plot generated above. However, with HOPs we can visualize more draws: with the static plot, we run out of room after only about five draws! 329 | 330 | In this code chunk, we add more draws to the `prior_draws` dataframe, so we have a total of 50 draws to visualize, and then create the animated plot. Each frame of the animation shows a different draw from the prior, starting with the same five draws as the static image above. 331 | 332 | **What to change:** Nothing! 
Just run the code to check your priors. 333 | 334 | ```{r HOPs_priors} 335 | # Animation parameters 336 | n_draws = 50 # the number of draws to visualize in the HOPs 337 | frames_per_second = 2.5 # the speed of the HOPs 338 | # 2.5 frames per second (400ms) is the recommended speed for the HOPs visualization. 339 | # Faster speeds (100ms) have been demonstrated to not work as well. 340 | # See Kale et al. VIS 2018 for more info. 341 | 342 | # Add more prior draws to the data frame for the visualization 343 | more_prior_draws = prior_draws %>% 344 | rbind( 345 | mydata %>% 346 | data_grid(x1,x2) %>% 347 | add_fitted_draws(m_prior, n = n_draws - 5, seed = 12345)) 348 | 349 | # Animate the prior draws with HOPs 350 | # this function is defined in 'plotting_functions.R', if you wish to customize the aesthetics. 351 | prior_HOPs = animate(HOPs_plot_4(more_prior_draws), nframes = n_draws * 2, fps = frames_per_second) 352 | prior_HOPs 353 | ``` 354 | 355 | In most cases, your prior HOPs will show a lot of uncertainty: the bars will jump around to a lot of different possible values. At the end of the template, you'll see how this uncertainty is affected when study data is added to the estimates. 356 | 357 | Even when you see a lot of uncertainty in the graph, the individual HOPs frames should mostly show plausible values. You will see some implausible values (usually represented as empty graphs, or bars that reach/exceed the plot's maximum y-value), but if you see many implausible values, it may be a sign that you should adjust your priors in the "Set priors" section. 358 | 359 | 360 | ### Run the model 361 | There's nothing you have to change here. Just run the model. 362 | 363 | **Troubleshooting:** If this code produces errors, check the troubleshooting section under the "Check priors" heading above for a few troubleshooting options. 364 | 365 | ```{r results = "hide", message = FALSE, warning = FALSE} 366 | 367 | m = stan_glm(y ~ x1*x2, data = mydata, 368 | prior_intercept = normal(a_prior, a_sd, autoscale = FALSE), 369 | prior = normal(b1_prior, b1_sd, autoscale = FALSE) 370 | ) 371 | 372 | ``` 373 | 374 | 375 | ## Model summary 376 | Here is a summary of the model fit. 377 | 378 | The summary reports diagnostic values that can help you evaluate whether your model is a good fit for the data. For this template, we can keep diagnostics simple: check that your `Rhat` values are very close to 1.0. Larger values mean that your model is not a good fit for the data. This is usually only a problem if the `Rhat` values are greater than 1.1, which is a warning sign that the Markov chains have failed to converge. In this happens, Stan will warn you about the failure, and you should adjust your priors. 379 | 380 | ```{r} 381 | summary(m, digits=3) 382 | ``` 383 | 384 | 385 | ## Visualizing results 386 | To plot the results, we again create a fit grid using `data_grid()`, just as we did when we created the HOPs for the prior. Given this fit grid, we can then create any number of visualizations of the results. One way we might want to visualize the results is a static graph with error bars that represent a 95% credible interval. 
For each x position in the fit grid, we can get the posterior mean estimates and 95% credible intervals from the model: 387 | 388 | ```{r static_graph} 389 | 390 | # Create the dataframe with fitted draws 391 | fit = mydata %>%#pipe mydata to datagrid() 392 | data_grid(x1,x2) %>% #create a fit grid with each level in x, and pipe it to add_fitted_draws() 393 | add_fitted_draws(m) %>% #add n fitted draws from the model to the fit grid 394 | mean_qi(.width = .95) #add 95% credible intervals 395 | 396 | # Plot the posterior draws 397 | # this function is defined in 'plotting_functions.R', if you wish to customize the aesthetics. 398 | static_post_plot_4(fit) 399 | ``` 400 | 401 | 402 | ### Sampling from the posterior 403 | To get a better visualization of the uncertainty remaining in the posterior results, we can use animated HOPs for this graph as well. The code to generate the posterior plots is identical to the HOPs code for the priors, except we replace `m_prior` with `m`: 404 | 405 | ```{r} 406 | 407 | p = mydata %>% #pipe mydata to datagrid() 408 | data_grid(x1, x2) %>% #create a fit grid with each level in x, and pipe it to add_fitted_draws() 409 | add_fitted_draws(m, n = n_draws, seed = 12345) #add n fitted draws from the model to the fit grid 410 | 411 | # animate the data from p, using the graph aesthetics set in the graph aesthetics code chunk 412 | # this function is defined in 'plotting_functions.R', if you wish to customize the aesthetics. 413 | post_HOPs = animate(HOPs_plot_4(p), nframes = n_draws * 2, fps = frames_per_second) 414 | post_HOPs 415 | 416 | ``` 417 | 418 | ### Comparing the prior and posterior 419 | If we look at our two HOPs plots together - one of the prior distribution, and one of the posterior - we can see how adding information to the model (i.e. the study data) adds more certainty to our estimates, and produces a posterior graph that is more "settled" than the prior graph. 420 | 421 |
**Prior draws**
422 | ```{r echo=F} 423 | prior_HOPs 424 | ``` 425 | 426 |
**Posterior draws**
427 | ```{r echo=F} 428 | post_HOPs 429 | ``` 430 | 431 | ## Finishing up 432 | 433 | **Congratulations!** You made it through your first Bayesian analysis. We hope our templates helped demystify the process. 434 | 435 | If you're interested in learning more about Bayesian statistics, we suggest the following textbooks: 436 | 437 | - Statistical Rethinking, by Richard McElreath. (Website: https://xcelab.net/rm/statistical-rethinking/, including links to YouTube lectures.) 438 | - Doing Bayesian Data Analysis, by John K. Kruschke. (Website: https://sites.google.com/site/doingbayesiandataanalysis/, including R code templates.) 439 | 440 | 441 | The citation for the paper that reports the process of developing and user-testing these templates is below: 442 | Chanda Phelan, Jessica Hullman, Matthew Kay, and Paul Resnick. 2019. Some Prior(s) Experience Necessary: Templates for Getting Started with Bayesian Analysis. In CHI Conference on Human Factors in Computing Systems Proceedings (CHI 2019), May 4–9, 2019, Glasgow, Scotland UK. ACM, New York, NY, USA, 12 pages. https://doi.org/10.1145/3290605.3300709 443 | -------------------------------------------------------------------------------- /2var-categorical_ordinal-line-bayesian_template.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title:
Bayesian analysis template
3 | author:
Phelan, C., Hullman, J., Kay, M. & Resnick, P.
4 | output: 5 | html_document: 6 | theme: flatly 7 | highlight: pygments 8 | --- 9 |
10 |
*Template 5:* 11 | 12 | ![](images/generic_2line_chart.png) 13 | 14 | **Interaction of one categorical & one ordinal independent variable (line graph)**
15 | 16 | 17 | ##Introduction 18 | Welcome! This template will guide you through a Bayesian analysis in R, even if you have never done Bayesian analysis before. There are a set of templates, each for a different type of analysis. This template is for data with **two interacting independent variables, one categorical and one ordinal** and will produce a **line graph**. If your analysis includes a **two-way ANOVA**, this might be the right template for you. In most cases, we *do not recommend* using line charts for this type of analysis; a bar chart is usually the better option. 19 | 20 | This template assumes you have basic familiarity with R. Once complete, this template will produce a summary of the analysis, complete with parameter estimates and credible intervals, and two animated HOPs (see Hullman, Resnick, Adar 2015 DOI: 10.1371/journal.pone.0142444 and Kale, Nguyen, Kay, and Hullman VIS 2018 for more information) for both your prior and posterior estimates. 21 | 22 | This Bayesian analysis focuses on producing results in a form that are easily interpretable, even to nonexperts. The credible intervals produced by Bayesian analysis are the analogue of confidence intervals in traditional null hypothesis significance testing (NHST). A weakness of NHST confidence intervals is that they are easily misinterpreted. Many people naturally interpret an NHST 95% confidence interval to mean that there is a 95% chance that the true parameter value lies somewhere in that interval; in fact, it means that if the experiment were repeated 100 times, 95 of the resulting confidence intervals would include the true parameter value. The Bayesian credible interval sidesteps this complication by providing the intuitive meaning: a 95% chance that the true parameter value lies somewhere in that interval. To further support intuitive interpretations of your results, this template also produces animated HOPs, a type of plot that is more effective than visualizations such as error bars in helping people make accurate judgments about probability distributions. 23 | 24 | This set of templates supports a few types of statistical analysis. (In future work, this list of supported statistical analyses will be expanded.) For clarity, each type has been broken out into a separate template, so be sure to select the right template before you start! A productive way to choose which template to use is to think about what type of chart you would like to produce to summarize your data. Currently, the templates support the following: 25 | 26 | *One independent variable:* 27 | 28 | 1. Categorical; bar graph (e.g. t-tests, one-way ANOVA) 29 | 30 | 2. Ordinal; line graph (e.g. t-tests, one-way ANOVA) 31 | 32 | 3. Continuous; line graph (e.g. linear regression) 33 | 34 | *Two interacting independent variables:* 35 | 36 | 4. Two categorical; bar graph (e.g. two-way ANOVA) 37 | 38 | 5. **One categorical, one ordinal; line graph (e.g. two-way ANOVA)** 39 | 40 | 6. One categorical, one continuous; line graph (e.g. linear regression with multiple lines) 41 | 42 | Note that this template fits your data to a model that assumes normally distributed error terms. (This is the same assumption underlying t-tests, ANOVA, etc.) This template requires you to have already run diagnostics to determine that your data is consistent with this assumption; if you have not, the results may not be valid. 43 | 44 | Once you have selected your template, to complete the analysis, please follow along this template. 
For each code chunk, you may need to make changes to customize the code for your own analysis. In those places, the code chunk will be preceded by a list of things you need to change (with the heading "What to change"), and each line that needs to be customized will also include the comment `#CHANGE ME` within the code chunk itself. You can run each code chunk independently during debugging; when you're finished, you can knit the document to produce the complete document. 45 | 46 | Good luck! 47 | 48 | ###Tips before you start 49 | 50 | 1. Make sure you have picked the right template! (See above.) 51 | 52 | 2. Use the pre-knitted HTML version of this template as a reference as you work (we've included all the HTML files, in the folder `html_outputs`. The formatting makes the template easier to follow. You can also knit this document as you work once you have completed set up. 53 | 54 | 3. Make sure you are using the most recent version of the templates. Updates can be found at https://github.com/cdphelan/bayesian-template. 55 | 56 | ###Sample dataset 57 | This template comes prefilled with an example dataset from Moser et al. (DOI: 10.1145/3025453.3025778), which examines choice overload in the context of e-commerce. The study examined the relationship between choice satisfaction (measured at a 7-point Likert scale), the number of product choices presented on a webpage, and whether the participant is a decision "maximizer" (a person who examines all options and tries to choose the best) or a "satisficer" (a person who selects the first option that is satisfactory). In this template, we analyze the relationship between choice set size, which we treat as an ordinal variable in this template with possible values [12,24,40,50,60,72]; type of decision-making (maximizer or satisficer), a two-level categorical variable; and choice satisfaction, which we treat as a continuous variable with values that can fall in the range [1,7]. 58 | 59 | ##Set up 60 | ###Requirements 61 | To run this template, we assume that you are using RStudio, and you have the most recent version of R installed. (This template was built with R version 3.5.1.) 62 | 63 | This template works best if you first open the file `bayesian-template.Rproj` from the code repository as a project in RStudio to get started, and then open the individual `.Rmd` template files after this. 64 | 65 | ###Libraries 66 | **Installation:** 67 | If this is your first time using the template, you may need to install libraries. 68 | 69 | 1. **If you are using Windows,** first you will need to manually install RStan and Rtools. Follow the instructions [here](https://github.com/stan-dev/rstan/wiki/Installing-RStan-on-Windows) to install both. 70 | 71 | 2. On both Mac and Windows, uncomment the line with `install.packages()` to install the required packages. This only needs to be done once. 72 | 73 | **Troubleshooting:** 74 | You may have some trouble installing the packages, especially if you are on Windows. Regardless of OS, if you have any issues installing these packages, try one or more of the following troubleshooting options: 75 | 76 | 1. Restart R. 77 | 78 | 2. Make sure you are running the most recent version of R (3.5.1, as of the writing of this template). 79 | 80 | 3. Manually install RStan and Rtools, following the instructions [here](https://github.com/stan-dev/rstan/wiki/RStan-Getting-Started). 81 | 82 | 4. 
If you have tried the above and you are still getting error messages like `there is no package called [X]`, try installing the missing package(s) manually using the RStudio interface under Tools > Install Packages... 83 | 84 | ```{r libraries, message=FALSE, warning=FALSE} 85 | 86 | knitr::opts_chunk$set(fig.align="center") 87 | 88 | # install.packages(c("ggplot2", "rstanarm", "tidyverse", "tidybayes", "modelr", "gganimate")) 89 | 90 | library(rstanarm) #bayesian analysis package 91 | library(tidyverse) #tidy datascience commands 92 | library(tidybayes) #tidy data + ggplot workflow 93 | library(modelr) #tidy pipelines for modeling 94 | library(ggplot2) #plotting package 95 | library(gganimate) #animate ggplots 96 | 97 | # We import all of our plotting functions from this separate R file to keep the code in 98 | # this template easier to read. You can edit this file to customize aesthetics of the plots 99 | # if desired. Just be sure to run this line again after you make edits! 100 | source('plotting_functions.R') 101 | 102 | theme_set(theme_light()) # set the ggplot theme for all plots 103 | 104 | ``` 105 | 106 | ###Read in data 107 | **What to change** 108 | 109 | 1. mydata: Read in your data. 110 | 111 | ```{r data_prep} 112 | 113 | mydata = read.csv('datasets/choc_cleaned_data.csv') #CHANGE ME 1 114 | 115 | ``` 116 | 117 | 118 | ## Specify model 119 | We'll fit the following model: `stan_glm(y ~ x1 * x2)`, where $x_1$ is an ordinal variable and $x_2$ is a categorical variable. This specifies a linear regression with dummy variables for each level in $x_1$ and $x_2$, plus interaction terms for each combination of $x_1$ and $x_2$. **This is equivalent to ANOVA.** So for example, for a regression where $x_1$ has three levels and $x_2$ has two levels, each $y_i$ is drawn from a normal distribution with mean equal to $a + b*dummy$ (where $b*dummy$ is the appropriate dummy term) and standard deviation equal to `sigma` ($\sigma$): 120 | 121 | 122 | $$ 123 | \begin{aligned} 124 | y_i \sim Normal(a + b_{x1a}dummy_{x1a} + b_{x1b}dummy_{x1b} + \\ 125 | b_{x2}dummy_{x2} + \\ 126 | b_{x2}dummy_{x2} * b_{x1a}dummy_{x1a} + \\ 127 | b_{x2}dummy_{x2} * b_{x1b}dummy_{x1b}, \\\sigma) 128 | \end{aligned} 129 | $$ 130 | 131 | Choose your independent and dependent variables. These are the variables that will correspond to the x and y axis on the final plots. 132 | 133 | **What to change** 134 | 135 | 2. mydata\$x1: Select which variables will appear on the x-axis of your plots. This is your ordered variable. 136 | 137 | 3. mydata\$x2: Select the second independent variable, the categorical variable. You will have one line in the output graph for each level of this variable. 138 | 139 | 4. mydata\$y: Select which variables will appear on the y-axis of your plots. 140 | 141 | 5. x_lab: Label your plots' x-axes. 142 | 143 | 6. y_lab: Label your plots' y-axes. 144 | 145 | ```{r specify_model} 146 | 147 | #select your independent and dependent variables 148 | mydata$x1 = as.factor(mydata$num_products_displayed) #CHANGE ME 2 149 | mydata$x2 = mydata$sat_max #CHANGE ME 3 150 | mydata$y = mydata$satis_Q1 #CHANGE ME 4 151 | 152 | # label the axes on the plots 153 | x_lab = "Choices" #CHANGE ME 5 154 | y_lab = "Satisfaction" #CHANGE ME 6 155 | 156 | ``` 157 | 158 | 159 | ###Set priors 160 | In this section, you will set priors for your model. Setting priors thoughtfully is important to any Bayesian analysis, especially if you have a small sample of data that you will use for fitting for your model. 
The priors express your best prior belief, *before seeing any data*, of reasonable values for the model parameters. 161 | 162 | Ideally, you will have previous literature from which to draw these prior beliefs. If no previous studies exist, you can instead assign "weakly informative priors" that only minimally restrict the model, excluding only values that are implausible or impossible. We have provided examples of how to set both weak and strong priors below. 163 | 164 | To check the plausibility of your priors, use the code section after this one to generate a graph of five sample draws from your priors to check if the values generated are reasonable. 165 | 166 | Our model has the following parameters: 167 | 168 | a. the overall mean y-value across all levels of ordinal variable x 169 | 170 | b. the mean y-value for each of the individual levels 171 | 172 | c. the standard deviation of the normally distributed error term 173 | 174 | To simplify things, we limit the number of different prior beliefs you can have. Think of the first level of the ordinal variable as specifying the control condition of an experiment, and all of the other levels being treatment conditions in the experiment. We let you specify a prior belief about the plausible values of mean in the control condition (a), and then we let you set a prior belief about the plausible effect size (b). You have to specify the same plausible effect sizes for all conditions, unless you dig deeper into our code. 175 | 176 | To simplify things further, we only let you specify beliefs about these parameters in the form of a normal distribution. Thus, you will specify what you think is the most likely value for the parameter (the mean), and a standard deviation. You will be expressing a belief that you were 95% certain (before looking at any data) that the true value of the parameter is within two standard deviations of the mean. 177 | 178 | Finally, our modeling system, `stan_glm()`, will automatically set priors for the last parameter, the standard deviation of the normally distributed error term for the model overall (c). 179 | 180 | To explore more about priors, you can experiment with different values for these parameters and use the following section, *Checking priors with visualizations*, to see how different parameter values change the prior distribution. 181 | 182 | Want more examples? Check your understanding of how to set priors in this [quizlet](https://cdphelan.shinyapps.io/check_understanding_priors/), which includes several more examples of how to set both strong and weak priors. 183 | 184 | **What to change** 185 | 186 | **If you are using weakly informative priors (i.e. priors not informed by previous literature):** 187 | 188 | *Remember: **do not** use any of your data from the current study to inform prior values.* 189 | 190 | 7. a_prior: Select the control condition mean. 191 | 192 | 8. a_prior_max: Select the maximum plausible value of the control condition data. (We will use this to calculate the sd of `a`.) 193 | 194 | 9. b1_prior: Select the effect size mean. 195 | 196 | 10. b1_sd: Select the effect size standard deviation. 197 | 198 | 11. You should also change the comments in the code below to explain your choice of priors. 199 | 200 | **If you are using strong priors (i.e. priors from previous literature):** 201 | 202 | Skip this code chunk and set your priors in the next code chunk. For clarity, comment out everything in this code chunk. 
203 | 204 | ```{r} 205 | 206 | # CHANGE THIS COMMENT EXPLAINING YOUR CHOICE OF PRIORS (11) 207 | # In our example dataset, y-axis scores can be in the range [1, 7]. 208 | # In the absence of other information, we set the parameter mean as 4 209 | # (the mean of the range [1,7]) and the maximum possible value as 7. 210 | # From exploratory analysis, we know the mean score and sd for y in our 211 | # dataset but we *DO NOT* use this information because priors *CANNOT* 212 | # include any information from the current study. 213 | 214 | a_prior = 4 # CHANGE ME 7 215 | a_prior_max = 7 # CHANGE ME 8 216 | 217 | # With a normal distribution, we can't completely rule out 218 | # impossible values, but we choose an sd that assigns less than 219 | # 5% probability to those impossible values. Remember that in a normal 220 | # distribution, 95% of the data lies within 2 sds of the mean. Therefore, 221 | # we calculate the value of 1 sd by finding the maximum amount our data 222 | # can vary from the mean (a_prior_max - a_prior) and divide that in half. 223 | 224 | a_sd = (a_prior_max - a_prior) / 2 # do not change 225 | 226 | # CHANGE THIS COMMENT EXPLAINING YOUR CHOICE OF PRIORS (11) 227 | # In our example dataset, we do not have a strong hypothesis that the treatment 228 | # conditions will be higher or lower than the control, so we set the mean of 229 | # the effect size parameters to be 0. In the absence of other information, we 230 | # set the sd to be the same as for the control condition. 231 | 232 | b1_prior = 0 # CHANGE ME 9 233 | b1_sd = a_sd # CHANGE ME 10 234 | 235 | ``` 236 | 237 | 238 | **What to change** 239 | 240 | **If you are using weakly informative priors:** 241 | 242 | Do not use this code chunk; use the code chunk above to set your priors instead. Make sure everything in this code chunk is commented out so that your priors are not overwritten. 243 | 244 | **If you are using strong priors (i.e. priors from previous literature):** 245 | 246 | *Remember: **do not** use any of your data from the current study to set prior values.* 247 | 248 | First, make sure to uncomment all four variables set in this code chunk. 249 | 250 | 7. a_prior: Select the control condition mean. 251 | 252 | 8. a_sd: Select the control condition standard deviation. 253 | 254 | 9. b1_prior: Select the effect size mean. 255 | 256 | 10. b1_sd: Select the effect size standard deviation. 257 | 258 | 11. You should also change the comments in the code below to explain your choice of priors. 259 | 260 | ```{r} 261 | 262 | # CHANGE THIS COMMENT EXPLAINING YOUR CHOICE OF PRIORS (11) 263 | # In our example dataset, y-axis scores can be in the range [1, 7]. 264 | # To choose our priors, we use the results from a previous study 265 | # where participants completed an identical task (choosing between 266 | # different chocolate bars). For our overall prior mean, we pool the mean 267 | # satisfaction scores from all conditions in the previous study to get 268 | # an overall mean of 5.86. We set a_sd so that 5.86 +/- 2 sds encompasses 269 | # the 95% confidence intervals from the previous study results. 
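# (A hypothetical illustration of that calculation, with made-up numbers rather
#  than figures from an actual prior study: if the pooled mean were 5.86 with a
#  reported 95% CI of roughly [4.66, 7.06], the CI half-width would be about 1.2,
#  so a_sd = 1.2 / 2 = 0.6.)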
270 | 271 | # a_prior = 5.86 # CHANGE ME 7 272 | # a_sd = 0.6 # CHANGE ME 8 273 | 274 | # CHANGE THIS COMMENT EXPLAINING YOUR CHOICE OF PRIORS (11) 275 | # In our example dataset, we do not have guidance from previous literature 276 | # to set an exact effect size, but we do know that satisficers (the "treatment" 277 | # condition) are likely to have higher mean satisfaction than the maximizers 278 | # (the "control" condition), so we set an effect size parameter mean that 279 | # results in a 1 point increase in satisfaction for satisficers. To reflect 280 | # the uncertainty in this effect size, we select a broad sd so that there is 281 | # a ~20% chance that the effect size will be negative. 282 | 283 | # b1_prior = 1 # CHANGE ME 9 284 | # b1_sd = 1 # CHANGE ME 10 285 | 286 | ``` 287 | 288 | 289 | ### Checking priors with visualizations 290 | Next, you'll want to check your priors by running this code chunk. It will produce a set of five sample plots drawn from the priors you set in the previous section, so you can check to see if the values generated are reasonable. 291 | 292 | You'll also want to run the code chunk after this one, `HOPs_priors`, which presents plots of sample prior draws in an animated format called HOPs (Hypothetical Outcomes Plots). HOPs are a type of plot that visualizes uncertainty as sets of draws from a distribution, and has been demonstrated to improve multivariate probability estimates (Hullman et al. 2015) and increase sensitivity to the underlying trend in data (Kale et al. 2018) over static representations of uncertainty like error bars. 293 | 294 | #### Static visualization of priors 295 | 296 | **What to change** 297 | 298 | Nothing! Just run this code to check your priors, adjusting prior values above as needed until you find reasonable prior values. Note that you may get a couple of very implausible or even impossible values because our assumption of normally distributed priors assigns a small probability to even very extreme values. If you are concerned by the outcome, you can try rerunning it a few more times to make sure that any implausible values you see don't come up very often. 299 | 300 | **Troubleshooting** 301 | 302 | * In rare cases, you may get a warning that the Markov chains have failed to converge. Chains that fail to converge are a sign that your model is not a good fit to the data. If you get this warning, you should adjust your priors. Your prior distribution may be too narrow, and/or your prior mean is very far from the data. 303 | 304 | * If you get any other errors, first double-check the values you have changed in the code chunks above (i.e. `mydata`, `mydata$x1`, `mydata$x2`, `mydata$y`, and prior values). Problems with these values can cause confusing errors downstream. 
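In addition to the visual checks below, it can help to sanity-check the intercept prior numerically. The following optional chunk is a minimal sketch and not part of the original template; it assumes the weak-prior example values set above (`a_prior = 4`, `a_sd = 1.5`) and reports how much prior probability falls outside the plausible [1, 7] satisfaction range. A small value (here roughly 5%) is fine; a much larger value suggests the prior sd is too wide or the prior mean is implausible.

```{r}
# Optional numeric sanity check (not part of the original template).
# How much prior probability does Normal(a_prior, a_sd) place outside the plausible [1, 7] range?
p_outside = pnorm(1, mean = a_prior, sd = a_sd) + (1 - pnorm(7, mean = a_prior, sd = a_sd))
p_outside # with a_prior = 4 and a_sd = 1.5 this is about 0.05, i.e. ~2.5% in each tail
```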
305 | 306 | ```{r check_priors, results="hide"} 307 | 308 | # generate the prior distribution 309 | m_prior = stan_glm(y ~ x1*x2, data = mydata, 310 | prior_intercept = normal(a_prior, a_sd, autoscale = FALSE), 311 | prior = normal(b1_prior, b1_sd, autoscale = FALSE), 312 | prior_PD = TRUE 313 | ) 314 | 315 | # Create the dataframe with fitted draws 316 | prior_draws = mydata %>% #pipe mydata to datagrid() 317 | data_grid(x1,x2) %>% #create a fit grid with each level in x, and pipe it to add_fitted_draws() 318 | add_fitted_draws(m_prior, n = 5, seed = 12345) #add n fitted draws from the model to the fit grid 319 | # the seed argument is for reproducibility: it ensures the pseudo-random 320 | # number generator used to pick draws has the same seed on every run, 321 | # so that someone else can re-run this code and verify their output matches 322 | 323 | # Plot the five sample draws 324 | # this function is defined in 'plotting_functions.R', if you wish to customize the aesthetics. 325 | static_prior_plot_5(prior_draws) 326 | ``` 327 | 328 | #### Animated visualization of priors 329 | 330 | The five static draws above give use some idea of what the prior distribution might look like. Even better, we can animate this graph using HOPs, which are better for visualizing uncertainty and identifying underlying trends. HOPs visualizes the same information as the static plot generated above. However, with HOPs we can visualize more draws: with the static plot, we run out of room after only about five draws! 331 | 332 | In this code chunk, we add more draws to the `prior_draws` dataframe, so we have a total of 50 draws to visualize, and then create the animated plot. Each frame of the animation shows a different draw from the prior, starting with the same five draws as the static image above. 333 | 334 | **What to change:** Nothing! Just run the code to check your priors. 335 | 336 | ```{r HOPs_priors} 337 | # Animation parameters 338 | n_draws = 50 # the number of draws to visualize in the HOPs 339 | frames_per_second = 2.5 # the speed of the HOPs 340 | # 2.5 frames per second (400ms) is the recommended speed for the HOPs visualization. 341 | # Faster speeds (100ms) have been demonstrated to not work as well. 342 | # See Kale et al. VIS 2018 for more info. 343 | 344 | # Add more prior draws to the data frame for the visualization 345 | more_prior_draws = prior_draws %>% 346 | rbind( 347 | mydata %>% 348 | data_grid(x1, x2) %>% 349 | add_fitted_draws(m_prior, n = n_draws - 5, seed = 12345)) 350 | 351 | # Animate the prior draws with HOPs 352 | # this function is defined in 'plotting_functions.R', if you wish to customize the aesthetics. 353 | prior_HOPs = animate(HOPS_plot_5(more_prior_draws), nframes = n_draws * 2, fps = frames_per_second) 354 | prior_HOPs 355 | ``` 356 | 357 | In most cases, your prior HOPs will show a lot of uncertainty: the bars will jump around to a lot of different possible values. At the end of the template, you'll see how this uncertainty is affected when study data is added to the estimates. 358 | 359 | Even when you see a lot of uncertainty in the graph, the individual HOPs frames should mostly show plausible values. You will see some implausible values (usually represented as empty graphs, or bars that reach/exceed the plot's maximum y-value), but if you see many implausible values, it may be a sign that you should adjust your priors in the "Set priors" section. 360 | 361 | 362 | ### Run the model 363 | There's nothing you have to change here. Just run the model. 
364 | 365 | **Troubleshooting:** If this code produces errors, check the troubleshooting section under the "Check priors" heading above for a few troubleshooting options. 366 | 367 | ```{r results = "hide", message = FALSE, warning = FALSE} 368 | 369 | m = stan_glm(y ~ x1*x2, data = mydata, 370 | prior_intercept = normal(a_prior, a_sd, autoscale = FALSE), 371 | prior = normal(b1_prior, b1_sd, autoscale = FALSE) 372 | ) 373 | 374 | ``` 375 | 376 | 377 | ## Model summary 378 | Here is a summary of the model fit. 379 | 380 | The summary reports diagnostic values that can help you evaluate whether your model is a good fit for the data. For this template, we can keep diagnostics simple: check that your `Rhat` values are very close to 1.0. Larger values mean that your model is not a good fit for the data. This is usually only a problem if the `Rhat` values are greater than 1.1, which is a warning sign that the Markov chains have failed to converge. In this happens, Stan will warn you about the failure, and you should adjust your priors. 381 | 382 | ```{r} 383 | summary(m, digits=3) 384 | ``` 385 | 386 | ## Visualizing results 387 | To plot the results, we again create a fit grid using `data_grid()`, just as we did when we created the HOPs for the prior. Given this fit grid, we can then create any number of visualizations of the results. One way we might want to visualize the results is a static graph with error bars that represent a 95% credible interval. For each x position in the fit grid, we can get the posterior mean estimates and 95% credible intervals from the model: 388 | 389 | ```{r static_graph} 390 | 391 | # Create the dataframe with fitted draws 392 | fit = mydata %>%#pipe mydata to datagrid() 393 | data_grid(x1, x2) %>% #create a fit grid with each level in x, and pipe it to add_fitted_draws() 394 | add_fitted_draws(m) %>% #add n fitted draws from the model to the fit grid 395 | mean_qi(.width = .95) #add 95% credible intervals 396 | 397 | # Plot the posterior draws 398 | # this function is defined in 'plotting_functions.R', if you wish to customize the aesthetics. 399 | static_post_plot_5(fit) 400 | ``` 401 | 402 | 403 | #### Animated HOPs visualization 404 | To get a better visualization of the uncertainty remaining in the posterior results, we can use animated HOPs for this graph as well. The code to generate the posterior plots is identical to the HOPs code for the priors, except we replace `m_prior` with `m`: 405 | 406 | ```{r} 407 | 408 | p = mydata %>% #pipe mydata to datagrid() 409 | data_grid(x1, x2) %>% #create a fit grid with each level in x, and pipe it to add_fitted_draws() 410 | add_fitted_draws(m, n = n_draws, seed = 12345) #add n fitted draws from the model to the fit grid 411 | # the seed argument is for reproducibility: it ensures the pseudo-random 412 | # number generator used to pick draws has the same seed on every run, 413 | # so that someone else can re-run this code and verify their output matches 414 | 415 | #animate the data from p, using the graph aesthetics set in the graph aesthetics code chunk 416 | # this function is defined in 'plotting_functions.R', if you wish to customize the aesthetics. 417 | post_HOPs = animate(HOPS_plot_5(p), nframes = n_draws * 2, fps = frames_per_second) 418 | post_HOPs 419 | 420 | ``` 421 | 422 | ### Comparing the prior and posterior 423 | If we look at our two HOPs plots together - one of the prior distribution, and one of the posterior - we can see how adding information to the model (i.e. 
the study data) adds more certainty to our estimates, and produces a posterior graph that is more "settled" than the prior graph. 424 | 425 |
**Prior draws**
426 | ```{r echo=F} 427 | prior_HOPs 428 | ``` 429 | 430 |
**Posterior draws**
431 | ```{r echo=F} 432 | post_HOPs 433 | ``` 434 | 435 | ## Finishing up 436 | 437 | **Congratulations!** You made it through your first Bayesian analysis. We hope our templates helped demystify the process. 438 | 439 | If you're interested in learning more about Bayesian statistics, we suggest the following textbooks: 440 | 441 | - Statistical Rethinking, by Richard McElreath. (Website: https://xcelab.net/rm/statistical-rethinking/, including links to YouTube lectures.) 442 | - Doing Bayesian Data Analysis, by John K. Kruschke. (Website: https://sites.google.com/site/doingbayesiandataanalysis/, including R code templates.) 443 | 444 | 445 | The citation for the paper reporting the process of developing and user-testing these templates is below: 446 | Chanda Phelan, Jessica Hullman, Matthew Kay, and Paul Resnick. 2019. Some Prior(s) Experience Necessary: Templates for Getting Started with Bayesian Analysis. In CHI Conference on Human Factors in Computing Systems Proceedings (CHI 2019), May 4–9, 2019, Glasgow, Scotland UK. ACM, New York, NY, USA, 12 pages. https://doi.org/10.1145/3290605.3300709 447 | -------------------------------------------------------------------------------- /2var-continuous_categorical-line-bayesian_template.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: <center>
Bayesian analysis template
3 | author:
Phelan, C., Hullman, J., Kay, M. & Resnick, P.
4 | output: 5 | html_document: 6 | theme: flatly 7 | highlight: pygments 8 | --- 9 |
10 |
*Template 6:* 11 | 12 | ![](images/generic_2line-cont_chart.png) 13 | 14 | **Interaction of one continuous & one categorical independent variable (line graph)**
15 | 16 | 17 | ##Introduction 18 | Welcome! This template will guide you through a Bayesian analysis in R, even if you have never done Bayesian analysis before. There are a set of templates, each for a different type of analysis. This template is for data with **one continuous and one categorical independent variable** and will produce a **line chart**. If your analysis includes a **linear regression**, this might be the right template for you. 19 | 20 | This template assumes you have basic familiarity with R. Once complete, this template will produce a summary of the analysis, complete with parameter estimates and credible intervals, and two animated HOPs (see Hullman, Resnick, Adar 2015 DOI: 10.1371/journal.pone.0142444 and Kale, Nguyen, Kay, and Hullman VIS 2018 for more information) for both your prior and posterior estimates. 21 | 22 | This Bayesian analysis focuses on producing results in a form that are easily interpretable, even to nonexperts. The credible intervals produced by Bayesian analysis are the analogue of confidence intervals in traditional null hypothesis significance testing (NHST). A weakness of NHST confidence intervals is that they are easily misinterpreted. Many people naturally interpret an NHST 95% confidence interval to mean that there is a 95% chance that the true parameter value lies somewhere in that interval; in fact, it means that if the experiment were repeated 100 times, 95 of the resulting confidence intervals would include the true parameter value. The Bayesian credible interval sidesteps this complication by providing the intuitive meaning: a 95% chance that the true parameter value lies somewhere in that interval. To further support intuitive interpretations of your results, this template also produces animated HOPs, a type of plot that is more effective than visualizations such as error bars in helping people make accurate judgments about probability distributions. 23 | 24 | This set of templates supports a few types of statistical analysis. (In future work, this list of supported statistical analyses will be expanded.) For clarity, each type has been broken out into a separate template, so be sure to select the right template before you start! A productive way to choose which template to use is to think about what type of chart you would like to produce to summarize your data. Currently, the templates support the following: 25 | 26 | *One independent variable:* 27 | 28 | 1. Categorical; bar graph (e.g. t-tests, one-way ANOVA) 29 | 30 | 2. Ordinal; line graph (e.g. t-tests, one-way ANOVA) 31 | 32 | 3. Continuous; line graph (e.g. linear regression) 33 | 34 | *Two interacting independent variables:* 35 | 36 | 4. Two categorical; bar graph (e.g. two-way ANOVA) 37 | 38 | 5. One categorical, one ordinal; line graph (e.g. two-way ANOVA) 39 | 40 | 6. **One categorical, one continuous; line graph (e.g. linear regression with multiple lines)** 41 | 42 | Note that this template fits your data to a model that assumes normally distributed error terms. (This is the same assumption underlying t-tests, ANOVA, etc.) This template requires you to have already run diagnostics to determine that your data is consistent with this assumption; if you have not, the results may not be valid. 43 | 44 | Once you have selected your template, to complete the analysis, please follow along this template. For each code chunk, you may need to make changes to customize the code for your own analysis. 
In those places, the code chunk will be preceded by a list of things you need to change (with the heading "What to change"), and each line that needs to be customized will also include the comment `#CHANGE ME` within the code chunk itself. You can run each code chunk independently during debugging; when you're finished, you can knit the document to produce the complete document. 45 | 46 | Good luck! 47 | 48 | ###Tips before you start 49 | 50 | 1. Make sure you have picked the right template! (See above.) 51 | 52 | 2. Use the pre-knitted HTML version of this template as a reference as you work (we've included all the HTML files, in the folder `html_outputs`. The formatting makes the template easier to follow. You can also knit this document as you work once you have completed set up. 53 | 54 | 3. Make sure you are using the most recent version of the templates. Updates can be found at https://github.com/cdphelan/bayesian-template. 55 | 56 | ###Sample dataset 57 | This template comes prefilled with an example dataset from Moser et al. (DOI: 10.1145/3025453.3025778), which examines choice overload in the context of e-commerce. The study examined the relationship between choice satisfaction (measured at a 7-point Likert scale), the number of product choices presented on a webpage, and whether the participant is a decision "maximizer" (a person who examines all options and tries to choose the best) or a "satisficer" (a person who selects the first option that is satisfactory). In this template, we analyze the relationship between choice set size, which we treat as a continuous variable in this template with values that can fall in the range [12,72]; type of decision-making (maximizer or satisficer), a two-level categorical variable; and choice satisfaction, which we treat as a continuous variable with values that can fall in the range [1,7]. 58 | 59 | 60 | ##Set up 61 | ###Requirements 62 | To run this template, we assume that you are using RStudio, and you have the most recent version of R installed. (This template was built with R version 3.5.1.) 63 | 64 | This template works best if you first open the file `bayesian-template.Rproj` from the code repository as a project in RStudio to get started, and then open the individual `.Rmd` template files after this. 65 | 66 | ###Libraries 67 | **Installation:** 68 | If this is your first time using the template, you may need to install libraries. 69 | 70 | 1. **If you are using Windows,** first you will need to manually install RStan and Rtools. Follow the instructions [here](https://github.com/stan-dev/rstan/wiki/Installing-RStan-on-Windows) to install both. 71 | 72 | 2. On both Mac and Windows, uncomment the line with `install.packages()` to install the required packages. This only needs to be done once. 73 | 74 | **Troubleshooting:** 75 | You may have some trouble installing the packages, especially if you are on Windows. Regardless of OS, if you have any issues installing these packages, try one or more of the following troubleshooting options: 76 | 77 | 1. Restart R. 78 | 79 | 2. Make sure you are running the most recent version of R (3.5.1, as of the writing of this template). 80 | 81 | 3. Manually install RStan and Rtools, following the instructions [here](https://github.com/stan-dev/rstan/wiki/RStan-Getting-Started). 82 | 83 | 4. If you have tried the above and you are still getting error messages like `there is no package called [X]`, try installing the missing package(s) manually using the RStudio interface under Tools > Install Packages... 
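If the manual route still feels tedious, the short sketch below (an optional helper, not part of the original template) installs only the packages that are missing from your library; the package names mirror the `install.packages()` call in the libraries chunk that follows.

```{r eval=FALSE}
# Optional helper: install only the packages that are not yet installed.
# The package list matches the libraries chunk below.
pkgs = c("ggplot2", "rstanarm", "tidyverse", "tidybayes", "modelr", "gganimate")
missing_pkgs = setdiff(pkgs, rownames(installed.packages()))
if (length(missing_pkgs) > 0) install.packages(missing_pkgs)
```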
84 | 85 | ```{r libraries, message=FALSE, warning=FALSE} 86 | 87 | knitr::opts_chunk$set(fig.align="center") 88 | 89 | # install.packages(c("ggplot2", "rstanarm", "tidyverse", "tidybayes", "modelr", "gganimate")) 90 | 91 | library(rstanarm) #bayesian analysis package 92 | library(tidyverse) #tidy datascience commands 93 | library(tidybayes) #tidy data + ggplot workflow 94 | library(modelr) #tidy pipelines for modeling 95 | library(ggplot2) #plotting package 96 | library(gganimate) #animate ggplots 97 | 98 | # We import all of our plotting functions from this separate R file to keep the code in 99 | # this template easier to read. You can edit this file to customize aesthetics of the plots 100 | # if desired. Just be sure to run this line again after you make edits! 101 | source('plotting_functions.R') 102 | 103 | theme_set(theme_light()) # set the ggplot theme for all plots 104 | 105 | ``` 106 | 107 | ###Read in data 108 | **What to change** 109 | 110 | 1. mydata: Read in your data. 111 | 112 | ```{r data_prep} 113 | 114 | mydata = read.csv('datasets/choc_cleaned_data.csv') #CHANGE ME 1 115 | 116 | ``` 117 | 118 | 119 | ## Specify model 120 | We'll fit the following model: `stan_glm(y ~ x1 * x2)`, where $x_1$ is a continuous variable and $x_2$ is a categorical variable. This specifies a linear regression with a parameter for $x_1$ and dummy variables for each level in $x_2$, plus interaction terms for each combination of $x_1$ and $x_2$. So for example, for a regression where $x_2$ has two levels, each $y_i$ is drawn from a normal distribution with mean equal to the value of the specified regression equation and standard deviation equal to `sigma` ($\sigma$): 121 | 122 | $$ 123 | y_i \sim Normal(a + b_{x1}x_i + b_{x2}dummy_{x2} + b_{x1}x_i * b_{x2}dummy_{x2}, \sigma) 124 | $$ 125 | 126 | Choose your independent and dependent variables. These are the variables that will correspond to the x and y axis on the final plots. 127 | 128 | **What to change** 129 | 130 | 2. mydata\$x1: Select which variables will appear on the x-axis of your plots. This is your continuous variable. 131 | 132 | 3. mydata\$x2: Select the second independent variable, the categorical variable. You will have one line in the output graph for each level of this variable. 133 | 134 | 4. mydata\$y: Select which variables will appear on the y-axis of your plots. 135 | 136 | 5. x_lab: Label your plots' x-axes. 137 | 138 | 6. y_lab: Label your plots' y-axes. 139 | 140 | ```{r specify_model} 141 | 142 | #select your independent and dependent variables 143 | mydata$x1 = mydata$num_products_displayed #CHANGE ME 2 144 | mydata$x2 = mydata$sat_max #CHANGE ME 3 145 | mydata$y = mydata$satis_Q1 #CHANGE ME 4 146 | 147 | # label the axes on the plots 148 | x_lab = "Choices" #CHANGE ME 5 149 | y_lab = "Satisfaction" #CHANGE ME 6 150 | 151 | ``` 152 | 153 | 154 | ###Set priors 155 | In this section, you will set priors for your model. Setting priors thoughtfully is important to any Bayesian analysis, especially if you have a small sample of data that you will use for fitting for your model. The priors express your best prior belief, *before seeing any data*, of reasonable values for the model parameters. 156 | 157 | Ideally, you will have previous literature from which to draw these prior beliefs. If no previous studies exist, you can instead assign "weakly informative priors" that only minimally restrict the model, excluding only values that are implausible or impossible. 
We have provided examples of how to set both weak and strong priors below. 158 | 159 | To check the plausibility of your priors, use the code section after this one to generate a graph of five sample draws from your priors to check if the values generated are reasonable. 160 | 161 | Our model has the following parameters: 162 | 163 | a. the intercept; functionally, this is often the mean of the control condition 164 | 165 | b. the slope; i.e, the effect size 166 | 167 | c. the standard deviation of the normally distributed error term 168 | 169 | To simplify things, we limit the number of different prior beliefs you can have. Think of the intercept as specifying the control condition of an experiment, and the slope as specifying the effect size. We let you specify a prior belief about the plausible values of mean in the control condition (a), and then we let you set a prior belief about the plausible effect size (b). You have to specify the same plausible effect sizes for all conditions, unless you dig deeper into our code. 170 | 171 | To simplify things further, we only let you specify beliefs about these parameters in the form of a normal distribution. Thus, you will specify what you think is the most likely value for the parameter (the mean), and a standard deviation. You will be expressing a belief that you were 95% certain (before looking at any data) that the true value of the parameter is within two standard deviations of the mean. 172 | 173 | Finally, our modeling system, `stan_glm()`, will automatically set priors for the last parameter, the standard deviation of the normally distributed error term for the model overall (c). 174 | 175 | To explore more about priors, you can experiment with different values for these parameters and use the following section, *Checking priors with visualizations*, to see how different parameter values change the prior distribution. 176 | 177 | Want more examples? Check your understanding of how to set priors in this [quizlet](https://cdphelan.shinyapps.io/check_understanding_priors/), which includes several more examples of how to set both strong and weak priors. 178 | 179 | **What to change** 180 | 181 | **If you are using weakly informative priors (i.e. priors not informed by previous literature):** 182 | 183 | *Remember: **do not** use any of your data from the current study to inform prior values.* 184 | 185 | 7. a_prior: Select the intercept (likely the control condition mean). 186 | 187 | 8. a_prior_max: Select the maximum plausible value of the intercept (maximum plausible value of control condition data). (We will use this to calculate the sd of `a`.) 188 | 189 | 9. b1_prior: Select the effect size mean. 190 | 191 | 10. b1_sd: Select the effect size standard deviation. 192 | 193 | 11. You should also change the comments in the code below to explain your choice of priors. 194 | 195 | **If you are using strong priors (i.e. priors from previous literature):** 196 | 197 | Skip this code chunk and set your priors in the next code chunk. For clarity, comment out everything in this code chunk. 198 | 199 | ```{r} 200 | 201 | # CHANGE THIS COMMENT EXPLAINING YOUR CHOICE OF PRIORS (11) 202 | # In our example dataset, y-axis scores can be in the range [1, 7]. 203 | # In the absence of other information, we set the parameter mean as 4 204 | # (the mean of the range [1,7]) and the maximum possible value as 7. 
205 | # From exploratory analysis, we know the mean score and sd for y in our 206 | # dataset but we *DO NOT* use this information because priors *CANNOT* 207 | # include any information from the current study. 208 | 209 | a_prior = 4 # CHANGE ME 7 210 | a_prior_max = 7 # CHANGE ME 8 211 | 212 | # With a normal distribution, we can't completely rule out 213 | # impossible values, but we choose an sd that assigns less than 214 | # 5% probability to those impossible values. Remember that in a normal 215 | # distribution, 95% of the data lies within 2 sds of the mean. Therefore, 216 | # we calculate the value of 1 sd by finding the maximum amount our data 217 | # can vary from the mean (a_prior_max - a_prior) and divide that in half. 218 | 219 | a_sd = (a_prior_max - a_prior) / 2 # do not change 220 | 221 | # CHANGE THIS COMMENT EXPLAINING YOUR CHOICE OF PRIORS (11) 222 | # In our example dataset, we do not have a strong hypothesis that the treatment 223 | # conditions will be higher or lower than the control, so we set the mean of 224 | # the effect size parameter to be 0. In the absence of other information, we 225 | # set the sd so that a change from the minimum choice set size (12) 226 | # to the maximum choice set size (72) could plausibly result in a 227 | # +6/-6 change in satisfaction, the maximum possible change. 228 | 229 | b1_prior = 0 # CHANGE ME 9 230 | b1_sd = (6/(72-12))/2 # CHANGE ME 10 231 | 232 | ``` 233 | 234 | 235 | **What to change** 236 | 237 | **If you are using weakly informative priors:** 238 | 239 | Do not use this code chunk; use the code chunk above to set your priors instead. Make sure everything in this code chunk is commented out so that your priors are not overwritten. 240 | 241 | **If you are using strong priors (i.e. priors from previous literature):** 242 | 243 | *Remember: **do not** use any of your data from the current study to set prior values.* 244 | 245 | First, make sure to uncomment all four variables set in this code chunk. 246 | 247 | 7. a_prior: Select the control condition mean. 248 | 249 | 8. a_sd: Select the control condition standard deviation. 250 | 251 | 9. b1_prior: Select the effect size mean. 252 | 253 | 10. b1_sd: Select the effect size standard deviation. 254 | 255 | 11. You should also change the comments in the code below to explain your choice of priors. 256 | 257 | ```{r} 258 | 259 | # CHANGE THIS COMMENT EXPLAINING YOUR CHOICE OF PRIORS (11) 260 | # In our example dataset, y-axis scores can be in the range [1, 7]. 261 | # To choose our priors, we use the results from a previous study 262 | # where participants completed an identical task (choosing between 263 | # different chocolate bars). For our overall prior mean, we pool the mean 264 | # satisfaction scores from all conditions in the previous study to get 265 | # an overall mean of 5.86. We set a_sd so that 5.86 +/- 2 sds encompasses 266 | # the 95% confidence intervals from the previous study results. 267 | 268 | # a_prior = 5.86 # CHANGE ME 7 269 | # a_sd = 0.6 # CHANGE ME 8 270 | 271 | # CHANGE THIS COMMENT EXPLAINING YOUR CHOICE OF PRIORS (11) 272 | # In this example, we do not have guidance from previous literature 273 | # to set an effect size on satisfaction with an interaction term between 274 | # choice size and maximizer score, so we set the mean effect size 275 | # at 0. 
To reflect the uncertainty in this effect size, we set the sd 276 | # so that a change from the minimum choice set size (12) to the maximum 277 | # choice set size (72) could plausibly result in a +6/-6 change 278 | # in satisfaction, the maximum possible change. 279 | 280 | # b1_prior = 0 # CHANGE ME 9 281 | # b1_sd = (6/(72-12))/2 # CHANGE ME 10 282 | 283 | ``` 284 | 285 | 286 | ### Checking priors with visualizations 287 | Next, you'll want to check your priors by running this code chunk. It will produce a set of 100 sample draws drawn from the priors you set in the previous section, so you can check to see if the values generated are reasonable. 288 | 289 | You'll also want to run the code chunk after this one, `HOPs_priors`, which presents plots of sample prior draws in an animated format called HOPs (Hypothetical Outcomes Plots). HOPs are a type of plot that visualizes uncertainty as sets of draws from a distribution, and has been demonstrated to improve multivariate probability estimates (Hullman et al. 2015) and increase sensitivity to the underlying trend in data (Kale et al. 2018) over static representations of uncertainty like error bars. 290 | 291 | #### Static visualization of priors 292 | **What to change** 293 | 294 | Nothing! Just run this code to check your priors, adjusting prior values above as needed until you find reasonable prior values. Note that you may get a couple of very implausible or even impossible values because our assumption of normally distributed priors assigns a small probability to even very extreme values. If you are concerned by the outcome, you can try rerunning it a few more times to make sure that any implausible values you see don't come up very often. 295 | 296 | **Troubleshooting** 297 | 298 | * In rare cases, you may get a warning that the Markov chains have failed to converge. Chains that fail to converge are a sign that your model is not a good fit to the data. If you get this warning, you should adjust your priors. Your prior distribution may be too narrow, and/or your prior mean is very far from the data. 299 | 300 | * If you get any other errors, first double-check the values you have changed in the code chunks above (i.e. `mydata`, `mydata$x1`, `mydata$x2`, `mydata$y`, and prior values). Problems with these values can cause confusing errors downstream. 301 | 302 | ```{r check_priors, results="hide"} 303 | 304 | # generate the prior distribution 305 | m_prior = stan_glm(y ~ x1*x2, data = mydata, 306 | prior_intercept = normal(a_prior, a_sd, autoscale = FALSE), 307 | prior = normal(b1_prior, b1_sd, autoscale = FALSE), 308 | prior_PD = TRUE 309 | ) 310 | 311 | # Create the dataframe with fitted draws 312 | prior_draws = mydata %>% #pipe mydata to datagrid() 313 | data_grid(x1, x2) %>% #create a fit grid with each level in x, and pipe it to add_fitted_draws() 314 | add_fitted_draws(m_prior, n = 100, seed = 12345) #add n fitted draws from the model to the fit grid 315 | # the seed argument is for reproducibility: it ensures the pseudo-random 316 | # number generator used to pick draws has the same seed on every run, 317 | # so that someone else can re-run this code and verify their output matches 318 | 319 | # Plot the 100 sample draws 320 | # this function is defined in 'plotting_functions.R', if you wish to customize the aesthetics. 321 | static_prior_plot_6(prior_draws) 322 | ``` 323 | 324 | #### Animated visualization of priors 325 | The static draws above give use some idea of what the prior distribution might look like. 
Even better, we can animate this graph using HOPs. HOPs visualize the same information as the static plot generated above, but are better for visualizing uncertainty and identifying underlying trends. 326 | 327 | In this code chunk, we create the animated plot using 50 of the 100 draws we used in the plot above. Each frame of the animation shows a different draw from the prior. 328 | 329 | **What to change:** Nothing! Just run the code to check your priors. 330 | 331 | ```{r HOPs_priors} 332 | # Animation parameters 333 | n_draws = 50 # the number of draws to visualize in the HOPs (more draws == longer rendering time) 334 | frames_per_second = 2.5 # the speed of the HOPs 335 | # 2.5 frames per second (400ms) is the recommended speed for the HOPs visualization. 336 | # Faster speeds (100ms) have been demonstrated to not work as well. 337 | # See Kale et al. VIS 2018 for more info. 338 | 339 | # Animate the prior draws with HOPs 340 | # this function is defined in 'plotting_functions.R', if you wish to customize the aesthetics. 341 | prior_HOPs = animate(HOPS_plot_6(prior_draws), nframes = n_draws * 2, fps = frames_per_second) 342 | prior_HOPs 343 | ``` 344 | 345 | In most cases, your prior HOPs will show a lot of uncertainty: the lines will jump around to a lot of different possible values. At the end of the template, you'll see how this uncertainty is affected when study data is added to the estimates. 346 | 347 | Even when you see a lot of uncertainty in the graph, the individual HOPs frames should mostly show plausible values. You will see some implausible values (usually represented as empty graphs), but if you see many implausible values, it may be a sign that you should adjust your priors in the "Set priors" section. 348 | 349 | 350 | 351 | ### Run the model 352 | There's nothing you have to change here. Just run the model. 353 | 354 | **Troubleshooting:** If this code produces errors, check the troubleshooting section under the "Check priors" heading above for a few troubleshooting options. 355 | 356 | ```{r results = "hide", message = FALSE, warning = FALSE} 357 | m = stan_glm(y ~ x1*x2, data = mydata, 358 | prior_intercept = normal(a_prior, a_sd, autoscale = FALSE), 359 | prior = normal(b1_prior, b1_sd, autoscale = FALSE) 360 | ) 361 | ``` 362 | 363 | 364 | ## Model summary 365 | Here is a summary of the model fit. 366 | 367 | The summary reports diagnostic values that can help you evaluate whether your model is a good fit for the data. For this template, we can keep diagnostics simple: check that your `Rhat` values are very close to 1.0. Larger values mean that your model is not a good fit for the data. This is usually only a problem if the `Rhat` values are greater than 1.1, which is a warning sign that the Markov chains have failed to converge. If this happens, Stan will warn you about the failure, and you should adjust your priors. 368 | 369 | ```{r} 370 | summary(m, digits=3) 371 | ``` 372 | 373 | 374 | ## Visualizing results 375 | #### Static visualizations 376 | To plot the results, we again create a fit grid using `data_grid()`, just as we did when we created the HOPs for the prior. Given this fit grid, we can then create any number of visualizations of the results. One way we might want to visualize the results is a static graph with a 95% credible band.
To do this, we use the grid and draw samples from the posterior mean evaluated at each x position in the grid using the `add_fitted_draws` function, and then summarize these samples in ggplot using a `stat_lineribbon`: 377 | 378 | ```{r static_graph} 379 | 380 | # Create the dataframe with fitted draws 381 | fit = mydata %>% #pipe mydata to datagrid() 382 | data_grid(x1 = seq_range(x1, n = 20), x2) %>% #create a fit grid with each level in x, and pipe it to add_fitted_draws() 383 | add_fitted_draws(m) #add n fitted draws from the model to the fit grid 384 | 385 | # Plot the posterior draws 386 | # this function is defined in 'plotting_functions.R', if you wish to customize the aesthetics. 387 | static_post_plot_6a(fit) 388 | 389 | ``` 390 | 391 | But what we really want is to display a selection of plausible fit lines, say 100 of them. To do that, we instead ask `add_fitted_draws` for only 100 draws, which we plot separately as lines: 392 | 393 | ```{r} 394 | 395 | fit = mydata %>% 396 | data_grid(x1 = seq_range(x1, n = 101), x2) %>% 397 | # the seed argument is for reproducibility: it ensures the pseudo-random 398 | # number generator used to pick draws has the same seed on every run, 399 | # so that someone else can re-run this code and verify their output matches 400 | add_fitted_draws(m, n = 100, seed = 12345) 401 | 402 | # Plot the posterior draws with a selection of fit draws 403 | # this function is defined in 'plotting_functions.R', if you wish to customize the aesthetics. 404 | static_post_plot_6b(fit) 405 | ``` 406 | 407 | 408 | #### Animated HOPs visualization 409 | To get a better visualization of the uncertainty remaining in the posterior results, we can use animated HOPs for this graph as well. The code to generate the posterior plots is identical to the HOPs code for the priors, except we replace `m_prior` with `m`: 410 | 411 | ```{r} 412 | 413 | p = mydata %>% 414 | data_grid(x1 = seq_range(x1, n = 101), x2) %>% 415 | add_fitted_draws(m, n = n_draws, seed = 12345) 416 | 417 | # animate the data from p, using the graph aesthetics set in the graph aesthetics code chunk 418 | # this function is defined in 'plotting_functions.R', if you wish to customize the aesthetics. 419 | post_HOPs = animate(HOPS_plot_6(p), nframes = n_draws * 2, fps = frames_per_second) 420 | post_HOPs 421 | 422 | ``` 423 | 424 | ### Comparing the prior and posterior 425 | If we look at our two HOPs plots together - one of the prior distribution, and one of the posterior - we can see how adding information to the model (i.e. the study data) adds more certainty to our estimates, and produces a posterior graph that is more "settled" than the prior graph. 426 | 427 | <center>
**Prior draws**
428 | ```{r echo=F} 429 | prior_HOPs 430 | ``` 431 | 432 |
**Posterior draws**
433 | ```{r echo=F} 434 | post_HOPs 435 | ``` 436 | 437 | ## Finishing up 438 | 439 | **Congratulations!** You made it through your first Bayesian analysis. We hope our templates helped demystify the process. 440 | 441 | If you're interested in learning more about Bayesian statistics, we suggest the following textbooks: 442 | 443 | - Statistical Rethinking, by Richard McElreath. (Website: https://xcelab.net/rm/statistical-rethinking/, including links to YouTube lectures.) 444 | - Doing Bayesian Data Analysis, by John K. Kruschke. (Website: https://sites.google.com/site/doingbayesiandataanalysis/, including R code templates.) 445 | 446 | 447 | The citation for the paper reporting the process of developing and user-testing these templates is below: 448 | Chanda Phelan, Jessica Hullman, Matthew Kay, and Paul Resnick. 2019. Some Prior(s) Experience Necessary: Templates for Getting Started with Bayesian Analysis. In CHI Conference on Human Factors in Computing Systems Proceedings (CHI 2019), May 4–9, 2019, Glasgow, Scotland UK. ACM, New York, NY, USA, 12 pages. https://doi.org/10.1145/3290605.3300709 449 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2019 cdphelan 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Bayesian templates for beginners 2 | 3 | UPDATE May 2019: A new version of the templates has been released. The structure of the templates has been improved to make it easier to follow along. The ggplot code that sets the plot aesthetics has been moved to the file plotting_functions.R for improved readability of the main templates. 4 | 5 | This repo is a set of templates that will guide you through a Bayesian analysis in R, even if you have never done Bayesian analysis before. There are a set of templates, each for a different type of analysis. Over time, we will be adding to this list of templates. 6 | 7 | The research paper that accompanies these templates is forthcoming at CHI 2019: "Some Prior(s) Experience Necessary: Templates for Getting Started with Bayesian Analysis." Links will be added once the paper is published. 8 | 9 | Detailed instructions on how to get started are at the end of this README.
10 | 
11 | A productive way to choose which template to use is to think about what your independent variables are and what type of chart you would like to produce to summarize your data. Currently, the templates support the following:
12 | 
13 | ## One independent variable
14 | 
15 | **1) Categorical**:
16 | bar chart
17 | 
18 | Creates a bar chart; compatible with tests such as t-tests, one-way ANOVA
19 | 
20 | Use this template file:
21 | 
22 | 1var-categorical-bar-bayesian_template.Rmd
23 | 
24 | **2) Ordinal**:
25 | line chart
26 | 
27 | Creates a line graph; compatible with tests such as t-tests, one-way ANOVA
28 | 
29 | Use this template file:
30 | 
31 | 1var-ordinal-line-bayesian_template.Rmd
32 | 
33 | **3) Continuous**:
34 | line chart
35 | 
36 | Creates a line graph; compatible with tests such as linear regressions
37 | 
38 | Use this template file:
39 | 
40 | 1var-continuous-line-bayesian_template.Rmd
41 | 
42 | 
43 | ## Interaction of two independent variables
44 | 
45 | **4) Interaction of two categorical**:
46 | bar chart
47 | 
48 | Creates a bar chart; compatible with tests such as two-way ANOVA
49 | 
50 | Use this template file:
51 | 
52 | 2var-categorical-bar-bayesian_template.Rmd
53 | 
54 | **5) Interaction of one categorical, one ordinal**:
55 | line chart
56 | 
57 | Creates a line graph; compatible with tests such as two-way ANOVA
58 | 
59 | Use this template file:
60 | 
61 | 2var-categorical_ordinal-line-bayesian_template.Rmd
62 | 
63 | **6) Interaction of one categorical, one continuous**:
64 | line chart
65 | 
66 | Creates a line graph; compatible with tests such as linear regressions with multiple lines
67 | 
68 | Use this template file:
69 | 
70 | 2var-continuous_categorical-line-bayesian_template.Rmd
71 | 
72 | # Getting started
73 | 
74 | 1) If you do not have RStudio already installed, install it from here: https://www.rstudio.com/products/rstudio/download/#download
75 | 
76 | 2) Clone the repo (if you're familiar with git) or download a zip file of the repository contents here: https://github.com/cdphelan/bayesian-template/archive/master.zip
77 | 
78 | 3) Open the file bayesian-template.Rproj as a project in RStudio to get started.
79 | 
80 | 4) Explore the templates as you would like! We suggest that you use the pre-knitted HTML documents (found in the folder /html_outputs/) as a reference, so you can read the instructions more easily and see a complete example as you work.
81 | 
82 | 5) Open the .Rmd template files when you are ready to start editing code.
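Before you knit a template for the first time, make sure the packages it loads are installed. The full list is in the `library()` calls at the top of each .Rmd; as a minimal sketch (covering only the plotting helpers in plotting_functions.R, which rely on ggplot2, gganimate, and tidybayes):

```r
# Minimal setup sketch: installs the plotting dependencies that
# plotting_functions.R visibly uses. Check the library() calls in your chosen
# template for its complete dependency list, including the Bayesian modelling
# package it loads.
install.packages(c("ggplot2", "gganimate", "tidybayes"))
```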
83 | 84 | -------------------------------------------------------------------------------- /bayesian-template.Rproj: -------------------------------------------------------------------------------- 1 | Version: 1.0 2 | 3 | RestoreWorkspace: Default 4 | SaveWorkspace: Default 5 | AlwaysSaveHistory: Default 6 | 7 | EnableCodeIndexing: Yes 8 | UseSpacesForTab: Yes 9 | NumSpacesForTab: 2 10 | Encoding: UTF-8 11 | 12 | RnwWeave: Sweave 13 | LaTeX: pdfLaTeX 14 | 15 | AutoAppendNewline: Yes 16 | StripTrailingWhitespace: Yes 17 | -------------------------------------------------------------------------------- /datasets/choc_cleaned_data.csv: -------------------------------------------------------------------------------- 1 | "","ParticipantID","num_products_displayed","satis_Q1","max_score","sat_max" 2 | "1",2733,24,7,85,"maximizer" 3 | "2",2729,50,7,80,"maximizer" 4 | "3",3498,40,7,64,"maximizer" 5 | "4",2628,72,7,16,"satisficer" 6 | "5",2737,50,3,64,"maximizer" 7 | "6",2735,50,3,73,"maximizer" 8 | "7",2619,72,5,53,"maximizer" 9 | "8",2730,50,7,71,"maximizer" 10 | "9",2736,60,4,60,"maximizer" 11 | "10",2734,50,5,48,"satisficer" 12 | "11",687,72,4,64,"maximizer" 13 | "12",2728,50,7,63,"maximizer" 14 | "13",2725,72,7,61,"maximizer" 15 | "14",2727,24,7,64,"maximizer" 16 | "15",3828,24,4,74,"maximizer" 17 | "16",2846,50,7,77,"maximizer" 18 | "17",1316,72,6,60,"maximizer" 19 | "18",2660,60,7,53,"maximizer" 20 | "19",1102,40,7,61,"maximizer" 21 | "20",3531,60,5,76,"maximizer" 22 | "21",3870,50,7,73,"maximizer" 23 | "22",3776,12,7,67,"maximizer" 24 | "23",2623,60,1,61,"maximizer" 25 | "24",2593,60,7,73,"maximizer" 26 | "25",3868,60,6,72,"maximizer" 27 | "26",3267,60,4,71,"maximizer" 28 | "27",1509,40,6,69,"maximizer" 29 | "28",3543,60,7,38,"satisficer" 30 | "29",1781,24,5,86,"maximizer" 31 | "30",2594,50,7,68,"maximizer" 32 | "31",2827,50,5,74,"maximizer" 33 | "32",2554,72,6,68,"maximizer" 34 | "33",381,40,5,51,"satisficer" 35 | "34",3709,40,7,84,"maximizer" 36 | "35",2880,50,7,70,"maximizer" 37 | "36",3316,12,6,53,"maximizer" 38 | "37",2581,60,7,91,"maximizer" 39 | "38",3339,24,7,72,"maximizer" 40 | "39",3407,72,5,73,"maximizer" 41 | "40",2845,24,5,82,"maximizer" 42 | "41",3401,60,7,44,"satisficer" 43 | "42",3350,12,7,57,"maximizer" 44 | "43",3179,50,5,54,"maximizer" 45 | "44",3473,40,7,57,"maximizer" 46 | "45",3674,60,7,52,"satisficer" 47 | "46",2500,72,6,49,"satisficer" 48 | "47",493,24,7,67,"maximizer" 49 | "48",2505,24,5,52,"satisficer" 50 | "49",3121,60,4,58,"maximizer" 51 | "50",1510,60,6,51,"satisficer" 52 | "51",348,72,7,65,"maximizer" 53 | "52",390,72,7,74,"maximizer" 54 | "53",3764,72,5,59,"maximizer" 55 | "54",3568,12,7,72,"maximizer" 56 | "55",852,24,6,51,"satisficer" 57 | "56",2461,50,6,65,"maximizer" 58 | "57",554,50,7,61,"maximizer" 59 | "58",3818,40,5,72,"maximizer" 60 | "59",3451,50,7,43,"satisficer" 61 | "60",3185,50,6,44,"satisficer" 62 | "61",495,24,6,63,"maximizer" 63 | "62",2410,72,6,59,"maximizer" 64 | "63",3845,24,7,53,"maximizer" 65 | "64",2763,72,7,59,"maximizer" 66 | "65",2590,40,5,61,"maximizer" 67 | "66",1226,12,6,70,"maximizer" 68 | "67",2563,50,5,67,"maximizer" 69 | "68",3634,50,7,65,"maximizer" 70 | "69",3633,60,6,60,"maximizer" 71 | "70",2602,12,5,57,"maximizer" 72 | "71",3887,40,6,70,"maximizer" 73 | "72",3191,72,7,59,"maximizer" 74 | "73",2951,12,7,58,"maximizer" 75 | "74",3442,40,6,58,"maximizer" 76 | "75",3389,60,6,57,"maximizer" 77 | "76",3736,24,7,57,"maximizer" 78 | "77",2862,50,7,60,"maximizer" 79 | "78",3556,24,6,58,"maximizer" 80 | 
"79",3228,72,6,46,"satisficer" 81 | "80",2444,40,7,41,"satisficer" 82 | "81",3184,50,4,49,"satisficer" 83 | "82",2603,40,7,62,"maximizer" 84 | "83",2857,60,7,30,"satisficer" 85 | "84",3180,12,6,70,"maximizer" 86 | "85",3233,40,4,38,"satisficer" 87 | "86",2817,40,6,74,"maximizer" 88 | "87",3292,60,7,84,"maximizer" 89 | "88",3521,50,7,77,"maximizer" 90 | "89",3593,24,4,60,"maximizer" 91 | "90",2751,60,7,70,"maximizer" 92 | "91",3549,50,5,66,"maximizer" 93 | "92",2676,60,4,67,"maximizer" 94 | "93",2692,50,6,57,"maximizer" 95 | "94",3186,12,6,32,"satisficer" 96 | "95",2943,60,7,48,"satisficer" 97 | "96",1759,12,4,67,"maximizer" 98 | "97",438,24,6,59,"maximizer" 99 | "98",695,12,5,52,"satisficer" 100 | "99",3334,60,5,64,"maximizer" 101 | "100",3676,40,6,51,"satisficer" 102 | "101",2454,72,5,55,"maximizer" 103 | "102",3817,12,6,52,"satisficer" 104 | "103",2450,50,6,71,"maximizer" 105 | "104",3384,60,7,64,"maximizer" 106 | "105",3330,12,7,66,"maximizer" 107 | "106",2864,40,5,57,"maximizer" 108 | "107",528,40,7,54,"maximizer" 109 | "108",3827,24,4,73,"maximizer" 110 | "109",1945,40,5,73,"maximizer" 111 | "110",1749,50,5,72,"maximizer" 112 | "111",3376,72,5,52,"satisficer" 113 | "112",3387,72,7,49,"satisficer" 114 | "113",2791,50,7,68,"maximizer" 115 | "114",2493,72,4,74,"maximizer" 116 | "115",2883,24,7,64,"maximizer" 117 | "116",2514,50,7,51,"satisficer" 118 | "117",3394,12,7,72,"maximizer" 119 | "118",3142,40,7,61,"maximizer" 120 | "119",828,24,7,42,"satisficer" 121 | "120",3797,72,7,86,"maximizer" 122 | "121",2691,24,6,74,"maximizer" 123 | "122",3486,24,7,63,"maximizer" 124 | "123",2755,40,7,52,"satisficer" 125 | "124",3730,50,6,58,"maximizer" 126 | "125",2869,50,5,59,"maximizer" 127 | "126",417,50,7,80,"maximizer" 128 | "127",3554,50,7,33,"satisficer" 129 | "128",357,12,6,48,"satisficer" 130 | "129",3472,40,6,68,"maximizer" 131 | "130",2463,72,6,62,"maximizer" 132 | "131",3215,40,5,55,"maximizer" 133 | "132",2543,12,7,54,"maximizer" 134 | "133",3578,72,7,53,"maximizer" 135 | "134",3383,40,7,63,"maximizer" 136 | "135",1791,72,6,51,"satisficer" 137 | "136",2686,12,6,50,"satisficer" 138 | "137",2470,72,5,54,"maximizer" 139 | "138",3028,60,7,56,"maximizer" 140 | "139",3525,12,7,45,"satisficer" 141 | "140",3363,40,6,53,"maximizer" 142 | "141",982,50,7,66,"maximizer" 143 | "142",3173,40,7,74,"maximizer" 144 | "143",2501,12,5,43,"satisficer" 145 | "144",3679,12,6,51,"satisficer" 146 | "145",1030,12,7,56,"maximizer" 147 | "146",3150,72,4,57,"maximizer" 148 | "147",2964,12,5,56,"maximizer" 149 | "148",1050,12,7,54,"maximizer" 150 | "149",3420,40,4,30,"satisficer" 151 | "150",2760,60,6,62,"maximizer" 152 | "151",2541,50,5,50,"satisficer" 153 | "152",2826,12,6,68,"maximizer" 154 | "153",2548,40,6,49,"satisficer" 155 | "154",2707,40,7,67,"maximizer" 156 | "155",349,12,7,67,"maximizer" 157 | "156",3721,60,6,53,"maximizer" 158 | "157",2833,40,7,36,"satisficer" 159 | "158",3046,72,5,51,"satisficer" 160 | "159",3598,50,6,76,"maximizer" 161 | "160",3270,60,6,64,"maximizer" 162 | "161",3382,50,4,68,"maximizer" 163 | "162",3048,50,7,67,"maximizer" 164 | "163",3468,50,7,61,"maximizer" 165 | "164",2426,40,6,58,"maximizer" 166 | "165",3605,12,7,39,"satisficer" 167 | "166",2874,40,7,34,"satisficer" 168 | "167",3271,24,7,64,"maximizer" 169 | "168",2812,24,4,89,"maximizer" 170 | "169",2753,40,6,57,"maximizer" 171 | "170",336,50,6,63,"maximizer" 172 | "171",430,12,6,37,"satisficer" 173 | "172",2503,72,6,71,"maximizer" 174 | "173",1933,60,5,38,"satisficer" 175 | "174",1606,60,7,69,"maximizer" 176 | 
"175",2309,12,7,76,"maximizer" 177 | "176",2709,60,7,62,"maximizer" 178 | "177",1790,60,5,41,"satisficer" 179 | "178",3269,40,7,61,"maximizer" 180 | "179",3337,24,6,55,"maximizer" 181 | "180",3395,50,5,55,"maximizer" 182 | "181",3830,72,6,50,"satisficer" 183 | "182",3718,72,6,78,"maximizer" 184 | "183",3351,12,6,74,"maximizer" 185 | "184",2592,60,7,68,"maximizer" 186 | "185",2589,24,7,52,"satisficer" 187 | "186",2526,50,5,34,"satisficer" 188 | "187",3450,50,6,45,"satisficer" 189 | "188",2356,60,5,70,"maximizer" 190 | "189",1996,12,6,58,"maximizer" 191 | "190",3283,72,7,64,"maximizer" 192 | "191",373,12,7,71,"maximizer" 193 | "192",2832,24,6,66,"maximizer" 194 | "193",2578,40,7,63,"maximizer" 195 | "194",2562,50,6,74,"maximizer" 196 | "195",3268,12,7,69,"maximizer" 197 | "196",3879,40,7,38,"satisficer" 198 | "197",2620,40,5,57,"maximizer" 199 | "198",3637,50,7,65,"maximizer" 200 | "199",3580,60,6,51,"satisficer" 201 | "200",3757,24,6,50,"satisficer" 202 | "201",2509,40,7,70,"maximizer" 203 | "202",356,24,6,44,"satisficer" 204 | "203",3218,50,6,48,"satisficer" 205 | "204",3520,60,5,62,"maximizer" 206 | "205",3551,40,7,58,"maximizer" 207 | "206",3166,12,6,48,"satisficer" 208 | "207",542,12,7,61,"maximizer" 209 | "208",2560,72,5,69,"maximizer" 210 | "209",2881,24,7,41,"satisficer" 211 | "210",3612,12,6,56,"maximizer" 212 | "211",2458,50,5,69,"maximizer" 213 | "212",3570,40,5,51,"satisficer" 214 | "213",3235,72,7,51,"satisficer" 215 | "214",2840,40,5,61,"maximizer" 216 | "215",1313,50,5,64,"maximizer" 217 | "216",3592,50,5,65,"maximizer" 218 | "217",3495,60,7,47,"satisficer" 219 | "218",3469,12,5,54,"maximizer" 220 | "219",3687,50,7,47,"satisficer" 221 | "220",3713,12,7,61,"maximizer" 222 | "221",2784,40,6,52,"satisficer" 223 | "222",2605,60,5,64,"maximizer" 224 | "223",3296,72,7,62,"maximizer" 225 | "224",2970,24,5,52,"satisficer" 226 | "225",3272,12,6,68,"maximizer" 227 | "226",3620,12,4,60,"maximizer" 228 | "227",3503,72,7,68,"maximizer" 229 | "228",3530,40,6,71,"maximizer" 230 | "229",2102,72,6,47,"satisficer" 231 | "230",3658,72,6,55,"maximizer" 232 | "231",3675,50,5,68,"maximizer" 233 | "232",2818,60,7,48,"satisficer" 234 | "233",2580,24,6,38,"satisficer" 235 | "234",3423,40,6,49,"satisficer" 236 | "235",3266,60,6,57,"maximizer" 237 | "236",3301,60,7,49,"satisficer" 238 | "237",2681,12,7,60,"maximizer" 239 | "238",2443,50,7,45,"satisficer" 240 | "239",811,12,6,66,"maximizer" 241 | "240",354,60,7,54,"maximizer" 242 | "241",2873,24,7,69,"maximizer" 243 | "242",2842,24,7,54,"maximizer" 244 | "243",1463,60,7,60,"maximizer" 245 | "244",3239,72,7,54,"maximizer" 246 | "245",2762,24,7,39,"satisficer" 247 | "246",2387,50,7,71,"maximizer" 248 | "247",2634,24,7,59,"maximizer" 249 | "248",3878,12,7,54,"maximizer" 250 | "249",2863,24,7,61,"maximizer" 251 | "250",2780,12,7,66,"maximizer" 252 | "251",2440,60,7,36,"satisficer" 253 | "252",3851,40,7,50,"satisficer" 254 | "253",3602,72,2,69,"maximizer" 255 | "254",3523,60,6,77,"maximizer" 256 | "255",3615,12,7,73,"maximizer" 257 | "256",1712,40,6,67,"maximizer" 258 | "257",2477,50,7,57,"maximizer" 259 | "258",2637,60,6,54,"maximizer" 260 | "259",2891,72,4,51,"satisficer" 261 | "260",3145,12,7,53,"maximizer" 262 | "261",3667,72,7,70,"maximizer" 263 | "262",3467,24,6,64,"maximizer" 264 | "263",1483,60,4,60,"maximizer" 265 | "264",2393,24,7,54,"maximizer" 266 | "265",3619,24,6,64,"maximizer" 267 | "266",3835,50,7,39,"satisficer" 268 | "267",3881,12,5,57,"maximizer" 269 | "268",3402,12,6,45,"satisficer" 270 | "269",2497,50,5,70,"maximizer" 271 | 
"270",592,50,7,64,"maximizer" 272 | "271",2774,40,6,58,"maximizer" 273 | "272",1036,50,7,49,"satisficer" 274 | "273",2781,72,7,55,"maximizer" 275 | "274",3207,24,5,49,"satisficer" 276 | "275",3613,60,6,58,"maximizer" 277 | "276",3699,60,6,41,"satisficer" 278 | "277",3731,40,7,73,"maximizer" 279 | "278",3816,12,7,84,"maximizer" 280 | "279",3579,72,7,54,"maximizer" 281 | "280",3606,12,6,70,"maximizer" 282 | "281",2830,50,5,56,"maximizer" 283 | "282",2828,24,7,73,"maximizer" 284 | "283",2285,12,6,65,"maximizer" 285 | "284",3585,60,7,83,"maximizer" 286 | "285",1505,50,4,76,"maximizer" 287 | "286",3517,50,5,65,"maximizer" 288 | "287",3438,72,6,49,"satisficer" 289 | "288",2875,40,7,66,"maximizer" 290 | "289",2569,72,7,52,"satisficer" 291 | "290",3393,72,7,52,"satisficer" 292 | "291",595,60,6,65,"maximizer" 293 | "292",3386,24,6,49,"satisficer" 294 | "293",3378,24,3,56,"maximizer" 295 | "294",2570,72,7,43,"satisficer" 296 | "295",825,72,6,67,"maximizer" 297 | "296",439,72,4,40,"satisficer" 298 | "297",3282,60,7,77,"maximizer" 299 | "298",3428,50,4,70,"maximizer" 300 | "299",1639,50,6,55,"maximizer" 301 | "300",3526,60,7,43,"satisficer" 302 | "301",3582,12,7,54,"maximizer" 303 | "302",3429,12,7,49,"satisficer" 304 | "303",2636,12,7,50,"satisficer" 305 | "304",3729,72,7,53,"maximizer" 306 | "305",3433,40,7,48,"satisficer" 307 | "306",2747,60,5,69,"maximizer" 308 | "307",3716,24,2,78,"maximizer" 309 | "308",3007,24,7,52,"satisficer" 310 | "309",3342,40,6,53,"maximizer" 311 | "310",2847,40,7,57,"maximizer" 312 | "311",2279,12,5,65,"maximizer" 313 | "312",2482,60,6,49,"satisficer" 314 | "313",2661,24,7,56,"maximizer" 315 | "314",676,50,6,44,"satisficer" 316 | "315",3188,72,7,42,"satisficer" 317 | "316",1677,40,6,50,"satisficer" 318 | "317",3360,72,6,49,"satisficer" 319 | "318",2776,50,7,65,"maximizer" 320 | "319",3712,24,7,47,"satisficer" 321 | "320",3514,24,7,44,"satisficer" 322 | "321",2834,72,6,48,"satisficer" 323 | "322",2851,12,7,57,"maximizer" 324 | "323",2523,60,6,62,"maximizer" 325 | "324",3635,72,7,59,"maximizer" 326 | "325",3253,50,7,40,"satisficer" 327 | "326",648,72,7,41,"satisficer" 328 | "327",2522,12,6,68,"maximizer" 329 | "328",3711,12,5,70,"maximizer" 330 | "329",874,50,5,36,"satisficer" 331 | "330",3197,24,6,54,"maximizer" 332 | "331",3454,40,6,68,"maximizer" 333 | "332",3560,12,4,57,"maximizer" 334 | "333",2246,40,7,55,"maximizer" 335 | "334",3365,50,6,53,"maximizer" 336 | "335",2838,50,5,60,"maximizer" 337 | "336",3670,72,7,53,"maximizer" 338 | "337",2485,40,6,41,"satisficer" 339 | "338",2508,40,7,57,"maximizer" 340 | "339",3065,12,5,77,"maximizer" 341 | "340",3431,12,6,61,"maximizer" 342 | "341",2572,12,5,42,"satisficer" 343 | "342",1701,50,7,60,"maximizer" 344 | "343",460,24,6,64,"maximizer" 345 | "344",3331,72,6,33,"satisficer" 346 | "345",2705,50,7,66,"maximizer" 347 | "346",2788,12,7,46,"satisficer" 348 | "347",2442,60,7,58,"maximizer" 349 | "348",2611,50,6,54,"maximizer" 350 | "349",3621,72,7,51,"satisficer" 351 | "350",3763,50,7,59,"maximizer" 352 | "351",3178,60,5,49,"satisficer" 353 | "352",2789,50,5,53,"maximizer" 354 | "353",3332,24,6,63,"maximizer" 355 | "354",1120,50,6,61,"maximizer" 356 | "355",2534,60,6,60,"maximizer" 357 | "356",3722,12,3,56,"maximizer" 358 | "357",3422,60,4,48,"satisficer" 359 | "358",839,24,7,61,"maximizer" 360 | "359",631,72,6,85,"maximizer" 361 | "360",3485,60,7,53,"maximizer" 362 | "361",3719,50,7,58,"maximizer" 363 | "362",3311,50,6,58,"maximizer" 364 | "363",3564,40,7,68,"maximizer" 365 | "364",3147,72,5,60,"maximizer" 366 | 
"365",2933,40,7,61,"maximizer" 367 | "366",3062,60,7,45,"satisficer" 368 | "367",2484,40,6,54,"maximizer" 369 | "368",2431,40,5,56,"maximizer" 370 | "369",3258,50,5,65,"maximizer" 371 | "370",3347,72,6,63,"maximizer" 372 | "371",3575,60,6,62,"maximizer" 373 | "372",3470,50,7,44,"satisficer" 374 | "373",3741,12,7,55,"maximizer" 375 | "374",2768,60,7,53,"maximizer" 376 | "375",2599,60,6,49,"satisficer" 377 | "376",3748,40,7,75,"maximizer" 378 | "377",3002,40,6,65,"maximizer" 379 | "378",2878,72,7,57,"maximizer" 380 | "379",2506,24,7,54,"maximizer" 381 | "380",3700,12,4,65,"maximizer" 382 | "381",1390,12,7,48,"satisficer" 383 | "382",2808,40,7,48,"satisficer" 384 | "383",2764,50,7,37,"satisficer" 385 | "384",334,60,6,62,"maximizer" 386 | "385",2490,72,6,52,"satisficer" 387 | "386",2459,72,6,44,"satisficer" 388 | "387",3515,12,7,52,"satisficer" 389 | "388",2451,72,7,45,"satisficer" 390 | "389",3437,72,7,55,"maximizer" 391 | "390",1024,60,5,58,"maximizer" 392 | "391",3241,60,4,66,"maximizer" 393 | "392",2665,50,7,50,"satisficer" 394 | "393",3064,12,7,59,"maximizer" 395 | "394",3669,72,7,51,"satisficer" 396 | "395",3657,12,5,57,"maximizer" 397 | "396",3392,60,7,63,"maximizer" 398 | "397",2831,72,7,44,"satisficer" 399 | "398",3297,72,6,55,"maximizer" 400 | "399",3542,72,7,58,"maximizer" 401 | "400",3789,72,6,42,"satisficer" 402 | "401",3673,12,7,67,"maximizer" 403 | "402",2643,24,6,46,"satisficer" 404 | "403",3544,12,6,36,"satisficer" 405 | "404",3265,12,7,52,"satisficer" 406 | "405",3295,72,6,75,"maximizer" 407 | "406",2759,24,7,63,"maximizer" 408 | "407",3701,72,7,53,"maximizer" 409 | "408",3164,24,5,43,"satisficer" 410 | "409",2790,24,5,45,"satisficer" 411 | "410",3622,40,7,74,"maximizer" 412 | "411",3466,24,7,62,"maximizer" 413 | "412",3697,24,5,62,"maximizer" 414 | "413",3447,24,7,72,"maximizer" 415 | "414",3225,40,7,37,"satisficer" 416 | "415",3183,40,6,62,"maximizer" 417 | "416",3819,40,7,44,"satisficer" 418 | "417",3534,24,7,60,"maximizer" 419 | "418",2853,60,7,59,"maximizer" 420 | "419",2767,40,7,44,"satisficer" 421 | "420",3231,50,7,42,"satisficer" 422 | "421",2645,12,7,63,"maximizer" 423 | "422",1010,12,7,47,"satisficer" 424 | "423",2616,24,6,43,"satisficer" 425 | "424",3639,40,5,53,"maximizer" 426 | "425",3683,60,7,55,"maximizer" 427 | "426",3456,72,6,61,"maximizer" 428 | "427",2456,60,4,61,"maximizer" 429 | "428",1603,72,6,73,"maximizer" 430 | "429",2848,60,7,53,"maximizer" 431 | "430",3834,40,6,63,"maximizer" 432 | "431",3573,12,5,52,"satisficer" 433 | "432",3122,12,7,36,"satisficer" 434 | "433",2931,40,6,52,"satisficer" 435 | "434",3756,40,7,60,"maximizer" 436 | "435",2821,24,7,38,"satisficer" 437 | "436",3352,12,6,52,"satisficer" 438 | "437",2950,12,7,71,"maximizer" 439 | "438",2565,24,7,77,"maximizer" 440 | "439",2520,60,6,58,"maximizer" 441 | "440",3636,60,6,37,"satisficer" 442 | "441",1542,72,6,66,"maximizer" 443 | "442",1937,60,4,52,"satisficer" 444 | "443",3156,72,7,51,"satisficer" 445 | "444",3509,60,7,43,"satisficer" 446 | "445",2704,60,6,59,"maximizer" 447 | "446",375,12,6,52,"satisficer" 448 | "447",3684,12,7,68,"maximizer" 449 | "448",2606,12,5,60,"maximizer" 450 | "449",2518,40,7,44,"satisficer" 451 | "450",3165,24,5,58,"maximizer" 452 | "451",2816,60,4,67,"maximizer" 453 | "452",3053,24,5,61,"maximizer" 454 | "453",2607,60,5,59,"maximizer" 455 | "454",441,12,7,36,"satisficer" 456 | "455",3655,50,6,60,"maximizer" 457 | "456",3444,40,6,81,"maximizer" 458 | "457",3836,50,7,53,"maximizer" 459 | "458",1819,50,6,54,"maximizer" 460 | "459",2229,72,7,47,"satisficer" 461 | 
"460",3262,12,7,49,"satisficer" 462 | "461",3608,24,5,65,"maximizer" 463 | "462",2804,60,6,60,"maximizer" 464 | "463",3841,60,7,53,"maximizer" 465 | "464",789,24,6,59,"maximizer" 466 | "465",2553,12,4,75,"maximizer" 467 | "466",2517,40,7,33,"satisficer" 468 | "467",1252,72,6,40,"satisficer" 469 | "468",2746,40,7,39,"satisficer" 470 | "469",3545,72,5,75,"maximizer" 471 | "470",3054,12,6,54,"maximizer" 472 | "471",330,60,1,55,"maximizer" 473 | "472",2724,50,7,31,"satisficer" 474 | "473",2811,24,5,57,"maximizer" 475 | "474",3249,60,6,60,"maximizer" 476 | "475",2797,40,5,44,"satisficer" 477 | "476",2427,50,4,66,"maximizer" 478 | "477",3855,60,7,49,"satisficer" 479 | "478",2713,60,7,91,"maximizer" 480 | "479",1770,50,7,46,"satisficer" 481 | "480",3617,72,7,56,"maximizer" 482 | "481",2773,72,6,46,"satisficer" 483 | "482",3187,24,7,50,"satisficer" 484 | "483",2861,50,7,71,"maximizer" 485 | "484",2893,60,4,54,"maximizer" 486 | "485",2489,50,6,39,"satisficer" 487 | "486",2953,40,6,52,"satisficer" 488 | "487",3280,12,7,32,"satisficer" 489 | "488",2494,50,6,46,"satisficer" 490 | "489",2694,60,7,52,"satisficer" 491 | "490",3853,40,7,52,"satisficer" 492 | "491",3425,40,7,64,"maximizer" 493 | "492",3243,72,7,53,"maximizer" 494 | "493",3443,24,7,52,"satisficer" 495 | "494",3181,12,7,44,"satisficer" 496 | "495",2785,50,7,71,"maximizer" 497 | "496",3533,50,6,50,"satisficer" 498 | "497",2457,72,6,48,"satisficer" 499 | "498",2564,12,7,66,"maximizer" 500 | "499",2680,40,7,50,"satisficer" 501 | "500",2695,24,7,61,"maximizer" 502 | "501",3434,60,7,62,"maximizer" 503 | "502",2283,12,6,74,"maximizer" 504 | "503",2783,72,7,50,"satisficer" 505 | "504",3375,12,7,51,"satisficer" 506 | "505",3255,40,5,57,"maximizer" 507 | "506",2819,40,6,59,"maximizer" 508 | "507",2849,60,7,45,"satisficer" 509 | "508",3677,60,7,37,"satisficer" 510 | "509",3024,12,4,43,"satisficer" 511 | "510",2855,12,6,45,"satisficer" 512 | "511",3388,40,6,54,"maximizer" 513 | "512",2448,12,6,40,"satisficer" 514 | "513",3103,72,5,32,"satisficer" 515 | "514",2540,40,6,57,"maximizer" 516 | "515",2805,50,6,43,"satisficer" 517 | "516",3131,72,7,53,"maximizer" 518 | "517",3532,72,7,67,"maximizer" 519 | "518",2000,60,7,57,"maximizer" 520 | "519",3385,12,5,49,"satisficer" 521 | "520",3016,50,6,59,"maximizer" 522 | "521",3440,24,4,30,"satisficer" 523 | "522",2786,12,7,36,"satisficer" 524 | "523",2577,40,7,60,"maximizer" 525 | "524",3286,12,6,58,"maximizer" 526 | "525",2452,40,7,49,"satisficer" 527 | "526",3293,50,6,72,"maximizer" 528 | "527",3781,24,5,52,"satisficer" 529 | "528",3211,24,6,65,"maximizer" 530 | "529",3226,40,7,35,"satisficer" 531 | "530",2779,50,7,57,"maximizer" 532 | "531",2612,72,7,48,"satisficer" 533 | "532",3400,50,6,41,"satisficer" 534 | "533",455,40,6,68,"maximizer" 535 | "534",2905,50,7,30,"satisficer" 536 | "535",3832,50,5,62,"maximizer" 537 | "536",3460,24,5,80,"maximizer" 538 | "537",3281,12,6,49,"satisficer" 539 | "538",1015,12,7,43,"satisficer" 540 | "539",3362,60,6,53,"maximizer" 541 | "540",3338,40,6,57,"maximizer" 542 | "541",2941,50,6,55,"maximizer" 543 | "542",2504,50,7,72,"maximizer" 544 | "543",2197,60,7,48,"satisficer" 545 | "544",3254,40,7,52,"satisficer" 546 | "545",3435,60,7,66,"maximizer" 547 | "546",3144,12,7,55,"maximizer" 548 | "547",784,60,7,66,"maximizer" 549 | "548",3303,60,6,54,"maximizer" 550 | "549",2492,72,5,56,"maximizer" 551 | "550",3307,12,6,47,"satisficer" 552 | "551",2841,50,6,40,"satisficer" 553 | "552",3698,60,7,32,"satisficer" 554 | "553",3439,60,6,72,"maximizer" 555 | "554",3177,50,6,56,"maximizer" 556 
| "555",3333,72,5,71,"maximizer" 557 | "556",2545,72,5,45,"satisficer" 558 | "557",2792,24,7,62,"maximizer" 559 | "558",3315,24,6,42,"satisficer" 560 | "559",2955,72,6,55,"maximizer" 561 | "560",3083,50,7,61,"maximizer" 562 | "561",2887,72,6,54,"maximizer" 563 | "562",2478,60,5,61,"maximizer" 564 | "563",3170,12,6,77,"maximizer" 565 | "564",2744,60,6,52,"satisficer" 566 | "565",2537,50,7,65,"maximizer" 567 | "566",3285,50,7,49,"satisficer" 568 | "567",3680,40,6,57,"maximizer" 569 | "568",2529,50,6,58,"maximizer" 570 | "569",3576,72,7,43,"satisficer" 571 | "570",3706,72,7,45,"satisficer" 572 | "571",2870,72,7,64,"maximizer" 573 | "572",3371,60,7,72,"maximizer" 574 | "573",3459,40,6,46,"satisficer" 575 | "574",3645,50,4,80,"maximizer" 576 | "575",3571,50,7,59,"maximizer" 577 | "576",2595,60,7,43,"satisficer" 578 | "577",2212,60,7,60,"maximizer" 579 | "578",3348,60,7,57,"maximizer" 580 | "579",1971,60,6,50,"satisficer" 581 | "580",3244,72,7,34,"satisficer" 582 | "581",3182,72,6,43,"satisficer" 583 | "582",2850,60,3,52,"satisficer" 584 | "583",629,12,6,55,"maximizer" 585 | "584",2772,40,7,58,"maximizer" 586 | "585",3860,24,5,59,"maximizer" 587 | "586",2568,40,7,67,"maximizer" 588 | "587",3724,72,6,62,"maximizer" 589 | "588",1289,12,7,52,"satisficer" 590 | "589",2871,24,4,77,"maximizer" 591 | "590",2586,50,6,55,"maximizer" 592 | "591",2608,50,6,69,"maximizer" 593 | "592",3398,12,6,53,"maximizer" 594 | "593",2604,24,6,66,"maximizer" 595 | "594",2675,24,6,46,"satisficer" 596 | "595",3110,12,7,33,"satisficer" 597 | "596",2527,12,6,42,"satisficer" 598 | "597",415,40,7,78,"maximizer" 599 | "598",3416,24,6,62,"maximizer" 600 | "599",2765,12,7,45,"satisficer" 601 | "600",2587,12,7,66,"maximizer" 602 | "601",3381,12,4,67,"maximizer" 603 | "602",2618,72,6,60,"maximizer" 604 | "603",3546,60,4,46,"satisficer" 605 | "604",2858,60,7,62,"maximizer" 606 | "605",2837,24,6,54,"maximizer" 607 | "606",2673,50,7,66,"maximizer" 608 | "607",3026,24,6,49,"satisficer" 609 | "608",3762,60,7,54,"maximizer" 610 | "609",1980,60,6,58,"maximizer" 611 | "610",3276,40,5,70,"maximizer" 612 | "611",3626,40,7,62,"maximizer" 613 | -------------------------------------------------------------------------------- /datasets/feel-the-movement_simulated-data.csv: -------------------------------------------------------------------------------- 1 | "","participant","motion","tlx_scale","value" 2 | "1",1,"motion","effort",62.6771370559256 3 | "2",2,"motion","effort",63.4354251165914 4 | "3",3,"motion","effort",76.2658933484645 5 | "4",4,"motion","effort",66.3820234657209 6 | "5",5,"motion","effort",65.6975218414369 7 | "6",6,"motion","effort",74.03741634514 8 | "7",7,"motion","effort",62.7278952274524 9 | "8",8,"motion","effort",64.4763705067135 10 | "9",9,"motion","effort",60.5876560715567 11 | "10",10,"motion","effort",67.2479168232032 12 | "11",11,"motion","effort",68.7483313902238 13 | "12",12,"motion","effort",63.5102955005729 14 | "13",13,"motion","effort",53.16100398592 15 | "14",14,"motion","effort",43.7113566640125 16 | "15",15,"motion","effort",63.0529421613001 17 | "16",16,"motion","effort",52.2808144957658 18 | "17",1,"motion","frustration",61.6201519017405 19 | "18",2,"motion","frustration",67.3077229193698 20 | "19",3,"motion","frustration",38.4557544676673 21 | "20",4,"motion","frustration",40.9211178284298 22 | "21",5,"motion","frustration",66.8935737234158 23 | "22",6,"motion","frustration",70.470560878121 24 | "23",7,"motion","frustration",75.7511739437905 25 | "24",8,"motion","frustration",45.9269502385398 26 | 
"25",9,"motion","frustration",31.8621980876999 27 | "26",10,"motion","frustration",55.114263582379 28 | "27",11,"motion","frustration",25.5436395548097 29 | "28",12,"motion","frustration",43.835108881354 30 | "29",13,"motion","frustration",51.5842570093871 31 | "30",14,"motion","frustration",32.8405486619909 32 | "31",15,"motion","frustration",46.5531182429116 33 | "32",16,"motion","frustration",33.3198600783933 34 | "33",1,"motion","mental_demand",65.1995519130071 35 | "34",2,"motion","mental_demand",65.6961609874982 36 | "35",3,"motion","mental_demand",59.3961427371377 37 | "36",4,"motion","mental_demand",69.8805770972607 38 | "37",5,"motion","mental_demand",77.6740721949966 39 | "38",6,"motion","mental_demand",70.4827130196958 40 | "39",7,"motion","mental_demand",58.5390917473144 41 | "40",8,"motion","mental_demand",73.5908307345228 42 | "41",9,"motion","mental_demand",75.1373820502899 43 | "42",10,"motion","mental_demand",67.9491896417236 44 | "43",11,"motion","mental_demand",67.3894228990569 45 | "44",12,"motion","mental_demand",67.8700868540447 46 | "45",13,"motion","mental_demand",70.6149403360524 47 | "46",14,"motion","mental_demand",70.1159547762453 48 | "47",15,"motion","mental_demand",70.5115723937418 49 | "48",16,"motion","mental_demand",69.9523106174122 50 | "49",1,"motion","performance",53.8756777801028 51 | "50",2,"motion","performance",61.7459940639232 52 | "51",3,"motion","performance",36.7482036856618 53 | "52",4,"motion","performance",58.4828245097124 54 | "53",5,"motion","performance",65.7588288541138 55 | "54",6,"motion","performance",57.5078005553597 56 | "55",7,"motion","performance",57.7940598138312 57 | "56",8,"motion","performance",58.7180697964181 58 | "57",9,"motion","performance",40.0213269023191 59 | "58",10,"motion","performance",67.1339510428577 60 | "59",11,"motion","performance",52.3945502146542 61 | "60",12,"motion","performance",28.554922625903 62 | "61",13,"motion","performance",52.5881457436131 63 | "62",14,"motion","performance",58.6607454744935 64 | "63",15,"motion","performance",45.7458331637738 65 | "64",16,"motion","performance",44.2690657732627 66 | "65",1,"motion","physical_demand",44.3822856923961 67 | "66",2,"motion","physical_demand",42.394630354364 68 | "67",3,"motion","physical_demand",31.3837039265605 69 | "68",4,"motion","physical_demand",20.9895777776461 70 | "69",5,"motion","physical_demand",50.3030074925543 71 | "70",6,"motion","physical_demand",36.1740268714536 72 | "71",7,"motion","physical_demand",78.5673363887555 73 | "72",8,"motion","physical_demand",43.8709626597258 74 | "73",9,"motion","physical_demand",47.6785917730261 75 | "74",10,"motion","physical_demand",54.1910169249521 76 | "75",11,"motion","physical_demand",24.1474535522667 77 | "76",12,"motion","physical_demand",43.1677762007205 78 | "77",13,"motion","physical_demand",25.6418001648926 79 | "78",14,"motion","physical_demand",33.2263705309118 80 | "79",15,"motion","physical_demand",53.8940040112754 81 | "80",16,"motion","physical_demand",45.0274556784988 82 | "81",1,"motion","temporal_demand",64.7585219029532 83 | "82",2,"motion","temporal_demand",67.8707793375568 84 | "83",3,"motion","temporal_demand",68.7232160707118 85 | "84",4,"motion","temporal_demand",50.5731151746078 86 | "85",5,"motion","temporal_demand",59.9556328636728 87 | "86",6,"motion","temporal_demand",56.9228740267909 88 | "87",7,"motion","temporal_demand",64.8387476229165 89 | "88",8,"motion","temporal_demand",66.132502668041 90 | "89",9,"motion","temporal_demand",67.516716381199 91 | 
"90",10,"motion","temporal_demand",57.7545036553703 92 | "91",11,"motion","temporal_demand",61.6615295154681 93 | "92",12,"motion","temporal_demand",63.2242521256356 94 | "93",13,"motion","temporal_demand",68.0329269256004 95 | "94",14,"motion","temporal_demand",57.6838375149577 96 | "95",15,"motion","temporal_demand",58.1614300601278 97 | "96",16,"motion","temporal_demand",46.1894141543904 98 | "97",1,"no_motion","effort",58.5614249963152 99 | "98",2,"no_motion","effort",64.699364050644 100 | "99",3,"no_motion","effort",56.1285571312222 101 | "100",4,"no_motion","effort",57.5043251549015 102 | "101",5,"no_motion","effort",72.4094233483514 103 | "102",6,"no_motion","effort",63.5574861451147 104 | "103",7,"no_motion","effort",57.7852575831209 105 | "104",8,"no_motion","effort",66.8122813853142 106 | "105",9,"no_motion","effort",63.0053416399275 107 | "106",10,"no_motion","effort",59.3436305871179 108 | "107",11,"no_motion","effort",72.4897883385223 109 | "108",12,"no_motion","effort",58.5363008932268 110 | "109",13,"no_motion","effort",60.2628165237742 111 | "110",14,"no_motion","effort",71.1814214109012 112 | "111",15,"no_motion","effort",69.4400907968164 113 | "112",16,"no_motion","effort",72.2824900147297 114 | "113",1,"no_motion","frustration",38.6685024331998 115 | "114",2,"no_motion","frustration",19.5940822219001 116 | "115",3,"no_motion","frustration",27.1474499149157 117 | "116",4,"no_motion","frustration",53.7579266113433 118 | "117",5,"no_motion","frustration",41.3116380443163 119 | "118",6,"no_motion","frustration",46.5499078783941 120 | "119",7,"no_motion","frustration",51.4600640247224 121 | "120",8,"no_motion","frustration",60.7924116922647 122 | "121",9,"no_motion","frustration",63.6737578699399 123 | "122",10,"no_motion","frustration",72.2826117461159 124 | "123",11,"no_motion","frustration",33.9419530392239 125 | "124",12,"no_motion","frustration",61.4898151714451 126 | "125",13,"no_motion","frustration",55.046118963894 127 | "126",14,"no_motion","frustration",38.2602047315632 128 | "127",15,"no_motion","frustration",48.2965876008443 129 | "128",16,"no_motion","frustration",47.726968055917 130 | "129",1,"no_motion","mental_demand",61.3914073808808 131 | "130",2,"no_motion","mental_demand",71.3401568567912 132 | "131",3,"no_motion","mental_demand",59.9000047044046 133 | "132",4,"no_motion","mental_demand",60.9500381088238 134 | "133",5,"no_motion","mental_demand",66.5278306263365 135 | "134",6,"no_motion","mental_demand",64.3631198045993 136 | "135",7,"no_motion","mental_demand",57.9692286938047 137 | "136",8,"no_motion","mental_demand",78.9522439205533 138 | "137",9,"no_motion","mental_demand",74.0910105261259 139 | "138",10,"no_motion","mental_demand",63.9942276948203 140 | "139",11,"no_motion","mental_demand",71.4781760333647 141 | "140",12,"no_motion","mental_demand",56.4007211104531 142 | "141",13,"no_motion","mental_demand",72.29144173273 143 | "142",14,"no_motion","mental_demand",71.8361674389643 144 | "143",15,"no_motion","mental_demand",77.5489661048835 145 | "144",16,"no_motion","mental_demand",70.965259262464 146 | "145",1,"no_motion","performance",38.7432612359684 147 | "146",2,"no_motion","performance",56.2074461601137 148 | "147",3,"no_motion","performance",63.0138244963009 149 | "148",4,"no_motion","performance",40.8894393148272 150 | "149",5,"no_motion","performance",58.2739402425559 151 | "150",6,"no_motion","performance",52.2537180805095 152 | "151",7,"no_motion","performance",63.5728827865647 153 | "152",8,"no_motion","performance",57.7975034803145 154 | 
"153",9,"no_motion","performance",38.4347005303272 155 | "154",10,"no_motion","performance",55.8594952584844 156 | "155",11,"no_motion","performance",64.3387071322576 157 | "156",12,"no_motion","performance",27.7535568520894 158 | "157",13,"no_motion","performance",49.8610842757746 159 | "158",14,"no_motion","performance",41.2963735635064 160 | "159",15,"no_motion","performance",64.3302098050749 161 | "160",16,"no_motion","performance",63.3738567853306 162 | "161",1,"no_motion","physical_demand",48.5022574023903 163 | "162",2,"no_motion","physical_demand",56.5879485585391 164 | "163",3,"no_motion","physical_demand",32.8479713910682 165 | "164",4,"no_motion","physical_demand",33.2010798560281 166 | "165",5,"no_motion","physical_demand",48.6489109280486 167 | "166",6,"no_motion","physical_demand",30.1677238807114 168 | "167",7,"no_motion","physical_demand",15.3014055671121 169 | "168",8,"no_motion","physical_demand",38.2957947932656 170 | "169",9,"no_motion","physical_demand",29.8803937380322 171 | "170",10,"no_motion","physical_demand",27.4228684319627 172 | "171",11,"no_motion","physical_demand",34.0094756756381 173 | "172",12,"no_motion","physical_demand",19.2195937995676 174 | "173",13,"no_motion","physical_demand",36.6178038330854 175 | "174",14,"no_motion","physical_demand",30.8737385132687 176 | "175",15,"no_motion","physical_demand",47.4495966274307 177 | "176",16,"no_motion","physical_demand",35.9334370038512 178 | "177",1,"no_motion","temporal_demand",60.9615011891964 179 | "178",2,"no_motion","temporal_demand",65.7747207347715 180 | "179",3,"no_motion","temporal_demand",54.3904375394031 181 | "180",4,"no_motion","temporal_demand",70.6459092694736 182 | "181",5,"no_motion","temporal_demand",64.6354579889539 183 | "182",6,"no_motion","temporal_demand",69.7955333793362 184 | "183",7,"no_motion","temporal_demand",63.7476922150155 185 | "184",8,"no_motion","temporal_demand",64.7973511950388 186 | "185",9,"no_motion","temporal_demand",66.982649980603 187 | "186",10,"no_motion","temporal_demand",57.9476773540266 188 | "187",11,"no_motion","temporal_demand",74.3224239520792 189 | "188",12,"no_motion","temporal_demand",65.611152777591 190 | "189",13,"no_motion","temporal_demand",63.7785493137214 191 | "190",14,"no_motion","temporal_demand",59.6897169551303 192 | "191",15,"no_motion","temporal_demand",69.1993772354949 193 | "192",16,"no_motion","temporal_demand",63.7198489201647 194 | -------------------------------------------------------------------------------- /datasets/stigmatized_campaigns_simulated-data.csv: -------------------------------------------------------------------------------- 1 | "","participant","attitude","order","persuasiveness","empathy" 2 | "1",1,"support","before",4.40573455089864,3.89829569417854 3 | "2",2,"support","before",5.73009162749441,3.63916980722547 4 | "3",3,"support","before",5.13983264589146,2.31759521215623 5 | "4",4,"support","before",3.28325535163691,4.2863938522029 6 | "5",5,"support","before",2.6702387860235,3.48695273941219 7 | "6",6,"support","before",3.80442011852061,3.36107961746716 8 | "7",7,"support","before",4.04852092501442,3.5999877262985 9 | "8",8,"support","before",3.60740761836469,5.20765809139836 10 | "9",9,"support","before",4.96716329694572,3.19667979166351 11 | "10",10,"support","before",3.00984202321901,3.24574042282484 12 | "11",11,"support","before",2.71857802020707,4.401826939733 13 | "12",12,"support","before",3.95702947304393,2.30384720967563 14 | "13",13,"support","before",1.6326318467114,4.59699075452465 15 | 
"14",14,"support","before",3.48812747845937,5.65821578887246 16 | "15",15,"support","before",5.3772858657999,4.1045331930034 17 | "16",16,"support","before",2.07124933759269,2.25624252538731 18 | "17",17,"support","before",3.94834414343596,3.85216606783078 19 | "18",18,"support","before",3.78466822695807,3.0637306150288 20 | "19",19,"support","before",2.87183112635898,4.23705606263923 21 | "20",20,"support","before",1.82981883912361,3.64173725107282 22 | "21",21,"support","before",4.41562687567857,4.02352485737185 23 | "22",22,"support","before",2.98636453830791,3.11353736359473 24 | "23",23,"support","before",2.31369888756975,3.06429014917687 25 | "24",24,"support","before",3.29600639760319,3.70288171567384 26 | "25",25,"support","before",3.1422319991402,2.73986655158695 27 | "26",26,"oppose","before",2.70289663267111,3.31535050426009 28 | "27",27,"oppose","before",4.86976299399937,3.30981443993623 29 | "28",28,"oppose","before",1.75111048739079,1.26886028290927 30 | "29",29,"oppose","before",4.82878313846614,3.1842382024408 31 | "30",30,"oppose","before",3.35000931386195,3.60723816093571 32 | "31",31,"oppose","before",6.20794873026476,3.75166370350341 33 | "32",32,"oppose","before",4.69091296237086,3.37451851145866 34 | "33",33,"oppose","before",4.08316338830329,3.40279418681068 35 | "34",34,"oppose","before",3.31458752093788,2.51468529768596 36 | "35",35,"oppose","before",6.97669442648597,2.86035223864848 37 | "36",36,"oppose","before",3.00841661882744,3.33653109469252 38 | "37",37,"oppose","before",3.18419262164537,4.82062688154548 39 | "38",38,"oppose","before",4.96941219408283,1.27308752228665 40 | "39",39,"oppose","before",2.92578656710011,2.97837989709837 41 | "40",40,"oppose","before",3.87792735611406,0.903726365687009 42 | "41",41,"oppose","before",3.30829261761016,2.43491933490072 43 | "42",42,"oppose","before",3.96754076943,5.08259618331271 44 | "43",43,"oppose","before",2.68787234714274,2.424575409719 45 | "44",44,"oppose","before",1.83716206277372,2.80755583942817 46 | "45",45,"oppose","before",4.43377392427531,1.61042083059434 47 | "46",46,"oppose","before",4.02172216658175,2.96534691171415 48 | "47",47,"oppose","before",4.14142591343767,3.94684014524908 49 | "48",48,"oppose","before",2.96776692307587,2.19538928571236 50 | "49",49,"oppose","before",3.32007335407847,3.40343555795276 51 | "50",50,"oppose","before",4.73281399612096,3.59944932362922 52 | "51",51,"oppose","before",4.39457829507545,3.90979678016095 53 | "52",52,"oppose","before",1.50537267787595,5.14780710772723 54 | "53",1,"support","after",6.00486956549221,4.62239312343345 55 | "54",2,"support","after",4.59584395452931,3.76215675555703 56 | "55",3,"support","after",4.11958565125179,4.7780622342035 57 | "56",4,"support","after",5.73047442817776,4.17604432845596 58 | "57",5,"support","after",6.04130988239085,3.34823570880482 59 | "58",6,"support","after",5.1167068473927,4.9156872193279 60 | "59",7,"support","after",7.66399217871838,4.17638215867229 61 | "60",8,"support","after",5.63696066414526,5.72330328717527 62 | "61",9,"support","after",4.14926207707465,6.19617062669002 63 | "62",10,"support","after",6.24596311609874,3.83792542175314 64 | "63",11,"support","after",5.21344815836189,4.25006545689576 65 | "64",12,"support","after",3.72285662677104,7.87705734028252 66 | "65",13,"support","after",3.36819379876267,3.51488167366069 67 | "66",14,"support","after",4.04772616957932,4.32987922446406 68 | "67",15,"support","after",6.1729909818432,5.73565171201996 69 | 
"68",16,"support","after",3.32926156178592,4.66310619896197 70 | "69",17,"support","after",4.60756738336794,4.77864728078604 71 | "70",18,"support","after",5.00379826465122,4.87565012560552 72 | "71",19,"support","after",3.82124270910706,4.61857579354836 73 | "72",20,"support","after",4.9640476592055,3.21249651219853 74 | "73",21,"support","after",6.39607726134969,4.5219555890663 75 | "74",22,"support","after",4.44138024697706,4.82037924824058 76 | "75",23,"support","after",4.02393168187947,4.45095300314973 77 | "76",24,"support","after",4.02752151254139,4.18112720355392 78 | "77",25,"support","after",4.55498761854497,4.13321277349271 79 | "78",26,"oppose","after",3.4454529630567,2.22918869662918 80 | "79",27,"oppose","after",4.22624045580722,4.15000529557309 81 | "80",28,"oppose","after",5.69724172159873,3.25569679509731 82 | "81",29,"oppose","after",5.74106831818266,3.08859063878211 83 | "82",30,"oppose","after",1.8152787713683,2.32702904348417 84 | "83",31,"oppose","after",5.71882997237842,1.16797941728631 85 | "84",32,"oppose","after",3.78635363713166,2.74976474417464 86 | "85",33,"oppose","after",2.55151914792261,2.50498467560064 87 | "86",34,"oppose","after",3.42759677802461,0.834672289763792 88 | "87",35,"oppose","after",2.17260751602258,3.80350186944327 89 | "88",36,"oppose","after",2.18009493701987,2.65440817762918 90 | "89",37,"oppose","after",5.71354722299045,4.43194036032053 91 | "90",38,"oppose","after",2.19249939431504,2.72964599398318 92 | "91",39,"oppose","after",5.06291739611731,2.64781813087259 93 | "92",40,"oppose","after",5.38920634265136,2.64041083712852 94 | "93",41,"oppose","after",2.45926999269871,1.5927993445083 95 | "94",42,"oppose","after",4.90025733792512,3.74624881011539 96 | "95",43,"oppose","after",3.98899922267144,2.09049436273894 97 | "96",44,"oppose","after",4.31140994525199,3.55233604501274 98 | "97",45,"oppose","after",6.89653197977992,2.97980036566709 99 | "98",46,"oppose","after",2.08780943334144,2.23728502289986 100 | "99",47,"oppose","after",3.98847042995348,2.97282437454049 101 | "100",48,"oppose","after",4.39322474871176,1.22900098662931 102 | "101",49,"oppose","after",5.35755261568502,1.67473946258257 103 | "102",50,"oppose","after",3.29999493498857,0.949884504886524 104 | "103",51,"oppose","after",4.28009849622172,3.47890509762928 105 | "104",52,"oppose","after",3.9959262881833,2.48004465702101 106 | -------------------------------------------------------------------------------- /images/generic_2bar_chart.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/cdphelan/bayesian-template/1c8973cc9dfd07b7a1def207f7cf59a258eb41b5/images/generic_2bar_chart.png -------------------------------------------------------------------------------- /images/generic_2line-cont_chart.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/cdphelan/bayesian-template/1c8973cc9dfd07b7a1def207f7cf59a258eb41b5/images/generic_2line-cont_chart.png -------------------------------------------------------------------------------- /images/generic_2line_chart.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/cdphelan/bayesian-template/1c8973cc9dfd07b7a1def207f7cf59a258eb41b5/images/generic_2line_chart.png -------------------------------------------------------------------------------- /images/generic_2line_chart_old.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/cdphelan/bayesian-template/1c8973cc9dfd07b7a1def207f7cf59a258eb41b5/images/generic_2line_chart_old.png -------------------------------------------------------------------------------- /images/generic_bar_chart.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/cdphelan/bayesian-template/1c8973cc9dfd07b7a1def207f7cf59a258eb41b5/images/generic_bar_chart.png -------------------------------------------------------------------------------- /images/generic_line-ord_chart.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/cdphelan/bayesian-template/1c8973cc9dfd07b7a1def207f7cf59a258eb41b5/images/generic_line-ord_chart.png -------------------------------------------------------------------------------- /images/generic_line_chart.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/cdphelan/bayesian-template/1c8973cc9dfd07b7a1def207f7cf59a258eb41b5/images/generic_line_chart.png -------------------------------------------------------------------------------- /plotting_functions.R: -------------------------------------------------------------------------------- 1 | ## PLOTTING FUNCTIONS FOR R TEMPLATES ## 2 | #This file contains the default code for the plots. If needed, the plot aesthetics can be customized here. 3 | 4 | #################### TEMPLATE 1: 1var categorical bar chart #################### 5 | # This function generates the HOPs (animated plots) for both the prior and posterior plots 6 | HOPs_plot_1 = function(data) { 7 | ggplot(data, aes(x = x, y = .value)) + #do not change 8 | geom_bar(stat='identity', width=0.5) + #do not change from stat='identity'. Fill and line aesthetics may be modified here, see ggplot2 documentation 9 | transition_states(.draw, transition_length = 1, state_length = 1) + # gganimate code to animate the plots. Do not change 10 | theme(axis.text.x = element_text(angle = 45, hjust = 1)) + #rotates the x-axis text for readability 11 | coord_cartesian(ylim = c(min(mydata$y, na.rm=T), max(mydata$y, na.rm=T))) + # sets axis limits 12 | # scale_x_discrete(limits=c("before","after")) + #manually set the order of the x-axis levels 13 | labs(x=x_lab, y=y_lab) # axes labels 14 | } 15 | 16 | # This function generates the 5 static plots of prior draws. 17 | static_prior_plot_1 = function(prior_draws) { 18 | ggplot(prior_draws, aes(x = x, y = .value)) + 19 | geom_bar(stat='identity') + 20 | facet_grid(cols = vars(.draw)) + 21 | # coord_cartesian(ylim = c(min(mydata$y, na.rm=T), max(mydata$y, na.rm=T))) + # sets axis limits 22 | theme(strip.background = element_blank(), 23 | strip.text.y = element_blank(), 24 | axis.text.x = element_text(angle = 45, hjust = 1), 25 | plot.title = element_text(hjust = 0.5)) + 26 | labs(x=x_lab, y=y_lab) + # axes labels 27 | ggtitle("Five sample draws from the priors") 28 | } 29 | 30 | # This function generates the static plots of posterior draws. 
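# Like the other helpers in this file, it assumes the globals mydata, x_lab and
# y_lab exist in the calling environment, and that `fit` is a posterior summary
# with tidybayes-style columns (.value plus the .lower/.upper interval bounds);
# the prior and HOPs helpers expect .value and .draw instead.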
31 | static_post_plot_1 = function(fit) { 32 | ggplot(fit, aes(x = x, y = .value)) + 33 | geom_bar(stat ='identity', width=0.5) + 34 | geom_errorbar(aes(ymin = .lower, ymax = .upper), width = .2) + 35 | coord_cartesian(ylim = c(min(mydata$y, na.rm=T), max(mydata$y, na.rm=T))) + # sets axis limits 36 | theme(strip.background = element_blank(), 37 | strip.text.y = element_blank(), 38 | axis.text.x = element_text(angle = 45, hjust = 1), 39 | plot.title = element_text(hjust = 0.5)) + 40 | labs(x=x_lab, y=y_lab) # axes labels 41 | } 42 | 43 | 44 | #################### TEMPLATE 2: 1var ordinal line chart #################### 45 | # This function generates the HOPs (animated plots) for both the prior and posterior plots 46 | HOPs_plot_2 = function(data) { 47 | ggplot(data, aes(x = x, y = .value, group = 1)) + #do not change 48 | geom_line() + #do not change 49 | geom_point() + #do not change 50 | transition_states(.draw, transition_length = 1, state_length = 1) + # gganimate code to animate the plots. Do not change 51 | theme(axis.text.x = element_text(angle = 45, hjust = 1)) + #rotates the x-axis text for readability 52 | coord_cartesian(ylim = c(min(mydata$y, na.rm=T), max(mydata$y, na.rm=T))) + # sets axis limits 53 | # scale_x_discrete(limits=c("before","after")) + #manually set the order of the x-axis levels 54 | labs(x=x_lab, y=y_lab) # axes labels 55 | } 56 | 57 | # This function generates the 5 static plots of prior draws. 58 | static_prior_plot_2 = function(prior_draws) { 59 | ggplot(prior_draws, aes(x = x, y = .value, group=.draw)) + 60 | geom_line() + 61 | geom_point() + 62 | facet_grid(cols = vars(.draw)) + 63 | # coord_cartesian(ylim = c(min(mydata$y, na.rm=T), max(mydata$y, na.rm=T))) + # sets axis limits 64 | theme(strip.background = element_blank(), 65 | strip.text.y = element_blank(), 66 | axis.text.x = element_text(angle = 45, hjust = 1), 67 | plot.title = element_text(hjust = 0.5)) + 68 | labs(x=x_lab, y=y_lab) + # axes labels 69 | ggtitle("Five sample draws from the priors") 70 | } 71 | 72 | # This function generates the static plots of posterior draws with error bars. 73 | static_post_plot_2 = function(fit) { 74 | ggplot(fit, aes(x = x, y = .value, group=1)) + 75 | geom_line() + 76 | geom_errorbar(aes(ymin = .lower, ymax = .upper), width = .2) + 77 | coord_cartesian(ylim = c(min(mydata$y, na.rm=T), max(mydata$y, na.rm=T))) + # sets axis limits 78 | labs(x=x_lab, y=y_lab) 79 | } 80 | 81 | 82 | #################### TEMPLATE 3: 1var continuous line chart #################### 83 | # This function generates the HOPs (animated plots) for both the prior and posterior plots 84 | HOPs_plot_3 = function(data) { 85 | ggplot(data, aes(x = x, y = .value)) + #do not change 86 | geom_line() + #do not change 87 | transition_states(.draw, transition_length = 1, state_length = 1) + # gganimate code to animate the plots. Do not change 88 | coord_cartesian(ylim = c(min(mydata$y, na.rm=T), max(mydata$y, na.rm=T))) + # sets axis limits 89 | labs(x=x_lab, y=y_lab) # axes labels 90 | } 91 | 92 | # This function generates the static plot of 100 prior draws. 93 | static_prior_plot_3 = function(fit) { 94 | ggplot(fit, aes(x = x, y = .value)) + 95 | geom_line(aes(group = .draw), alpha = .2) + 96 | labs(x=x_lab, y=y_lab) + # axes labels 97 | ggtitle("100 sample draws from the priors") 98 | } 99 | 100 | # This function generates the static plots of posterior draws with a confidence band. 
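# stat_lineribbon() below comes from the tidybayes package: it draws the
# posterior median line surrounded by nested credible-interval ribbons
# (50%, 80%, and 95% by default).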
101 | static_post_plot_3a = function(fit) { 102 | ggplot(fit, aes(x = x, y = .value)) + 103 | stat_lineribbon() + 104 | coord_cartesian(ylim = c(min(mydata$y, na.rm=T), max(mydata$y, na.rm=T))) + # sets axis limits 105 | scale_fill_brewer() 106 | } 107 | 108 | # This function generates the static plots of 100 overlaid posterior draws. 109 | static_post_plot_3b = function(fit) { 110 | ggplot(fit, aes(x = x, y = .value)) + 111 | geom_line(aes(group = .draw), alpha = .2) + 112 | coord_cartesian(ylim = c(min(mydata$y, na.rm=T), max(mydata$y, na.rm=T))) + # sets axis limits 113 | labs(x=x_lab, y=y_lab) # axes labels 114 | } 115 | 116 | 117 | #################### TEMPLATE 4: 2var categorical & categorical bar chart #################### 118 | # This function generates the HOPs (animated plots) for both the prior and posterior plots 119 | HOPs_plot_4 = function(data) { 120 | ggplot(data, aes(x = x1, y = .value, col = x2, fill = x2, group = x2)) + #do not change 121 | geom_bar(stat='identity', position='dodge') + #do not change from stat='identity'. Fill and line aesthetics may be modified here, see ggplot2 documentation 122 | transition_states(.draw, transition_length = 1, state_length = 1) + # gganimate code to animate the plots. Do not change 123 | theme(axis.text.x = element_text(angle = 45, hjust = 1)) + #rotates the x-axis text for readability 124 | coord_cartesian(ylim = c(min(mydata$y, na.rm=T), max(mydata$y, na.rm=T))) + # sets axis limits 125 | # scale_x_discrete(limits=c("before","after")) + #manually set the order of the x-axis levels 126 | labs(x=x_lab, y=y_lab) # axes labels 127 | } 128 | 129 | # This function generates the 5 static plots of prior draws. 130 | static_prior_plot_4 = function(prior_draws) { 131 | ggplot(prior_draws, aes(x = x1, y = .value, col = x2, fill = x2, group = x2)) + 132 | geom_bar(stat='identity', position='dodge') + 133 | facet_grid(cols = vars(.draw)) + 134 | # coord_cartesian(ylim = c(min(mydata$y, na.rm=T), max(mydata$y, na.rm=T))) + # sets axis limits 135 | theme(strip.background = element_blank(), 136 | strip.text.y = element_blank(), 137 | axis.text.x = element_text(angle = 45, hjust = 1), 138 | plot.title = element_text(hjust = 0.5)) + 139 | labs(x=x_lab, y=y_lab) + # axes labels 140 | ggtitle("Five sample draws from the priors") 141 | } 142 | 143 | # This function generates the static plots of posterior draws with error bars. 144 | static_post_plot_4 = function(fit) { 145 | ggplot(fit, aes(x = x1, y = .value, fill = x2, group = x2)) + 146 | geom_bar(stat='identity', position='dodge') + 147 | geom_errorbar(aes(ymin = .lower, ymax = .upper), position = position_dodge(width = .9), width = .2) + 148 | coord_cartesian(ylim = c(min(mydata$y, na.rm=T), max(mydata$y, na.rm=T))) + # sets axis limits 149 | labs(x=x_lab, y=y_lab) 150 | } 151 | 152 | 153 | #################### TEMPLATE 5: 2var categorical & ordinal line chart #################### 154 | # This function generates the HOPs (animated plots) for both the prior and posterior plots 155 | HOPS_plot_5 = function(data) { 156 | ggplot(data, aes(x = x1, y = .value, col = x2, fill = x2, group = x2)) + #do not change 157 | geom_bar(stat='identity', position='dodge') + #do not change 158 | transition_states(.draw, transition_length = 1, state_length = 1) + # gganimate code to animate the plots. 
Do not change 159 | theme(axis.text.x = element_text(angle = 45, hjust = 1)) + #rotates the x-axis text for readability 160 | coord_cartesian(ylim = c(min(mydata$y, na.rm=T), max(mydata$y, na.rm=T))) + # sets axis limits 161 | # scale_x_discrete(limits=c("before","after")) + #manually set the order of the x-axis levels 162 | labs(x=x_lab, y=y_lab) # axes labels 163 | } 164 | 165 | # This function generates the 5 static plots of prior draws. 166 | static_prior_plot_5 = function(fit) { 167 | ggplot(fit, aes(x = x1, y = .value, col = x2, group = x2)) + 168 | geom_line(aes(group = .draw), alpha = .7) + 169 | geom_point(aes(group = .draw), alpha = .7) + 170 | facet_grid(cols = vars(x2)) + 171 | # coord_cartesian(ylim = c(min(mydata$y, na.rm=T), max(mydata$y, na.rm=T))) + # sets axis limits - CHANGE ME (optional) 172 | theme(axis.text.x = element_text(angle = 45, hjust = 1), 173 | plot.title = element_text(hjust = 0.5), 174 | legend.position="none") + 175 | labs(x=x_lab, y=y_lab) + # axes labels 176 | ggtitle("Five sample draws from the priors") 177 | } 178 | 179 | # This function generates the static plots of posterior draws with error bars. 180 | static_post_plot_5 = function(fit) { 181 | ggplot(fit, aes(x = x1, y = .value, col = x2, fill = x2, group = x2)) + 182 | geom_line() + #do not change 183 | geom_errorbar(aes(ymin = .lower, ymax = .upper), width = .2) + 184 | coord_cartesian(ylim = c(min(mydata$y, na.rm=T), max(mydata$y, na.rm=T))) + # sets axis limits 185 | labs(x=x_lab, y=y_lab) 186 | } 187 | 188 | 189 | 190 | #################### TEMPLATE 6: 2var continuous & categorical line chart #################### 191 | # This function generates the HOPs (animated plots) for both the prior and posterior plots 192 | HOPS_plot_6 = function(data) { 193 | ggplot(data, aes(x = x1, y = .value, col=x2, group=x2)) + #do not change 194 | geom_line() + #do not change 195 | transition_states(.draw, transition_length = 1, state_length = 1) + # gganimate code to animate the plots. Do not change 196 | coord_cartesian(ylim = c(min(mydata$y, na.rm=T), max(mydata$y, na.rm=T))) + # sets axis limits 197 | labs(x=x_lab, y=y_lab) # axes labels 198 | } 199 | 200 | # This function generates the static plots of 100 prior draws. 201 | static_prior_plot_6 = function(prior_draws) { 202 | ggplot(prior_draws, aes(x = x1, y = .value, col = x2, group = x2)) + 203 | geom_line(aes(group = .draw), alpha = .2) + 204 | facet_grid(cols = vars(x2)) + 205 | # coord_cartesian(ylim = c(min(mydata$y, na.rm=T), max(mydata$y, na.rm=T))) + # sets axis limits 206 | theme(plot.title = element_text(hjust = 0.5), 207 | legend.position="none") + 208 | labs(x=x_lab, y=y_lab) + # axes labels 209 | ggtitle("100 sample draws from the priors") 210 | } 211 | 212 | # This function generates the static plots of posterior draws with a confidence band. 213 | static_post_plot_6a = function(fit) { 214 | ggplot(fit, aes(x = x1, y = .value, group = x2)) + 215 | stat_lineribbon() + 216 | coord_cartesian(ylim = c(min(mydata$y, na.rm=T), max(mydata$y, na.rm=T))) + # sets axis limits 217 | scale_fill_brewer() + 218 | facet_grid(cols = vars(x2)) # comment this out to plot both x vars on the same plot 219 | } 220 | 221 | # This function generates the static plots of 100 overlaid posterior draws. 
222 | static_post_plot_6b = function(fit) { 223 | ggplot(fit, aes(x = x1, y = .value, col=x2, group=x2)) + 224 | coord_cartesian(ylim = c(min(mydata$y, na.rm=T), max(mydata$y, na.rm=T))) + # sets axis limits 225 | geom_line(aes(group = .draw), alpha = .2) + 226 | facet_grid(cols = vars(x2)) + # comment this out to plot both x vars on the same plot 227 | theme(plot.title = element_text(hjust = 0.5), 228 | legend.position="none") + 229 | labs(x=x_lab, y=y_lab) # axes labels 230 | } 231 | -------------------------------------------------------------------------------- /quizlet/check_understanding_priors.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Check your understanding of how to set Bayesian priors" 3 | output: learnr::tutorial 4 | runtime: shiny_prerendered 5 | --- 6 | 7 | ```{r setup, include=FALSE} 8 | library(learnr) 9 | knitr::opts_chunk$set(echo = FALSE) 10 | ``` 11 | 12 | 13 | ```{r where} 14 | 15 | question("What information can you use when setting priors? (Select all that apply)", 16 | answer("Means & sds from related previous studies", correct = TRUE), 17 | answer("Approximate means & sds from your current data"), 18 | answer("The range of possible values of the dependent variable", correct = TRUE), 19 | answer("Results from a previous analysis of your current data"), 20 | answer("Effect sizes from previous studies", correct = TRUE), 21 | correct = "Correct! You can use any information you could have known before running your current study.", 22 | incorrect = "Incorrect. Did you select option 2 or 4? Priors can only include information you could have known before running your current study. Pretend you don't know anything about your current data!", 23 | allow_retry = TRUE) 24 | 25 | ``` 26 | 27 | 28 | ```{r example-a} 29 | 30 | question("Say you have conducted an experiment with 2 conditions (control and treatment). Your outcome variable is a continuous variable with a range [0,100]. You have no previous studies to use for your priors, so you are going to set weakly informative priors that just tell the model which values are implausible. What values do you use for the priors? Assume you're using our template, which requires priors to have normal distributions.", 31 | answer("Control mean: 50 | Control max value: 50 | Effect size mean: 0 | Effect sd: 50"), 32 | answer("Control mean: 50 | Control sd: 25 | Effect size mean: 0 | Effect sd: 25", correct = TRUE), 33 | answer("Control mean: 0 | Control sd: 50 | Effect size mean: 0 | Effect sd: 50"), 34 | answer("Control mean: 50 | Control sd: 25 | Effect size mean: 50 | Effect sd: 25"), 35 | incorrect = "Incorrect. Your control mean is the mean of the range (so here, mean = 50). In our template, which uses a normal distribution for priors, you want your max value to be 2 sds away from the mean: (100 - 50)/2 = 25. You don't have any info about an effect so you set the effect mean = 0 and the effect sd = control sd (25).", 36 | allow_retry = TRUE) 37 | 38 | ``` 39 | 40 | 41 | ```{r example-b} 42 | 43 | question("Say you have conducted an experiment with 3 conditions (a control condition and 2 treatment conditions). Participants in the experiment were asked to rate a product from 0 to 5 stars. You have no previous literature to use to set priors, and you know from exploratory analysis that your study participants rated the product with an overall mean score of 3.5 and an sd of 0.5. What values do you use for the priors? 
Assume you're using our template, which requires priors to have normal distributions.", 44 | answer("Control mean: 3.5 | Control sd: 0.5 | Effect size mean: 0 | Effect sd: 0.5"), 45 | answer("Control mean: 2.5 | Control sd: 2.5 | Effect size mean: 0 | Effect sd: 2.5"), 46 | answer("Control mean: 2.5 | Control sd: 1.25 | Effect size mean: 0 | Effect sd: 1.25", correct = TRUE), 47 | answer("Control mean: 3.5 | Control sd: 5 | Effect size mean: 0 | Effect sd: 5"), 48 | incorrect = "Incorrect. (Sort of) trick question - you shouldn't use any of the provided information about your current data to set priors. Your control mean is the mean of the range (so here, mean = 2.5). In our template, which uses a normal distribution for priors, you want your max value to be 2 sds away from the mean: (5 - 2.5)/2 = 1.25. You don't have any info about an effect so you set the effect mean = 0 and the effect sd = control sd (1.25).", 49 | allow_retry = TRUE) 50 | 51 | ``` 52 | 53 | 54 | ```{r example-c} 55 | 56 | question("Say you have conducted an experiment with 2 conditions (control and treatment). Participants in the experiment were asked to rate a product from 0 to 5 stars. This time, you have previous literature to use to set priors, which says that the mean score is 3 in the control condition, with a sd of 0.25, and a mean score that is 0.75 points higher in the treatment condition, with an sd of 0.5. What values do you use for the priors? Assume you're using our template, which requires priors to have normal distributions.", 57 | answer("Control mean: 3 | Control sd: 0.25 | Effect size mean: 0.75 | Effect sd: 0.5", correct = TRUE), 58 | answer("Control mean: 3 | Control sd: 0.25 | Effect size mean: 3.75 | Effect sd: 0.5"), 59 | answer("Control mean: 2.5 | Control sd: 1.25 | Effect size mean: 0 | Effect sd: 1.25"), 60 | answer("Control mean: 0 | Control sd: 5 | Effect size mean: 0 | Effect sd: 5"), 61 | incorrect = "Incorrect. Your control mean and sd are taken from the previous literature: 3 and 0.25, respectively. The effect size is also taken straight from the literature: mean = 0.75 and sd = 0.5.", 62 | allow_retry = TRUE) 63 | 64 | ``` 65 | 66 | 67 | ```{r example-d} 68 | 69 | question("Say you have gathered data on 25 different subreddits (message forums) and you want to compare the word counts of comments on each subreddit. Say that you know from a previously-published study that word counts of comments on subreddits follow a normal distribution with a mean of 85 and a sd of 35. You have no previous information about the individual subreddits you are using in your current study, so have no information about what the 'effect size' might be between the different subreddits. What values do you use for the priors?", 70 | answer("Control mean: 85 | Control sd: 35 | Effect size mean: 85 | Effect sd: 35"), 71 | answer("Control mean: 0 | Control sd: 35 | Effect size mean: 25 | Effect sd: 35"), 72 | answer("Control mean: 35 | Control sd: 85 | Effect size mean: 0 | Effect sd: 25"), 73 | answer("Control mean: 85 | Control sd: 35 | Effect size mean: 0 | Effect sd: 35", correct = TRUE), 74 | incorrect = "Incorrect. Your control mean and control sd are the mean and sd of the previous study. 
You have no information about how the 25 individual subreddits might be different from one another, so you set the 'effect size' mean = 0 and the effect sd = control sd (35).", 75 | allow_retry = TRUE) 76 | 77 | ``` 78 | 79 | 80 | ```{r definition} 81 | 82 | question("When you set priors, you're giving your best guess of what the probability distribution of some unknown quantity would be, if you didn't know anything about the data from your current study. What is that unknown quantity?", 83 | answer("The parameters of the model", correct = TRUE), 84 | answer("The outcome variable"), 85 | correct = "Correct! Priors are the probability distributions of the *parameters*, not the outcome variable. Note though that the two values may be identical if your model has only main effects with dummy variables.", 86 | incorrect = "Incorrect. Priors are the probability distributions of the parameters, not the outcome variable. Note though that the two values may be identical if your model has only main effects with dummy variables.", 87 | allow_retry = TRUE) 88 | 89 | ``` 90 | 91 | All done! You can head back to the analysis template. 92 | --------------------------------------------------------------------------------
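For concreteness, here is a minimal sketch of how the weakly informative priors from the first quiz example (control: Normal(50, 25), effect: Normal(0, 25), for an outcome on a 0-100 scale) could be written down in R. It uses the rstanarm package purely for illustration; the analysis templates in this repository may specify their models differently, and `mydata`, `x`, and `y` below are placeholder names rather than objects defined anywhere in the repo.

```r
library(rstanarm)

# Weakly informative priors from the first quiz example, for a 0-100 outcome:
#   control condition (model intercept): Normal(mean = 50, sd = 25)
#   effect of treatment:                 Normal(mean = 0,  sd = 25)
# `mydata`, `x`, and `y` are placeholders for your own data.
fit <- stan_glm(
  y ~ x,                              # x: a two-level factor (control vs. treatment)
  data            = mydata,
  family          = gaussian(),
  prior_intercept = normal(50, 25),
  prior           = normal(0, 25)
)

# To see what these priors imply before looking at the data (e.g. for prior
# HOPs), refit with prior_PD = TRUE so that draws come from the prior only.
```

The plotting functions above expect their input in a particular shape: fitted values in a `.value` column, a `.draw` index for the HOPs and overlaid-draw plots, `.lower`/`.upper` columns for the error-bar plots, and `x_lab`, `y_lab`, and `mydata` available in the global environment. The sketch below shows one way such draws could be generated with tidybayes and passed to `HOPs_plot_4`; the exact wiring inside the templates may differ, and the model call here is only a stand-in.

```r
library(dplyr)
library(modelr)
library(tidybayes)
library(gganimate)
library(rstanarm)

source("plotting_functions.R")

# Assumed stand-ins: mydata has a numeric outcome y and two categorical
# predictors x1 and x2, matching the aesthetics used by templates 4 and 5.
fit2  <- stan_glm(y ~ x1 * x2, data = mydata, family = gaussian())
x_lab <- "x1"
y_lab <- "y"

post_draws <- mydata %>%
  data_grid(x1, x2) %>%                                 # one row per cell of the design
  add_epred_draws(fit2, ndraws = 50, value = ".value")  # adds .draw and .value (older tidybayes: add_fitted_draws)

animate(HOPs_plot_4(post_draws), fps = 5)               # render the animated HOPs
```

Prior HOPs can be produced the same way by pointing `add_epred_draws()` at a prior-only fit (for rstanarm, a model fit with `prior_PD = TRUE`).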