├── .gitattributes
├── .gitignore
├── 1var-categorical-bar-bayesian_template.Rmd
├── 1var-continuous-line-bayesian_template.Rmd
├── 1var-ordinal-line-bayesian_template.Rmd
├── 2var-categorical-bar-bayesian_template.Rmd
├── 2var-categorical_ordinal-line-bayesian_template.Rmd
├── 2var-continuous_categorical-line-bayesian_template.Rmd
├── LICENSE
├── README.md
├── bayesian-template.Rproj
├── datasets
│   ├── choc_cleaned_data.csv
│   ├── feel-the-movement_simulated-data.csv
│   ├── lab-in-the-wild_subset.csv
│   └── stigmatized_campaigns_simulated-data.csv
├── html_outputs
│   ├── 1var-categorical-bar-bayesian_template.html
│   ├── 1var-categorical-bar-bayesian_template_(strong).html
│   ├── 1var-categorical-bar-bayesian_template_(weak).html
│   ├── 1var-continuous-line-bayesian_template.html
│   ├── 1var-continuous-line-bayesian_template_(strong).html
│   ├── 1var-continuous-line-bayesian_template_(weak).html
│   ├── 1var-ordinal-line-bayesian_template.html
│   ├── 1var-ordinal-line-bayesian_template_(strong).html
│   ├── 1var-ordinal-line-bayesian_template_(weak).html
│   ├── 2var-categorical-bar-bayesian_template.html
│   ├── 2var-categorical-bar-bayesian_template_(strong).html
│   ├── 2var-categorical-bar-bayesian_template_(weak).html
│   ├── 2var-categorical_ordinal-line-bayesian_template.html
│   ├── 2var-categorical_ordinal-line-bayesian_template_(strong).html
│   ├── 2var-categorical_ordinal-line-bayesian_template_(weak).html
│   ├── 2var-continuous_categorical-line-bayesian_template.html
│   ├── 2var-continuous_categorical-line-bayesian_template_(strong).html
│   └── 2var-continuous_categorical-line-bayesian_template_(weak).html
├── images
│   ├── generic_2bar_chart.png
│   ├── generic_2line-cont_chart.png
│   ├── generic_2line_chart.png
│   ├── generic_2line_chart_old.png
│   ├── generic_bar_chart.png
│   ├── generic_line-ord_chart.png
│   └── generic_line_chart.png
├── plotting_functions.R
└── quizlet
    ├── check_understanding_priors.Rmd
    └── check_understanding_priors.html
/.gitattributes:
--------------------------------------------------------------------------------
1 | * text=auto
2 |
3 |
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | *~
2 | .Rhistory
3 | .RData
4 | .Rproj*
5 | .Rproj.user
6 | .DS_Store
7 | 1var-categorical-bar-bayesian_template_cache/
8 | 1var-categorical-bar-bayesian_template_files/
9 | 1var-continuous-line-bayesian_template_cache/
10 | 1var-continuous-line-bayesian_template_files/
11 | 1var-ordinal-line-bayesian_template.html_cache/
12 | 1var-ordinal-line-bayesian_template.html_files/
13 | 2var-categorical-bar-bayesian_template_cache/
14 | 2var-categorical-bar-bayesian_template_files/
15 | 2var-categorical_ordinal-line-bayesian_template_cache/
16 | 2var-categorical_ordinal-line-bayesian_template_files/
17 | 2var-continuous_categorical-line-bayesian_template_cache/
18 | 2var-continuous_categorical-line-bayesian_template_files/
--------------------------------------------------------------------------------
/1var-categorical-bar-bayesian_template.Rmd:
--------------------------------------------------------------------------------
1 | ---
2 | title:
Bayesian analysis template
3 | author: Phelan, C., Hullman, J., Kay, M. & Resnick, P.
4 | output:
5 | html_document:
6 | theme: flatly
7 | highlight: pygments
8 | ---
9 |
10 | *Template 1:*
11 |
12 | 
13 |
14 | **Single categorical independent variable (bar chart)**
15 |
16 |
17 | ## Introduction
18 | Welcome! This template will guide you through a Bayesian analysis in R, even if you have never done Bayesian analysis before. This is one of a set of templates, each for a different type of analysis. This template is for data with a **single categorical independent variable** and will produce a **bar chart**. If your analysis includes a **t-test** or **one-way ANOVA**, this might be the right template for you.
19 |
20 | This template assumes you have basic familiarity with R. Once complete, this template will produce a summary of the analysis, complete with parameter estimates and credible intervals, and two animated HOPs (see Hullman, Resnick, Adar 2015 DOI: 10.1371/journal.pone.0142444 and Kale, Nguyen, Kay, and Hullman VIS 2018 for more information) for both your prior and posterior estimates.
21 |
22 | This Bayesian analysis focuses on producing results in a form that is easily interpretable, even to nonexperts. The credible intervals produced by Bayesian analysis are the analogue of confidence intervals in traditional null hypothesis significance testing (NHST). A weakness of NHST confidence intervals is that they are easily misinterpreted. Many people naturally interpret an NHST 95% confidence interval to mean that there is a 95% chance that the true parameter value lies somewhere in that interval; in fact, it means that if the experiment were repeated 100 times, about 95 of the resulting confidence intervals would include the true parameter value. The Bayesian credible interval sidesteps this complication by providing the intuitive meaning: a 95% chance that the true parameter value lies somewhere in that interval. To further support intuitive interpretations of your results, this template also produces animated HOPs, a type of plot that is more effective than visualizations such as error bars in helping people make accurate judgments about probability distributions.
23 |
24 | This set of templates supports a few types of statistical analysis. (In future work, this list of supported statistical analyses will be expanded.) For clarity, each type has been broken out into a separate template, so be sure to select the right template before you start! A productive way to choose which template to use is to think about what type of chart you would like to produce to summarize your data. Currently, the templates support the following:
25 |
26 | *One independent variable:*
27 |
28 | 1. **Categorical; bar graph (e.g. t-tests, one-way ANOVA)**
29 |
30 | 2. Ordinal; line graph (e.g. t-tests, one-way ANOVA)
31 |
32 | 3. Continuous; line graph (e.g. linear regression)
33 |
34 | *Two interacting independent variables:*
35 |
36 | 4. Two categorical; bar graph (e.g. two-way ANOVA)
37 |
38 | 5. One categorical, one ordinal; line graph (e.g. two-way ANOVA)
39 |
40 | 6. One categorical, one continuous; line graph (e.g. linear regression with multiple lines)
41 |
42 | Note that this template fits your data to a model that assumes normally distributed error terms. (This is the same assumption underlying t-tests, ANOVA, etc.) This template requires you to have already run diagnostics to determine that your data is consistent with this assumption; if you have not, the results may not be valid.
43 |
44 | Once you have selected your template, work through it from top to bottom to complete the analysis. For each code chunk, you may need to make changes to customize the code for your own analysis. In those places, the code chunk will be preceded by a list of things you need to change (with the heading "What to change"), and each line that needs to be customized will also include the comment `#CHANGE ME` within the code chunk itself. You can run each code chunk independently during debugging; when you're finished, you can knit the document to produce the complete report.
45 |
46 | Good luck!
47 |
48 | ### Tips before you start
49 |
50 | 1. Make sure you have picked the right template! (See above.)
51 |
52 | 2. Use the pre-knitted HTML version of this template as a reference as you work (we've included all the HTML files in the folder `html_outputs`). The formatting makes the template easier to follow. You can also knit this document as you work once you have completed setup.
53 |
54 | 3. Make sure you are using the most recent version of the templates. Updates can be found at https://github.com/cdphelan/bayesian-template.
55 |
56 | ### Sample dataset
57 | This template comes prefilled with an example dataset from Moser et al. (DOI: 10.1145/3025453.3025778), which examines choice overload in the context of e-commerce. The study examined the relationship between choice satisfaction (measured on a 7-point Likert scale), the number of product choices presented on a webpage, and whether the participant is a decision "maximizer" (a person who examines all options and tries to choose the best) or a "satisficer" (a person who selects the first option that is satisfactory). In this template, we analyze the relationship between type of decision-making (maximizer or satisficer), a two-level categorical variable; and choice satisfaction, which we treat as a continuous variable with values that can fall in the range [1,7].
58 |
59 |
60 | ## Set up
61 | ### Requirements
62 | To run this template, we assume that you are using RStudio, and you have the most recent version of R installed. (This template was built with R version 3.5.1.)
63 |
64 | This template works best if you first open the file `bayesian-template.Rproj` from the code repository as a project in RStudio to get started, and then open the individual `.Rmd` template files after this.
65 |
66 | ### Libraries
67 | **Installation:**
68 | If this is your first time using the template, you may need to install libraries.
69 |
70 | 1. **If you are using Windows,** first you will need to manually install RStan and Rtools. Follow the instructions [here](https://github.com/stan-dev/rstan/wiki/Installing-RStan-on-Windows) to install both.
71 |
72 | 2. On both Mac and Windows, uncomment the line with `install.packages()` to install the required packages. This only needs to be done once.
73 |
74 | **Troubleshooting:**
75 | You may have some trouble installing the packages, especially if you are on Windows. Regardless of OS, if you have any issues installing these packages, try one or more of the following troubleshooting options:
76 |
77 | 1. Restart R.
78 |
79 | 2. Make sure you are running the most recent version of R (3.5.1, as of the writing of this template).
80 |
81 | 3. Manually install RStan and Rtools, following the instructions [here](https://github.com/stan-dev/rstan/wiki/RStan-Getting-Started).
82 |
83 | 4. If you have tried the above and you are still getting error messages like `there is no package called [X]`, try installing the missing package(s) manually using the RStudio interface under Tools > Install Packages...
84 |
85 | ```{r libraries, message=FALSE, warning=FALSE}
86 |
87 | knitr::opts_chunk$set(fig.align="center")
88 |
89 | # install.packages(c("ggplot2", "rstanarm", "tidyverse", "tidybayes", "modelr", "gganimate"))
90 |
91 | library(rstanarm) #bayesian analysis package
92 | library(tidyverse) #tidy datascience commands
93 | library(tidybayes) #tidy data + ggplot workflow
94 | library(modelr) #tidy pipelines for modeling
95 | library(ggplot2) #plotting package
96 | library(gganimate) #animate ggplots
97 |
98 | # We import all of our plotting functions from this separate R file to keep the code in
99 | # this template easier to read. You can edit this file to customize aesthetics of the plots
100 | # if desired. Just be sure to run this line again after you make edits!
101 | source('plotting_functions.R')
102 |
103 | theme_set(theme_light()) # set the ggplot theme for all plots
104 |
105 | ```
106 |
107 |
108 | ### Read in data
109 | **What to change**
110 |
111 | 1. mydata: Read in your data.
112 |
113 | ```{r data_prep}
114 |
115 | mydata = read.csv('datasets/choc_cleaned_data.csv') #CHANGE ME 1
116 |
117 | ```
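
Before moving on, it can help to confirm that the data loaded as expected. This optional check is not part of the template; `str()` and `head()` are base R functions.

```{r}
# Optional: confirm the data loaded as expected
str(mydata)   # column names and types
head(mydata)  # first few rows
```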
118 |
119 |
120 | ## Specify model
121 | We'll fit the following model: `stan_glm(y ~ x)`. As $x$ is a categorical variable in this template, this specifies a linear regression with a dummy variable for each non-reference level of the categorical variable $x$. **This is equivalent to ANOVA.** So for example, for a regression where $x$ has three levels, each $y_i$ is drawn from a normal distribution with mean equal to $a + b_1dummy_1 + b_2dummy_2$ and standard deviation equal to `sigma` ($\sigma$):
122 |
123 | $$
124 | y_i \sim Normal(a + b_1dummy_1 + b_2dummy_2, \sigma)
125 | $$
126 |
127 | Choose your independent and dependent variables. These are the variables that will correspond to the x and y axis on the final plots.
128 |
129 | **What to change**
130 |
131 | 2. mydata\$x: Select which variable will appear on the x-axis of your plots.
132 |
133 | 3. mydata\$y: Select which variable will appear on the y-axis of your plots.
134 |
135 | 4. x_lab: Label your plots' x-axes.
136 |
137 | 5. y_lab: Label your plots' y-axes.
138 |
139 | ```{r specify_model}
140 |
141 | #select your independent and dependent variable
142 | mydata$x = mydata$sat_max #CHANGE ME 2
143 | mydata$y = mydata$satis_Q1 #CHANGE ME 3
144 |
145 | # label the axes on the plots
146 | x_lab = "Decision-making type" #CHANGE ME 4
147 | y_lab = "Satisfaction" #CHANGE ME 5
148 |
149 | ```
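
As the model specification above notes, a categorical `x` is expanded into dummy (indicator) variables. If you would like to see exactly what R will build for `y ~ x`, you can preview the design matrix. This is an optional sketch, not part of the template; it assumes `mydata$x` has been set as in the chunk above, and `model.matrix()` is base R.

```{r}
# Optional: preview the dummy (indicator) variables R will create for the categorical x
head(model.matrix(~ x, data = mydata))
```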
150 |
151 |
152 | ### Set priors
153 | In this section, you will set priors for your model. Setting priors thoughtfully is important to any Bayesian analysis, especially if you have a small sample of data for fitting your model. The priors express your best prior belief, *before seeing any data*, of reasonable values for the model parameters.
154 |
155 | Ideally, you will have previous literature from which to draw these prior beliefs. If no previous studies exist, you can instead assign "weakly informative priors" that only minimally restrict the model, excluding only values that are implausible or impossible. We have provided examples of how to set both weak and strong priors below.
156 |
157 | To check the plausibility of your priors, use the code section after this one to generate a graph of five sample draws from your priors to check if the values generated are reasonable.
158 |
159 | Our model has the following parameters:
160 |
161 | a. the overall mean y-value across all levels of categorical variable x
162 |
163 | b. the mean y-value for each of the individual levels
164 |
165 | c. the standard deviation of the normally distributed error term
166 |
167 | To simplify things, we limit the number of different prior beliefs you can have. Think of the first level of the categorical variable as specifying the control condition of an experiment, and all of the other levels as treatment conditions in the experiment. We let you specify a prior belief about the plausible values of the mean in the control condition (a), and then we let you set a prior belief about the plausible effect size (b). You have to specify the same plausible effect size for all conditions, unless you dig deeper into our code.
168 |
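Because the first level of the categorical variable plays the role of the control condition, it is worth confirming which level R will treat as first. This optional check is not part of the template, and the level name in the commented-out line is only a hypothetical placeholder.

```{r}
# Optional: see which level of x will be treated as the reference ("control") level
levels(factor(mydata$x))

# If a different level should be the control, set it explicitly
# (replace "control_level_name" with an actual level name from your data):
# mydata$x = relevel(factor(mydata$x), ref = "control_level_name")
```
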
169 | To simplify things further, we only let you specify beliefs about these parameters in the form of a normal distribution. Thus, you will specify what you think is the most likely value for the parameter (the mean), and a standard deviation. You will be expressing a belief that you are 95% certain (before looking at any data) that the true value of the parameter is within two standard deviations of the mean.
170 |
171 | Finally, our modeling system, `stan_glm()`, will automatically set a prior for the last parameter, the standard deviation of the normally distributed error term for the model overall (c).
172 |
173 | To explore more about priors, you can experiment with different values for these parameters and use the following section, *Checking priors with visualizations*, to see how different parameter values change the prior distribution.
174 |
175 | Want more examples? Check your understanding of how to set priors in this [quizlet](https://cdphelan.shinyapps.io/check_understanding_priors/), which includes several more examples of how to set both strong and weak priors.
176 |
177 | **What to change**
178 |
179 | **If you are using weakly informative priors (i.e. priors not informed by previous literature):**
180 |
181 | *Remember: **do not** use any of your data from the current study to inform prior values.*
182 |
183 | 6. a_prior: Select the control condition mean.
184 |
185 | 7. a_prior_max: Select the maximum plausible value of the control condition data. (We will use this to calculate the sd of `a`.)
186 |
187 | 8. b1_prior: Select the effect size mean.
188 |
189 | 9. b1_sd: Select the effect size standard deviation.
190 |
191 | 10. You should also change the comments in the code below to explain your choice of priors.
192 |
193 | **If you are using strong priors (i.e. priors from previous literature):**
194 |
195 | Skip this code chunk and set your priors in the next code chunk. For clarity, comment out everything in this code chunk.
196 |
197 | ```{r}
198 |
199 | # CHANGE THIS COMMENT EXPLAINING YOUR CHOICE OF PRIORS (10)
200 | # In our example dataset, y-axis scores can be in the range [1, 7].
201 | # In the absence of other information, we set the parameter mean as 4
202 | # (the mean of the range [1,7]) and the maximum possible value as 7.
203 | # From exploratory analysis, we know the mean score and sd for y in our
204 | # dataset but we *DO NOT* use this information because priors *CANNOT*
205 | # include any information from the current study.
206 |
207 | a_prior = 4 # CHANGE ME 6
208 | a_prior_max = 7 # CHANGE ME 7
209 |
210 | # With a normal distribution, we can't completely rule out
211 | # impossible values, but we choose an sd that assigns less than
212 | # 5% probability to those impossible values. Remember that in a normal
213 | # distribution, 95% of the data lies within 2 sds of the mean. Therefore,
214 | # we calculate the value of 1 sd by finding the maximum amount our data
215 | # can vary from the mean (a_prior_max - a_prior) and dividing that in half.
216 |
217 | a_sd = (a_prior_max - a_prior) / 2 # do not change
218 |
219 | # CHANGE THIS COMMENT EXPLAINING YOUR CHOICE OF PRIORS (10)
220 | # In our example dataset, we do not have a strong hypothesis that the treatment
221 | # conditions will be higher or lower than the control, so we set the mean of
222 | # the effect size parameters to be 0. In the absence of other information, we
223 | # set the sd to be the same as for the control condition.
224 |
225 | b1_prior = 0 # CHANGE ME 8
226 | b1_sd = a_sd # CHANGE ME 9
227 |
228 | ```
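
As an optional sanity check on these values (not part of the template), you can ask how much prior probability falls above the maximum plausible value. With the example priors it is about 2%, comfortably under the 5% mentioned in the comments above.

```{r}
# Optional: prior probability of impossible values above the maximum plausible value
1 - pnorm(a_prior_max, mean = a_prior, sd = a_sd)  # ~0.023 with the example priors
```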
229 |
230 |
231 | **What to change**
232 |
233 | **If you are using weakly informative priors:**
234 |
235 | Do not use this code chunk; use the code chunk above to set your priors instead. Make sure everything in this code chunk is commented out so that your priors are not overwritten.
236 |
237 | **If you are using strong priors (i.e. priors from previous literature):**
238 |
239 | *Remember: **do not** use any of your data from the current study to set prior values.*
240 |
241 | First, make sure to uncomment all four variables set in this code chunk.
242 |
243 | 6. a_prior: Select the control condition mean.
244 |
245 | 7. a_sd: Select the control condition standard deviation.
246 |
247 | 8. b1_prior: Select the effect size mean.
248 |
249 | 9. b1_sd: Select the effect size standard deviation.
250 |
251 | 10. You should also change the comments in the code below to explain your choice of priors.
252 |
253 | ```{r}
254 |
255 | # CHANGE THIS COMMENT EXPLAINING YOUR CHOICE OF PRIORS (10)
256 | # In our example dataset, y-axis scores can be in the range [1, 7].
257 | # To choose our priors, we use the results from a previous study
258 | # where participants completed an identical task (choosing between
259 | # different chocolate bars). For our overall prior mean, we pool the mean
260 | # satisfaction scores from all conditions in the previous study to get
261 | # an overall mean of 5.86. We set a_sd so that 5.86 +/- 2 sds encompasses
262 | # the 95% confidence intervals from the previous study results.
263 |
264 | # a_prior = 5.86 # CHANGE ME 6
265 | # a_sd = 0.6 # CHANGE ME 7
266 |
267 | # CHANGE THIS COMMENT EXPLAINING YOUR CHOICE OF PRIORS (10)
268 | # In our example dataset, we do not have guidance from previous literature
269 | # to set an exact effect size, but the literature does tell us that satisficers
270 | # (the "treatment" condition) are likely to have higher mean satisfaction
271 | # than the maximizers (the "control" condition), so we set an effect size
272 | # parameter mean that results in a 1 point increase in satisfaction for satisficers.
273 | # To reflect the uncertainty in this effect size, we select a broad sd so
274 | # that there is a ~20% chance that the effect size will be negative.
275 |
276 | # b1_prior = 1 # CHANGE ME 8
277 | # b1_sd = 1 # CHANGE ME 9
278 |
279 | ```
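
If you do use these strong priors (after uncommenting them), `pnorm()` offers a quick optional check of how much probability the effect-size prior leaves below zero; with a mean of 1 and an sd of 1 it comes out to roughly 16%.

```{r}
# Optional check, using the literal values from the commented-out priors above:
# prior probability that the effect for satisficers is negative
pnorm(0, mean = 1, sd = 1)  # ~0.16 for b1_prior = 1, b1_sd = 1
```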
280 |
281 |
282 | ### Checking priors with visualizations
283 | Next, you'll want to check your priors by running this code chunk. It will produce a plot of five sample draws from the priors you set in the previous section, so you can check to see if the values generated are reasonable.
284 |
285 | You'll also want to run the code chunk after this one, `HOPs_priors`, which presents plots of sample prior draws in an animated format called HOPs (Hypothetical Outcomes Plots). HOPs are a type of plot that visualizes uncertainty as sets of draws from a distribution, and have been demonstrated to improve multivariate probability estimates (Hullman et al. 2015) and increase sensitivity to the underlying trend in data (Kale et al. 2018) over static representations of uncertainty like error bars.
286 |
287 | #### Static visualization of priors
288 | **What to change**
289 |
290 | Nothing! Just run this code to check your priors, adjusting prior values above as needed until you find reasonable prior values. Note that you may get a couple of very implausible or even impossible values because our assumption of normally distributed priors assigns a small probability to even very extreme values. If you are concerned by the outcome, you can try re-running it a few more times to make sure that any implausible values you see don't come up very often.
291 |
292 | **Troubleshooting**
293 |
294 | * In rare cases, you may get a warning that the Markov chains have failed to converge. Chains that fail to converge are a sign that your model is not a good fit to the data. If you get this warning, you should adjust your priors. Your prior distribution may be too narrow, and/or your prior mean is very far from the data.
295 |
296 | * If you get any other errors, first double-check the values you have changed in the code chunks above (i.e. `mydata`, `mydata$x`, `mydata$y`, and prior values). Problems with these values can cause confusing errors downstream.
297 |
298 | ```{r check_priors, results="hide"}
299 |
300 | # generate the prior distribution
301 | m_prior = stan_glm(y ~ x, data = mydata,
302 | prior_intercept = normal(a_prior, a_sd, autoscale = FALSE),
303 | prior = normal(b1_prior, b1_sd, autoscale = FALSE),
304 | prior_PD = TRUE
305 | )
306 |
307 | # Create the dataframe with fitted draws
308 | prior_draws = mydata %>% #pipe mydata to data_grid()
309 | data_grid(x) %>% #create a fit grid with each level in x, and pipe it to add_fitted_draws()
310 | add_fitted_draws(m_prior, n = 5, seed = 12345) #add n fitted draws from the model to the fit grid
311 | # the seed argument is for reproducibility: it ensures the pseudo-random
312 | # number generator used to pick draws has the same seed on every run,
313 | # so that someone else can re-run this code and verify their output matches
314 |
315 | # Plot the five sample draws
316 | # this function is defined in 'plotting_functions.R', if you wish to customize the aesthetics.
317 | static_prior_plot_1(prior_draws)
318 | ```
319 |
320 | #### Animated HOPs visualization of priors
321 |
322 | The five static draws above give us some idea of what the prior distribution might look like. Even better, we can animate this graph using HOPs, which are better for visualizing uncertainty and identifying underlying trends. HOPs visualize the same information as the static plot generated above. However, with HOPs we can visualize more draws: with the static plot, we run out of room after only about five draws!
323 |
324 | In this code chunk, we add more draws to the `prior_draws` dataframe, so we have a total of 50 draws to visualize, and then create the animated plot. Each frame of the animation shows a different draw from the prior, starting with the same five draws as the static image above.
325 |
326 | **What to change:** Nothing! Just run the code to check your priors.
327 |
328 | ```{r HOPs_priors}
329 | # Animation parameters
330 | n_draws = 50 # the number of draws to visualize in the HOPs (more draws == longer rendering time)
331 | frames_per_second = 2.5 # the speed of the HOPs
332 | # 2.5 frames per second (400ms) is the recommended speed for the HOPs visualization.
333 | # Faster speeds (100ms) have been demonstrated to not work as well.
334 | # See Kale et al. VIS 2018 for more info.
335 |
336 | # Add more prior draws to the data frame for the visualization
337 | more_prior_draws = prior_draws %>%
338 | rbind(
339 | mydata %>%
340 | data_grid(x) %>%
341 | add_fitted_draws(m_prior, n = n_draws - 5, seed = 12345))
342 |
343 | # Animate the prior draws with HOPs
344 | # this function is defined in 'plotting_functions.R', if you wish to customize the aesthetics.
345 | prior_HOPs = animate(HOPs_plot_1(more_prior_draws), nframes = n_draws * 2, fps = frames_per_second)
346 | prior_HOPs
347 | ```
348 |
349 | In most cases, your prior HOPs will show a lot of uncertainty: the bars will jump around to a lot of different possible values. At the end of the template, you'll see how this uncertainty is affected when study data is added to the estimates.
350 |
351 | Even when you see a lot of uncertainty in the graph, the individual HOPs frames should mostly show plausible values. You will see some implausible values (usually represented as empty graphs, or bars that reach/exceed the plot's maximum y-value), but if you see many implausible values, it may be a sign that you should adjust your priors in the "Set priors" section.
352 |
353 | ## Run the model
354 | **What to change:** Nothing! Just run the model.
355 |
356 | **Troubleshooting:** If this code produces errors, check the troubleshooting section under the "Check priors" heading above for a few troubleshooting options.
357 |
358 | ```{r results = "hide", message = FALSE, warning = FALSE}
359 |
360 | m = stan_glm(y ~ x, data = mydata,
361 | prior_intercept = normal(a_prior, a_sd, autoscale = FALSE),
362 | prior = normal(b1_prior, b1_sd, autoscale = FALSE)
363 | )
364 |
365 | ```
366 |
367 |
368 | ### Model summary
369 | Here is a summary of the model fit.
370 |
371 | The summary reports diagnostic values that can help you evaluate whether your model is a good fit for the data. For this template, we can keep diagnostics simple: check that your `Rhat` values are very close to 1.0. Larger values suggest that the Markov chains have not converged. This is usually only a problem if the `Rhat` values are greater than 1.1, which is a warning sign that the chains have failed to converge. If this happens, Stan will warn you about the failure, and you should adjust your priors.
372 |
373 | ```{r}
374 | summary(m, digits=3)
375 | ```
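
If you prefer a programmatic check to scanning the printed table, you can pull the `Rhat` column out of the summary directly. This is an optional sketch that assumes the column names used by recent versions of rstanarm.

```{r}
# Optional: flag any parameters whose Rhat exceeds the 1.1 warning threshold
rhats = summary(m)[, "Rhat"]
any(rhats > 1.1)  # TRUE would indicate a convergence problem
```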
376 |
377 |
378 | ## Visualizing results
379 | #### Static visualization
380 | To plot the results, we again create a fit grid using `data_grid()`, just as we did when we created the HOPs for the prior. Given this fit grid, we can then create any number of visualizations of the results. One way we might want to visualize the results is a static graph with error bars that represent a 95% credible interval. For each x position in the fit grid, we can get the posterior mean estimates and 95% credible intervals from the model:
381 |
382 | ```{r static_graph}
383 |
384 | # Create the dataframe with fitted draws
385 | fit = mydata %>% # pipe mydata to data_grid()
386 | data_grid(x) %>% #create a fit grid with each level in x, and pipe it to add_fitted_draws()
387 | add_fitted_draws(m) %>% #add fitted draws from the model to the fit grid
388 | mean_qi(.width = .95) #add 95% credible intervals
389 |
390 | # Plot the posterior draws
391 | # this function is defined in 'plotting_functions.R', if you wish to customize the aesthetics.
392 | static_post_plot_1(fit)
393 | ```
394 |
395 | #### Animated HOPs visualization
396 | To get a better visualization of the uncertainty remaining in the posterior results, we can use animated HOPs for this graph as well. The code to generate the posterior plots is identical to the HOPs code for the priors, except we replace `m_prior` with `m`:
397 |
398 | ```{r}
399 |
400 | p = mydata %>% #pipe mydata to data_grid()
401 | data_grid(x) %>% #create a fit grid with each level in x, and pipe it to add_fitted_draws()
402 | add_fitted_draws(m, n = n_draws, seed = 12345) #add n fitted draws from the model to the fit grid
403 |
404 | # Animate the posterior draws with HOPs
405 | # this function is defined in 'plotting_functions.R', if you wish to customize the aesthetics.
406 | post_HOPs = animate(HOPs_plot_1(p), nframes = n_draws * 2, fps = frames_per_second)
407 | post_HOPs
408 |
409 | ```
410 |
411 | ### Comparing the prior and posterior
412 | If we look at our two HOPs plots together - one of the prior distribution, and one of the posterior - we can see how adding information to the model (i.e. the study data) adds more certainty to our estimates, and produces a posterior graph that is more "settled" than the prior graph.
413 |
414 | **Prior draws**
415 | ```{r echo=F}
416 | prior_HOPs
417 | ```
418 |
419 | **Posterior draws**
420 | ```{r echo=F}
421 | post_HOPs
422 | ```
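
For a numeric counterpart to this visual comparison, you can also put the prior and posterior 95% intervals side by side; `posterior_interval()` is part of rstanarm. The narrowing of the intervals is the same "settling" you see in the HOPs.

```{r}
# Optional: compare 95% intervals implied by the priors alone and after seeing the data
posterior_interval(m_prior, prob = 0.95)
posterior_interval(m, prob = 0.95)
```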
423 |
424 | ## Finishing up
425 |
426 | **Congratulations!** You made it through your first Bayesian analysis. We hope our templates helped demystify the process.
427 |
428 | If you're interested in learning more about Bayesian statistics, we suggest the following textbooks:
429 |
430 | - Statistical Rethinking, by Richard McElreath. (Website: https://xcelab.net/rm/statistical-rethinking/, including links to YouTube lectures.)
431 | - Doing Bayesian Data Analysis, by John K. Kruschke. (Website: https://sites.google.com/site/doingbayesiandataanalysis/, including R code templates.)
432 |
433 |
434 | The citation for the paper that reports the process of developing and user-testing these templates is below:
435 |
436 | Chanda Phelan, Jessica Hullman, Matthew Kay, and Paul Resnick. 2019. Some Prior(s) Experience Necessary: Templates for Getting Started with Bayesian Analysis. In CHI Conference on Human Factors in Computing Systems Proceedings (CHI 2019), May 4–9, 2019, Glasgow, Scotland UK. ACM, New York, NY, USA, 12 pages. https://doi.org/10.1145/3290605.3300709
437 |
438 |
--------------------------------------------------------------------------------
/1var-continuous-line-bayesian_template.Rmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: Bayesian analysis template
3 | author: Phelan, C., Hullman, J., Kay, M. & Resnick, P.
4 | output:
5 | html_document:
6 | theme: flatly
7 | highlight: pygments
8 | ---
9 |
10 | *Template 3:*
11 |
12 | 
13 |
14 | **Single continuous independent variable (line graph)**
15 |
16 |
17 | ## Introduction
18 | Welcome! This template will guide you through a Bayesian analysis in R, even if you have never done Bayesian analysis before. This is one of a set of templates, each for a different type of analysis. This template is for data with a **single continuous independent variable** and will produce a **line chart**. If your analysis includes a **linear regression**, this might be the right template for you.
19 |
20 | This template assumes you have basic familiarity with R. Once complete, this template will produce a summary of the analysis, complete with parameter estimates and credible intervals, and two animated HOPs (see Hullman, Resnick, Adar 2015 DOI: 10.1371/journal.pone.0142444 and Kale, Nguyen, Kay, and Hullman VIS 2018 for more information) for both your prior and posterior estimates.
21 |
22 | This Bayesian analysis focuses on producing results in a form that is easily interpretable, even to nonexperts. The credible intervals produced by Bayesian analysis are the analogue of confidence intervals in traditional null hypothesis significance testing (NHST). A weakness of NHST confidence intervals is that they are easily misinterpreted. Many people naturally interpret an NHST 95% confidence interval to mean that there is a 95% chance that the true parameter value lies somewhere in that interval; in fact, it means that if the experiment were repeated 100 times, about 95 of the resulting confidence intervals would include the true parameter value. The Bayesian credible interval sidesteps this complication by providing the intuitive meaning: a 95% chance that the true parameter value lies somewhere in that interval. To further support intuitive interpretations of your results, this template also produces animated HOPs, a type of plot that is more effective than visualizations such as error bars in helping people make accurate judgments about probability distributions.
23 |
24 | This set of templates supports a few types of statistical analysis. (In future work, this list of supported statistical analyses will be expanded.) For clarity, each type has been broken out into a separate template, so be sure to select the right template before you start! A productive way to choose which template to use is to think about what type of chart you would like to produce to summarize your data. Currently, the templates support the following:
25 |
26 | *One independent variable:*
27 |
28 | 1. Categorical; bar graph (e.g. t-tests, one-way ANOVA)
29 |
30 | 2. Ordinal; line graph (e.g. t-tests, one-way ANOVA)
31 |
32 | 3. **Continuous; line graph (e.g. linear regression)**
33 |
34 | *Two interacting independent variables:*
35 |
36 | 4. Two categorical; bar graph (e.g. two-way ANOVA)
37 |
38 | 5. One categorical, one ordinal; line graph (e.g. two-way ANOVA)
39 |
40 | 6. One categorical, one continuous; line graph (e.g. linear regression with multiple lines)
41 |
42 | Note that this template fits your data to a model that assumes normally distributed error terms. (This is the same assumption underlying t-tests, ANOVA, etc.) This template requires you to have already run diagnostics to determine that your data is consistent with this assumption; if you have not, the results may not be valid.
43 |
44 | Once you have selected your template, work through it from top to bottom to complete the analysis. For each code chunk, you may need to make changes to customize the code for your own analysis. In those places, the code chunk will be preceded by a list of things you need to change (with the heading "What to change"), and each line that needs to be customized will also include the comment `#CHANGE ME` within the code chunk itself. You can run each code chunk independently during debugging; when you're finished, you can knit the document to produce the complete report.
45 |
46 | Good luck!
47 |
48 | ### Tips before you start
49 |
50 | 1. Make sure you have picked the right template! (See above.)
51 |
52 | 2. Use the pre-knitted HTML version of this template as a reference as you work (we've included all the HTML files in the folder `html_outputs`). The formatting makes the template easier to follow. You can also knit this document as you work once you have completed setup.
53 |
54 | 3. Make sure you are using the most recent version of the templates. Updates can be found at https://github.com/cdphelan/bayesian-template.
55 |
56 | ### Sample dataset
57 | This template comes prefilled with an example dataset from Moser et al. (DOI: 10.1145/3025453.3025778), which examines choice overload in the context of e-commerce. The study examined the relationship between choice satisfaction (measured on a 7-point Likert scale), the number of product choices presented on a webpage, and whether the participant is a decision "maximizer" (a person who examines all options and tries to choose the best) or a "satisficer" (a person who selects the first option that is satisfactory). In this template, we analyze the relationship between the number of choices presented, which we treat as a continuous variable with values that can fall in the range [12,72]; and choice satisfaction, which we treat as a continuous variable with values that can fall in the range [1,7].
58 |
59 |
60 | ## Set up
61 | ### Requirements
62 | To run this template, we assume that you are using RStudio, and you have the most recent version of R installed. (This template was built with R version 3.5.1.)
63 |
64 | This template works best if you first open the file `bayesian-template.Rproj` from the code repository as a project in RStudio to get started, and then open the individual `.Rmd` template files after this.
65 |
66 | ### Libraries
67 | **Installation:**
68 | If this is your first time using the template, you may need to install libraries.
69 |
70 | 1. **If you are using Windows,** first you will need to manually install RStan and Rtools. Follow the instructions [here](https://github.com/stan-dev/rstan/wiki/Installing-RStan-on-Windows) to install both.
71 |
72 | 2. On both Mac and Windows, uncomment the line with `install.packages()` to install the required packages. This only needs to be done once.
73 |
74 | **Troubleshooting:**
75 | You may have some trouble installing the packages, especially if you are on Windows. Regardless of OS, if you have any issues installing these packages, try one or more of the following troubleshooting options:
76 |
77 | 1. Restart R.
78 |
79 | 2. Make sure you are running the most recent version of R (3.5.1, as of the writing of this template).
80 |
81 | 3. Manually install RStan and Rtools, following the instructions [here](https://github.com/stan-dev/rstan/wiki/RStan-Getting-Started).
82 |
83 | 4. If you have tried the above and you are still getting error messages like `there is no package called [X]`, try installing the missing package(s) manually using the RStudio interface under Tools > Install Packages...
84 |
85 | ```{r libraries, message=FALSE, warning=FALSE}
86 |
87 | knitr::opts_chunk$set(fig.align="center")
88 |
89 | # install.packages(c("ggplot2", "rstanarm", "tidyverse", "tidybayes", "modelr", "gganimate"))
90 |
91 | library(rstanarm) #bayesian analysis package
92 | library(tidyverse) #tidy datascience commands
93 | library(tidybayes) #tidy data + ggplot workflow
94 | library(modelr) #tidy pipelines for modeling
95 | library(ggplot2) #plotting package
96 | library(gganimate) #animate ggplots
97 |
98 | # We import all of our plotting functions from this separate R file to keep the code in
99 | # this template easier to read. You can edit this file to customize aesthetics of the plots
100 | # if desired. Just be sure to run this line again after you make edits!
101 | source('plotting_functions.R')
102 |
103 | theme_set(theme_light()) # set the ggplot theme for all plots
104 |
105 | ```
106 |
107 | ### Read in data
108 | **What to change**
109 |
110 | 1. mydata: Read in your data.
111 |
112 | ```{r data_prep}
113 |
114 | mydata = read.csv('datasets/choc_cleaned_data.csv') #CHANGE ME 1
115 |
116 | ```
117 |
118 |
119 | ## Specify model
120 | We'll fit the following model: `stan_glm(y ~ x)`, which specifies a linear regression where each $y_i$ is drawn from a normal distribution with mean equal to $a + bx_i$ and standard deviation equal to `sigma` ($\sigma$):
121 |
122 | $$
123 | y_i \sim Normal(a + bx_i, \sigma)
124 | $$
125 |
126 | Choose your independent and dependent variables. These are the variables that will correspond to the x and y axis on the final plots.
127 |
128 | **What to change**
129 |
130 | 2. mydata\$x: Select which variable will appear on the x-axis of your plots.
131 |
132 | 3. mydata\$y: Select which variable will appear on the y-axis of your plots.
133 |
134 | 4. x_lab: Label your plots' x-axes.
135 |
136 | 5. y_lab: Label your plots' y-axes.
137 |
138 | ```{r specify_model}
139 |
140 | #select your independent and dependent variable
141 | mydata$x = mydata$num_products_displayed #CHANGE ME 2
142 | mydata$y = mydata$satis_Q1 #CHANGE ME 3
143 |
144 | # label the axes on the plots
145 | x_lab = "Choices" #CHANGE ME 4
146 | y_lab = "Satisfaction" #CHANGE ME 5
147 |
148 | ```
149 |
150 |
151 | ### Set priors
152 | In this section, you will set priors for your model. Setting priors thoughtfully is important to any Bayesian analysis, especially if you have a small sample of data for fitting your model. The priors express your best prior belief, *before seeing any data*, of reasonable values for the model parameters.
153 |
154 | Ideally, you will have previous literature from which to draw these prior beliefs. If no previous studies exist, you can instead assign "weakly informative priors" that only minimally restrict the model, excluding only values that are implausible or impossible. We have provided examples of how to set both weak and strong priors below.
155 |
156 | To check the plausibility of your priors, use the code section after this one to generate a graph of five sample draws from your priors to check if the values generated are reasonable.
157 |
158 | Our model has the following parameters:
159 |
160 | a. the intercept; functionally, this is often the mean of the control condition
161 |
162 | b. the slope; i.e., the effect size
163 |
164 | c. the standard deviation of the normally distributed error term
165 |
166 | To simplify things, we limit the number of different prior beliefs you can have. Think of the intercept as specifying the control condition of an experiment, and the slope as specifying the effect size. We let you specify a prior belief about the plausible values of the mean in the control condition (a), and then we let you set a prior belief about the plausible effect size (b).
167 |
168 | To simplify things further, we only let you specify beliefs about these parameters in the form of a normal distribution. Thus, you will specify what you think is the most likely value for the parameter (the mean), and a standard deviation. You will be expressing a belief that you are 95% certain (before looking at any data) that the true value of the parameter is within two standard deviations of the mean.
169 |
170 | Finally, our modeling system, `stan_glm()`, will automatically set a prior for the last parameter, the standard deviation of the normally distributed error term for the model overall (c).
171 |
172 | To explore more about priors, you can experiment with different values for these parameters and use the following section, *Checking priors with visualizations*, to see how different parameter values change the prior distribution.
173 |
174 | Want more examples? Check your understanding of how to set priors in this [quizlet](https://cdphelan.shinyapps.io/check_understanding_priors/), which includes several more examples of how to set both strong and weak priors.
175 |
176 | **What to change**
177 |
178 | **If you are using weakly informative priors (i.e. priors not informed by previous literature):**
179 |
180 | *Remember: **do not** use any of your data from the current study to inform prior values.*
181 |
182 | 6. a_prior: Select the intercept (likely the control condition mean).
183 |
184 | 7. a_prior_max: Select the maximum plausible value of the intercept (maximum plausible value of control condition data). (We will use this to calculate the sd of `a`.)
185 |
186 | 8. b1_prior: Select the effect size mean.
187 |
188 | 9. b1_sd: Select the effect size standard deviation.
189 |
190 | 10. You should also change the comments in the code below to explain your choice of priors.
191 |
192 | **If you are using strong priors (i.e. priors from previous literature):**
193 |
194 | Skip this code chunk and set your priors in the next code chunk. For clarity, comment out everything in this code chunk.
195 |
196 | ```{r}
197 |
198 | # CHANGE THIS COMMENT EXPLAINING YOUR CHOICE OF PRIORS (10)
199 | # In our example dataset, y-axis scores can be in the range [1, 7].
200 | # In the absence of other information, we set the parameter mean as 4
201 | # (the mean of the range [1,7]) and the maximum possible value as 7.
202 | # From exploratory analysis, we know the mean score and sd for y in our
203 | # dataset but we *DO NOT* use this information because priors *CANNOT*
204 | # include any information from the current study.
205 |
206 | a_prior = 4 # CHANGE ME 6
207 | a_prior_max = 7 # CHANGE ME 7
208 |
209 | # With a normal distribution, we can't completely rule out
210 | # impossible values, but we choose an sd that assigns less than
211 | # 5% probability to those impossible values. Remember that in a normal
212 | # distribution, 95% of the data lies within 2 sds of the mean. Therefore,
213 | # we calculate the value of 1 sd by finding the maximum amount our data
214 | # can vary from the mean (a_prior_max - a_prior) and dividing that in half.
215 |
216 | a_sd = (a_prior_max - a_prior) / 2 # do not change
217 |
218 | # CHANGE THIS COMMENT EXPLAINING YOUR CHOICE OF PRIORS (10)
219 | # In this example, we will say we do not have guidance from literature
220 | # about the effect of choice set size on satisfaction, so we set the mean
221 | # of the effect size parameters to be 0. In the absence of other information,
222 | # we set the sd so that a change from the minimum choice set size (12)
223 | # to the maximum choice set size (72) could plausibly result
224 | # in a +6/-6 change in satisfaction, the maximum possible change.
225 |
226 | b1_prior = 0 # CHANGE ME 8
227 | b1_sd = (6/(72-12))/2 # CHANGE ME 9
228 |
229 | ```
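
As an optional check of the scale of this slope prior (not part of the template), you can confirm that two standard deviations of the slope, applied across the full range of choice set sizes, spans the maximum possible change in satisfaction.

```{r}
# Optional: 2 sds of the slope, applied over the full x range of 72 - 12 = 60 choices
2 * b1_sd * (72 - 12)  # = 6, the largest possible change on a 1-7 scale
```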
230 |
231 |
232 | **What to change**
233 |
234 | **If you are using weakly informative priors:**
235 |
236 | Do not use this code chunk; use the code chunk above to set your priors instead. Make sure everything in this code chunk is commented out so that your priors are not overwritten.
237 |
238 | **If you are using strong priors (i.e. priors from previous literature):**
239 |
240 | *Remember: **do not** use any of your data from the current study to set prior values.*
241 |
242 | First, make sure to uncomment all four variables set in this code chunk.
243 |
244 | 6. a_prior: Select the control condition mean.
245 |
246 | 7. a_sd: Select the control condition standard deviation.
247 |
248 | 8. b1_prior: Select the effect size mean.
249 |
250 | 9. b1_sd: Select the effect size standard deviation.
251 |
252 | 10. You should also change the comments in the code below to explain your choice of priors.
253 |
254 | ```{r}
255 |
256 | # CHANGE THIS COMMENT EXPLAINING YOUR CHOICE OF PRIORS (10)
257 | # In our example dataset, y-axis scores can be in the range [1, 7].
258 | # To choose our priors, we use the results from a previous study
259 | # where participants completed an identical task (choosing between
260 | # different chocolate bars). For our overall prior mean, we pool the mean
261 | # satisfaction scores from all conditions in the previous study to get
262 | # an overall mean of 5.86. We set a_sd so that 5.86 +/- 2 sds encompasses
263 | # the 95% confidence intervals from the previous study results.
264 |
265 | # a_prior = 5.86 # CHANGE ME 6
266 | # a_sd = 0.6 # CHANGE ME 7
267 |
268 | # CHANGE THIS COMMENT EXPLAINING YOUR CHOICE OF PRIORS (10)
269 | # In this example, we do not have guidance from previous literature
270 | # to set an exact effect size, but literature does tell us that
271 | # satisfaction is likely to decline as choice size increases, so we set
272 | # an effect size parameter mean so that a change from the minimum
273 | # choice set size (12) to the maximum choice set size (72) results in
274 | # a 2-point decrease in satisfaction. To reflect the uncertainty in
275 | # this effect size, we set the sd so that a change from the minimum
276 | # to maximum choice set size could plausibly result in a +6/-6 change
277 | # in satisfaction, the maximum possible change.
278 |
279 | # b1_prior = -2/(72-12) # CHANGE ME 8
280 | # b1_sd = (6/(72-12))/2 # CHANGE ME 9
281 |
282 | ```
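
If you do use these strong priors (after uncommenting them), one optional check makes the remaining uncertainty concrete: the prior still leaves roughly a 25% chance that the slope is positive, i.e. that satisfaction rises as the number of choices increases.

```{r}
# Optional check, using the literal values from the commented-out priors above
1 - pnorm(0, mean = -2/(72 - 12), sd = (6/(72 - 12))/2)  # ~0.25
```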
283 |
284 |
285 | ### Checking priors with visualizations
286 | Next, you'll want to check your priors by running this code chunk. It will produce a plot of 100 sample draws from the priors you set in the previous section, so you can check to see if the values generated are reasonable.
287 |
288 | You'll also want to run the code chunk after this one, `HOPs_priors`, which presents plots of sample prior draws in an animated format called HOPs (Hypothetical Outcomes Plots). HOPs are a type of plot that visualizes uncertainty as sets of draws from a distribution, and have been demonstrated to improve multivariate probability estimates (Hullman et al. 2015) and increase sensitivity to the underlying trend in data (Kale et al. 2018) over static representations of uncertainty like error bars.
289 |
290 | #### Static visualization of priors
291 | **What to change**
292 |
293 | Nothing! Just run this code to check your priors, adjusting prior values above as needed until you find reasonable prior values. Note that you may get a couple of very implausible or even impossible values because our assumption of normally distributed priors assigns a small probability to even very extreme values. If you are concerned by the outcome, you can try rerunning it a few more times to make sure that any implausible values you see don't come up very often.
294 |
295 | **Troubleshooting**
296 |
297 | * In rare cases, you may get a warning that the Markov chains have failed to converge. Chains that fail to converge are a sign that your model is not a good fit to the data. If you get this warning, you should adjust your priors. Your prior distribution may be too narrow, and/or your prior mean is very far from the data.
298 |
299 | * If you get any other errors, first double-check the values you have changed in the code chunks above (i.e. `mydata`, `mydata$x`, `mydata$y`, and prior values). Problems with these values can cause confusing errors downstream.
300 |
301 | ```{r check_priors, results="hide"}
302 |
303 | # generate the prior distribution
304 | m_prior = stan_glm(y ~ x, data = mydata,
305 | prior_intercept = normal(a_prior, a_sd, autoscale = FALSE),
306 | prior = normal(b1_prior, b1_sd, autoscale = FALSE),
307 | prior_PD = TRUE
308 | )
309 |
310 | # Create the dataframe with fitted draws
311 | prior_draws = mydata %>% #pipe mydata to data_grid()
312 | data_grid(x) %>% #create a fit grid with each level in x, and pipe it to add_fitted_draws()
313 | add_fitted_draws(m_prior, n = 100, seed = 12345) #add n fitted draws from the model to the fit grid
314 | # the seed argument is for reproducibility: it ensures the pseudo-random
315 | # number generator used to pick draws has the same seed on every run,
316 | # so that someone else can re-run this code and verify their output matches
317 |
318 | # Plot the 100 sample draws
319 | # this function is defined in 'plotting_functions.R', if you wish to customize the aesthetics.
320 | static_prior_plot_3(prior_draws)
321 | ```
322 |
323 | #### Animated HOPs visualization of priors
324 | The static draws above give us some idea of what the prior distribution might look like. Even better, we can animate this graph using HOPs. HOPs visualize the same information as the static plot generated above, but are better for visualizing uncertainty and identifying underlying trends.
325 |
326 | In this code chunk, we create the animated plot using 50 of the 100 draws we used in the plot above. Each frame of the animation shows a different draw from the prior.
327 |
328 | **What to change:** Nothing! Just run the code to check your priors.
329 |
330 | ```{r HOPs_priors}
331 | # Animation parameters
332 | n_draws = 50 # the number of draws to visualize in the HOPs (more draws == longer rendering time)
333 | frames_per_second = 2.5 # the speed of the HOPs
334 | # 2.5 frames per second (400ms) is the recommended speed for the HOPs visualization.
335 | # Faster speeds (100ms) have been demonstrated to not work as well.
336 | # See Kale et al. VIS 2018 for more info.
337 |
338 | # Animate the prior draws with HOPs
339 | # this function is defined in 'plotting_functions.R', if you wish to customize the aesthetics.
340 | prior_HOPs = animate(HOPs_plot_3(prior_draws), nframes = n_draws * 2, fps = frames_per_second)
341 | prior_HOPs
342 | ```
343 |
344 | In most cases, your prior HOPs will show a lot of uncertainty: the lines will jump around to a lot of different possible values. At the end of the template, you'll see how this uncertainty is affected when study data is added to the estimates.
345 |
346 | Even when you see a lot of uncertainty in the graph, the individual HOPs frames should mostly show plausible values. You will see some implausible values (usually represented as empty graphs), but if you see many implausible values, it may be a sign that you should adjust your priors in the "Set priors" section.
347 |
348 |
349 | ## Run the model
350 | **What to change:** Nothing! Just run the model.
351 |
352 | **Troubleshooting:** If this code produces errors, check the troubleshooting section under the "Check priors" heading above for a few troubleshooting options.
353 |
354 | ```{r results = "hide", message = FALSE, warning = FALSE}
355 | m = stan_glm(y ~ x, data = mydata,
356 | prior_intercept = normal(a_prior, a_sd, autoscale = FALSE),
357 | prior = normal(b1_prior, b1_sd, autoscale = FALSE)
358 | )
359 | ```
360 |
361 |
362 | ### Model summary
363 | Here is a summary of the model fit.
364 |
365 | The summary reports diagnostic values that can help you evaluate whether your model is a good fit for the data. For this template, we can keep diagnostics simple: check that your `Rhat` values are very close to 1.0. Larger values suggest that the Markov chains have not converged. This is usually only a problem if the `Rhat` values are greater than 1.1, which is a warning sign that the chains have failed to converge. If this happens, Stan will warn you about the failure, and you should adjust your priors.
366 |
367 | ```{r}
368 | summary(m, digits=3)
369 | ```
370 |
371 |
372 | ## Visualizing results
373 | #### Static visualizations
374 | To plot the results, we again create a fit grid using `data_grid()`, just as we did when we created the HOPs for the prior. Given this fit grid, we can then create any number of visualizations of the results. One way we might want to visualize the results is a static graph with a 95% credible band. To do this, we use the grid and draw samples from the posterior mean evaluated at each x position in the grid using the `add_fitted_draws` function, and then summarize these samples in ggplot using a `stat_lineribbon`:
375 |
376 | ```{r static_graph}
377 |
378 | # Create the dataframe with fitted draws
379 | fit = mydata %>% # pipe mydata to data_grid()
380 | data_grid(x = seq_range(x, n = 20)) %>% #create a fit grid of 20 evenly spaced points across the range of x, and pipe it to add_fitted_draws()
381 | add_fitted_draws(m) #add fitted draws from the model to the fit grid
382 |
383 | # Plot the posterior draws with a confidence band
384 | # this function is defined in 'plotting_functions.R', if you wish to customize the aesthetics.
385 | static_post_plot_3a(fit)
386 |
387 | ```
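
If you are curious what `static_post_plot_3a()` does without opening `plotting_functions.R`, the following is a minimal sketch of a ribbon plot built directly from the fitted draws; the function in the repository may differ in its aesthetics.

```{r, eval=FALSE}
# Minimal sketch of a ribbon plot over the fitted draws
# (the template's own plotting function may use different aesthetics)
ggplot(fit, aes(x = x, y = .value)) +
  stat_lineribbon(.width = .95, alpha = 0.5) +
  labs(x = x_lab, y = y_lab)
```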
388 |
389 | But what we really want is to display a selection of plausible fit lines, say 100 of them. To do that, we instead ask `add_fitted_draws` for just those 100 draws, which we plot separately as lines:
390 |
391 | ```{r}
392 |
393 | fit = mydata %>%
394 | data_grid(x = seq_range(x, n = 101)) %>%
395 | # the seed argument is for reproducibility: it ensures the pseudo-random
396 | # number generator used to pick draws has the same seed on every run,
397 | # so that someone else can re-run this code and verify their output matches
398 | add_fitted_draws(m, n = 100, seed = 12345)
399 |
400 | # Plot the posterior draws with a selection of fit draws
401 | # this function is defined in 'plotting_functions.R', if you wish to customize the aesthetics.
402 | static_post_plot_3b(fit)
403 |
404 | ```
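
Likewise, a minimal sketch of the "many fit lines" idea, drawing one semi-transparent line per posterior draw; the repository's `static_post_plot_3b()` may differ in detail.

```{r, eval=FALSE}
# Minimal sketch: one line per posterior draw, grouped by the .draw column
ggplot(fit, aes(x = x, y = .value, group = .draw)) +
  geom_line(alpha = 0.2) +
  labs(x = x_lab, y = y_lab)
```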
405 |
406 |
407 | #### Animated HOPs visualization
408 | To get a better visualization of the uncertainty remaining in the posterior results, we can use animated HOPs for this graph as well. The code to generate the posterior plots is identical to the HOPs code for the priors, except we replace `m_prior` with `m`:
409 |
410 | ```{r}
411 |
412 | p = mydata %>% #pipe mydata to data_grid()
413 | data_grid(x) %>% #create a fit grid with each level in x, and pipe it to add_fitted_draws()
414 | add_fitted_draws(m, n = n_draws, seed = 12345) #add n fitted draws from the model to the fit grid
415 |
416 | # animate the data from p, using the graph aesthetics set in the graph aesthetics code chunk
417 | # this function is defined in 'plotting_functions.R', if you wish to customize the aesthetics.
418 | post_HOPs = animate(HOPs_plot_3(p), nframes = n_draws * 2, fps = frames_per_second)
419 | post_HOPs
420 |
421 | ```
422 |
423 | ### Comparing the prior and posterior
424 | If we look at our two HOPs plots together - one of the prior distribution, and one of the posterior - we can see how adding information to the model (i.e. the study data) adds more certainty to our estimates, and produces a posterior graph that is more "settled" than the prior graph.
425 |
426 | **Prior draws**
427 | ```{r echo=F}
428 | prior_HOPs
429 | ```
430 |
431 | **Posterior draws**
432 | ```{r echo=F}
433 | post_HOPs
434 | ```
435 |
436 | ## Finishing up
437 |
438 | **Congratulations!** You made it through your first Bayesian analysis. We hope our templates helped demystify the process.
439 |
440 | If you're interested in learning more about Bayesian statistics, we suggest the following textbooks:
441 |
442 | - Statistical Rethinking, by Richard McElreath. (Website: https://xcelab.net/rm/statistical-rethinking/, including links to YouTube lectures.)
443 | - Doing Bayesian Data Analysis, by John K. Kruschke. (Website: https://sites.google.com/site/doingbayesiandataanalysis/, including R code templates.)
444 |
445 |
446 | The citation for the paper that reports the process of developing and user-testing these templates is below:
447 |
448 | Chanda Phelan, Jessica Hullman, Matthew Kay, and Paul Resnick. 2019. Some Prior(s) Experience Necessary: Templates for Getting Started with Bayesian Analysis. In CHI Conference on Human Factors in Computing Systems Proceedings (CHI 2019), May 4–9, 2019, Glasgow, Scotland UK. ACM, New York, NY, USA, 12 pages. https://doi.org/10.1145/3290605.3300709
449 |
--------------------------------------------------------------------------------
/1var-ordinal-line-bayesian_template.Rmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: Bayesian analysis template
3 | author: Phelan, C., Hullman, J., Kay, M. & Resnick, P.
4 | output:
5 | html_document:
6 | theme: flatly
7 | highlight: pygments
8 | ---
9 |
10 | *Template 2:*
11 |
12 | 
13 |
14 | **Single ordinal independent variable (line graph)**
15 |
16 |
17 | ##Introduction
18 | Welcome! This template will guide you through a Bayesian analysis in R, even if you have never done Bayesian analysis before. There are a set of templates, each for a different type of analysis. This template is for data with a **single ordinal independent variable** and will produce a **line chart**. If your analysis includes a **t-test** or **one-way ANOVA**, this might be the right template for you. In many cases, a bar chart will be a better option for visualizing your results; for bar charts with this type of data, use template 1 instead.
19 |
20 | This template assumes you have basic familiarity with R. Once complete, this template will produce a summary of the analysis, complete with parameter estimates and credible intervals, and two animated HOPs (see Hullman, Resnick, Adar 2015 DOI: 10.1371/journal.pone.0142444 and Kale, Nguyen, Kay, and Hullman VIS 2018 for more information) for both your prior and posterior estimates.
21 |
22 | This Bayesian analysis focuses on producing results in a form that is easily interpretable, even to nonexperts. The credible intervals produced by Bayesian analysis are the analogue of confidence intervals in traditional null hypothesis significance testing (NHST). A weakness of NHST confidence intervals is that they are easily misinterpreted. Many people naturally interpret an NHST 95% confidence interval to mean that there is a 95% chance that the true parameter value lies somewhere in that interval; in fact, it means that if the experiment were repeated 100 times, about 95 of the resulting confidence intervals would include the true parameter value. The Bayesian credible interval sidesteps this complication by providing the intuitive meaning: a 95% chance that the true parameter value lies somewhere in that interval. To further support intuitive interpretations of your results, this template also produces animated HOPs, a type of plot that is more effective than visualizations such as error bars in helping people make accurate judgments about probability distributions.
23 |
24 | This set of templates supports a few types of statistical analysis. (In future work, this list of supported statistical analyses will be expanded.) For clarity, each type has been broken out into a separate template, so be sure to select the right template before you start! A productive way to choose which template to use is to think about what type of chart you would like to produce to summarize your data. Currently, the templates support the following:
25 |
26 | *One independent variable:*
27 |
28 | 1. Categorical; bar graph (e.g. t-tests, one-way ANOVA)
29 |
30 | 2. **Ordinal; line graph (e.g. t-tests, one-way ANOVA)**
31 |
32 | 3. Continuous; line graph (e.g. linear regression)
33 |
34 | *Two interacting independent variables:*
35 |
36 | 4. Two categorical; bar graph (e.g. two-way ANOVA)
37 |
38 | 5. One categorical, one ordinal; line graph (e.g. two-way ANOVA)
39 |
40 | 6. One categorical, one continuous; line graph (e.g. linear regression with multiple lines)
41 |
42 | Note that this template fits your data to a model that assumes normally distributed error terms. (This is the same assumption underlying t-tests, ANOVA, etc.) This template requires you to have already run diagnostics to determine that your data is consistent with this assumption; if you have not, the results may not be valid.
43 |
44 | Once you have selected your template, to complete the analysis, please follow along with this template. For each code chunk, you may need to make changes to customize the code for your own analysis. In those places, the code chunk will be preceded by a list of things you need to change (with the heading "What to change"), and each line that needs to be customized will also include the comment `#CHANGE ME` within the code chunk itself. You can run each code chunk independently during debugging; when you're finished, you can knit the document to produce the complete report.
45 |
46 | Good luck!
47 |
48 | ###Tips before you start
49 |
50 | 1. Make sure you have picked the right template! (See above.)
51 |
52 | 2. Use the pre-knitted HTML version of this template as a reference as you work (we've included all the HTML files in the folder `html_outputs`). The formatting makes the template easier to follow. You can also knit this document as you work once you have completed setup.
53 |
54 | 3. Make sure you are using the most recent version of the templates. Updates can be found at https://github.com/cdphelan/bayesian-template.
55 |
56 | ###Sample dataset
57 | This template comes prefilled with an example dataset from Moser et al. (DOI: 10.1145/3025453.3025778), which examines choice overload in the context of e-commerce. The study examined the relationship between choice satisfaction (measured on a 7-point Likert scale), the number of product choices presented on a webpage, and whether the participant is a decision "maximizer" (a person who examines all options and tries to choose the best) or a "satisficer" (a person who selects the first option that is satisfactory). In this template, we analyze the relationship between the number of choices presented, which we treat as an ordinal variable with possible values [12,24,40,50,60,72]; and choice satisfaction, which we treat as a continuous variable with values that can fall in the range [1,7].
58 |
59 |
60 | ##Set up
61 | ###Requirements
62 | To run this template, we assume that you are using RStudio, and you have the most recent version of R installed. (This template was built with R version 3.5.1.)
63 |
64 | This template works best if you first open the file `bayesian-template.Rproj` from the code repository as a project in RStudio to get started, and then open the individual `.Rmd` template files after this.
65 |
66 | ###Libraries
67 | **Installation:**
68 | If this is your first time using the template, you may need to install libraries.
69 |
70 | 1. **If you are using Windows,** first you will need to manually install RStan and Rtools. Follow the instructions [here](https://github.com/stan-dev/rstan/wiki/Installing-RStan-on-Windows) to install both.
71 |
72 | 2. On both Mac and Windows, uncomment the line with `install.packages()` to install the required packages. This only needs to be done once.
73 |
74 | **Troubleshooting:**
75 | You may have some trouble installing the packages, especially if you are on Windows. Regardless of OS, if you have any issues installing these packages, try one or more of the following troubleshooting options:
76 |
77 | 1. Restart R.
78 |
79 | 2. Make sure you are running the most recent version of R (3.5.1, as of the writing of this template).
80 |
81 | 3. Manually install RStan and Rtools, following the instructions [here](https://github.com/stan-dev/rstan/wiki/RStan-Getting-Started).
82 |
83 | 4. If you have tried the above and you are still getting error messages like `there is no package called [X]`, try installing the missing package(s) manually using the RStudio interface under Tools > Install Packages...
84 |
85 | ```{r libraries, message=FALSE, warning=FALSE}
86 |
87 | knitr::opts_chunk$set(fig.align="center")
88 |
89 | # install.packages(c("ggplot2", "rstanarm", "tidyverse", "tidybayes", "modelr", "gganimate"))
90 |
91 | library(rstanarm) #bayesian analysis package
92 | library(tidyverse) #tidy datascience commands
93 | library(tidybayes) #tidy data + ggplot workflow
94 | library(modelr) #tidy pipelines for modeling
95 | library(ggplot2) #plotting package
96 | library(gganimate) #animate ggplots
97 |
98 | # We import all of our plotting functions from this separate R file to keep the code in
99 | # this template easier to read. You can edit this file to customize aesthetics of the plots
100 | # if desired. Just be sure to run this line again after you make edits!
101 | source('plotting_functions.R')
102 |
103 | theme_set(theme_light()) # set the ggplot theme for all plots
104 |
105 | ```
106 |
107 | ###Read in data
108 | **What to change**
109 |
110 | 1. mydata: Read in your data.
111 |
112 | ```{r data_prep}
113 |
114 | mydata = read.csv('datasets/choc_cleaned_data.csv') #CHANGE ME 1
115 |
116 | ```
117 |
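Optionally, you can take a quick look at the data you just read in, to confirm that the columns you expect are present. This is just a sketch and is not required for the analysis (the column names shown will be those of the example dataset; yours will differ):

```{r}
# Optional: a quick look at the columns and their types
str(mydata)
```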
118 |
119 | ## Specify model
120 | We'll fit the following model: `stan_glm(y ~ x)`. As $x$ is an ordinal variable in this template, this specifies a linear regression with dummy variables for each level of ordinal variable $x$ beyond the first. **This is equivalent to ANOVA.** So for example, for a regression where $x$ has three levels, each $y_i$ is drawn from a normal distribution with mean equal to $a + b_1dummy_1 + b_2dummy_2$ and standard deviation equal to `sigma` ($\sigma$):
121 |
122 | $$
123 | y_i \sim Normal(a + b_1dummy_1 + b_2dummy_2, \sigma)
124 | $$
125 |
126 | Choose your independent and dependent variables. These are the variables that will correspond to the x and y axis on the final plots.
127 |
128 | **What to change**
129 |
130 | 2. mydata\$x: Select which variables will appear on the x-axis of your plots.
131 |
132 | 3. mydata\$y: Select which variables will appear on the y-axis of your plots.
133 |
134 | 4. x_lab: Label your plots' x-axes.
135 |
136 | 5. y_lab: Label your plots' y-axes.
137 |
138 | ```{r specify_model}
139 |
140 | #select your independent and dependent variable
141 | mydata$x = as.factor(mydata$num_products_displayed) #CHANGE ME 2
142 | mydata$y = mydata$satis_Q1 #CHANGE ME 3
143 |
144 | # label the axes on the plots
145 | x_lab = "Choices" #CHANGE ME 4
146 | y_lab = "Satisfaction" #CHANGE ME 5
147 |
148 | ```
149 |
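If you would like to see the dummy coding described above, one way to inspect it (a sketch, not required for the analysis) is to build the design matrix that R uses behind the scenes. The first level of `x` is absorbed into the intercept, and each remaining level gets its own 0/1 dummy column:

```{r}
# A sketch: the dummy variables R creates for the factor x
head(model.matrix(~ x, data = mydata))
```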
150 |
151 | ### Set priors
152 | In this section, you will set priors for your model. Setting priors thoughtfully is important to any Bayesian analysis, especially if you have a small sample of data that you will use for fitting for your model. The priors express your best prior belief, *before seeing any data*, of reasonable values for the model parameters.
153 |
154 | Ideally, you will have previous literature from which to draw these prior beliefs. If no previous studies exist, you can instead assign "weakly informative priors" that only minimally restrict the model, excluding only values that are implausible or impossible. We have provided examples of how to set both weak and strong priors below.
155 |
156 | To check the plausibility of your priors, use the code section after this one to generate a graph of five sample draws from your priors to check if the values generated are reasonable.
157 |
158 | Our model has the following parameters:
159 |
160 | a. the overall mean y-value across all levels of ordinal variable x
161 |
162 | b. the mean y-value for each of the individual levels
163 |
164 | c. the standard deviation of the normally distributed error term
165 |
166 | To simplify things, we limit the number of different prior beliefs you can have. Think of the first level of the ordinal variable as specifying the control condition of an experiment, and all of the other levels as treatment conditions in the experiment. We let you specify a prior belief about the plausible values of the mean in the control condition (a), and then we let you set a prior belief about the plausible effect size (b). You have to specify the same plausible effect sizes for all conditions, unless you dig deeper into our code.
167 |
168 | To simplify things further, we only let you specify beliefs about these parameters in the form of a normal distribution. Thus, you will specify what you think is the most likely value for the parameter (the mean), and a standard deviation. You will be expressing a belief that you are 95% certain (before looking at any data) that the true value of the parameter is within two standard deviations of the mean.
169 |
170 | Finally, our modeling system, `stan_glm()`, will automatically set priors for the last parameter, the standard deviation of the normally distributed error term for the model overall (c).
171 |
172 | To explore more about priors, you can experiment with different values for these parameters and use the following section, *Checking priors with visualizations*, to see how different parameter values change the prior distribution.
173 |
174 | Want more examples? Check your understanding of how to set priors in this [quizlet](https://cdphelan.shinyapps.io/check_understanding_priors/), which includes several more examples of how to set both strong and weak priors.
175 |
176 | **What to change**
177 |
178 | **If you are using weakly informative priors (i.e. priors not informed by previous literature):**
179 |
180 | *Remember: **do not** use any of your data from the current study to inform prior values.*
181 |
182 | 6. a_prior: Select the control condition mean.
183 |
184 | 7. a_prior_max: Select the maximum plausible value of the control condition data. (We will use this to calculate the sd of `a`.)
185 |
186 | 8. b1_prior: Select the effect size mean.
187 |
188 | 9. b1_sd: Select the effect size standard deviation.
189 |
190 | 10. You should also change the comments in the code below to explain your choice of priors.
191 |
192 | **If you are using strong priors (i.e. priors from previous literature):**
193 |
194 | Skip this code chunk and set your priors in the next code chunk. For clarity, comment out everything in this code chunk.
195 |
196 | ```{r}
197 |
198 | # CHANGE THIS COMMENT EXPLAINING YOUR CHOICE OF PRIORS (10)
199 | # In our example dataset, y-axis scores can be in the range [1, 7].
200 | # In the absence of other information, we set the parameter mean as 4
201 | # (the mean of the range [1,7]) and the maximum possible value as 7.
202 | # From exploratory analysis, we know the mean score and sd for y in our
203 | # dataset but we *DO NOT* use this information because priors *CANNOT*
204 | # include any information from the current study.
205 |
206 | a_prior = 4 # CHANGE ME 6
207 | a_prior_max = 7 # CHANGE ME 7
208 |
209 | # With a normal distribution, we can't completely rule out
210 | # impossible values, but we choose an sd that assigns less than
211 | # 5% probability to those impossible values. Remember that in a normal
212 | # distribution, 95% of the data lies within 2 sds of the mean. Therefore,
213 | # we calculate the value of 1 sd by finding the maximum amount our data
214 | # can vary from the mean (a_prior_max - a_prior) and divide that in half.
215 |
216 | a_sd = (a_prior_max - a_prior) / 2 # do not change
217 |
218 | # CHANGE THIS COMMENT EXPLAINING YOUR CHOICE OF PRIORS (10)
219 | # In this example, we will say we do not have a strong hypothesis about the effect
220 | # of choice set size on satisfaction, so we set the mean of the effect size
221 | # parameters to be 0. In the absence of other information, we set the sd
222 | # to be the same as for the control condition.
223 |
224 | b1_prior = 0 # CHANGE ME 8
225 | b1_sd = a_sd # CHANGE ME 9
226 |
227 | ```
228 |
229 |
230 | **What to change**
231 |
232 | **If you are using weakly informative priors:**
233 |
234 | Do not use this code chunk; use the code chunk above to set your priors instead. Make sure everything in this code chunk is commented out so that your priors are not overwritten.
235 |
236 | **If you are using strong priors (i.e. priors from previous literature):**
237 |
238 | *Remember: **do not** use any of your data from the current study to set prior values.*
239 |
240 | First, make sure to uncomment all four variables set in this code chunk.
241 |
242 | 6. a_prior: Select the control condition mean.
243 |
244 | 7. a_sd: Select the control condition standard deviation.
245 |
246 | 8. b1_prior: Select the effect size mean.
247 |
248 | 9. b1_sd: Select the effect size standard deviation.
249 |
250 | 10. You should also change the comments in the code below to explain your choice of priors.
251 |
252 | ```{r}
253 |
254 | # CHANGE THIS COMMENT EXPLAINING YOUR CHOICE OF PRIORS (10)
255 | # In our example dataset, y-axis scores can be in the range [1, 7].
256 | # To choose our priors, we use the results from a previous study
257 | # where participants completed an identical task (choosing between
258 | # different chocolate bars). For our overall prior mean, we pool the mean
259 | # satisfaction scores from all conditions in the previous study to get
260 | # an overall mean of 5.86. We set a_sd so that 5.86 +/- 2 sds encompasses
261 | # the 95% confidence intervals from the previous study results.
262 |
263 | # a_prior = 5.86 # CHANGE ME 6
264 | # a_sd = 0.6 # CHANGE ME 7
265 |
266 | # CHANGE THIS COMMENT EXPLAINING YOUR CHOICE OF PRIORS (10)
267 | # In this example, we do not have a strong hypothesis about the effect
268 | # of choice set size on satisfaction, so we set the mean of the effect size
269 | # parameters to be 0. In the absence of other information, we set the sd
270 | # to be the same as for the control condition.
271 |
272 | # b1_prior = 0 # CHANGE ME 8
273 | # b1_sd = a_sd # CHANGE ME 9
274 |
275 | ```
276 |
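Whichever of the two chunks above you used, a quick numeric sanity check (a sketch that complements the visual checks in the next section) is to print the approximate 95% ranges implied by your priors and ask whether they look plausible for your measurement scale:

```{r}
# A sketch: approximate 95% ranges implied by the normal priors set above
a_prior + c(-2, 2) * a_sd   # plausible range for the control condition mean
b1_prior + c(-2, 2) * b1_sd # plausible range for the effect size
```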
277 |
278 | ### Checking priors with visualizations
279 | Next, you'll want to check your priors by running this code chunk. It will produce a set of five sample plots drawn from the priors you set in the previous section, so you can check to see if the values generated are reasonable.
280 |
281 | You'll also want to run the code chunk after this one, `HOPs_priors`, which presents plots of sample prior draws in an animated format called HOPs (Hypothetical Outcomes Plots). HOPs are a type of plot that visualizes uncertainty as sets of draws from a distribution, and has been demonstrated to improve multivariate probability estimates (Hullman et al. 2015) and increase sensitivity to the underlying trend in data (Kale et al. 2018) over static representations of uncertainty like error bars.
282 |
283 | #### Static visualization of priors
284 |
285 | **What to change**
286 |
287 | Nothing! Just run this code to check your priors, adjusting prior values above as needed until you find reasonable prior values. Note that you may get a couple of very implausible or even impossible values because our assumption of normally distributed priors assigns a small probability to even very extreme values. If you are concerned by the outcome, you can try rerunning it a few more times to make sure that any implausible values you see don't come up very often.
288 |
289 | **Troubleshooting**
290 |
291 | * In rare cases, you may get a warning that the Markov chains have failed to converge. Chains that fail to converge are a sign that your model is not a good fit to the data. If you get this warning, you should adjust your priors. Your prior distribution may be too narrow, and/or your prior mean may be very far from the data.
292 |
293 | * If you get any other errors, first double-check the values you have changed in the code chunks above (i.e. `mydata`, `mydata$x`, `mydata$y`, and prior values). Problems with these values can cause confusing errors downstream.
294 |
295 | ```{r check_priors, results="hide"}
296 |
297 | # generate the prior distribution
298 | m_prior = stan_glm(y ~ x, data = mydata,
299 | prior_intercept = normal(a_prior, a_sd, autoscale = FALSE),
300 | prior = normal(b1_prior, b1_sd, autoscale = FALSE),
301 | prior_PD = TRUE
302 | )
303 |
304 | # Create the dataframe with fitted draws
305 | prior_draws = mydata %>% #pipe mydata to datagrid()
306 | data_grid(x) %>% #create a fit grid with each level in x, and pipe it to add_fitted_draws()
307 | add_fitted_draws(m_prior, n = 5, seed = 12345) #add n fitted draws from the model to the fit grid
308 | # the seed argument is for reproducibility: it ensures the pseudo-random
309 | # number generator used to pick draws has the same seed on every run,
310 | # so that someone else can re-run this code and verify their output matches
311 |
312 | # Plot the five sample draws
313 | # this function is defined in 'plotting_functions.R', if you wish to customize the aesthetics.
314 | static_prior_plot_2(prior_draws)
315 | ```
316 |
317 | #### Animated HOPs visualization of priors
318 |
319 | The five static draws above give us some idea of what the prior distribution might look like. Even better, we can animate this graph using HOPs, which are better for visualizing uncertainty and identifying underlying trends. HOPs visualizes the same information as the static plot generated above. However, with HOPs we can visualize more draws: with the static plot, we run out of room after only about five draws!
320 |
321 | In this code chunk, we add more draws to the `prior_draws` dataframe, so we have a total of 50 draws to visualize, and then create the animated plot. Each frame of the animation shows a different draw from the prior, starting with the same five draws as the static image above.
322 |
323 | **What to change:** Nothing! Just run the code to check your priors.
324 |
325 | ```{r HOPs_priors}
326 | # Animation parameters
327 | n_draws = 50 # the number of draws to visualize in the HOPs (more draws == longer rendering time)
328 | frames_per_second = 2.5 # the speed of the HOPs
329 | # 2.5 frames per second (400ms) is the recommended speed for the HOPs visualization.
330 | # Faster speeds (100ms) have been demonstrated to not work as well.
331 | # See Kale et al. VIS 2018 for more info.
332 |
333 | # Add more prior draws to the data frame for the visualization
334 | more_prior_draws = prior_draws %>%
335 | rbind(
336 | mydata %>%
337 | data_grid(x) %>%
338 | add_fitted_draws(m_prior, n = n_draws - 5, seed = 12345))
339 |
340 | # Animate the prior draws with HOPs
341 | # this function is defined in 'plotting_functions.R', if you wish to customize the aesthetics.
342 | prior_HOPs = animate(HOPs_plot_2(more_prior_draws), nframes = n_draws * 2, fps = frames_per_second)
343 | prior_HOPs
344 | ```
345 |
346 | In most cases, your prior HOPs will show a lot of uncertainty: the lines will jump around to a lot of different possible values. At the end of the template, you'll see how this uncertainty is affected when study data is added to the estimates.
347 |
348 | Even when you see a lot of uncertainty in the graph, the individual HOPs frames should mostly show plausible values. You will see some implausible values (usually represented as empty graphs), but if you see many implausible values, it may be a sign that you should adjust your priors in the "Set priors" section.
349 |
350 |
351 | ### Run the model
352 | **What to change:** Nothing! Just run the model.
353 |
354 | **Troubleshooting:** If this code produces errors, check the troubleshooting section under the "Check priors" heading above for a few troubleshooting options.
355 |
356 | ```{r results = "hide", message = FALSE, warning = FALSE}
357 |
358 | m = stan_glm(y ~ x, data = mydata,
359 | prior_intercept = normal(a_prior, a_sd, autoscale = FALSE),
360 | prior = normal(b1_prior, b1_sd, autoscale = FALSE)
361 | )
362 |
363 | ```
364 |
365 |
366 | ### Model summary
367 | Here is a summary of the model fit.
368 |
369 | The summary reports diagnostic values that can help you evaluate whether your model is a good fit for the data. For this template, we can keep diagnostics simple: check that your `Rhat` values are very close to 1.0. Larger values mean that your model is not a good fit for the data. This is usually only a problem if the `Rhat` values are greater than 1.1, which is a warning sign that the Markov chains have failed to converge. If this happens, Stan will warn you about the failure, and you should adjust your priors.
370 |
371 | ```{r}
372 | summary(m, digits=3)
373 | ```
374 |
375 |
376 | ## Visualizing results
377 | #### Static visualization
378 | To plot the results, we again create a fit grid using `data_grid()`, just as we did when we created the HOPs for the prior. Given this fit grid, we can then create any number of visualizations of the results. One way we might want to visualize the results is a static graph with error bars that represent a 95% credible interval. For each x position in the fit grid, we can get the posterior mean estimates and 95% credible intervals from the model:
379 |
380 | ```{r static_graph}
381 |
382 | # Create the dataframe with fitted draws
383 | fit = mydata %>%#pipe mydata to datagrid()
384 | data_grid(x) %>% #create a fit grid with each level in x, and pipe it to add_fitted_draws()
385 | add_fitted_draws(m) %>% #add n fitted draws from the model to the fit grid
386 | mean_qi(.width = .95) #add 95% credible intervals
387 |
388 | # Plot the posterior draws
389 | # this function is defined in 'plotting_functions.R', if you wish to customize the aesthetics.
390 | static_post_plot_2(fit)
391 | ```
392 |
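If you also want the numbers behind this plot (the posterior mean and 95% credible interval for each level of `x`), you can inspect the `fit` dataframe directly. A minimal sketch, using the column names produced by `add_fitted_draws()` and `mean_qi()`:

```{r}
# A sketch: posterior means (.value) and 95% credible interval bounds
# (.lower, .upper) for each level of x
fit %>% select(x, .value, .lower, .upper)
```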
393 |
394 | #### Animated HOPs visualization
395 | To get a better visualization of the uncertainty remaining in the posterior results, we can use animated HOPs for this graph as well. The code to generate the posterior plots is identical to the HOPs code for the priors, except we replace `m_prior` with `m`:
396 |
397 | ```{r}
398 |
399 | p = mydata %>% #pipe mydata to datagrid()
400 | data_grid(x) %>% #create a fit grid with each level in x, and pipe it to add_fitted_draws()
401 | add_fitted_draws(m, n = n_draws, seed = 12345) #add n fitted draws from the model to the fit grid
402 |
403 | #animate the data from p, using the graph aesthetics set in the graph aesthetics code chunk
404 | # this function is defined in 'plotting_functions.R', if you wish to customize the aesthetics.
405 | post_HOPs = animate(HOPs_plot_2(p), nframes = n_draws * 2, fps = frames_per_second)
406 | post_HOPs
407 |
408 | ```
409 |
410 | ### Comparing the prior and posterior
411 | If we look at our two HOPs plots together - one of the prior distribution, and one of the posterior - we can see how adding information to the model (i.e. the study data) adds more certainty to our estimates, and produces a posterior graph that is more "settled" than the prior graph.
412 |
413 | **Prior draws**
414 | ```{r echo=F}
415 | prior_HOPs
416 | ```
417 |
418 | **Posterior draws**
419 | ```{r echo=F}
420 | post_HOPs
421 | ```
422 |
423 | ## Finishing up
424 |
425 | **Congratulations!** You made it through your first Bayesian analysis. We hope our templates helped demystify the process.
426 |
427 | If you're interested in learning more about Bayesian statistics, we suggest the following textbooks:
428 |
429 | - Statistical Rethinking, by Richard McElreath. (Website: https://xcelab.net/rm/statistical-rethinking/, including links to YouTube lectures.)
430 | - Doing Bayesian Data Analysis, by John K. Kruschke. (Website: https://sites.google.com/site/doingbayesiandataanalysis/, including R code templates.)
431 |
432 |
433 | The citation for the paper that reports the process of developing and user-testing these templates is below:
434 |
435 | Chanda Phelan, Jessica Hullman, Matthew Kay, and Paul Resnick. 2019. Some Prior(s) Experience Necessary: Templates for Getting Started with Bayesian Analysis. In CHI Conference on Human Factors in Computing Systems Proceedings (CHI 2019), May 4–9, 2019, Glasgow, Scotland UK. ACM, New York, NY, USA, 12 pages. https://doi.org/10.1145/3290605.3300709
436 |
--------------------------------------------------------------------------------
/2var-categorical-bar-bayesian_template.Rmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: Bayesian analysis template
3 | author: Phelan, C., Hullman, J., Kay, M. & Resnick, P.
4 | output:
5 | html_document:
6 | theme: flatly
7 | highlight: pygments
8 | ---
9 |
10 | *Template 4:*
11 |
12 | 
13 |
14 | **Interaction of two categorical independent variables (bar chart)**
15 |
16 |
17 | ##Introduction
18 | Welcome! This template will guide you through a Bayesian analysis in R, even if you have never done Bayesian analysis before. There are a set of templates, each for a different type of analysis. This template is for data with **two interacting categorical independent variables** and will produce a **bar chart**. If your analysis includes a **two-way ANOVA**, this might be the right template for you.
19 |
20 | This template assumes you have basic familiarity with R. Once complete, this template will produce a summary of the analysis, complete with parameter estimates and credible intervals, and two animated HOPs (see Hullman, Resnick, Adar 2015 DOI: 10.1371/journal.pone.0142444 and Kale, Nguyen, Kay, and Hullman VIS 2018 for more information) for both your prior and posterior estimates.
21 |
22 | This Bayesian analysis focuses on producing results in a form that is easily interpretable, even to nonexperts. The credible intervals produced by Bayesian analysis are the analogue of confidence intervals in traditional null hypothesis significance testing (NHST). A weakness of NHST confidence intervals is that they are easily misinterpreted. Many people naturally interpret an NHST 95% confidence interval to mean that there is a 95% chance that the true parameter value lies somewhere in that interval; in fact, it means that if the experiment were repeated 100 times, about 95 of the resulting confidence intervals would include the true parameter value. The Bayesian credible interval sidesteps this complication by providing the intuitive meaning: a 95% chance that the true parameter value lies somewhere in that interval. To further support intuitive interpretations of your results, this template also produces animated HOPs, a type of plot that is more effective than visualizations such as error bars in helping people make accurate judgments about probability distributions.
23 |
24 | This set of templates supports a few types of statistical analysis. (In future work, this list of supported statistical analyses will be expanded.) For clarity, each type has been broken out into a separate template, so be sure to select the right template before you start! A productive way to choose which template to use is to think about what type of chart you would like to produce to summarize your data. Currently, the templates support the following:
25 |
26 | *One independent variable:*
27 |
28 | 1. Categorical; bar graph (e.g. t-tests, one-way ANOVA)
29 |
30 | 2. Ordinal; line graph (e.g. t-tests, one-way ANOVA)
31 |
32 | 3. Continuous; line graph (e.g. linear regression)
33 |
34 | *Two interacting independent variables:*
35 |
36 | 4. **Two categorical; bar graph (e.g. two-way ANOVA)**
37 |
38 | 5. One categorical, one ordinal; line graph (e.g. two-way ANOVA)
39 |
40 | 6. One categorical, one continuous; line graph (e.g. linear regression with multiple lines)
41 |
42 | Note that this template fits your data to a model that assumes normally distributed error terms. (This is the same assumption underlying t-tests, ANOVA, etc.) This template requires you to have already run diagnostics to determine that your data is consistent with this assumption; if you have not, the results may not be valid.
43 |
44 | Once you have selected your template, to complete the analysis, please follow along with this template. For each code chunk, you may need to make changes to customize the code for your own analysis. In those places, the code chunk will be preceded by a list of things you need to change (with the heading "What to change"), and each line that needs to be customized will also include the comment `#CHANGE ME` within the code chunk itself. You can run each code chunk independently during debugging; when you're finished, you can knit the document to produce the complete report.
45 |
46 | Good luck!
47 |
48 | ###Tips before you start
49 |
50 | 1. Make sure you have picked the right template! (See above.)
51 |
52 | 2. Use the pre-knitted HTML version of this template as a reference as you work (we've included all the HTML files in the folder `html_outputs`). The formatting makes the template easier to follow. You can also knit this document as you work once you have completed setup.
53 |
54 | 3. Make sure you are using the most recent version of the templates. Updates can be found at https://github.com/cdphelan/bayesian-template.
55 |
56 | ###Sample dataset
57 | This template comes prefilled with an example dataset from Moser et al. (DOI: 10.1145/3025453.3025778), which examines choice overload in the context of e-commerce. The study examined the relationship between choice satisfaction (measured on a 7-point Likert scale), the number of product choices presented on a webpage, and whether the participant is a decision "maximizer" (a person who examines all options and tries to choose the best) or a "satisficer" (a person who selects the first option that is satisfactory). In this template, we analyze the relationship between choice set size, which we treat as a categorical variable with possible values [12,24,40,50,60,72]; type of decision-making (maximizer or satisficer), a two-level categorical variable; and choice satisfaction, which we treat as a continuous variable with values that can fall in the range [1,7].
58 |
59 |
60 | ##Set up
61 | ###Requirements
62 | To run this template, we assume that you are using RStudio, and you have the most recent version of R installed. (This template was built with R version 3.5.1.)
63 |
64 | This template works best if you first open the file `bayesian-template.Rproj` from the code repository as a project in RStudio to get started, and then open the individual `.Rmd` template files after this.
65 |
66 | ###Libraries
67 | **Installation:**
68 | If this is your first time using the template, you may need to install libraries.
69 |
70 | 1. **If you are using Windows,** first you will need to manually install RStan and Rtools. Follow the instructions [here](https://github.com/stan-dev/rstan/wiki/Installing-RStan-on-Windows) to install both.
71 |
72 | 2. On both Mac and Windows, uncomment the line with `install.packages()` to install the required packages. This only needs to be done once.
73 |
74 | **Troubleshooting:**
75 | You may have some trouble installing the packages, especially if you are on Windows. Regardless of OS, if you have any issues installing these packages, try one or more of the following troubleshooting options:
76 |
77 | 1. Restart R.
78 |
79 | 2. Make sure you are running the most recent version of R (3.5.1, as of the writing of this template).
80 |
81 | 3. Manually install RStan and Rtools, following the instructions [here](https://github.com/stan-dev/rstan/wiki/RStan-Getting-Started).
82 |
83 | 4. If you have tried the above and you are still getting error messages like `there is no package called [X]`, try installing the missing package(s) manually using the RStudio interface under Tools > Install Packages...
84 |
85 | ```{r libraries, message=FALSE, warning=FALSE}
86 |
87 | knitr::opts_chunk$set(fig.align="center")
88 |
89 | # install.packages(c("ggplot2", "rstanarm", "tidyverse", "tidybayes", "modelr", "gganimate"))
90 |
91 | library(rstanarm) #bayesian analysis package
92 | library(tidyverse) #tidy datascience commands
93 | library(tidybayes) #tidy data + ggplot workflow
94 | library(modelr) #tidy pipelines for modeling
95 | library(ggplot2) #plotting package
96 | library(gganimate) #animate ggplots
97 |
98 | # We import all of our plotting functions from this separate R file to keep the code in
99 | # this template easier to read. You can edit this file to customize aesthetics of the plots
100 | # if desired. Just be sure to run this line again after you make edits!
101 | source('plotting_functions.R')
102 |
103 | theme_set(theme_light()) # set the ggplot theme for all plots
104 |
105 | ```
106 |
107 |
108 | ###Read in data
109 | **What to change**
110 |
111 | 1. mydata: Read in your data.
112 |
113 | ```{r data_prep}
114 |
115 | mydata = read.csv('datasets/choc_cleaned_data.csv') #CHANGE ME 1
116 |
117 | ```
118 |
119 |
120 | ## Specify model
121 | We'll fit the following model: `stan_glm(y ~ x1 * x2)`, where both $x_1$ and $x_2$ are categorical variables. This specifies a linear regression with dummy variables for each level of $x_1$ and $x_2$ beyond the first, plus interaction terms for each combination of $x_1$ and $x_2$. **This is equivalent to ANOVA.** So for example, for a regression where $x_1$ has three levels and $x_2$ has two levels, each $y_i$ is drawn from a normal distribution with mean equal to $a$ plus the applicable dummy and interaction terms, and standard deviation equal to `sigma` ($\sigma$):
122 |
123 |
124 | $$
125 | \begin{aligned}
126 | y_i \sim Normal(a + b_{x1a}dummy_{x1a} + b_{x1b}dummy_{x1b} + \\
127 | b_{x2}dummy_{x2} + \\
128 | b_{x1a:x2}dummy_{x1a}dummy_{x2} + \\
129 | b_{x1b:x2}dummy_{x1b}dummy_{x2},\ \sigma)
130 | \end{aligned}
131 | $$
132 |
133 | Choose your independent and dependent variables. These are the variables that will correspond to the x and y axis on the final plots.
134 |
135 | **What to change**
136 |
137 | 2. mydata\$x1: Select which variables will appear on the x-axis of your plots.
138 |
139 | 3. mydata\$x2: Select the second independent variable. Each level of this variable will correspond to a different bar color in the output plot.
140 |
141 | 4. mydata\$y: Select which variables will appear on the y-axis of your plots.
142 |
143 | 5. x_lab: Label your plots' x-axes.
144 |
145 | 6. y_lab: Label your plots' y-axes.
146 |
147 | ```{r specify_model}
148 |
149 | #select your independent and dependent variables
150 | mydata$x1 = as.factor(mydata$num_products_displayed) #CHANGE ME 2
151 | mydata$x2 = mydata$sat_max #CHANGE ME 3
152 | mydata$y = mydata$satis_Q1 #CHANGE ME 4
153 |
154 | # label the axes on the plots
155 | x_lab = "Choices" #CHANGE ME 5
156 | y_lab = "Satisfaction" #CHANGE ME 6
157 |
158 | ```
159 |
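To see the dummy and interaction coding described above, one option (a sketch, not required for the analysis) is to build the design matrix directly. Columns whose names contain a `:` are the interaction dummies; the first level of each factor is absorbed into the intercept:

```{r}
# A sketch: the design matrix for the model y ~ x1*x2
head(model.matrix(~ x1 * x2, data = mydata))
```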
160 |
161 | ### Set priors
162 | In this section, you will set priors for your model. Setting priors thoughtfully is important to any Bayesian analysis, especially if you have a small sample of data that you will use for fitting for your model. The priors express your best prior belief, *before seeing any data*, of reasonable values for the model parameters.
163 |
164 | Ideally, you will have previous literature from which to draw these prior beliefs. If no previous studies exist, you can instead assign "weakly informative priors" that only minimally restrict the model, excluding only values that are implausible or impossible. We have provided examples of how to set both weak and strong priors below.
165 |
166 | To check the plausibility of your priors, use the code section after this one to generate a graph of five sample draws from your priors to check if the values generated are reasonable.
167 |
168 | Our model has the following parameters:
169 |
170 | a. the overall mean y-value across all levels of the categorical variables x1 and x2
171 |
172 | b. the mean y-value for each of the individual levels
173 |
174 | c. the standard deviation of the normally distributed error term
175 |
176 | To simplify things, we limit the number of different prior beliefs you can have. Think of the first level of the categorical variable as specifying the control condition of an experiment, and all of the other levels as treatment conditions in the experiment. We let you specify a prior belief about the plausible values of the mean in the control condition (a), and then we let you set a prior belief about the plausible effect size (b). You have to specify the same plausible effect sizes for all conditions, unless you dig deeper into our code.
177 |
178 | To simplify things further, we only let you specify beliefs about these parameters in the form of a normal distribution. Thus, you will specify what you think is the most likely value for the parameter (the mean), and a standard deviation. You will be expressing a belief that you are 95% certain (before looking at any data) that the true value of the parameter is within two standard deviations of the mean.
179 |
180 | Finally, our modeling system, `stan_glm()`, will automatically set priors for the last parameter, the standard deviation of the normally distributed error term for the model overall (c).
181 |
182 | To explore more about priors, you can experiment with different values for these parameters and use the following section, *Checking priors with visualizations*, to see how different parameter values change the prior distribution.
183 |
184 | Want more examples? Check your understanding of how to set priors in this [quizlet](https://cdphelan.shinyapps.io/check_understanding_priors/), which includes several more examples of how to set both strong and weak priors.
185 |
186 | **What to change**
187 |
188 | **If you are using weakly informative priors (i.e. priors not informed by previous literature):**
189 |
190 | *Remember: **do not** use any of your data from the current study to inform prior values.*
191 |
192 | 7. a_prior: Select the control condition mean.
193 |
194 | 8. a_prior_max: Select the maximum plausible value of the control condition data. (We will use this to calculate the sd of `a`.)
195 |
196 | 9. b1_prior: Select the effect size mean.
197 |
198 | 10. b1_sd: Select the effect size standard deviation.
199 |
200 | 11. You should also change the comments in the code below to explain your choice of priors.
201 |
202 | **If you are using strong priors (i.e. priors from previous literature):**
203 |
204 | Skip this code chunk and set your priors in the next code chunk. For clarity, comment out everything in this code chunk.
205 |
206 | ```{r}
207 |
208 | # CHANGE THIS COMMENT EXPLAINING YOUR CHOICE OF PRIORS (11)
209 | # In our example dataset, y-axis scores can be in the range [1, 7].
210 | # In the absence of other information, we set the parameter mean as 4
211 | # (the mean of the range [1,7]) and the maximum possible value as 7.
212 | # From exploratory analysis, we know the mean score and sd for y in our
213 | # dataset but we *DO NOT* use this information because priors *CANNOT*
214 | # include any information from the current study.
215 |
216 | a_prior = 4 # CHANGE ME 7
217 | a_prior_max = 7 # CHANGE ME 8
218 |
219 | # With a normal distribution, we can't completely rule out
220 | # impossible values, but we choose an sd that assigns less than
221 | # 5% probability to those impossible values. Remember that in a normal
222 | # distribution, 95% of the data lies within 2 sds of the mean. Therefore,
223 | # we calculate the value of 1 sd by finding the maximum amount our data
224 | # can vary from the mean (a_prior_max - a_prior) and divide that in half.
225 |
226 | a_sd = (a_prior_max - a_prior) / 2 # do not change
227 |
228 | # CHANGE THIS COMMENT EXPLAINING YOUR CHOICE OF PRIORS (11)
229 | # In our example dataset, we do not have a strong hypothesis that the treatment
230 | # conditions will be higher or lower than the control, so we set the mean of
231 | # the effect size parameters to be 0. In the absence of other information, we
232 | # set the sd to be the same as for the control condition.
233 |
234 | b1_prior = 0 # CHANGE ME 9
235 | b1_sd = a_sd # CHANGE ME 10
236 |
237 | ```
238 |
239 |
240 | **What to change**
241 |
242 | **If you are using weakly informative priors:**
243 |
244 | Do not use this code chunk; use the code chunk above to set your priors instead. Make sure everything in this code chunk is commented out so that your priors are not overwritten.
245 |
246 | **If you are using strong priors (i.e. priors from previous literature):**
247 |
248 | *Remember: **do not** use any of your data from the current study to set prior values.*
249 |
250 | First, make sure to uncomment all four variables set in this code chunk.
251 |
252 | 7. a_prior: Select the control condition mean.
253 |
254 | 8. a_sd: Select the control condition standard deviation.
255 |
256 | 9. b1_prior: Select the effect size mean.
257 |
258 | 10. b1_sd: Select the effect size standard deviation.
259 |
260 | 11. You should also change the comments in the code below to explain your choice of priors.
261 |
262 | ```{r}
263 |
264 | # CHANGE THIS COMMENT EXPLAINING YOUR CHOICE OF PRIORS (11)
265 | # In our example dataset, y-axis scores can be in the range [1, 7].
266 | # To choose our priors, we use the results from a previous study
267 | # where participants completed an identical task (choosing between
268 | # different chocolate bars). For our overall prior mean, we pool the mean
269 | # satisfaction scores from all conditions in the previous study to get
270 | # an overall mean of 5.86. We set a_sd so that 5.86 +/- 2 sds encompasses
271 | # the 95% confidence intervals from the previous study results.
272 |
273 | # a_prior = 5.86 # CHANGE ME 7
274 | # a_sd = 0.6 # CHANGE ME 8
275 |
276 | # CHANGE THIS COMMENT EXPLAINING YOUR CHOICE OF PRIORS (11)
277 | # In our example dataset, we do not have guidance from previous literature
278 | # to set an exact effect size, but we do know that satisficers (the "treatment"
279 | # condition) are likely to have higher mean satisfaction than the maximizers
280 | # (the "control" condition), so we set an effect size parameter mean that
281 | # results in a 1 point increase in satisfaction for satisficers. To reflect
282 | # the uncertainty in this effect size, we select a broad sd so that there is
283 | # a ~20% chance that the effect size will be negative.
284 |
285 | # b1_prior = 1 # CHANGE ME 9
286 | # b1_sd = 1 # CHANGE ME 10
287 |
288 | ```
289 |
290 |
291 | ### Checking priors with visualizations
292 | Next, you'll want to check your priors by running this code chunk. It will produce a set of five sample plots drawn from the priors you set in the previous section, so you can check to see if the values generated are reasonable.
293 |
294 | **What to change**
295 |
296 | Nothing! Just run this code to check your priors, adjusting prior values above as needed until you find reasonable prior values. Note that you may get a couple of very implausible or even impossible values because our assumption of normally distributed priors assigns a small probability to even very extreme values. If you are concerned by the outcome, you can try rerunning it a few more times to make sure that any implausible values you see don't come up very often.
297 |
298 | **Troubleshooting**
299 |
300 | * In rare cases, you may get a warning that the Markov chains have failed to converge. Chains that fail to converge are a sign that your model is not a good fit to the data. If you get this warning, you should adjust your priors. Your prior distribution may be too narrow, and/or your prior mean may be very far from the data.
301 |
302 | * If you get any other errors, first double-check the values you have changed in the code chunks above (i.e. `mydata`, `mydata$x1`, `mydata$x2`, `mydata$y`, and prior values). Problems with these values can cause confusing errors downstream.
303 |
304 | ```{r check_priors, results="hide"}
305 |
306 | # generate the prior distribution
307 | m_prior = stan_glm(y ~ x1*x2, data = mydata,
308 | prior_intercept = normal(a_prior, a_sd, autoscale = FALSE),
309 | prior = normal(b1_prior, b1_sd, autoscale = FALSE),
310 | prior_PD = TRUE
311 | )
312 |
313 | # Create the dataframe with fitted draws
314 | prior_draws = mydata %>% #pipe mydata to datagrid()
315 | data_grid(x1, x2) %>% #create a fit grid with each level in x, and pipe it to add_fitted_draws()
316 | add_fitted_draws(m_prior, n = 5, seed = 12345) #add n fitted draws from the model to the fit grid
317 | # the seed argument is for reproducibility: it ensures the pseudo-random
318 | # number generator used to pick draws has the same seed on every run,
319 | # so that someone else can re-run this code and verify their output matches
320 |
321 | # Plot the five sample draws
322 | # this function is defined in 'plotting_functions.R', if you wish to customize the aesthetics.
323 | static_prior_plot_4(prior_draws)
324 | ```
325 |
326 | #### Animated visualization of priors
327 |
328 | The five static draws above give us some idea of what the prior distribution might look like. Even better, we can animate this graph using HOPs, which are better for visualizing uncertainty and identifying underlying trends. HOPs visualizes the same information as the static plot generated above. However, with HOPs we can visualize more draws: with the static plot, we run out of room after only about five draws!
329 |
330 | In this code chunk, we add more draws to the `prior_draws` dataframe, so we have a total of 50 draws to visualize, and then create the animated plot. Each frame of the animation shows a different draw from the prior, starting with the same five draws as the static image above.
331 |
332 | **What to change:** Nothing! Just run the code to check your priors.
333 |
334 | ```{r HOPs_priors}
335 | # Animation parameters
336 | n_draws = 50 # the number of draws to visualize in the HOPs
337 | frames_per_second = 2.5 # the speed of the HOPs
338 | # 2.5 frames per second (400ms) is the recommended speed for the HOPs visualization.
339 | # Faster speeds (100ms) have been demonstrated to not work as well.
340 | # See Kale et al. VIS 2018 for more info.
341 |
342 | # Add more prior draws to the data frame for the visualization
343 | more_prior_draws = prior_draws %>%
344 | rbind(
345 | mydata %>%
346 | data_grid(x1,x2) %>%
347 | add_fitted_draws(m_prior, n = n_draws - 5, seed = 12345))
348 |
349 | # Animate the prior draws with HOPs
350 | # this function is defined in 'plotting_functions.R', if you wish to customize the aesthetics.
351 | prior_HOPs = animate(HOPs_plot_4(more_prior_draws), nframes = n_draws * 2, fps = frames_per_second)
352 | prior_HOPs
353 | ```
354 |
355 | In most cases, your prior HOPs will show a lot of uncertainty: the bars will jump around to a lot of different possible values. At the end of the template, you'll see how this uncertainty is affected when study data is added to the estimates.
356 |
357 | Even when you see a lot of uncertainty in the graph, the individual HOPs frames should mostly show plausible values. You will see some implausible values (usually represented as empty graphs, or bars that reach/exceed the plot's maximum y-value), but if you see many implausible values, it may be a sign that you should adjust your priors in the "Set priors" section.
358 |
359 |
360 | ### Run the model
361 | There's nothing you have to change here. Just run the model.
362 |
363 | **Troubleshooting:** If this code produces errors, check the troubleshooting section under the "Check priors" heading above for a few troubleshooting options.
364 |
365 | ```{r results = "hide", message = FALSE, warning = FALSE}
366 |
367 | m = stan_glm(y ~ x1*x2, data = mydata,
368 | prior_intercept = normal(a_prior, a_sd, autoscale = FALSE),
369 | prior = normal(b1_prior, b1_sd, autoscale = FALSE)
370 | )
371 |
372 | ```
373 |
374 |
375 | ## Model summary
376 | Here is a summary of the model fit.
377 |
378 | The summary reports diagnostic values that can help you evaluate whether your model is a good fit for the data. For this template, we can keep diagnostics simple: check that your `Rhat` values are very close to 1.0. Larger values mean that your model is not a good fit for the data. This is usually only a problem if the `Rhat` values are greater than 1.1, which is a warning sign that the Markov chains have failed to converge. If this happens, Stan will warn you about the failure, and you should adjust your priors.
379 |
380 | ```{r}
381 | summary(m, digits=3)
382 | ```
383 |
384 |
385 | ## Visualizing results
386 | To plot the results, we again create a fit grid using `data_grid()`, just as we did when we created the HOPs for the prior. Given this fit grid, we can then create any number of visualizations of the results. One way we might want to visualize the results is a static graph with error bars that represent a 95% credible interval. For each x position in the fit grid, we can get the posterior mean estimates and 95% credible intervals from the model:
387 |
388 | ```{r static_graph}
389 |
390 | # Create the dataframe with fitted draws
391 | fit = mydata %>%#pipe mydata to datagrid()
392 | data_grid(x1,x2) %>% #create a fit grid with each level in x, and pipe it to add_fitted_draws()
393 | add_fitted_draws(m) %>% #add n fitted draws from the model to the fit grid
394 | mean_qi(.width = .95) #add 95% credible intervals
395 |
396 | # Plot the posterior draws
397 | # this function is defined in 'plotting_functions.R', if you wish to customize the aesthetics.
398 | static_post_plot_4(fit)
399 | ```
400 |
401 |
402 | ### Sampling from the posterior
403 | To get a better visualization of the uncertainty remaining in the posterior results, we can use animated HOPs for this graph as well. The code to generate the posterior plots is identical to the HOPs code for the priors, except we replace `m_prior` with `m`:
404 |
405 | ```{r}
406 |
407 | p = mydata %>% #pipe mydata to datagrid()
408 | data_grid(x1, x2) %>% #create a fit grid with each level in x, and pipe it to add_fitted_draws()
409 | add_fitted_draws(m, n = n_draws, seed = 12345) #add n fitted draws from the model to the fit grid
410 |
411 | # animate the data from p, using the graph aesthetics set in the graph aesthetics code chunk
412 | # this function is defined in 'plotting_functions.R', if you wish to customize the aesthetics.
413 | post_HOPs = animate(HOPs_plot_4(p), nframes = n_draws * 2, fps = frames_per_second)
414 | post_HOPs
415 |
416 | ```
417 |
418 | ### Comparing the prior and posterior
419 | If we look at our two HOPs plots together - one of the prior distribution, and one of the posterior - we can see how adding information to the model (i.e. the study data) adds more certainty to our estimates, and produces a posterior graph that is more "settled" than the prior graph.
420 |
421 | **Prior draws**
422 | ```{r echo=F}
423 | prior_HOPs
424 | ```
425 |
426 | **Posterior draws**
427 | ```{r echo=F}
428 | post_HOPs
429 | ```
430 |
431 | ## Finishing up
432 |
433 | **Congratulations!** You made it through your first Bayesian analysis. We hope our templates helped demystify the process.
434 |
435 | If you're interested in learning more about Bayesian statistics, we suggest the following textbooks:
436 |
437 | - Statistical Rethinking, by Richard McElreath. (Website: https://xcelab.net/rm/statistical-rethinking/, including links to YouTube lectures.)
438 | - Doing Bayesian Data Analysis, by John K. Kruschke. (Website: https://sites.google.com/site/doingbayesiandataanalysis/, including R code templates.)
439 |
440 |
441 | The citation for the paper that reports the process of developing and user-testing these templates is below:
442 | Chanda Phelan, Jessica Hullman, Matthew Kay, and Paul Resnick. 2019. Some Prior(s) Experience Necessary: Templates for Getting Started with Bayesian Analysis. In CHI Conference on Human Factors in Computing Systems Proceedings (CHI 2019), May 4–9, 2019, Glasgow, Scotland UK. ACM, New York, NY, USA, 12 pages. https://doi.org/10.1145/3290605.3300709
443 |
--------------------------------------------------------------------------------
/2var-categorical_ordinal-line-bayesian_template.Rmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: Bayesian analysis template
3 | author: Phelan, C., Hullman, J., Kay, M. & Resnick, P.
4 | output:
5 | html_document:
6 | theme: flatly
7 | highlight: pygments
8 | ---
9 |
10 | *Template 5:*
11 |
12 | 
13 |
14 | **Interaction of one categorical & one ordinal independent variable (line graph)**
15 |
16 |
17 | ##Introduction
18 | Welcome! This template will guide you through a Bayesian analysis in R, even if you have never done Bayesian analysis before. There are a set of templates, each for a different type of analysis. This template is for data with **two interacting independent variables, one categorical and one ordinal** and will produce a **line graph**. If your analysis includes a **two-way ANOVA**, this might be the right template for you. In most cases, we *do not recommend* using line charts for this type of analysis; a bar chart is usually the better option.
19 |
20 | This template assumes you have basic familiarity with R. Once complete, this template will produce a summary of the analysis, complete with parameter estimates and credible intervals, and two animated HOPs (see Hullman, Resnick, Adar 2015 DOI: 10.1371/journal.pone.0142444 and Kale, Nguyen, Kay, and Hullman VIS 2018 for more information) for both your prior and posterior estimates.
21 |
22 | This Bayesian analysis focuses on producing results in a form that is easily interpretable, even to nonexperts. The credible intervals produced by Bayesian analysis are the analogue of confidence intervals in traditional null hypothesis significance testing (NHST). A weakness of NHST confidence intervals is that they are easily misinterpreted. Many people naturally interpret an NHST 95% confidence interval to mean that there is a 95% chance that the true parameter value lies somewhere in that interval; in fact, it means that if the experiment were repeated 100 times, 95 of the resulting confidence intervals would include the true parameter value. The Bayesian credible interval sidesteps this complication by providing the intuitive meaning: a 95% chance that the true parameter value lies somewhere in that interval. To further support intuitive interpretations of your results, this template also produces animated HOPs, a type of plot that is more effective than visualizations such as error bars in helping people make accurate judgments about probability distributions.
23 |
24 | This set of templates supports a few types of statistical analysis. (In future work, this list of supported statistical analyses will be expanded.) For clarity, each type has been broken out into a separate template, so be sure to select the right template before you start! A productive way to choose which template to use is to think about what type of chart you would like to produce to summarize your data. Currently, the templates support the following:
25 |
26 | *One independent variable:*
27 |
28 | 1. Categorical; bar graph (e.g. t-tests, one-way ANOVA)
29 |
30 | 2. Ordinal; line graph (e.g. t-tests, one-way ANOVA)
31 |
32 | 3. Continuous; line graph (e.g. linear regression)
33 |
34 | *Two interacting independent variables:*
35 |
36 | 4. Two categorical; bar graph (e.g. two-way ANOVA)
37 |
38 | 5. **One categorical, one ordinal; line graph (e.g. two-way ANOVA)**
39 |
40 | 6. One categorical, one continuous; line graph (e.g. linear regression with multiple lines)
41 |
42 | Note that this template fits your data to a model that assumes normally distributed error terms. (This is the same assumption underlying t-tests, ANOVA, etc.) This template requires you to have already run diagnostics to determine that your data is consistent with this assumption; if you have not, the results may not be valid.
43 |
44 | Once you have selected your template, please follow along with it to complete the analysis. For each code chunk, you may need to make changes to customize the code for your own analysis. In those places, the code chunk will be preceded by a list of things you need to change (with the heading "What to change"), and each line that needs to be customized will also include the comment `#CHANGE ME` within the code chunk itself. You can run each code chunk independently during debugging; when you're finished, you can knit the document to produce the complete document.
45 |
46 | Good luck!
47 |
48 | ###Tips before you start
49 |
50 | 1. Make sure you have picked the right template! (See above.)
51 |
52 | 2. Use the pre-knitted HTML version of this template as a reference as you work (we've included all the HTML files in the folder `html_outputs`). The formatting makes the template easier to follow. You can also knit this document as you work once you have completed setup.
53 |
54 | 3. Make sure you are using the most recent version of the templates. Updates can be found at https://github.com/cdphelan/bayesian-template.
55 |
56 | ###Sample dataset
57 | This template comes prefilled with an example dataset from Moser et al. (DOI: 10.1145/3025453.3025778), which examines choice overload in the context of e-commerce. The study examined the relationship between choice satisfaction (measured on a 7-point Likert scale), the number of product choices presented on a webpage, and whether the participant is a decision "maximizer" (a person who examines all options and tries to choose the best) or a "satisficer" (a person who selects the first option that is satisfactory). In this template, we analyze the relationship between choice set size, which we treat as an ordinal variable in this template with possible values [12,24,40,50,60,72]; type of decision-making (maximizer or satisficer), a two-level categorical variable; and choice satisfaction, which we treat as a continuous variable with values that can fall in the range [1,7].
58 |
59 | ##Set up
60 | ###Requirements
61 | To run this template, we assume that you are using RStudio, and you have the most recent version of R installed. (This template was built with R version 3.5.1.)
62 |
63 | This template works best if you first open the file `bayesian-template.Rproj` from the code repository as a project in RStudio to get started, and then open the individual `.Rmd` template files after this.
64 |
65 | ###Libraries
66 | **Installation:**
67 | If this is your first time using the template, you may need to install libraries.
68 |
69 | 1. **If you are using Windows,** first you will need to manually install RStan and Rtools. Follow the instructions [here](https://github.com/stan-dev/rstan/wiki/Installing-RStan-on-Windows) to install both.
70 |
71 | 2. On both Mac and Windows, uncomment the line with `install.packages()` to install the required packages. This only needs to be done once.
72 |
73 | **Troubleshooting:**
74 | You may have some trouble installing the packages, especially if you are on Windows. Regardless of OS, if you have any issues installing these packages, try one or more of the following troubleshooting options:
75 |
76 | 1. Restart R.
77 |
78 | 2. Make sure you are running the most recent version of R (3.5.1, as of the writing of this template).
79 |
80 | 3. Manually install RStan and Rtools, following the instructions [here](https://github.com/stan-dev/rstan/wiki/RStan-Getting-Started).
81 |
82 | 4. If you have tried the above and you are still getting error messages like `there is no package called [X]`, try installing the missing package(s) manually using the RStudio interface under Tools > Install Packages...
83 |
84 | ```{r libraries, message=FALSE, warning=FALSE}
85 |
86 | knitr::opts_chunk$set(fig.align="center")
87 |
88 | # install.packages(c("ggplot2", "rstanarm", "tidyverse", "tidybayes", "modelr", "gganimate"))
89 |
90 | library(rstanarm) #bayesian analysis package
91 | library(tidyverse) #tidy datascience commands
92 | library(tidybayes) #tidy data + ggplot workflow
93 | library(modelr) #tidy pipelines for modeling
94 | library(ggplot2) #plotting package
95 | library(gganimate) #animate ggplots
96 |
97 | # We import all of our plotting functions from this separate R file to keep the code in
98 | # this template easier to read. You can edit this file to customize aesthetics of the plots
99 | # if desired. Just be sure to run this line again after you make edits!
100 | source('plotting_functions.R')
101 |
102 | theme_set(theme_light()) # set the ggplot theme for all plots
103 |
104 | ```
105 |
106 | ###Read in data
107 | **What to change**
108 |
109 | 1. mydata: Read in your data.
110 |
111 | ```{r data_prep}
112 |
113 | mydata = read.csv('datasets/choc_cleaned_data.csv') #CHANGE ME 1
114 |
115 | ```
116 |
117 |
118 | ## Specify model
119 | We'll fit the following model: `stan_glm(y ~ x1 * x2)`, where $x_1$ is an ordinal variable and $x_2$ is a categorical variable. This specifies a linear regression with dummy variables for each level in $x_1$ and $x_2$, plus interaction terms for each combination of $x_1$ and $x_2$. **This is equivalent to ANOVA.** So for example, for a regression where $x_1$ has three levels and $x_2$ has two levels, each $y_i$ is drawn from a normal distribution with mean equal to $a + b*dummy$ (where $b*dummy$ is the appropriate dummy term) and standard deviation equal to `sigma` ($\sigma$):
120 |
121 |
122 | $$
123 | \begin{aligned}
124 | y_i \sim Normal(a + b_{x1a}dummy_{x1a} + b_{x1b}dummy_{x1b} + \\
125 | b_{x2}dummy_{x2} + \\
126 | b_{x1a \times x2}dummy_{x1a}dummy_{x2} + \\
127 | b_{x1b \times x2}dummy_{x1b}dummy_{x2}, \\\sigma)
128 | \end{aligned}
129 | $$
130 |
131 | Choose your independent and dependent variables. These are the variables that will correspond to the x and y axis on the final plots.
132 |
133 | **What to change**
134 |
135 | 2. mydata\$x1: Select which variables will appear on the x-axis of your plots. This is your ordered variable.
136 |
137 | 3. mydata\$x2: Select the second independent variable, the categorical variable. You will have one line in the output graph for each level of this variable.
138 |
139 | 4. mydata\$y: Select which variables will appear on the y-axis of your plots.
140 |
141 | 5. x_lab: Label your plots' x-axes.
142 |
143 | 6. y_lab: Label your plots' y-axes.
144 |
145 | ```{r specify_model}
146 |
147 | #select your independent and dependent variables
148 | mydata$x1 = as.factor(mydata$num_products_displayed) #CHANGE ME 2
149 | mydata$x2 = mydata$sat_max #CHANGE ME 3
150 | mydata$y = mydata$satis_Q1 #CHANGE ME 4
151 |
152 | # label the axes on the plots
153 | x_lab = "Choices" #CHANGE ME 5
154 | y_lab = "Satisfaction" #CHANGE ME 6
155 |
156 | ```
157 |
158 |
159 | ###Set priors
160 | In this section, you will set priors for your model. Setting priors thoughtfully is important to any Bayesian analysis, especially if you have a small sample of data that you will use for fitting your model. The priors express your best prior belief, *before seeing any data*, of reasonable values for the model parameters.
161 |
162 | Ideally, you will have previous literature from which to draw these prior beliefs. If no previous studies exist, you can instead assign "weakly informative priors" that only minimally restrict the model, excluding only values that are implausible or impossible. We have provided examples of how to set both weak and strong priors below.
163 |
164 | To check the plausibility of your priors, use the code section after this one to generate a graph of five sample draws from your priors to check if the values generated are reasonable.
165 |
166 | Our model has the following parameters:
167 |
168 | a. the overall mean y-value across all levels of ordinal variable x
169 |
170 | b. the mean y-value for each of the individual levels
171 |
172 | c. the standard deviation of the normally distributed error term
173 |
174 | To simplify things, we limit the number of different prior beliefs you can have. Think of the first level of the ordinal variable as specifying the control condition of an experiment, and all of the other levels as treatment conditions in the experiment. We let you specify a prior belief about the plausible values of the mean in the control condition (a), and then we let you set a prior belief about the plausible effect size (b). You have to specify the same plausible effect sizes for all conditions, unless you dig deeper into our code.
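
(An optional aside, not a required step: in R, the "control" condition is simply the first level of the factor. The sketch below shows how you could check, and if needed change, which level is used as the reference; the level name "12" is just the smallest choice set size in the example dataset.)

```{r}
# Optional: check which level of x1 R will treat as the reference ("control") level.
levels(mydata$x1)

# If you want a different reference level, you could relevel the factor,
# e.g. (example level name from the sample dataset):
# mydata$x1 = relevel(mydata$x1, ref = "12")
```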
175 |
176 | To simplify things further, we only let you specify beliefs about these parameters in the form of a normal distribution. Thus, you will specify what you think is the most likely value for the parameter (the mean), and a standard deviation. You will be expressing a belief that you are 95% certain (before looking at any data) that the true value of the parameter is within two standard deviations of the mean.
177 |
178 | Finally, our modeling system, `stan_glm()`, will automatically set priors for the last parameter, the standard deviation of the normally distributed error term for the model overall (c).
179 |
180 | To explore more about priors, you can experiment with different values for these parameters and use the following section, *Checking priors with visualizations*, to see how different parameter values change the prior distribution.
181 |
182 | Want more examples? Check your understanding of how to set priors in this [quizlet](https://cdphelan.shinyapps.io/check_understanding_priors/), which includes several more examples of how to set both strong and weak priors.
183 |
184 | **What to change**
185 |
186 | **If you are using weakly informative priors (i.e. priors not informed by previous literature):**
187 |
188 | *Remember: **do not** use any of your data from the current study to inform prior values.*
189 |
190 | 7. a_prior: Select the control condition mean.
191 |
192 | 8. a_prior_max: Select the maximum plausible value of the control condition data. (We will use this to calculate the sd of `a`.)
193 |
194 | 9. b1_prior: Select the effect size mean.
195 |
196 | 10. b1_sd: Select the effect size standard deviation.
197 |
198 | 11. You should also change the comments in the code below to explain your choice of priors.
199 |
200 | **If you are using strong priors (i.e. priors from previous literature):**
201 |
202 | Skip this code chunk and set your priors in the next code chunk. For clarity, comment out everything in this code chunk.
203 |
204 | ```{r}
205 |
206 | # CHANGE THIS COMMENT EXPLAINING YOUR CHOICE OF PRIORS (11)
207 | # In our example dataset, y-axis scores can be in the range [1, 7].
208 | # In the absence of other information, we set the parameter mean as 4
209 | # (the mean of the range [1,7]) and the maximum possible value as 7.
210 | # From exploratory analysis, we know the mean score and sd for y in our
211 | # dataset but we *DO NOT* use this information because priors *CANNOT*
212 | # include any information from the current study.
213 |
214 | a_prior = 4 # CHANGE ME 7
215 | a_prior_max = 7 # CHANGE ME 8
216 |
217 | # With a normal distribution, we can't completely rule out
218 | # impossible values, but we choose an sd that assigns less than
219 | # 5% probability to those impossible values. Remember that in a normal
220 | # distribution, 95% of the data lies within 2 sds of the mean. Therefore,
221 | # we calculate the value of 1 sd by finding the maximum amount our data
222 | # can vary from the mean (a_prior_max - a_prior) and divide that in half.
223 |
224 | a_sd = (a_prior_max - a_prior) / 2 # do not change
225 |
226 | # CHANGE THIS COMMENT EXPLAINING YOUR CHOICE OF PRIORS (11)
227 | # In our example dataset, we do not have a strong hypothesis that the treatment
228 | # conditions will be higher or lower than the control, so we set the mean of
229 | # the effect size parameters to be 0. In the absence of other information, we
230 | # set the sd to be the same as for the control condition.
231 |
232 | b1_prior = 0 # CHANGE ME 9
233 | b1_sd = a_sd # CHANGE ME 10
234 |
235 | ```
236 |
237 |
238 | **What to change**
239 |
240 | **If you are using weakly informative priors:**
241 |
242 | Do not use this code chunk; use the code chunk above to set your priors instead. Make sure everything in this code chunk is commented out so that your priors are not overwritten.
243 |
244 | **If you are using strong priors (i.e. priors from previous literature):**
245 |
246 | *Remember: **do not** use any of your data from the current study to set prior values.*
247 |
248 | First, make sure to uncomment all four variables set in this code chunk.
249 |
250 | 7. a_prior: Select the control condition mean.
251 |
252 | 8. a_sd: Select the control condition standard deviation.
253 |
254 | 9. b1_prior: Select the effect size mean.
255 |
256 | 10. b1_sd: Select the effect size standard deviation.
257 |
258 | 11. You should also change the comments in the code below to explain your choice of priors.
259 |
260 | ```{r}
261 |
262 | # CHANGE THIS COMMENT EXPLAINING YOUR CHOICE OF PRIORS (11)
263 | # In our example dataset, y-axis scores can be in the range [1, 7].
264 | # To choose our priors, we use the results from a previous study
265 | # where participants completed an identical task (choosing between
266 | # different chocolate bars). For our overall prior mean, we pool the mean
267 | # satisfaction scores from all conditions in the previous study to get
268 | # an overall mean of 5.86. We set a_sd so that 5.86 +/- 2 sds encompasses
269 | # the 95% confidence intervals from the previous study results.
270 |
271 | # a_prior = 5.86 # CHANGE ME 7
272 | # a_sd = 0.6 # CHANGE ME 8
273 |
274 | # CHANGE THIS COMMENT EXPLAINING YOUR CHOICE OF PRIORS (11)
275 | # In our example dataset, we do not have guidance from previous literature
276 | # to set an exact effect size, but we do know that satisficers (the "treatment"
277 | # condition) are likely to have higher mean satisfaction than the maximizers
278 | # (the "control" condition), so we set an effect size parameter mean that
279 | # results in a 1 point increase in satisfaction for satisficers. To reflect
280 | # the uncertainty in this effect size, we select a broad sd so that there is
281 | # a ~20% chance that the effect size will be negative.
282 |
283 | # b1_prior = 1 # CHANGE ME 9
284 | # b1_sd = 1 # CHANGE ME 10
285 |
286 | ```
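
Whichever of the two prior-setting chunks above you used, it can help to sanity-check the priors numerically before visualizing them. This optional sketch only uses the prior variables defined above:

```{r}
# Optional sanity check: roughly 95% of a normal prior's mass lies within
# 2 sds of its mean, so these are the plausible ranges implied by your priors.
a_prior + c(-2, 2) * a_sd    # implied range for the control condition mean
b1_prior + c(-2, 2) * b1_sd  # implied range for each effect size
```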
287 |
288 |
289 | ### Checking priors with visualizations
290 | Next, you'll want to check your priors by running this code chunk. It will produce a set of five sample plots drawn from the priors you set in the previous section, so you can check to see if the values generated are reasonable.
291 |
292 | You'll also want to run the code chunk after this one, `HOPs_priors`, which presents plots of sample prior draws in an animated format called HOPs (Hypothetical Outcomes Plots). HOPs are a type of plot that visualizes uncertainty as sets of draws from a distribution, and have been demonstrated to improve multivariate probability estimates (Hullman et al. 2015) and increase sensitivity to the underlying trend in data (Kale et al. 2018) over static representations of uncertainty like error bars.
293 |
294 | #### Static visualization of priors
295 |
296 | **What to change**
297 |
298 | Nothing! Just run this code to check your priors, adjusting prior values above as needed until you find reasonable prior values. Note that you may get a couple of very implausible or even impossible values because our assumption of normally distributed priors assigns a small probability to even very extreme values. If you are concerned by the outcome, you can try rerunning it a few more times to make sure that any implausible values you see don't come up very often.
299 |
300 | **Troubleshooting**
301 |
302 | * In rare cases, you may get a warning that the Markov chains have failed to converge. Chains that fail to converge are a sign that your model is not a good fit to the data. If you get this warning, you should adjust your priors. Your prior distribution may be too narrow, and/or your prior mean is very far from the data.
303 |
304 | * If you get any other errors, first double-check the values you have changed in the code chunks above (i.e. `mydata`, `mydata$x1`, `mydata$x2`, `mydata$y`, and prior values). Problems with these values can cause confusing errors downstream.
305 |
306 | ```{r check_priors, results="hide"}
307 |
308 | # generate the prior distribution
309 | m_prior = stan_glm(y ~ x1*x2, data = mydata,
310 | prior_intercept = normal(a_prior, a_sd, autoscale = FALSE),
311 | prior = normal(b1_prior, b1_sd, autoscale = FALSE),
312 | prior_PD = TRUE
313 | )
314 |
315 | # Create the dataframe with fitted draws
316 | prior_draws = mydata %>% #pipe mydata to data_grid()
317 | data_grid(x1, x2) %>% #create a fit grid with each level in x, and pipe it to add_fitted_draws()
318 | add_fitted_draws(m_prior, n = 5, seed = 12345) #add n fitted draws from the model to the fit grid
319 | # the seed argument is for reproducibility: it ensures the pseudo-random
320 | # number generator used to pick draws has the same seed on every run,
321 | # so that someone else can re-run this code and verify their output matches
322 |
323 | # Plot the five sample draws
324 | # this function is defined in 'plotting_functions.R', if you wish to customize the aesthetics.
325 | static_prior_plot_5(prior_draws)
326 | ```
327 |
328 | #### Animated visualization of priors
329 |
330 | The five static draws above give us some idea of what the prior distribution might look like. Even better, we can animate this graph using HOPs, which are better for visualizing uncertainty and identifying underlying trends. HOPs visualize the same information as the static plot generated above. However, with HOPs we can visualize more draws: with the static plot, we run out of room after only about five draws!
331 |
332 | In this code chunk, we add more draws to the `prior_draws` dataframe, so we have a total of 50 draws to visualize, and then create the animated plot. Each frame of the animation shows a different draw from the prior, starting with the same five draws as the static image above.
333 |
334 | **What to change:** Nothing! Just run the code to check your priors.
335 |
336 | ```{r HOPs_priors}
337 | # Animation parameters
338 | n_draws = 50 # the number of draws to visualize in the HOPs
339 | frames_per_second = 2.5 # the speed of the HOPs
340 | # 2.5 frames per second (400ms) is the recommended speed for the HOPs visualization.
341 | # Faster speeds (100ms) have been demonstrated to not work as well.
342 | # See Kale et al. VIS 2018 for more info.
343 |
344 | # Add more prior draws to the data frame for the visualization
345 | more_prior_draws = prior_draws %>%
346 | rbind(
347 | mydata %>%
348 | data_grid(x1, x2) %>%
349 | add_fitted_draws(m_prior, n = n_draws - 5, seed = 12345))
350 |
351 | # Animate the prior draws with HOPs
352 | # this function is defined in 'plotting_functions.R', if you wish to customize the aesthetics.
353 | prior_HOPs = animate(HOPS_plot_5(more_prior_draws), nframes = n_draws * 2, fps = frames_per_second)
354 | prior_HOPs
355 | ```
356 |
357 | In most cases, your prior HOPs will show a lot of uncertainty: the bars will jump around to a lot of different possible values. At the end of the template, you'll see how this uncertainty is affected when study data is added to the estimates.
358 |
359 | Even when you see a lot of uncertainty in the graph, the individual HOPs frames should mostly show plausible values. You will see some implausible values (usually represented as empty graphs, or bars that reach/exceed the plot's maximum y-value), but if you see many implausible values, it may be a sign that you should adjust your priors in the "Set priors" section.
360 |
361 |
362 | ### Run the model
363 | There's nothing you have to change here. Just run the model.
364 |
365 | **Troubleshooting:** If this code produces errors, check the troubleshooting section under the "Check priors" heading above for a few troubleshooting options.
366 |
367 | ```{r results = "hide", message = FALSE, warning = FALSE}
368 |
369 | m = stan_glm(y ~ x1*x2, data = mydata,
370 | prior_intercept = normal(a_prior, a_sd, autoscale = FALSE),
371 | prior = normal(b1_prior, b1_sd, autoscale = FALSE)
372 | )
373 |
374 | ```
375 |
376 |
377 | ## Model summary
378 | Here is a summary of the model fit.
379 |
380 | The summary reports diagnostic values that can help you evaluate whether your model is a good fit for the data. For this template, we can keep diagnostics simple: check that your `Rhat` values are very close to 1.0. Larger values mean that your model is not a good fit for the data. This is usually only a problem if the `Rhat` values are greater than 1.1, which is a warning sign that the Markov chains have failed to converge. If this happens, Stan will warn you about the failure, and you should adjust your priors.
381 |
382 | ```{r}
383 | summary(m, digits=3)
384 | ```
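
If you prefer a programmatic check of the diagnostics, the sketch below pulls out just the `Rhat` values (it assumes the default `summary()` output for this model, which includes an `Rhat` column):

```{r}
# Optional: extract the Rhat values and flag any that exceed the 1.1 rule of thumb.
rhats = summary(m)[, "Rhat"]
rhats[rhats > 1.1]  # should be empty if the chains have converged
```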
385 |
386 | ## Visualizing results
387 | To plot the results, we again create a fit grid using `data_grid()`, just as we did when we created the HOPs for the prior. Given this fit grid, we can then create any number of visualizations of the results. One way we might want to visualize the results is a static graph with error bars that represent a 95% credible interval. For each x position in the fit grid, we can get the posterior mean estimates and 95% credible intervals from the model:
388 |
389 | ```{r static_graph}
390 |
391 | # Create the dataframe with fitted draws
392 | fit = mydata %>% #pipe mydata to data_grid()
393 | data_grid(x1, x2) %>% #create a fit grid with each level in x, and pipe it to add_fitted_draws()
394 | add_fitted_draws(m) %>% #add fitted draws from the model to the fit grid (all posterior draws, since n is not specified)
395 | mean_qi(.width = .95) #add 95% credible intervals
396 |
397 | # Plot the posterior draws
398 | # this function is defined in 'plotting_functions.R', if you wish to customize the aesthetics.
399 | static_post_plot_5(fit)
400 | ```
401 |
402 |
403 | #### Animated HOPs visualization
404 | To get a better visualization of the uncertainty remaining in the posterior results, we can use animated HOPs for this graph as well. The code to generate the posterior plots is identical to the HOPs code for the priors, except we replace `m_prior` with `m`:
405 |
406 | ```{r}
407 |
408 | p = mydata %>% #pipe mydata to data_grid()
409 | data_grid(x1, x2) %>% #create a fit grid with each level in x, and pipe it to add_fitted_draws()
410 | add_fitted_draws(m, n = n_draws, seed = 12345) #add n fitted draws from the model to the fit grid
411 | # the seed argument is for reproducibility: it ensures the pseudo-random
412 | # number generator used to pick draws has the same seed on every run,
413 | # so that someone else can re-run this code and verify their output matches
414 |
415 | #animate the data from p, using the graph aesthetics set in the graph aesthetics code chunk
416 | # this function is defined in 'plotting_functions.R', if you wish to customize the aesthetics.
417 | post_HOPs = animate(HOPS_plot_5(p), nframes = n_draws * 2, fps = frames_per_second)
418 | post_HOPs
419 |
420 | ```
421 |
422 | ### Comparing the prior and posterior
423 | If we look at our two HOPs plots together - one of the prior distribution, and one of the posterior - we can see how adding information to the model (i.e. the study data) adds more certainty to our estimates, and produces a posterior graph that is more "settled" than the prior graph.
424 |
425 | **Prior draws**
426 | ```{r echo=F}
427 | prior_HOPs
428 | ```
429 |
430 | **Posterior draws**
431 | ```{r echo=F}
432 | post_HOPs
433 | ```
434 |
435 | ## Finishing up
436 |
437 | **Congratulations!** You made it through your first Bayesian analysis. We hope our templates helped demystify the process.
438 |
439 | If you're interested in learning more about Bayesian statistics, we suggest the following textbooks:
440 |
441 | - Statistical Rethinking, by Richard McElreath. (Website: https://xcelab.net/rm/statistical-rethinking/, including links to YouTube lectures.)
442 | - Doing Bayesian Data Analysis, by John K. Kruschke. (Website: https://sites.google.com/site/doingbayesiandataanalysis/, including R code templates.)
443 |
444 |
445 | The citation for the paper reporting the process of developing and user-testing these templates is below:
446 | Chanda Phelan, Jessica Hullman, Matthew Kay, and Paul Resnick. 2019. Some Prior(s) Experience Necessary: Templates for Getting Started with Bayesian Analysis. In CHI Conference on Human Factors in Computing Systems Proceedings (CHI 2019), May 4–9, 2019, Glasgow, Scotland UK. ACM, New York, NY, USA, 12 pages. https://doi.org/10.1145/3290605.3300709
447 |
--------------------------------------------------------------------------------
/2var-continuous_categorical-line-bayesian_template.Rmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: Bayesian analysis template
3 | author: Phelan, C., Hullman, J., Kay, M. & Resnick, P.
4 | output:
5 | html_document:
6 | theme: flatly
7 | highlight: pygments
8 | ---
9 |
10 | *Template 6:*
11 |
12 | 
13 |
14 | **Interaction of one continuous & one categorical independent variable (line graph)**
15 |
16 |
17 | ##Introduction
18 | Welcome! This template will guide you through a Bayesian analysis in R, even if you have never done Bayesian analysis before. There are a set of templates, each for a different type of analysis. This template is for data with **one continuous and one categorical independent variable** and will produce a **line chart**. If your analysis includes a **linear regression**, this might be the right template for you.
19 |
20 | This template assumes you have basic familiarity with R. Once complete, this template will produce a summary of the analysis, complete with parameter estimates and credible intervals, and two animated HOPs (see Hullman, Resnick, Adar 2015 DOI: 10.1371/journal.pone.0142444 and Kale, Nguyen, Kay, and Hullman VIS 2018 for more information) for both your prior and posterior estimates.
21 |
22 | This Bayesian analysis focuses on producing results in a form that is easily interpretable, even to nonexperts. The credible intervals produced by Bayesian analysis are the analogue of confidence intervals in traditional null hypothesis significance testing (NHST). A weakness of NHST confidence intervals is that they are easily misinterpreted. Many people naturally interpret an NHST 95% confidence interval to mean that there is a 95% chance that the true parameter value lies somewhere in that interval; in fact, it means that if the experiment were repeated 100 times, 95 of the resulting confidence intervals would include the true parameter value. The Bayesian credible interval sidesteps this complication by providing the intuitive meaning: a 95% chance that the true parameter value lies somewhere in that interval. To further support intuitive interpretations of your results, this template also produces animated HOPs, a type of plot that is more effective than visualizations such as error bars in helping people make accurate judgments about probability distributions.
23 |
24 | This set of templates supports a few types of statistical analysis. (In future work, this list of supported statistical analyses will be expanded.) For clarity, each type has been broken out into a separate template, so be sure to select the right template before you start! A productive way to choose which template to use is to think about what type of chart you would like to produce to summarize your data. Currently, the templates support the following:
25 |
26 | *One independent variable:*
27 |
28 | 1. Categorical; bar graph (e.g. t-tests, one-way ANOVA)
29 |
30 | 2. Ordinal; line graph (e.g. t-tests, one-way ANOVA)
31 |
32 | 3. Continuous; line graph (e.g. linear regression)
33 |
34 | *Two interacting independent variables:*
35 |
36 | 4. Two categorical; bar graph (e.g. two-way ANOVA)
37 |
38 | 5. One categorical, one ordinal; line graph (e.g. two-way ANOVA)
39 |
40 | 6. **One categorical, one continuous; line graph (e.g. linear regression with multiple lines)**
41 |
42 | Note that this template fits your data to a model that assumes normally distributed error terms. (This is the same assumption underlying t-tests, ANOVA, etc.) This template requires you to have already run diagnostics to determine that your data is consistent with this assumption; if you have not, the results may not be valid.
43 |
44 | Once you have selected your template, please follow along with it to complete the analysis. For each code chunk, you may need to make changes to customize the code for your own analysis. In those places, the code chunk will be preceded by a list of things you need to change (with the heading "What to change"), and each line that needs to be customized will also include the comment `#CHANGE ME` within the code chunk itself. You can run each code chunk independently during debugging; when you're finished, you can knit the document to produce the complete document.
45 |
46 | Good luck!
47 |
48 | ###Tips before you start
49 |
50 | 1. Make sure you have picked the right template! (See above.)
51 |
52 | 2. Use the pre-knitted HTML version of this template as a reference as you work (we've included all the HTML files in the folder `html_outputs`). The formatting makes the template easier to follow. You can also knit this document as you work once you have completed setup.
53 |
54 | 3. Make sure you are using the most recent version of the templates. Updates can be found at https://github.com/cdphelan/bayesian-template.
55 |
56 | ###Sample dataset
57 | This template comes prefilled with an example dataset from Moser et al. (DOI: 10.1145/3025453.3025778), which examines choice overload in the context of e-commerce. The study examined the relationship between choice satisfaction (measured on a 7-point Likert scale), the number of product choices presented on a webpage, and whether the participant is a decision "maximizer" (a person who examines all options and tries to choose the best) or a "satisficer" (a person who selects the first option that is satisfactory). In this template, we analyze the relationship between choice set size, which we treat as a continuous variable in this template with values that can fall in the range [12,72]; type of decision-making (maximizer or satisficer), a two-level categorical variable; and choice satisfaction, which we treat as a continuous variable with values that can fall in the range [1,7].
58 |
59 |
60 | ##Set up
61 | ###Requirements
62 | To run this template, we assume that you are using RStudio, and you have the most recent version of R installed. (This template was built with R version 3.5.1.)
63 |
64 | This template works best if you first open the file `bayesian-template.Rproj` from the code repository as a project in RStudio to get started, and then open the individual `.Rmd` template files after this.
65 |
66 | ###Libraries
67 | **Installation:**
68 | If this is your first time using the template, you may need to install libraries.
69 |
70 | 1. **If you are using Windows,** first you will need to manually install RStan and Rtools. Follow the instructions [here](https://github.com/stan-dev/rstan/wiki/Installing-RStan-on-Windows) to install both.
71 |
72 | 2. On both Mac and Windows, uncomment the line with `install.packages()` to install the required packages. This only needs to be done once.
73 |
74 | **Troubleshooting:**
75 | You may have some trouble installing the packages, especially if you are on Windows. Regardless of OS, if you have any issues installing these packages, try one or more of the following troubleshooting options:
76 |
77 | 1. Restart R.
78 |
79 | 2. Make sure you are running the most recent version of R (3.5.1, as of the writing of this template).
80 |
81 | 3. Manually install RStan and Rtools, following the instructions [here](https://github.com/stan-dev/rstan/wiki/RStan-Getting-Started).
82 |
83 | 4. If you have tried the above and you are still getting error messages like `there is no package called [X]`, try installing the missing package(s) manually using the RStudio interface under Tools > Install Packages...
84 |
85 | ```{r libraries, message=FALSE, warning=FALSE}
86 |
87 | knitr::opts_chunk$set(fig.align="center")
88 |
89 | # install.packages(c("ggplot2", "rstanarm", "tidyverse", "tidybayes", "modelr", "gganimate"))
90 |
91 | library(rstanarm) #bayesian analysis package
92 | library(tidyverse) #tidy datascience commands
93 | library(tidybayes) #tidy data + ggplot workflow
94 | library(modelr) #tidy pipelines for modeling
95 | library(ggplot2) #plotting package
96 | library(gganimate) #animate ggplots
97 |
98 | # We import all of our plotting functions from this separate R file to keep the code in
99 | # this template easier to read. You can edit this file to customize aesthetics of the plots
100 | # if desired. Just be sure to run this line again after you make edits!
101 | source('plotting_functions.R')
102 |
103 | theme_set(theme_light()) # set the ggplot theme for all plots
104 |
105 | ```
106 |
107 | ###Read in data
108 | **What to change**
109 |
110 | 1. mydata: Read in your data.
111 |
112 | ```{r data_prep}
113 |
114 | mydata = read.csv('datasets/choc_cleaned_data.csv') #CHANGE ME 1
115 |
116 | ```
117 |
118 |
119 | ## Specify model
120 | We'll fit the following model: `stan_glm(y ~ x1 * x2)`, where $x_1$ is a continuous variable and $x_2$ is a categorical variable. This specifies a linear regression with a parameter for $x_1$ and dummy variables for each level in $x_2$, plus interaction terms for each combination of $x_1$ and $x_2$. So for example, for a regression where $x_2$ has two levels, each $y_i$ is drawn from a normal distribution with mean equal to the value of the specified regression equation and standard deviation equal to `sigma` ($\sigma$):
121 |
122 | $$
123 | y_i \sim Normal(a + b_{x1}x_i + b_{x2}dummy_{x2} + b_{x1 \times x2}x_i dummy_{x2}, \sigma)
124 | $$
125 |
126 | Choose your independent and dependent variables. These are the variables that will correspond to the x and y axis on the final plots.
127 |
128 | **What to change**
129 |
130 | 2. mydata\$x1: Select which variables will appear on the x-axis of your plots. This is your continuous variable.
131 |
132 | 3. mydata\$x2: Select the second independent variable, the categorical variable. You will have one line in the output graph for each level of this variable.
133 |
134 | 4. mydata\$y: Select which variables will appear on the y-axis of your plots.
135 |
136 | 5. x_lab: Label your plots' x-axes.
137 |
138 | 6. y_lab: Label your plots' y-axes.
139 |
140 | ```{r specify_model}
141 |
142 | #select your independent and dependent variables
143 | mydata$x1 = mydata$num_products_displayed #CHANGE ME 2
144 | mydata$x2 = mydata$sat_max #CHANGE ME 3
145 | mydata$y = mydata$satis_Q1 #CHANGE ME 4
146 |
147 | # label the axes on the plots
148 | x_lab = "Choices" #CHANGE ME 5
149 | y_lab = "Satisfaction" #CHANGE ME 6
150 |
151 | ```
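
If you are curious which dummy and interaction columns the `y ~ x1 * x2` formula described above will create, you can peek at the design matrix. This is an optional illustration, not a required step:

```{r}
# Optional: each column of the design matrix corresponds to one model coefficient:
# the intercept, the continuous x1 term, the dummy variable(s) for x2, and the
# x1:x2 interaction term(s).
head(model.matrix(y ~ x1 * x2, data = mydata))
```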
152 |
153 |
154 | ###Set priors
155 | In this section, you will set priors for your model. Setting priors thoughtfully is important to any Bayesian analysis, especially if you have a small sample of data that you will use for fitting your model. The priors express your best prior belief, *before seeing any data*, of reasonable values for the model parameters.
156 |
157 | Ideally, you will have previous literature from which to draw these prior beliefs. If no previous studies exist, you can instead assign "weakly informative priors" that only minimally restrict the model, excluding only values that are implausible or impossible. We have provided examples of how to set both weak and strong priors below.
158 |
159 | To check the plausibility of your priors, use the code section after this one to generate a graph of five sample draws from your priors to check if the values generated are reasonable.
160 |
161 | Our model has the following parameters:
162 |
163 | a. the intercept; functionally, this is often the mean of the control condition
164 |
165 | b. the slope; i.e., the effect size
166 |
167 | c. the standard deviation of the normally distributed error term
168 |
169 | To simplify things, we limit the number of different prior beliefs you can have. Think of the intercept as specifying the control condition of an experiment, and the slope as specifying the effect size. We let you specify a prior belief about the plausible values of the mean in the control condition (a), and then we let you set a prior belief about the plausible effect size (b). You have to specify the same plausible effect sizes for all conditions, unless you dig deeper into our code.
170 |
171 | To simplify things further, we only let you specify beliefs about these parameters in the form of a normal distribution. Thus, you will specify what you think is the most likely value for the parameter (the mean), and a standard deviation. You will be expressing a belief that you are 95% certain (before looking at any data) that the true value of the parameter is within two standard deviations of the mean.
172 |
173 | Finally, our modeling system, `stan_glm()`, will automatically set priors for the last parameter, the standard deviation of the normally distributed error term for the model overall (c).
174 |
175 | To explore more about priors, you can experiment with different values for these parameters and use the following section, *Checking priors with visualizations*, to see how different parameter values change the prior distribution.
176 |
177 | Want more examples? Check your understanding of how to set priors in this [quizlet](https://cdphelan.shinyapps.io/check_understanding_priors/), which includes several more examples of how to set both strong and weak priors.
178 |
179 | **What to change**
180 |
181 | **If you are using weakly informative priors (i.e. priors not informed by previous literature):**
182 |
183 | *Remember: **do not** use any of your data from the current study to inform prior values.*
184 |
185 | 7. a_prior: Select the intercept (likely the control condition mean).
186 |
187 | 8. a_prior_max: Select the maximum plausible value of the intercept (maximum plausible value of control condition data). (We will use this to calculate the sd of `a`.)
188 |
189 | 9. b1_prior: Select the effect size mean.
190 |
191 | 10. b1_sd: Select the effect size standard deviation.
192 |
193 | 11. You should also change the comments in the code below to explain your choice of priors.
194 |
195 | **If you are using strong priors (i.e. priors from previous literature):**
196 |
197 | Skip this code chunk and set your priors in the next code chunk. For clarity, comment out everything in this code chunk.
198 |
199 | ```{r}
200 |
201 | # CHANGE THIS COMMENT EXPLAINING YOUR CHOICE OF PRIORS (11)
202 | # In our example dataset, y-axis scores can be in the range [1, 7].
203 | # In the absence of other information, we set the parameter mean as 4
204 | # (the mean of the range [1,7]) and the maximum possible value as 7.
205 | # From exploratory analysis, we know the mean score and sd for y in our
206 | # dataset but we *DO NOT* use this information because priors *CANNOT*
207 | # include any information from the current study.
208 |
209 | a_prior = 4 # CHANGE ME 7
210 | a_prior_max = 7 # CHANGE ME 8
211 |
212 | # With a normal distribution, we can't completely rule out
213 | # impossible values, but we choose an sd that assigns less than
214 | # 5% probability to those impossible values. Remember that in a normal
215 | # distribution, 95% of the data lies within 2 sds of the mean. Therefore,
216 | # we calculate the value of 1 sd by finding the maximum amount our data
217 | # can vary from the mean (a_prior_max - a_prior) and divide that in half.
218 |
219 | a_sd = (a_prior_max - a_prior) / 2 # do not change
220 |
221 | # CHANGE THIS COMMENT EXPLAINING YOUR CHOICE OF PRIORS (11)
222 | # In our example dataset, we do not have a strong hypothesis that the treatment
223 | # conditions will be higher or lower than the control, so we set the mean of
224 | # the effect size parameter to be 0. In the absence of other information, we
225 | # set the sd so that a change from the minimum choice set size (12)
226 | # to the maximum choice set size (72) could plausibly result in a
227 | # +6/-6 change in satisfaction, the maximum possible change.
228 |
229 | b1_prior = 0 # CHANGE ME 9
230 | b1_sd = (6/(72-12))/2 # CHANGE ME 10
231 |
232 | ```
233 |
234 |
235 | **What to change**
236 |
237 | **If you are using weakly informative priors:**
238 |
239 | Do not use this code chunk; use the code chunk above to set your priors instead. Make sure everything in this code chunk is commented out so that your priors are not overwritten.
240 |
241 | **If you are using strong priors (i.e. priors from previous literature):**
242 |
243 | *Remember: **do not** use any of your data from the current study to set prior values.*
244 |
245 | First, make sure to uncomment all four variables set in this code chunk.
246 |
247 | 7. a_prior: Select the control condition mean.
248 |
249 | 8. a_sd: Select the control condition standard deviation.
250 |
251 | 9. b1_prior: Select the effect size mean.
252 |
253 | 10. b1_sd: Select the effect size standard deviation.
254 |
255 | 11. You should also change the comments in the code below to explain your choice of priors.
256 |
257 | ```{r}
258 |
259 | # CHANGE THIS COMMENT EXPLAINING YOUR CHOICE OF PRIORS (11)
260 | # In our example dataset, y-axis scores can be in the range [1, 7].
261 | # To choose our priors, we use the results from a previous study
262 | # where participants completed an identical task (choosing between
263 | # different chocolate bars). For our overall prior mean, we pool the mean
264 | # satisfaction scores from all conditions in the previous study to get
265 | # an overall mean of 5.86. We set a_sd so that 5.86 +/- 2 sds encompasses
266 | # the 95% confidence intervals from the previous study results.
267 |
268 | # a_prior = 5.86 # CHANGE ME 7
269 | # a_sd = 0.6 # CHANGE ME 8
270 |
271 | # CHANGE THIS COMMENT EXPLAINING YOUR CHOICE OF PRIORS (11)
272 | # In this example, we do not have guidance from previous literature
273 | # to set an effect size on satisfaction with an interaction term between
274 | # choice size and maximizer score, so we set the mean effect size
275 | # at 0. To reflect the uncertainty in this effect size, we set the sd
276 | # so that a change from the minimum choice set size (12) to the maximum
277 | # choice set size (72) could plausibly result in a +6/-6 change
278 | # in satisfaction, the maximum possible change.
279 |
280 | # b1_prior = 0 # CHANGE ME 9
281 | # b1_sd = (6/(72-12))/2 # CHANGE ME 10
282 |
283 | ```
284 |
285 |
286 | ### Checking priors with visualizations
287 | Next, you'll want to check your priors by running this code chunk. It will produce a set of 100 sample draws from the priors you set in the previous section, so you can check to see if the values generated are reasonable.
288 |
289 | You'll also want to run the code chunk after this one, `HOPs_priors`, which presents plots of sample prior draws in an animated format called HOPs (Hypothetical Outcomes Plots). HOPs are a type of plot that visualizes uncertainty as sets of draws from a distribution, and have been demonstrated to improve multivariate probability estimates (Hullman et al. 2015) and increase sensitivity to the underlying trend in data (Kale et al. 2018) over static representations of uncertainty like error bars.
290 |
291 | #### Static visualization of priors
292 | **What to change**
293 |
294 | Nothing! Just run this code to check your priors, adjusting prior values above as needed until you find reasonable prior values. Note that you may get a couple of very implausible or even impossible values because our assumption of normally distributed priors assigns a small probability to even very extreme values. If you are concerned by the outcome, you can try rerunning it a few more times to make sure that any implausible values you see don't come up very often.
295 |
296 | **Troubleshooting**
297 |
298 | * In rare cases, you may get a warning that the Markov chains have failed to converge. Chains that fail to converge are a sign that your model is not a good fit to the data. If you get this warning, you should adjust your priors. Your prior distribution may be too narrow, and/or your prior mean is very far from the data.
299 |
300 | * If you get any other errors, first double-check the values you have changed in the code chunks above (i.e. `mydata`, `mydata$x1`, `mydata$x2`, `mydata$y`, and prior values). Problems with these values can cause confusing errors downstream.
301 |
302 | ```{r check_priors, results="hide"}
303 |
304 | # generate the prior distribution
305 | m_prior = stan_glm(y ~ x1*x2, data = mydata,
306 | prior_intercept = normal(a_prior, a_sd, autoscale = FALSE),
307 | prior = normal(b1_prior, b1_sd, autoscale = FALSE),
308 | prior_PD = TRUE
309 | )
310 |
311 | # Create the dataframe with fitted draws
312 | prior_draws = mydata %>% #pipe mydata to data_grid()
313 | data_grid(x1, x2) %>% #create a fit grid with each level in x, and pipe it to add_fitted_draws()
314 | add_fitted_draws(m_prior, n = 100, seed = 12345) #add n fitted draws from the model to the fit grid
315 | # the seed argument is for reproducibility: it ensures the pseudo-random
316 | # number generator used to pick draws has the same seed on every run,
317 | # so that someone else can re-run this code and verify their output matches
318 |
319 | # Plot the 100 sample draws
320 | # this function is defined in 'plotting_functions.R', if you wish to customize the aesthetics.
321 | static_prior_plot_6(prior_draws)
322 | ```
323 |
324 | #### Animated visualization of priors
325 | The static draws above give us some idea of what the prior distribution might look like. Even better, we can animate this graph using HOPs. HOPs visualize the same information as the static plot generated above, but are better for visualizing uncertainty and identifying underlying trends.
326 |
327 | In this code chunk, we create the animated plot using 50 of the 100 draws from the plot above. Each frame of the animation shows a different draw from the prior.
328 |
329 | **What to change:** Nothing! Just run the code to check your priors.
330 |
331 | ```{r HOPs_priors}
332 | # Animation parameters
333 | n_draws = 50 # the number of draws to visualize in the HOPs (more draws == longer rendering time)
334 | frames_per_second = 2.5 # the speed of the HOPs
335 | # 2.5 frames per second (400ms) is the recommended speed for the HOPs visualization.
336 | # Faster speeds (100ms) have been demonstrated to not work as well.
337 | # See Kale et al. VIS 2018 for more info.
338 |
339 | # Animate the prior draws with HOPs
340 | # this function is defined in 'plotting_functions.R', if you wish to customize the aesthetics.
341 | prior_HOPs = animate(HOPS_plot_6(prior_draws), nframes = n_draws * 2, fps = frames_per_second)
342 | prior_HOPs
343 | ```
344 |
345 | In most cases, your prior HOPs will show a lot of uncertainty: the lines will jump around to a lot of different possible values. At the end of the template, you'll see how this uncertainty is affected when study data is added to the estimates.
346 |
347 | Even when you see a lot of uncertainty in the graph, the individual HOPs frames should mostly show plausible values. You will see some implausible values (usually represented as empty graphs), but if you see many implausible values, it may be a sign that you should adjust your priors in the "Set priors" section.
348 |
349 |
350 |
351 | ### Run the model
352 | There's nothing you have to change here. Just run the model.
353 |
354 | **Troubleshooting:** If this code produces errors, check the troubleshooting section under the "Check priors" heading above for a few troubleshooting options.
355 |
356 | ```{r results = "hide", message = FALSE, warning = FALSE}
357 | m = stan_glm(y ~ x1*x2, data = mydata,
358 | prior_intercept = normal(a_prior, a_sd, autoscale = FALSE),
359 | prior = normal(b1_prior, b1_sd, autoscale = FALSE)
360 | )
361 | ```
362 |
363 |
364 | ## Model summary
365 | Here is a summary of the model fit.
366 |
367 | The summary reports diagnostic values that can help you evaluate whether your model is a good fit for the data. For this template, we can keep diagnostics simple: check that your `Rhat` values are very close to 1.0. Larger values mean that your model is not a good fit for the data. This is usually only a problem if the `Rhat` values are greater than 1.1, which is a warning sign that the Markov chains have failed to converge. If this happens, Stan will warn you about the failure, and you should adjust your priors.
368 |
369 | ```{r}
370 | summary(m, digits=3)
371 | ```
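
If you just want the credible intervals for the coefficients, without the rest of the summary, you can also use `posterior_interval()` from `rstanarm` (optional):

```{r}
# Optional: 95% credible intervals for the model coefficients.
posterior_interval(m, prob = 0.95)
```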
372 |
373 |
374 | ## Visualizing results
375 | #### Static visualizations
376 | To plot the results, we again create a fit grid using `data_grid()`, just as we did when we created the HOPs for the prior. Given this fit grid, we can then create any number of visualizations of the results. One way we might want to visualize the results is a static graph with a 95% credible band. To do this, we use the grid and draw samples from the posterior mean evaluated at each x position in the grid using the `add_fitted_draws` function, and then summarize these samples in ggplot using a `stat_lineribbon`:
377 |
378 | ```{r static_graph}
379 |
380 | # Create the dataframe with fitted draws
381 | fit = mydata %>% #pipe mydata to data_grid()
382 | data_grid(x1 = seq_range(x1, n = 20), x2) %>% #create a fit grid spanning the range of x1 (20 points) crossed with each level of x2, and pipe it to add_fitted_draws()
383 | add_fitted_draws(m) #add fitted draws from the model to the fit grid (all posterior draws, since n is not specified)
384 |
385 | # Plot the posterior draws
386 | # this function is defined in 'plotting_functions.R', if you wish to customize the aesthetics.
387 | static_post_plot_6a(fit)
388 |
389 | ```
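
For reference, the sketch below shows roughly the kind of ggplot code behind this figure; the actual aesthetics live in `static_post_plot_6a()` in `plotting_functions.R` and may differ. It assumes the columns created by `add_fitted_draws()` (`x1`, `x2`, `.value`):

```{r eval=FALSE}
# A rough sketch (for illustration only) of a stat_lineribbon plot of the fit:
# one line and 95% ribbon per level of x2, summarizing the posterior draws.
ggplot(fit, aes(x = x1, y = .value, color = x2, fill = x2)) +
  stat_lineribbon(.width = .95, alpha = 0.25) +
  labs(x = x_lab, y = y_lab)
```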
390 |
391 | But what we really want is to display a selection of plausible fit lines, say 100 of them. To do that, we instead ask `add_fitted_draws` for only 100 draws, which we plot separately as lines:
392 |
393 | ```{r}
394 |
395 | fit = mydata %>%
396 | data_grid(x1 = seq_range(x1, n = 101), x2) %>%
397 | # the seed argument is for reproducibility: it ensures the pseudo-random
398 | # number generator used to pick draws has the same seed on every run,
399 | # so that someone else can re-run this code and verify their output matches
400 | add_fitted_draws(m, n = 100, seed = 12345)
401 |
402 | # Plot the posterior draws with a selection of fit draws
403 | # this function is defined in 'plotting_functions.R', if you wish to customize the aesthetics.
404 | static_post_plot_6b(fit)
405 | ```
406 |
407 |
408 | #### Animated HOPs visualization
409 | To get a better visualization of the uncertainty remaining in the posterior results, we can use animated HOPs for this graph as well. The code to generate the posterior plots is identical to the HOPs code for the priors, except we replace `m_prior` with `m`:
410 |
411 | ```{r}
412 |
413 | p = mydata %>%
414 | data_grid(x1 = seq_range(x1, n = 101), x2) %>%
415 | add_fitted_draws(m, n = n_draws, seed = 12345)
416 |
417 | # animate the data from p, using the graph aesthetics set in the graph aesthetics code chunk
418 | # this function is defined in 'plotting_functions.R', if you wish to customize the aesthetics.
419 | post_HOPs = animate(HOPS_plot_6(p), nframes = n_draws * 2, fps = frames_per_second)
420 | post_HOPs
421 |
422 | ```
423 |
424 | ### Comparing the prior and posterior
425 | If we look at our two HOPs plots together - one of the prior distribution, and one of the posterior - we can see how adding information to the model (i.e. the study data) adds more certainty to our estimates, and produces a posterior graph that is more "settled" than the prior graph.
426 |
427 | **Prior draws**
428 | ```{r echo=F}
429 | prior_HOPs
430 | ```
431 |
432 | **Posterior draws**
433 | ```{r echo=F}
434 | post_HOPs
435 | ```
436 |
437 | ## Finishing up
438 |
439 | **Congratulations!** You made it through your first Bayesian analysis. We hope our templates helped demystify the process.
440 |
441 | If you're interested in learning more about Bayesian statistics, we suggest the following textbooks:
442 |
443 | - Statistical Rethinking, by Richard McElreath. (Website: https://xcelab.net/rm/statistical-rethinking/, including links to YouTube lectures.)
444 | - Doing Bayesian Data Analysis, by John K. Kruschke. (Website: https://sites.google.com/site/doingbayesiandataanalysis/, including R code templates.)
445 |
446 |
447 | The citation for the paper reporting the process of developing and user-testing these templates is below:
448 | Chanda Phelan, Jessica Hullman, Matthew Kay, and Paul Resnick. 2019. Some Prior(s) Experience Necessary: Templates for Getting Started with Bayesian Analysis. In CHI Conference on Human Factors in Computing Systems Proceedings (CHI 2019), May 4–9, 2019, Glasgow, Scotland UK. ACM, New York, NY, USA, 12 pages. https://doi.org/10.1145/3290605.3300709
449 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2019 cdphelan
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Bayesian templates for beginners
2 |
3 | UPDATE May 2019: A new version of the templates has been released. The structure of the templates has been improved to make it easier to follow along. The ggplot code that sets the plot aesthetics has been moved to the file plotting_functions.R for improved readability of the main templates.
4 |
5 | This repo contains a set of templates that will guide you through a Bayesian analysis in R, even if you have never done Bayesian analysis before. Each template covers a different type of analysis, and we will be adding to this list over time.
6 |
7 | The research paper that accompanies these templates is forthcoming at CHI 2019: "Some Prior(s) Experience Necessary: Templates for Getting Started with Bayesian Analysis." Links will be added once the paper is published.
8 |
9 | Detailed instructions on how to get started are at the end of this README.
10 |
11 | A productive way to choose which template to use is to think about what your independent variables are and what type of chart you would like to produce to summarize your data. Currently, the templates support the following:
12 |
13 | ## One independent variable
14 |
15 | **1) Categorical**:
16 |
17 |
18 | Creates a bar chart; compatible with tests such as t-tests, one-way ANOVA
19 |
20 | Use this template file:
21 |
22 | 1var-categorical-bar-bayesian_template.Rmd
23 |
24 | **2) Ordinal**:
25 |
26 |
27 | Creates a line graph; compatible with tests such as t-tests, one-way ANOVA
28 |
29 | Use this template file:
30 |
31 | 1var-ordinal-line-bayesian_template.Rmd
32 |
33 | **3) Continuous**:
34 |
35 |
36 | Creates a line graph; compatible with tests such as linear regressions
37 |
38 | Use this template file:
39 |
40 | 1var-continuous-line-bayesian_template.Rmd
41 |
42 |
43 | ## Interaction of two independent variables
44 |
45 | **4) Interaction of two categorical**:
46 |
47 |
48 | Creates a bar chart; compatible with tests such as two-way ANOVA
49 |
50 | Use this template file:
51 |
52 | 2var-categorical-bar-bayesian_template.Rmd
53 |
54 | **5) Interaction of one categorical, one ordinal**:
55 |
56 |
57 | Creates a line graph; compatible with tests such as two-way ANOVA
58 |
59 | Use this template file:
60 |
61 | 2var-categorical_ordinal-line-bayesian_template.Rmd
62 |
63 | **6) Interaction of one categorical, one continuous**:
64 |
65 |
66 | Creates a line graph; compatible with tests such as linear regressions with multiple lines
67 |
68 | Use this template file:
69 |
70 | 2var-continuous_categorical-line-bayesian_template.Rmd
71 |
72 | # Getting started
73 |
74 | 1) If you do not have RStudio already installed, install it from here: https://www.rstudio.com/products/rstudio/download/#download
75 |
76 | 2) Clone the repo (if you're familiar with git) or download a zip file of the repository contents here: https://github.com/cdphelan/bayesian-template/archive/master.zip
77 |
78 | 3) Open the file bayesian-template.Rproj as a project in RStudio to get started.
79 |
80 | 4) Explore the templates however you like! We suggest using the pre-knitted HTML documents (found in the folder /html_outputs/) as a reference, so you can read the instructions more easily and see a complete example as you work.
81 |
82 | 5) Open the .Rmd template files when you are ready to start editing code.
83 |
84 |
--------------------------------------------------------------------------------
/bayesian-template.Rproj:
--------------------------------------------------------------------------------
1 | Version: 1.0
2 |
3 | RestoreWorkspace: Default
4 | SaveWorkspace: Default
5 | AlwaysSaveHistory: Default
6 |
7 | EnableCodeIndexing: Yes
8 | UseSpacesForTab: Yes
9 | NumSpacesForTab: 2
10 | Encoding: UTF-8
11 |
12 | RnwWeave: Sweave
13 | LaTeX: pdfLaTeX
14 |
15 | AutoAppendNewline: Yes
16 | StripTrailingWhitespace: Yes
17 |
--------------------------------------------------------------------------------
/datasets/choc_cleaned_data.csv:
--------------------------------------------------------------------------------
1 | "","ParticipantID","num_products_displayed","satis_Q1","max_score","sat_max"
2 | "1",2733,24,7,85,"maximizer"
3 | "2",2729,50,7,80,"maximizer"
4 | "3",3498,40,7,64,"maximizer"
5 | "4",2628,72,7,16,"satisficer"
6 | "5",2737,50,3,64,"maximizer"
7 | "6",2735,50,3,73,"maximizer"
8 | "7",2619,72,5,53,"maximizer"
9 | "8",2730,50,7,71,"maximizer"
10 | "9",2736,60,4,60,"maximizer"
11 | "10",2734,50,5,48,"satisficer"
12 | "11",687,72,4,64,"maximizer"
13 | "12",2728,50,7,63,"maximizer"
14 | "13",2725,72,7,61,"maximizer"
15 | "14",2727,24,7,64,"maximizer"
16 | "15",3828,24,4,74,"maximizer"
17 | "16",2846,50,7,77,"maximizer"
18 | "17",1316,72,6,60,"maximizer"
19 | "18",2660,60,7,53,"maximizer"
20 | "19",1102,40,7,61,"maximizer"
21 | "20",3531,60,5,76,"maximizer"
22 | "21",3870,50,7,73,"maximizer"
23 | "22",3776,12,7,67,"maximizer"
24 | "23",2623,60,1,61,"maximizer"
25 | "24",2593,60,7,73,"maximizer"
26 | "25",3868,60,6,72,"maximizer"
27 | "26",3267,60,4,71,"maximizer"
28 | "27",1509,40,6,69,"maximizer"
29 | "28",3543,60,7,38,"satisficer"
30 | "29",1781,24,5,86,"maximizer"
31 | "30",2594,50,7,68,"maximizer"
32 | "31",2827,50,5,74,"maximizer"
33 | "32",2554,72,6,68,"maximizer"
34 | "33",381,40,5,51,"satisficer"
35 | "34",3709,40,7,84,"maximizer"
36 | "35",2880,50,7,70,"maximizer"
37 | "36",3316,12,6,53,"maximizer"
38 | "37",2581,60,7,91,"maximizer"
39 | "38",3339,24,7,72,"maximizer"
40 | "39",3407,72,5,73,"maximizer"
41 | "40",2845,24,5,82,"maximizer"
42 | "41",3401,60,7,44,"satisficer"
43 | "42",3350,12,7,57,"maximizer"
44 | "43",3179,50,5,54,"maximizer"
45 | "44",3473,40,7,57,"maximizer"
46 | "45",3674,60,7,52,"satisficer"
47 | "46",2500,72,6,49,"satisficer"
48 | "47",493,24,7,67,"maximizer"
49 | "48",2505,24,5,52,"satisficer"
50 | "49",3121,60,4,58,"maximizer"
51 | "50",1510,60,6,51,"satisficer"
52 | "51",348,72,7,65,"maximizer"
53 | "52",390,72,7,74,"maximizer"
54 | "53",3764,72,5,59,"maximizer"
55 | "54",3568,12,7,72,"maximizer"
56 | "55",852,24,6,51,"satisficer"
57 | "56",2461,50,6,65,"maximizer"
58 | "57",554,50,7,61,"maximizer"
59 | "58",3818,40,5,72,"maximizer"
60 | "59",3451,50,7,43,"satisficer"
61 | "60",3185,50,6,44,"satisficer"
62 | "61",495,24,6,63,"maximizer"
63 | "62",2410,72,6,59,"maximizer"
64 | "63",3845,24,7,53,"maximizer"
65 | "64",2763,72,7,59,"maximizer"
66 | "65",2590,40,5,61,"maximizer"
67 | "66",1226,12,6,70,"maximizer"
68 | "67",2563,50,5,67,"maximizer"
69 | "68",3634,50,7,65,"maximizer"
70 | "69",3633,60,6,60,"maximizer"
71 | "70",2602,12,5,57,"maximizer"
72 | "71",3887,40,6,70,"maximizer"
73 | "72",3191,72,7,59,"maximizer"
74 | "73",2951,12,7,58,"maximizer"
75 | "74",3442,40,6,58,"maximizer"
76 | "75",3389,60,6,57,"maximizer"
77 | "76",3736,24,7,57,"maximizer"
78 | "77",2862,50,7,60,"maximizer"
79 | "78",3556,24,6,58,"maximizer"
80 | "79",3228,72,6,46,"satisficer"
81 | "80",2444,40,7,41,"satisficer"
82 | "81",3184,50,4,49,"satisficer"
83 | "82",2603,40,7,62,"maximizer"
84 | "83",2857,60,7,30,"satisficer"
85 | "84",3180,12,6,70,"maximizer"
86 | "85",3233,40,4,38,"satisficer"
87 | "86",2817,40,6,74,"maximizer"
88 | "87",3292,60,7,84,"maximizer"
89 | "88",3521,50,7,77,"maximizer"
90 | "89",3593,24,4,60,"maximizer"
91 | "90",2751,60,7,70,"maximizer"
92 | "91",3549,50,5,66,"maximizer"
93 | "92",2676,60,4,67,"maximizer"
94 | "93",2692,50,6,57,"maximizer"
95 | "94",3186,12,6,32,"satisficer"
96 | "95",2943,60,7,48,"satisficer"
97 | "96",1759,12,4,67,"maximizer"
98 | "97",438,24,6,59,"maximizer"
99 | "98",695,12,5,52,"satisficer"
100 | "99",3334,60,5,64,"maximizer"
101 | "100",3676,40,6,51,"satisficer"
102 | "101",2454,72,5,55,"maximizer"
103 | "102",3817,12,6,52,"satisficer"
104 | "103",2450,50,6,71,"maximizer"
105 | "104",3384,60,7,64,"maximizer"
106 | "105",3330,12,7,66,"maximizer"
107 | "106",2864,40,5,57,"maximizer"
108 | "107",528,40,7,54,"maximizer"
109 | "108",3827,24,4,73,"maximizer"
110 | "109",1945,40,5,73,"maximizer"
111 | "110",1749,50,5,72,"maximizer"
112 | "111",3376,72,5,52,"satisficer"
113 | "112",3387,72,7,49,"satisficer"
114 | "113",2791,50,7,68,"maximizer"
115 | "114",2493,72,4,74,"maximizer"
116 | "115",2883,24,7,64,"maximizer"
117 | "116",2514,50,7,51,"satisficer"
118 | "117",3394,12,7,72,"maximizer"
119 | "118",3142,40,7,61,"maximizer"
120 | "119",828,24,7,42,"satisficer"
121 | "120",3797,72,7,86,"maximizer"
122 | "121",2691,24,6,74,"maximizer"
123 | "122",3486,24,7,63,"maximizer"
124 | "123",2755,40,7,52,"satisficer"
125 | "124",3730,50,6,58,"maximizer"
126 | "125",2869,50,5,59,"maximizer"
127 | "126",417,50,7,80,"maximizer"
128 | "127",3554,50,7,33,"satisficer"
129 | "128",357,12,6,48,"satisficer"
130 | "129",3472,40,6,68,"maximizer"
131 | "130",2463,72,6,62,"maximizer"
132 | "131",3215,40,5,55,"maximizer"
133 | "132",2543,12,7,54,"maximizer"
134 | "133",3578,72,7,53,"maximizer"
135 | "134",3383,40,7,63,"maximizer"
136 | "135",1791,72,6,51,"satisficer"
137 | "136",2686,12,6,50,"satisficer"
138 | "137",2470,72,5,54,"maximizer"
139 | "138",3028,60,7,56,"maximizer"
140 | "139",3525,12,7,45,"satisficer"
141 | "140",3363,40,6,53,"maximizer"
142 | "141",982,50,7,66,"maximizer"
143 | "142",3173,40,7,74,"maximizer"
144 | "143",2501,12,5,43,"satisficer"
145 | "144",3679,12,6,51,"satisficer"
146 | "145",1030,12,7,56,"maximizer"
147 | "146",3150,72,4,57,"maximizer"
148 | "147",2964,12,5,56,"maximizer"
149 | "148",1050,12,7,54,"maximizer"
150 | "149",3420,40,4,30,"satisficer"
151 | "150",2760,60,6,62,"maximizer"
152 | "151",2541,50,5,50,"satisficer"
153 | "152",2826,12,6,68,"maximizer"
154 | "153",2548,40,6,49,"satisficer"
155 | "154",2707,40,7,67,"maximizer"
156 | "155",349,12,7,67,"maximizer"
157 | "156",3721,60,6,53,"maximizer"
158 | "157",2833,40,7,36,"satisficer"
159 | "158",3046,72,5,51,"satisficer"
160 | "159",3598,50,6,76,"maximizer"
161 | "160",3270,60,6,64,"maximizer"
162 | "161",3382,50,4,68,"maximizer"
163 | "162",3048,50,7,67,"maximizer"
164 | "163",3468,50,7,61,"maximizer"
165 | "164",2426,40,6,58,"maximizer"
166 | "165",3605,12,7,39,"satisficer"
167 | "166",2874,40,7,34,"satisficer"
168 | "167",3271,24,7,64,"maximizer"
169 | "168",2812,24,4,89,"maximizer"
170 | "169",2753,40,6,57,"maximizer"
171 | "170",336,50,6,63,"maximizer"
172 | "171",430,12,6,37,"satisficer"
173 | "172",2503,72,6,71,"maximizer"
174 | "173",1933,60,5,38,"satisficer"
175 | "174",1606,60,7,69,"maximizer"
176 | "175",2309,12,7,76,"maximizer"
177 | "176",2709,60,7,62,"maximizer"
178 | "177",1790,60,5,41,"satisficer"
179 | "178",3269,40,7,61,"maximizer"
180 | "179",3337,24,6,55,"maximizer"
181 | "180",3395,50,5,55,"maximizer"
182 | "181",3830,72,6,50,"satisficer"
183 | "182",3718,72,6,78,"maximizer"
184 | "183",3351,12,6,74,"maximizer"
185 | "184",2592,60,7,68,"maximizer"
186 | "185",2589,24,7,52,"satisficer"
187 | "186",2526,50,5,34,"satisficer"
188 | "187",3450,50,6,45,"satisficer"
189 | "188",2356,60,5,70,"maximizer"
190 | "189",1996,12,6,58,"maximizer"
191 | "190",3283,72,7,64,"maximizer"
192 | "191",373,12,7,71,"maximizer"
193 | "192",2832,24,6,66,"maximizer"
194 | "193",2578,40,7,63,"maximizer"
195 | "194",2562,50,6,74,"maximizer"
196 | "195",3268,12,7,69,"maximizer"
197 | "196",3879,40,7,38,"satisficer"
198 | "197",2620,40,5,57,"maximizer"
199 | "198",3637,50,7,65,"maximizer"
200 | "199",3580,60,6,51,"satisficer"
201 | "200",3757,24,6,50,"satisficer"
202 | "201",2509,40,7,70,"maximizer"
203 | "202",356,24,6,44,"satisficer"
204 | "203",3218,50,6,48,"satisficer"
205 | "204",3520,60,5,62,"maximizer"
206 | "205",3551,40,7,58,"maximizer"
207 | "206",3166,12,6,48,"satisficer"
208 | "207",542,12,7,61,"maximizer"
209 | "208",2560,72,5,69,"maximizer"
210 | "209",2881,24,7,41,"satisficer"
211 | "210",3612,12,6,56,"maximizer"
212 | "211",2458,50,5,69,"maximizer"
213 | "212",3570,40,5,51,"satisficer"
214 | "213",3235,72,7,51,"satisficer"
215 | "214",2840,40,5,61,"maximizer"
216 | "215",1313,50,5,64,"maximizer"
217 | "216",3592,50,5,65,"maximizer"
218 | "217",3495,60,7,47,"satisficer"
219 | "218",3469,12,5,54,"maximizer"
220 | "219",3687,50,7,47,"satisficer"
221 | "220",3713,12,7,61,"maximizer"
222 | "221",2784,40,6,52,"satisficer"
223 | "222",2605,60,5,64,"maximizer"
224 | "223",3296,72,7,62,"maximizer"
225 | "224",2970,24,5,52,"satisficer"
226 | "225",3272,12,6,68,"maximizer"
227 | "226",3620,12,4,60,"maximizer"
228 | "227",3503,72,7,68,"maximizer"
229 | "228",3530,40,6,71,"maximizer"
230 | "229",2102,72,6,47,"satisficer"
231 | "230",3658,72,6,55,"maximizer"
232 | "231",3675,50,5,68,"maximizer"
233 | "232",2818,60,7,48,"satisficer"
234 | "233",2580,24,6,38,"satisficer"
235 | "234",3423,40,6,49,"satisficer"
236 | "235",3266,60,6,57,"maximizer"
237 | "236",3301,60,7,49,"satisficer"
238 | "237",2681,12,7,60,"maximizer"
239 | "238",2443,50,7,45,"satisficer"
240 | "239",811,12,6,66,"maximizer"
241 | "240",354,60,7,54,"maximizer"
242 | "241",2873,24,7,69,"maximizer"
243 | "242",2842,24,7,54,"maximizer"
244 | "243",1463,60,7,60,"maximizer"
245 | "244",3239,72,7,54,"maximizer"
246 | "245",2762,24,7,39,"satisficer"
247 | "246",2387,50,7,71,"maximizer"
248 | "247",2634,24,7,59,"maximizer"
249 | "248",3878,12,7,54,"maximizer"
250 | "249",2863,24,7,61,"maximizer"
251 | "250",2780,12,7,66,"maximizer"
252 | "251",2440,60,7,36,"satisficer"
253 | "252",3851,40,7,50,"satisficer"
254 | "253",3602,72,2,69,"maximizer"
255 | "254",3523,60,6,77,"maximizer"
256 | "255",3615,12,7,73,"maximizer"
257 | "256",1712,40,6,67,"maximizer"
258 | "257",2477,50,7,57,"maximizer"
259 | "258",2637,60,6,54,"maximizer"
260 | "259",2891,72,4,51,"satisficer"
261 | "260",3145,12,7,53,"maximizer"
262 | "261",3667,72,7,70,"maximizer"
263 | "262",3467,24,6,64,"maximizer"
264 | "263",1483,60,4,60,"maximizer"
265 | "264",2393,24,7,54,"maximizer"
266 | "265",3619,24,6,64,"maximizer"
267 | "266",3835,50,7,39,"satisficer"
268 | "267",3881,12,5,57,"maximizer"
269 | "268",3402,12,6,45,"satisficer"
270 | "269",2497,50,5,70,"maximizer"
271 | "270",592,50,7,64,"maximizer"
272 | "271",2774,40,6,58,"maximizer"
273 | "272",1036,50,7,49,"satisficer"
274 | "273",2781,72,7,55,"maximizer"
275 | "274",3207,24,5,49,"satisficer"
276 | "275",3613,60,6,58,"maximizer"
277 | "276",3699,60,6,41,"satisficer"
278 | "277",3731,40,7,73,"maximizer"
279 | "278",3816,12,7,84,"maximizer"
280 | "279",3579,72,7,54,"maximizer"
281 | "280",3606,12,6,70,"maximizer"
282 | "281",2830,50,5,56,"maximizer"
283 | "282",2828,24,7,73,"maximizer"
284 | "283",2285,12,6,65,"maximizer"
285 | "284",3585,60,7,83,"maximizer"
286 | "285",1505,50,4,76,"maximizer"
287 | "286",3517,50,5,65,"maximizer"
288 | "287",3438,72,6,49,"satisficer"
289 | "288",2875,40,7,66,"maximizer"
290 | "289",2569,72,7,52,"satisficer"
291 | "290",3393,72,7,52,"satisficer"
292 | "291",595,60,6,65,"maximizer"
293 | "292",3386,24,6,49,"satisficer"
294 | "293",3378,24,3,56,"maximizer"
295 | "294",2570,72,7,43,"satisficer"
296 | "295",825,72,6,67,"maximizer"
297 | "296",439,72,4,40,"satisficer"
298 | "297",3282,60,7,77,"maximizer"
299 | "298",3428,50,4,70,"maximizer"
300 | "299",1639,50,6,55,"maximizer"
301 | "300",3526,60,7,43,"satisficer"
302 | "301",3582,12,7,54,"maximizer"
303 | "302",3429,12,7,49,"satisficer"
304 | "303",2636,12,7,50,"satisficer"
305 | "304",3729,72,7,53,"maximizer"
306 | "305",3433,40,7,48,"satisficer"
307 | "306",2747,60,5,69,"maximizer"
308 | "307",3716,24,2,78,"maximizer"
309 | "308",3007,24,7,52,"satisficer"
310 | "309",3342,40,6,53,"maximizer"
311 | "310",2847,40,7,57,"maximizer"
312 | "311",2279,12,5,65,"maximizer"
313 | "312",2482,60,6,49,"satisficer"
314 | "313",2661,24,7,56,"maximizer"
315 | "314",676,50,6,44,"satisficer"
316 | "315",3188,72,7,42,"satisficer"
317 | "316",1677,40,6,50,"satisficer"
318 | "317",3360,72,6,49,"satisficer"
319 | "318",2776,50,7,65,"maximizer"
320 | "319",3712,24,7,47,"satisficer"
321 | "320",3514,24,7,44,"satisficer"
322 | "321",2834,72,6,48,"satisficer"
323 | "322",2851,12,7,57,"maximizer"
324 | "323",2523,60,6,62,"maximizer"
325 | "324",3635,72,7,59,"maximizer"
326 | "325",3253,50,7,40,"satisficer"
327 | "326",648,72,7,41,"satisficer"
328 | "327",2522,12,6,68,"maximizer"
329 | "328",3711,12,5,70,"maximizer"
330 | "329",874,50,5,36,"satisficer"
331 | "330",3197,24,6,54,"maximizer"
332 | "331",3454,40,6,68,"maximizer"
333 | "332",3560,12,4,57,"maximizer"
334 | "333",2246,40,7,55,"maximizer"
335 | "334",3365,50,6,53,"maximizer"
336 | "335",2838,50,5,60,"maximizer"
337 | "336",3670,72,7,53,"maximizer"
338 | "337",2485,40,6,41,"satisficer"
339 | "338",2508,40,7,57,"maximizer"
340 | "339",3065,12,5,77,"maximizer"
341 | "340",3431,12,6,61,"maximizer"
342 | "341",2572,12,5,42,"satisficer"
343 | "342",1701,50,7,60,"maximizer"
344 | "343",460,24,6,64,"maximizer"
345 | "344",3331,72,6,33,"satisficer"
346 | "345",2705,50,7,66,"maximizer"
347 | "346",2788,12,7,46,"satisficer"
348 | "347",2442,60,7,58,"maximizer"
349 | "348",2611,50,6,54,"maximizer"
350 | "349",3621,72,7,51,"satisficer"
351 | "350",3763,50,7,59,"maximizer"
352 | "351",3178,60,5,49,"satisficer"
353 | "352",2789,50,5,53,"maximizer"
354 | "353",3332,24,6,63,"maximizer"
355 | "354",1120,50,6,61,"maximizer"
356 | "355",2534,60,6,60,"maximizer"
357 | "356",3722,12,3,56,"maximizer"
358 | "357",3422,60,4,48,"satisficer"
359 | "358",839,24,7,61,"maximizer"
360 | "359",631,72,6,85,"maximizer"
361 | "360",3485,60,7,53,"maximizer"
362 | "361",3719,50,7,58,"maximizer"
363 | "362",3311,50,6,58,"maximizer"
364 | "363",3564,40,7,68,"maximizer"
365 | "364",3147,72,5,60,"maximizer"
366 | "365",2933,40,7,61,"maximizer"
367 | "366",3062,60,7,45,"satisficer"
368 | "367",2484,40,6,54,"maximizer"
369 | "368",2431,40,5,56,"maximizer"
370 | "369",3258,50,5,65,"maximizer"
371 | "370",3347,72,6,63,"maximizer"
372 | "371",3575,60,6,62,"maximizer"
373 | "372",3470,50,7,44,"satisficer"
374 | "373",3741,12,7,55,"maximizer"
375 | "374",2768,60,7,53,"maximizer"
376 | "375",2599,60,6,49,"satisficer"
377 | "376",3748,40,7,75,"maximizer"
378 | "377",3002,40,6,65,"maximizer"
379 | "378",2878,72,7,57,"maximizer"
380 | "379",2506,24,7,54,"maximizer"
381 | "380",3700,12,4,65,"maximizer"
382 | "381",1390,12,7,48,"satisficer"
383 | "382",2808,40,7,48,"satisficer"
384 | "383",2764,50,7,37,"satisficer"
385 | "384",334,60,6,62,"maximizer"
386 | "385",2490,72,6,52,"satisficer"
387 | "386",2459,72,6,44,"satisficer"
388 | "387",3515,12,7,52,"satisficer"
389 | "388",2451,72,7,45,"satisficer"
390 | "389",3437,72,7,55,"maximizer"
391 | "390",1024,60,5,58,"maximizer"
392 | "391",3241,60,4,66,"maximizer"
393 | "392",2665,50,7,50,"satisficer"
394 | "393",3064,12,7,59,"maximizer"
395 | "394",3669,72,7,51,"satisficer"
396 | "395",3657,12,5,57,"maximizer"
397 | "396",3392,60,7,63,"maximizer"
398 | "397",2831,72,7,44,"satisficer"
399 | "398",3297,72,6,55,"maximizer"
400 | "399",3542,72,7,58,"maximizer"
401 | "400",3789,72,6,42,"satisficer"
402 | "401",3673,12,7,67,"maximizer"
403 | "402",2643,24,6,46,"satisficer"
404 | "403",3544,12,6,36,"satisficer"
405 | "404",3265,12,7,52,"satisficer"
406 | "405",3295,72,6,75,"maximizer"
407 | "406",2759,24,7,63,"maximizer"
408 | "407",3701,72,7,53,"maximizer"
409 | "408",3164,24,5,43,"satisficer"
410 | "409",2790,24,5,45,"satisficer"
411 | "410",3622,40,7,74,"maximizer"
412 | "411",3466,24,7,62,"maximizer"
413 | "412",3697,24,5,62,"maximizer"
414 | "413",3447,24,7,72,"maximizer"
415 | "414",3225,40,7,37,"satisficer"
416 | "415",3183,40,6,62,"maximizer"
417 | "416",3819,40,7,44,"satisficer"
418 | "417",3534,24,7,60,"maximizer"
419 | "418",2853,60,7,59,"maximizer"
420 | "419",2767,40,7,44,"satisficer"
421 | "420",3231,50,7,42,"satisficer"
422 | "421",2645,12,7,63,"maximizer"
423 | "422",1010,12,7,47,"satisficer"
424 | "423",2616,24,6,43,"satisficer"
425 | "424",3639,40,5,53,"maximizer"
426 | "425",3683,60,7,55,"maximizer"
427 | "426",3456,72,6,61,"maximizer"
428 | "427",2456,60,4,61,"maximizer"
429 | "428",1603,72,6,73,"maximizer"
430 | "429",2848,60,7,53,"maximizer"
431 | "430",3834,40,6,63,"maximizer"
432 | "431",3573,12,5,52,"satisficer"
433 | "432",3122,12,7,36,"satisficer"
434 | "433",2931,40,6,52,"satisficer"
435 | "434",3756,40,7,60,"maximizer"
436 | "435",2821,24,7,38,"satisficer"
437 | "436",3352,12,6,52,"satisficer"
438 | "437",2950,12,7,71,"maximizer"
439 | "438",2565,24,7,77,"maximizer"
440 | "439",2520,60,6,58,"maximizer"
441 | "440",3636,60,6,37,"satisficer"
442 | "441",1542,72,6,66,"maximizer"
443 | "442",1937,60,4,52,"satisficer"
444 | "443",3156,72,7,51,"satisficer"
445 | "444",3509,60,7,43,"satisficer"
446 | "445",2704,60,6,59,"maximizer"
447 | "446",375,12,6,52,"satisficer"
448 | "447",3684,12,7,68,"maximizer"
449 | "448",2606,12,5,60,"maximizer"
450 | "449",2518,40,7,44,"satisficer"
451 | "450",3165,24,5,58,"maximizer"
452 | "451",2816,60,4,67,"maximizer"
453 | "452",3053,24,5,61,"maximizer"
454 | "453",2607,60,5,59,"maximizer"
455 | "454",441,12,7,36,"satisficer"
456 | "455",3655,50,6,60,"maximizer"
457 | "456",3444,40,6,81,"maximizer"
458 | "457",3836,50,7,53,"maximizer"
459 | "458",1819,50,6,54,"maximizer"
460 | "459",2229,72,7,47,"satisficer"
461 | "460",3262,12,7,49,"satisficer"
462 | "461",3608,24,5,65,"maximizer"
463 | "462",2804,60,6,60,"maximizer"
464 | "463",3841,60,7,53,"maximizer"
465 | "464",789,24,6,59,"maximizer"
466 | "465",2553,12,4,75,"maximizer"
467 | "466",2517,40,7,33,"satisficer"
468 | "467",1252,72,6,40,"satisficer"
469 | "468",2746,40,7,39,"satisficer"
470 | "469",3545,72,5,75,"maximizer"
471 | "470",3054,12,6,54,"maximizer"
472 | "471",330,60,1,55,"maximizer"
473 | "472",2724,50,7,31,"satisficer"
474 | "473",2811,24,5,57,"maximizer"
475 | "474",3249,60,6,60,"maximizer"
476 | "475",2797,40,5,44,"satisficer"
477 | "476",2427,50,4,66,"maximizer"
478 | "477",3855,60,7,49,"satisficer"
479 | "478",2713,60,7,91,"maximizer"
480 | "479",1770,50,7,46,"satisficer"
481 | "480",3617,72,7,56,"maximizer"
482 | "481",2773,72,6,46,"satisficer"
483 | "482",3187,24,7,50,"satisficer"
484 | "483",2861,50,7,71,"maximizer"
485 | "484",2893,60,4,54,"maximizer"
486 | "485",2489,50,6,39,"satisficer"
487 | "486",2953,40,6,52,"satisficer"
488 | "487",3280,12,7,32,"satisficer"
489 | "488",2494,50,6,46,"satisficer"
490 | "489",2694,60,7,52,"satisficer"
491 | "490",3853,40,7,52,"satisficer"
492 | "491",3425,40,7,64,"maximizer"
493 | "492",3243,72,7,53,"maximizer"
494 | "493",3443,24,7,52,"satisficer"
495 | "494",3181,12,7,44,"satisficer"
496 | "495",2785,50,7,71,"maximizer"
497 | "496",3533,50,6,50,"satisficer"
498 | "497",2457,72,6,48,"satisficer"
499 | "498",2564,12,7,66,"maximizer"
500 | "499",2680,40,7,50,"satisficer"
501 | "500",2695,24,7,61,"maximizer"
502 | "501",3434,60,7,62,"maximizer"
503 | "502",2283,12,6,74,"maximizer"
504 | "503",2783,72,7,50,"satisficer"
505 | "504",3375,12,7,51,"satisficer"
506 | "505",3255,40,5,57,"maximizer"
507 | "506",2819,40,6,59,"maximizer"
508 | "507",2849,60,7,45,"satisficer"
509 | "508",3677,60,7,37,"satisficer"
510 | "509",3024,12,4,43,"satisficer"
511 | "510",2855,12,6,45,"satisficer"
512 | "511",3388,40,6,54,"maximizer"
513 | "512",2448,12,6,40,"satisficer"
514 | "513",3103,72,5,32,"satisficer"
515 | "514",2540,40,6,57,"maximizer"
516 | "515",2805,50,6,43,"satisficer"
517 | "516",3131,72,7,53,"maximizer"
518 | "517",3532,72,7,67,"maximizer"
519 | "518",2000,60,7,57,"maximizer"
520 | "519",3385,12,5,49,"satisficer"
521 | "520",3016,50,6,59,"maximizer"
522 | "521",3440,24,4,30,"satisficer"
523 | "522",2786,12,7,36,"satisficer"
524 | "523",2577,40,7,60,"maximizer"
525 | "524",3286,12,6,58,"maximizer"
526 | "525",2452,40,7,49,"satisficer"
527 | "526",3293,50,6,72,"maximizer"
528 | "527",3781,24,5,52,"satisficer"
529 | "528",3211,24,6,65,"maximizer"
530 | "529",3226,40,7,35,"satisficer"
531 | "530",2779,50,7,57,"maximizer"
532 | "531",2612,72,7,48,"satisficer"
533 | "532",3400,50,6,41,"satisficer"
534 | "533",455,40,6,68,"maximizer"
535 | "534",2905,50,7,30,"satisficer"
536 | "535",3832,50,5,62,"maximizer"
537 | "536",3460,24,5,80,"maximizer"
538 | "537",3281,12,6,49,"satisficer"
539 | "538",1015,12,7,43,"satisficer"
540 | "539",3362,60,6,53,"maximizer"
541 | "540",3338,40,6,57,"maximizer"
542 | "541",2941,50,6,55,"maximizer"
543 | "542",2504,50,7,72,"maximizer"
544 | "543",2197,60,7,48,"satisficer"
545 | "544",3254,40,7,52,"satisficer"
546 | "545",3435,60,7,66,"maximizer"
547 | "546",3144,12,7,55,"maximizer"
548 | "547",784,60,7,66,"maximizer"
549 | "548",3303,60,6,54,"maximizer"
550 | "549",2492,72,5,56,"maximizer"
551 | "550",3307,12,6,47,"satisficer"
552 | "551",2841,50,6,40,"satisficer"
553 | "552",3698,60,7,32,"satisficer"
554 | "553",3439,60,6,72,"maximizer"
555 | "554",3177,50,6,56,"maximizer"
556 | "555",3333,72,5,71,"maximizer"
557 | "556",2545,72,5,45,"satisficer"
558 | "557",2792,24,7,62,"maximizer"
559 | "558",3315,24,6,42,"satisficer"
560 | "559",2955,72,6,55,"maximizer"
561 | "560",3083,50,7,61,"maximizer"
562 | "561",2887,72,6,54,"maximizer"
563 | "562",2478,60,5,61,"maximizer"
564 | "563",3170,12,6,77,"maximizer"
565 | "564",2744,60,6,52,"satisficer"
566 | "565",2537,50,7,65,"maximizer"
567 | "566",3285,50,7,49,"satisficer"
568 | "567",3680,40,6,57,"maximizer"
569 | "568",2529,50,6,58,"maximizer"
570 | "569",3576,72,7,43,"satisficer"
571 | "570",3706,72,7,45,"satisficer"
572 | "571",2870,72,7,64,"maximizer"
573 | "572",3371,60,7,72,"maximizer"
574 | "573",3459,40,6,46,"satisficer"
575 | "574",3645,50,4,80,"maximizer"
576 | "575",3571,50,7,59,"maximizer"
577 | "576",2595,60,7,43,"satisficer"
578 | "577",2212,60,7,60,"maximizer"
579 | "578",3348,60,7,57,"maximizer"
580 | "579",1971,60,6,50,"satisficer"
581 | "580",3244,72,7,34,"satisficer"
582 | "581",3182,72,6,43,"satisficer"
583 | "582",2850,60,3,52,"satisficer"
584 | "583",629,12,6,55,"maximizer"
585 | "584",2772,40,7,58,"maximizer"
586 | "585",3860,24,5,59,"maximizer"
587 | "586",2568,40,7,67,"maximizer"
588 | "587",3724,72,6,62,"maximizer"
589 | "588",1289,12,7,52,"satisficer"
590 | "589",2871,24,4,77,"maximizer"
591 | "590",2586,50,6,55,"maximizer"
592 | "591",2608,50,6,69,"maximizer"
593 | "592",3398,12,6,53,"maximizer"
594 | "593",2604,24,6,66,"maximizer"
595 | "594",2675,24,6,46,"satisficer"
596 | "595",3110,12,7,33,"satisficer"
597 | "596",2527,12,6,42,"satisficer"
598 | "597",415,40,7,78,"maximizer"
599 | "598",3416,24,6,62,"maximizer"
600 | "599",2765,12,7,45,"satisficer"
601 | "600",2587,12,7,66,"maximizer"
602 | "601",3381,12,4,67,"maximizer"
603 | "602",2618,72,6,60,"maximizer"
604 | "603",3546,60,4,46,"satisficer"
605 | "604",2858,60,7,62,"maximizer"
606 | "605",2837,24,6,54,"maximizer"
607 | "606",2673,50,7,66,"maximizer"
608 | "607",3026,24,6,49,"satisficer"
609 | "608",3762,60,7,54,"maximizer"
610 | "609",1980,60,6,58,"maximizer"
611 | "610",3276,40,5,70,"maximizer"
612 | "611",3626,40,7,62,"maximizer"
613 |
--------------------------------------------------------------------------------
/datasets/feel-the-movement_simulated-data.csv:
--------------------------------------------------------------------------------
1 | "","participant","motion","tlx_scale","value"
2 | "1",1,"motion","effort",62.6771370559256
3 | "2",2,"motion","effort",63.4354251165914
4 | "3",3,"motion","effort",76.2658933484645
5 | "4",4,"motion","effort",66.3820234657209
6 | "5",5,"motion","effort",65.6975218414369
7 | "6",6,"motion","effort",74.03741634514
8 | "7",7,"motion","effort",62.7278952274524
9 | "8",8,"motion","effort",64.4763705067135
10 | "9",9,"motion","effort",60.5876560715567
11 | "10",10,"motion","effort",67.2479168232032
12 | "11",11,"motion","effort",68.7483313902238
13 | "12",12,"motion","effort",63.5102955005729
14 | "13",13,"motion","effort",53.16100398592
15 | "14",14,"motion","effort",43.7113566640125
16 | "15",15,"motion","effort",63.0529421613001
17 | "16",16,"motion","effort",52.2808144957658
18 | "17",1,"motion","frustration",61.6201519017405
19 | "18",2,"motion","frustration",67.3077229193698
20 | "19",3,"motion","frustration",38.4557544676673
21 | "20",4,"motion","frustration",40.9211178284298
22 | "21",5,"motion","frustration",66.8935737234158
23 | "22",6,"motion","frustration",70.470560878121
24 | "23",7,"motion","frustration",75.7511739437905
25 | "24",8,"motion","frustration",45.9269502385398
26 | "25",9,"motion","frustration",31.8621980876999
27 | "26",10,"motion","frustration",55.114263582379
28 | "27",11,"motion","frustration",25.5436395548097
29 | "28",12,"motion","frustration",43.835108881354
30 | "29",13,"motion","frustration",51.5842570093871
31 | "30",14,"motion","frustration",32.8405486619909
32 | "31",15,"motion","frustration",46.5531182429116
33 | "32",16,"motion","frustration",33.3198600783933
34 | "33",1,"motion","mental_demand",65.1995519130071
35 | "34",2,"motion","mental_demand",65.6961609874982
36 | "35",3,"motion","mental_demand",59.3961427371377
37 | "36",4,"motion","mental_demand",69.8805770972607
38 | "37",5,"motion","mental_demand",77.6740721949966
39 | "38",6,"motion","mental_demand",70.4827130196958
40 | "39",7,"motion","mental_demand",58.5390917473144
41 | "40",8,"motion","mental_demand",73.5908307345228
42 | "41",9,"motion","mental_demand",75.1373820502899
43 | "42",10,"motion","mental_demand",67.9491896417236
44 | "43",11,"motion","mental_demand",67.3894228990569
45 | "44",12,"motion","mental_demand",67.8700868540447
46 | "45",13,"motion","mental_demand",70.6149403360524
47 | "46",14,"motion","mental_demand",70.1159547762453
48 | "47",15,"motion","mental_demand",70.5115723937418
49 | "48",16,"motion","mental_demand",69.9523106174122
50 | "49",1,"motion","performance",53.8756777801028
51 | "50",2,"motion","performance",61.7459940639232
52 | "51",3,"motion","performance",36.7482036856618
53 | "52",4,"motion","performance",58.4828245097124
54 | "53",5,"motion","performance",65.7588288541138
55 | "54",6,"motion","performance",57.5078005553597
56 | "55",7,"motion","performance",57.7940598138312
57 | "56",8,"motion","performance",58.7180697964181
58 | "57",9,"motion","performance",40.0213269023191
59 | "58",10,"motion","performance",67.1339510428577
60 | "59",11,"motion","performance",52.3945502146542
61 | "60",12,"motion","performance",28.554922625903
62 | "61",13,"motion","performance",52.5881457436131
63 | "62",14,"motion","performance",58.6607454744935
64 | "63",15,"motion","performance",45.7458331637738
65 | "64",16,"motion","performance",44.2690657732627
66 | "65",1,"motion","physical_demand",44.3822856923961
67 | "66",2,"motion","physical_demand",42.394630354364
68 | "67",3,"motion","physical_demand",31.3837039265605
69 | "68",4,"motion","physical_demand",20.9895777776461
70 | "69",5,"motion","physical_demand",50.3030074925543
71 | "70",6,"motion","physical_demand",36.1740268714536
72 | "71",7,"motion","physical_demand",78.5673363887555
73 | "72",8,"motion","physical_demand",43.8709626597258
74 | "73",9,"motion","physical_demand",47.6785917730261
75 | "74",10,"motion","physical_demand",54.1910169249521
76 | "75",11,"motion","physical_demand",24.1474535522667
77 | "76",12,"motion","physical_demand",43.1677762007205
78 | "77",13,"motion","physical_demand",25.6418001648926
79 | "78",14,"motion","physical_demand",33.2263705309118
80 | "79",15,"motion","physical_demand",53.8940040112754
81 | "80",16,"motion","physical_demand",45.0274556784988
82 | "81",1,"motion","temporal_demand",64.7585219029532
83 | "82",2,"motion","temporal_demand",67.8707793375568
84 | "83",3,"motion","temporal_demand",68.7232160707118
85 | "84",4,"motion","temporal_demand",50.5731151746078
86 | "85",5,"motion","temporal_demand",59.9556328636728
87 | "86",6,"motion","temporal_demand",56.9228740267909
88 | "87",7,"motion","temporal_demand",64.8387476229165
89 | "88",8,"motion","temporal_demand",66.132502668041
90 | "89",9,"motion","temporal_demand",67.516716381199
91 | "90",10,"motion","temporal_demand",57.7545036553703
92 | "91",11,"motion","temporal_demand",61.6615295154681
93 | "92",12,"motion","temporal_demand",63.2242521256356
94 | "93",13,"motion","temporal_demand",68.0329269256004
95 | "94",14,"motion","temporal_demand",57.6838375149577
96 | "95",15,"motion","temporal_demand",58.1614300601278
97 | "96",16,"motion","temporal_demand",46.1894141543904
98 | "97",1,"no_motion","effort",58.5614249963152
99 | "98",2,"no_motion","effort",64.699364050644
100 | "99",3,"no_motion","effort",56.1285571312222
101 | "100",4,"no_motion","effort",57.5043251549015
102 | "101",5,"no_motion","effort",72.4094233483514
103 | "102",6,"no_motion","effort",63.5574861451147
104 | "103",7,"no_motion","effort",57.7852575831209
105 | "104",8,"no_motion","effort",66.8122813853142
106 | "105",9,"no_motion","effort",63.0053416399275
107 | "106",10,"no_motion","effort",59.3436305871179
108 | "107",11,"no_motion","effort",72.4897883385223
109 | "108",12,"no_motion","effort",58.5363008932268
110 | "109",13,"no_motion","effort",60.2628165237742
111 | "110",14,"no_motion","effort",71.1814214109012
112 | "111",15,"no_motion","effort",69.4400907968164
113 | "112",16,"no_motion","effort",72.2824900147297
114 | "113",1,"no_motion","frustration",38.6685024331998
115 | "114",2,"no_motion","frustration",19.5940822219001
116 | "115",3,"no_motion","frustration",27.1474499149157
117 | "116",4,"no_motion","frustration",53.7579266113433
118 | "117",5,"no_motion","frustration",41.3116380443163
119 | "118",6,"no_motion","frustration",46.5499078783941
120 | "119",7,"no_motion","frustration",51.4600640247224
121 | "120",8,"no_motion","frustration",60.7924116922647
122 | "121",9,"no_motion","frustration",63.6737578699399
123 | "122",10,"no_motion","frustration",72.2826117461159
124 | "123",11,"no_motion","frustration",33.9419530392239
125 | "124",12,"no_motion","frustration",61.4898151714451
126 | "125",13,"no_motion","frustration",55.046118963894
127 | "126",14,"no_motion","frustration",38.2602047315632
128 | "127",15,"no_motion","frustration",48.2965876008443
129 | "128",16,"no_motion","frustration",47.726968055917
130 | "129",1,"no_motion","mental_demand",61.3914073808808
131 | "130",2,"no_motion","mental_demand",71.3401568567912
132 | "131",3,"no_motion","mental_demand",59.9000047044046
133 | "132",4,"no_motion","mental_demand",60.9500381088238
134 | "133",5,"no_motion","mental_demand",66.5278306263365
135 | "134",6,"no_motion","mental_demand",64.3631198045993
136 | "135",7,"no_motion","mental_demand",57.9692286938047
137 | "136",8,"no_motion","mental_demand",78.9522439205533
138 | "137",9,"no_motion","mental_demand",74.0910105261259
139 | "138",10,"no_motion","mental_demand",63.9942276948203
140 | "139",11,"no_motion","mental_demand",71.4781760333647
141 | "140",12,"no_motion","mental_demand",56.4007211104531
142 | "141",13,"no_motion","mental_demand",72.29144173273
143 | "142",14,"no_motion","mental_demand",71.8361674389643
144 | "143",15,"no_motion","mental_demand",77.5489661048835
145 | "144",16,"no_motion","mental_demand",70.965259262464
146 | "145",1,"no_motion","performance",38.7432612359684
147 | "146",2,"no_motion","performance",56.2074461601137
148 | "147",3,"no_motion","performance",63.0138244963009
149 | "148",4,"no_motion","performance",40.8894393148272
150 | "149",5,"no_motion","performance",58.2739402425559
151 | "150",6,"no_motion","performance",52.2537180805095
152 | "151",7,"no_motion","performance",63.5728827865647
153 | "152",8,"no_motion","performance",57.7975034803145
154 | "153",9,"no_motion","performance",38.4347005303272
155 | "154",10,"no_motion","performance",55.8594952584844
156 | "155",11,"no_motion","performance",64.3387071322576
157 | "156",12,"no_motion","performance",27.7535568520894
158 | "157",13,"no_motion","performance",49.8610842757746
159 | "158",14,"no_motion","performance",41.2963735635064
160 | "159",15,"no_motion","performance",64.3302098050749
161 | "160",16,"no_motion","performance",63.3738567853306
162 | "161",1,"no_motion","physical_demand",48.5022574023903
163 | "162",2,"no_motion","physical_demand",56.5879485585391
164 | "163",3,"no_motion","physical_demand",32.8479713910682
165 | "164",4,"no_motion","physical_demand",33.2010798560281
166 | "165",5,"no_motion","physical_demand",48.6489109280486
167 | "166",6,"no_motion","physical_demand",30.1677238807114
168 | "167",7,"no_motion","physical_demand",15.3014055671121
169 | "168",8,"no_motion","physical_demand",38.2957947932656
170 | "169",9,"no_motion","physical_demand",29.8803937380322
171 | "170",10,"no_motion","physical_demand",27.4228684319627
172 | "171",11,"no_motion","physical_demand",34.0094756756381
173 | "172",12,"no_motion","physical_demand",19.2195937995676
174 | "173",13,"no_motion","physical_demand",36.6178038330854
175 | "174",14,"no_motion","physical_demand",30.8737385132687
176 | "175",15,"no_motion","physical_demand",47.4495966274307
177 | "176",16,"no_motion","physical_demand",35.9334370038512
178 | "177",1,"no_motion","temporal_demand",60.9615011891964
179 | "178",2,"no_motion","temporal_demand",65.7747207347715
180 | "179",3,"no_motion","temporal_demand",54.3904375394031
181 | "180",4,"no_motion","temporal_demand",70.6459092694736
182 | "181",5,"no_motion","temporal_demand",64.6354579889539
183 | "182",6,"no_motion","temporal_demand",69.7955333793362
184 | "183",7,"no_motion","temporal_demand",63.7476922150155
185 | "184",8,"no_motion","temporal_demand",64.7973511950388
186 | "185",9,"no_motion","temporal_demand",66.982649980603
187 | "186",10,"no_motion","temporal_demand",57.9476773540266
188 | "187",11,"no_motion","temporal_demand",74.3224239520792
189 | "188",12,"no_motion","temporal_demand",65.611152777591
190 | "189",13,"no_motion","temporal_demand",63.7785493137214
191 | "190",14,"no_motion","temporal_demand",59.6897169551303
192 | "191",15,"no_motion","temporal_demand",69.1993772354949
193 | "192",16,"no_motion","temporal_demand",63.7198489201647
194 |
--------------------------------------------------------------------------------
/datasets/stigmatized_campaigns_simulated-data.csv:
--------------------------------------------------------------------------------
1 | "","participant","attitude","order","persuasiveness","empathy"
2 | "1",1,"support","before",4.40573455089864,3.89829569417854
3 | "2",2,"support","before",5.73009162749441,3.63916980722547
4 | "3",3,"support","before",5.13983264589146,2.31759521215623
5 | "4",4,"support","before",3.28325535163691,4.2863938522029
6 | "5",5,"support","before",2.6702387860235,3.48695273941219
7 | "6",6,"support","before",3.80442011852061,3.36107961746716
8 | "7",7,"support","before",4.04852092501442,3.5999877262985
9 | "8",8,"support","before",3.60740761836469,5.20765809139836
10 | "9",9,"support","before",4.96716329694572,3.19667979166351
11 | "10",10,"support","before",3.00984202321901,3.24574042282484
12 | "11",11,"support","before",2.71857802020707,4.401826939733
13 | "12",12,"support","before",3.95702947304393,2.30384720967563
14 | "13",13,"support","before",1.6326318467114,4.59699075452465
15 | "14",14,"support","before",3.48812747845937,5.65821578887246
16 | "15",15,"support","before",5.3772858657999,4.1045331930034
17 | "16",16,"support","before",2.07124933759269,2.25624252538731
18 | "17",17,"support","before",3.94834414343596,3.85216606783078
19 | "18",18,"support","before",3.78466822695807,3.0637306150288
20 | "19",19,"support","before",2.87183112635898,4.23705606263923
21 | "20",20,"support","before",1.82981883912361,3.64173725107282
22 | "21",21,"support","before",4.41562687567857,4.02352485737185
23 | "22",22,"support","before",2.98636453830791,3.11353736359473
24 | "23",23,"support","before",2.31369888756975,3.06429014917687
25 | "24",24,"support","before",3.29600639760319,3.70288171567384
26 | "25",25,"support","before",3.1422319991402,2.73986655158695
27 | "26",26,"oppose","before",2.70289663267111,3.31535050426009
28 | "27",27,"oppose","before",4.86976299399937,3.30981443993623
29 | "28",28,"oppose","before",1.75111048739079,1.26886028290927
30 | "29",29,"oppose","before",4.82878313846614,3.1842382024408
31 | "30",30,"oppose","before",3.35000931386195,3.60723816093571
32 | "31",31,"oppose","before",6.20794873026476,3.75166370350341
33 | "32",32,"oppose","before",4.69091296237086,3.37451851145866
34 | "33",33,"oppose","before",4.08316338830329,3.40279418681068
35 | "34",34,"oppose","before",3.31458752093788,2.51468529768596
36 | "35",35,"oppose","before",6.97669442648597,2.86035223864848
37 | "36",36,"oppose","before",3.00841661882744,3.33653109469252
38 | "37",37,"oppose","before",3.18419262164537,4.82062688154548
39 | "38",38,"oppose","before",4.96941219408283,1.27308752228665
40 | "39",39,"oppose","before",2.92578656710011,2.97837989709837
41 | "40",40,"oppose","before",3.87792735611406,0.903726365687009
42 | "41",41,"oppose","before",3.30829261761016,2.43491933490072
43 | "42",42,"oppose","before",3.96754076943,5.08259618331271
44 | "43",43,"oppose","before",2.68787234714274,2.424575409719
45 | "44",44,"oppose","before",1.83716206277372,2.80755583942817
46 | "45",45,"oppose","before",4.43377392427531,1.61042083059434
47 | "46",46,"oppose","before",4.02172216658175,2.96534691171415
48 | "47",47,"oppose","before",4.14142591343767,3.94684014524908
49 | "48",48,"oppose","before",2.96776692307587,2.19538928571236
50 | "49",49,"oppose","before",3.32007335407847,3.40343555795276
51 | "50",50,"oppose","before",4.73281399612096,3.59944932362922
52 | "51",51,"oppose","before",4.39457829507545,3.90979678016095
53 | "52",52,"oppose","before",1.50537267787595,5.14780710772723
54 | "53",1,"support","after",6.00486956549221,4.62239312343345
55 | "54",2,"support","after",4.59584395452931,3.76215675555703
56 | "55",3,"support","after",4.11958565125179,4.7780622342035
57 | "56",4,"support","after",5.73047442817776,4.17604432845596
58 | "57",5,"support","after",6.04130988239085,3.34823570880482
59 | "58",6,"support","after",5.1167068473927,4.9156872193279
60 | "59",7,"support","after",7.66399217871838,4.17638215867229
61 | "60",8,"support","after",5.63696066414526,5.72330328717527
62 | "61",9,"support","after",4.14926207707465,6.19617062669002
63 | "62",10,"support","after",6.24596311609874,3.83792542175314
64 | "63",11,"support","after",5.21344815836189,4.25006545689576
65 | "64",12,"support","after",3.72285662677104,7.87705734028252
66 | "65",13,"support","after",3.36819379876267,3.51488167366069
67 | "66",14,"support","after",4.04772616957932,4.32987922446406
68 | "67",15,"support","after",6.1729909818432,5.73565171201996
69 | "68",16,"support","after",3.32926156178592,4.66310619896197
70 | "69",17,"support","after",4.60756738336794,4.77864728078604
71 | "70",18,"support","after",5.00379826465122,4.87565012560552
72 | "71",19,"support","after",3.82124270910706,4.61857579354836
73 | "72",20,"support","after",4.9640476592055,3.21249651219853
74 | "73",21,"support","after",6.39607726134969,4.5219555890663
75 | "74",22,"support","after",4.44138024697706,4.82037924824058
76 | "75",23,"support","after",4.02393168187947,4.45095300314973
77 | "76",24,"support","after",4.02752151254139,4.18112720355392
78 | "77",25,"support","after",4.55498761854497,4.13321277349271
79 | "78",26,"oppose","after",3.4454529630567,2.22918869662918
80 | "79",27,"oppose","after",4.22624045580722,4.15000529557309
81 | "80",28,"oppose","after",5.69724172159873,3.25569679509731
82 | "81",29,"oppose","after",5.74106831818266,3.08859063878211
83 | "82",30,"oppose","after",1.8152787713683,2.32702904348417
84 | "83",31,"oppose","after",5.71882997237842,1.16797941728631
85 | "84",32,"oppose","after",3.78635363713166,2.74976474417464
86 | "85",33,"oppose","after",2.55151914792261,2.50498467560064
87 | "86",34,"oppose","after",3.42759677802461,0.834672289763792
88 | "87",35,"oppose","after",2.17260751602258,3.80350186944327
89 | "88",36,"oppose","after",2.18009493701987,2.65440817762918
90 | "89",37,"oppose","after",5.71354722299045,4.43194036032053
91 | "90",38,"oppose","after",2.19249939431504,2.72964599398318
92 | "91",39,"oppose","after",5.06291739611731,2.64781813087259
93 | "92",40,"oppose","after",5.38920634265136,2.64041083712852
94 | "93",41,"oppose","after",2.45926999269871,1.5927993445083
95 | "94",42,"oppose","after",4.90025733792512,3.74624881011539
96 | "95",43,"oppose","after",3.98899922267144,2.09049436273894
97 | "96",44,"oppose","after",4.31140994525199,3.55233604501274
98 | "97",45,"oppose","after",6.89653197977992,2.97980036566709
99 | "98",46,"oppose","after",2.08780943334144,2.23728502289986
100 | "99",47,"oppose","after",3.98847042995348,2.97282437454049
101 | "100",48,"oppose","after",4.39322474871176,1.22900098662931
102 | "101",49,"oppose","after",5.35755261568502,1.67473946258257
103 | "102",50,"oppose","after",3.29999493498857,0.949884504886524
104 | "103",51,"oppose","after",4.28009849622172,3.47890509762928
105 | "104",52,"oppose","after",3.9959262881833,2.48004465702101
106 |
--------------------------------------------------------------------------------
/images/generic_2bar_chart.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cdphelan/bayesian-template/1c8973cc9dfd07b7a1def207f7cf59a258eb41b5/images/generic_2bar_chart.png
--------------------------------------------------------------------------------
/images/generic_2line-cont_chart.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cdphelan/bayesian-template/1c8973cc9dfd07b7a1def207f7cf59a258eb41b5/images/generic_2line-cont_chart.png
--------------------------------------------------------------------------------
/images/generic_2line_chart.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cdphelan/bayesian-template/1c8973cc9dfd07b7a1def207f7cf59a258eb41b5/images/generic_2line_chart.png
--------------------------------------------------------------------------------
/images/generic_2line_chart_old.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cdphelan/bayesian-template/1c8973cc9dfd07b7a1def207f7cf59a258eb41b5/images/generic_2line_chart_old.png
--------------------------------------------------------------------------------
/images/generic_bar_chart.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cdphelan/bayesian-template/1c8973cc9dfd07b7a1def207f7cf59a258eb41b5/images/generic_bar_chart.png
--------------------------------------------------------------------------------
/images/generic_line-ord_chart.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cdphelan/bayesian-template/1c8973cc9dfd07b7a1def207f7cf59a258eb41b5/images/generic_line-ord_chart.png
--------------------------------------------------------------------------------
/images/generic_line_chart.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cdphelan/bayesian-template/1c8973cc9dfd07b7a1def207f7cf59a258eb41b5/images/generic_line_chart.png
--------------------------------------------------------------------------------
/plotting_functions.R:
--------------------------------------------------------------------------------
1 | ## PLOTTING FUNCTIONS FOR R TEMPLATES ##
2 | #This file contains the default code for the plots. If needed, the plot aesthetics can be customized here.
3 |
4 | #################### TEMPLATE 1: 1var categorical bar chart ####################
5 | # This function generates the HOPs (animated plots) for both the prior and posterior plots
6 | HOPs_plot_1 = function(data) {
7 | ggplot(data, aes(x = x, y = .value)) + #do not change
8 | geom_bar(stat='identity', width=0.5) + #do not change from stat='identity'. Fill and line aesthetics may be modified here, see ggplot2 documentation
9 | transition_states(.draw, transition_length = 1, state_length = 1) + # gganimate code to animate the plots. Do not change
10 | theme(axis.text.x = element_text(angle = 45, hjust = 1)) + #rotates the x-axis text for readability
11 | coord_cartesian(ylim = c(min(mydata$y, na.rm=T), max(mydata$y, na.rm=T))) + # sets axis limits
12 | # scale_x_discrete(limits=c("before","after")) + #manually set the order of the x-axis levels
13 | labs(x=x_lab, y=y_lab) # axes labels
14 | }
15 |
16 | # This function generates the 5 static plots of prior draws.
17 | static_prior_plot_1 = function(prior_draws) {
18 | ggplot(prior_draws, aes(x = x, y = .value)) +
19 | geom_bar(stat='identity') +
20 | facet_grid(cols = vars(.draw)) +
21 | # coord_cartesian(ylim = c(min(mydata$y, na.rm=T), max(mydata$y, na.rm=T))) + # sets axis limits
22 | theme(strip.background = element_blank(),
23 | strip.text.y = element_blank(),
24 | axis.text.x = element_text(angle = 45, hjust = 1),
25 | plot.title = element_text(hjust = 0.5)) +
26 | labs(x=x_lab, y=y_lab) + # axes labels
27 | ggtitle("Five sample draws from the priors")
28 | }
29 |
30 | # This function generates the static plots of posterior draws.
31 | static_post_plot_1 = function(fit) {
32 | ggplot(fit, aes(x = x, y = .value)) +
33 | geom_bar(stat ='identity', width=0.5) +
34 | geom_errorbar(aes(ymin = .lower, ymax = .upper), width = .2) +
35 | coord_cartesian(ylim = c(min(mydata$y, na.rm=T), max(mydata$y, na.rm=T))) + # sets axis limits
36 | theme(strip.background = element_blank(),
37 | strip.text.y = element_blank(),
38 | axis.text.x = element_text(angle = 45, hjust = 1),
39 | plot.title = element_text(hjust = 0.5)) +
40 | labs(x=x_lab, y=y_lab) # axes labels
41 | }
42 |
43 |
44 | #################### TEMPLATE 2: 1var ordinal line chart ####################
45 | # This function generates the HOPs (animated plots) for both the prior and posterior plots
46 | HOPs_plot_2 = function(data) {
47 | ggplot(data, aes(x = x, y = .value, group = 1)) + #do not change
48 | geom_line() + #do not change
49 | geom_point() + #do not change
50 | transition_states(.draw, transition_length = 1, state_length = 1) + # gganimate code to animate the plots. Do not change
51 | theme(axis.text.x = element_text(angle = 45, hjust = 1)) + #rotates the x-axis text for readability
52 | coord_cartesian(ylim = c(min(mydata$y, na.rm=T), max(mydata$y, na.rm=T))) + # sets axis limits
53 | # scale_x_discrete(limits=c("before","after")) + #manually set the order of the x-axis levels
54 | labs(x=x_lab, y=y_lab) # axes labels
55 | }
56 |
57 | # This function generates the 5 static plots of prior draws.
58 | static_prior_plot_2 = function(prior_draws) {
59 | ggplot(prior_draws, aes(x = x, y = .value, group=.draw)) +
60 | geom_line() +
61 | geom_point() +
62 | facet_grid(cols = vars(.draw)) +
63 | # coord_cartesian(ylim = c(min(mydata$y, na.rm=T), max(mydata$y, na.rm=T))) + # sets axis limits
64 | theme(strip.background = element_blank(),
65 | strip.text.y = element_blank(),
66 | axis.text.x = element_text(angle = 45, hjust = 1),
67 | plot.title = element_text(hjust = 0.5)) +
68 | labs(x=x_lab, y=y_lab) + # axes labels
69 | ggtitle("Five sample draws from the priors")
70 | }
71 |
72 | # This function generates the static plots of posterior draws with error bars.
73 | static_post_plot_2 = function(fit) {
74 | ggplot(fit, aes(x = x, y = .value, group=1)) +
75 | geom_line() +
76 | geom_errorbar(aes(ymin = .lower, ymax = .upper), width = .2) +
77 | coord_cartesian(ylim = c(min(mydata$y, na.rm=T), max(mydata$y, na.rm=T))) + # sets axis limits
78 | labs(x=x_lab, y=y_lab)
79 | }
80 |
81 |
82 | #################### TEMPLATE 3: 1var continuous line chart ####################
83 | # This function generates the HOPs (animated plots) for both the prior and posterior plots
84 | HOPs_plot_3 = function(data) {
85 | ggplot(data, aes(x = x, y = .value)) + #do not change
86 | geom_line() + #do not change
87 | transition_states(.draw, transition_length = 1, state_length = 1) + # gganimate code to animate the plots. Do not change
88 | coord_cartesian(ylim = c(min(mydata$y, na.rm=T), max(mydata$y, na.rm=T))) + # sets axis limits
89 | labs(x=x_lab, y=y_lab) # axes labels
90 | }
91 |
92 | # This function generates the static plot of 100 prior draws.
93 | static_prior_plot_3 = function(fit) {
94 | ggplot(fit, aes(x = x, y = .value)) +
95 | geom_line(aes(group = .draw), alpha = .2) +
96 | labs(x=x_lab, y=y_lab) + # axes labels
97 | ggtitle("100 sample draws from the priors")
98 | }
99 |
100 | # This function generates the static plots of posterior draws with a confidence band.
101 | static_post_plot_3a = function(fit) {
102 | ggplot(fit, aes(x = x, y = .value)) +
103 | stat_lineribbon() +
104 | coord_cartesian(ylim = c(min(mydata$y, na.rm=T), max(mydata$y, na.rm=T))) + # sets axis limits
105 | scale_fill_brewer()
106 | }
107 |
108 | # This function generates the static plots of 100 overlaid posterior draws.
109 | static_post_plot_3b = function(fit) {
110 | ggplot(fit, aes(x = x, y = .value)) +
111 | geom_line(aes(group = .draw), alpha = .2) +
112 | coord_cartesian(ylim = c(min(mydata$y, na.rm=T), max(mydata$y, na.rm=T))) + # sets axis limits
113 | labs(x=x_lab, y=y_lab) # axes labels
114 | }
115 |
116 |
117 | #################### TEMPLATE 4: 2var categorical & categorical bar chart ####################
118 | # This function generates the HOPs (animated plots) for both the prior and posterior plots
119 | HOPs_plot_4 = function(data) {
120 | ggplot(data, aes(x = x1, y = .value, col = x2, fill = x2, group = x2)) + #do not change
121 | geom_bar(stat='identity', position='dodge') + #do not change from stat='identity'. Fill and line aesthetics may be modified here, see ggplot2 documentation
122 | transition_states(.draw, transition_length = 1, state_length = 1) + # gganimate code to animate the plots. Do not change
123 | theme(axis.text.x = element_text(angle = 45, hjust = 1)) + #rotates the x-axis text for readability
124 | coord_cartesian(ylim = c(min(mydata$y, na.rm=T), max(mydata$y, na.rm=T))) + # sets axis limits
125 | # scale_x_discrete(limits=c("before","after")) + #manually set the order of the x-axis levels
126 | labs(x=x_lab, y=y_lab) # axes labels
127 | }
128 |
129 | # This function generates the 5 static plots of prior draws.
130 | static_prior_plot_4 = function(prior_draws) {
131 | ggplot(prior_draws, aes(x = x1, y = .value, col = x2, fill = x2, group = x2)) +
132 | geom_bar(stat='identity', position='dodge') +
133 | facet_grid(cols = vars(.draw)) +
134 | # coord_cartesian(ylim = c(min(mydata$y, na.rm=T), max(mydata$y, na.rm=T))) + # sets axis limits
135 | theme(strip.background = element_blank(),
136 | strip.text.y = element_blank(),
137 | axis.text.x = element_text(angle = 45, hjust = 1),
138 | plot.title = element_text(hjust = 0.5)) +
139 | labs(x=x_lab, y=y_lab) + # axes labels
140 | ggtitle("Five sample draws from the priors")
141 | }
142 |
143 | # This function generates the static plots of posterior draws with error bars.
144 | static_post_plot_4 = function(fit) {
145 | ggplot(fit, aes(x = x1, y = .value, fill = x2, group = x2)) +
146 | geom_bar(stat='identity', position='dodge') +
147 | geom_errorbar(aes(ymin = .lower, ymax = .upper), position = position_dodge(width = .9), width = .2) +
148 | coord_cartesian(ylim = c(min(mydata$y, na.rm=T), max(mydata$y, na.rm=T))) + # sets axis limits
149 | labs(x=x_lab, y=y_lab)
150 | }
151 |
152 |
153 | #################### TEMPLATE 5: 2var categorical & ordinal line chart ####################
154 | # This function generates the HOPs (animated plots) for both the prior and posterior plots
155 | HOPS_plot_5 = function(data) {
156 | ggplot(data, aes(x = x1, y = .value, col = x2, fill = x2, group = x2)) + #do not change
157 | geom_bar(stat='identity', position='dodge') + #do not change
158 | transition_states(.draw, transition_length = 1, state_length = 1) + # gganimate code to animate the plots. Do not change
159 | theme(axis.text.x = element_text(angle = 45, hjust = 1)) + #rotates the x-axis text for readability
160 | coord_cartesian(ylim = c(min(mydata$y, na.rm=T), max(mydata$y, na.rm=T))) + # sets axis limits
161 | # scale_x_discrete(limits=c("before","after")) + #manually set the order of the x-axis levels
162 | labs(x=x_lab, y=y_lab) # axes labels
163 | }
164 |
165 | # This function generates a static plot of 5 prior draws, faceted by the second predictor (x2).
166 | static_prior_plot_5 = function(fit) {
167 | ggplot(fit, aes(x = x1, y = .value, col = x2, group = x2)) +
168 | geom_line(aes(group = .draw), alpha = .7) +
169 | geom_point(aes(group = .draw), alpha = .7) +
170 | facet_grid(cols = vars(x2)) +
171 | # coord_cartesian(ylim = c(min(mydata$y, na.rm=T), max(mydata$y, na.rm=T))) + # sets axis limits - CHANGE ME (optional)
172 | theme(axis.text.x = element_text(angle = 45, hjust = 1),
173 | plot.title = element_text(hjust = 0.5),
174 | legend.position="none") +
175 | labs(x=x_lab, y=y_lab) + # axes labels
176 | ggtitle("Five sample draws from the priors")
177 | }
178 |
179 | # This function generates a static plot of posterior draws with error bars.
180 | static_post_plot_5 = function(fit) {
181 | ggplot(fit, aes(x = x1, y = .value, col = x2, fill = x2, group = x2)) +
182 | geom_line() + #do not change
183 | geom_errorbar(aes(ymin = .lower, ymax = .upper), width = .2) +
184 | coord_cartesian(ylim = c(min(mydata$y, na.rm=T), max(mydata$y, na.rm=T))) + # sets axis limits
185 | labs(x=x_lab, y=y_lab)
186 | }
187 |
188 |
189 |
190 | #################### TEMPLATE 6: 2var continuous & categorical line chart ####################
191 | # This function generates the HOPs (animated plots) for both the prior and posterior plots
192 | HOPS_plot_6 = function(data) {
193 | ggplot(data, aes(x = x1, y = .value, col=x2, group=x2)) + #do not change
194 | geom_line() + #do not change
195 | transition_states(.draw, transition_length = 1, state_length = 1) + # gganimate code to animate the plots. Do not change
196 | coord_cartesian(ylim = c(min(mydata$y, na.rm=T), max(mydata$y, na.rm=T))) + # sets axis limits
197 | labs(x=x_lab, y=y_lab) # axes labels
198 | }
199 |
200 | # This function generates a static plot of 100 prior draws, faceted by the second predictor (x2).
201 | static_prior_plot_6 = function(prior_draws) {
202 | ggplot(prior_draws, aes(x = x1, y = .value, col = x2, group = x2)) +
203 | geom_line(aes(group = .draw), alpha = .2) +
204 | facet_grid(cols = vars(x2)) +
205 | # coord_cartesian(ylim = c(min(mydata$y, na.rm=T), max(mydata$y, na.rm=T))) + # sets axis limits
206 | theme(plot.title = element_text(hjust = 0.5),
207 | legend.position="none") +
208 | labs(x=x_lab, y=y_lab) + # axes labels
209 | ggtitle("100 sample draws from the priors")
210 | }
211 |
212 | # This function generates a static plot of posterior draws with an uncertainty band, faceted by the second predictor (x2).
213 | static_post_plot_6a = function(fit) {
214 | ggplot(fit, aes(x = x1, y = .value, group = x2)) +
215 | stat_lineribbon() +
216 | coord_cartesian(ylim = c(min(mydata$y, na.rm=T), max(mydata$y, na.rm=T))) + # sets axis limits
217 | scale_fill_brewer() +
218 | facet_grid(cols = vars(x2)) # comment this out to plot both x vars on the same plot
219 | }
220 |
221 | # This function generates a static plot of 100 overlaid posterior draws, faceted by the second predictor (x2).
222 | static_post_plot_6b = function(fit) {
223 | ggplot(fit, aes(x = x1, y = .value, col=x2, group=x2)) +
224 | coord_cartesian(ylim = c(min(mydata$y, na.rm=T), max(mydata$y, na.rm=T))) + # sets axis limits
225 | geom_line(aes(group = .draw), alpha = .2) +
226 | facet_grid(cols = vars(x2)) + # comment this out to plot both x vars on the same plot
227 | theme(plot.title = element_text(hjust = 0.5),
228 | legend.position="none") +
229 | labs(x=x_lab, y=y_lab) # axes labels
230 | }
231 |
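232 | #################### Example usage (not run) ####################
233 | # A minimal sketch of how these plotting functions are typically called from the
234 | # templates. Everything below is illustrative: it assumes a brms/rstanarm model
235 | # object named `fit` (hypothetical here), a data frame `mydata` with columns x and
236 | # y, and the globals `x_lab` / `y_lab`, all of which the functions above reference
237 | # directly.
238 | #
239 | # library(ggplot2)
240 | # library(dplyr)
241 | # library(modelr)      # data_grid(), seq_range()
242 | # library(tidybayes)   # add_fitted_draws() produces the .value and .draw columns
243 | # library(gganimate)   # transition_states(), animate()
244 | #
245 | # post_draws = mydata %>%
246 | #   data_grid(x = seq_range(x, n = 51)) %>%
247 | #   add_fitted_draws(fit, n = 100)                   # 100 posterior draws
248 | #
249 | # static_post_plot_3b(post_draws)                    # 100 overlaid posterior lines
250 | # animate(HOPs_plot_3(post_draws), nframes = 100)    # animated HOPs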
--------------------------------------------------------------------------------
/quizlet/check_understanding_priors.Rmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Check your understanding of how to set Bayesian priors"
3 | output: learnr::tutorial
4 | runtime: shiny_prerendered
5 | ---
6 |
7 | ```{r setup, include=FALSE}
8 | library(learnr)
9 | knitr::opts_chunk$set(echo = FALSE)
10 | ```
11 |
12 |
13 | ```{r where}
14 |
15 | question("What information can you use when setting priors? (Select all that apply)",
16 | answer("Means & sds from related previous studies", correct = TRUE),
17 | answer("Approximate means & sds from your current data"),
18 | answer("The range of possible values of the dependent variable", correct = TRUE),
19 | answer("Results from a previous analysis of your current data"),
20 | answer("Effect sizes from previous studies", correct = TRUE),
21 | correct = "Correct! You can use any information you could have known before running your current study.",
22 | incorrect = "Incorrect. Did you select option 2 or 4? Priors can only include information you could have known before running your current study. Pretend you don't know anything about your current data!",
23 | allow_retry = TRUE)
24 |
25 | ```
26 |
27 |
28 | ```{r example-a}
29 |
30 | question("Say you have conducted an experiment with 2 conditions (control and treatment). Your outcome variable is a continuous variable with a range [0,100]. You have no previous studies to use for your priors, so you are going to set weakly informative priors that just tell the model which values are implausible. What values do you use for the priors? Assume you're using our template, which requires priors to have normal distributions.",
31 | answer("Control mean: 50 | Control max value: 50 | Effect size mean: 0 | Effect sd: 50"),
32 | answer("Control mean: 50 | Control sd: 25 | Effect size mean: 0 | Effect sd: 25", correct = TRUE),
33 | answer("Control mean: 0 | Control sd: 50 | Effect size mean: 0 | Effect sd: 50"),
34 | answer("Control mean: 50 | Control sd: 25 | Effect size mean: 50 | Effect sd: 25"),
35 | incorrect = "Incorrect. Your control mean is the mean of the range (so here, mean = 50). In our template, which uses a normal distribution for priors, you want your max value to be 2 sds away from the mean: (100 - 50)/2 = 25. You don't have any info about an effect so you set the effect mean = 0 and the effect sd = control sd (25).",
36 | allow_retry = TRUE)
37 |
38 | ```
39 |
40 |
41 | ```{r example-b}
42 |
43 | question("Say you have conducted an experiment with 2 conditions (a control condition and 2 treatment conditions). Participants in the experiment were asked to rate a product from 0 to 5 stars. You have no previous literature to use to set priors, and you know from exploratory analysis that your study participants rated the product with an overall mean score of 3.5, and sd of 0.5. What values do you use for the priors? Assume you're using our template, which requires priors to have normal distributions.",
44 | answer("Control mean: 3.5 | Control sd: 0.5 | Effect size mean: 0 | Effect sd: 0.5"),
45 | answer("Control mean: 2.5 | Control sd: 2.5 | Effect size mean: 0 | Effect sd: 2.5"),
46 | answer("Control mean: 2.5 | Control sd: 1.25 | Effect size mean: 0 | Effect sd: 1.25", correct = TRUE),
47 | answer("Control mean: 3.5 | Control sd: 5 | Effect size mean: 0 | Effect sd: 5"),
48 | incorrect = "Incorrect. (Sort of) trick question - you shouldn't use any of the provided information about your current data to set priors. Your control mean is the mean of the range (so here, mean = 2.5). In our template, which uses a normal distribution for priors, you want your max value to be 2 sds away from the mean: (5 - 2.5)/2 = 1.25. You don't have any info about an effect so you set the effect mean = 0 and the effect sd = control sd (1.25).",
49 | allow_retry = TRUE)
50 |
51 | ```
52 |
53 |
54 | ```{r example-c}
55 |
56 | question("Say you have conducted an experiment with 2 conditions (control and treatment). Participants in the experiment were asked to rate a product from 0 to 5 stars. This time, you have previous literature to use to set priors, which says that the mean score is 3 in the control condition, with a sd of 0.25, and a mean score that is 0.75 points higher in the treatment condition, with an sd of 0.5. What values do you use for the priors? Assume you're using our template, which requires priors to have normal distributions.",
57 | answer("Control mean: 3 | Control sd: 0.25 | Effect size mean: 0.75 | Effect sd: 0.5", correct = TRUE),
58 | answer("Control mean: 3 | Control sd: 0.25 | Effect size mean: 3.75 | Effect sd: 0.5"),
59 | answer("Control mean: 2.5 | Control sd: 1.25 | Effect size mean: 0 | Effect sd: 1.25"),
60 | answer("Control mean: 0 | Control sd: 5 | Effect size mean: 0 | Effect sd: 5"),
61 | incorrect = "Incorrect. Your control mean and sd are taken from the previous literature: 3 and 0.25, respectively. The effect size is also taken straight from the literature: mean = 0.75 and sd = 0.5.",
62 | allow_retry = TRUE)
63 |
64 | ```
65 |
66 |
67 | ```{r example-d}
68 |
69 | question("Say you have gathered data on 25 different subreddits (message forums) and you want to compare the word counts of comments on each subreddit. Say that you know from a previously-published study that word counts of comments on subreddits follow a normal distribution with a mean of 85 and a sd of 35. You have no previous information about the individual subreddits you are using in your current study, so have no information about what the 'effect size' might be between the different subreddits. What values do you use for the priors?",
70 | answer("Control mean: 85 | Control sd: 35 | Effect size mean: 85 | Effect sd: 35"),
71 | answer("Control mean: 0 | Control sd: 35 | Effect size mean: 25 | Effect sd: 35"),
72 | answer("Control mean: 35 | Control sd: 85 | Effect size mean: 0 | Effect sd: 25"),
73 | answer("Control mean: 85 | Control sd: 35 | Effect size mean: 0 | Effect sd: 35", correct = TRUE),
74 | incorrect = "Incorrect. Your control mean and control sd are the mean and sd of the previous study. You have no information about how the 25 individual subreddits might be different from one another, so you set the 'effect size' mean = 0 and the effect sd = control sd (35).",
75 | allow_retry = TRUE)
76 |
77 | ```
78 |
79 |
80 | ```{r definition}
81 |
82 | question("When you set priors, you're giving your best guess of what the probability distribution of some unknown quantity would be, if you didn't know anything about the data from your current study. What is that unknown quantity?",
83 | answer("The parameters of the model", correct = TRUE),
84 | answer("The outcome variable"),
85 | correct = "Correct! Priors are the probability distributions of the *parameters*, not the outcome variable. Note though that the two values may be identical if your model has only main effects with dummy variables.",
86 | incorrect = "Incorrect. Priors are the probability distributions of the parameters, not the outcome variable. Note though that the two values may be identical if your model has only main effects with dummy variables.",
87 | allow_retry = TRUE)
88 |
89 | ```
90 |
91 | All done! You can head back to the analysis template.
92 |
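93 | One last quick reference before you go: the weakly informative rule of thumb used in the examples above, written as a few lines of R. (This is purely illustrative; the variable names and the 0-100 outcome range are just an example.)
94 | 
95 | ```{r rule-of-thumb, echo=TRUE}
96 | dv_min <- 0                                # smallest possible value of the outcome
97 | dv_max <- 100                              # largest possible value of the outcome
98 | control_mean <- (dv_min + dv_max) / 2      # prior mean = midpoint of the range
99 | control_sd <- (dv_max - control_mean) / 2  # max value is ~2 sds above the prior mean
100 | effect_mean <- 0                           # no prior information about the effect
101 | effect_sd <- control_sd                    # effect prior as wide as the control prior
102 | c(control_mean, control_sd, effect_mean, effect_sd)
103 | ```
104 | 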
--------------------------------------------------------------------------------