├── .gitattributes
├── .gitignore
├── README.md
├── email_instructions.txt
├── exercises
│   ├── exercise_0-pdf.Rmd
│   ├── exercise_0-pdf.pdf
│   ├── exercise_0.Rmd
│   ├── exercise_0.nb.html
│   ├── exercise_0_SOLUTION.Rmd
│   ├── exercise_0_SOLUTION.nb.html
│   ├── exercise_1-number_of_parameters.Rmd
│   ├── exercise_1-number_of_parameters.pdf
│   ├── exercise_2-pdf.Rmd
│   ├── exercise_2-pdf.pdf
│   ├── exercise_2.R
│   ├── exercise_2.Rmd
│   ├── exercise_2.nb.html
│   ├── exercise_2_SOLUTION.R
│   └── prepare_data.R
├── handout
│   ├── mixed_model_handout.Rmd
│   └── mixed_model_handout.pdf
├── part0-introduction
│   ├── .Rhistory
│   ├── figures
│   │   ├── RMarkdown-example.png
│   │   ├── ch-02-markdown-margin.png
│   │   ├── data-science.png
│   │   ├── github-workshop.png
│   │   ├── magrittr.png
│   │   ├── markdownChunk2.png
│   │   └── tidy-1.png
│   ├── introduction.R
│   ├── introduction.Rmd
│   ├── introduction.html
│   ├── libs
│   │   └── remark-css
│   │       ├── default-fonts.css
│   │       └── default.css
│   └── my-theme.css
├── part1-statistical-modeling-in-r
│   ├── .Rhistory
│   ├── cognition_cutout.png
│   ├── libs
│   │   └── remark-css
│   │       ├── default-fonts.css
│   │       └── default.css
│   ├── my-theme.css
│   ├── ssk16_dat_tutorial.rda
│   ├── statistical_modeling.R
│   ├── statistical_modeling.Rmd
│   ├── statistical_modeling.html
│   └── statistical_modeling_files
│       └── figure-html
│           ├── unnamed-chunk-10-1.svg
│           ├── unnamed-chunk-2-1.svg
│           ├── unnamed-chunk-3-1.svg
│           ├── unnamed-chunk-4-1.svg
│           ├── unnamed-chunk-5-1.svg
│           ├── unnamed-chunk-57-1.svg
│           ├── unnamed-chunk-58-1.svg
│           ├── unnamed-chunk-60-1.png
│           ├── unnamed-chunk-60-1.svg
│           ├── unnamed-chunk-62-1.png
│           ├── unnamed-chunk-7-1.svg
│           ├── unnamed-chunk-71-1.svg
│           ├── unnamed-chunk-76-1.svg
│           └── unnamed-chunk-8-1.svg
└── part2-mixed-models-in-r
    ├── .Rhistory
    ├── fitted_lmms.rda
    ├── libs
    │   └── remark-css
    │       ├── default-fonts.css
    │       └── default.css
    ├── mixed_models.R
    ├── mixed_models.Rmd
    ├── mixed_models.html
    ├── mixed_models_files
    │   └── figure-html
    │       ├── unnamed-chunk-10-1.png
    │       ├── unnamed-chunk-11-1.png
    │       ├── unnamed-chunk-12-1.png
    │       ├── unnamed-chunk-13-1.png
    │       ├── unnamed-chunk-14-1.png
    │       ├── unnamed-chunk-15-1.png
    │       ├── unnamed-chunk-16-1.png
    │       ├── unnamed-chunk-17-1.png
    │       ├── unnamed-chunk-19-1.png
    │       ├── unnamed-chunk-20-1.png
    │       ├── unnamed-chunk-21-1.png
    │       ├── unnamed-chunk-22-1.png
    │       ├── unnamed-chunk-24-1.svg
    │       ├── unnamed-chunk-27-1.png
    │       ├── unnamed-chunk-3-1.png
    │       ├── unnamed-chunk-30-1.png
    │       ├── unnamed-chunk-31-1.png
    │       ├── unnamed-chunk-32-1.png
    │       ├── unnamed-chunk-33-1.png
    │       ├── unnamed-chunk-34-1.png
    │       ├── unnamed-chunk-34-1.svg
    │       ├── unnamed-chunk-35-1.png
    │       ├── unnamed-chunk-35-1.svg
    │       ├── unnamed-chunk-38-1.png
    │       ├── unnamed-chunk-39-1.png
    │       ├── unnamed-chunk-4-1.png
    │       ├── unnamed-chunk-41-1.png
    │       ├── unnamed-chunk-42-1.png
    │       ├── unnamed-chunk-43-1.png
    │       ├── unnamed-chunk-46-1.png
    │       ├── unnamed-chunk-47-1.png
    │       ├── unnamed-chunk-48-1.png
    │       ├── unnamed-chunk-49-1.png
    │       ├── unnamed-chunk-5-1.png
    │       ├── unnamed-chunk-50-1.png
    │       ├── unnamed-chunk-6-1.png
    │       ├── unnamed-chunk-7-1.png
    │       ├── unnamed-chunk-8-1.png
    │       └── unnamed-chunk-9-1.png
    ├── my-theme.css
    ├── random_effect_types.png
    └── ssk16_dat_tutorial.rda
/.gitattributes:
--------------------------------------------------------------------------------
1 | # Auto detect text files and perform LF normalization
2 | * text=auto
3 |
4 | # Custom for Visual Studio
5 | *.cs diff=csharp
6 |
7 | # Standard to msysgit
8 | *.doc diff=astextplain
9 | *.DOC diff=astextplain
10 | *.docx diff=astextplain
11 | *.DOCX diff=astextplain
12 | *.dot diff=astextplain
13 | *.DOT diff=astextplain
14 | *.pdf diff=astextplain
15 | *.PDF diff=astextplain
16 | *.rtf diff=astextplain
17 | *.RTF diff=astextplain
18 |
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | # Windows image file caches
2 | Thumbs.db
3 | ehthumbs.db
4 |
5 | # Folder config file
6 | Desktop.ini
7 |
8 | # Recycle Bin used on file shares
9 | $RECYCLE.BIN/
10 |
11 | # Windows Installer files
12 | *.cab
13 | *.msi
14 | *.msm
15 | *.msp
16 |
17 | # Windows shortcuts
18 | *.lnk
19 |
20 | # =========================
21 | # Operating System Files
22 | # =========================
23 |
24 | # OSX
25 | # =========================
26 |
27 | .DS_Store
28 | .AppleDouble
29 | .LSOverride
30 |
31 | # Thumbnails
32 | ._*
33 |
34 | # Files that might appear in the root of a volume
35 | .DocumentRevisions-V100
36 | .fseventsd
37 | .Spotlight-V100
38 | .TemporaryItems
39 | .Trashes
40 | .VolumeIcon.icns
41 |
42 | # Directories potentially created on remote AFP share
43 | .AppleDB
44 | .AppleDesktop
45 | Network Trash Folder
46 | Temporary Items
47 | .apdisk
48 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 |
2 | **NOTE: This repository is not maintained or updated any more. Please visit the [successor repository](https://github.com/singmann/mixed_model_workshop_2day) which extends this workshop to two days:** https://github.com/singmann/mixed_model_workshop_2day
3 |
4 | -------------
5 |
6 | # Statistical Modeling and Mixed Models with R
7 |
8 | This repo contains slides and exercise materials for my workshop on statistical modeling and mixed models with R. Previous instances of this workshop:
9 |
10 | - The first instance of this workshop was held as part of the [Data on the Mind 2017](http://www.dataonthemind.org/2017-workshop). Title: *Statistical Models for Dependent Data: An Introduction to Mixed Models in R*
11 | - One day workshop at the University of Freiburg in June 2018. Title: *Mixed Models in R – An Applied Introduction*
12 | - One day tutorial at CogSci 2018 in Madison (Wisconsin). Title: *Mixed Models in R – An Applied Introduction*
13 |
14 | The mixed model part of the workshop is loosely based on my chapter: [An introduction to linear mixed modeling in experimental psychology.](http://singmann.org/download/publications/singmann_kellen-introduction-mixed-models.pdf)
15 | Read the chapter to get a more comprehensive overview.
16 |
17 |
18 | The repo currently contains three `html` presentations:
19 |
20 | - [Part 0: Introduction to Modern `R`](https://htmlpreview.github.io/?https://github.com/singmann/mixed_model_workshop/blob/master/part0-introduction/introduction.html)
21 | - [Part 1: Statistical Modeling in R](https://htmlpreview.github.io/?https://github.com/singmann/mixed_model_workshop/blob/master/part1-statistical-modeling-in-r/statistical_modeling.html)
22 | - [Part 2: Mixed Models in R](https://htmlpreview.github.io/?https://github.com/singmann/mixed_model_workshop/blob/master/part2-mixed-models-in-r/mixed_models.html)
23 |
24 | In addition, the repo contains a [`pdf` handout](https://github.com/singmann/mixed_model_workshop/raw/master/handout/mixed_model_handout.pdf) providing a concise overview.
25 |
26 | ### Requirements
27 | - A recent version of `R` (currently `R 3.5.1`): `https://cran.rstudio.com/`
28 | - `R` packages necessary for the analysis (install with `install.packages("package")` at the `R` prompt): `afex` (which automatically installs the additional requirements `emmeans`, `lme4`, and `car`) as well as `psych` and `MEMSS` (for example data)
29 | - `R` packages `tidyverse` and `broom` for the exercises (we mainly need `dplyr`, `broom`, `tidyr`, `purrr`, and `ggplot2`).
30 | - `R` package `xaringan` to compile the slides.
31 | - `R` package `sjstats` for Intraclass Correlation Coefficient (ICC)
32 | - Possibly `R` packages `sjPlot` and `MuMIn` for some examples.
33 | - An HTML5-compatible browser to view the slides.
34 | - `RStudio`: https://www.rstudio.com/products/rstudio/download3/#download
35 |
36 | ### Overview
37 |
38 | In order to increase statistical power and precision, many data sets in cognitive and behavioral sciences contain more than one data point from each unit of observation (e.g., participant), often across different experimental conditions. Such *repeated measures* pose a problem for most standard statistical procedures such as ordinary least-squares regression, (between-subjects) ANOVA, or generalized linear models (e.g., logistic regression), as these procedures assume that the data points are *independent and identically distributed*. In the case of repeated measures, the independence assumption is expected to be violated. For example, observations coming from the same participant are usually correlated - they are more likely to be similar to each other than two observations coming from two different participants.
39 |
40 | The goal of this workshop is to introduce a class of statistical models that is able to account for most of the cases of non-independence that are typically encountered in cognitive science – *linear mixed-effects models* (Baayen, Davidson, & Bates, 2008), or mixed models for short. Mixed models are a generalization of ordinary regression that explicitly capture the dependency among data points via random-effects parameters. Compared to traditional analysis approaches that ignore these dependencies, mixed models provide more accurate (and generalizable) estimates of the effects, improved statistical power, and non-inflated Type I errors (e.g., Barr, Levy, Scheepers, & Tily, 2013).
41 |
42 | In recent years, mixed models have become increasingly popular. One of the main reasons for this is that a number of software packages have appeared that make it possible to estimate large classes of mixed models in a relatively convenient manner. The workshop will focus on `lme4` (Bates, Mächler, Bolker, & Walker, 2015), the gold standard for estimating mixed models in `R` (R Core Team, 2018). In addition, it will introduce the functionality of `afex` (Singmann, Bolker, Westfall, & Aust, 2017), which simplifies many aspects of using `lme4`, such as the calculation of p-values for mixed models. `afex` was specifically developed with a focus on factorial designs that are common in cognitive and behavioral sciences.
43 |
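As a minimal, purely illustrative sketch of this difference (using the `sleepstudy` data that ships with `lme4`, not the workshop data):

```r
library("lme4")
library("afex")  # mixed() wraps lmer() and adds p-values
data("sleepstudy", package = "lme4")

# plain lme4: parameter estimates, but no p-values for the fixed effects
m_lme4 <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)

# afex: same model, with p-values (here via the Kenward-Roger method)
m_afex <- mixed(Reaction ~ Days + (Days | Subject), sleepstudy, method = "KR")
```
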
44 | Despite a number of high impact publications that introduce mixed models to a wide variety of audiences (e.g., Baayen et al., 2008; Judd, Westfall, & Kenny, 2012) the application of mixed models in practice is far from trivial. Applying mixed models requires a number of steps and decisions that are not necessarily part of the methodological arsenal of every researcher. The goal of the workshop is to change this and to introduce mixed models in such a way that they can be effectively used and the results communicated.
45 |
46 | The workshop is split into two main parts and one interlude. The focus of the first part is not on mixed models, but on the basic knowledge of statistical modeling in R that is necessary for competently using mixed models. The second part focuses exclusively on mixed models. It introduces the key concepts and simultaneously shows how to fit mixed models of increasing complexity. Each part will take approximately 3 hours (including breaks). The time between the two parts will be used to provide a short introduction to the `tidyverse` (Wickham & Grolemund, 2017), a modern set of tools for data science in R that are especially useful in this context.
47 |
48 | Participants of the workshop need some basic knowledge of R. For example, they should be able to read in data, select subsets of the data, and estimate a linear regression model. Participants without any R knowledge will likely not profit from the workshop.
49 |
50 | ### References
51 |
52 | - Baayen, H., Davidson, D. J., & Bates, D. (2008). Mixed-effects modeling with crossed random effects for subjects and items. *Journal of Memory and Language*, 59(4), 390–412. https://doi.org/10.1016/j.jml.2007.12.005
53 | - Bates, D., Mächler, M., Bolker, B., & Walker, S. (2015). Fitting Linear Mixed-Effects Models Using lme4. *Journal of Statistical Software*, 67(1). https://doi.org/10.18637/jss.v067.i01
54 | - Barr, D. J., Levy, R., Scheepers, C., & Tily, H. J. (2013). Random effects structure for confirmatory hypothesis testing: Keep it maximal. *Journal of Memory and Language*, 68(3), 255–278. https://doi.org/10.1016/j.jml.2012.11.001
55 | - Judd, C. M., Westfall, J., & Kenny, D. A. (2012). Treating stimuli as a random factor in social psychology: A new and comprehensive solution to a pervasive but largely ignored problem. *Journal of Personality and Social Psychology*, 103(1), 54–69. https://doi.org/10.1037/a0028347
56 | - Singmann, H., Bolker, B., Westfall, J., & Aust, F. (2017). *afex: Analysis of Factorial Experiments.* R package version 0.18-0. http://cran.r-project.org/package=afex
57 | - R Core Team. (2018). *R: A Language and Environment for Statistical Computing*. Vienna, Austria: R Foundation for Statistical Computing. http://www.R-project.org/
58 | - Wickham, H., & Grolemund, G. (2017). *R for Data Science: Import, Tidy, Transform, Visualize, and Model Data.* Sebastopol CA: O’Reilly.
59 |
60 | ---
61 |
62 | Last edited: June 2018
63 |
64 | ---
65 |
66 | All code in this repository is released under the [GPL v2 or later license](https://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html). All non-code materials are released under the [CC-BY-SA license](https://creativecommons.org/licenses/by-sa/4.0/).
67 |
--------------------------------------------------------------------------------
/email_instructions.txt:
--------------------------------------------------------------------------------
1 | Please bring your laptop and prepare it beforehand. This includes:
2 | - Updating both R and RStudio,
3 | - installing a few R packages,
4 | - and making sure that xaringan presentations can be produced.
5 |
6 | In the interest of spending the time of the tutorial on its content, it is important that you do this at least one or two days before the tutorial. I will not have the time to solve installation problems on the day of the tutorial. So please make sure you do this beforehand!
7 |
8 | The latest version of R is 3.5.1 and can be downloaded from: https://cran.rstudio.com/
9 | The latest version of RStudio is 1.1.453 and can be downloaded from: https://www.rstudio.com/products/rstudio/download/#download
10 |
11 | Please note that both R and RStudio need to be updated independently and older versions of R/RStudio are likely to not work properly.
12 |
13 | After updating both R and RStudio, please install the following R packages [e.g., via install.packages("package")]:
14 | afex
15 | MEMSS
16 | psych
17 | tidyverse
18 | broom
19 | xaringan
20 | sjPlot
21 |
22 | After installation of these packages, please ensure that you can produce ("knit") xaringan presentations. For this, start RStudio and create a new example presentation:
23 | - In RStudio select from the menu File -> New File -> R Markdown -> From Template -> Ninja Presentation
24 | - Save the newly created RMarkdown document somewhere (e.g., as "test.Rmd" on your Desktop)
25 | - Click on "Knit" (above the code, below the menu). Note that clicking "Knit" for the first time might prompt the installation of additional packages.
26 |
27 | If successful, clicking "Knit" should create and open the example presentation ("Presentation Ninja - with xaringan ...") as an html file (e.g., "test.html"). The file will likely be opened in an RStudio internal html viewer. Clicking "Open in Browser" will open the file in a browser.
28 |
29 | All workshop materials are available from: https://github.com/singmann/mixed_model_workshop/releases
30 | Download the corresponding zip or tar.gz archive ("Source code"). This archive contains all slides and code used at the workshop.
31 | Please note that it is possible that I will update the materials up until the workshop.
32 |
--------------------------------------------------------------------------------
/exercises/exercise_0-pdf.Rmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Exercise 0: Introduction"
3 | author: "Henrik Singmann"
4 | date: "25 July 2018"
5 | output: pdf_document
6 | ---
7 |
8 | ## Freeman, Heathcote, Chalmers, and Hockley (2010) data
9 |
10 | The data are lexical decision and word naming latencies for 300 words and 300 nonwords from 45 participants presented in Freeman et al. (2010). The 300 items in each `stimulus` condition were selected to form a balanced $2 \times 2$ design with factors neighborhood `density` (low versus high) and `frequency` (low versus high).
11 |
12 | The `task` was a between-subjects factor: 25 participants worked on the lexical decision task and 20 participants on the naming task. After excluding erroneous responses, each participant responded to between 135 and 150 words and between 124 and 150 nonwords.
13 |
14 | - Lexical decision task: Decide whether a string of letters presented on screen is a word (e.g., house) or a non-word (e.g., huese). Response times were recorded when participants pressed the corresponding response key (i.e., word or non-word).
15 | - Naming task: Read the word presented on the screen. Response times were recorded when participants started saying the presented word.
16 |
17 |
18 | ### Design
19 |
20 | The data comes with the `afex` package, so we can load it right away. But first, we load the `tidyverse` package, because we want to use its functions throughout this exercise.
21 |
22 | ```{r, message=FALSE}
23 | library("tidyverse")
24 | data("fhch2010", package = "afex") # load
25 | fhch <- droplevels(fhch2010[ fhch2010$correct,]) # remove errors
26 | str(fhch) # structure of the data
27 | ```
28 |
29 | The columns in the data are:
30 |
31 | - `id`: participant id, `factor`
32 | - `task`: `factor` with two levels indicating which task was performed: `"naming"` or `"lexdec"`
33 | - `stimulus`: `factor` indicating whether the shown stimulus was a `"word"` or `"nonword"`
34 | - `density`: `factor` indicating the neighborhood density of presented items with two levels: `"low"` and `"high"`. Density is defined as the number of words that differ from a base word by one letter or phoneme.
35 | - `frequency`: `factor` indicating the word frequency of presented items with two levels: `"low"` (i.e., words that occur less often in natural language) and `"high"` (i.e., words that occur more often in natural language).
36 | - `length`: `factor` with 3 levels (4, 5, or 6) indicating the number of characters of presented stimuli.
37 | - `item`: `factor` with 600 levels: 300 words and 300 nonwords
38 | - `rt`: response time in seconds
39 | - `log_rt`: natural logarithm of response time in seconds
40 | - `correct`: boolean indicating whether the response in the lexical decision task was correct (incorrect responses from the naming task are not part of the data).
41 |
42 |
43 | ## Exercise 1: Calculating Simple Summary Measures
44 |
45 | For this and the following exercises use the `fhch` `data.frame` (i.e., the data after removing errors).
46 |
47 | ### Part A:
48 |
49 | Use your knowledge of `dplyr` in combination with the pipe `%>%` and take the `mean` of the `rt` column, conditional on `task`. For which task are participants on average faster?
50 |
51 | Hints:
52 |
53 | - `group_by` can be used for conditioning on one or several variables. Separate more than one variable by comma.
54 | - `summarise` can be used for aggregating multiple lines into one.
55 | - The pipe `%>%` chains calls from left to right (the keyboard shortcut for the pipe is `ctrl/cmd` + `shift` + `m`). A generic sketch of the pattern follows below.
56 | - More information: https://github.com/rstudio/cheatsheets/raw/master/data-transformation.pdf
57 |
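A purely illustrative sketch of this `group_by()`/`summarise()` pattern, shown on `mtcars` (a built-in data set unrelated to this exercise) so that the solution is not given away:

```r
# mean mpg per number of cylinders; the exercise uses the same pattern
mtcars %>%
  group_by(cyl) %>%
  summarise(m = mean(mpg))
```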
58 |
59 | ```{r}
60 | # start with:
61 | #fhch %>% ...
62 | ```
63 |
64 | ### Part B
65 |
66 | `summarise` allows using more than one aggregation function. Extend the previous code and also calculate the standard deviation, `sd()`, per task. Does the faster task also have the lower variability in RT (i.e., a smaller sd)?
67 |
68 | ```{r}
69 | # fhch %>% ...
70 | ```
71 |
72 | ### Part C
73 |
74 | Means are quite sensitive to outliers. Therefore, please recalculate `mean` and `sd` per task, after removing some extreme outliers. Here, we define outliers as RTs below .25 seconds and above 2.5 seconds. Do we still find the same pattern?
75 |
76 | Remember, the `dplyr` verb for selecting observations (i.e., rows) is `filter`. You can combine several filter conditions simply by separating them with commas in the same call to `filter()`.
77 |
78 |
79 | ```{r}
80 | # fhch %>% ...
81 |
82 | ```
83 |
84 |
85 |
86 | ## Exercise 2: Aggregating Data by ID and Plotting
87 |
88 | The `fhch` data has multiple observations (i.e., trials) per participant and cell of the design. In a traditional analysis, for example using ANOVA, one can only have one observation per participant and cell of the design. Therefore, a common task is to aggregate the data on the level of the participant and the combinations of factors one is currently interested in.
89 |
90 |
91 | ### Part A
92 |
93 | Use the data from the `"lexdec"` task only. For this, take the `mean` of the `rt` column per participant and level of the `length` factor. Save this data in a new object `agg1`.
94 |
95 | Note that to condition on more than one variable in `group_by()`, simply separate the variables by comma.
96 |
97 |
98 | ```{r}
99 | ## write code here
100 | ```
101 |
102 |
103 | ### Part B
104 |
105 | Let us take a look at the individual-level data per length level that you just created. For this, use `ggplot` and plot the level of `length` on the x-axis and the mean RTs on the y-axis.
106 |
107 | - Try both `geom_point` and `geom_jitter()`. Which looks better?
108 | - Does this plot show any clear pattern?
109 | - Can you think of a way to make this plot more informative?
110 |
111 | ```{r}
112 | ## write code here
113 | ```
114 |
115 | ```{r}
116 | ## write code here
117 | ```
118 |
119 |
120 | ### Part C
121 |
122 | Make a plot similar to the one above, but this time also condition on the `density` factor. That is, first aggregate the data again, this time for the combination of `id`, `length`, and `density`. Then plot the data as above, but also add an aesthetic for the `density` factor. Use `color` to distinguish the different levels of `density` in the plot. Can you see a pattern in this plot? If not, have a look at `position_dodge` with `geom_point`.
123 |
124 |
125 | ```{r}
126 | ## write code here
127 | ```
128 |
129 |
130 | ## Resources
131 |
132 | - `RStudio` cheat sheets: https://www.rstudio.com/resources/cheatsheets/
133 | - `RStudio`: https://github.com/rstudio/cheatsheets/raw/master/rstudio-ide.pdf
134 | - `ggplot2`: https://github.com/rstudio/cheatsheets/raw/master/data-visualization-2.1.pdf
135 | - `dplyr` & `tidyr`: https://github.com/rstudio/cheatsheets/raw/master/data-transformation.pdf
136 |
137 | ## References
138 |
139 | - Freeman, E., Heathcote, A., Chalmers, K., & Hockley, W. (2010). Item effects in recognition memory for words. *Journal of Memory and Language*, 62(1), 1-18. https://doi.org/10.1016/j.jml.2009.09.004
140 |
141 |
142 |
--------------------------------------------------------------------------------
/exercises/exercise_0-pdf.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/singmann/mixed_model_workshop/cb1ba0d82579103e246402c25015bc99fa27fbeb/exercises/exercise_0-pdf.pdf
--------------------------------------------------------------------------------
/exercises/exercise_0.Rmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Exercise 0: Introduction"
3 | output: html_notebook
4 | ---
5 |
6 | ## Freeman, Heathcote, Chalmers, and Hockley (2010) data
7 |
8 | The data are lexical decision and word naming latencies for 300 words and 300 nonwords from 45 participants presented in Freeman et al. (2010). The 300 items in each `stimulus` condition were selected to form a balanced $2 \times 2$ design with factors neighborhood `density` (low versus high) and `frequency` (low versus high).
9 |
10 | The `task` was a between-subjects factor: 25 participants worked on the lexical decision task and 20 participants on the naming task. After excluding erroneous responses, each participant responded to between 135 and 150 words and between 124 and 150 nonwords.
11 |
12 | - Lexical decision task: Decide whether a string of letters presented on screen is a word (e.g., house) or a non-word (e.g., huese). Response times were recorded when participants pressed the corresponding response key (i.e., word or non-word).
13 | - Naming task: Read the word presented on the screen. Response times were recorded when participants started saying the presented word.
14 |
15 |
16 | ### Design
17 |
18 | The data comes with the `afex` package, so we can load it right away. But first, we load the `tidyverse` package, because we want to use its functions throughout this exercise.
19 |
20 | ```{r, message=FALSE}
21 | library("tidyverse")
22 | data("fhch2010", package = "afex") # load
23 | fhch <- droplevels(fhch2010[ fhch2010$correct,]) # remove errors
24 | str(fhch) # structure of the data
25 | ```
26 |
27 | The columns in the data are:
28 |
29 | - `id`: participant id, `factor`
30 | - `task`: `factor` with two levels indicating which task was performed: `"naming"` or `"lexdec"`
31 | - `stimulus`: `factor` indicating whether the shown stimulus was a `"word"` or `"nonword"`
32 | - `density`: `factor` indicating the neighborhood density of presented items with two levels: `"low"` and `"high"`. Density is defined as the number of words that differ from a base word by one letter or phoneme.
33 | - `frequency`: `factor` indicating the word frequency of presented items with two levels: `"low"` (i.e., words that occur less often in natural language) and `"high"` (i.e., words that occur more often in natural language).
34 | - `length`: `factor` with 3 levels (4, 5, or 6) indicating the number of characters of presented stimuli.
35 | - `item`: `factor` with 600 levels: 300 words and 300 nonwords
36 | - `rt`: response time in seconds
37 | - `log_rt`: natural logarithm of response time in seconds
38 | - `correct`: boolean indicating whether the response in the lexical decision task was correct (incorrect responses from the naming task are not part of the data).
39 |
40 |
41 | ## Exercise 1: Calculating Simple Summary Measures
42 |
43 | For this and the following exercises use the `fhch` `data.frame` (i.e., the data after removing errors).
44 |
45 | ### Part A:
46 |
47 | Use your knowledge of `dplyr` in combination with the pipe `%>%` and take the `mean` of the `rt` column, conditional on `task`. For which task are participants on average faster?
48 |
49 | Hints:
50 |
51 | - `group_by` can be used for conditioning on one or several variables. Separate more than one variable by comma.
52 | - `summarise` can be used for aggregating multiple lines into one.
53 | - The pipe `%>%` chains calls from left to right (the keyboard shortcut for the pipe is `ctrl/cmd` + `shift` + `m`). A generic sketch of the pattern follows below.
54 | - More information: https://github.com/rstudio/cheatsheets/raw/master/data-transformation.pdf
55 |
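A purely illustrative sketch of this `group_by()`/`summarise()` pattern, shown on `mtcars` (a built-in data set unrelated to this exercise) so that the solution is not given away:

```r
# mean mpg per number of cylinders; the exercise uses the same pattern
mtcars %>%
  group_by(cyl) %>%
  summarise(m = mean(mpg))
```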
56 |
57 | ```{r}
58 | # start with:
59 | #fhch %>% ...
60 | ```
61 |
62 | ### Part B
63 |
64 | `summarise` allows using more than one aggregation function. Extend the previous code and also calculate the standard deviation, `sd()`, per task. Does the faster task also have the lower variability in RT (i.e., a smaller sd)?
65 |
66 | ```{r}
67 | # fhch %>% ...
68 | ```
69 |
70 | ### Part C
71 |
72 | Means are quite sensitive to outliers. Therefore, please recalculate `mean` and `sd` per task, after removing some extreme outliers. Here, we define outliers as RTs below .25 seconds and above 2.5 seconds. Do we still find the same pattern?
73 |
74 | Remember, the `dplyr` verb for selecting observations (i.e., rows) is `filter`. You can combine several filter conditions simply by separating them with commas in the same call to `filter()`.
75 |
76 |
77 | ```{r}
78 | # fhch %>% ...
79 |
80 | ```
81 |
82 |
83 | ## Exercise 2: Aggregating Data by ID and Plotting
84 |
85 | The `fhch` data has multiple observations (i.e., trials) per participant and cell of the design. In a traditional analysis, for example using ANOVA, one can only have one observation per participant and cell of the design. Therefore, a common task is to aggregate the data on the level of the participant and the combinations of factors one is currently interested in.
86 |
87 |
88 | ### Part A
89 |
90 | Use the data from the `"lexdec"` task only. For this, take the `mean` of the `rt` column per participant and level of the `length` factor. Save this data in a new object `agg1`.
91 |
92 | Note that to condition on more than one variable in `group_by()`, simply separate the variables by comma.
93 |
94 |
95 | ```{r}
96 | ## write code here
97 | ```
98 |
99 |
100 | ### Part B
101 |
102 | Let us take a look at the individual-level data per length level that you just created. For this, use `ggplot` and plot the level of `length` on the x-axis and the mean RTs on the y-axis.
103 |
104 | - Try both `geom_point` and `geom_jitter()`. Which looks better?
105 | - Does this plot show any clear pattern?
106 | - Can you think of a way to make this plot more informative?
107 |
108 | ```{r}
109 | ## write code here
110 | ```
111 |
112 | ```{r}
113 | ## write code here
114 | ```
115 |
116 |
117 | ### Part C
118 |
119 | Make a plot similar to the one above, but this time also condition on the `density` factor. That is, first aggregate the data again, this time for the combination of `id`, `length`, and `density`. Then plot the data as above, but also add an aesthetic for the `density` factor. Use `color` to distinguish the different levels of `density` in the plot. Can you see a pattern in this plot? If not, have a look at `position_dodge` with `geom_point`.
120 |
121 |
122 | ```{r}
123 | ## write code here
124 | ```
125 |
126 |
127 | ## Resources
128 |
129 | - `RStudio` cheat sheets: https://www.rstudio.com/resources/cheatsheets/
130 | - `RStudio`: https://github.com/rstudio/cheatsheets/raw/master/rstudio-ide.pdf
131 | - `ggplot2`: https://github.com/rstudio/cheatsheets/raw/master/data-visualization-2.1.pdf
132 | - `dplyr` & `tidyr`: https://github.com/rstudio/cheatsheets/raw/master/data-transformation.pdf
133 |
134 | ## References
135 |
136 | - Freeman, E., Heathcote, A., Chalmers, K., & Hockley, W. (2010). Item effects in recognition memory for words. *Journal of Memory and Language*, 62(1), 1-18. https://doi.org/10.1016/j.jml.2009.09.004
137 |
138 |
139 |
--------------------------------------------------------------------------------
/exercises/exercise_0_SOLUTION.Rmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Exercise 0: Introduction"
3 | output: html_notebook
4 | ---
5 |
6 | ## Freeman, Heathcote, Chalmers, and Hockley (2010) data
7 |
8 | The data are lexical decision and word naming latencies for 300 words and 300 nonwords from 45 participants presented in Freeman et al. (2010). The 300 items in each `stimulus` condition were selected to form a balanced $2 \times 2$ design with factors neighborhood `density` (low versus high) and `frequency` (low versus high).
9 |
10 | The `task` was a between-subjects factor: 25 participants worked on the lexical decision task and 20 participants on the naming task. After excluding erroneous responses, each participant responded to between 135 and 150 words and between 124 and 150 nonwords.
11 |
12 | - Lexical decision task: Decide whether a string of letters presented on screen is a word (e.g., house) or a non-word (e.g., huese). Response times were recorded when participants pressed the corresponding response key (i.e., word or non-word).
13 | - Naming task: Read the word presented on the screen. Response times were recorded when participants started saying the presented word.
14 |
15 |
16 | ### Design
17 |
18 | The data comes with the `afex` package, so we can load it right away. But first, we load the `tidyverse` package, because we want to use its functions throughout this exercise.
19 |
20 | ```{r, message=FALSE}
21 | library("tidyverse")
22 | data("fhch2010", package = "afex") # load
23 | fhch <- droplevels(fhch2010[ fhch2010$correct,]) # remove errors
24 | str(fhch) # structure of the data
25 | library("tidyverse")
26 | ```
27 |
28 | The columns in the data are:
29 |
30 | - `id`: participant id, `factor`
31 | - `task`: `factor` with two levels indicating which task was performed: `"naming"` or `"lexdec"`
32 | - `stimulus`: `factor` indicating whether the shown stimulus was a `"word"` or `"nonword"`
33 | - `density`: `factor` indicating the neighborhood density of presented items with two levels: `"low"` and `"high"`. Density is defined as the number of words that differ from a base word by one letter or phoneme.
34 | - `frequency`: `factor` indicating the word frequency of presented items with two levels: `"low"` (i.e., words that occur less often in natural language) and `"high"` (i.e., words that occur more often in natural language).
35 | - `length`: `factor` with 3 levels (4, 5, or 6) indicating the number of characters of presented stimuli.
36 | - `item`: `factor` with 600 levels: 300 words and 300 nonwords
37 | - `rt`: response time in seconds
38 | - `log_rt`: natural logarithm of response time in seconds
39 | - `correct`: boolean indicating whether the response in the lexical decision task was correct (incorrect responses from the naming task are not part of the data).
40 |
41 |
42 | ## Exercise 1: Calculating Simple Summary Measures
43 |
44 | For this and the following exercises use the `fhch` `data.frame` (i.e., the data after removing errors).
45 |
46 | ### Part A:
47 |
48 | Use your knowledge of `dplyr` in combination with the pipe `%>%` and take the `mean` of the `rt` column, conditional on `task`. For which task are participants on average faster?
49 |
50 | Hints:
51 |
52 | - `group_by` can be used for conditioning on one or several variables. Separate more than one variable by comma.
53 | - `summarise` can be used for aggregating multiple lines into one.
54 | - The pipe `%>%` chains calls from left to right (the keyboard shortcut for the pipe is `ctrl/cmd` + `shift` + `m`).
55 | - More information: https://github.com/rstudio/cheatsheets/raw/master/data-transformation.pdf
56 |
57 |
58 | ```{r}
59 | fhch %>%
60 | group_by(task) %>%
61 | summarise(m = mean(rt))
62 | ```
63 |
64 | ### Part B
65 |
66 | `summarise` allows using more than one aggregation function. Extend the previous code and also calculate the standard deviation, `sd()`, per task. Does the faster task also have the lower variability in RT (i.e., a smaller sd)?
67 |
68 |
69 | ```{r}
70 | fhch %>%
71 | group_by(task) %>%
72 | summarise(m = mean(rt),
73 | sd = sd(rt))
74 | ```
75 |
76 | ### Part C
77 |
78 | Means are quite sensitive to outliers. Therefore, please recalculate `mean` and `sd` per task, after removing some extreme outliers. Here, we define outliers as RTs below .25 seconds and above 2.5 seconds. Do we still find the same pattern?
79 |
80 | Remember, the `dplyr` verb for selecting observations (i.e., rows) is `filter`. You can combine several filter conditions simply by separating them with commas in the same call to `filter()`.
81 |
82 |
83 | ```{r}
84 | fhch %>%
85 | filter(rt > 0.25, rt < 2.5) %>%
86 | group_by(task) %>%
87 | summarise(m = mean(rt),
88 | sd = sd(rt))
89 |
90 | ```
91 |
92 |
93 | ## Exercise 2: Aggregating Data by ID and Plotting
94 |
95 | The `fhch` data has multiple observations (i.e., trials) per participant and cell of the design. In a traditional analysis, for example using ANOVA, one can only have one observation per participant and cell of the design. Therefore, a common task is to aggregate the data on the level of the participant and the combinations of factors one is currently interested in.
96 |
97 |
98 | ### Part A
99 |
100 | Use the data from the `"lexdec"` task only. For this, take the `mean` of the `rt` column per participant and level of the `length` factor. Save this data in a new object `agg1`.
101 |
102 | Note that to condition on more than one variable in `group_by()`, simply separate the variables by comma.
103 |
104 |
105 | ```{r}
106 | agg1 <- fhch %>%
107 | filter(task == "lexdec") %>%
108 | group_by(id, length) %>%
109 | summarise(mrt = mean(rt))
110 | ```
111 |
112 |
113 | ### Part B
114 |
115 | Let us take a look at the individual-level data per length level that you just created. For this, use `ggplot` and plot the level of `length` on the x-axis and the mean RTs on the y-axis.
116 |
117 | - Try both `geom_point` and `geom_jitter()`. Which looks better?
118 | - Does this plot show any clear pattern?
119 | - Can you think of a way to make this plot more informative?
120 |
121 | ```{r}
122 | ggplot(agg1, aes(x = length, y = mrt)) +
123 | geom_jitter()
124 | ```
125 |
126 | ```{r}
127 | ggplot(agg1, aes(x = length, y = mrt)) +
128 | geom_point(alpha = 0.2) +
129 | geom_violin(fill = "transparent") +
130 | stat_summary(color = "red") +
131 | theme_bw()
132 | ```
133 |
134 |
135 | ### Part C
136 |
137 | Make a plot similar to the one above, but this time also condition on the `density` factor. That is, first aggregate the data again, this time for the combination of `id`, `length`, and `density`. Then plot the data as above, but also add an aesthetic for the `density` factor. Use `color` to distinguish the different levels of `density` in the plot. Can you see a pattern in this plot? If not, have a look at `position_dodge` with `geom_point`.
138 |
139 |
140 | ```{r}
141 | agg2 <- fhch %>%
142 | filter(task == "lexdec") %>%
143 | group_by(id, length, density) %>%
144 | summarise(mrt = mean(rt))
145 | ggplot(agg2, aes(x = length, y = mrt, color = density, group = density)) +
146 | geom_point(position = position_dodge(0.25)) +
147 | stat_summary(position = position_dodge(0.25))
148 |
149 | ```
150 |
151 | ```{r}
152 | ggplot(agg2, aes(x = length, y = mrt, color = density, group = density)) +
153 | geom_point(position = position_dodge(0.25), alpha = 0.5) +
154 | stat_summary(position = position_dodge(0.25)) +
155 | theme_light()
156 |
157 | ```
158 |
159 |
160 | ## Resources
161 |
162 | - `RStudio` cheat sheets: https://www.rstudio.com/resources/cheatsheets/
163 | - `RStudio`: https://github.com/rstudio/cheatsheets/raw/master/rstudio-ide.pdf
164 | - `ggplot2`: https://github.com/rstudio/cheatsheets/raw/master/data-visualization-2.1.pdf
165 | - `dplyr` & `tidyr`: https://github.com/rstudio/cheatsheets/raw/master/data-transformation.pdf
166 |
167 | ## References
168 |
169 | - Freeman, E., Heathcote, A., Chalmers, K., & Hockley, W. (2010). Item effects in recognition memory for words. *Journal of Memory and Language*, 62(1), 1-18. https://doi.org/10.1016/j.jml.2009.09.004
170 |
171 |
172 |
--------------------------------------------------------------------------------
/exercises/exercise_1-number_of_parameters.Rmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Exercises I: Statistical Modeling in R"
3 | author: "Henrik Singmann"
4 | date: "25 July 2018"
5 | output: pdf_document
6 | ---
7 |
8 |
9 | ```{r setup, include=FALSE}
10 | require(psych)
11 | data(sat.act)
12 | sat.act$gender <- factor(sat.act$gender, 1:2, labels = c("male", "female"))
13 | sat.act$education <- factor(sat.act$education)
14 | sat.act <- na.omit(sat.act)
15 | ```
16 |
17 | # Formula Interface for Statistical Models: `~`
18 |
19 | - Allows symbolic specification of statistical model, e.g. linear models: `lm(ACT ~ SATQ, sat.act)`
20 | - Everything to the left of `~` is the dependent variable.
21 | - Independent variables are to the right of the `~`:
22 |
23 | | Formula | | Interpretation |
24 | | ------------------------|---|----------------------------------|
25 | | `~ x` or `~1+x` || Intercept and main effect of `x` |
26 | | ` ~ x-1` or `~0 + x` || Only main effect of `x` and no intercept (questionable) |
27 | | `~ x+y` || Main effects of `x` and `y`|
28 | | `~ x:y` || Interaction between `x` and `y` (and no main effect) |
29 | | `~ x*y` or `~ x+y+x:y` || Main effects and interaction between `x` and `y` |
30 |
31 |
32 | # Continuous Variables: How many Parameters in each Model?
33 |
34 | ```{r, eval=FALSE}
35 | lm(ACT ~ SATQ_c + SATV_c, sat.act) # a
36 | lm(ACT ~ SATQ_c : SATV_c, sat.act) # b
37 | lm(ACT ~ 0 + SATQ_c:SATV_c, sat.act) # c
38 | lm(ACT ~ SATQ_c*SATV_c, sat.act) # d
39 | lm(ACT ~ 0+SATQ_c*SATV_c, sat.act) # e
40 | ```
41 |
42 | # Categorical Variables: How many Parameters in each Model?
43 |
44 | ```{r, eval=FALSE}
45 | lm(ACT ~ gender, sat.act) # a
46 | lm(ACT ~ 0+gender, sat.act) # b
47 | lm(ACT ~ gender+education, sat.act) # c
48 | lm(ACT ~ 0+gender+education, sat.act) # d
49 | lm(ACT ~ gender:education, sat.act) # e
50 | lm(ACT ~ 0+gender:education, sat.act) # f
51 | lm(ACT ~ gender*education, sat.act) # g
52 | lm(ACT ~ 0+gender*education, sat.act) # h
53 | lm(ACT ~ gender+gender:education, sat.act) # i
54 | ```
55 |
56 | ```{r}
57 | levels(sat.act$gender) ## 2
58 | levels(sat.act$education) ## 6
59 | ```
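
One way to check your answers once you have made a guess (a sketch; `coef()` returns one entry per parameter, including `NA` entries for aliased parameters):

```r
# e.g., for model (a) of the categorical case:
length(coef(lm(ACT ~ gender, sat.act)))
```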
60 |
--------------------------------------------------------------------------------
/exercises/exercise_1-number_of_parameters.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/singmann/mixed_model_workshop/cb1ba0d82579103e246402c25015bc99fa27fbeb/exercises/exercise_1-number_of_parameters.pdf
--------------------------------------------------------------------------------
/exercises/exercise_2-pdf.Rmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Exercise 2: Identifying Random Effects-Structure"
3 | author: "Henrik Singmann"
4 | date: "25 July 2018"
5 | output: pdf_document
6 | ---
7 |
8 | # Exercise 2: Identifying the Random-Effects Structure
9 |
10 | Your task is to identify the *maximal random-effects structure justified by the design* (Barr, Levy, Scheepers, & Tily, 2013) for one data set and implement this structure in `lme4::lmer` syntax.
11 |
12 |
13 | # Freeman, Heathcote, Chalmers, and Hockley (2010)
14 |
15 | Lexical decision and word naming latencies for 300 words and 300 nonwords presented in Freeman, Heathcote, Chalmers, and Hockley (2010). The study had one between-subjects factor, `task` with two levels (`"naming"` or `"lexdec"`), and four within-subjects factors: `stimulus` type with two levels (`"word"` or `"nonword"`), word `density` and word `frequency` each with two levels (`"low"` and `"high"`), and stimulus `length` with three levels (`4`, `5`, and `6`).
16 |
17 | The data comes with `afex` as `fhch2010`:
18 | ```{r}
19 | data("fhch2010", package = "afex")
20 | str(fhch2010)
21 | ```
22 |
23 | What is the maximal random-effects structure justified by the design for this data set for the dependent variable `log_rt`?
24 |
25 | ```{r, eval=FALSE}
26 | mixed(log_rt ~ ...)
27 |
28 | ```
29 |
30 |
31 | ## References
32 | - Barr, D. J., Levy, R., Scheepers, C., & Tily, H. J. (2013). Random effects structure for confirmatory hypothesis testing: Keep it maximal. *Journal of Memory and Language*, 68(3), 255-278. https://doi.org/10.1016/j.jml.2012.11.001
33 | - Freeman, E., Heathcote, A., Chalmers, K., & Hockley, W. (2010). Item effects in recognition memory for words. *Journal of Memory and Language*, 62(1), 1-18. https://doi.org/10.1016/j.jml.2009.09.004
34 |
--------------------------------------------------------------------------------
/exercises/exercise_2-pdf.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/singmann/mixed_model_workshop/cb1ba0d82579103e246402c25015bc99fa27fbeb/exercises/exercise_2-pdf.pdf
--------------------------------------------------------------------------------
/exercises/exercise_2.R:
--------------------------------------------------------------------------------
1 | ## ------------------------------------------------------------------------
2 | data("fhch2010", package = "afex")
3 | str(fhch2010)
4 |
5 | ## ---- eval=FALSE---------------------------------------------------------
6 | ## mixed(log_rt ~ ...)
7 | ##
8 |
9 |
--------------------------------------------------------------------------------
/exercises/exercise_2.Rmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Exercise 2: Identifying Random Effects-Structure"
3 | output: html_notebook
4 | ---
5 |
6 | # Exercise 2: Identifying the Random-Effects Structure
7 |
8 | Your task is to identify the *maximal random-effects structure justified by the design* (Barr, Levy, Scheepers, & Tily, 2013) for one data set and implement this structure in `lme4::lmer` syntax.
9 |
10 |
11 | # Freeman, Heathcote, Chalmers, and Hockley (2010)
12 |
13 | Lexical decision and word naming latencies for 300 words and 300 nonwords presented in Freeman, Heathcote, Chalmers, and Hockley (2010). The study had one between-subjects factor, `task` with two levels (`"naming"` or `"lexdec"`), and four within-subjects factors: `stimulus` type with two levels (`"word"` or `"nonword"`), word `density` and word `frequency` each with two levels (`"low"` and `"high"`), and stimulus `length` with three levels (`4`, `5`, and `6`).
14 |
15 | The data comes with `afex` as `fhch2010`:
16 | ```{r}
17 | data("fhch2010", package = "afex")
18 | str(fhch2010)
19 | ```
20 |
21 | What is the maximal random-effects structure justified by the design for this data set for the dependent variable `log_rt`?
22 |
23 | ```{r, eval=FALSE}
24 | mixed(log_rt ~ ...)
25 |
26 | ```
27 |
28 |
29 | ## References
30 | - Barr, D. J., Levy, R., Scheepers, C., & Tily, H. J. (2013). Random effects structure for confirmatory hypothesis testing: Keep it maximal. *Journal of Memory and Language*, 68(3), 255-278. https://doi.org/10.1016/j.jml.2012.11.001
30 | - Freeman, E., Heathcote, A., Chalmers, K., & Hockley, W. (2010). Item effects in recognition memory for words. *Journal of Memory and Language*, 62(1), 1-18. https://doi.org/10.1016/j.jml.2009.09.004
32 |
33 |
34 |
--------------------------------------------------------------------------------
/exercises/exercise_2_SOLUTION.R:
--------------------------------------------------------------------------------
1 | ## ------------------------------------------------------------------------
2 | data("fhch2010", package = "afex")
3 | str(fhch2010)
4 | # 'data.frame': 13222 obs. of 10 variables:
5 | # $ id : Factor w/ 45 levels "N1","N12","N13",..: 1 1 1 1 1 1 1 1 1 1 ...
6 | # $ task : Factor w/ 2 levels "naming","lexdec": 1 1 1 1 1 1 1 1 1 1 ...
7 | # $ stimulus : Factor w/ 2 levels "word","nonword": 1 1 1 2 2 1 2 2 1 2 ...
8 | # $ density : Factor w/ 2 levels "low","high": 2 1 1 2 1 2 1 1 1 1 ...
9 | # $ frequency: Factor w/ 2 levels "low","high": 1 2 2 2 2 2 1 2 1 2 ...
10 | # $ length : Factor w/ 3 levels "4","5","6": 3 3 2 2 1 1 3 2 1 3 ...
11 | # $ item : Factor w/ 600 levels "abide","acts",..: 363 121 202 525 580 135 42 368 227 141 ...
12 | # $ rt : num 1.091 0.876 0.71 1.21 0.843 ...
13 | # $ log_rt : num 0.0871 -0.1324 -0.3425 0.1906 -0.1708 ...
14 | # $ correct : logi TRUE TRUE TRUE TRUE TRUE TRUE ...
15 |
16 | ## ---- eval=FALSE---------------------------------------------------------
17 | ## mixed(log_rt ~ ...)
18 | ##
19 |
20 | library("afex")  # provides mixed(); not loaded above
21 | m_fhch <- mixed(log_rt ~ task*stimulus*density*frequency*length +
22 |                   (stimulus*density*frequency*length||id) +
23 |                   (task||item), fhch2010,
24 |                 method = "S", expand_re = TRUE)
25 | 
--------------------------------------------------------------------------------
/exercises/prepare_data.R:
--------------------------------------------------------------------------------
1 | # This file prepares the data as described in https://osf.io/j4swp/
2 | # With a few additional additions
3 |
4 | # you might need to set the correct working directory via the menu:
5 | # Session - Set Working Directory - To Source File Location
6 |
7 | load("ssk16_dat_online.rda") # data comes in 4 data frames per
8 | # dw_1$group <-"P(if,then)"
9 | # dw_2$group <-"Acc(if,then)"
10 | # dw_3$group <-"P(Even)"
11 | # dw_4$group <-"Acc(Even)"
12 |
13 | dw_1$dv_question <- "probability"
14 | dw_2$dv_question <- "acceptability"
15 | dw_3$dv_question <- "probability"
16 | dw_4$dv_question <- "acceptability"
17 |
18 | dw_1$conditional <- "indicative"
19 | dw_2$conditional <- "indicative"
20 | dw_3$conditional <- "concessive"
21 | dw_4$conditional <- "concessive"
22 |
23 | dw_1$lfdn <- factor(paste(as.character(dw_1$lfdn), "P(if,then)", sep ="_"))
24 | dw_2$lfdn <- factor(paste(as.character(dw_2$lfdn), "Acc(if,then)", sep ="_"))
25 | dw_3$lfdn <- factor(paste(as.character(dw_3$lfdn), "P(Even)", sep ="_"))
26 | dw_4$lfdn <- factor(paste(as.character(dw_4$lfdn), "Acc(Even)", sep ="_"))
27 |
28 | names(dw_1)[names(dw_1) == 'P'] <- 'DV'
29 | names(dw_2)[names(dw_2) == 'ACC'] <- 'DV'
30 | names(dw_3)[names(dw_3) == 'PEven'] <- 'DV'
31 | names(dw_4)[names(dw_4) == 'ACCEven'] <- 'DV'
32 |
33 | dw <- rbind(dw_1, dw_2, dw_3, dw_4)
34 |
35 | # center IVs and DV at midpoint of scale
36 | dat <- within(dw, {
37 | c_given_a <- (CgivenA-50)/100
38 | dv <- (DV-50)/100
39 | #group <- factor(group, levels = c("P(if,then)", "Acc(if,then)", "P(Even)", "Acc(Even)"))
40 | dv_question <- factor(dv_question, levels = c("probability", "acceptability"))
41 | conditional <- factor(conditional, levels = c("indicative", "concessive"))
42 | })
43 |
44 | dat$AC <- NULL
45 | dat$conclusion <- NULL
46 |
47 | dat <- droplevels(dat[ dat$conditional == "indicative", ])
48 | dat$conditional <- NULL
49 | dat$type <- NULL
50 |
51 | dat <- dplyr::rename(dat, p_id = lfdn, i_id = le_nr)
52 | length(levels(dat$p_id))
53 |
54 | save(dat, file="ssk16_dat_preapred.rda")
55 |
56 | dat <- droplevels(dat[ dat$dv_question == "probability", ])
57 | dat$dv_question <- NULL
58 |
59 | save(dat, file="ssk16_dat_preapred_ex1.rda")
60 |
61 | ### latest preparation (July 2018)
62 |
63 | library("tidyverse")
64 |
65 | dat <- dat %>%
66 | rename(B_given_A = CgivenA,
67 | if_A_then_B = DV,
68 | B_given_A_c = c_given_a,
69 | if_A_then_B_c = dv) %>%
70 | select(p_id, i_id, B_given_A, B_given_A_c, if_A_then_B, if_A_then_B_c, rel_cond)
71 |
72 | save(dat, file = "ssk16_dat_tutorial.rda")
--------------------------------------------------------------------------------
/handout/mixed_model_handout.Rmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Mixed Models in R - A Practical Introduction"
3 | author: "Henrik Singmann"
4 | date: "November 2018"
5 | output: pdf_document
6 | ---
7 |
8 | ```{r setup, include=FALSE}
9 | knitr::opts_chunk$set(echo = TRUE)
10 | ```
11 |
12 | ### Overview: Statistical Models in R
13 |
14 | 1. Identify the probability distribution of the data (more correctly: of the conditional distribution of the response)
15 | 2. Make sure variables are of correct type via `str()`
16 | 3. Set appropriate contrasts (orthogonal contrasts if model includes interaction): `afex::set_sum_contrasts()`
17 | 4. Describe statistical model using `formula`
18 | 5. Fit model: pass `formula` and `data.frame` to corresponding modeling function (e.g., `lm()`, `glm()`)
19 | 6. Check model fit (e.g., inspect residuals)
20 | 7. Test terms (i.e., main effects and interactions): Pass fitted model to `car::Anova()`
21 | 8. Follow-up tests:
22 | - Estimated marginal means: Pass fitted model to `emmeans::emmeans()`
23 | - Specify specific contrasts on estimated marginal means (e.g., `contrast()`, `pairs()`)
24 |
25 | - `afex` combines fitting (5.) and testing (7.):
26 | - ANOVAs: `afex::aov_car()`, `afex::aov_ez()`, or `afex::aov_4()`
27 |     - (Generalized) linear mixed-effects models: `afex::mixed()` (a sketch of the full modeling sequence follows after this list)
28 |
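A minimal sketch of this sequence, using the `sat.act` data from the `psych` package (the same data as in the exercises; output omitted):

```r
library("psych")    # provides sat.act
library("afex")     # set_sum_contrasts()
library("car")      # Anova()
library("emmeans")  # emmeans()

data(sat.act)
sat.act$gender <- factor(sat.act$gender, 1:2, labels = c("male", "female"))
str(sat.act)                          # step 2: check variable types
set_sum_contrasts()                   # step 3: sum-to-zero contrasts
m <- lm(ACT ~ gender * age, sat.act)  # steps 4 + 5: formula, then fit
plot(m, which = 1)                    # step 6: residuals vs. fitted values
Anova(m, type = 3)                    # step 7: test model terms
emmeans(m, "gender")                  # step 8: estimated marginal means
```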
29 |
30 | ### `R` Formula Interface for Statistical Models: `~`
31 |
32 | - `R` `formula` interface allows symbolic specification of statistical models, e.g. linear models:
33 | `lm(y ~ x, data)`
34 | - Dependent variable(s) left of `~` (can be multivariate or missing), independent variables right of `~`:
35 |
36 | | Formula | | Interpretation |
37 | | ------------------------|---|----------------------------------|
38 | | `~ x` or `~1+x` || Intercept and main effect of `x` |
39 | | ` ~ x-1` or `~0 + x` || Only main effect of `x` and no intercept (questionable) |
40 | | `~ x+y` || Main effects of `x` and `y`|
41 | | `~ x:y` || Interaction between `x` and `y` (and no main effect) |
42 | | `~ x*y` or `~ x+y+x:y` || Main effects and interaction between `x` and `y` |
43 |
44 |
45 | - **Formulas behave differently for continuous and categorical covariates!!**
46 | + Always use `str(data)` before fitting: `int` & `num` is continuous, `Factor` or `character` is categorical.
47 | + Categorical/nominal variables have to be `factor`s. Create via `factor()`.
48 |
49 | - Categorical variables are transformed into numerical variables using contrast functions (via `model.matrix()`; see Cohen et al., 2002)
50 | + **If models include interactions, orthogonal contrasts (e.g., `contr.sum`) in which the intercept corresponds to the (unweighted) grand mean should be used**: `afex::set_sum_contrasts()`
51 | + Dummy/treatment contrasts (`R` default) lead to simple effects for lower order effects.
52 |     + For linear models: Coding only affects the interpretation of parameters/tests, not the overall model fit (see the sketch below).
53 |
54 | - For models with only numerical covariates, suppressing intercept works as expected.
55 | - For models with categorical covariates, suppressing intercept or other lower-order effects often leads to very surprising results (and should generally be avoided).
56 |
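A small sketch of the coding point above (assuming `sat.act` prepared as in the earlier sketch):

```r
# treatment coding (R default) vs. sum-to-zero coding:
# parameter meanings differ, model fit is identical
m_treat <- lm(ACT ~ gender, sat.act)
m_sum   <- lm(ACT ~ gender, sat.act,
              contrasts = list(gender = "contr.sum"))
coef(m_treat)   # intercept = mean of the reference level
coef(m_sum)     # intercept = unweighted grand mean of the group means
logLik(m_treat) # identical to logLik(m_sum)
```
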
57 | ### Tests of Model Terms/Effects with `car::Anova()`
58 | - `car::Anova(model, type = 3)` general solution for testing effects.
59 | - Type II and III tests equivalent for balanced designs (i.e., equal group sizes) and highest-order effect.
60 | - Type III tests require orthogonal contrasts (e.g.,`contr.sum`); recommended:
61 | + For experimental designs in which imbalance is completely random and not structural,
62 | + Complete cross-over interactions (i.e., main effects in presence of interaction) possible.
63 | - Type II are more appropriate if imbalance is structural (i.e., observational data).
64 |
65 | ### Follow-Up Tests
66 | - Choice of follow-up test after significant interactions based on research questions
67 | - Simple effects (e.g., main effect of one factor conditional on other factor[s])
68 | - Comparison of specific cell means
69 | - Two approaches for follow-up tests:
70 | - Model based using `emmeans` (assumes assumptions hold and uses shared error term)
71 |   - Splitting data and running separate models for each split (assumes assumptions do not hold, uses separate error terms)
72 | - When splitting data or using `emmeans::test()`, adjustment for multiple testing needs to be done by hand; e.g., pass $p$-values to `p.adjust()`
73 |
74 |
75 |
76 | ### Follow-up Tests with `emmeans` (Formerly `lsmeans`)
77 | - `emmeans(model, c("factor"))` (or `emmeans(model, ~factor)`) produces estimated marginal means (or least-squares means for linear regression) for model terms (e.g., `emmeans(m6, c("education", "gender"))`; see the sketch below).
78 | - Additional functions allow specifying contrasts/follow-up tests on the means, e.g.:
79 | + `pairs()` tests all pairwise comparisons among means.
80 |     + `contrast()` allows defining arbitrary contrasts on marginal means.
81 | + `test(..., joint = TRUE)` for joint tests (e.g., simple effects if using `by`).
82 | + For more examples see vignettes: https://cran.r-project.org/package=emmeans
83 |
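A short sketch of such a follow-up (`m6` stands in for a fitted model from the slides that contains `education` and `gender`):

```r
em <- emmeans(m6, c("education", "gender"))
pairs(em, adjust = "holm")         # all pairwise comparisons, Holm-adjusted
contrast(em, method = "pairwise")  # same comparisons via contrast()
```
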
84 | ### ANOVAs with afex
85 |
86 | - `afex` ANOVA functions require column with participant ID:
87 | + `afex::aov_car()` allows specification of ANOVA using `aov`-like formula. Specification of participant id in `Error()` term. For example:
88 | `aov_car(dv ~ between_factor + Error(id/within_factor), data)`
89 | + `afex::aov_4()` allows specification of ANOVA using `lme4`-like formula. Specification of participant id in random term. For example:
90 | `aov_4(dv ~ between_factor + (within_factor|id), data)`
91 | + `afex::aov_ez()` allows specification of ANOVA using characters. For example:
92 | `aov_ez("id", "dv", data, between = "between_factor", within = "within_factor")`
93 | - All `afex` ANOVA functions return the same results (they only differ in how the model is specified)
94 |
95 | ### Repeated-Measures, IID Assumption, & Pooling
96 |
97 | - Ordinary linear regression, between-subjects ANOVA, and basically all standard statistical models share one assumption: Data points are *independent and identically distributed* (*iid*).
98 | + Independence assumption refers to residuals: After taking structure of model (i.e., parameters) into account, probability of a data point having a specific value is independent of all other data points.
99 | + Identical distribution: All observations sampled from same distribution.
100 | - For repeated measures, the independence assumption is often violated, which can have dramatic consequences for the significance tests of the model (e.g., increased or decreased Type I errors).
101 | - Three ways to deal with repeated-measures:
102 | 1. *Complete pooling*: Ignore dependency in data (often not appropriate, results likely biased)
103 |   2. *No pooling*: Two-step procedure: (1) Separate the data based on the factor producing the dependency and fit a separate statistical model to each subset. (2) Analyze the distribution of the estimates from step 1. (Prone to overfitting, which decreases the precision of parameter estimates; estimation error accumulates in step 2; combining and analyzing the individual estimates can be non-trivial if interest is in more than one parameter.)
104 |   3. *Partial pooling*: Analyse the data jointly while taking the dependency into account (gold standard, e.g., mixed models). All three approaches are sketched below.
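A hedged sketch of the three approaches (data frame `dat` with `dv`, covariate `x`, and participant `id` is hypothetical):

```r
library("lme4")
m_complete <- lm(dv ~ x, data = dat)               # complete pooling: ignores id
m_none     <- lmList(dv ~ x | id, data = dat)      # no pooling: one model per id
m_partial  <- lmer(dv ~ x + (x | id), data = dat)  # partial pooling: mixed model
```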
105 |
106 |
107 |
108 | ### Mixed Models
109 |
110 | - Mixed models extend regular regression models via *random-effects parameters* that account for dependencies among related data points.
111 | - __Fixed Effects__
112 | - Overall or *population-level average* effect of specific model term (i.e., main effect, interaction, parameter) on dependent variable
113 | - Independent of stochastic variability controlled for by random effects
114 | - Hypothesis tests on fixed effect interpreted as hypothesis tests for terms in standard ANOVA or regression model
115 | - Possible to test specific hypotheses among factor levels (e.g., planned contrasts)
116 | - *Fixed-effects parameters*: Overall effect of specific model term on dependent variable
117 | - __Random Effects__
118 |   - *Random-effects grouping factors*: Categorical variables that capture random or stochastic variability (e.g., participants, items, groups, or other hierarchical structures).
119 | - In experimental settings, random-effects grouping factors often part of design one wants to generalize over.
120 |   - Random effects factor out idiosyncrasies of the sample, thereby providing a more general estimate of the fixed effects of interest.
121 | - *Random-effects parameters*:
122 | + Provide each level of random-effects grouping factor with idiosyncratic parameter set.
123 |     + Zero-centered offsets/displacements for each level of the random-effects grouping factor
124 |     + Added to a specific fixed-effects parameter
125 |     + Assumed to follow a normal distribution, which provides _hierarchical shrinkage_ and thereby avoids over-fitting
126 |     + Should be added to each parameter that varies within the levels of a random-effects grouping factor (i.e., the factor is *crossed* with the random-effects grouping factor)
127 |     + Note: Random-effects parameters (i.e., random slopes) can only be added to a parameter if there exist multiple data points (i.e., replications) for each combination of a level of the random-effects grouping factor and the parameter (e.g., each cell of the corresponding factor or design)
128 |
129 |
130 | ### Random-Effects Parameters in `lme4`/`afex`
131 |
132 | | Formula | Interpretation |
133 | | ------------------------|----------------------------------|
134 | | `(1|s)` | random intercepts for `s` (i.e., by-`s` random intercepts) |
135 | | `(1|s) + (1|i)` | by-`s` and by-`i` (i.e., crossed) random intercepts |
136 | | `(a|s)` or `(1+a|s)` | by-`s` random intercepts and by-`s` random slopes for `a` plus their correlation|
137 | | `(a*b|s)` | by-`s` random intercepts and by-`s` random slopes for `a`, `b`, and the `a:b` interaction plus correlations among the by-`s` random effects parameters |
138 | | `(0+a|s)` | by-`s` random slopes for `a` and no random intercept |
139 | | `(a||s)` | by-`s` random intercepts and by-`s` random slopes for `a`, but no correlation (expands to: `(0+a|s) + (1|s)`) |
140 | *Note.* Suppressing the correlation parameters via `||` works only for numerical covariates in `lmer` and not for factors. `afex` provides the functionality to suppress the correlations also among factors if the argument `expand_re = TRUE` is set in the call to `mixed()` (see also the function `lmer_alt()`).
141 |
142 | Examples:
143 | `mixed(dv ~ within_s_factor * within_i_factor + (within_s_factor|s) + (within_i_factor|i), data, method = "S")`
144 | `mixed(dv ~ within_s_factor + (within_s_factor||s), data, method = "S", expand_re = TRUE)`
145 |
146 | ### Crossed Versus Nested Factors
147 |
148 | - Factor `A` is **crossed** with factor `B` if multiple levels of `A` appear within multiple levels of `B`. Note that this definition allows for missing values (i.e., it does not need to hold that all levels of `A` appear in all levels of `B`). For example:
149 | - Levels `a1`, `a2`, ... of `A` appear in `b1` of `B` and in `b2` of `B`, etc.
150 | - A within-subject factor (e.g., `congruency`) is crossed with the `participant` factor.
151 | - If each participant responds to a random subset of items and each item is responded to by several participants, `participant` and `item` are crossed.
152 |
153 |
154 | - Factor `A` is **nested** within factor `B` if some levels of `A` appear only within specific levels of factor `B`. For example:
155 | - Levels `a1`, `a2`, and `a3` of `A` appear only in `b1` of `B` and `a4`, `a5`, and `a6` of `A` appear only in `b2` of `B`
156 | - Participants are nested in a between-subjects factor (e.g., `group`), because each level of `participant` only provides data for one level of the factor.
157 | - If student can be member of one class only and several classes were observed, factor `student` is nested within factor `class`.
158 |
159 |
160 | - Both dependency structures are dealt with in the same conceptual manner, via independent random-effects parameters. Specifically, both need independent random-effects terms in the model formula. For example:
161 |   - For `students` nested within `class`, where each student has a unique label (i.e., `student` id 1 is assigned to exactly one student and not to different students in different classes), at least:
162 | `... + (1|student) + (1|class)`
163 |   - If an additional factor `A` is crossed with `class`, but not with `student` (e.g., some students in each class receive treatment `a1`, some others `a2`), by-class random slopes need to be added (see also the sketch below):
164 | `... + (1|student) + (A|class)`
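If student labels are reused across classes (e.g., both classes contain a student labeled `s1`), unique IDs should be created first; a hedged sketch (all object names hypothetical):

```r
dat$student <- interaction(dat$class, dat$student, drop = TRUE)  # unique per class
m <- lme4::lmer(dv ~ A + (1 | student) + (A | class), data = dat)
```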
165 |
166 |
167 | ### Hypothesis-Tests for Mixed Models
168 |
169 | - `lme4::lmer` does not include *p*-values.
170 | - `afex::mixed` provides four different methods:
171 |   1. Kenward-Roger (`method="KR"`, default): Provides the best protection against anti-conservative results, but requires a lot of RAM for complicated random-effects structures.
172 |   2. Satterthwaite (`method="S"`): Similar to KR, but requires less RAM.
173 |   3. Parametric bootstrap (`method="PB"`): Simulation-based, can take a lot of time (can be sped up using parallel computation).
174 |   4. Likelihood-ratio tests (`method="LRT"`): Provides the weakest protection against anti-conservative results. Can be used if all else fails or if all random-effects grouping factors have many levels (e.g., over 50).
175 | - `afex::mixed` uses orthogonal contrasts by default; these are necessary for categorical variables in interactions (see the sketch below).
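A hedged sketch of fitting and testing such a model with `afex::mixed` (data frame `dat`, factor `a`, and participant `id` are hypothetical):

```r
library("afex")
m <- mixed(dv ~ a + (a | id), data = dat, method = "S")  # Satterthwaite tests
nice(m)  # ANOVA-style table of the fixed effects with p-values
```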
176 |
177 | ### Random-Effects Structure
178 |
179 | - Omitting random-effects parameters for model terms which vary within the levels of a random-effects grouping factor and for which random variability exists leads to non-iid residuals (i.e., $\epsilon$) and anti-conservative results (e.g., Barr, Levy, Scheepers, & Tily, 2013).
180 | - Safeguard is *maximal model justified by the design*.
181 | - If the maximal model is overparameterized, contains degenerate estimates, and/or produces singular fits, its power may be reduced and a reduced model may be considered (Bates et al., 2015; Matuschek et al., 2017); however, reducing the model introduces an unknown risk of anti-conservativity and should be done with caution.
182 | - Steps for running a mixed model analysis:
183 | 1. Identify desired fixed-effects structure
184 | 2. Identify random-effects grouping factors
185 | 3. Identify *maximal model justified by the design*:
186 |     - Which factors/terms vary within levels of (i.e., are crossed with) each random-effects grouping factor?
187 | - Are there replicates within factor levels (or parameters/coefficients) for levels of random-effects grouping factor?
188 | 4. Choose method for calculating *p*-values and fit maximal model
189 | 5. Iteratively reduce random-effects structure until all degenerate/zero-variance random-effects parameters are removed.
190 | - If the maximal model shows critical convergence warnings, reducing the random-effects structure is probably indicated, even though this introduces an unknown risk of anti-conservativity (see the sketch after this list):
191 | - Start by removing the correlation among random-effects parameters
192 | - Remove random-effects parameters for highest-order effects with lowest variance
193 | - It can sometimes help to try different optimizers
194 | - Compare *p*-values/fixed-effects estimates across models (*p*-values from degenerate/minimal models are not reliable)
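A hedged sketch of such a reduction sequence, assuming a recent `afex` version in which the fitted `lme4` model is stored in `$full_model` (all data and object names hypothetical):

```r
m_max <- mixed(dv ~ a + (a | id), data = dat, method = "S")  # maximal model
m_nc  <- mixed(dv ~ a + (a || id), data = dat, method = "S",
               expand_re = TRUE)    # first remove the correlation parameter
lme4::VarCorr(m_nc$full_model)      # inspect remaining variance estimates
m_red <- mixed(dv ~ a + (1 | id), data = dat, method = "S")  # then drop zero-variance slopes
```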
195 |
196 | ### GLMMs: Mixed-models with Alternative Distributional Assumptions
197 |
198 | - Not all data can be reasonably described by a normal distribution.
199 | - Generalized-linear mixed models (GLMMs; e.g., Jaeger, 2008) allow for other distributions. For example:
200 | - Binomial distribution: Repeated-measures logistic regression
201 | - Poisson distribution for count data
202 | - Gamma distribution for non-negative data (e.g., RTs)
203 | - GLMMs require specification of the conditional distribution of the response (`family`) and link function.
204 | - Link function determines how values on the untransformed (linear-predictor) scale are mapped onto the response scale.
205 | - Specification of the random-effects structure is conceptually identical to LMMs.
206 | - GLMMs only allow two methods for hypothesis testing: `"LRT"` or `"PB"`.
207 | - Inspection of residuals/model fit is more important for GLMMs than for LMMs: R package [`DHARMa`](https://cran.r-project.org/package=DHARMa) (see the sketch below)
208 | - Fit with `lme4::glmer` or `afex::mixed`, both require `family` argument (e.g., `family = binomial`):
209 |   `mixed(prop ~ a * b + (a|s) + (b|i), data, weights = data$n, family = binomial, method = "LRT")` (Note: `data$n * data$prop` must produce integers, i.e., the number of successes.)
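For the residual checks, a hedged `DHARMa` sketch (model and data names hypothetical):

```r
library("DHARMa")
g <- lme4::glmer(cbind(succ, n - succ) ~ a + (1 | s), data = dat,
                 family = binomial)        # successes/failures response
res <- simulateResiduals(fittedModel = g)  # simulation-based residuals
plot(res)                                  # QQ and residual-vs-predicted plots
```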
210 |
--------------------------------------------------------------------------------
/handout/mixed_model_handout.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/singmann/mixed_model_workshop/cb1ba0d82579103e246402c25015bc99fa27fbeb/handout/mixed_model_handout.pdf
--------------------------------------------------------------------------------
/part0-introduction/.Rhistory:
--------------------------------------------------------------------------------
1 | options(htmltools.dir.version = FALSE)
2 | # see: https://github.com/yihui/xaringan
3 | # install.packages("xaringan")
4 | # see:
5 | # https://github.com/yihui/xaringan/wiki
6 | # https://github.com/gnab/remark/wiki/Markdown
7 | options(width=110)
8 | options(digits = 4)
9 | require(psych)
10 | data(sat.act)
11 | sat.act$gender <- factor(sat.act$gender, 1:2, labels = c("male", "female"))
12 | sat.act$education <- factor(sat.act$education)
13 | summary(sat.act) # alternatively: psych::describe(sat.act)
14 | sat.act <- na.omit(sat.act)
15 | par(mfrow=c(1,2))
16 | plot(sat.act$SATV, sat.act$ACT)
17 | plot(sat.act$SATQ, sat.act$ACT)
18 | m1 <- lm(ACT ~ SATQ, sat.act)
19 | summary(m1)
20 | coef(m1)
21 | plot(sat.act$SATV, sat.act$ACT)
22 | abline(m1)
23 | sat.act$SATQ_c <- sat.act$SATQ - mean(sat.act$SATQ, na.rm = TRUE)
24 | sat.act$SATV_c <- sat.act$SATV - mean(sat.act$SATV)
25 | m2 <- lm(ACT ~ SATQ_c, sat.act)
26 | summary(m2)
27 | coef(m2)
28 | plot(sat.act$SATV_c, sat.act$ACT)
29 | abline(m2)
30 | plot(ACT ~ SATV, sat.act)
31 | plot(ACT ~ SATV_c, sat.act)
32 | ?formula
33 | cbind(rnorm(10), rnorm(10))
34 | cbind(rnorm(10), rnorm(10), rnorm(10))
35 | cbind(rnorm(10), rnorm(10), rnorm(9))
36 |
--------------------------------------------------------------------------------
/part0-introduction/figures/RMarkdown-example.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/singmann/mixed_model_workshop/cb1ba0d82579103e246402c25015bc99fa27fbeb/part0-introduction/figures/RMarkdown-example.png
--------------------------------------------------------------------------------
/part0-introduction/figures/ch-02-markdown-margin.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/singmann/mixed_model_workshop/cb1ba0d82579103e246402c25015bc99fa27fbeb/part0-introduction/figures/ch-02-markdown-margin.png
--------------------------------------------------------------------------------
/part0-introduction/figures/data-science.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/singmann/mixed_model_workshop/cb1ba0d82579103e246402c25015bc99fa27fbeb/part0-introduction/figures/data-science.png
--------------------------------------------------------------------------------
/part0-introduction/figures/github-workshop.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/singmann/mixed_model_workshop/cb1ba0d82579103e246402c25015bc99fa27fbeb/part0-introduction/figures/github-workshop.png
--------------------------------------------------------------------------------
/part0-introduction/figures/magrittr.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/singmann/mixed_model_workshop/cb1ba0d82579103e246402c25015bc99fa27fbeb/part0-introduction/figures/magrittr.png
--------------------------------------------------------------------------------
/part0-introduction/figures/markdownChunk2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/singmann/mixed_model_workshop/cb1ba0d82579103e246402c25015bc99fa27fbeb/part0-introduction/figures/markdownChunk2.png
--------------------------------------------------------------------------------
/part0-introduction/figures/tidy-1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/singmann/mixed_model_workshop/cb1ba0d82579103e246402c25015bc99fa27fbeb/part0-introduction/figures/tidy-1.png
--------------------------------------------------------------------------------
/part0-introduction/introduction.R:
--------------------------------------------------------------------------------
1 | ## ----setup, include=FALSE------------------------------------------------
2 | options(htmltools.dir.version = FALSE)
3 | # see: https://github.com/yihui/xaringan
4 | # install.packages("xaringan")
5 | # see:
6 | # https://github.com/yihui/xaringan/wiki
7 | # https://github.com/gnab/remark/wiki/Markdown
8 | options(width=110)
9 | options(digits = 4)
10 |
11 | ## ---- eval = FALSE-------------------------------------------------------
12 | ## ---
13 | ## title: "My Title"
14 | ## author: "Henrik Singmann"
15 | ## date: "`r format(Sys.time(), '%d %B, %Y')`"
16 | ## output:
17 | ## html_document:
18 | ## toc: TRUE
19 | ## toc_float: true
20 | ## theme: paper
21 | ## highlight: espresso
22 | ## ---
23 |
24 | ## ---- echo=FALSE---------------------------------------------------------
25 | 1 + 1
26 |
27 | ## ---- eval=FALSE---------------------------------------------------------
28 | ## iris
29 |
30 | ## ---- eval=TRUE, echo=FALSE----------------------------------------------
31 | options(width = 50)
32 | iris[1:5, 1:3] # [...]
33 |
34 | ## ---- eval=TRUE----------------------------------------------------------
35 | iris$Spec
36 |
37 | ## ---- eval=TRUE----------------------------------------------------------
38 | library("tibble")
39 | iris2 <- as_tibble(iris)
40 | iris2
41 | iris2$Spec
42 |
43 | ## ---- eval=FALSE---------------------------------------------------------
44 | ## x %>% f
45 | ## x %>% f(y)
46 | ## x %>% f %>% g %>% h
47 | ##
48 | ## x %>% f(y, .)
49 | ## x %>% f(y, z = .)
50 |
51 | ## ---- eval=FALSE---------------------------------------------------------
52 | ## f(x)
53 | ## f(x, y)
54 | ## h(g(f(x)))
55 | ##
56 | ## f(y, x)
57 | ## f(y, z = x)
58 |
59 | ## ---- eval=FALSE---------------------------------------------------------
60 | ## library(magrittr)
61 | ## iris2$Sepal.Length %>%
62 | ## mean
63 |
64 | ## ---- message=FALSE------------------------------------------------------
65 | library("dplyr")
66 | iris2 %>%
67 | filter(Species == "setosa") %>%
68 | summarise(mean(Sepal.Length))
69 |
70 | ## ------------------------------------------------------------------------
71 | iris2 %>%
72 | group_by(Species) %>%
73 | summarise(mean_l = mean(Sepal.Length),
74 | max_l = max(Sepal.Length),
75 | min_l = min(Sepal.Length),
76 | sd_l = sd(Sepal.Length))
77 |
78 | ## ---- eval=FALSE---------------------------------------------------------
79 | ## library("ggplot2")
80 | ## ggplot(iris2, aes(x = Petal.Width, y = Petal.Length)) +
81 | ## geom_point()
82 |
83 | ## ---- eval=FALSE---------------------------------------------------------
84 | ## ggplot(iris2, aes(x = Petal.Width, y = Petal.Length, color = Species)) +
85 | ## geom_point()
86 |
87 | ## ---- eval=FALSE---------------------------------------------------------
88 | ## ggplot(iris2, aes(x = Species, y = Petal.Length)) +
89 | ## geom_jitter(width = 0.2) +
90 | ## geom_boxplot(fill = "transparent") +
91 | ## theme_bw()
92 |
93 |
--------------------------------------------------------------------------------
/part0-introduction/introduction.Rmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Mixed Models in R"
3 | subtitle: "An Applied Introduction"
4 | author: "Henrik Singmann (University of Zurich) Twitter: @HenrikSingmann"
5 | date: "July 2018"
6 | output:
7 | xaringan::moon_reader:
8 | css: ["default", "default-fonts", "my-theme.css"]
9 | lib_dir: libs
10 | nature:
11 | highlightStyle: github
12 | highlightLines: true
13 | countIncrementalSlides: false
14 | ratio: '16:9'
15 | ---
16 |
17 |
18 |
19 | ```{r setup, include=FALSE}
20 | options(htmltools.dir.version = FALSE)
21 | # see: https://github.com/yihui/xaringan
22 | # install.packages("xaringan")
23 | # see:
24 | # https://github.com/yihui/xaringan/wiki
25 | # https://github.com/gnab/remark/wiki/Markdown
26 | options(width=110)
27 | options(digits = 4)
28 | ```
29 |
30 |
31 | class: inline-grey
32 | # Outline
33 |
34 | 1. Introduction: Modern `R`
35 | 2. Statistical Modeling in `R`
36 | 3. Dealing with repeated-measures (pooling)
37 | 4. Mixed models
38 |
39 | ---
40 | class: small
41 |
42 | ### Research and Statistics
43 |
44 | - *Substantive research questions*
45 | 1. Negative cognitive distortions sustain depressive symptoms.
46 | 2. Interference and not decay is the main source of forgetting in memory.
47 | 3. Inhibition is a specific and general mental ability, like IQ.
48 |
49 | --
50 |
51 | - *Operationalization and measurement*
52 | 1. Educating patients how to escape their negative thoughts should reduce depressive symptoms.
53 | 2. Control independently time of delay and amount of new information.
54 | 3. Ability to suppress distracting information should be related across tasks. For example, Stroop performance and flanker performance.
55 |
56 | --
57 |
58 |
59 | - Substantive questions cannot be directly addressed via empirical means (e.g., [Duhem-Quine thesis](https://en.wikipedia.org/wiki/Duhem%E2%80%93Quine_thesis)).
60 | - Researchers use empirical observations (data) for making arguments about research questions.
61 | - Appropriate *research methods* (e.g., experimental design, reliability, validity, reproducibility) help in making better (i.e., more convincing) arguments.
62 | - *Data visualization* and *statistics* are important tools for making good arguments about data:
63 | - A statistic cannot prove nor disprove a substantive research question or empirical hypothesis: *statistical arguments need context (e.g., data visualization).* [this is why AIC/BIC/WAIC/... often sucks]
64 | - Some statistical arguments are better, some are worse, and some have essentially no evidential value.
65 |   - *Statistics is not a ritual* (e.g., [Gigerenzer, 2018](https://doi.org/10.1177/2515245918771329)). Instead, statistics is a toolkit; researchers have to select the right tool for each job.
66 | --
67 | - "There are no routine statistical questions, only questionable statistical routines." (David Cox)
68 | - "The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data." (John Tukey)
69 |
70 |
71 | ---
72 |
73 | ### Process and Tools: `tidyverse` and `RMarkdown`
74 |
75 | 
76 |
77 | Conceptual model of data analysis (source: [Wickham & Grolemund (2017): R for Data Science](http://r4ds.had.co.nz))
78 |
79 |
80 | --
81 |
82 | - `tidyverse`: Selection of packages curated/developed by `RStudio`:
83 | - [`readr`](https://readr.tidyverse.org/): Reading data in, the `RStudio` way.
84 | - Data wrangling with [`tibble`](http://tibble.tidyverse.org/), [`magrittr`](http://magrittr.tidyverse.org/), [`tidyr`](http://tidyr.tidyverse.org/), and [`dplyr`](http://dplyr.tidyverse.org/): Coherent set of functions for tidying, transforming, and working with rectangular data. Supersedes many base `R` functions and makes common problems easy.
85 | - [`ggplot2`](http://ggplot2.tidyverse.org/): System for data visualization.
86 | - [`purrr`](http://purrr.tidyverse.org/) and [`broom`](https://broom.tidyverse.org/): Advanced modeling with the `tidyverse`.
87 |
88 | --
89 |
90 | - `RMarkdown` "authoring framework for data science".
91 |
92 | ---
93 |
94 | # `RMarkdown`
95 |
96 |
97 | - Context requires combination of a narrative/prose with data visualization and statistical results.
98 | - `RMarkdown` "authoring framework for data science".
99 | - Single document, `.Rmd` file, combines text, pictures, and `R` code.
100 | - Render document: Runs code and combines text, pictures, code, and output (i.e., text output and plots) into nicely formatted result:
101 | - `html` file
102 | - `pdf` or `Word` file
103 | - presentation (like this one)
104 | - blog or other website (`blogdown`), books (`bookdown`), interactive tutorials (`learnr`), [...](https://www.rstudio.com/resources/videos/r-markdown-eight-ways/)
105 |
106 | --
107 |
108 | - `RMarkdown` is efficient, easy to use, ensures reproducibility, and
109 | - is ideal for communicating results with collaborators or PIs,
110 |     - can be used for writing preregistrations with [`prereg`](https://cran.r-project.org/package=prereg),
111 | - and even for writing papers (i.e., [`papaja`](https://github.com/crsh/papaja)).
112 |
113 | --
114 |
115 |
116 | - *Warning:* If you send an `RMarkdown` `html` report, it needs to be downloaded before figures are visible (e.g., opening it directly from `gmail` does not show plots)!
117 |
118 | ---
119 | class:inline-grey, small
120 |
121 | ### `RMarkdown` - First Steps
122 |
123 | - Create new `RMarkdown` document: `File` -> `New File` -> `R Markdown...`
124 | - Enter title and your name -> Keep `html` selected -> `Ok`
125 | - `Save` file somewhere (e.g., `test.Rmd` in `Downloads`) -> `Knit` creates and opens `html` document
126 |
127 |
128 | ---
129 |
130 | ### `RMarkdown` Document Example ([source](http://rstudio-pubs-static.s3.amazonaws.com/202429_acbbe794b27f4dffaac6047d1b6d5aa0.html))
131 |
132 | 
133 |
134 | ---
135 | class:inline-grey, small
136 |
137 | ### `RMarkdown` - YAML Header
138 |
139 |
140 | ```{r, eval = FALSE}
141 | ---
142 | title: "My Title"
143 | author: "Henrik Singmann"
144 | date: "`r format(Sys.time(), '%d %B, %Y')`"
145 | output:
146 | html_document:
147 | toc: TRUE
148 | toc_float: true
149 | theme: paper
150 | highlight: espresso
151 | ---
152 | ```
153 |
154 | - `YAML` Stands for "YAML Ain't Markup Language"
155 | - This is where you set options for your overall document, for example:
156 | - [output format](https://rmarkdown.rstudio.com/formats.html) (`html_document`, `pdf_document`, `word_document`, `github_document`, ...)
157 | - add and format table of content
158 | - appearance (also add custom `css`)
159 | - see [`RMarkdown` cheat sheet](https://github.com/rstudio/cheatsheets/raw/master/rmarkdown-2.0.pdf) or https://rmarkdown.rstudio.com/html_document_format.html
160 |
161 | ---
162 | class: small
163 |
164 | ### Text Formatting
165 |
166 | .pull-left[
167 | 
168 |
169 | `[link](www.rstudio.com)` -> [link](www.rstudio.com)
170 |
171 |
172 | (source: http://socviz.co/gettingstarted.html#work-in-plain-text-using-rmarkdown)
173 | ]
174 |
175 |
176 |
177 | ---
178 | class: small
179 |
180 | 
181 |
182 | ---
183 |
184 | ### Code Chunks
185 |
186 | ````
187 | ```{r chunk_name, echo=FALSE}`r ''`
188 | 1 + 1
189 | ```
190 | ````
191 |
192 | ```{r, echo=FALSE}
193 | 1 + 1
194 | ```
195 |
196 | - Run a code chunk with `Ctrl`/`Cmd` + `Shift` + `Enter`
197 |
198 | Important chunk options:
199 | - `echo`: Display code in output document (default = `TRUE`)
200 | - `eval`: Run code in chunk (default = `TRUE`)
201 | - `include`: Include chunk and output in doc after running (default = `TRUE`)
202 | - `fig.height` and `fig.width`: Dimensions of plots in inches
203 | - `error`: Display error messages in doc (`TRUE`) or stop render when errors occur (`FALSE`) (default = `FALSE`)
204 | - `warning`: display code warnings in document (default = `TRUE`)
205 | - `results`: How to format results:
206 | - default = `'markup'`
207 | - `'asis'` - pass through results
208 | - `'hide'` - do not display results
209 | - `'hold'` - put all results below all code
210 | - `cache`: cache results for future knits (default = `FALSE`)
211 |
212 | --
213 |
214 | Try replacing `summary(cars)` with `str(cars)`
215 |
216 | ---
217 | class: small, inline-grey
218 |
219 | - visit: [`https://github.com/singmann/mixed_model_workshop/releases`](https://github.com/singmann/mixed_model_workshop/releases)
220 | - Download `Source code (zip)` (or `Source code (tar.gz) `)
221 |
222 |
223 |
224 |
225 |
226 | ---
227 | class: inline-grey
228 |
229 | ## Workshop Materials
230 |
231 | - `zip` Archive contains all materials (e.g., slides, code, exercises) of the current workshop
232 | - Extract `zip` archive if necessary
233 | - All slides are built using `RMarkdown` and `xaringan` package.
234 | - `part0-introduction` materials for introduction session (these slides).
235 | - `part1-statistical-modeling-in-r` materials for statistical modeling session.
236 | - `part2-mixed-models-in-r` materials for mixed models session.
237 | - In each folder:
238 | - `.Rmd` file is the `RMarkdown` containing text and code for the slides.
239 | - `.R` file only contains the code for the slides and no text.
240 |     - You can follow the presentation using either file. Don't forget:
241 | - Run a code chunk (i.e., grey block) with `Ctrl`/`Cmd` + `Shift` + `Enter`
242 | - Run a single line of code with `Ctrl`/`Cmd` + `Enter`
243 | - `.html` is the full presentation you are seeing. After opening, press `h` for help.
244 | - `exercises` contains some exercises.
245 | - `handout` contains the handout (also includes the `RMarkdown` file)
246 |
247 | ---
248 | class: center, middle, inverse
249 |
250 | # `tidyverse`
251 |
252 | ---
253 | class: small
254 |
255 | .pull-left[
256 | ### `tibble`
257 |
258 | - "**tibble** or `tbl_df` is a modern reimagining of the data.frame, keeping what time has proven to be effective, and throwing out what is not."
259 | - Dramatically enhanced `print` method.
260 | - Does not change `strings` to `factors`.
261 | - Complains when a variable is missing (i.e., no partial matching).
262 | - Allows list columns (with nice printing).
263 |
264 | ```{r, eval=FALSE}
265 | iris
266 | ```
267 |
268 | ```{r, eval=TRUE, echo=FALSE}
269 | options(width = 50)
270 | iris[1:5, 1:3] # [...]
271 | ```
272 |
273 | [...]
274 |
275 | ```{r, eval=TRUE}
276 | iris$Spec
277 | ```
278 | ]
279 |
280 | --
281 |
282 | .pull-right[
283 | ```{r, eval=TRUE}
284 | library("tibble")
285 | iris2 <- as_tibble(iris)
286 | iris2
287 | iris2$Spec
288 | ```
289 |
290 | ]
291 |
292 | ---
293 | class:inline-grey
294 |
295 | ## `magrittr`
296 |
297 | - Pipe operator `%>%` makes code more readable:
298 | - structuring sequences of data operations left-to-right (as opposed to from the inside and out)
299 | - avoiding nested function calls,
300 | - minimizing need for local variables and function definitions.
301 | - Add pipe with `Ctrl`/`Cmd` +`Shift` + `m`
302 |
303 | .pull-left[
304 | ### Pipe
305 |
306 | ```{r, eval=FALSE}
307 | x %>% f
308 | x %>% f(y)
309 | x %>% f %>% g %>% h
310 |
311 | x %>% f(y, .)
312 | x %>% f(y, z = .)
313 | ```
314 |
315 | ]
316 |
317 | .pull-right[
318 | ### Base R
319 | ```{r, eval=FALSE}
320 | f(x)
321 | f(x, y)
322 | h(g(f(x)))
323 |
324 | f(y, x)
325 | f(y, z = x)
326 | ```
327 |
328 | ]
329 |
330 | --
331 |
332 | Try it out:
333 | ```{r, eval=FALSE}
334 | library(magrittr)
335 | iris2$Sepal.Length %>%
336 | mean
337 | ```
338 |
339 | ---
340 | class: small
341 |
342 | ### Tidy Data (`tidyr`)
343 |
344 | *"Tidy datasets are all alike, but every messy dataset is messy in its own way." -- Hadley Wickham*
345 |
346 | 1. Put each data set in a `tibble`.
347 | 2. Put each variable in a column.
348 | 1. Each variable must have its own column.
349 | 2. Each observation must have its own row.
350 | 3. Each value must have its own cell.
351 | 
352 | --
353 |
354 | - For psychologists: Transform wide into long data. See also:
355 | - Wickham, H. (2014). Tidy data. *The Journal of Statistical Software*, 59(10). http://www.jstatsoft.org/v59/i10
356 | - Wickham, H., & Grolemund, G. (2017). R for Data Science (ch. 12). http://r4ds.had.co.nz/tidy-data.html
357 |
358 | ---
359 |
360 | ### `dplyr`
361 |
362 | - grammar of data manipulation, providing a consistent set of verbs that help you solve the most common data manipulation challenges:
363 | - `mutate()` adds new variables that are functions of existing variables
364 | - `select()` picks variables based on their names.
365 | - `filter()` picks cases based on their values.
366 | - `summarise()` reduces multiple values down to a single summary.
367 | - `arrange()` changes the ordering of the rows.
368 | - All combine naturally with `group_by()` which allows performing any operation "by group".
369 |
370 | --
371 |
372 | .pull-left[
373 | ```{r, message=FALSE}
374 | library("dplyr")
375 | iris2 %>%
376 | filter(Species == "setosa") %>%
377 | summarise(mean(Sepal.Length))
378 | ```
379 |
380 | ]
381 |
382 | --
383 |
384 | .pull-right[
385 |
386 | ```{r}
387 | iris2 %>%
388 | group_by(Species) %>%
389 | summarise(mean_l = mean(Sepal.Length),
390 | max_l = max(Sepal.Length),
391 | min_l = min(Sepal.Length),
392 | sd_l = sd(Sepal.Length))
393 | ```
394 | ]
395 |
396 |
397 |
398 | ---
399 |
400 | ### `ggplot2`
401 |
402 | - System for declaratively creating graphics, based on "The Grammar of Graphics" by Leland Wilkinson
403 | - "You provide data, tell `ggplot2` how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details."
404 | - `ggplot()` is the basic function which takes the data.
405 | - `aes()` is used for mapping aesthetics.
406 | - `geom_...` tells which primitive to use.
407 |
408 | ```{r, eval=FALSE}
409 | library("ggplot2")
410 | ggplot(iris2, aes(x = Petal.Width, y = Petal.Length)) +
411 | geom_point()
412 | ```
413 | --
414 |
415 | ```{r, eval=FALSE}
416 | ggplot(iris2, aes(x = Petal.Width, y = Petal.Length, color = Species)) +
417 | geom_point()
418 | ```
419 | --
420 |
421 |
422 | ```{r, eval=FALSE}
423 | ggplot(iris2, aes(x = Species, y = Petal.Length)) +
424 | geom_jitter(width = 0.2) +
425 | geom_boxplot(fill = "transparent") +
426 | theme_bw()
427 | ```
428 | --
429 |
430 | - Learning `ggplot2`:
431 | - R for Data Science, http://r4ds.had.co.nz/, ch. 3 and ch. 28
432 | - `ggplot2` cheat sheet: https://github.com/rstudio/cheatsheets/raw/master/data-visualization-2.1.pdf
433 |
434 | ---
435 |
436 | ## Summary
437 |
438 | 
439 |
440 | - Data analysis and statistics are iterative processes.
441 | - Goal of statistics is to support arguments connecting empirical data and substantive research questions.
442 |
443 | ### `tidyverse`
444 | - Selection of packages providing a unified approach and syntax to common data analysis problems.
445 | - To learn more about the `tidyverse` check out the free "R for Data Science" book by Wickham & Grolemund: http://r4ds.had.co.nz/
446 |
447 | ### `RMarkdown`
448 | - Allows combining prose, code, and output into one nicely formatted document.
449 | - Great for communicating results and ensuring reproducibility.
450 |
451 | ---
452 | class: inline-grey
453 |
454 | ### Exercise
455 |
456 | - Open `exercises/exercise_0.Rmd` (or the exercise handout or `exercises/exercise_0.nb.html` for a nicer format of the instruction).
457 | - Follow the text and try to solve a few small tasks that help you get comfortable with the `tidyverse` (without looking at the solution).
458 | - Main goal is for you to get comfortable with `dplyr` and `ggplot2` syntax.
459 | - Exercise uses response time data from Freeman, Heathcote, Chalmers, and Hockley (2010).
460 | - Participants did a lexical-decision task or a naming task.
461 |
462 | The exercise uses yet another type of `RMarkdown` document, `html_notebook` instead of `html_document`:
463 | - `html_document`: "knitting" runs all code in a new `R` process from beginning to end (which ensures reproducibility).
464 | - In contrast, a `html_notebook`
465 | - uses current `R` process (i.e., state of `Console`), similar to [`Jupyter`](http://jupyter.org/) (does *NOT* ensure reproducibility),
466 |     - allows you to `Preview` the current state of the document as an `html` file,
467 | - potentially better for initial analysis or situations involving expensive calculations,
468 | - can be transformed into `html_document` by simply changing the YAML header.
469 |
470 | Remember:
471 | - Run a code chunk (i.e., grey block) with `Ctrl`/`Cmd` + `Shift` + `Enter`
472 | - Run a single line of code with `Ctrl`/`Cmd` + `Enter`
473 |
474 | ---
475 |
476 | ### Links
477 | - `RStudio` cheat sheets: https://www.rstudio.com/resources/cheatsheets/
478 | - `RStudio`: https://github.com/rstudio/cheatsheets/raw/master/rstudio-ide.pdf
479 | - `RMarkdown`: https://github.com/rstudio/cheatsheets/raw/master/rmarkdown-2.0.pdf
480 | - `ggplot2`: https://github.com/rstudio/cheatsheets/raw/master/data-visualization-2.1.pdf
481 | - `dplyr` & `tidyr`: https://github.com/rstudio/cheatsheets/raw/master/data-transformation.pdf
482 |
483 | - Introduction to Open Data Science: http://ohi-science.org/data-science-training/
484 | - R for Data Science: http://r4ds.had.co.nz/
485 | - Data Visualization: A practical introduction: http://socviz.co/
486 | - Exploratory Data Analysis with R: https://bookdown.org/rdpeng/exdata/
487 | - The Art of Data Science: https://bookdown.org/rdpeng/artofdatascience/
488 |
489 |
490 |
--------------------------------------------------------------------------------
/part0-introduction/introduction.html:
--------------------------------------------------------------------------------
1 |
2 |
3 |