├── .gitattributes ├── .gitignore ├── README.md ├── data ├── earnings.RData ├── gpa.RData ├── nurses.RData ├── patents.RData ├── popularity.RData ├── pupils.RData ├── sociometric.RData ├── speed_dating.RData ├── speed_dating_model.RData └── storms.RData ├── mixed-models-with-r-workshop-2019.Rproj └── notebooks ├── extensions.Rmd ├── extensions.nb.html ├── introduction.Rmd ├── introduction.nb.html ├── mixed_models_basics.Rmd ├── mixed_models_basics.nb.html ├── random_slopes.Rmd └── random_slopes.nb.html /.gitattributes: -------------------------------------------------------------------------------- 1 | # History files 2 | .Rhistory 3 | .Rapp.history 4 | 5 | # Session Data files 6 | .RData 7 | 8 | # Example code in package build process 9 | *-Ex.R 10 | 11 | # Output files from R CMD build 12 | /*.tar.gz 13 | 14 | # Output files from R CMD check 15 | /*.Rcheck/ 16 | 17 | # RStudio files 18 | .Rproj.user/ 19 | 20 | # Primary content 21 | *.Rmd 22 | 23 | # produced vignettes 24 | _book/*.html 25 | _docs/*.html 26 | _book/*.pdf 27 | 28 | # OAuth2 token, see https://github.com/hadley/httr/releases/tag/v0.3 29 | .httr-oauth 30 | 31 | # knitr and R markdown default cache directories 32 | /*_bookdown_files/ 33 | /*_cache/ 34 | /cache/ 35 | 36 | # Temporary files created by R markdown 37 | *.utf8.md 38 | *.knit.md 39 | 40 | *.html linguist-documentation=true 41 | *.js linguist-vendored 42 | *.sh linguist-vendored 43 | *.css linguist-vendored 44 | 45 | 46 | *.Rmd linguist-language=R 47 | *.stan linguist-language=Stan 48 | *.inp linguist-language=Mplus 49 | *.out linguist-generated=true -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | .Rproj.user 2 | .Rhistory 3 | .RData 4 | .Ruserdata 5 | .DS_* -------------------------------------------------------------------------------- /README.md: 
-------------------------------------------------------------------------------- 1 | # Mixed Models with R Workshop 2019 2 | 3 | Notebooks and data for a workshop demonstrating mixed models. For more details see: https://m-clark.github.io/mixed-models-with-R/ 4 | 5 | For those in the workshop: after downloading this repo, open the RStudio project inside, then open each of the notebook files in the notebooks folder. They contain condensed code and exercises from the document linked above. 6 | 7 | The order of the notebooks: 8 | 9 | - introduction 10 | - mixed_models_basics 11 | - random_slopes 12 | - extensions -------------------------------------------------------------------------------- /data/earnings.RData: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/m-clark/mixed-models-with-r-workshop-2019/24f2359a7740dc2e85274b4a98143bb44319a0f0/data/earnings.RData -------------------------------------------------------------------------------- /data/gpa.RData: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/m-clark/mixed-models-with-r-workshop-2019/24f2359a7740dc2e85274b4a98143bb44319a0f0/data/gpa.RData -------------------------------------------------------------------------------- /data/nurses.RData: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/m-clark/mixed-models-with-r-workshop-2019/24f2359a7740dc2e85274b4a98143bb44319a0f0/data/nurses.RData -------------------------------------------------------------------------------- /data/patents.RData: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/m-clark/mixed-models-with-r-workshop-2019/24f2359a7740dc2e85274b4a98143bb44319a0f0/data/patents.RData --------------------------------------------------------------------------------
/data/popularity.RData: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/m-clark/mixed-models-with-r-workshop-2019/24f2359a7740dc2e85274b4a98143bb44319a0f0/data/popularity.RData -------------------------------------------------------------------------------- /data/pupils.RData: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/m-clark/mixed-models-with-r-workshop-2019/24f2359a7740dc2e85274b4a98143bb44319a0f0/data/pupils.RData -------------------------------------------------------------------------------- /data/sociometric.RData: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/m-clark/mixed-models-with-r-workshop-2019/24f2359a7740dc2e85274b4a98143bb44319a0f0/data/sociometric.RData -------------------------------------------------------------------------------- /data/speed_dating.RData: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/m-clark/mixed-models-with-r-workshop-2019/24f2359a7740dc2e85274b4a98143bb44319a0f0/data/speed_dating.RData -------------------------------------------------------------------------------- /data/speed_dating_model.RData: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/m-clark/mixed-models-with-r-workshop-2019/24f2359a7740dc2e85274b4a98143bb44319a0f0/data/speed_dating_model.RData -------------------------------------------------------------------------------- /data/storms.RData: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/m-clark/mixed-models-with-r-workshop-2019/24f2359a7740dc2e85274b4a98143bb44319a0f0/data/storms.RData -------------------------------------------------------------------------------- /mixed-models-with-r-workshop-2019.Rproj: 
-------------------------------------------------------------------------------- 1 | Version: 1.0 2 | 3 | RestoreWorkspace: Default 4 | SaveWorkspace: Default 5 | AlwaysSaveHistory: Default 6 | 7 | EnableCodeIndexing: Yes 8 | UseSpacesForTab: Yes 9 | NumSpacesForTab: 2 10 | Encoding: UTF-8 11 | 12 | RnwWeave: Sweave 13 | LaTeX: pdfLaTeX 14 | -------------------------------------------------------------------------------- /notebooks/extensions.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Common Extensions" 3 | output: html_notebook 4 | editor_options: 5 | chunk_output_type: inline 6 | --- 7 | 8 | 9 | ```{r chunk_setup, include=FALSE, eval=TRUE} 10 | knitr::opts_chunk$set(echo = T, message=F, warning=F, comment=NA, autodep=F, 11 | eval=T, cache.rebuild=F, cache=T, R.options=list(width=120), 12 | fig.width=8, fig.align = 'center', dev.args=list(bg = 'transparent'), dev='svglite') 13 | ``` 14 | 15 | ```{r catchup} 16 | # if needed 17 | library(tidyverse) 18 | library(lme4) 19 | 20 | load('data/gpa.RData') 21 | 22 | # if you want to run all the code, you'll need to install the following 23 | # install.packages(c('sjstats', 'merTools', 'broom')) 24 | ``` 25 | 26 | 27 | ## Additional Grouping Structure 28 | 29 | ### Crossed random effects 30 | 31 | Load the pupils data. We'll look at achievement scores for students. The sources of dependency are due to students having gone to the same primary or secondary schools. However, in this example, going to a particular primary school doesn't determine which secondary school a student will attend. Note also that there are no repeated measures; we see each student only once. 32 | 33 | ```{r load_pupils_data} 34 | load('data/pupils.RData') 35 | 36 | pupils 37 | ``` 38 | 39 | #### Primary and Secondary School Random Effects 40 | 41 | We'll do a model with random effects for both primary and secondary school. As such, we'll get variance components for each as well.
42 | 43 | ```{r cross_classified} 44 | pupils_crossed = lmer(achievement ~ sex + ses 45 | + (1|primary_school_id) + (1|secondary_school_id), 46 | data = pupils) 47 | 48 | summary(pupils_crossed, correlation=F) 49 | ``` 50 | 51 | In this case, the primary school displays more variance. 52 | 53 | ```{r crossed_vc} 54 | VarCorr(pupils_crossed) %>% 55 | print(comp=c('Var', 'Std'), digits=3) 56 | ``` 57 | 58 | ```{r crossed_icc} 59 | sjstats::icc(pupils_crossed) # relative proportion due to each school 60 | ``` 61 | 62 | Note the specific random effects for 50 distinct primary schools vs. 30 distinct secondary schools. 63 | 64 | ```{r crossed_re} 65 | str(ranef(pupils_crossed)) 66 | ``` 67 | 68 | We can use the `merTools` package to visualize. Install it if you haven't already. This plot lets us see the increased variability due to primary schools relative to secondary schools. 69 | 70 | ```{r crossed_re_plot} 71 | library(merTools) 72 | 73 | plotREsim(REsim(pupils_crossed)) + 74 | theme_minimal() 75 | ``` 76 | 77 | 78 | ### Hierarchical Structure 79 | 80 | Load and inspect the nurses data. Here we are interested in the effect of a training program (`treatment`) on stress levels (on a scale of 1-7) of nurses. In this scenario, nurses are nested within wards, which themselves are nested within hospitals, so we will have random effects pertaining to ward (within hospital) and hospital. 81 | 82 | ```{r nurses_data, echo=1} 83 | load('data/nurses.RData') 84 | 85 | nurses 86 | ``` 87 | 88 | There are two different ways to specify a nested structure with `lme4`. Either is fine. As before, we have two sources of variability; with this data, they are ward and hospital.
89 | 90 | ```{r hierarchical} 91 | nurses_hierarchical = lmer(stress ~ age + sex + experience 92 | + treatment + wardtype + hospsize 93 | + (1|hospital) + (1|hospital:ward), 94 | data = nurses) 95 | 96 | nurses_hierarchical = lmer(stress ~ age + sex + experience 97 | + treatment + wardtype + hospsize 98 | + (1|hospital/ward), 99 | data = nurses) # same thing! 100 | 101 | summary(nurses_hierarchical, correlation=F) 102 | ``` 103 | 104 | ```{r hierarchical_fixed} 105 | nurses_hierarchical %>% 106 | broom::tidy('fixed', conf.int=T) %>% 107 | mutate_if(is.numeric, round, digits = 2) 108 | ``` 109 | 110 | 111 | As far as the fixed effects go, about the only thing that doesn't have a statistical effect is ward type. 112 | 113 | There appears to be more variability due to ward than that due to hospital. 114 | 115 | ```{r hierarchical_random} 116 | VarCorr(nurses_hierarchical) 117 | ``` 118 | 119 | 120 | 121 | ### Crossed vs. Nested 122 | 123 | The following shows the difference in the results from treating ward as a nested (within hospital) vs. crossed random effect. What do you notice is different? 124 | 125 | ```{r crossed_vs_nested, message=F} 126 | nurses_hierarchical1 = lmer(stress ~ age + sex + experience 127 | + treatment + wardtype + hospsize 128 | + (1|hospital) + (1|hospital:ward), data = nurses) 129 | 130 | nurses_crossed1 = lmer(stress ~ age + sex + experience 131 | + treatment + wardtype + hospsize 132 | + (1|hospital) + (1|ward), data = nurses) 133 | 134 | nurses_crossed2 = lmer(stress ~ age + sex + experience 135 | + treatment + wardtype + hospsize 136 | + (1|hospital) + (1|wardid), data = nurses) 137 | ``` 138 | 139 | 140 | ```{r nested1} 141 | VarCorr(nurses_hierarchical1) 142 | ``` 143 | 144 | ```{r crossed1} 145 | VarCorr(nurses_crossed1) 146 | ``` 147 | 148 | ```{r crossed2} 149 | VarCorr(nurses_crossed2) 150 | ``` 151 | 152 | 153 | The first hierarchical model and the second crossed version are identical. 
The first crossed model is incorrect, as the `ward` label doesn't distinguish ward 1 in hospital 1 from ward 1 in hospital 2. In the second crossed model, we use `wardid` instead of `ward`, so each ward has a unique id, and the proper structure is accounted for even with the crossed syntax. 154 | 155 | 156 | ## Residual Structure 157 | 158 | 159 | ### Heterogeneous Variances 160 | 161 | ```{r heterovar} 162 | library(nlme) 163 | heterovar_res = lme(gpa ~ occasion, 164 | data = gpa, 165 | random = ~1|student, 166 | weights = varIdent(form = ~1|occasion)) 167 | 168 | summary(heterovar_res) 169 | ``` 170 | 171 | ```{r relative_variances} 172 | # values are relative to residual variance and on the standard deviation scale 173 | summary(heterovar_res$modelStruct) 174 | ``` 175 | 176 | ```{r heterovar_glmmTMB} 177 | library(glmmTMB) 178 | heterovar_res2 = glmmTMB(gpa ~ occasion 179 | + (1|student) + diag(0 + occas |student), 180 | data = gpa) 181 | 182 | summary(heterovar_res2) # you can ignore the Corr part in the random effects output 183 | ``` 184 | 185 | ```{r glmmtmb_extract_variances} 186 | vc_glmmtmb = VarCorr(heterovar_res2) 187 | vc_glmmtmb = attr(vc_glmmtmb$cond$student.1, 'stddev')^2 + sigma(heterovar_res2)^2 188 | vc_glmmtmb 189 | ``` 190 | 191 | ### Autocorrelation 192 | 193 | ```{r corr_residual} 194 | corr_res = lme( 195 | gpa ~ occasion, 196 | data = gpa, 197 | random = ~1|student, 198 | correlation = corAR1(form = ~occasion) 199 | ) 200 | 201 | corr_res 202 | ``` 203 | 204 | ## Generalized Linear Mixed Models 205 | 206 | Note that `nlme` does not model beyond the Gaussian distribution, so we go back to using `lme4` and the `glmer` function. You may notice that the model takes a second or two to fit even though it is not that complex. GLMMs are often hard to estimate, and you will often encounter convergence issues. Scaling the data can help a lot.
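A quick aside on that last point: centering and scaling a predictor is easy with base R's `scale`. A minimal sketch using a made-up data frame (`d_example` is hypothetical, not part of the workshop data; `scale` returns a matrix, hence the `[, 1]`):

```{r scaling_sketch}
# hypothetical data for illustration only
d_example = data.frame(x = rnorm(100, mean = 50, sd = 10))

d_example$x_sc = scale(d_example$x)[, 1]  # now mean 0, sd 1
```

Scaled versions of predictors (like the `_sc` variables in the next chunk) can be created this way.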
207 | 208 | ```{r glmm_speed_dating} 209 | load('data/speed_dating.RData') 210 | 211 | sd_model = glmer( 212 | decision ~ sex + samerace + attractive_sc + sincere_sc + intelligent_sc 213 | + (1|iid), 214 | data = speed_dating, 215 | family = binomial 216 | ) 217 | 218 | summary(sd_model, correlation=F) 219 | ``` 220 | 221 | Note that the participant effect (`iid`) is almost as large (in terms of the standard deviation) as the effect of attractiveness. 222 | 223 | 224 | ## Exercises 225 | 226 | 227 | ### Sociometric data 228 | 229 | In the following data, kids are put into different groups and rate each other in terms of how much they would like to share some activity with the others. We have identifying variables for the person doing the rating (sender), the person being rated (receiver), and what group they are in, along with age and sex for both sender and receiver, and group size. 230 | 231 | To run a mixed model, we will have three sources of structure to consider: 232 | 233 | - senders (within group) 234 | - receivers (within group) 235 | - group 236 | 237 | First, load the sociometric data. 238 | 239 | ```{r load_socio} 240 | load('data/sociometric.RData') 241 | ``` 242 | 243 | 244 | To run the model, we will proceed with the following modeling steps. Make sure you create a separate model object for each model run. 245 | 246 | - Model 1: No covariates, only sender and receiver random effects. Note that even though we don't add group yet, still use the nesting approach to specify the effects (e.g. `1|group:receiver`) 247 | - Model 2: No covariates, add group random effect 248 | - Model 3: Add all covariates: `agesend/rec`, `sexsend/rec`, and `grsize` (group size) 249 | - Model 4: In order to examine sex matching effects, add an interaction of the sex variables (`sexsend:sexrec`) to the model. 250 | - Compare models with AIC, e.g. `AIC(model1)`. A lower value indicates a preferred model.
251 | 252 | 253 | ```{r socio} 254 | model1 = lmer(rating ~ (1|group:sender) + (1|group:receiver), 255 | data=sociometric) 256 | summary(model1, correlation=F) 257 | 258 | model2 = lmer(rating ~ (1|group:sender) + (1|group:receiver) + (1|group), 259 | data=sociometric) 260 | summary(model2, correlation=F) 261 | 262 | model3 = lmer(rating ~ sexsend + sexrec + agesend + agerec + grsize 263 | + (1|group:sender) + (1|group:receiver) + (1|group), 264 | data=sociometric) 265 | summary(model3, correlation=F) 266 | 267 | model4 = lmer(rating ~ sexsend*sexrec + agesend + agerec + grsize 268 | + (1|group:sender) + (1|group:receiver) + (1|group), 269 | data=sociometric) 270 | summary(model4, correlation=F) 271 | 272 | c(AIC(model1), AIC(model2), AIC(model3), AIC(model4)) 273 | ``` 274 | 275 | 276 | 277 | 278 | ### Patents 279 | 280 | Run a Poisson mixed effects model using the `patents` data. Model the number of citations (`ncit`) based on whether there was opposition to the patent (`opposition`) and whether it was for the biotechnology/pharmaceutical industry (`biopharm`). Use year as a random effect to account for unspecified economic conditions. 281 | 282 | 283 | ```{r patent_starter} 284 | load('data/patents.RData') 285 | 286 | model_poisson = glmer(ncit ~ opposition + biopharm + (1 | year), 287 | data = patents, 288 | family = 'poisson') 289 | 290 | summary(model_poisson) 291 | ``` 292 | 293 | 294 | Interestingly, one can model overdispersion in a Poisson model by specifying a random intercept for each observation (`subject` in the data). In other words, no clustering or grouped structure is necessary.
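That idea can be sketched as follows (not run here; this assumes `subject` uniquely identifies each observation in `patents`, as described above):

```{r patent_overdispersion, eval=FALSE}
# observation-level random intercept to absorb overdispersion
model_poisson_od = glmer(ncit ~ opposition + biopharm + (1 | year) + (1 | subject),
                         data = patents,
                         family = 'poisson')

summary(model_poisson_od)
```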
295 | -------------------------------------------------------------------------------- /notebooks/introduction.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Mixed Models with R" 3 | output: html_notebook 4 | editor_options: 5 | chunk_output_type: inline 6 | --- 7 | 8 | 9 | ```{r chunk_setup, include=FALSE, eval=TRUE} 10 | knitr::opts_chunk$set(echo = T, message=F, warning=F, comment=NA, autodep=F, 11 | eval=T, cache.rebuild=F, cache=T, R.options=list(width=120), 12 | fig.width=8, fig.align = 'center', dev.args=list(bg = 'transparent'), dev='svglite') 13 | ``` 14 | 15 | ```{r basic_packages} 16 | library(tidyverse) 17 | ``` 18 | 19 | 20 | # Introduction 21 | 22 | ## Basic Linear Model 23 | 24 | Note the output that you get from a standard regression. 25 | 26 | ```{r linear_model} 27 | library(lme4) 28 | 29 | lm_model = lm(Reaction ~ Days, data = sleepstudy) 30 | summary(lm_model) 31 | ``` 32 | 33 | Now we will run a mixed model. Here we will allow for *random effects*, which are specific effects for each individual that come from some distribution, e.g. normal with mean zero and some variance: $\mathcal{N}(0, \sigma^2_{individual})$. 34 | 35 | ```{r mixed_model} 36 | mixed_model = lmer(Reaction ~ Days + (1 + Days|Subject), data = sleepstudy) 37 | 38 | summary(mixed_model) 39 | ``` 40 | 41 | Three parts of the above output are the same as we would get using the `lm` function. The 'fixed effects' coefficients are in fact identical, though we can see they have different standard errors, and for many mixed model settings the coefficients would not be exactly the same. In addition, we have the residual variance and standard deviation. The latter is the same thing as the residual standard error from `lm`, but we can see its value is different. That *unexplained* variance has now been partitioned into different parts.
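One quick way to see this partitioning is to compare the estimated residual standard deviations of the two models (a sketch using the `lm_model` and `mixed_model` objects created above):

```{r compare_sigma}
sigma(lm_model)     # residual standard error from the standard regression
sigma(mixed_model)  # smaller, as subject-to-subject variability is now modeled separately
```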
42 | 43 | In this model we allow both intercepts and the coefficient for Days to vary by subject. The estimated standard deviations tell us how much we would deviate from the typical effect (i.e. the fixed effects above) as we move from subject to subject, on average. 44 | 45 | ```{r variance_components} 46 | print(VarCorr(mixed_model), comp=c('Var', 'Std')) 47 | ``` 48 | 49 | 50 | We can also get estimates of these individual specific deviations. 51 | 52 | ```{r random_effects} 53 | random_effects = ranef(mixed_model)$Subject 54 | random_effects 55 | ``` 56 | 57 | Or we can add them to the fixed effects, allowing us to think in terms of *random coefficients*. 58 | 59 | ```{r random_coefficients} 60 | random_coefficients = coef(mixed_model)$Subject 61 | ``` 62 | 63 | As you can imagine, this would be notably better for prediction than assuming the same intercept and slope for everyone. 64 | 65 | ```{r plot_fixed_effect} 66 | fixed_effects = fixef(mixed_model) 67 | random_effects = 68 | random_coefficients %>% 69 | mutate(Subject = factor(unique(sleepstudy$Subject))) 70 | 71 | sleepstudy %>% 72 | ggplot(aes(Days, Reaction)) + 73 | geom_point(aes(color = Subject), alpha = .25) + 74 | geom_abline( 75 | aes( 76 | intercept = fixed_effects['(Intercept)'], 77 | slope = fixed_effects['Days']), 78 | color = 'darkred', 79 | size = 2) + 80 | geom_abline( 81 | aes( 82 | intercept = `(Intercept)`, 83 | slope = Days, 84 | color = Subject), 85 | size = .5, 86 | alpha = .25, 87 | data = random_effects) 88 | ``` 89 | 90 | 91 | Each individual is allowed to have their own say, but we can still refer to the general `population` or `typical` effect.
92 | 93 | ```{r animate_effects} 94 | library(gganimate) 95 | 96 | # this may take up to a minute to produce, and will show in your viewer 97 | sleepstudy %>% 98 | ggplot(aes(Days, Reaction)) + 99 | geom_abline( 100 | aes( 101 | intercept = fixed_effects['(Intercept)'], 102 | slope = fixed_effects['Days']), 103 | color = 'darkred', 104 | size = 2, 105 | alpha = .01) + 106 | geom_point(aes(color = Subject), alpha = 1) + 107 | geom_abline( 108 | aes( 109 | intercept = `(Intercept)`, 110 | slope = Days, 111 | group = Subject, 112 | color = Subject), 113 | size = .5, 114 | alpha = .25, 115 | data = random_effects) + 116 | transition_states(Subject, 1, 1) + 117 | theme_minimal() 118 | ``` 119 | 120 | -------------------------------------------------------------------------------- /notebooks/mixed_models_basics.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Mixed Models: Basics" 3 | output: html_notebook 4 | editor_options: 5 | chunk_output_type: inline 6 | --- 7 | 8 | 9 | ```{r chunk_setup, include=FALSE, eval=TRUE} 10 | knitr::opts_chunk$set(echo = T, message=F, warning=F, comment=NA, autodep=F, 11 | eval=T, cache.rebuild=F, cache=T, R.options=list(width=120), 12 | fig.width=8, fig.align = 'center', dev.args=list(bg = 'transparent'), dev='svglite') 13 | ``` 14 | 15 | 16 | ## Example: Student GPA 17 | 18 | Load up and inspect the data. 19 | 20 | ```{r basic_packages} 21 | library(tidyverse) 22 | ``` 23 | 24 | 25 | ```{r load_gpa_data} 26 | load('data/gpa.RData') 27 | gpa 28 | ``` 29 | 30 | ## Application 31 | 32 | ### Standard Regression 33 | 34 | We'll start with a standard linear regression model. We have coefficients for the intercept and the effect of time and, in addition, an estimate of the variability of the observations (the residual standard error).
35 | 36 | ```{r gpa_lm} 37 | gpa_lm = lm(gpa ~ occasion, data=gpa) 38 | summary(gpa_lm) 39 | ``` 40 | 41 | ### Mixed Model 42 | 43 | ```{r gpa_mixed} 44 | library(lme4) 45 | gpa_mixed = lmer(gpa ~ occasion + (1|student), data=gpa) 46 | summary(gpa_mixed) 47 | ``` 48 | 49 | [As a test, replace `1|student` with `1|sample(1:10, 1200, replace = T)`. As the variance due to this arbitrary grouping is essentially 0, the residual error estimate is similar to the `lm` model.] 50 | 51 | 52 | People always ask where the p-values are, but the answer is... complicated. Other packages and programs present them as if they are trivially obtained, but that is not the case, and the `lme4` developers would rather not make unnecessary assumptions. On the plus side, you can get interval estimates easily enough, even though the labels for the variance components are not very informative. `.sig01` is the interval for the student effect (on the standard deviation scale). 53 | 54 | ```{r gpa_mixed_confint} 55 | confint(gpa_mixed) 56 | ``` 57 | 58 | #### Estimated Random Effects 59 | 60 | Now examine the random effects. 61 | 62 | ```{r gpa_mixed_ranef} 63 | ranef(gpa_mixed)$student 64 | ``` 65 | 66 | ```{r gpa_mixed_rancoef} 67 | coef(gpa_mixed)$student 68 | ``` 69 | 70 | As we didn't allow the occasion effect to vary, it is constant. We'll change this later. 71 | 72 | #### Prediction 73 | 74 | ```{r gpa_mixed_prediction} 75 | predict(gpa_mixed, re.form=NA) %>% head 76 | ``` 77 | 78 | ## Adding a Cluster-level Covariate 79 | 80 | See exercises. 81 | 82 | 83 | 84 | ## Exercises 85 | 86 | 87 | ### Sleep 88 | 89 | For this exercise, we'll use the sleep study data from the `lme4` package. The following describes it. 90 | 91 | > The average reaction time per day for subjects in a sleep deprivation study. On day 0 the subjects had their normal amount of sleep. Starting that night they were restricted to 3 hours of sleep per night. The observations represent the average reaction time (in milliseconds) on a series of tests given each day to each subject.
92 | 93 | After loading the package, the data can be loaded as follows. I show the first few observations. 94 | 95 | ```{r sleepstudy} 96 | library(lme4) 97 | data("sleepstudy") 98 | head(sleepstudy) 99 | ``` 100 | 101 | 1. Run a regression with Reaction as the target variable and Days as the predictor. 102 | 103 | 2. Run a mixed model with a random intercept for Subject. 104 | 105 | 3. Interpret the variance components and fixed effects. 106 | 107 | 4. What would a plot of the prediction lines per subject look like relative to the overall trend? 108 | 109 | 110 | 111 | ### Cluster level covariate 112 | 113 | Rerun the mixed model with the GPA data adding the cluster level covariate of `sex`, or high school GPA (`highgpa`), or both. Interpret all aspects of the results. 114 | 115 | ```{r gpa_cluster, eval=FALSE} 116 | gpa_mixed_cluster_level = lmer(?, gpa) 117 | 118 | summary(gpa_mixed_cluster_level) 119 | ``` 120 | 121 | What happened to the student variance after adding cluster level covariates to the model? 122 | 123 | 124 | 125 | ### Simulation 126 | 127 | The following represents a simple way to simulate a random intercepts model. Note what each object is, and make sure the code makes sense to you. Then run it. 128 | 129 | ```{r simMixed} 130 | set.seed(1234) # this will allow you to exactly duplicate your result 131 | Ngroups = 100 132 | NperGroup = 3 133 | N = Ngroups * NperGroup 134 | groups = factor(rep(1:Ngroups, each = NperGroup)) 135 | u = rnorm(Ngroups, sd = .5) 136 | e = rnorm(N, sd = .25) 137 | x = rnorm(N) 138 | y = 2 + .5 * x + u[groups] + e 139 | 140 | d = data.frame(x, y, groups) 141 | ``` 142 | 143 | Which of the above represent the fixed and random effects? Now run the following.
144 | 145 | ```{r simMixed2} 146 | model = lmer(y ~ x + (1|groups), data=d) 147 | summary(model) 148 | confint(model) 149 | 150 | 151 | 152 | library(ggplot2) 153 | ggplot(aes(x, y), data=d) + 154 | geom_point() 155 | ``` 156 | 157 | Do the results seem in keeping with what you expect? 158 | 159 | In what follows we'll change various aspects of the data, then rerun the model after each change, then summarize and get confidence intervals as before. For each change, note at least one thing that is different in the results. 160 | 161 | 0. First calculate or simply eyeball the intraclass correlation coefficient: 162 | 163 | $$\frac{\textrm{random effect variance}}{\textrm{residual + random effect variance}}$$ 164 | 165 | In addition, create a density plot of the random effects as follows. 166 | 167 | ```{r simMixed3, eval=FALSE} 168 | re = ranef(model)$groups 169 | qplot(x=re, geom='density', xlim=c(-3,3)) 170 | ``` 171 | 172 | 1. Change the random effect variance/sd and/or the residual variance/sd, note your new estimate of the ICC, and plot the random effects as before. 173 | 174 | 2. Reset the values to the original. Change Ngroups to 50. What differences do you see in the confidence interval estimates? 175 | 176 | 3. Set Ngroups back to 100. Now change NperGroup to 10, and note again how the CIs differ from the base condition.
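For step 0, the true ICC implied by the simulation parameters can be computed directly (a sketch; the commented lines show one way to get the estimated version from the fitted model's variance components):

```{r icc_sketch}
# true ICC from the simulation: random intercept sd = .5, residual sd = .25
icc_true = .5^2 / (.5^2 + .25^2)
icc_true  # .8

# from the fitted model (uncomment after running the model above):
# vc = as.data.frame(VarCorr(model))
# vc$vcov[vc$grp == 'groups'] / sum(vc$vcov)
```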
-------------------------------------------------------------------------------- /notebooks/random_slopes.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "More Random Effects" 3 | output: html_notebook 4 | editor_options: 5 | chunk_output_type: inline 6 | --- 7 | 8 | 9 | ```{r chunk_setup, include=FALSE, eval=TRUE} 10 | knitr::opts_chunk$set(echo = T, message=F, warning=F, comment=NA, autodep=F, 11 | eval=T, cache.rebuild=F, cache=T, R.options=list(width=120), 12 | fig.width=8, fig.align = 'center', dev.args=list(bg = 'transparent'), dev='svglite') 13 | ``` 14 | 15 | ```{r catchup} 16 | # if needed 17 | library(tidyverse) 18 | library(lme4) 19 | 20 | load('data/gpa.RData') 21 | 22 | gpa_lm = lm(gpa ~ occasion, data=gpa) 23 | ``` 24 | 25 | ## Application 26 | 27 | Add a random slope to our previous example and examine the results. 28 | 29 | ```{r random_slope} 30 | gpa_mixed = lmer(gpa ~ occasion + (1 + occasion|student), data=gpa) 31 | summary(gpa_mixed) 32 | ``` 33 | 34 | ### Explore Random Effects 35 | 36 | As before, we can examine the per-student random effects. 37 | 38 | ```{r random_effects} 39 | ranef(gpa_mixed)$student 40 | ``` 41 | 42 | Unlike before, we see each student's occasion effect. 43 | 44 | ```{r random_coefficients} 45 | coef(gpa_mixed)$student 46 | ``` 47 | 48 | 49 | ## Comparison to Many Regressions 50 | 51 | The following code calculates regression models for each student (i.e. six observations apiece). Density plots are shown comparing regressions run for each student and the mixed model. 
52 | 53 | ```{r by_group} 54 | gpa_lm_by_group0 = lmList(gpa ~ occasion | student, gpa) 55 | gpa_lm_by_group = coef(gpa_lm_by_group0) 56 | 57 | gint = 58 | data_frame(Mixed=coef(gpa_mixed)$student[,1], Separate=gpa_lm_by_group[,1]) %>% 59 | gather(key=Model, value=Intercept) %>% 60 | ggplot(aes(x=Intercept)) + 61 | geom_density(aes(color=Model, fill=Model), alpha=.25) + 62 | scale_color_viridis_d(begin = .25, end = .75) + 63 | scale_fill_viridis_d(begin = .25, end = .75) + 64 | ggtitle('Intercepts') + 65 | labs(x='', y='') + 66 | xlim(c(1.5,4)) + 67 | theme_minimal() + 68 | theme( 69 | axis.text.y = element_blank(), 70 | axis.ticks.y = element_blank(), 71 | legend.key.size=unit(2, 'mm'), 72 | legend.title=element_text(size=8), 73 | legend.text=element_text(size=8), 74 | legend.box.spacing=unit(0, 'in'), 75 | legend.position=c(.85,.75) 76 | ) 77 | 78 | gslopes = 79 | data_frame(Mixed=coef(gpa_mixed)$student[,2], Separate=gpa_lm_by_group[,2]) %>% 80 | gather(key=Model, value=Occasion) %>% 81 | ggplot(aes(x=Occasion)) + 82 | geom_density(aes(color=Model, fill=Model), alpha=.25, show.legend=F) + 83 | scale_color_viridis_d(begin = .25, end = .75) + 84 | scale_fill_viridis_d(begin = .25, end = .75) + 85 | ggtitle('Slopes for occasion') + 86 | labs(x='', y='') + 87 | xlim(c(-.2,.4)) + 88 | theme_minimal() + 89 | theme( 90 | axis.text.y = element_blank(), 91 | axis.ticks.y = element_blank() 92 | ) 93 | 94 | 95 | library(patchwork) 96 | gint + gslopes 97 | ``` 98 | 99 | ## Visualization of Effects 100 | 101 | 102 | Let's look at what the results are in terms of prediction. First we can look at the mixed effects results. 
103 | 104 | ```{r visualize_mixed_fit} 105 | # add if you want, not displayed as there are only a couple negative slopes 106 | going_down = factor(rep(coef(gpa_mixed)$student[,'occasion']<0, e=6), labels=c('Up', 'Down')) 107 | 108 | library(modelr) # allows us to add predictions to the data frame 109 | gpa %>% 110 | add_predictions(gpa_lm, var='lm') %>% 111 | add_predictions(gpa_mixed, var='mixed') %>% 112 | ggplot() + 113 | geom_line(aes(x=occasion, y=mixed, group=student), alpha=.1, color='#00aaff') + 114 | geom_line(aes(x=occasion, y=lm, group=student), color='#ff5500') + 115 | labs(y='gpa') + 116 | theme_minimal() + 117 | theme(panel.grid.minor = element_blank(), 118 | panel.grid.major.x = element_blank()) 119 | ``` 120 | 121 | With that in mind, now we can see the messier 'by-group' approach. The general trend is ignored, and many more students are predicted with downward trends when they probably shouldn't be. 122 | 123 | ```{r visualize_by_group_fit} 124 | going_down = factor(rep(gpa_lm_by_group[,'occasion']<0, e=6), labels=c('Upward', 'Downward')) 125 | 126 | gpa %>% 127 | mutate(stufit=fitted(gpa_lm_by_group0)) %>% 128 | add_predictions(gpa_lm, var='lm') %>% 129 | ggplot() + 130 | geom_line(aes(x=occasion, y=stufit, group=student, color=going_down, alpha = going_down)) + 131 | geom_line(aes(x=occasion, y=lm), 132 | color='#ff5500') + 133 | labs(y='gpa') + 134 | theme_minimal() + 135 | theme(panel.grid.minor = element_blank(), 136 | panel.grid.major.x = element_blank()) 137 | ``` 138 | 139 | 140 | 141 | ## Exercises 142 | 143 | #### Sleep revisited 144 | 145 | Run the sleep study model with a random coefficient for the Days effect, and interpret the results. What is the correlation between the intercept and Days random effects? Use the `ranef` and `coef` functions on the model you've created to inspect the individual specific effects. What do you see?
147 | 148 | ```{r sleepstudy2, eval=FALSE} 149 | library(lme4) 150 | data("sleepstudy") 151 | 152 | model = lmer(?, data = sleepstudy) 153 | 154 | summary(model) 155 | ``` 156 | 157 | In the following, run each line, inspecting the result of each as you go along. 158 | 159 | ```{r, eval=FALSE} 160 | re = ranef(model)$Subject 161 | fe = fixef(model) 162 | apply(re, 1, function(x) x + fe) %>% t 163 | ``` 164 | 165 | The above code adds the fixed effects to each row of the random effects (the `t` just transposes the result). What is the result compared to what you saw before? 166 | 167 | 168 | 169 | #### Simulation revisited 170 | 171 | The following shows a simplified way to simulate some random slopes, but otherwise is the same as the simulation before. Go ahead and run the code. 172 | 173 | ```{r simSlopes} 174 | set.seed(1234) # this will allow you to exactly duplicate your result 175 | Ngroups = 50 176 | NperGroup = 3 177 | N = Ngroups * NperGroup 178 | groups = factor(rep(1:Ngroups, each = NperGroup)) 179 | re_int = rnorm(Ngroups, sd = .75) 180 | re_slope = rnorm(Ngroups, sd = .25) 181 | e = rnorm(N, sd = .25) 182 | x = rnorm(N) 183 | y = (2 + re_int[groups]) + (.5 + re_slope[groups]) * x + e 184 | 185 | d = data.frame(x, y, groups) 186 | ``` 187 | 188 | This next bit of code shows a way to run a mixed model while specifying that there is no correlation between intercepts and slopes. There is generally no reason to do this unless the study design warrants it, but you could do it as a step in the model-building process, such that you fit a model with no correlation, then one with it. 
189 | 190 | ```{r simSlopes2, eval=FALSE} 191 | model_ints_only = lmer(y ~ x + (1 | groups), data = d) 192 | model_with_slopes = lmer(y ~ x + (1 | groups) + (0 + x | groups), data = d) 193 | summary(model_with_slopes) 194 | confint(model_with_slopes) 195 | 196 | library(ggplot2) 197 | ggplot(aes(x, y), data=d) + 198 | geom_point() 199 | ``` 200 | 201 | Compare model fit using the `AIC` function, e.g. `AIC(model)`. The model with the lower AIC is the better model, so which would you choose? 202 | --------------------------------------------------------------------------------