├── .gitattributes ├── .gitignore ├── README.md ├── data ├── earnings.RData ├── gpa.RData ├── nurses.RData ├── patents.RData ├── popularity.RData ├── pupils.RData ├── sociometric.RData ├── speed_dating.RData ├── speed_dating_model.RData └── storms.RData ├── mixed-models-with-r-workshop-2019.Rproj └── notebooks ├── extensions.Rmd ├── extensions.nb.html ├── introduction.Rmd ├── introduction.nb.html ├── mixed_models_basics.Rmd ├── mixed_models_basics.nb.html ├── random_slopes.Rmd └── random_slopes.nb.html /.gitattributes: -------------------------------------------------------------------------------- 1 | # History files 2 | .Rhistory 3 | .Rapp.history 4 | 5 | # Session Data files 6 | .RData 7 | 8 | # Example code in package build process 9 | *-Ex.R 10 | 11 | # Output files from R CMD build 12 | /*.tar.gz 13 | 14 | # Output files from R CMD check 15 | /*.Rcheck/ 16 | 17 | # RStudio files 18 | .Rproj.user/ 19 | 20 | # Primary content 21 | *.Rmd 22 | 23 | # produced vignettes 24 | _book/*.html 25 | _docs/*.html 26 | _book/*.pdf 27 | 28 | # OAuth2 token, see https://github.com/hadley/httr/releases/tag/v0.3 29 | .httr-oauth 30 | 31 | # knitr and R markdown default cache directories 32 | /*_bookdown_files/ 33 | /*_cache/ 34 | /cache/ 35 | 36 | # Temporary files created by R markdown 37 | *.utf8.md 38 | *.knit.md 39 | 40 | *.html linguist-documentation=true 41 | *.js linguist-vendored 42 | *.sh linguist-vendored 43 | *.css linguist-vendored 44 | 45 | 46 | *.Rmd linguist-language=R 47 | *.stan linguist-language=Stan 48 | *.inp linguist-language=Mplus 49 | *.out linguist-generated=true -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | .Rproj.user 2 | .Rhistory 3 | .RData 4 | .Ruserdata 5 | .DS_* -------------------------------------------------------------------------------- /README.md: 
-------------------------------------------------------------------------------- 1 | # Mixed Models with R Workshop 2019 2 | 3 | Notebooks and data for a workshop demonstrating mixed models. For more details see: https://m-clark.github.io/mixed-models-with-R/ 4 | 5 | For those in the workshop: after downloading this repo, open the RStudio project inside, then open each of the notebook files in the notebooks folder. They contain condensed code and exercises from the document linked above. 6 | 7 | The order of the notebooks: 8 | 9 | - introduction 10 | - mixed_models_basics 11 | - random_slopes 12 | - extensions -------------------------------------------------------------------------------- /data/earnings.RData: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/m-clark/mixed-models-with-r-workshop-2019/24f2359a7740dc2e85274b4a98143bb44319a0f0/data/earnings.RData -------------------------------------------------------------------------------- /data/gpa.RData: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/m-clark/mixed-models-with-r-workshop-2019/24f2359a7740dc2e85274b4a98143bb44319a0f0/data/gpa.RData -------------------------------------------------------------------------------- /data/nurses.RData: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/m-clark/mixed-models-with-r-workshop-2019/24f2359a7740dc2e85274b4a98143bb44319a0f0/data/nurses.RData -------------------------------------------------------------------------------- /data/patents.RData: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/m-clark/mixed-models-with-r-workshop-2019/24f2359a7740dc2e85274b4a98143bb44319a0f0/data/patents.RData --------------------------------------------------------------------------------
/data/popularity.RData: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/m-clark/mixed-models-with-r-workshop-2019/24f2359a7740dc2e85274b4a98143bb44319a0f0/data/popularity.RData -------------------------------------------------------------------------------- /data/pupils.RData: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/m-clark/mixed-models-with-r-workshop-2019/24f2359a7740dc2e85274b4a98143bb44319a0f0/data/pupils.RData -------------------------------------------------------------------------------- /data/sociometric.RData: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/m-clark/mixed-models-with-r-workshop-2019/24f2359a7740dc2e85274b4a98143bb44319a0f0/data/sociometric.RData -------------------------------------------------------------------------------- /data/speed_dating.RData: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/m-clark/mixed-models-with-r-workshop-2019/24f2359a7740dc2e85274b4a98143bb44319a0f0/data/speed_dating.RData -------------------------------------------------------------------------------- /data/speed_dating_model.RData: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/m-clark/mixed-models-with-r-workshop-2019/24f2359a7740dc2e85274b4a98143bb44319a0f0/data/speed_dating_model.RData -------------------------------------------------------------------------------- /data/storms.RData: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/m-clark/mixed-models-with-r-workshop-2019/24f2359a7740dc2e85274b4a98143bb44319a0f0/data/storms.RData -------------------------------------------------------------------------------- /mixed-models-with-r-workshop-2019.Rproj: 
-------------------------------------------------------------------------------- 1 | Version: 1.0 2 | 3 | RestoreWorkspace: Default 4 | SaveWorkspace: Default 5 | AlwaysSaveHistory: Default 6 | 7 | EnableCodeIndexing: Yes 8 | UseSpacesForTab: Yes 9 | NumSpacesForTab: 2 10 | Encoding: UTF-8 11 | 12 | RnwWeave: Sweave 13 | LaTeX: pdfLaTeX 14 | -------------------------------------------------------------------------------- /notebooks/extensions.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Common Extensions" 3 | output: html_notebook 4 | editor_options: 5 | chunk_output_type: inline 6 | --- 7 | 8 | 9 | ```{r chunk_setup, include=FALSE, eval=TRUE} 10 | knitr::opts_chunk$set(echo = T, message=F, warning=F, comment=NA, autodep=F, 11 | eval=T, cache.rebuild=F, cache=T, R.options=list(width=120), 12 | fig.width=8, fig.align = 'center', dev.args=list(bg = 'transparent'), dev='svglite') 13 | ``` 14 | 15 | ```{r catchup} 16 | # if needed 17 | library(tidyverse) 18 | library(lme4) 19 | 20 | load('data/gpa.RData') 21 | 22 | # if you want to run all the code, you'll need to install the following 23 | # install.packages(c('sjstats', 'merTools', 'broom')) 24 | ``` 25 | 26 | 27 | ## Additional Grouping Structure 28 | 29 | ### Crossed random effects 30 | 31 | Load the pupils data. We'll look at achievement scores for students. The sources of dependency are due to students having gone to the same primary or secondary schools. However, in this example, going to a particular primary school doesn't determine which secondary school a student will attend. Note also that there are no repeated measures; we see each student only once. 32 | 33 | ```{r load_pupils_data} 34 | load('data/pupils.RData') 35 | 36 | pupils 37 | ``` 38 | 39 | #### Primary and Secondary School Random Effects 40 | 41 | We'll do a model with random effects for both primary and secondary school. As such, we'll get variance components for each as well.
42 | 43 | ```{r cross_classified} 44 | pupils_crossed = lmer(achievement ~ sex + ses 45 | + (1|primary_school_id) + (1|secondary_school_id), 46 | data = pupils) 47 | 48 | summary(pupils_crossed, correlation=F) 49 | ``` 50 | 51 | In this case, the primary school displays more variance. 52 | 53 | ```{r crossed_vc} 54 | VarCorr(pupils_crossed) %>% 55 | print(comp=c('Var', 'Std'), digits=3) 56 | ``` 57 | 58 | ```{r crossed_icc} 59 | sjstats::icc(pupils_crossed) # relative proportion due to each school 60 | ``` 61 | 62 | Note the specific random effects for 50 distinct primary schools vs. 30 distinct secondary schools. 63 | 64 | ```{r crossed_re} 65 | str(ranef(pupils_crossed)) 66 | ``` 67 | 68 | We can use the `merTools` package to visualize. Install it if you haven't already. This plot lets us see the increased variability due to primary schools relative to secondary schools. 69 | 70 | ```{r crossed_re_plot} 71 | library(merTools) 72 | 73 | plotREsim(REsim(pupils_crossed)) + 74 | theme_minimal() 75 | ``` 76 | 77 | 78 | ### Hierarchical Structure 79 | 80 | Load and inspect the nurses data. Here we are interested in the effect of a training program (`treatment`) on stress levels (on a scale of 1-7) of nurses. In this scenario, nurses are nested within wards, which themselves are nested within hospitals, so we will have random effects pertaining to ward (within hospital) and hospital. 81 | 82 | ```{r nurses_data, echo=1} 83 | load('data/nurses.RData') 84 | 85 | nurses 86 | ``` 87 | 88 | There are two different ways to specify a nested structure with `lme4`. Either is fine. As before, we have two sources of variability; with this data, they are ward and hospital.
89 | 90 | ```{r hierarchical} 91 | nurses_hierarchical = lmer(stress ~ age + sex + experience 92 | + treatment + wardtype + hospsize 93 | + (1|hospital) + (1|hospital:ward), 94 | data = nurses) 95 | 96 | nurses_hierarchical = lmer(stress ~ age + sex + experience 97 | + treatment + wardtype + hospsize 98 | + (1|hospital/ward), 99 | data = nurses) # same thing! 100 | 101 | summary(nurses_hierarchical, correlation=F) 102 | ``` 103 | 104 | ```{r hierarchical_fixed} 105 | nurses_hierarchical %>% 106 | broom::tidy('fixed', conf.int=T) %>% 107 | mutate_if(is.numeric, round, digits = 2) 108 | ``` 109 | 110 | 111 | As far as the fixed effects go, about the only thing that doesn't have a statistical effect is ward type. 112 | 113 | There appears to be more variability due to ward than that due to hospital. 114 | 115 | ```{r hierarchical_random} 116 | VarCorr(nurses_hierarchical) 117 | ``` 118 | 119 | 120 | 121 | ### Crossed vs. Nested 122 | 123 | The following shows the difference in the results from treating ward as a nested (within hospital) vs. crossed random effect. What do you notice is different? 124 | 125 | ```{r crossed_vs_nested, message=F} 126 | nurses_hierarchical1 = lmer(stress ~ age + sex + experience 127 | + treatment + wardtype + hospsize 128 | + (1|hospital) + (1|hospital:ward), data = nurses) 129 | 130 | nurses_crossed1 = lmer(stress ~ age + sex + experience 131 | + treatment + wardtype + hospsize 132 | + (1|hospital) + (1|ward), data = nurses) 133 | 134 | nurses_crossed2 = lmer(stress ~ age + sex + experience 135 | + treatment + wardtype + hospsize 136 | + (1|hospital) + (1|wardid), data = nurses) 137 | ``` 138 | 139 | 140 | ```{r nested1} 141 | VarCorr(nurses_hierarchical1) 142 | ``` 143 | 144 | ```{r crossed1} 145 | VarCorr(nurses_crossed1) 146 | ``` 147 | 148 | ```{r crossed2} 149 | VarCorr(nurses_crossed2) 150 | ``` 151 | 152 | 153 | The first hierarchical model and the second crossed version are identical. 
The first crossed model is incorrect, as the `ward` label doesn't distinguish ward 1 in hospital 1 from ward 1 in hospital 2. In the second crossed model, we use `wardid` instead of `ward`, so each ward has a unique id, and the proper structure is accounted for even with the crossed syntax. 154 | 155 | 156 | ## Residual Structure 157 | 158 | 159 | ### Heterogeneous Variances 160 | 161 | ```{r heterovar} 162 | library(nlme) 163 | heterovar_res = lme(gpa ~ occasion, 164 | data = gpa, 165 | random = ~1|student, 166 | weights = varIdent(form = ~1|occasion)) 167 | 168 | summary(heterovar_res) 169 | ``` 170 | 171 | ```{r relative_variances} 172 | # values are relative to residual variance and on the standard deviation scale 173 | summary(heterovar_res$modelStruct) 174 | ``` 175 | 176 | ```{r heterovar_glmmTMB} 177 | library(glmmTMB) 178 | heterovar_res2 = glmmTMB(gpa ~ occasion 179 | + (1|student) + diag(0 + occas |student), 180 | data = gpa) 181 | 182 | summary(heterovar_res2) # you can ignore the Corr part in the random effects output 183 | ``` 184 | 185 | ```{r glmmtmb_extract_variances} 186 | vc_glmmtmb = VarCorr(heterovar_res2) 187 | vc_glmmtmb = attr(vc_glmmtmb$cond$student.1, 'stddev')^2 + sigma(heterovar_res2)^2 188 | vc_glmmtmb 189 | ``` 190 | 191 | ### Autocorrelation 192 | 193 | ```{r corr_residual} 194 | corr_res = lme( 195 | gpa ~ occasion, 196 | data = gpa, 197 | random = ~1|student, 198 | correlation = corAR1(form = ~occasion) 199 | ) 200 | 201 | corr_res 202 | ``` 203 | 204 | ## Generalized Linear Mixed Models 205 | 206 | Note that `nlme` does not model beyond the Gaussian distribution, so we go back to using `lme4` and the `glmer` function. You may notice that the model takes a second or two to fit even though it is not that complex. GLMMs are often hard to estimate, and you will often encounter convergence issues. Scaling the data can help a lot.
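A quick aside on that last point: centering and scaling a predictor is easy with base R's `scale`. A minimal sketch using a made-up data frame (`d_example` is hypothetical, not part of the workshop data; `scale` returns a matrix, hence the `[, 1]`):

```{r scaling_sketch}
# hypothetical data for illustration only
d_example = data.frame(x = rnorm(100, mean = 50, sd = 10))

d_example$x_sc = scale(d_example$x)[, 1]  # now mean 0, sd 1
```

Scaled versions of predictors (like the `_sc` variables in the next chunk) can be created this way.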
207 | 208 | ```{r glmm_speed_dating} 209 | load('data/speed_dating.RData') 210 | 211 | sd_model = glmer( 212 | decision ~ sex + samerace + attractive_sc + sincere_sc + intelligent_sc 213 | + (1|iid), 214 | data = speed_dating, 215 | family = binomial 216 | ) 217 | 218 | summary(sd_model, correlation=F) 219 | ``` 220 | 221 | Note that the participant effect (`iid`) is almost as large (in terms of the standard deviation) as the effect of attractiveness. 222 | 223 | 224 | ## Exercises 225 | 226 | 227 | ### Sociometric data 228 | 229 | In the following data, kids are put into different groups and rate each other in terms of how much they would like to share some activity with the others. We have identifying variables for the person doing the rating (sender), the person being rated (receiver), and what group they are in, along with age and sex for both sender and receiver, and group size. 230 | 231 | To run a mixed model, we will have three sources of structure to consider: 232 | 233 | - senders (within group) 234 | - receivers (within group) 235 | - group 236 | 237 | First, load the sociometric data. 238 | 239 | ```{r load_socio} 240 | load('data/sociometric.RData') 241 | ``` 242 | 243 | 244 | To run the model, we will proceed with the following modeling steps. Make sure you create a separate model object for each model run. 245 | 246 | - Model 1: No covariates, only sender and receiver random effects. Note that even though we don't add group yet, still use the nesting approach to specify the effects (e.g. `1|group:receiver`) 247 | - Model 2: No covariates, add group random effect 248 | - Model 3: Add all covariates: `agesend/rec`, `sexsend/rec`, and `grsize` (group size) 249 | - Model 4: In order to examine sex matching effects, add an interaction of the sex variables (`sexsend:sexrec`) to the model. 250 | - Compare models with AIC, e.g. `AIC(model1)`. A lower value indicates a preferred model.
251 | 252 | 253 | ```{r socio} 254 | model1 = lmer(rating ~ (1|group:sender) + (1|group:receiver), 255 | data=sociometric) 256 | summary(model1, correlation=F) 257 | 258 | model2 = lmer(rating ~ (1|group:sender) + (1|group:receiver) + (1|group), 259 | data=sociometric) 260 | summary(model2, correlation=F) 261 | 262 | model3 = lmer(rating ~ sexsend + sexrec + agesend + agerec + grsize 263 | + (1|group:sender) + (1|group:receiver) + (1|group), 264 | data=sociometric) 265 | summary(model3, correlation=F) 266 | 267 | model4 = lmer(rating ~ sexsend*sexrec + agesend + agerec + grsize 268 | + (1|group:sender) + (1|group:receiver) + (1|group), 269 | data=sociometric) 270 | summary(model4, correlation=F) 271 | 272 | c(AIC(model1), AIC(model2), AIC(model3), AIC(model4)) 273 | ``` 274 | 275 | 276 | 277 | 278 | ### Patents 279 | 280 | Run a Poisson mixed effects model using the `patents` data. Model the number of citations (`ncit`) based on whether there was opposition to the patent (`opposition`) and whether it was for the biotechnology/pharmaceutical industry (`biopharm`). Use year as a random effect to account for unspecified economic conditions. 281 | 282 | 283 | ```{r patent_starter} 284 | load('data/patents.RData') 285 | 286 | model_poisson = glmer(ncit ~ opposition + biopharm + (1 | year), 287 | data = patents, 288 | family = 'poisson') 289 | 290 | summary(model_poisson) 291 | ``` 292 | 293 | 294 | Interestingly, one can model overdispersion in a Poisson model by specifying a random intercept for each observation (`subject` in the data). In other words, no clustering or grouped structure is necessary.
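That idea can be sketched as follows (not run here; this assumes `subject` uniquely identifies each observation in `patents`, as described above):

```{r patent_overdispersion, eval=FALSE}
# observation-level random intercept to absorb overdispersion
model_poisson_od = glmer(ncit ~ opposition + biopharm + (1 | year) + (1 | subject),
                         data = patents,
                         family = 'poisson')

summary(model_poisson_od)
```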
295 | -------------------------------------------------------------------------------- /notebooks/introduction.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Mixed Models with R" 3 | output: html_notebook 4 | editor_options: 5 | chunk_output_type: inline 6 | --- 7 | 8 | 9 | ```{r chunk_setup, include=FALSE, eval=TRUE} 10 | knitr::opts_chunk$set(echo = T, message=F, warning=F, comment=NA, autodep=F, 11 | eval=T, cache.rebuild=F, cache=T, R.options=list(width=120), 12 | fig.width=8, fig.align = 'center', dev.args=list(bg = 'transparent'), dev='svglite') 13 | ``` 14 | 15 | ```{r basic_packages} 16 | library(tidyverse) 17 | ``` 18 | 19 | 20 | # Introduction 21 | 22 | ## Basic Linear Model 23 | 24 | Note the output that you get from a standard regression. 25 | 26 | ```{r linear_model} 27 | library(lme4) 28 | 29 | lm_model = lm(Reaction ~ Days, data = sleepstudy) 30 | summary(lm_model) 31 | ``` 32 | 33 | Now we will run a mixed model. Here we will allow for *random effects*, which are specific effects for each individual that come from some distribution, e.g. normal with mean zero and some variance: $\mathcal{N}(0, \sigma^2_{individual})$. 34 | 35 | ```{r mixed_model} 36 | mixed_model = lmer(Reaction ~ Days + (1 + Days|Subject), data = sleepstudy) 37 | 38 | summary(mixed_model) 39 | ``` 40 | 41 | Three parts of the above output are the same as we would get using the `lm` function. The 'fixed effects' coefficients are in fact identical, though we can see they have different standard errors, and for many mixed model settings the coefficients would not be exactly the same. In addition, we have the residual variance and standard deviation. The latter is the same thing as the residual standard error from `lm`, but we can see its value is different. That *unexplained* variance has now been partitioned into different parts.
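One quick way to see this partitioning is to compare the estimated residual standard deviations of the two models (a sketch using the `lm_model` and `mixed_model` objects created above):

```{r compare_sigma}
sigma(lm_model)     # residual standard error from the standard regression
sigma(mixed_model)  # smaller, as subject-to-subject variability is now modeled separately
```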
42 | 43 | In this model we allow both intercepts and the coefficient for Days to vary by subject. The estimated standard deviations tell us how much we would deviate from the typical effect (i.e. the fixed effects above) as we move from subject to subject, on average. 44 | 45 | ```{r variance_components} 46 | print(VarCorr(mixed_model), comp=c('Var', 'Std')) 47 | ``` 48 | 49 | 50 | We can also get estimates of these individual specific deviations. 51 | 52 | ```{r random_effects} 53 | random_effects = ranef(mixed_model)$Subject 54 | random_effects 55 | ``` 56 | 57 | Or we can add them to the fixed effects, allowing us to think in terms of *random coefficients*. 58 | 59 | ```{r random_coefficients} 60 | random_coefficients = coef(mixed_model)$Subject 61 | ``` 62 | 63 | As you can imagine, this would be notably better for prediction than assuming the same intercept and slope for everyone. 64 | 65 | ```{r plot_fixed_effect} 66 | fixed_effects = fixef(mixed_model) 67 | random_effects = 68 | random_coefficients %>% 69 | mutate(Subject = factor(unique(sleepstudy$Subject))) 70 | 71 | sleepstudy %>% 72 | ggplot(aes(Days, Reaction)) + 73 | geom_point(aes(color = Subject), alpha = .25) + 74 | geom_abline( 75 | aes( 76 | intercept = fixed_effects['(Intercept)'], 77 | slope = fixed_effects['Days']), 78 | color = 'darkred', 79 | size = 2) + 80 | geom_abline( 81 | aes( 82 | intercept = `(Intercept)`, 83 | slope = Days, 84 | color = Subject), 85 | size = .5, 86 | alpha = .25, 87 | data = random_effects) 88 | ``` 89 | 90 | 91 | Each individual is allowed to have their own say, but we can still refer to the general `population` or `typical` effect.
92 | 93 | ```{r animate_effects} 94 | library(gganimate) 95 | 96 | # this may take up to a minute to produce, and will show in your viewer 97 | sleepstudy %>% 98 | ggplot(aes(Days, Reaction)) + 99 | geom_abline( 100 | aes( 101 | intercept = fixed_effects['(Intercept)'], 102 | slope = fixed_effects['Days']), 103 | color = 'darkred', 104 | size = 2, 105 | alpha = .01) + 106 | geom_point(aes(color = Subject), alpha = 1) + 107 | geom_abline( 108 | aes( 109 | intercept = `(Intercept)`, 110 | slope = Days, 111 | group = Subject, 112 | color = Subject), 113 | size = .5, 114 | alpha = .25, 115 | data = random_effects) + 116 | transition_states(Subject, 1, 1) + 117 | theme_minimal() 118 | ``` 119 | 120 | -------------------------------------------------------------------------------- /notebooks/mixed_models_basics.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Mixed Models: Basics" 3 | output: html_notebook 4 | editor_options: 5 | chunk_output_type: inline 6 | --- 7 | 8 | 9 | ```{r chunk_setup, include=FALSE, eval=TRUE} 10 | knitr::opts_chunk$set(echo = T, message=F, warning=F, comment=NA, autodep=F, 11 | eval=T, cache.rebuild=F, cache=T, R.options=list(width=120), 12 | fig.width=8, fig.align = 'center', dev.args=list(bg = 'transparent'), dev='svglite') 13 | ``` 14 | 15 | 16 | ## Example: Student GPA 17 | 18 | Load up and inspect the data. 19 | 20 | ```{r basic_packages} 21 | library(tidyverse) 22 | ``` 23 | 24 | 25 | ```{r load_gpa_data} 26 | load('data/gpa.RData') 27 | gpa 28 | ``` 29 | 30 | ## Application 31 | 32 | ### Standard Regression 33 | 34 | We'll start with a standard linear regression model. We have coefficients for the intercept and the effect of time and, in addition, an estimate of the variability of the observations (the residual standard error).
35 | 36 | ```{r gpa_lm} 37 | gpa_lm = lm(gpa ~ occasion, data=gpa) 38 | summary(gpa_lm) 39 | ``` 40 | 41 | ### Mixed Model 42 | 43 | ```{r gpa_mixed} 44 | library(lme4) 45 | gpa_mixed = lmer(gpa ~ occasion + (1|student), data=gpa) 46 | summary(gpa_mixed) 47 | ``` 48 | 49 | [As a test, replace `1|student` with `1|sample(1:10, 1200, replace = T)`. As the variance due to this arbitrary grouping is essentially 0, the residual error estimate is similar to the `lm` model.] 50 | 51 | 52 | People always ask where the p-values are, but the answer is... complicated. Other packages and programs present them as if they are trivially obtained, but that is not the case, and the `lme4` developers would rather not make unnecessary assumptions. On the plus side, you can get interval estimates easily enough, even though the labels for the variance components are not very informative. `.sig01` is the interval for the student effect (on the standard deviation scale). 53 | 54 | ```{r gpa_mixed_confint} 55 | confint(gpa_mixed) 56 | ``` 57 | 58 | #### Estimated Random Effects 59 | 60 | Now examine the random effects. 61 | 62 | ```{r gpa_mixed_ranef} 63 | ranef(gpa_mixed)$student 64 | ``` 65 | 66 | ```{r gpa_mixed_rancoef} 67 | coef(gpa_mixed)$student 68 | ``` 69 | 70 | As we didn't allow the occasion effect to vary, it is constant. We'll change this later. 71 | 72 | #### Prediction 73 | 74 | ```{r gpa_mixed_prediction} 75 | predict(gpa_mixed, re.form=NA) %>% head 76 | ``` 77 | 78 | ## Adding a Cluster-level Covariate 79 | 80 | See exercises. 81 | 82 | 83 | 84 | ## Exercises 85 | 86 | 87 | ### Sleep 88 | 89 | For this exercise, we'll use the sleep study data from the `lme4` package. The following describes it. 90 | 91 | > The average reaction time per day for subjects in a sleep deprivation study. On day 0 the subjects had their normal amount of sleep. Starting that night they were restricted to 3 hours of sleep per night. The observations represent the average reaction time (in milliseconds) on a series of tests given each day to each subject.
92 | 93 | After loading the package, the data can be loaded as follows. I show the first few observations. 94 | 95 | ```{r sleepstudy} 96 | library(lme4) 97 | data("sleepstudy") 98 | head(sleepstudy) 99 | ``` 100 | 101 | 1. Run a regression with Reaction as the target variable and Days as the predictor. 102 | 103 | 2. Run a mixed model with a random intercept for Subject. 104 | 105 | 3. Interpret the variance components and fixed effects. 106 | 107 | 4. What would a plot of the prediction lines per subject look like relative to the overall trend? 108 | 109 | 110 | 111 | ### Cluster level covariate 112 | 113 | Rerun the mixed model with the GPA data adding the cluster level covariate of `sex`, or high school GPA (`highgpa`), or both. Interpret all aspects of the results. 114 | 115 | ```{r gpa_cluster, eval=FALSE} 116 | gpa_mixed_cluster_level = lmer(?, gpa) 117 | 118 | summary(gpa_mixed_cluster_level) 119 | ``` 120 | 121 | What happened to the student variance after adding cluster level covariates to the model? 122 | 123 | 124 | 125 | ### Simulation 126 | 127 | The following represents a simple way to simulate a random intercepts model. Note what each object is, and make sure the code makes sense to you. Then run it. 128 | 129 | ```{r simMixed} 130 | set.seed(1234) # this will allow you to exactly duplicate your result 131 | Ngroups = 100 132 | NperGroup = 3 133 | N = Ngroups * NperGroup 134 | groups = factor(rep(1:Ngroups, each = NperGroup)) 135 | u = rnorm(Ngroups, sd = .5) 136 | e = rnorm(N, sd = .25) 137 | x = rnorm(N) 138 | y = 2 + .5 * x + u[groups] + e 139 | 140 | d = data.frame(x, y, groups) 141 | ``` 142 | 143 | Which of the above represent the fixed and random effects? Now run the following.
144 | 145 | ```{r simMixed2} 146 | model = lmer(y ~ x + (1|groups), data=d) 147 | summary(model) 148 | confint(model) 149 | 150 | 151 | 152 | library(ggplot2) 153 | ggplot(aes(x, y), data=d) + 154 | geom_point() 155 | ``` 156 | 157 | Do the results seem in keeping with what you expect? 158 | 159 | In what follows we'll change various aspects of the data, then rerun the model after each change, then summarize and get confidence intervals as before. For each change, note at least one thing that is different in the results. 160 | 161 | 0. First calculate or simply eyeball the intraclass correlation coefficient: 162 | 163 | $$\frac{\textrm{random effect variance}}{\textrm{residual + random effect variance}}$$ 164 | 165 | In addition, create a density plot of the random effects as follows. 166 | 167 | ```{r simMixed3, eval=FALSE} 168 | re = ranef(model)$groups 169 | qplot(x=re, geom='density', xlim=c(-3,3)) 170 | ``` 171 | 172 | 1. Change the random effect variance/sd and/or the residual variance/sd, note your new estimate of the ICC, and plot the random effects as before. 173 | 174 | 2. Reset the values to the original. Change Ngroups to 50. What differences do you see in the confidence interval estimates? 175 | 176 | 3. Set Ngroups back to 100. Now change NperGroup to 10, and note again how the CIs differ from the base condition.
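For step 0, the true ICC implied by the simulation parameters can be computed directly (a sketch; the commented lines show one way to get the estimated version from the fitted model's variance components):

```{r icc_sketch}
# true ICC from the simulation: random intercept sd = .5, residual sd = .25
icc_true = .5^2 / (.5^2 + .25^2)
icc_true  # .8

# from the fitted model (uncomment after running the model above):
# vc = as.data.frame(VarCorr(model))
# vc$vcov[vc$grp == 'groups'] / sum(vc$vcov)
```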
-------------------------------------------------------------------------------- /notebooks/random_slopes.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "More Random Effects" 3 | output: html_notebook 4 | editor_options: 5 | chunk_output_type: inline 6 | --- 7 | 8 | 9 | ```{r chunk_setup, include=FALSE, eval=TRUE} 10 | knitr::opts_chunk$set(echo = T, message=F, warning=F, comment=NA, autodep=F, 11 | eval=T, cache.rebuild=F, cache=T, R.options=list(width=120), 12 | fig.width=8, fig.align = 'center', dev.args=list(bg = 'transparent'), dev='svglite') 13 | ``` 14 | 15 | ```{r catchup} 16 | # if needed 17 | library(tidyverse) 18 | library(lme4) 19 | 20 | load('data/gpa.RData') 21 | 22 | gpa_lm = lm(gpa ~ occasion, data=gpa) 23 | ``` 24 | 25 | ## Application 26 | 27 | Add a random slope to our previous example and examine the results. 28 | 29 | ```{r random_slope} 30 | gpa_mixed = lmer(gpa ~ occasion + (1 + occasion|student), data=gpa) 31 | summary(gpa_mixed) 32 | ``` 33 | 34 | ### Explore Random Effects 35 | 36 | As before, we can examine the per-student random effects. 37 | 38 | ```{r random_effects} 39 | ranef(gpa_mixed)$student 40 | ``` 41 | 42 | Unlike before, we see each student's occasion effect. 43 | 44 | ```{r random_coefficients} 45 | coef(gpa_mixed)$student 46 | ``` 47 | 48 | 49 | ## Comparison to Many Regressions 50 | 51 | The following code calculates regression models for each student (i.e. six observations apiece). Density plots are shown comparing regressions run for each student and the mixed model. 
52 | 53 | ```{r by_group} 54 | gpa_lm_by_group0 = lmList(gpa ~ occasion | student, gpa) 55 | gpa_lm_by_group = coef(gpa_lm_by_group0) 56 | 57 | gint = 58 | data_frame(Mixed=coef(gpa_mixed)$student[,1], Separate=gpa_lm_by_group[,1]) %>% 59 | gather(key=Model, value=Intercept) %>% 60 | ggplot(aes(x=Intercept)) + 61 | geom_density(aes(color=Model, fill=Model), alpha=.25) + 62 | scale_color_viridis_d(begin = .25, end = .75) + 63 | scale_fill_viridis_d(begin = .25, end = .75) + 64 | ggtitle('Intercepts') + 65 | labs(x='', y='') + 66 | xlim(c(1.5,4)) + 67 | theme_minimal() + 68 | theme( 69 | axis.text.y = element_blank(), 70 | axis.ticks.y = element_blank(), 71 | legend.key.size=unit(2, 'mm'), 72 | legend.title=element_text(size=8), 73 | legend.text=element_text(size=8), 74 | legend.box.spacing=unit(0, 'in'), 75 | legend.position=c(.85,.75) 76 | ) 77 | 78 | gslopes = 79 | data_frame(Mixed=coef(gpa_mixed)$student[,2], Separate=gpa_lm_by_group[,2]) %>% 80 | gather(key=Model, value=Occasion) %>% 81 | ggplot(aes(x=Occasion)) + 82 | geom_density(aes(color=Model, fill=Model), alpha=.25, show.legend=F) + 83 | scale_color_viridis_d(begin = .25, end = .75) + 84 | scale_fill_viridis_d(begin = .25, end = .75) + 85 | ggtitle('Slopes for occasion') + 86 | labs(x='', y='') + 87 | xlim(c(-.2,.4)) + 88 | theme_minimal() + 89 | theme( 90 | axis.text.y = element_blank(), 91 | axis.ticks.y = element_blank() 92 | ) 93 | 94 | 95 | library(patchwork) 96 | gint + gslopes 97 | ``` 98 | 99 | ## Visualization of Effects 100 | 101 | 102 | Let's look at what the results are in terms of prediction. First we can look at the mixed effects results. 
103 | 104 | ```{r visualize_mixed_fit} 105 | # add if you want, not displayed as there are only a couple negative slopes 106 | going_down = factor(rep(coef(gpa_mixed)$student[,'occasion']<0, e=6), labels=c('Up', 'Down')) 107 | 108 | library(modelr) # allows us to add predictions to the data frame 109 | gpa %>% 110 | add_predictions(gpa_lm, var='lm') %>% 111 | add_predictions(gpa_mixed, var='mixed') %>% 112 | ggplot() + 113 | geom_line(aes(x=occasion, y=mixed, group=student), alpha=.1, color='#00aaff') + 114 | geom_line(aes(x=occasion, y=lm, group=student), color='#ff5500') + 115 | labs(y='gpa') + 116 | theme_minimal() + 117 | theme(panel.grid.minor = element_blank(), 118 | panel.grid.major.x = element_blank()) 119 | ``` 120 | 121 | With that in mind, now we can see the messier 'by-group' approach. The general trend is ignored, and many more students are predicted with downward trends when they probably shouldn't be. 122 | 123 | ```{r visualize_by_group_fit} 124 | going_down = factor(rep(gpa_lm_by_group[,'occasion']<0, e=6), labels=c('Upward', 'Downward')) 125 | 126 | gpa %>% 127 | mutate(stufit=fitted(gpa_lm_by_group0)) %>% 128 | add_predictions(gpa_lm, var='lm') %>% 129 | ggplot() + 130 | geom_line(aes(x=occasion, y=stufit, group=student, color=going_down, alpha = going_down)) + 131 | geom_line(aes(x=occasion, y=lm), 132 | color='#ff5500') + 133 | labs(y='gpa') + 134 | theme_minimal() + 135 | theme(panel.grid.minor = element_blank(), 136 | panel.grid.major.x = element_blank()) 137 | ``` 138 | 139 | 140 | 141 | ## Exercises 142 | 143 | #### Sleep revisited 144 | 145 | Run the sleep study model with a random coefficient for the Days effect, and interpret the results. What is the correlation between the intercept and Days random effects? Use the `ranef` and `coef` functions on the model you've created to inspect the individual specific effects. What do you see?
147 | 148 | ```{r sleepstudy2, eval=FALSE} 149 | library(lme4) 150 | data("sleepstudy") 151 | 152 | model = lmer(?, data = sleepstudy) 153 | 154 | summary(model) 155 | ``` 156 | 157 | In the following, run each line, inspecting the result of each as you go along. 158 | 159 | ```{r, eval=FALSE} 160 | re = ranef(model)$Subject 161 | fe = fixef(model) 162 | apply(re, 1, function(x) x + fe) %>% t 163 | ``` 164 | 165 | The above code adds the fixed effects to each row of the random effects (the `t` just transposes the result). What is the result compared to what you saw before? 166 | 167 | 168 | 169 | #### Simulation revisited 170 | 171 | The following shows a simplified way to simulate some random slopes, but otherwise is the same as the simulation before. Go ahead and run the code. 172 | 173 | ```{r simSlopes} 174 | set.seed(1234) # this will allow you to exactly duplicate your result 175 | Ngroups = 50 176 | NperGroup = 3 177 | N = Ngroups * NperGroup 178 | groups = factor(rep(1:Ngroups, each = NperGroup)) 179 | re_int = rnorm(Ngroups, sd = .75) 180 | re_slope = rnorm(Ngroups, sd = .25) 181 | e = rnorm(N, sd = .25) 182 | x = rnorm(N) 183 | y = (2 + re_int[groups]) + (.5 + re_slope[groups]) * x + e 184 | 185 | d = data.frame(x, y, groups) 186 | ``` 187 | 188 | This next bit of code shows a way to run a mixed model while specifying that there is no correlation between intercepts and slopes. There is generally no reason to do this unless the study design warrants it, but you could do it as a step in the model-building process, such that you fit a model with no correlation, then one with it. 
189 | 190 | ```{r simSlopes2, eval=FALSE} 191 | model_ints_only = lmer(y ~ x + (1 | groups), data = d) 192 | model_with_slopes = lmer(y ~ x + (1 | groups) + (0 + x | groups), data = d) 193 | summary(model_with_slopes) 194 | confint(model_with_slopes) 195 | 196 | library(ggplot2) 197 | ggplot(aes(x, y), data=d) + 198 | geom_point() 199 | ``` 200 | 201 | Compare model fit using the `AIC` function, e.g. `AIC(model)`. The model with the lower AIC is the better model, so which would you choose? 202 | --------------------------------------------------------------------------------