├── .gitattributes ├── .gitignore ├── LICENSE ├── README.md ├── Regressionspakete.docx ├── _config.yml ├── by2-small.png ├── by2.png ├── cc-attrib-nc.png ├── example.RData ├── mixed-models-snippets.Rproj ├── nested_fully-crossed_cross-classified_models.R ├── nested_fully-crossed_cross-classified_models.Rmd ├── nested_fully-crossed_cross-classified_models.html ├── overview_modelling_packages.Rmd ├── overview_modelling_packages.html ├── random-effects-within-between-effects-model-glmmtmb.Rmd ├── random-effects-within-between-effects-model-glmmtmb.html ├── random-effects-within-between-effects-model.Rmd ├── random-effects-within-between-effects-model.bib ├── random-effects-within-between-effects-model.html ├── regression_pkgs_handout.docx ├── regression_pkgs_handout.pdf ├── time-varying-covariates.R ├── time-varying-covariates.Rmd └── time-varying-covariates.html /.gitattributes: -------------------------------------------------------------------------------- 1 | # Auto detect text files and perform LF normalization 2 | * text=auto 3 | *.html linguist-vendored 4 | *.md linguist-vendored 5 | *.bib linguist-vendored 6 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | # History files 2 | .Rhistory 3 | .Rapp.history 4 | 5 | # Session Data files 6 | .RData 7 | .Rproj 8 | 9 | # Example code in package build process 10 | *-Ex.R 11 | 12 | # Output files from R CMD build 13 | /*.tar.gz 14 | 15 | # Output files from R CMD check 16 | /*.Rcheck/ 17 | 18 | # RStudio files 19 | .Rproj.user/ 20 | 21 | # produced vignettes 22 | vignettes/*.html 23 | vignettes/*.pdf 24 | 25 | # OAuth2 token, see https://github.com/hadley/httr/releases/tag/v0.3 26 | .httr-oauth 27 | 28 | # knitr and R markdown default cache directories 29 | /*_cache/ 30 | /cache/ 31 | 32 | # Temporary files created by R markdown 33 | *.utf8.md 34 | *.knit.md 35 | 36 | # Shiny token, see 
https://shiny.rstudio.com/articles/shinyapps.html 37 | rsconnect/ 38 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | This is free and unencumbered software released into the public domain. 2 | 3 | Anyone is free to copy, modify, publish, use, compile, sell, or 4 | distribute this software, either in source code form or as a compiled 5 | binary, for any purpose, commercial or non-commercial, and by any 6 | means. 7 | 8 | In jurisdictions that recognize copyright laws, the author or authors 9 | of this software dedicate any and all copyright interest in the 10 | software to the public domain. We make this dedication for the benefit 11 | of the public at large and to the detriment of our heirs and 12 | successors. We intend this dedication to be an overt act of 13 | relinquishment in perpetuity of all present and future rights to this 14 | software under copyright law. 15 | 16 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, 17 | EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF 18 | MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 19 | IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR 20 | OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, 21 | ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR 22 | OTHER DEALINGS IN THE SOFTWARE. 23 | 24 | For more information, please refer to -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # mixed-models-snippets 2 | 3 | This repository collects various small code snippets or short instructions on how to use or define specific mixed models, mostly with packages lme4 and glmmTMB. 4 | 5 | Most code snippets come from forum solutions or email-correspondence. 
6 | 7 | * [Fixed versus Random Effects Models](https://easystats.github.io/parameters/articles/demean.html) 8 | * [Nested and Crossed Random Effects Models](http://htmlpreview.github.io/?https://github.com/strengejacke/mixed-models-snippets/blob/master/nested_fully-crossed_cross-classified_models.html) 9 | * [Time-varying covariates](http://htmlpreview.github.io/?https://github.com/strengejacke/mixed-models-snippets/blob/master/time-varying-covariates.html) 10 | * [Overview of Modelling Packages](http://htmlpreview.github.io/?https://github.com/strengejacke/mixed-models-snippets/blob/master/overview_modelling_packages.html) 11 | -------------------------------------------------------------------------------- /Regressionspakete.docx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/strengejacke/mixed-models-snippets/a97640e9ce28eb27af80d7b72359be9b4b8e27a4/Regressionspakete.docx -------------------------------------------------------------------------------- /_config.yml: -------------------------------------------------------------------------------- 1 | theme: jekyll-theme-slate -------------------------------------------------------------------------------- /by2-small.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/strengejacke/mixed-models-snippets/a97640e9ce28eb27af80d7b72359be9b4b8e27a4/by2-small.png -------------------------------------------------------------------------------- /by2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/strengejacke/mixed-models-snippets/a97640e9ce28eb27af80d7b72359be9b4b8e27a4/by2.png -------------------------------------------------------------------------------- /cc-attrib-nc.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/strengejacke/mixed-models-snippets/a97640e9ce28eb27af80d7b72359be9b4b8e27a4/cc-attrib-nc.png -------------------------------------------------------------------------------- /example.RData: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/strengejacke/mixed-models-snippets/a97640e9ce28eb27af80d7b72359be9b4b8e27a4/example.RData -------------------------------------------------------------------------------- /mixed-models-snippets.Rproj: -------------------------------------------------------------------------------- 1 | Version: 1.0 2 | 3 | RestoreWorkspace: Default 4 | SaveWorkspace: Default 5 | AlwaysSaveHistory: Default 6 | 7 | EnableCodeIndexing: Yes 8 | UseSpacesForTab: Yes 9 | NumSpacesForTab: 2 10 | Encoding: UTF-8 11 | 12 | RnwWeave: Sweave 13 | LaTeX: pdfLaTeX 14 | 15 | AutoAppendNewline: Yes 16 | StripTrailingWhitespace: Yes 17 | -------------------------------------------------------------------------------- /nested_fully-crossed_cross-classified_models.R: -------------------------------------------------------------------------------- 1 | # Nested and Crossed Random Effects Models ---- 2 | 3 | # Say we have a model with a dependent variable "DV", independent variable "IV" 4 | # and groups as random effects ("Cluster", "Subject"). The "IV" varies across 5 | # "Cluster" and "Subject". 6 | # 7 | # Is this a nested, fully crossed or cross-classified design? 8 | # 9 | # Nested design ---- 10 | # 11 | # The key distinction is whether each "Subject" receives a completely 12 | # different "Cluster" set. If this is the case the design is nested, 13 | # which simply means: not crossed. 14 | 15 | lmer(DV ~ IV + (1 + IV | Cluster / Subject), data = ...) 16 | 17 | # which expands to... 18 | 19 | lmer(DV ~ IV + (1 + IV | Cluster ) + (1 + IV | Cluster:Subject), data = ...)
20 | 21 | 22 | # Fully-crossed or cross-classified models ---- 23 | 24 | # If each "Subject" receives the same "Cluster", it is a fully crossed 25 | # random factors design. If there is some mixture it is cross-classified. 26 | # The appropriate model notation for a crossed design would be: 27 | 28 | lmer(DV ~ IV + (1 + IV | Cluster) + (1 + IV | Subject), data = ...) 29 | 30 | 31 | # related post: https://www.researchgate.net/post/Multilevel_modelling_in_R 32 | 33 | # see also: https://stats.stackexchange.com/a/228814/54740 34 | -------------------------------------------------------------------------------- /nested_fully-crossed_cross-classified_models.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Nested and Crossed Random Effects Models" 3 | author: "Daniel Lüdecke" 4 | date: "26 3 2019" 5 | output: html_document 6 | --- 7 | 8 | ```{r setup, include=FALSE,echo=FALSE} 9 | library(knitr) 10 | knitr::opts_chunk$set( 11 | echo = TRUE, 12 | collapse = TRUE, 13 | warning = FALSE, 14 | comment = "#>", 15 | dev = "png" 16 | ) 17 | ``` 18 | 19 | # Nested and Crossed Random Effects Models 20 | 21 | Say we have a model with a dependent variable **DV**, independent variable **IV** and groups as random effects (**Cluster**, **Subject**). The **IV** varies across **Cluster** and **Subject**. 22 | 23 | Is this a nested, fully crossed or cross-classified design? 24 | 25 | ## Nested design 26 | 27 | The key distinction is whether each **Subject** receives a completely different **Cluster** set. If this is the case the design is _nested_, which simply means: _not crossed_. 28 | 29 | ```{r eval=FALSE} 30 | lmer(DV ~ IV + (1 + IV | Cluster / Subject), data = ...) 31 | ``` 32 | 33 | which expands to... 34 | 35 | ```{r eval=FALSE} 36 | lmer(DV ~ IV + (1 + IV | Cluster ) + (1 + IV | Cluster:Subject), data = ...) 
37 | ``` 38 | 39 | 40 | ## Fully-crossed or cross-classified models 41 | 42 | If each **Subject** receives the same **Cluster** (i.e. subjects appear in all clusters), it is a _fully crossed_ random factors design. If there is some mixture it is _cross-classified_. The appropriate model notation for a crossed design would be: 43 | 44 | ```{r eval=FALSE} 45 | lmer(DV ~ IV + (1 + IV | Cluster) + (1 + IV | Subject), data = ...) 46 | ``` 47 | 48 | ## Easily check if group factors are nested or crossed 49 | 50 | You can use the [**sjmisc**-package](https://strengejacke.github.io/sjmisc/) to check whether group factors are (fully) crossed, nested or cross-classified. 51 | 52 | `is_cross_classified()` returns `TRUE`, so a cross-classified design would be appropriate for this random effects structure. 53 | 54 | ```{r} 55 | # data with cross-classified distribution of "cluster" and "subject" 56 | data <- data.frame( 57 | cluster = rep(1:5, each = 3), 58 | subject = c(1,2,3, 1, 2, 4, 1, 2, 3, 1, 2, 5, 1, 2, 4) 59 | ) 60 | 61 | # the table output indicates that data is not nested, but also not fully crossed 62 | table(data) 63 | 64 | # check nesting / crossing of group factors 65 | library(sjmisc) 66 | is_nested(data$cluster, data$subject) 67 | is_crossed(data$cluster, data$subject) 68 | is_cross_classified(data$cluster, data$subject) 69 | ``` 70 | 71 | # References 72 | 73 | * Related post: https://www.researchgate.net/post/Multilevel_modelling_in_R 74 | * See also: https://stats.stackexchange.com/a/228814/54740 75 | -------------------------------------------------------------------------------- /overview_modelling_packages.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Overview of R Modelling Packages" 3 | author: "Daniel Lüdecke" 4 | output: html_document 5 | --- 6 | 7 | ```{r setup, include=FALSE} 8 | library(knitr) 9 | library(docxtractr) 10 | 11 | knitr::opts_chunk$set(echo = TRUE) 12 | 13 | docx <- 
read_docx("Regressionspakete.docx") 14 | tab <- docx_extract_tbl(docx) 15 | colnames(tab) <- c("Nature of Response", "Example", "Type of Regression", "R package or function", "Example Webpage", "Bayesian with [`brms`](https://paul-buerkner.github.io/brms/reference/brmsfamily.html)") 16 | ``` 17 | 18 | This is an overview of R packages and functions for fitting different types of regression models. For each row, the upper cells in the last column (_packages and functions_) refer to "simple" models, while the lower cells refer to their mixed-model counterparts (if available and known). 19 | 20 | This overview makes no claim to completeness. Rather, it shows commonly used packages, but there are plenty of other packages as well (that might even perform better at the tasks mentioned - if you're aware of such packages or think that an important package or function is missing, please [file an issue](https://github.com/strengejacke/mixed-models-snippets/issues)). 21 | 22 | ## Modelling Packages 23 | 24 | ```{r echo=FALSE} 25 | kable(tab) 26 | ``` 27 | 28 | _Note that ratios or proportions from **count data**, like `cbind(successes, failures)`, are modelled as logistic regression with `glm(cbind(successes, failures) ~ ..., family = binomial())`, while ratios from **continuous data** (where the response ranges from 0 to 1) are modelled using beta-regression._ 29 | 30 | _Usually, zero-inflated models are used when 0 or 1 come from a separate process or category.
However, when the 0/1 values are most consistent with censoring rather than with a separate category/process, the ordered beta regression is probably a better choice (i.e., 0 means "below detection", not "something qualitatively different happened") (Source: https://twitter.com/bolkerb/status/1577755600808775680)_ 31 | 32 | ## Included packages for non-mixed models: 33 | 34 | - Base R: `lm()`, `glm()` 35 | - [AER](https://CRAN.R-project.org/package=AER): `tobit()` 36 | - [aod](https://CRAN.R-project.org/package=aod): `betabin()` 37 | - [betareg](https://CRAN.R-project.org/package=betareg): `betareg()` 38 | - [brglm2](https://CRAN.R-project.org/package=brglm2): `bracl()`, `brmultinom()` 39 | - [censReg](https://CRAN.R-project.org/package=censReg): `censReg()` 40 | - [cplm](https://CRAN.R-project.org/package=cplm): `cpglm()` 41 | - [survival](https://CRAN.R-project.org/package=survival): `coxph()` 42 | - [DirichletReg](https://CRAN.R-project.org/package=DirichletReg): `DirichReg()` 43 | - [HRQoL](https://CRAN.R-project.org/package=HRQoL): `BBreg()` 44 | - [MASS](https://CRAN.R-project.org/package=MASS): `glm.nb()`, `polr()` 45 | - [nnet](https://CRAN.R-project.org/package=nnet): `multinom()` 46 | - [ordbetareg](https://cran.r-project.org/package=ordbetareg): `ordbetareg()` 47 | - [ordinal](https://CRAN.R-project.org/package=ordinal): `clm()`, `clm2()` 48 | - [pscl](https://CRAN.R-project.org/package=pscl): `zeroinfl()`, `hurdle()` 49 | - [statmod](https://CRAN.R-project.org/package=statmod): `tweedie()` 50 | - [VGAM](https://CRAN.R-project.org/package=VGAM): `vglm()` 51 | 52 | ## Included packages for mixed models: 53 | 54 | - [cplm](https://CRAN.R-project.org/package=cplm): `cpglmm()` 55 | - [coxme](https://CRAN.R-project.org/package=coxme): `coxme()` 56 | - [glmmTMB](https://CRAN.R-project.org/package=glmmTMB): `glmmTMB()` 57 | - [lme4](https://CRAN.R-project.org/package=lme4): `lmer()`, `glmer()`, `glmer.nb()` 58 | - 
[MCMCglmm](https://CRAN.R-project.org/package=MCMCglmm): `MCMCglmm()` 59 | - [mixor](https://CRAN.R-project.org/package=mixor): `mixor()` 60 | - [ordbetareg](https://cran.r-project.org/package=ordbetareg): `ordbetareg()` 61 | - [ordinal](https://CRAN.R-project.org/package=ordinal): `clmm()`, `clmm2()` 62 | - [smicd](https://cran.r-project.org/package=smicd): `semLme()` 63 | 64 | ## Included packages for Bayesian models (mixed and non-mixed): 65 | 66 | - [brms](https://cran.r-project.org/package=brms): `brm()` 67 | 68 | ## Handout 69 | 70 | There is a [handout](regression_pkgs_handout.pdf) in PDF-format. 71 | 72 | ![](by2-small.png) 73 | -------------------------------------------------------------------------------- /random-effects-within-between-effects-model-glmmtmb.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Fixed and Random Effects Models" 3 | author: "Daniel Lüdecke" 4 | date: "22 Feb 2019" 5 | output: 6 | html_document: 7 | theme: cerulean 8 | bibliography: random-effects-within-between-effects-model.bib 9 | --- 10 | 11 | ```{r setup, include=FALSE} 12 | knitr::opts_chunk$set( 13 | echo = TRUE, 14 | collapse = TRUE, 15 | warning = FALSE, 16 | comment = "#>", 17 | dev = "png" 18 | ) 19 | ``` 20 | 21 | ![cc](cc-attrib-nc.png) 22 | 25 | 26 | This document is licensed under the 27 | [Creative Commons attribution-noncommercial license](http://creativecommons.org/licenses/by-nc-sa/2.5/ca/). 28 | Please share \& remix noncommercially, mentioning its origin. Sourcecode and data are available [here](https://github.com/strengejacke/mixed-models-snippets). 29 | 30 | ## The violation of model-assumptions in RE-models for panel data 31 | 32 | This example shows how to address the issue of group factors (random effects) and (time-constant) predictors being correlated in mixed models, especially in panel data.
Models in which predictors and group factors correlate may have compromised estimates of uncertainty as well as possible bias. Particularly in econometrics, fixed-effects models are considered the gold-standard to address such issues. However, it often makes no sense to consider group-effects as "fixed" over a long time period. Apart from this, FE-models have further shortcomings, see [@bell_fixed_2018], [@bell_understanding_2018] and [@bafumi_fitting_2006]. 33 | 34 | The following equations and discussion on FE vs. RE are based on [@bell_fixed_2018]. Further discussion on FE vs. RE, including at the end of this document, refers to [@gelman_data_2007] and [@bafumi_fitting_2006]. 35 | 36 | ### Adding group-meaned predictors to solve this issue 37 | 38 | The solution to the criticism from "FE-modelers" is simple: If you include a group-mean of your variables in a random effects model (that is, calculating the mean of the predictor at each group-level and including it as a group-level predictor), it will give the same answer as a fixed effects model (see Table 3 further below, and [@bell_understanding_2018] as reference). This is why FE-modelers often call this type of RE-model a "kind of" FE-model, i.e. they define a RE model as a model where predictors are assumed uncorrelated with the residuals. However, 39 | 40 | > "Calling it a FE model is not just inaccurate. It also does down its potential. Eg FE models don’t usually include random slopes, and failing to do so can lead to incorrect SEs as well as being a less interesting and realistic model." 41 | 42 | > "A random effects model is such because it has random effects (that is, higher-level entities treated as a distribution) in it rather than fixed effects (higher-level entities treated as dummy variables) in it."
43 | 44 | source: [Twitter-Discussion 1](https://twitter.com/AndrewJDBell/status/1026764338370105344), [Twitter-Discussion 2](https://twitter.com/AndrewJDBell/status/1026764347480178689) 45 | 46 | ### Problems of ignoring random slopes in Fixed Effects models 47 | 48 | [@heisig_costs_2017] demonstrate how ignoring random slopes, i.e. neglecting "cross-cluster differences in the effects of lower-level controls reduces the precision of estimated context effects, resulting in unnecessarily wide confidence intervals and low statistical power". You may refer to this paper to justify a mixed model with random slopes over "simple" FE-models. 49 | 50 | ## Examples 51 | 52 | The following code snippets show how to translate the Equations from [@bell_fixed_2018] into R-code, using `glmmTMB()` from the **glmmTMB**-package. 53 | 54 | ```{r message=FALSE} 55 | library(glmmTMB) 56 | library(parameters) 57 | library(sjPlot) 58 | library(lfe) 59 | 60 | load("example.RData") 61 | ``` 62 | 63 | _Sourcecode and data are available [here](https://github.com/strengejacke/mixed-models-snippets)._ 64 | 65 | ## Description of the data 66 | 67 | * Variables: 68 | * `x_tv` : time-varying variable 69 | * `z1_ti` : first time-invariant variable, co-variate 70 | * `z2_ti` : second time-invariant variable, co-variate 71 | * `QoL` : Response (quality of life of patient) 72 | * `ID` : patient ID 73 | * `time` : time-point of measurement 74 | 75 | 76 | ## "Classical" growth-model for longitudinal data 77 | 78 | ```{r} 79 | model_re <- glmmTMB( 80 | QoL ~ time + age + x_tv + z1_ti + z2_ti + (1 + time | ID), 81 | data = d, 82 | REML = TRUE 83 | ) 84 | ``` 85 | 86 | ## Computing the de-meaned and group-meaned variables 87 | 88 | Next is a model from Eq. 10, which includes the _"de-meaned"_ time-varying variable as well as the _"group-meaned"_ time-varying variable. 
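
As a rough illustration of what de-meaning and group-meaning do, the following base-R sketch builds both variables by hand. The data frame `toy` and its values are invented for this example and are not part of `example.RData`; `ave()` is used here only to make the computation explicit, while the analysis itself relies on `demean()`.

```r
# toy panel data: 3 subjects, 3 measurements each (made-up values)
toy <- data.frame(
  ID = rep(1:3, each = 3),
  x_tv = c(1, 2, 3, 4, 6, 8, 2, 2, 5)
)

# group-mean: mean of x_tv per subject, repeated for each row of that subject
toy$x_tv_between <- ave(toy$x_tv, toy$ID, FUN = mean)

# de-mean: deviation of each measurement from the subject's own mean
toy$x_tv_within <- toy$x_tv - toy$x_tv_between

toy
```

Within each subject the de-meaned values sum to zero, and `x_tv_within + x_tv_between` reproduces the original `x_tv`.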
89 | 90 | ```{r} 91 | # compute mean of "x_tv" for each subject (ID) and 92 | # then "de-mean" x_tv 93 | d <- cbind( 94 | d, 95 | demean(d, select = c("x_tv", "QoL"), group = "ID") # from package "parameters" 96 | ) 97 | ``` 98 | 99 | Now we have: 100 | 101 | * `x_tv_between` : the mean of the time-varying variable `x_tv` across all time-points, for each patient (ID). 102 | * `x_tv_within` : the de-meaned time-varying variable `x_tv` 103 | 104 | `QoL_between` and `QoL_within` are used to test different FE-models, which are described later. In those models, I also use a "de-meaned" response variable without the group-variable (`ID`) as fixed effect (see Equation 6 in the paper). 105 | 106 | ## The complex random-effect-within-between model (REWB) 107 | 108 | Eq. 10 suggests allowing the "within-effect" (de-meaned) to vary across individuals, which is why `x_tv_within` is added as random slope as well. 109 | 110 | Here, the estimate of `x_tv_within` indicates the _within-subject_ effect, while the estimate of `x_tv_between` indicates the _between-subject_ effect. This model also allows for heterogeneity across level-2 units, which is why `x_tv_within` also appears in the random effects. The estimates of `z1_ti` 111 | and `z2_ti` also indicate a _between-subject_ effect, as these are level-2 variables, which cannot have a within-subject effect. 112 | 113 | 114 | ### Model from Equation 10 115 | 116 | Here is Equation 10 from Bell et al. 2018: 117 | 118 | ```{r echo=FALSE} 119 | f <- "yit = β0 + β1W (xit - ͞xi) + β2B ͞xi + β3 zi + υi0 + υi1 (xit - ͞xi) + εit" 120 | knitr::asis_output(f) 121 | ``` 122 | 123 | ```{r echo=FALSE} 124 | f <- "" 125 | knitr::asis_output(f) 126 | ``` 127 | 128 | 129 | ```{r warning=FALSE} 130 | # Model fits...
131 | model_complex_rewb <- glmmTMB( 132 | QoL ~ time + age + x_tv_within + x_tv_between + z1_ti + z2_ti + (1 + time + x_tv_within | ID), 133 | data = d, 134 | REML = TRUE 135 | ) 136 | 137 | # An alternative could be to model the random effects as not correlated. 138 | model_complex_rewb_2 <- glmmTMB( 139 | QoL ~ time + age + x_tv_within + x_tv_between + z1_ti + z2_ti + (1 + time + x_tv_within || ID), 140 | REML = TRUE, 141 | data = d 142 | ) 143 | 144 | # here we get an error message when calling the summary(). The 145 | # model can't compute standard errors. So we don't use this model. 146 | # model_complex_rewb_2b <- glmmTMB( 147 | # QoL ~ time + age + x_tv_within + x_tv_between + z1_ti + z2_ti + 148 | # (1 + time | ID) + (1 + x_tv_within | ID), 149 | # data = d, 150 | # REML = TRUE 151 | # ) 152 | 153 | # an alternative would be to assume independence between random slopes 154 | # and no covariance... 155 | model_complex_rewb_2c <- glmmTMB( 156 | QoL ~ time + age + x_tv_within + x_tv_between + z1_ti + z2_ti + 157 | (1 + time | ID) + (0 + x_tv_within | ID), 158 | data = d, 159 | REML = TRUE 160 | ) 161 | ``` 162 | 163 | We compare all model fits, but we go on with `model_complex_rewb` for now... Note that in the [same examples with `lmer()`](random-effects-within-between-effects-model.html), we took the model where the random parts were `(1 + time | ID) + (1 + x_tv_within | ID)`, like in the above model `model_complex_rewb_2b` - however, model `model_complex_rewb_2b` has serious problems with calculating the standard errors. Model `model_complex_rewb` comes closest to the results from the lme4-model `model_complex_rewb_2`. 
164 | 165 | #### Table 1: Comparison of complex REWB-Models 166 | 167 | ```{r message=FALSE} 168 | tab_model( 169 | model_complex_rewb, model_complex_rewb_2, model_complex_rewb_2c, 170 | show.ci = FALSE, 171 | show.se = TRUE, 172 | auto.label = FALSE, 173 | string.se = "SE", 174 | show.icc = FALSE, 175 | dv.labels = c("Complex REWB (1)", "Complex REWB (2)", "Complex REWB (3)") 176 | ) 177 | ``` 178 | 179 | 180 | ## The simple random-effect-within-between model (REWB) and Mundlak model 181 | 182 | In email correspondence, the paper's authors suggested that, depending on the research interest and necessary complexity of the model, a "simple" random-slope model might be suitable as well. As stated in the paper, this is useful if homogeneity across level-2 units is assumed. This model usually yields the same results as an FE-model; however, we additionally have information about the random effects - and the model can incorporate time-invariant covariates. 183 | 184 | Again, the estimate of `x_tv_within` indicates the within-subject effect, while the estimate of `x_tv_between` indicates the between-subject effect. 185 | 186 | ### Model from Equation 2 187 | 188 | ```{r echo=FALSE} 189 | f <- "yit = β0 + β1W (xit - ͞xi) + β2B ͞xi + β3 zi + (υi + εit)" 190 | knitr::asis_output(f) 191 | ``` 192 | 193 | 194 | ```{r} 195 | model_simple_rewb <- glmmTMB( 196 | QoL ~ time + age + x_tv_within + x_tv_between + z1_ti + z2_ti + (1 + time | ID), 197 | data = d, 198 | REML = TRUE 199 | ) 200 | ``` 201 | 202 | An alternative would be the **Mundlak** model. Here, the estimate of `x_tv` indicates the _within-subject_ effect, while the estimate of `x_tv_between` indicates the _contextual_ effect.
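
The relation between the REWB and the Mundlak parameterisation can be checked on simulated data. The sketch below is a simplified illustration under stated assumptions: it uses plain `lm()` instead of a mixed model, and all variable names and values are made up. Because `x = x_within + x_between`, the Mundlak coefficient of `x_between` (the contextual effect) equals the between-effect minus the within-effect from the REWB parameterisation.

```r
set.seed(123)

# simulated panel: 50 subjects, 4 measurements each (made-up data)
n_id <- 50
n_t <- 4
id <- rep(seq_len(n_id), each = n_t)
x <- rnorm(n_id * n_t) + rep(rnorm(n_id), each = n_t)
x_between <- ave(x, id)          # subject mean of x (ave() defaults to mean)
x_within <- x - x_between        # deviation from the subject mean
y <- 1 + 0.5 * x_within + 2 * x_between + rnorm(n_id * n_t)

# REWB parameterisation: within- and between-effect
m_rewb <- lm(y ~ x_within + x_between)

# Mundlak parameterisation: within- and contextual effect
m_mundlak <- lm(y ~ x + x_between)

b_within <- unname(coef(m_rewb)["x_within"])
b_between <- unname(coef(m_rewb)["x_between"])
b_contextual <- unname(coef(m_mundlak)["x_between"])

# contextual effect = between-effect - within-effect
all.equal(b_contextual, b_between - b_within)
```

Both models describe the same fit; only the interpretation of the `x_between` coefficient changes.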
203 | 204 | ### Model from Equation 3 205 | 206 | ```{r echo=FALSE} 207 | f <- "yit = β0 + β1W xit + β2C͞xi + β3zi + (υi + εit)" 208 | knitr::asis_output(f) 209 | ``` 210 | 211 | ```{r} 212 | model_mundlak <- glmmTMB( 213 | QoL ~ time + age + x_tv + x_tv_between + z1_ti + z2_ti + (1 + time | ID), 214 | data = d, 215 | REML = TRUE 216 | ) 217 | ``` 218 | 219 | The contextual effect, i.e. the coefficient for `x_tv_between`, indicates the effect of an individual (at level 1) moving from one level-2 group to another. In the above model, and for longitudinal data in general, the contextual effect is meaningless, as the level-2 units are the individuals (or subjects) themselves, and by definition cannot "move" to another individual. Therefore, the REWB-model is more informative. 220 | 221 | ## Comparison of models 222 | 223 | In Table 2, we compare the "classical" RE-model, the complex REWB-model, the "simple" REWB-model and the Mundlak-Model. 224 | 225 | #### Table 2: Comparison of RE, REWB and Mundlak Models 226 | 227 | ```{r message=FALSE} 228 | tab_model( 229 | model_re, model_complex_rewb, model_simple_rewb, model_mundlak, 230 | show.ci = FALSE, 231 | show.se = TRUE, 232 | auto.label = FALSE, 233 | string.se = "SE", 234 | show.icc = FALSE, 235 | dv.labels = c("Classical RE", "Complex REWB", "Simple REWB", "Mundlak") 236 | ) 237 | ``` 238 | 239 | 240 | ## Check if a REWB- or simple RE-model is suitable 241 | 242 | If the estimates of the within- and between-effect (`x_tv_within` and `x_tv_between`) are (almost) identical, or if the contextual effect (`x_tv_between`) in the **Mundlak**-model is zero and doesn't give a significant improvement for the model, you can also use a simple RE-model.
243 | 244 | A simple way to check this is a likelihood-ratio test between the simple RE-model and the Mundlak-model: 245 | 246 | ```{r} 247 | anova(model_re, model_mundlak) 248 | ``` 249 | 250 | Here we see a significant improvement of the Mundlak-model over the simple RE-model, indicating that it makes sense to model within- and between-subjects effects, i.e. to apply a REWB-model. 251 | 252 | ## Comparison FE- and REWB-Model 253 | 254 | The function `felm()` from the package **lfe** was used to compute the fixed effects regression models. Base R's `lm()` gives the same result, however, the output is much longer due to the ID-parameters. 255 | 256 | ```{r echo=FALSE} 257 | f <- "yit = β1 (xit - ͞xi) + (υi + εit)" 258 | knitr::asis_output(f) 259 | ``` 260 | 261 | 262 | ```{r} 263 | # Model from Equation 5 264 | model_fe_ID <- felm( 265 | QoL ~ time + x_tv_within | ID, 266 | data = d 267 | ) 268 | 269 | # same as this lm-model 270 | # model_fe_ID <- lm( 271 | # QoL ~ 0 + time + x_tv_within + ID, 272 | # data = d 273 | # ) 274 | ``` 275 | 276 | Equation 6 describes a fixed effects model with de-meaned dependent variable. 277 | 278 | ```{r echo=FALSE} 279 | f <- "(yit - ͞yi) = β1 (xit - ͞xi) + εit" 280 | knitr::asis_output(f) 281 | ``` 282 | 283 | 284 | ```{r} 285 | # Model from Equation 6 286 | model_fe_y_within <- felm( 287 | QoL_within ~ time + x_tv_within, 288 | data = d 289 | ) 290 | 291 | # or ... 292 | # model_fe_y_within <- lm( 293 | # QoL_within ~ 0 + time + x_tv_within, 294 | # data = d 295 | # ) 296 | ``` 297 | 298 | We compare the results from the FE-models with a simple RE-model and the REWB-model. 
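
Before that comparison, the equivalence of Equations 5 and 6 can be illustrated with base R. This is only a sketch on simulated data (all names and values are made up, and `lm()` stands in for `felm()`): the slope from a model with subject dummies matches the slope from a regression of the de-meaned response on the de-meaned predictor.

```r
set.seed(42)

# simulated panel: 30 subjects, 5 measurements each (made-up data)
id <- factor(rep(1:30, each = 5))
x <- rnorm(150) + rep(rnorm(30), each = 5)
y <- 0.7 * x + rep(rnorm(30, sd = 2), each = 5) + rnorm(150)

# de-meaned predictor and response (ave() defaults to the group mean)
x_within <- x - ave(x, id)
y_within <- y - ave(y, id)

# Equation 5: subject dummies as fixed effects
m_dummies <- lm(y ~ 0 + x_within + id)

# Equation 6: de-meaned response, no dummies needed
m_demeaned <- lm(y_within ~ 0 + x_within)

# both give the same slope for the within-effect
all.equal(
  unname(coef(m_dummies)["x_within"]),
  unname(coef(m_demeaned)["x_within"])
)
```

The standard errors of the two fits differ, though, because the de-meaned model does not account for the degrees of freedom used up by the subject means.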
299 | 300 | ```{r} 301 | model_re_2 <- glmmTMB( 302 | QoL ~ time + x_tv_within + x_tv_between + (1 | ID), 303 | data = d, 304 | REML = TRUE 305 | ) 306 | 307 | # Compare with complex REWB-model 308 | model_complex_rewb3 <- glmmTMB( 309 | QoL ~ time + x_tv_within + x_tv_between + (1 + time + x_tv_within | ID), 310 | data = d, 311 | REML = TRUE 312 | ) 313 | ``` 314 | 315 | As Table 3 shows, the estimates of the FE-models and the RE-model are identical. However, the estimates from the REWB-model differ. This is because the time-varying predictor, i.e. the within-subject effect `x_tv_within`, is allowed to vary between subjects as well (i.e. it is modelled as random slope). 316 | 317 | #### Table 3: Comparison of FE- and RE-models 318 | 319 | ```{r message=FALSE} 320 | tab_model( 321 | model_fe_ID, model_fe_y_within, model_re_2, model_complex_rewb3, 322 | show.ci = FALSE, 323 | show.se = TRUE, 324 | auto.label = FALSE, 325 | string.se = "SE", 326 | show.icc = FALSE, 327 | dv.labels = c("FE-model with ID", "FE, de-meaned Y (with Intercept)", "RE", "Complex REWB") 328 | ) 329 | ``` 330 | 331 | ## Conclusion 332 | 333 | When group factors (random effects) and (time-constant) predictors correlate, it's recommended to fit a complex random-effect-within-between model (REWB) instead of a "simple" mixed effects model. This requires de- and group-meaning the time-varying predictors. Depending on the data structure, random slope and intercept may correlate or not. 334 | 335 | The random effects structure, i.e. how to model random slopes and intercepts and allow correlations among them, depends on the nature of the data. The benefits of using mixed effects models over fixed effects models are more precise estimates (in particular when random slopes are included) and the possibility to include between-subjects effects. 336 | 337 | In case of convergence problems or singular fits, note that changing the optimizer might help.
In this context, some models ran fine in [**lme4**](random-effects-within-between-effects-model.html), while other models that had problems being fitted in _lme4_ ran without any problems in _glmmTMB_. 338 | 339 | ```{r eval=FALSE} 340 | # compute group-mean of "x_tv" for each subject (ID) and 341 | # then "de-mean" x_tv 342 | d <- cbind( 343 | d, 344 | demean(d, select = c("x_tv", "QoL"), group = "ID") # from package "parameters" 345 | ) 346 | 347 | # This model gives a convergence warning: 348 | # Model convergence problem; singular convergence (7) 349 | m <- glmmTMB( 350 | QoL ~ time + age + x_tv_within + x_tv_between + z1_ti + z2_ti + (1 + time + x_tv_within | ID), 351 | data = d, 352 | REML = TRUE 353 | ) 354 | 355 | # An alternative could be to model the random effects as not correlated. 356 | m <- glmmTMB( 357 | QoL ~ time + age + x_tv_within + x_tv_between + z1_ti + z2_ti + (1 + time + x_tv_within || ID), 358 | REML = TRUE, 359 | data = d 360 | ) 361 | ``` 362 | 363 | * `x_tv_within` indicates the _within-subject_ effect 364 | * `x_tv_between` indicates the _between-subject_ effect 365 | * `z1_ti` and `z2_ti` also indicate a _between-subject_ effect 366 | 367 | ## Further criticism of the FE-approach 368 | 369 | (source: http://andrewgelman.com/2012/04/02/fixed-effects-and-identification/) 370 | 371 | > "But the so-called fixed effects model does not in general minimize bias. It only minimizes bias under some particular models. As I wrote above, 'it’s just another model.' Another way to see this, in the time-series cross-sectional case, is to recognize that there’s no reason to think of group-level coefficients as truly 'fixed'. One example I remember was a regression on some political outcomes, with 40 years of data for each of 50 states, where the analysis included 'fixed effects' for states. I’m sorry but it doesn’t really make sense to think of Vermont from 1960 through 2000 as being 'fixed' in any sense."
372 | 373 | > "I just don’t see how setting the group-level variance to infinity can be better than estimating it from the data or setting it to a reasonable finite value. That said, the big advantage of multilevel (“random effects”) modeling comes when you are interested in the varying coefficients themselves, or if you’re interested in predictions for new groups, or if you want the treatment effect itself to vary by group. On a slightly different note, I’m unhappy with many of the time-series cross-sectional analyses I’ve seen because I don’t buy the assumption of constancy over time. That is, I don’t really think those effects are “fixed”!" 374 | 375 | > "I don’t know that there’s anything much that’s time-invariant in what I study. But, in any case, the so-called fixed-effects analysis is mathematically a special case of multilevel modeling in which the group-level variance is set to infinity. I agree that there’s no need to “believe” that model for the method to work; however, I think it works because of some implicit additivity assumptions. I’d prefer to (a) allow the group-level variance to be finite, and (b) work in the relevant assumptions more directly." 
376 | 377 | # References 378 | -------------------------------------------------------------------------------- /random-effects-within-between-effects-model.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Fixed and Random Effects Models" 3 | author: "Daniel Lüdecke" 4 | date: "22 May 2019" 5 | output: 6 | html_document: 7 | theme: cerulean 8 | toc: yes 9 | bibliography: random-effects-within-between-effects-model.bib 10 | --- 11 | 12 | ```{r setup, include=FALSE,echo=FALSE} 13 | library(knitr) 14 | knitr::opts_chunk$set( 15 | echo = TRUE, 16 | collapse = TRUE, 17 | warning = FALSE, 18 | message = FALSE, 19 | comment = "#>", 20 | dev = "png" 21 | ) 22 | ``` 23 | 24 | ![cc](cc-attrib-nc.png) 25 | 28 | 29 | This document is licensed under the 30 | [Creative Commons attribution-noncommercial license](http://creativecommons.org/licenses/by-nc-sa/2.5/ca/). 31 | Please share \& remix noncommercially, mentioning its origin. Sourcecode and data are available [here](https://github.com/strengejacke/mixed-models-snippets). 32 | 33 | ## The violation of model-assumptions in RE-models for panel data 34 | 35 | This example shows how to address the issue when group factors (random effects) and (time-constant) predictors correlate for mixed models, especially in panel data. Models, where predictors and group factors correlate, may have compromised estimates of uncertainty as well as possible bias. In particular in econometrics, fixed-effects models are considered the gold-standard to address such issues. However, it often makes no sense to consider group-effects as "fixed" over a long time period. Apart from this, there are more shortcomings of FE-models as well, see [@bell_fixed_2018], [@bell_understanding_2018] and [@bafumi_fitting_2006]. 36 | 37 | The following equations and discussion on FE vs. RE are based on [@bell_fixed_2018]. Further discussion on FE vs. 
RE, also at the end of this document, refer to [@gelman_data_2007] and [@bafumi_fitting_2006]. 38 | 39 | ### Adding group-meaned predictors to solve this issue 40 | 41 | The solution to the criticism from "FE-modelers" is simple: If you include a group-mean of your variables in a random effects model (that is, calculating the mean of the predictor at each group-level and including it as a group-level predictor), it will give the same answer as a fixed effects model (see Table 3 further below, and [@bell_understanding_2018] as reference). This is why FE-modelers often also call this type of RE-model a "kind of" FE-model, i.e. they define a RE model as a model where predictors are assumed uncorrelated with the residuals. However, 42 | 43 | > "Calling it a FE model is not just inaccurate. It also does down its potential. Eg FE models don’t usually include random slopes, and failing to do so can lead to incorrect SEs as well as being a less interesting and realistic model." 44 | 45 | > "A random effects model is such because it has random effects (that is, higher-level entities treated as a distribution) in it rather than fixed effects (higher-level entities treated as dummy variables) in it." 46 | 47 | source: [Twitter-Discussion 1](https://twitter.com/AndrewJDBell/status/1026764338370105344), [Twitter-Discussion 2](https://twitter.com/AndrewJDBell/status/1026764347480178689) 48 | 49 | ### Problems of ignoring random slopes in Fixed Effects models 50 | 51 | [@heisig_costs_2017] demonstrate how ignoring random slopes, i.e. neglecting "cross-cluster differences in the effects of lower-level controls reduces the precision of estimated context effects, resulting in unnecessarily wide confidence intervals and low statistical power". You may refer to this paper to justify a mixed model with random slopes over "simple" FE-models. 
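
The claim made above, that adding the group-mean of a predictor reproduces the within-estimate of a fixed effects model, can be checked on simulated data. The following is only a minimal sketch under simplifying assumptions: it uses plain OLS via base R's `lm()` rather than a mixed model, and the data frame `toy` with all of its columns is hypothetical and not part of `example.RData`.

```{r}
# Hypothetical toy panel: 30 subjects, 4 measurements each.
# All names are made up and only mimic the naming used in this document.
set.seed(123)
n_id <- 30
n_t <- 4
toy <- data.frame(ID = factor(rep(seq_len(n_id), each = n_t)))

# subject-specific effect that correlates with the predictor
u <- rnorm(n_id)[as.integer(toy$ID)]
toy$x_tv <- 0.8 * u + rnorm(nrow(toy))
toy$y <- 1 + 0.5 * toy$x_tv + 2 * u + rnorm(nrow(toy))

# group-meaned ("between") and de-meaned ("within") versions of x_tv
toy$x_tv_between <- ave(toy$x_tv, toy$ID)
toy$x_tv_within <- toy$x_tv - toy$x_tv_between

# fixed effects via subject dummies vs. within-between specification
fe <- lm(y ~ x_tv + ID, data = toy)
wb <- lm(y ~ x_tv_within + x_tv_between, data = toy)

# both slopes are the within-subject estimate
coef(fe)["x_tv"]
coef(wb)["x_tv_within"]
```

Both slopes agree because the de-meaned predictor sums to zero within each subject and is therefore orthogonal to the group means and to the intercept. The within-between specification additionally returns a between-subject estimate, which the dummy-variable model cannot provide.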
52 | 53 | ## Examples 54 | 55 | The following code snippets show how to translate the equations from [@bell_fixed_2018] into R-code, using `lmer()` from the **lme4**-package. 56 | 57 | ```{r message=FALSE} 58 | library(lme4) 59 | library(sjPlot) 60 | library(parameters) 61 | library(lfe) 62 | 63 | load("example.RData") 64 | ``` 65 | 66 | _Source code and data are available [here](https://github.com/strengejacke/mixed-models-snippets)._ 67 | 68 | ## Description of the data 69 | 70 | * Variables: 71 | * `x_tv` : time-varying variable 72 | * `z1_ti` : first time-invariant variable, co-variate 73 | * `z2_ti` : second time-invariant variable, co-variate 74 | * `QoL` : Response (quality of life of patient) 75 | * `ID` : patient ID 76 | * `time` : time-point of measurement 77 | 78 | 79 | ## "Classical" growth-model for longitudinal data 80 | 81 | ```{r} 82 | model_re <- lmer( 83 | QoL ~ time + age + x_tv + z1_ti + z2_ti + (1 + time | ID), 84 | data = d 85 | ) 86 | ``` 87 | 88 | ## Computing the de-meaned and group-meaned variables 89 | 90 | The model from Eq. 10 includes the _"de-meaned"_ time-varying variable as well as the _"group-meaned"_ time-varying variable, so both are computed first. 91 | 92 | ```{r} 93 | # compute mean of "x_tv" for each subject (ID) and 94 | # then "de-mean" x_tv 95 | d <- cbind( 96 | d, 97 | demean(d, select = c("x_tv", "QoL"), group = "ID") # from package "parameters" 98 | ) 99 | ``` 100 | 101 | Now we have: 102 | 103 | * `x_tv_between` : time-varying variable with the mean of `x_tv` across all time-points, for each patient (ID). 104 | * `x_tv_within` : the de-meaned time-varying variable `x_tv` 105 | 106 | `QoL_between` and `QoL_within` are used to test different FE-models, which are described later. In those models, I also use a "de-meaned" response variable without the group-variable (`ID`) as fixed effect (see Equation 6 in the paper). 107 | 108 | ## The complex random-effect-within-between model (REWB) 109 | 110 | Eq. 
10 suggests allowing the "within-effect" (de-meaned) to vary across individuals, that's why `x_tv_within` is added as random slope as well. 111 | 112 | Here, the estimate of `x_tv_within` indicates the _within-subject_ effect, while the estimate of `x_tv_between` indicates the _between-subject_ effect. This model also allows for heterogeneity across level-2 units, that's why `x_tv_within` also appears in the random effects. The estimates of `z1_ti` 113 | and `z2_ti` also indicate _between-subject_ effects, as these are level-2 variables, which cannot have a within-subject effect. 114 | 115 | 116 | ### Model from Equation 10 117 | 118 | Here is Equation 10 from Bell et al. 2018: 119 | 120 | ```{r echo=FALSE} 121 | f <- "yit = β0 + β1W (xit - ͞xi) + β2B ͞xi + β3 zi + υi0 + υi1 (xit - ͞xi) + εit" 122 | knitr::asis_output(f) 123 | ``` 124 | 125 | ```{r echo=FALSE} 126 | f <- "" 127 | knitr::asis_output(f) 128 | ``` 129 | 130 | ```{r} 131 | # By default, this model leads to an error (number of observations <= number of 132 | # random effects), so the check for nobs vs. nRE is disabled here. 133 | model_rewb <- lmer( 134 | QoL ~ time + age + x_tv_within + x_tv_between + z1_ti + z2_ti + (1 + time + x_tv_within | ID), 135 | data = d, 136 | control = lmerControl(check.nobs.vs.nRE = "ignore") 137 | ) 138 | 139 | # An alternative could be to model the random effects as not correlated. 140 | # m2b <- lmer( 141 | # QoL ~ time + age + x_tv_within + x_tv_between + z1_ti + z2_ti + (1 + time + x_tv_within || ID), 142 | # data = d 143 | # ) 144 | 145 | # here we get no error message, model runs fine... 146 | model_complex_rewb <- lmer( 147 | QoL ~ time + age + x_tv_within + x_tv_between + z1_ti + z2_ti + 148 | (1 + time | ID) + (1 + x_tv_within | ID), 149 | data = d 150 | ) 151 | 152 | # an alternative would be to assume independence between random slopes 153 | # and no covariance... 
154 | model_complex_rewb_2 <- lmer( 155 | QoL ~ time + age + x_tv_within + x_tv_between + z1_ti + z2_ti + 156 | (1 + time | ID) + (0 + x_tv_within | ID), 157 | data = d 158 | ) 159 | ``` 160 | 161 | 162 | We compare all model fits, but we go on with `model_complex_rewb` (the second model in the table below) for now... 163 | 164 | #### Table 1: Comparison of complex REWB-Models 165 | 166 | ```{r message=FALSE} 167 | tab_model( 168 | model_rewb, model_complex_rewb, model_complex_rewb_2, 169 | show.ci = FALSE, 170 | show.se = TRUE, 171 | auto.label = FALSE, 172 | string.se = "SE", 173 | show.icc = FALSE, 174 | dv.labels = c("Complex REWB (1)", "Complex REWB (2)", "Complex REWB (3)") 175 | ) 176 | ``` 177 | 178 | 179 | ## The simple random-effect-within-between model (REWB) and Mundlak model 180 | 181 | In email correspondence, the paper's authors suggested that, depending on the research interest and necessary complexity of the model, a "simple" random-slope model might be suitable as well. As stated in the paper, this is useful if homogeneity across level-2 units is assumed. This model usually yields the same results as a FE-model, however, we additionally have information about the random effects - and the model can incorporate time-invariant covariates. 182 | 183 | Again, the estimate of `x_tv_within` indicates the within-subject effect, while the estimate of `x_tv_between` indicates the between-subject effect. 184 | 185 | ### Model from Equation 2 186 | 187 | ```{r echo=FALSE} 188 | f <- "yit = β0 + β1W (xit - ͞xi) + β2B ͞xi + β3 zi + (υi + εit)" 189 | knitr::asis_output(f) 190 | ``` 191 | 192 | ```{r} 193 | model_simple_rewb <- lmer( 194 | QoL ~ time + age + x_tv_within + x_tv_between + z1_ti + z2_ti + (1 + time | ID), 195 | data = d 196 | ) 197 | ``` 198 | 199 | An alternative would be the **Mundlak** model. Here, the estimate of `x_tv` indicates the _within-subject_ effect, while the estimate of `x_tv_between` indicates the _contextual_ effect.
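
How the contextual effect of the Mundlak model relates to the between-effect of the simple REWB model (Equation 2) follows from expanding the within-term. This is plain algebra on the fixed part of the model, written here with $\beta_{2C}$ as shorthand for the coefficient of the group mean $\bar{x}_i$ in the Mundlak model:

$$
\beta_{1W}\,(x_{it} - \bar{x}_i) + \beta_{2B}\,\bar{x}_i
= \beta_{1W}\,x_{it} + (\beta_{2B} - \beta_{1W})\,\bar{x}_i
\quad\Rightarrow\quad
\beta_{2C} = \beta_{2B} - \beta_{1W}
$$

Both models are therefore reparameterizations of each other: the within-estimate is the same, and the contextual effect is the difference between the between- and the within-effect.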
200 | 201 | ### Model from Equation 3 202 | 203 | ```{r echo=FALSE} 204 | f <- "yit = β0 + β1W xit + β2C ͞xi + β3 zi + (υi + εit)" 205 | knitr::asis_output(f) 206 | ``` 207 | 208 | ```{r} 209 | model_mundlak <- lmer( 210 | QoL ~ time + age + x_tv + x_tv_between + z1_ti + z2_ti + (1 + time | ID), 211 | data = d 212 | ) 213 | ``` 214 | 215 | The contextual effect, i.e. the coefficient for `x_tv_between`, indicates the effect of an individual (at level 1) that moves from one level-2 group into another one. In the above model, or in general: in case of longitudinal data, the contextual effect is meaningless, as level-2 units are individuals (or subjects) themselves, and by definition cannot "move" to another individual. Therefore, the REWB-model is more informative. 216 | 217 | ## Comparison of models 218 | 219 | In Table 2, we compare the "classical" RE-model, the complex REWB-model, the "simple" REWB-model and the Mundlak-Model. 220 | 221 | #### Table 2: Comparison of RE, REWB and Mundlak Models 222 | 223 | ```{r message=FALSE} 224 | tab_model( 225 | model_re, model_complex_rewb, model_simple_rewb, model_mundlak, 226 | show.ci = FALSE, 227 | show.se = TRUE, 228 | auto.label = FALSE, 229 | string.se = "SE", 230 | show.icc = FALSE, 231 | dv.labels = c("Classical RE", "Complex REWB", "Simple REWB", "Mundlak") 232 | ) 233 | ``` 234 | 235 | 236 | ## Check if a REWB- or simple RE-model is suitable 237 | 238 | If the estimates of the within- and between-effect (`x_tv_within` and `x_tv_between`) are (almost) identical, or if the contextual effect (`x_tv_between`) in the **Mundlak**-model is zero and doesn't give a significant improvement for the model, you can also use a simple RE-model.
239 | 240 | A simple way to check this is a likelihood-ratio test between the simple RE-model and the Mundlak-model: 241 | 242 | ```{r} 243 | anova(model_re, model_mundlak) 244 | ``` 245 | 246 | Here we see a significant improvement of the Mundlak-model over the simple RE-model, indicating that it makes sense to model within- and between-subjects effects, i.e. to apply a REWB-model. 247 | 248 | ## Comparison of FE- and REWB-Model 249 | 250 | The function `felm()` from the package **lfe** is used to compute the fixed effects regression models. Base R's `lm()` gives the same result, however, the output is much longer due to the ID-parameters. 251 | 252 | 253 | ```{r echo=FALSE} 254 | f <- "yit = β1 (xit - ͞xi) + (υi + εit)" 255 | knitr::asis_output(f) 256 | ``` 257 | 258 | ```{r} 259 | # Model from Equation 5 260 | model_fe_ID <- felm( 261 | QoL ~ time + x_tv_within | ID, 262 | data = d 263 | ) 264 | 265 | # same as this lm-model 266 | # model_fe_ID <- lm( 267 | # QoL ~ 0 + time + x_tv_within + ID, 268 | # data = d 269 | # ) 270 | ``` 271 | 272 | Equation 6 describes a fixed effects model with de-meaned dependent variable. 273 | 274 | ```{r echo=FALSE} 275 | f <- "(yit - ͞yi) = β1 (xit - ͞xi) + εit" 276 | knitr::asis_output(f) 277 | ``` 278 | 279 | ```{r} 280 | # Model from Equation 6 281 | model_fe_y_within <- felm( 282 | QoL_within ~ time + x_tv_within, 283 | data = d 284 | ) 285 | 286 | # or ... 287 | # model_fe_y_within <- lm( 288 | # QoL_within ~ 0 + time + x_tv_within, 289 | # data = d 290 | # ) 291 | ``` 292 | 293 | We compare the results from the FE-models with a simple RE-model and the REWB-model.
294 | 295 | ```{r} 296 | model_re_2 <- lmer( 297 | QoL ~ time + x_tv_within + x_tv_between + (1 | ID), 298 | data = d 299 | ) 300 | 301 | # Compare with complex REWB-model 302 | model_complex_rewb3 <- lmer( 303 | QoL ~ time + x_tv_within + x_tv_between + 304 | (1 + time | ID) + (1 + x_tv_within | ID), 305 | data = d 306 | ) 307 | ``` 308 | 309 | As we can see, the estimates of the FE-models and the RE-model are identical. However, the estimates from the REWB-model differ. This is because the within-subject effect of the time-varying predictor, `x_tv_within`, is allowed to vary between subjects as well (i.e. it is modelled as a random slope). 310 | 311 | #### Table 3: Comparison of FE- and RE-models 312 | 313 | ```{r message=FALSE} 314 | tab_model( 315 | model_fe_ID, model_fe_y_within, model_re_2, model_complex_rewb3, 316 | show.ci = FALSE, 317 | show.se = TRUE, 318 | auto.label = FALSE, 319 | string.se = "SE", 320 | show.icc = FALSE, 321 | dv.labels = c("FE-model with ID", "FE, de-meaned Y (with Intercept)", "RE", "Complex REWB") 322 | ) 323 | ``` 324 | 325 | ## Comparison with the panelr-package 326 | 327 | The [panelr-package](https://panelr.jacob-long.com/) provides functions to fit models similar to those suggested by Bell et al. 2018, especially the "simple REWB model" (`model_simple_rewb`) and the Mundlak-model (`model_mundlak`). For the complex REWB model (`model_complex_rewb`), a slight modification was needed when using `panelr::wbm()`. The following model `model_complex_rewb_panelr`, which mimics the complex REWB model, is therefore similar to the above model `model_rewb`. 328 | 329 | Here we compare the results from `panelr::wbm()` with our previous models `model_complex_rewb` (complex REWB), `model_simple_rewb` (simple REWB) and `model_mundlak` (Mundlak).
330 | 331 | ```{r} 332 | library(panelr) 333 | 334 | # prepare the data for processing with "panelr" 335 | pd <- panel_data(d, id = ID, wave = time) 336 | 337 | # the complex REWB-model 338 | model_complex_rewb_panelr <- wbm(QoL ~ x_tv | age + z1_ti + z2_ti + time | (time + x_tv | ID), 339 | data = pd, 340 | control = lmerControl(check.nobs.vs.nRE = "ignore")) 341 | 342 | # the simple REWB-model 343 | model_rewb_panelr <- wbm(QoL ~ x_tv | age + z1_ti + z2_ti + time | (time | ID), data = pd) 344 | 345 | # the Mundlak model 346 | model_mundlak_panelr <- wbm(QoL ~ x_tv | age + z1_ti + z2_ti + time | (time | ID), data = pd, model = "contextual") 347 | 348 | tab_model( 349 | model_complex_rewb_panelr, model_rewb_panelr, model_mundlak_panelr, 350 | show.ci = FALSE, 351 | show.se = TRUE, 352 | auto.label = FALSE, 353 | string.se = "SE", 354 | show.icc = FALSE, 355 | dv.labels = c("Complex REWB", "Simple REWB", "Mundlak") 356 | ) 357 | ``` 358 | 359 | ```{r} 360 | # compare with other models 361 | tab_model( 362 | model_complex_rewb, model_simple_rewb, model_mundlak, 363 | show.ci = FALSE, 364 | show.se = TRUE, 365 | auto.label = FALSE, 366 | string.se = "SE", 367 | show.icc = FALSE, 368 | dv.labels = c("Complex REWB", "Simple REWB", "Mundlak") 369 | ) 370 | ``` 371 | 372 | 373 | As we can see, coefficients, standard errors and p-values of all relevant parameters are identical for the simple REWB and Mundlak models from both packages (`panelr::wbm()` and `lme4::lmer()`). This confirms the correct "translation" of the formulae from Bell et al. 2018 into `lmer()`-syntax. 374 | 375 | The complex REWB models are also (almost) identical. The minor differences beyond the second decimal place are most likely due to the slightly different random effects specification.
376 | 377 | ## Conclusion 378 | 379 | When group factors (random effects) and (time-constant) predictors correlate, it's recommended to fit a complex random-effect-within-between model (REWB) instead of a "simple" mixed effects model. This requires de- and group-meaning the time-varying predictors. Depending on the data structure, random slope and intercept may correlate or not. 380 | 381 | The random effects structure, i.e. how to model random slopes and intercepts and allow correlations among them, depends on the nature of the data. The benefits from using mixed effects models over fixed effects models are more precise estimates (in particular when random slopes are included) and the possibility to include between-subjects effects. 382 | 383 | In case of convergence problems or singular fits, note that changing the optimizer might help. In this context, some models ran fine in _lme4_, while other models that had problems being fitted in _lme4_ ran without any problems in [**glmmTMB**](random-effects-within-between-effects-model-glmmtmb.html) 384 | 385 | ```{r eval=FALSE} 386 | # compute group-mean of "x_tv" for each subject (ID) and 387 | # then "de-mean" x_tv 388 | d <- cbind( 389 | d, 390 | demean(d, select = c("x_tv", "QoL"), group = "ID") # from package "parameters" 391 | ) 392 | 393 | # fit complex REWB-model 394 | m <- lmer( 395 | QoL ~ time + age + x_tv_within + x_tv_between + z1_ti + z2_ti + 396 | (1 + time | ID) + (1 + x_tv_within | ID), 397 | data = d 398 | ) 399 | 400 | # an alternative would be to assume independence between random slopes 401 | # and no covariance... 
402 | m <- lmer( 403 | QoL ~ time + age + x_tv_within + x_tv_between + z1_ti + z2_ti + 404 | (1 + time | ID) + (0 + x_tv_within | ID), 405 | data = d 406 | ) 407 | ``` 408 | 409 | * `x_tv_within` indicates the _within-subject_ effect 410 | * `x_tv_between` indicates the _between-subject_ effect 411 | * `z1_ti` and `z2_ti` also indicate a _between-subject_ effect 412 | 413 | ## Further critics of the FE-approach 414 | 415 | (source: http://andrewgelman.com/2012/04/02/fixed-effects-and-identification/) 416 | 417 | > "But the so-called fixed effects model does not in general minimize bias. It only minimizes bias under some particular models. As I wrote above, 'it’s just another model.' Another way to see this, in the time-series cross-sectional case, is to recognize that there’s no reason to think of group-level coefficients as truly 'fixed'. One example I remember was a regression on some political outcomes, with 40 years of data for each of 50 states, where the analysis included 'fixed effects' for states. I’m sorry but it doesn’t really make sense to think of Vermont from 1960 through 2000 as being 'fixed' in any sense." 418 | 419 | > "I just don’t see how setting the group-level variance to infinity can be better than estimating it from the data or setting it to a reasonable finite value. That said, the big advantage of multilevel (“random effects”) modeling comes when you are interested in the varying coefficients themselves, or if you’re interested in predictions for new groups, or if you want the treatment effect itself to vary by group. On a slightly different note, I’m unhappy with many of the time-series cross-sectional analyses I’ve seen because I don’t buy the assumption of constancy over time. That is, I don’t really think those effects are “fixed”!" 420 | 421 | > "I don’t know that there’s anything much that’s time-invariant in what I study. 
But, in any case, the so-called fixed-effects analysis is mathematically a special case of multilevel modeling in which the group-level variance is set to infinity. I agree that there’s no need to “believe” that model for the method to work; however, I think it works because of some implicit additivity assumptions. I’d prefer to (a) allow the group-level variance to be finite, and (b) work in the relevant assumptions more directly." 422 | 423 | ## Further Readings 424 | 425 | - Discussion at [Cross Validated](https://stats.stackexchange.com/q/100227) 426 | 427 | # References 428 | -------------------------------------------------------------------------------- /random-effects-within-between-effects-model.bib: -------------------------------------------------------------------------------- 1 | @article{bell_fixed_2018, 2 | title = {Fixed and random effects models: making an informed choice}, 3 | issn = {1573-7845}, 4 | url = {https://doi.org/10.1007/s11135-018-0802-x}, 5 | doi = {10.1007/s11135-018-0802-x}, 6 | journal = {Quality \& Quantity}, 7 | author = {Bell, Andrew and Fairbrother, Malcolm and Jones, Kelvyn}, 8 | year = {2018} 9 | } 10 | 11 | @article{bell_understanding_2018, 12 | title = {Understanding and misunderstanding group mean centering: a commentary on Kelley et al.’s dangerous practice}, 13 | volume = {52}, 14 | issn = {0033-5177, 1573-7845}, 15 | url = {http://link.springer.com/10.1007/s11135-017-0593-5}, 16 | doi = {10.1007/s11135-017-0593-5}, 17 | number = {5}, 18 | journal = {Quality \& Quantity}, 19 | author = {Bell, Andrew and Jones, Kelvyn and Fairbrother, Malcolm}, 20 | year = {2018}, 21 | pages = {2031-2036} 22 | } 23 | 24 | 25 | @book{gelman_data_2007, 26 | address = {Cambridge ; New York}, 27 | series = {Analytical methods for social research}, 28 | title = {Data analysis using regression and multilevel/hierarchical models}, 29 | isbn = {978-0-521-86706-1 978-0-521-68689-1}, 30 | publisher = {Cambridge University Press}, 31 | author = 
{Gelman, Andrew and Hill, Jennifer}, 32 | year = {2007}, 33 | keywords = {Multilevel models (Statistics), Regression analysis} 34 | } 35 | 36 | @inproceedings{bafumi_fitting_2006, 37 | address = {Philadelphia, PA}, 38 | organization = {Annual meeting of the American Political Science Association}, 39 | title = {Fitting Multilevel Models When Predictors and Group Effects Correlate.}, 40 | author = {Bafumi, Joseph and Gelman, Andrew}, 41 | year = {2006} 42 | } 43 | 44 | @article{heisig_costs_2017, 45 | title = {The Costs of Simplicity: Why Multilevel Models May Benefit from Accounting for Cross-Cluster Differences in the Effects of Controls}, 46 | volume = {82}, 47 | issn = {0003-1224, 1939-8271}, 48 | url = {http://journals.sagepub.com/doi/10.1177/0003122417717901}, 49 | doi = {10.1177/0003122417717901}, 50 | number = {4}, 51 | journal = {American Sociological Review}, 52 | author = {Heisig, Jan Paul and Schaeffer, Merlin and Giesecke, Johannes}, 53 | year = {2017}, 54 | pages = {796--827} 55 | } 56 | -------------------------------------------------------------------------------- /regression_pkgs_handout.docx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/strengejacke/mixed-models-snippets/a97640e9ce28eb27af80d7b72359be9b4b8e27a4/regression_pkgs_handout.docx -------------------------------------------------------------------------------- /regression_pkgs_handout.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/strengejacke/mixed-models-snippets/a97640e9ce28eb27af80d7b72359be9b4b8e27a4/regression_pkgs_handout.pdf -------------------------------------------------------------------------------- /time-varying-covariates.R: -------------------------------------------------------------------------------- 1 | # Time-varying covariates ---- 2 | 3 | # Time-varying covariates are most flexibly modelled with splines. 
4 | # Here are some examples, from simple longitudinal models to models 5 | # with time-varying covariates. 6 | # 7 | # y = outcome (continuous / Gaussian) 8 | # t = time-points 9 | # tv = time-varying covariate 10 | # tc = time-constant covariate 11 | # id = subject-ID 12 | library(lme4) 13 | # Model 1 - constant change in time ---- 14 | lmer(y ~ t + tv + tc + (1 + t | id), data) 15 | 16 | # Model 2 - constant change in time, different slopes depending on covariate ---- 17 | # (interaction) 18 | lmer(y ~ t * tv + tc + (1 + t | id), data) 19 | 20 | # Model 3 - non-linear change in time ---- 21 | lmer(y ~ t + I(t^2) + tv + tc + (1 + t | id), data) 22 | 23 | # Model 4 - non-linear change in time, different slopes depending on covariate ---- 24 | # (interaction) 25 | lmer(y ~ t * tv + I(t^2) * tv + tc + (1 + t | id), data) 26 | 27 | # Model 5 - non-linear change in time, time-varying covariate ---- 28 | # (interaction with non-linear covariate) 29 | lmer(y ~ t * tv + I(t^2) * tv + t * I(tv^2) + I(t^2) * I(tv^2) + tc + (1 + t | id), data) 30 | 31 | # Model 6 - non-linear change in time, time-varying covariate ---- 32 | # (cubic instead of quadratic interaction) 33 | model6 <- lmer( 34 | y ~ t * tv + I(t^2) * tv + I(t^3) * tv + t * I(tv^2) + t * I(tv^3) + 35 | I(t^2) * I(tv^2) + I(t^2) * I(tv^3) + I(t^3) * I(tv^2) + 36 | I(t^3) * I(tv^3) + tc + (1 + t | id), 37 | data 38 | ) 39 | 40 | # This final model with splines is almost identical to model 6, 41 | # when comparing marginal effects. The time-varying covariate 42 | # needs to be specified in the "by"-argument from the spline-term. 43 | # Time "t" is non-linear (spline), "tv" varies over time. 
44 | model7 <- gamm4::gamm4(y ~ s(t) + s(t, by = tv) + tc, random = ~ (1 + t | id), data = data) # lmer() does not parse s(); gamm4 is assumed here as one option 45 | -------------------------------------------------------------------------------- /time-varying-covariates.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Time-varying covariates" 3 | author: "Daniel Lüdecke" 4 | date: "26 3 2019" 5 | output: html_document 6 | --- 7 | 8 | ```{r setup, include=FALSE,echo=FALSE} 9 | library(knitr) 10 | knitr::opts_chunk$set( 11 | echo = TRUE, 12 | collapse = TRUE, 13 | warning = FALSE, 14 | comment = "#>", 15 | dev = "png" 16 | ) 17 | ``` 18 | 19 | Time-varying covariates are most flexibly modelled with splines. Here are some examples, from simple longitudinal models to models with time-varying covariates. 20 | 21 | * `y` = outcome (continuous / Gaussian) 22 | * `t` = time-points 23 | * `tv` = time-varying covariate 24 | * `tc` = time-constant covariate 25 | * `id` = subject-ID 26 | 27 | ```{r eval=FALSE} 28 | # Model 1 - constant change in time 29 | lmer(y ~ t + tv + tc + (1 + t | id), data) 30 | 31 | # Model 2 - constant change in time, different slopes depending on covariate 32 | # (interaction) 33 | lmer(y ~ t * tv + tc + (1 + t | id), data) 34 | 35 | # Model 3 - non-linear change in time 36 | lmer(y ~ t + I(t^2) + tv + tc + (1 + t | id), data) 37 | 38 | # Model 4 - non-linear change in time, different slopes depending on covariate 39 | # (interaction) 40 | lmer(y ~ t * tv + I(t^2) * tv + tc + (1 + t | id), data) 41 | 42 | # Model 5 - non-linear change in time, time-varying covariate 43 | # (interaction with non-linear covariate) 44 | lmer(y ~ t * tv + I(t^2) * tv + t * I(tv^2) + I(t^2) * I(tv^2) + tc + (1 + t | id), data) 45 | 46 | # Model 6 - non-linear change in time, time-varying covariate 47 | # (cubic instead of quadratic interaction) 48 | model6 <- lmer( 49 | y ~ t * tv + I(t^2) * tv + I(t^3) * tv + t * I(tv^2) + t * I(tv^3) + 50 | I(t^2) * I(tv^2) + I(t^2) * I(tv^3) + I(t^3) * I(tv^2) + 
51 | I(t^3) * I(tv^3) + tc + (1 + t | id), 52 | data 53 | ) 54 | ``` 55 | 56 | This final model with splines is almost identical to model 6, when comparing marginal effects. The time-varying covariate needs to be specified in the `by`-argument of the spline-term. Time `t` is non-linear (spline), `tv` varies over time. Note that `lmer()` itself does not parse `s()` terms, so the sketch below assumes the **gamm4** package, which fits such smooth terms on top of lme4. 57 | 58 | ```{r eval=FALSE} 59 | model7 <- gamm4::gamm4(y ~ s(t) + s(t, by = tv) + tc, random = ~ (1 + t | id), data = data) 60 | ``` 61 | 62 | Note that time-varying covariates may correlate with the group-level predictors. In such cases, a "complex random effects within-between" model is recommended, see https://strengejacke.github.io/mixed-models-snippets/random-effects-within-between-effects-model.html. 63 | --------------------------------------------------------------------------------