├── .gitignore ├── Makefile ├── MixedModels.md ├── README.md ├── check.R ├── gen_glmm_packages.R ├── glmm_packages_meta.rmd ├── removed.md └── watch.md /.gitignore: -------------------------------------------------------------------------------- 1 | *cache/ 2 | *_files/ 3 | figure/ 4 | .Rhistory 5 | *~ 6 | .#* 7 | gh-pages 8 | *.nav 9 | *.toc 10 | *.bbl 11 | *.blg 12 | *.snm 13 | *tikzDictionary 14 | *.aux 15 | *.log 16 | *.vrb 17 | *.out 18 | notes/*.html 19 | bib2xhtml 20 | *.Rout 21 | MixedModels.html 22 | *.html 23 | \#* 24 | .Rproj.user 25 | 26 | *.Rproj 27 | -------------------------------------------------------------------------------- /Makefile: -------------------------------------------------------------------------------- 1 | ## FIXME: set up proper make rule for all_deps.rds 2 | glmm_packages: 3 | R CMD BATCH --vanilla gen_glmm_packages.R 4 | 5 | taskview: 6 | Rscript -e "ctv::ctv2html(ctv::read.ctv('MixedModels.md'))" 7 | 8 | check: 9 | Rscript check.R 10 | Rscript -e "ctv::check_ctv_packages('MixedModels.md')" 11 | -------------------------------------------------------------------------------- /MixedModels.md: -------------------------------------------------------------------------------- 1 | --- 2 | name: MixedModels 3 | topic: Mixed, Multilevel, and Hierarchical Models in R 4 | maintainer: Ben Bolker, Julia Piaskowski, Emi Tanaka, Phillip Alday, Wolfgang Viechtbauer 5 | email: bolker@mcmaster.ca 6 | version: 2025-05-29 7 | source: https://github.com/cran-task-views/MixedModels/ 8 | --- 9 | 10 | **Contributors**: Maintainers *plus* Michael Agronah, Matthew Fidler, Thierry Onkelinx 11 | 12 | 13 | *Mixed* (or *mixed-effect*) *models* are a broad class of statistical models used to analyze data where observations can be assigned *a priori* to discrete groups, and where the parameters describing the differences between groups are treated as random (or *latent*) variables. 
They are one category of *multilevel*, or *hierarchical*, models; *longitudinal* data are often analyzed in this framework. In econometrics, longitudinal or cross-sectional time series data are often referred to as *panel data* and are sometimes fitted with mixed models. Mixed models can be fitted in either frequentist or Bayesian frameworks. 14 | 15 | This task view only includes models that incorporate *continuous* (usually, although not always, Gaussian) latent variables. This excludes packages that handle hidden Markov models, latent Markov models, and finite (discrete) mixture models (some of these are covered by the `r view("Cluster")` task view). Dynamic linear models and other state-space models that do not incorporate a discrete grouping variable are also excluded (some of these are covered by the `r view("TimeSeries")` task view). Bioinformatic applications of [mixed models hosted on Bioconductor](https://bioconductor.org/help/search/index.html?q="mixed+models"/) are mostly excluded as well. 16 | 17 | 18 | ### Basic model fitting 19 | 20 | #### Linear mixed models 21 | 22 | Linear mixed models (LMMs) make the following assumptions: 23 | 24 | - The expected values of the responses are linear combinations of the fixed predictor variables and the random effects. 25 | - The conditional distribution of the responses is Gaussian (equivalently, the errors are Gaussian). 26 | - The random effects are normally distributed. 27 | 28 | *Frequentist:* 29 | 30 | The most commonly used packages and/or functions for frequentist LMMs are: 31 | 32 | - `r pkg("nlme", priority = "core")`: `nlme::lme()` provides REML or ML estimation. Allows multiple nested random effects, and provides structures for modeling heteroscedastic and/or correlated errors. Wald estimates of parameter uncertainty. 33 | - `r pkg("lme4", priority = "core")`: `lme4::lmer()` provides REML or ML estimation. 
Allows multiple nested or crossed random effects; can compute profile confidence intervals and conduct parametric bootstrapping. 34 | - `r pkg("mbest")`: fits large nested LMMs using a fast moment-based approach. 35 | 36 | *Bayesian:* 37 | 38 | Most Bayesian R packages use Markov chain Monte Carlo (MCMC) estimation: `r pkg("MCMCglmm", priority = "core")`, `r pkg("rstanarm")`, and `r pkg("brms", priority = "core")`; the latter two packages use the [Stan](https://mc-stan.org) infrastructure. `r pkg("blme")`, built on `r pkg("lme4", priority = "core")`, uses maximum *a posteriori* (MAP) estimation. `r pkg("bamlss")` provides a flexible set of modular functions for Bayesian regression modeling. 39 | 40 | #### Generalized linear mixed models 41 | 42 | Generalized linear mixed models (GLMMs) can be described as hierarchical extensions of generalized linear models (GLMs), or as extensions of LMMs to different response distributions, typically in the exponential family. The random-effect distributions are typically assumed to be Gaussian on the scale of the linear predictor. 43 | 44 | *Frequentist:* 45 | 46 | - `r pkg("MASS")`: `MASS::glmmPQL()` fits via penalized quasi-likelihood. 47 | - `r pkg("lme4", priority = "core")`: `lme4::glmer()` uses Laplace approximation and adaptive Gauss-Hermite quadrature; fits negative binomial as well as exponential-family models. 48 | - `r pkg("glmmTMB", priority = "core")` uses Laplace approximation; allows some correlation structures; fits some non-exponential families (Beta, COM-Poisson, etc.) and zero-inflated/hurdle models. 49 | - `r pkg("GLMMadaptive")` uses adaptive Gauss-Hermite quadrature; fits exponential family, negative binomial, beta, zero-inflated/hurdle/censored Gaussian models, user-specified log-densities. 50 | - `r pkg("hglm")` fits hierarchical GLMs using $h$-likelihood (*sensu* Nelder, Lee and Pawitan (2017)). 51 | - `r pkg("glmm")` fits GLMMs using Monte Carlo likelihood approximation. 
52 | - `r pkg("glmmEP")` fits probit mixed models for binary data by expectation propagation. 53 | - `r pkg("mbest")` fits large nested GLMMs using a fast moment-based approach. 54 | - `r pkg("galamm")` fits a wide variety of models (heteroscedastic, mixed response types, factor loadings, etc.). 55 | - `r pkg("glmmrBase")` uses MCMC and Laplace approximations to fit models with Gaussian, binomial, Poisson, Beta, and Gamma responses and flexible correlation structures. 56 | 57 | *Bayesian:* 58 | 59 | Most Bayesian mixed model packages use some form of Markov chain Monte Carlo (or other Monte Carlo methods). 60 | 61 | - `r pkg("MCMCglmm", priority = "core")`: Gibbs sampling. Exponential family, multinomial, ordinal, zero-inflated/altered/hurdle, censored, multimembership, multi-response models. Pedigree (animal/kinship/phylogenetic) models. 62 | - `r pkg("rstanarm")`: Hamiltonian Monte Carlo (based on [Stan](http://mc-stan.org)); designed for `lme4` compatibility. 63 | - `r pkg("brms", priority = "core")`: Hamiltonian Monte Carlo. Linear, robust linear, count data, survival, response times, ordinal, zero-inflated/hurdle/censored data. 64 | - `r pkg("bamlss")`: optimization and derivative-based Metropolis-Hastings/slice sampling. Wide range of distributions and link functions. 65 | 66 | The following packages (in addition to `r pkg("bamlss")`) find maximum *a posteriori* fits to Bayesian (G)LMMs by optimization: 67 | 68 | - `r pkg("blme")` wraps `r pkg("lme4", priority = "core")` to add prior distributions. 69 | - [INLA](https://www.r-inla.org) uses integrated nested Laplace approximation to fit GLMMs using a wide range of latent models (especially for spatial estimation), priors, and distributions. 70 | - `r pkg("inlabru")` facilitates spatial modeling using integrated nested Laplace approximation via the R-INLA package. 
It additionally extends the GAM-like model class to more general nonlinear predictor expressions and implements a log-Gaussian Cox process likelihood for modeling univariate and spatial point processes based on ecological survey data. 71 | - `r github("inbo/inlatools")` provides tools to set sensible priors and check the dispersion and distribution of INLA models. 72 | - `r pkg("vglmer")` estimates GLMMs by variational Bayesian methods. 73 | 74 | #### Nonlinear mixed models 75 | 76 | Nonlinear mixed models incorporate arbitrary nonlinear responses that cannot be accommodated in the framework of GLMMs. Only a few packages can accommodate *generalized* nonlinear mixed models (i.e., parametric nonlinear mixed models with non-Gaussian responses). However, many packages allow smooth nonparametric components (see ["Additive models"](#additive-models) below). Otherwise, users may need to implement GNLMMs themselves in a more general [hierarchical modeling framework](#hierarchical-modeling-frameworks). 77 | 78 | *Frequentist:* 79 | 80 | - `nlme::nlme()` from `r pkg("nlme")` and `lme4::nlmer()` from `r pkg("lme4", priority = "core")` fit nonlinear mixed effects models by maximum likelihood. 81 | - `nlmixr2::nlmixr2()` from `r pkg("nlmixr2")` fits nonlinear mixed effects models by first-order conditional estimation with interaction (FOCEi) maximum likelihood approximation (a different approximation than `nlme::nlme()` and `lme4::nlmer()`), and allows generalized likelihood as well as a selection of built-in link functions. 82 | - `gnlmm()` and `gnlmm3()` from `r pkg("repeated")` fit GNLMMs by Gauss-Hermite integration. 83 | - `r pkg("saemix")` and `r pkg("nlmixr2")` both use a stochastic approximation of the EM algorithm to fit a wide range of GNLMMs. 84 | 85 | 86 | *Bayesian:* 87 | 88 | - `r pkg("brms")` supports GNLMMs. 
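As a rough illustration of the frequentist route listed above, the following sketch fits a Gaussian nonlinear mixed model with `nlme::nlme()`. It is only one possible setup, under these assumptions: the `Loblolly` growth data shipped with R's `datasets` package, the self-starting asymptotic curve `SSasymp()`, a random effect on the asymptote only, and starting values that are rough guesses rather than recommendations.

```r
library(nlme)  # recommended package, installed with standard R distributions

## Loblolly is a groupedData object (height ~ age | Seed), so nlme()
## picks up the grouping factor (Seed) from the data itself.
fit_nlmm <- nlme(
  height ~ SSasymp(age, Asym, R0, lrc),  # asymptotic growth curve
  data   = Loblolly,
  fixed  = Asym + R0 + lrc ~ 1,          # population-level parameters
  random = Asym ~ 1,                     # tree-specific asymptote (assumption)
  start  = c(Asym = 103, R0 = -8.5, lrc = -3.3)  # illustrative starting values
)

fixef(fit_nlmm)    # estimated fixed effects
VarCorr(fit_nlmm)  # random-effect and residual standard deviations
```

Generalized (non-Gaussian) nonlinear mixed models are outside what this call handles; for those, the packages listed above or a general hierarchical modeling framework would be needed.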
89 | 90 | #### Generalized estimating equations 91 | 92 | Generalized estimating equations (GEEs) are an alternative approach to modeling clustered, longitudinal, or otherwise correlated data. These models produce estimates of the *marginal* effects (averaged across the group-level variation) rather than *conditional* effects (conditioned on group-level information). 93 | 94 | - `r pkg("geepack", priority = "core")`, `r pkg("gee")`, and `r pkg("geeM")` are standard GEE solvers, providing GEE estimation of the parameters in mean structures with possible correlation between the outcomes. 95 | - `r pkg("geesmv")`: GEE estimator using the original sandwich variance estimator proposed by Liang and Zeger (1986), and eight types of variance estimators for improving finite-sample performance. 96 | - `r pkg("multgee")` is a GEE solver for correlated nominal or ordinal multinomial responses. 97 | - `r pkg("glmtoolbox")` handles a wide variety of model types (GLMs, beta-binomial and negative binomial, zero-inflation and zero-alteration, mixed models) via GEEs. 98 | 99 | ### Specialized models/tasks 100 | 101 | - [**Additive models**]{#additive-models} (models incorporating smooth functional components such as regression splines or Gaussian processes; also known as *semiparametric* models): `r pkg("gamm4")`, `r pkg("mgcv")`, `r pkg("brms", priority = "core")`, `r pkg("lmeSplines")`, `r pkg("bamlss")`, `r pkg("gamlss")`, `r github("Biometris/LMMsolver")`, `r pkg("R2BayesX")`, `r pkg("GLMMRR")`, `r pkg("glmmTMB", priority = "core")`, `r pkg("galamm")`. 102 | - **Big data/distributed computation**: `r pkg("lmmpar")`, `r pkg("mbest")`. See also [MixedModels.jl](https://juliastats.org/MixedModels.jl/dev/) (Julia), [diamond](https://github.com/stitchfix/diamond) (Python). 103 | - **Bioinformatics/quantitative genetics**: `r pkg("MCMC.qpcr")`, `r pkg("QGglmm")`, `r pkg("CpGassoc")` (methylation studies). 
104 | - **Censored data** (response data known only up to lower/upper bounds): `r pkg("brms", priority = "core")` and `r pkg("nlmixr2")` (general), `r pkg("ARpLMEC")` (censored Gaussian, autoregressive errors). Censored Gaussian (Tobit) responses: `r pkg("GLMMadaptive")`, `r pkg("MCMCglmm", priority = "core")`, `r pkg("gamlss")`. 105 | - **Denominator degree-of-freedom computation**: Satterthwaite and/or Kenward-Roger corrections are computed by `r pkg("lmerTest")`, `r pkg("pbkrtest")`, `r pkg("glmmrBase")` 106 | - [**Differential equations**]{#differential-equations} (fitting DEs with group-structured parameters; this category overlaps considerably with **pharmacokinetic modeling**): `r pkg("mixedsde")` for stochastic DEs. Ordinary DEs can be run with `r pkg("nlmixr2")` using the "focei" or "saem" (EM) methods, or using the `r pkg("nlme")` package; see also the `r view("DifferentialEquations")` and `r view("Pharmacokinetics")` task views. 107 | - **Doubly hierarchical GLMs**: `r pkg("dhglm")`, `r pkg("mdhglm")` (multivariate) 108 | - **Factor analytic, latent variable, and structural equation models**: `r pkg("lavaan", priority = "core")`, `r pkg("nlmm")`,`r pkg("sem")`, `r pkg("piecewiseSEM")`, `r pkg("semtree")`, and `r pkg("blavaan")`; see also the `r view("Psychometrics")` task view. 109 | - **Flexible correlation structures**: `r pkg("brms")`, `r pkg("glmmTMB")`, `r pkg("sommer")`, `r pkg("glmmrBase")`, `r pkg("regress")` 110 | - **Kinship-augmented models** (responses where individuals have a known family relationship): `r pkg("pedigreemm")`, `r pkg("coxme")`, `r pkg("kinship2")`, `r github("Biometris/LMMsolver")`, `r pkg("MCMCglmm", priority = "core")`, `r pkg("sommer", priority = "core")`, `r pkg("rrBLUP")`, `r pkg("BGLR")`, `r github("perpdgo/lme4GS")`, `r github("variani/lme4qtl")`, `r github("cheuerde/cpgen")`, `r pkg("QTLRel")`. 
111 | - **Location-scale models**: `r pkg("nlme", priority = "core")`, `r pkg("glmmTMB", priority = "core")`, `r pkg("brms", priority = "core")`, `r pkg("mgcv")` [with `family` chosen from one of the `*ls`/`*lss` options] all allow modeling of the dispersion/scale component. 112 | - **Missing values**: `r pkg("mice")`, `r pkg("micemd")`, `r pkg("CRTgeeDR")`, `r pkg("JointAI")`, `r pkg("mdmb")`, `r pkg("pan")`; see also the `r view("MissingData")` task view. 113 | - [**Multiple membership models**]{#multimembership-models}: (Bayesian) `r pkg("MCMCglmm", priority = "core")`, `r pkg("brms", priority = "core")`, `r github("benrosche/rmm")`; (frequentist) `r github("jvparidon/lmerMultiMember")` (can also fit the Bradley-Terry model) 114 | - **Multinomial responses**: `r pkg("bamlss")`, `r pkg("R2BayesX")`, `r pkg("MCMCglmm", priority = "core")`, `r pkg("mgcv")`, `r pkg("mclogit")`. 115 | - **Multivariate responses/multi-trait analysis**: (multiple dependent variables; the response variables may or may not be constrained to be from the same family) `r pkg("MCMCglmm", priority = "core")`, `r github("deruncie/MegaLMM")`, `r pkg("brms")`, `r pkg("sommer")`, `r pkg("gllvm")`, INLA. Many mixed-effect packages allow fitting of (homogeneous) multivariate responses by "melting" the data (converting to long format) and treating each observation in the original data as a cluster. 116 | - **Non-Gaussian random effects**: `r pkg("brms", priority = "core")`, `r pkg("repeated")`, `r pkg("spaMM")`. 117 | - **Ordinal-valued responses** (responses measured on an ordinal scale): `r pkg("ordinal")`, `r pkg("GLMMadaptive")`, `r pkg("multgee")` (frequentist); `r pkg("MCMCglmm")`, `r pkg("brms")` (Bayesian), `r pkg("cplm")` (both) 118 | - **Over-dispersed models**: `r pkg("aod")`, `r pkg("aods3")`. 119 | - **Panel data**: in econometrics, *panel data* typically refers to subjects (individuals or firms) that are sampled repeatedly over time. 
The theoretical and computational approaches used by econometricians overlap with mixed models (e.g., see [here](https://cran.r-project.org/web/packages/plm/vignettes/A_plmPackage.html#nlme)). The `r pkg("plm")` package can fit mixed-effects panel models; see also the `r view("Econometrics")` task view. 120 | - **Quantile regression**: `r pkg("lqmm")`, `r pkg("qrLMM")`, `r pkg("qrNLMM")`. 121 | - **Phylogenetic models**: `r pkg("pez")`, `r pkg("phyr")`, `r pkg("MCMCglmm", priority = "core")`, `r pkg("brms", priority = "core")`, `r pkg("gllvm")`. 122 | - **Repeated measures**: (packages with specialized covariance structures for handling repeated measures) `r pkg("nlme", priority = "core")`, `r pkg("mmrm")`, `r pkg("glmmTMB", priority = "core")`, `r github("Biometris/LMMsolver")`, `r pkg("repeated")`. 123 | - **Regularized/penalized models** (regularization or variable selection by ridge, lasso, or elastic net penalties): `r pkg("splmm")` fits LMMs for high-dimensional data by imposing penalties on both the fixed effects and random effects for variable selection. `r pkg("glmmLasso")` fits GLMMs with L1-penalized (LASSO) fixed effects. `r pkg("bamlss")` implements LASSO-like penalization for generalized additive models. 124 | - **Robust/heavy-tailed estimation** (downweighting the importance of extreme observations): `r pkg("robustlmm")`, `r pkg("robustBLME")` (Bayesian robust LME), `r pkg("CRTgeeDR")` for the doubly robust inverse probability weighted augmented GEE estimator. Some packages (`r pkg("brms", priority = "core")`, `r pkg("bamlss")`, `r pkg("GLMMadaptive")`, `r pkg("glmmTMB")`, `r pkg("mgcv")` with `family = "scat"`, `r pkg("nlmixr2")`) allow heavy-tailed response distributions such as Student-$t$. 125 | - **Skewed data/response transformation**: `r pkg("skewlmm")` fits a scale mixture of skew-normal linear mixed models using expectation-maximization (EM). 
`r pkg("nlmixr2")` can fit skewed data by dynamically transforming both sides with either the `boxCox()` or the `yeoJohnson()` transformation, estimated by maximum likelihood or the EM method "saem". `r pkg("bcmixed")` fits Box-Cox-transformed LMMs and provides inferences for differences between treatment levels. `r pkg("boxcoxmix")` fits Box-Cox transformed LMMs and logistic mixed models. 126 | - **Spatial models**: `r pkg("nlme", priority = "core")` (with `corStruct` functions), `r pkg("CARBayesST")`, `r pkg("sphet")`, `r pkg("spind")`, `r pkg("spaMM")`, `r pkg("glmmfields")`, `r pkg("glmmTMB")`, `r pkg("inlabru")` (spatial point processes via log-Gaussian Cox processes), `r pkg("brms", priority = "core")`, `r github("Biometris/LMMsolver")`, `r pkg("bamlss")`, `r pkg("spmodel")` (spatial linear and generalized linear mixed models, Kriging/prediction); see also the `r view("Spatial")` and `r view("SpatioTemporal")` CRAN task views. 127 | - **Sports analytics**: `r pkg("mvglmmRank")`, multivariate generalized linear mixed models for ranking sports teams. 128 | - **Survival analysis**: `r pkg("coxme")`. 129 | - **Tree-based models**: `r pkg("glmertree")`, `r pkg("semtree")`, `r pkg("gpboost")` 130 | - **Weighted models**: `r pkg("WeMix")` (linear and logit models with weights at multiple levels) 131 | - **Zero-inflated models**: (frequentist) `r pkg("glmmTMB")`, `r pkg("cplm")`, `r pkg("mgcv")` (zi Poisson only), `r pkg("GLMMadaptive")`; (Bayesian): `r pkg("MCMCglmm", priority = "core")`, `r pkg("brms", priority = "core")`, `r pkg("bamlss")`. 132 | - **Zero-one inflated Beta regression**: `r pkg("brms")`, `r pkg("zoib")`, `r pkg("glmmTMB")` (zero-inflated only). 
*Ordered beta regression* is an alternative framework to address the same type of data: `r pkg("ordbetareg")`, `r pkg("glmmTMB")` 133 | 134 | ### Hierarchical modeling frameworks 135 | 136 | These packages do not directly provide functions to fit mixed models, but instead implement interfaces to general-purpose sampling and optimization toolboxes that can be used to fit mixed models. While models require extra effort to set up, and often require programming in a domain-specific language other than R, these frameworks are more flexible than most of the other packages listed here. 137 | 138 | * Interfaces to [JAGS](https://mcmc-jags.sourceforge.io/)/[OpenBUGS](https://www.mrc-bsu.cam.ac.uk/software/bugs/openbugs/): `r pkg("R2jags")`, `r pkg("rjags")`, `r pkg("R2OpenBUGS")` (BUGS language). 139 | * Interfaces to [Stan](http://mc-stan.org/) (C++ extensions): `r pkg("rstan")`, `r github("stan-dev/cmdstanr")`, `r github("rmcelreath/rethinking")` (`ulam()` function). 140 | * Other frameworks: `r pkg("TMB")` (automatic differentiation and Laplace approximation via C++ extensions), `r pkg("RTMB")` (simplified R interface to `TMB`), `r pkg("tmbstan")`, `r pkg("nimble")`, `r pkg("greta")` (R interface to TensorFlow). 141 | 142 | 143 | ### Model diagnostics and summary statistics 144 | 145 | #### Model diagnostics 146 | 147 | - **general**: `r pkg("HLMdiag")` (diagnostic tools for hierarchical (multilevel) linear models), `r pkg("rockchalk")`, `r pkg("performance")`, `r pkg("multilevelTools")`, `r pkg("merTools")` (for models fitted using `lme4`), `r pkg("ggResidpanel")`, `r pkg("mlmtools")`, `r pkg("DHARMa")`. 148 | - **influential data points**: `r pkg("influence.ME")`, `r pkg("influence.SEM")`. 149 | - **residuals**: `r pkg("DHARMa")`. 
150 | 151 | #### Summary statistics 152 | 153 | - **Correlations**: `r pkg("iccbeta")` (intraclass correlation), `r pkg("rptR")` (repeatabilities) 154 | - **$R^2$ calculations**: `r pkg("r2glmm")` ($R^2$ and partial $R^2$), `r pkg("MuMIn")` (`r.squaredGLMM()` function), `r pkg("partR2")`, `r pkg("performance")` (`r2()` function), `r pkg("rr2")`, `r pkg("mlmtools")`, `r pkg("mlmhelpr")` (Note that there are many different methods for computing $R^2$ values for (G)LMMs: see e.g. Nakagawa, Johnson and Schielzeth (2017), Jaeger et al. (2017)). Many of these packages also compute *intra-class correlations*. 155 | - **Information criteria**: `r pkg("cAIC4")` (conditional AIC), `r pkg("blmeco")` (WAIC). 156 | - **Robust variance-covariance estimates**: `r pkg("clubSandwich")`, `r pkg("merDeriv")`, `r pkg("mlmhelpr")`, `r pkg("glmmrBase")`, `r pkg("confintROB")` 157 | 158 | #### Derivatives 159 | 160 | The first and second derivatives of the log-likelihood with respect to parameters can be useful for various model evaluation tasks (e.g., computing sensitivities, robust variance-covariance matrices, or delta-method variances). 161 | 162 | - `r pkg("lmeInfo")`, `r pkg("merDeriv")`. 163 | 164 | ### Data sets 165 | 166 | Many packages include small example data sets (e.g., `r pkg("lme4", priority = "core")`, `r pkg("nlme", priority = "core")`). These packages provide previously described data sets often used in evaluating mixed models. 167 | 168 | - `r pkg("mlmRev")`: examples from the Multilevel Software Comparative Reviews. 169 | - `r pkg("SASmixed")`: data sets from [SAS System for Mixed Models](https://support.sas.com/content/dam/SAS/support/en/books/sas-for-mixed-models-an-introduction/68787_excerpt.pdf) 170 | - `r pkg("StroupGLMM")`: R scripts and data sets for *[Generalized Linear Mixed Models](https://www.taylorfrancis.com/books/mono/10.1201/b13151/generalized-linear-mixed-models-walter-stroup)*. 
171 | - `r pkg("blmeco")`: Data and functions accompanying *[Bayesian Data Analysis in Ecology using R, BUGS and Stan](https://www.elsevier.com/books/bayesian-data-analysis-in-ecology-using-linear-models-with-r-bugs-and-stan/korner-nievergelt/978-0-12-801370-0)*. 172 | - `r pkg("nlmeU")`: Data sets, functions and scripts described in *[Linear Mixed-Effects Models: A Step-by-Step Approach](https://link.springer.com/book/10.1007/978-1-4614-3900-4)*. 173 | - `r pkg("VetResearchLMM")`: R scripts and data sets for *[Linear Mixed Models. An Introduction with applications in Veterinary Research](https://hdl.handle.net/10568/5379)*. 174 | - `r pkg("languageR")`: R scripts and data sets for *[Analyzing Linguistic Data: A practical introduction to statistics using R](https://doi.org/10.1017/CBO9780511801686)*. 175 | - `r pkg("nlmixr2data")`: includes the data sets for testing `r pkg("nlmixr2")` against commercial competitors like 'NONMEM' and 'Monolix' 176 | 177 | 178 | ### Model presentation and prediction 179 | 180 | Functions and frameworks for convenient tabular and graphical output of mixed model results: 181 | 182 | - **Tables**: `r pkg("huxtable")`, `r pkg("broom.mixed", priority = "core")`, `r pkg("rockchalk")`, `r pkg("parameters")`, `r pkg("modelsummary")`. 183 | - **Figures/visualization**: `r pkg("dotwhisker")`, `r pkg("sjPlot")`, `r pkg("rockchalk")`, `r pkg("mlmtools")` 184 | 185 | 186 | ### Convenience wrappers 187 | 188 | These functions provide convenient frameworks to fit and interpret mixed models. 189 | 190 | - **Model fitting**: `r pkg("multilevelmod", priority = "core")`, `r pkg("ez")`, `r pkg("mixlm")`, `r pkg("afex")`, and `r pkg("nimble")`. 
191 | - **Model summaries**: `r pkg("broom.mixed", priority = "core")`, `r pkg("insight")` 192 | - **Variable selection & model averaging**: `r pkg("LMERConvenienceFunctions")`, `r pkg("MuMIn")`, `r pkg("glmulti")` (see, e.g., [maintainer's blog](https://vcalcagnoresearch.wordpress.com/package-glmulti/) or [here](https://gist.github.com/bbolker/4ae3496c0ddf99ea2009a22b94aecbe5) for use with mixed models). 193 | `r pkg("mlmhelpr")` 194 | * **Centering/scaling predictors** at the population or group level: `r pkg("mlmhelpr")`, `r pkg("mlmtools")`, `arm::standardize()` 195 | 196 | ### Inference and model selection 197 | 198 | #### Hypothesis testing 199 | 200 | - **Fixed effects**: `r pkg("car")`, `r pkg("lmerTest")`, `r pkg("RVAideMemoire")`, `r pkg("emmeans")`, `r pkg("afex")`, `r pkg("pbkrtest")`, `r pkg("CLME")`. 201 | - **Random effects**: `r pkg("varTestnlme")`, `r pkg("RLRsim")`, `r pkg("mvctm")`. 202 | 203 | #### Prediction and estimation 204 | 205 | - `r pkg("emmeans")`, `r pkg("effects")`, `r pkg("margins")`, `r pkg("MarginalMediation")`, `r pkg("marginaleffects")`, `r pkg("ggeffects")`. 206 | 207 | #### Bootstrapping 208 | 209 | - `r pkg("pbkrtest")`, `r pkg("lme4", priority = "core")` (`lme4::bootMer()` function), `r pkg("lmeresampler")`, `r pkg("boot.pval")`, `r pkg("mlmhelpr")`, `r pkg("confintROB")` 210 | 211 | #### Power analysis and simulation 212 | 213 | These topics are closely related because there are few available analytical methods for computing statistical power for mixed models; power usually needs to be estimated by simulation. 
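A minimal sketch of simulation-based power estimation, under stated assumptions: a random-intercept LMM fitted with `nlme::lme()`, a binary treatment assigned at the group level, and made-up design values (20 groups of 5, unit variances, effect size 1). The helper names `sim_one()` and `p_treat()` are hypothetical and exist only for this illustration; the packages listed in this section automate and generalize this kind of workflow.

```r
library(nlme)  # recommended package, installed with standard R distributions

set.seed(101)

## Simulate one data set: n_group groups of n_per observations each,
## Gaussian random intercepts, and a treatment that alternates between groups.
sim_one <- function(n_group = 20, n_per = 5, beta = 1,
                    sd_group = 1, sd_resid = 1) {
  group <- rep(seq_len(n_group), each = n_per)
  treat <- rep(rep(0:1, length.out = n_group), each = n_per)
  b <- rnorm(n_group, sd = sd_group)  # group-level random intercepts
  y <- beta * treat + b[group] + rnorm(n_group * n_per, sd = sd_resid)
  data.frame(y = y, treat = treat, group = factor(group))
}

## Fit the random-intercept model and extract the Wald p-value for treatment.
p_treat <- function(dat) {
  fit <- lme(y ~ treat, random = ~ 1 | group, data = dat)
  summary(fit)$tTable["treat", "p-value"]
}

nsim <- 200  # kept small for speed; real power studies use more replicates
pvals <- replicate(nsim, p_treat(sim_one()))
power_est <- mean(pvals < 0.05)  # proportion of simulations rejecting H0
power_est
```

The estimate carries Monte Carlo error of roughly `sqrt(power_est * (1 - power_est) / nsim)`, so `nsim` should be increased before drawing design conclusions.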
214 | 215 | - **Power**: `r pkg("longpower")`, `r pkg("pass.lme")`, `r pkg("simr")`, `r pkg("powerEQTL")` (`powerLME` function), `r github("DejanDraschkow/mixedpower")` 216 | - **Simulation**: `r pkg("faux")`; `simulate()` in `lme4` (for formula arguments), `glmmTMB::simulate_new()`; `r pkg("rxode2")`, `r pkg("mrgsolve")`, `r pkg("PKPDsim")` (ODE/pharmacokinetic models) 217 | 218 | #### Model selection 219 | 220 | - `r pkg("cAIC4")` (`cAIC4::stepcAIC`), `r pkg("buildmer")`, `r pkg("MuMIn")`, `r github("timnewbold/StatisticalModels")` (`GLMERSelect`), `r pkg("glmmsel")` 221 | 222 | 223 | ### Commercial software interfaces 224 | 225 | - [Mplus](https://www.statmodel.com/): `r pkg("MplusAutomation")`. 226 | - [ASReml-R](https://vsni.co.uk/software/asreml-r): `r pkg("asremlPlus")`. 227 | - `r pkg("babelmixr2")` allows `r pkg("nlmixr2")` models to be translated and run in either of the commercial tools 228 | [Monolix](https://monolix.lixoft.com/) or [NONMEM](https://www.iconplc.com/solutions/technologies/nonmem) and then reads the results 229 | back in to create a standardized `nlmixr2` fit object. This fit object runs the diagnostics in `nlmixr2` and compares them 230 | to the ones output in the commercial software to "validate" the fit object against the output of the commercial tool. 231 | It also interfaces with free tools such as `r pkg("PKNCA")` for automatically using observed pharmacokinetic (PK) data 232 | for initial estimates of PK models. 233 | 234 | ### Links 235 | 236 | - Help: [R-SIG-mixed-models mailing list](https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models) for discussion of mixed-model-related questions, course announcements, etc. 237 | - Help: [[r] + [mixed-models] tags on Stack Overflow](http://stackoverflow.com/questions/tagged/r+mixed-models). 238 | - Help: [Cross Validated](http://stats.stackexchange.com/). 
239 | - Other software: [Mixed models on Bioconductor](https://bioconductor.org/help/search/index.html?q="mixed+models"/) 240 | - Other software: [ASReml-R](https://vsni.co.uk/software/asreml-r) (commercial: `r pkg("asremlPlus")`). 241 | - Other software: [assist](https://yuedong.faculty.pstat.ucsb.edu/software.html). 242 | - Other software: [INLA](http://www.r-inla.org/home). 243 | - Other software: [Zelig Project](http://docs.zeligproject.org/) 244 | - Other software: [MixWild/MixRegLS](https://voices.uchicago.edu/hedeker/mixwild_mixregls/) for scale-location modeling. 245 | - Other software: [MixedModels.jl](https://github.com/JuliaStats/MixedModels.jl) for mixed models in Julia. 246 | - Other software: [Monolix](https://monolix.lixoft.com/) for ODE based mixed models (commercial). 247 | - Other software: [NONMEM](https://www.iconplc.com/innovation/nonmem/) for ODE based mixed models (commercial). 248 | - Book: *[Mixed-Effects Models in S and S-PLUS](https://link.springer.com/book/10.1007/b98882)*. 249 | - Book: *[SAS System for Mixed Models](https://v8doc.sas.com/sashtml/hrddoc/indfiles/55235.htm)*. 250 | - Book: *[Generalized Linear Mixed Models](https://www.taylorfrancis.com/books/mono/10.1201/b13151/generalized-linear-mixed-models-walter-stroup)*. 251 | - Book: *[Bayesian Data Analysis in Ecology using R, BUGS and Stan](https://www.elsevier.com/books/bayesian-data-analysis-in-ecology-using-linear-models-with-r-bugs-and-stan/korner-nievergelt/978-0-12-801370-0)*. 252 | - Book: *[Linear Mixed-Effects Models: A Step-by-Step Approach](https://link.springer.com/book/10.1007/978-1-4614-3900-4)*. 253 | - Book: *[Mixed Effects Models and Extensions in Ecology with R](https://link.springer.com/book/10.1007/978-0-387-87458-6)*. 254 | - Online Book: *[Embrace Uncertainty: Mixed-effects models with Julia](https://juliamixedmodels.github.io/EmbraceUncertainty/)*. 
255 | - Online Book: *[Generalized Linear Mixed Models with Applications in Agriculture and Biology](https://link.springer.com/book/10.1007/978-3-031-32800-8)* 256 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | ## CRAN Task View: Mixed, Multilevel, and Hierarchical Models in R 2 | 3 | **URL:** 4 | 5 | **Source file:** [MixedModels.md](MixedModels.md) 6 | 7 | **Contributions:** Suggestions and improvements for this task view are very 8 | welcome and can be made through issues or pull requests here on GitHub or 9 | via e-mail to the maintainer address. For further details see the 10 | [Contributing](https://github.com/cran-task-views/ctv/blob/main/Contributing.md) 11 | guide. All contributions must adhere to the 12 | [code of conduct](https://github.com/cran-task-views/ctv/blob/main/CodeOfConduct.md). 13 | -------------------------------------------------------------------------------- /check.R: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env Rscript 2 | # needs to be adapted for our use cases 3 | 4 | # Maintenance script to check CTV packages, URLs, and formatting. 5 | 6 | library("ctv") 7 | library("httr") 8 | library("xml2") 9 | library("magrittr") 10 | 11 | ctvFile <- "MixedModels.md" 12 | stopifnot(file.exists(ctvFile)) 13 | 14 | message("Building HTML and opening for viewing") 15 | ctv::ctv2html(ctvFile) 16 | htmlFile <- gsub(".md", ".html", ctvFile, fixed = TRUE) 17 | browseURL(htmlFile) 18 | 19 | 20 | message("Checking packages...") 21 | packages <- check_ctv_packages(ctvFile) 22 | packagesIssues <- lengths(packages) != 0 23 | if (any(packagesIssues)) { 24 | warning("These packages need updating:", call. = FALSE, immediate. 
= TRUE) 25 | print(packages[packagesIssues]) 26 | } 27 | 28 | message("Checking date...") 29 | xml <- read_xml(htmlFile) 30 | date_node <- xml_find_all(xml, "//meta[@name='DC.issued']") 31 | cat(sprintf("Today is %s", Sys.Date()), "\n") 32 | cat(sprintf("Task view last updated %s", xml_attr(date_node, "content")), "\n") 33 | if (Sys.Date() != xml_attr(date_node, "content")) { 34 | warning("Don't forget to update the version", call. = FALSE, immediate. = TRUE) 35 | } 36 | 37 | 38 | message("Checking URLs...") 39 | 40 | urls_all <- unique(xml_find_all(xml, "//a[@href]") %>% xml_attr(., "href")) 41 | urls <- urls_all[intersect(grep("^#.", urls_all, invert = TRUE), 42 | grep("https://CRAN.R-project.org/.", urls_all, invert = TRUE))] 43 | httr::set_config(timeout(1e6)) 44 | url_test <- rep(NA, length(urls)) 45 | 46 | ## FIXME: progress bar? (back to a for loop?) 47 | get_url <- function(url) { 48 | tt <- try(http_error(url, 49 | config(ssl_verifypeer = 0L, ssl_verifyhost = 0L)), 50 | silent = TRUE) 51 | if (inherits(tt, "try-error")) return(NA) 52 | tt 53 | } 54 | url_test <- vapply(urls, get_url, 55 | FUN.VALUE = logical(1)) 56 | 57 | #url_test <- sapply(urls, try(http_error), config(ssl_verifypeer = 0L, ssl_verifyhost = 0L)) 58 | 59 | ## sometimes links come up error when they do work fine: false positive list 60 | ## (update as needed) 61 | 62 | working_urls <- character(0) 63 | bad_urls <- urls[url_test & !(urls %in% working_urls)] 64 | 65 | if (length(bad_urls) > 0) { 66 | status <- vapply(bad_urls, 67 | function(x) httr::GET(x)$status, 68 | FUN.VALUE = integer(1)) 69 | cat("Failed URLs:\n") 70 | vapply(status, 71 | function(x) http_status(x)$message, 72 | FUN.VALUE = character(1)) 73 | } 74 | 75 | -------------------------------------------------------------------------------- /gen_glmm_packages.R: -------------------------------------------------------------------------------- 1 | ## utilities/misc code for finding interesting packages related to mixed models 2 | 
library(tidyverse) ## general 3 | library(miniCRAN) 4 | library(crandep) 5 | library(igraph) 6 | library(ctv) 7 | ## library(packdep) ## archived ... 8 | library(packageRank) 9 | 10 | options(repos = c(CRAN = "https://cloud.r-project.org")) 11 | 12 | ## 1. plot reverse dependencies of lme4 13 | ## dependency types to include 14 | ## A depends on B: B automatically gets loaded 15 | ## A imports B: A uses functions from B, B *must* be installed 16 | ## A suggests B: A optionally uses functions from B 17 | ## A enhances B: the authors of the package think these go together 18 | ## "reverse-depends": reverse dependencies of lme4 = all the packages that depend on lme4 19 | rd <- c("Reverse depends", "Reverse imports", "Reverse linking to", "Reverse suggests") 20 | dd <- get_dep("lme4",rd) 21 | ## plot(igraph::graph_from_data_frame(dd)) ## too many! 22 | ## nrow(dd) ## 425, 31 July 2022 23 | ## grep/regular expression tricks 24 | 25 | ## 2. find many (not all) GLMM packages 26 | a1 <- available.packages() 27 | ## grep("lmm",rownames(a1),value=TRUE,ignore.case=TRUE) 28 | ## lm followed by ("m or e") followed by (a character not "t" or the end of the string) 29 | regexps <- c("lm(m|e([^t]|$))") ## was using "mixed" but ... ? 
should check, 30 | find_pkgs <- function(x) grep(x, rownames(a1), value=TRUE, ignore.case=TRUE) 31 | regex_pkgs <- character(0) 32 | for (r in regexps) { 33 | regex_pkgs <- union(regex_pkgs, find_pkgs(r)) 34 | } 35 | ## false pos 36 | fpos <- c("palmerpenguins", "curtailment", "yamlme", "mailmerge") 37 | 38 | 39 | ## false negatives: some known-interesting pkgs *not* picked up by regex 40 | ## (check MixedModels.ctv for some more) 41 | ## fneg <- c("SASmixed", "broom.mixed", 42 | ## "pbkrtest", "emmeans", "mgcv", "gamm4", 43 | ## "brms", "rstanarm", "pez", "merDeriv", "repeated", "hglm", 44 | ## "geesmv", "geepack", "influence.ME", "cAIC4", "HLMdiag", "lmmfit", 45 | ## "iccbeta", "DHARMa", "effects", "rockchalk", 46 | ## "arm", "performance", "car", 47 | ## "ez", "afex", "RVAideMemoire", "geoRglm", "GLMMarp", "spaMM", 48 | ## "polytomous", "ordinal", "longpower") 49 | 50 | 51 | regex_pkgs <- setdiff(regex_pkgs, fpos) 52 | rr <- read.ctv("MixedModels.md") 53 | bad_pkgs <- unlist(check_ctv_packages("MixedModels.md")) 54 | ctv_pkgs <- setdiff(rr$packagelist[,"name"], bad_pkgs) 55 | 56 | omit_pkgs <- "MASS" 57 | ## don't want to include MASS in ranking (it's only in there for glmmPQL) 58 | 59 | focal_pkgs <- union(ctv_pkgs, regex_pkgs) |> setdiff(omit_pkgs) 60 | 61 | length(focal_pkgs) ## 128 62 | 63 | ## now extract 64 | pkg_rd <- (expand_grid(name=focal_pkgs, type=rd)) 65 | 66 | ## clunky 67 | pb <- txtProgressBar(max=nrow(pkg_rd), style=3) 68 | i <- 0 69 | ff <- function(name,type) { 70 | ## cat(".") 71 | ## cat(name, "\n") 72 | i <<- i+1 73 | setTxtProgressBar(pb,i) 74 | gd <- get_dep(name,type) 75 | res <- tibble(focal=name, type, from = gd$from, to = gd$to) 76 | if (nrow(res) == 0) return(NULL) 77 | return(res) 78 | } 79 | 80 | 81 | if (file.exists("all_deps.rds")) { 82 | all_deps <- readRDS("all_deps.rds") 83 | } else { 84 | ## THIS BIT IS SLOW, WATCH OUT ... 
(~ 6 minutes) 85 | system.time(all_deps <- (pkg_rd 86 | |> pmap(ff) 87 | |> bind_rows() 88 | ) 89 | ) 90 | saveRDS(all_deps, "all_deps.rds") 91 | } 92 | 93 | ## disregard numbers, for purposes of plotting 94 | unique_deps <- (all_deps 95 | |> select(focal, from, to) 96 | |> unique() 97 | |> filter(to %in% focal_pkgs) 98 | ) 99 | 100 | rdg1 <- igraph::graph_from_data_frame(unique_deps[,c("to", "from")]) 101 | 102 | ## plot(rdg1) 103 | 104 | ## 3. collect importance measures 105 | 106 | cc <- eigen_centrality(rdg1)$vector 107 | central_tbl <- tibble(focal=names(cc), central=cc) 108 | 109 | a1a <- (all_deps 110 | |> mutate_at("type", str_remove, "Reverse ") 111 | |> mutate_at("type", 112 | ~ case_when(. %in% c("depends", "imports") ~ "strong", 113 | TRUE ~ "weak")) 114 | ) 115 | 116 | a1 <- (a1a 117 | |> drop_na() 118 | |> count(focal, type) 119 | ) 120 | 121 | a_tot <- (a1 122 | |> group_by(focal) 123 | |> summarise(n=sum(n),.groups="drop") 124 | |> mutate(type="total") 125 | ) 126 | 127 | a2 <- (bind_rows(a1,a_tot) 128 | |> pivot_wider(names_from=type, values_from=n, values_fill=0) 129 | ## restore packages with no depends 130 | |> full_join(tibble(focal=focal_pkgs), by="focal") 131 | |> mutate(across(where(is.integer), \(x) replace_na(x, 0L))) 132 | ) 133 | 134 | ## 135 | ## SLOW the first time 136 | pp <- packageRank(focal_pkgs) 137 | pp2 <- (pp$package.data 138 | |> as_tibble() 139 | |> select(focal=packages,downloads,percentile) 140 | ) 141 | 142 | a3 <- (full_join(pp2, a2, by="focal") 143 | |> full_join(central_tbl, by="focal") 144 | |> mutate_at("central",replace_na,0) 145 | |> mutate(score=percentile/100+strong/max(strong)+central, .before = downloads) 146 | |> mutate(across(where(is.double), \(x) round(x, 3))) 147 | |> mutate(in_taskview = focal %in% ctv_pkgs, .before = downloads) 148 | |> rename(package = "focal") 149 | |> arrange(desc(score)) 150 | ) 151 | 152 | ## a3 153 | ## View(a3) 154 | ## add descriptions?? 
155 | 156 | write_csv(a3, "glmm_packages.csv") 157 | rmarkdown::render("glmm_packages_meta.rmd", output_format = "md_document") 158 | -------------------------------------------------------------------------------- /glmm_packages_meta.rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "metadata for automated GLMM package list" 3 | --- 4 | 5 | This document describes `glmm_packages.csv`, which is auto-generated by: 6 | 7 | - regex code-golfing the list of available package names (with the pattern `"lm(m|e([^t]|$))"`) 8 | - adding some known false negatives/subtracting some false positives 9 | - scraping to discover the following information: 10 | - number of strong/weak reverse dependencies (`crandep::get_dep`) 11 | - eigenvector centrality of resulting dependency graph (`igraph::eigen_centrality`) 12 | - download/percentile info (`packageRank::packageRank`) 13 | - overall score is `percentile/100 + strong/max(strong) + central` 14 | 15 | Last generated: `r format(Sys.time(), '%d %b %Y')` 16 | -------------------------------------------------------------------------------- /removed.md: -------------------------------------------------------------------------------- 1 | ### Packages removed 2 | 3 | *(when packages are archived)* 4 | 5 | | Package | Date | flagged in issue| 6 | |:----------------|:-----------|:----:| 7 | | clusterPower | 2024-04-26 | #58 | 8 | | mlmmm | 2024-04-26 | #57 | 9 | | Phxnlme | 2024-04-26 | #55 | 10 | | BMTME | 2024-04-26 | #48 | 11 | | dalmatian | 2024-04-26 | #47 | 12 | | qgtools | 2024-06-24 | #66 | 13 | | wgeesel | 2024-07-17 | #68 | 14 | -------------------------------------------------------------------------------- /watch.md: -------------------------------------------------------------------------------- 1 | - `margins`, `dotwhisker`: hopefully restored from archiving soon! 
2 | - `qgtools`: reasonably recently (2024-04-20) archived, hopefully coming back 3 | --------------------------------------------------------------------------------