├── MixedModeling-GKHajduk-script.R
├── README.md
└── dragons.RData

/MixedModeling-GKHajduk-script.R:
--------------------------------------------------------------------------------
######################################
#                                    #
#     Mixed effects modeling in R    #
#                                    #
######################################

## authors: Gabriela K Hajduk, based on workshop developed by Liam Bailey
## contact details: gkhajduk.github.io; email: gkhajduk@gmail.com
## date: 2017-03-09
##

###---- Explore the data -----###

## load the data and have a look at it

load("dragons.RData")

head(dragons)

## Let's say we want to know how body length affects test scores.

## Have a look at the data distribution:

hist(dragons$testScore)  # looks close to a normal distribution - good!

## It is good practice to standardise your explanatory variables before proceeding, so they are on comparable scales - you can use scale() to do that:

dragons$bodyLength2 <- scale(dragons$bodyLength)

## Back to our question: is test score affected by body length?

###---- Fit all data in one analysis -----###

## One way to analyse these data would be to fit a linear model to all our data, ignoring the sites and the mountain ranges for now.

library(lme4)
library(dplyr)

basic.lm <- lm(testScore ~ bodyLength2, data = dragons)

summary(basic.lm)

## Let's plot the data with ggplot2

library(ggplot2)

ggplot(dragons, aes(x = bodyLength, y = testScore)) +
  geom_point() +
  geom_smooth(method = "lm")


### Assumptions?

## Plot the residuals - the red line should be close to flat, like the dotted grey line

plot(basic.lm, which = 1)  # not perfect, but looks alright

## Have a quick look at the QQ plot too - points should ideally fall onto the diagonal dotted line

plot(basic.lm, which = 2)  # a bit off at the extremes, but that's often the case; again, doesn't look too bad


## However, what about observation independence? Are our data independent?
## We collected multiple samples from eight mountain ranges.
## It's perfectly plausible that the data from within each mountain range are more similar to each other than the data from different mountain ranges - they are correlated. Pseudoreplication isn't our friend.

## Have a look at the data to see if the above is true
boxplot(testScore ~ mountainRange, data = dragons)  # certainly looks like something is going on here

## We could also plot it, colouring points by mountain range
ggplot(dragons, aes(x = bodyLength, y = testScore, colour = mountainRange)) +
  geom_point(size = 2) +
  theme_classic() +
  theme(legend.position = "none")

## From the above plots it looks like our mountain ranges vary both in dragon body length and in test scores. This confirms that our observations from within each of the ranges aren't independent. We can't ignore that.

## So what do we do?

###----- Run multiple analyses -----###


## We could run many separate analyses and fit a regression for each of the mountain ranges.
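
## A minimal sketch of this "separate analyses" approach (an illustration, not part of the original workshop code).
## It assumes dragons$bodyLength2 was created with scale() above; range.fits and range.slopes are hypothetical names.
## split() breaks the data frame into one subset per mountain range, lapply() fits the same lm() to each subset,
## and sapply() pulls out the body-length slope (the second coefficient) from each fit.
## Each regression only sees a fraction of the data, so the estimates are noisier and we end up running one test per range.

range.fits <- lapply(split(dragons, dragons$mountainRange),
                     function(d) lm(testScore ~ bodyLength2, data = d))

range.slopes <- sapply(range.fits, function(m) coef(m)[[2]])  # named by mountain range
range.slopes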

## Let's have a quick look at the data split by mountain range
## We use facet_wrap() to do that

ggplot(aes(bodyLength, testScore), data = dragons) + geom_point() +
  facet_wrap(~ mountainRange) +
  xlab("length") + ylab("test score")



###----- Modify the model -----###

## We want to use all the data, but account for the data coming from different mountain ranges

## Let's add mountain range as a fixed effect to our basic.lm

mountain.lm <- lm(testScore ~ bodyLength2 + mountainRange, data = dragons)
summary(mountain.lm)

## now body length is no longer significant


###----- Mixed effects models -----###



##----- First mixed model -----##

### model

### plots

### summary

### variance accounted for by mountain ranges



##-- implicit vs explicit nesting --##

head(dragons)  # we have site and mountainRange
str(dragons)   # we took samples from three sites per mountain range and eight mountain ranges in total

### create new "sample" variable


##----- Second mixed model -----##

### model

### summary

### plot



##----- Model selection for the keen -----##

### full model

### reduced model

### comparison

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# CC-Linear-mixed-models

### Intro tutorial to linear mixed models, available here: https://ourcodingclub.github.io/2017/03/15/mixed-models.html

We would love to hear your feedback on the tutorial, whether you did it at a Coding Club workshop or online:
[https://www.surveymonkey.co.uk/r/HJYGVSF](https://www.surveymonkey.co.uk/r/HJYGVSF)

Check out https://ourcodingclub.github.io/workshop/ to learn how you can get involved!

This work is licensed under a [Creative Commons Attribution-ShareAlike 4.0 International License](https://creativecommons.org/licenses/by-sa/4.0/).

[![License: CC BY-SA 4.0](https://licensebuttons.net/l/by-sa/4.0/80x15.png)](https://creativecommons.org/licenses/by-sa/4.0/)

--------------------------------------------------------------------------------
/dragons.RData:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ourcodingclub/CC-Linear-mixed-models/00ce776eed837c8cc0653287f2d79430667d3b92/dragons.RData

--------------------------------------------------------------------------------
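
The script's mixed-model sections (First mixed model, implicit vs explicit nesting, Second mixed model, Model selection for the keen) are left as empty placeholders to be filled in during the workshop. The block below is one possible way to fill them, offered as a sketch rather than the workshop's official solution. It assumes lme4's `lmer()` with random intercepts, that `dragons.RData` is in the working directory, and that `site` labels are reused across mountain ranges, which is why a combined `sample` identifier is built with base R's `interaction()`. The object names (`mixed.lmer`, `mixed.lmer2`, `range.share`, `full.lmer`, `reduced.lmer`, `lrt`) are illustrative.

```r
## Sketch only: assumes lme4 is installed and dragons.RData is in the working directory.
library(lme4)

load("dragons.RData")
dragons$bodyLength2 <- scale(dragons$bodyLength)

## First mixed model: body length as a fixed effect, random intercept per mountain range
mixed.lmer <- lmer(testScore ~ bodyLength2 + (1 | mountainRange), data = dragons)

## plots: residuals vs fitted, then a QQ plot of the residuals
plot(mixed.lmer)
qqnorm(resid(mixed.lmer))
qqline(resid(mixed.lmer))

## summary
summary(mixed.lmer)

## variance accounted for by mountain ranges: share of the variance left after the
## fixed effects that is attributable to the mountainRange random intercept
vc <- as.data.frame(VarCorr(mixed.lmer))
range.share <- vc$vcov[vc$grp == "mountainRange"] / sum(vc$vcov)
range.share

## Explicit nesting: build a unique site identifier (assumes site labels repeat across ranges)
dragons$sample <- interaction(dragons$mountainRange, dragons$site, drop = TRUE)

## Second mixed model: random intercepts for ranges and for sites within ranges
mixed.lmer2 <- lmer(testScore ~ bodyLength2 + (1 | mountainRange) + (1 | sample),
                    data = dragons)
summary(mixed.lmer2)

## Model selection: fit with maximum likelihood (REML = FALSE) so models with
## different fixed effects can be compared with a likelihood-ratio test
full.lmer <- lmer(testScore ~ bodyLength2 + (1 | mountainRange) + (1 | sample),
                  data = dragons, REML = FALSE)
reduced.lmer <- lmer(testScore ~ 1 + (1 | mountainRange) + (1 | sample),
                     data = dragons, REML = FALSE)
lrt <- anova(reduced.lmer, full.lmer)
lrt
```

Fitting both models with `REML = FALSE` makes the intent explicit: REML likelihoods are not comparable across models with different fixed effects. lme4's `anova()` would refit REML models with maximum likelihood by default anyway, so this choice mainly documents the comparison rather than changing it.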