├── .gitignore
├── README.md
├── cfa-example
│   ├── cfa-example.html
│   ├── cfa-example.md
│   ├── cfa-example.rmd
│   └── figure
│       └── unnamed-chunk-19.png
├── cheat-sheet-lavaan
│   ├── cheat-sheet-lavaan.html
│   ├── cheat-sheet-lavaan.md
│   └── cheat-sheet-lavaan.rmd
├── convert.r
├── ex1-paper
│   ├── ex1-paper.html
│   ├── ex1-paper.md
│   └── ex1-paper.rmd
├── ex2-paper
│   ├── ex2-paper.html
│   ├── ex2-paper.md
│   └── ex2-paper.rmd
├── makefile
└── path-analysis
    ├── figure
    │   └── unnamed-chunk-5.png
    ├── path-analysis.html
    ├── path-analysis.md
    └── path-analysis.rmd

--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | *cache*
2 | .build*
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | This repository shares a few example analyses using `lavaan`, an R package for structural equation modelling.
2 | 
3 | I've just been creating these examples to teach myself how to use the software. Feel free to re-use the code, but I make no guarantee as to the accuracy or validity of these analyses.
4 | 
--------------------------------------------------------------------------------
/cfa-example/cfa-example.md:
--------------------------------------------------------------------------------
 1 | # CFA Example
 2 | 
 3 | 
 4 | 
 5 | ```r
 6 | library(psych)
 7 | library(lavaan)
 8 | Data <- bfi
 9 | item_names <- names(Data)[1:25]
10 | ```
11 | 
12 | 
13 | 
14 | 
15 | ## Check data
16 | 
17 | 
18 | 
19 | ```r
20 | sapply(Data[, item_names], function(X) sum(is.na(X)))
21 | ```
22 | 
23 | ```
24 | ## A1 A2 A3 A4 A5 C1 C2 C3 C4 C5 E1 E2 E3 E4 E5 N1 N2 N3 N4 N5 O1 O2 O3 O4 O5
25 | ## 16 27 26 19 16 21 24 20 26 16 23 16 25  9 21 22 21 11 36 29 22  0 28 14 20
26 | ```
27 | 
28 | ```r
29 | 
30 | Data$item_na <- apply(Data[, item_names], 1, function(X) sum(is.na(X)) > 
31 |     0)
32 | 
33 | table(Data$item_na)
34 | ```
35 | 
36 | ```
37 | ## 
38 | ## FALSE  TRUE 
39 | ##  2436   364 
40 | ```
41 | 
42 | ```r
43 | Data <- Data[!Data$item_na, ]
44 | ```
45 | 
46 | 
47 | 
48 | 
49 | * I decided to remove cases with missing data in order to simplify subsequent exploration of the features of the lavaan software.
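As an aside (this is not part of the original analysis), the same listwise deletion can be written with base R's `complete.cases`. A minimal self-contained sketch on a made-up data frame (`toy` and its values are hypothetical, standing in for the bfi items):

```r
# Hypothetical toy data frame, for illustration only
toy <- data.frame(A1 = c(1, NA, 3, 4), A2 = c(2, 2, NA, 4), A3 = 1:4)
toy_items <- c("A1", "A2")

# complete.cases() is TRUE for rows with no missing values on the
# selected columns -- equivalent to the apply()/is.na() approach above
toy_complete <- toy[complete.cases(toy[, toy_items]), ]
nrow(toy_complete)  # 2 of the 4 rows survive
```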
50 | 51 | 52 | ## Basic CFA 53 | 54 | 55 | ```r 56 | m1_model <- ' N =~ N1 + N2 + N3 + N4 + N5 57 | E =~ E1 + E2 + E3 + E4 + E5 58 | O =~ O1 + O2 + O3 + O4 + O5 59 | A =~ A1 + A2 + A3 + A4 + A5 60 | C =~ C1 + C2 + C3 + C4 + C5 61 | ' 62 | 63 | m1_fit <- cfa(m1_model, data=Data[, item_names]) 64 | summary(m1_fit, standardized=TRUE) 65 | ``` 66 | 67 | ``` 68 | ## lavaan (0.4-14) converged normally after 63 iterations 69 | ## 70 | ## Number of observations 2436 71 | ## 72 | ## Estimator ML 73 | ## Minimum Function Chi-square 4165.467 74 | ## Degrees of freedom 265 75 | ## P-value 0.000 76 | ## 77 | ## Parameter estimates: 78 | ## 79 | ## Information Expected 80 | ## Standard Errors Standard 81 | ## 82 | ## Estimate Std.err Z-value P(>|z|) Std.lv Std.all 83 | ## Latent variables: 84 | ## N =~ 85 | ## N1 1.000 1.300 0.825 86 | ## N2 0.947 0.024 39.899 0.000 1.230 0.803 87 | ## N3 0.884 0.025 35.919 0.000 1.149 0.721 88 | ## N4 0.692 0.025 27.753 0.000 0.899 0.573 89 | ## N5 0.628 0.026 24.027 0.000 0.816 0.503 90 | ## E =~ 91 | ## E1 1.000 0.920 0.564 92 | ## E2 1.226 0.051 23.899 0.000 1.128 0.699 93 | ## E3 -0.921 0.041 -22.431 0.000 -0.847 -0.627 94 | ## E4 -1.121 0.047 -23.977 0.000 -1.031 -0.703 95 | ## E5 -0.808 0.039 -20.648 0.000 -0.743 -0.553 96 | ## O =~ 97 | ## O1 1.000 0.635 0.564 98 | ## O2 -1.020 0.068 -14.962 0.000 -0.648 -0.418 99 | ## O3 1.373 0.072 18.942 0.000 0.872 0.724 100 | ## O4 0.437 0.048 9.160 0.000 0.277 0.233 101 | ## O5 -0.960 0.060 -16.056 0.000 -0.610 -0.461 102 | ## A =~ 103 | ## A1 1.000 0.484 0.344 104 | ## A2 -1.579 0.108 -14.650 0.000 -0.764 -0.648 105 | ## A3 -2.030 0.134 -15.093 0.000 -0.983 -0.749 106 | ## A4 -1.564 0.115 -13.616 0.000 -0.757 -0.510 107 | ## A5 -1.804 0.121 -14.852 0.000 -0.873 -0.687 108 | ## C =~ 109 | ## C1 1.000 0.680 0.551 110 | ## C2 1.148 0.057 20.152 0.000 0.781 0.592 111 | ## C3 1.036 0.054 19.172 0.000 0.705 0.546 112 | ## C4 -1.421 0.065 -21.924 0.000 -0.967 -0.702 113 | ## C5 -1.489 0.072 -20.694 0.000 
-1.013 -0.620 114 | ## 115 | ## Covariances: 116 | ## N ~~ 117 | ## E 0.292 0.032 9.131 0.000 0.244 0.244 118 | ## O -0.093 0.022 -4.138 0.000 -0.112 -0.112 119 | ## A 0.141 0.018 7.713 0.000 0.223 0.223 120 | ## C -0.250 0.025 -10.118 0.000 -0.283 -0.283 121 | ## E ~~ 122 | ## O -0.265 0.021 -12.347 0.000 -0.453 -0.453 123 | ## A 0.304 0.025 12.293 0.000 0.683 0.683 124 | ## C -0.224 0.020 -11.121 0.000 -0.357 -0.357 125 | ## O ~~ 126 | ## A -0.093 0.011 -8.446 0.000 -0.303 -0.303 127 | ## C 0.130 0.014 9.190 0.000 0.301 0.301 128 | ## A ~~ 129 | ## C -0.110 0.012 -9.254 0.000 -0.334 -0.334 130 | ## 131 | ## Variances: 132 | ## N1 0.793 0.037 0.793 0.320 133 | ## N2 0.836 0.036 0.836 0.356 134 | ## N3 1.222 0.043 1.222 0.481 135 | ## N4 1.654 0.052 1.654 0.672 136 | ## N5 1.969 0.060 1.969 0.747 137 | ## E1 1.814 0.058 1.814 0.682 138 | ## E2 1.332 0.049 1.332 0.512 139 | ## E3 1.108 0.038 1.108 0.607 140 | ## E4 1.088 0.041 1.088 0.506 141 | ## E5 1.251 0.040 1.251 0.694 142 | ## O1 0.865 0.032 0.865 0.682 143 | ## O2 1.990 0.063 1.990 0.826 144 | ## O3 0.691 0.039 0.691 0.476 145 | ## O4 1.346 0.040 1.346 0.946 146 | ## O5 1.380 0.045 1.380 0.788 147 | ## A1 1.745 0.052 1.745 0.882 148 | ## A2 0.807 0.028 0.807 0.580 149 | ## A3 0.754 0.032 0.754 0.438 150 | ## A4 1.632 0.051 1.632 0.740 151 | ## A5 0.852 0.032 0.852 0.528 152 | ## C1 1.063 0.035 1.063 0.697 153 | ## C2 1.130 0.039 1.130 0.650 154 | ## C3 1.170 0.039 1.170 0.702 155 | ## C4 0.960 0.040 0.960 0.507 156 | ## C5 1.640 0.059 1.640 0.615 157 | ## N 1.689 0.073 1.000 1.000 158 | ## E 0.846 0.062 1.000 1.000 159 | ## O 0.404 0.033 1.000 1.000 160 | ## A 0.234 0.030 1.000 1.000 161 | ## C 0.463 0.036 1.000 1.000 162 | ## 163 | ``` 164 | 165 | 166 | 167 | 168 | * **`Std.lv`**: Only latent variables have been standardized 169 | * **`Std.all`**: Observed and latent variables have been standardized. 
170 | * **Factor loadings**: Under the `Latent variables` section, the `Std.all` column provides standardised factor loadings.
171 | * **Factor correlations**: Under the `Covariances` section, the `Std.all` column provides standardised factor correlations.
172 | * **`Variances`**: Latent factor variances can be constrained to 1 for identification; in this case, however, the first loading of each factor was constrained to one instead. Variances for items represent the variance not explained by the latent factor.
173 | 
174 | 
175 | 
176 | 
177 | 
178 | ```r
179 | variances <- c(unique = subset(inspect(m1_fit, "standardizedsolution"), 
180 |     lhs == "N1" & rhs == "N1")[, "est.std"], common = subset(inspect(m1_fit, 
181 |     "standardizedsolution"), lhs == "N" & rhs == "N1")[, "est.std"]^2)
182 | (variances <- c(variances, total = sum(variances)))
183 | ```
184 | 
185 | ```
186 | ## unique common  total 
187 | ## 0.3195 0.6805 1.0000 
188 | ```
189 | 
190 | 
191 | 
192 | 
193 | * The output above illustrates the point about variances. Variance for each item is explained either by the common factor or by error variance. As there is just one latent factor loading on the item, the squared standardised coefficient is the variance explained by the common factor. The sum of the unique and common standardised variances is one, which naturally corresponds to the variance of a standardised variable.
194 | * The code also demonstrates how to extract specific information from the lavaan model fit object. Specifically, the `inspect` method provides access to a wide range of specific information. See its help page for further details.
195 | * I used the `subset` function as an easy one-liner for extracting elements from the data frame returned by the `inspect` method.
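The same decomposition can be checked by hand without refitting anything: square the standardised loading of `N1` reported in the `m1` summary (0.825) to get the common variance, and subtract from one for the unique variance. A small base-R sketch:

```r
# Standardised (Std.all) loading of N1 on N, taken from the m1 summary above
std_loading <- 0.825

common_var <- std_loading^2   # variance explained by the common factor
unique_var <- 1 - common_var  # residual (unique) variance

round(c(unique = unique_var, common = common_var, total = unique_var + common_var), 4)
# unique common  total 
# 0.3194 0.6806 1.0000 
```

The tiny discrepancy with the 0.3195/0.6805 above comes from rounding the loading to three decimal places.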
196 | 
197 | 
198 | 
199 | ```r
200 | variances <- c(N1_N1 = subset(parameterestimates(m1_fit), lhs == 
201 |     "N1" & rhs == "N1")[, "est"], N_N = subset(parameterestimates(m1_fit), lhs == 
202 |     "N" & rhs == "N")[, "est"], N_N1 = subset(parameterestimates(m1_fit), lhs == 
203 |     "N" & rhs == "N1")[, "est"])
204 | 
205 | cbind(parameters = c(variances, total = variances["N_N1"] * variances["N_N"] + 
206 |     variances["N1_N1"], raw_divide_by_n_minus_1 = var(Data[, "N1"]), raw_divide_by_n = mean((Data[, 
207 |     "N1"] - mean(Data[, "N1"]))^2)))
208 | ```
209 | 
210 | ```
211 | ##                         parameters
212 | ## N1_N1                       0.7932
213 | ## N_N                         1.6893
214 | ## N_N1                        1.0000
215 | ## total.N_N1                  2.4825
216 | ## raw_divide_by_n_minus_1     2.4835
217 | ## raw_divide_by_n             2.4825
218 | ```
219 | 
220 | 
221 | 
222 | 
223 | * The output above shows the unstandardised parameters related to the item `N1`.
224 | * `N1_N1` corresponds to the unstandardised unique variance for the item.
225 | * `N_N1` squared times `N_N` represents the unstandardised common variance (here the loading `N_N1` is fixed to 1, so this is just `N_N`).
226 | * Thus, the sum of the unique and common variance represents the total variance.
227 | * When I calculated this on the raw data using the standard $n-1$ denominator, the value was slightly larger, but when I used $n$ as the denominator, the estimate was almost identical.
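The small discrepancy is just the usual denominator difference: `var` divides by $n-1$, while the maximum likelihood estimates in the table implicitly divide by $n$. This can be illustrated with base R alone:

```r
# Any numeric vector will do; the point is the (n - 1)/n correction factor
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
n <- length(x)

var_unbiased <- var(x)           # divides by n - 1 (R's default)
var_ml <- mean((x - mean(x))^2)  # divides by n (ML-style)

# The two estimates differ by exactly the factor (n - 1)/n
all.equal(var_ml, var_unbiased * (n - 1)/n)  # TRUE
```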
228 | 229 | 230 | 231 | ## Compare with a single factor model 232 | 233 | 234 | ```r 235 | m2_model <- ' G =~ N1 + N2 + N3 + N4 + N5 236 | + E1 + E2 + E3 + E4 + E5 237 | + O1 + O2 + O3 + O4 + O5 238 | + A1 + A2 + A3 + A4 + A5 239 | + C1 + C2 + C3 + C4 + C5 240 | ' 241 | 242 | m2_fit <- cfa(m2_model, data=Data[, item_names]) 243 | summary(m2_fit, standardized=TRUE) 244 | ``` 245 | 246 | ``` 247 | ## lavaan (0.4-14) converged normally after 55 iterations 248 | ## 249 | ## Number of observations 2436 250 | ## 251 | ## Estimator ML 252 | ## Minimum Function Chi-square 10673.239 253 | ## Degrees of freedom 275 254 | ## P-value 0.000 255 | ## 256 | ## Parameter estimates: 257 | ## 258 | ## Information Expected 259 | ## Standard Errors Standard 260 | ## 261 | ## Estimate Std.err Z-value P(>|z|) Std.lv Std.all 262 | ## Latent variables: 263 | ## G =~ 264 | ## N1 1.000 0.547 0.347 265 | ## N2 0.959 0.081 11.809 0.000 0.524 0.342 266 | ## N3 0.960 0.083 11.547 0.000 0.525 0.329 267 | ## N4 1.375 0.099 13.919 0.000 0.752 0.479 268 | ## N5 0.884 0.081 10.860 0.000 0.484 0.298 269 | ## E1 1.332 0.099 13.509 0.000 0.728 0.447 270 | ## E2 1.868 0.122 15.297 0.000 1.022 0.633 271 | ## E3 -1.382 0.094 -14.730 0.000 -0.756 -0.559 272 | ## E4 -1.702 0.111 -15.307 0.000 -0.931 -0.635 273 | ## E5 -1.292 0.090 -14.425 0.000 -0.707 -0.526 274 | ## O1 -0.656 0.058 -11.321 0.000 -0.359 -0.318 275 | ## O2 0.444 0.067 6.641 0.000 0.243 0.156 276 | ## O3 -0.877 0.068 -12.801 0.000 -0.479 -0.398 277 | ## O4 0.142 0.048 2.930 0.003 0.078 0.065 278 | ## O5 0.416 0.058 7.196 0.000 0.228 0.172 279 | ## A1 0.568 0.065 8.797 0.000 0.311 0.221 280 | ## A2 -1.032 0.074 -13.913 0.000 -0.565 -0.479 281 | ## A3 -1.322 0.090 -14.663 0.000 -0.723 -0.552 282 | ## A4 -1.172 0.088 -13.307 0.000 -0.641 -0.432 283 | ## A5 -1.413 0.093 -15.123 0.000 -0.773 -0.608 284 | ## C1 -0.705 0.063 -11.188 0.000 -0.386 -0.312 285 | ## C2 -0.725 0.066 -10.923 0.000 -0.396 -0.301 286 | ## C3 -0.682 0.064 -10.645 0.000 -0.373 
-0.289 287 | ## C4 1.009 0.079 12.852 0.000 0.552 0.401 288 | ## C5 1.332 0.099 13.505 0.000 0.728 0.446 289 | ## 290 | ## Variances: 291 | ## N1 2.183 0.064 2.183 0.880 292 | ## N2 2.075 0.061 2.075 0.883 293 | ## N3 2.267 0.066 2.267 0.892 294 | ## N4 1.897 0.057 1.897 0.770 295 | ## N5 2.401 0.070 2.401 0.911 296 | ## E1 2.130 0.064 2.130 0.801 297 | ## E2 1.560 0.050 1.560 0.599 298 | ## E3 1.255 0.039 1.255 0.687 299 | ## E4 1.284 0.042 1.284 0.597 300 | ## E5 1.304 0.040 1.304 0.723 301 | ## O1 1.140 0.033 1.140 0.899 302 | ## O2 2.351 0.068 2.351 0.976 303 | ## O3 1.222 0.036 1.222 0.842 304 | ## O4 1.417 0.041 1.417 0.996 305 | ## O5 1.701 0.049 1.701 0.970 306 | ## A1 1.883 0.054 1.883 0.951 307 | ## A2 1.072 0.032 1.072 0.771 308 | ## A3 1.196 0.037 1.196 0.696 309 | ## A4 1.794 0.053 1.794 0.814 310 | ## A5 1.017 0.032 1.017 0.630 311 | ## C1 1.376 0.040 1.376 0.902 312 | ## C2 1.582 0.046 1.582 0.910 313 | ## C3 1.528 0.044 1.528 0.917 314 | ## C4 1.590 0.047 1.590 0.839 315 | ## C5 2.134 0.064 2.134 0.801 316 | ## G 0.299 0.037 1.000 1.000 317 | ## 318 | ``` 319 | 320 | 321 | 322 | 323 | 324 | 325 | ```r 326 | round(cbind(m1 = inspect(m1_fit, "fit.measures"), m2 = inspect(m2_fit, 327 | "fit.measures")), 3) 328 | ``` 329 | 330 | ``` 331 | ## m1 m2 332 | ## chisq 4165.467 1.067e+04 333 | ## df 265.000 2.750e+02 334 | ## pvalue 0.000 0.000e+00 335 | ## baseline.chisq 18222.116 1.822e+04 336 | ## baseline.df 300.000 3.000e+02 337 | ## baseline.pvalue 0.000 0.000e+00 338 | ## cfi 0.782 4.200e-01 339 | ## tli 0.754 3.670e-01 340 | ## logl -99840.238 -1.031e+05 341 | ## unrestricted.logl -97757.504 -9.776e+04 342 | ## npar 60.000 5.000e+01 343 | ## aic 199800.476 2.063e+05 344 | ## bic 200148.363 2.066e+05 345 | ## ntotal 2436.000 2.436e+03 346 | ## bic2 199957.729 2.064e+05 347 | ## rmsea 0.078 1.250e-01 348 | ## rmsea.ci.lower 0.076 1.230e-01 349 | ## rmsea.ci.upper 0.080 1.270e-01 350 | ## rmsea.pvalue 0.000 0.000e+00 351 | ## srmr 0.075 1.160e-01 352 | 
```
353 | 
354 | ```r
355 | anova(m1_fit, m2_fit)
356 | ```
357 | 
358 | ```
359 | ## Chi Square Difference Test
360 | ## 
361 | ##         Df    AIC    BIC Chisq Chisq diff Df diff Pr(>Chisq)    
362 | ## m1_fit 265 199800 200148  4165                                  
363 | ## m2_fit 275 206288 206578 10673       6508      10     <2e-16 ***
364 | ## ---
365 | ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
366 | ```
367 | 
368 | 
369 | 
370 | 
371 | * The output compares the model fit statistics for the two models.
372 | * It also performs a chi-square difference test, which shows that the one-factor model fits significantly worse than the five-factor model.
373 | 
374 | 
375 | ## Modification indices
376 | 
377 | 
378 | ```r
379 | m1_mod <- modificationindices(m1_fit)
380 | m1_mod_summary <- subset(m1_mod, mi > 100)
381 | m1_mod_summary[order(m1_mod_summary$mi, decreasing = TRUE), ]
382 | ```
383 | 
384 | ```
385 | ##    lhs op rhs    mi    epc sepc.lv sepc.all sepc.nox
386 | ## 1   N1 ~~  N2 418.8  0.841   0.841    0.348    0.348
387 | ## 2    E =~  N4 200.8  0.487   0.448    0.285    0.285
388 | ## 3    O =~  E3 153.7  0.672   0.427    0.316    0.316
389 | ## 4   N3 ~~  N4 134.1  0.403   0.403    0.161    0.161
390 | ## 5    O =~  E4 122.6 -0.636  -0.404   -0.276   -0.276
391 | ## 6    C =~  E5 121.5  0.504   0.343    0.255    0.255
392 | ## 7    E =~  O3 114.2 -0.429  -0.395   -0.328   -0.328
393 | ## 8    E =~  O4 113.9  0.372   0.343    0.287    0.287
394 | ## 9    N =~  C5 108.8  0.271   0.352    0.216    0.216
395 | ## 10   E =~  A5 108.6 -0.488  -0.449   -0.354   -0.354
396 | ## 11   N =~  C2 107.0  0.219   0.285    0.216    0.216
397 | ## 12  C1 ~~  C2 107.0  0.288   0.288    0.177    0.177
398 | ## 13  E2 ~~  O4 104.7  0.310   0.310    0.161    0.161
399 | ## 14  A1 ~~  A2 101.4 -0.276  -0.276   -0.166   -0.166
400 | ```
401 | 
402 | 
403 | 
404 | 
405 | * `modificationindices` suggests several ad hoc modifications that could be made to improve the fit of the model.
406 | * The largest index suggests that items `N1` and `N2` share common variance.
If we look at the help file for the bfi dataset (`?bfi`), we see that the text for `N1` ("Get angry easily") and `N2` ("Get irritated easily") is very similar.
407 | 
408 | 
409 | 
410 | ```r
411 | (N_cors <- round(cor(Data[, paste0("N", 1:5)]), 2))
412 | ```
413 | 
414 | ```
415 | ##      N1   N2   N3   N4   N5
416 | ## N1 1.00 0.72 0.57 0.41 0.38
417 | ## N2 0.72 1.00 0.55 0.39 0.35
418 | ## N3 0.57 0.55 1.00 0.52 0.43
419 | ## N4 0.41 0.39 0.52 1.00 0.40
420 | ## N5 0.38 0.35 0.43 0.40 1.00
421 | ```
422 | 
423 | ```r
424 | N1_N2_corr <- N_cors["N1", "N2"]
425 | other_N_corrs <- round(mean(abs(N_cors[lower.tri(N_cors)][-1])), 
426 |     2)
427 | ```
428 | 
429 | 
430 | 
431 | 
432 | * The correlation matrix also shows that the correlation between N1 and N2 ($r = 0.72$) is much larger than the correlations among the other variables ($\text{mean}(|r|) = 0.44$).
433 | 
434 | ## Various matrices
435 | ### Observed, fitted, and residual covariance matrices
436 | The following analysis extracts observed, fitted, and residual covariances and checks that they are consistent with expectations. I only do this for the five neuroticism items, rather than the full 25-item set, to make their meaning clearer.
437 | 
438 | 
439 | 
440 | ```r
441 | N_names <- paste0("N", 1:5)
442 | N_matrices <- list(observed = inspect(m1_fit, "sampstat")$cov[N_names, 
443 |     N_names], fitted = fitted(m1_fit)$cov[N_names, N_names], residual = resid(m1_fit)$cov[N_names, 
444 |     N_names])
445 | 
446 | N_matrices$check <- N_matrices$observed - (N_matrices$fitted + N_matrices$residual)
447 | lapply(N_matrices, function(X) round(X, 3))
448 | ```
449 | 
450 | ```
451 | ## $observed
452 | ##       N1    N2    N3    N4    N5
453 | ## N1 2.482 1.735 1.425 1.013 0.973
454 | ## N2 1.735 2.350 1.344 0.950 0.873
455 | ## N3 1.425 1.344 2.542 1.309 1.114
456 | ## N4 1.013 0.950 1.309 2.463 1.026
457 | ## N5 0.973 0.873 1.114 1.026 2.635
458 | ## 
459 | ## $fitted
460 | ##       N1    N2    N3    N4    N5
461 | ## N1 2.482 1.599 1.493 1.169 1.061
462 | ## N2 1.599 2.350 1.414 1.106 1.004
463 | ## N3 1.493 1.414 2.542 1.033 0.937
464 | ## N4 1.169 1.106 1.033 2.463 0.734
465 | ## N5 1.061 1.004 0.937 0.734 2.635
466 | ## 
467 | ## $residual
468 | ##        N1     N2     N3     N4     N5
469 | ## N1  0.000  0.135 -0.068 -0.155 -0.087
470 | ## N2  0.135  0.000 -0.069 -0.157 -0.131
471 | ## N3 -0.068 -0.069  0.000  0.276  0.177
472 | ## N4 -0.155 -0.157  0.276  0.000  0.293
473 | ## N5 -0.087 -0.131  0.177  0.293  0.000
474 | ## 
475 | ## $check
476 | ##    N1 N2 N3 N4 N5
477 | ## N1  0  0  0  0  0
478 | ## N2  0  0  0  0  0
479 | ## N3  0  0  0  0  0
480 | ## N4  0  0  0  0  0
481 | ## N5  0  0  0  0  0
482 | ## 
483 | ```
484 | 
485 | 
486 | 
487 | 
488 | * The observed covariance matrix is essentially what would be obtained by running the `cov` function on the sample data.
489 | * The fitted covariance matrix can be extracted using the `fitted` method on the model fit object and then extracting the `cov` element.
490 | * Many symmetric matrices in lavaan are of class `lavaan.matrix.symmetric`. This hides the upper triangle of the matrix and formats the matrix to a set number (`nd`) of decimal places.
491 | Run `getAnywhere(print.lavaan.matrix.symmetric)` to see more details.
492 | * The `sampstat` option in the `inspect` method can be used to extract the sample covariance matrix. This is similar to, but not exactly the same as, running `cov` on the sample data.
493 | * The `resid` method can be used to extract the residual covariance matrix.
494 | * I then create a `check` that `observed = fitted + residual`, which it does.
495 | 
496 | ### Observed, fitted, and residual correlation matrices
497 | I often find it more meaningful to examine observed, fitted, and residual correlation matrices. Standardisation often makes it easier to understand the real magnitude of any residual.
498 | 
499 | 
500 | 
501 | ```r
502 | N_names <- paste0("N", 1:5)
503 | N_cov <- list(observed = inspect(m1_fit, "sampstat")$cov[N_names, 
504 |     N_names], fitted = fitted(m1_fit)$cov[N_names, N_names])
505 | 
506 | N_cor <- list(observed = cov2cor(N_cov$observed), fitted = cov2cor(N_cov$fitted))
507 | 
508 | N_cor$residual <- N_cor$observed - N_cor$fitted
509 | 
510 | lapply(N_cor, function(X) round(X, 2))
511 | ```
512 | 
513 | ```
514 | ## $observed
515 | ##      N1   N2   N3   N4   N5
516 | ## N1 1.00 0.72 0.57 0.41 0.38
517 | ## N2 0.72 1.00 0.55 0.39 0.35
518 | ## N3 0.57 0.55 1.00 0.52 0.43
519 | ## N4 0.41 0.39 0.52 1.00 0.40
520 | ## N5 0.38 0.35 0.43 0.40 1.00
521 | ## 
522 | ## $fitted
523 | ##      N1   N2   N3   N4   N5
524 | ## N1 1.00 0.66 0.59 0.47 0.41
525 | ## N2 0.66 1.00 0.58 0.46 0.40
526 | ## N3 0.59 0.58 1.00 0.41 0.36
527 | ## N4 0.47 0.46 0.41 1.00 0.29
528 | ## N5 0.41 0.40 0.36 0.29 1.00
529 | ## 
530 | ## $residual
531 | ##       N1    N2    N3    N4    N5
532 | ## N1  0.00  0.06 -0.03 -0.06 -0.03
533 | ## N2  0.06  0.00 -0.03 -0.07 -0.05
534 | ## N3 -0.03 -0.03  0.00  0.11  0.07
535 | ## N4 -0.06 -0.07  0.11  0.00  0.11
536 | ## N5 -0.03 -0.05  0.07  0.11  0.00
537 | ## 
538 | ```
539 | 
540 | 
541 | 
542 | 
543 | * `cov2cor` is a `base` R function that scales a covariance matrix into a correlation matrix.
544 | * Fitted and observed correlation matrices can be obtained by running `cov2cor` on the corresponding covariance matrices.
545 | * The residual correlation matrix can be obtained by subtracting the fitted correlation matrix from the observed correlation matrix.
546 | * In this case we can see that certain pairs of items correlate more or less than other pairs. In particular, `N1-N2`, `N3-N4`, and `N4-N5` have positive correlation residuals. An examination of the items below may suggest some added degree of similarity between these pairs of items. For example, N1 and N2 both concern anger and irritation, whereas N3 and N4 both concern mood and affect.
547 | 
548 | 
549 | > N1: Get angry easily. (q_952)
550 | > N2: Get irritated easily. (q_974)
551 | > N3: Have frequent mood swings. (q_1099)
552 | > N4: Often feel blue. (q_1479)
553 | > N5: Panic easily. (q_1505)
554 | 
555 | ## Uncorrelated factors
556 | ### All uncorrelated factors
557 | The following examines a model with uncorrelated factors.
558 | 
559 | 
560 | 
561 | ```r
562 | m3_model <- ' N =~ N1 + N2 + N3 + N4 + N5
563 |               E =~ E1 + E2 + E3 + E4 + E5
564 |               O =~ O1 + O2 + O3 + O4 + O5
565 |               A =~ A1 + A2 + A3 + A4 + A5
566 |               C =~ C1 + C2 + C3 + C4 + C5
567 | '
568 | 
569 | m3_fit <- cfa(m3_model, data=Data[, item_names], orthogonal=TRUE)
570 | 
571 | round(cbind(m1=inspect(m1_fit, 'fit.measures'), 
572 |     m3=inspect(m3_fit, 'fit.measures')), 3)
573 | ```
574 | 
575 | ```
576 | ##                           m1         m3
577 | ## chisq               4165.467  5.640e+03
578 | ## df                   265.000  2.750e+02
579 | ## pvalue                 0.000  0.000e+00
580 | ## baseline.chisq     18222.116  1.822e+04
581 | ## baseline.df          300.000  3.000e+02
582 | ## baseline.pvalue        0.000  0.000e+00
583 | ## cfi                    0.782  7.010e-01
584 | ## tli                    0.754  6.730e-01
585 | ## logl              -99840.238 -1.006e+05
586 | ## unrestricted.logl -97757.504 -9.776e+04
587 | ## npar                  60.000  5.000e+01
588 | ## aic               199800.476  2.013e+05
589 | ## bic               200148.363  2.015e+05
590 | ## ntotal              2436.000  2.436e+03
591 | ## bic2              199957.729  2.014e+05
592 | ## rmsea                  0.078  8.900e-02
593 | ## rmsea.ci.lower         0.076  8.700e-02
594 | ## rmsea.ci.upper         0.080  9.200e-02
595 | ## rmsea.pvalue           0.000  0.000e+00
596 | ## srmr                   0.075  1.380e-01
597 | ```
598 | 
599 | ```r
600 | anova(m1_fit, m3_fit)
601 | ```
602 | 
603 | ```
604 | ## Chi Square Difference Test
605 | ## 
606 | ##         Df    AIC    BIC Chisq Chisq diff Df diff Pr(>Chisq)    
607 | ## m1_fit 265 199800 200148  4165                                  
608 | ## m3_fit 275 201255 201545  5640       1474      10     <2e-16 ***
609 | ## ---
610 | ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
611 | ```
612 | 
613 | ```r
614 | 
615 | rmsea_m1 <- round(inspect(m1_fit, 'fit.measures')['rmsea'], 3)
616 | rmsea_m3 <- round(inspect(m3_fit, 'fit.measures')['rmsea'], 3)
617 | ```
618 | 
619 | 
620 | 
621 | 
622 | * To convert a `cfa` model from one that permits factors to be correlated to one that constrains factors to be uncorrelated, just specify `orthogonal=TRUE`.
623 | * In this case, constraining the factor covariances to all be zero led to a significant reduction in fit.
This poorer fit can also be seen in measures like RMSEA (m1 = `0.078`; m3 = `0.089`).
625 | 
626 | 
627 | ### Correlations and covariances between factors
628 | It is useful to be able to extract correlations and covariances between factors.
629 | 
630 | 
631 | 
632 | ```r
633 | inspect(m1_fit, "coefficients")$psi
634 | ```
635 | 
636 | ```
637 | ##        N      E      O      A      C
638 | ## N  1.689                            
639 | ## E  0.292  0.846                     
640 | ## O -0.093 -0.265  0.404              
641 | ## A  0.141  0.304 -0.093  0.234       
642 | ## C -0.250 -0.224  0.130 -0.110  0.463
643 | ```
644 | 
645 | ```r
646 | cov2cor(inspect(m1_fit, "coefficients")$psi)
647 | ```
648 | 
649 | ```
650 | ##        N      E      O      A      C
651 | ## N  1.000                            
652 | ## E  0.244  1.000                     
653 | ## O -0.112 -0.453  1.000              
654 | ## A  0.223  0.683 -0.303  1.000       
655 | ## C -0.283 -0.357  0.301 -0.334  1.000
656 | ```
657 | 
658 | ```r
659 | A_E_r <- cov2cor(inspect(m1_fit, "coefficients")$psi)["A", "E"]
660 | ```
661 | 
662 | 
663 | 
664 | 
665 | * This code first extracts the factor variances and covariances.
666 | * I assume that naming the element `psi` (i.e., $\psi$) is a reference to LISREL matrix notation (see this discussion from [USP 655 SEM](http://www.upa.pdx.edu/IOA/newsom/semclass/ho_lisrel%20notation.pdf)).
667 | * Once again, `cov2cor` is used to convert the covariance matrix to a correlation matrix.
668 | * An inspection of the values shows that there are some substantive correlations, which helps to explain why constraining them to zero in the orthogonal model substantially damaged fit. For example, the correlation between extraversion (`E`) and agreeableness (`A`) was quite high ($r = 0.68$).
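`cov2cor`'s scaling (dividing each covariance by the product of the two standard deviations, $r_{ij} = \sigma_{ij} / (\sigma_i \sigma_j)$) is easy to verify on a small hand-made covariance matrix:

```r
# 2 x 2 covariance matrix with variances 4 and 9, covariance 3
S <- matrix(c(4, 3,
              3, 9), nrow = 2, byrow = TRUE)

R <- cov2cor(S)
R[1, 2]  # 3 / (sqrt(4) * sqrt(9)) = 3/6 = 0.5
```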
669 | 670 | 671 | 672 | 673 | ```r 674 | # c('O', 'C', 'E', 'A', 'N') # set of factor names lhs != rhs # excludes 675 | # factor variances 676 | subset(inspect(m1_fit, "standardized"), rhs %in% c("O", "C", "E", 677 | "A", "N") & lhs != rhs) 678 | ``` 679 | 680 | ``` 681 | ## lhs op rhs est.std se z pvalue 682 | ## 1 N ~~ E 0.244 NA NA NA 683 | ## 2 N ~~ O -0.112 NA NA NA 684 | ## 3 N ~~ A 0.223 NA NA NA 685 | ## 4 N ~~ C -0.283 NA NA NA 686 | ## 5 E ~~ O -0.453 NA NA NA 687 | ## 6 E ~~ A 0.683 NA NA NA 688 | ## 7 E ~~ C -0.357 NA NA NA 689 | ## 8 O ~~ A -0.303 NA NA NA 690 | ## 9 O ~~ C 0.301 NA NA NA 691 | ## 10 A ~~ C -0.334 NA NA NA 692 | ``` 693 | 694 | 695 | 696 | 697 | * The same values can be extracted from the `standardized` coefficients table using the `inspect` method. 698 | 699 | We can also confirm that for the orthogonal model (`m3`) the correlations are zero. 700 | 701 | 702 | 703 | ```r 704 | cov2cor(inspect(m3_fit, "coefficients")$psi) 705 | ``` 706 | 707 | ``` 708 | ## N E O A C 709 | ## N 1 710 | ## E 0 1 711 | ## O 0 0 1 712 | ## A 0 0 0 1 713 | ## C 0 0 0 0 1 714 | ``` 715 | 716 | 717 | 718 | 719 | 720 | ## Constrain factor correlations to be equal 721 | ### Change constraints so that factor variances are one 722 | 723 | 724 | 725 | ```r 726 | m4_model <- ' N =~ N1 + N2 + N3 + N4 + N5 727 | E =~ E1 + E2 + E3 + E4 + E5 728 | O =~ O1 + O2 + O3 + O4 + O5 729 | A =~ A1 + A2 + A3 + A4 + A5 730 | C =~ C1 + C2 + C3 + C4 + C5 731 | ' 732 | 733 | m4_fit <- cfa(m4_model, data=Data[, item_names], std.lv=TRUE) 734 | 735 | inspect(m4_fit, 'coefficients')$psi 736 | ``` 737 | 738 | ``` 739 | ## N E O A C 740 | ## N 1.000 741 | ## E -0.244 1.000 742 | ## O -0.112 0.453 1.000 743 | ## A -0.223 0.683 0.303 1.000 744 | ## C -0.283 0.357 0.301 0.334 1.000 745 | ``` 746 | 747 | ```r 748 | inspect(m4_fit, 'coefficients')$psi 749 | ``` 750 | 751 | ``` 752 | ## N E O A C 753 | ## N 1.000 754 | ## E -0.244 1.000 755 | ## O -0.112 0.453 1.000 756 | ## A -0.223 0.683 
0.303 1.000 757 | ## C -0.283 0.357 0.301 0.334 1.000 758 | ``` 759 | 760 | 761 | 762 | 763 | * `std.lv` is an argument that when `TRUE` standardises latent variables by fixing their variance to 1.0. The default is `FALSE` which instead constrains the first factor loading to 1.0. 764 | * This makes the covariance and the correlation matrix of the factors the same. 765 | 766 | We can see the differences in the loadings by comparing the loadings for the neuroticism factor: 767 | 768 | 769 | 770 | ```r 771 | head(parameterestimates(m4_fit), 5) 772 | ``` 773 | 774 | ``` 775 | ## lhs op rhs est se z pvalue ci.lower ci.upper 776 | ## 1 N =~ N1 1.300 0.028 46.07 0 1.244 1.355 777 | ## 2 N =~ N2 1.230 0.028 44.38 0 1.176 1.285 778 | ## 3 N =~ N3 1.149 0.030 38.41 0 1.090 1.207 779 | ## 4 N =~ N4 0.899 0.031 28.75 0 0.838 0.960 780 | ## 5 N =~ N5 0.816 0.033 24.65 0 0.751 0.881 781 | ``` 782 | 783 | ```r 784 | head(parameterestimates(m1_fit), 5) 785 | ``` 786 | 787 | ``` 788 | ## lhs op rhs est se z pvalue ci.lower ci.upper 789 | ## 1 N =~ N1 1.000 0.000 NA NA 1.000 1.000 790 | ## 2 N =~ N2 0.947 0.024 39.90 0 0.900 0.993 791 | ## 3 N =~ N3 0.884 0.025 35.92 0 0.836 0.932 792 | ## 4 N =~ N4 0.692 0.025 27.75 0 0.643 0.741 793 | ## 5 N =~ N5 0.628 0.026 24.03 0 0.577 0.679 794 | ``` 795 | 796 | ```r 797 | 798 | # shows how ratio of loadings has not changed 799 | head(parameterestimates(m4_fit), 5)$est/head(parameterestimates(m4_fit), 800 | 5)$est[1] 801 | ``` 802 | 803 | ``` 804 | ## [1] 1.0000 0.9467 0.8839 0.6918 0.6278 805 | ``` 806 | 807 | 808 | 809 | 810 | 811 | 812 | ### Add equality constraints 813 | 814 | 815 | ```r 816 | m5_model <- ' N =~ N1 + N2 + N3 + N4 + N5 817 | E =~ E1 + E2 + E3 + E4 + E5 818 | O =~ O1 + O2 + O3 + O4 + O5 819 | A =~ A1 + A2 + A3 + A4 + A5 820 | C =~ C1 + C2 + C3 + C4 + C5 821 | N ~~ R*E + R*O + R*A + R*C 822 | E ~~ R*O + R*A + R*C 823 | O ~~ R*A + R*C 824 | A ~~ R*C 825 | ' 826 | 827 | Data_reversed <- Data 828 | Data_reversed[, paste0('N', 
1:5)] <- 7 - Data[, paste0('N', 1:5)]
829 | 
830 | m5_fit <- cfa(m5_model, data=Data_reversed[, item_names], std.lv=TRUE)
831 | ```
832 | 
833 | 
834 | 
835 | 
836 | * Equality constraints were added by labelling all the covariance parameters with a common label (i.e., `R`).
837 | * `~~` stands for covariance.
838 | * `R*E` attaches the label `R` to the covariance parameter involving `E`.
839 | * I reversed the neuroticism items, and hence the factor, to ensure that all the inter-factor correlations were positive.
840 | 
841 | The following output shows that the correlation/covariance is the same for all pairs of factors.
842 | 
843 | 
844 | 
845 | ```r
846 | inspect(m5_fit, "coefficients")$psi
847 | ```
848 | 
849 | ```
850 | ##       N     E     O     A     C
851 | ## N 1.000                        
852 | ## E 0.323 1.000                  
853 | ## O 0.323 0.323 1.000            
854 | ## A 0.323 0.323 0.323 1.000      
855 | ## C 0.323 0.323 0.323 0.323 1.000
856 | ```
857 | 
858 | 
859 | 
860 | 
861 | The following analysis compares the fit of the unconstrained model with the equal-covariance model.
862 | 
863 | 
864 | 
865 | ```r
866 | round(cbind(m1 = inspect(m1_fit, "fit.measures"), m5 = inspect(m5_fit, 
867 |     "fit.measures")), 3)
868 | ```
869 | 
870 | ```
871 | ##                           m1         m5
872 | ## chisq               4165.467  4.576e+03
873 | ## df                   265.000  2.740e+02
874 | ## pvalue                 0.000  0.000e+00
875 | ## baseline.chisq     18222.116  1.822e+04
876 | ## baseline.df          300.000  3.000e+02
877 | ## baseline.pvalue        0.000  0.000e+00
878 | ## cfi                    0.782  7.600e-01
879 | ## tli                    0.754  7.370e-01
880 | ## logl              -99840.238 -1.000e+05
881 | ## unrestricted.logl -97757.504 -9.776e+04
882 | ## npar                  60.000  5.100e+01
883 | ## aic               199800.476  2.002e+05
884 | ## bic               200148.363  2.005e+05
885 | ## ntotal              2436.000  2.436e+03
886 | ## bic2              199957.729  2.003e+05
887 | ## rmsea                  0.078  8.000e-02
888 | ## rmsea.ci.lower         0.076  7.800e-02
889 | ## rmsea.ci.upper         0.080  8.200e-02
890 | ## rmsea.pvalue           0.000  0.000e+00
891 | ## srmr                   0.075  8.900e-02
892 | ```
893 | 
894 | ```r
895 | anova(m1_fit, m5_fit)
896 | ```
897 | 
898 | ```
899 | ## Chi Square Difference Test
900 | ## 
901 | ##         Df   AIC   BIC Chisq Chisq diff Df diff Pr(>Chisq)    
902 | ## m1_fit 265 2e+05 2e+05  4165                                  
903 | ## m5_fit 274 2e+05 2e+05  4576        411       9     <2e-16 ***
904 | ## ---
905 | ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
906 | ```
907 | 
908 | 
909 | 
910 | 
911 | * The unconstrained model provides a better fit, both in terms of the chi-square difference test and when comparing various parsimony-adjusted fit indices such as RMSEA.
912 | * The difference is relatively small.
913 | 
914 | The following summarises the correlations between the factors (absolute values from `m4`, which is equivalent to reversing Neuroticism).
915 | 
916 | 
917 | 
918 | ```r
919 | rs <- abs(inspect(m4_fit, "coefficients")$psi)
920 | summary(rs[lower.tri(rs)])
921 | ```
922 | 
923 | ```
924 | ##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
925 | ##   0.112   0.254   0.302   0.329   0.352   0.683 
926 | ```
927 | 
928 | ```r
929 | hist(rs[lower.tri(rs)])
930 | ```
931 | 
932 | ![plot of chunk unnamed-chunk-19](figure/unnamed-chunk-19.png) 
933 | 
934 | ```r
935 | 
936 | round(rs, 2)
937 | ```
938 | 
939 | ```
940 | ##      N    E    O    A    C
941 | ## N 1.00                    
942 | ## E 0.24 1.00               
943 | ## O 0.11 0.45 1.00          
944 | ## A 0.22 0.68 0.30 1.00     
945 | ## C 0.28 0.36 0.30 0.33 1.00
946 | ```
947 | 
948 | 
949 | 
950 | 
951 | * Given the very large sample size, even small variations in sample correlations likely reflect true variation.
952 | * In particular, the correlation between E and A is much larger than the average correlation, and the correlation between O and N is much smaller than the average correlation.
953 | 
954 | ### Add equality constraints with some post hoc modifications
955 | 
956 | 
957 | ```r
958 | m6_model <- ' N =~ N1 + N2 + N3 + N4 + N5
959 |               E =~ E1 + E2 + E3 + E4 + E5
960 |               O =~ O1 + O2 + O3 + O4 + O5
961 |               A =~ A1 + A2 + A3 + A4 + A5
962 |               C =~ C1 + C2 + C3 + C4 + C5
963 |               N ~~ R*E + R*A + R*C
964 |               E ~~ R*O + R*C
965 |               O ~~ R*A + R*C
966 |               A ~~ R*C
967 | '
968 | 
969 | Data_reversed <- Data
970 | Data_reversed[, paste0('N', 1:5)] <- 7 - Data[, paste0('N', 1:5)]
971 | 
972 | m6_fit <- cfa(m6_model, data=Data_reversed[, item_names], std.lv=TRUE)
973 | ```
974 | 
975 | 
976 | 
977 | 
978 | The above model frees up the correlation between E and A, and between O and N.
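An equivalent way to write this model (a sketch, not from the original code) is to keep the two freed covariances in the syntax but give each its own label. In lavaan, a label that appears on only one parameter imposes no constraint, so `NO` and `EA` (arbitrary, hypothetical label names) remain free parameters while the `R`-labelled covariances stay constrained to equality:

```r
# Same model as m6, with the freed covariances labelled explicitly
# (NO and EA are arbitrary label names chosen for illustration)
m6b_model <- ' N =~ N1 + N2 + N3 + N4 + N5
               E =~ E1 + E2 + E3 + E4 + E5
               O =~ O1 + O2 + O3 + O4 + O5
               A =~ A1 + A2 + A3 + A4 + A5
               C =~ C1 + C2 + C3 + C4 + C5
               N ~~ R*E + NO*O + R*A + R*C
               E ~~ R*O + EA*A + R*C
               O ~~ R*A + R*C
               A ~~ R*C
'
```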
979 | 980 | 981 | 982 | ```r 983 | round(cbind(m1 = inspect(m1_fit, "fit.measures"), m5 = inspect(m5_fit, 984 | "fit.measures"), m6 = inspect(m6_fit, "fit.measures")), 3) 985 | ``` 986 | 987 | ``` 988 | ## m1 m5 m6 989 | ## chisq 4165.467 4576.170 4223.250 990 | ## df 265.000 274.000 272.000 991 | ## pvalue 0.000 0.000 0.000 992 | ## baseline.chisq 18222.116 18222.116 18222.116 993 | ## baseline.df 300.000 300.000 300.000 994 | ## baseline.pvalue 0.000 0.000 0.000 995 | ## cfi 0.782 0.760 0.780 996 | ## tli 0.754 0.737 0.757 997 | ## logl -99840.238 -100045.589 -99869.130 998 | ## unrestricted.logl -97757.504 -97757.504 -97757.504 999 | ## npar 60.000 51.000 53.000 1000 | ## aic 199800.476 200193.178 199844.259 1001 | ## bic 200148.363 200488.882 200151.559 1002 | ## ntotal 2436.000 2436.000 2436.000 1003 | ## bic2 199957.729 200326.843 199983.166 1004 | ## rmsea 0.078 0.080 0.077 1005 | ## rmsea.ci.lower 0.076 0.078 0.075 1006 | ## rmsea.ci.upper 0.080 0.082 0.079 1007 | ## rmsea.pvalue 0.000 0.000 0.000 1008 | ## srmr 0.075 0.089 0.077 1009 | ``` 1010 | 1011 | ```r 1012 | anova(m1_fit, m6_fit) 1013 | ``` 1014 | 1015 | ``` 1016 | ## Chi Square Difference Test 1017 | ## 1018 | ## Df AIC BIC Chisq Chisq diff Df diff Pr(>Chisq) 1019 | ## m1_fit 265 2e+05 2e+05 4165 1020 | ## m6_fit 272 2e+05 2e+05 4223 57.8 7 4.2e-10 *** 1021 | ## --- 1022 | ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 1023 | ``` 1024 | 1025 | ```r 1026 | anova(m5_fit, m6_fit) 1027 | ``` 1028 | 1029 | ``` 1030 | ## Chi Square Difference Test 1031 | ## 1032 | ## Df AIC BIC Chisq Chisq diff Df diff Pr(>Chisq) 1033 | ## m6_fit 272 2e+05 2e+05 4223 1034 | ## m5_fit 274 2e+05 2e+05 4576 353 2 <2e-16 *** 1035 | ## --- 1036 | ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 1037 | ``` 1038 | 1039 | 1040 | 1041 | 1042 | * Freeing up these two correlations improved the model relative to the equality model.
By most fit statistics, this model still provided a worse fit than the unconstrained model. However, interestingly, the RMSEA was slightly lower (i.e., better). 1043 | 1044 | ### Add equality constraints without reversal 1045 | Section 5.5 of the [Lavaan introductory guide 0.4-13](http://users.ugent.be/~yrosseel/lavaan/lavaanIntroduction.pdf) discusses various types of equality constraints. Thus, instead of reversing the neuroticism items, it is possible to directly constrain the covariances of neuroticism with each other factor to equal the negative of the common covariance among the remaining factors. 1046 | 1047 | 1048 | 1049 | ```r 1050 | m7_model <- ' N =~ N1 + N2 + N3 + N4 + N5 1051 | E =~ E1 + E2 + E3 + E4 + E5 1052 | O =~ O1 + O2 + O3 + O4 + O5 1053 | A =~ A1 + A2 + A3 + A4 + A5 1054 | C =~ C1 + C2 + C3 + C4 + C5 1055 | # covariances 1056 | N ~~ R1*E + R1*O + R1*A + R1*C 1057 | E ~~ R2*O + R2*A + R2*C 1058 | O ~~ R2*A + R2*C 1059 | A ~~ R2*C 1060 | 1061 | # constraints 1062 | R1 == 0 - R2 1063 | ' 1064 | 1065 | m7_fit <- cfa(m7_model, data=Data[, item_names], std.lv=TRUE) 1066 | ``` 1067 | 1068 | 1069 | 1070 | 1071 | Let's check that the results are the same whether we reverse the data or set negative constraints.
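The two approaches should be equivalent because reverse-scoring an item (i.e., `7 - x` for a 6-point scale) flips the sign of its covariances with other variables while leaving its variance unchanged. This can be checked directly in base R with made-up item scores (the data here are simulated purely for illustration):

```r
set.seed(123)
x <- sample(1:6, 50, replace = TRUE)  # a 6-point neuroticism-style item
y <- sample(1:6, 50, replace = TRUE)  # an item from another factor
x_rev <- 7 - x                        # reverse-scored version of x

isTRUE(all.equal(cov(x_rev, y), -cov(x, y)))  # TRUE: only the sign flips
isTRUE(all.equal(var(x_rev), var(x)))         # TRUE: variance is unchanged
```

So fitting the model to reversed data or imposing the negative equality constraint should yield the same fitted model.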
1072 | 1073 | 1074 | 1075 | 1076 | 1077 | ```r 1078 | m5_fit 1079 | ``` 1080 | 1081 | ``` 1082 | ## lavaan (0.4-14) converged normally after 43 iterations 1083 | ## 1084 | ## Number of observations 2436 1085 | ## 1086 | ## Estimator ML 1087 | ## Minimum Function Chi-square 4576.170 1088 | ## Degrees of freedom 274 1089 | ## P-value 0.000 1090 | ## 1091 | ``` 1092 | 1093 | ```r 1094 | m7_fit 1095 | ``` 1096 | 1097 | ``` 1098 | ## lavaan (0.4-14) converged normally after 283 iterations 1099 | ## 1100 | ## Number of observations 2436 1101 | ## 1102 | ## Estimator ML 1103 | ## Minimum Function Chi-square 4576.170 1104 | ## Degrees of freedom 274 1105 | ## P-value 0.000 1106 | ## 1107 | ``` 1108 | 1109 | 1110 | 1111 | -------------------------------------------------------------------------------- /cfa-example/cfa-example.rmd: -------------------------------------------------------------------------------- 1 | # CFA Example 2 | 3 | ```{r get_data, message=FALSE} 4 | library(psych) 5 | library(lavaan) 6 | Data <- bfi 7 | item_names <- names(Data)[1:25] 8 | ``` 9 | 10 | ## Check data 11 | 12 | ```{r } 13 | sapply(Data[,item_names], function(X) sum(is.na(X))) 14 | 15 | Data$item_na <- apply(Data[,item_names], 1, function(X) sum(is.na(X)) > 0) 16 | 17 | table(Data$item_na) 18 | Data <- Data[!Data$item_na, ] 19 | ``` 20 | 21 | * I decided to remove cases with missing data to simplify subsequent exploration of the features of the lavaan software. 22 | 23 | 24 | ## Basic CFA 25 | ```{r, tidy=FALSE} 26 | m1_model <- ' N =~ N1 + N2 + N3 + N4 + N5 27 | E =~ E1 + E2 + E3 + E4 + E5 28 | O =~ O1 + O2 + O3 + O4 + O5 29 | A =~ A1 + A2 + A3 + A4 + A5 30 | C =~ C1 + C2 + C3 + C4 + C5 31 | ' 32 | 33 | m1_fit <- cfa(m1_model, data=Data[, item_names]) 34 | summary(m1_fit, standardized=TRUE) 35 | ``` 36 | 37 | * **`Std.lv`**: Only latent variables have been standardized. 38 | * **`Std.all`**: Observed and latent variables have been standardized.
39 | * **Factor loadings**: Under the `latent variables` section, the `Std.all` column provides standardised factor loadings. 40 | * **Factor correlations**: Under the `Covariances` section, the `Std.all` column provides standardised factor correlations. 41 | * **`Variances`**: Latent factor variances can be constrained for identifiability purposes to be 1, but in this case, one of the loadings was constrained to be one. Variances for items represent the variance not explained by the latent factor. 42 | 43 | 44 | 45 | ```{r demonstrate_variance_point} 46 | variances <- c(unique=subset(inspect(m1_fit, "standardizedsolution"), 47 | lhs == 'N1' & rhs == 'N1')[, 'est.std'], 48 | common=subset(inspect(m1_fit, "standardizedsolution"), 49 | lhs == 'N' & rhs == 'N1')[, 'est.std']^2) 50 | (variances <- c(variances, total=sum(variances))) 51 | ``` 52 | 53 | * The output above illustrates the point about variances. Variance for each item is explained by either the common factor or by error variance. As there is just one latent factor loading on the item, the squared standardised coefficient is the variance explained by the common factor. The sum of the unique and common standardised variances is one, which naturally corresponds to the variance of a standardised variable. 54 | * The code also demonstrates ideas about how to extract specific information from the lavaan model fit object. Specifically, the `inspect` method provides access to a wide range of specific information. See help for further details. 55 | * I used the `subset` method to provide an easy one-liner for extracting elements from the data frame returned by the `inspect` method.
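The same `subset` one-liner works on any data frame, not just lavaan output. Here is a minimal standalone illustration using a toy parameter table (the column names mimic the `inspect` output above, but the values are made up):

```{r}
# toy stand-in for the data frame returned by inspect(fit, "standardizedsolution")
toy_params <- data.frame(lhs = c("N", "N", "N1"),
                         op  = c("=~", "=~", "~~"),
                         rhs = c("N1", "N2", "N1"),
                         est.std = c(0.80, 0.75, 0.36))

# pull out a single estimate in one line
subset(toy_params, lhs == "N" & rhs == "N1")[, "est.std"]  # 0.8
```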
56 | 57 | ```{r} 58 | variances <- c(N1_N1=subset(parameterestimates(m1_fit), 59 | lhs == 'N1' & rhs == 'N1')[, 'est'], 60 | N_N=subset(parameterestimates(m1_fit), 61 | lhs == 'N' & rhs == 'N')[, 'est'], 62 | N_N1=subset(parameterestimates(m1_fit), 63 | lhs == 'N' & rhs == 'N1')[, 'est']) 64 | 65 | cbind(parameters = c(variances, 66 | total=variances['N_N1'] * variances['N_N'] + variances['N1_N1'], 67 | raw_divide_by_n_minus_1=var(Data[,'N1']), 68 | raw_divide_by_n=mean((Data[,'N1'] - mean(Data[,'N1']))^2))) 69 | ``` 70 | 71 | * The output above shows the unstandardised parameters related to the item `N1`. 72 | * `N1_N1` corresponds to the unstandardised unique variance for the item. 73 | * `N_N` times the square of `N_N1` represents the unstandardised common variance (here the loading `N_N1` is fixed to 1, so the product `N_N1 * N_N` gives the same value). 74 | * Thus, the sum of the unique and common variance represents the total variance. 75 | * When I calculated this on the raw data using the standard $n-1$ denominator, the value was slightly larger, but when I used $n$ as the denominator, the estimate was very close. 76 | 77 | 78 | 79 | ## Compare with a single factor model 80 | ```{r, tidy=FALSE} 81 | m2_model <- ' G =~ N1 + N2 + N3 + N4 + N5 82 | + E1 + E2 + E3 + E4 + E5 83 | + O1 + O2 + O3 + O4 + O5 84 | + A1 + A2 + A3 + A4 + A5 85 | + C1 + C2 + C3 + C4 + C5 86 | ' 87 | 88 | m2_fit <- cfa(m2_model, data=Data[, item_names]) 89 | summary(m2_fit, standardized=TRUE) 90 | ``` 91 | 92 | ```{r} 93 | round(cbind(m1=inspect(m1_fit, 'fit.measures'), 94 | m2=inspect(m2_fit, 'fit.measures')), 3) 95 | anova(m1_fit, m2_fit) 96 | ``` 97 | 98 | * The output compares the model fit statistics for the two models. 99 | * It also performs a chi-square difference test which shows that a one-factor model has significantly worse fit than the five-factor model.
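The $n$ versus $n-1$ point above can be verified directly in base R: maximum likelihood estimation uses the $n$ denominator, which equals the usual `var` result scaled by $(n-1)/n$. A self-contained check with made-up scores:

```{r}
x <- c(2, 4, 4, 4, 5, 5, 7, 9)   # made-up scores
n <- length(x)
ml_var <- mean((x - mean(x))^2)  # n denominator, as used by ML estimation
unbiased_var <- var(x)           # n - 1 denominator, base R default
isTRUE(all.equal(ml_var, unbiased_var * (n - 1) / n))  # TRUE
```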
100 | 101 | 102 | ## Modification indices 103 | ```{r} 104 | m1_mod <- modificationindices(m1_fit) 105 | m1_mod_summary <- subset(m1_mod, mi > 100) 106 | m1_mod_summary[order(m1_mod_summary$mi, decreasing=TRUE), ] 107 | ``` 108 | 109 | * `modificationindices` suggests several ad hoc modifications that could be made to improve the fit of the model. 110 | * The largest index suggests that items `N1` and `N2` share common variance. If we look at the help file on the bfi dataset `?bfi`, we see that the text for `N1` ("Get angry easily") and `N2` ("Get irritated easily") are very similar. 111 | 112 | ```{r} 113 | (N_cors <- round(cor(Data[, paste0('N', 1:5)]), 2)) 114 | N1_N2_corr <- N_cors['N1', 'N2'] 115 | other_N_corrs <- round(mean(abs(N_cors[lower.tri(N_cors)][-1])), 2) 116 | 117 | ``` 118 | 119 | * The correlation matrix also shows that the correlation between N1 and N2 ($r = `r I(N1_N2_corr)`$) is much larger than it is for the other variables ($\text{mean}(|r|) = `r I(other_N_corrs)`$). 120 | 121 | ## Various matrices 122 | ### Observed, fitted, and residual covariance matrices 123 | The following analysis extracts observed, fitted, and residual covariances and checks that they are consistent with expectations. I only perform this for five items rather than the full 25-item set to make the demonstration clearer. 124 | 125 | ```{r} 126 | N_names <- paste0('N', 1:5) 127 | N_matrices <- list( 128 | observed=inspect(m1_fit, 'sampstat')$cov[N_names, N_names], 129 | fitted=fitted(m1_fit)$cov[N_names, N_names], 130 | residual=resid(m1_fit)$cov[N_names, N_names]) 131 | 132 | N_matrices$check <- N_matrices$observed - (N_matrices$fitted + N_matrices$residual) 133 | lapply(N_matrices, function(X) round(X, 3)) 134 | ``` 135 | 136 | * The observed covariance matrix was extracted using the `sampstat` option of the `inspect` method.
137 | * The fitted covariance matrix can be extracted using the `fitted` method on the model fit object and then extracting the `cov` element. 138 | * Many symmetric matrices in lavaan are of class `lavaan.matrix.symmetric`. This hides the upper triangle of the matrix and formats the entries to `nd` decimal places. 139 | Run `getAnywhere(print.lavaan.matrix.symmetric)` to see more details. 140 | * The `sampstat` option in the `inspect` method can be used to extract the sample covariance matrix. This is similar, but not exactly the same as running `cov` on the sample data. 141 | * The `resid` method can be used to extract the residual covariance matrix. 142 | * I then create a `check` that `observed = fitted + residual`, which it does. 143 | 144 | ### Observed, fitted, and residual correlation matrices 145 | I often find it more meaningful to examine observed, fitted, and residual correlation matrices. Standardisation often makes it easier to understand the real magnitude of any residual. 146 | 147 | ```{r} 148 | N_names <- paste0('N', 1:5) 149 | N_cov <- list( 150 | observed=inspect(m1_fit, 'sampstat')$cov[N_names, N_names], 151 | fitted=fitted(m1_fit)$cov[N_names, N_names]) 152 | 153 | N_cor <- list( 154 | observed = cov2cor(N_cov$observed), 155 | fitted = cov2cor(N_cov$fitted) ) 156 | 157 | N_cor$residual <- N_cor$observed - N_cor$fitted 158 | 159 | lapply(N_cor, function(X) round(X, 2)) 160 | ``` 161 | 162 | * `cov2cor` is a `base` R function that scales a covariance matrix into a correlation matrix. 163 | * Fitted and observed correlation matrices can be obtained by running `cov2cor` on the corresponding covariance matrices. 164 | * The residual correlation matrix can be obtained by subtracting the fitted correlation matrix from the observed correlation matrix. 165 | * In this case we can see that certain pairs of items correlate more or less than other pairs. In particular, `N1-N2`, `N3-N4`, and `N4-N5` have positive correlation residuals.
An examination of the items below may suggest some added degree of similarity between these pairs of items. For example, N1 and N2 both concern anger and irritation, whereas N3 and N4 both concern mood and affect. 166 | 167 | 168 | > N1: Get angry easily. (q_952) 169 | > N2: Get irritated easily. (q_974) 170 | > N3: Have frequent mood swings. (q_1099) 171 | > N4: Often feel blue. (q_1479) 172 | > N5: Panic easily. (q_1505) 173 | 174 | ## Uncorrelated factors 175 | ### All Uncorrelated factors 176 | The following examines a model with uncorrelated factors. 177 | 178 | ```{r tidy=FALSE} 179 | m3_model <- ' N =~ N1 + N2 + N3 + N4 + N5 180 | E =~ E1 + E2 + E3 + E4 + E5 181 | O =~ O1 + O2 + O3 + O4 + O5 182 | A =~ A1 + A2 + A3 + A4 + A5 183 | C =~ C1 + C2 + C3 + C4 + C5 184 | ' 185 | 186 | m3_fit <- cfa(m3_model, data=Data[, item_names], orthogonal=TRUE) 187 | 188 | round(cbind(m1=inspect(m1_fit, 'fit.measures'), 189 | m3=inspect(m3_fit, 'fit.measures')), 3) 190 | anova(m1_fit, m3_fit) 191 | 192 | rmsea_m1 <- round(inspect(m1_fit, 'fit.measures')['rmsea'], 3) 193 | rmsea_m3 <- round(inspect(m3_fit, 'fit.measures')['rmsea'], 3) 194 | ``` 195 | 196 | * To convert a `cfa` model from one that permits factors to be correlated to one that constrains factors to be uncorrelated, just specify `orthogonal=TRUE`. 197 | * In this case, constraining the factor covariances to all be zero led to a significant reduction in fit. This poorer fit can also be seen in measures like RMSEA (m1 = 198 | `r rmsea_m1`; m3 = `r rmsea_m3`). 199 | 200 | 201 | ### Correlations and covariances between factors 202 | It is useful to be able to extract correlations and covariances between factors. 203 | 204 | ```{r} 205 | inspect(m1_fit, 'coefficients')$psi 206 | cov2cor(inspect(m1_fit, 'coefficients')$psi) 207 | A_E_r <- cov2cor(inspect(m1_fit, 'coefficients')$psi)['A', 'E'] 208 | ``` 209 | 210 | * This code first extracts the factor variances and covariances.
211 | * I assume that naming the element `psi` (i.e., $\psi$) is a reference to LISREL Matrix notation (see this discussion from [USP 655 SEM](http://www.upa.pdx.edu/IOA/newsom/semclass/ho_lisrel%20notation.pdf)). 212 | * Once again `cov2cor` is used to convert the covariance matrix to a correlation matrix. 213 | * An inspection of the values shows that there are some substantial correlations, which helps to explain why constraining them to zero in an orthogonal model would have substantially damaged fit. For example, the correlation between extraversion (`E`) and agreeableness (`A`) was quite high ($r = `r I(round(A_E_r, 2))`$). 214 | 215 | 216 | ```{r} 217 | # c('O', 'C', 'E', 'A', 'N') # set of factor names 218 | # lhs != rhs # excludes factor variances 219 | subset(inspect(m1_fit, 'standardized'), 220 | rhs %in% c('O', 'C', 'E', 'A', 'N') & lhs != rhs) 221 | ``` 222 | 223 | * The same values can be extracted from the `standardized` coefficients table using the `inspect` method. 224 | 225 | We can also confirm that for the orthogonal model (`m3`) the correlations are zero. 226 | 227 | ```{r} 228 | cov2cor(inspect(m3_fit, 'coefficients')$psi) 229 | ``` 230 | 231 | 232 | ## Constrain factor correlations to be equal 233 | ### Change constraints so that factor variances are one 234 | 235 | ```{r tidy=FALSE} 236 | m4_model <- ' N =~ N1 + N2 + N3 + N4 + N5 237 | E =~ E1 + E2 + E3 + E4 + E5 238 | O =~ O1 + O2 + O3 + O4 + O5 239 | A =~ A1 + A2 + A3 + A4 + A5 240 | C =~ C1 + C2 + C3 + C4 + C5 241 | ' 242 | 243 | m4_fit <- cfa(m4_model, data=Data[, item_names], std.lv=TRUE) 244 | 245 | inspect(m4_fit, 'coefficients')$psi 246 | cov2cor(inspect(m4_fit, 'coefficients')$psi) 247 | ``` 248 | 249 | * `std.lv` is an argument that when `TRUE` standardises latent variables by fixing their variance to 1.0. The default is `FALSE` which instead constrains the first factor loading to 1.0. 250 | * This makes the covariance and the correlation matrix of the factors the same.
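The point that unit variances make the covariance and correlation matrices coincide can be illustrated with `cov2cor` on a toy symmetric matrix (made-up values, not model output):

```{r}
psi <- matrix(c(1.0, 0.3,
                0.3, 1.0), nrow = 2)
isTRUE(all.equal(cov2cor(psi), psi))  # TRUE: with unit variances, cov == cor

psi2 <- matrix(c(4.0, 0.6,
                 0.6, 1.0), nrow = 2)
cov2cor(psi2)[1, 2]  # 0.3, i.e. 0.6 / (sqrt(4) * sqrt(1))
```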
251 | 252 | We can see the differences in the loadings by comparing the loadings for the neuroticism factor: 253 | 254 | ```{r} 255 | head(parameterestimates(m4_fit), 5) 256 | head(parameterestimates(m1_fit), 5) 257 | 258 | # shows that the ratio of each loading to the first loading is unchanged 259 | # (compare with the m1_fit estimates above, where the first loading is fixed to 1) 259 | head(parameterestimates(m4_fit), 5)$est / head(parameterestimates(m4_fit), 5)$est[1] 260 | ``` 261 | 262 | 263 | 264 | ### Add equality constraints 265 | ```{r tidy=FALSE} 266 | m5_model <- ' N =~ N1 + N2 + N3 + N4 + N5 267 | E =~ E1 + E2 + E3 + E4 + E5 268 | O =~ O1 + O2 + O3 + O4 + O5 269 | A =~ A1 + A2 + A3 + A4 + A5 270 | C =~ C1 + C2 + C3 + C4 + C5 271 | N ~~ R*E + R*O + R*A + R*C 272 | E ~~ R*O + R*A + R*C 273 | O ~~ R*A + R*C 274 | A ~~ R*C 275 | ' 276 | 277 | Data_reversed <- Data 278 | Data_reversed[, paste0('N', 1:5)] <- 7 - Data[, paste0('N', 1:5)] 279 | 280 | m5_fit <- cfa(m5_model, data=Data_reversed[, item_names], std.lv=TRUE) 281 | ``` 282 | 283 | * Equality constraints were added by labelling all the covariance parameters with a common label (i.e., `R`). 284 | * `~~` stands for covariance. 285 | * `R*E` labels the covariance parameter involving the `E` variable with the label `R`. 286 | * I reversed the neuroticism items and hence the factor to ensure that all the inter-factor correlations were positive. 287 | 288 | The following output shows that the correlation/covariance is the same for all pairs of factors. 289 | 290 | ```{r} 291 | inspect(m5_fit, 'coefficients')$psi 292 | ``` 293 | 294 | The following analysis compares the fit of the unconstrained with the equal-covariance model. 295 | 296 | ```{r} 297 | round(cbind(m1=inspect(m1_fit, 'fit.measures'), 298 | m5=inspect(m5_fit, 'fit.measures')), 3) 299 | anova(m1_fit, m5_fit) 300 | ``` 301 | 302 | * The unconstrained model provides a better fit both in terms of the chi-square difference test and when comparing various parsimony-adjusted fit indices such as RMSEA. 303 | * The difference is relatively small.
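For reference, the chi-square difference test that `anova` performs can be reproduced by hand with `pchisq`. Using the statistics from the rendered output for these two models (m1: chi-square = 4165.467 on 265 df; m5: chi-square = 4576.170 on 274 df):

```{r}
chisq_m1 <- 4165.467; df_m1 <- 265  # unconstrained model (values from the output)
chisq_m5 <- 4576.170; df_m5 <- 274  # equal-covariance model
diff_chisq <- chisq_m5 - chisq_m1   # 410.703
diff_df <- df_m5 - df_m1            # 9
pchisq(diff_chisq, diff_df, lower.tail = FALSE)  # < 2e-16, matching anova()
```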
304 | 305 | The following summarises the correlations between variables (correlations with Neuroticism reversed). 306 | 307 | ```{r } 308 | rs <- abs(inspect(m4_fit, 'coefficients')$psi) 309 | summary(rs[lower.tri(rs)]) 310 | hist(rs[lower.tri(rs)]) 311 | 312 | round(rs, 2) 313 | ``` 314 | 315 | * Given the very large sample size, even small variations in sample correlations likely reflect true variation. 316 | * In particular, the correlation between E and A is much larger than the average correlation, and the correlation between O and N is much smaller than the average correlation. 317 | 318 | ### Add equality constraints with some post hoc modifications 319 | ```{r tidy=FALSE} 320 | m6_model <- ' N =~ N1 + N2 + N3 + N4 + N5 321 | E =~ E1 + E2 + E3 + E4 + E5 322 | O =~ O1 + O2 + O3 + O4 + O5 323 | A =~ A1 + A2 + A3 + A4 + A5 324 | C =~ C1 + C2 + C3 + C4 + C5 325 | N ~~ R*E + R*A + R*C 326 | E ~~ R*O + R*C 327 | O ~~ R*A + R*C 328 | A ~~ R*C 329 | ' 330 | 331 | Data_reversed <- Data 332 | Data_reversed[, paste0('N', 1:5)] <- 7 - Data[, paste0('N', 1:5)] 333 | 334 | m6_fit <- cfa(m6_model, data=Data_reversed[, item_names], std.lv=TRUE) 335 | ``` 336 | 337 | The above model frees up the correlation between E and A, and between O and N. 338 | 339 | ```{r} 340 | round(cbind(m1=inspect(m1_fit, 'fit.measures'), 341 | m5=inspect(m5_fit, 'fit.measures'), 342 | m6=inspect(m6_fit, 'fit.measures')), 3) 343 | anova(m1_fit, m6_fit) 344 | anova(m5_fit, m6_fit) 345 | ``` 346 | 347 | * Freeing up these two correlations improved the model relative to the equality model. By most fit statistics, this model still provided a worse fit than the unconstrained model. However, interestingly, the RMSEA was slightly lower (i.e., better). 348 | 349 | ### Add equality constraints without reversal 350 | Section 5.5 of the [Lavaan introductory guide 0.4-13](http://users.ugent.be/~yrosseel/lavaan/lavaanIntroduction.pdf) discusses various types of equality constraints.
Thus, instead of reversing the neuroticism items, it is possible to directly constrain the covariances of neuroticism with each other factor to equal the negative of the common covariance among the remaining factors. 351 | 352 | ```{r tidy=FALSE} 353 | m7_model <- ' N =~ N1 + N2 + N3 + N4 + N5 354 | E =~ E1 + E2 + E3 + E4 + E5 355 | O =~ O1 + O2 + O3 + O4 + O5 356 | A =~ A1 + A2 + A3 + A4 + A5 357 | C =~ C1 + C2 + C3 + C4 + C5 358 | # covariances 359 | N ~~ R1*E + R1*O + R1*A + R1*C 360 | E ~~ R2*O + R2*A + R2*C 361 | O ~~ R2*A + R2*C 362 | A ~~ R2*C 363 | 364 | # constraints 365 | R1 == 0 - R2 366 | ' 367 | 368 | m7_fit <- cfa(m7_model, data=Data[, item_names], std.lv=TRUE) 369 | ``` 370 | 371 | Let's check that the results are the same whether we reverse the data or set negative constraints. 372 | 373 | 374 | ```{r} 375 | m5_fit 376 | m7_fit 377 | 378 | inspect(m5_fit, 'coefficients')$psi 379 | inspect(m7_fit, 'coefficients')$psi 380 | ``` -------------------------------------------------------------------------------- /cfa-example/figure/unnamed-chunk-19.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jeromyanglim/lavaan-examples/d7f5cbdc7fe14ffd039512bae6aa140c2a0ca5e6/cfa-example/figure/unnamed-chunk-19.png

Lavaan Cheat Sheet

147 | 148 |

Assumptions

149 | 150 | 156 | 157 |

Documentation tips

158 | 159 | 164 | 165 |

Model fitting

166 | 167 | 168 | 169 | 170 | 171 | 172 | 173 | 174 | 175 | 176 | 177 | 178 | 179 | 180 | 181 | 182 | 183 | 184 | 185 | 186 | 187 | 188 | 189 |
NameCommand
fit CFA to datacfa(model, data=Data)
fit SEM to datasem(model, data=Data)
standardised solutionsem(model, data=Data, std.ov=TRUE)
orthogonal factorscfa(model, data=Data, orthogonal=TRUE)
190 | 191 |

Matrices

192 | 193 | 194 | 195 | 196 | 197 | 198 | 199 | 200 | 201 | 202 | 203 | 204 | 205 | 206 | 207 | 208 | 209 | 210 | 211 | 212 | 213 | 214 | 215 | 216 | 217 | 218 | 219 |
NameCommand
Factor covariance matrixinspect(fit, "coefficients")$psi
Fitted covariance matrixfitted(fit)$cov
Observed covariance matrixinspect(fit, 'sampstat')$cov
Residual covariance matrixresid(fit)$cov
Factor correlation matrixcov2cor(inspect(fit, "coefficients")$psi) or use covariance command with standardised solution e.g., cfa(..., std.ov=TRUE)
220 | 221 |

Fit Measures

222 | 223 | 224 | 225 | 226 | 227 | 228 | 229 | 230 | 231 | 232 | 233 | 234 | 235 | 236 | 237 |
NameCommand
Fit measures:fitMeasures(fit)
Specific fit measures e.g.:fitMeasures(fit)[c('chisq', 'df', 'pvalue', 'cfi', 'rmsea', 'srmr')]
238 | 239 |

Parameters

240 | 241 | 242 | 243 | 244 | 245 | 246 | 247 | 248 | 249 | 250 | 251 | 252 | 253 | 254 | 255 |
NameCommand
Parameter informationparTable(fit)
Standardised estimatesstandardizedSolution(fit) or summary(fit, standardized=TRUE)
256 | 257 |

R-squared | inspect(fit, 'r2')

258 | 259 |

Compare models

260 | 261 | 262 | 263 | 264 | 265 | 266 | 267 | 268 | 269 | 270 | 271 | 272 | 273 | 274 | 275 |
NameCommand
Compare fit measurescbind(m1=inspect(m1_fit, 'fit.measures'), m2=inspect(m2_fit, 'fit.measures'))
Chi-square difference testanova(m1_fit, m2_fit)
276 | 277 |

Model improvement

278 | 279 | 280 | 281 | 282 | 283 | 284 | 285 | 286 | 287 | 288 | 289 | 290 | 291 | 292 | 293 | 294 | 295 | 296 | 297 |
NameCommand
Modification indicesmod_ind <- modificationindices(fit)
10 greatesthead(mod_ind[order(mod_ind$mi, decreasing=TRUE), ], 10)
mi > 5subset(mod_ind[order(mod_ind$mi, decreasing=TRUE), ], mi > 5)
298 | 299 | 300 | 301 | 302 | 303 | -------------------------------------------------------------------------------- /cheat-sheet-lavaan/cheat-sheet-lavaan.md: -------------------------------------------------------------------------------- 1 | # Lavaan Cheat Sheet 2 | ## Assumptions 3 | * `Data` is data frame 4 | * `model` is the lavaan model syntax character variable 5 | * `fit` is an object of class `lavaan` typically returned from functions `cfa`, `sem`, `growth`, and `lavaan` 6 | * `m1_fit` and `m2_fit` are used for showing model comparison of `lavaan` objects. 7 | 8 | 9 | ## Documentation tips 10 | * Introduction: http://users.ugent.be/~yrosseel/lavaan/lavaanIntroduction.pdf 11 | * Basic model commmands: `?cfa ?sem ?lavaan`: 12 | * Extracting elements: `?inspect` 13 | 14 | ## Model fitting 15 | | Name | Command | 16 | -------| ------------- 17 | fit CFA to data | `cfa(model, data=Data)` 18 | fit SEM to data | `sem(model, data=Data)` 19 | standardised solution | `sem(model, data=Data, std.ov=TRUE)` 20 | orthogonal factors | `cfa(model, data=Data, orthogonal=TRUE)` 21 | 22 | 23 | ## Matrices 24 | | Name | Command | 25 | -------| ------------- 26 | Factor covariance matrix | `inspect(fit, "coefficients")$psi` 27 | Fitted covariance matrix | `fitted(fit)$cov` 28 | Observed covariance matrix | `inspect(fit, 'sampstat')$cov` 29 | Residual covariance matrix | `resid(fit)$cov` 30 | Factor correlation matrix | `cov2cor(inspect(fit, "coefficients")$psi)` or use covariance command with standardised solution e.g., `cfa(..., std.ov=TRUE)` 31 | 32 | ## Fit Measures 33 | | Name | Command | 34 | -------| ------------- 35 | Fit measures: | `fitMeasures(fit)` 36 | Specific fit measures e.g.: | `fitMeasures(fit)[c('chisq', 'df', 'pvalue', 'cfi', 'rmsea', 'srmr')]` 37 | 38 | 39 | ## Parameters 40 | | Name | Command | 41 | -------| ------------- 42 | Parameter information | `parTable(fit)` 43 | Standardised estimates | `standardizedSolution(fit)` or `summary(fit, 
standardized=TRUE)` 44 | 45 | R-squared | `inspect(fit, 'r2')` 46 | 47 | 48 | 49 | ## Compare models 50 | 51 | | Name | Command | 52 | -------| ------------- 53 | Compare fit measures | `cbind(m1=inspect(m1_fit, 'fit.measures'), m2=inspect(m2_fit, 'fit.measures'))` 54 | Chi-square difference test | `anova(m1_fit, m2_fit)` 55 | 56 | ## Model improvement 57 | | Name | Command | 58 | -------| ------------- 59 | Modification indices | `mod_ind <- modificationindices(fit)` 60 | 10 greatest | `head(mod_ind[order(mod_ind$mi, decreasing=TRUE), ], 10)` 61 | mi > 5 | `subset(mod_ind[order(mod_ind$mi, decreasing=TRUE), ], mi > 5)` 62 | 63 | -------------------------------------------------------------------------------- /cheat-sheet-lavaan/cheat-sheet-lavaan.rmd: -------------------------------------------------------------------------------- 1 | # Lavaan Cheat Sheet 2 | ## Assumptions 3 | * `Data` is a data frame 4 | * `model` is a character variable containing the lavaan model syntax 5 | * `fit` is an object of class `lavaan` typically returned from functions `cfa`, `sem`, `growth`, and `lavaan` 6 | * `m1_fit` and `m2_fit` are used for showing model comparison of `lavaan` objects.
7 | 8 | 9 | ## Documentation tips 10 | * Introduction: http://users.ugent.be/~yrosseel/lavaan/lavaanIntroduction.pdf 11 | * Basic model commands: `?cfa ?sem ?lavaan` 12 | * Extracting elements: `?inspect` 13 | 14 | ## Model fitting 15 | | Name | Command | 16 | -------| ------------- 17 | fit CFA to data | `cfa(model, data=Data)` 18 | fit SEM to data | `sem(model, data=Data)` 19 | standardised solution | `sem(model, data=Data, std.ov=TRUE)` 20 | orthogonal factors | `cfa(model, data=Data, orthogonal=TRUE)` 21 | 22 | 23 | ## Matrices 24 | | Name | Command | 25 | -------| ------------- 26 | Factor covariance matrix | `inspect(fit, "coefficients")$psi` 27 | Fitted covariance matrix | `fitted(fit)$cov` 28 | Observed covariance matrix | `inspect(fit, 'sampstat')$cov` 29 | Residual covariance matrix | `resid(fit)$cov` 30 | Factor correlation matrix | `cov2cor(inspect(fit, "coefficients")$psi)` or use covariance command with standardised solution e.g., `cfa(..., std.ov=TRUE)` 31 | Residual correlation matrix | `cov2cor(inspect(fit, 'sampstat')$cov) - cov2cor(fitted(fit)$cov)` 32 | 33 | ## Fit Measures 34 | | Name | Command | 35 | -------| ------------- 36 | Fit measures: | `fitMeasures(fit)` 37 | Specific fit measures e.g.: | `fitMeasures(fit)[c('chisq', 'df', 'pvalue', 'cfi', 'rmsea', 'srmr')]` 38 | 39 | 40 | ## Parameters 41 | | Name | Command | 42 | -------| ------------- 43 | Parameter information | `parTable(fit)` 44 | Standardised estimates | `parameterestimates(fit, standardized=TRUE)` or `standardizedSolution(fit)` or `summary(fit, standardized=TRUE)` or `inspect(fit, 'std.coef')` 45 | Unstandardised estimates | `parameterestimates(fit)` or `coef(fit)` 46 | R-squared | `inspect(fit, 'r2')` 47 | 48 | 49 | 50 | ## Compare models 51 | 52 | | Name | Command | 53 | -------| ------------- 54 | Compare fit measures | `cbind(m1=inspect(m1_fit, 'fit.measures'), m2=inspect(m2_fit, 'fit.measures'))` 55 | Chi-square difference test | `anova(m1_fit, m2_fit)` 56 | 57 | ## Model improvement 58 | | Name | Command | 59 | -------| -------------
60 | Modification indices | `mod_ind <- modificationindices(fit)` 61 | 10 greatest | `head(mod_ind[order(mod_ind$mi, decreasing=TRUE), ], 10)` 62 | mi > 5 | `subset(mod_ind[order(mod_ind$mi, decreasing=TRUE), ], mi > 5)` 63 | 64 | -------------------------------------------------------------------------------- /convert.r: -------------------------------------------------------------------------------- 1 | setwd('.build') # if necessary 2 | 3 | require(knitr) # required for knitting from rmd to md 4 | require(markdown) # required for md to html 5 | rmd_to_html <- function(input_rmd, output_stem, markdown_only=FALSE) { 6 | output_rmd <- paste0(output_stem, '.rmd') 7 | output_md <- paste0(output_stem, '.md') 8 | output_html <- paste0(output_stem, '.html') 9 | output_tex <- paste0(output_stem, '.tex') 10 | file.copy(input_rmd, output_rmd, overwrite=TRUE) 11 | knit(output_rmd, output_md) # creates md file 12 | markdownToHTML(output_md, output_html, 13 | options='fragment_only') # creates html file 14 | output_html 15 | } 16 | 17 | # Combine component HTML files 18 | html_files <- rmd_to_html('../cheat-sheet-lavaan/cheat-sheet-lavaan.rmd', 'cheat-sheet-lavaan') 19 | html_files <- c(html_files, 20 | rmd_to_html('../cfa-example/cfa-example.rmd', 'cfa-example')) 21 | html_files <- c(html_files, 22 | rmd_to_html('../ex1-paper/ex1-paper.rmd', 'ex1-paper')) 23 | html_files <- c(html_files, 24 | rmd_to_html('../ex2-paper/ex2-paper.rmd', 'ex2-paper')) 25 | html_files <- c(html_files, 26 | rmd_to_html('../path-analysis/path-analysis.rmd', 'path-analysis')) 27 | 28 | # HTML to LaTeX to PDF 29 | combined_stem <- 'combined' 30 | combined_html <- paste0(combined_stem, '.html') 31 | combined_tex <- paste0(combined_stem, '.tex') 32 | combined_pdf <- paste0(combined_stem, '.pdf') 33 | system(paste('cat', paste(html_files, collapse=' '), '>', combined_html)) 34 | 35 | system(paste('pandoc --toc -s', combined_html, ' -o', combined_tex)) 36 | system(paste('pdflatex -interaction 
nonstopmode', combined_tex)) 37 | system(paste('pdflatex -interaction nonstopmode', combined_tex)) 38 | system(paste('pdflatex -interaction nonstopmode', combined_tex)) 39 | system(paste('gnome-open', combined_pdf)) 40 | -------------------------------------------------------------------------------- /ex1-paper/ex1-paper.html: -------------------------------------------------------------------------------- 1 | 3 | 4 | 5 | 6 | 7 | 8 | Example 1 from Lavaan 9 | 10 | 11 | 12 | 138 | 139 | 140 | 170 | 171 | 172 | 176 | 177 | 178 | 179 | 181 | 182 | 183 | 184 | 185 | 186 | 187 |
-------------------------------------------------------------------------------- /ex1-paper/ex1-paper.md: --------------------------------------------------------------------------------
1 | # Example 1 from Lavaan
2 | 
3 | This exercise examines the first example shown in
4 | http://www.jstatsoft.org/v48/i02/paper.
5 | It's a three-factor confirmatory factor analysis example with three items per factor.
6 | All three latent factors are permitted to correlate.
7 | 
8 | * `x1` to `x3` load on a `visual` factor
9 | * `x4` to `x6` load on a `textual` factor
10 | * `x7` to `x9` load on a `speed` factor
11 | 
12 | 
13 | 
14 | ```r
15 | library('lavaan')
16 | library('Hmisc')
17 | cases <- HolzingerSwineford1939
18 | ```
19 | 
20 | 
21 | 
22 | 
23 | ## Quick examination of data
24 | 
25 | 
26 | 
27 | ```r
28 | str(cases)
29 | ```
30 | 
31 | ```
32 | ## 'data.frame':   301 obs. of  15 variables:
33 | ##  $ id    : int  1 2 3 4 5 6 7 8 9 11 ...
34 | ##  $ sex   : int  1 2 2 1 2 2 1 2 2 2 ...
35 | ##  $ ageyr : int  13 13 13 13 12 14 12 12 13 12 ...
36 | ##  $ agemo : int  1 7 1 2 2 1 1 2 0 5 ...
37 | ##  $ school: Factor w/ 2 levels "Grant-White",..: 2 2 2 2 2 2 2 2 2 2 ...
38 | ##  $ grade : int  7 7 7 7 7 7 7 7 7 7 ...
39 | ##  $ x1    : num  3.33 5.33 4.5 5.33 4.83 ...
40 | ##  $ x2    : num  7.75 5.25 5.25 7.75 4.75 5 6 6.25 5.75 5.25 ...
41 | ##  $ x3    : num  0.375 2.125 1.875 3 0.875 ...
42 | ##  $ x4    : num  2.33 1.67 1 2.67 2.67 ...
43 | ##  $ x5    : num  5.75 3 1.75 4.5 4 3 6 4.25 5.75 5 ...
44 | ##  $ x6    : num  1.286 1.286 0.429 2.429 2.571 ...
45 | ##  $ x7    : num  3.39 3.78 3.26 3 3.7 ...
46 | ##  $ x8    : num  5.75 6.25 3.9 5.3 6.3 6.65 6.2 5.15 4.65 4.55 ...
47 | ##  $ x9    : num  6.36 7.92 4.42 4.86 5.92 ...
48 | ```
49 | 
50 | ```r
51 | Hmisc::describe(cases)
52 | ```
53 | 
54 | ```
55 | ## cases 
56 | ## 
57 | ##  15  Variables      301  Observations
58 | ## ---------------------------------------------------------------------------
59 | ## id 
60 | ##       n missing  unique    Mean     .05     .10     .25     .50     .75 
61 | ##     301       0     301   176.6      17      33      82     163     272 
62 | ##     .90     .95 
63 | ##     318     335 
64 | ## 
65 | ## lowest :   1   2   3   4   5, highest: 346 347 348 349 351 
66 | ## ---------------------------------------------------------------------------
67 | ## sex 
68 | ##       n missing  unique    Mean 
69 | ##     301       0       2   1.515 
70 | ## 
71 | ## 1 (146, 49%), 2 (155, 51%) 
72 | ## ---------------------------------------------------------------------------
73 | ## ageyr 
74 | ##       n missing  unique    Mean 
75 | ##     301       0       6      13 
76 | ## 
77 | ##           11  12  13 14 15 16
78 | ## Frequency  8 101 110 55 20  7
79 | ## %          3  34  37 18  7  2
80 | ## ---------------------------------------------------------------------------
81 | ## agemo 
82 | ##       n missing  unique    Mean     .05     .10     .25     .50     .75 
83 | ##     301       0      12   5.375       0       1       2       5       8 
84 | ##     .90     .95 
85 | ##      10      11 
86 | ## 
87 | ##            0  1  2  3  4  5  6  7  8  9 10 11
88 | ## Frequency 22 31 26 26 27 27 21 25 26 23 19 28
89 | ## %          7 10  9  9  9  9  7  8  9  8  6  9
90 | ## ---------------------------------------------------------------------------
91 | ## school 
92 | ##       n missing  unique 
93 | ##     301       0       2 
94 | ## 
95 | ## Grant-White (145, 48%), Pasteur (156, 52%) 
96 | ## ---------------------------------------------------------------------------
97 | ## grade 
98 | ##       n missing  unique    Mean 
99 | ##     300       1       2   7.477 
100 | ## 
101 | ## 7 (157, 52%), 8 (143, 48%) 
102 | ## ---------------------------------------------------------------------------
103 | ## x1 
104 | ##       n missing  unique    Mean     .05     .10     .25     .50     .75 
105 | ##     301       0      35   4.936   3.000   3.333   4.167   5.000   5.667 
106 | ##     .90     .95 
107 | ##   6.333   6.667 
108 | ## 
109 | ## lowest : 0.6667 1.6667 1.8333 2.0000 2.6667
110 | ## highest: 7.0000 7.1667 7.3333 7.5000 8.5000 
111 | ## 
---------------------------------------------------------------------------
112 | ## x2 
113 | ##       n missing  unique    Mean     .05     .10     .25     .50     .75 
114 | ##     301       0      25   6.088    4.50    4.75    5.25    6.00    6.75 
115 | ##     .90     .95 
116 | ##    7.75    8.50 
117 | ## 
118 | ## lowest : 2.25 3.50 3.75 4.00 4.25, highest: 8.25 8.50 8.75 9.00 9.25 
119 | ## ---------------------------------------------------------------------------
120 | ## x3 
121 | ##       n missing  unique    Mean     .05     .10     .25     .50     .75 
122 | ##     301       0      35    2.25   0.625   0.875   1.375   2.125   3.125 
123 | ##     .90     .95 
124 | ##   4.000   4.250 
125 | ## 
126 | ## lowest : 0.250 0.375 0.500 0.625 0.750
127 | ## highest: 4.000 4.125 4.250 4.375 4.500 
128 | ## ---------------------------------------------------------------------------
129 | ## x4 
130 | ##       n missing  unique    Mean     .05     .10     .25     .50     .75 
131 | ##     301       0      20   3.061   1.333   1.667   2.333   3.000   3.667 
132 | ##     .90     .95 
133 | ##   4.667   5.000 
134 | ## 
135 | ## lowest : 0.0000 0.3333 0.6667 1.0000 1.3333
136 | ## highest: 5.0000 5.3333 5.6667 6.0000 6.3333 
137 | ## ---------------------------------------------------------------------------
138 | ## x5 
139 | ##       n missing  unique    Mean     .05     .10     .25     .50     .75 
140 | ##     301       0      25   4.341    2.00    2.50    3.50    4.50    5.25 
141 | ##     .90     .95 
142 | ##    6.00    6.25 
143 | ## 
144 | ## lowest : 1.00 1.25 1.50 1.75 2.00, highest: 6.00 6.25 6.50 6.75 7.00 
145 | ## ---------------------------------------------------------------------------
146 | ## x6 
147 | ##       n missing  unique    Mean     .05     .10     .25     .50     .75 
148 | ##     301       0      40   2.186  0.7143  1.0000  1.4286  2.0000  2.7143 
149 | ##     .90     .95 
150 | ##  3.7143  4.2857 
151 | ## 
152 | ## lowest : 0.1429 0.2857 0.4286 0.5714 0.7143
153 | ## highest: 5.1429 5.4286 5.5714 5.8571 6.1429 
154 | ## ---------------------------------------------------------------------------
155 | ## x7 
156 | ##       n missing  unique    Mean     .05     .10     .25     .50     .75 
157 | ##     301       0      97   4.186   2.435   2.826   3.478   4.087   4.913 
158 | ##     .90     .95 
159 | ##   5.696   5.870 
160 | ## 
161 | ## lowest : 1.304 1.870 2.000 2.043 2.130
162 | ## 
highest: 6.652 6.826 6.957 7.261 7.435 
163 | ## ---------------------------------------------------------------------------
164 | ## x8 
165 | ##       n missing  unique    Mean     .05     .10     .25     .50     .75 
166 | ##     301       0      84   5.527    3.90    4.20    4.85    5.50    6.10 
167 | ##     .90     .95 
168 | ##    6.80    7.20 
169 | ## 
170 | ## lowest :  3.05  3.50  3.60  3.65  3.70
171 | ## highest:  8.00  8.05  8.30  9.10 10.00 
172 | ## ---------------------------------------------------------------------------
173 | ## x9 
174 | ##       n missing  unique    Mean     .05     .10     .25     .50     .75 
175 | ##     301       0     129   5.374   3.750   4.111   4.750   5.417   6.083 
176 | ##     .90     .95 
177 | ##   6.667   7.000 
178 | ## 
179 | ## lowest : 2.778 3.111 3.222 3.278 3.306
180 | ## highest: 7.528 7.611 7.917 8.611 9.250 
181 | ## ---------------------------------------------------------------------------
182 | ```
183 | 
184 | 
185 | 
186 | 
187 | The data set includes `301` observations: a few demographic variables (sex, age in years and months, school, and grade) plus the nine observed test scores (`x1` to `x9`) used in the subsequent CFA.
188 | 
189 | ## Fit CFA
190 | 
191 | 
192 | ```r
193 | m1_model <- ' visual  =~ x1 + x2 + x3
194 |               textual =~ x4 + x5 + x6
195 |               speed   =~ x7 + x8 + x9
196 | '
197 | 
198 | m1_fit <- cfa(m1_model, data=cases)
199 | ```
200 | 
201 | 
202 | 
203 | 
204 | * Model syntax is specified as a character string.
205 | * `cfa` is one model fitting function in `lavaan`. The command includes many options. Data can be specified as a data frame, as it is here using the `data` argument. Alternatively, a covariance matrix, vector of means, and sample size can be specified.
206 | * From what I can tell, `lavaan` is the parent model fitting function, and it takes a `model.type` argument of `'cfa'`, `'sem'`, or `'growth'`; `cfa`, `sem`, and `growth` are themselves wrapper functions that just call `lavaan` with particular argument values.
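As a small sketch of the summary-statistics interface mentioned above, the same model can be fit from a covariance matrix and a sample size via `cfa`'s `sample.cov` and `sample.nobs` arguments (object names `S` and `m1_fit_cov` are mine, reusing `cases` and `m1_model` from the chunks above):

```r
# Fit from summary statistics instead of the raw data frame.
S <- cov(cases[, paste0("x", 1:9)])
m1_fit_cov <- cfa(m1_model, sample.cov = S, sample.nobs = nrow(cases))
```

The estimates should agree closely with the `data=` fit, since ML estimation for this model depends on the data only through the sample covariances.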
207 | 
208 | ## Show parameter table
209 | 
210 | 
211 | ```r
212 | parTable(m1_fit)
213 | ```
214 | 
215 | ```
216 | ##    id     lhs op     rhs user group free ustart exo label eq.id unco
217 | ## 1   1  visual =~      x1    1     1    0      1   0           0    0
218 | ## 2   2  visual =~      x2    1     1    1     NA   0           0    1
219 | ## 3   3  visual =~      x3    1     1    2     NA   0           0    2
220 | ## 4   4 textual =~      x4    1     1    0      1   0           0    0
221 | ## 5   5 textual =~      x5    1     1    3     NA   0           0    3
222 | ## 6   6 textual =~      x6    1     1    4     NA   0           0    4
223 | ## 7   7   speed =~      x7    1     1    0      1   0           0    0
224 | ## 8   8   speed =~      x8    1     1    5     NA   0           0    5
225 | ## 9   9   speed =~      x9    1     1    6     NA   0           0    6
226 | ## 10 10      x1 ~~      x1    0     1    7     NA   0           0    7
227 | ## 11 11      x2 ~~      x2    0     1    8     NA   0           0    8
228 | ## 12 12      x3 ~~      x3    0     1    9     NA   0           0    9
229 | ## 13 13      x4 ~~      x4    0     1   10     NA   0           0   10
230 | ## 14 14      x5 ~~      x5    0     1   11     NA   0           0   11
231 | ## 15 15      x6 ~~      x6    0     1   12     NA   0           0   12
232 | ## 16 16      x7 ~~      x7    0     1   13     NA   0           0   13
233 | ## 17 17      x8 ~~      x8    0     1   14     NA   0           0   14
234 | ## 18 18      x9 ~~      x9    0     1   15     NA   0           0   15
235 | ## 19 19  visual ~~  visual    0     1   16     NA   0           0   16
236 | ## 20 20 textual ~~ textual    0     1   17     NA   0           0   17
237 | ## 21 21   speed ~~   speed    0     1   18     NA   0           0   18
238 | ## 22 22  visual ~~ textual    0     1   19     NA   0           0   19
239 | ## 23 23  visual ~~   speed    0     1   20     NA   0           0   20
240 | ## 24 24 textual ~~   speed    0     1   21     NA   0           0   21
241 | ```
242 | 
243 | 
244 | 
245 | 
246 | * What do the columns mean?
247 |     * `id`: numeric identifier for the parameter
248 |     * `lhs`: left hand side variable name
249 |     * `op`: operator (see page 7 of http://www.jstatsoft.org/v48/i02/paper); `=~`
250 |       is manifested by; `~~` is correlated with.
251 |     * `rhs`: right hand side variable name
252 |     * `user`: 1 if parameter was specified by the user, 0 otherwise
253 |     * `group`: presumably used in multiple group analysis
254 |     * `free`: Nonzero elements are free parameters in the model
255 |     * `ustart`: The value specified for fixed parameters
256 |     * `exo`: ???
257 |     * `label`: Probably just an optional label???
258 |     * `eq.id`: ???
259 |     * `unco`: ???
260 | 261 | 262 | * The model syntax used in `lavaan` incorporates a lot of parameters by default to permit a tidy model syntax. The exact nature of these parameters is also determined by options in the `cfa`, `sem` and other model fitting functions. 263 | * `parTable` is a method that returns the full parameter table for a fitted model. 264 | 265 | It shows that the latent factors are allowed to intercorrelate. The `cfa` function has an argument `orthogonal`. It defaults to `FALSE`, which permits correlated factors. 266 | 267 | 268 | 269 | ```r 270 | parTable(cfa(m1_model, data=cases, orthogonal=TRUE))[22:24, ] 271 | ``` 272 | 273 | ``` 274 | ## id lhs op rhs user group free ustart exo label eq.id unco 275 | ## 22 22 visual ~~ textual 0 1 0 0 0 0 0 276 | ## 23 23 visual ~~ speed 0 1 0 0 0 0 0 277 | ## 24 24 textual ~~ speed 0 1 0 0 0 0 0 278 | ``` 279 | 280 | 281 | 282 | When `orthogonal=TRUE` is specified, the covariance of latent factors is constrained to zero. This is reflected in `free=0` (i.e., it's not free to vary) and `ustart=0` (the constrained value is zero) in the parameter table. 283 | 284 | Returning to the original parameter table: 285 | 286 | * Variance parameters (`op=~~` where `lhs` is the same as `rhs`) are included for all observed and latent variables.
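The parameter table can be filtered like any data frame, which makes it easier to see which parameters the tidy syntax added implicitly. A small sketch, assuming `m1_fit` from above; the column names match the table printed earlier:

```r
pt <- parTable(m1_fit)

# Freely estimated parameters (free == 0 marks fixed parameters)
subset(pt, free > 0)[, c('lhs', 'op', 'rhs', 'free')]

# Loadings fixed to 1 by the marker-variable identification default
subset(pt, free == 0 & ustart == 1)[, c('lhs', 'op', 'rhs', 'ustart')]
```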
287 | 288 | ## Summarise fit 289 | 290 | 291 | ```r 292 | summary(m1_fit) 293 | ``` 294 | 295 | ``` 296 | ## lavaan (0.4-14) converged normally after 41 iterations 297 | ## 298 | ## Number of observations 301 299 | ## 300 | ## Estimator ML 301 | ## Minimum Function Chi-square 85.306 302 | ## Degrees of freedom 24 303 | ## P-value 0.000 304 | ## 305 | ## Parameter estimates: 306 | ## 307 | ## Information Expected 308 | ## Standard Errors Standard 309 | ## 310 | ## Estimate Std.err Z-value P(>|z|) 311 | ## Latent variables: 312 | ## visual =~ 313 | ## x1 1.000 314 | ## x2 0.553 0.100 5.554 0.000 315 | ## x3 0.729 0.109 6.685 0.000 316 | ## textual =~ 317 | ## x4 1.000 318 | ## x5 1.113 0.065 17.014 0.000 319 | ## x6 0.926 0.055 16.703 0.000 320 | ## speed =~ 321 | ## x7 1.000 322 | ## x8 1.180 0.165 7.152 0.000 323 | ## x9 1.082 0.151 7.155 0.000 324 | ## 325 | ## Covariances: 326 | ## visual ~~ 327 | ## textual 0.408 0.074 5.552 0.000 328 | ## speed 0.262 0.056 4.660 0.000 329 | ## textual ~~ 330 | ## speed 0.173 0.049 3.518 0.000 331 | ## 332 | ## Variances: 333 | ## x1 0.549 0.114 334 | ## x2 1.134 0.102 335 | ## x3 0.844 0.091 336 | ## x4 0.371 0.048 337 | ## x5 0.446 0.058 338 | ## x6 0.356 0.043 339 | ## x7 0.799 0.081 340 | ## x8 0.488 0.074 341 | ## x9 0.566 0.071 342 | ## visual 0.809 0.145 343 | ## textual 0.979 0.112 344 | ## speed 0.384 0.086 345 | ## 346 | ``` 347 | 348 | 349 | 350 | 351 | The default `summary` method shows $\chi^2$, $df$, p-value for the overall model, unstandardised parameter estimates, in some cases with significance tests. 
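`summary` also takes switches that add to this default display; for example (hedging here: the `standardized` argument is documented for later lavaan versions and may not be available in 0.4-14):

```r
# Request fit measures and standardised estimates in one call
summary(m1_fit, fit.measures = TRUE, standardized = TRUE)
```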
352 | 353 | 354 | ## Getting fit statistics 355 | There are multiple ways of getting fit statistics 356 | 357 | 358 | 359 | ```r 360 | fitMeasures(m1_fit) 361 | ``` 362 | 363 | ``` 364 | ## chisq df pvalue baseline.chisq 365 | ## 85.306 24.000 0.000 918.852 366 | ## baseline.df baseline.pvalue cfi tli 367 | ## 36.000 0.000 0.931 0.896 368 | ## logl unrestricted.logl npar aic 369 | ## -3737.745 -3695.092 21.000 7517.490 370 | ## bic ntotal bic2 rmsea 371 | ## 7595.339 301.000 7528.739 0.092 372 | ## rmsea.ci.lower rmsea.ci.upper rmsea.pvalue srmr 373 | ## 0.071 0.114 0.001 0.065 374 | ``` 375 | 376 | ```r 377 | # equivalent to: 378 | # inspect(m1_fit, 'fit.measures') 379 | 380 | fitMeasures(m1_fit)['rmsea'] 381 | ``` 382 | 383 | ``` 384 | ## rmsea 385 | ## 0.09212 386 | ``` 387 | 388 | ```r 389 | fitMeasures(m1_fit, c('rmsea', 'rmsea.ci.lower', 'rmsea.ci.upper')) 390 | ``` 391 | 392 | ``` 393 | ## rmsea rmsea.ci.lower rmsea.ci.upper 394 | ## 0.092 0.071 0.114 395 | ``` 396 | 397 | ```r 398 | 399 | 400 | summary(m1_fit, fit.measures=TRUE) 401 | ``` 402 | 403 | ``` 404 | ## lavaan (0.4-14) converged normally after 41 iterations 405 | ## 406 | ## Number of observations 301 407 | ## 408 | ## Estimator ML 409 | ## Minimum Function Chi-square 85.306 410 | ## Degrees of freedom 24 411 | ## P-value 0.000 412 | ## 413 | ## Chi-square test baseline model: 414 | ## 415 | ## Minimum Function Chi-square 918.852 416 | ## Degrees of freedom 36 417 | ## P-value 0.000 418 | ## 419 | ## Full model versus baseline model: 420 | ## 421 | ## Comparative Fit Index (CFI) 0.931 422 | ## Tucker-Lewis Index (TLI) 0.896 423 | ## 424 | ## Loglikelihood and Information Criteria: 425 | ## 426 | ## Loglikelihood user model (H0) -3737.745 427 | ## Loglikelihood unrestricted model (H1) -3695.092 428 | ## 429 | ## Number of free parameters 21 430 | ## Akaike (AIC) 7517.490 431 | ## Bayesian (BIC) 7595.339 432 | ## Sample-size adjusted Bayesian (BIC) 7528.739 433 | ## 434 | ## Root Mean Square Error 
of Approximation: 435 | ## 436 | ## RMSEA 0.092 437 | ## 90 Percent Confidence Interval 0.071 0.114 438 | ## P-value RMSEA <= 0.05 0.001 439 | ## 440 | ## Standardized Root Mean Square Residual: 441 | ## 442 | ## SRMR 0.065 443 | ## 444 | ## Parameter estimates: 445 | ## 446 | ## Information Expected 447 | ## Standard Errors Standard 448 | ## 449 | ## Estimate Std.err Z-value P(>|z|) 450 | ## Latent variables: 451 | ## visual =~ 452 | ## x1 1.000 453 | ## x2 0.553 0.100 5.554 0.000 454 | ## x3 0.729 0.109 6.685 0.000 455 | ## textual =~ 456 | ## x4 1.000 457 | ## x5 1.113 0.065 17.014 0.000 458 | ## x6 0.926 0.055 16.703 0.000 459 | ## speed =~ 460 | ## x7 1.000 461 | ## x8 1.180 0.165 7.152 0.000 462 | ## x9 1.082 0.151 7.155 0.000 463 | ## 464 | ## Covariances: 465 | ## visual ~~ 466 | ## textual 0.408 0.074 5.552 0.000 467 | ## speed 0.262 0.056 4.660 0.000 468 | ## textual ~~ 469 | ## speed 0.173 0.049 3.518 0.000 470 | ## 471 | ## Variances: 472 | ## x1 0.549 0.114 473 | ## x2 1.134 0.102 474 | ## x3 0.844 0.091 475 | ## x4 0.371 0.048 476 | ## x5 0.446 0.058 477 | ## x6 0.356 0.043 478 | ## x7 0.799 0.081 479 | ## x8 0.488 0.074 480 | ## x9 0.566 0.071 481 | ## visual 0.809 0.145 482 | ## textual 0.979 0.112 483 | ## speed 0.384 0.086 484 | ## 485 | ``` 486 | 487 | 488 | 489 | 490 | 491 | * I assume that lavaan uses S4 classes which makes extracting elements a little different to S3 classes. 492 | * The above code shows how to extract fit measures. 493 | * While it is not clear here, it appears that `rmsea.ci.lower` and `rmsea.ci.upper` refer to 90% lower and upper confidence intervals.
494 | * Adding `fit.measures=TRUE` to `summary` provides a way of displaying these fit measures alongside the parameter estimates. 495 | 496 | 497 | ## Modification indices 498 | 499 | 500 | ```r 501 | m1_mod <- modificationIndices(m1_fit) 502 | head(m1_mod[order(m1_mod$mi, decreasing=TRUE), ], 10) 503 | ``` 504 | 505 | ``` 506 | ## lhs op rhs mi epc sepc.lv sepc.all sepc.nox 507 | ## 1 visual =~ x9 36.411 0.577 0.519 0.515 0.515 508 | ## 2 x7 ~~ x8 34.145 0.536 0.536 0.488 0.488 509 | ## 3 visual =~ x7 18.631 -0.422 -0.380 -0.349 -0.349 510 | ## 4 x8 ~~ x9 14.946 -0.423 -0.423 -0.415 -0.415 511 | ## 5 textual =~ x3 9.151 -0.272 -0.269 -0.238 -0.238 512 | ## 6 x2 ~~ x7 8.918 -0.183 -0.183 -0.143 -0.143 513 | ## 7 textual =~ x1 8.903 0.350 0.347 0.297 0.297 514 | ## 8 x2 ~~ x3 8.532 0.218 0.218 0.164 0.164 515 | ## 9 x3 ~~ x5 7.858 -0.130 -0.130 -0.089 -0.089 516 | ## 10 visual =~ x5 7.441 -0.210 -0.189 -0.147 -0.147 517 | ``` 518 | 519 | 520 | 521 | 522 | * The `modificationIndices` function returns modification indices and expected parameter changes (EPCs). 523 | * The second line above sorts the rows of the modification indices table in decreasing order and shows those parameters with the 10 largest values. 524 | 525 | 526 | 527 | ```r 528 | m2_model <- ' visual =~ x1 + x2 + x3 + x9 529 | textual =~ x4 + x5 + x6 530 | speed =~ x7 + x8 + x9 531 | ' 532 | 533 | m2_fit <- cfa(m2_model, data=cases) 534 | anova(m1_fit, m2_fit) 535 | ``` 536 | 537 | ``` 538 | ## Chi Square Difference Test 539 | ## 540 | ## Df AIC BIC Chisq Chisq diff Df diff Pr(>Chisq) 541 | ## m2_fit 23 7487 7568 52.4 542 | ## m1_fit 24 7517 7595 85.3 32.9 1 9.6e-09 *** 543 | ## --- 544 | ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 545 | ``` 546 | 547 | 548 | 549 | 550 | * Note that more than one empty line at the end of the model definition seems to cause an error. 551 | * TODO: Work out why the change in $\chi^2$ 552 | `32.9234` 553 | is different to the value of the modification index 554 | `36.411`.
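One way to chip at the TODO above is to compute the realised drop directly and put it next to the index. This sketch assumes `m1_fit`, `m2_fit`, and `m1_mod` from the chunks above; note that a modification index is only a one-step approximation to the change from actually refitting, so some discrepancy is expected.

```r
# Realised chi-square change after refitting with the extra loading
fitMeasures(m1_fit, 'chisq') - fitMeasures(m2_fit, 'chisq')

# Change predicted by the modification index for visual =~ x9
m1_mod[m1_mod$lhs == 'visual' & m1_mod$op == '=~' & m1_mod$rhs == 'x9', 'mi']
```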
555 | 556 | ## Standardised parameters 557 | 558 | 559 | ```r 560 | summary(m1_fit) 561 | ``` 562 | 563 | ``` 564 | ## lavaan (0.4-14) converged normally after 41 iterations 565 | ## 566 | ## Number of observations 301 567 | ## 568 | ## Estimator ML 569 | ## Minimum Function Chi-square 85.306 570 | ## Degrees of freedom 24 571 | ## P-value 0.000 572 | ## 573 | ## Parameter estimates: 574 | ## 575 | ## Information Expected 576 | ## Standard Errors Standard 577 | ## 578 | ## Estimate Std.err Z-value P(>|z|) 579 | ## Latent variables: 580 | ## visual =~ 581 | ## x1 1.000 582 | ## x2 0.553 0.100 5.554 0.000 583 | ## x3 0.729 0.109 6.685 0.000 584 | ## textual =~ 585 | ## x4 1.000 586 | ## x5 1.113 0.065 17.014 0.000 587 | ## x6 0.926 0.055 16.703 0.000 588 | ## speed =~ 589 | ## x7 1.000 590 | ## x8 1.180 0.165 7.152 0.000 591 | ## x9 1.082 0.151 7.155 0.000 592 | ## 593 | ## Covariances: 594 | ## visual ~~ 595 | ## textual 0.408 0.074 5.552 0.000 596 | ## speed 0.262 0.056 4.660 0.000 597 | ## textual ~~ 598 | ## speed 0.173 0.049 3.518 0.000 599 | ## 600 | ## Variances: 601 | ## x1 0.549 0.114 602 | ## x2 1.134 0.102 603 | ## x3 0.844 0.091 604 | ## x4 0.371 0.048 605 | ## x5 0.446 0.058 606 | ## x6 0.356 0.043 607 | ## x7 0.799 0.081 608 | ## x8 0.488 0.074 609 | ## x9 0.566 0.071 610 | ## visual 0.809 0.145 611 | ## textual 0.979 0.112 612 | ## speed 0.384 0.086 613 | ## 614 | ``` 615 | 616 | ```r 617 | standardizedSolution(m1_fit) 618 | ``` 619 | 620 | ``` 621 | ## lhs op rhs est.std se z pvalue 622 | ## 1 visual =~ x1 0.772 NA NA NA 623 | ## 2 visual =~ x2 0.424 NA NA NA 624 | ## 3 visual =~ x3 0.581 NA NA NA 625 | ## 4 textual =~ x4 0.852 NA NA NA 626 | ## 5 textual =~ x5 0.855 NA NA NA 627 | ## 6 textual =~ x6 0.838 NA NA NA 628 | ## 7 speed =~ x7 0.570 NA NA NA 629 | ## 8 speed =~ x8 0.723 NA NA NA 630 | ## 9 speed =~ x9 0.665 NA NA NA 631 | ## 10 x1 ~~ x1 0.404 NA NA NA 632 | ## 11 x2 ~~ x2 0.821 NA NA NA 633 | ## 12 x3 ~~ x3 0.662 NA NA NA 634 | ## 13 x4 ~~ 
x4 0.275 NA NA NA 635 | ## 14 x5 ~~ x5 0.269 NA NA NA 636 | ## 15 x6 ~~ x6 0.298 NA NA NA 637 | ## 16 x7 ~~ x7 0.676 NA NA NA 638 | ## 17 x8 ~~ x8 0.477 NA NA NA 639 | ## 18 x9 ~~ x9 0.558 NA NA NA 640 | ## 19 visual ~~ visual 1.000 NA NA NA 641 | ## 20 textual ~~ textual 1.000 NA NA NA 642 | ## 21 speed ~~ speed 1.000 NA NA NA 643 | ## 22 visual ~~ textual 0.459 NA NA NA 644 | ## 23 visual ~~ speed 0.471 NA NA NA 645 | ## 24 textual ~~ speed 0.283 NA NA NA 646 | ``` 647 | 648 | 649 | 650 | 651 | 652 | 653 | -------------------------------------------------------------------------------- /ex1-paper/ex1-paper.rmd: -------------------------------------------------------------------------------- 1 | # Example 1 from Lavaan 2 | `r opts_chunk$set(tidy=FALSE)` 3 | This exercise examines the first example shown in 4 | http://www.jstatsoft.org/v48/i02/paper. 5 | It's a three-factor confirmatory factor analysis example with three items per factor. 6 | All three latent factors are permitted to correlate. 7 | 8 | * `x1` to `x3` load on a `visual` factor 9 | * `x4` to `x6` load on a `textual` factor 10 | * `x7` to `x9` load on a `speed` factor 11 | 12 | ```{r setup, message=FALSE} 13 | library('lavaan') 14 | library('Hmisc') 15 | cases <- HolzingerSwineford1939 16 | ``` 17 | 18 | ## Quick examination of data 19 | 20 | ```{r } 21 | str(cases) 22 | Hmisc::describe(cases) 23 | ``` 24 | 25 | The data set includes `r nrow(cases)` observations. It contains a few demographic variables (e.g., sex, age in years and months, school, and grade) along with the nine observed test scores used in the subsequent CFA. 26 | 27 | ## Fit CFA 28 | ```{r fit_m1} 29 | m1_model <- ' visual =~ x1 + x2 + x3 30 | textual =~ x4 + x5 + x6 31 | speed =~ x7 + x8 + x9 32 | ' 33 | 34 | m1_fit <- cfa(m1_model, data=cases) 35 | ``` 36 | 37 | * Model syntax is specified as a character string 38 | * `cfa` is one model fitting function in `lavaan`. The command includes many options.
Data can be specified as a data frame, as it is here using the `data` argument. Alternatively, a covariance matrix, vector of means, and sample size can be specified. 39 | * From what I can tell, `lavaan` is the parent model fitting function that can take a `model.type` argument of `'cfa'`, `'sem'`, or `'growth'`. Thus, `cfa`, `sem`, and `growth` are themselves functions that simply call `lavaan` with particular argument values. 40 | 41 | ## Show parameter table 42 | ```{r} 43 | parTable(m1_fit) 44 | ``` 45 | 46 | * What do the columns mean? 47 | * `id`: numeric identifier for the parameter 48 | * `lhs`: left hand side variable name 49 | * `op`: operator (see page 7 of http://www.jstatsoft.org/v48/i02/paper); `=~` 50 | is manifested by; `~~` is correlated with. 51 | * `rhs`: right hand side variable name 52 | * `user`: 1 if parameter was specified by the user, 0 otherwise 53 | * `group`: presumably used in multiple group analysis 54 | * `free`: Nonzero elements are free parameters in the model 55 | * `ustart`: The value specified for fixed parameters 56 | * `exo`: ??? 57 | * `label`: Probably just an optional label??? 58 | * `eq.id`: ??? 59 | * `unco`: ??? 60 | 61 | 62 | * The model syntax used in `lavaan` incorporates a lot of parameters by default to permit a tidy model syntax. The exact nature of these parameters is also determined by options in the `cfa`, `sem` and other model fitting functions. 63 | * `parTable` is a method that returns the full parameter table for a fitted model. 64 | 65 | It shows that the latent factors are allowed to intercorrelate. The `cfa` function has an argument `orthogonal`. It defaults to `FALSE`, which permits correlated factors. 66 | 67 | ```{r} 68 | parTable(cfa(m1_model, data=cases, orthogonal=TRUE))[22:24, ] 69 | ``` 70 | When `orthogonal=TRUE` is specified, the covariance of latent factors is constrained to zero. This is reflected in `free=0` (i.e., it's not free to vary) and `ustart=0` (the constrained value is zero) in the parameter table.
71 | 72 | Returning to the original parameter table: 73 | 74 | * Variance parameters (`op=~~` where `lhs` is the same as `rhs`) are included for all observed and latent variables. 75 | 76 | ## Summarise fit 77 | ```{r} 78 | summary(m1_fit) 79 | ``` 80 | 81 | The default `summary` method shows $\chi^2$, $df$, p-value for the overall model, unstandardised parameter estimates, in some cases with significance tests. 82 | 83 | 84 | ## Getting fit statistics 85 | There are multiple ways of getting fit statistics 86 | 87 | ```{r} 88 | fitMeasures(m1_fit) 89 | # equivalent to: 90 | # inspect(m1_fit, 'fit.measures') 91 | 92 | fitMeasures(m1_fit)['rmsea'] 93 | fitMeasures(m1_fit, c('rmsea', 'rmsea.ci.lower', 'rmsea.ci.upper')) 94 | 95 | 96 | summary(m1_fit, fit.measures=TRUE) 97 | ``` 98 | 99 | 100 | * I assume that lavaan uses S4 classes which makes extracting elements a little different to S3 classes. 101 | * The above code shows how to extract fit measures. 102 | * While it is not clear here, it appears that `rmsea.ci.lower` and `rmsea.ci.upper` refer to 90% lower and upper confidence intervals. 103 | * Adding `fit.measures=TRUE` to `summary` provides a way of displaying these fit measures alongside the parameter estimates. 104 | 105 | 106 | ## Modification indices 107 | ```{r} 108 | m1_mod <- modificationIndices(m1_fit) 109 | head(m1_mod[order(m1_mod$mi, decreasing=TRUE), ], 10) 110 | ``` 111 | 112 | * The `modificationIndices` function returns modification indices and expected parameter changes (EPCs). 113 | * The second line above sorts the rows of the modification indices table in decreasing order and shows those parameters with the 10 largest values. 114 | 115 | ```{r} 116 | m2_model <- ' visual =~ x1 + x2 + x3 + x9 117 | textual =~ x4 + x5 + x6 118 | speed =~ x7 + x8 + x9 119 | ' 120 | 121 | m2_fit <- cfa(m2_model, data=cases) 122 | anova(m1_fit, m2_fit) 123 | ``` 124 | 125 | * Note that more than one empty line at the end of the model definition seems to cause an error.
126 | * TODO: Work out why the change in $\chi^2$ 127 | `r anova(m1_fit, m2_fit)[2, 'Chisq diff']` 128 | is different to the value of the modification index 129 | `r m1_mod[m1_mod$lhs == 'visual' & m1_mod$rhs == 'x9', 'mi']`. 130 | 131 | ## Standardised parameters 132 | ```{r} 133 | summary(m1_fit) 134 | standardizedSolution(m1_fit) 135 | ``` 136 | 137 | 138 | 139 |

-------------------------------------------------------------------------------- /ex2-paper/ex2-paper.md: -------------------------------------------------------------------------------- 1 | 2 | # Example 2 from Rosseel's Paper on lavaan 3 | 4 | 5 | ```r 6 | library(lavaan) 7 | Data <- PoliticalDemocracy 8 | ``` 9 | 10 | 11 | 12 | 13 | This example is an elaboration on Example 2 from Yves Rosseel's Journal of Statistical Software article (see [here](http://www.jstatsoft.org/v48/i02/paper)). 14 | 15 | ## M0: Basic Measurement model 16 | 17 | 18 | ```r 19 | m0_model <- ' 20 | # measurement model 21 | ind60 =~ x1 + x2 + x3 22 | dem60 =~ y1 + y2 + y3 + y4 23 | dem65 =~ y5 + y6 + y7 + y8 24 | ' 25 | 26 | m0_fit <- cfa(m0_model, data=Data) 27 | ``` 28 | 29 | 30 | 31 | 32 | * `m0` defines a basic measurement model that permits correlated factors. Note that it does not have correlations between corresponding democracy indicator measures over time. 33 | 34 | **Questions:** 35 | 36 | * Is it a good model? 37 | 38 | 39 | 40 | ```r 41 | fitmeasures(m0_fit) 42 | ``` 43 | 44 | ``` 45 | ## chisq df pvalue baseline.chisq 46 | ## 72.462 41.000 0.002 730.654 47 | ## baseline.df baseline.pvalue cfi tli 48 | ## 55.000 0.000 0.953 0.938 49 | ## logl unrestricted.logl npar aic 50 | ## -1564.959 -1528.728 25.000 3179.918 51 | ## bic ntotal bic2 rmsea 52 | ## 3237.855 75.000 3159.062 0.101 53 | ## rmsea.ci.lower rmsea.ci.upper rmsea.pvalue srmr 54 | ## 0.061 0.139 0.021 0.055 55 | ``` 56 | 57 | 58 | 59 | 60 | * CFI suggests a reasonable model, but RMSEA is quite large.
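The indices singled out in that bullet can be requested on their own, using the same character-vector form of `fitmeasures` shown in the first example (a sketch assuming `m0_fit` from the chunk above):

```r
# Just the headline fit indices for m0
fitmeasures(m0_fit, c('cfi', 'tli', 'rmsea', 'rmsea.ci.lower',
                      'rmsea.ci.upper', 'srmr'))
```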
61 | 62 | 63 | 64 | ```r 65 | inspect(m0_fit, 'standardized') 66 | ``` 67 | 68 | ``` 69 | ## lhs op rhs est.std se z pvalue 70 | ## 1 ind60 =~ x1 0.920 NA NA NA 71 | ## 2 ind60 =~ x2 0.973 NA NA NA 72 | ## 3 ind60 =~ x3 0.872 NA NA NA 73 | ## 4 dem60 =~ y1 0.845 NA NA NA 74 | ## 5 dem60 =~ y2 0.760 NA NA NA 75 | ## 6 dem60 =~ y3 0.705 NA NA NA 76 | ## 7 dem60 =~ y4 0.860 NA NA NA 77 | ## 8 dem65 =~ y5 0.803 NA NA NA 78 | ## 9 dem65 =~ y6 0.783 NA NA NA 79 | ## 10 dem65 =~ y7 0.819 NA NA NA 80 | ## 11 dem65 =~ y8 0.847 NA NA NA 81 | ## 12 x1 ~~ x1 0.154 NA NA NA 82 | ## 13 x2 ~~ x2 0.053 NA NA NA 83 | ## 14 x3 ~~ x3 0.240 NA NA NA 84 | ## 15 y1 ~~ y1 0.286 NA NA NA 85 | ## 16 y2 ~~ y2 0.422 NA NA NA 86 | ## 17 y3 ~~ y3 0.503 NA NA NA 87 | ## 18 y4 ~~ y4 0.261 NA NA NA 88 | ## 19 y5 ~~ y5 0.355 NA NA NA 89 | ## 20 y6 ~~ y6 0.387 NA NA NA 90 | ## 21 y7 ~~ y7 0.329 NA NA NA 91 | ## 22 y8 ~~ y8 0.283 NA NA NA 92 | ## 23 ind60 ~~ ind60 1.000 NA NA NA 93 | ## 24 dem60 ~~ dem60 1.000 NA NA NA 94 | ## 25 dem65 ~~ dem65 1.000 NA NA NA 95 | ## 26 ind60 ~~ dem60 0.448 NA NA NA 96 | ## 27 ind60 ~~ dem65 0.555 NA NA NA 97 | ## 28 dem60 ~~ dem65 0.978 NA NA NA 98 | ``` 99 | 100 | 101 | 102 | 103 | * The table of standardised loadings shows that all factor loadings are large.
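The loading rows can be pulled out of that table with the same `subset` plus `inspect` idiom used for the regression coefficients later in this file (a sketch assuming `m0_fit` from above):

```r
# Keep only the factor loadings (op == '=~') from the standardised solution
std <- inspect(m0_fit, 'standardized')
subset(std, op == '=~')
```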
104 | 105 | 106 | 107 | ```r 108 | m0_mod <- modificationindices(m0_fit) 109 | head(m0_mod[order(m0_mod$mi, decreasing=TRUE), ], 12) 110 | ``` 111 | 112 | ``` 113 | ## lhs op rhs mi epc sepc.lv sepc.all sepc.nox 114 | ## 1 y2 ~~ y6 9.279 2.129 2.129 0.162 0.162 115 | ## 2 y6 ~~ y8 8.668 1.513 1.513 0.140 0.140 116 | ## 3 y1 ~~ y5 8.183 0.884 0.884 0.131 0.131 117 | ## 4 y3 ~~ y6 6.574 -1.590 -1.590 -0.146 -0.146 118 | ## 5 y1 ~~ y3 5.204 1.024 1.024 0.121 0.121 119 | ## 6 y2 ~~ y4 4.911 1.432 1.432 0.110 0.110 120 | ## 7 y3 ~~ y7 4.088 1.152 1.152 0.108 0.108 121 | ## 8 ind60 =~ y5 4.007 0.762 0.510 0.197 0.197 122 | ## 9 x1 ~~ y2 3.785 -0.192 -0.192 -0.067 -0.067 123 | ## 10 ind60 =~ y4 3.568 0.811 0.543 0.163 0.163 124 | ## 11 y2 ~~ y3 3.215 -1.365 -1.365 -0.107 -0.107 125 | ## 12 y5 ~~ y6 3.116 -0.774 -0.774 -0.089 -0.089 126 | ``` 127 | 128 | 129 | 130 | 131 | * The table of largest modification indices suggests a range of ways that the model could be improved. Because the sample size is small, particular caution needs to be taken with these. 132 | * Several of these modifications concern the expected requirement to permit indicator variables at different time points to correlate (e.g., `y2` with `y6`, `y3` with `y7`). 133 | * It may also be that some pairs of items are correlated more than others. For example, the following correlation matrix shows how `y6` and `y8` have a particularly large correlation. 134 | 135 | 136 | 137 | ```r 138 | round(cor(Data[,c('y5', 'y6', 'y7', 'y8')]), 2) 139 | ``` 140 | 141 | ``` 142 | ## y5 y6 y7 y8 143 | ## y5 1.00 0.56 0.68 0.63 144 | ## y6 0.56 1.00 0.61 0.75 145 | ## y7 0.68 0.61 1.00 0.71 146 | ## y8 0.63 0.75 0.71 1.00 147 | ``` 148 | 149 | 150 | 151 | 152 | 153 | * What are the correlations between the factors?
154 | 155 | 156 | 157 | ```r 158 | cov2cor(inspect(m0_fit, "coefficients")$psi) 159 | ``` 160 | 161 | ``` 162 | ## ind60 dem60 dem65 163 | ## ind60 1.000 164 | ## dem60 0.448 1.000 165 | ## dem65 0.555 0.978 1.000 166 | ``` 167 | 168 | 169 | 170 | 171 | This certainly suggests that the factors are strongly related, especially the two democracy measures. 172 | 173 | 174 | ## M1: Correlated item measurement model 175 | This next model permits corresponding democracy measures from the two time points to be correlated. 176 | 177 | 178 | 179 | ```r 180 | m1_model <- ' 181 | # measurement model 182 | ind60 =~ x1 + x2 + x3 183 | dem60 =~ y1 + y2 + y3 + y4 184 | dem65 =~ y5 + y6 + y7 + y8 185 | 186 | # correlated residuals 187 | y1 ~~ y5 188 | y2 ~~ y6 189 | y3 ~~ y7 190 | y4 ~~ y8 191 | ' 192 | 193 | m1_fit <- cfa(m1_model, data=Data) 194 | ``` 195 | 196 | 197 | 198 | 199 | * Is this an improvement over `m0` with uncorrelated indicators? 200 | * Does `m1` have good fit in and of itself? 201 | 202 | 203 | 204 | ```r 205 | anova(m0_fit, m1_fit) 206 | ``` 207 | 208 | ``` 209 | ## Chi Square Difference Test 210 | ## 211 | ## Df AIC BIC Chisq Chisq diff Df diff Pr(>Chisq) 212 | ## m1_fit 37 3166 3233 50.8 213 | ## m0_fit 41 3180 3238 72.5 21.6 4 0.00024 *** 214 | ## --- 215 | ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.'
0.1 ' ' 1 216 | ``` 217 | 218 | ```r 219 | round(cbind(m0=inspect(m0_fit, 'fit.measures'), 220 | m1=inspect(m1_fit, 'fit.measures')), 3) 221 | ``` 222 | 223 | ``` 224 | ## m0 m1 225 | ## chisq 72.462 50.835 226 | ## df 41.000 37.000 227 | ## pvalue 0.002 0.064 228 | ## baseline.chisq 730.654 730.654 229 | ## baseline.df 55.000 55.000 230 | ## baseline.pvalue 0.000 0.000 231 | ## cfi 0.953 0.980 232 | ## tli 0.938 0.970 233 | ## logl -1564.959 -1554.146 234 | ## unrestricted.logl -1528.728 -1528.728 235 | ## npar 25.000 29.000 236 | ## aic 3179.918 3166.292 237 | ## bic 3237.855 3233.499 238 | ## ntotal 75.000 75.000 239 | ## bic2 3159.062 3142.099 240 | ## rmsea 0.101 0.071 241 | ## rmsea.ci.lower 0.061 0.000 242 | ## rmsea.ci.upper 0.139 0.115 243 | ## rmsea.pvalue 0.021 0.234 244 | ## srmr 0.055 0.050 245 | ``` 246 | 247 | 248 | 249 | 250 | * It is a significant improvement. 251 | * RMSEA and other fit measures are substantially improved. 252 | * The relatively small sample size makes it difficult to judge how far model improvement should continue. The RMSEA suggests that further improvements are possible, but it is less clear how to proceed in a principled way. 253 | 254 | 255 | 256 | 257 | ## M2: Basic SEM 258 | 259 | 260 | ```r 261 | m2_model <- ' 262 | # measurement model 263 | ind60 =~ x1 + x2 + x3 264 | dem60 =~ y1 + y2 + y3 + y4 265 | dem65 =~ y5 + y6 + y7 + y8 266 | 267 | # correlated residuals 268 | y1 ~~ y5 269 | y2 ~~ y6 270 | y3 ~~ y7 271 | y4 ~~ y8 272 | 273 | # regressions 274 | dem60 ~ ind60 275 | dem65 ~ ind60 + dem60 276 | ' 277 | 278 | m2_fit <- sem(m2_model, data=Data) 279 | ``` 280 | 281 | 282 | 283 | 284 | * Is the fit the same as for model 1, as I would expect?
285 | 286 | 287 | 288 | ```r 289 | rbind(m1 = fitMeasures(m1_fit)[c('chisq', 'rmsea')], 290 | m2 = fitMeasures(m2_fit)[c('chisq', 'rmsea')]) 291 | ``` 292 | 293 | ``` 294 | ## chisq rmsea 295 | ## m1 50.84 0.07061 296 | ## m2 50.84 0.07061 297 | ``` 298 | 299 | 300 | 301 | Yes, it is. 302 | 303 | * Assuming democracy 1965 is the dependent variable, how can we get the information typically available in multiple regression output? 304 | * R-squared? 305 | * Unstandardised regression coefficients? 306 | * Standardised regression coefficients? 307 | * Standard errors, p-values, and confidence intervals on unstandardised coefficients? 308 | 309 | 310 | 311 | ```r 312 | # m2_fit <- sem(m2_model, data=Data) 313 | 314 | # r-square for dem-65 315 | inspect(m2_fit, 'r2')['dem65'] 316 | ``` 317 | 318 | ``` 319 | ## dem65 320 | ## 0.9139 321 | ``` 322 | 323 | ```r 324 | 325 | # Unstandardised regression coefficients 326 | inspect(m2_fit, 'coef')$beta['dem65', ] 327 | ``` 328 | 329 | ``` 330 | ## ind60 dem60 dem65 331 | ## 0.5069 0.8157 0.0000 332 | ``` 333 | 334 | ```r 335 | 336 | # Standardised regression coefficients 337 | subset(inspect(m2_fit, 'standardized'), lhs == 'dem65' & op == '~') 338 | ``` 339 | 340 | ``` 341 | ## lhs op rhs est.std se z pvalue 342 | ## 1 dem65 ~ ind60 0.168 NA NA NA 343 | ## 2 dem65 ~ dem60 0.869 NA NA NA 344 | ``` 345 | 346 | ```r 347 | 348 | # Just a guess, may not be correct: 349 | # coefs <- data.frame(coef=inspect(m2_fit, 'coef')$beta['dem65', ], 350 | # se=inspect(m2_fit, 'se')$beta['dem65', ]) 351 | # coefs$low95ci <- coefs$coef - coefs$se * 1.96 352 | # coefs$high95ci <- coefs$coef + coefs$se * 1.96 353 | ``` 354 | 355 | 356 | 357 | 358 | 359 | 360 | -------------------------------------------------------------------------------- /ex2-paper/ex2-paper.rmd: -------------------------------------------------------------------------------- 1 | `r opts_chunk$set(cache=TRUE, tidy=FALSE)` 2 | # Example 2 from Rosseel's Paper on lavaan 3 | ```{r 
setup, message=FALSE} 4 | library(lavaan) 5 | Data <- PoliticalDemocracy 6 | ``` 7 | 8 | This example is an elaboration on Example 2 from Yves Rosseel's Journal of Statistical Software article (see [here](http://www.jstatsoft.org/v48/i02/paper)). 9 | 10 | ## M0: Basic Measurement model 11 | ```{r basic_measurement_model} 12 | m0_model <- ' 13 | # measurement model 14 | ind60 =~ x1 + x2 + x3 15 | dem60 =~ y1 + y2 + y3 + y4 16 | dem65 =~ y5 + y6 + y7 + y8 17 | ' 18 | 19 | m0_fit <- cfa(m0_model, data=Data) 20 | ``` 21 | 22 | * `m0` defines a basic measurement model that permits correlated factors. Note that it does not have correlations between corresponding democracy indicator measures over time. 23 | 24 | **Questions:** 25 | 26 | * Is it a good model? 27 | 28 | ```{r m0_fit_measures} 29 | fitmeasures(m0_fit) 30 | ``` 31 | 32 | * cfi suggests a reasonable model, but RMSEA is quite large. 33 | 34 | ```{r m0_standardised_parameters} 35 | inspect(m0_fit, 'standardized') 36 | ``` 37 | 38 | * The table of standardised loadings shows all factor loadings to be large. 39 | 40 | ```{r m0_mod_indices} 41 | m0_mod <- modificationindices(m0_fit) 42 | head(m0_mod[order(m0_mod$mi, decreasing=TRUE), ], 12) 43 | ``` 44 | 45 | * The table of largest modification indices suggests a range of ways that the model could be improved. Because the sample size is small, particular caution needs to be taken with these. 46 | * Several of these modifications concern the expected requirement to permit indicator variables at different time points to correlate (e.g., `y2` with `y6`, `y3` with `y7`). 47 | * It may also be that some pairs of items are correlated more than others. For example, the following correlation matrix shows how `y6` and `y8` have a particularly large correlation. 48 | 49 | ```{r} 50 | round(cor(Data[,c('y5', 'y6', 'y7', 'y8')]), 2) 51 | ``` 52 | 53 | 54 | * What are the correlations between the factors? 
55 | 56 | ```{r} 57 | cov2cor(inspect(m0_fit, "coefficients")$psi) 58 | ``` 59 | 60 | This certainly suggests that the factors are strongly related, especially the two democracy measures. 61 | 62 | 63 | ## M1: Correlated item measurement model 64 | This next model permits corresponding democracy measures from the two time points to be correlated. 65 | 66 | ```{r correlated_measurement_model} 67 | m1_model <- ' 68 | # measurement model 69 | ind60 =~ x1 + x2 + x3 70 | dem60 =~ y1 + y2 + y3 + y4 71 | dem65 =~ y5 + y6 + y7 + y8 72 | 73 | # correlated residuals 74 | y1 ~~ y5 75 | y2 ~~ y6 76 | y3 ~~ y7 77 | y4 ~~ y8 78 | ' 79 | 80 | m1_fit <- cfa(m1_model, data=Data) 81 | ``` 82 | 83 | * Is this an improvement over `m0` with uncorrelated indicators? 84 | * Does `m1` have good fit in and of itself? 85 | 86 | ```{r} 87 | anova(m0_fit, m1_fit) 88 | round(cbind(m0=inspect(m0_fit, 'fit.measures'), 89 | m1=inspect(m1_fit, 'fit.measures')), 3) 90 | ``` 91 | 92 | * It is a significant improvement. 93 | * RMSEA and other fit measures are substantially improved. 94 | * The relatively small sample size makes it difficult to judge how far model improvement should be pursued. The RMSEA suggests that further improvements are possible, but it is less clear how to proceed in a principled way. 95 | 96 | 97 | 98 | 99 | # M2: Basic SEM 100 | ```{r m2_model} 101 | m2_model <- ' 102 | # measurement model 103 | ind60 =~ x1 + x2 + x3 104 | dem60 =~ y1 + y2 + y3 + y4 105 | dem65 =~ y5 + y6 + y7 + y8 106 | 107 | # correlated residuals 108 | y1 ~~ y5 109 | y2 ~~ y6 110 | y3 ~~ y7 111 | y4 ~~ y8 112 | 113 | # regressions 114 | dem60 ~ ind60 115 | dem65 ~ ind60 + dem60 116 | ' 117 | 118 | m2_fit <- sem(m2_model, data=Data) 119 | ``` 120 | 121 | * Is the fit the same as for model 1, as I would expect? 122 | 123 | ```{r m2_chi_square_check} 124 | rbind(m1 = fitMeasures(m1_fit)[c('chisq', 'rmsea')], 125 | m2 = fitMeasures(m2_fit)[c('chisq', 'rmsea')]) 126 | ``` 127 | Yes, it is. 
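A side note on why the fit must be identical (my own observation, not from Rosseel's paper): `m1` estimates all covariances among the three latent factors, while `m2` replaces those covariances with three regression paths, so the structural part of `m2` is saturated and the two models are statistically equivalent. A quick base-R count of the free structural parameters:

```r
# m1 (CFA): every pair of the k = 3 factors gets a free covariance.
# m2 (SEM): those covariances are replaced by 3 regression paths
#   (dem60 ~ ind60; dem65 ~ ind60 + dem60).
# Equal counts mean the structural model is saturated, so overall
# model fit (chi-square, df, RMSEA, ...) is unchanged.
k <- 3
m1_structural_params <- choose(k, 2)  # factor covariances in m1
m2_structural_params <- 1 + 2         # paths into dem60 and into dem65
stopifnot(m1_structural_params == m2_structural_params)
```

This is why both models report 37 degrees of freedom above.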
128 | 129 | * Assuming democracy 1965 is the dependent variable, how can we get the information typically available in multiple regression output? 130 | * R-squared? 131 | * Unstandardised regression coefficients? 132 | * Standardised regression coefficients? 133 | * Standard errors, p-values, and confidence intervals on unstandardised coefficients? 134 | 135 | ```{r} 136 | # m2_fit <- sem(m2_model, data=Data) 137 | 138 | # r-square for dem-65 139 | inspect(m2_fit, 'r2')['dem65'] 140 | 141 | # Unstandardised regression coefficients 142 | inspect(m2_fit, 'coef')$beta['dem65', ] 143 | 144 | # Standardised regression coefficients 145 | subset(inspect(m2_fit, 'standardized'), lhs == 'dem65' & op == '~') 146 | 147 | # Just a guess, may not be correct: 148 | # coefs <- data.frame(coef=inspect(m2_fit, 'coef')$beta['dem65', ], 149 | # se=inspect(m2_fit, 'se')$beta['dem65', ]) 150 | # coefs$low95ci <- coefs$coef - coefs$se * 1.96 151 | # coefs$high95ci <- coefs$coef + coefs$se * 1.96 152 | ``` 153 | 154 | 155 | 156 | -------------------------------------------------------------------------------- /makefile: -------------------------------------------------------------------------------- 1 | 2 | pdf-all: 3 | Rscript 'convert.r' 4 | -------------------------------------------------------------------------------- /path-analysis/figure/unnamed-chunk-5.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jeromyanglim/lavaan-examples/d7f5cbdc7fe14ffd039512bae6aa140c2a0ca5e6/path-analysis/figure/unnamed-chunk-5.png -------------------------------------------------------------------------------- /path-analysis/path-analysis.md: -------------------------------------------------------------------------------- 1 | 2 | 3 | # Path Analysis Example 4 | 5 | 6 | ```r 7 | library(psych) 8 | library(lavaan) 9 | ``` 10 | 11 | 12 | 13 | 14 | 15 | ## Simulate data 16 | Let's simulate some data: 17 | 18 | * three orthogonal 
predictor variables 19 | * one mediator variable 20 | * one dependent variable 21 | 22 | 23 | 24 | ```r 25 | set.seed(1234) 26 | N <- 1000 27 | iv1 <- rnorm(N, 0, 1) 28 | iv2 <- rnorm(N, 0, 1) 29 | iv3 <- rnorm(N, 0, 1) 30 | mv <- rnorm(N, .2 * iv1 + -.2 * iv2 + .3 * iv3, 1) 31 | dv <- rnorm(N, .8 * mv, 1) 32 | data_1 <- data.frame(iv1, iv2, iv3, mv, dv) 33 | ``` 34 | 35 | 36 | 37 | 38 | ## Traditional examination of dataset 39 | * Is a regression consistent with the model? 40 | 41 | 42 | 43 | ```r 44 | summary(lm(mv ~ iv1 + iv2 + iv3, data_1)) 45 | ``` 46 | 47 | ``` 48 | ## 49 | ## Call: 50 | ## lm(formula = mv ~ iv1 + iv2 + iv3, data = data_1) 51 | ## 52 | ## Residuals: 53 | ## Min 1Q Median 3Q Max 54 | ## -3.0281 -0.6863 0.0114 0.6697 3.1412 55 | ## 56 | ## Coefficients: 57 | ## Estimate Std. Error t value Pr(>|t|) 58 | ## (Intercept) -0.00945 0.03150 -0.30 0.76 59 | ## iv1 0.19737 0.03163 6.24 6.4e-10 *** 60 | ## iv2 -0.19978 0.03216 -6.21 7.7e-10 *** 61 | ## iv3 0.29183 0.03113 9.38 < 2e-16 *** 62 | ## --- 63 | ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 64 | ## 65 | ## Residual standard error: 0.995 on 996 degrees of freedom 66 | ## Multiple R-squared: 0.144, Adjusted R-squared: 0.141 67 | ## F-statistic: 55.8 on 3 and 996 DF, p-value: <2e-16 68 | ## 69 | ``` 70 | 71 | 72 | 73 | 74 | The estimated coefficients are broadly similar to those used to simulate `mv`. 75 | 76 | 77 | 78 | ```r 79 | summary(lm(dv ~ iv1 + iv2 + iv3 + mv, data_1)) 80 | ``` 81 | 82 | ``` 83 | ## 84 | ## Call: 85 | ## lm(formula = dv ~ iv1 + iv2 + iv3 + mv, data = data_1) 86 | ## 87 | ## Residuals: 88 | ## Min 1Q Median 3Q Max 89 | ## -2.7484 -0.6547 -0.0359 0.6947 2.7185 90 | ## 91 | ## Coefficients: 92 | ## Estimate Std. Error t value Pr(>|t|) 93 | ## (Intercept) -0.0410 0.0308 -1.33 0.18 94 | ## iv1 -0.0449 0.0315 -1.43 0.15 95 | ## iv2 0.0400 0.0320 1.25 0.21 96 | ## iv3 0.0162 0.0317 0.51 0.61 97 | ## mv 0.8250 0.0309 26.66 <2e-16 *** 98 | ## --- 99 | ## Signif. 
codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 100 | ## 101 | ## Residual standard error: 0.972 on 995 degrees of freedom 102 | ## Multiple R-squared: 0.45, Adjusted R-squared: 0.448 103 | ## F-statistic: 204 on 4 and 995 DF, p-value: <2e-16 104 | ## 105 | ``` 106 | 107 | 108 | 109 | Given that the simulation is based on complete mediation, the true regression coefficients for the ivs are zero. The results of the multiple regression predicting the `dv` from the `iv`s and `mv` are consistent with this. 110 | 111 | What are the basic descriptive statistics and intercorrelations? 112 | 113 | 114 | 115 | ```r 116 | psych::describe(data_1) 117 | ``` 118 | 119 | ``` 120 | ## var n mean sd median trimmed mad min max range skew 121 | ## iv1 1 1000 -0.03 1.00 -0.04 -0.03 0.95 -3.40 3.20 6.59 -0.01 122 | ## iv2 2 1000 0.01 0.98 0.01 0.02 0.97 -3.12 3.17 6.29 -0.07 123 | ## iv3 3 1000 0.03 1.01 0.06 0.03 1.06 -3.09 3.02 6.12 0.01 124 | ## mv 4 1000 -0.01 1.07 0.02 -0.02 1.03 -3.12 3.82 6.94 0.07 125 | ## dv 5 1000 -0.05 1.31 -0.03 -0.04 1.37 -3.99 3.53 7.53 -0.03 126 | ## kurtosis se 127 | ## iv1 0.25 0.03 128 | ## iv2 -0.07 0.03 129 | ## iv3 -0.21 0.03 130 | ## mv 0.14 0.03 131 | ## dv -0.17 0.04 132 | ``` 133 | 134 | ```r 135 | pairs.panels(data_1, pch='.') 136 | ``` 137 | 138 | ![plot of chunk unnamed-chunk-5](figure/unnamed-chunk-5.png) 139 | 140 | 141 | 142 | ## M1 Fit Path Analysis model 143 | 144 | 145 | 146 | ```r 147 | m1_model <- ' 148 | dv ~ mv 149 | mv ~ iv1 + iv2 + iv3 150 | ' 151 | 152 | m1_fit <- sem(m1_model, data=data_1) 153 | ``` 154 | 155 | 156 | 157 | 158 | Are the regression coefficients the same? 
159 | 160 | 161 | 162 | ```r 163 | parameterestimates(m1_fit) 164 | ``` 165 | 166 | ``` 167 | ## lhs op rhs est se z pvalue ci.lower ci.upper 168 | ## 1 dv ~ mv 0.815 0.029 28.490 0 0.759 0.871 169 | ## 2 mv ~ iv1 0.197 0.032 6.253 0 0.136 0.259 170 | ## 3 mv ~ iv2 -0.200 0.032 -6.224 0 -0.263 -0.137 171 | ## 4 mv ~ iv3 0.292 0.031 9.394 0 0.231 0.353 172 | ## 5 dv ~~ dv 0.944 0.042 22.361 0 0.861 1.026 173 | ## 6 mv ~~ mv 0.986 0.044 22.361 0 0.900 1.073 174 | ## 7 iv1 ~~ iv1 0.994 0.000 NA NA 0.994 0.994 175 | ## 8 iv1 ~~ iv2 0.055 0.000 NA NA 0.055 0.055 176 | ## 9 iv1 ~~ iv3 0.016 0.000 NA NA 0.016 0.016 177 | ## 10 iv2 ~~ iv2 0.962 0.000 NA NA 0.962 0.962 178 | ## 11 iv2 ~~ iv3 -0.035 0.000 NA NA -0.035 -0.035 179 | ## 12 iv3 ~~ iv3 1.024 0.000 NA NA 1.024 1.024 180 | ``` 181 | 182 | 183 | 184 | 185 | All the coefficients are in the ballpark of what was expected. 186 | 187 | Does the model provide a good fit? 188 | 189 | 190 | ```r 191 | fitmeasures(m1_fit) 192 | ``` 193 | 194 | ``` 195 | ## chisq df pvalue baseline.chisq 196 | ## 3.654 3.000 0.301 753.212 197 | ## baseline.df baseline.pvalue cfi tli 198 | ## 7.000 0.000 0.999 0.998 199 | ## logl unrestricted.logl npar aic 200 | ## -7045.596 -7043.769 6.000 14103.191 201 | ## bic ntotal bic2 rmsea 202 | ## 14132.638 1000.000 14113.582 0.015 203 | ## rmsea.ci.lower rmsea.ci.upper rmsea.pvalue srmr 204 | ## 0.000 0.057 0.899 0.011 205 | ``` 206 | 207 | 208 | 209 | 210 | * The fitted model should provide a good fit because it is identical to the model used to simulate the data. 211 | * In this case, the p-value and the fit measures are consistent with the data being generated from the model specified. 
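As an aside (my own cross-check, not part of the original write-up): before turning to lavaan's `:=` machinery below, the standard error of an indirect effect `a*b` can be approximated by hand with the delta method (Sobel's formula). A base-R sketch using the estimates printed above for `mv ~ iv1` and `dv ~ mv`:

```r
# Delta-method (Sobel) approximation to the SE of an indirect effect a*b:
#   se_ab = sqrt(b^2 * se_a^2 + a^2 * se_b^2)
# Estimates copied from the parameterestimates(m1_fit) output above.
a <- 0.197; se_a <- 0.032   # mv ~ iv1
b <- 0.815; se_b <- 0.029   # dv ~ mv
ab    <- a * b
se_ab <- sqrt(b^2 * se_a^2 + a^2 * se_b^2)
round(c(est = ab, se = se_ab), 3)
# est 0.161, se 0.027
```

This agrees closely with the `iv1_mv` row that lavaan reports in the next section (est 0.161, se 0.026); the small difference in the SE comes from feeding in rounded estimates.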
212 | 213 | 214 | ## Calculate and test indirect effects 215 | 216 | 217 | ```r 218 | m2_model <- ' 219 | dv ~ b1*mv 220 | mv ~ a1*iv1 + a2*iv2 + a3*iv3 221 | 222 | # indirect effects 223 | iv1_mv := a1*b1 224 | iv2_mv := a2*b1 225 | iv3_mv := a3*b1 226 | ' 227 | 228 | m2_fit <- sem(m2_model, data=data_1) 229 | ``` 230 | 231 | 232 | 233 | 234 | * Note that I needed to name effects before I could define the indirect effect as the product of two effects using `:=` notation. 235 | 236 | 237 | 238 | 239 | ```r 240 | parameterestimates(m2_fit, standardize=TRUE) 241 | ``` 242 | 243 | ``` 244 | ## lhs op rhs label est se z pvalue ci.lower ci.upper 245 | ## 1 dv ~ mv b1 0.815 0.029 28.490 0 0.759 0.871 246 | ## 2 mv ~ iv1 a1 0.197 0.032 6.253 0 0.136 0.259 247 | ## 3 mv ~ iv2 a2 -0.200 0.032 -6.224 0 -0.263 -0.137 248 | ## 4 mv ~ iv3 a3 0.292 0.031 9.394 0 0.231 0.353 249 | ## 5 dv ~~ dv 0.944 0.042 22.361 0 0.861 1.026 250 | ## 6 mv ~~ mv 0.986 0.044 22.361 0 0.900 1.073 251 | ## 7 iv1 ~~ iv1 0.994 0.000 NA NA 0.994 0.994 252 | ## 8 iv1 ~~ iv2 0.055 0.000 NA NA 0.055 0.055 253 | ## 9 iv1 ~~ iv3 0.016 0.000 NA NA 0.016 0.016 254 | ## 10 iv2 ~~ iv2 0.962 0.000 NA NA 0.962 0.962 255 | ## 11 iv2 ~~ iv3 -0.035 0.000 NA NA -0.035 -0.035 256 | ## 12 iv3 ~~ iv3 1.024 0.000 NA NA 1.024 1.024 257 | ## 13 iv1_mv := a1*b1 iv1_mv 0.161 0.026 6.108 0 0.109 0.213 258 | ## 14 iv2_mv := a2*b1 iv2_mv -0.163 0.027 -6.081 0 -0.215 -0.110 259 | ## 15 iv3_mv := a3*b1 iv3_mv 0.238 0.027 8.922 0 0.186 0.290 260 | ## std.lv std.all std.nox 261 | ## 1 0.815 0.669 0.669 262 | ## 2 0.197 0.183 0.184 263 | ## 3 -0.200 -0.183 -0.186 264 | ## 4 0.292 0.275 0.272 265 | ## 5 0.944 0.552 0.552 266 | ## 6 0.986 0.856 0.856 267 | ## 7 0.994 1.000 0.994 268 | ## 8 0.055 0.057 0.055 269 | ## 9 0.016 0.015 0.016 270 | ## 10 0.962 1.000 0.962 271 | ## 11 -0.035 -0.035 -0.035 272 | ## 12 1.024 1.000 1.024 273 | ## 13 0.161 0.161 0.161 274 | ## 14 -0.163 -0.163 -0.163 275 | ## 15 0.238 0.238 0.238 276 | ``` 277 | 
278 | 279 | 280 | 281 | The above output provides significance tests, confidence intervals, and standardised estimates for the indirect effects. 282 | 283 | 284 | 285 | -------------------------------------------------------------------------------- /path-analysis/path-analysis.rmd: -------------------------------------------------------------------------------- 1 | `r opts_chunk$set(cache=TRUE, tidy=FALSE)` 2 | 3 | # Path Analysis Example 4 | ```{r, message=FALSE} 5 | library(psych) 6 | library(lavaan) 7 | 8 | ``` 9 | 10 | 11 | ## Simulate data 12 | Let's simulate some data: 13 | 14 | * three orthogonal predictor variables 15 | * one mediator variable 16 | * one dependent variable 17 | 18 | ```{r} 19 | set.seed(1234) 20 | N <- 1000 21 | iv1 <- rnorm(N, 0, 1) 22 | iv2 <- rnorm(N, 0, 1) 23 | iv3 <- rnorm(N, 0, 1) 24 | mv <- rnorm(N, .2 * iv1 + -.2 * iv2 + .3 * iv3, 1) 25 | dv <- rnorm(N, .8 * mv, 1) 26 | data_1 <- data.frame(iv1, iv2, iv3, mv, dv) 27 | ``` 28 | 29 | ## Traditional examination of dataset 30 | * Is a regression consistent with the model? 31 | 32 | ```{r} 33 | summary(lm(mv ~ iv1 + iv2 + iv3, data_1)) 34 | ``` 35 | 36 | The estimated coefficients are broadly similar to those used to simulate `mv`. 37 | 38 | ```{r} 39 | summary(lm(dv ~ iv1 + iv2 + iv3 + mv, data_1)) 40 | ``` 41 | Given that the simulation is based on complete mediation, the true regression coefficients for the ivs are zero. The results of the multiple regression predicting the `dv` from the `iv`s and `mv` are consistent with this. 42 | 43 | What are the basic descriptive statistics and intercorrelations? 44 | 45 | ```{r} 46 | psych::describe(data_1) 47 | pairs.panels(data_1, pch='.') 48 | ``` 49 | 50 | 51 | ## M1 Fit Path Analysis model 52 | 53 | ```{r} 54 | m1_model <- ' 55 | dv ~ mv 56 | mv ~ iv1 + iv2 + iv3 57 | ' 58 | 59 | m1_fit <- sem(m1_model, data=data_1) 60 | ``` 61 | 62 | Are the regression coefficients the same? 
63 | 64 | ```{r} 65 | parameterestimates(m1_fit) 66 | ``` 67 | 68 | All the coefficients are in the ballpark of what was expected. 69 | 70 | Does the model provide a good fit? 71 | ```{r} 72 | fitmeasures(m1_fit) 73 | ``` 74 | 75 | * The fitted model should provide a good fit because it is identical to the model used to simulate the data. 76 | * In this case, the p-value and the fit measures are consistent with the data being generated from the model specified. 77 | 78 | 79 | ## Calculate and test indirect effects 80 | ```{r} 81 | m2_model <- ' 82 | dv ~ b1*mv 83 | mv ~ a1*iv1 + a2*iv2 + a3*iv3 84 | 85 | # indirect effects 86 | iv1_mv := a1*b1 87 | iv2_mv := a2*b1 88 | iv3_mv := a3*b1 89 | ' 90 | 91 | m2_fit <- sem(m2_model, data=data_1) 92 | ``` 93 | 94 | * Note that I needed to name effects before I could define the indirect effect as the product of two effects using `:=` notation. 95 | 96 | 97 | ```{r} 98 | parameterestimates(m2_fit, standardize=TRUE) 99 | 100 | ``` 101 | 102 | The above output provides significance tests, confidence intervals, and standardised estimates for the indirect effects. 103 | 104 | 105 | 106 | --------------------------------------------------------------------------------
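A closing sketch (my addition, not one of the repository's files): the delta-method confidence intervals that lavaan reports for the indirect effects can be cross-checked with a nonparametric bootstrap. This base-R version re-simulates the same data as `path-analysis` and bootstraps the `iv1 -> mv -> dv` effect with plain `lm()` fits, so it runs without lavaan; the helper name `indirect_iv1` is mine.

```r
# Bootstrap the iv1 -> mv -> dv indirect effect (a * b) by resampling
# rows and refitting the path model's two regressions with lm().
set.seed(1234)
N <- 1000
iv1 <- rnorm(N, 0, 1)
iv2 <- rnorm(N, 0, 1)
iv3 <- rnorm(N, 0, 1)
mv <- rnorm(N, .2 * iv1 + -.2 * iv2 + .3 * iv3, 1)
dv <- rnorm(N, .8 * mv, 1)
data_1 <- data.frame(iv1, iv2, iv3, mv, dv)

indirect_iv1 <- function(d) {
  a <- coef(lm(mv ~ iv1 + iv2 + iv3, data = d))["iv1"]  # a path
  b <- coef(lm(dv ~ mv, data = d))["mv"]                # b path
  unname(a * b)
}

boot_est <- replicate(500, indirect_iv1(data_1[sample(N, replace = TRUE), ]))
round(c(est = indirect_iv1(data_1), quantile(boot_est, c(.025, .975))), 3)
```

With the same seed as the original simulation, the point estimate should be close to the `iv1_mv` value lavaan reports (about 0.16), and the percentile interval should bracket it.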