├── .gitignore
├── README.md
├── cfa-example
│   ├── cfa-example.html
│   ├── cfa-example.md
│   ├── cfa-example.rmd
│   └── figure
│       └── unnamed-chunk-19.png
├── cheat-sheet-lavaan
│   ├── cheat-sheet-lavaan.html
│   ├── cheat-sheet-lavaan.md
│   └── cheat-sheet-lavaan.rmd
├── convert.r
├── ex1-paper
│   ├── ex1-paper.html
│   ├── ex1-paper.md
│   └── ex1-paper.rmd
├── ex2-paper
│   ├── ex2-paper.html
│   ├── ex2-paper.md
│   └── ex2-paper.rmd
├── makefile
└── path-analysis
    ├── figure
    │   └── unnamed-chunk-5.png
    ├── path-analysis.html
    ├── path-analysis.md
    └── path-analysis.rmd
/.gitignore:
--------------------------------------------------------------------------------
1 | *cache*
2 | .build*
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | This repository shares a few example analyses using `lavaan`, an R package for structural equation modelling.
2 |
3 | I've just been creating these examples to teach myself how to use the software. Feel free to re-use the code, but I make no guarantee as to the accuracy or validity of these analyses.
--------------------------------------------------------------------------------
/cfa-example/cfa-example.md:
--------------------------------------------------------------------------------
1 | # CFA Example
2 |
3 |
4 |
5 | ```r
6 | library(psych)
7 | library(lavaan)
8 | Data <- bfi
9 | item_names <- names(Data)[1:25]
10 | ```
11 |
12 |
13 |
14 |
15 | ## Check data
16 |
17 |
18 |
19 | ```r
20 | sapply(Data[, item_names], function(X) sum(is.na(X)))
21 | ```
22 |
23 | ```
24 | ## A1 A2 A3 A4 A5 C1 C2 C3 C4 C5 E1 E2 E3 E4 E5 N1 N2 N3 N4 N5 O1 O2 O3 O4 O5
25 | ## 16 27 26 19 16 21 24 20 26 16 23 16 25 9 21 22 21 11 36 29 22 0 28 14 20
26 | ```
27 |
28 | ```r
29 |
30 | Data$item_na <- apply(Data[, item_names], 1, function(X) sum(is.na(X)) >
31 | 0)
32 |
33 | table(Data$item_na)
34 | ```
35 |
36 | ```
37 | ##
38 | ## FALSE TRUE
39 | ## 2436 364
40 | ```
41 |
42 | ```r
43 | Data <- Data[!Data$item_na, ]
44 | ```
45 |
46 |
47 |
48 |
49 | * I decided to remove cases with missing data to simplify subsequent exploration of the features of the lavaan software.
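* An alternative worth noting (a sketch only, not run in this document): lavaan can retain the incomplete cases using full-information maximum likelihood via its `missing` argument, so listwise deletion is a convenience here rather than a necessity.

```r
# Sketch (assumption: the five-factor model syntax `m1_model` defined in
# the next section). `missing = "ml"` requests case-wise (full-information)
# ML, keeping rows with missing item responses in the estimation.
m1_fit_fiml <- cfa(m1_model, data = bfi[, item_names], missing = "ml")
```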
50 | 51 | 52 | ## Basic CFA 53 | 54 | 55 | ```r 56 | m1_model <- ' N =~ N1 + N2 + N3 + N4 + N5 57 | E =~ E1 + E2 + E3 + E4 + E5 58 | O =~ O1 + O2 + O3 + O4 + O5 59 | A =~ A1 + A2 + A3 + A4 + A5 60 | C =~ C1 + C2 + C3 + C4 + C5 61 | ' 62 | 63 | m1_fit <- cfa(m1_model, data=Data[, item_names]) 64 | summary(m1_fit, standardized=TRUE) 65 | ``` 66 | 67 | ``` 68 | ## lavaan (0.4-14) converged normally after 63 iterations 69 | ## 70 | ## Number of observations 2436 71 | ## 72 | ## Estimator ML 73 | ## Minimum Function Chi-square 4165.467 74 | ## Degrees of freedom 265 75 | ## P-value 0.000 76 | ## 77 | ## Parameter estimates: 78 | ## 79 | ## Information Expected 80 | ## Standard Errors Standard 81 | ## 82 | ## Estimate Std.err Z-value P(>|z|) Std.lv Std.all 83 | ## Latent variables: 84 | ## N =~ 85 | ## N1 1.000 1.300 0.825 86 | ## N2 0.947 0.024 39.899 0.000 1.230 0.803 87 | ## N3 0.884 0.025 35.919 0.000 1.149 0.721 88 | ## N4 0.692 0.025 27.753 0.000 0.899 0.573 89 | ## N5 0.628 0.026 24.027 0.000 0.816 0.503 90 | ## E =~ 91 | ## E1 1.000 0.920 0.564 92 | ## E2 1.226 0.051 23.899 0.000 1.128 0.699 93 | ## E3 -0.921 0.041 -22.431 0.000 -0.847 -0.627 94 | ## E4 -1.121 0.047 -23.977 0.000 -1.031 -0.703 95 | ## E5 -0.808 0.039 -20.648 0.000 -0.743 -0.553 96 | ## O =~ 97 | ## O1 1.000 0.635 0.564 98 | ## O2 -1.020 0.068 -14.962 0.000 -0.648 -0.418 99 | ## O3 1.373 0.072 18.942 0.000 0.872 0.724 100 | ## O4 0.437 0.048 9.160 0.000 0.277 0.233 101 | ## O5 -0.960 0.060 -16.056 0.000 -0.610 -0.461 102 | ## A =~ 103 | ## A1 1.000 0.484 0.344 104 | ## A2 -1.579 0.108 -14.650 0.000 -0.764 -0.648 105 | ## A3 -2.030 0.134 -15.093 0.000 -0.983 -0.749 106 | ## A4 -1.564 0.115 -13.616 0.000 -0.757 -0.510 107 | ## A5 -1.804 0.121 -14.852 0.000 -0.873 -0.687 108 | ## C =~ 109 | ## C1 1.000 0.680 0.551 110 | ## C2 1.148 0.057 20.152 0.000 0.781 0.592 111 | ## C3 1.036 0.054 19.172 0.000 0.705 0.546 112 | ## C4 -1.421 0.065 -21.924 0.000 -0.967 -0.702 113 | ## C5 -1.489 0.072 -20.694 0.000 -1.013 -0.620 114 | ## 115 | ## Covariances: 116 | ## N ~~ 117 | ## E 0.292 0.032 9.131 0.000 0.244 0.244 118 | ## O -0.093 0.022 -4.138 0.000 -0.112 -0.112 119 | ## A 0.141 0.018 7.713 0.000 0.223 0.223 120 | ## C -0.250 0.025 -10.118 0.000 -0.283 -0.283 121 | ## E ~~ 122 | ## O -0.265 0.021 -12.347 0.000 -0.453 -0.453 123 | ## A 0.304 0.025 12.293 0.000 0.683 0.683 124 | ## C -0.224 0.020 -11.121 0.000 -0.357 -0.357 125 | ## O ~~ 126 | ## A -0.093 0.011 -8.446 0.000 -0.303 -0.303 127 | ## C 0.130 0.014 9.190 0.000 0.301 0.301 128 | ## A ~~ 129 | ## C -0.110 0.012 -9.254 0.000 -0.334 -0.334 130 | ## 131 | ## Variances: 132 | ## N1 0.793 0.037 0.793 0.320 133 | ## N2 0.836 0.036 0.836 0.356 134 | ## N3 1.222 0.043 1.222 0.481 135 | ## N4 1.654 0.052 1.654 0.672 136 | ## N5 1.969 0.060 1.969 0.747 137 | ## E1 1.814 0.058 1.814 0.682 138 | ## E2 1.332 0.049 1.332 0.512 139 | ## E3 1.108 0.038 1.108 0.607 140 | ## E4 1.088 0.041 1.088 0.506 141 | ## E5 1.251 0.040 1.251 0.694 142 | ## O1 0.865 0.032 0.865 0.682 143 | ## O2 1.990 0.063 1.990 0.826 144 | ## O3 0.691 0.039 0.691 0.476 145 | ## O4 1.346 0.040 1.346 0.946 146 | ## O5 1.380 0.045 1.380 0.788 147 | ## A1 1.745 0.052 1.745 0.882 148 | ## A2 0.807 0.028 0.807 0.580 149 | ## A3 0.754 0.032 0.754 0.438 150 | ## A4 1.632 0.051 1.632 0.740 151 | ## A5 0.852 0.032 0.852 0.528 152 | ## C1 1.063 0.035 1.063 0.697 153 | ## C2 1.130 0.039 1.130 0.650 154 | ## C3 1.170 0.039 1.170 0.702 155 | ## C4 0.960 0.040 0.960 0.507 156 | ## C5 1.640 0.059 1.640 0.615 157 | ## N 1.689 0.073 
1.000 1.000
158 | ## E 0.846 0.062 1.000 1.000
159 | ## O 0.404 0.033 1.000 1.000
160 | ## A 0.234 0.030 1.000 1.000
161 | ## C 0.463 0.036 1.000 1.000
162 | ##
163 | ```
164 |
165 |
166 |
167 |
168 | * **`Std.lv`**: Only latent variables have been standardized.
169 | * **`Std.all`**: Observed and latent variables have been standardized.
170 | * **Factor loadings**: Under the `latent variables` section, the `Std.all` column provides standardised factor loadings.
171 | * **Factor correlations**: Under the `Covariances` section, the `Std.all` column provides standardised factor correlations.
172 | * **`Variances`**: Latent factor variances can be constrained to 1 for identifiability, but in this case the first loading on each factor was constrained to one instead. Variances for items represent the variance not explained by the latent factor.
173 |
174 |
175 |
176 |
177 |
178 | ```r
179 | variances <- c(unique = subset(inspect(m1_fit, "standardizedsolution"),
180 | lhs == "N1" & rhs == "N1")[, "est.std"], common = subset(inspect(m1_fit,
181 | "standardizedsolution"), lhs == "N" & rhs == "N1")[, "est.std"]^2)
182 | (variances <- c(variances, total = sum(variances)))
183 | ```
184 |
185 | ```
186 | ## unique common total
187 | ## 0.3195 0.6805 1.0000
188 | ```
189 |
190 |
191 |
192 |
193 | * The output above illustrates the point about variances. Variance for each item is explained by either the common factor or by error variance. As there is just one latent factor loading on the item, the squared standardised coefficient is the variance explained by the common factor. The sum of the unique and common standardised variances is one, which naturally corresponds to the variance of a standardised variable.
194 | * The code also demonstrates ideas about how to extract specific information from the lavaan model fit object. Specifically, the `inspect` method provides access to a wide range of specific information. See help for further details.
195 | * I used the `subset` method to provide an easy one-liner for extracting elements from the data frame returned by the `inspect` method.
196 |
197 |
198 |
199 | ```r
200 | variances <- c(N1_N1 = subset(parameterestimates(m1_fit), lhs ==
201 | "N1" & rhs == "N1")[, "est"], N_N = subset(parameterestimates(m1_fit), lhs ==
202 | "N" & rhs == "N")[, "est"], N_N1 = subset(parameterestimates(m1_fit), lhs ==
203 | "N" & rhs == "N1")[, "est"])
204 |
205 | cbind(parameters = c(variances, total = variances["N_N1"] * variances["N_N"] +
206 | variances["N1_N1"], raw_divide_by_n_minus_1 = var(Data[, "N1"]), raw_divide_by_n = mean((Data[,
207 | "N1"] - mean(Data[, "N1"]))^2)))
208 | ```
209 |
210 | ```
211 | ## parameters
212 | ## N1_N1 0.7932
213 | ## N_N 1.6893
214 | ## N_N1 1.0000
215 | ## total.N_N1 2.4825
216 | ## raw_divide_by_n_minus_1 2.4835
217 | ## raw_divide_by_n 2.4825
218 | ```
219 |
220 |
221 |
222 |
223 | * The output above shows the unstandardised parameters related to the item `N1`.
224 | * `N1_N1` corresponds to the unstandardised unique variance for the item.
225 | * `N_N` times `N_N1` represents the unstandardised common variance (strictly, the loading squared times the factor variance; this works here because the `N_N1` loading is fixed to one).
226 | * Thus, the sum of the unique and common variance represents the total variance.
227 | * When I calculated this on the raw data using the standard $n-1$ denominator, the value was slightly larger, but when I used $n$ as the denominator, the estimate was very close.
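To make the last point concrete, the $n$ versus $n-1$ relationship can be checked directly (a small sketch using base R only):

```r
# ML estimation divides by n rather than n - 1, so rescaling the
# unbiased var() estimate should reproduce the model-based total variance
n <- nrow(Data)
var(Data[, "N1"]) * (n - 1) / n
```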
228 | 229 | 230 | 231 | ## Compare with a single factor model 232 | 233 | 234 | ```r 235 | m2_model <- ' G =~ N1 + N2 + N3 + N4 + N5 236 | + E1 + E2 + E3 + E4 + E5 237 | + O1 + O2 + O3 + O4 + O5 238 | + A1 + A2 + A3 + A4 + A5 239 | + C1 + C2 + C3 + C4 + C5 240 | ' 241 | 242 | m2_fit <- cfa(m2_model, data=Data[, item_names]) 243 | summary(m2_fit, standardized=TRUE) 244 | ``` 245 | 246 | ``` 247 | ## lavaan (0.4-14) converged normally after 55 iterations 248 | ## 249 | ## Number of observations 2436 250 | ## 251 | ## Estimator ML 252 | ## Minimum Function Chi-square 10673.239 253 | ## Degrees of freedom 275 254 | ## P-value 0.000 255 | ## 256 | ## Parameter estimates: 257 | ## 258 | ## Information Expected 259 | ## Standard Errors Standard 260 | ## 261 | ## Estimate Std.err Z-value P(>|z|) Std.lv Std.all 262 | ## Latent variables: 263 | ## G =~ 264 | ## N1 1.000 0.547 0.347 265 | ## N2 0.959 0.081 11.809 0.000 0.524 0.342 266 | ## N3 0.960 0.083 11.547 0.000 0.525 0.329 267 | ## N4 1.375 0.099 13.919 0.000 0.752 0.479 268 | ## N5 0.884 0.081 10.860 0.000 0.484 0.298 269 | ## E1 1.332 0.099 13.509 0.000 0.728 0.447 270 | ## E2 1.868 0.122 15.297 0.000 1.022 0.633 271 | ## E3 -1.382 0.094 -14.730 0.000 -0.756 -0.559 272 | ## E4 -1.702 0.111 -15.307 0.000 -0.931 -0.635 273 | ## E5 -1.292 0.090 -14.425 0.000 -0.707 -0.526 274 | ## O1 -0.656 0.058 -11.321 0.000 -0.359 -0.318 275 | ## O2 0.444 0.067 6.641 0.000 0.243 0.156 276 | ## O3 -0.877 0.068 -12.801 0.000 -0.479 -0.398 277 | ## O4 0.142 0.048 2.930 0.003 0.078 0.065 278 | ## O5 0.416 0.058 7.196 0.000 0.228 0.172 279 | ## A1 0.568 0.065 8.797 0.000 0.311 0.221 280 | ## A2 -1.032 0.074 -13.913 0.000 -0.565 -0.479 281 | ## A3 -1.322 0.090 -14.663 0.000 -0.723 -0.552 282 | ## A4 -1.172 0.088 -13.307 0.000 -0.641 -0.432 283 | ## A5 -1.413 0.093 -15.123 0.000 -0.773 -0.608 284 | ## C1 -0.705 0.063 -11.188 0.000 -0.386 -0.312 285 | ## C2 -0.725 0.066 -10.923 0.000 -0.396 -0.301 286 | ## C3 -0.682 0.064 -10.645 0.000 -0.373 -0.289 287 | ## C4 1.009 0.079 12.852 0.000 0.552 0.401 288 | ## C5 1.332 0.099 13.505 0.000 0.728 0.446 289 | ## 290 | ## Variances: 291 | ## N1 2.183 0.064 2.183 0.880 292 | ## N2 2.075 0.061 2.075 0.883 293 | ## N3 2.267 0.066 2.267 0.892 294 | ## N4 1.897 0.057 1.897 0.770 295 | ## N5 2.401 0.070 2.401 0.911 296 | ## E1 2.130 0.064 2.130 0.801 297 | ## E2 1.560 0.050 1.560 0.599 298 | ## E3 1.255 0.039 1.255 0.687 299 | ## E4 1.284 0.042 1.284 0.597 300 | ## E5 1.304 0.040 1.304 0.723 301 | ## O1 1.140 0.033 1.140 0.899 302 | ## O2 2.351 0.068 2.351 0.976 303 | ## O3 1.222 0.036 1.222 0.842 304 | ## O4 1.417 0.041 1.417 0.996 305 | ## O5 1.701 0.049 1.701 0.970 306 | ## A1 1.883 0.054 1.883 0.951 307 | ## A2 1.072 0.032 1.072 0.771 308 | ## A3 1.196 0.037 1.196 0.696 309 | ## A4 1.794 0.053 1.794 0.814 310 | ## A5 1.017 0.032 1.017 0.630 311 | ## C1 1.376 0.040 1.376 0.902 312 | ## C2 1.582 0.046 1.582 0.910 313 | ## C3 1.528 0.044 1.528 0.917 314 | ## C4 1.590 0.047 1.590 0.839 315 | ## C5 2.134 0.064 2.134 0.801 316 | ## G 0.299 0.037 1.000 1.000 317 | ## 318 | ``` 319 | 320 | 321 | 322 | 323 | 324 | 325 | ```r 326 | round(cbind(m1 = inspect(m1_fit, "fit.measures"), m2 = inspect(m2_fit, 327 | "fit.measures")), 3) 328 | ``` 329 | 330 | ``` 331 | ## m1 m2 332 | ## chisq 4165.467 1.067e+04 333 | ## df 265.000 2.750e+02 334 | ## pvalue 0.000 0.000e+00 335 | ## baseline.chisq 18222.116 1.822e+04 336 | ## baseline.df 300.000 3.000e+02 337 | ## baseline.pvalue 0.000 0.000e+00 338 | ## cfi 0.782 4.200e-01 339 | ## tli 0.754 
3.670e-01
340 | ## logl -99840.238 -1.031e+05
341 | ## unrestricted.logl -97757.504 -9.776e+04
342 | ## npar 60.000 5.000e+01
343 | ## aic 199800.476 2.063e+05
344 | ## bic 200148.363 2.066e+05
345 | ## ntotal 2436.000 2.436e+03
346 | ## bic2 199957.729 2.064e+05
347 | ## rmsea 0.078 1.250e-01
348 | ## rmsea.ci.lower 0.076 1.230e-01
349 | ## rmsea.ci.upper 0.080 1.270e-01
350 | ## rmsea.pvalue 0.000 0.000e+00
351 | ## srmr 0.075 1.160e-01
352 | ```
353 |
354 | ```r
355 | anova(m1_fit, m2_fit)
356 | ```
357 |
358 | ```
359 | ## Chi Square Difference Test
360 | ##
361 | ## Df AIC BIC Chisq Chisq diff Df diff Pr(>Chisq)
362 | ## m1_fit 265 199800 200148 4165
363 | ## m2_fit 275 206288 206578 10673 6508 10 <2e-16 ***
364 | ## ---
365 | ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
366 | ```
367 |
368 |
369 |
370 |
371 | * The output compares the model fit statistics for the two models.
372 | * It also performs a chi-square difference test, which shows that the one-factor model has significantly worse fit than the five-factor model.
373 |
374 |
375 | ## Modification indices
376 |
377 |
378 | ```r
379 | m1_mod <- modificationindices(m1_fit)
380 | m1_mod_summary <- subset(m1_mod, mi > 100)
381 | m1_mod_summary[order(m1_mod_summary$mi, decreasing = TRUE), ]
382 | ```
383 |
384 | ```
385 | ## lhs op rhs mi epc sepc.lv sepc.all sepc.nox
386 | ## 1 N1 ~~ N2 418.8 0.841 0.841 0.348 0.348
387 | ## 2 E =~ N4 200.8 0.487 0.448 0.285 0.285
388 | ## 3 O =~ E3 153.7 0.672 0.427 0.316 0.316
389 | ## 4 N3 ~~ N4 134.1 0.403 0.403 0.161 0.161
390 | ## 5 O =~ E4 122.6 -0.636 -0.404 -0.276 -0.276
391 | ## 6 C =~ E5 121.5 0.504 0.343 0.255 0.255
392 | ## 7 E =~ O3 114.2 -0.429 -0.395 -0.328 -0.328
393 | ## 8 E =~ O4 113.9 0.372 0.343 0.287 0.287
394 | ## 9 N =~ C5 108.8 0.271 0.352 0.216 0.216
395 | ## 10 E =~ A5 108.6 -0.488 -0.449 -0.354 -0.354
396 | ## 11 N =~ C2 107.0 0.219 0.285 0.216 0.216
397 | ## 12 C1 ~~ C2 107.0 0.288 0.288 0.177 0.177
398 | ## 13 E2 ~~ O4 104.7 0.310 0.310 0.161 0.161
399 | ## 14 A1 ~~ A2 101.4 -0.276 -0.276 -0.166 -0.166
400 | ```
401 |
402 |
403 |
404 |
405 | * `modificationindices` suggests several ad hoc modifications that could be made to improve the fit of the model.
406 | * The largest index suggests that items `N1` and `N2` share common variance. If we look at the help file on the bfi dataset `?bfi`, we see that the text for `N1` ("Get angry easily") and `N2` ("Get irritated easily") are very similar.
407 |
408 |
409 |
410 | ```r
411 | (N_cors <- round(cor(Data[, paste0("N", 1:5)]), 2))
412 | ```
413 |
414 | ```
415 | ## N1 N2 N3 N4 N5
416 | ## N1 1.00 0.72 0.57 0.41 0.38
417 | ## N2 0.72 1.00 0.55 0.39 0.35
418 | ## N3 0.57 0.55 1.00 0.52 0.43
419 | ## N4 0.41 0.39 0.52 1.00 0.40
420 | ## N5 0.38 0.35 0.43 0.40 1.00
421 | ```
422 |
423 | ```r
424 | N1_N2_corr <- N_cors["N1", "N2"]
425 | other_N_corrs <- round(mean(abs(N_cors[lower.tri(N_cors)][-1])),
426 | 2)
427 | ```
428 |
429 |
430 |
431 |
432 | * The correlation matrix also shows that the correlation between N1 and N2 ($r = 0.72$) is much larger than it is for the other variables ($\text{mean}(|r|) = 0.44$).
433 |
434 | ## Various matrices
435 | ### Observed, fitted, and residual covariance matrices
436 | The following analysis extracts observed, fitted, and residual covariances and checks that they are consistent with expectations. I only do this for the five neuroticism items rather than the full 25-item set, to make their meaning easier to demonstrate.
437 |
438 |
439 |
440 | ```r
441 | N_names <- paste0("N", 1:5)
442 | N_matrices <- list(observed = inspect(m1_fit, "sampstat")$cov[N_names,
443 | N_names], fitted = fitted(m1_fit)$cov[N_names, N_names], residual = resid(m1_fit)$cov[N_names,
444 | N_names])
445 |
446 | N_matrices$check <- N_matrices$observed - (N_matrices$fitted + N_matrices$residual)
447 | lapply(N_matrices, function(X) round(X, 3))
448 | ```
449 |
450 | ```
451 | ## $observed
452 | ## N1 N2 N3 N4 N5
453 | ## N1 2.482 1.735 1.425 1.013 0.973
454 | ## N2 1.735 2.350 1.344 0.950 0.873
455 | ## N3 1.425 1.344 2.542 1.309 1.114
456 | ## N4 1.013 0.950 1.309 2.463 1.026
457 | ## N5 0.973 0.873 1.114 1.026 2.635
458 | ##
459 | ## $fitted
460 | ## N1 N2 N3 N4 N5
461 | ## N1 2.482 1.599 1.493 1.169 1.061
462 | ## N2 1.599 2.350 1.414 1.106 1.004
463 | ## N3 1.493 1.414 2.542 1.033 0.937
464 | ## N4 1.169 1.106 1.033 2.463 0.734
465 | ## N5 1.061 1.004 0.937 0.734 2.635
466 | ##
467 | ## $residual
468 | ## N1 N2 N3 N4 N5
469 | ## N1 0.000 0.135 -0.068 -0.155 -0.087
470 | ## N2 0.135 0.000 -0.069 -0.157 -0.131
471 | ## N3 -0.068 -0.069 0.000 0.276 0.177
472 | ## N4 -0.155 -0.157 0.276 0.000 0.293
473 | ## N5 -0.087 -0.131 0.177 0.293 0.000
474 | ##
475 | ## $check
476 | ## N1 N2 N3 N4 N5
477 | ## N1 0 0 0 0 0
478 | ## N2 0 0 0 0 0
479 | ## N3 0 0 0 0 0
480 | ## N4 0 0 0 0 0
481 | ## N5 0 0 0 0 0
482 | ##
483 | ```
484 |
485 |
486 |
487 |
488 | * The observed covariance matrix was extracted using the `cov` function on the sample data.
489 | * The fitted covariance matrix can be extracted using the `fitted` method on the model fit object and then extracting the `cov` element.
490 | * Many symmetric matrices in lavaan are of class `lavaan.matrix.symmetric`. This hides the upper triangle of the matrix and formats the matrix to `nd` decimal places.
491 | Run `getAnywhere(print.lavaan.matrix.symmetric)` to see more details.
492 | * The `sampstat` option in the `inspect` method can be used to extract the sample covariance matrix. This is similar, but not exactly the same as running `cov` on the sample data.
493 | * The `resid` method can be used to extract the residual covariance matrix.
494 | * I then create a `check` that `observed = fitted + residual`, which it does (the `check` matrix is all zeros).
495 |
496 | ### Observed, fitted, and residual correlation matrices
497 | I often find it more meaningful to examine observed, fitted, and residual correlation matrices. Standardisation often makes it easier to understand the real magnitude of any residual.
498 |
499 |
500 |
501 | ```r
502 | N_names <- paste0("N", 1:5)
503 | N_cov <- list(observed = inspect(m1_fit, "sampstat")$cov[N_names,
504 | N_names], fitted = fitted(m1_fit)$cov[N_names, N_names])
505 |
506 | N_cor <- list(observed = cov2cor(N_cov$observed), fitted = cov2cor(N_cov$fitted))
507 |
508 | N_cor$residual <- N_cor$observed - N_cor$fitted
509 |
510 | lapply(N_cor, function(X) round(X, 2))
511 | ```
512 |
513 | ```
514 | ## $observed
515 | ## N1 N2 N3 N4 N5
516 | ## N1 1.00 0.72 0.57 0.41 0.38
517 | ## N2 0.72 1.00 0.55 0.39 0.35
518 | ## N3 0.57 0.55 1.00 0.52 0.43
519 | ## N4 0.41 0.39 0.52 1.00 0.40
520 | ## N5 0.38 0.35 0.43 0.40 1.00
521 | ##
522 | ## $fitted
523 | ## N1 N2 N3 N4 N5
524 | ## N1 1.00 0.66 0.59 0.47 0.41
525 | ## N2 0.66 1.00 0.58 0.46 0.40
526 | ## N3 0.59 0.58 1.00 0.41 0.36
527 | ## N4 0.47 0.46 0.41 1.00 0.29
528 | ## N5 0.41 0.40 0.36 0.29 1.00
529 | ##
530 | ## $residual
531 | ## N1 N2 N3 N4 N5
532 | ## N1 0.00 0.06 -0.03 -0.06 -0.03
533 | ## N2 0.06 0.00 -0.03 -0.07 -0.05
534 | ## N3 -0.03 -0.03 0.00 0.11 0.07
535 | ## N4 -0.06 -0.07 0.11 0.00 0.11
536 | ## N5 -0.03 -0.05 0.07 0.11 0.00
537 | ##
538 | ```
539 |
540 |
541 |
542 |
543 | * `cov2cor` is a `base` R function that scales a covariance matrix into a correlation matrix.
544 | * Fitted and observed correlation matrices can be obtained by running `cov2cor` on the corresponding covariance matrices.
545 | * The residual correlation matrix can be obtained by subtracting the fitted correlation matrix from the observed correlation matrix.
546 | * In this case we can see that certain pairs of items correlate more or less than other pairs. In particular `N1-N2`, `N3-N4`, `N4-N5` have positive correlation residuals. An examination of the items below may suggest some added degree of similarity between these pairs of items. For example, N1 and N2 both concern anger and irritation, whereas N3 and N4 both concern mood and affect.
547 |
548 |
549 | > N1: Get angry easily. (q_952)
550 | > N2: Get irritated easily. (q_974)
551 | > N3: Have frequent mood swings. (q_1099)
552 | > N4: Often feel blue. (q_1479)
553 | > N5: Panic easily. (q_1505)
554 |
555 | ## Uncorrelated factors
556 | ### All uncorrelated factors
557 | The following examines a model with uncorrelated factors.
558 |
559 |
560 |
561 | ```r
562 | m3_model <- ' N =~ N1 + N2 + N3 + N4 + N5
563 | E =~ E1 + E2 + E3 + E4 + E5
564 | O =~ O1 + O2 + O3 + O4 + O5
565 | A =~ A1 + A2 + A3 + A4 + A5
566 | C =~ C1 + C2 + C3 + C4 + C5
567 | '
568 |
569 | m3_fit <- cfa(m3_model, data=Data[, item_names], orthogonal=TRUE)
570 |
571 | round(cbind(m1=inspect(m1_fit, 'fit.measures'),
572 | m3=inspect(m3_fit, 'fit.measures')), 3)
573 | ```
574 |
575 | ```
576 | ## m1 m3
577 | ## chisq 4165.467 5.640e+03
578 | ## df 265.000 2.750e+02
579 | ## pvalue 0.000 0.000e+00
580 | ## baseline.chisq 18222.116 1.822e+04
581 | ## baseline.df 300.000 3.000e+02
582 | ## baseline.pvalue 0.000 0.000e+00
583 | ## cfi 0.782 7.010e-01
584 | ## tli 0.754 6.730e-01
585 | ## logl -99840.238 -1.006e+05
586 | ## unrestricted.logl -97757.504 -9.776e+04
587 | ## npar 60.000 5.000e+01
588 | ## aic 199800.476 2.013e+05
589 | ## bic 200148.363 2.015e+05
590 | ## ntotal 2436.000 2.436e+03
591 | ## bic2 199957.729 2.014e+05
592 | ## rmsea 0.078 8.900e-02
593 | ## rmsea.ci.lower 0.076 8.700e-02
594 | ## rmsea.ci.upper 0.080 9.200e-02
595 | ## rmsea.pvalue 0.000 0.000e+00
596 | ## srmr 0.075 1.380e-01
597 | ```
598 |
599 | ```r
600 | anova(m1_fit, m3_fit)
601 | ```
602 |
603 | ```
604 | ## Chi Square Difference Test
605 | ##
606 | ## Df AIC BIC Chisq Chisq diff Df diff Pr(>Chisq)
607 | ## m1_fit 265 199800 200148 4165
608 | ## m3_fit 275 201255 201545 5640 1474 10 <2e-16 ***
609 | ## ---
610 | ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
611 | ```
612 |
613 | ```r
614 |
615 | rmsea_m1 <- round(inspect(m1_fit, 'fit.measures')['rmsea'], 3)
616 | rmsea_m3 <- round(inspect(m3_fit, 'fit.measures')['rmsea'], 3)
617 | ```
618 |
619 |
620 |
621 |
622 | * To convert a `cfa` model from one that permits factors to be correlated to one that constrains factors to be uncorrelated, just specify `orthogonal=TRUE`.
623 | * In this case constraining the factor covariances to all be zero led to a significant reduction in fit. This poorer fit can also be seen in measures like RMSEA (m1 =
624 | `0.078`; m3 = `0.089`).
625 |
626 |
627 | ### Correlations and covariances between factors
628 | It is useful to be able to extract correlations and covariances between factors.
629 |
630 |
631 |
632 | ```r
633 | inspect(m1_fit, "coefficients")$psi
634 | ```
635 |
636 | ```
637 | ## N E O A C
638 | ## N 1.689
639 | ## E 0.292 0.846
640 | ## O -0.093 -0.265 0.404
641 | ## A 0.141 0.304 -0.093 0.234
642 | ## C -0.250 -0.224 0.130 -0.110 0.463
643 | ```
644 |
645 | ```r
646 | cov2cor(inspect(m1_fit, "coefficients")$psi)
647 | ```
648 |
649 | ```
650 | ## N E O A C
651 | ## N 1.000
652 | ## E 0.244 1.000
653 | ## O -0.112 -0.453 1.000
654 | ## A 0.223 0.683 -0.303 1.000
655 | ## C -0.283 -0.357 0.301 -0.334 1.000
656 | ```
657 |
658 | ```r
659 | A_E_r <- cov2cor(inspect(m1_fit, "coefficients")$psi)["A", "E"]
660 | ```
661 |
662 |
663 |
664 |
665 | * This code first extracts the factor variances and covariances.
666 | * I assume that naming the element `psi` (i.e., $\psi$) is a reference to LISREL matrix notation (see this discussion from [USP 655 SEM](http://www.upa.pdx.edu/IOA/newsom/semclass/ho_lisrel%20notation.pdf)).
667 | * Once again `cov2cor` is used to convert the covariance matrix to a correlation matrix.
668 | * An inspection of the values shows that there are some substantive correlations, which helps to explain why constraining them to zero in an orthogonal model would have substantially damaged fit.
For example, the correlation between extraversion (`E`) and agreeableness (`A`) was quite high ($r = 0.68$).
669 |
670 |
671 |
672 |
673 | ```r
674 | # c('O', 'C', 'E', 'A', 'N') # set of factor names
675 | # lhs != rhs # excludes factor variances
676 | subset(inspect(m1_fit, "standardized"), rhs %in% c("O", "C", "E",
677 | "A", "N") & lhs != rhs)
678 | ```
679 |
680 | ```
681 | ## lhs op rhs est.std se z pvalue
682 | ## 1 N ~~ E 0.244 NA NA NA
683 | ## 2 N ~~ O -0.112 NA NA NA
684 | ## 3 N ~~ A 0.223 NA NA NA
685 | ## 4 N ~~ C -0.283 NA NA NA
686 | ## 5 E ~~ O -0.453 NA NA NA
687 | ## 6 E ~~ A 0.683 NA NA NA
688 | ## 7 E ~~ C -0.357 NA NA NA
689 | ## 8 O ~~ A -0.303 NA NA NA
690 | ## 9 O ~~ C 0.301 NA NA NA
691 | ## 10 A ~~ C -0.334 NA NA NA
692 | ```
693 |
694 |
695 |
696 |
697 | * The same values can be extracted from the `standardized` coefficients table using the `inspect` method.
698 |
699 | We can also confirm that for the orthogonal model (`m3`) the correlations are zero.
700 |
701 |
702 |
703 | ```r
704 | cov2cor(inspect(m3_fit, "coefficients")$psi)
705 | ```
706 |
707 | ```
708 | ## N E O A C
709 | ## N 1
710 | ## E 0 1
711 | ## O 0 0 1
712 | ## A 0 0 0 1
713 | ## C 0 0 0 0 1
714 | ```
715 |
716 |
717 |
718 |
719 |
720 | ## Constrain factor correlations to be equal
721 | ### Change constraints so that factor variances are one
722 |
723 |
724 |
725 | ```r
726 | m4_model <- ' N =~ N1 + N2 + N3 + N4 + N5
727 | E =~ E1 + E2 + E3 + E4 + E5
728 | O =~ O1 + O2 + O3 + O4 + O5
729 | A =~ A1 + A2 + A3 + A4 + A5
730 | C =~ C1 + C2 + C3 + C4 + C5
731 | '
732 |
733 | m4_fit <- cfa(m4_model, data=Data[, item_names], std.lv=TRUE)
734 |
735 | inspect(m4_fit, 'coefficients')$psi
736 | ```
737 |
738 | ```
739 | ## N E O A C
740 | ## N 1.000
741 | ## E -0.244 1.000
742 | ## O -0.112 0.453 1.000
743 | ## A -0.223 0.683 0.303 1.000
744 | ## C -0.283 0.357 0.301 0.334 1.000
745 | ```
746 |
747 | ```r
748 | cov2cor(inspect(m4_fit, 'coefficients')$psi)
749 | ```
750 |
751 | ```
752 | ## N E O A C
753 | ## N 1.000
754 | ## E -0.244 1.000
755 | ## O -0.112 0.453 1.000
756 | ## A -0.223 0.683 0.303 1.000
757 | ## C -0.283 0.357 0.301 0.334 1.000
758 | ```
759 |
760 |
761 |
762 |
763 | * `std.lv` is an argument that when `TRUE` standardises latent variables by fixing their variance to 1.0. The default is `FALSE`, which instead constrains the first factor loading to 1.0.
764 | * This makes the covariance and the correlation matrix of the factors the same.
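A quick way to verify this point (a sketch):

```r
# if psi is already a correlation matrix, cov2cor() leaves it unchanged
psi <- inspect(m4_fit, "coefficients")$psi
max(abs(psi - cov2cor(psi)))  # should be (near) zero
```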
765 |
766 | We can see the differences in the loadings by comparing the loadings for the neuroticism factor:
767 |
768 |
769 |
770 | ```r
771 | head(parameterestimates(m4_fit), 5)
772 | ```
773 |
774 | ```
775 | ## lhs op rhs est se z pvalue ci.lower ci.upper
776 | ## 1 N =~ N1 1.300 0.028 46.07 0 1.244 1.355
777 | ## 2 N =~ N2 1.230 0.028 44.38 0 1.176 1.285
778 | ## 3 N =~ N3 1.149 0.030 38.41 0 1.090 1.207
779 | ## 4 N =~ N4 0.899 0.031 28.75 0 0.838 0.960
780 | ## 5 N =~ N5 0.816 0.033 24.65 0 0.751 0.881
781 | ```
782 |
783 | ```r
784 | head(parameterestimates(m1_fit), 5)
785 | ```
786 |
787 | ```
788 | ## lhs op rhs est se z pvalue ci.lower ci.upper
789 | ## 1 N =~ N1 1.000 0.000 NA NA 1.000 1.000
790 | ## 2 N =~ N2 0.947 0.024 39.90 0 0.900 0.993
791 | ## 3 N =~ N3 0.884 0.025 35.92 0 0.836 0.932
792 | ## 4 N =~ N4 0.692 0.025 27.75 0 0.643 0.741
793 | ## 5 N =~ N5 0.628 0.026 24.03 0 0.577 0.679
794 | ```
795 |
796 | ```r
797 |
798 | # shows how ratio of loadings has not changed
799 | head(parameterestimates(m4_fit), 5)$est/head(parameterestimates(m4_fit),
800 | 5)$est[1]
801 | ```
802 |
803 | ```
804 | ## [1] 1.0000 0.9467 0.8839 0.6918 0.6278
805 | ```
806 |
807 |
808 |
809 |
810 |
811 |
812 | ### Add equality constraints
813 |
814 |
815 | ```r
816 | m5_model <- ' N =~ N1 + N2 + N3 + N4 + N5
817 | E =~ E1 + E2 + E3 + E4 + E5
818 | O =~ O1 + O2 + O3 + O4 + O5
819 | A =~ A1 + A2 + A3 + A4 + A5
820 | C =~ C1 + C2 + C3 + C4 + C5
821 | N ~~ R*E + R*O + R*A + R*C
822 | E ~~ R*O + R*A + R*C
823 | O ~~ R*A + R*C
824 | A ~~ R*C
825 | '
826 |
827 | Data_reversed <- Data
828 | Data_reversed[, paste0('N', 1:5)] <- 7 - Data[, paste0('N', 1:5)]
829 |
830 | m5_fit <- cfa(m5_model, data=Data_reversed[, item_names], std.lv=TRUE)
831 | ```
832 |
833 |
834 |
835 |
836 | * Equality constraints were added by labelling all the covariance parameters with a common label (i.e., `R`).
837 | * `~~` stands for covariance.
838 | * `R*E` attaches the label `R` to the covariance parameter involving the `E` variable.
839 | * I reversed the neuroticism items and hence the factor to ensure that all the inter-item correlations were positive.
840 |
841 | The following output shows that the correlation/covariance is the same for all factor inter-correlations.
842 |
843 |
844 |
845 | ```r
846 | inspect(m5_fit, "coefficients")$psi
847 | ```
848 |
849 | ```
850 | ## N E O A C
851 | ## N 1.000
852 | ## E 0.323 1.000
853 | ## O 0.323 0.323 1.000
854 | ## A 0.323 0.323 0.323 1.000
855 | ## C 0.323 0.323 0.323 0.323 1.000
856 | ```
857 |
858 |
859 |
860 |
861 | The following analysis compares the fit of the unconstrained model with the equal-covariance model.
862 |
863 |
864 |
865 | ```r
866 | round(cbind(m1 = inspect(m1_fit, "fit.measures"), m5 = inspect(m5_fit,
867 | "fit.measures")), 3)
868 | ```
869 |
870 | ```
871 | ## m1 m5
872 | ## chisq 4165.467 4.576e+03
873 | ## df 265.000 2.740e+02
874 | ## pvalue 0.000 0.000e+00
875 | ## baseline.chisq 18222.116 1.822e+04
876 | ## baseline.df 300.000 3.000e+02
877 | ## baseline.pvalue 0.000 0.000e+00
878 | ## cfi 0.782 7.600e-01
879 | ## tli 0.754 7.370e-01
880 | ## logl -99840.238 -1.000e+05
881 | ## unrestricted.logl -97757.504 -9.776e+04
882 | ## npar 60.000 5.100e+01
883 | ## aic 199800.476 2.002e+05
884 | ## bic 200148.363 2.005e+05
885 | ## ntotal 2436.000 2.436e+03
886 | ## bic2 199957.729 2.003e+05
887 | ## rmsea 0.078 8.000e-02
888 | ## rmsea.ci.lower 0.076 7.800e-02
889 | ## rmsea.ci.upper 0.080 8.200e-02
890 | ## rmsea.pvalue 0.000 0.000e+00
891 | ## srmr 0.075 8.900e-02
892 | ```
893 |
894 | ```r
895 | anova(m1_fit, m5_fit)
896 | ```
897 |
898 | ```
899 | ## Chi Square Difference Test
900 | ##
901 | ## Df AIC BIC Chisq Chisq diff Df diff Pr(>Chisq)
902 | ## m1_fit 265 2e+05 2e+05 4165
903 | ## m5_fit 274 2e+05 2e+05 4576 411 9 <2e-16 ***
904 | ## ---
905 | ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
906 | ```
907 |
908 |
909 |
910 |
911 | * The unconstrained model provides a better fit both in terms of the chi-square difference test and when comparing various parsimony-adjusted fit indices such as RMSEA.
912 | * The difference is relatively small.
913 |
914 | The following summarises the correlations between the factors (correlations with Neuroticism reversed).
915 |
916 |
917 |
918 | ```r
919 | rs <- abs(inspect(m4_fit, "coefficients")$psi)
920 | summary(rs[lower.tri(rs)])
921 | ```
922 |
923 | ```
924 | ## Min. 1st Qu. Median Mean 3rd Qu. Max.
925 | ## 0.112 0.254 0.302 0.329 0.352 0.683
926 | ```
927 |
928 | ```r
929 | hist(rs[lower.tri(rs)])
930 | ```
931 |
932 | 
933 |
934 | ```r
935 |
936 | round(rs, 2)
937 | ```
938 |
939 | ```
940 | ## N E O A C
941 | ## N 1.00
942 | ## E 0.24 1.00
943 | ## O 0.11 0.45 1.00
944 | ## A 0.22 0.68 0.30 1.00
945 | ## C 0.28 0.36 0.30 0.33 1.00
946 | ```
947 |
948 |
949 |
950 |
951 | * Given the very large sample size, even small variations in sample correlations likely reflect true variation.
952 | * However, in particular, the correlation between E and A is much larger than the average correlation, and the correlation between O and N is much smaller than the average correlation.
953 |
954 | ### Add equality constraints with some post hoc modifications
955 |
956 |
957 | ```r
958 | m6_model <- ' N =~ N1 + N2 + N3 + N4 + N5
959 | E =~ E1 + E2 + E3 + E4 + E5
960 | O =~ O1 + O2 + O3 + O4 + O5
961 | A =~ A1 + A2 + A3 + A4 + A5
962 | C =~ C1 + C2 + C3 + C4 + C5
963 | N ~~ R*E + R*A + R*C
964 | E ~~ R*O + R*C
965 | O ~~ R*A + R*C
966 | A ~~ R*C
967 | '
968 |
969 | Data_reversed <- Data
970 | Data_reversed[, paste0('N', 1:5)] <- 7 - Data[, paste0('N', 1:5)]
971 |
972 | m6_fit <- cfa(m6_model, data=Data_reversed[, item_names], std.lv=TRUE)
973 | ```
974 |
975 |
976 |
977 |
978 | The above model frees up the correlation between E and A, and between O and N.
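Before comparing fit, the labelling can be confirmed directly (a sketch using `parTable`; the freed pairs `E ~~ A` and `N ~~ O` should show an empty label):

```r
# list the factor covariances together with their equality labels
subset(parTable(m6_fit),
       op == "~~" & lhs != rhs &
         lhs %in% c("N", "E", "O", "A", "C") &
         rhs %in% c("N", "E", "O", "A", "C"),
       select = c(lhs, op, rhs, label))
```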
979 |
980 |
981 |
982 | ```r
983 | round(cbind(m1 = inspect(m1_fit, "fit.measures"), m5 = inspect(m1_fit,
984 | "fit.measures"), m6 = inspect(m6_fit, "fit.measures")), 3)
985 | ```
986 |
987 | ```
988 | ## m1 m5 m6
989 | ## chisq 4165.467 4165.467 4223.250
990 | ## df 265.000 265.000 272.000
991 | ## pvalue 0.000 0.000 0.000
992 | ## baseline.chisq 18222.116 18222.116 18222.116
993 | ## baseline.df 300.000 300.000 300.000
994 | ## baseline.pvalue 0.000 0.000 0.000
995 | ## cfi 0.782 0.782 0.780
996 | ## tli 0.754 0.754 0.757
997 | ## logl -99840.238 -99840.238 -99869.130
998 | ## unrestricted.logl -97757.504 -97757.504 -97757.504
999 | ## npar 60.000 60.000 53.000
1000 | ## aic 199800.476 199800.476 199844.259
1001 | ## bic 200148.363 200148.363 200151.559
1002 | ## ntotal 2436.000 2436.000 2436.000
1003 | ## bic2 199957.729 199957.729 199983.166
1004 | ## rmsea 0.078 0.078 0.077
1005 | ## rmsea.ci.lower 0.076 0.076 0.075
1006 | ## rmsea.ci.upper 0.080 0.080 0.079
1007 | ## rmsea.pvalue 0.000 0.000 0.000
1008 | ## srmr 0.075 0.075 0.077
1009 | ```
1010 |
1011 | ```r
1012 | anova(m1_fit, m6_fit)
1013 | ```
1014 |
1015 | ```
1016 | ## Chi Square Difference Test
1017 | ##
1018 | ## Df AIC BIC Chisq Chisq diff Df diff Pr(>Chisq)
1019 | ## m1_fit 265 2e+05 2e+05 4165
1020 | ## m6_fit 272 2e+05 2e+05 4223 57.8 7 4.2e-10 ***
1021 | ## ---
1022 | ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
1023 | ```
1024 |
1025 | ```r
1026 | anova(m5_fit, m6_fit)
1027 | ```
1028 |
1029 | ```
1030 | ## Chi Square Difference Test
1031 | ##
1032 | ## Df AIC BIC Chisq Chisq diff Df diff Pr(>Chisq)
1033 | ## m6_fit 272 2e+05 2e+05 4223
1034 | ## m5_fit 274 2e+05 2e+05 4576 353 2 <2e-16 ***
1035 | ## ---
1036 | ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
1037 | ```
1038 |
1039 |
1040 |
1041 |
1042 | * Freeing up these two correlations improved the model relative to the equality model. By most fit statistics, this model still provided a worse fit than the unconstrained model. However, interestingly, the RMSEA was slightly lower (i.e., better).
1043 |
1044 | ### Add equality constraints without reversal
1045 | Section 5.5 of the [Lavaan introductory guide 0.4-13](http://users.ugent.be/~yrosseel/lavaan/lavaanIntroduction.pdf) discusses various types of equality constraints. Thus, instead of reversing the neuroticism factor, it is possible to directly constrain the covariances of neuroticism with each other factor to be the negative of the covariances among the remaining factors.
1046 |
1047 |
1048 |
1049 | ```r
1050 | m7_model <- ' N =~ N1 + N2 + N3 + N4 + N5
1051 | E =~ E1 + E2 + E3 + E4 + E5
1052 | O =~ O1 + O2 + O3 + O4 + O5
1053 | A =~ A1 + A2 + A3 + A4 + A5
1054 | C =~ C1 + C2 + C3 + C4 + C5
1055 | # covariances
1056 | N ~~ R1*E + R1*O + R1*A + R1*C
1057 | E ~~ R2*O + R2*A + R2*C
1058 | O ~~ R2*A + R2*C
1059 | A ~~ R2*C
1060 |
1061 | # constraints
1062 | R1 == 0 - R2
1063 | '
1064 |
1065 | m7_fit <- cfa(m7_model, data=Data[, item_names], std.lv=TRUE)
1066 | ```
1067 |
1068 |
1069 |
1070 |
1071 | Let's check that the results are the same whether we reverse data or set negative constraints.
1072 |
1073 |
1074 |
1075 |
1076 |
1077 | ```r
1078 | m5_fit
1079 | ```
1080 |
1081 | ```
1082 | ## lavaan (0.4-14) converged normally after 43 iterations
1083 | ##
1084 | ## Number of observations 2436
1085 | ##
1086 | ## Estimator ML
1087 | ## Minimum Function Chi-square 4576.170
1088 | ## Degrees of freedom 274
1089 | ## P-value 0.000
1090 | ##
1091 | ```
1092 |
1093 | ```r
1094 | m7_fit
1095 | ```
1096 |
1097 | ```
1098 | ## lavaan (0.4-14) converged normally after 283 iterations
1099 | ##
1100 | ## Number of observations 2436
1101 | ##
1102 | ## Estimator ML
1103 | ## Minimum Function Chi-square 4576.170
1104 | ## Degrees of freedom 274
1105 | ## P-value 0.000
1106 | ##
1107 | ```
1108 |
1109 |
1110 |
--------------------------------------------------------------------------------
/cfa-example/cfa-example.rmd:
--------------------------------------------------------------------------------
1 | # CFA Example
2 |
3 | ```{r get_data, message=FALSE}
4 | library(psych)
5 | library(lavaan)
6 | Data <- bfi
7 | item_names <- names(Data)[1:25]
8 | ```
9 |
10 | ## Check data
11 |
12 | ```{r }
13 | sapply(Data[,item_names], function(X) sum(is.na(X)))
14 |
15 | Data$item_na <- apply(Data[,item_names], 1, function(X) sum(is.na(X)) > 0)
16 |
17 | table(Data$item_na)
18 | Data <- Data[!Data$item_na, ]
19 | ```
20 |
21 | * I decided to remove cases with missing data to simplify subsequent exploration of the features of the lavaan software.
22 |
23 |
24 | ## Basic CFA
25 | ```{r, tidy=FALSE}
26 | m1_model <- ' N =~ N1 + N2 + N3 + N4 + N5
27 | E =~ E1 + E2 + E3 + E4 + E5
28 | O =~ O1 + O2 + O3 + O4 + O5
29 | A =~ A1 + A2 + A3 + A4 + A5
30 | C =~ C1 + C2 + C3 + C4 + C5
31 | '
32 |
33 | m1_fit <- cfa(m1_model, data=Data[, item_names])
34 | summary(m1_fit, standardized=TRUE)
35 | ```
36 |
37 | * **`Std.lv`**: Only latent variables have been standardized.
38 | * **`Std.all`**: Observed and latent variables have been standardized.
39 | * **Factor loadings**: Under the `latent variables` section, the `Std.all` column provides standardised factor loadings.
40 | * **Factor correlations**: Under the `Covariances` section, the `Std.all` column provides standardised factor correlations.
41 | * **`Variances`**: Latent factor variances can be constrained to 1 for identifiability, but in this case the first loading on each factor was constrained to one instead. Variances for items represent the variance not explained by the latent factor.
42 |
43 |
44 |
45 | ```{r demonstrate_variance_point}
46 | variances <- c(unique=subset(inspect(m1_fit, "standardizedsolution"),
47 | lhs == 'N1' & rhs == 'N1')[, 'est.std'],
48 | common=subset(inspect(m1_fit, "standardizedsolution"),
49 | lhs == 'N' & rhs == 'N1')[, 'est.std']^2)
50 | (variances <- c(variances, total=sum(variances)))
51 | ```
52 |
53 | * The output above illustrates the point about variances. Variance for each item is explained by either the common factor or by error variance. As there is just one latent factor loading on the item, the squared standardised coefficient is the variance explained by the common factor. The sum of the unique and common standardised variances is one, which naturally corresponds to the variance of a standardised variable.
54 | * The code also demonstrates ideas about how to extract specific information from the lavaan model fit object. Specifically, the `inspect` method provides access to a wide range of specific information. See help for further details.
* I used the `subset` method to provide an easy one-liner for extracting elements from the data frame returned by the `inspect` method.
56 |
57 | ```{r}
58 | variances <- c(N1_N1=subset(parameterestimates(m1_fit),
59 | lhs == 'N1' & rhs == 'N1')[, 'est'],
60 | N_N=subset(parameterestimates(m1_fit),
61 | lhs == 'N' & rhs == 'N')[, 'est'],
62 | N_N1=subset(parameterestimates(m1_fit),
63 | lhs == 'N' & rhs == 'N1')[, 'est'])
64 |
65 | cbind(parameters = c(variances,
66 | total=variances['N_N1'] * variances['N_N'] + variances['N1_N1'],
67 | raw_divide_by_n_minus_1=var(Data[,'N1']),
68 | raw_divide_by_n=mean((Data[,'N1'] - mean(Data[,'N1']))^2)))
69 | ```
70 |
71 | * The output above shows the unstandardised parameters related to the item `N1`.
72 | * `N1_N1` corresponds to the unstandardised unique variance for the item.
73 | * `N_N` times `N_N1` represents the unstandardised common variance (strictly, the loading squared times the factor variance; this works here because the `N_N1` loading is fixed to one).
74 | * Thus, the sum of the unique and common variance represents the total variance.
75 | * When I calculated this on the raw data using the standard $n-1$ denominator, the value was slightly larger, but when I used $n$ as the denominator, the estimate was very close.
76 |
77 |
78 |
79 | ## Compare with a single factor model
80 | ```{r, tidy=FALSE}
81 | m2_model <- ' G =~ N1 + N2 + N3 + N4 + N5
82 | + E1 + E2 + E3 + E4 + E5
83 | + O1 + O2 + O3 + O4 + O5
84 | + A1 + A2 + A3 + A4 + A5
85 | + C1 + C2 + C3 + C4 + C5
86 | '
87 |
88 | m2_fit <- cfa(m2_model, data=Data[, item_names])
89 | summary(m2_fit, standardized=TRUE)
90 | ```
91 |
92 | ```{r}
93 | round(cbind(m1=inspect(m1_fit, 'fit.measures'),
94 | m2=inspect(m2_fit, 'fit.measures')), 3)
95 | anova(m1_fit, m2_fit)
96 | ```
97 |
98 | * The output compares the model fit statistics for the two models.
99 | * It also performs a chi-square difference test, which shows that the one-factor model has significantly worse fit than the five-factor model.
100 |
101 |
102 | ## Modification indices
103 | ```{r}
104 | m1_mod <- modificationindices(m1_fit)
105 | m1_mod_summary <- subset(m1_mod, mi > 100)
106 | m1_mod_summary[order(m1_mod_summary$mi, decreasing=TRUE), ]
107 | ```
108 |
109 | * `modificationindices` suggests several ad hoc modifications that could be made to improve the fit of the model.
110 | * The largest index suggests that items `N1` and `N2` share common variance. If we look at the help file on the bfi dataset `?bfi`, we see that the text for `N1` ("Get angry easily") and `N2` ("Get irritated easily") are very similar.
111 |
112 | ```{r}
113 | (N_cors <- round(cor(Data[, paste0('N', 1:5)]), 2))
114 | N1_N2_corr <- N_cors['N1', 'N2']
115 | other_N_corrs <- round(mean(abs(N_cors[lower.tri(N_cors)][-1])), 2)
116 |
117 | ```
118 |
119 | * The correlation matrix also shows that the correlation between N1 and N2 ($r = `r I(N1_N2_corr)`$) is much larger than it is for the other variables ($\text{mean}(|r|) = `r I(other_N_corrs)`$).
120 |
121 | ## Various matrices
122 | ### Observed, fitted, and residual covariance matrices
123 | The following analysis extracts observed, fitted, and residual covariances and checks that they are consistent with expectations. I only do this for the five neuroticism items rather than the full 25-item set, to make their meaning easier to demonstrate.
124 |
125 | ```{r}
126 | N_names <- paste0('N', 1:5)
127 | N_matrices <- list(
128 | observed=inspect(m1_fit, 'sampstat')$cov[N_names, N_names],
129 | fitted=fitted(m1_fit)$cov[N_names, N_names],
130 | residual=resid(m1_fit)$cov[N_names, N_names])
131 |
132 | N_matrices$check <- N_matrices$observed - (N_matrices$fitted + N_matrices$residual)
133 | lapply(N_matrices, function(X) round(X, 3))
134 | ```
135 |
136 | * The observed covariance matrix was extracted using the `cov` function on the sample data.
137 | * The fitted covariance matrix can be extracted using the `fitted` method on the model fit object and then extracting the `cov` element.
138 | * Many symmetric matrices in lavaan are of class `lavaan.matrix.symmetric`. This hides the upper triangle of the matrix and formats the matrix to `nd` decimal places.
139 | Run `getAnywhere(print.lavaan.matrix.symmetric)` to see more details.
140 | * The `sampstat` option in the `inspect` method can be used to extract the sample covariance matrix. This is similar, but not exactly the same as running `cov` on the sample data.
141 | * The `resid` method can be used to extract the residual covariance matrix.
142 | * I then create a `check` that `observed = fitted + residual`, which it does (the `check` matrix is all zeros).
143 |
144 | ### Observed, fitted, and residual correlation matrices
145 | I often find it more meaningful to examine observed, fitted, and residual correlation matrices. Standardisation often makes it easier to understand the real magnitude of any residual.
146 |
147 | ```{r}
148 | N_names <- paste0('N', 1:5)
149 | N_cov <- list(
150 | observed=inspect(m1_fit, 'sampstat')$cov[N_names, N_names],
151 | fitted=fitted(m1_fit)$cov[N_names, N_names])
152 |
153 | N_cor <- list(
154 | observed = cov2cor(N_cov$observed),
155 | fitted = cov2cor(N_cov$fitted) )
156 |
157 | N_cor$residual <- N_cor$observed - N_cor$fitted
158 |
159 | lapply(N_cor, function(X) round(X, 2))
160 | ```
161 |
162 | * `cov2cor` is a `base` R function that scales a covariance matrix into a correlation matrix.
163 | * Fitted and observed correlation matrices can be obtained by running `cov2cor` on the corresponding covariance matrices.
164 | * The residual correlation matrix can be obtained by subtracting the fitted correlation matrix from the observed correlation matrix.
165 | * In this case we can see that certain pairs of items correlate more or less than other pairs. In particular `N1-N2`, `N3-N4`, `N4-N5` have positive correlation residuals. An examination of the items below may suggest some added degree of similarity between these pairs of items. For example, N1 and N2 both concern anger and irritation, whereas N3 and N4 both concern mood and affect.
166 |
167 |
168 | > N1: Get angry easily. (q_952)
169 | > N2: Get irritated easily. (q_974)
170 | > N3: Have frequent mood swings. (q_1099)
171 | > N4: Often feel blue. (q_1479)
172 | > N5: Panic easily. (q_1505)
173 |
174 | ## Uncorrelated factors
175 | ### All uncorrelated factors
176 | The following examines a model with uncorrelated factors.
177 |
178 | ```{r tidy=FALSE}
179 | m3_model <- ' N =~ N1 + N2 + N3 + N4 + N5
180 | E =~ E1 + E2 + E3 + E4 + E5
181 | O =~ O1 + O2 + O3 + O4 + O5
182 | A =~ A1 + A2 + A3 + A4 + A5
183 | C =~ C1 + C2 + C3 + C4 + C5
184 | '
185 |
186 | m3_fit <- cfa(m3_model, data=Data[, item_names], orthogonal=TRUE)
187 |
188 | round(cbind(m1=inspect(m1_fit, 'fit.measures'),
189 | m3=inspect(m3_fit, 'fit.measures')), 3)
190 | anova(m1_fit, m3_fit)
191 |
192 | rmsea_m1 <- round(inspect(m1_fit, 'fit.measures')['rmsea'], 3)
193 | rmsea_m3 <- round(inspect(m3_fit, 'fit.measures')['rmsea'], 3)
194 | ```
195 |
196 | * To convert a `cfa` model from one that permits factors to be correlated to one that constrains factors to be uncorrelated, just specify `orthogonal=TRUE`.
197 | * In this case constraining the factor covariances to all be zero led to a significant reduction in fit. This poorer fit can also be seen in measures like RMSEA (m1 =
198 | `r rmsea_m1`; m3 = `r rmsea_m3` ).
199 |
200 |
201 | ### Correlations and covariances between factors
202 | It is useful to be able to extract correlations and covariances between factors.
203 |
204 | ```{r}
205 | inspect(m1_fit, 'coefficients')$psi
206 | cov2cor(inspect(m1_fit, 'coefficients')$psi)
207 | A_E_r <- cov2cor(inspect(m1_fit, 'coefficients')$psi)['A', 'E']
208 | ```
209 |
210 | * This code first extracts the factor variances and covariances.
211 | * I assume that naming the element `psi` (i.e., $\psi$) is a reference to LISREL matrix notation (see this discussion from [USP 655 SEM](http://www.upa.pdx.edu/IOA/newsom/semclass/ho_lisrel%20notation.pdf)).
212 | * Once again `cov2cor` is used to convert the covariance matrix to a correlation matrix.
213 | * An inspection of the values shows that there are some substantive correlations, which helps to explain why constraining them to zero in an orthogonal model would have substantially damaged fit. For example, the correlation between extraversion (`E`) and agreeableness (`A`) was quite high ($r = `r I(round(A_E_r, 2))`$).
214 |
215 |
216 | ```{r}
217 | # c('O', 'C', 'E', 'A', 'N') # set of factor names
218 | # lhs != rhs # excludes factor variances
219 | subset(inspect(m1_fit, 'standardized'),
220 | rhs %in% c('O', 'C', 'E', 'A', 'N') & lhs != rhs)
221 | ```
222 |
223 | * The same values can be extracted from the `standardized` coefficients table using the `inspect` method.
224 |
225 | We can also confirm that for the orthogonal model (`m3`) the correlations are zero.
226 |
227 | ```{r}
228 | cov2cor(inspect(m3_fit, 'coefficients')$psi)
229 | ```
230 |
231 |
232 | ## Constrain factor correlations to be equal
233 | ### Change constraints so that factor variances are one
234 |
235 | ```{r tidy=FALSE}
236 | m4_model <- ' N =~ N1 + N2 + N3 + N4 + N5
237 | E =~ E1 + E2 + E3 + E4 + E5
238 | O =~ O1 + O2 + O3 + O4 + O5
239 | A =~ A1 + A2 + A3 + A4 + A5
240 | C =~ C1 + C2 + C3 + C4 + C5
241 | '
242 |
243 | m4_fit <- cfa(m4_model, data=Data[, item_names], std.lv=TRUE)
244 |
245 | inspect(m4_fit, 'coefficients')$psi
246 | cov2cor(inspect(m4_fit, 'coefficients')$psi)
247 | ```
248 |
249 | * `std.lv` is an argument that when `TRUE` standardises latent variables by fixing their variance to 1.0. The default is `FALSE`, which instead constrains the first factor loading to 1.0.
250 | * This makes the covariance and the correlation matrix of the factors the same, as the check below shows.
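A quick check of this point (a sketch):

```{r}
# if psi is already a correlation matrix, cov2cor() leaves it unchanged
psi <- inspect(m4_fit, 'coefficients')$psi
max(abs(psi - cov2cor(psi)))  # should be (near) zero
```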
251 |
252 | We can see the differences in the loadings by comparing the loadings for the neuroticism factor:
253 |
254 | ```{r}
255 | head(parameterestimates(m4_fit), 5)
256 | head(parameterestimates(m1_fit), 5)
257 |
258 | # shows how ratio of loadings has not changed
259 | head(parameterestimates(m4_fit), 5)$est / head(parameterestimates(m4_fit), 5)$est[1]
260 | ```
261 |
262 |
263 |
264 | ### Add equality constraints
265 | ```{r tidy=FALSE}
266 | m5_model <- ' N =~ N1 + N2 + N3 + N4 + N5
267 | E =~ E1 + E2 + E3 + E4 + E5
268 | O =~ O1 + O2 + O3 + O4 + O5
269 | A =~ A1 + A2 + A3 + A4 + A5
270 | C =~ C1 + C2 + C3 + C4 + C5
271 | N ~~ R*E + R*O + R*A + R*C
272 | E ~~ R*O + R*A + R*C
273 | O ~~ R*A + R*C
274 | A ~~ R*C
275 | '
276 |
277 | Data_reversed <- Data
278 | Data_reversed[, paste0('N', 1:5)] <- 7 - Data[, paste0('N', 1:5)]
279 |
280 | m5_fit <- cfa(m5_model, data=Data_reversed[, item_names], std.lv=TRUE)
281 | ```
282 |
283 | * Equality constraints were added by labelling all the covariance parameters with a common label (i.e., `R`).
284 | * `~~` stands for covariance.
285 | * `R*E` attaches the label `R` to the covariance parameter involving the `E` variable.
286 | * I reversed the neuroticism items and hence the factor to ensure that all the inter-item correlations were positive.
287 |
288 | The following output shows that the correlation/covariance is the same for all factor inter-correlations.
289 |
290 | ```{r}
291 | inspect(m5_fit, 'coefficients')$psi
292 | ```
293 |
294 | The following analysis compares the fit of the unconstrained model with the equal-covariance model.
295 |
296 | ```{r}
297 | round(cbind(m1=inspect(m1_fit, 'fit.measures'),
298 | m5=inspect(m5_fit, 'fit.measures')), 3)
299 | anova(m1_fit, m5_fit)
300 | ```
301 |
302 | * The unconstrained model provides a better fit both in terms of the chi-square difference test and when comparing various parsimony-adjusted fit indices such as RMSEA.
303 | * The difference is relatively small.
304 |
305 | The following summarises the correlations between the factors (correlations with Neuroticism reversed).
306 |
307 | ```{r }
308 | rs <- abs(inspect(m4_fit, 'coefficients')$psi)
309 | summary(rs[lower.tri(rs)])
310 | hist(rs[lower.tri(rs)])
311 |
312 | round(rs, 2)
313 | ```
314 |
315 | * Given the very large sample size, even small variations in sample correlations likely reflect true variation.
316 | * However, in particular, the correlation between E and A is much larger than the average correlation, and the correlation between O and N is much smaller than the average correlation.
317 |
318 | ### Add equality constraints with some post hoc modifications
319 | ```{r tidy=FALSE}
320 | m6_model <- ' N =~ N1 + N2 + N3 + N4 + N5
321 | E =~ E1 + E2 + E3 + E4 + E5
322 | O =~ O1 + O2 + O3 + O4 + O5
323 | A =~ A1 + A2 + A3 + A4 + A5
324 | C =~ C1 + C2 + C3 + C4 + C5
325 | N ~~ R*E + R*A + R*C
326 | E ~~ R*O + R*C
327 | O ~~ R*A + R*C
328 | A ~~ R*C
329 | '
330 |
331 | Data_reversed <- Data
332 | Data_reversed[, paste0('N', 1:5)] <- 7 - Data[, paste0('N', 1:5)]
333 |
334 | m6_fit <- cfa(m6_model, data=Data_reversed[, item_names], std.lv=TRUE)
335 | ```
336 |
337 | The above model frees up the correlation between E and A, and between O and N.
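The labelling can be confirmed directly (a sketch using `parTable`; the freed pairs `E ~~ A` and `N ~~ O` should show an empty label):

```{r}
# list the factor covariances together with their equality labels
subset(parTable(m6_fit),
       op == '~~' & lhs != rhs &
         lhs %in% c('N', 'E', 'O', 'A', 'C') &
         rhs %in% c('N', 'E', 'O', 'A', 'C'),
       select = c(lhs, op, rhs, label))
```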
338 |
339 | ```{r}
340 | round(cbind(m1=inspect(m1_fit, 'fit.measures'),
341 | m5=inspect(m5_fit, 'fit.measures'),
342 | m6=inspect(m6_fit, 'fit.measures')), 3)
343 | anova(m1_fit, m6_fit)
344 | anova(m5_fit, m6_fit)
345 | ```
346 |
347 | * Freeing up these two correlations improved the model relative to the equality model. By most fit statistics, this model still provided a worse fit than the unconstrained model. However, interestingly, the RMSEA was slightly lower (i.e., better).
348 |
349 | ### Add equality constraints without reversal
350 | Section 5.5 of the [Lavaan introductory guide 0.4-13](http://users.ugent.be/~yrosseel/lavaan/lavaanIntroduction.pdf) discusses various types of equality constraints. Thus, instead of reversing the neuroticism factor, it is possible to directly constrain the covariances of neuroticism with each other factor to be the negative of the covariances among the remaining factors.
351 |
352 | ```{r tidy=FALSE}
353 | m7_model <- ' N =~ N1 + N2 + N3 + N4 + N5
354 | E =~ E1 + E2 + E3 + E4 + E5
355 | O =~ O1 + O2 + O3 + O4 + O5
356 | A =~ A1 + A2 + A3 + A4 + A5
357 | C =~ C1 + C2 + C3 + C4 + C5
358 | # covariances
359 | N ~~ R1*E + R1*O + R1*A + R1*C
360 | E ~~ R2*O + R2*A + R2*C
361 | O ~~ R2*A + R2*C
362 | A ~~ R2*C
363 |
364 | # constraints
365 | R1 == 0 - R2
366 | '
367 |
368 | m7_fit <- cfa(m7_model, data=Data[, item_names], std.lv=TRUE)
369 | ```
370 |
371 | Let's check that the results are the same whether we reverse data or set negative constraints.
372 |
373 |
374 | ```{r}
375 | m5_fit
376 | m7_fit
377 |
378 | inspect(m5_fit, 'coefficients')$psi
379 | inspect(m7_fit, 'coefficients')$psi
380 | ```
--------------------------------------------------------------------------------
/cfa-example/figure/unnamed-chunk-19.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jeromyanglim/lavaan-examples/d7f5cbdc7fe14ffd039512bae6aa140c2a0ca5e6/cfa-example/figure/unnamed-chunk-19.png
--------------------------------------------------------------------------------
/cheat-sheet-lavaan/cheat-sheet-lavaan.html:
--------------------------------------------------------------------------------
* `Data` is a data frame
* `model` is the lavaan model syntax character variable
* `fit` is an object of class `lavaan`, typically returned from the functions `cfa`, `sem`, `growth`, and `lavaan`
* `m1_fit` and `m2_fit` are used for showing model comparison of `lavaan` objects

Getting help: `?cfa ?sem ?lavaan`; `?inspect`

Name | Command
---|---
fit CFA to data | `cfa(model, data=Data)`
fit SEM to data | `sem(model, data=Data)`
standardised solution | `sem(model, data=Data, std.ov=TRUE)`
orthogonal factors | `cfa(model, data=Data, orthogonal=TRUE)`

Name | Command
---|---
Factor covariance matrix | `inspect(fit, "coefficients")$psi`
Fitted covariance matrix | `fitted(fit)$cov`
Observed covariance matrix | `inspect(fit, 'sampstat')$cov`
Residual covariance matrix | `resid(fit)$cov`
Factor correlation matrix | `cov2cor(inspect(fit, "coefficients")$psi)` or use the covariance command with a standardised solution, e.g., `cfa(..., std.ov=TRUE)`

Name | Command
---|---
Fit measures | `fitMeasures(fit)`
Specific fit measures, e.g. | `fitMeasures(fit)[c('chisq', 'df', 'pvalue', 'cfi', 'rmsea', 'srmr')]`

Name | Command
---|---
Parameter information | `parTable(fit)`
Standardised estimates | `standardizedSolution(fit)` or `summary(fit, standardized=TRUE)`
R-squared | `inspect(fit, 'r2')`

Name | Command
---|---
Compare fit measures | `cbind(m1=inspect(m1_fit, 'fit.measures'), m2=inspect(m2_fit, 'fit.measures'))`
Chi-square difference test | `anova(m1_fit, m2_fit)`

Name | Command
---|---
Modification indices | `mod_ind <- modificationindices(fit)`
10 greatest | `head(mod_ind[order(mod_ind$mi, decreasing=TRUE), ], 10)`
mi > 5 | `subset(mod_ind[order(mod_ind$mi, decreasing=TRUE), ], mi > 5)`
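The commands above chain together naturally. A minimal end-to-end sketch (`HolzingerSwineford1939` is a dataset bundled with lavaan; the two-factor model is purely illustrative):

```r
library(lavaan)

# illustrative two-factor CFA on the bundled Holzinger-Swineford data
model <- ' visual  =~ x1 + x2 + x3
           textual =~ x4 + x5 + x6 '
fit <- cfa(model, data = HolzingerSwineford1939)

fitMeasures(fit)[c('chisq', 'df', 'pvalue', 'cfi', 'rmsea', 'srmr')]
cov2cor(inspect(fit, "coefficients")$psi)  # factor correlation matrix
```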
--------------------------------------------------------------------------------
/ex1-paper/ex1-paper.html:
--------------------------------------------------------------------------------
This exercise examines the first example shown in
http://www.jstatsoft.org/v48/i02/paper.
It's a three-factor confirmatory factor analysis example with three items per factor.
All three latent factors are permitted to correlate.

* `x1` to `x3` load on a `visual` factor
* `x4` to `x6` load on a `textual` factor
* `x7` to `x9` load on a `speed` factor

```r
library('lavaan')
library('Hmisc')
cases <- HolzingerSwineford1939

str(cases)
```
```
## 'data.frame': 301 obs. of 15 variables:
##  $ id    : int 1 2 3 4 5 6 7 8 9 11 ...
##  $ sex   : int 1 2 2 1 2 2 1 2 2 2 ...
##  $ ageyr : int 13 13 13 13 12 14 12 12 13 12 ...
##  $ agemo : int 1 7 1 2 2 1 1 2 0 5 ...
##  $ school: Factor w/ 2 levels "Grant-White",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ grade : int 7 7 7 7 7 7 7 7 7 7 ...
##  $ x1    : num 3.33 5.33 4.5 5.33 4.83 ...
##  $ x2    : num 7.75 5.25 5.25 7.75 4.75 5 6 6.25 5.75 5.25 ...
##  $ x3    : num 0.375 2.125 1.875 3 0.875 ...
##  $ x4    : num 2.33 1.67 1 2.67 2.67 ...
##  $ x5    : num 5.75 3 1.75 4.5 4 3 6 4.25 5.75 5 ...
##  $ x6    : num 1.286 1.286 0.429 2.429 2.571 ...
##  $ x7    : num 3.39 3.78 3.26 3 3.7 ...
##  $ x8    : num 5.75 6.25 3.9 5.3 6.3 6.65 6.2 5.15 4.65 4.55 ...
##  $ x9    : num 6.36 7.92 4.42 4.86 5.92 ...
```

```r
Hmisc::describe(cases)
```

```
## cases
##
## 15 Variables 301 Observations
## ---------------------------------------------------------------------------
## id
##       n missing  unique    Mean     .05     .10     .25     .50     .75
##     301       0     301   176.6      17      33      82     163     272
##     .90     .95
##     318     335
##
## lowest : 1 2 3 4 5, highest: 346 347 348 349 351
## ---------------------------------------------------------------------------
## sex
##       n missing  unique    Mean
##     301       0       2   1.515
##
## 1 (146, 49%), 2 (155, 51%)
## ---------------------------------------------------------------------------
## ageyr
##       n missing  unique    Mean
##     301       0       6      13
##
##           11  12  13 14 15 16
## Frequency  8 101 110 55 20  7
## %          3  34  37 18  7  2
## ---------------------------------------------------------------------------
## agemo
##       n missing  unique    Mean     .05     .10     .25     .50     .75
##     301       0      12   5.375       0       1       2       5       8
##     .90     .95
##      10      11
##
##            0  1  2  3  4  5  6  7  8  9 10 11
## Frequency 22 31 26 26 27 27 21 25 26 23 19 28
## %          7 10  9  9  9  9  7  8  9  8  6  9
## ---------------------------------------------------------------------------
## school
##       n missing  unique
##     301       0       2
##
## Grant-White (145, 48%), Pasteur (156, 52%)
## ---------------------------------------------------------------------------
## grade
##       n missing  unique    Mean
##     300       1       2   7.477
##
## 7 (157, 52%), 8 (143, 48%)
## ---------------------------------------------------------------------------
## x1
##       n missing  unique    Mean     .05     .10     .25     .50     .75
##     301       0      35   4.936   3.000   3.333   4.167   5.000   5.667
##     .90     .95
##   6.333   6.667
##
## lowest : 0.6667 1.6667 1.8333 2.0000 2.6667
## highest: 7.0000 7.1667 7.3333 7.5000 8.5000
## ---------------------------------------------------------------------------
## x2
##       n missing  unique    Mean     .05     .10     .25     .50     .75
##     301       0      25   6.088    4.50    4.75    5.25    6.00    6.75
##     .90     .95
##    7.75    8.50
##
## lowest : 2.25 3.50 3.75 4.00 4.25, highest: 8.25 8.50 8.75 9.00 9.25
## ---------------------------------------------------------------------------
## x3
##       n missing  unique    Mean     .05     .10     .25     .50     .75
##     301       0      35    2.25   0.625   0.875   1.375   2.125   3.125
##     .90     .95
##   4.000   4.250
##
## lowest : 0.250 0.375 0.500 0.625 0.750
## highest: 4.000 4.125 4.250 4.375 4.500
## ---------------------------------------------------------------------------
305 | ## x4
306 | ## n missing unique Mean .05 .10 .25 .50 .75
307 | ## 301 0 20 3.061 1.333 1.667 2.333 3.000 3.667
308 | ## .90 .95
309 | ## 4.667 5.000
310 | ##
311 | ## lowest : 0.0000 0.3333 0.6667 1.0000 1.3333
312 | ## highest: 5.0000 5.3333 5.6667 6.0000 6.3333
313 | ## ---------------------------------------------------------------------------
314 | ## x5
315 | ## n missing unique Mean .05 .10 .25 .50 .75
316 | ## 301 0 25 4.341 2.00 2.50 3.50 4.50 5.25
317 | ## .90 .95
318 | ## 6.00 6.25
319 | ##
320 | ## lowest : 1.00 1.25 1.50 1.75 2.00, highest: 6.00 6.25 6.50 6.75 7.00
321 | ## ---------------------------------------------------------------------------
322 | ## x6
323 | ## n missing unique Mean .05 .10 .25 .50 .75
324 | ## 301 0 40 2.186 0.7143 1.0000 1.4286 2.0000 2.7143
325 | ## .90 .95
326 | ## 3.7143 4.2857
327 | ##
328 | ## lowest : 0.1429 0.2857 0.4286 0.5714 0.7143
329 | ## highest: 5.1429 5.4286 5.5714 5.8571 6.1429
330 | ## ---------------------------------------------------------------------------
331 | ## x7
332 | ## n missing unique Mean .05 .10 .25 .50 .75
333 | ## 301 0 97 4.186 2.435 2.826 3.478 4.087 4.913
334 | ## .90 .95
335 | ## 5.696 5.870
336 | ##
337 | ## lowest : 1.304 1.870 2.000 2.043 2.130
338 | ## highest: 6.652 6.826 6.957 7.261 7.435
339 | ## ---------------------------------------------------------------------------
340 | ## x8
341 | ## n missing unique Mean .05 .10 .25 .50 .75
342 | ## 301 0 84 5.527 3.90 4.20 4.85 5.50 6.10
343 | ## .90 .95
344 | ## 6.80 7.20
345 | ##
346 | ## lowest : 3.05 3.50 3.60 3.65 3.70
347 | ## highest: 8.00 8.05 8.30 9.10 10.00
348 | ## ---------------------------------------------------------------------------
349 | ## x9
350 | ## n missing unique Mean .05 .10 .25 .50 .75
351 | ## 301 0 129 5.374 3.750 4.111 4.750 5.417 6.083
352 | ## .90 .95
353 | ## 6.667 7.000
354 | ##
355 | ## lowest : 2.778 3.111 3.222 3.278 3.306
356 | ## highest: 7.528 7.611 7.917 8.611 9.250
357 | ## ---------------------------------------------------------------------------
358 |
359 |
The data set includes 301 observations. It contains a few demographic variables (sex, age in years and months, school, and grade) along with the nine observed test scores used in the subsequent CFA.
m1_model <- ' visual =~ x1 + x2 + x3
365 | textual =~ x4 + x5 + x6
366 | speed =~ x7 + x8 + x9
367 | '
368 |
369 | m1_fit <- cfa(m1_model, data=cases)
370 |
371 |
372 | cfa
is one of the model-fitting functions in `lavaan`. The command includes many options. Data can be specified as a data frame, as is done here using the `data` argument; alternatively, a covariance matrix, a vector of means, and a sample size can be supplied.
`lavaan` is the parent model-fitting function, which takes a `model.type` argument of `'cfa'`, `'sem'`, or `'growth'`. Thus, `cfa`, `sem`, and `growth` are wrapper functions that call `lavaan` with particular argument values.

parTable(m1_fit)
381 |
382 |
383 | ## id lhs op rhs user group free ustart exo label eq.id unco
384 | ## 1 1 visual =~ x1 1 1 0 1 0 0 0
385 | ## 2 2 visual =~ x2 1 1 1 NA 0 0 1
386 | ## 3 3 visual =~ x3 1 1 2 NA 0 0 2
387 | ## 4 4 textual =~ x4 1 1 0 1 0 0 0
388 | ## 5 5 textual =~ x5 1 1 3 NA 0 0 3
389 | ## 6 6 textual =~ x6 1 1 4 NA 0 0 4
390 | ## 7 7 speed =~ x7 1 1 0 1 0 0 0
391 | ## 8 8 speed =~ x8 1 1 5 NA 0 0 5
392 | ## 9 9 speed =~ x9 1 1 6 NA 0 0 6
393 | ## 10 10 x1 ~~ x1 0 1 7 NA 0 0 7
394 | ## 11 11 x2 ~~ x2 0 1 8 NA 0 0 8
395 | ## 12 12 x3 ~~ x3 0 1 9 NA 0 0 9
396 | ## 13 13 x4 ~~ x4 0 1 10 NA 0 0 10
397 | ## 14 14 x5 ~~ x5 0 1 11 NA 0 0 11
398 | ## 15 15 x6 ~~ x6 0 1 12 NA 0 0 12
399 | ## 16 16 x7 ~~ x7 0 1 13 NA 0 0 13
400 | ## 17 17 x8 ~~ x8 0 1 14 NA 0 0 14
401 | ## 18 18 x9 ~~ x9 0 1 15 NA 0 0 15
402 | ## 19 19 visual ~~ visual 0 1 16 NA 0 0 16
403 | ## 20 20 textual ~~ textual 0 1 17 NA 0 0 17
404 | ## 21 21 speed ~~ speed 0 1 18 NA 0 0 18
405 | ## 22 22 visual ~~ textual 0 1 19 NA 0 0 19
406 | ## 23 23 visual ~~ speed 0 1 20 NA 0 0 20
407 | ## 24 24 textual ~~ speed 0 1 21 NA 0 0 21
408 |
409 |
410 | What do the columns mean?
* `id`: numeric identifier for the parameter
* `lhs`: left-hand side variable name
* `op`: operator (see page 7 of http://www.jstatsoft.org/v48/i02/paper); `=~` means "is manifested by"; `~~` means "is correlated with"
* `rhs`: right-hand side variable name
* `user`: 1 if the parameter was specified by the user, 0 otherwise
* `group`: presumably used in multiple-group analysis
* `free`: nonzero elements are free parameters in the model
* `ustart`: the value specified for fixed parameters
* `exo`: presumably flags exogenous variables
* `label`: probably just an optional label
* `eq.id`: presumably an identifier for sets of parameters constrained to be equal
* `unco`: presumably a counter for the unconstrained parameters

The model syntax used in `lavaan`
incorporates a lot of parameters by default to permit a tidy model syntax. The exact nature of these parameters is also determined by options to `cfa`, `sem`, and the other model-fitting functions.
`parTable` is a method for extracting the parameter table of a fitted model. Here it shows that the latent factors are allowed to intercorrelate: the `cfa` function has an argument `orthogonal`, which defaults to FALSE and so permits correlated factors.
parTable(cfa(m1_model, data=cases, orthogonal=TRUE))[22:24, ]
435 |
436 |
437 | ## id lhs op rhs user group free ustart exo label eq.id unco
438 | ## 22 22 visual ~~ textual 0 1 0 0 0 0 0
439 | ## 23 23 visual ~~ speed 0 1 0 0 0 0 0
440 | ## 24 24 textual ~~ speed 0 1 0 0 0 0 0
441 |
442 |
443 | When orthogonal=TRUE
is specified, the covariances among the latent factors are constrained to zero. This is reflected in `free=0` (i.e., the parameter is not free to vary) and `ustart=0` (the constrained value is zero) in the parameter table.
Returning to the original parameter table:
* Variances (`op` of `~~` where `lhs` is the same as `rhs`) are included for all observed and latent variables.

summary(m1_fit)
454 |
455 |
456 | ## lavaan (0.4-14) converged normally after 41 iterations
457 | ##
458 | ## Number of observations 301
459 | ##
460 | ## Estimator ML
461 | ## Minimum Function Chi-square 85.306
462 | ## Degrees of freedom 24
463 | ## P-value 0.000
464 | ##
465 | ## Parameter estimates:
466 | ##
467 | ## Information Expected
468 | ## Standard Errors Standard
469 | ##
470 | ## Estimate Std.err Z-value P(>|z|)
471 | ## Latent variables:
472 | ## visual =~
473 | ## x1 1.000
474 | ## x2 0.553 0.100 5.554 0.000
475 | ## x3 0.729 0.109 6.685 0.000
476 | ## textual =~
477 | ## x4 1.000
478 | ## x5 1.113 0.065 17.014 0.000
479 | ## x6 0.926 0.055 16.703 0.000
480 | ## speed =~
481 | ## x7 1.000
482 | ## x8 1.180 0.165 7.152 0.000
483 | ## x9 1.082 0.151 7.155 0.000
484 | ##
485 | ## Covariances:
486 | ## visual ~~
487 | ## textual 0.408 0.074 5.552 0.000
488 | ## speed 0.262 0.056 4.660 0.000
489 | ## textual ~~
490 | ## speed 0.173 0.049 3.518 0.000
491 | ##
492 | ## Variances:
493 | ## x1 0.549 0.114
494 | ## x2 1.134 0.102
495 | ## x3 0.844 0.091
496 | ## x4 0.371 0.048
497 | ## x5 0.446 0.058
498 | ## x6 0.356 0.043
499 | ## x7 0.799 0.081
500 | ## x8 0.488 0.074
501 | ## x9 0.566 0.071
502 | ## visual 0.809 0.145
503 | ## textual 0.979 0.112
504 | ## speed 0.384 0.086
505 | ##
506 |
507 |
508 | The default summary
method shows the \( \chi^2 \), \( df \), and p-value for the overall model, together with unstandardised parameter estimates, in some cases with significance tests.

There are multiple ways of getting fit statistics:
513 | 514 |fitMeasures(m1_fit)
515 |
516 |
517 | ## chisq df pvalue baseline.chisq
518 | ## 85.306 24.000 0.000 918.852
519 | ## baseline.df baseline.pvalue cfi tli
520 | ## 36.000 0.000 0.931 0.896
521 | ## logl unrestricted.logl npar aic
522 | ## -3737.745 -3695.092 21.000 7517.490
523 | ## bic ntotal bic2 rmsea
524 | ## 7595.339 301.000 7528.739 0.092
525 | ## rmsea.ci.lower rmsea.ci.upper rmsea.pvalue srmr
526 | ## 0.071 0.114 0.001 0.065
527 |
528 |
529 | # equivalent to:
530 | # inspect(m1_fit, 'fit.measures')
531 |
532 | fitMeasures(m1_fit)['rmsea']
533 |
534 |
535 | ## rmsea
536 | ## 0.09212
537 |
538 |
539 | fitMeasures(m1_fit, c('rmsea', 'rmsea.ci.lower', 'rmsea.ci.upper'))
540 |
541 |
542 | ## rmsea rmsea.ci.lower rmsea.ci.upper
543 | ## 0.092 0.071 0.114
544 |
545 |
546 |
547 |
548 | summary(m1_fit, fit.measures=TRUE)
549 |
550 |
551 | ## lavaan (0.4-14) converged normally after 41 iterations
552 | ##
553 | ## Number of observations 301
554 | ##
555 | ## Estimator ML
556 | ## Minimum Function Chi-square 85.306
557 | ## Degrees of freedom 24
558 | ## P-value 0.000
559 | ##
560 | ## Chi-square test baseline model:
561 | ##
562 | ## Minimum Function Chi-square 918.852
563 | ## Degrees of freedom 36
564 | ## P-value 0.000
565 | ##
566 | ## Full model versus baseline model:
567 | ##
568 | ## Comparative Fit Index (CFI) 0.931
569 | ## Tucker-Lewis Index (TLI) 0.896
570 | ##
571 | ## Loglikelihood and Information Criteria:
572 | ##
573 | ## Loglikelihood user model (H0) -3737.745
574 | ## Loglikelihood unrestricted model (H1) -3695.092
575 | ##
576 | ## Number of free parameters 21
577 | ## Akaike (AIC) 7517.490
578 | ## Bayesian (BIC) 7595.339
579 | ## Sample-size adjusted Bayesian (BIC) 7528.739
580 | ##
581 | ## Root Mean Square Error of Approximation:
582 | ##
583 | ## RMSEA 0.092
584 | ## 90 Percent Confidence Interval 0.071 0.114
585 | ## P-value RMSEA <= 0.05 0.001
586 | ##
587 | ## Standardized Root Mean Square Residual:
588 | ##
589 | ## SRMR 0.065
590 | ##
591 | ## Parameter estimates:
592 | ##
593 | ## Information Expected
594 | ## Standard Errors Standard
595 | ##
596 | ## Estimate Std.err Z-value P(>|z|)
597 | ## Latent variables:
598 | ## visual =~
599 | ## x1 1.000
600 | ## x2 0.553 0.100 5.554 0.000
601 | ## x3 0.729 0.109 6.685 0.000
602 | ## textual =~
603 | ## x4 1.000
604 | ## x5 1.113 0.065 17.014 0.000
605 | ## x6 0.926 0.055 16.703 0.000
606 | ## speed =~
607 | ## x7 1.000
608 | ## x8 1.180 0.165 7.152 0.000
609 | ## x9 1.082 0.151 7.155 0.000
610 | ##
611 | ## Covariances:
612 | ## visual ~~
613 | ## textual 0.408 0.074 5.552 0.000
614 | ## speed 0.262 0.056 4.660 0.000
615 | ## textual ~~
616 | ## speed 0.173 0.049 3.518 0.000
617 | ##
618 | ## Variances:
619 | ## x1 0.549 0.114
620 | ## x2 1.134 0.102
621 | ## x3 0.844 0.091
622 | ## x4 0.371 0.048
623 | ## x5 0.446 0.058
624 | ## x6 0.356 0.043
625 | ## x7 0.799 0.081
626 | ## x8 0.488 0.074
627 | ## x9 0.566 0.071
628 | ## visual 0.809 0.145
629 | ## textual 0.979 0.112
630 | ## speed 0.384 0.086
631 | ##
632 |
633 |
634 | rmsea.ci.lower
and `rmsea.ci.upper` refer to the lower and upper bounds of the 90% confidence interval for the RMSEA.
`fit.measures=TRUE` provides a way of displaying the main fit indices as part of the summary output.

m1_mod <- modificationIndices(m1_fit)
644 | head(m1_mod[order(m1_mod$mi, decreasing=TRUE), ], 10)
645 |
646 |
647 | ## lhs op rhs mi epc sepc.lv sepc.all sepc.nox
648 | ## 1 visual =~ x9 36.411 0.577 0.519 0.515 0.515
649 | ## 2 x7 ~~ x8 34.145 0.536 0.536 0.488 0.488
650 | ## 3 visual =~ x7 18.631 -0.422 -0.380 -0.349 -0.349
651 | ## 4 x8 ~~ x9 14.946 -0.423 -0.423 -0.415 -0.415
652 | ## 5 textual =~ x3 9.151 -0.272 -0.269 -0.238 -0.238
653 | ## 6 x2 ~~ x7 8.918 -0.183 -0.183 -0.143 -0.143
654 | ## 7 textual =~ x1 8.903 0.350 0.347 0.297 0.297
655 | ## 8 x2 ~~ x3 8.532 0.218 0.218 0.164 0.164
656 | ## 9 x3 ~~ x5 7.858 -0.130 -0.130 -0.089 -0.089
657 | ## 10 visual =~ x5 7.441 -0.210 -0.189 -0.147 -0.147
658 |
659 |
660 | modificationIndices
function returns modification indices and expected parameter changes (EPCs). The largest index suggests letting `x9` also load on the `visual` factor, so the next model adds that loading.

m2_model <- ' visual =~ x1 + x2 + x3 + x9
666 | textual =~ x4 + x5 + x6
667 | speed =~ x7 + x8 + x9
668 | '
669 |
670 | m2_fit <- cfa(m2_model, data=cases)
671 | anova(m1_fit, m2_fit)
672 |
673 |
674 | ## Chi Square Difference Test
675 | ##
676 | ## Df AIC BIC Chisq Chisq diff Df diff Pr(>Chisq)
677 | ## m2_fit 23 7487 7568 52.4
678 | ## m1_fit 24 7517 7595 85.3 32.9 1 9.6e-09 ***
679 | ## ---
680 | ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
681 |
682 |
* The chi-square difference of 32.9234 differs from the value of the modification index, 36.411. This is expected: the modification index is only an approximation to the change in chi-square obtained by actually freeing the parameter and refitting the model.

summary(m1_fit)
694 |
695 |
696 | ## lavaan (0.4-14) converged normally after 41 iterations
697 | ##
698 | ## Number of observations 301
699 | ##
700 | ## Estimator ML
701 | ## Minimum Function Chi-square 85.306
702 | ## Degrees of freedom 24
703 | ## P-value 0.000
704 | ##
705 | ## Parameter estimates:
706 | ##
707 | ## Information Expected
708 | ## Standard Errors Standard
709 | ##
710 | ## Estimate Std.err Z-value P(>|z|)
711 | ## Latent variables:
712 | ## visual =~
713 | ## x1 1.000
714 | ## x2 0.553 0.100 5.554 0.000
715 | ## x3 0.729 0.109 6.685 0.000
716 | ## textual =~
717 | ## x4 1.000
718 | ## x5 1.113 0.065 17.014 0.000
719 | ## x6 0.926 0.055 16.703 0.000
720 | ## speed =~
721 | ## x7 1.000
722 | ## x8 1.180 0.165 7.152 0.000
723 | ## x9 1.082 0.151 7.155 0.000
724 | ##
725 | ## Covariances:
726 | ## visual ~~
727 | ## textual 0.408 0.074 5.552 0.000
728 | ## speed 0.262 0.056 4.660 0.000
729 | ## textual ~~
730 | ## speed 0.173 0.049 3.518 0.000
731 | ##
732 | ## Variances:
733 | ## x1 0.549 0.114
734 | ## x2 1.134 0.102
735 | ## x3 0.844 0.091
736 | ## x4 0.371 0.048
737 | ## x5 0.446 0.058
738 | ## x6 0.356 0.043
739 | ## x7 0.799 0.081
740 | ## x8 0.488 0.074
741 | ## x9 0.566 0.071
742 | ## visual 0.809 0.145
743 | ## textual 0.979 0.112
744 | ## speed 0.384 0.086
745 | ##
746 |
747 |
748 | standardizedSolution(m1_fit)
749 |
750 |
751 | ## lhs op rhs est.std se z pvalue
752 | ## 1 visual =~ x1 0.772 NA NA NA
753 | ## 2 visual =~ x2 0.424 NA NA NA
754 | ## 3 visual =~ x3 0.581 NA NA NA
755 | ## 4 textual =~ x4 0.852 NA NA NA
756 | ## 5 textual =~ x5 0.855 NA NA NA
757 | ## 6 textual =~ x6 0.838 NA NA NA
758 | ## 7 speed =~ x7 0.570 NA NA NA
759 | ## 8 speed =~ x8 0.723 NA NA NA
760 | ## 9 speed =~ x9 0.665 NA NA NA
761 | ## 10 x1 ~~ x1 0.404 NA NA NA
762 | ## 11 x2 ~~ x2 0.821 NA NA NA
763 | ## 12 x3 ~~ x3 0.662 NA NA NA
764 | ## 13 x4 ~~ x4 0.275 NA NA NA
765 | ## 14 x5 ~~ x5 0.269 NA NA NA
766 | ## 15 x6 ~~ x6 0.298 NA NA NA
767 | ## 16 x7 ~~ x7 0.676 NA NA NA
768 | ## 17 x8 ~~ x8 0.477 NA NA NA
769 | ## 18 x9 ~~ x9 0.558 NA NA NA
770 | ## 19 visual ~~ visual 1.000 NA NA NA
771 | ## 20 textual ~~ textual 1.000 NA NA NA
772 | ## 21 speed ~~ speed 1.000 NA NA NA
773 | ## 22 visual ~~ textual 0.459 NA NA NA
774 | ## 23 visual ~~ speed 0.471 NA NA NA
775 | ## 24 textual ~~ speed 0.283 NA NA NA
776 |
777 |
778 |
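As one further extraction (cf. the cheat sheet above), the factor correlation matrix can be obtained by converting the latent covariance matrix; a small sketch:

```r
# correlations among the visual, textual, and speed factors
cov2cor(inspect(m1_fit, 'coefficients')$psi)
```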
779 |
780 |
781 |
782 |
--------------------------------------------------------------------------------
/ex1-paper/ex1-paper.md:
--------------------------------------------------------------------------------
1 | # Example 1 from Lavaan
2 |
3 | This exercise examines the first example shown in
4 | http://www.jstatsoft.org/v48/i02/paper.
--------------------------------------------------------------------------------
/ex2-paper/ex2-paper.md:
--------------------------------------------------------------------------------
1 |
2 | # Example 2 from Rosseel's Paper on lavaan
3 |
4 |
5 | ```r
6 | library(lavaan)
7 | Data <- PoliticalDemocracy
8 | ```
9 |
10 |
11 |
12 |
13 | This example is an elaboration on Example 2 from Yves Rosseel's Journal of Statistical Software Article (see [here](http://www.jstatsoft.org/v48/i02/paper)).
14 |
15 | ## M0: Basic Measurement model
16 |
17 |
18 | ```r
19 | m0_model <- '
20 | # measurement model
21 | ind60 =~ x1 + x2 + x3
22 | dem60 =~ y1 + y2 + y3 + y4
23 | dem65 =~ y5 + y6 + y7 + y8
24 | '
25 |
26 | m0_fit <- cfa(m0_model, data=Data)
27 | ```
28 |
29 |
30 |
31 |
32 | * `m0` defines a basic measurement model that permits correlated factors. Note that it does not have correlations between corresponding democracy indicator measures over time.
33 |
34 | **Questions:**
35 |
36 | * Is it a good model?
37 |
38 |
39 |
40 | ```r
41 | fitmeasures(m0_fit)
42 | ```
43 |
44 | ```
45 | ## chisq df pvalue baseline.chisq
46 | ## 72.462 41.000 0.002 730.654
47 | ## baseline.df baseline.pvalue cfi tli
48 | ## 55.000 0.000 0.953 0.938
49 | ## logl unrestricted.logl npar aic
50 | ## -1564.959 -1528.728 25.000 3179.918
51 | ## bic ntotal bic2 rmsea
52 | ## 3237.855 75.000 3159.062 0.101
53 | ## rmsea.ci.lower rmsea.ci.upper rmsea.pvalue srmr
54 | ## 0.061 0.139 0.021 0.055
55 | ```
56 |
57 |
58 |
59 |
60 | * CFI suggests a reasonable model, but the RMSEA is quite large.
61 |
62 |
63 |
64 | ```r
65 | inspect(m0_fit, 'standardized')
66 | ```
67 |
68 | ```
69 | ## lhs op rhs est.std se z pvalue
70 | ## 1 ind60 =~ x1 0.920 NA NA NA
71 | ## 2 ind60 =~ x2 0.973 NA NA NA
72 | ## 3 ind60 =~ x3 0.872 NA NA NA
73 | ## 4 dem60 =~ y1 0.845 NA NA NA
74 | ## 5 dem60 =~ y2 0.760 NA NA NA
75 | ## 6 dem60 =~ y3 0.705 NA NA NA
76 | ## 7 dem60 =~ y4 0.860 NA NA NA
77 | ## 8 dem65 =~ y5 0.803 NA NA NA
78 | ## 9 dem65 =~ y6 0.783 NA NA NA
79 | ## 10 dem65 =~ y7 0.819 NA NA NA
80 | ## 11 dem65 =~ y8 0.847 NA NA NA
81 | ## 12 x1 ~~ x1 0.154 NA NA NA
82 | ## 13 x2 ~~ x2 0.053 NA NA NA
83 | ## 14 x3 ~~ x3 0.240 NA NA NA
84 | ## 15 y1 ~~ y1 0.286 NA NA NA
85 | ## 16 y2 ~~ y2 0.422 NA NA NA
86 | ## 17 y3 ~~ y3 0.503 NA NA NA
87 | ## 18 y4 ~~ y4 0.261 NA NA NA
88 | ## 19 y5 ~~ y5 0.355 NA NA NA
89 | ## 20 y6 ~~ y6 0.387 NA NA NA
90 | ## 21 y7 ~~ y7 0.329 NA NA NA
91 | ## 22 y8 ~~ y8 0.283 NA NA NA
92 | ## 23 ind60 ~~ ind60 1.000 NA NA NA
93 | ## 24 dem60 ~~ dem60 1.000 NA NA NA
94 | ## 25 dem65 ~~ dem65 1.000 NA NA NA
95 | ## 26 ind60 ~~ dem60 0.448 NA NA NA
96 | ## 27 ind60 ~~ dem65 0.555 NA NA NA
97 | ## 28 dem60 ~~ dem65 0.978 NA NA NA
98 | ```
99 |
100 |
101 |
102 |
103 | * The table of standardised loadings shows all factor loadings to be large.
104 |
105 |
106 |
107 | ```r
108 | m0_mod <- modificationindices(m0_fit)
109 | head(m0_mod[order(m0_mod$mi, decreasing=TRUE), ], 12)
110 | ```
111 |
112 | ```
113 | ## lhs op rhs mi epc sepc.lv sepc.all sepc.nox
114 | ## 1 y2 ~~ y6 9.279 2.129 2.129 0.162 0.162
115 | ## 2 y6 ~~ y8 8.668 1.513 1.513 0.140 0.140
116 | ## 3 y1 ~~ y5 8.183 0.884 0.884 0.131 0.131
117 | ## 4 y3 ~~ y6 6.574 -1.590 -1.590 -0.146 -0.146
118 | ## 5 y1 ~~ y3 5.204 1.024 1.024 0.121 0.121
119 | ## 6 y2 ~~ y4 4.911 1.432 1.432 0.110 0.110
120 | ## 7 y3 ~~ y7 4.088 1.152 1.152 0.108 0.108
121 | ## 8 ind60 =~ y5 4.007 0.762 0.510 0.197 0.197
122 | ## 9 x1 ~~ y2 3.785 -0.192 -0.192 -0.067 -0.067
123 | ## 10 ind60 =~ y4 3.568 0.811 0.543 0.163 0.163
124 | ## 11 y2 ~~ y3 3.215 -1.365 -1.365 -0.107 -0.107
125 | ## 12 y5 ~~ y6 3.116 -0.774 -0.774 -0.089 -0.089
126 | ```
127 |
128 |
129 |
130 |
131 | * The table of the largest modification indices suggests a range of ways that the model could be improved. Because the sample size is small, particular caution is needed with these.
132 | * Several of these modifications concern the expected requirement to permit indicator variables at different time points to correlate (e.g., `y2` with `y6`, `y3` with `y7`).
133 | * It may also be that some pairs of items are correlated more than others. For example, the following correlation matrix shows how `y6` and `y8` have a particularly large correlation.
134 |
135 |
136 |
137 | ```r
138 | round(cor(Data[,c('y5', 'y6', 'y7', 'y8')]), 2)
139 | ```
140 |
141 | ```
142 | ## y5 y6 y7 y8
143 | ## y5 1.00 0.56 0.68 0.63
144 | ## y6 0.56 1.00 0.61 0.75
145 | ## y7 0.68 0.61 1.00 0.71
146 | ## y8 0.63 0.75 0.71 1.00
147 | ```
148 |
149 |
150 |
151 |
152 |
153 | * What are the correlations between the factors?
154 |
155 |
156 |
157 | ```r
158 | cov2cor(inspect(m0_fit, "coefficients")$psi)
159 | ```
160 |
161 | ```
162 | ## ind60 dem60 dem65
163 | ## ind60 1.000
164 | ## dem60 0.448 1.000
165 | ## dem65 0.555 0.978 1.000
166 | ```
167 |
168 |
169 |
170 |
171 | This certainly suggests that the factors are strongly related, especially the two democracy measures.
172 |
173 |
174 | ## M1: Correlated item measurement model
175 | This next model permits corresponding democracy measures from the two time points to be correlated.
176 |
177 |
178 |
179 | ```r
180 | m1_model <- '
181 | # measurement model
182 | ind60 =~ x1 + x2 + x3
183 | dem60 =~ y1 + y2 + y3 + y4
184 | dem65 =~ y5 + y6 + y7 + y8
185 |
186 | # correlated residuals
187 | y1 ~~ y5
188 | y2 ~~ y6
189 | y3 ~~ y7
190 | y4 ~~ y8
191 | '
192 |
193 | m1_fit <- cfa(m1_model, data=Data)
194 | ```
195 |
196 |
197 |
198 |
199 | * Is this an improvement over `m0` with uncorrelated indicators?
200 | * Does `m1` have good fit in and of itself?
201 |
202 |
203 |
204 | ```r
205 | anova(m0_fit, m1_fit)
206 | ```
207 |
208 | ```
209 | ## Chi Square Difference Test
210 | ##
211 | ## Df AIC BIC Chisq Chisq diff Df diff Pr(>Chisq)
212 | ## m1_fit 37 3166 3233 50.8
213 | ## m0_fit 41 3180 3238 72.5 21.6 4 0.00024 ***
214 | ## ---
215 | ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
216 | ```
217 |
218 | ```r
219 | round(cbind(m0=inspect(m0_fit, 'fit.measures'),
220 | m1=inspect(m1_fit, 'fit.measures')), 3)
221 | ```
222 |
223 | ```
224 | ## m0 m1
225 | ## chisq 72.462 50.835
226 | ## df 41.000 37.000
227 | ## pvalue 0.002 0.064
228 | ## baseline.chisq 730.654 730.654
229 | ## baseline.df 55.000 55.000
230 | ## baseline.pvalue 0.000 0.000
231 | ## cfi 0.953 0.980
232 | ## tli 0.938 0.970
233 | ## logl -1564.959 -1554.146
234 | ## unrestricted.logl -1528.728 -1528.728
235 | ## npar 25.000 29.000
236 | ## aic 3179.918 3166.292
237 | ## bic 3237.855 3233.499
238 | ## ntotal 75.000 75.000
239 | ## bic2 3159.062 3142.099
240 | ## rmsea 0.101 0.071
241 | ## rmsea.ci.lower 0.061 0.000
242 | ## rmsea.ci.upper 0.139 0.115
243 | ## rmsea.pvalue 0.021 0.234
244 | ## srmr 0.055 0.050
245 | ```
246 |
247 |
248 |
249 |
250 | * It is a significant improvement.
251 | * RMSEA and other fit measures are substantially improved.
252 | * The relatively small sample size makes it difficult to judge how far model refinement should continue. In general, the RMSEA suggests that further improvements are possible, but it is less clear how to proceed in a principled way.
253 |
254 |
255 |
256 |
257 | # M2: Basic SEM
258 |
259 |
260 | ```r
261 | m2_model <- '
262 | # measurement model
263 | ind60 =~ x1 + x2 + x3
264 | dem60 =~ y1 + y2 + y3 + y4
265 | dem65 =~ y5 + y6 + y7 + y8
266 |
267 | # correlated residuals
268 | y1 ~~ y5
269 | y2 ~~ y6
270 | y3 ~~ y7
271 | y4 ~~ y8
272 |
273 | # regressions
274 | dem60 ~ ind60
275 | dem65 ~ ind60 + dem60
276 | '
277 |
278 | m2_fit <- sem(m2_model, data=Data)
279 | ```
280 |
281 |
282 |
283 |
284 | * Is the fit the same as for model 1, as I would expect?
285 |
286 |
287 |
288 | ```r
289 | rbind(m1 = fitMeasures(m1_fit)[c('chisq', 'rmsea')],
290 | m2 = fitMeasures(m2_fit)[c('chisq', 'rmsea')])
291 | ```
292 |
293 | ```
294 | ## chisq rmsea
295 | ## m1 50.84 0.07061
296 | ## m2 50.84 0.07061
297 | ```
298 |
299 |
300 |
301 | Yes, it is: the regressions among the three latent variables are a just-identified re-expression of the free factor covariances in `m1`, so the fit is unchanged.
302 |
303 | * Assuming democracy 1965 is the dependent variable, how can we get the information typically available in multiple regression output?
304 | * R-squared?
305 | * Unstandardised regression coefficients?
306 | * Standardised regression coefficients?
307 | * Standard errors, p-values, and confidence intervals on unstandardised coefficients?
308 |
309 |
310 |
311 | ```r
312 | # m2_fit <- sem(m2_model, data=Data)
313 |
314 | # r-square for dem-65
315 | inspect(m2_fit, 'r2')['dem65']
316 | ```
317 |
318 | ```
319 | ## dem65
320 | ## 0.9139
321 | ```
322 |
323 | ```r
324 |
325 | # Unstandardised regression coefficients
326 | inspect(m2_fit, 'coef')$beta['dem65', ]
327 | ```
328 |
329 | ```
330 | ## ind60 dem60 dem65
331 | ## 0.5069 0.8157 0.0000
332 | ```
333 |
334 | ```r
335 |
336 | # Standardised regression coefficients
337 | subset(inspect(m2_fit, 'standardized'), lhs == 'dem65' & op == '~')
338 | ```
339 |
340 | ```
341 | ## lhs op rhs est.std se z pvalue
342 | ## 1 dem65 ~ ind60 0.168 NA NA NA
343 | ## 2 dem65 ~ dem60 0.869 NA NA NA
344 | ```
345 |
346 | ```r
347 |
348 | # Just a guess, may not be correct:
349 | # coefs <- data.frame(coef=inspect(m2_fit, 'coef')$beta['dem65', ],
350 | # se=inspect(m2_fit, 'se')$beta['dem65', ])
351 | # coefs$low95ci <- coefs$coef - coefs$se * 1.96
352 | # coefs$high95ci <- coefs$coef + coefs$se * 1.96
353 | ```
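In fact, the commented-out guess above shouldn't be needed: `parameterestimates()` (used in the path-analysis example in this repository) already returns standard errors, p-values, and 95% confidence intervals for the unstandardised estimates. A minimal sketch:

```r
# SEs, p-values, and 95% CIs for the structural paths predicting dem65
est <- parameterestimates(m2_fit)
subset(est, lhs == 'dem65' & op == '~')
```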
354 |
355 |
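A natural extension, borrowing the labelled-parameter idiom from the path-analysis example (the model name `m3_model` and the labels `a`, `b`, `c` are mine), is to define the indirect and total effects of `ind60` on `dem65` directly in the model syntax:

```r
m3_model <- '
  # measurement model
  ind60 =~ x1 + x2 + x3
  dem60 =~ y1 + y2 + y3 + y4
  dem65 =~ y5 + y6 + y7 + y8

  # correlated residuals
  y1 ~~ y5
  y2 ~~ y6
  y3 ~~ y7
  y4 ~~ y8

  # regressions with labelled coefficients
  dem60 ~ a*ind60
  dem65 ~ c*ind60 + b*dem60

  # indirect and total effects of ind60 on dem65
  indirect := a*b
  total    := c + a*b
'

m3_fit <- sem(m3_model, data=Data)
parameterestimates(m3_fit)
```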
356 |
357 |
358 |
359 |
360 |
--------------------------------------------------------------------------------
/ex2-paper/ex2-paper.rmd:
--------------------------------------------------------------------------------
1 | `r opts_chunk$set(cache=TRUE, tidy=FALSE)`
2 | # Example 2 from Rosseel's Paper on lavaan
3 | ```{r setup, message=FALSE}
4 | library(lavaan)
5 | Data <- PoliticalDemocracy
6 | ```
7 |
8 | This example is an elaboration on Example 2 from Yves Rosseel's Journal of Statistical Software Article (see [here](http://www.jstatsoft.org/v48/i02/paper)).
9 |
10 | ## M0: Basic Measurement model
11 | ```{r basic_measurement_model}
12 | m0_model <- '
13 | # measurement model
14 | ind60 =~ x1 + x2 + x3
15 | dem60 =~ y1 + y2 + y3 + y4
16 | dem65 =~ y5 + y6 + y7 + y8
17 | '
18 |
19 | m0_fit <- cfa(m0_model, data=Data)
20 | ```
21 |
22 | * `m0` defines a basic measurement model that permits correlated factors. Note that it does not have correlations between corresponding democracy indicator measures over time.
23 |
24 | **Questions:**
25 |
26 | * Is it a good model?
27 |
28 | ```{r m0_fit_measures}
29 | fitmeasures(m0_fit)
30 | ```
31 |
32 | * CFI suggests a reasonable model, but the RMSEA is quite large.
33 |
34 | ```{r m0_standardised_parameters}
35 | inspect(m0_fit, 'standardized')
36 | ```
37 |
38 | * The table of standardised loadings shows all factor loadings to be large.
39 |
40 | ```{r m0_mod_indices}
41 | m0_mod <- modificationindices(m0_fit)
42 | head(m0_mod[order(m0_mod$mi, decreasing=TRUE), ], 12)
43 | ```
44 |
45 | * The table of the largest modification indices suggests a range of ways that the model could be improved. Because the sample size is small, particular caution is needed with these.
46 | * Several of these modifications concern the expected requirement to permit indicator variables at different time points to correlate (e.g., `y2` with `y6`, `y3` with `y7`).
47 | * It may also be that some pairs of items are correlated more than others. For example, the following correlation matrix shows how `y6` and `y8` have a particularly large correlation.
48 |
49 | ```{r}
50 | round(cor(Data[,c('y5', 'y6', 'y7', 'y8')]), 2)
51 | ```
52 |
53 |
54 | * What are the correlations between the factors?
55 |
56 | ```{r}
57 | cov2cor(inspect(m0_fit, "coefficients")$psi)
58 | ```
59 |
60 | This certainly suggests that the factors are strongly related, especially the two democracy measures.
61 |
62 |
63 | ## M1: Correlated item measurement model
64 | This next model permits corresponding democracy measures from the two time points to be correlated.
65 |
66 | ```{r correlated_measurement_model}
67 | m1_model <- '
68 | # measurement model
69 | ind60 =~ x1 + x2 + x3
70 | dem60 =~ y1 + y2 + y3 + y4
71 | dem65 =~ y5 + y6 + y7 + y8
72 |
73 | # correlated residuals
74 | y1 ~~ y5
75 | y2 ~~ y6
76 | y3 ~~ y7
77 | y4 ~~ y8
78 | '
79 |
80 | m1_fit <- cfa(m1_model, data=Data)
81 | ```
82 |
83 | * Is this an improvement over `m0` with uncorrelated indicators?
84 | * Does `m1` have good fit in and of itself?
85 |
86 | ```{r}
87 | anova(m0_fit, m1_fit)
88 | round(cbind(m0=inspect(m0_fit, 'fit.measures'),
89 | m1=inspect(m1_fit, 'fit.measures')), 3)
90 | ```
91 |
92 | * It is a significant improvement.
93 | * RMSEA and other fit measures are substantially improved.
94 | * The relatively small sample size makes it difficult to judge how far model refinement should continue. In general, the RMSEA suggests that further improvements are possible, but it is less clear how to proceed in a principled way.
95 |
96 |
97 |
98 |
99 | # M2: Basic SEM
100 | ```{r m2_model}
101 | m2_model <- '
102 | # measurement model
103 | ind60 =~ x1 + x2 + x3
104 | dem60 =~ y1 + y2 + y3 + y4
105 | dem65 =~ y5 + y6 + y7 + y8
106 |
107 | # correlated residuals
108 | y1 ~~ y5
109 | y2 ~~ y6
110 | y3 ~~ y7
111 | y4 ~~ y8
112 |
113 | # regressions
114 | dem60 ~ ind60
115 | dem65 ~ ind60 + dem60
116 | '
117 |
118 | m2_fit <- sem(m2_model, data=Data)
119 | ```
120 |
121 | * Is the fit the same as for model 1, as I would expect?
122 |
123 | ```{r m2_chi_square_check}
124 | rbind(m1 = fitMeasures(m1_fit)[c('chisq', 'rmsea')],
125 | m2 = fitMeasures(m2_fit)[c('chisq', 'rmsea')])
126 | ```
127 | Yes, it is: the regressions among the three latent variables are a just-identified re-expression of the free factor covariances in `m1`, so the fit is unchanged.
128 |
129 | * Assuming democracy 1965 is the dependent variable, how can we get the information typically available in multiple regression output?
130 | * R-squared?
131 | * Unstandardised regression coefficients?
132 | * Standardised regression coefficients?
133 | * Standard errors, p-values, and confidence intervals on unstandardised coefficients?
134 |
135 | ```{r}
136 | # m2_fit <- sem(m2_model, data=Data)
137 |
138 | # r-square for dem-65
139 | inspect(m2_fit, 'r2')['dem65']
140 |
141 | # Unstandardised regression coefficients
142 | inspect(m2_fit, 'coef')$beta['dem65', ]
143 |
144 | # Standardised regression coefficients
145 | subset(inspect(m2_fit, 'standardized'), lhs == 'dem65' & op == '~')
146 |
147 | # Just a guess, may not be correct:
148 | # coefs <- data.frame(coef=inspect(m2_fit, 'coef')$beta['dem65', ],
149 | # se=inspect(m2_fit, 'se')$beta['dem65', ])
150 | # coefs$low95ci <- coefs$coef - coefs$se * 1.96
151 | # coefs$high95ci <- coefs$coef + coefs$se * 1.96
152 | ```
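In fact, the commented-out guess above shouldn't be needed: `parameterestimates()` (used in the path-analysis example in this repository) already returns standard errors, p-values, and 95% confidence intervals for the unstandardised estimates. A minimal sketch:

```{r}
# SEs, p-values, and 95% CIs for the structural paths predicting dem65
est <- parameterestimates(m2_fit)
subset(est, lhs == 'dem65' & op == '~')
```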
153 |
154 |
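A natural extension, borrowing the labelled-parameter idiom from the path-analysis example (the model name `m3_model` and the labels `a`, `b`, `c` are mine), is to define the indirect and total effects of `ind60` on `dem65` directly in the model syntax:

```{r}
m3_model <- '
  # measurement model
  ind60 =~ x1 + x2 + x3
  dem60 =~ y1 + y2 + y3 + y4
  dem65 =~ y5 + y6 + y7 + y8

  # correlated residuals
  y1 ~~ y5
  y2 ~~ y6
  y3 ~~ y7
  y4 ~~ y8

  # regressions with labelled coefficients
  dem60 ~ a*ind60
  dem65 ~ c*ind60 + b*dem60

  # indirect and total effects of ind60 on dem65
  indirect := a*b
  total    := c + a*b
'

m3_fit <- sem(m3_model, data=Data)
parameterestimates(m3_fit)
```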
155 |
156 |
--------------------------------------------------------------------------------
/makefile:
--------------------------------------------------------------------------------
1 |
2 | pdf-all:
3 | Rscript 'convert.r'
4 |
--------------------------------------------------------------------------------
/path-analysis/figure/unnamed-chunk-5.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jeromyanglim/lavaan-examples/d7f5cbdc7fe14ffd039512bae6aa140c2a0ca5e6/path-analysis/figure/unnamed-chunk-5.png
--------------------------------------------------------------------------------
/path-analysis/path-analysis.md:
--------------------------------------------------------------------------------
1 |
2 |
3 | # Path Analysis Example
4 |
5 |
6 | ```r
7 | library(psych)
8 | library(lavaan)
9 | ```
10 |
11 |
12 |
13 |
14 |
15 | ## Simulate data
16 | Let's simulate some data:
17 |
18 | * three orthogonal predictor variables
19 | * one mediator variable
20 | * one dependent variable
21 |
22 |
23 |
24 | ```r
25 | set.seed(1234)
26 | N <- 1000
27 | iv1 <- rnorm(N, 0, 1)
28 | iv2 <- rnorm(N, 0, 1)
29 | iv3 <- rnorm(N, 0, 1)
30 | mv <- rnorm(N, .2 * iv1 + -.2 * iv2 + .3 * iv3, 1)
31 | dv <- rnorm(N, .8 * mv, 1)
32 | data_1 <- data.frame(iv1, iv2, iv3, mv, dv)
33 | ```
34 |
35 |
36 |
37 |
38 | ## Traditional examination of dataset
39 | * Is a regression consistent with the model?
40 |
41 |
42 |
43 | ```r
44 | summary(lm(mv ~ iv1 + iv2 + iv3, data_1))
45 | ```
46 |
47 | ```
48 | ##
49 | ## Call:
50 | ## lm(formula = mv ~ iv1 + iv2 + iv3, data = data_1)
51 | ##
52 | ## Residuals:
53 | ## Min 1Q Median 3Q Max
54 | ## -3.0281 -0.6863 0.0114 0.6697 3.1412
55 | ##
56 | ## Coefficients:
57 | ## Estimate Std. Error t value Pr(>|t|)
58 | ## (Intercept) -0.00945 0.03150 -0.30 0.76
59 | ## iv1 0.19737 0.03163 6.24 6.4e-10 ***
60 | ## iv2 -0.19978 0.03216 -6.21 7.7e-10 ***
61 | ## iv3 0.29183 0.03113 9.38 < 2e-16 ***
62 | ## ---
63 | ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
64 | ##
65 | ## Residual standard error: 0.995 on 996 degrees of freedom
66 | ## Multiple R-squared: 0.144, Adjusted R-squared: 0.141
67 | ## F-statistic: 55.8 on 3 and 996 DF, p-value: <2e-16
68 | ##
69 | ```
70 |
71 |
72 |
73 |
74 | These estimates are broadly similar to the coefficients in the equation used to simulate `mv` (0.2, -0.2, and 0.3).
75 |
76 |
77 |
78 | ```r
79 | summary(lm(dv ~ iv1 + iv2 + iv3 + mv, data_1))
80 | ```
81 |
82 | ```
83 | ##
84 | ## Call:
85 | ## lm(formula = dv ~ iv1 + iv2 + iv3 + mv, data = data_1)
86 | ##
87 | ## Residuals:
88 | ## Min 1Q Median 3Q Max
89 | ## -2.7484 -0.6547 -0.0359 0.6947 2.7185
90 | ##
91 | ## Coefficients:
92 | ## Estimate Std. Error t value Pr(>|t|)
93 | ## (Intercept) -0.0410 0.0308 -1.33 0.18
94 | ## iv1 -0.0449 0.0315 -1.43 0.15
95 | ## iv2 0.0400 0.0320 1.25 0.21
96 | ## iv3 0.0162 0.0317 0.51 0.61
97 | ## mv 0.8250 0.0309 26.66 <2e-16 ***
98 | ## ---
99 | ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
100 | ##
101 | ## Residual standard error: 0.972 on 995 degrees of freedom
102 | ## Multiple R-squared: 0.45, Adjusted R-squared: 0.448
103 | ## F-statistic: 204 on 4 and 995 DF, p-value: <2e-16
104 | ##
105 | ```
106 |
107 |
108 |
109 | Given that the simulation is based on complete mediation, the true regression coefficients for the ivs are zero. The results of the multiple regression predicting the `dv` from the `iv`s and `mv` are consistent with this.
110 |
111 | What are the basic descriptive statistics and intercorrelations?
112 |
113 |
114 |
115 | ```r
116 | psych::describe(data_1)
117 | ```
118 |
119 | ```
120 | ## var n mean sd median trimmed mad min max range skew
121 | ## iv1 1 1000 -0.03 1.00 -0.04 -0.03 0.95 -3.40 3.20 6.59 -0.01
122 | ## iv2 2 1000 0.01 0.98 0.01 0.02 0.97 -3.12 3.17 6.29 -0.07
123 | ## iv3 3 1000 0.03 1.01 0.06 0.03 1.06 -3.09 3.02 6.12 0.01
124 | ## mv 4 1000 -0.01 1.07 0.02 -0.02 1.03 -3.12 3.82 6.94 0.07
125 | ## dv 5 1000 -0.05 1.31 -0.03 -0.04 1.37 -3.99 3.53 7.53 -0.03
126 | ## kurtosis se
127 | ## iv1 0.25 0.03
128 | ## iv2 -0.07 0.03
129 | ## iv3 -0.21 0.03
130 | ## mv 0.14 0.03
131 | ## dv -0.17 0.04
132 | ```
133 |
134 | ```r
135 | pairs.panels(data_1, pch='.')
136 | ```
137 |
138 | 
139 |
140 |
141 |
142 | ## M1 Fit Path Analysis model
143 |
144 |
145 |
146 | ```r
147 | m1_model <- '
148 | dv ~ mv
149 | mv ~ iv1 + iv2 + iv3
150 | '
151 |
152 | m1_fit <- sem(m1_model, data=data_1)
153 | ```
154 |
155 |
156 |
157 |
158 | Are the regression coefficients the same?
159 |
160 |
161 |
162 | ```r
163 | parameterestimates(m1_fit)
164 | ```
165 |
166 | ```
167 | ## lhs op rhs est se z pvalue ci.lower ci.upper
168 | ## 1 dv ~ mv 0.815 0.029 28.490 0 0.759 0.871
169 | ## 2 mv ~ iv1 0.197 0.032 6.253 0 0.136 0.259
170 | ## 3 mv ~ iv2 -0.200 0.032 -6.224 0 -0.263 -0.137
171 | ## 4 mv ~ iv3 0.292 0.031 9.394 0 0.231 0.353
172 | ## 5 dv ~~ dv 0.944 0.042 22.361 0 0.861 1.026
173 | ## 6 mv ~~ mv 0.986 0.044 22.361 0 0.900 1.073
174 | ## 7 iv1 ~~ iv1 0.994 0.000 NA NA 0.994 0.994
175 | ## 8 iv1 ~~ iv2 0.055 0.000 NA NA 0.055 0.055
176 | ## 9 iv1 ~~ iv3 0.016 0.000 NA NA 0.016 0.016
177 | ## 10 iv2 ~~ iv2 0.962 0.000 NA NA 0.962 0.962
178 | ## 11 iv2 ~~ iv3 -0.035 0.000 NA NA -0.035 -0.035
179 | ## 12 iv3 ~~ iv3 1.024 0.000 NA NA 1.024 1.024
180 | ```
181 |
182 |
183 |
184 |
185 | All the coefficients are in the ballpark of what is expected.
186 |
187 | Does the model provide a good fit?
188 |
189 |
190 | ```r
191 | fitmeasures(m1_fit)
192 | ```
193 |
194 | ```
195 | ## chisq df pvalue baseline.chisq
196 | ## 3.654 3.000 0.301 753.212
197 | ## baseline.df baseline.pvalue cfi tli
198 | ## 7.000 0.000 0.999 0.998
199 | ## logl unrestricted.logl npar aic
200 | ## -7045.596 -7043.769 6.000 14103.191
201 | ## bic ntotal bic2 rmsea
202 | ## 14132.638 1000.000 14113.582 0.015
203 | ## rmsea.ci.lower rmsea.ci.upper rmsea.pvalue srmr
204 | ## 0.000 0.057 0.899 0.011
205 | ```
206 |
207 |
208 |
209 |
210 | * The fitted model should provide a good fit because it is identical to the model used to simulate the data.
211 | * In this case, the p-value and the fit measures are consistent with the data being generated from the model specified.
212 |
213 |
214 | ## Calculate and test indirect effects
215 |
216 |
217 | ```r
218 | m2_model <- '
219 | dv ~ b1*mv
220 | mv ~ a1*iv1 + a2*iv2 + a3*iv3
221 |
222 | # indirect effects
223 | iv1_mv := a1*b1
224 | iv2_mv := a2*b1
225 | iv3_mv := a3*b1
226 | '
227 |
228 | m2_fit <- sem(m2_model, data=data_1)
229 | ```
230 |
231 |
232 |
233 |
234 | * Note that I needed to label the effects (e.g., `a1`, `b1`) before I could define each indirect effect as the product of two effects using the `:=` operator.
235 |
236 |
237 |
238 |
239 | ```r
240 | parameterestimates(m2_fit, standardize=TRUE)
241 | ```
242 |
243 | ```
244 | ## lhs op rhs label est se z pvalue ci.lower ci.upper
245 | ## 1 dv ~ mv b1 0.815 0.029 28.490 0 0.759 0.871
246 | ## 2 mv ~ iv1 a1 0.197 0.032 6.253 0 0.136 0.259
247 | ## 3 mv ~ iv2 a2 -0.200 0.032 -6.224 0 -0.263 -0.137
248 | ## 4 mv ~ iv3 a3 0.292 0.031 9.394 0 0.231 0.353
249 | ## 5 dv ~~ dv 0.944 0.042 22.361 0 0.861 1.026
250 | ## 6 mv ~~ mv 0.986 0.044 22.361 0 0.900 1.073
251 | ## 7 iv1 ~~ iv1 0.994 0.000 NA NA 0.994 0.994
252 | ## 8 iv1 ~~ iv2 0.055 0.000 NA NA 0.055 0.055
253 | ## 9 iv1 ~~ iv3 0.016 0.000 NA NA 0.016 0.016
254 | ## 10 iv2 ~~ iv2 0.962 0.000 NA NA 0.962 0.962
255 | ## 11 iv2 ~~ iv3 -0.035 0.000 NA NA -0.035 -0.035
256 | ## 12 iv3 ~~ iv3 1.024 0.000 NA NA 1.024 1.024
257 | ## 13 iv1_mv := a1*b1 iv1_mv 0.161 0.026 6.108 0 0.109 0.213
258 | ## 14 iv2_mv := a2*b1 iv2_mv -0.163 0.027 -6.081 0 -0.215 -0.110
259 | ## 15 iv3_mv := a3*b1 iv3_mv 0.238 0.027 8.922 0 0.186 0.290
260 | ## std.lv std.all std.nox
261 | ## 1 0.815 0.669 0.669
262 | ## 2 0.197 0.183 0.184
263 | ## 3 -0.200 -0.183 -0.186
264 | ## 4 0.292 0.275 0.272
265 | ## 5 0.944 0.552 0.552
266 | ## 6 0.986 0.856 0.856
267 | ## 7 0.994 1.000 0.994
268 | ## 8 0.055 0.057 0.055
269 | ## 9 0.016 0.015 0.016
270 | ## 10 0.962 1.000 0.962
271 | ## 11 -0.035 -0.035 -0.035
272 | ## 12 1.024 1.000 1.024
273 | ## 13 0.161 0.161 0.161
274 | ## 14 -0.163 -0.163 -0.163
275 | ## 15 0.238 0.238 0.238
276 | ```
277 |
278 |
279 |
280 |
281 | The above output provides significance tests and confidence intervals for the indirect effects, and includes standardised estimates.
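The confidence intervals above are normal-theory (delta method) intervals. For indirect effects, bootstrap intervals are often preferred; a minimal sketch using lavaan's built-in bootstrap options (argument names as in recent lavaan versions; reduce `bootstrap` for speed):

```r
# refit with bootstrapped standard errors
m2_boot <- sem(m2_model, data=data_1, se="bootstrap", bootstrap=500)

# percentile bootstrap CIs, including for the := indirect effects
parameterestimates(m2_boot, boot.ci.type="perc")
```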
282 |
283 |
284 |
285 |
--------------------------------------------------------------------------------
/path-analysis/path-analysis.rmd:
--------------------------------------------------------------------------------
1 | `r opts_chunk$set(cache=TRUE, tidy=FALSE)`
2 |
3 | # Path Analysis Example
4 | ```{r, message=FALSE}
5 | library(psych)
6 | library(lavaan)
7 |
8 | ```
9 |
10 |
11 | ## Simulate data
12 | Let's simulate some data:
13 |
14 | * three orthogonal predictor variables
15 | * one mediator variable
16 | * one dependent variable
17 |
18 | ```{r}
19 | set.seed(1234)
20 | N <- 1000
21 | iv1 <- rnorm(N, 0, 1)
22 | iv2 <- rnorm(N, 0, 1)
23 | iv3 <- rnorm(N, 0, 1)
24 | mv <- rnorm(N, .2 * iv1 + -.2 * iv2 + .3 * iv3, 1)
25 | dv <- rnorm(N, .8 * mv, 1)
26 | data_1 <- data.frame(iv1, iv2, iv3, mv, dv)
27 | ```
28 |
29 | ## Traditional examination of dataset
30 | * Is a regression consistent with the model?
31 |
32 | ```{r}
33 | summary(lm(mv ~ iv1 + iv2 + iv3, data_1))
34 | ```
35 |
36 | These estimates are broadly similar to the coefficients in the equation used to simulate `mv` (0.2, -0.2, and 0.3).
37 |
38 | ```{r}
39 | summary(lm(dv ~ iv1 + iv2 + iv3 + mv, data_1))
40 | ```
41 | Given that the simulation is based on complete mediation, the true regression coefficients for the ivs are zero. The results of the multiple regression predicting the `dv` from the `iv`s and `mv` are consistent with this.
42 |
43 | What are the basic descriptive statistics and intercorrelations?
44 |
45 | ```{r}
46 | psych::describe(data_1)
47 | pairs.panels(data_1, pch='.')
48 | ```
49 |
50 |
51 | ## M1 Fit Path Analysis model
52 |
53 | ```{r}
54 | m1_model <- '
55 | dv ~ mv
56 | mv ~ iv1 + iv2 + iv3
57 | '
58 |
59 | m1_fit <- sem(m1_model, data=data_1)
60 | ```
61 |
62 | Are the regression coefficients the same?
63 |
64 | ```{r}
65 | parameterestimates(m1_fit)
66 | ```
67 |
68 | All the coefficients are in the ballpark of what is expected.
69 |
70 | Does the model provide a good fit?
71 | ```{r}
72 | fitmeasures(m1_fit)
73 | ```
74 |
75 | * The fitted model should provide a good fit because it is identical to the model used to simulate the data.
76 | * In this case, the p-value and the fit measures are consistent with the data being generated from the model specified.
77 |
78 |
79 | ## Calculate and test indirect effects
80 | ```{r}
81 | m2_model <- '
82 | dv ~ b1*mv
83 | mv ~ a1*iv1 + a2*iv2 + a3*iv3
84 |
85 | # indirect effects
86 | iv1_mv := a1*b1
87 | iv2_mv := a2*b1
88 | iv3_mv := a3*b1
89 | '
90 |
91 | m2_fit <- sem(m2_model, data=data_1)
92 | ```
93 |
94 | * Note that I needed to label the effects (e.g., `a1`, `b1`) before I could define each indirect effect as the product of two effects using the `:=` operator.
95 |
96 |
97 | ```{r}
98 | parameterestimates(m2_fit, standardize=TRUE)
99 |
100 | ```
101 |
102 | The above output provides significance tests and confidence intervals for the indirect effects, and includes standardised estimates.
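The confidence intervals above are normal-theory (delta method) intervals. For indirect effects, bootstrap intervals are often preferred; a minimal sketch using lavaan's built-in bootstrap options (argument names as in recent lavaan versions; reduce `bootstrap` for speed):

```{r}
# refit with bootstrapped standard errors
m2_boot <- sem(m2_model, data=data_1, se="bootstrap", bootstrap=500)

# percentile bootstrap CIs, including for the := indirect effects
parameterestimates(m2_boot, boot.ci.type="perc")
```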
103 |
104 |
105 |
106 |
--------------------------------------------------------------------------------