├── LICENSE └── README.md /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2023 Arthur CHATTON 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | Causal cookbook recipes 2 | ================ 3 | Arthur Chatton and Julia Rohrer 4 | 2024-01-12 5 | 6 | - Goals 7 | - Implementing the Recipes 9 | - Generating synthetic data 11 | - Positivity 12 | - Inverse-probability-weighting 14 | - G-computation 15 | - Doubly-robust standardisation 17 | - Bootstrapping to compute 19 | the variance 20 | - Illustration 23 | of the impact of the weighting schemes on the pseudo-sample 24 | characteristics 25 | - References 26 | 27 | # Goals 28 | 29 | This document contains additional materials for the Causal Cookbook 30 | (Chatton & Rohrer, 2023). In the first sections, we illustrate how to 31 | implement the recipes of the cookbook in R, namely: 32 | 33 | 1. Inverse-probability weighting (also known as propensity-score 34 | weighting) 35 | 36 | 2. g-computation 37 | 38 | 3. Doubly-robust standardisation 39 | 40 | Below, we also provide an illustration of how the different weighting 41 | schemes mentioned in the cookbook affect the resulting pseudo-sample. 42 | 43 | We assume only basic knowledge of R. 44 | 45 | # Implementing the Recipes 46 | 47 | ## Generating synthetic data 48 | 49 | First, we need some data to work on. We generate synthetic data here 50 | because this allows us to know the true causal effect, which means that 51 | we can evaluate how the different recipes perform. 52 | 53 | ``` r 54 | datasim <- function(n) { 55 | # This small function simulates a dataset with n rows 56 | # containing covariats, and action, an outcome and 57 | # the underlying potential outcomes 58 | 59 | # Two binary covariates 60 | x1 <- rbinom(n, size = 1, prob = 0.5) 61 | x2 <- rbinom(n, size = 1, prob = 0.65) 62 | 63 | # Two continuous, normally distributed covariates 64 | x3 <- rnorm(n, 0, 1) 65 | x4 <- rnorm(n, 0, 1) 66 | 67 | # The action (independent variable, treatment, exposure...) 68 | # is a function of x2, x3, x4 and 69 | # the product of (ie, an interaction of) x2 and x4 70 | A <- rbinom(n, size = 1, prob = plogis(-1.6 + 2.5*x2 + 2.2*x3 + 0.6*x4 + 0.4*x2*x4)) 71 | 72 | # Simulate the two potential outcomes 73 | # as functions of x1, X2, X4 and the product of x2 and x4 74 | 75 | # Potential outcome if the action is 1 76 | # note that here, we add 1 77 | Y.1 <- rbinom(n, size = 1, prob = plogis(-0.7 + 1 - 0.15*x1 + 0.45*x2 + 0.20*x4 + 0.4*x2*x4)) 78 | # Potential outcome if the action is 0 79 | # note that here, we do not add 1 so that there 80 | # is a different the two potential outcomes (ie, an effect of A) 81 | Y.0 <- rbinom(n, size = 1, prob = plogis(-0.7 + 0 - 0.15*x1 + 0.45*x2 + 0.20*x4 + 0.4*x2*x4)) 82 | 83 | # Observed outcome 84 | # is the potential outcomes (Y.1 or Y.0) 85 | # corresponding to action the individual experienced (A) 86 | Y <- Y.1*A + Y.0*(1 - A) 87 | 88 | # Return a data.frame as the output of this function 89 | data.frame(x1, x2, x3, x4, A, Y, Y.1, Y.0) 90 | } 91 | ``` 92 | 93 | We have 4 covariates (`x1` to `x4`), but only `x2` and `x4` are 94 | confounders. Thus, the minimal adjustment set C for achieving 95 | (conditional) exchangeability is (`x2`, `x4`). These four predictors are 96 | used for simulating the action `A` (say the Alcoholic Anonymous 97 | attendance: 1 if attendee and 0 otherwise) and the outcome `Y` (say 98 | abstinence at the 1-year follow-up: 1 if abstinent and 0 otherwise). 99 | 100 | We also see the variables `Y.1` and `Y.0,` which are the two *potential 101 | outcomes* observed in a hypothetical world in which all individuals are 102 | attenders (A=1) or non-attenders (A=0), respectively. In practice, the 103 | potential outcomes are unknown, but using synthetic data we know them 104 | which allows us to determine the true causal effect. 105 | 106 | Let’s actually generate the data. 107 | 108 | ``` r 109 | set.seed(120110) # for reproducibility 110 | ObsData <- datasim(n = 500000) # really large sample 111 | TRUE_EY.1 <- mean(ObsData$Y.1); TRUE_EY.1 # mean outcome under A = 1 112 | ``` 113 | 114 | ## [1] 0.619346 115 | 116 | ``` r 117 | TRUE_EY.0 <- mean(ObsData$Y.0); TRUE_EY.0 # mean outcome under A = 0 118 | ``` 119 | 120 | ## [1] 0.388012 121 | 122 | ``` r 123 | TRUE_ATE <- TRUE_EY.1 - TRUE_EY.0; TRUE_ATE # true average treatment effect is the difference 124 | ``` 125 | 126 | ## [1] 0.231334 127 | 128 | ``` r 129 | TRUE_MOR <- (TRUE_EY.1*(1 - TRUE_EY.0))/((1 - TRUE_EY.1)*TRUE_EY.0); TRUE_MOR # true marginal OR 130 | ``` 131 | 132 | ## [1] 2.56626 133 | 134 | The true ATE (average treatment effect on the entire population, i.e., 135 | risk difference over all individuals) is around 0.231, while the true 136 | marginal odds ratio is around 2.57. Both are valid causal effects, but 137 | their scale differs. 138 | 139 | In other words, in our simulated data, attending AA increases the 140 | chances of abstinence by 23.1 percentage points (from 38.8% to 61.9%) 141 | 142 | ## Positivity 143 | 144 | Before modelling, we must check the positivity assumption (*i.e.*, all 145 | individuals must have a non-extreme probability to experience the each 146 | level of `A`) regardless of the method used afterwards. For this, we use 147 | the PoRT algorithm (Danelian *et al.*, 2023). 148 | 149 | ``` r 150 | remotes::install_github('ArthurChatton/PoRT') #install the port package. 151 | library(port) 152 | ObsData$x4_round <- round(ObsData$x4, 0) # to reduce computational time (huge sample size), in practice use meaningful cutoff. 153 | port(A="A", cov.quanti="x4_round", cov.quali="x2", data=ObsData) 154 | ``` 155 | 156 | ## [1] "No problematic subgroup was identified." 157 | 158 | So, positivity seems respected here. We can go away and start 159 | ~~cooking~~ modelling. Else, we face a choice: 160 | 161 | - Changing the target population by excluding the problematic 162 | subgroup(s) identified by PoRT 163 | 164 | - Targeting an estimand for which the identified violation(s) are not 165 | meaningful (*e.g.*, ATT only requires the attendees have a 166 | non-extreme probability to be non-attendee) 167 | 168 | - Using an approach able to extrapolate over the problematic 169 | subgroup(s) such as the g-computation (but a correct extrapolation 170 | is not guaranteed). 171 | 172 | ## Inverse-probability-weighting 173 | 174 | Next, we illustrate the recipe provided in Box 4. 175 | 176 | In the first step, we need to fit the nuisance model e(C), *i.e.*, the 177 | propensity score. (Here, we do know how the data were generated, so this 178 | step is easy—if we were using actual data, we would first need to decide 179 | which covariates to include and how to precisely model the action) 180 | 181 | ``` r 182 | # Fit a logistic regression model predicting A from the relevant confounders 183 | # And immediately predict A for all observations (fitted.values) 184 | e <- glm(A ~ x2 + x4 + x2*x4, data = ObsData, family=binomial)$fitted.values 185 | ``` 186 | 187 | The coefficients of this model don’t matter, so we don’t even look them 188 | (see Weistreich & Greenland, 2013, for an explanation). 189 | 190 | In the second step, we compute the weights to recover the causal effect 191 | of interest. Here, we use the unstabilised ATE weights. 192 | 193 | ``` r 194 | # Assign the weights depending on the action group 195 | # See also Table 2 in the Causal Cookbook 196 | omega <- ifelse(ObsData$A==1, 1/e, 1/(1-e)) 197 | summary(omega) 198 | ``` 199 | 200 | ## Min. 1st Qu. Median Mean 3rd Qu. Max. 201 | ## 1.037 1.385 1.625 1.999 2.313 17.559 202 | 203 | We need to check if this model balance the two groups in the resulting 204 | pseudo-sample (*i.e.*, weighted sample). We can use the `tableone` 205 | package for this (Yoshida & Bartel, 2022). 206 | 207 | ``` r 208 | ## Load the packages 209 | library(tableone); library(survey) 210 | 211 | ## Weighted data (pseudo-sample) 212 | pseudo <- survey::svydesign(ids = ~ 1, data = ObsData, weights = ~ omega) 213 | 214 | ## Construct the table (This is quite slow) 215 | tabWeighted <- tableone::svyCreateTableOne(vars = c("x2", "x4"), strata = "A", data = pseudo, test = FALSE, addOverall = TRUE) 216 | 217 | ## Show table with SMD 218 | print(tabWeighted, smd = TRUE) 219 | ``` 220 | 221 | ## Stratified by A 222 | ## Overall 0 1 SMD 223 | ## n 999696.74 499708.33 499988.41 224 | ## x2 (mean (SD)) 0.65 (0.48) 0.65 (0.48) 0.65 (0.48) 0.001 225 | ## x4 (mean (SD)) 0.00 (1.00) 0.00 (1.00) 0.00 (1.00) 0.003 226 | 227 | A correct balance of the confounders is achieved when the standardised 228 | mean difference (`SMD` in the table) between the two action groups in 229 | the pseudo sample is lower than 10% (Ali *et al.*, 2015). Here, the 230 | weights seem to have balanced the groups. However, if we have 231 | unmeasured or omitted confounders some imbalances can remain unnoticed. 232 | Note that the pseudo-sample size is twice the actual sample size. The 233 | stabilisation of the weights corrects this phenomenon. 234 | 235 | Note that other metrics can be useful, see the `cobalt` package (Greifer, 2024) and the related vignettes. 236 | 237 | In the third step, we fit the marginal structural model. 238 | 239 | ``` r 240 | # Logistic regression model returns the marginal OR 241 | msm_OR <- glm(Y~A, weights = omega, data=ObsData, family=binomial) 242 | # Linear model returns the ATE (risk difference) 243 | msm_RD <- lm(Y~A, weights = omega, data=ObsData) 244 | ``` 245 | 246 | The choice of the marginal structural model depends on the causal effect 247 | of interest. A logistic model gives us an estimate of the marginal OR, 248 | while a linear model gives us an estimate of the risk difference. 249 | 250 | Fourth step, the resulting estimates: 251 | 252 | ``` r 253 | MOR_IPW <- exp(msm_OR$coef[2]) 254 | RD_IPW <- msm_RD$coef[2] 255 | c(MOR_IPW,RD_IPW) |> round(digits=3) 256 | ``` 257 | 258 | ## A A 259 | ## 2.568 0.231 260 | 261 | We obtain a marginal OR of 2.57 and an ATE of 0.231—these are indeed the 262 | true values that we recovered above by contrasting the potential 263 | outcomes. 264 | 265 | For the variance of these estimates, we can use either a robust SE 266 | matrix as below or bootstrapping (see the end of this document). 267 | 268 | ``` r 269 | library(sandwich) 270 | library(lmtest) 271 | coeftest(msm_RD, vcov = sandwich) 272 | ``` 273 | 274 | ## 275 | ## t test of coefficients: 276 | ## 277 | ## Estimate Std. Error t value Pr(>|t|) 278 | ## (Intercept) 0.3888322 0.0011077 351.04 < 2.2e-16 *** 279 | ## A 0.2314945 0.0015505 149.30 < 2.2e-16 *** 280 | ## --- 281 | ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 282 | 283 | ``` r 284 | coeftest(msm_OR, vcov = sandwich) 285 | ``` 286 | 287 | ## 288 | ## z test of coefficients: 289 | ## 290 | ## Estimate Std. Error z value Pr(>|z|) 291 | ## (Intercept) -0.4522236 0.0046611 -97.021 < 2.2e-16 *** 292 | ## A 0.9431586 0.0065535 143.918 < 2.2e-16 *** 293 | ## --- 294 | ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 295 | 296 | ## G-computation 297 | 298 | Next, we illustrate the recipe from Box 5. 299 | 300 | First step, we need to fit the nuisance model Q(A,C): 301 | 302 | ``` r 303 | # Fit a logistic regression model predicting Y from the relevant confounders 304 | Q <- glm(Y ~ A + x2 + x4 + x2*x4, data = ObsData, family=binomial) 305 | ``` 306 | 307 | Again, the coefficients of this model don’t matter. In contrast to 308 | propensity score-based methods, g-computation doesn’t need a “balance” 309 | assumption. 310 | 311 | Does the Q-model look familiar? Indeed, it’s a classical model used to 312 | control for confounding. But when we look at the estimate of the 313 | marginal OR 314 | 315 | ``` r 316 | exp(Q$coef[2]) 317 | ``` 318 | 319 | ## A 320 | ## 2.736306 321 | 322 | This estimate is not the causal effect of interest which we already 323 | learned is 2.57. 324 | 325 | This is because the marginal OR is non-collapsible; we need a specific 326 | causal method to target it properly. (What we get here is not the 327 | marginal OR, but an unbiased estimate of the *conditional OR*. This is 328 | the OR when all covariates are set to 0; its value with thus depend on 329 | how we coded the covariates (e.g., whether we centered them, which 330 | reference category was used for dummy variables) and which terms we 331 | included (linear, quadratic, cubic, interactions…). It can be viewed as 332 | a kind of \`\`average” of causal effects in all possible subgroups 333 | defined by the adjustment set. In contrast, the marginal OR depends on 334 | the actual distribution of covariates in the data.) 335 | 336 | Second step, let’s create hypothetical worlds! 337 | 338 | ``` r 339 | # Copy the "actual" (simulated) data twice 340 | A1Data <- A0Data <- ObsData 341 | # In one world A equals 1 for everyone, in the other one it equals 0 for everyone 342 | # The rest of the data stays as is (for now) 343 | A1Data$A <- 1; A0Data$A <- 0 344 | head(ObsData); head(A1Data); head(A0Data) 345 | ``` 346 | 347 | ## x1 x2 x3 x4 A Y Y.1 Y.0 348 | ## 1 0 1 -0.2909924 0.8374205 1 1 1 0 349 | ## 2 1 1 -0.1598913 -1.6984448 0 0 1 0 350 | ## 3 1 0 -0.5454640 1.6010621 0 0 1 0 351 | ## 4 0 1 -0.4733918 -0.8286665 0 0 1 0 352 | ## 5 1 1 -0.8476024 1.1711469 1 1 1 1 353 | ## 6 1 0 -1.2864879 1.5281848 0 0 1 0 354 | 355 | ## x1 x2 x3 x4 A Y Y.1 Y.0 356 | ## 1 0 1 -0.2909924 0.8374205 1 1 1 0 357 | ## 2 1 1 -0.1598913 -1.6984448 1 0 1 0 358 | ## 3 1 0 -0.5454640 1.6010621 1 0 1 0 359 | ## 4 0 1 -0.4733918 -0.8286665 1 0 1 0 360 | ## 5 1 1 -0.8476024 1.1711469 1 1 1 1 361 | ## 6 1 0 -1.2864879 1.5281848 1 0 1 0 362 | 363 | ## x1 x2 x3 x4 A Y Y.1 Y.0 364 | ## 1 0 1 -0.2909924 0.8374205 0 1 1 0 365 | ## 2 1 1 -0.1598913 -1.6984448 0 0 1 0 366 | ## 3 1 0 -0.5454640 1.6010621 0 0 1 0 367 | ## 4 0 1 -0.4733918 -0.8286665 0 0 1 0 368 | ## 5 1 1 -0.8476024 1.1711469 0 1 1 1 369 | ## 6 1 0 -1.2864879 1.5281848 0 0 1 0 370 | 371 | Our hypothetical worlds are identical except for the action status. 372 | `A1Data` represents a hypothetical world in which all individuals are 373 | attendees, while `A0Data` represents the opposite worlds in which all 374 | individuals are non-attendees. 375 | 376 | In the third step, we make counterfactual predictions. For this, we use 377 | the model `Q` as a prediction model and estimate the outcome’s 378 | probability in each hypothetical world: 379 | 380 | ``` r 381 | # Predict Y if everybody attends 382 | Y_A1 <- predict(Q, A1Data, type="response") 383 | # Predict Y if nobody attends 384 | Y_A0 <- predict(Q, A0Data, type="response") 385 | # Taking a look at the predictions 386 | data.frame(Y_A1=head(Y_A1), Y_A0=head(Y_A0), TRUE_Y=head(ObsData$Y)) |> round(digits = 2) 387 | ``` 388 | 389 | ## Y_A1 Y_A0 TRUE_Y 390 | ## 1 0.77 0.55 1 391 | ## 2 0.42 0.21 0 392 | ## 3 0.63 0.39 0 393 | ## 4 0.55 0.31 0 394 | ## 5 0.80 0.60 1 395 | ## 6 0.63 0.38 0 396 | 397 | Now, we can do the fourth step and compute the estimates. 398 | 399 | ``` r 400 | # Mean outcomes in the two worlds 401 | pred_A1 <- mean(Y_A1); pred_A0 <- mean(Y_A0) 402 | 403 | # Marginal odds ratio 404 | MOR_gcomp <- (pred_A1*(1 - pred_A0))/((1 - pred_A1)*pred_A0) 405 | # ATE (risk difference) 406 | RD_gcomp <- pred_A1 - pred_A0 407 | c(MOR_gcomp, RD_gcomp) |> round(digits=3) 408 | ``` 409 | 410 | ## [1] 2.568 0.231 411 | 412 | As before when using IPW, we obtain unbiased estimates of the two causal 413 | effects. 414 | 415 | To quantify the variance of these estimates, we must rely on 416 | bootstrapping, which we illustrate at the end of this document. 417 | 418 | ## Doubly-robust standardisation 419 | 420 | Next, we illustrate the recipe provided in Box 6. 421 | 422 | Doubly-robust standardisation is a doubly-robust estimator combining IPW 423 | and g-computation. It can be more robust to potential misspecification 424 | because it requires only one model to be correctly specified, 425 | giving us twice the chance to get it right. 426 | 427 | This estimator begins (like IPW) with fitting e(C): 428 | 429 | ``` r 430 | # Correctly specified action model 431 | e <- glm(A ~ x2 + x4 + x2*x4, data = ObsData, family=binomial)$fitted.values 432 | # Misspecified action model 433 | emis <- glm(A ~ x2 + x4, data = ObsData, family=binomial)$fitted.values 434 | ``` 435 | 436 | We also fit a second model, which is misspecified because it omits the 437 | interaction between `x2` and `x4`. This will later allow us to 438 | demonstrate the doubly robust property of the estimator. 439 | 440 | As for IPW, we can check the balance at this step. However, because it’s 441 | a doubly robust estimator, small imbalances are less punishing. 442 | 443 | Second, we compute the weights that match the causal effect of interest 444 | (here again the unstabilised ATE): 445 | 446 | ``` r 447 | # Weights from the correctly specified action model 448 | omega <- ifelse(ObsData$A==1, 1/e, 1/(1-e)) 449 | summary(omega) 450 | ``` 451 | 452 | ## Min. 1st Qu. Median Mean 3rd Qu. Max. 453 | ## 1.037 1.385 1.625 1.999 2.313 17.559 454 | 455 | ``` r 456 | # Weights from the misspecified action model 457 | omegamis <- ifelse(ObsData$A==1, 1/emis, 1/(1-emis)) 458 | summary(omegamis) 459 | ``` 460 | 461 | ## Min. 1st Qu. Median Mean 3rd Qu. Max. 462 | ## 1.040 1.397 1.659 2.003 2.274 21.454 463 | 464 | The `omega` weights are identical to those from IPW because it is 465 | exactly the same procedure up to this point. The `omegamis` weights 466 | differ due to the misspecification of e(C). 467 | 468 | Now, we start the g-computation part of doubly-robust standardisation. 469 | We fit the outcome model Q(A,C), but this time it is weighted by the 470 | weights `omega`. 471 | 472 | ``` r 473 | Q <- glm(Y ~ A + x2 + x4 + x2*x4, weights = omega, data = ObsData, family=binomial) 474 | ``` 475 | 476 | We could again obtain the *conditional OR* through the coefficient 477 | related to `A` in `Q`, but we are more interested in the marginal OR 478 | which describes the effect across the whole population. 479 | 480 | We also compute various misspecified Q(A,C) models: 481 | 482 | ``` r 483 | # Outcome model misspecified 484 | Qmis <- glm(Y ~ A + x2 + x4, weights = omega, data = ObsData, family=binomial) 485 | # Outcome model is correct but the action model (weights) is misspecified 486 | Qomis <- glm(Y ~ A + x2 + x4 + x2*x4, weights = omegamis, data = ObsData, family=binomial) 487 | # Both the outcome model and action model are misspecified 488 | Qmismis <- glm(Y ~ A + x2 + x4, weights = omegamis, data = ObsData, family=binomial) 489 | ``` 490 | 491 | Where `Qmis` is only misspecified in Q(A,C), `Qomis` is only 492 | misspecified in e(C), and `Qmismis` is misspecified in both Q(A,C) and 493 | e(C). 494 | 495 | Fourth step, generating the counterfactuals dataset: 496 | 497 | ``` r 498 | # Duplicate data twice 499 | A1Data <- A0Data <- ObsData 500 | # Set action values 501 | A1Data$A <- 1; A0Data$A <- 0 502 | head(ObsData); head(A1Data); head(A0Data) 503 | ``` 504 | 505 | ## x1 x2 x3 x4 A Y Y.1 Y.0 506 | ## 1 0 1 -0.2909924 0.8374205 1 1 1 0 507 | ## 2 1 1 -0.1598913 -1.6984448 0 0 1 0 508 | ## 3 1 0 -0.5454640 1.6010621 0 0 1 0 509 | ## 4 0 1 -0.4733918 -0.8286665 0 0 1 0 510 | ## 5 1 1 -0.8476024 1.1711469 1 1 1 1 511 | ## 6 1 0 -1.2864879 1.5281848 0 0 1 0 512 | 513 | ## x1 x2 x3 x4 A Y Y.1 Y.0 514 | ## 1 0 1 -0.2909924 0.8374205 1 1 1 0 515 | ## 2 1 1 -0.1598913 -1.6984448 1 0 1 0 516 | ## 3 1 0 -0.5454640 1.6010621 1 0 1 0 517 | ## 4 0 1 -0.4733918 -0.8286665 1 0 1 0 518 | ## 5 1 1 -0.8476024 1.1711469 1 1 1 1 519 | ## 6 1 0 -1.2864879 1.5281848 1 0 1 0 520 | 521 | ## x1 x2 x3 x4 A Y Y.1 Y.0 522 | ## 1 0 1 -0.2909924 0.8374205 0 1 1 0 523 | ## 2 1 1 -0.1598913 -1.6984448 0 0 1 0 524 | ## 3 1 0 -0.5454640 1.6010621 0 0 1 0 525 | ## 4 0 1 -0.4733918 -0.8286665 0 0 1 0 526 | ## 5 1 1 -0.8476024 1.1711469 0 1 1 1 527 | ## 6 1 0 -1.2864879 1.5281848 0 0 1 0 528 | 529 | We are now ready for the fifth step: counterfactual predictions. 530 | 531 | ``` r 532 | Y_A1 <- predict(Q, A1Data, type="response") 533 | Y_A0 <- predict(Q, A0Data, type="response") 534 | data.frame(Y_A1=head(Y_A1), Y_A0=head(Y_A0), TRUE_Y=head(ObsData$Y)) |> round(digits = 2) 535 | ``` 536 | 537 | ## Y_A1 Y_A0 TRUE_Y 538 | ## 1 0.77 0.55 1 539 | ## 2 0.42 0.21 0 540 | ## 3 0.63 0.39 0 541 | ## 4 0.55 0.31 0 542 | ## 5 0.80 0.60 1 543 | ## 6 0.63 0.38 0 544 | 545 | And we repeat the procedure for the misspecified models. 546 | 547 | ``` r 548 | Y_A1mis <- predict(Qmis, A1Data, type="response") 549 | Y_A0mis <- predict(Qmis, A0Data, type="response") 550 | Y_A1omis <- predict(Qomis, A1Data, type="response") 551 | Y_A0omis <- predict(Qomis, A0Data, type="response") 552 | Y_A1mismis <- predict(Qmismis, A1Data, type="response") 553 | Y_A0mismis <- predict(Qmismis, A0Data, type="response") 554 | ``` 555 | 556 | Finally, we compute the estimates of the doubly-robust standardisation: 557 | 558 | ``` r 559 | pred_A1 <- mean(Y_A1); pred_A0 <- mean(Y_A0) 560 | 561 | # Marginal odds ratio 562 | MOR_DRS <- (pred_A1*(1 - pred_A0))/((1 - pred_A1)*pred_A0) 563 | # ATE (risk difference) 564 | RD_DRS <- pred_A1 - pred_A0 565 | c(MOR_DRS, RD_DRS) |> round(digits=3) 566 | ``` 567 | 568 | ## [1] 2.564 0.231 569 | 570 | Again, the estimates are unbiased since we have included all the 571 | confounders and correctly specified the models. But what if the model(s) 572 | are misspecified? 573 | 574 | ``` r 575 | pred_A1mis <- mean(Y_A1mis); pred_A0mis <- mean(Y_A0mis) 576 | MOR_DRSmis <- (pred_A1mis*(1 - pred_A0mis))/((1 - pred_A1mis)*pred_A0mis) 577 | RD_DRSmis <- pred_A1mis - pred_A0mis 578 | 579 | pred_A1omis <- mean(Y_A1omis); pred_A0omis <- mean(Y_A0omis) 580 | MOR_DRSomis <- (pred_A1omis*(1 - pred_A0omis))/((1 - pred_A1omis)*pred_A0omis) 581 | RD_DRSomis <- pred_A1omis - pred_A0omis 582 | 583 | pred_A1mismis <- mean(Y_A1mismis); pred_A0mismis <- mean(Y_A0mismis) 584 | MOR_DRSmismis <- (pred_A1mismis*(1 - pred_A0mis))/((1 - pred_A1mismis)*pred_A0mismis) 585 | RD_DRSmismis <- pred_A1mismis - pred_A0mismis 586 | 587 | data.frame(`Q_mis`=c(MOR_DRSmis, RD_DRSmis), `g_mis`=c(MOR_DRSomis, RD_DRSomis), `Q_mis_g_mis`=c(MOR_DRSmismis, RD_DRSmismis)) 588 | ``` 589 | 590 | ## Q_mis g_mis Q_mis_g_mis 591 | ## 1 2.5643738 2.564584 2.611692 592 | ## 2 0.2311518 0.231171 0.236282 593 | 594 | In the first two scenarios (only one model misspecified), there is no 595 | bias. However, when both models are misspecified, we can observe a bias 596 | (which is small in this simulated example but could of course be much 597 | larger in actual data). 598 | 599 | You can try to change the IPW and the g-computation codes by yourself to 600 | see that these procedures lack this doubly-robust property. Maybe you 601 | also want to try omitting a confounder of changing the functional form 602 | (*e.g.*, cubic relationship for `x4`) to get a feel for how the methods 603 | behave. 604 | 605 | ## Bootstrapping to compute the variance 606 | 607 | Bootstrapping is a powerful tool to obtain the variance of these 608 | estimators. The underlying idea is to resample with replacement, so that 609 | we end up with a slightly different sample. 610 | 611 | ``` r 612 | # Small bootstrapping demonstration 613 | test_db <- LETTERS[1:5] 614 | test_db 615 | ``` 616 | 617 | ## [1] "A" "B" "C" "D" "E" 618 | 619 | ``` r 620 | for(i in 1:3){ 621 | db_boot <- test_db[sample(1:length(test_db), size=length(test_db), replace=TRUE)] 622 | print(paste0('Bootstrap sample #', i)); print(db_boot) 623 | } 624 | ``` 625 | 626 | ## [1] "Bootstrap sample #1" 627 | ## [1] "D" "A" "D" "B" "D" 628 | ## [1] "Bootstrap sample #2" 629 | ## [1] "D" "E" "C" "B" "D" 630 | ## [1] "Bootstrap sample #3" 631 | ## [1] "C" "E" "A" "D" "C" 632 | 633 | By drawing several bootstrap samples and then applying the full 634 | estimation process on each sample, we can obtain a fair estimate of the 635 | variance, taking into account the whole uncertainty in the process 636 | (uncertainty in the estimated weights for IPW; uncertainty in the model 637 | Q for g-computation; uncertainty in both for doubly-robust 638 | standardisation). 639 | 640 | Let’s apply the bootstrap to g-computation: 641 | 642 | ``` r 643 | # A bit of setup before we can start the resampling 644 | 645 | # We will need an empty vector to store the results 646 | boot_MOR <- boot_ATE <- c() 647 | 648 | # Number of bootstrap samples, usually 500 or 1000 649 | B <- 20 650 | 651 | for (i in 1:B){ 652 | # We repeat everything in this loop B times 653 | 654 | # Draw the sample 655 | db_boot <- ObsData[sample(1:nrow(ObsData), size = nrow(ObsData), replace = TRUE),] 656 | 657 | # Step 1: fit Q(A,C) on db_boot (instead of ObsData) 658 | Q <- glm(Y ~ A + x2 + x4 + x2*x4, data = db_boot, family=binomial) 659 | 660 | # Step 2: counterfactual (bootstrap) datasets 661 | A1Data <- A0Data <- db_boot 662 | A1Data$A <- 1; A0Data$A <- 0 663 | 664 | # Step 3: Counterfactual predictions 665 | Y_A1 <- predict(Q, A1Data, type="response") 666 | Y_A0 <- predict(Q, A0Data, type="response") 667 | 668 | # Step 4: Estimates 669 | pred_A1 <- mean(Y_A1); pred_A0 <- mean(Y_A0) 670 | 671 | boot_MOR[i] <- (pred_A1*(1 - pred_A0))/((1 - pred_A1)*pred_A0) 672 | boot_ATE[i] <- pred_A1 - pred_A0 673 | 674 | } 675 | head(boot_MOR); head(boot_ATE) 676 | ``` 677 | 678 | ## [1] 2.564340 2.583766 2.555410 2.561102 2.576301 2.575973 679 | 680 | ## [1] 0.2311627 0.2329416 0.2303259 0.2308432 0.2322557 0.2322273 681 | 682 | Each bootstrap sample results in an estimate of the causal effect and in 683 | the end, we have a vector of causal effect estimates. When working with 684 | actual data we should run at least 500 bootstrap iterations to be 685 | somewhat confident in the results. 686 | 687 | Here, we have only 20 results; the results happen to be pretty similar 688 | across bootstrap samples because of the huge sample size and the simple 689 | data-generating process. 690 | 691 | Once we have the result vector(s), we can summarize them to compute the 692 | standard error or confidence intervals. 693 | 694 | ``` r 695 | # Standard error (SE) of the marginal OR 696 | sd(boot_MOR) 697 | ``` 698 | 699 | ## [1] 0.0158117 700 | 701 | ``` r 702 | # 95% Confidence Interval 703 | quantile(boot_MOR, probs=c(0.025,0.975), na.rm=TRUE) 704 | ``` 705 | 706 | ## 2.5% 97.5% 707 | ## 2.529537 2.585768 708 | 709 | ``` r 710 | # SE of the ATE 711 | sd(boot_ATE) 712 | ``` 713 | 714 | ## [1] 0.001462938 715 | 716 | ``` r 717 | # 95% Confidence Interval 718 | quantile(boot_ATE, probs=c(0.025,0.975), na.rm=TRUE) 719 | ``` 720 | 721 | ## 2.5% 97.5% 722 | ## 0.2279146 0.2331152 723 | 724 | # Illustration of the impact of the weighting schemes on the pseudo-sample characteristics 725 | 726 | In the previous sections, we have used the unstabilised ATE weights for 727 | IPW and the doubly robust standardisation. However, we can use other 728 | weighting schemes (presented in Table 2 of Chatton & Rohrer, 2023; 729 | reproduced below) which results in different pseudo-samples. 730 | 731 | | Name | Weight if A=1 | Weight if A=0 | Target population | 732 | |------------------|:---------------:|:-----------------:|:-----------------:| 733 | | Unstabilised ATE | 1/e(C) | 1/\[1-e(C)\] | Whole sample | 734 | | Stabilised ATE | P(A=1)/e(C) | P(A=0)/\[1-e(C)\] | Whole sample | 735 | | Unstabilised ATT | 1 | e(C)/\[1-e(C)\] | Treated | 736 | | Unstabilised ATU | \[1-e(C)\]/e(C) | 1 | Untreated | 737 | | Overlap | 1-e(C) | e(C) | Unclear | 738 | 739 | Let’s take a look at how using these weights affects the actual 740 | (simulated) sample characteristics. 741 | 742 | ``` r 743 | ## Construct the table 744 | tab <- tableone::CreateTableOne(vars = c("x1", "x2", "x3", "x4"), strata = "A", data = ObsData, test = FALSE, addOverall = TRUE) 745 | 746 | ## Show table without SMD (not relevant here) 747 | print(tab, smd = FALSE) 748 | ``` 749 | 750 | ## Stratified by A 751 | ## Overall 0 1 752 | ## n 500000 248680 251320 753 | ## x1 (mean (SD)) 0.50 (0.50) 0.50 (0.50) 0.50 (0.50) 754 | ## x2 (mean (SD)) 0.65 (0.48) 0.50 (0.50) 0.80 (0.40) 755 | ## x3 (mean (SD)) 0.00 (1.00) -0.55 (0.84) 0.54 (0.84) 756 | ## x4 (mean (SD)) 0.00 (1.00) -0.22 (0.98) 0.22 (0.97) 757 | 758 | We see that without any weighting, only `x1` is balanced between the two 759 | groups. 760 | 761 | Now take a look at the pseudo-sample that results when we weigh the 762 | data using the unstabilised ATE weights: 763 | 764 | ``` r 765 | ## Recompute the weights from e(C) for pedagogical purpose 766 | omega <- ifelse(ObsData$A==1, 1/e, 1/(1-e)) 767 | summary(omega) 768 | ``` 769 | 770 | ## Min. 1st Qu. Median Mean 3rd Qu. Max. 771 | ## 1.037 1.385 1.625 1.999 2.313 17.559 772 | 773 | ``` r 774 | ## Weighted data (pseudo-sample) 775 | pseudo <- survey::svydesign(ids = ~ 1, data = ObsData, weights = ~ omega) 776 | 777 | ## Construct the table (This is quite slow) 778 | tabWeighted <- tableone::svyCreateTableOne(vars = c("x1", "x2", "x3", "x4"), strata = "A", data = pseudo, test = FALSE, addOverall = TRUE) 779 | 780 | ## Show table without SMD 781 | print(tabWeighted, smd = FALSE) 782 | ``` 783 | 784 | ## Stratified by A 785 | ## Overall 0 1 786 | ## n 999696.74 499708.33 499988.41 787 | ## x1 (mean (SD)) 0.50 (0.50) 0.50 (0.50) 0.50 (0.50) 788 | ## x2 (mean (SD)) 0.65 (0.48) 0.65 (0.48) 0.65 (0.48) 789 | ## x3 (mean (SD)) 0.00 (1.05) -0.65 (0.83) 0.64 (0.83) 790 | ## x4 (mean (SD)) 0.00 (1.00) 0.00 (1.00) 0.00 (1.00) 791 | 792 | Now, in the two groups in the pseudo-sample, `x2` and `x4` (the 793 | covariates necessary to achieve conditional exchangeability) have the 794 | same distribution as in the actual sample; `x3` differs between the 795 | groups – it was not included when calculating the weights as it was not 796 | necessary to achieve conditional exchangeability. Thus, the ATE weights 797 | target the population represented by the whole sample. We can also see 798 | that the pseudo-sample size is twice the actual sample size. Stabilised 799 | weights correct this issue and produce less extreme weights. 800 | 801 | ``` r 802 | ## Compute the weights from e(C) 803 | omega <- ifelse(ObsData$A==1, mean(ObsData$A)/e, (1-mean(ObsData$A))/(1-e)) 804 | summary(omega) 805 | ``` 806 | 807 | ## Min. 1st Qu. Median Mean 3rd Qu. Max. 808 | ## 0.5211 0.6923 0.8137 0.9997 1.1550 8.7332 809 | 810 | ``` r 811 | ## Weighted data (pseudo-sample) 812 | pseudo <- survey::svydesign(ids = ~ 1, data = ObsData, weights = ~ omega) 813 | 814 | ## Construct the table (again slow) 815 | tabWeighted <- tableone::svyCreateTableOne(vars = c("x1", "x2", "x3", "x4"), strata = "A", data = pseudo, test = FALSE, addOverall = TRUE) 816 | 817 | ## Show table without SMD 818 | print(tabWeighted, smd = FALSE) 819 | ``` 820 | 821 | ## Stratified by A 822 | ## Overall 0 1 823 | ## n 499849.11 248534.93 251314.18 824 | ## x1 (mean (SD)) 0.50 (0.50) 0.50 (0.50) 0.50 (0.50) 825 | ## x2 (mean (SD)) 0.65 (0.48) 0.65 (0.48) 0.65 (0.48) 826 | ## x3 (mean (SD)) 0.00 (1.05) -0.65 (0.83) 0.64 (0.83) 827 | ## x4 (mean (SD)) 0.00 (1.00) 0.00 (1.00) 0.00 (1.00) 828 | 829 | We see we have approximated more closely the original sample size with 830 | stabilised weights as expected. 831 | 832 | What happens if we use ATT weights instead? 833 | 834 | ``` r 835 | ## Compute the weights from e(C) 836 | omega <- ifelse(ObsData$A==1, 1, e/(1-e)) 837 | summary(omega) 838 | ``` 839 | 840 | ## Min. 1st Qu. Median Mean 3rd Qu. Max. 841 | ## 0.08435 0.62160 1.00000 1.00470 1.00000 16.55910 842 | 843 | ``` r 844 | ## Weighted data (pseudo-sample) 845 | pseudo <- survey::svydesign(ids = ~ 1, data = ObsData, weights = ~ omega) 846 | 847 | ## Construct the table (anew slow) 848 | tabWeighted <- tableone::svyCreateTableOne(vars = c("x1", "x2", "x3", "x4"), strata = "A", data = pseudo, test = FALSE, addOverall = TRUE) 849 | 850 | ## Show table without SMD 851 | print(tabWeighted, smd = FALSE) 852 | ``` 853 | 854 | ## Stratified by A 855 | ## Overall 0 1 856 | ## n 502348.33 251028.33 251320.00 857 | ## x1 (mean (SD)) 0.50 (0.50) 0.50 (0.50) 0.50 (0.50) 858 | ## x2 (mean (SD)) 0.80 (0.40) 0.80 (0.40) 0.80 (0.40) 859 | ## x3 (mean (SD)) -0.10 (1.05) -0.75 (0.81) 0.54 (0.84) 860 | ## x4 (mean (SD)) 0.21 (0.97) 0.21 (0.97) 0.22 (0.97) 861 | 862 | Here, we see that the distribution of `x2` and `x4` in the two groups no 863 | longer matches the actual sample distribution. Instead, they have the 864 | same distribution as in the attendee group in our original data. Thus, 865 | ATT weights target the “treated” population instead of the whole 866 | population. 867 | 868 | ATU weights do “the opposite” and target the untreated population: 869 | 870 | ``` r 871 | ## Compute the weights from e(C) 872 | omega <- ifelse(ObsData$A==1, (1-e)/e, 1) 873 | summary(omega) 874 | ``` 875 | 876 | ## Min. 1st Qu. Median Mean 3rd Qu. Max. 877 | ## 0.0368 0.6272 1.0000 0.9947 1.0000 10.4176 878 | 879 | ``` r 880 | ## Weighted data (pseudo-sample) 881 | pseudo <- survey::svydesign(ids = ~ 1, data = ObsData, weights = ~ omega) 882 | 883 | ## Construct the table (still slow) 884 | tabWeighted <- tableone::svyCreateTableOne(vars = c("x1", "x2", "x3", "x4"), strata = "A", data = pseudo, test = FALSE, addOverall = TRUE) 885 | 886 | ## Show table without SMD 887 | print(tabWeighted, smd = FALSE) 888 | ``` 889 | 890 | ## Stratified by A 891 | ## Overall 0 1 892 | ## n 497348.41 248680.00 248668.41 893 | ## x1 (mean (SD)) 0.50 (0.50) 0.50 (0.50) 0.50 (0.50) 894 | ## x2 (mean (SD)) 0.50 (0.50) 0.50 (0.50) 0.50 (0.50) 895 | ## x3 (mean (SD)) 0.10 (1.05) -0.55 (0.84) 0.74 (0.81) 896 | ## x4 (mean (SD)) -0.21 (0.98) -0.22 (0.98) -0.21 (0.98) 897 | 898 | Lastly, there are overlap weights. They are designed to target a 899 | population in which positivity is respected. Let’s take a look: 900 | 901 | ``` r 902 | ## Compute the weights from e(C) 903 | omega <- ifelse(ObsData$A==1, 1-e, e) 904 | summary(omega) 905 | ``` 906 | 907 | ## Min. 1st Qu. Median Mean 3rd Qu. Max. 908 | ## 0.03549 0.27814 0.38464 0.42482 0.56763 0.94305 909 | 910 | ``` r 911 | ## Weighted data (pseudo-sample) 912 | pseudo <- survey::svydesign(ids = ~ 1, data = ObsData, weights = ~ omega) 913 | 914 | ## Construct the table (always slow) 915 | tabWeighted <- tableone::svyCreateTableOne(vars = c("x1", "x2", "x3", "x4"), strata = "A", data = pseudo, test = FALSE, addOverall = TRUE) 916 | 917 | ## Show table without SMD 918 | print(tabWeighted, smd = FALSE) 919 | ``` 920 | 921 | ## Stratified by A 922 | ## Overall 0 1 923 | ## n 212409.38 106204.69 106204.69 924 | ## x1 (mean (SD)) 0.50 (0.50) 0.50 (0.50) 0.50 (0.50) 925 | ## x2 (mean (SD)) 0.67 (0.47) 0.67 (0.47) 0.67 (0.47) 926 | ## x3 (mean (SD)) 0.00 (1.05) -0.65 (0.82) 0.64 (0.82) 927 | ## x4 (mean (SD)) -0.03 (0.95) -0.03 (0.95) -0.03 (0.95) 928 | 929 | Here, the pseudo-sample is similar to the one obtained with the help of 930 | ATE weights. This is a “benign” scenario, because positivity is 931 | respected in our simulated data (and thus, overlap weights would not be 932 | necessary to begin with). 933 | 934 | But what happens when there is a lack of positivity? 935 | 936 | ``` r 937 | ## Introduce a positivity violation in the simulated data 938 | ObsData$x4[ObsData$x4>0.5 & ObsData$A==1] <- 0.5 # No attendee can have x4 > 0.5, but non-attendees can 939 | 940 | ## Compute the propensity score e(C) 941 | e <- glm(A ~ x2 + x4 + x2*x4, data = ObsData, family=binomial)$fitted.values 942 | 943 | ## Compute the weights from e(C) 944 | omega <- ifelse(ObsData$A==1, (1-e)/e, 1) 945 | summary(omega) 946 | ``` 947 | 948 | ## Min. 1st Qu. Median Mean 3rd Qu. Max. 949 | ## 0.4517 0.5907 1.0000 0.9914 1.0000 3.7720 950 | 951 | ``` r 952 | ## Weighted data (pseudo-sample) 953 | pseudo <- survey::svydesign(ids = ~ 1, data = ObsData, weights = ~ omega) 954 | 955 | ## Construct the table (over and over slow) 956 | tabWeighted <- tableone::svyCreateTableOne(vars = c("x1", "x2", "x3", "x4"), strata = "A", data = pseudo, test = FALSE, addOverall = TRUE) 957 | 958 | ## Show table without SMD 959 | print(tabWeighted, smd = FALSE) 960 | ``` 961 | 962 | ## Stratified by A 963 | ## Overall 0 1 964 | ## n 495715.10 248680.00 247035.10 965 | ## x1 (mean (SD)) 0.50 (0.50) 0.50 (0.50) 0.50 (0.50) 966 | ## x2 (mean (SD)) 0.50 (0.50) 0.50 (0.50) 0.50 (0.50) 967 | ## x3 (mean (SD)) 0.08 (1.04) -0.55 (0.84) 0.71 (0.82) 968 | ## x4 (mean (SD)) -0.20 (0.87) -0.22 (0.98) -0.17 (0.74) 969 | 970 | In this scenario, the pseudo-sample distribution maps the non-attendee 971 | group of the actual sample. Therefore, the target population is the 972 | “untreated” population. This is due to the particular positivity 973 | violation that we simulated. 974 | 975 | More generally, the population targeted by the overlap weights always 976 | lies somewhere between the populations targeted by the ATT and the ATU. 977 | In the most benign scenario, it will happen to target the whole sample 978 | (as it did before we simulated a positivity violation), but usually it 979 | does not. Thus, we often end up with an ill-defined target population – 980 | especially in settings where positivity is violated, and thus especially 981 | when the statistical properties of these weights would add the most 982 | value (Austin, 2023). 983 | 984 | # References 985 | 986 | Ali M.S., Groenwold R.H.H., Belitser S.V., Pestman W.R., Hoes A.W., Roes 987 | K.C.B., de Boer A. & Klungel O.H. (2015) Reporting of covariate 988 | selection and balance assessment in propensity score analysis is 989 | suboptimal: A systematic review. *Journal of Clinical Epidemiology*, 990 | 68(2), 122‑131. 991 | 992 | Austin P.C. (2023). Differences in target estimands between different 993 | propensity score-based weights. *Pharmacoepidemiology and Drug Safety*, 994 | 32(10), 1103-1112. 995 | 996 | Chatton A. & Rohrer JM. (2023) The causal cookbook: Recipes for 997 | propensity scores, g-computation and doubly robust standardisation. PsyArXiv: 10.31234/osf.io/k2gzp 998 | 999 | Danelian G., Foucher Y., Léger M., Le Borgne F. & Chatton A. (2023) 1000 | Identifying in-sample positivity violations through regression trees: 1001 | the PoRT algorithm. *Journal of Causal Inference*, 11(1), 20220032. 1002 | 1003 | Greifer, N. (2024). cobalt: Covariate Balance Tables and Plots. *R package*, 1004 | 1005 | Westreich D. & Greenland S. (2013). The Table 2 Fallacy: Presenting and 1006 | Interpreting Confounder and Modifier Coefficients. *American Journal of 1007 | Epidemiology*, 177(4), 292‑298. 1008 | 1009 | Yoshida K. & Bartel A. (2022). tableone: Create ‘Table 1’ to Describe 1010 | Baseline Characteristics with or without Propensity Score Weights. *R 1011 | package*, 1012 | --------------------------------------------------------------------------------