├── LICENSE
└── README.md
/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2023 Arthur CHATTON

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
Causal cookbook recipes
================
Arthur Chatton and Julia Rohrer
2024-01-12

- Goals
- Implementing the Recipes
  - Generating synthetic data
  - Positivity
  - Inverse-probability-weighting
  - G-computation
  - Doubly-robust standardisation
  - Bootstrapping to compute the variance
- Illustration of the impact of the weighting schemes on the
  pseudo-sample characteristics
- References

# Goals

This document contains additional materials for the Causal Cookbook
(Chatton & Rohrer, 2023). In the first sections, we illustrate how to
implement the recipes of the cookbook in R, namely:

1.  Inverse-probability weighting (also known as propensity-score
    weighting)

2.  g-computation

3.  Doubly-robust standardisation

Below, we also provide an illustration of how the different weighting
schemes mentioned in the cookbook affect the resulting pseudo-sample.

We assume only basic knowledge of R.

# Implementing the Recipes

## Generating synthetic data

First, we need some data to work on. We generate synthetic data here
because this allows us to know the true causal effect, which means that
we can evaluate how the different recipes perform.

``` r
datasim <- function(n) {
  # This small function simulates a dataset with n rows
  # containing covariates, an action, an outcome, and
  # the underlying potential outcomes

  # Two binary covariates
  x1 <- rbinom(n, size = 1, prob = 0.5)
  x2 <- rbinom(n, size = 1, prob = 0.65)

  # Two continuous, normally distributed covariates
  x3 <- rnorm(n, 0, 1)
  x4 <- rnorm(n, 0, 1)

  # The action (independent variable, treatment, exposure...)
  # is a function of x2, x3, x4 and
  # the product of (ie, an interaction of) x2 and x4
  A <- rbinom(n, size = 1, prob = plogis(-1.6 + 2.5*x2 + 2.2*x3 + 0.6*x4 + 0.4*x2*x4))

  # Simulate the two potential outcomes
  # as functions of x1, x2, x4 and the product of x2 and x4

  # Potential outcome if the action is 1
  # note that here, we add 1
  Y.1 <- rbinom(n, size = 1, prob = plogis(-0.7 + 1 - 0.15*x1 + 0.45*x2 + 0.20*x4 + 0.4*x2*x4))
  # Potential outcome if the action is 0
  # note that here, we do not add 1 so that there
  # is a difference between the two potential outcomes (ie, an effect of A)
  Y.0 <- rbinom(n, size = 1, prob = plogis(-0.7 + 0 - 0.15*x1 + 0.45*x2 + 0.20*x4 + 0.4*x2*x4))

  # Observed outcome
  # is the potential outcome (Y.1 or Y.0)
  # corresponding to the action the individual experienced (A)
  Y <- Y.1*A + Y.0*(1 - A)

  # Return a data.frame as the output of this function
  data.frame(x1, x2, x3, x4, A, Y, Y.1, Y.0)
}
```

We have 4 covariates (`x1` to `x4`), but only `x2` and `x4` are
confounders. Thus, the minimal adjustment set C for achieving
(conditional) exchangeability is (`x2`, `x4`). These four covariates are
used for simulating the action `A` (say, Alcoholics Anonymous
attendance: 1 if attendee and 0 otherwise) and the outcome `Y` (say,
abstinence at the 1-year follow-up: 1 if abstinent and 0 otherwise).

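If you want to double-check an adjustment set, the `dagitty` package can
derive it from the assumed causal graph. A minimal sketch, assuming the
package is installed and the DAG implied by `datasim()` (`x2` and `x4`
affect both `A` and `Y`, `x3` affects only `A`, `x1` affects only `Y`):

``` r
# Sketch: derive the minimal adjustment set from the simulated DAG
library(dagitty)
g <- dagitty("dag {
  A -> Y
  x1 -> Y
  x2 -> A ; x2 -> Y
  x3 -> A
  x4 -> A ; x4 -> Y
}")
adjustmentSets(g, exposure = "A", outcome = "Y") # should return {x2, x4}
```
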
We also see the variables `Y.1` and `Y.0`, which are the two *potential
outcomes* observed in a hypothetical world in which all individuals are
attendees (A=1) or non-attendees (A=0), respectively. In practice, the
potential outcomes are unknown, but with synthetic data we know them,
which allows us to determine the true causal effect.

Let’s actually generate the data.

``` r
set.seed(120110) # for reproducibility
ObsData <- datasim(n = 500000) # really large sample
TRUE_EY.1 <- mean(ObsData$Y.1); TRUE_EY.1 # mean outcome under A = 1
```

    ## [1] 0.619346

``` r
TRUE_EY.0 <- mean(ObsData$Y.0); TRUE_EY.0 # mean outcome under A = 0
```

    ## [1] 0.388012

``` r
TRUE_ATE <- TRUE_EY.1 - TRUE_EY.0; TRUE_ATE # true average treatment effect is the difference
```

    ## [1] 0.231334

``` r
TRUE_MOR <- (TRUE_EY.1*(1 - TRUE_EY.0))/((1 - TRUE_EY.1)*TRUE_EY.0); TRUE_MOR # true marginal OR
```

    ## [1] 2.56626

The true ATE (average treatment effect on the entire population, i.e.,
risk difference over all individuals) is around 0.231, while the true
marginal odds ratio is around 2.57. Both are valid causal effects, but
their scale differs.

In other words, in our simulated data, attending AA increases the
chances of abstinence by 23.1 percentage points (from 38.8% to 61.9%).

## Positivity

Before modelling, we must check the positivity assumption (*i.e.*, all
individuals must have a non-extreme probability of experiencing each
level of `A`), regardless of the method used afterwards. For this, we
use the PoRT algorithm (Danelian *et al.*, 2023).

``` r
remotes::install_github('ArthurChatton/PoRT') # install the port package
library(port)
ObsData$x4_round <- round(ObsData$x4, 0) # to reduce computational time (huge sample size); in practice, use a meaningful cutoff
port(A="A", cov.quanti="x4_round", cov.quali="x2", data=ObsData)
```

    ## [1] "No problematic subgroup was identified."

So, positivity seems respected here, and we can go ahead and start
~~cooking~~ modelling. Otherwise, we would face a choice:

- Changing the target population by excluding the problematic
  subgroup(s) identified by PoRT (sketched below)

- Targeting an estimand for which the identified violation(s) are not
  meaningful (*e.g.*, the ATT only requires that the attendees have a
  non-extreme probability of being non-attendees)

- Using an approach able to extrapolate over the problematic
  subgroup(s), such as g-computation (but a correct extrapolation
  is not guaranteed).

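To make the first option concrete: had PoRT flagged a subgroup (it did
not here), we could simply exclude it and restrict the target population
accordingly. A minimal sketch, with a purely hypothetical flagged
subgroup `x4_round >= 2`:

``` r
# Hypothetical: exclude a subgroup flagged by PoRT (none was flagged in
# our data; this only shows the mechanics of trimming)
ObsDataTrimmed <- subset(ObsData, x4_round < 2)
# All subsequent steps would then use ObsDataTrimmed, and the causal
# estimate would only apply to this restricted target population.
```
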
## Inverse-probability-weighting

Next, we illustrate the recipe provided in Box 4.

In the first step, we need to fit the nuisance model e(C), *i.e.*, the
propensity score. (Here, we do know how the data were generated, so this
step is easy—if we were using actual data, we would first need to decide
which covariates to include and how to precisely model the action.)

``` r
# Fit a logistic regression model predicting A from the relevant confounders
# And immediately predict A for all observations (fitted.values)
e <- glm(A ~ x2 + x4 + x2*x4, data = ObsData, family=binomial)$fitted.values
```

The coefficients of this model don’t matter, so we don’t even look at
them (see Westreich & Greenland, 2013, for an explanation).

In the second step, we compute the weights to recover the causal effect
of interest. Here, we use the unstabilised ATE weights.

``` r
# Assign the weights depending on the action group
# See also Table 2 in the Causal Cookbook
omega <- ifelse(ObsData$A==1, 1/e, 1/(1-e))
summary(omega)
```

    ##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
    ##   1.037   1.385   1.625   1.999   2.313  17.559

We need to check whether this model balances the two groups in the
resulting pseudo-sample (*i.e.*, the weighted sample). We can use the
`tableone` package for this (Yoshida & Bartel, 2022).

``` r
## Load the packages
library(tableone); library(survey)

## Weighted data (pseudo-sample)
pseudo <- survey::svydesign(ids = ~ 1, data = ObsData, weights = ~ omega)

## Construct the table (This is quite slow)
tabWeighted <- tableone::svyCreateTableOne(vars = c("x2", "x4"), strata = "A", data = pseudo, test = FALSE, addOverall = TRUE)

## Show table with SMD
print(tabWeighted, smd = TRUE)
```

    ##                 Stratified by A
    ##                  Overall      0            1            SMD  
    ##   n              999696.74    499708.33    499988.41         
    ##   x2 (mean (SD))   0.65 (0.48)  0.65 (0.48)  0.65 (0.48)  0.001
    ##   x4 (mean (SD))   0.00 (1.00)  0.00 (1.00)  0.00 (1.00)  0.003

A correct balance of the confounders is achieved when the standardised
mean difference (`SMD` in the table) between the two action groups in
the pseudo-sample is lower than 10% (Ali *et al.*, 2015). Here, the
weights seem to have balanced the groups. However, if we have
unmeasured or omitted confounders, some imbalances can remain unnoticed.
Note that the pseudo-sample size is twice the actual sample size; the
stabilisation of the weights corrects this phenomenon, as illustrated in
the last section.

Note that other metrics can be useful; see the `cobalt` package
(Greifer, 2024) and the related vignettes.

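For instance, a minimal `cobalt` sketch (assuming the package is
installed) could look like this:

``` r
# Balance diagnostics with cobalt; see its vignettes for more options
library(cobalt)
bal.tab(A ~ x2 + x4, data = ObsData, weights = omega, method = "weighting")
```
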
In the third step, we fit the marginal structural model.

``` r
# Logistic regression model returns the marginal OR
msm_OR <- glm(Y~A, weights = omega, data=ObsData, family=binomial)
# Linear model returns the ATE (risk difference)
msm_RD <- lm(Y~A, weights = omega, data=ObsData)
```

The choice of the marginal structural model depends on the causal effect
of interest. A logistic model gives us an estimate of the marginal OR,
while a linear model gives us an estimate of the risk difference.

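If the marginal risk ratio were of interest instead, one option (a
sketch, going slightly beyond the recipes shown here) is a weighted
log-link working model:

``` r
# exp(coefficient of A) then estimates the marginal risk ratio
msm_RR <- glm(Y ~ A, weights = omega, data = ObsData, family = quasipoisson(link = "log"))
exp(msm_RR$coef[2])
```
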
Fourth step, the resulting estimates:

``` r
MOR_IPW <- exp(msm_OR$coef[2])
RD_IPW <- msm_RD$coef[2]
c(MOR_IPW,RD_IPW) |> round(digits=3)
```

    ##     A     A 
    ## 2.568 0.231

We obtain a marginal OR of 2.57 and an ATE of 0.231—these are indeed the
true values that we recovered above by contrasting the potential
outcomes.

For the variance of these estimates, we can use either a robust SE
matrix, as below, or bootstrapping (see the end of this document).

``` r
library(sandwich)
library(lmtest)
coeftest(msm_RD, vcov = sandwich)
```

    ## 
    ## t test of coefficients:
    ## 
    ##              Estimate Std. Error t value  Pr(>|t|)    
    ## (Intercept) 0.3888322  0.0011077  351.04 < 2.2e-16 ***
    ## A           0.2314945  0.0015505  149.30 < 2.2e-16 ***
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

``` r
coeftest(msm_OR, vcov = sandwich)
```

    ## 
    ## z test of coefficients:
    ## 
    ##               Estimate Std. Error z value  Pr(>|z|)    
    ## (Intercept) -0.4522236  0.0046611 -97.021 < 2.2e-16 ***
    ## A            0.9431586  0.0065535 143.918 < 2.2e-16 ***
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

## G-computation

Next, we illustrate the recipe from Box 5.

First step, we need to fit the nuisance model Q(A,C):

``` r
# Fit a logistic regression model predicting Y from the relevant confounders
Q <- glm(Y ~ A + x2 + x4 + x2*x4, data = ObsData, family=binomial)
```

Again, the coefficients of this model don’t matter. In contrast to
propensity score-based methods, g-computation doesn’t need a “balance”
assumption.

Does the Q-model look familiar? Indeed, it’s a classical model used to
control for confounding. But take a look at the estimate of the
marginal OR:

``` r
exp(Q$coef[2])
```

    ##        A 
    ## 2.736306

This estimate is not the causal effect of interest, which we already
learned is 2.57.

This is because the marginal OR is non-collapsible; we need a specific
causal method to target it properly. (What we get here is not the
marginal OR, but an unbiased estimate of the *conditional OR*. This is
the OR when all covariates are set to 0; its value will thus depend on
how we coded the covariates (e.g., whether we centered them, which
reference category was used for dummy variables) and which terms we
included (linear, quadratic, cubic, interactions…). It can be viewed as
a kind of “average” of causal effects in all possible subgroups
defined by the adjustment set. In contrast, the marginal OR depends on
the actual distribution of covariates in the data.)

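Because the synthetic data contain the potential outcomes, we can watch
non-collapsibility at work directly. The sketch below computes the
causal OR within each stratum of `x2` and the marginal causal OR; no
confounding is involved (we contrast potential outcomes), yet the
stratum-specific ORs do not simply average to the marginal OR. This is a
property of the OR scale, not a bias.

``` r
# Causal ORs from the known potential outcomes, by stratum and marginally
or <- function(p1, p0) (p1*(1 - p0))/((1 - p1)*p0)
with(subset(ObsData, x2 == 0), or(mean(Y.1), mean(Y.0)))
with(subset(ObsData, x2 == 1), or(mean(Y.1), mean(Y.0)))
or(mean(ObsData$Y.1), mean(ObsData$Y.0)) # the marginal OR from earlier
```
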
Second step, let’s create hypothetical worlds!

``` r
# Copy the "actual" (simulated) data twice
A1Data <- A0Data <- ObsData
# In one world A equals 1 for everyone, in the other one it equals 0 for everyone
# The rest of the data stays as is (for now)
A1Data$A <- 1; A0Data$A <- 0
head(ObsData); head(A1Data); head(A0Data)
```

    ##   x1 x2         x3         x4 A Y Y.1 Y.0
    ## 1  0  1 -0.2909924  0.8374205 1 1   1   0
    ## 2  1  1 -0.1598913 -1.6984448 0 0   1   0
    ## 3  1  0 -0.5454640  1.6010621 0 0   1   0
    ## 4  0  1 -0.4733918 -0.8286665 0 0   1   0
    ## 5  1  1 -0.8476024  1.1711469 1 1   1   1
    ## 6  1  0 -1.2864879  1.5281848 0 0   1   0

    ##   x1 x2         x3         x4 A Y Y.1 Y.0
    ## 1  0  1 -0.2909924  0.8374205 1 1   1   0
    ## 2  1  1 -0.1598913 -1.6984448 1 0   1   0
    ## 3  1  0 -0.5454640  1.6010621 1 0   1   0
    ## 4  0  1 -0.4733918 -0.8286665 1 0   1   0
    ## 5  1  1 -0.8476024  1.1711469 1 1   1   1
    ## 6  1  0 -1.2864879  1.5281848 1 0   1   0

    ##   x1 x2         x3         x4 A Y Y.1 Y.0
    ## 1  0  1 -0.2909924  0.8374205 0 1   1   0
    ## 2  1  1 -0.1598913 -1.6984448 0 0   1   0
    ## 3  1  0 -0.5454640  1.6010621 0 0   1   0
    ## 4  0  1 -0.4733918 -0.8286665 0 0   1   0
    ## 5  1  1 -0.8476024  1.1711469 0 1   1   1
    ## 6  1  0 -1.2864879  1.5281848 0 0   1   0

Our hypothetical worlds are identical except for the action status.
`A1Data` represents a hypothetical world in which all individuals are
attendees, while `A0Data` represents the opposite world in which all
individuals are non-attendees.

In the third step, we make counterfactual predictions. For this, we use
the model `Q` as a prediction model and estimate the outcome’s
probability in each hypothetical world:

``` r
# Predict Y if everybody attends
Y_A1 <- predict(Q, A1Data, type="response")
# Predict Y if nobody attends
Y_A0 <- predict(Q, A0Data, type="response")
# Taking a look at the predictions
data.frame(Y_A1=head(Y_A1), Y_A0=head(Y_A0), TRUE_Y=head(ObsData$Y)) |> round(digits = 2)
```

    ##   Y_A1 Y_A0 TRUE_Y
    ## 1 0.77 0.55      1
    ## 2 0.42 0.21      0
    ## 3 0.63 0.39      0
    ## 4 0.55 0.31      0
    ## 5 0.80 0.60      1
    ## 6 0.63 0.38      0

Now, we can do the fourth step and compute the estimates.

``` r
# Mean outcomes in the two worlds
pred_A1 <- mean(Y_A1); pred_A0 <- mean(Y_A0)

# Marginal odds ratio
MOR_gcomp <- (pred_A1*(1 - pred_A0))/((1 - pred_A1)*pred_A0)
# ATE (risk difference)
RD_gcomp <- pred_A1 - pred_A0
c(MOR_gcomp, RD_gcomp) |> round(digits=3)
```

    ## [1] 2.568 0.231

As before when using IPW, we obtain unbiased estimates of the two causal
effects.

To quantify the variance of these estimates, we must rely on
bootstrapping, which we illustrate at the end of this document.

## Doubly-robust standardisation

Next, we illustrate the recipe provided in Box 6.

Doubly-robust standardisation is a doubly-robust estimator combining IPW
and g-computation. It can be more robust to potential misspecification
because it requires only one of the two models to be correctly
specified, giving us twice the chance to get it right.

This estimator begins (like IPW) with fitting e(C):

``` r
# Correctly specified action model
e <- glm(A ~ x2 + x4 + x2*x4, data = ObsData, family=binomial)$fitted.values
# Misspecified action model
emis <- glm(A ~ x2 + x4, data = ObsData, family=binomial)$fitted.values
```

We also fit a second model, which is misspecified because it omits the
interaction between `x2` and `x4`. This will later allow us to
demonstrate the doubly-robust property of the estimator.

As with IPW, we can check the balance at this step. However, because
it’s a doubly-robust estimator, small imbalances are less punishing.

Second, we compute the weights that match the causal effect of interest
(here again the unstabilised ATE):

``` r
# Weights from the correctly specified action model
omega <- ifelse(ObsData$A==1, 1/e, 1/(1-e))
summary(omega)
```

    ##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
    ##   1.037   1.385   1.625   1.999   2.313  17.559

``` r
# Weights from the misspecified action model
omegamis <- ifelse(ObsData$A==1, 1/emis, 1/(1-emis))
summary(omegamis)
```

    ##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
    ##   1.040   1.397   1.659   2.003   2.274  21.454

The `omega` weights are identical to those from IPW because it is
exactly the same procedure up to this point. The `omegamis` weights
differ due to the misspecification of e(C).

Now, we start the g-computation part of doubly-robust standardisation.
We fit the outcome model Q(A,C), but this time it is weighted by the
weights `omega`.

``` r
Q <- glm(Y ~ A + x2 + x4 + x2*x4, weights = omega, data = ObsData, family=binomial)
```

We could again obtain the *conditional OR* through the coefficient
related to `A` in `Q`, but we are more interested in the marginal OR,
which describes the effect across the whole population.

We also compute various misspecified Q(A,C) models:

``` r
# Outcome model misspecified
Qmis <- glm(Y ~ A + x2 + x4, weights = omega, data = ObsData, family=binomial)
# Outcome model is correct but the action model (weights) is misspecified
Qomis <- glm(Y ~ A + x2 + x4 + x2*x4, weights = omegamis, data = ObsData, family=binomial)
# Both the outcome model and the action model are misspecified
Qmismis <- glm(Y ~ A + x2 + x4, weights = omegamis, data = ObsData, family=binomial)
```

Here, `Qmis` is misspecified only in Q(A,C), `Qomis` is misspecified
only in e(C), and `Qmismis` is misspecified in both Q(A,C) and e(C).

Fourth step, generating the counterfactual datasets:

``` r
# Duplicate data twice
A1Data <- A0Data <- ObsData
# Set action values
A1Data$A <- 1; A0Data$A <- 0
head(ObsData); head(A1Data); head(A0Data)
```

    ##   x1 x2         x3         x4 A Y Y.1 Y.0
    ## 1  0  1 -0.2909924  0.8374205 1 1   1   0
    ## 2  1  1 -0.1598913 -1.6984448 0 0   1   0
    ## 3  1  0 -0.5454640  1.6010621 0 0   1   0
    ## 4  0  1 -0.4733918 -0.8286665 0 0   1   0
    ## 5  1  1 -0.8476024  1.1711469 1 1   1   1
    ## 6  1  0 -1.2864879  1.5281848 0 0   1   0

    ##   x1 x2         x3         x4 A Y Y.1 Y.0
    ## 1  0  1 -0.2909924  0.8374205 1 1   1   0
    ## 2  1  1 -0.1598913 -1.6984448 1 0   1   0
    ## 3  1  0 -0.5454640  1.6010621 1 0   1   0
    ## 4  0  1 -0.4733918 -0.8286665 1 0   1   0
    ## 5  1  1 -0.8476024  1.1711469 1 1   1   1
    ## 6  1  0 -1.2864879  1.5281848 1 0   1   0

    ##   x1 x2         x3         x4 A Y Y.1 Y.0
    ## 1  0  1 -0.2909924  0.8374205 0 1   1   0
    ## 2  1  1 -0.1598913 -1.6984448 0 0   1   0
    ## 3  1  0 -0.5454640  1.6010621 0 0   1   0
    ## 4  0  1 -0.4733918 -0.8286665 0 0   1   0
    ## 5  1  1 -0.8476024  1.1711469 0 1   1   1
    ## 6  1  0 -1.2864879  1.5281848 0 0   1   0

We are now ready for the fifth step: counterfactual predictions.

``` r
Y_A1 <- predict(Q, A1Data, type="response")
Y_A0 <- predict(Q, A0Data, type="response")
data.frame(Y_A1=head(Y_A1), Y_A0=head(Y_A0), TRUE_Y=head(ObsData$Y)) |> round(digits = 2)
```

    ##   Y_A1 Y_A0 TRUE_Y
    ## 1 0.77 0.55      1
    ## 2 0.42 0.21      0
    ## 3 0.63 0.39      0
    ## 4 0.55 0.31      0
    ## 5 0.80 0.60      1
    ## 6 0.63 0.38      0

And we repeat the procedure for the misspecified models.

``` r
Y_A1mis <- predict(Qmis, A1Data, type="response")
Y_A0mis <- predict(Qmis, A0Data, type="response")
Y_A1omis <- predict(Qomis, A1Data, type="response")
Y_A0omis <- predict(Qomis, A0Data, type="response")
Y_A1mismis <- predict(Qmismis, A1Data, type="response")
Y_A0mismis <- predict(Qmismis, A0Data, type="response")
```

Finally, we compute the estimates from doubly-robust standardisation:

``` r
pred_A1 <- mean(Y_A1); pred_A0 <- mean(Y_A0)

# Marginal odds ratio
MOR_DRS <- (pred_A1*(1 - pred_A0))/((1 - pred_A1)*pred_A0)
# ATE (risk difference)
RD_DRS <- pred_A1 - pred_A0
c(MOR_DRS, RD_DRS) |> round(digits=3)
```

    ## [1] 2.564 0.231

Again, the estimates are unbiased since we have included all the
confounders and correctly specified the models. But what if the model(s)
are misspecified?

``` r
pred_A1mis <- mean(Y_A1mis); pred_A0mis <- mean(Y_A0mis)
MOR_DRSmis <- (pred_A1mis*(1 - pred_A0mis))/((1 - pred_A1mis)*pred_A0mis)
RD_DRSmis <- pred_A1mis - pred_A0mis

pred_A1omis <- mean(Y_A1omis); pred_A0omis <- mean(Y_A0omis)
MOR_DRSomis <- (pred_A1omis*(1 - pred_A0omis))/((1 - pred_A1omis)*pred_A0omis)
RD_DRSomis <- pred_A1omis - pred_A0omis

pred_A1mismis <- mean(Y_A1mismis); pred_A0mismis <- mean(Y_A0mismis)
MOR_DRSmismis <- (pred_A1mismis*(1 - pred_A0mismis))/((1 - pred_A1mismis)*pred_A0mismis)
RD_DRSmismis <- pred_A1mismis - pred_A0mismis

data.frame(`Q_mis`=c(MOR_DRSmis, RD_DRSmis), `g_mis`=c(MOR_DRSomis, RD_DRSomis), `Q_mis_g_mis`=c(MOR_DRSmismis, RD_DRSmismis))
```

    ##       Q_mis    g_mis Q_mis_g_mis
    ## 1 2.5643738 2.564584    2.611692
    ## 2 0.2311518 0.231171    0.236282

In the first two scenarios (only one model misspecified), there is no
bias. However, when both models are misspecified, we can observe a bias
(which is small in this simulated example but could of course be much
larger in actual data).

You can try to change the IPW and g-computation code by yourself to
see that these procedures lack this doubly-robust property. Maybe you
also want to try omitting a confounder or changing the functional form
(*e.g.*, a cubic relationship for `x4`) to get a feel for how the
methods behave.

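As a starting point for this exercise, here is a sketch of plain
g-computation with the same misspecified outcome model as `Qmis` (the
interaction omitted) but without the doubly-robust weighting; you can
compare its ATE estimate against the truth:

``` r
# Plain g-computation with a misspecified Q(A,C): no weights to rescue us
Qgc_mis <- glm(Y ~ A + x2 + x4, data = ObsData, family = binomial)
mean(predict(Qgc_mis, A1Data, type = "response")) -
  mean(predict(Qgc_mis, A0Data, type = "response"))
```
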
## Bootstrapping to compute the variance

Bootstrapping is a powerful tool to obtain the variance of these
estimators. The underlying idea is to resample with replacement, so that
we end up with a slightly different sample.

``` r
# Small bootstrapping demonstration
test_db <- LETTERS[1:5]
test_db
```

    ## [1] "A" "B" "C" "D" "E"

``` r
for(i in 1:3){
  db_boot <- test_db[sample(1:length(test_db), size=length(test_db), replace=TRUE)]
  print(paste0('Bootstrap sample #', i)); print(db_boot)
}
```

    ## [1] "Bootstrap sample #1"
    ## [1] "D" "A" "D" "B" "D"
    ## [1] "Bootstrap sample #2"
    ## [1] "D" "E" "C" "B" "D"
    ## [1] "Bootstrap sample #3"
    ## [1] "C" "E" "A" "D" "C"

By drawing several bootstrap samples and then applying the full
estimation process on each sample, we can obtain a fair estimate of the
variance, taking into account the whole uncertainty in the process
(uncertainty in the estimated weights for IPW; uncertainty in the model
Q for g-computation; uncertainty in both for doubly-robust
standardisation).

Let’s apply the bootstrap to g-computation:

``` r
# A bit of setup before we can start the resampling

# We will need empty vectors to store the results
boot_MOR <- boot_ATE <- c()

# Number of bootstrap samples, usually 500 or 1000
B <- 20

for (i in 1:B){
  # We repeat everything in this loop B times

  # Draw the sample
  db_boot <- ObsData[sample(1:nrow(ObsData), size = nrow(ObsData), replace = TRUE),]

  # Step 1: fit Q(A,C) on db_boot (instead of ObsData)
  Q <- glm(Y ~ A + x2 + x4 + x2*x4, data = db_boot, family=binomial)

  # Step 2: counterfactual (bootstrap) datasets
  A1Data <- A0Data <- db_boot
  A1Data$A <- 1; A0Data$A <- 0

  # Step 3: counterfactual predictions
  Y_A1 <- predict(Q, A1Data, type="response")
  Y_A0 <- predict(Q, A0Data, type="response")

  # Step 4: estimates
  pred_A1 <- mean(Y_A1); pred_A0 <- mean(Y_A0)

  boot_MOR[i] <- (pred_A1*(1 - pred_A0))/((1 - pred_A1)*pred_A0)
  boot_ATE[i] <- pred_A1 - pred_A0
}
head(boot_MOR); head(boot_ATE)
```

    ## [1] 2.564340 2.583766 2.555410 2.561102 2.576301 2.575973

    ## [1] 0.2311627 0.2329416 0.2303259 0.2308432 0.2322557 0.2322273

Each bootstrap sample results in an estimate of the causal effect and,
in the end, we have a vector of causal effect estimates. When working
with actual data, we should run at least 500 bootstrap iterations to be
somewhat confident in the results.

Here, we have only 20 results; they happen to be pretty similar across
bootstrap samples because of the huge sample size and the simple
data-generating process.

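The same loop can be wrapped around IPW; the key point is that e(C)
must be re-fitted within each iteration so that the uncertainty in the
estimated weights is propagated. A minimal sketch, reusing the
specification from the IPW section:

``` r
boot_ATE_ipw <- c()
for (i in 1:B){
  db_boot <- ObsData[sample(1:nrow(ObsData), size = nrow(ObsData), replace = TRUE),]
  # Re-fit the propensity score on the bootstrap sample
  e_b <- glm(A ~ x2 + x4 + x2*x4, data = db_boot, family=binomial)$fitted.values
  omega_b <- ifelse(db_boot$A==1, 1/e_b, 1/(1-e_b))
  # Marginal structural model on the weighted bootstrap sample
  boot_ATE_ipw[i] <- lm(Y ~ A, weights = omega_b, data = db_boot)$coef[2]
}
```
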
Once we have the result vector(s), we can summarize them to compute the
standard error or confidence intervals.

``` r
# Standard error (SE) of the marginal OR
sd(boot_MOR)
```

    ## [1] 0.0158117

``` r
# 95% Confidence Interval
quantile(boot_MOR, probs=c(0.025,0.975), na.rm=TRUE)
```

    ##     2.5%    97.5% 
    ## 2.529537 2.585768

``` r
# SE of the ATE
sd(boot_ATE)
```

    ## [1] 0.001462938

``` r
# 95% Confidence Interval
quantile(boot_ATE, probs=c(0.025,0.975), na.rm=TRUE)
```

    ##      2.5%     97.5% 
    ## 0.2279146 0.2331152

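These are percentile intervals. When the bootstrap distribution is
reasonably symmetric, a normal-approximation interval built from the
point estimate and the bootstrap SE is a common alternative; a sketch
for the ATE:

``` r
# Normal-approximation 95% CI around the g-computation estimate
RD_gcomp + c(-1, 1) * qnorm(0.975) * sd(boot_ATE)
```
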
# Illustration of the impact of the weighting schemes on the pseudo-sample characteristics

In the previous sections, we have used the unstabilised ATE weights for
IPW and doubly-robust standardisation. However, we can use other
weighting schemes (presented in Table 2 of Chatton & Rohrer, 2023;
reproduced below), which result in different pseudo-samples.

| Name             |  Weight if A=1  |   Weight if A=0   | Target population |
|------------------|:---------------:|:-----------------:|:-----------------:|
| Unstabilised ATE |     1/e(C)      |   1/\[1-e(C)\]    |   Whole sample    |
| Stabilised ATE   |   P(A=1)/e(C)   | P(A=0)/\[1-e(C)\] |   Whole sample    |
| Unstabilised ATT |        1        |  e(C)/\[1-e(C)\]  |      Treated      |
| Unstabilised ATU | \[1-e(C)\]/e(C) |         1         |     Untreated     |
| Overlap          |     1-e(C)      |       e(C)        |      Unclear      |

Let’s take a look at how using these weights affects the actual
(simulated) sample characteristics.

``` r
## Construct the table
tab <- tableone::CreateTableOne(vars = c("x1", "x2", "x3", "x4"), strata = "A", data = ObsData, test = FALSE, addOverall = TRUE)

## Show table without SMD (not relevant here)
print(tab, smd = FALSE)
```

    ##                 Stratified by A
    ##                  Overall       0             1            
    ##   n              500000        248680        251320       
    ##   x1 (mean (SD))   0.50 (0.50)   0.50 (0.50)   0.50 (0.50)
    ##   x2 (mean (SD))   0.65 (0.48)   0.50 (0.50)   0.80 (0.40)
    ##   x3 (mean (SD))   0.00 (1.00)  -0.55 (0.84)   0.54 (0.84)
    ##   x4 (mean (SD))   0.00 (1.00)  -0.22 (0.98)   0.22 (0.97)

We see that without any weighting, only `x1` is balanced between the two
groups.

Now take a look at the pseudo-sample that results when we weight the
data using the unstabilised ATE weights:

``` r
## Recompute the weights from e(C) for pedagogical purposes
omega <- ifelse(ObsData$A==1, 1/e, 1/(1-e))
summary(omega)
```

    ##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
    ##   1.037   1.385   1.625   1.999   2.313  17.559

``` r
## Weighted data (pseudo-sample)
pseudo <- survey::svydesign(ids = ~ 1, data = ObsData, weights = ~ omega)

## Construct the table (This is quite slow)
tabWeighted <- tableone::svyCreateTableOne(vars = c("x1", "x2", "x3", "x4"), strata = "A", data = pseudo, test = FALSE, addOverall = TRUE)

## Show table without SMD
print(tabWeighted, smd = FALSE)
```

    ##                 Stratified by A
    ##                  Overall        0              1            
    ##   n              999696.74      499708.33      499988.41    
    ##   x1 (mean (SD))   0.50 (0.50)    0.50 (0.50)    0.50 (0.50)
    ##   x2 (mean (SD))   0.65 (0.48)    0.65 (0.48)    0.65 (0.48)
    ##   x3 (mean (SD))   0.00 (1.05)   -0.65 (0.83)    0.64 (0.83)
    ##   x4 (mean (SD))   0.00 (1.00)    0.00 (1.00)    0.00 (1.00)

Now, in the two groups of the pseudo-sample, `x2` and `x4` (the
covariates necessary to achieve conditional exchangeability) have the
same distribution as in the actual sample; `x3` differs between the
groups – it was not included when calculating the weights, as it was not
necessary to achieve conditional exchangeability. Thus, the ATE weights
target the population represented by the whole sample. We can also see
that the pseudo-sample size is twice the actual sample size. Stabilised
weights correct this issue and produce less extreme weights.

``` r
## Compute the weights from e(C)
omega <- ifelse(ObsData$A==1, mean(ObsData$A)/e, (1-mean(ObsData$A))/(1-e))
summary(omega)
```

    ##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
    ##  0.5211  0.6923  0.8137  0.9997  1.1550  8.7332

``` r
## Weighted data (pseudo-sample)
pseudo <- survey::svydesign(ids = ~ 1, data = ObsData, weights = ~ omega)

## Construct the table (again slow)
tabWeighted <- tableone::svyCreateTableOne(vars = c("x1", "x2", "x3", "x4"), strata = "A", data = pseudo, test = FALSE, addOverall = TRUE)

## Show table without SMD
print(tabWeighted, smd = FALSE)
```

    ##                 Stratified by A
    ##                  Overall        0              1            
    ##   n              499849.11      248534.93      251314.18    
    ##   x1 (mean (SD))   0.50 (0.50)    0.50 (0.50)    0.50 (0.50)
    ##   x2 (mean (SD))   0.65 (0.48)    0.65 (0.48)    0.65 (0.48)
    ##   x3 (mean (SD))   0.00 (1.05)   -0.65 (0.83)    0.64 (0.83)
    ##   x4 (mean (SD))   0.00 (1.00)    0.00 (1.00)    0.00 (1.00)

As expected, the stabilised weights yield a pseudo-sample size that
closely approximates the original sample size.

What happens if we use ATT weights instead?

``` r
## Compute the weights from e(C)
omega <- ifelse(ObsData$A==1, 1, e/(1-e))
summary(omega)
```

    ##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
    ##  0.08435  0.62160  1.00000  1.00470  1.00000 16.55910

``` r
## Weighted data (pseudo-sample)
pseudo <- survey::svydesign(ids = ~ 1, data = ObsData, weights = ~ omega)

## Construct the table (anew slow)
tabWeighted <- tableone::svyCreateTableOne(vars = c("x1", "x2", "x3", "x4"), strata = "A", data = pseudo, test = FALSE, addOverall = TRUE)

## Show table without SMD
print(tabWeighted, smd = FALSE)
```

    ##                 Stratified by A
    ##                  Overall        0              1            
    ##   n              502348.33      251028.33      251320.00    
    ##   x1 (mean (SD))   0.50 (0.50)    0.50 (0.50)    0.50 (0.50)
    ##   x2 (mean (SD))   0.80 (0.40)    0.80 (0.40)    0.80 (0.40)
    ##   x3 (mean (SD))  -0.10 (1.05)   -0.75 (0.81)    0.54 (0.84)
    ##   x4 (mean (SD))   0.21 (0.97)    0.21 (0.97)    0.22 (0.97)

Here, we see that the distribution of `x2` and `x4` in the two groups no
longer matches the actual sample distribution. Instead, they have the
same distribution as in the attendee group of our original data. Thus,
ATT weights target the “treated” population instead of the whole
population.

ATU weights do “the opposite” and target the untreated population:

``` r
## Compute the weights from e(C)
omega <- ifelse(ObsData$A==1, (1-e)/e, 1)
summary(omega)
```

    ##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
    ##  0.0368  0.6272  1.0000  0.9947  1.0000 10.4176

``` r
## Weighted data (pseudo-sample)
pseudo <- survey::svydesign(ids = ~ 1, data = ObsData, weights = ~ omega)

## Construct the table (still slow)
tabWeighted <- tableone::svyCreateTableOne(vars = c("x1", "x2", "x3", "x4"), strata = "A", data = pseudo, test = FALSE, addOverall = TRUE)

## Show table without SMD
print(tabWeighted, smd = FALSE)
```

    ##                 Stratified by A
    ##                  Overall        0              1            
    ##   n              497348.41      248680.00      248668.41    
    ##   x1 (mean (SD))   0.50 (0.50)    0.50 (0.50)    0.50 (0.50)
    ##   x2 (mean (SD))   0.50 (0.50)    0.50 (0.50)    0.50 (0.50)
    ##   x3 (mean (SD))   0.10 (1.05)   -0.55 (0.84)    0.74 (0.81)
    ##   x4 (mean (SD))  -0.21 (0.98)   -0.22 (0.98)   -0.21 (0.98)

Lastly, there are overlap weights. They are designed to target a
population in which positivity is respected. Let’s take a look:

``` r
## Compute the weights from e(C)
omega <- ifelse(ObsData$A==1, 1-e, e)
summary(omega)
```

    ##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
    ## 0.03549 0.27814 0.38464 0.42482 0.56763 0.94305

``` r
## Weighted data (pseudo-sample)
pseudo <- survey::svydesign(ids = ~ 1, data = ObsData, weights = ~ omega)

## Construct the table (always slow)
tabWeighted <- tableone::svyCreateTableOne(vars = c("x1", "x2", "x3", "x4"), strata = "A", data = pseudo, test = FALSE, addOverall = TRUE)

## Show table without SMD
print(tabWeighted, smd = FALSE)
```

    ##                 Stratified by A
    ##                  Overall        0              1            
    ##   n              212409.38      106204.69      106204.69    
    ##   x1 (mean (SD))   0.50 (0.50)    0.50 (0.50)    0.50 (0.50)
    ##   x2 (mean (SD))   0.67 (0.47)    0.67 (0.47)    0.67 (0.47)
    ##   x3 (mean (SD))   0.00 (1.05)   -0.65 (0.82)    0.64 (0.82)
    ##   x4 (mean (SD))  -0.03 (0.95)   -0.03 (0.95)   -0.03 (0.95)

Here, the pseudo-sample is similar to the one obtained with the help of
ATE weights. This is a “benign” scenario, because positivity is
respected in our simulated data (and thus, overlap weights would not be
necessary to begin with).

But what happens when there is a lack of positivity?

``` r
## Introduce a positivity violation in the simulated data
ObsData$x4[ObsData$x4>0.5 & ObsData$A==1] <- 0.5 # No attendee can have x4 > 0.5, but non-attendees can

## Compute the propensity score e(C)
e <- glm(A ~ x2 + x4 + x2*x4, data = ObsData, family=binomial)$fitted.values

## Compute the weights from e(C)
omega <- ifelse(ObsData$A==1, (1-e)/e, 1)
summary(omega)
```

    ##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
    ##  0.4517  0.5907  1.0000  0.9914  1.0000  3.7720

``` r
## Weighted data (pseudo-sample)
pseudo <- survey::svydesign(ids = ~ 1, data = ObsData, weights = ~ omega)

## Construct the table (over and over slow)
tabWeighted <- tableone::svyCreateTableOne(vars = c("x1", "x2", "x3", "x4"), strata = "A", data = pseudo, test = FALSE, addOverall = TRUE)

## Show table without SMD
print(tabWeighted, smd = FALSE)
```

    ##                 Stratified by A
    ##                  Overall        0              1            
    ##   n              495715.10      248680.00      247035.10    
    ##   x1 (mean (SD))   0.50 (0.50)    0.50 (0.50)    0.50 (0.50)
    ##   x2 (mean (SD))   0.50 (0.50)    0.50 (0.50)    0.50 (0.50)
    ##   x3 (mean (SD))   0.08 (1.04)   -0.55 (0.84)    0.71 (0.82)
    ##   x4 (mean (SD))  -0.20 (0.87)   -0.22 (0.98)   -0.17 (0.74)

In this scenario, the pseudo-sample distribution matches the
non-attendee group of the actual sample. Therefore, the target
population is the “untreated” population. This is due to the particular
positivity violation that we simulated.

More generally, the population targeted by the overlap weights always
lies somewhere between the populations targeted by the ATT and the ATU.
In the most benign scenario, it will happen to target the whole sample
(as it did before we simulated a positivity violation), but usually it
does not. Thus, we often end up with an ill-defined target population –
especially in settings where positivity is violated, and thus especially
when the statistical properties of these weights would add the most
value (Austin, 2023).

# References

Ali M.S., Groenwold R.H.H., Belitser S.V., Pestman W.R., Hoes A.W., Roes
K.C.B., de Boer A. & Klungel O.H. (2015). Reporting of covariate
selection and balance assessment in propensity score analysis is
suboptimal: A systematic review. *Journal of Clinical Epidemiology*,
68(2), 122-131.

Austin P.C. (2023). Differences in target estimands between different
propensity score-based weights. *Pharmacoepidemiology and Drug Safety*,
32(10), 1103-1112.

Chatton A. & Rohrer J.M. (2023). The causal cookbook: Recipes for
propensity scores, g-computation and doubly robust standardisation.
PsyArXiv: 10.31234/osf.io/k2gzp

Danelian G., Foucher Y., Léger M., Le Borgne F. & Chatton A. (2023).
Identifying in-sample positivity violations through regression trees:
The PoRT algorithm. *Journal of Causal Inference*, 11(1), 20220032.

Greifer N. (2024). cobalt: Covariate Balance Tables and Plots. *R
package*.

Westreich D. & Greenland S. (2013). The Table 2 Fallacy: Presenting and
Interpreting Confounder and Modifier Coefficients. *American Journal of
Epidemiology*, 177(4), 292-298.

Yoshida K. & Bartel A. (2022). tableone: Create ‘Table 1’ to Describe
Baseline Characteristics with or without Propensity Score Weights. *R
package*.

--------------------------------------------------------------------------------