├── .gitignore
├── README.Rmd
├── README.md
├── code
└── smd-plot.R
├── exercises
├── 01-whole-game-exercises.Rmd
├── 02-dags-exercises.Rmd
├── 03-pscores-exercises.Rmd
├── 04-pscores-weighting-exercises.Rmd
├── 05-pscores-diagnostics-exercises.Rmd
└── 06-outcome-model-exercises.Rmd
├── slides
├── 00-intro.Rmd
├── 00-intro.html
├── 00-intro.pdf
├── 01-causal_modeling_whole_game.Rmd
├── 01-causal_modeling_whole_game.html
├── 01-causal_modeling_whole_game.pdf
├── 01-causal_modeling_whole_game_cache
│ └── html
│ │ ├── __packages
│ │ ├── boot_cache_011afc01abdd656081906cf473a2e4a7.RData
│ │ ├── boot_cache_011afc01abdd656081906cf473a2e4a7.rdb
│ │ └── boot_cache_011afc01abdd656081906cf473a2e4a7.rdx
├── 01-causal_modeling_whole_game_files
│ └── figure-html
│ │ ├── unnamed-chunk-10-1.png
│ │ ├── unnamed-chunk-11-1.png
│ │ ├── unnamed-chunk-12-1.png
│ │ ├── unnamed-chunk-13-1.png
│ │ ├── unnamed-chunk-14-1.png
│ │ ├── unnamed-chunk-16-1.png
│ │ ├── unnamed-chunk-19-1.png
│ │ ├── unnamed-chunk-2-1.png
│ │ ├── unnamed-chunk-20-1.png
│ │ ├── unnamed-chunk-21-1.png
│ │ ├── unnamed-chunk-27-1.png
│ │ ├── unnamed-chunk-3-1.png
│ │ ├── unnamed-chunk-4-1.png
│ │ ├── unnamed-chunk-5-1.png
│ │ ├── unnamed-chunk-6-1.png
│ │ ├── unnamed-chunk-7-1.png
│ │ └── unnamed-chunk-8-1.png
├── 02-dags.Rmd
├── 02-dags.html
├── 02-dags.pdf
├── 02-dags_files
│ └── figure-html
│ │ ├── unnamed-chunk-10-1.png
│ │ ├── unnamed-chunk-11-1.png
│ │ ├── unnamed-chunk-13-1.png
│ │ ├── unnamed-chunk-15-1.png
│ │ ├── unnamed-chunk-17-1.png
│ │ ├── unnamed-chunk-19-1.png
│ │ ├── unnamed-chunk-21-1.png
│ │ ├── unnamed-chunk-5-1.png
│ │ ├── unnamed-chunk-6-1.png
│ │ ├── unnamed-chunk-7-1.png
│ │ ├── unnamed-chunk-8-1.png
│ │ └── unnamed-chunk-9-1.png
├── 03-pscores.Rmd
├── 03-pscores.html
├── 03-pscores.pdf
├── 03-pscores_files
│ └── figure-html
│ │ ├── unnamed-chunk-5-1.png
│ │ └── unnamed-chunk-6-1.png
├── 04-pscore-weighting.Rmd
├── 04-pscore-weighting.html
├── 04-pscore-weighting.pdf
├── 04-pscore-weighting_files
│ └── figure-html
│ │ ├── unnamed-chunk-2-1.png
│ │ ├── unnamed-chunk-3-1.png
│ │ ├── unnamed-chunk-4-1.png
│ │ ├── unnamed-chunk-5-1.png
│ │ ├── unnamed-chunk-6-1.png
│ │ ├── unnamed-chunk-7-1.png
│ │ └── unnamed-chunk-8-1.png
├── 04-using-pscores_files
│ └── figure-html
│ │ ├── unnamed-chunk-2-1.png
│ │ ├── unnamed-chunk-3-1.png
│ │ ├── unnamed-chunk-4-1.png
│ │ ├── unnamed-chunk-5-1.png
│ │ ├── unnamed-chunk-6-1.png
│ │ └── unnamed-chunk-7-1.png
├── 05-pscore-diagnostics.Rmd
├── 05-pscore-diagnostics.html
├── 05-pscore-diagnostics.pdf
├── 05-pscore-diagnostics_files
│ └── figure-html
│ │ ├── unnamed-chunk-13-1.png
│ │ ├── unnamed-chunk-14-1.png
│ │ ├── unnamed-chunk-14-2.png
│ │ ├── unnamed-chunk-14-3.png
│ │ ├── unnamed-chunk-14-4.png
│ │ ├── unnamed-chunk-15-1.png
│ │ ├── unnamed-chunk-15-2.png
│ │ ├── unnamed-chunk-17-1.png
│ │ ├── unnamed-chunk-18-1.png
│ │ ├── unnamed-chunk-18-2.png
│ │ ├── unnamed-chunk-19-1.png
│ │ ├── unnamed-chunk-19-2.png
│ │ ├── unnamed-chunk-20-1.png
│ │ ├── unnamed-chunk-21-1.png
│ │ ├── unnamed-chunk-23-1.png
│ │ ├── unnamed-chunk-32-1.png
│ │ └── unnamed-chunk-8-1.png
├── 06-outcome-model.Rmd
├── 06-outcome-model.html
├── 06-outcome-model.pdf
├── img
│ ├── conf-2.png
│ ├── conf-3.png
│ ├── ggdagitty.png
│ ├── ggdagitty_alg.png
│ ├── ggdagitty_plots.png
│ ├── ldm.jpg
│ ├── mb.jpg
│ ├── obs-studies-2.png
│ ├── obs-studies-3.png
│ ├── obs-studies.png
│ ├── pscores.png
│ ├── randomized-2.png
│ ├── randomized.png
│ ├── tidy_ggdagitty.png
│ ├── trt-conf.png
│ └── trt.png
├── index.Rmd
├── index.html
├── libs
│ ├── countdown-0.3.5
│ │ ├── countdown.css
│ │ ├── countdown.js
│ │ └── smb_stage_clear.mp3
│ ├── countdown
│ │ ├── countdown.css
│ │ ├── countdown.js
│ │ └── smb_stage_clear.mp3
│ ├── header-attrs-2.3
│ │ └── header-attrs.js
│ ├── header-attrs
│ │ └── header-attrs.js
│ ├── remark-css-0.0.1
│ │ └── default.css
│ └── remark-css
│ │ └── default.css
└── theme.css
└── user2020-causal-inference.Rproj
/.gitignore:
--------------------------------------------------------------------------------
1 | .Rproj.user
2 | .Rhistory
3 | .RData
4 | .Ruserdata
5 | .DS_Store
6 | render_pdf.R
7 |
--------------------------------------------------------------------------------
/README.Rmd:
--------------------------------------------------------------------------------
1 | ---
2 | output: github_document
3 | ---
4 |
5 |
6 |
7 | ```{r, include = FALSE}
8 | knitr::opts_chunk$set(
9 | collapse = TRUE,
10 | comment = "#>"
11 | )
12 | ```
13 |
14 | ## user2020! Causal Inference in R Workshop
15 |
16 | ### Slides
17 | * [00 Intro](https://user2020.lucymcgowan.com/00-intro.html)
18 | * [01 Whole Game](https://user2020.lucymcgowan.com/01-causal_modeling_whole_game.html)
19 | * [02 Causal Diagrams](https://user2020.lucymcgowan.com/02-dags.html)
20 | * [03 Introduction to Propensity Scores](https://user2020.lucymcgowan.com/03-pscores.html)
21 | * [04 Using Propensity Scores](https://user2020.lucymcgowan.com/04-pscore-weighting.html)
22 | * [05 Checking Propensity Scores](https://user2020.lucymcgowan.com/05-pscore-diagnostics.html)
23 | * [06 Fitting the outcome model](https://user2020.lucymcgowan.com/06-outcome-model.html)
24 |
25 | ### Installing materials locally
26 |
27 | We will be using RStudio Cloud for the workshop, but if you would like to install the required packages and course materials, we have an R package called {[useRcausal2020](https://github.com/malcolmbarrett/useRcausal2020)} to help you do that! You can install {[useRcausal2020](https://github.com/malcolmbarrett/useRcausal2020)} from GitHub with:
28 |
29 | ``` r
30 | install.packages("remotes")
31 | remotes::install_github("malcolmbarrett/useRcausal2020")
32 | ```
33 |
34 | Once you've installed the package, install the workshop with
35 |
36 | ``` r
37 | useRcausal2020::install_workshop("path/to/your/computer")
38 | ```
39 |
40 | Replace "path/to/your/computer" with where on your computer you want the workshop installed.
41 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 | ## user2020\! Causal Inference in R Workshop
5 |
6 | ### Slides
7 |
8 | - [00 Intro](https://user2020.lucymcgowan.com/00-intro.html)
9 | - [01 Whole
10 | Game](https://user2020.lucymcgowan.com/01-causal_modeling_whole_game.html)
11 | - [02 Causal Diagrams](https://user2020.lucymcgowan.com/02-dags.html)
12 | - [03 Introduction to Propensity
13 | Scores](https://user2020.lucymcgowan.com/03-pscores.html)
14 | - [04 Using Propensity
15 | Scores](https://user2020.lucymcgowan.com/04-pscore-weighting.html)
16 | - [05 Checking Propensity
17 | Scores](https://user2020.lucymcgowan.com/05-pscore-diagnostics.html)
18 | - [06 Fitting the outcome
19 | model](https://user2020.lucymcgowan.com/06-outcome-model.html)
20 |
21 | ### Installing materials locally
22 |
23 | We will be using RStudio Cloud for the workshop, but if you would like
24 | to install the required packages and course materials, we have an R
25 | package called
26 | {[useRcausal2020](https://github.com/malcolmbarrett/useRcausal2020)} to
27 | help you do that\! You can install
28 | {[useRcausal2020](https://github.com/malcolmbarrett/useRcausal2020)}
29 | from GitHub with:
30 |
31 | ``` r
32 | install.packages("remotes")
33 | remotes::install_github("malcolmbarrett/useRcausal2020")
34 | ```
35 |
36 | Once you’ve installed the package, install the workshop with
37 |
38 | ``` r
39 | useRcausal2020::install_workshop("path/to/your/computer")
40 | ```
41 |
42 | Replace “path/to/your/computer” with where on your computer you want the
43 | workshop installed.
44 |
--------------------------------------------------------------------------------
/code/smd-plot.R:
--------------------------------------------------------------------------------
1 | library(survey)
2 | library(tableone)
3 | library(tidyverse)
4 | library(broom)
5 | # remotes::install_github("malcolmbarrett/cidata")
6 | library(cidata)
7 |
8 | propensity_model <- glm(
9 | qsmk ~ sex +
10 | race + age + I(age^2) + education +
11 | smokeintensity + I(smokeintensity^2) +
12 | smokeyrs + I(smokeyrs^2) + exercise + active +
13 | wt71 + I(wt71^2),
14 | family = binomial(),
15 | data = nhefs_complete
16 | )
17 |
18 | df <- propensity_model %>%
19 | augment(type.predict = "response", data = nhefs_complete) %>%
20 | mutate(wts = 1 / ifelse(qsmk == 0, 1 - .fitted, .fitted))
21 |
22 | svy_des <- svydesign(
23 | ids = ~ 1,
24 | data = df,
25 | weights = ~ wts)
26 |
27 | smd_table_unweighted <- CreateTableOne(
28 | vars = c("sex", "race", "age", "education", "smokeintensity", "smokeyrs",
29 | "exercise", "active", "wt71"),
30 | strata = "qsmk",
31 | data = df,
32 | test = FALSE)
33 |
34 | smd_table <- svyCreateTableOne(
35 | vars = c("sex", "race", "age", "education", "smokeintensity", "smokeyrs",
36 | "exercise", "active", "wt71"),
37 | strata = "qsmk",
38 | data = svy_des,
39 | test = FALSE)
40 |
41 |
42 | plot_df <- data.frame(
43 | var = rownames(ExtractSmd(smd_table)),
44 | Unadjusted = as.numeric(ExtractSmd(smd_table_unweighted)),
45 | Weighted = as.numeric(ExtractSmd(smd_table))) %>%
46 | pivot_longer(-var, names_to = "Method", values_to = "SMD")
47 |
48 | ggplot(
49 | data = plot_df,
50 | mapping = aes(x = var, y = SMD, group = Method, color = Method)
51 | ) +
52 | geom_line() +
53 | geom_point() +
54 | geom_hline(yintercept = 0.1, color = "black", size = 0.1) +
55 | coord_flip()
56 |
--------------------------------------------------------------------------------
/exercises/01-whole-game-exercises.Rmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Causal Modeling in R: Whole Game"
3 | output: html_document
4 | ---
5 |
6 | ```{r setup}
7 | library(tidyverse)
8 | library(broom)
9 | library(rsample)
10 | library(ggdag)
11 | # remotes::install_github("malcolmbarrett/cidata")
12 | library(cidata)
13 | library(survey)
14 | library(tableone)
15 | ```
16 |
17 | ## Causal Modeling: Whole Game
18 |
19 | In this guided exercise, we'll attempt to answer a causal question: does quitting smoking make you gain weight? Causal modeling has a special place in the history of smoking research: the studies that demonstrated that smoking causes lung cancer were observational. Thanks to other studies, we also know that, if you're already a smoker, quitting smoking reduces your risk of lung cancer. However, some have observed that former smokers tend to gain weight. Is this the result of quitting smoking, or does something else explain this effect? In the book Causal Inference by Hernán and Robins, the authors analyze this question using several causal inference techniques.
20 |
21 | To answer this question, we'll use causal inference methods to examine the relationship between quitting smoking and gaining weight. First, we'll draw our assumptions with a causal diagram (a directed acyclic graph, or DAG), which will guide our model. Then, we'll use a modeling approach called inverse probability weighting--one of many causal modeling techniques--to estimate the causal effect we're interested in.
22 |
23 | We'll use data from NHEFS to try to estimate the causal effect of quitting smoking on weight gain. NHEFS is a longitudinal, observational study that has many of the variables we'll need. Take a look at `nhefs_codebook` if you want to know more about the variables in this data set. These data are included in the {cidata} package. We'll use the `nhefs_complete` data set, but we'll remove people who were lost to follow-up.
24 |
25 | ```{r}
26 | nhefs_complete_uc <- nhefs_complete %>%
27 | filter(censored == 0)
28 | nhefs_complete_uc
29 | ```
30 |
31 | Let's look at the distribution of weight gain between the two groups.
32 |
33 | ```{r}
34 | colors <- c("#E69F00", "#56B4E9")
35 |
36 | nhefs_complete_uc %>%
37 | ggplot(aes(wt82_71, fill = factor(qsmk))) +
38 | geom_vline(xintercept = 0, color = "grey60", size = 1) +
39 | geom_density(color = "white", alpha = .75, size = .5) +
40 | scale_fill_manual(values = colors) +
41 | theme_minimal() +
42 | theme(legend.position = "bottom") +
43 | labs(
44 | x = "change in weight (lbs)",
45 | fill = "quit smoking (1 = yes)"
46 | )
47 | ```
48 |
49 | There's a difference--former smokers do seem to have gained a bit more weight--but there's also a lot of variation. Let's look at the numeric summaries.
50 |
51 | ```{r}
52 | # ~2.5 lbs gained for quit vs. not quit
53 | nhefs_complete_uc %>%
54 | group_by(qsmk) %>%
55 | summarize(
56 | mean_weight_change = mean(wt82_71),
57 | sd = sd(wt82_71),
58 | .groups = "drop"
59 | )
60 | ```
61 |
74 |
75 | Here, it looks like those who quit smoking gained, on average, 2.5 lbs. But is there something else that could explain these results? There are many factors associated with both quitting smoking and gaining weight; could one of those factors explain away the results we're seeing here?
76 |
77 | To truly answer this question, we need to specify a causal diagram based on domain knowledge. Sadly, for most circumstances, there is no data-driven approach that consistently identifies confounders. Only our causal assumptions can help us identify them. Causal diagrams are a visual expression of those assumptions linked to rigorous mathematics that allow us to understand what we need to account for in our model.
78 |
79 | In R, we can visualize and analyze our DAGs with the {ggdag} package. {ggdag} uses {ggplot2} and {ggraph} to visualize diagrams and {dagitty} to analyze them. Let's set up our assumptions. The `dagify()` function takes formulas, much like `lm()` and friends, to express assumptions. We have two basic causal structures: the causes of quitting smoking and the causes of gaining weight. Here, we're assuming that the set of variables here affect both. Additionally, we're adding `qsmk` as a cause of `wt82_71`, which is our causal question; we also identify these as our outcome and exposure. Finally, we'll add some labels so the diagram is easier to understand. The result is a `dagitty` object, and we can transform it to a `tidy_dagitty` data set with `tidy_dagitty()`.
80 |
81 | ```{r}
82 | library(ggdag)
83 | # set up DAG
84 | smk_wt_dag <- dagify(
85 | # specify causes of quitting smoking and weight gain:
86 | qsmk ~ sex + race + age + education +
87 | smokeintensity + smokeyrs + exercise + active + wt71,
88 | wt82_71 ~ qsmk + sex + race + age + education +
89 | smokeintensity + smokeyrs + exercise + active + wt71,
90 | # specify causal question:
91 | exposure = "qsmk",
92 | outcome = "wt82_71",
93 | # set up labels:
94 | # here, I'll use the same variable names as the data set, but I'll label them
95 | # with clearer names
96 | labels = c(
97 | # causal question
98 | "qsmk" = "quit\nsmoking",
99 | "wt82_71" = "change in\nweight",
100 |
101 | # demographics
102 | "age" = "age",
103 | "sex" = "sex",
104 | "race" = "race",
105 | "education" = "education",
106 |
107 | # health
108 | "wt71" = "baseline\nweight",
109 | "active" = "daily\nactivity\nlevel",
110 | "exercise" = "exercise",
111 |
112 | # smoking history
113 | "smokeintensity" = "smoking\nintensity",
114 | "smokeyrs" = "yrs of\nsmoking"
115 | )
116 | )
117 |
118 | tidy_dagitty(smk_wt_dag)
119 | ```
120 |
121 | Let's visualize our assumptions with `ggdag()`.
122 |
123 | ```{r}
124 | smk_wt_dag %>%
125 | ggdag(text = FALSE, use_labels = "label")
126 | ```
127 |
128 | What do we need to control for to estimate an unbiased effect of quitting smoking on weight gain? In many DAGs, there will be many sets of variables--called adjustment sets--that will give us the right effect (assuming our DAG is correct--a big, unverifiable assumption!). `ggdag_adjustment_set()` can help you visualize them. Here, there's only one adjustment set: we need to control for everything! While we're at it, since a {ggdag} plot is just a {ggplot2} plot, let's clean it up a bit, too.
129 |
130 | ```{r}
131 | smk_wt_dag %>%
132 | ggdag_adjustment_set(text = FALSE, use_labels = "label") +
133 | theme_dag() +
134 | scale_color_manual(values = colors) +
135 | scale_fill_manual(values = colors)
136 | ```
137 |
138 | Let's fit a model with these variables. Note that we'll fit all continuous variables with squared terms, as well, to allow them a bit of flexibility.
139 |
140 | ```{r}
141 | lm(
142 | wt82_71 ~ qsmk + sex +
143 | race + age + I(age^2) + education +
144 | smokeintensity + I(smokeintensity^2) +
145 | smokeyrs + I(smokeyrs^2) + exercise + active +
146 | wt71 + I(wt71^2),
147 | data = nhefs_complete_uc
148 | ) %>%
149 | tidy(conf.int = TRUE) %>%
150 | filter(term == "qsmk")
151 | ```
152 |
153 | When we adjust for the variables in our DAG, we get an estimate of about 3.5 lbs--people who quit smoking gained about this amount of weight. However, we are trying to answer a specific causal question: how much weight would a person gain if they quit smoking vs. if the same person did not quit smoking? Let's use an inverse probability weighting model to try to estimate that effect at the population level (what if *everyone* quit smoking vs what if *no one* quit smoking).
154 |
155 | For a simple IPW model, we have two modeling steps. First, we fit a propensity score model, which predicts the probability that you received a treatment or exposure (here, that a participant quit smoking). We use this model to calculate inverse probability weights--1 / your probability of treatment. Then, in the second step, we use these weights in the outcome model, which estimates the effect of exposure on the outcome (here, the effect of quitting smoking on gaining weight).
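To make the weighting rule concrete, here's a toy sketch in base R with made-up propensity scores (the real analysis uses the fitted values from the propensity model below): participants who quit are weighted by `1 / p`, participants who didn't by `1 / (1 - p)`.

```{r}
# toy example: made-up propensity scores, not values from the real model
p_quit <- c(0.8, 0.2, 0.5) # predicted probability of quitting
qsmk   <- c(1,   0,   1)   # observed exposure (1 = quit)

# quit: weight by 1 / p; didn't quit: weight by 1 / (1 - p)
wts <- 1 / ifelse(qsmk == 0, 1 - p_quit, p_quit)
wts
#> [1] 1.25 1.25 2.00
```

Intuitively, each weight is the number of "pseudo-participants" that person stands in for in a pseudo-population where exposure is independent of the measured confounders.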
156 |
157 | For the propensity score model, we'll use logistic regression (since quitting smoking is a binary variable). The outcome is quitting smoking, and the variables in the model are all those included in our adjustment set. Then, we'll use `augment()` from {broom} (which calls `predict()` under the hood) to calculate our weights and save them back into our data set.
158 |
159 |
160 | ```{r}
161 | propensity_model <- glm(
162 | qsmk ~ sex +
163 | race + age + I(age^2) + education +
164 | smokeintensity + I(smokeintensity^2) +
165 | smokeyrs + I(smokeyrs^2) + exercise + active +
166 | wt71 + I(wt71^2),
167 | family = binomial(),
168 | data = nhefs_complete_uc
169 | )
170 |
171 | nhefs_complete_uc <- propensity_model %>%
172 | # predict whether quit smoking
173 | augment(type.predict = "response", data = nhefs_complete_uc) %>%
174 | # calculate inverse probability
175 | mutate(wts = 1 / ifelse(qsmk == 0, 1 - .fitted, .fitted))
176 |
177 | nhefs_complete_uc %>%
178 | select(qsmk, .fitted, wts)
179 | ```
180 |
181 | Let's look at the distribution of the weights.
182 |
183 | ```{r}
184 | ggplot(nhefs_complete_uc, aes(wts)) +
185 | geom_density(col = "#E69F00", fill = "#E69F0095", size = .8) +
186 | # use a log scale for the x axis
187 | scale_x_log10() +
188 | theme_minimal(base_size = 20) +
189 | xlab("Weights")
190 | ```
191 |
192 | It looks a little skewed; in particular, a few participants have much higher weights than the rest. There are a few techniques for dealing with this--trimming weights and stabilizing weights--but we'll keep it simple for now and just use them as is.
193 |
194 | The main goal here is to *break* the non-causal associations between quitting smoking and gaining weight--the other paths that might distort our results. In other words, if we succeed, there should be no differences in these variables between our two groups, those who quit smoking and those who didn't. This is where randomized trials shine; you can often assume that there are no baseline differences among potential confounders between your treatment groups (of course, no study is perfect, and there's a whole set of literature on dealing with this problem in randomized trials).
195 |
196 | Standardized mean differences (SMDs) are a simple measure of differences that works across variable types. In general, the closer to 0 we are, the better job we have done eliminating the non-causal relationships we drew in our DAG. Note that low SMDs for everything we adjust for do *not* mean that there is not something else that might confound our study. Unmeasured confounders or misspecified DAGs can still distort our effects, even if our SMDs look great!
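For a single continuous variable, the SMD is just the absolute difference in group means divided by a pooled standard deviation. A minimal base-R sketch with toy numbers ({tableone} handles categorical variables and the weighted version for us):

```{r}
# SMD for one continuous variable: |mean difference| / pooled SD
smd <- function(x, group) {
  m <- tapply(x, group, mean)
  v <- tapply(x, group, var)
  unname(abs(m[2] - m[1]) / sqrt((v[1] + v[2]) / 2))
}

smd(x = c(1, 2, 3, 2, 3, 4), group = c(0, 0, 0, 1, 1, 1))
#> [1] 1
```

Because the difference is standardized, an SMD of 0.1 means the same thing whether the variable is age in years or weight in kilograms, which is why a single cutoff line works in the plot below.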
197 |
198 | We'll use the {survey} and {tableone} packages to calculate the SMDs, then visualize them.
199 |
200 | ```{r}
201 | svy_des <- svydesign(
202 | ids = ~ 1,
203 | data = nhefs_complete_uc,
204 | weights = ~ wts)
205 |
206 | smd_table_unweighted <- CreateTableOne(
207 | vars = c("sex", "race", "age", "education", "smokeintensity", "smokeyrs",
208 | "exercise", "active", "wt71"),
209 | strata = "qsmk",
210 | data = nhefs_complete_uc,
211 | test = FALSE)
212 |
213 | smd_table <- svyCreateTableOne(
214 | vars = c("sex", "race", "age", "education", "smokeintensity", "smokeyrs",
215 | "exercise", "active", "wt71"),
216 | strata = "qsmk",
217 | data = svy_des,
218 | test = FALSE)
219 |
220 |
221 | plot_df <- data.frame(
222 | var = rownames(ExtractSmd(smd_table)),
223 | Unadjusted = as.numeric(ExtractSmd(smd_table_unweighted)),
224 | Weighted = as.numeric(ExtractSmd(smd_table))) %>%
225 | pivot_longer(-var, names_to = "Method", values_to = "SMD")
226 |
227 | ggplot(
228 | data = plot_df,
229 | mapping = aes(x = var, y = SMD, group = Method, color = Method)
230 | ) +
231 | geom_line() +
232 | geom_point() +
233 | geom_hline(yintercept = 0.1, color = "black", size = 0.1) +
234 | coord_flip() +
235 | theme_minimal()
236 |
237 | ```
238 |
239 | These look pretty good! Some variables are better than others, but weighting appears to have done a much better job eliminating these differences than an unadjusted analysis.
240 |
241 | We could do more here to analyze our assumptions, but let's move on to our second step: fitting the outcome model weighted by our inverse probabilities. Some researchers call these Marginal Structural Models, in part because the model is marginal; we only need to include our outcome (`wt82_71`) and exposure (`qsmk`). The other variables aren't in the model; they are accounted for with the IPWs!
242 |
243 | ```{r}
244 | ipw_model <- lm(
245 | wt82_71 ~ qsmk,
246 | data = nhefs_complete_uc,
247 | weights = wts # inverse probability weights
248 | )
249 |
250 | ipw_estimate <- ipw_model %>%
251 | tidy(conf.int = TRUE) %>%
252 | filter(term == "qsmk")
253 |
254 | ipw_estimate
255 | ```
256 |
257 | This estimate is pretty similar to what we saw before, if a little smaller. In fact, for simple causal questions, this is often the case: adjusting for confounders directly in your regression model sometimes estimates the same effect as IPWs and other causal techniques. Causal techniques are special, though, in that they use counterfactual modeling, which allows you to deal with many circumstances, such as when you have selection bias or time-dependent confounding.
258 |
259 | But we have another problem that we need to address. While `lm()` will happily fit our IPW model, it doesn't properly account for the weights. That means our standard error is too small, which will artificially narrow confidence intervals and artificially shrink p-values. There are many ways to address this, including robust estimators, but we'll use a simple approach: the bootstrap, via the {rsample} package.
260 |
261 | First, we need to wrap our model in a function so we can call it many times on our bootstrapped data. A function like this might be your instinct; however, it's not quite right.
262 |
263 | ```{r}
264 | # fit ipw model for a single bootstrap sample
265 | fit_ipw_not_quite_rightly <- function(split, ...) {
266 | # get bootstrapped data sample with `rsample::analysis()`
267 | .df <- analysis(split)
268 |
269 | # fit ipw model
270 | lm(wt82_71 ~ qsmk, data = .df, weights = wts) %>%
271 | tidy()
272 | }
273 | ```
274 |
275 | The problem is that we need to account for the *entire* modeling process, so we need to include the first step of our analysis -- fitting the inverse probability weights.
276 |
277 | ```{r}
278 | fit_ipw <- function(split, ...) {
279 | .df <- analysis(split)
280 |
281 | # fit propensity score model
282 | propensity_model <- glm(
283 | qsmk ~ sex +
284 | race + age + I(age^2) + education +
285 | smokeintensity + I(smokeintensity^2) +
286 | smokeyrs + I(smokeyrs^2) + exercise + active +
287 | wt71 + I(wt71^2),
288 | family = binomial(),
289 | data = .df
290 | )
291 |
292 | # calculate inverse probability weights
293 | .df <- propensity_model %>%
294 | augment(type.predict = "response", data = .df) %>%
295 | mutate(wts = 1 / ifelse(qsmk == 0, 1 - .fitted, .fitted))
296 |
297 | # fit correctly bootstrapped ipw model
298 | lm(wt82_71 ~ qsmk, data = .df, weights = wts) %>%
299 | tidy()
300 | }
301 | ```
302 |
303 | {rsample} makes the rest easy for us: `bootstraps()` resamples our data 1000 times, then we can use `purrr::map()` to apply our function to each resampled set (`splits`). {rsample}'s `int_*()` functions help us get confidence intervals for our estimate.
304 |
305 | ```{r}
306 | # fit ipw model to bootstrapped samples
307 | ipw_results <- bootstraps(nhefs_complete_uc, 1000, apparent = TRUE) %>%
308 | mutate(results = map(splits, fit_ipw))
309 |
310 | # get t-statistic-based CIs
311 | boot_estimate <- int_t(ipw_results, results) %>%
312 | filter(term == "qsmk")
313 |
314 | boot_estimate
315 | ```
316 |
317 | Let's compare this to our naive weighted model that used a single estimate from `lm()`.
318 |
319 | ```{r}
320 | bind_rows(
321 | ipw_estimate %>%
322 | select(estimate, conf.low, conf.high) %>%
323 | mutate(type = "ols"),
324 | boot_estimate %>%
325 | select(estimate = .estimate, conf.low = .lower, conf.high = .upper) %>%
326 | mutate(type = "bootstrap")
327 | ) %>%
328 | # calculate CI width to sort by it
329 | mutate(width = conf.high - conf.low) %>%
330 | arrange(width) %>%
331 | # fix the order of the model types for the plot
332 | mutate(type = fct_inorder(type)) %>%
333 | ggplot(aes(x = type, y = estimate, ymin = conf.low, ymax = conf.high)) +
334 | geom_pointrange(color = "#0172B1", size = 1, fatten = 3) +
335 | coord_flip() +
336 | theme_minimal(base_size = 20)
337 | ```
338 |
339 | Our bootstrapped confidence intervals are wider, which is expected; remember that they were artificially narrow in the naive OLS model!
340 |
341 | So, we have a final estimate for our causal effect: on average, a person who quits smoking will gain 3.5 lbs (95% CI 2.4 lbs, 4.4 lbs) versus if they had not quit smoking. What do you think? Is this estimate reliable? Did we do a good job addressing the assumptions we need to make for a causal effect, particularly that there is no confounding? How might you criticize this model, and what would you do differently?
342 |
343 | ***
344 |
345 | # Take aways
346 | * The broad strokes for a causal analysis are: 1) identify your causal question, 2) make your assumptions clear, 3) check your assumptions as best you can, and 4) use the right estimator for the question you're trying to answer. As scientists, we should be able to critique each of these steps, and that's a good thing!
347 | * To create marginal structural models, first fit a propensity model for the weights with the exposure as the outcome. Then, use the inverse of the predicted probabilities as weights in a model with just the outcome and exposure.
348 | * See more at https://causalinferencebookr.netlify.com
349 |
--------------------------------------------------------------------------------
/exercises/02-dags-exercises.Rmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Causal Diagrams in R"
3 | output: html_document
4 | ---
5 |
6 | ```{r setup}
7 | library(tidyverse)
8 | library(ggdag)
9 | library(dagitty)
10 | ```
11 |
12 | ## Your Turn 1
13 |
14 | Descriptively, drinking coffee is associated with lung cancer. Does drinking coffee cause lung cancer?
15 |
16 | 1. Specify a DAG with `dagify()`. Write your assumption that `smoking` causes `cancer` as a formula.
17 | 2. We're going to assume that coffee does *not* cause cancer, so there's no formula for that. But we still need to declare our causal question. Specify "coffee" as the exposure and "cancer" as the outcome (both in quotations marks).
18 | 3. Plot the DAG using `ggdag()`
19 |
20 | Stretch goal: Underneath the hood, `ggdag()` calls `tidy_dagitty()` on `dagitty` objects. Check the help page for `?tidy_dagitty()` and try using one of the layouts listed in `?layout_tbl_graph_igraph()`. Experiment with different layouts. Plot with `ggdag()`.
21 |
22 | Stretch goal: We are assuming that coffee does *not* cause lung cancer. In causal diagram terminology, we want to know if these two factors are *D-connected* (as in, directionally connected). Since we are assuming that there's no causal relationship, any D-connection is caused by other factors. Try `ggdag_dconnected()` to see if coffee and lung cancer are D-connected.
23 |
24 | ```{r}
25 | coffee_cancer_dag <- ______(
26 | ______ ~ ______,
27 | smoking ~ addictive,
28 | coffee ~ addictive,
29 | exposure = "______",
30 | outcome = "______",
31 | labels = c(
32 | "coffee" = "Coffee",
33 | "cancer" = "Lung Cancer",
34 | "smoking" = "Smoking",
35 | "addictive" = "Addictive \nBehavior"
36 | )
37 | )
38 |
39 | ______(coffee_cancer_dag)
40 | ```
41 |
42 | ## Your Turn 2
43 |
44 | Most {ggdag} quick plotting functions are actually wrappers for functions that let you work with the DAG data directly.
45 |
46 | 1. Call `tidy_dagitty()` on `coffee_cancer_dag` to create a tidy DAG, then pass the results to `dag_paths()`. What's different about these data?
47 | 2. Plot the open paths with `ggdag_paths()`. (Just give it `coffee_cancer_dag` rather than using `dag_paths()`; the quick plot function will do that for you.) Remember, since we assume there is *no* causal path from coffee to lung cancer, any open paths must be confounding pathways.
48 |
49 | Stretch goal: ggdags are just ggplots! You can add themes, geoms, and other {ggplot2} elements with `+` like a normal ggplot. Try adding a theme ({ggdag} has several, or you could try `theme_void()`).
50 |
51 | Stretch goal: The variable names are a little hard to read. We specified some labels earlier, so let's use them. In the tidy DAG, these are called `label`. In `ggdag_paths()`, add the argument `use_labels = "label"` and remove the node text with `text = FALSE`.
52 |
53 | ```{r}
54 | coffee_cancer_dag %>%
55 | ______() %>%
56 | ______()
57 |
58 | ______
59 | ```
60 |
61 |
62 | ## Your Turn 3
63 |
64 | Now that we know the open, confounding pathways (sometimes called "backdoor paths"), we need to know how to close them! First, we'll ask {ggdag} for adjustment sets, then we would need to do something in our analysis to account for at least one adjustment set (e.g. multivariable regression, weighting, or matching for the adjustment sets).
65 |
66 | 1. Use `ggdag_adjustment_set()` to visualize the adjustment sets. Add the arguments `use_labels = "label"` and `text = FALSE`.
67 | 2. Write an R formula for each adjustment set, as you might if you were fitting a model in `lm()` or `glm()`
68 |
69 | Stretch goal: Sometimes, we know a variable plays a vital role in a causal diagram but we can't measure it, or we simply don't have it in our data set. You can tell {ggdag} that a variable is unmeasured with the `latent` argument in `dagify()`. Re-run the `dagify()` call above, but set `latent = "addictive"` (meaning we can't or haven't measured this variable). Plot it with `ggdag_adjustment_set()`. What's different? Now, try setting `latent = c("addictive", "smoking")` and plotting the adjustment set. What do the results mean?
70 |
71 | ```{r}
72 | ______(______)
73 |
74 | cancer ~ ______
75 | cancer ~ ______
76 | ```
77 |
78 | # Take aways
79 |
80 | * Draw your assumptions with DAGs! Use `dagify()` to specify them and `ggdag()` and friends to draw them.
81 | * The main goal for many analyses is to close backdoor (non-causal) paths. {ggdag} and {dagitty} can help you identify them.
82 | * Adjustment sets are key for closing backdoor paths. Take a reasonable set and use it in your model to get a causal effect estimate.
83 |
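To make the takeaways concrete, here is a minimal, self-contained sketch (it assumes {ggdag} is installed; `x`, `y`, and `z` are hypothetical variables, not from the exercise):

```r
library(ggdag)

# z confounds the x -> y question: it causes both exposure and outcome
confounded_dag <- dagify(
  y ~ x + z,
  x ~ z,
  exposure = "x",
  outcome = "y"
)

ggdag(confounded_dag)                # draw the assumptions
ggdag_adjustment_set(confounded_dag) # here the minimal adjustment set is {z}

# so a model like lm(y ~ x + z, data = ...) closes the backdoor path
```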
--------------------------------------------------------------------------------
/exercises/03-pscores-exercises.Rmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Propensity Scores"
3 | output: html_document
4 | ---
5 |
6 |
7 | ```{r}
8 | library(tidyverse)
9 | library(broom)
10 | library(cidata)
11 | library(ggdag)
12 | ```
13 |
14 | Using the National Health and Nutrition Examination Survey Data (`nhefs_complete`), we are interested in the relationship between the **exposure**, `qsmk`: whether the participant quit smoking, and the **outcome**, `wt82_71`: their weight change in kilograms.
15 |
16 | Below is a proposed DAG of the relationship between four confounders (`sex`, `age`, `smokeyrs`, and `wt71`) and the exposure and outcome.
17 |
18 | _Knit this document to see the DAG or refer to the slides_.
19 |
20 | ```{r}
21 | set.seed(1234)
22 | # set up DAG
23 | smk_wt_dag <- dagify(
24 | # specify causes of quitting smoking and weight gain:
25 | qsmk ~ sex + age + smokeyrs + wt71,
26 | wt82_71 ~ sex + age + smokeyrs + wt71,
27 | # specify causal question:
28 | exposure = "qsmk",
29 | outcome = "wt82_71",
30 | # set up labels:
31 | # here, I'll use the same variable names as the data set, but I'll label them
32 | # with clearer names
33 | labels = c(
34 | # causal question
35 | "qsmk" = "quit\nsmoking (qsmk)",
36 | "wt82_71" = "change in\nweight",
37 |
38 | # demographics
39 | "age" = "age",
40 | "sex" = "sex",
41 |
42 | # health
43 | "wt71" = "baseline\nweight (wt71)",
44 |
45 | # smoking history
46 | "smokeyrs" = "yrs of\nsmoking (smokeyrs)"
47 | )
48 | ) %>%
49 | tidy_dagitty()
50 |
51 | smk_wt_dag %>%
52 | ggdag(text = FALSE, use_labels = "label")
53 | ```
54 |
55 | ## Your Turn
56 |
57 | _After updating the code chunks below, change `eval = TRUE` before knitting._
58 |
59 | Fit a propensity score model for `qsmk` using the above proposed confounders.
60 |
61 | ```{r, eval = FALSE}
62 | propensity_model <- ___(
63 | ___ ~ ___,
64 | data = nhefs_complete,
65 | family = _____
66 | )
67 | ```
68 |
69 | Add the propensity scores to the `nhefs_complete` data set, calling this new data set `df`.
70 |
71 | ```{r, eval = FALSE}
72 | df <- propensity_model %>%
73 | ____(type.predict = ____, data = ____)
74 | ```
75 |
76 |
77 | Stretch Goal 1:
78 |
79 | Examine two histograms of the propensity scores, one for those who quit smoking (`qsmk == 1`) and one for those who did not (`qsmk == 0`). How do these compare?
80 |
81 | ```{r}
82 |
83 | ```
84 |
85 |
--------------------------------------------------------------------------------
/exercises/04-pscores-weighting-exercises.Rmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Propensity Score Weighting"
3 | output: html_document
4 | ---
5 |
6 |
7 | ```{r}
8 | library(tidyverse)
9 | library(broom)
10 | library(cidata)
11 | ```
12 |
13 | Using the National Health and Nutrition Examination Survey Data (`nhefs_complete`), we are interested in the relationship between the **exposure**, `qsmk`: whether the participant quit smoking, and the **outcome**, `wt82_71`: their weight change in kilograms.
14 |
15 | Below is the propensity score model you created in the previous exercise.
16 |
17 | ```{r, eval = FALSE}
18 | propensity_model <- glm(
19 | qsmk ~ age + sex + wt71 + smokeyrs,
20 | data = nhefs_complete,
21 | family = binomial()
22 | )
23 |
24 | df <- propensity_model %>%
25 | augment(type.predict = "response", data = nhefs_complete)
26 | ```
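As a refresher on what `augment()` adds here: the propensity score is just the fitted probability of exposure from the logistic model. A minimal base-R sketch on simulated data (toy variable names, not the NHEFS data):

```r
# Toy illustration of propensity scores (simulated data, base R only)
set.seed(1)
n <- 500
age <- rnorm(n, 45, 10)
# exposure probability depends on age
exposed <- rbinom(n, 1, plogis(-2 + 0.04 * age))

ps_model <- glm(exposed ~ age, family = binomial())

# equivalent to the `.fitted` column from augment(type.predict = "response")
ps <- predict(ps_model, type = "response")

range(ps)  # propensity scores are probabilities, strictly between 0 and 1
```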
27 |
28 | ## Your Turn
29 |
30 | _After updating the code chunks below, change `eval = TRUE` before knitting._
31 |
32 | Add the ATE weights to the data frame, `df`
33 |
34 | ```{r, eval = FALSE}
35 | df <- df %>%
36 | mutate(w_ate = ___)
37 | ```
38 |
39 |
40 | Stretch Goal 1:
41 |
42 | Add ATT weights to the data frame, `df`
43 |
44 | ```{r, eval = FALSE}
45 | df <- df %>%
46 | mutate(w_att = ___)
47 | ```
48 |
49 | Stretch Goal 2:
50 |
51 | Update the code below to examine the distribution of the weighted sample. **HINT**: the parts that need updating are the `weight` parameters in two of the `geom_histogram()` calls.
52 |
53 | ```{r, eval = FALSE}
54 | d <- df %>%
55 | tidyr::spread(qsmk, .fitted, sep = "_p")
56 | ```
57 |
58 |
59 | ```{r, eval = FALSE}
60 | ggplot(d) +
61 | geom_histogram(bins = 50, aes(qsmk_p1), alpha = 0.5) +
62 | geom_histogram(bins = 50, aes(qsmk_p1, weight = ____), fill = "green", alpha = 0.5) +
63 | geom_histogram(bins = 50, alpha = 0.5, aes(x = qsmk_p0, y = -..count..)) +
64 | geom_histogram(bins = 50, aes(x = qsmk_p0, weight = ____, y = -..count..), fill = "blue", alpha = 0.5) +
65 | ylab("count") + xlab("p") +
66 | geom_hline(yintercept = 0, lwd = 0.5) +
67 | scale_y_continuous(label = abs) +
68 | theme_minimal() +
69 | geom_rect(aes(xmin = 0.95, xmax = 1, ymin = 5, ymax = 100), fill = "#5DB854") +
70 | geom_text(aes(x = 0.975, y = 50), label = "trt", angle = 270, color = "white") +
71 | geom_rect(aes(xmin = 0.95, xmax = 1, ymin = -100, ymax = -5), fill = "#5154B8") +
72 | geom_text(aes(x = 0.975, y = -50), label = "control", angle = 270, color = "white")
73 | ```
--------------------------------------------------------------------------------
/exercises/05-pscores-diagnostics-exercises.Rmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Propensity Score Diagnostics"
3 | output: html_document
4 | ---
5 |
6 |
7 | ```{r}
8 | library(tidyverse)
9 | library(survey)
10 | library(tableone)
11 | library(broom)
12 | library(cidata)
13 | ```
14 |
15 | Using the National Health and Nutrition Examination Survey Data (`nhefs_complete`), we are interested in the relationship between the **exposure**, `qsmk`: whether the participant quit smoking, and the **outcome**, `wt82_71`: their weight change in kilograms.
16 |
17 | Below is the propensity score model and weights you created in the previous exercise.
18 |
19 | ```{r, eval = FALSE}
20 | propensity_model <- glm(
21 | qsmk ~ age + sex + wt71 + smokeyrs,
22 | data = nhefs_complete,
23 | family = binomial()
24 | )
25 |
26 | df <- propensity_model %>%
27 | augment(type.predict = "response", data = nhefs_complete) %>%
28 | mutate(w_ate = 1 / ifelse(qsmk == 0, 1 - .fitted, .fitted))
29 | ```
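Before reaching for {tableone}, it can help to see what a standardized mean difference is numerically. Here is a hedged base-R sketch on simulated data (the pooled-SD formula below is one common convention; {tableone}'s internals may differ in details, and the weighted version would use weighted means and variances):

```r
# Toy standardized mean difference (SMD), unweighted, on simulated data
set.seed(2)
group <- rep(0:1, each = 100)
x <- rnorm(200, mean = ifelse(group == 1, 0.5, 0), sd = 1)

smd <- function(x, group) {
  m1 <- mean(x[group == 1]); m0 <- mean(x[group == 0])
  v1 <- var(x[group == 1]);  v0 <- var(x[group == 0])
  (m1 - m0) / sqrt((v1 + v0) / 2)  # pooled-SD convention
}

smd(x, group)  # roughly 0.5 by construction; SMDs above 0.1 suggest imbalance
```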
30 |
31 | ## Your Turn 1
32 |
33 | _After updating the code chunks below, change `eval = TRUE` before knitting._
34 |
35 | Create the survey design object to incorporate the weights.
36 |
37 | ```{r, eval = FALSE}
38 | svy_des <- ____(
39 | ids = ~ 1,
40 | data = ___,
41 | weights = ___
42 | )
43 | ```
44 |
45 | Create the **unweighted** standardized mean differences data frame
46 |
47 | ```{r, eval = FALSE}
48 | smd_table_unweighted <- ____(
49 | vars = _____,
50 | strata = _____,
51 | data = ____,
52 | test = FALSE)
53 | ```
54 |
55 | Create the **weighted** standardized mean differences data frame
56 |
57 | ```{r, eval = FALSE}
58 | smd_table <- ____(
59 | vars = _____,
60 | strata = _____,
61 | data = ____,
62 | test = FALSE)
63 | ```
64 |
65 | Create a data frame that merges `smd_table_unweighted` and `smd_table` and pivots it to prepare for plotting
66 |
67 | ```{r, eval = FALSE}
68 | plot_df <- data.frame(
69 | var = rownames(____),
70 | Unadjusted = _____,
71 | Weighted = _____) %>%
72 | pivot_longer(-var, names_to = "Method", values_to = "SMD")
73 | ```
74 |
75 | Create the Love Plot using ggplot
76 |
77 | ```{r, eval = FALSE}
78 | ggplot(data = _____,
79 | mapping = aes(x = ____, y = ____, group = ____, color = ____)) +
80 | geom_line() +
81 | geom_point() +
82 | geom_hline(yintercept = 0.1, color = "black", size = 0.1) +
83 | coord_flip()
84 | ```
85 |
86 |
87 |
88 | ## Your Turn 2
89 |
90 | Create an unweighted ECDF for `smokeyrs` by those who quit smoking and those who did not.
91 |
92 | ```{r, eval = FALSE}
93 | ggplot(df, aes(x = ____, group = ____, color = factor(____))) +
94 | ____() +
95 | scale_color_manual("Quit smoking", values = c("#5154B8", "#5DB854"),
96 | labels = c("Yes", "No")) +
97 | xlab(____) +
98 | ylab("Proportion <= x")
99 | ```
100 |
101 |
102 | Create a weighted ECDF for `smokeyrs` by those who quit smoking and those who did not.
103 |
104 | ```{r, eval = FALSE}
105 | ecdf_1 <- df %>%
106 | filter(____) %>%
107 | arrange(____) %>%
108 | mutate(cum_pct = cumsum(w_ate) / sum(w_ate))
109 |
110 | ecdf_0 <- df %>%
111 | filter(____) %>%
112 | arrange(____) %>%
113 | mutate(cum_pct = cumsum(w_ate) / sum(w_ate))
114 |
115 | ggplot(ecdf_1, aes(x = _____, y = cum_pct)) +
116 | geom_line(color = "#5DB854") +
117 | geom_line(data = ecdf_0, aes(x = ____, y = cum_pct), color = "#5154B8") +
118 | xlab(____) +
119 | ylab("Proportion <= x")
120 | ```
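The weighted ECDF construction is just a cumulative sum of normalized weights over sorted values; this mirrors the `cumsum(w_ate) / sum(w_ate)` line in the chunk above. A tiny numeric sketch with toy numbers, base R only:

```r
# Weighted ECDF by hand: sort, then accumulate normalized weights
x <- c(3, 1, 2)
w <- c(2, 1, 1)

ord <- order(x)
x_sorted <- x[ord]
cum_pct <- cumsum(w[ord]) / sum(w[ord])

x_sorted  # 1 2 3
cum_pct   # 0.25 0.50 1.00 -- the value 3 carries half the total weight
```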
121 |
122 |
--------------------------------------------------------------------------------
/exercises/06-outcome-model-exercises.Rmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Outcome Model"
3 | output: html_document
4 | ---
5 |
6 |
7 | ```{r}
8 | library(tidyverse)
9 | library(survey)
10 | library(tableone)
11 | library(broom)
12 | library(cidata)
13 | library(rsample)
14 | ```
15 |
16 | Using the National Health and Nutrition Examination Survey Data (`nhefs_complete`), we are interested in the relationship between the **exposure**, `qsmk`: whether the participant quit smoking, and the **outcome**, `wt82_71`: their weight change in kilograms.
17 |
18 | ## Your turn
19 |
20 | _After updating the code chunks below, change `eval = TRUE` before knitting._
21 |
22 | Create a function called `ipw_fit` that fits the propensity score model from Exercise 03, incorporates the ATE weights calculated in Exercise 04, and fits a weighted outcome model.
23 |
24 | ```{r, eval = FALSE}
25 | ipw_fit <- function(split, ...) {
26 | .df <- ____
27 |
28 | # fit propensity score model
29 |
30 | # calculate ATE weights
31 |
32 | # fit the correctly bootstrapped IPW model
33 | lm(___ ~ ___, data = .df, weights = ___) %>%
34 | tidy()
35 | }
36 | ```
37 |
38 | Bootstrap this result 1000 times.
39 |
40 | ```{r, eval = FALSE}
41 | ipw_results <- ____(___, 1000, apparent = TRUE) %>%
42 | mutate(results = map(splits, _____))
43 | ```
44 |
45 |
46 | Calculate the confidence interval
47 |
48 | ```{r, eval = FALSE}
49 | boot_estimate <- ____(____, ____) %>%
50 | filter(term == ____)
51 | ```
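If the {rsample} pieces feel opaque, the underlying idea is plain resampling. A base-R percentile-bootstrap sketch for a mean (toy data; the {rsample} interval helpers wrap this kind of computation with more care, including for model coefficients):

```r
# Percentile bootstrap for a mean, base R (toy data)
set.seed(3)
y <- rnorm(100, mean = 2)

# resample with replacement many times, recomputing the statistic each time
boot_means <- replicate(1000, mean(sample(y, replace = TRUE)))

ci <- quantile(boot_means, c(0.025, 0.975))
ci  # should bracket the sample mean of y
```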
52 |
53 |
--------------------------------------------------------------------------------
/slides/00-intro.Rmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Causal Inference in R: Introduction"
3 | date: "2020-07-29 (updated: `r Sys.Date()`)"
4 | output:
5 | xaringan::moon_reader:
6 | css: ["default", "theme.css"]
7 | lib_dir: libs
8 | nature:
9 | highlightStyle: github
10 | highlightLines: true
11 | countIncrementalSlides: false
12 | ---
13 |
14 | ```{r setup, include=FALSE}
15 | options(htmltools.dir.version = FALSE, tibble.max_extra_cols = 6, tibble.width = 60)
16 | knitr::opts_chunk$set(warning = FALSE, message = FALSE, fig.align = "center", dpi = 320)
17 | ```
18 |
19 | ```{css, echo = FALSE}
20 | img {
21 | height: 250px;
22 | width: 250px;
23 | border-radius: 50%;
24 | object-fit: cover;
25 | }
26 | ```
27 |
28 |
29 |
30 | ## `> who_are_we(c("lucy", "malcolm"))`
31 |
32 | .pull-left[
33 |
34 |
35 | ```{r, echo=FALSE}
36 | knitr::include_graphics("img/ldm.jpg")
37 | ```
38 |
39 | `r fontawesome::fa("globe")` [https://www.lucymcgowan.com/](https://www.lucymcgowan.com/)
40 | ]
41 |
42 | .pull-right[
43 |
44 |
45 | ```{r, echo=FALSE}
46 | knitr::include_graphics("img/mb.jpg")
47 | ```
48 |
49 | `r fontawesome::fa("globe")` [https://www.malco.io/](https://www.malco.io/)
50 |
51 | ]
52 |
53 | ---
54 |
55 | class: center, inverse, middle
56 |
57 | # The three practices of analysis
58 |
59 | 1. Describe
60 | 2. Predict
61 | 3. Explain
62 |
63 | ---
64 | class: middle, center, inverse
65 |
66 | # Normal regression estimates associations. But we want *counterfactual, causal* estimates:
67 |
68 | # What would happen if *everyone* in the study were exposed to x vs. if *no one* were exposed?
69 |
70 |
71 | ---
72 |
73 | class: middle, center, inverse
74 | # For causal inference, we need to make sometimes unverifiable assumptions.
75 |
76 | # Today, we'll focus on the assumption of *no confounding*.
77 |
78 | ---
79 |
80 | class: inverse, middle
81 |
82 | # Tools for causal inference
83 |
84 | 1. Causal diagrams
85 | 1. Propensity score weighting
86 | 1. Propensity score matching
87 |
88 | ---
89 |
90 | class: inverse, middle
91 |
92 | # Other tools for causal inference
93 |
94 | 1. Randomized trials
95 | 1. G-methods & friends
96 | 1. Instrumental variables & friends
97 |
98 | ---
99 |
100 | class: inverse, middle, center
101 |
102 | # Let's head to RStudio Cloud: https://bit.ly/causalcloud
103 |
104 | ---
105 |
106 |
107 | class: inverse
108 |
109 | # Resources
110 | ## [Causal Inference](https://www.hsph.harvard.edu/miguel-hernan/causal-inference-book/): Comprehensive text on causal inference. Free online.
111 | ## [The Book of Why](http://bayes.cs.ucla.edu/WHY/): Detailed, friendly intro to DAGs and causal inference. Free online.
112 | ## [Mastering 'Metrics](http://www.masteringmetrics.com/): Friendly introduction to IV-based methods.
113 |
--------------------------------------------------------------------------------
/slides/00-intro.html:
--------------------------------------------------------------------------------
1 |
2 |
3 |