├── README.md ├── ComparingDiD_files └── figure-gfm │ ├── plot-1.png │ └── plot treatment-1.png ├── ComparingDiD.Rmd └── ComparingDiD.md /README.md: -------------------------------------------------------------------------------- 1 | # did_compare 2 | R-code to compare some staggered did methods 3 | -------------------------------------------------------------------------------- /ComparingDiD_files/figure-gfm/plot-1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/fhollenbach/did_compare/HEAD/ComparingDiD_files/figure-gfm/plot-1.png -------------------------------------------------------------------------------- /ComparingDiD_files/figure-gfm/plot treatment-1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/fhollenbach/did_compare/HEAD/ComparingDiD_files/figure-gfm/plot treatment-1.png -------------------------------------------------------------------------------- /ComparingDiD.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Comparing Staggered DiD" 3 | author: "Florian M. Hollenbach" 4 | date: "`r Sys.Date()`" 5 | output: rmarkdown::github_document 6 | bibliography: "/Users/florianhollenbach/Dropbox (Personal)/Bibtex/fhollenbach_master.bib" 7 | --- 8 | 9 | 10 | # Comparing different staggered Difference-in-Differences Estimators 11 | **Note: this is not supposed to be an evaluation of the different estimators/packages.** 12 | 13 | Let's load packages and set up the ggplot theme, which is stolen from [Andrew Heiss](https://www.andrewheiss.com/). 
14 | 15 | ```{r setup, echo = FALSE} 16 | knitr::opts_chunk$set(echo = TRUE) 17 | library(did) 18 | library(didimputation) 19 | library(PanelMatch) 20 | library(fixest) 21 | library(broom) 22 | library(tidyverse) 23 | library(augsynth) 24 | library(panelView) 25 | library(fect) 26 | library(MetBrewer) 27 | 28 | 29 | theme_clean <- function() { 30 | theme_minimal(base_family = "Barlow Semi Condensed") + 31 | theme(panel.grid.minor = element_blank(), 32 | plot.title = element_text(face = "bold", size=30), 33 | plot.subtitle = element_text(size=25), 34 | axis.title = element_text(family = "Barlow Semi Condensed Medium", face = "bold", size = 20), 35 | axis.text = element_text(family = "Barlow Semi Condensed Medium", size = 15), 36 | strip.text = element_text(family = "Barlow Semi Condensed", 37 | face = "bold", size = rel(1), hjust = 0), 38 | axis.title.y = element_text(angle = 90), 39 | strip.background = element_rect(fill = "grey80", color = NA), 40 | plot.caption = element_text(hjust = 0), 41 | legend.text = element_text(family = "Barlow Semi Condensed Medium", size=15), 42 | legend.title = element_text(family = "Barlow Semi Condensed", size=20)) 43 | } 44 | ``` 45 | 46 | First, we create a simulated data set, with staggered treatments, heterogeneous and dynamic treatment effects. The code is based on the simulations in @baker.2021.how, except we have a never-treated group and decreased the number of units.^[Thanks to Andrew Baker for sharing his code [here](https://github.com/andrewchbaker/DiD_Codes).] 47 | ```{r sim data} 48 | # Data 6 - Multiple Treatment Periods and Dynamic Treatment Effects -------------- 49 | make_data6 <- function(...) 
{ 50 | 51 | # Fixed Effects ------------------------------------------------ 52 | # unit fixed effects 53 | unit <- tibble( 54 | unit = 1:200, 55 | unit_fe = rnorm(200, 0, 0.5), 56 | # generate state 57 | state = sample(1:50, 200, replace = TRUE), 58 | # generate treatment groups 59 | group = case_when( 60 | state %in% 1:10 ~ 1989, 61 | state %in% 11:20 ~ 1998, 62 | state %in% 21:35 ~ NA_real_, ## never treated 63 | state %in% 36:50 ~ 2005 64 | ), 65 | # avg yearly treatment effects by group 66 | hat_gamma = case_when( 67 | is.na(group) ~ 0, ## never treated 68 | group == 1989 ~ .5, 69 | group == 1998 ~ .3, 70 | group == 2005 ~ .1 71 | )) %>% 72 | # generate unit specific yearly treatment effects 73 | rowwise() %>% 74 | mutate(gamma = if_else(is.na(group) == TRUE, 0, rnorm(1, hat_gamma, .2))) %>% 75 | ungroup() 76 | 77 | # year fixed effects 78 | year <- tibble( 79 | year = 1980:2015, 80 | year_fe = rnorm(36, 0, 0.5)) 81 | 82 | # full interaction of unit X year 83 | crossing(unit, year) %>% 84 | # make error term and get treatment indicators and treatment effects 85 | mutate(error = rnorm(nrow(.), 0, 0.5), 86 | treat = ifelse(year >= group & is.na(group)==F, 1, 0), # 0 for ## never treated 87 | tau = ifelse(treat == 1 & is.na(group)==F, gamma, 0)) %>% # 0 for ## never treated 88 | # calculate the dep variable 89 | group_by(unit) %>% 90 | mutate(cumtau = cumsum(tau)) %>% 91 | mutate(dep_var = unit_fe + year_fe + cumtau + error) 92 | } 93 | 94 | # make data 95 | #and treatment group variable for CSA 96 | data <- make_data6() %>% 97 | as_tibble() %>% 98 | mutate(group_CSA = if_else(is.na(group), 0, group), # CSA wants never treated cohort variable to be 0 99 | group = if_else(is.na(group), 10000, group), # never treated cohort variable 10000 for fixest 100 | time_to_treatment = ifelse(group != 10000, year - group, -1000)) # set time to treatment to -1000 for fixest 101 | 102 | ``` 103 | 104 | We can plot the data and treatment status using Licheng Liu and Yiqing Xu's 
awesome `panelView` package [@liu.2021.panelview]. 105 | 106 | ```{r plot treatment} 107 | panelView(dep_var ~ treat, data = data, index = c("unit","year"), xlab = "Year", ylab = "Unit", axis.lab.gap = 5) 108 | ``` 109 | Next, we create the stacked data set, once again following the code by [Andrew Baker](https://github.com/andrewchbaker/DiD_Codes). 110 | 111 | 112 | ```{r data} 113 | ### for stacking 114 | groups <- data %>% 115 | filter(group != 10000) %>% 116 | pull(group) %>% 117 | unique() 118 | 119 | ### create stacked data 120 | getdata <- function(i) { 121 | 122 | #keep what we need 123 | data %>% 124 | # keep cohort i and units first treated after year i + 7 (including never treated) 125 | filter(group == i | group > i + 7) %>% 126 | # keep just event years -7 to 7 127 | filter(year >= i - 7 & year <= i + 7) %>% 128 | # create an indicator for the dataset 129 | mutate(df = i) %>% 130 | mutate(time_to_treatment = year - group) %>% 131 | # make dummies 132 | mutate(time_to_treatment = if_else(group == i, time_to_treatment, 0)) 133 | } 134 | stacked_data <- map_df(groups, getdata) %>% 135 | mutate(bracket_df = paste(state,df)) 136 | ``` 137 | 138 | Now we can move on to estimating the different models. First, the standard two-way fixed effects model with dynamic event time estimates. We estimate the model using the `fixest` package [@berge.2018.efficient] and extract the dynamic event time estimates. 
139 | 140 | ```{r twfe} 141 | twfe <- data %>% 142 | do(broom::tidy(feols(dep_var ~ i(time_to_treatment, ref = c(-1, -1000)) | unit + year, 143 | data = ., cluster = ~state), conf.int = TRUE)) %>% 144 | mutate(t = as.double(str_replace_all(term, c("time_to_treatment::" = "", ":treated" = "")))) %>% 145 | filter(t > -8 & t < 8) %>% 146 | select(t, estimate, conf.low, conf.high) %>% 147 | # add in data for year -1 148 | bind_rows(tibble(t = -1, estimate = 0, 149 | conf.low = 0, conf.high = 0 150 | )) %>% 151 | mutate(method = "TWFE") 152 | ``` 153 | 154 | Next, the same model but on the stacked data. Following @baker.2021.how, we cluster standard errors at the state$\times$dataset interaction (`bracket_df`). 155 | ```{r stacked} 156 | stacked <- stacked_data %>% 157 | # fit the model 158 | do(broom::tidy(feols(dep_var ~ i(time_to_treatment, ref = c(-1, -1000)) | unit^df + year^df, data = ., cluster = "bracket_df"), 159 | conf.int = TRUE)) %>% 160 | mutate(t = as.double(str_replace(term, "time_to_treatment::", ""))) %>% 161 | filter(t > -8 & t < 8) %>% 162 | select(t, estimate, conf.low, conf.high) %>% 163 | # add in data for year -1 164 | bind_rows(tibble(t = -1, estimate = 0, 165 | conf.low = 0, conf.high = 0 166 | )) %>% 167 | mutate(method = "Stacked") 168 | ``` 169 | We continue using the `fixest` package and its `sunab` function to estimate the dynamic effects using the Sun & Abraham method [@sun.2021.estimating]. 
170 | 171 | ```{r Sun & Abraham} 172 | SA <- data %>% 173 | do(broom::tidy(feols(dep_var ~ sunab(group, year) | unit + year, data = ., 174 | cluster = ~ state))) %>% 175 | mutate(t = as.double(str_replace(term, "year::", "")), 176 | conf.low = estimate - (qnorm(0.975)*std.error), 177 | conf.high = estimate + (qnorm(0.975)*std.error)) %>% 178 | filter(t > -8 & t < 8) %>% 179 | select(t, estimate, conf.low, conf.high) %>% 180 | mutate(method = "Sun & Abraham") 181 | ``` 182 | 183 | 184 | The next model to estimate is the doubly-robust estimator developed by @callaway.2021.difference and available in the `did` package [@callaway.2021.did]. We use not-yet-treated units as the control group and cluster standard errors at the level of treatment assignment (state). Note that @callaway.2021.difference use simultaneous inference procedures, which are robust to multiple testing but widen the confidence intervals. 185 | ```{r CSA} 186 | csa.est<- att_gt(yname= 'dep_var', 187 | tname= 'year', 188 | idname = 'unit', 189 | gname = 'group_CSA', 190 | clustervars = 'state', 191 | est_method = 'dr', 192 | control_group = 'notyettreated', 193 | data = data) 194 | 195 | CSA <- aggte(csa.est, type = "dynamic", na.rm = TRUE) %>% 196 | tidy() %>% 197 | rename(t = event.time) %>% 198 | filter(t > -8 & t < 8) %>% 199 | select(t, estimate, conf.low, conf.high) %>% 200 | mutate(method = "CSA") 201 | ``` 202 | 203 | Now we use the `didimputation` package written by Kyle Butts [@butts.2021.didimputation], based on the paper by @borusyak.2021.revisiting. 
204 | 205 | ```{r did impute} 206 | did_imp <- did_imputation(data = data, yname = "dep_var", gname = "group_CSA", 207 | tname = "year", idname = "unit", 208 | horizon=TRUE, pretrends = -10:-1) 209 | coef_imp <- did_imp %>% 210 | select(t = term, estimate, std.error) %>% 211 | mutate( 212 | conf.low = estimate - 1.96 * std.error, 213 | conf.high = estimate + 1.96 * std.error, 214 | t = as.numeric(t) 215 | ) %>% 216 | mutate(method = "DID Imputation") %>% 217 | select(c(t, estimate, conf.low, conf.high, method)) %>% 218 | filter(t > -8 & t < 8) 219 | ``` 220 | 221 | Next, we add the augmented synthetic control estimates for staggered adoption [@ben-michael.2021.augmented] using the `augsynth` package provided by @eli.2021.augsynth. 222 | ```{r augmented synthetic control} 223 | asyn_res <- multisynth(dep_var ~ treat, 224 | unit, 225 | year, 226 | data) 227 | 228 | asyn <- summary(asyn_res)$att %>% 229 | filter(Time > -8 & Time < 8 & (Level == 'Average')) %>% 230 | rename(t = Time, estimate = Estimate, conf.low = lower_bound, conf.high = upper_bound) %>% 231 | mutate(method = "Aug. Synth") %>% 232 | select(c(t, estimate, conf.low, conf.high, method)) 233 | ``` 234 | 235 | We then use the `fect` package by @liu.2021.fect and estimate a counterfactual estimator chosen via cross-validation [@liu.2021.practical]. 
236 | 237 | 238 | ```{r fect} 239 | fect.res <- data %>% 240 | fect(dep_var ~ treat, data = ., 241 | index = c("unit","year"), 242 | method = "both", 243 | CV = TRUE, 244 | se = TRUE, 245 | nboots = 500, 246 | parallel = TRUE, 247 | cv.treat = FALSE) 248 | 249 | fect <- fect.res$est.att %>% 250 | as_tibble() %>% 251 | mutate(t = as.double(rownames(fect.res$est.att))) %>% 252 | filter(t > -8 & t < 8) %>% 253 | mutate(method = "FECT") %>% 254 | rename(estimate = ATT, conf.low = CI.lower, conf.high = CI.upper) %>% 255 | select(c(t, estimate, conf.low, conf.high, method)) 256 | ``` 257 | Lastly, we can use the `PanelMatch` package [@kim.2021.panelmatch] to add the panel match estimator by @imai.2021.matching. 258 | ```{r panelmatch} 259 | PM_est <- PanelMatch(lag = 5, time.id = "year", unit.id = "unit", 260 | treatment = "treat", refinement.method = "none", 261 | data = as.data.frame(data), match.missing = TRUE, 262 | size.match = 5, qoi = "att" , outcome.var = "dep_var", 263 | lead = 0:7, forbid.treatment.reversal = TRUE, 264 | use.diagonal.variance.matrix = TRUE) 265 | PM_est <- PanelEstimate(sets = PM_est, data = as.data.frame(data)) 266 | 267 | PM <- tibble(t = c(0, 1, 2, 3, 4, 5, 6, 7), estimate = summary(PM_est)$summary[, 1], conf.low = summary(PM_est)$summary[, 3], conf.high = summary(PM_est)$summary[, 4]) %>% 268 | select(t, estimate, conf.low, conf.high) %>% 269 | mutate(method = "Panel Match") 270 | ``` 271 | 272 | ```{r plot} 273 | coefs <- bind_rows(twfe, stacked, CSA, SA, coef_imp, asyn, fect, PM) 274 | 275 | plot <- coefs %>% 276 | ggplot(aes(x = t, y = estimate, color = method)) + 277 | geom_point(aes(x = t, y = estimate), position = position_dodge2(width = 0.8), size = 1) + 278 | geom_linerange(aes(x = t, ymin = conf.low, ymax = conf.high), position = position_dodge2(width = 0.8), size = 0.75) + 279 | geom_hline(yintercept = 0, linetype = "dashed", color = "red", size = .25, alpha = 0.75) + 280 | geom_vline(xintercept = -0.5, linetype = "dashed", size = 
.25) + 281 | scale_color_manual(name="Estimation Method", values= met.brewer("Cross", 8, "discrete")) + 282 | theme_clean() + theme(legend.position= 'bottom') + 283 | labs(title = 'Event Time Estimates', y="ATT", x = "Relative Time") + 284 | guides(col = guide_legend(nrow = 3)) 285 | plot 286 | ``` 287 | 288 | 289 | -------------------------------------------------------------------------------- /ComparingDiD.md: -------------------------------------------------------------------------------- 1 | Comparing Staggered DiD 2 | ================ 3 | Florian M. Hollenbach 4 | 2021-12-27 5 | 6 | # Comparing different staggered Difference-in-Differences Estimators 7 | 8 | **Note: this is not supposed to be an evaluation of the different 9 | estimators/packages.** 10 | 11 | Let’s load packages and set up the ggplot theme, which is stolen from 12 | [Andrew Heiss](https://www.andrewheiss.com/). 13 | 14 | ## Loading required package: fixest 15 | 16 | ## ── Attaching packages ────────────────────────────────── tidyverse 1.3.1.9000 ── 17 | 18 | ## ✓ ggplot2 3.3.5 ✓ purrr 0.3.4 19 | ## ✓ tibble 3.1.6 ✓ dplyr 1.0.7 20 | ## ✓ tidyr 1.1.4 ✓ stringr 1.4.0 21 | ## ✓ readr 2.1.1 ✓ forcats 0.5.1 22 | 23 | ## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ── 24 | ## x dplyr::filter() masks stats::filter() 25 | ## x dplyr::lag() masks stats::lag() 26 | 27 | ## ## See bit.ly/panelview4r for more info. 28 | ## ## Report bugs -> yiqingxu@stanford.edu. 29 | 30 | ## Registered S3 method overwritten by 'GGally': 31 | ## method from 32 | ## +.gg ggplot2 33 | 34 | First, we create a simulated data set, with staggered treatments, 35 | heterogeneous and dynamic treatment effects. 
The code is based on the 36 | simulations in Baker, Larcker, and Wang (2021), except we have a 37 | never-treated group and decreased the number of units.[^1] 38 | 39 | ``` r 40 | # Data 6 - Multiple Treatment Periods and Dynamic Treatment Effects -------------- 41 | make_data6 <- function(...) { 42 | 43 | # Fixed Effects ------------------------------------------------ 44 | # unit fixed effects 45 | unit <- tibble( 46 | unit = 1:200, 47 | unit_fe = rnorm(200, 0, 0.5), 48 | # generate state 49 | state = sample(1:50, 200, replace = TRUE), 50 | # generate treatment groups 51 | group = case_when( 52 | state %in% 1:10 ~ 1989, 53 | state %in% 11:20 ~ 1998, 54 | state %in% 21:35 ~ NA_real_, ## never treated 55 | state %in% 36:50 ~ 2005 56 | ), 57 | # avg yearly treatment effects by group 58 | hat_gamma = case_when( 59 | is.na(group) ~ 0, ## never treated 60 | group == 1989 ~ .5, 61 | group == 1998 ~ .3, 62 | group == 2005 ~ .1 63 | )) %>% 64 | # generate unit specific yearly treatment effects 65 | rowwise() %>% 66 | mutate(gamma = if_else(is.na(group) == TRUE, 0, rnorm(1, hat_gamma, .2))) %>% 67 | ungroup() 68 | 69 | # year fixed effects 70 | year <- tibble( 71 | year = 1980:2015, 72 | year_fe = rnorm(36, 0, 0.5)) 73 | 74 | # full interaction of unit X year 75 | crossing(unit, year) %>% 76 | # make error term and get treatment indicators and treatment effects 77 | mutate(error = rnorm(nrow(.), 0, 0.5), 78 | treat = ifelse(year >= group & is.na(group)==F, 1, 0), # 0 for ## never treated 79 | tau = ifelse(treat == 1 & is.na(group)==F, gamma, 0)) %>% # 0 for ## never treated 80 | # calculate the dep variable 81 | group_by(unit) %>% 82 | mutate(cumtau = cumsum(tau)) %>% 83 | mutate(dep_var = unit_fe + year_fe + cumtau + error) 84 | } 85 | 86 | # make data 87 | #and treatment group variable for CSA 88 | data <- make_data6() %>% 89 | as_tibble() %>% 90 | mutate(group_CSA = if_else(is.na(group), 0, group), # CSA wants never treated cohort variable to be 0 91 | group = 
if_else(is.na(group), 10000, group), # never treated cohort variable 10000 for fixest 92 | time_to_treatment = ifelse(group != 10000, year - group, -1000)) # set time to treatment to -1000 for fixest 93 | ``` 94 | 95 | We can plot the data and treatment status using Licheng Liu and Yiqing 96 | Xu’s awesome `panelView` package (Liu and Xu 2021). 97 | 98 | ``` r 99 | panelView(dep_var ~ treat, data = data, index = c("unit","year"), xlab = "Year", ylab = "Unit", axis.lab.gap = 5) 100 | ``` 101 | 102 | ![](ComparingDiD_files/figure-gfm/plot%20treatment-1.png) Next, 103 | we create the stacked data set, once again following the code by [Andrew 104 | Baker](https://github.com/andrewchbaker/DiD_Codes). 105 | 106 | ``` r 107 | ### for stacking 108 | groups <- data %>% 109 | filter(group != 10000) %>% 110 | pull(group) %>% 111 | unique() 112 | 113 | ### create stacked data 114 | getdata <- function(i) { 115 | 116 | #keep what we need 117 | data %>% 118 | # keep cohort i and units first treated after year i + 7 (including never treated) 119 | filter(group == i | group > i + 7) %>% 120 | # keep just event years -7 to 7 121 | filter(year >= i - 7 & year <= i + 7) %>% 122 | # create an indicator for the dataset 123 | mutate(df = i) %>% 124 | mutate(time_to_treatment = year - group) %>% 125 | # make dummies 126 | mutate(time_to_treatment = if_else(group == i, time_to_treatment, 0)) 127 | } 128 | stacked_data <- map_df(groups, getdata) %>% 129 | mutate(bracket_df = paste(state,df)) 130 | ``` 131 | 132 | Now we can move on to estimating the different models. First, the 133 | standard two-way fixed effects model with dynamic event time estimates. 134 | We estimate the model using the `fixest` package (Bergé 2018) and 135 | extract the dynamic event time estimates. 
136 | 137 | ``` r 138 | twfe <- data %>% 139 | do(broom::tidy(feols(dep_var ~ i(time_to_treatment, ref = c(-1, -1000)) | unit + year, 140 | data = ., cluster = ~state), conf.int = TRUE)) %>% 141 | mutate(t = as.double(str_replace_all(term, c("time_to_treatment::" = "", ":treated" = "")))) %>% 142 | filter(t > -8 & t < 8) %>% 143 | select(t, estimate, conf.low, conf.high) %>% 144 | # add in data for year -1 145 | bind_rows(tibble(t = -1, estimate = 0, 146 | conf.low = 0, conf.high = 0 147 | )) %>% 148 | mutate(method = "TWFE") 149 | ``` 150 | 151 | Next, the same model but on the stacked data. Following Baker, Larcker, 152 | and Wang (2021), we cluster standard errors at the state×dataset 153 | interaction (`bracket_df`). 154 | 155 | ``` r 156 | stacked <- stacked_data %>% 157 | # fit the model 158 | do(broom::tidy(feols(dep_var ~ i(time_to_treatment, ref = c(-1, -1000)) | unit^df + year^df, data = ., cluster = "bracket_df"), 159 | conf.int = TRUE)) %>% 160 | mutate(t = as.double(str_replace(term, "time_to_treatment::", ""))) %>% 161 | filter(t > -8 & t < 8) %>% 162 | select(t, estimate, conf.low, conf.high) %>% 163 | # add in data for year -1 164 | bind_rows(tibble(t = -1, estimate = 0, 165 | conf.low = 0, conf.high = 0 166 | )) %>% 167 | mutate(method = "Stacked") 168 | ``` 169 | 170 | We continue using the `fixest` package and its `sunab` function to 171 | estimate the dynamic effects using the Sun & Abraham method (Sun and 172 | Abraham 2021). 
173 | 174 | ``` r 175 | SA <- data %>% 176 | do(broom::tidy(feols(dep_var ~ sunab(group, year) | unit + year, data = ., 177 | cluster = ~ state))) %>% 178 | mutate(t = as.double(str_replace(term, "year::", "")), 179 | conf.low = estimate - (qnorm(0.975)*std.error), 180 | conf.high = estimate + (qnorm(0.975)*std.error)) %>% 181 | filter(t > -8 & t < 8) %>% 182 | select(t, estimate, conf.low, conf.high) %>% 183 | mutate(method = "Sun & Abraham") 184 | ``` 185 | 186 | The next model to estimate is the doubly-robust estimator developed by 187 | Callaway and Sant’Anna (2021b) and available in the `did` package 188 | (Callaway and Sant’Anna 2021a). We use not-yet-treated units as the 189 | control group and cluster standard errors at the level of treatment 190 | assignment (state). Note that Callaway and Sant’Anna (2021b) use 191 | simultaneous inference procedures, which are robust to multiple testing 192 | but widen the confidence intervals. 193 | 194 | ``` r 195 | csa.est<- att_gt(yname= 'dep_var', 196 | tname= 'year', 197 | idname = 'unit', 198 | gname = 'group_CSA', 199 | clustervars = 'state', 200 | est_method = 'dr', 201 | control_group = 'notyettreated', 202 | data = data) 203 | 204 | CSA <- aggte(csa.est, type = "dynamic", na.rm = TRUE) %>% 205 | tidy() %>% 206 | rename(t = event.time) %>% 207 | filter(t > -8 & t < 8) %>% 208 | select(t, estimate, conf.low, conf.high) %>% 209 | mutate(method = "CSA") 210 | ``` 211 | 212 | Now we use the `didimputation` package written by Kyle Butts 213 | (Butts 2021), based on the paper by Borusyak, Jaravel, 214 | and Spiess (2021). 
215 | 216 | ``` r 217 | did_imp <- did_imputation(data = data, yname = "dep_var", gname = "group_CSA", 218 | tname = "year", idname = "unit", 219 | horizon=TRUE, pretrends = -10:-1) 220 | coef_imp <- did_imp %>% 221 | select(t = term, estimate, std.error) %>% 222 | mutate( 223 | conf.low = estimate - 1.96 * std.error, 224 | conf.high = estimate + 1.96 * std.error, 225 | t = as.numeric(t) 226 | ) %>% 227 | mutate(method = "DID Imputation") %>% 228 | select(c(t, estimate, conf.low, conf.high, method)) %>% 229 | filter(t > -8 & t < 8) 230 | ``` 231 | 232 | Next, we add the augmented synthetic control estimates for staggered 233 | adoption (Ben-Michael, Feller, and Rothstein 2021) using the `augsynth` 234 | package provided by Ben-Michael (2021). 235 | 236 | ``` r 237 | asyn_res <- multisynth(dep_var ~ treat, 238 | unit, 239 | year, 240 | data) 241 | 242 | asyn <- summary(asyn_res)$att %>% 243 | filter(Time > -8 & Time < 8 & (Level == 'Average')) %>% 244 | rename(t = Time, estimate = Estimate, conf.low = lower_bound, conf.high = upper_bound) %>% 245 | mutate(method = "Aug. Synth") %>% 246 | select(c(t, estimate, conf.low, conf.high, method)) 247 | ``` 248 | 249 | We then use the `fect` package by Liu et al. (2021) and estimate a 250 | counterfactual estimator chosen via cross-validation (Liu, Wang, and Xu 251 | 2021). 252 | 253 | ``` r 254 | fect.res <- data %>% 255 | fect(dep_var ~ treat, data = ., 256 | index = c("unit","year"), 257 | method = "both", 258 | CV = TRUE, 259 | se = TRUE, 260 | nboots = 500, 261 | parallel = TRUE, 262 | cv.treat = FALSE) 263 | ``` 264 | 265 | ## Parallel computing ... 266 | ## Cross-validating ... 267 | ## Criterion: Mean Squared Prediction Error 268 | ## Interactive fixed effects model... 
269 | ## 270 | ## r = 0; sigma2 = 0.24594; IC = -0.98374; PC = 0.23377; MSPE = 0.25846; GMSPE = 0.06675; Moment = 0.05937; MSPTATT = 0.00123; MSE = 0.23419* 271 | ## r = 1; sigma2 = 0.23805; IC = -0.60102; PC = 0.28198; MSPE = 0.29146; GMSPE = 0.07945; Moment = 0.06798; MSPTATT = 0.00127; MSE = 0.21116 272 | ## r = 2; sigma2 = 0.23098; IC = -0.21938; PC = 0.32776; MSPE = 0.34917; GMSPE = 0.09068; Moment = 0.06388; MSPTATT = 0.00090; MSE = 0.19200 273 | ## r = 3; sigma2 = 0.22354; IC = 0.15610; PC = 0.36970; MSPE = 0.43069; GMSPE = 0.10096; Moment = 0.07597; MSPTATT = 0.00096; MSE = 0.17147 274 | ## r = 4; sigma2 = 0.21703; IC = 0.53122; PC = 0.41001; MSPE = 0.48773; GMSPE = 0.11763; Moment = 0.07404; MSPTATT = 0.00098; MSE = 0.15237 275 | ## r = 5; sigma2 = 0.20927; IC = 0.89591; PC = 0.44468; MSPE = 0.53300; GMSPE = 0.14317; Moment = 0.07045; MSPTATT = 0.00071; MSE = 0.13361 276 | ## 277 | ## r* = 0 278 | ## 279 | ## Matrix completion method... 280 | ## 281 | ## lambda.norm = 1.00000; MSPE = 0.25846; GMSPE = 0.06675; Moment = 0.05937; MSPTATT = 0.00123; MSE = 0.23419* 282 | ## lambda.norm = 0.42170; MSPE = 0.27182; GMSPE = 0.07157; Moment = 0.06584; MSPTATT = 0.00038; MSE = 0.07751 283 | ## lambda.norm = 0.17783; MSPE = 0.28195; GMSPE = 0.07701; Moment = 0.06617; MSPTATT = 0.00007; MSE = 0.01424 284 | ## lambda.norm = 0.07499; MSPE = 0.27824; GMSPE = 0.07668; Moment = 0.06540; MSPTATT = 0.00001; MSE = 0.00254 285 | ## 286 | ## lambda.norm* = 1 287 | ## 288 | ## 289 | ## 290 | ## Recommended method through cross-validation: ife 291 | ## 292 | ## Bootstrapping for uncertainties ... 500 runs 293 | ## Cannot use full pre-treatment periods. The first period is removed. 294 | ## Call: 295 | ## fect.formula(formula = dep_var ~ treat, data = ., index = c("unit", 296 | ## "year"), CV = TRUE, cv.treat = FALSE, method = "both", se = TRUE, 297 | ## nboots = 500, parallel = TRUE) 298 | ## 299 | ## ATT: 300 | ## ATT S.E. 
CI.lower CI.upper p.value 301 | ## Tr obs equally weighted 4.214 0.3575 3.514 4.915 0 302 | ## Tr units equally weighted 3.180 0.3016 2.588 3.771 0 303 | 304 | ``` r 305 | fect <- fect.res$est.att %>% 306 | as_tibble() %>% 307 | mutate(t = as.double(rownames(fect.res$est.att))) %>% 308 | filter(t > -8 & t < 8) %>% 309 | mutate(method = "FECT") %>% 310 | rename(estimate = ATT, conf.low = CI.lower, conf.high = CI.upper) %>% 311 | select(c(t, estimate, conf.low, conf.high, method)) 312 | ``` 313 | 314 | Lastly, we can use the `PanelMatch` package (Kim et al. 2021) to add the 315 | panel match estimator by Imai, Kim, and Wang (Forthcoming). 316 | 317 | ``` r 318 | PM_est <- PanelMatch(lag = 5, time.id = "year", unit.id = "unit", 319 | treatment = "treat", refinement.method = "none", 320 | data = as.data.frame(data), match.missing = TRUE, 321 | size.match = 5, qoi = "att" , outcome.var = "dep_var", 322 | lead = 0:7, forbid.treatment.reversal = TRUE, 323 | use.diagonal.variance.matrix = TRUE) 324 | PM_est <- PanelEstimate(sets = PM_est, data = as.data.frame(data)) 325 | 326 | PM <- tibble(t = c(0, 1, 2, 3, 4, 5, 6, 7), estimate = summary(PM_est)$summary[, 1], conf.low = summary(PM_est)$summary[, 3], conf.high = summary(PM_est)$summary[, 4]) %>% 327 | select(t, estimate, conf.low, conf.high) %>% 328 | mutate(method = "Panel Match") 329 | ``` 330 | 331 | ## Matches created with 5 lags 332 | ## 333 | ## Standard errors computed with 1000 Weighted bootstrap samples 334 | ## 335 | ## Estimate of Average Treatment Effect on the Treated (ATT) by Period: 336 | ## Matches created with 5 lags 337 | ## 338 | ## Standard errors computed with 1000 Weighted bootstrap samples 339 | ## 340 | ## Estimate of Average Treatment Effect on the Treated (ATT) by Period: 341 | ## Matches created with 5 lags 342 | ## 343 | ## Standard errors computed with 1000 Weighted bootstrap samples 344 | ## 345 | ## Estimate of Average Treatment Effect on the Treated (ATT) by Period: 346 | 347 | ``` r 348 | 
coefs <- bind_rows(twfe, stacked, CSA, SA, coef_imp, asyn, fect, PM) 349 | 350 | plot <- coefs %>% 351 | ggplot(aes(x = t, y = estimate, color = method)) + 352 | geom_point(aes(x = t, y = estimate), position = position_dodge2(width = 0.8), size = 1) + 353 | geom_linerange(aes(x = t, ymin = conf.low, ymax = conf.high), position = position_dodge2(width = 0.8), size = 0.75) + 354 | geom_hline(yintercept = 0, linetype = "dashed", color = "red", size = .25, alpha = 0.75) + 355 | geom_vline(xintercept = -0.5, linetype = "dashed", size = .25) + 356 | scale_color_manual(name="Estimation Method", values= met.brewer("Cross", 8, "discrete")) + 357 | theme_clean() + theme(legend.position= 'bottom') + 358 | labs(title = 'Event Time Estimates', y="ATT", x = "Relative Time") + 359 | guides(col = guide_legend(nrow = 3)) 360 | plot 361 | ``` 362 | 363 | ![](ComparingDiD_files/figure-gfm/plot-1.png) 364 | 365 |
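As a rough benchmark for reading the plot, the expected effect at each event time follows from the simulation design: a treated unit gains `gamma` in every treated year, so its cumulative effect at event time `k >= 0` is `(k + 1) * gamma`. The sketch below is not part of the original analysis. The helper `true_att` is hypothetical, and its cohort weights are an assumption: they use the expected share of treated units per cohort implied by the 10/10/15 split of treated states, whereas the realized shares and the drawn unit-level `gamma` values vary from run to run.

```r
# Hypothetical helper (not in the original code): expected ATT at event time k
# implied by make_data6(), where cumtau = (k + 1) * gamma for k >= 0.
true_att <- function(k,
                     hat_gamma = c(`1989` = 0.5, `1998` = 0.3, `2005` = 0.1),
                     n_states  = c(`1989` = 10,  `1998` = 10,  `2005` = 15)) {
  # assumed cohort weights: expected share of treated units in each cohort
  w <- n_states / sum(n_states)
  avg_gamma <- sum(w * hat_gamma)
  # zero before treatment, linear growth from event time 0 onwards
  ifelse(k >= 0, (k + 1) * avg_gamma, 0)
}

# benchmark for the event-time window shown in the plot
benchmark <- data.frame(t = -7:7, true_att = true_att(-7:7))
benchmark
```

The benchmark only holds for event times at which all three cohorts are observed (up to 10 here), and the estimators weight cohorts differently, so it is a guide for eyeballing the plot and not a target that every method should hit exactly.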
366 | 367 |
368 | 369 | Baker, Andrew, David F. Larcker, and Charles C. Y Wang. 2021. “How Much 370 | Should We Trust Staggered Difference-in-Differences Estimates?” 371 | . 372 | 373 |
374 | 375 |
376 | 377 | Ben-Michael, Eli. 2021. *Augsynth: The Augmented Synthetic Control 378 | Method*. 379 | 380 |
381 | 382 |
383 | 384 | Ben-Michael, Eli, Avi Feller, and Jesse Rothstein. 2021. “The Augmented 385 | Synthetic Control Method.” *Journal of the American Statistical 386 | Association* 116 (536): 1789–1803. 387 | . 388 | 389 |
390 | 391 |
392 | 393 | Bergé, Laurent. 2018. “Efficient Estimation of Maximum Likelihood Models 394 | with Multiple Fixed-Effects: The R Package FENmlm.” *CREA Discussion 395 | Papers*, no. 13. 396 | 397 |
398 | 399 |
400 | 401 | Borusyak, Kirill, Xavier Jaravel, and Jann Spiess. 2021. “Revisiting 402 | Event Study Designs: Robust and Efficient Estimation.” 403 | . 404 | 405 |
406 | 407 |
408 | 409 | Callaway, Brantly, and Pedro H. C. Sant’Anna. 2021a. “Did: Difference in 410 | Differences.” . 411 | 412 |
413 | 414 |
415 | 416 | ———. 2021b. “Difference-in-Differences with Multiple Time Periods.” 417 | *Journal of Econometrics*. 418 | . 419 | 420 |
421 | 422 |
423 | 424 | Imai, Kosuke, In Song Kim, and Erik Wang. Forthcoming. “Matching Methods 425 | for Causal Inference with Time-Series Cross-Sectional Data.” *American 426 | Journal of Political Science*, Forthcoming. 427 | <https://imai.fas.harvard.edu/research/tscs.html>. 428 | 429 |
430 | 431 |
432 | 433 | Kim, In Song, Adam Rauh, Erik Wang, and Kosuke Imai. 2021. *PanelMatch: 434 | Matching Methods for Causal Inference with Time-Series Cross-Sectional 435 | Data*. . 436 | 437 |
438 | 439 |
440 | 441 | Liu, Licheng, Ye Wang, and Yiqing Xu. 2021. “A Practical Guide to 442 | Counterfactual Estimators for Causal Inference with Time-Series 443 | Cross-Sectional Data.” . 444 | 445 |
446 | 447 |
448 | 449 | Liu, Licheng, Ye Wang, Yiqing Xu, and Ziyi Liu. 2021. *Fect: Fixed 450 | Effects Counterfactuals*. 451 | . 452 | 453 |
454 | 455 |
456 | 457 | Liu, Licheng, and Yiqing Xu. 2021. *panelView: Visualizing Panel Data*. 458 | . 459 | 460 |
461 | 462 |
463 | 464 | Sun, Liyang, and Sarah Abraham. 2021. “Estimating Dynamic Treatment 465 | Effects in Event Studies with Heterogeneous Treatment Effects.” *Journal 466 | of Econometrics* 225: 175–99. 467 | https://doi.org/. 468 | 469 |
470 | 471 |
472 | 473 | [^1]: Thanks to Andrew Baker for sharing his code 474 | [here](https://github.com/andrewchbaker/DiD_Codes). 475 | --------------------------------------------------------------------------------