├── README.md
├── ComparingDiD_files
│   └── figure-gfm
│       ├── plot-1.png
│       └── plot treatment-1.png
├── ComparingDiD.Rmd
└── ComparingDiD.md


/README.md:
--------------------------------------------------------------------------------
1 | # did_compare
2 | R code comparing several staggered difference-in-differences (DiD) estimators on simulated data
3 | 


--------------------------------------------------------------------------------
/ComparingDiD_files/figure-gfm/plot-1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fhollenbach/did_compare/HEAD/ComparingDiD_files/figure-gfm/plot-1.png


--------------------------------------------------------------------------------
/ComparingDiD_files/figure-gfm/plot treatment-1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fhollenbach/did_compare/HEAD/ComparingDiD_files/figure-gfm/plot treatment-1.png


--------------------------------------------------------------------------------
/ComparingDiD.Rmd:
--------------------------------------------------------------------------------
  1 | ---
  2 | title: "Comparing Staggered DiD"
  3 | author: "Florian M. Hollenbach"
  4 | date:  "`r Sys.Date()`"
  5 | output: rmarkdown::github_document
  6 | bibliography: "/Users/florianhollenbach/Dropbox (Personal)/Bibtex/fhollenbach_master.bib"
  7 | ---
  8 | 
  9 | 
 10 | # Comparing different staggered Difference-in-Differences Estimators
 11 | **Note: this is not meant as an evaluation of the different estimators or packages; it only shows how to run each of them on the same simulated data.**
 12 | 
 13 | Let's load packages and set up the ggplot theme, which is stolen from [Andrew Heiss](https://www.andrewheiss.com/).
 14 | 
 15 | ```{r setup, echo = FALSE, message = FALSE}
 16 | knitr::opts_chunk$set(echo = TRUE)
 17 | library(did)
 18 | library(didimputation)
 19 | library(PanelMatch)
 20 | library(fixest)
 21 | library(broom)
 22 | library(tidyverse)
 23 | library(augsynth)
 24 | library(panelView)
 25 | library(fect)
 26 | library(MetBrewer)
 27 | 
 28 | 
 29 | theme_clean <- function() {
 30 |   theme_minimal(base_family = "Barlow Semi Condensed") +
 31 |     theme(panel.grid.minor = element_blank(),
 32 |           plot.title = element_text(face = "bold", size=30),
 33 |           plot.subtitle = element_text(size=25),
 34 |           axis.title = element_text(family = "Barlow Semi Condensed Medium", face = "bold", size = 20),
 35 |           axis.text = element_text(family = "Barlow Semi Condensed Medium", size = 15),
 36 |           strip.text = element_text(family = "Barlow Semi Condensed",
 37 |                                     face = "bold", size = rel(1), hjust = 0),
 38 |           axis.title.y = element_text(angle = 90),
 39 |           strip.background = element_rect(fill = "grey80", color = NA),
 40 |           plot.caption = element_text(hjust = 0),
 41 |           legend.text = element_text(family = "Barlow Semi Condensed Medium", size=15), 
 42 |           legend.title = element_text(family = "Barlow Semi Condensed", size=20))
 43 | }
 44 | ```
 45 | 
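 46 | First, we create a simulated data set with staggered treatment timing and heterogeneous, dynamic treatment effects: 200 units in 50 states observed from 1980 to 2015, with treatment cohorts in 1989, 1998, and 2005 plus a never-treated group (states 21 to 35). Each treated unit draws a yearly effect around its cohort mean (0.5, 0.3, or 0.1), and these effects accumulate over time. The code is based on the simulations in @baker.2021.how, except that we add a never-treated group and reduce the number of units.^[Thanks to Andrew Baker for sharing his code [here](https://github.com/andrewchbaker/DiD_Codes).]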
 47 | ```{r sim data}
 48 | # Data 6 - Multiple Treatment Periods and Dynamic Treatment Effects --------------
 49 | make_data6 <- function(...) {
 50 |   
 51 |   # Fixed Effects ------------------------------------------------
 52 |   # unit fixed effects
 53 |   unit <- tibble(
 54 |     unit = 1:200, 
 55 |     unit_fe = rnorm(200, 0, 0.5),
 56 |     # generate state
 57 |     state = sample(1:50, 200, replace = TRUE),
 58 |     # generate treatment groups
 59 |     group = case_when(
 60 |       state %in% 1:10 ~ 1989,
 61 |       state %in% 11:20 ~ 1998,
 62 |       state %in% 21:35 ~ NA_real_, ## never treated
 63 |       state %in% 36:50 ~ 2005
 64 |     ),
 65 |     # avg yearly treatment effects by group
 66 |     hat_gamma = case_when(
 67 |       is.na(group) ~ 0, ## never treated
 68 |       group == 1989 ~ .5,
 69 |       group == 1998 ~ .3,
 70 |       group == 2005 ~ .1
 71 |     )) %>%
 72 |     # generate unit specific yearly treatment effects 
 73 |     rowwise() %>% 
 74 |     mutate(gamma = if_else(is.na(group) == TRUE, 0, rnorm(1, hat_gamma, .2))) %>% 
 75 |     ungroup()
 76 |   
 77 |   # year fixed effects 
 78 |   year <- tibble(
 79 |     year = 1980:2015,
 80 |     year_fe = rnorm(36, 0, 0.5))
 81 |   
 82 |   # full interaction of unit X year 
 83 |   crossing(unit, year) %>% 
 84 |     # make error term and get treatment indicators and treatment effects
 85 |     mutate(error = rnorm(nrow(.), 0, 0.5),
 86 |            treat = ifelse(!is.na(group) & year >= group, 1, 0), # 0 for never-treated units
 87 |            tau = ifelse(treat == 1, gamma, 0)) %>% # yearly increment; 0 before treatment and for never-treated units
 88 |     # calculate the dep variable
 89 |     group_by(unit) %>% 
 90 |     mutate(cumtau = cumsum(tau)) %>% 
 91 |     mutate(dep_var = unit_fe + year_fe + cumtau + error)
 92 | }
 93 | 
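 94 | # make data and recode the cohort variable for the different packages
 95 | # (never treated: 0 for did/didimputation, 10000 and time_to_treatment = -1000 for fixest)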
 96 | data <- make_data6() %>% 
 97 |   as_tibble() %>%
 98 |   mutate(group_CSA = if_else(is.na(group), 0, group), # CSA wants never treated cohort variable to be 0
 99 |          group = if_else(is.na(group), 10000, group), # never treated cohort variable 10000 for fixest
100 |          time_to_treatment = ifelse(group != 10000, year - group, -1000)) # set time to treatment to -1000 for fixest
101 |          
102 | ```
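To see what "dynamic" means in this simulation, the base-R sketch below repeats the `treat`, `tau`, and `cumsum()` steps of `make_data6()` for one illustrative unit treated in 1989 with a yearly effect of 0.5. The variable names mirror the simulation code; the specific values are assumptions chosen for readability, not draws from the simulation.

```r
# Illustrative unit (assumed values, not a simulated draw):
# treated in 1989 with a constant yearly effect of 0.5
years <- 1986:1993
group <- 1989                          # treatment year
gamma <- 0.5                           # yearly treatment effect

treat  <- as.numeric(years >= group)   # 0 before 1989, 1 from 1989 on
tau    <- ifelse(treat == 1, gamma, 0) # per-year increment, as in make_data6()
cumtau <- cumsum(tau)                  # cumulative effect that enters dep_var

print(data.frame(year = years, event_time = years - group, treat, tau, cumtau))
```

Because the yearly effect accumulates, the true effect at event time e ≥ 0 is γ(e + 1): 0.5 at e = 0, rising to 4 at e = 7 for a unit like this one. Combined with different cohort means, this growth is the effect heterogeneity the estimators below have to handle.
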
103 | 
104 | We can plot the data and treatment status using Licheng Liu and Yiqing Xu's awesome `panelView` package [@liu.2021.panelview]. 
105 | 
106 | ```{r plot treatment}
107 | panelView(dep_var ~ treat, data = data, index = c("unit","year"), xlab = "Year", ylab = "Unit", axis.lab.gap = 5)
108 | ```
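109 | 
110 | Next, we create the stacked data set, once again following the code by [Andrew Baker](https://github.com/andrewchbaker/DiD_Codes). For each treatment cohort, we keep that cohort plus all clean controls (units not treated until more than seven years later, including never-treated units), restrict the data to seven years before and after treatment, and then append the cohort-specific data sets.
111 | 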
112 | ```{r data}
113 | ### for stacking
114 | groups <- data %>% 
115 |   filter(group != 10000) %>% 
116 |   pull(group) %>% 
117 |   unique()
118 | 
119 | ### create stacked data
120 | getdata <- function(i) {
121 |   
122 |   #keep what we need
123 |   data %>% 
124 |     # keep cohort i and clean controls (not treated until more than 7 years after i, incl. never treated)
125 |     filter(group == i | group > i + 7) %>% 
126 |     # keep the event window: 7 years before to 7 years after i
127 |     filter(year >= i - 7 & year <= i + 7) %>%
128 |     # create an indicator for the dataset
129 |     mutate(df = i) %>% 
130 |     mutate(time_to_treatment = year - group) %>% 
131 |     # clean controls get the reference value -1000, so they load on no event-time dummy
132 |     mutate(time_to_treatment = if_else(group == i, time_to_treatment, -1000))
133 | }
134 | stacked_data <- map_df(groups, getdata) %>% 
135 |   mutate(bracket_df = paste(state,df))
136 | ```
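The cohort filter in `getdata()` is easiest to see on the cohort codes alone. The base-R sketch below applies the same rule, `group == i | group > i + 7`, to the three treatment cohorts and the never-treated code 10000; `stack_members()` is a hypothetical helper written for illustration, not part of any package.

```r
# Cohort codes as in `data`: three treatment years, 10000 = never treated
cohorts <- c(1989, 1998, 2005, 10000)
window  <- 7

# Hypothetical helper: cohorts kept in the stack for treatment cohort i
stack_members <- function(i, cohorts, window = 7) {
  keep <- cohorts == i | cohorts > i + window  # cohort i plus clean controls
  cohorts[keep]
}

treated_cohorts <- setdiff(cohorts, 10000)
members <- lapply(treated_cohorts, stack_members, cohorts = cohorts, window = window)
names(members) <- treated_cohorts
print(members)
```

Only the 1989 stack uses later-treated cohorts as controls. The 1998 stack drops the 2005 cohort because 2005 is not more than seven years after 1998, so the 1998 and 2005 stacks rely on never-treated units alone.
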
137 | 
138 | Now we can move on to estimating the different models. First, the standard two-way fixed effects (TWFE) event-study model with unit and year fixed effects. We estimate it with the `fixest` package [@berge.2018.efficient], using relative period -1 as the reference and coding never-treated units as -1000 so that they load on no event-time dummy, and cluster standard errors by state.
139 | 
140 | ```{r twfe}
141 | twfe <- data %>% 
142 |   do(broom::tidy(feols(dep_var ~ i(time_to_treatment, ref = c(-1, -1000)) | unit + year, 
143 |                        data = ., cluster = ~state), conf.int = TRUE)) %>% 
144 |   mutate(t =  as.double(str_replace_all(term, c("time_to_treatment::" = "", ":treated" = "")))) %>% 
145 |   filter(t > -8 & t < 8) %>% 
146 |   select(t, estimate, conf.low, conf.high) %>% 
147 |   # add in data for year -1
148 |   bind_rows(tibble(t = -1, estimate = 0, 
149 |                    conf.low = 0, conf.high = 0
150 |   )) %>% 
151 |   mutate(method = "TWFE")
152 | ```
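Conceptually, `i(time_to_treatment, ref = c(-1, -1000))` adds one indicator per relative period and omits the listed reference values. The base-R sketch below builds such indicators for a few toy values to show that rows at period -1 and never-treated rows coded -1000 get zeros everywhere; fixest constructs these columns internally, and the `rel_` column names here are hypothetical.

```r
# Toy relative periods: treated observations plus two never-treated rows (-1000)
time_to_treatment <- c(-3, -2, -1, 0, 1, -1000, -1000)
ref <- c(-1, -1000)                     # omitted categories, as in the feols() call

# one 0/1 column per relative period that is not a reference value
levels_kept <- sort(setdiff(unique(time_to_treatment), ref))
dummies <- sapply(levels_kept, function(k) as.integer(time_to_treatment == k))
colnames(dummies) <- paste0("rel_", levels_kept)

print(cbind(time_to_treatment, dummies))
```

Never-treated units therefore enter only through the unit and year fixed effects, and each event-time coefficient is measured relative to period -1.
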
153 | 
154 | Next, we estimate the same model on the stacked data, with unit-by-dataset and year-by-dataset fixed effects. Following @baker.2021.how, we cluster standard errors at the state$\times$dataset level (`bracket_df`).
155 | ```{r stacked}
156 | stacked <- stacked_data %>% 
157 |   # fit the model
158 |   do(broom::tidy(feols(dep_var ~ i(time_to_treatment, ref = c(-1, -1000)) | unit^df + year^df, data = ., cluster = "bracket_df"),
159 |                  conf.int = TRUE)) %>% 
160 |   mutate(t =  as.double(str_replace(term, "time_to_treatment::", ""))) %>% 
161 |   filter(t > -8 & t < 8) %>% 
162 |   select(t, estimate, conf.low, conf.high) %>% 
163 |   # add in data for year -1
164 |   bind_rows(tibble(t = -1, estimate = 0, 
165 |                    conf.low = 0, conf.high = 0
166 |   )) %>% 
167 |   mutate(method = "Stacked")
168 | ```
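Within a single stack there is one treated cohort and only clean controls, so the stacked regression is a collection of simple event studies. The base-R sketch below uses small, noise-free, hypothetical data, with `lm()` on unit and year factors standing in for `feols()` with `unit^df + year^df`, to check that such a regression recovers the true dynamic effects exactly when no already-treated units serve as controls.

```r
# One hypothetical stack: units 1-3 treated in year 5, units 4-6 clean controls
set.seed(42)
g <- 5
d <- expand.grid(unit = 1:6, year = 1:8)
d$treated <- d$unit <= 3
d$e <- ifelse(d$treated, d$year - g, NA)          # event time, NA for controls

true_effect <- function(e) ifelse(!is.na(e) & e >= 0, 0.5 * (e + 1), 0)
unit_fe <- rnorm(6)
year_fe <- rnorm(8)
d$y <- unit_fe[d$unit] + year_fe[d$year] + true_effect(d$e)   # no noise

# event-time dummies except the reference e = -1; controls are 0 in all of them
ev <- setdiff(sort(unique(d$e[!is.na(d$e)])), -1)
ev_names <- ifelse(ev < 0, paste0("lead", -ev), paste0("lag", ev))
for (k in seq_along(ev)) {
  d[[ev_names[k]]] <- as.numeric(!is.na(d$e) & d$e == ev[k])
}

fml <- reformulate(c("factor(unit)", "factor(year)", ev_names), response = "y")
fit <- lm(fml, data = d)
est <- coef(fit)[ev_names]
print(round(cbind(estimate = est, truth = true_effect(ev)), 6))
```

With noise added, the estimates would scatter around the truth; the point of the sketch is that, inside a stack, every comparison is between the treated cohort and units that are still untreated.
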
169 | We continue with the `fixest` package and its `sunab` function, which implements the interaction-weighted estimator of Sun & Abraham [@sun.2021.estimating]: it estimates cohort-specific event-time effects and then averages them across cohorts.
170 | 
171 | ```{r Sun & Abraham}
172 | SA <- data %>% 
173 |   do(broom::tidy(feols(dep_var ~ sunab(group, year) | unit + year, data = .,
174 |                  cluster = ~ state))) %>% 
175 |   mutate(t =  as.double(str_replace(term, "year::", "")),
176 |          conf.low = estimate - (qnorm(0.975)*std.error),
177 |          conf.high = estimate + (qnorm(0.975)*std.error)) %>% 
178 |   filter(t > -8 & t < 8) %>% 
179 |   select(t, estimate, conf.low, conf.high) %>% 
180 |   mutate(method = "Sun & Abraham")
181 | ```
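The aggregation step of the interaction-weighted estimator can be sketched in a few lines of base R. The cohort sizes and cohort-specific effects below are made up, and `iw_weights()` is a hypothetical helper; the point is that each relative period averages only over cohorts actually observed at that period in the 1980 to 2015 panel, weighted by their size.

```r
# Panel range as in the simulation; cohort sizes and effects are hypothetical
first_year <- 1980; last_year <- 2015
cohort  <- c(1989, 1998, 2005)
size    <- c(40, 40, 60)               # units per cohort (made up)
catt_e0 <- c(0.5, 0.3, 0.1)            # cohort-specific effects at e = 0 (made up)

# Hypothetical helper: cohort shares among cohorts observed at relative period e
iw_weights <- function(e, cohort, size, first_year, last_year) {
  observed <- cohort + e >= first_year & cohort + e <= last_year
  w <- ifelse(observed, size, 0)
  w / sum(w)
}

w0    <- iw_weights(0, cohort, size, first_year, last_year)
iw_e0 <- sum(w0 * catt_e0)             # interaction-weighted effect at e = 0
w11   <- iw_weights(11, cohort, size, first_year, last_year)

print(rbind(w0, w11))
print(iw_e0)
```

At e = 11 the 2005 cohort drops out (2016 lies outside the panel), so the remaining two cohorts share the weight.
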
182 | 
183 | 
184 | The next model is the doubly robust estimator developed by @callaway.2021.difference and available in the `did` package [@callaway.2021.did]. It first estimates group-time average treatment effects and then aggregates them by event time. We use not-yet-treated units as the control group (`control_group = 'not-yet-treated'`) and cluster standard errors at the level of treatment assignment (state). Note that @callaway.2021.difference use simultaneous inference procedures (uniform confidence bands), which are robust to multiple testing but yield wider confidence intervals than pointwise inference.
185 | ```{r CSA}
186 | csa.est<- att_gt(yname= 'dep_var',
187 |              tname= 'year',
188 |              idname = 'unit',
189 |              gname = 'group_CSA',
190 |              clustervars = 'state',
191 |              est_method = 'dr',
192 |              control_group = 'not-yet-treated',
193 |              data = data) 
194 | 
195 | CSA <- aggte(csa.est, type = "dynamic", na.rm = TRUE) %>% 
196 |   tidy() %>% 
197 |   rename(t = event.time) %>% 
198 |   filter(t > -8 & t < 8) %>% 
199 |   select(t, estimate, conf.low, conf.high) %>% 
200 |   mutate(method = "CSA")
201 | ```
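Without covariates, each group-time effect reduces to a 2×2 comparison. The base-R sketch below computes a few of them by hand on small, noise-free, hypothetical data, with cohorts coded as in `group_CSA` (0 for never treated). `att_gt_2x2()` is a hypothetical helper that only covers post-treatment periods; it is not the `did` API, and `att_gt()` adds the doubly robust estimation, pre-treatment periods, and inference on top of this logic.

```r
set.seed(7)
d <- expand.grid(unit = 1:9, year = 1:6)
cohort_of <- c(3, 3, 3, 5, 5, 5, 0, 0, 0)   # first treatment year; 0 = never treated
d$g <- cohort_of[d$unit]
d$e <- ifelse(d$g > 0, d$year - d$g, NA)    # event time, NA for never treated
effect <- ifelse(!is.na(d$e) & d$e >= 0, 0.4 * (d$e + 1), 0)  # growing effect
d$y <- rnorm(9)[d$unit] + rnorm(6)[d$year] + effect          # unit FE + year FE + effect

# Hypothetical helper: ATT(g, t) for t >= g as a 2x2 DiD with base period g - 1
att_gt_2x2 <- function(d, g, t) {
  base <- g - 1
  y_at <- function(yr) tapply(d$y[d$year == yr], d$unit[d$year == yr], mean)
  dy <- y_at(t) - y_at(base)                # per-unit change, ordered by unit
  unit_g <- tapply(d$g, d$unit, max)        # cohort of each unit (constant within unit)
  treated  <- unit_g == g
  controls <- unit_g == 0 | unit_g > t      # never treated or not yet treated at t
  mean(dy[treated]) - mean(dy[controls])
}

att <- c(att_3_4 = att_gt_2x2(d, 3, 4),
         att_3_5 = att_gt_2x2(d, 3, 5),
         att_5_6 = att_gt_2x2(d, 5, 6))
print(att)
# equal cohort sizes, so the event-time-1 average weights both cohorts equally
att_e1 <- mean(att[c("att_3_4", "att_5_6")])
```

`aggte(type = "dynamic")` then averages ATT(g, g + e) across cohorts at each event time e, weighting by cohort size; with equal cohort sizes here, the e = 1 effect is simply the mean of ATT(3, 4) and ATT(5, 6).
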
202 | 
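203 | Now we use the `didimputation` package written by Kyle Butts [@butts.2021.didimputation], which implements the imputation estimator of @borusyak.2021.revisiting: unit and year fixed effects are estimated on untreated observations only and then used to impute the untreated potential outcomes of treated observations.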
204 | 
205 | ```{r did impute}
206 | did_imp <- did_imputation(data = data, yname = "dep_var", gname = "group_CSA",
207 |                           tname = "year", idname = "unit", 
208 |                           horizon=TRUE, pretrends = -10:-1) 
209 | coef_imp <- did_imp %>% 
210 |   select(t = term, estimate, std.error) %>%
211 |   mutate(
212 |     conf.low = estimate - 1.96 * std.error,
213 |     conf.high = estimate + 1.96 * std.error,
214 |     t = as.numeric(t)
215 |   ) %>%
216 |   mutate(method = "DID Imputation") %>% 
217 |   select(c(t, estimate, conf.low, conf.high, method)) %>% 
218 |   filter(t > -8 & t < 8)
219 | ```
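The three steps of the imputation estimator can be written directly in base R. The sketch below uses small, noise-free, hypothetical data (cohort coded 0 for never treated, as in `group_CSA`) and weights every treated observation equally within an event time; `did_imputation()` adds standard errors and pre-trend estimates on top of this logic.

```r
set.seed(11)
d <- expand.grid(unit = 1:9, year = 1:6)
cohort_of <- c(3, 3, 3, 5, 5, 5, 0, 0, 0)   # first treatment year; 0 = never treated
d$g <- cohort_of[d$unit]
d$treat <- as.numeric(d$g > 0 & d$year >= d$g)
d$e <- ifelse(d$g > 0, d$year - d$g, NA)
d$y <- rnorm(9)[d$unit] + rnorm(6)[d$year] +
  ifelse(d$treat == 1, 0.4 * (d$e + 1), 0)  # growing effect, no noise
d$unit_f <- factor(d$unit)
d$year_f <- factor(d$year)

# (1) fit unit and year fixed effects on untreated observations only
fit0 <- lm(y ~ unit_f + year_f, data = d[d$treat == 0, ])
# (2) impute Y(0) for treated observations and take the difference
tr <- d[d$treat == 1, ]
tr$tau_hat <- tr$y - predict(fit0, newdata = tr)
# (3) average the unit-level effects by event time
by_e <- tapply(tr$tau_hat, tr$e, mean)
print(by_e)
```

Every unit has untreated years and every year has never-treated units here, so all fixed effects needed for the predictions are estimable from untreated data.
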
220 | 
221 | Next, we add augmented synthetic control estimates [@ben-michael.2021.augmented], using the partially pooled `multisynth` estimator for staggered adoption from the `augsynth` package provided by @eli.2021.augsynth.
222 | ```{r augmented synthetic control}
223 | asyn_res <- multisynth(dep_var ~ treat,
224 |                    unit, 
225 |                    year, 
226 |                    data)
227 | 
228 | asyn <- summary(asyn_res)$att %>% 
229 |   filter(Time > -8 & Time < 8 & (Level == 'Average')) %>%
230 |   rename(t = Time, estimate = Estimate, conf.low = lower_bound, conf.high = upper_bound) %>% 
231 |   mutate(method = "Aug. Synth") %>% 
232 |   select(c(t, estimate, conf.low, conf.high, method))
233 | ```
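`multisynth()` fits a partially pooled, augmented version of synthetic control. The base-R sketch below shows only the classic building block, under simplifying assumptions: one treated unit, two hypothetical donors, and a grid search over the donor weight instead of the constrained optimization that `augsynth` actually performs. All series are made up.

```r
# Hypothetical series: years 1-6 are pre-treatment, 7-9 post-treatment
pre <- 1:6; post <- 7:9
donor1 <- c(1.0, 1.2, 1.1, 1.5, 1.7, 1.6, 1.9, 2.0, 2.2)
donor2 <- c(3.0, 2.8, 3.1, 3.3, 3.2, 3.6, 3.5, 3.9, 4.0)
true_w <- 0.3
treated <- true_w * donor1 + (1 - true_w) * donor2 + c(rep(0, 6), 0.5, 1.0, 1.5)

# weights are non-negative and sum to one: w on donor1, 1 - w on donor2
grid <- seq(0, 1, by = 0.01)
pre_mse <- sapply(grid, function(w)
  mean((treated[pre] - (w * donor1[pre] + (1 - w) * donor2[pre]))^2))
w_hat <- grid[which.min(pre_mse)]

synthetic <- w_hat * donor1 + (1 - w_hat) * donor2
gap <- treated[post] - synthetic[post]     # estimated post-treatment effects
print(w_hat)
print(gap)
```

Because the treated unit's pre-period path is an exact convex combination of the donors, the grid search recovers w = 0.3, and the post-period gap equals the built-in effects of 0.5, 1.0, and 1.5.
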
234 | 
235 | Next, we use the `fect` package by @liu.2021.fect and estimate a counterfactual estimator, letting cross-validation choose between the interactive fixed effects and matrix completion methods (`method = "both"`) [@liu.2021.practical].
236 | 
237 | 
238 | ```{r fect}
239 | fect.res <- data %>% 
240 |   fect(dep_var ~ treat, data = ., 
241 |        index = c("unit","year"), 
242 |        method = "both",
243 |        CV = TRUE, 
244 |        se = TRUE, 
245 |        nboots = 500, 
246 |        parallel = TRUE, 
247 |        cv.treat = FALSE)
248 | 
249 | fect <- fect.res$est.att %>% 
250 |   as_tibble() %>% 
251 |   mutate(t = as.double(rownames(fect.res$est.att))) %>% 
252 |   filter(t > -8 & t < 8) %>% 
253 |   mutate(method = "FECT") %>% 
254 |   rename(estimate = ATT, conf.low = CI.lower, conf.high = CI.upper) %>% 
255 |   select(c(t, estimate, conf.low, conf.high, method))
256 | ```
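fect's cross-validation step picks a model by how well it predicts held-out untreated outcomes. The base-R sketch below illustrates that logic with two deliberately simple candidate models (unit fixed effects only versus unit and year fixed effects) on hypothetical untreated data. The candidates, the random holdout of 40 cells, and the variable names are assumptions of the sketch, not fect's interactive fixed effects or matrix completion candidates or its exact holdout scheme.

```r
set.seed(3)
# Hypothetical untreated panel: unit effects plus a common year pattern plus noise
d <- expand.grid(unit = 1:20, year = 1:10)
d$unit_f <- factor(d$unit)
d$year_f <- factor(d$year)
d$y <- rnorm(20)[d$unit] + 2 * sin(d$year) + rnorm(nrow(d), sd = 0.1)

holdout <- sample(nrow(d), 40)          # cells hidden from the fit
train <- d[-holdout, ]
valid <- d[holdout, ]

candidates <- list(unit_only = y ~ unit_f, two_way = y ~ unit_f + year_f)
# mean squared prediction error on the held-out cells for each candidate
mspe <- sapply(candidates, function(f) {
  fit <- lm(f, data = train)
  mean((valid$y - predict(fit, newdata = valid))^2)
})
print(mspe)
print(names(which.min(mspe)))           # model with the lowest MSPE
```

The model that ignores the common year pattern predicts held-out cells poorly, so cross-validation selects the two-way model.
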
257 | Lastly, we can use the `PanelMatch` package [@kim.2021.panelmatch] to add the panel match estimator by @imai.2021.matching.
258 | ```{r panelmatch}
259 | PM_est <- PanelMatch(lag = 5, time.id = "year", unit.id = "unit", 
260 |                      treatment = "treat", refinement.method = "none", 
261 |                      data = as.data.frame(data), match.missing = TRUE, 
262 |                      size.match = 5, qoi = "att" , outcome.var = "dep_var",
263 |                      lead = 0:7, forbid.treatment.reversal = TRUE, 
264 |                      use.diagonal.variance.matrix = TRUE)
265 | PM_est <- PanelEstimate(sets = PM_est, data = as.data.frame(data))
266 | 
267 | pm_summary <- summary(PM_est)$summary # summarise once; columns 1, 3 and 4 hold the estimate and CI bounds
268 | PM <- tibble(t = 0:7, estimate = pm_summary[, 1], conf.low = pm_summary[, 3], conf.high = pm_summary[, 4]) %>% 
269 |   mutate(method = "Panel Match")
270 | ```
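With `refinement.method = "none"`, each treated observation is compared with all units that share its recent treatment history. The base-R sketch below builds such matched sets from a treatment matrix on noise-free, hypothetical data and computes the difference-in-differences for one outcome lead. As a simplifying assumption of this sketch (possibly stricter than PanelMatch's own rule), controls must also stay untreated through t + lead; `matched_did()` is a hypothetical helper.

```r
set.seed(5)
n_units <- 8; n_years <- 10
L <- 3        # length of the matched treatment history (cf. PanelMatch's `lag`)
lead <- 1     # outcome horizon t + lead (cf. one element of PanelMatch's `lead`)
first_treat <- c(5, 5, 7, 7, Inf, Inf, Inf, Inf)            # Inf = never treated
W <- outer(first_treat, 1:n_years, function(g, t) as.numeric(t >= g))  # treatment matrix
effect <- outer(first_treat, 1:n_years,
                function(g, t) ifelse(t >= g, 0.5 * (t - g + 1), 0))
Y <- outer(rnorm(n_units), rnorm(n_years), "+") + effect   # unit FE + year FE + effect

# Hypothetical helper: DiD for unit i switching at t against its matched set
matched_did <- function(i, t, lead, L, W, Y) {
  lags <- (t - L):(t - 1)
  same_history <- apply(W[, lags, drop = FALSE], 1, function(w) all(w == W[i, lags]))
  untreated <- apply(W[, t:(t + lead), drop = FALSE], 1, function(w) all(w == 0))
  ctrl <- which(same_history & untreated)   # excludes i, which is treated at t
  (Y[i, t + lead] - Y[i, t - 1]) - mean(Y[ctrl, t + lead] - Y[ctrl, t - 1])
}

# treatment switches 0 -> 1: W[i, t - 1] == 0 and W[i, t] == 1
sw <- which(W[, -1] == 1 & W[, -n_years] == 0, arr.ind = TRUE)
sw <- data.frame(unit = sw[, "row"], t = sw[, "col"] + 1)
sw <- sw[sw$t - L >= 1 & sw$t + lead <= n_years, ]
sw$est <- mapply(matched_did, sw$unit, sw$t,
                 MoreArgs = list(lead = lead, L = L, W = W, Y = Y))
print(sw)
print(mean(sw$est))   # ATT at this lead, equal weight per treated switch
```

Each switch is differenced against the period just before treatment (t - 1), so unit effects cancel and matched controls remove the common year effects.
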
271 | 
272 | ```{r plot}
273 | coefs <- bind_rows(twfe, stacked, CSA, SA, coef_imp, asyn, fect, PM) 
274 | 
275 | plot <- coefs %>% 
276 |   ggplot(aes(x = t, y = estimate, color = method)) + 
277 |   geom_point(aes(x = t, y = estimate), position = position_dodge2(width = 0.8), size = 1) +
278 |   geom_linerange(aes(x = t, ymin = conf.low, ymax = conf.high), position = position_dodge2(width = 0.8), size = 0.75) +
279 |   geom_hline(yintercept = 0, linetype = "dashed", color = "red", size = .25, alpha = 0.75) + 
280 |   geom_vline(xintercept = -0.5, linetype = "dashed", size = .25) +
281 |   scale_color_manual(name="Estimation Method", values= met.brewer("Cross", 8, "discrete")) +
282 |   theme_clean() + theme(legend.position= 'bottom') +
283 |   labs(title = 'Event Time Estimates', y="ATT", x = "Relative Time") + 
284 |   guides(col = guide_legend(nrow = 3)) 
285 | plot
286 | ```
287 | 
288 | 
289 | 


--------------------------------------------------------------------------------
/ComparingDiD.md:
--------------------------------------------------------------------------------
  1 | Comparing Staggered DiD
  2 | ================
  3 | Florian M. Hollenbach
  4 | 2021-12-27
  5 | 
  6 | # Comparing different staggered Difference-in-Differences Estimators
  7 | 
  8 | **Note: this is not meant as an evaluation of the different
  9 | estimators or packages; it only shows how to run each of them on the same simulated data.**
 10 | 
 11 | Let’s load packages and set up the ggplot theme, which is stolen from
 12 | [Andrew Heiss](https://www.andrewheiss.com/).
 13 | 
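 34 | First, we create a simulated data set with staggered treatment timing and
 35 | heterogeneous, dynamic treatment effects: 200 units in 50 states observed from 1980 to 2015, with treatment cohorts in 1989, 1998, and 2005 plus a never-treated group (states 21 to 35). Each treated unit draws a yearly effect around its cohort mean (0.5, 0.3, or 0.1), and these effects accumulate over time. The code is based on the
 36 | simulations in Baker, Larcker, and Wang (2021), except that we add a
 37 | never-treated group and reduce the number of units.[^1]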
 38 | 
 39 | ``` r
 40 | # Data 6 - Multiple Treatment Periods and Dynamic Treatment Effects --------------
 41 | make_data6 <- function(...) {
 42 |   
 43 |   # Fixed Effects ------------------------------------------------
 44 |   # unit fixed effects
 45 |   unit <- tibble(
 46 |     unit = 1:200, 
 47 |     unit_fe = rnorm(200, 0, 0.5),
 48 |     # generate state
 49 |     state = sample(1:50, 200, replace = TRUE),
 50 |     # generate treatment groups
 51 |     group = case_when(
 52 |       state %in% 1:10 ~ 1989,
 53 |       state %in% 11:20 ~ 1998,
 54 |       state %in% 21:35 ~ NA_real_, ## never treated
 55 |       state %in% 36:50 ~ 2005
 56 |     ),
 57 |     # avg yearly treatment effects by group
 58 |     hat_gamma = case_when(
 59 |       is.na(group) ~ 0, ## never treated
 60 |       group == 1989 ~ .5,
 61 |       group == 1998 ~ .3,
 62 |       group == 2005 ~ .1
 63 |     )) %>%
 64 |     # generate unit specific yearly treatment effects 
 65 |     rowwise() %>% 
 66 |     mutate(gamma = if_else(is.na(group) == TRUE, 0, rnorm(1, hat_gamma, .2))) %>% 
 67 |     ungroup()
 68 |   
 69 |   # year fixed effects 
 70 |   year <- tibble(
 71 |     year = 1980:2015,
 72 |     year_fe = rnorm(36, 0, 0.5))
 73 |   
 74 |   # full interaction of unit X year 
 75 |   crossing(unit, year) %>% 
 76 |     # make error term and get treatment indicators and treatment effects
 77 |     mutate(error = rnorm(nrow(.), 0, 0.5),
 78 |            treat = ifelse(!is.na(group) & year >= group, 1, 0), # 0 for never-treated units
 79 |            tau = ifelse(treat == 1, gamma, 0)) %>% # yearly increment; 0 before treatment and for never-treated units
 80 |     # calculate the dep variable
 81 |     group_by(unit) %>% 
 82 |     mutate(cumtau = cumsum(tau)) %>% 
 83 |     mutate(dep_var = unit_fe + year_fe + cumtau + error)
 84 | }
 85 | 
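 86 | # make data and recode the cohort variable for the different packages
 87 | # (never treated: 0 for did/didimputation, 10000 and time_to_treatment = -1000 for fixest)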
 88 | data <- make_data6() %>% 
 89 |   as_tibble() %>%
 90 |   mutate(group_CSA = if_else(is.na(group), 0, group), # CSA wants never treated cohort variable to be 0
 91 |          group = if_else(is.na(group), 10000, group), # never treated cohort variable 10000 for fixest
 92 |          time_to_treatment = ifelse(group != 10000, year - group, -1000)) # set time to treatment to -1000 for fixest
 93 | ```
 94 | 
 95 | We can plot the data and treatment status using Licheng Liu and Yiqing
 96 | Xu’s awesome `panelView` package (Liu and Xu 2021).
 97 | 
 98 | ``` r
 99 | panelView(dep_var ~ treat, data = data, index = c("unit","year"), xlab = "Year", ylab = "Unit", axis.lab.gap = 5)
100 | ```
101 | 
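102 | ![](ComparingDiD_files/figure-gfm/plot%20treatment-1.png)<!-- -->
103 | 
104 | Next, we create the stacked data set, once again following the code by [Andrew Baker](https://github.com/andrewchbaker/DiD_Codes). For each treatment cohort, we keep that cohort plus all clean controls (units not treated until more than seven years later, including never-treated units), restrict the data to seven years before and after treatment, and then append the cohort-specific data sets.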
105 | 
106 | ``` r
107 | ### for stacking
108 | groups <- data %>% 
109 |   filter(group != 10000) %>% 
110 |   pull(group) %>% 
111 |   unique()
112 | 
113 | ### create stacked data
114 | getdata <- function(i) {
115 |   
116 |   #keep what we need
117 |   data %>% 
118 |     # keep cohort i and clean controls (not treated until more than 7 years after i, incl. never treated)
119 |     filter(group == i | group > i + 7) %>% 
120 |     # keep the event window: 7 years before to 7 years after i
121 |     filter(year >= i - 7 & year <= i + 7) %>%
122 |     # create an indicator for the dataset
123 |     mutate(df = i) %>% 
124 |     mutate(time_to_treatment = year - group) %>% 
125 |     # clean controls get the reference value -1000, so they load on no event-time dummy
126 |     mutate(time_to_treatment = if_else(group == i, time_to_treatment, -1000))
127 | }
128 | stacked_data <- map_df(groups, getdata) %>% 
129 |   mutate(bracket_df = paste(state,df))
130 | ```
131 | 
132 | Now we can move on to estimating the different models. First, the
133 | standard two-way fixed effects (TWFE) event-study model with unit and year fixed effects.
134 | We estimate it with the `fixest` package (Bergé 2018), using relative period -1 as the reference and coding never-treated units as -1000 so that they
135 | load on no event-time dummy, and cluster standard errors by state.
136 | 
137 | ``` r
138 | twfe <- data %>% 
139 |   do(broom::tidy(feols(dep_var ~ i(time_to_treatment, ref = c(-1, -1000)) | unit + year, 
140 |                        data = ., cluster = ~state), conf.int = TRUE)) %>% 
141 |   mutate(t =  as.double(str_replace_all(term, c("time_to_treatment::" = "", ":treated" = "")))) %>% 
142 |   filter(t > -8 & t < 8) %>% 
143 |   select(t, estimate, conf.low, conf.high) %>% 
144 |   # add in data for year -1
145 |   bind_rows(tibble(t = -1, estimate = 0, 
146 |                    conf.low = 0, conf.high = 0
147 |   )) %>% 
148 |   mutate(method = "TWFE")
149 | ```
150 | 
151 | Next, we estimate the same model on the stacked data, with unit-by-dataset and year-by-dataset fixed effects. Following Baker, Larcker,
152 | and Wang (2021), we cluster standard errors at the state×dataset level
153 | (`bracket_df`).
154 | 
155 | ``` r
156 | stacked <- stacked_data %>% 
157 |   # fit the model
158 |   do(broom::tidy(feols(dep_var ~ i(time_to_treatment, ref = c(-1, -1000)) | unit^df + year^df, data = ., cluster = "bracket_df"),
159 |                  conf.int = TRUE)) %>% 
160 |   mutate(t =  as.double(str_replace(term, "time_to_treatment::", ""))) %>% 
161 |   filter(t > -8 & t < 8) %>% 
162 |   select(t, estimate, conf.low, conf.high) %>% 
163 |   # add in data for year -1
164 |   bind_rows(tibble(t = -1, estimate = 0, 
165 |                    conf.low = 0, conf.high = 0
166 |   )) %>% 
167 |   mutate(method = "Stacked")
168 | ```
169 | 
170 | We continue with the `fixest` package and its `sunab` function, which
171 | implements the interaction-weighted estimator of Sun & Abraham (Sun and
172 | Abraham 2021): it estimates cohort-specific event-time effects and then averages them across cohorts.
173 | 
174 | ``` r
175 | SA <- data %>% 
176 |   do(broom::tidy(feols(dep_var ~ sunab(group, year) | unit + year, data = .,
177 |                  cluster = ~ state))) %>% 
178 |   mutate(t =  as.double(str_replace(term, "year::", "")),
179 |          conf.low = estimate - (qnorm(0.975)*std.error),
180 |          conf.high = estimate + (qnorm(0.975)*std.error)) %>% 
181 |   filter(t > -8 & t < 8) %>% 
182 |   select(t, estimate, conf.low, conf.high) %>% 
183 |   mutate(method = "Sun & Abraham")
184 | ```
185 | 
186 | The next model is the doubly robust estimator developed by
187 | Callaway and Sant’Anna (2021b) and available in the `did` package
188 | (Callaway and Sant’Anna 2021a). It first estimates group-time average treatment effects and then aggregates them by event time. We use not-yet-treated units as the
189 | control group (`control_group = 'not-yet-treated'`) and cluster standard errors at the level
190 | of treatment assignment (state). Note that Callaway and Sant’Anna (2021b) use
191 | simultaneous inference procedures (uniform confidence bands), which are robust to multiple testing
192 | but yield wider confidence intervals than pointwise inference.
193 | 
194 | ``` r
195 | csa.est<- att_gt(yname= 'dep_var',
196 |              tname= 'year',
197 |              idname = 'unit',
198 |              gname = 'group_CSA',
199 |              clustervars = 'state',
200 |              est_method = 'dr',
201 |              control_group = 'not-yet-treated',
202 |              data = data) 
203 | 
204 | CSA <- aggte(csa.est, type = "dynamic", na.rm = TRUE) %>% 
205 |   tidy() %>% 
206 |   rename(t = event.time) %>% 
207 |   filter(t > -8 & t < 8) %>% 
208 |   select(t, estimate, conf.low, conf.high) %>% 
209 |   mutate(method = "CSA")
210 | ```
211 | 
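212 | Now we use the `didimputation` package written by Kyle Butts
213 | \[butts.2021.didimputation\], which implements the imputation estimator of Borusyak, Jaravel,
214 | and Spiess (2021): unit and year fixed effects are estimated on untreated observations only and then used to impute the untreated potential outcomes of treated observations.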
215 | 
216 | ``` r
217 | did_imp <- did_imputation(data = data, yname = "dep_var", gname = "group_CSA",
218 |                           tname = "year", idname = "unit", 
219 |                           horizon=TRUE, pretrends = -10:-1) 
220 | coef_imp <- did_imp %>% 
221 |   select(t = term, estimate, std.error) %>%
222 |   mutate(
223 |     conf.low = estimate - 1.96 * std.error,
224 |     conf.high = estimate + 1.96 * std.error,
225 |     t = as.numeric(t)
226 |   ) %>%
227 |   mutate(method = "DID Imputation") %>% 
228 |   select(c(t, estimate, conf.low, conf.high, method)) %>% 
229 |   filter(t > -8 & t < 8)
230 | ```
231 | 
232 | Next, we add augmented synthetic control estimates (Ben-Michael, Feller, and Rothstein 2021),
233 | using the partially pooled `multisynth` estimator for staggered adoption from the `augsynth`
234 | package provided by Ben-Michael (2021).
235 | 
236 | ``` r
237 | asyn_res <- multisynth(dep_var ~ treat,
238 |                    unit, 
239 |                    year, 
240 |                    data)
241 | 
242 | asyn <- summary(asyn_res)$att %>% 
243 |   filter(Time > -8 & Time < 8 & (Level == 'Average')) %>%
244 |   rename(t = Time, estimate = Estimate, conf.low = lower_bound, conf.high = upper_bound) %>% 
245 |   mutate(method = "Aug. Synth") %>% 
246 |   select(c(t, estimate, conf.low, conf.high, method))
247 | ```
248 | 
249 | Next, we use the `fect` package by Liu et al. (2021) and estimate a
250 | counterfactual estimator, letting cross-validation choose between the interactive fixed effects and matrix completion methods (`method = "both"`) (Liu, Wang, and Xu
251 | 2021).
252 | 
253 | ``` r
254 | fect.res <- data %>% 
255 |   fect(dep_var ~ treat, data = ., 
256 |        index = c("unit","year"), 
257 |        method = "both",
258 |        CV = TRUE, 
259 |        se = TRUE, 
260 |        nboots = 500, 
261 |        parallel = TRUE, 
262 |        cv.treat = FALSE)
263 | ```
264 | 
265 |     ## Parallel computing ...
266 |     ## Cross-validating ... 
267 |     ## Criterion: Mean Squared Prediction Error
268 |     ## Interactive fixed effects model...
269 |     ## 
270 |     ##  r = 0; sigma2 = 0.24594; IC = -0.98374; PC = 0.23377; MSPE = 0.25846; GMSPE = 0.06675; Moment = 0.05937; MSPTATT = 0.00123; MSE = 0.23419*
271 |     ##  r = 1; sigma2 = 0.23805; IC = -0.60102; PC = 0.28198; MSPE = 0.29146; GMSPE = 0.07945; Moment = 0.06798; MSPTATT = 0.00127; MSE = 0.21116
272 |     ##  r = 2; sigma2 = 0.23098; IC = -0.21938; PC = 0.32776; MSPE = 0.34917; GMSPE = 0.09068; Moment = 0.06388; MSPTATT = 0.00090; MSE = 0.19200
273 |     ##  r = 3; sigma2 = 0.22354; IC = 0.15610; PC = 0.36970; MSPE = 0.43069; GMSPE = 0.10096; Moment = 0.07597; MSPTATT = 0.00096; MSE = 0.17147
274 |     ##  r = 4; sigma2 = 0.21703; IC = 0.53122; PC = 0.41001; MSPE = 0.48773; GMSPE = 0.11763; Moment = 0.07404; MSPTATT = 0.00098; MSE = 0.15237
275 |     ##  r = 5; sigma2 = 0.20927; IC = 0.89591; PC = 0.44468; MSPE = 0.53300; GMSPE = 0.14317; Moment = 0.07045; MSPTATT = 0.00071; MSE = 0.13361
276 |     ## 
277 |     ##  r* = 0
278 |     ## 
279 |     ## Matrix completion method...
280 |     ## 
281 |     ##  lambda.norm = 1.00000; MSPE = 0.25846; GMSPE = 0.06675; Moment = 0.05937; MSPTATT = 0.00123; MSE = 0.23419*
282 |     ##  lambda.norm = 0.42170; MSPE = 0.27182; GMSPE = 0.07157; Moment = 0.06584; MSPTATT = 0.00038; MSE = 0.07751
283 |     ##  lambda.norm = 0.17783; MSPE = 0.28195; GMSPE = 0.07701; Moment = 0.06617; MSPTATT = 0.00007; MSE = 0.01424
284 |     ##  lambda.norm = 0.07499; MSPE = 0.27824; GMSPE = 0.07668; Moment = 0.06540; MSPTATT = 0.00001; MSE = 0.00254
285 |     ## 
286 |     ##  lambda.norm* = 1
287 |     ## 
288 |     ## 
289 |     ## 
290 |     ##  Recommended method through cross-validation: ife
291 |     ## 
292 |     ## Bootstrapping for uncertainties ... 500 runs
293 |     ## Cannot use full pre-treatment periods. The first period is removed.
294 |     ## Call:
295 |     ## fect.formula(formula = dep_var ~ treat, data = ., index = c("unit", 
296 |     ##     "year"), CV = TRUE, cv.treat = FALSE, method = "both", se = TRUE, 
297 |     ##     nboots = 500, parallel = TRUE)
298 |     ## 
299 |     ## ATT:
300 |     ##                             ATT   S.E. CI.lower CI.upper p.value
301 |     ## Tr obs equally weighted   4.214 0.3575    3.514    4.915       0
302 |     ## Tr units equally weighted 3.180 0.3016    2.588    3.771       0
303 | 
304 | ``` r
305 | fect <- fect.res$est.att %>% 
306 |   as_tibble() %>% 
307 |   mutate(t = as.double(rownames(fect.res$est.att))) %>% 
308 |   filter(t > -8 & t < 8) %>% 
309 |   mutate(method = "FECT") %>% 
310 |   rename(estimate = ATT, conf.low = CI.lower, conf.high = CI.upper) %>% 
311 |   select(c(t, estimate, conf.low, conf.high, method))
312 | ```
313 | 
314 | Lastly, we can use the `PanelMatch` package (Kim et al. 2021) to add the
315 | panel match estimator by Imai, Kim, and Wang (Forthcoming).
316 | 
317 | ``` r
318 | PM_est <- PanelMatch(lag = 5, time.id = "year", unit.id = "unit", 
319 |                      treatment = "treat", refinement.method = "none", 
320 |                      data = as.data.frame(data), match.missing = TRUE, 
321 |                      size.match = 5, qoi = "att" , outcome.var = "dep_var",
322 |                      lead = 0:7, forbid.treatment.reversal = TRUE, 
323 |                      use.diagonal.variance.matrix = TRUE)
324 | PM_est <- PanelEstimate(sets = PM_est, data = as.data.frame(data))
325 | 
326 | pm_summary <- summary(PM_est)$summary # summarise once; columns 1, 3 and 4 hold the estimate and CI bounds
327 | PM <- tibble(t = 0:7, estimate = pm_summary[, 1], conf.low = pm_summary[, 3], conf.high = pm_summary[, 4]) %>% 
328 |   mutate(method = "Panel Match")
329 | ```
330 | 
331 |     ## Matches created with 5 lags
332 |     ## 
333 |     ## Standard errors computed with 1000 Weighted bootstrap samples
334 |     ## 
335 |     ## Estimate of Average Treatment Effect on the Treated (ATT) by Period:
346 | 
347 | ``` r
348 | coefs <- bind_rows(twfe, stacked, CSA, SA, coef_imp, asyn, fect, PM) 
349 | 
350 | plot <- coefs %>% 
351 |   ggplot(aes(x = t, y = estimate, color = method)) + 
352 |   geom_point(aes(x = t, y = estimate), position = position_dodge2(width = 0.8), size = 1) +
353 |   geom_linerange(aes(x = t, ymin = conf.low, ymax = conf.high), position = position_dodge2(width = 0.8), size = 0.75) +
354 |   geom_hline(yintercept = 0, linetype = "dashed", color = "red", size = .25, alpha = 0.75) + 
355 |   geom_vline(xintercept = -0.5, linetype = "dashed", size = .25) +
356 |   scale_color_manual(name="Estimation Method", values= met.brewer("Cross", 8, "discrete")) +
357 |   theme_clean() + theme(legend.position= 'bottom') +
358 |   labs(title = 'Event Time Estimates', y="ATT", x = "Relative Time") + 
359 |   guides(col = guide_legend(nrow = 3)) 
360 | plot
361 | ```
362 | 
363 | ![](ComparingDiD_files/figure-gfm/plot-1.png)<!-- -->
364 | 
365 | <div id="refs" class="references csl-bib-body hanging-indent">
366 | 
367 | <div id="ref-baker.2021.how" class="csl-entry">
368 | 
369 | Baker, Andrew, David F. Larcker, and Charles C. Y Wang. 2021. “How Much
370 | Should We Trust Staggered Difference-in-Differences Estimates?”
371 | <http://dx.doi.org/10.2139/ssrn.3794018>.
372 | 
373 | </div>
374 | 
375 | <div id="ref-eli.2021.augsynth" class="csl-entry">
376 | 
377 | Ben-Michael, Eli. 2021. *Augsynth: The Augmented Synthetic Control
378 | Method*.
379 | 
380 | </div>
381 | 
382 | <div id="ref-ben-michael.2021.augmented" class="csl-entry">
383 | 
384 | Ben-Michael, Eli, Avi Feller, and Jesse Rothstein. 2021. “The Augmented
385 | Synthetic Control Method.” *Journal of the American Statistical
386 | Association* 116 (536): 1789–1803.
387 | <https://doi.org/10.1080/01621459.2021.1929245>.
388 | 
389 | </div>
390 | 
391 | <div id="ref-berge.2018.efficient" class="csl-entry">
392 | 
393 | Bergé, Laurent. 2018. “Efficient Estimation of Maximum Likelihood Models
394 | with Multiple Fixed-Effects: The R Package FENmlm.” *CREA Discussion
395 | Papers*, no. 13.
396 | 
397 | </div>
398 | 
399 | <div id="ref-borusyak.2021.revisiting" class="csl-entry">
400 | 
401 | Borusyak, Kirill, Xavier Jaravel, and Jann Spiess. 2021. “Revisiting
402 | Event Study Designs: Robust and Efficient Estimation.”
403 | <https://www.dropbox.com/s/y92mmyndlbkufo1/Draft_RobustAndEfficient.pdf?raw=1>.
404 | 
405 | </div>
406 | 
407 | <div id="ref-callaway.2021.did" class="csl-entry">
408 | 
409 | Callaway, Brantly, and Pedro H. C. Sant’Anna. 2021a. “Did: Difference in
410 | Differences.” <https://bcallaway11.github.io/did/>.
411 | 
412 | </div>
413 | 
414 | <div id="ref-callaway.2021.difference" class="csl-entry">
415 | 
416 | ———. 2021b. “Difference-in-Differences with Multiple Time Periods.”
417 | *Journal of Econometrics*.
418 | <https://doi.org/10.1016/j.jeconom.2020.12.001>.
419 | 
420 | </div>
421 | 
422 | <div id="ref-imai.2021.matching" class="csl-entry">
423 | 
424 | Imai, Kosuke, In Song Kim, and Erik Wang. Forthcoming. “Matching Methods
425 | for Causal Inference with Time-Series Cross-Sectional Data.” *American
426 | Journal of Political Science*, Forthcoming.
427 | <https://imai.fas.harvard.edu/research/tscs.html>.
428 | 
429 | </div>
430 | 
431 | <div id="ref-kim.2021.panelmatch" class="csl-entry">
432 | 
433 | Kim, In Song, Adam Rauh, Erik Wang, and Kosuke Imai. 2021. *PanelMatch:
434 | Matching Methods for Causal Inference with Time-Series Cross-Sectional
435 | Data*. <https://CRAN.R-project.org/package=PanelMatch>.
436 | 
437 | </div>
438 | 
439 | <div id="ref-liu.2021.practical" class="csl-entry">
440 | 
441 | Liu, Licheng, Ye Wang, and Yiqing Xu. 2021. “A Practical Guide to
442 | Counterfactual Estimators for Causal Inference with Time-Series
443 | Cross-Sectional Data.” <http://dx.doi.org/10.2139/ssrn.3555463>.
444 | 
445 | </div>
446 | 
447 | <div id="ref-liu.2021.fect" class="csl-entry">
448 | 
449 | Liu, Licheng, Ye Wang, Yiqing Xu, and Ziyi Liu. 2021. *Fect: Fixed
450 | Effects Counterfactuals*.
451 | <https://yiqingxu.org/packages/fect/fect.html>.
452 | 
453 | </div>
454 | 
455 | <div id="ref-liu.2021.panelview" class="csl-entry">
456 | 
457 | Liu, Licheng, and Yiqing Xu. 2021. *panelView: Visualizing Panel Data*.
458 | <https://yiqingxu.org/packages/panelView/panelView.html>.
459 | 
460 | </div>
461 | 
462 | <div id="ref-sun.2021.estimating" class="csl-entry">
463 | 
464 | Sun, Liyang, and Sarah Abraham. 2021. “Estimating Dynamic Treatment
465 | Effects in Event Studies with Heterogeneous Treatment Effects.” *Journal
466 | of Econometrics* 225: 175–99.
467 | <https://doi.org/10.1016/j.jeconom.2020.09.006>.
468 | 
469 | </div>
470 | 
471 | </div>
472 | 
473 | [^1]: Thanks to Andrew Baker for sharing his code
474 |     [here](https://github.com/andrewchbaker/DiD_Codes).
475 | 


--------------------------------------------------------------------------------