├── LICENSE.md ├── README.md ├── code ├── TCM_log_linear_reg_model.Rmd ├── TCM_log_linear_reg_model.pdf ├── TCM_poisson_reg_model.Rmd ├── TCM_poisson_reg_model.pdf ├── f1.m ├── f2.m ├── f3.m ├── f4.m ├── f5.m ├── fit_arrvial_model.R └── mcmcpredplot.m └── data ├── data_fig_2.xlsx ├── data_fig_S1A.xlsx ├── data_fig_S2.xlsx ├── data_fig_S4A.xlsx ├── data_fig_S4B.xlsx ├── fit_arrival_model.xlsx ├── health commission.xlsx ├── nCoV-data.xlsx └── nCoV-data_0323.xlsx /LICENSE.md: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2020 Tian et al. 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 
22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # COVID-19_TCM-50d_China 2 | [![DOI](https://zenodo.org/badge/249884240.svg)](https://zenodo.org/badge/latestdoi/249884240) 3 | 4 | Code for: An investigation of transmission control measures during the first 50 days of the COVID-19 epidemic in China 5 | 6 | ## Citation 7 | 8 | An investigation of transmission control measures during the first 50 days of the COVID-19 epidemic in China 9 | 10 | Huaiyu Tian, Yonghong Liu, Yidan Li, Chieh-Hsi Wu, Bin Chen, Moritz U.G. Kraemer, Bingying Li, Jun Cai, Bo Xu, Qiqi Yang, Ben Wang, Peng Yang, Yujun Cui, Yimeng Song, Pai Zheng, Quanyi Wang, Ottar N. Bjornstad, Ruifu Yang, Bryan T. Grenfell, Oliver G. Pybus, and Christopher Dye. 11 | 12 | DOI: 10.1126/science.abb6105 13 | 14 | ## Abstract 15 | 16 | Responding to an outbreak of a novel coronavirus (agent of COVID-19) in December 2019, China banned travel to and from Wuhan city on 23 January and implemented a national emergency response. We investigated the spread and control of COVID-19 using a unique data set including case reports, human movement and public health interventions. The Wuhan shutdown was associated with the delayed arrival of COVID-19 in other cities by 2.91 days (95%CI: 2.54-3.29). Cities that implemented control measures pre-emptively reported fewer cases, on average, in the first week of their outbreaks (13.0; 7.1-18.8) compared with cities that started control later (20.6; 14.5-26.8). Suspending intra-city public transport, closing entertainment venues and banning public gatherings were associated with reductions in case incidence. The national emergency response appears to have delayed the growth and limited the size of the COVID-19 epidemic in China, averting hundreds of thousands of cases by 19 February (day 50). 
17 | 18 | ## Notes on the code 19 | 20 | To run, you need a Matlab toolbox called "DRAM": 21 | DRAM is a combination of two ideas for improving the efficiency of Metropolis-Hastings type Markov chain Monte Carlo (MCMC) algorithms, Delayed Rejection and Adaptive Metropolis. The linked page explains the basic ideas behind DRAM and provides examples and Matlab code for the computations (see http://helios.fmi.fi/~lainema/dram/). 22 | 23 | The TCM regression analysis was performed using R 3.6.2 on macOS version 10.15.3 (19D76). 24 | 25 | ## Data 26 | 27 | ### Epidemiological data 28 | 29 | The COVID-19 case data and transmission control measures implemented between 31 December 2019 and 19 February 2020 were collected from the official reports of the health commissions of 34 provincial-level administrative units and 342 city-level units. The information was collected by Bingying Li. 30 | 31 | ### Mobility data 32 | 33 | Human movements were tracked with mobile phone data, through location-based services (LBS) employed by popular Tencent applications such as WeChat and QQ. Movement outflows from Wuhan City to other cities (i.e. records of the number of people leaving each day) by air, train and road were obtained from the migration flows database (https://heat.qq.com/) from 13 January 2017 to 21 February 2017 (Spring Festival travel 2017), from 1 February 2018 to 12 March 2018 (Spring Festival travel 2018), and from 1 January 2018 to 31 December 2018 (entire 2018). Tencent's LBS data were collected and processed by Dr. Bin Chen and Dr. Yimeng Song. 34 | 35 | To reconstruct the movement outflow from Wuhan during the 2020 Spring Festival (from 11 January to 25 January, before the Chinese Lunar New Year), mobile phone data (provided by the telecommunications operators) were used together with the Baidu migration index (http://qianxi.baidu.com/); using both data sources gave the most accurate measure of movement volume.
The expected movement outflows from Wuhan after the New Year festival cover the period from 26 January to 19 February. 36 | 37 | ## License 38 | 39 | (see LICENSE) 40 | 41 | Additional license, warranty, and copyright information 42 | 43 | We provide a license for our code (see LICENSE) and do not claim ownership of, nor the right to license, the data we have obtained nor any third-party software tools/code used in our analyses. Please cite the appropriate agency, paper, and/or individual in publications and/or derivatives using these data, contact them regarding the legal use of these data, and remember to pass forward any existing license/warranty/copyright information. THE DATA AND SOFTWARE ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE DATA AND/OR SOFTWARE OR THE USE OR OTHER DEALINGS IN THE DATA AND/OR SOFTWARE. 44 | -------------------------------------------------------------------------------- /code/TCM_log_linear_reg_model.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Log-linear regression analysis to investigate the associations between transmission 3 | control measures and the number of reported cases in the first seven days of the 4 | outbreaks in cities" 5 | author: "Chieh-Hsi Wu" 6 | output: 7 | pdf_document: 8 | extra_dependencies: ["xcolor"] 9 | toc: true 10 | toc_depth: 3 11 | html_notebook: default 12 | html_document: 13 | df_print: paged 14 | --- 15 | 16 | 17 | Here we investigate the associations between transmission control measures and the number of reported cases in the first week of the outbreaks in cities.
18 | 19 | ```{r load_library} 20 | library(readxl) 21 | library(caret) 22 | library(lmtest) 23 | library(boot) 24 | library(R330) 25 | ``` 26 | 27 | 28 | ```{r define_functions} 29 | # function to obtain regression coefficients 30 | bs = function(formula, data, indices) { 31 | # allows boot to select sample 32 | d <- data[indices,] 33 | fit <- lm(formula, data=d) 34 | return(coef(fit)) 35 | } 36 | 37 | 38 | getSgnChgIndex = function(ci = NULL, a = NULL){ 39 | colnames(ci) = a 40 | rownames(ci) = c("CI level", 41 | "lower index", "upper index", 42 | "95% CI lower bound", "95% CI upper bound") 43 | 44 | ci = t(ci[4:5,]) 45 | cprCIBndSgn = apply(ci,1, 46 | function(z){ 47 | sgn = z>0; 48 | return(sgn[1]==sgn[2]) 49 | }) 50 | return(max(which(cprCIBndSgn))) 51 | } 52 | 53 | ``` 54 | 55 | 56 | ```{r read_in_data} 57 | covid2019FilePath = ".../data/nCoV-data.xlsx" 58 | covid2019.df = read_excel(path = covid2019FilePath, sheet = "3resp-7days") 59 | covid2019Dist.df = read_excel(path = covid2019FilePath, sheet = "dist-296") 60 | 61 | covid2019v2.df = read_excel(path = ".../data/nCoV-data_0323.xlsx", 62 | sheet = "3resp-7days") 63 | ``` 64 | 65 | # Processing the data 66 | 67 | Some of the cities have an inflow from Wuhan recorded as 0, which causes calculations to run into an error when we use it as an offset variable. 68 | To resolve this issue, 0 values are changed to $10^{-6}$, which is equivalent to only one person arriving in a city from Wuhan. 69 | 70 | ```{r process_totalflow, echo = F} 71 | covid2019.df$new.totalflow_million = covid2019.df$totalflow_million 72 | covid2019.df$new.totalflow_million[covid2019.df$totalflow_million == 0] = 1e-6 73 | ``` 74 | 75 | The arrival time is processed so that 31 December 2019 is coded as day 0. 76 | 77 | ```{r process_arrival_time} 78 | covid2019.df$new.arr.time = covid2019.df$arr.time - 1 79 | ``` 80 | 81 | The timing of suspending intra-city public transport is processed so that 31 December 2019 is coded as day 0.
82 | 83 | ```{r process_TTCM_bus} 84 | bus.resp.tab = table(covid2019.df$Bus.resp) 85 | bus.resp.tab 86 | bus.date.tab1 = table(covid2019.df$Bus.date[which(covid2019.df$Bus.resp==1)]) 87 | bus.date.tab1 88 | 89 | covid2019.df$new.Bus.date = covid2019.df$Bus.date - 1 90 | new.bus.date.tab1 = table(covid2019.df$new.Bus.date[which(covid2019.df$Bus.resp==1)]) 91 | new.bus.date.tab1 92 | 93 | covid2019.df$new.Bus.date[which(covid2019.df$Bus.resp==0)] = 0 94 | new.bus.date.tab = table(covid2019.df$new.Bus.date) 95 | new.bus.date.tab 96 | 97 | ## Sanity check 98 | ## Should return 0(s). 99 | # bus.date.tab1 - new.bus.date.tab1 100 | # (as.numeric(names(bus.date.tab1)) - 1) - as.numeric(names(new.bus.date.tab1)) 101 | # bus.date.tab1 - new.bus.date.tab[-1] 102 | # (as.numeric(names(bus.date.tab1)) - 1) - as.numeric(names(new.bus.date.tab[-1])) 103 | # bus.resp.tab["0"] - new.bus.date.tab["0"] 104 | ## Sanity check complete 105 | ``` 106 | 107 | The timing of suspending inter-city passenger traffic is processed so that 31 December 2019 is coded as day 0. 108 | 109 | 110 | ```{r process_TTCM_railway, echo = F} 111 | rail.resp.tab = table(covid2019.df$Railway.resp) 112 | rail.resp.tab 113 | rail.date.tab1 = table(covid2019.df$Railway.date[which(covid2019.df$Railway.resp == 1)]) 114 | rail.date.tab1 115 | 116 | 117 | covid2019.df$new.Railway.date = covid2019.df$Railway.date - 1 118 | new.rail.date.tab1 = table(covid2019.df$new.Railway.date[which(covid2019.df$Railway.resp == 1)]) 119 | new.rail.date.tab1 120 | 121 | covid2019.df$new.Railway.date[which(covid2019.df$Railway.resp == 0)] = 0 122 | new.rail.date.tab = table(covid2019.df$new.Railway.date) 123 | new.rail.date.tab 124 | 125 | ## Sanity check 126 | ## Should return 0(s). 
127 | # rail.date.tab1 - new.rail.date.tab1 128 | # (as.numeric(names(rail.date.tab1)) - 1) - as.numeric(names(new.rail.date.tab1)) 129 | # rail.date.tab1 - new.rail.date.tab[-1] 130 | # (as.numeric(names(rail.date.tab1)) - 1) - as.numeric(names(new.rail.date.tab[-1])) 131 | # rail.resp.tab["0"] - new.rail.date.tab["0"] 132 | ## Sanity check complete 133 | 134 | ``` 135 | 136 | 137 | The timing of the closure of entertainment venues and banning of public gatherings is processed so that 31 December 2019 is coded as day 0. 138 | 139 | 140 | ```{r process_TTCM_enter, echo = F} 141 | enter.resp.tab = table(covid2019.df$Enter.resp) 142 | enter.resp.tab 143 | enter.date.tab1 = table(covid2019.df$Enter.date[which(covid2019.df$Enter.resp == 1)]) 144 | enter.date.tab1 145 | 146 | covid2019.df$new.Enter.date = covid2019.df$Enter.date - 1 147 | new.enter.date.tab1 = table(covid2019.df$new.Enter.date[which(covid2019.df$Enter.resp == 1)]) 148 | new.enter.date.tab1 149 | 150 | covid2019.df$new.Enter.date[which(covid2019.df$Enter.resp == 0)] = 0 151 | new.enter.date.tab = table(covid2019.df$new.Enter.date) 152 | new.enter.date.tab 153 | 154 | ## Sanity check 155 | ## Should return 0(s). 156 | # enter.date.tab1 - new.enter.date.tab1 157 | # (as.numeric(names(enter.date.tab1)) - 1) - as.numeric(names(new.enter.date.tab1)) 158 | # enter.date.tab1 - new.enter.date.tab[-1] 159 | # (as.numeric(names(enter.date.tab1)) - 1) - as.numeric(names(new.enter.date.tab[-1])) 160 | # enter.resp.tab["0"] - new.enter.date.tab["0"] 161 | ## Sanity check complete 162 | ``` 163 | 164 | 165 | 166 | # Fitting a log-linear regression model 167 | 168 | First we create a new response variable, which is the incidence per capita divided by the inflow from Wuhan.
169 | 170 | ```{r create_standardised_7day_cases} 171 | covid2019.df$std.sevendays.cucase = 172 | covid2019.df$sevendays.cucase/ 173 | (covid2019.df$Pop_million_2018*covid2019.df$new.totalflow_million) 174 | ``` 175 | 176 | After some exploration, we found that there is a non-linear relationship between the dependent variable and the timing of the closure of entertainment venues and banning of public gatherings. 177 | Therefore we have used a squared term of that timing variable to account for it. 178 | The log-linear model is summarised below. 179 | 180 | ```{r lm_add} 181 | yfc.resp.date.lm = lm(log(std.sevendays.cucase) ~ new.arr.time + log10.Dis.WH + 182 | Bus.resp + new.Bus.date + 183 | Railway.resp + new.Railway.date + 184 | Enter.resp + new.Enter.date + I(new.Enter.date^2), 185 | data = covid2019.df) 186 | summary(yfc.resp.date.lm) 187 | ``` 188 | 189 | The influence plots indicate an apparent influential point. 190 | 191 | 192 | ```{r lm_add_diagnostics} 193 | par(mfrow = c(2, 4)) 194 | influenceplots(yfc.resp.date.lm) 195 | ``` 196 | 197 | For interpretability, instead of including a squared term we discretise the timing variable, and re-fit the model with the discretised variable. 198 | 199 | ```{r lm1_add} 200 | covid2019.df$new.Enter.date.cat = cut(covid2019.df$new.Enter.date, c(-1, c(23, 24, 25, 36) -1)) 201 | covid2019.df$new.Enter.date.cat = as.numeric(covid2019.df$new.Enter.date.cat) -1 202 | 203 | yfc.resp.date.lm1 = lm(log(std.sevendays.cucase) ~ new.arr.time + log10.Dis.WH + 204 | Bus.resp + new.Bus.date + 205 | Railway.resp + new.Railway.date + 206 | Enter.resp + new.Enter.date.cat, 207 | data = covid2019.df) 208 | summary(yfc.resp.date.lm1) 209 | ``` 210 | 211 | After discretising the timing variable, the observation flagged as influential is no longer problematic. 212 | The Cook's distances do not indicate the presence of outliers. 213 | There are a number of observations indicated as having large hat values.
214 | However, they are not too far away from the cutoff, and their hat values are quite similar to each other. 215 | So it is hard to justify removing only a subset of those points, and there are quite a few of them, so we leave them in the model. 216 | 217 | 218 | ```{r lm1_infl_plots} 219 | par(mfrow = c(2,4)) 220 | influenceplots(yfc.resp.date.lm1) 221 | ``` 222 | 223 | The studentized Breusch-Pagan test provides evidence for heteroscedasticity in the residuals. 224 | 225 | ```{r yfc.resp.date.lm1_test} 226 | lmtest::bptest(yfc.resp.date.lm1) 227 | ``` 228 | 229 | After some exploration, we found that heteroscedasticity occurs when we include either the log10 of the distance to Wuhan, or the binary and timing variables for the closure of entertainment venues and banning of public gatherings. 230 | Coincidentally, the models above show that we have no evidence for associations between those three variables and the dependent variable. 231 | 232 | The conclusions regarding the rest of the variables do not change. 233 | 234 | ```{r lm2_fitted_vs_residuals, echo = F} 235 | yfc.resp.date.lm2 = lm(log(std.sevendays.cucase) ~ new.arr.time + 236 | Bus.resp + new.Bus.date + 237 | Enter.resp + new.Enter.date.cat, 238 | data = covid2019.df) 239 | summary(yfc.resp.date.lm2) 240 | ``` 241 | 242 | The plot of studentised residuals does not indicate evident non-linearity in the residuals. 243 | 244 | ```{r lm2_add_diagnostics, fig.height = 6, fig.width = 6.5} 245 | par(mfrow = c(2,2)) 246 | plot(yfc.resp.date.lm2) 247 | ``` 248 | 249 | The influence plots do not provide strong indication of any points that are particularly influential and need to be removed. 250 | 251 | ```{r yfc_resp_date_lm2_influ_plot} 252 | par(mfrow = c(2,4)) 253 | influenceplots(yfc.resp.date.lm2) 254 | ``` 255 | 256 | The studentized Breusch-Pagan test does not provide evidence for heteroscedasticity in the residuals.
257 | 258 | ```{r yfc.resp.date.lm2_bptest} 259 | lmtest::bptest(yfc.resp.date.lm2) 260 | ``` 261 | 262 | ```{r lm2_add_est} 263 | yfc.resp.date.lm2.est = coef(summary(yfc.resp.date.lm2)) 264 | yfc.resp.date.lm2.est.tab = cbind(yfc.resp.date.lm2.est[,"Estimate"], 265 | yfc.resp.date.lm2.est[,"Estimate"]-1.96*yfc.resp.date.lm2.est[,"Std. Error"], 266 | yfc.resp.date.lm2.est[,"Estimate"]+1.96*yfc.resp.date.lm2.est[,"Std. Error"]) 267 | colnames(yfc.resp.date.lm2.est.tab) = c("Coefficient", "95% CI lower bound", "95% CI upper bound") 268 | round(yfc.resp.date.lm2.est.tab, 2) 269 | 270 | ``` 271 | 272 | # Diagnostics 273 | 274 | ## Check for spatial correlation in the residuals 275 | 276 | Here we check whether cities that are closer together are going to have more similar residuals. 277 | 278 | ```{r create_dist_matrix_for_cities} 279 | covid2019DistMat = as.matrix(covid2019Dist.df[,-1], nrow = 296, ncol = 296) 280 | rownames(covid2019DistMat) = covid2019Dist.df$code 281 | colnames(covid2019DistMat) = covid2019Dist.df$code 282 | ``` 283 | 284 | 285 | ```{r check_the_order_of_the_matrix} 286 | all(covid2019.df$Code == rownames(covid2019DistMat)) 287 | all(covid2019.df$Code == colnames(covid2019DistMat)) 288 | ``` 289 | 290 | 291 | 292 | ```{r create_residual_dist_mat} 293 | yfc.resp.date.lm2.res = residuals(yfc.resp.date.lm2) 294 | res.dist = as.matrix(dist(yfc.resp.date.lm2.res)) 295 | cityDist = covid2019DistMat 296 | # 262 is NA by mistake in the input file 297 | diag(cityDist)[262] = 0 298 | ``` 299 | 300 | ```{r get_upper_dist_matrix} 301 | ## Only extract the values of the upper triangle of the matrix, as 302 | ## the lower triangle repeats the upper triangle values. 303 | res.dist.unique = res.dist[upper.tri(res.dist)] 304 | city.dist.unique = cityDist[upper.tri(cityDist)] 305 | length(res.dist.unique) == length(city.dist.unique) 306 | 307 | ## Sanity checks 308 | ## The codes below should all return TRUE(s).
309 | # res.dist[1,2] == res.dist.unique[1] 310 | # res.dist[1:2,3] == res.dist.unique[1+1:2] 311 | # res.dist[1:4,5] == res.dist.unique[6+1:4] 312 | # res.dist[1:5,6] == res.dist.unique[10+1:5] 313 | 314 | # cityDist[1,2] == city.dist.unique[1] 315 | # cityDist[1:2,3] == city.dist.unique[1+1:2] 316 | # cityDist[1:4,5] == city.dist.unique[6+1:4] 317 | # cityDist[1:5,6] == city.dist.unique[10+1:5] 318 | ## Sanity checks complete 319 | ``` 320 | 321 | There is no evident correlation between the pairwise residual differences and the pairwise city distances. 322 | 323 | ```{r calculate_correlation_between_geo_distance_and_residual_diff} 324 | cor((res.dist.unique), city.dist.unique) 325 | ``` 326 | 327 | Here we check whether the peak times of inflow from Wuhan correlate with the residuals. 328 | The peak inflow is calculated for the period from 11 January 2020 to 23 January 2020. 329 | 11 January is 15 days before the Chinese New Year, while 23 January is the day of the Wuhan shutdown. 330 | 331 | ```{r} 332 | all(covid2019.df$Code == covid2019v2.df$Code) 333 | par(mfrow = c(1,2)) 334 | plot(x = covid2019v2.df$peak_time2, 335 | y = yfc.resp.date.lm2.res) 336 | ``` 337 | 338 | We do not find evident correlation between the residuals and the peak times of inflow from Wuhan. 339 | 340 | ```{r} 341 | cor(x = covid2019v2.df$peak_time2, 342 | y = yfc.resp.date.lm2.res, use="pairwise.complete.obs") 343 | ``` 344 | 345 | 346 | 347 | ## Check for temporal correlation in the residuals 348 | 349 | Here we check whether there is some temporal autocorrelation in the data. 350 | To this end we evaluate the strength of evidence for the association between the mean residuals with arrival time on day j and the residuals with arrival time on day j+1.
351 | 352 | ```{r Calculate_the_mean_residual_value_for_each_day} 353 | ## Get the unique arrival times 354 | arr.time.level = sort(unique(covid2019.df$arr.time)) 355 | 356 | ## Calculate the mean residual for each arrival time value 357 | res.mean.by.time = vector(length = length(arr.time.level)) 358 | for(i in 1:length(arr.time.level)){ 359 | # get the residual values for a given arrival time 360 | res.per.arrT = yfc.resp.date.lm2.res[covid2019.df$arr.time == arr.time.level[i]] 361 | res.mean.by.time[i] = mean(res.per.arrT) 362 | } 363 | names(res.mean.by.time) = arr.time.level 364 | ``` 365 | 366 | ```{r plot_arr_time_res} 367 | plot(covid2019.df$arr.time, 368 | yfc.resp.date.lm2.res, 369 | xlab = "Time", ylab = "Residuals") 370 | lines(x = arr.time.level, y = res.mean.by.time, 371 | col = "red", lwd = 2) 372 | ``` 373 | 374 | 375 | ```{r get_valid_prev_days} 376 | arr.time.level.prev = arr.time.level[-length(arr.time.level)] 377 | arr.time.level.prev 378 | names(arr.time.level.prev) = arr.time.level[-1] 379 | ``` 380 | 381 | The earliest arrival time is 20 January. 382 | So the residuals with this arrival time will not have a covariate value. 383 | 384 | ```{r identify_the_imediate_previous_date} 385 | prev.day = arr.time.level.prev[as.character(covid2019.df$arr.time)] 386 | table(covid2019.df$arr.time - prev.day, useNA = "always") 387 | 388 | # Sanity check 389 | # table(covid2019.df$arr.time - 1)[-1] == table(prev.day) 390 | ``` 391 | 392 | There is no evidence of an association between the mean residuals with arrival time on day j and the residuals with arrival time on day j + 1. 393 | 394 | ```{r determine_Time_autocorrelation} 395 | prev.mean = res.mean.by.time[as.character(prev.day)] 396 | summary(lm(yfc.resp.date.lm2.res ~ prev.mean)) 397 | ``` 398 | 399 | ## Check the normality assumption 400 | 401 | The distribution of the residuals is fairly symmetric, albeit with a longer right tail.
402 | As a result, we see moderate departure from normality in the Q-Q plots. 403 | 404 | ```{r qq_plot} 405 | #pdf(file = "/Users/chwu/Documents/research/nCov-2019_TCM/logLinearNormality.pdf", 406 | # width = 9, height = 4.5) 407 | par(mfrow = c(1,2), mar = c(5,4,1,2)+0.2) 408 | 409 | hist(yfc.resp.date.lm2.res, 410 | xlab = "Residuals", main = "", 411 | nclass = 20, prob = T) 412 | yfc.resp.date.lm2.res2 = 413 | yfc.resp.date.lm2.res[which(yfc.resp.date.lm2.res < abs(min(yfc.resp.date.lm2.res)))] 414 | normFitDens = dnorm(-100:100/10, mean = -0.5642705, sd = 2.12) 415 | lines(-100:100/10, normFitDens) 416 | 417 | qqnorm(yfc.resp.date.lm2.res, main ="", col = "#00000077", pch = 16) 418 | qqline(yfc.resp.date.lm2.res, col="red") 419 | #dev.off() 420 | 421 | ``` 422 | 423 | The Shapiro-Wilk test provides evidence for departure from normality in the errors. 424 | 425 | ```{r yfc.resp.date.lm2_shapiro_test} 426 | shapiro.test(yfc.resp.date.lm2.res) 427 | ``` 428 | 429 | # Bootstrap analysis 430 | 431 | By the central limit theorem, the moderate departure from normality should not be a problem. 432 | However, just to be safe, we evaluate the bootstrap estimates of the regression coefficients, 95\% CIs and p-values. 433 | The bootstrap estimates do not assume a parametric distribution for the errors and are therefore a suitable alternative when departure from normality would be a problem. 434 | 435 | ## Simulate bootstrap replicates 436 | 437 | We simulated 10000 bootstrap replicates. 438 | 439 | 440 | ```{r get_boot_strap_rep} 441 | set.seed(777) 442 | # bootstrapping with 10000 replications 443 | results <- boot(data=covid2019.df, statistic=bs, 444 | R=10000, formula=log(std.sevendays.cucase) ~ new.arr.time + 445 | Bus.resp + new.Bus.date + 446 | Enter.resp + new.Enter.date.cat) 447 | ``` 448 | 449 | The bias is generally very small compared to the coefficients, which indicates the bootstrap estimates are very similar to the least squares estimates above.
450 | This confirms that the moderate departure from normality is not an issue in this case. 451 | 452 | 453 | ```{r bootstrap_results} 454 | # view results 455 | print(results) 456 | ``` 457 | 458 | The bootstrap estimates for the coefficients are 459 | 460 | ```{r bootstrap_estimate} 461 | round(colMeans(results$t), 2) 462 | ``` 463 | 464 | ## Normality of the bootstrap statistics 465 | 466 | ### Intercept and the adjusting variable 467 | 468 | ```{r normality_of_bootstrap_statistics_of_intercept_and_adjusted_variables} 469 | par(mfrow = c(2,2), mar = c(5, 5, 2, 2) + 0.2) 470 | hist(results$t[,1], nclass = 50, prob = T, 471 | xlab = "bootstrap replicates", main = "Intercept") 472 | qqnorm(results$t[,1], main = "Intercept", ylab = "Sampled quantiles\n(bootstrap)") 473 | 474 | hist(results$t[,2], nclass = 50, prob = T, 475 | xlab = "bootstrap replicates", main = "Arrival time") 476 | qqnorm(results$t[,2], main = "Arrival time", ylab = "Sampled quantiles\n(bootstrap)") 477 | ``` 478 | 479 | ### Transmission control measure variables 480 | 481 | ```{r normality_of_bootstrap_statistics_of_TCM_variables} 482 | par(mfrow = c(2,4), mar = c(5, 5, 2, 2) + 0.2) 483 | hist(results$t[,3], nclass = 50, prob = T, 484 | xlab = "bootstrap replicates", main = expression(M[S])) 485 | qqnorm(results$t[,3], main = expression(M[S]), 486 | ylab = "Sampled quantiles\n(bootstrap)") 487 | qqline(results$t[,3], col = "red") 488 | 489 | hist(results$t[,4], nclass = 50, prob = T, 490 | xlab = "bootstrap replicates", main = expression("T"[S])) 491 | qqnorm(results$t[,4], main = expression("T"[S]), 492 | ylab = "Sampled quantiles\n(bootstrap)") 493 | qqline(results$t[,4], col = "red") 494 | 495 | hist(results$t[,5], nclass = 50, prob = T, 496 | xlab = "bootstrap replicates", main = expression(M[B])) 497 | qqnorm(results$t[,5], main = expression(M[B]), 498 | ylab = "Sampled quantiles\n(bootstrap)") 499 | qqline(results$t[,5], col = "red") 500 | 501 | hist(results$t[,6], nclass = 50, prob = T, 502 |
xlab = "bootstrap replicates", main = expression("T"[B])) 503 | qqnorm(results$t[,6], main = expression("T"[B]), 504 | ylab = "Sampled quantiles\n(bootstrap)") 505 | qqline(results$t[,6], col = "red") 506 | ``` 507 | 508 | ## Confidence intervals 509 | 510 | The confidence intervals calculated using the adjusted bootstrap percentile (BCa) method for each coefficient are given by 511 | 512 | ```{r bootstrap_CI} 513 | # get 95% confidence intervals 514 | boot.ci.results.tab = sapply(c(1:6), 515 | function(varIndex = NULL){ 516 | boot.ci(results, type = "bca", index = varIndex)$bca 517 | }) 518 | boot.ci.tab = t(boot.ci.results.tab)[,4:5] 519 | rownames(boot.ci.tab) = names(coef(yfc.resp.date.lm2)) 520 | colnames(boot.ci.tab) = c("95% lower bound", "95% upper bound") 521 | round(boot.ci.tab,2) 522 | ``` 523 | 524 | ## P-values suggested by the confidence intervals 525 | 526 | The p-value is defined as 1 minus the highest confidence level that produces a confidence interval excluding 0. 527 | While we might be able to assume that the bootstrap statistics follow a normal distribution, this method is chosen so that the p-values and confidence intervals are consistent. 528 | The p-values are calculated with sufficient precision to round to 2 decimal places.
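The CI-inversion idea described above can be illustrated on simulated data. The sketch below is a toy example (a bootstrapped sample mean on simulated draws, not the study data or the `getSgnChgIndex` helper): it searches a grid of confidence levels for the highest level whose BCa interval still excludes 0, and reports 1 minus that level as the approximate p-value.

```r
library(boot)

set.seed(42)
## Toy data: 50 draws whose true mean (0.5) is non-zero.
x <- rnorm(50, mean = 0.5)
b <- boot(data = x, statistic = function(d, i) mean(d[i]), R = 2000)

## For each confidence level on the grid, check whether the
## BCa interval (elements 4:5 of $bca) excludes 0.
conf.levels <- seq(0.90, 0.999, by = 0.001)
excludes0 <- sapply(conf.levels, function(a) {
  ci <- boot.ci(b, conf = a, type = "bca")$bca[4:5]
  ci[1] > 0 || ci[2] < 0
})

## Approximate p-value: 1 minus the highest level still excluding 0.
p.boot <- 1 - max(conf.levels[excludes0])
p.boot
```

The grid resolution (0.001 here) sets the precision of the resulting p-value, which is why the main analysis narrows the grid per coefficient.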
529 | 530 | ### Intercept 531 | 532 | ```{r pval_by_boostrap_ci_intercept} 533 | a1 = c(500:520)/1000 534 | 535 | ci1 = sapply(a1, function(a){ 536 | boot.ci(results, type= c("bca"), 537 | conf = a, index=1)$bca 538 | }) 539 | 540 | round(1 - a1[getSgnChgIndex(ci = ci1, a = a1)],2) 541 | ``` 542 | 543 | ### Arrival time 544 | 545 | ```{r pval_by_boostrap_ci_arrival_time} 546 | boot.ci(results, type= c("bca"), 547 | conf = 0.999, index=2)$bca 548 | 1 - 0.999 549 | ``` 550 | 551 | ### Binary variable for suspension of the intra-city public transport 552 | 553 | ```{r pval_by_boostrap_ci_M_S} 554 | a3 = c(9980:9990)/10000 555 | ci3 = sapply(a3, function(a){ 556 | boot.ci(results, type= c("bca"), 557 | conf = a, index=3)$bca 558 | }) 559 | #ci3[,c(3:4)] 560 | #1-a3[3] 561 | 562 | 1 - a3[getSgnChgIndex(ci = ci3, a = a3)] 563 | ``` 564 | 565 | ### Timing of suspension of the intra-city public transport 566 | 567 | ```{r pval_by_boostrap_ci_T_S} 568 | a4 = c(9970:9980)/10000 569 | ci4 = sapply(a4, function(a){ 570 | boot.ci(results, type= c("bca"), 571 | conf = a, index=4)$bca 572 | }) 573 | 574 | 1 - a4[getSgnChgIndex(ci = ci4, a = a4)] 575 | 576 | ``` 577 | 578 | ### Binary variable for closure of entertainment venues and banning of public gatherings 579 | 580 | ```{r pval_by_boostrap_ci_M_B} 581 | a5 = c(9900:9910)/10000 582 | 583 | ci5 = sapply(a5, function(a){ 584 | boot.ci(results, type= c("bca"), 585 | conf = a, index=5)$bca 586 | }) 587 | 588 | 1 - a5[getSgnChgIndex(ci = ci5, a = a5)] 589 | ``` 590 | 591 | ### Timing of closure of entertainment venues and banning of public gatherings 592 | 593 | ```{r pval_by_boostrap_ci_T_B} 594 | a6 = c(990:999)/1000 595 | 596 | ci6 = sapply(a6, function(a){ 597 | boot.ci(results, type= c("bca"), 598 | conf = a, index=6)$bca 599 | }) 600 | 601 | 1 - a6[getSgnChgIndex(ci = ci6, a = a6)] 602 | ``` 603 | -------------------------------------------------------------------------------- /code/TCM_log_linear_reg_model.pdf: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/huaiyutian/COVID-19_TCM-50d_China/cc3257fc99335922a8e9b8668eea9e10fbb0e752/code/TCM_log_linear_reg_model.pdf -------------------------------------------------------------------------------- /code/TCM_poisson_reg_model.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: Poisson regression analysis to investigate the associations between transmission 3 | control measures and the number of reported cases in the first seven days of the 4 | outbreaks in cities 5 | author: "Chieh-Hsi Wu" 6 | output: 7 | pdf_document: default 8 | html_notebook: default 9 | --- 10 | 11 | Here we investigate the associations between transmission control measures and the number of reported cases in the first week of the outbreaks in cities. 12 | 13 | 14 | ```{r read_in_data} 15 | library(R330) 16 | library(readxl) 17 | library(car) 18 | library(lmtest) 19 | 20 | covid2019FilePath = ".../data/nCoV-data.xlsx" 21 | covid2019.df = read_excel(path = covid2019FilePath, sheet = "3resp-7days") 22 | 23 | ``` 24 | 25 | # Processing the data 26 | 27 | Some of the cities have an inflow from Wuhan recorded as 0, which causes calculations to run into an error when we use it as an offset variable. 28 | To resolve this issue, 0 values are changed to $10^{-6}$, which is equivalent to only one person arriving in a city from Wuhan. 29 | 30 | ```{r process_totalflow} 31 | covid2019.df$new.totalflow_million = covid2019.df$totalflow_million 32 | covid2019.df$new.totalflow_million[covid2019.df$totalflow_million == 0] = 1e-6 33 | ``` 34 | 35 | The arrival time is processed so that 31 December 2019 is coded as day 0. 36 | 37 | ```{r process_arr.time} 38 | covid2019.df$new.arr.time = covid2019.df$arr.time - 1 39 | ``` 40 | 41 | The timing of suspending intra-city public transport is processed so that 31 December 2019 is coded as day 0.
42 | 43 | ```{r process_TTCM_bus} 44 | bus.resp.tab = table(covid2019.df$Bus.resp) 45 | bus.resp.tab 46 | bus.date.tab1 = table(covid2019.df$Bus.date[which(covid2019.df$Bus.resp==1)]) 47 | bus.date.tab1 48 | 49 | covid2019.df$new.Bus.date = covid2019.df$Bus.date - 1 50 | new.bus.date.tab1 = table(covid2019.df$new.Bus.date[which(covid2019.df$Bus.resp==1)]) 51 | new.bus.date.tab1 52 | 53 | covid2019.df$new.Bus.date[which(covid2019.df$Bus.resp==0)] = 0 54 | new.bus.date.tab = table(covid2019.df$new.Bus.date) 55 | new.bus.date.tab 56 | 57 | ## Sanity check 58 | ## Codes below should return (0) 59 | # bus.date.tab1 - new.bus.date.tab1 60 | # (as.numeric(names(bus.date.tab1)) - 1) - as.numeric(names(new.bus.date.tab1)) 61 | # bus.date.tab1 - new.bus.date.tab[-1] 62 | # (as.numeric(names(bus.date.tab1)) - 1) - as.numeric(names(new.bus.date.tab[-1])) 63 | # bus.resp.tab["0"] - new.bus.date.tab["0"] 64 | ## Sanity check complete 65 | ``` 66 | 67 | The timing of suspending inter-city passenger traffic is processed so that 31 December 2019 is coded as day 0. 
68 | 69 | ```{r process_TTCM_railway} 70 | rail.resp.tab = table(covid2019.df$Railway.resp) 71 | rail.resp.tab 72 | rail.date.tab1 = table(covid2019.df$Railway.date[which(covid2019.df$Railway.resp == 1)]) 73 | rail.date.tab1 74 | 75 | 76 | covid2019.df$new.Railway.date = covid2019.df$Railway.date - 1 77 | new.rail.date.tab1 = table(covid2019.df$new.Railway.date[which(covid2019.df$Railway.resp == 1)]) 78 | new.rail.date.tab1 79 | 80 | covid2019.df$new.Railway.date[which(covid2019.df$Railway.resp == 0)] = 0 81 | new.rail.date.tab = table(covid2019.df$new.Railway.date) 82 | new.rail.date.tab 83 | 84 | ## Sanity check 85 | ## Code below should return 0(s) 86 | # rail.date.tab1 - new.rail.date.tab1 87 | # (as.numeric(names(rail.date.tab1)) - 1) - as.numeric(names(new.rail.date.tab1)) 88 | # rail.date.tab1 - new.rail.date.tab[-1] 89 | # (as.numeric(names(rail.date.tab1)) - 1) - as.numeric(names(new.rail.date.tab[-1])) 90 | # rail.resp.tab["0"] - new.rail.date.tab["0"] 91 | ## Sanity check complete 92 | ``` 93 | 94 | The timing of closing entertainment venues and banning public gatherings is processed so that 31 December 2019 is coded as day 0.
95 | 96 | ```{r process_TTCM_enter} 97 | enter.resp.tab = table(covid2019.df$Enter.resp) 98 | enter.resp.tab 99 | enter.date.tab1 = table(covid2019.df$Enter.date[which(covid2019.df$Enter.resp == 1)]) 100 | enter.date.tab1 101 | 102 | covid2019.df$new.Enter.date = covid2019.df$Enter.date - 1 103 | new.enter.date.tab1 = table(covid2019.df$new.Enter.date[which(covid2019.df$Enter.resp == 1)]) 104 | new.enter.date.tab1 105 | 106 | covid2019.df$new.Enter.date[which(covid2019.df$Enter.resp == 0)] = 0 107 | new.enter.date.tab = table(covid2019.df$new.Enter.date) 108 | new.enter.date.tab 109 | 110 | ## Sanity check 111 | ## Code below should return 0(s) 112 | # enter.date.tab1 - new.enter.date.tab1 113 | # (as.numeric(names(enter.date.tab1)) - 1) - as.numeric(names(new.enter.date.tab1)) 114 | # enter.date.tab1 - new.enter.date.tab[-1] 115 | # (as.numeric(names(enter.date.tab1)) - 1) - as.numeric(names(new.enter.date.tab[-1])) 116 | # enter.resp.tab["0"] - new.enter.date.tab["0"] 117 | ## Sanity check complete 118 | ``` 119 | 120 | # Regression analysis 121 | 122 | ## Poisson regression 123 | 124 | The analysis using a Poisson regression model is presented below. 125 | 126 | ```{r glm_add1} 127 | yfc.resp.date.glm1 = glm(sevendays.cucase ~ new.arr.time + log10.Dis.WH+ 128 | Bus.resp + new.Bus.date + 129 | Railway.resp + new.Railway.date + 130 | Enter.resp + new.Enter.date + 131 | offset(log(Pop_million_2018*new.totalflow_million)), 132 | family = "poisson", 133 | data = covid2019.df) 134 | summary(yfc.resp.date.glm1) 135 | ``` 136 | 137 | There appear to be three influential points. 138 | 139 | ```{r glm_add1_infl_pts, fig.height=5, fig.width=5} 140 | influencePlot(yfc.resp.date.glm1) 141 | ``` 142 | 143 | Below is a visualisation of the influential-point indicators. 144 | 145 | ```{r glm_add1_infl_plots, fig.height=6.5, fig.width=6} 146 | influenceplots(yfc.resp.date.glm1) 147 | ``` 148 | 149 | Removing the three influential points does not affect the conclusions.
150 | 151 | ```{r glm_add2} 152 | yfc.resp.date.glm2 = glm(sevendays.cucase ~ new.arr.time + log10.Dis.WH+ 153 | Bus.resp + new.Bus.date + 154 | Railway.resp + new.Railway.date + 155 | Enter.resp + new.Enter.date + 156 | offset(log(Pop_million_2018*new.totalflow_million)), 157 | family = "poisson", 158 | data = covid2019.df[-c(1, 150, 157),]) 159 | summary(yfc.resp.date.glm2) 160 | ``` 161 | 162 | There appear to be more observations with outstanding Cook's distances. 163 | 164 | ```{r glm_add2_infl_pts, fig.height=5, fig.width=5} 165 | influencePlot(yfc.resp.date.glm2) 166 | ``` 167 | 168 | The plots below show that the outstanding Cook's distances are substantially far apart from the rest. 169 | 170 | ```{r glm_add2_infl_plots, fig.height=6.5, fig.width=6} 171 | influenceplots(yfc.resp.date.glm2) 172 | ``` 173 | 174 | The conclusions are not affected by removing the observations with the outstanding Cook's distances. 175 | 176 | 177 | ```{r glm_add} 178 | yfc.resp.date.glm = glm(sevendays.cucase ~ new.arr.time + log10.Dis.WH+ 179 | Bus.resp + new.Bus.date + 180 | Railway.resp + new.Railway.date + 181 | Enter.resp + new.Enter.date + 182 | offset(log(Pop_million_2018*new.totalflow_million)), 183 | family = "poisson", 184 | data = covid2019.df[-c(1, 7, 150, 157, 226),]) 185 | summary(yfc.resp.date.glm) 186 | ``` 187 | 188 | There are still several data points with large Cook's distances. 189 | However, there are a large number of these and, by standard practice, it would be too many to remove. 190 | In general, we should only remove one or two influential points, as the aim is to check that the observed results are not purely driven by one or two points. 191 | If too many points are removed, this defies the purpose of the check.
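The influence diagnostics used throughout this section rest on leverages (hat values) and Cook's distances. As a hedged illustration of what these quantities measure, here is a from-scratch numpy sketch for an ordinary least-squares fit with one planted outlier; the R functions used above (`influencePlot`, `influenceplots`) compute the analogous quantities for the fitted GLMs:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 2                                   # observations, coefficients
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)
y[0] += 25.0                                   # plant one influential point

beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # least-squares fit
resid = y - X @ beta
h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)  # leverages (hat values)
mse = resid @ resid / (n - p)

# Cook's distance: D_i = e_i^2 / (p * MSE) * h_i / (1 - h_i)^2
D = resid**2 / (p * mse) * h / (1 - h) ** 2
# The planted point should dominate the Cook's distances.
```

Points combining a large residual with high leverage are exactly the ones flagged in the influence plots, which is why removing only the one or two most extreme of them is a meaningful robustness check.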
192 | 193 | ```{r glm_add_infl, fig.height=6.5, fig.width=6} 194 | influenceplots(yfc.resp.date.glm) 195 | ``` 196 | 197 | ```{r ci} 198 | yfc.resp.date.glm.est = coef(summary(yfc.resp.date.glm)) 199 | yfc.resp.date.glm.est.tab = cbind(yfc.resp.date.glm.est[,"Estimate"], 200 | yfc.resp.date.glm.est[,"Estimate"]-1.96*yfc.resp.date.glm.est[,"Std. Error"], 201 | yfc.resp.date.glm.est[,"Estimate"]+1.96*yfc.resp.date.glm.est[,"Std. Error"]) 202 | colnames(yfc.resp.date.glm.est.tab) = c("Estimate", "Lower 95% CI", "Upper 95% CI") 203 | round(yfc.resp.date.glm.est.tab, 2) 204 | ``` 205 | 206 | If the model is correct, the Pearson residuals should have constant spread across fitted values. 207 | However, the plot below clearly shows that this is not the case---there is evident heteroscedasticity in the Pearson residuals. 208 | 209 | 210 | ```{r glm_pearson_res, fig.height= 4.5, fig.width=5.5} 211 | plot((fitted(yfc.resp.date.glm)), 212 | residuals(yfc.resp.date.glm, type= "pearson"), 213 | xlab = "Fitted values", ylab = "Pearson residuals") 214 | ``` 215 | 216 | Due to these issues, we report the parameter estimates from the regression with the influential points excluded, as that gives us the most conservative estimates. 217 | 218 | ## Quasi-Poisson regression 219 | 220 | 221 | A quasi-Poisson regression model is fitted to see whether it can rectify this problem. 222 | This model uses a quasi-likelihood and suggests that there is no evidence that any of the coefficients are significant. 223 | This is dubious, as we demonstrate in the next step.
224 | 225 | ```{r yfc.resp.date.qp.glm} 226 | yfc.resp.date.qp.glm = glm(sevendays.cucase ~ new.arr.time + log10.Dis.WH+ 227 | Bus.resp + new.Bus.date + 228 | Railway.resp + new.Railway.date + 229 | Enter.resp + new.Enter.date + 230 | offset(log(Pop_million_2018*new.totalflow_million)), 231 | family = "quasipoisson", 232 | data = covid2019.df[-c(1, 7, 150, 157, 226),]) 233 | summary(yfc.resp.date.qp.glm) 234 | ``` 235 | 236 | We consider the arrival time, and plot it against the logarithm of the number of reported cases divided by the population size and the inflow from Wuhan. 237 | It is apparent that the arrival time increases as the number of reported cases increases. 238 | 239 | ```{r arrT_cumucase_plot, fig.height= 4.5, fig.width=5.5} 240 | par(mar = c(5, 6, 2, 1) + 0.2) 241 | log.std.seven.cucase = log(covid2019.df$sevendays.cucase/ 242 | (covid2019.df$Pop_million_2018*covid2019.df$new.totalflow_million)) 243 | plot(covid2019.df$arr.time,log.std.seven.cucase, 244 | xlab = "Arrival time", 245 | ylab = "The number of cases reported in the\nfirst seven days of the outbreaks in cities") 246 | ``` 247 | 248 | However, the quasi-Poisson regression again indicates no evidence for the coefficient of arrival time. 249 | This could be because the heteroscedasticity in the Pearson residuals is so severe that the quasi-Poisson regression over-compensates and is unable to detect any signal in the data efficiently. 250 | Furthermore, the plot below shows that the quasi-Poisson model provides no indication that the heteroscedasticity in the Pearson residuals has been rectified. 251 | (The Pearson residuals in the plot are from the full quasi-Poisson model including all the control-measure variables.) 252 | Therefore, we are also uncertain about the reliability of the quasi-Poisson regression analysis.
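For a Poisson model, the Pearson residual standardises each observation by the model's own standard deviation, $r_i = (y_i - \mu_i)/\sqrt{\mu_i}$, and the quasi-Poisson fit inflates all standard errors by an estimated dispersion $\hat\phi = \sum_i r_i^2/(n-p)$, which is why severe overdispersion can wash out every coefficient at once. A small numpy sketch of both quantities on simulated overdispersed counts (illustrative values, not the study data, and `mu` here is a stand-in for fitted means):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 200, 2
mu = np.exp(0.5 + 0.1 * rng.normal(size=n))   # stand-in fitted Poisson means

# Overdispersed counts: Poisson rates jittered by a gamma multiplier,
# so Var(y) > E(y), unlike a pure Poisson.
y = rng.poisson(mu * rng.gamma(shape=2.0, scale=0.5, size=n))

pearson = (y - mu) / np.sqrt(mu)              # Pearson residuals
phi_hat = (pearson**2).sum() / (n - p)        # dispersion estimate

# Pure Poisson data would give phi_hat near 1; overdispersion pushes it
# above 1, and quasi-Poisson multiplies the squared standard errors by
# this factor, widening every confidence interval.
```

The larger the estimated dispersion, the more the quasi-Poisson fit discounts apparent signal, which matches the behaviour described above.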
253 | 254 | ```{r yfc_arr_time_qp_glm, fig.height= 5, fig.width=5.5} 255 | yfc.arr.time.qp.glm = glm(sevendays.cucase ~ new.arr.time + 256 | offset(log(Pop_million_2018*new.totalflow_million)), 257 | family = "quasipoisson", 258 | data = covid2019.df[-c(1, 7, 150, 157, 226),]) 259 | summary(yfc.arr.time.qp.glm) 260 | ``` 261 | 262 | 263 | ```{r glm_arrTqp_pearson_res, fig.height = 4.5, fig.width = 5.5} 264 | par(mar = c(5, 5, 5, 2) + 0.2) 265 | plot((fitted(yfc.resp.date.qp.glm)), 266 | residuals(yfc.resp.date.qp.glm, type= "pearson"), 267 | xlab = "Fitted values", 268 | ylab = "Pearson residual") 269 | ``` 270 | 271 | ## Negative binomial regression 272 | 273 | A negative binomial regression model is fitted to see whether it can rectify the heteroscedasticity in the Pearson residuals. 274 | 275 | ```{r yfc.resp.date.nb.glm} 276 | library(MASS) 277 | yfc.resp.date.nb.glm = glm.nb(sevendays.cucase ~ new.arr.time + log10.Dis.WH+ 278 | Bus.resp + new.Bus.date + 279 | Railway.resp + new.Railway.date + 280 | Enter.resp + new.Enter.date + 281 | offset(log10(Pop_million_2018*new.totalflow_million)), 282 | data = covid2019.df[-c(1, 7, 150, 157, 226),]) 283 | summary(yfc.resp.date.nb.glm) 284 | 285 | ``` 286 | 287 | The plot below shows that the Pearson residuals of the negative binomial regression still display heteroscedasticity. 288 | Therefore, we are also uncertain about the reliability of the results from the negative binomial regression.
289 | 290 | ```{r glm_nb_pearson_res, fig.height = 4.5, fig.width = 5.5} 291 | par(mar = c(5, 5, 5, 2) + 0.2) 292 | plot((fitted(yfc.resp.date.nb.glm)), 293 | residuals(yfc.resp.date.nb.glm, type= "pearson"), 294 | xlab = "Fitted values", 295 | ylab = "Pearson residual") 296 | ``` -------------------------------------------------------------------------------- /code/TCM_poisson_reg_model.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/huaiyutian/COVID-19_TCM-50d_China/cc3257fc99335922a8e9b8668eea9e10fbb0e752/code/TCM_poisson_reg_model.pdf -------------------------------------------------------------------------------- /code/f1.m: -------------------------------------------------------------------------------- 1 | clear model data params options 2 | load data_model.mat 3 | load name_province.mat 4 | 5 | %% 6 | % The model sum-of-squares function is 7 | % given in the model structure. 8 | model.ssfun = @f2; 9 | 10 | %% 11 | % All parameters are constrained to be positive. The initial 12 | % number of infected cases in Wuhan is also unknown and is treated as an extra parameter. 13 | params1 = { 14 | 15 | 16 | {'Ro', 2.2, 1.2,3.2,2.2,1} %basic reproduction number 17 | {'sigma', 4.5, 4.1,7,4.7,0.51} %incubation period 18 | {'report', 0.002,0,1} % report rate 19 | {'controlh',0.97,0,1,0.8,0.1} %control intensity 20 | {'controlm',0.64, 0,1,0.6,0.1} 21 | {'controllow',0.33,0,1,0.3,0.1} 22 | {'control2nd',0.012,0,1,0.05,0.05}%2nd control 23 | {'gamma', 5.29, 1,15,5,3}%time period from onset to isolation 24 | {'Iw0', 2,1,11}%initial infected cases in Wuhan 25 | 26 | 27 | 28 | 29 | }; 30 | 31 | %% 32 | % We assume we have at least some prior information on the 33 | % repeatability of the observations and assign a rather non-informative 34 | % prior for the residual variances of the observed states. The default 35 | % prior distribution is sigma2 ~ invchisq(S20,N0), the inverse chi 36 | % squared distribution (see for example Gelman et al.). 37 | 38 | 39 | model.S20 = [4]; 40 | model.N0 = [1]; 41 | 42 | %% 43 | % First generate an initial chain. 44 | options.nsimu = 50000; 45 | options.stats = 1; 46 | [results, chain, s2chain]= mcmcrun(model,data,params1,options); 47 | 48 | 49 | % % %%generate chain after burn-in 50 | options.nsimu = 200000; 51 | options.stats = 1; 52 | [results2, chain2, s2chain2] = mcmcrun(model,data,params1,options,results); 53 | 54 | %% 55 | % Chain plots should reveal that the chain has converged and we can 56 | % % use the results for estimation and predictive inference. 57 | figure 58 | mcmcplot(chain2,[],results2); %,'pairs' 59 | figure 60 | mcmcplot(chain2,[],results2,'denspanel',2); 61 | 62 | 63 | %% 64 | % Function |chainstats| calculates mean and std from the chain and 65 | % estimates the Monte Carlo error of the estimates. Number |tau| is 66 | % the integrated autocorrelation time and |geweke| is a simple test 67 | % for a null hypothesis that the chain has converged. 68 | 69 | results2.sstype = 1; % needed for mcmcpred and sqrt transformation 70 | 71 | chainstats(chain2,results2) 72 | 73 | 74 | %% 75 | % In order to use the |mcmcpred| function we need a 76 | % function |modelfun| with input arguments given as 77 | % |modelfun(xdata,theta)|. We construct this as an anonymous function. 78 | 79 | modelfun = @(d,th) f3(d(:,1),th,th(end),d); 80 | 81 | 82 | % We sample 1000 parameter realizations from |chain| and |s2chain| 83 | % and calculate the predictive plots.
84 | nsample = 1000; 85 | results2.sstype = 1; 86 | out = mcmcpred(results2,chain2,s2chain2,data.xdata,modelfun,nsample);%data.ydata-->data 87 | 88 | figure 89 | mcmcpredplot(out,out.data,data,C); 90 | 91 | -------------------------------------------------------------------------------- /code/f2.m: -------------------------------------------------------------------------------- 1 | function ss = f2(theta,data) 2 | % SEIR model sum-of-squares function (on the square-root scale) 3 | 4 | time = data.ydata(:,1); 5 | ydata = data.ydata(:,2:32); 6 | xdata = data.xdata; 7 | 8 | % the last parameter is the initial state 9 | y0 = theta(end); 10 | 11 | ymodel = f3(time,theta,y0,xdata); 12 | 13 | ss =sum((sqrt(ymodel) - sqrt(ydata)).^2); 14 | -------------------------------------------------------------------------------- /code/f3.m: -------------------------------------------------------------------------------- 1 | function y=f3(time,theta,y0,xdata) 2 | 3 | y = f4(time,y0,theta,xdata); 4 | 5 | 6 | -------------------------------------------------------------------------------- /code/f4.m: -------------------------------------------------------------------------------- 1 | function ydot = f4(t,y,theta,xdata) 2 | 3 | 4 | gamma1(1:31)=theta(8); %time period from onset to isolation 5 | gamma=1./gamma1; 6 | sigma=1/theta(2);%Incubation period 7 | beta1=theta(1).*gamma; % transmission rate 8 | report=theta(3); %report rate 9 | control1(1:31)=theta(5);% control intensity 10 | control1([30,4,24,18,22,19,10,9,28,27,8,21,16])=theta(6); 11 | control1([31,29,25,11,23])=theta(4); 12 | control2nd(1:31)=theta(7);%second control 13 | 14 | 15 | Pwuhan=10892900;%population of Wuhan 16 | flow=xdata(ceil(t),[2:31,96]);%matrix, 1 column = 1 province 17 | HRwuhan=xdata(ceil(t),32);%Wuhan reported daily cases 18 | HRwuhan(1:5)=theta(9);% estimated initial infected cases in Wuhan from 1/11-1/15 19 | pop=xdata(ceil(1:31),33);%population of other provinces 20 | importI=HRwuhan(1)*flow(1,:)/report/Pwuhan;%imported cases 21 |
timelevel1=xdata(ceil(t),[35:64,97]); 22 | time2nd=xdata(ceil(t),[66:95,98]); 23 | 24 | beta(1:31)=beta1; 25 | HE(1,1:31) = importI; 26 | HI(1,1:31) = 0; 27 | HS(1,1:31) = pop'-HE(1,:)-HI(1,:); 28 | HR(1,1:31) = 0; 29 | I(1,1:31)=0; 30 | beta(1:31)=beta1; 31 | 32 | for i = 2:40 33 | 34 | %control since level 1 response 35 | beta(find(timelevel1(i,:)==1))=beta1(find(timelevel1(i,:)==1)).*control1(find(timelevel1(i,:)==1)); 36 | 37 | %%2nd control 38 | beta(find(time2nd(i,:)==1))=beta1(find(time2nd(i,:)==1)).*control1(find(time2nd(i,:)==1)).*control2nd(find(time2nd(i,:)==1)); 39 | 40 | %Imported Infected cases 41 | importI=HRwuhan(i)*flow(i,:)/report/Pwuhan;%1*30 42 | 43 | 44 | %SEIR model 45 | HS(i,:)=HS(i-1,:)-beta.*HI(i-1,:).*HS(i-1,:)./pop'; 46 | HE(i,:)=HE(i-1,:)+beta.*HI(i-1,:).*HS(i-1,:)./pop'+importI-sigma*HE(i-1,:); 47 | HI(i,:)=HI(i-1,:)+sigma*HE(i-1,:)-gamma.*HI(i-1,:); 48 | HR(i,:)=HR(i-1,:)+gamma.*HI(i-1,:); 49 | 50 | I(i,:)=gamma.*HI(i-1,:); 51 | 52 | end 53 | 54 | 55 | ydot=[I(:,:)]; 56 | 57 | -------------------------------------------------------------------------------- /code/f5.m: -------------------------------------------------------------------------------- 1 | %predict different scenarios 2 | load name_province.mat 3 | 4 | %predict without shutdown but with control 5 | load without_shutdown_with_control.mat %noshutcon 6 | modelfun = @(d,th) f3(d(:,1),th,th(end),d); 7 | nsample = 500; 8 | results2.sstype = 1; 9 | out1 = mcmcpred(results2,chain2,s2chain2,data.xdata,modelfun,nsample); 10 | figure 11 | mcmcpredplot(out1,out.data,data,C); 12 | 13 | %%%%%%predict with shutdown but without control 14 | load with_shutdown_without_control.mat 15 | modelfun = @(d,th) f3(d(:,1),th,th(end),d); 16 | nsample = 500; 17 | results2.sstype = 1; 18 | out2 = mcmcpred(results2,chain2,s2chain2,data.xdata,modelfun,nsample); 19 | 20 | %%%%%%% predict without shutdown and without control 21 | load without_shutdown_without_control.mat 22 | modelfun = @(d,th) 
f3(d(:,1),th,th(end),d); 23 | nsample = 500; 24 | results2.sstype = 1; 25 | out3 = mcmcpred(results2,chain2,s2chain2,data.xdata,modelfun,nsample); 26 | 27 | 28 | 29 | -------------------------------------------------------------------------------- /code/fit_arrvial_model.R: -------------------------------------------------------------------------------- 1 | rm(list = ls()) 2 | library(MASS) 3 | library(readxl) 4 | library(car) 5 | library(lmtest) # provides coeftest() and bptest() 6 | library(sandwich) # provides vcovHC() 7 | 8 | infected=read_xlsx(".../data/fit_arrival_model.xlsx",sheet="1") 9 | infectedW=infected[infected$arrivalday>=2,] 10 | 11 | step.model <- lm(arrivalday~lat+lon+log10(Pop2018)+ 12 | log10(Totalflow)+aftershutdown,data=infectedW) 13 | summary(step.model) 14 | 15 | ### Diagnosis ### 16 | par(mfrow = c(2,2)) 17 | plot(fitted(step.model), 18 | residuals(step.model, type = "pearson"), 19 | xlab = "Fitted values", ylab = "Pearson residuals") 20 | plot(fitted(step.model), 21 | residuals(step.model, type = "deviance"), 22 | xlab = "Fitted values", ylab = "Deviance residuals") 23 | influencePlot(step.model) 24 | fit1 <- lm(arrivalday~lat+lon+log10(Pop2018)+log10(Totalflow)+aftershutdown, 25 | data=infectedW[-c(64,72,255,257,258),]) 26 | 27 | summary(fit1) 28 | influencePlot(fit1) 29 | lmtest::bptest(fit1) 30 | coeftest(fit1, vcov = vcovHC(fit1)) 31 | qqnorm(fit1$residuals) 32 | qqline(fit1$residuals) 33 | shapiro.test(fit1$residuals) 34 | 35 | 36 | -------------------------------------------------------------------------------- /code/mcmcpredplot.m: -------------------------------------------------------------------------------- 1 | function h=mcmcpredplot(out,data,adddata,name) 2 | %MCMCPREDPLOT - predictive plot for mcmc results 3 | % Creates predictive figures for each batch in the data set using 4 | % mcmc chain. Needs input from the function mcmcpred.
5 | % Example: 6 | % out=mcmcpred(results,chain,s2chain,data,modelfun); 7 | % mcmcpredplot(out) 8 | % 9 | % If s2chain has been given to mcmcpred, then the plot shows 95% 10 | % probability limits for new observations and for model parameter 11 | % uncertainty. If s2chain is not used then the plot contains 50%, 12 | % 90%, 95%, and 99% predictive probability limits due to parameter uncertainty. 13 | 14 | % $Revision: 1.5 $ $Date: 2013/04/17 08:45:40 $ 15 | 16 | if nargin < 2 17 | data = out.data; 18 | end 19 | 20 | if nargin < 3 21 | adddata = 0; 22 | end 23 | 24 | nbatch = length(out.predlims); 25 | 26 | if ~iscell(data) 27 | d=data; data=[]; data{1}=d; clear d 28 | end 29 | 30 | np = size(out.predlims{1}{1},1); 31 | nn = (np+1)/2; % median 32 | np = nn-1; 33 | 34 | hh = zeros(nbatch,1); 35 | 36 | for i=1:nbatch 37 | if nbatch > 1; hh(i) = figure;else hh(1)=gcf; end; % create new figures 38 | plimi = out.predlims{i}; 39 | ny = size(plimi,2); 40 | 41 | datai = data{i}; 42 | 43 | if isnumeric(datai) 44 | time = datai(:,1); % time is the first column of data 45 | elseif isfield(datai,'ydata') 46 | time = datai.ydata(:,1); % first column of ydata 47 | elseif isfield(datai,'xdata') 48 | time = datai.xdata(:,1); % first column of xdata 49 | else 50 | error('dont know the x axis of the plots') 51 | end 52 | 53 | for j=1:ny 54 | dimc = [0.9 0.9 0.9]; % dimmest (lightest) color 55 | %yidanli changed to subplot the figure 2020/3/25 (new code line 57, original code line 56) 56 | %if ny>1;subplot(ny,1,j);end 57 | subplot(6,6,j) 58 | if ~isempty(out.obslims) 59 | fillyy(time,out.obslims{i}{j}(1,:),out.obslims{i}{j}(3,:),dimc); 60 | hold on 61 | dimc = [0.8,0.8,0.8]; 62 | end 63 | fillyy(time,plimi{j}(1,:),plimi{j}(2*nn-1,:),dimc); 64 | hold on 65 | for k=2:(nn-1) 66 | fillyy(time,plimi{j}(k,:),plimi{j}(2*nn-k,:),dimc.*0.9.^(k-1)); 67 | end 68 | plot(time,plimi{j}(nn,:),'-k'); 69 | %yidanli changed code to add real data in figures; original code is in 70 | %lines 72-74, changed code in lines 76-78
71 | 72 | % if adddata 73 | % plot(datai.ydata(:,1), datai.ydata(:,j+1),'sk'); 74 | % end 75 | 76 | if ~isempty(adddata) 77 | plot(adddata.ydata(:,1),adddata.ydata(:,j+1),'sk'); 78 | end 79 | 80 | %%add the real data in the figure yidanli changed on 2020/3/25 81 | 82 | hold off 83 | if nbatch > 1 84 | title(sprintf('Data set %d, y[%d]',i,j)); 85 | elseif ny > 1 86 | 87 | %yidanli changed code to produce the name of the province; original code 88 | %in line 90, new code in line 91 89 | 90 | %title(sprintf('y[%d]',j)); 91 | title(sprintf(char(name(j)),j),'FontSize',9,'Fontname','Arial'); 92 | end 93 | end 94 | 95 | end 96 | 97 | if nargout > 0 98 | h=hh; 99 | end 100 | -------------------------------------------------------------------------------- /data/data_fig_2.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/huaiyutian/COVID-19_TCM-50d_China/cc3257fc99335922a8e9b8668eea9e10fbb0e752/data/data_fig_2.xlsx -------------------------------------------------------------------------------- /data/data_fig_S1A.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/huaiyutian/COVID-19_TCM-50d_China/cc3257fc99335922a8e9b8668eea9e10fbb0e752/data/data_fig_S1A.xlsx -------------------------------------------------------------------------------- /data/data_fig_S2.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/huaiyutian/COVID-19_TCM-50d_China/cc3257fc99335922a8e9b8668eea9e10fbb0e752/data/data_fig_S2.xlsx -------------------------------------------------------------------------------- /data/data_fig_S4A.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/huaiyutian/COVID-19_TCM-50d_China/cc3257fc99335922a8e9b8668eea9e10fbb0e752/data/data_fig_S4A.xlsx
-------------------------------------------------------------------------------- /data/data_fig_S4B.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/huaiyutian/COVID-19_TCM-50d_China/cc3257fc99335922a8e9b8668eea9e10fbb0e752/data/data_fig_S4B.xlsx -------------------------------------------------------------------------------- /data/fit_arrival_model.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/huaiyutian/COVID-19_TCM-50d_China/cc3257fc99335922a8e9b8668eea9e10fbb0e752/data/fit_arrival_model.xlsx -------------------------------------------------------------------------------- /data/health commission.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/huaiyutian/COVID-19_TCM-50d_China/cc3257fc99335922a8e9b8668eea9e10fbb0e752/data/health commission.xlsx -------------------------------------------------------------------------------- /data/nCoV-data.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/huaiyutian/COVID-19_TCM-50d_China/cc3257fc99335922a8e9b8668eea9e10fbb0e752/data/nCoV-data.xlsx -------------------------------------------------------------------------------- /data/nCoV-data_0323.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/huaiyutian/COVID-19_TCM-50d_China/cc3257fc99335922a8e9b8668eea9e10fbb0e752/data/nCoV-data_0323.xlsx --------------------------------------------------------------------------------