To make the comparison fair, we manually compiled R, numpy, and scipy against Intel's Math Kernel Library (MKL), so that these R and Python packages perform their calculations using the same linear algebra library.
We performed two tests on the model above. Test 1 is a difference GMM and Test 2 a system GMM. Regression results for the two tests are stored at https://github.com/dazhwu/pydynpd/blob/main/Benchmark/test_1.md and https://github.com/dazhwu/pydynpd/blob/main/Benchmark/test_2.md, respectively. Scripts can be found in the /Benchmark/code subfolder.
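Each reported time comes from a loop like the one in the repository's Main2.py. Below is a minimal sketch of the pydynpd timing harness for Test 1 (data file and command string taken from the benchmark scripts; treat it as an outline rather than the exact script used):
```python
import time

import pandas as pd
from pydynpd import regression

df = pd.read_csv("data.csv")  # Arellano-Bond employment data used in both tests

start = time.time()
for _ in range(100):  # estimate the same difference GMM model 100 times
    regression.abond('n L(1:2).n w k | gmm(n, 2:4) gmm(w, 1:3) iv(k) | nolevel', df, ['id', 'year'])
print(time.time() - start)  # seconds per 100 runs, as reported in the tables below
```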
58 |
59 |
60 | ## Test 1: Difference GMM
61 | The following four packages produced the same regression results: plm, panelvar, pydynpd, and xtabond2. The result from the R package pdynmc is slightly different. The table below compares their speeds. Column 2 shows running time in seconds per 100 runs, and column 3 gives each package's relative speed (i.e., its running time divided by that of the fastest package in the test). Please note that xtabond2 is written and compiled in Stata's Mata language, which offers two modes: the default mode favors storage over speed, but users can manually switch to speed mode. The xtabond2 time reported below was measured in speed mode (the fastest this package can achieve), as indicated by the "Favoring speed over space" note in its output.
62 |
63 |
64 | | Package | Running time (seconds per 100 runs) | Relative to the fastest |
65 | | ------------------ | ------------ | ----------------------- |
66 | | xtabond2 | 4.19 | 1.00 |
67 | | pydynpd | 4.81 | 1.15 |
68 | | panelvar | 661.893 | 157.97 |
69 | | plm | 11.02 | 2.63 |
70 | | pdynmc | 167.4 | 39.95 |
71 |
72 |
73 |
74 | 
75 |
76 | The chart above shows that our package (pydynpd) is not far behind xtabond2, even though pydynpd is an interpreted package.
77 |
78 |
79 | ## Test 2: System GMM
80 |
81 | In the second test, pydynpd produced the same regression results as xtabond2 and panelvar. plm gives different results because it does not include a constant term. As for the R package pdynmc, we made several attempts but could not get it to work; it kept reporting the same error message (i.e., "Matrices must have same number of rows in cbind2(x, .Call(dense_to_Csparse, y)").
82 |
83 | | Package | Running time (seconds per 100 runs) | Relative to the fastest |
84 | | ------------------ | ------------ | ----------------------- |
85 | | xtabond2 | 5.893 | 1.00 |
86 | | pydynpd | 6.21 | 1.05 |
87 | | plm | 14.26 | 2.42 |
88 | | panelvar | 718.2 | 121.87 |
89 | | pdynmc | NA | NA |
90 |
91 |
92 |
93 | 
94 |
95 |
96 |
--------------------------------------------------------------------------------
/Benchmark/test_1.md:
--------------------------------------------------------------------------------
1 | # plm
2 | ```
3 | library(plm)
4 | abdata=read.csv("data.csv")
5 | pd <- pdata.frame(abdata, index = c("id", "year"), drop.index = TRUE)
6 | z1<-pgmm(n ~ 1+ lag(n, 1:2) + w + k |lag(n, 2:4) + lag(w, 1:3), data=pd, effect='individual',
7 |          model="twosteps", transformation='d', robust = TRUE)
8 | summary(z1)
9 |
10 | ```
11 | ```
12 | Oneway (individual) effect Two-steps model Difference GMM
13 |
14 | Call:
15 | pgmm(formula = n ~ 1 + lag(n, 1:2) + w + k | lag(n, 2:4) + lag(w,
16 | 1:3), data = pd, effect = "individual", model = "twosteps",
17 | transformation = "d", robust = TRUE)
18 |
19 | Unbalanced Panel: n = 140, T = 7-9, N = 1031
20 |
21 | Number of Observations Used: 611
22 | Residuals:
23 | Min. 1st Qu. Median Mean 3rd Qu. Max.
24 | -0.63963 -0.03725 0.00000 -0.00309 0.04356 0.48571
25 |
26 | Coefficients:
27 | Estimate Std. Error z-value Pr(>|z|)
28 | lag(n, 1:2)1 0.170062 0.104665 1.6248 0.1042
29 | lag(n, 1:2)2 -0.011338 0.037720 -0.3006 0.7637
30 | w -0.951058 0.127730 -7.4459 9.632e-14 ***
31 | k 0.463722 0.071833 6.4556 1.078e-10 ***
32 | ---
33 | Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
34 |
35 | Sargan test: chisq(32) = 47.85967 (p-value = 0.035436)
36 | Autocorrelation test (1): normal = -1.187819 (p-value = 0.2349)
37 | Autocorrelation test (2): normal = -0.8112457 (p-value = 0.41722)
38 | Wald test for coefficients: chisq(4) = 402.4636 (p-value = < 2.22e-16)
39 | ```
40 |
41 | # panelvar
42 | ```
43 | library(panelvar)
44 | abdata=read.csv("data.csv")
45 |
46 | p1 <-pvargmm(
47 | dependent_vars = c("n"),
48 | lags = 2,
49 | predet_vars = c("w"),
50 | exog_vars=c("k"),
51 | transformation = "fd",
52 | data = abdata,
53 | panel_identifier = c("id", "year"),
54 | steps = c("twostep"),
55 | system_instruments = FALSE,
56 | max_instr_dependent_vars = 3,
57 | max_instr_predet_vars = 3,
58 | min_instr_dependent_vars = 1L,
59 | min_instr_predet_vars = 1L,
60 | collapse = FALSE,
61 | progressbar = FALSE
62 | )
63 | summary(p1)
64 |
65 | ```
66 | ```
67 | ---------------------------------------------------
68 | Dynamic Panel VAR estimation, two-step GMM
69 | ---------------------------------------------------
70 | Transformation: First-differences
71 | Group variable: id
72 | Time variable: year
73 | Number of observations = 611
74 | Number of groups = 140
75 | Obs per group: min = 4
76 | avg = 4.364286
77 | max = 6
78 | Number of instruments = 36
79 |
80 | ===================
81 | n
82 | -------------------
83 | lag1_n 0.1701
84 | (0.1047)
85 | lag2_n -0.0113
86 | (0.0377)
87 | w -0.9511 ***
88 | (0.1277)
89 | k 0.4637 ***
90 | (0.0718)
91 | ===================
92 | *** p < 0.001; ** p < 0.01; * p < 0.05
93 |
94 | ---------------------------------------------------
95 | Instruments for equation
96 | Standard
97 | FD.(k)
98 | GMM-type
99 | Dependent vars: L(1, 3)
100 | Predet vars: L(1, 3)
101 | Collapse = FALSE
102 | ---------------------------------------------------
103 |
104 | Hansen test of overid. restrictions: chi2(32) = 47.86 Prob > chi2 = 0.035
105 | (Robust, but weakened by many instruments.)
106 | ```
107 |
108 | # pdynmc
109 |
110 | ```
111 | library(pdynmc)
112 | abdata=read.csv("data.csv")
113 | mc_1 <- pdynmc(dat=abdata,varname.i = "id", varname.t = "year",
114 | use.mc.diff = TRUE, use.mc.lev = FALSE, use.mc.nonlin = FALSE,
115 | include.y = TRUE, varname.y = "n", lagTerms.y = 2, maxLags.y=4,
116 | inst.stata = TRUE, include.x = TRUE,
117 | varname.reg.pre = c("w"), lagTerms.reg.pre = c(0), maxLags.reg.pre = c(3),
118 | fur.con = TRUE, fur.con.diff = TRUE, fur.con.lev = FALSE,
119 | varname.reg.fur = c("k"),lagTerms.reg.fur = c(0),
120 | w.mat = "iid.err", std.err = "corrected", estimation = "twostep",
121 | opt.meth = "none")
122 | summary(mc_1)
123 | mtest.fct(mc_1, order = 2)
124 | ```
125 | ```
126 | Dynamic linear panel estimation (twostep)
127 | Estimation steps: 2
128 |
129 | Coefficients:
130 | Estimate Std.Err.rob z-value.rob Pr(>|z.rob|)
131 | L1.n 0.17078 0.10597 1.611 0.107
132 | L2.n -0.01186 0.03862 -0.307 0.759
133 | L0.w -0.96426 0.12689 -7.599 <2e-16 ***
134 | L0.k 0.46357 0.07237 6.406 <2e-16 ***
135 | ---
136 | Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
137 |
138 | 36 total instruments are employed to estimate 4 parameters
139 | 35 linear (DIF)
140 | 1 further controls (DIF)
141 | no time dummies
142 |
143 | J-Test (overid restrictions): 47.49 with 32 DF, pvalue: 0.0383
144 | F-Statistic (slope coeff): 408.98 with 4 DF, pvalue: <0.001
145 | F-Statistic (time dummies): no time dummies included in estimation
146 |
147 | Arellano and Bond (1991) serial correlation test of degree 2
148 |
149 | data: 2step GMM Estimation
150 | normal = -0.9218, p-value = 0.3566
151 | alternative hypothesis: serial correlation of order 2 in the error terms
152 |
160 |
161 | ```
162 | # pydynpd
163 | ```
164 | import pandas as pd
165 | from pydynpd import regression
166 | df = pd.read_csv("data.csv")
167 | mydpd = regression.abond('n L(1:2).n w k | gmm(n, 2:4) gmm(w, 1:3) iv(k) | nolevel', df, ['id', 'year'])
168 | ```
169 | ```
170 | Dynamic panel-data estimation, two-step difference GMM
171 | Group variable: id Number of obs = 611
172 | Time variable: year Min obs per group: 5
173 | Number of instruments = 36 Max obs per group: 7
174 | Number of groups = 140 Avg obs per group: 5.36
175 | +------+------------+---------------------+------------+-----------+-----+
176 | | n | coef. | Corrected Std. Err. | z | P>|z| | |
177 | +------+------------+---------------------+------------+-----------+-----+
178 | | L1.n | 0.1700616 | 0.1046652 | 1.6248154 | 0.1042019 | |
179 | | L2.n | -0.0113381 | 0.0377205 | -0.3005824 | 0.7637329 | |
180 | | w | -0.9510582 | 0.1277298 | -7.4458585 | 0.0000000 | *** |
181 | | k | 0.4637223 | 0.0718328 | 6.4555747 | 0.0000000 | *** |
182 | +------+------------+---------------------+------------+-----------+-----+
183 | Hansen test of overid. restrictions: chi(32) = 47.860 Prob > Chi2 = 0.035
184 | Arellano-Bond test for AR(1) in first differences: z = -1.19 Pr > z =0.235
185 | Arellano-Bond test for AR(2) in first differences: z = -0.81 Pr > z =0.417
186 | ```
187 |
188 |
191 |
192 | # xtabond2
193 |
194 | ```
195 | insheet using "data.csv"
196 | xtset id year
197 | xtabond2 n L(1/2).n w k , gmm(n, lag(2 4)) gmm(w, lag(1 3)) iv(k ) nolevel twostep robust
198 |
199 | ```
200 | ```
201 |
202 | Favoring speed over space. To switch, type or click on mata: mata set matafavor space, perm.
203 | Warning: Two-step estimated covariance matrix of moments is singular.
204 | Using a generalized inverse to calculate optimal weighting matrix for two-step estimation.
205 | Difference-in-Sargan/Hansen statistics may be negative.
206 |
207 | Dynamic panel-data estimation, two-step difference GMM
208 | ------------------------------------------------------------------------------
209 | Group variable: id Number of obs = 611
210 | Time variable : year Number of groups = 140
211 | Number of instruments = 36 Obs per group: min = 4
212 | Wald chi2(0) = . avg = 4.36
213 | Prob > chi2 = . max = 6
214 | ------------------------------------------------------------------------------
215 | | Corrected
216 | n | Coef. Std. Err. z P>|z| [95% Conf. Interval]
217 | -------------+----------------------------------------------------------------
218 | n |
219 | L1. | .1700616 .1046652 1.62 0.104 -.0350784 .3752016
220 | L2. | -.0113381 .0377205 -0.30 0.764 -.0852688 .0625926
221 | |
222 | w | -.9510582 .1277298 -7.45 0.000 -1.201404 -.7007124
223 | k | .4637223 .0718328 6.46 0.000 .3229325 .6045121
224 | ------------------------------------------------------------------------------
225 | Instruments for first differences equation
226 | Standard
227 | D.k
228 | GMM-type (missing=0, separate instruments for each period unless collapsed)
229 | L(1/3).w
230 | L(2/4).n
231 | ------------------------------------------------------------------------------
232 | Arellano-Bond test for AR(1) in first differences: z = -1.19 Pr > z = 0.235
233 | Arellano-Bond test for AR(2) in first differences: z = -0.81 Pr > z = 0.417
234 | ------------------------------------------------------------------------------
235 | Sargan test of overid. restrictions: chi2(32) = 91.61 Prob > chi2 = 0.000
236 | (Not robust, but not weakened by many instruments.)
237 | Hansen test of overid. restrictions: chi2(32) = 47.86 Prob > chi2 = 0.035
238 | (Robust, but weakened by many instruments.)
239 |
240 | Difference-in-Hansen tests of exogeneity of instrument subsets:
241 | gmm(n, lag(2 4))
242 | Hansen test excluding group: chi2(15) = 23.75 Prob > chi2 = 0.069
243 | Difference (null H = exogenous): chi2(17) = 24.11 Prob > chi2 = 0.117
244 | gmm(w, lag(1 3))
245 | Hansen test excluding group: chi2(14) = 17.25 Prob > chi2 = 0.243
246 | Difference (null H = exogenous): chi2(18) = 30.61 Prob > chi2 = 0.032
247 | iv(k)
248 | Hansen test excluding group: chi2(31) = 38.33 Prob > chi2 = 0.171
249 | Difference (null H = exogenous): chi2(1) = 9.53 Prob > chi2 = 0.002
250 |
251 | ```
252 |
--------------------------------------------------------------------------------
/Benchmark/test_2.md:
--------------------------------------------------------------------------------
1 | # plm
2 | ```
3 | library(plm)
4 | abdata=read.csv("data.csv")
5 | pd <- pdata.frame(abdata, index = c("id", "year"), drop.index = TRUE)
6 | z1<-pgmm(n ~ 1+ lag(n, 1:2) + w + k |lag(n, 2:4) + lag(w, 1:3), data=pd, effect='individual',
7 | model="twosteps" ,transformation='ld', robust = TRUE)
8 | summary(z1)
9 |
10 | ```
11 |
12 | ```
13 | pgmm(formula = n ~ lag(n, 1:2) + w + k | lag(n, 2:4) + lag(w,
14 | 1:3), data = pd, effect = "individual", model = "twosteps",
15 | transformation = "ld", robust = TRUE)
16 |
17 | Unbalanced Panel: n = 140, T = 7-9, N = 1031
18 |
19 | Number of Observations Used: 1362
20 | Residuals:
21 | Min. 1st Qu. Median Mean 3rd Qu. Max.
22 | -1.716999 -0.039468 0.000000 0.001151 0.049452 1.057841
23 |
24 | Coefficients:
25 | Estimate Std. Error z-value Pr(>|z|)
26 | lag(n, 1:2)1 0.993296 0.146555 6.7776 1.222e-11 ***
27 | lag(n, 1:2)2 -0.164000 0.107125 -1.5309 0.125791
28 | w 0.059379 0.028402 2.0906 0.036560 *
29 | k 0.140340 0.050027 2.8053 0.005027 **
30 | ---
31 | Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
32 |
33 | Sargan test: chisq(47) = 105.7369 (p-value = 2.0581e-06)
34 | Autocorrelation test (1): normal = -1.926883 (p-value = 0.053994)
35 | Autocorrelation test (2): normal = -0.1281159 (p-value = 0.89806)
36 | Wald test for coefficients: chisq(4) = 8031.159 (p-value = < 2.22e-16)
37 |
38 |
39 | ```
40 |
41 | # panelvar
42 | ```
43 | library(panelvar)
44 | abdata=read.csv("data.csv")
45 |
46 | p1 <-pvargmm(
47 | dependent_vars = c("n"),
48 | lags = 2,
49 | predet_vars = c("w"),
50 | exog_vars=c("k"),
51 | transformation = "fd",
52 | data = abdata,
53 | panel_identifier = c("id", "year"),
54 | steps = c("twostep"),
55 | system_instruments = TRUE,
56 | max_instr_dependent_vars = 3,
57 | max_instr_predet_vars = 3,
58 | min_instr_dependent_vars = 1L,
59 | min_instr_predet_vars = 1L,
60 | collapse = FALSE,
61 | progressbar = FALSE
62 | )
63 | summary(p1)
64 |
65 | ```
66 |
67 | ```
68 | -------------------------------------------------
69 | Dynamic Panel VAR estimation, two-step GMM
70 | ---------------------------------------------------
71 | Transformation: First-differences
72 | Group variable: id
73 | Time variable: year
74 | Number of observations = 611
75 | Number of groups = 140
76 | Obs per group: min = 4
77 | avg = 4.364286
78 | max = 6
79 | Number of instruments = 51
80 |
81 | ===================
82 | n
83 | -------------------
84 | lag1_n 0.9454 ***
85 | (0.1430)
86 | lag2_n -0.0860
87 | (0.1082)
88 | w -0.4478 **
89 | (0.1522)
90 | k 0.1236 *
91 | (0.0509)
92 | const 1.5631 **
93 | (0.4993)
94 | ===================
95 | *** p < 0.001; ** p < 0.01; * p < 0.05
96 |
97 | ---------------------------------------------------
98 | Instruments for equation
99 | Standard
100 | FD.(k)
101 | GMM-type
102 | Dependent vars: L(2, 4)
103 | Predet vars: L(1, 3)
104 | Collapse = FALSE
105 | ---------------------------------------------------
106 |
107 | Hansen test of overid. restrictions: chi2(46) = 96.44 Prob > chi2 = 0
108 | (Robust, but weakened by many instruments.)
109 |
110 |
111 | ```
112 |
113 | # pdynmc
114 |
115 | ```
116 | library(pdynmc)
117 | abdata=read.csv("data.csv")
118 | mc_1 <- pdynmc(dat=abdata,varname.i = "id", varname.t = "year",
119 | use.mc.diff = TRUE, use.mc.lev = TRUE, use.mc.nonlin = FALSE,
120 | include.y = TRUE, varname.y = "n", lagTerms.y = 2, maxLags.y=4,
121 | inst.stata = TRUE, include.x = TRUE,
122 | varname.reg.pre = c("w"), lagTerms.reg.pre = c(0), maxLags.reg.pre = c(3),
123 | fur.con = TRUE, fur.con.diff = TRUE, fur.con.lev = TRUE,
124 | varname.reg.fur = c("k"),lagTerms.reg.fur = c(0),
125 | w.mat = "iid.err", std.err = "corrected", estimation = "twostep",
126 | opt.meth = "none")
127 | summary(mc_1)
128 | mtest.fct(mc_1, order = 2)
129 | ```
130 | ```
131 | Error in mapply(ti = ti.temp, t.end = tend.temp, lagTerms = lagTerms, : non-numeric argument to binary operator
132 | Traceback:
133 |
134 | 1. pdynmc(dat = abdata, varname.i = "id", varname.t = "year", use.mc.diff = TRUE,
135 | . use.mc.lev = TRUE, use.mc.nonlin = FALSE, include.y = TRUE,
136 | . varname.y = "n", lagTerms.y = 2, maxLags.y = 4, inst.stata = TRUE,
137 | . include.x = TRUE, varname.reg.pre = c("w"), lagTerms.reg.pre = c(0),
138 | . maxLags.reg.pre = c(3), fur.con = TRUE, fur.con.diff = TRUE,
139 | . fur.con.lev = TRUE, varname.reg.fur = c("k"), lagTerms.reg.fur = c(0),
140 | . w.mat = "iid.err", std.err = "corrected", estimation = "twostep",
141 | . opt.meth = "none")
142 | 2. lapply(X = i_cases, FUN = Z_i.fct, Time = Time, varname.i = varname.i,
143 | . use.mc.diff = use.mc.diff, use.mc.lev = use.mc.lev, use.mc.nonlin = use.mc.nonlin,
144 | . use.mc.nonlinAS = use.mc.nonlinAS, include.y = include.y,
145 | . varname.y = varname.y, inst.stata = inst.stata, include.dum = include.dum,
146 | . dum.diff = dum.diff, dum.lev = dum.lev, colnames.dum = colnames.dum,
147 | . fur.con = fur.con, fur.con.diff = fur.con.diff, fur.con.lev = fur.con.lev,
148 | . varname.reg.estParam.fur = varname.reg.estParam.fur, include.x = include.x,
149 | . end.reg = end.reg, varname.reg.end = varname.reg.end, pre.reg = pre.reg,
150 | . varname.reg.pre = varname.reg.pre, ex.reg = ex.reg, varname.reg.ex = varname.reg.ex,
151 | . maxLags.y = maxLags.y, lagTerms.y = lagTerms.y, max.lagTerms = max.lagTerms,
152 | . maxLags.reg.end = maxLags.reg.end, maxLags.reg.pre = maxLags.reg.pre,
153 | . maxLags.reg.ex = maxLags.reg.ex, inst.reg.ex.expand = inst.reg.ex.expand,
154 | . dat = dat, dat.na = dat.na)
155 | 3. FUN(X[[i]], ...)
156 | 4. do.call(what = "cbind", args = sapply(FUN = LEV.pre.fct, i = i,
157 | . varname.ex.pre.temp, T.mcLev = T.mcLev.temp, use.mc.diff = use.mc.diff,
158 | . inst.stata = inst.stata, Time = Time, varname.i = varname.i,
159 | . lagTerms = max.lagTerms, dat = dat, dat.na = dat.na))
160 | 5. sapply(FUN = LEV.pre.fct, i = i, varname.ex.pre.temp, T.mcLev = T.mcLev.temp,
161 | . use.mc.diff = use.mc.diff, inst.stata = inst.stata, Time = Time,
162 | . varname.i = varname.i, lagTerms = max.lagTerms, dat = dat,
163 | . dat.na = dat.na)
164 | 6. lapply(X = X, FUN = FUN, ...)
165 | 7. FUN(X[[i]], ...)
166 | 8. Matrix::bdiag(do.call(what = diag, args = list(mapply(ti = ti.temp,
167 | . t.end = tend.temp, lagTerms = lagTerms, FUN = datLEV.pre.fct,
168 | . varname = varname, MoreArgs = list(i = i, use.mc.diff = use.mc.diff,
169 | . inst.stata = inst.stata, dat = dat, dat.na = dat.na,
170 | . varname.i = varname.i, Time = Time)) * as.vector(!is.na(diff(dat.na[dat.na[,
171 | . varname.i] == i, varname][(lagTerms - 1):Time]))))))
172 | 9. do.call(what = diag, args = list(mapply(ti = ti.temp, t.end = tend.temp,
173 | . lagTerms = lagTerms, FUN = datLEV.pre.fct, varname = varname,
174 | . MoreArgs = list(i = i, use.mc.diff = use.mc.diff, inst.stata = inst.stata,
175 | . dat = dat, dat.na = dat.na, varname.i = varname.i, Time = Time)) *
176 | . as.vector(!is.na(diff(dat.na[dat.na[, varname.i] == i, varname][(lagTerms -
177 | . 1):Time])))))
178 |
179 | ```
180 | # pydynpd
181 | ```
182 | import pandas as pd
183 | from pydynpd import regression
184 | df=pd.read_csv("data.csv")
185 |
186 | mydpd = regression.abond('n L(1:2).n w k | gmm(n, 2:4) gmm(w, 1:3) iv(k)', df, ['id', 'year'])
187 | ```
188 | ```
189 | Dynamic panel-data estimation, two-step system GMM
190 | Group variable: id Number of obs = 751
191 | Time variable: year Min obs per group: 5
192 | Number of instruments = 51 Max obs per group: 7
193 | Number of groups = 140 Avg obs per group: 5.36
194 | +------+------------+---------------------+------------+-----------+-----+
195 | | n | coef. | Corrected Std. Err. | z | P>|z| | |
196 | +------+------------+---------------------+------------+-----------+-----+
197 | | L1.n | 0.9453810 | 0.1429764 | 6.6121470 | 0.0000000 | *** |
198 | | L2.n | -0.0860069 | 0.1082318 | -0.7946553 | 0.4268140 | |
199 | | w | -0.4477795 | 0.1521917 | -2.9422068 | 0.0032588 | ** |
200 | | k | 0.1235808 | 0.0508836 | 2.4286941 | 0.0151533 | * |
201 | | _con | 1.5630849 | 0.4993484 | 3.1302492 | 0.0017466 | ** |
202 | +------+------------+---------------------+------------+-----------+-----+
203 | Hansen test of overid. restrictions: chi(46) = 96.442 Prob > Chi2 = 0.000
204 | Arellano-Bond test for AR(1) in first differences: z = -2.35 Pr > z =0.019
205 | Arellano-Bond test for AR(2) in first differences: z = -1.15 Pr > z =0.251
206 | ```
207 |
208 |
209 |
210 |
211 | # xtabond2
212 |
213 | ```
214 | mata: mata set matafavor speed, perm
215 | insheet using "data.csv"
216 | xtset id year
217 | xtabond2 n L(1/2).n w k , gmm(n, lag(2 4)) gmm(w, lag(1 3)) iv(k ) nolevel twostep robust
218 |
219 | ```
220 | ```
221 |
222 | Favoring speed over space. To switch, type or click on mata: mata set matafavor space, perm.
223 | Warning: Two-step estimated covariance matrix of moments is singular.
224 | Using a generalized inverse to calculate optimal weighting matrix for two-step estimation.
225 | Difference-in-Sargan/Hansen statistics may be negative.
226 |
227 | Dynamic panel-data estimation, two-step difference GMM
228 | ------------------------------------------------------------------------------
229 | Group variable: id Number of obs = 611
230 | Time variable : year Number of groups = 140
231 | Number of instruments = 36 Obs per group: min = 4
232 | Wald chi2(0) = . avg = 4.36
233 | Prob > chi2 = . max = 6
234 | ------------------------------------------------------------------------------
235 | | Corrected
236 | n | Coef. Std. Err. z P>|z| [95% Conf. Interval]
237 | -------------+----------------------------------------------------------------
238 | n |
239 | L1. | .1700616 .1046652 1.62 0.104 -.0350784 .3752016
240 | L2. | -.0113381 .0377205 -0.30 0.764 -.0852688 .0625926
241 | |
242 | w | -.9510582 .1277298 -7.45 0.000 -1.201404 -.7007124
243 | k | .4637223 .0718328 6.46 0.000 .3229325 .6045121
244 | ------------------------------------------------------------------------------
245 | Instruments for first differences equation
246 | Standard
247 | D.k
248 | GMM-type (missing=0, separate instruments for each period unless collapsed)
249 | L(1/3).w
250 | L(2/4).n
251 | ------------------------------------------------------------------------------
252 | Arellano-Bond test for AR(1) in first differences: z = -1.19 Pr > z = 0.235
253 | Arellano-Bond test for AR(2) in first differences: z = -0.81 Pr > z = 0.417
254 | ------------------------------------------------------------------------------
255 | Sargan test of overid. restrictions: chi2(32) = 91.61 Prob > chi2 = 0.000
256 | (Not robust, but not weakened by many instruments.)
257 | Hansen test of overid. restrictions: chi2(32) = 47.86 Prob > chi2 = 0.035
258 | (Robust, but weakened by many instruments.)
259 |
260 | Difference-in-Hansen tests of exogeneity of instrument subsets:
261 | gmm(n, lag(2 4))
262 | Hansen test excluding group: chi2(15) = 23.75 Prob > chi2 = 0.069
263 | Difference (null H = exogenous): chi2(17) = 24.11 Prob > chi2 = 0.117
264 | gmm(w, lag(1 3))
265 | Hansen test excluding group: chi2(14) = 17.25 Prob > chi2 = 0.243
266 | Difference (null H = exogenous): chi2(18) = 30.61 Prob > chi2 = 0.032
267 | iv(k)
268 | Hansen test excluding group: chi2(31) = 38.33 Prob > chi2 = 0.171
269 | Difference (null H = exogenous): chi2(1) = 9.53 Prob > chi2 = 0.002
270 |
271 | ```
272 |
273 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2022 Dazhong Wu
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
--------------------------------------------------------------------------------
/Main.py:
--------------------------------------------------------------------------------
1 |
2 | import pandas as pd
3 | from pydynpd import regression
4 |
5 | import time
6 |
7 |
8 | df = pd.read_csv("test_data.csv")
9 |
10 |
11 | #command_str='y L(1:?).y L(1:?).x | gmm(y, 2:3) iv(L(1:1).x)| timedumm'
12 | #mydpd = regression.abond(command_str, df, ['id', 'year'])
13 | df = pd.read_csv("data.csv")
14 | #mydpd = regression.abond('n L(1:2).n w k | gmm(n, 2:4) gmm(w, 1:3) iv(k) |nolevel fod ', df, ['id', 'year'])
15 | command_str='n L(1:?).n w k | gmm(n, 2:3) pred(w k)| fod'  # "?" asks pydynpd to search over lag orders
16 | mydpd = regression.abond(command_str, df, ['id', 'year'])
17 |
18 | for i in range(0, len(mydpd.models)):  # report model-selection criteria for each candidate model
19 |     print("model", end=" ")
20 |     print(i+1, end=": bic= ")
21 |     print(mydpd.models[i].MMSC_LU["bic"], end="; hqic=")
22 |     print(mydpd.models[i].MMSC_LU["hqic"], end="; aic=")
23 |     print(mydpd.models[i].MMSC_LU["aic"])
24 |
25 |
26 |
27 |
28 |
--------------------------------------------------------------------------------
/Main2.py:
--------------------------------------------------------------------------------
1 |
2 | import pandas as pd
3 | from pydynpd import regression
4 |
5 | import time
6 |
7 | a=time.time()
8 |
9 |
10 | df = pd.read_csv("data.csv")
11 | for i in range(100):  # estimate the same model 100 times, matching the benchmark description
12 |     #mydpd = regression.abond('n L(1:2).n w k | gmm(n, 2:4) gmm(w, 1:3) iv(k)', df, ['id', 'year'])
13 |     mydpd = regression.abond('n L(1:2).n w k | gmm(n, 2:.) pred(w k)', df, ['id', 'year'])
14 |
15 | print(time.time()-a)
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # pydynpd: A python package for dynamic panel model
2 | [](https://zenodo.org/badge/latestdoi/466146436)
3 | [](https://pypi.org/project/pydynpd/)
4 |
5 | pydynpd is the first python package to implement Difference and System GMM [1][2][3] to estimate dynamic panel data models.
6 |
7 | Below is a typical dynamic panel data model:
8 |
9 | 
10 |
11 | In the equation above, x is a predetermined variable that is potentially correlated with past errors, s is a strictly exogenous variable, and u is the fixed effect.
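In symbols, an illustrative form consistent with this description (the lag order p and the coefficient names are placeholders, not the exact notation of the formula image) is:
```math
y_{it} = \sum_{j=1}^{p} \alpha_j \, y_{i,t-j} + \beta \, x_{it} + \gamma \, s_{it} + u_i + \epsilon_{it}
```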
12 |
13 | This software has been published in Journal of Open Source Software:
14 | ```
15 | Wu et al., (2023). pydynpd: A Python package for dynamic panel model.
16 | Journal of Open Source Software, 8(83), 4416, https://doi.org/10.21105/joss.04416
17 | ```
18 | ## Features supported:
19 | * Difference and System GMM
20 | * One-step, two-step, and iterative estimates
21 | * First-difference and forward orthogonal deviation transformations
22 | * Robust standard errors. For two-step GMM, the calculation suggested by Windmeijer (2005) is used.
23 | * Hansen over-identification test
24 | * Arellano-Bond test for autocorrelation
25 | * Time dummies
26 | * Collapse GMM instruments to limit instrument proliferation
27 | * Search over candidate models at the user's request, rather than only estimating the single model the user specifies, as other packages do (see the sketch below)
28 |
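For example, the model-search feature replaces a lag bound with `?` and then reports model-selection criteria (BIC, HQIC, AIC) for every candidate model it estimates. A minimal sketch, following the repository's Main.py (the data file is the same one used in the Usage example below):
```python
import pandas as pd
from pydynpd import regression

df = pd.read_csv("data.csv")
# "?" asks pydynpd to search over lag orders instead of fitting a single model
mydpd = regression.abond('n L(1:?).n w k | gmm(n, 2:3) pred(w k)| fod', df, ['id', 'year'])
for i, model in enumerate(mydpd.models, start=1):
    print(f"model {i}: bic={model.MMSC_LU['bic']}; hqic={model.MMSC_LU['hqic']}; aic={model.MMSC_LU['aic']}")
```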
29 |
30 | ## Installation:
31 | ```
32 | pip install pydynpd
33 | ```
34 | This package requires: numpy, scipy, pandas, and PrettyTable
35 |
36 | ## Usage:
37 | ```
38 | import pandas as pd
39 | from pydynpd import regression
40 |
41 | df = pd.read_csv("data.csv")
42 | command_str='n L(1:2).n w k | gmm(n, 2:4) gmm(w, 1:3) iv(k) | timedumm nolevel'
43 | mydpd = regression.abond(command_str, df, ['id', 'year'])
44 | ```
45 | result:
46 | ```
47 | Dynamic panel-data estimation, two-step difference GMM
48 | Group variable: id Number of obs = 611
49 | Time variable: year Number of groups = 140
50 | Number of instruments = 42
51 | +-----------+------------+---------------------+------------+-----------+
52 | | n | coef. | Corrected Std. Err. | z | P>|z| |
53 | +-----------+------------+---------------------+------------+-----------+
54 | | L1.n | 0.2710675 | 0.1382542 | 1.9606462 | 0.0499203 |
55 | | L2.n | -0.0233928 | 0.0419665 | -0.5574151 | 0.5772439 |
56 | | w | -0.5668527 | 0.2092231 | -2.7093219 | 0.0067421 |
57 | | k | 0.3613939 | 0.0662624 | 5.4539824 | 0.0000000 |
58 | | year_1979 | 0.0011898 | 0.0092322 | 0.1288765 | 0.8974554 |
59 | | year_1980 | -0.0316432 | 0.0116155 | -2.7242254 | 0.0064453 |
60 | | year_1981 | -0.0900163 | 0.0206593 | -4.3571693 | 0.0000132 |
61 | | year_1982 | -0.0996210 | 0.0296036 | -3.3651654 | 0.0007650 |
62 | | year_1983 | -0.0693308 | 0.0404276 | -1.7149347 | 0.0863572 |
63 | | year_1984 | -0.0614505 | 0.0475525 | -1.2922666 | 0.1962648 |
64 | +-----------+------------+---------------------+------------+-----------+
65 | Hansen test of overid. restrictions: chi(32) = 32.666 Prob > Chi2 = 0.434
66 | Arellano-Bond test for AR(1) in first differences: z = -1.29 Pr > z =0.198
67 | Arellano-Bond test for AR(2) in first differences: z = -0.31 Pr > z =0.760
68 | ```
69 | ## Tutorial
70 | A detailed tutorial is given in the following two documents:
71 | * [inputs of the abond command](https://github.com/dazhwu/pydynpd/blob/main/vignettes/Tutorial.ipynb)
72 | * [outputs of the abond command](https://github.com/dazhwu/pydynpd/blob/main/vignettes/API.md)
73 |
74 | ## Similar packages
75 | The objective of the package is similar to the following open-source packages:
76 | Package | Language | version
77 | --- | --- | ---
78 | plm | R | 2.6-1
79 | panelvar | R| 0.5.3
80 | pdynmc | R| 0.9.7
81 |
82 | To compare pydynpd with similar packages, we performed performance tests. More specifically, in each test each package estimated the same model with the same data 100 times. For verification, the tests also include the Stata package xtabond2, even though Stata is commercial software; we use xtabond2 to verify regression results because it is the most popular package for estimating dynamic panel models. The figure below is from one of the tests. Note that directly comparing xtabond2's speed with that of R or Python packages is a little unfair, because the calculation part of xtabond2 is compiled while pydynpd and the three R packages are interpreted; xtabond2 should therefore have a clear advantage in speed.
83 |
84 | 
85 |
86 | Though developed in pure Python, pydynpd is not far behind xtabond2. Moreover, it is significantly faster than the three R packages, which, like pydynpd, are interpreted.
87 |
88 | A detailed description of the tests can be found [here](https://github.com/dazhwu/pydynpd/blob/main/Benchmark/performance_comparison.md).
89 |
90 | ## FAQs
91 | ### How to extract coefficients from regression?
92 | For example, if you run:
93 | ```
94 | df = pd.read_csv("data.csv")
95 | mydpd = regression.abond('n L(1:2).n w k | gmm(n, 2:4) gmm(w, 1:3) iv(k) ', df, ['id', 'year'])
96 | ```
97 |
98 | The output regression table will be
99 | ```
100 | +------+------------+---------------------+------------+-----------+-----+
101 | | n | coef. | Corrected Std. Err. | z | P>|z| | |
102 | +------+------------+---------------------+------------+-----------+-----+
103 | | L1.n | 0.9453810 | 0.1429764 | 6.6121470 | 0.0000000 | *** |
104 | | L2.n | -0.0860069 | 0.1082318 | -0.7946553 | 0.4268140 | |
105 | | w | -0.4477795 | 0.1521917 | -2.9422068 | 0.0032588 | ** |
106 | | k | 0.1235808 | 0.0508836 | 2.4286941 | 0.0151533 | * |
107 | | _con | 1.5630849 | 0.4993484 | 3.1302492 | 0.0017466 | ** |
108 | +------+------------+---------------------+------------+-----------+-----+
109 | ```
110 | If you want to programmatically extract a value, for example the first z-value (6.6121470), you can add the following:
111 | ```
112 | >>>mydpd.models[0].regression_table.iloc[0]['z_value']
113 | 6.6121469997085915
114 | ```
115 | Basically, the object mydpd returned above contains a list of models, because pydynpd allows us to run and compare multiple models at the same time. By default, it contains only one model, models[0]. Each model has a regression table, which is a pandas DataFrame:
116 | ```
117 | >>>mydpd.models[0].regression_table
118 |
119 | variable coefficient std_err z_value p_value sig
120 | 0 L1.n 0.945381 0.142976 6.612147 3.787856e-11 ***
121 | 1 L2.n -0.086007 0.108232 -0.794655 4.268140e-01
122 | 2 w -0.447780 0.152192 -2.942207 3.258822e-03 **
123 | 3 k 0.123581 0.050884 2.428694 1.515331e-02 *
124 | 4 _con 1.563085 0.499348 3.130249 1.746581e-03 **
125 |
126 | ```
127 | So you can extract any value from this dataframe.
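For example, to pull every coefficient keyed by variable name (the column names are those shown above), one option is:
```python
coefs = mydpd.models[0].regression_table.set_index('variable')['coefficient']
print(coefs['w'])  # approximately -0.4478 in the regression shown above
```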
128 |
129 | ### How to use pydynpd with R?
130 | First, you need to install Python on your computer; then install pydynpd.
131 | ```
132 | pip install pydynpd
133 | ```
134 | Second, in the R environment, install the package reticulate:
135 | ```
136 | install.packages("reticulate")
137 | ```
138 | Third, configure RStudio so that it can communicate with the Python installation from step 1. Instructions can be found at
139 | https://www.rstudio.com/blog/rstudio-v1-4-preview-python-support/
140 |
141 | Finally, you can use the following template to call pydynpd from R. For comparison, the corresponding Python code is also included.
142 |
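A minimal sketch of such a template, built on reticulate's import() and `$` accessors (an illustrative outline, assuming RStudio is already pointed at the Python installation from step 1):
```R
library(reticulate)                      # bridge between R and the Python installation from step 1
pandas     <- import("pandas")
regression <- import("pydynpd.regression")

df <- pandas$read_csv("data.csv")
# Same command string as in Python; only the identifier list changes to c('id', 'year')
result <- regression$abond('n L(1:2).n w k | gmm(n, 2:4) gmm(w, 1:3) iv(k)', df, c('id', 'year'))
```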
157 | import pandas as pd
158 | from pydynpd import regression
159 | df = pd.read_csv("data.csv")
160 | mydpd = regression.abond('n L(1:2).n w k | gmm(n, 2:4) gmm(w, 1:3) iv(k)', df, ['id', 'year'])
161 |
162 |
163 |
164 |
165 |
166 | Code above generates the following result:
167 | ```
168 | Dynamic panel-data estimation, two-step system GMM
169 | Group variable: id Number of obs = 611
170 | Time variable: year Min obs per group: 4
171 | Number of instruments = 51 Max obs per group: 6
172 | Number of groups = 140 Avg obs per group: 4.36
173 | +------+------------+---------------------+------------+-----------+-----+
174 | | n | coef. | Corrected Std. Err. | z | P>|z| | |
175 | +------+------------+---------------------+------------+-----------+-----+
176 | | L1.n | 0.9453810 | 0.1429764 | 6.6121470 | 0.0000000 | *** |
177 | | L2.n | -0.0860069 | 0.1082318 | -0.7946553 | 0.4268140 | |
178 | | w | -0.4477795 | 0.1521917 | -2.9422068 | 0.0032588 | ** |
179 | | k | 0.1235808 | 0.0508836 | 2.4286941 | 0.0151533 | * |
180 | | _con | 1.5630849 | 0.4993484 | 3.1302492 | 0.0017466 | ** |
181 | +------+------------+---------------------+------------+-----------+-----+
182 | Hansen test of overid. restrictions: chi(46) = 96.442 Prob > Chi2 = 0.000
183 | Arellano-Bond test for AR(1) in first differences: z = -2.35 Pr > z =0.019
184 | Arellano-Bond test for AR(2) in first differences: z = -1.15 Pr > z =0.251
185 | ```
186 | As you can see, you don't need to change the command string in R. The only parameter you have to change is the list of identifiers: ['id', 'year'] in Python becomes c('id', 'year') in R. You can also access the properties of the result from R the same way you would in Python. For example, after running the code above, you can run the following R script:
187 | ```
188 | reg_table=result$models[[1]]$regression_table
189 | print(reg_table)
190 | ```
191 | The output is:
192 | ```
193 | variable coefficient std_err z_value p_value sig
194 | 1 L1.n 0.94538100 0.14297640 6.6121470 3.787856e-11 ***
195 | 2 L2.n -0.08600694 0.10823176 -0.7946553 4.268140e-01
196 | 3 w -0.44777955 0.15219173 -2.9422068 3.258822e-03 **
197 | 4 k 0.12358078 0.05088363 2.4286941 1.515331e-02 *
198 | 5 _con 1.56308487 0.49934839 3.1302492 1.746581e-03 **
199 | ```
200 | In the example above, reg_table is an R data frame.
201 |
202 | ## Contributing
203 | There are several ways to contribute to pydynpd:
204 |
205 | Submit issue/bug reports [here](https://github.com/dazhwu/pydynpd/issues/), or try to fix the problem yourself and then submit a [pull request](https://github.com/dazhwu/pydynpd/pulls).
206 |
207 | Browse the source code and see if anything looks out of place - let us know!
208 |
209 | ## References
210 | [1]
211 | Arellano, M., & Bond, S. (1991). Some tests of specification for panel data: Monte Carlo evidence and an application to employment equations. The review of economic studies, 58(2), 277-297.
212 |
213 | [2]
214 | Arellano, M., & Bover, O. (1995). Another look at the instrumental variable estimation of error-components models. Journal of econometrics, 68(1), 29-51.
215 |
216 | [3]
217 | Blundell, R., & Bond, S. (1998). Initial conditions and moment restrictions in dynamic panel data models. Journal of econometrics, 87(1), 115-143.
218 |
219 | [4]
220 | Roodman, D. (2009). How to do xtabond2: An introduction to difference and system GMM in Stata. The stata journal, 9(1), 86-136.
221 |
222 | [5]
223 | Windmeijer, F. (2005). A finite sample correction for the variance of linear efficient two-step GMM estimators. Journal of econometrics, 126(1), 25-51.
224 |
--------------------------------------------------------------------------------
/build_upload.sh:
--------------------------------------------------------------------------------
1 | rm -rf dist/*
2 | python3 -m build
3 | python3 -m twine upload --repository pypi dist/*.whl
4 |
--------------------------------------------------------------------------------
/contributing.md:
--------------------------------------------------------------------------------
1 | # Contributing
2 |
3 | There are several ways to contribute to pydynpd!
4 | - Submit issue/bug reports [here](https://github.com/dazhwu/pydynpd/issues),
5 | or try to fix the problem yourself and then [submit a pull request](https://github.com/dazhwu/pydynpd/pulls).
6 | - Request features or ask questions [here](https://github.com/dazhwu/pydynpd/issues).
7 | - Browse [the source code](https://github.com/dazhwu/pydynpd) and see if anything looks out of place - let us know!
8 |
9 |
--------------------------------------------------------------------------------
/output.html:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dazhwu/pydynpd/706f69d75cb38feebf042c2dffd8f203e0a7aeb9/output.html
--------------------------------------------------------------------------------
/pydynpd.egg-info/PKG-INFO:
--------------------------------------------------------------------------------
1 | Metadata-Version: 2.1
2 | Name: pydynpd
3 | Version: 0.2.0
4 | Summary: A package to estimate dynamic panel data model using difference GMM and system GMM.
5 | Home-page: https://github.com/dazhwu/pydynpd
6 | Author: Dazhong Wu
7 | Author-email: wudz800@gmail.com
8 | License: MIT
9 | Classifier: License :: OSI Approved :: MIT License
10 | Classifier: Programming Language :: Python
11 | Classifier: Programming Language :: Python :: 3
12 | Classifier: Programming Language :: Python :: 3.6
13 | Classifier: Programming Language :: Python :: Implementation :: CPython
14 | Classifier: Programming Language :: Python :: Implementation :: PyPy
15 | Requires-Python: >=3.6.0
16 | Description-Content-Type: text/markdown
17 | License-File: LICENSE
18 |
19 |
20 | # pydynpd: Dynamic panel estimation for Difference and System GMM (generalized method-of-moments)
21 | [](https://zenodo.org/badge/latestdoi/466146436)
22 | [](https://pypi.org/project/pydynpd/)
23 |
24 | pydynpd is the first python package to implement Difference and System GMM [1][2][3] to estimate dynamic panel data models.
25 |
26 | Below is a typical dynamic panel data model:
27 |
28 | 
29 |
30 | In the equation above, x is a predetermined variable that is potentially correlated with past errors, s is a strictly exogenous variable, and u is the fixed effect.
31 |
32 | ## Features supported:
33 | * Difference and System GMM
34 | * One-step, two-step, and iterative estimates
35 | * First-difference and forward orthogonal deviation transformations
36 | * Robust standard errors. For two-step GMM, the calculation suggested by Windmeijer (2005) is used.
37 | * Hansen over-identification test
38 | * Arellano-Bond test for autocorrelation
39 | * Time dummies
40 | * Collapse GMM instruments to limit instrument proliferation
41 | * Search over candidate models at the user's request, rather than only estimating the single model the user specifies, as other packages do
42 |
43 |
44 | ## Installation:
45 | ```
46 | pip install pydynpd
47 | ```
48 | This package requires: numpy, scipy, pandas, and PrettyTable
49 |
50 | ## Usage:
51 | ```
52 | import pandas as pd
53 | from pydynpd import regression
54 |
55 | df = pd.read_csv("data.csv")
56 | command_str='n L(1:2).n w k | gmm(n, 2:4) gmm(w, 1:3) iv(k) | timedumm nolevel'
57 | mydpd = regression.abond(command_str, df, ['id', 'year'])
58 | ```
59 | result:
60 | ```
61 | Dynamic panel-data estimation, two-step difference GMM
62 | Group variable: id Number of obs = 611
63 | Time variable: year Number of groups = 140
64 | Number of instruments = 42
65 | +-----------+------------+---------------------+------------+-----------+
66 | | n | coef. | Corrected Std. Err. | z | P>|z| |
67 | +-----------+------------+---------------------+------------+-----------+
68 | | L1.n | 0.2710675 | 0.1382542 | 1.9606462 | 0.0499203 |
69 | | L2.n | -0.0233928 | 0.0419665 | -0.5574151 | 0.5772439 |
70 | | w | -0.5668527 | 0.2092231 | -2.7093219 | 0.0067421 |
71 | | k | 0.3613939 | 0.0662624 | 5.4539824 | 0.0000000 |
72 | | year_1979 | 0.0011898 | 0.0092322 | 0.1288765 | 0.8974554 |
73 | | year_1980 | -0.0316432 | 0.0116155 | -2.7242254 | 0.0064453 |
74 | | year_1981 | -0.0900163 | 0.0206593 | -4.3571693 | 0.0000132 |
75 | | year_1982 | -0.0996210 | 0.0296036 | -3.3651654 | 0.0007650 |
76 | | year_1983 | -0.0693308 | 0.0404276 | -1.7149347 | 0.0863572 |
77 | | year_1984 | -0.0614505 | 0.0475525 | -1.2922666 | 0.1962648 |
78 | +-----------+------------+---------------------+------------+-----------+
79 | Hansen test of overid. restrictions: chi(32) = 32.666 Prob > Chi2 = 0.434
80 | Arellano-Bond test for AR(1) in first differences: z = -1.29 Pr > z =0.198
81 | Arellano-Bond test for AR(2) in first differences: z = -0.31 Pr > z =0.760
82 | ```
83 | ## Tutorial
84 | A detailed tutorial is [here](https://github.com/dazhwu/pydynpd/blob/main/vignettes/Tutorial.ipynb).
85 |
86 | ## Similar packages
87 | The objective of the package is similar to the following open-source packages:
88 | Package | Language | version
89 | --- | --- | ---
90 | plm | R | 2.6-1
91 | panelvar | R| 0.5.3
92 | pdynmc | R| 0.9.7
93 |
94 | To compare pydynpd with similar packages, we performed performance tests. More specifically, in each test each package estimated the same model with the same data 100 times. For verification, the tests also include the Stata package xtabond2, even though Stata is commercial software; we use xtabond2 to verify regression results because it is the most popular package for estimating dynamic panel models. The figure below is from one of the tests. Note that directly comparing xtabond2's speed with that of R or Python packages is a little unfair, because the calculation part of xtabond2 is compiled while pydynpd and the three R packages are interpreted; xtabond2 should therefore have a clear advantage in speed.
95 |
96 | 
97 |
98 | Though developed in pure Python, pydynpd is not far behind xtabond2. Moreover, it is significantly faster than the three R packages, which, like pydynpd, are interpreted.
99 |
100 | A detailed description of the tests can be found [here](https://github.com/dazhwu/pydynpd/blob/main/Benchmark/performance_comparison.md).
101 |
102 | ## FAQs
103 | 1. How to extract coefficients from regression?
104 | For example, if you run:
105 | ```
106 | df = pd.read_csv("data.csv")
107 | mydpd = regression.abond('n L(1:2).n w k | gmm(n, 2:4) gmm(w, 1:3) iv(k) ', df, ['id', 'year'])
108 | ```
109 |
110 | The output regression table will be
111 | ```
112 | +------+------------+---------------------+------------+-----------+-----+
113 | | n | coef. | Corrected Std. Err. | z | P>|z| | |
114 | +------+------------+---------------------+------------+-----------+-----+
115 | | L1.n | 0.9453810 | 0.1429764 | 6.6121470 | 0.0000000 | *** |
116 | | L2.n | -0.0860069 | 0.1082318 | -0.7946553 | 0.4268140 | |
117 | | w | -0.4477795 | 0.1521917 | -2.9422068 | 0.0032588 | ** |
118 | | k | 0.1235808 | 0.0508836 | 2.4286941 | 0.0151533 | * |
119 | | _con | 1.5630849 | 0.4993484 | 3.1302492 | 0.0017466 | ** |
120 | +------+------------+---------------------+------------+-----------+-----+
121 | ```
122 | If you want to programmatically extract a value, for example the first z-value (6.6121470), you can add the following:
123 | ```
124 | >>>mydpd.models[0].regression_table.iloc[0]['z_value']
125 | 6.6121469997085915
126 | ```
127 | Basically, the object mydpd returned above contains a list of models, because pydynpd allows us to run and compare multiple models at the same time. By default, it contains only one model, models[0]. Each model has a regression table, which is a pandas DataFrame:
128 | ```
129 | >>>mydpd.models[0].regression_table
130 |
131 | variable coefficient std_err z_value p_value sig
132 | 0 L1.n 0.945381 0.142976 6.612147 3.787856e-11 ***
133 | 1 L2.n -0.086007 0.108232 -0.794655 4.268140e-01
134 | 2 w -0.447780 0.152192 -2.942207 3.258822e-03 **
135 | 3 k 0.123581 0.050884 2.428694 1.515331e-02 *
136 | 4 _con 1.563085 0.499348 3.130249 1.746581e-03 **
137 |
138 | ```
139 | So you can extract any value from this dataframe.
140 |
141 |
142 | ## Contributing
143 | There are several ways to contribute to pydynpd:
144 |
145 | Submit issue/bug reports [here](https://github.com/dazhwu/pydynpd/issues/), or try to fix the problem yourself and then submit a [pull request](https://github.com/dazhwu/pydynpd/pulls).
146 |
147 | Browse the source code and see if anything looks out of place - let us know!
148 |
149 | ## References
150 | [1]
151 | Arellano, M., & Bond, S. (1991). Some tests of specification for panel data: Monte Carlo evidence and an application to employment equations. The review of economic studies, 58(2), 277-297.
152 |
153 | [2]
154 | Arellano, M., & Bover, O. (1995). Another look at the instrumental variable estimation of error-components models. Journal of econometrics, 68(1), 29-51.
155 |
156 | [3]
157 | Blundell, R., & Bond, S. (1998). Initial conditions and moment restrictions in dynamic panel data models. Journal of econometrics, 87(1), 115-143.
158 |
159 | [4]
160 | Roodman, D. (2009). How to do xtabond2: An introduction to difference and system GMM in Stata. The stata journal, 9(1), 86-136.
161 |
162 | [5]
163 | Windmeijer, F. (2005). A finite sample correction for the variance of linear efficient two-step GMM estimators. Journal of econometrics, 126(1), 25-51.
164 |
--------------------------------------------------------------------------------
/pydynpd.egg-info/SOURCES.txt:
--------------------------------------------------------------------------------
1 | LICENSE
2 | README.md
3 | pyproject.toml
4 | setup.py
5 | pydynpd/__init__.py
6 | pydynpd/__version__.py
7 | pydynpd/command.py
8 | pydynpd/common_functions.py
9 | pydynpd/dynamic_panel_model.py
10 | pydynpd/info.py
11 | pydynpd/instruments.py
12 | pydynpd/model_organizer.py
13 | pydynpd/model_summary.py
14 | pydynpd/panel_data.py
15 | pydynpd/regression.py
16 | pydynpd/specification_tests.py
17 | pydynpd/variable.py
18 | pydynpd.egg-info/PKG-INFO
19 | pydynpd.egg-info/SOURCES.txt
20 | pydynpd.egg-info/dependency_links.txt
21 | pydynpd.egg-info/requires.txt
22 | pydynpd.egg-info/top_level.txt
--------------------------------------------------------------------------------
/pydynpd.egg-info/dependency_links.txt:
--------------------------------------------------------------------------------
1 |
2 |
--------------------------------------------------------------------------------
/pydynpd.egg-info/requires.txt:
--------------------------------------------------------------------------------
1 | numpy
2 | scipy
3 | prettytable
4 | pandas
5 |
--------------------------------------------------------------------------------
/pydynpd.egg-info/top_level.txt:
--------------------------------------------------------------------------------
1 | pydynpd
2 |
--------------------------------------------------------------------------------
/pydynpd/.idea/.name:
--------------------------------------------------------------------------------
1 | dynamic_panel_model.py
--------------------------------------------------------------------------------
/pydynpd/.idea/inspectionProfiles/Project_Default.xml:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 |
5 |
10 |
11 |
12 |
--------------------------------------------------------------------------------
/pydynpd/.idea/inspectionProfiles/profiles_settings.xml:
--------------------------------------------------------------------------------
1 |
2 |
3 |