├── LICENSE ├── README.md ├── dstat.ado ├── dstat.pkg ├── dstat.sthlp ├── dstat_svyr.ado └── stata.toc /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2020 Ben Jann 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # dstat 2 | Stata module to compute summary statistics and distribution functions including 3 | standard errors and optional covariate balancing 4 | 5 | `dstat` unites a variety of methods to describe (univariate) 6 | statistical distributions. 
Covered are density estimation, histograms, 7 | cumulative distribution functions, probability distributions, quantile 8 | functions, Lorenz curves, percentile shares, and a large collection 9 | of summary statistics such as classical and robust measures of location, scale, 10 | skewness, and kurtosis, as well as inequality and poverty measures. Particular 11 | features of the command are that it provides consistent standard errors 12 | supporting complex sample designs for all covered statistics and that the 13 | simultaneous analysis of multiple variables across multiple subpopulations is 14 | possible. Furthermore, the command supports covariate balancing based on 15 | reweighting techniques (inverse probability weighting and entropy balancing), 16 | including appropriate correction of standard errors. Standard error estimation 17 | is implemented in terms of influence functions, which can be stored 18 | for further analysis, for example, using RIF regression. 19 | 20 | To install `dstat` from the SSC Archive, type 21 | 22 | . ssc install dstat, replace 23 | 24 | in Stata. Stata version 14 or newer is required. Furthermore, `moremata` and 25 | `coefplot` are required. To install these packages from the SSC Archive, type 26 | 27 | . ssc install moremata, replace 28 | . ssc install coefplot, replace 29 | 30 | --- 31 | 32 | Installation from GitHub: 33 | 34 | . net install dstat, replace from(https://raw.githubusercontent.com/benjann/dstat/main/) 35 | . net install moremata, replace from(https://raw.githubusercontent.com/benjann/moremata/master/) 36 | .
net install coefplot, replace from(https://raw.githubusercontent.com/benjann/coefplot/master/) 37 | 38 | --- 39 | 40 | Main changes: 41 | 42 | 04jun2025 (version 1.4.7) 43 | - -dstat graph- failed after -dstat histogram- and -dstat share- in case of 44 | over() with suboption -contrast-; this is fixed 45 | - -dstat graph- after -dstat histogram-, -dstat proportion-, or -dstat share- 46 | now leaves the margin of the plotregion unchanged in case of over() with 47 | suboption -contrast-; furthermore, base() is now set to 1 (rather than 0) 48 | in case of over() with suboption -ratio- 49 | 50 | 25apr2025 (version 1.4.6) 51 | - statistic -smse- in -dstat summarize- has been renamed to -rmse- (root mean 52 | squared error) 53 | - -dstat summarize- could be unnecessarily slow on small datasets due to an 54 | unfortunate use of Mata's findexternal(); this is fixed 55 | - inequality statistic [gw_|w_|b_]ge(alpha) in -dstat summarize- did not 56 | correctly diagnose out-of-support observations if alpha was 0 or 1; this is 57 | fixed 58 | 59 | 04apr2025 (version 1.4.5) 60 | - return e(sinfo) added (undocumented) 61 | - -dstat summarize- now has undocumented option -noclean- to retain duplicate 62 | statistics 63 | 64 | 24mar2023 (version 1.4.4) 65 | - generate() stored the influence functions of the raw statistics rather than the 66 | influence functions of the transformed statistics if suboption -lnratio- was 67 | specified in over(); this also implied that vce(svy) reported the standard errors 68 | of the raw statistics rather than standard errors of the transformed statistics 69 | if suboption -lnratio- was specified in over(); this is fixed 70 | 71 | 28dec2022 (version 1.4.3) 72 | - command -dstat (somersd) Y, by(X)- computed D(X|Y) rather than D(Y|X); I now 73 | changed this so that D(Y|X) is computed, which is more intuitive (and more in 74 | line with how other asymmetric statistics are computed by dstat); thanks to 75 | Maurizio Pisati for pointing out this 
inconsistency 76 | 77 | 15dec2022 (version 1.4.2) 78 | - modified dstat_svyr such that replication-based svy estimators no longer 79 | apply checks for omitted coefficients; this prevents the estimators from 80 | failing on results that have zero variance (e.g. a zero-frequency histogram 81 | bar) 82 | 83 | 14dec2022 (version 1.4.1) 84 | - [no]cov is no longer a suboption within vce(); it is now a regular option 85 | - dstat predict now has option scaling() to determine the scaling of the 86 | generated influence functions 87 | - option nobwfixed added; code to obtain grid and bandwidth in case of 88 | replication estimators revised 89 | - revised implementation of vce(svy) 90 | - revised implementation of predict 91 | 92 | 12dec2022 (version 1.4.0) 93 | - dstat pw did not work with vce() set to bootstrap, jackknife, or svy; this is 94 | fixed 95 | - the returned information on sample and population size included observations 96 | that were excluded from estimation due to missing values if vce(svy) with 97 | replication-based variance estimation was specified; this is fixed 98 | - the secondary variable (-by-) can now be string for inequality decomposition 99 | measures as well as for cohend, mindex, uc[l|r], cramersv, and dissim 100 | 101 | 05dec2022 (version 1.3.9) 102 | - statistic -sdlog- added 103 | - new methods in citype() for proportions: agresti, exact, jeffreys, wilson 104 | - citype(normal) can now be abbreviated as citype(norm) 105 | - reorganized code for computation of CIs 106 | - dstat graph: overlay can now be specified as a synonym for merge 107 | - r() from -dstat- is now preserved if option -graph- is specified; this ensures 108 | that r(table) will be available after running -dstat- with both the -graph- 109 | option and the -table- option; furthermore, r() from dstat is now also 110 | preserved if option -generate()- or -rif()- is applied 111 | - the display routine is now executed even if -quietly- is applied to -dstat-, 112 | so that r(table) 
will be created even if -quietly- is applied 113 | - the display routine will now clear preexisting r() even if -notable- is applied 114 | - -dstat predict- no longer modifies r() 115 | - an informative error message is now displayed if a string variable is 116 | specified in by(), pline(), or as an argument to a statistic 117 | 118 | 21nov2022 (version 1.3.8) 119 | - dstat density: option [l|r]tight added; requires newest update of moremata 120 | 121 | 20oct2022 (version 1.3.7) 122 | - dstat returned error if option -nose- was applied with statistics that set 123 | standard errors to zero (e.g. min and max); this is fixed 124 | 125 | 22sep2022 (version 1.3.6) 126 | - dstat returned error if histogram method -scott- was specified; this is fixed 127 | - now using errprintf() to display errors in Mata 128 | 129 | 11aug2022 (version 1.3.5) 130 | - statistic -cohend- added 131 | - statistic -freq- without argument can now be used to obtain 132 | overall frequency/sum of weights; can also type -count- 133 | 134 | 17feb2022 (version 1.3.4) 135 | - dstat pw added (wrapper for dstat summarize to compute pairwise correlations 136 | and similar) 137 | - informative error message is now displayed if factor variables are used in 138 | -dstat proportion- without option -nocategorical- 139 | 140 | 14feb2022 (version 1.3.3) 141 | - additional statistics in dstat sum: -slope- or -b- (regression coefficient; 142 | may also be used to compute mean difference or risk difference), 143 | -or- (odds ratio in 2x2 table), -rr- (risk ratio in 2x2 table) 144 | - version of moremata library is now checked 145 | 146 | 17jan2022 (version 1.3.2) 147 | - option hdtrim() added (trimmed Harrell-Davis quantiles) 148 | - grid size in _ds_mq_d_init() now 1024+1 because first point will be removed 149 | 150 | 11jan2022 (version 1.3.1) 151 | - now using a properly derived expression for the influence function of 152 | Harrell-Davis quantiles (rather than obtaining the IF by analogy to the 153 | jackknife
approach proposed by Harrell and Davis 1982); the new formulas 154 | lead to slightly different results 155 | 156 | 07jan2022 (version 1.3.0) 157 | - dstat sum: huber, biweight, mad[n], mae[n], mscale now take account of qdef() 158 | - dstat sum: computation of IFs for winsor, qskew, qw, lqw, rqw revised so that 159 | qdef() is taken into account (only relevant if qdef=10 or qdef=11) 160 | 161 | 30dec2021 (version 1.2.9) 162 | - system for managing selection of observations and temporary results rewritten 163 | (more systematic, cleaner code, less error-prone, more efficient) 164 | - dstat sum: harmonic mean (hmean) is now set to zero if at least one outcome 165 | value is equal to zero 166 | 167 | 22dec2021 (version 1.2.8) 168 | - dstat sum: computation of taua was wrong in case of fweights; this is fixed 169 | - dstat sum: renamed cdfm to mcdf, cdff to fcdf, ccdfm to mccdf, ccdff to fccdf 170 | - system for parsing syntax of -dstat sum- rewritten (more general, cleaner 171 | code, easier to manage/expand, better error messages) 172 | 173 | 22dec2021 (version 1.2.7) 174 | - support for qdef(11) added (mid-quantile); option -mquantile- is a synonym 175 | for qdef(11) 176 | - dstat sum: mquantile, gw_vlog, w_vlog, b_vlog, ekurtosis, rsquared added 177 | - dstat sum: now using quad precision when taking cross products in variance, 178 | sd, cv, md, gini, vlog, sen, sst, takayama, lvar, mse, spearman, skewness, 179 | kurtosis, gci, corr, cov 180 | - default for napprox() increased from 512 to 1024 181 | - dstat histogram: in case of pweights or iweights, the effective sample size 182 | (sum(w)^2/sum(w^2)) is now used instead of the physical number of 183 | observations in the rules for selecting the number of bins 184 | - default bandwidth selector for density estimation is now -dpi(2)-; -sjpi- 185 | can be erratic on data that contains heaping 186 | - improved error messages and some code cleaning 187 | 188 | 05dec2021 (version 1.2.6) 189 | - IF of b_gini
assumes that the order of group means is stable; this is an 190 | assumption that is typically not very critical; comparison to the jackknife 191 | illustrates that the IF is quite accurate even in small samples; removed 192 | the corresponding disclaimer in the help file 193 | 194 | 05dec2021 (version 1.2.6) 195 | - dstat sum: b_gini added (IF not fully correct yet; may only serve as a rough 196 | approximation) 197 | - dstat sum: gw_gini, gw_mld, gw_theil, gw_ge added 198 | - dstat sum: mldwithin renamed to w_mld; mldbetween renamed to b_mld 199 | - dstat sum: theilwithin renamed to w_theil; theilbetween renamed to b_theil 200 | - dstat sum: gewithin renamed to w_ge; gebetween renamed to b_ge 201 | 202 | 04dec2021 (version 1.2.5) 203 | - dstat sum: gewithin and gebetween added 204 | - dstat sum: IF of dissim made more efficient 205 | 206 | 03dec2021 (version 1.2.4) 207 | - dstat sum: mldwithin, mldbetween, theilwithin, theilbetween, dissim added 208 | - dstat sum: now using more efficient approach to compute IFs of categorical 209 | measures (hhi, entropy, mindex, etc.) 210 | - option zvar() is now called by(); zvar() still supported but no longer documented 211 | 212 | 27nov2021 (version 1.2.3) 213 | - -nocasewise- had a bug that could crash -dstat- in some cases; this is fixed 214 | 215 | 26nov2021 (version 1.2.2) 216 | - new system to manage temporary results to improve efficiency of -dstat sum- 217 | - due to a typo the values for gamma and tau_b could be somewhat off if weights 218 | were specified; this is fixed 219 | 220 | 25nov2021 (version 1.2.1) 221 | - added association statistics: taua, taub, somersd, gamma; using a fast 222 | algorithm by R. Newson (2006. Efficient Calculation of Jackknife Confidence 223 | Intervals for Rank Statistics.
Journal of Statistical Software 15/1) to 224 | compute the difference in the sum of concordant and discordant pairs 225 | - dstat automatically (and silently) recentered (all) influence functions if 226 | any IF had a relative error (i.e. deviation from zero relative to the value 227 | of the statistic) larger than 1e-14; a corresponding warning message was only 228 | displayed if any IF had a relative error larger than 1e-6; the former type 229 | of recentering is now discarded; that is, recentering is now only applied 230 | if at least one relative error is larger than 1e-6 (all IFs will be 231 | affected) and a warning message is always displayed if recentering is applied 232 | - option -relax- could cause an error in some situations; this is fixed 233 | - dstat no longer enforces user version 14.2 when writing coefficient names to 234 | e(b) (enforcing user version 14.2 caused issues with bootstrap and similar 235 | commands); a consequence of this is that in Stata 15 (and in Stata 16 prior 236 | to the 30mar2021 update) the results table from -dstat summarize- might look 237 | slightly awkward if statistics with parameters in parentheses are specified; 238 | type -version 14: dstat summarize ...- for better output in these cases 239 | - over-legend is no longer displayed if the coefficients table is suppressed 240 | - subcmd is now always set to -summarize- if no known subcmd is specified; for 241 | example, -dstat x1-x5- now works 242 | 243 | 20nov2021 (version 1.2.0) 244 | - a bug in -nocasewise- led to erroneous selection of observations or crashed 245 | dstat in some situations; this is fixed 246 | - added statistics for categorical variables: hhi, hhin, gimp, entropy, hill, 247 | renyi, mindex, uc, cramer 248 | 249 | 03aug2021 (version 1.1.9) 250 | - fixed header layout in Stata 17, employing _coef_table_header options 251 | introduced in the 13jul2021 update of Stata 17 252 | 253 | 14jul2021 (version 1.1.8) 254 | - option -discrete- now allowed in -dstat
histogram-; -dstat histogram, discrete- 255 | is an alias for -dstat proportion, nocategorical- 256 | - graphs after -dstat proportion- now use a continuous axis instead of a categorical 257 | axis if option -nocategorical- has been specified 258 | - -dstat frequency- can now be used as alias for -dstat proportion, frequency- 259 | - statistic hdquantile() now fully supports weights; computation of influence 260 | functions has been improved 261 | - option -qdef(10)- can now be specified to use Harrell-Davis quantiles; option 262 | -hdquantile- is a synonym for -qdef(10)- 263 | 264 | 01jul2021 (version 1.1.7) 265 | - statistic hdquantile() added 266 | - SEs of quantile(0) and quantile(1) now set to 0 267 | - -dstat pdf- now allowed as alias for -dstat density- 268 | - better error message if an invalid subcommand is specified 269 | 270 | 30jun2021 (version 1.1.6) 271 | - additional poverty measures: tip (TIP ordinate) and atip (absolute TIP ordinate) 272 | - -dstat tip- failed if a variable was specified in -pline()- instead of a 273 | fixed value; this is fixed 274 | - -dstat tip- no longer returns HCR and PGI in e() 275 | 276 | 29jun2021 (version 1.1.5) 277 | - -dstat tip- (TIP curve) added 278 | - option range() added to subcommands density, cdf, ccdf, quantile, lorenz, tip 279 | - association measures added: corr (correlation), cov (covariance), spearman 280 | (Spearman's rank correlation) 281 | - additional poverty measures: apgap (absolute poverty gap), apgi (absolute 282 | poverty gap index) 283 | - contrast(lag) and contrast(lead) now allowed in over() 284 | - can now specify custom p1 and p2 with -iqrn- 285 | - observations with missing values on variables specified in zvar() or pline() (or 286 | corresponding variables specified as arguments to individual statistics) are 287 | no longer excluded from the overall estimation sample if -nocasewise- is 288 | specified 289 | - number of obs and sum of weights now returned for each parameter in e(nobs) 290 | and e(sumw)
291 | 292 | 23jun2021 (version 1.1.4) 293 | - additional inequality statistic: hoover index (robin hood index) 294 | - additional poverty statistics: hcr (head count ratio), pgap (poverty gap), 295 | pgi (poverty gap index), sen (Sen poverty index), sst (Sen-Shorrocks-Thon), 296 | takayama (Takayama poverty index), chu (Clark-Hemming-Ulph) 297 | - new option -pstrong- to employ the "strong" poverty definition; -fgt- now uses 298 | the "weak" definition by default 299 | - option -relax- of -dstat summarize- was not included in e() and was not passed 300 | through to -predict-; this is fixed 301 | - the routine computing -md- could break in some contexts; this is fixed 302 | 303 | 10jun2021 (version 1.1.2) 304 | - -predict- could fail after -dstat proportion-; this is fixed 305 | - contrast options -ratio- and -lnratio- now again supported for statistics 306 | that are not normalized by the sample size (frequencies, totals) 307 | - fixed bug that could occur if nocasewise and unconditional were both specified 308 | 309 | 07jun2021 (version 1.1.1) 310 | - option -nocasewise- added 311 | - option -relax- added 312 | - dstat now always uses scores for totals/frequencies instead of influence 313 | functions; (sub)option svy in -predict-, -vce(analytic)- and -vce(cluster)- 314 | is discontinued; option -unconditional(fixed)- is discontinued; treatment of 315 | totals/freqs now consistent with survey estimation by default (i.e. 
subpopulation 316 | sizes are assumed random; number of PSUs is assumed fixed); this is different 317 | from how official command -total- handles subpops if used without -svy- 318 | prefix 319 | - contrast options -ratio- and -lnratio- are no longer supported for statistics 320 | that are not normalized by the sample size (frequencies, totals); -ratio- and 321 | -lnratio- now imply -contrast- 322 | - option -compact- of -predict/generate()/rif()- no longer allowed with 323 | -over(, contrast/accumulate)- or with statistics that are not normalized by 324 | the sample size 325 | - dstat summarize applied sorting even if not necessary; this is fixed 326 | - omitted estimates are no longer flagged in the coefficient names; vector 327 | e(omit) is now returned 328 | - density estimation settings are now returned in e() only if density estimation 329 | has, in fact, been employed; e(bwidth) now has better column names 330 | - in some situations, dstat histogram computed wrong results for the first bin 331 | if option balance() was specified; this is fixed 332 | - _makesymmetric() is now applied to e(V) to remove asymmetry due to possible 333 | roundoff error 334 | 335 | 22dec2020 (version 1.1.0) 336 | - results for statistics mad(0,0), madn(0,0), mae(0), and maen(0) were wrong 337 | in case of weights; this is fixed 338 | 339 | 16dec2020 (version 1.0.9) 340 | - new suboptions -contrast()-, -ratio-, -lnratio-, and -accumulate- in -over()- 341 | - new -common- option in -dstat density-, -dstat histogram-, and -dstat [c]cdf- 342 | - new display options -cref- and -pvalue- 343 | - citype() now sets CI to missing if value of coef is outside domain of 344 | transformation function 345 | - option select() in -dstat graph- can now contain -reverse- instead of a 346 | numlist 347 | 348 | 11dec2020 (version 1.0.8) 349 | - cluster variable in vce(cluster) can now be string 350 | - over(..., rescale) now implemented as subcommand-specific option 351 | -unconditional-;
-unconditional(fixed)- added to treat subpopulation 352 | sizes as fixed 353 | - dstat cdf/ccdf: specifying -ipolate- together with -floor- returned error; this 354 | is fixed 355 | 356 | 10dec2020 (version 1.0.7) 357 | - vce(analytic/cluster, svy) 358 | o svy was not taken into account if no clusters and no weights, iweights, or 359 | fweights were specified; this is fixed 360 | o revised code to save memory and avoid double work 361 | - for reasons of consistency, in case of iweights, the sum of weights is now 362 | reported in e(N) instead of the physical number of observations 363 | 364 | 09dec2020 (version 1.0.6) 365 | - new option select() in -dstat graph- to select and order subgraphs and plots 366 | - new suboption select() in over(): select and order subpopulations to be included 367 | in results; total will still include obs from all groups 368 | - new suboption -rescale- in over(): rescale results by the relative size of the 369 | subpopulation 370 | - suboption -svy- in vce(analytic) and vce(cluster) to compute SEs for 371 | frequencies and totals like svy does 372 | - new statistics: min, max, range, midrange (IFs/SEs will be set to zero for 373 | these statistics) 374 | - vce(svy), vce(bootstrap), and vce(jackknife) now feature suboption [no]cov to 375 | decide whether to store full e(V) or only e(se); default is -cov- for 376 | -dstat summarize- and -nocov- otherwise; with vce(svy) option -nocov- also removes 377 | auxiliary covariance matrices such as e(V_srs) 378 | - dstat density: standard errors were correct only in the first subpopulation 379 | if -over()- was specified together with -exact-; this is fixed 380 | 381 | 05dec2020 (version 1.0.5) 382 | - new -dstat ccdf- command for complementary CDF (tail distribution, survival 383 | function) 384 | - -dstat cdf- has new options -frequency-, -percent-, -floor-, and -ipolate- 385 | - additional statistics: total(), cdff(), ccdf(), ccdfm(), ccdff() 386 | - statistics trim(p1,p2) and winsor(p1,p2) now
documented; furthermore, qdef() 387 | is now taken into account by trim() and winsor() 388 | - option -sum- in -dstat lorenz- and -dstat share- now documented 389 | - statistics tlorenz(), tshare(), tccurve(), tcshare() now documented 390 | - option generate() has a new -svy- suboption to generate scores for survey 391 | estimation instead of influence functions; this only makes a difference for 392 | unnormalized statistics (frequencies, totals) 393 | - VCE for unnormalized statistics (frequencies, totals) did not take account of 394 | the extra uncertainty induced by the variability of the sum of weights in the 395 | context of survey estimation; this is fixed 396 | - confidence limits had the wrong scale if -percent- was specified, citype() was not 397 | normal, and width of confidence interval was zero; this is fixed 398 | - predict after survey estimation with subpop() returned missing values for observations 399 | outside subpop(); the IFs for these observations are now set to 0 400 | - revised code of some IFs to avoid double work; affected functions are 401 | dstat_density_IF(), dstat_cdf_IF(), dstat_sum_hist(), dstat_sum_cdf(), 402 | dstat_sum_cdfm(), dstat_sum_freq() 403 | - now using pstyle(p#line) instead of pstyle(p#) in graphs if appropriate 404 | - no longer using mm_repeat(); using J() instead 405 | 406 | 27nov2020 (version 1.0.4) 407 | - "version, user" issue now finally fixed (hopefully); the issue was related 408 | to -set dp comma- 409 | 410 | 27nov2020 (version 1.0.3) 411 | - yet another try to fix the "version, user" issue 412 | 413 | 27nov2020 (version 1.0.2) 414 | - graph option -merge- added 415 | - added code to circumvent the "version, user" error that appears to occur 416 | in some variants of Stata installations 417 | 418 | 24nov2020 (version 1.0.1) 419 | - issues encountered with regexr() in Stata 14; no longer using regexr() 420 | - fixed another awkward Stata 14 issue 421 | 422 | 24nov2020 (version 1.0.0) 423 | - dstat released on GitHub
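The 22dec2021 (version 1.2.7) entry notes that, with pweights or iweights, `dstat histogram` uses the effective sample size sum(w)^2/sum(w^2) instead of the physical number of observations when selecting the number of bins. As an illustration of that formula only, here is a minimal Python sketch; the helper name `effective_sample_size` is hypothetical, and this is not the Mata code used by `dstat`.

```python
def effective_sample_size(weights):
    """Return (sum w)^2 / sum(w^2) for a sequence of weights.

    Hypothetical helper that only illustrates the formula quoted in the
    changelog; it is not part of dstat.
    """
    w = [float(x) for x in weights]
    if not w:
        raise ValueError("weights must not be empty")
    sum_w = sum(w)
    sum_w2 = sum(x * x for x in w)
    if sum_w2 == 0.0:
        # All weights are zero, so the ratio is undefined.
        raise ValueError("sum of squared weights is zero")
    return sum_w * sum_w / sum_w2


if __name__ == "__main__":
    # Equal weights: the effective size equals the number of observations.
    print(effective_sample_size([2, 2, 2, 2]))  # 4.0
    # Unequal weights: the effective size drops below the 4 observations.
    print(effective_sample_size([1, 1, 1, 5]))
```

With equal weights the result equals the number of observations; the more unequal the weights, the further it falls below that count.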
424 | 425 | -------------------------------------------------------------------------------- /dstat.pkg: -------------------------------------------------------------------------------- 1 | v 3 2 | d dstat: Stata module to compute summary statistics and distribution functions including standard errors and optional covariate balancing 3 | d 4 | d Author: Ben Jann, University of Bern, ben.jann@unibe.ch 5 | d 6 | d Distribution-Date: 20250604 7 | f dstat.ado 8 | f dstat.sthlp 9 | f dstat_svyr.ado 10 | -------------------------------------------------------------------------------- /dstat.sthlp: -------------------------------------------------------------------------------- 1 | {smcl} 2 | {* 25apr2025}{...} 3 | {viewerjumpto "Syntax" "dstat##syntax"}{...} 4 | {viewerjumpto "Description" "dstat##description"}{...} 5 | {viewerjumpto "Summary statistics" "dstat##stats"}{...} 6 | {viewerjumpto "Options" "dstat##options"}{...} 7 | {viewerjumpto "Examples" "dstat##examples"}{...} 8 | {viewerjumpto "Methods and formulas" "dstat##methods"}{...} 9 | {viewerjumpto "Saved results" "dstat##saved_results"}{...} 10 | {viewerjumpto "References" "dstat##references"}{...} 11 | {hline} 12 | help for {hi:dstat}{...} 13 | {right:{browse "http://github.com/benjann/dstat/"}} 14 | {hline} 15 | 16 | {title:Title} 17 | 18 | {pstd}{hi:dstat} {hline 2} Summary statistics and distribution functions 19 | 20 | 21 | {marker syntax}{...} 22 | {title:Syntax} 23 | 24 | {pstd} 25 | Estimation 26 | 27 | {pmore} 28 | Summary statistics 29 | 30 | {p 12 17 2} 31 | {cmd:dstat} [{cmdab:su:mmarize}] [{cmd:(}{it:{help dstat##statistics:stats}}{cmd:)}] {varlist} 32 | [ {cmd:(}{it:{help dstat##statistics:stats}}{cmd:)} {varlist} {it:...} ] 33 | {ifin} {weight} [{cmd:,} {help dstat##opts:{it:options}} ] 34 | 35 | {pmore} 36 | Pairwise associations (wrapper for {cmd:dstat summarize}) 37 | 38 | {p 12 17 2} 39 | {cmd:dstat} {cmdab:pw} {varlist} {ifin} {weight} [{cmd:,} 40 | {cmdab:s:tatistic}{cmd:(}{help 
dstat##pw:{it:stat}}{cmd:)} 41 | {help dstat##opts:{it:options}} ] 42 | 43 | {pmore} 44 | Distribution functions 45 | 46 | {p 12 17 2} 47 | {cmd:dstat} {it:subcmd} {varlist} {ifin} {weight} [{cmd:,} {help dstat##opts:{it:options}} ] 48 | 49 | {pmore2} 50 | where {it:subcmd} is 51 | 52 | {p2colset 15 28 30 2}{...} 53 | {p2col:{opt d:ensity}}density function{p_end} 54 | {p2col:{opt pdf}}same as {cmd:density}{p_end} 55 | {p2col:{opt h:istogram}}histogram{p_end} 56 | {p2col:{opt p:roportion}}probability distribution{p_end} 57 | {p2col:{opt freq:uency}}same as {cmd:proportion} with option {cmd:frequency}{p_end} 58 | {p2col:{opt c:df}}cumulative distribution function{p_end} 59 | {p2col:{opt cc:df}}complementary CDF/survival function{p_end} 60 | {p2col:{opt q:uantile}}quantile function{p_end} 61 | {p2col:{opt l:orenz}}lorenz curve{p_end} 62 | {p2col:{opt sh:are}}percentile shares{p_end} 63 | {p2col:{opt tip}}TIP curve{p_end} 64 | 65 | {pmore} 66 | {it:varlist} may contain factor variables (in most cases; an exception is {cmd:dstat pw}); see {help fvvarlist}. 67 | {p_end} 68 | {pmore} 69 | {cmd:fweight}s, {cmd:pweight}s, and {cmd:iweight}s are allowed; see {help weight}. 
70 | 71 | {pstd} 72 | Postestimation 73 | 74 | {pmore} 75 | Replay results 76 | 77 | {p 12 17 2} 78 | {cmd:dstat} [{cmd:,} {help dstat##repopts:{it:reporting_options}} ] 79 | 80 | {pmore} 81 | Draw graph 82 | 83 | {p 12 17 2} 84 | {cmd:dstat} {cmdab:gr:aph} 85 | [{cmd:,} {help dstat##graph_opts:{it:graph_options}} ] 86 | 87 | {pmore} 88 | Obtain (recentered) influence functions 89 | 90 | {p 12 17 2} 91 | {cmd:predict} {c -(}{help newvarlist##stub*:{it:stub}}{cmd:*} | 92 | {it:{help newvar:newvar1}} {it:{help newvar:newvar2}} {cmd:...}{c )-} {ifin} 93 | [{cmd:,} {it:{help dstat##predict_opts:predict_options}} ] 94 | 95 | 96 | {synoptset 26 tabbed}{...} 97 | {marker opts}{col 5}{help dstat##options:{it:options}}{col 33}Description 98 | {synoptline} 99 | {syntab:{help dstat##mainopts:Main}} 100 | {synopt:{opt nocase:wise}}do not perform casewise deletion of observations 101 | {p_end} 102 | {synopt:{cmdab:o:ver(}{help varname:{it:overvar}}[{cmd:,} {it:opts}]{cmd:)}}compute 103 | results for subpopulations defined by {it:overvar}; not allowed with {cmd:dstat pw} 104 | {p_end} 105 | {synopt:{opt tot:al}}include results for total population 106 | {p_end} 107 | {synopt:{cmdab:bal:ance(}{help dstat##balance:{it:spec}}{cmd:)}}balance 108 | covariates using reweighting; requires {cmd:over()} 109 | {p_end} 110 | {synopt:{help dstat##repopts:{it:reporting_options}}}reporting options 111 | {p_end} 112 | {synopt:{opt noval:ues}}do not use values as coefficient names 113 | {p_end} 114 | {synopt:{opth vf:ormat(fmt)}}format for coefficient name values 115 | {p_end} 116 | 117 | {syntab:{help dstat##vce:SE/VCE}} 118 | {synopt:{cmd:vce(}{it:vcetype}{cmd:)}}variance estimation method; 119 | {it:vcetype} may be {cmd:none} (skip variance estimation), 120 | {cmdab:a:nalytic}, {cmdab:cl:uster} {it:clustvar}, {cmdab:svy}, {cmdab:boot:strap}, 121 | or {cmdab:jack:knife} 122 | {p_end} 123 | {synopt:{cmd:nose}}alias for {cmd:vce(none)} 124 | {p_end} 125 | {synopt:[{cmd:no}]{cmd:cov}}whether to 
store the full variance matrix or only the 126 | standard errors 127 | {p_end} 128 | {synopt:{opt nobwfix:ed}}do not keep density bandwidth fixed across replications 129 | {p_end} 130 | {synopt:{cmdab:g:enerate(}{it:names}[{cmd:,} {it:opts}]{cmd:)}}store influence functions 131 | {p_end} 132 | {synopt:{cmd:rif(}{it:names}[{cmd:,} {it:opts}]{cmd:)}}store recentered influence functions 133 | {p_end} 134 | {synopt:{opt r:eplace}}allow replacing existing variables 135 | {p_end} 136 | 137 | {syntab:{help dstat##quant:Quantile/density settings}} 138 | {synopt:{opt qdef(#)}}quantile definition; # in {c -(}0,...,11{c )-} 139 | {p_end} 140 | {synopt:{opt hdq:uantile}}synonym for {cmd:qdef(10)} (Harrell-Davis quantiles) 141 | {p_end} 142 | {synopt:{opt hdt:rim}[{cmd:(}{it:width}{cmd:)}]}apply trimming to the Harrell-Davis estimator 143 | {p_end} 144 | {synopt:{opt mq:uantile}}synonym for {cmd:qdef(11)} (mid-quantiles) 145 | {p_end} 146 | {synopt:{opt mqopt:s(options)}}options for mid-quantiles 147 | {p_end} 148 | {synopt:{it:{help dstat##densopts:density_options}}}details of density estimation 149 | {p_end} 150 | 151 | {syntab:{help dstat##sum:Subcommand {bf:summarize}}} 152 | {synopt:{opt relax}}compute a statistic even if observations are out of support 153 | {p_end} 154 | {synopt:{opth by(varname)}}default secondary variable (for association and concentration measures) 155 | {p_end} 156 | {synopt:{opt pl:ine(#|varname)}}default poverty line 157 | {p_end} 158 | {synopt:{opt pstr:ong}}use "strong" poverty definition 159 | {p_end} 160 | 161 | {syntab:{help dstat##pw:Subcommand {bf:pw}}} 162 | {synopt:{opt s:tatistic(stat)}}type of association measure; default is {cmd:statistic(corr)} 163 | {p_end} 164 | {synopt:{opt lo:wer}}lower-triangle elements only 165 | {p_end} 166 | {synopt:{opt up:per}}upper-triangle elements only 167 | {p_end} 168 | {synopt:{opt diag:onal}}include diagonal elements 169 | {p_end} 170 | 171 | {syntab:{help dstat##density:Subcommand {bf:density}}} 172 | 
{synopt:{opt n(#)}}size of evaluation grid; default is {cmd:n(99)} 173 | {p_end} 174 | {synopt:{opt com:mon}}use common evaluation points across subpopulations 175 | {p_end} 176 | {synopt:[{cmd:l}|{cmd:r}]{cmd:tight}}use tight evaluation grid 177 | {p_end} 178 | {synopt:{opt range(a b)}}use grid from {it:a} to {it:b}; default is to determine 179 | grid range from data 180 | {p_end} 181 | {synopt:{opth at(numlist)}}custom grid of evaluation points 182 | {p_end} 183 | {synopt:{cmdab:unc:onditional}}rescale results by relative size of subpopulation 184 | {p_end} 185 | 186 | {syntab:{help dstat##hist:Subcommand {bf:histogram}}} 187 | {synopt:{opt prop:ortion}}estimate proportions instead of densities 188 | {p_end} 189 | {synopt:{opt per:cent}}estimate percent instead of densities 190 | {p_end} 191 | {synopt:{opt freq:uency}}estimate frequencies instead of densities 192 | {p_end} 193 | {synopt:{cmd:n(}{cmd:#}|{it:{help dstat##hist:method}}{cmd:)}}number of 194 | histogram bins; default is {cmd:n(sqrt)} 195 | {p_end} 196 | {synopt:{cmd:ep}}use equal probability bins instead of equal width bins 197 | {p_end} 198 | {synopt:{opt com:mon}}use common bin definitions across subpopulations 199 | {p_end} 200 | {synopt:{opth at(numlist)}}custom bin definitions 201 | {p_end} 202 | {synopt:{opt disc:rete}}treat data as discrete (calls {cmd:dstat proportion}) 203 | {p_end} 204 | {synopt:{cmdab:unc:onditional}}rescale results by relative size of subpopulation 205 | {p_end} 206 | 207 | {syntab:{help dstat##prop:Subcommand {bf:proportion}}} 208 | {synopt:{opt per:cent}}estimate percent instead of probabilities 209 | {p_end} 210 | {synopt:{opt freq:uency}}estimate frequencies instead of probabilities 211 | {p_end} 212 | {synopt:{opth at(numlist)}}custom list of levels for which to estimate proportions 213 | {p_end} 214 | {synopt:{opt nocat:egorical}}allow variables that do not comply with Stata's rules 215 | for factor variables 216 | {p_end} 217 | {synopt:{cmdab:unc:onditional}}rescale
results by relative size of subpopulation 218 | {p_end} 219 | 220 | {syntab:{help dstat##cdf:Subcommands {bf:cdf} and {bf:ccdf}}} 221 | {synopt:{opt per:cent}}estimate percent instead of probabilities 222 | {p_end} 223 | {synopt:{opt freq:uency}}estimate frequencies instead of probabilities 224 | {p_end} 225 | {synopt:{opt mid}}apply midpoint adjustment 226 | {p_end} 227 | {synopt:{opt fl:oor}}use "lower-than" definition 228 | {p_end} 229 | {synopt:{opt n(#)}}size of evaluation grid; default is {cmd:n(99)} 230 | {p_end} 231 | {synopt:{opt com:mon}}use common evaluation points across subpopulations 232 | {p_end} 233 | {synopt:{opt range(a b)}}use grid from {it:a} to {it:b}; default is to determine 234 | grid range from data 235 | {p_end} 236 | {synopt:{opth at(numlist)}}custom grid of evaluation points 237 | {p_end} 238 | {synopt:{opt disc:rete}}treat data as discrete 239 | {p_end} 240 | {synopt:{opt ip:olate}}obtain CDF by linear interpolation 241 | {p_end} 242 | {synopt:{cmdab:unc:onditional}}rescale results by relative size of subpopulation 243 | {p_end} 244 | 245 | {syntab:{help dstat##quantile:Subcommand {bf:quantile}}} 246 | {synopt:{opt n(#)}}size of evaluation grid; default is {cmd:n(99)} 247 | {p_end} 248 | {synopt:{opt range(a b)}}use grid within range from {it:a} to {it:b}, {it:a} and {it:b} 249 | in [0,1]; default is {cmd:range(0 1)} 250 | {p_end} 251 | {synopt:{opth at(numlist)}}custom grid of evaluation points 252 | {p_end} 253 | 254 | {syntab:{help dstat##lorenz:Subcommand {bf:lorenz}}} 255 | {synopt:{opt per:cent}}report percent instead of proportions 256 | {p_end} 257 | {synopt:{opt general:ized}}estimate generalized Lorenz curve 258 | {p_end} 259 | {synopt:{opt sum}}estimate total (unnormalized) Lorenz curve 260 | {p_end} 261 | {synopt:{opt gap}}estimate equality gap curve 262 | {p_end} 263 | {synopt:{opt abs:olute}}estimate absolute Lorenz curve 264 | {p_end} 265 | {synopt:{opth by(varname)}}estimate concentration curve with respect to specified 
variable 266 | {p_end} 267 | {synopt:{opt n(#)}}size of evaluation grid; default is {cmd:n(101)} 268 | {p_end} 269 | {synopt:{opt range(a b)}}use grid from {it:a} to {it:b}, {it:a} and {it:b} 270 | in [0,1]; default is {cmd:range(0 1)} 271 | {p_end} 272 | {synopt:{opth at(numlist)}}custom grid of evaluation points 273 | {p_end} 274 | 275 | {syntab:{help dstat##share:Subcommand {bf:share}}} 276 | {synopt:{opt prop:ortion}}estimate proportions instead of densities 277 | {p_end} 278 | {synopt:{opt per:cent}}estimate percent instead of densities 279 | {p_end} 280 | {synopt:{opt general:ized}}estimate generalized shares instead of densities 281 | {p_end} 282 | {synopt:{opt sum}}estimate totals instead of densities 283 | {p_end} 284 | {synopt:{opt ave:rage}}estimate averages instead of densities 285 | {p_end} 286 | {synopt:{opth by(varname)}}estimate concentration shares with respect to the specified variable 287 | {p_end} 288 | {synopt:{opt n(#)}}number of bins; default is {cmd:n(20)} 289 | {p_end} 290 | {synopt:{opth at(numlist)}}custom bin definitions 291 | {p_end} 292 | 293 | {syntab:{help dstat##tip:Subcommand {bf:tip}}} 294 | {synopt:{opt pl:ine(#|varname)}}poverty line (required) 295 | {p_end} 296 | {synopt:{opt abs:olute}}estimate absolute TIP curve 297 | {p_end} 298 | {synopt:{opt pstr:ong}}use "strong" poverty definition 299 | {p_end} 300 | {synopt:{opt n(#)}}size of evaluation grid; default is {cmd:n(101)} 301 | {p_end} 302 | {synopt:{opt range(a b)}}use grid from {it:a} to {it:b}, {it:a} and {it:b} 303 | in [0,1]; default is {cmd:range(0 1)} 304 | {p_end} 305 | {synopt:{opth at(numlist)}}custom grid of evaluation points 306 | {p_end} 307 | {synoptline} 308 | {pstd} 309 | 310 | {marker graph_opts}{col 5}{help dstat##graph_options:{it:graph_options}}{col 33}Description 311 | {synoptline} 312 | {synopt:{cmdab:mer:ge}}merge results into a single subgraph 313 | {p_end} 314 | {synopt:{cmdab:overl:ay}}synonym for {cmd:merge} 315 | {p_end} 316 | {synopt:{cmd:flip}}change
how results are allocated to plots and subgraphs 317 | {p_end} 318 | {synopt:[{ul:{cmd:g}}|{ul:{cmd:p}}]{opt sel:ect(spec)}}select and order plots and subgraphs 319 | {p_end} 320 | {synopt:{opt cref}}include results from the reference (sub)population 321 | {p_end} 322 | {synopt:{cmdab:bys:tats}[{cmd:(}{it:arg}{cmd:)}]}group results by statistics; only relevant for {cmd:dstat summarize} 323 | {p_end} 324 | {synopt:[{cmd:no}]{cmd:step}}do/do not use step function; only relevant for {cmd:dstat cdf} 325 | {p_end} 326 | {synopt:{cmdab:noref:line}}suppress the equality line; only relevant for {cmd:dstat lorenz} 327 | {p_end} 328 | {synopt:{opth ref:line(line_options)}}affect rendering of the equality line; only relevant for {cmd:dstat lorenz} 329 | {p_end} 330 | {synopt:{help dstat##coefplot:{it:coefplot_options}}}options to be passed through to {helpb coefplot} 331 | {p_end} 332 | {synoptline} 333 | 334 | {marker predict_opts}{col 5}{help dstat##predict_options:{it:predict_options}}{col 33}Description 335 | {synoptline} 336 | {synopt:{opt rif}}store recentered influence functions 337 | {p_end} 338 | {synopt:{cmdab:sca:ling(}{cmdab:t:otal}|{cmdab:m:ean}{cmd:)}}set the scaling of the influence functions 339 | {p_end} 340 | {synopt:{opt com:pact}}store influence functions in compact form; not allowed with {cmd:balance()} or {cmd:unconditional} 341 | {p_end} 342 | {synopt:{opt qui:etly}}do not display list of generated variables 343 | {p_end} 344 | {synoptline} 345 | 346 | 347 | {marker description}{...} 348 | {title:Description} 349 | 350 | {pstd} 351 | {cmd:dstat} provides a unified framework for the analysis of (univariate) 352 | distributions.
It supports the estimation of various distribution 353 | functions (such as the PDF and CDF, quantiles, probabilities and 354 | frequencies, histograms, Lorenz and concentration curves) and a large 355 | collection of summary statistics (classical and robust measures of 356 | location, scale, skewness, and kurtosis, measures of inequality, 357 | concentration, and poverty). 358 | 359 | {pstd} 360 | {cmd:dstat} is an estimation command. Its results are stored 361 | in {cmd:e()} and standard errors are provided for all 362 | estimates. Variance-covariance estimation is based on influence functions 363 | (Hampel 1974, Deville 1999) and fully supports complex survey data; see the {helpb dstat##vce:vce()} 364 | option. Influence functions or recentered influence functions (RIFs) can be 365 | generated for all statistics supported by {cmd:dstat}, either using the 366 | {helpb dstat##generate:generate()} or {cmd:rif()} option, or 367 | by applying {cmd:predict} after estimation. 368 | 369 | {pstd} 370 | {cmd:dstat} supports simultaneous estimation for multiple variables and 371 | multiple subpopulations, and allows for covariate balancing or 372 | standardization between subpopulations based on inverse probability 373 | weighting (IPW) or entropy balancing. See the {helpb dstat##over:over()} 374 | and {helpb dstat##balance:balance()} options. Standard errors will take 375 | account of the uncertainty induced by the balancing. 376 | 377 | {pstd} 378 | Basic functionality for graphing results is provided through the 379 | {cmd:graph()} option or by applying command {cmd:dstat graph} 380 | after estimation. {cmd:dstat} employs {helpb coefplot} for 381 | graphing, which needs to be installed on the system; see 382 | {net "describe coefplot, from(http://fmwww.bc.edu/repec/bocode/c/)":{bf:ssc describe coefplot}}. 
Furthermore, 383 | {cmd:dstat} requires the {helpb moremata} package; see 384 | {net "describe moremata, from(http://fmwww.bc.edu/repec/bocode/m/)":{bf:ssc describe moremata}}. 385 | 386 | 387 | {marker statistics}{...} 388 | {title:Summary statistics} 389 | 390 | {pstd} 391 | The syntax for specifying summary statistics and variables with 392 | {cmd:dstat summarize} is 393 | 394 | [ {cmd:(}{it:{help dstat##stats:stats}}{cmd:)} ] {varlist} [ {cmd:(}{it:{help dstat##stats:stats}}{cmd:)} {varlist} {it:...} ] 395 | 396 | {pstd} 397 | where {it:stats} is a space-separated list of statistics as documented below 398 | and {it:varlist} is a list of numeric variables, possibly including factor 399 | variables (see {help fvvarlist}). The default statistic is {cmd:mean}. Statistics 400 | and variables may be repeated. {cmd:dstat} will rearrange the statistics by variables 401 | and remove duplicate combinations. 402 | 403 | {pstd} 404 | The names of the statistics can be abbreviated and typed in lowercase or 405 | uppercase letters (the names will be used as typed in the output; repeated 406 | statistics with differently typed names will be treated as different 407 | statistics). If an abbreviation is ambiguous, the first matching statistic in 408 | the sorted list of supported statistics will be used (with the following 409 | exceptions: {cmd:q} or {cmd:p} can be used for {cmd:quantile}, {cmd:d} 410 | for {cmd:density}, and {cmd:f} for {cmd:freq}). For example, to obtain the 411 | geometric mean, you could type {cmd:gmean}, {cmd:gm}, {cmd:GM}, {cmd:gMean}, 412 | {cmd:gme}, or any other variant including at least the first two letters. 413 | 414 | {pstd} 415 | Many of the statistics allow or require one or several arguments in 416 | parentheses. Parentheses can be omitted if there is only a single numeric 417 | argument and no space is included between the name and the argument.
For 418 | example, to obtain the 5% trimmed mean you could type {cmd:trim(5)} or 419 | simply {cmd:trim5} (omitting parentheses also works with numbers that 420 | contain decimal places, that is, you could type {cmd:trim5.5} to obtain the 421 | 5.5% trimmed mean; in this case, however, parentheses will be added in the 422 | output). 423 | 424 | {synoptset 27 tabbed}{...} 425 | {marker stats}{col 5}{it:stats}{col 33}Description 426 | {synoptline} 427 | {syntab:Points in the distribution} 428 | {synopt:{opt quantile}{cmd:(}{it:p}{cmd:)}}{it:p}/100 quantile; {it:p} in [0,100] 429 | {p_end} 430 | {synopt:{opt p}{cmd:(}{it:p}{cmd:)}}same as {cmd:quantile()} 431 | {p_end} 432 | {synopt:{opt hdquantile}{cmd:(}{it:p}{cmd:)}}{it:p}/100 Harrell/Davis (1982) quantile; {it:p} in [0,100] 433 | {p_end} 434 | {synopt:{opt mquantile}{cmd:(}{it:p}{cmd:)}}{it:p}/100 mid-quantile; {it:p} in [0,100] 435 | {p_end} 436 | {synopt:{opt density}{cmd:(}{it:x}{cmd:)}}kernel density at value {it:x} 437 | {p_end} 438 | {synopt:{opt hist}{cmd:(}{it:x1}{cmd:,}{it:x2}{cmd:)}}histogram density of data within ({it:x1},{it:x2}] 439 | {p_end} 440 | {synopt:[*]{opt cdf}{cmd:(}{it:x}{cmd:)}}cumulative distribution (CDF) at value {it:x}; prefix {it:*} is empty for default, 441 | {cmd:m} for mid-adjusted CDF, {cmd:f} for floor CDF 442 | {p_end} 443 | {synopt:[*]{opt ccdf}{cmd:(}{it:x}{cmd:)}}complementary CDF at value {it:x}; prefix {it:*} is empty for default, 444 | {cmd:m} for mid-adjusted CCDF, {cmd:f} for floor CCDF 445 | {p_end} 446 | {synopt:{opt prop}{cmd:(}{it:x1}[{cmd:,}{it:x2}]{cmd:)}}proportion of data equal to {it:x1} or within [{it:x1},{it:x2}] 447 | {p_end} 448 | {synopt:{opt pct}{cmd:(}{it:x1}[{cmd:,}{it:x2}]{cmd:)}}percent of data equal to {it:x1} or within [{it:x1},{it:x2}] 449 | {p_end} 450 | {synopt:{opt freq}[{cmd:(}{it:x1}[{cmd:,}{it:x2}]{cmd:)}]}overall frequency, or frequency of data equal to {it:x1} or within [{it:x1},{it:x2}] 451 | {p_end} 452 | {synopt:{opt 
count}[{cmd:(}{it:x1}[{cmd:,}{it:x2}]{cmd:)}]}same as {cmd:freq()} 453 | {p_end} 454 | {synopt:{opt total}[{cmd:(}{it:x1}[{cmd:,}{it:x2}]{cmd:)}]}overall total, or total of data equal to {it:x1} or within [{it:x1},{it:x2}] 455 | {p_end} 456 | {synopt:{opt min}}observed minimum (standard error set to zero) 457 | {p_end} 458 | {synopt:{opt max}}observed maximum (standard error set to zero) 459 | {p_end} 460 | {synopt:{opt range}}{cmd:max}-{cmd:min} (standard error set to zero) 461 | {p_end} 462 | {synopt:{opt midrange}}({cmd:min}+{cmd:max})/2 (standard error set to zero) 463 | {p_end} 464 | 465 | {syntab:Location measures} 466 | {synopt:{opt mean}}arithmetic mean 467 | {p_end} 468 | {synopt:{opt gmean}}geometric mean (data must be positive) 469 | {p_end} 470 | {synopt:{opt hmean}}harmonic mean (data must be positive) 471 | {p_end} 472 | {synopt:{cmd:trim(}{it:p1}[{cmd:,}{it:p2}]{cmd:)}}trimmed mean with 473 | {it:p1}/100 lower trimming and {it:p2}/100 upper trimming; {it:p1} and {it:p2} in 474 | [0,50]; {it:p2}={it:p1} if omitted; default is {it:p1}={it:p2}=25 475 | {p_end} 476 | {synopt:{cmd:winsor(}{it:p1}[{cmd:,}{it:p2}]{cmd:)}}winsorized mean with 477 | {it:p1}/100 lower winsorizing and {it:p2}/100 upper winsorizing; {it:p1} and {it:p2} in 478 | [0,50]; {it:p2}={it:p1} if omitted; default is {it:p1}={it:p2}=25 479 | {p_end} 480 | {synopt:{opt median}}median; equal to {cmd:q50} 481 | {p_end} 482 | {synopt:{opt huber}[{cmd:(}{it:p}{cmd:)}]}Huber M estimate with gaussian efficiency 483 | {it:p} in [63.7,99.9]; default is {it:p}=95 484 | {p_end} 485 | {synopt:{opt biweight}[{cmd:(}{it:p}{cmd:)}]}biweight M estimate with gaussian 486 | efficiency {it:p} in [.01,99.9]; default is {it:p}=95 487 | {p_end} 488 | {synopt:{opt hl}}Hodges-Lehmann location measure (Hodges and Lehmann 1963) 489 | {p_end} 490 | 491 | {syntab:Scale measures} 492 | {synopt:{opt sd}[{cmd:(}{it:df}{cmd:)}]}standard deviation; {it:df} applies 493 | small-sample adjustment; default is {it:df}=1 494 | 
{p_end} 495 | {synopt:{opt variance}[{cmd:(}{it:df}{cmd:)}]}variance; default is {it:df}=1 496 | {p_end} 497 | {synopt:{opt mse}[{cmd:(}{it:x}[{cmd:,}{it:df}]{cmd:)}]}mean squared deviation from value 498 | {it:x} (mean squared error); default is {it:x}=0 and {it:df}=0 499 | {p_end} 500 | {synopt:{opt rmse}[{cmd:(}{it:x}[{cmd:,}{it:df}]{cmd:)}]}root mean 501 | squared deviation from value {it:x}; default is {it:x}=0 and {it:df}=0 502 | {p_end} 503 | {synopt:{opt iqr}[{cmd:n}][{cmd:(}{it:p1}{cmd:,}{it:p2}{cmd:)}]}interquantile range; default 504 | is {cmd:iqr(25,75)} (interquartile range); specify {cmd:iqrn} for 505 | normalized IQR, equal to 1/(invnormal({it:p2}/100) - invnormal({it:p1}/100)) * {cmd:iqr} 506 | {p_end} 507 | {synopt:{opt mad}[{cmd:n}][{cmd:(}{it:l}[{cmd:,}{it:t}]{cmd:)}]}median (or mean if {it:l}!=0) 508 | absolute deviation from the median (or mean if {it:t}!=0); specify {cmd:madn} for 509 | normalized MAD, equal to 1/invnormal(0.75) * {cmd:mad} (or sqrt(pi/2) * {cmd:mad} if {it:l}!=0) 510 | {p_end} 511 | {synopt:{opt mae}[{cmd:n}][{cmd:(}{it:l}[{cmd:,}{it:x}]{cmd:)}]}median (or mean if {it:l}!=0) 512 | absolute deviation from value {it:x}; default is {it:x}=0; specify {cmd:maen} for 513 | normalized MAE, equal to 1/invnormal(0.75) * {cmd:mae} (or sqrt(pi/2) * {cmd:mae} if {it:l}!=0) 514 | {p_end} 515 | {synopt:{opt md}[{cmd:n}]}mean absolute pairwise difference; equal to 2 * {cmd:mean} * {cmd:gini}; specify {cmd:mdn} for 516 | normalized MD, equal to sqrt(pi)/2 * {cmd:md} 517 | {p_end} 518 | {synopt:{opt mscale}[{cmd:(}{it:bp}{cmd:)}]}M estimate of scale with breakdown 519 | point {it:bp} in [1,50]; default is {it:bp}=50 520 | {p_end} 521 | {synopt:{opt qn}}Qn scale coefficient (Rousseeuw and Croux 1993) 522 | {p_end} 523 | 524 | {syntab:Skewness measures} 525 | {synopt:{opt skewness}}skewness 526 | {p_end} 527 | {synopt:{opt qskew}[{cmd:(}{it:alpha}{cmd:)}]}quantile skewness (Hinkley 1975); 528 | {it:alpha} in [0,50]; default is {it:alpha}=25 529 |
{p_end} 530 | {synopt:{opt mc}}medcouple (Brys et al. 2004) 531 | {p_end} 532 | 533 | {syntab:Kurtosis measures} 534 | {synopt:{opt kurtosis}}kurtosis 535 | {p_end} 536 | {synopt:{opt ekurtosis}}excess kurtosis; equal to {cmd:kurtosis}-3 537 | {p_end} 538 | {synopt:{opt qw}[{cmd:(}{it:alpha}{cmd:)}]}quantile tail weight; {it:alpha} 539 | in [0,50]; default is {it:alpha}=25 540 | {p_end} 541 | {synopt:{opt lqw}[{cmd:(}{it:alpha}{cmd:)}]}left quantile tail weight; {it:alpha} 542 | in [0,50]; default is {it:alpha}=25 543 | {p_end} 544 | {synopt:{opt rqw}[{cmd:(}{it:alpha}{cmd:)}]}right quantile tail weight; 545 | {it:alpha} in [0,50]; default is {it:alpha}=25 546 | {p_end} 547 | {synopt:{opt lmc}}left medcouple tail weight measure (Brys et al. 2006) 548 | {p_end} 549 | {synopt:{opt rmc}}right medcouple tail weight measure (Brys et al. 2006) 550 | {p_end} 551 | 552 | {syntab:Inequality measures} 553 | {synopt:{opt hoover}}Hoover index (Robin Hood index, Ricci-Schutz, Pietra index) 554 | {p_end} 555 | {synopt:[{cmd:a}]{opt gini}[{cmd:(}{it:df}{cmd:)}]}Gini coefficient; {it:df} applies 556 | small-sample adjustment; default is {it:df}=0; specify {cmd:agini} for the absolute Gini coefficient 557 | {p_end} 558 | {synopt:{opt mld}}mean log deviation (MLD); equal to {cmd:ge(0)} 559 | {p_end} 560 | {synopt:{opt theil}}Theil index; equal to {cmd:ge(1)} 561 | {p_end} 562 | {synopt:{opt ge}[{cmd:(}{it:alpha}{cmd:)}]}generalized entropy (Shorrocks 1980) 563 | with parameter {it:alpha}; default is {it:alpha}=1 (in which case 564 | {cmd:ge}={cmd:theil}) 565 | {p_end} 566 | {synopt:{opt atkinson}[{cmd:(}{it:epsilon}{cmd:)}]}Atkinson index with parameter 567 | {it:epsilon}>=0; default is {it:epsilon}=1 568 | {p_end} 569 | {synopt:{opt cv}[{cmd:(}{it:df}{cmd:)}]}coefficient of variation; default is {it:df}=1; 570 | {cmd:cv(0)}=sqrt(2*{cmd:ge(2)}) 571 | {p_end} 572 | {synopt:{opt lvar}[{cmd:(}{it:df}{cmd:)}]}logarithmic variance; default is {it:df}=1 573 | {p_end} 574 | {synopt:{opt 
vlog}[{cmd:(}{it:df}{cmd:)}]}variance of logarithm; default is {it:df}=1 575 | {p_end} 576 | {synopt:{opt sdlog}[{cmd:(}{it:df}{cmd:)}]}standard deviation of logarithm; default is {it:df}=1 577 | {p_end} 578 | {synopt:{opt top}[{cmd:(}{it:p}{cmd:)}]}outcome share of top {it:p} percent; default 579 | is {it:p}=10 580 | {p_end} 581 | {synopt:{opt bottom}[{cmd:(}{it:p}{cmd:)}]}outcome share of bottom {it:p} percent; 582 | default is {it:p}=40 583 | {p_end} 584 | {synopt:{opt mid}[{cmd:(}{it:p1}{cmd:,}{it:p2}{cmd:)}]}outcome share of mid 585 | {it:p1} to {it:p2} percent; default is {it:p1}=40 and {it:p2}=90 586 | {p_end} 587 | {synopt:{opt palma}}Palma ratio; equal to {cmd:top}/{cmd:bottom} (with default arguments) or {cmd:sratio(40,90)} 588 | {p_end} 589 | {synopt:{opt qratio}[{cmd:(}{it:p1}{cmd:,}{it:p2}{cmd:)}]}quantile ratio 590 | {cmd:q}({it:p2})/{cmd:q}({it:p1}); default is {it:p1}=10 and {it:p2}=90 591 | {p_end} 592 | {synopt:{opt sratio}[{cmd:(}{it:l1}{cmd:,}{it:u1}{cmd:,}{it:l2}{cmd:,}{it:u2}{cmd:)}]}percentile 593 | share ratio; default is {it:l1}=0, {it:u1}=10, {it:l2}=90, {it:u2}=100; can also specify 594 | {cmd:sratio(}{it:u1}{cmd:,}{it:l2}{cmd:)} 595 | {p_end} 596 | {synopt:[*]{cmd:lorenz}{cmd:(}{it:p}{cmd:)}}Lorenz ordinate, {it:p} in [0,100]; 597 | prefix {it:*} is empty for default, {cmd:g} for generalized, {cmd:t} for total, 598 | {cmd:a} for absolute, {cmd:e} for equality gap 599 | {p_end} 600 | {synopt:[*]{cmd:share}{cmd:(}{it:p1}{cmd:,}{it:p2}{cmd:)}}percentile 601 | share, {it:p1} and {it:p2} in [0,100]; prefix {it:*} is empty for default, 602 | {cmd:d} for density, {cmd:g} for generalized, {cmd:t} for total, {cmd:a} for average 603 | {p_end} 604 | 605 | {syntab:Inequality decomposition} 606 | {synopt:{it:d}{cmd:_gini}[{cmd:(}{it:{help varname:by}}[{cmd:,}{it:df}]{cmd:)}]}where 607 | {it:d} is {cmd:b} for the between-group Gini coefficient, or {cmd:gw} for the 608 | weighted average of group-specific Gini coefficients; {it:by} specifies the group 609 | variable (string
allowed); default is as set by option {cmd:by()}; {it:df} applies 610 | small-sample adjustment; default is {it:df}=0; 611 | can also specify {it:d}{opt _gini(df)} 612 | {p_end} 613 | {synopt:{it:d}{cmd:_mld}[{cmd:(}{it:{help varname:by}}{cmd:)}]}where {it:d} is 614 | {cmd:b} for the between-group MLD, {cmd:w} for the within-group MLD, or 615 | {cmd:gw} for the weighted average of group-specific MLDs ({cmd:gw_mld} is 616 | equivalent to {cmd:w_mld}); {it:by} as for {it:d}{cmd:_gini} 617 | {p_end} 618 | {synopt:{it:d}{cmd:_theil}[{cmd:(}{it:{help varname:by}}{cmd:)}]}where {it:d} 619 | is {cmd:b} for the between-group Theil index, {cmd:w} for the 620 | within-group Theil index, or {cmd:gw} for the weighted average of 621 | group-specific Theil indices; {it:by} as for {it:d}{cmd:_gini} 622 | {p_end} 623 | {synopt:{it:d}{cmd:_ge}[{cmd:(}{it:{help varname:by}}[{cmd:,}{it:alpha}]{cmd:)}]}where 624 | {it:d} is {cmd:b} for the between-group generalized entropy, {cmd:w} for the 625 | within-group generalized entropy, or {cmd:gw} for the weighted 626 | average of group-specific generalized entropy; {it:by} as for {it:d}{cmd:_gini}; 627 | can also specify {it:d}{opt _ge(alpha)} 628 | {p_end} 629 | {synopt:{it:d}{cmd:_vlog}[{cmd:(}{it:{help varname:by}}[{cmd:,}{it:df}]{cmd:)}]}where 630 | {it:d} is {cmd:b} for the between-group variance of logarithm, {cmd:w} for the 631 | within-group variance of logarithm, or {cmd:gw} for the weighted average of 632 | group-specific variance of logarithm; {it:by} as for {it:d}{cmd:_gini}; 633 | can also specify {it:d}{opt _vlog(df)} 634 | {p_end} 635 | 636 | {syntab:Concentration measures} 637 | {synopt:{opt gci}[{cmd:(}{it:{help varname:by}}[{cmd:,}{it:df}]{cmd:)}]}Gini concentration index; 638 | {it:by} specifies the sort variable; default is as set by option {cmd:by()}; 639 | {it:df} applies small-sample adjustment; default is {it:df}=0; can also 640 | specify {opt gci(df)} 641 | {p_end} 642 | {synopt:{opt aci}[{cmd:(}{it:{help 
varname:by}}[{cmd:,}{it:df}]{cmd:)}]}absolute Gini concentration index; syntax 643 | as for {cmd:gci} 644 | {p_end} 645 | {synopt:[*]{cmd:ccurve}{cmd:(}{it:p}[{cmd:,}{it:{help varname:by}}]{cmd:)}}concentration curve ordinate, 646 | {it:p} in [0,100]; prefix {it:*} is empty for default, {cmd:g} for generalized, {cmd:t} for total, 647 | {cmd:a} for absolute, {cmd:e} for equality gap; 648 | {it:by} as for {cmd:gci} 649 | {p_end} 650 | {synopt:[*]{cmd:cshare}{cmd:(}{it:p1}{cmd:,}{it:p2}[{cmd:,}{it:{help varname:by}}]{cmd:)}}concentration share, 651 | {it:p1} and {it:p2} in [0,100]; prefix {it:*} is empty for default, {cmd:d} for density, 652 | {cmd:g} for generalized, {cmd:t} for total, {cmd:a} for average; 653 | {it:by} as for {cmd:gci} 654 | {p_end} 655 | 656 | {syntab:Poverty measures} 657 | {synopt:{opt hcr}[{cmd:(}{it:pline}{cmd:)}]}head count ratio (i.e. proportion poor); {it:pline} specifies the 658 | poverty line > 0; {it:pline} can be {varname} or {it:#}; default is as set by option {cmd:pline()} 659 | {p_end} 660 | {synopt:[{cmd:a}]{opt pgap}[{cmd:(}{it:pline}{cmd:)}]}poverty gap (proportion by which the mean outcome of the poor 661 | is below {it:pline}); specify {cmd:apgap} for absolute poverty gap ({it:pline} - mean outcome of the poor); 662 | {it:pline} as for {cmd:hcr} 663 | {p_end} 664 | {synopt:[{cmd:a}]{opt pgi}[{cmd:(}{it:pline}{cmd:)}]}poverty gap index; equal to {cmd:hcr}*{cmd:pgap}; specify 665 | {cmd:apgi} for absolute poverty gap index, equal to {cmd:hcr}*{cmd:apgap}; 666 | {it:pline} as for {cmd:hcr} 667 | {p_end} 668 | {synopt:{opt fgt}[{cmd:(}{it:a}[{cmd:,}{it:pline}]{cmd:)}]}Foster–Greer–Thorbecke index with {it:a}>=0 669 | (Foster et al.
1984, 2010); default is {it:a}=0 (head count ratio); {it:a}=1 is equivalent to 670 | {cmd:pgi}; 671 | {it:pline} as for {cmd:hcr} 672 | {p_end} 673 | {synopt:{opt sen}[{cmd:(}{it:pline}{cmd:)}]}Sen poverty index (Sen 1976; using the 674 | replication-invariant version of the index; also see Shorrocks 1995); 675 | {it:pline} as for {cmd:hcr} 676 | {p_end} 677 | {synopt:{opt sst}[{cmd:(}{it:pline}{cmd:)}]}Sen-Shorrocks-Thon poverty index 678 | (see, e.g., Osberg and Xu 2008); 679 | {it:pline} as for {cmd:hcr} 680 | {p_end} 681 | {synopt:{opt takayama}[{cmd:(}{it:pline}{cmd:)}]}Takayama poverty index 682 | (Takayama 1979); 683 | {it:pline} as for {cmd:hcr} 684 | {p_end} 685 | {synopt:{opt watts}[{cmd:(}{it:pline}{cmd:)}]}Watts index (see, e.g., Saisana 2014); 686 | {it:pline} as for {cmd:hcr} 687 | {p_end} 688 | {synopt:{opt chu}[{cmd:(}{it:a}[{cmd:,}{it:pline}]{cmd:)}]}Clark-Hemming-Ulph poverty index with {it:a} in [0,100] 689 | (Clark et al. 1981); default is {it:a}=50; {it:a}=0 is equivalent to 690 | 1-exp(-{cmd:watts}); {it:a}=100 is equivalent to {cmd:fgt(1)}; 691 | {it:pline} as for {cmd:hcr} 692 | {p_end} 693 | {synopt:[{cmd:a}]{cmd:tip}{cmd:(}{it:p}[{cmd:,}{it:pline}]{cmd:)}}TIP ordinate, 694 | {it:p} in [0,100]; specify {cmd:atip()} for absolute TIP ordinates; 695 | {it:pline} as for {cmd:hcr} 696 | {p_end} 697 | 698 | {marker association}{...} 699 | {syntab:Association measures} 700 | {synopt:{opt corr}[{cmd:(}{it:{help varname:by}}{cmd:)}]}correlation coefficient; 701 | {it:by} specifies the secondary variable; default is as set by option {cmd:by()} 702 | {p_end} 703 | {synopt:{opt slope}[{cmd:(}{it:{help varname:by}}{cmd:)}]}regression slope 704 | (equal to mean difference if {it:by} is dichotomous); {it:by} as for {cmd:corr} 705 | {p_end} 706 | {synopt:{opt b}[{cmd:(}{it:{help varname:by}}{cmd:)}]}same as {cmd:slope} 707 | {p_end} 708 | {synopt:{opt cohend}[{cmd:(}{it:{help varname:by}}[{cmd:,}{it:df}]{cmd:)}]}Cohen's d 709 | (allowing unequal group
sizes); {it:df} applies small-sample 710 | adjustment; default is {it:df}=2; can also specify {opt cohend(df)}; 711 | {it:by} is assumed to be dichotomous (string allowed) 712 | {p_end} 713 | {synopt:{opt covar}[{cmd:(}{it:{help varname:by}}[{cmd:,}{it:df}]{cmd:)}]}covariance; {it:df} applies small-sample 714 | adjustment; default is {it:df}=1; can also specify {opt covar(df)}; 715 | {it:by} as for {cmd:corr} 716 | {p_end} 717 | {synopt:{opt rsquared}[{cmd:(}{it:{help varname:by}}{cmd:)}]}R squared, equal to {cmd:corr}^2; 718 | {it:by} as for {cmd:corr} 719 | {p_end} 720 | {synopt:{opt spearman}[{cmd:(}{it:{help varname:by}}{cmd:)}]}Spearman's rank correlation; 721 | {it:by} as for {cmd:corr} 722 | {p_end} 723 | {synopt:{opt taua}[{cmd:(}{it:{help varname:by}}{cmd:)}]}Kendall's tau-a (using fast algorithm by Newson 2006); 724 | {it:by} as for {cmd:corr} 725 | {p_end} 726 | {synopt:{opt taub}[{cmd:(}{it:{help varname:by}}{cmd:)}]}Kendall's tau-b (using fast algorithm by Newson 2006); 727 | {it:by} as for {cmd:corr} 728 | {p_end} 729 | {synopt:{opt somersd}[{cmd:(}{it:{help varname:by}}{cmd:)}]}Somers' D (using fast algorithm by Newson 2006); 730 | {it:by} as for {cmd:corr} 731 | {p_end} 732 | {synopt:{opt gamma}[{cmd:(}{it:{help varname:by}}{cmd:)}]}Goodman and Kruskal's gamma (using fast algorithm by Newson 2006); 733 | {it:by} as for {cmd:corr} 734 | {p_end} 735 | 736 | {syntab:Categorical data (univariate)} 737 | {synopt:{opt hhi}[{cmd:n}]}Herfindahl–Hirschman index (Herfindahl index, Simpson index); 738 | specify {cmd:hhin} for normalization ({cmd:hhi}-1/K)/(1-1/K), where 739 | K is the number of categories 740 | {p_end} 741 | {synopt:{opt gimp}[{cmd:n}]}Gini impurity (Gini–Simpson index, Simpson's interaction index, 742 | Blau index, Gibbs–Martin index); {cmd:gimp} = 1-{cmd:hhi}; {cmd:gimpn} = 1-{cmd:hhin} 743 | {p_end} 744 | {synopt:{opt entropy}[{cmd:(}{it:base}{cmd:)}]}Shannon entropy; {it:base} specifies 745 | the base of the logarithm (default is natural 
logarithm) 746 | {p_end} 747 | {synopt:{opt hill}[{cmd:(}{it:q}{cmd:)}]}Hill number (true diversity, 748 | effective number of species); {it:q} specifies the order of the diversity; 749 | default is {it:q}=1 such that {cmd:hill} = exp({cmd:entropy}); if {it:q}=0, 750 | {cmd:hill} is equal to the observed number of categories 751 | {p_end} 752 | {synopt:{opt renyi}[{cmd:(}{it:q}{cmd:)}]}Rényi entropy; 753 | equal to ln({cmd:hill(}{it:q}{cmd:)}); default is {it:q}=1 such that 754 | {cmd:renyi} = {cmd:entropy} 755 | {p_end} 756 | 757 | {marker catbivar}{...} 758 | {syntab:Categorical data (bivariate)} 759 | {synopt:{opt mindex}[{cmd:(}{it:{help varname:by}}[{cmd:,}{it:base}]{cmd:)}]}mutual information index (M index); 760 | {it:by} specifies the secondary variable (string allowed); default is as set by option {cmd:by()}; 761 | {it:base} specifies the base of the logarithm (default is natural logarithm); 762 | can also specify {opt mindex(base)} 763 | {p_end} 764 | {synopt:{opt uc}[{cmd:l}|{cmd:r}][{cmd:(}{it:{help varname:by}}{cmd:)}]}uncertainty coefficient (H index); 765 | {cmd:ucl} returns the asymmetric coefficient with respect to the left-hand 766 | side variable (i.e. division by the entropy of the main variable), 767 | {cmd:ucr} is with respect to the right-hand side variable (i.e. 
division by 768 | the entropy of the secondary variable), {cmd:uc} returns the symmetric 769 | uncertainty coefficient (weighted average of {cmd:ucl} and {cmd:ucr}); 770 | {it:by} as for {cmd:mindex} 771 | {p_end} 772 | {synopt:{opt cramersv}[{cmd:(}{it:{help varname:by}}{cmd:)}]}Cramér's V; 773 | {it:by} as for {cmd:mindex} 774 | {p_end} 775 | {synopt:{opt dissim}[{cmd:(}{it:{help varname:by}}{cmd:)}]}(generalized) dissimilarity index (Duncan's D); 776 | {it:by} as for {cmd:mindex} 777 | {p_end} 778 | {synopt:{opt or}[{cmd:(}{it:{help varname:by}}{cmd:)}]}odds ratio; variables are 779 | interpreted as true/false indicators (false if 0, else true); {it:by} as for {cmd:mindex} 780 | {p_end} 781 | {synopt:{opt rr}[{cmd:(}{it:{help varname:by}}{cmd:)}]}risk ratio; variables are 782 | interpreted as true/false indicators (false if 0, else true); {it:by} as for {cmd:mindex} 783 | {p_end} 784 | {synoptline} 785 | 786 | {pstd} 787 | {it:Note on output formatting in Stata 15 (or in Stata 16 prior to the update of March 30, 2021):} If 788 | statistics with parameters in parentheses are requested, {cmd:dstat summarize} 789 | may possibly display a somewhat disarranged output table. Type 790 | 791 | {cmd:. version 14: dstat summarize} {it:...} 792 | 793 | {pstd} 794 | to obtain an improved table in such a case. 795 | 796 | 797 | {marker options}{...} 798 | {title:Options} 799 | 800 | {marker mainopts}{...} 801 | {dlgtab:Main} 802 | 803 | {phang} 804 | {cmd:nocasewise} causes missing values to be excluded for each variable in 805 | {it:varlist} individually. The default is to perform casewise deletion of 806 | observations, that is, to restrict the sample to observations that are not 807 | missing for any of the variables. 
If {cmd:nocasewise} is specified, the 808 | overall estimation sample is still restricted by the {cmd:if} and {cmd:in} 809 | qualifiers, the weights, and the variables specified in {cmd:over()} and 810 | {cmd:balance()}, but not by missing values in the main {it:varlist} (or in 811 | {cmd:by()}, {it:by}, {cmd:pline()}, or {it:pline}). For each variable 812 | the subsample of all nonmissing values within the overall estimation sample 813 | will then be used in the relevant computations. 814 | 815 | {marker over}{...} 816 | {phang} 817 | {cmd:over(}{help varname:{it:overvar}}[{cmd:,} {it:options}]{cmd:)} 818 | computes results for each subpopulation defined by the values of 819 | {it:overvar}. {it:overvar} must be integer and nonnegative. {it:options} 820 | are as follows: 821 | 822 | {phang2} 823 | {opth sel:ect(numlist)} selects (and orders) subpopulations. {it:numlist} 824 | specifies the values of the subpopulations to be included 825 | and also determines the order of the subpopulations in the output. The basis 826 | for estimation will always be the total sample including all subpopulations. 827 | 828 | {phang2} 829 | {opt contr:ast}[{cmd:(}{it:#}|{cmd:lag}|{cmd:lead}{cmd:)}] computes contrasts between 830 | subpopulations or between subpopulations and the total population. If 831 | {cmd:contrast} is specified without argument, the total population or 832 | the first subpopulation (possibly after applying {cmd:select()}) 833 | will be used as the basis for the contrasts, depending on whether option 834 | {cmd:total} has been specified or not. Alternatively, specify 835 | the value of the reference subpopulation in parentheses (this may also be 836 | a subpopulation that has been excluded by {cmd:select()}) or 837 | type {cmd:contrast(lag)} or {cmd:contrast(lead)} to take stepwise contrasts 838 | with respect to the previous or next subpopulation, respectively. {cmd:contrast} 839 | implies {cmd:common} (if relevant). 
840 | 841 | {pmore2} 842 | The estimates from the reference (sub)population will be included among the 843 | stored results (in logarithmic form if {cmd:lnratio} is specified), but 844 | their display will be suppressed. Specify display 845 | option {cmd:cref} to report these results in the output. 846 | 847 | {phang2} 848 | {opt ratio} requests that the contrasts are expressed as ratios. The 849 | default is to express contrasts as differences. {cmd:ratio} implies 850 | {cmd:contrast}. 851 | 852 | {phang2} 853 | {opt lnr:atio} requests that the contrasts are expressed as differences in 854 | logarithms. The default is to express contrasts as raw differences. {cmd:lnratio} 855 | implies {cmd:contrast} and takes precedence over {cmd:ratio}. 856 | 857 | {pmore2} 858 | When applying {cmd:lnratio} you may also want to specify reporting option 859 | {helpb dstat##display_opts:eform} to display results that are transformed back to 860 | ratios. In fact, point estimates, standard errors, and confidence intervals 861 | from {cmd:lnratio} with {cmd:eform} are identical to results 862 | from {cmd:ratio} with option {cmd:citype(log)}. An advantage of 863 | {cmd:lnratio}, however, is that the null hypothesis for t-statistics and 864 | p-values is ratio = 1 or, more precisely, ln(ratio) = 0 (i.e. no group 865 | difference). For {cmd:ratio} the null hypothesis is ratio = 0, which 866 | is not useful (this is why {cmd:dstat} suppresses t-statistics and 867 | p-values in case of {cmd:ratio}). A disadvantage of {cmd:lnratio} is that it 868 | cannot represent cases in which the comparison estimate and, hence, the ratio 869 | is zero. 870 | 871 | {phang2} 872 | {opt accum:ulate} accumulates results across subpopulations 873 | (running sum). Only one of {cmd:contrast} and {cmd:accumulate} is allowed. 874 | 875 | {pmore} 876 | Option {cmd:over()} is not supported by {cmd:dstat pw}.
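The equivalence between lnratio with eform and ratio with citype(log) follows from the delta method. The Python sketch below assumes two things: the standard error of ln(r) is se(r)/r, and eform reports exp(b) together with exp(b)*se(b). The values r = 1.25 and se(r) = 0.1 are made up.

```python
import math
from statistics import NormalDist

def z_value(level=95):
    """Two-sided critical value of the standard normal distribution."""
    return NormalDist().inv_cdf(0.5 + level / 200)

def ratio_citype_log(r, se_r, level=95):
    """ratio with citype(log): CI built on the log scale, mapped back."""
    se_log = se_r / r                          # delta method: se(ln r) = se(r)/r
    z = z_value(level)
    return r, se_r, (math.exp(math.log(r) - z * se_log),
                     math.exp(math.log(r) + z * se_log))

def lnratio_eform(r, se_r, level=95):
    """lnratio with eform: estimate b = ln r, then report exp(b)."""
    b, se_b = math.log(r), se_r / r
    z = z_value(level)
    t = b / se_b                               # tests ln(ratio) = 0, i.e. ratio = 1
    return (math.exp(b), math.exp(b) * se_b,   # eform point estimate and SE
            (math.exp(b - z * se_b), math.exp(b + z * se_b)), t)

print(ratio_citype_log(1.25, 0.1))
print(lnratio_eform(1.25, 0.1))
```

Both functions return the same point estimate, standard error, and interval, up to floating-point rounding. Only the lnratio route also yields a test statistic for the more useful null hypothesis ln(ratio) = 0.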
877 | 878 | {phang} 879 | {cmd:total} reports additional results across all subpopulations, including 880 | subpopulations that may have been excluded by {cmd:select()}. {cmd:total} 881 | only has an effect if {cmd:over()} is specified. 882 | 883 | {marker balance}{...} 884 | {phang} 885 | {cmd:balance(}{it:spec}{cmd:)} balances covariate distributions between 886 | subpopulations using reweighting. {opt balance()} requires {cmd:over()}. The 887 | syntax of {it:spec} is 888 | 889 | [{it:method}{cmd::}] {varlist} [{cmd:,} {it:options}] 890 | 891 | {pmore} 892 | where {it:method} is either {cmd:ipw} for inverse probability weighting based 893 | on logistic regression (the default) or {cmd:eb} for entropy balancing using 894 | {helpb mf_mm_ebal:mm_ebal()} from {helpb moremata}, and 895 | {it:varlist} specifies the list of covariates to be balanced (factor-variable 896 | notation is allowed). For information on inverse probability weighting 897 | see, e.g., DiNardo et al. (1996) and {helpb teffects ipw}. For entropy balancing see 898 | Hainmueller (2012) and Section 3.8 in 899 | {browse "http://ideas.repec.org/p/bss/wpaper/35.html":Jann (2020)}. {it:options} are as follows: 900 | 901 | {phang2} 902 | {opt ref:erence(#)} identifies the reference distribution. The default is to 903 | use the total across all subpopulations as the reference distribution, 904 | including subpopulations that may have been excluded by {cmd:select()}. Specify 905 | {cmd:reference(}{it:#}{cmd:)} to obtain the reference distribution from 906 | observations for which {it:overvar}={it:#}; this may also be a subpopulation 907 | that has been excluded by {cmd:select()}. 908 | 909 | {phang2} 910 | {it:logit_options} are options to be passed through to {helpb logit}. {it:logit_options} 911 | are only allowed if {it:method} is {cmd:ipw}. 912 | 913 | {phang2} 914 | {opt btol:erance(#)}, {it:#}>=0, specifies the tolerance for the entropy 915 | balancing algorithm. The default is {cmd:btolerance(1e-5)}.
A warning 916 | message is displayed if a balancing solution is not within the specified 917 | tolerance. {cmd:btolerance()} is only allowed if {it:method} is {cmd:eb}. 918 | 919 | {phang2} 920 | {opt noi:sily} displays the output of the balancing procedure. 921 | 922 | {phang2} 923 | {opt gen:erate(newvar)} stores the balancing weights in variable 924 | {it:newvar}. This is useful if you want to check whether covariates have been 925 | balanced successfully. 926 | 927 | {pmore} 928 | Balancing weights will only be computed once per subpopulation. If 929 | {cmd:nocasewise} is specified, balancing will be based on the overall estimation 930 | sample as defined in the description of the {cmd:nocasewise} option; the weights 931 | will not be recomputed for each variable individually. 932 | 933 | {marker repopts}{...} 934 | {phang} 935 | {it:reporting_options} are options affecting how results are reported. The options 936 | are as follows: 937 | 938 | {phang2} 939 | {opt l:evel(#)} specifies the confidence level, as a percentage, for 940 | confidence intervals. The default is {cmd:level(95)} or as set by 941 | {helpb set level}. 942 | 943 | {phang2} 944 | {opt citype(type)} specifies the method for the computation of the 945 | confidence interval limits.
{it:type} can be: 946 | 947 | {p2colset 17 28 30 2}{...} 948 | {p2col:{cmdab:norm:al}}normal CIs 949 | {p_end} 950 | {p2col:{cmd:logit}}logit transformed CIs; useful for statistics in [0,1] 951 | {p_end} 952 | {p2col:{cmd:probit}}probit transformed CIs; useful for statistics in [0,1] 953 | {p_end} 954 | {p2col:{cmd:atanh}}inverse hyperbolic tangent transformed CIs; useful for statistics in [-1,1] 955 | {p_end} 956 | {p2col:{cmd:log}}log transformed CIs; useful for statistics > 0 957 | {p_end} 958 | {p2col:{cmdab:agres:ti}}Agresti-Coull CIs; useful for proportions 959 | {p_end} 960 | {p2col:{cmd:exact}}exact (Clopper-Pearson) CIs; useful for proportions 961 | {p_end} 962 | {p2col:{cmdab:jeff:reys}}Jeffreys CIs; useful for proportions 963 | {p_end} 964 | {p2col:{cmd:wilson}}Wilson CIs; useful for proportions 965 | {p_end} 966 | 967 | {pmore2} 968 | The default depends on subcommand and options. Use {cmd:citype()} to 969 | override the default. For details on {cmd:agresti}, {cmd:exact}, 970 | {cmd:jeffreys}, and {cmd:wilson} see the documentation of {helpb proportion}. 971 | 972 | {pmore2} 973 | {cmd:dstat} will store the confidence limits given {cmd:level()} and 974 | {cmd:citype()} in {cmd:e(ci)}. Replaying the results with different settings 975 | will update {cmd:e(ci)}. (In Stata 14, normal confidence intervals will be 976 | displayed in the output table irrespective of the contents of {cmd:e(ci)}.) 977 | 978 | {phang2} 979 | {opt nohead:er} suppresses the output header. 980 | 981 | {phang2} 982 | {opt notab:le} suppresses the output table containing the estimated 983 | coefficients. {opt tab:le} forces the table to be displayed. 984 | 985 | {phang2} 986 | [{ul:{cmd:no}}]{opt pv:alues} decides whether p-values and their test 987 | statistics are reported in the coefficient table or not. The default is 988 | {cmd:nopvalues} unless {cmd:over(, contrast())} has been specified.
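The transformed interval types (logit, probit, atanh, log) follow a common pattern. First transform the estimate, then obtain its standard error on the transformed scale via the delta method, build a symmetric normal interval there, and map the limits back. The Python sketch below illustrates this textbook construction under that assumption. It does not cover agresti, exact, jeffreys, or wilson, and the inputs are invented.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()

# citype -> (transform g, derivative g', inverse of g)
TRANSFORMS = {
    "normal": (lambda x: x, lambda x: 1.0, lambda y: y),
    "logit": (lambda p: math.log(p / (1 - p)),
              lambda p: 1 / (p * (1 - p)),
              lambda y: 1 / (1 + math.exp(-y))),
    "probit": (_STD_NORMAL.inv_cdf,
               lambda p: 1 / _STD_NORMAL.pdf(_STD_NORMAL.inv_cdf(p)),
               _STD_NORMAL.cdf),
    "atanh": (math.atanh, lambda r: 1 / (1 - r * r), math.tanh),
    "log": (math.log, lambda x: 1 / x, math.exp),
}

def transformed_ci(est, se, citype="normal", level=95):
    """Symmetric normal CI on the transformed scale, mapped back."""
    g, dg, g_inv = TRANSFORMS[citype]
    z = _STD_NORMAL.inv_cdf(0.5 + level / 200)
    b, se_b = g(est), se * dg(est)      # delta method: se(g(est)) = g'(est) * se
    return g_inv(b - z * se_b), g_inv(b + z * se_b)

print(transformed_ci(0.05, 0.04, "normal"))   # lower limit below 0
print(transformed_ci(0.05, 0.04, "logit"))    # stays inside (0, 1)
```

For an estimate of 0.05 with a standard error of 0.04, the normal interval extends below 0, whereas the logit and probit intervals stay inside (0,1).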
989 | 990 | {phang2} 991 | {opt cref} causes the estimates from the reference (sub)population to be 992 | included in the coefficient tables. The default is to suppress these 993 | results. {cmd:cref} is only relevant if {cmd:over(, contrast())} has been 994 | specified. 995 | 996 | {marker display_opts}{...} 997 | {phang2} 998 | {it:display_options} are standard reporting options such as {cmd:eform}, 999 | {cmd:cformat()}, or {cmd:coeflegend}; see {help eform_option:{bf:[R]} {it:eform_option}} and 1000 | the Reporting options in {helpb estimation options:[R] Estimation options}. 1001 | 1002 | {phang2} 1003 | {opt gr:aph}[{cmd:(}{help dstat##graph_options:{it:graph_options}}{cmd:)}] 1004 | displays the results in a graph using {helpb coefplot}. The coefficients 1005 | table will be suppressed in this case (unless option {cmd:table} is 1006 | specified). Alternatively, use command {cmd:dstat graph} to display the 1007 | graph after estimation. 1008 | 1009 | {phang} 1010 | {opt novalues} prevents using the values of the evaluation points as 1011 | coefficient names. This is not relevant for {cmd:dstat summarize}. If {cmd:novalues} 1012 | is specified, the coefficients will be named {it:stub#}, where 1013 | {it:#} is a consecutive number and {it:stub} is 1014 | {cmd:d} in case of {cmd:dstat density}, 1015 | {cmd:h} in case of {cmd:dstat histogram}, 1016 | {cmd:p} in case of {cmd:dstat proportion}, 1017 | {cmd:c} in case of {cmd:dstat cdf} and {cmd:dstat ccdf}, 1018 | {cmd:q} in case of {cmd:dstat quantile}, 1019 | {cmd:l} in case of {cmd:dstat lorenz}, 1020 | {cmd:s} in case of {cmd:dstat share} (and for {cmd:dstat histogram} and 1021 | {cmd:dstat share} the last coefficient, i.e. the upper limit of the last bin, will be named 1022 | {cmd:_ul}). 1023 | 1024 | {phang} 1025 | {opth vformat(fmt)} sets the display format used to create coefficient names 1026 | from evaluation points. This is not relevant for {cmd:dstat summarize}.
See 1027 | help {helpb format} for available formats. 1028 | 1029 | {marker vce}{...} 1030 | {dlgtab:SE/VCE} 1031 | 1032 | {phang} 1033 | {opt vce(vcetype)} determines how standard errors are computed. {it:vcetype} may be: 1034 | 1035 | {opt none} 1036 | {opt a:nalytic} 1037 | {opt cl:uster} {it:clustvar} 1038 | {opt svy} [{help svy##svy_vcetype:{it:svy_vcetype}}] [{cmd:,} {help svy##svy_options:{it:svy_options}} ] 1039 | {opt boot:strap} [{cmd:,} {help bootstrap:{it:bootstrap_options}} ] 1040 | {opt jack:knife} [{cmd:,} {help jackknife:{it:jackknife_options}} ] 1041 | 1042 | {pmore} 1043 | {cmd:vce(none)} omits the computation of standard errors. This saves computer 1044 | time. 1045 | 1046 | {pmore} 1047 | {cmd:vce(analytic)}, the default, computes standard errors based on 1048 | influence functions. 1049 | 1050 | {pmore} 1051 | {bind:{cmd:vce(cluster} {it:clustvar}{cmd:)}} computes standard errors based 1052 | on influence functions allowing for intragroup correlation, where 1053 | {it:clustvar} specifies to which group each observation belongs. 1054 | 1055 | {pmore} 1056 | {cmd:vce(svy)} computes standard errors taking the survey design as set by 1057 | {helpb svyset} into account. The syntax is equivalent to the syntax of the {helpb svy} 1058 | prefix command; that is, {cmd:vce(svy)} is {cmd:dstat}'s way to support 1059 | the {helpb svy} prefix. 1060 | 1061 | {pmore} 1062 | {cmd:vce(bootstrap)} and {cmd:vce(jackknife)} compute standard errors using 1063 | {helpb bootstrap} or {helpb jackknife}, respectively; see help {it:{help vce_option}}. 1064 | 1065 | {phang} 1066 | {cmd:nose} is an alias for {cmd:vce(none)}. {cmd:nose} overrides {cmd:vce(analytic)} and 1067 | {cmd:vce(cluster)}, but has no effect if specified together with 1068 | {cmd:vce(svy)}, {cmd:vce(bootstrap)}, or {cmd:vce(jackknife)}. 
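Under vce(analytic) and vce(cluster), standard errors are derived from influence functions. The Python sketch below shows the idea for the simplest case, the mean, whose influence function is y_i minus the sample mean. The small-sample factors n/(n-1) and G/(G-1) are assumptions, chosen so that the analytic result matches the classical formula s/sqrt(n). The exact scaling that dstat uses may differ, and the data are made up.

```python
import math
from collections import defaultdict
from statistics import mean, stdev

def if_mean(y):
    """Influence function of the sample mean: IF_i = y_i - mean(y)."""
    m = mean(y)
    return [v - m for v in y]

def se_analytic(IF):
    """SE from influence functions, with an n/(n-1) correction (assumed)."""
    n = len(IF)
    return math.sqrt(sum(v * v for v in IF) / (n * (n - 1)))

def se_cluster(IF, clusters):
    """Cluster-robust SE: sum the IFs within clusters, G/(G-1) correction (assumed)."""
    totals = defaultdict(float)
    for v, g in zip(IF, clusters):
        totals[g] += v
    n, G = len(IF), len(totals)
    return math.sqrt(G / (G - 1) * sum(t * t for t in totals.values())) / n

y = [2.0, 3.5, 1.0, 4.0, 6.5, 3.0]
IF = if_mean(y)
print(se_analytic(IF), stdev(y) / math.sqrt(len(y)))  # equal up to rounding
print(se_cluster(IF, [1, 1, 2, 2, 3, 3]))
```

With one observation per cluster, the cluster-robust formula reduces to the analytic one.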
1069 | 1070 | {phang} 1071 | [{cmd:no}]{cmd:cov} determines whether the full variance-covariance matrix 1072 | of the estimates is stored in {cmd:e(V)}, or whether only the standard 1073 | errors are stored in vector {cmd:e(se)}. The default is {cmd:cov} (full 1074 | variance matrix) for subcommands {cmd:summarize} and {cmd:pw}, and 1075 | {cmd:nocov} (standard errors only) for all other subcommands. {cmd:nocov} 1076 | saves memory if the number of evaluation points is large (for example, if 1077 | you estimate the density using 400 points across two subpopulations, the 1078 | covariance matrix has 800 x 800 = 640,000 elements; the vector of standard 1079 | errors has only 800 elements). For {cmd:vce(analytic)} and 1080 | {cmd:vce(cluster)}, option {cmd:nocov} also saves computer time (since the 1081 | computation of covariances is skipped; in the other cases, covariances are 1082 | removed after estimation). For {cmd:vce(svy)}, option {cmd:nocov} also 1083 | removes auxiliary variance matrices such as {cmd:e(V_srs)}. Note that 1084 | post-estimation commands that rely on covariances (or on auxiliary variance 1085 | matrices in case of {cmd:svy}) will not work after {cmd:nocov} has been 1086 | applied; specify option {cmd:cov} if you intend to use such post-estimation 1087 | commands (e.g., {helpb test} or {helpb lincom}) after subcommands other than 1088 | {cmd:summarize} or {cmd:pw}. 1089 | 1090 | {phang} 1091 | {cmd:nobwfixed} allows the bandwidth(s) for density estimation to vary 1092 | across replications. This is only relevant if density estimation is 1093 | requested (subcommand {cmd:density} or {cmd:pdf}; subcommand 1094 | {cmd:summarize} with at least one {cmd:density()} statistic), if the 1095 | bandwidth is not set to a specific value (or a list of specific values) 1096 | using option {cmd:bwidth()}, and if a replication technique is used for 1097 | standard error estimation, i.e.
{cmd:vce(bootstrap)}, {cmd:vce(jackknife)}, 1098 | or {cmd:vce(svy)} with {help svy##svy_vcetype:{it:svy_vcetype}} other than 1099 | {cmd:linearized}. The default is to hold the bandwidth(s) fixed across 1100 | replications. 1101 | 1102 | {marker generate}{...} 1103 | {phang} 1104 | {cmd:generate(}{it:names}[{cmd:,} {it:options}]{cmd:)} stores the influence 1105 | functions that were used to compute the standard errors, where {it:names} 1106 | is either a list of (new) variable names or 1107 | {help newvarlist##stub*:{it:stub}}{cmd:*} to create names {it:stub}{cmd:1}, 1108 | {it:stub}{cmd:2}, etc. {it:options} are {cmd:rif} to store RIFs, 1109 | {cmdab:sca:ling(}{cmdab:t:otal}{cmd:)} or {cmdab:sca:ling(}{cmdab:m:ean}{cmd:)} 1110 | to determine the scaling, {cmdab:com:pact} to merge the influence functions 1111 | across subpopulations, and {cmdab:qui:etly} to suppress output; see 1112 | {it:{help dstat##predict_options:predict_options}} below. 1113 | 1114 | {phang} 1115 | {cmd:rif(}{it:names}[{cmd:,} {it:options}]{cmd:)} is an alias for 1116 | {cmd:generate(}{it:names}{cmd:,} {cmd:rif} [{it:options}]{cmd:)}. 1117 | 1118 | {phang} 1119 | {opt replace} allows replacing existing variables. 1120 | 1121 | {marker quant}{...} 1122 | {dlgtab:Quantile/density settings} 1123 | 1124 | {phang} 1125 | {opt qdef(#)} sets the quantile definition to be used when computing 1126 | quantiles, with {it:#} in {c -(}0,...,11{c )-}. The default is 1127 | {cmd:qdef(2)} (same as, e.g. {helpb summarize}). Definitions 1-9 are as 1128 | described in Hyndman and Fan (1996), definition 0 is the "high" quantile, 1129 | definition 10 is the Harrell-Davis quantile (Harrell and Davis 1982), 1130 | definition 11 is the mid-quantile (Ma et al. 2011); see 1131 | {helpb mf_mm_quantile:mm_quantile()} for more information. Apart from subcommand 1132 | {cmd:dstat quantile} and statistic {cmd:quantile()}, option {cmd:qdef()} affects 1133 | all statistics that make use of quantiles (e.g.
{cmd:trim}, {cmd:winsor}, 1134 | {cmd:huber}, {cmd:biweight}, {cmd:mad}, etc.). 1135 | 1136 | {phang} 1137 | {opt hdquantile} is a synonym for {cmd:qdef(10)} (Harrell-Davis 1138 | quantiles). Only one of {opt hdquantile}, {opt mquantile}, and {opt qdef()} 1139 | is allowed. The Harrell-Davis estimator typically leads to smoother 1140 | quantile functions than classical quantile definitions. Furthermore, 1141 | standard errors do not depend on density estimation and tend to be 1142 | more reliable than for other quantile definitions if there is heaping in the data. 1143 | 1144 | {phang} 1145 | {opt hdtrim}[{cmd:(}{it:width}{cmd:)}] applies trimming to the Harrell-Davis 1146 | quantile estimator as suggested by Akinshin (2021). If {cmd:hdtrim} is specified without an 1147 | argument, the width of the evaluation interval is set to 1/sqrt(n), where n 1148 | is the effective sample size. Alternatively, specify a custom {it:width}. Sensible values 1149 | for {it:width} lie between 0 and 1 ({it:width}>=1 uses the untrimmed estimator; 1150 | {it:width}<=0 sets the width to 1/sqrt(n)). 1151 | 1152 | {phang} 1153 | {opt mquantile} is a synonym for {cmd:qdef(11)} (mid-quantiles). Only one 1154 | of {opt hdquantile}, {opt mquantile}, and {opt qdef()} is allowed. The 1155 | mid-quantile estimator typically leads to smoother quantile 1156 | functions than classical quantile definitions. Ma et al. (2011) suggest 1157 | using the mid-quantile estimator for discrete data. 1158 | 1159 | {phang} 1160 | {opt mqopts(options)} provides additional settings for mid-quantiles that 1161 | are relevant for standard error estimation. {it:options} are as follows: 1162 | 1163 | {phang2} 1164 | {opt us:mooth(#)}, with {it:#}<1, sets the degree of undersmoothing that is 1165 | applied when determining the sparsity function via density estimation. 1166 | The default is {cmd:usmooth(0.2)}. The undersmoothing factor is computed as 1167 | n^(1/5) / n^(1/(5*(1-#))), where n is the effective sample size. Set # to 0
Set # to 0 1168 | to omit undersmoothing; #<0 leads to oversmoothing. Note that 1169 | {help densopts:{it:density_options}} have no effect on density estimation 1170 | for mid-quantiles. 1171 | 1172 | {phang2} 1173 | {cmd:cdf}[{cmd:(}{it:#}{cmd:})], with {it:#}>=0, determines the sparsity 1174 | function by differencing the ECDF instead of employing density 1175 | estimation. This may lead to somewhat more valid results in discrete data (i.e. data 1176 | with relatively few distinct levels), but results may be unreliable in 1177 | continuous data. Optional argument {it:#} sets the width of the integration 1178 | window that is used to interpolate across jumps in the ECDF ({it:#} is on 1179 | the probability scale; for example, a value of 0.01 is equivalent to a 1180 | window covering 1 percent of data mass). The default is {it:#} = 1 / 1181 | ceil(2 * n^(2/5)), where n is the effective sample size. Set {it:#}=0 to 1182 | omit integration (this corresponds to the formulas given in Ma et al. 2011; 1183 | the sparsity function will have sharp jumps). 1184 | 1185 | {marker densopts}{...} 1186 | {phang} 1187 | {it:density_options} set the details of density 1188 | estimation. These settings are relevant for command 1189 | {cmd:dstat density}/{cmd:pdf} and statistic {cmd:density()} as well as for 1190 | the computation of influence functions that involve density 1191 | estimation (e.g., the influence function of a quantile). For more information 1192 | on density estimation see {helpb mf_mm_density:mm_density()}, 1193 | {browse "http://boris.unibe.ch/69421/2/kdens.pdf":Jann (2007)}, and 1194 | Wand and Jones (1995). The options are as follows: 1195 | 1196 | {phang2} 1197 | {cmdab:bw:idth(}{it:method}[{cmd:,} {opt adj:ust(#)} {cmd:rd}]{cmd:)} 1198 | specifies the type of automatic bandwidth selector for kernel density 1199 | estimation. 
Possible choices for {it:method} are: 1200 | 1201 | {p2colset 17 32 34 2}{...} 1202 | {p2col:{cmdab:s:ilverman}}optimal of Silverman 1203 | {p_end} 1204 | {p2col:{cmdab:n:ormalscale}}normal scale rule 1205 | {p_end} 1206 | {p2col:{cmdab:o:versmoothed}}oversmoothed rule 1207 | {p_end} 1208 | {p2col:{opt sj:pi}}Sheather-Jones solve-the-equation plug-in 1209 | {p_end} 1210 | {p2col:{cmdab:d:pi}[{cmd:(}{it:#}{cmd:)}]}Sheather-Jones direct plug-in, 1211 | where {it:#} specifies the number of stages of functional estimation; 1212 | default is {cmd:2} 1213 | {p_end} 1214 | {p2col:{opt isj}}diffusion estimator bandwidth (Botev et al. 2010) 1215 | {p_end} 1216 | 1217 | {pmore2} 1218 | The default is {cmd:bwidth(dpi(2))}. Suboption {opt adjust(#)}, with #>0, can be 1219 | used to adjust the automatic bandwidth by factor {it:#}. Suboption {cmd:rd} 1220 | applies relative-data correction to the automatic bandwidth (Cwik and Mielniczuk 1993). 1221 | 1222 | {phang2} 1223 | {opth bw:idth(numlist)} is an alternative to {opt bwidth(method)} and sets 1224 | the bandwidth to a specific value. If {it:numlist} contains multiple values, 1225 | the values are used one after the other across the variables and 1226 | subpopulations (recycling values if needed). The specified values must be larger 1227 | than zero. 1228 | 1229 | {phang2} 1230 | {opt k:ernel(kernel)} specifies the kernel function. {it:kernel} may 1231 | be {opt e:panechnikov}, {opt epan2} (alternative Epanechnikov kernel 1232 | function), {opt b:iweight}, {opt triw:eight}, {opt c:osine}, 1233 | {opt g:aussian}, {opt p:arzen}, {opt r:ectangle} or {opt t:riangle}. The default 1234 | is {cmd:kernel(gaussian)}. 1235 | 1236 | {phang2} 1237 | {opt adapt:ive(#)} specifies the number of iterations used by the adaptive 1238 | kernel density estimator. The default is {cmd:adaptive(0)} (non-adaptive 1239 | density estimator). 
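For reference, the exact kernel density estimator (option exact) evaluates f(x) = 1/(n h) times the sum over observations of K((x - x_i)/h). The Python sketch below uses the Gaussian kernel, which is the default kernel. As a stand-in for an automatic selector, it uses the textbook Silverman rule of thumb, 0.9 min(sd, IQR/1.349) n^(-1/5). This is an assumption for illustration only. dstat's default bandwidth is dpi(2), which is not implemented here. Also, the IQR computed by Python's statistics.quantiles() need not match dstat's quantile definition.

```python
import math
from statistics import quantiles, stdev

def silverman_bw(x):
    """Textbook Silverman rule of thumb for a Gaussian kernel (an assumption,
    not dstat's default dpi(2) selector)."""
    q1, _, q3 = quantiles(x, n=4)
    spread = min(stdev(x), (q3 - q1) / 1.349)
    return 0.9 * spread * len(x) ** (-1 / 5)

def kde_exact(x, at, bw, adjust=1.0):
    """Exact Gaussian kernel density estimate at each point in `at`."""
    h = bw * adjust                      # like adjust(#): rescale the bandwidth
    c = 1 / (len(x) * h * math.sqrt(2 * math.pi))
    return [c * sum(math.exp(-0.5 * ((a - xi) / h) ** 2) for xi in x)
            for a in at]

x = [-2.0, -1.0, 0.0, 1.0, 2.0]
h = silverman_bw(x)
print(h, kde_exact(x, [0.0, 1.0], h))
```

The exact estimator sums over all observations at every evaluation point, which is why it becomes slow for large datasets evaluated at many points. The binned approximation avoids this cost.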
1240 | 1241 | {phang2} 1242 | {cmd:exact} causes the exact kernel density estimator to be used instead 1243 | of the binned approximation estimator. The exact estimator can be slow in large 1244 | datasets if the density is evaluated at many points. 1245 | 1246 | {phang2} 1247 | {opt na:pprox(#)} specifies the grid size used by the binned approximation 1248 | density estimator (and by the data-driven bandwidth selectors). The default 1249 | is {cmd:napprox(1024)}. 1250 | 1251 | {phang2} 1252 | {opt pad(#)} specifies the padding proportion of the approximation grid. The default is 1253 | {cmd:pad(0.1)}. 1254 | 1255 | {phang2} 1256 | {opt ll(#)} specifies the lower boundary of the support of the data and causes 1257 | boundary-correction to be applied to the density estimate. An error will be 1258 | returned if the data contain values smaller than {it:#}. 1259 | 1260 | {phang2} 1261 | {opt ul(#)} specifies the upper boundary of the support of the data and causes 1262 | boundary-correction to be applied to the density estimate. An error will be 1263 | returned if the data contain values larger than {it:#}. 1264 | 1265 | {phang2} 1266 | {opt bo:undary(method)} sets the type of boundary correction. Choices are 1267 | {opt ren:orm} (renormalization method; the default), {opt refl:ect} (reflection method), or 1268 | {opt lc} (linear combination technique). This is only relevant if {cmd:ll()} or {cmd:ul()} 1269 | has been specified. 1270 | 1271 | {marker sum}{...} 1272 | {dlgtab:Subcommand summarize} 1273 | 1274 | {phang} 1275 | {opt relax} continues computations even if there are observations outside 1276 | of the support for a specific statistic. Some statistics such as the 1277 | geometric mean, the MLD, or the Theil index require observations to be 1278 | within a specific domain (e.g. strictly positive). By default, {cmd:dstat} aborts 1279 | with an error if observations violating such requirements are encountered.
Specify 1280 | {cmd:relax} if you want to continue computations based on the valid 1281 | observations in such a case. Exclusion of invalid observations will be 1282 | applied to each statistic individually; that is, the invalid observations 1283 | will not be dropped from the overall estimation sample. 1284 | 1285 | {phang} 1286 | {opth by(varname)} specifies a default secondary variable for 1287 | inequality decomposition, concentration indices, and association measures. 1288 | 1289 | {phang} 1290 | {opt pline(#|varname)} specifies a default poverty line for poverty 1291 | measures, either as a single value or as a variable containing observation-specific 1292 | values. 1293 | 1294 | {phang} 1295 | {opt pstrong} selects the poverty definition to be applied (see Donaldson and 1296 | Weymark 1986). The default is to use the "weak" definition, that is, to treat 1297 | outcomes equal to the poverty line as non-poor. Specify {cmd:pstrong} to treat 1298 | these cases as poor ("strong" definition). The choice of definition is relevant 1299 | only for some of the poverty measures. 1300 | 1301 | {marker pw}{...} 1302 | {dlgtab:Subcommand pw} 1303 | 1304 | {phang} 1305 | {opt statistic(stat)} selects the association measure to be 1306 | computed. {it:stat} may be any statistic listed under 1307 | {help dstat##association:Association measures} or 1308 | {help dstat##catbivar:Categorical data (bivariate)} 1309 | in the above table of summary statistics (omitting argument 1310 | {it:by}). {cmd:statistic(corr)} is the default. Type, for example, 1311 | {cmd:statistic(taub)} to compute Kendall's tau-b. Arguments other than 1312 | {it:by} can be provided in parentheses as usual; for example, type 1313 | {cmd:statistic(mindex(2))} to compute the M index with base 2. 1314 | 1315 | {pmore} 1316 | Most supported statistics are symmetric in the sense that the upper and 1317 | lower triangles of the association matrix (i.e.
the matrix of pairwise 1318 | associations among the variables in {it:varlist}) contain the same results 1319 | (i.e. the association between X and Y is the same as the association between 1320 | Y and X). For asymmetric statistics (e.g. {cmd:slope}) the column 1321 | (i.e. equation) variable is treated as the dependent variable. 1322 | 1323 | {phang} 1324 | {opt lower} requests that the lower-triangle elements of the association 1325 | matrix be computed. The default is to compute both the lower-triangle elements 1326 | and the upper-triangle elements. 1327 | 1328 | {phang} 1329 | {opt upper} requests that the upper-triangle elements of the association 1330 | matrix be computed. The default is to compute both the lower-triangle elements 1331 | and the upper-triangle elements. 1332 | 1333 | {phang} 1334 | {opt diagonal} includes the diagonal elements of the association 1335 | matrix (associations of the variables with themselves). By default, 1336 | diagonal elements are omitted. 1337 | 1338 | {marker density}{...} 1339 | {dlgtab:Subcommand density} 1340 | 1341 | {phang} 1342 | {opt n(#)} sets the number of points for which the density is to 1343 | be estimated. A regular grid of {it:#} points spanning the 1344 | data range (within subpopulation; plus some padding) will be used. The 1345 | default is {cmd:n(99)}. Only one of {cmd:n()} and {cmd:at()} is allowed. 1346 | 1347 | {phang} 1348 | {opt common} requests that a common set of evaluation points is used across 1349 | all subpopulations. The default is to determine the evaluation points based on 1350 | the data range within subpopulation. If {cmd:common} is specified, the 1351 | evaluation points will be based on the data range in the total population. 1352 | 1353 | {phang} 1354 | [{cmd:l}|{cmd:r}]{cmd:tight} omits padding when determining the evaluation 1355 | grid. Specify {cmd:tight} to omit padding on both sides, that is, to use a grid 1356 | from the observed minimum to the observed maximum of the data.
Specify 1357 | {cmd:ltight} to omit padding only on the left, that is, to use the observed 1358 | minimum as the lower bound of the grid. Specify {cmd:rtight} to omit padding 1359 | only on the right, that is, to use the observed maximum as the upper bound 1360 | of the grid. Option {cmd:tight} has no effect if {cmd:range()} or {cmd:at()} 1361 | is specified. 1362 | 1363 | {phang} 1364 | {opt range(a b)} specifies the range of the evaluation grid. The default is 1365 | to determine the range of the grid from the data; see option {cmd:n()}. Option 1366 | {cmd:range()} overrides {cmd:common}. Only one of {cmd:range()} and 1367 | {cmd:at()} is allowed. 1368 | 1369 | {phang} 1370 | {opth at(numlist)} specifies a custom grid of evaluation points. Only 1371 | one of {cmd:n()} and {cmd:at()} is allowed. 1372 | 1373 | {phang} 1374 | {cmd:unconditional} rescales results such that the 1375 | density function integrates to the relative size of the subpopulation 1376 | instead of 1. This is only relevant if option {cmd:over()} has been 1377 | specified. 1378 | 1379 | {marker hist}{...} 1380 | {dlgtab:Subcommand histogram} 1381 | 1382 | {phang} 1383 | {opt proportion} estimates proportions instead of densities. 1384 | 1385 | {phang} 1386 | {opt percent} estimates percent instead of densities. 1387 | 1388 | {phang} 1389 | {opt frequency} estimates frequencies instead of densities. 1390 | 1391 | {phang} 1392 | {cmd:n(}{cmd:#}|{it:method}{cmd:)} selects 1393 | (the method to determine) the number of histogram bins. Specify {opt n(#)} 1394 | to use {it:#} bins.
Alternatively, specify {opt n(method)} to determine the 1395 | number of bins automatically, where {it:method} may be one of the following: 1396 | 1397 | {p2colset 13 23 25 2}{...} 1398 | {p2col:{opt sq:rt}}modified square-root choice as used by {helpb histogram} 1399 | {p_end} 1400 | {p2col:{opt st:urges}}Sturges' formula 1401 | {p_end} 1402 | {p2col:{opt ri:ce}}Rice rule 1403 | {p_end} 1404 | {p2col:{opt do:ane}}Doane's formula 1405 | {p_end} 1406 | {p2col:{opt sc:ott}}Scott's normal reference rule 1407 | {p_end} 1408 | {p2col:{opt fd}}Freedman–Diaconis' choice 1409 | {p_end} 1410 | {p2col:{opt ep}}power-maximizing number of equiprobable bins 1411 | {p_end} 1412 | 1413 | {pmore} 1414 | The default is {cmd:n(sqrt)}; see help {helpb histogram} for details on this 1415 | rule. For the other rules see {browse "http://en.wikipedia.org/wiki/Histogram"}. The generated 1416 | bins will span the range of the observed data (within subpopulation). 1417 | 1418 | {phang} 1419 | {cmd:ep} uses equal probability bins (approximately) instead of equal 1420 | width bins. 1421 | 1422 | {phang} 1423 | {opt common} requests that a common set of bin definitions is used across 1424 | all subpopulations. The default is to determine the number of bins and the 1425 | bin boundaries based on the data within subpopulation. If {cmd:common} is 1426 | specified, the bin definitions will be based on the data in the total population. 1427 | 1428 | {phang} 1429 | {opth at(numlist)} specifies custom cutpoints for the bins (in ascending 1430 | order). If {it:numlist} contains {it:n} numbers, {it:n}-1 bins will be 1431 | created. Note that the constructed bins will cover all data only if the first 1432 | cutpoint is smaller than or equal to the minimum of the data and the last 1433 | cutpoint is larger than or equal to the maximum ({cmd:dstat} does {it:not} check 1434 | this condition and does not display a warning if the condition is violated). 
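Two of the rules for the number of bins have simple closed forms. Sturges' formula is k = ceil(log2(n)) + 1, and the Rice rule is k = ceil(2 n^(1/3)). The Python sketch below computes both and builds the k + 1 equal-width cutpoints that span the data range. This is the same relation between cutpoints and bins as for at(), where n numbers define n-1 bins. Rounding details in dstat may differ, and the sqrt, doane, scott, fd, and ep rules are not covered.

```python
import math

def n_bins(n, rule):
    """Number of bins for sample size n using a textbook rule."""
    if rule == "sturges":
        return math.ceil(math.log2(n)) + 1
    if rule == "rice":
        return math.ceil(2 * n ** (1 / 3))
    raise ValueError(f"rule not covered by this sketch: {rule}")

def equal_width_cutpoints(lo, hi, k):
    """k equal-width bins spanning [lo, hi], returned as k + 1 cutpoints."""
    width = (hi - lo) / k
    return [lo + i * width for i in range(k)] + [hi]

k = n_bins(100, "sturges")                  # 8 bins for n = 100
print(k, equal_width_cutpoints(0.0, 10.0, k))
```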
1435 | 1436 | {phang} 1437 | {cmd:discrete} treats the data as discrete and estimates the probability of 1438 | each observed level in the data. The option is implemented as a 1439 | redirection to subcommand {cmd:proportion} with option {cmd:nocategorical}. Options 1440 | {cmd:n()} and {cmd:ep} are not allowed together with {cmd:discrete}; the other 1441 | options are as described for {help dstat##prop:subcommand {bf:proportion}}. 1442 | 1443 | {phang} 1444 | {cmd:unconditional} rescales results by the relative size of 1445 | the subpopulation. This is only relevant if option {cmd:over()} has been 1446 | specified. {cmd:unconditional} is not allowed together with {cmd:frequency}. 1447 | 1448 | {marker prop}{...} 1449 | {dlgtab:Subcommand proportion} 1450 | 1451 | {phang} 1452 | {opt percent} estimates percent instead of proportions. 1453 | 1454 | {phang} 1455 | {opt frequency} estimates frequencies instead of proportions. 1456 | 1457 | {phang} 1458 | {opth at(numlist)} provides a custom list of levels for which to estimate 1459 | proportions. The default is to use all levels observed in the data (across 1460 | subpopulations). 1461 | 1462 | {phang} 1463 | {opt nocategorical} allows outcome variables that do not comply with 1464 | Stata's rules for factor variables (e.g. variables that contain negative 1465 | or noninteger values). This also affects how the coefficients are 1466 | labeled in the output. 1467 | 1468 | {phang} 1469 | {cmd:unconditional} rescales proportions by the relative size of 1470 | the subpopulation. This is only relevant if option {cmd:over()} has been 1471 | specified. {cmd:unconditional} is not allowed together with {cmd:frequency}. 1472 | 1473 | {marker cdf}{...} 1474 | {dlgtab:Subcommands cdf and ccdf} 1475 | 1476 | {phang} 1477 | {opt percent} estimates percent instead of proportions. 1478 | 1479 | {phang} 1480 | {opt frequency} estimates frequencies instead of proportions.
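The effect of unconditional can be illustrated with unweighted data. The Python sketch below uses made-up values and two subpopulations. Conditional proportions sum to 1 within each subpopulation. Unconditional proportions are multiplied by the subpopulation's share of the total sample, so they sum to 1 across all subpopulations. Levels are taken across subpopulations, as with the default of at(). This illustrates the definitions above and is not dstat's code.

```python
from collections import Counter

def proportions(values, groups, unconditional=False, percent=False):
    """Unweighted proportion of each level within each subpopulation."""
    levels = sorted(set(values))               # levels across all subpopulations
    out = {}
    for g in sorted(set(groups)):
        vals = [v for v, gg in zip(values, groups) if gg == g]
        counts = Counter(vals)
        # unconditional: rescale by the relative size of the subpopulation
        scale = len(vals) / len(values) if unconditional else 1.0
        for level in levels:
            p = counts[level] / len(vals) * scale
            out[(g, level)] = 100 * p if percent else p
    return out

values = [1, 1, 2, 3, 3, 3]
groups = [0, 0, 0, 1, 1, 1]
print(proportions(values, groups))
print(proportions(values, groups, unconditional=True))
```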
1481 | 1482 | {phang} 1483 | {opt mid} applies midpoint adjustment to the estimated CDF. By default, the 1484 | CDF at evaluation point {it:x} is defined as the proportion of data that is 1485 | lower than or equal to {it:x}. If {cmd:mid} is specified, the CDF at 1486 | point {it:x} is reduced by one half the proportion of data equal to 1487 | {it:x}. {cmd:mid} only has an effect on the results for evaluation points 1488 | that have a match in the data (unless {cmd:ipolate} is specified; see below). Only 1489 | one of {cmd:mid} and {cmd:floor} is allowed. 1490 | 1491 | {phang} 1492 | {opt floor} defines the CDF at evaluation point {it:x} as the proportion 1493 | of data that is lower than {it:x}, rather than lower than or equal to 1494 | {it:x}. {cmd:floor} only has an effect on the results for evaluation points 1495 | that have a match in the data (unless {cmd:ipolate} is specified; 1496 | see below). Only one of {cmd:floor} and {cmd:mid} is allowed. 1497 | 1498 | {phang} 1499 | {opt n(#)} sets the number of points at which the CDF is to be 1500 | evaluated. A regular grid of {it:#} points spanning the 1501 | observed data range (within subpopulation) will be used. The default is 1502 | {cmd:n(99)}. Only one of {cmd:n()} and {cmd:at()} is allowed. 1503 | 1504 | {phang} 1505 | {opt common} requests that a common set of evaluation points is used across 1506 | all subpopulations. The default is to determine the evaluation points based on 1507 | the data range within subpopulation. If {cmd:common} is specified, the 1508 | evaluation points will be based on the data range in the total population. 1509 | 1510 | {phang} 1511 | {opt range(a b)} specifies the range of the evaluation grid. The default is 1512 | to determine the range of the grid from the data; see option {cmd:n()}. Option 1513 | {cmd:range()} overrides {cmd:common}. Only one of {cmd:range()} and 1514 | {cmd:at()} is allowed.
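The three CDF definitions (default, floor, and mid) differ only at points that have a match in the data. The Python sketch below evaluates them for a small made-up sample and ignores weights.

```python
def ecdf(data, x, variant="default"):
    """Empirical CDF at x: default P(X<=x), floor P(X<x),
    mid P(X<=x) minus half of P(X=x)."""
    n = len(data)
    below = sum(1 for v in data if v < x) / n
    equal = sum(1 for v in data if v == x) / n
    if variant == "floor":
        return below
    if variant == "mid":
        return below + 0.5 * equal
    return below + equal

d = [1, 2, 2, 3]
print([ecdf(d, 2, v) for v in ("default", "floor", "mid")])    # [0.75, 0.25, 0.5]
print([ecdf(d, 2.5, v) for v in ("default", "floor", "mid")])  # [0.75, 0.75, 0.75]
```

At x = 2, which has a match in the data, the three variants differ. At x = 2.5, which has no match, all three coincide.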
1515 | 1516 | {phang} 1517 | {opth at(numlist)} provides a custom list of points at which to evaluate 1518 | the CDF. Only one of {cmd:n()} and {cmd:at()} is allowed. 1519 | 1520 | {phang} 1521 | {cmd:discrete} treats the data as discrete. In this case, the CDF will 1522 | be estimated at each level observed in the data 1523 | (across all subpopulations). Option {cmd:n()} is not allowed if 1524 | {cmd:discrete} is specified. 1525 | 1526 | {phang} 1527 | {cmd:ipolate} obtains the estimates of the CDF by linearly interpolating 1528 | the values of the empirical CDF. That is, the estimates will lie 1529 | on the curve that linearly connects the points of the CDF if the CDF is 1530 | evaluated at each observed level in the data (within subpopulation; options 1531 | {cmd:mid} and {cmd:floor} have an effect on the location of these 1532 | points). By default, the estimates of the CDF are obtained according to the definitions 1533 | described above (see {cmd:mid} and {cmd:floor}). 1534 | 1535 | {phang} 1536 | {cmd:unconditional} rescales results by the relative size of 1537 | the subpopulation. This is only relevant if option {cmd:over()} has been 1538 | specified. {cmd:unconditional} is not allowed together with {cmd:frequency}. 1539 | 1540 | {marker quantile}{...} 1541 | {dlgtab:Subcommand quantile} 1542 | 1543 | {phang} 1544 | {opt n(#)} sets the number of quantiles to be computed. A regular grid 1545 | of {it:#} points from {it:a}+{it:h} to {it:b}-{it:h} will be used, 1546 | with {it:h} = ({it:b}-{it:a})/({it:#}+1) and {it:a} and {it:b} 1547 | as set by option {cmd:range()}. The default is 1548 | {cmd:n(99)}. Only one of {cmd:n()} and {cmd:at()} is allowed. 1549 | 1550 | {phang} 1551 | {opt range(a b)} specifies the range of the evaluation grid, {it:a} and 1552 | {it:b} in [0,1]. The default is {cmd:range(0 1)}. Only one of {cmd:range()} 1553 | and {cmd:at()} is allowed. 
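The default evaluation grid of dstat quantile comes down to simple arithmetic. The grid consists of the # probabilities a+h, a+2h, ..., b-h, with h = (b-a)/(#+1). The sketch below reproduces this grid in Python purely for illustration. The function name quantile_grid is hypothetical and not part of dstat.

```python
def quantile_grid(n=99, a=0.0, b=1.0):
    """Probabilities a+h, a+2h, ..., b-h with h = (b-a)/(n+1).

    Mirrors the documented defaults n(99) and range(0 1); the
    endpoints a and b themselves are never part of the grid.
    """
    if not (0.0 <= a <= b <= 1.0):
        raise ValueError("range(a b) must satisfy 0 <= a <= b <= 1")
    h = (b - a) / (n + 1)
    return [a + k * h for k in range(1, n + 1)]


print(quantile_grid(3))   # [0.25, 0.5, 0.75]
grid = quantile_grid()    # default: 0.01, 0.02, ..., 0.99 (99 points)
print(len(grid))          # 99
```

Because the grid excludes a and b, quantiles at the boundaries (for example, at probability 0 or 1) are only obtained if they are requested explicitly through at().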
1554 | 1555 | {phang} 1556 | {opth at(numlist)} provides a custom list of probabilities at which to 1557 | compute quantiles. The specified values must be within [0,1]. Only one of 1558 | {cmd:n()} and {cmd:at()} is allowed. 1559 | 1560 | {marker lorenz}{...} 1561 | {dlgtab:Subcommand lorenz} 1562 | 1563 | {phang} 1564 | {opt percent} expresses results in percent instead of 1565 | proportions. {cmd:percent} is not allowed with 1566 | {cmd:generalized} or {cmd:absolute}. 1567 | 1568 | {phang} 1569 | {opt generalized} estimates the generalized Lorenz curve. 1570 | 1571 | {phang} 1572 | {opt sum} estimates the total (unnormalized) Lorenz curve. 1573 | 1574 | {phang} 1575 | {opt gap} estimates the equality gap curve. 1576 | 1577 | {phang} 1578 | {opt absolute} estimates the absolute Lorenz curve. 1579 | 1580 | {phang} 1581 | {opth by(varname)} estimates the concentration curve with respect to 1582 | {it:varname} instead of the Lorenz curve. 1583 | 1584 | {phang} 1585 | {opt n(#)} sets the number of ordinates to be estimated. A regular grid 1586 | of {it:#} values from {it:a} to {it:b} will be used, with {it:a} and {it:b} 1587 | as set by option {cmd:range()}. The default is {cmd:n(101)}. Only one of 1588 | {cmd:n()} and {cmd:at()} is allowed. 1589 | 1590 | {phang} 1591 | {opt range(a b)} specifies the range of the evaluation grid, {it:a} and 1592 | {it:b} in [0,1]. The default is {cmd:range(0 1)}. Only one of {cmd:range()} 1593 | and {cmd:at()} is allowed. 1594 | 1595 | {phang} 1596 | {opth at(numlist)} provides a custom list of points at which to 1597 | estimate Lorenz ordinates. The specified values must be within [0,1]. Only one of 1598 | {cmd:n()} and {cmd:at()} is allowed. 1599 | 1600 | {marker share}{...} 1601 | {dlgtab:Subcommand share} 1602 | 1603 | {phang} 1604 | {opt proportion} estimates proportions instead of densities. 1605 | 1606 | {phang} 1607 | {opt percent} estimates percent instead of densities. 
1608 | 1609 | {phang} 1610 | {opt generalized} estimates generalized shares instead of densities. 1611 | 1612 | {phang} 1613 | {opt sum} estimates totals instead of densities. 1614 | 1615 | {phang} 1616 | {opt average} estimates averages instead of densities. 1617 | 1618 | {phang} 1619 | {opth by(varname)} estimates the concentration shares with respect to 1620 | {it:varname}. 1621 | 1622 | {phang} 1623 | {opt n(#)} sets the number of bins. A regular grid of {it:#} bins between 1624 | 0 and 1 will be used. The default is {cmd:n(20)}. 1625 | 1626 | {phang} 1627 | {opth at(numlist)} specifies custom cutpoints for the bins (in ascending 1628 | order). The specified values must be within [0,1]. If {it:numlist} contains 1629 | {it:n} numbers, {it:n}-1 bins will be created. Note that the constructed 1630 | bins will cover all data only if the first cutpoint is 0 and the last 1631 | cutpoint is 1. 1632 | 1633 | {marker tip}{...} 1634 | {dlgtab:Subcommand tip} 1635 | 1636 | {phang} 1637 | {opt pline(#|varname)} specifies the poverty line, either as a single 1638 | value or as a variable containing observation-specific 1639 | values. Option {cmd:pline()} is required. 1640 | 1641 | {phang} 1642 | {opt absolute} estimates the absolute TIP curve. The default is to estimate the 1643 | relative TIP curve. 1644 | 1645 | {phang} 1646 | {opt pstrong} selects the poverty definition to be applied (see Donaldson and 1647 | Weymark 1986). The default is to use the "weak" definition, that is, to treat 1648 | outcomes equal to the poverty line as non-poor. Specify {cmd:pstrong} to treat 1649 | these cases as poor ("strong" definition). 1650 | 1651 | {phang} 1652 | {opt n(#)} sets the number of ordinates to be estimated. A regular grid 1653 | of {it:#} values from {it:a} to {it:b} will be used, with {it:a} and {it:b} 1654 | as set by option {cmd:range()}. The default is {cmd:n(101)}. Only one of {cmd:n()} 1655 | and {cmd:at()} is allowed.
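The following Python sketch computes TIP ordinates from unweighted data at the points p = 0, 1/N, ..., 1. It assumes the usual cumulative-poverty-gap definition:

- Sort the outcomes in ascending order.
- Cumulate the gaps max(pline - y, 0). For the relative curve (the default), divide each gap by pline first; for the absolute curve, leave the gaps in outcome units.
- Divide the cumulative sums by N.

This illustration rests on several assumptions: a hypothetical function tip_ordinates, a scalar poverty line, no weights, and no evaluation at the grid points set by n() or at(). It is not dstat's implementation.

```python
def tip_ordinates(y, pline, absolute=False):
    """TIP curve ordinates at p = 0, 1/N, ..., 1 (unweighted sketch).

    Outcomes are sorted from poorest to richest, poverty gaps
    max(pline - y, 0) are cumulated and divided by N. For the relative
    curve (the default) each gap is first divided by the poverty line.
    """
    if not absolute and pline <= 0:
        raise ValueError("relative TIP requires a positive poverty line")
    n = len(y)
    gaps = [max(pline - v, 0.0) for v in sorted(y)]  # poorest first
    if not absolute:
        gaps = [g / pline for g in gaps]             # normalized gaps
    ordinates = [0.0]                                # ordinate at p = 0
    cum = 0.0
    for g in gaps:
        cum += g
        ordinates.append(cum / n)
    return ordinates


incomes = [2, 4, 6, 10, 12]
print(tip_ordinates(incomes, pline=8, absolute=True))
# [0.0, 1.2, 2.0, 2.4, 2.4, 2.4]
print(tip_ordinates(incomes, pline=8))
# [0.0, 0.15, 0.25, 0.3, 0.3, 0.3]
```

In this example, the curve is flat from p = 0.6 onward because three of the five incomes lie below the poverty line. Its end point equals the average poverty gap: absolute for the first call, normalized for the second.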
1656 | 1657 | {phang} 1658 | {opt range(a b)} specifies the range of the evaluation grid, {it:a} and 1659 | {it:b} in [0,1]. The default is {cmd:range(0 1)}. Only one of {cmd:range()} 1660 | and {cmd:at()} is allowed. 1661 | 1662 | {phang} 1663 | {opth at(numlist)} provides a custom list of points at which to 1664 | estimate the ordinates. The specified values must be within [0,1]. Only one of 1665 | {cmd:n()} and {cmd:at()} is allowed. 1666 | 1667 | {marker graph_options}{...} 1668 | {dlgtab:Graph options} 1669 | 1670 | {phang} 1671 | {cmd:merge} causes results from different equations to be placed 1672 | in a single graph (as separate "plots", i.e. as separate series of results 1673 | displayed in a common style) instead of creating a separate subgraph for 1674 | each equation. This is only relevant if the results contain multiple 1675 | equations and if the equations are one-dimensional 1676 | (e.g. subpopulations); {cmd:merge} has no effect if the 1677 | equations are two-dimensional (subpopulations and variables). 1678 | 1679 | {phang} 1680 | {cmd:overlay} is a synonym for {cmd:merge}. 1681 | 1682 | {phang} 1683 | {cmd:flip} changes how results are allocated to plots and subgraphs. This is 1684 | only relevant if the results contain multiple equations. If the equations 1685 | are two-dimensional (subpopulations and variables), the default is to 1686 | create subgraphs by the secondary dimension (variables) and create 1687 | "plots" (series of results displayed in a common style) within subgraphs by 1688 | the main dimension (subpopulations). Specify {cmd:flip} to reverse this 1689 | behavior. If equations are one-dimensional, {cmd:flip} has the same effect 1690 | as {cmd:merge}. 1691 | 1692 | {phang} 1693 | [{cmd:g}|{cmd:p}]{cmdab:sel:ect}{cmd:(}{it:{help numlist}}|{cmdab:r:everse}{cmd:)} 1694 | selects and orders subgraphs and plots within 1695 | subgraphs. {it:numlist} specifies the indices of the subgraphs or plots to 1696 | be included. 
For example, in a situation where the default graph has three 1697 | subgraphs (containing one plot each), you could type {cmd:select(3 1)} to 1698 | omit the 2nd subgraph and reverse the order such that the 3rd subgraph comes 1699 | first. Instead of providing {it:numlist}, type {cmd:select(reverse)} 1700 | to reverse the order of subgraphs or plots. 1701 | 1702 | {pmore} 1703 | {cmd:select()} applies to both subgraphs and plots within subgraphs. If a 1704 | graph contains multiple subgraphs and multiple plots within subgraphs, use option 1705 | {cmd:gselect()} to select and order subgraphs, and use option {cmd:pselect()} 1706 | to select and order plots. 1707 | 1708 | {pmore} 1709 | {cmd:select()}, {cmd:gselect()}, and {cmd:pselect()} only have an effect if 1710 | there are multiple elements to choose from. That is, 1711 | single subgraphs or single plots will always be displayed, irrespective of 1712 | what you type in these options. 1713 | 1714 | {phang} 1715 | {opt cref} causes results from the reference (sub)population to be 1716 | included in the graph. The default is to suppress these 1717 | results. {cmd:cref} is only relevant if {cmd:over(, contrast())} has been 1718 | specified. 1719 | 1720 | {phang} 1721 | {cmd:bystats}[{cmd:(}{cmdab:m:ain}|{cmdab:s:econdary}{cmd:)}] treats coefficients as equations and 1722 | equations as coefficients. This is only relevant after 1723 | {cmd:dstat summarize} and only has an effect if the results contain multiple 1724 | equations. The effect of {cmd:bystats} typically is that results are grouped 1725 | by statistics rather than by subpopulations or variables (the option may 1726 | also have the opposite effect depending on how exactly {cmd:dstat} returned its 1727 | results). Optionally, type {cmd:bystats(main)} (the default) or 1728 | {cmd:bystats(secondary)} to specify whether coefficients should replace the 1729 | main dimension or the secondary dimension of the equations, respectively.
This 1730 | is only relevant if the equations contain two dimensions 1731 | (subpopulations and variables). 1732 | 1733 | {phang} 1734 | [{cmd:no}]{cmd:step} enforces or prevents using a step function to display 1735 | the distribution function. This is only relevant after {cmd:dstat cdf} 1736 | and {cmd:dstat ccdf}. The default is to display the CDF as a step function 1737 | if option {cmd:discrete} (but not {cmd:ipolate}) has been specified, and 1738 | otherwise to use straight lines. Specify {cmd:nostep} or {cmd:step}, respectively, 1739 | to override the default. 1740 | 1741 | {phang} 1742 | {cmd:norefline} suppresses the equality line (diagonal) that is printed 1743 | when plotting results from {cmd:dstat lorenz} (unless option 1744 | {cmd:generalized}, {cmd:gap}, or {cmd:absolute} has been specified). 1745 | 1746 | {phang} 1747 | {opt refline(line_options)} specifies options to affect the rendition of 1748 | the equality line; see {it:{help line_options}}. This is only relevant after 1749 | {cmd:dstat lorenz}. 1750 | 1751 | {marker coefplot}{...} 1752 | {phang} 1753 | {it:coefplot_options} are options to be passed through to 1754 | {helpb coefplot}. Use these options, for example, to set titles and axis 1755 | labels or to affect the overall look and size of the graph. The options can 1756 | also be used to change the rendering of the plotted results (e.g. colors, 1757 | line patterns, marker symbols, etc.). If a graph contains multiple plots 1758 | (multiple series of results displayed in a common style), option 1759 | {cmd:p}{it:#}{cmd:()} can be used to address the {it:#}th plot. For example, 1760 | you could type {cmd:p2(recast(dropline) pstyle(p5) noci)} to change the 1761 | {it:plottype} of the 2nd plot to {cmd:dropline}, change its {it:pstyle} 1762 | to {cmd:p5} (instead of the default {cmd:p2}), and suppress its confidence 1763 | intervals.
1764 | 1765 | {marker predict_options}{...} 1766 | {dlgtab:Predict options} 1767 | 1768 | {phang} 1769 | {opt rif} generates recentered influence functions (RIFs) instead of regular 1770 | influence functions. RIFs are defined such that their mean is equal to the 1771 | statistic in question (Firpo et al. 2009; also see Rios-Avila 2020) 1772 | and the standard error of the mean (as computed by 1773 | command {helpb mean}) provides an estimate of the standard error of the 1774 | statistic. The default is to store influence functions defined in a way such 1775 | that their total is zero and the standard error of the total (as computed by 1776 | command {helpb total}) provides an estimate of the standard error of the 1777 | statistic. 1778 | 1779 | {phang} 1780 | {opt scaling(spec)} determines the scaling of the generated 1781 | influence functions. {it:spec} can be {cmdab:t:otal} (scaling for 1782 | analysis by {helpb total}) or {cmdab:m:ean} (scaling for analysis by 1783 | {helpb mean}). The default is {cmd:scaling(total)} for regular influence 1784 | functions and {cmd:scaling(mean)} for recentered influence functions 1785 | (i.e. if option {cmd:rif} is specified). 1786 | 1787 | {phang} 1788 | {opt compact} generates influence functions in compact form. {cmd:compact} 1789 | only has an effect if {cmd:over()} has been specified and is not allowed 1790 | with {cmd:balance()}, {cmd:unconditional}, {cmd:over(, contrast)}, or 1791 | {cmd:over(, accumulate)}. Furthermore, {cmd:compact} is not supported 1792 | for statistics that are not normalized by the sample size (i.e. frequencies 1793 | or totals). 1794 | 1795 | {pmore} 1796 | The default is to generate one influence function for each single parameter 1797 | estimated by {cmd:dstat}. If {cmd:over()} is specified, this means that 1798 | each statistic in each subpopulation has its own influence 1799 | function. Specify {cmd:compact} to merge the influence functions across 1800 | subpopulations. 
In this case, {cmd:over()} has to be specified when 1801 | analyzing the influence functions. 1802 | 1803 | {phang} 1804 | {opt quietly} suppresses the list of generated variables that is displayed by 1805 | default. 1806 | 1807 | {pstd} 1808 | Note that weights, if specified, will not be incorporated into the 1809 | influence functions, so that the weights can be 1810 | applied when analyzing the influence functions. The influence functions do, 1811 | however, incorporate the balancing weights (net of base weights) 1812 | from option {cmd:balance()}. 1813 | 1814 | {pstd} 1815 | Furthermore, note that {cmd:dstat} generates scores instead of 1816 | influence functions for statistics that are not normalized by the sample 1817 | size (i.e. frequencies or totals). The difference is that the total of an influence function 1818 | across the estimation sample is zero, whereas the total of the score is 1819 | equal to the statistic in question. Returning scores for frequencies and totals 1820 | ensures that standard errors obtained by {cmd:total} will be correct for these 1821 | statistics in complex survey designs. 1822 | 1823 | 1824 | {marker examples}{...} 1825 | {title:Examples} 1826 | 1827 | {dlgtab:Summary statistics} 1828 | 1829 | {pstd} 1830 | {cmd:dstat summarize} supports a long list of summary statistics. For example, the following 1831 | command computes the arithmetic mean, geometric mean, median, 5% trimmed mean, 5% 1832 | winsorized mean, 95%-efficiency Huber M estimate, and Hodges-Lehmann location of wages 1833 | for unionized and nonunionized workers: 1834 | 1835 | . {stata sysuse nlsw88, clear} 1836 | {p 8 12 2} 1837 | . {stata dstat (mean gmean median trim5 winsor5 huber95 hl) wage, over(union)} 1838 | {p_end} 1839 | 1840 | {pstd} 1841 | Results can be computed for multiple variables, and statistics may 1842 | differ across variables.
The following command estimates the 1843 | Gini coefficient, mean log deviation, and variance of logarithms of wages, 1844 | the means of working hours and work experience, as well as the proportions of whites 1845 | ({cmd:race}=1), blacks ({cmd:race}=2), and others ({cmd:race}=3): 1846 | 1847 | {p 8 12 2} 1848 | . {stata dstat (gini mld vlog) wage (mean) hours ttl_exp (pr1 pr2 pr3) race} 1849 | {p_end} 1850 | 1851 | {dlgtab:Distribution functions} 1852 | 1853 | {pstd} 1854 | {cmd:dstat} supports the estimation of several types of distribution 1855 | functions. For example, the density function of wages by union status can 1856 | be obtained as follows: 1857 | 1858 | . {stata sysuse nlsw88, clear} 1859 | . {stata dstat density wage, over(union) ll(0) graph} 1860 | 1861 | {pstd} 1862 | Option {cmd:graph} has been specified so that a graph is drawn. The coefficients 1863 | table will be suppressed in this case; specify option {cmd:table} to display the 1864 | coefficients table anyway. An alternative would be to 1865 | omit the {cmd:graph} option and then type {cmd:dstat graph} after estimation. 1866 | Option {cmd:ll(0)} has been specified because wages can only be positive. The option 1867 | causes density estimation to be restricted to the positive domain 1868 | and applies an appropriate boundary correction. 1869 | 1870 | {pstd} 1871 | In the example above, the density estimates for unionized and nonunionized 1872 | workers have been displayed in two separate subgraphs. Apply graph option 1873 | {cmd:merge} to overlay the two curves in a single coordinate system: 1874 | 1875 | . {stata dstat graph, merge} 1876 | 1877 | {pstd} 1878 | To see how the overall wage distribution is composed of the two 1879 | groups, we can, for example, rescale the density estimates by group size 1880 | using option {cmd:unconditional} and include the total density using option 1881 | {cmd:total}: 1882 | 1883 | {p 8 12 2} 1884 | .
{stata dstat density wage, over(union) total unconditional ll(0) graph(merge)} 1885 | {p_end} 1886 | 1887 | {dlgtab:Covariate balancing} 1888 | 1889 | {pstd} 1890 | The {cmd:balance()} option can be used to adjust results for differences in 1891 | covariate distributions when comparing subpopulations. By default, 1892 | {cmd:dstat} employs inverse probability weighting (IPW) to balance the 1893 | covariates and obtains the relevant reference distribution from the total 1894 | sample. That is, in each subpopulation the covariate distribution is 1895 | adjusted such that it resembles the covariate distribution observed in the 1896 | total population. Use the {cmd:reference()} suboption to change the reference 1897 | distribution. 1898 | 1899 | {pstd} 1900 | For example, the difference in average wages between unionized and 1901 | nonunionized workers can be obtained as follows: 1902 | 1903 | . {stata sysuse nlsw88, clear} 1904 | . {stata dstat (mean) wage, over(union)} 1905 | . {stata lincom _b[1.union]-_b[0.union]} 1906 | 1907 | {pstd} 1908 | Controlling for education, working hours, work experience, and tenure reduces 1909 | the mean difference by about a third (note that there has been a small change 1910 | in the estimation sample due to missing values; for a more valid comparison, 1911 | the raw difference should be computed based on the same sample as the 1912 | balanced difference): 1913 | 1914 | {p 8 12 2} 1915 | . {stata dstat (mean) wage, over(union) balance(grade hours ttl_exp tenure)} 1916 | {p_end} 1917 | . {stata lincom _b[1.union]-_b[0.union]} 1918 | 1919 | {pstd} 1920 | To evaluate how successful the balancing was, you can use suboption {cmd:generate()} 1921 | to store the balancing weights: 1922 | 1923 | {p 8 12 2} 1924 | . {stata dstat (mean) wage, over(union) balance(grade hours ttl_exp tenure, generate(wbal))} 1925 | {p_end} 1926 | {p 8 12 2} 1927 | .
{stata tabstat grade hours ttl_exp tenure if wage<., by(union)} (unbalanced) 1928 | {p_end} 1929 | {p 8 12 2} 1930 | . {stata tabstat grade hours ttl_exp tenure [aw=wbal], by(union)} (balanced) 1931 | {p_end} 1932 | . {stata drop wbal} 1933 | 1934 | {pstd} 1935 | The balancing has only been partially successful. Perfect balancing 1936 | (with respect to the means) can be achieved by entropy balancing: 1937 | 1938 | {p 8 12 2} 1939 | . {stata "dstat (mean) wage, over(union) balance(eb: grade hours ttl_exp tenure, generate(wbal))"} 1940 | {p_end} 1941 | {p 8 12 2} 1942 | . {stata tabstat grade hours ttl_exp tenure [aw=wbal], by(union)} 1943 | {p_end} 1944 | . {stata drop wbal} 1945 | 1946 | {pstd} 1947 | Note that, instead of using {helpb lincom} after estimation, you can also obtain group 1948 | differences directly using suboption {cmd:contrast} within the {cmd:over()} 1949 | option: 1950 | 1951 | {p 8 12 2} 1952 | . {stata "dstat (mean) wage, over(union, contrast(0)) balance(eb:grade hours ttl_exp tenure)"} 1953 | {p_end} 1954 | 1955 | {dlgtab:Influence functions} 1956 | 1957 | {pstd} 1958 | {cmd:dstat} can store the influence functions or the recentered 1959 | influence functions (RIFs) of the computed statistics. The influence functions 1960 | or RIFs can then be used in further analyses. Here is an example of 1961 | RIF regressions (Firpo et al. 2009) for the Gini coefficient and the 1962 | mean log deviation: 1963 | 1964 | . {stata sysuse nlsw88, clear} 1965 | . {stata dstat (gini mld) wage, rif(gini mld)} 1966 | . {stata regress gini union south smsa, robust} 1967 | . {stata regress mld union south smsa, robust} 1968 | 1969 | {pstd} 1970 | The RIFs are also useful for decomposition analysis. 
In the following example 1971 | the wage gap between unionized and non-unionized workers is decomposed into 1972 | a part explained by differences in covariates and a residual (unexplained) part, using 1973 | reweighting based on entropy balancing and using the covariate distribution 1974 | of unionized workers as the reference distribution: 1975 | 1976 | {p 8 12 2} 1977 | . {stata "dstat (mean) wage, over(union) balance(eb: grade hours ttl_exp tenure, reference(1)) rif(RIF0c)"} 1978 | {p_end} 1979 | {p 8 12 2} 1980 | . {stata "dstat (mean) wage if e(sample), over(union) rif(RIF0 RIF1)"} 1981 | {p_end} 1982 | . {stata generate difference = RIF1 - RIF0} 1983 | . {stata generate explained = RIF0c - RIF0} 1984 | . {stata generate unexplained = RIF1 - RIF0c} 1985 | . {stata mean difference explained unexplained} 1986 | 1987 | 1988 | {marker methods}{...} 1989 | {title:Methods and formulas} 1990 | 1991 | {pstd} 1992 | (under construction) 1993 | 1994 | 1995 | {marker saved_results}{...} 1996 | {title:Saved results} 1997 | 1998 | {pstd} 1999 | Depending on options, {cmd:dstat} stores a selection of the following 2000 | results in {cmd:e()}. 
2001 | 2002 | {synoptset 20 tabbed}{...} 2003 | {p2col 5 20 24 2: Scalars}{p_end} 2004 | {synopt:{cmd:e(N)}}number of observations{p_end} 2005 | {synopt:{cmd:e(W)}}sum of weights{p_end} 2006 | {synopt:{cmd:e(N_over)}}number of subpopulations{p_end} 2007 | {synopt:{cmd:e(N_clust)}}number of clusters{p_end} 2008 | {synopt:{cmd:e(N_vars)}}number of variables{p_end} 2009 | {synopt:{cmd:e(N_stats)}}number of (unique) summary statistics{p_end} 2010 | {synopt:{cmd:e(k_eq)}}number of equations in {cmd:e(b)}{p_end} 2011 | {synopt:{cmd:e(k_omit)}}number of omitted estimates{p_end} 2012 | {synopt:{cmd:e(df_r)}}sample degrees of freedom{p_end} 2013 | {synopt:{cmd:e(qdef)}}quantile definition{p_end} 2014 | {synopt:{cmd:e(adaptive)}}number of iterations of adaptive density estimator{p_end} 2015 | {synopt:{cmd:e(napprox)}}size of density estimation grid{p_end} 2016 | {synopt:{cmd:e(pad)}}padding of density estimation grid{p_end} 2017 | {synopt:{cmd:e(ll)}}lower boundary of the data support (density estimation){p_end} 2018 | {synopt:{cmd:e(ul)}}upper boundary of the data support (density estimation){p_end} 2019 | {synopt:{cmd:e(level)}}confidence level{p_end} 2020 | 2021 | {synoptset 20 tabbed}{...} 2022 | {p2col 5 20 24 2: Macros}{p_end} 2023 | {synopt:{cmd:e(cmd)}}{cmd:dstat}{p_end} 2024 | {synopt:{cmd:e(subcmd)}}{cmd:summarize}, {cmd:density}, {cmd:histogram}, {cmd:proportion}, {cmd:cdf}, {cmd:ccdf}, {cmd:quantile}, {cmd:lorenz}, or {cmd:share}{p_end} 2025 | {synopt:{cmd:e(predict)}}{cmd:dstat predict}{p_end} 2026 | {synopt:{cmd:e(cmdline)}}command as typed{p_end} 2027 | {synopt:{cmd:e(depvar)}}name(s) of analyzed variable(s){p_end} 2028 | {synopt:{cmd:e(nocasewise)}}{bf:nocasewise} or empty{p_end} 2029 | {synopt:{cmd:e(over)}}name of {it:overvar}{p_end} 2030 | {synopt:{cmd:e(over_namelist)}}values of subpopulations{p_end} 2031 | {synopt:{cmd:e(over_labels)}}labels of subpopulations{p_end} 2032 | {synopt:{cmd:e(over_select)}}values of selected subpopulations{p_end} 2033 |
{synopt:{cmd:e(over_contrast)}}{cmd:total}, {it:#}, {cmd:lag}, {cmd:lead}, or empty{p_end} 2034 | {synopt:{cmd:e(over_ratio)}}{cmd:ratio} or {cmd:lnratio} or empty{p_end} 2035 | {synopt:{cmd:e(over_accumulate)}}{cmd:accumulate} or empty{p_end} 2036 | {synopt:{cmd:e(over_fixed)}}{cmd:fixed} or empty{p_end} 2037 | {synopt:{cmd:e(total)}}{cmd:total} or empty{p_end} 2038 | {synopt:{cmd:e(unconditional)}}{cmd:unconditional} or empty{p_end} 2039 | {synopt:{cmd:e(balance)}}list of balancing variables{p_end} 2040 | {synopt:{cmd:e(balmethod)}}balancing method{p_end} 2041 | {synopt:{cmd:e(balref)}}balancing reference{p_end} 2042 | {synopt:{cmd:e(balopts)}}options passed through to balancing procedure{p_end} 2043 | {synopt:{cmd:e(bwmethod)}}bandwidth selection as specified in {cmd:bwidth()}{p_end} 2044 | {synopt:{cmd:e(kernel)}}kernel as specified in {cmd:kernel()}{p_end} 2045 | {synopt:{cmd:e(exact)}}{cmd:exact} or empty{p_end} 2046 | {synopt:{cmd:e(boundary)}}boundary correction method{p_end} 2047 | {synopt:{cmd:e(hdtrim)}}{cmd:hdtrim()} as specified{p_end} 2048 | {synopt:{cmd:e(mqopts)}}{cmd:mqopts()} as specified{p_end} 2049 | {synopt:{cmd:e(novalues)}}{cmd:novalues} or empty{p_end} 2050 | {synopt:{cmd:e(vformat)}}display format specified in {cmd:vformat()}{p_end} 2051 | {synopt:{cmd:e(stats)}}list of (unique) summary statistics{p_end} 2052 | {synopt:{cmd:e(slist)}}normalized specification of statistics and variables{p_end} 2053 | {synopt:{cmd:e(percent)}}{cmd:percent} or empty{p_end} 2054 | {synopt:{cmd:e(proportion)}}{cmd:proportion} or empty{p_end} 2055 | {synopt:{cmd:e(frequency)}}{cmd:frequency} or empty{p_end} 2056 | {synopt:{cmd:e(mid)}}{cmd:mid} or empty{p_end} 2057 | {synopt:{cmd:e(floor)}}{cmd:floor} or empty{p_end} 2058 | {synopt:{cmd:e(ipolate)}}{cmd:ipolate} or empty{p_end} 2059 | {synopt:{cmd:e(discrete)}}{cmd:discrete} or empty{p_end} 2060 | {synopt:{cmd:e(categorical)}}{cmd:categorical} or empty{p_end} 2061 | {synopt:{cmd:e(ep)}}{cmd:ep} or empty{p_end} 
2062 | {synopt:{cmd:e(gap)}}{cmd:gap} or empty{p_end} 2063 | {synopt:{cmd:e(generalized)}}{cmd:generalized} or empty{p_end} 2064 | {synopt:{cmd:e(absolute)}}{cmd:absolute} or empty{p_end} 2065 | {synopt:{cmd:e(average)}}{cmd:average} or empty{p_end} 2066 | {synopt:{cmd:e(relax)}}{cmd:relax} or empty{p_end} 2067 | {synopt:{cmd:e(byvar)}}name of variable specified in {cmd:by()}{p_end} 2068 | {synopt:{cmd:e(pline)}}poverty line variable specified in {cmd:pline()}{p_end} 2069 | {synopt:{cmd:e(pstrong)}}{cmd:pstrong} or empty{p_end} 2070 | {synopt:{cmd:e(generate)}}name(s) of generated variable(s){p_end} 2071 | {synopt:{cmd:e(clustvar)}}name of cluster variable{p_end} 2072 | {synopt:{cmd:e(vce)}}{it:vcetype} specified in {cmd:vce()}{p_end} 2073 | {synopt:{cmd:e(vcetype)}}title used to label Std. Err.{p_end} 2074 | {synopt:{cmd:e(citype)}}type of confidence interval stored in {cmd:e(ci)}{p_end} 2075 | {synopt:{cmd:e(wtype)}}weight type{p_end} 2076 | {synopt:{cmd:e(wexp)}}weight expression{p_end} 2077 | {synopt:{cmd:e(title)}}title in estimation output{p_end} 2078 | {synopt:{cmd:e(properties)}}{cmd:b} or {cmd:b V}{p_end} 2079 | 2080 | {synoptset 20 tabbed}{...} 2081 | {p2col 5 20 24 2: Matrices}{p_end} 2082 | {synopt:{cmd:e(b)}}estimates{p_end} 2083 | {synopt:{cmd:e(V)}}variance-covariance matrix of estimates{p_end} 2084 | {synopt:{cmd:e(se)}}standard errors of estimates{p_end} 2085 | {synopt:{cmd:e(ci)}}confidence intervals of estimates{p_end} 2086 | {synopt:{cmd:e(nobs)}}number of observations per estimate{p_end} 2087 | {synopt:{cmd:e(sumw)}}sum of weights per estimate{p_end} 2088 | {synopt:{cmd:e(at)}}evaluation points of distribution function{p_end} 2089 | {synopt:{cmd:e(omit)}}indicator for omitted estimates{p_end} 2090 | {synopt:{cmd:e(id)}}subpopulation IDs of estimates{p_end} 2091 | {synopt:{cmd:e(cref)}}contrast reference indicators{p_end} 2092 | {synopt:{cmd:e(bwidth)}}kernel bandwidth(s) of density estimation{p_end} 2093 | {synopt:{cmd:e(_N)}}number of
observations by subpopulation{p_end} 2094 | {synopt:{cmd:e(_W)}}sum of weights by subpopulation{p_end} 2095 | 2096 | {synoptset 20 tabbed}{...} 2097 | {p2col 5 20 24 2: Functions}{p_end} 2098 | {synopt:{cmd:e(sample)}}estimation sample{p_end} 2099 | {p2colreset}{...} 2100 | 2101 | {pstd} 2102 | If {cmd:vce()} is {cmd:svy}, {cmd:bootstrap}, or {cmd:jackknife}, additional 2103 | results are stored in {cmd:e()}; see {helpb svy}, {helpb bootstrap}, and 2104 | {helpb jackknife}, respectively. 2105 | 2106 | 2107 | {marker references}{...} 2108 | {title:References} 2109 | 2110 | {phang} 2111 | Akinshin, A. (2021). Trimmed Harrell-Davis quantile estimator based on the 2112 | highest density interval of the given 2113 | width. {browse "http://arxiv.org/abs/2111.11776":arXiv:2111.11776} [stat.ME]. 2114 | {p_end} 2115 | {phang} 2116 | Botev, Z.I., J.F. Grotowski, D.P. Kroese (2010). Kernel density 2117 | estimation via diffusion. Annals of Statistics 2118 | 38(5): 2916-2957. DOI: {browse "http://doi.org/10.1214/10-AOS799":10.1214/10-AOS799}. 2119 | {p_end} 2120 | {phang} 2121 | Brys, G., M. Hubert, A. Struyf (2004). A Robust Measure of Skewness. 2122 | Journal of Computational and Graphical Statistics 13(4): 996-1017. 2123 | {p_end} 2124 | {phang} 2125 | Brys, G., M. Hubert, A. Struyf (2006). Robust measures of tail weight. 2126 | Computational Statistics & Data Analysis 50: 733-759. 2127 | {p_end} 2128 | {phang} 2129 | Clark, S., R. Hemming, D. Ulph (1981). On Indices for the Measurement of Poverty. The 2130 | Economic Journal 91(362): 515-526. 2131 | {p_end} 2132 | {phang} 2133 | Cwik, J., J. Mielniczuk (1993). Data-dependent bandwidth choice for a grade density 2134 | kernel estimate. Statistics & Probability Letters 16: 397-405. 2135 | {p_end} 2136 | {phang} 2137 | Deville, J.-C. (1999). Variance estimation for complex statistics and 2138 | estimators: Linearization and residual techniques. Survey Methodology 25: 193-203.
2139 | {p_end} 2140 | {phang} 2141 | DiNardo, J.E., N. Fortin, T. Lemieux (1996). Labour Market Institutions and 2142 | the Distribution of Wages, 1973-1992: A Semiparametric Approach. Econometrica 2143 | 64(5): 1001-1046. 2144 | {p_end} 2145 | {phang} 2146 | Donaldson, D., J.A. Weymark (1986). Properties of Fixed-Population Poverty Indices. International 2147 | Economic Review 27(3): 667-688. 2148 | {p_end} 2149 | {phang} 2150 | Firpo, S., N.M. Fortin, T. Lemieux (2009). Unconditional Quantile 2151 | Regressions. Econometrica 77: 953-973. 2152 | {p_end} 2153 | {phang} 2154 | Foster, J., J. Greer, E. Thorbecke (1984). A class of decomposable poverty 2155 | measures. Econometrica 52(3): 761-766. 2156 | {p_end} 2157 | {phang} 2158 | Foster, J., J. Greer, E. Thorbecke (2010). The Foster–Greer–Thorbecke (FGT) poverty measures: 25 years 2159 | later. The Journal of Economic Inequality 8: 491-524. 2160 | {p_end} 2161 | {phang} 2162 | Hainmueller, J. (2012). Entropy Balancing for Causal Effects: A Multivariate 2163 | Reweighting Method to Produce Balanced Samples in Observational Studies. 2164 | Political Analysis 20: 25-46. 2165 | {p_end} 2166 | {phang} 2167 | Hampel, F.R. (1974). The Influence Curve and Its Role in Robust 2168 | Estimation. Journal of the American Statistical Association 69: 383-393. 2169 | {p_end} 2170 | {phang} 2171 | Harrell, F.E., C.E. Davis (1982). A New Distribution-Free Quantile Estimator. Biometrika 2172 | 69: 635-640. 2173 | {p_end} 2174 | {phang} 2175 | Hinkley, D.V. (1975). On power transformations to symmetry. Biometrika 2176 | 62(1): 101-111. 2177 | {p_end} 2178 | {phang} 2179 | Hodges, Jr., J.L., E.L. Lehmann (1963). Estimates of location based on 2180 | rank tests. Annals of Mathematical Statistics 34(2): 598-611. 2181 | {p_end} 2182 | {phang} 2183 | Hyndman, R.J., Y. Fan (1996). Sample Quantiles in Statistical 2184 | Packages. The American Statistician 50: 361-365. 2185 | {p_end} 2186 | {phang} 2187 | Jann, B. (2007).
Univariate kernel density 2188 | estimation. DOI: {browse "http://boris.unibe.ch/69421/2/kdens.pdf":10.7892/boris.69421}. 2189 | {p_end} 2190 | {phang} 2191 | Jann, B. (2020). Influence functions continued. A framework for estimating standard errors in 2192 | reweighting, matching, and regression adjustment. University of Bern Social Sciences 2193 | Working Papers 35. Available from 2194 | {browse "http://ideas.repec.org/p/bss/wpaper/35.html"}. 2195 | {p_end} 2196 | {phang} 2197 | Ma, Y., M.G. Genton, E. Parzen (2011). Asymptotic properties of sample 2198 | quantiles of discrete distributions. Annals of the Institute of Statistical 2199 | Mathematics 63: 227-243. 2200 | {p_end} 2201 | {phang} 2202 | Newson, R. (2006). Efficient Calculation of Jackknife Confidence 2203 | Intervals for Rank Statistics. Journal of Statistical Software 15(1). 2204 | {p_end} 2205 | {phang} 2206 | Osberg, L., K. Xu (2008). How Should We Measure Poverty in a Changing World? Methodological 2207 | Issues and Chinese Case Study. Review of Development Economics 12(2): 419-441. 2208 | {p_end} 2209 | {phang} 2210 | Rios-Avila, F. (2020). Recentered influence functions (RIFs) in Stata: RIF 2211 | regression and RIF decomposition. The Stata Journal 20(1): 51-94. 2212 | {p_end} 2213 | {phang} 2214 | Rousseeuw, P.J., C. Croux (1993). Alternatives to the Median 2215 | Absolute Deviation. Journal of the American Statistical Association 2216 | 88(424): 1273-1283. 2217 | {p_end} 2218 | {phang} 2219 | Saisana, M. (2014). Watts Poverty Index. In: A.C. Michalos (ed.). Encyclopedia of Quality of Life and Well-Being 2220 | Research. Dordrecht: Springer. DOI: {browse "http://doi.org/10.1007/978-94-007-0753-5_3197":10.1007/978-94-007-0753-5_3197}. 2221 | {p_end} 2222 | {phang} 2223 | Sen, A. (1976). Poverty: An Ordinal Approach to Measurement. Econometrica 44(2): 219-231. 2224 | {p_end} 2225 | {phang} 2226 | Shorrocks, A.F. (1980). The Class of Additively Decomposable Inequality Measures.
Econometrica 48(3): 613-625. 2227 | {p_end} 2228 | {phang} 2229 | Shorrocks, A.F. (1995). Revisiting the Sen Poverty Index. Econometrica 63(5): 1225-1230. 2230 | {p_end} 2231 | {phang} 2232 | Takayama, N. (1979). Poverty, income inequality, and their measures: Professor Sen's 2233 | axiomatic approach reconsidered. Econometrica 47(3): 747-759. 2234 | {p_end} 2235 | {phang} 2236 | Wand, M.P., M.C. Jones (1995). Kernel Smoothing. London: Chapman and Hall. 2237 | {p_end} 2238 | 2239 | 2240 | {marker author}{...} 2241 | {title:Author} 2242 | 2243 | {pstd} 2244 | Ben Jann, University of Bern, ben.jann@unibe.ch 2245 | 2246 | {pstd} 2247 | Thanks for citing this software as follows: 2248 | 2249 | {pmore} 2250 | Jann, B. (2020). dstat: Stata module to compute summary statistics and 2251 | distribution functions including standard errors 2252 | and optional covariate balancing. Available from 2253 | {browse "http://ideas.repec.org/c/boc/bocode/s458874.html"}. 2254 | 2255 | 2256 | {marker also_see}{...} 2257 | {title:Also see} 2258 | 2259 | {psee} 2260 | Online: help for 2261 | {helpb centile}, 2262 | {helpb ci}, 2263 | {helpb correlate}, 2264 | {helpb cumul}, 2265 | {helpb histogram}, 2266 | {helpb kdensity}, 2267 | {helpb mean}, 2268 | {helpb pctile}, 2269 | {helpb proportion}, 2270 | {helpb spearman}, 2271 | {helpb summarize}, 2272 | {helpb table}, 2273 | {helpb tabstat}, 2274 | {helpb tabulate}, 2275 | {helpb teffects ipw}, 2276 | {helpb total} 2277 | 2278 | {psee} 2279 | Packages from the SSC Archive (type {cmd:ssc describe} {it:name} for 2280 | more information): 2281 | {helpb akdensity}, 2282 | {helpb apoverty}, 2283 | {helpb catplot}, 2284 | {helpb cdfplot}, 2285 | {helpb ci2}, 2286 | {helpb dfl}, 2287 | {helpb distplot}, 2288 | {helpb duncan}, 2289 | {helpb eqprhistogram}, 2290 | {helpb fre}, 2291 | {helpb glcurve}, 2292 | {helpb ineqdeco}, 2293 | {helpb kdens}, 2294 | {helpb kmatch}, 2295 | {helpb lorenz}, 2296 | {helpb moremata}, 2297 | {helpb povdeco}, 2298 | 
{helpb poverty}, 2299 | {helpb pshare}, 2300 | {helpb reldist}, 2301 | {helpb rif}, 2302 | {helpb robstat}, 2303 | {helpb seg}, 2304 | {helpb somersd}, 2305 | {helpb sumdist}, 2306 | {helpb svygei:svygei_svyatk}, 2307 | {helpb svylorenz} 2308 | 2309 | -------------------------------------------------------------------------------- /dstat_svyr.ado: -------------------------------------------------------------------------------- 1 | *! version 1.0.5 15dec2022 Ben Jann 2 | *! helper program for -dstat, vce(svy)-; do not use manually 3 | 4 | program dstat_svyr, eclass properties(svylb svyb svyj) 5 | version 14 6 | _parse comma lhs 0 : 0 7 | syntax [, NOSE * ] 8 | dstat `lhs', nose `options' 9 | tempname b V 10 | mat `b' = e(b) 11 | mata: st_matrix("`V'", diag(1 :- st_matrix("e(omit)"))) 12 | ereturn repost b=`b' V=`V', resize 13 | eret local cmd "prop" // trick to skip _check_omit 14 | eret local cmd0 "dstat_svyr" 15 | end 16 | 17 | -------------------------------------------------------------------------------- /stata.toc: -------------------------------------------------------------------------------- 1 | v 3 2 | p dstat Stata module to compute summary statistics and distribution functions including standard errors and optional covariate balancing 3 | --------------------------------------------------------------------------------
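In dstat_svyr.ado, dstat is run with nose, so no variance matrix is computed. The line `mata: st_matrix("`V'", diag(1 :- st_matrix("e(omit)")))` then posts a diagonal placeholder V: it subtracts e(omit) from 1 element by element and places the result on the diagonal. The Mata sketch below reproduces that step on a hypothetical `omit` vector. It assumes, as the expression suggests, that e(omit) is a row vector that flags omitted coefficients with 1 and estimated ones with 0.

```stata
* Minimal sketch of the V construction in dstat_svyr.ado.
* Assumption: omit is a hypothetical stand-in for e(omit), where
* 1 marks an omitted coefficient and 0 marks an estimated one.
mata:
omit = (0, 1, 0)        // hypothetical e(omit): second coefficient omitted
V = diag(1 :- omit)     // 1 on the diagonal for kept terms, 0 for omitted ones
V
end
```

Because the svy prefix computes the actual standard errors, the posted V presumably only needs the correct dimensions and zero entries for omitted terms.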