├── LICENSE
├── README.md
├── dstat.ado
├── dstat.pkg
├── dstat.sthlp
├── dstat_svyr.ado
└── stata.toc


/LICENSE:
--------------------------------------------------------------------------------
 1 | MIT License
 2 | 
 3 | Copyright (c) 2020 Ben Jann
 4 | 
 5 | Permission is hereby granted, free of charge, to any person obtaining a copy
 6 | of this software and associated documentation files (the "Software"), to deal
 7 | in the Software without restriction, including without limitation the rights
 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
  1 | # dstat
  2 | Stata module to compute summary statistics and distribution functions including 
  3 | standard errors and optional covariate balancing
  4 | 
  5 | `dstat` unites a variety of methods to describe (univariate)
  6 | statistical distributions. Covered are density estimation, histograms,
  7 | cumulative distribution functions, probability distributions, quantile
  8 | functions, Lorenz curves, percentile shares, and a large collection
  9 | of summary statistics such as classical and robust measures of location, scale,
 10 | skewness, and kurtosis, as well as inequality and poverty measures. Notable
 11 | features of the command are that it provides consistent standard errors,
 12 | supporting complex sample designs, for all covered statistics, and that it
 13 | allows the simultaneous analysis of multiple variables across multiple
 14 | subpopulations. Furthermore, the command supports covariate balancing based on
 15 | reweighting techniques (inverse probability weighting and entropy balancing),
 16 | including appropriate correction of standard errors. Standard error estimation
 17 | is implemented in terms of influence functions, which can be stored 
 18 | for further analysis, for example, using RIF regression.
 19 | 
 20 | To install `dstat` from the SSC Archive, type
 21 | 
 22 |     . ssc install dstat, replace
 23 | 
 24 | in Stata. Stata version 14 or newer is required. Furthermore, `moremata` and
 25 | `coefplot` are required. To install these packages from the SSC Archive, type
 26 | 
 27 |     . ssc install moremata, replace
 28 |     . ssc install coefplot, replace
 29 | 
 30 | ---
 31 | 
 32 | Installation from GitHub:
 33 | 
 34 |     . net install dstat, replace from(https://raw.githubusercontent.com/benjann/dstat/main/)
 35 |     . net install moremata, replace from(https://raw.githubusercontent.com/benjann/moremata/master/)
 36 |     . net install coefplot, replace from(https://raw.githubusercontent.com/benjann/coefplot/master/)
 37 | 
 38 | ---
 39 | 
 40 | Main changes:
 41 | 
 42 |     04jun2025 (version 1.4.7)
 43 |     - -dstat graph- failed after -dstat histogram- and -dstat share- in case of
 44 |       over() with suboption -contrast-; this is fixed
 45 |     - -dstat graph- after -dstat histogram-, -dstat proportion-, or -dstat share-
 46 |       now leaves the margin of the plotregion unchanged in case of over() with
 47 |       suboption -contrast-; furthermore, base() is now set to 1 (rather than 0)
 48 |       in case of over() with suboption -ratio-
 49 | 
 50 |     25apr2025 (version 1.4.6)
 51 |     - statistic -smse- in -dstat summarize- has been renamed to -rmse- (root mean
 52 |       squared error)
 53 |     - -dstat summarize- could be unnecessarily slow on small datasets due to an
 54 |       unfortunate use of Mata's findexternal(); this is fixed
 55 |     - inequality statistic [gw_|w_|b_]ge(alpha) in -dstat summarize- did not
 56 |       correctly diagnose out-of-support observations if alpha was 0 or 1; this is
 57 |       fixed
 58 | 
 59 |     04apr2025 (version 1.4.5)
 60 |     - return e(sinfo) added (undocumented)
 61 |     - -dstat summarize- now has undocumented option -noclean- to retain duplicate
 62 |       statistics
 63 | 
 64 |     24mar2023 (version 1.4.4)
 65 |     - generate() stored the influence functions of the raw statistics rather than the
 66 |       influence functions of the transformed statistics if suboption -lnratio- was
 67 |       specified in over(); this also implied that vce(svy) reported the standard errors
 68 |       of the raw statistics rather than standard errors of the transformed statistics
 69 |       if suboption -lnratio- was specified in over(); this is fixed
 70 | 
 71 |     28dec2022 (version 1.4.3)
 72 |     - command -dstat (somersd) Y, by(X)- computed D(X|Y) rather than D(Y|X); I now
 73 |       changed this so that D(Y|X) is computed, which is more intuitive (and more in
 74 |       line with how other asymmetric statistics are computed by dstat); thanks to
 75 |       Maurizio Pisati for pointing out this inconsistency
 76 | 
 77 |     15dec2022 (version 1.4.2)
 78 |     - modified dstat_svyr such that replication-based svy estimators no longer
 79 |       apply checks for omitted coefficients; this prevents the estimators from
 80 |       failing on results that have zero variance (e.g. a zero-frequency histogram
 81 |       bar)
 82 | 
 83 |     14dec2022 (version 1.4.1)
 84 |     - [no]cov is no longer a suboption within vce(); it is now a regular option
 85 |     - dstat predict now has option scaling() to determine the scaling of the
 86 |       generated influence functions
 87 |     - option nobwfixed added; code to obtain grid and bandwidth in case of
 88 |       replication estimators revised
 89 |     - revised implementation of vce(svy)
 90 |     - revised implementation of predict
 91 | 
 92 |     12dec2022 (version 1.4.0)
 93 |     - dstat pw did not work with vce() set to bootstrap, jackknife, or svy; this is
 94 |       fixed
 95 |     - the returned information on sample and population size included observations
 96 |       that were excluded from estimation due to missing values if vce(svy) with
 97 |       replication-based variance estimation was specified; this is fixed
 98 |     - the secondary variable (-by-) can now be string for inequality decomposition
 99 |       measures as well as for cohend, mindex, uc[l|r], cramersv, and dissim
100 | 
101 |     05dec2022 (version 1.3.9)
102 |     - statistic -sdlog- added
103 |     - new methods in citype() for proportions: agresti, exact, jeffreys, wilson
104 |     - citype(normal) can now be abbreviated as citype(norm)
105 |     - reorganized code for computation of CIs
106 |     - dstat graph: overlay can now be specified as a synonym for merge
107 |     - r() from -dstat- is now preserved if option -graph- is specified; this ensures
108 |       that r(table) will be available after running -dstat- with both the -graph-
109 |       option and the -table- option; furthermore, r() from dstat is now also
110 |       preserved if option -generate()- or -rif()- is applied
111 |     - the display routine is now executed even if -quietly- is applied to -dstat-,
112 |       so that r(table) will be created even if -quietly- is applied
113 |     - the display routine will now clear preexisting r() even if -notable- is applied
114 |     - -dstat predict- no longer modifies r()
115 |     - an informative error message is now displayed if a string variable is
116 |       specified in by(), pline(), or as an argument to a statistic
117 | 
118 |     21nov2022 (version 1.3.8)
119 |     - dstat density: option [l|r]tight added; requires newest update of moremata
120 | 
121 |     20oct2022 (version 1.3.7)
122 |     - dstat returned error if option -nose- was applied with statistics that set
123 |       standard errors to zero (e.g. min and max); this is fixed
124 |     
125 |     22sep2022 (version 1.3.6)
126 |     - dstat returned error if histogram method -scott- was specified; this is fixed
127 |     - now using errprintf() to display errors in Mata
128 | 
129 |     11aug2022 (version 1.3.5)
130 |     - statistic -cohend- added
131 |     - statistic -freq- without argument can now be used to obtain
132 |       overall frequency/sum of weights; can also type -count-
133 | 
134 |     17feb2022 (version 1.3.4)
135 |     - dstat pw added (wrapper for dstat summarize to compute pairwise correlations
136 |       and similar)
137 |     - informative error message is now displayed if factor variables are used in
138 |       -dstat proportion- without option -nocategorical-
139 | 
140 |     14feb2022 (version 1.3.3)
141 |     - additional statistics in dstat sum: -slope- or -b- (regression coefficient;
142 |       may also be used to compute mean difference or risk difference),
143 |       -or- (odds ratio in 2x2 table), -rr- (risk ratio in 2x2 table)
144 |     - version of moremata library is now checked
145 | 
146 |     17jan2022 (version 1.3.2)
147 |     - option hdtrim() added (trimmed Harrell-Davis quantiles)
148 |     - grid size in _ds_mq_d_init() now 1024+1 because first point will be removed
149 | 
150 |     11jan2022 (version 1.3.1)
151 |     - now using a properly derived expression for the influence function of 
152 |       Harrell-Davis quantiles (rather than obtaining the IF by analogy to the
153 |       jackknife approach proposed by Harrell and Davis 1982); the new formulas
154 |       lead to slightly different results
155 | 
156 |     07jan2022 (version 1.3.0)
157 |     - dstat sum: huber, biweight, mad[n], mae[n], mscale now take account of qdef()
158 |     - dstat sum: computation of IFs for winsor, qskew, qw, lqw, rqw revised so that
159 |       qdef() is taken into account (only relevant if qdef=10 or qdef=11)
160 |   
161 |     30dec2021 (version 1.2.9)
162 |     - system for managing selection of observations and temporary results rewritten
163 |       (more systematic, cleaner code, less error prone, more efficient)
164 |     - dstat sum: harmonic mean (hmean) is now set to zero if at least one outcome
165 |       value is equal to zero
166 | 
167 |     22dec2021 (version 1.2.8)
168 |     - dstat sum: computation of taua was wrong in case of fweights; this is fixed
169 |     - dstat sum: renamed cdfm to mcdf, cdff to fcdf, ccdfm to mccdf, ccdff to fccdf
170 |     - system for parsing syntax of -dstat sum- rewritten (more general, cleaner
171 |       code, easier to manage/expand, better error messages)
172 | 
173 |     22dec2021 (version 1.2.7)
174 |     - support for qdef(11) added (mid-quantile); option -mquantile- is a synonym
175 |       for qdef(11)
176 |     - dstat sum: mquantile, gw_vlog, w_vlog, b_vlog, ekurtosis, rsquared added
177 |     - dstat sum: now using quad precision when taking cross products in variance,
178 |       sd, cv, md, gini, vlog, sen, sst, takayama, lvar, mse, spearman, skewness,
179 |       kurtosis, gci, corr, cov 
180 |     - default for napprox() increased from 512 to 1024
181 |     - dstat histogram: in case of pweights or iweights, the effective sample size 
182 |       (sum(w)^2/sum(w^2)) is now used instead of the physical number of
183 |       observations in the rules for selecting the number of bins
184 |     - default bandwidth selector for density estimation is now -dpi(2)-; -sjpi-
185 |       can be erratic on data that contains heaping
186 |     - improved error messages and some code cleaning
187 |     
188 |     05dec2021 (version 1.2.6)
189 |     - IF of b_gini assumes that the order of group means is stable; this is an
190 |       assumption that is typically not very critical; comparison to the jackknife
191 |       illustrates that the IF is quite accurate even in small samples; removed
192 |       the corresponding disclaimer in the help file
193 | 
194 |     05dec2021 (version 1.2.6)
195 |     - dstat sum: b_gini added (IF not fully correct yet; may only serve as a rough
196 |       approximation)
197 |     - dstat sum: gw_gini, gw_mld, gw_theil, gw_ge added
198 |     - dstat sum: mldwithin renamed to w_mld; mldbetween renamed to b_mld
199 |     - dstat sum: theilwithin renamed to w_theil; theilbetween renamed to b_theil
200 |     - dstat sum: gewithin renamed to w_ge; gebetween renamed to b_ge
201 | 
202 |     04dec2021 (version 1.2.5)
203 |     - dstat sum: gewithin and gebetween added
204 |     - dstat sum: IF of dissim made more efficient
205 | 
206 |     03dec2021 (version 1.2.4)
207 |     - dstat sum: mldwithin, mldbetween, theilwithin, theilbetween, dissim added
208 |     - dstat sum: now using more efficient approach to compute IFs of categorical
209 |       measures (hhi, entropy, mindex, etc)
210 |     - option zvar() is now called by(); zvar() still supported but no longer documented
211 | 
212 |     27nov2021 (version 1.2.3)
213 |     - -nocasewise- had a bug that could crash -dstat- in some cases; this is fixed
214 | 
215 |     26nov2021 (version 1.2.2)
216 |     - new system to manage temporary results to improve efficiency of -dstat sum-
217 |     - due to a typo, the values for gamma and tau_b could be somewhat off if weights
218 |       were specified; this is fixed
219 | 
220 |     25nov2021 (version 1.2.1)
221 |     - added association statistics: taua, taub, somersd, gamma; using a fast
222 |       algorithm by R. Newson (2006. Efficient Calculation of Jackknife Confidence 
223 |       Intervals for Rank Statistics. Journal of Statistical Software 15/1) to
224 |       compute the difference in the sum of concordant and discordant pairs
225 |     - dstat automatically (and silently) recentered (all) influence functions if
226 |       any IF had a relative error (i.e. deviation from zero relative to the value
227 |       of the statistic) larger than 1e-14; a corresponding warning message was only
228 |       displayed if any IF had a relative error larger than 1e-6; the former type
229 |       of recentering is now discarded; that is, recentering is now only applied
230 |       if at least one relative error is larger than 1e-6 (all IFs will be
231 |       affected) and a warning message is always displayed if recentering is applied
232 |     - option -relax- could cause error in some situations; this is fixed
233 |     - dstat no longer enforces user version 14.2 when writing coefficient names to
234 |       e(b) (enforcing user version 14.2 caused issues with bootstrap and similar
235 |       commands); a consequence of this is that in Stata 15 (and in Stata 16 prior
236 |       to the 30mar2021 update) the results table from -dstat summarize- might look
237 |       slightly awkward if statistics with parameters in parentheses are specified;
238 |       type -version 14: dstat summarize ...- for better output in these cases
239 |     - over-legend is no longer displayed if the coefficients table is suppressed
240 |     - subcmd is now always set to -summarize-, if no known subcmd is specified; for
241 |       example, -dstat x1-x5- now works
242 | 
243 |     20nov2021 (version 1.2.0)
244 |     - a bug in -nocasewise- led to erroneous selection of observations or crashed
245 |       dstat in some situations; this is fixed
246 |     - added statistics for categorical variables: hhi, hhin, gimp, entropy, hill,
247 |       renyi, mindex, uc, cramer
248 | 
249 |     03aug2021 (version 1.1.9)
250 |     - fixed header layout in Stata 17, employing _coef_table_header options
251 |       introduced in the 13jul2021 update of Stata 17
252 |   
253 |     14jul2021 (version 1.1.8)
254 |     - option -discrete- now allowed in -dstat histogram-; -dstat histogram, discrete-
255 |       is an alias for -dstat proportion, nocategorical-
256 |     - graphs after -dstat proportion- now use a continuous axis instead of a categorical
257 |       axis if option -nocategorical- has been specified
258 |     - -dstat frequency- can now be used as alias for -dstat proportion, frequency-
259 |     - statistic hdquantile() now fully supports weights; computation of influence
260 |       functions has been improved
261 |     - option -qdef(10)- can now be specified to use Harrell-Davis quantiles; option
262 |       -hdquantile- is a synonym for -qdef(10)-
263 | 
264 |     01jul2021 (version 1.1.7)
265 |     - statistic hdquantile() added
266 |     - SEs of quantile(0) and quantile(1) now set to 0
267 |     - -dstat pdf- now allowed as alias for -dstat density-
268 |     - better error message if an invalid subcommand is specified
269 | 
270 |     30jun2021 (version 1.1.6)
271 |     - additional poverty measures: tip (TIP ordinate) and atip (absolute TIP ordinate)
272 |     - -dstat tip- failed if a variable was specified in -pline()- instead of a
273 |       fixed value; this is fixed
274 |     - -dstat tip- no longer returns HCR and PGI in e()
275 | 
276 |     29jun2021 (version 1.1.5)
277 |     - -dstat tip- (TIP curve) added
278 |     - option range() added to subcommands density, cdf, ccdf, quantile, lorenz, tip
279 |     - association measures added: corr (correlation), cov (covariance), spearman
280 |       (Spearman's rank correlation)
281 |     - additional poverty measures: apgap (absolute poverty gap), apgi (absolute
282 |       poverty gap index)
283 |     - contrast(lag) and contrast(lead) now allowed in over()
284 |     - can now specify custom p1 and p2 with -iqrn-
285 |     - observations with missing on variables specified in zvar() or pline() (or
286 |       corresponding variables specified as arguments to individual statistics) are
287 |       no longer excluded from the overall estimation sample if -nocasewise- is 
288 |       specified
289 |     - number of obs and sum of weights now returned for each parameter in e(nobs)
290 |       and e(sumw)
291 | 
292 |     23jun2021 (version 1.1.4)
293 |     - additional inequality statistic: Hoover index (Robin Hood index)
294 |     - additional poverty statistics: hcr (head count ratio), pgap (poverty gap),
295 |       pgi (poverty gap index), sen (Sen poverty index), sst (Sen-Shorrocks-Thon),
296 |       takayama (Takayama poverty index), chu (Clark-Hemming-Ulph)
297 |     - new option -pstrong- to employ the "strong" poverty definition; -fgt- now uses
298 |       the "weak" definition by default
299 |     - option -relax- of -dstat summarize- was not included in e() and was not passed 
300 |       through to -predict-; this is fixed
301 |     - the routine computing -md- could break in some contexts; this is fixed
302 | 
303 |     10jun2021 (version 1.1.2)
304 |     - -predict- could fail after -dstat proportion-; this is fixed
305 |     - contrast options -ratio- and -lnratio- now again supported for statistics
306 |       that are not normalized by the sample size (frequencies, totals)
307 |     - fixed bug that could occur if nocasewise and unconditional were both specified
308 | 
309 |     07jun2021 (version 1.1.1)
310 |     - option -nocasewise- added
311 |     - option -relax- added
312 |     - dstat now always uses scores for totals/frequencies instead of influence
313 |       functions; (sub)option svy in -predict-, -vce(analytic)- and -vce(cluster)-
314 |       is discontinued; option -unconditional(fixed)- is discontinued; treatment of
315 |       totals/freqs now consistent with survey estimation by default (i.e. subpopulation
316 |       sizes are assumed random; the number of PSUs is assumed fixed); this is different
317 |       from how official command -total- handles subpops if used without -svy-
318 |       prefix
319 |     - contrast options -ratio- and -lnratio- are no longer supported for statistics
320 |       that are not normalized by the sample size (frequencies, totals); -ratio- and
321 |       -lnratio- now imply -contrast-
322 |     - option -compact- of -predict/generate()/rif()- no longer allowed with
323 |       -over(, contrast/accumulate)- or with statistics that are not normalized by
324 |       the sample size
325 |     - dstat summarize applied sorting even if not necessary; this is fixed
326 |     - omitted estimates are no longer flagged in the coefficient names; vector
327 |       e(omit) is now returned
328 |     - density estimation settings are now returned in e() only if density estimation
329 |       has, in fact, been employed; e(bwidth) now has better column names
330 |     - in some situations, dstat histogram computed wrong results for the first bin
331 |       if option balance() was specified; this is fixed
332 |     - _makesymmetric() is now applied to e(V) to remove asymmetry due to possible
333 |       roundoff error
334 | 
335 |     22dec2020 (version 1.1.0)
336 |     - results for statistics mad(0,0), madn(0,0), mae(0), and maen(0) were wrong
337 |       in case of weights; this is fixed
338 | 
339 |     16dec2020 (version 1.0.9)
340 |     - new suboptions -contrast()-, -ratio-, -lnratio-, and -accumulate- in -over()-
341 |     - new -common- option in -dstat density-, -dstat histogram-, and -dstat [c]cdf-
342 |     - new display options -cref- and -pvalue-
343 |     - citype() now sets CI to missing if value of coef is outside domain of
344 |       transformation function
345 |     - option select() in -dstat graph- can now contain -reverse- instead of a
346 |       numlist
347 | 
348 |     11dec2020 (version 1.0.8)
349 |     - cluster variable in vce(cluster) can now be string
350 |     - over(..., rescale) now implemented as subcommand-specific option
351 |       -unconditional-; -unconditional(fixed)- added to treat subpopulation
352 |       sizes as fixed
353 |     - dstat cdf/ccdf: specifying -ipolate- together with -floor- returned error; this
354 |       is fixed
355 | 
356 |     10dec2020 (version 1.0.7)
357 |     - vce(analytic/cluster, svy)
358 |       o svy was not taken into account if no clusters and no weights, iweights, or
359 |         fweights were specified; this is fixed
360 |       o revised code to preserve memory and avoid double work
361 |     - for reasons of consistency, in case of iweights, the sum of weights is now
362 |       reported in e(N) instead of the physical number of observations
363 | 
364 |     09dec2020 (version 1.0.6)
365 |     - new option select() in -dstat graph- to select and order subgraphs and plots
366 |     - new suboption select() in over(): select and order subpopulations to be included
367 |       in results; total will still include obs from all groups
368 |     - new suboption -rescale- in over(): rescale results by the relative size of the
369 |       subpopulation
370 |     - suboption -svy- in vce(analytic) and vce(cluster) to compute SEs for
371 |       frequencies and totals like svy does 
372 |     - new statistics: min, max, range, midrange (IFs/SEs will be set to zero for 
373 |       these statistics)
374 |     - vce(svy), vce(bootstrap), and vce(jackknife) now feature suboption [no]cov to
375 |       decide whether to store full e(V) or only e(se); default is -cov- for 
376 |       -dstat summarize- and -nocov- else; with vce(svy) option -nocov- also removes
377 |       auxiliary covariance matrices such as e(V_srs)
378 |     - dstat density: standard errors were correct only in the first subpopulation 
379 |       if -over()- was specified together with -exact-; this is fixed 
380 | 
381 |     05dec2020 (version 1.0.5)
382 |     - new -dstat ccdf- command for complementary CDF (tail distribution, survival
383 |       function)
384 |     - -dstat cdf- has new options -frequency-, -percent-, -floor-, and -ipolate-
385 |     - additional statistics: total(), cdff(), ccdf(), ccdfm(), ccdff()
386 |     - statistics trim(p1,p2) and winsor(p1,p2) now documented; furthermore, qdef()
387 |       is now taken into account by trim() and winsor()
388 |     - option -sum- in -dstat lorenz- and -dstat share- now documented
389 |     - statistics tlorenz(), tshare(), tccurve(), tcshare() now documented
390 |     - option generate() has a new -svy- suboption to generate scores for survey
391 |       estimation instead of influence functions; this only makes a difference for
392 |       unnormalized statistics (frequencies, totals)
393 |     - VCE for unnormalized statistics (frequencies, totals) did not take account of
394 |       the extra uncertainty induced by the variability of the sum of weights in the
395 |       context of survey estimation; this is fixed
396 |     - confidence limits had wrong scale if -percent- was specified, citype() was not
397 |       normal, and width of confidence interval was zero; this is fixed
398 |     - predict after survey estimation with subpop() returned missing in observations
399 |       outside subpop(); the IFs for these observations are now set to 0
400 |     - revised code of some IFs to avoid double work; affected functions are
401 |       dstat_density_IF(), dstat_cdf_IF(), dstat_sum_hist(), dstat_sum_cdf(),
402 |       dstat_sum_cdfm(), dstat_sum_freq()
403 |     - now using pstyle(p#line) instead of pstyle(p#) in graphs if appropriate
404 |     - no longer using mm_repeat(); using J() instead
405 | 
406 |     27nov2020 (version 1.0.4)
407 |     - "version, user" issue now finally fixed (hopefully); the issue was related
408 |       to -set dp comma-
409 | 
410 |     27nov2020 (version 1.0.3)
411 |     - yet another try to fix the "version, user" issue
412 | 
413 |     27nov2020 (version 1.0.2)
414 |     - graph option -merge- added
415 |     - added code to circumvent the "version, user" error that appears to occur
416 |       in some variants of Stata installations
417 | 
418 |     24nov2020 (version 1.0.1)
419 |     - issues encountered with regexr() in Stata 14; no longer using regexr()
420 |     - fixed another awkward Stata 14 issue
421 | 
422 |     24nov2020 (version 1.0.0):
423 |     - dstat released on GitHub
424 | 
425 | 


--------------------------------------------------------------------------------
/dstat.pkg:
--------------------------------------------------------------------------------
 1 | v 3
 2 | d dstat: Stata module to compute summary statistics and distribution functions including standard errors and optional covariate balancing
 3 | d 
 4 | d Author: Ben Jann, University of Bern, ben.jann@unibe.ch
 5 | d
 6 | d Distribution-Date: 20250604
 7 | f dstat.ado
 8 | f dstat.sthlp
 9 | f dstat_svyr.ado
10 | 


--------------------------------------------------------------------------------
/dstat.sthlp:
--------------------------------------------------------------------------------
   1 | {smcl}
   2 | {* 25apr2025}{...}
   3 | {viewerjumpto "Syntax" "dstat##syntax"}{...}
   4 | {viewerjumpto "Description" "dstat##description"}{...}
   5 | {viewerjumpto "Summary statistics" "dstat##stats"}{...}
   6 | {viewerjumpto "Options" "dstat##options"}{...}
   7 | {viewerjumpto "Examples" "dstat##examples"}{...}
   8 | {viewerjumpto "Methods and formulas" "dstat##methods"}{...}
   9 | {viewerjumpto "Saved results" "dstat##saved_results"}{...}
  10 | {viewerjumpto "References" "dstat##references"}{...}
  11 | {hline}
  12 | help for {hi:dstat}{...}
  13 | {right:{browse "http://github.com/benjann/dstat/"}}
  14 | {hline}
  15 | 
  16 | {title:Title}
  17 | 
  18 | {pstd}{hi:dstat} {hline 2} Summary statistics and distribution functions
  19 | 
  20 | 
  21 | {marker syntax}{...}
  22 | {title:Syntax}
  23 | 
  24 | {pstd}
  25 |     Estimation
  26 | 
  27 | {pmore}
  28 |     Summary statistics
  29 | 
  30 | {p 12 17 2}
  31 | {cmd:dstat} [{cmdab:su:mmarize}] [{cmd:(}{it:{help dstat##statistics:stats}}{cmd:)}] {varlist}
  32 |     [ {cmd:(}{it:{help dstat##statistics:stats}}{cmd:)} {varlist} {it:...} ]
  33 |     {ifin} {weight} [{cmd:,}  {help dstat##opts:{it:options}} ]
  34 | 
  35 | {pmore}
  36 |     Pairwise associations (wrapper for {cmd:dstat summarize})
  37 | 
  38 | {p 12 17 2}
  39 | {cmd:dstat} {cmdab:pw} {varlist} {ifin} {weight} [{cmd:,}
  40 |     {cmdab:s:tatistic}{cmd:(}{help dstat##pw:{it:stat}}{cmd:)}
  41 |     {help dstat##opts:{it:options}} ]
  42 | 
  43 | {pmore}
  44 |     Distribution functions
  45 | 
  46 | {p 12 17 2}
  47 | {cmd:dstat} {it:subcmd} {varlist} {ifin} {weight} [{cmd:,}  {help dstat##opts:{it:options}} ]
  48 | 
  49 | {pmore2}
  50 |     where {it:subcmd} is
  51 | 
  52 | {p2colset 15 28 30 2}{...}
  53 | {p2col:{opt d:ensity}}density function{p_end}
  54 | {p2col:{opt pdf}}same as {cmd:density}{p_end}
  55 | {p2col:{opt h:istogram}}histogram{p_end}
  56 | {p2col:{opt p:roportion}}probability distribution{p_end}
  57 | {p2col:{opt freq:uency}}same as {cmd:proportion} with option {cmd:frequency}{p_end}
  58 | {p2col:{opt c:df}}cumulative distribution function{p_end}
  59 | {p2col:{opt cc:df}}complementary CDF/survival function{p_end}
  60 | {p2col:{opt q:uantile}}quantile function{p_end}
  61 | {p2col:{opt l:orenz}}Lorenz curve{p_end}
  62 | {p2col:{opt sh:are}}percentile shares{p_end}
  63 | {p2col:{opt tip}}TIP curve{p_end}
  64 | 
  65 | {pmore}
  66 |     {it:varlist} may contain factor variables (in most cases; an exception is {cmd:dstat pw}); see {help fvvarlist}.
  67 |     {p_end}
  68 | {pmore}
  69 |     {cmd:fweight}s, {cmd:pweight}s, and {cmd:iweight}s are allowed; see {help weight}.
  70 | 
  71 | {pstd}
  72 |     Postestimation
  73 | 
  74 | {pmore}
  75 |     Replay results
  76 | 
  77 | {p 12 17 2}
  78 | {cmd:dstat} [{cmd:,} {help dstat##repopts:{it:reporting_options}} ]
  79 | 
  80 | {pmore}
  81 |     Draw graph
  82 | 
  83 | {p 12 17 2}
  84 | {cmd:dstat} {cmdab:gr:aph}
  85 |     [{cmd:,} {help dstat##graph_opts:{it:graph_options}} ]
  86 | 
  87 | {pmore}
  88 |     Obtain (recentered) influence functions
  89 | 
  90 | {p 12 17 2}
  91 |     {cmd:predict} {c -(}{help newvarlist##stub*:{it:stub}}{cmd:*} |
  92 |         {it:{help newvar:newvar1}} {it:{help newvar:newvar2}} {cmd:...}{c )-} {ifin}
  93 |         [{cmd:,} {it:{help dstat##predict_opts:predict_options}} ]
  94 | 
  95 | 
  96 | {synoptset 26 tabbed}{...}
  97 | {marker opts}{col 5}{help dstat##options:{it:options}}{col 33}Description
  98 | {synoptline}
  99 | {syntab:{help dstat##mainopts:Main}}
 100 | {synopt:{opt nocase:wise}}do not perform casewise deletion of observations
 101 |     {p_end}
 102 | {synopt:{cmdab:o:ver(}{help varname:{it:overvar}}[{cmd:,} {it:opts}]{cmd:)}}compute
 103 |     results for subpopulations defined by {it:overvar}; not allowed with {cmd:dstat pw}
 104 |     {p_end}
 105 | {synopt:{opt tot:al}}include results for total population
 106 |     {p_end}
 107 | {synopt:{cmdab:bal:ance(}{help dstat##balance:{it:spec}}{cmd:)}}balance
 108 |     covariates using reweighting; requires {cmd:over()}
 109 |     {p_end}
 110 | {synopt:{help dstat##repopts:{it:reporting_options}}}reporting options
 111 |     {p_end}
 112 | {synopt:{opt noval:ues}}do not use values as coefficient names
 113 |     {p_end}
 114 | {synopt:{opth vf:ormat(fmt)}}format for coefficient name values
 115 |     {p_end}
 116 | 
 117 | {syntab:{help dstat##vce:SE/VCE}}
 118 | {synopt:{cmd:vce(}{it:vcetype}{cmd:)}}variance estimation method;
 119 |     {it:vcetype} may be {cmd:none} (skip variance estimation),
 120 |     {cmdab:a:nalytic}, {cmdab:cl:uster} {it:clustvar}, {cmdab:svy}, {cmdab:boot:strap},
 121 |     or {cmdab:jack:knife}
 122 |     {p_end}
 123 | {synopt:{cmd:nose}}alias for {cmd:vce(none)}
 124 |     {p_end}
 125 | {synopt:[{cmd:no}]{cmd:cov}}whether to store the full variance matrix or only the
 126 |     standard errors
 127 |     {p_end}
 128 | {synopt:{opt nobwfix:ed}}do not keep density bandwidth fixed across replications
 129 |     {p_end}
 130 | {synopt:{cmdab:g:enerate(}{it:names}[{cmd:,} {it:opts}]{cmd:)}}store influence functions
 131 |     {p_end}
 132 | {synopt:{cmd:rif(}{it:names}[{cmd:,} {it:opts}]{cmd:)}}store recentered influence functions
 133 |     {p_end}
 134 | {synopt:{opt r:eplace}}allow replacing existing variables
 135 |     {p_end}
 136 | 
 137 | {syntab:{help dstat##quant:Quantile/density settings}}
 138 | {synopt:{opt qdef(#)}}quantile definition; # in {c -(}0,...,11{c )-}
 139 |     {p_end}
 140 | {synopt:{opt hdq:uantile}}synonym for {cmd:qdef(10)} (Harrell-Davis quantiles)
 141 |     {p_end}
 142 | {synopt:{opt hdt:rim}[{cmd:(}{it:width}{cmd:)}]}apply trimming to the Harrell-Davis estimator
 143 |     {p_end}
 144 | {synopt:{opt mq:uantile}}synonym for {cmd:qdef(11)} (mid-quantiles)
 145 |     {p_end}
 146 | {synopt:{opt mqopt:s(options)}}options for mid-quantiles
 147 |     {p_end}
 148 | {synopt:{it:{help dstat##densopts:density_options}}}details of density estimation
 149 |     {p_end}
 150 | 
 151 | {syntab:{help dstat##sum:Subcommand {bf:summarize}}}
 152 | {synopt:{opt relax}}compute a statistic even if observations are out of support
 153 |     {p_end}
 154 | {synopt:{opth by(varname)}}default secondary or group variable (for association, concentration, and inequality decomposition measures)
 155 |     {p_end}
 156 | {synopt:{opt pl:ine(#|varname)}}default poverty line
 157 |     {p_end}
 158 | {synopt:{opt pstr:ong}}use "strong" poverty definition
 159 |     {p_end}
 160 | 
 161 | {syntab:{help dstat##pw:Subcommand {bf:pw}}}
 162 | {synopt:{opt s:tatistic(stat)}}type of association measure; default is {cmd:statistic(corr)}
 163 |     {p_end}
 164 | {synopt:{opt lo:wer}}lower-triangle elements only
 165 |     {p_end}
 166 | {synopt:{opt up:per}}upper-triangle elements only
 167 |     {p_end}
 168 | {synopt:{opt diag:onal}}include diagonal elements
 169 |     {p_end}
 170 | 
 171 | {syntab:{help dstat##density:Subcommand {bf:density}}}
 172 | {synopt:{opt n(#)}}size of evaluation grid; default is {cmd:n(99)}
 173 |     {p_end}
 174 | {synopt:{opt com:mon}}use common evaluation points across subpopulations
 175 |     {p_end}
 176 | {synopt:[{cmd:l}|{cmd:r}]{cmd:tight}}use tight evaluation grid
 177 |     {p_end}
 178 | {synopt:{opt range(a b)}}use grid from {it:a} to {it:b}; default is to determine
 179 |     grid range from data
 180 |     {p_end}
 181 | {synopt:{opth at(numlist)}}custom grid of evaluation points
 182 |     {p_end}
 183 | {synopt:{cmdab:unc:onditional}}rescale results by relative size of subpopulation
 184 |     {p_end}
 185 | 
 186 | {syntab:{help dstat##hist:Subcommand {bf:histogram}}}
 187 | {synopt:{opt prop:ortion}}estimate proportions instead of densities
 188 |     {p_end}
 189 | {synopt:{opt per:cent}}estimate percent instead of densities
 190 |     {p_end}
 191 | {synopt:{opt freq:uency}}estimate frequencies instead of densities
 192 |     {p_end}
 193 | {synopt:{cmd:n(}{cmd:#}|{it:{help dstat##hist:method}}{cmd:)}}number of
 194 |     histogram bins; default is {cmd:n(sqrt)}
 195 |     {p_end}
 196 | {synopt:{cmd:ep}}use equal probability bins instead of equal width bins
 197 |     {p_end}
 198 | {synopt:{opt com:mon}}use common bin definitions across subpopulations
 199 |     {p_end}
 200 | {synopt:{opth at(numlist)}}custom bin definitions
 201 |     {p_end}
 202 | {synopt:{opt disc:rete}}treat data as discrete (calls {cmd:dstat proportion})
 203 |     {p_end}
 204 | {synopt:{cmdab:unc:onditional}}rescale results by relative size of subpopulation
 205 |     {p_end}
 206 | 
 207 | {syntab:{help dstat##prop:Subcommand {bf:proportion}}}
 208 | {synopt:{opt per:cent}}estimate percent instead of probabilities
 209 |     {p_end}
 210 | {synopt:{opt freq:uency}}estimate frequencies instead of probabilities
 211 |     {p_end}
 212 | {synopt:{opth at(numlist)}}custom list of levels for which to estimate proportions
 213 |     {p_end}
 214 | {synopt:{opt nocat:egorical}}allow variables that do not comply with Stata's rules
 215 |     for factor variables
 216 |     {p_end}
 217 | {synopt:{cmdab:unc:onditional}}rescale results by relative size of subpopulation
 218 |     {p_end}
 219 | 
 220 | {syntab:{help dstat##cdf:Subcommands {bf:cdf} and {bf:ccdf}}}
 221 | {synopt:{opt per:cent}}estimate percent instead of probabilities
 222 |     {p_end}
 223 | {synopt:{opt freq:uency}}estimate frequencies instead of probabilities
 224 |     {p_end}
 225 | {synopt:{opt mid}}apply midpoint adjustment
 226 |     {p_end}
 227 | {synopt:{opt fl:oor}}use "lower-than" definition
 228 |     {p_end}
 229 | {synopt:{opt n(#)}}size of evaluation grid; default is {cmd:n(99)}
 230 |     {p_end}
 231 | {synopt:{opt com:mon}}use common evaluation points across subpopulations
 232 |     {p_end}
 233 | {synopt:{opt range(a b)}}use grid from {it:a} to {it:b}; default is to determine
 234 |     grid range from data
 235 |     {p_end}
 236 | {synopt:{opth at(numlist)}}custom grid of evaluation points
 237 |     {p_end}
 238 | {synopt:{opt disc:rete}}treat data as discrete
 239 |     {p_end}
 240 | {synopt:{opt ip:olate}}obtain CDF by linear interpolation
 241 |     {p_end}
 242 | {synopt:{cmdab:unc:onditional}}rescale results by relative size of subpopulation
 243 |     {p_end}
 244 | 
 245 | {syntab:{help dstat##quantile:Subcommand {bf:quantile}}}
 246 | {synopt:{opt n(#)}}size of evaluation grid; default is {cmd:n(99)}
 247 |     {p_end}
 248 | {synopt:{opt range(a b)}}use grid within range from {it:a} to {it:b}, {it:a} and {it:b}
 249 |     in [0,1]; default is {cmd:range(0 1)}
 250 |     {p_end}
 251 | {synopt:{opth at(numlist)}}custom grid of evaluation points
 252 |     {p_end}
 253 | 
 254 | {syntab:{help dstat##lorenz:Subcommand {bf:lorenz}}}
 255 | {synopt:{opt per:cent}}report percent instead of proportions
 256 |     {p_end}
 257 | {synopt:{opt general:ized}}estimate generalized Lorenz curve
 258 |     {p_end}
 259 | {synopt:{opt sum}}estimate total (unnormalized) Lorenz curve
 260 |     {p_end}
 261 | {synopt:{opt gap}}estimate equality gap curve
 262 |     {p_end}
 263 | {synopt:{opt abs:olute}}estimate absolute Lorenz curve
 264 |     {p_end}
 265 | {synopt:{opth by(varname)}}estimate concentration curve with respect to specified variable
 266 |     {p_end}
 267 | {synopt:{opt n(#)}}size of evaluation grid; default is {cmd:n(101)}
 268 |     {p_end}
 269 | {synopt:{opt range(a b)}}use grid from {it:a} to {it:b}, {it:a} and {it:b}
 270 |     in [0,1]; default is {cmd:range(0 1)}
 271 |     {p_end}
 272 | {synopt:{opth at(numlist)}}custom grid of evaluation points
 273 |     {p_end}
 274 | 
 275 | {syntab:{help dstat##share:Subcommand {bf:share}}}
 276 | {synopt:{opt prop:ortion}}estimate proportions instead of densities
 277 |     {p_end}
 278 | {synopt:{opt per:cent}}estimate percent instead of densities
 279 |     {p_end}
 280 | {synopt:{opt general:ized}}estimate generalized shares instead of densities
 281 |     {p_end}
 282 | {synopt:{opt sum}}estimate totals instead of densities
 283 |     {p_end}
 284 | {synopt:{opt ave:rage}}estimate averages instead of densities
 285 |     {p_end}
 286 | {synopt:{opth by(varname)}}estimate concentration shares with respect to specified variable
 287 |     {p_end}
 288 | {synopt:{opt n(#)}}number of bins; default is {cmd:n(20)}
 289 |     {p_end}
 290 | {synopt:{opth at(numlist)}}custom bin definitions
 291 |     {p_end}
 292 | 
 293 | {syntab:{help dstat##tip:Subcommand {bf:tip}}}
 294 | {synopt:{opt pl:ine(#|varname)}}poverty line (required)
 295 |     {p_end}
 296 | {synopt:{opt abs:olute}}estimate absolute TIP curve
 297 |     {p_end}
 298 | {synopt:{opt pstr:ong}}use "strong" poverty definition
 299 |     {p_end}
 300 | {synopt:{opt n(#)}}size of evaluation grid; default is {cmd:n(101)}
 301 |     {p_end}
 302 | {synopt:{opt range(a b)}}use grid from {it:a} to {it:b}, {it:a} and {it:b}
 303 |     in [0,1]; default is {cmd:range(0 1)}
 304 |     {p_end}
 305 | {synopt:{opth at(numlist)}}custom grid of evaluation points
 306 |     {p_end}
 307 | {synoptline}
 308 | {pstd}
 309 | 
 310 | {marker graph_opts}{col 5}{help dstat##graph_options:{it:graph_options}}{col 33}Description
 311 | {synoptline}
 312 | {synopt:{cmdab:mer:ge}}merge results into a single subgraph
 313 |     {p_end}
 314 | {synopt:{cmdab:overl:ay}}synonym for {cmd:merge}
 315 |     {p_end}
 316 | {synopt:{cmd:flip}}change how results are allocated to plots and subgraphs
 317 |     {p_end}
 318 | {synopt:[{ul:{cmd:g}}|{ul:{cmd:p}}]{opt sel:ect(spec)}}select and order plots and subgraphs
 319 |     {p_end}
 320 | {synopt:{opt cref}}include results from the reference (sub)population
 321 |     {p_end}
 322 | {synopt:{cmdab:bys:tats}[{cmd:(}{it:arg}{cmd:)}]}group results by statistics; only relevant for {cmd:dstat summarize}
 323 |     {p_end}
 324 | {synopt:[{cmd:no}]{cmd:step}}do/do not use step function; only relevant for {cmd:dstat cdf}
 325 |     {p_end}
 326 | {synopt:{cmdab:noref:line}}suppress equality line; only relevant for {cmd:dstat lorenz}
 327 |     {p_end}
 328 | {synopt:{opth ref:line(line_options)}}affect rendition of equality line; only relevant for {cmd:dstat lorenz}
 329 |     {p_end}
 330 | {synopt:{help dstat##coefplot:{it:coefplot_options}}}options to be passed through to {helpb coefplot}
 331 |     {p_end}
 332 | {synoptline}
 333 | 
 334 | {marker predict_opts}{col 5}{help dstat##predict_options:{it:predict_options}}{col 33}Description
 335 | {synoptline}
 336 | {synopt:{opt rif}}store recentered influence functions
 337 |     {p_end}
 338 | {synopt:{cmdab:sca:ling(}{cmdab:t:otal}|{cmdab:m:ean}{cmd:)}}set the scaling of the influence functions
 339 |     {p_end}
 340 | {synopt:{opt com:pact}}store influence functions in compact form; not allowed with {cmd:balance()} or {cmd:unconditional}
 341 |     {p_end}
 342 | {synopt:{opt qui:etly}}do not display list of generated variables
 343 |     {p_end}
 344 | {synoptline}
 345 | 
 346 | 
 347 | {marker description}{...}
 348 | {title:Description}
 349 | 
 350 | {pstd}
 351 |     {cmd:dstat} provides a unified framework for the analysis of (univariate)
 352 |     distributions. It supports the estimation of various distribution
 353 |     functions (such as the PDF and CDF, quantiles, probabilities and
 354 |     frequencies, histograms, Lorenz and concentration curves) and a large
 355 |     collection of summary statistics (classical and robust measures of
 356 |     location, scale, skewness, and kurtosis, measures of inequality,
 357 |     concentration, and poverty).
 358 | 
 359 | {pstd}
 360 |     {cmd:dstat} is an estimation command. Its results are stored
 361 |     in {cmd:e()} and standard errors are provided for all
 362 |     estimates. Variance-covariance estimation is based on influence functions
 363 |     (Hampel 1974, Deville 1999) and fully supports complex survey data; see the {helpb dstat##vce:vce()}
 364 |     option. Influence functions or recentered influence functions (RIFs) can be
 365 |     generated for all statistics supported by {cmd:dstat}, either using the
 366 |     {helpb dstat##generate:generate()} or {cmd:rif()} option, or
 367 |     by applying {cmd:predict} after estimation.
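
{pstd}
As a rough illustration of influence-function-based variance estimation, the
Python sketch below uses the textbook influence function of the unweighted
mean, IF({it:x}) = {it:x} - mean, and estimates the sampling variance as
sum(IF^2)/({it:n}({it:n}-1)); under these assumptions the standard error equals
the familiar sd/sqrt({it:n}). The RIF adds the statistic back to the IF, so for
the mean the RIF is simply the data. Function names are hypothetical; this is
not {cmd:dstat}'s implementation, which covers many more statistics as well as
complex survey designs.

```python
import math
import statistics


def mean_influence(x):
    """Influence function of the unweighted mean: IF_i = x_i - mean."""
    m = statistics.fmean(x)
    return [xi - m for xi in x]


def mean_rif(x):
    """Recentered influence function: RIF_i = IF_i + statistic."""
    m = statistics.fmean(x)
    return [v + m for v in mean_influence(x)]


def se_from_influence(infl):
    """Standard error of the statistic from its influence function values,
    using sum(IF^2) / (n * (n - 1)) as the variance estimate."""
    n = len(infl)
    return math.sqrt(sum(v * v for v in infl) / (n * (n - 1)))


data = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]
print(se_from_influence(mean_influence(data)))
print(statistics.stdev(data) / math.sqrt(len(data)))  # should agree
```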
 368 | 
 369 | {pstd}
 370 |     {cmd:dstat} supports simultaneous estimation for multiple variables and
 371 |     multiple subpopulations, and allows for covariate balancing or
 372 |     standardization between subpopulations based on inverse probability
 373 |     weighting (IPW) or entropy balancing. See the {helpb dstat##over:over()}
 374 |     and {helpb dstat##balance:balance()} options. Standard errors will take
 375 |     account of the uncertainty induced by the balancing.
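
{pstd}
The reweighting idea behind IPW balancing can be sketched in Python for the
simplest case of a single categorical covariate. Here the propensity score is
the share of the target group (coded 1) within each covariate cell, and
observations in the other group (coded 0) receive weight p/(1-p), which makes
their weighted covariate distribution match that of the target group. This
saturated, cell-based propensity model is an assumption of the sketch; with
continuous covariates a parametric model (such as a logit) would be needed, and
the sketch does not reproduce how {cmd:dstat} fits the propensity score or
adjusts the standard errors. All names are hypothetical.

```python
from collections import Counter


def ipw_weights(group, covar):
    """Weights that reweight group 0 to the covariate distribution of group 1.

    Propensity score p(c) = share of group-1 observations in covariate cell c
    (a saturated model). Group 1 keeps weight 1; group 0 gets p / (1 - p).
    A group-0 observation never sits in a cell with p = 1, so no division by 0.
    """
    n_cell = Counter(covar)
    n_cell_g1 = Counter(c for g, c in zip(group, covar) if g == 1)
    weights = []
    for g, c in zip(group, covar):
        if g == 1:
            weights.append(1.0)
        else:
            p = n_cell_g1[c] / n_cell[c]
            weights.append(p / (1.0 - p))
    return weights


def weighted_share(values, weights, level):
    """Weighted proportion of observations whose value equals level."""
    return sum(w for v, w in zip(values, weights) if v == level) / sum(weights)


group = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
covar = ["a", "a", "a", "b", "a", "b", "b", "b", "b", "b"]
w = ipw_weights(group, covar)
g1 = [i for i, g in enumerate(group) if g == 1]
g0 = [i for i, g in enumerate(group) if g == 0]
print(weighted_share([covar[i] for i in g1], [w[i] for i in g1], "a"))
print(weighted_share([covar[i] for i in g0], [w[i] for i in g0], "a"))
```

{pstd}
Unweighted, only 1 of the 6 group-0 observations falls into cell "a"; after
reweighting, the weighted share of that cell in group 0 is 0.75, the same as in
group 1.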
 376 | 
 377 | {pstd}
 378 |     Basic functionality for graphing results is provided through the
 379 |     {cmd:graph()} option or by applying the command {cmd:dstat graph}
 380 |     after estimation. {cmd:dstat} employs {helpb coefplot} for
 381 |     graphing, which needs to be installed on the system; see
 382 |     {net "describe coefplot, from(http://fmwww.bc.edu/repec/bocode/c/)":{bf:ssc describe coefplot}}. Furthermore,
 383 |     {cmd:dstat} requires the {helpb moremata} package; see
 384 |     {net "describe moremata, from(http://fmwww.bc.edu/repec/bocode/m/)":{bf:ssc describe moremata}}.
 385 | 
 386 | 
 387 | {marker statistics}{...}
 388 | {title:Summary statistics}
 389 | 
 390 | {pstd}
 391 |     The syntax for specifying summary statistics and variables with
 392 |     {cmd:dstat summarize} is
 393 | 
 394 |         [ {cmd:(}{it:{help dstat##stats:stats}}{cmd:)} ] {varlist} [ {cmd:(}{it:{help dstat##stats:stats}}{cmd:)} {varlist} {it:...} ]
 395 | 
 396 | {pstd}
 397 |     where {it:stats} is a space-separated list of statistics as documented below
 398 |     and {it:varlist} is a list of numeric variables, possibly including factor
 399 |     variables (see {help fvvarlist}). The default statistic is {cmd:mean}. Statistics
 400 |     and variables may be repeated. {cmd:dstat} will rearrange the statistics by variables
 401 |     and remove duplicate combinations.
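
{pstd}
The expansion of a {it:stats}/{it:varlist} specification into unique
variable-statistic combinations can be sketched in Python. The input format
(a list of ({it:stats}, {it:varlist}) pairs), the grouping of variables in order
of first appearance, and the ordering of statistics within a variable are
assumptions made for illustration; the order {cmd:dstat} actually uses may
differ.

```python
def expand_spec(spec):
    """Turn [(stats, varlist), ...] into unique (variable, statistic) pairs,
    grouped by variable. An empty stats list means the default, "mean"."""
    pairs = []
    for stats, variables in spec:
        for var in variables:
            for stat in (stats or ["mean"]):
                if (var, stat) not in pairs:  # drop duplicate combinations
                    pairs.append((var, stat))
    # Assumed ordering: variables by first appearance; sorted() is stable,
    # so statistics keep their order of first appearance within a variable.
    first_seen = []
    for var, _ in pairs:
        if var not in first_seen:
            first_seen.append(var)
    return sorted(pairs, key=lambda pair: first_seen.index(pair[0]))


# roughly corresponds to: (mean) price mpg (sd) price
print(expand_spec([(["mean"], ["price", "mpg"]), (["sd"], ["price"])]))
# [('price', 'mean'), ('price', 'sd'), ('mpg', 'mean')]
```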
 402 | 
 403 | {pstd}
 404 |     The names of the statistics can be abbreviated and typed in lowercase or
 405 |     uppercase letters (the names will be used as typed in the output; repeated
 406 |     statistics with differently typed names will be treated as different
 407 |     statistics). If an abbreviation is ambiguous, the first matching statistic in
 408 |     the sorted list of supported statistics will be used (with the following
 409 |     exceptions: {cmd:q} or {cmd:p} can be used for {cmd:quantile}, {cmd:d}
 410 |     for {cmd:density}, and {cmd:f} for {cmd:freq}). For example, to obtain the
 411 |     geometric mean, you could type {cmd:gmean}, {cmd:gm}, {cmd:GM}, {cmd:gMean},
 412 |     {cmd:gme}, or any other variant including at least the first two letters.
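
{pstd}
The resolution rule can be sketched in Python as follows. {cmd:SUPPORTED} is a
small hypothetical subset of the statistics documented below, so it accepts
different abbreviations than the full list would; the sketch only illustrates
the documented rule (case-insensitive prefix matching, first match in the
sorted list, plus the listed exceptions) and is not {cmd:dstat}'s code. In this
subset, without the exceptions, {cmd:q} would resolve to {cmd:qn} and {cmd:f}
to {cmd:fgt}.

```python
# Hypothetical subset of the statistic names documented below.
SUPPORTED = sorted([
    "density", "fgt", "freq", "gamma", "ge", "gimp", "gini", "gmean",
    "mad", "mean", "median", "qn", "qratio", "qskew", "quantile", "qw",
])

# Documented exceptions to the "first match in the sorted list" rule.
EXCEPTIONS = [("q", "quantile"), ("p", "quantile"), ("d", "density"), ("f", "freq")]


def resolve(name):
    """Map a possibly abbreviated, mixed-case statistic name to its full name."""
    key = name.lower()
    for abbrev, full in EXCEPTIONS:
        if key == abbrev:
            return full
    # A full name sorts before any longer name it prefixes, so an exact
    # name is always its own first match.
    for stat in SUPPORTED:
        if stat.startswith(key):
            return stat
    raise ValueError("unknown statistic: " + name)


for typed in ["gm", "GM", "gMean", "gme", "q", "f"]:
    print(typed, "->", resolve(typed))
```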
 413 | 
 414 | {pstd}
 415 |     Many of the statistics allow or require one or several arguments in
 416 |     parentheses. Parentheses can be omitted if there is only a single numeric
 417 |     argument and no space is included between the name and the argument. For
 418 |     example, to obtain the 5% trimmed mean you could type {cmd:trim(5)} or
 419 |     simply {cmd:trim5} (omitting parentheses also works with numbers that
 420 |     contain decimal places, that is, you could type {cmd:trim5.5} to obtain the
 421 |     5.5% trimmed mean; in this case, however, parentheses will be added in the
 422 |     output).
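
{pstd}
The shorthand can be sketched in Python with a regular expression that splits a
token into a name and a trailing number. The pattern assumes that statistic
names consist of letters and underscores only, and the sketch handles only the
parenthesis-free form; it illustrates the rule described above and is not
{cmd:dstat}'s parser.

```python
import re

# A name made of letters/underscores, directly followed by a number
# such as 5 or 5.5 (assumed token shape for this sketch).
SHORTHAND = re.compile(r"^([A-Za-z_]+)(\d+(?:\.\d+)?)$")


def parse_stat(token):
    """Split a token such as "trim5.5" into (name, argument, output label).

    Tokens without a trailing number are returned unchanged with argument None.
    Integer arguments keep the typed label; decimal arguments get parentheses.
    """
    m = SHORTHAND.match(token)
    if m is None:
        return token, None, token
    name, arg = m.group(1), m.group(2)
    label = token if "." not in arg else name + "(" + arg + ")"
    return name, arg, label


for typed in ["trim5", "trim5.5", "q50", "mean"]:
    print(parse_stat(typed))
```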
 423 | 
 424 | {synoptset 27 tabbed}{...}
 425 | {marker stats}{col 5}{it:stats}{col 33}Description
 426 | {synoptline}
 427 | {syntab:Points in the distribution}
 428 | {synopt:{opt quantile}{cmd:(}{it:p}{cmd:)}}{it:p}/100 quantile; {it:p} in [0,100]
 429 |     {p_end}
 430 | {synopt:{opt p}{cmd:(}{it:p}{cmd:)}}same as {cmd:quantile()}
 431 |     {p_end}
 432 | {synopt:{opt hdquantile}{cmd:(}{it:p}{cmd:)}}{it:p}/100 Harrell-Davis quantile (Harrell and Davis 1982); {it:p} in [0,100]
 433 |     {p_end}
 434 | {synopt:{opt mquantile}{cmd:(}{it:p}{cmd:)}}{it:p}/100 mid-quantile; {it:p} in [0,100]
 435 |     {p_end}
 436 | {synopt:{opt density}{cmd:(}{it:x}{cmd:)}}kernel density at value {it:x}
 437 |     {p_end}
 438 | {synopt:{opt hist}{cmd:(}{it:x1}{cmd:,}{it:x2}{cmd:)}}histogram density of data within ({it:x1},{it:x2}]
 439 |     {p_end}
 440 | {synopt:[*]{opt cdf}{cmd:(}{it:x}{cmd:)}}cumulative distribution function (CDF) at value {it:x}; prefix {it:*} is empty for default,
 441 |     {cmd:m} for mid-adjusted CDF, {cmd:f} for floor CDF
 442 |     {p_end}
 443 | {synopt:[*]{opt ccdf}{cmd:(}{it:x}{cmd:)}}complementary CDF at value {it:x}; prefix {it:*} is empty for default,
 444 |     {cmd:m} for mid-adjusted CCDF, {cmd:f} for floor CCDF
 445 |     {p_end}
 446 | {synopt:{opt prop}{cmd:(}{it:x1}[{cmd:,}{it:x2}]{cmd:)}}proportion of data equal to {it:x1} or within [{it:x1},{it:x2}]
 447 |     {p_end}
 448 | {synopt:{opt pct}{cmd:(}{it:x1}[{cmd:,}{it:x2}]{cmd:)}}percent of data equal to {it:x1} or within [{it:x1},{it:x2}]
 449 |     {p_end}
 450 | {synopt:{opt freq}[{cmd:(}{it:x1}[{cmd:,}{it:x2}]{cmd:)}]}overall frequency, or frequency of data equal to {it:x1} or within [{it:x1},{it:x2}]
 451 |     {p_end}
 452 | {synopt:{opt count}[{cmd:(}{it:x1}[{cmd:,}{it:x2}]{cmd:)}]}same as {cmd:freq()}
 453 |     {p_end}
 454 | {synopt:{opt total}[{cmd:(}{it:x1}[{cmd:,}{it:x2}]{cmd:)}]}overall total, or total of data equal to {it:x1} or within [{it:x1},{it:x2}]
 455 |     {p_end}
 456 | {synopt:{opt min}}observed minimum (standard error set to zero)
 457 |     {p_end}
 458 | {synopt:{opt max}}observed maximum (standard error set to zero)
 459 |     {p_end}
 460 | {synopt:{opt range}}{cmd:max}-{cmd:min} (standard error set to zero)
 461 |     {p_end}
 462 | {synopt:{opt midrange}}({cmd:min}+{cmd:max})/2 (standard error set to zero)
 463 |     {p_end}
 464 | 
 465 | {syntab:Location measures}
 466 | {synopt:{opt mean}}arithmetic mean
 467 |     {p_end}
 468 | {synopt:{opt gmean}}geometric mean (data must be positive)
 469 |     {p_end}
 470 | {synopt:{opt hmean}}harmonic mean (data must be positive)
 471 |     {p_end}
 472 | {synopt:{cmd:trim(}{it:p1}[{cmd:,}{it:p2}]{cmd:)}}trimmed mean with
 473 |     {it:p1}/100 lower trimming and {it:p2}/100 upper trimming; {it:p1} and {it:p2} in
 474 |     [0,50]; {it:p2}={it:p1} if omitted; default is {it:p1}={it:p2}=25
 475 |     {p_end}
 476 | {synopt:{cmd:winsor(}{it:p1}[{cmd:,}{it:p2}]{cmd:)}}winsorized mean with
 477 |     {it:p1}/100 lower winsorizing and {it:p2}/100 upper winsorizing; {it:p1} and {it:p2} in
 478 |     [0,50]; {it:p2}={it:p1} if omitted; default is {it:p1}={it:p2}=25
 479 |     {p_end}
 480 | {synopt:{opt median}}median; equal to {cmd:q50}
 481 |     {p_end}
 482 | {synopt:{opt huber}[{cmd:(}{it:p}{cmd:)}]}Huber M estimate with Gaussian efficiency {it:p};
 483 |     {it:p} in [63.7,99.9]; default is {it:p}=95
 484 |     {p_end}
 485 | {synopt:{opt biweight}[{cmd:(}{it:p}{cmd:)}]}biweight M estimate with Gaussian
 486 |     efficiency {it:p}; {it:p} in [.01,99.9]; default is {it:p}=95
 487 |     {p_end}
 488 | {synopt:{opt hl}}Hodges-Lehmann location measure (Hodges and Lehmann 1963)
 489 |     {p_end}
 490 | 
 491 | {syntab:Scale measures}
 492 | {synopt:{opt sd}[{cmd:(}{it:df}{cmd:)}]}standard deviation; {it:df} applies
 493 |     small-sample adjustment; default is {it:df}=1
 494 |     {p_end}
 495 | {synopt:{opt variance}[{cmd:(}{it:df}{cmd:)}]}variance; default is {it:df}=1
 496 |     {p_end}
 497 | {synopt:{opt mse}[{cmd:(}{it:x}[{cmd:,}{it:df}]{cmd:)}]}mean squared deviation from value
 498 |     {it:x} (mean squared error); default is {it:x}=0 and {it:df}=0
 499 |     {p_end}
 500 | {synopt:{opt rmse}[{cmd:(}{it:x}[{cmd:,}{it:df}]{cmd:)}]}root mean
 501 |     squared deviation from value {it:x}; default is {it:x}=0 and {it:df}=0
 502 |     {p_end}
 503 | {synopt:{opt iqr}[{cmd:n}][{cmd:(}{it:p1}{cmd:,}{it:p2}{cmd:)}]}interquantile range; default
 504 |     is {cmd:iqr(25,75)} (interquartile range); specify {cmd:iqrn} for
 505 |     normalized IQR, equal to 1/(invnormal({it:p2}/100) - invnormal({it:p1}/100)) * {cmd:iqr}
 506 |     {p_end}
 507 | {synopt:{opt mad}[{cmd:n}][{cmd:(}{it:l}[{cmd:,}{it:t}]{cmd:)}]}median (or mean if {it:l}!=0)
 508 |     absolute deviation from the median (or mean if {it:t}!=0); specify {cmd:madn} for
 509 |     normalized MAD, equal to 1/invnormal(0.75) * {cmd:mad} (or sqrt(pi/2) * {cmd:mad} if {it:l}!=0)
 510 |     {p_end}
 511 | {synopt:{opt mae}[{cmd:n}][{cmd:(}{it:l}[{cmd:,}{it:x}]{cmd:)}]}median (or mean if {it:l}!=0)
 512 |     absolute deviation from value {it:x}; default is {it:x}=0; specify {cmd:maen} for
 513 |     normalized MAE, equal to 1/invnormal(0.75) * {cmd:mae} (or sqrt(pi/2) * {cmd:mae} if {it:l}!=0)
 514 |     {p_end}
 515 | {synopt:{opt md}[{cmd:n}]}mean absolute pairwise difference; equal to 2 * {cmd:mean} * {cmd:gini}; specify {cmd:mdn} for
 516 |     normalized MD, equal to sqrt(pi)/2 * {cmd:md}
 517 |     {p_end}
 518 | {synopt:{opt mscale}[{cmd:(}{it:bp}{cmd:)}]}M estimate of scale with breakdown
 519 |     point {it:bp} in [1,50]; default is {it:bp}=50
 520 |     {p_end}
 521 | {synopt:{opt qn}}Qn scale coefficient (Rousseeuw and Croux 1993)
 522 |     {p_end}
 523 | 
 524 | {syntab:Skewness measures}
 525 | {synopt:{opt skewness}}skewness
 526 |     {p_end}
 527 | {synopt:{opt qskew}[{cmd:(}{it:alpha}{cmd:)}]}quantile skewness (Hinkley 1975);
 528 |     {it:alpha} in [0,50]; default is {it:alpha}=25
 529 |     {p_end}
 530 | {synopt:{opt mc}}medcouple (Brys et al. 2004)
 531 |     {p_end}
 532 | 
 533 | {syntab:Kurtosis measures}
 534 | {synopt:{opt kurtosis}}kurtosis
 535 |     {p_end}
 536 | {synopt:{opt ekurtosis}}excess kurtosis; equal to {cmd:kurtosis}-3
 537 |     {p_end}
 538 | {synopt:{opt qw}[{cmd:(}{it:alpha}{cmd:)}]}quantile tail weight; {it:alpha}
 539 |     in [0,50]; default is {it:alpha}=25
 540 |     {p_end}
 541 | {synopt:{opt lqw}[{cmd:(}{it:alpha}{cmd:)}]}left quantile tail weight; {it:alpha}
 542 |     in [0,50]; default is {it:alpha}=25
 543 |     {p_end}
 544 | {synopt:{opt rqw}[{cmd:(}{it:alpha}{cmd:)}]}right quantile tail weight;
 545 |     {it:alpha} in [0,50]; default is {it:alpha}=25
 546 |     {p_end}
 547 | {synopt:{opt lmc}}left medcouple tail weight measure (Brys et al. 2006)
 548 |     {p_end}
 549 | {synopt:{opt rmc}}right medcouple tail weight measure (Brys et al. 2006)
 550 |     {p_end}
 551 | 
 552 | {syntab:Inequality measures}
 553 | {synopt:{opt hoover}}Hoover index (Robin Hood index, Ricci-Schutz, Pietra index)
 554 |     {p_end}
 555 | {synopt:[{cmd:a}]{opt gini}[{cmd:(}{it:df}{cmd:)}]}Gini coefficient; {it:df} applies
 556 |     small-sample adjustment; default is {it:df}=0; specify {cmd:agini} for the absolute Gini coefficient
 557 |     {p_end}
 558 | {synopt:{opt mld}}mean log deviation (MLD); equal to {cmd:ge(0)}
 559 |     {p_end}
 560 | {synopt:{opt theil}}Theil index; equal to {cmd:ge(1)}
 561 |     {p_end}
 562 | {synopt:{opt ge}[{cmd:(}{it:alpha}{cmd:)}]}generalized entropy (Shorrocks 1980)
 563 |     with parameter {it:alpha}; default is {it:alpha}=1 (in which case
 564 |     {cmd:ge}={cmd:theil})
 565 |     {p_end}
 566 | {synopt:{opt atkinson}[{cmd:(}{it:epsilon}{cmd:)}]}Atkinson index with parameter
 567 |     {it:epsilon}>=0; default is {it:epsilon}=1
 568 |     {p_end}
 569 | {synopt:{opt cv}[{cmd:(}{it:df}{cmd:)}]}coefficient of variation; default is {it:df}=1;
 570 |     {cmd:cv(0)}=sqrt(2*{cmd:ge(2)})
 571 |     {p_end}
 572 | {synopt:{opt lvar}[{cmd:(}{it:df}{cmd:)}]}logarithmic variance; default is {it:df}=1
 573 |     {p_end}
 574 | {synopt:{opt vlog}[{cmd:(}{it:df}{cmd:)}]}variance of logarithm; default is {it:df}=1
 575 |     {p_end}
 576 | {synopt:{opt sdlog}[{cmd:(}{it:df}{cmd:)}]}standard deviation of logarithm; default is {it:df}=1
 577 |     {p_end}
 578 | {synopt:{opt top}[{cmd:(}{it:p}{cmd:)}]}outcome share of top {it:p} percent; default
 579 |     is {it:p}=10
 580 |     {p_end}
 581 | {synopt:{opt bottom}[{cmd:(}{it:p}{cmd:)}]}outcome share of bottom {it:p} percent;
 582 |     default is {it:p}=40
 583 |     {p_end}
 584 | {synopt:{opt mid}[{cmd:(}{it:p1}{cmd:,}{it:p2}{cmd:)}]}outcome share of the middle
 585 |     {it:p1} to {it:p2} percent; default is {it:p1}=40 and {it:p2}=90
 586 |     {p_end}
 587 | {synopt:{opt palma}}Palma ratio; equal to {cmd:top}/{cmd:bottom} or {cmd:sratio(40,90)}
 588 |     {p_end}
 589 | {synopt:{opt qratio}[{cmd:(}{it:p1}{cmd:,}{it:p2}{cmd:)}]}quantile ratio
 590 |     {cmd:q}({it:p2})/{cmd:q}({it:p1}); default is {it:p1}=10 and {it:p2}=90
 591 |     {p_end}
 592 | {synopt:{opt sratio}[{cmd:(}{it:l1}{cmd:,}{it:u1}{cmd:,}{it:l2}{cmd:,}{it:u2}{cmd:)}]}percentile
 593 |     share ratio; default is {it:l1}=0, {it:u1}=10, {it:l2}=90, {it:u2}=100; can also specify
 594 |     {cmd:sratio(}{it:u1}{cmd:,}{it:l2}{cmd:)}
 595 |     {p_end}
 596 | {synopt:[*]{cmd:lorenz}{cmd:(}{it:p}{cmd:)}}Lorenz ordinate, {it:p} in [0,100];
 597 |     prefix {it:*} is empty for default, {cmd:g} for generalized, {cmd:t} for total,
 598 |     {cmd:a} for absolute, {cmd:e} for equality gap
 599 |     {p_end}
 600 | {synopt:[*]{cmd:share}{cmd:(}{it:p1}{cmd:,}{it:p2}{cmd:)}}percentile
 601 |     share, {it:p1} and {it:p2} in [0,100]; prefix {it:*} is empty for default,
 602 |     {cmd:d} for density, {cmd:g} for generalized, {cmd:t} for total, {cmd:a} for average
 603 |     {p_end}
 604 | 
 605 | {syntab:Inequality decomposition}
 606 | {synopt:{it:d}{cmd:_gini}[{cmd:(}{it:{help varname:by}}[{cmd:,}{it:df}]{cmd:)}]}where
 607 |     {it:d} is {cmd:b} for the between-group Gini coefficient, or {cmd:gw} for the
 608 |     weighted average of group-specific Gini coefficients; {it:by} specifies the group
 609 |     variable (string allowed); default is as set by option {cmd:by()}; {it:df} applies
 610 |     small-sample adjustment; default is {it:df}=0;
 611 |     can also specify {it:d}{opt _gini(df)}
 612 |     {p_end}
 613 | {synopt:{it:d}{cmd:_mld}[{cmd:(}{it:{help varname:by}}{cmd:)}]}where {it:d} is
 614 |     {cmd:b} for the between-group MLD, {cmd:w} for the within-group MLD, or
 615 |     {cmd:gw} for the weighted average of group-specific MLDs ({cmd:gw_mld} is
 616 |     equivalent to {cmd:w_mld}); {it:by} as for {it:d}{cmd:_gini}
 617 |     {p_end}
 618 | {synopt:{it:d}{cmd:_theil}[{cmd:(}{it:{help varname:by}}{cmd:)}]}where {it:d}
 619 |     is {cmd:b} for the between-group Theil index, {cmd:w} for the
 620 |     within-group Theil index, or {cmd:gw} for the weighted average of
 621 |     group-specific Theil indices; {it:by} as for {it:d}{cmd:_gini}
 622 |     {p_end}
 623 | {synopt:{it:d}{cmd:_ge}[{cmd:(}{it:{help varname:by}}[{cmd:,}{it:alpha}]{cmd:)}]}where
 624 |     {it:d} is {cmd:b} for the between-group generalized entropy, {cmd:w} for the
 625 |     within-group generalized entropy, or {cmd:gw} for the weighted
 626 |     average of group-specific generalized entropies; {it:by} as for {it:d}{cmd:_gini};
 627 |     can also specify {it:d}{opt _ge(alpha)}
 628 |     {p_end}
 629 | {synopt:{it:d}{cmd:_vlog}[{cmd:(}{it:{help varname:by}}[{cmd:,}{it:df}]{cmd:)}]}where
 630 |     {it:d} is {cmd:b} for the between-group variance of logarithm, {cmd:w} for the
 631 |     within-group variance of logarithm, or {cmd:gw} for the weighted average of
 632 |     group-specific variances of logarithm; {it:by} as for {it:d}{cmd:_gini};
 633 |     can also specify {it:d}{opt _vlog(df)}
 634 |     {p_end}
 635 | 
 636 | {syntab:Concentration measures}
 637 | {synopt:{opt gci}[{cmd:(}{it:{help varname:by}}[{cmd:,}{it:df}]{cmd:)}]}Gini concentration index;
 638 |     {it:by} specifies the sort variable; default is as set by option {cmd:by()};
 639 |     {it:df} applies small-sample adjustment; default is {it:df}=0; can also
 640 |     specify {opt gci(df)}
 641 |     {p_end}
 642 | {synopt:{opt aci}[{cmd:(}{it:{help varname:by}}[{cmd:,}{it:df}]{cmd:)}]}absolute Gini concentration index; syntax
 643 |     as for {cmd:gci}
 644 |     {p_end}
 645 | {synopt:[*]{cmd:ccurve}{cmd:(}{it:p}[{cmd:,}{it:{help varname:by}}]{cmd:)}}concentration curve ordinate,
 646 |     {it:p} in [0,100]; prefix {it:*} is empty for default, {cmd:g} for generalized, {cmd:t} for total,
 647 |     {cmd:a} for absolute, {cmd:e} for equality gap;
 648 |     {it:by} as for {cmd:gci}
 649 |     {p_end}
 650 | {synopt:[*]{cmd:cshare}{cmd:(}{it:p1}{cmd:,}{it:p2}[{cmd:,}{it:{help varname:by}}]{cmd:)}}concentration share,
 651 |     {it:p1} and {it:p2} in [0,100]; prefix {it:*} is empty for default, {cmd:d} for density,
 652 |     {cmd:g} for generalized, {cmd:t} for total, {cmd:a} for average;
 653 |     {it:by} as for {cmd:gci}
 654 |     {p_end}
 655 | 
 656 | {syntab:Poverty measures}
 657 | {synopt:{opt hcr}[{cmd:(}{it:pline}{cmd:)}]}head count ratio (i.e. proportion poor); {it:pline} specifies the
 658 |     poverty line > 0; {it:pline} can be {varname} or {it:#}; default is as set by option {cmd:pline()}
 659 |     {p_end}
 660 | {synopt:[{cmd:a}]{opt pgap}[{cmd:(}{it:pline}{cmd:)}]}poverty gap (proportion by which mean outcome of poor
 661 |     is below {it:pline}); specify {cmd:apgap} for absolute poverty gap ({it:pline} - mean outcome of poor);
 662 |     {it:pline} as for {cmd:hcr}
 663 |     {p_end}
 664 | {synopt:[{cmd:a}]{opt pgi}[{cmd:(}{it:pline}{cmd:)}]}poverty gap index; equal to {cmd:hcr}*{cmd:pgap}; specify
 665 |     {cmd:apgi} for absolute poverty gap index, equal to {cmd:hcr}*{cmd:apgap};
 666 |     {it:pline} as for {cmd:hcr}
 667 |     {p_end}
 668 | {synopt:{opt fgt}[{cmd:(}{it:a}[{cmd:,}{it:pline}]{cmd:)}]}Foster–Greer–Thorbecke index with {it:a}>=0
 669 |     (Foster et al. 1984, 2010); default is {it:a}=0 (head count ratio); {it:a}=1 is equivalent to
 670 |     {cmd:pgi};
 671 |     {it:pline} as for {cmd:hcr}
 672 |     {p_end}
 673 | {synopt:{opt sen}[{cmd:(}{it:pline}{cmd:)}]}Sen poverty index (Sen 1976; using the
 674 |     replication-invariant version of the index; also see Shorrocks 1995);
 675 |     {it:pline} as for {cmd:hcr}
 676 |     {p_end}
 677 | {synopt:{opt sst}[{cmd:(}{it:pline}{cmd:)}]}Sen-Shorrocks-Thon poverty index
 678 |     (see, e.g., Osberg and Xu 2008);
 679 |     {it:pline} as for {cmd:hcr}
 680 |     {p_end}
 681 | {synopt:{opt takayama}[{cmd:(}{it:pline}{cmd:)}]}Takayama poverty index
 682 |     (Takayama 1979);
 683 |     {it:pline} as for {cmd:hcr}
 684 |     {p_end}
 685 | {synopt:{opt watts}[{cmd:(}{it:pline}{cmd:)}]}Watts index (see, e.g., Saisana 2014);
 686 |     {it:pline} as for {cmd:hcr}
 687 |     {p_end}
 688 | {synopt:{opt chu}[{cmd:(}{it:a}[{cmd:,}{it:pline}]{cmd:)}]}Clark-Hemming-Ulph poverty index with {it:a} in [0,100]
 689 |     (Clark et al. 1981); default is {it:a}=50; {it:a}=0 is equivalent to
 690 |     1-exp(-{cmd:watts}); {it:a}=100 is equivalent to {cmd:fgt(1)};
 691 |     {it:pline} as for {cmd:hcr}
 692 |     {p_end}
 693 | {synopt:[{cmd:a}]{cmd:tip}{cmd:(}{it:p}[{cmd:,}{it:pline}]{cmd:)}}TIP ordinate,
 694 |     {it:p} in [0,100]; specify {cmd:atip()} for absolute TIP ordinates;
 695 |     {it:pline} as for {cmd:hcr}
 696 |     {p_end}
 697 | 
 698 | {marker association}{...}
 699 | {syntab:Association measures}
 700 | {synopt:{opt corr}[{cmd:(}{it:{help varname:by}}{cmd:)}]}correlation coefficient;
 701 |     {it:by} specifies the secondary variable; default is as set by option {cmd:by()}
 702 |     {p_end}
 703 | {synopt:{opt slope}[{cmd:(}{it:{help varname:by}}{cmd:)}]}regression slope
 704 |     (equal to mean difference if {it:by} is dichotomous); {it:by} as for {cmd:corr}
 705 |     {p_end}
 706 | {synopt:{opt b}[{cmd:(}{it:{help varname:by}}{cmd:)}]}same as {cmd:slope}
 707 |     {p_end}
 708 | {synopt:{opt cohend}[{cmd:(}{it:{help varname:by}}[{cmd:,}{it:df}]{cmd:)}]}Cohen's d
 709 |     (allowing unequal group sizes); {it:df} applies small-sample
 710 |     adjustment; default is {it:df}=2; can also specify {opt cohend(df)};
 711 |     {it:by} is assumed to be dichotomous (string allowed)
 712 |     {p_end}
 713 | {synopt:{opt covar}[{cmd:(}{it:{help varname:by}}[{cmd:,}{it:df}]{cmd:)}]}covariance; {it:df} applies small-sample
 714 |     adjustment; default is {it:df}=1; can also specify {opt covar(df)};
 715 |     {it:by} as for {cmd:corr}
 716 |     {p_end}
 717 | {synopt:{opt rsquared}[{cmd:(}{it:{help varname:by}}{cmd:)}]}R squared, equal to {cmd:corr}^2;
 718 |      {it:by} as for {cmd:corr}
 719 |     {p_end}
 720 | {synopt:{opt spearman}[{cmd:(}{it:{help varname:by}}{cmd:)}]}Spearman's rank correlation;
 721 |     {it:by} as for {cmd:corr}
 722 |     {p_end}
 723 | {synopt:{opt taua}[{cmd:(}{it:{help varname:by}}{cmd:)}]}Kendall's tau-a (using fast algorithm by Newson 2006);
 724 |     {it:by} as for {cmd:corr}
 725 |     {p_end}
 726 | {synopt:{opt taub}[{cmd:(}{it:{help varname:by}}{cmd:)}]}Kendall's tau-b (using fast algorithm by Newson 2006);
 727 |     {it:by} as for {cmd:corr}
 728 |     {p_end}
 729 | {synopt:{opt somersd}[{cmd:(}{it:{help varname:by}}{cmd:)}]}Somers' D (using fast algorithm by Newson 2006);
 730 |     {it:by} as for {cmd:corr}
 731 |     {p_end}
 732 | {synopt:{opt gamma}[{cmd:(}{it:{help varname:by}}{cmd:)}]}Goodman and Kruskal's gamma (using fast algorithm by Newson 2006);
 733 |     {it:by} as for {cmd:corr}
 734 |     {p_end}
 735 | 
 736 | {syntab:Categorical data (univariate)}
 737 | {synopt:{opt hhi}[{cmd:n}]}Herfindahl–Hirschman index (Herfindahl index, Simpson index);
 738 |     specify {cmd:hhin} for normalization ({cmd:hhi}-1/K)/(1-1/K), where
 739 |     K is the number of categories
 740 |     {p_end}
 741 | {synopt:{opt gimp}[{cmd:n}]}Gini impurity (Gini–Simpson index, Simpson's interaction index,
 742 |     Blau index, Gibbs–Martin index); {cmd:gimp} = 1-{cmd:hhi}; {cmd:gimpn} = 1-{cmd:hhin}
 743 |     {p_end}
 744 | {synopt:{opt entropy}[{cmd:(}{it:base}{cmd:)}]}Shannon entropy; {it:base} specifies
 745 |     the base of the logarithm (default is natural logarithm)
 746 |     {p_end}
 747 | {synopt:{opt hill}[{cmd:(}{it:q}{cmd:)}]}Hill number (true diversity,
 748 |     effective number of species); {it:q} specifies the order of the diversity;
 749 |     default is {it:q}=1 such that {cmd:hill} = exp({cmd:entropy}); if {it:q}=0,
 750 |     {cmd:hill} is equal to the observed number of categories
 751 |     {p_end}
 752 | {synopt:{opt renyi}[{cmd:(}{it:q}{cmd:)}]}Rényi entropy;
 753 |     equal to ln({cmd:hill(}{it:q}{cmd:)}); default is {it:q}=1 such that
 754 |     {cmd:renyi} = {cmd:entropy}
 755 |     {p_end}
 756 | 
 757 | {marker catbivar}{...}
 758 | {syntab:Categorical data (bivariate)}
 759 | {synopt:{opt mindex}[{cmd:(}{it:{help varname:by}}[{cmd:,}{it:base}]{cmd:)}]}mutual information index (M index);
 760 |     {it:by} specifies the secondary variable (string allowed); default is as set by option {cmd:by()};
 761 |     {it:base} specifies the base of the logarithm (default is natural logarithm);
 762 |     can also specify {opt mindex(base)}
 763 |     {p_end}
 764 | {synopt:{opt uc}[{cmd:l}|{cmd:r}][{cmd:(}{it:{help varname:by}}{cmd:)}]}uncertainty coefficient (H index);
 765 |     {cmd:ucl} returns the asymmetric coefficient with respect to the left-hand
 766 |     side variable (i.e. division by the entropy of the main variable),
 767 |     {cmd:ucr} is with respect to the right-hand side variable (i.e. division by
 768 |     the entropy of the secondary variable), {cmd:uc} returns the symmetric
 769 |     uncertainty coefficient (weighted average of {cmd:ucl} and {cmd:ucr});
 770 |     {it:by} as for {cmd:mindex}
 771 |     {p_end}
 772 | {synopt:{opt cramersv}[{cmd:(}{it:{help varname:by}}{cmd:)}]}Cramér's V;
 773 |     {it:by} as for {cmd:mindex}
 774 |     {p_end}
 775 | {synopt:{opt dissim}[{cmd:(}{it:{help varname:by}}{cmd:)}]}(generalized) dissimilarity index (Duncan's D);
 776 |     {it:by} as for {cmd:mindex}
 777 |     {p_end}
 778 | {synopt:{opt or}[{cmd:(}{it:{help varname:by}}{cmd:)}]}odds ratio; variables are
 779 |     interpreted as true/false indicators (false if 0, else true); {it:by} as for {cmd:mindex}
 780 |     {p_end}
 781 | {synopt:{opt rr}[{cmd:(}{it:{help varname:by}}{cmd:)}]}risk ratio; variables are
 782 |     interpreted as true/false indicators (false if 0, else true); {it:by} as for {cmd:mindex}
 783 |     {p_end}
 784 | {synoptline}
 785 | 
 786 | {pstd}
 787 |     {it:Note on output formatting in Stata 15 (or in Stata 16 prior to the update of March 30, 2021):} If
 788 |     statistics with parameters in parentheses are requested, {cmd:dstat summarize}
 789 |     may display a somewhat garbled output table. Type
 790 | 
 791 |         {cmd:. version 14: dstat summarize} {it:...}
 792 | 
 793 | {pstd}
 794 |     to obtain an improved table in such a case.
 795 | 
 796 | 
 797 | {marker options}{...}
 798 | {title:Options}
 799 | 
 800 | {marker mainopts}{...}
 801 | {dlgtab:Main}
 802 | 
 803 | {phang}
 804 |     {cmd:nocasewise} causes missing values to be excluded for each variable in
 805 |     {it:varlist} individually. The default is to perform casewise deletion of
 806 |     observations, that is, to restrict the sample to observations that are not
 807 |     missing for any of the variables. If {cmd:nocasewise} is specified, the
 808 |     overall estimation sample is still restricted by the {cmd:if} and {cmd:in}
 809 |     qualifiers, the weights, and the variables specified in {cmd:over()} and
 810 |     {cmd:balance()}, but not by missing values in the main {it:varlist} (or in
 811 |     {cmd:by()}, {it:by}, {cmd:pline()}, or {it:pline}). For each variable,
 812 |     the subsample of all nonmissing values within the overall estimation sample
 813 |     will then be used in the relevant computations.
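
{pmore}
    As a sketch, the following uses the {cmd:nlsw88} example dataset that
    ships with Stata, in which {cmd:union} has missing values. The default
    statistic of {cmd:dstat summarize} is computed for {cmd:wage} from all
    nonmissing wage observations. For {cmd:union}, only the observations with
    nonmissing {cmd:union} are used:

            {cmd:. sysuse nlsw88, clear}
            {cmd:. dstat summarize wage union, nocasewise}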
 814 | 
 815 | {marker over}{...}
 816 | {phang}
 817 |     {cmd:over(}{help varname:{it:overvar}}[{cmd:,} {it:options}]{cmd:)}
 818 |     computes results for each subpopulation defined by the values of
 819 |     {it:overvar}. {it:overvar} must be integer and nonnegative. {it:options}
 820 |     are as follows:
 821 | 
 822 | {phang2}
 823 |     {opth sel:ect(numlist)} selects (and orders) subpopulations. {it:numlist}
 824 |     specifies the values of the subpopulations to be included
 825 |     and also determines the order of the subpopulations in the output. The basis
 826 |     for estimation will always be the total sample including all subpopulations.
 827 | 
 828 | {phang2}
 829 |     {opt contr:ast}[{cmd:(}{it:#}|{cmd:lag}|{cmd:lead}{cmd:)}] computes contrasts between
 830 |     subpopulations or between subpopulations and the total population. If
 831 |     {cmd:contrast} is specified without an argument, the total population is
 832 |     used as the reference if option {cmd:total} has been specified; otherwise,
 833 |     the first subpopulation (possibly after applying {cmd:select()}) is used
 834 |     as the reference. Alternatively, specify
 835 |     the value of the reference subpopulation in parentheses (this may also be
 836 |     a subpopulation that has been excluded by {cmd:select()}) or
 837 |     type {cmd:contrast(lag)} or {cmd:contrast(lead)} to take stepwise contrasts
 838 |     with respect to the previous or next subpopulation, respectively. {cmd:contrast}
 839 |     implies {cmd:common} (if relevant).
 840 | 
 841 | {pmore2}
 842 |     The estimates from the reference (sub)population will be included among the
 843 |     stored results (in logarithmic form if {cmd:lnratio} is specified), but
 844 |     their display will be suppressed. Specify display
 845 |     option {cmd:cref} to report these results in the output.
 846 | 
 847 | {phang2}
 848 |     {opt ratio} requests that the contrasts are expressed as ratios. The
 849 |     default is to express contrasts as differences. {cmd:ratio} implies
 850 |     {cmd:contrast}.
 851 | 
 852 | {phang2}
 853 |     {opt lnr:atio} requests that the contrasts are expressed as differences in
 854 |     logarithms. The default is to express contrasts as raw differences. {cmd:lnratio}
 855 |     implies {cmd:contrast} and takes precedence over {cmd:ratio}.
 856 | 
 857 | {pmore2}
 858 |     When applying {cmd:lnratio} you may also want to specify reporting option
 859 |     {helpb dstat##display_opts:eform} to display results that are transformed back to
 860 |     ratios. In fact, point estimates, standard errors, and confidence intervals
 861 |     from {cmd:lnratio} with {cmd:eform} are identical to results
 862 |     from {cmd:ratio} with option {cmd:citype(log)}. An advantage of
 863 |     {cmd:lnratio}, however, is that the null hypothesis for t-statistics and
 864 |     p-values is ratio = 1 or, more precisely, ln(ratio) = 0 (i.e. no group
 865 |     difference). For {cmd:ratio} the null hypothesis is ratio = 0, which
 866 |     does not appear useful (this is why {cmd:dstat} suppresses t-statistics and
 867 |     p-values in case of {cmd:ratio}). A disadvantage of {cmd:lnratio} is that it
 868 |     cannot represent cases in which the comparison estimate and, hence, the ratio
 869 |     is zero.
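
{pmore2}
    For example, consider the {cmd:nlsw88} example dataset that ships with
    Stata, where {cmd:union} is coded 0 for nonunion and 1 for union workers.
    The two commands below should report the same point estimates and
    confidence intervals for the union/nonunion ratio of the default
    {cmd:dstat summarize} statistic of {cmd:wage}. Only the first should also
    report tests of ln(ratio) = 0:

            {cmd:. sysuse nlsw88, clear}
            {cmd:. dstat summarize wage, over(union, lnratio) eform}
            {cmd:. dstat summarize wage, over(union, ratio) citype(log)}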
 870 | 
 871 | {phang2}
 872 |     {opt accum:ulate} accumulates results across subpopulations
 873 |     (running sum). Only one of {cmd:contrast} and {cmd:accumulate} is allowed.
 874 | 
 875 | {pmore}
 876 |     Option {cmd:over()} is not supported by {cmd:dstat pw}.
 877 | 
 878 | {phang}
 879 |     {cmd:total} reports additional results across all subpopulations, including
 880 |     subpopulations that may have been excluded by {cmd:select()}. {cmd:total}
 881 |     only has an effect if {cmd:over()} is specified.
 882 | 
 883 | {marker balance}{...}
 884 | {phang}
 885 |     {cmd:balance(}{it:spec}{cmd:)} balances covariate distributions between
 886 |     subpopulations using reweighting. {opt balance()} requires {cmd:over()}. The
 887 |     syntax of {it:spec} is
 888 | 
 889 |             [{it:method}{cmd::}] {varlist} [{cmd:,} {it:options}]
 890 | 
 891 | {pmore}
 892 |     where {it:method} is either {cmd:ipw} for inverse probability weighting based
 893 |     on logistic regression (the default) or {cmd:eb} for entropy balancing using
 894 |     {helpb mf_mm_ebal:mm_ebal()} from {helpb moremata}, and
 895 |     {it:varlist} specifies the list of covariates to be balanced (factor-variable
 896 |     notation is allowed). For information on inverse probability weighting
 897 |     see, e.g., DiNardo et al. (1996) and {helpb teffects ipw}. For entropy balancing see
 898 |     Hainmueller (2012) and Section 3.8 in
 899 |     {browse "http://ideas.repec.org/p/bss/wpaper/35.html":Jann (2020)}. {it:options} are as follows:
 900 | 
 901 | {phang2}
 902 |     {opt ref:erence(#)} identifies the reference distribution. The default is
 903 |     to use the total across all subpopulations as the reference distribution,
 904 |     including subpopulations that may have been excluded by {cmd:select()}. Specify
 905 |     {cmd:reference(}{it:#}{cmd:)} to obtain the reference distribution from
 906 |     observations for which {it:overvar}={it:#}; this may also be a subpopulation
 907 |     that has been excluded by {cmd:select()}.
 908 | 
 909 | {phang2}
 910 |     {it:logit_options} are options to be passed through to {helpb logit}. {it:logit_options}
 911 |     are only allowed if {it:method} is {cmd:ipw}.
 912 | 
 913 | {phang2}
 914 |     {opt btol:erance(#)}, {it:#}>=0, specifies the tolerance for the entropy
 915 |     balancing algorithm. The default is {cmd:btolerance(1e-5)}. A warning
 916 |     message is displayed if a balancing solution is not within the specified
 917 |     tolerance. {cmd:btolerance()} is only allowed if {it:method} is {cmd:eb}.
 918 | 
 919 | {phang2}
 920 |     {opt noi:sily} displays the output of the balancing procedure.
 921 | 
 922 | {phang2}
 923 |     {opt gen:erate(newvar)} stores the balancing weights in variable
 924 |     {it:newvar}. This is useful if you want to check whether covariates have been
 925 |     balanced successfully.
 926 | 
 927 | {pmore}
 928 |     Balancing weights will only be computed once per subpopulation. If
 929 |     {cmd:nocasewise} is specified, balancing will be based on the overall estimation
 930 |     sample as defined in the description of the {cmd:nocasewise} option; the weights
 931 |     will not be recomputed for each variable individually.
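
{pmore}
    As a minimal sketch using the {cmd:nlsw88} example dataset, the following
    estimates the wage densities of nonunion and union workers after
    reweighting. The method is the default (inverse probability weighting).
    The weights make the distributions of {cmd:grade} and {cmd:ttl_exp} match
    those among union members ({cmd:ref(1)}):

            {cmd:. sysuse nlsw88, clear}
            {cmd:. dstat density wage, over(union) balance(grade ttl_exp, ref(1))}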
 932 | 
 933 | {marker repopts}{...}
 934 | {phang}
 935 |     {it:reporting_options} are options affecting how results are reported. The options
 936 |     are as follows:
 937 | 
 938 | {phang2}
 939 |     {opt l:evel(#)} specifies the confidence level, as a percentage, for
 940 |     confidence intervals. The default is {cmd:level(95)} or as set by
 941 |     {helpb set level}.
 942 | 
 943 | {phang2}
 944 |     {opt citype(type)} specifies the method for the computation of the
 945 |     confidence interval limits. {it:type} can be:
 946 | 
 947 | {p2colset 17 28 30 2}{...}
 948 | {p2col:{cmdab:norm:al}}normal CIs
 949 |     {p_end}
 950 | {p2col:{cmd:logit}}logit transformed CIs; useful for statistics in [0,1]
 951 |     {p_end}
 952 | {p2col:{cmd:probit}}probit transformed CIs; useful for statistics in [0,1]
 953 |     {p_end}
 954 | {p2col:{cmd:atanh}}inverse hyperbolic tangent transformed CIs; useful for statistics in [-1,1]
 955 |     {p_end}
 956 | {p2col:{cmd:log}}log transformed CIs; useful for statistics > 0
 957 |     {p_end}
 958 | {p2col:{cmdab:agres:ti}}Agresti-Coull CIs; useful for proportions
 959 |     {p_end}
 960 | {p2col:{cmd:exact}}exact (Clopper-Pearson) CIs; useful for proportions
 961 |     {p_end}
 962 | {p2col:{cmdab:jeff:reys}}Jeffreys CIs; useful for proportions
 963 |     {p_end}
 964 | {p2col:{cmd:wilson}}Wilson CIs; useful for proportions
 965 |     {p_end}
 966 | 
 967 | {pmore2}
 968 |     The default depends on subcommand and options. Use {cmd:citype()} to
 969 |     override the default. For details on {cmd:agresti}, {cmd:exact},
 970 |     {cmd:jeffreys}, and {cmd:wilson} see the documentation of {helpb proportion}.
 971 | 
 972 | {pmore2}
 973 |     {cmd:dstat} will store the confidence limits given {cmd:level()} and
 974 |     {cmd:citype()} in {cmd:e(ci)}. Replaying the results with different settings
 975 |     will update {cmd:e(ci)}. (In Stata 14, normal confidence intervals will be
 976 |     displayed in the output table irrespective of the contents of {cmd:e(ci)}.)
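
{pmore2}
    For instance, using the {cmd:nlsw88} example dataset, Wilson confidence
    intervals for the proportions of the categories of {cmd:race} could be
    requested by typing

            {cmd:. sysuse nlsw88, clear}
            {cmd:. dstat proportion race, citype(wilson)}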
 977 | 
 978 | {phang2}
 979 |     {opt nohead:er} suppresses the output header.
 980 | 
 981 | {phang2}
 982 |     {opt notab:le} suppresses the output table containing the estimated
 983 |     coefficients. {opt tab:le} enforces displaying the table.
 984 | 
 985 | {phang2}
 986 |     [{ul:{cmd:no}}]{opt pv:alues} decides whether p-values and their test
 987 |     statistics are reported in the coefficient table or not. The default is
 988 |     {cmd:nopvalues} unless {cmd:over(, contrast())} has been specified.
 989 | 
 990 | {phang2}
 991 |     {opt cref} causes the estimates from the reference (sub)population to be
 992 |     included in the coefficient tables. The default is to suppress these
 993 |     results. {cmd:cref} is only relevant if {cmd:over(, contrast())} has been
 994 |     specified.
 995 | 
 996 | {marker display_opts}{...}
 997 | {phang2}
 998 |     {it:display_options} are standard reporting options such as {cmd:eform},
 999 |     {cmd:cformat()}, or {cmd:coeflegend}; see {help eform_option:{bf:[R]} {it:eform_option}} and
1000 |     the Reporting options in {helpb estimation options:[R] Estimation options}.
1001 | 
1002 | {phang2}
1003 |     {opt gr:aph}[{cmd:(}{help dstat##graph_options:{it:graph_options}}{cmd:)}]
1004 |     displays the results in a graph using {helpb coefplot}. The coefficient
1005 |     table will be suppressed in this case (unless option {cmd:table} is
1006 |     specified). Alternatively, use command {cmd:dstat graph} to display the
1007 |     graph after estimation.
1008 | 
1009 | {phang}
1010 |     {opt novalues} prevents using the values of the evaluation points as
1011 |     coefficient names. This is not relevant for {cmd:dstat summarize}. If {cmd:novalues}
1012 |     is specified, the coefficients will be named as {it:stub#}, where
1013 |     {it:#} is a consecutive number and {it:stub} is
1014 |     {cmd:d} in case of {cmd:dstat density},
1015 |     {cmd:h} in case of {cmd:dstat histogram},
1016 |     {cmd:p} in case of {cmd:dstat proportion},
1017 |     {cmd:c} in case of {cmd:dstat cdf} and {cmd:dstat ccdf},
1018 |     {cmd:q} in case of {cmd:dstat quantile},
1019 |     {cmd:l} in case of {cmd:dstat lorenz},
1020 |     {cmd:s} in case of {cmd:dstat share} (and for {cmd:dstat histogram} and
1021 |     {cmd:dstat share} the last coefficient, i.e. the upper limit of the last bin, will be named
1022 |     {cmd:_ul}).
1023 | 
1024 | {phang}
1025 |     {opth vformat(fmt)} sets the display format used to create coefficient names
1026 |     from evaluation points. This is not relevant for {cmd:dstat summarize}. See
1027 |     help {helpb format} for available formats.
1028 | 
1029 | {marker vce}{...}
1030 | {dlgtab:SE/VCE}
1031 | 
1032 | {phang}
1033 |     {opt vce(vcetype)} determines how standard errors are computed. {it:vcetype} may be:
1034 | 
1035 |             {opt none}
1036 |             {opt a:nalytic}
1037 |             {opt cl:uster} {it:clustvar}
1038 |             {opt svy} [{help svy##svy_vcetype:{it:svy_vcetype}}] [{cmd:,} {help svy##svy_options:{it:svy_options}} ]
1039 |             {opt boot:strap} [{cmd:,} {help bootstrap:{it:bootstrap_options}} ]
1040 |             {opt jack:knife} [{cmd:,} {help jackknife:{it:jackknife_options}} ]
1041 | 
1042 | {pmore}
1043 |     {cmd:vce(none)} omits the computation of standard errors. This saves computer
1044 |     time.
1045 | 
1046 | {pmore}
1047 |     {cmd:vce(analytic)}, the default, computes standard errors based on
1048 |     influence functions.
1049 | 
1050 | {pmore}
1051 |     {bind:{cmd:vce(cluster} {it:clustvar}{cmd:)}} computes standard errors based
1052 |     on influence functions allowing for intragroup correlation, where
1053 |     {it:clustvar} specifies to which group each observation belongs.
1054 | 
1055 | {pmore}
1056 |     {cmd:vce(svy)} computes standard errors taking the survey design as set by
1057 |     {helpb svyset} into account. The syntax is equivalent to the syntax of the {helpb svy}
1058 |     prefix command; that is, {cmd:vce(svy)} is {cmd:dstat}'s way to support
1059 |     the {helpb svy} prefix.
1060 | 
1061 | {pmore}
1062 |     {cmd:vce(bootstrap)} and {cmd:vce(jackknife)} compute standard errors using
1063 |     {helpb bootstrap} or {helpb jackknife}, respectively; see help {it:{help vce_option}}.
1064 | 
1065 | {phang}
1066 |     {cmd:nose} is an alias for {cmd:vce(none)}. {cmd:nose} overrides {cmd:vce(analytic)} and
1067 |     {cmd:vce(cluster)}, but has no effect if specified together with
1068 |     {cmd:vce(svy)}, {cmd:vce(bootstrap)}, or {cmd:vce(jackknife)}.
1069 | 
1070 | {phang}
1071 |     [{cmd:no}]{cmd:cov} determines whether the full variance-covariance matrix
1072 |     of the estimates is stored in {cmd:e(V)}, or whether only the standard
1073 |     errors are stored in vector {cmd:e(se)}. The default is {cmd:cov} (full
1074 |     variance matrix) for subcommands {cmd:summarize} and {cmd:pw}, and
1075 |     {cmd:nocov} (standard errors only) for all other subcommands. {cmd:nocov}
1076 |     saves memory if the number of evaluation points is large (for example, if
1077 |     you estimate the density using 400 points across two subpopulations, the
1078 |     covariance matrix has 800 x 800 = 640,000 elements; the vector of standard
1079 |     errors has only 800 elements). For {cmd:vce(analytic)} and
1080 |     {cmd:vce(cluster)}, option {cmd:nocov} also saves computer time (since the
1081 |     computation of covariances is skipped; in the other cases, covariances are
1082 |     removed after estimation). For {cmd:vce(svy)}, option {cmd:nocov} also
1083 |     removes auxiliary variance matrices such as {cmd:e(V_srs)}. Note that
1084 |     post-estimation commands that rely on covariances (or on auxiliary variance
1085 |     matrices in case of {cmd:svy}) will not work after {cmd:nocov} has been
1086 |     applied; specify option {cmd:cov} if you intend to use such post-estimation
1087 |     commands (e.g., {helpb test} or {helpb lincom}) after subcommands other than
1088 |     {cmd:summarize} or {cmd:pw}.
1089 | 
1090 | {phang}
1091 |     {cmd:nobwfixed} allows the bandwidth(s) for density estimation to vary
1092 |     across replications. This is only relevant if density estimation is
1093 |     requested (subcommand {cmd:density} or {cmd:pdf}; subcommand
1094 |     {cmd:summarize} with at least one {cmd:density()} statistic), if the
1095 |     bandwidth is not set to a specific value (or a list of specific values)
1096 |     using option {cmd:bwidth()}, and if a replication technique is used for
1097 |     standard error estimation, i.e. {cmd:vce(bootstrap)}, {cmd:vce(jackknife)},
1098 |     or {cmd:vce(svy)} with {help svy##svy_vcetype:{it:svy_vcetype}} other than
1099 |     {cmd:linearized}. The default is to hold the bandwidth(s) fixed across
1100 |     replications.
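
{pmore}
    For example, using the {cmd:nlsw88} example dataset, the following
    re-estimates the automatic bandwidth for the density of {cmd:wage} in each
    of 100 bootstrap replications instead of holding it fixed:

            {cmd:. sysuse nlsw88, clear}
            {cmd:. dstat density wage, vce(bootstrap, reps(100)) nobwfixed}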
1101 | 
1102 | {marker generate}{...}
1103 | {phang}
1104 |     {cmd:generate(}{it:names}[{cmd:,} {it:options}]{cmd:)} stores the influence
1105 |     functions that were used to compute the standard errors, where {it:names}
1106 |     is either a list of (new) variable names or
1107 |     {help newvarlist##stub*:{it:stub}}{cmd:*} to create names {it:stub}{cmd:1},
1108 |     {it:stub}{cmd:2}, etc. {it:options} are {cmd:rif} to store RIFs,
1109 |     {cmdab:sca:ling(}{cmdab:t:otal}{cmd:)} or {cmdab:sca:ling(}{cmdab:m:ean}{cmd:)}
1110 |     to determine the scaling, {cmdab:com:pact} to merge the influence functions
1111 |     across subpopulations, and {cmdab:qui:etly} to suppress output; see
1112 |     {it:{help dstat##predict_options:predict_options}} below.
1113 | 
1114 | {phang}
1115 |     {cmd:rif(}{it:names}[{cmd:,} {it:options}]{cmd:)} is an alias for
1116 |     {cmd:generate(}{it:names}{cmd:,} {cmd:rif} [{it:options}]{cmd:)}.
1117 | 
1118 | {phang}
1119 |     {opt replace} allows replacing existing variables.
1120 | 
1121 | {marker quant}{...}
1122 | {dlgtab:Quantile/density settings}
1123 | 
1124 | {phang}
1125 |     {opt qdef(#)} sets the quantile definition to be used when computing
1126 |     quantiles, with {it:#} in {c -(}0,...,11{c )-}. The default is
1127 |     {cmd:qdef(2)} (the definition used by, e.g., {helpb summarize}). Definitions 1-9 are as
1128 |     described in Hyndman and Fan (1996), definition 0 is the "high" quantile,
1129 |     definition 10 is the Harrell-Davis quantile (Harrell and Davis 1982),
1130 |     definition 11 is the mid-quantile (Ma et al. 2011); see
1131 |     {helpb mf_mm_quantile:mm_quantile()} for more information. Apart from
1132 |     subcommand {cmd:dstat quantile} and statistic {cmd:quantile()}, option {cmd:qdef()} affects
1133 |     all statistics that make use of quantiles (e.g. {cmd:trim}, {cmd:winsor},
1134 |     {cmd:huber}, {cmd:biweight}, {cmd:mad}, etc.).
1135 | 
1136 | {phang}
1137 |     {opt hdquantile} is a synonym for {cmd:qdef(10)} (Harrell-Davis
1138 |     quantiles). Only one of {opt hdquantile}, {opt mquantile}, and {opt qdef()}
1139 |     is allowed. The Harrell-Davis estimator typically leads to smoother
1140 |     quantile functions than classical quantile definitions. Furthermore,
1141 |     standard errors do not depend on density estimation and tend to be
1142 |     more reliable than for other quantile definitions if there is heaping in the data.
1143 | 
1144 | {phang}
1145 |     {opt hdtrim}[{cmd:(}{it:width}{cmd:)}] applies trimming to the Harrell-Davis
1146 |     quantile estimator as suggested by Akinshin (2021). If {cmd:hdtrim} is specified without
1147 |     argument, the width of the evaluation interval is set to 1/sqrt(n), where n
1148 |     is the effective sample size. Alternatively, specify a custom {it:width}. Sensible values
1149 |     for {it:width} lie between 0 and 1 ({it:width}>=1 uses the untrimmed estimator;
1150 |     {it:width}<=0 sets the width to 1/sqrt(n)).
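
{pmore}
    For example, using the {cmd:nlsw88} example dataset, trimmed
    Harrell-Davis quantiles of {cmd:wage} could be obtained as follows. The
    first command uses the default width 1/sqrt(n); the second uses a custom
    width of 0.2:

            {cmd:. sysuse nlsw88, clear}
            {cmd:. dstat quantile wage, hdquantile hdtrim}
            {cmd:. dstat quantile wage, hdquantile hdtrim(0.2)}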
1151 | 
1152 | {phang}
1153 |     {opt mquantile} is a synonym for {cmd:qdef(11)} (mid-quantiles). Only one
1154 |     of {opt hdquantile}, {opt mquantile}, and {opt qdef()} is allowed. The
1155 |     mid-quantile estimator typically leads to smoother quantile
1156 |     functions than classical quantile definitions. Ma et al. (2011) suggest
1157 |     using the mid-quantile estimator for discrete data.
1158 | 
1159 | {phang}
1160 |     {opt mqopts(options)} provides additional settings for mid-quantiles that
1161 |     are relevant for standard error estimation. {it:options} are as follows:
1162 | 
1163 | {phang2}
1164 |     {opt us:mooth(#)}, with {it:#}<1, sets the degree of undersmoothing that is
1165 |     applied when determining the sparsity function via density estimation.
1166 |     The default is {cmd:usmooth(0.2)}. The undersmoothing factor is computed as
1167 |     n^(1/5) / n^(1/(5*(1-{it:#}))), where n is the effective sample size. Set {it:#} to 0
1168 |     to omit undersmoothing; {it:#}<0 leads to oversmoothing. Note that
1169 |     {help densopts:{it:density_options}} have no effect on density estimation
1170 |     for mid-quantiles.
1171 | 
1172 | {phang2}
1173 |     {cmd:cdf}[{cmd:(}{it:#}{cmd:)}], with {it:#}>=0, determines the sparsity
1174 |     function by differencing the ECDF instead of employing density
1175 |     estimation. This may lead to somewhat more valid results in discrete data (i.e. data
1176 |     with relatively few distinct levels), but results may be unreliable in
1177 |     continuous data. Optional argument {it:#} sets the width of the integration
1178 |     window that is used to interpolate across jumps in the ECDF ({it:#} is on
1179 |     the probability scale; for example, a value of 0.01 is equivalent to a
1180 |     window covering 1 percent of data mass). The default is {it:#} = 1 /
1181 |     ceil(2 * n^(2/5)), where n is the effective sample size. Set {it:#}=0 to
1182 |     omit integration (this corresponds to the formulas given in Ma et al. 2011;
1183 |     the sparsity function will have sharp jumps).
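
{pmore}
    For example, consider a discrete variable such as years of schooling
    ({cmd:grade}) in the {cmd:nlsw88} example dataset. Mid-quantiles with
    ECDF-based sparsity estimation, using the default integration window,
    could be requested by typing

            {cmd:. sysuse nlsw88, clear}
            {cmd:. dstat quantile grade, mquantile mqopts(cdf)}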
1184 | 
1185 | {marker densopts}{...}
1186 | {phang}
1187 |     {it:density_options} set the details of density
1188 |     estimation. These settings are relevant for command
1189 |     {cmd:dstat density}/{cmd:pdf} and statistic {cmd:density()} as well as for
1190 |     the computation of influence functions that involve density
1191 |     estimation (e.g., the influence function of a quantile). For more information
1192 |     on density estimation see {helpb mf_mm_density:mm_density()},
1193 |     {browse "http://boris.unibe.ch/69421/2/kdens.pdf":Jann (2007)}, and
1194 |     Wand and Jones (1995). The options are as follows:
1195 | 
1196 | {phang2}
1197 |     {cmdab:bw:idth(}{it:method}[{cmd:,} {opt adj:ust(#)} {cmd:rd}]{cmd:)}
1198 |     specifies the type of automatic bandwidth selector for kernel density
1199 |     estimation. Possible choices for {it:method} are:
1200 | 
1201 | {p2colset 17 32 34 2}{...}
1202 | {p2col:{cmdab:s:ilverman}}optimal bandwidth of Silverman
1203 |     {p_end}
1204 | {p2col:{cmdab:n:ormalscale}}normal scale rule
1205 |     {p_end}
1206 | {p2col:{cmdab:o:versmoothed}}oversmoothed rule
1207 |     {p_end}
1208 | {p2col:{opt sj:pi}}Sheather-Jones solve-the-equation plug-in
1209 |     {p_end}
1210 | {p2col:{cmdab:d:pi}[{cmd:(}{it:#}{cmd:)}]}Sheather-Jones direct plug-in,
1211 |     where {it:#} specifies the number of stages of functional estimation;
1212 |     default is {cmd:2}
1213 |     {p_end}
1214 | {p2col:{opt isj}}diffusion estimator bandwidth (Botev et al. 2010)
1215 |     {p_end}
1216 | 
1217 | {pmore2}
1218 |     The default is {cmd:bwidth(dpi(2))}. Suboption {opt adjust(#)}, with #>0, can be
1219 |     used to adjust the automatic bandwidth by factor {it:#}. Suboption {cmd:rd}
1220 |     applies relative-data correction to the automatic bandwidth (Cwik and Mielniczuk 1993).
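
{pmore2}
    For example, using the {cmd:nlsw88} example dataset, the following
    estimates the density of {cmd:wage} with the Sheather-Jones
    solve-the-equation bandwidth multiplied by 0.5:

            {cmd:. sysuse nlsw88, clear}
            {cmd:. dstat density wage, bwidth(sjpi, adjust(0.5))}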
1221 | 
1222 | {phang2}
1223 |     {opth bw:idth(numlist)} is an alternative to {opt bwidth(method)} and sets
1224 |     the bandwidth to a specific value. If {it:numlist} contains multiple values,
1225 |     the values are used one after the other across the variables and
1226 |     subpopulations (recycling values if needed). The specified values must be larger
1227 |     than zero.
1228 | 
1229 | {phang2}
1230 |     {opt k:ernel(kernel)} specifies the kernel function. {it:kernel} may
1231 |     be {opt e:panechnikov}, {opt epan2} (alternative Epanechnikov kernel
1232 |     function), {opt b:iweight}, {opt triw:eight}, {opt c:osine},
1233 |     {opt g:aussian}, {opt p:arzen}, {opt r:ectangle} or {opt t:riangle}. The default
1234 |     is {cmd:kernel(gaussian)}.
1235 | 
1236 | {phang2}
1237 |     {opt adapt:ive(#)} specifies the number of iterations used by the adaptive
1238 |     kernel density estimator. The default is {cmd:adaptive(0)} (non-adaptive
1239 |     density estimator).
1240 | 
1241 | {phang2}
1242 |     {cmd:exact} causes the exact kernel density estimator to be used instead
1243 |     of the binned approximation estimator. The exact estimator can be slow in large
1244 |     datasets if the density is evaluated at many points.
1245 | 
1246 | {phang2}
1247 |     {opt na:pprox(#)} specifies the grid size used by the binned approximation
1248 |     density estimator (and by the data-driven bandwidth selectors). The default
1249 |     is {cmd:napprox(1024)}.
1250 | 
1251 | {phang2}
1252 |     {opt pad(#)} specifies the padding proportion of the approximation grid. The default
1253 |     is {cmd:pad(0.1)}.
1254 | 
1255 | {phang2}
1256 |     {opt ll(#)} specifies the lower boundary of the support of data and causes
1257 |     boundary correction to be applied to the density estimate. An error will be
1258 |     returned if the data contain values smaller than {it:#}.
1259 | 
1260 | {phang2}
1261 |     {opt ul(#)} specifies the upper boundary of the support of data and causes
1262 |     boundary correction to be applied to the density estimate. An error will be
1263 |     returned if the data contain values larger than {it:#}.
1264 | 
1265 | {phang2}
1266 |     {opt bo:undary(method)} sets the type of boundary correction. Choices are
1267 |     {opt ren:orm} (renormalization method; the default), {opt refl:ect} (reflection method), or
1268 |     {opt lc} (linear combination technique). This is only relevant if {cmd:ll()} or {cmd:ul()}
1269 |     has been specified.
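
{pmore2}
    For example, wages are positive, so a boundary-corrected density estimate
    for {cmd:wage} in the {cmd:nlsw88} example dataset could be obtained by
    typing the following. It uses the reflection method at a lower bound
    of 0:

            {cmd:. sysuse nlsw88, clear}
            {cmd:. dstat density wage, ll(0) boundary(reflect)}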
1270 | 
1271 | {marker sum}{...}
1272 | {dlgtab:Subcommand summarize}
1273 | 
1274 | {phang}
1275 |     {opt relax} continues computations even if there are observations outside
1276 |     of the support for a specific statistic. Some statistics such as the
1277 |     geometric mean, the MLD, or the Theil index require observations to be
1278 |     within a specific domain (e.g. strictly positive). By default, {cmd:dstat} aborts
1279 |     with error if observations violating such requirements are encountered. Specify
1280 |     {cmd:relax} if you want to continue computations based on the valid
1281 |     observations in such a case. Exclusion of invalid observations will be
1282 |     applied to each statistic individually; that is, the invalid observations
1283 |     will not be dropped from the overall estimation sample.
1284 | 
1285 | {phang}
1286 |     {opth by(varname)} specifies a default secondary variable for
1287 |     inequality decomposition, concentration indices, and association measures.
1288 | 
1289 | {phang}
1290 |     {opt pline(#|varname)} specifies a default poverty line for poverty
1291 |     measures, either as a single value or as a variable containing observation-specific
1292 |     values.
1293 | 
1294 | {phang}
1295 |     {opt pstrong} selects the poverty definition to be applied (see Donaldson and
1296 |     Weymark 1986). The default is to use the "weak" definition, that is, to treat
1297 |     outcomes equal to the poverty line as non-poor. Specify {cmd:pstrong} to treat
1298 |     these cases as poor ("strong" definition). The choice of definition is relevant
1299 |     only for some of the poverty measures.
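
{pstd}
    As an illustration, the following commands assume that {cmd:tenure} in the
    {cmd:nlsw88} example data contains some zero values. The geometric mean and
    the MLD require strictly positive data. Under that assumption, the first
    command would be expected to abort with an error. The second command
    computes both statistics from the strictly positive observations only:

        . {stata sysuse nlsw88, clear}
        . {stata dstat (gmean mld) tenure}
        . {stata dstat (gmean mld) tenure, relax}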
1300 | 
1301 | {marker pw}{...}
1302 | {dlgtab:Subcommand pw}
1303 | 
1304 | {phang}
1305 |     {opt statistic(stat)} selects the association measure to be
1306 |     computed. {it:stat} may be any statistic listed under
1307 |     {help dstat##association:Association measures} or
1308 |     {help dstat##catbivar:Categorical data (bivariate)}
1309 |     in the above table of summary statistics (omitting argument
1310 |     {it:by}). {cmd:statistic(corr)} is the default. Type, for example,
1311 |     {cmd:statistic(taub)} to compute Kendall's tau-b. Arguments other than
1312 |     {it:by} can be provided in parentheses as usual; for example, type
1313 |     {cmd:statistic(mindex(2))} to compute the M index with base 2.
1314 | 
1315 | {pmore}
1316 |     Most supported statistics are symmetric in the sense that the upper and
1317 |     lower triangles of the association matrix (i.e. the matrix of pairwise
1318 |     associations among the variables in {it:varlist}) contain the same results
1319 |     (i.e. the association between X and Y is the same as the association between
1320 |     Y and X). For asymmetric statistics (e.g. {cmd:slope}) the column
1321 |     (i.e. equation) variable is treated as the dependent variable.
1322 | 
1323 | {phang}
1324 |     {opt lower} requests that the lower-triangle elements of the association
1325 |     matrix be computed. The default is to compute both the lower-triangle elements
1326 |     and the upper-triangle elements.
1327 | 
1328 | {phang}
1329 |     {opt upper} requests that the upper-triangle elements of the association
1330 |     matrix be computed. The default is to compute both the lower-triangle elements
1331 |     and the upper-triangle elements.
1332 | 
1333 | {phang}
1334 |     {opt diagonal} includes the diagonal elements of the association
1335 |     matrix (associations of the variables with themselves). By default,
1336 |     diagonal elements are omitted.
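
{pstd}
    For example, the following sketch uses the {cmd:nlsw88} example data. It
    computes Kendall's tau-b for all pairs among wages, working hours, and work
    experience, and restricts the results to the lower triangle of the
    association matrix:

        . {stata sysuse nlsw88, clear}
        . {stata dstat pw wage hours ttl_exp, statistic(taub) lower}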
1337 | 
1338 | {marker density}{...}
1339 | {dlgtab:Subcommand density}
1340 | 
1341 | {phang}
1342 |     {opt n(#)} sets the number of points for which the density is to
1343 |     be estimated. A regular grid of {it:#} points spanning the
1344 |     data range (within subpopulation; plus some padding) will be used. The
1345 |     default is {cmd:n(99)}. Only one of {cmd:n()} and {cmd:at()} is allowed.
1346 | 
1347 | {phang}
1348 |     {opt common} requests that a common set of evaluation points is used across
1349 |     all subpopulations. The default is to determine the evaluation points based on
1350 |     the data range within subpopulation. If {cmd:common} is specified, the
1351 |     evaluation points will be based on the data range in the total population.
1352 | 
1353 | {phang}
1354 |     [{cmd:l}|{cmd:r}]{cmd:tight} omits padding when determining the evaluation
1355 |     grid. Specify {cmd:tight} to omit padding on both sides, that is, to use a grid
1356 |     from the observed minimum to the observed maximum of the data. Specify
1357 |     {cmd:ltight} to omit padding only on the left, that is, to use the observed
1358 |     minimum as the lower bound of the grid. Specify {cmd:rtight} to omit padding
1359 |     only on the right, that is, to use the observed maximum as the upper bound
1360 |     of the grid. Options {cmd:tight}, {cmd:ltight}, and {cmd:rtight} have no
1361 |     effect if {cmd:range()} or {cmd:at()} is specified.
1362 | 
1363 | {phang}
1364 |     {opt range(a b)} specifies the range of the evaluation grid. The default
1365 |     is to determine the range of the grid from the data; see option {cmd:n()}. Option
1366 |     {cmd:range()} overrides {cmd:common}. Only one of {cmd:range()} and
1367 |     {cmd:at()} is allowed.
1368 | 
1369 | {phang}
1370 |     {opth at(numlist)} specifies a custom grid of evaluation points. Only
1371 |     one of {cmd:n()} and {cmd:at()} is allowed.
1372 | 
1373 | {phang}
1374 |     {cmd:unconditional} rescales results such that the
1375 |     density function integrates to the relative size of the subpopulation
1376 |     instead of 1. This is only relevant if option {cmd:over()} has been
1377 |     specified.
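
{pstd}
    The grid options can be sketched with the {cmd:nlsw88} example data. The
    first command evaluates the wage densities of unionized and nonunionized
    workers on a common grid of 50 points. The second command uses a custom grid
    from 0 to 30 in steps of 2.5, that is, 13 evaluation points:

        . {stata sysuse nlsw88, clear}
        . {stata dstat density wage, over(union) common n(50) ll(0)}
        . {stata dstat density wage, at(0(2.5)30) ll(0)}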
1378 | 
1379 | {marker hist}{...}
1380 | {dlgtab:Subcommand histogram}
1381 | 
1382 | {phang}
1383 |     {opt proportion} estimates proportions instead of densities.
1384 | 
1385 | {phang}
1386 |     {opt percent} estimates percent instead of densities.
1387 | 
1388 | {phang}
1389 |     {opt frequency} estimates frequencies instead of densities.
1390 | 
1391 | {phang}
1392 |     {cmd:n(}{cmd:#}|{it:method}{cmd:)} selects
1393 |     (the method to determine) the number of histogram bins. Specify {opt n(#)}
1394 |     to use {it:#} bins. Alternatively, specify {opt n(method)} to determine the
1395 |     number of bins automatically, where {it:method} may be one of the following:
1396 | 
1397 | {p2colset 13 23 25 2}{...}
1398 | {p2col:{opt sq:rt}}modified square-root choice as used by {helpb histogram}
1399 |     {p_end}
1400 | {p2col:{opt st:urges}}Sturges' formula
1401 |     {p_end}
1402 | {p2col:{opt ri:ce}}Rice rule
1403 |     {p_end}
1404 | {p2col:{opt do:ane}}Doane's formula
1405 |     {p_end}
1406 | {p2col:{opt sc:ott}}Scott's normal reference rule
1407 |     {p_end}
1408 | {p2col:{opt fd}}Freedman–Diaconis' choice
1409 |     {p_end}
1410 | {p2col:{opt ep}}power-maximizing number of equiprobable bins
1411 |     {p_end}
1412 | 
1413 | {pmore}
1414 |     The default is {cmd:n(sqrt)}; see help {helpb histogram} for details on this
1415 |     rule. For the other rules see {browse "http://en.wikipedia.org/wiki/Histogram"}. The generated
1416 |     bins will span the range of the observed data (within subpopulation).
1417 | 
1418 | {phang}
1419 |     {cmd:ep} uses (approximately) equal-probability bins instead of
1420 |     equal-width bins.
1421 | 
1422 | {phang}
1423 |     {opt common} requests that a common set of bin definitions is used across
1424 |     all subpopulations. The default is to determine the number of bins and the
1425 |     bin boundaries based on the data within subpopulation. If {cmd:common} is
1426 |     specified, the bin definitions will be based on the data in the total population.
1427 | 
1428 | {phang}
1429 |     {opth at(numlist)} specifies custom cutpoints for the bins (in ascending
1430 |     order). If {it:numlist} contains {it:n} numbers, {it:n}-1 bins will be
1431 |     created. Note that the constructed bins will cover all data only if the first
1432 |     cutpoint is smaller than or equal to the minimum of the data and the last
1433 |     cutpoint is larger than or equal to the maximum ({cmd:dstat} does {it:not} check
1434 |     this condition and does not display a warning if the condition is violated).
1435 | 
1436 | {phang}
1437 |     {cmd:discrete} treats the data as discrete and estimates the probability of
1438 |     each observed level in the data. The option is implemented as a
1439 |     redirection to subcommand {cmd:proportion} with option {cmd:nocategorical}. Options
1440 |     {cmd:n()} and {cmd:ep} are not allowed together with {cmd:discrete}; the other
1441 |     options are as described for {help dstat##prop:subcommand {bf:proportion}}.
1442 | 
1443 | {phang}
1444 |     {cmd:unconditional} rescales results by the relative size of
1445 |     the subpopulation. This is only relevant if option {cmd:over()} has been
1446 |     specified. {cmd:unconditional} is not allowed together with {cmd:frequency}.
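
{pstd}
    For example, the following commands use the {cmd:nlsw88} example data. The
    first command determines the number of bins by Sturges' formula and reports
    percentages. The second uses the custom cutpoints 0, 5, ..., 45 and reports
    frequencies; ten cutpoints yield nine bins. These cutpoints are assumed to
    cover the observed range of wages, because {cmd:dstat} does not check this
    (see option {cmd:at()} above):

        . {stata sysuse nlsw88, clear}
        . {stata dstat histogram wage, n(sturges) percent}
        . {stata dstat histogram wage, at(0(5)45) frequency}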
1447 | 
1448 | {marker prop}{...}
1449 | {dlgtab:Subcommand proportion}
1450 | 
1451 | {phang}
1452 |     {opt percent} estimates percent instead of proportions.
1453 | 
1454 | {phang}
1455 |     {opt frequency} estimates frequencies instead of proportions.
1456 | 
1457 | {phang}
1458 |     {opth at(numlist)} provides a custom list of levels for which to estimate
1459 |     proportions. The default is to use all levels observed in the data (across
1460 |     subpopulations).
1461 | 
1462 | {phang}
1463 |     {opt nocategorical} allows outcome variables that do not comply with
1464 |     Stata's rules for factor variables (e.g. variables that contain negative
1465 |     or noninteger values). This also affects how the coefficients are
1466 |     labeled in the output.
1467 | 
1468 | {phang}
1469 |     {cmd:unconditional} rescales proportions by the relative size of
1470 |     the subpopulation. This is only relevant if option {cmd:over()} has been
1471 |     specified. {cmd:unconditional} is not allowed together with {cmd:frequency}.
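
{pstd}
    For example, the following sketch uses the {cmd:nlsw88} example data. The
    first command estimates the distribution of {cmd:race} among nonunionized and
    unionized workers in percent. The second rescales the proportions by the
    relative size of each group:

        . {stata sysuse nlsw88, clear}
        . {stata dstat proportion race, over(union) percent}
        . {stata dstat proportion race, over(union) unconditional}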
1472 | 
1473 | {marker cdf}{...}
1474 | {dlgtab:Subcommands cdf and ccdf}
1475 | 
1476 | {phang}
1477 |     {opt percent} estimates percent instead of proportions.
1478 | 
1479 | {phang}
1480 |     {opt frequency} estimates frequencies instead of proportions.
1481 | 
1482 | {phang}
1483 |     {opt mid} applies midpoint adjustment to the estimated CDF. By default, the
1484 |     CDF at evaluation point {it:x} is defined as the proportion of data that is
1485 |     lower than or equal to {it:x}. If {cmd:mid} is specified, the CDF at
1486 |     point {it:x} is reduced by one half the proportion of data equal to
1487 |     {it:x}. {cmd:mid} only has an effect on the results for evaluation points
1488 |     that have a match in the data (unless {cmd:ipolate} is specified; see below). Only
1489 |     one of {cmd:mid} and {cmd:floor} is allowed.
1490 | 
1491 | {phang}
1492 |     {opt floor} defines the CDF at evaluation point {it:x} as the proportion
1493 |     of data that is lower than {it:x}, rather than lower than or equal to
1494 |     {it:x}. {cmd:floor} only has an effect on the results for evaluation points
1495 |     that have a match in the data (unless {cmd:ipolate} is specified;
1496 |     see below). Only one of {cmd:floor} and {cmd:mid} is allowed.
1497 | 
1498 | {phang}
1499 |     {opt n(#)} sets the number of points at which the CDF is to be
1500 |     evaluated. A regular grid of {it:#} points spanning the
1501 |     observed data range (within subpopulation) will be used. The default is
1502 |     {cmd:n(99)}. Only one of {cmd:n()} and {cmd:at()} is allowed.
1503 | 
1504 | {phang}
1505 |     {opt common} requests that a common set of evaluation points is used across
1506 |     all subpopulations. The default is to determine the evaluation points based on
1507 |     the data range within subpopulation. If {cmd:common} is specified, the
1508 |     evaluation points will be based on the data range in the total population.
1509 | 
1510 | {phang}
1511 |     {opt range(a b)} specifies the range of the evaluation grid. The default
1512 |     is to determine the range of the grid from the data; see option {cmd:n()}. Option
1513 |     {cmd:range()} overrides {cmd:common}. Only one of {cmd:range()} and
1514 |     {cmd:at()} is allowed.
1515 | 
1516 | {phang}
1517 |     {opth at(numlist)} provides a custom list of points at which to evaluate
1518 |     the CDF. Only one of {cmd:n()} and {cmd:at()} is allowed.
1519 | 
1520 | {phang}
1521 |     {cmd:discrete} treats the data as discrete. In this case, the CDF will
1522 |     be estimated at each level observed in the data
1523 |     (across all subpopulations). Option {cmd:n()} is not allowed if
1524 |     {cmd:discrete} is specified.
1525 | 
1526 | {phang}
1527 |     {cmd:ipolate} obtains the estimates of the CDF by linearly interpolating
1528 |     the values of the empirical CDF. That is, the estimates will lie on the
1529 |     piecewise linear curve that connects the points obtained when the CDF is
1530 |     evaluated at each observed level in the data (within subpopulation; options
1531 |     {cmd:mid} and {cmd:floor} affect the location of these
1532 |     points). By default, the estimates of the CDF are obtained according to the definitions
1533 |     described above (see {cmd:mid} and {cmd:floor}).
1534 | 
1535 | {phang}
1536 |     {cmd:unconditional} rescales results by the relative size of
1537 |     the subpopulation. This is only relevant if option {cmd:over()} has been
1538 |     specified. {cmd:unconditional} is not allowed together with {cmd:frequency}.
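
{pstd}
    A small worked example shows how the default definition and options
    {cmd:mid} and {cmd:floor} differ. Suppose the data are {it:1, 2, 2, 3} with
    equal weights, and the CDF is evaluated at {it:x} = 2. Three of the four
    observations are lower than or equal to 2, and two are equal to 2. The
    three definitions therefore give:

{p2colset 9 24 26 2}{...}
{p2col:default}{it:F}(2) = 3/4 = 0.75{p_end}
{p2col:{cmd:mid}}{it:F}(2) = 3/4 - (1/2)(2/4) = 0.5{p_end}
{p2col:{cmd:floor}}{it:F}(2) = 1/4 = 0.25{p_end}
{p2colreset}{...}

{pstd}
    The following sketch uses the {cmd:nlsw88} example data. The first command
    evaluates the midpoint-adjusted CDF of wages at 5, 10, and 20. The second
    estimates the CDF of years of schooling at each observed level:

        . {stata sysuse nlsw88, clear}
        . {stata dstat cdf wage, mid at(5 10 20)}
        . {stata dstat cdf grade, discrete}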
1539 | 
1540 | {marker quantile}{...}
1541 | {dlgtab:Subcommand quantile}
1542 | 
1543 | {phang}
1544 |     {opt n(#)} sets the number of quantiles to be computed. A regular grid
1545 |     of {it:#} points from {it:a}+{it:h} to {it:b}-{it:h} will be used,
1546 |     with {it:h} = ({it:b}-{it:a})/({it:#}+1) and {it:a} and {it:b}
1547 |     as set by option {cmd:range()}. The default is
1548 |     {cmd:n(99)}. Only one of {cmd:n()} and {cmd:at()} is allowed.
1549 | 
1550 | {phang}
1551 |     {opt range(a b)} specifies the range of the evaluation grid, {it:a} and
1552 |     {it:b} in [0,1]. The default is {cmd:range(0 1)}. Only one of {cmd:range()}
1553 |     and {cmd:at()} is allowed.
1554 | 
1555 | {phang}
1556 |     {opth at(numlist)} provides a custom list of probabilities at which to
1557 |     compute quantiles. The specified values must be within [0,1]. Only one of
1558 |     {cmd:n()} and {cmd:at()} is allowed.
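
{pstd}
    For example, consider {cmd:n(9)} with the default {cmd:range(0 1)}. Then
    {it:h} = (1-0)/(9+1) = 0.1, so the grid runs from 0.1 to 0.9 in steps of
    0.1; that is, {cmd:dstat} computes the deciles. The following sketch uses the
    {cmd:nlsw88} example data. The second command computes the quartiles via
    {cmd:at()}:

        . {stata sysuse nlsw88, clear}
        . {stata dstat quantile wage, n(9)}
        . {stata dstat quantile wage, at(.25 .5 .75)}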
1559 | 
1560 | {marker lorenz}{...}
1561 | {dlgtab:Subcommand lorenz}
1562 | 
1563 | {phang}
1564 |     {opt percent} expresses results in percent instead of
1565 |     proportions. {cmd:percent} is not allowed with
1566 |     {cmd:generalized} or {cmd:absolute}.
1567 | 
1568 | {phang}
1569 |     {opt generalized} estimates the generalized Lorenz curve.
1570 | 
1571 | {phang}
1572 |     {opt sum} estimates the total (unnormalized) Lorenz curve.
1573 | 
1574 | {phang}
1575 |     {opt gap} estimates the equality gap curve.
1576 | 
1577 | {phang}
1578 |     {opt absolute} estimates the absolute Lorenz curve.
1579 | 
1580 | {phang}
1581 |     {opth by(varname)} estimates the concentration curve with respect to
1582 |     {it:varname} instead of the Lorenz curve.
1583 | 
1584 | {phang}
1585 |     {opt n(#)} sets the number of ordinates to be estimated. A regular grid
1586 |     of {it:#} values from {it:a} to {it:b} will be used, with {it:a} and {it:b}
1587 |     as set by option {cmd:range()}. The default is {cmd:n(101)}. Only one of
1588 |     {cmd:n()} and {cmd:at()} is allowed.
1589 | 
1590 | {phang}
1591 |     {opt range(a b)} specifies the range of the evaluation grid, {it:a} and
1592 |     {it:b} in [0,1]. The default is {cmd:range(0 1)}. Only one of {cmd:range()}
1593 |     and {cmd:at()} is allowed.
1594 | 
1595 | {phang}
1596 |     {opth at(numlist)} provides a custom list of points at which to
1597 |     estimate Lorenz ordinates. The specified values must be within [0,1]. Only one of
1598 |     {cmd:n()} and {cmd:at()} is allowed.
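
{pstd}
    For example, with {cmd:n(11)} and the default {cmd:range(0 1)}, Lorenz
    ordinates are estimated at 0, 0.1, ..., 1. The following sketch uses the
    {cmd:nlsw88} example data. The second command estimates the corresponding
    concentration curve of wages with respect to working hours:

        . {stata sysuse nlsw88, clear}
        . {stata dstat lorenz wage, n(11)}
        . {stata dstat lorenz wage, n(11) by(hours)}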
1599 | 
1600 | {marker share}{...}
1601 | {dlgtab:Subcommand share}
1602 | 
1603 | {phang}
1604 |     {opt proportion} estimates proportions instead of densities.
1605 | 
1606 | {phang}
1607 |     {opt percent} estimates percent instead of densities.
1608 | 
1609 | {phang}
1610 |     {opt generalized} estimates generalized shares instead of densities.
1611 | 
1612 | {phang}
1613 |     {opt sum} estimates totals instead of densities.
1614 | 
1615 | {phang}
1616 |     {opt average} estimates averages instead of densities.
1617 | 
1618 | {phang}
1619 |     {opth by(varname)} estimates the concentration shares with respect to
1620 |     {it:varname}.
1621 | 
1622 | {phang}
1623 |     {opt n(#)} sets the number of bins. A regular grid of {it:#} bins between
1624 |     0 and 1 will be used. The default is {cmd:n(20)}.
1625 | 
1626 | {phang}
1627 |     {opth at(numlist)} specifies custom cutpoints for the bins (in ascending
1628 |     order). The specified values must be within [0,1]. If {it:numlist} contains
1629 |     {it:n} numbers, {it:n}-1 bins will be created. Note that the constructed
1630 |     bins will cover all data only if the first cutpoint is 0 and the last
1631 |     cutpoint is 1.
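
{pstd}
    For example, the following sketch uses the {cmd:nlsw88} example data. It
    estimates quintile shares of wages, that is, the proportion of total wages
    received by each fifth of the wage distribution. It then reports the same
    shares as percentages:

        . {stata sysuse nlsw88, clear}
        . {stata dstat share wage, n(5) proportion}
        . {stata dstat share wage, n(5) percent}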
1632 | 
1633 | {marker tip}{...}
1634 | {dlgtab:Subcommand tip}
1635 | 
1636 | {phang}
1637 |     {opt pline(#|varname)} specifies the poverty line, either as a single
1638 |     value or as a variable containing observation-specific
1639 |     values. Option {cmd:pline()} is required.
1640 | 
1641 | {phang}
1642 |     {opt absolute} estimates the absolute TIP curve. Default is to estimate the
1643 |     relative TIP curve.
1644 | 
1645 | {phang}
1646 |     {opt pstrong} selects the poverty definition to be applied (see Donaldson and
1647 |     Weymark 1986). The default is to use the "weak" definition, that is, to treat
1648 |     outcomes equal to the poverty line as non-poor. Specify {cmd:pstrong} to treat
1649 |     these cases as poor ("strong" definition).
1650 | 
1651 | {phang}
1652 |     {opt n(#)} sets the number of ordinates to be estimated. A regular grid
1653 |     of {it:#} values from {it:a} to {it:b} will be used, with {it:a} and {it:b}
1654 |     as set by option {cmd:range()}. The default is {cmd:n(101)}. Only one of {cmd:n()}
1655 |     and {cmd:at()} is allowed.
1656 | 
1657 | {phang}
1658 |     {opt range(a b)} specifies the range of the evaluation grid, {it:a} and
1659 |     {it:b} in [0,1]. The default is {cmd:range(0 1)}. Only one of {cmd:range()}
1660 |     and {cmd:at()} is allowed.
1661 | 
1662 | {phang}
1663 |     {opth at(numlist)} provides a custom list of points at which to
1664 |     estimate the ordinates. The specified values must be within [0,1]. Only one of
1665 |     {cmd:n()} and {cmd:at()} is allowed.
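
{pstd}
    For example, the following commands assume a hypothetical poverty line of 4
    dollars per hour for the hourly wages in the {cmd:nlsw88} example data. The
    value is chosen for illustration only. The two commands estimate the relative
    and the absolute TIP curves of wages:

        . {stata sysuse nlsw88, clear}
        . {stata dstat tip wage, pline(4)}
        . {stata dstat tip wage, pline(4) absolute}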
1666 | 
1667 | {marker graph_options}{...}
1668 | {dlgtab:Graph options}
1669 | 
1670 | {phang}
1671 |     {cmd:merge} causes results from different equations to be placed
1672 |     in a single graph (as separate "plots", i.e. as separate series of results
1673 |     displayed in a common style) instead of creating a separate subgraph for
1674 |     each equation. This is only relevant if the results contain multiple
1675 |     equations and if the equations are one-dimensional
1676 |     (e.g. subpopulations); {cmd:merge} has no effect if the
1677 |     equations are two-dimensional (subpopulations and variables).
1678 | 
1679 | {phang}
1680 |     {cmd:overlay} is a synonym for {cmd:merge}.
1681 | 
1682 | {phang}
1683 |     {cmd:flip} changes how results are allocated to plots and subgraphs. This is
1684 |     only relevant if the results contain multiple equations. If the equations
1685 |     are two-dimensional (subpopulations and variables), the default is to
1686 |     create subgraphs by the secondary dimension (variables) and create
1687 |     "plots" (series of results displayed in a common style) within subgraphs by
1688 |     the main dimension (subpopulations). Specify {cmd:flip} to reverse this
1689 |     behavior. If equations are one-dimensional, {cmd:flip} has the same effect
1690 |     as {cmd:merge}.
1691 | 
1692 | {phang}
1693 |     [{cmd:g}|{cmd:p}]{cmdab:sel:ect}{cmd:(}{it:{help numlist}}|{cmdab:r:everse}{cmd:)}
1694 |     selects and orders subgraphs and plots within
1695 |     subgraphs. {it:numlist} specifies the indices of the subgraphs or plots to
1696 |     be included. For example, in a situation where the default graph has three
1697 |     subgraphs (containing one plot each), you could type {cmd:select(3 1)} to
1698 |     omit the 2nd subgraph and reverse the order such that the 3rd subgraph comes
1699 |     first. Instead of providing {it:numlist}, type {cmd:select(reverse)}
1700 |     to reverse the order of subgraphs or plots.
1701 | 
1702 | {pmore}
1703 |     {cmd:select()} applies to both subgraphs and plots within subgraphs. If a
1704 |     graph contains multiple subgraphs and multiple plots within subgraphs, use option
1705 |     {cmd:gselect()} to select and order subgraphs, and use option {cmd:pselect()}
1706 |     to select and order plots.
1707 | 
1708 | {pmore}
1709 |     {cmd:select()}, {cmd:gselect()}, and {cmd:pselect()} only have an effect if
1710 |     there are multiple elements to choose from. That is,
1711 |     single subgraphs or single plots will always be displayed, irrespective of
1712 |     what you type in these options.
1713 | 
1714 | {phang}
1715 |     {opt cref} causes results from the reference (sub)population to be
1716 |     included in the graph. The default is to suppress these
1717 |     results. {cmd:cref} is only relevant if {cmd:over(, contrast())} has been
1718 |     specified.
1719 | 
1720 | {phang}
1721 |     {cmd:bystats}[{cmd:(}{cmdab:m:ain}|{cmdab:s:econdary}{cmd:)}] treats coefficients as equations and
1722 |     equations as coefficients. This is only relevant after
1723 |     {cmd:dstat summarize} and only has an effect if the results contain multiple
1724 |     equations. The effect of {cmd:bystats} typically is that results are grouped
1725 |     by statistics rather than by subpopulations or variables (the option may
1726 |     also have the opposite effect depending on how exactly {cmd:dstat} returned its
1727 |     results). Optionally, type {cmd:bystats(main)} (the default) or
1728 |     {cmd:bystats(secondary)} to specify whether coefficients should replace the
1729 |     main dimension or the secondary dimension of the equations, respectively. This
1730 |     is only relevant if the equations contain two dimensions
1731 |     (subpopulations and variables).
1732 | 
1733 | {phang}
1734 |     [{cmd:no}]{cmd:step} enforces or prevents using a step function to display
1735 |     the distribution function. This is only relevant after {cmd:dstat cdf}
1736 |     and {cmd:dstat ccdf}. The default is to display the CDF as a step function
1737 |     if option {cmd:discrete} (but not {cmd:ipolate}) has been specified, and to
1738 |     use straight lines otherwise. Specify {cmd:step} or {cmd:nostep}
1739 |     to override the default.
1740 | 
1741 | {phang}
1742 |     {cmd:norefline} suppresses the equality line (diagonal) that is drawn
1743 |     when plotting results from {cmd:dstat lorenz} (unless option
1744 |     {cmd:generalized}, {cmd:gap}, or {cmd:absolute} has been specified).
1745 | 
1746 | {phang}
1747 |     {opt refline(line_options)} specifies options to affect the rendition of
1748 |     the equality line; see {it:{help line_options}}. This is only relevant after
1749 |     {cmd:dstat lorenz}.
1750 | 
1751 | {marker coefplot}{...}
1752 | {phang}
1753 |     {it:coefplot_options} are options to be passed through to
1754 |     {helpb coefplot}. Use these options, for example, to set titles and axis
1755 |     labels or to affect the overall look and size of the graph. The options can
1756 |     also be used to change the rendering of the plotted results (e.g. colors,
1757 |     line patterns, marker symbols, etc.). If a graph contains multiple plots
1758 |     (multiple series of results displayed in a common style), option
1759 |     {cmd:p}{it:#}{cmd:()} can be used to address the {it:#}th plot. For example,
1760 |     you could type {cmd:p2(recast(dropline) pstyle(p5) noci)} to change the
1761 |     {it:plottype} of the 2nd plot to {cmd:dropline}, change its {it:pstyle}
1762 |     to {cmd:p5} (instead of the default {cmd:p2}), and suppress its confidence
1763 |     intervals.
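
{pstd}
    The following sketch shows how these options might be combined, using the
    {cmd:nlsw88} example data. The commands estimate wage densities by union
    status and display both curves in a single graph in reverse order. They then
    change the style of the second plot. The last two commands show how
    {cmd:bystats} might regroup the results of {cmd:dstat summarize} by
    statistic rather than by subpopulation:

        . {stata sysuse nlsw88, clear}
        . {stata dstat density wage, over(union) ll(0)}
        . {stata dstat graph, merge select(reverse)}
        . {stata dstat graph, merge p2(pstyle(p5))}
        . {stata dstat (mean median) wage, over(union)}
        . {stata dstat graph, bystats}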
1764 | 
1765 | {marker predict_options}{...}
1766 | {dlgtab:Predict options}
1767 | 
1768 | {phang}
1769 |     {opt rif} generates recentered influence functions (RIFs) instead of regular
1770 |     influence functions. RIFs are defined such that their mean is equal to the
1771 |     statistic in question (Firpo et al. 2009; also see Rios-Avila 2020)
1772 |     and the standard error of the mean (as computed by
1773 |     command {helpb mean}) provides an estimate of the standard error of the
1774 |     statistic. The default is to store influence functions defined in a way such
1775 |     that their total is zero and the standard error of the total (as computed by
1776 |     command {helpb total}) provides an estimate of the standard error of the
1777 |     statistic.
1778 | 
1779 | {phang}
1780 |     {opt scaling(spec)} determines the scaling of the generated
1781 |     influence functions. {it:spec} can be {cmdab:t:otal} (scaling for
1782 |     analysis by {helpb total}) or {cmdab:m:ean} (scaling for analysis by
1783 |     {helpb mean}). The default is {cmd:scaling(total)} for regular influence
1784 |     functions and {cmd:scaling(mean)} for recentered influence functions
1785 |     (i.e. if option {cmd:rif} is specified).
1786 | 
1787 | {phang}
1788 |     {opt compact} generates influence functions in compact form. {cmd:compact}
1789 |     only has an effect if {cmd:over()} has been specified and is not allowed
1790 |     with {cmd:balance()}, {cmd:unconditional}, {cmd:over(, contrast)}, or
1791 |     {cmd:over(, accumulate)}. Furthermore, {cmd:compact} is not supported
1792 |     for statistics that are not normalized by the sample size (i.e. frequencies
1793 |     or totals).
1794 | 
1795 | {pmore}
1796 |     The default is to generate one influence function for each single parameter
1797 |     estimated by {cmd:dstat}. If {cmd:over()} is specified, this means that
1798 |     each statistic in each subpopulation has its own influence
1799 |     function. Specify {cmd:compact} to merge the influence functions across
1800 |     subpopulations. In this case, {cmd:over()} has to be specified when
1801 |     analyzing the influence functions.
1802 | 
1803 | {phang}
1804 |     {opt quietly} suppresses the list of generated variables that is displayed by
1805 |     default.
1806 | 
1807 | {pstd}
1808 |     Note that weights, if specified, will not be incorporated into the
1809 |     influence functions, so that the weights can be
1810 |     applied when analyzing the influence functions. The influence functions do,
1811 |     however, incorporate the balancing weights (net of base weights)
1812 |     from option {cmd:balance()}.
1813 | 
1814 | {pstd}
1815 |     Furthermore, note that {cmd:dstat} generates scores instead of
1816 |     influence functions for statistics that are not normalized by the sample
1817 |     size (i.e. frequencies or totals). The difference is that the total of an influence function
1818 |     across the estimation sample is zero, whereas the total of the score is
1819 |     equal to the statistic in question. Returning scores for frequencies and totals
1820 |     ensures that standard errors obtained by {cmd:total} will be correct for these
1821 |     statistics in complex survey designs.
1822 | 
1823 | 
1824 | {marker examples}{...}
1825 | {title:Examples}
1826 | 
1827 | {dlgtab:Summary statistics}
1828 | 
1829 | {pstd}
1830 |     {cmd:dstat summarize} supports a long list of summary statistics. For example, the following
1831 |     command computes the arithmetic mean, geometric mean, median, 5% trimmed mean, 5%
1832 |     winsorized mean, 95%-efficiency Huber M estimate, and Hodges-Lehmann location of wages
1833 |     for unionized and nonunionized workers:
1834 | 
1835 |         . {stata sysuse nlsw88, clear}
1836 | {p 8 12 2}
1837 |         . {stata dstat (mean gmean median trim5 winsor5 huber95 hl) wage, over(union)}
1838 |         {p_end}
1839 | 
1840 | {pstd}
1841 |     Results can be computed for multiple variables, and statistics may
1842 |     differ across variables. The following command estimates the
1843 |     Gini coefficient, mean log deviation, and variance of logarithms of wages,
1844 |     the means of working hours and work experience, as well as the proportion of whites
1845 |     ({cmd:race}=1), blacks ({cmd:race}=2), and others ({cmd:race}=3):
1846 | 
1847 | {p 8 12 2}
1848 |         . {stata dstat (gini mld vlog) wage (mean) hours ttl_exp (pr1 pr2 pr3) race}
1849 |         {p_end}
1850 | 
1851 | {dlgtab:Distribution functions}
1852 | 
1853 | {pstd}
1854 |     {cmd:dstat} supports the estimation of several types of distribution
1855 |     functions. For example, the density function of wages by union status can
1856 |     be obtained as follows:
1857 | 
1858 |         . {stata sysuse nlsw88, clear}
1859 |         . {stata dstat density wage, over(union) ll(0) graph}
1860 | 
1861 | {pstd}
1862 |     Option {cmd:graph} has been specified so that a graph is drawn. The coefficients
1863 |     table will be suppressed in this case; specify option {cmd:table} to enforce displaying the
1864 |     coefficients table. An alternative would be to
1865 |     omit the {cmd:graph} option and then type {cmd:dstat graph} after estimation.
1866 |     Option {cmd:ll(0)} has been specified because wages can only be positive. The option
1867 |     causes density estimation to be restricted to the positive domain
1868 |     and applies appropriate boundary correction.
1869 | 
1870 | {pstd}
1871 |     In the example above, the density estimates for unionized and nonunionized
1872 |     workers have been displayed in two separate subgraphs. Apply graph option
1873 |     {cmd:merge} to overlay the two curves in a single coordinate system:
1874 | 
1875 |         . {stata dstat graph, merge}
1876 | 
1877 | {pstd}
1878 |     To see how the overall wage distribution is composed of the two
1879 |     groups, we can, for example, rescale the density estimates by group size
1880 |     using option {cmd:unconditional} and include the total density using option
1881 |     {cmd:total}:
1882 | 
1883 | {p 8 12 2}
1884 |         . {stata dstat density wage, over(union) total unconditional ll(0) graph(merge)}
1885 |         {p_end}
1886 | 
1887 | {dlgtab:Covariate balancing}
1888 | 
1889 | {pstd}
1890 |     The {cmd:balance()} option can be used to adjust results for differences in
1891 |     covariate distributions when comparing subpopulations. By default,
1892 |     {cmd:dstat} employs inverse probability weighting (IPW) to balance the
1893 |     covariates and obtains the relevant reference distribution from the total
1894 |     sample. That is, in each subpopulation the covariate distribution is
1895 |     adjusted such that it resembles the covariate distribution observed in the
1896 |     total population. Use the {cmd:reference()} suboption to change the reference
1897 |     distribution.
1898 | 
1899 | {pstd}
1900 |     For example, the difference in average wages between unionized and
1901 |     nonunionized workers can be obtained as follows:
1902 | 
1903 |         . {stata sysuse nlsw88, clear}
1904 |         . {stata dstat (mean) wage, over(union)}
1905 |         . {stata lincom _b[1.union]-_b[0.union]}
1906 | 
1907 | {pstd}
1908 |     Controlling for education, working hours, work experience and tenure reduces
1909 |     the mean difference by about a third (note that there has been a small change
1910 |     in the estimation sample due to missing values; for a more valid comparison,
1911 |     the raw difference should be computed based on the same sample as the
1912 |     balanced difference):
1913 | 
1914 | {p 8 12 2}
1915 |         . {stata dstat (mean) wage, over(union) balance(grade hours ttl_exp tenure)}
1916 |         {p_end}
1917 |         . {stata lincom _b[1.union]-_b[0.union]}
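
{pstd}
    A raw difference based on the same sample as the balanced difference can be
    obtained, for example, by restricting the unbalanced estimation to the
    estimation sample of the balanced model using {cmd:e(sample)}, as is also done
    in the decomposition example further below. Because {helpb lincom} leaves the
    results in {cmd:e()} intact, this can be typed directly after the commands above:

        . {stata "dstat (mean) wage if e(sample), over(union)"}
        . {stata lincom _b[1.union]-_b[0.union]}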
1918 | 
1919 | {pstd}
1920 |     To evaluate how successful the balancing was, use suboption {cmd:generate()} to store
1921 |     the balancing weights and compare the covariate means with and without these weights:
1922 | 
1923 | {p 8 12 2}
1924 |         . {stata dstat (mean) wage, over(union) balance(grade hours ttl_exp tenure, generate(wbal))}
1925 |         {p_end}
1926 | {p 8 12 2}
1927 |         . {stata tabstat grade hours ttl_exp tenure if wage<., by(union)} (unbalanced)
1928 |         {p_end}
1929 | {p 8 12 2}
1930 |         . {stata tabstat grade hours ttl_exp tenure [aw=wbal], by(union)} (balanced)
1931 |         {p_end}
1932 |         . {stata drop wbal}
1933 | 
1934 | {pstd}
1935 |     The balancing was only partially successful. Perfect balancing with respect
1936 |     to the means can be achieved by entropy balancing (Hainmueller 2012):
1937 | 
1938 | {p 8 12 2}
1939 |         . {stata "dstat (mean) wage, over(union) balance(eb: grade hours ttl_exp tenure, generate(wbal))"}
1940 |         {p_end}
1941 | {p 8 12 2}
1942 |         . {stata tabstat grade hours ttl_exp tenure [aw=wbal], by(union)}
1943 |         {p_end}
1944 |         . {stata drop wbal}
1945 | 
1946 | {pstd}
1947 |     Note that, instead of using {helpb lincom} after estimation, you can also obtain group
1948 |     differences directly using suboption {cmd:contrast} within the {cmd:over()}
1949 |     option:
1950 | 
1951 | {p 8 12 2}
1952 |         . {stata "dstat (mean) wage, over(union, contrast(0)) balance(eb: grade hours ttl_exp tenure)"}
1953 |         {p_end}
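
{pstd}
    The reference distribution can be changed using suboption {cmd:reference()}.
    The following sketch assumes that {cmd:reference()} works here as in the
    decomposition example further below. It uses the covariate distribution of
    unionized workers ({cmd:union}=1) instead of the total sample as the reference,
    so that the contrast compares unionized workers with nonunionized workers
    reweighted to resemble unionized workers in terms of the covariates:

{p 8 12 2}
        . {stata "dstat (mean) wage, over(union, contrast(0)) balance(eb: grade hours ttl_exp tenure, reference(1))"}
        {p_end}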
1954 | 
1955 | {dlgtab:Influence functions}
1956 | 
1957 | {pstd}
1958 |     {cmd:dstat} can store the influence functions or the recentered
1959 |     influence functions (RIFs) of the computed statistics. The influence functions
1960 |     or RIFs can then be used in further analyses. Here is an example of
1961 |     RIF regressions (Firpo et al. 2009) for the Gini coefficient and the
1962 |     mean log deviation:
1963 | 
1964 |         . {stata sysuse nlsw88, clear}
1965 |         . {stata dstat (gini mld) wage, rif(gini mld)}
1966 |         . {stata regress gini union south smsa, robust}
1967 |         . {stata regress mld union south smsa, robust}
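
{pstd}
    Since an influence function has mean zero and the RIF adds the value of the
    statistic back to it, the sample means of the generated RIF variables should
    closely reproduce the Gini coefficient and the mean log deviation reported by
    {cmd:dstat}. This provides a quick sanity check:

        . {stata summarize gini mld}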
1968 | 
1969 | {pstd}
1970 |     The RIFs are also useful for decomposition analysis. In the following example,
1971 |     the wage gap between unionized and nonunionized workers is decomposed into
1972 |     a part explained by differences in covariates and a residual (unexplained) part,
1973 |     using reweighting based on entropy balancing with the covariate distribution
1974 |     of unionized workers as the reference distribution:
1975 | 
1976 | {p 8 12 2}
1977 |         . {stata "dstat (mean) wage, over(union) balance(eb: grade hours ttl_exp tenure, reference(1)) rif(RIF0c)"}
1978 |         {p_end}
1979 | {p 8 12 2}
1980 |         . {stata "dstat (mean) wage if e(sample), over(union) rif(RIF0 RIF1)"}
1981 |         {p_end}
1982 |         . {stata generate difference  = RIF1  - RIF0}
1983 |         . {stata generate explained   = RIF0c - RIF0}
1984 |         . {stata generate unexplained = RIF1  - RIF0c}
1985 |         . {stata mean difference explained unexplained}
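
{pstd}
    To express the explained part as a share of the total gap, the estimates from
    {helpb mean} can be combined, for example, using {helpb nlcom}:

        . {stata nlcom _b[explained]/_b[difference]}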
1986 | 
1987 | 
1988 | {marker methods}{...}
1989 | {title:Methods and formulas}
1990 | 
1991 | {pstd}
1992 |     (under construction)
1993 | 
1994 | 
1995 | {marker saved_results}{...}
1996 | {title:Saved results}
1997 | 
1998 | {pstd}
1999 |     Depending on options, {cmd:dstat} stores a selection of the following
2000 |     results in {cmd:e()}.
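
{pstd}
    To see which results a specific call has stored, type {cmd:ereturn list} after
    estimation; individual matrices can be displayed using {cmd:matrix list}. For
    example:

        . {stata sysuse nlsw88, clear}
        . {stata dstat (mean) wage, over(union)}
        . {stata ereturn list}
        . {stata matrix list e(b)}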
2001 | 
2002 | {synoptset 20 tabbed}{...}
2003 | {p2col 5 20 24 2: Scalars}{p_end}
2004 | {synopt:{cmd:e(N)}}number of observations{p_end}
2005 | {synopt:{cmd:e(W)}}sum of weights{p_end}
2006 | {synopt:{cmd:e(N_over)}}number of subpopulations{p_end}
2007 | {synopt:{cmd:e(N_clust)}}number of clusters{p_end}
2008 | {synopt:{cmd:e(N_vars)}}number of variables{p_end}
2009 | {synopt:{cmd:e(N_stats)}}number of (unique) summary statistics{p_end}
2010 | {synopt:{cmd:e(k_eq)}}number of equations in {cmd:e(b)}{p_end}
2011 | {synopt:{cmd:e(k_omit)}}number of omitted estimates{p_end}
2012 | {synopt:{cmd:e(df_r)}}sample degrees of freedom{p_end}
2013 | {synopt:{cmd:e(qdef)}}quantile definition{p_end}
2014 | {synopt:{cmd:e(adaptive)}}number of iterations of adaptive density estimator{p_end}
2015 | {synopt:{cmd:e(napprox)}}size of density estimation grid{p_end}
2016 | {synopt:{cmd:e(pad)}}padding of density estimation grid{p_end}
2017 | {synopt:{cmd:e(ll)}}lower boundary of the data support (density estimation){p_end}
2018 | {synopt:{cmd:e(ul)}}upper boundary of the data support (density estimation){p_end}
2019 | {synopt:{cmd:e(level)}}confidence level{p_end}
2020 | 
2021 | {synoptset 20 tabbed}{...}
2022 | {p2col 5 20 24 2: Macros}{p_end}
2023 | {synopt:{cmd:e(cmd)}}{cmd:dstat}{p_end}
2024 | {synopt:{cmd:e(subcmd)}}{cmd:summarize}, {cmd:density}, {cmd:histogram}, {cmd:proportion}, {cmd:cdf}, {cmd:ccdf}, {cmd:quantile}, {cmd:lorenz}, or {cmd:share}{p_end}
2025 | {synopt:{cmd:e(predict)}}{cmd:dstat predict}{p_end}
2026 | {synopt:{cmd:e(cmdline)}}command as typed{p_end}
2027 | {synopt:{cmd:e(depvar)}}name(s) of analyzed variable(s){p_end}
2028 | {synopt:{cmd:e(nocasewise)}}{bf:nocasewise} or empty{p_end}
2029 | {synopt:{cmd:e(over)}}name of {it:overvar}{p_end}
2030 | {synopt:{cmd:e(over_namelist)}}values of subpopulations{p_end}
2031 | {synopt:{cmd:e(over_labels)}}labels of subpopulations{p_end}
2032 | {synopt:{cmd:e(over_select)}}values of selected subpopulations{p_end}
2033 | {synopt:{cmd:e(over_contrast)}}{cmd:total}, {it:#}, {cmd:lag}, {cmd:lead}, or empty{p_end}
2034 | {synopt:{cmd:e(over_ratio)}}{cmd:ratio} or {cmd:lnratio} or empty{p_end}
2035 | {synopt:{cmd:e(over_accumulate)}}{cmd:accumulate} or empty{p_end}
2036 | {synopt:{cmd:e(over_fixed)}}{cmd:fixed} or empty{p_end}
2037 | {synopt:{cmd:e(total)}}{cmd:total} or empty{p_end}
2038 | {synopt:{cmd:e(unconditional)}}{cmd:unconditional} or empty{p_end}
2039 | {synopt:{cmd:e(balance)}}list of balancing variables{p_end}
2040 | {synopt:{cmd:e(balmethod)}}balancing method{p_end}
2041 | {synopt:{cmd:e(balref)}}balancing reference{p_end}
2042 | {synopt:{cmd:e(balopts)}}options passed through to balancing procedure{p_end}
2043 | {synopt:{cmd:e(bwmethod)}}bandwidth selection as specified in {cmd:bwidth()}{p_end}
2044 | {synopt:{cmd:e(kernel)}}kernel as specified in {cmd:kernel()}{p_end}
2045 | {synopt:{cmd:e(exact)}}{cmd:exact} or empty{p_end}
2046 | {synopt:{cmd:e(boundary)}}boundary correction method{p_end}
2047 | {synopt:{cmd:e(hdtrim)}}{cmd:hdtrim()} as specified{p_end}
2048 | {synopt:{cmd:e(mqopts)}}{cmd:mqopts()} as specified{p_end}
2049 | {synopt:{cmd:e(novalues)}}{cmd:novalues} or empty{p_end}
2050 | {synopt:{cmd:e(vformat)}}display format specified in {cmd:vformat()}{p_end}
2051 | {synopt:{cmd:e(stats)}}list of (unique) summary statistics{p_end}
2052 | {synopt:{cmd:e(slist)}}normalized specification of statistics and variables{p_end}
2053 | {synopt:{cmd:e(percent)}}{cmd:percent} or empty{p_end}
2054 | {synopt:{cmd:e(proportion)}}{cmd:proportion} or empty{p_end}
2055 | {synopt:{cmd:e(frequency)}}{cmd:frequency} or empty{p_end}
2056 | {synopt:{cmd:e(mid)}}{cmd:mid} or empty{p_end}
2057 | {synopt:{cmd:e(floor)}}{cmd:floor} or empty{p_end}
2058 | {synopt:{cmd:e(ipolate)}}{cmd:ipolate} or empty{p_end}
2059 | {synopt:{cmd:e(discrete)}}{cmd:discrete} or empty{p_end}
2060 | {synopt:{cmd:e(categorical)}}{cmd:categorical} or empty{p_end}
2061 | {synopt:{cmd:e(ep)}}{cmd:ep} or empty{p_end}
2062 | {synopt:{cmd:e(gap)}}{cmd:gap} or empty{p_end}
2063 | {synopt:{cmd:e(generalized)}}{cmd:generalized} or empty{p_end}
2064 | {synopt:{cmd:e(absolute)}}{cmd:absolute} or empty{p_end}
2065 | {synopt:{cmd:e(average)}}{cmd:average} or empty{p_end}
2066 | {synopt:{cmd:e(relax)}}{cmd:relax} or empty{p_end}
2067 | {synopt:{cmd:e(byvar)}}name of variable specified in {cmd:by()}{p_end}
2068 | {synopt:{cmd:e(pline)}}poverty line variable specified in {cmd:pline()}{p_end}
2069 | {synopt:{cmd:e(pstrong)}}{cmd:pstrong} or empty{p_end}
2070 | {synopt:{cmd:e(generate)}}name(s) of generated variable(s){p_end}
2071 | {synopt:{cmd:e(clustvar)}}name of cluster variable{p_end}
2072 | {synopt:{cmd:e(vce)}}{it:vcetype} specified in {cmd:vce()}{p_end}
2073 | {synopt:{cmd:e(vcetype)}}title used to label Std. Err.{p_end}
2074 | {synopt:{cmd:e(citype)}}type of confidence interval stored in {cmd:e(ci)}{p_end}
2075 | {synopt:{cmd:e(wtype)}}weight type{p_end}
2076 | {synopt:{cmd:e(wexp)}}weight expression{p_end}
2077 | {synopt:{cmd:e(title)}}title in estimation output{p_end}
2078 | {synopt:{cmd:e(properties)}}{cmd:b} or {cmd:b V}{p_end}
2079 | 
2080 | {synoptset 20 tabbed}{...}
2081 | {p2col 5 20 24 2: Matrices}{p_end}
2082 | {synopt:{cmd:e(b)}}estimates{p_end}
2083 | {synopt:{cmd:e(V)}}variance-covariance matrix of estimates{p_end}
2084 | {synopt:{cmd:e(se)}}standard errors of estimates{p_end}
2085 | {synopt:{cmd:e(ci)}}confidence intervals of estimates{p_end}
2086 | {synopt:{cmd:e(nobs)}}number of observations per estimate{p_end}
2087 | {synopt:{cmd:e(sumw)}}sum of weights per estimate{p_end}
2088 | {synopt:{cmd:e(at)}}evaluation points of distribution function{p_end}
2089 | {synopt:{cmd:e(omit)}}indicator for omitted estimates{p_end}
2090 | {synopt:{cmd:e(id)}}subpopulation IDs of estimates{p_end}
2091 | {synopt:{cmd:e(cref)}}contrast reference indicators{p_end}
2092 | {synopt:{cmd:e(bwidth)}}kernel bandwidth(s) of density estimation{p_end}
2093 | {synopt:{cmd:e(_N)}}number of observations by subpopulation{p_end}
2094 | {synopt:{cmd:e(_W)}}sum of weights by subpopulation{p_end}
2095 | 
2096 | {synoptset 20 tabbed}{...}
2097 | {p2col 5 20 24 2: Functions}{p_end}
2098 | {synopt:{cmd:e(sample)}}estimation sample{p_end}
2099 | {p2colreset}{...}
2100 | 
2101 | {pstd}
2102 |     If {cmd:vce()} is {cmd:svy}, {cmd:bootstrap}, or {cmd:jackknife}, additional
2103 |     results are stored in {cmd:e()}; see {helpb svy}, {helpb bootstrap}, and
2104 |     {helpb jackknife}, respectively.
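
{pstd}
    For example, assuming the standard {cmd:vce(jackknife)} syntax, jackknife
    standard errors can be requested as follows, after which {cmd:ereturn list}
    also shows the results added by {helpb jackknife}:

        . {stata sysuse nlsw88, clear}
        . {stata dstat (mean) wage, over(union) vce(jackknife)}
        . {stata ereturn list}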
2105 | 
2106 | 
2107 | {marker references}{...}
2108 | {title:References}
2109 | 
2110 | {phang}
2111 |     Akinshin, A. (2021). Trimmed Harrell-Davis quantile estimator based on the
2112 |     highest density interval of the given
2113 |     width. {browse "http://arxiv.org/abs/2111.11776":arXiv:2111.11776} [stat.ME].
2114 |     {p_end}
2115 | {phang}
2116 |     Botev, Z.I., J.F. Grotowski, and D.P. Kroese (2010). Kernel density
2117 |     estimation via diffusion. Annals of Statistics
2118 |     38(5): 2916-2957. DOI: {browse "http://doi.org/10.1214/10-AOS799":10.1214/10-AOS799}.
2119 |     {p_end}
2120 | {phang}
2121 |     Brys, G., M. Hubert, A. Struyf (2004). A Robust Measure of Skewness.
2122 |     Journal of Computational and Graphical Statistics 13(4): 996-1017.
2123 |     {p_end}
2124 | {phang}
2125 |     Brys, G., M. Hubert, A. Struyf (2006). Robust measures of tail weight.
2126 |     Computational Statistics & Data Analysis 50: 733-759.
2127 |     {p_end}
2128 | {phang}
2129 |     Clark, S., R. Hemming, D. Ulph (1981). On Indices for the Measurement of Poverty. The
2130 |     Economic Journal 91(362): 515-526.
2131 |     {p_end}
2132 | {phang}
2133 |     Cwik, J., J. Mielniczuk (1993). Data-dependent bandwidth choice for a grade density
2134 |     kernel estimate. Statistics & Probability Letters 16: 397-405.
2135 |     {p_end}
2136 | {phang}
2137 |     Deville, J.-C. (1999). Variance estimation for complex statistics and
2138 |     estimators: Linearization and residual techniques. Survey Methodology 25: 193-203.
2139 |     {p_end}
2140 | {phang}
2141 |     DiNardo, J.E., N. Fortin, T. Lemieux (1996). Labour Market Institutions and
2142 |     the Distribution of Wages, 1973-1992: A Semiparametric Approach. Econometrica
2143 |     64(5): 1001-1046.
2144 |     {p_end}
2145 | {phang}
2146 |     Donaldson, D., J.A. Weymark (1986). Properties of Fixed-Population Poverty Indices. International
2147 |     Economic Review 27(3): 667-688.
2148 |     {p_end}
2149 | {phang}
2150 |     Firpo, S., N.M. Fortin, T. Lemieux (2009). Unconditional Quantile
2151 |     Regressions. Econometrica 77: 953-973.
2152 |     {p_end}
2153 | {phang}
2154 |     Foster, J., J. Greer, E. Thorbecke (1984). A class of decomposable poverty
2155 |     measures. Econometrica 52(3): 761-766.
2156 |     {p_end}
2157 | {phang}
2158 |     Foster, J., J. Greer, E. Thorbecke (2010). The Foster–Greer–Thorbecke (FGT) poverty measures: 25 years
2159 |     later. The Journal of Economic Inequality 8: 491–524.
2160 |     {p_end}
2161 | {phang}
2162 |     Hainmueller, J. (2012). Entropy Balancing for Causal Effects: A Multivariate
2163 |     Reweighting Method to Produce Balanced Samples in Observational Studies.
2164 |     Political Analysis 20: 25-46.
2165 |     {p_end}
2166 | {phang}
2167 |     Hampel, F.R. (1974). The Influence Curve and Its Role in Robust
2168 |     Estimation. Journal of the American Statistical Association 69: 383-393.
2169 |     {p_end}
2170 | {phang}
2171 |     Harrell, F.E., C.E. Davis (1982). A New Distribution-Free Quantile Estimator. Biometrika
2172 |     69: 635-640.
2173 |     {p_end}
2174 | {phang}
2175 |     Hinkley, D.V. (1975). On power transformations to symmetry. Biometrika
2176 |     62(1): 101-111.
2177 |     {p_end}
2178 | {phang}
2179 |     Hodges, Jr., J.L., E.L. Lehmann (1963). Estimates of location based on
2180 |     rank tests. Annals of Mathematical Statistics 34(2): 598-611.
2181 |     {p_end}
2182 | {phang}
2183 |     Hyndman, R.J., Y. Fan (1996). Sample Quantiles in Statistical
2184 |     Packages. The American Statistician 50: 361-365.
2185 |     {p_end}
2186 | {phang}
2187 |     Jann, B. (2007). Univariate kernel density
2188 |     estimation. DOI: {browse "http://boris.unibe.ch/69421/2/kdens.pdf":10.7892/boris.69421}.
2189 |     {p_end}
2190 | {phang}
2191 |     Jann, B. (2020). Influence functions continued. A framework for estimating standard errors in
2192 |     reweighting, matching, and regression adjustment. University of Bern Social Sciences
2193 |     Working Papers 35. Available from
2194 |     {browse "http://ideas.repec.org/p/bss/wpaper/35.html"}.
2195 |     {p_end}
2196 | {phang}
2197 |     Ma, Y., M.G. Genton, E. Parzen (2011). Asymptotic properties of sample
2198 |     quantiles of discrete distributions. Annals of the Institute of Statistical
2199 |     Mathematics 63: 227–243.
2200 |     {p_end}
2201 | {phang}
2202 |     Newson, R. (2006). Efficient Calculation of Jackknife Confidence
2203 |     Intervals for Rank Statistics. Journal of Statistical Software 15(1).
2204 |     {p_end}
2205 | {phang}
2206 |     Osberg, L., K. Xu (2008). How Should We Measure Poverty in a Changing World? Methodological
2207 |     Issues and Chinese Case Study. Review of Development Economics 12(2): 419–441.
2208 |     {p_end}
2209 | {phang}
2210 |     Rios-Avila, F. (2020). Recentered influence functions (RIFs) in Stata: RIF
2211 |     regression and RIF decomposition. The Stata Journal 20(1): 51-94.
2212 |     {p_end}
2213 | {phang}
2214 |     Rousseeuw, P.J., C. Croux (1993). Alternatives to the Median
2215 |     Absolute Deviation. Journal of the American Statistical Association
2216 |     88(424): 1273-1283.
2217 |     {p_end}
2218 | {phang}
2219 |     Saisana, M. (2014). Watts Poverty Index. In: A.C. Michalos (ed.). Encyclopedia of Quality of Life and Well-Being
2220 |     Research. Dordrecht: Springer. DOI: {browse "http://doi.org/10.1007/978-94-007-0753-5_3197":10.1007/978-94-007-0753-5_3197}.
2221 |     {p_end}
2222 | {phang}
2223 |     Sen, A. (1976). Poverty: An Ordinal Approach to Measurement. Econometrica 44(2): 219-231.
2224 |     {p_end}
2225 | {phang}
2226 |     Shorrocks, A.F. (1980). The Class of Additively Decomposable Inequality Measures. Econometrica 48(3): 613-625.
2227 |     {p_end}
2228 | {phang}
2229 |     Shorrocks, A.F. (1995). Revisiting the Sen Poverty Index. Econometrica 63(5): 1225-1230.
2230 |     {p_end}
2231 | {phang}
2232 |     Takayama, N. (1979). Poverty, income inequality, and their measures: Professor Sen's
2233 |     axiomatic approach reconsidered. Econometrica 47(3): 747-759.
2234 |     {p_end}
2235 | {phang}
2236 |     Wand, M.P., M.C. Jones (1995). Kernel Smoothing. London: Chapman and Hall.
2237 |     {p_end}
2238 | 
2239 | 
2240 | {marker author}{...}
2241 | {title:Author}
2242 | 
2243 | {pstd}
2244 |     Ben Jann, University of Bern, ben.jann@unibe.ch
2245 | 
2246 | {pstd}
2247 |     Thanks for citing this software as follows:
2248 | 
2249 | {pmore}
2250 |     Jann, B. (2020). dstat: Stata module to compute summary statistics and
2251 |     distribution functions including standard errors
2252 |     and optional covariate balancing. Available from
2253 |     {browse "http://ideas.repec.org/c/boc/bocode/s458874.html"}.
2254 | 
2255 | 
2256 | {marker also_see}{...}
2257 | {title:Also see}
2258 | 
2259 | {psee}
2260 |     Online: help for
2261 |     {helpb centile},
2262 |     {helpb ci},
2263 |     {helpb correlate},
2264 |     {helpb cumul},
2265 |     {helpb histogram},
2266 |     {helpb kdensity},
2267 |     {helpb mean},
2268 |     {helpb pctile},
2269 |     {helpb proportion},
2270 |     {helpb spearman},
2271 |     {helpb summarize},
2272 |     {helpb table},
2273 |     {helpb tabstat},
2274 |     {helpb tabulate},
2275 |     {helpb teffects ipw},
2276 |     {helpb total}
2277 | 
2278 | {psee}
2279 |     Packages from the SSC Archive (type {cmd:ssc describe} {it:name} for
2280 |     more information):
2281 |     {helpb akdensity},
2282 |     {helpb apoverty},
2283 |     {helpb catplot},
2284 |     {helpb cdfplot},
2285 |     {helpb ci2},
2286 |     {helpb dfl},
2287 |     {helpb distplot},
2288 |     {helpb duncan},
2289 |     {helpb eqprhistogram},
2290 |     {helpb fre},
2291 |     {helpb glcurve},
2292 |     {helpb ineqdeco},
2293 |     {helpb kdens},
2294 |     {helpb kmatch},
2295 |     {helpb lorenz},
2296 |     {helpb moremata},
2297 |     {helpb povdeco},
2298 |     {helpb poverty},
2299 |     {helpb pshare},
2300 |     {helpb reldist},
2301 |     {helpb rif},
2302 |     {helpb robstat},
2303 |     {helpb seg},
2304 |     {helpb somersd},
2305 |     {helpb sumdist},
2306 |     {helpb svygei:svygei_svyatk},
2307 |     {helpb svylorenz}
2308 | 
2309 | 


--------------------------------------------------------------------------------
/dstat_svyr.ado:
--------------------------------------------------------------------------------
 1 | *! version 1.0.5  15dec2022  Ben Jann
 2 | *! helper program for -dstat, vce(svy)-; do not use manually
 3 | 
 4 | program dstat_svyr, eclass properties(svylb svyb svyj)
 5 |     version 14
 6 |     _parse comma lhs 0 : 0
 7 |     syntax [, NOSE * ]
 8 |     dstat `lhs', nose `options'
 9 |     tempname b V
10 |     mat `b' = e(b)
 11 |     mata: st_matrix("`V'", diag(1 :- st_matrix("e(omit)"))) // V: 1 on diagonal, 0 for omitted estimates
12 |     ereturn repost b=`b' V=`V', resize
13 |     eret local cmd "prop" // trick to skip _check_omit
14 |     eret local cmd0 "dstat_svyr"
15 | end
16 | 
17 | 


--------------------------------------------------------------------------------
/stata.toc:
--------------------------------------------------------------------------------
1 | v 3
2 | p dstat Stata module to compute summary statistics and distribution functions including standard errors and optional covariate balancing
3 | 


--------------------------------------------------------------------------------