├── LICENSE ├── README.md ├── benchmarks └── regressby_benchmark.png ├── regressby.ado ├── regressby.pkg ├── regressby.sthlp └── stata.toc /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2018 Michael Droste 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 
22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | 2 | regressby 3 | ================================= 4 | 5 | [Overview](#overview) 6 | | [Motivation](#motivation) 7 | | [Installation](#installation) 8 | | [Usage](#usage) 9 | | [Benchmarks](#benchmarks) 10 | | [To-Do](#todo) 11 | | [Acknowledgements](#acknowledgements) 12 | | [License](#license) 13 | 14 | Flexible and hyper-fast grouped regressions in Stata 15 | 16 | `version 0.51 31jul2018` 17 | 18 | 19 | Overview 20 | --------------------------------- 21 | 22 | regressby is a fast and efficient way to run grouped OLS regressions; that is, it estimates a given OLS regression model on a collection of subsets of your dataset, returning the coefficients and standard errors associated with each regression. Functionally, it is very similar to the built-in -statsby- program; however, -regressby- runs between 10 and 1000 times faster than -statsby- in most use cases. The performance gains are particularly large when there are many groups, when the number of observations in each group is relatively small, and when the regression model contains only a few parameters. 23 | 24 | regressby supports a number of useful bells and whistles: subsetting with if/in, analytic weights, and heteroskedasticity-robust and clustered standard errors. Furthermore, unlike statsby, regressby (optionally) allows users to access the full variance-covariance matrix associated with each regression by returning the sampling covariance associated with each pair of estimated parameters. 25 | 26 | 27 | Motivation 28 | --------------------------------- 29 | 30 | It is easiest to explain how regressby functions by way of example. Suppose you want to estimate a regression describing how the relationship between a person's income (y) and their parent's income (x) varies across place of birth (g). 
More concretely, you want to run a regression of y on x with separate slopes and intercepts for each group g. 31 | 32 | You can accomplish this in one step by regressing y on a vector of dummy variables for each distinct value of g and a vector of interactions between these dummies and x, suppressing the constant to avoid interpreting the coefficients with respect to an omitted reference group. This approach is convenient, but suffers from a number of drawbacks. First, and most importantly, Stata does not allow the direct estimation of more than 10,998 parameters simultaneously; with two parameters per group (a slope and an intercept), this one-step estimator can only be computed when there are fewer than 5,500 groups. Second, it turns out that directly estimating thousands of parameters is quite slow. 33 | 34 | If the number of groups is relatively large, an alternative strategy is to estimate a univariate regression of y on x separately within each group g. There are at least two easy ways to do this in Stata: manually iterating over groups, or using the built-in -statsby- command. Unfortunately, both of these methods are excruciatingly slow when the number of groups is large. 35 | 36 | Regressby is intended primarily as a replacement for these built-in methods. In my use cases, this program has been hundreds of times faster than -statsby-, reducing the runtime of scripts that would previously take days or weeks to less than an hour. 37 | 38 | 39 | Installation 40 | --------------------------------- 41 | 42 | There are two options for installing regressby. 43 | 44 | 1. The most recent version can be installed from GitHub with the following Stata command: 45 | 46 | ```stata 47 | net install regressby, from(https://raw.githubusercontent.com/mdroste/stata-regressby/master/) 48 | ``` 49 | 50 | 2. A ZIP containing the program can be downloaded from GitHub and manually placed on the user's adopath. 
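For reference, the two built-in by-group approaches described under Motivation can be sketched as follows. This is a minimal illustration, not part of regressby itself: it assumes the auto dataset that ships with Stata, with `price`, `mpg`, and `foreign` standing in for y, x, and g (those variable choices are assumptions made only for this example).

```stata
* Sketch: two built-in ways to regress y on x separately within each group.
* price, mpg, and foreign are illustrative stand-ins for y, x, and g.
sysuse auto, clear

* (a) Manual iteration over the distinct values of the grouping variable
levelsof foreign, local(groups)
foreach g of local groups {
    quietly regress price mpg if foreign == `g'
    display "foreign = `g': slope = " _b[mpg] ", intercept = " _b[_cons]
}

* (b) The same regressions with -statsby-; -clear- replaces the data in memory
*     with one row of coefficients and standard errors per group
statsby _b _se, by(foreign) clear: regress price mpg
list
```

With only two groups the runtime of either approach is negligible; both run one full regression command per group, so the cost grows with the number of groups, which is the situation regressby is meant to address.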
51 | 52 | 53 | Usage 54 | --------------------------------- 55 | 56 | The following two commands are equivalent: 57 | 58 | ```stata 59 | regressby y x, by(byvars) 60 | statsby, by(byvars) clear: reg y x 61 | ``` 62 | 63 | More documentation is forthcoming. In the meantime, see the help file in Stata. 64 | 65 | 66 | Benchmarks 67 | --------------------------------- 68 | 69 | ![regressby benchmark](benchmarks/regressby_benchmark.png "regressby benchmark") 70 | 71 | 72 | Todo 73 | --------------------------------- 74 | 75 | The following items will be addressed soon: 76 | 77 | - [ ] Finish off this readme.md and the help file 78 | - [ ] Finish benchmarking 79 | - [ ] Provide script to validate results / example datasets 80 | - [ ] Add support for frequency weights 81 | 82 | Porting this program into a compiled C plugin for Stata would yield a significant increase in performance; I have no plans to do that in the near future. 83 | 84 | 85 | Acknowledgements 86 | --------------------------------- 87 | 88 | This program is based on internal code from the illustrious [Michael Stepner](https://github.com/michaelstepner)'s Health Inequality Project. This program also benefited from contributions provided by the inimitable Dr. [Wilbur Townsend](https://github.com/wilbur-t), who helped elegantly generalize the code to allow for an arbitrary number of regressors. Finally, this program benefited greatly from the guidance and advice of Raj Chetty. 89 | 90 | 91 | License 92 | --------------------------------- 93 | 94 | regressby is [MIT-licensed](https://github.com/mdroste/stata-regressby/blob/master/LICENSE). 
95 | -------------------------------------------------------------------------------- /benchmarks/regressby_benchmark.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/mdroste/stata-regressby/dae25762b633ee87cba46f8a85fa6712056fc9ff/benchmarks/regressby_benchmark.png -------------------------------------------------------------------------------- /regressby.ado: -------------------------------------------------------------------------------- 1 | *=============================================================================== 2 | * PROGRAM: regressby.ado 3 | * PURPOSE: Performs fast grouped univariate OLS regressions. 4 | * The following commands are equivalent: 5 | * regressby y x, by(byvars) 6 | * statsby, by(byvars) clear: reg y x 7 | * Except regressby will run 10-100x faster. 8 | * Also computes standard errors in a variety of flavors: usual 9 | * asymptotic standard errors, robust standard errors, and clustered 10 | * standard errors. 
11 | * AUTHORS: Michael Stepner, Michael Droste, Wilbur Townsend 12 | *=============================================================================== 13 | 14 | 15 | *------------------------------------------------------------------------------- 16 | * Stata wrapper 17 | *------------------------------------------------------------------------------- 18 | 19 | program define regressby 20 | 21 | version 12.0 22 | syntax varlist(min=2 numeric) [if] [in] [aweight], by(varlist) [vce(string) covs save(string)] 23 | 24 | * Preserve dataset in case we crash 25 | preserve 26 | 27 | * Restrict sample with if/in conditions 28 | marksample touse, strok novarlist 29 | qui drop if `touse'==0 30 | 31 | * Parse VCE option, if specified (my_vce_parse accepts only robust or cluster clustvar) 32 | if `"`vce'"' != "" { 33 | my_vce_parse , vce(`vce') 34 | local vcetype "robust" 35 | local clusterby "`r(clustervar)'" 36 | if "`vcetype'"=="robust" local robust "robust" 37 | if "`clusterby'"!="" local robust = "" 38 | } 39 | 40 | * Check to make sure save data file path is valid 41 | if ("`replace'"=="") & (`"`savegraph'"'!="") { 42 | if regexm(`"`savegraph'"',"\.[a-zA-Z0-9]+$") confirm new file `"`save'"' 43 | else confirm new file `"`save'.dta"' 44 | } 45 | 46 | * Error checking: can't specify both robust and clusterby 47 | if "`robust'"!="" & "`clusterby'"!="" { 48 | di as error "Error: can't specify both clustered and robust standard errors at once! Choose one." 49 | exit 198 50 | } 51 | 52 | * Display type of standard error chosen 53 | if "`robust'"=="" & "`clusterby'"=="" { 54 | di "Running regressby with normal OLS standard errors." 55 | } 56 | if "`robust'"!="" { 57 | di "Running regressby with robust standard errors." 58 | } 59 | if "`clusterby'"!="" { 60 | di "Running regressby with cluster-robust standard errors (clustered by `clusterby')." 
61 | } 62 | 63 | * Construct analytical weight variable 64 | if ("`weight'"!="") { 65 | local wt [`weight'`exp'] 66 | tempvar tmpwt 67 | gen `tmpwt' `exp' 68 | local weightby `tmpwt' 69 | di "Using analytical weights, weight `exp'." 70 | } 71 | 72 | * Apply weights, if applicable: scale each regression variable by sqrt(weight); the sqrt(weight) column later stands in for the constant 73 | if "`weightby'"!="" { 74 | foreach v in `varlist' { 75 | qui replace `v' = `v' * sqrt(`weightby') 76 | } 77 | qui replace `weightby' = sqrt(`weightby') 78 | } 79 | 80 | * Convert string by-vars to temporary numeric variables 81 | foreach var of varlist `by' { 82 | cap confirm numeric variable `var', exact 83 | if _rc==0 { // numeric var 84 | local bynumeric `bynumeric' `var' 85 | } 86 | else { // string var 87 | tempvar `var'N 88 | encode `var', gen(``var'N') 89 | local bynumeric `bynumeric' ``var'N' 90 | local bystr `bystr' `var' // list of string by-vars 91 | } 92 | } 93 | 94 | * Sort using by-groups 95 | sort `by' `clusterby' 96 | 97 | * Generate a single by-variable counting by groups 98 | tempvar grp 99 | egen `grp'=group(`bynumeric') 100 | qui drop if mi(`grp') 101 | 102 | * Drop observations missing independent or dependent variables 103 | * Also count number of variables here including constant, awkward and should be replaced 104 | local num_x = 0 105 | foreach v in `varlist'{ 106 | qui drop if mi(`v') 107 | local num_x = `num_x' + 1 108 | } 109 | local num_x = `num_x' - 1 110 | if "`nocons'"=="" local num_x = `num_x' + 1 111 | 112 | * Drop observations missing weight, if weights are specified 113 | if "`weightby'"!="" { 114 | drop if `weightby'==. 
115 | } 116 | 117 | * XX revisit this later to handle missing data 118 | 119 | 120 | * Perform regressions on each by-group, store in dataset 121 | mata: _regressby("`varlist'", "`grp'", "`bynumeric'","`clusterby'","`robust'","`weightby'") 122 | 123 | * Convert string by-vars back to strings, from numeric 124 | foreach var in `bystr' { 125 | decode ``var'N', gen(`var') 126 | } 127 | order `by' 128 | 129 | 130 | * XX find out if it is faster to compute R2 in Mata or Stata 131 | if "`nocov'"!="" { 132 | cap drop _cov_* 133 | } 134 | 135 | * XX optionally save out to dta and just restore with a message 136 | if "`save'"=="" { 137 | restore, not 138 | } 139 | if "`save'"!="" { 140 | save `save', replace 141 | restore 142 | } 143 | 144 | end 145 | 146 | *------------------------------------------------------------------------------- 147 | * Mata program: _regressby 148 | * Inputs: 149 | * - A y-var and one or more x-vars for an OLS regression 150 | * - A group var, for which each value represents a distinct by-group. 151 | * This var must be in ascending order. 152 | * - A list of numeric by-variables, whose groups correspond to the group var. 
153 | * Outputs: 154 | * - dataset of coefficients from OLS regression for each by-group 155 | *------------------------------------------------------------------------------- 156 | 157 | version 13.1 158 | set matastrict on 159 | 160 | mata: 161 | void _regressby(string scalar regvars, string scalar grpvar, string scalar byvars, string scalar clusterby, string scalar robust, string scalar weightby) { 162 | 163 | // Convert variable names to column indices 164 | real rowvector regcols, bycols, clustercol, weightcol 165 | real scalar ycol, xcol, grpcol 166 | regcols = st_varindex(tokens(regvars)) 167 | bycols = st_varindex(tokens(byvars)) 168 | clustercol = st_varindex(tokens(clusterby)) 169 | weightcol = st_varindex(tokens(weightby)) 170 | grpcol = st_varindex(grpvar) 171 | 172 | // Fetch number of groups (data are sorted by group, so the last observation holds the largest group id) 173 | real scalar numgrp, startobs, curgrp 174 | numgrp = _st_data(st_nobs(),grpcol) 175 | startobs = 1 176 | curgrp = _st_data(1,grpcol) 177 | 178 | // Preallocate matrices for output (Vs holds each group's flattened variance-covariance matrix) 179 | real matrix groups, coefs, Vs, nobs 180 | groups = J(numgrp, cols(bycols), .) 181 | coefs = J(numgrp, cols(regcols), .) 182 | Vs = J(numgrp, cols(regcols)^2, .) 183 | nobs = J(numgrp, 1, .) 
184 | 185 | // Preallocate regression objects 186 | real matrix XX, Xy, XX_inv, V, Z, M, y, X, w, xi, info 187 | real scalar N, k, cov, p, nc, obs, i 188 | real vector beta, e, s2, cvar, ei 189 | 190 | // ----------------------------------------------------------------------------- 191 | // Iterate over groups 192 | // ----------------------------------------------------------------------------- 193 | 194 | // Iterate over groups 1 to Ng-1 195 | for (obs=1; obs<=st_nobs()-1; obs++) { 196 | if (_st_data(obs,grpcol)!=curgrp) { 197 | st_view(M, (startobs,obs-1), regcols, 0) 198 | st_subview(y, M, ., 1) 199 | st_subview(X, M, ., (2\.)) 200 | N = rows(X) 201 | // Augment X with either column of 1's (unweighted) or weights (weighted) 202 | // TODO -- noconstant option needs to be specified here and also accounted for in df 203 | if (weightby!="") { 204 | st_view(w, (startobs,obs-1), weightcol, 0) 205 | X = X,w 206 | } 207 | if (weightby=="") { 208 | X = X,J(N,1,1) 209 | } 210 | // Define matrix products 211 | XX = quadcross(X,X) 212 | Xy = quadcross(X,y) 213 | XX_inv = invsym(XX) 214 | // ------------ COMPUTE COEFFICIENTS -------------------- 215 | beta = (XX_inv*Xy)' 216 | e = y - X*beta' 217 | p = cols(X) 218 | k = p - diag0cnt(XX_inv) 219 | // ------------ COMPUTE STANDARD ERRORS ----------------- 220 | if (robust == "" & clusterby=="") { 221 | V = quadcross(e,e)/(N-k)*cholinv(XX) 222 | } 223 | if (robust != "") { 224 | V = (N/(N-k))*XX_inv*quadcross(X, e:^2, X)*XX_inv 225 | } 226 | if (clusterby != "") { 227 | st_view(cvar,(startobs,obs-1),clustercol,0) 228 | info = panelsetup(cvar, 1) 229 | nc = rows(info) 230 | Z = J(k, k, 0) 231 | if (nc>2) { 232 | for (i=1; i<=nc; i++) { 233 | xi = panelsubmatrix(X,i,info) 234 | ei = panelsubmatrix(e,i,info) 235 | Z = Z + xi'*(ei*ei')*xi 236 | } 237 | V = ((N-1)/(N-k))*(nc/(nc-1))*XX_inv*Z*XX_inv 238 | } 239 | } 240 | // ------------ STORE OUTPUT ---------------------------- 241 | coefs[curgrp,.] = beta 242 | Vs[curgrp,.] 
= rowshape(V, 1) 243 | nobs[curgrp,1] = N 244 | groups[curgrp,.] = st_data(startobs,bycols) 245 | // ------------ WRAP UP BY ITERATING COUNTERS ----------- 246 | curgrp = _st_data(obs,grpcol) 247 | startobs = obs 248 | } 249 | } 250 | 251 | // Iterate over last group manually 252 | obs=st_nobs() 253 | if (_st_data(obs,grpcol)==curgrp) { // last observation is not a group to itself 254 | // increment obs, since code is written as processing the observation that is 1 past the last in the group 255 | ++obs 256 | // compute OLS coefs: beta = inv(X'X) * X'y. --> see Example 4 of -help mf_cross- 257 | st_view(M, (startobs,obs-1), regcols, 0) 258 | st_subview(y, M, ., 1) 259 | st_subview(X, M, ., (2\.)) 260 | N = rows(X) 261 | // Augment X with either column of 1's (unweighted) or weights (weighted) 262 | // TODO -- noconstant option needs to be specified here and also accounted for in df 263 | if (weightby!="") { 264 | st_view(w, (startobs,obs-1), weightcol, 0) 265 | X = X,w 266 | } 267 | if (weightby=="") { 268 | X = X,J(N,1,1) 269 | } 270 | // Define matrix products 271 | XX = quadcross(X,X) 272 | Xy = quadcross(X,y) 273 | XX_inv = invsym(XX) 274 | beta = (XX_inv*Xy)' 275 | e = y - X*beta' 276 | p = cols(X) 277 | k = p - diag0cnt(XX_inv) 278 | // USUAL OLS STANDARD ERRORS 279 | if (robust == "" & clusterby == "") { 280 | V = quadcross(e,e)/(N-k)*cholinv(XX) 281 | } 282 | // ROBUST STANDARD ERRORS 283 | if (robust != "") { 284 | V = (N/(N-k))*XX_inv*quadcross(X, e:^2, X)*XX_inv 285 | } 286 | // CLUSTERED STANDARD ERRORS 287 | if (clusterby != "") { 288 | st_view(cvar,(startobs,obs-1),clustercol,0) 289 | info = panelsetup(cvar, 1) 290 | nc = rows(info) 291 | Z = J(k, k, 0) 292 | if (nc>2) { 293 | for (i=1; i<=nc; i++) { 294 | xi = panelsubmatrix(X,i,info) 295 | ei = panelsubmatrix(e,i,info) 296 | Z = Z + xi'*(ei*ei')*xi 297 | } 298 | V = ((N-1)/(N-k))*(nc/(nc-1))*XX_inv*Z*XX_inv 299 | } 300 | } 301 | // STORE REGRESSION OUTPUT 302 | coefs[curgrp,.] 
= beta 303 | Vs[curgrp,.] = rowshape(V, 1) 304 | nobs[curgrp,1] = N 305 | groups[curgrp,.] = st_data(startobs,bycols) 306 | } 307 | 308 | else { 309 | display("{error} last observation is in a singleton group") 310 | exit(2001) 311 | } 312 | 313 | // ----------------------------------------------------------------------------- 314 | // Gather output and pass back into Stata 315 | // ----------------------------------------------------------------------------- 316 | 317 | // Store group identifiers in dataset 318 | stata("qui keep in 1/"+strofreal(numgrp, "%18.0g")) 319 | stata("keep "+byvars) 320 | st_store(.,tokens(byvars),groups) 321 | 322 | // Store coefficients in dataset: 323 | 324 | // ... Number of observations, 325 | (void) st_addvar("long", "N") 326 | st_store(., ("N"), nobs) 327 | 328 | // ... And then looping over covariates, 329 | covariates = (cols(regcols)>1) ? tokens(regvars)[|2 \ .|], "cons" : ("cons") 330 | for (k=1; k<=length(covariates); k++) { 331 | covName = covariates[k] 332 | // ... Coefficients and standard errors, 333 | (void) st_addvar("float", "_b_"+covName) 334 | (void) st_addvar("float", "_se_"+covName) 335 | st_store(., "_b_"+covName, coefs[., k]) 336 | st_store(., "_se_"+covName, sqrt(Vs[., k + cols(regcols)*(k - 1)])) 337 | // ... And the sampling covariances. 
338 | for (j=1; j 2 { 358 | my_vce_error , typed(`vce') 359 | } 360 | local 0 `", `vce'"' 361 | syntax [, Robust CLuster * ] 362 | if `case' == 2 { 363 | if "`robust'" == "robust" | "`cluster'" == "" { 364 | my_vce_error , typed(`vce') 365 | } 366 | capture confirm numeric variable `options' 367 | if _rc { 368 | my_vce_error , typed(`vce') 369 | } 370 | local clustervar "`options'" 371 | } 372 | else { // case = 1 373 | if "`robust'" == "" { 374 | my_vce_error , typed(`vce') 375 | } 376 | } 377 | return clear 378 | return local clustervar "`clustervar'" 379 | end 380 | 381 | program define my_vce_error 382 | syntax , typed(string) 383 | display `"{red}{bf:vce(`typed')} invalid"' 384 | error 498 385 | end 386 | -------------------------------------------------------------------------------- /regressby.pkg: -------------------------------------------------------------------------------- 1 | v 0.51 2 | d 3 | d 'regressby': fast and flexible grouped regressions 4 | d 5 | d Distribution-Date: 20180731 6 | d 7 | f regressby.ado 8 | f regressby.sthlp 9 | -------------------------------------------------------------------------------- /regressby.sthlp: -------------------------------------------------------------------------------- 1 | {smcl} 2 | {* *! 
version 0.51 31jul2018}{...} 3 | {viewerjumpto "Syntax" "regressby##syntax"}{...} 4 | {viewerjumpto "Description" "regressby##description"}{...} 5 | {viewerjumpto "Options" "regressby##options"}{...} 6 | {viewerjumpto "Examples" "regressby##examples"}{...} 7 | {viewerjumpto "Author" "regressby##author"}{...} 8 | {viewerjumpto "Acknowledgements" "regressby##acknowledgements"}{...} 9 | {title:Title} 10 | 11 | {p2colset 5 19 21 2}{...} 12 | {p2col :{hi:regressby} {hline 2}}Fast, flexible grouped regressions{p_end} 13 | {p2colreset}{...} 14 | 15 | 16 | 17 | {marker syntax}{title:Syntax} 18 | 19 | {p 8 15 2} 20 | {cmd:regressby} 21 | depvar [indepvars] {ifin} 22 | {weight}, by(varlist) 23 | [{cmd:}{it:options}] 24 | 25 | 26 | {synoptset 30 tabbed}{...} 27 | {synopthdr :options} 28 | {synoptline} 29 | 30 | {syntab :Main} 31 | {synopt :{opt vce(vcetype)}}{it:vcetype} may be {bf:robust} or {bf:cluster} {it:clustvar}.{p_end} 32 | {synopt :{opt nocovs}}Do not compute the sampling covariances between estimated coefficients.{p_end} 33 | 34 | {syntab :Save Output} 35 | {synopt :{opt save(filename)}}Saves output to a .dta file given by {it:filename} and restores the original data{p_end} 36 | 37 | {synoptline} 38 | {p 4 6 2} 39 | {opt aweight}s are allowed; 40 | see {help weight}. 41 | {p_end} 42 | 43 | 44 | 45 | {marker description}{...} 46 | {title:Description} 47 | 48 | {pstd} 49 | {opt regressby} runs a series of grouped regressions of a dependent variable (y) on a set of independent variables (x) separately within each distinct value of the grouping by-variable(s). 50 | 51 | 52 | 53 | {marker options}{...} 54 | {title:Options} 55 | 56 | {dlgtab:Main} 57 | 58 | {phang} 59 | {opth vce(vcetype)} Choose a method for calculating standard errors. The default method computes asymptotic OLS standard errors. The option {bf:vce}({it:robust}) computes heteroskedasticity-robust standard errors. 
The option {bf:vce}({it:cluster clustervar}) computes cluster-robust standard errors with clusters defined by the variable {it:clustervar}. 60 | 61 | {dlgtab:Save Output} 62 | 63 | {phang} 64 | {opt save(filename)} saves the output dataset to the file specified by {it:filename}. If a full file path is not provided, the working directory is used. If no file extension is specified, .dta is assumed. 65 | 66 | {marker examples}{...} 67 | {title:Examples} 68 | 69 | {marker example1}{...} 70 | {pstd}{bf:Example 1} 71 | 72 | {pstd}Load the auto example dataset.{p_end} 73 | {phang2}. {stata sysuse auto, clear}{p_end} 74 | 75 | {pstd}Regress price on mpg within each value of foreign.{p_end} 76 | {phang2}. {stata regressby price mpg, by(foreign)}{p_end} 77 | 78 | {pstd}Examine the data.{p_end} 79 | {phang2}. {stata list}{p_end} 80 | 81 | {marker example2}{...} 82 | {pstd}{bf:Example 2} 83 | 84 | {pstd}Load the life expectancy by country example dataset.{p_end} 85 | {phang2}. {stata sysuse lifeexp, clear}{p_end} 86 | 87 | {pstd}Regress life expectancy on per-capita GNP within region, saving the results to output.dta in the working directory.{p_end} 88 | {phang2}. {stata regressby lexp gnppc, by(region) save(output.dta)}{p_end} 89 | 90 | 91 | 92 | {pstd}{p_end} 93 | 94 | {marker author}{...} 95 | {title:Author} 96 | 97 | {pstd}Michael Droste{p_end} 98 | {pstd}thedroste@gmail.com{p_end} 99 | 100 | 101 | 102 | {marker acknowledgements}{...} 103 | {title:Acknowledgements} 104 | 105 | {pstd}The present version of {cmd:regressby} is based on code written for Michael Stepner's Health Inequality Project. It was extended by Michael Droste with helpful contributions by Wilbur Townsend. Regressby also benefited from valuable advice provided by Raj Chetty. 
106 | 107 | -------------------------------------------------------------------------------- /stata.toc: -------------------------------------------------------------------------------- 1 | 2 | v 0.51 3 | d Michael Droste, thedroste@gmail.com 4 | p 'regressby': fast and flexible grouped regressions --------------------------------------------------------------------------------