├── LICENSE
├── README.md
├── benchmarks
│   └── regressby_benchmark.png
├── regressby.ado
├── regressby.pkg
├── regressby.sthlp
└── stata.toc


/LICENSE:
--------------------------------------------------------------------------------
 1 | MIT License
 2 | 
 3 | Copyright (c) 2018 Michael Droste
 4 | 
 5 | Permission is hereby granted, free of charge, to any person obtaining a copy
 6 | of this software and associated documentation files (the "Software"), to deal
 7 | in the Software without restriction, including without limitation the rights
 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | 
 2 | regressby
 3 | =================================
 4 | 
 5 | [Overview](#overview)
 6 | | [Motivation](#motivation)
 7 | | [Installation](#installation)
 8 | | [Usage](#usage)
 9 | | [Benchmarks](#benchmarks)
10 | | [To-Do](#todo)
11 | | [Acknowledgements](#acknowledgements)
12 | | [License](#license)
13 | 
14 | Flexible and hyper-fast grouped regressions in Stata
15 | 
16 | `version 0.51 31jul2018`
17 | 
18 | 
19 | Overview
20 | ---------------------------------
21 | 
22 | regressby is a fast and efficient method to run grouped OLS regressions; that is, it estimates a given OLS regression model on a collection of subsets of your dataset, returning the coefficients and standard errors associated with each regression. Functionally, it is very similar to the built-in -statsby- program; however, -regressby- runs between 10 and 1000 times faster than -statsby- in most use cases. The performance increases are particularly large when there are many groups, when the number of observations in each group is relatively small, and when the regression model only contains a few parameters.
23 | 
24 | regressby supports a number of useful bells and whistles: subsetting with if/in, analytical weights, and heteroskedasticity-robust and clustered standard errors. Furthermore, unlike statsby, regressby (optionally) allows users to access the full variance-covariance matrix associated with each regression by returning the sampling covariance associated with each pair of estimated parameters.
25 | 
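As a minimal sketch of why the covariances matter, the example below computes the standard error of a within-group fitted value at mpg = 20, which depends on the covariance between the slope and the constant. It uses the auto dataset shipped with Stata and assumes that the output variables are named `_b_*`, `_se_*` and `_cov_*_*` as in the current regressby.ado, and that the covariance columns are present in the output; `se_fit20` is a name made up for this illustration.

```stata
sysuse auto, clear
regressby price mpg, by(foreign)

* Var(a + 20*b) = Var(a) + 20^2 * Var(b) + 2 * 20 * Cov(a, b),
* where a is the constant and b is the coefficient on mpg.
* The variable names below are assumed from regressby.ado.
generate double se_fit20 = sqrt(_se_cons^2 + 20^2 * _se_mpg^2 + 2 * 20 * _cov_cons_mpg)
list foreign _b_mpg _b_cons se_fit20
```

Without the covariance term, the standard error of any linear combination of coefficients cannot be recovered from the coefficient-level standard errors alone.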
26 | 
27 | Motivation
28 | ---------------------------------
29 | 
30 | It is easiest to explain how regressby functions by way of example. Suppose you want to estimate a regression describing how the relationship between a person's income (y) and their parent's income (x) varies across place of birth (g). More concretely, you want to run a regression of y on x with separate slopes and intercepts for each group g.
31 | 
32 | You can accomplish this in one step by regressing y on a vector of dummy variables for each distinct value of g and a vector of interactions between these dummies and x, suppressing the constant to avoid interpreting the coefficients with respect to an omitted reference group. This approach is convenient, but suffers from a number of drawbacks. Most importantly, Stata does not allow the direct estimation of more than 10,998 parameters simultaneously, which in this case means that this one-step estimator can only be computed when there are fewer than 5,500 groups. Second, it turns out that directly estimating thousands of parameters is quite slow.
33 | 
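A sketch of the one-step approach described above, written with Stata's factor-variable notation. The auto dataset stands in for real data here (price as y, mpg as x, foreign as g); these variable choices are for illustration only.

```stata
sysuse auto, clear

* ibn.foreign       : one dummy per group, with no omitted base level
* ibn.foreign#c.mpg : a separate mpg slope for each group
* noconstant        : drop the common intercept so each group keeps its own
regress price ibn.foreign ibn.foreign#c.mpg, noconstant
```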
34 | If the number of groups is relatively large, an alternative strategy is to estimate a univariate regression of y on x separately within each group g. There are at least two easy ways to do this in Stata, either by manually iterating over groups or by using the built-in -statsby- function. Unfortunately, both of these methods are excruciatingly slow when the number of groups is large.
35 | 
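Both built-in strategies can be sketched as follows, again using the auto dataset as a stand-in (two groups only, so the speed problem is not visible here). The loop prints each group's slope, and -statsby- replaces the data in memory with one row of coefficients and standard errors per group.

```stata
sysuse auto, clear

* Option 1: iterate over groups manually
levelsof foreign, local(groups)
foreach g of local groups {
    quietly regress price mpg if foreign == `g'
    display "foreign = `g': slope = " _b[mpg] ", se = " _se[mpg]
}

* Option 2: let -statsby- collect coefficients and standard errors
statsby _b _se, by(foreign) clear: regress price mpg
list
```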
36 | Regressby is intended primarily as a replacement for these built-in methods. In my use cases, this program has been hundreds of times faster than -statsby-, reducing the runtime of scripts that would previously take days or weeks to less than an hour.
37 | 
38 | 
39 | Installation
40 | ---------------------------------
41 | 
42 | There are two options for installing regressby.
43 | 
44 | 1. The most recent version can be installed from GitHub with the following Stata command:
45 | 
46 | ```stata
47 | net install regressby, from(https://raw.githubusercontent.com/mdroste/stata-regressby/master/)
48 | ```
49 | 
50 | 2. A ZIP containing the program can be downloaded from GitHub and manually placed on the user's adopath.
51 | 
52 | 
53 | Usage
54 | ---------------------------------
55 | 
56 | The following two commands are equivalent:
57 | 
58 | ```stata
59 | regressby y x, by(byvars)
60 | statsby, by(byvars) clear: reg y x	
61 | ```
62 | 
63 | More on this soon. See the help file in Stata.
64 | 
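Until this section is filled in, here is a hedged sketch of the options described in the help file, using the auto dataset. The output file name `auto_by_foreign.dta` is hypothetical. Note that, per the current regressby.ado, the data in memory are replaced by the per-group results unless save() is specified.

```stata
* Two regressors with heteroskedasticity-robust standard errors;
* the data in memory become one row per value of foreign
sysuse auto, clear
regressby price mpg weight, by(foreign) vce(robust)
list

* Analytic weights, with results written to a file (hypothetical name)
* and the original data restored afterwards
sysuse auto, clear
regressby price mpg [aw=weight], by(foreign) save(auto_by_foreign.dta)
describe using auto_by_foreign.dta
```

Each output row holds the by-variables, the number of observations `N`, and a coefficient and standard error for every regressor and the constant.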
65 | 
66 | Benchmarks
67 | ---------------------------------
68 | 
69 | ![regressby benchmark](benchmarks/regressby_benchmark.png "regressby benchmark")
70 | 
71 |   
72 | Todo
73 | ---------------------------------
74 | 
75 | The following items will be addressed soon:
76 | 
77 | - [ ] Finish off this readme.md and the help file
78 | - [ ] Finish benchmarking
79 | - [ ] Provide script to validate results / example datasets
80 | - [ ] Add support for frequency weights
81 | 
82 | Porting this program into a compiled C plugin for Stata would yield a significant increase in performance; I have no plans to do that in the near future.
83 | 
84 | 
85 | Acknowledgements
86 | ---------------------------------
87 | 
88 | This program is based off of internal code from the illustrious [Michael Stepner](https://github.com/michaelstepner)'s health inequality project. This program also benefited from contributions provided by the inimitable Dr. [Wilbur Townsend](https://github.com/wilbur-t), who helped elegantly generalize the code to allow for an arbitrary number of regressors. Finally, this program benefited greatly from the guidance and advice of Raj Chetty.
89 | 
90 | 
91 | License
92 | ---------------------------------
93 | 
94 | regressby is [MIT-licensed](https://github.com/mdroste/stata-regressby/blob/master/LICENSE).
95 | 


--------------------------------------------------------------------------------
/benchmarks/regressby_benchmark.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mdroste/stata-regressby/dae25762b633ee87cba46f8a85fa6712056fc9ff/benchmarks/regressby_benchmark.png


--------------------------------------------------------------------------------
/regressby.ado:
--------------------------------------------------------------------------------
  1 | *===============================================================================
  2 | * PROGRAM: regressby.ado
  3 | * PURPOSE: Performs fast grouped OLS regressions.
  4 | *          The following commands are equivalent:
  5 | *			 regressby y x, by(byvars)
  6 | *			 statsby, by(byvars) clear: reg y x	
  7 | *		   Except regressby will run 10-100x faster.
  8 | *          Also computes standard errors in a variety of flavors: usual
  9 | *          asymptotic standard errors, robust standard errors, and clustered
 10 | *          standard errors.
 11 | * AUTHORS: Michael Stepner, Michael Droste, Wilbur Townsend
 12 | *===============================================================================
 13 | 
 14 | 
 15 | *-------------------------------------------------------------------------------
 16 | * Stata wrapper
 17 | *-------------------------------------------------------------------------------
 18 | 
 19 | program define regressby
 20 | 
 21 | 	version 12.0
 22 | 	syntax varlist(min=2 numeric) [if] [in] [aweight], by(varlist) [vce(string) covs save(string)] 
 23 | 	
 24 | 	* Preserve dataset in case we crash
 25 | 	preserve
 26 | 
 27 | 	* Restrict sample with if/in conditions
 28 | 	marksample touse, strok novarlist
 29 | 	qui drop if `touse'==0
 30 | 
 31 | 	* Parse VCE option, if specified
 32 | 	if `"`vce'"' != "" {
 33 | 		my_vce_parse , vce(`vce') 
 34 | 		local vcetype    "robust"
 35 | 		local clusterby  "`r(clustervar)'"
 36 | 		if "`vcetype'"=="robust" local robust "robust"
 37 | 		if "`clusterby'"!="" local robust = ""
 38 | 	}
 39 | 
 40 | 	* Check to make sure save data file path is valid
 41 | 	if ("`replace'"=="") & (`"`savegraph'"'!="") {
 42 | 		if regexm(`"`savegraph'"',"\.[a-zA-Z0-9]+$") confirm new file `"`save'"'
 43 | 		else confirm new file `"`save'.dta"'
 44 | 	}
 45 | 
 46 | 	* Error checking: can't specify both robust and clusterby
 47 | 	if "`robust'"!="" & "`clusterby'"!="" {
 48 | 		di as error "Error: can't specify both clustered and robust standard errors at once! Choose one."
 49 | 		exit 198
 50 | 	}
 51 | 	
 52 | 	* Display type of standard error chosen
 53 | 	if "`robust'"=="" & "`clusterby'"=="" {
 54 | 		di "Running regressby with normal OLS standard errors."
 55 | 	}
 56 | 	if "`robust'"!="" {
 57 | 		di "Running regressby with robust standard errors."
 58 | 	}
 59 | 	if "`clusterby'"!="" {
 60 | 		di "Running regressby with cluster-robust standard errors (clustered by `clusterby')."
 61 | 	}
 62 | 	
 63 | 	* Construct analytical weight variable
 64 | 	if ("`weight'"!="") {
 65 | 		local wt [`weight'`exp']
 66 | 		tempvar tmpwt
 67 | 		gen `tmpwt' `exp'
 68 | 		local weightby `tmpwt'
 69 | 		di "Using analytical weights, weight `exp'."
 70 | 	}
 71 | 
 72 | 	* Apply weights: scale each variable by sqrt(weight), then keep sqrt(weight) to serve as the weighted constant
 73 | 	if "`weightby'"!="" {
 74 | 		foreach v in `varlist' {
 75 | 			qui replace `v' = `v' * sqrt(`weightby')
 76 | 		}
 77 | 		qui replace `weightby' = sqrt(`weightby')
 78 | 	}
 79 | 	
 80 | 	* Convert string by-vars to temporary numeric variables
 81 | 	foreach var of varlist `by' {
 82 | 		cap confirm numeric variable `var', exact
 83 | 		if _rc==0 {  // numeric var
 84 | 			local bynumeric `bynumeric' `var'
 85 | 		}
 86 | 		else {  // string var
 87 | 			tempvar `var'N
 88 | 			encode `var', gen(``var'N')
 89 | 			local bynumeric `bynumeric' ``var'N'
 90 | 			local bystr `bystr' `var'  // list of string by-vars
 91 | 		}
 92 | 	}
 93 | 	
 94 | 	* Sort using by-groups
 95 | 	sort `by' `clusterby'
 96 | 	
 97 | 	* Generate a single by-variable counting by groups
 98 | 	tempvar grp
 99 | 	egen `grp'=group(`bynumeric')
100 | 	qui drop if mi(`grp')
101 | 	
102 | 	* Drop observations missing independent or dependent variables
103 | 	* Also count number of variables here including constant, awkward and should be replaced
104 | 	local num_x = 0
105 | 	foreach v in `varlist'{
106 | 		qui drop if mi(`v')
107 | 		local num_x = `num_x' + 1
108 | 	}
109 | 	local num_x = `num_x' - 1
110 | 	if "`nocons'"=="" local num_x = `num_x' + 1
111 | 
112 | 	* Drop observations missing weight, if weights are specified
113 | 	if "`weightby'"!="" {
114 | 		drop if `weightby'==.
115 | 	}
116 | 
117 | 	* XX revisit this later to handle missing data
118 | 	
119 | 
120 | 	* Perform regressions on each by-group, store in dataset
121 | 	mata: _regressby("`varlist'", "`grp'", "`bynumeric'","`clusterby'","`robust'","`weightby'")
122 | 	
123 | 	* Convert string by-vars back to strings, from numeric
124 | 	foreach var in `bystr' {
125 | 		decode ``var'N', gen(`var')
126 | 	}
127 | 	order `by'
128 | 
129 | 
130 | 	* XX find out if it is faster to compute R2 in Mata or Stata
131 | 	if "`nocov'"!="" {
132 | 		cap drop _cov_*
133 | 	}
134 | 
135 | 	* XX optionally save out to dta and just restore with a message
136 | 	if "`save'"=="" {
137 | 		restore, not
138 | 	}
139 | 	if "`save'"!="" {
140 | 		save `save', replace
141 | 		restore
142 | 	}
143 | 
144 | end
145 | 
146 | *-------------------------------------------------------------------------------
147 | * Mata program: _regressby
148 | * Inputs:
149 | *  	- A y-var and one or more x-vars for an OLS regression
150 | *  	- A group var, for which each value represents a distinct by-group.	
151 | *		This var must be in ascending order.
152 | *	- A list of numeric by-variables, whose groups correspond to the group var.
153 | * Outputs:
154 | *  	- dataset of coefficients from OLS regression for each by-group
155 | *-------------------------------------------------------------------------------
156 | 
157 | version 13.1
158 | set matastrict on
159 | 
160 | mata:
161 | void _regressby(string scalar regvars, string scalar grpvar, string scalar byvars, string scalar clusterby, string scalar robust, string scalar weightby) {
162 | 
163 | // Convert variable names to column indices
164 | real rowvector regcols, bycols, clustercol, weightcol
165 | real scalar ycol, xcol, grpcol
166 | regcols 	= st_varindex(tokens(regvars))
167 | bycols 		= st_varindex(tokens(byvars))
168 | clustercol 	= st_varindex(tokens(clusterby))
169 | weightcol 	= st_varindex(tokens(weightby))
170 | grpcol 		= st_varindex(grpvar)
171 | 
172 | // Fetch number of groups
173 | real scalar numgrp, startobs, curgrp
174 | numgrp 		= _st_data(st_nobs(),grpcol)
175 | startobs 	= 1  
176 | curgrp	 	= _st_data(1,grpcol)
177 | 
178 | // Preallocate output matrices and declare regression objects (matastrict requires every variable to be declared)
179 | real matrix groups, coefs, Vs, nobs
180 | groups 		= J(numgrp, cols(bycols), .)
181 | coefs 		= J(numgrp, cols(regcols), .)
182 | Vs          = J(numgrp, cols(regcols)^2, .)
183 | nobs 		= J(numgrp, 1, .)
184 | real matrix XX, Xy, XX_inv, V, Z, M, y, X, w, xi, info
185 | real scalar N, k, p, nc, obs, i, j
186 | real vector beta, e, cvar, ei
187 | string rowvector covariates
188 | string scalar covName, otherCovName
189 | 
190 | // -----------------------------------------------------------------------------
191 | // Iterate over groups
192 | // -----------------------------------------------------------------------------
193 | 
194 | // Iterate over groups 1 to Ng-1
195 | for (obs=1; obs<=st_nobs()-1; obs++) {
196 | 	if (_st_data(obs,grpcol)!=curgrp) {
197 | 		st_view(M, (startobs,obs-1), regcols, 0)
198 | 		st_subview(y, M, ., 1)
199 | 		st_subview(X, M, ., (2\.))
200 | 		N    = rows(X)
201 | 	// Augment x with either column of 1's or weights
202 | 	// TODO -- noconstant option needs to be specified here and also accounted for in df
203 | 		if (weightby!="") {
204 | 			st_view(w, (startobs,obs-1), weightcol, 0)
205 | 			X = X,w
206 | 		}
207 | 		if (weightby=="") {
208 | 			X = X,J(N,1,1)
209 | 		}
210 | 		// Define matrix products
211 | 		XX 		= quadcross(X,X)
212 | 		Xy 		= quadcross(X,y)
213 | 		XX_inv 	= invsym(XX)
214 | 		// ------------ COMPUTE COEFFICIENTS --------------------
215 | 		beta 	= (XX_inv*Xy)'
216 |         e 		= y - X*beta'
217 | 		p    	= cols(X)
218 | 		k    	= p - diag0cnt(XX_inv)
219 | 		// ------------ COMPUTE STANDARD ERRORS -----------------
220 | 		if (robust == "" & clusterby=="") {
221 | 			V 	= quadcross(e,e)/(N-k)*cholinv(XX)
222 | 		}
223 | 		if (robust != "") {
224 | 			V   = (N/(N-k))*XX_inv*quadcross(X, e:^2, X)*XX_inv
225 | 		}
226 | 		if (clusterby != "") {
227 | 			st_view(cvar,(startobs,obs-1),clustercol,0)
228 | 			info = panelsetup(cvar, 1)
229 | 			nc  = rows(info)
230 | 			Z   = J(k, k, 0)
231 | 			if (nc>2) {
232 | 				for (i=1; i<=nc; i++) {
233 | 					xi = panelsubmatrix(X,i,info)
234 | 					ei = panelsubmatrix(e,i,info)
235 | 					Z  = Z + xi'*(ei*ei')*xi
236 | 				}
237 | 				V   = ((N-1)/(N-k))*(nc/(nc-1))*XX_inv*Z*XX_inv
238 | 			}
239 | 		}
240 | 		// ------------ STORE OUTPUT ----------------------------
241 | 		coefs[curgrp,.] 	= beta
242 | 	    Vs[curgrp,.]        = rowshape(V, 1)
243 | 		nobs[curgrp,1]  	= N
244 | 		groups[curgrp,.] 	= st_data(startobs,bycols)
245 | 		// ------------ WRAP UP BY ITERATING COUNTERS -----------
246 | 		curgrp	 = _st_data(obs,grpcol)
247 | 		startobs = obs
248 | 	}
249 | }
250 | 
251 | // Iterate over last group manually
252 | obs=st_nobs()
253 | if (_st_data(obs,grpcol)==curgrp) {  // last observation is not a group to itself
254 | 	// increment obs, since code is written as processing the observation that is 1 past the last in the group
255 | 	++obs
256 | 	// compute OLS coefs: beta = inv(X'X) * X'y. --> see Example 4 of -help mf_cross-
257 | 	st_view(M, (startobs,obs-1), regcols, 0)
258 | 	st_subview(y, M, ., 1)
259 | 	st_subview(X, M, ., (2\.))
260 | 	N    = rows(X)
261 | 	// Augment X with either column of 1's (unweighted) or weights (weighted)
262 | 	// TODO -- noconstant option needs to be specified here and also accounted for in df
263 | 	if (weightby!="") {
264 | 		st_view(w, (startobs,obs-1), weightcol, 0)
265 | 		X = X,w
266 | 	}
267 | 	if (weightby=="") {
268 | 		X = X,J(N,1,1)
269 | 	}
270 | 	// Define matrix products
271 | 	XX 		= quadcross(X,X)
272 | 	Xy 		= quadcross(X,y)
273 | 	XX_inv 	= invsym(XX)
274 | 	beta 	= (XX_inv*Xy)'
275 |     e 		= y - X*beta'
276 | 	p    	= cols(X)
277 | 	k    	= p - diag0cnt(XX_inv)
278 | 	// USUAL OLS STANDARD ERRORS
279 | 	if (robust == "" & clusterby == "") {
280 | 		V 	= quadcross(e,e)/(N-k)*cholinv(XX)
281 | 	}
282 | 	// ROBUST STANDARD ERRORS
283 | 	if (robust != "") {
284 | 		V    = (N/(N-k))*XX_inv*quadcross(X, e:^2, X)*XX_inv
285 | 	}
286 | 	// CLUSTERED STANDARD ERRORS
287 | 	if (clusterby != "") {
288 | 		st_view(cvar,(startobs,obs-1),clustercol,0)
289 | 		info = panelsetup(cvar, 1)
290 | 		nc  = rows(info)
291 | 		Z   = J(k, k, 0)
292 | 		if (nc>2) {
293 | 			for (i=1; i<=nc; i++) {
294 | 				xi = panelsubmatrix(X,i,info)
295 | 				ei = panelsubmatrix(e,i,info)
296 | 				Z  = Z + xi'*(ei*ei')*xi
297 | 			}
298 | 			V   = ((N-1)/(N-k))*(nc/(nc-1))*XX_inv*Z*XX_inv
299 | 		}
300 | 	}
301 | 	// STORE REGRESSION OUTPUT
302 | 	coefs[curgrp,.] 	= beta
303 |     Vs[curgrp,.]        = rowshape(V, 1)
304 | 	nobs[curgrp,1]  	= N
305 | 	groups[curgrp,.] 	= st_data(startobs,bycols)
306 | }
307 | 	
308 | else {
309 | 	display("{error} last observation is in a singleton group")
310 | 	exit(2001)
311 | }
312 | 
313 | // -----------------------------------------------------------------------------
314 | // Gather output and pass back into Stata
315 | // -----------------------------------------------------------------------------
316 | 
317 | // Store group identifiers in dataset
318 | stata("qui keep in 1/"+strofreal(numgrp, "%18.0g"))
319 | stata("keep "+byvars)
320 | st_store(.,tokens(byvars),groups)
321 | 
322 | // Store coefficients in dataset:
323 | 
324 | // ... Number of observations,
325 | (void) st_addvar("long", "N")
326 | st_store(., ("N"), nobs)
327 | 
328 | // ... And then looping over covariates,
329 | covariates = (cols(regcols)>1) ? tokens(regvars)[|2 \ .|], "cons" : ("cons")
330 | for (k=1; k<=length(covariates); k++) {
331 |     covName = covariates[k]
332 |     // ... Coefficients and standard errors,
333 |     (void) st_addvar("float", "_b_"+covName)
334 |     (void) st_addvar("float", "_se_"+covName)
335 |     st_store(., "_b_"+covName,  coefs[., k])
336 |     st_store(., "_se_"+covName, sqrt(Vs[., k + cols(regcols)*(k - 1)]))
337 |     // ... And the sampling covariances.
338 |     for (j=1; j<k; j++) {
339 |         otherCovName = covariates[j]
340 |         (void) st_addvar("float", "_cov_"+covName+"_"+otherCovName)
341 |         st_store(., "_cov_"+covName+"_"+otherCovName, Vs[., k + cols(regcols) * (j - 1)])
342 |     }
343 | }
344 | 
345 | }
346 | 
347 | end
348 | 
349 | *-------------------------------------------------------------------------------
350 | * Auxiliary Stata programs for parsing vce (standard error options)
351 | * Source: https://blog.stata.com/2015/12/08/programming-an-estimation-command-in-stata-using-a-subroutine-to-parse-a-complex-option/
352 | *-------------------------------------------------------------------------------
353 | 
354 | program define my_vce_parse, rclass
355 |     syntax  [, vce(string) ]
356 |     local case : word count `vce'
357 |     if `case' > 2 {
358 |         my_vce_error , typed(`vce')
359 |     }
360 |     local 0 `", `vce'"' 
361 |     syntax  [, Robust CLuster * ]
362 |     if `case' == 2 {
363 |         if "`robust'" == "robust" | "`cluster'" == "" {
364 |             my_vce_error , typed(`vce')
365 |         }
366 |         capture confirm numeric variable `options'
367 |         if _rc {
368 |             my_vce_error , typed(`vce')
369 |         }
370 |         local clustervar "`options'" 
371 |     }
372 |     else {    // case = 1
373 |         if "`robust'" == "" {
374 |             my_vce_error , typed(`vce')
375 |         }
376 |     }
377 |     return clear    
378 |     return local clustervar "`clustervar'" 
379 | end
380 |  
381 | program define my_vce_error
382 |     syntax , typed(string)
383 |     display `"{red}{bf:vce(`typed')} invalid"'
384 |     error 498
385 | end
386 | 


--------------------------------------------------------------------------------
/regressby.pkg:
--------------------------------------------------------------------------------
1 | v 0.51
2 | d
3 | d 'regressby': fast and flexible grouped regressions
4 | d
5 | d Distribution-Date: 20180731
6 | d
7 | f regressby.ado
8 | f regressby.sthlp
9 | 


--------------------------------------------------------------------------------
/regressby.sthlp:
--------------------------------------------------------------------------------
  1 | {smcl}
  2 | {* *! version 0.51  31jul2018}{...}
  3 | {viewerjumpto "Syntax" "regressby##syntax"}{...}
  4 | {viewerjumpto "Description" "regressby##description"}{...}
  5 | {viewerjumpto "Options" "regressby##options"}{...}
  6 | {viewerjumpto "Examples" "regressby##examples"}{...}
  7 | {viewerjumpto "Author" "regressby##author"}{...}
  8 | {viewerjumpto "Acknowledgements" "regressby##acknowledgements"}{...}
  9 | {title:Title}
 10 |  
 11 | {p2colset 5 19 21 2}{...}
 12 | {p2col :{hi:regressby} {hline 2}}Fast, flexible grouped regressions{p_end}
 13 | {p2colreset}{...}
 14 |  
 15 |  
 16 |  
 17 | {marker syntax}{title:Syntax}
 18 |  
 19 | {p 8 15 2}
 20 | {cmd:regressby}
 21 | depvar [indepvars] {ifin}
 22 | {weight}, by(varlist)
 23 | [{cmd:}{it:options}]
 24 |                                
 25 |  
 26 | {synoptset 30 tabbed}{...}
 27 | {synopthdr :options}
 28 | {synoptline}
 29 |  
 30 | {syntab :Main}
 31 | {synopt :{opt vce(vcetype)}}{it:vcetype} may be {bf:robust}, or {bf:cluster} {it:clustvar}.{p_end}
 32 | {synopt :{opt nocovs}}Do not compute the sampling covariances between estimated coefficients.{p_end}
 33 |  
 34 | {syntab :Save Output}
 35 | {synopt :{opt save(filename)}}Saves output to the .dta file given by {it:filename} and restores the original data{p_end}
 36 | 
 37 | {synoptline}
 38 | {p 4 6 2}
 39 | {opt aweight}s are allowed;
 40 | see {help weight}.
 41 | {p_end}
 42 |  
 43 |  
 44 |  
 45 | {marker description}{...}
 46 | {title:Description}
 47 |  
 48 | {pstd}
 49 | {opt regressby} runs a series of grouped regressions of a dependent variable (y) on a set of independent variables (x), separately within each distinct value of the grouping by-variable(s).
 50 | 
 51 |  
 52 |  
 53 | {marker options}{...}
 54 | {title:Options}
 55 |  
 56 | {dlgtab:Main}
 57 |  
 58 | {phang}
 59 | {opth vce(vcetype)} Choose a method for calculating standard errors. The default method computes asymptotic OLS standard errors. The option {bf:vce}({it:robust}) computes heteroskedasticity-robust standard errors. The option {bf:vce}({it:cluster clustervar}) computes cluster-robust standard errors with clusters defined by the variable {it:clustervar}.
 60 |  
 61 | {dlgtab:Save Output}
 62 |  
 63 | {phang}
 64 | {opt save(filename)} saves the output dataset to the file specified by {it:filename}. If a full file path is not provided, the working directory is used. If no file extension is specified, .dta is assumed.
 65 | 
 66 | {marker examples}{...}
 67 | {title:Examples}
 68 |  
 69 | {marker example1}{...}
 70 | {pstd}{bf:Example 1}
 71 | 
 72 | {pstd}Load the auto example dataset.{p_end}
 73 | {phang2}. {stata sysuse auto, clear}{p_end}
 74 | 
 75 | {pstd}Regress price on mpg within each value of foreign.{p_end}
 76 | {phang2}. {stata regressby price mpg, by(foreign)}{p_end}
 77 | 
 78 | {pstd}Examine the data.{p_end}
 79 | {phang2}. {stata list}{p_end}
 80 | 
 81 | {marker example2}{...}
 82 | {pstd}{bf:Example 2}
 83 | 
 84 | {pstd}Load the life expectancy by country example dataset.{p_end}
 85 | {phang2}. {stata sysuse lifeexp, clear}{p_end}
 86 | 
 87 | {pstd}Regress life expectancy on per-capita GNP within region, saving out to output.dta in the working directory.{p_end}
 88 | {phang2}. {stata regressby lexp gnppc, by(region) save(output.dta)}{p_end}
 89 | 
 90 | 
 91 | 
 92 | {pstd}{p_end}
 93 |  
 94 | {marker author}{...}
 95 | {title:Author}
 96 |  
 97 | {pstd}Michael Droste{p_end}
 98 | {pstd}thedroste@gmail.com{p_end}
 99 |  
100 |  
101 |  
102 | {marker acknowledgements}{...}
103 | {title:Acknowledgements}
104 |  
105 | {pstd}The present version of {cmd:regressby} is based on code written for Michael Stepner's Health Inequality Project. It was extended by Michael Droste with helpful contributions by Wilbur Townsend. Regressby also benefited from valuable advice provided by Raj Chetty.
106 |  
107 | 


--------------------------------------------------------------------------------
/stata.toc:
--------------------------------------------------------------------------------
1 | 
2 | v 0.51
3 | d Michael Droste, thedroste@gmail.com
4 | p 'regressby': fast and flexible grouped regressions


--------------------------------------------------------------------------------