├── LICENSE
├── README.md
├── fig
├── aapl.png
├── compare1.png
├── compare2.png
├── tsla1.png
└── tsla2.png
└── src
├── calc_welch_market_beta.md
├── comp_funda_clean.sas
└── crsp_monthly.md
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2019 Chao Zi
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 |
2 | # Sample SAS Programs for Processing WRDS Data
3 |
4 | In this repository, I present a selection of SAS programs that (pre-)process [WRDS data](https://wrds-www.wharton.upenn.edu/) and compute certain variables/measures, as well as conduct some simple analysis.
5 | My goal is to provide efficient procedures for cleaning and transforming raw data from various WRDS databases (e.g., CRSP, Compustat, IBES, etc.) into well-structured datasets that contain only variables of interest and are conducive to econometric analysis.
6 | By walking through the steps in each program, one can
7 | (1) quickly gain a working knowledge of related raw data (e.g., file structures, variable definitions, etc.),
8 | and (2) understand the proper steps in the cleaning and organizing processes.
9 | I believe these programs are well-written and should be pretty straightforward to interpret (even for people who are new to SAS).
10 | They are also very adaptable and can be easily tailored to serve specific research purposes.
11 | (I personally have used these programs as building blocks for more complicated projects.)
12 | You should be able to run these programs smoothly on [WRDS SAS Studio](https://wrds-www.wharton.upenn.edu/pages/data/sas-studio-wrds/).
13 | Should you have any questions or find any bugs, please submit an issue or email me at [i@czi.finance](mailto:i@czi.finance).
14 | I will do my best to help!
15 |
16 |
17 | ### Table of Contents
18 | #### Basic programs
19 | - [Examine companies' financial performance over time](#comp)
20 | - [Construct stock portfolios based on past characteristics](#crsp)
21 | - [Compare analysts' forecasts with firms' actual earnings](#ibes)
22 | #### Additional programs
23 | - [Calculate market betas according to Welch (2019)](./src/calc_welch_market_beta.md)
24 |
25 |
26 |
27 | ## Examine companies' financial performance over time
28 |
29 | In this program, I build from [Compustat](https://wrds-web.wharton.upenn.edu/wrds/query_forms/navigation.cfm?navId=60) a data set that contains various financial measures for U.S. companies; they are calculated from **annual** financial statements.
30 | This data set can be used to study and compare the performance of one or more companies over time.
31 | As an illustration, I plot four figures below using this data.
32 | The first two shows the market leverage, asset turnover, and profit margin, as well as the asset and sales growth and the market-to-book (assets) ratio for **Tesla Inc.** (`gvkey = 184996`) over the past five fiscal years.
33 | The latter two figures compare Tesla with Ford and GM.
34 |
35 | Tesla did not use much leverage in the past five years.
36 | Its market leverage was about 8 percent in the 2015 fiscal year.
37 | That number increased to around 17 percent in 2016 and remained there ever since.
38 | Tesla's asset turnover declined from 50 percent in 2015 to roughly 30-40 percent in 2016-17.
39 | That number climbed back and reached more than 70 percent in 2018-19, suggesting that Tesla has become more effective in generating sales.
40 | That said, Tesla has not been good at generating profit.
41 | It has a negative profit margin in 2015-18.
42 | That number only turned positive in 2019.
43 | Tesla's book value of assets almost tripled in 2016, which is driven largely by the acquisition of *SolarCity*.
44 | But it has been growing rapidly nonetheless:
45 | Its balance sheet expanded by 38 and 26 percent in 2015 and 2017, respectively; that number slipped to 4 and 15 percent in 2018 and 2019, respectively.
46 | Tesla's sales also increased rapidly.
47 | In particular, its annual sales grew by 73, 68, and 83 percent in 2016, 2017, and 2018, respectively.
48 | But that number declined to less than 15 percent in 2019.
49 | Before the acquisition of *SolarCity*, Tesla's market value of assets was five times its book value, while after the acquisition, that number declined to around three (still a pretty high valuation).
50 |
51 | 
52 | 
53 |
54 | In comparison, Ford and GM used much more leverage: about 70-80 percent for the former, and around 60 percent for the latter.
55 | But their growth seem to have stagnated, with both firms' sales growth hovering around zero.
56 |
57 | Without further ado, let's dive into the code!
58 |
59 |
60 | 1. Begin by defining a set of macro variables that indicate sample period (e.g., from 1950 to 2019), data filters, and financial statement items to include.
61 | Obtain a complete list of covered companies, and construct a *balanced* panel (which I will refer to as the *Panel* hereafter) that spans the full sample period without time gap.
62 | ```sas
63 | %let yr_beg = 1950;
64 | %let yr_end = 2019;
65 | %let funda_filter = ((indfmt eq "INDL") and (consol eq 'C') and
66 | (popsrc eq 'D') and (datafmt eq "STD") and
67 | (fic eq "USA") and (curncd eq "USD") and
68 | (curcd eq "USD") and not missing(at));
69 | %let funda_fncd_filter = ((indfmt eq "INDL") and (consol eq 'C') and
70 | (popsrc eq 'D') and (datafmt eq "STD"));
71 | %let secm_filter = (primiss eq 'P' and fic eq "USA" and curcdm eq "USD");
72 | %let comp_sample_period = (&yr_beg. le fyear le &yr_end.);
73 | %let secm_sample_period = ("01jan&yr_beg."d le datadate le "31dec&yr_end."d);;
74 |
75 | /* panel variable: gvkey; time variable: fyear or datadate */
76 | %let names_vars = gvkey conm sic naics;
77 | %let funda_keys = gvkey datadate fyear;
78 | %let funda_vars = sich naicsh at sale csho prcc_f seq ceq lct lt
79 | dltt mib txditc txdb itcb pstk pstkrv pstkl lo
80 | dlc cogs xsga revt xint xopr oiadp oibdp dp
81 | ppegt invt ib ppent dpact
82 | ;
83 |
84 | proc sort data = comp.names nodupkey
85 | out = _tmp0 (keep = &names_vars.);
86 | by gvkey;
87 | %let nyr = %eval(&yr_end. - &yr_beg.);
88 | data _tmp0; set _tmp0;
89 | yr_0 = &yr_beg.;
90 | array yr {&nyr.} yr_1 - yr_&nyr.;
91 | do i = 1 to &nyr.;
92 | yr(i) = yr_0 + i;
93 | end;
94 | drop i;
95 | proc transpose data = _tmp0
96 | out = _tmp1 (rename = (col1 = fyear)
97 | drop = _name_);
98 | by &names_vars.;
99 | var yr_0 - yr_&nyr.;
100 | run;
101 | ```
102 |
103 | 2. Extract only the valid records and the relevant items from the Fundamental Annual file for the given sample period.
104 | Here, only U.S. firm-years with non-missing book value of assets are included.
105 | It is important to do a sanity check and see whether the pair of `gvkey` and `datadate` uniquely identify observations.
106 | ```sas
107 | data funda_short; set comp.funda;
108 | where &comp_sample_period. and &funda_filter.;
109 | keep &funda_keys. &funda_vars.;
110 | proc sort nodupkey; by gvkey datadate;
111 | run;
112 | ```
113 |
114 | 3. Add to the *Panel* the relevant financial statement items as well as any variables of interest calculated from them (e.g., book value of equity `be`, market value of equity `me`, book value of debt `bd`, asset turnover `to`, profit margin `pm`, etc.).
115 | For each firm, exclude the preceding and trailing null observations---that is, those earlier (later) than the first (last) available record.
116 | (This makes the *Panel* unbalanced, but significantly reduces its size.)
117 | Note that alternative definitions are used to minimize the instances of missing value.
118 | Besides, (preliminary) sanity checks are conducted when defining all variables.
119 | ```sas
120 | proc sql;
121 | create table _tmp11 (drop = _gvkey _fyear) as
122 | select a.* , b.* from _tmp1 as a left join
123 | funda_short (rename = (gvkey = _gvkey fyear = _fyear)) as b
124 | on a.gvkey eq b._gvkey and a.fyear eq b._fyear
125 | group by gvkey
126 | having min(_fyear) le fyear le max(_fyear)
127 | ;
128 | quit;
129 | proc sort; by gvkey fyear datadate;
130 | data _tmp2;
131 | format gvkey fyear datadate conm sic;
132 | keep gvkey fyear datadate conm sic
133 | at sale ib be me bd PnI to pm ic;
134 | set _tmp11; by gvkey fyear datadate;
135 | /* if last.fyear; */
136 | if not missing(sich) then sic = put(sich , z4.);
137 | /* naics = coalesce(naicsh , naics); */
138 | /* naics3 = substr(put(naics , 6. -l) , 1 , 3); */
139 | /*************** VARIABLE DEFINITION ***************/
140 | be = coalesce(seq , sum(ceq , pstk) , sum(at-lt , -mib))
141 | + coalesce(txditc , sum(txdb , itcb) ,
142 | lt - lct - lo - dltt , 0)
143 | - coalesce(pstkrv , pstkl , pstk , 0);
144 | me = prcc_f * csho; me = ifn(me>0,me,.);
145 | bd = dlc + dltt; bd = ifn(bd>0,bd,.);
146 | ppegt = coalesce(ppegt , ppent+dpact);
147 | PnI = ppegt + invt; PnI = ifn(PnI>0,PnI,.);
148 | sale = ifn(missing(sale) , revt , sale);
149 | to = sale / ifn(at>0,at,.);
150 | oibdp = coalesce(oibdp , sale-xopr , sale-cogs-xsga);
151 | oiadp = coalesce(oiadp , oibdp-dp);
152 | pm = oiadp / ifn(sale>0,sale,.);
153 | ic = oiadp / ifn(xint>0,xint,.);
154 | /* check uniqueness of the key */
155 | proc sort nodupkey; by gvkey fyear;
156 | run;
157 | ```
158 |
159 | 4. Complement the market value of equity using information from the monthly security file (`comp.secm`).
160 | Consider only the primary equity issue of a company:
161 | use its market capitalization at the end of a fiscal year (or the following quarter-end if the former is unavailable) to represent the market value of equity.
162 | ```sas
163 | data _tmp21;
164 | keep gvkey datadate me_secm;
165 | set comp.secm;
166 | where &secm_filter. and &secm_sample_period.;
167 | me_secm = prccm * cshoq;
168 | if not missing(me_secm);
169 | rename datadate = mdate;
170 | /* check uniqueness of the key */
171 | proc sort nodupkey; by gvkey mdate;
172 | run;
173 |
174 | proc sql;
175 | create table _tmp22 as
176 | select a.gvkey , a.fyear , a.datadate ,
177 | b.mdate , b.me_secm
178 | from _tmp2 as a left join _tmp21 as b
179 | on a.gvkey eq b.gvkey and
180 | 0 le intck('mon' , a.datadate , b.mdate) le 3
181 | order by a.gvkey , a.fyear , a.datadate , b.mdate
182 | ;
183 | quit;
184 | proc transpose data = _tmp22 out = _tmp23 (drop = _name_)
185 | prefix = me_secm;
186 | var me_secm;
187 | by gvkey fyear datadate;
188 | data _tmp3;
189 | drop me_secm me_secm1 - me_secm2;
190 | merge _tmp2 _tmp23;
191 | by gvkey fyear datadate;
192 | me_secm = coalesce(of me_secm1 - me_secm2);
193 | me = coalesce(me , me_secm); me = ifn(me>0,me,.);
194 | /*************** VARIABLE DEFINITION ***************/
195 | ml = bd / (me + bd);
196 | bm = be / me;
197 | proc sort nodupkey; by gvkey fyear;
198 | run;
199 | ```
200 |
201 | 5. Obtain the footnotes for certain data items from the Fundamental Annual Footnote file.
202 | They can be used to identify and filter out extreme values caused by some extraordinary corporate actions/events (e.g., M&A, spin-off, bankruptcy, etc.).
203 | ```sas
204 | proc sql;
205 | create table _tmp4 as
206 | select a.* , b.at_fn , b.sale_fn
207 | from _tmp3 as a left join
208 | (select * from comp.funda_fncd
209 | where &comp_sample_period.
210 | and &funda_fncd_filter.) as b
211 | on a.gvkey eq b.gvkey and
212 | a.datadate eq b.datadate
213 | ;
214 | quit;
215 | proc sort nodupkey; by gvkey fyear;
216 | run;
217 | ```
218 |
219 | 6. *(optional)* Add to the *Panel* companies' sales to major customers (which is reported in `compsegd.seg_customer`).
220 | Here I use the U.S. government as an example: I compute firms' total sales to federal, state, and local governments combined.
221 | ```sas
222 | proc sql;
223 | create table _tmp41 as
224 | select gvkey , datadate , sum(salecs) as sale_gov
225 | from compsegd.seg_customer
226 | where ctype in ('GOVDOM' , 'GOVSTATE' , 'GOVLOC')
227 | group by gvkey , datadate
228 | having sale_gov gt 0
229 | ;
230 | create table comp_funda_clean as
231 | select a.* , b.sale_gov
232 | from _tmp4 as a left join _tmp41 as b
233 | on a.gvkey eq b.gvkey and
234 | a.datadate eq b.datadate
235 | ;
236 | quit;
237 | proc sort nodupkey; by gvkey fyear;
238 | run;
239 | ```
240 |
241 | 7. One can export the *Panel* in different formats (e.g., .dta, .csv) conducive to subsequent analysis.
242 | For example, I use `comp_funda_clean` in Stata to plot those figures above.
243 | ```sas
244 | %let FFP = "[The Path to Your Output Folder]/comp_funda_clean.dta";
245 | proc export data = comp_funda_clean outfile = &FFP. replace; run;
246 | ```
247 | Note that I use `left join` when adding new variables to the *Panel*, and then check whether the uniqueness of the key is preserved.
248 | I believe this is a good practice that helps prevent unintentionally duplicating or deleting observations when one merges data.
249 | It also helps reveal bugs if there is any.
250 |
251 | [Back to top](#top)
252 |
253 |
254 | ## Construct stock portfolios based on past characteristics
255 |
256 |
257 | ## Compare analysts' forecasts with firms' actual earnings
258 |
259 | In this program, I build from [IBES](https://wrds-web.wharton.upenn.edu/wrds/query_forms/navigation.cfm?navId=221&_ga=2.202254610.2026535339.1587168594-1066308586.1576595708) a data set that contains US companies' actual *earnings per share* (EPS) for certain fiscal years, along with the corresponding forecasts made by financial analysts prior to the earnings announcements.
260 | This data set can be used to address questions like:
261 | - Do analysts make rational predictions?
262 | - What is the impact of surprisingly high/low earnings?
263 | - What is driving the earnings surprises?
264 |
265 | As an illustration, I plot the figure below using this data.
266 | It shows analysts' predictions of **Apple Inc.**'s EPS for the 2019 fiscal year, as well as the actual number that was announced on October 30, 2019.
267 | One can see that analysts made forecasts throughout the year, and overall they seem to slightly underestimate Apple's earnings for this fiscal year.
268 |
269 |
270 |
271 | Without further ado, let's dive into the code!
272 |
273 | 1. Begin by choosing sample period (e.g., from 1970 to 2020); `pends` and `fpedats` denote fiscal period end.
274 | Extract from `ibes.actu_epsus` the actual EPS data and apply a series of standard filters.
275 | The resulting data set `_tmp0` covers U.S. firms that report EPS in dollars.
276 | In this exercise, we focus on annual EPS instead of quarterly one (i.e., `pdicity eq "ANN"`).
277 | Observations with missing announcement dates or EPS values are deleted.
278 | ```sas
279 | %let yr_beg = 1970;
280 | %let yr_end = 2020;
281 | %let ibes_actu_period = ("01jan&yr_beg."d le pends le "31dec&yr_end."d);
282 | %let ibes_actu_filter = (measure eq "EPS" and anndats le actdats and
283 | curr_act eq "USD" and usfirm eq 1);
284 | %let ibes_detu_period = ("01jan&yr_beg."d le fpedats le "31dec&yr_end."d);
285 | %let ibes_detu_filter = (missing(currfl) and measure eq "EPS" and missing(curr) and
286 | usfirm eq 1 and anndats le actdats and report_curr eq "USD" );
287 |
288 | data _tmp0;
289 | format ticker pends pdicity anndats value;
290 | set ibes.actu_epsus;
291 | where &ibes_actu_period. and &ibes_actu_filter.;
292 | if pdicity eq "ANN" and nmiss(anndats , value) eq 0;
293 | keep ticker pends pdicity anndats value;
294 | /* Sanity check: there should be only one observation for a given firm-fiscal period. */
295 | proc sort nodupkey; by ticker pends pdicity;
296 | run;
297 | ```
298 | 2. Extract from `ibes.detu_epsus` analysts' EPS forecasts and apply a series of standard filters.
299 | The resulting data set `_tmp1` covers U.S. firms that report EPS in dollars and analysts who report predictions in dollars.
300 | In this exercise, we only consider one-year-ahead forecasts (i.e., `fpi in ('1')`)
301 | Observations with missing *forecast* announcement dates or predicted EPS values are excluded.
302 | Each broker (`estimator`) may have multiple analysts (`analys`).
303 | Some EPS are on a primary basis while others on a diluted basis, as indicated by `pdf`.
304 | An analyst may make multiple forecasts throughout the period before the actual EPS announcement.
305 | For each analyst, only her last forecast before an EPS announcement is included.
306 | Alternatively, one can change the last line of code to keep the last forecast made by a given analyst on a given date.
307 | (Yes, analysts may report multiple forecasts on a given date.)
308 | ```sas
309 | data _tmp1;
310 | format ticker fpedats estimator analys anndats pdf fpi value;
311 | set ibes.detu_epsus;
312 | where &ibes_detu_period. and &ibes_detu_filter.;
313 | if fpi in ('1') and nmiss(anndats , value) eq 0;
314 | keep ticker fpedats estimator analys anndats pdf fpi value;
315 | proc sort; by ticker fpedats estimator analys anndats;
316 | data _tmp1; set _tmp1;
317 | by ticker fpedats estimator analys anndats;
318 | if last.analys; * last.anndats;
319 | run;
320 | ```
321 |
322 | 3. Run a WRDS macro to create a link table between IBES TICKER and CRSP PERMNO.
323 | Keep only high-quality links, and exclude cases where one *ticker* is matched to multiple *permno*.
324 | Create a list of all relevant *permno*, and then extract their price and share adjustment factor from CRSP daily stock file; keep only observations within the sample period.
325 | ```sas
326 | %iclink (ibesid = ibes.id , crspid = crsp.stocknames , outset = iclnk);
327 | proc sort data = iclnk (where = (score in (0 , 1 , 2)))
328 | uniout = iclnk_uniperm nouniquekey;
329 | by ticker; run;
330 |
331 | proc sort data = iclnk_uniperm nodupkey
332 | out = allperm (keep = permno);
333 | by permno; run;
334 |
335 | proc sql;
336 | create table dsf_short as
337 | select a.permno , a.date , a.prc , a.cfacshr
338 | from crsp.dsf as a , allperm as b
339 | where a.permno eq b.permno and
340 | a.date ge "01jan&yr_beg."d
341 | ;
342 | quit;
343 | ```
344 |
345 | 4. For each firm-fiscal year, obtain the latest stock price and share adjustment factor on/before the earnings announcement date.
346 | ```sas
347 | proc sql;
348 | create table _tmp01 as
349 | select a.* , b.permno
350 | from _tmp0 as a , iclnk_uniperm as b
351 | where a.ticker eq b.ticker
352 | order by ticker , pends
353 | ;
354 | create table _tmp02 as
355 | select a.* , abs(b.prc) as prc_act , b.cfacshr as adjfac_act
356 | from _tmp01 as a , dsf_short as b
357 | where a.permno eq b.permno and
358 | intnx("week" , a.anndats , -1 , 'b') le b.date le a.anndats
359 | group by a.ticker , a.pends , a.anndats
360 | having abs(a.anndats - b.date) eq min(abs(a.anndats - b.date))
361 | order by a.ticker , a.pends , a.anndats
362 | ;
363 | quit;
364 | ```
365 |
366 | 5. For each analysts' forecast, obtain the latest share adjustment factor on/before the *forecast* announcement date.
367 | ```sas
368 | proc sql;
369 | create table _tmp11 as
370 | select a.* , b.permno
371 | from _tmp1 as a , iclnk_uniperm as b
372 | where a.ticker eq b.ticker
373 | order by ticker , fpedats , estimator , analys , anndats
374 | ;
375 | create table _tmp12 as
376 | select a.* , b.cfacshr as adjfac_est
377 | from _tmp11 as a , dsf_short as b
378 | where a.permno eq b.permno and
379 | intnx("week" , a.anndats , -1 , 'b') le b.date le a.anndats
380 | group by a.ticker , a.fpedats , a.estimator , a.analys , a.anndats
381 | having abs(a.anndats - b.date) eq min(abs(a.anndats - b.date))
382 | order by a.ticker , a.fpedats , a.estimator , a.analys , a.anndats
383 | ;
384 | quit;
385 | ```
386 |
387 | 6. Merge analysts' forecasts with actual EPS.
388 | To ensure that predicted and actual EPS are based on the same number of shares outstanding, adjust the predicted ones for stock splits etc. using the CRSP share adjustment factor.
389 | (For details, see [A Note on IBES Unadjusted Data](https://wrds-www.wharton.upenn.edu/pages/support/manuals-and-overviews/i-b-e-s/ibes-estimates/wrds-research-notes/note-ibes-unadjusted-data/).)
390 | ```sas
391 | proc sql;
392 | create table _tmp2 as
393 | select a.ticker , a.pends as fpedats , a.anndats ,
394 | a.value as actval , a.prc_act , a.adjfac_act ,
395 | b.estimator as broker , b.analys , b.anndats as estdats ,
396 | b.value as estval , b.adjfac_est , b.permno
397 | from _tmp02 as a , _tmp12 as b
398 | where a.ticker eq b.ticker and
399 | a.pends eq b.fpedats
400 | order by a.ticker, a.pends , a.anndats , b.anndats
401 | ;
402 | quit;
403 |
404 | data _tmp3; set _tmp2;
405 | if adjfac_est eq 0 then adjfac_est = 1;
406 | if adjfac_act eq 0 then adjfac_act = 1;
407 | estval_adj = estval * coalesce(adjfac_act / adjfac_est , 1);
408 | drop adjfac: estval;
409 | run;
410 | ```
411 |
412 | 7. Compute a number of statistics for analysts' forecasts,
413 | including the number of analysts who made forecasts in the 9 months prior to earnings announcement,
414 | and their mean (median) forecast.
415 | Also, compute the earnings-to-price ratio as well as two measures of (relative) forecast error.
416 | ```sas
417 | proc sql;
418 | create table _tmp4 as
419 | select ticker , permno , fpedats ,
420 | anndats , actval , prc_act ,
421 | count(*) as num_analys ,
422 | mean(estval_adj) as estavg ,
423 | median(estval_adj) as estmed
424 | from _tmp3
425 | where intnx("mon" , anndats , -9 , 'b') le estdats lt anndats
426 | group by ticker, permno, fpedats, anndats, actval, prc_act
427 | ;
428 | quit;
429 | /* Sanity check: there should be only one observation for a given firm-fiscal period. */
430 | proc sort nodupkey; by ticker fpedats;
431 | run;
432 |
433 | data _tmp5; set _tmp4;
434 | epratio = actval / prc_act;
435 | ferr1 = (estavg - actval) / prc_act;
436 | ferr2 = (estmed - actval) / prc_act;
437 | run;
438 | ```
439 |
440 | Finally, one can compute summary statistics for these measures.
441 | ```sas
442 | proc means data = _tmp5
443 | N mean median std skew kurt min p1 p5 p95 p99 max;
444 | var num_analys epratio ferr1 ferr2;
445 | run;
446 | ```
447 |
448 | [Back to top](#top)
449 |
--------------------------------------------------------------------------------
/fig/aapl.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/czi-finance/wrds_sample_code/60111750908dc5fa4f1897f114ffc8cd75e7a73d/fig/aapl.png
--------------------------------------------------------------------------------
/fig/compare1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/czi-finance/wrds_sample_code/60111750908dc5fa4f1897f114ffc8cd75e7a73d/fig/compare1.png
--------------------------------------------------------------------------------
/fig/compare2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/czi-finance/wrds_sample_code/60111750908dc5fa4f1897f114ffc8cd75e7a73d/fig/compare2.png
--------------------------------------------------------------------------------
/fig/tsla1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/czi-finance/wrds_sample_code/60111750908dc5fa4f1897f114ffc8cd75e7a73d/fig/tsla1.png
--------------------------------------------------------------------------------
/fig/tsla2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/czi-finance/wrds_sample_code/60111750908dc5fa4f1897f114ffc8cd75e7a73d/fig/tsla2.png
--------------------------------------------------------------------------------
/src/calc_welch_market_beta.md:
--------------------------------------------------------------------------------
1 |
2 | # Calculate [Welch (2019)](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3371240) Market Betas
3 |
4 | Below is a SAS macro that calculates market betas for CRSP stocks following [Welch (2019)](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3371240).
5 | The default version covers all stocks that are
6 | (1) ordinary common shares,
7 | (2) issued by companies incorporated in the U.S.,
8 | and (3) listed on the NYSE, AMEX, or NASDAQ.
9 | It calculates market betas for each stock month by month using **daily stock returns** with a rolling window of 60 months.
10 | Daily excess returns are winsorized at *(1 ± 3) * market excess return* before the calculation.
11 | Older observations are given less weights based on a decay rate of 2/252 per day.
12 |
13 |
14 | ##### Example usages:
15 | - calling `%welch_market_beta (output_ds = welch_beta)` would calculate market betas for all stock in each month of 2018, and the output would be stored in `welch_beta`.
16 | - calling `%welch_market_beta (yr_beg = 1927 , yr_end = 2018 , rw_mon = 36 , output_ds = welch_beta)` would calculate market betas for every month between 1927 and 2018 with a rolling window of 36 months.
17 | - calling `%welch_market_beta (delta = 5 , rho = 3 / 252 , output_ds = welch_beta)` would calculate market betas using daily excess returns that are winsorized at (1 ± 5) * market excess return, and with a daily decay rate of 3/252.
18 |
19 |
20 | ```sas
21 | %macro welch_market_beta (yr_beg = 2018 , yr_end = 2018 , rw_mon = 60 , delta = 3 , rho = 2 / 252 , output_ds = );
22 |
23 | /* only include stocks that are ordinary common shares issued by companies incorporated in the US and listed on the NYSE, AMEX, or NASDAQ */
24 | %let dsenames_std_filter = (shrcd in (10 , 11) and exchcd in (1 , 2 , 3));
25 | %let dsf_sample_period = (intnx('mon' , mdy(1,1,&yr_beg.) , - &rw_mon. , 'end') lt date le "31dec&yr_end."d);
26 |
27 | /* extract only relevant variables and observations from the Daily Stock File */
28 | proc sql;
29 | create table dsf_short as
30 | select a.permno , b.date , b.ret
31 | from (select * from crsp.dsenames
32 | where &dsenames_std_filter.) as a ,
33 | (select * from crsp.dsf
34 | where &dsf_sample_period.) as b
35 | where a.permno eq b.permno and
36 | a.namedt le b.date le a.nameendt
37 | order by a.permno , b.date
38 | ;
39 | quit;
40 |
41 | /* clear the work directory; make sure the relevant filenames have not been used */
42 | proc datasets lib = work nolist;
43 | delete &output_ds. _dset0 _dset1 _regest0 _regest1;
44 | run;
45 |
46 | /* start looping through every month in the sample period */
47 | %local mdate yy mm;
48 | %do yy = &yr_beg %to &yr_end;
49 | %do mm = 1 %to 12;
50 | /* %do yy = &yr_beg %to &yr_beg; */
51 | /* %do mm = 1 %to 1; */
52 |
53 | %let mdate = %sysfunc(mdy(&mm , 1 , &yy));
54 | %let mdate = %sysfunc(intnx(month , &mdate. , 0 , end));
55 | %put ********** Calculating market betas for %sysfunc(putn(&mdate , date9.)) **********;
56 |
57 | /* extract the most recent &rw_mon. months of daily data */
58 | data _dset0; set dsf_short;
59 | where 0 le intck('mon' , date , &mdate.) lt &rw_mon.;
60 | run;
61 |
62 | /* add risk-free rate and market excess return; make sure there are at least 250 valid daily observations for a stock to be included */
63 | proc sql;
64 | create table _dset1 as
65 | select a.* , (a.ret - b.rf) as exret , b.mktrf , b.rf
66 | from _dset0 as a
67 | left join ff.factors_daily as b
68 | on a.date eq b.date
69 | group by a.permno
70 | having count(exret) ge 250
71 | ;
72 | quit;
73 |
74 | /* winsorize and assign weights to daily excess returns following Welch (2019); */
75 | proc sort; by permno date;
76 | data _dset1; set _dset1;
77 | by permno date;
78 | if first.permno then n = 0; n + 1;
79 | agew = exp(&rho. * n);
80 | exrlo = min((1 - &delta.) * mktrf , (1 + &delta.) * mktrf);
81 | exrhi = max((1 - &delta.) * mktrf , (1 + &delta.) * mktrf);
82 | if not missing(exret) then
83 | exret = min(max(exrlo , exret) , exrhi);
84 | drop exrlo exrhi n rf;
85 | run;
86 |
87 | /* calculate market betas as the slope coefficients from the WLS estimation of CAPM */
88 | proc reg data = _dset1 outest = _regest0 edf noprint;
89 | model exret = mktrf;
90 | by permno; weight agew;
91 | run;
92 |
93 | data _regest1;
94 | format permno mdate bMkt;
95 | keep permno mdate bMkt;
96 | set _regest0 (rename = (mktrf = bMkt));
97 | mdate = &mdate.; format mdate date9.;
98 | run;
99 |
100 | /* add the obtained betas for this month to the final output */
101 | proc datasets lib = work nolist;
102 | append base = &output_ds. data = _regest1;
103 | delete _dset0 _dset1 _regest0 _regest1;
104 | run;
105 |
106 | %end;
107 | %end;
108 | %mend welch_market_beta;
109 | ```
110 |
--------------------------------------------------------------------------------
/src/comp_funda_clean.sas:
--------------------------------------------------------------------------------
1 | %macro comp_funda_clean (yr_beg = 1980 , yr_end = 2019 , output_ds = );
2 | /* %let yr_beg = 1978; %let yr_end = 2018; */
3 | %let funda_filter = ((indfmt eq "INDL") and (consol eq 'C') and
4 | (popsrc eq 'D') and (datafmt eq "STD") and
5 | (fic eq "USA") and (curncd eq "USD") and
6 | (curcd eq "USD") and not missing(at));
7 | %let funda_fncd_filter = ((indfmt eq "INDL") and (consol eq 'C') and
8 | (popsrc eq 'D') and (datafmt eq "STD"));
9 | %let secm_filter = (primiss eq 'P' and fic eq "USA" and curcdm eq "USD");
10 | %let comp_sample_period = (&yr_beg. le fyear le &yr_end.);
11 | %let secm_sample_period = ("01jan&yr_beg."d le datadate le "31dec&yr_end."d);;
12 |
13 | /* panel variable: gvkey; time variable: fyear and datadate */
14 | %let names_vars = gvkey conm sic naics;
15 | %let funda_keys = gvkey datadate fyear;
16 | %let funda_vars = sich naicsh at sale csho prcc_f seq ceq lct lt
17 | dltt mib txditc txdb itcb pstk pstkrv pstkl lo
18 | dlc cogs xsga revt xint ebitda
19 | ;
20 |
21 | proc sort data = comp.names nodupkey
22 | out = _tmp0 (keep = &names_vars.);
23 | by gvkey;
24 | %let nyr = %eval(&yr_end. - &yr_beg.);
25 | data _tmp0; set _tmp0;
26 | yr_0 = &yr_beg.;
27 | array yr {&nyr.} yr_1 - yr_&nyr.;
28 | do i = 1 to &nyr.;
29 | yr(i) = yr_0 + i;
30 | end;
31 | drop i;
32 | proc transpose data = _tmp0
33 | out = _tmp1 (rename = (col1 = fyear)
34 | drop = _name_);
35 | by &names_vars.;
36 | var yr_0 - yr_&nyr.;
37 | run;
38 |
39 | data funda_short; set comp.funda;
40 | where &comp_sample_period. and
41 | &funda_filter.;
42 | keep &funda_keys. &funda_vars.;
43 | proc sort nodupkey; by gvkey datadate;
44 | run;
45 |
46 | proc sql;
47 | create table _tmp11 (drop = _gvkey _fyear) as
48 | select a.* , b.*
49 | from _tmp1 as a
50 | left join funda_short (rename = (gvkey = _gvkey fyear = _fyear)) as b
51 | on a.gvkey eq b._gvkey and a.fyear eq b._fyear
52 | group by gvkey
53 | having min(_fyear) le fyear le max(_fyear)
54 | ;
55 | quit;
56 | proc sort; by gvkey fyear datadate;
57 | data _tmp2;
58 | format gvkey fyear datadate conm sic;
59 | keep gvkey fyear datadate conm
60 | sic at sale be me bd to pm;
61 | set _tmp11; by gvkey fyear datadate;
62 | /* if last.fyear; */
63 | sic = coalescec(put(sich,4.) , sic);
64 | /* naics = coalesce(naicsh , naics); */
65 | /* naics3 = substr(put(naics , 6. -l) , 1 , 3); */
66 | sale = ifn(missing(sale) , revt , sale);
67 | be = coalesce(seq , sum(ceq , pstk) , sum(at - lt , -mib))
68 | + coalesce(txditc , sum(txdb , itcb) ,
69 | lt - lct - lo - dltt , 0)
70 | - coalesce(pstkrv , pstkl , pstk , 0);
71 | be = ifn(be>0 , be , .);
72 | me = prcc_f * csho; me = ifn(me>0,me,.);
73 | bd = dlc + dltt;
74 | to = sale / ifn(at>0,at,.);
75 | pm = ebitda / ifn(sale>0,sale,.);
76 | /* check uniqueness of the key */
77 | proc sort nodupkey; by gvkey fyear;
78 | run;
79 |
80 | data _tmp21;
81 | keep gvkey datadate me_secm;
82 | set comp.secm;
83 | where &secm_filter. and
84 | &secm_sample_period.;
85 | me_secm = prccm * cshoq;
86 | rename datadate = mdate;
87 | if not missing(me_secm);
88 | /* check uniqueness of the key */
89 | proc sort nodupkey; by gvkey mdate;
90 | run;
91 |
92 | proc sql;
93 | create table _tmp22 as
94 | select a.gvkey , a.fyear , a.datadate ,
95 | b.mdate , b.me_secm
96 | from _tmp2 as a
97 | left join _tmp21 as b
98 | on a.gvkey eq b.gvkey and
99 | 0 le intck('mon' , a.datadate , b.mdate) le 3
100 | order by a.gvkey , a.fyear , a.datadate , b.mdate
101 | ;
102 | quit;
103 | proc transpose data = _tmp22 out = _tmp23 (drop = _name_)
104 | prefix = me_secm;
105 | var me_secm;
106 | by gvkey fyear datadate;
107 | data _tmp3;
108 | drop me_secm me_secm1 - me_secm2;
109 | merge _tmp2 _tmp23;
110 | by gvkey fyear datadate;
111 | me_secm = coalesce(of me_secm1 - me_secm2);
112 | me = coalesce(me , me_secm); me = ifn(me>0,me,.);
113 | if bd gt 0 then ml = bd / (me + bd);
114 | if at gt 0 then mb = (at - be + me) / at;
115 | proc sort nodupkey; by gvkey fyear;
116 | run;
117 |
118 | proc sql;
119 | create table _tmp4 as
120 | select a.* , b.at_fn , b.sale_fn
121 | from _tmp3 as a
122 | left join
123 | (select * from comp.funda_fncd
124 | where &comp_sample_period.
125 | and &funda_fncd_filter.) as b
126 | on a.gvkey eq b.gvkey and
127 | a.fyear eq b.fyear and
128 | a.datadate eq b.datadate
129 | ;
130 | quit;
131 | proc sort nodupkey; by gvkey fyear;
132 | run;
133 |
134 | proc sql;
135 | create table _tmp41 as
136 | select gvkey , datadate , sum(salecs) as sale_gov
137 | from compsegd.seg_customer
138 | where ctype in ('GOVDOM' , 'GOVSTATE' , 'GOVLOC')
139 | group by gvkey , datadate
140 | having sale_gov gt 0
141 | ;
142 | create table &output_ds. as
143 | select a.* , b.sale_gov
144 | from _tmp4 as a
145 | left join _tmp41 as b
146 | on a.gvkey eq b.gvkey and
147 | a.datadate eq b.datadate
148 | ;
149 | quit;
150 | proc sort nodupkey; by gvkey fyear;
151 | run;
152 |
153 | proc datasets library = work nolist; save &output_ds.; run;
154 |
155 | %mend comp_funda_clean;
156 |
157 | %comp_funda_clean (output_ds = comp_funda_clean);
158 |
159 | %let FFP = "[...]/comp_funda_clean.dta";
160 | proc export data = comp_funda_clean outfile = &FFP. replace; run;
161 |
--------------------------------------------------------------------------------
/src/crsp_monthly.md:
--------------------------------------------------------------------------------
1 | The code below compiles a panel that contains monthly stock information.
2 |
3 | ```sas
4 | /* only include stocks that are ordinary common shares issued by companies incorporated in the US and listed on the NYSE, AMEX, or NASDAQ */
5 | %let msenames_std_filter = (shrcd in (10 , 11) and exchcd in (1 , 2 , 3));
6 | /* choose sample period */
7 | %let msf_sample_period = ('31jan2018'd le date le '31dec2018'd);
8 |
9 | proc sql;
10 | create table _tmp0 as
11 | select a.permno , /* add other vars from msenames if necessary, e.g., a.siccd */
12 | b.date , b.ret , b.retx , b.prc ,
13 | b.vol , b.shrout , b.cfacpr , b.cfacshr
14 | from (select * from crsp.msenames
15 | where &msenames_std_filter.) as a ,
16 | (select * from crsp.msf
17 | where &msf_sample_period.) as b
18 | where a.permno eq b.permno and
19 | a.namedt le b.date le a.nameendt
20 | order by a.permno , b.date
21 | ;
22 | /* add delisting return */
23 | create table _tmp1 as
24 | select a.* , b.dlret
25 | from _tmp0 as a
26 | left join crsp.msedelist as b
27 | on a.permno eq b.permno and
28 | intck('mon' , a.date , b.dlstdt) eq 0
29 | order by permno , date
30 | ;
31 | quit;
32 |
33 | data _tmp2;
34 | format permno mdate ret sz szb turn; /* order the vars */
35 | keep permno mdate ret sz szb turn; /* keep the needed vars */
36 | set _tmp1;
37 | mdate = intnx('mon' , date , 0 , 'e'); format mdate date9.;
38 | ret = (1 + ret) * sum(1 , dlret) - 1; /* adjust for delisting */
39 | if not missing(ret) then ret = min(max(-0.4 , ret) , 0.6); /* winsorizing */
40 | /* adjust price */
41 | if cfacpr eq 0 then p = abs(prc);
42 | else p = abs(prc) / cfacpr;
43 | if p le 0 then p = .;
44 | /* adjust shares outstanding */
45 | if cfacshr eq 0 then tso = shrout * 1000;
46 | else tso = shrout * cfacshr * 1000;
47 | if tso le 0 then tso = .;
48 | sz = p * tso / 1000000; /* market value of the stock at month end */
49 | szb = sz / (1 + retx); /* market value of the stock at month beginning */
50 | turn = vol * 100 / (shrout * 1000) * 100; /* turnover ratio in percentage */
51 | proc sort nodupkey; by permno mdate; /* check uniqueness of the key */
52 | run;
53 |
54 | /* a macro provided by WRDS that compiles a link table between crso and compustat */
55 | %macro compress_ccmxpf_lnkhist;
56 | %let prim_link = %str(linktype in ("LC", "LS", "LU"));
57 |
58 | proc sql;
59 | create table lnk1 as
60 | select *
61 | from crsp.ccmxpf_lnkhist
62 | where &prim_link.
63 | order by gvkey, lpermno, lpermco, linkdt, linkenddt
64 | ;
65 | quit;
66 |
67 | data lnk2;
68 | set lnk1;
69 | by gvkey lpermno lpermco linkdt linkenddt;
70 | format prev_ldt prev_ledt yymmddn8.;
71 | retain prev_ldt prev_ledt;
72 | if first.lpermno then do;
73 | if last.lpermno then do;
74 | /* Keep this obs if it's the first and last matching permno pair */
75 | output;
76 | end;
77 | else do;
78 | /* If it's the first but not the last pair, retain the dates for future use */
79 | prev_ldt = linkdt;
80 | prev_ledt = linkenddt;
81 | output;
82 | end;
83 | end;
84 | else do;
85 | if linkdt=prev_ledt+1 or linkdt=prev_ledt then do;
86 | /* If the date range follows the previous one, assign the previous linkdt value
87 | to the current - will remove the redundant in later steps. Also retain the
88 | link end date value */
89 | linkdt = prev_ldt;
90 | prev_ledt = linkenddt;
91 | output;
92 | end;
93 | else do;
94 | /* If it doesn't fall into any of the above conditions, just keep it and retain the
95 | link date range for future use*/
96 | output;
97 | prev_ldt = linkdt;
98 | prev_ledt = linkenddt;
99 | end;
100 | end;
101 | drop prev_ldt prev_ledt;
102 | run;
103 |
104 | data lnk;
105 | set lnk2;
106 | by gvkey lpermno linkdt;
107 | if last.linkdt;
108 | /* remove redundant observations with identical LINKDT (result of the previous data step), so that
109 | each consecutive pair of observations will have either different GVKEY-IID-PERMNO match, or
110 | non-consecutive link date range
111 | */
112 | run;
113 | %mend;
114 | %compress_ccmxpf_lnkhist;
115 |
116 | proc sql;
117 | create table _tmp3 as
118 | select distinct a.* , b.gvkey
119 | from _tmp2 as a left join lnk as b
120 | on a.permno eq b.lpermno and
121 | (b.LINKDT le a.mdate or missing(b.LINKDT)) and
122 | (b.LINKENDDT gt a.mdate or missing(b.LINKENDDT))
123 | order by permno , mdate
124 | /* group by permno , mdate having count(unique gvkey) gt 1 /* one permno-mdate should be matched to only one gvkey */
125 | ;
126 | quit;
127 | proc sort nodupkey; by permno mdate;
128 | run;
129 |
130 | %let FFP = "/home/uiuc/chaozi/test.csv"; /* change csv to dta if you need stata file */
131 | proc export data = _tmp3 outfile = &FFP. replace; run;
132 | ```
133 |
--------------------------------------------------------------------------------