├── LICENSE ├── README.md ├── fig ├── aapl.png ├── compare1.png ├── compare2.png ├── tsla1.png └── tsla2.png └── src ├── calc_welch_market_beta.md ├── comp_funda_clean.sas └── crsp_monthly.md /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2019 Chao Zi 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | 2 | # Sample SAS Programs for Processing WRDS Data 3 | 4 | In this repository, I present a selection of SAS programs that (pre-)process [WRDS data](https://wrds-www.wharton.upenn.edu/) and compute certain variables/measures, as well as conduct some simple analysis. 5 | My goal is to provide efficient procedures for cleaning and transforming raw data from various WRDS databases (e.g., CRSP, Compustat, IBES, etc.) into well-structured datasets that contain only variables of interest and are conducive to econometric analysis. 6 | By walking through the steps in each program, one can 7 | (1) quickly gain a working knowledge of related raw data (e.g., file structures, variable definitions, etc.), 8 | and (2) understand the proper steps in the cleaning and organizing processes. 9 | I believe these programs are well-written and should be pretty straightforward to interpret (even for people who are new to SAS). 10 | They are also very adaptable and can be easily tailored to serve specific research purposes. 11 | (I personally have used these programs as building blocks for more complicated projects.) 12 | You should be able to run these programs smoothly on [WRDS SAS Studio](https://wrds-www.wharton.upenn.edu/pages/data/sas-studio-wrds/). 13 | Should you have any questions or find any bugs, please submit an issue or email me at [i@czi.finance](mailto:i@czi.finance). 14 | I will do my best to help! 15 | 16 | 17 | ### Table of Contents 18 | #### Basic programs 19 | - [Examine companies' financial performance over time](#comp) 20 | - [Construct stock portfolios based on past characteristics](#crsp) 21 | - [Compare analysts' forecasts with firms' actual earnings](#ibes) 22 | #### Additional programs 23 | - [Calculate market betas according to Welch (2019)](./src/calc_welch_market_beta.md) 24 | 25 | 26 | 27 | ## Examine companies' financial performance over time 28 | 29 | In this program, I build from [Compustat](https://wrds-web.wharton.upenn.edu/wrds/query_forms/navigation.cfm?navId=60) a data set that contains various financial measures for U.S. companies; they are calculated from **annual** financial statements. 30 | This data set can be used to study and compare the performance of one or more companies over time. 31 | As an illustration, I plot four figures below using this data. 32 | The first two shows the market leverage, asset turnover, and profit margin, as well as the asset and sales growth and the market-to-book (assets) ratio for **Tesla Inc.** (`gvkey = 184996`) over the past five fiscal years. 33 | The latter two figures compare Tesla with Ford and GM. 34 | 35 | Tesla did not use much leverage in the past five years. 36 | Its market leverage was about 8 percent in the 2015 fiscal year. 37 | That number increased to around 17 percent in 2016 and remained there ever since. 38 | Tesla's asset turnover declined from 50 percent in 2015 to roughly 30-40 percent in 2016-17. 39 | That number climbed back and reached more than 70 percent in 2018-19, suggesting that Tesla has become more effective in generating sales. 40 | That said, Tesla has not been good at generating profit. 41 | It has a negative profit margin in 2015-18. 42 | That number only turned positive in 2019. 43 | Tesla's book value of assets almost tripled in 2016, which is driven largely by the acquisition of *SolarCity*. 44 | But it has been growing rapidly nonetheless: 45 | Its balance sheet expanded by 38 and 26 percent in 2015 and 2017, respectively; that number slipped to 4 and 15 percent in 2018 and 2019, respectively. 46 | Tesla's sales also increased rapidly. 47 | In particular, its annual sales grew by 73, 68, and 83 percent in 2016, 2017, and 2018, respectively. 48 | But that number declined to less than 15 percent in 2019. 49 | Before the acquisition of *SolarCity*, Tesla's market value of assets was five times its book value, while after the acquisition, that number declined to around three (still a pretty high valuation). 50 | 51 | 52 | 53 | 54 | In comparison, Ford and GM used much more leverage: about 70-80 percent for the former, and around 60 percent for the latter. 55 | But their growth seem to have stagnated, with both firms' sales growth hovering around zero. 56 | 57 | Without further ado, let's dive into the code! 58 | 59 | 60 | 1. Begin by defining a set of macro variables that indicate sample period (e.g., from 1950 to 2019), data filters, and financial statement items to include. 61 | Obtain a complete list of covered companies, and construct a *balanced* panel (which I will refer to as the *Panel* hereafter) that spans the full sample period without time gap. 62 | ```sas 63 | %let yr_beg = 1950; 64 | %let yr_end = 2019; 65 | %let funda_filter = ((indfmt eq "INDL") and (consol eq 'C') and 66 | (popsrc eq 'D') and (datafmt eq "STD") and 67 | (fic eq "USA") and (curncd eq "USD") and 68 | (curcd eq "USD") and not missing(at)); 69 | %let funda_fncd_filter = ((indfmt eq "INDL") and (consol eq 'C') and 70 | (popsrc eq 'D') and (datafmt eq "STD")); 71 | %let secm_filter = (primiss eq 'P' and fic eq "USA" and curcdm eq "USD"); 72 | %let comp_sample_period = (&yr_beg. le fyear le &yr_end.); 73 | %let secm_sample_period = ("01jan&yr_beg."d le datadate le "31dec&yr_end."d);; 74 | 75 | /* panel variable: gvkey; time variable: fyear or datadate */ 76 | %let names_vars = gvkey conm sic naics; 77 | %let funda_keys = gvkey datadate fyear; 78 | %let funda_vars = sich naicsh at sale csho prcc_f seq ceq lct lt 79 | dltt mib txditc txdb itcb pstk pstkrv pstkl lo 80 | dlc cogs xsga revt xint xopr oiadp oibdp dp 81 | ppegt invt ib ppent dpact 82 | ; 83 | 84 | proc sort data = comp.names nodupkey 85 | out = _tmp0 (keep = &names_vars.); 86 | by gvkey; 87 | %let nyr = %eval(&yr_end. - &yr_beg.); 88 | data _tmp0; set _tmp0; 89 | yr_0 = &yr_beg.; 90 | array yr {&nyr.} yr_1 - yr_&nyr.; 91 | do i = 1 to &nyr.; 92 | yr(i) = yr_0 + i; 93 | end; 94 | drop i; 95 | proc transpose data = _tmp0 96 | out = _tmp1 (rename = (col1 = fyear) 97 | drop = _name_); 98 | by &names_vars.; 99 | var yr_0 - yr_&nyr.; 100 | run; 101 | ``` 102 | 103 | 2. Extract only the valid records and the relevant items from the Fundamental Annual file for the given sample period. 104 | Here, only U.S. firm-years with non-missing book value of assets are included. 105 | It is important to do a sanity check and see whether the pair of `gvkey` and `datadate` uniquely identify observations. 106 | ```sas 107 | data funda_short; set comp.funda; 108 | where &comp_sample_period. and &funda_filter.; 109 | keep &funda_keys. &funda_vars.; 110 | proc sort nodupkey; by gvkey datadate; 111 | run; 112 | ``` 113 | 114 | 3. Add to the *Panel* the relevant financial statement items as well as any variables of interest calculated from them (e.g., book value of equity `be`, market value of equity `me`, book value of debt `bd`, asset turnover `to`, profit margin `pm`, etc.). 115 | For each firm, exclude the preceding and trailing null observations---that is, those earlier (later) than the first (last) available record. 116 | (This makes the *Panel* unbalanced, but significantly reduces its size.) 117 | Note that alternative definitions are used to minimize the instances of missing value. 118 | Besides, (preliminary) sanity checks are conducted when defining all variables. 119 | ```sas 120 | proc sql; 121 | create table _tmp11 (drop = _gvkey _fyear) as 122 | select a.* , b.* from _tmp1 as a left join 123 | funda_short (rename = (gvkey = _gvkey fyear = _fyear)) as b 124 | on a.gvkey eq b._gvkey and a.fyear eq b._fyear 125 | group by gvkey 126 | having min(_fyear) le fyear le max(_fyear) 127 | ; 128 | quit; 129 | proc sort; by gvkey fyear datadate; 130 | data _tmp2; 131 | format gvkey fyear datadate conm sic; 132 | keep gvkey fyear datadate conm sic 133 | at sale ib be me bd PnI to pm ic; 134 | set _tmp11; by gvkey fyear datadate; 135 | /* if last.fyear; */ 136 | if not missing(sich) then sic = put(sich , z4.); 137 | /* naics = coalesce(naicsh , naics); */ 138 | /* naics3 = substr(put(naics , 6. -l) , 1 , 3); */ 139 | /*************** VARIABLE DEFINITION ***************/ 140 | be = coalesce(seq , sum(ceq , pstk) , sum(at-lt , -mib)) 141 | + coalesce(txditc , sum(txdb , itcb) , 142 | lt - lct - lo - dltt , 0) 143 | - coalesce(pstkrv , pstkl , pstk , 0); 144 | me = prcc_f * csho; me = ifn(me>0,me,.); 145 | bd = dlc + dltt; bd = ifn(bd>0,bd,.); 146 | ppegt = coalesce(ppegt , ppent+dpact); 147 | PnI = ppegt + invt; PnI = ifn(PnI>0,PnI,.); 148 | sale = ifn(missing(sale) , revt , sale); 149 | to = sale / ifn(at>0,at,.); 150 | oibdp = coalesce(oibdp , sale-xopr , sale-cogs-xsga); 151 | oiadp = coalesce(oiadp , oibdp-dp); 152 | pm = oiadp / ifn(sale>0,sale,.); 153 | ic = oiadp / ifn(xint>0,xint,.); 154 | /* check uniqueness of the key */ 155 | proc sort nodupkey; by gvkey fyear; 156 | run; 157 | ``` 158 | 159 | 4. Complement the market value of equity using information from the monthly security file (`comp.secm`). 160 | Consider only the primary equity issue of a company: 161 | use its market capitalization at the end of a fiscal year (or the following quarter-end if the former is unavailable) to represent the market value of equity. 162 | ```sas 163 | data _tmp21; 164 | keep gvkey datadate me_secm; 165 | set comp.secm; 166 | where &secm_filter. and &secm_sample_period.; 167 | me_secm = prccm * cshoq; 168 | if not missing(me_secm); 169 | rename datadate = mdate; 170 | /* check uniqueness of the key */ 171 | proc sort nodupkey; by gvkey mdate; 172 | run; 173 | 174 | proc sql; 175 | create table _tmp22 as 176 | select a.gvkey , a.fyear , a.datadate , 177 | b.mdate , b.me_secm 178 | from _tmp2 as a left join _tmp21 as b 179 | on a.gvkey eq b.gvkey and 180 | 0 le intck('mon' , a.datadate , b.mdate) le 3 181 | order by a.gvkey , a.fyear , a.datadate , b.mdate 182 | ; 183 | quit; 184 | proc transpose data = _tmp22 out = _tmp23 (drop = _name_) 185 | prefix = me_secm; 186 | var me_secm; 187 | by gvkey fyear datadate; 188 | data _tmp3; 189 | drop me_secm me_secm1 - me_secm2; 190 | merge _tmp2 _tmp23; 191 | by gvkey fyear datadate; 192 | me_secm = coalesce(of me_secm1 - me_secm2); 193 | me = coalesce(me , me_secm); me = ifn(me>0,me,.); 194 | /*************** VARIABLE DEFINITION ***************/ 195 | ml = bd / (me + bd); 196 | bm = be / me; 197 | proc sort nodupkey; by gvkey fyear; 198 | run; 199 | ``` 200 | 201 | 5. Obtain the footnotes for certain data items from the Fundamental Annual Footnote file. 202 | They can be used to identify and filter out extreme values caused by some extraordinary corporate actions/events (e.g., M&A, spin-off, bankruptcy, etc.). 203 | ```sas 204 | proc sql; 205 | create table _tmp4 as 206 | select a.* , b.at_fn , b.sale_fn 207 | from _tmp3 as a left join 208 | (select * from comp.funda_fncd 209 | where &comp_sample_period. 210 | and &funda_fncd_filter.) as b 211 | on a.gvkey eq b.gvkey and 212 | a.datadate eq b.datadate 213 | ; 214 | quit; 215 | proc sort nodupkey; by gvkey fyear; 216 | run; 217 | ``` 218 | 219 | 6. *(optional)* Add to the *Panel* companies' sales to major customers (which is reported in `compsegd.seg_customer`). 220 | Here I use the U.S. government as an example: I compute firms' total sales to federal, state, and local governments combined. 221 | ```sas 222 | proc sql; 223 | create table _tmp41 as 224 | select gvkey , datadate , sum(salecs) as sale_gov 225 | from compsegd.seg_customer 226 | where ctype in ('GOVDOM' , 'GOVSTATE' , 'GOVLOC') 227 | group by gvkey , datadate 228 | having sale_gov gt 0 229 | ; 230 | create table comp_funda_clean as 231 | select a.* , b.sale_gov 232 | from _tmp4 as a left join _tmp41 as b 233 | on a.gvkey eq b.gvkey and 234 | a.datadate eq b.datadate 235 | ; 236 | quit; 237 | proc sort nodupkey; by gvkey fyear; 238 | run; 239 | ``` 240 | 241 | 7. One can export the *Panel* in different formats (e.g., .dta, .csv) conducive to subsequent analysis. 242 | For example, I use `comp_funda_clean` in Stata to plot those figures above. 243 | ```sas 244 | %let FFP = "[The Path to Your Output Folder]/comp_funda_clean.dta"; 245 | proc export data = comp_funda_clean outfile = &FFP. replace; run; 246 | ``` 247 | Note that I use `left join` when adding new variables to the *Panel*, and then check whether the uniqueness of the key is preserved. 248 | I believe this is a good practice that helps prevent unintentionally duplicating or deleting observations when one merges data. 249 | It also helps reveal bugs if there is any. 250 | 251 | [Back to top](#top) 252 | 253 | 254 | ## Construct stock portfolios based on past characteristics 255 | 256 | 257 | ## Compare analysts' forecasts with firms' actual earnings 258 | 259 | In this program, I build from [IBES](https://wrds-web.wharton.upenn.edu/wrds/query_forms/navigation.cfm?navId=221&_ga=2.202254610.2026535339.1587168594-1066308586.1576595708) a data set that contains US companies' actual *earnings per share* (EPS) for certain fiscal years, along with the corresponding forecasts made by financial analysts prior to the earnings announcements. 260 | This data set can be used to address questions like: 261 | - Do analysts make rational predictions? 262 | - What is the impact of surprisingly high/low earnings? 263 | - What is driving the earnings surprises? 264 | 265 | As an illustration, I plot the figure below using this data. 266 | It shows analysts' predictions of **Apple Inc.**'s EPS for the 2019 fiscal year, as well as the actual number that was announced on October 30, 2019. 267 | One can see that analysts made forecasts throughout the year, and overall they seem to slightly underestimate Apple's earnings for this fiscal year. 268 | 269 | 270 | 271 | Without further ado, let's dive into the code! 272 | 273 | 1. Begin by choosing sample period (e.g., from 1970 to 2020); `pends` and `fpedats` denote fiscal period end. 274 | Extract from `ibes.actu_epsus` the actual EPS data and apply a series of standard filters. 275 | The resulting data set `_tmp0` covers U.S. firms that report EPS in dollars. 276 | In this exercise, we focus on annual EPS instead of quarterly one (i.e., `pdicity eq "ANN"`). 277 | Observations with missing announcement dates or EPS values are deleted. 278 | ```sas 279 | %let yr_beg = 1970; 280 | %let yr_end = 2020; 281 | %let ibes_actu_period = ("01jan&yr_beg."d le pends le "31dec&yr_end."d); 282 | %let ibes_actu_filter = (measure eq "EPS" and anndats le actdats and 283 | curr_act eq "USD" and usfirm eq 1); 284 | %let ibes_detu_period = ("01jan&yr_beg."d le fpedats le "31dec&yr_end."d); 285 | %let ibes_detu_filter = (missing(currfl) and measure eq "EPS" and missing(curr) and 286 | usfirm eq 1 and anndats le actdats and report_curr eq "USD" ); 287 | 288 | data _tmp0; 289 | format ticker pends pdicity anndats value; 290 | set ibes.actu_epsus; 291 | where &ibes_actu_period. and &ibes_actu_filter.; 292 | if pdicity eq "ANN" and nmiss(anndats , value) eq 0; 293 | keep ticker pends pdicity anndats value; 294 | /* Sanity check: there should be only one observation for a given firm-fiscal period. */ 295 | proc sort nodupkey; by ticker pends pdicity; 296 | run; 297 | ``` 298 | 2. Extract from `ibes.detu_epsus` analysts' EPS forecasts and apply a series of standard filters. 299 | The resulting data set `_tmp1` covers U.S. firms that report EPS in dollars and analysts who report predictions in dollars. 300 | In this exercise, we only consider one-year-ahead forecasts (i.e., `fpi in ('1')`) 301 | Observations with missing *forecast* announcement dates or predicted EPS values are excluded. 302 | Each broker (`estimator`) may have multiple analysts (`analys`). 303 | Some EPS are on a primary basis while others on a diluted basis, as indicated by `pdf`. 304 | An analyst may make multiple forecasts throughout the period before the actual EPS announcement. 305 | For each analyst, only her last forecast before an EPS announcement is included. 306 | Alternatively, one can change the last line of code to keep the last forecast made by a given analyst on a given date. 307 | (Yes, analysts may report multiple forecasts on a given date.) 308 | ```sas 309 | data _tmp1; 310 | format ticker fpedats estimator analys anndats pdf fpi value; 311 | set ibes.detu_epsus; 312 | where &ibes_detu_period. and &ibes_detu_filter.; 313 | if fpi in ('1') and nmiss(anndats , value) eq 0; 314 | keep ticker fpedats estimator analys anndats pdf fpi value; 315 | proc sort; by ticker fpedats estimator analys anndats; 316 | data _tmp1; set _tmp1; 317 | by ticker fpedats estimator analys anndats; 318 | if last.analys; * last.anndats; 319 | run; 320 | ``` 321 | 322 | 3. Run a WRDS macro to create a link table between IBES TICKER and CRSP PERMNO. 323 | Keep only high-quality links, and exclude cases where one *ticker* is matched to multiple *permno*. 324 | Create a list of all relevant *permno*, and then extract their price and share adjustment factor from CRSP daily stock file; keep only observations within the sample period. 325 | ```sas 326 | %iclink (ibesid = ibes.id , crspid = crsp.stocknames , outset = iclnk); 327 | proc sort data = iclnk (where = (score in (0 , 1 , 2))) 328 | uniout = iclnk_uniperm nouniquekey; 329 | by ticker; run; 330 | 331 | proc sort data = iclnk_uniperm nodupkey 332 | out = allperm (keep = permno); 333 | by permno; run; 334 | 335 | proc sql; 336 | create table dsf_short as 337 | select a.permno , a.date , a.prc , a.cfacshr 338 | from crsp.dsf as a , allperm as b 339 | where a.permno eq b.permno and 340 | a.date ge "01jan&yr_beg."d 341 | ; 342 | quit; 343 | ``` 344 | 345 | 4. For each firm-fiscal year, obtain the latest stock price and share adjustment factor on/before the earnings announcement date. 346 | ```sas 347 | proc sql; 348 | create table _tmp01 as 349 | select a.* , b.permno 350 | from _tmp0 as a , iclnk_uniperm as b 351 | where a.ticker eq b.ticker 352 | order by ticker , pends 353 | ; 354 | create table _tmp02 as 355 | select a.* , abs(b.prc) as prc_act , b.cfacshr as adjfac_act 356 | from _tmp01 as a , dsf_short as b 357 | where a.permno eq b.permno and 358 | intnx("week" , a.anndats , -1 , 'b') le b.date le a.anndats 359 | group by a.ticker , a.pends , a.anndats 360 | having abs(a.anndats - b.date) eq min(abs(a.anndats - b.date)) 361 | order by a.ticker , a.pends , a.anndats 362 | ; 363 | quit; 364 | ``` 365 | 366 | 5. For each analysts' forecast, obtain the latest share adjustment factor on/before the *forecast* announcement date. 367 | ```sas 368 | proc sql; 369 | create table _tmp11 as 370 | select a.* , b.permno 371 | from _tmp1 as a , iclnk_uniperm as b 372 | where a.ticker eq b.ticker 373 | order by ticker , fpedats , estimator , analys , anndats 374 | ; 375 | create table _tmp12 as 376 | select a.* , b.cfacshr as adjfac_est 377 | from _tmp11 as a , dsf_short as b 378 | where a.permno eq b.permno and 379 | intnx("week" , a.anndats , -1 , 'b') le b.date le a.anndats 380 | group by a.ticker , a.fpedats , a.estimator , a.analys , a.anndats 381 | having abs(a.anndats - b.date) eq min(abs(a.anndats - b.date)) 382 | order by a.ticker , a.fpedats , a.estimator , a.analys , a.anndats 383 | ; 384 | quit; 385 | ``` 386 | 387 | 6. Merge analysts' forecasts with actual EPS. 388 | To ensure that predicted and actual EPS are based on the same number of shares outstanding, adjust the predicted ones for stock splits etc. using the CRSP share adjustment factor. 389 | (For details, see [A Note on IBES Unadjusted Data](https://wrds-www.wharton.upenn.edu/pages/support/manuals-and-overviews/i-b-e-s/ibes-estimates/wrds-research-notes/note-ibes-unadjusted-data/).) 390 | ```sas 391 | proc sql; 392 | create table _tmp2 as 393 | select a.ticker , a.pends as fpedats , a.anndats , 394 | a.value as actval , a.prc_act , a.adjfac_act , 395 | b.estimator as broker , b.analys , b.anndats as estdats , 396 | b.value as estval , b.adjfac_est , b.permno 397 | from _tmp02 as a , _tmp12 as b 398 | where a.ticker eq b.ticker and 399 | a.pends eq b.fpedats 400 | order by a.ticker, a.pends , a.anndats , b.anndats 401 | ; 402 | quit; 403 | 404 | data _tmp3; set _tmp2; 405 | if adjfac_est eq 0 then adjfac_est = 1; 406 | if adjfac_act eq 0 then adjfac_act = 1; 407 | estval_adj = estval * coalesce(adjfac_act / adjfac_est , 1); 408 | drop adjfac: estval; 409 | run; 410 | ``` 411 | 412 | 7. Compute a number of statistics for analysts' forecasts, 413 | including the number of analysts who made forecasts in the 9 months prior to earnings announcement, 414 | and their mean (median) forecast. 415 | Also, compute the earnings-to-price ratio as well as two measures of (relative) forecast error. 416 | ```sas 417 | proc sql; 418 | create table _tmp4 as 419 | select ticker , permno , fpedats , 420 | anndats , actval , prc_act , 421 | count(*) as num_analys , 422 | mean(estval_adj) as estavg , 423 | median(estval_adj) as estmed 424 | from _tmp3 425 | where intnx("mon" , anndats , -9 , 'b') le estdats lt anndats 426 | group by ticker, permno, fpedats, anndats, actval, prc_act 427 | ; 428 | quit; 429 | /* Sanity check: there should be only one observation for a given firm-fiscal period. */ 430 | proc sort nodupkey; by ticker fpedats; 431 | run; 432 | 433 | data _tmp5; set _tmp4; 434 | epratio = actval / prc_act; 435 | ferr1 = (estavg - actval) / prc_act; 436 | ferr2 = (estmed - actval) / prc_act; 437 | run; 438 | ``` 439 | 440 | Finally, one can compute summary statistics for these measures. 441 | ```sas 442 | proc means data = _tmp5 443 | N mean median std skew kurt min p1 p5 p95 p99 max; 444 | var num_analys epratio ferr1 ferr2; 445 | run; 446 | ``` 447 | 448 | [Back to top](#top) 449 | -------------------------------------------------------------------------------- /fig/aapl.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/czi-finance/wrds_sample_code/60111750908dc5fa4f1897f114ffc8cd75e7a73d/fig/aapl.png -------------------------------------------------------------------------------- /fig/compare1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/czi-finance/wrds_sample_code/60111750908dc5fa4f1897f114ffc8cd75e7a73d/fig/compare1.png -------------------------------------------------------------------------------- /fig/compare2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/czi-finance/wrds_sample_code/60111750908dc5fa4f1897f114ffc8cd75e7a73d/fig/compare2.png -------------------------------------------------------------------------------- /fig/tsla1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/czi-finance/wrds_sample_code/60111750908dc5fa4f1897f114ffc8cd75e7a73d/fig/tsla1.png -------------------------------------------------------------------------------- /fig/tsla2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/czi-finance/wrds_sample_code/60111750908dc5fa4f1897f114ffc8cd75e7a73d/fig/tsla2.png -------------------------------------------------------------------------------- /src/calc_welch_market_beta.md: -------------------------------------------------------------------------------- 1 | 2 | # Calculate [Welch (2019)](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3371240) Market Betas 3 | 4 | Below is a SAS macro that calculates market betas for CRSP stocks following [Welch (2019)](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3371240). 5 | The default version covers all stocks that are 6 | (1) ordinary common shares, 7 | (2) issued by companies incorporated in the U.S., 8 | and (3) listed on the NYSE, AMEX, or NASDAQ. 9 | It calculates market betas for each stock month by month using **daily stock returns** with a rolling window of 60 months. 10 | Daily excess returns are winsorized at *(1 ± 3) * market excess return* before the calculation. 11 | Older observations are given less weights based on a decay rate of 2/252 per day. 12 | 13 | 14 | ##### Example usages: 15 | - calling `%welch_market_beta (output_ds = welch_beta)` would calculate market betas for all stock in each month of 2018, and the output would be stored in `welch_beta`. 16 | - calling `%welch_market_beta (yr_beg = 1927 , yr_end = 2018 , rw_mon = 36 , output_ds = welch_beta)` would calculate market betas for every month between 1927 and 2018 with a rolling window of 36 months. 17 | - calling `%welch_market_beta (delta = 5 , rho = 3 / 252 , output_ds = welch_beta)` would calculate market betas using daily excess returns that are winsorized at (1 ± 5) * market excess return, and with a daily decay rate of 3/252. 18 | 19 | 20 | ```sas 21 | %macro welch_market_beta (yr_beg = 2018 , yr_end = 2018 , rw_mon = 60 , delta = 3 , rho = 2 / 252 , output_ds = ); 22 | 23 | /* only include stocks that are ordinary common shares issued by companies incorporated in the US and listed on the NYSE, AMEX, or NASDAQ */ 24 | %let dsenames_std_filter = (shrcd in (10 , 11) and exchcd in (1 , 2 , 3)); 25 | %let dsf_sample_period = (intnx('mon' , mdy(1,1,&yr_beg.) , - &rw_mon. , 'end') lt date le "31dec&yr_end."d); 26 | 27 | /* extract only relevant variables and observations from the Daily Stock File */ 28 | proc sql; 29 | create table dsf_short as 30 | select a.permno , b.date , b.ret 31 | from (select * from crsp.dsenames 32 | where &dsenames_std_filter.) as a , 33 | (select * from crsp.dsf 34 | where &dsf_sample_period.) as b 35 | where a.permno eq b.permno and 36 | a.namedt le b.date le a.nameendt 37 | order by a.permno , b.date 38 | ; 39 | quit; 40 | 41 | /* clear the work directory; make sure the relevant filenames have not been used */ 42 | proc datasets lib = work nolist; 43 | delete &output_ds. _dset0 _dset1 _regest0 _regest1; 44 | run; 45 | 46 | /* start looping through every month in the sample period */ 47 | %local mdate yy mm; 48 | %do yy = &yr_beg %to &yr_end; 49 | %do mm = 1 %to 12; 50 | /* %do yy = &yr_beg %to &yr_beg; */ 51 | /* %do mm = 1 %to 1; */ 52 | 53 | %let mdate = %sysfunc(mdy(&mm , 1 , &yy)); 54 | %let mdate = %sysfunc(intnx(month , &mdate. , 0 , end)); 55 | %put ********** Calculating market betas for %sysfunc(putn(&mdate , date9.)) **********; 56 | 57 | /* extract the most recent &rw_mon. months of daily data */ 58 | data _dset0; set dsf_short; 59 | where 0 le intck('mon' , date , &mdate.) lt &rw_mon.; 60 | run; 61 | 62 | /* add risk-free rate and market excess return; make sure there are at least 250 valid daily observations for a stock to be included */ 63 | proc sql; 64 | create table _dset1 as 65 | select a.* , (a.ret - b.rf) as exret , b.mktrf , b.rf 66 | from _dset0 as a 67 | left join ff.factors_daily as b 68 | on a.date eq b.date 69 | group by a.permno 70 | having count(exret) ge 250 71 | ; 72 | quit; 73 | 74 | /* winsorize and assign weights to daily excess returns following Welch (2019); */ 75 | proc sort; by permno date; 76 | data _dset1; set _dset1; 77 | by permno date; 78 | if first.permno then n = 0; n + 1; 79 | agew = exp(&rho. * n); 80 | exrlo = min((1 - &delta.) * mktrf , (1 + &delta.) * mktrf); 81 | exrhi = max((1 - &delta.) * mktrf , (1 + &delta.) * mktrf); 82 | if not missing(exret) then 83 | exret = min(max(exrlo , exret) , exrhi); 84 | drop exrlo exrhi n rf; 85 | run; 86 | 87 | /* calculate market betas as the slope coefficients from the WLS estimation of CAPM */ 88 | proc reg data = _dset1 outest = _regest0 edf noprint; 89 | model exret = mktrf; 90 | by permno; weight agew; 91 | run; 92 | 93 | data _regest1; 94 | format permno mdate bMkt; 95 | keep permno mdate bMkt; 96 | set _regest0 (rename = (mktrf = bMkt)); 97 | mdate = &mdate.; format mdate date9.; 98 | run; 99 | 100 | /* add the obtained betas for this month to the final output */ 101 | proc datasets lib = work nolist; 102 | append base = &output_ds. data = _regest1; 103 | delete _dset0 _dset1 _regest0 _regest1; 104 | run; 105 | 106 | %end; 107 | %end; 108 | %mend welch_market_beta; 109 | ``` 110 | -------------------------------------------------------------------------------- /src/comp_funda_clean.sas: -------------------------------------------------------------------------------- 1 | %macro comp_funda_clean (yr_beg = 1980 , yr_end = 2019 , output_ds = ); 2 | /* %let yr_beg = 1978; %let yr_end = 2018; */ 3 | %let funda_filter = ((indfmt eq "INDL") and (consol eq 'C') and 4 | (popsrc eq 'D') and (datafmt eq "STD") and 5 | (fic eq "USA") and (curncd eq "USD") and 6 | (curcd eq "USD") and not missing(at)); 7 | %let funda_fncd_filter = ((indfmt eq "INDL") and (consol eq 'C') and 8 | (popsrc eq 'D') and (datafmt eq "STD")); 9 | %let secm_filter = (primiss eq 'P' and fic eq "USA" and curcdm eq "USD"); 10 | %let comp_sample_period = (&yr_beg. le fyear le &yr_end.); 11 | %let secm_sample_period = ("01jan&yr_beg."d le datadate le "31dec&yr_end."d);; 12 | 13 | /* panel variable: gvkey; time variable: fyear and datadate */ 14 | %let names_vars = gvkey conm sic naics; 15 | %let funda_keys = gvkey datadate fyear; 16 | %let funda_vars = sich naicsh at sale csho prcc_f seq ceq lct lt 17 | dltt mib txditc txdb itcb pstk pstkrv pstkl lo 18 | dlc cogs xsga revt xint ebitda 19 | ; 20 | 21 | proc sort data = comp.names nodupkey 22 | out = _tmp0 (keep = &names_vars.); 23 | by gvkey; 24 | %let nyr = %eval(&yr_end. - &yr_beg.); 25 | data _tmp0; set _tmp0; 26 | yr_0 = &yr_beg.; 27 | array yr {&nyr.} yr_1 - yr_&nyr.; 28 | do i = 1 to &nyr.; 29 | yr(i) = yr_0 + i; 30 | end; 31 | drop i; 32 | proc transpose data = _tmp0 33 | out = _tmp1 (rename = (col1 = fyear) 34 | drop = _name_); 35 | by &names_vars.; 36 | var yr_0 - yr_&nyr.; 37 | run; 38 | 39 | data funda_short; set comp.funda; 40 | where &comp_sample_period. and 41 | &funda_filter.; 42 | keep &funda_keys. &funda_vars.; 43 | proc sort nodupkey; by gvkey datadate; 44 | run; 45 | 46 | proc sql; 47 | create table _tmp11 (drop = _gvkey _fyear) as 48 | select a.* , b.* 49 | from _tmp1 as a 50 | left join funda_short (rename = (gvkey = _gvkey fyear = _fyear)) as b 51 | on a.gvkey eq b._gvkey and a.fyear eq b._fyear 52 | group by gvkey 53 | having min(_fyear) le fyear le max(_fyear) 54 | ; 55 | quit; 56 | proc sort; by gvkey fyear datadate; 57 | data _tmp2; 58 | format gvkey fyear datadate conm sic; 59 | keep gvkey fyear datadate conm 60 | sic at sale be me bd to pm; 61 | set _tmp11; by gvkey fyear datadate; 62 | /* if last.fyear; */ 63 | sic = coalescec(put(sich,4.) , sic); 64 | /* naics = coalesce(naicsh , naics); */ 65 | /* naics3 = substr(put(naics , 6. -l) , 1 , 3); */ 66 | sale = ifn(missing(sale) , revt , sale); 67 | be = coalesce(seq , sum(ceq , pstk) , sum(at - lt , -mib)) 68 | + coalesce(txditc , sum(txdb , itcb) , 69 | lt - lct - lo - dltt , 0) 70 | - coalesce(pstkrv , pstkl , pstk , 0); 71 | be = ifn(be>0 , be , .); 72 | me = prcc_f * csho; me = ifn(me>0,me,.); 73 | bd = dlc + dltt; 74 | to = sale / ifn(at>0,at,.); 75 | pm = ebitda / ifn(sale>0,sale,.); 76 | /* check uniqueness of the key */ 77 | proc sort nodupkey; by gvkey fyear; 78 | run; 79 | 80 | data _tmp21; 81 | keep gvkey datadate me_secm; 82 | set comp.secm; 83 | where &secm_filter. and 84 | &secm_sample_period.; 85 | me_secm = prccm * cshoq; 86 | rename datadate = mdate; 87 | if not missing(me_secm); 88 | /* check uniqueness of the key */ 89 | proc sort nodupkey; by gvkey mdate; 90 | run; 91 | 92 | proc sql; 93 | create table _tmp22 as 94 | select a.gvkey , a.fyear , a.datadate , 95 | b.mdate , b.me_secm 96 | from _tmp2 as a 97 | left join _tmp21 as b 98 | on a.gvkey eq b.gvkey and 99 | 0 le intck('mon' , a.datadate , b.mdate) le 3 100 | order by a.gvkey , a.fyear , a.datadate , b.mdate 101 | ; 102 | quit; 103 | proc transpose data = _tmp22 out = _tmp23 (drop = _name_) 104 | prefix = me_secm; 105 | var me_secm; 106 | by gvkey fyear datadate; 107 | data _tmp3; 108 | drop me_secm me_secm1 - me_secm2; 109 | merge _tmp2 _tmp23; 110 | by gvkey fyear datadate; 111 | me_secm = coalesce(of me_secm1 - me_secm2); 112 | me = coalesce(me , me_secm); me = ifn(me>0,me,.); 113 | if bd gt 0 then ml = bd / (me + bd); 114 | if at gt 0 then mb = (at - be + me) / at; 115 | proc sort nodupkey; by gvkey fyear; 116 | run; 117 | 118 | proc sql; 119 | create table _tmp4 as 120 | select a.* , b.at_fn , b.sale_fn 121 | from _tmp3 as a 122 | left join 123 | (select * from comp.funda_fncd 124 | where &comp_sample_period. 125 | and &funda_fncd_filter.) as b 126 | on a.gvkey eq b.gvkey and 127 | a.fyear eq b.fyear and 128 | a.datadate eq b.datadate 129 | ; 130 | quit; 131 | proc sort nodupkey; by gvkey fyear; 132 | run; 133 | 134 | proc sql; 135 | create table _tmp41 as 136 | select gvkey , datadate , sum(salecs) as sale_gov 137 | from compsegd.seg_customer 138 | where ctype in ('GOVDOM' , 'GOVSTATE' , 'GOVLOC') 139 | group by gvkey , datadate 140 | having sale_gov gt 0 141 | ; 142 | create table &output_ds. as 143 | select a.* , b.sale_gov 144 | from _tmp4 as a 145 | left join _tmp41 as b 146 | on a.gvkey eq b.gvkey and 147 | a.datadate eq b.datadate 148 | ; 149 | quit; 150 | proc sort nodupkey; by gvkey fyear; 151 | run; 152 | 153 | proc datasets library = work nolist; save &output_ds.; run; 154 | 155 | %mend comp_funda_clean; 156 | 157 | %comp_funda_clean (output_ds = comp_funda_clean); 158 | 159 | %let FFP = "[...]/comp_funda_clean.dta"; 160 | proc export data = comp_funda_clean outfile = &FFP. replace; run; 161 | -------------------------------------------------------------------------------- /src/crsp_monthly.md: -------------------------------------------------------------------------------- 1 | The code below compiles a panel that contains monthly stock information. 2 | 3 | ```sas 4 | /* only include stocks that are ordinary common shares issued by companies incorporated in the US and listed on the NYSE, AMEX, or NASDAQ */ 5 | %let msenames_std_filter = (shrcd in (10 , 11) and exchcd in (1 , 2 , 3)); 6 | /* choose sample period */ 7 | %let msf_sample_period = ('31jan2018'd le date le '31dec2018'd); 8 | 9 | proc sql; 10 | create table _tmp0 as 11 | select a.permno , /* add other vars from msenames if necessary, e.g., a.siccd */ 12 | b.date , b.ret , b.retx , b.prc , 13 | b.vol , b.shrout , b.cfacpr , b.cfacshr 14 | from (select * from crsp.msenames 15 | where &msenames_std_filter.) as a , 16 | (select * from crsp.msf 17 | where &msf_sample_period.) as b 18 | where a.permno eq b.permno and 19 | a.namedt le b.date le a.nameendt 20 | order by a.permno , b.date 21 | ; 22 | /* add delisting return */ 23 | create table _tmp1 as 24 | select a.* , b.dlret 25 | from _tmp0 as a 26 | left join crsp.msedelist as b 27 | on a.permno eq b.permno and 28 | intck('mon' , a.date , b.dlstdt) eq 0 29 | order by permno , date 30 | ; 31 | quit; 32 | 33 | data _tmp2; 34 | format permno mdate ret sz szb turn; /* order the vars */ 35 | keep permno mdate ret sz szb turn; /* keep the needed vars */ 36 | set _tmp1; 37 | mdate = intnx('mon' , date , 0 , 'e'); format mdate date9.; 38 | ret = (1 + ret) * sum(1 , dlret) - 1; /* adjust for delisting */ 39 | if not missing(ret) then ret = min(max(-0.4 , ret) , 0.6); /* winsorizing */ 40 | /* adjust price */ 41 | if cfacpr eq 0 then p = abs(prc); 42 | else p = abs(prc) / cfacpr; 43 | if p le 0 then p = .; 44 | /* adjust shares outstanding */ 45 | if cfacshr eq 0 then tso = shrout * 1000; 46 | else tso = shrout * cfacshr * 1000; 47 | if tso le 0 then tso = .; 48 | sz = p * tso / 1000000; /* market value of the stock at month end */ 49 | szb = sz / (1 + retx); /* market value of the stock at month beginning */ 50 | turn = vol * 100 / (shrout * 1000) * 100; /* turnover ratio in percentage */ 51 | proc sort nodupkey; by permno mdate; /* check uniqueness of the key */ 52 | run; 53 | 54 | /* a macro provided by WRDS that compiles a link table between crso and compustat */ 55 | %macro compress_ccmxpf_lnkhist; 56 | %let prim_link = %str(linktype in ("LC", "LS", "LU")); 57 | 58 | proc sql; 59 | create table lnk1 as 60 | select * 61 | from crsp.ccmxpf_lnkhist 62 | where &prim_link. 63 | order by gvkey, lpermno, lpermco, linkdt, linkenddt 64 | ; 65 | quit; 66 | 67 | data lnk2; 68 | set lnk1; 69 | by gvkey lpermno lpermco linkdt linkenddt; 70 | format prev_ldt prev_ledt yymmddn8.; 71 | retain prev_ldt prev_ledt; 72 | if first.lpermno then do; 73 | if last.lpermno then do; 74 | /* Keep this obs if it's the first and last matching permno pair */ 75 | output; 76 | end; 77 | else do; 78 | /* If it's the first but not the last pair, retain the dates for future use */ 79 | prev_ldt = linkdt; 80 | prev_ledt = linkenddt; 81 | output; 82 | end; 83 | end; 84 | else do; 85 | if linkdt=prev_ledt+1 or linkdt=prev_ledt then do; 86 | /* If the date range follows the previous one, assign the previous linkdt value 87 | to the current - will remove the redundant in later steps. Also retain the 88 | link end date value */ 89 | linkdt = prev_ldt; 90 | prev_ledt = linkenddt; 91 | output; 92 | end; 93 | else do; 94 | /* If it doesn't fall into any of the above conditions, just keep it and retain the 95 | link date range for future use*/ 96 | output; 97 | prev_ldt = linkdt; 98 | prev_ledt = linkenddt; 99 | end; 100 | end; 101 | drop prev_ldt prev_ledt; 102 | run; 103 | 104 | data lnk; 105 | set lnk2; 106 | by gvkey lpermno linkdt; 107 | if last.linkdt; 108 | /* remove redundant observations with identical LINKDT (result of the previous data step), so that 109 | each consecutive pair of observations will have either different GVKEY-IID-PERMNO match, or 110 | non-consecutive link date range 111 | */ 112 | run; 113 | %mend; 114 | %compress_ccmxpf_lnkhist; 115 | 116 | proc sql; 117 | create table _tmp3 as 118 | select distinct a.* , b.gvkey 119 | from _tmp2 as a left join lnk as b 120 | on a.permno eq b.lpermno and 121 | (b.LINKDT le a.mdate or missing(b.LINKDT)) and 122 | (b.LINKENDDT gt a.mdate or missing(b.LINKENDDT)) 123 | order by permno , mdate 124 | /* group by permno , mdate having count(unique gvkey) gt 1 /* one permno-mdate should be matched to only one gvkey */ 125 | ; 126 | quit; 127 | proc sort nodupkey; by permno mdate; 128 | run; 129 | 130 | %let FFP = "/home/uiuc/chaozi/test.csv"; /* change csv to dta if you need stata file */ 131 | proc export data = _tmp3 outfile = &FFP. replace; run; 132 | ``` 133 | --------------------------------------------------------------------------------