├── .gitignore ├── LICENSE.txt ├── README.md ├── __init__.py ├── __main__.py ├── companies_list.txt ├── document_group_section_search.json ├── img ├── home_depot_screenshots.png └── output_files_example_image.png ├── output_files_examples └── batch_0001 │ └── 001 │ ├── HD_0000354950_10K_20160131_Item1A_excerpt.txt │ ├── HD_0000354950_10K_20160131_Item1_excerpt.txt │ ├── HD_0000354950_10K_20160131_Item7A_excerpt.txt │ ├── HD_0000354950_10K_20160131_Item7_excerpt.txt │ ├── HD_0000354950_10K_20170129_Item1A_excerpt.txt │ ├── HD_0000354950_10K_20170129_Item1_excerpt.txt │ ├── HD_0000354950_10K_20170129_Item7A_excerpt.txt │ └── HD_0000354950_10K_20170129_Item7_excerpt.txt ├── requirements.txt └── src ├── __init__.py ├── control.py ├── document.py ├── download.py ├── html_document.py ├── metadata.py ├── text_document.py └── utils.py /.gitignore: -------------------------------------------------------------------------------- 1 | # Compiled source # 2 | ################### 3 | *.com 4 | *.class 5 | *.dll 6 | *.exe 7 | *.o 8 | *.so 9 | *.pyc 10 | 11 | # Packages # 12 | ############ 13 | # it's better to unpack these files and commit the raw source 14 | # git has its own built in compression methods 15 | *.7z 16 | *.dmg 17 | *.gz 18 | *.iso 19 | *.jar 20 | *.rar 21 | *.tar 22 | *.zip 23 | 24 | # Logs and databases # 25 | ###################### 26 | *.log 27 | *.sql 28 | *.sqlite 29 | 30 | # OS generated files # 31 | ###################### 32 | .DS_Store 33 | .DS_Store? 34 | ._* 35 | .Spotlight-V100 36 | .Trashes 37 | ehthumbs.db 38 | Thumbs.db 39 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # SEC EDGAR Text 2 | The goal of this project is to download large numbers of company filings 3 | from the SEC EDGAR service, extract key text sections of interest, 4 | and store them in an easily accessible and readable format. 
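As a rough illustration of the storage format, the example files under `output_files_examples` appear to follow a `TICKER_CIK_FORM_DATE_ITEM_excerpt.txt` naming pattern. The Python sketch below parses such a name. The pattern is inferred only from the eight example filenames in this repository, `parse_excerpt_name` and `ExcerptName` are hypothetical helpers that are not part of the package, and the date field is assumed to be the filing's period end date.

```python
import re
from datetime import date, datetime
from typing import NamedTuple, Optional

# Pattern inferred from the example output filenames only (an assumption);
# it is not taken from the project's own code.
EXCERPT_NAME = re.compile(
    r"^(?P<ticker>[^_]+)_(?P<cik>\d+)_(?P<form>[^_]+)_"
    r"(?P<period>\d{8})_(?P<item>[^_]+)_excerpt\.txt$"
)


class ExcerptName(NamedTuple):
    """Hypothetical container for the parts of an excerpt filename."""
    ticker: str
    cik: int
    form: str
    period: date
    item: str


def parse_excerpt_name(filename: str) -> Optional[ExcerptName]:
    """Split an excerpt filename into its parts, or return None if it does not match."""
    m = EXCERPT_NAME.match(filename)
    if m is None:
        return None
    return ExcerptName(
        ticker=m["ticker"],
        cik=int(m["cik"]),  # drops the zero padding, e.g. 0000354950 -> 354950
        form=m["form"],
        period=datetime.strptime(m["period"], "%Y%m%d").date(),
        item=m["item"],
    )
```

For example, `parse_excerpt_name("HD_0000354950_10K_20160131_Item1A_excerpt.txt")` yields ticker `HD`, CIK `354950`, form `10K` and item `Item1A`; names that do not match the pattern return `None`.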
5 | 6 | ![Home depot screenshots](img/home_depot_screenshots.png) 7 | 8 | Download key text sections of SEC EDGAR company filings. Format, organise 9 | and store the text excerpts ready for both automated processing (NLP) and 10 | for human reading (spot-checking). Structured storage of text and 11 | metadata, with logging of failed document analyses. Suitable for 12 | automation of large-scale downloads, with flexibility to customise 13 | which sections of the documents are extracted. Compatible with all 14 | main EDGAR document formats from 1993 onwards, and easily adapted or 15 | extended to extract different sections of interest in EDGAR filings. 16 | 17 | Generally accurate in extracting text, but lots of room for improvement. 18 | Comments and contributions welcome! 19 | 20 | 21 | 22 | #### About the project 23 | 24 | * I completed this project during a sabbatical from my job. 25 | * I used [SEC-Edgar-Crawler](https://github.com/rahulrrixe/sec-edgar) 26 | for initial ideas which helped this project. 27 | * Thanks to my colleagues at Rosenberg Equities for help with an earlier 28 | attempt to download EDGAR data. 29 | 30 | ## Installation 31 | Clone the repo, and install the packages in requirements.txt. 32 | 33 | git clone https://github.com/alions7000/SEC-EDGAR-text 34 | pip install -r SEC-EDGAR-text/requirements.txt 35 | 36 | 37 | ## Usage 38 | *Basic usage* This downloads filings for the roughly 500 large US companies listed in the 39 | included 'companies_list.txt' file. Run from the project folder, 40 | and accept the default settings when prompted. 41 | 42 | python SEC-EDGAR-text 43 | 44 | *Typical usage* There are several arguments available to choose which 45 | company list to use, which types of filings to download, where to 46 | save the extracted documents, the multiprocessing option, the download rate 47 | and so on. 
For example: 48 | 49 | python SEC-EDGAR-text --storage=/path/to/my_storage_location --start=20150101 --end=99991231 --filings=10-K --multiprocessing_cores=0 --traffic_limit_pause_ms=500 50 | 51 | See the module utils.py for a full list of command line options. 52 | 53 | Downloading a full history of key sections from 10-K and 10-Q filings 54 | for (most) US companies takes less than 40GB of storage: around 1 million 55 | text excerpt files, plus a similar number of metadata files. 56 | 57 | 58 | 59 | 60 | ## Background 61 | ### About EDGAR 62 | 63 | *History and future of EDGAR, links to key SEC procedures documents etc.* 64 | Electronic filing was mandatory from 1996. 65 | 66 | 67 | ### Retrieving text data from EDGAR 68 | 69 | ![Example of files download](img/output_files_example_image.png) 70 | 71 | 72 | ### References 73 | **Other packages** 74 | 75 | Lots of open source projects have automated access to EDGAR filings. 76 | A few access text data, like this one. Most focus on downloading whole 77 | filing documents, financial statements information, or parsing 78 | XBRL filings. This package aims to make access to large volumes of text 79 | information easier and more consistent. 80 | 81 | https://github.com/datasets/edgar A nice introduction to the EDGAR database 82 | website. 83 | https://github.com/eliangcs/pystock-crawler A project which also accesses 84 | daily stock prices from Yahoo Finance 85 | 86 | https://github.com/rahulrrixe/sec-edgar Download a list of filings and 87 | save the complete document locally. 88 | https://github.com/lukerosiak/pysec Django code for parsing XBRL documents. 89 | 90 | 91 | 92 | **Academic research** 93 | 94 | Professor Bill McDonald, with collaborators including Prof Tim Loughran, 95 | has led much of the academic research into company filings' text 96 | data in recent years. 
He shares much of the approach 97 | that he used for extracting EDGAR filings text data on his 98 | [website](https://www3.nd.edu/~mcdonald/Word_Lists.html). 99 | The approach to scraping the text data is somewhat different to that 100 | used in this project, but it has a similar goal, and the documentation 101 | gives a great introduction to the structure of the HTML filing documents. 102 | The website includes links to related research, and 103 | plenty of guidance on doing downstream research on EDGAR text documents. -------------------------------------------------------------------------------- /__init__.py: -------------------------------------------------------------------------------- 1 | """ 2 | secedgartext: extract text from SEC corporate filings 3 | Copyright (C) 2017 Alexander Ions 4 | 5 | You should have received a copy of the GNU General Public License 6 | along with this program. If not, see . 7 | """ -------------------------------------------------------------------------------- /__main__.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | """ 3 | secedgartext: extract text from SEC corporate filings 4 | Copyright (C) 2017 Alexander Ions 5 | 6 | This program is free software: you can redistribute it and/or modify 7 | it under the terms of the GNU General Public License as published by 8 | the Free Software Foundation, either version 3 of the License, or 9 | (at your option) any later version. 10 | 11 | This program is distributed in the hope that it will be useful, 12 | but WITHOUT ANY WARRANTY; without even the implied warranty of 13 | MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 14 | GNU General Public License for more details. 15 | 16 | You should have received a copy of the GNU General Public License 17 | along with this program. If not, see . 
18 | """ 19 | 20 | from src.control import Downloader 21 | from src.utils import logger, sql_cursor, sql_connection 22 | 23 | def main(): 24 | try: 25 | Downloader().download_companies(do_save_full_document=False) 26 | except Exception: 27 | # this makes sure that the full error message is recorded in 28 | # the logger text file for the process 29 | logger.exception("Fatal error in company downloading") 30 | 31 | # tidy up database before closing 32 | sql_cursor.execute("delete from metadata where sec_cik like 'dummy%'") 33 | sql_connection.close() 34 | 35 | 36 | if __name__ == '__main__': 37 | main() 38 | 39 | 40 | 41 | 42 | 43 | 44 | 45 | 46 | -------------------------------------------------------------------------------- /companies_list.txt: -------------------------------------------------------------------------------- 1 | # A list of some of the current largest US listed companies: SEC CIK code, followed by a ticker code (which is used for labelling the output files) 2 | 354950 HD 3 | 50104 TSO 4 | 895421 MS 5 | 89800 SHW 6 | 1037868 AME 7 | 850209 FL 8 | 827054 MCHP 9 | 277948 CSX 10 | 100493 TSN 11 | 4962 AXP 12 | 1260221 TDG 13 | 899689 VNO 14 | 29989 OMC 15 | 62709 MMC 16 | 106535 WY 17 | 51644 IPG 18 | 1050915 PWR 19 | 10456 BAX 20 | 354908 FLIR 21 | 1281761 RF 22 | 100885 UNP 23 | 1390777 BK 24 | 65984 ETR 25 | 1326160 DUK 26 | 18230 CAT 27 | 72333 JWN 28 | 21665 CL 29 | 1430602 SNI 30 | 1492633 NLSN 31 | 21271 VLO 32 | 1341439 ORCL 33 | 1120193 NDAQ 34 | 76334 PH 35 | 1024478 ROK 36 | 1048911 FDX 37 | 313616 DHR 38 | 96021 SYY 39 | 320335 TMK 40 | 743988 XLNX 41 | 813828 CBS 42 | 859737 HOLX 43 | 49826 ITW 44 | 1140536 WLTW 45 | 906107 EQR 46 | 31791 PKI 47 | 1001039 DIS 48 | 63276 MAT 49 | 886982 GS 50 | 1393311 PSA 51 | 92230 BBT 52 | 35527 FITB 53 | 315293 AON 54 | 217346 TXT 55 | 1067983 BRK/B 56 | 1646383 CSRA 57 | 815556 FAST 58 | 63754 MKC 59 | 940944 DRI 60 | 1174922 WYNN 61 | 100517 UAL 62 | 36270 MTB 63 | 1051470 CCI 64 | 46080 HAS 65 | 
1659166 FTV 66 | 60667 LOW 67 | 97476 TXN 68 | 754737 SCG 69 | 823768 WM 70 | 52988 JEC 71 | 827052 EIX 72 | 1063761 SPG 73 | 55785 KMB 74 | 80424 PG 75 | 200406 JNJ 76 | 6201 AAL 77 | 920148 LH 78 | 77476 PEP 79 | 1039101 LLL 80 | 814453 NWL 81 | 891024 PDCO 82 | 879101 KIM 83 | 831001 C 84 | 1410636 AWK 85 | 21076 CLX 86 | 42582 GT 87 | 40533 GD 88 | 1524472 XYL 89 | 20286 CINF 90 | 78239 PVH 91 | 4281 ARNC 92 | 1532063 ESRX 93 | 1567892 MNK 94 | 48465 HRL 95 | 1012100 SEE 96 | 936468 LMT 97 | 1060391 RSG 98 | 1043604 JNPR 99 | 1489393 LYB 100 | 773840 HON 101 | 1004980 PCG 102 | 1618921 WBA 103 | 8818 AVY 104 | 50863 INTC 105 | 794323 LVLT 106 | 1275283 RAI 107 | 1361658 WYN 108 | 1136869 ZBH 109 | 27904 DAL 110 | 313927 CHD 111 | 1623613 MYL 112 | 899051 ALL 113 | 858877 CSCO 114 | 6951 AMAT 115 | 1002910 AEE 116 | 56873 KR 117 | 77360 PNR 118 | 1466258 IR 119 | 310158 MRK 120 | 1359841 HBI 121 | 350698 AN 122 | 79879 PPG 123 | 68505 MSI 124 | 723254 CTAS 125 | 1103982 MDLZ 126 | 12659 HRB 127 | 895648 GGP 128 | 86312 TRV 129 | 78003 PFE 130 | 104169 WMT 131 | 1144215 AYI 132 | 66740 MMM 133 | 1551182 ETN 134 | 1158449 AAP 135 | 1022079 DGX 136 | 915389 EMN 137 | 5272 AIG 138 | 59558 LNC 139 | 40987 GPC 140 | 832988 SIG 141 | 1418135 DPS 142 | 1510295 MPC 143 | 849399 SYMC 144 | 72903 XEL 145 | 764622 PNW 146 | 30625 FLS 147 | 6281 ADI 148 | 922224 PPL 149 | 1124198 FLR 150 | 1339947 VIAB 151 | 922864 AIV 152 | 9389 BLL 153 | 356028 CA 154 | 1037038 RL 155 | 877890 CTXS 156 | 1121788 GRMN 157 | 1067701 URI 158 | 1099219 MET 159 | 811156 CMS 160 | 1137774 PRU 161 | 7332 SWN 162 | 861878 SRCL 163 | 1365135 WU 164 | 820027 AMP 165 | 12927 BA 166 | 47111 HSY 167 | 72741 ES 168 | 5513 UNM 169 | 1637459 KHC 170 | 764180 MO 171 | 833444 JCI 172 | 723125 MU 173 | 107263 WMB 174 | 1506307 KMI 175 | 732717 T 176 | 37996 F 177 | 1530721 KORS 178 | 1013871 NRG 179 | 1534701 PSX 180 | 59478 LLY 181 | 875159 XL 182 | 39911 GPS 183 | 55067 K 184 | 27419 TGT 185 | 319201 KLAC 
186 | 24545 TAP 187 | 7084 ADM 188 | 40704 GIS 189 | 886158 BBBY 190 | 60086 L 191 | 32604 EMR 192 | 783325 WEC 193 | 16732 CPB 194 | 106040 WDC 195 | 816761 TDC 196 | 1070750 HST 197 | 23217 CAG 198 | 764478 BBY 199 | 78814 PBI 200 | 320193 AAPL 201 | 788784 PEG 202 | 4904 AEP 203 | 318154 AMGN 204 | 24741 GLW 205 | 38777 BEN 206 | 1130310 CNP 207 | 1045609 PLD 208 | 1289490 EXR 209 | 1677703 CNDT 210 | 920760 LEN 211 | 47217 HPQ 212 | 1047862 ED 213 | 732712 VZ 214 | 1467858 GM 215 | 885639 KSS 216 | 1564708 NWS 217 | 1564708 NWSA 218 | 1636023 WRK 219 | 1002047 NTAP 220 | 1593034 ENDP 221 | 51434 IP 222 | 791519 SPLS 223 | 1267238 AIZ 224 | 4977 AFL 225 | 1137789 STX 226 | 18926 CTL 227 | 1031296 FE 228 | 1109357 EXC 229 | 874761 AES 230 | 765880 HCP 231 | 874766 HIG 232 | 1011006 YHOO 233 | 936340 DTE 234 | 794367 M 235 | 1274494 FSLR 236 | 1451505 RIG 237 | 20520 FTR 238 | 1385157 TEL 239 | 1137411 COL 240 | 1164727 NEM 241 | 912595 MAA 242 | 1593538 NAVI 243 | 1024305 COTY 244 | 882095 GILD 245 | 1645590 HPE 246 | 108772 XRX 247 | 21344 KO 248 | 63908 MCD 249 | 831259 FCX 250 | 1041061 YUM 251 | 101778 MRO 252 | 39899 TGNA 253 | 769397 ADSK 254 | 1358071 CXO 255 | 1336917 UAA 256 | 797468 OXY 257 | 1039684 OKE 258 | 1021860 NOV 259 | 1168054 XEC 260 | 1336917 UA 261 | 1018724 AMZN 262 | 87347 SLB 263 | 808362 BHI 264 | 33213 EQT 265 | 899866 ALXN 266 | 1065280 NFLX 267 | 1108524 CRM 268 | 796343 ADBE 269 | 1652044 GOOGL 270 | 1633917 PYPL 271 | 4447 HES 272 | 6769 APA 273 | 872589 REGN 274 | 72207 NBL 275 | 816284 CELG 276 | 1090012 DVN 277 | 1075531 PCLN 278 | 1403568 ULTA 279 | 773910 APC 280 | 1373835 SE 281 | 1086222 AKAM 282 | 1087423 RHT 283 | 1324424 EXPE 284 | 1058090 CMG 285 | 1045810 NVDA 286 | 1403161 V 287 | 34088 XOM 288 | 1526520 TRIP 289 | 865752 MNST 290 | 316709 SCHW 291 | 707549 LRCX 292 | 1141391 MA 293 | 1110803 ILMN 294 | 1101239 EQIX 295 | 29915 DOW 296 | 858470 COG 297 | 45012 HAL 298 | 895126 CHK 299 | 1038357 PXD 300 | 821189 EOG 301 
| 875320 VRTX 302 | 1326801 FB 303 | 315852 RRC 304 | 46765 HP 305 | 882184 DHI 306 | 19617 JPM 307 | 93410 CVX 308 | 51143 IBM 309 | 912750 NFX 310 | 1101215 ADS 311 | 1020569 IRM 312 | 1035267 ISRG 313 | 1004434 AMG 314 | 1396009 VMC 315 | 1058290 CTSH 316 | 829224 SBUX 317 | 1135152 FTI 318 | 896878 INTU 319 | 874716 IDXX 320 | 1048286 MAR 321 | 1071739 CNC 322 | 728535 JBHT 323 | 1604778 QRVO 324 | 789019 MSFT 325 | 1099800 EW 326 | 804753 CERN 327 | 40545 GE 328 | 320187 NKE 329 | 1678531 EVHC 330 | 92122 SO 331 | 1551152 ABBV 332 | 1123360 GPN 333 | 822416 PHM 334 | 800459 HAR 335 | 75362 PCAR 336 | 202058 HRS 337 | 916076 MLM 338 | 766421 ALK 339 | 80661 PGR 340 | 30554 DD 341 | 731766 UNH 342 | 8670 ADP 343 | 1601712 SYF 344 | 29534 DG 345 | 354190 AJG 346 | 1364742 BLK 347 | 914208 IVZ 348 | 915912 AVB 349 | 1111711 NI 350 | 1324404 CF 351 | 915913 ALB 352 | 717423 MUR 353 | 1437107 DISCK 354 | 1015780 ETFC 355 | 701221 CI 356 | 33185 EFX 357 | 1800 ABT 358 | 723531 PAYX 359 | 1166691 CMCSA 360 | 882835 ROP 361 | 1578845 AGN 362 | 909832 COST 363 | 1519751 FBHS 364 | 203527 VAR 365 | 16918 STZ 366 | 885725 BSX 367 | 4127 SWKS 368 | 875045 BIIB 369 | 1649338 AVGO 370 | 898173 ORLY 371 | 109198 TJX 372 | 1138118 CBG 373 | 1467373 ACN 374 | 916365 TSCO 375 | 315189 DE 376 | 745732 ROST 377 | 49196 HBAN 378 | 352915 UHS 379 | 712515 EA 380 | 759944 CFG 381 | 912242 MAC 382 | 1156039 ANTM 383 | 920522 ESS 384 | 91419 SJM 385 | 1053507 AMT 386 | 1170010 KMX 387 | 815097 CCL 388 | 1001250 EL 389 | 1059556 MCO 390 | 64803 CVS 391 | 1032208 SRE 392 | 1297996 DLR 393 | 1110783 MON 394 | 1091667 CHTR 395 | 1442145 VRSK 396 | 54480 KSU 397 | 746515 EXPD 398 | 1585364 PRGO 399 | 1378946 PBCT 400 | 1308161 FOX 401 | 1308161 FOXA 402 | 310764 SYK 403 | 96223 LUK 404 | 1555280 ZTS 405 | 1048695 FFIV 406 | 1133421 NOC 407 | 1521332 DLPH 408 | 1285785 MOS 409 | 721683 TSS 410 | 34903 FRT 411 | 1122304 AET 412 | 935703 DLTR 413 | 804328 QCOM 414 | 1037540 BXP 415 | 51253 IFF 
416 | 818479 XRAY 417 | 711404 COO 418 | 2969 APD 419 | 109380 ZION 420 | 9892 BCR 421 | 14693 BF/B 422 | 49071 HUM 423 | 798354 FISV 424 | 701985 LB 425 | 64040 SPGI 426 | 1000228 HSIC 427 | 28412 CMA 428 | 74208 UDR 429 | 37785 FMC 430 | 73309 NUE 431 | 753308 NEE 432 | 766704 HCN 433 | 718877 ATVI 434 | 1000697 WAT 435 | 1043277 CHRW 436 | 106640 WHR 437 | 73124 NTRS 438 | 1065088 EBAY 439 | 1413329 PM 440 | 927628 COF 441 | 1163165 COP 442 | 1105705 TWX 443 | 866787 AZO 444 | 91440 SNA 445 | 865436 WFM 446 | 851968 MHK 447 | 715957 D 448 | 721371 CAH 449 | 791907 LLTC 450 | 1156375 CME 451 | 1065696 LKQ 452 | 1113169 TROW 453 | 97745 TMO 454 | 1116132 COH 455 | 101829 UTX 456 | 14272 BMY 457 | 36104 USB 458 | 93556 SWK 459 | 1140859 ABC 460 | 820313 APH 461 | 884887 RCL 462 | 927066 DVA 463 | 92380 LUV 464 | 1037646 MTD 465 | 927653 MCK 466 | 750556 STI 467 | 85961 R 468 | 912615 URBN 469 | 70858 BAC 470 | 896159 CB 471 | 740260 VTR 472 | 1115222 DNB 473 | 1090872 A 474 | 10795 BDX 475 | 884905 PX 476 | 1126328 PFG 477 | 860730 HCA 478 | 1613103 MDT 479 | 352541 LNT 480 | 908255 BWA 481 | 1040971 SLG 482 | 62996 MAS 483 | 713676 PNC 484 | 277135 GWW 485 | 103379 VFC 486 | 98246 TIF 487 | 1571949 ICE 488 | 1579241 ALLE 489 | 58492 LEG 490 | 315213 RHI 491 | 29905 DOV 492 | 1136893 FIS 493 | 1090727 UPS 494 | 26172 CMI 495 | 1452575 MJN 496 | 1014473 VRSN 497 | 1393612 DFS 498 | 1047122 RTN 499 | 31462 ECL 500 | 726728 O 501 | 72971 WFC 502 | 793952 HOG 503 | 91576 KEY 504 | 702165 NSC 505 | 93751 STT 506 | -------------------------------------------------------------------------------- /document_group_section_search.json: -------------------------------------------------------------------------------- 1 | { 2 | "10-K": [ 3 | { 4 | "itemname": "Item1", 5 | "txt": [ 6 | { 7 | "start": "\n\\s{,40}(?:PART.{,40})?Item_1.{,10}Business.{,39}?\n", 8 | "end": "\n\\s{,40}(?:PART.{,40})?ITEM_(?:1A|2).{,10}(?:Risk_Factors|Properties).{,39}?\n" 9 | } 10 | ], 11 | "html": [ 
12 | { 13 | "start": "\n_(?:PART.{,40})?Item_1.{,10}Business", 14 | "end": "\n_(?:PART.{,40})?Item_(?:1A|2).{,10}(?:Risk_Factors|Properties).{,99}?\n" 15 | } 16 | ] 17 | }, 18 | { 19 | "itemname": "Item1A", 20 | "txt": [ 21 | { 22 | "start": "\n\\s{,40}(?:PART.{,40})?Item_1.{,10}Risk_Factors.{,39}?\n", 23 | "end": "\n\\s{,40}(?:PART.{,40})?ITEM_2.{,10}Properties.{,39}?\n" 24 | }, 25 | { 26 | "start": "\n\\s{,40}(?:PART.{,40})?Item_1.{,10}Risk_Factors", 27 | "end": "\n\\s{,40}(?:PART.{,40})?ITEM_[2-9].{,39}?\n" 28 | } 29 | ], 30 | "html": [ 31 | { 32 | "start": "\n_(?:PART.{,40})?Item_1.{,10}Risk_Factors", 33 | "end": "\n_(?:PART.{,40})?Item_2.{,10}Properties.{,99}?\n" 34 | }, 35 | { 36 | "start": "\n_(?:PART.{,40})?Item_1.{,10}Risk_Factors", 37 | "end": "\n_(?:PART.{,40})?Item_[2-9].{,99}?\n" 38 | }, 39 | { 40 | "start": "\n_(?:PART.{,40})?Item_1\\(?A", 41 | "end": "\n_(?:PART.{,40})?Item_[2-9].{,99}?\n" 42 | } 43 | ] 44 | }, 45 | { 46 | "itemname": "Item7", 47 | "txt": [ 48 | { 49 | "start": "\n\\s{,40}(?:PART.{,40})?(?:ITEM_7.{,40})?MANAGEMENT.{,5}DISCUSSION_AND.{,69}?\n", 50 | "end": "\n\\s{,40}(?:PART.{,40})?ITEM_(?:7A|8).{,99}?\n" 51 | }, 52 | { 53 | "start": "\n\\s{,40}(?:PART.{,40})?(?:ITEM_7.{,40})?MANAGEMENT.{,5}DISCUSSION_AND.{,69}?\n", 54 | "end": "\n\\s{,40}(?:PART.{,40})?(?:INDEX_TO.{,20})?(?:QUANTITATIVE_AND_QUALITATIVE_DISCLOSURES|FINANCIAL_STATEMENTS|CONSOLIDATED_BALANCE|CONSOLIDATED_STATEMENTS).{,69}?\n" 55 | } 56 | ], 57 | "html": [ 58 | { 59 | "start": "\n_(?:PART.{,40})?_Item_7", 60 | "end": "\n_(?:PART.{,40})?_Item_(?:7A|8).{,99}?\n" 61 | }, 62 | { 63 | "start": "\n(?:PART.{,40})?MANAGEMENT.{,5}DISCUSSION_AND", 64 | "end": "\n(?:PART.{,40})?(?:INDEX_TO.{,20})?(?:QUANTITATIVE_AND_QUALITATIVE_DISCLOSURES|FINANCIAL_STATEMENTS|CONSOLIDATED_BALANCE|CONSOLIDATED_STATEMENTS?).{,99}?\n" 65 | } 66 | ] 67 | }, 68 | { 69 | "itemname": "Item7A", 70 | "txt": [ 71 | { 72 | "start": 
"\n\\s{,40}(?:PART.{,40})?(?:ITEM_7A.{,40})?QUANTITATIVE_AND_QUALITATIVE_DISCLOSURES.{,69}?\n", 73 | "end": "\n\\s{,40}(?:PART.{,40})?ITEM_8.{,99}?\n" 74 | }, 75 | { 76 | "start": "\n\\s{,40}(?:PART.{,40})?(?:ITEM_7A.{,40})?QUANTITATIVE_AND_QUALITATIVE_DISCLOSURES.{,69}?\n", 77 | "end": "\n\\s{,40}(?:PART.{,40})?(?:INDEX_TO.{,20})?(?:FINANCIAL_STATEMENTS|CONSOLIDATED_BALANCE|CONSOLIDATED_STATEMENTS).{,69}?\n" 78 | } 79 | ], 80 | "html": [ 81 | { 82 | "start": "\n_(?:PART.{,40})?_Item_7A", 83 | "end": "\n_(?:PART.{,40})?_Item_8.{,99}?\n" 84 | }, 85 | { 86 | "start": "\n(?:PART.{,40})?QUANTITATIVE_AND_QUALITATIVE_DISCLOSURES", 87 | "end": "\n(?:PART.{,40})?(?:INDEX_TO.{,20})?(?:FINANCIAL_STATEMENTS|CONSOLIDATED_BALANCE|CONSOLIDATED_STATEMENTS?).{,99}?\n" 88 | } 89 | ] 90 | } 91 | 92 | ], 93 | 94 | "EX-13": [ 95 | { 96 | "itemname": "Exhibit13", 97 | "txt": [ 98 | { 99 | "start": "\n\\s{,40}(?:PART.{,40})?(?:ITEM_7.{,40})?Management.{,5}Discussion.{,69}?\n", 100 | "end": "\n\\s{,40}(?:PART.{,40})?(?:ITEM_8.{,40})?(?:Financial_Statement|Consolidated_Statements?_of|Significant_Accounting|Cautionary_Language|Report_of_Independent).{,69}?\n" 101 | }, 102 | { 103 | "start": "\n\\s{,40}(?:PART.{,40})?(?:ITEM_7.{,40})?Financial_Review.{,69}?\n", 104 | "end": "\n\\s{,40}(?:PART.{,40})?(?:ITEM_8.{,40})?(?:Financial_Statement|Consolidated_Statements?_of|Significant_Accounting|Cautionary_Language|Report_of_Independent).{,69}?\n" 105 | } 106 | ], 107 | "html": [ 108 | { 109 | "start": "\n_(?:PART.{,40})?(?:ITEM_7.{,40})?(?:Management.{,5}Discussion|Financial_Review)", 110 | "end": "\n_(?:PART.{,40})?(?:ITEM_8.{,40})?(?:Financial_Statement|Consolidated_Statements?_of|Significant_Accounting|Cautionary_Language|Report_of_Independent).{,99}?\n" 111 | } 112 | ] 113 | } 114 | ], 115 | 116 | "10-Q": [ 117 | { 118 | "itemname": "Item1A", 119 | "txt": [ 120 | { 121 | "start": "\n\\s{,40}(?:PART.{,40})?ITEM_1.{,10}Risk_Factors.{,39}?\n", 122 | "end": 
"\n\\s{,40}(?:PART.{,40})?ITEM_2.{,39}?\n" 123 | } 124 | ], 125 | "html": [ 126 | { 127 | "start": "\n_(?:PART.{,40})?Item_1.{,10}Risk_Factors", 128 | "end": "\n_(?:PART.{,40})?Item_2.{,10}Unregistered.{,99}?\n" 129 | }, 130 | { 131 | "start": "\n_(?:PART.{,40})?Item_1.{,10}Risk_Factors", 132 | "end": "\n_(?:PART.{,40})?Item_\\d.{,99}?\n" 133 | }, 134 | { 135 | "start": "\n_(?:PART.{,40})?Item_1\\(?A", 136 | "end": "\n_(?:PART.{,40})?Item_\\d.{,99}?\n" 137 | }, 138 | { 139 | "start": "\n_(?:PART.{,40})?(?:Item_1.{,40})?.{,10}Risk_Factors", 140 | "end": "\n_(?:PART.{,40})?.{,10}(?:Item_[2-9]|Unregistered).{,99}?\n" 141 | } 142 | ] 143 | }, 144 | { 145 | "itemname": "Item2", 146 | "txt": [ 147 | { 148 | "start": "\n\\s{,40}(?:PART.{,40})?(?:ITEM_2.{,10})?Management.{,5}Discussion_and.{,69}?\n", 149 | "end": "\n\\s{,40}(?:PART.{,40})?(?:ITEM_3|PART_II).{,99}?\n" 150 | }, 151 | { 152 | "start": "\n\\s{,40}(?:PART.{,40})?(?:ITEM_2.{,10})?Management.{,5}Discussion_and.{,69}?\n", 153 | "end": "\n\\s{,40}(?:PART_II|Item_[3-9]).{,99}?\n" 154 | } 155 | ], 156 | "html": [ 157 | { 158 | "start": "\n_(?:PART.{,40})?Items?_2.{,15}Management", 159 | "end": "\n_(?:PART.{,40})?(?:Item_[3-9]|Quantitative_and_Qualitative|Controls_and_Procedures|OTHER_INFORMATION|Legal_Proceedings).{,99}?\n" 160 | }, 161 | { 162 | "start": "\n_(?:PART.{,40})?(?:Items?_2.{,15})?Management.{,5}Discussion_and_Analysis", 163 | "end": "\n_(?:PART.{,40})?(?:Item_[3-9]|Quantitative_and_Qualitative|Controls_and_Procedures|OTHER_INFORMATION|Legal_Proceedings).{,99}?\n" 164 | }, 165 | { 166 | "start": "\n_(?:Financial_Review|Disclosure_Regarding_Forward|Executive_Overview|Business_Overview|Earnings_Performance).{,49}\n", 167 | "end": "\n_(?:PART.{,40})?(?:Item_[3-9]|Quantitative_and_Qualitative|Controls_and_Procedures|OTHER_INFORMATION|Legal_Proceedings).{,99}?\n" 168 | }, 169 | { 170 | "start": "\n_(?:PART.{,40})?Items?_2(?!.{,10}Unregistered_Sales)", 171 | "end": 
"\n_(?:PART.{,40})?(?:Item_[3-9]|Quantitative_and_Qualitative|Controls_and_Procedures|OTHER_INFORMATION|Legal_Proceedings).{,99}?\n" 172 | } 173 | ] 174 | } 175 | ], 176 | 177 | "EX-99": [ 178 | { 179 | "itemname": "EX-99", 180 | "txt": [ 181 | { 182 | "start": "^", 183 | "end": "$" 184 | } 185 | ], 186 | "html": [ 187 | { 188 | "start": "^", 189 | "end": "$" 190 | } 191 | ] 192 | } 193 | 194 | ], 195 | "8-K": [ 196 | { 197 | "itemname": "8-K", 198 | "txt": [ 199 | { 200 | "start": "^", 201 | "end": "$" 202 | } 203 | ], 204 | "html": [ 205 | { 206 | "start": "^", 207 | "end": "$" 208 | } 209 | ] 210 | } 211 | 212 | ] 213 | } 214 | -------------------------------------------------------------------------------- /img/home_depot_screenshots.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/alions7000/SEC-EDGAR-text/1acedca1babf79a2e3fba4cf88c02bd27b0b28e0/img/home_depot_screenshots.png -------------------------------------------------------------------------------- /img/output_files_example_image.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/alions7000/SEC-EDGAR-text/1acedca1babf79a2e3fba4cf88c02bd27b0b28e0/img/output_files_example_image.png -------------------------------------------------------------------------------- /output_files_examples/batch_0001/001/HD_0000354950_10K_20160131_Item1A_excerpt.txt: -------------------------------------------------------------------------------- 1 | Item 1A. Risk Factors. 2 | 3 | The risks and uncertainties described below could materially and adversely affect our business, financial condition and results of operations and could cause actual results to differ materially from our expectations and projections. 
You should read these Risk Factors in conjunction with "Management’s Discussion and Analysis of Financial Condition and Results of Operations" in Item 7 and our Consolidated Financial Statements and related notes in Item 8. There also may be other factors that we cannot anticipate or that are not described in this report generally because we do not currently perceive them to be material. Those factors could cause results to differ materially from our expectations. 4 | 5 | Strong competition could adversely affect prices and demand for our products and services and could decrease our market share. 6 | 7 | We operate in markets that are highly competitive. We compete principally based on customer service, price, store location and appearance, and quality, availability and assortment of merchandise. In each market we serve, there are a number of other home improvement stores, electrical, plumbing and building materials supply houses and lumber yards. With respect to some products and services, we also compete with specialty design stores, showrooms, discount stores, local, regional and national hardware stores, paint stores, mail order firms, warehouse clubs, independent building supply stores, MRO companies and, to a lesser extent, other retailers, as well as with installers of home improvement products. In addition, we face growing competition from online and multichannel retailers, some of whom may have a lower cost structure than ours, as our customers increasingly use computers, tablets, smartphones and other mobile devices to shop online and compare prices and products in real time. Intense competitive pressures from one or more of our competitors or our inability to adapt effectively and quickly to a changing competitive landscape could affect our prices, our margins or demand for our products and services. 
If we are unable to timely and appropriately respond to these competitive pressures, including through maintenance of customer service and customer relationships to deliver a superior customer experience, our market share and our financial performance could be adversely affected. 8 | 9 | 7 10 | We may not timely identify or effectively respond to consumer needs, expectations or trends, which could adversely affect our relationship with customers, our reputation, the demand for our products and services, and our market share. 11 | 12 | The success of our business depends in part on our ability to identify and respond promptly to evolving trends in demographics; consumer preferences, expectations and needs; and unexpected weather conditions, while also managing appropriate inventory levels and maintaining an excellent customer experience. It is difficult to successfully predict the products and services our customers will demand. As the housing and home improvement market continues to recover, resulting changes in demand will put further pressure on our ability to meet customer needs and expectations and maintain high service levels. In addition, each of our primary customer groups – DIY, DIFM and Pro – have different needs and expectations, many of which evolve as the demographics in a particular customer group change. If we do not successfully differentiate the shopping experience to meet the individual needs and expectations of a customer group, we may lose market share with respect to those customers. 13 | 14 | Customer expectations about the methods by which they purchase and receive products or services are also becoming more demanding. Customers are increasingly using technology and mobile devices to rapidly compare products and prices, determine real-time product availability and purchase products. Once products are purchased, customers are seeking alternate options for delivery of those products, and they often expect quick and low-cost delivery. 
We must continually anticipate and adapt to these changes in the purchasing process. We have implemented programs like BOSS, BOPIS and direct fulfillment, and are rolling out BODFS, but we cannot guarantee that these programs or others we may implement will be implemented successfully or will meet customers’ needs and expectations. Customers are also using social media to provide feedback and information about our Company and products and services in a manner that can be quickly and broadly disseminated. To the extent a customer has a negative experience and shares it over social media, it may impact our brand and reputation. 15 | 16 | Further, we have an aging store base that requires maintenance and space reallocation initiatives to deliver the shopping environment that our customers desire. Failure to maintain our stores and utilize our store space effectively, to provide a compelling online presence, to timely identify or respond to changing consumer preferences, expectations and home improvement needs and to differentiate the customer experience for our three primary customer groups could adversely affect our relationship with customers, our reputation, the demand for our products and services, and our market share. 17 | 18 | Our success depends upon our ability to attract, develop and retain highly qualified associates while also controlling our labor costs. 19 | 20 | Our customers expect a high level of customer service and product knowledge from our associates. To meet the needs and expectations of our customers, we must attract, develop and retain a large number of highly qualified associates while at the same time controlling labor costs. Our ability to control labor costs is subject to numerous external factors, including prevailing wage rates and health and other insurance costs, as well as the impact of legislation or regulations governing labor relations, minimum wage, or healthcare benefits. 
An inability to provide wages and/or benefits that are competitive within the markets in which we operate could adversely affect our ability to retain and attract employees. In addition, we compete with other retail businesses for many of our associates in hourly positions, and we invest significant resources in training and motivating them to maintain a high level of job satisfaction. These positions have historically had high turnover rates, which can lead to increased training and retention costs, particularly if the economy continues to improve and employment opportunities increase. There is no assurance that we will be able to attract or retain highly qualified associates in the future. 21 | 22 | We have incurred losses related to our Data Breach, and we are still in the process of determining the full impact of related government investigations and civil litigation on our results of operations, which could have an adverse impact on our operations, financial results and reputation. 23 | 24 | The Data Breach involved the theft of certain payment card information and customer email addresses through unauthorized access to our systems. Since the Data Breach occurred, we have recorded $161 million of pretax expenses, net of expected insurance recoveries, in connection with the Data Breach, as described in more detail in Item 7, "Management’s Discussion and Analysis of Financial Condition and Results of Operation" and Note 13 to the Consolidated Financial Statements included in Item 8, "Financial Statements and Supplementary Data". We are facing putative class actions filed in the U.S. and Canada and a consolidated shareholder derivative action brought by two purported shareholders in the U.S., and other claims have been and may be asserted on behalf of customers, payment card issuing banks, shareholders, or others seeking damages or other related relief, allegedly arising out of the Data Breach. 
We are also facing investigations by a number of state and federal agencies. These claims and investigations may adversely affect how we operate our business, divert the attention of management from the operation of the business, have an adverse effect on our reputation, and result in additional costs and fines. In addition, the governmental agencies investigating the Data Breach may seek to impose injunctive relief, which could materially increase our data security costs, adversely impact how we operate our systems and collect and use customer information, and put us at a competitive disadvantage with other retailers. 28 | 29 | If our efforts to maintain the privacy and security of customer, associate, supplier and Company information are not successful, we could incur substantial additional costs and reputational damage, and could become subject to further litigation and enforcement actions. 30 | 31 | Our business, like that of most retailers, involves the receipt, storage and transmission of customers’ personal information, consumer preferences and payment card information, as well as confidential information about our associates, our suppliers and our Company, some of which is entrusted to third-party service providers and vendors. We also work with third-party service providers and vendors that provide technology, systems and services that we use in connection with the receipt, storage and transmission of this information. Our information systems, and those of our third-party service providers and vendors, are vulnerable to an increasing threat of continually evolving cybersecurity risks. Unauthorized parties may attempt to gain access to these systems or our information through fraud or other means of deceiving our associates, third-party service providers or vendors.
Hardware, software or applications we develop or obtain from third parties may contain defects in design or manufacture or other problems that could unexpectedly compromise information security. The methods used to obtain unauthorized access, disable or degrade service or sabotage systems are also constantly changing and evolving and may be difficult to anticipate or detect for long periods of time. We have implemented and regularly review and update processes and procedures to protect against unauthorized access to or use of secured data and to prevent data loss. However, the ever-evolving threats mean we and our third-party service providers and vendors must continually evaluate and adapt our respective systems and processes, and there is no guarantee that they will be adequate to safeguard against all data security breaches or misuses of data. Any future significant compromise or breach of our data security, whether external or internal, or misuse of customer, associate, supplier or Company data, could result in additional significant costs, lost sales, fines, lawsuits, and damage to our reputation. In addition, as the regulatory environment related to information security, data collection and use, and privacy becomes increasingly rigorous, with new and constantly changing requirements applicable to our business, compliance with those requirements could also result in additional costs. 32 | 33 | We are subject to payment-related risks that could increase our operating costs, expose us to fraud or theft, subject us to potential liability and potentially disrupt our business. 34 | 35 | We accept payments using a variety of methods, including cash, checks, credit and debit cards, PayPal, our private label credit cards and installment loan program, and gift cards, and we may offer new payment options over time. 
Acceptance of these payment options subjects us to rules, regulations, contractual obligations and compliance requirements, including payment network rules and operating guidelines, data security standards and certification requirements, and rules governing electronic funds transfers. These requirements may change over time or be reinterpreted, making compliance more difficult or costly. For certain payment methods, including credit and debit cards, we pay interchange and other fees, which may increase over time and raise our operating costs. We rely on third parties to provide payment processing services, including the processing of credit cards, debit cards, and other forms of electronic payment. If these companies become unable to provide these services to us, or if their systems are compromised, it could potentially disrupt our business. The payment methods that we offer also subject us to potential fraud and theft by criminals, who are becoming increasingly more sophisticated, seeking to obtain unauthorized access to or exploit weaknesses that may exist in the payment systems, as reflected in our recent Data Breach. If we fail to comply with applicable rules or requirements for the payment methods we accept, or if payment-related data is compromised due to a breach or misuse of data, we may be liable for costs incurred by payment card issuing banks and other third parties or subject to fines and higher transaction fees, or our ability to accept or facilitate certain types of payments may be impaired. In addition, our customers could lose confidence in certain payment types, which may result in a shift to other payment types or potential changes to our payment systems that may result in higher costs. As a result, our business and operating results could be adversely affected. 
36 | 37 | Uncertainty regarding the housing market, economic conditions and other factors beyond our control could adversely affect demand for our products and services, our costs of doing business and our financial performance. 38 | 39 | Our financial performance depends significantly on the stability of the housing, residential construction and home improvement markets, as well as general economic conditions, including changes in gross domestic product. Adverse conditions in or uncertainty about these markets or the economy could adversely impact our customers’ confidence or financial condition, causing them to determine not to purchase home improvement products and services or delay purchasing or payment for those products and services. Other factors beyond our control – including high levels of unemployment and foreclosures; interest rate fluctuations; fuel and other energy costs; labor and healthcare costs; the availability of financing; the state of the credit markets, including mortgages, home equity loans and consumer credit; weather; natural disasters and other conditions beyond our control – could further adversely affect demand for our products and services, our costs of doing business and our financial performance. 43 | 44 | A failure of a key information technology system or process could adversely affect our business. 45 | 46 | We rely extensively on information technology systems, some of which are managed or provided by third-party service providers, to analyze, process, store, manage and protect transactions and data. We also rely heavily on the integrity of, security of and consistent access to this data in managing our business. For these systems and processes to operate effectively, we or our service providers must periodically maintain and update them.
Our systems and the third-party systems on which we rely are subject to damage or interruption from a number of causes, including power outages; computer and telecommunications failures; computer viruses; security breaches; cyber-attacks; catastrophic events such as fires, floods, earthquakes, tornadoes, or hurricanes; acts of war or terrorism; and design or usage errors by our associates, contractors or third-party service providers. Although we and our third-party service providers seek to maintain our respective systems effectively and to successfully address the risk of compromise of the integrity, security and consistent operations of these systems, we may not be successful in doing so. As a result, we or our service providers could experience errors, interruptions, delays or cessations of service in key portions of our information technology infrastructure, which could significantly disrupt our operations and be costly, time consuming and resource-intensive to remedy. 47 | 48 | Disruptions in our customer-facing technology systems could impair our interconnected retail strategy and give rise to negative customer experiences. 49 | 50 | Through our information technology developments, we are able to provide an improved overall shopping and multichannel experience that empowers our customers to shop and interact with us from computers, tablets, smartphones and other mobile devices. We use our website both as a sales channel for our products and also as a method of providing product, project and other relevant information to our customers to drive both in-store and online sales. We have multiple online communities and knowledge centers that allow us to inform, assist and interact with our customers. Multichannel retailing is continually evolving and expanding, and we must effectively respond to changing customer expectations and new developments. 
For example, to improve our special order process we are currently rolling out COM, our new Customer Order Management system, which our customers will be able to access online, and we continually seek to enhance all of our online properties to provide an attractive user-friendly interface for our customers. Disruptions, failures or other performance issues with these customer-facing technology systems could impair the benefits that they provide to our online and in-store business and negatively affect our relationship with our customers. 51 | 52 | If we fail to identify and develop relationships with a sufficient number of qualified suppliers, or if our suppliers experience financial difficulties or other challenges, our ability to timely and efficiently access products that meet our high standards for quality could be adversely affected. 53 | 54 | We buy our products from suppliers located throughout the world. Our ability to continue to identify and develop relationships with qualified suppliers who can satisfy our high standards for quality and responsible sourcing, as well as our need to access products in a timely and efficient manner, is a significant challenge. Our ability to access products from our suppliers can be adversely affected by political instability, military conflict, the financial instability of suppliers (particularly in light of continuing economic difficulties in various regions of the world), suppliers’ noncompliance with applicable laws, trade restrictions, tariffs, currency exchange rates, any disruptions in our suppliers’ logistics or supply chain networks, and other factors beyond our or our suppliers’ control. 55 | 56 | Disruptions in our supply chain and other factors affecting the distribution of our merchandise could adversely impact our business. 
57 | 58 | A disruption within our logistics or supply chain network could adversely affect our ability to deliver inventory in a timely manner, which could impair our ability to meet customer demand for products and result in lost sales, increased supply chain costs or damage to our reputation. Such disruptions may result from damage or destruction to our distribution centers; weather-related events; natural disasters; trade restrictions; tariffs; third-party strikes, lock-outs, work stoppages or slowdowns; shipping capacity constraints; supply or shipping interruptions or costs; or other factors beyond our control. Any such disruption could negatively impact our financial performance or financial condition. 59 | 60 | The implementation of our supply chain and technology initiatives could disrupt our operations in the near term, and these initiatives might not provide the anticipated benefits or might fail. 61 | 62 | We have made, and we plan to continue to make, significant investments in our supply chain and technology. These initiatives, such as Project Sync and COM, are designed to streamline our operations to allow our associates to continue to provide high-quality service to our customers, while simplifying customer interaction and providing our customers with a more interconnected retail experience. The cost and potential problems and interruptions associated with the implementation of these initiatives, including those associated with managing third-party service providers and employing new web-based tools and services, could disrupt or reduce the efficiency of our operations in the near term and lead to product availability issues.
In addition, our improved supply chain and new or upgraded technology might not provide the anticipated benefits, it might take longer than expected to realize the anticipated benefits, or the initiatives might fail altogether, each of which could adversely impact our competitive position and our financial condition, results of operations or cash flows. 66 | 67 | If we are unable to effectively manage and expand our alliances and relationships with selected suppliers of both brand name and proprietary products, we may be unable to effectively execute our strategy to differentiate ourselves from our competitors. 68 | 69 | As part of our focus on product differentiation, we have formed strategic alliances and exclusive relationships with selected suppliers to market products under a variety of well-recognized brand names. We have also developed relationships with selected suppliers to allow us to market proprietary products that are comparable to national brands. Our proprietary products differentiate us from other retailers, generally carry higher margins than national brand products, and represent a growing portion of our business. If we are unable to manage and expand these alliances and relationships or identify alternative sources for comparable brand name and proprietary products, we may not be able to effectively execute product differentiation, which may impact our sales and gross margin results. 70 | 71 | Our proprietary products subject us to certain increased risks. 72 | 73 | As we expand our proprietary product offerings, we may become subject to increased risks due to our greater role in the design, manufacture, marketing and sale of those products. The risks include greater responsibility to administer and comply with applicable regulatory requirements, increased potential product liability and product recall exposure and increased potential reputational risks related to the responsible sourcing of those products. 
To effectively execute on our product differentiation strategy, we must also be able to successfully protect our proprietary rights and successfully navigate and avoid claims related to the proprietary rights of third parties. In addition, an increase in sales of our proprietary products may adversely affect sales of our vendors’ products, which, in turn, could adversely affect our relationships with certain of our vendors. Any failure to appropriately address some or all of these risks could damage our reputation and have an adverse effect on our business, results of operations and financial condition. 74 | 75 | We may be unsuccessful in implementing our growth strategy, which includes the integration of Interline to expand our business with professional customers and in the MRO market, which could have an adverse impact on our financial condition and results of operation. 76 | 77 | In fiscal 2015, we completed the acquisition of Interline, which we believe will enhance our ability to serve our professional customers and increase our share of the MRO market. Our goal is to serve all of our different Pro customer groups through one integrated approach to drive growth and capture market share in the retail, services and MRO markets, and this strategy depends, in part, on the successful integration of Interline. As with any acquisition, we need to successfully integrate the target company’s products, services, associates and systems into our business operations. Integration can be a complex and time-consuming process, and if the integration is not fully successful or is delayed for a material period of time, we may not achieve the anticipated synergies or benefits of the acquisition. An inability to realize the full extent of the anticipated synergies or benefits of the Interline acquisition could have an adverse effect on our financial condition or results of operation. 
Furthermore, even if Interline is successfully integrated, the acquisition may fail to further our business strategy as anticipated, expose us to increased competition or challenges with respect to our products or services, and expose us to additional liabilities associated with the Interline business. 78 | 79 | If we are unable to manage effectively our installation service business, we could suffer lost sales and be subject to fines, lawsuits and reputational damage. 80 | 81 | We act as a general contractor to provide installation services to our DIFM customers through third-party installers. As such, we are subject to regulatory requirements and risks applicable to general contractors, which include management of licensing, permitting and quality of our third-party installers. We have established processes and procedures that provide protections beyond those required by law to manage these requirements and ensure customer satisfaction with the services provided by our third-party installers. If we fail to manage these processes effectively or to provide proper oversight of these services, we could suffer lost sales, fines and lawsuits, as well as damage to our reputation, which could adversely affect our business. 82 | 83 | 84 | Our costs of doing business could increase as a result of changes in, expanded enforcement of, or adoption of new federal, state or local laws and regulations. 85 | 86 | We are subject to various federal, state and local laws and regulations that govern numerous aspects of our business. Recently, there have been a large number of legislative and regulatory initiatives and reforms, as well as expanded enforcement of existing laws and regulations by federal, state and local agencies.
Changes in, expanded enforcement of, or adoption of new federal, state or local laws and regulations governing minimum wage or living wage requirements; other wage, labor or workplace regulations; cybersecurity and data privacy; the sale of some of our products; transportation; logistics; supply chain transparency; taxes; energy costs or environmental matters could increase our costs of doing business or impact our operations. In addition, recent healthcare reform legislation could adversely impact our labor costs and our ability to negotiate favorable terms under our benefit plans for our associates. 87 | 88 | If we cannot successfully manage the unique challenges presented by international markets, we may not be successful in our international operations and our sales and profit margins may be impacted. 89 | 90 | Our ability to successfully conduct retail operations in, and source products and materials from, international markets is affected by many of the same risks we face in our U.S. operations, as well as unique costs and difficulties of managing international operations. Our international operations, including any expansion in international markets, may be adversely affected by local laws and customs, U.S. laws applicable to foreign operations and other legal and regulatory constraints, as well as political and economic conditions. Risks inherent in international operations also include, among others, potential adverse tax consequences, greater difficulty in enforcing intellectual property rights, risks associated with the Foreign Corrupt Practices Act and local anti-bribery law compliance, and challenges in our ability to identify and gain access to local suppliers. In addition, our operations in international markets create risk due to foreign currency exchange rates and fluctuations in those rates, which may adversely impact our sales and profit margins. 
91 | 92 | The inflation or deflation of commodity prices could affect our prices, demand for our products, our sales and our profit margins. 93 | 94 | Prices of certain commodity products, including lumber and other raw materials, are historically volatile and are subject to fluctuations arising from changes in domestic and international supply and demand, labor costs, competition, market speculation, government regulations and periodic delays in delivery. Rapid and significant changes in commodity prices may affect the demand for our products, our sales and our profit margins. 95 | 96 | Changes in accounting standards and subjective assumptions, estimates and judgments by management related to complex accounting matters could significantly affect our financial results or financial condition. 97 | 98 | Generally accepted accounting principles and related accounting pronouncements, implementation guidelines and interpretations with regard to a wide range of matters that are relevant to our business, such as revenue recognition, asset impairment, impairment of goodwill and other intangible assets, inventories, lease obligations, self-insurance, tax matters and litigation, are highly complex and involve many subjective assumptions, estimates and judgments. Changes in these rules or their interpretation or changes in underlying assumptions, estimates or judgments could significantly change our reported or expected financial performance or financial condition. 99 | 100 | We are involved in a number of legal and regulatory proceedings, and while we cannot predict the outcomes of those proceedings and other contingencies with certainty, some of these outcomes may adversely affect our operations or increase our costs. 
101 | 102 | In addition to the matters discussed above with respect to the Data Breach, we are involved in a number of legal proceedings and regulatory matters, including government inquiries and investigations, and consumer, employment, tort and other litigation that arise from time to time in the ordinary course of business. Litigation is inherently unpredictable, and the outcome of some of these proceedings and other contingencies could require us to take or refrain from taking actions which could adversely affect our operations or could result in excessive adverse verdicts. Additionally, involvement in these lawsuits, investigations and inquiries, and other proceedings may involve significant expense, divert management’s attention and resources from other matters, and impact the reputation of the Company. 103 | 104 | 105 | Item 1B. Unresolved Staff Comments. 106 | 107 | Not applicable. 108 | 109 | Item 2. Properties. -------------------------------------------------------------------------------- /output_files_examples/batch_0001/001/HD_0000354950_10K_20160131_Item1_excerpt.txt: -------------------------------------------------------------------------------- 1 | PART I 2 | 3 | Item 1. Business. 4 | 5 | Introduction 6 | 7 | The Home Depot, Inc. is the world’s largest home improvement retailer based on Net Sales for the fiscal year ended January 31, 2016 ("fiscal 2015"). The Home Depot sells a wide assortment of building materials, home improvement products and lawn and garden products and provides a number of services. The Home Depot stores average approximately 104,000 square feet of enclosed space, with approximately 24,000 additional square feet of outside garden area. As of the end of fiscal 2015, we had 2,274 The Home Depot stores located throughout the United States, including the Commonwealth of Puerto Rico and the territories of the U.S. Virgin Islands and Guam, Canada and Mexico.
When we refer to "The Home Depot", the "Company", "we", "us" or "our" in this report, we are referring to The Home Depot, Inc. and its consolidated subsidiaries. 8 | 9 | The Home Depot, Inc. is a Delaware corporation that was incorporated in 1978. Our Store Support Center (corporate office) is located at 2455 Paces Ferry Road, Atlanta, Georgia 30339. Our telephone number is (770) 433-8211. 10 | 11 | Our internet website is www.homedepot.com. We make available on the Investor Relations section of our website, free of charge, our Annual Reports to shareholders, Annual Reports on Form 10-K, Quarterly Reports on Form 10-Q, Current Reports on Form 8-K, Proxy Statements and Forms 3, 4 and 5, and amendments to those reports, as soon as reasonably practicable after filing such documents with, or furnishing such documents to, the SEC. 12 | 13 | We include our website addresses throughout this filing for reference only. The information contained on our websites is not incorporated by reference into this report. 14 | 15 | For information on key financial highlights, including historical revenues, profits and total assets, see the "Five-Year Summary of Financial and Operating Results" on page F-1 of this report and Item 7, "Management’s Discussion and Analysis of Financial Condition and Results of Operations". 16 | 17 | 18 | Our Business 19 | 20 | Operating Strategy 21 | 22 | Since 2009, we have been guided by a consistent strategic framework organized around our customers, our products and our disciplined use of capital, tied together through our interconnected retail initiative. In fiscal 2015, we announced an evolution of this strategy to reflect the changing needs of our customers and our business. The fundamental aspects remain the same, but we are now focused more than ever on connecting various aspects of our business to drive value for our customers, our associates, our suppliers and our shareholders.
Our current strategic framework is comprised of three key initiatives – Customer Experience, Product Authority, and Productivity and Efficiency Driven by Capital Allocation – tied together by our interconnecting retail initiative. As customers increasingly expect to be able to buy how, when and where they want, we believe that providing a seamless and frictionless shopping experience across multiple channels, featuring innovative and expanded product choices delivered in a fast and cost-efficient manner, will be a key enabler for future success. Becoming a best-in-class interconnected retailer is growing in importance as the line between online and in-store shopping continues to blur and customers demand increased value and convenience. 23 | 24 | Interconnecting retail is woven through each of our other three initiatives, as discussed in more detail below. For example, under our customer experience initiative, we are focused on connecting our stores to our online experience and connecting service to customer needs. Under our product authority initiative, we are focused on connecting our product assortment to local needs and connecting our customers with product information to inspire and empower them. Under our productivity and efficiency initiative, we are focused on connecting our merchandise from our suppliers to our customers by optimizing our supply chain. Overall, we are collaborating more closely, both internally and externally, through deeper cross-functional work and a more integrated, longer-term approach with our suppliers and other business partners, to build complete end-to-end solutions. 25 | 26 | Customer Experience 27 | 28 | Our customer experience initiative is anchored on the principles of putting customers first and taking care of our associates. 
Our commitment to customer service is a key part of this initiative, and in fiscal 2015, to underscore the importance of customer service, we re-trained our store associates on our Customer FIRST program. We recognize that the customer experience includes more than just customer service, and we have taken a number of steps to enhance this initiative to provide our customers with a seamless and frictionless shopping experience in our stores, online, on the job site or in their homes. 29 | 30 | Our Customers. We serve three primary customer groups, and we have different approaches to meet their particular needs: 31 | 32 | •Do-It-Yourself ("DIY") Customers. These customers are typically home owners who purchase products and complete their own projects and installations. Our associates assist these customers with specific product and installation questions both in our stores and through online resources and other media designed to provide product and project knowledge. We also offer a variety of clinics and workshops both to impart this knowledge and to build an emotional connection with our DIY customers. 33 | 34 | •Do-It-For-Me ("DIFM") Customers. These customers are typically home owners who purchase materials and hire third parties to complete the project or installation. Our stores offer a variety of installation services targeted at DIFM customers who purchase products and installation of those products from us in our stores, online or in their homes through in-home consultations. Our installation programs include many categories, such as flooring, cabinets, countertops, water heaters and sheds. In addition, we provide third-party professional installation in a number of categories sold through our in-home sales programs, such as roofing, siding, windows, cabinet refacing, furnaces and central air systems. This customer group is growing due to changing demographics, which we believe will increase demand for our installation services. 
Further, our focus on serving the professional customers, or "Pros", who perform these services for our DIFM customers will help us drive higher product sales. 35 | 36 | •Professional Customers. These customers are primarily professional renovators/remodelers, general contractors, repairmen, installers, small business owners and tradesmen. With our acquisition of Interline Brands, Inc. ("Interline") in August 2015, we expanded our service to the maintenance, repair and operations ("MRO") Pro. We recognize the unique service needs of the Pro customer and use our expertise to facilitate their buying experience. We offer a variety of special programs to these customers, including delivery and will-call services, dedicated staff, expanded credit programs, designated parking spaces close to store entrances and bulk pricing programs for both online and in-store purchases. In addition, we maintain a loyalty program, Pro Xtra, that provides our Pros with discounts on useful business services, exclusive product offers and a purchase tracking tool to enable receipt lookup online and job tracking of purchases across all forms of payment. This program, introduced in fiscal 2013, has continued to gain traction, with almost 4 million customers enrolled by the end of fiscal 2015. 40 | 41 | We also recognize that our Pros have differing needs depending on the type of work they perform. Our goal is to develop a wide spectrum of solutions for all of our professional customers, such as supplying both recurring MRO needs and core building materials to large-scale property managers and providing inventory management solutions for our traditional Pro customers. We believe developing a unified approach to service all the needs of our Pros will differentiate us from competitors who are solely traditional retail, installation or MRO companies.
42 | 43 | We help our DIY, DIFM and Pro customers finance their projects by offering private label credit products in our stores through third-party credit providers. We also help certain of our Pros through our own programs. In fiscal 2015, our customers opened approximately 3.2 million new The Home Depot private label credit accounts, and at fiscal year end the total number of The Home Depot active account holders was approximately 12 million. Private label credit card sales accounted for approximately 23% of sales in fiscal 2015. In addition, in the U.S. we re-launched our private label credit program at the end of fiscal 2015 with additional benefits, including a 365-day return policy for all of our customers and commercial fuel rewards and extended payment terms for our Pros. 44 | 45 | Our Associates. Our associates are key to our customer experience initiative. As noted above, we empower our associates to deliver excellent customer service through our Customer FIRST training program, and we strive to remove complexity and inefficient processes from the stores to allow our associates to focus on our customers. In fiscal 2015, we began to roll out a number of new initiatives to improve freight handling in the stores, as well as Project Sync, which is discussed in more detail below under "Logistics". All of these programs are designed to make our freight handling process more efficient, which allows our associates to devote more time to the customer experience and makes working at The Home Depot a better experience for them. We also have a number of programs to recognize stores and individual associates for exceptional customer and community service. 46 | 47 | At the end of fiscal 2015, we employed approximately 385,000 associates, of whom approximately 24,000 were salaried, with the remainder compensated on an hourly or temporary basis. To attract and retain qualified personnel, we seek to maintain competitive salary and wage levels in each market we serve. 
We measure associate satisfaction regularly, and we believe that our employee relations are very good. 48 | 49 | Interconnecting Retail. In fiscal 2015, we continued to enhance our customers’ interconnected shopping experiences through a variety of initiatives. Our associates used second generation FIRST phones, our web-enabled handheld devices, to help customers complete online sales in the aisle, expedite the checkout process for customers during peak traffic periods, locate products in the aisles and online, and check inventory on hand. We have also empowered our customers with improved product location and inventory availability tools through enhancements to our website and mobile app, and we have invested heavily in content improvements such as videos, ratings and reviews, and more detailed product information. These enhancements are critical for our increasingly interconnected customers who research products online and then go into one of our stores to view the products in person or talk to an associate before making the purchase. While in the store, customers may also go online to access ratings and reviews, compare prices, view our extended assortment and purchase products. 50 | 51 | We continued to make enhancements to our special order process in fiscal 2015 with our new Customer Order Management platform ("COM"), which was introduced in fiscal 2014. This platform is designed to provide greater visibility into and improved execution of special orders by our associates and a more seamless and frictionless experience for our customers. After COM is rolled out to all U.S. stores, which we expect to occur by the end of fiscal 2016, store associates, suppliers and customers will be able to access relevant special order information online, regardless of where the order was placed. In addition, we have three online contact centers to service our online customers’ needs. 
52 | 53 | We also recognize that customers desire greater flexibility and convenience when it comes to receiving their products and services. In fiscal 2015, we began to roll out Buy Online, Deliver From Store ("BODFS"), which complements our existing interconnecting retail programs: Buy Online, Pick-up In Store ("BOPIS"), Buy Online, Ship to Store ("BOSS") and Buy Online, Return In Store ("BORIS"). We expect to complete the roll out of BODFS by the end of fiscal 2016. We will continue to blend our physical and digital assets in a seamless and frictionless way to enhance the end-to-end customer experience. 54 | 55 | 3 56 | Product Authority 57 | 58 | Our product authority initiative is facilitated by our merchandising transformation and portfolio strategy, which is focused on delivering product innovation, assortment and value. In fiscal 2015, we continued to introduce a wide range of innovative new products to our DIY, DIFM and Pro customers, while remaining focused on offering everyday values in our stores and online. 59 | 60 | Our Products. In fiscal 2015, we introduced a number of innovative and distinctive products to our customers at attractive values. Examples of these new products include EGO™ 58-volt cordless outdoor power tools (string trimmer, hedge trimmer, blower, chainsaw and lawn mower); the Husky® 100 platform of mechanics tools; LifeProof Carpet®; Milwaukee® Cobalt Red Helix™ drill bits; and Feit® Electric HomeBrite® Bluetooth® Smart LED light bulbs. 61 | 62 | During fiscal 2015, we continued to offer value to our customers through our proprietary and exclusive brands across a wide range of departments. Highlights of these offerings include Husky® hand tools and tool storage; Everbilt® hardware and fasteners; Hampton Bay® lighting, ceiling fans and patio furniture; Vigoro® lawn care products; RIDGID® and Ryobi® power tools; Glacier Bay® bath fixtures; HDX® storage and cleaning products; and Home Decorators Collection® furniture and home décor. 
We will continue to assess departments and categories, both online and in-store, for opportunities to expand the assortment of products offered within The Home Depot’s portfolio of proprietary and exclusive brands. 63 | 64 | We maintain a global sourcing program to obtain high-quality and innovative products directly from manufacturers around the world. In fiscal 2015, in addition to our U.S. sourcing operations, we maintained sourcing offices in China, Taiwan, India, Italy, Mexico and Canada. With our acquisition of Interline, we also acquired additional sourcing offices in China, Thailand and Indonesia. 65 | 66 | The percentage of Net Sales of each of our major product categories (and related services) for each of the last three fiscal years is presented in Note 1 to the Consolidated Financial Statements included in Item 8, "Financial Statements and Supplementary Data". Net Sales outside the U.S. were $8.0 billion, $8.5 billion and $8.5 billion for fiscal 2015, 2014 and 2013, respectively. Long-lived assets outside the U.S. totaled $2.3 billion, $2.5 billion and $2.9 billion as of January 31, 2016, February 1, 2015 and February 2, 2014, respectively. 67 | 68 | Quality Assurance. Our suppliers are obligated to ensure that their products comply with applicable international, federal, state and local laws. In addition, we have both quality assurance and engineering resources dedicated to establishing criteria and overseeing compliance with safety, quality and performance standards for our proprietary branded products. We also have a global Supplier Social and Environmental Responsibility Program designed to ensure that all suppliers adhere to the highest standards of social and environmental responsibility. 69 | 70 | Environmentally-Friendly Products and Programs. 
The Home Depot is committed to sustainable business practices – from the environmental impact of our operations, to our sourcing activities, to our involvement within the communities in which we do business. We believe these efforts continue to be successful in creating value for our customers and shareholders. For example, we offer a growing selection of environmentally-preferred products, which supports sustainability and helps our customers save energy, water and money. Through our Eco Options® Program introduced in 2007, we have created product categories that allow customers to easily identify products that meet specifications for energy efficiency, water conservation, healthy home, clean air and sustainable forestry. As of the end of fiscal 2015, our Eco Options® Program included over 10,000 products. Through this program, we sell ENERGY STAR® certified appliances, LED light bulbs, tankless water heaters and other products that enable our customers to save on their utility bills. We estimate that in fiscal 2015 we helped customers save over $700 million in electricity costs through sales of ENERGY STAR® certified products and over $300 million in product costs through ENERGY STAR® rebate programs. We also estimate our customers saved over 70 billion gallons of water resulting in over $590 million in water bill savings in fiscal 2015 through the sales of our WaterSense®-labeled bath faucets, showerheads, aerators, toilets and irrigation controllers. 71 | 72 | We continue to offer store recycling programs nationwide, such as an in-store compact fluorescent light ("CFL") bulb recycling program launched in 2008. This service is offered to customers free of charge and is available in all U.S. stores. We also maintain an in-store rechargeable battery recycling program. Launched in 2001 and currently done in partnership with Call2Recycle, this program is also available to customers free of charge in all stores throughout the U.S. 
Through these recycling programs, in fiscal 2015 we helped recycle over 680,000 pounds of CFL bulbs and over 930,000 pounds of rechargeable batteries collected from our customers. In fiscal 2015, we also recycled over 170,000 lead acid batteries collected from our customers under our lead acid battery exchange program, as well as over 200,000 tons of cardboard through a nationwide cardboard recycling program across our U.S. stores. We believe our Eco Options® Program and our recycling efforts drive sales, which in turn benefits our shareholders, in addition to our customers and the environment. 73 | 74 | 4 75 | Interconnecting Retail. A typical The Home Depot store stocks approximately 30,000 to 40,000 products during the year, including both national brand name and proprietary items. To enhance our merchandising capabilities, we continued to make improvements to our information technology tools in fiscal 2015 to better understand our customers, provide more localized assortments to fit customer demand and optimize space to dedicate the right square footage to the right products in the right location. We also continued to use the resources of BlackLocus, Inc., a data analytics and pricing firm we acquired in fiscal 2012, to help us make focused merchandising decisions based on large, complex data sets. 76 | 77 | Our online product offerings complement our stores by serving as an extended aisle, and we offer a significantly broader product assortment through our Home Depot, Home Decorators Collection and Blinds.com websites. We continue to enhance our websites and mobile experience by improving navigation and search functionalities to allow customers to more easily find and purchase an expanded array of products and provide our customers with flexibility and convenience for their purchases, for example, through our BOPIS, BOSS, BORIS and BODFS programs. 
In addition, we invest in content, such as videos, room scenes, buying guides and how-to information, and we routinely assess our online assortment to balance choice with curation so that we provide value to our customers. As a result of these efforts, in fiscal 2015 we enhanced the customer experience and saw increased traffic to our websites, improved online sales conversion rates, and a larger percentage of orders being picked up in our stores. For fiscal 2015, we had over 1.4 billion visits to our online properties; sales from our online channels increased over 25% compared to fiscal 2014; and over 40% of our online orders were picked up in a store. 78 | 79 | Seasonality. Our business is subject to seasonal influences. Generally, our highest volume of sales occurs in our second fiscal quarter, and the lowest volume occurs either during our first or fourth fiscal quarter. 80 | 81 | Competition. Our industry is highly competitive, with competition based primarily on customer service, price, store location and appearance, and quality, availability and assortment of merchandise. Although we are currently the world’s largest home improvement retailer, in each of the markets we serve there are a number of other home improvement stores, electrical, plumbing and building materials supply houses, and lumber yards. With respect to some products and services, we also compete with specialty design stores, showrooms, discount stores, local, regional and national hardware stores, paint stores, mail order firms, warehouse clubs, independent building supply stores, MRO companies and, to a lesser extent, other retailers, as well as with installers of home improvement products. In addition, we face growing competition from online and multichannel retailers, some of whom may have a lower cost structure than ours, as our customers increasingly use computers, tablets, smartphones and other mobile devices to shop online and compare prices and products. 
82 | 83 | Intellectual Property. Our business has one of the most recognized brands in North America. As a result, we believe that The Home Depot® trademark has significant value and is an important factor in the marketing of our products, e-commerce, stores and business. We have registered or applied for registration of trademarks, service marks, copyrights and internet domain names, both domestically and internationally, for use in our business, including our expanding proprietary brands such as HDX®, Husky®, Hampton Bay®, Home Decorators Collection®, Glacier Bay® and Vigoro®. We also maintain patent portfolios relating to some of our products and services and seek to patent or otherwise protect innovations we incorporate into our products or business operations. 84 | 85 | Productivity and Efficiency Driven by Capital Allocation 86 | 87 | We have advanced this initiative by building best-in-class competitive advantages in our information technology and supply chain to better ensure product availability to our customers while managing our costs, which results in higher returns for our shareholders. During fiscal 2015, we continued to focus on optimizing our supply chain network and improving our inventory, transportation and distribution productivity. 88 | 89 | Logistics. Our supply chain operations are focused on creating a competitive advantage through ensuring product availability for our customers, effectively using our investment in inventory, and managing total supply chain costs. One of our principal 2015 initiatives has been to further optimize and efficiently operate our network by beginning initial work on a multi-year program called Supply Chain Synchronization, or "Project Sync". 90 | 91 | Our distribution strategy is to provide the optimal flow path for a given product. 
Rapid Deployment Centers ("RDCs") play a key role in optimizing our network as they allow for aggregation of product needs for multiple stores to a single purchase order and then rapid allocation and deployment of inventory to individual stores upon arrival at the RDC. This results in a simplified ordering process and improved transportation and inventory management. We have 18 mechanized RDCs in the U.S. and two recently opened mechanized RDCs in Canada. Through Project Sync, which is being rolled out gradually to suppliers in several U.S. RDCs, we can significantly reduce our average lead time from supplier to shelf. Project Sync requires deep collaboration among our suppliers, transportation providers, RDCs and stores, as well as rigorous planning and information technology development to create an engineered flow schedule that shortens and stabilizes lead time, resulting in 92 | 93 | 5 94 | more predictable and consistent freight flow. As we continue to roll out Project Sync throughout our supply chain over the next several years, we plan to create an end-to-end solution that benefits all participants in our supply chain, from our suppliers to our transportation providers to our RDC and store associates to our customers. 95 | 96 | Over the past several years, we have centralized our inventory planning and replenishment function and continuously improved our forecasting and replenishment technology. This has helped us improve our product availability and our inventory productivity at the same time. At the end of fiscal 2015, over 95% of our U.S. store products were ordered through central inventory management. 97 | 98 | In addition to our RDCs, at the end of fiscal 2015, we operated 34 bulk distribution centers, which handle products distributed optimally on flat bed trucks, in the U.S. 
and Canada; 22 stocking distribution centers in the U.S., Canada and Mexico; and ten specialty distribution centers, which include offshore consolidation and return logistics centers, in the U.S. and Canada. We also utilize four U.S. transload facilities, operated by third parties near ocean ports, for our imported product. These facilities allow us to improve our import logistics costs and inventory management by postponing final inventory deployment decisions until product arrives at destination ports. We remain committed to leveraging our supply chain capabilities to fully utilize and optimize our improved logistics network. 99 | 100 | Interconnecting Retail. To support our online growth, in fiscal 2015 we opened the third of our three new direct fulfillment centers ("DFCs"). We expect these facilities to enable us to reach 90% of our U.S. customers in two business days or less with parcel shipping, which provides our customers with a balance of cost efficiency and speed in shipping online orders. For non-parcel orders originating from our DFCs, we have fully implemented BOSS via RDC delivery to provide our customers with a less expensive store pick-up alternative. With our acquisition of Interline, we have also added more than 90 distribution points with fast delivery of a broad assortment of MRO products. 101 | 102 | In addition to the distribution and fulfillment centers described above, we leverage our almost 2,000 U.S. stores as a network of convenient customer pick-up, return and delivery fulfillment locations. For customers who shop online and wish to pick-up or return merchandise at our U.S. stores, we have fully implemented our BOPIS, BOSS and BORIS programs, which we believe provide us with a competitive advantage. For customers who would like the option to have store-based orders delivered directly to their home or job site, we pick, pack and ship orders to customers from our stores. 
We will continue our roll out of BODFS during fiscal 2016, allowing online customers to select their preferred delivery date and time windows for store-based deliveries. Our supply chain and logistics strategies will continue to be focused on providing our customers high product availability with convenient and low cost fulfillment options. 103 | 104 | Commitment to Sustainability and Environmentally Responsible Operations. The Home Depot focuses on sustainable operations and is committed to conducting business in an environmentally responsible manner. This commitment impacts all areas of our business, including energy usage, supply chain, store construction and maintenance, and, as noted above under "Environmentally-Friendly Products and Programs", product selection and recycling programs for our customers. 105 | 106 | In our 2015 Sustainability Report, available on our corporate website under "Corporate Responsibility > THD and the Environment", we reported that we had significantly surpassed our energy and carbon reduction goals set in 2010 and announced two new sustainability goals for 2020. Our 2010 goals were to reduce our kilowatt hours (kWh) per square foot in our U.S. stores by 20% over 2004 levels and to reduce our supply chain carbon emissions by 20% over 2010 levels by 2015. We estimate that we have reduced those levels by over 30% and over 35%, respectively, as of the end of fiscal 2015. From 2014 to 2015 alone, we reduced our kWh per square foot by approximately 3.6%. Our new 2020 sustainability commitments are to reduce our U.S. stores’ energy use by 20% over 2010 levels and to produce and procure, on an annual basis, 135 megawatts of energy for our stores through renewable or alternate energy sources, such as wind, solar and fuel cell technology. We are committed to implementing strict operational standards that establish energy efficient operations in all of our U.S. facilities and continuing to invest in renewable energy. 
Our 2015 Sustainability Report also uses the Global Reporting Initiative (GRI) framework for sustainability reporting. 107 | 108 | Additionally, we implemented a rainwater reclamation project in our stores in 2010. As of the end of fiscal 2015, 145 of our stores used reclamation tanks to collect rainwater and condensation from HVAC units and garden center roofs, which is in turn used to water plants in our outside garden centers. We estimate our annual water savings from these units to be approximately 500,000 gallons per store for total water savings of over 68 million gallons in fiscal 2015. 109 | 110 | Our commitment to corporate sustainability has resulted in a number of environmental awards and recognitions. In 2015, we received three significant awards from the U.S. Environmental Protection Agency ("EPA"). The ENERGY STAR® division named us "Retail Partner of the Year – Sustained Excellence" for our overall excellence in energy efficiency, and we received the 2015 WaterSense® Sustained Excellence Award for our overall excellence in water efficiency. We also received the EPA’s 111 | 112 | 6 113 | "SmartWay Excellence Award", which recognizes The Home Depot as an industry leader in freight supply chain environmental performance and energy efficiency. We also participate in the CDP (formerly known as the Carbon Disclosure Project) reporting process. CDP is an independent, international, not-for-profit organization providing a global system for companies and cities to measure, disclose, manage and share environmental information. In 2015, we scored 99 out of 100 from the CDP for our disclosure, placing us among the highest scoring companies in the Index and near the top of our sector. We also were named as an industry leader by the CDP and received a performance band ranking of A- (out of a range from A to E), reflecting a high level of action on climate change mitigation, adaptation and transparency. 
114 | 115 | We are strongly committed to maintaining a safe shopping and working environment for our customers and associates and protecting the environment of the communities in which we do business. Our Environmental, Health & Safety ("EH&S") function is dedicated to ensuring the health and safety of our customers and associates, with trained associates who evaluate, develop, implement and enforce policies, processes and programs on a Company-wide basis. Our EH&S policies are woven into our everyday operations and are part of The Home Depot culture. Some common program elements include: daily store inspection checklists (by department); routine follow-up audits from our store-based safety team members and regional, district and store operations field teams; equipment enhancements and preventative maintenance programs to promote physical safety; departmental merchandising safety standards; training and education programs for all associates, with varying degrees of training provided based on an associate’s role and responsibilities; and awareness, communication and recognition programs designed to drive operational awareness and understanding of EH&S issues. 116 | 117 | Returning Value to Shareholders. As noted above, we drive productivity and efficiency through our capital allocation decisions, with a focus on expense control. This discipline drove higher returns on invested capital and allowed us to return value to shareholders through $7.0 billion in share repurchases and $3.0 billion in dividends in fiscal 2015, as discussed in Item 7, "Management’s Discussion and Analysis of Financial Condition and Results of Operations". 118 | 119 | Data Breach 120 | 121 | In the third quarter of fiscal 2014, we confirmed that our payment data systems were breached, which impacted customers who used payment cards at our U.S. and Canadian stores (the "Data Breach"). 
For a description of matters related to the Data Breach, see Item 7, "Management’s Discussion and Analysis of Financial Condition and Results of Operations" and Note 13 to the Consolidated Financial Statements included in Item 8, "Financial Statements and Supplementary Data". 122 | 123 | Item 1A. Risk Factors. -------------------------------------------------------------------------------- /output_files_examples/batch_0001/001/HD_0000354950_10K_20160131_Item7A_excerpt.txt: -------------------------------------------------------------------------------- 1 | Item 7A. Quantitative and Qualitative Disclosures About Market Risk. 2 | 3 | The information required by this item is incorporated by reference to Item 7, "Management’s Discussion and Analysis of Financial Condition and Results of Operations" of this report. 4 | 5 | 28 6 | Item 8. Financial Statements and Supplementary Data. -------------------------------------------------------------------------------- /output_files_examples/batch_0001/001/HD_0000354950_10K_20170129_Item1A_excerpt.txt: -------------------------------------------------------------------------------- 1 | Item 1A. Risk Factors. 2 | 3 | The risks and uncertainties described below could materially and adversely affect our business, financial condition and results of operations and could cause actual results to differ materially from our expectations and projections. You should read these Risk Factors in conjunction with "Management’s Discussion and Analysis of Financial Condition and Results of Operations" in Item 7 and our Consolidated Financial Statements and related notes in Item 8. There also may be other factors that we cannot anticipate or that are not described in this report generally because we do not currently perceive them to be material. Those factors could cause results to differ materially from our expectations. 
4 | 5 | Strong competition could adversely affect prices and demand for our products and services and could decrease our market share. 6 | 7 | We operate in markets that are highly competitive. We compete principally based on customer experience, price, store location and appearance, and quality, availability, assortment and presentation of merchandise. In each market we serve, there are a number of other home improvement stores, electrical, plumbing and building materials supply houses and lumber yards. With respect to some products and services, we also compete with specialty design stores, showrooms, discount stores, local, regional and national hardware stores, paint stores, mail order firms, warehouse clubs, independent building supply stores, MRO companies and other retailers, as well as with providers of home improvement services. In addition, we face growing competition from online and multichannel retailers, some of whom may have a lower cost structure than ours, as our customers now routinely use computers, tablets, smartphones and other mobile devices to shop online and compare prices and products in real time. We use our marketing, advertising and promotional programs to drive customer traffic and compete more effectively, and we must regularly assess and adjust our efforts to address changes in the competitive landscape. Intense competitive pressures from one or more of our competitors, such as through aggressive promotional pricing or liquidation events, or our inability to adapt effectively and quickly to a changing competitive landscape could affect our prices, our 8 | 9 | 7 10 | margins or demand for our products and services. If we are unable to timely and appropriately respond to these competitive pressures, including through the delivery of a superior customer experience or maintenance of effective marketing, advertising or promotional programs, our market share and our financial performance could be adversely affected. 
11 | 12 | We may not timely identify or effectively respond to consumer needs, expectations or trends, which could adversely affect our relationship with customers, our reputation, the demand for our products and services, and our market share. 13 | 14 | The success of our business depends in part on our ability to identify and respond promptly to evolving trends in demographics; consumer preferences, expectations and needs; and unexpected weather conditions, while also managing appropriate inventory levels and maintaining an excellent customer experience. It is difficult to successfully predict the products and services our customers will demand. As we continue to see increasing strength in the housing and home improvement market, resulting changes in demand will put further pressure on our ability to meet customer needs and expectations and maintain high service levels. In addition, each of our primary customer groups – DIY, DIFM and Pro – have different needs and expectations, many of which evolve as the demographics in a particular customer group change. We also need to offer more localized assortments of our merchandise to appeal to local cultural and demographic tastes within each customer group. If we do not successfully differentiate the shopping experience to meet the individual needs and expectations of – or within – a customer group, we may lose market share with respect to those customers. 15 | 16 | Customer expectations about the methods by which they purchase and receive products or services are also becoming more demanding. Customers now routinely use technology and mobile devices to rapidly compare products and prices, determine real-time product availability and purchase products. Once products are purchased, customers are seeking alternate options for delivery of those products, and they often expect quick and low-cost delivery. We must continually anticipate and adapt to these changes in the purchasing process. 
We have implemented programs like BOSS, BOPIS, BODFS and direct fulfillment, but we cannot guarantee that these programs or others we may implement will be implemented successfully or will meet customers’ needs and expectations. Customers are also using social media to provide feedback and information about our Company and products and services in a manner that can be quickly and broadly disseminated. To the extent a customer has a negative experience and shares it over social media, it may impact our brand and reputation. 17 | 18 | Further, we have an aging store base that requires maintenance and space reallocation initiatives to deliver the shopping experience that our customers desire. We must also maintain a safe store environment for our customers and associates. Failure to maintain our stores and utilize our store space effectively; to provide a compelling online presence; to timely identify or respond to changing consumer preferences, expectations and home improvement needs; to provide quick and low-cost delivery alternatives; to differentiate the customer experience for our three primary customer groups; and to effectively implement an increasingly localized merchandising assortment could adversely affect our relationship with customers, our reputation, the demand for our products and services, and our market share. 19 | 20 | Our success depends upon our ability to attract, develop and retain highly qualified associates while also controlling our labor costs. 21 | 22 | Our customers expect a high level of customer service and product knowledge from our associates. To meet the needs and expectations of our customers, we must attract, develop and retain a large number of highly qualified associates while at the same time controlling labor costs. 
Our ability to control labor costs is subject to numerous external factors, including prevailing wage rates and health and other insurance costs, as well as the impact of legislation or regulations governing labor relations, minimum wage, or healthcare benefits. An inability to provide wages and/or benefits that are competitive within the markets in which we operate could adversely affect our ability to retain and attract employees. In addition, we compete with other retail businesses for many of our associates in hourly positions, and we invest significant resources in training and motivating them to maintain a high level of job satisfaction. These positions have historically had high turnover rates, which can lead to increased training and retention costs, particularly as the economy continues to improve and the labor market tightens. There is no assurance that we will be able to attract or retain highly qualified associates in the future. 23 | 24 | A failure of a key information technology system or process could adversely affect our business. 25 | 26 | We rely extensively on information technology systems, some of which are managed or provided by third-party service providers, to analyze, process, store, manage and protect transactions and data. In managing our business, we also rely heavily on the integrity of, security of and consistent access to this data for information such as sales, merchandise ordering, inventory replenishment and order fulfillment. For these information technology systems and processes to operate effectively, we or our service providers must periodically maintain and update them. 
Our systems and the third-party systems on which we rely are subject to damage or interruption from a number of causes, including power outages; computer and telecommunications failures; computer viruses; security breaches; cyber-attacks, including the use of ransomware; 27 | 28 | 8 29 | catastrophic events such as fires, floods, earthquakes, tornadoes, or hurricanes; acts of war or terrorism; and design or usage errors by our associates, contractors or third-party service providers. Although we and our third-party service providers seek to maintain our respective systems effectively and to successfully address the risk of compromise of the integrity, security and consistent operations of these systems, such efforts may not be successful. As a result, we or our service providers could experience errors, interruptions, delays or cessations of service in key portions of our information technology infrastructure, which could significantly disrupt our operations and be costly, time consuming and resource-intensive to remedy. 30 | 31 | Disruptions in our customer-facing technology systems could impair our interconnected retail strategy and give rise to negative customer experiences. 32 | 33 | Through our information technology developments, we are able to provide an improved overall shopping and interconnected retail experience that empowers our customers to shop and interact with us from computers, tablets, smartphones and other mobile devices. We use our websites and our mobile app both as sales channels for our products and also as methods of providing product, project and other relevant information to our customers to drive both in-store and online sales. We have multiple online communities and knowledge centers that allow us to inform, assist and interact with our customers. Multichannel retailing is continually evolving and expanding, and we must effectively respond to changing customer preferences and new developments. 
We also continually seek to enhance all of our online properties to provide an attractive user-friendly interface for our customers, as evidenced by our recent redesign of our homedepot.com website. Disruptions, failures or other performance issues with these customer-facing technology systems could impair the benefits that they provide to our online and in-store business and negatively affect our relationship with our customers. 34 | 35 | If our efforts to maintain the privacy and security of customer, associate, supplier and Company information are not successful, we could incur substantial additional costs and reputational damage, and could become subject to further litigation and enforcement actions. 36 | 37 | Our business, like that of most retailers, involves the receipt, storage and transmission of customers’ personal information, preferences and payment card information, as well as other confidential information, such as personal information about our associates and our suppliers and confidential Company information. We also work with third-party service providers and vendors that provide technology, systems and services that we use in connection with the receipt, storage and transmission of this information. Our information systems, and those of our third-party service providers and vendors, are vulnerable to an increasing threat of continually evolving data protection and cybersecurity risks. Unauthorized parties may attempt to gain access to these systems or our information through fraud or other means of deceiving our associates, third-party service providers or vendors. Hardware, software or applications we develop or obtain from third parties may contain defects in design or manufacture or other problems that could unexpectedly compromise information security. 
The methods used to obtain unauthorized access, disable or degrade service or sabotage systems are also constantly changing and evolving and may be difficult to anticipate or detect for long periods of time. We have implemented and regularly review and update processes and procedures to protect against unauthorized access to or use of data and to prevent data loss. However, the ever-evolving threats mean we and our third-party service providers and vendors must continually evaluate and adapt our respective systems and processes and overall security environment, and there is no guarantee that they will be adequate to safeguard against all data security breaches, system compromises or misuses of data. Any future significant compromise or breach of our data security, whether external or internal, or misuse of customer, associate, supplier or Company data, could result in significant costs, lost sales, fines, lawsuits, and damage to our reputation. In addition, as the regulatory environment related to information security, data collection and use, and privacy becomes increasingly rigorous, with new and constantly changing requirements applicable to our business, compliance with those requirements could also result in significant costs. 38 | 39 | We are subject to payment-related risks that could increase our operating costs, expose us to fraud or theft, subject us to potential liability and potentially disrupt our business. 40 | 41 | We accept payments using a variety of methods, including cash, checks, credit and debit cards, PayPal, our private label credit cards, an installment loan program, trade credit, and gift cards, and we may offer new payment options over time. Acceptance of these payment options subjects us to rules, regulations, contractual obligations and compliance requirements, including payment network rules and operating guidelines, data security standards and certification requirements, and rules governing electronic funds transfers. 
These requirements may change over time or be reinterpreted, making compliance more difficult or costly. For certain payment methods, including credit and debit cards, we pay interchange and other fees, which may increase over time and raise our operating costs. We rely on third parties to provide payment processing services, including the processing of credit cards, debit cards, and other forms of electronic payment. If these companies become unable to provide these services to us, or if their systems are compromised, it could potentially disrupt our business. The payment methods that we offer also subject us to potential fraud and theft by criminals, who are becoming increasingly more sophisticated, seeking to obtain unauthorized access to or exploit weaknesses that may exist in the payment systems. If we 42 | 43 | 9 44 | fail to comply with applicable rules or requirements for the payment methods we accept, or if payment-related data is compromised due to a breach or misuse of data, we may be liable for costs incurred by payment card issuing banks and other third parties or subject to fines and higher transaction fees, or our ability to accept or facilitate certain types of payments may be impaired. In addition, our customers could lose confidence in certain payment types, which may result in a shift to other payment types or potential changes to our payment systems that may result in higher costs. As a result, our business and operating results could be adversely affected. 45 | 46 | Uncertainty regarding the housing market, economic conditions, political climate and other factors beyond our control could adversely affect demand for our products and services, our costs of doing business and our financial performance. 47 | 48 | Our financial performance depends significantly on the stability of the housing, residential construction and home improvement markets, as well as general economic conditions, including changes in gross domestic product. 
Adverse conditions in or uncertainty about these markets, the economy or the political climate could adversely impact our customers’ confidence or financial condition, causing them to determine not to purchase home improvement products and services, causing them to delay purchasing decisions, or impacting their ability to pay for products and services. Other factors beyond our control – including unemployment and foreclosure rates; interest rate fluctuations; fuel and other energy costs; labor and healthcare costs; the availability of financing; the state of the credit markets, including mortgages, home equity loans and consumer credit; weather; natural disasters; acts of terrorism and other conditions beyond our control – could further adversely affect demand for our products and services, our costs of doing business and our financial performance. 49 | 50 | If we fail to identify and develop relationships with a sufficient number of qualified suppliers, or if our suppliers experience financial difficulties or other challenges, our ability to timely and efficiently access products that meet our high standards for quality could be adversely affected. 51 | 52 | We buy our products from suppliers located throughout the world. Our ability to continue to identify and develop relationships with qualified suppliers who can satisfy our high standards for quality and responsible sourcing, as well as our need to access products in a timely and efficient manner, is a significant challenge. Our ability to access products from our suppliers can be adversely affected by political instability, military conflict, acts of terrorism, the financial instability of suppliers, suppliers’ noncompliance with applicable laws, trade restrictions, tariffs, currency exchange rates, any disruptions in our suppliers’ logistics or supply chain networks, and other factors beyond our or our suppliers’ control. 
53 | 54 | The implementation of our supply chain and technology initiatives could disrupt our operations in the near term, and these initiatives might not provide the anticipated benefits or might fail. 55 | 56 | We have made, and we plan to continue to make, significant investments in our supply chain and information technology systems. These initiatives, such as Project Sync and COM, our new Customer Order Management system, are designed to streamline our operations to allow our associates to continue to provide high-quality service to our customers, while simplifying customer interaction and providing our customers with a more interconnected retail experience. The cost and potential problems and interruptions associated with the implementation of these initiatives, including those associated with managing third-party service providers and employing new web-based tools and services, could disrupt or reduce the efficiency of our operations in the near term and lead to product availability issues. Failure to choose the right investments and implement them in the right manner and at the right pace could disrupt our operations. In addition, our improved supply chain and new or upgraded information technology systems might not provide the anticipated benefits, it might take longer than expected to realize the anticipated benefits, or the initiatives might fail altogether, each of which could adversely impact our competitive position and our financial condition, results of operations or cash flows. 57 | 58 | Disruptions in our supply chain and other factors affecting the distribution of our merchandise could adversely impact our business. 59 | 60 | A disruption within our logistics or supply chain network could adversely affect our ability to deliver inventory in a timely manner, which could impair our ability to meet customer demand for products and result in lost sales, increased supply chain costs or damage to our reputation. 
Such disruptions may result from damage or destruction to our distribution centers; weather-related events; natural disasters; trade policy changes or restrictions; tariffs or import-related taxes; third-party strikes, lock-outs, work stoppages or slowdowns; shipping capacity constraints; supply or shipping interruptions or costs; or other factors beyond our control. Any such disruption could negatively impact our financial performance or financial condition. 61 | 62 | 10 63 | If we are unable to effectively manage and expand our alliances and relationships with selected suppliers of both brand name and proprietary products, we may be unable to effectively execute our strategy to differentiate ourselves from our competitors. 64 | 65 | As part of our focus on product differentiation, we have formed strategic alliances and exclusive relationships with selected suppliers to market products under a variety of well-recognized brand names. We have also developed relationships with selected suppliers to allow us to market proprietary products that are comparable to national brands. Our proprietary products differentiate us from other retailers, generally carry higher margins than national brand products, and represent a growing portion of our business. If we are unable to manage and expand these alliances and relationships or identify alternative sources for comparable brand name and proprietary products, we may not be able to effectively execute product differentiation, which may impact our sales and gross margin results. 66 | 67 | Our proprietary products subject us to certain increased risks. 68 | 69 | As we expand our proprietary product offerings, we may become subject to increased risks due to our greater role in the design, manufacture, marketing and sale of those products. 
The risks include greater responsibility to administer and comply with applicable regulatory requirements, increased potential product liability and product recall exposure and increased potential reputational risks related to the responsible sourcing of those products. To effectively execute on our product differentiation strategy, we must also be able to successfully protect our proprietary rights and successfully navigate and avoid claims related to the proprietary rights of third parties. In addition, an increase in sales of our proprietary products may adversely affect sales of our vendors’ products, which in turn could adversely affect our relationships with certain of our vendors. Any failure to appropriately address some or all of these risks could damage our reputation and have an adverse effect on our business, results of operations and financial condition. 70 | 71 | If we are unable to manage effectively our installation services business, we could suffer lost sales and be subject to fines, lawsuits and reputational damage, or the loss of our general contractor licenses. 72 | 73 | We act as a general contractor to provide installation services to our DIFM customers through professional third-party installers. As such, we are subject to regulatory requirements and risks applicable to general contractors, which include management of licensing, permitting and quality of work performed by our third-party installers. We have established processes and procedures to manage these requirements and ensure customer satisfaction with the services provided by our third-party installers. However, if we fail to manage these processes effectively or to provide proper oversight of these services, we could suffer lost sales, fines and lawsuits for violations of regulatory requirements, as well as for property damage or personal injury. 
In addition, we may suffer damage to our reputation or the loss of our general contractor licenses, which could adversely affect our business. 74 | 75 | We may be unsuccessful in implementing our growth strategy, which could have an adverse impact on our financial condition and results of operation. 76 | 77 | In fiscal 2015, we completed the acquisition of Interline, which we believe has enhanced our ability to serve our professional customers and increased our share of the MRO market. During fiscal 2016, we continued to develop and implement our strategy with Interline. Our goal is to serve all of our different Pro customer groups through one integrated approach to drive growth and capture market share in the retail, services and MRO markets, and this strategy depends, in part, on our continuing integration of Interline. As with any acquisition, we need to successfully integrate Interline’s products, services, associates and systems into our business operations. Integration can be a complex and time-consuming process, and if the integration is not fully successful or is delayed for a material period of time, we may not achieve the anticipated synergies or benefits of the acquisition. Furthermore, even if Interline is successfully integrated, the acquisition may fail to further our business strategy as anticipated, expose us to increased competition or challenges with respect to our products or services, and expose us to additional liabilities associated with the Interline business and the wholesale market. 78 | 79 | Our costs of doing business could increase as a result of changes in, expanded enforcement of, or adoption of new federal, state or local laws and regulations. 80 | 81 | We are subject to various federal, state and local laws and regulations that govern numerous aspects of our business. 
In recent years, a number of new laws and regulations have been adopted, and there has been expanded enforcement of certain existing laws and regulations by federal, state and local agencies. These laws and regulations, and related interpretations and enforcement activity, may change as a result of a variety of factors, including political, economic or social events. Changes in, expanded enforcement of, or adoption of new federal, state or local laws and regulations governing minimum wage or living wage requirements; other wage, labor or workplace regulations; healthcare; data protection and cybersecurity; the sale of some of our products; transportation; logistics; international trade; supply chain transparency; taxes; unclaimed property; 82 | 83 | 11 84 | energy costs; or environmental matters, including with respect to our installation services business, could increase our costs of doing business or impact our operations. 85 | 86 | If we cannot successfully manage the unique challenges presented by international markets, we may not be successful in our international operations and our sales and profit margins may be impacted. 87 | 88 | Our ability to successfully conduct retail operations in, and source products and materials from, international markets is affected by many of the same risks we face in our U.S. operations, as well as unique costs and difficulties of managing international operations. Our international operations, including any expansion in international markets, may be adversely affected by local laws and customs, U.S. laws applicable to foreign operations and other legal and regulatory constraints, as well as political and economic conditions. 
Risks inherent in international operations also include, among others, potential adverse tax consequences; potential tariffs and other import-related taxes; greater difficulty in enforcing intellectual property rights; risks associated with the Foreign Corrupt Practices Act and local anti-bribery law compliance; and challenges in our ability to identify and gain access to local suppliers. In addition, our operations in international markets create risk due to foreign currency exchange rates and fluctuations in those rates, which may adversely impact our sales and profit margins. 89 | 90 | The inflation or deflation of commodity prices could affect our prices, demand for our products, our sales and our profit margins. 91 | 92 | Prices of certain commodity products, including lumber and other raw materials, are historically volatile and are subject to fluctuations arising from changes in domestic and international supply and demand, labor costs, competition, market speculation, government regulations and periodic delays in delivery. Rapid and significant changes in commodity prices may affect the demand for our products, our sales and our profit margins. 93 | 94 | Changes in accounting standards and subjective assumptions, estimates and judgments by management related to complex accounting matters could significantly affect our financial results or financial condition. 95 | 96 | Generally accepted accounting principles and related accounting pronouncements, implementation guidelines and interpretations with regard to a wide range of matters that are relevant to our business, such as revenue recognition, asset impairment, impairment of goodwill and other intangible assets, inventories, lease obligations, self-insurance, tax matters and litigation, are highly complex and involve many subjective assumptions, estimates and judgments. 
Changes in these rules or their interpretation or changes in underlying assumptions, estimates or judgments could significantly change our reported or expected financial performance or financial condition. 97 | 98 | We have incurred losses related to the data breach we discovered in the third quarter of fiscal 2014 (the "Data Breach"); we may incur additional losses or experience future operational impacts on our business, which could have an adverse impact on our operations, financial results and reputation. 99 | 100 | The Data Breach involved the theft of certain payment card information and customer email addresses through unauthorized access to our systems. Since the Data Breach occurred, we have recorded $198 million of pretax expenses, net of expected insurance recoveries, in connection with the Data Breach, as described in more detail in Note 13 to the Consolidated Financial Statements included in Item 8, "Financial Statements and Supplementary Data". We are still facing a consolidated shareholder derivative action brought by two purported shareholders and an investigation by a number of State Attorneys General. These matters may adversely affect how we operate our business, divert the attention of management from the operation of the business, have an adverse effect on our reputation, and result in additional costs and fines. 101 | 102 | We are involved in a number of legal and regulatory proceedings, and while we cannot predict the outcomes of those proceedings and other contingencies with certainty, some of these outcomes may adversely affect our operations or increase our costs. 103 | 104 | In addition to the matters discussed above with respect to the Data Breach, we are involved in a number of legal proceedings and regulatory matters, including government inquiries and investigations, and consumer, employment, tort and other litigation that arise from time to time in the ordinary course of business. 
Litigation is inherently unpredictable, and the outcome of some of these proceedings and other contingencies could require us to take or refrain from taking actions which could adversely affect our operations or could result in excessive adverse verdicts. Additionally, involvement in these lawsuits, investigations and inquiries, and other proceedings may involve significant expense, divert management’s attention and resources from other matters, and impact the reputation of the Company. 105 | 106 | 12 107 | Item 1B. Unresolved Staff Comments. 108 | 109 | Not applicable. 110 | 111 | Item 2. Properties. -------------------------------------------------------------------------------- /output_files_examples/batch_0001/001/HD_0000354950_10K_20170129_Item1_excerpt.txt: -------------------------------------------------------------------------------- 1 | PART I 2 | 3 | Item 1. Business. 4 | 5 | Introduction 6 | 7 | The Home Depot, Inc. is the world’s largest home improvement retailer based on Net Sales for the fiscal year ended January 29, 2017 ("fiscal 2016"). The Home Depot sells a wide assortment of building materials, home improvement products and lawn and garden products and provides a number of services. The Home Depot stores average approximately 104,000 square feet of enclosed space, with approximately 24,000 additional square feet of outside garden area. As of the end of fiscal 2016, we had 2,278 The Home Depot stores located throughout the United States, including the Commonwealth of Puerto Rico and the territories of the U.S. Virgin Islands and Guam, Canada and Mexico. When we refer to "The Home Depot", the "Company", "we", "us" or "our" in this report, we are referring to The Home Depot, Inc. and its consolidated subsidiaries. 8 | 9 | The Home Depot, Inc. is a Delaware corporation that was incorporated in 1978. Our Store Support Center (corporate office) is located at 2455 Paces Ferry Road, Atlanta, Georgia 30339. Our telephone number is (770) 433-8211. 
10 | 11 | Our internet website is www.homedepot.com. We make available on the Investor Relations section of our website, free of charge, our Annual Reports to shareholders, Annual Reports on Form 10-K, Quarterly Reports on Form 10-Q, Current Reports on Form 8-K, Proxy Statements and Forms 3, 4 and 5, and amendments to those reports, as soon as reasonably practicable after filing such documents with, or furnishing such documents to, the SEC. 12 | 13 | We include our website addresses throughout this filing for reference only. The information contained on our websites is not incorporated by reference into this report. 14 | 15 | For information on key financial highlights, including historical revenues, profits and total assets, see the "Five-Year Summary of Financial and Operating Results" on page F-1 of this report and Item 7, "Management’s Discussion and Analysis of Financial Condition and Results of Operations". 16 | 17 | 1 18 | Our Business 19 | 20 | Operating Strategy 21 | 22 | Our strategic framework is founded on our three-legged stool of customer experience, product authority, and productivity and efficiency driven by effective capital allocation, all united by the goal of providing an interconnected retail experience to drive value for our customers, our associates, our suppliers and our shareholders. The retail landscape is rapidly evolving, and our goal is to become more agile in responding to the changing competitive landscape and customer preferences. As customers increasingly expect to be able to buy how, when and where they want, we believe that providing a seamless and frictionless shopping experience across multiple channels, featuring innovative and curated product choices delivered in a fast and cost-efficient manner, will be a key enabler for future success. 
23 | 24 | Becoming a best-in-class interconnected retailer is growing in importance as the line between online and in-store shopping continues to blur and customers demand increased convenience and value. Under customer experience, we are focused on connecting our associates to customer needs and connecting our stores to our online experience. Under product authority, we are focused on connecting our product and services to customer needs, including offering local and customized assortments to meet the unique needs of our different communities. Under productivity and efficiency, we are focused on connecting our merchandise from our suppliers to our customers by optimizing our supply chain. Overall, we are collaborating more closely throughout our organization and with our business partners to build an interconnected experience for customers that offers an end-to-end solution for their home improvement needs. 25 | 26 | Customer Experience 27 | 28 | Customer experience is anchored on the principles of putting customers first and taking care of our associates. Our commitment to customer service has been and will continue to be a key part of the customer experience. But we recognize that the customer experience includes more than just customer service, and we have taken a number of steps to provide our customers with a seamless and frictionless shopping experience in our stores, online, on the job site or in their homes. 29 | 30 | Our Customers. We serve three primary customer groups, and we have different approaches to meet their particular needs: 31 | 32 | •Do-It-Yourself ("DIY") Customers. These customers are typically home owners who purchase products and complete their own projects and installations. Our associates assist these customers with specific product and installation questions both in our stores and through online resources and other media designed to provide product and project knowledge. 
We also offer a variety of clinics and workshops both to impart this knowledge and to build an emotional connection with our DIY customers. 33 | 34 | •Do-It-For-Me ("DIFM") Customers. These customers are typically home owners who purchase materials and hire third parties to complete the project or installation. Our stores offer a variety of installation services available to DIFM customers who purchase products and installation of those products from us in our stores, online or in their homes through in-home consultations. Our installation programs include many categories, such as flooring, cabinets, countertops, water heaters and sheds. In addition, we provide third-party professional installation in a number of categories sold through our in-home sales programs, such as roofing, siding, windows, cabinet refacing, furnaces and central air systems. This customer group is growing due to changing demographics, which we believe will increase demand for our installation services. Further, our focus on serving the professional customers, or "Pros", who perform these services for our DIFM customers will help us drive higher product sales. 35 | 36 | •Professional Customers. These customers are primarily professional renovators/remodelers, general contractors, handymen, property managers, building service contractors and specialty tradesmen, such as installers. We recognize the great value our Pro customers provide to their clients, and we strive to make the Pro’s job easier. For example, we offer our Pros a wide range of special programs such as delivery and will-call services, dedicated sales and service staff, enhanced credit programs, designated parking spaces close to store entrances and bulk pricing programs for both online and in-store purchases. 
In addition, we maintain a loyalty program, Pro Xtra, that provides our Pros with discounts on useful business services, exclusive product offers and a purchase tracking tool to enable receipt lookup online and job tracking of purchases across all forms of payment. 37 | 38 | With our acquisition of Interline Brands, Inc. ("Interline") in August 2015, we established a platform in the maintenance, repair and operations ("MRO") market where we serve primarily institutional customers (such as educational and healthcare institutions), hospitality businesses, and national apartment complexes. We also gained additional competencies relevant to our large professional customers, including outside sales and account management, expanded assortment of product, expanded financing options and last mile delivery capabilities. 39 | 40 | 2 41 | We recognize that our Pros have differing needs depending on the type of work they perform. Our goal is to develop a comprehensive set of capabilities for our professional customers to provide solutions across every purchase opportunity, such as supplying both recurring MRO needs and renovation products and services to property managers or providing inventory management solutions for specialty tradesmen’s replenishment needs. We believe that by bringing our best resources to bear for each individual customer, we can provide a differentiated customer experience and enhanced value proposition for our professional customers. 42 | 43 | We help our DIY, DIFM and Pro customers finance their projects by offering private label credit products through third-party credit providers. Our private label credit program includes other benefits, such as a 365-day return policy and, for our Pros, commercial fuel rewards and extended payment terms. 
In fiscal 2016, our customers opened approximately 4.2 million new The Home Depot private label credit accounts, and at fiscal year end the total number of The Home Depot active account holders was approximately 13 million. Private label credit card sales accounted for approximately 23% of sales in fiscal 2016. 44 | 45 | Our Associates. Our associates are key to our customer experience. We empower our associates to deliver excellent customer service, and we strive to remove complexity and inefficient processes from the stores to allow our associates to focus on our customers. In fiscal 2016, we continued to implement new initiatives to improve freight handling in the stores, as well as Project Sync, which is discussed in more detail below under "Logistics". All of these programs are designed to make our freight handling process more efficient, which allows our associates to devote more time to the customer experience and makes working at The Home Depot a better experience for them. We also have a number of programs to recognize stores and individual associates for exceptional customer service. 46 | 47 | At the end of fiscal 2016, we employed approximately 406,000 associates, of whom approximately 27,000 were salaried, with the remainder compensated on an hourly or temporary basis. To attract and retain qualified personnel, we seek to maintain competitive salary and wage levels in each market we serve. We measure associate satisfaction regularly, and we believe that our employee relations are very good. 48 | 49 | Interconnected Retail. In fiscal 2016, we continued to enhance our customers’ interconnected shopping experiences through a variety of initiatives. Our associates continued to use FIRST phones, our web-enabled handheld devices, to help customers complete online sales in the aisle, expedite the checkout process for customers during peak traffic periods, locate products in the aisles and online, and check inventory on hand. 
We have also empowered our customers with improved product location and inventory availability tools on our website and mobile app. During fiscal 2016, we launched a redesign of our website to provide our customers with better search capabilities and a faster checkout experience, and we upgraded our mobile app. Enhancements to our online and mobile properties are critical for our increasingly interconnected customers who research products online and then go into one of our stores to view the products in person or talk to an associate before making the purchase. While in the store, customers may also go online to access ratings and reviews, compare prices, view our extended assortment and purchase products. 50 | 51 | We also recognize that customers desire greater flexibility and convenience when it comes to receiving their products and services. We strive to deliver on this customer expectation by blending the physical and digital channels into a seamless customer experience. A key component of this strategy is enabled through our new Customer Order Management platform ("COM"), which we rolled out to all U.S. stores in fiscal 2016. COM provides a common management system for customer orders, providing greater visibility and a more seamless and frictionless experience for our customers and associates. COM also provides the foundation for the enhanced delivery option we rolled out to all U.S. stores in fiscal 2016: Buy Online, Deliver From Store ("BODFS"). BODFS allows customers to schedule a specific delivery window, and it complements our existing interconnected retail programs: Buy Online, Pick-up In Store ("BOPIS"), Buy Online, Ship to Store ("BOSS") and Buy Online, Return In Store ("BORIS"). During the year, we also enabled a dynamic ETA (estimated time of arrival) feature for online orders, which provides our customers with a faster and more accurate delivery date based on their location. 
52 | 53 | We do not view the customer experience as a specific transaction; rather, it encompasses an entire process from inspiration and know-how, to purchase and fulfillment, to post-purchase care and support. Further, we believe that by connecting our stores to online and online to our stores, we drive sales not just in-store but also online. In fiscal 2016, we saw increased traffic to our online properties and improved online sales conversion rates. Sales from our online channels increased over 19% compared to fiscal 2015. We will continue to blend our physical and digital assets in a seamless and frictionless way to enhance the end-to-end customer experience. 54 | 55 | 3 56 | Product Authority 57 | 58 | Product authority is facilitated by our merchandising transformation and portfolio strategy, which is focused on delivering product innovation, assortment and value. In fiscal 2016, we continued to introduce a wide range of innovative new products to our DIY, DIFM and Pro customers, while remaining focused on offering everyday values in our stores and online. 59 | 60 | Our Products. A typical The Home Depot store stocks approximately 30,000 to 40,000 products during the year, including both national brand name and proprietary items. To enhance our merchandising capabilities, we continued to make improvements to our information technology tools in fiscal 2016 to better understand our customers, provide more localized assortments to fit customer demand and optimize space to dedicate the right square footage to the right products in the right location. Our online product offerings complement our stores by serving as an extended aisle, and we offer a significantly broader product assortment through our websites, including homedepot.com and Blinds.com. We also routinely use our merchandising tools to refine our online assortment to balance the extended choice with a more curated offering. 
61 | 62 | In fiscal 2016, we introduced a number of innovative and distinctive products to our customers at attractive values. Examples of these new products include Feit® LED Edge-Lit flat panel lighting; Pergo® Outlast+ flooring; Thomasville® Studio 1904 kitchen cabinets; DEWALT FLEXVOLT® system for power tools; Diablo® carbide hole saws; Defiant® LED security lights; Makita® subcompact drills; and Milwaukee M18 FUEL® with ONE KEY™ power tools, compatible with the HIGH DEMAND™ 9.0Ah battery. 63 | 64 | During fiscal 2016, we continued to offer value to our customers through our proprietary and exclusive brands across a wide range of departments. Highlights of these offerings include Husky® hand tools, tool storage and work benches; Everbilt® hardware and fasteners; Hampton Bay® lighting, ceiling fans and patio furniture; Vigoro® lawn care products; RIDGID® and Ryobi® power tools; Glacier Bay® bath fixtures; HDX® storage and cleaning products; and Home Decorators Collection® furniture and home décor. We will continue to assess departments and categories, both online and in-store, for opportunities to expand the assortment of products offered within The Home Depot’s portfolio of proprietary and exclusive brands. 65 | 66 | We maintain a global sourcing program to obtain high-quality and innovative products directly from manufacturers around the world. In fiscal 2016, in addition to our U.S. sourcing operations, we maintained sourcing offices in Mexico, Canada, China, India, Southeast Asia and Europe. 67 | 68 | The percentage of Net Sales of each of our major product categories (and related services) for each of the last three fiscal years is presented in Note 1 to the Consolidated Financial Statements included in Item 8, "Financial Statements and Supplementary Data". Net Sales inside the U.S. were $86.6 billion, $80.5 billion and $74.7 billion for fiscal 2016, 2015 and 2014, respectively. Net Sales outside the U.S. 
were $8.0 billion, $8.0 billion and $8.5 billion for fiscal 2016, 2015 and 2014, respectively. Long-lived assets, which consist of net property and equipment, inside the U.S. totaled $19.5 billion, $19.9 billion and $20.2 billion as of January 29, 2017, January 31, 2016 and February 1, 2015, respectively. Long-lived assets outside the U.S. totaled $2.4 billion, $2.3 billion and $2.5 billion as of January 29, 2017, January 31, 2016 and February 1, 2015, respectively. 69 | 70 | Quality Assurance. Our suppliers are obligated to ensure that their products comply with applicable international, federal, state and local laws. In addition, we have both quality assurance and engineering resources dedicated to establishing criteria and overseeing compliance with safety, quality and performance standards for our proprietary branded products. We also have a global Supplier Social and Environmental Responsibility Program designed to ensure that suppliers adhere to the highest standards of social and environmental responsibility. 71 | 72 | Environmentally-Preferred Products and Programs. The Home Depot is committed to sustainable business practices – from the environmental impact of our operations, to our sourcing activities, to our involvement within the communities in which we do business. We believe these efforts continue to be successful in creating value for our customers and shareholders. For example, we offer a growing selection of environmentally-preferred products, which supports sustainability and helps our customers save energy, water and money. Through our Eco Options® Program introduced in 2007, we have created product categories that allow customers to easily identify products that meet specifications for energy efficiency, water conservation, healthy home, clean air and sustainable forestry. As of the end of fiscal 2016, our Eco Options® Program included over 10,000 products. 
Through this program, we sell ENERGY STAR® certified appliances, LED light bulbs, tankless water heaters and other products that enable our customers to save on their utility bills. We estimate that in fiscal 2016 we helped customers save over $900 million in electricity costs through sales of ENERGY STAR® certified products. We also estimate our customers saved over 76 billion gallons of water resulting in over $640 million in water bill savings in fiscal 2016 through the sales of our WaterSense®-labeled bath faucets, showerheads, aerators, toilets and irrigation controllers. Our 2016 73 | 74 | 4 75 | Responsibility Report, available on our website at https://corporate.homedepot.com/responsibility, describes many of our other environmentally-preferred products that promote energy efficiency, water conservation, clean air and a healthy home. 76 | 77 | We continue to offer store recycling programs nationwide, such as an in-store compact fluorescent light ("CFL") bulb recycling program launched in 2008. This service is offered to customers free of charge and is available in all U.S. stores. We also maintain an in-store rechargeable battery recycling program. Launched in 2001 and currently done in partnership with Call2Recycle, this program is also available to customers free of charge in all stores throughout the U.S. Through our recycling programs, in fiscal 2016 we helped recycle over 860,000 pounds of CFL bulbs and over 1 million pounds of rechargeable batteries. In fiscal 2016, we also recycled over 180,000 lead acid batteries collected from our customers under our lead acid battery exchange program, as well as over 225,000 tons of cardboard through a nationwide cardboard recycling program across our U.S. operations. We believe our environmentally-preferred product selection and our recycling efforts drive sales, which in turn benefits our shareholders, in addition to our customers and the environment. 78 | 79 | Seasonality. 
Our business is subject to seasonal influences. Generally, our highest volume of sales occurs in our second fiscal quarter, and the lowest volume occurs either during our first or fourth fiscal quarter. 80 | 81 | Competition. Our industry is highly competitive, with competition based primarily on customer experience, price, store location and appearance, and quality, availability, assortment and presentation of merchandise. Although we are currently the world’s largest home improvement retailer, in each of the markets we serve there are a number of other home improvement stores, electrical, plumbing and building materials supply houses, and lumber yards. With respect to some products and services, we also compete with specialty design stores, showrooms, discount stores, local, regional and national hardware stores, paint stores, mail order firms, warehouse clubs, independent building supply stores, MRO companies and other retailers, as well as with providers of home improvement services. In addition, we face growing competition from online and multichannel retailers, some of whom may have a lower cost structure than ours, as our customers now routinely use computers, tablets, smartphones and other mobile devices to shop online and compare prices and products. 82 | 83 | Intellectual Property. Our business has one of the most recognized brands in North America. As a result, we believe that The Home Depot® trademark has significant value and is an important factor in the marketing of our products, e-commerce, stores and business. We have registered or applied for registration of trademarks, service marks, copyrights and internet domain names, both domestically and internationally, for use in our business, including our expanding proprietary brands such as HDX®, Husky®, Hampton Bay®, Home Decorators Collection®, Glacier Bay® and Vigoro®. 
We also maintain patent portfolios relating to some of our products and services and seek to patent or otherwise protect innovations we incorporate into our products or business operations. 84 | 85 | Productivity and Efficiency Driven by Capital Allocation 86 | 87 | We continue to drive productivity and efficiency by building best-in-class competitive advantages in our information technology and supply chain to ensure product availability for our customers while managing our costs, which results in higher returns for our shareholders. During fiscal 2016, we continued to focus on optimizing our end-to-end supply chain network and improving our inventory, transportation and distribution productivity. 88 | 89 | Logistics. We continue to invest in our supply chain by optimizing our network through initiatives like Supply Chain Synchronization, or "Project Sync". As described in more detail below, Project Sync is a recent initiative to improve our in-stock rates, inventory productivity and logistics costs. During fiscal 2016, we rolled out the first phase of Project Sync to all U.S. stores, and we continue to onboard new suppliers. This is a multi-year, multi-phase endeavor, which we plan to implement in each of our distribution flow paths. 90 | 91 | Our overall distribution strategy is to provide the optimal flow path for a given product. Rapid Deployment Centers ("RDCs") play a key role in optimizing our network as they allow for aggregation of product needs for multiple stores to a single purchase order and then rapid allocation and deployment of inventory to individual stores upon arrival at the RDC. This results in a simplified ordering process and improved transportation and inventory management. Through Project Sync, we can significantly reduce our average lead time from supplier to shelf. 
Project Sync requires deep collaboration among our suppliers, transportation providers, RDCs and stores, as well as rigorous planning and information technology development to create an engineered flow schedule that shortens and stabilizes lead time, resulting in more predictable and consistent freight flow. As we continue to roll out Project Sync throughout our supply chain over the next several years, we plan to create an end-to-end solution that benefits all participants in our supply chain, from our suppliers to our transportation providers to our RDC and store associates to our customers. 92 | 93 | Over the past several years, we have centralized our inventory planning and replenishment function and continuously improved our forecasting and replenishment technology. This has helped us improve our product availability and our 94 | 95 | 5 96 | inventory productivity at the same time. At the end of fiscal 2016, over 95% of our U.S. store products were ordered through central inventory management. 97 | 98 | We operate multiple distribution center platforms tailored to meet the needs of our stores and customers, based on the types of products, local geography, and transportation and delivery requirements. The following table sets forth the number and type of distribution centers in our network as of the end of fiscal 2016: 99 | 100 | [DATA_TABLE_REMOVED] 101 | 102 | To better serve our online customers, we offer a variety of delivery methods to get product to each customer’s preferred location. We completed the rollout of three new U.S. DFCs in fiscal 2015 to support our online growth. We also operate one DFC for our Home Decorators Collection orders and one DFC for "big and bulky" orders. Our DFCs enable us to reduce delivery costs and lead time, and improve the overall customer experience by shipping online orders directly to the customer. 
In addition, with our acquisition of Interline in fiscal 2015, we added more than 100 distribution points with fast delivery of a broad assortment of MRO products. We remain committed to leveraging our supply chain capabilities to fully utilize and optimize our improved logistics network. 103 | 104 | In addition to the distribution and fulfillment centers described above, we leverage our almost 2,000 U.S. stores as a network of convenient customer pick-up, return and delivery fulfillment locations. For customers who shop online and wish to pick-up or return merchandise at our U.S. stores, we have fully implemented our BOPIS, BOSS and BORIS programs, which we believe provide us with a competitive advantage. For example, in the fourth quarter of fiscal 2016, approximately 45% of our online U.S. orders were picked up in a store. For customers who would like the option to have store-based orders delivered directly to their home or job site, we pick, pack and ship orders to customers from our stores. Our BODFS program, which we rolled out in fiscal 2016, allows our customers to select their preferred delivery date and time windows for store-based deliveries. Our supply chain and logistics strategies will continue to be focused on providing our customers high product availability with convenient and low cost fulfillment options. 105 | 106 | Commitment to Sustainability and Environmentally Responsible Operations. The Home Depot focuses on sustainable operations and is committed to conducting business in an environmentally responsible manner. This commitment impacts all areas of our business, including energy usage, supply chain and packaging, store construction and maintenance, and, as noted above under "Environmentally-Preferred Products and Programs", product selection and recycling programs for our customers. 107 | 108 | In 2015, we announced two major sustainability commitments for 2020. Our first goal is to reduce our U.S. 
stores’ energy use by 20% over 2010 levels, and our second goal is to produce and procure, on an annual basis, 135 megawatts of energy for our stores through renewable or alternate energy sources, such as wind, solar and fuel cell technology. As of the end of fiscal 2016, we are on track to exceed both of our goals before the end of 2020. We are committed to implementing strict operational standards that establish energy efficient operations in all of our U.S. facilities and continuing to invest in renewable energy. 109 | 110 | 6 111 | Additionally, we implemented a rainwater reclamation project in our stores in 2010. As of the end of fiscal 2016, 145 of our stores used reclamation tanks to collect rainwater and condensation from HVAC units and garden center roofs, which is in turn used to water plants in our outside garden centers. We estimate our annual water savings from these units to be over 500,000 gallons per store for total water savings of over 72.5 million gallons in fiscal 2016. Our 2016 Responsibility Report, which uses the Global Reporting Initiative (GRI) framework for sustainability reporting, provides more information on sustainability efforts in other aspects of our operations. Our 2016 Responsibility Report is available on our website at https://corporate.homedepot.com/responsibility. 112 | 113 | Our commitment to corporate sustainability has resulted in a number of environmental awards and recognitions. In 2016, we received three significant awards from the U.S. Environmental Protection Agency ("EPA"). The ENERGY STAR® division named us "Retail Partner of the Year – Sustained Excellence" for our overall excellence in energy efficiency, and we received the 2016 WaterSense® Sustained Excellence Award for our overall excellence in water efficiency. We also received the EPA’s "SmartWay Excellence Award", which recognizes The Home Depot as an industry leader in freight supply chain environmental performance and energy efficiency. 
We also participate in the CDP (formerly known as the Carbon Disclosure Project) reporting process. CDP is an independent, international, not-for-profit organization providing a global system for companies and cities to measure, disclose, manage and share environmental information. In 2016, we received a score of A- (out of a range from A to E) from the CDP, reflecting a high level of action on climate change mitigation, adaptation and transparency. We also were named an industry leader by the CDP. 114 | 115 | We are strongly committed to maintaining a safe shopping and working environment for our customers and associates. Our Environmental, Health & Safety ("EH&S") function is dedicated to ensuring the health and safety of our customers and associates, with trained associates who evaluate, develop, implement and enforce policies, processes and programs on a Company-wide basis. Our EH&S policies are woven into our everyday operations and are part of The Home Depot culture. Some common program elements include: daily store inspection checklists (by department); routine follow-up audits from our store-based safety team members and regional, district and store operations field teams; equipment enhancements and preventative maintenance programs to promote physical safety; departmental merchandising safety standards; training and education programs for all associates, with varying degrees of training provided based on an associate’s role and responsibilities; and awareness, communication and recognition programs designed to drive operational awareness and understanding of EH&S issues. 116 | 117 | Returning Value to Shareholders. As noted above, we drive productivity and efficiency through our capital allocation decisions, with a focus on expense control. 
This discipline drove higher returns on invested capital and allowed us to return value to shareholders through $7.0 billion in share repurchases and $3.4 billion in dividends in fiscal 2016, as discussed in Item 7, "Management’s Discussion and Analysis of Financial Condition and Results of Operations".

Item 1A. Risk Factors.
--------------------------------------------------------------------------------
/output_files_examples/batch_0001/001/HD_0000354950_10K_20170129_Item7A_excerpt.txt:
--------------------------------------------------------------------------------
Item 7A. Quantitative and Qualitative Disclosures About Market Risk.

The information required by this item is incorporated by reference to Item 7, "Management’s Discussion and Analysis of Financial Condition and Results of Operations" of this report.

28
Item 8. Financial Statements and Supplementary Data.
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
beautifulsoup4>=4.4.1
lxml>=3.5.0
requests>=2.12.4
--------------------------------------------------------------------------------
/src/__init__.py:
--------------------------------------------------------------------------------
"""
secedgartext: extract text from SEC corporate filings
Copyright (C) 2017 Alexander Ions

You should have received a copy of the GNU General Public License
along with this program. If not, see .
"""
--------------------------------------------------------------------------------
/src/control.py:
--------------------------------------------------------------------------------
"""
secedgartext: extract text from SEC corporate filings
Copyright (C) 2017 Alexander Ions

You should have received a copy of the GNU General Public License
along with this program. If not, see .
"""
import re
import os

from .download import EdgarCrawler
from .utils import logger, args
from .utils import companies_file_location, single_company, date_search_string
from .utils import batch_number, storage_toplevel_directory

MAX_FILES_IN_SUBDIRECTORY = 1000

class Downloader(object):
    def __init__(self):
        self.storage_path = args.storage

    def download_companies(self, do_save_full_document=False):
        """Iterate through a list of companies and download documents.

        Downloading document contents within each filing type required
        :param do_save_full_document: save a local copy of the whole original
            document
        :return:
        """
        companies = list()
        if single_company:
            companies.append([str(single_company), str(single_company)])
            logger.info("Downloading single company: %s", args.company)
        if not companies:
            try:
                companies = company_list(companies_file_location)
                logger.info("Using companies list: %s",
                            companies_file_location)
            except Exception:
                # companies file missing or unreadable: prompt for one company
                logger.warning("Companies list not available")
                company_input = input("Enter company code (CIK or ticker): ")
                if company_input:
                    companies.append([company_input, company_input])
                    logger.info("Downloading single company: %s", company_input)
                else:
                    # default company: Dow Chemical
                    company_default = 'DOW'
                    companies.append([company_default, company_default.title()])
                    logger.info("Downloading default company: %s",
                                next(iter(companies)))
        start_date = args.start     # TODO:this may be ignored by EDGAR web interface, consider removing this argument
        end_date = args.end
        filings = args.filings

        logger.info('-' * 65)
        logger.info("Downloading %i companies: %s", len(companies),
                    single_company or companies_file_location)
        logger.info("Filings period: %i - %i", args.start, args.end)
        logger.info("Filings search: %s", filings)
        logger.info("Storage location: %s", self.storage_path)
        logger.info('-' * 65)

        start_company = max(1, int(args.start_company or 1))
        end_company = min(len(companies),
                          int(args.end_company or len(companies)))

        download_companies = companies[start_company-1:end_company]
        seccrawler = EdgarCrawler()


        if do_save_full_document:
            logger.info("Saving source document and extracts "
                        "(if successful) locally")
        else:
            logger.info("Saving extracts (if successful) only. "
                        "Not saving source documents locally.")
        logger.info("SEC filing date range: %i to %i", start_date, end_date)
        storage_subdirectory_number = 1

        for c, company_keys in enumerate(download_companies):
            edgar_search_string = str(company_keys[0])
            company_description = str(company_keys[1]).strip()
            company_description = re.sub('/','', company_description)

            logger.info('Batch number: ' + str(batch_number) +
                        ', begin downloading company: ' +
                        str(c + 1) + ' / ' +
                        str(len(download_companies)))
            storage_subdirectory = os.path.join(storage_toplevel_directory,
                                                format(storage_subdirectory_number,
                                                       '03d'))
            if not os.path.exists(storage_subdirectory):
                os.makedirs(storage_subdirectory)
            seccrawler.storage_folder = storage_subdirectory
            for filing_search_string in args.filings:
                seccrawler.download_filings(company_description,
                                            edgar_search_string,
                                            filing_search_string,
                                            date_search_string,
                                            str(start_date),
                                            str(end_date), do_save_full_document)
            if len(os.listdir(storage_subdirectory)) > MAX_FILES_IN_SUBDIRECTORY:
                storage_subdirectory_number += 1
        logger.warning("SUCCESS: Finished attempted download of " +
                       str(len(download_companies) or 0) +
                       " companies from an overall list of " +
                       str(len(companies) or 0) + " companies."
                       )


def company_list(text_file_location):
    """Read companies list from text_file_location, load into a list.
    :param text_file_location:
    :return: company_list: each element is a list of CIK code text and
        company descriptive text
    """
    company_list = list()
    with open(text_file_location, newline='') as f:
        for r in f.readlines():
            if r[0] =='#' and len(company_list) > 0:
                break
            if r[0] != '#' and len(r) > 1:
                # newline='' keeps '\r\n' endings intact, so strip both characters
                r = r.rstrip('\r\n')
                text_items = re.split('[ ,\t]', r)  # various delimiters allowed
                edgar_search_text = text_items[0].zfill(10)
                company_description = '_'.join(
                    text_items[1:2])
                company_list.append([edgar_search_text, company_description])
    return company_list



--------------------------------------------------------------------------------
/src/document.py:
--------------------------------------------------------------------------------
"""
secedgartext: extract text from SEC corporate filings
Copyright (C) 2017 Alexander Ions

You should have received a copy of the GNU General Public License
along with this program. If not, see .
"""
import time
from datetime import datetime
import copy
import os
from abc import ABCMeta
import multiprocessing as mp

from .utils import search_terms as master_search_terms
from .utils import args, logger

# Python 3 form of the metaclass declaration (a class-level __metaclass__
# attribute is ignored by Python 3)
class Document(metaclass=ABCMeta):

    def __init__(self, file_path, doc_text, extraction_method):
        self._file_path = file_path
        self.doc_text = doc_text
        self.extraction_method = extraction_method
        self.log_cache = []

    def get_excerpt(self, input_text, form_type, metadata_master,
                    skip_existing_excerpts):
        """Extract each section of interest for form_type and save it.

        :param input_text:
        :param form_type:
        :param metadata_master:
        :param skip_existing_excerpts:
        :return: log_cache, a list of (level, message) tuples
        """
        start_time = time.process_time()
        self.prepare_text()
        prep_time = time.process_time() - start_time
        file_name_root = metadata_master.metadata_file_name
        for section_search_terms in master_search_terms[form_type]:
            start_time = time.process_time()
            metadata = copy.copy(metadata_master)
            warnings = []
            section_name = section_search_terms['itemname']
            section_output_path = file_name_root + '_' + section_name
            txt_output_path = section_output_path + '_excerpt.txt'
            metadata_path = section_output_path + '_metadata.json'
            failure_metadata_output_path = section_output_path + '_failure.json'

            search_pairs = section_search_terms[self.search_terms_type()]
            text_extract, extraction_summary, start_text, end_text, warnings = \
                self.extract_section(search_pairs)
            time_elapsed = time.process_time() - start_time
            # metadata.extraction_method = self.extraction_method
            metadata.section_name = section_name
            if start_text:
                start_text = start_text.replace('\"', '\'')
            if end_text:
                end_text = end_text.replace('\"', '\'')
            metadata.endpoints = [start_text, end_text]
            metadata.warnings = warnings
            metadata.time_elapsed = round(prep_time + time_elapsed, 1)
            metadata.section_end_time = str(datetime.utcnow())
            if text_extract:
                # success: save the excerpt file
                metadata.section_n_characters = len(text_extract)
                with open(txt_output_path, 'w', encoding='utf-8',
                          newline='\n') as txt_output:
                    txt_output.write(text_extract)
                log_str = ': '.join(['SUCCESS Saved file for',
                                     section_name, txt_output_path])
                self.log_cache.append(('DEBUG', log_str))
                try:
                    # clear any failure record left by an earlier attempt
                    os.remove(failure_metadata_output_path)
                except OSError:
                    pass
                metadata.output_file = txt_output_path
                metadata.metadata_file_name = metadata_path
                metadata.save_to_json(metadata_path)
            else:
                log_str = ': '.join(['No excerpt located for ',
                                     section_name, metadata.sec_index_url])
                self.log_cache.append(('WARNING', log_str))
                try:
                    # clear any success record left by an earlier attempt
                    os.remove(metadata_path)
                except OSError:
                    pass
                metadata.metadata_file_name = failure_metadata_output_path
                metadata.save_to_json(failure_metadata_output_path)
            if args.write_sql:
                metadata.save_to_db()
        return self.log_cache

    def prepare_text(self):
        # handled in child classes
        pass


--------------------------------------------------------------------------------
/src/download.py:
--------------------------------------------------------------------------------
"""
secedgartext: extract text from SEC corporate filings
Copyright (C) 2017 Alexander Ions

You should have received a copy of the GNU General Public License
along with this program. If not, see .
"""
# Originally adapted from "SEC-Edgar" package code
import multiprocessing as mp
import os
import re
import copy
from bs4 import BeautifulSoup

from .utils import args, logger, requests_get
from .metadata import Metadata
from .utils import search_terms as master_search_terms
from .html_document import HtmlDocument
from .text_document import TextDocument


class EdgarCrawler(object):
    def __init__(self):
        self.storage_folder = None

    def download_filings(self, company_description, edgar_search_string,
                         filing_search_string, date_search_string,
                         start_date, end_date,
                         do_save_full_document, count=100):
        """Build a list of all filings of a certain type, within a date range.

        Then download them and extract the text of interest
        :param company_description: descriptive company name
        :param edgar_search_string: CIK code or ticker
        :param filing_search_string: filing type, e.g. '10-K'
        :param date_search_string: regular expression matched against each
            filing's period of report
        :param start_date, end_date: filing date range
        :param do_save_full_document: passed through to download_filing
        :param count: number of Filing Results to return on the (first) EDGAR
            Search Results query page. 9999=show all
        :return: None
        """

        filings_links = self.download_filings_links(edgar_search_string,
                                                    company_description,
                                                    filing_search_string,
                                                    date_search_string,
                                                    start_date, end_date, count)

        filings_list = []

        logger.info("Identified " + str(len(filings_links)) +
                    " filings, gathering SEC metadata and document links...")

        is_multiprocessing = args.multiprocessing_cores > 0
        if is_multiprocessing:
            pool = mp.Pool(processes = args.multiprocessing_cores)

        for i, index_url in enumerate(filings_links):
            # Get the URL for the (text-format) document which packages all
            # of the parts of the filing
            base_url = re.sub('-index.htm.?','',index_url) + ".txt"
            filings_list.append([index_url, base_url, company_description])
            filing_metadata = Metadata(index_url)

            if re.search(date_search_string,
str(filing_metadata.sec_period_of_report)): 65 | filing_metadata.sec_index_url = index_url 66 | filing_metadata.sec_url = base_url 67 | filing_metadata.company_description = company_description 68 | if is_multiprocessing: 69 | # multi-core processing. Add jobs to pool. 70 | pool.apply_async(self.download_filing, 71 | args=(filing_metadata, do_save_full_document), 72 | callback=self.process_log_cache) 73 | else: 74 | # single core processing 75 | log_cache = self.download_filing(filing_metadata, do_save_full_document) 76 | self.process_log_cache(log_cache) 77 | if is_multiprocessing: 78 | pool.close() 79 | pool.join() 80 | logger.debug("Finished attempting to download all the %s forms for %s", 81 | filing_search_string, company_description) 82 | 83 | 84 | def process_log_cache(self, log_cache): 85 | """Output log_cache messages via logger 86 | """ 87 | for msg in log_cache: 88 | msg_type = msg[0] 89 | msg_text = msg[1] 90 | if msg_type=='process_name': 91 | id = '(' + msg_text + ') ' 92 | elif msg_type=='INFO': 93 | logger.info(id + msg_text) 94 | elif msg_type=='DEBUG': 95 | logger.debug(id + msg_text) 96 | elif msg_type=='WARNING': 97 | logger.warning(id + msg_text) 98 | elif msg_type=='ERROR': 99 | logger.error(id + msg_text) 100 | 101 | 102 | 103 | def download_filings_links(self, edgar_search_string, company_description, 104 | filing_search_string, date_search_string, 105 | start_date, end_date, count): 106 | """[docstring here] 107 | :param edgar_search_string: 10-digit integer CIK code, or ticker 108 | :param company_description: 109 | :param filing_search_string: e.g. 
'10-K' 110 | :param start_date: ccyymmdd 111 | :param end_date: ccyymmdd 112 | :param count: 113 | :return: linkList, a list of links to main pages for each filing found 114 | example of a typical base_url: http://www.sec.gov/cgi-bin/browse-secedgartext?action=getcompany&CIK=0000051143&type=10-K&datea=20011231&dateb=20131231&owner=exclude&output=xml&count=9999 115 | """ 116 | 117 | sec_website = "https://www.sec.gov/" 118 | browse_url = sec_website + "cgi-bin/browse-edgar" 119 | requests_params = {'action': 'getcompany', 120 | 'CIK': str(edgar_search_string), 121 | 'type': filing_search_string, 122 | 'datea': start_date, 123 | 'dateb': end_date, 124 | 'owner': 'exclude', 125 | 'output': 'html', 126 | 'count': count} 127 | logger.info('-' * 100) 128 | logger.info( 129 | "Query EDGAR database for " + filing_search_string + ", Search: " + 130 | str(edgar_search_string) + " (" + company_description + ")") 131 | 132 | linkList = [] # List of all links from the CIK page 133 | continuation_tag = 'first pass' 134 | 135 | while continuation_tag: 136 | r = requests_get(browse_url, params=requests_params) 137 | if continuation_tag == 'first pass': 138 | logger.debug("EDGAR search URL: " + r.url) 139 | logger.info('-' * 100) 140 | data = r.text 141 | soup = BeautifulSoup(data, "html.parser") 142 | for link in soup.find_all('a', {'id': 'documentsbutton'}): 143 | URL = sec_website + link['href'] 144 | linkList.append(URL) 145 | continuation_tag = soup.find('input', {'value': 'Next ' + str(count)}) # a button labelled 'Next 100' for example 146 | if continuation_tag: 147 | continuation_string = continuation_tag['onclick'] 148 | browse_url = sec_website + re.findall('cgi-bin.*count=\d*', continuation_string)[0] 149 | requests_params = None 150 | return linkList 151 | 152 | 153 | def download_filing(self, filing_metadata, do_save_full_document): 154 | """ 155 | Download filing, extract relevant sections. 156 | 157 | Download a filing (full filing submission). 
Find relevant 158 | portions of the filing, and send the raw text for text extraction 159 | :param: doc_info: contains URL for the full filing submission, and 160 | other EDGAR index metadata 161 | """ 162 | log_cache = [('process_name', str(os.getpid()))] 163 | filing_url = filing_metadata.sec_url 164 | company_description = filing_metadata.company_description 165 | log_str = "Retrieving: %s, %s, period: %s, index page: %s" \ 166 | % (filing_metadata.sec_company_name, 167 | filing_metadata.sec_form_header, 168 | filing_metadata.sec_period_of_report, 169 | filing_metadata.sec_index_url) 170 | log_cache.append(('DEBUG', log_str)) 171 | 172 | r = requests_get(filing_url) 173 | filing_text = r.text 174 | filing_metadata.add_data_from_filing_text(filing_text[0:10000]) 175 | 176 | # Iterate through the DOCUMENT types that we are seeking, 177 | # checking for each in turn whether they are included in the current 178 | # filing. Note that searching for document_group '10-K' will also 179 | # deliberately find DOCUMENT type variants such as 10-K/A, 10-K405 etc. 180 | # Note we search for all DOCUMENT types that interest us, regardless of 181 | # whether the current filing came from a '10-K' or '10-Q' web query 182 | # originally. Also note that we process DOCUMENT types in no 183 | # fixed order. 184 | filtered_search_terms = {doc_type: master_search_terms[doc_type] 185 | for doc_type in args.documents} 186 | for document_group in filtered_search_terms: 187 | doc_search = re.search(".{,20}" + document_group + 188 | ".*?", filing_text, 189 | flags=re.DOTALL | re.IGNORECASE) 190 | if doc_search: 191 | doc_text = doc_search.group() 192 | doc_metadata = copy.copy(filing_metadata) 193 | # look for form type near the start of the document. 
194 | type_search = re.search(".*", 195 | doc_text[0:10000], re.IGNORECASE) 196 | if type_search: 197 | document_type = re.sub("^", "", type_search.group(), re.IGNORECASE) 198 | document_type = re.sub(r"(-|/|\.)", "", 199 | document_type) # remove hyphens etc 200 | else: 201 | document_type = "document_TYPE_not_tagged" 202 | log_cache.append(('ERROR', 203 | "form not given in form?: " + 204 | filing_url)) 205 | local_path = os.path.join(self.storage_folder, 206 | company_description + '_' + \ 207 | filing_metadata.sec_cik + "_" + document_type + "_" + \ 208 | filing_metadata.sec_period_of_report) 209 | doc_metadata.document_type = document_type 210 | # doc_metadata.form_type_internal = form_string 211 | doc_metadata.document_group = document_group 212 | doc_metadata.metadata_file_name = local_path 213 | 214 | # search for a ... block in the DOCUMENT 215 | html_search = re.search(r".*?", 216 | doc_text, re.DOTALL | re.IGNORECASE) 217 | xbrl_search = re.search(r".*?", 218 | doc_text, re.DOTALL | re.IGNORECASE) 219 | # occasionally a (somewhat corrupted) filing includes a mixture 220 | # of HTML-format documents, but some of them are enclosed in 221 | # ... tags and others in ... tags. 222 | # If the first -enclosed document is before the first 223 | # enclosed one, then we take that one instead of 224 | # the block identified in html_search. 
225 | text_search = re.search(r".*?", 226 | doc_text, re.DOTALL | re.IGNORECASE) 227 | if text_search and html_search \ 228 | and text_search.start() < html_search.start() \ 229 | and html_search.start() > 5000: 230 | html_search = text_search 231 | if xbrl_search: 232 | doc_metadata.extraction_method = 'xbrl' 233 | doc_text = xbrl_search.group() 234 | main_path = local_path + ".xbrl" 235 | reader_class = HtmlDocument 236 | elif html_search: 237 | # if there's an html block inside the DOCUMENT then just 238 | # take this instead of the full DOCUMENT text 239 | doc_metadata.extraction_method = 'html' 240 | doc_text = html_search.group() 241 | main_path = local_path + ".htm" 242 | reader_class = HtmlDocument 243 | else: 244 | doc_metadata.extraction_method = 'txt' 245 | main_path = local_path + ".txt" 246 | reader_class = TextDocument 247 | doc_metadata.original_file_size = str(len(doc_text)) + ' chars' 248 | sections_log_items = reader_class( 249 | doc_metadata.original_file_name, 250 | doc_text, doc_metadata.extraction_method).\ 251 | get_excerpt(doc_text, document_group, 252 | doc_metadata, 253 | skip_existing_excerpts=False) 254 | log_cache = log_cache + sections_log_items 255 | if do_save_full_document: 256 | with open(main_path, "w") as filename: 257 | filename.write(doc_text) 258 | log_str = "Saved file: " + main_path + ', ' + \ 259 | str(round(os.path.getsize(main_path) / 1024)) + ' KB' 260 | log_cache.append(('DEBUG', log_str)) 261 | filing_metadata.original_file_name = main_path 262 | else: 263 | filing_metadata.original_file_name = \ 264 | "file was not saved locally" 265 | return(log_cache) 266 | 267 | 268 | 269 | -------------------------------------------------------------------------------- /src/html_document.py: -------------------------------------------------------------------------------- 1 | """ 2 | secedgartext: extract text from SEC corporate filings 3 | Copyright (C) 2017 Alexander Ions 4 | 5 | You should have received a copy of the GNU General 
Public License 6 | along with this program. If not, see . 7 | """ 8 | import re 9 | import time 10 | from statistics import median 11 | from bs4 import BeautifulSoup, NavigableString, Tag, Comment 12 | 13 | from .utils import logger 14 | from .document import Document 15 | 16 | USE_HTML2TEXT = False 17 | 18 | class HtmlDocument(Document): 19 | soup = None 20 | plaintext = None 21 | 22 | def __init__(self, *args, **kwargs): 23 | super(HtmlDocument, self).__init__(*args, **kwargs) 24 | 25 | def search_terms_type(self): 26 | return "html" 27 | 28 | def prepare_text(self): 29 | """Strip unwanted text and parse the HTML. 30 | 31 | Remove some unhelpful text from the HTML, and parse the HTML, 32 | initialising the 'soup' attribute 33 | """ 34 | html_text = self.doc_text 35 | # remove whitespace sometimes found inside tags, 36 | # which breaks the parser 37 | html_text = re.sub('<\s', '<', html_text) 38 | # remove small-caps formatting tags which can confuse later analysis 39 | html_text = re.sub('(|)', '', html_text, 40 | flags=re.IGNORECASE) 41 | # for simplistic no-tags HTML documents (example: T 10K 20031231), 42 | # make sure the section headers get treated as new blocks. 43 | html_text = re.sub(r'(\nITEM\s{1,10}[1-9])', r'
\1', html_text, 44 | flags=re.IGNORECASE) 45 | 46 | # we prefer to use lxml parser for speed, this requires separate 47 | # installation. Straightforward on Linux, somewhat tricky on Windows. 48 | # http://stackoverflow.com/questions/29440482/how-to-install-lxml-on-windows 49 | # ...note install the 32-bit version for Intel 64-bit? 50 | start_time = time.process_time() 51 | try: 52 | soup = BeautifulSoup(html_text, 'lxml') 53 | except: 54 | soup = BeautifulSoup(html_text, 'html.parser') # default parser 55 | parsing_time_elapsed = time.process_time() - start_time 56 | log_str = 'parsing time: ' + '% 3.2f' % \ 57 | (parsing_time_elapsed) + 's; ' + "{:,}". \ 58 | format(len(html_text)) + ' characters; ' + "{:,}". \ 59 | format(len(soup.find_all())) + ' HTML elements' 60 | self.log_cache.append(('DEBUG', log_str)) 61 | 62 | # for some old, simplistic documents lacking a proper HTML tree, 63 | # put in
tags artificially to help with parsing paragraphs, ensures 64 | # that section headers get properly identified 65 | if len(html_text) / len(soup.find_all()) > 500: 66 | html_text = re.sub(r'\n\n', r'
', html_text, 67 | flags=re.IGNORECASE) 68 | soup = BeautifulSoup(html_text, 'html.parser') 69 | 70 | # Remove numeric tables from soup 71 | tables_generator = (s for s in soup.find_all('table') if 72 | self.should_remove_table(s)) 73 | # debug: save the extracted tables to a text file 74 | # tables_debug_file = open(r'tables_deleted.txt', 'wt', encoding='latin1') 75 | for s in tables_generator: 76 | s.replace_with('[DATA_TABLE_REMOVED]') 77 | # tables_debug_file.write('#' * 80 + '\n') 78 | # tables_debug_file.write('\n'.join([x for x in s.text.splitlines() 79 | # if x.strip()]).encode('latin-1','replace').decode('latin-1')) 80 | # tables_debug_file.close() 81 | self.soup = soup 82 | 83 | if USE_HTML2TEXT: 84 | # option: use the HTML2TEXT library for paragraph splitting. 85 | # Purpose and performance is generally similar to the 86 | # home-made approach below 87 | import html2text 88 | h = html2text.HTML2Text(bodywidth=0) 89 | h.ignore_emphasis = True 90 | self.plaintext = h.handle(str(soup)) # use soup instead of the original html: it's faster and it benefits from the tables being excluded 91 | else: 92 | # paragraphs_analysis = [] 93 | # p_idx = 0 94 | # has_href = False 95 | # has_crossreference = False 96 | paragraph_string = '' 97 | document_string = '' 98 | all_paras = [] 99 | ec = soup.find() 100 | is_in_a_paragraph = True 101 | while not (ec is None): 102 | if is_line_break(ec) or ec.next_element is None: 103 | # end of paragraph tag (does not itself contain 104 | # Navigable String): insert double line-break for readability 105 | if is_in_a_paragraph: 106 | is_in_a_paragraph = False 107 | all_paras.append(paragraph_string) 108 | document_string = document_string + '\n\n' + paragraph_string 109 | else: 110 | # continuation of the current paragraph 111 | if isinstance(ec, NavigableString) and not \ 112 | isinstance(ec, Comment): 113 | # # remove redundant line breaks and other whitespace at the 114 | # # ends, and in the middle, of the string 115 | # ecs = 
re.sub(r'\s+', ' ', ec.string.strip()) 116 | ecs = re.sub(r'\s+', ' ', ec.string) 117 | if len(ecs) > 0: 118 | if not (is_in_a_paragraph): 119 | # set up for the start of a new paragraph 120 | is_in_a_paragraph = True 121 | paragraph_string = '' 122 | # paragraph_string = paragraph_string + ' ' + ecs 123 | paragraph_string = paragraph_string + ecs 124 | ec = ec.next_element 125 | # clean up multiple line-breaks 126 | document_string = re.sub('\n\s+\n', '\n\n', document_string) 127 | document_string = re.sub('\n{3,}', '\n\n', document_string) 128 | self.plaintext = document_string 129 | 130 | 131 | def extract_section(self, search_pairs): 132 | """ 133 | 134 | :param search_pairs: 135 | :return: 136 | """ 137 | start_text = 'na' 138 | end_text = 'na' 139 | warnings = [] 140 | text_extract = None 141 | for st_idx, st in enumerate(search_pairs): 142 | # ungreedy search (note '.*?' regex expression between 'start' and 'end' patterns 143 | # also using (?:abc|def) for a non-capturing group 144 | # also an extra pair of parentheses around the whole expression, 145 | # so that we always return just one object, not a tuple of groups 146 | # st = super().search_terms_pattern_to_regex() 147 | # st = Reader.search_terms_pattern_to_regex(st) 148 | item_search = re.findall(st['start']+'.*?'+ st['end'], 149 | self.plaintext, 150 | re.DOTALL | re.IGNORECASE) 151 | # item_search = re.findall('(' + st['start']+'.*?'+ st['end']+')', 152 | # self.plaintext, 153 | # re.DOTALL | re.IGNORECASE) 154 | if item_search: 155 | longest_text_length = 0 156 | for s in item_search: 157 | if isinstance(s, tuple): 158 | # If incorrect use of multiple regex groups has caused 159 | # more than one match, then s is returned as a tuple 160 | self.log_cache.append(('ERROR', 161 | "Groups found in Regex, please correct")) 162 | if len(s) > longest_text_length: 163 | text_extract = s.strip() 164 | longest_text_length = len(s) 165 | # final_text_new = re.sub('^\n*', '', final_text_new) 166 | 
final_text_lines = text_extract.split('\n') 167 | start_text = final_text_lines[0] 168 | end_text = final_text_lines[-1] 169 | break 170 | extraction_summary = self.extraction_method + '_document' 171 | if not text_extract: 172 | warnings.append('Extraction did not work for HTML file') 173 | extraction_summary = self.extraction_method + '_document: failed' 174 | else: 175 | text_extract = re.sub('\n\s{,5}Table of Contents\n', '', 176 | text_extract, flags=re.IGNORECASE) 177 | 178 | return text_extract, extraction_summary, start_text, end_text, warnings 179 | 180 | 181 | def should_remove_table(self, html): 182 | """Decide whether html contains a mostly-numeric table. 183 | 184 | Identify text in table element 'html' which cannot (realistically) be 185 | subject to downstream text analysis. Note there is a risk that we 186 | inadvertently remove any Section headings that are inside
elements 187 | We reduce this risk by only seeking tables with more than 5 (nonblank) 188 | elements, the median length of which is fewer than 30 characters 189 | """ 190 | char_counts = [] 191 | if html.stripped_strings: 192 | for t in html.stripped_strings: 193 | if len(t) > 0: 194 | char_counts.append(len(t)) 195 | return len(char_counts) > 5 and median(char_counts) < 30 196 | else: 197 | self.log_cache.append(('ERROR', 198 | "the should_remove_table function is broken")) 199 | 200 | 201 | 202 | def is_line_break(e): 203 | """Is e likely to function as a line break when document is rendered? 204 | 205 | we are including 'HTML block-level elements' here. Note <p> ('paragraph') 206 | and other tags may not necessarily force the appearance of a 'line break', 207 | on the page if they are enclosed inside other elements, notably a 208 | table cell 209 | """ 210 | 211 | 212 | is_block_tag = e.name is not None and e.name in ['p', 'div', 'br', 'hr', 'tr', 213 | 'table', 'form', 'h1', 'h2', 214 | 'h3', 'h4', 'h5', 'h6'] 215 | # handle block tags inside tables: if the apparent block formatting is 216 | # enclosed in a table cell <td> tags, and if there are no other block 217 | # elements within the cell (it's a singleton), then it will not 218 | # necessarily appear on a new line so we don't treat it as a line break 219 | if is_block_tag and e.parent.name == 'td': 220 | if len(e.parent.findChildren(name=e.name)) == 1: 221 | is_block_tag = False 222 | # inspect the style attribute of element e (if any) to see if it has 223 | # block style, which will appear as a line break in the document 224 | if hasattr(e, 'attrs') and 'style' in e.attrs: 225 | is_block_style = re.search('margin-(top|bottom)', e['style']) 226 | else: 227 | is_block_style = False 228 | return is_block_tag or is_block_style 229 | 230 | -------------------------------------------------------------------------------- /src/metadata.py: -------------------------------------------------------------------------------- 1 | """ 2 | secedgartext: extract text from SEC corporate filings 3 | Copyright (C) 2017 Alexander Ions 4 | 5 | You should have received a copy of the GNU General Public License 6 | along with this program. If not, see . 
7 | """ 8 | import json 9 | import re 10 | from bs4 import BeautifulSoup, Tag, NavigableString 11 | import time 12 | import random 13 | 14 | from .utils import logger 15 | from .utils import args, requests_get 16 | from .utils import batch_number, batch_start_time, batch_machine_id 17 | from .utils import sql_cursor, sql_connection 18 | 19 | 20 | class Metadata(object): 21 | def __init__(self, index_url=None): 22 | self.sec_cik = '' 23 | self.sec_company_name = '' 24 | self.document_type = '' 25 | self.sec_form_header = '' 26 | self.sec_period_of_report = '' 27 | self.sec_filing_date = '' 28 | self.sec_changed_date = '' 29 | self.sec_accepted_date = '' 30 | self.sec_index_url = '' 31 | self.sec_url = '' 32 | self.metadata_file_name = '' 33 | self.original_file_name = '' 34 | self.original_file_size = '' 35 | self.document_group = '' 36 | self.section_name = '' 37 | self.section_n_characters = None 38 | self.endpoints = [] 39 | self.extraction_method = '' 40 | self.warnings = [] 41 | self.company_description = '' 42 | self.output_file = None 43 | self.time_elapsed = None 44 | self.batch_number = batch_number 45 | self.batch_signature = args.batch_signature or '' 46 | self.batch_start_time = str(batch_start_time) 47 | self.batch_machine_id = batch_machine_id 48 | self.section_end_time = None 49 | 50 | if index_url: 51 | index_metadata = {} 52 | attempts = 0 53 | while attempts < 5: 54 | try: 55 | ri = requests_get(index_url) 56 | logger.info('Status Code: ' + str(ri.status_code)) 57 | soup = BeautifulSoup(ri.text, 'html.parser') 58 | # Parse the page to find metadata 59 | form_type = soup.find('div', {'id': 'formHeader'}). 
\ 60 | find_next('strong').string.strip() 61 | break 62 | except: 63 | attempts += 1 64 | logger.warning('No valid index page, attempt %i: %s' 65 | % (attempts, index_url)) 66 | time.sleep(attempts*10 + random.randint(1,5)) 67 | 68 | index_metadata['formHeader'] = form_type 69 | infoheads = soup.find_all('div', class_='infoHead') 70 | for i in infoheads: 71 | j = i.next_element 72 | while not (isinstance(j, Tag)) or not ('info') in \ 73 | j.attrs['class']: 74 | j = j.next_element 75 | # remove colons, spaces, hyphens from dates/times 76 | if type(j.string) is NavigableString: 77 | index_metadata[i.string] = re.sub('[: -]', '', 78 | j.string).strip() 79 | i = soup.find('span', class_='companyName') 80 | while not (isinstance(i, NavigableString)): 81 | i = i.next_element 82 | index_metadata['companyName'] = i.strip() 83 | i = soup.find(string='CIK') 84 | while not (isinstance(i, NavigableString)) or not (re.search('\d{10}', i.string)): 85 | i = i.next_element 86 | index_metadata['CIK'] = re.search('\d{5,}', i).group() 87 | 88 | for pair in [['Period of Report', 'sec_period_of_report'], 89 | ['Filing Date', 'sec_filing_date'], 90 | ['Filing Date Changed', 'sec_changed_date'], 91 | ['Accepted', 'sec_accepted_date'], 92 | ['formHeader', 'sec_form_header'], 93 | ['companyName', 'sec_company_name'], 94 | ['CIK', 'sec_cik']]: 95 | if pair[0] in index_metadata: 96 | setattr(self, pair[1], index_metadata[pair[0]]) 97 | 98 | def add_data_from_filing_text(self, text): 99 | """Scrape metadata from the filing document 100 | 101 | Find key metadata fields at the start of the filing submission, 102 | if they were not already found in the SEC index page 103 | :param text: full text of the filing 104 | """ 105 | for pair in [['CONFORMED PERIOD OF REPORT:', 'sec_period_of_report'], 106 | ['FILED AS OF DATE:', 'sec_filing_date'], 107 | ['DATE AS OF CHANGE:', 'sec_changed_date'], 108 | ['', 'sec_accepted_date'], 109 | ['COMPANY CONFORMED NAME:', 'sec_company_name'], 110 | ['CENTRAL 
INDEX KEY:', 'sec_cik']]: 111 | srch = re.search('(?<=' + pair[0] + ').*', text) 112 | if srch and not getattr(self, pair[1]): 113 | setattr(self, pair[1], srch.group().strip()) 114 | 115 | def save_to_json(self, file_path): 116 | """ 117 | we effectively convert the Metadata object's data into a dict 118 | when we do json.dumps on it 119 | :param file_path: 120 | :return: 121 | """ 122 | with open(file_path, 'w', encoding='utf-8') as json_output: 123 | # to write the backslashes in the JSON file legibly 124 | # (without duplicate backslashes), we have to 125 | # encode/decode using the 'unicode_escape' codec. This then 126 | # allows us to open the JSON file and click on the file link, 127 | # for immediate viewing in a browser. 128 | excerpt_as_json = json.dumps(self, default=lambda o: o.__dict__, 129 | sort_keys=False, indent=4) 130 | json_output.write(bytes(excerpt_as_json, "utf-8"). 131 | decode("unicode_escape")) 132 | 133 | 134 | def save_to_db(self): 135 | """Append metadata to sqlite database 136 | 137 | """ 138 | 139 | # conn = sqlite3.connect(path.join(args.storage, 'metadata.sqlite3')) 140 | # c = conn.cursor() 141 | sql_insert = """INSERT INTO metadata ( 142 | batch_number, 143 | batch_signature, 144 | batch_start_time, 145 | batch_machine_id, 146 | sec_cik, 147 | company_description, 148 | sec_company_name, 149 | sec_form_header, 150 | sec_period_of_report, 151 | sec_filing_date, 152 | sec_index_url, 153 | sec_url, 154 | metadata_file_name, 155 | document_group, 156 | section_name, 157 | section_n_characters, 158 | section_end_time, 159 | extraction_method, 160 | output_file, 161 | start_line, 162 | end_line, 163 | time_elapsed) VALUES 164 | """ + "('" + "', '".join([str(self.batch_number), 165 | str(self.batch_signature), 166 | str(self.batch_start_time)[:-3], # take only 3dp microseconds 167 | self.batch_machine_id, 168 | self.sec_cik, 169 | re.sub("[\'\"]","", self.company_description).strip(), 170 | re.sub("[\'\"]","", 
self.sec_company_name).strip(), 171 | self.sec_form_header, self.sec_period_of_report, 172 | self.sec_filing_date, 173 | self.sec_index_url, self.sec_url, 174 | self.metadata_file_name, self.document_group, 175 | self.section_name, str(self.section_n_characters), 176 | str(self.section_end_time)[:-3], 177 | self.extraction_method, 178 | str(self.output_file), 179 | re.sub("[\'\"]","", self.endpoints[0]).strip()[0:200], 180 | re.sub("[\'\"]","", self.endpoints[1]).strip()[0:200], 181 | str(self.time_elapsed)]) + "')" 182 | sql_insert = sql_insert.replace("'None'","NULL") 183 | sql_cursor.execute(sql_insert) 184 | sql_connection.commit() 185 | 186 | 187 | def load_from_json(file_path): 188 | metadata = Metadata() 189 | with open(file_path, 'r') as json_file: 190 | try: 191 | # data = json.loads(data_file.read().replace('\\', '\\\\'), strict=False) 192 | data = json.loads(json_file.read()) 193 | metadata.sec_cik = data['sec_cik'] 194 | metadata.sec_company_name = data['sec_company_name'] 195 | metadata.company_description = data['company_description'] 196 | metadata.document_type = data['document_type'] 197 | metadata.sec_form_header = data['sec_form_header'] 198 | metadata.sec_period_of_report = data['sec_period_of_report'] 199 | metadata.sec_filing_date = data['sec_filing_date'] 200 | metadata.sec_changed_date = data['sec_changed_date'] 201 | metadata.sec_accepted_date = data['sec_accepted_date'] 202 | metadata.sec_index_url = data['sec_index_url'] 203 | metadata.sec_url = data['sec_url'] 204 | metadata.metadata_file_name = data['metadata_file_name'] 205 | metadata.original_file_name = data['original_file_name'] 206 | metadata.original_file_size = data['original_file_size'] 207 | metadata.document_group = data['document_group'] 208 | metadata.section_name = data['section_name'] 209 | metadata.section_n_characters = data['section_n_characters'] 210 | metadata.endpoints = data['endpoints'] 211 | metadata.extraction_method = data['extraction_method'] 212 | 
metadata.warnings = data['warnings'] 213 | metadata.output_file = data['output_file'] 214 | metadata.time_elapsed = data['time_elapsed'] 215 | metadata.batch_number = data['batch_number'] 216 | metadata.batch_signature = data['batch_signature'] 217 | metadata.batch_start_time = data['batch_start_time'] 218 | metadata.batch_machine_id = data['batch_machine_id'] 219 | metadata.section_end_time = data['section_end_time'] 220 | 221 | except: 222 | logger.info('Could not load corrupted JSON file: ' + file_path) 223 | 224 | return metadata 225 | 226 | -------------------------------------------------------------------------------- /src/text_document.py: -------------------------------------------------------------------------------- 1 | """ 2 | secedgartext: extract text from SEC corporate filings 3 | Copyright (C) 2017 Alexander Ions 4 | 5 | You should have received a copy of the GNU General Public License 6 | along with this program. If not, see . 7 | """ 8 | import re 9 | 10 | from .document import Document 11 | 12 | 13 | class TextDocument(Document): 14 | def __init__(self, *args, **kwargs): 15 | super(TextDocument, self).__init__(*args, **kwargs) 16 | 17 | def search_terms_type(self): 18 | return "txt" 19 | 20 | def extract_section(self, search_pairs): 21 | """ 22 | 23 | :param search_pairs: 24 | :return: 25 | """ 26 | start_text = 'na' 27 | end_text = 'na' 28 | warnings = [] 29 | text_extract = None 30 | for st_idx, st in enumerate(search_pairs): 31 | # ungreedy search (note '.*?' regex expression between 'start' and 'end' patterns 32 | # also using (?:abc|def) for a non-capturing group 33 | # st = super().search_terms_pattern_to_regex() 34 | # st = Reader.search_terms_pattern_to_regex(st) 35 | item_search = re.findall(st['start'] + '.*?' 
+ st['end'], 36 | self.doc_text, 37 | re.DOTALL | re.IGNORECASE) 38 | if item_search: 39 | longest_text_length = 0 40 | for s in item_search: 41 | if len(s) > longest_text_length: 42 | text_extract = s.strip() 43 | longest_text_length = len(s) 44 | # final_text_new = re.sub('^\n*', '', final_text_new) 45 | final_text_lines = text_extract.split('\n') 46 | start_text = final_text_lines[0] 47 | end_text = final_text_lines[-1] 48 | break 49 | if text_extract: 50 | # final_text = '\n'.join(final_text_lines) 51 | # text_extract = remove_table_lines(final_text) 52 | text_extract = remove_table_lines(text_extract) 53 | extraction_summary = self.extraction_method + '_document' 54 | else: 55 | warnings.append('Extraction did not work for text file') 56 | extraction_summary = self.extraction_method + '_document: failed' 57 | return text_extract, extraction_summary, start_text, end_text, warnings 58 | 59 | def remove_table_lines(input_text): 60 | """Replace lines believed to be part of numeric tables with a placeholder. 
61 | 62 | :param input_text: 63 | :return: 64 | """ 65 | text_lines = [] 66 | table_lines = [] 67 | post_table_lines = [] 68 | is_in_a_table = False 69 | is_in_a_post_table = False 70 | all_lines = input_text.splitlines(True) 71 | for i, line in enumerate(all_lines, 0): 72 | if is_table_line(line): 73 | # a table line, possibly not part of an excerpt 74 | if is_in_a_post_table: 75 | # table resumes: put the inter-table lines into the table_line list 76 | table_lines = table_lines + post_table_lines 77 | post_table_lines = [] 78 | is_in_a_post_table = False 79 | table_lines.append(line) 80 | is_in_a_table = True 81 | else: 82 | # not a table line 83 | if is_in_a_table: 84 | # the first post-table line 85 | is_in_a_table = False 86 | is_in_a_post_table = True 87 | post_table_lines.append(line) 88 | elif is_in_a_post_table: 89 | # 2nd and subsequent post-table lines, or final line 90 | if len(post_table_lines) >= 4: 91 | # sufficient post-table lines have accumulated now that we 92 | # revert to standard 'not a post table' mode. 
93 | # We append the post-table lines to the text_lines, 94 | # and we discard the table_lines 95 | if len(table_lines) >= 3: 96 | text_lines.append( 97 | '[DATA_TABLE_REMOVED_' + 98 | str(len(table_lines)) + '_LINES]\n\n') 99 | else: 100 | # very short table, so we just leave it in 101 | # the document regardless 102 | text_lines = text_lines + table_lines 103 | text_lines = text_lines + post_table_lines 104 | table_lines = [] 105 | post_table_lines = [] 106 | is_in_a_post_table = False 107 | else: 108 | post_table_lines.append(line) 109 | if not (is_in_a_table) and not (is_in_a_post_table): 110 | # normal excerpt line: just append it to text_lines 111 | text_lines.append(line) 112 | # Tidy up any outstanding table_lines and post_table_lines at the end 113 | if len(table_lines) >= 3: 114 | text_lines.append( 115 | '[DATA_TABLE_REMOVED_' + str(len(table_lines)) + '_LINES]\n\n') 116 | else: 117 | text_lines = text_lines + table_lines 118 | text_lines = text_lines + post_table_lines 119 | 120 | final_text = ''.join(text_lines) 121 | return final_text 122 | 123 | 124 | def is_table_line(s): 125 | """Is text line string s likely to be part of a numeric table? 126 | 127 | gaps between table 'cells' are expected to have three or more whitespaces, 128 | and table rows are expected to have at least 3 such gaps, i.e. 4 columns 129 | 130 | :param s: 131 | :return: 132 | """ 133 | s = s.replace('\t', ' ') 134 | rs = re.findall('\S\s{3,}', s) # \S = non-whitespace, \s = whitespace 135 | r = re.search('(|(-|=|_){5,})', s) # check for TABLE quasi-HTML tag, 136 | # or use of lots of punctuation marks as table gridlines 137 | # Previously also looking for ^\s{10,}[a-zA-z] "lots of spaces prior to 138 | # the first (non-numeric i.e. not just a page number marker) character". 
    # Not using this approach because risk of confusion with centre-justified
    # section headings in certain text documents
    return len(rs) >= 2 or r is not None

--------------------------------------------------------------------------------
/src/utils.py:
--------------------------------------------------------------------------------
"""
    secedgartext: extract text from SEC corporate filings
    Copyright (C) 2017 Alexander Ions

    You should have received a copy of the GNU General Public License
    along with this program. If not, see <http://www.gnu.org/licenses/>.
"""

import logging
import os
import sys
import shutil
import argparse
import re
from os import path
import socket
import time
import datetime
import json
import sqlite3
import multiprocessing as mp
from copy import copy


"""Parse the command line arguments
"""
companies_file_location = ''
single_company = ''
project_dir = path.dirname(path.dirname(__file__))
parser = argparse.ArgumentParser()
parser.add_argument('--storage', help='Specify path to storage location')
parser.add_argument('--write_sql', default=True, help='Save metadata to sqlite database? (Boolean)')
parser.add_argument('--company', help='CIK code specifying company for single-company download')
parser.add_argument('--companies_list', help='path of text file with all company CIK codes to download')
parser.add_argument('--filings', help='comma-separated list of SEC filings of interest (10-Q,10-K...)')
parser.add_argument('--documents')
parser.add_argument('--start', help='document start date passed to EDGAR web interface')
parser.add_argument('--end', help='document end date passed to EDGAR web interface')
parser.add_argument('--report_period', help='search pattern for company report dates, e.g. 2012, 201206 etc.')
parser.add_argument('--batch_signature')
parser.add_argument('--start_company', help='index number of first company to download from the companies_list file')
parser.add_argument('--end_company', help='index number of last company to download from the companies_list file')
parser.add_argument('--traffic_limit_pause_ms', help='time to pause between download attempts, to avoid overloading EDGAR server')
parser.add_argument('--multiprocessing_cores', help='number of processor cores to use')
args = parser.parse_args()

if args.storage:
    if not path.isabs(args.storage):
        args.storage = path.join(project_dir, args.storage)
else:
    args.storage = path.join(project_dir, 'output_files_examples')

# command-line values arrive as strings, so treat 'False', '0' and 'no' as
# False; anything else (including the default) leaves SQL output enabled
args.write_sql = str(args.write_sql).strip().lower() not in ('false', '0', 'no')
if args.company:
    single_company = args.company
else:
    if args.companies_list:
        companies_file_location = os.path.join(project_dir, args.companies_list)
    else:
        companies_file_location = os.path.join(project_dir, 'companies_list.txt')

args.filings = (args.filings or
                input('Enter filings search text (default: 10-K,10-Q): ') or
                '10-K,10-Q')
args.filings = re.split(',', args.filings)  # ['10-K','10-Q']

# annual filings need a year-long search window, quarterly filings a quarter
if '10-K' in args.filings:
    search_window_days = 365
else:
    search_window_days = 91
ccyymmdd_default_start = (datetime.datetime.now() - datetime.timedelta(
    days=search_window_days)).strftime('%Y%m%d')
args.start = int(args.start or
                 input('Enter start date for filings search (default: ' +
                       ccyymmdd_default_start + '): ') or
                 ccyymmdd_default_start)
ccyymmdd_default_end = (datetime.datetime.strptime(str(args.start), '%Y%m%d') +
                        datetime.timedelta(days=search_window_days)).strftime('%Y%m%d')
args.end = int(args.end or
               input('Enter end date for filings search (default: ' +
                     ccyymmdd_default_end + '): ') or
               ccyymmdd_default_end)
if str(args.report_period).lower() == 'all':
    date_search_string = '.*'
else:
    date_search_string = str(
        args.report_period or
        input('Enter filing report period ccyy, ccyymm etc. (default: all periods): ') or
        '.*')


"""Set up the metadata database
"""
batch_start_time = datetime.datetime.utcnow()
batch_machine_id = socket.gethostname()

if args.write_sql:
    db_location = path.join(args.storage, 'metadata.sqlite3')
    sql_connection = sqlite3.connect(db_location)
    sql_cursor = sql_connection.cursor()
    sql_cursor.execute("""
        CREATE TABLE IF NOT EXISTS metadata (
        id integer PRIMARY KEY,
        batch_number integer NOT NULL,
        batch_signature text NOT NULL,
        batch_start_time datetime NOT NULL,
        batch_machine_id text,
        sec_cik text NOT NULL,
        company_description text,
        sec_company_name text,
        sec_form_header text,
        sec_period_of_report integer,
        sec_filing_date integer,
        sec_index_url text,
        sec_url text,
        metadata_file_name text,
        document_group text,
        section_name text,
        section_n_characters integer,
        section_end_time datetime,
        extraction_method text,
        output_file text,
        start_line text,
        end_line text,
        time_elapsed real)
        """)
    sql_connection.commit()
    query_result = sql_cursor.execute('SELECT max(batch_number) FROM metadata').fetchone()
    if query_result and query_result[0]:
        batch_number = query_result[0] + 1
    else:
        batch_number = 1
    # put a dummy line into the metadata table to 'reserve' a batch number:
    # prevents other processes running in parallel from taking the same batch_number
    sql_cursor.execute("""
        insert into metadata (batch_number, batch_signature,
        batch_start_time, sec_cik) values
        """ + " ('" + "', '".join([str(batch_number),
                                   str(args.batch_signature or ''),
                                   str(batch_start_time)[:-3],  # take only 3dp microseconds
                                   'dummy_cik_code']) + "')")
    sql_connection.commit()
else:
    batch_number = 0


"""Set up numbered storage sub-directory for the current batch run
"""
storage_toplevel_directory = os.path.join(args.storage,
                                          'batch_' +
                                          format(batch_number, '04d'))

# (re-)make the storage directory for the current batch. This will delete
# any contents that might be left over from earlier runs, thus avoiding
# any potential duplication/overlap/confusion
if os.path.exists(storage_toplevel_directory):
    shutil.rmtree(storage_toplevel_directory)
os.makedirs(storage_toplevel_directory)


"""Set up logging
"""
# log_file_name = 'sec_extractor_{0}.log'.format(ts)
log_file_name = 'secedgartext_batch_%s.log' % format(batch_number, '04d')
log_path = path.join(args.storage, log_file_name)

logger = logging.getLogger('text_analysis')
# # set up the logger if it hasn't already been set up earlier in the execution run
logger.setLevel(logging.DEBUG)  # we have to initialise this top-level setting otherwise everything defaults to logging.WARN level
formatter = logging.Formatter('%(asctime)s %(levelname)s %(message)s',
                              '%Y%m%d %H:%M:%S')

file_handler = logging.FileHandler(log_path)
file_handler.setFormatter(formatter)
file_handler.setLevel(logging.DEBUG)
file_handler.set_name('my_file_handler')
logger.addHandler(file_handler)

console_handler = logging.StreamHandler()
console_handler.setFormatter(formatter)
console_handler.setLevel(logging.DEBUG)
console_handler.set_name('my_console_handler')
logger.addHandler(console_handler)


ts = time.time()
logger.info('=' * 65)
logger.info('Analysis started at {0}'.
            format(datetime.datetime.fromtimestamp(ts).
                   strftime('%Y%m%d %H:%M:%S')))
logger.info('Command line:\t{0}'.format(sys.argv[0]))
logger.info('Arguments:\t\t{0}'.format(' '.join(sys.argv[:])))
logger.info('=' * 65)

if args.write_sql:
    logger.info('Opened SQL connection: %s', db_location)


if not args.traffic_limit_pause_ms:
    # default pause after HTTP request: zero milliseconds
    args.traffic_limit_pause_ms = 0
else:
    args.traffic_limit_pause_ms = int(args.traffic_limit_pause_ms)
logger.info('Traffic Limit Pause (ms): %s' %
            str(args.traffic_limit_pause_ms))


if args.multiprocessing_cores:
    # never request more than (available cores - 1)
    args.multiprocessing_cores = min(mp.cpu_count()-1,
                                     int(args.multiprocessing_cores))
else:
    args.multiprocessing_cores = 0


"""Create search_terms_regex, which stores the patterns that we
use for identifying sections in each of the EDGAR document types
"""
with open(path.join(project_dir, 'document_group_section_search.json'), 'r') as f:
    json_text = f.read()
    search_terms = json.loads(json_text)
    if not search_terms:
        logger.error('Search terms file is missing or corrupted: ' +
                     f.name)
# note: copy() is shallow, so the nested entries rewritten below are shared
# with search_terms
search_terms_regex = copy(search_terms)
for filing in search_terms:
    for idx, section in enumerate(search_terms[filing]):
        # loop variable is named doc_format to avoid shadowing the built-in format()
        for doc_format in ['txt', 'html']:
            for idx2, pattern in enumerate(search_terms[filing][idx][doc_format]):
                for startend in ['start', 'end']:
                    regex_string = search_terms[filing][idx][doc_format] \
                        [idx2][startend]
                    # '_' in a search term stands for up to 5 whitespace characters
                    regex_string = regex_string.replace('_', '\\s{,5}')
                    regex_string = regex_string.replace('\n', '\\n')
                    search_terms_regex[filing][idx][doc_format] \
                        [idx2][startend] = regex_string
"""identify which 'document' types are to be downloaded.
If no command line
argument given, then default to all of the document types listed in the
JSON file"""
args.documents = args.documents or ','.join(list(search_terms.keys()))
args.documents = re.split(',', args.documents)  # ['10-K','10-Q']


def requests_get(url, params=None):
    """retrieve text via url, fatal error if no internet connection available
    :param url: source url
    :param params: optional dict of query-string parameters
    :return: the requests response object (its .text holds the retrieved text)
    """
    import random
    import requests
    retries = 0
    success = False
    hdr = {'user-agent': 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.107 Mobile Safari/537.36'}
    while (not success) and (retries <= 20):
        # wait for an increasingly long time (up to a day) in case internet
        # connection is broken. Gives enough time to fix connection or SEC site
        try:
            # to test the timeout functionality, try loading this page:
            # http://httpstat.us/200?sleep=20000  (20 seconds delay before page loads)
            r = requests.get(url, headers=hdr, params=params, timeout=10)
            success = True
            # facility to add a pause to respect SEC EDGAR traffic limit
            # https://www.sec.gov/privacy.htm#security
            time.sleep(args.traffic_limit_pause_ms/1000)
        except requests.exceptions.RequestException as e:
            # cubic back-off plus a few seconds of random jitter
            # (** is exponentiation; ^ would be bitwise XOR)
            wait = (retries ** 3) * 20 + random.randint(1, 5)
            logger.warning(e)
            logger.info('URL: %s' % url)
            logger.info(
                'Waiting %s secs and re-trying...' % wait)
            time.sleep(wait)
            retries += 1
            # in practice this check ends the run before the loop limit is reached
            if retries > 10:
                logger.error('Download repeatedly failed: %s',
                             url)
                sys.exit('Download repeatedly failed: %s' %
                         url)
    return r

--------------------------------------------------------------------------------
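
The retry loop in `requests_get` waits longer after each failed download. The block below is a minimal sketch of such a schedule, assuming the intent is cubic growth (`retries ** 3`) with a 20-second base and 1 to 5 seconds of random jitter. The helper names `backoff_wait_seconds` and `backoff_schedule` are hypothetical and are not part of the project. Note that in Python `**` is exponentiation, while `^` is bitwise XOR and does not grow steadily.

```python
import random


def backoff_wait_seconds(retries, base=20, max_jitter=5, rng=random):
    """Seconds to wait before the next attempt (hypothetical helper).

    retries is the number of failures so far, counted from zero.
    """
    # cubic growth plus a small random jitter so that parallel
    # processes do not all retry at the same moment
    return (retries ** 3) * base + rng.randint(1, max_jitter)


def backoff_schedule(max_retries=10, base=20):
    """Jitter-free part of the schedule, one entry per failed attempt."""
    return [(r ** 3) * base for r in range(max_retries + 1)]


if __name__ == '__main__':
    schedule = backoff_schedule()
    print(schedule)
    print('total wait: %.1f hours' % (sum(schedule) / 3600))
    # XOR, for comparison: the values jump around instead of growing
    print([r ^ 3 for r in range(6)])
```

With these assumed defaults, the jitter-free waits over eleven failed attempts add up to 60500 seconds, a little under 17 hours, which fits the "up to a day" comment in the retry loop.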