├── .qrignore
├── LICENSE.txt
├── README.md
└── fundamental_factors
    ├── Introduction.ipynb
    ├── Lesson01-Data-Collection.ipynb
    ├── Lesson02-Base-Universe.ipynb
    ├── Lesson03-Basic-Usage.ipynb
    ├── Lesson04-Periodic-Computations.ipynb
    ├── Lesson05-Exploratory-Data-Analysis.ipynb
    ├── Lesson06-Profitability.ipynb
    ├── Lesson07-Profitability-Growth.ipynb
    ├── Lesson08-Factor-Values-vs-Factor-Ranks.ipynb
    ├── Lesson09-Sector-Neutralization.ipynb
    ├── Lesson10-Macro-Analysis.ipynb
    ├── Lesson11-Altman-Z-Score.ipynb
    ├── Lesson12-Multi-Factor-Scores.ipynb
    ├── __init__.py
    └── universe.py
/.qrignore:
--------------------------------------------------------------------------------
1 | README.md
2 | LICENSE.txt
3 |
--------------------------------------------------------------------------------
/LICENSE.txt:
--------------------------------------------------------------------------------
1 | Apache License
2 | Version 2.0, January 2004
3 | http://www.apache.org/licenses/
4 |
5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
6 |
7 | 1. Definitions.
8 |
9 | "License" shall mean the terms and conditions for use, reproduction, and distribution as defined by Sections 1 through 9 of this document.
10 |
11 | "Licensor" shall mean the copyright owner or entity authorized by the copyright owner that is granting the License.
12 |
13 | "Legal Entity" shall mean the union of the acting entity and all other entities that control, are controlled by, or are under common control with that entity. For the purposes of this definition, "control" means (i) the power, direct or indirect, to cause the direction or management of such entity, whether by contract or otherwise, or (ii) ownership of fifty percent (50%) or more of the outstanding shares, or (iii) beneficial ownership of such entity.
14 |
15 | "You" (or "Your") shall mean an individual or Legal Entity exercising permissions granted by this License.
16 |
17 | "Source" form shall mean the preferred form for making modifications, including but not limited to software source code, documentation source, and configuration files.
18 |
19 | "Object" form shall mean any form resulting from mechanical transformation or translation of a Source form, including but not limited to compiled object code, generated documentation, and conversions to other media types.
20 |
21 | "Work" shall mean the work of authorship, whether in Source or Object form, made available under the License, as indicated by a copyright notice that is included in or attached to the work (an example is provided in the Appendix below).
22 |
23 | "Derivative Works" shall mean any work, whether in Source or Object form, that is based on (or derived from) the Work and for which the editorial revisions, annotations, elaborations, or other modifications represent, as a whole, an original work of authorship. For the purposes of this License, Derivative Works shall not include works that remain separable from, or merely link (or bind by name) to the interfaces of, the Work and Derivative Works thereof.
24 |
25 | "Contribution" shall mean any work of authorship, including the original version of the Work and any modifications or additions to that Work or Derivative Works thereof, that is intentionally submitted to Licensor for inclusion in the Work by the copyright owner or by an individual or Legal Entity authorized to submit on behalf of the copyright owner. For the purposes of this definition, "submitted" means any form of electronic, verbal, or written communication sent to the Licensor or its representatives, including but not limited to communication on electronic mailing lists, source code control systems, and issue tracking systems that are managed by, or on behalf of, the Licensor for the purpose of discussing and improving the Work, but excluding communication that is conspicuously marked or otherwise designated in writing by the copyright owner as "Not a Contribution."
26 |
27 | "Contributor" shall mean Licensor and any individual or Legal Entity on behalf of whom a Contribution has been received by Licensor and subsequently incorporated within the Work.
28 |
29 | 2. Grant of Copyright License.
30 |
31 | Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare Derivative Works of, publicly display, publicly perform, sublicense, and distribute the Work and such Derivative Works in Source or Object form.
32 |
33 | 3. Grant of Patent License.
34 |
35 | Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except as stated in this section) patent license to make, have made, use, offer to sell, sell, import, and otherwise transfer the Work, where such license applies only to those patent claims licensable by such Contributor that are necessarily infringed by their Contribution(s) alone or by combination of their Contribution(s) with the Work to which such Contribution(s) was submitted. If You institute patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Work or a Contribution incorporated within the Work constitutes direct or contributory patent infringement, then any patent licenses granted to You under this License for that Work shall terminate as of the date such litigation is filed.
36 |
37 | 4. Redistribution.
38 |
39 | You may reproduce and distribute copies of the Work or Derivative Works thereof in any medium, with or without modifications, and in Source or Object form, provided that You meet the following conditions:
40 |
41 | You must give any other recipients of the Work or Derivative Works a copy of this License; and
42 | You must cause any modified files to carry prominent notices stating that You changed the files; and
43 | You must retain, in the Source form of any Derivative Works that You distribute, all copyright, patent, trademark, and attribution notices from the Source form of the Work, excluding those notices that do not pertain to any part of the Derivative Works; and
44 | If the Work includes a "NOTICE" text file as part of its distribution, then any Derivative Works that You distribute must include a readable copy of the attribution notices contained within such NOTICE file, excluding those notices that do not pertain to any part of the Derivative Works, in at least one of the following places: within a NOTICE text file distributed as part of the Derivative Works; within the Source form or documentation, if provided along with the Derivative Works; or, within a display generated by the Derivative Works, if and wherever such third-party notices normally appear. The contents of the NOTICE file are for informational purposes only and do not modify the License. You may add Your own attribution notices within Derivative Works that You distribute, alongside or as an addendum to the NOTICE text from the Work, provided that such additional attribution notices cannot be construed as modifying the License.
45 | You may add Your own copyright statement to Your modifications and may provide additional or different license terms and conditions for use, reproduction, or distribution of Your modifications, or for any such Derivative Works as a whole, provided Your use, reproduction, and distribution of the Work otherwise complies with the conditions stated in this License.
46 |
47 | 5. Submission of Contributions.
48 |
49 | Unless You explicitly state otherwise, any Contribution intentionally submitted for inclusion in the Work by You to the Licensor shall be under the terms and conditions of this License, without any additional terms or conditions. Notwithstanding the above, nothing herein shall supersede or modify the terms of any separate license agreement you may have executed with Licensor regarding such Contributions.
50 |
51 | 6. Trademarks.
52 |
53 | This License does not grant permission to use the trade names, trademarks, service marks, or product names of the Licensor, except as required for reasonable and customary use in describing the origin of the Work and reproducing the content of the NOTICE file.
54 |
55 | 7. Disclaimer of Warranty.
56 |
57 | Unless required by applicable law or agreed to in writing, Licensor provides the Work (and each Contributor provides its Contributions) on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are solely responsible for determining the appropriateness of using or redistributing the Work and assume any risks associated with Your exercise of permissions under this License.
58 |
59 | 8. Limitation of Liability.
60 |
61 | In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, shall any Contributor be liable to You for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising as a result of this License or out of the use or inability to use the Work (including but not limited to damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses), even if such Contributor has been advised of the possibility of such damages.
62 |
63 | 9. Accepting Warranty or Additional Liability.
64 |
65 | While redistributing the Work or Derivative Works thereof, You may choose to offer, and charge a fee for, acceptance of support, warranty, indemnity, or other liability obligations and/or rights consistent with this License. However, in accepting such obligations, You may act only on Your own behalf and on Your sole responsibility, not on behalf of any other Contributor, and only if You agree to indemnify, defend, and hold each Contributor harmless for any liability incurred by, or claims asserted against, such Contributor by reason of your accepting any such warranty or additional liability.
66 |
67 | END OF TERMS AND CONDITIONS
68 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # fundamental-factors
2 |
3 | Learn how to research fundamental factors using the Pipeline API, Alphalens, and Sharadar US price and fundamental data.
4 |
5 | ## Clone in QuantRocket
6 |
7 | CLI:
8 |
9 | ```shell
10 | quantrocket codeload clone 'fundamental-factors'
11 | ```
12 |
13 | Python:
14 |
15 | ```python
16 | from quantrocket.codeload import clone
17 | clone("fundamental-factors")
18 | ```
19 |
20 | ## Browse in GitHub
21 |
22 | Start here: [fundamental_factors/Introduction.ipynb](fundamental_factors/Introduction.ipynb)
23 |
24 | ***
25 |
26 | Find more code in QuantRocket's [Code Library](https://www.quantrocket.com/code/)
27 |
--------------------------------------------------------------------------------
/fundamental_factors/Introduction.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | " \n",
8 | "Disclaimer"
9 | ]
10 | },
11 | {
12 | "cell_type": "markdown",
13 | "metadata": {
14 | "tags": []
15 | },
16 | "source": [
17 | "# Fundamental Factors\n",
18 | "\n",
19 | "This tutorial demonstrates how to research fundamental factors using the Pipeline API, Alphalens, and Sharadar US price and fundamental data.\n",
20 | "\n",
21 | "Lessons are organized around specific fundamental factors, but the techniques demonstrated can be applied to any factors. "
22 | ]
23 | },
24 | {
25 | "cell_type": "markdown",
26 | "metadata": {},
27 | "source": [
28 | "## Prerequisites\n",
29 | "\n",
30 | "The Pipeline Tutorial is recommended as a prerequisite to this tutorial:"
31 | ]
32 | },
33 | {
34 | "cell_type": "code",
35 | "execution_count": null,
36 | "metadata": {},
37 | "outputs": [],
38 | "source": [
39 | "from quantrocket.codeload import clone\n",
40 | "clone('pipeline-tutorial')"
41 | ]
42 | },
43 | {
44 | "cell_type": "markdown",
45 | "metadata": {},
46 | "source": [
47 | "## Data Permissions\n",
48 | "\n",
49 | "To complete the tutorial, you must have a Sharadar subscription that includes permission for the following datasets:\n",
50 | "\n",
51 | "* US Company Fundamentals\n",
52 | "* End-of-Day US Stock Prices (Fund Prices not required)"
53 | ]
54 | },
55 | {
56 | "cell_type": "markdown",
57 | "metadata": {},
58 | "source": [
59 | "## Contents\n",
60 | "\n",
61 | "### Introduction\n",
62 | "\n",
63 | "* Lesson 1: [Data Collection](Lesson01-Data-Collection.ipynb) - Sharadar price and fundamental data collection steps\n",
64 | "* Lesson 2: [Define a Base Universe](Lesson02-Base-Universe.ipynb) - learn how to define a base universe to use with multiple pipelines\n",
65 | "* Lesson 3: [Basic Usage](Lesson03-Basic-Usage.ipynb) - learn how to choose a dimension and define and derive factors from dataset columns\n",
66 | "* Lesson 4: [Periodic Computations](Lesson04-Periodic-Computations.ipynb) - learn how to create factors that measure the change in fundamental metrics over time\n",
67 | "\n",
68 | "\n",
69 | "### Factor 1: Profitability\n",
70 | "\n",
71 | "* Lesson 5: [Exploratory Data Analysis](Lesson05-Exploratory-Data-Analysis.ipynb) - learn how to get a basic overview of a factor's distribution and characteristics\n",
72 | "* Lesson 6: [Alphalens: Profitability](Lesson06-Profitability.ipynb) - learn how to create and interpret an Alphalens tear sheet, looking at the profitability factor \n",
73 | "* Lesson 7: [Alphalens: Profitability Growth](Lesson07-Profitability-Growth.ipynb) - an additional Alphalens walkthrough that looks at growth in profitability\n",
74 | "\n",
75 | "### Factor 2: Debt-to-Equity Ratio\n",
76 | "\n",
77 | "* Lesson 8: [Factor Values vs Factor Ranks](Lesson08-Factor-Values-vs-Factor-Ranks.ipynb) - learn how ranking factors can reduce the effect of outliers and lead to smaller maximum weights\n",
78 | "* Lesson 9: [Sector Neutralization](Lesson09-Sector-Neutralization.ipynb) - learn techniques that can be used to avoid unintended concentrated bets on sectors\n",
79 | "\n",
80 | "### Factor 3: Financial Distress Indicators\n",
81 | "\n",
82 | "* Lesson 10: [Macro Analysis: The Interest Coverage Ratio and Altman Z-Score](Lesson10-Macro-Analysis.ipynb) - learn how to perform macro analysis of US stocks, using segmented processing to analyze more data than can fit into memory\n",
83 | "* Lesson 11: [Alphalens: Altman Z-Score](Lesson11-Altman-Z-Score.ipynb) - learn how and when to define specific bin edges for your factor instead of using equal-sized quantiles\n",
84 | "* Lesson 12: [Multi-Factor Scores](Lesson12-Multi-Factor-Scores.ipynb) - learn how to combine multiple factors into a single score"
85 | ]
86 | }
87 | ],
88 | "metadata": {
89 | "kernelspec": {
90 | "display_name": "Python 3.11",
91 | "language": "python",
92 | "name": "python3"
93 | },
94 | "language_info": {
95 | "codemirror_mode": {
96 | "name": "ipython",
97 | "version": 3
98 | },
99 | "file_extension": ".py",
100 | "mimetype": "text/x-python",
101 | "name": "python",
102 | "nbconvert_exporter": "python",
103 | "pygments_lexer": "ipython3",
104 | "version": "3.11.0"
105 | }
106 | },
107 | "nbformat": 4,
108 | "nbformat_minor": 4
109 | }
110 |
--------------------------------------------------------------------------------
/fundamental_factors/Lesson01-Data-Collection.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
11 | {
12 | "attachments": {},
13 | "cell_type": "markdown",
14 | "metadata": {},
15 | "source": [
16 | "***\n",
17 | "[Fundamental Factors](Introduction.ipynb) › Lesson 1: Data Collection\n",
18 | "***"
19 | ]
20 | },
21 | {
22 | "attachments": {},
23 | "cell_type": "markdown",
24 | "metadata": {},
25 | "source": [
26 | "# Sharadar Data Collection \n",
27 | "\n",
28 | "This tutorial utilizes Sharadar price and fundamental data. "
29 | ]
30 | },
31 | {
32 | "cell_type": "markdown",
33 | "metadata": {
34 | "tags": []
35 | },
36 | "source": [
37 | "## Collect Sharadar Fundamentals\n",
38 | "\n",
39 | "First, collect the fundamental data:"
40 | ]
41 | },
42 | {
43 | "cell_type": "code",
44 | "execution_count": 1,
45 | "metadata": {
46 | "tags": []
47 | },
48 | "outputs": [
49 | {
50 | "data": {
51 | "text/plain": [
52 | "{'status': 'the fundamental data will be collected asynchronously'}"
53 | ]
54 | },
55 | "execution_count": 1,
56 | "metadata": {},
57 | "output_type": "execute_result"
58 | }
59 | ],
60 | "source": [
61 | "from quantrocket.fundamental import collect_sharadar_fundamentals\n",
62 | "collect_sharadar_fundamentals()"
63 | ]
64 | },
65 | {
66 | "cell_type": "markdown",
67 | "metadata": {},
68 | "source": [
69 | "Monitor flightlog for completion:\n",
70 | "\n",
71 | "```\n",
72 | "quantrocket.fundamental: INFO Collecting Sharadar US fundamentals\n",
73 | "quantrocket.fundamental: INFO Finished collecting Sharadar US fundamentals\n",
74 | "```"
75 | ]
76 | },
77 | {
78 | "cell_type": "markdown",
79 | "metadata": {
80 | "tags": []
81 | },
82 | "source": [
83 | "## Ingest the Sharadar Zipline Bundle\n",
84 | "\n",
85 | "Next, ingest the Sharadar Zipline bundle. To do so, first create the bundle:"
86 | ]
87 | },
88 | {
89 | "cell_type": "code",
90 | "execution_count": 2,
91 | "metadata": {
92 | "tags": []
93 | },
94 | "outputs": [
95 | {
96 | "data": {
97 | "text/plain": [
98 | "{'status': 'success', 'msg': 'successfully created sharadar-1d bundle'}"
99 | ]
100 | },
101 | "execution_count": 2,
102 | "metadata": {},
103 | "output_type": "execute_result"
104 | }
105 | ],
106 | "source": [
107 | "from quantrocket.zipline import create_sharadar_bundle\n",
108 | "create_sharadar_bundle(\"sharadar-1d\")"
109 | ]
110 | },
111 | {
112 | "cell_type": "markdown",
113 | "metadata": {},
114 | "source": [
115 | "Then, ingest data into the bundle:"
116 | ]
117 | },
118 | {
119 | "cell_type": "code",
120 | "execution_count": 3,
121 | "metadata": {
122 | "tags": []
123 | },
124 | "outputs": [
125 | {
126 | "data": {
127 | "text/plain": [
128 | "{'status': 'the data will be ingested asynchronously'}"
129 | ]
130 | },
131 | "execution_count": 3,
132 | "metadata": {},
133 | "output_type": "execute_result"
134 | }
135 | ],
136 | "source": [
137 | "from quantrocket.zipline import ingest_bundle\n",
138 | "ingest_bundle(\"sharadar-1d\")"
139 | ]
140 | },
141 | {
142 | "cell_type": "markdown",
143 | "metadata": {},
144 | "source": [
145 | "Monitor flightlog for completion:\n",
146 | "\n",
147 | "```\n",
148 | "quantrocket.zipline: INFO [sharadar-1d] Ingesting daily bars for sharadar-1d bundle\n",
149 | "quantrocket.zipline: INFO [sharadar-1d] Ingesting adjustments for sharadar-1d bundle\n",
150 | "quantrocket.zipline: INFO [sharadar-1d] Ingesting assets for sharadar-1d bundle\n",
151 | "quantrocket.zipline: INFO [sharadar-1d] Completed ingesting data for sharadar-1d bundle\n",
152 | "```"
153 | ]
154 | },
155 | {
156 | "cell_type": "markdown",
157 | "metadata": {},
158 | "source": [
159 | "## Set Default Bundle\n",
160 | "\n",
161 | "Next, set the Sharadar bundle as the default bundle. This is optional but saves us from having to specify the bundle explicitly every time we run the notebook. "
162 | ]
163 | },
164 | {
165 | "cell_type": "code",
166 | "execution_count": 4,
167 | "metadata": {
168 | "tags": []
169 | },
170 | "outputs": [
171 | {
172 | "data": {
173 | "text/plain": [
174 | "{'status': 'successfully set default bundle'}"
175 | ]
176 | },
177 | "execution_count": 4,
178 | "metadata": {},
179 | "output_type": "execute_result"
180 | }
181 | ],
182 | "source": [
183 | "from quantrocket.zipline import set_default_bundle\n",
184 | "set_default_bundle(\"sharadar-1d\")"
185 | ]
186 | },
187 | {
188 | "cell_type": "markdown",
189 | "metadata": {},
190 | "source": [
191 | "***\n",
192 | "\n",
193 | "## *Next Up*\n",
194 | "\n",
195 | "Lesson 2: [Define a Base Universe](Lesson02-Base-Universe.ipynb)"
196 | ]
197 | }
198 | ],
199 | "metadata": {
200 | "kernelspec": {
201 | "display_name": "Python 3.11",
202 | "language": "python",
203 | "name": "python3"
204 | },
205 | "language_info": {
206 | "codemirror_mode": {
207 | "name": "ipython",
208 | "version": 3
209 | },
210 | "file_extension": ".py",
211 | "mimetype": "text/x-python",
212 | "name": "python",
213 | "nbconvert_exporter": "python",
214 | "pygments_lexer": "ipython3",
215 | "version": "3.11.0"
216 | }
217 | },
218 | "nbformat": 4,
219 | "nbformat_minor": 4
220 | }
221 |
--------------------------------------------------------------------------------
/fundamental_factors/Lesson02-Base-Universe.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
11 | {
12 | "attachments": {},
13 | "cell_type": "markdown",
14 | "metadata": {},
15 | "source": [
16 | "***\n",
17 | "[Fundamental Factors](Introduction.ipynb) › Lesson 2: Define a Base Universe\n",
18 | "***"
19 | ]
20 | },
21 | {
22 | "attachments": {},
23 | "cell_type": "markdown",
24 | "metadata": {},
25 | "source": [
26 | "# Define a Base Universe\n",
27 | "\n",
28 | "Before researching specific factors, we will define a base universe. We don't want to include certain securities such as ETFs and ADRs in any of our subsequent analysis, and by defining a base universe in a separate file, we can import and use the definition in our notebooks without having to re-define the universe rules in each notebook. \n",
29 | "\n",
30 | "The base universe will still be quite broad, for two reasons. First, we can always add more rules to the base rules in any given notebook to narrow the universe. Second, using a broad universe will help us see how factors behave across the US equities market, even if we subsequently wish to narrow the universe for trading or further analysis. "
31 | ]
32 | },
33 | {
34 | "cell_type": "markdown",
35 | "metadata": {
36 | "tags": []
37 | },
38 | "source": [
39 | "## Explore Sharadar Categories\n",
40 | "\n",
41 | "Different types of securities are categorized in the `sharadar_Category` field of the securities master database. Let's query all Sharadar records in the securities master database and group by `sharadar_Category` to see a breakdown of security types. (You can also obtain this information by browsing the sharadar-1d bundle in the Data Browser and looking at the Universe tab.)"
42 | ]
43 | },
44 | {
45 | "cell_type": "code",
46 | "execution_count": 1,
47 | "metadata": {
48 | "tags": []
49 | },
50 | "outputs": [
51 | {
52 | "data": {
53 | "text/plain": [
54 | "sharadar_Category\n",
55 | "ADR 2\n",
56 | "ADR Common Stock 2118\n",
57 | "ADR Common Stock Primary Class 141\n",
58 | "ADR Common Stock Secondary Class 118\n",
59 | "ADR Common Stock Warrant 179\n",
60 | "ADR Preferred 6\n",
61 | "ADR Preferred Stock 96\n",
62 | "ADR Stock Warrant 1\n",
63 | "CEF 1067\n",
64 | "CEF Preferred 63\n",
65 | "CEF Warrant 41\n",
66 | "Canadian 1\n",
67 | "Canadian Common Stock 373\n",
68 | "Canadian Common Stock Primary Class 12\n",
69 | "Canadian Common Stock Secondary Class 3\n",
70 | "Canadian Common Stock Warrant 8\n",
71 | "Canadian Preferred Stock 3\n",
72 | "Canadian Stock Warrant 3\n",
73 | "Domestic 76\n",
74 | "Domestic Common Stock 13861\n",
75 | "Domestic Common Stock Primary Class 1121\n",
76 | "Domestic Common Stock Secondary Class 1098\n",
77 | "Domestic Common Stock Warrant 1455\n",
78 | "Domestic Preferred 45\n",
79 | "Domestic Preferred Stock 1169\n",
80 | "Domestic Primary 1\n",
81 | "Domestic Stock Warrant 208\n",
82 | "Domestic Warrant 16\n",
83 | "ETD 498\n",
84 | "ETF 4992\n",
85 | "ETMF 18\n",
86 | "ETN 403\n",
87 | "IDX 5\n",
88 | "UNIT 25\n",
89 | "Name: Symbol, dtype: int64"
90 | ]
91 | },
92 | "execution_count": 1,
93 | "metadata": {},
94 | "output_type": "execute_result"
95 | }
96 | ],
97 | "source": [
98 | "from quantrocket.master import get_securities\n",
99 | "securities = get_securities(vendors=\"sharadar\", fields=[\"Symbol\", \"sharadar_Category\"])\n",
100 | "\n",
101 | "securities.groupby(\"sharadar_Category\").Symbol.count()"
102 | ]
103 | },
104 | {
105 | "cell_type": "markdown",
106 | "metadata": {},
107 | "source": [
108 | "We will focus on domestic common stocks. Since some companies have multiple share classes, we will exclude \"Domestic Common Stock Secondary Class\". The following Pipeline expression will satisfy these requirements:"
109 | ]
110 | },
111 | {
112 | "cell_type": "code",
113 | "execution_count": 2,
114 | "metadata": {
115 | "tags": []
116 | },
117 | "outputs": [],
118 | "source": [
119 | "from zipline.pipeline import master\n",
120 | "\n",
121 | "category = master.SecuritiesMaster.sharadar_Category.latest\n",
122 | "common_stocks = (\n",
123 | " # domestic common stocks\n",
124 | " category.has_substring(\"Domestic Common\")\n",
125 | " # no secondary shares\n",
126 | " & ~category.has_substring(\"Secondary\")\n",
127 | ")"
128 | ]
129 | },
130 | {
131 | "cell_type": "markdown",
132 | "metadata": {},
133 | "source": [
134 | "Since `sharadar_Category` is a field from the `SecuritiesMaster` Dataset, this filter can be applied as the `initial_universe` argument of our Pipelines in the following notebooks. Applying the filter as `initial_universe` completely excludes from the pipeline workspace any assets that aren't primary-share common stocks and provides a speed boost compared to including these rules in the `screen` along with our other rules. For more information on the `initial_universe` parameter and how it relates to `screen`, see the Usage Guide or the Pipeline Tutorial.\n",
135 | "\n",
136 | "The additional filters below cannot be used with `initial_universe` and must be applied separately as the `screen` parameter of the Pipeline (or as a mask to other terms)."
137 | ]
138 | },
139 | {
140 | "cell_type": "markdown",
141 | "metadata": {},
142 | "source": [
143 | "## Liquidity Filter\n",
144 | "\n",
145 | "Even though we want our base universe to be broad and include companies of all sizes, it is still important to add a basic liquidity filter. We will limit the universe to stocks that have had at least some trading volume on each trading day of the past month (approximately 21 trading days). Stocks that have zero trading volume are not only untradable but are also more likely to have suspect prices that can cause unexpected results in Alphalens tear sheets and other analyses. "
146 | ]
147 | },
148 | {
149 | "cell_type": "code",
150 | "execution_count": 3,
151 | "metadata": {
152 | "tags": []
153 | },
154 | "outputs": [],
155 | "source": [
156 | "from zipline.pipeline import EquityPricing\n",
157 | "\n",
158 | "base_universe = (EquityPricing.volume.latest > 0).all(21)"
159 | ]
160 | },
161 | {
162 | "cell_type": "markdown",
163 | "metadata": {},
164 | "source": [
165 | "## Penny Stock Filter\n",
166 | "\n",
167 | "In addition to the liquidity filter, we will also add a rule to filter out penny stocks by requiring that the closing price must be above $1.00 for 21 consecutive days. Penny stocks often undergo dramatic price jumps and price drops that, if included in the analysis, can bias the results and make it harder to interpret overall factor performance. "
168 | ]
169 | },
170 | {
171 | "cell_type": "code",
172 | "execution_count": 4,
173 | "metadata": {
174 | "tags": []
175 | },
176 | "outputs": [],
177 | "source": [
178 | "base_universe = (EquityPricing.close.latest > 1.00).all(21, mask=base_universe)"
179 | ]
180 | },
181 | {
182 | "cell_type": "markdown",
183 | "metadata": {},
184 | "source": [
185 | "## Helper File\n",
186 | "\n",
187 | "To be able to reuse our universe filters, we put them in a separate file, [universe.py](universe.py). The filters can be imported as follows:"
188 | ]
189 | },
190 | {
191 | "cell_type": "code",
192 | "execution_count": 5,
193 | "metadata": {
194 | "tags": []
195 | },
196 | "outputs": [],
197 | "source": [
198 | "from codeload.fundamental_factors.universe import CommonStocks, BaseUniverse\n",
199 | "\n",
200 | "initial_universe = CommonStocks()\n",
201 | "base_universe = BaseUniverse()"
202 | ]
203 | },
204 | {
205 | "cell_type": "markdown",
206 | "metadata": {},
207 | "source": [
208 | "The `CommonStocks()` filter will be used as the `initial_universe` of the Pipeline in the following notebooks, while the `BaseUniverse()` filter will be used as the `screen` parameter of the Pipeline and as a mask to various factors, and will sometimes be combined with additional filtering rules. (As a reminder from other tutorials, `screen` supports more kinds of filters than `initial_universe`, but using `initial_universe` reduces the size of the computational universe and thus provides a speed boost compared to using `screen`.)"
209 | ]
210 | },
211 | {
212 | "cell_type": "markdown",
213 | "metadata": {},
214 | "source": [
215 | "***\n",
216 | "\n",
217 | "## *Next Up*\n",
218 | "\n",
219 | "Lesson 3: [Basic Usage](Lesson03-Basic-Usage.ipynb)"
220 | ]
221 | }
222 | ],
223 | "metadata": {
224 | "kernelspec": {
225 | "display_name": "Python 3.11",
226 | "language": "python",
227 | "name": "python3"
228 | },
229 | "language_info": {
230 | "codemirror_mode": {
231 | "name": "ipython",
232 | "version": 3
233 | },
234 | "file_extension": ".py",
235 | "mimetype": "text/x-python",
236 | "name": "python",
237 | "nbconvert_exporter": "python",
238 | "pygments_lexer": "ipython3",
239 | "version": "3.11.0"
240 | }
241 | },
242 | "nbformat": 4,
243 | "nbformat_minor": 4
244 | }
245 |
--------------------------------------------------------------------------------
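Lesson 2 above builds its universe in two stages: a category rule based on substring matches, then window-based liquidity and price rules. The following is a minimal plain-Python sketch of that logic, not the Pipeline implementation. The category counts are copied from the `sharadar_Category` breakdown shown in the lesson (only a subset of categories is included), and `is_common_stock` and `passes_window` are hypothetical helper names introduced here for illustration.

```python
# Illustrative stand-ins for the Pipeline filters in Lesson 2 (assumption:
# substring and "all days in window" semantics match the prose description).
# Counts are copied from the category breakdown printed in the lesson.
CATEGORY_COUNTS = {
    "Domestic Common Stock": 13861,
    "Domestic Common Stock Primary Class": 1121,
    "Domestic Common Stock Secondary Class": 1098,
    "Domestic Common Stock Warrant": 1455,
    "Domestic Preferred Stock": 1169,
    "ADR Common Stock": 2118,
    "Canadian Common Stock": 373,
    "ETF": 4992,
}


def is_common_stock(category: str) -> bool:
    """Plain-Python analogue of
    has_substring("Domestic Common") & ~has_substring("Secondary")."""
    return "Domestic Common" in category and "Secondary" not in category


def passes_window(values, threshold, window=21):
    """Hypothetical helper: True only if each of the last `window`
    observations is strictly above `threshold`."""
    recent = values[-window:]
    return len(recent) == window and all(v > threshold for v in recent)


# Stage 1: category rule (the role played by initial_universe).
selected = {c: n for c, n in CATEGORY_COUNTS.items() if is_common_stock(c)}
for category, count in selected.items():
    print(f"{category}: {count}")

# Stage 2: window rules (the role played by screen).
# A single zero-volume day inside the window fails the liquidity rule.
volumes = [1000] * 20 + [0]
closes = [1.50] * 21
print(passes_window(volumes, 0), passes_window(closes, 1.00))
```

In this sketch, "Domestic Common Stock Warrant" also satisfies the substring rule, because it contains "Domestic Common" and does not contain "Secondary". If warrants are not wanted, it may be worth checking how the real filter treats that category in your own data.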
/fundamental_factors/Lesson03-Basic-Usage.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
11 | {
12 | "attachments": {},
13 | "cell_type": "markdown",
14 | "metadata": {},
15 | "source": [
16 | "***\n",
17 | "[Fundamental Factors](Introduction.ipynb) › Lesson 3: Basic Usage\n",
18 | "***"
19 | ]
20 | },
21 | {
22 | "attachments": {},
23 | "cell_type": "markdown",
24 | "metadata": {},
25 | "source": [
26 | "# Basic Usage\n"
27 | ]
28 | },
29 | {
30 | "cell_type": "markdown",
31 | "metadata": {
32 | "tags": []
33 | },
34 | "source": [
35 | "## Choosing a Dimension\n",
36 | "\n",
37 | "The first step when using Sharadar fundamentals is to decide which \"dimension\" you want to use. There are 6 possible dimensions, resulting from the combination of 3 possible reporting windows and 2 ways of handling restatements. The 3 reporting windows are:\n",
38 | "\n",
39 | "* Q = Quarterly: metrics reflect financial results for the fiscal quarter\n",
40 | "* Y = Annual: metrics reflect financial results for the fiscal year\n",
41 | "* T = Trailing-Twelve Month: metrics reflect financial results for the trailing twelve months (previous 4 quarters)\n",
42 | "\n",
43 | "The 2 ways of handling restatements are:\n",
44 | "\n",
45 | "* AR = As-Reported: metrics reflect the values as originally reported by the company\n",
46 | "* MR = Most Recently Reported: metrics reflect the most recently reported values\n",
47 | "\n",
48 | "For historical research, most quants prefer to use As-Reported data, because it most accurately represents what would have originally been known at the time of trade.\n",
49 | "\n",
50 | "Thus the 6 possible dimensions are:\n",
51 | "\n",
52 | "* ARQ = As-Reported Quarterly\n",
53 | "* ARY = As-Reported Annual\n",
54 | "* ART = As-Reported Trailing-Twelve Month\n",
55 | "* MRQ = Most Recently Reported Quarterly\n",
56 | "* MRY = Most Recently Reported Annual\n",
57 | "* MRT = Most Recently Reported Trailing-Twelve Month\n",
58 | "\n",
59 | "In this notebook, we will use as-reported, trailing-twelve-month fundamentals. To do so, we use the `slice()` method of `zipline.pipeline.sharadar.Fundamentals` to select the desired dimension. `Fundamentals` is a `DataSetFamily`, and calling its `slice()` method returns a `DataSet`:"
60 | ]
61 | },
62 | {
63 | "cell_type": "code",
64 | "execution_count": 1,
65 | "metadata": {
66 | "tags": []
67 | },
68 | "outputs": [],
69 | "source": [
70 | "from zipline.pipeline import sharadar\n",
71 | "\n",
72 | "fundamentals = sharadar.Fundamentals.slice('ART')"
73 | ]
74 | },
75 | {
76 | "cell_type": "markdown",
77 | "metadata": {},
78 | "source": [
79 | "## Using Columns as Factors\n",
80 | "\n",
81 | "Many fundamental factors are directly available as columns in the Sharadar dataset. A list of available factors can be found in the docstring for `Fundamentals`, which can be viewed by clicking on `Fundamentals` in the above cell in JupyterLab and pressing `Ctrl`. Once you have identified a factor of interest, you can use it in Pipeline by accessing its `latest` property. \n",
82 | "\n",
83 | "If we want to look at profitability, one metric we could use is net margin, defined as the ratio of net income to revenue:"
84 | ]
85 | },
86 | {
87 | "cell_type": "code",
88 | "execution_count": 2,
89 | "metadata": {
90 | "tags": []
91 | },
92 | "outputs": [],
93 | "source": [
94 | "net_margin = fundamentals.NETMARGIN.latest"
95 | ]
96 | },
97 | {
98 | "cell_type": "markdown",
99 | "metadata": {},
100 | "source": [
101 | "## Deriving Factors from Multiple Columns\n",
102 | "\n",
103 | "Sometimes, a fundamental metric may not be directly available in the dataset, but you can derive the metric by combining other metrics. For example, operating margin, defined as the ratio of operating income to revenue, is not included in the dataset. But you can derive the metric like this:"
104 | ]
105 | },
106 | {
107 | "cell_type": "code",
108 | "execution_count": 3,
109 | "metadata": {
110 | "tags": []
111 | },
112 | "outputs": [],
113 | "source": [
114 | "operating_margin = fundamentals.OPINC.latest / fundamentals.REVENUE.latest "
115 | ]
116 | },
117 | {
118 | "cell_type": "markdown",
119 | "metadata": {},
120 | "source": [
121 | "***\n",
122 | "\n",
123 | "## *Next Up*\n",
124 | "\n",
125 | "Lesson 4: [Periodic Computations](Lesson04-Periodic-Computations.ipynb)"
126 | ]
127 | }
128 | ],
129 | "metadata": {
130 | "kernelspec": {
131 | "display_name": "Python 3.11",
132 | "language": "python",
133 | "name": "python3"
134 | },
135 | "language_info": {
136 | "codemirror_mode": {
137 | "name": "ipython",
138 | "version": 3
139 | },
140 | "file_extension": ".py",
141 | "mimetype": "text/x-python",
142 | "name": "python",
143 | "nbconvert_exporter": "python",
144 | "pygments_lexer": "ipython3",
145 | "version": "3.11.0"
146 | }
147 | },
148 | "nbformat": 4,
149 | "nbformat_minor": 4
150 | }
151 |
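A minimal standalone sketch, separate from the notebook above: the six dimension codes described in Lesson 3 are simply the combinations of the two restatement codes and the three reporting-window codes. The snippet below enumerates them in plain Python. `RESTATEMENTS`, `WINDOWS`, `DIMENSIONS`, and `validate_dimension()` are hypothetical names introduced for illustration; they are not part of the Zipline or Sharadar APIs.

```python
from itertools import product

# Codes and descriptions as listed in the lesson.
RESTATEMENTS = {"AR": "As-Reported", "MR": "Most Recently Reported"}
WINDOWS = {"Q": "Quarterly", "Y": "Annual", "T": "Trailing-Twelve Month"}

# Combine each restatement code with each reporting-window code,
# e.g. "AR" + "T" -> "ART" (As-Reported Trailing-Twelve Month).
DIMENSIONS = {
    r_code + w_code: f"{r_name} {w_name}"
    for (r_code, r_name), (w_code, w_name) in product(
        RESTATEMENTS.items(), WINDOWS.items()
    )
}


def validate_dimension(code):
    """Return `code` unchanged if it is a known dimension, else raise ValueError."""
    if code not in DIMENSIONS:
        raise ValueError(
            f"unknown dimension {code!r}; expected one of {sorted(DIMENSIONS)}"
        )
    return code


for code, description in DIMENSIONS.items():
    print(code, "=", description)
```

A check like `validate_dimension()` could be run before passing a code to `slice()`, so that a typo in the dimension is caught with a readable message.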
--------------------------------------------------------------------------------
/fundamental_factors/Lesson04-Periodic-Computations.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | " \n",
8 | "Disclaimer"
9 | ]
10 | },
11 | {
12 | "attachments": {},
13 | "cell_type": "markdown",
14 | "metadata": {},
15 | "source": [
16 | "***\n",
17 | "[Fundamental Factors](Introduction.ipynb) › Lesson 4: Periodic Computations\n",
18 | "***"
19 | ]
20 | },
21 | {
22 | "attachments": {},
23 | "cell_type": "markdown",
24 | "metadata": {},
25 | "source": [
26 | "# Periodic Computations\n",
27 | "\n",
28 | "When analyzing price data, it is common to compute changes in prices over time, such as calculating a 52-week high or a 50-day moving average. We call these \"windowed computations\" because they involve looking at a window of data rather than a single price observation. In Pipeline, factors that accept a `window_length` parameter are used to perform windowed computations. For example, `SimpleMovingAverage(inputs=EquityPricing.close, window_length=10)` computes the average of the last 10 days of closing prices.\n",
29 | "\n",
30 | "Just as with price data, it is often useful to compute changes in fundamental values over time. For example, you might want to compute 5-year dividend growth or screen for companies who have consistently grown their earnings over a certain number of quarters. Typical window-based Pipeline factors like `SimpleMovingAverage` aren't suitable for fundamental data because fundamental data changes quarterly, not daily. We don't want to compute the average dividend of the last N days but of the last N quarters.\n",
31 | "\n",
32 | "Pipeline makes it easy to perform computations on multiple quarters or years of fundamental data. These are referred to as \"periodic computations\" because they use fiscal periods rather than the daily values that are used in typical windowed computations like `SimpleMovingAverage`. There are ready-made factors to compute the average, high, low, percent change, or CAGR of a fundamental metric over time, or to screen for companies with metrics that are consistently above or below a certain value (such as consistently positive earnings or dividends), or to screen for consistently increasing or decreasing metrics (such as consistently increasing revenue). "
33 | ]
34 | },
35 | {
36 | "cell_type": "markdown",
37 | "metadata": {
38 | "tags": []
39 | },
40 | "source": [
41 | "## Choosing a period_offset\n",
42 | "\n",
43 | "Before we look at some of the ready-made periodic factors and filters, let's look at the `period_offset` parameter, which forms the basis of all periodic computations. \n",
44 | "\n",
45 | "As we saw in the previous lesson, you must specify a `dimension` when taking a slice of a fundamental dataset:"
46 | ]
47 | },
48 | {
49 | "cell_type": "code",
50 | "execution_count": 1,
51 | "metadata": {
52 | "tags": []
53 | },
54 | "outputs": [],
55 | "source": [
56 | "from zipline.pipeline import sharadar\n",
57 | "\n",
58 | "# ARQ = As-Reported Quarterly fundamentals\n",
59 | "fundamentals = sharadar.Fundamentals.slice('ARQ')"
60 | ]
61 | },
62 | {
63 | "cell_type": "markdown",
64 | "metadata": {},
65 | "source": [
66 | "The `slice()` method also accepts an optional second parameter, `period_offset`. If omitted, as in the above example, `period_offset` defaults to 0, which means that Pipeline will return data for the most recent fiscal period (as of the pipeline simulation date). In contrast, a negative `period_offset` means to return data for a previous fiscal period: -1 means the immediately preceding fiscal period, -2 means two fiscal periods ago, etc. For quarterly and trailing-twelve-month dimensions, previous period means previous quarter, while for annual dimensions, previous period means previous year.\n",
67 | "\n",
68 | "To illustrate the use of `period_offset`, let's look at Microsoft's current and previous EPS. First, we take two slices of `Fundamentals`, one representing the latest period and one representing the previous period, and from these slices create factors for the current and previous EPS:"
69 | ]
70 | },
71 | {
72 | "cell_type": "code",
73 | "execution_count": 2,
74 | "metadata": {
75 | "tags": []
76 | },
77 | "outputs": [],
78 | "source": [
79 | "from zipline.pipeline import sharadar\n",
80 | "\n",
81 | "current_fundamentals = sharadar.Fundamentals.slice('ART', period_offset=0)\n",
82 | "previous_fundamentals = sharadar.Fundamentals.slice('ART', period_offset=-1)\n",
83 | "\n",
84 | "eps = current_fundamentals.EPS.latest\n",
85 | "previous_eps = previous_fundamentals.EPS.latest"
86 | ]
87 | },
88 | {
89 | "cell_type": "markdown",
90 | "metadata": {},
91 | "source": [
92 | "Then, we include the factors as pipeline columns and limit the initial universe to MSFT only. We also include a column with the fiscal period end date for reference:"
93 | ]
94 | },
95 | {
96 | "cell_type": "code",
97 | "execution_count": 3,
98 | "metadata": {
99 | "tags": []
100 | },
101 | "outputs": [],
102 | "source": [
103 | "from zipline.pipeline import Pipeline\n",
104 | "from zipline.pipeline.filters import StaticAssets\n",
105 | "from zipline.research import symbol\n",
106 | "\n",
107 | "MSFT = symbol(\"MSFT\")\n",
108 | "\n",
109 | "pipeline = Pipeline(\n",
110 | " columns={\n",
111 | " 'fiscal_period_end_date': current_fundamentals.CALENDARDATE.latest,\n",
112 | " 'eps': eps,\n",
113 | " 'previous_eps': previous_eps,\n",
114 | " },\n",
115 | " initial_universe=StaticAssets([MSFT])\n",
116 | ")\n"
117 | ]
118 | },
119 | {
120 | "cell_type": "markdown",
121 | "metadata": {},
122 | "source": [
123 | "Finally, we run the pipeline. To see what's going on, we can use `drop_duplicates()` to limit the output to rows where the values changed from the previous row: "
124 | ]
125 | },
126 | {
127 | "cell_type": "code",
128 | "execution_count": 4,
129 | "metadata": {
130 | "tags": []
131 | },
132 | "outputs": [
133 | {
134 | "data": {
135 | "text/html": [
136 | "<div>\n",
150 | "<table border=\"1\" class=\"dataframe\">\n",
151 | "  <thead>\n",
152 | "    <tr style=\"text-align: right;\">\n",
153 | "      <th></th>\n",
154 | "      <th></th>\n",
155 | "      <th>fiscal_period_end_date</th>\n",
156 | "      <th>eps</th>\n",
157 | "      <th>previous_eps</th>\n",
158 | "    </tr>\n",
159 | "    <tr>\n",
160 | "      <th>date</th>\n",
161 | "      <th>asset</th>\n",
162 | "      <th></th>\n",
163 | "      <th></th>\n",
164 | "      <th></th>\n",
165 | "    </tr>\n",
166 | "  </thead>\n",
167 | "  <tbody>\n",
168 | "    <tr>\n",
169 | "      <th>2022-01-03</th>\n",
170 | "      <th>Equity(FIBBG000BPH459 [MSFT])</th>\n",
171 | "      <td>2021-09-30</td>\n",
172 | "      <td>9.02</td>\n",
173 | "      <td>8.12</td>\n",
174 | "    </tr>\n",
175 | "    <tr>\n",
176 | "      <th>2022-01-26</th>\n",
177 | "      <th>Equity(FIBBG000BPH459 [MSFT])</th>\n",
178 | "      <td>2021-12-31</td>\n",
179 | "      <td>9.47</td>\n",
180 | "      <td>9.02</td>\n",
181 | "    </tr>\n",
182 | "    <tr>\n",
183 | "      <th>2022-04-27</th>\n",
184 | "      <th>Equity(FIBBG000BPH459 [MSFT])</th>\n",
185 | "      <td>2022-03-31</td>\n",
186 | "      <td>9.65</td>\n",
187 | "      <td>9.47</td>\n",
188 | "    </tr>\n",
189 | "    <tr>\n",
190 | "      <th>2022-07-29</th>\n",
191 | "      <th>Equity(FIBBG000BPH459 [MSFT])</th>\n",
192 | "      <td>2022-06-30</td>\n",
193 | "      <td>9.70</td>\n",
194 | "      <td>9.65</td>\n",
195 | "    </tr>\n",
196 | "    <tr>\n",
197 | "      <th>2022-10-26</th>\n",
198 | "      <th>Equity(FIBBG000BPH459 [MSFT])</th>\n",
199 | "      <td>2022-09-30</td>\n",
200 | "      <td>9.32</td>\n",
201 | "      <td>9.70</td>\n",
202 | "    </tr>\n",
203 | "  </tbody>\n",
204 | "</table>\n",
205 | "</div>"
206 | ],
207 | "text/plain": [
208 | " fiscal_period_end_date ... previous_eps\n",
209 | "date asset ... \n",
210 | "2022-01-03 Equity(FIBBG000BPH459 [MSFT]) 2021-09-30 ... 8.12\n",
211 | "2022-01-26 Equity(FIBBG000BPH459 [MSFT]) 2021-12-31 ... 9.02\n",
212 | "2022-04-27 Equity(FIBBG000BPH459 [MSFT]) 2022-03-31 ... 9.47\n",
213 | "2022-07-29 Equity(FIBBG000BPH459 [MSFT]) 2022-06-30 ... 9.65\n",
214 | "2022-10-26 Equity(FIBBG000BPH459 [MSFT]) 2022-09-30 ... 9.70\n",
215 | "\n",
216 | "[5 rows x 3 columns]"
217 | ]
218 | },
219 | "execution_count": 4,
220 | "metadata": {},
221 | "output_type": "execute_result"
222 | }
223 | ],
224 | "source": [
225 | "from zipline.research import run_pipeline\n",
226 | "\n",
227 | "results = run_pipeline(pipeline, '2022-01-01', '2022-12-31')\n",
228 | "results.drop_duplicates()"
229 | ]
230 | },
231 | {
232 | "cell_type": "markdown",
233 | "metadata": {},
234 | "source": [
235 | "You can see that each `previous_eps` value matches the `eps` value in the preceding row, that is, the EPS from the prior fiscal period. \n",
236 | "\n",
237 | "Using `period_offset`, we can do things like compare the current and previous EPS to create a new Filter that computes True if EPS increased from the previous period:"
238 | ]
239 | },
240 | {
241 | "cell_type": "code",
242 | "execution_count": 5,
243 | "metadata": {
244 | "tags": []
245 | },
246 | "outputs": [],
247 | "source": [
248 | "eps_increased = eps > previous_eps"
249 | ]
250 | },
251 | {
252 | "cell_type": "markdown",
253 | "metadata": {},
254 | "source": [
255 | "You can go back an arbitrary number of periods with `period_offset`, and you can combine the different periods into arbitrarily complex expressions. Under the hood, this is what Pipeline's built-in periodic factors and filters do."
256 | ]
257 | },
258 | {
259 | "cell_type": "markdown",
260 | "metadata": {},
261 | "source": [
262 | "## Built-In Periodic Factors and Filters\n",
263 | "\n",
264 | "The Pipeline API includes a variety of built-in factors and filters for performing periodic computations. These live in the `zipline.pipeline.periodic` module. To see the full list of available factors, click on `periodic` in the following import statement in JupyterLab and press `Ctrl` to see the module docstring:"
265 | ]
266 | },
267 | {
268 | "cell_type": "code",
269 | "execution_count": 6,
270 | "metadata": {
271 | "tags": []
272 | },
273 | "outputs": [],
274 | "source": [
275 | "from zipline.pipeline import periodic"
276 | ]
277 | },
278 | {
279 | "cell_type": "markdown",
280 | "metadata": {},
281 | "source": [
282 | "Let's create some real-world examples."
283 | ]
284 | },
285 | {
286 | "cell_type": "markdown",
287 | "metadata": {},
288 | "source": [
289 | "### Average Earnings\n",
290 | "\n",
291 | "To smooth out variation in quarterly earnings, we can compute the average EBITDA over the last 4 quarters:"
292 | ]
293 | },
294 | {
295 | "cell_type": "code",
296 | "execution_count": 7,
297 | "metadata": {
298 | "tags": []
299 | },
300 | "outputs": [],
301 | "source": [
302 | "from zipline.pipeline.periodic import PeriodicAverage\n",
303 | "\n",
304 | "fundamentals = sharadar.Fundamentals.slice('ARQ')\n",
305 | "avg_earnings = PeriodicAverage(fundamentals.EBITDA, window_length=4)"
306 | ]
307 | },
308 | {
309 | "cell_type": "markdown",
310 | "metadata": {},
311 | "source": [
312 | "Note that the first argument we pass to `PeriodicAverage()` is the column itself (`fundamentals.EBITDA`), not the `latest` factor of the column (`fundamentals.EBITDA.latest`). This is true of all built-in periodic factors and filters."
313 | ]
314 | },
315 | {
316 | "cell_type": "markdown",
317 | "metadata": {},
318 | "source": [
319 | "### Revenue Growth\n",
320 | "\n",
321 | "We can use `PeriodicCAGR()` to compute the compound annual growth rate of revenue over the last 5 years:"
322 | ]
323 | },
324 | {
325 | "cell_type": "code",
326 | "execution_count": 8,
327 | "metadata": {
328 | "tags": []
329 | },
330 | "outputs": [],
331 | "source": [
332 | "from zipline.pipeline.periodic import PeriodicCAGR\n",
333 | "\n",
334 | "fundamentals = sharadar.Fundamentals.slice('ARY')\n",
335 | "revenue_growth = PeriodicCAGR(fundamentals.REVENUE, window_length=5)"
336 | ]
337 | },
338 | {
339 | "cell_type": "markdown",
340 | "metadata": {},
341 | "source": [
342 | "A similar factor is `PeriodicPercentChange()`, which differs only in that it calculates the total percent change over the window length rather than the annual growth rate."
343 | ]
344 | },
345 | {
346 | "cell_type": "markdown",
347 | "metadata": {},
348 | "source": [
349 | "### Consistent Dividend Payers\n",
350 | "\n",
351 | "In this example, we use `AllPeriodsAbove()` to screen for companies that have paid dividends in each of the last 8 years:"
352 | ]
353 | },
354 | {
355 | "cell_type": "code",
356 | "execution_count": 9,
357 | "metadata": {
358 | "tags": []
359 | },
360 | "outputs": [],
361 | "source": [
362 | "from zipline.pipeline.periodic import AllPeriodsAbove\n",
363 | "\n",
364 | "fundamentals = sharadar.Fundamentals.slice('ARY')\n",
365 | "consistently_pay_dividends = AllPeriodsAbove(fundamentals.DPS, 0, window_length=8)"
366 | ]
367 | },
368 | {
369 | "cell_type": "markdown",
370 | "metadata": {},
371 | "source": [
372 | "### Never Cut Dividends\n",
373 | "\n",
374 | "This example builds on the previous one by using `AllPeriodsIncreasing()` to further limit the screen to companies that have never cut their dividends over the 8-year period. We use `allow_equal=True` to allow for equal or increasing dividends, and we provide the previous screen as a mask to limit the computation to dividend payers:"
375 | ]
376 | },
377 | {
378 | "cell_type": "code",
379 | "execution_count": 10,
380 | "metadata": {
381 | "tags": []
382 | },
383 | "outputs": [],
384 | "source": [
385 | "from zipline.pipeline.periodic import AllPeriodsIncreasing\n",
386 | "\n",
387 | "have_never_cut_dividends = AllPeriodsIncreasing(fundamentals.DPS, allow_equal=True, window_length=8, mask=consistently_pay_dividends)"
388 | ]
389 | },
390 | {
391 | "cell_type": "markdown",
392 | "metadata": {},
393 | "source": [
394 | "### EPS versus 4-year High\n",
395 | "\n",
396 | "Suppose we'd like to know how the current EPS compares to the 4-year high of EPS. We can use `PeriodicHigh()` to compute the 4-year high (16 quarters using trailing-twelve-month fundamentals), then compare it to EPS to get a ratio. We use `where()` to limit the output to companies with positive EPS:"
397 | ]
398 | },
399 | {
400 | "cell_type": "code",
401 | "execution_count": 11,
402 | "metadata": {
403 | "tags": []
404 | },
405 | "outputs": [],
406 | "source": [
407 | "from zipline.pipeline.periodic import PeriodicHigh\n",
408 | "\n",
409 | "fundamentals = sharadar.Fundamentals.slice('ART')\n",
410 | "eps = fundamentals.EPS.latest\n",
411 | "high_eps = PeriodicHigh(fundamentals.EPS, window_length=16)\n",
412 | "eps_vs_high = (eps / high_eps).where(eps > 0)"
413 | ]
414 | },
415 | {
416 | "cell_type": "markdown",
417 | "metadata": {},
418 | "source": [
419 | "### Periodic Computations as of Earlier Periods\n",
420 | "\n",
421 | "Let's look at a variation of the previous example. Suppose we want to find companies whose current EPS is higher than any of the previous 16 quarters. To do this, we need to compute the 16-quarter high of EPS *as of the previous quarter*, then see if the current EPS is higher than that. We can calculate the highest EPS as of the previous quarter by using `period_offset` to pass the previous quarter's EPS to `PeriodicHigh()`:"
422 | ]
423 | },
424 | {
425 | "cell_type": "code",
426 | "execution_count": 12,
427 | "metadata": {
428 | "tags": []
429 | },
430 | "outputs": [],
431 | "source": [
432 | "\n",
433 | "current_fundamentals = sharadar.Fundamentals.slice('ART', period_offset=0)\n",
434 | "previous_fundamentals = sharadar.Fundamentals.slice('ART', period_offset=-1)\n",
435 | "\n",
436 | "eps = current_fundamentals.EPS.latest\n",
437 | "previous_high_eps = PeriodicHigh(previous_fundamentals.EPS, window_length=16)\n",
438 | "is_new_high_eps = eps > previous_high_eps"
439 | ]
440 | },
441 | {
442 | "cell_type": "markdown",
443 | "metadata": {},
444 | "source": [
445 | "### Performing Periodic Computations with Derived Factors\n",
446 | "\n",
447 | "So far, we have passed fundamental columns (such as `REVENUE` or `EPS`) directly to the built-in periodic factors. What if we want to perform periodic computations using derived factors, such as operating margin? As we saw in the previous lesson, operating margin can be derived as follows:"
448 | ]
449 | },
450 | {
451 | "cell_type": "code",
452 | "execution_count": 13,
453 | "metadata": {
454 | "tags": []
455 | },
456 | "outputs": [],
457 | "source": [
458 | "operating_margin = fundamentals.OPINC.latest / fundamentals.REVENUE.latest "
459 | ]
460 | },
461 | {
462 | "cell_type": "markdown",
463 | "metadata": {},
464 | "source": [
465 | "To use a derived factor with any of the built-in periodic factors or filters, we must create a function that returns the derived factor, then pass the function to the periodic factor or filter. \n",
466 | "\n",
467 | "The function we create must accept two parameters: `period_offset` and `mask`. The function should use the `period_offset` parameter to derive the factor corresponding to that `period_offset`. The function should use the `mask` parameter (if provided) to mask the derived factor it returns. Here is a function that computes operating margin:"
468 | ]
469 | },
470 | {
471 | "cell_type": "code",
472 | "execution_count": 14,
473 | "metadata": {
474 | "tags": []
475 | },
476 | "outputs": [],
477 | "source": [
478 | "def OPMARGIN(period_offset=0, mask=None):\n",
479 | " fundamentals = sharadar.Fundamentals.slice(\"ART\", period_offset)\n",
480 | " operating_margin = fundamentals.OPINC.latest / fundamentals.REVENUE.latest\n",
481 | " if mask is not None:\n",
482 | " operating_margin = operating_margin.where(mask)\n",
483 | " return operating_margin"
484 | ]
485 | },
486 | {
487 | "cell_type": "markdown",
488 | "metadata": {},
489 | "source": [
490 | "We can now pass the `OPMARGIN` function to any of the built-in periodic factors and filters, just as we would pass a data column. Here, we compute the lowest and highest operating margin over the last 4 quarters: "
491 | ]
492 | },
493 | {
494 | "cell_type": "code",
495 | "execution_count": 15,
496 | "metadata": {
497 | "tags": []
498 | },
499 | "outputs": [],
500 | "source": [
501 | "from zipline.pipeline.periodic import PeriodicLow, PeriodicHigh\n",
502 | "\n",
503 | "high_opmargin = PeriodicHigh(OPMARGIN, window_length=4)\n",
504 | "low_opmargin = PeriodicLow(OPMARGIN, window_length=4)"
505 | ]
506 | },
507 | {
508 | "cell_type": "markdown",
509 | "metadata": {},
510 | "source": [
511 | "Make sure to pass the function itself to the periodic factor or filter, not the result of calling the function (`OPMARGIN`, not `OPMARGIN()`).\n",
512 | "\n",
513 | "If you were to pass a `mask` to `PeriodicHigh()` or `PeriodicLow()`, that mask would be passed in turn to your `OPMARGIN` function. If you don't pass a `mask` to `PeriodicHigh()` or `PeriodicLow()`, no mask will be passed to your `OPMARGIN` function. Regardless of whether you intend to pass a mask or not, your `OPMARGIN` function must accept a `mask` parameter. "
514 | ]
515 | },
516 | {
517 | "cell_type": "markdown",
518 | "metadata": {
519 | "tags": []
520 | },
521 | "source": [
522 | "***\n",
523 | "\n",
524 | "## *Next Up*\n",
525 | "\n",
526 | "Lesson 5: [Exploratory Data Analysis](Lesson05-Exploratory-Data-Analysis.ipynb)"
527 | ]
528 | }
529 | ],
530 | "metadata": {
531 | "kernelspec": {
532 | "display_name": "Python 3.11",
533 | "language": "python",
534 | "name": "python3"
535 | },
536 | "language_info": {
537 | "codemirror_mode": {
538 | "name": "ipython",
539 | "version": 3
540 | },
541 | "file_extension": ".py",
542 | "mimetype": "text/x-python",
543 | "name": "python",
544 | "nbconvert_exporter": "python",
545 | "pygments_lexer": "ipython3",
546 | "version": "3.11.0"
547 | }
548 | },
549 | "nbformat": 4,
550 | "nbformat_minor": 4
551 | }
552 |
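A plain-Python mental model of `period_offset` and window-based periodic computations, separate from the notebook above. It uses the MSFT trailing-twelve-month EPS values shown in the Lesson 4 pipeline output and treats a window as the `window_length` periods ending at (and including) the period selected by `period_offset`, which is how the lesson describes `PeriodicHigh()`. The helper names `periodic_window()`, `value_at()`, and `periodic_high()` are hypothetical and only illustrate the idea; this is not how Pipeline is implemented.

```python
# MSFT trailing-twelve-month EPS, oldest to newest, copied from the
# eps and previous_eps columns of the pipeline output in the lesson.
EPS_BY_PERIOD = [8.12, 9.02, 9.47, 9.65, 9.70, 9.32]


def periodic_window(values, window_length, period_offset=0):
    """Return the `window_length` periods ending at `period_offset`, inclusive.

    period_offset=0 is the latest period, -1 the period before it, and so on.
    """
    if period_offset > 0:
        raise ValueError("period_offset must be 0 or negative")
    end = len(values) + period_offset  # exclusive slice end
    start = end - window_length
    if window_length < 1 or start < 0:
        raise IndexError("not enough history for this window")
    return values[start:end]


def value_at(values, period_offset=0):
    """Value of a single period, analogous to slicing with a period_offset."""
    return periodic_window(values, 1, period_offset)[0]


def periodic_high(values, window_length, period_offset=0):
    """Highest value over the window ending at `period_offset`."""
    return max(periodic_window(values, window_length, period_offset))


eps = value_at(EPS_BY_PERIOD)              # latest period: 9.32
previous_eps = value_at(EPS_BY_PERIOD, -1)  # prior period: 9.70
eps_increased = eps > previous_eps          # 9.32 > 9.70 is False

# New high check: compare the latest value to the high as of the prior period.
previous_high = periodic_high(EPS_BY_PERIOD, window_length=4, period_offset=-1)
is_new_high = eps > previous_high           # 9.32 > 9.70 is False
print(eps_increased, is_new_high)
```

Shifting the window with `period_offset=-1` keeps the current period out of the comparison; with `period_offset=0` the current value would be included in its own high and could never exceed it.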
--------------------------------------------------------------------------------
/fundamental_factors/Lesson05-Exploratory-Data-Analysis.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | " \n",
8 | "Disclaimer"
9 | ]
10 | },
11 | {
12 | "attachments": {},
13 | "cell_type": "markdown",
14 | "metadata": {},
15 | "source": [
16 | "***\n",
17 | "[Fundamental Factors](Introduction.ipynb) › Lesson 5: Exploratory Data Analysis\n",
18 | "***"
19 | ]
20 | },
21 | {
22 | "attachments": {},
23 | "cell_type": "markdown",
24 | "metadata": {},
25 | "source": [
26 | "# Exploratory Data Analysis\n",
27 | "\n",
28 | "Pipeline output can be fed into a Zipline strategy or analyzed with Alphalens. However, a useful first step in most cases is to explore the data to get a basic sense of the data's distribution and characteristics. This can often highlight ways that you must massage the data or tweak your computations to achieve the desired results. This process is referred to as Exploratory Data Analysis (EDA).\n"
29 | ]
30 | },
31 | {
32 | "cell_type": "markdown",
33 | "metadata": {},
34 | "source": [
35 | "## Exploring the Profitability Factor\n",
36 | "\n",
37 | "In an earlier lesson, we defined a factor for operating margin, defined as operating income divided by revenue. Operating margin is a measure of profitability. If a company has an operating margin of 10% (0.1), this means that for every dollar of revenue, the company earns 10 cents in operating income.\n",
38 | "\n",
39 | "> Although these lessons will refer to operating margin as the profitability factor, note that operating margin is a different profitability measure than the [Fama-French profitability factor](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2287202). The main differences are that the Fama-French version subtracts interest from revenue while operating margin does not, and the Fama-French version divides the numerator by book equity while operating margin divides by revenue. The choice of operating margin in these lessons is pedagogical and does not constitute an endorsement of one formula over the other. The Fama-French profitability factor can be computed with the following Sharadar fields: `(REVENUE - COR - INTEXP - SGNA) / EQUITY`\n",
40 | "\n",
41 | "\n",
42 | "Suppose we wanted to use Alphalens to assess the suitability of including operating margin in a long-short or long-only strategy. What might our exploratory data analysis look like? \n",
43 | "\n",
44 | "Let's begin by creating a pipeline with this factor, importing our universe filters from lesson 2: "
45 | ]
46 | },
47 | {
48 | "cell_type": "code",
49 | "execution_count": 1,
50 | "metadata": {
51 | "tags": []
52 | },
53 | "outputs": [],
54 | "source": [
55 | "from zipline.pipeline import Pipeline, sharadar\n",
56 | "from codeload.fundamental_factors.universe import CommonStocks, BaseUniverse\n",
57 | "\n",
58 | "fundamentals = sharadar.Fundamentals.slice('ART')\n",
59 | "\n",
60 | "operating_margin = fundamentals.OPINC.latest / fundamentals.REVENUE.latest \n",
61 | "\n",
62 | "pipeline = Pipeline(\n",
63 | " columns={\n",
64 | " 'operating_margin': operating_margin,\n",
65 | " },\n",
66 | " initial_universe=CommonStocks(),\n",
67 | " screen=BaseUniverse()\n",
68 | ")"
69 | ]
70 | },
71 | {
72 | "cell_type": "markdown",
73 | "metadata": {},
74 | "source": [
75 | "Then, we run the pipeline with a year of data. A year (or even less) is often sufficient for the purpose of exploratory data analysis, since the purpose is simply to get a basic understanding of the data distribution and characteristics. In contrast, longer date ranges are beneficial when analyzing factor performance with Alphalens or Zipline, in order to see how the factor performs over time."
76 | ]
77 | },
78 | {
79 | "cell_type": "code",
80 | "execution_count": 2,
81 | "metadata": {},
82 | "outputs": [],
83 | "source": [
84 | "from zipline.research import run_pipeline\n",
85 | "\n",
86 | "results = run_pipeline(pipeline, '2022-01-01', '2022-12-31')"
87 | ]
88 | },
89 | {
90 | "cell_type": "markdown",
91 | "metadata": {},
92 | "source": [
93 | "## Summarizing the data with pandas' `describe()` method\n",
94 | "\n",
95 | "An easy and reliable first step in exploring the data is pandas' `describe()` method, which computes summary statistics for each column in the DataFrame. We will visualize the data distribution with a histogram later, but `describe()` is useful because it works for any kind of data and doesn't require us to decide in advance what kind of plot best suits the data, a task that can be tricky when we don't yet have a basic understanding of the data distribution."
96 | ]
97 | },
98 | {
99 | "cell_type": "code",
100 | "execution_count": 3,
101 | "metadata": {},
102 | "outputs": [
103 | {
104 | "name": "stderr",
105 | "output_type": "stream",
106 | "text": [
107 | "/opt/conda/lib/python3.11/site-packages/numpy/core/_methods.py:49: RuntimeWarning: invalid value encountered in reduce\n",
108 | " return umr_sum(a, axis, dtype, out, keepdims, initial, where)\n"
109 | ]
110 | },
111 | {
112 | "data": {
113 | "text/html": [
114 | "<div>\n",
128 | "<table border=\"1\" class=\"dataframe\">\n",
129 | "  <thead>\n",
130 | "    <tr style=\"text-align: right;\">\n",
131 | "      <th></th>\n",
132 | "      <th>operating_margin</th>\n",
133 | "    </tr>\n",
134 | "  </thead>\n",
135 | "  <tbody>\n",
136 | "    <tr>\n",
137 | "      <th>count</th>\n",
138 | "      <td>1.033654e+06</td>\n",
139 | "    </tr>\n",
140 | "    <tr>\n",
141 | "      <th>mean</th>\n",
142 | "      <td>NaN</td>\n",
143 | "    </tr>\n",
144 | "    <tr>\n",
145 | "      <th>std</th>\n",
146 | "      <td>NaN</td>\n",
147 | "    </tr>\n",
148 | "    <tr>\n",
149 | "      <th>min</th>\n",
150 | "      <td>-inf</td>\n",
151 | "    </tr>\n",
152 | "    <tr>\n",
153 | "      <th>25%</th>\n",
154 | "      <td>-2.414564e-01</td>\n",
155 | "    </tr>\n",
156 | "    <tr>\n",
157 | "      <th>50%</th>\n",
158 | "      <td>6.528241e-02</td>\n",
159 | "    </tr>\n",
160 | "    <tr>\n",
161 | "      <th>75%</th>\n",
162 | "      <td>2.049988e-01</td>\n",
163 | "    </tr>\n",
164 | "    <tr>\n",
165 | "      <th>max</th>\n",
166 | "      <td>inf</td>\n",
167 | "    </tr>\n",
168 | "  </tbody>\n",
169 | "</table>\n",
170 | "</div>"
171 | ],
172 | "text/plain": [
173 | " operating_margin\n",
174 | "count 1.033654e+06\n",
175 | "mean NaN\n",
176 | "std NaN\n",
177 | "min -inf\n",
178 | "25% -2.414564e-01\n",
179 | "50% 6.528241e-02\n",
180 | "75% 2.049988e-01\n",
181 | "max inf"
182 | ]
183 | },
184 | "execution_count": 3,
185 | "metadata": {},
186 | "output_type": "execute_result"
187 | }
188 | ],
189 | "source": [
190 | "results.describe()"
191 | ]
192 | },
193 | {
194 | "cell_type": "markdown",
195 | "metadata": {},
196 | "source": [
197 | "We immediately notice the `NaN` and `inf` values in the `describe()` output. What's going on here? This happens because we are dividing operating income by revenue, and revenue can be 0. Dividing a nonzero value by zero produces positive or negative infinity, which is why max is `inf` and min is `-inf`. The mean and std are then `NaN` because arithmetic that combines `inf` and `-inf` is undefined. A quick check of `describe()` has highlighted a problem that we should correct before going any further. \n",
198 | "\n",
199 | "A simple solution is to use the Factor `where()` method to ignore observations where revenue is 0. The `where()` method takes a Filter (Pipeline's version of a boolean condition) as its first argument, and returns a new Factor with only those values where the Filter is True. A replacement value can be passed as an optional second argument, but if this is omitted, as we do below, the Factor will be masked with `NaN` values where the Filter is False. These `NaN`s will then be ignored in subsequent analyses.\n",
200 | "\n",
201 | "We rewrite the operating margin factor to keep only observations with positive revenue, which drops zero-revenue observations (and any negative revenue values), as follows:"
202 | ]
203 | },
204 | {
205 | "cell_type": "code",
206 | "execution_count": 4,
207 | "metadata": {
208 | "tags": [],
209 | "vscode": {
210 | "languageId": "python"
211 | }
212 | },
213 | "outputs": [],
214 | "source": [
215 | "revenue = fundamentals.REVENUE.latest\n",
216 | "operating_margin = fundamentals.OPINC.latest / revenue.where(revenue > 0) \n",
217 | "\n",
218 | "pipeline = Pipeline(\n",
219 | " columns={\n",
220 | " 'operating_margin': operating_margin,\n",
221 | " },\n",
222 | " initial_universe=CommonStocks(),\n",
223 | " screen=BaseUniverse()\n",
224 | ")"
225 | ]
226 | },
227 | {
228 | "cell_type": "markdown",
229 | "metadata": {},
230 | "source": [
231 | "Re-running the pipeline and `describe()` method, we see that the `NaN` and `inf` values are gone:"
232 | ]
233 | },
234 | {
235 | "cell_type": "code",
236 | "execution_count": 5,
237 | "metadata": {
238 | "tags": [],
239 | "vscode": {
240 | "languageId": "python"
241 | }
242 | },
243 | "outputs": [
244 | {
245 | "data": {
246 | "text/html": [
247 | "<div>\n",
261 | "<table border=\"1\" class=\"dataframe\">\n",
262 | "  <thead>\n",
263 | "    <tr style=\"text-align: right;\">\n",
264 | "      <th></th>\n",
265 | "      <th>operating_margin</th>\n",
266 | "    </tr>\n",
267 | "  </thead>\n",
268 | "  <tbody>\n",
269 | "    <tr>\n",
270 | "      <th>count</th>\n",
271 | "      <td>945562.000000</td>\n",
272 | "    </tr>\n",
273 | "    <tr>\n",
274 | "      <th>mean</th>\n",
275 | "      <td>-40.700615</td>\n",
276 | "    </tr>\n",
277 | "    <tr>\n",
278 | "      <th>std</th>\n",
279 | "      <td>4985.586530</td>\n",
280 | "    </tr>\n",
281 | "    <tr>\n",
282 | "      <th>min</th>\n",
283 | "      <td>-841260.400000</td>\n",
284 | "    </tr>\n",
285 | "    <tr>\n",
286 | "      <th>25%</th>\n",
287 | "      <td>-0.067687</td>\n",
288 | "    </tr>\n",
289 | "    <tr>\n",
290 | "      <th>50%</th>\n",
291 | "      <td>0.081445</td>\n",
292 | "    </tr>\n",
293 | "    <tr>\n",
294 | "      <th>75%</th>\n",
295 | "      <td>0.222025</td>\n",
296 | "    </tr>\n",
297 | "    <tr>\n",
298 | "      <th>max</th>\n",
299 | "      <td>7.001749</td>\n",
300 | "    </tr>\n",
301 | "  </tbody>\n",
302 | "</table>\n",
303 | "</div>"
304 | ],
305 | "text/plain": [
306 | " operating_margin\n",
307 | "count 945562.000000\n",
308 | "mean -40.700615\n",
309 | "std 4985.586530\n",
310 | "min -841260.400000\n",
311 | "25% -0.067687\n",
312 | "50% 0.081445\n",
313 | "75% 0.222025\n",
314 | "max 7.001749"
315 | ]
316 | },
317 | "execution_count": 5,
318 | "metadata": {},
319 | "output_type": "execute_result"
320 | }
321 | ],
322 | "source": [
323 | "results = run_pipeline(pipeline, '2022-01-01', '2022-12-31')\n",
324 | "results.describe()"
325 | ]
326 | },
327 | {
328 | "cell_type": "markdown",
329 | "metadata": {},
330 | "source": [
331 | "What else can we learn from `describe()`? Simplistically, we can think of profit margin as the amount of revenue left over after paying expenses. A company with no revenue left over after expenses would have a profit margin of 0, while a company with no expenses would have a profit margin of 1. But `describe()` reminds us that operating margin is not bounded by 0 and 1. First of all, operating income can be negative, so operating margin can also be negative: a company can spend arbitrarily more than it brings in as revenue. This isn't surprising or unintuitive but it is a useful reminder that our universe currently doesn't just include profitable companies with wider or narrower profit margins but also unprofitable companies that are losing varying amounts of money. Depending on our goals, we may want to include those unprofitable companies in our analysis or exclude them. \n",
332 | "\n",
333 | "A second revelation of `describe()` is more puzzling: operating margin can be greater than 1. This violates the intuitive understanding of profit margin as the amount of revenue left over after paying expenses. How can there be more than 100% of revenue left over after paying expenses? \n",
334 | "\n",
335 | "To see what's going on, we should look at some specific examples. We'll re-run the pipeline, but this time we'll screen for stocks with operating margin greater than 1, and we will include in the output all of the relevant columns that form the basis of operating margin, to help us see where the unexpected result is coming from. To know what columns are relevant, you can start by adding `REVENUE` and `OPINC`, which are the basis of operating margin, then clicking on them in JupyterLab and pressing CTRL to view their definitions, which will show you the names of any underlying columns from which they're derived. `REVENUE` comes directly from the income statement, while `OPINC` is derived from `GP` (gross profit) and `OPEX` (operating expenses). `GP`, in turn, depends on `COR` (cost of revenue).\n",
336 | "\n",
337 | "We run a pipeline with these columns and look at a few of the results:"
338 | ]
339 | },
340 | {
341 | "cell_type": "code",
342 | "execution_count": 6,
343 | "metadata": {
344 | "tags": [],
345 | "vscode": {
346 | "languageId": "python"
347 | }
348 | },
349 | "outputs": [
350 | {
351 | "data": {
352 | "text/html": [
353 | "<div>\n",
367 | "<table>\n",
368 | "  <thead>\n",
369 | "    <tr>\n",
370 | "      <th></th>\n",
371 | "      <th></th>\n",
372 | "      <th>operating_margin</th>\n",
373 | "      <th>revenue</th>\n",
374 | "      <th>operating_income</th>\n",
375 | "      <th>gross_profit</th>\n",
376 | "      <th>cost_of_revenue</th>\n",
377 | "      <th>operating_expenses</th>\n",
378 | "    </tr>\n",
379 | "    <tr>\n",
380 | "      <th>date</th>\n",
381 | "      <th>asset</th>\n",
382 | "      <th></th>\n",
383 | "      <th></th>\n",
384 | "      <th></th>\n",
385 | "      <th></th>\n",
386 | "      <th></th>\n",
387 | "      <th></th>\n",
388 | "    </tr>\n",
389 | "  </thead>\n",
390 | "  <tbody>\n",
391 | "    <tr>\n",
392 | "      <th>2023-01-03</th>\n",
393 | "      <th>Equity(FIBBG000D6DM44 [STRS])</th>\n",
394 | "      <td>7.001749</td>\n",
395 | "      <td>10866000.0</td>\n",
396 | "      <td>76081000.0</td>\n",
397 | "      <td>8507000.0</td>\n",
398 | "      <td>2359000.0</td>\n",
399 | "      <td>-67574000.0</td>\n",
400 | "    </tr>\n",
401 | "    <tr>\n",
402 | "      <th>2022-08-29</th>\n",
403 | "      <th>Equity(FIBBG000D6DM44 [STRS])</th>\n",
404 | "      <td>4.515059</td>\n",
405 | "      <td>16369000.0</td>\n",
406 | "      <td>73907000.0</td>\n",
407 | "      <td>8058000.0</td>\n",
408 | "      <td>8311000.0</td>\n",
409 | "      <td>-65849000.0</td>\n",
410 | "    </tr>\n",
411 | "    <tr>\n",
412 | "      <th>2022-06-03</th>\n",
413 | "      <th>Equity(FIBBG000D6DM44 [STRS])</th>\n",
414 | "      <td>3.936445</td>\n",
415 | "      <td>16820000.0</td>\n",
416 | "      <td>66211000.0</td>\n",
417 | "      <td>3317000.0</td>\n",
418 | "      <td>13503000.0</td>\n",
419 | "      <td>-62894000.0</td>\n",
420 | "    </tr>\n",
421 | "    <tr>\n",
422 | "      <th>2022-04-01</th>\n",
423 | "      <th>Equity(FIBBG000D6DM44 [STRS])</th>\n",
424 | "      <td>2.962884</td>\n",
425 | "      <td>28236000.0</td>\n",
426 | "      <td>83660000.0</td>\n",
427 | "      <td>4024000.0</td>\n",
428 | "      <td>24212000.0</td>\n",
429 | "      <td>-79636000.0</td>\n",
430 | "    </tr>\n",
431 | "    <tr>\n",
432 | "      <th>2022-12-19</th>\n",
433 | "      <th>Equity(FIBBG004HQHKK0 [AMBC])</th>\n",
434 | "      <td>1.517751</td>\n",
435 | "      <td>338000000.0</td>\n",
436 | "      <td>513000000.0</td>\n",
437 | "      <td>694000000.0</td>\n",
438 | "      <td>-356000000.0</td>\n",
439 | "      <td>181000000.0</td>\n",
440 | "    </tr>\n",
441 | "  </tbody>\n",
442 | "</table>\n",
443 | "</div>"
444 | ],
445 | "text/plain": [
446 | " operating_margin ... operating_expenses\n",
447 | "date asset ... \n",
448 | "2023-01-03 Equity(FIBBG000D6DM44 [STRS]) 7.001749 ... -67574000.0\n",
449 | "2022-08-29 Equity(FIBBG000D6DM44 [STRS]) 4.515059 ... -65849000.0\n",
450 | "2022-06-03 Equity(FIBBG000D6DM44 [STRS]) 3.936445 ... -62894000.0\n",
451 | "2022-04-01 Equity(FIBBG000D6DM44 [STRS]) 2.962884 ... -79636000.0\n",
452 | "2022-12-19 Equity(FIBBG004HQHKK0 [AMBC]) 1.517751 ... 181000000.0\n",
453 | "\n",
454 | "[5 rows x 6 columns]"
455 | ]
456 | },
457 | "execution_count": 6,
458 | "metadata": {},
459 | "output_type": "execute_result"
460 | }
461 | ],
462 | "source": [
463 | "pipeline = Pipeline(\n",
464 | " columns={\n",
465 | " 'operating_margin': operating_margin, # operating_margin = OPINC / REVENUE\n",
466 | " 'revenue': fundamentals.REVENUE.latest,\n",
467 | " 'operating_income': fundamentals.OPINC.latest, # OPINC = GP - OPEX\n",
468 | " 'gross_profit': fundamentals.GP.latest, # GP = REVENUE - COR\n",
469 | " 'cost_of_revenue': fundamentals.COR.latest,\n",
470 | " 'operating_expenses': fundamentals.OPEX.latest\n",
471 | " },\n",
472 | " initial_universe=CommonStocks(),\n",
473 | " screen=BaseUniverse() & (operating_margin > 1)\n",
474 | ")\n",
475 | "\n",
476 | "results = run_pipeline(pipeline, '2022-01-01', '2022-12-31')\n",
477 | "results.sort_values('operating_margin', ascending=False).drop_duplicates().head()"
478 | ]
479 | },
480 | {
481 | "cell_type": "markdown",
482 | "metadata": {},
483 | "source": [
484 | "In these examples, operating expenses or cost of revenue is negative, which accounts for operating margin being greater than 1. A negative operating expense or cost of revenue is unexpected and may indicate an unusual one-time accounting adjustment made by the company; further investigation (such as viewing the full report on the SEC website) would be required to determine with certainty. Regardless, for the purpose of our profitability analysis, we probably don't want to treat a company with negative operating expenses or negative cost of revenue as though it were extraordinarily profitable. Therefore, we can refine our profitability factor further by excluding these companies."
485 | ]
486 | },
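To make the arithmetic concrete, the following sketch recomputes operating margin for the STRS row dated 2023-01-03 in the output above, using the relationships noted in the pipeline comments (`GP = REVENUE - COR`, `OPINC = GP - OPEX`). The plain-Python variable names are only for illustration.

```python
# Figures copied from the STRS row dated 2023-01-03 in the pipeline output
revenue = 10_866_000.0
cost_of_revenue = 2_359_000.0
operating_expenses = -67_574_000.0  # negative operating expenses

# GP = REVENUE - COR
gross_profit = revenue - cost_of_revenue

# OPINC = GP - OPEX; subtracting a negative number adds to operating income
operating_income = gross_profit - operating_expenses

# operating_margin = OPINC / REVENUE
operating_margin = operating_income / revenue

print(gross_profit, operating_income, round(operating_margin, 6))
```

Subtracting a negative operating expense pushes operating income well above revenue, which is why the ratio exceeds 1.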
487 | {
488 | "cell_type": "code",
489 | "execution_count": 7,
490 | "metadata": {
491 | "tags": [],
492 | "vscode": {
493 | "languageId": "python"
494 | }
495 | },
496 | "outputs": [],
497 | "source": [
498 | "revenue = fundamentals.REVENUE.latest\n",
499 | "operating_margin = fundamentals.OPINC.latest / revenue.where(revenue > 0) \n",
500 | "\n",
501 | "# exclude companies whose operating expenses or cost of revenue are negative (or zero)\n",
502 | "opex = fundamentals.OPEX.latest\n",
503 | "cor = fundamentals.COR.latest\n",
504 | "operating_margin = operating_margin.where((opex > 0) & (cor > 0))\n",
505 | "\n",
506 | "pipeline = Pipeline(\n",
507 | " columns={\n",
508 | " 'operating_margin': operating_margin,\n",
509 | " },\n",
510 | " initial_universe=CommonStocks(),\n",
511 | " screen=BaseUniverse()\n",
512 | ")"
513 | ]
514 | },
515 | {
516 | "cell_type": "markdown",
517 | "metadata": {},
518 | "source": [
519 | "Re-running the refined pipeline, the output from `describe()` conforms better to expectations, as the maximum operating margin is now slightly below 1.0:"
520 | ]
521 | },
522 | {
523 | "cell_type": "code",
524 | "execution_count": 8,
525 | "metadata": {
526 | "tags": [],
527 | "vscode": {
528 | "languageId": "python"
529 | }
530 | },
531 | "outputs": [
532 | {
533 | "data": {
534 | "text/html": [
535 | "<div>\n",
549 | "<table>\n",
550 | "  <thead>\n",
551 | "    <tr>\n",
552 | "      <th></th>\n",
553 | "      <th>operating_margin</th>\n",
554 | "    </tr>\n",
555 | "  </thead>\n",
556 | "  <tbody>\n",
557 | "    <tr>\n",
558 | "      <th>count</th>\n",
559 | "      <td>778943.000000</td>\n",
560 | "    </tr>\n",
561 | "    <tr>\n",
562 | "      <th>mean</th>\n",
563 | "      <td>-40.017840</td>\n",
564 | "    </tr>\n",
565 | "    <tr>\n",
566 | "      <th>std</th>\n",
567 | "      <td>5476.397958</td>\n",
568 | "    </tr>\n",
569 | "    <tr>\n",
570 | "      <th>min</th>\n",
571 | "      <td>-841260.400000</td>\n",
572 | "    </tr>\n",
573 | "    <tr>\n",
574 | "      <th>25%</th>\n",
575 | "      <td>-0.065510</td>\n",
576 | "    </tr>\n",
577 | "    <tr>\n",
578 | "      <th>50%</th>\n",
579 | "      <td>0.067233</td>\n",
580 | "    </tr>\n",
581 | "    <tr>\n",
582 | "      <th>75%</th>\n",
583 | "      <td>0.165055</td>\n",
584 | "    </tr>\n",
585 | "    <tr>\n",
586 | "      <th>max</th>\n",
587 | "      <td>0.965168</td>\n",
588 | "    </tr>\n",
589 | "  </tbody>\n",
590 | "</table>\n",
591 | "</div>"
592 | ],
593 | "text/plain": [
594 | " operating_margin\n",
595 | "count 778943.000000\n",
596 | "mean -40.017840\n",
597 | "std 5476.397958\n",
598 | "min -841260.400000\n",
599 | "25% -0.065510\n",
600 | "50% 0.067233\n",
601 | "75% 0.165055\n",
602 | "max 0.965168"
603 | ]
604 | },
605 | "execution_count": 8,
606 | "metadata": {},
607 | "output_type": "execute_result"
608 | }
609 | ],
610 | "source": [
611 | "results = run_pipeline(pipeline, '2022-01-01', '2022-12-31')\n",
612 | "results.describe()"
613 | ]
614 | },
615 | {
616 | "cell_type": "markdown",
617 | "metadata": {},
618 | "source": [
619 | "\n",
620 | "## Visualizing the data distribution\n",
621 | "\n",
622 | "Now that we've refined our factor to exclude unusual cases, let's look at a histogram of operating margin to get a better feel for its distribution. We can use pandas's `plot.hist()` method to do so. However, the plot we get for the data on the first try is not very informative, as all the observations are crammed into a single bin: "
623 | ]
624 | },
625 | {
626 | "cell_type": "code",
627 | "execution_count": 9,
628 | "metadata": {
629 | "tags": [],
630 | "vscode": {
631 | "languageId": "python"
632 | }
633 | },
634 | "outputs": [
635 | {
636 | "data": {
637 | "image/png": "",
638 | "text/plain": [
639 | ""
640 | ]
641 | },
642 | "metadata": {},
643 | "output_type": "display_data"
644 | }
645 | ],
646 | "source": [
647 | "results.plot.hist();"
648 | ]
649 | },
650 | {
651 | "cell_type": "markdown",
652 | "metadata": {},
653 | "source": [
654 | "The problem is that negative outliers (companies with extremely negative operating margins) are causing most values to be crammed in a single bin. We can fix this by using the `range` argument to `hist()` to zoom in on the bulk of the distribution. In addition, we'll increase the number of bins to 20, from the default 10."
655 | ]
656 | },
657 | {
658 | "cell_type": "code",
659 | "execution_count": 10,
660 | "metadata": {
661 | "tags": [],
662 | "vscode": {
663 | "languageId": "python"
664 | }
665 | },
666 | "outputs": [
667 | {
668 | "data": {
669 | "image/png": "",
670 | "text/plain": [
671 | ""
672 | ]
673 | },
674 | "metadata": {},
675 | "output_type": "display_data"
676 | }
677 | ],
678 | "source": [
679 | "results.plot.hist(bins=20, range=(-1, 1));"
680 | ]
681 | },
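The effect of a single extreme outlier on equal-width binning can be reproduced with NumPy alone. The sketch below uses synthetic data (an assumption for illustration, not the pipeline output) plus the minimum value reported by `describe()` earlier in the lesson, and compares default binning with binning restricted to a range.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "operating margins": 1,000 ordinary values plus one extreme
# negative outlier (the minimum reported by describe() earlier)
bulk = rng.uniform(-0.5, 0.7, size=1000)
values = np.append(bulk, -841260.4)

# Default binning: 10 equal-width bins spanning the full min..max of the data
counts_default, _ = np.histogram(values)

# Zoomed binning: 20 bins restricted to (-1, 1); values outside are ignored
counts_zoomed, _ = np.histogram(values, bins=20, range=(-1, 1))

print(counts_default)
print(counts_zoomed)
```

With default binning, each bin is tens of thousands of units wide, so every ordinary value lands in the last bin. Restricting the range spreads the same values across many bins, at the cost of silently leaving the outlier out of the plot.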
682 | {
683 | "cell_type": "markdown",
684 | "metadata": {},
685 | "source": [
686 | "## Clipping Outliers\n",
687 | "\n",
688 | "Using `range` to zoom in on the distribution is useful for viewing the histogram, but it doesn't remove the outliers from the pipeline output itself. Before using the pipeline output in Alphalens or Zipline, it is a good idea to adjust the pipeline to deal with the outliers. Beyond a certain point, increasingly negative operating margins don't provide useful additional information; it's enough to know that the company is very unprofitable. So, a reasonable solution is to clip the values at -1, meaning that any value less than -1 is replaced with -1. The Factor `clip()` method requires both a lower and an upper bound. We'll set the upper bound to 1; because the refined factor no longer produces operating margins above 1, this upper bound has no effect here. (The lower and upper bounds can be set to `-np.inf` and `np.inf`, respectively, to indicate no bound.) "
689 | ]
690 | },
691 | {
692 | "cell_type": "code",
693 | "execution_count": 11,
694 | "metadata": {
695 | "tags": [],
696 | "vscode": {
697 | "languageId": "python"
698 | }
699 | },
700 | "outputs": [],
701 | "source": [
702 | "pipeline = Pipeline(\n",
703 | " columns={\n",
704 | " 'operating_margin': operating_margin.clip(min_bound=-1, max_bound=1),\n",
705 | " },\n",
706 | " initial_universe=CommonStocks(),\n",
707 | " screen=BaseUniverse()\n",
708 | ")"
709 | ]
710 | },
711 | {
712 | "cell_type": "markdown",
713 | "metadata": {},
714 | "source": [
715 | "If we re-run the pipeline, we can now plot the histogram without using `range`. Notice that, unlike the previous histogram which ignored data outside the (-1, 1) range, in this histogram the clipped values cluster at -1. In other words, the previous histogram included a subset of the data, while this histogram includes all the data. "
716 | ]
717 | },
718 | {
719 | "cell_type": "code",
720 | "execution_count": 12,
721 | "metadata": {
722 | "tags": [],
723 | "vscode": {
724 | "languageId": "python"
725 | }
726 | },
727 | "outputs": [
728 | {
729 | "data": {
730 | "image/png": "",
731 | "text/plain": [
732 | ""
733 | ]
734 | },
735 | "metadata": {},
736 | "output_type": "display_data"
737 | }
738 | ],
739 | "source": [
740 | "results = run_pipeline(pipeline, '2022-01-01', '2022-12-31')\n",
741 | "results.plot.hist(bins=20);"
742 | ]
743 | },
744 | {
745 | "cell_type": "markdown",
746 | "metadata": {},
747 | "source": [
748 | "An alternative to using `clip()` would be to use `winsorize()`. The difference is that `winsorize()` caps values that fall above or below chosen percentiles of the distribution, while `clip()` caps values that fall above or below fixed bounds. "
749 | ]
750 | },
751 | {
752 | "cell_type": "markdown",
753 | "metadata": {},
754 | "source": [
755 | "***\n",
756 | "\n",
757 | "## *Next Up*\n",
758 | "\n",
759 | "Lesson 6: [Alphalens: Profitability](Lesson06-Profitability.ipynb)"
760 | ]
761 | }
762 | ],
763 | "metadata": {
764 | "kernelspec": {
765 | "display_name": "Python 3.11",
766 | "language": "python",
767 | "name": "python3"
768 | },
769 | "language_info": {
770 | "codemirror_mode": {
771 | "name": "ipython",
772 | "version": 3
773 | },
774 | "file_extension": ".py",
775 | "mimetype": "text/x-python",
776 | "name": "python",
777 | "nbconvert_exporter": "python",
778 | "pygments_lexer": "ipython3",
779 | "version": "3.11.0"
780 | }
781 | },
782 | "nbformat": 4,
783 | "nbformat_minor": 4
784 | }
785 |
--------------------------------------------------------------------------------
/fundamental_factors/Lesson08-Factor-Values-vs-Factor-Ranks.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | " \n",
8 | "Disclaimer"
9 | ]
10 | },
11 | {
12 | "attachments": {},
13 | "cell_type": "markdown",
14 | "metadata": {},
15 | "source": [
16 | "***\n",
17 | "[Fundamental Factors](Introduction.ipynb) › Lesson 8: Factor Values vs Factor Ranks\n",
18 | "***"
19 | ]
20 | },
21 | {
22 | "attachments": {},
23 | "cell_type": "markdown",
24 | "metadata": {},
25 | "source": [
26 | "# Factor Values vs Factor Ranks\n",
27 | "\n",
28 | "When using a factor in Pipeline, one choice you must make is whether to use the raw factor values or to rank the values and use the ranks as your factor. Before explaining why you might choose one or the other, let's see how to rank a factor in Pipeline. \n",
29 | "\n",
30 | "In the following example, we look at the debt-to-equity ratio, which is the ratio of a company's total liabilities to its shareholder equity. The D/E ratio is a measure of a company's financial leverage, indicating the degree to which a company's operations are funded by debt. Ranking by D/E ratio in Pipeline is a simple matter of using the `rank()` method. However, note the use of `where()` to mask the D/E ratio with our base universe before calling `rank()`. We do this to avoid including securities that aren't part of our universe in the ranks. (Equivalently, we could have passed our universe as the `mask` argument to `rank()`.)"
31 | ]
32 | },
33 | {
34 | "cell_type": "code",
35 | "execution_count": 1,
36 | "metadata": {
37 | "tags": []
38 | },
39 | "outputs": [],
40 | "source": [
41 | "from zipline.pipeline import sharadar, Pipeline\n",
42 | "from codeload.fundamental_factors.universe import CommonStocks, BaseUniverse\n",
43 | "\n",
44 | "universe = BaseUniverse()\n",
45 | "\n",
46 | "fundamentals = sharadar.Fundamentals.slice('ART')\n",
47 | "\n",
48 | "de = fundamentals.DE.latest\n",
49 | "\n",
50 | "# Mask d/e with the base universe, before using rank() below \n",
51 | "de = de.where(universe)\n",
52 | "\n",
53 | "pipeline = Pipeline(\n",
54 | " columns={\n",
55 | " 'de': de,\n",
56 | " 'de_rank': de.rank(),\n",
57 | " },\n",
58 | " initial_universe=CommonStocks(),\n",
59 | " screen=universe\n",
60 | ")"
61 | ]
62 | },
63 | {
64 | "cell_type": "markdown",
65 | "metadata": {},
66 | "source": [
67 | "Is it better to use factor ranks or factor values? It depends on what you plan to do with your pipeline output. \n",
68 | "\n",
69 | "There are times when you care about the actual factor values, perhaps because you want to examine their distribution or because you want to bin the values using cutoffs that are meaningful for the specific factor. In those cases, ranking may not be helpful. For example, if you wanted to classify D/E ratios below 1 as \"safe\", D/E ratios between 1 and 2 as \"borderline\", and D/E ratios above 2 as \"risky\", you would need the actual factor values, not the ranks. \n",
70 | "\n",
71 | "An example where ranking can be helpful is when looking at the long-short, factor-weighted cumulative returns plot in Alphalens. \"Factor-weighted\" means that the asset weights are proportional to the factor value. This makes the plot sensitive to outliers, as assets with extreme positive or negative values will have extreme positive or negative weights. This situation can cause the factor-weighted cumulative returns plot to show extreme up and down moves. Using factor ranks instead of factor values can mitigate this problem. The outliers will still have larger weights, but much more moderately so, since the weights will be proportional to the ranks (which are evenly spaced), rather than to the values on which the ranks are based.\n",
72 | "\n",
73 | "To illustrate the different weighting that can occur from using factor values vs factor ranks, let's create a two-column pipeline that computes weights based on the D/E ratio and the D/E ratio rank: "
74 | ]
75 | },
76 | {
77 | "cell_type": "code",
78 | "execution_count": 2,
79 | "metadata": {
80 | "tags": []
81 | },
82 | "outputs": [],
83 | "source": [
84 | "pipeline = Pipeline(\n",
85 | " columns={\n",
86 | " 'factor_weighted': de / de.sum(),\n",
87 | " 'rank_weighted': de.rank() / de.rank().sum()\n",
88 | " },\n",
89 | " initial_universe=CommonStocks(),\n",
90 | " screen=universe\n",
91 | ")"
92 | ]
93 | },
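As a rough pandas analogue of the two pipeline columns above, the sketch below computes both weighting schemes for five hypothetical D/E ratios, one of which is an extreme outlier. The data and labels are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical D/E ratios for five companies; E is an extreme outlier
de = pd.Series([0.5, 1.0, 1.5, 2.0, 45.0], index=list("ABCDE"))

# Weights proportional to the raw factor values
factor_weighted = de / de.sum()

# Weights proportional to the ranks of the factor values
rank_weighted = de.rank() / de.rank().sum()

weights = pd.DataFrame({
    "factor_weighted": factor_weighted,
    "rank_weighted": rank_weighted,
})
print(weights)
print(weights.max())
```

Both columns sum to 1, but the outlier dominates the value-weighted column while receiving only a modestly larger share under rank weighting.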
94 | {
95 | "cell_type": "markdown",
96 | "metadata": {},
97 | "source": [
98 | "If we run this pipeline and plot the maximum weight in each column, we see that weighting on raw factors can result in much more concentrated portfolios than weighting on ranks:"
99 | ]
100 | },
101 | {
102 | "cell_type": "code",
103 | "execution_count": 3,
104 | "metadata": {
105 | "tags": []
106 | },
107 | "outputs": [
108 | {
109 | "data": {
110 | "image/png": "",
111 | "text/plain": [
112 | ""
113 | ]
114 | },
115 | "metadata": {},
116 | "output_type": "display_data"
117 | }
118 | ],
119 | "source": [
120 | "from zipline.research import run_pipeline\n",
121 | "\n",
122 | "results = run_pipeline(pipeline, '2022-12-30', '2022-12-30')\n",
123 | "\n",
124 | "results.max().plot(kind=\"barh\", title=\"Maximum weights\");"
125 | ]
126 | },
127 | {
128 | "cell_type": "markdown",
129 | "metadata": {},
130 | "source": [
131 | "## Don't confuse yourself\n",
132 | "\n",
133 | "You can rank values in ascending order (the default) or in descending order (`rank(ascending=False)`). Whichever you choose, it's worth making a mental note of where \"good\" and \"bad\" values will end up in the ranked results, as it's easy to get confused. For some metrics, like D/E ratio, small numbers are \"good\" and large numbers are \"bad,\" while for other metrics, like return on equity, small numbers are \"bad\" and large numbers are \"good.\" Ranking in ascending order puts the small numbers first (ranks 1, 2, 3, etc.), while ranking in descending order puts the large numbers first. To avoid confusion, you may find it helpful to consistently choose the ranking order that will put the \"good\" numbers first."
134 | ]
135 | },
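The ranking-direction convention can be checked on a toy example. The following pandas sketch uses hypothetical values and pandas' own `rank()` as a stand-in for the Pipeline method; it ranks each metric in the direction that puts the "good" values first.

```python
import pandas as pd

# Hypothetical metrics for three companies
de = pd.Series([0.3, 2.5, 1.1], index=["A", "B", "C"])      # lower is "good"
roe = pd.Series([0.05, 0.25, 0.12], index=["A", "B", "C"])  # higher is "good"

# Ascending order (the default): the smallest D/E ratio gets rank 1
de_rank = de.rank(ascending=True)

# Descending order: the largest return on equity gets rank 1
roe_rank = roe.rank(ascending=False)

print(pd.DataFrame({"de": de, "de_rank": de_rank, "roe": roe, "roe_rank": roe_rank}))
```

With this convention, rank 1 always marks the most favorable company for each metric, regardless of whether small or large raw values are desirable.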
136 | {
137 | "cell_type": "markdown",
138 | "metadata": {},
139 | "source": [
140 | "***\n",
141 | "\n",
142 | "## *Next Up*\n",
143 | "\n",
144 | "Lesson 9: [Sector Neutralization](Lesson09-Sector-Neutralization.ipynb)"
145 | ]
146 | }
147 | ],
148 | "metadata": {
149 | "kernelspec": {
150 | "display_name": "Python 3.11",
151 | "language": "python",
152 | "name": "python3"
153 | },
154 | "language_info": {
155 | "codemirror_mode": {
156 | "name": "ipython",
157 | "version": 3
158 | },
159 | "file_extension": ".py",
160 | "mimetype": "text/x-python",
161 | "name": "python",
162 | "nbconvert_exporter": "python",
163 | "pygments_lexer": "ipython3",
164 | "version": "3.11.0"
165 | }
166 | },
167 | "nbformat": 4,
168 | "nbformat_minor": 4
169 | }
170 |
--------------------------------------------------------------------------------
/fundamental_factors/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/quantrocket-codeload/fundamental-factors/0c9c2c6d6c2979446c29d32584dbd06c7b38dfc3/fundamental_factors/__init__.py
--------------------------------------------------------------------------------
/fundamental_factors/universe.py:
--------------------------------------------------------------------------------
1 | # Copyright 2024 QuantRocket LLC - All Rights Reserved
2 | #
3 | # Licensed under the Apache License, Version 2.0 (the "License");
4 | # you may not use this file except in compliance with the License.
5 | # You may obtain a copy of the License at
6 | #
7 | # http://www.apache.org/licenses/LICENSE-2.0
8 | #
9 | # Unless required by applicable law or agreed to in writing, software
10 | # distributed under the License is distributed on an "AS IS" BASIS,
11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12 | # See the License for the specific language governing permissions and
13 | # limitations under the License.
14 |
15 | from zipline.pipeline import master, EquityPricing
16 |
17 | def CommonStocks():
18 | """
19 | Return a Pipeline filter that is limited to domestic common stocks, excluding
20 | secondary shares. This filter is intended to be used as the initial_universe
21 | of the Pipeline.
22 | """
23 | category = master.SecuritiesMaster.sharadar_Category.latest
24 | common_stocks = (
25 | # domestic common stocks
26 | category.has_substring("Domestic Common")
27 | # no secondary shares
28 | & ~category.has_substring("Secondary")
29 | )
30 | return common_stocks
31 |
32 | def BaseUniverse():
33 | """
34 | Return a Pipeline filter that excludes illiquid stocks and penny stocks.
35 | This filter is intended to be used as the screen of the Pipeline or the
36 | mask of factors in the Pipeline.
37 | """
38 | base_universe = (EquityPricing.volume.latest > 0).all(21)
39 | base_universe = (EquityPricing.close.latest > 1.00).all(21, mask=base_universe)
40 |
41 | return base_universe
42 |
--------------------------------------------------------------------------------