├── .qrignore
├── LICENSE.txt
├── README.md
└── pipeline_tutorial
├── Introduction.ipynb
├── Lesson01-Data-Collection.ipynb
├── Lesson02-Why-Pipeline.ipynb
├── Lesson03-Creating-Pipeline.ipynb
├── Lesson04-Factors.ipynb
├── Lesson05-Combining-Factors.ipynb
├── Lesson06-Filters.ipynb
├── Lesson07-Combining-Filters.ipynb
├── Lesson08-Masking.ipynb
├── Lesson09-Classifiers.ipynb
├── Lesson10-Datasets.ipynb
├── Lesson11-Custom-Factors.ipynb
├── Lesson12-Initial-Universe.ipynb
├── Lesson13-TradableStocksUS-Universe.ipynb
├── Lesson14-Alphalens.ipynb
├── Lesson15-Data-Browser-Integration.ipynb
├── __init__.py
└── tradable_stocks.py
/.qrignore:
--------------------------------------------------------------------------------
1 | README.md
2 | LICENSE.txt
3 |
--------------------------------------------------------------------------------
/LICENSE.txt:
--------------------------------------------------------------------------------
1 | Apache License
2 | Version 2.0, January 2004
3 | http://www.apache.org/licenses/
4 |
5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
6 |
7 | 1. Definitions.
8 |
9 | "License" shall mean the terms and conditions for use, reproduction, and distribution as defined by Sections 1 through 9 of this document.
10 |
11 | "Licensor" shall mean the copyright owner or entity authorized by the copyright owner that is granting the License.
12 |
13 | "Legal Entity" shall mean the union of the acting entity and all other entities that control, are controlled by, or are under common control with that entity. For the purposes of this definition, "control" means (i) the power, direct or indirect, to cause the direction or management of such entity, whether by contract or otherwise, or (ii) ownership of fifty percent (50%) or more of the outstanding shares, or (iii) beneficial ownership of such entity.
14 |
15 | "You" (or "Your") shall mean an individual or Legal Entity exercising permissions granted by this License.
16 |
17 | "Source" form shall mean the preferred form for making modifications, including but not limited to software source code, documentation source, and configuration files.
18 |
19 | "Object" form shall mean any form resulting from mechanical transformation or translation of a Source form, including but not limited to compiled object code, generated documentation, and conversions to other media types.
20 |
21 | "Work" shall mean the work of authorship, whether in Source or Object form, made available under the License, as indicated by a copyright notice that is included in or attached to the work (an example is provided in the Appendix below).
22 |
23 | "Derivative Works" shall mean any work, whether in Source or Object form, that is based on (or derived from) the Work and for which the editorial revisions, annotations, elaborations, or other modifications represent, as a whole, an original work of authorship. For the purposes of this License, Derivative Works shall not include works that remain separable from, or merely link (or bind by name) to the interfaces of, the Work and Derivative Works thereof.
24 |
25 | "Contribution" shall mean any work of authorship, including the original version of the Work and any modifications or additions to that Work or Derivative Works thereof, that is intentionally submitted to Licensor for inclusion in the Work by the copyright owner or by an individual or Legal Entity authorized to submit on behalf of the copyright owner. For the purposes of this definition, "submitted" means any form of electronic, verbal, or written communication sent to the Licensor or its representatives, including but not limited to communication on electronic mailing lists, source code control systems, and issue tracking systems that are managed by, or on behalf of, the Licensor for the purpose of discussing and improving the Work, but excluding communication that is conspicuously marked or otherwise designated in writing by the copyright owner as "Not a Contribution."
26 |
27 | "Contributor" shall mean Licensor and any individual or Legal Entity on behalf of whom a Contribution has been received by Licensor and subsequently incorporated within the Work.
28 |
29 | 2. Grant of Copyright License.
30 |
31 | Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare Derivative Works of, publicly display, publicly perform, sublicense, and distribute the Work and such Derivative Works in Source or Object form.
32 |
33 | 3. Grant of Patent License.
34 |
35 | Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except as stated in this section) patent license to make, have made, use, offer to sell, sell, import, and otherwise transfer the Work, where such license applies only to those patent claims licensable by such Contributor that are necessarily infringed by their Contribution(s) alone or by combination of their Contribution(s) with the Work to which such Contribution(s) was submitted. If You institute patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Work or a Contribution incorporated within the Work constitutes direct or contributory patent infringement, then any patent licenses granted to You under this License for that Work shall terminate as of the date such litigation is filed.
36 |
37 | 4. Redistribution.
38 |
39 | You may reproduce and distribute copies of the Work or Derivative Works thereof in any medium, with or without modifications, and in Source or Object form, provided that You meet the following conditions:
40 |
41 | You must give any other recipients of the Work or Derivative Works a copy of this License; and
42 | You must cause any modified files to carry prominent notices stating that You changed the files; and
43 | You must retain, in the Source form of any Derivative Works that You distribute, all copyright, patent, trademark, and attribution notices from the Source form of the Work, excluding those notices that do not pertain to any part of the Derivative Works; and
44 | If the Work includes a "NOTICE" text file as part of its distribution, then any Derivative Works that You distribute must include a readable copy of the attribution notices contained within such NOTICE file, excluding those notices that do not pertain to any part of the Derivative Works, in at least one of the following places: within a NOTICE text file distributed as part of the Derivative Works; within the Source form or documentation, if provided along with the Derivative Works; or, within a display generated by the Derivative Works, if and wherever such third-party notices normally appear. The contents of the NOTICE file are for informational purposes only and do not modify the License. You may add Your own attribution notices within Derivative Works that You distribute, alongside or as an addendum to the NOTICE text from the Work, provided that such additional attribution notices cannot be construed as modifying the License.
45 | You may add Your own copyright statement to Your modifications and may provide additional or different license terms and conditions for use, reproduction, or distribution of Your modifications, or for any such Derivative Works as a whole, provided Your use, reproduction, and distribution of the Work otherwise complies with the conditions stated in this License.
46 |
47 | 5. Submission of Contributions.
48 |
49 | Unless You explicitly state otherwise, any Contribution intentionally submitted for inclusion in the Work by You to the Licensor shall be under the terms and conditions of this License, without any additional terms or conditions. Notwithstanding the above, nothing herein shall supersede or modify the terms of any separate license agreement you may have executed with Licensor regarding such Contributions.
50 |
51 | 6. Trademarks.
52 |
53 | This License does not grant permission to use the trade names, trademarks, service marks, or product names of the Licensor, except as required for reasonable and customary use in describing the origin of the Work and reproducing the content of the NOTICE file.
54 |
55 | 7. Disclaimer of Warranty.
56 |
57 | Unless required by applicable law or agreed to in writing, Licensor provides the Work (and each Contributor provides its Contributions) on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are solely responsible for determining the appropriateness of using or redistributing the Work and assume any risks associated with Your exercise of permissions under this License.
58 |
59 | 8. Limitation of Liability.
60 |
61 | In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, shall any Contributor be liable to You for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising as a result of this License or out of the use or inability to use the Work (including but not limited to damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses), even if such Contributor has been advised of the possibility of such damages.
62 |
63 | 9. Accepting Warranty or Additional Liability.
64 |
65 | While redistributing the Work or Derivative Works thereof, You may choose to offer, and charge a fee for, acceptance of support, warranty, indemnity, or other liability obligations and/or rights consistent with this License. However, in accepting such obligations, You may act only on Your own behalf and on Your sole responsibility, not on behalf of any other Contributor, and only if You agree to indemnify, defend, and hold each Contributor harmless for any liability incurred by, or claims asserted against, such Contributor by reason of your accepting any such warranty or additional liability.
66 |
67 | END OF TERMS AND CONDITIONS
68 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # pipeline-tutorial
2 |
3 | In-depth walkthrough of Pipeline, an API for filtering and performing computations on large universes of securities. The Pipeline API is part of Zipline but can also be used on a standalone basis.
4 |
5 | ## Clone in QuantRocket
6 |
7 | CLI:
8 |
9 | ```shell
10 | quantrocket codeload clone 'pipeline-tutorial'
11 | ```
12 |
13 | Python:
14 |
15 | ```python
16 | from quantrocket.codeload import clone
17 | clone("pipeline-tutorial")
18 | ```
19 |
20 | ## Browse in GitHub
21 |
22 | Start here: [pipeline_tutorial/Introduction.ipynb](pipeline_tutorial/Introduction.ipynb)
23 |
24 | ***
25 |
26 | Find more code in QuantRocket's [Code Library](https://www.quantrocket.com/code/)
27 |
--------------------------------------------------------------------------------
/pipeline_tutorial/Introduction.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "
\n",
8 | "Disclaimer"
9 | ]
10 | },
11 | {
12 | "cell_type": "markdown",
13 | "metadata": {},
14 | "source": [
15 | "# Pipeline API Tutorial\n",
16 | "\n",
17 | "This tutorial provides an in-depth walkthrough of Pipeline, an API for filtering and performing computations on large universes of securities. The Pipeline API is part of Zipline but can also be used on a standalone basis. Before completing this tutorial, it is recommended that you start with the `zipline-intro` tutorial in the [Code Library](https://www.quantrocket.com/code/?filter=zipline), which provides a more basic introduction to Zipline and the Pipeline API.\n",
18 | "\n",
19 | "Lessons 2-11 were originally created by Quantopian Inc. and released under a Creative Commons license ([GitHub source repo](https://github.com/quantopian/research_public/tree/3bd730f292aa76d7c546f9a400583399030c8f65)). They have been adapted for QuantRocket. \n",
20 | " \n",
21 | "* Lesson 1: [Data Collection](Lesson01-Data-Collection.ipynb)\n",
22 | "* Lesson 2: [Why Pipeline?](Lesson02-Why-Pipeline.ipynb)\n",
23 | "* Lesson 3: [Creating a Pipeline](Lesson03-Creating-Pipeline.ipynb)\n",
24 | "* Lesson 4: [Factors](Lesson04-Factors.ipynb)\n",
25 | "* Lesson 5: [Combining Factors](Lesson05-Combining-Factors.ipynb)\n",
26 | "* Lesson 6: [Filters](Lesson06-Filters.ipynb)\n",
27 | "* Lesson 7: [Combining Filters](Lesson07-Combining-Filters.ipynb)\n",
28 | "* Lesson 8: [Masking](Lesson08-Masking.ipynb)\n",
29 | "* Lesson 9: [Classifiers](Lesson09-Classifiers.ipynb)\n",
30 | "* Lesson 10: [Datasets](Lesson10-Datasets.ipynb)\n",
31 | "* Lesson 11: [Custom Factors](Lesson11-Custom-Factors.ipynb)\n",
32 | "* Lesson 12: [Initial Universe](Lesson12-Initial-Universe.ipynb)\n",
33 | "* Lesson 13: [Creating the TradableStocksUS Universe](Lesson13-TradableStocksUS-Universe.ipynb)\n",
34 | "* Lesson 14: [Using Pipeline with Alphalens](Lesson14-Alphalens.ipynb)\n",
35 | "* Lesson 15: [Browsing Pipeline Output in the Data Browser](Lesson15-Data-Browser-Integration.ipynb) "
36 | ]
37 | },
38 | {
39 | "cell_type": "markdown",
40 | "metadata": {},
41 | "source": [
42 | "## License \n",
43 | "\n",
44 | "© Lessons 2-11 Copyright Quantopian, Inc.
\n",
45 | "© Lessons 2-11 Modifications Copyright QuantRocket LLC
\n",
46 | "© Other Lessons Copyright QuantRocket LLC\n",
47 | "\n",
48 | "Notebooks licensed under the [Creative Commons Attribution 4.0](https://creativecommons.org/licenses/by/4.0/legalcode).
\n",
49 | "Python files licensed under the Apache 2.0 license."
50 | ]
51 | }
52 | ],
53 | "metadata": {
54 | "kernelspec": {
55 | "display_name": "Python 3.11",
56 | "language": "python",
57 | "name": "python3"
58 | },
59 | "language_info": {
60 | "codemirror_mode": {
61 | "name": "ipython",
62 | "version": 3
63 | },
64 | "file_extension": ".py",
65 | "mimetype": "text/x-python",
66 | "name": "python",
67 | "nbconvert_exporter": "python",
68 | "pygments_lexer": "ipython3",
69 | "version": "3.11.0"
70 | }
71 | },
72 | "nbformat": 4,
73 | "nbformat_minor": 4
74 | }
75 |
--------------------------------------------------------------------------------
/pipeline_tutorial/Lesson01-Data-Collection.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "
\n",
8 | "Disclaimer"
9 | ]
10 | },
11 | {
12 | "cell_type": "markdown",
13 | "metadata": {},
14 | "source": [
15 | "***\n",
16 | "[Pipeline Tutorial](Introduction.ipynb) › Lesson 1: Data Collection\n",
17 | "***"
18 | ]
19 | },
20 | {
21 | "cell_type": "markdown",
22 | "metadata": {},
23 | "source": [
24 | "# Data Collection\n",
25 | "\n",
26 | "This tutorial uses the US Stock learning bundle, which provides daily prices for all US stocks for the years 2007-2011.\n",
27 | "\n",
28 | "Create an empty bundle called 'usstock-learn-1d':"
29 | ]
30 | },
31 | {
32 | "cell_type": "code",
33 | "execution_count": 1,
34 | "metadata": {},
35 | "outputs": [
36 | {
37 | "data": {
38 | "text/plain": [
39 | "{'status': 'success', 'msg': 'successfully created usstock-learn-1d bundle'}"
40 | ]
41 | },
42 | "execution_count": 1,
43 | "metadata": {},
44 | "output_type": "execute_result"
45 | }
46 | ],
47 | "source": [
48 | "from quantrocket.zipline import create_usstock_bundle\n",
49 | "create_usstock_bundle(\"usstock-learn-1d\", learn=True)"
50 | ]
51 | },
52 | {
53 | "cell_type": "markdown",
54 | "metadata": {},
55 | "source": [
56 | "Then ingest the data:"
57 | ]
58 | },
59 | {
60 | "cell_type": "code",
61 | "execution_count": 2,
62 | "metadata": {},
63 | "outputs": [
64 | {
65 | "data": {
66 | "text/plain": [
67 | "{'status': 'the data will be ingested asynchronously'}"
68 | ]
69 | },
70 | "execution_count": 2,
71 | "metadata": {},
72 | "output_type": "execute_result"
73 | }
74 | ],
75 | "source": [
76 | "from quantrocket.zipline import ingest_bundle\n",
77 | "ingest_bundle(\"usstock-learn-1d\")"
78 | ]
79 | },
80 | {
81 | "cell_type": "markdown",
82 | "metadata": {},
83 | "source": [
84 | "This dataset only takes a second or two to load. Use flightlog to confirm completion:\n",
85 | "\n",
86 | "```\n",
87 | "quantrocket.zipline: INFO [usstock-learn-1d] Completed ingesting daily history for all stocks from 2007-2011\n",
88 | "```"
89 | ]
90 | },
91 | {
92 | "cell_type": "markdown",
93 | "metadata": {},
94 | "source": [
95 | "---\n",
96 | "\n",
97 | "**Next Lesson:** [Why Pipeline?](Lesson02-Why-Pipeline.ipynb) "
98 | ]
99 | }
100 | ],
101 | "metadata": {
102 | "kernelspec": {
103 | "display_name": "Python 3.11",
104 | "language": "python",
105 | "name": "python3"
106 | },
107 | "language_info": {
108 | "codemirror_mode": {
109 | "name": "ipython",
110 | "version": 3
111 | },
112 | "file_extension": ".py",
113 | "mimetype": "text/x-python",
114 | "name": "python",
115 | "nbconvert_exporter": "python",
116 | "pygments_lexer": "ipython3",
117 | "version": "3.11.0"
118 | }
119 | },
120 | "nbformat": 4,
121 | "nbformat_minor": 4
122 | }
123 |
--------------------------------------------------------------------------------
/pipeline_tutorial/Lesson02-Why-Pipeline.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "
\n",
8 | "Disclaimer"
9 | ]
10 | },
11 | {
12 | "cell_type": "markdown",
13 | "metadata": {},
14 | "source": [
15 | "***\n",
16 | "[Pipeline Tutorial](Introduction.ipynb) › Lesson 2: Why Pipeline?\n",
17 | "***"
18 | ]
19 | },
20 | {
21 | "cell_type": "markdown",
22 | "metadata": {},
23 | "source": [
24 | "# Why Pipeline?\n",
25 | "\n",
26 | "Many trading algorithms have the following structure:\n",
27 | "\n",
28 | "1. For each asset in a known (large) set, compute N scalar values for the asset based on a trailing window of data.\n",
29 | "2. Select a smaller tradeable set of assets based on the values computed in (1).\n",
30 | "3. Calculate desired portfolio weights on the set of assets selected in (2).\n",
31 | "4. Place orders to move the algorithm's current portfolio allocations to the desired weights computed in (3).\n",
32 | "\n",
33 | "There are several technical challenges with doing this robustly. These include:\n",
34 | "\n",
35 | "* efficiently querying large sets of assets\n",
36 | "* performing computations on large sets of assets\n",
37 | "* handling adjustments (splits and dividends)\n",
38 | "* asset delistings\n",
39 | "\n",
40 | "Pipeline exists to solve these challenges by providing a uniform API for expressing computations on a diverse collection of datasets."
41 | ]
42 | },
43 | {
44 | "cell_type": "markdown",
45 | "metadata": {},
46 | "source": [
47 | "## Research Notebooks vs Zipline Backtests\n",
48 | "An ideal algorithm design workflow involves a research phase and an implementation phase. In the research phase, we can interact with data or quickly iterate on different ideas in a notebook. Algorithms are then implemented in Zipline where they can be backtested.\n",
49 | "\n",
50 | "One feature of the Pipeline API is that constructing a pipeline is identical in a research notebook and in a Zipline algorithm. The only difference between using pipeline in the two environments is how it is run. This makes it easy to iterate on a pipeline design in research and then move it with a simple copy paste to a Zipline algorithm. "
51 | ]
52 | },
53 | {
54 | "cell_type": "markdown",
55 | "metadata": {},
56 | "source": [
57 | "## Computations\n",
58 | "There are three types of computations that can be expressed in a pipeline: factors, filters, and classifiers.\n",
59 | "Abstractly, factors, filters, and classifiers all represent functions that produce a value from an asset and a moment in time. Factors, filters, and classifiers are distinguished by the types of values they produce."
60 | ]
61 | },
62 | {
63 | "cell_type": "markdown",
64 | "metadata": {},
65 | "source": [
66 | "### Factors\n",
67 | "A factor is a function from an asset and a moment in time to a numerical value.\n",
68 | "A simple example of a factor is the most recent price of a security. Given a security and a specific point in time, the most recent price is a number. Another example is the 10-day average trading volume of a security. Factors are most commonly used to assign values to securities which can then be used in a number of ways. A factor can be used in each of the following procedures:\n",
69 | "\n",
70 | "* computing target weights\n",
71 | "* generating alpha signal\n",
72 | "* constructing other, more complex factors\n",
73 | "* constructing filters"
74 | ]
75 | },
76 | {
77 | "cell_type": "markdown",
78 | "metadata": {},
79 | "source": [
80 | "### Filters\n",
81 | "A filter is a function from an asset and a moment in time to a boolean.\n",
82 | "An example of a filter is a function indicating whether a security's price is below \\$10. Given a security and a point in time, this evaluates to either True or False. Filters are most commonly used for describing sets of assets to include or exclude for some particular purpose.\n"
83 | ]
84 | },
85 | {
86 | "cell_type": "markdown",
87 | "metadata": {},
88 | "source": [
89 | "### Classifiers\n",
90 | "A classifier is a function from an asset and a moment in time to a categorical output.\n",
91 | "More specifically, a classifier produces a string or an int that doesn't represent a numerical value (e.g. an integer label such as a sector code). Classifiers are most commonly used for grouping assets for complex transformations on Factor outputs. An example of a classifier is the exchange on which an asset is currently being traded."
92 | ]
93 | },
94 | {
95 | "cell_type": "markdown",
96 | "metadata": {},
97 | "source": [
98 | "## Datasets\n",
99 | "Pipeline computations can be performed using a variety of data such as pricing (OHLC) and volume data, fundamental data, and securities master data. We will explore these datasets in later lessons.\n",
100 | "\n",
101 | "A typical pipeline usually involves multiple computations and datasets. In this tutorial, we will build up to a pipeline that selects liquid securities with large changes between their 10-day and 30-day average prices."
102 | ]
103 | },
104 | {
105 | "cell_type": "markdown",
106 | "metadata": {},
107 | "source": [
108 | "---\n",
109 | "\n",
110 | "**Next Lesson:** [Creating a Pipeline](Lesson03-Creating-Pipeline.ipynb) "
111 | ]
112 | }
113 | ],
114 | "metadata": {
115 | "kernelspec": {
116 | "display_name": "Python 3.11",
117 | "language": "python",
118 | "name": "python3"
119 | },
120 | "language_info": {
121 | "codemirror_mode": {
122 | "name": "ipython",
123 | "version": 3
124 | },
125 | "file_extension": ".py",
126 | "mimetype": "text/x-python",
127 | "name": "python",
128 | "nbconvert_exporter": "python",
129 | "pygments_lexer": "ipython3",
130 | "version": "3.11.0"
131 | }
132 | },
133 | "nbformat": 4,
134 | "nbformat_minor": 4
135 | }
136 |
--------------------------------------------------------------------------------
/pipeline_tutorial/Lesson03-Creating-Pipeline.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "
\n",
8 | "Disclaimer"
9 | ]
10 | },
11 | {
12 | "cell_type": "markdown",
13 | "metadata": {},
14 | "source": [
15 | "***\n",
16 | "[Pipeline Tutorial](Introduction.ipynb) › Lesson 3: Creating a Pipeline\n",
17 | "***"
18 | ]
19 | },
20 | {
21 | "cell_type": "markdown",
22 | "metadata": {},
23 | "source": [
24 | "# Creating a Pipeline\n",
25 | "\n",
26 | "In this lesson, we will take a look at creating an empty pipeline. First, let's import the Pipeline class:"
27 | ]
28 | },
29 | {
30 | "cell_type": "code",
31 | "execution_count": 1,
32 | "metadata": {},
33 | "outputs": [],
34 | "source": [
35 | "from zipline.pipeline import Pipeline"
36 | ]
37 | },
38 | {
39 | "cell_type": "markdown",
40 | "metadata": {},
41 | "source": [
42 | "In a new cell, let's define a function to create our pipeline. Wrapping our pipeline creation in a function sets up a structure for more complex pipelines that we will see later on. For now, this function simply returns an empty pipeline:"
43 | ]
44 | },
45 | {
46 | "cell_type": "code",
47 | "execution_count": 2,
48 | "metadata": {},
49 | "outputs": [],
50 | "source": [
51 | "def make_pipeline():\n",
52 | " return Pipeline()"
53 | ]
54 | },
55 | {
56 | "cell_type": "markdown",
57 | "metadata": {},
58 | "source": [
59 | "In a new cell, let's instantiate our pipeline by running `make_pipeline()`:"
60 | ]
61 | },
62 | {
63 | "cell_type": "code",
64 | "execution_count": 3,
65 | "metadata": {
66 | "collapsed": false,
67 | "jupyter": {
68 | "outputs_hidden": false
69 | }
70 | },
71 | "outputs": [],
72 | "source": [
73 | "my_pipe = make_pipeline()"
74 | ]
75 | },
76 | {
77 | "cell_type": "markdown",
78 | "metadata": {},
79 | "source": [
80 | "## Running a Pipeline\n",
81 | "\n",
82 | "Now that we have a reference to an empty Pipeline, `my_pipe`, let's run it to see what it looks like. Before running our pipeline, we first need to import `run_pipeline`, a research-only function that allows us to run a pipeline over a specified time period."
83 | ]
84 | },
85 | {
86 | "cell_type": "code",
87 | "execution_count": 4,
88 | "metadata": {},
89 | "outputs": [],
90 | "source": [
91 | "from zipline.research import run_pipeline"
92 | ]
93 | },
94 | {
95 | "cell_type": "markdown",
96 | "metadata": {},
97 | "source": [
98 | "Since we will be using the same data bundle repeatedly in this tutorial, we can set it as the default bundle to avoid always having to type the name of the bundle in each call to `run_pipeline`:"
99 | ]
100 | },
101 | {
102 | "cell_type": "code",
103 | "execution_count": 5,
104 | "metadata": {},
105 | "outputs": [
106 | {
107 | "data": {
108 | "text/plain": [
109 | "{'status': 'successfully set default bundle'}"
110 | ]
111 | },
112 | "execution_count": 5,
113 | "metadata": {},
114 | "output_type": "execute_result"
115 | }
116 | ],
117 | "source": [
118 | "from quantrocket.zipline import set_default_bundle\n",
119 | "set_default_bundle(\"usstock-learn-1d\")"
120 | ]
121 | },
122 | {
123 | "cell_type": "markdown",
124 | "metadata": {},
125 | "source": [
126 | "Let's run our pipeline for one day (2010-01-05) with `run_pipeline` and display it."
127 | ]
128 | },
129 | {
130 | "cell_type": "code",
131 | "execution_count": 6,
132 | "metadata": {
133 | "collapsed": false,
134 | "jupyter": {
135 | "outputs_hidden": false
136 | }
137 | },
138 | "outputs": [],
139 | "source": [
140 | "result = run_pipeline(my_pipe, start_date='2010-01-05', end_date='2010-01-05')"
141 | ]
142 | },
143 | {
144 | "cell_type": "markdown",
145 | "metadata": {},
146 | "source": [
147 | "A call to `run_pipeline` returns a pandas DataFrame indexed by date and security. Let's see what the empty pipeline looks like:"
148 | ]
149 | },
150 | {
151 | "cell_type": "code",
152 | "execution_count": 7,
153 | "metadata": {
154 | "collapsed": false,
155 | "jupyter": {
156 | "outputs_hidden": false
157 | },
158 | "scrolled": true
159 | },
160 | "outputs": [
161 | {
162 | "data": {
163 | "text/html": [
164 | "
\n",
165 | "\n",
178 | "
\n",
179 | " \n",
180 | " \n",
181 | " | \n",
182 | " | \n",
183 | "
\n",
184 | " \n",
185 | " date | \n",
186 | " asset | \n",
187 | "
\n",
188 | " \n",
189 | " \n",
190 | " \n",
191 | " 2010-01-05 | \n",
192 | " Equity(FIBBG000C2V3D6 [A]) | \n",
193 | "
\n",
194 | " \n",
195 | " Equity(QI000000004076 [AABA]) | \n",
196 | "
\n",
197 | " \n",
198 | " Equity(FIBBG000BZWHH8 [AACC]) | \n",
199 | "
\n",
200 | " \n",
201 | " Equity(FIBBG000V2S3P6 [AACG]) | \n",
202 | "
\n",
203 | " \n",
204 | " Equity(FIBBG000M7KQ09 [AAI]) | \n",
205 | "
\n",
206 | " \n",
207 | " ... | \n",
208 | "
\n",
209 | " \n",
210 | " Equity(FIBBG011MC2100 [AATC]) | \n",
211 | "
\n",
212 | " \n",
213 | " Equity(FIBBG000GDBDH4 [BDG]) | \n",
214 | "
\n",
215 | " \n",
216 | " Equity(FIBBG000008NR0 [ISM]) | \n",
217 | "
\n",
218 | " \n",
219 | " Equity(FIBBG000GZ24W8 [PEM]) | \n",
220 | "
\n",
221 | " \n",
222 | " Equity(FIBBG000BB5S87 [HCH]) | \n",
223 | "
\n",
224 | " \n",
225 | "
\n",
226 | "
7841 rows × 0 columns
\n",
227 | "
"
228 | ],
229 | "text/plain": [
230 | "Empty DataFrameWithMetadata\n",
231 | "Columns: []\n",
232 | "Index: [(2010-01-05 00:00:00, Equity(FIBBG000C2V3D6 [A])), (2010-01-05 00:00:00, Equity(QI000000004076 [AABA])), (2010-01-05 00:00:00, Equity(FIBBG000BZWHH8 [AACC])), (2010-01-05 00:00:00, Equity(FIBBG000V2S3P6 [AACG])), (2010-01-05 00:00:00, Equity(FIBBG000M7KQ09 [AAI])), (2010-01-05 00:00:00, Equity(FIBBG000BD1373 [AAIC])), (2010-01-05 00:00:00, Equity(FIBBG000B9XB24 [AAME])), (2010-01-05 00:00:00, Equity(FIBBG000D9V7T4 [PRG])), (2010-01-05 00:00:00, Equity(QI000000053169 [AAN])), (2010-01-05 00:00:00, Equity(FIBBG000C2LZP3 [AAON])), (2010-01-05 00:00:00, Equity(FIBBG000F7RCJ1 [AAP])), (2010-01-05 00:00:00, Equity(FIBBG000B9XRY4 [AAPL])), (2010-01-05 00:00:00, Equity(FIBBG000006B98 [AAR])), (2010-01-05 00:00:00, Equity(FIBBG000F519J0 [AATI])), (2010-01-05 00:00:00, Equity(FIBBG000DGFSY4 [AAU])), (2010-01-05 00:00:00, Equity(FIBBG000C5QZ62 [AAV])), (2010-01-05 00:00:00, Equity(FIBBG000Q57YP0 [AAWW])), (2010-01-05 00:00:00, Equity(FIBBG000G6GXC5 [AAXJ])), (2010-01-05 00:00:00, Equity(FIBBG000B9WM03 [AB])), (2010-01-05 00:00:00, Equity(FIBBG000CP4WX9 [ABAX])), (2010-01-05 00:00:00, Equity(FIBBG000DK5Q25 [ABB])), (2010-01-05 00:00:00, Equity(FIBBG000PMDBR5 [ABBC])), (2010-01-05 00:00:00, Equity(FIBBG000MDCQC2 [COR])), (2010-01-05 00:00:00, Equity(FIBBG000CDY3H5 [ABCB])), (2010-01-05 00:00:00, Equity(FIBBG000Q05Q43 [ABCD])), (2010-01-05 00:00:00, Equity(FIBBG000CG1LX6 [ABCO])), (2010-01-05 00:00:00, Equity(FIBBG000C4H3C9 [ABCW])), (2010-01-05 00:00:00, Equity(FIBBG000C24FY6 [ABE])), (2010-01-05 00:00:00, Equity(FIBBG000BKDWB5 [ABG])), (2010-01-05 00:00:00, Equity(FIBBG000KPFSC0 [ABI])), (2010-01-05 00:00:00, Equity(FIBBG000BC7G61 [ABII])), (2010-01-05 00:00:00, Equity(FIBBG000BWNN28 [ABIO])), (2010-01-05 00:00:00, Equity(FIBBG000001382 [ABKPZ])), (2010-01-05 00:00:00, Equity(FIBBG000B9YFS6 [ABL])), (2010-01-05 00:00:00, Equity(FIBBG000B9YYH7 [ABM])), (2010-01-05 00:00:00, Equity(FIBBG000C101X4 [ABMD])), (2010-01-05 00:00:00, Equity(FIBBG000KMVDV1 [ABR])), 
(2010-01-05 00:00:00, Equity(FIBBG000B9ZXB4 [ABT])), (2010-01-05 00:00:00, Equity(QI000000052857 [ABV])), (2010-01-05 00:00:00, Equity(FIBBG000BMG4H4 [ABV.C])), (2010-01-05 00:00:00, Equity(FIBBG000CZ31L9 [ABVA])), (2010-01-05 00:00:00, Equity(FIBBG000BJ7R08 [ABVT])), (2010-01-05 00:00:00, Equity(FIBBG000006Z04 [ABWPA])), (2010-01-05 00:00:00, Equity(FIBBG000928WM2 [ABY])), (2010-01-05 00:00:00, Equity(FIBBG000BHG9K0 [ACAD])), (2010-01-05 00:00:00, Equity(FIBBG000CS36R8 [ACAP])), (2010-01-05 00:00:00, Equity(FIBBG000BYR208 [ACAS])), (2010-01-05 00:00:00, Equity(FIBBG000BS2LC3 [ACAT])), (2010-01-05 00:00:00, Equity(FIBBG000M9KP89 [ACC])), (2010-01-05 00:00:00, Equity(FIBBG000C3NNC0 [ACCL])), (2010-01-05 00:00:00, Equity(FIBBG000J06K07 [ACCO])), (2010-01-05 00:00:00, Equity(FIBBG000BG5S59 [ACER])), (2010-01-05 00:00:00, Equity(FIBBG000BB0S19 [ACET])), (2010-01-05 00:00:00, Equity(FIBBG000CXLZ20 [ACF])), (2010-01-05 00:00:00, Equity(QI000000140206 [ACFC])), (2010-01-05 00:00:00, Equity(FIBBG000BP23H4 [ACFC])), (2010-01-05 00:00:00, Equity(FIBBG000CMY866 [ACFN])), (2010-01-05 00:00:00, Equity(FIBBG000BB1732 [ACG])), (2010-01-05 00:00:00, Equity(FIBBG000HXNN20 [ACGL])), (2010-01-05 00:00:00, Equity(FIBBG0000092M0 [ACGLO])), (2010-01-05 00:00:00, Equity(FIBBG000008PD0 [ACGLP])), (2010-01-05 00:00:00, Equity(FIBBG000CMRVH1 [ACH])), (2010-01-05 00:00:00, Equity(FIBBG000BPPV05 [ACHN])), (2010-01-05 00:00:00, Equity(FIBBG000FB8S62 [ACHV])), (2010-01-05 00:00:00, Equity(FIBBG000PMBV39 [ACIW])), (2010-01-05 00:00:00, Equity(FIBBG000DQ1D34 [ACL])), (2010-01-05 00:00:00, Equity(FIBBG000QTH7H5 [ACLI])), (2010-01-05 00:00:00, Equity(FIBBG000DW34S2 [ACLS])), (2010-01-05 00:00:00, Equity(FIBBG000F61RJ8 [ACM])), (2010-01-05 00:00:00, Equity(FIBBG000BSXFL2 [ACMR])), (2010-01-05 00:00:00, Equity(FIBBG000D9D830 [ACN])), (2010-01-05 00:00:00, Equity(FIBBG000C2LGN7 [ACO])), (2010-01-05 00:00:00, Equity(FIBBG000P0DBL9 [ACOM])), (2010-01-05 00:00:00, Equity(FIBBG000FD10V8 [ACOR])), 
(2010-01-05 00:00:00, Equity(FIBBG000HK4RX6 [ACR])), (2010-01-05 00:00:00, Equity(FIBBG000J8CD72 [ACS])), (2010-01-05 00:00:00, Equity(FIBBG000BB7923 [ACTA])), (2010-01-05 00:00:00, Equity(FIBBG000KF9J02 [ACTG])), (2010-01-05 00:00:00, Equity(FIBBG000PNRF80 [ACTI])), (2010-01-05 00:00:00, Equity(FIBBG000BJ31T8 [ACTL])), (2010-01-05 00:00:00, Equity(FIBBG000F80PN4 [ACTS])), (2010-01-05 00:00:00, Equity(FIBBG000BB3LM8 [ACU])), (2010-01-05 00:00:00, Equity(FIBBG000J0MK45 [ACUR])), (2010-01-05 00:00:00, Equity(FIBBG000BB41G8 [ACV])), (2010-01-05 00:00:00, Equity(FIBBG000TH6VB3 [ACWI])), (2010-01-05 00:00:00, Equity(FIBBG000TH7DF8 [ACWX])), (2010-01-05 00:00:00, Equity(QE000000122140 [ACXM])), (2010-01-05 00:00:00, Equity(FIBBG000B9WP24 [MPU])), (2010-01-05 00:00:00, Equity(FIBBG000FHTP62 [ADAM])), (2010-01-05 00:00:00, Equity(FIBBG000CG2ZW5 [ADAT])), (2010-01-05 00:00:00, Equity(FIBBG000BB5006 [ADBE])), (2010-01-05 00:00:00, Equity(FIBBG000BC9DK0 [ADC])), (2010-01-05 00:00:00, Equity(FIBBG000BB5HV5 [ADCT])), (2010-01-05 00:00:00, Equity(FIBBG000FQYSY9 [ADEP])), (2010-01-05 00:00:00, Equity(FIBBG000JFNL85 [ARQ])), (2010-01-05 00:00:00, Equity(FIBBG000CWFQT0 [ADG])), (2010-01-05 00:00:00, Equity(FIBBG000B9WLK3 [ADGE])), (2010-01-05 00:00:00, Equity(FIBBG000BNKLH9 [ADGF])), (2010-01-05 00:00:00, Equity(FIBBG000BB6G37 [ADI])), (2010-01-05 00:00:00, Equity(FIBBG000PYP3D9 [ADK])), ...]\n",
233 | "\n",
234 | "[7841 rows x 0 columns]"
235 | ]
236 | },
237 | "execution_count": 7,
238 | "metadata": {},
239 | "output_type": "execute_result"
240 | }
241 | ],
242 | "source": [
243 | "result"
244 | ]
245 | },
246 | {
247 | "cell_type": "markdown",
248 | "metadata": {},
249 | "source": [
250 | "The output of an empty pipeline is a DataFrame with no columns. In this example, our pipeline has an index made up of all ~8000 securities (truncated in the display) for Jan 5th, 2010, but doesn't have any columns.\n",
251 | "\n",
252 | "In the following lessons, we'll take a look at how to add columns to our pipeline output, and how to filter down to a subset of securities."
253 | ]
254 | },
255 | {
256 | "cell_type": "markdown",
257 | "metadata": {},
258 | "source": [
259 | "---\n",
260 | "\n",
261 | "**Next Lesson:** [Factors](Lesson04-Factors.ipynb) "
262 | ]
263 | }
264 | ],
265 | "metadata": {
266 | "kernelspec": {
267 | "display_name": "Python 3.11",
268 | "language": "python",
269 | "name": "python3"
270 | },
271 | "language_info": {
272 | "codemirror_mode": {
273 | "name": "ipython",
274 | "version": 3
275 | },
276 | "file_extension": ".py",
277 | "mimetype": "text/x-python",
278 | "name": "python",
279 | "nbconvert_exporter": "python",
280 | "pygments_lexer": "ipython3",
281 | "version": "3.11.0"
282 | }
283 | },
284 | "nbformat": 4,
285 | "nbformat_minor": 4
286 | }
287 |
--------------------------------------------------------------------------------
/pipeline_tutorial/Lesson04-Factors.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "\n",
8 | "Disclaimer"
9 | ]
10 | },
11 | {
12 | "cell_type": "markdown",
13 | "metadata": {},
14 | "source": [
15 | "***\n",
16 | "[Pipeline Tutorial](Introduction.ipynb) › Lesson 4: Factors\n",
17 | "***"
18 | ]
19 | },
20 | {
21 | "cell_type": "markdown",
22 | "metadata": {},
23 | "source": [
24 | "# Factors\n",
25 | "A factor is a function from an asset and a moment in time to a number.\n",
26 | "```\n",
27 | "F(asset, timestamp) -> float\n",
28 | "```\n",
29 | "In Pipeline, factors are the most commonly used type of term, representing the result of any computation that produces a numerical output. Most factors require a column of data and a window length as input.\n",
30 | "\n",
31 | "The simplest factors in Pipeline are built-in Factors. Built-in Factors are pre-built to perform common computations. As a first example, let's make a factor to compute the average close price over the last 10 days. We can use the `SimpleMovingAverage` built-in factor which computes the average value of the input data (close price) over the specified window length (10 days). To do this, we need to import our built-in `SimpleMovingAverage` factor and the `EquityPricing` dataset."
32 | ]
33 | },
34 | {
35 | "cell_type": "code",
36 | "execution_count": 1,
37 | "metadata": {},
38 | "outputs": [],
39 | "source": [
40 | "# New from the last lesson, import the EquityPricing dataset.\n",
41 | "from zipline.pipeline import Pipeline, EquityPricing\n",
42 | "from zipline.research import run_pipeline\n",
43 | "\n",
44 | "# New from the last lesson, import the built-in SimpleMovingAverage factor.\n",
45 | "from zipline.pipeline.factors import SimpleMovingAverage"
46 | ]
47 | },
48 | {
49 | "cell_type": "markdown",
50 | "metadata": {},
51 | "source": [
52 | "To see the full list of built-in factors, click on the `factors` module in the import statement above and then press Control, or see the [API Reference](https://www.quantrocket.com/docs/api/#built-in-factors)."
53 | ]
54 | },
55 | {
56 | "cell_type": "markdown",
57 | "metadata": {},
58 | "source": [
59 | "## Creating a Factor\n",
60 | "Let's go back to our `make_pipeline` function from the previous lesson and instantiate a `SimpleMovingAverage` factor. To create a `SimpleMovingAverage` factor, we can call the `SimpleMovingAverage` constructor with two arguments: `inputs`, which must be a list of `BoundColumn` objects, and `window_length`, which must be an integer indicating how many days' worth of data our moving average calculation should receive. (We'll discuss `BoundColumn` in more depth later; for now we just need to know that a `BoundColumn` is an object indicating what kind of data should be passed to our factor.)\n",
61 | "\n",
62 | "The following line creates a `Factor` for computing the 10-day mean close price of securities."
63 | ]
64 | },
65 | {
66 | "cell_type": "code",
67 | "execution_count": 2,
68 | "metadata": {},
69 | "outputs": [],
70 | "source": [
71 | "mean_close_10 = SimpleMovingAverage(inputs=[EquityPricing.close], window_length=10)"
72 | ]
73 | },
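{
"cell_type": "markdown",
"metadata": {},
"source": [
"Conceptually, for each asset and each day, `SimpleMovingAverage` receives the trailing 10 days of close prices and averages them. A rough sketch of the arithmetic with hypothetical prices (not the actual implementation, which computes every asset at once):\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"# hypothetical trailing 10-day window of closes for one asset\n",
"closes = np.array([30.0, 30.5, 31.0, 30.2, 30.8, 31.1, 30.9, 31.3, 30.7, 31.5])\n",
"\n",
"mean_close_10 = closes.mean()  # 30.8\n",
"```\n",
"\n",
"Pipeline repeats this calculation for every security in the bundle, one window per asset per day."
]
},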
74 | {
75 | "cell_type": "markdown",
76 | "metadata": {},
77 | "source": [
78 | "It's important to note that creating the factor does not actually perform a computation. Creating a factor is like defining the function. To perform a computation, we need to add the factor to our pipeline and run it."
79 | ]
80 | },
81 | {
82 | "cell_type": "markdown",
83 | "metadata": {},
84 | "source": [
85 | "## Adding a Factor to a Pipeline\n",
86 | "\n",
87 | "Let's update our original empty pipeline to make it compute our new moving average factor. To start, let's move our factor instantiation into `make_pipeline`. Next, we can tell our pipeline to compute our factor by passing it a `columns` argument, which should be a dictionary mapping column names to factors, filters, or classifiers. Our updated `make_pipeline` function should look something like this:"
88 | ]
89 | },
90 | {
91 | "cell_type": "code",
92 | "execution_count": 3,
93 | "metadata": {},
94 | "outputs": [],
95 | "source": [
96 | "def make_pipeline():\n",
97 | " \n",
98 | " mean_close_10 = SimpleMovingAverage(inputs=[EquityPricing.close], window_length=10)\n",
99 | " \n",
100 | " return Pipeline(\n",
101 | " columns={\n",
102 | " '10_day_mean_close': mean_close_10\n",
103 | " }\n",
104 | " )"
105 | ]
106 | },
107 | {
108 | "cell_type": "markdown",
109 | "metadata": {},
110 | "source": [
111 | "To see what this looks like, let's make our pipeline, run it, and display the result."
112 | ]
113 | },
114 | {
115 | "cell_type": "code",
116 | "execution_count": 4,
117 | "metadata": {
118 | "scrolled": true
119 | },
120 | "outputs": [
121 | {
122 | "data": {
123 | "text/html": [
124 | "<div>\n",
125 | "<style scoped>\n",
126 | "    .dataframe tbody tr th:only-of-type {\n",
127 | "        vertical-align: middle;\n",
128 | "    }\n",
129 | "\n",
130 | "    .dataframe tbody tr th {\n",
131 | "        vertical-align: top;\n",
132 | "    }\n",
133 | "\n",
134 | "    .dataframe thead th {\n",
135 | "        text-align: right;\n",
136 | "    }\n",
137 | "</style>\n",
138 | "<table border=\"1\" class=\"dataframe\">\n",
139 | "  <thead>\n",
140 | "    <tr style=\"text-align: right;\">\n",
141 | "      <th></th>\n",
142 | "      <th></th>\n",
143 | "      <th>10_day_mean_close</th>\n",
144 | "    </tr>\n",
145 | "    <tr>\n",
146 | "      <th>date</th>\n",
147 | "      <th>asset</th>\n",
148 | "      <th></th>\n",
149 | "    </tr>\n",
150 | "  </thead>\n",
151 | "  <tbody>\n",
152 | "    <tr>\n",
153 | "      <th rowspan=\"11\" valign=\"top\">2010-01-05</th>\n",
154 | "      <th>Equity(FIBBG000C2V3D6 [A])</th>\n",
155 | "      <td>30.432000</td>\n",
156 | "    </tr>\n",
157 | "    <tr>\n",
158 | "      <th>Equity(QI000000004076 [AABA])</th>\n",
159 | "      <td>16.605000</td>\n",
160 | "    </tr>\n",
161 | "    <tr>\n",
162 | "      <th>Equity(FIBBG000BZWHH8 [AACC])</th>\n",
163 | "      <td>6.434000</td>\n",
164 | "    </tr>\n",
165 | "    <tr>\n",
166 | "      <th>Equity(FIBBG000V2S3P6 [AACG])</th>\n",
167 | "      <td>4.501444</td>\n",
168 | "    </tr>\n",
169 | "    <tr>\n",
170 | "      <th>Equity(FIBBG000M7KQ09 [AAI])</th>\n",
171 | "      <td>5.250000</td>\n",
172 | "    </tr>\n",
173 | "    <tr>\n",
174 | "      <th>...</th>\n",
175 | "      <td>...</td>\n",
176 | "    </tr>\n",
177 | "    <tr>\n",
178 | "      <th>Equity(FIBBG011MC2100 [AATC])</th>\n",
179 | "      <td>11.980500</td>\n",
180 | "    </tr>\n",
181 | "    <tr>\n",
182 | "      <th>Equity(FIBBG000GDBDH4 [BDG])</th>\n",
183 | "      <td>NaN</td>\n",
184 | "    </tr>\n",
185 | "    <tr>\n",
186 | "      <th>Equity(FIBBG000008NR0 [ISM])</th>\n",
187 | "      <td>NaN</td>\n",
188 | "    </tr>\n",
189 | "    <tr>\n",
190 | "      <th>Equity(FIBBG000GZ24W8 [PEM])</th>\n",
191 | "      <td>NaN</td>\n",
192 | "    </tr>\n",
193 | "    <tr>\n",
194 | "      <th>Equity(FIBBG000BB5S87 [HCH])</th>\n",
195 | "      <td>106.570000</td>\n",
196 | "    </tr>\n",
197 | "  </tbody>\n",
198 | "</table>\n",
199 | "<p>7841 rows × 1 columns</p>\n",
200 | "</div>"
201 | ],
202 | "text/plain": [
203 | " 10_day_mean_close\n",
204 | "date asset \n",
205 | "2010-01-05 Equity(FIBBG000C2V3D6 [A]) 30.432000\n",
206 | " Equity(QI000000004076 [AABA]) 16.605000\n",
207 | " Equity(FIBBG000BZWHH8 [AACC]) 6.434000\n",
208 | " Equity(FIBBG000V2S3P6 [AACG]) 4.501444\n",
209 | " Equity(FIBBG000M7KQ09 [AAI]) 5.250000\n",
210 | "... ...\n",
211 | " Equity(FIBBG011MC2100 [AATC]) 11.980500\n",
212 | " Equity(FIBBG000GDBDH4 [BDG]) NaN\n",
213 | " Equity(FIBBG000008NR0 [ISM]) NaN\n",
214 | " Equity(FIBBG000GZ24W8 [PEM]) NaN\n",
215 | " Equity(FIBBG000BB5S87 [HCH]) 106.570000\n",
216 | "\n",
217 | "[7841 rows x 1 columns]"
218 | ]
219 | },
220 | "execution_count": 4,
221 | "metadata": {},
222 | "output_type": "execute_result"
223 | }
224 | ],
225 | "source": [
226 | "result = run_pipeline(make_pipeline(), start_date='2010-01-05', end_date='2010-01-05')\n",
227 | "result"
228 | ]
229 | },
230 | {
231 | "cell_type": "markdown",
232 | "metadata": {},
233 | "source": [
234 | "Now we have a column in our pipeline output with the 10-day average close price for all ~8000 securities (display truncated). Note that each row corresponds to the result of our computation for a given security on a given date. The `DataFrame` has a MultiIndex where the first level is a datetime representing the date of the computation and the second level is an `Equity` object corresponding to the security.\n",
235 | "\n",
236 | "If we run our pipeline over more than one day, the output looks like this."
237 | ]
238 | },
239 | {
240 | "cell_type": "code",
241 | "execution_count": 5,
242 | "metadata": {
243 | "scrolled": true
244 | },
245 | "outputs": [
246 | {
247 | "data": {
248 | "text/html": [
249 | "<div>\n",
250 | "<style scoped>\n",
251 | "    .dataframe tbody tr th:only-of-type {\n",
252 | "        vertical-align: middle;\n",
253 | "    }\n",
254 | "\n",
255 | "    .dataframe tbody tr th {\n",
256 | "        vertical-align: top;\n",
257 | "    }\n",
258 | "\n",
259 | "    .dataframe thead th {\n",
260 | "        text-align: right;\n",
261 | "    }\n",
262 | "</style>\n",
263 | "<table border=\"1\" class=\"dataframe\">\n",
264 | "  <thead>\n",
265 | "    <tr style=\"text-align: right;\">\n",
266 | "      <th></th>\n",
267 | "      <th></th>\n",
268 | "      <th>10_day_mean_close</th>\n",
269 | "    </tr>\n",
270 | "    <tr>\n",
271 | "      <th>date</th>\n",
272 | "      <th>asset</th>\n",
273 | "      <th></th>\n",
274 | "    </tr>\n",
275 | "  </thead>\n",
276 | "  <tbody>\n",
277 | "    <tr>\n",
278 | "      <th rowspan=\"5\" valign=\"top\">2010-01-05</th>\n",
279 | "      <th>Equity(FIBBG000C2V3D6 [A])</th>\n",
280 | "      <td>30.432000</td>\n",
281 | "    </tr>\n",
282 | "    <tr>\n",
283 | "      <th>Equity(QI000000004076 [AABA])</th>\n",
284 | "      <td>16.605000</td>\n",
285 | "    </tr>\n",
286 | "    <tr>\n",
287 | "      <th>Equity(FIBBG000BZWHH8 [AACC])</th>\n",
288 | "      <td>6.434000</td>\n",
289 | "    </tr>\n",
290 | "    <tr>\n",
291 | "      <th>Equity(FIBBG000V2S3P6 [AACG])</th>\n",
292 | "      <td>4.501444</td>\n",
293 | "    </tr>\n",
294 | "    <tr>\n",
295 | "      <th>Equity(FIBBG000M7KQ09 [AAI])</th>\n",
296 | "      <td>5.250000</td>\n",
297 | "    </tr>\n",
298 | "    <tr>\n",
299 | "      <th>...</th>\n",
300 | "      <th>...</th>\n",
301 | "      <td>...</td>\n",
302 | "    </tr>\n",
303 | "    <tr>\n",
304 | "      <th rowspan=\"5\" valign=\"top\">2010-01-07</th>\n",
305 | "      <th>Equity(FIBBG011MC2100 [AATC])</th>\n",
306 | "      <td>11.816000</td>\n",
307 | "    </tr>\n",
308 | "    <tr>\n",
309 | "      <th>Equity(FIBBG000GDBDH4 [BDG])</th>\n",
310 | "      <td>NaN</td>\n",
311 | "    </tr>\n",
312 | "    <tr>\n",
313 | "      <th>Equity(FIBBG000008NR0 [ISM])</th>\n",
314 | "      <td>NaN</td>\n",
315 | "    </tr>\n",
316 | "    <tr>\n",
317 | "      <th>Equity(FIBBG000GZ24W8 [PEM])</th>\n",
318 | "      <td>NaN</td>\n",
319 | "    </tr>\n",
320 | "    <tr>\n",
321 | "      <th>Equity(FIBBG000BB5S87 [HCH])</th>\n",
322 | "      <td>109.796667</td>\n",
323 | "    </tr>\n",
324 | "  </tbody>\n",
325 | "</table>\n",
326 | "<p>23534 rows × 1 columns</p>\n",
327 | "</div>"
328 | ],
329 | "text/plain": [
330 | " 10_day_mean_close\n",
331 | "date asset \n",
332 | "2010-01-05 Equity(FIBBG000C2V3D6 [A]) 30.432000\n",
333 | " Equity(QI000000004076 [AABA]) 16.605000\n",
334 | " Equity(FIBBG000BZWHH8 [AACC]) 6.434000\n",
335 | " Equity(FIBBG000V2S3P6 [AACG]) 4.501444\n",
336 | " Equity(FIBBG000M7KQ09 [AAI]) 5.250000\n",
337 | "... ...\n",
338 | "2010-01-07 Equity(FIBBG011MC2100 [AATC]) 11.816000\n",
339 | " Equity(FIBBG000GDBDH4 [BDG]) NaN\n",
340 | " Equity(FIBBG000008NR0 [ISM]) NaN\n",
341 | " Equity(FIBBG000GZ24W8 [PEM]) NaN\n",
342 | " Equity(FIBBG000BB5S87 [HCH]) 109.796667\n",
343 | "\n",
344 | "[23534 rows x 1 columns]"
345 | ]
346 | },
347 | "execution_count": 5,
348 | "metadata": {},
349 | "output_type": "execute_result"
350 | }
351 | ],
352 | "source": [
353 | "result = run_pipeline(make_pipeline(), start_date='2010-01-05', end_date='2010-01-07')\n",
354 | "result"
355 | ]
356 | },
357 | {
358 | "cell_type": "markdown",
359 | "metadata": {},
360 | "source": [
361 | "Note: factors can also be added to an existing `Pipeline` instance using the `Pipeline.add` method. Using `add` looks something like this:\n",
362 | "\n",
363 | "```python\n",
364 | "my_pipe = Pipeline()\n",
365 | "f1 = SomeFactor(...)\n",
366 | "my_pipe.add(f1, 'f1')\n",
367 | "```"
368 | ]
369 | },
370 | {
371 | "cell_type": "markdown",
372 | "metadata": {},
373 | "source": [
374 | "## Latest\n",
375 | "The most commonly used built-in `Factor` is `Latest`. The `Latest` factor gets the most recent value of a given data column. This factor is common enough that it is instantiated differently from other factors. The best way to get the latest value of a data column is by getting its `.latest` attribute. As an example, let's update `make_pipeline` to create a latest close price factor and add it to our pipeline:"
376 | ]
377 | },
378 | {
379 | "cell_type": "code",
380 | "execution_count": 6,
381 | "metadata": {},
382 | "outputs": [],
383 | "source": [
384 | "def make_pipeline():\n",
385 | "\n",
386 | " mean_close_10 = SimpleMovingAverage(inputs=[EquityPricing.close], window_length=10)\n",
387 | " latest_close = EquityPricing.close.latest\n",
388 | "\n",
389 | " return Pipeline(\n",
390 | " columns={\n",
391 | " '10_day_mean_close': mean_close_10,\n",
392 | " 'latest_close_price': latest_close\n",
393 | " }\n",
394 | " )"
395 | ]
396 | },
397 | {
398 | "cell_type": "markdown",
399 | "metadata": {},
400 | "source": [
401 | "And now, when we make and run our pipeline again, there are two columns in our output dataframe. One column has the 10-day mean close price of each security, and the other has the latest close price."
402 | ]
403 | },
404 | {
405 | "cell_type": "code",
406 | "execution_count": 7,
407 | "metadata": {},
408 | "outputs": [
409 | {
410 | "data": {
411 | "text/html": [
412 | "<div>\n",
413 | "<style scoped>\n",
414 | "    .dataframe tbody tr th:only-of-type {\n",
415 | "        vertical-align: middle;\n",
416 | "    }\n",
417 | "\n",
418 | "    .dataframe tbody tr th {\n",
419 | "        vertical-align: top;\n",
420 | "    }\n",
421 | "\n",
422 | "    .dataframe thead th {\n",
423 | "        text-align: right;\n",
424 | "    }\n",
425 | "</style>\n",
426 | "<table border=\"1\" class=\"dataframe\">\n",
427 | "  <thead>\n",
428 | "    <tr style=\"text-align: right;\">\n",
429 | "      <th></th>\n",
430 | "      <th></th>\n",
431 | "      <th>10_day_mean_close</th>\n",
432 | "      <th>latest_close_price</th>\n",
433 | "    </tr>\n",
434 | "    <tr>\n",
435 | "      <th>date</th>\n",
436 | "      <th>asset</th>\n",
437 | "      <th></th>\n",
438 | "      <th></th>\n",
439 | "    </tr>\n",
440 | "  </thead>\n",
441 | "  <tbody>\n",
442 | "    <tr>\n",
443 | "      <th rowspan=\"5\" valign=\"top\">2010-01-05</th>\n",
444 | "      <th>Equity(FIBBG000C2V3D6 [A])</th>\n",
445 | "      <td>30.432000</td>\n",
446 | "      <td>31.300</td>\n",
447 | "    </tr>\n",
448 | "    <tr>\n",
449 | "      <th>Equity(QI000000004076 [AABA])</th>\n",
450 | "      <td>16.605000</td>\n",
451 | "      <td>17.100</td>\n",
452 | "    </tr>\n",
453 | "    <tr>\n",
454 | "      <th>Equity(FIBBG000BZWHH8 [AACC])</th>\n",
455 | "      <td>6.434000</td>\n",
456 | "      <td>7.150</td>\n",
457 | "    </tr>\n",
458 | "    <tr>\n",
459 | "      <th>Equity(FIBBG000V2S3P6 [AACG])</th>\n",
460 | "      <td>4.501444</td>\n",
461 | "      <td>4.702</td>\n",
462 | "    </tr>\n",
463 | "    <tr>\n",
464 | "      <th>Equity(FIBBG000M7KQ09 [AAI])</th>\n",
465 | "      <td>5.250000</td>\n",
466 | "      <td>5.180</td>\n",
467 | "    </tr>\n",
468 | "  </tbody>\n",
469 | "</table>\n",
470 | "</div>"
471 | ],
472 | "text/plain": [
473 | " 10_day_mean_close latest_close_price\n",
474 | "date asset \n",
475 | "2010-01-05 Equity(FIBBG000C2V3D6 [A]) 30.432000 31.300\n",
476 | " Equity(QI000000004076 [AABA]) 16.605000 17.100\n",
477 | " Equity(FIBBG000BZWHH8 [AACC]) 6.434000 7.150\n",
478 | " Equity(FIBBG000V2S3P6 [AACG]) 4.501444 4.702\n",
479 | " Equity(FIBBG000M7KQ09 [AAI]) 5.250000 5.180"
480 | ]
481 | },
482 | "execution_count": 7,
483 | "metadata": {},
484 | "output_type": "execute_result"
485 | }
486 | ],
487 | "source": [
488 | "result = run_pipeline(make_pipeline(), start_date='2010-01-05', end_date='2010-01-05')\n",
489 | "result.head(5)"
490 | ]
491 | },
492 | {
493 | "cell_type": "markdown",
494 | "metadata": {},
495 | "source": [
496 | "`.latest` can sometimes return things other than `Factors`. We'll see examples of other possible return types in later lessons."
497 | ]
498 | },
499 | {
500 | "cell_type": "markdown",
501 | "metadata": {},
502 | "source": [
503 | "## Default Inputs\n",
504 | "Some factors have default inputs that should never be changed. For example, the built-in `VWAP` factor is always calculated from `EquityPricing.close` and `EquityPricing.volume`. When a factor is always calculated from the same `BoundColumn` objects, we can call the constructor without specifying `inputs`."
505 | ]
506 | },
507 | {
508 | "cell_type": "code",
509 | "execution_count": 8,
510 | "metadata": {},
511 | "outputs": [],
512 | "source": [
513 | "from zipline.pipeline.factors import VWAP\n",
514 | "vwap = VWAP(window_length=10)"
515 | ]
516 | },
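{
"cell_type": "markdown",
"metadata": {},
"source": [
"For intuition, `VWAP` weights each day's close by that day's volume, so heavily traded days influence the average more than quiet days. A rough sketch of the arithmetic with hypothetical data (not the actual implementation):\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"# hypothetical trailing window of closes and volumes for one asset\n",
"closes = np.array([10.0, 11.0, 12.0])\n",
"volumes = np.array([100.0, 300.0, 100.0])\n",
"\n",
"vwap = (closes * volumes).sum() / volumes.sum()  # 11.0\n",
"```\n",
"\n",
"Because the middle day traded three times the volume of the other days, its price dominates the result."
]
},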
517 | {
518 | "cell_type": "markdown",
519 | "metadata": {},
520 | "source": [
521 | "## Choosing a Start Date\n",
522 | "\n",
523 | "When choosing a `start_date` for `run_pipeline`, there are two gotchas to keep in mind. First, the earliest `start_date` you can specify is one day after the start date of the bundle. This is because the `start_date` you pass to `run_pipeline` indicates the first date you want to include in the pipeline output, and each day's pipeline output is based on the previous day's data. The purpose of this one-day lag is to avoid lookahead bias. Pipeline output tells you what you would have known at the start of each day, based on the previous day's data.\n",
524 | "\n",
525 | "The learning bundle starts on 2007-01-03 (the first trading day of 2007), but if we try to run a pipeline that starts on (or before) that date, we'll get an error that tells us to start one day after the bundle start date:"
526 | ]
527 | },
528 | {
529 | "cell_type": "code",
530 | "execution_count": 9,
531 | "metadata": {
532 | "scrolled": true
533 | },
534 | "outputs": [
535 | {
536 | "ename": "ValidationError",
537 | "evalue": "start_date cannot be earlier than 2007-01-04 for this bundle (one session after the bundle start date of 2007-01-03)",
538 | "output_type": "error",
539 | "traceback": [
540 | "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
541 | "\u001b[0;31mValidationError\u001b[0m Traceback (most recent call last)",
542 | "Cell \u001b[0;32mIn[9], line 1\u001b[0m\n\u001b[0;32m----> 1\u001b[0m result \u001b[38;5;241m=\u001b[39m \u001b[43mrun_pipeline\u001b[49m\u001b[43m(\u001b[49m\u001b[43mPipeline\u001b[49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mstart_date\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[38;5;124;43m'\u001b[39;49m\u001b[38;5;124;43m2007-01-03\u001b[39;49m\u001b[38;5;124;43m'\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mend_date\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[38;5;124;43m'\u001b[39;49m\u001b[38;5;124;43m2007-01-03\u001b[39;49m\u001b[38;5;124;43m'\u001b[39;49m\u001b[43m)\u001b[49m\n",
543 | "File \u001b[0;32m/opt/conda/lib/python3.11/site-packages/zipline/research/pipeline.py:95\u001b[0m, in \u001b[0;36mrun_pipeline\u001b[0;34m(pipeline, start_date, end_date, bundle)\u001b[0m\n\u001b[1;32m 36\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21mrun_pipeline\u001b[39m(\n\u001b[1;32m 37\u001b[0m pipeline: Pipeline,\n\u001b[1;32m 38\u001b[0m start_date: \u001b[38;5;28mstr\u001b[39m,\n\u001b[1;32m 39\u001b[0m end_date: \u001b[38;5;28mstr\u001b[39m \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;01mNone\u001b[39;00m,\n\u001b[1;32m 40\u001b[0m bundle: \u001b[38;5;28mstr\u001b[39m \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;01mNone\u001b[39;00m\n\u001b[1;32m 41\u001b[0m ) \u001b[38;5;241m-\u001b[39m\u001b[38;5;241m>\u001b[39m pd\u001b[38;5;241m.\u001b[39mDataFrame:\n\u001b[1;32m 42\u001b[0m \u001b[38;5;250m \u001b[39m\u001b[38;5;124;03m\"\"\"\u001b[39;00m\n\u001b[1;32m 43\u001b[0m \u001b[38;5;124;03m Compute values for pipeline from start_date to end_date, using the specified\u001b[39;00m\n\u001b[1;32m 44\u001b[0m \u001b[38;5;124;03m bundle or the default bundle.\u001b[39;00m\n\u001b[0;32m (...)\u001b[0m\n\u001b[1;32m 93\u001b[0m \u001b[38;5;124;03m factor = run_pipeline(pipeline, '2018-01-01', '2019-02-01', bundle=\"usstock-1min\")\u001b[39;00m\n\u001b[1;32m 94\u001b[0m \u001b[38;5;124;03m \"\"\"\u001b[39;00m\n\u001b[0;32m---> 95\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[43m_run_pipeline\u001b[49m\u001b[43m(\u001b[49m\n\u001b[1;32m 96\u001b[0m \u001b[43m \u001b[49m\u001b[43mpipeline\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 97\u001b[0m \u001b[43m \u001b[49m\u001b[43mstart_date\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mstart_date\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 98\u001b[0m \u001b[43m \u001b[49m\u001b[43mend_date\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mend_date\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 99\u001b[0m \u001b[43m 
\u001b[49m\u001b[43mbundle\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mbundle\u001b[49m\u001b[43m)\u001b[49m\n",
544 | "File \u001b[0;32m/opt/conda/lib/python3.11/site-packages/zipline/research/pipeline.py:149\u001b[0m, in \u001b[0;36m_run_pipeline\u001b[0;34m(pipeline, start_date, end_date, bundle, mask)\u001b[0m\n\u001b[1;32m 147\u001b[0m second_session \u001b[38;5;241m=\u001b[39m exchange_calendar\u001b[38;5;241m.\u001b[39mnext_session(first_session)\n\u001b[1;32m 148\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m start_date \u001b[38;5;241m<\u001b[39m second_session:\n\u001b[0;32m--> 149\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m ValidationError(\n\u001b[1;32m 150\u001b[0m \u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mstart_date cannot be earlier than \u001b[39m\u001b[38;5;132;01m{\u001b[39;00msecond_session\u001b[38;5;241m.\u001b[39mdate()\u001b[38;5;241m.\u001b[39misoformat()\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m \u001b[39m\u001b[38;5;124m\"\u001b[39m\n\u001b[1;32m 151\u001b[0m \u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mfor this bundle (one session after the bundle start date of \u001b[39m\u001b[38;5;132;01m{\u001b[39;00mfirst_session\u001b[38;5;241m.\u001b[39mdate()\u001b[38;5;241m.\u001b[39misoformat()\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m)\u001b[39m\u001b[38;5;124m\"\u001b[39m)\n\u001b[1;32m 153\u001b[0m \u001b[38;5;66;03m# Roll-forward start_date to valid session\u001b[39;00m\n\u001b[1;32m 154\u001b[0m \u001b[38;5;28;01mfor\u001b[39;00m i \u001b[38;5;129;01min\u001b[39;00m \u001b[38;5;28mrange\u001b[39m(\u001b[38;5;241m100\u001b[39m):\n",
545 | "\u001b[0;31mValidationError\u001b[0m: start_date cannot be earlier than 2007-01-04 for this bundle (one session after the bundle start date of 2007-01-03)"
546 | ]
547 | }
548 | ],
549 | "source": [
550 | "result = run_pipeline(Pipeline(), start_date='2007-01-03', end_date='2007-01-03')"
551 | ]
552 | },
553 | {
554 | "cell_type": "markdown",
555 | "metadata": {},
556 | "source": [
557 | "The second gotcha to keep in mind is that the `start_date` you choose must also make allowance for the `window_length` of your factors. The following pipeline includes a 10-day VWAP factor, so if we set the `start_date` to 2007-01-04 (as suggested by the previous error message), we will get a new error (scroll to the bottom of the traceback for the useful error message): "
558 | ]
559 | },
560 | {
561 | "cell_type": "code",
562 | "execution_count": 10,
563 | "metadata": {
564 | "scrolled": true
565 | },
566 | "outputs": [
567 | {
568 | "ename": "NoDataOnDate",
569 | "evalue": "the pipeline definition requires EquityPricing.close::float64 data on 2006-12-18 00:00:00 but no bundle data is available on that date; the cause of this issue is that another pipeline term needs EquityPricing.close::float64 and has a window_length of 10, which necessitates loading 9 extra rows of EquityPricing.close::float64; try setting a later start date so that the maximum window_length of any term doesn't extend further back than the bundle start date. Review the pipeline dependencies below to help determine which terms are causing the problem:\n\n{'dependencies': [{'term': EquityPricing.close::float64,\n 'used_by': VWAP([EquityPricing.close, EquityPricing.volume], 10)},\n {'term': EquityPricing.volume::float64,\n 'used_by': VWAP([EquityPricing.close, EquityPricing.volume], 10)}],\n 'nodes': [{'extra_rows': 9, 'needed_for': EquityPricing.close::float64},\n {'extra_rows': 9, 'needed_for': EquityPricing.volume::float64}]}",
570 | "output_type": "error",
571 | "traceback": [
572 | "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
573 | "\u001b[0;31mKeyError\u001b[0m Traceback (most recent call last)",
574 | "File \u001b[0;32mindex.pyx:598\u001b[0m, in \u001b[0;36mpandas._libs.index.DatetimeEngine.get_loc\u001b[0;34m()\u001b[0m\n",
575 | "File \u001b[0;32mpandas/_libs/hashtable_class_helper.pxi:2606\u001b[0m, in \u001b[0;36mpandas._libs.hashtable.Int64HashTable.get_item\u001b[0;34m()\u001b[0m\n",
576 | "File \u001b[0;32mpandas/_libs/hashtable_class_helper.pxi:2630\u001b[0m, in \u001b[0;36mpandas._libs.hashtable.Int64HashTable.get_item\u001b[0;34m()\u001b[0m\n",
577 | "\u001b[0;31mKeyError\u001b[0m: 1166400000000000000",
578 | "\nDuring handling of the above exception, another exception occurred:\n",
579 | "\u001b[0;31mKeyError\u001b[0m Traceback (most recent call last)",
580 | "File \u001b[0;32m/opt/conda/lib/python3.11/site-packages/pandas/core/indexes/base.py:3790\u001b[0m, in \u001b[0;36mIndex.get_loc\u001b[0;34m(self, key)\u001b[0m\n\u001b[1;32m 3789\u001b[0m \u001b[38;5;28;01mtry\u001b[39;00m:\n\u001b[0;32m-> 3790\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_engine\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mget_loc\u001b[49m\u001b[43m(\u001b[49m\u001b[43mcasted_key\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 3791\u001b[0m \u001b[38;5;28;01mexcept\u001b[39;00m \u001b[38;5;167;01mKeyError\u001b[39;00m \u001b[38;5;28;01mas\u001b[39;00m err:\n",
581 | "File \u001b[0;32mindex.pyx:566\u001b[0m, in \u001b[0;36mpandas._libs.index.DatetimeEngine.get_loc\u001b[0;34m()\u001b[0m\n",
582 | "File \u001b[0;32mindex.pyx:600\u001b[0m, in \u001b[0;36mpandas._libs.index.DatetimeEngine.get_loc\u001b[0;34m()\u001b[0m\n",
583 | "\u001b[0;31mKeyError\u001b[0m: Timestamp('2006-12-18 00:00:00')",
584 | "\nThe above exception was the direct cause of the following exception:\n",
585 | "\u001b[0;31mKeyError\u001b[0m Traceback (most recent call last)",
586 | "File \u001b[0;32m/opt/conda/lib/python3.11/site-packages/pandas/core/indexes/datetimes.py:631\u001b[0m, in \u001b[0;36mDatetimeIndex.get_loc\u001b[0;34m(self, key)\u001b[0m\n\u001b[1;32m 630\u001b[0m \u001b[38;5;28;01mtry\u001b[39;00m:\n\u001b[0;32m--> 631\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[43mIndex\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mget_loc\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mkey\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 632\u001b[0m \u001b[38;5;28;01mexcept\u001b[39;00m \u001b[38;5;167;01mKeyError\u001b[39;00m \u001b[38;5;28;01mas\u001b[39;00m err:\n",
587 | "File \u001b[0;32m/opt/conda/lib/python3.11/site-packages/pandas/core/indexes/base.py:3797\u001b[0m, in \u001b[0;36mIndex.get_loc\u001b[0;34m(self, key)\u001b[0m\n\u001b[1;32m 3796\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m InvalidIndexError(key)\n\u001b[0;32m-> 3797\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m \u001b[38;5;167;01mKeyError\u001b[39;00m(key) \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01merr\u001b[39;00m\n\u001b[1;32m 3798\u001b[0m \u001b[38;5;28;01mexcept\u001b[39;00m \u001b[38;5;167;01mTypeError\u001b[39;00m:\n\u001b[1;32m 3799\u001b[0m \u001b[38;5;66;03m# If we have a listlike key, _check_indexing_error will raise\u001b[39;00m\n\u001b[1;32m 3800\u001b[0m \u001b[38;5;66;03m# InvalidIndexError. Otherwise we fall through and re-raise\u001b[39;00m\n\u001b[1;32m 3801\u001b[0m \u001b[38;5;66;03m# the TypeError.\u001b[39;00m\n",
588 | "\u001b[0;31mKeyError\u001b[0m: Timestamp('2006-12-18 00:00:00')",
589 | "\nThe above exception was the direct cause of the following exception:\n",
590 | "\u001b[0;31mKeyError\u001b[0m Traceback (most recent call last)",
591 | "File \u001b[0;32m/opt/conda/lib/python3.11/site-packages/zipline/data/bcolz_daily_bars.py:578\u001b[0m, in \u001b[0;36mBcolzDailyBarReader._load_raw_arrays_date_to_index\u001b[0;34m(self, date)\u001b[0m\n\u001b[1;32m 577\u001b[0m \u001b[38;5;28;01mtry\u001b[39;00m:\n\u001b[0;32m--> 578\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43msessions\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mget_loc\u001b[49m\u001b[43m(\u001b[49m\u001b[43mdate\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 579\u001b[0m \u001b[38;5;28;01mexcept\u001b[39;00m \u001b[38;5;167;01mKeyError\u001b[39;00m:\n",
592 | "File \u001b[0;32m/opt/conda/lib/python3.11/site-packages/pandas/core/indexes/datetimes.py:633\u001b[0m, in \u001b[0;36mDatetimeIndex.get_loc\u001b[0;34m(self, key)\u001b[0m\n\u001b[1;32m 632\u001b[0m \u001b[38;5;28;01mexcept\u001b[39;00m \u001b[38;5;167;01mKeyError\u001b[39;00m \u001b[38;5;28;01mas\u001b[39;00m err:\n\u001b[0;32m--> 633\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m \u001b[38;5;167;01mKeyError\u001b[39;00m(orig_key) \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01merr\u001b[39;00m\n",
593 | "\u001b[0;31mKeyError\u001b[0m: Timestamp('2006-12-18 00:00:00')",
594 | "\nDuring handling of the above exception, another exception occurred:\n",
595 | "\u001b[0;31mNoDataOnDate\u001b[0m Traceback (most recent call last)",
596 | "File \u001b[0;32m/opt/conda/lib/python3.11/site-packages/zipline/pipeline/engine.py:763\u001b[0m, in \u001b[0;36mSimplePipelineEngine.compute_chunk\u001b[0;34m(self, graph, dates, sids, workspace, refcounts, execution_order, hooks)\u001b[0m\n\u001b[1;32m 762\u001b[0m \u001b[38;5;28;01mtry\u001b[39;00m:\n\u001b[0;32m--> 763\u001b[0m loaded \u001b[38;5;241m=\u001b[39m \u001b[43mloader\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mload_adjusted_array\u001b[49m\u001b[43m(\u001b[49m\n\u001b[1;32m 764\u001b[0m \u001b[43m \u001b[49m\u001b[43mdomain\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mto_load\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mmask_dates\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43msids\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mmask\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 765\u001b[0m \u001b[43m \u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 766\u001b[0m \u001b[38;5;28;01mexcept\u001b[39;00m NoDataOnDate \u001b[38;5;28;01mas\u001b[39;00m e:\n",
597 | "File \u001b[0;32m/opt/conda/lib/python3.11/site-packages/zipline/pipeline/loaders/equity_pricing_loader.py:90\u001b[0m, in \u001b[0;36mEquityPricingLoader.load_adjusted_array\u001b[0;34m(***failed resolving arguments***)\u001b[0m\n\u001b[1;32m 88\u001b[0m ohlcv_colnames \u001b[38;5;241m=\u001b[39m [c\u001b[38;5;241m.\u001b[39mname \u001b[38;5;28;01mfor\u001b[39;00m c \u001b[38;5;129;01min\u001b[39;00m ohlcv_cols]\n\u001b[0;32m---> 90\u001b[0m raw_ohlcv_arrays \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mraw_price_reader\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mload_raw_arrays\u001b[49m\u001b[43m(\u001b[49m\n\u001b[1;32m 91\u001b[0m \u001b[43m \u001b[49m\u001b[43mohlcv_colnames\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 92\u001b[0m \u001b[43m \u001b[49m\u001b[43mshifted_dates\u001b[49m\u001b[43m[\u001b[49m\u001b[38;5;241;43m0\u001b[39;49m\u001b[43m]\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 93\u001b[0m \u001b[43m \u001b[49m\u001b[43mshifted_dates\u001b[49m\u001b[43m[\u001b[49m\u001b[38;5;241;43m-\u001b[39;49m\u001b[38;5;241;43m1\u001b[39;49m\u001b[43m]\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 94\u001b[0m \u001b[43m \u001b[49m\u001b[43msids\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 95\u001b[0m \u001b[43m\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 97\u001b[0m \u001b[38;5;66;03m# Currency convert raw_arrays in place if necessary. We use shifted\u001b[39;00m\n\u001b[1;32m 98\u001b[0m \u001b[38;5;66;03m# dates to load currency conversion rates to make them line up with\u001b[39;00m\n\u001b[1;32m 99\u001b[0m \u001b[38;5;66;03m# dates used to fetch prices.\u001b[39;00m\n",
598 | "File \u001b[0;32m/opt/conda/lib/python3.11/site-packages/zipline/data/bcolz_daily_bars.py:557\u001b[0m, in \u001b[0;36mBcolzDailyBarReader.load_raw_arrays\u001b[0;34m(self, columns, start_date, end_date, assets)\u001b[0m\n\u001b[1;32m 556\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21mload_raw_arrays\u001b[39m(\u001b[38;5;28mself\u001b[39m, columns, start_date, end_date, assets):\n\u001b[0;32m--> 557\u001b[0m start_idx \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_load_raw_arrays_date_to_index\u001b[49m\u001b[43m(\u001b[49m\u001b[43mstart_date\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 558\u001b[0m end_idx \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_load_raw_arrays_date_to_index(end_date)\n",
599 | "File \u001b[0;32m/opt/conda/lib/python3.11/site-packages/zipline/data/bcolz_daily_bars.py:580\u001b[0m, in \u001b[0;36mBcolzDailyBarReader._load_raw_arrays_date_to_index\u001b[0;34m(self, date)\u001b[0m\n\u001b[1;32m 579\u001b[0m \u001b[38;5;28;01mexcept\u001b[39;00m \u001b[38;5;167;01mKeyError\u001b[39;00m:\n\u001b[0;32m--> 580\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m NoDataOnDate(date)\n",
600 | "\u001b[0;31mNoDataOnDate\u001b[0m: 2006-12-18 00:00:00",
601 | "\nDuring handling of the above exception, another exception occurred:\n",
602 | "\u001b[0;31mNoDataOnDate\u001b[0m Traceback (most recent call last)",
603 | "Cell \u001b[0;32mIn[10], line 7\u001b[0m\n\u001b[1;32m 1\u001b[0m pipeline \u001b[38;5;241m=\u001b[39m Pipeline(\n\u001b[1;32m 2\u001b[0m columns\u001b[38;5;241m=\u001b[39m{\n\u001b[1;32m 3\u001b[0m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mvwap\u001b[39m\u001b[38;5;124m\"\u001b[39m: VWAP(window_length\u001b[38;5;241m=\u001b[39m\u001b[38;5;241m10\u001b[39m)\n\u001b[1;32m 4\u001b[0m }\n\u001b[1;32m 5\u001b[0m )\n\u001b[0;32m----> 7\u001b[0m result \u001b[38;5;241m=\u001b[39m \u001b[43mrun_pipeline\u001b[49m\u001b[43m(\u001b[49m\u001b[43mpipeline\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mstart_date\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[38;5;124;43m'\u001b[39;49m\u001b[38;5;124;43m2007-01-04\u001b[39;49m\u001b[38;5;124;43m'\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mend_date\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[38;5;124;43m'\u001b[39;49m\u001b[38;5;124;43m2007-01-04\u001b[39;49m\u001b[38;5;124;43m'\u001b[39;49m\u001b[43m)\u001b[49m\n",
604 | "File \u001b[0;32m/opt/conda/lib/python3.11/site-packages/zipline/research/pipeline.py:95\u001b[0m, in \u001b[0;36mrun_pipeline\u001b[0;34m(pipeline, start_date, end_date, bundle)\u001b[0m\n\u001b[1;32m 36\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21mrun_pipeline\u001b[39m(\n\u001b[1;32m 37\u001b[0m pipeline: Pipeline,\n\u001b[1;32m 38\u001b[0m start_date: \u001b[38;5;28mstr\u001b[39m,\n\u001b[1;32m 39\u001b[0m end_date: \u001b[38;5;28mstr\u001b[39m \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;01mNone\u001b[39;00m,\n\u001b[1;32m 40\u001b[0m bundle: \u001b[38;5;28mstr\u001b[39m \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;01mNone\u001b[39;00m\n\u001b[1;32m 41\u001b[0m ) \u001b[38;5;241m-\u001b[39m\u001b[38;5;241m>\u001b[39m pd\u001b[38;5;241m.\u001b[39mDataFrame:\n\u001b[1;32m 42\u001b[0m \u001b[38;5;250m \u001b[39m\u001b[38;5;124;03m\"\"\"\u001b[39;00m\n\u001b[1;32m 43\u001b[0m \u001b[38;5;124;03m Compute values for pipeline from start_date to end_date, using the specified\u001b[39;00m\n\u001b[1;32m 44\u001b[0m \u001b[38;5;124;03m bundle or the default bundle.\u001b[39;00m\n\u001b[0;32m (...)\u001b[0m\n\u001b[1;32m 93\u001b[0m \u001b[38;5;124;03m factor = run_pipeline(pipeline, '2018-01-01', '2019-02-01', bundle=\"usstock-1min\")\u001b[39;00m\n\u001b[1;32m 94\u001b[0m \u001b[38;5;124;03m \"\"\"\u001b[39;00m\n\u001b[0;32m---> 95\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[43m_run_pipeline\u001b[49m\u001b[43m(\u001b[49m\n\u001b[1;32m 96\u001b[0m \u001b[43m \u001b[49m\u001b[43mpipeline\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 97\u001b[0m \u001b[43m \u001b[49m\u001b[43mstart_date\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mstart_date\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 98\u001b[0m \u001b[43m \u001b[49m\u001b[43mend_date\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mend_date\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 99\u001b[0m \u001b[43m 
\u001b[49m\u001b[43mbundle\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mbundle\u001b[49m\u001b[43m)\u001b[49m\n",
605 | "File \u001b[0;32m/opt/conda/lib/python3.11/site-packages/zipline/research/pipeline.py:251\u001b[0m, in \u001b[0;36m_run_pipeline\u001b[0;34m(pipeline, start_date, end_date, bundle, mask)\u001b[0m\n\u001b[1;32m 248\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m use_chunks:\n\u001b[1;32m 249\u001b[0m \u001b[38;5;66;03m# Run in 1-years chunks to reduce memory usage\u001b[39;00m\n\u001b[1;32m 250\u001b[0m chunksize \u001b[38;5;241m=\u001b[39m \u001b[38;5;241m252\u001b[39m\n\u001b[0;32m--> 251\u001b[0m results \u001b[38;5;241m=\u001b[39m \u001b[43mengine\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mrun_chunked_pipeline\u001b[49m\u001b[43m(\u001b[49m\u001b[43mpipeline\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mstart_date\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mend_date\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mchunksize\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mchunksize\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 252\u001b[0m \u001b[38;5;28;01melse\u001b[39;00m:\n\u001b[1;32m 253\u001b[0m results \u001b[38;5;241m=\u001b[39m engine\u001b[38;5;241m.\u001b[39mrun_pipeline(pipeline, start_date, end_date)\n",
606 | "File \u001b[0;32m/opt/conda/lib/python3.11/site-packages/zipline/pipeline/engine.py:350\u001b[0m, in \u001b[0;36mSimplePipelineEngine.run_chunked_pipeline\u001b[0;34m(self, pipeline, start_date, end_date, chunksize, hooks)\u001b[0m\n\u001b[1;32m 348\u001b[0m run_pipeline \u001b[38;5;241m=\u001b[39m partial(\u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_run_pipeline_impl, pipeline, hooks\u001b[38;5;241m=\u001b[39mhooks)\n\u001b[1;32m 349\u001b[0m \u001b[38;5;28;01mwith\u001b[39;00m hooks\u001b[38;5;241m.\u001b[39mrunning_pipeline(pipeline, start_date, end_date):\n\u001b[0;32m--> 350\u001b[0m chunks \u001b[38;5;241m=\u001b[39m \u001b[43m[\u001b[49m\u001b[43mrun_pipeline\u001b[49m\u001b[43m(\u001b[49m\u001b[43ms\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43me\u001b[49m\u001b[43m)\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;28;43;01mfor\u001b[39;49;00m\u001b[43m \u001b[49m\u001b[43ms\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43me\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;129;43;01min\u001b[39;49;00m\u001b[43m \u001b[49m\u001b[43mranges\u001b[49m\u001b[43m]\u001b[49m\n\u001b[1;32m 352\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28mlen\u001b[39m(chunks) \u001b[38;5;241m==\u001b[39m \u001b[38;5;241m1\u001b[39m:\n\u001b[1;32m 353\u001b[0m \u001b[38;5;66;03m# OPTIMIZATION: Don't make an extra copy in `categorical_df_concat`\u001b[39;00m\n\u001b[1;32m 354\u001b[0m \u001b[38;5;66;03m# if we don't have to.\u001b[39;00m\n\u001b[1;32m 355\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m chunks[\u001b[38;5;241m0\u001b[39m]\n",
607 | "File \u001b[0;32m/opt/conda/lib/python3.11/site-packages/zipline/pipeline/engine.py:350\u001b[0m, in \u001b[0;36m\u001b[0;34m(.0)\u001b[0m\n\u001b[1;32m 348\u001b[0m run_pipeline \u001b[38;5;241m=\u001b[39m partial(\u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_run_pipeline_impl, pipeline, hooks\u001b[38;5;241m=\u001b[39mhooks)\n\u001b[1;32m 349\u001b[0m \u001b[38;5;28;01mwith\u001b[39;00m hooks\u001b[38;5;241m.\u001b[39mrunning_pipeline(pipeline, start_date, end_date):\n\u001b[0;32m--> 350\u001b[0m chunks \u001b[38;5;241m=\u001b[39m [\u001b[43mrun_pipeline\u001b[49m\u001b[43m(\u001b[49m\u001b[43ms\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43me\u001b[49m\u001b[43m)\u001b[49m \u001b[38;5;28;01mfor\u001b[39;00m s, e \u001b[38;5;129;01min\u001b[39;00m ranges]\n\u001b[1;32m 352\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28mlen\u001b[39m(chunks) \u001b[38;5;241m==\u001b[39m \u001b[38;5;241m1\u001b[39m:\n\u001b[1;32m 353\u001b[0m \u001b[38;5;66;03m# OPTIMIZATION: Don't make an extra copy in `categorical_df_concat`\u001b[39;00m\n\u001b[1;32m 354\u001b[0m \u001b[38;5;66;03m# if we don't have to.\u001b[39;00m\n\u001b[1;32m 355\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m chunks[\u001b[38;5;241m0\u001b[39m]\n",
608 | "File \u001b[0;32m/opt/conda/lib/python3.11/site-packages/zipline/pipeline/engine.py:440\u001b[0m, in \u001b[0;36mSimplePipelineEngine._run_pipeline_impl\u001b[0;34m(self, pipeline, start_date, end_date, hooks)\u001b[0m\n\u001b[1;32m 434\u001b[0m execution_order \u001b[38;5;241m=\u001b[39m plan\u001b[38;5;241m.\u001b[39mexecution_order(workspace, refcounts)\n\u001b[1;32m 436\u001b[0m \u001b[38;5;28;01mwith\u001b[39;00m hooks\u001b[38;5;241m.\u001b[39mcomputing_chunk(execution_order,\n\u001b[1;32m 437\u001b[0m start_date,\n\u001b[1;32m 438\u001b[0m end_date):\n\u001b[0;32m--> 440\u001b[0m results \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mcompute_chunk\u001b[49m\u001b[43m(\u001b[49m\n\u001b[1;32m 441\u001b[0m \u001b[43m \u001b[49m\u001b[43mgraph\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mplan\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 442\u001b[0m \u001b[43m \u001b[49m\u001b[43mdates\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mdates\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 443\u001b[0m \u001b[43m \u001b[49m\u001b[43msids\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43msids\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 444\u001b[0m \u001b[43m \u001b[49m\u001b[43mworkspace\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mworkspace\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 445\u001b[0m \u001b[43m \u001b[49m\u001b[43mrefcounts\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mrefcounts\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 446\u001b[0m \u001b[43m \u001b[49m\u001b[43mexecution_order\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mexecution_order\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 447\u001b[0m \u001b[43m \u001b[49m\u001b[43mhooks\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mhooks\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 448\u001b[0m \u001b[43m \u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 450\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m 
\u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_to_narrow(\n\u001b[1;32m 451\u001b[0m plan\u001b[38;5;241m.\u001b[39moutputs,\n\u001b[1;32m 452\u001b[0m results,\n\u001b[0;32m (...)\u001b[0m\n\u001b[1;32m 455\u001b[0m sids,\n\u001b[1;32m 456\u001b[0m )\n",
609 | "File \u001b[0;32m/opt/conda/lib/python3.11/site-packages/zipline/pipeline/engine.py:777\u001b[0m, in \u001b[0;36mSimplePipelineEngine.compute_chunk\u001b[0;34m(self, graph, dates, sids, workspace, refcounts, execution_order, hooks)\u001b[0m\n\u001b[1;32m 767\u001b[0m extra_rows \u001b[38;5;241m=\u001b[39m graph\u001b[38;5;241m.\u001b[39mextra_rows[term]\n\u001b[1;32m 768\u001b[0m msg \u001b[38;5;241m=\u001b[39m (\n\u001b[1;32m 769\u001b[0m \u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mthe pipeline definition requires \u001b[39m\u001b[38;5;132;01m{\u001b[39;00mterm\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m data on \u001b[39m\u001b[38;5;132;01m{\u001b[39;00m\u001b[38;5;28mstr\u001b[39m(e)\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m but no bundle data is \u001b[39m\u001b[38;5;124m\"\u001b[39m\n\u001b[1;32m 770\u001b[0m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mavailable on that date; the cause of this issue is that another pipeline term needs \u001b[39m\u001b[38;5;124m\"\u001b[39m\n\u001b[0;32m (...)\u001b[0m\n\u001b[1;32m 775\u001b[0m \u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mthe problem:\u001b[39m\u001b[38;5;130;01m\\n\u001b[39;00m\u001b[38;5;130;01m\\n\u001b[39;00m\u001b[38;5;132;01m{\u001b[39;00m\u001b[38;5;28mrepr\u001b[39m(graph)\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m\"\u001b[39m\n\u001b[1;32m 776\u001b[0m )\n\u001b[0;32m--> 777\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m NoDataOnDate(msg)\n\u001b[1;32m 778\u001b[0m \u001b[38;5;28;01massert\u001b[39;00m \u001b[38;5;28mset\u001b[39m(loaded) \u001b[38;5;241m==\u001b[39m \u001b[38;5;28mset\u001b[39m(to_load), (\n\u001b[1;32m 779\u001b[0m \u001b[38;5;124m'\u001b[39m\u001b[38;5;124mloader did not return an AdjustedArray for each column\u001b[39m\u001b[38;5;130;01m\\n\u001b[39;00m\u001b[38;5;124m'\u001b[39m\n\u001b[1;32m 780\u001b[0m \u001b[38;5;124m'\u001b[39m\u001b[38;5;124mexpected: 
\u001b[39m\u001b[38;5;132;01m%r\u001b[39;00m\u001b[38;5;130;01m\\n\u001b[39;00m\u001b[38;5;124m'\u001b[39m\n\u001b[0;32m (...)\u001b[0m\n\u001b[1;32m 784\u001b[0m )\n\u001b[1;32m 785\u001b[0m )\n\u001b[1;32m 786\u001b[0m workspace\u001b[38;5;241m.\u001b[39mupdate(loaded)\n",
610 | "\u001b[0;31mNoDataOnDate\u001b[0m: the pipeline definition requires EquityPricing.close::float64 data on 2006-12-18 00:00:00 but no bundle data is available on that date; the cause of this issue is that another pipeline term needs EquityPricing.close::float64 and has a window_length of 10, which necessitates loading 9 extra rows of EquityPricing.close::float64; try setting a later start date so that the maximum window_length of any term doesn't extend further back than the bundle start date. Review the pipeline dependencies below to help determine which terms are causing the problem:\n\n{'dependencies': [{'term': EquityPricing.close::float64,\n 'used_by': VWAP([EquityPricing.close, EquityPricing.volume], 10)},\n {'term': EquityPricing.volume::float64,\n 'used_by': VWAP([EquityPricing.close, EquityPricing.volume], 10)}],\n 'nodes': [{'extra_rows': 9, 'needed_for': EquityPricing.close::float64},\n {'extra_rows': 9, 'needed_for': EquityPricing.volume::float64}]}"
611 | ]
612 | }
613 | ],
614 | "source": [
615 | "pipeline = Pipeline(\n",
616 | " columns={\n",
617 | " \"vwap\": VWAP(window_length=10)\n",
618 | " }\n",
619 | ")\n",
620 | "\n",
621 | "result = run_pipeline(pipeline, start_date='2007-01-04', end_date='2007-01-04')"
622 | ]
623 | },
624 | {
625 | "cell_type": "markdown",
626 | "metadata": {},
627 | "source": [
628 |     "The error message indicates that we would need data back to 2006-12-18 in order to calculate a 10-day VWAP and produce pipeline output on 2007-01-04 (`window_length` is measured in trading days, not calendar days). The solution is to set a later start date so that the VWAP factor doesn't require data prior to the bundle start date of 2007-01-03. In this example, the earliest possible `start_date` turns out to be 2007-01-18, the 11th trading session on or after the bundle start date: the lookback window ends on the session before each pipeline date, so the 10 sessions from 2007-01-03 through 2007-01-17 supply the data for 2007-01-18. "
629 | ]
630 | },
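  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a sanity check, we can count the sessions ourselves. The following sketch uses plain pandas (it is not part of the pipeline API) and approximates the NYSE calendar with US federal holidays, which coincide with market holidays in January 2007:\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "from pandas.tseries.holiday import USFederalHolidayCalendar\n",
    "\n",
    "# Approximate NYSE sessions: skip weekends and US federal holidays\n",
    "bday_us = pd.offsets.CustomBusinessDay(calendar=USFederalHolidayCalendar())\n",
    "sessions = pd.date_range('2007-01-03', periods=11, freq=bday_us)\n",
    "\n",
    "# The 10-session window ends on the session before each pipeline date,\n",
    "# so the earliest usable start_date is the 11th session of the bundle\n",
    "sessions[-1]  # 2007-01-18\n",
    "```"
   ]
  },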
631 | {
632 | "cell_type": "code",
633 | "execution_count": 11,
634 | "metadata": {},
635 | "outputs": [],
636 | "source": [
637 | "result = run_pipeline(pipeline, start_date='2007-01-18', end_date='2007-01-18')"
638 | ]
639 | },
640 | {
641 | "cell_type": "markdown",
642 | "metadata": {},
643 | "source": [
644 | "---\n",
645 | "\n",
646 | "**Next Lesson:** [Combining Factors](Lesson05-Combining-Factors.ipynb) "
647 | ]
648 | }
649 | ],
650 | "metadata": {
651 | "kernelspec": {
652 | "display_name": "Python 3.11",
653 | "language": "python",
654 | "name": "python3"
655 | },
656 | "language_info": {
657 | "codemirror_mode": {
658 | "name": "ipython",
659 | "version": 3
660 | },
661 | "file_extension": ".py",
662 | "mimetype": "text/x-python",
663 | "name": "python",
664 | "nbconvert_exporter": "python",
665 | "pygments_lexer": "ipython3",
666 | "version": "3.11.0"
667 | }
668 | },
669 | "nbformat": 4,
670 | "nbformat_minor": 4
671 | }
672 |
--------------------------------------------------------------------------------
/pipeline_tutorial/Lesson05-Combining-Factors.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 |     "\n",
8 | "Disclaimer"
9 | ]
10 | },
11 | {
12 | "cell_type": "markdown",
13 | "metadata": {},
14 | "source": [
15 | "***\n",
16 | "[Pipeline Tutorial](Introduction.ipynb) › Lesson 5: Combining Factors\n",
17 | "***"
18 | ]
19 | },
20 | {
21 | "cell_type": "markdown",
22 | "metadata": {},
23 | "source": [
24 | "# Combining Factors\n",
25 | "\n",
26 |     "Factors can be combined, both with other Factors and with scalar values, via any of the built-in mathematical operators (+, -, \\*, etc.). This makes it easy to write complex expressions that combine multiple Factors. For example, constructing a Factor that computes the average of two other Factors is simply:\n",
27 | "\n",
28 | "```python\n",
29 | ">>> f1 = SomeFactor(...)\n",
30 | ">>> f2 = SomeOtherFactor(...)\n",
31 | ">>> average = (f1 + f2) / 2.0\n",
32 | "```\n",
33 | "\n",
34 |     "In this lesson, we will create a pipeline with a `percent_difference` factor, built by combining a 10-day average factor and a 30-day average factor. \n",
35 | "\n",
36 | "As usual, let's start with our imports:"
37 | ]
38 | },
39 | {
40 | "cell_type": "code",
41 | "execution_count": 1,
42 | "metadata": {},
43 | "outputs": [],
44 | "source": [
45 | "from zipline.pipeline import Pipeline, EquityPricing\n",
46 | "from zipline.research import run_pipeline\n",
47 | "from zipline.pipeline.factors import SimpleMovingAverage"
48 | ]
49 | },
50 | {
51 | "cell_type": "markdown",
52 | "metadata": {},
53 | "source": [
54 | "For this example, we need two factors: a 10-day mean close price factor, and a 30-day one:"
55 | ]
56 | },
57 | {
58 | "cell_type": "code",
59 | "execution_count": 2,
60 | "metadata": {
61 | "collapsed": false,
62 | "jupyter": {
63 | "outputs_hidden": false
64 | }
65 | },
66 | "outputs": [],
67 | "source": [
68 | "mean_close_10 = SimpleMovingAverage(inputs=EquityPricing.close, window_length=10)\n",
69 | "mean_close_30 = SimpleMovingAverage(inputs=EquityPricing.close, window_length=30)"
70 | ]
71 | },
72 | {
73 | "cell_type": "markdown",
74 | "metadata": {},
75 | "source": [
76 | "Then, let's create a percent difference factor by combining our `mean_close_30` factor with our `mean_close_10` factor."
77 | ]
78 | },
79 | {
80 | "cell_type": "code",
81 | "execution_count": 3,
82 | "metadata": {
83 | "collapsed": false,
84 | "jupyter": {
85 | "outputs_hidden": false
86 | }
87 | },
88 | "outputs": [],
89 | "source": [
90 | "percent_difference = (mean_close_10 - mean_close_30) / mean_close_30"
91 | ]
92 | },
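  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The arithmetic is applied element-wise, asset by asset. As a rough illustration, here is the same computation in plain pandas with made-up values (this is not pipeline code):\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical moving averages for two assets on a single day\n",
    "mean_close_10 = pd.Series({'AAA': 101.0, 'BBB': 52.0})\n",
    "mean_close_30 = pd.Series({'AAA': 100.0, 'BBB': 50.0})\n",
    "\n",
    "percent_difference = (mean_close_10 - mean_close_30) / mean_close_30\n",
    "# AAA    0.01\n",
    "# BBB    0.04\n",
    "```"
   ]
  },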
93 | {
94 | "cell_type": "markdown",
95 | "metadata": {},
96 | "source": [
97 |     "In this example, `percent_difference` is still a `Factor`, even though it's a combination of more primitive factors. We can add `percent_difference` as a column in our pipeline. Let's define `make_pipeline` to create a pipeline with `percent_difference` as a column (but not the individual mean close factors):"
98 | ]
99 | },
100 | {
101 | "cell_type": "code",
102 | "execution_count": 4,
103 | "metadata": {},
104 | "outputs": [],
105 | "source": [
106 | "def make_pipeline():\n",
107 | "\n",
108 | " mean_close_10 = SimpleMovingAverage(inputs=EquityPricing.close, window_length=10)\n",
109 | " mean_close_30 = SimpleMovingAverage(inputs=EquityPricing.close, window_length=30)\n",
110 | "\n",
111 | " percent_difference = (mean_close_10 - mean_close_30) / mean_close_30\n",
112 | "\n",
113 | " return Pipeline(\n",
114 | " columns={\n",
115 | " 'percent_difference': percent_difference\n",
116 | " }\n",
117 | " )"
118 | ]
119 | },
120 | {
121 | "cell_type": "markdown",
122 | "metadata": {},
123 | "source": [
124 | "Let's see what the new output looks like:"
125 | ]
126 | },
127 | {
128 | "cell_type": "code",
129 | "execution_count": 5,
130 | "metadata": {
131 | "collapsed": false,
132 | "jupyter": {
133 | "outputs_hidden": false
134 | },
135 | "scrolled": true
136 | },
137 | "outputs": [
138 | {
139 | "data": {
219 | "text/plain": [
220 | " percent_difference\n",
221 | "date asset \n",
222 | "2010-01-05 Equity(FIBBG000C2V3D6 [A]) 0.021425\n",
223 | " Equity(QI000000004076 [AABA]) 0.050484\n",
224 | " Equity(FIBBG000BZWHH8 [AACC]) 0.059385\n",
225 | " Equity(FIBBG000V2S3P6 [AACG]) -0.079614\n",
226 | " Equity(FIBBG000M7KQ09 [AAI]) 0.068811\n",
227 | "... ...\n",
228 | " Equity(FIBBG011MC2100 [AATC]) -0.047524\n",
229 | " Equity(FIBBG000GDBDH4 [BDG]) NaN\n",
230 | " Equity(FIBBG000008NR0 [ISM]) NaN\n",
231 | " Equity(FIBBG000GZ24W8 [PEM]) NaN\n",
232 | " Equity(FIBBG000BB5S87 [HCH]) 0.045581\n",
233 | "\n",
234 | "[7841 rows x 1 columns]"
235 | ]
236 | },
237 | "execution_count": 5,
238 | "metadata": {},
239 | "output_type": "execute_result"
240 | }
241 | ],
242 | "source": [
243 | "result = run_pipeline(make_pipeline(), start_date='2010-01-05', end_date='2010-01-05')\n",
244 | "result"
245 | ]
246 | },
247 | {
248 | "cell_type": "markdown",
249 | "metadata": {},
250 | "source": [
251 | "---\n",
252 | "\n",
253 | "**Next Lesson:** [Filters](Lesson06-Filters.ipynb) "
254 | ]
255 | }
256 | ],
257 | "metadata": {
258 | "kernelspec": {
259 | "display_name": "Python 3.11",
260 | "language": "python",
261 | "name": "python3"
262 | },
263 | "language_info": {
264 | "codemirror_mode": {
265 | "name": "ipython",
266 | "version": 3
267 | },
268 | "file_extension": ".py",
269 | "mimetype": "text/x-python",
270 | "name": "python",
271 | "nbconvert_exporter": "python",
272 | "pygments_lexer": "ipython3",
273 | "version": "3.11.0"
274 | }
275 | },
276 | "nbformat": 4,
277 | "nbformat_minor": 4
278 | }
279 |
--------------------------------------------------------------------------------
/pipeline_tutorial/Lesson06-Filters.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 |     "\n",
8 | "Disclaimer"
9 | ]
10 | },
11 | {
12 | "cell_type": "markdown",
13 | "metadata": {},
14 | "source": [
15 | "***\n",
16 | "[Pipeline Tutorial](Introduction.ipynb) › Lesson 6: Filters\n",
17 | "***"
18 | ]
19 | },
20 | {
21 | "cell_type": "markdown",
22 | "metadata": {},
23 | "source": [
24 | "# Filters\n",
25 | "A Filter is a function from an asset and a moment in time to a boolean:\n",
26 | "\n",
27 | "```\n",
28 | "F(asset, timestamp) -> boolean\n",
29 | "\n",
30 | "```\n",
31 | "\n",
32 | "In Pipeline, Filters are used for narrowing down the set of securities included in a computation or in the final output of a pipeline. There are two common ways to create a `Filter`: comparison operators and `Factor`/`Classifier` methods."
33 | ]
34 | },
35 | {
36 | "cell_type": "code",
37 | "execution_count": 1,
38 | "metadata": {},
39 | "outputs": [],
40 | "source": [
41 | "from zipline.pipeline import Pipeline, EquityPricing\n",
42 | "from zipline.research import run_pipeline\n",
43 | "from zipline.pipeline.factors import SimpleMovingAverage"
44 | ]
45 | },
46 | {
47 | "cell_type": "markdown",
48 | "metadata": {},
49 | "source": [
50 | "## Comparison Operators\n",
51 | "\n",
52 | "Comparison operators on `Factors` and `Classifiers` produce Filters. Since we haven't looked at `Classifiers` yet, let's stick to examples using `Factors`. The following example produces a filter that returns `True` whenever the latest close price is above $20."
53 | ]
54 | },
55 | {
56 | "cell_type": "code",
57 | "execution_count": 2,
58 | "metadata": {},
59 | "outputs": [],
60 | "source": [
61 | "last_close_price = EquityPricing.close.latest\n",
62 | "close_price_filter = last_close_price > 20"
63 | ]
64 | },
65 | {
66 | "cell_type": "markdown",
67 | "metadata": {},
68 | "source": [
69 |     "And this example produces a filter that returns `True` whenever the 10-day mean is below the 30-day mean."
70 | ]
71 | },
72 | {
73 | "cell_type": "code",
74 | "execution_count": 3,
75 | "metadata": {},
76 | "outputs": [],
77 | "source": [
78 | "mean_close_10 = SimpleMovingAverage(inputs=EquityPricing.close, window_length=10)\n",
79 | "mean_close_30 = SimpleMovingAverage(inputs=EquityPricing.close, window_length=30)\n",
80 | "mean_crossover_filter = mean_close_10 < mean_close_30"
81 | ]
82 | },
83 | {
84 | "cell_type": "markdown",
85 | "metadata": {},
86 | "source": [
87 | "Remember, each security will get its own `True` or `False` value each day."
88 | ]
89 | },
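  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make that concrete, here is a plain-pandas sketch with made-up prices (not pipeline code) of the cross-section a comparison filter produces on a single day:\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical latest close prices for three assets\n",
    "last_close_price = pd.Series({'AAA': 68.2, 'BBB': 3.5, 'CCC': 97.4})\n",
    "close_price_filter = last_close_price > 20\n",
    "# AAA     True\n",
    "# BBB    False\n",
    "# CCC     True\n",
    "```"
   ]
  },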
90 | {
91 | "cell_type": "markdown",
92 | "metadata": {},
93 | "source": [
94 | "## Factor/Classifier Methods\n",
95 | "\n",
96 | "Various methods of the `Factor` and `Classifier` classes return `Filters`. Again, since we haven't yet looked at `Classifiers`, let's stick to `Factor` methods for now (we'll look at `Classifier` methods later). The `Factor.top(n)` method produces a `Filter` that returns `True` for the top `n` securities of a given `Factor`. The following example produces a filter that returns `True` for exactly 200 securities every day, indicating that those securities were in the top 200 by last close price across all known securities."
97 | ]
98 | },
99 | {
100 | "cell_type": "code",
101 | "execution_count": 4,
102 | "metadata": {},
103 | "outputs": [],
104 | "source": [
105 | "last_close_price = EquityPricing.close.latest\n",
106 | "top_close_price_filter = last_close_price.top(200)"
107 | ]
108 | },
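  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Conceptually, `top(n)` ranks the factor's values across all assets each day and keeps the `n` highest. A plain-pandas sketch with made-up values (not pipeline code):\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical latest close prices for four assets\n",
    "last_close_price = pd.Series({'AAA': 15.0, 'BBB': 92.0, 'CCC': 47.0, 'DDD': 8.0})\n",
    "top_2 = last_close_price.rank(ascending=False) <= 2  # analogous to .top(2)\n",
    "# AAA    False\n",
    "# BBB     True\n",
    "# CCC     True\n",
    "# DDD    False\n",
    "```"
   ]
  },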
109 | {
110 | "cell_type": "markdown",
111 | "metadata": {},
112 | "source": [
113 | "For a full list of `Factor` methods that return `Filters`, see the [Factor API Reference](https://www.quantrocket.com/docs/api/#zipline.pipeline.Factor).\n",
114 | "\n",
115 | "For a full list of `Classifier` methods that return `Filters`, see the [Classifier API Reference](https://www.quantrocket.com/docs/api/#zipline.pipeline.Classifier)."
116 | ]
117 | },
118 | {
119 | "cell_type": "markdown",
120 | "metadata": {},
121 | "source": [
122 | "## Dollar Volume Filter\n",
123 | "As a starting example, let's create a filter that returns `True` if a security's 30-day average dollar volume is above $10,000,000. To do this, we'll first need to create an `AverageDollarVolume` factor to compute the 30-day average dollar volume. Let's include the built-in `AverageDollarVolume` factor in our imports:"
124 | ]
125 | },
126 | {
127 | "cell_type": "code",
128 | "execution_count": 5,
129 | "metadata": {},
130 | "outputs": [],
131 | "source": [
132 | "from zipline.pipeline.factors import AverageDollarVolume"
133 | ]
134 | },
135 | {
136 | "cell_type": "markdown",
137 | "metadata": {},
138 | "source": [
139 | "And then, let's instantiate our average dollar volume factor."
140 | ]
141 | },
142 | {
143 | "cell_type": "code",
144 | "execution_count": 6,
145 | "metadata": {},
146 | "outputs": [],
147 | "source": [
148 | "dollar_volume = AverageDollarVolume(window_length=30)"
149 | ]
150 | },
151 | {
152 | "cell_type": "markdown",
153 | "metadata": {},
154 | "source": [
155 | "By default, `AverageDollarVolume` uses `EquityPricing.close` and `EquityPricing.volume` as its `inputs`, so we don't specify them."
156 | ]
157 | },
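  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\"Dollar volume\" is close price times share volume, averaged over the window. A plain-pandas sketch of the arithmetic with made-up values (not pipeline code):\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Three days of hypothetical closes and share volumes for one asset\n",
    "close = pd.Series([10.0, 10.5, 11.0])\n",
    "volume = pd.Series([1_000_000, 1_200_000, 900_000])\n",
    "\n",
    "avg_dollar_volume = (close * volume).mean()  # ~10.8 million\n",
    "```"
   ]
  },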
158 | {
159 | "cell_type": "markdown",
160 | "metadata": {},
161 | "source": [
162 | "Now that we have a dollar volume factor, we can create a filter with a boolean expression. The following line creates a filter returning `True` for securities with a `dollar_volume` greater than 10,000,000:"
163 | ]
164 | },
165 | {
166 | "cell_type": "code",
167 | "execution_count": 7,
168 | "metadata": {
169 | "collapsed": false,
170 | "jupyter": {
171 | "outputs_hidden": false
172 | }
173 | },
174 | "outputs": [],
175 | "source": [
176 | "high_dollar_volume = (dollar_volume > 10000000)"
177 | ]
178 | },
179 | {
180 | "cell_type": "markdown",
181 | "metadata": {},
182 | "source": [
183 |     "To see what this filter looks like, let's add it as a column to the pipeline we defined in the previous lesson."
184 | ]
185 | },
186 | {
187 | "cell_type": "code",
188 | "execution_count": 8,
189 | "metadata": {},
190 | "outputs": [],
191 | "source": [
192 | "def make_pipeline():\n",
193 | "\n",
194 | " mean_close_10 = SimpleMovingAverage(inputs=EquityPricing.close, window_length=10)\n",
195 | " mean_close_30 = SimpleMovingAverage(inputs=EquityPricing.close, window_length=30)\n",
196 | "\n",
197 | " percent_difference = (mean_close_10 - mean_close_30) / mean_close_30\n",
198 | " \n",
199 | " dollar_volume = AverageDollarVolume(window_length=30)\n",
200 | " high_dollar_volume = (dollar_volume > 10000000)\n",
201 | "\n",
202 | " return Pipeline(\n",
203 | " columns={\n",
204 | " 'percent_difference': percent_difference,\n",
205 | " 'high_dollar_volume': high_dollar_volume\n",
206 | " }\n",
207 | " )"
208 | ]
209 | },
210 | {
211 | "cell_type": "markdown",
212 | "metadata": {},
213 | "source": [
214 | "If we make and run our pipeline, we now have a column `high_dollar_volume` with a boolean value corresponding to the result of the expression for each security."
215 | ]
216 | },
217 | {
218 | "cell_type": "code",
219 | "execution_count": 9,
220 | "metadata": {
221 | "collapsed": false,
222 | "jupyter": {
223 | "outputs_hidden": false
224 | },
225 | "scrolled": true
226 | },
227 | "outputs": [
228 | {
229 | "data": {
322 | "text/plain": [
323 | " percent_difference high_dollar_volume\n",
324 | "date asset \n",
325 | "2010-01-05 Equity(FIBBG000C2V3D6 [A]) 0.021425 True\n",
326 | " Equity(QI000000004076 [AABA]) 0.050484 True\n",
327 | " Equity(FIBBG000BZWHH8 [AACC]) 0.059385 False\n",
328 | " Equity(FIBBG000V2S3P6 [AACG]) -0.079614 False\n",
329 | " Equity(FIBBG000M7KQ09 [AAI]) 0.068811 True\n",
330 | "... ... ...\n",
331 | " Equity(FIBBG011MC2100 [AATC]) -0.047524 False\n",
332 | " Equity(FIBBG000GDBDH4 [BDG]) NaN False\n",
333 | " Equity(FIBBG000008NR0 [ISM]) NaN False\n",
334 | " Equity(FIBBG000GZ24W8 [PEM]) NaN False\n",
335 | " Equity(FIBBG000BB5S87 [HCH]) 0.045581 False\n",
336 | "\n",
337 | "[7841 rows x 2 columns]"
338 | ]
339 | },
340 | "execution_count": 9,
341 | "metadata": {},
342 | "output_type": "execute_result"
343 | }
344 | ],
345 | "source": [
346 | "result = run_pipeline(make_pipeline(), start_date='2010-01-05', end_date='2010-01-05')\n",
347 | "result"
348 | ]
349 | },
350 | {
351 | "cell_type": "markdown",
352 | "metadata": {},
353 | "source": [
354 | "## Applying a Screen\n",
355 |     "By default, a pipeline produces computed values each day for every asset in the data bundle. Very often, however, we only care about a subset of securities that meet specific criteria (for example, we might only care about securities that have enough daily trading volume to fill our orders quickly). We can tell our pipeline to ignore securities for which a filter produces `False` by passing that filter to the `Pipeline` via the `screen` keyword.\n",
356 | "\n",
357 | "To screen our pipeline output for securities with a 30-day average dollar volume greater than $10,000,000, we can simply pass our `high_dollar_volume` filter as the `screen` argument. This is what our `make_pipeline` function now looks like:"
358 | ]
359 | },
360 | {
361 | "cell_type": "code",
362 | "execution_count": 10,
363 | "metadata": {},
364 | "outputs": [],
365 | "source": [
366 | "def make_pipeline():\n",
367 | "\n",
368 |     "    mean_close_10 = SimpleMovingAverage(inputs=[EquityPricing.close], window_length=10)\n",
369 |     "    mean_close_30 = SimpleMovingAverage(inputs=[EquityPricing.close], window_length=30)\n",
370 | "\n",
371 | " percent_difference = (mean_close_10 - mean_close_30) / mean_close_30\n",
372 | "\n",
373 | " dollar_volume = AverageDollarVolume(window_length=30)\n",
374 | " high_dollar_volume = dollar_volume > 10000000\n",
375 | "\n",
376 | " return Pipeline(\n",
377 | " columns={\n",
378 | " 'percent_difference': percent_difference\n",
379 | " },\n",
380 | " screen=high_dollar_volume\n",
381 | " )"
382 | ]
383 | },
384 | {
385 | "cell_type": "markdown",
386 | "metadata": {},
387 | "source": [
388 |     "When we run this, the pipeline output only includes securities that pass the `high_dollar_volume` filter on a given day. For example, running this pipeline on Jan 5th, 2010, produces output for ~1,600 securities."
389 | ]
390 | },
391 | {
392 | "cell_type": "code",
393 | "execution_count": 11,
394 | "metadata": {
395 | "collapsed": false,
396 | "jupyter": {
397 | "outputs_hidden": false
398 | },
399 | "scrolled": true
400 | },
401 | "outputs": [
402 | {
403 | "name": "stdout",
404 | "output_type": "stream",
405 | "text": [
406 | "Number of securities that passed the filter: 1619\n"
407 | ]
408 | },
409 | {
410 | "data": {
411 | "text/html": [
412 | "\n",
413 | "\n",
426 | "
\n",
427 | " \n",
428 | " \n",
429 | " | \n",
430 | " | \n",
431 | " percent_difference | \n",
432 | "
\n",
433 | " \n",
434 | " date | \n",
435 | " asset | \n",
436 | " | \n",
437 | "
\n",
438 | " \n",
439 | " \n",
440 | " \n",
441 | " 2010-01-05 | \n",
442 | " Equity(FIBBG000C2V3D6 [A]) | \n",
443 | " 0.021425 | \n",
444 | "
\n",
445 | " \n",
446 | " Equity(QI000000004076 [AABA]) | \n",
447 | " 0.050484 | \n",
448 | "
\n",
449 | " \n",
450 | " Equity(FIBBG000M7KQ09 [AAI]) | \n",
451 | " 0.068811 | \n",
452 | "
\n",
453 | " \n",
454 | " Equity(QI000000053169 [AAN]) | \n",
455 | " 0.045988 | \n",
456 | "
\n",
457 | " \n",
458 | " Equity(FIBBG000F7RCJ1 [AAP]) | \n",
459 | " 0.015388 | \n",
460 | "
\n",
461 | " \n",
462 | " ... | \n",
463 | " ... | \n",
464 | "
\n",
465 | " \n",
466 | " Equity(FIBBG000BX6PW7 [YELL]) | \n",
467 | " -0.094294 | \n",
468 | "
\n",
469 | " \n",
470 | " Equity(FIBBG000RF0Z26 [YGE]) | \n",
471 | " 0.056671 | \n",
472 | "
\n",
473 | " \n",
474 | " Equity(FIBBG000BH3GZ2 [YUM]) | \n",
475 | " 0.003000 | \n",
476 | "
\n",
477 | " \n",
478 | " Equity(FIBBG000BKPL53 [ZBH]) | \n",
479 | " 0.010965 | \n",
480 | "
\n",
481 | " \n",
482 | " Equity(FIBBG000BX9WL1 [ZION]) | \n",
483 | " -0.011646 | \n",
484 | "
\n",
485 | " \n",
486 | "
\n",
487 | "
1619 rows × 1 columns
\n",
488 | "
"
489 | ],
490 | "text/plain": [
491 | " percent_difference\n",
492 | "date asset \n",
493 | "2010-01-05 Equity(FIBBG000C2V3D6 [A]) 0.021425\n",
494 | " Equity(QI000000004076 [AABA]) 0.050484\n",
495 | " Equity(FIBBG000M7KQ09 [AAI]) 0.068811\n",
496 | " Equity(QI000000053169 [AAN]) 0.045988\n",
497 | " Equity(FIBBG000F7RCJ1 [AAP]) 0.015388\n",
498 | "... ...\n",
499 | " Equity(FIBBG000BX6PW7 [YELL]) -0.094294\n",
500 | " Equity(FIBBG000RF0Z26 [YGE]) 0.056671\n",
501 | " Equity(FIBBG000BH3GZ2 [YUM]) 0.003000\n",
502 | " Equity(FIBBG000BKPL53 [ZBH]) 0.010965\n",
503 | " Equity(FIBBG000BX9WL1 [ZION]) -0.011646\n",
504 | "\n",
505 | "[1619 rows x 1 columns]"
506 | ]
507 | },
508 | "execution_count": 11,
509 | "metadata": {},
510 | "output_type": "execute_result"
511 | }
512 | ],
513 | "source": [
514 | "result = run_pipeline(make_pipeline(), start_date='2010-01-05', end_date='2010-01-05')\n",
515 | "print(f'Number of securities that passed the filter: {len(result)}')\n",
516 | "result"
517 | ]
518 | },
519 | {
520 | "cell_type": "markdown",
521 | "metadata": {},
522 | "source": [
523 | "## Inverting a Filter\n",
524 |     "The `~` operator is used to invert a filter, swapping all `True` values to `False` and vice versa. For example, we can write the following to filter for low dollar volume securities:"
525 | ]
526 | },
527 | {
528 | "cell_type": "code",
529 | "execution_count": 12,
530 | "metadata": {},
531 | "outputs": [],
532 | "source": [
533 | "low_dollar_volume = ~high_dollar_volume"
534 | ]
535 | },
536 | {
537 | "cell_type": "markdown",
538 | "metadata": {},
539 | "source": [
540 |     "This will return `True` for all securities with an average dollar volume less than or equal to $10,000,000 over the last 30 days."
541 | ]
542 | },
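The `~` inversion is elementwise: each security's `True`/`False` is flipped independently. As a rough standalone sketch (a plain pandas analogy with made-up values, not actual Pipeline code):

```python
import pandas as pd

# Hypothetical analogy: ~ flips each boolean value, just as ~ flips a
# pipeline Filter's output for each security independently.
high_dollar_volume = pd.Series([True, False, True])
low_dollar_volume = ~high_dollar_volume

print(low_dollar_volume.tolist())  # [False, True, False]
```

In the Pipeline API, the same `~` syntax operates on `Filter` objects rather than on Series.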
543 | {
544 | "cell_type": "markdown",
545 | "metadata": {},
546 | "source": [
547 | "---\n",
548 | "\n",
549 | "**Next Lesson:** [Combining Filters](Lesson07-Combining-Filters.ipynb) "
550 | ]
551 | }
552 | ],
553 | "metadata": {
554 | "kernelspec": {
555 | "display_name": "Python 3.11",
556 | "language": "python",
557 | "name": "python3"
558 | },
559 | "language_info": {
560 | "codemirror_mode": {
561 | "name": "ipython",
562 | "version": 3
563 | },
564 | "file_extension": ".py",
565 | "mimetype": "text/x-python",
566 | "name": "python",
567 | "nbconvert_exporter": "python",
568 | "pygments_lexer": "ipython3",
569 | "version": "3.11.0"
570 | }
571 | },
572 | "nbformat": 4,
573 | "nbformat_minor": 4
574 | }
575 |
--------------------------------------------------------------------------------
/pipeline_tutorial/Lesson07-Combining-Filters.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "
\n",
8 | "Disclaimer"
9 | ]
10 | },
11 | {
12 | "cell_type": "markdown",
13 | "metadata": {},
14 | "source": [
15 | "***\n",
16 | "[Pipeline Tutorial](Introduction.ipynb) › Lesson 7: Combining Filters\n",
17 | "***"
18 | ]
19 | },
20 | {
21 | "cell_type": "markdown",
22 | "metadata": {},
23 | "source": [
24 | "# Combining Filters\n",
25 | "\n",
26 | "Like factors, filters can be combined. Combining filters is done using the `&` (and) and `|` (or) operators. For example, let's say we want to screen for securities that are in the top 10% of average dollar volume and have a latest close price above \\$20. To start, let's make a high dollar volume filter using an `AverageDollarVolume` factor and `percentile_between`:"
27 | ]
28 | },
29 | {
30 | "cell_type": "code",
31 | "execution_count": 1,
32 | "metadata": {},
33 | "outputs": [],
34 | "source": [
35 | "from zipline.pipeline import Pipeline, EquityPricing\n",
36 | "from zipline.research import run_pipeline\n",
37 | "from zipline.pipeline.factors import SimpleMovingAverage, AverageDollarVolume"
38 | ]
39 | },
40 | {
41 | "cell_type": "code",
42 | "execution_count": 2,
43 | "metadata": {},
44 | "outputs": [],
45 | "source": [
46 | "dollar_volume = AverageDollarVolume(window_length=30)\n",
47 | "high_dollar_volume = dollar_volume.percentile_between(90, 100)"
48 | ]
49 | },
50 | {
51 | "cell_type": "markdown",
52 | "metadata": {},
53 | "source": [
54 | "Note: `percentile_between` is a `Factor` method returning a `Filter`."
55 | ]
56 | },
57 | {
58 | "cell_type": "markdown",
59 | "metadata": {},
60 | "source": [
61 | "Next, let's create a `latest_close` factor and define a filter for securities that closed above $20:"
62 | ]
63 | },
64 | {
65 | "cell_type": "code",
66 | "execution_count": 3,
67 | "metadata": {},
68 | "outputs": [],
69 | "source": [
70 | "latest_close = EquityPricing.close.latest\n",
71 | "above_20 = latest_close > 20"
72 | ]
73 | },
74 | {
75 | "cell_type": "markdown",
76 | "metadata": {},
77 | "source": [
78 | "Now we can combine our `high_dollar_volume` filter with our `above_20` filter using the `&` operator:"
79 | ]
80 | },
81 | {
82 | "cell_type": "code",
83 | "execution_count": 4,
84 | "metadata": {},
85 | "outputs": [],
86 | "source": [
87 | "tradeable_filter = high_dollar_volume & above_20"
88 | ]
89 | },
90 | {
91 | "cell_type": "markdown",
92 | "metadata": {},
93 | "source": [
94 | "This filter will evaluate to `True` for securities where both `high_dollar_volume` and `above_20` are `True`. Otherwise, it will evaluate to `False`. A similar computation can be made with the `|` (or) operator.\n",
95 | "\n",
96 | "If we want to use this filter as a screen in our pipeline, we can simply pass `tradeable_filter` as the `screen` argument."
97 | ]
98 | },
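Both operators work elementwise across securities. As a rough pandas sketch (an analogy with made-up values, not Pipeline code):

```python
import pandas as pd

# Hypothetical analogy: & and | combine boolean values elementwise,
# mirroring how they combine pipeline Filters security by security.
high_dollar_volume = pd.Series([True, True, False, False])
above_20 = pd.Series([True, False, True, False])

both = high_dollar_volume & above_20    # True only where both are True
either = high_dollar_volume | above_20  # True where at least one is True

print(both.tolist())    # [True, False, False, False]
print(either.tolist())  # [True, True, True, False]
```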
99 | {
100 | "cell_type": "code",
101 | "execution_count": 5,
102 | "metadata": {},
103 | "outputs": [],
104 | "source": [
105 | "def make_pipeline():\n",
106 | "\n",
107 |     "    mean_close_10 = SimpleMovingAverage(inputs=[EquityPricing.close], window_length=10)\n",
108 |     "    mean_close_30 = SimpleMovingAverage(inputs=[EquityPricing.close], window_length=30)\n",
109 | "\n",
110 | " percent_difference = (mean_close_10 - mean_close_30) / mean_close_30\n",
111 | "\n",
112 | " dollar_volume = AverageDollarVolume(window_length=30)\n",
113 | " high_dollar_volume = dollar_volume.percentile_between(90, 100)\n",
114 | "\n",
115 | " latest_close = EquityPricing.close.latest\n",
116 | " above_20 = latest_close > 20\n",
117 | "\n",
118 | " tradeable_filter = high_dollar_volume & above_20\n",
119 | "\n",
120 | " return Pipeline(\n",
121 | " columns={\n",
122 | " 'percent_difference': percent_difference\n",
123 | " },\n",
124 | " screen=tradeable_filter\n",
125 | " )"
126 | ]
127 | },
128 | {
129 | "cell_type": "markdown",
130 | "metadata": {},
131 | "source": [
132 | "When we run this, our pipeline output now only includes ~600 securities."
133 | ]
134 | },
135 | {
136 | "cell_type": "code",
137 | "execution_count": 6,
138 | "metadata": {
139 | "collapsed": false,
140 | "jupyter": {
141 | "outputs_hidden": false
142 | },
143 | "scrolled": true
144 | },
145 | "outputs": [
146 | {
147 | "name": "stdout",
148 | "output_type": "stream",
149 | "text": [
150 | "Number of securities that passed the filter: 615\n"
151 | ]
152 | },
153 | {
154 | "data": {
155 | "text/html": [
156 | "\n",
157 | "\n",
170 | "
\n",
171 | " \n",
172 | " \n",
173 | " | \n",
174 | " | \n",
175 | " percent_difference | \n",
176 | "
\n",
177 | " \n",
178 | " date | \n",
179 | " asset | \n",
180 | " | \n",
181 | "
\n",
182 | " \n",
183 | " \n",
184 | " \n",
185 | " 2010-01-05 | \n",
186 | " Equity(FIBBG000C2V3D6 [A]) | \n",
187 | " 0.021425 | \n",
188 | "
\n",
189 | " \n",
190 | " Equity(FIBBG000F7RCJ1 [AAP]) | \n",
191 | " 0.015388 | \n",
192 | "
\n",
193 | " \n",
194 | " Equity(FIBBG000B9XRY4 [AAPL]) | \n",
195 | " 0.030018 | \n",
196 | "
\n",
197 | " \n",
198 | " Equity(FIBBG000MDCQC2 [COR]) | \n",
199 | " 0.036454 | \n",
200 | "
\n",
201 | " \n",
202 | " Equity(FIBBG000B9ZXB4 [ABT]) | \n",
203 | " 0.003292 | \n",
204 | "
\n",
205 | " \n",
206 | " ... | \n",
207 | " ... | \n",
208 | "
\n",
209 | " \n",
210 | " Equity(FIBBG000BGB482 [XOP]) | \n",
211 | " 0.051799 | \n",
212 | "
\n",
213 | " \n",
214 | " Equity(FIBBG000D80VV4 [XRT]) | \n",
215 | " 0.021822 | \n",
216 | "
\n",
217 | " \n",
218 | " Equity(FIBBG000BH2VM4 [XTO]) | \n",
219 | " 0.064755 | \n",
220 | "
\n",
221 | " \n",
222 | " Equity(FIBBG000BH3GZ2 [YUM]) | \n",
223 | " 0.003000 | \n",
224 | "
\n",
225 | " \n",
226 | " Equity(FIBBG000BKPL53 [ZBH]) | \n",
227 | " 0.010965 | \n",
228 | "
\n",
229 | " \n",
230 | "
\n",
231 | "
615 rows × 1 columns
\n",
232 | "
"
233 | ],
234 | "text/plain": [
235 | " percent_difference\n",
236 | "date asset \n",
237 | "2010-01-05 Equity(FIBBG000C2V3D6 [A]) 0.021425\n",
238 | " Equity(FIBBG000F7RCJ1 [AAP]) 0.015388\n",
239 | " Equity(FIBBG000B9XRY4 [AAPL]) 0.030018\n",
240 | " Equity(FIBBG000MDCQC2 [COR]) 0.036454\n",
241 | " Equity(FIBBG000B9ZXB4 [ABT]) 0.003292\n",
242 | "... ...\n",
243 | " Equity(FIBBG000BGB482 [XOP]) 0.051799\n",
244 | " Equity(FIBBG000D80VV4 [XRT]) 0.021822\n",
245 | " Equity(FIBBG000BH2VM4 [XTO]) 0.064755\n",
246 | " Equity(FIBBG000BH3GZ2 [YUM]) 0.003000\n",
247 | " Equity(FIBBG000BKPL53 [ZBH]) 0.010965\n",
248 | "\n",
249 | "[615 rows x 1 columns]"
250 | ]
251 | },
252 | "execution_count": 6,
253 | "metadata": {},
254 | "output_type": "execute_result"
255 | }
256 | ],
257 | "source": [
258 | "result = run_pipeline(make_pipeline(), start_date='2010-01-05', end_date='2010-01-05')\n",
259 | "print(f'Number of securities that passed the filter: {len(result)}')\n",
260 | "result"
261 | ]
262 | },
263 | {
264 | "cell_type": "markdown",
265 | "metadata": {},
266 | "source": [
267 | "---\n",
268 | "\n",
269 | "**Next Lesson:** [Masking](Lesson08-Masking.ipynb) "
270 | ]
271 | }
272 | ],
273 | "metadata": {
274 | "kernelspec": {
275 | "display_name": "Python 3.11",
276 | "language": "python",
277 | "name": "python3"
278 | },
279 | "language_info": {
280 | "codemirror_mode": {
281 | "name": "ipython",
282 | "version": 3
283 | },
284 | "file_extension": ".py",
285 | "mimetype": "text/x-python",
286 | "name": "python",
287 | "nbconvert_exporter": "python",
288 | "pygments_lexer": "ipython3",
289 | "version": "3.11.0"
290 | }
291 | },
292 | "nbformat": 4,
293 | "nbformat_minor": 4
294 | }
295 |
--------------------------------------------------------------------------------
/pipeline_tutorial/Lesson08-Masking.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "
\n",
8 | "Disclaimer"
9 | ]
10 | },
11 | {
12 | "cell_type": "markdown",
13 | "metadata": {},
14 | "source": [
15 | "***\n",
16 | "[Pipeline Tutorial](Introduction.ipynb) › Lesson 8: Masking\n",
17 | "***"
18 | ]
19 | },
20 | {
21 | "cell_type": "markdown",
22 | "metadata": {},
23 | "source": [
24 | "# Masking\n",
25 | "\n",
26 |     "Sometimes we want to ignore certain assets when computing pipeline expressions. There are two common cases where ignoring assets is useful:\n",
27 | "1. We want to compute an expression that's computationally expensive, and we know we only care about results for certain assets. An example of such an expensive expression is a `Factor` computing the coefficients of a regression ([RollingLinearRegressionOfReturns](https://www.quantrocket.com/docs/api/#zipline.pipeline.factors.RollingLinearRegressionOfReturns)).\n",
28 | "2. We want to compute an expression that performs comparisons between assets, but we only want those comparisons to be performed against a subset of all assets. For example, we might want to use the `Factor` method `top` to compute the top 200 assets by earnings yield, ignoring assets that don't meet some liquidity constraint.\n",
29 | "\n",
30 |     "To support these two use-cases, all `Factors` and many `Factor` methods can accept a `mask` argument, which must be a `Filter` indicating which assets to consider in the computation."
31 | ]
32 | },
33 | {
34 | "cell_type": "code",
35 | "execution_count": 1,
36 | "metadata": {},
37 | "outputs": [],
38 | "source": [
39 | "from zipline.pipeline import Pipeline, EquityPricing\n",
40 | "from zipline.research import run_pipeline\n",
41 | "from zipline.pipeline.factors import SimpleMovingAverage, AverageDollarVolume"
42 | ]
43 | },
44 | {
45 | "cell_type": "markdown",
46 | "metadata": {},
47 | "source": [
48 | "## Masking Factors\n",
49 | "\n",
50 | "Let's say we want our pipeline to output securities with a high or low percent difference but we also only want to consider securities with a dollar volume above \\$10,000,000. To do this, let's rearrange our `make_pipeline` function so that we first create the `high_dollar_volume` filter. We can then use this filter as a `mask` for moving average factors by passing `high_dollar_volume` as the `mask` argument to `SimpleMovingAverage`."
51 | ]
52 | },
53 | {
54 | "cell_type": "code",
55 | "execution_count": 2,
56 | "metadata": {
57 | "collapsed": false,
58 | "jupyter": {
59 | "outputs_hidden": false
60 | }
61 | },
62 | "outputs": [],
63 | "source": [
64 | "# Dollar volume factor\n",
65 | "dollar_volume = AverageDollarVolume(window_length=30)\n",
66 | "\n",
67 | "# High dollar volume filter\n",
68 | "high_dollar_volume = (dollar_volume > 10000000)\n",
69 | "\n",
70 | "# Average close price factors\n",
71 |     "mean_close_10 = SimpleMovingAverage(inputs=[EquityPricing.close], window_length=10, mask=high_dollar_volume)\n",
72 |     "mean_close_30 = SimpleMovingAverage(inputs=[EquityPricing.close], window_length=30, mask=high_dollar_volume)\n",
73 | "\n",
74 | "# Relative difference factor\n",
75 | "percent_difference = (mean_close_10 - mean_close_30) / mean_close_30"
76 | ]
77 | },
78 | {
79 | "cell_type": "markdown",
80 | "metadata": {},
81 | "source": [
82 | "Applying the mask to `SimpleMovingAverage` restricts the average close price factors to a computation over the ~2000 securities passing the `high_dollar_volume` filter, as opposed to ~8000 without a mask. When we combine `mean_close_10` and `mean_close_30` to form `percent_difference`, the computation is performed on the same ~2000 securities."
83 | ]
84 | },
85 | {
86 | "cell_type": "markdown",
87 | "metadata": {},
88 | "source": [
89 | "## Masking Filters\n",
90 | "\n",
91 |     "Masks can also be applied to methods that return filters, like `top`, `bottom`, and `percentile_between`.\n",
92 | "\n",
93 | "Masks are most useful when we want to apply a filter in the earlier steps of a combined computation. For example, suppose we want to get the 50 securities with the highest open price that are also in the top 10% of dollar volume. Suppose that we then want the 90th-100th percentile of these securities by close price. We can do this with the following:"
94 | ]
95 | },
96 | {
97 | "cell_type": "code",
98 | "execution_count": 3,
99 | "metadata": {
100 | "collapsed": false,
101 | "jupyter": {
102 | "outputs_hidden": false
103 | }
104 | },
105 | "outputs": [],
106 | "source": [
107 | "# Dollar volume factor\n",
108 | "dollar_volume = AverageDollarVolume(window_length=30)\n",
109 | "\n",
110 | "# High dollar volume filter\n",
111 |     "high_dollar_volume = dollar_volume.percentile_between(90, 100)\n",
112 | "\n",
113 | "# Top open price filter (high dollar volume securities)\n",
114 | "top_open_price = EquityPricing.open.latest.top(50, mask=high_dollar_volume)\n",
115 | "\n",
116 | "# Top percentile close price filter (high dollar volume, top 50 open price)\n",
117 | "high_close_price = EquityPricing.close.latest.percentile_between(90, 100, mask=top_open_price)"
118 | ]
119 | },
120 | {
121 | "cell_type": "markdown",
122 | "metadata": {},
123 | "source": [
124 | "Let's put this into `make_pipeline` and output an empty pipeline screened with our `high_close_price` filter."
125 | ]
126 | },
127 | {
128 | "cell_type": "code",
129 | "execution_count": 4,
130 | "metadata": {},
131 | "outputs": [],
132 | "source": [
133 | "def make_pipeline():\n",
134 | "\n",
135 | " # Dollar volume factor\n",
136 | " dollar_volume = AverageDollarVolume(window_length=30)\n",
137 | "\n",
138 | " # High dollar volume filter\n",
139 |     "    high_dollar_volume = dollar_volume.percentile_between(90, 100)\n",
140 | "\n",
141 | " # Top open securities filter (high dollar volume securities)\n",
142 | " top_open_price = EquityPricing.open.latest.top(50, mask=high_dollar_volume)\n",
143 | "\n",
144 | " # Top percentile close price filter (high dollar volume, top 50 open price)\n",
145 | " high_close_price = EquityPricing.close.latest.percentile_between(90, 100, mask=top_open_price)\n",
146 | "\n",
147 | " return Pipeline(\n",
148 | " screen=high_close_price\n",
149 | " )"
150 | ]
151 | },
152 | {
153 | "cell_type": "markdown",
154 | "metadata": {},
155 | "source": [
156 | "Running this pipeline outputs 5 securities on Jan 5th, 2010."
157 | ]
158 | },
159 | {
160 | "cell_type": "code",
161 | "execution_count": 5,
162 | "metadata": {
163 | "collapsed": false,
164 | "jupyter": {
165 | "outputs_hidden": false
166 | }
167 | },
168 | "outputs": [
169 | {
170 | "name": "stdout",
171 | "output_type": "stream",
172 | "text": [
173 | "Number of securities that passed the filter: 5\n"
174 | ]
175 | },
176 | {
177 | "data": {
178 | "text/html": [
179 | "\n",
180 | "\n",
193 | "
\n",
194 | " \n",
195 | " \n",
196 | " | \n",
197 | " | \n",
198 | "
\n",
199 | " \n",
200 | " date | \n",
201 | " asset | \n",
202 | "
\n",
203 | " \n",
204 | " \n",
205 | " \n",
206 | " 2010-01-05 | \n",
207 | " Equity(FIBBG000QXWHD1 [BIDU]) | \n",
208 | "
\n",
209 | " \n",
210 | " Equity(FIBBG000DWCFL4 [BRK.A]) | \n",
211 | "
\n",
212 | " \n",
213 | " Equity(FIBBG000DWG505 [BRK.B]) | \n",
214 | "
\n",
215 | " \n",
216 | " Equity(FIBBG000BHLYP4 [CME]) | \n",
217 | "
\n",
218 | " \n",
219 | " Equity(FIBBG009S39JX6 [GOOGL]) | \n",
220 | "
\n",
221 | " \n",
222 | "
\n",
223 | "
"
224 | ],
225 | "text/plain": [
226 | "Empty DataFrameWithMetadata\n",
227 | "Columns: []\n",
228 | "Index: [(2010-01-05 00:00:00, Equity(FIBBG000QXWHD1 [BIDU])), (2010-01-05 00:00:00, Equity(FIBBG000DWCFL4 [BRK.A])), (2010-01-05 00:00:00, Equity(FIBBG000DWG505 [BRK.B])), (2010-01-05 00:00:00, Equity(FIBBG000BHLYP4 [CME])), (2010-01-05 00:00:00, Equity(FIBBG009S39JX6 [GOOGL]))]"
229 | ]
230 | },
231 | "execution_count": 5,
232 | "metadata": {},
233 | "output_type": "execute_result"
234 | }
235 | ],
236 | "source": [
237 | "result = run_pipeline(make_pipeline(), start_date='2010-01-05', end_date='2010-01-05')\n",
238 | "print(f'Number of securities that passed the filter: {len(result)}')\n",
239 | "result"
240 | ]
241 | },
242 | {
243 | "cell_type": "markdown",
244 | "metadata": {},
245 | "source": [
246 | "Note that applying masks in layers as we did above can be thought of as an \"asset funnel\"."
247 | ]
248 | },
249 | {
250 | "cell_type": "markdown",
251 | "metadata": {},
252 | "source": [
253 | "---\n",
254 | "\n",
255 | "**Next Lesson:** [Classifiers](Lesson09-Classifiers.ipynb) "
256 | ]
257 | }
258 | ],
259 | "metadata": {
260 | "kernelspec": {
261 | "display_name": "Python 3.11",
262 | "language": "python",
263 | "name": "python3"
264 | },
265 | "language_info": {
266 | "codemirror_mode": {
267 | "name": "ipython",
268 | "version": 3
269 | },
270 | "file_extension": ".py",
271 | "mimetype": "text/x-python",
272 | "name": "python",
273 | "nbconvert_exporter": "python",
274 | "pygments_lexer": "ipython3",
275 | "version": "3.11.0"
276 | }
277 | },
278 | "nbformat": 4,
279 | "nbformat_minor": 4
280 | }
281 |
--------------------------------------------------------------------------------
/pipeline_tutorial/Lesson09-Classifiers.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "
\n",
8 | "Disclaimer"
9 | ]
10 | },
11 | {
12 | "cell_type": "markdown",
13 | "metadata": {},
14 | "source": [
15 | "***\n",
16 | "[Pipeline Tutorial](Introduction.ipynb) › Lesson 9: Classifiers\n",
17 | "***"
18 | ]
19 | },
20 | {
21 | "cell_type": "markdown",
22 | "metadata": {},
23 | "source": [
24 | "# Classifiers\n",
25 | "A classifier is a function from an asset and a moment in time to a [categorical output](https://en.wikipedia.org/wiki/Categorical_variable) such as a `string` or `integer` label:\n",
26 | "\n",
27 | "```\n",
28 | "F(asset, timestamp) -> category\n",
29 | "```\n",
30 | "\n",
31 |     "An example of a classifier producing a string output is the exchange of a security. To create this classifier, we'll import `master` from `zipline.pipeline` and use the `latest` attribute of `master.SecuritiesMaster.Exchange` to instantiate our classifier:"
32 | ]
33 | },
34 | {
35 | "cell_type": "code",
36 | "execution_count": 1,
37 | "metadata": {},
38 | "outputs": [],
39 | "source": [
40 | "from zipline.pipeline import Pipeline, master\n",
41 | "from zipline.research import run_pipeline\n",
42 | "from zipline.pipeline.factors import AverageDollarVolume\n",
43 | "\n",
44 | "# Since the underlying data of master.SecuritiesMaster.Exchange\n",
45 | "# is of type string, .latest returns a Classifier\n",
46 | "exchange = master.SecuritiesMaster.Exchange.latest"
47 | ]
48 | },
49 | {
50 | "cell_type": "markdown",
51 | "metadata": {},
52 | "source": [
53 | "Previously, we saw that the `latest` attribute produced an instance of a `Factor`. In this case, since the underlying data is of type `string`, `latest` produces a `Classifier`.\n",
54 | "\n",
55 | "Similarly, a computation producing the sector of a security is a `Classifier`. To get the sector, we can again use the `SecuritiesMaster` dataset."
56 | ]
57 | },
58 | {
59 | "cell_type": "code",
60 | "execution_count": 2,
61 | "metadata": {
62 | "collapsed": false,
63 | "jupyter": {
64 | "outputs_hidden": false
65 | }
66 | },
67 | "outputs": [],
68 | "source": [
69 | "sector = master.SecuritiesMaster.usstock_Sector.latest"
70 | ]
71 | },
72 | {
73 | "cell_type": "markdown",
74 | "metadata": {},
75 | "source": [
76 | "## Building Filters from Classifiers\n",
77 | "\n",
78 | "Classifiers can also be used to produce filters with methods like `isnull`, `eq`, and `startswith`. The full list of `Classifier` methods producing `Filters` can be found in the [API Reference](https://www.quantrocket.com/docs/api/#zipline.pipeline.Classifier).\n",
79 | "\n",
80 | "As an example, if we wanted a filter to select for securities trading on the New York Stock Exchange, we can use the `eq` method of our `exchange` classifier."
81 | ]
82 | },
83 | {
84 | "cell_type": "code",
85 | "execution_count": 3,
86 | "metadata": {},
87 | "outputs": [],
88 | "source": [
89 | "nyse_filter = exchange.eq('XNYS')"
90 | ]
91 | },
92 | {
93 | "cell_type": "markdown",
94 | "metadata": {},
95 | "source": [
96 | "This filter will return `True` for securities having `'XNYS'` as their `Exchange`."
97 | ]
98 | },
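As a rough pandas sketch of these classifier-to-filter methods (an analogy with made-up labels, not Pipeline code):

```python
import pandas as pd

# Hypothetical analogy: a Classifier is like a Series of labels, and
# eq/startswith/isnull produce elementwise booleans, like Filters.
exchange = pd.Series(['XNYS', 'XNAS', None, 'XNYS'])

nyse = exchange == 'XNYS'                            # like exchange.eq('XNYS')
starts_xn = exchange.str.startswith('XN', na=False)  # like exchange.startswith('XN')
missing = exchange.isnull()                          # like exchange.isnull()

print(nyse.tolist())     # [True, False, False, True]
print(missing.tolist())  # [False, False, True, False]
```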
99 | {
100 | "cell_type": "markdown",
101 | "metadata": {},
102 | "source": [
103 | "## Quantiles\n",
104 | "\n",
105 | "Classifiers can also be produced from various `Factor` methods. The most general of these is the `quantiles` method which accepts a bin count as an argument. The `quantiles` method assigns a label from 0 to (bins - 1) to every non-NaN data point in the factor output and returns a `Classifier` with these labels. `NaN`s are labeled with -1. Aliases are available for [quartiles](https://www.quantrocket.com/docs/api/#zipline.pipeline.Factor.quartiles) (`quantiles(4)`), [quintiles](https://www.quantrocket.com/docs/api/#zipline.pipeline.Factor.quintiles) (`quantiles(5)`), and [deciles](https://www.quantrocket.com/docs/api/#zipline.pipeline.Factor.deciles) (`quantiles(10)`). As an example, this is what a filter for the top decile of a factor might look like:"
106 | ]
107 | },
108 | {
109 | "cell_type": "code",
110 | "execution_count": 4,
111 | "metadata": {},
112 | "outputs": [],
113 | "source": [
114 | "dollar_volume_decile = AverageDollarVolume(window_length=10).deciles()\n",
115 | "top_decile = (dollar_volume_decile.eq(9))"
116 | ]
117 | },
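The labeling scheme can be sketched with `pandas.qcut` (a rough analogy with made-up values, not Pipeline code): ten equal-frequency bins yield labels 0 through 9, and we mimic the -1 label for `NaN`:

```python
import pandas as pd

# Hypothetical analogy: quantiles(10) assigns decile labels 0-9 and
# labels NaN data points with -1.
dollar_volume = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, float('nan')])
deciles = pd.qcut(dollar_volume, 10, labels=False)  # labels 0-9; NaN stays NaN
deciles = deciles.fillna(-1).astype(int)            # map NaN to -1

print(deciles.tolist())  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, -1]
```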
118 | {
119 | "cell_type": "markdown",
120 | "metadata": {},
121 | "source": [
122 | "Let's put each of our classifiers into a pipeline and run it to see what they look like."
123 | ]
124 | },
125 | {
126 | "cell_type": "code",
127 | "execution_count": 5,
128 | "metadata": {
129 | "collapsed": false,
130 | "jupyter": {
131 | "outputs_hidden": false
132 | }
133 | },
134 | "outputs": [],
135 | "source": [
136 | "def make_pipeline():\n",
137 | " exchange = master.SecuritiesMaster.Exchange.latest\n",
138 | " nyse_filter = exchange.eq('XNYS')\n",
139 | "\n",
140 | " sector = master.SecuritiesMaster.usstock_Sector.latest\n",
141 | "\n",
142 | " dollar_volume_decile = AverageDollarVolume(window_length=10).deciles()\n",
143 | " top_decile = (dollar_volume_decile.eq(9))\n",
144 | "\n",
145 | " return Pipeline(\n",
146 | " columns={\n",
147 | " 'exchange': exchange,\n",
148 | " 'sector': sector,\n",
149 | " 'dollar_volume_decile': dollar_volume_decile\n",
150 | " },\n",
151 | " screen=(nyse_filter & top_decile)\n",
152 | " )"
153 | ]
154 | },
155 | {
156 | "cell_type": "code",
157 | "execution_count": 6,
158 | "metadata": {
159 | "collapsed": false,
160 | "jupyter": {
161 | "outputs_hidden": false
162 | }
163 | },
164 | "outputs": [
165 | {
166 | "name": "stdout",
167 | "output_type": "stream",
168 | "text": [
169 | "Number of securities that passed the filter: 471\n"
170 | ]
171 | },
172 | {
173 | "data": {
174 | "text/html": [
175 | "\n",
176 | "\n",
189 | "
\n",
190 | " \n",
191 | " \n",
192 | " | \n",
193 | " | \n",
194 | " exchange | \n",
195 | " sector | \n",
196 | " dollar_volume_decile | \n",
197 | "
\n",
198 | " \n",
199 | " date | \n",
200 | " asset | \n",
201 | " | \n",
202 | " | \n",
203 | " | \n",
204 | "
\n",
205 | " \n",
206 | " \n",
207 | " \n",
208 | " 2010-01-05 | \n",
209 | " Equity(FIBBG000C2V3D6 [A]) | \n",
210 | " XNYS | \n",
211 | " Technology | \n",
212 | " 9 | \n",
213 | "
\n",
214 | " \n",
215 | " Equity(FIBBG000F7RCJ1 [AAP]) | \n",
216 | " XNYS | \n",
217 | " Consumer Discretionary | \n",
218 | " 9 | \n",
219 | "
\n",
220 | " \n",
221 | " Equity(FIBBG000MDCQC2 [COR]) | \n",
222 | " XNYS | \n",
223 | " Health Care | \n",
224 | " 9 | \n",
225 | "
\n",
226 | " \n",
227 | " Equity(FIBBG000B9ZXB4 [ABT]) | \n",
228 | " XNYS | \n",
229 | " Health Care | \n",
230 | " 9 | \n",
231 | "
\n",
232 | " \n",
233 | " Equity(QI000000052857 [ABV]) | \n",
234 | " XNYS | \n",
235 | " Consumer Staples | \n",
236 | " 9 | \n",
237 | "
\n",
238 | " \n",
239 | "
\n",
240 | "
"
241 | ],
242 | "text/plain": [
243 | " exchange ... dollar_volume_decile\n",
244 | "date asset ... \n",
245 | "2010-01-05 Equity(FIBBG000C2V3D6 [A]) XNYS ... 9\n",
246 | " Equity(FIBBG000F7RCJ1 [AAP]) XNYS ... 9\n",
247 | " Equity(FIBBG000MDCQC2 [COR]) XNYS ... 9\n",
248 | " Equity(FIBBG000B9ZXB4 [ABT]) XNYS ... 9\n",
249 | " Equity(QI000000052857 [ABV]) XNYS ... 9\n",
250 | "\n",
251 | "[5 rows x 3 columns]"
252 | ]
253 | },
254 | "execution_count": 6,
255 | "metadata": {},
256 | "output_type": "execute_result"
257 | }
258 | ],
259 | "source": [
260 | "result = run_pipeline(make_pipeline(), start_date='2010-01-05', end_date='2010-01-05')\n",
261 | "print(f'Number of securities that passed the filter: {len(result)}')\n",
262 | "result.head(5)"
263 | ]
264 | },
265 | {
266 | "cell_type": "markdown",
267 | "metadata": {},
268 | "source": [
269 | "Classifiers are also useful for describing grouping keys for complex transformations on Factor outputs. Grouping operations such as [demean](https://www.quantrocket.com/docs/api/#zipline.pipeline.Factor.demean) are outside the scope of this tutorial."
270 | ]
271 | },
272 | {
273 | "cell_type": "markdown",
274 | "metadata": {},
275 | "source": [
276 | "---\n",
277 | "\n",
278 | "**Next Lesson:** [Datasets](Lesson10-Datasets.ipynb) "
279 | ]
280 | }
281 | ],
282 | "metadata": {
283 | "kernelspec": {
284 | "display_name": "Python 3.11",
285 | "language": "python",
286 | "name": "python3"
287 | },
288 | "language_info": {
289 | "codemirror_mode": {
290 | "name": "ipython",
291 | "version": 3
292 | },
293 | "file_extension": ".py",
294 | "mimetype": "text/x-python",
295 | "name": "python",
296 | "nbconvert_exporter": "python",
297 | "pygments_lexer": "ipython3",
298 | "version": "3.11.0"
299 | }
300 | },
301 | "nbformat": 4,
302 | "nbformat_minor": 4
303 | }
304 |
--------------------------------------------------------------------------------
/pipeline_tutorial/Lesson10-Datasets.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "
\n",
8 | "Disclaimer"
9 | ]
10 | },
11 | {
12 | "cell_type": "markdown",
13 | "metadata": {},
14 | "source": [
15 | "***\n",
16 | "[Pipeline Tutorial](Introduction.ipynb) › Lesson 10: Datasets\n",
17 | "***"
18 | ]
19 | },
20 | {
21 | "cell_type": "markdown",
22 | "metadata": {},
23 | "source": [
24 | "# Datasets and BoundColumns\n",
25 | "\n",
26 | "DataSets are simply collections of objects that tell the Pipeline API where and how to find the inputs to computations. An example of a `DataSet` that we have already seen is `EquityPricing`.\n",
27 | "\n",
28 | "A `BoundColumn` is a column of data that is concretely bound to a `DataSet`. Instances of `BoundColumn` are dynamically created upon access to attributes of a `DataSet`. Inputs to pipeline computations must be of type `BoundColumn`. An example of a `BoundColumn` that we have already seen is `EquityPricing.close`.\n\n",
29 | "It is important to understand that `DataSet`s and `BoundColumn`s do not hold actual data. Remember that when computations are created and added to a pipeline, they don't actually perform the computation until the pipeline is run. `DataSet` and `BoundColumn` can be thought of in a similar way; they are simply used to identify the inputs of a computation. The data is populated later when the pipeline is run.\n"
30 | ]
31 | },
32 | {
33 | "cell_type": "markdown",
34 | "metadata": {},
35 | "source": [
36 | "## dtypes\n",
37 | "\n",
38 | "When defining pipeline computations, we need to know the types of our inputs in order to know which operations and functions we can use. The dtype of a `BoundColumn` tells a computation what the type of the data will be when the pipeline is run. For example, `EquityPricing.close` has a float dtype, so a factor may perform arithmetic operations on it (e.g. compute the 5-day mean). The importance of this will become clearer in the next lesson.\n\n",
39 | "The dtype of a `BoundColumn` can also determine the type of a computation. In the case of the `latest` computation, the dtype determines whether the computation is a factor (float), a filter (bool), or a classifier (string, int)."
40 | ]
41 | },
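The factor/filter/classifier split by dtype can be sketched outside of Pipeline with plain pandas (a toy analogy, not Pipeline code; the column names and values are invented):

```python
import pandas as pd

# Toy cross-section of assets: one row per asset, mimicking what
# .latest would produce for columns of different dtypes.
data = pd.DataFrame({
    "close": [10.0, 20.0, 30.0],           # float dtype -> factor-like: arithmetic works
    "is_etf": [False, True, False],        # bool dtype -> filter-like: usable as a screen
    "sector": ["Tech", "Energy", "Tech"],  # str dtype -> classifier-like: a grouping key
}, index=["A", "B", "C"])

# A float column supports arithmetic (like a Factor)
doubled = data["close"] * 2

# A boolean column can screen rows (like a Filter)
non_etfs = data[~data["is_etf"]]

# A string column can group computations (like a Classifier)
mean_close_by_sector = data.groupby("sector")["close"].mean()
```

The same principle applies in Pipeline: which operations a `BoundColumn` supports follows from its dtype.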
42 | {
43 | "cell_type": "markdown",
44 | "metadata": {},
45 | "source": [
46 | "## Pricing Data\n",
47 | "Equity pricing data is stored in the `EquityPricing` dataset. `EquityPricing` provides five columns:\n",
48 | "\n",
49 | " EquityPricing.open\n",
50 | " EquityPricing.high\n",
51 | " EquityPricing.low\n",
52 | " EquityPricing.close\n",
53 | " EquityPricing.volume\n",
54 | "\n",
55 | "Each of these columns has a float dtype. The `EquityPricing` dataset is bound to the particular bundle specified in the call to `run_pipeline` (or the default bundle if no bundle is specified)."
56 | ]
57 | },
58 | {
59 | "cell_type": "markdown",
60 | "metadata": {},
61 | "source": [
62 | "## Securities Master Data, Fundamental Data, Short Sale Data, etc.\n",
63 | "\n",
64 | "In addition to pricing data, you can access a variety of other datasets in Pipeline, including securities master data, fundamental data, short sale data, and more. For a full list of available datasets, see the [usage guide](https://www.quantrocket.com/docs/#zipline-pipeline-data)."
65 | ]
66 | },
67 | {
68 | "cell_type": "markdown",
69 | "metadata": {},
70 | "source": [
71 | "---\n",
72 | "\n",
73 | "**Next Lesson:** [Custom Factors](Lesson11-Custom-Factors.ipynb) "
74 | ]
75 | }
76 | ],
77 | "metadata": {
78 | "kernelspec": {
79 | "display_name": "Python 3.11",
80 | "language": "python",
81 | "name": "python3"
82 | },
83 | "language_info": {
84 | "codemirror_mode": {
85 | "name": "ipython",
86 | "version": 3
87 | },
88 | "file_extension": ".py",
89 | "mimetype": "text/x-python",
90 | "name": "python",
91 | "nbconvert_exporter": "python",
92 | "pygments_lexer": "ipython3",
93 | "version": "3.11.0"
94 | }
95 | },
96 | "nbformat": 4,
97 | "nbformat_minor": 4
98 | }
99 |
--------------------------------------------------------------------------------
/pipeline_tutorial/Lesson11-Custom-Factors.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "
\n",
8 | "Disclaimer"
9 | ]
10 | },
11 | {
12 | "cell_type": "markdown",
13 | "metadata": {},
14 | "source": [
15 | "***\n",
16 | "[Pipeline Tutorial](Introduction.ipynb) › Lesson 11: Custom Factors\n",
17 | "***"
18 | ]
19 | },
20 | {
21 | "cell_type": "markdown",
22 | "metadata": {},
23 | "source": [
24 | "# Custom Factors\n",
25 | "When we first looked at factors, we explored the set of built-in factors. Frequently, however, a desired computation isn't included as a built-in. One of the most powerful features of the Pipeline API is that it lets us define our own custom factors for exactly these cases.\n",
26 | "\n",
27 | "Conceptually, a custom factor is identical to a built-in factor. It accepts `inputs`, `window_length`, and `mask` as constructor arguments, and returns a `Factor` object each day.\n",
28 | "\n",
29 | "Let's take an example of a computation that doesn't exist as a built-in: standard deviation. To create a factor that computes the [standard deviation](https://en.wikipedia.org/wiki/Standard_deviation) over a trailing window, we can subclass `zipline.pipeline.CustomFactor` and implement a compute method whose signature is:\n",
30 | "\n",
31 | "\n",
32 | "```\n",
33 | "def compute(self, today, asset_ids, out, *inputs):\n",
34 | " ...\n",
35 | "```\n",
36 | "\n",
37 | "- `*inputs` are M x N numpy arrays, where M is the `window_length` and N is the number of securities (usually ~8000, unless a `mask` is provided). `*inputs` are trailing data windows: there will be one M x N array for each `BoundColumn` provided in the factor's `inputs` list, and the data type of each array will be the `dtype` of the corresponding `BoundColumn`.\n",
38 | "- `out` is an empty array of length N. `out` will be the output of our custom factor each day. The job of the `compute` method is to write output values into `out`.\n",
39 | "- `asset_ids` will be an integer array of length N containing security ids corresponding to the columns in our `*inputs` arrays.\n",
40 | "- `today` will be a pandas Timestamp representing the day for which `compute` is being called.\n",
41 | "\n",
42 | "Of these, `*inputs` and `out` are most commonly used.\n",
43 | "\n",
44 | "An instance of `CustomFactor` that has been added to a pipeline will have its compute method called every day. For example, let's define a custom factor that computes the standard deviation of the close price over the last 5 days. To start, let's add `CustomFactor` and `numpy` to our import statements."
45 | ]
46 | },
47 | {
48 | "cell_type": "code",
49 | "execution_count": 1,
50 | "metadata": {},
51 | "outputs": [],
52 | "source": [
53 | "from zipline.pipeline import Pipeline, EquityPricing\n",
54 | "from zipline.pipeline.factors import CustomFactor\n",
55 | "from zipline.research import run_pipeline\n",
56 | "import numpy"
57 | ]
58 | },
59 | {
60 | "cell_type": "markdown",
61 | "metadata": {},
62 | "source": [
63 | "Next, let's define our custom factor to calculate the standard deviation over a trailing window using `numpy.nanstd`:"
64 | ]
65 | },
66 | {
67 | "cell_type": "code",
68 | "execution_count": 2,
69 | "metadata": {
70 | "collapsed": false,
71 | "jupyter": {
72 | "outputs_hidden": false
73 | }
74 | },
75 | "outputs": [],
76 | "source": [
77 | "class StdDev(CustomFactor):\n",
78 | " def compute(self, today, asset_ids, out, values):\n",
79 | " # Calculates the column-wise standard deviation, ignoring NaNs\n",
80 | " out[:] = numpy.nanstd(values, axis=0)"
81 | ]
82 | },
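The calculation inside `StdDev.compute` can be exercised on a toy M x N window with plain numpy (invented values; here M = 5 days and N = 3 assets instead of ~8000). The all-NaN column yields `nan`, which is also what triggers the `Degrees of freedom <= 0` RuntimeWarning when the pipeline runs:

```python
import numpy

# Toy trailing window: 5 days (rows) x 3 assets (columns),
# with one all-NaN column, as happens for assets with no data.
values = numpy.array([
    [10.0, 1.0, numpy.nan],
    [11.0, 1.0, numpy.nan],
    [12.0, 1.0, numpy.nan],
    [13.0, 1.0, numpy.nan],
    [14.0, 1.0, numpy.nan],
])

out = numpy.empty(values.shape[1])   # one slot per asset, like Pipeline's `out`
out[:] = numpy.nanstd(values, axis=0)  # column-wise std dev, ignoring NaNs
```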
83 | {
84 | "cell_type": "markdown",
85 | "metadata": {},
86 | "source": [
87 | "Finally, let's instantiate our factor in `make_pipeline()`:"
88 | ]
89 | },
90 | {
91 | "cell_type": "code",
92 | "execution_count": 3,
93 | "metadata": {},
94 | "outputs": [],
95 | "source": [
96 | "def make_pipeline():\n",
97 | " std_dev = StdDev(inputs=[EquityPricing.close], window_length=5)\n",
98 | "\n",
99 | " return Pipeline(\n",
100 | " columns={\n",
101 | " 'std_dev': std_dev\n",
102 | " }\n",
103 | " )"
104 | ]
105 | },
106 | {
107 | "cell_type": "markdown",
108 | "metadata": {},
109 | "source": [
110 | "When this pipeline is run, `StdDev.compute()` will be called every day with data as follows:\n",
111 | "\n",
112 | "- `values`: An M x N numpy array, where M is 5 (`window_length`), and N is ~8000 (the number of securities in our database on the day in question).\n",
113 | "- `out`: An empty array of length N (~8000). In this example, the job of `compute` is to populate `out` with an array storing the 5-day close price standard deviations."
114 | ]
115 | },
116 | {
117 | "cell_type": "code",
118 | "execution_count": 4,
119 | "metadata": {
120 | "collapsed": false,
121 | "jupyter": {
122 | "outputs_hidden": false
123 | },
124 | "scrolled": true
125 | },
126 | "outputs": [
127 | {
128 | "name": "stderr",
129 | "output_type": "stream",
130 | "text": [
131 | "/opt/conda/lib/python3.11/site-packages/numpy/lib/nanfunctions.py:1879: RuntimeWarning: Degrees of freedom <= 0 for slice.\n",
132 | " var = nanvar(a, axis=axis, dtype=dtype, out=out, ddof=ddof,\n"
133 | ]
134 | },
135 | {
136 | "data": {
137 | "text/html": [
138 | "\n",
139 | "\n",
152 | "
\n",
153 | " \n",
154 | " \n",
155 | " | \n",
156 | " | \n",
157 | " std_dev | \n",
158 | "
\n",
159 | " \n",
160 | " date | \n",
161 | " asset | \n",
162 | " | \n",
163 | "
\n",
164 | " \n",
165 | " \n",
166 | " \n",
167 | " 2010-01-05 | \n",
168 | " Equity(FIBBG000C2V3D6 [A]) | \n",
169 | " 0.396434 | \n",
170 | "
\n",
171 | " \n",
172 | " Equity(QI000000004076 [AABA]) | \n",
173 | " 0.106283 | \n",
174 | "
\n",
175 | " \n",
176 | " Equity(FIBBG000BZWHH8 [AACC]) | \n",
177 | " 0.211528 | \n",
178 | "
\n",
179 | " \n",
180 | " Equity(FIBBG000V2S3P6 [AACG]) | \n",
181 | " 0.100665 | \n",
182 | "
\n",
183 | " \n",
184 | " Equity(FIBBG000M7KQ09 [AAI]) | \n",
185 | " 0.020396 | \n",
186 | "
\n",
187 | " \n",
188 | " ... | \n",
189 | " ... | \n",
190 | "
\n",
191 | " \n",
192 | " Equity(FIBBG011MC2100 [AATC]) | \n",
193 | " 0.132755 | \n",
194 | "
\n",
195 | " \n",
196 | " Equity(FIBBG000GDBDH4 [BDG]) | \n",
197 | " NaN | \n",
198 | "
\n",
199 | " \n",
200 | " Equity(FIBBG000008NR0 [ISM]) | \n",
201 | " NaN | \n",
202 | "
\n",
203 | " \n",
204 | " Equity(FIBBG000GZ24W8 [PEM]) | \n",
205 | " NaN | \n",
206 | "
\n",
207 | " \n",
208 | " Equity(FIBBG000BB5S87 [HCH]) | \n",
209 | " 0.000000 | \n",
210 | "
\n",
211 | " \n",
212 | "
\n",
213 | "
7841 rows × 1 columns
\n",
214 | "
"
215 | ],
216 | "text/plain": [
217 | " std_dev\n",
218 | "date asset \n",
219 | "2010-01-05 Equity(FIBBG000C2V3D6 [A]) 0.396434\n",
220 | " Equity(QI000000004076 [AABA]) 0.106283\n",
221 | " Equity(FIBBG000BZWHH8 [AACC]) 0.211528\n",
222 | " Equity(FIBBG000V2S3P6 [AACG]) 0.100665\n",
223 | " Equity(FIBBG000M7KQ09 [AAI]) 0.020396\n",
224 | "... ...\n",
225 | " Equity(FIBBG011MC2100 [AATC]) 0.132755\n",
226 | " Equity(FIBBG000GDBDH4 [BDG]) NaN\n",
227 | " Equity(FIBBG000008NR0 [ISM]) NaN\n",
228 | " Equity(FIBBG000GZ24W8 [PEM]) NaN\n",
229 | " Equity(FIBBG000BB5S87 [HCH]) 0.000000\n",
230 | "\n",
231 | "[7841 rows x 1 columns]"
232 | ]
233 | },
234 | "execution_count": 4,
235 | "metadata": {},
236 | "output_type": "execute_result"
237 | }
238 | ],
239 | "source": [
240 | "result = run_pipeline(make_pipeline(), start_date='2010-01-05', end_date='2010-01-05')\n",
241 | "result"
242 | ]
243 | },
244 | {
245 | "cell_type": "markdown",
246 | "metadata": {},
247 | "source": [
248 | "### Default Inputs\n",
249 | "When writing a custom factor, we can set default `inputs` and `window_length` in our `CustomFactor` subclass. For example, let's define the `TenDayMeanDifference` custom factor to compute the mean difference between two data columns over a trailing window using `numpy.nanmean`. Let's set the default `inputs` to `[EquityPricing.close, EquityPricing.open]` and the default `window_length` to 10:"
250 | ]
251 | },
252 | {
253 | "cell_type": "code",
254 | "execution_count": 5,
255 | "metadata": {},
256 | "outputs": [],
257 | "source": [
258 | "class TenDayMeanDifference(CustomFactor):\n",
259 | " # Default inputs.\n",
260 | " inputs = [EquityPricing.close, EquityPricing.open]\n",
261 | " window_length = 10\n",
262 | " def compute(self, today, asset_ids, out, close, open):\n",
263 | " # Calculates the column-wise mean difference, ignoring NaNs\n",
264 | " out[:] = numpy.nanmean(close - open, axis=0)"
265 | ]
266 | },
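The column-wise mean difference computed by `TenDayMeanDifference.compute` can be checked on a toy window with plain numpy (invented prices; 3 days x 2 assets rather than 10 x ~8000):

```python
import numpy

# Toy trailing windows: one row per day, one column per asset
close = numpy.array([[10.5, 20.0],
                     [11.0, 19.0],
                     [10.0, 21.0]])
open_ = numpy.array([[10.0, 21.0],
                     [10.5, 20.0],
                     [10.0, 20.0]])

# Same computation as in compute(): per-asset mean of (close - open)
mean_diff = numpy.nanmean(close - open_, axis=0)
```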
267 | {
268 | "cell_type": "markdown",
269 | "metadata": {},
270 | "source": [
271 | "Remember in this case that `close` and `open` are each 10 x ~8000 2D numpy arrays.\n",
272 | "\n",
273 | "If we call `TenDayMeanDifference` without providing any arguments, it will use the defaults."
274 | ]
275 | },
276 | {
277 | "cell_type": "code",
278 | "execution_count": 6,
279 | "metadata": {},
280 | "outputs": [],
281 | "source": [
282 | "# Computes the 10-day mean difference between the daily open and close prices.\n",
283 | "close_open_diff = TenDayMeanDifference()"
284 | ]
285 | },
286 | {
287 | "cell_type": "markdown",
288 | "metadata": {},
289 | "source": [
290 | "The defaults can be manually overridden by specifying arguments in the constructor call."
291 | ]
292 | },
293 | {
294 | "cell_type": "code",
295 | "execution_count": 7,
296 | "metadata": {},
297 | "outputs": [],
298 | "source": [
299 | "# Computes the 10-day mean difference between the daily high and low prices.\n",
300 | "high_low_diff = TenDayMeanDifference(inputs=[EquityPricing.high, EquityPricing.low])"
301 | ]
302 | },
303 | {
304 | "cell_type": "markdown",
305 | "metadata": {},
306 | "source": [
307 | "### Further Example\n",
308 | "Let's take another example where we build a [momentum](http://www.investopedia.com/terms/m/momentum.asp) custom factor and use it to create a filter. We will then use that filter as a `screen` for our pipeline.\n",
309 | "\n",
310 | "Let's start by defining a `Momentum` factor to be the division of the most recent close price by the close price from `n` days ago where `n` is the `window_length`."
311 | ]
312 | },
313 | {
314 | "cell_type": "code",
315 | "execution_count": 8,
316 | "metadata": {},
317 | "outputs": [],
318 | "source": [
319 | "class Momentum(CustomFactor):\n",
320 | " # Default inputs\n",
321 | " inputs = [EquityPricing.close]\n",
322 | "\n",
323 | " # Compute momentum\n",
324 | " def compute(self, today, assets, out, close):\n",
325 | " out[:] = close[-1] / close[0]"
326 | ]
327 | },
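The `close[-1] / close[0]` line divides the last (most recent) row of the trailing window by the first (oldest) row. On a toy window of invented prices it behaves as follows:

```python
import numpy

# Toy 10-day window for 3 assets: rising, flat, and falling prices
close = numpy.linspace(
    [100.0, 50.0, 20.0],   # close[0]: oldest prices in the window
    [110.0, 50.0, 16.0],   # close[-1]: most recent prices
    num=10,
)

# Same computation as Momentum.compute
momentum = close[-1] / close[0]

# Same comparison used to build the positive_momentum filter
has_positive_momentum = momentum > 1
```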
328 | {
329 | "cell_type": "markdown",
330 | "metadata": {},
331 | "source": [
332 | "Now, let's instantiate our `Momentum` factor (twice) to create a 10-day momentum factor and a 20-day momentum factor. Let's also create a `positive_momentum` filter returning `True` for securities with both a positive 10-day momentum and a positive 20-day momentum."
333 | ]
334 | },
335 | {
336 | "cell_type": "code",
337 | "execution_count": 9,
338 | "metadata": {
339 | "collapsed": false,
340 | "jupyter": {
341 | "outputs_hidden": false
342 | }
343 | },
344 | "outputs": [],
345 | "source": [
346 | "ten_day_momentum = Momentum(window_length=10)\n",
347 | "twenty_day_momentum = Momentum(window_length=20)\n",
348 | "\n",
349 | "positive_momentum = ((ten_day_momentum > 1) & (twenty_day_momentum > 1))"
350 | ]
351 | },
352 | {
353 | "cell_type": "markdown",
354 | "metadata": {},
355 | "source": [
356 | "Next, let's add our momentum factors and our `positive_momentum` filter to `make_pipeline`. Let's also pass `positive_momentum` as a `screen` to our pipeline."
357 | ]
358 | },
359 | {
360 | "cell_type": "code",
361 | "execution_count": 10,
362 | "metadata": {
363 | "collapsed": false,
364 | "jupyter": {
365 | "outputs_hidden": false
366 | }
367 | },
368 | "outputs": [],
369 | "source": [
370 | "def make_pipeline():\n",
371 | "\n",
372 | " ten_day_momentum = Momentum(window_length=10)\n",
373 | " twenty_day_momentum = Momentum(window_length=20)\n",
374 | "\n",
375 | " positive_momentum = ((ten_day_momentum > 1) & (twenty_day_momentum > 1))\n",
376 | "\n",
377 | " std_dev = StdDev(inputs=[EquityPricing.close], window_length=5)\n",
378 | "\n",
379 | " return Pipeline(\n",
380 | " columns={\n",
381 | " 'std_dev': std_dev,\n",
382 | " 'ten_day_momentum': ten_day_momentum,\n",
383 | " 'twenty_day_momentum': twenty_day_momentum\n",
384 | " },\n",
385 | " screen=positive_momentum\n",
386 | " )"
387 | ]
388 | },
389 | {
390 | "cell_type": "markdown",
391 | "metadata": {},
392 | "source": [
393 | "Running this pipeline outputs the standard deviation and each of our momentum computations for securities with positive 10-day and 20-day momentum."
394 | ]
395 | },
396 | {
397 | "cell_type": "code",
398 | "execution_count": 11,
399 | "metadata": {
400 | "collapsed": false,
401 | "jupyter": {
402 | "outputs_hidden": false
403 | },
404 | "scrolled": true
405 | },
406 | "outputs": [
407 | {
408 | "name": "stderr",
409 | "output_type": "stream",
410 | "text": [
411 | "/opt/conda/lib/python3.11/site-packages/numpy/lib/nanfunctions.py:1879: RuntimeWarning: Degrees of freedom <= 0 for slice.\n",
412 | " var = nanvar(a, axis=axis, dtype=dtype, out=out, ddof=ddof,\n"
413 | ]
414 | },
415 | {
416 | "data": {
417 | "text/html": [
418 | "\n",
419 | "\n",
432 | "
\n",
433 | " \n",
434 | " \n",
435 | " | \n",
436 | " | \n",
437 | " std_dev | \n",
438 | " ten_day_momentum | \n",
439 | " twenty_day_momentum | \n",
440 | "
\n",
441 | " \n",
442 | " date | \n",
443 | " asset | \n",
444 | " | \n",
445 | " | \n",
446 | " | \n",
447 | "
\n",
448 | " \n",
449 | " \n",
450 | " \n",
451 | " 2010-01-05 | \n",
452 | " Equity(FIBBG000C2V3D6 [A]) | \n",
453 | " 0.396434 | \n",
454 | " 1.064626 | \n",
455 | " 1.048225 | \n",
456 | "
\n",
457 | " \n",
458 | " Equity(QI000000004076 [AABA]) | \n",
459 | " 0.106283 | \n",
460 | " 1.059480 | \n",
461 | " 1.125741 | \n",
462 | "
\n",
463 | " \n",
464 | " Equity(FIBBG000BZWHH8 [AACC]) | \n",
465 | " 0.211528 | \n",
466 | " 1.191667 | \n",
467 | " 1.185738 | \n",
468 | "
\n",
469 | " \n",
470 | " Equity(FIBBG000BD1373 [AAIC]) | \n",
471 | " 0.218998 | \n",
472 | " 1.036842 | \n",
473 | " 1.133813 | \n",
474 | "
\n",
475 | " \n",
476 | " Equity(FIBBG000C2LZP3 [AAON]) | \n",
477 | " 0.185084 | \n",
478 | " 1.009086 | \n",
479 | " 1.053091 | \n",
480 | "
\n",
481 | " \n",
482 | " ... | \n",
483 | " ... | \n",
484 | " ... | \n",
485 | " ... | \n",
486 | "
\n",
487 | " \n",
488 | " Equity(FIBBG000N8D1G2 [ZSTN]) | \n",
489 | " 0.782800 | \n",
490 | " 1.269780 | \n",
491 | " 1.346630 | \n",
492 | "
\n",
493 | " \n",
494 | " Equity(FIBBG000BXB8X8 [ZTR]) | \n",
495 | " 0.010172 | \n",
496 | " 1.012920 | \n",
497 | " 1.028857 | \n",
498 | "
\n",
499 | " \n",
500 | " Equity(FIBBG000PYX812 [ZUMZ]) | \n",
501 | " 0.168713 | \n",
502 | " 1.003247 | \n",
503 | " 1.038655 | \n",
504 | "
\n",
505 | " \n",
506 | " Equity(FIBBG000C3CQP1 [ZVO]) | \n",
507 | " 0.093894 | \n",
508 | " 1.003372 | \n",
509 | " 1.026915 | \n",
510 | "
\n",
511 | " \n",
512 | " Equity(FIBBG000PZKV21 [ZZ]) | \n",
513 | " 0.044091 | \n",
514 | " 1.045752 | \n",
515 | " 1.114983 | \n",
516 | "
\n",
517 | " \n",
518 | "
\n",
519 | "
4534 rows × 3 columns
\n",
520 | "
"
521 | ],
522 | "text/plain": [
523 | " std_dev ... twenty_day_momentum\n",
524 | "date asset ... \n",
525 | "2010-01-05 Equity(FIBBG000C2V3D6 [A]) 0.396434 ... 1.048225\n",
526 | " Equity(QI000000004076 [AABA]) 0.106283 ... 1.125741\n",
527 | " Equity(FIBBG000BZWHH8 [AACC]) 0.211528 ... 1.185738\n",
528 | " Equity(FIBBG000BD1373 [AAIC]) 0.218998 ... 1.133813\n",
529 | " Equity(FIBBG000C2LZP3 [AAON]) 0.185084 ... 1.053091\n",
530 | "... ... ... ...\n",
531 | " Equity(FIBBG000N8D1G2 [ZSTN]) 0.782800 ... 1.346630\n",
532 | " Equity(FIBBG000BXB8X8 [ZTR]) 0.010172 ... 1.028857\n",
533 | " Equity(FIBBG000PYX812 [ZUMZ]) 0.168713 ... 1.038655\n",
534 | " Equity(FIBBG000C3CQP1 [ZVO]) 0.093894 ... 1.026915\n",
535 | " Equity(FIBBG000PZKV21 [ZZ]) 0.044091 ... 1.114983\n",
536 | "\n",
537 | "[4534 rows x 3 columns]"
538 | ]
539 | },
540 | "execution_count": 11,
541 | "metadata": {},
542 | "output_type": "execute_result"
543 | }
544 | ],
545 | "source": [
546 | "result = run_pipeline(make_pipeline(), start_date='2010-01-05', end_date='2010-01-05')\n",
547 | "result"
548 | ]
549 | },
550 | {
551 | "cell_type": "markdown",
552 | "metadata": {},
553 | "source": [
554 | "Custom factors allow us to define custom computations in a pipeline. They are frequently the best way to perform computations on multiple data columns. The full documentation for CustomFactors is available in the [API Reference](https://www.quantrocket.com/docs/api/#zipline.pipeline.CustomFactor)."
555 | ]
556 | },
557 | {
558 | "cell_type": "markdown",
559 | "metadata": {},
560 | "source": [
561 | "---\n",
562 | "\n",
563 | "**Next Lesson:** [Initial Universe](Lesson12-Initial-Universe.ipynb) "
564 | ]
565 | }
566 | ],
567 | "metadata": {
568 | "kernelspec": {
569 | "display_name": "Python 3.11",
570 | "language": "python",
571 | "name": "python3"
572 | },
573 | "language_info": {
574 | "codemirror_mode": {
575 | "name": "ipython",
576 | "version": 3
577 | },
578 | "file_extension": ".py",
579 | "mimetype": "text/x-python",
580 | "name": "python",
581 | "nbconvert_exporter": "python",
582 | "pygments_lexer": "ipython3",
583 | "version": "3.11.0"
584 | }
585 | },
586 | "nbformat": 4,
587 | "nbformat_minor": 4
588 | }
589 |
--------------------------------------------------------------------------------
/pipeline_tutorial/Lesson12-Initial-Universe.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "
\n",
8 | "Disclaimer"
9 | ]
10 | },
11 | {
12 | "cell_type": "markdown",
13 | "metadata": {},
14 | "source": [
15 | "***\n",
16 | "[Pipeline Tutorial](Introduction.ipynb) › Lesson 12: Initial Universe\n",
17 | "***"
18 | ]
19 | },
20 | {
21 | "cell_type": "markdown",
22 | "metadata": {},
23 | "source": [
24 | "# Initial Universe\n",
25 | "\n",
26 | "By default, a pipeline performs computations on every asset in the bundle. As we learned in an earlier lesson, an optional `screen` argument (consisting of a Filter) can be applied that limits the pipeline output to a subset of assets. Under the hood, screens are applied as the last step of a pipeline computation. If we screen for assets with dollar volume above $1MM, the pipeline will compute dollar volume for every asset in the bundle, then filter out any asset/date combinations where dollar volume falls below the threshold.\n",
27 | "\n",
28 | "This means that we are often performing computations on assets that don't ultimately interest us. Often, this extra computational work is necessary because we don't know in advance which assets will pass the screen. We don't know which assets have high dollar volume until we compute dollar volume for all assets. However, sometimes we do know in advance that certain assets can be excluded. If we are screening for certain kinds of stocks but our bundle includes stocks and ETFs, does it make sense to perform computations on the ETFs? Wouldn't it be better to exclude the ETFs entirely?\n",
29 | "\n",
30 | "One way to exclude ETFs would be to use masking. We could create a filter that returns `False` for ETFs and pass that filter as the `mask` argument to any factors we want to use. However, a better approach in this case is to exclude ETFs from the initial universe that our pipeline considers. This can be done with the `initial_universe` parameter to the `Pipeline` class:"
31 | ]
32 | },
33 | {
34 | "cell_type": "code",
35 | "execution_count": 1,
36 | "metadata": {},
37 | "outputs": [],
38 | "source": [
39 | "from zipline.pipeline import Pipeline, master\n",
40 | "\n",
41 | "# SecuritiesMaster.Etf is a boolean column, and the unary operator (~)\n",
42 | "# negates it\n",
43 | "are_not_etfs = ~master.SecuritiesMaster.Etf.latest\n",
44 | "\n",
45 | "pipeline = Pipeline(\n",
46 | " initial_universe=are_not_etfs\n",
47 | ")"
48 | ]
49 | },
50 | {
51 | "cell_type": "markdown",
52 | "metadata": {},
53 | "source": [
54 | "In this example, we import the `SecuritiesMaster` Dataset (which points to QuantRocket's securities master database), create a filter that negates the `Etf` column (a boolean column indicating whether the asset is an ETF), and pass the filter to our Pipeline as `initial_universe`. Any columns we add to the above Pipeline will only be computed on assets that are not ETFs. ETFs will not even be loaded into the Pipeline workspace, resulting in a speed improvement compared to using `screen` or `mask`. "
55 | ]
56 | },
57 | {
58 | "cell_type": "markdown",
59 | "metadata": {},
60 | "source": [
61 | "The filter passed to `initial_universe` can derive from any column of the `SecuritiesMaster` Dataset. The filter can combine multiple columns as long as they are ANDed together using `&` (filters ORed together with `|` are not supported for `initial_universe`). In the following example, we limit the initial universe to common stocks (thus excluding not only ETFs but also REITs, ADRs, preferred stocks, LPs, etc.) and, for stocks that have multiple share classes, we limit the universe to the primary share class: "
62 | ]
63 | },
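The `&` composition mirrors element-wise boolean logic on arrays. A plain-pandas analogy (not Pipeline code; the series names and values are invented) shows why each asset must pass both conditions:

```python
import pandas as pd

# Toy securities-master-like columns, one entry per asset
sec_type = pd.Series(["Common Stock", "ETF", "Common Stock"], index=["A", "B", "C"])
primary_share_sid = pd.Series([None, None, "A"], index=["A", "B", "C"])

common_stock = sec_type == "Common Stock"
is_primary_share = primary_share_sid.isnull()

# Element-wise AND, as with Pipeline filters (use &, not the `and` keyword)
in_universe = common_stock & is_primary_share
```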
64 | {
65 | "cell_type": "code",
66 | "execution_count": 2,
67 | "metadata": {},
68 | "outputs": [],
69 | "source": [
70 | "# Equities listed as common stock (not preferred stock, ETF, ADR, LP, etc)\n",
71 | "common_stock = master.SecuritiesMaster.usstock_SecurityType2.latest.eq('Common Stock')\n",
72 | "\n",
73 | "# Filter for primary share equities; primary shares can be identified by a\n",
74 | "# null usstock_PrimaryShareSid field (i.e. no pointer to a primary share)\n",
75 | "is_primary_share = master.SecuritiesMaster.usstock_PrimaryShareSid.latest.isnull()\n",
76 | "\n",
77 | "pipeline = Pipeline(\n",
78 | " initial_universe=(common_stock & is_primary_share)\n",
79 | ")"
80 | ]
81 | },
82 | {
83 | "cell_type": "markdown",
84 | "metadata": {},
85 | "source": [
86 | "In addition to accepting filters created from the `SecuritiesMaster` Dataset, `initial_universe` also accepts the four filters imported below, which reference static lists of assets. \n",
87 | "\n",
88 | "> To see the docstrings for these filters, click on the filter name and press Control in JupyterLab, or consult the [API Reference](https://www.quantrocket.com/docs/api/#built-in-filters)."
89 | ]
90 | },
91 | {
92 | "cell_type": "code",
93 | "execution_count": 3,
94 | "metadata": {},
95 | "outputs": [],
96 | "source": [
97 | "from zipline.pipeline.filters import (\n",
98 | " SingleAsset,\n",
99 | " StaticAssets,\n",
100 | " StaticSids,\n",
101 | " StaticUniverse\n",
102 | ")"
103 | ]
104 | },
105 | {
106 | "cell_type": "markdown",
107 | "metadata": {},
108 | "source": [
109 | "The `initial_universe` parameter does not accept any other filters besides the ones listed above, because these are the only filters that represent static lists of assets or (in the case of the `SecuritiesMaster` Dataset) static characteristics of assets. Other filters represent dynamic characteristics of assets that change over time (such as price, volume, or fundamentals) and require loading the asset's data to see if it passes the filter. If you try to use an unsupported filter with `initial_universe`, you will receive an error message."
110 | ]
111 | },
112 | {
113 | "cell_type": "markdown",
114 | "metadata": {},
115 | "source": [
116 | "## Speed Benefit of `initial_universe`\n",
117 | "\n",
118 | "The main reason that Pipeline supports an `initial_universe` argument is to speed up computation. Since `screen` supports any filter while `initial_universe` only supports a limited set of filters, we could rely entirely on `screen` if we didn't care about speed. But since we care about speed, a general rule of thumb is to use `initial_universe` when possible and use `screen` for filters that `initial_universe` doesn't support.\n",
119 | "\n",
120 | "Let's run the same pipeline first with `screen` and then with `initial_universe` to demonstrate the speed benefit of using `initial_universe`. The fewer assets we are interested in, the greater the speed benefit. Suppose we want to get the rolling linear regression of two assets, Apple and Microsoft, versus SPY, using the built-in `RollingLinearRegressionOfReturns` factor. (This factor is computationally expensive and thus a good choice for this demonstration, but we will omit discussion of its parameters and multiple outputs; see the factor's docstring if you want to learn more about it.)"
121 | ]
122 | },
123 | {
124 | "cell_type": "code",
125 | "execution_count": 4,
126 | "metadata": {},
127 | "outputs": [],
128 | "source": [
129 | "from zipline.pipeline.factors import RollingLinearRegressionOfReturns\n",
130 | "from zipline.research import symbol\n",
131 | "\n",
132 | "spy = symbol('SPY')\n",
133 | "aapl = symbol('AAPL')\n",
134 | "msft = symbol('MSFT')\n",
135 | "\n",
136 | "regression_factor = RollingLinearRegressionOfReturns(\n",
137 | " target=spy,\n",
138 | " returns_length=2,\n",
139 | " regression_length=10,\n",
140 | ")"
141 | ]
142 | },
143 | {
144 | "cell_type": "markdown",
145 | "metadata": {},
146 | "source": [
147 | "First, let's see how long it takes to run this pipeline using `screen`:"
148 | ]
149 | },
150 | {
151 | "cell_type": "code",
152 | "execution_count": 5,
153 | "metadata": {},
154 | "outputs": [
155 | {
156 | "name": "stdout",
157 | "output_type": "stream",
158 | "text": [
159 | "CPU times: user 36.3 s, sys: 71.9 ms, total: 36.4 s\n",
160 | "Wall time: 36.4 s\n"
161 | ]
162 | },
163 | {
164 | "data": {
165 | "text/html": [
166 | "\n",
167 | "\n",
180 | "
\n",
181 | " \n",
182 | " \n",
183 | " | \n",
184 | " | \n",
185 | " alpha | \n",
186 | " beta | \n",
187 | "
\n",
188 | " \n",
189 | " date | \n",
190 | " asset | \n",
191 | " | \n",
192 | " | \n",
193 | "
\n",
194 | " \n",
195 | " \n",
196 | " \n",
197 | " 2010-01-05 | \n",
198 | " Equity(FIBBG000B9XRY4 [AAPL]) | \n",
199 | " 0.007732 | \n",
200 | " 0.982085 | \n",
201 | "
\n",
202 | " \n",
203 | " Equity(FIBBG000BPH459 [MSFT]) | \n",
204 | " 0.000622 | \n",
205 | " 1.157044 | \n",
206 | "
\n",
207 | " \n",
208 | " 2010-01-06 | \n",
209 | " Equity(FIBBG000B9XRY4 [AAPL]) | \n",
210 | " 0.006409 | \n",
211 | " 0.959515 | \n",
212 | "
\n",
213 | " \n",
214 | " Equity(FIBBG000BPH459 [MSFT]) | \n",
215 | " -0.001242 | \n",
216 | " 1.052271 | \n",
217 | "
\n",
218 | " \n",
219 | " 2010-01-07 | \n",
220 | " Equity(FIBBG000B9XRY4 [AAPL]) | \n",
221 | " 0.004031 | \n",
222 | " 1.082494 | \n",
223 | "
\n",
224 | " \n",
225 | "
\n",
226 | "
"
227 | ],
228 | "text/plain": [
229 | " alpha beta\n",
230 | "date asset \n",
231 | "2010-01-05 Equity(FIBBG000B9XRY4 [AAPL]) 0.007732 0.982085\n",
232 | " Equity(FIBBG000BPH459 [MSFT]) 0.000622 1.157044\n",
233 | "2010-01-06 Equity(FIBBG000B9XRY4 [AAPL]) 0.006409 0.959515\n",
234 | " Equity(FIBBG000BPH459 [MSFT]) -0.001242 1.052271\n",
235 | "2010-01-07 Equity(FIBBG000B9XRY4 [AAPL]) 0.004031 1.082494"
236 | ]
237 | },
238 | "execution_count": 5,
239 | "metadata": {},
240 | "output_type": "execute_result"
241 | }
242 | ],
243 | "source": [
244 | "%%time\n",
245 | "\n",
246 | "from zipline.pipeline.filters import StaticAssets\n",
247 | "from zipline.research import run_pipeline\n",
248 | "\n",
249 | "pipeline = Pipeline(\n",
250 | " columns={\n",
251 | " 'alpha': regression_factor.alpha,\n",
252 | " 'beta': regression_factor.beta,\n",
253 | " },\n",
254 | " screen=StaticAssets([aapl, msft]) # limit output to Apple and Microsoft\n",
255 | ")\n",
256 | "results = run_pipeline(pipeline, start_date='2010-01-05', end_date='2010-06-05')\n",
257 | "results.head()"
258 | ]
259 | },
260 | {
261 | "cell_type": "markdown",
262 | "metadata": {},
263 | "source": [
264 | "Despite only returning data for two assets, this pipeline had to compute the regression factor for every asset in the bundle, resulting in a long runtime. Now let's see how long it takes to run the same pipeline using `initial_universe`: "
265 | ]
266 | },
267 | {
268 | "cell_type": "code",
269 | "execution_count": 6,
270 | "metadata": {},
271 | "outputs": [
272 | {
273 | "name": "stdout",
274 | "output_type": "stream",
275 | "text": [
276 | "CPU times: user 75.2 ms, sys: 4.02 ms, total: 79.2 ms\n",
277 | "Wall time: 83.9 ms\n"
278 | ]
279 | },
280 | {
281 | "data": {
282 | "text/html": [
283 | "\n",
284 | "\n",
297 | "
\n",
298 | " \n",
299 | " \n",
300 | " | \n",
301 | " | \n",
302 | " alpha | \n",
303 | " beta | \n",
304 | "
\n",
305 | " \n",
306 | " date | \n",
307 | " asset | \n",
308 | " | \n",
309 | " | \n",
310 | "
\n",
311 | " \n",
312 | " \n",
313 | " \n",
314 | " 2010-01-05 | \n",
315 | " Equity(FIBBG000B9XRY4 [AAPL]) | \n",
316 | " 0.007732 | \n",
317 | " 0.982085 | \n",
318 | "
\n",
319 | " \n",
320 | " Equity(FIBBG000BPH459 [MSFT]) | \n",
321 | " 0.000622 | \n",
322 | " 1.157044 | \n",
323 | "
\n",
324 | " \n",
325 | " 2010-01-06 | \n",
326 | " Equity(FIBBG000B9XRY4 [AAPL]) | \n",
327 | " 0.006409 | \n",
328 | " 0.959515 | \n",
329 | "
\n",
330 | " \n",
331 | " Equity(FIBBG000BPH459 [MSFT]) | \n",
332 | " -0.001242 | \n",
333 | " 1.052271 | \n",
334 | "
\n",
335 | " \n",
336 | " 2010-01-07 | \n",
337 | " Equity(FIBBG000B9XRY4 [AAPL]) | \n",
338 | " 0.004031 | \n",
339 | " 1.082494 | \n",
340 | "
\n",
341 | " \n",
342 | "
\n",
343 | "
"
344 | ],
345 | "text/plain": [
346 | " alpha beta\n",
347 | "date asset \n",
348 | "2010-01-05 Equity(FIBBG000B9XRY4 [AAPL]) 0.007732 0.982085\n",
349 | " Equity(FIBBG000BPH459 [MSFT]) 0.000622 1.157044\n",
350 | "2010-01-06 Equity(FIBBG000B9XRY4 [AAPL]) 0.006409 0.959515\n",
351 | " Equity(FIBBG000BPH459 [MSFT]) -0.001242 1.052271\n",
352 | "2010-01-07 Equity(FIBBG000B9XRY4 [AAPL]) 0.004031 1.082494"
353 | ]
354 | },
355 | "execution_count": 6,
356 | "metadata": {},
357 | "output_type": "execute_result"
358 | }
359 | ],
360 | "source": [
361 | "%%time\n",
362 | "\n",
363 | "pipeline = Pipeline(\n",
364 | " columns={\n",
365 | " 'alpha': regression_factor.alpha,\n",
366 | " 'beta': regression_factor.beta,\n",
367 | " },\n",
368 | " initial_universe=StaticAssets([aapl, msft, spy]), # limit universe to Apple, Microsoft, and SPY\n",
369 | " screen=StaticAssets([aapl, msft]), # limit output to Apple and Microsoft\n",
370 | ")\n",
371 | "results = run_pipeline(pipeline, start_date='2010-01-05', end_date='2010-06-05')\n",
372 | "results.head()"
373 | ]
374 | },
375 | {
376 | "cell_type": "markdown",
377 | "metadata": {},
378 | "source": [
379 | "Runtimes will vary based on your hardware, but this pipeline should run much faster because it ignores all but the few assets we are interested in.\n",
380 | "\n",
381 | "Note in the last example that we must include SPY in our `initial_universe` so we can regress AAPL and MSFT against it, but we then use `screen` to limit the output to AAPL and MSFT (not SPY). This illustrates another point about `initial_universe` and `screen`: they can be used together, with `initial_universe` limiting the size of the computational universe and `screen` further filtering the results. "
382 | ]
383 | },
384 | {
385 | "cell_type": "markdown",
386 | "metadata": {},
387 | "source": [
388 | "---\n",
389 | "\n",
390 | "**Next Lesson:** [The TradableStocksUS Universe](Lesson13-TradableStocksUS-Universe.ipynb) "
391 | ]
392 | }
393 | ],
394 | "metadata": {
395 | "kernelspec": {
396 | "display_name": "Python 3.11",
397 | "language": "python",
398 | "name": "python3"
399 | },
400 | "language_info": {
401 | "codemirror_mode": {
402 | "name": "ipython",
403 | "version": 3
404 | },
405 | "file_extension": ".py",
406 | "mimetype": "text/x-python",
407 | "name": "python",
408 | "nbconvert_exporter": "python",
409 | "pygments_lexer": "ipython3",
410 | "version": "3.11.0"
411 | }
412 | },
413 | "nbformat": 4,
414 | "nbformat_minor": 4
415 | }
416 |
--------------------------------------------------------------------------------
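As an aside, it may help to see what `RollingLinearRegressionOfReturns` computes conceptually: for each asset, an ordinary least-squares regression of the asset's returns against the target's (SPY's) returns over a rolling window, whose intercept and slope surface as the factor's `alpha` and `beta` outputs. Below is a minimal numpy sketch of a single window; `regression_alpha_beta` and the toy return series are hypothetical illustrations, not part of the tutorial's API or the factor's actual implementation:

```python
import numpy as np

def regression_alpha_beta(asset_returns, target_returns):
    """OLS of an asset's returns on the target's (e.g. SPY's) returns
    for one window. Returns (alpha, beta): the intercept and slope,
    analogous to the factor's .alpha and .beta outputs."""
    x = np.asarray(target_returns, dtype=float)
    y = np.asarray(asset_returns, dtype=float)
    # slope = cov(x, y) / var(x); intercept = mean(y) - slope * mean(x)
    beta = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
    alpha = y.mean() - beta * x.mean()
    return alpha, beta

# Toy data: an asset that moves exactly 2x the benchmark plus a constant drift
spy_returns = np.array([0.01, -0.02, 0.015, 0.005, -0.01])
asset_returns = 2.0 * spy_returns + 0.001

alpha, beta = regression_alpha_beta(asset_returns, spy_returns)
```

Because the toy asset's returns are exactly twice the benchmark's plus a constant drift, the recovered beta is 2.0 and the alpha is the drift.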
/pipeline_tutorial/Lesson13-TradableStocksUS-Universe.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "
\n",
8 | "Disclaimer"
9 | ]
10 | },
11 | {
12 | "cell_type": "markdown",
13 | "metadata": {},
14 | "source": [
15 | "***\n",
16 | "[Pipeline Tutorial](Introduction.ipynb) › Lesson 13: The TradableStocksUS Universe\n",
17 | "***"
18 | ]
19 | },
20 | {
21 | "cell_type": "markdown",
22 | "metadata": {},
23 | "source": [
24 | "# The TradableStocksUS Universe\n",
25 | "\n",
26 | "Now that we've covered the basic components of the Pipeline API, let's construct a pipeline that we might want to use in an algorithm.\n",
27 | "\n",
28 | "To do so, we will create a filter to narrow down the full universe of US stocks to a subset of tradable securities, defined as those securities that meet all of the following criteria:\n",
29 | "\n",
30 | "- Common stocks only: no preferred stocks, ADRs, limited partnerships (LPs), or ETFs. ADRs are issuances in the US equity market for stocks that trade on other exchanges. Frequently, there is inherent risk associated with depositary receipts due to currency fluctuations, so we exclude them from our pipeline. LP securities are not tradable with most brokers, so we exclude them as well. In the case of the US Stock dataset, selecting for \"Common Stock\" in the `usstock_SecurityType2` field will automatically exclude preferred stocks, ETFs, ADRs, and LPs, as the latter all have different values for `usstock_SecurityType2`.\n",
31 | "- Primary shares only: for companies with multiple share classes, select only the primary share class.\n",
32 | "- Dollar volume: to ensure that stocks in our universe are relatively easy to trade when entering and exiting positions, include only stocks that have an average daily dollar volume of \$2.5M or more over the trailing 200 days.\n",
33 | "- Not too cheap: if a stock's price is lower than \$5, the bid-ask spread becomes larger relative to the price, and transaction costs become too high.\n",
34 | "- 200 continuous days of price and volume: if a stock has any missing data in the previous 200 days, it is excluded. This screens out stocks with trading halts, recent IPOs, and other situations that make them harder to assess.\n",
35 | "\n",
36 | "\n",
37 | "Former Quantopian users may notice that this universe is modeled on Quantopian's most popular universe, `QTradableStocksUS`, described in [this archived Quantopian forum post](https://quantopian-archive.netlify.app/forum/threads/working-on-our-best-universe-yet-qtradablestocksus.html). To reduce data dependencies, we have omitted one rule, namely that market cap must be over \\$500M. See the note further down if you have a Sharadar fundamentals subscription and would like to add this market cap filter.\n"
38 | ]
39 | },
40 | {
41 | "cell_type": "markdown",
42 | "metadata": {},
43 | "source": [
44 | "## Creating Our Universe\n",
45 | "\n",
46 | "Let's create a filter for each criterion and combine them together to create a `TradableStocksUS` filter."
47 | ]
48 | },
49 | {
50 | "cell_type": "code",
51 | "execution_count": 1,
52 | "metadata": {},
53 | "outputs": [],
54 | "source": [
55 | "from zipline.pipeline import EquityPricing, master\n",
56 | "from zipline.pipeline.factors import AverageDollarVolume, Latest"
57 | ]
58 | },
59 | {
60 | "cell_type": "markdown",
61 | "metadata": {},
62 | "source": [
63 | "The first two rules - common stocks only and primary shares only - are static characteristics that derive from the securities master database and thus can be applied as the `initial_universe`. We will place these rules in their own function, which returns a filter that can be passed to a Pipeline as the `initial_universe` argument."
64 | ]
65 | },
66 | {
67 | "cell_type": "code",
68 | "execution_count": 2,
69 | "metadata": {},
70 | "outputs": [],
71 | "source": [
72 | "def InitialUniverse():\n",
73 | " # Equities listed as common stock (not preferred stock, ETF, ADR, LP, etc)\n",
74 | " common_stock = master.SecuritiesMaster.usstock_SecurityType2.latest.eq('Common Stock')\n",
75 | "\n",
76 | " # Filter for primary share equities; primary shares can be identified by a\n",
77 | " # null usstock_PrimaryShareSid field (i.e. no pointer to a primary share)\n",
78 | " is_primary_share = master.SecuritiesMaster.usstock_PrimaryShareSid.latest.isnull()\n",
79 | "\n",
80 | " # combine the security type filters to form our initial universe\n",
81 | " initial_universe = common_stock & is_primary_share\n",
82 | "\n",
83 | " return initial_universe"
84 | ]
85 | },
86 | {
87 | "cell_type": "markdown",
88 | "metadata": {},
89 | "source": [
90 | "We place the remaining rules in a separate function. As discussed in the lesson on masking, we use masks in the later steps of our asset funnel to reduce computational load. This function will return a filter that can be passed to a pipeline as the `screen` argument or as the `mask` argument of another Term. In conjunction with the initial universe, it will limit the pipeline to the stocks we've defined as tradable. "
91 | ]
92 | },
93 | {
94 | "cell_type": "code",
95 | "execution_count": 3,
96 | "metadata": {},
97 | "outputs": [],
98 | "source": [
99 | "def TradableStocksUS():\n",
100 | "\n",
101 | " # require high dollar volume\n",
102 | " tradable_stocks = AverageDollarVolume(window_length=200) >= 2.5e6\n",
103 | "\n",
104 | " # also require price > $5. Note that we use Latest(...) instead of EquityPricing.close.latest\n",
105 | " # so that we can pass a mask\n",
106 | "    tradable_stocks = Latest([EquityPricing.close], mask=tradable_stocks) > 5\n",
107 | "\n",
108 | " # also require no missing data for 200 days\n",
109 | " tradable_stocks = EquityPricing.close.all_present(200, mask=tradable_stocks)\n",
110 | " has_volume = EquityPricing.volume.latest > 0\n",
111 | " tradable_stocks = has_volume.all(200, mask=tradable_stocks)\n",
112 | "\n",
113 | " return tradable_stocks"
114 | ]
115 | },
116 | {
117 | "cell_type": "markdown",
118 | "metadata": {},
119 | "source": [
120 | "Note that when defining our filters, we used several methods that we haven't yet seen, including `isnull`, `all`, and `all_present`. Documentation on these methods is available in the [Pipeline API Reference](https://www.quantrocket.com/docs/api/#pipeline-api) or by clicking on the method name in JupyterLab and pressing Control."
121 | ]
122 | },
123 | {
124 | "cell_type": "markdown",
125 | "metadata": {},
126 | "source": [
127 | "If you have a Sharadar fundamentals subscription and would like to add a market cap filter to your universe to fully re-create the `QTradableStocksUS` universe, you can do so by adding the following lines to the above function (this also requires importing `sharadar` from `zipline.pipeline`):\n",
128 | "\n",
129 | "```python\n",
130 | "# also require market cap over $500M\n",
131 | "tradable_stocks = Latest([sharadar.Fundamentals.slice(dimension='ARQ', period_offset=0).MARKETCAP], mask=tradable_stocks) >= 500e6\n",
132 | "```"
133 | ]
134 | },
135 | {
136 | "cell_type": "markdown",
137 | "metadata": {},
138 | "source": [
139 | "## Code Reuse\n",
140 | "\n",
141 | "Our universe may be useful to us in numerous notebooks and Zipline algorithms, so a practical next step is to transfer the pipeline code to a `.py` file to facilitate code reuse. We have done so in [tradable_stocks.py](tradable_stocks.py). The initial universe and the full universe can now be imported in any notebook or Zipline algorithm as follows:"
142 | ]
143 | },
144 | {
145 | "cell_type": "code",
146 | "execution_count": 4,
147 | "metadata": {},
148 | "outputs": [],
149 | "source": [
150 | "from codeload.pipeline_tutorial.tradable_stocks import InitialUniverse, TradableStocksUS\n",
151 | "\n",
152 | "initial_universe = InitialUniverse()\n",
153 | "universe = TradableStocksUS()"
154 | ]
155 | },
156 | {
157 | "cell_type": "markdown",
158 | "metadata": {},
159 | "source": [
160 | "We'll import and use this universe in the next lesson."
161 | ]
162 | },
163 | {
164 | "cell_type": "markdown",
165 | "metadata": {},
166 | "source": [
167 | "---\n",
168 | "\n",
169 | "**Next Lesson:** [Using Pipeline with Alphalens](Lesson14-Alphalens.ipynb) "
170 | ]
171 | }
172 | ],
173 | "metadata": {
174 | "kernelspec": {
175 | "display_name": "Python 3.11",
176 | "language": "python",
177 | "name": "python3"
178 | },
179 | "language_info": {
180 | "codemirror_mode": {
181 | "name": "ipython",
182 | "version": 3
183 | },
184 | "file_extension": ".py",
185 | "mimetype": "text/x-python",
186 | "name": "python",
187 | "nbconvert_exporter": "python",
188 | "pygments_lexer": "ipython3",
189 | "version": "3.11.0"
190 | }
191 | },
192 | "nbformat": 4,
193 | "nbformat_minor": 4
194 | }
195 |
--------------------------------------------------------------------------------
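The three dynamic rules in `TradableStocksUS` (dollar volume, price, and continuous data) can be illustrated on toy pandas data. This is a hypothetical sketch with made-up tickers and values: the real pipeline evaluates these rules cross-sectionally each day and uses masks to skip computation for already-excluded stocks, whereas here the boolean Series are simply chained:

```python
import numpy as np
import pandas as pd

# Toy daily data for three hypothetical assets over 200 days
n = 200
idx = pd.date_range("2010-01-04", periods=n, freq="B")
close = pd.DataFrame({
    "LIQUID": np.full(n, 50.0),   # liquid $50 stock, complete history
    "CHEAP": np.full(n, 2.0),     # fails the $5 price rule (and dollar volume)
    "HALTED": np.full(n, 20.0),   # will have a missing day
}, index=idx)
close.loc[idx[10], "HALTED"] = np.nan     # simulate a trading halt
volume = pd.DataFrame(200_000, index=idx, columns=close.columns)

# Rule 1: 200-day average dollar volume >= $2.5M
passes = (close * volume).mean() >= 2.5e6
# Rule 2: latest close > $5
passes &= close.iloc[-1] > 5
# Rule 3: no missing closes anywhere in the window
passes &= close.notnull().all()

tradable = passes[passes].index.tolist()
```

Only the liquid, sufficiently priced stock with a complete history survives all three rules.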
/pipeline_tutorial/Lesson15-Data-Browser-Integration.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "
\n",
8 | "Disclaimer"
9 | ]
10 | },
11 | {
12 | "cell_type": "markdown",
13 | "metadata": {},
14 | "source": [
15 | "***\n",
16 | "[Pipeline Tutorial](Introduction.ipynb) › Lesson 15: Data Browser Integration\n",
17 | "***"
18 | ]
19 | },
20 | {
21 | "cell_type": "markdown",
22 | "metadata": {},
23 | "source": [
24 | "# Browsing Pipeline Output in the Data Browser\n",
25 | "\n",
26 | "Pipeline output can be opened in the Data Browser. This allows you to view plots of the pipeline columns you calculated and visualize how the calculated indicators change over time for any given security. The Data Browser also makes it easy to see a breakdown of the types of securities that are passing your pipeline screen so that you can validate your pipeline logic. Are the results concentrated in certain sectors? Do the results include security types that you would rather exclude?"
27 | ]
28 | },
29 | {
30 | "cell_type": "code",
31 | "execution_count": 1,
32 | "metadata": {},
33 | "outputs": [],
34 | "source": [
35 | "from zipline.pipeline import Pipeline, EquityPricing, master\n",
36 | "from zipline.pipeline.factors import SimpleMovingAverage\n",
37 | "# import our InitialUniverse and TradableStocksUS functions\n",
38 | "from codeload.pipeline_tutorial.tradable_stocks import InitialUniverse, TradableStocksUS"
39 | ]
40 | },
41 | {
42 | "cell_type": "markdown",
43 | "metadata": {},
44 | "source": [
45 | "To demonstrate the Data Browser integration, let's create a Pipeline that includes 3 columns:\n",
46 | "\n",
47 | "* The daily closing price\n",
48 | "* The 200-day moving average of the closing price\n",
49 | "* A boolean indicator of whether the stock closed above the 200-day moving average that day. In the Data Browser, we can only view numeric columns, so we use the `.as_factor()` method to convert the boolean filter (True or False) into 1 or 0.\n",
50 | "\n",
51 | "The universe for the analysis will be the `TradableStocksUS` universe. "
52 | ]
53 | },
54 | {
55 | "cell_type": "code",
56 | "execution_count": 2,
57 | "metadata": {},
58 | "outputs": [],
59 | "source": [
60 | "def make_pipeline():\n",
61 | "\n",
62 | " initial_universe = InitialUniverse()\n",
63 | " universe = TradableStocksUS()\n",
64 | "\n",
65 | " # 200-day close price average.\n",
66 | "    mean_200 = SimpleMovingAverage(inputs=[EquityPricing.close], window_length=200, mask=universe)\n",
67 | "\n",
68 | " return Pipeline(\n",
69 | " columns={\n",
70 | " 'close': EquityPricing.close.latest,\n",
71 | " '200_ma': mean_200,\n",
72 | " 'above_200_ma': (EquityPricing.close.latest > mean_200).as_factor()\n",
73 | " },\n",
74 | " initial_universe=initial_universe,\n",
75 | " screen=universe\n",
76 | " )"
77 | ]
78 | },
79 | {
80 | "cell_type": "markdown",
81 | "metadata": {},
82 | "source": [
83 | "Next, we run the pipeline for the date range of interest:"
84 | ]
85 | },
86 | {
87 | "cell_type": "code",
88 | "execution_count": 3,
89 | "metadata": {},
90 | "outputs": [],
91 | "source": [
92 | "from zipline.research import run_pipeline\n",
93 | "results = run_pipeline(make_pipeline(), start_date=\"2010-01-05\", end_date=\"2011-01-05\")"
94 | ]
95 | },
96 | {
97 | "cell_type": "markdown",
98 | "metadata": {},
99 | "source": [
100 | "We can then open the pipeline output in the Data Browser by calling the IPython magic command `%browse` followed by the name of the results DataFrame:"
101 | ]
102 | },
103 | {
104 | "cell_type": "code",
105 | "execution_count": 4,
106 | "metadata": {},
107 | "outputs": [],
108 | "source": [
109 | "%browse results"
110 | ]
111 | },
112 | {
113 | "cell_type": "markdown",
114 | "metadata": {},
115 | "source": [
116 | "See the video below for a walk-through:"
117 | ]
118 | },
119 | {
120 | "cell_type": "code",
121 | "execution_count": 5,
122 | "metadata": {},
123 | "outputs": [
124 | {
125 | "data": {
126 | "text/html": [
127 | "\n",
128 | " \n",
136 | " "
137 | ],
138 | "text/plain": [
139 | ""
140 | ]
141 | },
142 | "execution_count": 5,
143 | "metadata": {},
144 | "output_type": "execute_result"
145 | }
146 | ],
147 | "source": [
148 | "from IPython.display import VimeoVideo\n",
149 | "VimeoVideo('920097406', width=600, height=450)"
150 | ]
151 | },
152 | {
153 | "cell_type": "markdown",
154 | "metadata": {},
155 | "source": [
156 | "---\n",
157 | "[Back to Introduction](Introduction.ipynb) "
158 | ]
159 | }
160 | ],
161 | "metadata": {
162 | "kernelspec": {
163 | "display_name": "Python 3.11",
164 | "language": "python",
165 | "name": "python3"
166 | },
167 | "language_info": {
168 | "codemirror_mode": {
169 | "name": "ipython",
170 | "version": 3
171 | },
172 | "file_extension": ".py",
173 | "mimetype": "text/x-python",
174 | "name": "python",
175 | "nbconvert_exporter": "python",
176 | "pygments_lexer": "ipython3",
177 | "version": "3.11.0"
178 | }
179 | },
180 | "nbformat": 4,
181 | "nbformat_minor": 4
182 | }
183 |
--------------------------------------------------------------------------------
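The `.as_factor()` conversion used above has a direct pandas analogue: casting a boolean Series to integers. A toy sketch (hypothetical closes and moving-average values) of the `above_200_ma` column's logic:

```python
import pandas as pd

# Hypothetical closes and moving-average values for three days
close = pd.Series([10.0, 9.0, 12.0])
ma = pd.Series([9.5, 9.5, 11.0])

above = close > ma                   # boolean, analogous to a pipeline Filter
above_as_factor = above.astype(int)  # 1/0, analogous to Filter.as_factor()
```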
/pipeline_tutorial/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/quantrocket-codeload/pipeline-tutorial/c322807b7aed3c7be164a822152f843f73882efa/pipeline_tutorial/__init__.py
--------------------------------------------------------------------------------
/pipeline_tutorial/tradable_stocks.py:
--------------------------------------------------------------------------------
1 | # Copyright 2024 QuantRocket LLC - All Rights Reserved
2 | #
3 | # Licensed under the Apache License, Version 2.0 (the "License");
4 | # you may not use this file except in compliance with the License.
5 | # You may obtain a copy of the License at
6 | #
7 | # http://www.apache.org/licenses/LICENSE-2.0
8 | #
9 | # Unless required by applicable law or agreed to in writing, software
10 | # distributed under the License is distributed on an "AS IS" BASIS,
11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12 | # See the License for the specific language governing permissions and
13 | # limitations under the License.
14 |
15 | from zipline.pipeline import EquityPricing, master, sharadar
16 | from zipline.pipeline.factors import AverageDollarVolume, Latest
17 |
18 | def InitialUniverse():
19 | """
20 | Returns a Pipeline filter that can be used as the
21 | initial universe for a tradable stocks filter. The
22 | initial universe consists of:
23 |
24 | - common stocks only (no preferred stocks, ADRs, LPs, or ETFs)
25 | - primary shares only
26 | """
27 | # Equities listed as common stock (not preferred stock, ETF, ADR, LP, etc)
28 | common_stock = master.SecuritiesMaster.usstock_SecurityType2.latest.eq('Common Stock')
29 |
30 | # Filter for primary share equities; primary shares can be identified by a
31 | # null usstock_PrimaryShareSid field (i.e. no pointer to a primary share)
32 | is_primary_share = master.SecuritiesMaster.usstock_PrimaryShareSid.latest.isnull()
33 |
34 | # combine the security type filters to form our initial universe
35 | initial_universe = common_stock & is_primary_share
36 |
37 | return initial_universe
38 |
39 | def TradableStocksUS(market_cap_filter=False):
40 | """
41 | Returns a Pipeline filter of tradable stocks, defined as:
42 |
43 | - 200-day average dollar volume >= $2.5M
44 | - price >= $5
45 | - 200 continuous days of price and volume.
46 |
47 |     If market_cap_filter=True, also requires market cap >= $500M.
48 | 
49 |     This filter is intended to be used in conjunction with the filter
50 |     returned by InitialUniverse, above.
51 | """
52 | # require high dollar volume
53 | tradable_stocks = AverageDollarVolume(window_length=200) >= 2.5e6
54 |
55 | # also require price > $5. Note that we use Latest(...) instead of EquityPricing.close.latest
56 | # so that we can pass a mask
57 |     tradable_stocks = Latest([EquityPricing.close], mask=tradable_stocks) > 5
58 |
59 | # also require no missing data for 200 days
60 | tradable_stocks = EquityPricing.close.all_present(200, mask=tradable_stocks)
61 | has_volume = EquityPricing.volume.latest > 0
62 | tradable_stocks = has_volume.all(200, mask=tradable_stocks)
63 |
64 | if market_cap_filter:
65 | # also require market cap over $500M
66 | tradable_stocks = Latest([sharadar.Fundamentals.slice(dimension='ARQ', period_offset=0).MARKETCAP], mask=tradable_stocks) >= 500e6
67 |
68 | return tradable_stocks
69 |
--------------------------------------------------------------------------------