├── .qrignore
├── LICENSE.txt
├── README.md
└── kitchensink_ml
├── Introduction.ipynb
├── Part1-Data-Collection-US-Stocks.ipynb
├── Part2-Data-Collection-Indexes.ipynb
├── Part3-Moonshot-Strategy-Code.ipynb
├── Part4-Walkforward-Optimization.ipynb
├── Part5-Dimensionality-Reduction.ipynb
├── Part6-Predictions-Analysis.ipynb
└── kitchensink_ml.py
/.qrignore:
--------------------------------------------------------------------------------
1 | README.md
2 | LICENSE.txt
3 |
--------------------------------------------------------------------------------
/LICENSE.txt:
--------------------------------------------------------------------------------
1 | Apache License
2 | Version 2.0, January 2004
3 | http://www.apache.org/licenses/
4 |
5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
6 |
7 | 1. Definitions.
8 |
9 | "License" shall mean the terms and conditions for use, reproduction, and distribution as defined by Sections 1 through 9 of this document.
10 |
11 | "Licensor" shall mean the copyright owner or entity authorized by the copyright owner that is granting the License.
12 |
13 | "Legal Entity" shall mean the union of the acting entity and all other entities that control, are controlled by, or are under common control with that entity. For the purposes of this definition, "control" means (i) the power, direct or indirect, to cause the direction or management of such entity, whether by contract or otherwise, or (ii) ownership of fifty percent (50%) or more of the outstanding shares, or (iii) beneficial ownership of such entity.
14 |
15 | "You" (or "Your") shall mean an individual or Legal Entity exercising permissions granted by this License.
16 |
17 | "Source" form shall mean the preferred form for making modifications, including but not limited to software source code, documentation source, and configuration files.
18 |
19 | "Object" form shall mean any form resulting from mechanical transformation or translation of a Source form, including but not limited to compiled object code, generated documentation, and conversions to other media types.
20 |
21 | "Work" shall mean the work of authorship, whether in Source or Object form, made available under the License, as indicated by a copyright notice that is included in or attached to the work (an example is provided in the Appendix below).
22 |
23 | "Derivative Works" shall mean any work, whether in Source or Object form, that is based on (or derived from) the Work and for which the editorial revisions, annotations, elaborations, or other modifications represent, as a whole, an original work of authorship. For the purposes of this License, Derivative Works shall not include works that remain separable from, or merely link (or bind by name) to the interfaces of, the Work and Derivative Works thereof.
24 |
25 | "Contribution" shall mean any work of authorship, including the original version of the Work and any modifications or additions to that Work or Derivative Works thereof, that is intentionally submitted to Licensor for inclusion in the Work by the copyright owner or by an individual or Legal Entity authorized to submit on behalf of the copyright owner. For the purposes of this definition, "submitted" means any form of electronic, verbal, or written communication sent to the Licensor or its representatives, including but not limited to communication on electronic mailing lists, source code control systems, and issue tracking systems that are managed by, or on behalf of, the Licensor for the purpose of discussing and improving the Work, but excluding communication that is conspicuously marked or otherwise designated in writing by the copyright owner as "Not a Contribution."
26 |
27 | "Contributor" shall mean Licensor and any individual or Legal Entity on behalf of whom a Contribution has been received by Licensor and subsequently incorporated within the Work.
28 |
29 | 2. Grant of Copyright License.
30 |
31 | Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare Derivative Works of, publicly display, publicly perform, sublicense, and distribute the Work and such Derivative Works in Source or Object form.
32 |
33 | 3. Grant of Patent License.
34 |
35 | Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except as stated in this section) patent license to make, have made, use, offer to sell, sell, import, and otherwise transfer the Work, where such license applies only to those patent claims licensable by such Contributor that are necessarily infringed by their Contribution(s) alone or by combination of their Contribution(s) with the Work to which such Contribution(s) was submitted. If You institute patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Work or a Contribution incorporated within the Work constitutes direct or contributory patent infringement, then any patent licenses granted to You under this License for that Work shall terminate as of the date such litigation is filed.
36 |
37 | 4. Redistribution.
38 |
39 | You may reproduce and distribute copies of the Work or Derivative Works thereof in any medium, with or without modifications, and in Source or Object form, provided that You meet the following conditions:
40 |
41 | You must give any other recipients of the Work or Derivative Works a copy of this License; and
42 | You must cause any modified files to carry prominent notices stating that You changed the files; and
43 | You must retain, in the Source form of any Derivative Works that You distribute, all copyright, patent, trademark, and attribution notices from the Source form of the Work, excluding those notices that do not pertain to any part of the Derivative Works; and
44 | If the Work includes a "NOTICE" text file as part of its distribution, then any Derivative Works that You distribute must include a readable copy of the attribution notices contained within such NOTICE file, excluding those notices that do not pertain to any part of the Derivative Works, in at least one of the following places: within a NOTICE text file distributed as part of the Derivative Works; within the Source form or documentation, if provided along with the Derivative Works; or, within a display generated by the Derivative Works, if and wherever such third-party notices normally appear. The contents of the NOTICE file are for informational purposes only and do not modify the License. You may add Your own attribution notices within Derivative Works that You distribute, alongside or as an addendum to the NOTICE text from the Work, provided that such additional attribution notices cannot be construed as modifying the License.
45 | You may add Your own copyright statement to Your modifications and may provide additional or different license terms and conditions for use, reproduction, or distribution of Your modifications, or for any such Derivative Works as a whole, provided Your use, reproduction, and distribution of the Work otherwise complies with the conditions stated in this License.
46 |
47 | 5. Submission of Contributions.
48 |
49 | Unless You explicitly state otherwise, any Contribution intentionally submitted for inclusion in the Work by You to the Licensor shall be under the terms and conditions of this License, without any additional terms or conditions. Notwithstanding the above, nothing herein shall supersede or modify the terms of any separate license agreement you may have executed with Licensor regarding such Contributions.
50 |
51 | 6. Trademarks.
52 |
53 | This License does not grant permission to use the trade names, trademarks, service marks, or product names of the Licensor, except as required for reasonable and customary use in describing the origin of the Work and reproducing the content of the NOTICE file.
54 |
55 | 7. Disclaimer of Warranty.
56 |
57 | Unless required by applicable law or agreed to in writing, Licensor provides the Work (and each Contributor provides its Contributions) on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are solely responsible for determining the appropriateness of using or redistributing the Work and assume any risks associated with Your exercise of permissions under this License.
58 |
59 | 8. Limitation of Liability.
60 |
61 | In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, shall any Contributor be liable to You for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising as a result of this License or out of the use or inability to use the Work (including but not limited to damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses), even if such Contributor has been advised of the possibility of such damages.
62 |
63 | 9. Accepting Warranty or Additional Liability.
64 |
65 | While redistributing the Work or Derivative Works thereof, You may choose to offer, and charge a fee for, acceptance of support, warranty, indemnity, or other liability obligations and/or rights consistent with this License. However, in accepting such obligations, You may act only on Your own behalf and on Your sole responsibility, not on behalf of any other Contributor, and only if You agree to indemnify, defend, and hold each Contributor harmless for any liability incurred by, or claims asserted against, such Contributor by reason of your accepting any such warranty or additional liability.
66 |
67 | END OF TERMS AND CONDITIONS
68 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # kitchensink-ml
2 |
3 | Machine learning strategy that trains the model using "everything and the kitchen sink": fundamentals, technical indicators, returns, price levels, volume and volatility spikes, liquidity, market breadth, and more. Runs in Moonshot.
4 |
5 | ## Clone in QuantRocket
6 |
7 | CLI:
8 |
9 | ```shell
10 | quantrocket codeload clone 'kitchensink-ml'
11 | ```
12 |
13 | Python:
14 |
15 | ```python
16 | from quantrocket.codeload import clone
17 | clone("kitchensink-ml")
18 | ```
19 |
20 | ## [Browse in GitHub](kitchensink_ml/Introduction.ipynb)
21 |
--------------------------------------------------------------------------------
/kitchensink_ml/Introduction.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | " \n",
8 | "Disclaimer"
9 | ]
10 | },
11 | {
12 | "cell_type": "markdown",
13 | "metadata": {},
14 | "source": [
15 | "# Machine Learning and the Kitchen Sink \n",
16 | "\n",
17 | "This tutorial demonstrates how to run a machine learning strategy in Moonshot using a wide variety of features, including fundamentals, technical indicators, returns, price levels, volume spikes, liquidity, volatility, market breadth, and more. We throw \"everything and the kitchen sink\" at the model to see what it can do. \n",
18 | "\n",
19 | "The tutorial utilizes price and fundamental data from Sharadar for US stocks, as well as index data from Interactive Brokers. \n",
20 | "\n",
21 |     "We train the model with scikit-learn's Stochastic Gradient Descent model and also experiment with dimensionality reduction using PCA (principal component analysis).\n",
22 | "\n",
23 | "> Due to the large number of features and large number of stocks, this is an advanced tutorial. For a simpler introduction to machine learning in QuantRocket, see the sample code in the [usage guide](https://www.quantrocket.com/docs/#ml). At least 16 GB of memory is recommended for this tutorial. "
24 | ]
25 | },
26 | {
27 | "cell_type": "markdown",
28 | "metadata": {},
29 | "source": [
30 | "* Part 1: [Data Collection for US Stocks](Part1-Data-Collection-US-Stocks.ipynb)\n",
31 | "* Part 2: [Data Collection for Market Indexes](Part2-Data-Collection-Indexes.ipynb)\n",
32 | "* Part 3: [Moonshot Strategy Code](Part3-Moonshot-Strategy-Code.ipynb)\n",
33 | "* Part 4: [Walk-forward Optimization](Part4-Walkforward-Optimization.ipynb)\n",
34 | "* Part 5: [Dimensionality Reduction with PCA](Part5-Dimensionality-Reduction.ipynb)\n",
35 | "* Part 6: [Analysis of Model Predictions](Part6-Predictions-Analysis.ipynb)"
36 | ]
37 | }
38 | ],
39 | "metadata": {
40 | "kernelspec": {
41 | "display_name": "Python 3.9",
42 | "language": "python",
43 | "name": "python3"
44 | },
45 | "language_info": {
46 | "codemirror_mode": {
47 | "name": "ipython",
48 | "version": 3
49 | },
50 | "file_extension": ".py",
51 | "mimetype": "text/x-python",
52 | "name": "python",
53 | "nbconvert_exporter": "python",
54 | "pygments_lexer": "ipython3",
55 | "version": "3.9.7"
56 | }
57 | },
58 | "nbformat": 4,
59 | "nbformat_minor": 4
60 | }
61 |
--------------------------------------------------------------------------------
/kitchensink_ml/Part1-Data-Collection-US-Stocks.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | " \n",
8 | "Disclaimer"
9 | ]
10 | },
11 | {
12 | "cell_type": "markdown",
13 | "metadata": {},
14 | "source": [
15 | "***\n",
16 | "[Machine Learning and the Kitchen Sink Strategy](Introduction.ipynb) › Part 1: Data Collection (Stocks)\n",
17 | "***"
18 | ]
19 | },
20 | {
21 | "cell_type": "markdown",
22 | "metadata": {},
23 | "source": [
24 | "# Data Collection - US Stocks\n",
25 | "\n",
26 | "Our machine learning strategy will run on the universe of all US stocks.\n",
27 | "\n",
28 |     "Start by collecting US stock data from Sharadar. Fundamental and price data are collected separately, but the two collections can run simultaneously. "
29 | ]
30 | },
31 | {
32 | "cell_type": "markdown",
33 | "metadata": {},
34 | "source": [
35 | "## Collect Sharadar fundamentals\n",
36 | "\n",
37 | "To collect the fundamentals:"
38 | ]
39 | },
40 | {
41 | "cell_type": "code",
42 | "execution_count": 1,
43 | "metadata": {},
44 | "outputs": [
45 | {
46 | "data": {
47 | "text/plain": [
48 | "{'status': 'the fundamental data will be collected asynchronously'}"
49 | ]
50 | },
51 | "execution_count": 1,
52 | "metadata": {},
53 | "output_type": "execute_result"
54 | }
55 | ],
56 | "source": [
57 | "from quantrocket.fundamental import collect_sharadar_fundamentals\n",
58 | "collect_sharadar_fundamentals(country=\"US\")"
59 | ]
60 | },
61 | {
62 | "cell_type": "markdown",
63 | "metadata": {},
64 | "source": [
65 |     "This runs in the background; monitor flightlog for a completion message:\n",
66 | "\n",
67 | "```\n",
68 | "quantrocket.fundamental: INFO Collecting Sharadar US fundamentals\n",
69 | "quantrocket.fundamental: INFO Collecting updated Sharadar US securities listings\n",
70 | "quantrocket.fundamental: INFO Finished collecting Sharadar US fundamentals\n",
71 | "```"
72 | ]
73 | },
74 | {
75 | "cell_type": "markdown",
76 | "metadata": {},
77 | "source": [
78 | "## Collect Sharadar prices\n",
79 | "\n",
80 | "First, create a database for Sharadar stock prices:"
81 | ]
82 | },
83 | {
84 | "cell_type": "code",
85 | "execution_count": 2,
86 | "metadata": {},
87 | "outputs": [
88 | {
89 | "data": {
90 | "text/plain": [
91 | "{'status': 'successfully created quantrocket.v2.history.sharadar-us-stk-1d.sqlite'}"
92 | ]
93 | },
94 | "execution_count": 2,
95 | "metadata": {},
96 | "output_type": "execute_result"
97 | }
98 | ],
99 | "source": [
100 | "from quantrocket.history import create_sharadar_db\n",
101 | "create_sharadar_db(\"sharadar-us-stk-1d\", sec_type=\"STK\", country=\"US\")"
102 | ]
103 | },
104 | {
105 | "cell_type": "markdown",
106 | "metadata": {},
107 | "source": [
108 | "Then collect the data:"
109 | ]
110 | },
111 | {
112 | "cell_type": "code",
113 | "execution_count": 3,
114 | "metadata": {},
115 | "outputs": [
116 | {
117 | "data": {
118 | "text/plain": [
119 | "{'status': 'the historical data will be collected asynchronously'}"
120 | ]
121 | },
122 | "execution_count": 3,
123 | "metadata": {},
124 | "output_type": "execute_result"
125 | }
126 | ],
127 | "source": [
128 | "from quantrocket.history import collect_history\n",
129 | "collect_history(\"sharadar-us-stk-1d\")"
130 | ]
131 | },
132 | {
133 | "cell_type": "markdown",
134 | "metadata": {},
135 | "source": [
136 |     "This runs in the background; monitor flightlog for a completion message:\n",
137 | "\n",
138 | "```\n",
139 | "quantrocket.history: INFO [sharadar-us-stk-1d] Collecting Sharadar US STK prices\n",
140 | "quantrocket.history: INFO [sharadar-us-stk-1d] Collecting updated Sharadar US securities listings\n",
141 | "quantrocket.history: INFO [sharadar-us-stk-1d] Finished collecting Sharadar US STK prices\n",
142 | "```"
143 | ]
144 | },
145 | {
146 | "cell_type": "markdown",
147 | "metadata": {},
148 | "source": [
149 | "***\n",
150 | "\n",
151 | "## *Next Up*\n",
152 | "\n",
153 | "Part 2: [Data Collection - Indexes](Part2-Data-Collection-Indexes.ipynb)"
154 | ]
155 | }
156 | ],
157 | "metadata": {
158 | "kernelspec": {
159 | "display_name": "Python 3.9",
160 | "language": "python",
161 | "name": "python3"
162 | },
163 | "language_info": {
164 | "codemirror_mode": {
165 | "name": "ipython",
166 | "version": 3
167 | },
168 | "file_extension": ".py",
169 | "mimetype": "text/x-python",
170 | "name": "python",
171 | "nbconvert_exporter": "python",
172 | "pygments_lexer": "ipython3",
173 | "version": "3.9.7"
174 | }
175 | },
176 | "nbformat": 4,
177 | "nbformat_minor": 4
178 | }
179 |
--------------------------------------------------------------------------------
/kitchensink_ml/Part2-Data-Collection-Indexes.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | " \n",
8 | "Disclaimer"
9 | ]
10 | },
11 | {
12 | "cell_type": "markdown",
13 | "metadata": {},
14 | "source": [
15 | "***\n",
16 | "[Machine Learning and the Kitchen Sink Strategy](Introduction.ipynb) › Part 2: Data Collection (Indexes)\n",
17 | "***"
18 | ]
19 | },
20 | {
21 | "cell_type": "markdown",
22 | "metadata": {},
23 | "source": [
24 | "# Data Collection - Indexes\n",
25 | "\n",
26 | "Though most of our features will come from the Sharadar price and fundamental data, we also wish to add some additional features that reflect the broad market. Specifically, we will include features relating to the S&P 500, the VIX, and the NYSE TRIN (aka Arms Index, a breadth measure). \n",
27 | "\n",
28 | "This data will come from Interactive Brokers."
29 | ]
30 | },
31 | {
32 | "cell_type": "markdown",
33 | "metadata": {},
34 | "source": [
35 | "First, start IB Gateway:"
36 | ]
37 | },
38 | {
39 | "cell_type": "code",
40 | "execution_count": 1,
41 | "metadata": {},
42 | "outputs": [
43 | {
44 | "data": {
45 | "text/plain": [
46 | "{'ibg1': {'status': 'running'}}"
47 | ]
48 | },
49 | "execution_count": 1,
50 | "metadata": {},
51 | "output_type": "execute_result"
52 | }
53 | ],
54 | "source": [
55 | "from quantrocket.ibg import start_gateways\n",
56 | "start_gateways(wait=True)"
57 | ]
58 | },
59 | {
60 | "cell_type": "markdown",
61 | "metadata": {},
62 | "source": [
63 | "## Collect listings\n",
64 | "\n",
65 | "Next we collect the listings from IBKR. (For the S&P 500, we collect the SPY ETF because IBKR provides deeper history for SPY@ARCA than SPX@CBOE.)"
66 | ]
67 | },
68 | {
69 | "cell_type": "code",
70 | "execution_count": 2,
71 | "metadata": {},
72 | "outputs": [
73 | {
74 | "data": {
75 | "text/plain": [
76 | "{'status': 'the IBKR listing details will be collected asynchronously'}"
77 | ]
78 | },
79 | "execution_count": 2,
80 | "metadata": {},
81 | "output_type": "execute_result"
82 | }
83 | ],
84 | "source": [
85 | "from quantrocket.master import collect_ibkr_listings\n",
86 | "collect_ibkr_listings(exchanges=\"ARCA\", symbols=\"SPY\", sec_types=\"ETF\")\n",
87 | "collect_ibkr_listings(countries=\"US\", symbols=[\"VIX\", \"TRIN-NYSE\"], sec_types=\"IND\")"
88 | ]
89 | },
90 | {
91 | "cell_type": "markdown",
92 | "metadata": {},
93 | "source": [
94 | "Monitor flightlog for the completion messages:\n",
95 | "\n",
96 | "```\n",
97 | "quantrocket.master: INFO Collecting ARCA ETF listings from IBKR website (SPY only)\n",
98 | "quantrocket.master: INFO Requesting details for 1 ARCA ETF listings found on IBKR website\n",
99 | "quantrocket.master: INFO Saved 1 ARCA ETF listings to securities master database\n",
100 | "quantrocket.master: INFO Collecting US IND listings from IBKR website (VIX, TRIN-NYSE only)\n",
101 | "quantrocket.master: INFO Requesting details for 2 US IND listings found on IBKR website\n",
102 | "quantrocket.master: INFO Saved 2 US IND listings to securities master database\n",
103 | "```"
104 | ]
105 | },
106 | {
107 | "cell_type": "markdown",
108 | "metadata": {},
109 | "source": [
110 | "## Lookup Sids\n",
111 | "\n",
112 | "Look up the Sids for the various instruments:"
113 | ]
114 | },
115 | {
116 | "cell_type": "code",
117 | "execution_count": 3,
118 | "metadata": {},
119 | "outputs": [],
120 | "source": [
121 | "from quantrocket.master import download_master_file\n",
122 | "download_master_file(\"indices.csv\", exchanges=[\"CBOE\", \"NYSE\", \"ARCA\"], symbols=[\"SPY\", \"VIX\", \"TRIN-NYSE\"], sec_types=[\"IND\",\"ETF\"], vendors=\"ibkr\")"
123 | ]
124 | },
125 | {
126 | "cell_type": "code",
127 | "execution_count": 5,
128 | "metadata": {},
129 | "outputs": [
130 | {
131 | "data": {
183 | "text/plain": [
184 | " Sid Symbol Name Exchange\n",
185 | "0 FIBBG000BDTBL9 SPY SPDR S&P 500 ETF TRUST ARCX\n",
186 | "1 IB13455763 VIX CBOE Volatility Index XCBO\n",
187 | "2 IB26718743 TRIN-NYSE NYSE TRIN (OR ARMS) INDEX XNYS"
188 | ]
189 | },
190 | "execution_count": 5,
191 | "metadata": {},
192 | "output_type": "execute_result"
193 | }
194 | ],
195 | "source": [
196 | "import pandas as pd\n",
197 | "indices = pd.read_csv(\"indices.csv\")\n",
198 | "\n",
199 | "indices[[\"Sid\", \"Symbol\", \"Name\", \"Exchange\"]]"
200 | ]
201 | },
202 | {
203 | "cell_type": "markdown",
204 | "metadata": {},
205 | "source": [
206 | "## Collect historical data\n",
207 | "\n",
208 | "Next, we create a database for collecting 1-day bars for the indexes:"
209 | ]
210 | },
211 | {
212 | "cell_type": "code",
213 | "execution_count": 6,
214 | "metadata": {},
215 | "outputs": [
216 | {
217 | "data": {
218 | "text/plain": [
219 | "{'status': 'successfully created quantrocket.v2.history.market-1d.sqlite'}"
220 | ]
221 | },
222 | "execution_count": 6,
223 | "metadata": {},
224 | "output_type": "execute_result"
225 | }
226 | ],
227 | "source": [
228 | "from quantrocket.history import create_ibkr_db\n",
229 | "create_ibkr_db(\"market-1d\", \n",
230 | " sids=[\n",
231 | " \"FIBBG000BDTBL9\",\n",
232 | " \"IB13455763\", \n",
233 | " \"IB26718743\",\n",
234 | " ], \n",
235 | " bar_size=\"1 day\")"
236 | ]
237 | },
238 | {
239 | "cell_type": "markdown",
240 | "metadata": {},
241 | "source": [
242 | "Then collect the data:"
243 | ]
244 | },
245 | {
246 | "cell_type": "code",
247 | "execution_count": 7,
248 | "metadata": {},
249 | "outputs": [
250 | {
251 | "data": {
252 | "text/plain": [
253 | "{'status': 'the historical data will be collected asynchronously'}"
254 | ]
255 | },
256 | "execution_count": 7,
257 | "metadata": {},
258 | "output_type": "execute_result"
259 | }
260 | ],
261 | "source": [
262 | "from quantrocket.history import collect_history\n",
263 | "collect_history(\"market-1d\")"
264 | ]
265 | },
266 | {
267 | "cell_type": "markdown",
268 | "metadata": {},
269 | "source": [
270 | "Monitor flightlog for completion:\n",
271 | "\n",
272 | "```\n",
273 | "quantrocket.history: INFO [market-1d] Collecting history from IBKR for 3 securities in market-1d\n",
274 | "quantrocket.history: INFO [market-1d] Saved 13302 total records for 3 total securities to quantrocket.v2.history.market-1d.sqlite\n",
275 | "```"
276 | ]
277 | },
278 | {
279 | "cell_type": "markdown",
280 | "metadata": {},
281 | "source": [
282 | "***\n",
283 | "\n",
284 | "## *Next Up*\n",
285 | "\n",
286 | "Part 3: [Moonshot Strategy Code](Part3-Moonshot-Strategy-Code.ipynb)"
287 | ]
288 | }
289 | ],
290 | "metadata": {
291 | "kernelspec": {
292 | "display_name": "Python 3.9",
293 | "language": "python",
294 | "name": "python3"
295 | },
296 | "language_info": {
297 | "codemirror_mode": {
298 | "name": "ipython",
299 | "version": 3
300 | },
301 | "file_extension": ".py",
302 | "mimetype": "text/x-python",
303 | "name": "python",
304 | "nbconvert_exporter": "python",
305 | "pygments_lexer": "ipython3",
306 | "version": "3.9.7"
307 | }
308 | },
309 | "nbformat": 4,
310 | "nbformat_minor": 4
311 | }
312 |
--------------------------------------------------------------------------------
/kitchensink_ml/Part3-Moonshot-Strategy-Code.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | " \n",
8 | "Disclaimer"
9 | ]
10 | },
11 | {
12 | "cell_type": "markdown",
13 | "metadata": {},
14 | "source": [
15 | "***\n",
16 | "[Machine Learning and the Kitchen Sink Strategy](Introduction.ipynb) › Part 3: Moonshot Code\n",
17 | "***"
18 | ]
19 | },
20 | {
21 | "cell_type": "markdown",
22 | "metadata": {},
23 | "source": [
24 | "# Moonshot Strategy Code\n",
25 | "\n",
26 | "The file [kitchensink_ml.py](kitchensink_ml.py) contains the strategy code."
27 | ]
28 | },
29 | {
30 | "cell_type": "markdown",
31 | "metadata": {},
32 | "source": [
33 | "## Prices to features\n",
34 | "\n",
35 | "Due to the large number of features, the strategy's `prices_to_features` calls a variety of helper methods to create the various categories of features. Not only does this improve code readability but it also allows intermediate DataFrames to be garbage-collected more frequently, reducing memory usage. "
36 | ]
37 | },
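  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a rough sketch (the helper signatures here are assumptions for illustration, not copied from the file), the overall shape of `prices_to_features` is a series of helper calls that each add their feature DataFrames to a shared `features` dict:\n",
    "\n",
    "```python\n",
    "def prices_to_features(self, prices: pd.DataFrame):\n",
    "\n",
    "    closes = prices.loc[\"Close\"]\n",
    "    features = {}\n",
    "\n",
    "    # each helper adds its features to the dict; its intermediate\n",
    "    # DataFrames then go out of scope and can be garbage-collected\n",
    "    self.add_fundamental_features(prices, features)\n",
    "    self.add_quality_features(prices, features)\n",
    "    self.add_price_and_volume_features(prices, features)\n",
    "    self.add_technical_indicator_features(prices, features)\n",
    "    self.add_securities_master_features(prices, features)\n",
    "    self.add_market_features(prices, features)\n",
    "\n",
    "    # Target to predict: next week return (see Targets below)\n",
    "    one_week_returns = (closes - closes.shift(5)) / closes.shift(5).where(closes.shift(5) > 0)\n",
    "    targets = one_week_returns.shift(-5)\n",
    "\n",
    "    return features, targets\n",
    "```\n",
    "\n",
    "The individual helper methods are described below."
   ]
  },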
38 | {
39 | "cell_type": "markdown",
40 | "metadata": {},
41 | "source": [
42 | "### Fundamental features\n",
43 | "\n",
44 | "The method `add_fundamental_features` adds various fundamental values and ratios. For each fundamental field, we choose to rank the stocks and use the rank as the feature, rather than the raw fundamental value. This is meant to ensure more uniform scaling of features. For example:\n",
45 | "\n",
46 | "```python\n",
47 | "features[\"enterprise_multiples_ranks\"] = enterprise_multiples.rank(axis=1, pct=True).fillna(0.5)\n",
48 | "```\n",
49 | "\n",
50 | "The parameter `pct=True` causes Pandas to rank the stocks along a continuum from 0 to 1, nicely scaling the data. We use `fillna(0.5)` to place NaNs at the center rather than at either extreme, so that the model does not interpret them as having either a good or bad rank."
51 | ]
52 | },
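  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For context, a minimal sketch of how such a ranked feature might be built is shown below; the `EVEBITDA` field and `dimension` argument are assumptions for illustration, not necessarily the exact fields the strategy uses:\n",
    "\n",
    "```python\n",
    "from quantrocket.fundamental import get_sharadar_fundamentals_reindexed_like\n",
    "\n",
    "# fundamentals come back shaped like the closes DataFrame (dates x sids)\n",
    "fundamentals = get_sharadar_fundamentals_reindexed_like(\n",
    "    closes, fields=[\"EVEBITDA\"], dimension=\"ART\")\n",
    "enterprise_multiples = fundamentals.loc[\"EVEBITDA\"]\n",
    "\n",
    "# rank cross-sectionally on each date, scaled 0 to 1, with NaNs at the center\n",
    "features[\"enterprise_multiples_ranks\"] = enterprise_multiples.rank(axis=1, pct=True).fillna(0.5)\n",
    "```"
   ]
  },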
53 | {
54 | "cell_type": "markdown",
55 | "metadata": {},
56 | "source": [
57 | "### Quality features\n",
58 | "\n",
59 | "The method `add_quality_features` adds additional fundamental features related to quality as defined in the Piotroski F-Score. We add the nine individual F-score components as well as the daily F-score ranks."
60 | ]
61 | },
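  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A simplified sketch of two of the nine components and the composite rank follows; the Sharadar field names (`ROA`, `NCFO`) and dimension are assumptions for illustration:\n",
    "\n",
    "```python\n",
    "from quantrocket.fundamental import get_sharadar_fundamentals_reindexed_like\n",
    "\n",
    "fundamentals = get_sharadar_fundamentals_reindexed_like(\n",
    "    closes, fields=[\"ROA\", \"NCFO\"], dimension=\"ART\")\n",
    "\n",
    "# F-Score component: positive return on assets\n",
    "have_positive_roa = (fundamentals.loc[\"ROA\"] > 0).astype(int)\n",
    "# F-Score component: positive operating cash flow\n",
    "have_positive_cash_flow = (fundamentals.loc[\"NCFO\"] > 0).astype(int)\n",
    "\n",
    "features[\"have_positive_roa\"] = have_positive_roa\n",
    "features[\"have_positive_cash_flow\"] = have_positive_cash_flow\n",
    "\n",
    "# the composite F-Score is the sum of the nine components (only two shown here);\n",
    "# its daily cross-sectional rank is also used as a feature\n",
    "f_scores = have_positive_roa + have_positive_cash_flow  # + ... the other components\n",
    "features[\"f_score_ranks\"] = f_scores.rank(axis=1, pct=True).fillna(0.5)\n",
    "```"
   ]
  },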
62 | {
63 | "cell_type": "markdown",
64 | "metadata": {},
65 | "source": [
66 | "### Price and volume features\n",
67 | " \n",
68 | "The method `add_price_and_volume_features` adds a number of features derived from price and volume including:\n",
69 | "\n",
70 | "* ranking by returns on several time frames (yearly, monthly, weekly, daily)\n",
71 | "* price level (above or below 10, above or below 2)\n",
72 | "* rankings by dollar volume\n",
73 | "* rankings by volatility\n",
74 | "* whether a volatility spike occurred today\n",
75 | "* whether a volume spike occurred today"
76 | ]
77 | },
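  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A few of these features might be computed roughly as follows; the lookback windows and spike thresholds are illustrative assumptions, not the strategy's exact parameters:\n",
    "\n",
    "```python\n",
    "closes = prices.loc[\"Close\"]\n",
    "volumes = prices.loc[\"Volume\"]\n",
    "\n",
    "# returns ranks (yearly shown; monthly, weekly, and daily are analogous)\n",
    "yearly_returns = (closes - closes.shift(252)) / closes.shift(252)\n",
    "features[\"yearly_returns_ranks\"] = yearly_returns.rank(axis=1, pct=True).fillna(0.5)\n",
    "\n",
    "# price levels\n",
    "features[\"are_above_10\"] = (closes > 10).astype(int)\n",
    "features[\"are_above_2\"] = (closes > 2).astype(int)\n",
    "\n",
    "# dollar volume ranks\n",
    "avg_dollar_volumes = (closes * volumes).rolling(window=30).mean()\n",
    "features[\"dollar_volume_ranks\"] = avg_dollar_volumes.rank(axis=1, pct=True).fillna(0.5)\n",
    "\n",
    "# volatility ranks and spikes (spike = today's rolling std dev doubled vs yesterday's)\n",
    "std_devs = closes.pct_change().rolling(window=60).std()\n",
    "features[\"volatility_ranks\"] = std_devs.rank(axis=1, pct=True).fillna(0.5)\n",
    "features[\"volatility_spikes\"] = (std_devs > 2 * std_devs.shift()).astype(int)\n",
    "\n",
    "# volume spikes vs the trailing average volume\n",
    "features[\"volume_spikes\"] = (volumes > 2 * volumes.rolling(window=60).mean()).astype(int)\n",
    "```"
   ]
  },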
78 | {
79 | "cell_type": "markdown",
80 | "metadata": {},
81 | "source": [
82 | "### Technical indicator features\n",
83 | "\n",
84 | "The method `add_technical_indicator_features` calculates several technical indicators for each stock in the universe:\n",
85 | "\n",
86 | "* where is the price in relation to its 20-day Bollinger bands\n",
87 | "* RSI (Relative Strength Index)\n",
88 | "* Stochastic oscillator\n",
89 | "* Money Flow Index\n",
90 | "\n",
91 | "Each indicator can have a value between 0 and 1. In the case of Bollinger Bands, where the price could exceed the band, resulting in a value less than 0 or greater than 1, we choose to winsorize the price at the upper and lower bands in order to keep the range between 0 and 1.\n",
92 | "\n",
93 | "```python\n",
94 | "winsorized_closes = closes.where(closes > lower_bands, lower_bands).where(closes < upper_bands, upper_bands)\n",
95 | "```\n"
96 | ]
97 | },
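  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Continuing the Bollinger Band example, the winsorized price can then be expressed as its position within the bands (0 = lower band, 1 = upper band). This is a minimal sketch assuming a 20-day window and 2 standard deviations:\n",
    "\n",
    "```python\n",
    "mavgs = closes.rolling(window=20).mean()\n",
    "stds = closes.rolling(window=20).std()\n",
    "upper_bands = mavgs + 2 * stds\n",
    "lower_bands = mavgs - 2 * stds\n",
    "\n",
    "# winsorize at the bands, then scale to the 0-1 position within the bands\n",
    "winsorized_closes = closes.where(closes > lower_bands, lower_bands).where(closes < upper_bands, upper_bands)\n",
    "features[\"bollinger_positions\"] = ((winsorized_closes - lower_bands) / (upper_bands - lower_bands)).fillna(0.5)\n",
    "```"
   ]
  },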
98 | {
99 | "cell_type": "markdown",
100 | "metadata": {},
101 | "source": [
102 | "### Securities master features\n",
103 | "\n",
104 | "The method `add_securities_master_features` adds a few features from the securities master database: whether the stock is an ADR, and what sector it belongs to. Note that sectors must be one-hot encoded, resulting in a boolean DataFrame for each sector indicating whether the stock belongs to that particular sector. See the usage guide for more on one-hot encoding. \n"
105 | ]
106 | },
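  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A hedged sketch of these two features follows; the securities master field names (`sharadar_Category`, `sharadar_Sector`) are assumptions for illustration:\n",
    "\n",
    "```python\n",
    "from quantrocket.master import get_securities_reindexed_like\n",
    "\n",
    "securities = get_securities_reindexed_like(\n",
    "    closes, fields=[\"sharadar_Category\", \"sharadar_Sector\"])\n",
    "\n",
    "# ADR flag: the Sharadar Category field labels ADRs as such\n",
    "categories = securities.loc[\"sharadar_Category\"]\n",
    "features[\"are_adrs\"] = categories.apply(\n",
    "    lambda category: category.astype(str).str.contains(\"ADR\")).astype(int)\n",
    "\n",
    "# one-hot encode sectors: one boolean DataFrame per sector\n",
    "sectors = securities.loc[\"sharadar_Sector\"]\n",
    "for sector in sectors.stack().unique():\n",
    "    features[\"sector_\" + sector] = (sectors == sector).astype(int)\n",
    "```"
   ]
  },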
107 | {
108 | "cell_type": "markdown",
109 | "metadata": {},
110 | "source": [
111 | "### Market features\n",
112 | "\n",
113 | "The method `add_market_features` adds several market-wide features to help the model know what is happening in the broader market, including:\n",
114 | "\n",
115 | "* whether the S&P 500 is above or below its 200-day moving average\n",
116 | "* the level of the VIX (specifically, where it falls within the range of 12 - 30, our chosen thresholds for low and high levels)\n",
117 | "* where the 10-day NYSE TRIN falls within the range of 0.5 to 2\n",
118 | "* the McClellan oscillator\n",
119 | "* whether the Hindenburg Omen triggered in the last 30 days\n",
120 | "\n",
121 |     "The first three of these features are derived from the index data collected from IBKR. We query this database for the date range of our prices DataFrame. (Note that we identified the database as the `BENCHMARK_DB` so that we can use SPY as the backtest benchmark; see the usage guide for more on benchmarks.) \n",
122 | "\n",
123 | "```python\n",
124 | "# Get prices for SPY, VIX, TRIN-NYSE\n",
125 | "market_prices = get_prices(self.BENCHMARK_DB,\n",
126 | " fields=\"Close\",\n",
127 | " start_date=closes.index.min(),\n",
128 | " end_date=closes.index.max())\n",
129 | "\n",
130 | "market_closes = market_prices.loc[\"Close\"]\n",
131 | "```\n",
132 | "\n",
133 | "Using SPY as an example, we extract the Series of SPY prices from the DataFrame and perform our calculations.\n",
134 | "\n",
135 | "```python\n",
136 | "# Is S&P above its 200-day?\n",
137 | "spy_closes = market_closes[self.SPY_SID]\n",
138 | "spy_200d_mavg = spy_closes.rolling(200).mean()\n",
139 | "spy_above_200d = (spy_closes > spy_200d_mavg).astype(int)\n",
140 | "```\n",
141 | "\n",
142 |     "Now that we have a Series indicating whether SPY is above its moving average, we need to reshape the Series like our prices DataFrame, so that the SPY indicator is provided to the model as a feature for each stock. First, we reindex the Series like the prices DataFrame, in case there are any differences in dates between the two data sources (we don't expect a difference, but it is possible when combining data from different sources). Then, we use `apply` to broadcast the Series along each column (i.e. each security) of the prices DataFrame: \n",
143 | "\n",
144 | "```python\n",
145 | "# Must reindex like closes in case indexes differ\n",
146 | "spy_above_200d = spy_above_200d.reindex(closes.index, method=\"ffill\")\n",
147 | "features[\"spy_above_200d\"] = closes.apply(lambda x: spy_above_200d)\n",
148 | "```\n",
149 | "\n",
150 |     "The McClellan oscillator is a market breadth indicator, which we calculate from the Sharadar data by counting the daily advancers and decliners and then deriving the oscillator from these Series:\n",
151 | "\n",
152 | "```python\n",
153 | "# McClellan oscillator\n",
154 | "total_issues = closes.count(axis=1)\n",
155 | "returns = closes.pct_change()\n",
156 | "advances = returns.where(returns > 0).count(axis=1)\n",
157 | "declines = returns.where(returns < 0).count(axis=1)\n",
158 | "net_advances = advances - declines\n",
159 | "pct_net_advances = net_advances / total_issues\n",
160 | "ema_19 = pct_net_advances.ewm(span=19).mean()\n",
161 | "ema_39 = pct_net_advances.ewm(span=39).mean()\n",
162 | "mcclellan_oscillator = (ema_19 - ema_39) * 10\n",
163 | "# Winsorize at 50 and -50\n",
164 | "mcclellan_oscillator = mcclellan_oscillator.where(mcclellan_oscillator < 50, 50).where(mcclellan_oscillator > -50, -50)\n",
165 | "```\n",
166 | "As with the SPY indicator, we lastly broadcast the Series with `apply` to shape the indicator like the prices DataFrame:\n",
167 | "\n",
168 | "```python\n",
169 | "features[\"mcclellan_oscillator\"] = closes.apply(lambda x: mcclellan_oscillator).fillna(0)\n",
170 | "\n",
171 | "```"
172 | ]
173 | },
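  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The VIX and TRIN features follow the same broadcast pattern. As a hedged sketch of the VIX feature (the `VIX_SID` attribute name is an assumption), we winsorize at the 12/30 thresholds, scale to 0-1, and broadcast to the shape of the prices DataFrame:\n",
    "\n",
    "```python\n",
    "vix_closes = market_closes[self.VIX_SID]\n",
    "\n",
    "# winsorize at the low/high thresholds, then scale to 0-1\n",
    "vix_closes = vix_closes.where(vix_closes > 12, 12).where(vix_closes < 30, 30)\n",
    "vix_levels = (vix_closes - 12) / (30 - 12)\n",
    "\n",
    "# reindex like closes and broadcast to each security\n",
    "vix_levels = vix_levels.reindex(closes.index, method=\"ffill\")\n",
    "features[\"vix_levels\"] = closes.apply(lambda x: vix_levels).fillna(0.5)\n",
    "```"
   ]
  },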
174 | {
175 | "cell_type": "markdown",
176 | "metadata": {},
177 | "source": [
178 | "### Targets\n",
179 | "\n",
180 | "Having created all of our features, in `prices_to_features` we create our targets by asking the model to predict the one-week forward return:\n",
181 | "\n",
182 | "```python\n",
183 | "def prices_to_features(self, prices: pd.DataFrame):\n",
184 | " ...\n",
185 | " # Target to predict: next week return\n",
186 | " one_week_returns = (closes - closes.shift(5)) / closes.shift(5).where(closes.shift(5) > 0)\n",
187 | " targets = one_week_returns.shift(-5)\n",
188 | " ...\n",
189 | "```"
190 | ]
191 | },
192 | {
193 | "cell_type": "markdown",
194 | "metadata": {},
195 | "source": [
196 | "## Predictions to signals\n",
197 | "\n",
198 | "The features and targets will be fed to the machine learning model during training. During backtesting or live trading, the features (but not the targets) will be fed to the machine learning model to generate predictions. The model's predictions will in turn be fed to the `predictions_to_signals` method, which creates buy signals for the 10 stocks with the highest predicted return and sell signals for the 10 stocks with the lowest predicted return, provided they have adequate dollar volume:\n",
199 | "\n",
200 | "> We choose to train our model on all securities regardless of dollar volume but only trade securities with adequate dollar volume. We could alternatively have chosen to only train on the set of liquid securities we were willing to trade. \n",
201 | "\n",
202 | "```python\n",
203 | "def predictions_to_signals(self, predictions: pd.DataFrame, prices: pd.DataFrame):\n",
204 | " \n",
205 | " ...\n",
206 | " # Buy (sell) stocks with best (worst) predicted return\n",
207 | " have_best_predictions = predictions.where(have_adequate_dollar_volumes).rank(ascending=False, axis=1) <= 10\n",
208 | " have_worst_predictions = predictions.where(have_adequate_dollar_volumes).rank(ascending=True, axis=1) <= 10\n",
209 | " signals = have_best_predictions.astype(int).where(have_best_predictions, -have_worst_predictions.astype(int).where(have_worst_predictions, 0))\n",
210 | " ...\n",
211 | "``` "
212 | ]
213 | },
214 | {
215 | "cell_type": "markdown",
216 | "metadata": {},
217 | "source": [
218 | "## Weight allocation and rebalancing\n",
219 | "\n",
220 | "Capital is divided equally among the signals, with weekly rebalancing:\n",
221 | "\n",
222 | "```python\n",
223 | "def signals_to_target_weights(self, signals: pd.DataFrame, prices: pd.DataFrame):\n",
224 | " # Allocate equal weights\n",
225 | " daily_signal_counts = signals.abs().sum(axis=1)\n",
226 | " weights = signals.div(daily_signal_counts, axis=0).fillna(0)\n",
227 | "\n",
228 | " # Rebalance weekly\n",
229 | " # Resample daily to weekly, taking the first day's signal\n",
230 | " # For pandas offset aliases, see https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases\n",
231 | " weights = weights.resample(\"W\").first()\n",
232 | " # Reindex back to daily and fill forward\n",
233 | " weights = weights.reindex(prices.loc[\"Close\"].index, method=\"ffill\") \n",
234 | " ...\n",
235 | "```"
236 | ]
237 | },
238 | {
239 | "cell_type": "markdown",
240 | "metadata": {},
241 | "source": [
242 | "***\n",
243 | "\n",
244 | "## *Next Up*\n",
245 | "\n",
246 | "Part 4: [Walk-forward Optimization](Part4-Walkforward-Optimization.ipynb)"
247 | ]
248 | }
249 | ],
250 | "metadata": {
251 | "kernelspec": {
252 | "display_name": "Python 3.9",
253 | "language": "python",
254 | "name": "python3"
255 | },
256 | "language_info": {
257 | "codemirror_mode": {
258 | "name": "ipython",
259 | "version": 3
260 | },
261 | "file_extension": ".py",
262 | "mimetype": "text/x-python",
263 | "name": "python",
264 | "nbconvert_exporter": "python",
265 | "pygments_lexer": "ipython3",
266 | "version": "3.9.7"
267 | }
268 | },
269 | "nbformat": 4,
270 | "nbformat_minor": 4
271 | }
272 |
--------------------------------------------------------------------------------
/kitchensink_ml/Part6-Predictions-Analysis.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | " \n",
8 | "Disclaimer"
9 | ]
10 | },
11 | {
12 | "cell_type": "markdown",
13 | "metadata": {},
14 | "source": [
15 | "***\n",
16 | "[Machine Learning and the Kitchen Sink Strategy](Introduction.ipynb) › Part 6: Analysis of Model Predictions\n",
17 | "***"
18 | ]
19 | },
20 | {
21 | "cell_type": "markdown",
22 | "metadata": {},
23 | "source": [
24 | "# Analysis of Model Predictions\n",
25 | "\n",
26 | "A `MoonshotML` strategy encompasses both the model's training and predictions and what we choose to do with those predictions in our trading logic. That's a lot to worry about in the initial research stage. To separate concerns, we can retrieve the model predictions from our backtest results and analyze the predictions in a notebook, which might illuminate how we want to use the predictions."
27 | ]
28 | },
29 | {
30 | "cell_type": "markdown",
31 | "metadata": {},
32 | "source": [
33 | "## Retrieve predictions and prices\n",
34 | "\n",
35 | "\n",
36 | "In `predictions_to_signals`, we save the predictions, closing prices, and volume DataFrames to the backtest results: \n",
37 | "\n",
38 | "```python\n",
39 | "...\n",
40 | "# Save the predictions and prices so we can analyze them\n",
41 | "self.save_to_results(\"Prediction\", predictions)\n",
42 | "self.save_to_results(\"Close\", closes)\n",
43 | "self.save_to_results(\"Volume\", volumes)\n",
44 | "...\n",
45 | "```\n",
46 | "\n",
47 | "To get these fields back, we must re-run the walk-forward optimization with `details=True`:"
48 | ]
49 | },
50 | {
51 | "cell_type": "code",
52 | "execution_count": 1,
53 | "metadata": {},
54 | "outputs": [],
55 | "source": [
56 | "from quantrocket.moonshot import ml_walkforward\n",
57 | "ml_walkforward(\"kitchensink-ml\",\n",
58 | " start_date=\"2006-12-31\",\n",
59 | " end_date=\"2018-12-31\",\n",
60 | " train=\"Y\",\n",
61 | " details=True,\n",
62 | " model_filepath=\"pca_sgd_model.joblib\",\n",
63 | " progress=True,\n",
64 | " filepath_or_buffer=\"kitchensink_ml_details*\")"
65 | ]
66 | },
67 | {
68 | "cell_type": "markdown",
69 | "metadata": {},
70 | "source": [
71 | "## Load predictions and prices\n",
72 | "\n",
73 | "Using `details=True` on a large universe results in a large backtest results CSV. To make it easier to load, we use `csvgrep` to isolate particular fields before we load anything into memory:"
74 | ]
75 | },
76 | {
77 | "cell_type": "code",
78 | "execution_count": 2,
79 | "metadata": {},
80 | "outputs": [],
81 | "source": [
82 | "!csvgrep -c Field -m Prediction kitchensink_ml_details_results.csv > kitchensink_ml_details_results_Prediction.csv\n",
83 | "!csvgrep -c Field -m Close kitchensink_ml_details_results.csv > kitchensink_ml_details_results_Close.csv\n",
84 | "!csvgrep -c Field -m Volume kitchensink_ml_details_results.csv > kitchensink_ml_details_results_Volume.csv"
85 | ]
86 | },
87 | {
88 | "cell_type": "markdown",
89 | "metadata": {},
90 | "source": [
91 | "Then we load only these fields:"
92 | ]
93 | },
94 | {
95 | "cell_type": "code",
96 | "execution_count": 3,
97 | "metadata": {},
98 | "outputs": [],
99 | "source": [
100 | "import pandas as pd\n",
101 | "predictions = pd.read_csv(\"kitchensink_ml_details_results_Prediction.csv\", parse_dates=[\"Date\"], index_col=[\"Field\",\"Date\"]).loc[\"Prediction\"]\n",
102 | "closes = pd.read_csv(\"kitchensink_ml_details_results_Close.csv\", parse_dates=[\"Date\"], index_col=[\"Field\",\"Date\"]).loc[\"Close\"]\n",
103 | "volumes = pd.read_csv(\"kitchensink_ml_details_results_Volume.csv\", parse_dates=[\"Date\"], index_col=[\"Field\",\"Date\"]).loc[\"Volume\"]"
104 | ]
105 | },
106 | {
107 | "cell_type": "markdown",
108 | "metadata": {},
109 | "source": [
110 | "Let's split our predictions into 5 bins and compare one-week forward returns for each bin:"
111 | ]
112 | },
113 | {
114 | "cell_type": "code",
115 | "execution_count": 4,
116 | "metadata": {},
117 | "outputs": [
118 | {
119 | "data": {
120 | "text/plain": [
121 | ""
122 | ]
123 | },
124 | "execution_count": 4,
125 | "metadata": {},
126 | "output_type": "execute_result"
127 | },
128 | {
129 | "data": {
130 |       "image/png": "<base64-encoded PNG omitted: bar chart of mean one-week forward returns for each of the 5 prediction bins>",
131 | "text/plain": [
132 | ""
133 | ]
134 | },
135 | "metadata": {},
136 | "output_type": "display_data"
137 | }
138 | ],
139 | "source": [
140 | "# Calculate one week returns\n",
141 | "one_week_returns = (closes - closes.shift(5)) / closes.shift(5)\n",
142 | "\n",
143 | "# Shift one week returns back to time of prediction, and stack returns and predictions \n",
144 | "one_week_forward_returns_stacked = one_week_returns.shift(-5).stack(dropna=False)\n",
145 | "predictions_stacked = predictions.stack(dropna=False)\n",
146 | "\n",
147 | "# Bin predictions into 5 equal-size bins\n",
148 | "prediction_bins = pd.qcut(predictions_stacked, 5)\n",
149 | "\n",
150 | "# Plot returns by bin\n",
151 | "one_week_forward_returns_stacked.groupby(prediction_bins).mean().plot(kind=\"bar\")"
152 | ]
153 | },
154 | {
155 | "cell_type": "markdown",
156 | "metadata": {},
157 | "source": [
158 | "The stocks with the highest predicted return indeed have the highest forward return, while the stocks with the lowest predicted return have a lower but still positive forward return. This suggests it is worth digging deeper into the strategy to understand why, despite this favorable forward return profile, the backtest performance is poor; one possible check is sketched in the cell below."
159 | ]
160 | },
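{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch added for illustration (not part of the original analysis): the strategy only\n",
"# trades the 10 best and 10 worst predictions each day, so check whether the edge seen\n",
"# in the quintiles also shows up in those extreme names (best_10, worst_10 are new names)\n",
"best_10 = predictions.rank(axis=1, ascending=False) <= 10\n",
"worst_10 = predictions.rank(axis=1, ascending=True) <= 10\n",
"one_week_forward_returns = one_week_returns.shift(-5)\n",
"print(\"mean forward return, 10 best predictions:\", one_week_forward_returns.where(best_10).stack().mean())\n",
"print(\"mean forward return, 10 worst predictions:\", one_week_forward_returns.where(worst_10).stack().mean())"
]
},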
161 | {
162 | "cell_type": "markdown",
163 | "metadata": {},
164 | "source": [
165 | "***\n",
166 | "\n",
167 | "[Back to Introduction](Introduction.ipynb)"
168 | ]
169 | }
170 | ],
171 | "metadata": {
172 | "kernelspec": {
173 | "display_name": "Python 3.9",
174 | "language": "python",
175 | "name": "python3"
176 | },
177 | "language_info": {
178 | "codemirror_mode": {
179 | "name": "ipython",
180 | "version": 3
181 | },
182 | "file_extension": ".py",
183 | "mimetype": "text/x-python",
184 | "name": "python",
185 | "nbconvert_exporter": "python",
186 | "pygments_lexer": "ipython3",
187 | "version": "3.9.7"
188 | }
189 | },
190 | "nbformat": 4,
191 | "nbformat_minor": 4
192 | }
193 |
--------------------------------------------------------------------------------
/kitchensink_ml/kitchensink_ml.py:
--------------------------------------------------------------------------------
1 | # Copyright 2019 QuantRocket LLC - All Rights Reserved
2 | #
3 | # Licensed under the Apache License, Version 2.0 (the "License");
4 | # you may not use this file except in compliance with the License.
5 | # You may obtain a copy of the License at
6 | #
7 | # http://www.apache.org/licenses/LICENSE-2.0
8 | #
9 | # Unless required by applicable law or agreed to in writing, software
10 | # distributed under the License is distributed on an "AS IS" BASIS,
11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12 | # See the License for the specific language governing permissions and
13 | # limitations under the License.
14 |
15 | import pandas as pd
16 | from moonshot import MoonshotML
17 | from moonshot.commission import PerShareCommission
18 | from quantrocket.fundamental import get_sharadar_fundamentals_reindexed_like
19 | from quantrocket import get_prices
20 | from quantrocket.master import get_securities_reindexed_like
21 |
22 | class USStockCommission(PerShareCommission):
23 | BROKER_COMMISSION_PER_SHARE = 0.005
24 |
25 | class TheKitchenSinkML(MoonshotML):
26 |
27 | CODE = "kitchensink-ml"
28 | DB = "sharadar-us-stk-1d"
29 | DB_FIELDS = ["Close", "Volume"]
30 | BENCHMARK_DB = "market-1d"
31 | SPY_SID = "FIBBG000BDTBL9"
32 | VIX_SID = "IB13455763"
33 | TRIN_SID = "IB26718743"
34 | BENCHMARK = SPY_SID
35 | DOLLAR_VOLUME_TOP_N_PCT = 60
36 | DOLLAR_VOLUME_WINDOW = 90
37 | MODEL = None
38 | LOOKBACK_WINDOW = 252
39 | COMMISSION_CLASS = USStockCommission
40 |
41 | def prices_to_features(self, prices: pd.DataFrame):
42 |
43 | closes = prices.loc["Close"]
44 |
45 | features = {}
46 |
47 | print("adding fundamental features")
48 | self.add_fundamental_features(prices, features)
49 | print("adding quality features")
50 | self.add_quality_features(prices, features)
51 |
52 | print("adding price and volume features")
53 | self.add_price_and_volume_features(prices, features)
54 |         print("adding technical indicator features")
55 | self.add_technical_indicator_features(prices, features)
56 | print("adding securities master features")
57 | self.add_securities_master_features(prices, features)
58 | print("adding market features")
59 | self.add_market_features(prices, features)
60 |
61 | # Target to predict: next week return
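        # The denominator is masked where prices are non-positive to avoid division errors,
        # and shift(-5) pairs each day's features with the return realized over the following 5 sessions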
62 | one_week_returns = (closes - closes.shift(5)) / closes.shift(5).where(closes.shift(5) > 0)
63 | targets = one_week_returns.shift(-5)
64 |
65 | return features, targets
66 |
67 | def add_fundamental_features(self, prices: pd.DataFrame, features: dict[str, pd.DataFrame]):
68 | """
69 | Fundamental features:
70 |
71 | - Enterprise multiple
72 | - various quarterly values and ratios
73 | - various trailing-twelve month values and ratios
74 | """
75 |
76 | closes = prices.loc["Close"]
77 |
78 | # enterprise multiple
79 | fundamentals = get_sharadar_fundamentals_reindexed_like(
80 | closes,
81 | fields=["EVEBIT", "EBIT"],
82 | dimension="ART")
83 | enterprise_multiples = fundamentals.loc["EVEBIT"]
84 | ebits = fundamentals.loc["EBIT"]
85 | # Ignore negative earnings
86 | enterprise_multiples = enterprise_multiples.where(ebits > 0)
87 | features["enterprise_multiples_ranks"] = enterprise_multiples.rank(axis=1, pct=True).fillna(0.5)
88 |
89 | # Query quarterly fundamentals
90 | fundamentals = get_sharadar_fundamentals_reindexed_like(
91 | closes,
92 | dimension="ARQ", # As-reported quarterly reports
93 | fields=[
94 | "CURRENTRATIO", # Current ratio
95 | "DE", # Debt to Equity Ratio
96 | "PB", # Price to Book Value
97 | "TBVPS", # Tangible Asset Book Value per Share
98 | "MARKETCAP",
99 | ])
100 |
101 | for field in fundamentals.index.get_level_values("Field").unique():
102 | features["{}_ranks".format(field)] = fundamentals.loc[field].rank(axis=1, pct=True).fillna(0.5)
103 |
104 | # Query trailing-twelve-month fundamentals
105 | fundamentals = get_sharadar_fundamentals_reindexed_like(
106 | closes,
107 | dimension="ART", # As-reported trailing-twelve-month reports
108 | fields=[
109 | "ASSETTURNOVER", # Asset Turnover
110 | "EBITDAMARGIN", # EBITDA Margin
111 | "EQUITYAVG", # Average Equity
112 | "GROSSMARGIN", # Gross Margin
113 | "NETMARGIN", # Profit Margin
114 | "PAYOUTRATIO", # Payout Ratio
115 | "PE", # Price Earnings Damodaran Method
116 | "PE1", # Price to Earnings Ratio
117 | "PS", # Price Sales (Damodaran Method)
118 | "PS1", # Price to Sales Ratio
119 | "ROA", # Return on Average Assets
120 | "ROE", # Return on Average Equity
121 | "ROS", # Return on Sales
122 | ])
123 |
124 | for field in fundamentals.index.get_level_values("Field").unique():
125 | features["{}_ranks".format(field)] = fundamentals.loc[field].rank(axis=1, pct=True).fillna(0.5)
126 |
127 | def add_quality_features(self, prices: pd.DataFrame, features: dict[str, pd.DataFrame]):
128 | """
129 | Adds quality features, based on the Piotroski F-score.
130 | """
131 | closes = prices.loc["Close"]
132 |
133 | # Step 1: query relevant indicators
134 | fundamentals = get_sharadar_fundamentals_reindexed_like(
135 | closes,
136 | dimension="ART", # As-reported TTM reports
137 | fields=[
138 | "ROA", # Return on assets
139 | "ASSETS", # Total Assets
140 | "NCFO", # Net Cash Flow from Operations
141 | "DE", # Debt to Equity Ratio
142 | "CURRENTRATIO", # Current ratio
143 | "SHARESWA", # Outstanding shares
144 | "GROSSMARGIN", # Gross margin
145 | "ASSETTURNOVER", # Asset turnover
146 | ])
147 | return_on_assets = fundamentals.loc["ROA"]
148 | total_assets = fundamentals.loc["ASSETS"]
149 | operating_cash_flows = fundamentals.loc["NCFO"]
150 | leverages = fundamentals.loc["DE"]
151 | current_ratios = fundamentals.loc["CURRENTRATIO"]
152 | shares_out = fundamentals.loc["SHARESWA"]
153 | gross_margins = fundamentals.loc["GROSSMARGIN"]
154 | asset_turnovers = fundamentals.loc["ASSETTURNOVER"]
155 |
156 | # Step 2: many Piotroski F-score components compare current to previous
157 | # values, so get DataFrames of previous values
158 |
159 | # Step 2.a: get a boolean mask of the first day of each newly reported fiscal
160 | # period
161 | fundamentals = get_sharadar_fundamentals_reindexed_like(
162 | closes,
163 | dimension="ARQ", # As-reported quarterly reports
164 | fields=[
165 | "REPORTPERIOD"
166 | ])
167 | fiscal_periods = fundamentals.loc["REPORTPERIOD"]
168 | are_new_fiscal_periods = fiscal_periods != fiscal_periods.shift()
169 |
170 | periods_ago = 4
171 |
172 | # this function will be applied sid by sid and returns a Series of
173 | # earlier fundamentals
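        # (e.g. with periods_ago=4, each date is assigned the value reported four fiscal
        # quarters earlier, forward-filled between report dates)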
174 | def n_periods_ago(fundamentals_for_sid):
175 | sid = fundamentals_for_sid.name
176 | # remove all rows except for new fiscal periods
177 | new_period_fundamentals = fundamentals_for_sid.where(are_new_fiscal_periods[sid]).dropna()
178 | # Shift the desired number of periods
179 | earlier_fundamentals = new_period_fundamentals.shift(periods_ago)
180 | # Reindex and forward-fill to restore original shape
181 | earlier_fundamentals = earlier_fundamentals.reindex(fundamentals_for_sid.index, method="ffill")
182 | return earlier_fundamentals
183 |
184 | previous_return_on_assets = return_on_assets.apply(n_periods_ago)
185 | previous_leverages = leverages.apply(n_periods_ago)
186 | previous_current_ratios = current_ratios.apply(n_periods_ago)
187 | previous_shares_out = shares_out.apply(n_periods_ago)
188 | previous_gross_margins = gross_margins.apply(n_periods_ago)
189 | previous_asset_turnovers = asset_turnovers.apply(n_periods_ago)
190 |
191 | # Step 3: calculate F-Score components; each resulting component is a DataFrame
192 | # of booleans
193 | have_positive_return_on_assets = return_on_assets > 0
194 | have_positive_operating_cash_flows = operating_cash_flows > 0
195 | have_increasing_return_on_assets = return_on_assets > previous_return_on_assets
196 | total_assets = total_assets.where(total_assets > 0) # avoid DivisionByZero errors
197 | have_more_cash_flow_than_incomes = operating_cash_flows / total_assets > return_on_assets
198 | have_decreasing_leverages = leverages < previous_leverages
199 | have_increasing_current_ratios = current_ratios > previous_current_ratios
200 | have_no_new_shares = shares_out <= previous_shares_out
201 | have_increasing_gross_margins = gross_margins > previous_gross_margins
202 | have_increasing_asset_turnovers = asset_turnovers > previous_asset_turnovers
203 |
204 | # Save each boolean F score component as a feature
205 | features["have_positive_return_on_assets"] = have_positive_return_on_assets.astype(int)
206 | features["have_positive_operating_cash_flows"] = have_positive_operating_cash_flows.astype(int)
207 | features["have_increasing_return_on_assets"] = have_increasing_return_on_assets.astype(int)
208 | features["have_more_cash_flow_than_incomes"] = have_more_cash_flow_than_incomes.astype(int)
209 | features["have_decreasing_leverages"] = have_decreasing_leverages.astype(int)
210 | features["have_increasing_current_ratios"] = have_increasing_current_ratios.astype(int)
211 | features["have_no_new_shares"] = have_no_new_shares.astype(int)
212 | features["have_increasing_gross_margins"] = have_increasing_gross_margins.astype(int)
213 | features["have_increasing_asset_turnovers"] = have_increasing_asset_turnovers.astype(int)
214 |
215 |         # Sum the components to get the F-Score and save the ranks as a feature
216 | f_scores = (
217 | have_positive_return_on_assets.astype(int)
218 | + have_positive_operating_cash_flows.astype(int)
219 | + have_increasing_return_on_assets.astype(int)
220 | + have_more_cash_flow_than_incomes.astype(int)
221 | + have_decreasing_leverages.astype(int)
222 | + have_increasing_current_ratios.astype(int)
223 | + have_no_new_shares.astype(int)
224 | + have_increasing_gross_margins.astype(int)
225 | + have_increasing_asset_turnovers.astype(int)
226 | )
227 | features["f_score_ranks"] = f_scores.rank(axis=1, pct=True).fillna(0.5)
228 |
229 | def add_price_and_volume_features(self, prices: pd.DataFrame, features: dict[str, pd.DataFrame]):
230 | """
231 | Price and volume features, or features derived from price and volume:
232 |
233 | - return ranks
234 | - price level
235 | - dollar volume rank
236 | - volatility ranks
237 | - volatility spikes
238 | - volume spikes
239 | """
240 | closes = prices.loc["Close"]
241 |
242 | # yearly, monthly, weekly, 2-day, daily returns ranks
243 | one_year_returns = (closes.shift(22) - closes.shift(252)) / closes.shift(252) # exclude most recent month, per classic momentum
244 | one_month_returns = (closes - closes.shift(22)) / closes.shift(22)
245 | one_week_returns = (closes - closes.shift(5)) / closes.shift(5)
246 | two_day_returns = (closes - closes.shift(2)) / closes.shift(2)
247 | one_day_returns = closes.pct_change()
248 | features["1yr_returns_ranks"] = one_year_returns.rank(axis=1, pct=True).fillna(0.5)
249 | features["1mo_returns_ranks"] = one_month_returns.rank(axis=1, pct=True).fillna(0.5)
250 | features["1wk_returns_ranks"] = one_week_returns.rank(axis=1, pct=True).fillna(0.5)
251 | features["2d_returns_ranks"] = two_day_returns.rank(axis=1, pct=True).fillna(0.5)
252 | features["1d_returns_ranks"] = one_day_returns.rank(axis=1, pct=True).fillna(0.5)
253 |
254 | # whether returns were positive
255 | features["last_1year_was_positive"] = (one_year_returns > 0).astype(int)
256 | features["last_1month_was_positive"] = (one_month_returns > 0).astype(int)
257 | features["last_1week_was_positive"] = (one_week_returns > 0).astype(int)
258 | features["last_2day_was_positive"] = (two_day_returns > 0).astype(int)
259 | features["last_1day_was_positive"] = (one_day_returns > 0).astype(int)
260 |
261 | # price level
262 |         features["price_below_10"] = (closes < 10).astype(int)
263 |         features["price_below_2"] = (closes < 2).astype(int)
264 |
265 | # dollar volume ranks
266 | volumes = prices.loc["Volume"]
267 | avg_dollar_volumes = (closes * volumes).rolling(63).mean()
268 | features["dollar_volume_ranks"] = avg_dollar_volumes.rank(axis=1, ascending=True, pct=True).fillna(0.5)
269 |
270 | # quarterly volatility ranks
271 | quarterly_stds = closes.pct_change().rolling(window=63).std()
272 |         features["quarterly_std_ranks"] = quarterly_stds.rank(axis=1, pct=True).fillna(0.5)
273 |
274 | # volatility spikes
275 | volatility_1d_vs_quarter = closes.pct_change().abs() / quarterly_stds.where(quarterly_stds > 0)
276 | features["2std_volatility_spike"] = (volatility_1d_vs_quarter >= 2).astype(int)
277 | features["volatility_spike_ranks"] = volatility_1d_vs_quarter.rank(axis=1, pct=True).fillna(0.5)
278 |
279 | # volume spike
280 | avg_volumes = volumes.rolling(window=63).mean()
281 | volume_1d_vs_quarter = volumes / avg_volumes.where(avg_volumes > 0)
282 | features["2x_volume_spike"] = (volume_1d_vs_quarter >= 2).astype(int)
283 | features["volume_spike_ranks"] = volume_1d_vs_quarter.rank(axis=1, pct=True).fillna(0.5)
284 |
285 | def add_technical_indicator_features(self, prices: pd.DataFrame, features: dict[str, pd.DataFrame]):
286 | """
287 | Various technical indicators:
288 |
289 | - Bollinger bands
290 | - RSI
291 | - Stochastic oscillator
292 | - Money Flow Index
293 | """
294 | closes = prices.loc["Close"]
295 |
296 | # relative position within Bollinger Bands (0 = at or below lower band, 1 = at or above upper band)
297 | mavgs = closes.rolling(20).mean()
298 | stds = closes.rolling(20).std()
299 | upper_bands = mavgs + (stds * 2)
300 | lower_bands = mavgs - (stds * 2)
301 | # Winsorize at upper and lower bands
302 | winsorized_closes = closes.where(closes > lower_bands, lower_bands).where(closes < upper_bands, upper_bands)
303 | features["close_vs_bbands"] = (winsorized_closes - lower_bands) / (upper_bands - lower_bands)
304 |
305 | # RSI (0-1)
306 | returns = closes.diff()
307 | avg_gains = returns.where(returns > 0).rolling(window=14, min_periods=1).mean()
308 | avg_losses = returns.where(returns < 0).abs().rolling(window=14, min_periods=1).mean()
309 | relative_strengths = avg_gains / avg_losses.where(avg_losses != 0)
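        # 1 - 1/(1 + RS) equals RS/(1 + RS), i.e. the classic RSI formula rescaled from 0-100 to 0-1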
310 | features["RSI"] = 1 - (1 / (1 + relative_strengths.fillna(0.5)))
311 |
312 | # Stochastic oscillator (0-1)
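        # only Close prices are collected (see DB_FIELDS), so rolling closes stand in for true highs and lows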
313 | highest_highs = closes.rolling(window=14).max()
314 | lowest_lows = closes.rolling(window=14).min()
315 | features["stochastic"] = (closes - lowest_lows) / (highest_highs - lowest_lows)
316 |
317 | # Money flow (similar to RSI but volume-weighted) (0-1)
318 | money_flows = closes * prices.loc["Volume"]
319 | positive_money_flows = money_flows.where(returns > 0).rolling(window=14, min_periods=1).sum()
320 | negative_money_flows = money_flows.where(returns < 0).rolling(window=14, min_periods=1).sum()
321 | money_flow_ratios = positive_money_flows / negative_money_flows.where(negative_money_flows > 0)
322 | features["money_flow"] = 1 - (1 / (1 + money_flow_ratios.fillna(0.5)))
323 |
324 | def add_securities_master_features(self, prices: pd.DataFrame, features: dict[str, pd.DataFrame]):
325 | """
326 | Features from the securities master:
327 |
328 | - ADR?
329 | - sector
330 | """
331 | closes = prices.loc["Close"]
332 |
333 | securities = get_securities_reindexed_like(closes, fields=["sharadar_Category", "sharadar_Sector"])
334 |
335 | # Is it an ADR?
336 | categories = securities.loc["sharadar_Category"]
337 | unique_categories = categories.iloc[0].fillna('').unique()
338 | # this dataset includes several ADR classifications, all of which start with "ADR "
339 | features["are_adrs"] = categories.isin([cat for cat in unique_categories if cat.startswith("ADR ")]).astype(int)
340 |
341 | # Which sector? (sectors must be one-hot encoded - see usage guide for more)
342 | sectors = securities.loc["sharadar_Sector"]
343 | for sector in sectors.stack().unique():
344 | features["sector_{}".format(sector)] = (sectors == sector).astype(int)
345 |
346 | def add_market_features(self, prices: pd.DataFrame, features: dict[str, pd.DataFrame]):
347 | """
348 | Market price, volatility, and breadth, some of which are queried from a
349 | database and some of which are calculated from the Sharadar data:
350 |
351 | - whether S&P 500 is above or below its 200-day moving average
352 | - where VIX falls within the range of 12 - 30
353 | - where 10-day NYSE TRIN falls within the range of 0.5 to 2
354 | - McClellan oscillator
355 | - Hindenburg Omen
356 | """
357 | closes = prices.loc["Close"]
358 |
359 | # Get prices for SPY, VIX, TRIN-NYSE
360 | market_prices = get_prices(self.BENCHMARK_DB,
361 | fields="Close",
362 | start_date=closes.index.min(),
363 | end_date=closes.index.max())
364 | market_closes = market_prices.loc["Close"]
365 |
366 | # Is S&P above its 200-day?
367 | spy_closes = market_closes[self.SPY_SID]
368 | spy_200d_mavg = spy_closes.rolling(200).mean()
369 | spy_above_200d = (spy_closes > spy_200d_mavg).astype(int)
370 | # Must reindex like closes in case indexes differ
371 | spy_above_200d = spy_above_200d.reindex(closes.index, method="ffill")
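        # apply() broadcasts this market-wide Series to every column (sid) of closes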
372 | features["spy_above_200d"] = closes.apply(lambda x: spy_above_200d)
373 |
374 | # VIX and TRIN don't go back as far as Sharadar data, so we may need a filler DataFrame
375 | fillers = pd.DataFrame(0.5, index=closes.index, columns=closes.columns)
376 |
377 | # Where does VIX fall within the range of 12-30?
378 | try:
379 | vix = market_closes[self.VIX_SID]
380 | except KeyError:
381 | features["vix"] = fillers
382 | else:
383 | vix_high = 30
384 | vix_low = 12
385 | # Winsorize VIX
386 | vix = vix.where(vix > vix_low, vix_low).where(vix < vix_high, vix_high)
387 | vix_as_pct = (vix - vix_low) / (vix_high - vix_low)
388 | vix_as_pct = vix_as_pct.reindex(closes.index, method="ffill")
389 | features["vix"] = closes.apply(lambda x: vix_as_pct).fillna(0.5)
390 |
391 | # Where does NYSE TRIN fall within the range of 0.5-2?
392 | try:
393 | trin = market_closes[self.TRIN_SID]
394 | except KeyError:
395 | features["trin"] = fillers
396 | else:
397 | trin = trin.rolling(window=10).mean()
398 | trin_high = 2
399 | trin_low = 0.5
400 | # Winsorize TRIN
401 | trin = trin.where(trin > trin_low, trin_low).where(trin < trin_high, trin_high)
402 | trin_as_pct = (trin - trin_low) / (trin_high - trin_low)
403 | trin_as_pct = trin_as_pct.reindex(closes.index, method="ffill")
404 | features["trin"] = closes.apply(lambda x: trin_as_pct).fillna(0.5)
405 |
406 | # McClellan oscillator
407 | total_issues = closes.count(axis=1)
408 | returns = closes.pct_change()
409 | advances = returns.where(returns > 0).count(axis=1)
410 | declines = returns.where(returns < 0).count(axis=1)
411 | net_advances = advances - declines
412 | pct_net_advances = net_advances / total_issues
413 | ema_19 = pct_net_advances.ewm(span=19).mean()
414 | ema_39 = pct_net_advances.ewm(span=39).mean()
415 | mcclellan_oscillator = (ema_19 - ema_39) * 10
416 | # Winsorize at 50 and -50
417 | mcclellan_oscillator = mcclellan_oscillator.where(mcclellan_oscillator < 50, 50).where(mcclellan_oscillator > -50, -50)
418 | features["mcclellan_oscillator"] = closes.apply(lambda x: mcclellan_oscillator).fillna(0)
419 |
420 | # Hindenburg omen (and new 52-week highs/lows)
421 | one_year_highs = closes.rolling(window=252).max()
422 | one_year_lows = closes.rolling(window=252).min()
423 | new_one_year_highs = (closes > one_year_highs.shift()).astype(int)
424 | new_one_year_lows = (closes < one_year_lows.shift()).astype(int)
425 | features["new_one_year_highs"] = new_one_year_highs
426 | features["new_one_year_lows"] = new_one_year_lows
427 | pct_one_year_highs = new_one_year_highs.sum(axis=1) / total_issues
428 | pct_one_year_lows = new_one_year_lows.sum(axis=1) / total_issues
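        # Omen condition: more than 2.8% of issues at new 52-week highs AND more than 2.8% at
        # new lows, while SPY is above its level of 50 trading days ago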
429 | hindenburg_omens = (pct_one_year_highs > 0.028) & (pct_one_year_lows > 0.028) & (spy_closes > spy_closes.shift(50))
430 | # Omen lasts for 30 days
431 | hindenburg_omens = hindenburg_omens.where(hindenburg_omens).fillna(method="ffill", limit=30).fillna(False).astype(int)
432 | hindenburg_omens = hindenburg_omens.reindex(closes.index, method="ffill")
433 | features["hindenburg_omens"] = closes.apply(lambda x: hindenburg_omens)
434 |
435 | def predictions_to_signals(self, predictions: pd.DataFrame, prices: pd.DataFrame):
436 | closes = prices.loc["Close"]
437 | volumes = prices.loc["Volume"]
438 | avg_dollar_volumes = (closes * volumes).rolling(self.DOLLAR_VOLUME_WINDOW).mean()
439 | dollar_volume_ranks = avg_dollar_volumes.rank(axis=1, ascending=False, pct=True)
440 | have_adequate_dollar_volumes = dollar_volume_ranks <= (self.DOLLAR_VOLUME_TOP_N_PCT/100)
441 |
442 | # Save the predictions and prices so we can analyze them
443 | self.save_to_results("Prediction", predictions)
444 | self.save_to_results("Close", closes)
445 | self.save_to_results("Volume", volumes)
446 |
447 | # Buy (sell) stocks with best (worst) predicted return
448 | have_best_predictions = predictions.where(have_adequate_dollar_volumes).rank(ascending=False, axis=1) <= 10
449 | have_worst_predictions = predictions.where(have_adequate_dollar_volumes).rank(ascending=True, axis=1) <= 10
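        # Combine into a single signal DataFrame: +1 for the 10 best predictions, -1 for the 10 worst, 0 otherwise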
450 | signals = have_best_predictions.astype(int).where(have_best_predictions, -have_worst_predictions.astype(int).where(have_worst_predictions, 0))
451 | return signals
452 |
453 | def signals_to_target_weights(self, signals: pd.DataFrame, prices: pd.DataFrame):
454 | # Allocate equal weights
455 | daily_signal_counts = signals.abs().sum(axis=1)
456 | weights = signals.div(daily_signal_counts, axis=0).fillna(0)
457 |
458 | # Rebalance weekly
459 | # Resample daily to weekly, taking the first day's signal
460 | # For pandas offset aliases, see https://pandas.pydata.org/pandas-docs/stable/timeseries.html#offset-aliases
461 | weights = weights.resample("W").first()
462 | # Reindex back to daily and fill forward
463 | weights = weights.reindex(prices.loc["Close"].index, method="ffill")
464 |
465 | return weights
466 |
467 | def target_weights_to_positions(self, weights: pd.DataFrame, prices: pd.DataFrame):
468 | # Enter the position the day after the signal
469 | return weights.shift()
470 |
471 | def positions_to_gross_returns(self, positions: pd.DataFrame, prices: pd.DataFrame):
472 |
473 | closes = prices.loc["Close"]
474 | gross_returns = closes.pct_change() * positions.shift()
475 | return gross_returns
476 |
--------------------------------------------------------------------------------