├── .gitignore
├── .pre-commit-config.yaml
├── LICENSE
├── README.md
├── causalitydata
│   ├── __init__.py
│   ├── analysis
│   │   ├── __init__.py
│   │   ├── backtest.py
│   │   ├── compounding.py
│   │   └── evaluate_pnl.py
│   ├── data
│   │   ├── __init__.py
│   │   └── dataloader.py
│   └── notebook
│       └── 01-Backtesting-Signals.ipynb
├── images
│   └── logo.png
├── poetry.lock
├── pyproject.toml
├── scripts
│   ├── install_ipykernel.py
│   └── uninstall_ipykernel.py
└── tests
    └── __init__.py
/.gitignore:
--------------------------------------------------------------------------------
1 | **/__pycache__
2 | .vscode
3 | .idea
4 |
--------------------------------------------------------------------------------
/.pre-commit-config.yaml:
--------------------------------------------------------------------------------
1 | repos:
2 |   - repo: https://github.com/pre-commit/pre-commit-hooks
3 |     rev: v3.2.0
4 |     hooks:
5 |       - id: check-toml
6 |       - id: check-yaml
7 |       - id: end-of-file-fixer
8 |       - id: trailing-whitespace
9 |       - id: check-added-large-files
10 |         args: ['--maxkb=1000']
11 |   - repo: https://github.com/python-poetry/poetry
12 |     rev: '1.7.1'
13 |     hooks:
14 |       - id: poetry-check
15 |   - repo: https://github.com/psf/black
16 |     rev: '24.2.0'
17 |     hooks:
18 |       - id: black
19 |       - id: black-jupyter
20 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | BSD 3-Clause License
2 |
3 | Copyright (c) 2024, Causality SL
4 |
5 | Redistribution and use in source and binary forms, with or without
6 | modification, are permitted provided that the following conditions are met:
7 |
8 | 1. Redistributions of source code must retain the above copyright notice, this
9 | list of conditions and the following disclaimer.
10 |
11 | 2. Redistributions in binary form must reproduce the above copyright notice,
12 | this list of conditions and the following disclaimer in the documentation
13 | and/or other materials provided with the distribution.
14 |
15 | 3. Neither the name of the copyright holder nor the names of its
16 | contributors may be used to endorse or promote products derived from
17 | this software without specific prior written permission.
18 |
19 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
20 | AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
21 | IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
22 | DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
23 | FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
24 | DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
25 | SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
26 | CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
27 | OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
28 | OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
29 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
17 | [![Contributors][contributors-shield]][contributors-url]
18 | [![Forks][forks-shield]][forks-url]
19 | [![Stargazers][stars-shield]][stars-url]
20 | [![Issues][issues-shield]][issues-url]
21 | [![BSD3 License][license-shield]][license-url]
22 | [![LinkedIn][linkedin-shield]][linkedin-url]
![Causality Group logo](images/logo.png)

# Causality Benchmark Data

36 | Showcasing Causality Group's benchmark data through a data loading library and a signal backtesting example.

[Report Bug](https://github.com/causality-group/causality-benchmark-data/issues) · [Request Feature](https://github.com/causality-group/causality-benchmark-data/issues)

## Table of Contents

- [About the Project](#about-the-project)
- [Getting Started](#getting-started)
- [Backtesting and Data Layout](#backtesting-and-data-layout)
- [License](#license)
- [Contact](#contact)

87 | ## About the Project
88 |
89 | Have you ever found yourself struggling to prepare clean financial data for analysis or to align data from various sources?
90 |
91 | With this repository, you can explore Causality Group's curated historical dataset for academic and non-commercial use, covering the 1500 most liquid stocks in the US equities markets.
92 |
93 | Features include:
94 | * Liquid universe of 1500 stocks, updated monthly
95 | * Free from survivorship bias
96 | * Daily Open, High, Low, Close, VWAP, and Volume
97 | * Overnight returns adjusted for splits, dividends, mergers, and acquisitions
98 | * Intraday 5-minute VWAP, spread, and volume snapshots
99 | * SPY ETF data for hedging
100 | * CAPM betas and residuals for market-neutral analysis
101 |
102 | Please contact us on [LinkedIn](https://www.linkedin.com/in/markhorvath-ai) for access to the dataset!
103 |
104 | More details [here](#backtesting-and-data-layout).
105 |
106 | (back to top)
107 |
108 |
109 |
110 | ### Built With
111 |
112 | * [![Python][Python.org]][Python-url]
113 | * [![Poetry][Poetry.org]][Poetry-url]
114 | * [![Jupyter][Jupyter.org]][Jupyter-url]
115 | * [![Matplotlib][Matplotlib.org]][Matplotlib-url]
116 | * [![Numpy][Numpy.org]][Numpy-url]
117 | * [![Pandas][Pandas.org]][Pandas-url]
118 | * [![Sklearn][Sklearn.org]][Sklearn-url]
119 | * [![Scipy][Scipy.org]][Scipy-url]
120 |
121 | (back to top)
122 |
123 |
124 |
125 |
126 | ## Getting Started
127 |
128 | Follow these steps to set up the project on your local machine for development and testing purposes.
129 |
130 | ### Prerequisites
131 |
132 | Ensure you have the following installed on your local setup:
133 | - Python 3.9.5
134 | - Poetry (see [installation instructions](https://python-poetry.org/docs/#installation))
135 |
136 | ### Installation
137 |
138 | 1. Clone the repository.
139 | 2. Install the dependencies:
140 | ```bash
141 | poetry install
142 | ```
143 | > **Optional:** If you want to use the Jupyter kernel, install the optional `jupyter` group of dependencies with `poetry install --with jupyter`.
144 |
145 | 3. Install the pre-commit hooks:
146 | ```bash
147 | poetry run pre-commit install
148 | ```
149 |
150 | You're all set! Pre-commit hooks will run on git commit (more information in [pre-commit docs](https://pre-commit.com/index.html)). Ensure your changes pass all checks before pushing.
151 |
152 | ### Available Scripts
153 | - `poetry run black ./causalitydata`: Runs the code formatter.
154 | - `poetry run pylint ./causalitydata`: Runs the linter.
155 | - `poetry run install-ipykernel`: Installs the causality kernel for Jupyter.
156 | - `poetry run uninstall-ipykernel`: Uninstalls the causality kernel for Jupyter.
157 |
158 | > **Note:** To run the ipykernel scripts you need to install the optional `jupyter` group of dependencies. Use `poetry install --with jupyter`.
159 |
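For example, to install the optional group and make the `causalitydata-p3.9` kernel (the name used by the install script) available in Jupyter:
```bash
poetry install --with jupyter
poetry run install-ipykernel
```
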
160 | (back to top)
161 |
162 |
163 |
164 |
165 | ## Backtesting and Data Layout
166 |
167 | ### Backtesting
168 |
169 | [01-Backtesting-Signals.ipynb](https://github.com/causality-group/causality-benchmark-data/blob/main/causalitydata/notebook/01-Backtesting-Signals.ipynb) serves as a minimal example of utilizing the dataset and library for quantitative analysis, alpha signal research, and backtesting.
170 |
171 | The example showcases a daily backtest, relying on close-to-close adjusted returns of the 1500 most liquid companies in the US since 2007. Since the most liquid companies change constantly, we update our liquid universe at the start of each month. This dynamic universe is already pre-calculated in the `universe.csv` data file.
172 |
173 | Assuming trading at the 16:00 close auction in the US, our example only uses features for alpha creation that are observable by 15:45. We plot the performance of some well-known alpha factors and invite you to experiment with building your quantitative investment model from there!
174 |
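The rough sketch below (it is not an excerpt from the notebook) shows how the library's loader, backtest, and evaluation helpers fit together. The data root path, the `ret_cc` field name, and the `1.0` universe flag are assumptions based on the file naming and layout described in the next section:

```python
from causalitydata.data.dataloader import load_field_df
from causalitydata.analysis.backtest import signalbacktest_df
from causalitydata.analysis.evaluate_pnl import calculate_performance_df, plot_pnl

DATA_ROOT = "/path/to/causality-benchmark-data"  # hypothetical dataset location

# Monthly-rebalanced universe mask.
universe_df = load_field_df("universe", DATA_ROOT)

# Toy reversal signal: the last fully observable close-to-close return,
# lagged by one day (shift=1) so it is known before the 16:00 auction,
# and masked to names flagged in the universe (assumed to be 1.0).
signal_df = -load_field_df("ret_cc", DATA_ROOT, shift=1)[universe_df == 1.0]

# Return earned by entering at today's close and exiting at tomorrow's close:
# ret_cc rows end on the index date, so shift=-1 aligns them to the entry date.
upcoming_ret_df = load_field_df("ret_cc", DATA_ROOT, shift=-1)

# $0.5 long / $0.5 short portfolio entered two standard deviations from the mean.
pnl_df = signalbacktest_df(signal_df, upcoming_ret_df).sum(axis=1).to_frame("reversal")

plot_pnl(pnl_df)
print(calculate_performance_df(pnl_df))
```
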
175 | ### Data Layout
176 |
177 | All data files in the benchmark dataset have the same structure:
178 |
179 | * Data files are in `.csv` format.
180 | * The first row contains the header.
181 | * Rows represent different dates in increasing order. There is only one row per date, i.e., there is no intraday granularity within the files.
182 | * The first column is the index and contains the date at which the given value is observable:
183 |   * Date format: `YYYY-MM-DD`.
184 | * Every other column represents an individual asset in the universe:
185 |   * Asset identifier format: `<ticker>_<exchange>_<CFI>`, e.g., `AAPL_XNAS_ESXXXX`.
186 | * All files have the same number of rows and columns.
187 |
188 | There are two types of files in the dataset, *daily* and *intraday*. *Daily* files contain data with at most one datapoint per day, e.g., open auction price, daily volume, or GICS sector information. *Intraday* files describe market movements during the US trading session, e.g., intraday prices and volumes, accumulated into 5-minute bars. The name of each *intraday* file starts with an integer identifying the bar time.
189 |
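Because every file shares this layout, any field can be loaded the same way. A minimal sketch using the library's loader, with a placeholder dataset path:

```python
from causalitydata.data.dataloader import load_field_df

DATA_ROOT = "/path/to/causality-benchmark-data"  # hypothetical dataset location

# Rows: observation dates as a DatetimeIndex. Columns: asset identifiers.
universe_df = load_field_df("universe", DATA_ROOT)

print(universe_df.shape)        # every file has the same number of rows and columns
print(universe_df.index[:3])    # dates in increasing order
print(universe_df.columns[:3])  # identifiers such as AAPL_XNAS_ESXXXX
```
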
190 | #### File Description
191 |
192 | Here we detail the data contained in some files that might not be trivial by their name.
193 |
194 | * **Daily**
195 |   * `universe.csv`: Mask of the tradable universe at each date. The universe is rebalanced at the beginning of each month.
196 |   * `ret_<period>.csv`: Adjusted asset returns calculated over different holding periods:
197 |     * `cc`: Close-to-Close, the position is entered at the close auction and exited at the following day's close auction.
198 |     * `co`: Close-to-Open, the position is entered at the close auction and exited at the following day's open auction.
199 |     * `oc`: Open-to-Close, the position is entered at the open auction and exited at the same day's close auction.
200 |     * `oo`: Open-to-Open, the position is entered at the open auction and exited at the following day's open auction.
201 |   * `SPY_ret_<period>.csv`: SPY ETF return. The SPY time series is placed in all asset columns for convenience.
202 |   * `beta_<period>.csv`: CAPM betas between assets and the SPY ETF for different time periods.
203 |   * `resid_<period>.csv`: CAPM residual returns for different time periods, where `resid = ret - beta * SPY_ret` (see the sketch after this list).
204 | * **Intraday**
205 |   * `<hhmmss>_<field>_5m.csv`: Intraday market data snapshots at the `hhmmss` bar. These backward-looking bars are calculated over the time range `[t-5min, t)`.
206 |
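As a quick sanity check of the residual definition, the sketch below reconstructs `resid` from the other files. The `cc` suffix is just one assumed value of the period placeholder and the data path is hypothetical:

```python
from causalitydata.data.dataloader import load_field_df

DATA_ROOT = "/path/to/causality-benchmark-data"  # hypothetical dataset location

ret_df = load_field_df("ret_cc", DATA_ROOT)
spy_ret_df = load_field_df("SPY_ret_cc", DATA_ROOT)
beta_df = load_field_df("beta_cc", DATA_ROOT)
resid_df = load_field_df("resid_cc", DATA_ROOT)

# resid = ret - beta * SPY_ret, so the reconstruction should agree up to rounding.
max_abs_diff = (resid_df - (ret_df - beta_df * spy_ret_df)).abs().max().max()
print(max_abs_diff)
```
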
207 | (back to top)
208 |
209 |
246 | ## License
247 |
248 | Distributed under the BSD 3-Clause License. See `LICENSE` for more information.
249 |
250 | (back to top)
251 |
252 |
253 |
254 |
255 | ## Contact
256 |
257 | Please reach out to us on [LinkedIn](https://www.linkedin.com/in/markhorvath-ai) or visit our [website](https://www.causalitygroup.com)!
258 |
259 | (back to top)
260 |
277 | [contributors-shield]: https://img.shields.io/github/contributors/causality-group/causality-benchmark-data?style=for-the-badge
278 | [contributors-url]: https://github.com/causality-group/causality-benchmark-data/graphs/contributors
279 | [forks-shield]: https://img.shields.io/github/forks/causality-group/causality-benchmark-data.svg?style=for-the-badge
280 | [forks-url]: https://github.com/causality-group/causality-benchmark-data/network/members
281 | [stars-shield]: https://img.shields.io/github/stars/causality-group/causality-benchmark-data?style=for-the-badge
282 | [stars-url]: https://github.com/causality-group/causality-benchmark-data/stargazers
283 | [issues-shield]: https://img.shields.io/github/issues/causality-group/causality-benchmark-data.svg?style=for-the-badge
284 | [issues-url]: https://github.com/causality-group/causality-benchmark-data/issues
285 | [license-shield]: https://img.shields.io/github/license/causality-group/causality-benchmark-data.svg?style=for-the-badge
286 | [license-url]: https://github.com/causality-group/causality-benchmark-data/blob/main/LICENSE
287 | [linkedin-shield]: https://img.shields.io/badge/-LinkedIn-black.svg?style=for-the-badge&logo=linkedin&colorB=555
288 | [linkedin-url]: https://linkedin.com/company/causality-group
289 |
290 | [Python.org]: https://img.shields.io/badge/Python-3.9.5-blue?style=for-the-badge&logo=python&logoColor=ffdd54&labelColor=3776ab&color=3776ab
291 | [Python-url]: https://python.org/
292 | [Poetry.org]: https://img.shields.io/badge/Poetry-1.7.1-%233B82F6?style=for-the-badge&logo=poetry&logoColor=0B3D8D&labelColor=%233B82F6
293 | [Poetry-url]: https://python-poetry.org/
294 | [Jupyter.org]: https://img.shields.io/badge/jupyter-8.6.0-%23FA0F00.svg?style=for-the-badge&logo=jupyter&logoColor=white&labelColor=%23FA0F00
295 | [Jupyter-url]: https://jupyter.org/
296 | [Pandas.org]: https://img.shields.io/badge/pandas-2.2.0-%23150458.svg?style=for-the-badge&logo=pandas&logoColor=white&labelColor=%23150458&color=%23150458
297 | [Pandas-url]: https://pandas.pydata.org/
298 | [Matplotlib.org]: https://img.shields.io/badge/Matplotlib-3.8.3-%23ffffff.svg?style=for-the-badge&logo=Matplotlib&logoColor=black&labelColor=%23ffffff
299 | [Matplotlib-url]: https://matplotlib.org
300 | [Numpy.org]: https://img.shields.io/badge/numpy-1.26.4-%23013243.svg?style=for-the-badge&logo=numpy&logoColor=white&labelColor=%23013243
301 | [Numpy-url]: https://numpy.org
302 | [Sklearn.org]: https://img.shields.io/badge/scikit--learn-1.0.1-%23F7931E.svg?style=for-the-badge&logo=scikit-learn&logoColor=white&labelColor=%23F7931E
303 | [Sklearn-url]: http://scikit-learn.org
304 | [SciPy.org]: https://img.shields.io/badge/SciPy-1.12.0-%230C55A5.svg?style=for-the-badge&logo=scipy&logoColor=%white&labelColor=%230C55A5
305 | [Scipy-url]: https://scipy.org
306 |
--------------------------------------------------------------------------------
/causalitydata/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/causality-group/causality-benchmark-data/16c83f4fa5bbeab0583858e702973aba6cceaf28/causalitydata/__init__.py
--------------------------------------------------------------------------------
/causalitydata/analysis/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/causality-group/causality-benchmark-data/16c83f4fa5bbeab0583858e702973aba6cceaf28/causalitydata/analysis/__init__.py
--------------------------------------------------------------------------------
/causalitydata/analysis/backtest.py:
--------------------------------------------------------------------------------
1 | def signalbacktest_df(
2 | signal_df, upcoming_ret_df, enter_at_stdev=2.0, do_return_signal=False
3 | ):
4 | """A simple backtest, trading cross-sectional signals at a given standard deviation.
5 |
6 | Invests $0.5 both long and short, hence results can be interpreted as strategy returns.
7 |
8 | Args:
9 |         signal_df: DataFrame with signals observable at the Timestamps in its index.
10 |         upcoming_ret_df: DataFrame with forward-looking returns tradeable at the Timestamps in its index.
11 |         enter_at_stdev: Number of standard deviations from the cross-sectional mean at which to enter long and short positions.
12 | do_return_signal: If True, returns the signal DataFrame as well, otherwise only the strategy returns.
13 |
14 | Returns:
15 |         DataFrame with strategy returns, or a tuple of (strategy returns, traded signal DataFrame) if do_return_signal is True.
16 | """
17 | is_long_df = (
18 | signal_df.add(
19 | -(signal_df.mean(axis=1) + enter_at_stdev * signal_df.std(axis=1)), axis=0
20 | )
21 | > 0.0
22 | )
23 | is_short_df = (
24 | signal_df.add(
25 | -(signal_df.mean(axis=1) - enter_at_stdev * signal_df.std(axis=1)), axis=0
26 | )
27 | < 0.0
28 | )
29 |
30 | signal_df = is_long_df.astype(int) - is_short_df.astype(int)
31 | # Demean cross-sectionally:
32 | signal_df = signal_df.add(-signal_df.mean(axis=1), axis=0)
33 | # Normalize cross-sectionally:
34 | signal_df = signal_df.div(signal_df.abs().sum(axis=1), axis=0)
35 |
36 | if do_return_signal:
37 | return signal_df * upcoming_ret_df, signal_df
38 | else:
39 | return signal_df * upcoming_ret_df
40 |
--------------------------------------------------------------------------------
/causalitydata/analysis/compounding.py:
--------------------------------------------------------------------------------
1 | """Functionality to compound returns."""
2 |
3 | import numpy as np
4 | import pandas as pd
5 | from typing import List
6 |
7 | EPSILON_TIME = pd.offsets.Micro()
8 |
9 |
10 | def compound_ret_df(df1: pd.DataFrame, df2: pd.DataFrame) -> pd.DataFrame:
11 | """Compounds arithmetic returns.
12 |
13 | Args:
14 | df1: DataFrame with arithmetic returns
15 | df2: DataFrame with arithmetic returns
16 |
17 | Returns:
18 | DataFrame with compounded returns
19 | """
20 | return (1.0 + df1) * (1.0 + df2) - 1.0
21 |
22 |
23 | def _sum_nan_if_all_nan_df(df: pd.DataFrame) -> pd.Series:
24 | """
25 | If all values in a column are NaN, result will be NaN.
26 | Otherwise, the sum of the values.
27 | """
28 | sum_ser = df.sum(axis=0)
29 | all_nan_ser = df.count(axis=0) == 0
30 | sum_ser[all_nan_ser] = np.nan
31 | return sum_ser
32 |
33 |
34 | def compound_upcoming_bar_return_cc_df(
35 | ret_cc_df: pd.DataFrame, bar_tss: List[pd.Timestamp]
36 | ) -> pd.DataFrame:
37 | """
38 | Compounds daily adjusted close to close returns for each Timestamp in bar_tss,
39 | such that we can trade those at the 16:00 auction.
40 |
41 |     The returned bars do not overlap; each covers the period between the 16:00 close auctions of bar_tss[i] and bar_tss[i+1].
42 | """
43 | # Returns end on the index date, need to align to the start date of the return
44 | ret_cc_df = ret_cc_df.shift(-1)
45 | ret_cc_df = np.log(ret_cc_df + 1.0) # convert to log returns
46 | bar_tss = bar_tss + [pd.Timestamp.max]
47 | log_bar_ret_df = pd.DataFrame(
48 | {
49 |             # Indexing by Timestamp is inclusive, hence: -EPSILON_TIME
50 | ts: _sum_nan_if_all_nan_df(
51 | ret_cc_df.loc[ts : bar_tss[i + 1] - EPSILON_TIME, :]
52 | )
53 | for i, ts in enumerate(bar_tss[:-1])
54 | }
55 | ).T
56 | return np.exp(log_bar_ret_df) - 1.0
57 |
58 |
59 | def compound_observable_bar_return_cc_df(
60 | ret_cc_df: pd.DataFrame,
61 | bar_tss: List[pd.Timestamp],
62 | ) -> pd.DataFrame:
63 | """
64 | Compounds daily adjusted close to close returns for each Timestamp in bar_tss,
65 | such that we can observe those at 15:45 to trade the 16:00 auction.
66 |
67 | This means in practice using close prices only until previous close.
68 |
69 |     The returned bars do not overlap; each covers the period between the 16:00
70 |     close auctions of bar_tss[i] and bar_tss[i+1].
71 | """
72 | ret_cc_df = np.log(ret_cc_df + 1.0) # convert to log returns
73 | bar_tss = [pd.Timestamp.min] + bar_tss
74 | log_bar_ret_df = pd.DataFrame(
75 | {
76 |             # Indexing by Timestamp is inclusive, hence: -EPSILON_TIME
77 | ts: _sum_nan_if_all_nan_df(
78 | ret_cc_df.loc[bar_tss[i - 1] : ts - EPSILON_TIME, :]
79 | )
80 | for i, ts in enumerate(bar_tss)
81 | if i > 0
82 | }
83 | ).T
84 | return np.exp(log_bar_ret_df) - 1.0
85 |
--------------------------------------------------------------------------------
/causalitydata/analysis/evaluate_pnl.py:
--------------------------------------------------------------------------------
1 | """Functionality to evaluate performance."""
2 |
3 | from typing import List, Optional, Tuple
4 |
5 | import numpy as np
6 | import matplotlib.pyplot as plt
7 | import pandas as pd
8 |
9 |
10 | def plot_pnl(
11 | pnl_df: pd.DataFrame,
12 | xlabel: Optional[str] = "Time",
13 | ylabel: Optional[str] = "Cumulative PnL",
14 | title: Optional[str] = "Profit and Loss Curve",
15 | append_legends: Optional[List[str]] = None,
16 | figure_size: Tuple[int, int] = (10, 6),
17 | ):
18 | """
19 | Plots a PnL DataFrame.
20 |
21 | Args:
22 | pnl_df: DataFrame with PnL values for each trading period
23 | xlabel: Label for the x-axis
24 | ylabel: Label for the y-axis
25 | title: Title of the plot
26 | append_legends: List of strings to append to the legend of each line
27 | figure_size: Size of the figure (width, height)
28 | """
29 | plt.figure(figsize=figure_size)
30 | plt.plot(pnl_df.index, pnl_df.cumsum().values)
31 | if xlabel is not None:
32 | plt.xlabel(xlabel)
33 | if ylabel is not None:
34 | plt.ylabel(ylabel)
35 | if title is not None:
36 | plt.title(title)
37 | legends = pnl_df.columns.tolist()
38 | if append_legends is not None:
39 | new_legend = []
40 | for i, item in enumerate(legends):
41 |             new_legend += [str(item) + append_legends[i]]
42 | legends = new_legend
43 | plt.legend(legends, bbox_to_anchor=(1.05, 1), loc="upper left")
44 | plt.show()
45 |
46 |
47 | def calculate_performance_df(
48 | pnl_df: pd.DataFrame,
49 | ) -> pd.DataFrame:
50 | """
51 | Calculates performance metrics for a PnL DataFrame:
52 | - Cumulative PnL
53 |     - Annual PnL
54 |     - Average PnL
55 |     - Maximum PnL
56 |     - Minimum PnL
57 |     - Annual Standard Deviation
58 |     - Annual Sharpe Ratio
59 |     - Sortino Ratio
60 |     - Maximum Drawdown
61 |     - Calmar Ratio
62 |     - Martin Ratio
63 |
64 | Args:
65 | pnl_df: DataFrame with PnL values for each trading period
67 |
68 | Returns:
69 | pd.DataFrame with performance metrics.
70 | Performance metrics are rows of the DataFrame,
71 | and column names match column names in the pnl_df argument.
72 | """
73 |
74 | num_periods_per_year = len(pnl_df.index) / (
75 | (pnl_df.index[-1].year + (pnl_df.index[-1].month - 1) / 12)
76 | - (pnl_df.index[0].year + (pnl_df.index[0].month - 1) / 12)
77 | )
78 |
79 | performance_df = pd.DataFrame()
80 |
81 | # Cumulative PnL
82 | performance_df["Cumulative PnL"] = pnl_df.sum()
83 |
84 |     # Annual PnL
85 | performance_df["Annual PnL"] = pnl_df.mean() * num_periods_per_year
86 |
87 | # Average PnL
88 | performance_df["Average PnL"] = pnl_df.mean()
89 |
90 | # Maximum PnL
91 | performance_df["Maximum PnL"] = pnl_df.max()
92 |
93 | # Minimum PnL
94 | performance_df["Minimum PnL"] = pnl_df.min()
95 |
96 | # PnL Standard Deviation
97 | performance_df["Annual Standard Deviation"] = pnl_df.std() * np.sqrt(
98 | num_periods_per_year
99 | )
100 |
101 | # Sharpe Ratio
102 | performance_df["Annual Sharpe Ratio"] = (
103 | pnl_df.mean() / pnl_df.std() * np.sqrt(num_periods_per_year)
104 | )
105 |
106 | # Sortino Ratio
107 | downside_returns = pnl_df[pnl_df < 0]
108 | downside_std = downside_returns.std()
109 | performance_df["Sortino Ratio"] = (
110 | pnl_df.mean() / downside_std * np.sqrt(num_periods_per_year)
111 | )
112 |
113 | # Maximum Drawdown
114 | cumulative_returns = pnl_df.cumsum()
115 | rolling_max = cumulative_returns.cummax()
116 | drawdown = cumulative_returns - rolling_max
117 | performance_df["Maximum Drawdown"] = drawdown.min()
118 |
119 | # Calmar Ratio
120 | performance_df["Calmar Ratio"] = pnl_df.mean() / abs(drawdown.min())
121 |
122 |     # Martin Ratio (computed here as mean PnL / PnL standard deviation * sqrt(number of periods))
123 | performance_df["Martin Ratio"] = pnl_df.mean() / pnl_df.std() * np.sqrt(len(pnl_df))
124 |
125 | # # Statistical significance of the null-hypothesis that
126 | # the mean PnL is zero or negative, using one sided t-test
127 | # t_stat, p_value = stats.ttest_1samp(pnl_df, 0, alternative='less')
128 | # performance_df['T-Test p-value'] = p_value
129 |
130 | return performance_df.T
131 |
--------------------------------------------------------------------------------
/causalitydata/data/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/causality-group/causality-benchmark-data/16c83f4fa5bbeab0583858e702973aba6cceaf28/causalitydata/data/__init__.py
--------------------------------------------------------------------------------
/causalitydata/data/dataloader.py:
--------------------------------------------------------------------------------
1 | """Functionality to load data."""
2 |
3 | import os
4 | from typing import Tuple, Optional
5 |
6 | import numpy as np
7 | import pandas as pd
8 |
9 |
10 | def load_field_df(
11 | field_name: str,
12 | data_root_path: str,
13 | shift: int = 0,
14 | end_ts: Optional[pd.Timestamp] = None,
15 | dtype: type = np.float64,
16 | ) -> pd.DataFrame:
17 | """
18 | Loads data fields into DataFrame with DateTimeIndex.
19 |
20 | Args:
21 | field_name: name of the field to load
22 | data_root_path: path to the data root directory
23 | shift: number of periods to shift the data by.
24 | Positive shift lags, negative shift peeks into the future.
25 | end_ts: end date of the DataFrame.
26 | If None, the whole DataFrame is returned.
27 | dtype: data type to read in
28 |
29 | Returns:
30 | DataFrame with DateTimeIndex.
31 | Columns are strings including ticker symbol,
32 | exchange and CFI category, while rows are dates of the observations.
33 | """
34 | df = load_path_df(os.path.join(data_root_path, field_name + ".csv"), dtype=dtype)
35 | df.index = pd.DatetimeIndex(pd.to_datetime(df.index))
36 | df = df.shift(shift)
37 | if end_ts is not None:
38 | df = df.loc[:end_ts, :]
39 | return df
40 |
41 |
42 | def load_path_df(
43 | csvfile_path: str,
44 | exclude_tickers: Tuple = (),
45 | dtype: type = np.float64,
46 | ) -> pd.DataFrame:
47 | """
48 | Reads data into pandas from csv files.
49 |
50 | Args:
51 | csvfile_path: path to csv file
52 | exclude_tickers: list of tickers to exclude
53 | dtype: data type to read in
54 |
55 | Returns:
56 | pd.DataFrame with time index and columns for each ticker
57 | """
58 | if "str" in str(dtype):
59 | df = pd.read_csv(
60 | csvfile_path,
61 | # avoid DtypeWarning: Columns (1..1330) have mixed types.
62 | # Specify dtype option on import or set low_memory=False
63 | low_memory=False,
64 | index_col=0,
65 | header=0,
66 | parse_dates=False,
67 | dtype=dtype,
68 | )
69 | else: # numeric type
70 | df = pd.read_csv(
71 | csvfile_path,
72 | # avoid DtypeWarning: Columns (1..1330) have mixed types.
73 | # Specify dtype option on import or set low_memory=False
74 | low_memory=False,
75 | index_col=0,
76 | header=0,
77 | parse_dates=False,
78 | )
79 |
80 | if exclude_tickers:
81 | for ex in exclude_tickers:
82 | df = df[[s for s in df.columns if ex not in s]]
83 |
84 | if "str" in str(dtype):
85 | return df
86 | else: # numeric type
87 | return df.astype(dtype=dtype, copy=False)
88 |
--------------------------------------------------------------------------------
/images/logo.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/causality-group/causality-benchmark-data/16c83f4fa5bbeab0583858e702973aba6cceaf28/images/logo.png
--------------------------------------------------------------------------------
/pyproject.toml:
--------------------------------------------------------------------------------
1 | [tool.poetry]
2 | name = "causalitydata"
3 | version = "0.1.0"
4 | description = "Showcasing Causality Group's benchmark data through a data loading library and a signal backtesting example."
5 | authors = ["Causality Group"]
6 | readme = "README.md"
7 |
8 | [tool.poetry.dependencies]
9 | python = "~3.9"
10 | scikit-learn = "1.0.1"
11 | darts = "0.27.1"
12 | stats = "0.1.2a"
13 |
14 |
15 | [tool.poetry.group.dev.dependencies]
16 | pytest = "^8.0.1"
17 | black = {extras = ["jupyter"], version = "^24.2.0"}
18 | pylint = "^3.0.3"
19 | pre-commit = "^3.6.2"
20 |
21 |
22 | [tool.poetry.group.jupyter]
23 | optional = true
24 |
25 | [tool.poetry.group.jupyter.dependencies]
26 | ipykernel = "6.29.0"
27 |
28 | [build-system]
29 | requires = ["poetry-core"]
30 | build-backend = "poetry.core.masonry.api"
31 |
32 | [tool.poetry.scripts]
33 | install-ipykernel = "scripts.install_ipykernel:install_ipykernel"
34 | uninstall-ipykernel = "scripts.uninstall_ipykernel:uninstall_ipykernel"
35 |
--------------------------------------------------------------------------------
/scripts/install_ipykernel.py:
--------------------------------------------------------------------------------
1 | """Script to install causality ipykernel."""
2 |
3 | import logging
4 | from ipykernel.kernelspec import install
5 |
6 | # Set up logging
7 | logging.basicConfig(level=logging.INFO)
8 |
9 |
10 | def install_ipykernel():
11 | """Installs causality ipykernel."""
12 | try:
13 | install(user=True, kernel_name="causalitydata-p3.9")
14 | logging.info("Successfully installed causality ipykernel.")
15 | except Exception as e:
16 | logging.error("Failed to install causality ipykernel.")
17 | logging.error(repr(e))
18 |
19 |
20 | if __name__ == "__main__":
21 | install_ipykernel()
22 |
--------------------------------------------------------------------------------
/scripts/uninstall_ipykernel.py:
--------------------------------------------------------------------------------
1 | """Script to uninstall causality ipykernel."""
2 |
3 | import logging
4 | from jupyter_client.kernelspec import KernelSpecManager
5 |
6 | # Set up logging
7 | logging.basicConfig(level=logging.INFO)
8 |
9 |
10 | def uninstall_ipykernel():
11 | """Uninstalls causality ipykernel."""
12 | try:
13 | KernelSpecManager().remove_kernel_spec("causalitydata-p3.9")
14 | logging.info("Successfully uninstalled causality ipykernel.")
15 | except Exception as e:
16 | logging.error("Failed to uninstall causality ipykernel.")
17 | logging.error(repr(e))
18 |
19 |
20 | if __name__ == "__main__":
21 | uninstall_ipykernel()
22 |
--------------------------------------------------------------------------------
/tests/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/causality-group/causality-benchmark-data/16c83f4fa5bbeab0583858e702973aba6cceaf28/tests/__init__.py
--------------------------------------------------------------------------------