├── ExtendedAbstract.pdf
├── README.md
├── Thesis.pdf
├── classes
│   ├── class_DataProcessor.py
│   ├── class_ForecastingTrader.py
│   ├── class_SeriesAnalyser.py
│   └── class_Trader.py
├── code_organization.pdf
├── data
│   └── link_to_data.txt
├── drafts
│   ├── PairsTrading_Examples.ipynb
│   ├── config
│   │   ├── config.json
│   │   ├── config_commodities_2000_2018.json
│   │   ├── config_commodities_2008_2018.json
│   │   ├── config_commodities_2009_2015.json
│   │   ├── config_commodities_2009_2017.json
│   │   ├── config_commodities_2010_2016.json
│   │   ├── config_commodities_2010_2018.json
│   │   ├── config_commodities_2010_2019.json
│   │   ├── config_commodities_2011_2015.json
│   │   ├── config_commodities_2011_2017.json
│   │   ├── config_commodities_2011_2019.json
│   │   ├── config_commodities_2012_2016.json
│   │   ├── config_commodities_2012_2018.json
│   │   ├── config_commodities_2013_2017.json
│   │   ├── config_commodities_2013_2019.json
│   │   ├── config_commodities_2014_2018.json
│   │   ├── config_commodities_2015_2019.json
│   │   └── config_commodities_pr.json
│   ├── draft.py
│   ├── main.py
│   └── mlp_trainer.py
├── notebooks
│   ├── PairsTrading-Benchmark-FixedBeta_2009_2019.ipynb
│   ├── PairsTrading-Benchmark-FixedBeta_2015_2019.ipynb
│   ├── PairsTrading-Benchmark-FixedBeta_NoClustering_2012_2016.ipynb
│   ├── PairsTrading-Benchmark-FixedBeta_NoClustering_2013_2017.ipynb
│   ├── PairsTrading-Benchmark-FixedBeta_NoClustering_2014_2018.ipynb
│   ├── PairsTrading-Benchmark-FixedBeta_OPTICS_2012_2016.ipynb
│   ├── PairsTrading-Benchmark-FixedBeta_OPTICS_2013_2017.ipynb
│   ├── PairsTrading-Benchmark-FixedBeta_OPTICS_2014_2018.ipynb
│   ├── PairsTrading-Benchmark-FixedBeta_Sector_2012_2016.ipynb
│   ├── PairsTrading-Benchmark-FixedBeta_Sector_2013_2017.ipynb
│   ├── PairsTrading-Benchmark-FixedBeta_Sector_2014_2018.ipynb
│   ├── PairsTrading-Clustering.ipynb
│   ├── PairsTrading-DataPreprocessing.ipynb
│   └── PairsTrading-Forecasting_2009_2019.ipynb
└── training
    ├── PairsTrading_DeepLearning.ipynb
    ├── encoder_decoder_trainer.py
    └── rnn_trainer.py

/ExtendedAbstract.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/simaomsarmento/PairsTrading/0781877c75673ceca3c61704eee9c9dca9d37b6b/ExtendedAbstract.pdf
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # PairsTrading
2 |
3 | ## Thesis to obtain the Master of Science Degree in Electrical and Computer Engineering
4 |
5 | **September 2019**
6 |
7 | **Simão Moraes Sarmento, simao.moraes.sarmento@tecnico.ulisboa.pt**
8 |
9 | ## Thesis Abstract:
10 | Pairs Trading is one of the most valuable market-neutral strategies used by hedge funds. It is particularly interesting as it overcomes the arduous process of valuing securities by focusing on relative pricing. By buying a relatively undervalued security and selling a relatively overvalued one, a profit can be made upon the pair's price convergence. However, with the growing availability of data, it has become increasingly hard to find rewarding pairs. In this work, we address two problems: (i) how to find profitable pairs while constraining the search space, and (ii) how to avoid long decline periods due to prolonged divergent pairs. To manage these difficulties, the application of promising Machine Learning techniques is investigated in detail. We propose the integration of an Unsupervised Learning algorithm, OPTICS, to handle problem (i). The results obtained demonstrate that the suggested technique can outperform common pair-search methods, achieving an average portfolio Sharpe ratio of 3.79, in comparison to 3.58 and 2.59 obtained by standard approaches. For problem (ii), we introduce a forecasting-based trading model capable of reducing the periods of portfolio decline by 75%. Yet, this comes at the expense of decreasing overall profitability. The proposed strategy is tested using an ARMA model, an LSTM and an LSTM Encoder-Decoder. This work's results are simulated during varying periods between January 2009 and December 2018, using 5-minute price data from a group of 208 commodity-linked ETFs, and accounting for transaction costs.
11 |
12 | ## Repository content:
13 |
14 | This repository contains all the code developed to produce the results presented in *Thesis.pdf*.
15 |
16 | A detailed explanation concerning the code organization can be found in *code_organization.pdf*.
17 |
18 |
19 |
20 |
21 |
22 | ## Notes:
23 |
24 | - The files have been organized in folders to make this repo tidier. Nevertheless, the code presented in the notebooks and
25 | in the training files presumes the class files are in the same directory.
26 | - To rerun the notebooks or the training files, the path to the classes must be adapted.
27 |
28 | Data available at: https://www.dropbox.com/sh/0w3vu1eylrfnkch/AABttIlDf64MmVf5CP1Qy-XOa?dl=0
29 |
30 |
31 |
32 | ## Training the Deep Learning models on Google Colab
33 |
34 | 1. Copy all the required files (data folder + classes + training files) to a directory in Google Drive
35 | 2. Run the notebook in the 'training' folder using Google Colab.
36 |
37 |
--------------------------------------------------------------------------------
/Thesis.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/simaomsarmento/PairsTrading/0781877c75673ceca3c61704eee9c9dca9d37b6b/Thesis.pdf
--------------------------------------------------------------------------------
/classes/class_DataProcessor.py:
--------------------------------------------------------------------------------
1 | import pandas as pd
2 | import numpy as np
3 | from openpyxl import load_workbook
4 |
5 | # just set the seed for the random number generator
6 | np.random.seed(107)
7 |
8 | class DataProcessor:
9 |     """
10 |     This class contains a set of data processing methods used by the pairs
11 |     trading strategies, along with some auxiliary functions
12 |
13 |     """
14 |
15 |     def read_ticker_excel(self, path=None):
16 |         """
17 |         Assumes the relevant tickers are saved in an Excel file.
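        Example (a minimal sketch; 'etf_tickers.xlsx' is a hypothetical file with a 'Ticker' column):
            dp = DataProcessor()
            df, unique_df, tickers = dp.read_ticker_excel('etf_tickers.xlsx')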
18 | 19 | :param path: path to excel 20 | :return: df with tickers data, list with tickers 21 | """ 22 | 23 | df = pd.read_excel(path) 24 | 25 | # remove duplicated 26 | unique_df = df[~df.duplicated(subset=['Ticker'], keep='first')].sort_values(['Ticker']) 27 | tickers = unique_df.Ticker.unique() 28 | 29 | return df, unique_df, tickers 30 | 31 | def dict_to_df(self, dataset, threshold=None): 32 | """ 33 | Transforms a dictionary into a Dataframe 34 | 35 | :param dataset: dictionary containing tickers as keys and corresponding price series 36 | :param threshold: threshold for number of Nan Values 37 | :return: df with tickers as columns 38 | :return: df_clean with tickers as columns, and columns with null values dropped 39 | """ 40 | 41 | first_count = True 42 | for k in dataset.keys(): 43 | if dataset[k] is not None: 44 | if first_count: 45 | df = dataset[k] 46 | first_count = False 47 | else: 48 | df = pd.concat([df, dataset[k]], axis=1) 49 | 50 | if threshold is not None: 51 | df_clean = self.remove_tickers_with_nan(df, threshold) 52 | else: 53 | df_clean = df 54 | 55 | return df, df_clean 56 | 57 | def remove_tickers_with_nan(self, df, threshold): 58 | """ 59 | Removes columns with more than threshold null values 60 | """ 61 | null_values = df.isnull().sum() 62 | null_values = null_values[null_values > 0] 63 | 64 | to_remove = list(null_values[null_values > threshold].index) 65 | df = df.drop(columns=to_remove) 66 | 67 | return df 68 | 69 | def get_return_series(self, df_prices): 70 | """ 71 | This function calculates the return series of a given price series 72 | 73 | :param prices: time series with prices 74 | :return: return series 75 | """ 76 | df_returns = df_prices.pct_change() 77 | df_returns = df_returns.iloc[1:] 78 | 79 | return df_returns 80 | 81 | def split_data(self, df_prices, training_dates, testing_dates, remove_nan=True): 82 | """ 83 | This function splits a dataframe into training and validation sets 84 | :param df_prices: dataframe containing prices for all dates 85 | :param training_dates: tuple (training initial date, training final date) 86 | :param testing_dates: tuple (testing initial date, testing final date) 87 | :param remove_nan: flag to detect if nan values are to be removed 88 | 89 | :return: df with training prices 90 | :return: df with testing prices 91 | """ 92 | if remove_nan: 93 | dataset_mask = ((df_prices.index >= training_dates[0]) &\ 94 | (df_prices.index <= testing_dates[1])) 95 | df_prices_dataset = df_prices[dataset_mask] 96 | print('Total of {} tickers'.format(df_prices_dataset.shape[1])) 97 | df_prices_dataset_without_nan = self.remove_tickers_with_nan(df_prices_dataset, 0) 98 | print('Total of {} tickers after removing tickers with Nan values'.format( 99 | df_prices_dataset_without_nan.shape[1])) 100 | df_prices = df_prices_dataset_without_nan.copy() 101 | 102 | train_mask = (df_prices.index <= training_dates[1]) 103 | test_mask = (df_prices.index >= testing_dates[0]) 104 | df_prices_train = df_prices[train_mask] 105 | df_prices_test = df_prices[test_mask] 106 | 107 | return df_prices_train, df_prices_test 108 | 109 | def append_df_to_excel(self, filename, df, sheet_name='Sheet1', startrow=None, 110 | truncate_sheet=False, 111 | **to_excel_kwargs): 112 | """ 113 | Source: https://stackoverflow.com/questions/20219254/how-to-write-to-an-existing-excel-file-without-overwriting 114 | -data-using-pandas/47740262#47740262 115 | 116 | Append a DataFrame [df] to existing Excel file [filename] 117 | into [sheet_name] Sheet. 
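        Example (a minimal usage sketch; the path and DataFrame name are hypothetical):
            DataProcessor().append_df_to_excel('results/summary.xlsx', results_df, sheet_name='sharpe_ratios')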
118 | If [filename] doesn't exist, then this function will create it. 119 | 120 | Parameters: 121 | filename : File path or existing ExcelWriter 122 | (Example: '/path/to/file.xlsx') 123 | df : dataframe to save to workbook 124 | sheet_name : Name of sheet which will contain DataFrame. 125 | (default: 'Sheet1') 126 | startrow : upper left cell row to dump data frame. 127 | Per default (startrow=None) calculate the last row 128 | in the existing DF and write to the next row... 129 | truncate_sheet : truncate (remove and recreate) [sheet_name] 130 | before writing DataFrame to Excel file 131 | to_excel_kwargs : arguments which will be passed to `DataFrame.to_excel()` 132 | [can be dictionary] 133 | 134 | Returns: None 135 | """ 136 | 137 | # ignore [engine] parameter if it was passed 138 | if 'engine' in to_excel_kwargs: 139 | to_excel_kwargs.pop('engine') 140 | 141 | writer = pd.ExcelWriter(filename, engine='openpyxl') 142 | 143 | # Python 2.x: define [FileNotFoundError] exception if it doesn't exist 144 | #try: 145 | # FileNotFoundError 146 | #except NameError: 147 | # FileNotFoundError = IOError 148 | 149 | try: 150 | # try to open an existing workbook 151 | writer.book = load_workbook(filename) 152 | 153 | # get the last row in the existing Excel sheet 154 | # if it was not specified explicitly 155 | if startrow is None and sheet_name in writer.book.sheetnames: 156 | startrow = writer.book[sheet_name].max_row 157 | 158 | # truncate sheet 159 | if truncate_sheet and sheet_name in writer.book.sheetnames: 160 | # index of [sheet_name] sheet 161 | idx = writer.book.sheetnames.index(sheet_name) 162 | # remove [sheet_name] 163 | writer.book.remove(writer.book.worksheets[idx]) 164 | # create an empty sheet [sheet_name] using old index 165 | writer.book.create_sheet(sheet_name, idx) 166 | 167 | # copy existing sheets 168 | writer.sheets = {ws.title: ws for ws in writer.book.worksheets} 169 | except FileNotFoundError: 170 | # file does not exist yet, we will create it 171 | pass 172 | 173 | if startrow is None: 174 | startrow = 0 175 | 176 | # write out the new sheet 177 | df.to_excel(writer, sheet_name, startrow=startrow, **to_excel_kwargs) 178 | 179 | # save the workbook 180 | writer.save() 181 | -------------------------------------------------------------------------------- /classes/class_ForecastingTrader.py: -------------------------------------------------------------------------------- 1 | # set seeds 2 | import numpy as np 3 | np.random.seed(1) # NumPy 4 | import random 5 | random.seed(3) # Python 6 | import tensorflow as tf 7 | tf.set_random_seed(2) # Tensorflow 8 | session_conf = tf.ConfigProto(intra_op_parallelism_threads=1, 9 | inter_op_parallelism_threads=1) 10 | from keras import backend as K 11 | sess = tf.Session(graph=tf.get_default_graph(), config=session_conf) 12 | K.set_session(sess) 13 | 14 | import pandas as pd 15 | 16 | import matplotlib.pyplot as plt 17 | 18 | from classes import class_Trader 19 | 20 | from sklearn.preprocessing import StandardScaler 21 | 22 | # Import keras 23 | from keras.models import Sequential 24 | from keras.layers import Dense, Dropout, TimeDistributed, CuDNNLSTM 25 | from keras.callbacks import EarlyStopping 26 | from keras.initializers import glorot_normal 27 | from keras.layers import RepeatVector 28 | from keras.utils import plot_model 29 | #from keras_sequential_ascii import keras2ascii 30 | # just set the seed for the random number generator 31 | 32 | import pickle 33 | 34 | 35 | 36 | class ForecastingTrader: 37 | """ 38 | 39 | 40 | """ 41 | 
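    # Typical usage (a sketch only; the parameter values below are hypothetical, the keys are the
    # ones read in prepare_train_data/train_models):
    #   model_config = {'n_in': 24, 'n_out': 1, 'hidden_nodes': [50], 'epochs': 500,
    #                   'optimizer': 'adam', 'loss_fct': 'mse', 'batch_size': 128,
    #                   'train_val_split': '2017-01-01'}
    #   models = ForecastingTrader().train_models(pairs, model_config, model_type='mlp')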
def __init__(self): 42 | """ 43 | :initial elements 44 | """ 45 | 46 | def destandardize(self, predictions, spread_mean, spread_std): 47 | """ 48 | This function transforms the normalized predictions into the original space. 49 | """ 50 | return predictions * spread_std + spread_mean 51 | 52 | def forecast_spread_trading(self, X, Y, spread_test, spread_train, beta, predictions, lag, 53 | low_quantile=0.15, high_quantile=0.85, multistep=0): 54 | """ 55 | This function will set the trading positions based on the forecasted spread. 56 | For each day, the function compares the predicted spread for that day with the 57 | true value of the spread in tha day before, giving the predicted spread pct change. 58 | In case it is larger than the threshold, a position is entered. 59 | Note: because a position entered in day n it is only accounted for on the day after, 60 | we shift the entered positions. 61 | : predictions: predictions should not be standardized, but with regular mean and variance. 62 | """ 63 | # 1. Get predictions pct_change 64 | # we want to see the pct change of the prediction compared 65 | # to the true value but the previous instant time, because we 66 | # are interested in seeing the temporal % change 67 | if multistep == 0: 68 | predictions_1 = predictions 69 | predictions_2 = pd.Series(data=[0]*len(predictions), index=predictions.index) 70 | predictions_pct_change = (((predictions_1 - spread_test.shift(lag)) / 71 | abs(spread_test.shift(lag))) * 100).fillna(0) 72 | # true_change = spread_test.diff().fillna(0) 73 | else: 74 | predictions_1, predictions_2 = predictions['t'], predictions['t+1'] 75 | predictions_pct_change = (((predictions_2 - spread_test.shift(lag)) / 76 | abs(spread_test.shift(lag))) * 100).fillna(0) 77 | # need to add last row and first row correspondingly 78 | predictions_1 = predictions_1.append(pd.Series(data=predictions_2[-1], index=spread_test[-1:].index)) 79 | predictions_2 = pd.concat([pd.Series(data=predictions_1[0], index=predictions_1[:1].index), predictions_2]) 80 | 81 | # 2. Calculate trading thresholds 82 | spread_train_pct_change = ((spread_train - spread_train.shift(lag+multistep)) / 83 | abs(spread_train.shift(lag+multistep))) * 100 84 | positive_changes = spread_train_pct_change[spread_train_pct_change > 0] 85 | negative_changes = spread_train_pct_change[spread_train_pct_change < 0] 86 | long_threshold = positive_changes.quantile(q=high_quantile, interpolation='linear') 87 | #print('Long threshold: {:.2f}'.format(long_threshold)) 88 | short_threshold = negative_changes.quantile(q=low_quantile, interpolation='linear') 89 | #print('Short threshold: {:.2f}'.format(short_threshold)) 90 | 91 | # 3. Define trading timings 92 | # Note: If we want to enter a position at the beginning of day N, 93 | # because of the way pnl is calculated the position is entered 94 | # in the previous day. 95 | # Example: In day 23 the percentage change is 55% (wrt day 22). If we were to enter the 96 | # position in day 23, the following code would not consider the gains during day 23, even if we had 97 | # set the position in the morning. (it conly considers the gains for the next days) 98 | # Thus, as a workaoround, we enter the position at day 22 (at night), and it considers the gains for day 23 99 | numUnits = pd.Series(data=[0.] * len(spread_test), index=spread_test.index, name='numUnits') 100 | longsEntry = predictions_pct_change > long_threshold 101 | longsEntry = longsEntry.shift(-1).fillna(False) 102 | numUnits[longsEntry] = 1. 
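        # Illustrative example (hypothetical numbers): if the 85th percentile of the positive
        # training-set changes is +40% and the spread is forecast to rise +55% relative to the
        # previously observed spread value, a long spread position is flagged; the shift(-1) above
        # stamps it on the preceding bar so that the following day's P&L is attributed to the trade.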
103 | shortsEntry = predictions_pct_change < short_threshold 104 | shortsEntry = shortsEntry.shift(-1).fillna(False) 105 | numUnits[shortsEntry] = -1 106 | 107 | # ffill if applicable 108 | if lag == 1: 109 | pct_change_from_previous = predictions_pct_change 110 | else: 111 | pct_change_from_previous = predictions_pct_change = (((predictions_1 - spread_test.shift(1)) / 112 | abs(spread_test.shift(1))) * 100).fillna(0) 113 | for i in range(1, len(numUnits) - 1): 114 | if numUnits[i] != 0: 115 | continue 116 | else: 117 | if numUnits[i - 1] == 0: 118 | continue 119 | elif numUnits[i - 1] == 1.: 120 | if pct_change_from_previous[i + 1] > 0: 121 | numUnits[i] = 1 122 | continue 123 | elif numUnits[i - 1] == -1.: 124 | if pct_change_from_previous[i + 1] < 0: 125 | numUnits[i] = -1. 126 | continue 127 | 128 | # 4. Calculate P&L and Returns 129 | trader = class_Trader.Trader() 130 | 131 | # concatenate for positions with not enough data to be predicted 132 | lookback = len(Y)-len(spread_test) 133 | numUnits_not_predicted = pd.Series(data=[0.] * lookback, index=Y.index[:lookback]) 134 | numUnits = pd.concat([numUnits_not_predicted, numUnits], axis=0) 135 | numUnits.name = 'numUnits' 136 | # add trade durations 137 | numUnits_df = pd.DataFrame(numUnits, index=Y.index) 138 | numUnits_df = numUnits_df.rename(columns={"positions": "numUnits"}) 139 | trading_durations = trader.add_trading_duration(numUnits_df) 140 | # calculate balance 141 | balance_summary = trader.calculate_balance(Y, X, beta, numUnits.shift(1).fillna(0), trading_durations) 142 | # calculate return per position 143 | position_ret, _, _ = trader.calculate_position_returns(Y, X, beta, numUnits) 144 | df = pd.DataFrame({'position_return':position_ret.values, 145 | 'trading_duration':trading_durations, 146 | 'position_during_day': numUnits.shift(1).fillna(0).values}, 147 | index = position_ret.index) 148 | position_ret_with_costs = trader.add_transaction_costs(df, beta) 149 | balance_summary['position_ret_with_costs']=position_ret_with_costs 150 | 151 | # summarize 152 | ret_with_costs, cum_ret_with_costs = balance_summary.returns, (balance_summary.account_balance-1) 153 | bins = [-np.inf, -0.00000001, 0.00000001, np.inf] 154 | names = ['-1', '0', '1'] 155 | summary = pd.DataFrame(data={'prediction(t)': predictions_1.values, 156 | 'prediction(t+1)': predictions_2.values, 157 | 'spread(t)': spread_test.values, 158 | 'predicted_change(%)': predictions_pct_change, 159 | 'position_during_day': numUnits.shift(1).fillna(0).values[lookback:], 160 | 'position_return': position_ret, 161 | 'position_ret_with_costs': position_ret_with_costs, 162 | 'trading_days': trading_durations[lookback:], 163 | 'ret_with_costs': ret_with_costs[lookback:], 164 | 'predicted_direction': pd.cut(predictions_pct_change, bins, labels=names), 165 | 'true_direction': pd.cut(spread_test.diff(), bins, labels=names) 166 | }, 167 | index=spread_test.index) 168 | 169 | return ret_with_costs, cum_ret_with_costs, summary, balance_summary 170 | 171 | def returns_forecasting_trading(self, Y, X, beta, predictions, test): 172 | """ 173 | This function implements Dunis methodology. 174 | It tracks big changes in returns, and opens a position when the change in the returns is significant. 175 | """ 176 | # track positions for which the expected p&l overweights the transaction costs 177 | numUnits = pd.Series(data=[0.] 
* len(predictions), index=predictions.index, name='numUnits') 178 | long_opportunities = predictions > 0.0056 179 | short_opportunities = predictions < -0.0056 180 | longsEntry = long_opportunities.shift(-1).fillna(False) 181 | numUnits[longsEntry] = 1. 182 | shortsEntry = short_opportunities.shift(-1).fillna(False) 183 | numUnits[shortsEntry] = -1. 184 | 185 | # Calculate P&L and Returns 186 | trader = class_Trader.Trader() 187 | # concatenate positions with not enough data to be predicted 188 | lookback = len(Y) - len(predictions) 189 | numUnits_not_predicted = pd.Series(data=[0.] * lookback, index=Y.index[:lookback]) 190 | numUnits = pd.concat([numUnits_not_predicted, numUnits], axis=0) 191 | numUnits.name = 'numUnits' 192 | # add trade durations 193 | numUnits_df = pd.DataFrame(numUnits, index=Y.index) 194 | numUnits_df = numUnits_df.rename(columns={"positions": "numUnits"}) 195 | trading_durations = trader.add_trading_duration(numUnits_df) 196 | # calculate balance 197 | balance_summary = trader.calculate_balance(Y, X, beta, numUnits.shift(1).fillna(0), trading_durations) 198 | 199 | # summarize 200 | ret_with_costs, cum_ret_with_costs = balance_summary.returns, (balance_summary.account_balance - 1) 201 | summary = pd.DataFrame(data={'predicted_pnl(t)': predictions.values, 202 | 'pnl(t)': test.values, 203 | 'position_during_day': numUnits.shift(1).fillna(0).values[lookback:], 204 | 'Y': Y[lookback:], 205 | 'X': X[lookback:], 206 | 'Y_pct_change': Y.pct_change()[lookback:], 207 | 'X_pct_change': X.pct_change()[lookback:], 208 | 'trading_days': trading_durations[lookback:], 209 | 'ret_with_costs': ret_with_costs[lookback:] 210 | }, 211 | index=test.index) 212 | 213 | return ret_with_costs, cum_ret_with_costs, summary, balance_summary 214 | 215 | def spread_trading(self, X, Y, spread_test, spread_train, beta, predictions, lag, low_quantile=0.10, 216 | high_quantile=0.90, multistep=0): 217 | """ 218 | This function will set the trading positions based on the forecasted spread. 219 | For each day, the function compares the predicted spread for that day with the 220 | true value of the spread in tha day before, giving the predicted spread pct change. 221 | In case it is larger than the threshold, a position is entered. 222 | Note: because a position entered in day n it is only accounted for on the day after, 223 | we shift the entered positions. 224 | : predictions: predictions should not be standardized, but with regular mean and variance. 225 | """ 226 | # 1. Get predictions change 227 | if multistep == 0: 228 | predictions_1 = predictions 229 | predictions_change = predictions.diff().fillna(0) 230 | true_change = spread_test.diff().fillna(0) 231 | else: 232 | predictions_1, predictions_2 = predictions 233 | predictions_change = (predictions_2 - predictions_1.shift(lag)).fillna(0) 234 | # need to add last row 235 | predictions_change = predictions_change.append(pd.Series(data=[0], index=spread_test[-1:].index)) 236 | predictions_1 = predictions_1.append(pd.Series(data=predictions_2[-1], index=spread_test[-1:].index)) 237 | 238 | # 2. 
Calculate trading threshold 239 | spread_train_change = (spread_train - spread_train.shift(lag+multistep)).fillna(0) 240 | positive_changes = spread_train_change[spread_train_change > 0] 241 | negative_changes = spread_train_change[spread_train_change < 0] 242 | long_threshold = positive_changes.quantile(q=high_quantile, interpolation='linear') 243 | print('Long threshold: {:.2f}'.format(long_threshold)) 244 | short_threshold = negative_changes.quantile(q=low_quantile, interpolation='linear') 245 | print('Short threshold: {:.2f}'.format(short_threshold)) 246 | 247 | # 3. Define trading timings 248 | numUnits = pd.Series(data=[0.] * len(spread_test), index=spread_test.index, name='numUnits') 249 | longsEntry = (predictions_change > long_threshold) & (true_change.shift() > 0) 250 | longsEntry = longsEntry.shift(-1).fillna(False) 251 | numUnits[longsEntry] = 1. 252 | shortsEntry = (predictions_change < short_threshold) & (true_change.shift() < 0) 253 | shortsEntry = shortsEntry.shift(-1).fillna(False) 254 | numUnits[shortsEntry] = -1. 255 | 256 | # ffill if applicable 257 | if lag == 1: 258 | change_from_previous = predictions_change 259 | else: 260 | change_from_previous = (predictions_1 - spread_test.shift(1)).fillna(0) 261 | for i in range(1, len(numUnits) - 1): 262 | if numUnits[i] != 0: 263 | continue 264 | else: 265 | if numUnits[i - 1] == 0: 266 | continue 267 | elif numUnits[i - 1] == 1.: 268 | if change_from_previous[i + 1] > 0: 269 | numUnits[i] = 1 270 | continue 271 | elif numUnits[i - 1] == -1.: 272 | if change_from_previous[i + 1] < 0: 273 | numUnits[i] = -1. 274 | continue 275 | 276 | # 4. Calculate P&L and Returns 277 | trader = class_Trader.Trader() 278 | 279 | # concatenate for positions with not enough data to be predicted 280 | lookback = len(Y)-len(spread_test) 281 | numUnits_not_predicted = pd.Series(data=[0.] * lookback, index=Y.index[:lookback]) 282 | numUnits = pd.concat([numUnits_not_predicted, numUnits], axis=0) 283 | numUnits.name = 'numUnits' 284 | # add trade durations 285 | numUnits_df = pd.DataFrame(numUnits, index=Y.index) 286 | numUnits_df = numUnits_df.rename(columns={"positions": "numUnits"}) 287 | trading_durations = trader.add_trading_duration(numUnits_df) 288 | # calculate balance 289 | balance_summary = trader.calculate_balance(Y, X, beta, numUnits.shift(1).fillna(0), trading_durations) 290 | 291 | # summarize 292 | ret_with_costs, cum_ret_with_costs = balance_summary.returns, (balance_summary.account_balance-1) 293 | summary = pd.DataFrame(data={'prediction(t)': predictions_1.values, 294 | 'spread(t)': spread_test.values, 295 | 'predicted_change': predictions_change, 296 | 'true_change': spread_test.diff().fillna(0).values, 297 | 'position_during_day': numUnits.shift(1).fillna(0).values[lookback:], 298 | 'Y': Y[lookback:], 299 | 'X': X[lookback:], 300 | 'trading_days': trading_durations[lookback:], 301 | 'ret_with_costs': ret_with_costs[lookback:] 302 | }, 303 | index=spread_test.index) 304 | print('Accuracy of time series forecasting: {:.2f}%'.format(self.calculate_direction_accuracy(spread_test, 305 | predictions_1))) 306 | 307 | return ret_with_costs, cum_ret_with_costs, summary, balance_summary 308 | 309 | def momentum_trading(self, X, Y, spread_test, spread_train, beta, predictions, lag, low_quantile=0.10, 310 | high_quantile=0.90, multistep=0): 311 | """ 312 | This function will set the trading positions based on the forecasted spread. 
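        (Note: in this momentum variant the entry signal is the gap between the observed spread and
        its forecast, spread_test - prediction, compared against the same quantile-based thresholds.)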
313 | For each day, the function compares the predicted spread for that day with the 314 | true value of the spread in tha day before, giving the predicted spread pct change. 315 | In case it is larger than the threshold, a position is entered. 316 | Note: because a position entered in day n it is only accounted for on the day after, 317 | we shift the entered positions. 318 | : predictions: predictions should not be standardized, but with regular mean and variance. 319 | """ 320 | # 1. Get predictions change 321 | if multistep == 0: 322 | predictions_1 = predictions 323 | predictions_change = spread_test - predictions_1 324 | else: 325 | predictions_1, predictions_2 = predictions 326 | predictions_change = (predictions_2 - predictions_1.shift(lag)).fillna(0) 327 | # need to add last row 328 | predictions_change = predictions_change.append(pd.Series(data=[0], index=spread_test[-1:].index)) 329 | predictions_1 = predictions_1.append(pd.Series(data=predictions_2[-1], index=spread_test[-1:].index)) 330 | 331 | # 2. Calculate trading threshold 332 | spread_train_change = (spread_train - spread_train.shift(lag+multistep)).fillna(0) 333 | positive_changes = spread_train_change[spread_train_change > 0] 334 | negative_changes = spread_train_change[spread_train_change < 0] 335 | long_threshold = positive_changes.quantile(q=high_quantile, interpolation='linear') 336 | print('Long threshold: {:.2f}'.format(long_threshold)) 337 | short_threshold = negative_changes.quantile(q=low_quantile, interpolation='linear') 338 | print('Short threshold: {:.2f}'.format(short_threshold)) 339 | 340 | # 3. Define trading timings 341 | numUnits = pd.Series(data=[0.] * len(spread_test), index=spread_test.index, name='numUnits') 342 | longsEntry = predictions_change > long_threshold 343 | numUnits[longsEntry] = 1. 344 | shortsEntry = predictions_change < short_threshold 345 | numUnits[shortsEntry] = -1. 346 | 347 | # ffill if applicable 348 | if lag == 1: 349 | change_from_previous = predictions_change 350 | else: 351 | change_from_previous = (predictions_1 - spread_test.shift(1)).fillna(0) 352 | for i in range(1, len(numUnits) - 1): 353 | if numUnits[i] != 0: 354 | continue 355 | else: 356 | if numUnits[i - 1] == 0: 357 | continue 358 | elif numUnits[i - 1] == 1.: 359 | if change_from_previous[i] > 0: 360 | numUnits[i] = 1 361 | continue 362 | elif numUnits[i - 1] == -1.: 363 | if change_from_previous[i] < 0: 364 | numUnits[i] = -1. 365 | continue 366 | 367 | # 4. Calculate P&L and Returns 368 | trader = class_Trader.Trader() 369 | 370 | # concatenate for positions with not enough data to be predicted 371 | lookback = len(Y)-len(spread_test) 372 | numUnits_not_predicted = pd.Series(data=[0.] 
* lookback, index=Y.index[:lookback]) 373 | numUnits = pd.concat([numUnits_not_predicted, numUnits], axis=0) 374 | numUnits.name = 'numUnits' 375 | # add trade durations 376 | numUnits_df = pd.DataFrame(numUnits, index=Y.index) 377 | numUnits_df = numUnits_df.rename(columns={"positions": "numUnits"}) 378 | trading_durations = trader.add_trading_duration(numUnits_df) 379 | # calculate balance 380 | balance_summary = trader.calculate_balance(Y, X, beta, numUnits.shift(1).fillna(0), trading_durations) 381 | 382 | # summarize 383 | ret_with_costs, cum_ret_with_costs = balance_summary.returns, (balance_summary.account_balance-1) 384 | summary = pd.DataFrame(data={'prediction(t)': predictions_1.values, 385 | 'spread(t)': spread_test.values, 386 | 'spread_predicted_change': predictions_change.values, 387 | 'position_during_day': numUnits.shift(1).fillna(0).values[lookback:], 388 | '{}'.format(Y.name): Y[lookback:], 389 | '{}'.format(X.name): X[lookback:], 390 | 'trading_days': trading_durations[lookback:], 391 | 'ret_with_costs': ret_with_costs[lookback:] 392 | }, 393 | index=spread_test.index) 394 | print('Accuracy of time series forecasting: {:.2f}%'.format(self.calculate_direction_accuracy(spread_test, 395 | predictions_1))) 396 | 397 | return ret_with_costs, cum_ret_with_costs, summary, balance_summary 398 | 399 | def calculate_direction_accuracy(self, true, predictions): 400 | 401 | bins = [-np.inf, -0.00000001, 0.00000001, np.inf] 402 | names = ['-1', '0', '1'] 403 | predictions_change = predictions.diff().fillna(0) 404 | 405 | predicted_direction = pd.cut(predictions_change, bins, labels=names) 406 | true_direction = pd.cut(true.diff().fillna(0), bins, labels=names) 407 | #accuracy = len(predicted_direction[predicted_direction == true_direction])/len(predicted_direction) * 100 408 | 409 | predicted_direction_subset = predicted_direction[true_direction != '0'] 410 | true_direction_subset = true_direction[true_direction != '0'] 411 | accuracy = len(predicted_direction_subset[predicted_direction_subset == true_direction_subset]) / \ 412 | len(predicted_direction_subset) * 100 413 | 414 | return accuracy 415 | 416 | def series_to_supervised(self, data, index=None, n_in=1, n_out=1, dropnan=True): 417 | """ 418 | Frame a time series as a supervised learning dataset. 419 | Arguments: 420 | data: Sequence of observations as a list or NumPy array. 421 | n_in: Number of lag observations as input (X). 422 | n_out: Number of observations as output (y). 423 | dropnan: Boolean whether or not to drop rows with NaN values. 424 | Returns: 425 | Pandas DataFrame of series framed for supervised learning. 426 | """ 427 | n_vars = 1 if type(data) is list else data.shape[1] 428 | if index is None: 429 | df = pd.DataFrame(data) 430 | else: 431 | df = pd.DataFrame(data, index=index) 432 | cols, names = list(), list() 433 | # input sequence (t-n, ... t-1) 434 | for i in range(n_in, 0, -1): 435 | cols.append(df.shift(i)) 436 | names += [('var%d(t-%d)' % (j+1, i)) for j in range(n_vars)] 437 | # forecast sequence (t, t+1, ... 
t+n) 438 | for i in range(0, n_out): 439 | cols.append(df.shift(-i)) 440 | if i == 0: 441 | names += [('var%d(t)' % (j+1)) for j in range(n_vars)] 442 | else: 443 | names += [('var%d(t+%d)' % (j+1, i)) for j in range(n_vars)] 444 | # put it all together 445 | agg = pd.concat(cols, axis=1) 446 | agg.columns = names 447 | # drop rows with NaN values 448 | if dropnan: 449 | agg.dropna(inplace=True) 450 | return agg 451 | 452 | def plot_loss(self, history): 453 | """ 454 | Function to plot loss function. 455 | Arguments: 456 | history: History object with data from training. 457 | title: Plot title. 458 | """ 459 | 460 | plt.plot(history.history['loss'], label = "training") 461 | plt.plot(history.history['val_loss'], label = "validation") 462 | 463 | def prepare_train_data(self, spread, model_config): 464 | """ 465 | 466 | :param spread: spread of the pair being considered 467 | :param model_config: dictionary with model parameters 468 | :return: 469 | tuple with training data 470 | tuple with validation data 471 | y_series in validation period (to compare with predictions later on) 472 | """ 473 | train_val_split = model_config['train_val_split'] 474 | 475 | scaler = StandardScaler() 476 | spread_norm = scaler.fit_transform(spread.values.reshape(spread.shape[0], 1)) 477 | spread_norm = pd.Series(data=spread_norm.flatten(), index=spread.index) 478 | forecasting_data = self.series_to_supervised(list(spread_norm), spread.index, model_config['n_in'], 479 | model_config['n_out'], dropnan=True) 480 | # define dataset 481 | if model_config['n_out'] == 1: 482 | X_series = forecasting_data.drop(columns='var1(t)') 483 | y_series = forecasting_data['var1(t)'] 484 | elif model_config['n_out'] == 2: 485 | X_series = forecasting_data.drop(columns=['var1(t)', 'var1(t+1)']) 486 | y_series = forecasting_data[['var1(t)', 'var1(t+1)']] 487 | 488 | # split 489 | X_series_train = X_series[:train_val_split] 490 | X_series_val = X_series[train_val_split:] 491 | y_series_train = y_series[:train_val_split] 492 | y_series_val = y_series[train_val_split:] 493 | 494 | X_train = X_series_train.values 495 | X_val = X_series_val.values 496 | y_train = y_series_train.values 497 | y_val = y_series_val.values 498 | 499 | return (X_train, y_train), (X_val, y_val), y_series_val, scaler 500 | 501 | def prepare_test_data(self, spread, model_config, scaler): 502 | """ 503 | """ 504 | # normalize spread 505 | spread_norm = scaler.transform(spread.values.reshape(spread.shape[0], 1)) 506 | spread_norm = pd.Series(data=spread_norm.flatten(), index=spread.index) 507 | forecasting_data = self.series_to_supervised(list(spread_norm), spread.index, model_config['n_in'], 508 | model_config['n_out'], dropnan=True) 509 | # define dataset 510 | if model_config['n_out'] == 1: 511 | X_series_test = forecasting_data.drop(columns='var1(t)') 512 | y_series_test = forecasting_data['var1(t)'] 513 | elif model_config['n_out'] == 2: 514 | X_series_test = forecasting_data.drop(columns=['var1(t)', 'var1(t+1)']) 515 | y_series_test = forecasting_data[['var1(t)', 'var1(t+1)']] 516 | 517 | X_test = X_series_test.values 518 | y_test = y_series_test.values 519 | 520 | return (X_test, y_test), y_series_test 521 | 522 | def destandardize(self, predictions, spread_mean, spread_std): 523 | """ 524 | This function transforms the normalized predictions into the original space. 
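        Example (sketch): a normalized prediction of 0.5 with spread_mean=2.0 and spread_std=0.4
        maps back to 0.5 * 0.4 + 2.0 = 2.2 in spread units.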
525 | """ 526 | return predictions * spread_std + spread_mean 527 | 528 | def train_models(self, pairs, model_config, model_type='mlp'): 529 | """ 530 | This function trains the models for every pair identified. 531 | 532 | :param pairs: list with pairs and corresponding statistics 533 | :param model_config: dictionary with info for the model 534 | :return: all models 535 | """ 536 | 537 | models = [] 538 | for pair in pairs: 539 | 540 | # prepare train data 541 | spread = pair[2]['spread'] 542 | train_data, validation_data, y_series_val, scaler = self.prepare_train_data(spread, model_config) 543 | # prepare test data 544 | spread_test = pair[2]['Y_test']-pair[2]['coint_coef']*pair[2]['X_test'] 545 | test_data, y_series_test = self.prepare_test_data(spread_test, model_config, scaler) 546 | 547 | # train model and get predictions 548 | if model_type == 'mlp': 549 | model, history, score, predictions_train, predictions_val, predictions_test = \ 550 | self.apply_MLP(X=train_data[0], 551 | y=train_data[1], 552 | validation_data=validation_data, 553 | test_data=test_data, 554 | n_in=model_config['n_in'], 555 | hidden_nodes=model_config['hidden_nodes'], 556 | epochs=model_config['epochs'], 557 | optimizer=model_config['optimizer'], 558 | loss_fct=model_config['loss_fct'], 559 | batch_size=model_config['batch_size']) 560 | 561 | elif model_type == 'rnn': 562 | model, history, score, predictions_train, predictions_val, predictions_test = \ 563 | self.apply_RNN(X=train_data[0], 564 | y=train_data[1], 565 | validation_data=validation_data, 566 | test_data=test_data, 567 | hidden_nodes=model_config['hidden_nodes'], 568 | epochs=model_config['epochs'], 569 | optimizer=model_config['optimizer'], 570 | loss_fct=model_config['loss_fct'], 571 | batch_size=model_config['batch_size']) 572 | elif model_type == 'encoder_decoder': 573 | model, history, score, predictions_train, predictions_val, predictions_test = \ 574 | self.apply_encoder_decoder(X=train_data[0], 575 | y=train_data[1], 576 | validation_data=validation_data, 577 | test_data=test_data, 578 | n_in=model_config['n_in'], 579 | n_out=model_config['n_out'], 580 | hidden_nodes=model_config['hidden_nodes'], 581 | epochs=model_config['epochs'], 582 | optimizer=model_config['optimizer'], 583 | loss_fct=model_config['loss_fct'], 584 | batch_size=model_config['batch_size']) 585 | 586 | # validation 587 | predictions_val = pd.DataFrame({'t': predictions_val.reshape(predictions_val.shape[0], 588 | predictions_val.shape[1])[:, 0], 589 | 't+1': predictions_val.reshape(predictions_val.shape[0], 590 | predictions_val.shape[1])[:, 1]}, 591 | index=y_series_val.index) 592 | predictions_val['t'] = scaler.inverse_transform(np.array(predictions_val['t'])) 593 | predictions_val['t+1'] = scaler.inverse_transform(np.array(predictions_val['t+1'])) 594 | 595 | # test 596 | predictions_test = pd.DataFrame({'t': predictions_test.reshape(predictions_test.shape[0], 597 | predictions_test.shape[1])[:, 0], 598 | 't+1': predictions_test.reshape(predictions_test.shape[0], 599 | predictions_test.shape[1])[:, 1]}, 600 | index=y_series_test.index) 601 | predictions_test['t'] = scaler.inverse_transform(np.array(predictions_test['t'])) 602 | predictions_test['t+1'] = scaler.inverse_transform(np.array(predictions_test['t+1'])) 603 | 604 | # train 605 | predictions_train = predictions_val.copy() # not relevant, just to fill up 606 | 607 | # transform predictions to series 608 | if model_type != 'encoder_decoder': 609 | predictions_train = scaler.inverse_transform(predictions_train) 610 
| predictions_val = scaler.inverse_transform(predictions_val) 611 | predictions_test = scaler.inverse_transform(predictions_test) 612 | predictions_train = pd.Series(data=predictions_train.flatten(), 613 | index=spread[model_config['n_in']:-len(y_series_val)].index) 614 | predictions_val = pd.Series(data=predictions_val.flatten(), index=y_series_val.index) 615 | predictions_test = pd.Series(data=predictions_test.flatten(), 616 | index=spread_test[-len(test_data[1]):].index) 617 | 618 | # save all info 619 | # check epochs 620 | if len(history.history['val_loss']) == 500: 621 | epoch_stop = 500 622 | else: 623 | epoch_stop = len(history.history['val_loss']) - 50 # patience=50 624 | 625 | model_info = {'leg1': pair[0], 626 | 'leg2': pair[1], 627 | 'standardization_dict': 'scaler', 628 | 'history': history.history, 629 | 'score': score, 630 | 'epoch_stop': epoch_stop, 631 | 'predictions_train': predictions_train.copy(), 632 | 'predictions_val': predictions_val.copy(), 633 | 'predictions_test': predictions_test.copy() 634 | } 635 | models.append(model_info) 636 | 637 | # append model configuration on last position 638 | models.append(model_config) 639 | 640 | return models 641 | 642 | # ################################### MLP ############################################ 643 | def apply_MLP(self, X, y, validation_data, test_data, n_in, hidden_nodes, epochs, optimizer, loss_fct, 644 | batch_size=128): 645 | 646 | # define validation set 647 | X_val = validation_data[0] 648 | y_val = validation_data[1] 649 | 650 | # define test set 651 | X_test = test_data[0] 652 | y_test = test_data[1] 653 | 654 | model = Sequential() 655 | glorot_init = glorot_normal(seed=None) 656 | for i in range(len(hidden_nodes)): 657 | model.add(Dense(hidden_nodes[i], activation='relu', input_dim=n_in, kernel_initializer=glorot_init)) 658 | #model.add(Dropout(0.1)) 659 | model.add(Dense(1)) 660 | model.compile(optimizer=optimizer, loss=loss_fct, metrics=['mae']) 661 | model.summary() 662 | if len(hidden_nodes)>1: 663 | plot_model(model, to_file='/content/drive/PairsTrading/mlp_models/model_{}-{}_{}.png'.format(str(n_in), 664 | str(hidden_nodes[0]), str(hidden_nodes[1]), show_shapes=True, show_layer_names=False)) 665 | else: 666 | plot_model(model, to_file='/content/drive/PairsTrading/mlp_models/model_{}-{}.png'.format(str(n_in), 667 | str(hidden_nodes[0])), show_shapes=True, show_layer_names=False) 668 | #print(keras2ascii(model)) 669 | 670 | # simple early stopping 671 | es = EarlyStopping(monitor='val_loss', mode='min', verbose=1, patience=50) 672 | 673 | history = model.fit(X, y, epochs=epochs, verbose=1, validation_data=validation_data, 674 | shuffle=False, batch_size=batch_size, callbacks=[es]) 675 | 676 | # scores 677 | if len(history.history['loss']) < 500: 678 | train_score = [min(history.history['loss']), min(history.history['mean_absolute_error'])] 679 | val_score = [min(history.history['val_loss']),min(history.history['val_mean_absolute_error'])] 680 | else: 681 | train_score = [history.history['loss'][-1], history.history['mean_absolute_error'][-1]] 682 | val_score = [history.history['val_loss'][-1], history.history['val_mean_absolute_error'][-1]] 683 | score = {'train': train_score, 'val': val_score} 684 | 685 | # predictions 686 | predictions_train = model.predict(X, verbose=1) 687 | predictions_validation = model.predict(X_val, verbose=1) 688 | predictions_test = model.predict(X_test, verbose=1) 689 | 690 | print('------------------------------------------------------------') 691 | print('The mse train loss 
is: ', train_score[0]) 692 | print('The mae train loss is: ', train_score[1]) 693 | print('The mse test loss is: ', val_score[0]) 694 | print('The mae test loss is: ', val_score[1]) 695 | print('------------------------------------------------------------') 696 | 697 | return model, history, score, predictions_train, predictions_validation, predictions_test 698 | 699 | # ################################### RNN ############################################ 700 | def apply_RNN(self, X, y, validation_data, test_data, hidden_nodes, epochs, optimizer, loss_fct, 701 | batch_size=256): 702 | """ 703 | Note: CuDNNLSTM provides a faster implementation on GPU than regular LSTM 704 | :param X: 705 | :param y: 706 | :param validation_data: 707 | :param test_data: 708 | :param hidden_nodes: 709 | :param epochs: 710 | :param optimizer: 711 | :param loss_fct: 712 | :param batch_size: 713 | :return: 714 | """ 715 | # reshape 716 | X = X.reshape((X.shape[0], X.shape[1], 1)) 717 | X_val = validation_data[0].reshape((validation_data[0].shape[0], validation_data[0].shape[1], 1)) 718 | y_val = validation_data[1] 719 | X_test = test_data[0].reshape((test_data[0].shape[0], test_data[0].shape[1], 1)) 720 | y_test = test_data[1] 721 | 722 | # define model 723 | model = Sequential() 724 | glorot_init = glorot_normal(seed=None) 725 | # add GRU layers 726 | if len(hidden_nodes) == 1: 727 | #model.add(LSTM(hidden_nodes[0], activation='relu', input_shape=(X.shape[1], 1), 728 | # kernel_initializer=glorot_init)) 729 | model.add(CuDNNLSTM(hidden_nodes[0], input_shape=(X.shape[1], 1), kernel_initializer=glorot_init)) 730 | else: 731 | for i in range(len(hidden_nodes)-1): 732 | if i == 0: 733 | #model.add(LSTM(hidden_nodes[0], activation='relu', input_shape=(X.shape[1], 1), 734 | # return_sequences=True, kernel_initializer=glorot_init)) 735 | model.add(CuDNNLSTM(hidden_nodes[0], input_shape=(X.shape[1], 1), 736 | return_sequences=True, kernel_initializer=glorot_init)) 737 | else: 738 | #model.add(LSTM(hidden_nodes[i], activation='relu', return_sequences=True, 739 | # kernel_initializer=glorot_init)) 740 | model.add(CuDNNLSTM(hidden_nodes[i], return_sequences=True, 741 | kernel_initializer=glorot_init)) 742 | # add dropout in between 743 | model.add(Dropout(0.1)) 744 | 745 | #model.add(LSTM(hidden_nodes[-1], activation='relu', kernel_initializer=glorot_init)) # last layer does not return sequences 746 | model.add(CuDNNLSTM(hidden_nodes[-1], kernel_initializer=glorot_init))# last layer does not return sequences 747 | # add regularization 748 | #model.add(Dropout(0.1)) 749 | # add dense layer for output 750 | model.add(Dense(1, kernel_initializer=glorot_init)) 751 | model.compile(optimizer=optimizer, loss=loss_fct, metrics=['mae']) 752 | model.summary() 753 | plot_model(model, to_file='/content/drive/PairsTrading/rnn_models/model.png', show_shapes=True, 754 | show_layer_names=False) 755 | 756 | # simple early stopping 757 | es = EarlyStopping(monitor='val_loss', mode='min', verbose=1, patience=50) 758 | 759 | # fit model 760 | history = model.fit(X, y, epochs=epochs, verbose=1, validation_data=(X_val, y_val), shuffle=False, 761 | batch_size=batch_size, callbacks=[es]) 762 | 763 | # scores 764 | if len(history.history['loss']) < 500: 765 | train_score = [min(history.history['loss']), min(history.history['mean_absolute_error'])] 766 | val_score = [min(history.history['val_loss']),min(history.history['val_mean_absolute_error'])] 767 | else: 768 | train_score = [history.history['loss'][-1], 
history.history['mean_absolute_error'][-1]] 769 | val_score = [history.history['val_loss'][-1], history.history['val_mean_absolute_error'][-1]] 770 | 771 | score = {'train': train_score, 'val': val_score} 772 | 773 | # removed test score calculation to save time 774 | #test_score = model.evaluate(X_test, y_test, verbose=1) 775 | # , 'test': test_score} 776 | 777 | predictions_train = model.predict(X, verbose=1) 778 | predictions_validation = model.predict(X_val, verbose=1) 779 | predictions_test = model.predict(X_test, verbose=1) 780 | 781 | print('------------------------------------------------------------') 782 | print('The mse train loss is: ', train_score[0]) 783 | print('The mae train loss is: ', train_score[1]) 784 | print('The mse test loss is: ', val_score[0]) 785 | print('The mae test loss is: ', val_score[1]) 786 | print('------------------------------------------------------------') 787 | 788 | return model, history, score, predictions_train, predictions_validation, predictions_test 789 | 790 | # ################################### ENCODER DECODER ############################################ 791 | def apply_encoder_decoder(self, X, y, validation_data, test_data, n_in, n_out, hidden_nodes, 792 | epochs, optimizer, loss_fct, batch_size=512): 793 | 794 | # reshape from [samples, timesteps] into [samples, timesteps, features] 795 | X = X.reshape((X.shape[0], X.shape[1], 1)) 796 | y = y.reshape((y.shape[0], y.shape[1], 1)) 797 | X_val = validation_data[0].reshape((validation_data[0].shape[0], validation_data[0].shape[1], 1)) 798 | y_val = validation_data[1].reshape((validation_data[1].shape[0], validation_data[1].shape[1], 1)) 799 | X_test = test_data[0].reshape((test_data[0].shape[0], test_data[0].shape[1], 1)) 800 | y_test = test_data[1].reshape((test_data[1].shape[0], test_data[1].shape[1], 1)) 801 | 802 | # define model 803 | glorot_init = glorot_normal(seed=None) 804 | model = Sequential() 805 | 806 | # CuDNNLSTM provides a faster implementation on GPU 807 | #model.add(LSTM(hidden_nodes[0], activation='relu', input_shape=(n_in, 1), kernel_initializer=glorot_init)) 808 | model.add(CuDNNLSTM(hidden_nodes[0], input_shape=(n_in, 1), kernel_initializer=glorot_init)) 809 | model.add(RepeatVector(n_out)) 810 | 811 | # CuDNNLSTM provides a faster implementation on GPU 812 | #model.add(LSTM(hidden_nodes[1], activation='relu', return_sequences=True, kernel_initializer=glorot_init)) 813 | model.add(CuDNNLSTM(hidden_nodes[1], return_sequences=True, kernel_initializer=glorot_init)) 814 | 815 | #model.add(Dropout(0.1)) 816 | model.add(TimeDistributed(Dense(1, kernel_initializer=glorot_init))) 817 | model.compile(optimizer=optimizer, loss=loss_fct, metrics=['mae']) 818 | model.summary() 819 | plot_model(model, to_file='/content/drive/PairsTrading/encoder_decoder/model.png', show_shapes=True, 820 | show_layer_names=False) 821 | 822 | # fit model 823 | # simple early stopping 824 | es = EarlyStopping(monitor='val_loss', mode='min', verbose=1, patience=50) 825 | 826 | # fit model 827 | history = model.fit(X, y, epochs=epochs, verbose=1, validation_data=(X_val, y_val), shuffle=False, 828 | batch_size=batch_size, callbacks=[es]) 829 | 830 | # scores 831 | if len(history.history['loss']) < 500: 832 | train_score = [min(history.history['loss']), min(history.history['mean_absolute_error'])] 833 | val_score = [min(history.history['val_loss']), min(history.history['val_mean_absolute_error'])] 834 | else: 835 | train_score = [history.history['loss'][-1], history.history['mean_absolute_error'][-1]] 836 | 
val_score = [history.history['val_loss'][-1], history.history['val_mean_absolute_error'][-1]] 837 | 838 | score = {'train': train_score, 'val': val_score} 839 | 840 | predictions_train = model.predict(X, verbose=1) 841 | predictions_validation = model.predict(X_val, verbose=1) 842 | predictions_test = model.predict(X_test, verbose=1) 843 | 844 | print('------------------------------------------------------------') 845 | print('The mse train loss is: ', train_score[0]) 846 | print('The mae train loss is: ', train_score[1]) 847 | print('The mse test loss is: ', val_score[0]) 848 | print('The mae test loss is: ', val_score[1]) 849 | print('------------------------------------------------------------') 850 | 851 | return model, history, score, predictions_train, predictions_validation, predictions_test 852 | 853 | def display_forecasting_score(self, models): 854 | 855 | # initialize storage variables 856 | best_score = 99999 857 | best_model = None 858 | 859 | all_models = models 860 | for configuration in all_models: 861 | config = configuration[-1] 862 | print('\nNEW CONFIGURATION:') 863 | print('Configuration: ', config) 864 | mae_train, mse_train = list(), list() 865 | mae_val, mse_val = list(), list() 866 | for pair_i in range(len(configuration) - 1): 867 | score_train = configuration[pair_i]['score']['train'] 868 | score_val = configuration[pair_i]['score']['val'] 869 | # print('MAE: {:.2f}%'.format(score[1])) 870 | mse_train.append(score_train[0]) 871 | mae_train.append(score_train[1]) 872 | mse_val.append(score_val[0]) 873 | mae_val.append(score_val[1]) 874 | print('\nPair loaded: {}_{}: Epochs: {} Val_MSE: {}'.format(configuration[pair_i]['leg1'], 875 | configuration[pair_i]['leg2'], 876 | configuration[pair_i]['epoch_stop'], 877 | score_val[0] 878 | )) 879 | if (np.mean(mse_val)) < best_score: 880 | best_score = np.mean(mse_val) 881 | best_model = config 882 | 883 | print('\nCONFIGURATION TRAIN MSE ERROR: {:.4f}E-4'.format(np.mean(mse_train) * 10000)) 884 | print('CONFIGURATION TRAIN MAE ERROR: {:.4f}'.format(np.mean(mae_train))) 885 | print('\nCONFIGURATION VAL MSE ERROR: {:.4f}E-4'.format(np.mean(mse_val) * 10000)) 886 | print('CONFIGURATION VAL MAE ERROR: {:.4f}'.format(np.mean(mae_val))) 887 | 888 | return (best_model, best_score) 889 | 890 | def run_specific_model(self, n_in, hidden_nodes, pairs, path='models/', train_val_split='2017-01-01', lag=1, 891 | multistep=0, low_quantile=0.10, high_quantile=0.90): 892 | 893 | nodes_name = str(hidden_nodes[0]) + '_' + str(hidden_nodes[1]) if len(hidden_nodes) > 1 else str(hidden_nodes[0]) 894 | file_name = 'models_n_in-' + str(n_in) + '_hidden_nodes-' + nodes_name + '.pkl' 895 | 896 | with open(path + file_name, 'rb') as f: 897 | model = pickle.load(f) 898 | 899 | model_cumret, model_sharpe_ratio = list(), list() 900 | balance_summaries, summaries = list(), list() 901 | for pair_i in range(len(model) - 1): 902 | #print('\nPair loaded: {}_{}:'.format(model[pair_i]['leg1'], model[pair_i]['leg2'])) 903 | #print('Check pairs: {}_{}.'.format(pairs[pair_i][0], pairs[pair_i][1])) 904 | predictions = model[pair_i]['predictions_val'] 905 | 906 | ret, cumret, summary, balance_summary = self.forecast_spread_trading( 907 | X=pairs[pair_i][2]['X_train'][train_val_split:], 908 | Y=pairs[pair_i][2]['Y_train'][train_val_split:], 909 | spread_test=pairs[pair_i][2]['spread'][train_val_split:], 910 | spread_train=pairs[pair_i][2]['spread'][:train_val_split], 911 | beta=pairs[pair_i][2]['coint_coef'], 912 | predictions=predictions, 913 | lag=lag, 914 | 
low_quantile=low_quantile, 915 | high_quantile=high_quantile, 916 | multistep=multistep) 917 | 918 | #print('Accumulated return: {:.2f}%'.format(cumret[-1] * 100)) 919 | 920 | trader = class_Trader.Trader() 921 | if np.std(ret) != 0: 922 | sharpe_ratio = trader.calculate_sharpe_ratio(1, 252, ret) 923 | else: 924 | sharpe_ratio = 0 925 | #print('Sharpe Ratio:', sharpe_ratio) 926 | 927 | model_cumret.append(cumret[-1] * 100) 928 | model_sharpe_ratio.append(sharpe_ratio) 929 | summaries.append(summary) 930 | balance_summaries.append(balance_summary) 931 | 932 | return model, model_cumret, model_sharpe_ratio, summaries, balance_summaries 933 | 934 | def test_specific_model(self, n_in, hidden_nodes, pairs, path, train_test_split='2018-01-01', lag=1, 935 | low_quantile=0.10, high_quantile=0.90, multistep=0, profitable_pairs_indices=None): 936 | 937 | nodes_name = str(hidden_nodes[0]) + '_' + str(hidden_nodes[1]) if len(hidden_nodes) > 1 else str( 938 | hidden_nodes[0]) 939 | file_name = 'models_n_in-' + str(n_in) + '_hidden_nodes-' + nodes_name + '.pkl' 940 | 941 | with open(path + file_name, 'rb') as f: 942 | model = pickle.load(f) 943 | 944 | model_cumret, model_sharpe_ratio = list(), list() 945 | summaries, balance_summaries = list(), list() 946 | for pair_i in range(len(model) - 1): 947 | if pair_i in profitable_pairs_indices: 948 | #print('\nPair loaded: {}_{}:'.format(model[pair_i]['leg1'], model[pair_i]['leg2'])) 949 | #print('Check pairs: {}_{}.'.format(pairs[pair_i][0], pairs[pair_i][1])) 950 | predictions = model[pair_i]['predictions_test'] 951 | spread_test = pairs[pair_i][2]['Y_test'] - pairs[pair_i][2]['coint_coef'] * pairs[pair_i][2]['X_test'] 952 | 953 | ret, cumret, summary, balance_summary = self.forecast_spread_trading( 954 | X=pairs[pair_i][2]['X_test'], 955 | Y=pairs[pair_i][2]['Y_test'], 956 | spread_test=spread_test[-len(predictions)-multistep:], 957 | spread_train=pairs[pair_i][2]['spread'][:train_test_split], 958 | beta=pairs[pair_i][2]['coint_coef'], 959 | predictions=predictions, 960 | lag=lag, 961 | low_quantile=low_quantile, 962 | high_quantile=high_quantile, 963 | multistep=multistep) 964 | 965 | #print('Accumulated return: {:.2f}%'.format(cumret[-1] * 100)) 966 | 967 | trader = class_Trader.Trader() 968 | if np.std(ret) != 0: 969 | sharpe_ratio = trader.calculate_sharpe_ratio(1, 252, ret) 970 | else: 971 | sharpe_ratio = 0 972 | #print('Sharpe Ratio:', sharpe_ratio) 973 | 974 | model_cumret.append(cumret[-1] * 100) 975 | model_sharpe_ratio.append(sharpe_ratio) 976 | summaries.append(summary) 977 | balance_summaries.append(balance_summary) 978 | 979 | return model, model_cumret, model_sharpe_ratio, summaries, balance_summaries 980 | 981 | 982 | -------------------------------------------------------------------------------- /classes/class_SeriesAnalyser.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import pandas as pd 3 | import sys 4 | import collections, functools, operator 5 | 6 | import statsmodels.api as sm 7 | from statsmodels.tsa.stattools import coint, adfuller 8 | 9 | from sklearn.cluster import DBSCAN 10 | from sklearn.cluster import OPTICS, cluster_optics_dbscan 11 | from sklearn.decomposition import PCA 12 | from sklearn import preprocessing 13 | from sklearn.metrics import silhouette_score 14 | 15 | # just set the seed for the random number generator 16 | np.random.seed(107) 17 | 18 | 19 | class SeriesAnalyser: 20 | """ 21 | This class contains a set of functions to deal with time series 
analysis. 22 | """ 23 | 24 | def __init__(self): 25 | """ 26 | :initial elements 27 | """ 28 | 29 | def check_for_stationarity(self, X, subsample=0): 30 | """ 31 | H_0 in adfuller is unit root exists (non-stationary). 32 | We must observe significant p-value to convince ourselves that the series is stationary. 33 | 34 | :param X: time series 35 | :param subsample: boolean indicating whether to subsample series 36 | :return: adf results 37 | """ 38 | if subsample != 0: 39 | frequency = round(len(X)/subsample) 40 | subsampled_X = X[0::frequency] 41 | result = adfuller(subsampled_X) 42 | else: 43 | result = adfuller(X) 44 | # result contains: 45 | # 0: t-statistic 46 | # 1: p-value 47 | # others: please see https://www.statsmodels.org/dev/generated/statsmodels.tsa.stattools.adfuller.html 48 | 49 | return {'t_statistic': result[0], 'p_value': result[1], 'critical_values': result[4]} 50 | 51 | def check_properties(self, train_series, test_series, p_value_threshold, min_half_life=78, max_half_life=20000, 52 | min_zero_crossings=0, hurst_threshold=0.5, subsample=0): 53 | """ 54 | Gets two time series as inputs and provides information concerning cointegration stasttics 55 | Y - b*X : Y is dependent, X is independent 56 | """ 57 | 58 | # for some reason is not giving right results 59 | # t_statistic, p_value, crit_value = coint(X,Y, method='aeg') 60 | 61 | # perform test manually in both directions 62 | X = train_series[0] 63 | Y = train_series[1] 64 | pairs = [(X, Y), (Y, X)] 65 | pair_stats = [0] * 2 66 | criteria_not_verified = 'cointegration' 67 | 68 | # first of all, must verify price series S1 and S2 are I(1) 69 | stats_Y = self.check_for_stationarity(np.asarray(Y), subsample=subsample) 70 | if stats_Y['p_value'] > 0.10: 71 | stats_X = self.check_for_stationarity(np.asarray(X), subsample=subsample) 72 | if stats_X['p_value'] > 0.10: 73 | # conditions to test cointegration verified 74 | 75 | for i, pair in enumerate(pairs): 76 | S1 = np.asarray(pair[0]) 77 | S2 = np.asarray(pair[1]) 78 | S1_c = sm.add_constant(S1) 79 | 80 | # Y = bX + c 81 | # ols: (Y, X) 82 | results = sm.OLS(S2, S1_c).fit() 83 | b = results.params[1] 84 | 85 | if b > 0: 86 | spread = pair[1] - b * pair[0] # as Pandas Series 87 | spread_array = np.asarray(spread) # as array for faster computations 88 | 89 | stats = self.check_for_stationarity(spread_array, subsample=subsample) 90 | if stats['p_value'] < p_value_threshold: # verifies required pvalue 91 | criteria_not_verified = 'hurst_exponent' 92 | 93 | hurst_exponent = self.hurst(spread_array) 94 | if hurst_exponent < hurst_threshold: 95 | criteria_not_verified = 'half_life' 96 | 97 | hl = self.calculate_half_life(spread_array) 98 | if (hl >= min_half_life) and (hl < max_half_life): 99 | criteria_not_verified = 'mean_cross' 100 | 101 | zero_cross = self.zero_crossings(spread_array) 102 | if zero_cross >= min_zero_crossings: 103 | criteria_not_verified = 'None' 104 | 105 | pair_stats[i] = {'t_statistic': stats['t_statistic'], 106 | 'critical_val': stats['critical_values'], 107 | 'p_value': stats['p_value'], 108 | 'coint_coef': b, 109 | 'zero_cross': zero_cross, 110 | 'half_life': int(round(hl)), 111 | 'hurst_exponent': hurst_exponent, 112 | 'spread': spread, 113 | 'Y_train': pair[1], 114 | 'X_train': pair[0] 115 | } 116 | 117 | if pair_stats[0] == 0 and pair_stats[1] == 0: 118 | result = None 119 | return result, criteria_not_verified 120 | 121 | elif pair_stats[0] == 0: 122 | result = 1 123 | elif pair_stats[1] == 0: 124 | result = 0 125 | else: # both combinations are 
possible 126 | # select lowest t-statistic as representative test 127 | if abs(pair_stats[0]['t_statistic']) > abs(pair_stats[1]['t_statistic']): 128 | result = 0 129 | else: 130 | result = 1 131 | 132 | if result == 0: 133 | result = pair_stats[0] 134 | result['X_test'] = test_series[0] 135 | result['Y_test'] = test_series[1] 136 | elif result == 1: 137 | result = pair_stats[1] 138 | result['X_test'] = test_series[1] 139 | result['Y_test'] = test_series[0] 140 | 141 | return result, criteria_not_verified 142 | 143 | def find_pairs(self, data_train, data_test, p_value_threshold, min_half_life=78, max_half_life=20000, 144 | min_zero_crossings=0, hurst_threshold=0.5, subsample=0): 145 | """ 146 | This function receives a df with the different securities as columns, and aims to find tradable 147 | pairs within this world. There is a df containing the training data and another one containing test data 148 | Tradable pairs are those that verify: 149 | - cointegration 150 | - minimium half life 151 | - minimium zero crossings 152 | 153 | :param data_train: df with training prices in columns 154 | :param data_test: df with testing prices in columns 155 | :param p_value_threshold: pvalue threshold for a pair to be cointegrated 156 | :param min_half_life: minimium half life value of the spread to consider the pair 157 | :param min_zero_crossings: minimium number of allowed zero crossings 158 | :param hurst_threshold: mimimium acceptable number for hurst threshold 159 | :return: pairs that passed test 160 | """ 161 | n = data_train.shape[1] 162 | keys = data_train.keys() 163 | pairs_fail_criteria = {'cointegration': 0, 'hurst_exponent': 0, 'half_life': 0, 'mean_cross': 0, 'None': 0} 164 | pairs = [] 165 | for i in range(n): 166 | for j in range(i + 1, n): 167 | S1_train = data_train[keys[i]]; S2_train = data_train[keys[j]] 168 | S1_test = data_test[keys[i]]; S2_test = data_test[keys[j]] 169 | result, criteria_not_verified = self.check_properties((S1_train, S2_train), (S1_test, S2_test), 170 | p_value_threshold, min_half_life, max_half_life, 171 | min_zero_crossings, hurst_threshold, subsample) 172 | pairs_fail_criteria[criteria_not_verified] += 1 173 | if result is not None: 174 | pairs.append((keys[i], keys[j], result)) 175 | 176 | 177 | return pairs, pairs_fail_criteria 178 | 179 | def pairs_overlap(self, pairs, p_value_threshold, min_zero_crossings, min_half_life, hurst_threshold): 180 | """ 181 | This function receives the pairs identified in the training set, and returns a list of the pairs 182 | which are still cointegrated in the test set. 
183 | 184 | :param pairs: list of pairs in the train set for which to verify cointegration in the test set 185 | :param p_value_threshold: p_value to consider cointegration 186 | :param min_zero_crossings: zero crossings to consider cointegration 187 | :param min_half_life: minimum half-life to consider cointegration 188 | :param hurst_threshold: maximum threshold to consider cointegration 189 | 190 | :return: list with pairs overlapped 191 | :return: list with indices from the pairs overlapped 192 | """ 193 | pairs_overlapped = [] 194 | pairs_overlapped_index = [] 195 | 196 | for index, pair in enumerate(pairs): 197 | # get consituents 198 | X = pair[2]['X_test'] 199 | Y = pair[2]['Y_test'] 200 | # check if pairs is valid 201 | series_name = X.name 202 | X = sm.add_constant(X) 203 | results = sm.OLS(Y, X).fit() 204 | X = X[series_name] 205 | b = results.params[X.name] 206 | spread = Y - b * X 207 | stats = self.check_for_stationarity(pd.Series(spread, name='Spread')) 208 | 209 | if stats['p_value'] < p_value_threshold: # verifies required pvalue 210 | hl = self.calculate_half_life(spread) 211 | if hl >= min_half_life: # verifies required half life 212 | zero_cross = self.zero_crossings(spread) 213 | if zero_cross >= min_zero_crossings: # verifies required zero crossings 214 | hurst_exponent = self.hurst(spread) 215 | if hurst_exponent < hurst_threshold: # verifies hurst exponent 216 | pairs_overlapped.append(pair) 217 | pairs_overlapped_index.append(index) 218 | 219 | return pairs_overlapped, pairs_overlapped_index 220 | 221 | def zscore(self, series): 222 | """ 223 | Returns the nromalized time series assuming a normal distribution 224 | """ 225 | return (series-series.mean())/np.std(series) 226 | 227 | def calculate_half_life(self, z_array): 228 | """ 229 | This function calculates the half life parameter of a 230 | mean reversion series 231 | """ 232 | z_lag = np.roll(z_array, 1) 233 | z_lag[0] = 0 234 | z_ret = z_array - z_lag 235 | z_ret[0] = 0 236 | 237 | # adds intercept terms to X variable for regression 238 | z_lag2 = sm.add_constant(z_lag) 239 | 240 | model = sm.OLS(z_ret[1:], z_lag2[1:]) 241 | res = model.fit() 242 | 243 | halflife = -np.log(2) / res.params[1] 244 | 245 | return halflife 246 | 247 | def hurst(self, ts): 248 | """ 249 | Returns the Hurst Exponent of the time series vector ts. 250 | Series vector ts should be a price series. 251 | Source: https://www.quantstart.com/articles/Basics-of-Statistical-Mean-Reversion-Testing""" 252 | # Create the range of lag values 253 | lags = range(2, 100) 254 | 255 | # Calculate the array of the variances of the lagged differences 256 | # Here it calculates the variances, but why it uses 257 | # standard deviation and then make a root of it? 
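        # clarification of the step below: for a series with Hurst exponent H, the standard deviation of
        # the lagged differences scales roughly as lag**H, so tau = sqrt(std) scales as lag**(H/2).
        # The slope of the log(tau)-vs-log(lag) fit is therefore H/2, which is why the function
        # returns poly[0] * 2.0 rather than poly[0] itself.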
258 | tau = [np.sqrt(np.std(np.subtract(ts[lag:], ts[:-lag]))) for lag in lags] 259 | 260 | # Use a linear fit to estimate the Hurst Exponent 261 | poly = np.polyfit(np.log(lags), np.log(tau), 1) 262 | 263 | # Return the Hurst exponent from the polyfit output 264 | return poly[0] * 2.0 265 | 266 | def variance_ratio(self, ts, lag=2): 267 | """ 268 | Returns the variance ratio test result 269 | Source: https://gist.github.com/jcorrius/56b4983ca059e69f2d2df38a3a05e225#file-variance_ratio-py 270 | """ 271 | # make sure we are working with an array, convert if necessary 272 | ts = np.asarray(ts) 273 | 274 | # Apply the formula to calculate the test 275 | n = len(ts) 276 | mu = sum(ts[1:n] - ts[:n - 1]) / n 277 | m = (n - lag + 1) * (1 - lag / n) 278 | b = sum(np.square(ts[1:n] - ts[:n - 1] - mu)) / (n - 1) 279 | t = sum(np.square(ts[lag:n] - ts[:n - lag] - lag * mu)) / m 280 | return t / (lag * b) 281 | 282 | def zero_crossings(self, x): 283 | """ 284 | Function that counts the number of zero crossings of a given signal 285 | :param x: the signal to be analyzed 286 | """ 287 | x = x - x.mean() 288 | zero_crossings = sum(1 for i, _ in enumerate(x) if (i + 1 < len(x)) if ((x[i] * x[i + 1] < 0) or (x[i] == 0))) 289 | 290 | return zero_crossings 291 | 292 | def apply_PCA(self, n_components, df, svd_solver='auto', random_state=0): 293 | """ 294 | This function applies Principal Component Analysis to the df given as 295 | parameter 296 | 297 | :param n_components: number of principal components 298 | :param df: dataframe containing time series for analysis 299 | :param svd_solver: solver for PCA: see PCA documentation 300 | :return: reduced normalized and transposed df 301 | """ 302 | 303 | if not isinstance(n_components, str): 304 | if n_components > df.shape[1]: 305 | print("ERROR: number of components larger than samples...") 306 | exit() 307 | 308 | pca = PCA(n_components=n_components, svd_solver=svd_solver, random_state=random_state) 309 | pca.fit(df) 310 | explained_variance = pca.explained_variance_ 311 | 312 | # standardize 313 | X = preprocessing.StandardScaler().fit_transform(pca.components_.T) 314 | 315 | return X, explained_variance 316 | 317 | def apply_OPTICS(self, X, df_returns, min_samples, max_eps=2, xi=0.05, cluster_method='xi'): 318 | """ 319 | 320 | :param X: 321 | :param df_returns: 322 | :param min_samples: 323 | :param max_eps: 324 | :param xi: 325 | :param eps: 326 | :return: 327 | """ 328 | clf = OPTICS(min_samples=min_samples, max_eps=max_eps, xi=xi, metric='euclidean', cluster_method=cluster_method) 329 | print(clf) 330 | 331 | clf.fit(X) 332 | labels = clf.labels_ 333 | n_clusters_ = len(set(labels)) - (1 if -1 in labels else 0) 334 | print("Clusters discovered: %d" % n_clusters_) 335 | 336 | clustered_series_all = pd.Series(index=df_returns.columns, data=labels.flatten()) 337 | clustered_series = clustered_series_all[clustered_series_all != -1] 338 | 339 | counts = clustered_series.value_counts() 340 | print("Pairs to evaluate: %d" % (counts * (counts - 1) / 2).sum()) 341 | 342 | return clustered_series_all, clustered_series, counts, clf 343 | 344 | def apply_DBSCAN(self, eps, min_samples, X, df_returns): 345 | """ 346 | This function applies a DBSCAN clustering algo 347 | 348 | :param eps: min distance for a sample to be within the cluster 349 | :param min_samples: min_samples to consider a cluster 350 | :param X: data 351 | 352 | :return: clustered_series_all: series with all tickers and labels 353 | :return: clustered_series: series with tickers belonging to a cluster 
354 | :return: counts: counts of each cluster 355 | :return: clf object 356 | """ 357 | clf = DBSCAN(eps=eps, min_samples=min_samples, metric='euclidean') 358 | #print(clf) 359 | 360 | clf.fit(X) 361 | labels = clf.labels_ 362 | n_clusters_ = len(set(labels)) - (1 if -1 in labels else 0) 363 | print("Clusters discovered: %d" % n_clusters_) 364 | 365 | clustered_series_all = pd.Series(index=df_returns.columns, data=labels.flatten()) 366 | clustered_series = clustered_series_all[clustered_series_all != -1] 367 | 368 | counts = clustered_series.value_counts() 369 | print("Pairs to evaluate: %d" % (counts * (counts - 1) / 2).sum()) 370 | 371 | return clustered_series_all, clustered_series, counts, clf 372 | 373 | def clustering_for_optimal_PCA(self, min_components, max_components, returns, clustering_params): 374 | """ 375 | This function experiments different values for the number of PCA components considered. 376 | It returns the values obtained for the number of components which provided the best silhouette 377 | coefficient. 378 | 379 | :param min_components: min number of components to test 380 | :param max_components: max number of components to test 381 | :param returns: series of returns 382 | :param clustering_params: parameters for clustering 383 | 384 | :return: X: PCA reduced dataset 385 | :return: clustered_series_all: cluster labels for all sample 386 | :return: clustered_series: cluster labels for samples belonging to a cluster 387 | :return: counts: counts for each cluster 388 | :return: clf: object returned by DBSCAN 389 | """ 390 | # initialize dictionary to save best performers 391 | best_n_comp = {'n_comp': -1, 392 | 'silhouette': -1, 393 | 'X': None, 394 | 'clustered_series_all': None, 395 | 'clustered_series': None, 396 | 'counts': None, 397 | 'clf': None 398 | } 399 | 400 | for n_comp in range(min_components, max_components): 401 | print('\nNumber of components: ', n_comp) 402 | # Apply PCA on data 403 | print('Returns shape: ', returns.shape) 404 | X, _ = self.apply_PCA(n_comp, returns) 405 | # Apply DBSCAN 406 | clustered_series_all, clustered_series, counts, clf = self.apply_DBSCAN( 407 | clustering_params['epsilon'], 408 | clustering_params['min_samples'], 409 | X, 410 | returns) 411 | # Silhouette score 412 | silhouette = silhouette_score(X, clf.labels_, 'euclidean') 413 | print('Silhouette score ', silhouette) 414 | 415 | # Standard deviation 416 | # std_deviation = counts.std() 417 | # print('Standard deviation: ',std_deviation)) 418 | 419 | if silhouette > best_n_comp['silhouette']: 420 | best_n_comp = {'n_comp': n_comp, 421 | 'silhouette': silhouette, 422 | 'X': X, 423 | 'clustered_series_all': clustered_series_all, 424 | 'clustered_series': clustered_series, 425 | 'counts': counts, 426 | 'clf': clf 427 | } 428 | 429 | print('\nThe best silhouette coefficient was: {} for {} principal components'.format(best_n_comp['silhouette'], 430 | best_n_comp['n_comp'])) 431 | 432 | return best_n_comp['X'], best_n_comp['clustered_series_all'], best_n_comp['clustered_series'], best_n_comp[ 433 | 'counts'], best_n_comp['clf'] 434 | 435 | def get_candidate_pairs(self, clustered_series, pricing_df_train, pricing_df_test, min_half_life=78, 436 | max_half_life=20000, min_zero_crosings=20, p_value_threshold=0.05, hurst_threshold=0.5, 437 | subsample=0): 438 | """ 439 | This function looks for tradable pairs over the clusters formed previously. 
440 | 441 | :param clustered_series: series with cluster label info 442 | :param pricing_df_train: df with price series from train set 443 | :param pricing_df_test: df with price series from test set 444 | :param n_clusters: number of clusters 445 | :param min_half_life: min half life of a time series to be considered as candidate 446 | :param min_zero_crosings: min number of zero crossings (or mean crossings) 447 | :param p_value_threshold: p_value to check during cointegration test 448 | :param hurst_threshold: max hurst exponent value 449 | 450 | :return: list of pairs and its info 451 | :return: list of unique tickers identified in the candidate pairs universe 452 | """ 453 | 454 | total_pairs, total_pairs_fail_criteria = [], [] 455 | n_clusters = len(clustered_series.value_counts()) 456 | for clust in range(n_clusters): 457 | sys.stdout.write("\r"+'Cluster {}/{}'.format(clust+1, n_clusters)) 458 | sys.stdout.flush() 459 | symbols = list(clustered_series[clustered_series == clust].index) 460 | cluster_pricing_train = pricing_df_train[symbols] 461 | cluster_pricing_test = pricing_df_test[symbols] 462 | pairs, pairs_fail_criteria = self.find_pairs(cluster_pricing_train, 463 | cluster_pricing_test, 464 | p_value_threshold, 465 | min_half_life, 466 | max_half_life, 467 | min_zero_crosings, 468 | hurst_threshold, 469 | subsample) 470 | total_pairs.extend(pairs) 471 | total_pairs_fail_criteria.append(pairs_fail_criteria) 472 | 473 | print('Found {} pairs'.format(len(total_pairs))) 474 | unique_tickers = np.unique([(element[0], element[1]) for element in total_pairs]) 475 | print('The pairs contain {} unique tickers'.format(len(unique_tickers))) 476 | 477 | # discarded 478 | review = dict(functools.reduce(operator.add, map(collections.Counter, total_pairs_fail_criteria))) 479 | print('Pairs Selection failed stage: ', review) 480 | 481 | return total_pairs, unique_tickers 482 | -------------------------------------------------------------------------------- /classes/class_Trader.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import pandas as pd 3 | import sys 4 | import matplotlib.pyplot as plt 5 | from datetime import timedelta 6 | 7 | # just set the seed for the random number generator 8 | np.random.seed(107) 9 | 10 | class Trader: 11 | """ 12 | This class contains a set of pairs trading strategies along 13 | with some auxiliary functions 14 | """ 15 | 16 | def __init__(self): 17 | """ 18 | :initial elements 19 | """ 20 | 21 | def threshold_strategy(self, y, x, beta, entry_level=1.0, exit_level=1.0, stabilizing_threshold=5): 22 | """ 23 | This function implements a threshold filter strategy with a fixed beta, corresponding to the cointegration 24 | ratio. 
25 | :param y: price series of asset y 26 | :param x: price series of asset x 27 | :param entry_level: abs of long and short threshold 28 | :param exit_multiplier: abs of exit threshold 29 | :param stabilizing_threshold: number of initial periods when no positions should be set 30 | """ 31 | 32 | # calculate normalized spread 33 | spread = y - beta * x 34 | norm_spread = (spread - spread.mean()) / np.std(spread) 35 | norm_spread = np.asarray(norm_spread.values) 36 | 37 | # get indices for long and short positions 38 | longs_entry = norm_spread < -entry_level 39 | longs_exit = norm_spread > -exit_level 40 | shorts_entry = norm_spread > entry_level 41 | shorts_exit = norm_spread < exit_level 42 | 43 | num_units_long = pd.Series([np.nan for i in range(len(y))]) 44 | num_units_short = pd.Series([np.nan for i in range(len(y))]) 45 | 46 | # remove trades while the spread is stabilizing 47 | longs_entry[:stabilizing_threshold] = False 48 | longs_exit[:stabilizing_threshold] = False 49 | shorts_entry[:stabilizing_threshold] = False 50 | shorts_exit[:stabilizing_threshold] = False 51 | 52 | # set threshold crossings with corresponding position 53 | num_units_long[longs_entry] = 1. 54 | num_units_long[longs_exit] = 0 55 | num_units_short[shorts_entry] = -1. 56 | num_units_short[shorts_exit] = 0 57 | 58 | # shift to simulate entry delay in real life trading 59 | # please comment if no need to simulate delay 60 | num_units_long = num_units_long.shift(1) 61 | num_units_short = num_units_short.shift(1) 62 | 63 | # initialize market position with zero 64 | num_units_long[0] = 0. 65 | num_units_short[0] = 0. 66 | # finally, fill in between 67 | num_units_long = num_units_long.fillna(method='ffill') 68 | num_units_short = num_units_short.fillna(method='ffill') 69 | num_units = num_units_long + num_units_short 70 | num_units = pd.Series(data=num_units.values, index=y.index, name='numUnits') 71 | 72 | # add position durations 73 | trading_durations = self.add_trading_duration(pd.DataFrame(num_units, index=y.index)) 74 | 75 | # Method 1: calculate return per each position 76 | # This method receives the series with the positions and calculate the return at the end of each position, not 77 | # yet accounting for costs 78 | position_ret, _, ret_summary = self.calculate_position_returns(y, x, beta, num_units) 79 | # Method 2: calculate balance in total 80 | # This method constructs the portfolio during the entire trading session and calculates the returns every 5 min. 81 | # By compounding the returns during a position, we obtain the position return as given in method 1. 
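        # (note: calculate_balance below receives num_units shifted by one bar, i.e. the position
        # actually held during each bar rather than the one entered at the end of it)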
82 | # This method is necessary to obtain the daily returns from which to estimate the Sharpe Ratio 83 | balance_summary = self.calculate_balance(y, x, beta, num_units.shift(1).fillna(0), trading_durations) 84 | 85 | # add transaction costs and gather all info in a single dataframe 86 | series_to_include = [(balance_summary.pnl, 'pnl'), 87 | (balance_summary.pnl_y, 'pnl_y'), 88 | (balance_summary.pnl_x, 'pnl_x'), 89 | (balance_summary.account_balance, 'account_balance'), 90 | (balance_summary.returns, 'returns'), 91 | (position_ret, 'position_return'), 92 | (y, y.name), 93 | (x, x.name), 94 | (pd.Series(norm_spread, index=y.index), 'norm_spread'), 95 | (num_units, 'numUnits'), 96 | (trading_durations, 'trading_duration')] 97 | summary = self.trade_summary(series_to_include, beta) 98 | 99 | # calculate sharpe ratio for each pair separately 100 | ret_w_costs = summary.returns 101 | n_years = round(len(y) / (240 * 78)) 102 | n_days = 252 103 | if np.std(ret_w_costs) == 0: 104 | sharpe_no_costs, sharpe_w_costs = (0, 0) 105 | else: 106 | if np.std(position_ret) == 0: 107 | sharpe_no_costs=0 108 | else: 109 | sharpe_no_costs = self.calculate_sharpe_ratio(n_years, n_days, position_ret) 110 | sharpe_w_costs = self.calculate_sharpe_ratio(n_years, n_days, ret_w_costs) 111 | 112 | return summary, (sharpe_no_costs, sharpe_w_costs), balance_summary 113 | 114 | def apply_trading_strategy(self, pairs, strategy='fixed_beta', entry_multiplier=1, exit_multiplier=0, 115 | test_mode=False, train_val_split='2017-01-01'): 116 | """ 117 | This function implements the standard fixed beta trading strategy. 118 | :param pairs: list with pairs identified in the training set 119 | :param strategy: currently, only fixed_beta is implemented 120 | :param entry_multiplier: threshold that defines where to enter a position 121 | :param exit_multiplier: threshold that defines where to exit a position 122 | :param test_mode: flag to decide whether to apply strategy on the validation set or in the test set 123 | :param train_val_split: split of training and validation data 124 | """ 125 | sharpe_results = [] 126 | cum_returns = [] 127 | sharpe_results_with_costs = [] 128 | cum_returns_with_costs = [] 129 | performance = [] # aux variable to store pairs' record 130 | print(' entry delay turned on.') 131 | for i, pair in enumerate(pairs): 132 | sys.stdout.write("\r"+'Pair: {}/{}'.format(i + 1, len(pairs))) 133 | sys.stdout.flush() 134 | pair_info = pair[2] 135 | 136 | if test_mode: 137 | y = pair_info['Y_test'] 138 | x = pair_info['X_test'] 139 | else: 140 | y = pair_info['Y_train'][train_val_split:] 141 | x = pair_info['X_train'][train_val_split:] 142 | 143 | if strategy == 'fixed_beta': 144 | summary, sharpe, balance_summary = self.threshold_strategy(y=y, x=x, beta=pair_info['coint_coef'], 145 | entry_level=entry_multiplier, 146 | exit_level=exit_multiplier) 147 | # no costs 148 | cum_returns.append((np.cumprod(1 + summary.position_return) - 1).iloc[-1] * 100) 149 | sharpe_results.append(sharpe[0]) 150 | # with costs 151 | # cum_returns_with_costs.append((np.cumprod(1 + summary.position_ret_with_costs) - 1).iloc[-1] * 100) 152 | cum_returns_with_costs.append((summary.account_balance[-1] - 1) * 100) 153 | sharpe_results_with_costs.append(sharpe[1]) 154 | performance.append((pair, summary, balance_summary)) 155 | 156 | else: 157 | print('Only one strategy currently available: \n1.Fixed Beta') 158 | exit() 159 | 160 | return (sharpe_results, cum_returns), (sharpe_results_with_costs, cum_returns_with_costs), performance 161 | 162 
| def trade_summary(self, series, beta=0): 163 | """ 164 | This function receives a set of series containing information from the trade and 165 | returns a DataFrame containing the summary data. 166 | :param series: a list of tuples containing the time series and the corresponding names 167 | :param beta: cointegration ratio. If moving beta, use beta=0. 168 | """ 169 | for attribute, attribute_name in series: 170 | try: 171 | attribute.name = attribute_name 172 | except: 173 | continue 174 | summary = pd.concat([item[0] for item in series], axis=1) 175 | 176 | # change numUnits so that it corresponds to the position for the row's date, 177 | # instead of corresponding to the position entered in the end of that day. 178 | summary['numUnits'] = summary['numUnits'].shift().fillna(0) 179 | summary = summary.rename(columns={"numUnits": "position_during_day"}) 180 | 181 | # add position costs 182 | summary['position_ret_with_costs'] = self.add_transaction_costs(summary, beta) 183 | 184 | return summary 185 | 186 | def add_trading_duration(self, df): 187 | """ 188 | The following function adds a column containing the trading duration in days. 189 | :param df: Dataframe containing column with positions to enter in next day 190 | """ 191 | 192 | df['trading_duration'] = [0] * len(df) 193 | previous_unit = 0. 194 | new_position_counter = 0 195 | day = df.index[0].day 196 | for index, row in df.iterrows(): 197 | if previous_unit == row['numUnits']: 198 | if previous_unit != 0.: 199 | # update counter 200 | if index.day != day: 201 | new_position_counter += 1 202 | day = index.day 203 | # verify if it is last trading day 204 | if index == df.index[-1]: 205 | df.loc[index, 'trading_duration'] = new_position_counter 206 | continue # no change in positions to verify 207 | else: 208 | if previous_unit == 0.: 209 | previous_unit = row['numUnits'] 210 | # begin counter 211 | new_position_counter = 1 212 | day = index.day 213 | continue # simply start the trade 214 | else: 215 | df.loc[index, 'trading_duration'] = new_position_counter 216 | previous_unit = row['numUnits'] 217 | # begin counter 218 | new_position_counter = 1 219 | day = index.day 220 | continue 221 | 222 | return df['trading_duration'] 223 | 224 | def add_transaction_costs(self, summary, beta=0, comission_costs=0.08, market_impact=0.2, short_rental=1): 225 | """ 226 | Function to add transaction costs per position. 227 | :param summary: dataframe containing summary of all transactions 228 | :param beta: cointegration factor, use 0 if moving beta 229 | :param comission_costs: commision costs, in percentage, per security, per trade 230 | :param market_impact: market impact costs, in percentage, per security, per trade 231 | :param short_rental: short rental costs, in annual percentage 232 | """ 233 | fixed_costs_per_trade = (comission_costs + market_impact) / 100 # remove percentage 234 | short_costs_per_day = (short_rental / 252) / 100 # remove percentage 235 | 236 | costs = summary.apply(lambda row: self.apply_costs(row, fixed_costs_per_trade, short_costs_per_day, beta), 237 | axis=1) 238 | 239 | ret_with_costs = summary['position_return'] - costs 240 | 241 | return ret_with_costs 242 | 243 | def apply_costs(self, row, fixed_costs_per_trade, short_costs_per_day, beta=0): 244 | 245 | if beta == 0: 246 | beta = row['beta_position'] 247 | 248 | if row['position_during_day'] == 1. 
and row['trading_duration'] != 0: 249 | if beta >= 1: 250 | return fixed_costs_per_trade * (1 / beta) + fixed_costs_per_trade + short_costs_per_day * \ 251 | row['trading_duration'] 252 | elif beta < 1: 253 | return fixed_costs_per_trade * beta + fixed_costs_per_trade + short_costs_per_day * \ 254 | row['trading_duration'] * beta 255 | 256 | elif row['position_during_day'] == -1. and row['trading_duration'] != 0: 257 | if beta >= 1: 258 | return fixed_costs_per_trade * (1 / beta) + fixed_costs_per_trade + short_costs_per_day * \ 259 | row['trading_duration'] * (1 / beta) 260 | elif beta < 1: 261 | return fixed_costs_per_trade * beta + fixed_costs_per_trade + short_costs_per_day * \ 262 | row['trading_duration'] 263 | else: 264 | return 0 265 | 266 | def calculate_balance(self, y, x, beta, positions, trading_durations): 267 | """ 268 | Function to calculate balance during a trading session. 269 | 270 | :param y: y series 271 | :param x: x series 272 | :param beta: pair's cointegration coefficient 273 | :param positions: position during the current day 274 | :param trading_durations: series with trading duration of each trade 275 | :return: balance dataframe containing summary info 276 | """ 277 | y_returns = y.pct_change().fillna(0) * positions 278 | x_returns = -x.pct_change().fillna(0) * positions 279 | 280 | leg_y = [np.nan] * len(y) # initial balance 281 | leg_x = [np.nan] * len(y) # initial balance 282 | pnl_y = [np.nan] * len(y) 283 | pnl_x = [np.nan] * len(y) 284 | account_balance = [np.nan] * len(y) 285 | 286 | # auxiliary series to indicate beginning and end of position 287 | new_positions_idx = positions.diff()[positions.diff() != 0].index.values 288 | end_positions_idx = trading_durations[trading_durations != 0].index.values 289 | position_trigger = pd.Series([0] * len(y), index=y.index, name='position_trigger') 290 | # 2: new position 291 | # 1: new position which only lasts one day 292 | # -1: end of position that did not start on that day 293 | position_trigger[new_positions_idx] = 2. 294 | position_trigger[end_positions_idx] = position_trigger[end_positions_idx] - 1. 
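        # the multiplication by positions.abs() on the next line keeps triggers only on bars where a
        # position is actually held (positions takes values -1, 0 or 1), zeroing the trigger on flat bars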
295 | position_trigger = position_trigger * positions.abs() 296 | position_trigger.name = 'position_trigger' 297 | 298 | for i in range(len(y)): 299 | if i == 0: 300 | pnl_y[0] = 0 301 | pnl_x[0] = 0 302 | account_balance[0] = 1 303 | if beta > 1: 304 | leg_y[0] = 1 / beta 305 | leg_x[0] = 1 306 | else: 307 | leg_y[0] = 1 308 | leg_x[0] = beta 309 | elif positions[i] == 0: 310 | pnl_y[i] = 0 311 | pnl_x[i] = 0 312 | leg_y[i] = leg_y[i - 1] 313 | leg_x[i] = leg_x[i - 1] 314 | account_balance[i] = account_balance[i-1] 315 | else: 316 | # add costs 317 | if position_trigger[i] == 1: 318 | # every new position invest initial 1$ + acc in X + acc in Y 319 | position_investment = account_balance[i-1] 320 | # if new position, that most legs contain now the overall invested 321 | if beta > 1: 322 | pnl_y[i] = y_returns[i] * position_investment * (1 / beta) 323 | pnl_x[i] = x_returns[i] * position_investment 324 | else: 325 | pnl_y[i] = y_returns[i] * position_investment 326 | pnl_x[i] = x_returns[i] * position_investment * beta 327 | 328 | # update legs 329 | if beta > 1: 330 | if positions[i] == 1: 331 | leg_y[i] = position_investment * (1 / beta) + pnl_y[i] 332 | leg_x[i] = position_investment - pnl_x[i] 333 | else: 334 | leg_y[i] = position_investment * (1 / beta) - pnl_y[i] 335 | leg_x[i] = position_investment + pnl_x[i] 336 | else: 337 | if positions[i] == 1: 338 | leg_y[i] = position_investment + pnl_y[i] 339 | leg_x[i] = position_investment * beta - pnl_x[i] 340 | else: 341 | leg_y[i] = position_investment - pnl_y[i] 342 | leg_x[i] = position_investment * beta + pnl_x[i] 343 | 344 | # commission costs + market impact costs + short rental costs 345 | if beta >= 1: 346 | pnl_y[i] = pnl_y[i] - 0.0028*(1/beta)*position_investment # add commission + bid ask spread 347 | pnl_x[i] = pnl_x[i] - 0.0028*position_investment # add commission + bid ask spread 348 | if positions[i] == 1: 349 | pnl_x[i] = pnl_x[i] - 1 * (0.01 / 252)*position_investment 350 | elif positions[i] == -1: 351 | pnl_y[i] = pnl_y[i] - 1 * (0.01 / 252)*(1/beta)*position_investment 352 | elif beta < 1: 353 | pnl_y[i] = pnl_y[i] - 0.0028 * position_investment # add commission + bid ask spread 354 | pnl_x[i] = pnl_x[i] - 0.0028 * beta * position_investment # add commission + bid ask spread 355 | if positions[i] == 1: 356 | pnl_x[i] = pnl_x[i] - 1 * (0.01 / 252)*beta*position_investment 357 | elif positions[i] == -1: 358 | pnl_y[i] = pnl_y[i] - 1 * (0.01 / 252)*position_investment 359 | # update balance 360 | account_balance[i] = account_balance[i-1] + pnl_x[i] + pnl_y[i] 361 | 362 | elif position_trigger[i] == 2: 363 | # every new position invest initial 1$ + acc in X + acc in Y 364 | position_investment = account_balance[i-1] 365 | # if new position, that most legs contain now the overall invested 366 | if beta > 1: 367 | pnl_y[i] = y_returns[i] * position_investment * (1 / beta) 368 | pnl_x[i] = x_returns[i] * position_investment 369 | else: 370 | pnl_y[i] = y_returns[i] * position_investment 371 | pnl_x[i] = x_returns[i] * position_investment * beta 372 | 373 | # update legs 374 | if beta > 1: 375 | if positions[i] == 1: 376 | leg_y[i] = position_investment * (1 / beta) + pnl_y[i] 377 | leg_x[i] = position_investment - pnl_x[i] 378 | else: 379 | leg_y[i] = position_investment * (1 / beta) - pnl_y[i] 380 | leg_x[i] = position_investment + pnl_x[i] 381 | else: 382 | if positions[i] == 1: 383 | leg_y[i] = position_investment + pnl_y[i] 384 | leg_x[i] = position_investment * beta - pnl_x[i] 385 | else: 386 | leg_y[i] = 
position_investment - pnl_y[i] 387 | leg_x[i] = position_investment * beta + pnl_x[i] 388 | 389 | # commission costs + market impact costs + short rental costs 390 | if beta >= 1: 391 | pnl_y[i] = pnl_y[i] - 0.0028*(1/beta)*position_investment # add commission + bid ask spread 392 | pnl_x[i] = pnl_x[i] - 0.0028*position_investment # add commission + bid ask spread 393 | elif beta < 1: 394 | pnl_y[i] = pnl_y[i] - 0.0028 * position_investment # add commission + bid ask spread 395 | pnl_x[i] = pnl_x[i] - 0.0028 * beta * position_investment # add commission + bid ask spread 396 | # update balance 397 | account_balance[i] = account_balance[i - 1] + pnl_x[i] + pnl_y[i] 398 | 399 | else: 400 | # calculate trade pnl 401 | pnl_y[i] = y_returns[i] * leg_y[i - 1] 402 | pnl_x[i] = x_returns[i] * leg_x[i - 1] 403 | 404 | # update legs 405 | if positions[i] == 1: 406 | leg_y[i] = leg_y[i - 1] + pnl_y[i] 407 | leg_x[i] = leg_x[i - 1] - pnl_x[i] 408 | else: 409 | leg_y[i] = leg_y[i - 1] - pnl_y[i] 410 | leg_x[i] = leg_x[i - 1] + pnl_x[i] 411 | 412 | # add short costs 413 | if position_trigger[i] == -1: 414 | if positions[i]==1: 415 | if beta > 1: 416 | pnl_x[i] = pnl_x[i] - trading_durations[i] * (0.01 / 252) * position_investment 417 | elif beta < 1: 418 | pnl_x[i] = pnl_x[i] - trading_durations[i] * (0.01 / 252)*beta*position_investment 419 | elif positions[i]==-1: 420 | if beta > 1: 421 | pnl_y[i] = pnl_y[i] - trading_durations[i] * (0.01 / 252)*(1/beta)*position_investment 422 | elif beta < 1: 423 | pnl_y[i] = pnl_y[i] - trading_durations[i] * (0.01 / 252) * position_investment 424 | 425 | # update balance 426 | account_balance[i] = account_balance[i - 1] + pnl_x[i] + pnl_y[i] 427 | pnl = [pnl_y[i] + pnl_x[i] for i in range(len(y))] 428 | 429 | # join everything in dataframe 430 | balance = pd.Series(data=account_balance, index=y.index, name='account_balance') 431 | returns = balance.pct_change().fillna(0) 432 | returns.name = 'returns' 433 | pnl = pd.Series(data=pnl, index=y.index, name='pnl') 434 | pnl_y = pd.Series(data=pnl_y, index=y.index, name='pnl_y') 435 | pnl_x = pd.Series(data=pnl_x, index=y.index, name='pnl_x') 436 | leg_y = pd.Series(data=leg_y, index=y.index, name='leg_y') 437 | leg_x = pd.Series(data=leg_x, index=y.index, name='leg_x') 438 | balance_summary = pd.concat( 439 | [balance, pnl, pnl_y, pnl_x, leg_y, leg_x, returns, position_trigger, positions, y, x, 440 | trading_durations], axis=1) 441 | 442 | return balance_summary 443 | 444 | def calculate_sharpe_ratio(self, n_years, n_days, ret): 445 | """ 446 | Calculate sharpe ratio for one asset only. 447 | As an estimate of the expected value use the yearly return. 
448 | :param n_years: number of years being considered 449 | :param n_days: number of trading days per year 450 | :param ret: array containing returns per timestep 451 | """ 452 | rf = {2014: 0.00033, 2015: 0.00053, 2016: 0.0032, 2017: 0.0093, 2018: 0.0194} 453 | time_in_market = n_years * n_days 454 | daily_index = ret.resample('D').last().dropna().index 455 | daily_ret = (ret + 1).resample('D').prod() - 1 456 | # remove added days from resample 457 | daily_ret = daily_ret.loc[daily_index] 458 | 459 | annualized_ret = (np.cumprod(1 + ret) - 1)[-1] 460 | year = ret.index[0].year 461 | if year in rf.keys(): 462 | sharpe_ratio = (annualized_ret-rf[year]) / (np.std(daily_ret)*np.sqrt(time_in_market)) 463 | else: 464 | print('Not considering risk-free rate') 465 | sharpe_ratio = annualized_ret / (np.std(daily_ret)*np.sqrt(time_in_market)) 466 | 467 | return sharpe_ratio 468 | 469 | def calculate_portfolio_sharpe_ratio(self, performance, pairs): 470 | """ 471 | Calculates the sharpe ratio based on the account balance of the total portfolio 472 | 473 | :param performance: df with summary statistics from strategy 474 | :param pairs: list with pairs 475 | """ 476 | # calculate total daily account balance & df with returns 477 | total_account_balance = performance[0][1]['account_balance'].resample('D').last().dropna() 478 | portfolio_returns = total_account_balance.pct_change().fillna(0) 479 | for index in range(1, len(pairs)): 480 | pair_balance = performance[index][1]['account_balance'].resample('D').last().dropna() 481 | total_account_balance = total_account_balance + pair_balance 482 | portfolio_returns = pd.concat([portfolio_returns, pair_balance.pct_change().fillna(0)], axis=1) 483 | 484 | # add first day with initial balance 485 | total_account_balance = pd.Series(data=[len(pairs)], 486 | index=[total_account_balance.index[0] - timedelta(days=1)]).append( 487 | total_account_balance) 488 | 489 | # calculate portfolio volatility 490 | weights = np.array([1 / len(pairs)] * len(pairs)) 491 | vol = np.sqrt(np.dot(weights.T, np.dot(portfolio_returns.cov(), weights))) 492 | 493 | # calculate sharpe ratio 494 | rf = {2014: 0.00033, 2015: 0.00053, 2016: 0.0032, 2017: 0.0093, 2018: 0.0194} 495 | annualized_ret = (total_account_balance[-1]-len(pairs))/len(pairs) 496 | year = total_account_balance.index[-1].year 497 | if year in rf.keys(): 498 | # assuming iid return's distributio, sr may be calculated as: 499 | sharpe_ratio = (annualized_ret - rf[year]) / (vol*np.sqrt(252)) 500 | print('Sharpe Ratio assumming IID returns: ',sharpe_ratio) 501 | print('Autocorrelation: ', total_account_balance.pct_change().fillna(0).autocorr(lag=1)) 502 | # accounting for non-zero autocorrelatio, daily sr should be calculated as: 503 | # the daily sharpe ratio is then multiplied by the annualization factor proposed by the paper: The 504 | # Statistics of Sharpe Ratios by Andrew W Lo 505 | annualized_ret = total_account_balance.pct_change().fillna(0).mean() 506 | rf_daily = (1+rf[year])**(1/252)-1 507 | sharpe_ratio = (annualized_ret-rf_daily) /vol 508 | print('Daily Sharpe Ratio', sharpe_ratio) 509 | else: 510 | print('Not considering risk-free rate') 511 | sharpe_ratio = annualized_ret / (vol*np.sqrt(252)) 512 | 513 | return sharpe_ratio 514 | 515 | def calculate_maximum_drawdown(self, account_balance): 516 | """ 517 | Function to calculate maximum drawdown w.r.t portfolio balance. 
518 | 519 | source: https://stackoverflow.com/questions/22607324/start-end-and-duration-of-maximum-drawdown-in-python 520 | """ 521 | 522 | # first calculate total drawdown period 523 | account_balance_drawdowns = account_balance.resample('D').last().dropna().diff().fillna(0).apply(lambda row: 0 if row >= 0 else 1) 524 | total_dd_duration = account_balance_drawdowns.sum() 525 | print('Total Drawdown Days: {} days'.format(total_dd_duration)) 526 | 527 | xs = np.asarray(account_balance.values) 528 | 529 | i = np.argmax(np.maximum.accumulate(xs) - xs) # end of the period 530 | if i == 0: 531 | plt.plot(xs) 532 | return 0 533 | else: 534 | j = np.argmax(xs[:i]) # start of period 535 | plt.figure(figsize=(10,7)) 536 | plt.grid() 537 | plt.plot(xs, label='Total Account Balance') 538 | dates = account_balance.resample('BMS').first().dropna().index.date 539 | xi = np.arange(0, len(account_balance), len(account_balance)/12) 540 | plt.xticks(xi, dates, rotation=50) 541 | plt.xlim(0, len(account_balance)) 542 | plt.plot([i, j], [xs[i], xs[j]], 'o', color='Red', markersize=10) 543 | plt.xlabel('Date', size=12) 544 | plt.ylabel('Capital($)', size=12) 545 | plt.legend() 546 | 547 | max_dd_period = round((i - j) / 78) 548 | print('Max DD period: {} days'.format(max_dd_period)) 549 | #print('Max DD period: {} days'.format((account_balance.index[i]-account_balance.index[j]).days)) 550 | 551 | return (xs[i]-xs[j])/xs[j] * 100, max_dd_period, total_dd_duration 552 | 553 | def calculate_position_returns(self, y, x, beta, positions): 554 | """ 555 | This method receives the series with the positions and calculate the return at the end of each position, not 556 | yet accounting for costs 557 | 558 | Y: price of ETF Y 559 | X: price of ETF X 560 | beta: cointegration ratio 561 | positions: array indicating position to enter in next day 562 | """ 563 | # get copy of series 564 | y = y.copy() 565 | y.name = 'y' 566 | x = x.copy() 567 | x.name = 'x' 568 | 569 | # positions preceed the day when the position is actually entered! 570 | # get indices before entering position 571 | new_positions = positions.diff()[positions.diff() != 0].index.values 572 | # create variable for signalizing end of position 573 | end_position = pd.Series(data=[0] * len(y), index=y.index, name='end_position') 574 | end_position[new_positions] = 1. 575 | # add end position if trading period is over and position is open 576 | if positions[-1] != 0: 577 | end_position[-1] = 1. 
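        # (a position still open on the last bar is marked as closed there, so its return is realised
        # in the summary instead of being dropped)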
578 | 579 | # get corresponding X and Y 580 | y_entry = pd.Series(data=[np.nan] * len(y), index=y.index, name='y_entry') 581 | x_entry = pd.Series(data=[np.nan] * len(y), index=y.index, name='x_entry') 582 | y_entry[new_positions] = y[new_positions] 583 | x_entry[new_positions] = x[new_positions] 584 | y_entry = y_entry.shift().fillna(method='ffill') 585 | x_entry = x_entry.shift().fillna(method='ffill') 586 | 587 | # name positions series 588 | positions.name = 'positions' 589 | 590 | # apply returns per trade 591 | # each row contain all the parameters to be applied in that position 592 | df = pd.concat([y, x, positions.shift().fillna(0), y_entry, x_entry, end_position], axis=1) 593 | returns = df.apply(lambda row: self.return_per_position(row, beta), axis=1).fillna(0) 594 | cum_returns = np.cumprod(returns + 1) - 1 595 | df['ret'] = returns 596 | returns.name = 'position_return' 597 | 598 | return returns, cum_returns, df 599 | 600 | def return_per_position(self, row, beta=None, sliding=False): 601 | if row['end_position'] != 0: 602 | y_returns = (row['y'] - row['y_entry']) / row['y_entry'] 603 | x_returns = (row['x'] - row['x_entry']) / row['x_entry'] 604 | if sliding: 605 | beta = row['beta_position'] 606 | if beta > 1.: 607 | return ((1 / beta) * y_returns - 1 * x_returns) * row['positions'] 608 | else: 609 | return (y_returns - beta * x_returns) * row['positions'] 610 | else: 611 | return 0 612 | 613 | def calculate_metrics(self, cum_returns, n_years): 614 | """ 615 | Calculate common metrics on average over all pairs. 616 | :param cum_returns: array with cumulative returns of every pair 617 | :param n_years: numbers of yers of the trading strategy 618 | :return: average average total roi 619 | :return: average annual roi 620 | :return: percentage of pairs with positive returns 621 | """ 622 | # use below for fully invested capital: 623 | # cum_returns_filtered = [cum for cum in cum_returns if cum != 0] 624 | # or use below for commited capital: 625 | cum_returns_filtered = cum_returns 626 | 627 | avg_total_roi = np.mean(cum_returns_filtered) 628 | 629 | avg_annual_roi = ((1 + (avg_total_roi / 100)) ** (1 / float(n_years)) - 1) * 100 630 | print('Annual ROI: ', avg_annual_roi) 631 | 632 | cum_returns_filtered = np.asarray(cum_returns_filtered) 633 | positive_pct = len(cum_returns_filtered[cum_returns_filtered > 0]) * 100 / len(cum_returns_filtered) 634 | print('{} % of the pairs had positive returns'.format(positive_pct)) 635 | 636 | return avg_total_roi, avg_annual_roi, positive_pct 637 | 638 | def summarize_results(self, sharpe_results, cum_returns, performance, total_pairs, ticker_segment_dict, n_years): 639 | """ 640 | This function summarizes interesting metrics to include in the final output 641 | :param sharpe_results: array containing sharpe results for each pair 642 | :param cum_returns: array containing cum returns for each pair 643 | :param performance: df containing a summary of each pair's trade 644 | :param total_pairs: list containing all the identified pairs 645 | :param ticker_segment_dict: dict containing segment for each ticker 646 | :param n_years: number of years the strategy is running 647 | :return: dictionary with metrics of interest 648 | """ 649 | 650 | avg_total_roi, avg_annual_roi, positive_pct = self.calculate_metrics(cum_returns, n_years) 651 | 652 | portfolio_sharpe_ratio = self.calculate_portfolio_sharpe_ratio(performance, total_pairs) 653 | 654 | sorted_indices = np.flip(np.argsort(sharpe_results), axis=0) 655 | # print(sorted_indices) 656 | # 
initialize list of lists 657 | data = [] 658 | for index in sorted_indices: 659 | # get number of positive and negative positions 660 | position_returns = performance[index][1]['position_ret_with_costs'] 661 | positive_positions = len(position_returns[position_returns > 0]) 662 | negative_positions = len(position_returns[position_returns < 0]) 663 | data.append([total_pairs[index][0], 664 | ticker_segment_dict[total_pairs[index][0]], 665 | total_pairs[index][1], 666 | ticker_segment_dict[total_pairs[index][1]], 667 | total_pairs[index][2]['t_statistic'], 668 | total_pairs[index][2]['p_value'], 669 | total_pairs[index][2]['zero_cross'], 670 | total_pairs[index][2]['half_life'], 671 | total_pairs[index][2]['hurst_exponent'], 672 | positive_positions, 673 | negative_positions, 674 | sharpe_results[index] 675 | ]) 676 | 677 | # Create the pandas DataFrame 678 | pairs_df = pd.DataFrame(data, columns=['Leg1', 'Leg1_Segmt', 'Leg2', 'Leg2_Segmt', 't_statistic', 'p_value', 679 | 'zero_cross', 'half_life', 'hurst_exponent', 'positive_trades', 680 | 'negative_trades', 'sharpe_result']) 681 | 682 | pairs_df['positive_trades_per_pair_pct'] = (pairs_df['positive_trades']) / \ 683 | (pairs_df['positive_trades'] + pairs_df['negative_trades']) * 100 684 | 685 | print('Total number of trades: ', pairs_df.positive_trades.sum() + pairs_df.negative_trades.sum()) 686 | print('Positive trades: ', pairs_df.positive_trades.sum()) 687 | print('Negative trades: ', pairs_df.negative_trades.sum()) 688 | 689 | avg_positive_trades_per_pair_pct = pairs_df['positive_trades_per_pair_pct'].mean() 690 | 691 | results = {'n_pairs': len(sharpe_results), 692 | 'portfolio_sharpe_ratio': portfolio_sharpe_ratio, 693 | 'avg_total_roi': avg_total_roi, 694 | 'avg_annual_roi': avg_annual_roi, 695 | 'pct_positive_trades_per_pair': avg_positive_trades_per_pair_pct, 696 | 'pct_pairs_with_positive_results': positive_pct, 697 | 'avg_half_life': pairs_df['half_life'].mean(), 698 | 'avg_hurst_exponent': pairs_df['hurst_exponent'].mean()} 699 | 700 | # Drawdown info 701 | total_account_balance = performance[0][1]['account_balance'] 702 | for index in range(1, len(total_pairs)): 703 | total_account_balance = total_account_balance + performance[index][1]['account_balance'] 704 | total_account_balance = total_account_balance.fillna(method='ffill') 705 | max_dd, max_dd_duration, total_dd_duration = self.calculate_maximum_drawdown(total_account_balance) 706 | print('Maximum drawdown of portfolio: {:.2f}%'.format(max_dd)) 707 | 708 | return results, pairs_df 709 | 710 | -------------------------------------------------------------------------------- /code_organization.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/simaomsarmento/PairsTrading/0781877c75673ceca3c61704eee9c9dca9d37b6b/code_organization.pdf -------------------------------------------------------------------------------- /data/link_to_data.txt: -------------------------------------------------------------------------------- 1 | https://www.dropbox.com/sh/0w3vu1eylrfnkch/AABttIlDf64MmVf5CP1Qy-XOa?dl=0 -------------------------------------------------------------------------------- /drafts/config/config.json: -------------------------------------------------------------------------------- 1 | { 2 | "dataset": { 3 | "ticker_segment_dict": "data/etfs/pickle/ticker_segment_dict.pickle", 4 | "ticker_attribute": "Ticker", 5 | "training_initial_date": "01-01-2009", 6 | "training_final_date": "31-12-2017", 7 | 
"testing_initial_date": "01-01-2018", 8 | "testing_final_date": "31-12-2018", 9 | "nan_threshold": 0 10 | }, 11 | "PCA": { 12 | "N_COMPONENTS": 3, 13 | }, 14 | "clustering": { 15 | "algo": "DBSCAN", 16 | "epsilon": 0.4, 17 | "min_samples": 2 18 | }, 19 | "pair_restrictions": { 20 | "min_half_life": 5, 21 | "min_zero_crossings": 120, 22 | "p_value_threshold": 0.05, 23 | "hurst_threshold": 0.5 24 | }, 25 | "trading": { 26 | "strategy": "kalman", 27 | "lookback_multiplier": 2, 28 | "entry_multiplier": 2, 29 | "exit_multiplier": 0 30 | }, 31 | "trading_filter": { 32 | "active": 0, 33 | "name": "correlation", 34 | "filter_lookback_multiplier": 2, 35 | "lag": 1, 36 | "diff_threshold": 0 37 | }, 38 | "mlp": { 39 | "n_in": 5, 40 | "n_out": 1, 41 | "epochs": 200, 42 | "hidden_nodes":5, 43 | "loss_fct": "mse", 44 | "optimizer": "adam", 45 | "train_val_split": "2016-01-01" 46 | }, 47 | "output": { 48 | "filename": "summary/results.xlsx" 49 | } 50 | } -------------------------------------------------------------------------------- /drafts/config/config_commodities_2000_2018.json: -------------------------------------------------------------------------------- 1 | { 2 | "dataset": { 3 | "path": "data/etfs/commodity_ETFs_long.xlsx", 4 | "ticker_attribute": "Ticker", 5 | "training_initial_date": "02-02-2000", 6 | "training_final_date": "01-01-2015", 7 | "testing_initial_date": "01-01-2015", 8 | "testing_final_date": "01-01-2018", 9 | "data_source": "yahoo", 10 | "nan_threshold": 0 11 | }, 12 | "PCA": { 13 | "N_COMPONENTS": [10,15] 14 | }, 15 | "clustering": { 16 | "algo": "DBSCAN", 17 | "epsilon": 0.4, 18 | "min_samples": 2 19 | }, 20 | "pair_restrictions": { 21 | "min_half_life": 5, 22 | "min_zero_crossings": 12, 23 | "p_value_threshold": 0.05, 24 | "hurst_threshold": 0.5 25 | }, 26 | "trading": { 27 | "strategy": "kalman", 28 | "lookback_multiplier": 2, 29 | "entry_multiplier": 1, 30 | "exit_multiplier": 0 31 | }, 32 | "trading_filter": { 33 | "active": 0, 34 | "name": "correlation", 35 | "filter_lookback_multiplier": 2, 36 | "lag": 1, 37 | "diff_threshold": 0 38 | }, 39 | "output": { 40 | "filename": "summary/results.xlsx" 41 | } 42 | } -------------------------------------------------------------------------------- /drafts/config/config_commodities_2008_2018.json: -------------------------------------------------------------------------------- 1 | { 2 | "dataset": { 3 | "path": "data/etfs/commodity_ETFs_long.xlsx", 4 | "ticker_attribute": "Ticker", 5 | "training_initial_date": "02-02-2008", 6 | "training_final_date": "01-01-2015", 7 | "testing_initial_date": "01-01-2015", 8 | "testing_final_date": "01-01-2018", 9 | "data_source": "yahoo", 10 | "nan_threshold": 0 11 | }, 12 | "PCA": { 13 | "N_COMPONENTS": [10,15] 14 | }, 15 | "clustering": { 16 | "algo": "DBSCAN", 17 | "epsilon": 0.4, 18 | "min_samples": 2 19 | }, 20 | "pair_restrictions": { 21 | "min_half_life": 5, 22 | "min_zero_crossings": 12, 23 | "p_value_threshold": 0.05, 24 | "hurst_threshold": 0.5 25 | }, 26 | "trading": { 27 | "strategy": "kalman", 28 | "lookback_multiplier": 2, 29 | "entry_multiplier": 1, 30 | "exit_multiplier": 0 31 | }, 32 | "trading_filter": { 33 | "active": 0, 34 | "name": "correlation", 35 | "filter_lookback_multiplier": 2, 36 | "lag": 1, 37 | "diff_threshold": 0 38 | }, 39 | "output": { 40 | "filename": "summary/results.xlsx" 41 | } 42 | } -------------------------------------------------------------------------------- /drafts/config/config_commodities_2009_2015.json: 
-------------------------------------------------------------------------------- 1 | { 2 | "dataset": { 3 | "path": "data/etfs/pickle/commodity_ETFs_long_updated", 4 | "ticker_attribute": "Ticker", 5 | "training_initial_date": "01-01-2009", 6 | "training_final_date": "01-01-2013", 7 | "testing_initial_date": "01-01-2013", 8 | "testing_final_date": "01-01-2015", 9 | "data_source": "yahoo", 10 | "nan_threshold": 0 11 | }, 12 | "PCA": { 13 | "N_COMPONENTS": [10,15] 14 | }, 15 | "clustering": { 16 | "algo": "DBSCAN", 17 | "epsilon": 0.4, 18 | "min_samples": 2 19 | }, 20 | "pair_restrictions": { 21 | "min_half_life": 5, 22 | "min_zero_crossings": 12, 23 | "p_value_threshold": 0.05, 24 | "hurst_threshold": 0.5 25 | }, 26 | "trading": { 27 | "strategy": "kalman", 28 | "lookback_multiplier": 2, 29 | "entry_multiplier": 1, 30 | "exit_multiplier": 0 31 | }, 32 | "trading_filter": { 33 | "active": 0, 34 | "name": "correlation", 35 | "filter_lookback_multiplier": 2, 36 | "lag": 1, 37 | "diff_threshold": 0 38 | }, 39 | "output": { 40 | "filename": "summary/results.xlsx" 41 | } 42 | } -------------------------------------------------------------------------------- /drafts/config/config_commodities_2009_2017.json: -------------------------------------------------------------------------------- 1 | { 2 | "dataset": { 3 | "path": "data/etfs/pickle/commodity_ETFs_long_updated", 4 | "ticker_attribute": "Ticker", 5 | "training_initial_date": "01-01-2009", 6 | "training_final_date": "01-01-2014", 7 | "testing_initial_date": "01-01-2014", 8 | "testing_final_date": "01-01-2017", 9 | "data_source": "yahoo", 10 | "nan_threshold": 0 11 | }, 12 | "PCA": { 13 | "N_COMPONENTS": [10,15] 14 | }, 15 | "clustering": { 16 | "algo": "DBSCAN", 17 | "epsilon": 0.4, 18 | "min_samples": 2 19 | }, 20 | "pair_restrictions": { 21 | "min_half_life": 5, 22 | "min_zero_crossings": 12, 23 | "p_value_threshold": 0.05, 24 | "hurst_threshold": 0.5 25 | }, 26 | "trading": { 27 | "strategy": "kalman", 28 | "lookback_multiplier": 2, 29 | "entry_multiplier": 1, 30 | "exit_multiplier": 0 31 | }, 32 | "trading_filter": { 33 | "active": 0, 34 | "name": "correlation", 35 | "filter_lookback_multiplier": 2, 36 | "lag": 1, 37 | "diff_threshold": 0 38 | }, 39 | "output": { 40 | "filename": "summary/results.xlsx" 41 | } 42 | } -------------------------------------------------------------------------------- /drafts/config/config_commodities_2010_2016.json: -------------------------------------------------------------------------------- 1 | { 2 | "dataset": { 3 | "path": "data/etfs/pickle/commodity_ETFs_long_updated", 4 | "ticker_attribute": "Ticker", 5 | "training_initial_date": "01-01-2010", 6 | "training_final_date": "01-01-2014", 7 | "testing_initial_date": "01-01-2014", 8 | "testing_final_date": "01-01-2016", 9 | "data_source": "yahoo", 10 | "nan_threshold": 0 11 | }, 12 | "PCA": { 13 | "N_COMPONENTS": [10,15] 14 | }, 15 | "clustering": { 16 | "algo": "DBSCAN", 17 | "epsilon": 0.4, 18 | "min_samples": 2 19 | }, 20 | "pair_restrictions": { 21 | "min_half_life": 5, 22 | "min_zero_crossings": 12, 23 | "p_value_threshold": 0.05, 24 | "hurst_threshold": 0.5 25 | }, 26 | "trading": { 27 | "strategy": "kalman", 28 | "lookback_multiplier": 2, 29 | "entry_multiplier": 1, 30 | "exit_multiplier": 0 31 | }, 32 | "trading_filter": { 33 | "active": 0, 34 | "name": "correlation", 35 | "filter_lookback_multiplier": 2, 36 | "lag": 1, 37 | "diff_threshold": 0 38 | }, 39 | "output": { 40 | "filename": "summary/results.xlsx" 41 | } 42 | } 
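The remaining config_commodities_<train-start>_<test-end>.json files below follow the same schema and differ essentially in the dataset date windows (config_commodities_2010_2019.json is a trimmed-down exception). As a minimal sketch, not part of the repository, the windows of the file just above could be read back like this:

    import json

    # illustrative only: path relative to the repository root
    with open("drafts/config/config_commodities_2010_2016.json") as f:
        cfg = json.load(f)

    ds = cfg["dataset"]
    print(ds["training_initial_date"], ds["training_final_date"])  # 01-01-2010 01-01-2014
    print(ds["testing_initial_date"], ds["testing_final_date"])    # 01-01-2014 01-01-2016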
-------------------------------------------------------------------------------- /drafts/config/config_commodities_2010_2018.json: -------------------------------------------------------------------------------- 1 | { 2 | "dataset": { 3 | "path": "data/etfs/pickle/commodity_ETFs_long_updated", 4 | "ticker_attribute": "Ticker", 5 | "training_initial_date": "01-01-2010", 6 | "training_final_date": "01-01-2015", 7 | "testing_initial_date": "01-01-2015", 8 | "testing_final_date": "01-01-2018", 9 | "data_source": "yahoo", 10 | "nan_threshold": 0 11 | }, 12 | "PCA": { 13 | "N_COMPONENTS": [10,15] 14 | }, 15 | "clustering": { 16 | "algo": "DBSCAN", 17 | "epsilon": 0.4, 18 | "min_samples": 2 19 | }, 20 | "pair_restrictions": { 21 | "min_half_life": 5, 22 | "min_zero_crossings": 12, 23 | "p_value_threshold": 0.05, 24 | "hurst_threshold": 0.5 25 | }, 26 | "trading": { 27 | "strategy": "kalman", 28 | "lookback_multiplier": 2, 29 | "entry_multiplier": 1, 30 | "exit_multiplier": 0 31 | }, 32 | "trading_filter": { 33 | "active": 0, 34 | "name": "correlation", 35 | "filter_lookback_multiplier": 2, 36 | "lag": 1, 37 | "diff_threshold": 0 38 | }, 39 | "output": { 40 | "filename": "summary/results.xlsx" 41 | } 42 | } -------------------------------------------------------------------------------- /drafts/config/config_commodities_2010_2019.json: -------------------------------------------------------------------------------- 1 | { 2 | "dataset": { 3 | "ticker_segment_dict": "../data/etfs/pickle/ticker_segment_dict.pickle", 4 | }, 5 | "pair_restrictions": { 6 | "min_half_life": 5, 7 | "min_zero_crossings": 12, 8 | "p_value_threshold": 0.05, 9 | "hurst_threshold": 0.5 10 | }, 11 | "trading": { 12 | "strategy": "bollinger", 13 | "lookback_multiplier": 2, 14 | "entry_multiplier": 1, 15 | "exit_multiplier": 0 16 | }, 17 | "trading_filter": { 18 | "active": 0, 19 | "name": "correlation", 20 | "filter_lookback_multiplier": 2, 21 | "lag": 1, 22 | "diff_threshold": 0 23 | }, 24 | "output": { 25 | "filename": "summary/results.xlsx" 26 | } 27 | } -------------------------------------------------------------------------------- /drafts/config/config_commodities_2011_2015.json: -------------------------------------------------------------------------------- 1 | { 2 | "dataset": { 3 | "path": "data/etfs/pickle/commodity_ETFs_long_updated", 4 | "ticker_attribute": "Ticker", 5 | "training_initial_date": "01-01-2011", 6 | "training_final_date": "01-01-2014", 7 | "testing_initial_date": "01-01-2014", 8 | "testing_final_date": "01-01-2015", 9 | "data_source": "yahoo", 10 | "nan_threshold": 0 11 | }, 12 | "PCA": { 13 | "N_COMPONENTS": [10,15] 14 | }, 15 | "clustering": { 16 | "algo": "DBSCAN", 17 | "epsilon": 0.4, 18 | "min_samples": 2 19 | }, 20 | "pair_restrictions": { 21 | "min_half_life": 5, 22 | "min_zero_crossings": 12, 23 | "p_value_threshold": 0.05, 24 | "hurst_threshold": 0.5 25 | }, 26 | "trading": { 27 | "strategy": "kalman", 28 | "lookback_multiplier": 2, 29 | "entry_multiplier": 1, 30 | "exit_multiplier": 0 31 | }, 32 | "trading_filter": { 33 | "active": 0, 34 | "name": "correlation", 35 | "filter_lookback_multiplier": 2, 36 | "lag": 1, 37 | "diff_threshold": 0 38 | }, 39 | "output": { 40 | "filename": "summary/results.xlsx" 41 | } 42 | } -------------------------------------------------------------------------------- /drafts/config/config_commodities_2011_2017.json: -------------------------------------------------------------------------------- 1 | { 2 | "dataset": { 3 | "path": 
"data/etfs/pickle/commodity_ETFs_long_updated", 4 | "ticker_attribute": "Ticker", 5 | "training_initial_date": "01-01-2011", 6 | "training_final_date": "01-01-2015", 7 | "testing_initial_date": "01-01-2015", 8 | "testing_final_date": "01-01-2017", 9 | "data_source": "yahoo", 10 | "nan_threshold": 0 11 | }, 12 | "PCA": { 13 | "N_COMPONENTS": [10,15] 14 | }, 15 | "clustering": { 16 | "algo": "DBSCAN", 17 | "epsilon": 0.4, 18 | "min_samples": 2 19 | }, 20 | "pair_restrictions": { 21 | "min_half_life": 5, 22 | "min_zero_crossings": 12, 23 | "p_value_threshold": 0.05, 24 | "hurst_threshold": 0.5 25 | }, 26 | "trading": { 27 | "strategy": "kalman", 28 | "lookback_multiplier": 2, 29 | "entry_multiplier": 1, 30 | "exit_multiplier": 0 31 | }, 32 | "trading_filter": { 33 | "active": 0, 34 | "name": "correlation", 35 | "filter_lookback_multiplier": 2, 36 | "lag": 1, 37 | "diff_threshold": 0 38 | }, 39 | "output": { 40 | "filename": "summary/results.xlsx" 41 | } 42 | } -------------------------------------------------------------------------------- /drafts/config/config_commodities_2011_2019.json: -------------------------------------------------------------------------------- 1 | { 2 | "dataset": { 3 | "path": "data/etfs/pickle/commodity_ETFs_long_updated", 4 | "ticker_attribute": "Ticker", 5 | "training_initial_date": "01-01-2011", 6 | "training_final_date": "01-01-2016", 7 | "testing_initial_date": "01-01-2016", 8 | "testing_final_date": "01-01-2019", 9 | "data_source": "yahoo", 10 | "nan_threshold": 0 11 | }, 12 | "PCA": { 13 | "N_COMPONENTS": [10,15] 14 | }, 15 | "clustering": { 16 | "algo": "DBSCAN", 17 | "epsilon": 0.4, 18 | "min_samples": 2 19 | }, 20 | "pair_restrictions": { 21 | "min_half_life": 5, 22 | "min_zero_crossings": 12, 23 | "p_value_threshold": 0.05, 24 | "hurst_threshold": 0.5 25 | }, 26 | "trading": { 27 | "strategy": "kalman", 28 | "lookback_multiplier": 2, 29 | "entry_multiplier": 1, 30 | "exit_multiplier": 0 31 | }, 32 | "trading_filter": { 33 | "active": 0, 34 | "name": "correlation", 35 | "filter_lookback_multiplier": 2, 36 | "lag": 1, 37 | "diff_threshold": 0 38 | }, 39 | "output": { 40 | "filename": "summary/results.xlsx" 41 | } 42 | } -------------------------------------------------------------------------------- /drafts/config/config_commodities_2012_2016.json: -------------------------------------------------------------------------------- 1 | { 2 | "dataset": { 3 | "path": "data/etfs/pickle/commodity_ETFs_long_updated", 4 | "ticker_attribute": "Ticker", 5 | "training_initial_date": "01-01-2012", 6 | "training_final_date": "01-01-2015", 7 | "testing_initial_date": "01-01-2015", 8 | "testing_final_date": "01-01-2016", 9 | "data_source": "yahoo", 10 | "nan_threshold": 0 11 | }, 12 | "PCA": { 13 | "N_COMPONENTS": [10,15] 14 | }, 15 | "clustering": { 16 | "algo": "DBSCAN", 17 | "epsilon": 0.4, 18 | "min_samples": 2 19 | }, 20 | "pair_restrictions": { 21 | "min_half_life": 5, 22 | "min_zero_crossings": 12, 23 | "p_value_threshold": 0.05, 24 | "hurst_threshold": 0.5 25 | }, 26 | "trading": { 27 | "strategy": "kalman", 28 | "lookback_multiplier": 2, 29 | "entry_multiplier": 1, 30 | "exit_multiplier": 0 31 | }, 32 | "trading_filter": { 33 | "active": 0, 34 | "name": "correlation", 35 | "filter_lookback_multiplier": 2, 36 | "lag": 1, 37 | "diff_threshold": 0 38 | }, 39 | "output": { 40 | "filename": "summary/results.xlsx" 41 | } 42 | } -------------------------------------------------------------------------------- /drafts/config/config_commodities_2012_2018.json: 
-------------------------------------------------------------------------------- 1 | { 2 | "dataset": { 3 | "path": "data/etfs/pickle/commodity_ETFs_long_updated", 4 | "ticker_attribute": "Ticker", 5 | "training_initial_date": "01-01-2012", 6 | "training_final_date": "01-01-2016", 7 | "testing_initial_date": "01-01-2016", 8 | "testing_final_date": "01-01-2018", 9 | "data_source": "yahoo", 10 | "nan_threshold": 0 11 | }, 12 | "PCA": { 13 | "N_COMPONENTS": [10,15] 14 | }, 15 | "clustering": { 16 | "algo": "DBSCAN", 17 | "epsilon": 0.4, 18 | "min_samples": 2 19 | }, 20 | "pair_restrictions": { 21 | "min_half_life": 5, 22 | "min_zero_crossings": 12, 23 | "p_value_threshold": 0.05, 24 | "hurst_threshold": 0.5 25 | }, 26 | "trading": { 27 | "strategy": "kalman", 28 | "lookback_multiplier": 2, 29 | "entry_multiplier": 1, 30 | "exit_multiplier": 0 31 | }, 32 | "trading_filter": { 33 | "active": 0, 34 | "name": "correlation", 35 | "filter_lookback_multiplier": 2, 36 | "lag": 1, 37 | "diff_threshold": 0 38 | }, 39 | "output": { 40 | "filename": "summary/results.xlsx" 41 | } 42 | } -------------------------------------------------------------------------------- /drafts/config/config_commodities_2013_2017.json: -------------------------------------------------------------------------------- 1 | { 2 | "dataset": { 3 | "path": "data/etfs/pickle/commodity_ETFs_long_updated", 4 | "ticker_attribute": "Ticker", 5 | "training_initial_date": "01-01-2013", 6 | "training_final_date": "01-01-2016", 7 | "testing_initial_date": "01-01-2016", 8 | "testing_final_date": "01-01-2017", 9 | "data_source": "yahoo", 10 | "nan_threshold": 0 11 | }, 12 | "PCA": { 13 | "N_COMPONENTS": [10,15] 14 | }, 15 | "clustering": { 16 | "algo": "DBSCAN", 17 | "epsilon": 0.4, 18 | "min_samples": 2 19 | }, 20 | "pair_restrictions": { 21 | "min_half_life": 5, 22 | "min_zero_crossings": 12, 23 | "p_value_threshold": 0.05, 24 | "hurst_threshold": 0.5 25 | }, 26 | "trading": { 27 | "strategy": "kalman", 28 | "lookback_multiplier": 2, 29 | "entry_multiplier": 1, 30 | "exit_multiplier": 0 31 | }, 32 | "trading_filter": { 33 | "active": 0, 34 | "name": "correlation", 35 | "filter_lookback_multiplier": 2, 36 | "lag": 1, 37 | "diff_threshold": 0 38 | }, 39 | "output": { 40 | "filename": "summary/results.xlsx" 41 | } 42 | } -------------------------------------------------------------------------------- /drafts/config/config_commodities_2013_2019.json: -------------------------------------------------------------------------------- 1 | { 2 | "dataset": { 3 | "path": "data/etfs/pickle/commodity_ETFs_long_updated", 4 | "ticker_attribute": "Ticker", 5 | "training_initial_date": "01-01-2013", 6 | "training_final_date": "01-01-2017", 7 | "testing_initial_date": "01-01-2017", 8 | "testing_final_date": "01-01-2019", 9 | "data_source": "yahoo", 10 | "nan_threshold": 0 11 | }, 12 | "PCA": { 13 | "N_COMPONENTS": [10,15] 14 | }, 15 | "clustering": { 16 | "algo": "DBSCAN", 17 | "epsilon": 0.4, 18 | "min_samples": 2 19 | }, 20 | "pair_restrictions": { 21 | "min_half_life": 5, 22 | "min_zero_crossings": 12, 23 | "p_value_threshold": 0.05, 24 | "hurst_threshold": 0.5 25 | }, 26 | "trading": { 27 | "strategy": "kalman", 28 | "lookback_multiplier": 2, 29 | "entry_multiplier": 1, 30 | "exit_multiplier": 0 31 | }, 32 | "trading_filter": { 33 | "active": 0, 34 | "name": "correlation", 35 | "filter_lookback_multiplier": 2, 36 | "lag": 1, 37 | "diff_threshold": 0 38 | }, 39 | "output": { 40 | "filename": "summary/results.xlsx" 41 | } 42 | } 
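Seen side by side, these configuration files differ essentially only in their date windows: in each of the year-ranged files shown here the test period starts exactly where the training period ends, and the file name encodes the training start year and the test end year. config_commodities_pr.json (further down) is the exception, with a test window that overlaps the training window, and config_commodities_2010_2019.json's dataset block keeps only a ticker_segment_dict entry and, as written, ends it with a trailing comma that standard JSON parsers reject. The date strings appear to be day-first, as the '31-12-2017' split in drafts/mlp_trainer.py suggests. The short sketch below is illustrative only (not part of the repository) and simply lists the windows:

import glob
import json

for path in sorted(glob.glob('drafts/config/config_commodities_*.json')):
    with open(path, 'r') as f:
        try:
            cfg = json.load(f)
        except json.JSONDecodeError:
            # e.g. config_commodities_2010_2019.json, whose trailing comma is not valid JSON
            print(path, 'could not be parsed')
            continue
    ds = cfg.get('dataset', {})
    if 'training_initial_date' in ds:
        print(path,
              'train', ds['training_initial_date'], '->', ds['training_final_date'],
              '| test', ds['testing_initial_date'], '->', ds['testing_final_date'])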
-------------------------------------------------------------------------------- /drafts/config/config_commodities_2014_2018.json: -------------------------------------------------------------------------------- 1 | { 2 | "dataset": { 3 | "path": "data/etfs/pickle/commodity_ETFs_long_updated", 4 | "ticker_attribute": "Ticker", 5 | "training_initial_date": "01-01-2014", 6 | "training_final_date": "01-01-2017", 7 | "testing_initial_date": "01-01-2017", 8 | "testing_final_date": "01-01-2018", 9 | "data_source": "yahoo", 10 | "nan_threshold": 0 11 | }, 12 | "PCA": { 13 | "N_COMPONENTS": [10,15] 14 | }, 15 | "clustering": { 16 | "algo": "DBSCAN", 17 | "epsilon": 0.4, 18 | "min_samples": 2 19 | }, 20 | "pair_restrictions": { 21 | "min_half_life": 5, 22 | "min_zero_crossings": 12, 23 | "p_value_threshold": 0.05, 24 | "hurst_threshold": 0.5 25 | }, 26 | "trading": { 27 | "strategy": "kalman", 28 | "lookback_multiplier": 2, 29 | "entry_multiplier": 1, 30 | "exit_multiplier": 0 31 | }, 32 | "trading_filter": { 33 | "active": 0, 34 | "name": "correlation", 35 | "filter_lookback_multiplier": 2, 36 | "lag": 1, 37 | "diff_threshold": 0 38 | }, 39 | "output": { 40 | "filename": "summary/results.xlsx" 41 | } 42 | } -------------------------------------------------------------------------------- /drafts/config/config_commodities_2015_2019.json: -------------------------------------------------------------------------------- 1 | { 2 | "dataset": { 3 | "path": "data/etfs/pickle/commodity_ETFs_long_updated", 4 | "ticker_segment_dict": "data/etfs/pickle/ticker_segment_dict.pickle", 5 | "ticker_attribute": "Ticker", 6 | "training_initial_date": "01-01-2015", 7 | "training_final_date": "01-01-2018", 8 | "testing_initial_date": "01-01-2018", 9 | "testing_final_date": "01-01-2019", 10 | "data_source": "yahoo", 11 | "nan_threshold": 0 12 | }, 13 | "PCA": { 14 | "N_COMPONENTS": [10,15] 15 | }, 16 | "clustering": { 17 | "algo": "DBSCAN", 18 | "epsilon": 0.4, 19 | "min_samples": 2 20 | }, 21 | "pair_restrictions": { 22 | "min_half_life": 5, 23 | "min_zero_crossings": 12, 24 | "p_value_threshold": 0.05, 25 | "hurst_threshold": 0.5 26 | }, 27 | "trading": { 28 | "strategy": "kalman", 29 | "lookback_multiplier": 2, 30 | "entry_multiplier": 1, 31 | "exit_multiplier": 0 32 | }, 33 | "trading_filter": { 34 | "active": 0, 35 | "name": "correlation", 36 | "filter_lookback_multiplier": 2, 37 | "lag": 1, 38 | "diff_threshold": 0 39 | }, 40 | "mlp": { 41 | "n_in": 5, 42 | "n_out": 1, 43 | "epochs": 200, 44 | "hidden_nodes":5, 45 | "loss_fct": "mse", 46 | "optimizer": "adam", 47 | "train_val_split": "2016-01-01" 48 | }, 49 | "output": { 50 | "filename": "summary/results.xlsx" 51 | } 52 | } -------------------------------------------------------------------------------- /drafts/config/config_commodities_pr.json: -------------------------------------------------------------------------------- 1 | { 2 | "dataset": { 3 | "path": "data/etfs/commodity_ETFs_long.xlsx", 4 | "ticker_attribute": "Ticker", 5 | "training_initial_date": "01-06-2012", 6 | "training_final_date": "01-01-2018", 7 | "testing_initial_date": "01-06-2017", 8 | "testing_final_date": "01-01-2018", 9 | "data_source": "yahoo", 10 | "nan_threshold": 0 11 | }, 12 | "PCA": { 13 | "N_COMPONENTS": 15 14 | }, 15 | "clustering": { 16 | "algo": "DBSCAN", 17 | "epsilon": 0.4, 18 | "min_samples": 2 19 | }, 20 | "pair_restrictions": { 21 | "min_half_life": 5, 22 | "min_zero_crossings": 12, 23 | "p_value_threshold": 0.05, 24 | "hurst_threshold": 0.5 25 | }, 26 | "trading": { 27 | 
"strategy": "kalman", 28 | "lookback_multiplier": 2, 29 | "entry_multiplier": 1, 30 | "exit_multiplier": 0 31 | }, 32 | "trading_filter": { 33 | "active": 0, 34 | "name": "correlation", 35 | "filter_lookback_multiplier": 2, 36 | "lag": 1, 37 | "diff_threshold": 0 38 | }, 39 | "output": { 40 | "filename": "summary/results.xlsx" 41 | } 42 | } -------------------------------------------------------------------------------- /drafts/draft.py: -------------------------------------------------------------------------------- 1 | # This file contains old functions, not being used anymore but might turn out helpful at some point in time 2 | 3 | # trader.py 4 | def bollinger_bands(self, y, x, lookback, entry_multiplier=1, exit_multiplier=0): 5 | """ 6 | This function implements a pairs trading strategy based 7 | on bollinger bands. 8 | Source: Example 3.2 EC's book 9 | : Y & X: time series composing the spread 10 | : lookback : Lookback period 11 | : entry_multiplier : defines the multiple of std deviation used to enter a position 12 | : exit_multiplier: defines the multiple of std deviation used to exit a position 13 | """ 14 | # print("Warning: don't forget lookback (halflife) must be at least 3.") 15 | 16 | entryZscore = entry_multiplier 17 | exitZscore = exit_multiplier 18 | 19 | # obtain zscore 20 | zscore, rolling_beta = self.rolling_zscore(y, x, lookback) 21 | zscore_array = np.asarray(zscore) 22 | 23 | # find long and short indices 24 | numUnitsLong = pd.Series([np.nan for i in range(len(y))]) 25 | numUnitsLong.iloc[0] = 0. 26 | long_entries = self.cross_threshold(zscore_array, -entryZscore, 'down', 'entry') 27 | numUnitsLong[long_entries] = 1.0 28 | long_exits = self.cross_threshold(zscore_array, -exitZscore, 'up') 29 | numUnitsLong[long_exits] = 0.0 30 | numUnitsLong = numUnitsLong.fillna(method='ffill') 31 | numUnitsLong.index = zscore.index 32 | 33 | numUnitsShort = pd.Series([np.nan for i in range(len(y))]) 34 | numUnitsShort.iloc[0] = 0. 
35 | short_entries = self.cross_threshold(zscore_array, entryZscore, 'up', 'entry') 36 | numUnitsShort[short_entries] = -1.0 37 | short_exits = self.cross_threshold(zscore_array, exitZscore, 'down') 38 | numUnitsShort[short_exits] = 0.0 39 | numUnitsShort = numUnitsShort.fillna(method='ffill') 40 | numUnitsShort.index = zscore.index 41 | 42 | # concatenate all positions 43 | numUnits = numUnitsShort + numUnitsLong 44 | numUnits = pd.Series(data=numUnits.values, index=y.index, name='numUnits') 45 | 46 | # position durations 47 | trading_durations = self.add_trading_duration(pd.DataFrame(numUnits, index=y.index)) 48 | 49 | beta = rolling_beta.copy() 50 | position_ret, _, ret_summary = self.calculate_sliding_position_returns(y, x, beta, numUnits) 51 | 52 | # get trade summary 53 | rolling_spread = y - rolling_beta * x 54 | 55 | # All series contain Date as index 56 | series_to_include = [(position_ret, 'position_return'), 57 | (y, y.name), 58 | (x, x.name), 59 | (rolling_beta, 'beta_position'), 60 | (rolling_spread, 'spread'), 61 | (zscore, 'zscore'), 62 | (numUnits, 'numUnits'), 63 | (trading_durations, 'trading_duration')] 64 | summary = self.trade_summary(series_to_include) 65 | 66 | return summary, ret_summary 67 | 68 | def bollinger_bands_ec(self, Y, X, lookback, entry_multiplier=1, exit_multiplier=0): 69 | df = pd.concat([Y, X], axis=1) 70 | df = df.reset_index() 71 | df['hedgeRatio'] = np.nan 72 | for t in range(lookback, len(df)): 73 | x = np.array(X)[t - lookback:t] 74 | x = sm.add_constant(x) 75 | y = np.array(Y)[t - lookback:t] 76 | df.loc[t, 'hedgeRatio'] = sm.OLS(y, x).fit().params[1] 77 | 78 | cols = [X.name, Y.name] 79 | 80 | yport = np.ones(df[cols].shape); 81 | yport[:, 0] = -df['hedgeRatio'] 82 | yport = yport * df[cols] 83 | 84 | yport = yport[X.name] + yport[Y.name] 85 | data_mean = pd.rolling_mean(yport, window=lookback) 86 | data_std = pd.rolling_std(yport, window=lookback) 87 | zScore = (yport - data_mean) / data_std 88 | 89 | entryZscore = entry_multiplier 90 | exitZscore = exit_multiplier 91 | 92 | longsEntry = zScore < -entryZscore 93 | longsExit = zScore > -exitZscore 94 | shortsEntry = zScore > entryZscore 95 | shortsExit = zScore < exitZscore 96 | 97 | numUnitsLong = pd.Series([np.nan for i in range(len(df))]) 98 | numUnitsShort = pd.Series([np.nan for i in range(len(df))]) 99 | numUnitsLong[0] = 0. 100 | numUnitsShort[0] = 0. 101 | 102 | numUnitsLong[longsEntry] = 1.0 103 | numUnitsLong[longsExit] = 0.0 104 | numUnitsLong = numUnitsLong.fillna(method='ffill') 105 | 106 | numUnitsShort[shortsEntry] = -1.0 107 | numUnitsShort[shortsExit] = 0.0 108 | numUnitsShort = numUnitsShort.fillna(method='ffill') 109 | df['numUnits'] = numUnitsShort + numUnitsLong 110 | 111 | tmp1 = np.ones(df[cols].shape) * np.array([df['numUnits']]).T 112 | tmp2 = np.ones(df[cols].shape) 113 | tmp2[:, 0] = -df['hedgeRatio'] 114 | positions = pd.DataFrame(tmp1 * tmp2 * df[cols]).fillna(0) 115 | pnl = positions.shift(1) * (df[cols] - df[cols].shift(1)) / df[cols].shift(1) 116 | pnl = pnl.sum(axis=1) 117 | ret = pnl / np.sum(np.abs(positions.shift(1)), axis=1) 118 | ret = ret.fillna(0) 119 | apr = ((np.prod(1. + ret)) ** (252. / len(ret))) - 1 120 | print('APR', apr) 121 | if np.std(ret) == 0: 122 | sharpe = 0 123 | else: 124 | sharpe = np.sqrt(252.) * np.mean(ret) / np.std(ret) # should the mean include moments of no holding? 
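    # Note: both the APR and the Sharpe ratio above annualize with a factor of 252, i.e. they
    # assume one return observation per trading day. If this draft were run on intraday bars
    # (for example the 78 five-minute bars per day implied by the 240 * 78 divisor used in
    # drafts/mlp_trainer.py), the annualization constant would need to become bars-per-year
    # instead, e.g. np.sqrt(252. * 78) for the Sharpe ratio.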
125 | print('Sharpe', sharpe) 126 | 127 | # checking results 128 | X = X.reset_index(drop=True) 129 | Y = Y.reset_index() 130 | pnl.name = 'pnl'; 131 | rolling_spread = yport 132 | rolling_spread.name = 'spread' 133 | zScore.name = 'zscore' 134 | ret.name = 'ret' 135 | numUnits = df['numUnits']; 136 | numUnits.name = 'position_during_day' 137 | numUnits = numUnits.shift() 138 | summary = pd.concat([pnl, ret, X, Y, rolling_spread, zScore, numUnits], axis=1) 139 | summary.index = summary['Date'] 140 | # new_df = new_df.loc[datetime(2006,7,26):] 141 | summary = summary[36:] 142 | 143 | return pnl, ret, summary, sharpe 144 | 145 | def linear_strategy(self, Y, X, lookback): 146 | """ 147 | This function applies a simple pairs trading strategy based on 148 | Ernie Chan's book: Algoritmic Trading. 149 | 150 | The number of shares for each position is set to be the negative 151 | z-score 152 | """ 153 | 154 | # z-score 155 | zscore = self.rolling_zscore(Y, X, lookback) 156 | numUnits = -zscore 157 | 158 | # Define strategy 159 | # Multiply num positions inversely (-) proportionally to z-score 160 | # ATTENTION: in the book the signals are inverted. The author confirms it here: 161 | # http://epchan.blogspot.com/2013/05/my-new-book-on-algorithmic-trading-is.html 162 | X_positions = numUnits*(-rolling_beta*X) 163 | Y_positions = numUnits*Y 164 | 165 | # P&L:position (-spread value) * percentage of change 166 | # note that pnl is not a percentage. We multiply a position value by a percentage 167 | X_returns = (X - X.shift(periods=-1))/X.shift(periods=-1) 168 | Y_returns = (Y - Y.shift(periods=-1))/Y.shift(periods=-1) 169 | pnl = X_positions.shift(periods=-1)*X_returns + Y_positions.shift(periods=-1)*Y_returns 170 | total_pnl = (X_positions.shift(periods=-1)*(X - X.shift(periods=-1)) + \ 171 | Y_positions.shift(periods=-1)*(Y - Y.shift(periods=-1))).sum() 172 | ret=pnl/(abs(X_positions.shift(periods=-1))+abs(Y_positions).shift(periods=-1)) 173 | 174 | return pnl, total_pnl, ret 175 | 176 | def calculate_returns_no_rebalance(self, y, x, beta, positions): 177 | """ 178 | Y: price of ETF Y 179 | X: price of ETF X 180 | beta: cointegration ratio 181 | positions: array indicating position to enter in next day 182 | """ 183 | 184 | # calculate each leg return 185 | y_returns = y.pct_change().fillna(0); y_returns.name = 'y_returns' 186 | x_returns = x.pct_change().fillna(0); x_returns.name = 'x_returns' 187 | 188 | # positions preceed the day when the position is actually entered! 
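    # (i.e. the position signalled at time t is only held from t+1 onwards, which is why both
    # the betas and the positions are shifted down one row below before returns are applied)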
189 | # get indices before entering position 190 | new_positions = positions.diff()[positions.diff() != 0].index.values 191 | # get corresponding betas 192 | beta_position = pd.Series(data=[np.nan] * len(y), index=y.index, name='beta_position') 193 | beta_position[new_positions] = beta[new_positions] 194 | # fill in between time slots with same beta 195 | beta_position = beta_position.fillna(method='ffill') 196 | # shift betas to match row when position is on 197 | beta_position = beta_position.shift().fillna(0) 198 | # name positions series 199 | positions.name = 'positions' 200 | 201 | # apply returns per trade 202 | # each row contain all the parameters to be applied in that position 203 | df = pd.concat([y_returns, x_returns, beta_position, positions.shift().fillna(0)], axis=1) 204 | returns = df.apply(lambda row: self.return_per_timestep(row), axis=1) 205 | cum_returns = np.cumprod(returns + 1) - 1 206 | 207 | return returns, cum_returns 208 | 209 | def calculate_returns_adapted(self, y, x, beta, positions): 210 | """ 211 | Y: price of ETF Y 212 | X: price of ETF X 213 | beta: cointegration ratio 214 | positions: array indicating when to take a position 215 | """ 216 | # calculate each leg return 217 | y_returns = y.pct_change().fillna(0); 218 | y_returns.name = 'y_returns' 219 | x_returns = x.pct_change().fillna(0); 220 | x_returns.name = 'x_returns' 221 | 222 | # name positions series 223 | positions.name = 'positions' 224 | 225 | # beta must shift from row above 226 | beta_position = beta.shift().fillna(0) 227 | beta_position.name = 'beta_position' 228 | 229 | # apply returns per trade 230 | df = pd.concat([y_returns, x_returns, beta_position, positions], axis=1) 231 | returns = df.apply(lambda row: self.return_per_timestep(row), axis=1) 232 | cum_returns = np.cumprod(returns + 1) - 1 233 | 234 | return returns, cum_returns 235 | 236 | def filter_profitable_pairs(self, sharpe_results, pairs): 237 | """ 238 | This function discards pairs that were not profitable mantaining those for which a positive sharpe ratio was 239 | obtained. 240 | :param sharpe_results: list with sharpe resutls for every pair 241 | :param pairs: list with all pairs and their info 242 | :return: list with profitable pairs and their info 243 | """ 244 | 245 | sharpe_results = np.asarray(sharpe_results) 246 | profitable_pairs_indices = np.argwhere(sharpe_results > 0) 247 | profitable_pairs = [pairs[i] for i in profitable_pairs_indices.flatten()] 248 | 249 | return profitable_pairs 250 | 251 | def rolling_zscore(self, Y, X, lookback): 252 | """ 253 | This function calculates the normalized moving spread 254 | Note that moving average and moving std will have the first 39 values as np.Nan, because 255 | the spread is only valid after 20 points, and the moving averages still need 20 points more 256 | to define its value. 257 | """ 258 | # Calculate moving parameters 259 | # 1.beta: 260 | rolling_beta = self.rolling_regression(Y, X, window=lookback) 261 | # 2.spread: 262 | rolling_spread = Y - rolling_beta * X 263 | # 3.moving average 264 | rolling_avg = rolling_spread.rolling(window=lookback, center=False).mean() 265 | rolling_avg.name = 'spread_' + str(lookback) + 'mavg' 266 | # 4. 
rolling standard deviation 267 | rolling_std = rolling_spread.rolling(window=lookback, center=False).std() 268 | rolling_std.name = 'rolling_std_' + str(lookback) 269 | 270 | # z-score 271 | zscore = (rolling_spread - rolling_avg) / rolling_std 272 | 273 | return zscore, rolling_beta 274 | 275 | def rolling_regression(self, y, x, window): 276 | """ 277 | y and x must be pandas.Series 278 | y is the dependent variable 279 | x is the independent variable 280 | spread: y - b*x 281 | Source: https://stackoverflow.com/questions/37317727/deprecated-rolling-window- 282 | option-in-ols-from-pandas-to-statsmodels/39704930#39704930 283 | """ 284 | # Clean-up 285 | x = x.dropna() 286 | y = y.dropna() 287 | # Trim acc to shortest 288 | if x.index.size > y.index.size: 289 | x = x[y.index] 290 | else: 291 | y = y[x.index] 292 | # Verify enough space 293 | if x.index.size < window: 294 | return None 295 | else: 296 | # Add a constant if needed 297 | X_name = x.name 298 | X = x.to_frame() 299 | X['c'] = 1 300 | # Loop... this can be improved 301 | estimate_data = [] 302 | for i in range(window, len(X)): 303 | X_slice = X.iloc[i - window:i, :] # always index in np as opposed to pandas, much faster 304 | y_slice = y.iloc[i - window:i] 305 | coeff = sm.OLS(y_slice, X_slice).fit() 306 | estimate_data.append(coeff.params[X_name]) 307 | 308 | # Assemble 309 | estimate = pd.Series(data=np.nan, index=x.index[:window]) 310 | # add nan values for first #lookback indices 311 | estimate = estimate.append(pd.Series(data=estimate_data, index=x.index[window:])) 312 | return estimate 313 | 314 | def kalman_filter(self, y, x, entry_multiplier=1.0, exit_multiplier=1.0, stabilizing_threshold=5): 315 | """ 316 | This function implements a Kalman Filter for the estimation of 317 | the moving hedge ratio 318 | :param y: 319 | :param x: 320 | :param entry_multiplier: 321 | :param exit_multiplier: 322 | :param stabilizing_threshold: 323 | :return: 324 | """ 325 | 326 | # store series for late usage 327 | x_series = x.copy() 328 | y_series = y.copy() 329 | 330 | # add constant 331 | x = x.to_frame() 332 | x['intercept'] = 1 333 | 334 | x = np.array(x) 335 | y = np.array(y) 336 | delta = 0.0001 337 | Ve = 0.001 338 | 339 | yhat = np.ones(len(y)) * np.nan 340 | e = np.ones(len(y)) * np.nan 341 | Q = np.ones(len(y)) * np.nan 342 | R = np.zeros((2, 2)) 343 | P = np.zeros((2, 2)) 344 | 345 | beta = np.matrix(np.zeros((2, len(y))) * np.nan) 346 | 347 | Vw = delta / (1 - delta) * np.eye(2) 348 | 349 | beta[:, 0] = 0. 
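    # The loop below is a standard Kalman filter in which the hidden state is the 2-vector
    # [hedge ratio, intercept] and is modelled as a random walk:
    #     state:       beta_t = beta_{t-1} + w_t,    Cov(w_t) = Vw = delta / (1 - delta) * I
    #     observation: y_t    = x_t . beta_t + v_t,  Var(v_t) = Ve
    # Each iteration first predicts (R = P + Vw, yhat = x . beta, Q = x R x' + Ve) and then
    # corrects with the Kalman gain K = R x' / Q. The one-step prediction error e = y - yhat
    # plays the role of the spread, and sqrt(Q) is its estimated standard deviation, which is
    # why the entry/exit rules further down compare e against entry_multiplier * sqrt(Q) and
    # exit_multiplier * sqrt(Q).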
350 | 351 | for t in range(len(y)): 352 | if (t > 0): 353 | beta[:, t] = beta[:, t - 1] 354 | R = P + Vw 355 | 356 | yhat[t] = np.dot(x[t, :], beta[:, t]) 357 | 358 | tmp1 = np.matrix(x[t, :]) 359 | tmp2 = np.matrix(x[t, :]).T 360 | Q[t] = np.dot(np.dot(tmp1, R), tmp2) + Ve 361 | 362 | e[t] = y[t] - yhat[t] # plays spread role 363 | 364 | K = np.dot(R, np.matrix(x[t, :]).T) / Q[t] 365 | 366 | # print R;print x[t, :].T;print Q[t];print 'K',K;print;print 367 | 368 | beta[:, t] = beta[:, t] + np.dot(K, np.matrix(e[t])) 369 | 370 | tmp1 = np.matrix(x[t, :]) 371 | P = R - np.dot(np.dot(K, tmp1), R) 372 | 373 | # if t==2: 374 | # print beta[0, :].T 375 | 376 | # plt.plot(beta[0, :].T) 377 | # plt.savefig('/tmp/beta1.png') 378 | # plt.hold(False) 379 | # plt.plot(beta[1, :].T) 380 | # plt.savefig('/tmp/beta2.png') 381 | # plt.hold(False) 382 | # plt.plot(e[2:], 'r') 383 | # plt.hold(True) 384 | # plt.plot(np.sqrt(Q[2:])) 385 | # plt.savefig('/tmp/Q.png') 386 | 387 | y2 = pd.concat([x_series, y_series], axis=1) 388 | 389 | longsEntry = e < -entry_multiplier * np.sqrt(Q) 390 | longsExit = e > -exit_multiplier * np.sqrt(Q) 391 | 392 | shortsEntry = e > entry_multiplier * np.sqrt(Q) 393 | shortsExit = e < exit_multiplier * np.sqrt(Q) 394 | 395 | numUnitsLong = pd.Series([np.nan for i in range(len(y))]) 396 | numUnitsShort = pd.Series([np.nan for i in range(len(y))]) 397 | # initialize with zero 398 | numUnitsLong[0] = 0. 399 | numUnitsShort[0] = 0. 400 | # remove trades while the spread is stabilizing 401 | longsEntry[:stabilizing_threshold] = False 402 | longsExit[:stabilizing_threshold] = False 403 | shortsEntry[:stabilizing_threshold] = False 404 | shortsExit[:stabilizing_threshold] = False 405 | 406 | numUnitsLong[longsEntry] = 1. 407 | numUnitsLong[longsExit] = 0 408 | numUnitsLong = numUnitsLong.fillna(method='ffill') 409 | 410 | numUnitsShort[shortsEntry] = -1. 
411 | numUnitsShort[shortsExit] = 0 412 | numUnitsShort = numUnitsShort.fillna(method='ffill') 413 | 414 | numUnits = numUnitsLong + numUnitsShort 415 | numUnits = pd.Series(data=numUnits.values, index=y_series.index, name='numUnits') 416 | 417 | # position durations 418 | trading_durations = self.add_trading_duration(pd.DataFrame(numUnits, index=y_series.index)) 419 | 420 | beta = pd.Series(data=np.squeeze(np.asarray(beta[0, :])), index=y_series.index).fillna(0) 421 | position_ret, _, ret_summary = self.calculate_sliding_position_returns(y_series, x_series, beta, numUnits) 422 | 423 | # add transaction costs and gather all info in df 424 | series_to_include = [(position_ret, 'position_return'), 425 | (y_series, y_series.name), 426 | (x_series, x_series.name), 427 | (beta, 'beta_position'), 428 | (pd.Series(e, index=y_series.index), 'e'), 429 | (pd.Series(np.sqrt(Q), index=y_series.index), 'sqrt(Q)'), 430 | (numUnits, 'numUnits'), 431 | (trading_durations, 'trading_duration')] 432 | 433 | summary = self.trade_summary(series_to_include) 434 | 435 | return summary, ret_summary 436 | 437 | def cross_threshold(self, array, threshold, direction='up', position='exit'): 438 | """ 439 | This function returns the indices corresponding to the positions where a given threshold 440 | is crossed 441 | :param array: np.array with time series 442 | :param threshold: threshold to be crossed 443 | :param direction: going up or down 444 | :param mode: auxiliar variable indicating whether we are checking for a position entry or exit 445 | :return: indices where threshold is crossed going in the desired direction 446 | """ 447 | 448 | # add index for first element transitioning from None value, in case its above/below threshold 449 | # only add when checking if position should be entered. 450 | initial_index = [] 451 | first_index, first_element = next((item[0], item[1]) for item in enumerate(array) if not np.isnan(item[1])) 452 | if position == 'entry': 453 | if direction == 'up': 454 | if first_element > threshold: 455 | initial_index.append(first_index) 456 | elif direction == 'down': 457 | if first_element < threshold: 458 | initial_index.append(first_index) 459 | else: 460 | print('The series must be either going "up" or "down", please insert valid direction') 461 | initial_index = np.asarray(initial_index, dtype='int') 462 | 463 | # add small decimal case to consider only strictly larger/smaller 464 | if threshold > 0: 465 | threshold = threshold + 0.000000001 466 | else: 467 | threshold = threshold - 0.000000001 468 | array = array - threshold 469 | 470 | # add other indices 471 | indices = np.where(np.diff(np.sign(array)))[0] + 1 472 | # only consider indices after first element which is not Nan 473 | indices = indices[indices > first_index] 474 | 475 | direction_indices = indices 476 | for index in indices: 477 | if direction == 'up': 478 | if array[index] < array[index - 1]: 479 | direction_indices = direction_indices[direction_indices != index] 480 | elif direction == 'down': 481 | if array[index] > array[index - 1]: 482 | direction_indices = direction_indices[direction_indices != index] 483 | # concatenate 484 | direction_indices = np.concatenate((initial_index, direction_indices), axis=0) 485 | 486 | return direction_indices 487 | 488 | def apply_correlation_filter(self, lookback, lag, threshold, Y, X, units): 489 | """ 490 | This function implements a filter proposed by Dunnis 2005. 
491 | The main idea is tracking how the correlation is varying in a moving period, so that we 492 | are able to identify when the two legs of the spread are moving in opposing directions 493 | by analyzing how the correlation values are varying. 494 | :param lookback: lookback period 495 | :param lag: lag to compare the correlaiton evolution 496 | :param threshold: minimium difference to consider change 497 | :param Y: Y series 498 | :param X: X series 499 | :param units: positions taken 500 | :return: indices for position entry 501 | """ 502 | 503 | # calculate correlation variations 504 | rolling_window = lookback 505 | returns_X = X.pct_change() 506 | returns_Y = Y.pct_change() 507 | correlation = returns_X.rolling(rolling_window).corr(returns_Y) 508 | diff_correlation = correlation.diff(periods=lag).fillna(0) 509 | 510 | # change positions accordingly 511 | diff_correlation.name = 'diff_correlation'; 512 | units.name = 'units' 513 | units.index = diff_correlation.index 514 | df = pd.concat([diff_correlation, units], axis=1) 515 | new_df = self.update_positions(df, 'diff_correlation', threshold) 516 | 517 | units = new_df['units'] 518 | 519 | return units 520 | 521 | def apply_zscorediff_filter(self, lag, threshold, zscore, units): 522 | """ 523 | This function implements a filter which tracks how the zscore has been growing. 524 | The premise is that positions should not be entered while zscore is rising. 525 | :param lookback: lookback period 526 | :param lag: lag to compare the zscore evolution 527 | :param threshold: minimium difference to consider change 528 | :param Y: Y series 529 | :param X: X series 530 | :param units: positions taken 531 | :return: indices for position entry 532 | """ 533 | 534 | # calculate zscore differences 535 | zscore_diff = zscore.diff(periods=lag).fillna(0) 536 | 537 | # change positions accordingly 538 | zscore_diff.name = 'zscore_diff'; 539 | units.name = 'units' 540 | units.index = zscore_diff.index 541 | df = pd.concat([zscore_diff, units], axis=1) 542 | new_df = self.update_positions(df, 'zscore_diff', threshold) 543 | 544 | units = new_df['units'] 545 | 546 | return units 547 | 548 | def calculate_sliding_position_returns(self, y, x, beta, positions): 549 | """ 550 | Y: price of ETF Y 551 | X: price of ETF X 552 | beta: moving cointegration ratio 553 | positions: array indicating position to enter in next day 554 | """ 555 | # get copy of series 556 | y = y.copy() 557 | y.name = 'y' 558 | x = x.copy() 559 | x.name = 'x' 560 | 561 | # positions preceed the day when the position is actually entered! 562 | # get indices before entering position 563 | new_positions = positions.diff()[positions.diff() != 0].index.values 564 | # get corresponding betas 565 | beta_position = pd.Series(data=[np.nan] * len(y), index=y.index, name='beta_position') 566 | beta_position[new_positions] = beta[new_positions] 567 | # fill in between time slots with same beta 568 | beta_position = beta_position.fillna(method='ffill') 569 | # shift betas to match row when position is on 570 | beta_position = beta_position.shift().fillna(0) 571 | 572 | # create variable for signalizing end of position 573 | end_position = pd.Series(data=[0] * len(y), index=y.index, name='end_position') 574 | end_position[new_positions] = 1. 
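    # The block below records the Y and X prices observed when each position is entered
    # (forward-filled and shifted one row, like the betas above), so that every row of the
    # assembled dataframe carries the entry prices needed to value the open position, while
    # end_position flags the rows where a new position is signalled.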
575 | 576 | # get corresponding X and Y 577 | y_entry = pd.Series(data=[np.nan] * len(y), index=y.index, name='y_entry') 578 | x_entry = pd.Series(data=[np.nan] * len(y), index=y.index, name='x_entry') 579 | y_entry[new_positions] = y[new_positions] 580 | x_entry[new_positions] = x[new_positions] 581 | y_entry = y_entry.shift().fillna(method='ffill') 582 | x_entry = x_entry.shift().fillna(method='ffill') 583 | 584 | # name positions series 585 | positions.name = 'positions' 586 | 587 | # apply returns per trade 588 | # each row contain all the parameters to be applied in that position 589 | df = pd.concat([y, x, beta_position, positions.shift().fillna(0), y_entry, x_entry, end_position], axis=1) 590 | returns = df.apply(lambda row: self.return_per_position(row, sliding=True), axis=1).fillna(0) 591 | cum_returns = np.cumprod(returns + 1) - 1 592 | df['ret'] = returns 593 | returns.name = 'position_return' 594 | 595 | return returns, cum_returns, df 596 | 597 | def return_per_timestep(self, row): 598 | if row['beta_position'] > 1.: 599 | return ((1 / row['beta_position']) * row['y_returns'] - 1 * row['x_returns']) * row['positions'] 600 | else: 601 | return (row['y_returns'] - row['beta_position'] * row['x_returns']) * row['positions'] 602 | 603 | def update_positions(self, df, attribute, threshold): 604 | """ 605 | The following function receives a dataframe containing the current positions 606 | along with the attribute column from which condition should be verified. 607 | A new df with positions updated accordingly is returned. 608 | :param df: df containing positions and column with attribute 609 | :param attribute: attribute name 610 | :param threshold: threshold that condition must verify 611 | :return: df with updated positions 612 | """ 613 | previous_unit = 0 614 | for index, row in df.iterrows(): 615 | if previous_unit == row['units']: 616 | continue # no change in positions to verify 617 | else: 618 | if row['units'] == 0: 619 | previous_unit = row['units'] 620 | continue # simply close trade, nothing to verify 621 | else: 622 | if (row[attribute] <= threshold and row['units'] < 0) or \ 623 | (row[attribute] > threshold and row['units'] > 0): # if criteria is met, continue 624 | previous_unit = row['units'] 625 | continue 626 | else: # if criteria is not met, update row 627 | df.loc[index, 'units'] = 0 628 | previous_unit = 0 629 | continue 630 | 631 | return df 632 | 633 | # data_processor.py 634 | def read_tickers_prices(self, tickers, initial_date, final_date, data_source, column='Adj Close'): 635 | """ 636 | This function reads the price series for the requested tickers 637 | 638 | :param tickers: list with tickers from which to retrieve prices 639 | :param initial_date: start date to retrieve price series 640 | :param final_date: end point 641 | :param data_source: data source from where to retrieve data 642 | 643 | :return: dictionary with price series for each ticker 644 | """ 645 | error_counter = 0 646 | dataset = {key: None for key in tickers} 647 | for ticker in tickers: 648 | try: 649 | df = data.DataReader(ticker, data_source, initial_date, final_date) 650 | series = df[column] 651 | series.name = ticker # filter close price only 652 | dataset[ticker] = series.copy() 653 | except: 654 | error_counter = error_counter + 1 655 | print('Not Possible to retrieve information for ' + ticker) 656 | 657 | print('\nUnable to download ' + str(error_counter / len(tickers) * 100) + '% of the ETFs') 658 | 659 | return dataset 660 | 661 | 662 | 663 | # forecasting notebook 664 | 665 
| def apply_ARIMA(series, p, d, q): 666 | # fit model 667 | model = ARIMA(series, order=(p,d,q)) 668 | model_fit = model.fit(disp=0) 669 | print(model_fit.summary()) 670 | # plot residual errors 671 | residuals = pd.DataFrame(model_fit.resid) 672 | residuals.plot() 673 | plt.show() 674 | residuals.plot(kind='kde') 675 | plt.show() 676 | print(residuals.describe()) 677 | 678 | def rolling_ARIMA(series, p, d, q, train_val_split): 679 | # standardize 680 | mean = series.mean() 681 | std = np.std(series) 682 | norm_series = (series - mean) / std 683 | 684 | train, val = norm_series[:train_val_split].values, norm_series[train_val_split:].values 685 | history = np.asarray([x for x in train]) 686 | predictions = list() 687 | for t in range(len(val)): 688 | model = ARIMA(history, order=(p, d, q)) 689 | model_fit = model.fit(transparams=False, trend='nc', tol=0.0001, disp=0) 690 | if t == 0: 691 | print(model_fit.summary()) 692 | print(history[-5:]) 693 | output = model_fit.forecast() 694 | yhat = output[0] 695 | predictions.append(yhat) 696 | obs = val[t] 697 | history = np.append(history, obs) 698 | print('predicted=%f, expected=%f' % (yhat, obs)) 699 | 700 | # destandardize 701 | val = val * std + mean 702 | predictions = np.asarray(predictions); 703 | predictions = predictions * std + mean 704 | error = mean_squared_error(val, predictions) 705 | print('Test MSE: {}'.format(error)) 706 | # plot 707 | # plt.plot(val) 708 | # plt.plot(predictions, color='red') 709 | # plt.show() 710 | 711 | return error, predictions 712 | -------------------------------------------------------------------------------- /drafts/main.py: -------------------------------------------------------------------------------- 1 | import pandas as pd 2 | import numpy as np 3 | import json 4 | import sys 5 | import pickle 6 | from classes import class_Trader, class_ForecastingTrader, class_DataProcessor, class_SeriesAnalyser 7 | 8 | # just set the seed for the random number generator 9 | np.random.seed(107) 10 | 11 | if __name__ == "__main__": 12 | 13 | # read inout parameters 14 | config_path = sys.argv[1] 15 | pairs_mode = int(sys.argv[2]) 16 | trade_mode = int(sys.argv[3]) # 1. Benchmark 2.ML 3.Both 17 | with open(config_path, 'r') as f: 18 | config = json.load(f) 19 | 20 | ################################################################################################################### 21 | # 1. Upload Dataset 22 | # This code assumes the data preprocessing has been done previously by running the notebook: 23 | # - PairsTrading-DataPreprocessing.ipynb 24 | # Therefore, we simply retrieve the data from a pickle file and select the dates to study 25 | ################################################################################################################### 26 | 27 | # initialize data processor 28 | data_processor = class_DataProcessor.DataProcessor() 29 | 30 | # Read dataset and select dates 31 | dataset_path = config['dataset']['path'] 32 | df_prices = pd.read_pickle(dataset_path) 33 | 34 | # split data in training and test 35 | df_prices_train, df_prices_test = data_processor.split_data(df_prices, 36 | (config['dataset']['training_initial_date'], 37 | config['dataset']['training_final_date']), 38 | (config['dataset']['testing_initial_date'], 39 | config['dataset']['testing_final_date']), 40 | remove_nan=True) 41 | 42 | ################################################################################################################### 43 | # 2. 
Pairs Filtering & Selection 44 | # As this part is very visual, the pairs filtering and selection can be obtained by running the notebook: 45 | # - 'PairsTrading-Clustering.ipynb' 46 | # This section uploads the pairs for each scenario 47 | ################################################################################################################### 48 | # initialize series analyser 49 | series_analyser = class_SeriesAnalyser.SeriesAnalyser() 50 | 51 | if pairs_mode == 1: 52 | with open('data/etfs/pickle/pairs_unfiltered.pickle', 'rb') as handle: 53 | pairs = pickle.load(handle) 54 | elif pairs_mode == 2: 55 | with open('data/etfs/pickle/pairs_category.pickle', 'rb') as handle: 56 | pairs = pickle.load(handle) 57 | elif pairs_mode == 3: 58 | with open('data/etfs/pickle/pairs_unsupervised_learning.pickle', 'rb') as handle: 59 | pairs = pickle.load(handle) 60 | 61 | ################################################################################################################### 62 | # 3. Apply trading 63 | # First apply the strategy to the training data, to discard the pairs that were not profitable not even in the 64 | # training period. 65 | # Secondly, apply the strategy on the test set 66 | ################################################################################################################### 67 | trader = class_Trader.Trader() 68 | 69 | # obtain trading strategy 70 | trading_strategy = config['trading']['strategy'] 71 | 72 | # obtain trading filter info 73 | if config['trading_filter']['active'] == 1: 74 | trading_filter = config['trading_filter'] 75 | else: 76 | trading_filter = None 77 | 78 | # ################################################ BENCHMARK ####################################################### 79 | if (trade_mode == 1) or (trade_mode == 3): 80 | # Run on TRAIN SET 81 | if 'bollinger' in trading_strategy: 82 | sharpe_results, cum_returns, performance = \ 83 | trader.apply_bollinger_strategy(pairs=pairs, 84 | lookback_multiplier=config['trading']['lookback_multiplier'], 85 | entry_multiplier=config['trading']['entry_multiplier'], 86 | exit_multiplier=config['trading']['exit_multiplier'], 87 | trading_filter=trading_filter, 88 | test_mode=False 89 | ) 90 | elif 'kalman' in trading_strategy: 91 | sharpe_results, cum_returns, performance = \ 92 | trader.apply_kalman_strategy(pairs, 93 | entry_multiplier=config['trading']['entry_multiplier'], 94 | exit_multiplier=config['trading']['exit_multiplier'], 95 | trading_filter=trading_filter, 96 | test_mode=False 97 | ) 98 | else: 99 | print('Please insert valid trading strategy: 1. 
"bollinger" or 2."kalman"') 100 | exit() 101 | 102 | # get train metrics 103 | n_years_train = round(len(df_prices_train) / 240) 104 | train_metrics = trader.calculate_metrics(sharpe_results, cum_returns, n_years_train) 105 | 106 | # filter pairs with positive results 107 | profitable_pairs = trader.filter_profitable_pairs(sharpe_results=sharpe_results, pairs=pairs) 108 | 109 | # Run on TEST SET 110 | if 'bollinger' in trading_strategy: 111 | sharpe_results, cum_returns, performance = \ 112 | trader.apply_bollinger_strategy(pairs=profitable_pairs, 113 | lookback_multiplier=config['trading']['lookback_multiplier'], 114 | entry_multiplier=config['trading']['entry_multiplier'], 115 | exit_multiplier=config['trading']['exit_multiplier'], 116 | trading_filter=trading_filter, 117 | test_mode=True 118 | ) 119 | print('Avg sharpe Ratio using Bollinger in test set: ', np.mean(sharpe_results)) 120 | 121 | elif 'kalman' in trading_strategy: 122 | sharpe_results, cum_returns, performance = \ 123 | trader.apply_kalman_strategy(pairs=profitable_pairs, 124 | entry_multiplier=config['trading']['entry_multiplier'], 125 | exit_multiplier=config['trading']['exit_multiplier'], 126 | trading_filter=trading_filter, 127 | test_mode=True 128 | ) 129 | print('Avg sharpe Ratio using kalman in the test set: ', np.mean(sharpe_results)) 130 | 131 | # ################################################ ML BASED ####################################################### 132 | if (trade_mode == 2) or (trade_mode == 3): 133 | 134 | forecasting_trader = class_ForecastingTrader.ForecastingTrader() 135 | 136 | # 1) get pairs spreads and train models 137 | mlp_config = config['mlp'] 138 | mlp_config['train_val_split'] = int(config['mlp']['train_val_split']*len(pairs[0][2]['spread'])) 139 | models = forecasting_trader.train_models(pairs[:2], model_config=mlp_config) # CHANGE LIMITATION OF PAIRS 140 | 141 | # 2) test models on training set and only keep profitable spreads 142 | print('Still under construction') 143 | exit() 144 | 145 | # 3) test spreads on test set 146 | 147 | ################################################################################################################### 148 | # 4. Get results 149 | # Obtain the results in the test set. 
150 | # - writes global pairs results in an excel file 151 | # - stores dataframe with info regarding every pair in pickle file 152 | ################################################################################################################### 153 | with open(config['dataset']['ticker_segment_dict'], 'rb') as handle: 154 | ticker_segment_dict = pickle.load(handle) 155 | 156 | results, pairs_summary = trader.summarize_results(sharpe_results, cum_returns, performance, profitable_pairs, 157 | ticker_segment_dict) 158 | 159 | -------------------------------------------------------------------------------- /drafts/mlp_trainer.py: -------------------------------------------------------------------------------- 1 | from classes import class_ForecastingTrader, class_DataProcessor 2 | import numpy as np 3 | np.random.seed(1) # NumPy 4 | import random 5 | random.seed(3) # Python 6 | import tensorflow as tf 7 | tf.set_random_seed(2) # Tensorflow 8 | session_conf = tf.ConfigProto(intra_op_parallelism_threads=1, 9 | inter_op_parallelism_threads=1) 10 | from keras import backend as K 11 | sess = tf.Session(graph=tf.get_default_graph(), config=session_conf) 12 | K.set_session(sess) 13 | 14 | import pandas as pd 15 | import pickle 16 | import gc 17 | 18 | forecasting_trader = class_ForecastingTrader.ForecastingTrader() 19 | data_processor = class_DataProcessor.DataProcessor() 20 | 21 | ################################# READ PRICES AND PAIRS ################################# 22 | # read prices 23 | df_prices = pd.read_pickle('/content/drive/PairsTrading/2009-2019/commodity_ETFs_intraday_interpolated_screened_no_outliers.pickle') 24 | # split data in training and test 25 | df_prices_train, df_prices_test = data_processor.split_data(df_prices, 26 | ('01-01-2009', 27 | '31-12-2017'), 28 | ('01-01-2018', 29 | '31-12-2018'), 30 | remove_nan=True) 31 | # load pairs 32 | with open('/content/drive/PairsTrading/2009-2019/pairs_unsupervised_learning_optical_intraday.pickle', 'rb') as handle: 33 | pairs = pickle.load(handle) 34 | n_years_train = round(len(df_prices_train) / (240 * 78)) 35 | print('Loaded {} pairs!'.format(len(pairs))) 36 | 37 | 38 | ################################# TRAIN MODELS ################################# 39 | 40 | n_in_set = [6, 12, 24] 41 | hidden_nodes_set = [[10], [20], [30], [10,10]] 42 | hidden_nodes_names = [str(nodes[0])+'*2' if len(nodes) > 1 else str(nodes[0]) for nodes in hidden_nodes_set] 43 | 44 | # WARNING!! 
45 | # pairs = pairs[:2] 46 | 47 | for input_dim in n_in_set: 48 | for i, hidden_nodes in enumerate(hidden_nodes_set): 49 | model_config = {"n_in": input_dim, 50 | "n_out": 1, 51 | "epochs": 500, 52 | "hidden_nodes": hidden_nodes, 53 | "loss_fct": "mse", 54 | "optimizer": "rmsprop", 55 | "batch_size": 256, 56 | "train_val_split": '2017-01-01', 57 | "test_init": '2018-01-01'} 58 | models = forecasting_trader.train_models(pairs, model_config, model_type='mlp') 59 | # save models for this configuration 60 | with open('/content/drive/PairsTrading/mlp_models/models_n_in-'+str(input_dim)+'_hidden_nodes-'+hidden_nodes_names[i]+'.pkl', 'wb') as f: 61 | pickle.dump(models, f) 62 | 63 | gc.collect() -------------------------------------------------------------------------------- /training/PairsTrading_DeepLearning.ipynb: -------------------------------------------------------------------------------- 1 | {"nbformat":4,"nbformat_minor":0,"metadata":{"colab":{"name":"PairsTrading_DeepLearning.ipynb","version":"0.3.2","provenance":[],"collapsed_sections":[]},"kernelspec":{"name":"python3","display_name":"Python 3"},"accelerator":"GPU"},"cells":[{"cell_type":"code","metadata":{"id":"EBl2Ok7tbGHC","colab_type":"code","outputId":"c4fbb45a-3fc9-4f6d-95b3-d341445f23b8","executionInfo":{"status":"ok","timestamp":1567675837287,"user_tz":-60,"elapsed":4098,"user":{"displayName":"Simao Sarmento","photoUrl":"https://lh3.googleusercontent.com/a-/AAuE7mBWip6a0UyBh_1Dd-LrfHuFavFBPDAae6wiUEky=s64","userId":"16654987200280043400"}},"colab":{"base_uri":"https://localhost:8080/","height":34}},"source":["import tensorflow as tf\n","tf.test.gpu_device_name()"],"execution_count":1,"outputs":[{"output_type":"execute_result","data":{"text/plain":["'/device:GPU:0'"]},"metadata":{"tags":[]},"execution_count":1}]},{"cell_type":"markdown","metadata":{"id":"MPzxbLTnNQet","colab_type":"text"},"source":["**Instalation**\n","\n","Must run every time the notebook is closed."]},{"cell_type":"code","metadata":{"id":"PARahqujMTrP","colab_type":"code","outputId":"d3989cdb-a455-43e4-85e8-6f6cec5ae2b5","executionInfo":{"status":"ok","timestamp":1567675959397,"user_tz":-60,"elapsed":118030,"user":{"displayName":"Simao Sarmento","photoUrl":"https://lh3.googleusercontent.com/a-/AAuE7mBWip6a0UyBh_1Dd-LrfHuFavFBPDAae6wiUEky=s64","userId":"16654987200280043400"}},"colab":{"base_uri":"https://localhost:8080/","height":136}},"source":["# Install a Drive FUSE wrapper.\n"," # https://github.com/astrada/google-drive-ocamlfuse\n","\n","!apt-get install -y -qq software-properties-common python-software-properties module-init-tools\n","!add-apt-repository -y ppa:alessandro-strada/ppa 2>&1 > /dev/null \n","!apt-get update -qq 2>&1 > /dev/null\n","!apt-get -y install -qq google-drive-ocamlfuse fuse"],"execution_count":2,"outputs":[{"output_type":"stream","text":["E: Package 'python-software-properties' has no installation candidate\n","Selecting previously unselected package google-drive-ocamlfuse.\n","(Reading database ... 
131183 files and directories currently installed.)\n","Preparing to unpack .../google-drive-ocamlfuse_0.7.6-0ubuntu1~ubuntu18.04.1_amd64.deb ...\n","Unpacking google-drive-ocamlfuse (0.7.6-0ubuntu1~ubuntu18.04.1) ...\n","Setting up google-drive-ocamlfuse (0.7.6-0ubuntu1~ubuntu18.04.1) ...\n","Processing triggers for man-db (2.8.3-2ubuntu0.1) ...\n"],"name":"stdout"}]},{"cell_type":"code","metadata":{"id":"IuppVB4mMcyH","colab_type":"code","colab":{}},"source":["# Generate auth tokens for Colab\n","\n","from google.colab import auth \n","auth.authenticate_user()"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"H1DhebmJMr3O","colab_type":"code","outputId":"85c60d31-b24b-4717-9383-d7aab5955f4b","executionInfo":{"status":"ok","timestamp":1567676010997,"user_tz":-60,"elapsed":24171,"user":{"displayName":"Simao Sarmento","photoUrl":"https://lh3.googleusercontent.com/a-/AAuE7mBWip6a0UyBh_1Dd-LrfHuFavFBPDAae6wiUEky=s64","userId":"16654987200280043400"}},"colab":{"base_uri":"https://localhost:8080/","height":105}},"source":["# Generate creds for the Drive FUSE library.\n","\n","from oauth2client.client import GoogleCredentials \n","creds = GoogleCredentials.get_application_default()\n","import getpass \n","!google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret} < /dev/null 2>&1 | grep URL\n","vcode = getpass.getpass() \n","!echo {vcode} | google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret}"],"execution_count":4,"outputs":[{"output_type":"stream","text":["Please, open the following URL in a web browser: https://accounts.google.com/o/oauth2/auth?client_id=32555940559.apps.googleusercontent.com&redirect_uri=urn%3Aietf%3Awg%3Aoauth%3A2.0%3Aoob&scope=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdrive&response_type=code&access_type=offline&approval_prompt=force\n","··········\n","Please, open the following URL in a web browser: https://accounts.google.com/o/oauth2/auth?client_id=32555940559.apps.googleusercontent.com&redirect_uri=urn%3Aietf%3Awg%3Aoauth%3A2.0%3Aoob&scope=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdrive&response_type=code&access_type=offline&approval_prompt=force\n","Please enter the verification code: Access token retrieved correctly.\n"],"name":"stdout"}]},{"cell_type":"code","metadata":{"id":"mID6JP_yM24_","colab_type":"code","colab":{}},"source":["# Create a directory and mount Google Drive using that directory.\n","\n","!mkdir -p drive\n","!google-drive-ocamlfuse drive"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"7NwIXZ-bNGjg","colab_type":"code","outputId":"1a90f65f-f5f4-4acf-e1dd-9f982772ca98","executionInfo":{"status":"ok","timestamp":1567676034775,"user_tz":-60,"elapsed":2435,"user":{"displayName":"Simao Sarmento","photoUrl":"https://lh3.googleusercontent.com/a-/AAuE7mBWip6a0UyBh_1Dd-LrfHuFavFBPDAae6wiUEky=s64","userId":"16654987200280043400"}},"colab":{"base_uri":"https://localhost:8080/","height":119}},"source":["print ('Files in Drive:')\n","!ls /content/drive/PairsTrading"],"execution_count":6,"outputs":[{"output_type":"stream","text":["Files in Drive:\n","2009-2019\t\t encoder_decoder\t\t __pycache__\n","class_DataProcessor.py\t encoder_decoder_trainer.py\t rnn_models\n","class_ForecastingTrader.py mlp_models\t\t\t rnn_trainer.py\n","class_SeriesAnalyser.py mlp_trainer.py\n","class_Trader.py\t\t 
PairsTrading_DeepLearning.ipynb

**Run Python File**

Run the executable here:

    !python3 "/content/drive/PairsTrading/rnn_trainer.py"

The cell's stdout, aside from TensorFlow/CUDA start-up logs and deprecation warnings, reports the data being loaded and the five pair models being trained on a Tesla K80 GPU:

    Total of 59 tickers
    Total of 58 tickers after removing tickers with Nan values
    Loaded 5 pairs!

Each pair is fitted with the same architecture: a CuDNNLSTM layer with 50 units followed by a Dense(1) output (10,651 trainable parameters), trained for 1 epoch on 156,492 samples and validated on 19,506 samples. Reported losses:

| Model        | Train MSE | Train MAE | Test MSE | Test MAE |
|--------------|-----------|-----------|----------|----------|
| sequential_1 | 0.1698    | 0.2601    | 0.0486   | 0.1749   |
| sequential_2 | 0.2911    | 0.2505    | 0.0068   | 0.0590   |
| sequential_3 | 0.1242    | 0.2012    | 0.0019   | 0.0268   |
| sequential_4 | 0.3695    | 0.2716    | 0.0065   | 0.0699   |
| sequential_5 | 0.3712    | 0.3324    | 0.2948   | 0.3062   |
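For anyone rerunning this notebook from a clean Colab session, the cell above presumes Google Drive is already mounted and that the repository files sit under /content/drive/PairsTrading, the path hard-coded in the trainer scripts below. A minimal setup cell along those lines might look as follows; the mount point and folder layout are assumptions to adapt to your own Drive structure, not part of the original notebook:

```python
# Illustrative Colab setup cell (not from the original notebook).
# Assumes the data folder, class files and training scripts were copied to Drive
# so that they are reachable at /content/drive/PairsTrading after mounting.
from google.colab import drive

drive.mount('/content/drive')

# Launch the RNN trainer, exactly as the notebook cell above does.
!python3 "/content/drive/PairsTrading/rnn_trainer.py"
```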
--------------------------------------------------------------------------------
/training/encoder_decoder_trainer.py:
--------------------------------------------------------------------------------
from classes import class_ForecastingTrader, class_DataProcessor
import numpy as np
np.random.seed(1)      # NumPy
import random
random.seed(3)         # Python
import tensorflow as tf
tf.set_random_seed(2)  # Tensorflow
session_conf = tf.ConfigProto(intra_op_parallelism_threads=1,
                              inter_op_parallelism_threads=1)
from keras import backend as K
sess = tf.Session(graph=tf.get_default_graph(), config=session_conf)
K.set_session(sess)

import pandas as pd
import pickle
import gc

forecasting_trader = class_ForecastingTrader.ForecastingTrader()
data_processor = class_DataProcessor.DataProcessor()

################################# READ PRICES AND PAIRS #################################
# read prices
df_prices = pd.read_pickle('/content/drive/PairsTrading/2009-2019/commodity_ETFs_intraday_interpolated_screened_no_outliers.pickle')
# df_prices = pd.read_pickle('data/etfs/pickle/commodity_ETFs_intraday_interpolated_screened_no_outliers.pickle')
# split data in training and test
df_prices_train, df_prices_test = data_processor.split_data(df_prices,
                                                            ('01-01-2009', '31-12-2017'),
                                                            ('01-01-2018', '31-12-2018'),
                                                            remove_nan=True)
# load pairs
with open('/content/drive/PairsTrading/2009-2019/pairs_unsupervised_learning_optical_intraday.pickle', 'rb') as handle:
# with open('data/etfs/pickle/2009-2019/pairs_unsupervised_learning_optical_intraday.pickle', 'rb') as handle:
    pairs = pickle.load(handle)
n_years_train = round(len(df_prices_train) / (240 * 78))  # ~240 trading days/year x 78 five-minute bars/day
print('Loaded {} pairs!'.format(len(pairs)))

################################# TRAIN MODELS #################################

combinations = [(24, [15, 15])]
hidden_nodes_names = ['15_15_nodes']

for i, configuration in enumerate(combinations):

    model_config = {"n_in": configuration[0],
                    "n_out": 2,
                    "epochs": 500,
                    "hidden_nodes": configuration[1],
                    "loss_fct": "mse",
                    "optimizer": "rmsprop",
                    "batch_size": 512,
                    "train_val_split": '2017-01-01',
                    "test_init": '2018-01-01'}
    models = forecasting_trader.train_models(pairs, model_config, model_type='encoder_decoder')

    # save models for this configuration
    with open('/content/drive/PairsTrading/encoder_decoder/models_n_in-' + str(configuration[0]) + '_hidden_nodes-' +
              hidden_nodes_names[i] + '.pkl', 'wb') as f:
        pickle.dump(models, f)

    gc.collect()
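The model_config dictionaries in this trainer and in rnn_trainer.py below hand n_in and n_out to ForecastingTrader.train_models. Assuming these denote the lengths of the input and output windows of the supervised forecasting problem (24 past observations of each pair's series mapped to the next 2 steps here, and to the next 1 step in the RNN trainer), the windowing can be sketched as below; make_windows is an illustrative helper, not the class's actual API:

```python
import numpy as np

def make_windows(series, n_in, n_out):
    """Build (samples, n_in) inputs and (samples, n_out) targets from a 1-D series."""
    X, y = [], []
    for t in range(len(series) - n_in - n_out + 1):
        X.append(series[t:t + n_in])
        y.append(series[t + n_in:t + n_in + n_out])
    return np.array(X), np.array(y)

# Example with the encoder-decoder settings above: 24 inputs, 2 outputs.
X, y = make_windows(np.arange(100, dtype=float), n_in=24, n_out=2)
print(X.shape, y.shape)  # (75, 24) (75, 2)
```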
--------------------------------------------------------------------------------
/training/rnn_trainer.py:
--------------------------------------------------------------------------------
from classes import class_ForecastingTrader, class_DataProcessor
import numpy as np
np.random.seed(1)      # NumPy
import random
random.seed(3)         # Python
import tensorflow as tf
tf.set_random_seed(2)  # Tensorflow
session_conf = tf.ConfigProto(intra_op_parallelism_threads=1,
                              inter_op_parallelism_threads=1)
from keras import backend as K
sess = tf.Session(graph=tf.get_default_graph(), config=session_conf)
K.set_session(sess)

import pandas as pd
import pickle
import gc

forecasting_trader = class_ForecastingTrader.ForecastingTrader()
data_processor = class_DataProcessor.DataProcessor()

################################# READ PRICES AND PAIRS #################################
# read prices
df_prices = pd.read_pickle('/content/drive/PairsTrading/2009-2019/commodity_ETFs_intraday_interpolated_screened_no_outliers.pickle')
# df_prices = pd.read_pickle('data/etfs/pickle/commodity_ETFs_intraday_interpolated_screened_no_outliers.pickle')
# split data in training and test
df_prices_train, df_prices_test = data_processor.split_data(df_prices,
                                                            ('01-01-2009', '31-12-2017'),
                                                            ('01-01-2018', '31-12-2018'),
                                                            remove_nan=True)
# load pairs
with open('/content/drive/PairsTrading/2009-2019/pairs_unsupervised_learning_optical_intraday.pickle', 'rb') as handle:
# with open('data/etfs/pickle/2009-2019/pairs_unsupervised_learning_optical_intraday.pickle', 'rb') as handle:
    pairs = pickle.load(handle)
n_years_train = round(len(df_prices_train) / (240 * 78))  # ~240 trading days/year x 78 five-minute bars/day
print('Loaded {} pairs!'.format(len(pairs)))

################################# TRAIN MODELS #################################

combinations = [(24, [50])]
hidden_nodes_names = ['50_nodes']

for i, configuration in enumerate(combinations):

    model_config = {"n_in": configuration[0],
                    "n_out": 1,
                    "epochs": 500,
                    "hidden_nodes": configuration[1],
                    "loss_fct": "mse",
                    "optimizer": "rmsprop",
                    "batch_size": 512,
                    "train_val_split": '2017-01-01',
                    "test_init": '2018-01-01'}
    models = forecasting_trader.train_models(pairs, model_config, model_type='rnn')

    # save models for this configuration
    with open('/content/drive/PairsTrading/rnn_models/models_n_in-' + str(configuration[0]) + '_hidden_nodes-' +
              hidden_nodes_names[i] + '.pkl', 'wb') as f:
        pickle.dump(models, f)

    gc.collect()
--------------------------------------------------------------------------------
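Both trainer scripts persist the list of per-pair model artifacts with pickle.dump, using the filename scheme above. A minimal sketch of reloading one of those files for later evaluation, assuming the same Drive layout and that train_models returns one artifact per pair (as the five models in the notebook output suggest):

```python
import pickle

# Filename follows the scheme used in rnn_trainer.py (n_in=24, '50_nodes' label).
models_path = '/content/drive/PairsTrading/rnn_models/models_n_in-24_hidden_nodes-50_nodes.pkl'

with open(models_path, 'rb') as f:
    models = pickle.load(f)  # expected: one entry per trained pair

print('Reloaded {} per-pair model artifacts'.format(len(models)))
```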