├── current_results.png
├── README.md
└── algo.py

/current_results.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/bartchr808/Quantopian_Pairs_Trader/HEAD/current_results.png
--------------------------------------------------------------------------------

/README.md:
--------------------------------------------------------------------------------
# Quantopian_Pairs_Trader
Read my Medium article about this project [here](https://medium.com/@bart.chr/pairs-trading-for-algorithmic-trading-breakdown-d8b709f59372)!

This is my implementation of a Pairs Trading algorithm on the algorithmic trading research/competition platform [Quantopian](https://www.quantopian.com/home), so I can dive deeper and learn more about [Pairs Trading](http://www.investopedia.com/university/guide-pairs-trading/) and implementing trading algorithms. Some tests/measures I'm currently learning about and using include:

* Kwiatkowski-Phillips-Schmidt-Shin ([KPSS](https://en.wikipedia.org/wiki/KPSS_test)) stationarity test
* Augmented Dickey–Fuller ([ADF](https://en.wikipedia.org/wiki/Augmented_Dickey%E2%80%93Fuller_test)) unit root test
* [Hedge ratio](http://www.investopedia.com/terms/h/hedgeratio.asp)
* Half-life of [mean reversion](http://www.investopedia.com/terms/m/meanreversion.asp) from the [Ornstein-Uhlenbeck process](https://en.wikipedia.org/wiki/Ornstein%E2%80%93Uhlenbeck_process)
* [Hurst exponent](https://en.wikipedia.org/wiki/Hurst_exponent)
* [Softmax function](https://en.wikipedia.org/wiki/Softmax_function) for calculating the percentage of each security in an open order

## Current Results
My implementation can run on an arbitrary number of candidate pairs provided by the user. However, for current testing purposes I have only been testing on two pairs (and hardcoded some logic that assumes this, such as how I record values on the graph). Those two pairs are Microsoft ([MSFT](http://www.google.ca/finance?q=MSFT&ei=FD-ZWcHDKNHejAHt4p-IBA)) + Apple ([AAPL](http://www.google.ca/finance?q=AAPL&ei=-T6ZWaGkKIaO2AbyxbCIBQ)), and McDonald's ([MCD](http://www.google.ca/finance?q=MCD&ei=LT-ZWbHmLIaO2AbyxbCIBQ)) + Yum! Brands ([YUM](http://www.google.ca/finance?q=YUM&ei=PT-ZWcnHCsK42Aac35GoCQ)), which owns chains like KFC, Pizza Hut, and Taco Bell. The former can be seen in the lower graph with the label "\_tech" and the latter with the label "\_food", each with its corresponding Z-score and hedge ratio values.

![Current Results Graph](https://raw.githubusercontent.com/bartchr808/Quantopian_Pairs_Trader/master/current_results.png "Current Results Graph")

I ran my algorithm over 13 years, from January 1, 2004 to January 1, 2017. However, I'm planning to see how extensible my algorithm is to different time periods and different combinations of pairs. From the few tests I have done so far, it holds up, and I haven't "overfitted" my algorithm by making it work only in a specific instance.

## Issues/Next Steps
* Reduce the drawdown and beta and get the leverage under control.
* Quantopian ships a deprecated version of the Statsmodels Python library which doesn't have the KPSS test available, so I'll need to add it manually myself (a rough offline sketch is included at the end of this README).
* Look into cleaning up how I'm currently returning a completely new pair object in `process_pair` and replacing the old pair in the for-loop in `my_handle_data`.
* Haven't looked at using Kalman filters for determining hedge ratios. Not sure if I need to, or if the way I did it is sufficient (an illustrative sketch follows `hedge_ratio` in `algo.py`).
* Need to look into how Quantopian's `order_target_percent` function works when I have several different pairs and not one or two (e.g. will the first opening order take up my entire portfolio?).
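
## Screening a pair offline (sketch)
Because Quantopian's bundled Statsmodels is missing KPSS, below is a minimal sketch of how the same battery of tests (plus KPSS) could be run offline on a candidate spread with a newer Statsmodels release. This is only an illustration and is not used by `algo.py`; the helper names are mine, and the thresholds are chosen to mirror the values hard-coded in the classes in `algo.py`.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.tsa.stattools import adfuller, kpss

# `spread` below is a 1-D numpy array of spread values (e.g. price_1 - hedge * price_2).


def half_life(spread):
    """Half-life of mean reversion: OLS of the daily change on the lagged level."""
    ret = np.diff(spread)
    lag = spread[:-1]
    beta = sm.OLS(ret, sm.add_constant(lag)).fit().params[1]
    return -np.log(2) / beta


def hurst(spread, max_lag=100):
    """Hurst exponent from how the spread of lagged differences scales with the lag."""
    lags = range(2, max_lag)
    tau = [np.sqrt(np.std(spread[l:] - spread[:-l])) for l in lags]
    return np.polyfit(np.log10(list(lags)), np.log10(tau), 1)[0] * 2.0


def screen_spread(spread):
    """Return True if the spread passes gates similar to the ones in algo.py."""
    adf_p = adfuller(spread, 1)[1]             # small p-value: reject a unit root
    kpss_p = kpss(spread, regression='c')[1]   # large p-value: can't reject stationarity
    # note: statsmodels only tabulates KPSS p-values between 0.01 and 0.1
    hl = half_life(spread)
    h = hurst(spread)
    print("ADF p=%.3f  KPSS p=%.3f  half-life=%.1f  Hurst=%.2f" % (adf_p, kpss_p, hl, h))
    return 0.0 < adf_p < 0.05 and kpss_p > 0.05 and 1.0 < hl < 42.0 and 0.0 < h < 0.4
```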

--------------------------------------------------------------------------------

/algo.py:
--------------------------------------------------------------------------------
import numpy as np
import statsmodels.api as sm
import statsmodels.tsa.stattools as ts
import pandas as pd

# ~~~~~~~~~~~~~~~~~~~~~~ TESTS FOR FINDING PAIR TO TRADE ON ~~~~~~~~~~~~~~~~~~~~~~
class ADF(object):
    """
    Augmented Dickey–Fuller (ADF) unit root test
    Source: http://www.pythonforfinance.net/2016/05/09/python-backtesting-mean-reversion-part-2/
    """

    def __init__(self):
        self.p_value = None
        self.five_perc_stat = None
        self.perc_stat = None
        self.p_min = .0
        self.p_max = .05
        self.look_back = 63

    def apply_adf(self, time_series):
        model = ts.adfuller(time_series, 1)
        self.p_value = model[1]
        self.five_perc_stat = model[4]['5%']
        self.perc_stat = model[0]

    def use_P(self):
        # True when the p-value falls inside the accepted (0, 0.05) band
        return (self.p_value > self.p_min) and (self.p_value < self.p_max)

    def use_critical(self):
        # True when the test statistic is more extreme than the 5% critical value
        return abs(self.perc_stat) > abs(self.five_perc_stat)


"""
# DEPRECATED
class KPSS(object):
    #Kwiatkowski-Phillips-Schmidt-Shin (KPSS) stationarity tests
    def __init__(self):
        Exception("Not implemented yet")
        self.p_value = None
        self.ten_perc_stat = None
        self.perc_stat = None
        self.p_min = 0.0
        self.p_max = 0.2
        self.look_back = 50

    def apply_kpss(self, time_series):
        self.p_value = ts.adfuller(time_series, 1)[1]
        self.five_perc_stat = ts.adfuller(time_series, 1)[4]['5%']  # possibly make this 10%
        self.perc_stat = ts.adfuller(time_series, 1)[0]

    def use(self):
        return (self.p_value > self.p_min) and (self.p_value < self.p_max) and (self.perc_stat > self.five_perc_stat)
"""


class Half_Life(object):
    """
    Half Life test from the Ornstein-Uhlenbeck process
    Source: http://www.pythonforfinance.net/2016/05/09/python-backtesting-mean-reversion-part-2/
    """

    def __init__(self):
        self.hl_min = 1.0
        self.hl_max = 42.0
        self.look_back = 43
        self.half_life = None

    def apply_half_life(self, time_series):
        lag = np.roll(time_series, 1)
        lag[0] = 0
        ret = time_series - lag
        ret[0] = 0

        # adds intercept terms to X variable for regression
        lag2 = sm.add_constant(lag)

        model = sm.OLS(ret, lag2)
        res = model.fit()

        self.half_life = -np.log(2) / res.params[1]

    def use(self):
        return (self.half_life < self.hl_max) and (self.half_life > self.hl_min)


class Hurst(object):
    """
    If the Hurst exponent is under the 0.5 value of a random walk, then the series is mean reverting
    Source: https://www.quantstart.com/articles/Basics-of-Statistical-Mean-Reversion-Testing
    """

    def __init__(self):
        self.h_min = 0.0
        self.h_max = 0.4
        self.look_back = 126
        self.lag_max = 100
        self.h_value = None

    def apply_hurst(self, time_series):
        """Returns the Hurst Exponent of the time series vector ts"""
        # Create the range of lag values
        lags = range(2, self.lag_max)

        # Calculate the array of the variances of the lagged differences
        tau = [np.sqrt(np.std(np.subtract(time_series[lag:], time_series[:-lag]))) for lag in lags]

        # Use a linear fit to estimate the Hurst Exponent
        poly = np.polyfit(np.log10(lags), np.log10(tau), 1)

        # Return the Hurst exponent from the polyfit output
        self.h_value = poly[0] * 2.0

    def use(self):
        return (self.h_value < self.h_max) and (self.h_value > self.h_min)
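
# ----------------------------------------------------------------------------
# Editor's note (illustrative, not part of the original algorithm): a quick
# recap of the math behind the two tests above.
#
# Half life: discretising the Ornstein-Uhlenbeck process
# dy = theta * (mu - y) dt + sigma dW gives roughly
# y_t - y_{t-1} = a + b * y_{t-1} + noise with b ~= -theta, which is exactly
# the OLS regression in Half_Life.apply_half_life. The deviation from the mean
# then decays like exp(-theta * t), so it halves after t = ln(2) / theta, i.e.
# half_life = -ln(2) / b. Example: a fitted slope b = -0.05 per day implies a
# half-life of about 13.9 trading days, inside the accepted (1, 42) band.
#
# Hurst: for a series with Hurst exponent H, the spread of lagged differences
# scales as std(y_{t+lag} - y_t) ~ lag^H. Hurst.apply_hurst regresses
# log10(sqrt(std)) on log10(lag); that slope is H/2, hence h_value = slope * 2.
# H = 0.5 is a random walk and H < 0.5 is mean reverting, which is why the
# filter only accepts 0.0 < h_value < 0.4.
# ----------------------------------------------------------------------------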

# ~~~~~~~~~~~~~~~~~~~~~~ FUNCTIONS FOR PLACING AN ORDER ~~~~~~~~~~~~~~~~~~~~~~
def hedge_ratio(Y, X):
    # Look into using Kalman Filter to calculate the hedge ratio
    X = sm.add_constant(X)
    model = sm.OLS(Y, X).fit()
    return model.params[1]
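
# ----------------------------------------------------------------------------
# Editor's note: the comment above flags Kalman filtering as an alternative to
# re-fitting OLS every day. The sketch below only illustrates that idea (a
# random-walk beta with a scalar Kalman update and no intercept); it is not
# wired into process_pair, and the noise parameters are guesses that would
# need tuning.
def kalman_hedge_ratio(Y, X, delta=1e-4, r=1e-3):
    """Recursively estimate a time-varying hedge ratio beta_t, where
    Y_t ~= beta_t * X_t and beta_t follows a random walk."""
    beta = 0.0   # current estimate of the hedge ratio
    P = 1.0      # variance of that estimate
    betas = []
    for x, y in zip(X, Y):
        # Predict: beta is a random walk, so only its variance grows
        P += delta
        # Update with today's observation y ~= beta * x
        err = y - beta * x          # innovation
        S = x * P * x + r           # innovation variance
        K = P * x / S               # Kalman gain
        beta += K * err
        P *= (1.0 - K * x)
        betas.append(beta)
    return betas
# ----------------------------------------------------------------------------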

def softmax_order(stock_1_shares, stock_2_shares, stock_1_price, stock_2_price):
    # Softmax of the dollar cost of each leg, used as the target portfolio
    # percentage for each security
    stock_1_cost = stock_1_shares * stock_1_price
    stock_2_cost = stock_2_shares * stock_2_price
    costs = np.array([stock_1_cost, stock_2_cost])
    return np.exp(costs) / np.sum(np.exp(costs), axis=0)


def initialize(context):
    """
    Called once at the start of the algorithm.
    """

    context.asset_pairs = [[symbol('MSFT'), symbol('AAPL'), {'in_short': False, 'in_long': False, 'spread': np.array([]), 'hedge_history': np.array([])}],
                           [symbol('YUM'), symbol('MCD'), {'in_short': False, 'in_long': False, 'spread': np.array([]), 'hedge_history': np.array([])}]]
    context.z_back = 20
    context.hedge_lag = 2
    context.entry_z = 0.5

    schedule_function(my_handle_data, date_rules.every_day(),
                      time_rules.market_close(hours=4))
    # Typical slippage and commission I have seen others use and that Quantopian uses in its templates
    set_slippage(slippage.VolumeShareSlippage(volume_limit=0.025, price_impact=0.1))
    set_commission(commission.PerShare(cost=0.01, min_trade_cost=1.0))


def my_handle_data(context, data):
    """
    Called every day.
    """

    if get_open_orders():
        return

    for i in range(len(context.asset_pairs)):
        pair = context.asset_pairs[i]
        new_pair = process_pair(pair, context, data)
        context.asset_pairs[i] = new_pair


def process_pair(pair, context, data):
    """
    Main function that will execute an order for every pair.
    """

    # Get stock data
    stock_1 = pair[0]
    stock_2 = pair[1]
    prices = data.history([stock_1, stock_2], "price", 300, "1d")
    stock_1_P = prices[stock_1]
    stock_2_P = prices[stock_2]
    in_short = pair[2]['in_short']
    in_long = pair[2]['in_long']
    spread = pair[2]['spread']
    hedge_history = pair[2]['hedge_history']

    # Get hedge ratio (look into using Kalman Filter)
    try:
        hedge = hedge_ratio(stock_1_P, stock_2_P)
    except ValueError as e:
        log.error(e)
        return [stock_1, stock_2, {'in_short': in_short, 'in_long': in_long, 'spread': spread, 'hedge_history': hedge_history}]

    hedge_history = np.append(hedge_history, hedge)

    if hedge_history.size < context.hedge_lag:
        log.debug("Hedge history too short!")
        return [stock_1, stock_2, {'in_short': in_short, 'in_long': in_long, 'spread': spread, 'hedge_history': hedge_history}]

    # Use the hedge ratio recorded hedge_lag days ago so today's spread isn't
    # built from a ratio fitted on today's prices
    hedge = hedge_history[-context.hedge_lag]
    spread = np.append(
        spread, stock_1_P[-1] - hedge * stock_2_P[-1])
    spread_length = spread.size

    adf = ADF()
    half_life = Half_Life()
    hurst = Hurst()

    # Check if current window size is large enough for adf, half life, and hurst exponent
    if (spread_length < adf.look_back) or (spread_length < half_life.look_back) or (spread_length < hurst.look_back):
        return [stock_1, stock_2, {'in_short': in_short, 'in_long': in_long, 'spread': spread, 'hedge_history': hedge_history}]

    # possible "SVD did not converge" error because of OLS
    try:
        adf.apply_adf(spread[-adf.look_back:])
        half_life.apply_half_life(spread[-half_life.look_back:])
        hurst.apply_hurst(spread[-hurst.look_back:])
    except Exception:
        return [stock_1, stock_2, {'in_short': in_short, 'in_long': in_long, 'spread': spread, 'hedge_history': hedge_history}]

    # Check if they are in fact a stationary (or possibly trend stationary...need to avoid this) time series
    # * Only cancel if all measures believe it isn't stationary
    if not adf.use_P() and not adf.use_critical() and not half_life.use() and not hurst.use():
        if in_short or in_long:
            # Enter logic here for how to handle open positions after mean reversion
            # of spread breaks down.
            log.info('Tests have failed. Exiting open positions')
            order_target(stock_1, 0)
            order_target(stock_2, 0)
            in_short = in_long = False
            return [stock_1, stock_2, {'in_short': in_short, 'in_long': in_long, 'spread': spread, 'hedge_history': hedge_history}]

        log.debug("Not Stationary!")
        return [stock_1, stock_2, {'in_short': in_short, 'in_long': in_long, 'spread': spread, 'hedge_history': hedge_history}]

    # Check if current window size is large enough for Z score
    if spread_length < context.z_back:
        return [stock_1, stock_2, {'in_short': in_short, 'in_long': in_long, 'spread': spread, 'hedge_history': hedge_history}]

    spreads = spread[-context.z_back:]
    z_score = (spreads[-1] - spreads.mean()) / spreads.std()

    # Record measures
    if stock_1 == sid(5061):
        record(Z_tech=z_score)
        record(Hedge_tech=hedge)
    else:
        record(Z_food=z_score)
        record(Hedge_food=hedge)
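
    # Editor's note, worked example of the thresholds below (illustrative
    # numbers): with z_back = 20, suppose the last 20 spread values have mean
    # 1.00 and std 0.20 and today's spread is 0.70. Then
    # z_score = (0.70 - 1.00) / 0.20 = -1.5, which is below -entry_z = -0.5,
    # so a long-spread position is opened (long stock_1, short stock_2).
    # The position is closed once z_score crosses back above 0.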

    # Close order logic
    if in_short and z_score < 0.0:
        order_target(stock_1, 0)
        order_target(stock_2, 0)
        in_short = False
        in_long = False
        return [stock_1, stock_2, {'in_short': in_short, 'in_long': in_long, 'spread': spread, 'hedge_history': hedge_history}]
    elif in_long and z_score > 0.0:
        order_target(stock_1, 0)
        order_target(stock_2, 0)
        in_short = False
        in_long = False
        return [stock_1, stock_2, {'in_short': in_short, 'in_long': in_long, 'spread': spread, 'hedge_history': hedge_history}]

    # Open order logic
    if (z_score < -context.entry_z) and (not in_long):
        stock_1_shares = 1
        stock_2_shares = -hedge
        in_long = True
        in_short = False
        (stock_1_perc, stock_2_perc) = softmax_order(stock_1_shares, stock_2_shares, stock_1_P[-1], stock_2_P[-1])
        order_target_percent(stock_1, stock_1_perc)
        order_target_percent(stock_2, stock_2_perc)
        return [stock_1, stock_2, {'in_short': in_short, 'in_long': in_long, 'spread': spread, 'hedge_history': hedge_history}]
    elif z_score > context.entry_z and (not in_short):
        stock_1_shares = -1
        stock_2_shares = hedge
        in_short = True
        in_long = False
        (stock_1_perc, stock_2_perc) = softmax_order(stock_1_shares, stock_2_shares, stock_1_P[-1], stock_2_P[-1])
        order_target_percent(stock_1, stock_1_perc)
        order_target_percent(stock_2, stock_2_perc)
        return [stock_1, stock_2, {'in_short': in_short, 'in_long': in_long, 'spread': spread, 'hedge_history': hedge_history}]

    return [stock_1, stock_2, {'in_short': in_short, 'in_long': in_long, 'spread': spread, 'hedge_history': hedge_history}]
--------------------------------------------------------------------------------