├── current_results.png
├── README.md
└── algo.py

/current_results.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/bartchr808/Quantopian_Pairs_Trader/HEAD/current_results.png
--------------------------------------------------------------------------------

/README.md:
--------------------------------------------------------------------------------
# Quantopian_Pairs_Trader
Read my Medium article about this project [here](https://medium.com/@bart.chr/pairs-trading-for-algorithmic-trading-breakdown-d8b709f59372)!

This is my implementation of a Pairs Trading algorithm on the algorithmic trading research/competition platform [Quantopian](https://www.quantopian.com/home), so I can dive deeper and learn more about [Pairs Trading](http://www.investopedia.com/university/guide-pairs-trading/) and implementing trading algorithms. Some tests/measures I'm currently learning about and using include:

* Kwiatkowski-Phillips-Schmidt-Shin ([KPSS](https://en.wikipedia.org/wiki/KPSS_test)) stationarity test
* Augmented Dickey–Fuller ([ADF](https://en.wikipedia.org/wiki/Augmented_Dickey%E2%80%93Fuller_test)) unit root test
* [Hedge ratio](http://www.investopedia.com/terms/h/hedgeratio.asp)
* Half-life of [mean reversion](http://www.investopedia.com/terms/m/meanreversion.asp) from the [Ornstein-Uhlenbeck process](https://en.wikipedia.org/wiki/Ornstein%E2%80%93Uhlenbeck_process)
* [Hurst exponent](https://en.wikipedia.org/wiki/Hurst_exponent)
* [Softmax function](https://en.wikipedia.org/wiki/Softmax_function) for calculating the percentage of each security in an open order

## Current Results
My implementation can run on an arbitrary number of candidate pairs provided by the user. However, for current testing purposes I have only been testing on two pairs (and hardcoded some logic that assumes this, such as how I record values on the graph). Those two pairs are Microsoft ([MSFT](http://www.google.ca/finance?q=MSFT&ei=FD-ZWcHDKNHejAHt4p-IBA)) + Apple ([AAPL](http://www.google.ca/finance?q=AAPL&ei=-T6ZWaGkKIaO2AbyxbCIBQ)), and McDonald's ([MCD](http://www.google.ca/finance?q=MCD&ei=LT-ZWbHmLIaO2AbyxbCIBQ)) + Yum! Brands ([YUM](http://www.google.ca/finance?q=YUM&ei=PT-ZWcnHCsK42Aac35GoCQ)), which owns chains like KFC, Pizza Hut, and Taco Bell. The former can be seen in the lower graph with the label "\_tech" and the latter with the label "\_food", each with its corresponding Z-score and hedge ratio values.

![Current Results Graph](https://raw.githubusercontent.com/bartchr808/Quantopian_Pairs_Trader/master/current_results.png "Current Results Graph")

I ran my algorithm over 13 years, from January 1, 2004 to January 1, 2017. However, I'm planning to see how extensible my algorithm is to different time periods and different combinations of pairs. From the few tests I have done so far, it holds up, and I haven't "overfitted" my algorithm by making it work only in a specific instance.

## Issues/Next Steps
* Reduce the drawdown and beta and get the leverage under control.
* Quantopian ships a deprecated version of the Statsmodels Python library which doesn't have the KPSS test available, so I'll need to add it manually myself (a rough offline sketch is included at the end of this README).
* Look into cleaning up how I'm currently returning a completely new pair object in `process_pair` and replacing the old pair in the for-loop in `my_handle_data`.
* Haven't looked at using Kalman filters for determining hedge ratios. Not sure if I need to, or if the way I did it is sufficient (an illustrative sketch follows `hedge_ratio` in `algo.py`).
* Need to look into how Quantopian's `order_target_percent` function works when I have several different pairs and not one or two (e.g. will the first opening order take up my entire portfolio?).
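
## Screening a pair offline (sketch)
Because Quantopian's bundled Statsmodels is missing KPSS, below is a minimal sketch of how the same battery of tests (plus KPSS) could be run offline on a candidate spread with a newer Statsmodels release. This is only an illustration and is not used by `algo.py`; the helper names are mine, and the thresholds are chosen to mirror the values hard-coded in the classes in `algo.py`.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.tsa.stattools import adfuller, kpss

# `spread` below is a 1-D numpy array of spread values (e.g. price_1 - hedge * price_2).


def half_life(spread):
    """Half-life of mean reversion: OLS of the daily change on the lagged level."""
    ret = np.diff(spread)
    lag = spread[:-1]
    beta = sm.OLS(ret, sm.add_constant(lag)).fit().params[1]
    return -np.log(2) / beta


def hurst(spread, max_lag=100):
    """Hurst exponent from how the spread of lagged differences scales with the lag."""
    lags = range(2, max_lag)
    tau = [np.sqrt(np.std(spread[l:] - spread[:-l])) for l in lags]
    return np.polyfit(np.log10(list(lags)), np.log10(tau), 1)[0] * 2.0


def screen_spread(spread):
    """Return True if the spread passes gates similar to the ones in algo.py."""
    adf_p = adfuller(spread, 1)[1]             # small p-value: reject a unit root
    kpss_p = kpss(spread, regression='c')[1]   # large p-value: can't reject stationarity
    # note: statsmodels only tabulates KPSS p-values between 0.01 and 0.1
    hl = half_life(spread)
    h = hurst(spread)
    print("ADF p=%.3f  KPSS p=%.3f  half-life=%.1f  Hurst=%.2f" % (adf_p, kpss_p, hl, h))
    return 0.0 < adf_p < 0.05 and kpss_p > 0.05 and 1.0 < hl < 42.0 and 0.0 < h < 0.4
```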

--------------------------------------------------------------------------------

/algo.py:
--------------------------------------------------------------------------------
import numpy as np
import statsmodels.api as sm
import statsmodels.tsa.stattools as ts
import pandas as pd

# ~~~~~~~~~~~~~~~~~~~~~~ TESTS FOR FINDING PAIR TO TRADE ON ~~~~~~~~~~~~~~~~~~~~~~
class ADF(object):
    """
    Augmented Dickey–Fuller (ADF) unit root test
    Source: http://www.pythonforfinance.net/2016/05/09/python-backtesting-mean-reversion-part-2/
    """

    def __init__(self):
        self.p_value = None
        self.five_perc_stat = None
        self.perc_stat = None
        self.p_min = .0
        self.p_max = .05
        self.look_back = 63

    def apply_adf(self, time_series):
        model = ts.adfuller(time_series, 1)
        self.p_value = model[1]
        self.five_perc_stat = model[4]['5%']
        self.perc_stat = model[0]

    def use_P(self):
        # True when the p-value falls inside the accepted (0, 0.05) band
        return (self.p_value > self.p_min) and (self.p_value < self.p_max)

    def use_critical(self):
        # True when the test statistic is more extreme than the 5% critical value
        return abs(self.perc_stat) > abs(self.five_perc_stat)


"""
# DEPRECATED
class KPSS(object):
    #Kwiatkowski-Phillips-Schmidt-Shin (KPSS) stationarity tests
    def __init__(self):
        Exception("Not implemented yet")
        self.p_value = None
        self.ten_perc_stat = None
        self.perc_stat = None
        self.p_min = 0.0
        self.p_max = 0.2
        self.look_back = 50

    def apply_kpss(self, time_series):
        self.p_value = ts.adfuller(time_series, 1)[1]
        self.five_perc_stat = ts.adfuller(time_series, 1)[4]['5%']  # possibly make this 10%
        self.perc_stat = ts.adfuller(time_series, 1)[0]

    def use(self):
        return (self.p_value > self.p_min) and (self.p_value < self.p_max) and (self.perc_stat > self.five_perc_stat)
"""


class Half_Life(object):
    """
    Half Life test from the Ornstein-Uhlenbeck process
    Source: http://www.pythonforfinance.net/2016/05/09/python-backtesting-mean-reversion-part-2/
    """

    def __init__(self):
        self.hl_min = 1.0
        self.hl_max = 42.0
        self.look_back = 43
        self.half_life = None

    def apply_half_life(self, time_series):
        lag = np.roll(time_series, 1)
        lag[0] = 0
        ret = time_series - lag
        ret[0] = 0

        # adds intercept terms to X variable for regression
        lag2 = sm.add_constant(lag)

        model = sm.OLS(ret, lag2)
        res = model.fit()

        self.half_life = -np.log(2) / res.params[1]

    def use(self):
        return (self.half_life < self.hl_max) and (self.half_life > self.hl_min)


class Hurst(object):
    """
    If the Hurst exponent is under the 0.5 value of a random walk, then the series is mean reverting
    Source: https://www.quantstart.com/articles/Basics-of-Statistical-Mean-Reversion-Testing
    """

    def __init__(self):
        self.h_min = 0.0
        self.h_max = 0.4
        self.look_back = 126
        self.lag_max = 100
        self.h_value = None

    def apply_hurst(self, time_series):
        """Returns the Hurst Exponent of the time series vector ts"""
        # Create the range of lag values
        lags = range(2, self.lag_max)

        # Calculate the array of the variances of the lagged differences
        tau = [np.sqrt(np.std(np.subtract(time_series[lag:], time_series[:-lag]))) for lag in lags]

        # Use a linear fit to estimate the Hurst Exponent
        poly = np.polyfit(np.log10(lags), np.log10(tau), 1)

        # Return the Hurst exponent from the polyfit output
        self.h_value = poly[0] * 2.0

    def use(self):
        return (self.h_value < self.h_max) and (self.h_value > self.h_min)
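
# ----------------------------------------------------------------------------
# Editor's note (illustrative, not part of the original algorithm): a quick
# recap of the math behind the two tests above.
#
# Half life: discretising the Ornstein-Uhlenbeck process
# dy = theta * (mu - y) dt + sigma dW gives roughly
# y_t - y_{t-1} = a + b * y_{t-1} + noise with b ~= -theta, which is exactly
# the OLS regression in Half_Life.apply_half_life. The deviation from the mean
# then decays like exp(-theta * t), so it halves after t = ln(2) / theta, i.e.
# half_life = -ln(2) / b. Example: a fitted slope b = -0.05 per day implies a
# half-life of about 13.9 trading days, inside the accepted (1, 42) band.
#
# Hurst: for a series with Hurst exponent H, the spread of lagged differences
# scales as std(y_{t+lag} - y_t) ~ lag^H. Hurst.apply_hurst regresses
# log10(sqrt(std)) on log10(lag); that slope is H/2, hence h_value = slope * 2.
# H = 0.5 is a random walk and H < 0.5 is mean reverting, which is why the
# filter only accepts 0.0 < h_value < 0.4.
# ----------------------------------------------------------------------------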

# ~~~~~~~~~~~~~~~~~~~~~~ FUNCTIONS FOR PLACING AN ORDER ~~~~~~~~~~~~~~~~~~~~~~
def hedge_ratio(Y, X):
    # Look into using Kalman Filter to calculate the hedge ratio
    X = sm.add_constant(X)
    model = sm.OLS(Y, X).fit()
    return model.params[1]
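
# ----------------------------------------------------------------------------
# Editor's note: the comment above flags Kalman filtering as an alternative to
# re-fitting OLS every day. The sketch below only illustrates that idea (a
# random-walk beta with a scalar Kalman update and no intercept); it is not
# wired into process_pair, and the noise parameters are guesses that would
# need tuning.
def kalman_hedge_ratio(Y, X, delta=1e-4, r=1e-3):
    """Recursively estimate a time-varying hedge ratio beta_t, where
    Y_t ~= beta_t * X_t and beta_t follows a random walk."""
    beta = 0.0   # current estimate of the hedge ratio
    P = 1.0      # variance of that estimate
    betas = []
    for x, y in zip(X, Y):
        # Predict: beta is a random walk, so only its variance grows
        P += delta
        # Update with today's observation y ~= beta * x
        err = y - beta * x          # innovation
        S = x * P * x + r           # innovation variance
        K = P * x / S               # Kalman gain
        beta += K * err
        P *= (1.0 - K * x)
        betas.append(beta)
    return betas
# ----------------------------------------------------------------------------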

def softmax_order(stock_1_shares, stock_2_shares, stock_1_price, stock_2_price):
    # Softmax of the dollar cost of each leg, used as the target portfolio
    # percentage for each security
    stock_1_cost = stock_1_shares * stock_1_price
    stock_2_cost = stock_2_shares * stock_2_price
    costs = np.array([stock_1_cost, stock_2_cost])
    return np.exp(costs) / np.sum(np.exp(costs), axis=0)


def initialize(context):
    """
    Called once at the start of the algorithm.
    """

    context.asset_pairs = [[symbol('MSFT'), symbol('AAPL'), {'in_short': False, 'in_long': False, 'spread': np.array([]), 'hedge_history': np.array([])}],
                           [symbol('YUM'), symbol('MCD'), {'in_short': False, 'in_long': False, 'spread': np.array([]), 'hedge_history': np.array([])}]]
    context.z_back = 20
    context.hedge_lag = 2
    context.entry_z = 0.5

    schedule_function(my_handle_data, date_rules.every_day(),
                      time_rules.market_close(hours=4))
    # Typical slippage and commission I have seen others use and that Quantopian uses in its templates
    set_slippage(slippage.VolumeShareSlippage(volume_limit=0.025, price_impact=0.1))
    set_commission(commission.PerShare(cost=0.01, min_trade_cost=1.0))


def my_handle_data(context, data):
    """
    Called every day.
    """

    if get_open_orders():
        return

    for i in range(len(context.asset_pairs)):
        pair = context.asset_pairs[i]
        new_pair = process_pair(pair, context, data)
        context.asset_pairs[i] = new_pair


def process_pair(pair, context, data):
    """
    Main function that will execute an order for every pair.
    """

    # Get stock data
    stock_1 = pair[0]
    stock_2 = pair[1]
    prices = data.history([stock_1, stock_2], "price", 300, "1d")
    stock_1_P = prices[stock_1]
    stock_2_P = prices[stock_2]
    in_short = pair[2]['in_short']
    in_long = pair[2]['in_long']
    spread = pair[2]['spread']
    hedge_history = pair[2]['hedge_history']

    # Get hedge ratio (look into using Kalman Filter)
    try:
        hedge = hedge_ratio(stock_1_P, stock_2_P)
    except ValueError as e:
        log.error(e)
        return [stock_1, stock_2, {'in_short': in_short, 'in_long': in_long, 'spread': spread, 'hedge_history': hedge_history}]

    hedge_history = np.append(hedge_history, hedge)

    if hedge_history.size < context.hedge_lag:
        log.debug("Hedge history too short!")
        return [stock_1, stock_2, {'in_short': in_short, 'in_long': in_long, 'spread': spread, 'hedge_history': hedge_history}]

    # Use the hedge ratio recorded hedge_lag days ago so today's spread isn't
    # built from a ratio fitted on today's prices
    hedge = hedge_history[-context.hedge_lag]
    spread = np.append(
        spread, stock_1_P[-1] - hedge * stock_2_P[-1])
    spread_length = spread.size

    adf = ADF()
    half_life = Half_Life()
    hurst = Hurst()

    # Check if current window size is large enough for adf, half life, and hurst exponent
    if (spread_length < adf.look_back) or (spread_length < half_life.look_back) or (spread_length < hurst.look_back):
        return [stock_1, stock_2, {'in_short': in_short, 'in_long': in_long, 'spread': spread, 'hedge_history': hedge_history}]

    # possible "SVD did not converge" error because of OLS
    try:
        adf.apply_adf(spread[-adf.look_back:])
        half_life.apply_half_life(spread[-half_life.look_back:])
        hurst.apply_hurst(spread[-hurst.look_back:])
    except Exception:
        return [stock_1, stock_2, {'in_short': in_short, 'in_long': in_long, 'spread': spread, 'hedge_history': hedge_history}]

    # Check if they are in fact a stationary (or possibly trend stationary...need to avoid this) time series
    # * Only cancel if all measures believe it isn't stationary
    if not adf.use_P() and not adf.use_critical() and not half_life.use() and not hurst.use():
        if in_short or in_long:
            # Enter logic here for how to handle open positions after mean reversion
            # of spread breaks down.
            log.info('Tests have failed. Exiting open positions')
            order_target(stock_1, 0)
            order_target(stock_2, 0)
            in_short = in_long = False
            return [stock_1, stock_2, {'in_short': in_short, 'in_long': in_long, 'spread': spread, 'hedge_history': hedge_history}]

        log.debug("Not Stationary!")
        return [stock_1, stock_2, {'in_short': in_short, 'in_long': in_long, 'spread': spread, 'hedge_history': hedge_history}]

    # Check if current window size is large enough for Z score
    if spread_length < context.z_back:
        return [stock_1, stock_2, {'in_short': in_short, 'in_long': in_long, 'spread': spread, 'hedge_history': hedge_history}]

    spreads = spread[-context.z_back:]
    z_score = (spreads[-1] - spreads.mean()) / spreads.std()

    # Record measures
    if stock_1 == sid(5061):
        record(Z_tech=z_score)
        record(Hedge_tech=hedge)
    else:
        record(Z_food=z_score)
        record(Hedge_food=hedge)
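
    # Editor's note, worked example of the thresholds below (illustrative
    # numbers): with z_back = 20, suppose the last 20 spread values have mean
    # 1.00 and std 0.20 and today's spread is 0.70. Then
    # z_score = (0.70 - 1.00) / 0.20 = -1.5, which is below -entry_z = -0.5,
    # so a long-spread position is opened (long stock_1, short stock_2).
    # The position is closed once z_score crosses back above 0.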

    # Close order logic
    if in_short and z_score < 0.0:
        order_target(stock_1, 0)
        order_target(stock_2, 0)
        in_short = False
        in_long = False
        return [stock_1, stock_2, {'in_short': in_short, 'in_long': in_long, 'spread': spread, 'hedge_history': hedge_history}]
    elif in_long and z_score > 0.0:
        order_target(stock_1, 0)
        order_target(stock_2, 0)
        in_short = False
        in_long = False
        return [stock_1, stock_2, {'in_short': in_short, 'in_long': in_long, 'spread': spread, 'hedge_history': hedge_history}]

    # Open order logic
    if (z_score < -context.entry_z) and (not in_long):
        stock_1_shares = 1
        stock_2_shares = -hedge
        in_long = True
        in_short = False
        (stock_1_perc, stock_2_perc) = softmax_order(stock_1_shares, stock_2_shares, stock_1_P[-1], stock_2_P[-1])
        order_target_percent(stock_1, stock_1_perc)
        order_target_percent(stock_2, stock_2_perc)
        return [stock_1, stock_2, {'in_short': in_short, 'in_long': in_long, 'spread': spread, 'hedge_history': hedge_history}]
    elif z_score > context.entry_z and (not in_short):
        stock_1_shares = -1
        stock_2_shares = hedge
        in_short = True
        in_long = False
        (stock_1_perc, stock_2_perc) = softmax_order(stock_1_shares, stock_2_shares, stock_1_P[-1], stock_2_P[-1])
        order_target_percent(stock_1, stock_1_perc)
        order_target_percent(stock_2, stock_2_perc)
        return [stock_1, stock_2, {'in_short': in_short, 'in_long': in_long, 'spread': spread, 'hedge_history': hedge_history}]

    return [stock_1, stock_2, {'in_short': in_short, 'in_long': in_long, 'spread': spread, 'hedge_history': hedge_history}]
--------------------------------------------------------------------------------