├── Capstone_Final_Yicheng_Wang.rar
├── QuantTradingStrategy_FinalProjectCode.pdf
├── README.md
├── platform_client.py
├── platform_server.py
├── static
│   └── plots
│       ├── backtest_pnl.jpg
│       └── trade_pnl.jpg
└── templates
├── back_testing.html
├── base.html
├── build_model.html
├── data_prep.html
├── index.html
├── real_trade.html
└── trade_analysis.html
/Capstone_Final_Yicheng_Wang.rar:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wangy8989/Pairs-Trading-with-Machine-Learning/4b617ca4ac35e03ed08af91d911e40179d81cf46/Capstone_Final_Yicheng_Wang.rar
--------------------------------------------------------------------------------
/QuantTradingStrategy_FinalProjectCode.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wangy8989/Pairs-Trading-with-Machine-Learning/4b617ca4ac35e03ed08af91d911e40179d81cf46/QuantTradingStrategy_FinalProjectCode.pdf
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Pairs-Trading-with-Machine-Learning
2 |
3 | Notebook file: implements the strategy on the Russell 3000.
4 |
5 | It implements a pairs trading strategy with machine learning to find the most profitable portfolio.
6 | The idea is that stocks whose loadings on common factors were similar in the past should remain related in the future.
7 | Our project data is the Russell 3000 from 2010 to 2018, retrieved from Bloomberg.
8 | The data contains daily stock prices, Global Industry Classification Standard (GICS) sectors, analyst ratings, market-to-book value, return on assets, debt-to-asset ratio, EPS, and market cap.
9 |
10 | Result: the best model with tuned hyperparameters achieved a Sharpe ratio of 1.55.
11 |
12 | ## Pairs-Trading-with-Machine-Learning-on-Distributed-Python-Platform
13 |
14 | This project implements a distributed Python platform for testing quantitative trading models on financial instruments in a network setting under a client/server infrastructure. Normally, we backtest locally on historical data to check the performance of a trading strategy. The resulting performance, however, is often an illusion of what actual performance would be in real-time trading. We demonstrate this conclusion here by showing that our quantitative trading model performs much worse in simulated trading than in the backtesting environment. We therefore built this Python platform not only to implement trading strategies and backtest them historically, but also to simulate trades as in the real market, acting as another control before real-time trading.
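The client/server messages in "platform_client.py" are JSON strings terminated by the end tag `.$$$$`; the client's `receive()` thread accumulates fixed-size chunks until that tag arrives. A minimal sketch of this framing, demonstrated over a local socket pair rather than the real server connection:

```python
import socket

MSG_END = ".$$$$"  # same end tag the platform's receive() looks for

def recv_message(sock, bufsize=64):
    """Accumulate fixed-size chunks until the end tag arrives, then return the payload."""
    parts = []
    while True:
        chunk = sock.recv(bufsize).decode("utf8")
        if MSG_END in chunk:                       # end of message reached
            parts.append(chunk.replace(MSG_END, ""))
            return "".join(parts)
        parts.append(chunk)

# demo over a local socket pair (stands in for the real client/server connection)
server, client = socket.socketpair()
client.sendall(('{"Client": "yicheng", "Status": "Logon"}' + MSG_END).encode("utf8"))
msg = recv_message(server)
print(msg)  # {"Client": "yicheng", "Status": "Logon"}
server.close()
client.close()
```

Because `recv()` may return the message in several chunks, the loop keeps appending until it sees the tag, which is why long order tables arrive intact.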
15 |
16 | Strategy:
17 | 1. Implemented PCA and DBSCAN clustering to group S&P 500 stocks with similar factor loadings
18 | 2. Identified cointegrated pairs within clusters to run a dollar-neutral Bollinger Band pairs trading strategy
19 | 3. Constructed a portfolio with the pairs equally weighted
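Step 1 can be sketched on synthetic data (the real client fits PCA on S&P 500 daily returns, appends GICS sector dummies, and uses `eps=1.8, min_samples=3` for DBSCAN; the return matrix below is made up):

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# toy daily returns: 250 days x 40 stocks (a stand-in for the S&P 500 universe)
returns = rng.normal(0.0, 0.01, size=(250, 40))

# PCA on the return matrix; each stock's factor loadings are the transposed components
pca = PCA(n_components=5)
pca.fit(returns)
loadings = pca.components_.T          # shape (40, 5): one row of loadings per stock

# DBSCAN groups stocks with similar (standardized) loadings; label -1 marks noise
X = StandardScaler().fit_transform(loadings)
labels = DBSCAN(eps=1.8, min_samples=3).fit(X).labels_
print(sorted(set(labels)))
```

The client then runs an Engle-Granger style check (OLS fit plus ADF test on the residual) inside each cluster and keeps the pair with the lowest p-value.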
20 |
21 | Result: this portfolio achieved a 2.5 Sharpe ratio and a 25% annual return in 2018.
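The entry rule behind these numbers is a band-crossing test on each pair's OLS residual: one side is opened when the residual crosses above the upper Bollinger Band, the other when it crosses below the lower band. A minimal sketch, using pandas rolling statistics in place of TA-Lib's `BBANDS` (simple moving average, `matype=0`) and a synthetic residual series:

```python
import numpy as np
import pandas as pd

def bollinger_signals(residual: pd.Series, mvt: int = 10, k: float = 2.0) -> pd.Series:
    """+1 when the residual crosses above the upper band, -1 when it crosses
    below the lower band, 0 otherwise (the client then holds the position)."""
    ma = residual.rolling(mvt).mean()
    sd = residual.rolling(mvt).std()
    upper, lower = ma + k * sd, ma - k * sd
    cross_up = (residual.shift(1) < upper.shift(1)) & (residual > upper)
    cross_dn = (residual.shift(1) > lower.shift(1)) & (residual < lower)
    return cross_up.astype(int) - cross_dn.astype(int)

# synthetic residual standing in for Close1 minus the OLS prediction
res = pd.Series(np.sin(np.linspace(0, 12, 120)) +
                np.random.default_rng(1).normal(0, 0.3, 120))
sig = bollinger_signals(res)
print(sig.value_counts())
```

In the client's `updateTrades`, a cross above the upper band maps to buying asset 1 and selling asset 2 (and vice versa for the lower band), with `int(capital / price)` shares on each leg to keep the pair dollar-neutral.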
22 |
23 | * Code is in "platform_server.py" and "platform_client.py";
24 | * the database is "pairs_trading.db";
25 | * templates for Flask are in the "templates" folder;
26 | * the "static" folder has PnL plots.
27 |
28 | **Download "Capstone_Final_Yicheng_Wang.rar" if you want to run the project (with code, data and video instructions).**
29 |
30 | Instructions:
31 | 1. In the "Program" folder, run "platform_server.py";
32 | 2. Open another console and run "platform_client.py";
33 | 3. Open a web browser and go to "http://127.0.0.1:5000/"; the home page will show;
34 | 4. Click "Stock Pairs" -> "Building Model" -> "Back Testing" -> "Trading Analysis" -> "Real Trading" in order;
35 | video instructions are in "video_flask" for the web browser and "video_program" for running the program.
36 |
--------------------------------------------------------------------------------
/platform_client.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python3
2 | # -*- coding: utf-8 -*-
3 |
4 | import socket
5 | from socket import AF_INET, SOCK_STREAM
6 | import threading
7 | import queue
8 |
9 | import json
10 | import sys
11 | import urllib.request
12 | import pandas as pd
13 | import matplotlib.pyplot as plt
14 | import datetime as dt
15 | import talib
16 | import numpy as np
17 | import time
18 |
19 | from sklearn.cluster import DBSCAN
20 | from sklearn import preprocessing
21 | from sklearn.decomposition import PCA
22 | import statsmodels.api as sm
23 | import statsmodels.tsa.stattools as ts
24 |
25 | from sqlalchemy import Column, ForeignKey, Integer, Float, String
26 | from sqlalchemy import create_engine
27 | from sqlalchemy import MetaData
28 | from sqlalchemy import Table
29 | from sqlalchemy import inspect
30 | from sqlalchemy import and_
31 |
32 |
33 | from flask import Flask, render_template
34 | app = Flask(__name__, template_folder='templates')
35 |
36 |
37 |
38 | clientID = "yicheng"
39 |
40 |
41 | ' download data '
42 | data_start_date = dt.datetime(2014,1,1) # hours:minute:seconds
43 | data_end_date = dt.date.today() # only dates
44 | requestURL = "https://eodhistoricaldata.com/api/eod/"
45 | myEodKey = "5ba84ea974ab42.45160048"
46 | requestSP500 = "https://pkgstore.datahub.io/core/s-and-p-500-companies/constituents_json/data/64dd3e9582b936b0352fdd826ecd3c95/constituents_json.json"
47 |
48 | ' trading '
49 | engine = create_engine('sqlite:///pairs_trading.db')
50 | engine.execute("PRAGMA foreign_keys = ON")
51 | metadata = MetaData()
52 | metadata.reflect(bind=engine) # bind to Engine, load all tables
53 |
54 | ' Parameters '
55 | training_start_date = dt.datetime(2014,1,1)
56 | training_end_date = dt.datetime(2018,1,1)
57 | backtesting_start_date = dt.datetime(2018,1,1)
58 | backtesting_end_date = dt.datetime(2019,1,1)
59 | capital = 1000000.
60 | significance = 0.05
61 | k = 2
62 | mvt = 10
63 | # PCA
64 | N_PRIN_COMPONENTS = 50
65 | epsilon = 1.8
66 |
67 |
68 |
69 | def get_daily_data(symbol='', start=data_start_date, end=data_end_date, requestType=requestURL,
70 | apiKey=myEodKey, completeURL=None):
71 | if not completeURL:
72 | symbolURL = str(symbol) + '?'
73 | startURL = "from=" + str(start)
74 | endURL = "to=" + str(end)
75 | apiKeyURL = "api_token=" + myEodKey
76 | completeURL = requestURL + symbolURL + startURL + '&' + endURL + '&' + apiKeyURL + '&period=d&fmt=json'
77 |
78 |     # return None if the URL cannot be opened or the payload is not JSON
79 |     try:
80 |         with urllib.request.urlopen(completeURL) as req:
81 |             data = json.load(req)
82 |             return data
83 |     except Exception:
84 |         return None
85 |
86 |
87 | ' populate stock data for each stock '
88 | def download_stock_data(ticker, metadata, engine, table_name):
89 | column_names = ['symbol','date','open','high','low','close','adjusted_close','volume']
90 | price_list = []
91 | clear_a_table(table_name, metadata, engine)
92 |
93 | if 'GSPC' not in ticker:
94 | symbol_full = str(ticker) + ".US"
95 | stock = get_daily_data(symbol=symbol_full)
96 | else:
97 | stock = get_daily_data(symbol=ticker)
98 |
99 | if stock:
100 | for stock_data in stock:
101 | price_list.append([str(ticker), stock_data['date'], stock_data['open'], stock_data['high'],
102 | stock_data['low'], stock_data['close'], stock_data['adjusted_close'],
103 | stock_data['volume']])
104 |
105 | stocks = pd.DataFrame(price_list, columns=column_names)
106 | stocks.to_sql(table_name, con=engine, if_exists='replace', index=False, chunksize=5)
107 |
108 |
109 | def execute_sql_statement(sql_st, engine):
110 | result = engine.execute(sql_st)
111 | result_df = pd.DataFrame(result.fetchall())
112 | result_df.columns = result.keys()
113 | return result_df
114 |
115 |
116 | ''' create table '''
117 | def create_sp500_info_table(name, metadata, engine, null=False):
118 | table = Table(name, metadata,
119 | Column('name', String(50), nullable=null),
120 | Column('sector', String(50), nullable=null),
121 | Column('symbol', String(50), primary_key=True, nullable=null),
122 | extend_existing = True) # constructor
123 | table.create(engine, checkfirst=True)
124 |
125 | def create_price_table(name, metadata, engine, null=True):
126 | if name != 'GSPC.INDX':
127 | foreign_key = 'sp500.symbol'
128 | table = Table(name, metadata,
129 | Column('symbol', String(50), ForeignKey(foreign_key),
130 | primary_key=True, nullable=null),
131 | Column('date', String(50), primary_key=True, nullable=null),
132 | Column('open', Float, nullable=null),
133 | Column('high', Float, nullable=null),
134 | Column('low', Float, nullable=null),
135 | Column('close', Float, nullable=null),
136 | Column('adjusted_close', Float, nullable=null),
137 | Column('volume', Integer, nullable=null),
138 | extend_existing = True)
139 | else:
140 | table = Table(name, metadata,
141 | Column('symbol', String(50), primary_key=True, nullable=null),
142 | Column('date', String(50), primary_key=True, nullable=null),
143 | Column('open', Float, nullable=null),
144 | Column('high', Float, nullable=null),
145 | Column('low', Float, nullable=null),
146 | Column('close', Float, nullable=null),
147 | Column('adjusted_close', Float, nullable=null),
148 | Column('volume', Integer, nullable=null),
149 | extend_existing = True)
150 | table.create(engine, checkfirst=True)
151 |
152 | def create_stockpairs_table(table_name, metadata, engine):
153 | table = Table(table_name, metadata,
154 | Column('Ticker1', String(50), primary_key=True, nullable=False),
155 | Column('Ticker2', String(50), primary_key=True, nullable=False),
156 | Column('Score', Float, nullable=False),
157 | Column('Profit_Loss', Float, nullable=False),
158 | extend_existing=True)
159 | table.create(engine, checkfirst=True)
160 |
161 | def create_pairprices_table(table_name, metadata, engine, null=True):
162 | table = Table(table_name, metadata,
163 | Column('Symbol1', String(50), ForeignKey('stockpairs.Ticker1'), primary_key=True, nullable=null),
164 | Column('Symbol2', String(50), ForeignKey('stockpairs.Ticker2'), primary_key=True, nullable=null),
165 | Column('Date', String(50), primary_key=True, nullable=null),
166 | Column('Close1', Float, nullable=null),
167 | Column('Close2', Float, nullable=null),
168 | Column('Residual', Float, nullable=null),
169 | Column('Lower', Float, nullable=null),
170 | Column('MA', Float, nullable=null),
171 | Column('Upper', Float, nullable=null),
172 | extend_existing=True)
173 | table.create(engine, checkfirst=True)
174 |
175 | def create_trades_table(table_name, metadata, engine, null=False):
176 | table = Table(table_name, metadata,
177 | Column('Symbol1', String(50), ForeignKey('stockpairs.Ticker1'), primary_key=True, nullable=null),
178 | Column('Symbol2', String(50), ForeignKey('stockpairs.Ticker2'), primary_key=True, nullable=null),
179 | Column('Date', String(50), primary_key=True, nullable=null),
180 | Column('Close1', Float, nullable=null),
181 | Column('Close2', Float, nullable=null),
182 | Column('Qty1', Float, nullable=null),
183 | Column('Qty2', Float, nullable=null),
184 | Column('P/L', Float, nullable=null),
185 | extend_existing=True)
186 | table.create(engine, checkfirst=True)
187 |
188 | def clear_a_table(table_name, metadata, engine):
189 | conn = engine.connect()
190 | table = metadata.tables[table_name]
191 | delete_st = table.delete()
192 | conn.execute(delete_st)
193 |
194 |
195 | def download_market_data(metadata, engine, sp500_info_df):
196 | print(" >>>>>>>>>>>>>>>>>>>>>>><<<<<<<<<<<<<<<<<<<<<<<<<<")
197 | print("Downloading data ...")
198 |
199 | ' put sp500 constituent data into databases '
200 | create_sp500_info_table('sp500', metadata, engine)
201 | clear_a_table('sp500', metadata, engine) # clear table before insert
202 | sp500_info_df.to_sql('sp500', con=engine, if_exists='append', index=False,
203 | chunksize=5)
204 |
205 | ' get data for each ticker from sp500 '
206 | for symbol in sp500_info_df.Symbol:
207 | create_price_table(symbol, metadata, engine)
208 | download_stock_data(symbol, metadata, engine, symbol)
209 |
210 | ' SP500 index price '
211 | create_price_table('GSPC.INDX', metadata, engine)
212 | download_stock_data('GSPC.INDX', metadata, engine, 'GSPC.INDX')
213 |
214 | print("Finished downloading.")
215 |
216 |
217 | def training_data(metadata, engine, significance, sp500_info_df,
218 | training_start_date, training_end_date):
219 | print(" >>>>>>>>>>>>>>>>>>>>>>><<<<<<<<<<<<<<<<<<<<<<<<<<")
220 | print("Training data ...")
221 | print("Start date:", training_start_date, ", End date:", training_end_date)
222 |
223 | ' get training set '
224 | Price = pd.DataFrame()
225 |
226 | for symbol in sp500_info_df.Symbol:
227 | select_st = "SELECT date, adjusted_close From " + "\"" + symbol + "\"" + \
228 | " WHERE date >= " + "\"" + str(training_start_date) + "\"" + \
229 | " AND date <= " + "\"" + str(training_end_date) + "\"" + ";"
230 | try:
231 | result_df = execute_sql_statement(select_st, engine)
232 | result_df.set_index('date', inplace=True) # date as index
233 | result_df.columns = [symbol] # name is column
234 | Price = pd.concat([Price, result_df], axis=1, sort=True)
235 |         except Exception:
236 |             pass  # skip symbols whose price table is missing for this window
237 |
238 | ' PCA: reduce dimension '
239 | Price.sort_index(inplace=True)
240 | Price.fillna(method='ffill', inplace=True)
241 | Price = Price.loc[:,(Price>0).all(0)] # every price > 0
242 |
243 | Price_ret = Price.pct_change()
244 | Price_ret = Price_ret.replace([np.inf, -np.inf], np.nan)
245 | Price_ret.dropna(axis=0, how='all', inplace=True) # drop first row (NA)
246 | Price_ret.dropna(axis=1, how='any', inplace=True)
247 |
248 | pca = PCA(n_components=N_PRIN_COMPONENTS)
249 | pca.fit(Price_ret)
250 | X = pd.DataFrame(pca.components_.T, index=Price_ret.columns)
251 | sp500_info_df.set_index('Symbol', inplace=True)
252 | X = pd.concat([X, sp500_info_df.Sector.T], axis=1, sort=True)
253 | X = pd.get_dummies(X)
254 |
255 | ' DBSCAN: identify clusters from stocks that are closest '
256 | X.dropna(axis=0, how='any', inplace=True)
257 | X_arr = preprocessing.StandardScaler().fit_transform(X)
258 | clf = DBSCAN(eps=epsilon, min_samples=3)
259 |
260 | # labels is label values from -1 to x
261 | # -1 represents noisy samples that are not in clusters
262 | clf.fit(X_arr)
263 | clustered = clf.labels_
264 | # all stock with its cluster label (including -1)
265 | clustered_series = pd.Series(index=X.index, data=clustered.flatten())
266 | # clustered stock with its cluster label
267 | clustered_series = clustered_series[clustered_series != -1]
268 |
269 | poss_cluster = clustered_series.value_counts().sort_index()
270 | print(poss_cluster)
271 |
272 | 'identify cointegrated pairs from clusters'
273 | def Cointegration(cluster, significance, start_day, end_day):
274 | pair_coin = []
275 | p_value = []
276 | adf = []
277 | n = cluster.shape[0]
278 | keys = cluster.keys()
279 | for i in range(n):
280 | for j in range(i+1,n):
281 | asset_1 = Price.loc[start_day:end_day, keys[i]]
282 | asset_2 = Price.loc[start_day:end_day, keys[j]]
283 | results = sm.OLS(asset_1, asset_2)
284 | results = results.fit()
285 | predict = results.predict(asset_2)
286 | error = asset_1 - predict
287 | ADFtest = ts.adfuller(error)
288 | if ADFtest[1] < significance:
289 | pair_coin.append([keys[i], keys[j]]) # pair names
290 | p_value.append(ADFtest[1]) # p value, smaller the better
291 | adf.append(ADFtest[0]) # adf test stats, larger the better
292 | return p_value, pair_coin, adf
293 |
294 | "Pair selection method"
295 | "select a pair with lowest p-value from each cluster"
296 | def PairSelection(clustered_series, significance,
297 | start_day=str(training_start_date), end_day=str(training_end_date)):
298 | Opt_pairs = [] # to get best pair in cluster i
299 | tstats = []
300 |
301 | for i in range(len(poss_cluster)):
302 | cluster = clustered_series[clustered_series == i]
303 | result = Cointegration(cluster, significance, start_day, end_day)
304 | if len(result[0]) > 0:
305 | if np.min(result[0]) < significance:
306 | index = np.where(result[0] == np.min(result[0]))[0][0]
307 | Opt_pairs.append([result[1][index][0], result[1][index][1]])
308 | tstats.append(round(result[2][index], 4))
309 |
310 | return Opt_pairs, tstats
311 |
312 | stock_pairs, tstats = PairSelection(clustered_series, significance)
313 | # put into sql table
314 | create_stockpairs_table('stockpairs', metadata, engine)
315 | clear_a_table('stockpairs', metadata, engine)
316 | stock_pairs = pd.DataFrame(stock_pairs, columns=['Ticker1', 'Ticker2'])
317 | stock_pairs["Score"] = -1 * np.array(tstats)
318 | stock_pairs["Profit_Loss"] = 0.0
319 | stock_pairs.to_sql('stockpairs', con=engine, if_exists='append', index=False, chunksize=5)
320 |
321 | print(stock_pairs[["Ticker1", "Ticker2"]])
322 | print("Finished training.")
323 | return stock_pairs
324 |
325 |
326 | def building_model(metadata, engine, k, mvt,
327 | backtesting_start_date, backtesting_end_date):
328 | global ols_results
329 | '''
330 | get pair prices, moving averages, bollinger bands
331 | k: number of std
332 | mvt: moving average period
333 | '''
334 | print(" >>>>>>>>>>>>>>>>>>>>>>><<<<<<<<<<<<<<<<<<<<<<<<<<")
335 | print("Building Model ...")
336 | print("Parameters: k =", k, ", moving average =", mvt)
337 |
338 | select_st = "SELECT Ticker1, Ticker2 from stockpairs;"
339 | stock_pairs = execute_sql_statement(select_st, engine)
340 |
341 |     create_pairprices_table('pairprices', metadata, engine)  # mvt must not be passed as the "null" flag
342 | clear_a_table('pairprices', metadata, engine)
343 |
344 | for pair in stock_pairs.values:
345 | select_st = "SELECT stockpairs.Ticker1 as Symbol1, stockpairs.Ticker2 as Symbol2, \
346 | " + pair[0] + ".date as Date, " + pair[0] + ".Adjusted_close as Close1, \
347 | " + pair[1] + ".Adjusted_close as Close2 \
348 | From " + pair[0] + ", " + pair[1] + ", stockpairs \
349 | Where (((stockpairs.Ticker1 = " + pair[0] + ".symbol) and \
350 | (stockpairs.Ticker2 = " + pair[1] + ".symbol)) and \
351 | (" + pair[0] + ".date = " + pair[1] + ".date)) \
352 | and " + pair[0] + ".date >= " + "\"" + str(training_start_date) + "\"" + \
353 | " AND " + pair[0] + ".date <= " + "\"" + str(training_end_date) + "\" \
354 | ORDER BY Symbol1, Symbol2;"
355 |
356 | result_df = execute_sql_statement(select_st, engine)
357 |
358 | select_st = "SELECT stockpairs.Ticker1 as Symbol1, stockpairs.Ticker2 as Symbol2, \
359 | " + pair[0] + ".date as Date, " + pair[0] + ".Adjusted_close as Close1, \
360 | " + pair[1] + ".Adjusted_close as Close2 \
361 | FROM " + pair[0] + ", " + pair[1] + ", stockpairs \
362 | WHERE (((stockpairs.Ticker1 = " + pair[0] + ".symbol) and \
363 | (stockpairs.Ticker2 = " + pair[1] + ".symbol)) and \
364 | (" + pair[0] + ".date = " + pair[1] + ".date)) \
365 | and " + pair[0] + ".date >= " + "\"" + str(backtesting_start_date) + "\"" + \
366 | " AND " + pair[0] + ".date <= " + "\"" + str(backtesting_end_date) + "\" \
367 | ORDER BY Symbol1, Symbol2;"
368 | result_df2 = execute_sql_statement(select_st, engine)
369 |
370 | # get bollinger band
371 | results = sm.OLS(result_df.Close1, sm.add_constant(result_df.Close2)).fit()
372 | predict = results.params[0] + results.params[1] * result_df2.Close2
373 | ols_results[pair[0]] = results
374 | error = np.subtract(result_df2.Close1, predict)
375 | upperband, middleband, lowerband = talib.BBANDS(error, timeperiod=mvt,
376 | nbdevup=k, nbdevdn=k, matype=0)
377 | result_df2[['Residual', 'Lower', 'MA', 'Upper']] = pd.DataFrame([error, lowerband, middleband, upperband]).T.round(4)
378 | result_df2.to_sql('pairprices', con=engine, if_exists='append', index=False, chunksize=5)
379 |
380 | print("Finished building model.")
381 |
382 |
383 | class StockPair:
384 |
385 | def __init__(self, symbol1, symbol2, start_date, end_date):
386 | self.ticker1 = symbol1
387 | self.ticker2 = symbol2
388 | self.start_date = start_date
389 | self.end_date = end_date
390 | self.trades = {}
391 | self.total_profit_loss = 0.0
392 |
393 | def __str__(self):
394 | return str(self.__class__) + ": " + str(self.__dict__) + "\n"
395 |
396 | def __repr__(self):
397 | return str(self.__class__) + ": " + str(self.__dict__) + "\n"
398 |
399 | def createTrade(self, date, close1, close2, res, lower, upper, qty1 = 0, qty2 = 0, profit_loss = 0.0):
400 | self.trades[date] = np.array([close1, close2, res, lower, upper, qty1, qty2, profit_loss])
401 |
402 | def updateTrades(self): # dollar neutral, available dollar for buy/sell for each pair
403 | trades_matrix = np.array(list(self.trades.values()))
404 |
405 | for index in range(1, trades_matrix.shape[0]):
406 | # RES SELL SIGNAL: buy asset 1, sell asset 2
407 | if (trades_matrix[index-1, 2] < trades_matrix[index-1, 4] and
408 | trades_matrix[index, 2] > trades_matrix[index, 4]):
409 | trades_matrix[index, 5] = int(capital / trades_matrix[index, 0])
410 | trades_matrix[index, 6] = int(-capital / trades_matrix[index, 1])
411 | # RES BUY SIGNAL: sell asset 1, buy asset 2
412 | elif (trades_matrix[index-1, 2] > trades_matrix[index-1, 3] and
413 | trades_matrix[index, 2] < trades_matrix[index, 3]):
414 | trades_matrix[index, 5] = int(-capital / trades_matrix[index, 0])
415 | trades_matrix[index, 6] = int(capital / trades_matrix[index, 1])
416 | # no act
417 | else:
418 | trades_matrix[index, 5] = trades_matrix[index-1, 5]
419 | trades_matrix[index, 6] = trades_matrix[index-1, 6]
420 |
421 | 'update profit and loss'
422 | trades_matrix[index, 7] = trades_matrix[index, 5] * (trades_matrix[index, 0] - trades_matrix[index-1, 0]) \
423 | + trades_matrix[index, 6] * (trades_matrix[index, 1] - trades_matrix[index-1, 1])
424 | trades_matrix[index, 7] = round(trades_matrix[index, 7], 2)
425 | self.total_profit_loss += trades_matrix[index, 7]
426 |
427 | for key, index in zip(self.trades.keys(), range(0, trades_matrix.shape[0])):
428 | self.trades[key] = trades_matrix[index]
429 |
430 | return pd.DataFrame(trades_matrix[:, range(5, trades_matrix.shape[1])], columns=['Qty1', 'Qty2', 'P/L'])
431 |
432 |
433 | def back_testing(metadata, engine, backtesting_start_date, backtesting_end_date):
434 | print(" >>>>>>>>>>>>>>>>>>>>>>><<<<<<<<<<<<<<<<<<<<<<<<<<")
435 | print("Backtesting ...")
436 | print("Start date:", backtesting_start_date, ", End date:", backtesting_end_date)
437 |
438 | print('create StockPair')
439 | stock_pair_map = dict()
440 |
441 | select_st = 'SELECT Ticker1, Ticker2 FROM stockpairs;'
442 | stock_pairs = execute_sql_statement(select_st, engine)
443 |
444 | for index, row in stock_pairs.iterrows():
445 | aKey = (row['Ticker1'], row['Ticker2'])
446 | stock_pair_map[aKey] = StockPair(row['Ticker1'], row['Ticker2'],
447 | backtesting_start_date, backtesting_end_date)
448 |
449 | print('create Trades')
450 | select_st = 'SELECT * FROM pairprices;'
451 | result_df = execute_sql_statement(select_st, engine)
452 |
453 | for index in range(result_df.shape[0]):
454 | aKey = (result_df.at[index, 'Symbol1'], result_df.at[index, 'Symbol2'])
455 | stock_pair_map[aKey].createTrade(result_df.at[index, 'Date'],
456 | result_df.at[index, 'Close1'], result_df.at[index, 'Close2'],
457 | result_df.at[index, 'Residual'], result_df.at[index, 'Lower'],
458 | result_df.at[index, 'Upper'])
459 |
460 | print('update Trades')
461 | trades_df = pd.DataFrame(columns=['Qty1', 'Qty2', 'P/L'])
462 | for key, value in stock_pair_map.items():
463 |         trades_df = pd.concat([trades_df, value.updateTrades()], ignore_index=True)  # DataFrame.append was removed in pandas 2.0
464 |
465 | table = metadata.tables['stockpairs']
466 | update_st = table.update().values(Profit_Loss=value.total_profit_loss).where( \
467 | and_(table.c.Ticker1==value.ticker1, table.c.Ticker2==value.ticker2))
468 | engine.execute(update_st)
469 |
470 | result_df = result_df[['Symbol1', 'Symbol2', 'Date', 'Close1', 'Close2']].join(trades_df)
471 |
472 | create_trades_table('trades', metadata, engine)
473 | clear_a_table('trades', metadata, engine)
474 | result_df.to_sql('trades', con=engine, if_exists='append', index=False, chunksize=5)
475 |
476 | print("Finished backtesting.")
477 |
478 |
479 | 'real time data according to market date'
480 | def feed_realtime_data(ticker, start, end):
481 | global price_data
482 | column_names = ['symbol','date','adjusted_close']
483 |
484 | stock = get_daily_data(symbol=ticker, start=start, end=end)
485 | if stock:
486 | for stock_data in stock:
487 | price_data.append([str(ticker), stock_data['date'],
488 | stock_data['adjusted_close']])
489 | stocks = pd.DataFrame(price_data, columns=column_names)
490 | stocks.adjusted_close = stocks.adjusted_close.astype(float)
491 | return stocks
492 |
493 |
494 | def get_orders(market_date=None):
495 | orders_list = []
496 |
497 | select_st = 'SELECT Ticker1, Ticker2 FROM stockpairs;'
498 | pairs = execute_sql_statement(select_st, engine)
499 |
500 | for index, row in pairs.iterrows():
501 | # previous data for ols fit
502 | select_st = "SELECT symbol, date, adjusted_close FROM "+str(row[0])+ \
503 | " WHERE date >= " + "\"" + str(backtesting_start_date) + "\"" + \
504 | " AND date <= " + "\"" + str(backtesting_end_date) + "\"" + ";"
505 | result1 = execute_sql_statement(select_st, engine)
506 | select_st = "SELECT symbol, date, adjusted_close FROM "+str(row[1])+ \
507 | " WHERE date >= " + "\"" + str(backtesting_start_date) + "\"" + \
508 | " AND date <= " + "\"" + str(backtesting_end_date) + "\"" + ";"
509 | result2 = execute_sql_statement(select_st, engine)
510 |
511 | if market_date:
512 | # append latest real data to previous data
513 | stock1 = feed_realtime_data(row[0], market_date, market_date)
514 | stock1 = stock1[stock1.symbol == row[0]]
515 | result1 = pd.concat([result1, stock1], ignore_index=True)
516 | stock2 = feed_realtime_data(row[1], market_date, market_date)
517 | stock2 = stock2[stock2.symbol == row[1]]
518 | result2 = pd.concat([result2, stock2], ignore_index=True)
519 |
520 | try:
521 | results = ols_results[row[0]]
522 | predict = results.params[0] + results.params[1] * result2.adjusted_close
523 | error = np.subtract(result1.adjusted_close, predict)
524 | upperband, middleband, lowerband = talib.BBANDS(error, timeperiod=mvt,
525 | nbdevup=k, nbdevdn=k, matype=0)
526 | price1 = round(result1.adjusted_close.values[-1], 2)
527 | price2 = round(result2.adjusted_close.values[-1], 2)
528 |
529 | if (error.values[-2] < upperband.values[-2] and error.values[-1] > upperband.values[-1]):
530 | amt1 = int(capital / price1)
531 | amt2 = int(capital / price2)
532 | order1 = 'Order New '+row[0]+' Buy '+str(price1)+' '+str(amt1)
533 | order2 = 'Order New '+row[1]+' Sell '+str(price2)+' '+str(amt2)
534 | orders_list.append(order1)
535 | orders_list.append(order2)
536 | print(order1, ',', order2)
537 |
538 |             elif error.values[-2] > lowerband.values[-2] and error.values[-1] < lowerband.values[-1]:
539 | amt1 = int(capital / price1)
540 | amt2 = int(capital / price2)
541 | order1 = 'Order New '+row[0]+' Sell '+str(price1)+' '+str(amt1)
542 | order2 = 'Order New '+row[1]+' Buy '+str(price2)+' '+str(amt2)
543 | orders_list.append(order1)
544 | orders_list.append(order2)
545 | print(order1, ',', order2)
546 |
547 | else:
548 | print(row[0], row[1], 'No order signal.')
549 |
550 |         except Exception:  # no OLS fit for this pair or empty price data
551 |             print('No order signal.')
552 |
553 | return orders_list
554 |
555 |
556 | def receive(e, q):
557 | """Handles receiving of messages."""
558 | total_server_response = []
559 | msg_end_tag = ".$$$$"
560 |
561 | while True:
562 | try:
563 | recv_end = False
564 | # everytime only load certain size
565 | server_response = client_socket.recv(BUFSIZ).decode("utf8")
566 |
567 | if server_response:
568 | if msg_end_tag in server_response: # if reaching end of message
569 | server_response = server_response.replace(msg_end_tag, '')
570 | recv_end = True
571 |
572 | # append every response
573 | total_server_response.append(server_response)
574 |
575 | # if reaching the end, put it into queue
576 | if recv_end == True:
577 | server_response_message = ''.join(total_server_response)
578 | data = json.loads(server_response_message)
579 | #print(data)
580 | q.put(data)
581 | total_server_response = []
582 |
583 | if e.isSet():
584 | e.clear()
585 |
586 | except OSError: # Possibly client has left the chat.
587 | break
588 |
589 |
590 | ' The logon message includes the list of stocks from client '
591 | def get_stock_list_from_database():
592 | select_st = 'SELECT Ticker1, Ticker2 FROM stockpairs;'
593 | pairs = execute_sql_statement(select_st, engine)
594 | tickers = pd.concat([pairs["Ticker1"], pairs["Ticker2"]], ignore_index=True)
595 | tickers.drop_duplicates(keep='first', inplace=True)
596 | tickers.sort_values(axis=0, ascending=True, inplace=True, kind='quicksort')
597 | print(tickers)
598 | return tickers
599 |
600 | def logon():
601 |     tickers = get_stock_list_from_database()
602 | client_msg = json.dumps({'Client':clientID, 'Status':'Logon', 'Stocks':tickers.str.cat(sep=',')})
603 | return client_msg
604 |
605 | def get_user_list():
606 | client_msg = "{\"Client\":\"" + clientID + "\", \"Status\":\"User List\"}"
607 | return client_msg
608 |
609 | def get_stock_list():
610 | client_msg = "{\"Client\":\"" + clientID + "\", \"Status\":\"Stock List\"}"
611 | return client_msg
612 |
613 | def get_market_status():
614 | client_msg = json.dumps({'Client':clientID, 'Status':'Market Status'})
615 | return client_msg
616 |
617 | def get_order_table(stock_list):
618 | client_msg = json.dumps({'Client':clientID, 'Status':'Order Inquiry', 'Symbol':stock_list})
619 | return client_msg
620 |
621 | def enter_a_new_order(symbol, side, price, qty):
622 | client_msg = json.dumps({'Client':clientID, 'Status':'New Order', 'Symbol':symbol, 'Side':side, 'Price':price, 'Qty':qty})
623 | return client_msg
624 |
625 | def quit_connection():
626 | client_msg = "{\"Client\":\"" + clientID + "\", \"Status\":\"Quit\"}"
627 | return client_msg
628 |
629 | def send_msg(client_msg):
630 | client_socket.send(bytes(client_msg, "utf8"))
631 | data = json.loads(client_msg)
632 | return data
633 |
634 | def set_event(e):
635 |     e.set()
636 |
637 | def wait_for_an_event(e):
638 | while e.isSet():
639 | continue
640 |
641 | def get_data(q):
642 | data = q.get()
643 | q.task_done()
644 | # print(dt.datetime.now(), data)
645 | return data
646 |
647 |
648 | # command in queue
649 | def join_trading_network(e, q):
650 | global market_period_list, record_order_df
651 | last_close_time = time.time()
652 |
653 | threading.Thread(target=receive, args=(e,q)).start()
654 |
655 | set_event(e)
656 | send_msg(logon()) # automatic logon
657 | wait_for_an_event(e)
658 | get_data(q)
659 |
660 | set_event(e)
661 | send_msg(get_user_list()) # automatic print out user list
662 | wait_for_an_event(e)
663 | get_data(q)
664 |
665 | set_event(e)
666 | send_msg(get_stock_list()) # automatically print out stock list
667 | wait_for_an_event(e)
668 | get_data(q)
669 |
670 | while True:
671 | set_event(e)
672 | client_msg = get_market_status() # automatically print market status
673 | send_msg(client_msg)
674 | wait_for_an_event(e)
675 | data = get_data(q)
676 | market_status = data["Market Status"]
677 |
678 | 'The client will loop until market open'
679 | if (market_status == "Market Closed" or
680 | market_status == "Pending Open" or
681 | market_status == "Not Open"):
682 | # if market closed too long, stop trading
683 | if time.time() - last_close_time > 150:
684 | print('>>>> Stop trading after ', time.time() - last_close_time, 'seconds')
685 |                 break
686 | time.sleep(1)
687 | continue
688 |
689 | last_close_time = time.time()
690 |
691 | ' place order every 40s (1day) '
692 | print('======================================================')
693 | market_period = data["Market Period"]
694 | market_period_list.append(market_period) # store past dates
695 | print("Current market status is:", market_status)
696 | print("Market period is:", market_period_list)
697 |
698 |         ' Place order according to strategy using previous close price'
699 | if len(market_period_list) > 1:
700 | prev_date = market_period_list[-2]
701 | orders_list = get_orders(prev_date) # up to previous day close price
702 | else:
703 | orders_list = get_orders()
704 |
705 |         # the client sends orders to the server only while the market is open or pending closing
706 | if orders_list:
707 |
708 | for order in orders_list:
709 | order_list = order.split(" ")
710 | mySymbol = str(order_list[2])
711 | mySide = str(order_list[3])
712 | myPrice = float(order_list[4])
713 | myQuantity = int(order_list[5])
714 |
715 | set_event(e)
716 | send_msg(get_order_table([mySymbol])) # pass in list
717 | wait_for_an_event(e)
718 | data = get_data(q)
719 | order_data = json.loads(data)
720 | order_table = pd.DataFrame(order_data["data"])
721 | if order_table.empty:
722 | print('Empty table')
723 | continue
724 |
725 | if mySide == 'Buy':
726 | order_table = order_table[order_table["Side"] == 'Sell']
727 | order_table.sort_values('Price', ascending=True, inplace=True)
728 | order_table.reset_index(drop=True, inplace=True)
729 | best_price = order_table.loc[0, 'Price']
730 | order_index = order_table.loc[0, 'OrderIndex']
731 | else:
732 | order_table = order_table[order_table["Side"] == 'Buy']
733 | order_table.sort_values('Price', ascending=False, inplace=True)
734 | order_table.reset_index(drop=True, inplace=True)
735 | best_price = order_table.loc[0, 'Price']
736 | order_index = order_table.loc[0, 'OrderIndex']
737 | print(order_table.iloc[0, :])
738 | print('today best price', best_price, ', previous day close price', myPrice, ', order index', order_index)
739 |
740 | set_event(e)
741 | client_msg = enter_a_new_order(symbol=mySymbol, side=mySide, price=float(best_price), qty=myQuantity)
742 | send_msg(client_msg)
743 | wait_for_an_event(e)
744 | data = get_data(q)
745 |
746 |                 # record orders
747 | record_order = pd.Series([market_period, mySymbol, mySide, best_price, myQuantity])
748 | record_order_df = pd.concat([record_order_df, record_order], axis=1)
749 |
750 | time.sleep(30) # skip to next day
751 |
752 | record_order_df = record_order_df.T
753 | try:
754 | record_order_df.columns = ['Date', 'Symbol', 'Side', 'Price', 'Quantity']
755 | record_order_df.loc[record_order_df['Side']=='Sell', 'Quantity'] = -1.*record_order_df.loc[record_order_df['Side']=='Sell', 'Quantity']
756 | record_order_df.set_index(['Symbol', 'Date'], inplace=True)
757 | print(record_order_df)
758 |     except Exception:
759 | print('No Orders!!!!')
760 |
761 |
762 | set_event(e)
763 | send_msg(quit_connection()) # automatically quit
764 | wait_for_an_event(e)
765 |
766 |
767 |
768 | # calculate the running maximum drawdown of a cumulative-return series
769 | def MaxDrawdown(Ret_Cum):
770 | # ret_cum also can be portfolio position series
771 | ContVal = np.zeros(np.size(Ret_Cum))
772 | MaxDD = np.zeros(np.size(Ret_Cum))
773 | for i in range(np.size(Ret_Cum)):
774 | if i == 0:
775 | if Ret_Cum[i] < 0:
776 | ContVal[i] = Ret_Cum[i]
777 | else:
778 | ContVal[i] = 0
779 | else:
780 | ContVal[i] = Ret_Cum[i] - np.nanmax(Ret_Cum[0:(i+1)])
781 | MaxDD[i] = np.nanmin(ContVal[0:(i+1)])
782 | return MaxDD
783 |
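`MaxDrawdown` above tracks, at each point, the distance of the cumulative series below its running peak, then the worst such value seen so far. The same recurrence can be written with NumPy accumulators; this sketch (hypothetical input values) agrees with the loop above whenever the series starts at or above zero:

```python
import numpy as np

def max_drawdown(ret_cum):
    running_peak = np.maximum.accumulate(ret_cum)   # best level reached so far
    drawdown = ret_cum - running_peak               # distance below that peak (<= 0)
    return np.minimum.accumulate(drawdown)          # worst drawdown up to each point

curve = np.array([0.0, 0.10, 0.05, 0.20, 0.08])
dd = max_drawdown(curve)
# dd[-1] is the maximum (most negative) drawdown over the whole series
```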
784 |
785 | @app.route('/')
786 | def index():
787 | return render_template("index.html")
788 |
789 |
790 | @app.route('/data_prep')
791 | def data_prep():
792 | inspector = inspect(engine)
793 |
794 | sp500_info = get_daily_data(completeURL=requestSP500)
795 | sp500_info_df = pd.DataFrame(sp500_info)
796 | if len(inspector.get_table_names()) == 0: # if no market data, download market data
797 | download_market_data(metadata, engine, sp500_info_df)
798 | else:
799 | print(" >>>>>>>>>>>>>>>>>>>>>>><<<<<<<<<<<<<<<<<<<<<<<<<<")
800 | print("Data already downloaded ...")
801 |
802 | stock_pairs = training_data(metadata, engine, significance, sp500_info_df,
803 | training_start_date, training_end_date)
804 | pairs = stock_pairs.transpose()
805 | list_of_pairs = [pairs[i] for i in pairs]
806 | return render_template("data_prep.html", pair_list=list_of_pairs)
807 |
808 |
809 | @app.route('/build_model')
810 | def build_model():
811 | building_model(metadata, engine, k, mvt,
812 | backtesting_start_date, backtesting_end_date)
813 |
814 | select_st = "SELECT * from pairprices;"
815 | result_df = execute_sql_statement(select_st, engine)
816 | result_df = result_df.transpose()
817 | list_of_pairs = [result_df[i] for i in result_df]
818 | return render_template("build_model.html", pair_list=list_of_pairs)
819 |
820 |
821 | @app.route('/back_test')
822 | def model_back_testing():
823 | back_testing(metadata, engine, backtesting_start_date, backtesting_end_date)
824 |
825 | select_st = "SELECT * from stockpairs;"
826 | result_df = execute_sql_statement(select_st, engine)
827 | result_df['Score'] = result_df['Score'].map('{:.4f}'.format)
828 | result_df['Profit_Loss'] = result_df['Profit_Loss'].map('${:,.2f}'.format)
829 | result_df = result_df.transpose()
830 | list_of_pairs = [result_df[i] for i in result_df]
831 | return render_template("back_testing.html", pair_list=list_of_pairs)
832 |
833 |
834 | @app.route('/trade_analysis')
835 | def trade_analysis():
836 | print(" >>>>>>>>>>>>>>>>>>>>>>><<<<<<<<<<<<<<<<<<<<<<<<<<")
837 | print("Generating trading analysis ...")
838 |
839 | select_st = "SELECT printf(\"US$%.2f\", sum(Profit_Loss)) AS Profit, count(Profit_Loss) AS Total_Trades, \
840 | sum(CASE WHEN Profit_Loss > 0 THEN 1 ELSE 0 END) AS Profit_Trades, \
841 | sum(CASE WHEN Profit_Loss < 0 THEN 1 ELSE 0 END) AS Loss_Trades FROM StockPairs;"
842 | result_df = execute_sql_statement(select_st, engine)
843 |
844 |     # S&P 500 benchmark PnL
845 | select_st = "SELECT symbol, date, adjusted_close FROM [GSPC.INDX]"+ \
846 | " WHERE date >= " + "\"" + str(backtesting_start_date) + "\"" + \
847 | " AND date <= " + "\"" + str(backtesting_end_date) + "\"" + ";"
848 | sp_df = execute_sql_statement(select_st, engine)
849 | sp_df['ret'] = sp_df['adjusted_close'].pct_change()
850 | sp_df['cumpnl'] = capital * (1 + sp_df['ret']).cumprod() - capital
851 | sp_df.index = pd.to_datetime(sp_df.date)
852 |
853 |     # get strategy PnL
854 | select_st = 'SELECT Ticker1, Ticker2 FROM stockpairs;'
855 | pair_df = execute_sql_statement(select_st, engine)
856 | select_st = 'SELECT * FROM trades;'
857 | pnl_df = execute_sql_statement(select_st, engine)
858 | total_pnl = pd.DataFrame(0, columns=["P/L"], index=pnl_df.Date.unique())
859 |
860 | for value in pair_df.values:
861 | pnl = pnl_df.loc[pnl_df.Symbol1==value[0], ["Date","P/L"]]
862 | pnl.set_index("Date", inplace=True)
863 |         total_pnl = total_pnl.add(pnl, fill_value=0)  # align on Date; treat missing dates as 0
864 |
865 | cumpnl = total_pnl.cumsum()
866 | maxdraw = MaxDrawdown(cumpnl['P/L'].values)
867 | result_df["Max_Drawdown"] = maxdraw[-1]
868 | cumret = cumpnl.pct_change()
869 | cumret = cumret.replace(np.inf, np.nan)
870 | cumret = cumret.replace(-np.inf, np.nan)
871 | result_df["Sharpe"] = np.sqrt(252) * np.nanmean(cumret) / np.nanstd(cumret)
872 | result_df = result_df.round(2)
873 |
874 | print(result_df.to_string(index=False))
875 | result_df = result_df.transpose()
876 | trade_results = [result_df[i] for i in result_df]
877 |
878 |     # plot
879 | cumpnl.index = pd.to_datetime(cumpnl.index)
880 | maxdraw = pd.DataFrame(maxdraw, index=cumpnl.index)
881 | fig = plt.figure(figsize=(12,7))
882 | plt.title('Backtesting cumPnL '+str(backtesting_start_date)+' to '+str(backtesting_end_date),
883 | fontsize=15)
884 | plt.xlabel('Date')
885 | plt.ylabel('PnL (dollars)')
886 | plt.plot(cumpnl, label='pairs trading pnl')
887 | plt.plot(maxdraw, label='maximum drawdown')
888 | plt.plot(sp_df['cumpnl'], label='benchmark(sp500) pnl')
889 | plt.legend()
890 | plt.tight_layout()
891 | fig.savefig('static/plots/backtest_pnl.jpg')
892 | plt.show()
893 | return render_template("trade_analysis.html", trade_list=trade_results)
894 |
895 |
896 | @app.route('/real_trade')
897 | def real_trade():
898 | global bClientThreadStarted, client_thread
899 |
900 | print(" >>>>>>>>>>>>>>>>>>>>>>><<<<<<<<<<<<<<<<<<<<<<<<<<")
901 | print("Real trading ...", bClientThreadStarted)
902 |
903 | if bClientThreadStarted == False:
904 | client_thread.start()
905 | bClientThreadStarted = True
906 | print("Client thread starts ...", bClientThreadStarted)
907 | client_thread.join() # wait until this thread finishes, then continue main thread
908 |
909 |     # real-trade analysis
910 | print(" >>>>>>>>>>>>>>>>>>>>>>><<<<<<<<<<<<<<<<<<<<<<<<<<")
911 | print("Trading analysis ...")
912 | get_orders(market_period_list[-1])
913 | stocks_df = pd.DataFrame(price_data, columns=['symbol','date','adjusted_close'])
914 | stocks_df.adjusted_close = stocks_df.adjusted_close.astype(float)
915 | total_pnl = pd.Series(0, index=stocks_df.date.unique())
916 |
917 | try:
918 | for stock in record_order_df.index.levels[0]:
919 | order_df = record_order_df.loc[stock,:]
920 | stock_df = stocks_df[stocks_df['symbol']==stock]
921 | stock_df.set_index('date', inplace=True)
922 | join_df = stock_df.join(order_df)
923 | join_df.fillna(method='ffill', inplace=True)
924 | join_df['pnl'] = (join_df['adjusted_close'] - join_df['Price']) * join_df['Quantity']
925 | total_pnl = total_pnl.add(join_df.pnl, fill_value=0) # series + series
926 |     except Exception:
927 |         pass  # no orders were recorded
928 |
929 | result_df = pd.DataFrame()
930 | result_df.loc[0,'Profits'] = sum(total_pnl)
931 | result_df.loc[0,'Total_Trades'] = len(record_order_df) / 2
932 |
933 | cumpnl = total_pnl.cumsum()
934 | maxdraw = MaxDrawdown(cumpnl.values)
935 | result_df.loc[0,"Max_Drawdown"] = maxdraw[-1]
936 | cumret = cumpnl.pct_change()
937 | cumret = cumret.replace(np.inf, np.nan)
938 | cumret = cumret.replace(-np.inf, np.nan)
939 | result_df.loc[0,"Sharpe"] = np.sqrt(30) * np.nanmean(cumret) / np.nanstd(cumret)
940 | result_df = result_df.round(2)
941 |
942 | print(result_df)
943 | result_df = result_df.transpose()
944 | trade_results = [result_df[i] for i in result_df]
945 |
946 |     # S&P 500 benchmark PnL
947 | select_st = "SELECT symbol, date, adjusted_close FROM [GSPC.INDX]"+ \
948 | " WHERE date >= " + "\"" + str(market_period_list[0]) + "\"" + \
949 | " AND date <= " + "\"" + str(market_period_list[-1]) + "\"" + ";"
950 | sp_df = execute_sql_statement(select_st, engine)
951 | sp_df['ret'] = sp_df['adjusted_close'].pct_change()
952 | sp_df['cumpnl'] = capital * (1 + sp_df['ret']).cumprod() - capital
953 | sp_df.index = pd.to_datetime(sp_df.date)
954 |
955 |     # plot
956 | cumpnl.index = pd.to_datetime(cumpnl.index)
957 | maxdraw = pd.DataFrame(maxdraw, index=cumpnl.index)
958 | fig = plt.figure(figsize=(12,7))
959 | plt.title('Trading cumPnL '+str(market_period_list[0])+' to '+str(market_period_list[-1]),
960 | fontsize=15)
961 | plt.xlabel('Date')
962 | plt.ylabel('PnL (dollars)')
963 | plt.plot(cumpnl, label='pairs trading pnl')
964 | plt.plot(maxdraw, label='maximum drawdown')
965 | plt.plot(sp_df['cumpnl'], label='benchmark(sp500) pnl')
966 | plt.legend()
967 | plt.tight_layout()
968 | fig.savefig('static/plots/trade_pnl.jpg')
969 | plt.show()
970 |
971 | return render_template("real_trade.html", trade_list=trade_results)
972 |
973 |
974 |
975 | if len(sys.argv) > 1:
976 | clientID = sys.argv[1]
977 | else:
978 | clientID = "Yicheng"
979 |
980 | HOST = socket.gethostbyname(socket.gethostname())
981 | PORT = 6500
982 | BUFSIZ = 1024
983 | ADDR = (HOST, PORT)
984 |
985 | client_socket = socket.socket(AF_INET, SOCK_STREAM) # create TCP/IP socket
986 | client_socket.connect(ADDR)
987 |
988 |
989 |
990 | if __name__ == "__main__":
991 | market_period_list = []
992 | price_data = []
993 | record_order_df = pd.DataFrame()
994 | ols_results = {}
995 |
996 |     # real trade
997 | e = threading.Event()
998 | q = queue.Queue()
999 | client_thread = threading.Thread(target=join_trading_network, args=(e,q))
1000 |
1001 |     # dashboard
1002 | bClientThreadStarted = False
1003 | app.run()
1004 |
--------------------------------------------------------------------------------
/platform_server.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python3
2 | # -*- coding: utf-8 -*-
3 |
4 |
5 | import socket
6 | from threading import Thread
7 | import json
8 | import urllib.request
9 | import sys
10 | import pandas as pd
11 | import random
12 | import sched, time
13 | import datetime as dt
14 |
15 | from sqlalchemy import create_engine
16 | from sqlalchemy import MetaData
17 |
18 |
19 | serverID = "Server1"
20 |
21 | startDate = dt.datetime(2019,1,1) # hours:minute:seconds
22 | endDate = dt.date.today() # only dates
23 | requestURL = "https://eodhistoricaldata.com/api/eod/"
24 | myEodKey = "5ba84ea974ab42.45160048"
25 |
26 | # trading
27 | engine = create_engine('sqlite:///pairs_trading.db')
28 | engine.execute("PRAGMA foreign_keys = ON")
29 | metadata = MetaData()
30 | metadata.reflect(bind=engine) # bind to Engine, load all tables
31 |
32 |
33 | def get_daily_data(symbol='', start=startDate, end=endDate, requestType=requestURL,
34 | apiKey=myEodKey, completeURL=None):
35 | if not completeURL:
36 | symbolURL = str(symbol) + '?'
37 | startURL = "from=" + str(start)
38 | endURL = "to=" + str(end)
39 |         apiKeyURL = "api_token=" + apiKey  # use the parameters, not the module-level defaults
40 |         completeURL = requestType + symbolURL + startURL + '&' + endURL + '&' + apiKeyURL + '&period=d&fmt=json'
41 | print(completeURL)
42 |
43 | # if cannot open url
44 | try:
45 | with urllib.request.urlopen(completeURL) as req:
46 | data = json.load(req)
47 | return data
48 |     except Exception:
49 |         pass  # return None on a failed request; callers must handle missing data
50 |
51 |
52 | def accept_incoming_connections():
53 | while True:
54 | client, client_address = platform_server.accept()
55 | print("%s:%s has connected." % client_address)
56 | client_thread = Thread(target=handle_client, args=(client,))
57 |         client_thread.daemon = True
58 | client_thread.start()
59 |
60 |
61 | def handle_client(client):
62 | """Handles a single client connection."""
63 | global symbols
64 | price_unit = 0.001
65 | client_msg = client.recv(buf_size).decode("utf8")
66 | data = json.loads(client_msg)
67 | print(data)
68 | clientID = data["Client"]
69 | status = data["Status"]
70 | msg_end_tag = ".$$$$"
71 |
72 | if status == "Logon":
73 |
74 | if (clientID in clients.values()):
75 | text = "%s duplicated connection request!" % clientID
76 | server_msg = "{\"Server\":\"" + serverID + "\", \"Response\":\"" + text + "\", \"Status\":\"Rejected\"}"
77 | server_msg = "".join((server_msg, msg_end_tag))
78 | client.send(bytes(server_msg, "utf8"))
79 | print(text)
80 | client.close()
81 | return
82 |
83 | else:
84 | text = "Welcome %s!" % clientID
85 | server_msg = "{\"Server\":\"" + serverID + "\", \"Response\":\"" + text + "\", \"Status\":\"Ack\"}"
86 | server_msg = "".join((server_msg, msg_end_tag))
87 | client.send(bytes(server_msg, "utf8"))
88 | clients[client] = clientID
89 | print (clients[client])
90 | client_symbols = list(data["Stocks"].split(','))
91 | symbols.extend(client_symbols)
92 | symbols = sorted(set(symbols))
93 |
94 | try:
95 | while True:
96 | msg = client.recv(buf_size).decode("utf8")
97 | data = json.loads(msg)
98 | print(data)
99 |
100 | if data["Status"] == "Quit":
101 | text = "%s left!" % clientID
102 | server_msg = "{\"Server\":\"" + serverID + "\", \"Response\":\"" + text + "\", \"Status\":\"Done\"}"
103 | print(server_msg)
104 |
105 | elif data["Status"] == "Order Inquiry":
106 | if "Symbol" in data and data["Symbol"] != "":
107 | server_msg = json.dumps(order_table.loc[order_table['Symbol'].isin(data["Symbol"])].to_json(orient='table'))
108 |
109 | elif data["Status"] == "New Order":
110 | if market_status == "Market Closed":
111 | data["Status"] = "Order Reject"
112 |
113 |                 # match resting orders: same symbol, opposite side,
114 |                 # price within one price unit, and not yet fully filled
115 |                 mask = ((order_table["Symbol"] == data["Symbol"]) &
116 |                         (order_table["Side"] != data["Side"]) &
117 |                         (abs(order_table["Price"] - float(data["Price"])) < price_unit) &
118 |                         (order_table["Status"] != 'Filled'))
119 |
120 |                 if mask.any():
121 |                     # quantity resting at the matched price level
122 |                     order_qty = order_table.loc[mask.values, 'Qty']
123 |
124 | if (order_qty.item() == data['Qty']):
125 | order_table.loc[(mask.values), 'Qty'] = 0
126 | order_table.loc[(mask.values), 'Status'] = 'Filled'
127 | data["Status"] = "Fill"
128 | elif (order_qty.item() < data['Qty']):
129 | data['Qty'] = order_qty.item() # return your quantity
130 | order_table.loc[(mask.values), 'Qty'] = 0
131 | order_table.loc[(mask.values), 'Status'] = 'Filled'
132 | data["Status"] = "Order Partial Fill"
133 | else:
134 | order_table.loc[(mask.values), 'Qty'] -= data['Qty']
135 | order_table.loc[(mask.values), 'Status'] = 'Partial Filled'
136 | data["Status"] = "Order Fill"
137 |
138 | else:
139 | if market_status == "Pending Closing":
140 | order_table_for_pending_closing = order_table[(order_table["Symbol"] == data["Symbol"]) &
141 | (order_table["Side"] != data["Side"])].iloc[[0,-1]]
142 | prices = order_table_for_pending_closing["Price"].values
143 |
144 | if data["Side"] == "Buy":
145 | price = float(prices[0])
146 | price += 0.01
147 | else:
148 | price = float(prices[-1])
149 | price -= 0.01
150 | data["Price"] = str(round(price,2))
151 | data["Status"] = "Order Fill"
152 | else:
153 | data["Status"] = "Order Reject"
154 | # print(data)
155 | server_msg = json.dumps(data)
156 |
157 | elif data["Status"] == "User List":
158 | user_list = str('')
159 | for clientKey in clients:
160 | user_list += clients[clientKey] + str(',')
161 | server_msg = json.dumps({'User List':user_list})
162 |
163 | elif data["Status"] == "Stock List":
164 | #stock_list = symbols.str.cat(sep=',')
165 | stock_list = ','.join(symbols)
166 | server_msg = json.dumps({"Stock List":stock_list})
167 |
168 | elif data["Status"] == "Market Status":
169 | server_msg = json.dumps({"Server":serverID, "Market Status":market_status, "Market Period":market_period})
170 |
171 | else:
172 | text = "Unknown Message from Client"
173 | server_msg = "{\"Server\":\"" + serverID + "\", \"Response\":\"" + text + "\", \"Status\":\"Unknown Message\"}"
174 | print(server_msg)
175 |
176 | server_msg = "".join((server_msg, msg_end_tag))
177 | client.send(bytes(server_msg, "utf8"))
178 |
179 | if data["Status"] == "Quit":
180 | client.close()
181 | del clients[client]
182 | users = ''
183 | for clientKey in clients:
184 | users += clients[clientKey] + ','
185 | print(users)
186 | return
187 |
188 | except KeyboardInterrupt:
189 | sys.exit(0)
190 |
191 | except json.decoder.JSONDecodeError:
192 | del clients[client]
193 | sys.exit(0)
194 |
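Every server reply in `handle_client` is a JSON string terminated by `msg_end_tag` (`".$$$$"`), so a receiver can reassemble discrete messages from the TCP byte stream by splitting on that tag. A minimal framing sketch (the `split_messages` helper is hypothetical; the actual client-side parsing lives in `platform_client.py`):

```python
import json

END_TAG = ".$$$$"

def split_messages(buffer):
    # everything before the last tag is a complete message; the tail may
    # be a partial message still arriving, so return it as the new buffer
    parts = buffer.split(END_TAG)
    complete, remainder = parts[:-1], parts[-1]
    return [json.loads(p) for p in complete], remainder

stream = '{"Status":"Ack"}' + END_TAG + '{"Status":"Done"}' + END_TAG + '{"Sta'
msgs, rest = split_messages(stream)
```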
195 | clients = {}
196 |
197 |
198 | def generate_qty(number_of_qty):
199 | total_qty = 0
200 | list_of_qty = []
201 | for index in range(number_of_qty):
202 | qty = random.randint(1,101)
203 | list_of_qty.append(qty)
204 | total_qty += qty
205 | return (total_qty, list_of_qty)
206 |
207 |
208 | def populate_order_table(symbols, start, end):
209 | price_scale = 0.05
210 | global order_index, order_table
211 | order_table.drop(order_table.index, inplace=True)
212 |
213 | for symbol in symbols:
214 | stock = get_daily_data(symbol, start, end)
215 |
216 |         for stock_data in (stock or []):  # get_daily_data may return None on failure
217 | (total_qty, list_of_qty) = generate_qty(int((float(stock_data['high'])-float(stock_data['low']))/price_scale))
218 |             buy_price = float(stock_data['low'])
219 | sell_price = float(stock_data['high'])
220 | daily_volume = float(stock_data['volume'])
221 |
222 | for index in range(0, len(list_of_qty)-1, 2):
223 | order_index += 1
224 | order_table.loc[order_index] = [order_index, symbol, 'Buy', buy_price, int((list_of_qty[index]/total_qty)*daily_volume), 'New']
225 | buy_price += 0.05
226 | order_index += 1
227 | order_table.loc[order_index] = [order_index, symbol, 'Sell', sell_price, int((list_of_qty[index+1]/total_qty)*daily_volume), 'New']
228 | sell_price -= 0.05
229 |
230 | print(order_table)
231 | print(market_status, market_period)
232 |
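`populate_order_table` above spreads each day's volume across the price levels between the day's low and high, in proportion to the random quantities from `generate_qty`. The allocation step in isolation (hypothetical weights and volume, seeded for reproducibility):

```python
import random

random.seed(0)  # reproducible weights for this sketch
weights = [random.randint(1, 101) for _ in range(4)]   # as in generate_qty
total = sum(weights)
daily_volume = 10_000

# each price level receives its weight's share of the day's volume;
# int() truncates, so the shares never sum to more than daily_volume
allocated = [int((w / total) * daily_volume) for w in weights]
```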
233 |
234 | '''
235 | (1) Server will provide consolidated books for 30 trading days,
236 | (a) simulated from market data starting from 1/2/2019.
237 | (b) Each simulated trading date has one book, with buy orders and sell orders
238 | simulated from the high and low price from the day, with daily volume randomly
239 | distributed cross all price points.
240 | (c) Each simulated trading date starts with a new book simulated from corresponding
241 | daily historical data
242 | '''
243 | def create_market_interest(index):
244 | global market_period, symbols
245 |
246 | market_periods = pd.bdate_range('2019-01-02', '2019-04-01').strftime("%Y-%m-%d").tolist()
247 |
248 | # in order
249 | startDate = market_periods[index]
250 | endDate = market_periods[index]
251 |
252 | if len(order_table) == 0 or (market_status != "Market Closed" and market_status != "Pending Closing"):
253 | market_period = startDate
254 | populate_order_table(symbols, startDate, endDate)
255 | print(market_status, "Creating market interest")
256 | else:
257 | print(market_status, "No new market interest")
258 |
259 | '''
260 | (2) Each simulated trading day lasts 30 seconds,
261 | following by 5 seconds of pending closing phase
262 | and 5 seconds of market closed phase before market reopen
263 | '''
264 | def update_market_status(status, day):
265 | global market_status
266 | global order_index
267 | global order_table
268 |
269 | market_status = status
270 | create_market_interest(day)
271 |
272 | market_status = 'Open'
273 | print(market_status)
274 | time.sleep(30)
275 |
276 | market_status = 'Pending Closing'
277 | print(market_status)
278 | time.sleep(5)
279 |
280 | market_status = 'Market Closed'
281 | print(market_status)
282 |
283 |     order_table.fillna(0, inplace=True)  # without inplace=True the result was discarded
284 | order_index = 0
285 | time.sleep(5)
286 |
287 | '''
288 | (3) There are 5 phases of market:
289 | (a) Not Open, start
290 | (b) Pending Open,
291 | (c) Open, 30
292 | (d) Pending Close, 5
293 | (e) Market Closed 5
294 | '''
295 | def set_market_status(scheduler, time_in_seconds):
296 | value = dt.datetime.fromtimestamp(time_in_seconds)
297 | print(value.strftime('%Y-%m-%d %H:%M:%S'))
298 |
299 | # 40s for one day
300 | for day in range(total_market_days):
301 | scheduler.enter(40*day+1,1, update_market_status, argument=('Pending Open', day))
302 | scheduler.run()
303 |
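`set_market_status` schedules one `update_market_status` call per simulated day at `40*day + 1` seconds, and `scheduler.run()` then blocks while the events fire in time order. The same `sched` pattern compressed to millisecond delays (hypothetical `open_day` callback) so it runs instantly:

```python
import sched, time

log = []
scheduler = sched.scheduler(time.time, time.sleep)

def open_day(day):
    log.append(day)

for day in range(3):
    # one event per simulated day, 10 ms apart instead of 40 s
    scheduler.enter(0.01 * day + 0.01, 1, open_day, argument=(day,))

scheduler.run()  # blocks until every scheduled event has fired
```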
304 |
305 | port = 6500
306 | buf_size = 1024
307 | platform_server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
308 | print(socket.gethostname())
309 | platform_server.bind((socket.gethostname(), port))
310 |
311 | if __name__ == "__main__":
312 |
313 | market_status = "Not Open"
314 | market_period = "2019-01-01"
315 | order_index = 0
316 | total_market_days = 30
317 |
318 | symbols = []
319 | order_table_columns = ['OrderIndex', 'Symbol', 'Side', 'Price', 'Qty', 'Status']
320 | order_table = pd.DataFrame(columns=order_table_columns)
321 | order_table = order_table.fillna(0)
322 |
323 | platform_server.listen(1)
324 | print("Waiting for client requests")
325 | time.sleep(80) # wait for backtesting to finish
326 |
327 | try:
328 | scheduler = sched.scheduler(time.time, time.sleep)
329 | current_time_in_seconds = time.time()
330 | scheduler_thread = Thread(target=set_market_status, args=(scheduler, current_time_in_seconds))
331 |         scheduler_thread.daemon = True
332 |
333 | server_thread = Thread(target=accept_incoming_connections)
334 |         server_thread.daemon = True
335 |
336 | server_thread.start()
337 | scheduler_thread.start()
338 |
339 | scheduler_thread.join() # wait until scheduler finished
340 | server_thread.join() # server finish after scheduler finished
341 |
342 | except (KeyboardInterrupt, SystemExit):
343 | platform_server.close()
344 | sys.exit(0)
345 |
--------------------------------------------------------------------------------
/static/plots/backtest_pnl.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wangy8989/Pairs-Trading-with-Machine-Learning/4b617ca4ac35e03ed08af91d911e40179d81cf46/static/plots/backtest_pnl.jpg
--------------------------------------------------------------------------------
/static/plots/trade_pnl.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wangy8989/Pairs-Trading-with-Machine-Learning/4b617ca4ac35e03ed08af91d911e40179d81cf46/static/plots/trade_pnl.jpg
--------------------------------------------------------------------------------
/templates/back_testing.html:
--------------------------------------------------------------------------------
1 | {% extends "base.html" %}
2 | {% block content %}
3 |
25 |
26 |
27 | Back Testing
28 |
29 |
30 |
31 |
32 | Ticker1
33 | Ticker2
34 | Score
35 | Profit_Loss
36 |
37 |
38 |
39 | {% for pair in pair_list %}
40 |
41 | {{pair.Ticker1}}
42 | {{pair.Ticker2}}
43 | {{pair.Score}}
44 | {{pair.Profit_Loss}}
45 |
46 | {% endfor %}
47 |
48 |
49 |
50 |
51 | {% endblock %}
--------------------------------------------------------------------------------
/templates/base.html:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 |
5 | Pair Trading Analytics
6 |
7 |
8 | {% block head %} {% endblock %}
9 |
10 |
11 |
12 |
13 |
14 |
15 | Dashboard
16 |
17 |
30 |
31 |
32 |
33 |
34 |
35 | {% block content %} {% endblock %}
36 |
37 |
38 |
39 |
40 |
41 |
42 |
--------------------------------------------------------------------------------
/templates/build_model.html:
--------------------------------------------------------------------------------
1 | {% extends "base.html" %}
2 | {% block content %}
3 |
25 |
26 |
27 | Building Model for Pairs
28 |
29 |
30 |
31 |
32 | Symbol1
33 | Symbol2
34 | Date
35 | Close1
36 | Close2
37 | Residual
38 | Lower
39 | MA
40 | Upper
41 |
42 |
43 |
44 | {% for pair in pair_list %}
45 |
46 | {{pair.Symbol1}}
47 | {{pair.Symbol2}}
48 | {{pair.Date}}
49 | {{pair.Close1}}
50 | {{pair.Close2}}
51 | {{pair.Residual}}
52 | {{pair.Lower}}
53 | {{pair.MA}}
54 | {{pair.Upper}}
55 |
56 | {% endfor %}
57 |
58 |
59 |
60 |
61 | {% endblock %}
--------------------------------------------------------------------------------
/templates/data_prep.html:
--------------------------------------------------------------------------------
1 | {% extends "base.html" %}
2 | {% block content %}
3 |
25 |
26 |
27 | Pair Watch list
28 |
29 |
30 |
31 |
32 | Ticker1
33 | Ticker2
34 | tScore
35 |
36 |
37 |
38 | {% for pair in pair_list %}
39 |
40 | {{pair.Ticker1}}
41 | {{pair.Ticker2}}
42 | {{pair.Score}}
43 |
44 | {% endfor %}
45 |
46 |
47 |
48 |
49 | {% endblock %}
--------------------------------------------------------------------------------
/templates/index.html:
--------------------------------------------------------------------------------
1 | {% extends "base.html" %}
2 | {% block content %}
3 |
25 |
26 |
27 |
28 |
29 |
Pairs Trading with Machine Learning
30 |
31 | This project uses Machine Learning methods to group assets with similar factor loadings,
32 | then identifies pairs within each cluster to implement a pairs trading strategy.
33 | The selected pairs are combined into a market-neutral portfolio.
34 |
35 |
36 |
37 |
38 |
39 |
40 |
Machine Learning Methods
41 |
42 | Apply Principal Component Analysis (PCA) to reduce the dimensionality of the returns data and factor (industry) data,
43 | then group stocks using DBSCAN clustering.
44 |
45 |
46 |
47 |
Finding Pairs
48 |
49 | Run a cointegration test (an ADF test on the stationarity of the residual) on each pair within each cluster, and keep the most cointegrated pair.
50 |
51 |
52 |
53 |
Trading Logic
54 |
55 | Trade on the residuals of the pair prices, then apply a Bollinger Band strategy:
56 | if the residual crosses below the lower band, go long; if it crosses above the upper band, go short;
57 | otherwise, hold the current position.
58 |
59 |
60 |
61 |
62 |
63 | {% endblock %}
--------------------------------------------------------------------------------
/templates/real_trade.html:
--------------------------------------------------------------------------------
1 | {% extends "base.html" %}
2 | {% block content %}
3 |
25 |
26 |
27 | Trading Analysis
28 |
29 |
30 |
31 |
32 | Profit
33 | Total_Trades
34 | Maximum_Drawdown
35 | Sharpe_ratio
36 |
37 |
38 |
39 | {% for trade in trade_list %}
40 |
41 | {{trade.Profits}}
42 | {{trade.Total_Trades}}
43 | {{trade.Max_Drawdown}}
44 | {{trade.Sharpe}}
45 |
46 | {% endfor %}
47 |
48 |
49 |
50 |
51 |
52 | {% endblock %}
--------------------------------------------------------------------------------
/templates/trade_analysis.html:
--------------------------------------------------------------------------------
1 | {% extends "base.html" %}
2 | {% block content %}
3 |
25 |
26 |
27 | Trading Analysis
28 |
29 |
30 |
31 |
32 | Profit
33 | Total_Trades
34 | Profit_Trades
35 | Loss_Trades
36 | Maximum_Drawdown
37 | Sharpe_ratio
38 |
39 |
40 |
41 | {% for trade in trade_list %}
42 |
43 | {{trade.Profit}}
44 | {{trade.Total_Trades}}
45 | {{trade.Profit_Trades}}
46 | {{trade.Loss_Trades}}
47 | {{trade.Max_Drawdown}}
48 | {{trade.Sharpe}}
49 |
50 | {% endfor %}
51 |
52 |
53 |
54 |
55 |
56 | {% endblock %}
--------------------------------------------------------------------------------