├── Capstone_Final_Yicheng_Wang.rar
├── QuantTradingStrategy_FinalProjectCode.pdf
├── README.md
├── platform_client.py
├── platform_server.py
├── static
│   └── plots
│       ├── backtest_pnl.jpg
│       └── trade_pnl.jpg
└── templates
    ├── back_testing.html
    ├── base.html
    ├── build_model.html
    ├── data_prep.html
    ├── index.html
    ├── real_trade.html
    └── trade_analysis.html
--------------------------------------------------------------------------------
/Capstone_Final_Yicheng_Wang.rar:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wangy8989/Pairs-Trading-with-Machine-Learning/4b617ca4ac35e03ed08af91d911e40179d81cf46/Capstone_Final_Yicheng_Wang.rar
--------------------------------------------------------------------------------
/QuantTradingStrategy_FinalProjectCode.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wangy8989/Pairs-Trading-with-Machine-Learning/4b617ca4ac35e03ed08af91d911e40179d81cf46/QuantTradingStrategy_FinalProjectCode.pdf
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Pairs-Trading-with-Machine-Learning
2 | 
3 | Notebook file: implements the strategy on the Russell 3000.
4 | 
5 | It implements a pairs trading strategy with machine learning to find the most profitable portfolio.
6 | The idea is that stocks whose loadings on common factors were similar in the past should stay related in the future.
7 | We used the Russell 3000 from 2010 to 2018, retrieved from Bloomberg, as our project data.
8 | The data contain daily stock prices, Global Industry Classification Standard (GICS) sectors, analyst ratings, market-to-book value, return on assets, debt-to-assets, EPS, and market cap.
9 | 
10 | Result: the best model with tuned hyperparameters achieved a Sharpe ratio of 1.55.
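The factor-loading idea above can be sketched end to end with PCA and DBSCAN (a toy illustration on synthetic returns; the shapes, seed, and `eps` value are illustrative, not the project's actual data or tuning):

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
n_days, n_stocks = 500, 40

# synthetic daily returns driven by two common factors plus noise
factors = rng.normal(size=(n_days, 2))
loadings = rng.normal(size=(2, n_stocks))
returns = factors @ loadings + 0.1 * rng.normal(size=(n_days, n_stocks))

# PCA on the day-by-stock return matrix; transposing the components
# gives each stock a small vector of factor loadings
pca = PCA(n_components=2).fit(returns)
stock_loadings = pca.components_.T        # shape: (n_stocks, n_components)

# cluster stocks whose loadings are close; DBSCAN labels noise as -1
X = StandardScaler().fit_transform(stock_loadings)
labels = DBSCAN(eps=0.5, min_samples=3).fit_predict(X)
print(sorted(set(labels)))
```

Pairs are then searched only within each cluster, which keeps the number of cointegration tests manageable.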
11 | 
12 | ## Pairs-Trading-with-Machine-Learning-on-Distributed-Python-Platform
13 | 
14 | This project implements a distributed Python platform for testing quantitative trading models on financial instruments in a networked client/server setting. Normally, we backtest locally on historical data to evaluate a trading strategy, but that performance is often an illusion of what real-time trading would deliver. We demonstrate this conclusion here by showing that our quantitative trading model performs much worse in simulated trading than in the backtesting environment. We therefore built this Python platform not only to implement and backtest strategies on historical data, but also to simulate trades as in a real market, acting as one more control before real-time trading.
15 | 
16 | Strategy:
17 | 1. Implemented PCA and DBSCAN clustering to group S&P 500 stocks with similar factor loadings
18 | 2. Identified pairs within clusters and traded them with a dollar-neutral Bollinger Band pairs trading strategy
19 | 3. Constructed a portfolio with the pairs equally weighted
20 | 
21 | Result: the portfolio achieved a 2.5 Sharpe ratio and a 25% annual return in 2018.
22 | 
23 | * Code is in "platform_server.py" and "platform_client.py";
24 | * the database is "pairs_trading.db";
25 | * Flask templates are in the "templates" folder;
26 | * the "static" folder holds the PnL plots.
27 | 
28 | **Download "Capstone_Final_Yicheng_Wang.rar" if you want to run the project (it contains the code, data, and video instructions).**
29 | 
30 | Instructions:
31 | 1. In the "Program" folder, run "platform_server.py";
32 | 2. Open another console and run "platform_client.py";
33 | 3. Open a web browser and go to "http://127.0.0.1:5000/"; the home page will appear;
34 | 4. Click "Stock Pairs" -> "Building Model" -> "Back Testing" -> "Trading Analysis" -> "Real Trading" in order;
35 | video instructions are in "video_flask" (web browser) and "video_program" (running the program).
36 | 
--------------------------------------------------------------------------------
/platform_client.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python3
2 | # -*- coding: utf-8 -*-
3 | 
4 | import socket
5 | from socket import AF_INET, SOCK_STREAM
6 | import threading
7 | import queue
8 | 
9 | import json
10 | import sys
11 | import urllib.request
12 | import pandas as pd
13 | import matplotlib.pyplot as plt
14 | import datetime as dt
15 | import talib
16 | import numpy as np
17 | import time
18 | 
19 | from sklearn.cluster import DBSCAN
20 | from sklearn import preprocessing
21 | from sklearn.decomposition import PCA
22 | import statsmodels.api as sm
23 | import statsmodels.tsa.stattools as ts
24 | 
25 | from sqlalchemy import Column, ForeignKey, Integer, Float, String
26 | from sqlalchemy import create_engine
27 | from sqlalchemy import MetaData
28 | from sqlalchemy import Table
29 | from sqlalchemy import inspect
30 | from sqlalchemy import and_
31 | 
32 | 
33 | from flask import Flask, render_template
34 | app = Flask(__name__, template_folder='templates')
35 | 
36 | 
37 | 
38 | clientID = "yicheng"
39 | 
40 | 
41 | ' download data '
42 | data_start_date = dt.datetime(2014,1,1) # hours:minute:seconds
43 | data_end_date = dt.date.today() # only dates
44 | requestURL = "https://eodhistoricaldata.com/api/eod/"
45 | myEodKey = "5ba84ea974ab42.45160048"
46 | requestSP500 = "https://pkgstore.datahub.io/core/s-and-p-500-companies/constituents_json/data/64dd3e9582b936b0352fdd826ecd3c95/constituents_json.json"
47 | 
48 | ' trading '
49 | engine = create_engine('sqlite:///pairs_trading.db')
50 | engine.execute("PRAGMA foreign_keys = ON")
51 | metadata = MetaData()
52 | metadata.reflect(bind=engine) # bind to Engine, 
load all tables
53 | 
54 | ' Parameters '
55 | training_start_date = dt.datetime(2014,1,1)
56 | training_end_date = dt.datetime(2018,1,1)
57 | backtesting_start_date = dt.datetime(2018,1,1)
58 | backtesting_end_date = dt.datetime(2019,1,1)
59 | capital = 1000000.
60 | significance = 0.05
61 | k = 2
62 | mvt = 10
63 | # PCA
64 | N_PRIN_COMPONENTS = 50
65 | epsilon = 1.8
66 | 
67 | 
68 | 
69 | def get_daily_data(symbol='', start=data_start_date, end=data_end_date, requestType=requestURL,
70 |                    apiKey=myEodKey, completeURL=None):
71 |     if not completeURL:
72 |         symbolURL = str(symbol) + '?'
73 |         startURL = "from=" + str(start)
74 |         endURL = "to=" + str(end)
75 |         apiKeyURL = "api_token=" + myEodKey
76 |         completeURL = requestURL + symbolURL + startURL + '&' + endURL + '&' + apiKeyURL + '&period=d&fmt=json'
77 | 
78 |     # if the URL cannot be opened, return None instead of silently passing
79 |     try:
80 |         with urllib.request.urlopen(completeURL) as req:
81 |             data = json.load(req)
82 |             return data
83 |     except Exception:
84 |         return None
85 | 
86 | 
87 | ' populate stock data for each stock '
88 | def download_stock_data(ticker, metadata, engine, table_name):
89 |     column_names = ['symbol','date','open','high','low','close','adjusted_close','volume']
90 |     price_list = []
91 |     clear_a_table(table_name, metadata, engine)
92 | 
93 |     if 'GSPC' not in ticker:
94 |         symbol_full = str(ticker) + ".US"
95 |         stock = get_daily_data(symbol=symbol_full)
96 |     else:
97 |         stock = get_daily_data(symbol=ticker)
98 | 
99 |     if stock:
100 |         for stock_data in stock:
101 |             price_list.append([str(ticker), stock_data['date'], stock_data['open'], stock_data['high'],
102 |                                stock_data['low'], stock_data['close'], stock_data['adjusted_close'],
103 |                                stock_data['volume']])
104 | 
105 |         stocks = pd.DataFrame(price_list, columns=column_names)
106 |         stocks.to_sql(table_name, con=engine, if_exists='replace', index=False, chunksize=5)
107 | 
108 | 
109 | def execute_sql_statement(sql_st, engine):
110 |     result = engine.execute(sql_st)
111 |     result_df = pd.DataFrame(result.fetchall())
112 |     
result_df.columns = result.keys() 113 | return result_df 114 | 115 | 116 | ''' create table ''' 117 | def create_sp500_info_table(name, metadata, engine, null=False): 118 | table = Table(name, metadata, 119 | Column('name', String(50), nullable=null), 120 | Column('sector', String(50), nullable=null), 121 | Column('symbol', String(50), primary_key=True, nullable=null), 122 | extend_existing = True) # constructor 123 | table.create(engine, checkfirst=True) 124 | 125 | def create_price_table(name, metadata, engine, null=True): 126 | if name != 'GSPC.INDX': 127 | foreign_key = 'sp500.symbol' 128 | table = Table(name, metadata, 129 | Column('symbol', String(50), ForeignKey(foreign_key), 130 | primary_key=True, nullable=null), 131 | Column('date', String(50), primary_key=True, nullable=null), 132 | Column('open', Float, nullable=null), 133 | Column('high', Float, nullable=null), 134 | Column('low', Float, nullable=null), 135 | Column('close', Float, nullable=null), 136 | Column('adjusted_close', Float, nullable=null), 137 | Column('volume', Integer, nullable=null), 138 | extend_existing = True) 139 | else: 140 | table = Table(name, metadata, 141 | Column('symbol', String(50), primary_key=True, nullable=null), 142 | Column('date', String(50), primary_key=True, nullable=null), 143 | Column('open', Float, nullable=null), 144 | Column('high', Float, nullable=null), 145 | Column('low', Float, nullable=null), 146 | Column('close', Float, nullable=null), 147 | Column('adjusted_close', Float, nullable=null), 148 | Column('volume', Integer, nullable=null), 149 | extend_existing = True) 150 | table.create(engine, checkfirst=True) 151 | 152 | def create_stockpairs_table(table_name, metadata, engine): 153 | table = Table(table_name, metadata, 154 | Column('Ticker1', String(50), primary_key=True, nullable=False), 155 | Column('Ticker2', String(50), primary_key=True, nullable=False), 156 | Column('Score', Float, nullable=False), 157 | Column('Profit_Loss', Float, nullable=False), 158 
| extend_existing=True) 159 | table.create(engine, checkfirst=True) 160 | 161 | def create_pairprices_table(table_name, metadata, engine, null=True): 162 | table = Table(table_name, metadata, 163 | Column('Symbol1', String(50), ForeignKey('stockpairs.Ticker1'), primary_key=True, nullable=null), 164 | Column('Symbol2', String(50), ForeignKey('stockpairs.Ticker2'), primary_key=True, nullable=null), 165 | Column('Date', String(50), primary_key=True, nullable=null), 166 | Column('Close1', Float, nullable=null), 167 | Column('Close2', Float, nullable=null), 168 | Column('Residual', Float, nullable=null), 169 | Column('Lower', Float, nullable=null), 170 | Column('MA', Float, nullable=null), 171 | Column('Upper', Float, nullable=null), 172 | extend_existing=True) 173 | table.create(engine, checkfirst=True) 174 | 175 | def create_trades_table(table_name, metadata, engine, null=False): 176 | table = Table(table_name, metadata, 177 | Column('Symbol1', String(50), ForeignKey('stockpairs.Ticker1'), primary_key=True, nullable=null), 178 | Column('Symbol2', String(50), ForeignKey('stockpairs.Ticker2'), primary_key=True, nullable=null), 179 | Column('Date', String(50), primary_key=True, nullable=null), 180 | Column('Close1', Float, nullable=null), 181 | Column('Close2', Float, nullable=null), 182 | Column('Qty1', Float, nullable=null), 183 | Column('Qty2', Float, nullable=null), 184 | Column('P/L', Float, nullable=null), 185 | extend_existing=True) 186 | table.create(engine, checkfirst=True) 187 | 188 | def clear_a_table(table_name, metadata, engine): 189 | conn = engine.connect() 190 | table = metadata.tables[table_name] 191 | delete_st = table.delete() 192 | conn.execute(delete_st) 193 | 194 | 195 | def download_market_data(metadata, engine, sp500_info_df): 196 | print(" >>>>>>>>>>>>>>>>>>>>>>><<<<<<<<<<<<<<<<<<<<<<<<<<") 197 | print("Downloading data ...") 198 | 199 | ' put sp500 constituent data into databases ' 200 | create_sp500_info_table('sp500', metadata, engine) 201 | 
clear_a_table('sp500', metadata, engine) # clear table before insert 202 | sp500_info_df.to_sql('sp500', con=engine, if_exists='append', index=False, 203 | chunksize=5) 204 | 205 | ' get data for each ticker from sp500 ' 206 | for symbol in sp500_info_df.Symbol: 207 | create_price_table(symbol, metadata, engine) 208 | download_stock_data(symbol, metadata, engine, symbol) 209 | 210 | ' SP500 index price ' 211 | create_price_table('GSPC.INDX', metadata, engine) 212 | download_stock_data('GSPC.INDX', metadata, engine, 'GSPC.INDX') 213 | 214 | print("Finished downloading.") 215 | 216 | 217 | def training_data(metadata, engine, significance, sp500_info_df, 218 | training_start_date, training_end_date): 219 | print(" >>>>>>>>>>>>>>>>>>>>>>><<<<<<<<<<<<<<<<<<<<<<<<<<") 220 | print("Training data ...") 221 | print("Start date:", training_start_date, ", End date:", training_end_date) 222 | 223 | ' get training set ' 224 | Price = pd.DataFrame() 225 | 226 | for symbol in sp500_info_df.Symbol: 227 | select_st = "SELECT date, adjusted_close From " + "\"" + symbol + "\"" + \ 228 | " WHERE date >= " + "\"" + str(training_start_date) + "\"" + \ 229 | " AND date <= " + "\"" + str(training_end_date) + "\"" + ";" 230 | try: 231 | result_df = execute_sql_statement(select_st, engine) 232 | result_df.set_index('date', inplace=True) # date as index 233 | result_df.columns = [symbol] # name is column 234 | Price = pd.concat([Price, result_df], axis=1, sort=True) 235 | except: 236 | pass 237 | 238 | ' PCA: reduce dimension ' 239 | Price.sort_index(inplace=True) 240 | Price.fillna(method='ffill', inplace=True) 241 | Price = Price.loc[:,(Price>0).all(0)] # every price > 0 242 | 243 | Price_ret = Price.pct_change() 244 | Price_ret = Price_ret.replace([np.inf, -np.inf], np.nan) 245 | Price_ret.dropna(axis=0, how='all', inplace=True) # drop first row (NA) 246 | Price_ret.dropna(axis=1, how='any', inplace=True) 247 | 248 | pca = PCA(n_components=N_PRIN_COMPONENTS) 249 | pca.fit(Price_ret) 250 | 
X = pd.DataFrame(pca.components_.T, index=Price_ret.columns) 251 | sp500_info_df.set_index('Symbol', inplace=True) 252 | X = pd.concat([X, sp500_info_df.Sector.T], axis=1, sort=True) 253 | X = pd.get_dummies(X) 254 | 255 | ' DBSCAN: identify clusters from stocks that are closest ' 256 | X.dropna(axis=0, how='any', inplace=True) 257 | X_arr = preprocessing.StandardScaler().fit_transform(X) 258 | clf = DBSCAN(eps=epsilon, min_samples=3) 259 | 260 | # labels is label values from -1 to x 261 | # -1 represents noisy samples that are not in clusters 262 | clf.fit(X_arr) 263 | clustered = clf.labels_ 264 | # all stock with its cluster label (including -1) 265 | clustered_series = pd.Series(index=X.index, data=clustered.flatten()) 266 | # clustered stock with its cluster label 267 | clustered_series = clustered_series[clustered_series != -1] 268 | 269 | poss_cluster = clustered_series.value_counts().sort_index() 270 | print(poss_cluster) 271 | 272 | 'identify cointegrated pairs from clusters' 273 | def Cointegration(cluster, significance, start_day, end_day): 274 | pair_coin = [] 275 | p_value = [] 276 | adf = [] 277 | n = cluster.shape[0] 278 | keys = cluster.keys() 279 | for i in range(n): 280 | for j in range(i+1,n): 281 | asset_1 = Price.loc[start_day:end_day, keys[i]] 282 | asset_2 = Price.loc[start_day:end_day, keys[j]] 283 | results = sm.OLS(asset_1, asset_2) 284 | results = results.fit() 285 | predict = results.predict(asset_2) 286 | error = asset_1 - predict 287 | ADFtest = ts.adfuller(error) 288 | if ADFtest[1] < significance: 289 | pair_coin.append([keys[i], keys[j]]) # pair names 290 | p_value.append(ADFtest[1]) # p value, smaller the better 291 | adf.append(ADFtest[0]) # adf test stats, larger the better 292 | return p_value, pair_coin, adf 293 | 294 | "Pair selection method" 295 | "select a pair with lowest p-value from each cluster" 296 | def PairSelection(clustered_series, significance, 297 | start_day=str(training_start_date), 
end_day=str(training_end_date)): 298 | Opt_pairs = [] # to get best pair in cluster i 299 | tstats = [] 300 | 301 | for i in range(len(poss_cluster)): 302 | cluster = clustered_series[clustered_series == i] 303 | result = Cointegration(cluster, significance, start_day, end_day) 304 | if len(result[0]) > 0: 305 | if np.min(result[0]) < significance: 306 | index = np.where(result[0] == np.min(result[0]))[0][0] 307 | Opt_pairs.append([result[1][index][0], result[1][index][1]]) 308 | tstats.append(round(result[2][index], 4)) 309 | 310 | return Opt_pairs, tstats 311 | 312 | stock_pairs, tstats = PairSelection(clustered_series, significance) 313 | # put into sql table 314 | create_stockpairs_table('stockpairs', metadata, engine) 315 | clear_a_table('stockpairs', metadata, engine) 316 | stock_pairs = pd.DataFrame(stock_pairs, columns=['Ticker1', 'Ticker2']) 317 | stock_pairs["Score"] = -1 * np.array(tstats) 318 | stock_pairs["Profit_Loss"] = 0.0 319 | stock_pairs.to_sql('stockpairs', con=engine, if_exists='append', index=False, chunksize=5) 320 | 321 | print(stock_pairs[["Ticker1", "Ticker2"]]) 322 | print("Finished training.") 323 | return stock_pairs 324 | 325 | 326 | def building_model(metadata, engine, k, mvt, 327 | backtesting_start_date, backtesting_end_date): 328 | global ols_results 329 | ''' 330 | get pair prices, moving averages, bollinger bands 331 | k: number of std 332 | mvt: moving average period 333 | ''' 334 | print(" >>>>>>>>>>>>>>>>>>>>>>><<<<<<<<<<<<<<<<<<<<<<<<<<") 335 | print("Building Model ...") 336 | print("Parameters: k =", k, ", moving average =", mvt) 337 | 338 | select_st = "SELECT Ticker1, Ticker2 from stockpairs;" 339 | stock_pairs = execute_sql_statement(select_st, engine) 340 | 341 | create_pairprices_table('pairprices', metadata, engine, mvt) 342 | clear_a_table('pairprices', metadata, engine) 343 | 344 | for pair in stock_pairs.values: 345 | select_st = "SELECT stockpairs.Ticker1 as Symbol1, stockpairs.Ticker2 as Symbol2, \ 346 | " + 
pair[0] + ".date as Date, " + pair[0] + ".Adjusted_close as Close1, \ 347 | " + pair[1] + ".Adjusted_close as Close2 \ 348 | From " + pair[0] + ", " + pair[1] + ", stockpairs \ 349 | Where (((stockpairs.Ticker1 = " + pair[0] + ".symbol) and \ 350 | (stockpairs.Ticker2 = " + pair[1] + ".symbol)) and \ 351 | (" + pair[0] + ".date = " + pair[1] + ".date)) \ 352 | and " + pair[0] + ".date >= " + "\"" + str(training_start_date) + "\"" + \ 353 | " AND " + pair[0] + ".date <= " + "\"" + str(training_end_date) + "\" \ 354 | ORDER BY Symbol1, Symbol2;" 355 | 356 | result_df = execute_sql_statement(select_st, engine) 357 | 358 | select_st = "SELECT stockpairs.Ticker1 as Symbol1, stockpairs.Ticker2 as Symbol2, \ 359 | " + pair[0] + ".date as Date, " + pair[0] + ".Adjusted_close as Close1, \ 360 | " + pair[1] + ".Adjusted_close as Close2 \ 361 | FROM " + pair[0] + ", " + pair[1] + ", stockpairs \ 362 | WHERE (((stockpairs.Ticker1 = " + pair[0] + ".symbol) and \ 363 | (stockpairs.Ticker2 = " + pair[1] + ".symbol)) and \ 364 | (" + pair[0] + ".date = " + pair[1] + ".date)) \ 365 | and " + pair[0] + ".date >= " + "\"" + str(backtesting_start_date) + "\"" + \ 366 | " AND " + pair[0] + ".date <= " + "\"" + str(backtesting_end_date) + "\" \ 367 | ORDER BY Symbol1, Symbol2;" 368 | result_df2 = execute_sql_statement(select_st, engine) 369 | 370 | # get bollinger band 371 | results = sm.OLS(result_df.Close1, sm.add_constant(result_df.Close2)).fit() 372 | predict = results.params[0] + results.params[1] * result_df2.Close2 373 | ols_results[pair[0]] = results 374 | error = np.subtract(result_df2.Close1, predict) 375 | upperband, middleband, lowerband = talib.BBANDS(error, timeperiod=mvt, 376 | nbdevup=k, nbdevdn=k, matype=0) 377 | result_df2[['Residual', 'Lower', 'MA', 'Upper']] = pd.DataFrame([error, lowerband, middleband, upperband]).T.round(4) 378 | result_df2.to_sql('pairprices', con=engine, if_exists='append', index=False, chunksize=5) 379 | 380 | print("Finished building model.") 
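Since `building_model` relies on `talib.BBANDS`, which is not always installed, the same bands can be reproduced with a pandas rolling window (a sketch on a synthetic residual series; `mvt` and `k` mirror the parameters defined at the top of the file, and the population standard deviation is assumed to match TA-Lib's behavior):

```python
import numpy as np
import pandas as pd

def bollinger_bands(residual: pd.Series, mvt: int = 10, k: float = 2.0):
    """Rolling-window Bollinger bands on a residual series: the middle
    band is a simple moving average (matype=0) and the upper/lower bands
    sit k standard deviations above/below it."""
    ma = residual.rolling(mvt).mean()
    sd = residual.rolling(mvt).std(ddof=0)  # population stdev, as TA-Lib computes it
    return ma + k * sd, ma, ma - k * sd

# synthetic residual series standing in for Close1 minus predicted Close1
rng = np.random.default_rng(1)
res = pd.Series(rng.normal(size=100).cumsum() * 0.1)

upper, middle, lower = bollinger_bands(res, mvt=10, k=2)

# a signal fires when the residual crosses above the upper band,
# mirroring the crossing logic used by the backtest and live trading
crossed_up = (res.shift(1) <= upper.shift(1)) & (res > upper)
print(int(crossed_up.sum()), "upward crossings")
```

The first `mvt - 1` entries of each band are NaN, which is why the trading loops only compare the last two observations.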
381 | 382 | 383 | class StockPair: 384 | 385 | def __init__(self, symbol1, symbol2, start_date, end_date): 386 | self.ticker1 = symbol1 387 | self.ticker2 = symbol2 388 | self.start_date = start_date 389 | self.end_date = end_date 390 | self.trades = {} 391 | self.total_profit_loss = 0.0 392 | 393 | def __str__(self): 394 | return str(self.__class__) + ": " + str(self.__dict__) + "\n" 395 | 396 | def __repr__(self): 397 | return str(self.__class__) + ": " + str(self.__dict__) + "\n" 398 | 399 | def createTrade(self, date, close1, close2, res, lower, upper, qty1 = 0, qty2 = 0, profit_loss = 0.0): 400 | self.trades[date] = np.array([close1, close2, res, lower, upper, qty1, qty2, profit_loss]) 401 | 402 | def updateTrades(self): # dollar neutral, available dollar for buy/sell for each pair 403 | trades_matrix = np.array(list(self.trades.values())) 404 | 405 | for index in range(1, trades_matrix.shape[0]): 406 | # RES SELL SIGNAL: buy asset 1, sell asset 2 407 | if (trades_matrix[index-1, 2] < trades_matrix[index-1, 4] and 408 | trades_matrix[index, 2] > trades_matrix[index, 4]): 409 | trades_matrix[index, 5] = int(capital / trades_matrix[index, 0]) 410 | trades_matrix[index, 6] = int(-capital / trades_matrix[index, 1]) 411 | # RES BUY SIGNAL: sell asset 1, buy asset 2 412 | elif (trades_matrix[index-1, 2] > trades_matrix[index-1, 3] and 413 | trades_matrix[index, 2] < trades_matrix[index, 3]): 414 | trades_matrix[index, 5] = int(-capital / trades_matrix[index, 0]) 415 | trades_matrix[index, 6] = int(capital / trades_matrix[index, 1]) 416 | # no act 417 | else: 418 | trades_matrix[index, 5] = trades_matrix[index-1, 5] 419 | trades_matrix[index, 6] = trades_matrix[index-1, 6] 420 | 421 | 'update profit and loss' 422 | trades_matrix[index, 7] = trades_matrix[index, 5] * (trades_matrix[index, 0] - trades_matrix[index-1, 0]) \ 423 | + trades_matrix[index, 6] * (trades_matrix[index, 1] - trades_matrix[index-1, 1]) 424 | trades_matrix[index, 7] = round(trades_matrix[index, 
7], 2) 425 | self.total_profit_loss += trades_matrix[index, 7] 426 | 427 | for key, index in zip(self.trades.keys(), range(0, trades_matrix.shape[0])): 428 | self.trades[key] = trades_matrix[index] 429 | 430 | return pd.DataFrame(trades_matrix[:, range(5, trades_matrix.shape[1])], columns=['Qty1', 'Qty2', 'P/L']) 431 | 432 | 433 | def back_testing(metadata, engine, backtesting_start_date, backtesting_end_date): 434 | print(" >>>>>>>>>>>>>>>>>>>>>>><<<<<<<<<<<<<<<<<<<<<<<<<<") 435 | print("Backtesting ...") 436 | print("Start date:", backtesting_start_date, ", End date:", backtesting_end_date) 437 | 438 | print('create StockPair') 439 | stock_pair_map = dict() 440 | 441 | select_st = 'SELECT Ticker1, Ticker2 FROM stockpairs;' 442 | stock_pairs = execute_sql_statement(select_st, engine) 443 | 444 | for index, row in stock_pairs.iterrows(): 445 | aKey = (row['Ticker1'], row['Ticker2']) 446 | stock_pair_map[aKey] = StockPair(row['Ticker1'], row['Ticker2'], 447 | backtesting_start_date, backtesting_end_date) 448 | 449 | print('create Trades') 450 | select_st = 'SELECT * FROM pairprices;' 451 | result_df = execute_sql_statement(select_st, engine) 452 | 453 | for index in range(result_df.shape[0]): 454 | aKey = (result_df.at[index, 'Symbol1'], result_df.at[index, 'Symbol2']) 455 | stock_pair_map[aKey].createTrade(result_df.at[index, 'Date'], 456 | result_df.at[index, 'Close1'], result_df.at[index, 'Close2'], 457 | result_df.at[index, 'Residual'], result_df.at[index, 'Lower'], 458 | result_df.at[index, 'Upper']) 459 | 460 | print('update Trades') 461 | trades_df = pd.DataFrame(columns=['Qty1', 'Qty2', 'P/L']) 462 | for key, value in stock_pair_map.items(): 463 | trades_df = trades_df.append(value.updateTrades(), ignore_index=True) 464 | 465 | table = metadata.tables['stockpairs'] 466 | update_st = table.update().values(Profit_Loss=value.total_profit_loss).where( \ 467 | and_(table.c.Ticker1==value.ticker1, table.c.Ticker2==value.ticker2)) 468 | engine.execute(update_st) 
469 | 470 | result_df = result_df[['Symbol1', 'Symbol2', 'Date', 'Close1', 'Close2']].join(trades_df) 471 | 472 | create_trades_table('trades', metadata, engine) 473 | clear_a_table('trades', metadata, engine) 474 | result_df.to_sql('trades', con=engine, if_exists='append', index=False, chunksize=5) 475 | 476 | print("Finished backtesting.") 477 | 478 | 479 | 'real time data according to market date' 480 | def feed_realtime_data(ticker, start, end): 481 | global price_data 482 | column_names = ['symbol','date','adjusted_close'] 483 | 484 | stock = get_daily_data(symbol=ticker, start=start, end=end) 485 | if stock: 486 | for stock_data in stock: 487 | price_data.append([str(ticker), stock_data['date'], 488 | stock_data['adjusted_close']]) 489 | stocks = pd.DataFrame(price_data, columns=column_names) 490 | stocks.adjusted_close = stocks.adjusted_close.astype(float) 491 | return stocks 492 | 493 | 494 | def get_orders(market_date=None): 495 | orders_list = [] 496 | 497 | select_st = 'SELECT Ticker1, Ticker2 FROM stockpairs;' 498 | pairs = execute_sql_statement(select_st, engine) 499 | 500 | for index, row in pairs.iterrows(): 501 | # previous data for ols fit 502 | select_st = "SELECT symbol, date, adjusted_close FROM "+str(row[0])+ \ 503 | " WHERE date >= " + "\"" + str(backtesting_start_date) + "\"" + \ 504 | " AND date <= " + "\"" + str(backtesting_end_date) + "\"" + ";" 505 | result1 = execute_sql_statement(select_st, engine) 506 | select_st = "SELECT symbol, date, adjusted_close FROM "+str(row[1])+ \ 507 | " WHERE date >= " + "\"" + str(backtesting_start_date) + "\"" + \ 508 | " AND date <= " + "\"" + str(backtesting_end_date) + "\"" + ";" 509 | result2 = execute_sql_statement(select_st, engine) 510 | 511 | if market_date: 512 | # append latest real data to previous data 513 | stock1 = feed_realtime_data(row[0], market_date, market_date) 514 | stock1 = stock1[stock1.symbol == row[0]] 515 | result1 = pd.concat([result1, stock1], ignore_index=True) 516 | stock2 = 
feed_realtime_data(row[1], market_date, market_date)
517 |             stock2 = stock2[stock2.symbol == row[1]]
518 |             result2 = pd.concat([result2, stock2], ignore_index=True)
519 | 
520 |         try:
521 |             results = ols_results[row[0]]
522 |             predict = results.params[0] + results.params[1] * result2.adjusted_close
523 |             error = np.subtract(result1.adjusted_close, predict)
524 |             upperband, middleband, lowerband = talib.BBANDS(error, timeperiod=mvt,
525 |                                                             nbdevup=k, nbdevdn=k, matype=0)
526 |             price1 = round(result1.adjusted_close.values[-1], 2)
527 |             price2 = round(result2.adjusted_close.values[-1], 2)
528 | 
529 |             if (error.values[-2] < upperband.values[-2] and error.values[-1] > upperband.values[-1]):
530 |                 amt1 = int(capital / price1)
531 |                 amt2 = int(capital / price2)
532 |                 order1 = 'Order New '+row[0]+' Buy '+str(price1)+' '+str(amt1)
533 |                 order2 = 'Order New '+row[1]+' Sell '+str(price2)+' '+str(amt2)
534 |                 orders_list.append(order1)
535 |                 orders_list.append(order2)
536 |                 print(order1, ',', order2)
537 | 
538 |             elif error.values[-2] > lowerband.values[-2] and error.values[-1] < lowerband.values[-1]:  # residual crosses below the lower band
539 |                 amt1 = int(capital / price1)
540 |                 amt2 = int(capital / price2)
541 |                 order1 = 'Order New '+row[0]+' Sell '+str(price1)+' '+str(amt1)
542 |                 order2 = 'Order New '+row[1]+' Buy '+str(price2)+' '+str(amt2)
543 |                 orders_list.append(order1)
544 |                 orders_list.append(order2)
545 |                 print(order1, ',', order2)
546 | 
547 |             else:
548 |                 print(row[0], row[1], 'No order signal.')
549 | 
550 |         except Exception:
551 |             print('No order signal.')
552 | 
553 |     return orders_list
554 | 
555 | 
556 | def receive(e, q):
557 |     """Handles receiving of messages."""
558 |     total_server_response = []
559 |     msg_end_tag = ".$$$$"
560 | 
561 |     while True:
562 |         try:
563 |             recv_end = False
564 |             # each recv only loads up to BUFSIZ bytes
565 |             server_response = client_socket.recv(BUFSIZ).decode("utf8")
566 | 
567 |             if server_response:
568 |                 if msg_end_tag in server_response:  # reached end of message
569 |                     server_response = 
server_response.replace(msg_end_tag, '') 570 | recv_end = True 571 | 572 | # append every response 573 | total_server_response.append(server_response) 574 | 575 | # if reaching the end, put it into queue 576 | if recv_end == True: 577 | server_response_message = ''.join(total_server_response) 578 | data = json.loads(server_response_message) 579 | #print(data) 580 | q.put(data) 581 | total_server_response = [] 582 | 583 | if e.isSet(): 584 | e.clear() 585 | 586 | except OSError: # Possibly client has left the chat. 587 | break 588 | 589 | 590 | ' The logon message includes the list of stocks from client ' 591 | def get_stock_list_from_database(): 592 | select_st = 'SELECT Ticker1, Ticker2 FROM stockpairs;' 593 | pairs = execute_sql_statement(select_st, engine) 594 | tickers = pd.concat([pairs["Ticker1"], pairs["Ticker2"]], ignore_index=True) 595 | tickers.drop_duplicates(keep='first', inplace=True) 596 | tickers.sort_values(axis=0, ascending=True, inplace=True, kind='quicksort') 597 | print(tickers) 598 | return tickers 599 | 600 | def logon(): 601 | tickers = get_stock_list_from_database(); 602 | client_msg = json.dumps({'Client':clientID, 'Status':'Logon', 'Stocks':tickers.str.cat(sep=',')}) 603 | return client_msg 604 | 605 | def get_user_list(): 606 | client_msg = "{\"Client\":\"" + clientID + "\", \"Status\":\"User List\"}" 607 | return client_msg 608 | 609 | def get_stock_list(): 610 | client_msg = "{\"Client\":\"" + clientID + "\", \"Status\":\"Stock List\"}" 611 | return client_msg 612 | 613 | def get_market_status(): 614 | client_msg = json.dumps({'Client':clientID, 'Status':'Market Status'}) 615 | return client_msg 616 | 617 | def get_order_table(stock_list): 618 | client_msg = json.dumps({'Client':clientID, 'Status':'Order Inquiry', 'Symbol':stock_list}) 619 | return client_msg 620 | 621 | def enter_a_new_order(symbol, side, price, qty): 622 | client_msg = json.dumps({'Client':clientID, 'Status':'New Order', 'Symbol':symbol, 'Side':side, 'Price':price, 
'Qty':qty}) 623 | return client_msg 624 | 625 | def quit_connection(): 626 | client_msg = "{\"Client\":\"" + clientID + "\", \"Status\":\"Quit\"}" 627 | return client_msg 628 | 629 | def send_msg(client_msg): 630 | client_socket.send(bytes(client_msg, "utf8")) 631 | data = json.loads(client_msg) 632 | return data 633 | 634 | def set_event(e): 635 | e.set(); 636 | 637 | def wait_for_an_event(e): 638 | while e.isSet(): 639 | continue 640 | 641 | def get_data(q): 642 | data = q.get() 643 | q.task_done() 644 | # print(dt.datetime.now(), data) 645 | return data 646 | 647 | 648 | # command in queue 649 | def join_trading_network(e, q): 650 | global market_period_list, record_order_df 651 | last_close_time = time.time() 652 | 653 | threading.Thread(target=receive, args=(e,q)).start() 654 | 655 | set_event(e) 656 | send_msg(logon()) # automatic logon 657 | wait_for_an_event(e) 658 | get_data(q) 659 | 660 | set_event(e) 661 | send_msg(get_user_list()) # automatic print out user list 662 | wait_for_an_event(e) 663 | get_data(q) 664 | 665 | set_event(e) 666 | send_msg(get_stock_list()) # automatically print out stock list 667 | wait_for_an_event(e) 668 | get_data(q) 669 | 670 | while True: 671 | set_event(e) 672 | client_msg = get_market_status() # automatically print market status 673 | send_msg(client_msg) 674 | wait_for_an_event(e) 675 | data = get_data(q) 676 | market_status = data["Market Status"] 677 | 678 | 'The client will loop until market open' 679 | if (market_status == "Market Closed" or 680 | market_status == "Pending Open" or 681 | market_status == "Not Open"): 682 | # if market closed too long, stop trading 683 | if time.time() - last_close_time > 150: 684 | print('>>>> Stop trading after ', time.time() - last_close_time, 'seconds') 685 | break; 686 | time.sleep(1) 687 | continue 688 | 689 | last_close_time = time.time() 690 | 691 | ' place order every 40s (1day) ' 692 | print('======================================================') 693 | market_period = 
data["Market Period"] 694 | market_period_list.append(market_period) # store past dates 695 | print("Current market status is:", market_status) 696 | print("Market period is:", market_period_list) 697 | 698 | ' pLace order according to strategy using previous close price' 699 | if len(market_period_list) > 1: 700 | prev_date = market_period_list[-2] 701 | orders_list = get_orders(prev_date) # up to previous day close price 702 | else: 703 | orders_list = get_orders() 704 | 705 | 'The client will send orders to server only during market open and pending closing' 706 | if orders_list: 707 | 708 | for order in orders_list: 709 | order_list = order.split(" ") 710 | mySymbol = str(order_list[2]) 711 | mySide = str(order_list[3]) 712 | myPrice = float(order_list[4]) 713 | myQuantity = int(order_list[5]) 714 | 715 | set_event(e) 716 | send_msg(get_order_table([mySymbol])) # pass in list 717 | wait_for_an_event(e) 718 | data = get_data(q) 719 | order_data = json.loads(data) 720 | order_table = pd.DataFrame(order_data["data"]) 721 | if order_table.empty: 722 | print('Empty table') 723 | continue 724 | 725 | if mySide == 'Buy': 726 | order_table = order_table[order_table["Side"] == 'Sell'] 727 | order_table.sort_values('Price', ascending=True, inplace=True) 728 | order_table.reset_index(drop=True, inplace=True) 729 | best_price = order_table.loc[0, 'Price'] 730 | order_index = order_table.loc[0, 'OrderIndex'] 731 | else: 732 | order_table = order_table[order_table["Side"] == 'Buy'] 733 | order_table.sort_values('Price', ascending=False, inplace=True) 734 | order_table.reset_index(drop=True, inplace=True) 735 | best_price = order_table.loc[0, 'Price'] 736 | order_index = order_table.loc[0, 'OrderIndex'] 737 | print(order_table.iloc[0, :]) 738 | print('today best price', best_price, ', previous day close price', myPrice, ', order index', order_index) 739 | 740 | set_event(e) 741 | client_msg = enter_a_new_order(symbol=mySymbol, side=mySide, price=float(best_price), 
qty=myQuantity) 742 | send_msg(client_msg) 743 | wait_for_an_event(e) 744 | data = get_data(q) 745 | 746 | 'record orders' 747 | record_order = pd.Series([market_period, mySymbol, mySide, best_price, myQuantity]) 748 | record_order_df = pd.concat([record_order_df, record_order], axis=1) 749 | 750 | time.sleep(30) # skip to next day 751 | 752 | record_order_df = record_order_df.T 753 | try: 754 | record_order_df.columns = ['Date', 'Symbol', 'Side', 'Price', 'Quantity'] 755 | record_order_df.loc[record_order_df['Side']=='Sell', 'Quantity'] = -1.*record_order_df.loc[record_order_df['Side']=='Sell', 'Quantity'] 756 | record_order_df.set_index(['Symbol', 'Date'], inplace=True) 757 | print(record_order_df) 758 | except: 759 | print('No Orders!!!!') 760 | 761 | 762 | set_event(e) 763 | send_msg(quit_connection()) # automatically quit 764 | wait_for_an_event(e) 765 | 766 | 767 | 768 | 'define function to calculate maximum drawdown' 769 | def MaxDrawdown(Ret_Cum): 770 | # ret_cum also can be portfolio position series 771 | ContVal = np.zeros(np.size(Ret_Cum)) 772 | MaxDD = np.zeros(np.size(Ret_Cum)) 773 | for i in range(np.size(Ret_Cum)): 774 | if i == 0: 775 | if Ret_Cum[i] < 0: 776 | ContVal[i] = Ret_Cum[i] 777 | else: 778 | ContVal[i] = 0 779 | else: 780 | ContVal[i] = Ret_Cum[i] - np.nanmax(Ret_Cum[0:(i+1)]) 781 | MaxDD[i] = np.nanmin(ContVal[0:(i+1)]) 782 | return MaxDD 783 | 784 | 785 | @app.route('/') 786 | def index(): 787 | return render_template("index.html") 788 | 789 | 790 | @app.route('/data_prep') 791 | def data_prep(): 792 | inspector = inspect(engine) 793 | 794 | sp500_info = get_daily_data(completeURL=requestSP500) 795 | sp500_info_df = pd.DataFrame(sp500_info) 796 | if len(inspector.get_table_names()) == 0: # if no market data, download market data 797 | download_market_data(metadata, engine, sp500_info_df) 798 | else: 799 | print(" >>>>>>>>>>>>>>>>>>>>>>><<<<<<<<<<<<<<<<<<<<<<<<<<") 800 | print("Data already downloaded ...") 801 | 802 | stock_pairs = 
training_data(metadata, engine, significance, sp500_info_df, 803 | training_start_date, training_end_date) 804 | pairs = stock_pairs.transpose() 805 | list_of_pairs = [pairs[i] for i in pairs] 806 | return render_template("data_prep.html", pair_list=list_of_pairs) 807 | 808 | 809 | @app.route('/build_model') 810 | def build_model(): 811 | building_model(metadata, engine, k, mvt, 812 | backtesting_start_date, backtesting_end_date) 813 | 814 | select_st = "SELECT * from pairprices;" 815 | result_df = execute_sql_statement(select_st, engine) 816 | result_df = result_df.transpose() 817 | list_of_pairs = [result_df[i] for i in result_df] 818 | return render_template("build_model.html", pair_list=list_of_pairs) 819 | 820 | 821 | @app.route('/back_test') 822 | def model_back_testing(): 823 | back_testing(metadata, engine, backtesting_start_date, backtesting_end_date) 824 | 825 | select_st = "SELECT * from stockpairs;" 826 | result_df = execute_sql_statement(select_st, engine) 827 | result_df['Score'] = result_df['Score'].map('{:.4f}'.format) 828 | result_df['Profit_Loss'] = result_df['Profit_Loss'].map('${:,.2f}'.format) 829 | result_df = result_df.transpose() 830 | list_of_pairs = [result_df[i] for i in result_df] 831 | return render_template("back_testing.html", pair_list=list_of_pairs) 832 | 833 | 834 | @app.route('/trade_analysis') 835 | def trade_analysis(): 836 | print(" >>>>>>>>>>>>>>>>>>>>>>><<<<<<<<<<<<<<<<<<<<<<<<<<") 837 | print("Generating trading analysis ...") 838 | 839 | select_st = "SELECT printf(\"US$%.2f\", sum(Profit_Loss)) AS Profit, count(Profit_Loss) AS Total_Trades, \ 840 | sum(CASE WHEN Profit_Loss > 0 THEN 1 ELSE 0 END) AS Profit_Trades, \ 841 | sum(CASE WHEN Profit_Loss < 0 THEN 1 ELSE 0 END) AS Loss_Trades FROM StockPairs;" 842 | result_df = execute_sql_statement(select_st, engine) 843 | 844 | 'sp500 pnl' 845 | select_st = "SELECT symbol, date, adjusted_close FROM [GSPC.INDX]"+ \ 846 | " WHERE date >= " + "\"" + str(backtesting_start_date) + 
"\"" + \ 847 | " AND date <= " + "\"" + str(backtesting_end_date) + "\"" + ";" 848 | sp_df = execute_sql_statement(select_st, engine) 849 | sp_df['ret'] = sp_df['adjusted_close'].pct_change() 850 | sp_df['cumpnl'] = capital * (1 + sp_df['ret']).cumprod() - capital 851 | sp_df.index = pd.to_datetime(sp_df.date) 852 | 853 | 'Get pnl' 854 | select_st = 'SELECT Ticker1, Ticker2 FROM stockpairs;' 855 | pair_df = execute_sql_statement(select_st, engine) 856 | select_st = 'SELECT * FROM trades;' 857 | pnl_df = execute_sql_statement(select_st, engine) 858 | total_pnl = pd.DataFrame(0, columns=["P/L"], index=pnl_df.Date.unique()) 859 | 860 | for value in pair_df.values: 861 | pnl = pnl_df.loc[pnl_df.Symbol1==value[0], ["Date","P/L"]] 862 | pnl.set_index("Date", inplace=True) 863 | total_pnl = total_pnl.add(pnl) # adding two dataframe 864 | 865 | cumpnl = total_pnl.cumsum() 866 | maxdraw = MaxDrawdown(cumpnl['P/L'].values) 867 | result_df["Max_Drawdown"] = maxdraw[-1] 868 | cumret = cumpnl.pct_change() 869 | cumret = cumret.replace(np.inf, np.nan) 870 | cumret = cumret.replace(-np.inf, np.nan) 871 | result_df["Sharpe"] = np.sqrt(252) * np.nanmean(cumret) / np.nanstd(cumret) 872 | result_df = result_df.round(2) 873 | 874 | print(result_df.to_string(index=False)) 875 | result_df = result_df.transpose() 876 | trade_results = [result_df[i] for i in result_df] 877 | 878 | 'plot' 879 | cumpnl.index = pd.to_datetime(cumpnl.index) 880 | maxdraw = pd.DataFrame(maxdraw, index=cumpnl.index) 881 | fig = plt.figure(figsize=(12,7)) 882 | plt.title('Backtesting cumPnL '+str(backtesting_start_date)+' to '+str(backtesting_end_date), 883 | fontsize=15) 884 | plt.xlabel('Date') 885 | plt.ylabel('PnL (dollars)') 886 | plt.plot(cumpnl, label='pairs trading pnl') 887 | plt.plot(maxdraw, label='maximum drawdown') 888 | plt.plot(sp_df['cumpnl'], label='benchmark(sp500) pnl') 889 | plt.legend() 890 | plt.tight_layout() 891 | fig.savefig('static/plots/backtest_pnl.jpg') 892 | plt.show() 893 | return 
render_template("trade_analysis.html", trade_list=trade_results) 894 | 895 | 896 | @app.route('/real_trade') 897 | def real_trade(): 898 | global bClientThreadStarted, client_thread 899 | 900 | print(" >>>>>>>>>>>>>>>>>>>>>>><<<<<<<<<<<<<<<<<<<<<<<<<<") 901 | print("Real trading ...", bClientThreadStarted) 902 | 903 | if bClientThreadStarted == False: 904 | client_thread.start() 905 | bClientThreadStarted = True 906 | print("Client thread starts ...", bClientThreadStarted) 907 | client_thread.join() # wait until this thread finishes, then continue main thread 908 | 909 | 'real trade analysis' 910 | print(" >>>>>>>>>>>>>>>>>>>>>>><<<<<<<<<<<<<<<<<<<<<<<<<<") 911 | print("Trading analysis ...") 912 | get_orders(market_period_list[-1]) 913 | stocks_df = pd.DataFrame(price_data, columns=['symbol','date','adjusted_close']) 914 | stocks_df.adjusted_close = stocks_df.adjusted_close.astype(float) 915 | total_pnl = pd.Series(0, index=stocks_df.date.unique()) 916 | 917 | try: 918 | for stock in record_order_df.index.levels[0]: 919 | order_df = record_order_df.loc[stock,:] 920 | stock_df = stocks_df[stocks_df['symbol']==stock] 921 | stock_df.set_index('date', inplace=True) 922 | join_df = stock_df.join(order_df) 923 | join_df.fillna(method='ffill', inplace=True) 924 | join_df['pnl'] = (join_df['adjusted_close'] - join_df['Price']) * join_df['Quantity'] 925 | total_pnl = total_pnl.add(join_df.pnl, fill_value=0) # series + series 926 | except: 927 | pass # if no orders 928 | 929 | result_df = pd.DataFrame() 930 | result_df.loc[0,'Profits'] = sum(total_pnl) 931 | result_df.loc[0,'Total_Trades'] = len(record_order_df) / 2 932 | 933 | cumpnl = total_pnl.cumsum() 934 | maxdraw = MaxDrawdown(cumpnl.values) 935 | result_df.loc[0,"Max_Drawdown"] = maxdraw[-1] 936 | cumret = cumpnl.pct_change() 937 | cumret = cumret.replace(np.inf, np.nan) 938 | cumret = cumret.replace(-np.inf, np.nan) 939 | result_df.loc[0,"Sharpe"] = np.sqrt(30) * np.nanmean(cumret) / np.nanstd(cumret) 940 | 
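The `MaxDrawdown` helper above recomputes `np.nanmax`/`np.nanmin` over a growing prefix on every iteration, which is O(n²) in the length of the series. As an illustration only (`max_drawdown_vectorized` is not part of the platform), the same running drawdown can be computed in O(n) with numpy's accumulating ufuncs, assuming the series does not start with a negative value (the edge case the loop special-cases):

```python
import numpy as np

def max_drawdown_vectorized(ret_cum):
    """O(n) equivalent of MaxDrawdown() for series starting at >= 0:
    drawdown = distance below the running peak; return its running minimum."""
    ret_cum = np.asarray(ret_cum, dtype=float)
    drawdown = ret_cum - np.maximum.accumulate(ret_cum)  # drop from running peak
    return np.minimum.accumulate(drawdown)               # worst drop seen so far

# e.g. max_drawdown_vectorized([1, 3, 2, 5, 1]) -> [ 0.,  0., -1., -1., -4.]
```

For a cumulative-PnL series of 1, 3, 2, 5, 1 the running peaks are 1, 3, 3, 5, 5, so the drawdowns are 0, 0, -1, 0, -4 and the running minimum matches the loop's output element by element.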
result_df = result_df.round(2) 941 | 942 | print(result_df) 943 | result_df = result_df.transpose() 944 | trade_results = [result_df[i] for i in result_df] 945 | 946 | 'sp500 pnl' 947 | select_st = "SELECT symbol, date, adjusted_close FROM [GSPC.INDX]"+ \ 948 | " WHERE date >= " + "\"" + str(market_period_list[0]) + "\"" + \ 949 | " AND date <= " + "\"" + str(market_period_list[-1]) + "\"" + ";" 950 | sp_df = execute_sql_statement(select_st, engine) 951 | sp_df['ret'] = sp_df['adjusted_close'].pct_change() 952 | sp_df['cumpnl'] = capital * (1 + sp_df['ret']).cumprod() - capital 953 | sp_df.index = pd.to_datetime(sp_df.date) 954 | 955 | 'plot' 956 | cumpnl.index = pd.to_datetime(cumpnl.index) 957 | maxdraw = pd.DataFrame(maxdraw, index=cumpnl.index) 958 | fig = plt.figure(figsize=(12,7)) 959 | plt.title('Trading cumPnL '+str(market_period_list[0])+' to '+str(market_period_list[-1]), 960 | fontsize=15) 961 | plt.xlabel('Date') 962 | plt.ylabel('PnL (dollars)') 963 | plt.plot(cumpnl, label='pairs trading pnl') 964 | plt.plot(maxdraw, label='maximum drawdown') 965 | plt.plot(sp_df['cumpnl'], label='benchmark(sp500) pnl') 966 | plt.legend() 967 | plt.tight_layout() 968 | fig.savefig('static/plots/trade_pnl.jpg') 969 | plt.show() 970 | 971 | return render_template("real_trade.html", trade_list=trade_results) 972 | 973 | 974 | 975 | if(len(sys.argv) > 1) : 976 | clientID = sys.argv[1] 977 | else: 978 | clientID = "Yicheng" 979 | 980 | HOST = socket.gethostbyname(socket.gethostname()) 981 | PORT = 6500 982 | BUFSIZ = 1024 983 | ADDR = (HOST, PORT) 984 | 985 | client_socket = socket.socket(AF_INET, SOCK_STREAM) # create TCP/IP socket 986 | client_socket.connect(ADDR) 987 | 988 | 989 | 990 | if __name__ == "__main__": 991 | market_period_list = [] 992 | price_data = [] 993 | record_order_df = pd.DataFrame() 994 | ols_results = {} 995 | 996 | 'real trade' 997 | e = threading.Event() 998 | q = queue.Queue() 999 | client_thread = threading.Thread(target=join_trading_network, 
args=(e,q)) 1000 | 1001 | 'dashboard' 1002 | bClientThreadStarted = False 1003 | app.run() 1004 | -------------------------------------------------------------------------------- /platform_server.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | # -*- coding: utf-8 -*- 3 | 4 | 5 | import socket 6 | from threading import Thread 7 | import json 8 | import urllib.request 9 | import sys 10 | import pandas as pd 11 | import random 12 | import sched, time 13 | import datetime as dt 14 | 15 | from sqlalchemy import create_engine 16 | from sqlalchemy import MetaData 17 | 18 | 19 | serverID = "Server1" 20 | 21 | startDate = dt.datetime(2019,1,1) # hours:minutes:seconds 22 | endDate = dt.date.today() # only dates 23 | requestURL = "https://eodhistoricaldata.com/api/eod/" 24 | myEodKey = "5ba84ea974ab42.45160048" 25 | 26 | ' trading ' 27 | engine = create_engine('sqlite:///pairs_trading.db') 28 | engine.execute("PRAGMA foreign_keys = ON") 29 | metadata = MetaData() 30 | metadata.reflect(bind=engine) # bind to Engine, load all tables 31 | 32 | 33 | def get_daily_data(symbol='', start=startDate, end=endDate, requestType=requestURL, 34 | apiKey=myEodKey, completeURL=None): 35 | if not completeURL: 36 | symbolURL = str(symbol) + '?' 37 | startURL = "from=" + str(start) 38 | endURL = "to=" + str(end) 39 | apiKeyURL = "api_token=" + myEodKey 40 | completeURL = requestURL + symbolURL + startURL + '&' + endURL + '&' + apiKeyURL + '&period=d&fmt=json' 41 | print(completeURL) 42 | 43 | # if cannot open url 44 | try: 45 | with urllib.request.urlopen(completeURL) as req: 46 | data = json.load(req) 47 | return data 48 | except Exception as err: # surface the failure instead of silently returning None 49 | print('Cannot open url:', completeURL, err) 50 | 51 | 52 | def accept_incoming_connections(): 53 | while True: 54 | client, client_address = platform_server.accept() 55 | print("%s:%s has connected." 
% client_address) 56 | client_thread = Thread(target=handle_client, args=(client,)) 57 | client_thread.setDaemon(True) 58 | client_thread.start() 59 | 60 | 61 | def handle_client(client): 62 | """Handles a single client connection.""" 63 | global symbols 64 | price_unit = 0.001 65 | client_msg = client.recv(buf_size).decode("utf8") 66 | data = json.loads(client_msg) 67 | print(data) 68 | clientID = data["Client"] 69 | status = data["Status"] 70 | msg_end_tag = ".$$$$" 71 | 72 | if status == "Logon": 73 | 74 | if (clientID in clients.values()): 75 | text = "%s duplicated connection request!" % clientID 76 | server_msg = "{\"Server\":\"" + serverID + "\", \"Response\":\"" + text + "\", \"Status\":\"Rejected\"}" 77 | server_msg = "".join((server_msg, msg_end_tag)) 78 | client.send(bytes(server_msg, "utf8")) 79 | print(text) 80 | client.close() 81 | return 82 | 83 | else: 84 | text = "Welcome %s!" % clientID 85 | server_msg = "{\"Server\":\"" + serverID + "\", \"Response\":\"" + text + "\", \"Status\":\"Ack\"}" 86 | server_msg = "".join((server_msg, msg_end_tag)) 87 | client.send(bytes(server_msg, "utf8")) 88 | clients[client] = clientID 89 | print (clients[client]) 90 | client_symbols = list(data["Stocks"].split(',')) 91 | symbols.extend(client_symbols) 92 | symbols = sorted(set(symbols)) 93 | 94 | try: 95 | while True: 96 | msg = client.recv(buf_size).decode("utf8") 97 | data = json.loads(msg) 98 | print(data) 99 | 100 | if data["Status"] == "Quit": 101 | text = "%s left!" 
% clientID 102 | server_msg = "{\"Server\":\"" + serverID + "\", \"Response\":\"" + text + "\", \"Status\":\"Done\"}" 103 | print(server_msg) 104 | 105 | elif data["Status"] == "Order Inquiry": 106 | if "Symbol" in data and data["Symbol"] != "": 107 | server_msg = json.dumps(order_table.loc[order_table['Symbol'].isin(data["Symbol"])].to_json(orient='table')) 108 | 109 | elif data["Status"] == "New Order": 110 | if market_status == "Market Closed": 111 | data["Status"] = "Order Reject" 112 | 113 | if ((order_table["Symbol"] == data["Symbol"]) & 114 | (order_table["Side"] != data["Side"]) & 115 | (abs(order_table["Price"] - float(data["Price"])) < price_unit) & 116 | (order_table["Status"] != 'Filled')).any(): 117 | 118 | mask = (order_table["Symbol"] == data["Symbol"]) & \ 119 | (order_table["Side"] != data["Side"]) & \ 120 | (abs(order_table["Price"] - float(data["Price"])) < price_unit) & \ 121 | (order_table["Status"] != 'Filled') 122 | order_qty = order_table.loc[(mask.values), 'Qty'] 123 | 124 | if (order_qty.item() == data['Qty']): 125 | order_table.loc[(mask.values), 'Qty'] = 0 126 | order_table.loc[(mask.values), 'Status'] = 'Filled' 127 | data["Status"] = "Fill" 128 | elif (order_qty.item() < data['Qty']): 129 | data['Qty'] = order_qty.item() # return your quantity 130 | order_table.loc[(mask.values), 'Qty'] = 0 131 | order_table.loc[(mask.values), 'Status'] = 'Filled' 132 | data["Status"] = "Order Partial Fill" 133 | else: 134 | order_table.loc[(mask.values), 'Qty'] -= data['Qty'] 135 | order_table.loc[(mask.values), 'Status'] = 'Partial Filled' 136 | data["Status"] = "Order Fill" 137 | 138 | else: 139 | if market_status == "Pending Closing": 140 | order_table_for_pending_closing = order_table[(order_table["Symbol"] == data["Symbol"]) & 141 | (order_table["Side"] != data["Side"])].iloc[[0,-1]] 142 | prices = order_table_for_pending_closing["Price"].values 143 | 144 | if data["Side"] == "Buy": 145 | price = float(prices[0]) 146 | price += 0.01 147 | else: 
148 | price = float(prices[-1]) 149 | price -= 0.01 150 | data["Price"] = str(round(price,2)) 151 | data["Status"] = "Order Fill" 152 | else: 153 | data["Status"] = "Order Reject" 154 | # print(data) 155 | server_msg = json.dumps(data) 156 | 157 | elif data["Status"] == "User List": 158 | user_list = str('') 159 | for clientKey in clients: 160 | user_list += clients[clientKey] + str(',') 161 | server_msg = json.dumps({'User List':user_list}) 162 | 163 | elif data["Status"] == "Stock List": 164 | #stock_list = symbols.str.cat(sep=',') 165 | stock_list = ','.join(symbols) 166 | server_msg = json.dumps({"Stock List":stock_list}) 167 | 168 | elif data["Status"] == "Market Status": 169 | server_msg = json.dumps({"Server":serverID, "Market Status":market_status, "Market Period":market_period}) 170 | 171 | else: 172 | text = "Unknown Message from Client" 173 | server_msg = "{\"Server\":\"" + serverID + "\", \"Response\":\"" + text + "\", \"Status\":\"Unknown Message\"}" 174 | print(server_msg) 175 | 176 | server_msg = "".join((server_msg, msg_end_tag)) 177 | client.send(bytes(server_msg, "utf8")) 178 | 179 | if data["Status"] == "Quit": 180 | client.close() 181 | del clients[client] 182 | users = '' 183 | for clientKey in clients: 184 | users += clients[clientKey] + ',' 185 | print(users) 186 | return 187 | 188 | except KeyboardInterrupt: 189 | sys.exit(0) 190 | 191 | except json.decoder.JSONDecodeError: 192 | del clients[client] 193 | sys.exit(0) 194 | 195 | clients = {} 196 | 197 | 198 | def generate_qty(number_of_qty): 199 | total_qty = 0 200 | list_of_qty = [] 201 | for index in range(number_of_qty): 202 | qty = random.randint(1,101) 203 | list_of_qty.append(qty) 204 | total_qty += qty 205 | return (total_qty, list_of_qty) 206 | 207 | 208 | def populate_order_table(symbols, start, end): 209 | price_scale = 0.05 210 | global order_index, order_table 211 | order_table.drop(order_table.index, inplace=True) 212 | 213 | for symbol in symbols: 214 | stock = 
get_daily_data(symbol, start, end) 215 | 216 | for stock_data in stock: 217 | (total_qty, list_of_qty) = generate_qty(int((float(stock_data['high'])-float(stock_data['low']))/price_scale)) 218 | buy_price = float(stock_data['low']) 219 | sell_price = float(stock_data['high']) 220 | daily_volume = float(stock_data['volume']) 221 | 222 | for index in range(0, len(list_of_qty)-1, 2): 223 | order_index += 1 224 | order_table.loc[order_index] = [order_index, symbol, 'Buy', buy_price, int((list_of_qty[index]/total_qty)*daily_volume), 'New'] 225 | buy_price += 0.05 226 | order_index += 1 227 | order_table.loc[order_index] = [order_index, symbol, 'Sell', sell_price, int((list_of_qty[index+1]/total_qty)*daily_volume), 'New'] 228 | sell_price -= 0.05 229 | 230 | print(order_table) 231 | print(market_status, market_period) 232 | 233 | 234 | ''' 235 | (1) The server provides consolidated books for 30 trading days, 236 | (a) simulated from market data starting from 1/2/2019. 237 | (b) Each simulated trading date has one book, with buy orders and sell orders 238 | simulated from the day's high and low prices, with daily volume randomly 239 | distributed across all price points. 
240 | (c) Each simulated trading date starts with a new book simulated from the corresponding 241 | daily historical data. 242 | ''' 243 | def create_market_interest(index): 244 | global market_period, symbols 245 | 246 | market_periods = pd.bdate_range('2019-01-02', '2019-04-01').strftime("%Y-%m-%d").tolist() 247 | 248 | # in order 249 | startDate = market_periods[index] 250 | endDate = market_periods[index] 251 | 252 | if len(order_table) == 0 or (market_status != "Market Closed" and market_status != "Pending Closing"): 253 | market_period = startDate 254 | populate_order_table(symbols, startDate, endDate) 255 | print(market_status, "Creating market interest") 256 | else: 257 | print(market_status, "No new market interest") 258 | 259 | ''' 260 | (2) Each simulated trading day lasts 30 seconds, 261 | followed by a 5-second pending closing phase 262 | and a 5-second market closed phase before the market reopens. 263 | ''' 264 | def update_market_status(status, day): 265 | global market_status 266 | global order_index 267 | global order_table 268 | 269 | market_status = status 270 | create_market_interest(day) 271 | 272 | market_status = 'Open' 273 | print(market_status) 274 | time.sleep(30) 275 | 276 | market_status = 'Pending Closing' 277 | print(market_status) 278 | time.sleep(5) 279 | 280 | market_status = 'Market Closed' 281 | print(market_status) 282 | 283 | order_table.fillna(0, inplace=True) # fillna returns a copy unless inplace=True 284 | order_index = 0 285 | time.sleep(5) 286 | 287 | ''' 288 | (3) There are 5 phases of the market: 289 | (a) Not Open, at start 290 | (b) Pending Open 291 | (c) Open, 30s 292 | (d) Pending Closing, 5s 293 | (e) Market Closed, 5s 294 | ''' 295 | def set_market_status(scheduler, time_in_seconds): 296 | value = dt.datetime.fromtimestamp(time_in_seconds) 297 | print(value.strftime('%Y-%m-%d %H:%M:%S')) 298 | 299 | # 40s for one day 300 | for day in range(total_market_days): 301 | scheduler.enter(40*day+1, 1, update_market_status, argument=('Pending Open', day)) 302 | scheduler.run() 303 | 304 | 305 | 
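The 40-second day cycle that `update_market_status` walks through above (30s Open, 5s Pending Closing, 5s Market Closed) can be summarized as a pure function from elapsed seconds within a simulated day to a phase. This is a hedged sketch for illustration only; `market_phase` is not part of the platform:

```python
def market_phase(elapsed_seconds):
    """Phase of a 40s simulated day: 0-30s Open, 30-35s Pending Closing,
    35-40s Market Closed (the cycle then repeats)."""
    t = elapsed_seconds % 40  # position within the current simulated day
    if t < 30:
        return 'Open'
    elif t < 35:
        return 'Pending Closing'
    return 'Market Closed'
```

The real server instead mutates a global `market_status` from a `sched`-driven thread, which is why the client above polls "Market Status" once per second rather than computing the phase locally.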
port = 6500 306 | buf_size = 1024 307 | platform_server = socket.socket(socket.AF_INET, socket.SOCK_STREAM) 308 | print(socket.gethostname()) 309 | platform_server.bind((socket.gethostname(), port)) 310 | 311 | if __name__ == "__main__": 312 | 313 | market_status = "Not Open" 314 | market_period = "2019-01-01" 315 | order_index = 0 316 | total_market_days = 30 317 | 318 | symbols = [] 319 | order_table_columns = ['OrderIndex', 'Symbol', 'Side', 'Price', 'Qty', 'Status'] 320 | order_table = pd.DataFrame(columns=order_table_columns) 321 | order_table = order_table.fillna(0) 322 | 323 | platform_server.listen(1) 324 | print("Waiting for client requests") 325 | time.sleep(80) # wait for backtesting to finish 326 | 327 | try: 328 | scheduler = sched.scheduler(time.time, time.sleep) 329 | current_time_in_seconds = time.time() 330 | scheduler_thread = Thread(target=set_market_status, args=(scheduler, current_time_in_seconds)) 331 | scheduler_thread.setDaemon(True) 332 | 333 | server_thread = Thread(target=accept_incoming_connections) 334 | server_thread.setDaemon(True) 335 | 336 | server_thread.start() 337 | scheduler_thread.start() 338 | 339 | scheduler_thread.join() # wait until scheduler finished 340 | server_thread.join() # server finish after scheduler finished 341 | 342 | except (KeyboardInterrupt, SystemExit): 343 | platform_server.close() 344 | sys.exit(0) 345 | -------------------------------------------------------------------------------- /static/plots/backtest_pnl.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wangy8989/Pairs-Trading-with-Machine-Learning/4b617ca4ac35e03ed08af91d911e40179d81cf46/static/plots/backtest_pnl.jpg -------------------------------------------------------------------------------- /static/plots/trade_pnl.jpg: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/wangy8989/Pairs-Trading-with-Machine-Learning/4b617ca4ac35e03ed08af91d911e40179d81cf46/static/plots/trade_pnl.jpg -------------------------------------------------------------------------------- /templates/back_testing.html: -------------------------------------------------------------------------------- 1 | {% extends "base.html" %} 2 | {% block content %} 3 | 25 | 26 |
27 |

Back Testing

28 |
29 | 30 | 31 | 32 | 33 | 34 | 35 | 36 | 37 | 38 | 39 | {% for pair in pair_list %} 40 | 41 | 42 | 43 | 44 | 45 | 46 | {% endfor %} 47 | 48 |
Ticker1Ticker2ScoreProfit_Loss
{{pair.Ticker1}} {{pair.Ticker2}} {{pair.Score}} {{pair.Profit_Loss}}
49 |
50 |
51 | {% endblock %} -------------------------------------------------------------------------------- /templates/base.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | Pair Trading Analytics 6 | 7 | 8 | {% block head %} {% endblock %} 9 | 10 | 11 | 31 | 32 |
33 |
34 | 35 | {% block content %} {% endblock %} 36 |
37 |
38 | 39 | 40 | 41 | 42 | -------------------------------------------------------------------------------- /templates/build_model.html: -------------------------------------------------------------------------------- 1 | {% extends "base.html" %} 2 | {% block content %} 3 | 25 | 26 |
27 |

Building Model for Pairs

28 |
29 | 30 | 31 | 32 | 33 | 34 | 35 | 36 | 37 | 38 | 39 | 40 | 41 | 42 | 43 | 44 | {% for pair in pair_list %} 45 | 46 | 47 | 48 | 49 | 50 | 51 | 52 | 53 | 54 | 55 | 56 | {% endfor %} 57 | 58 |
Symbol1Symbol2DateClose1Close2ResidualLowerMAUpper
{{pair.Symbol1}} {{pair.Symbol2}} {{pair.Date}} {{pair.Close1}} {{pair.Close2}} {{pair.Residual}} {{pair.Lower}} {{pair.MA}} {{pair.Upper}}
59 |
60 |
61 | {% endblock %} -------------------------------------------------------------------------------- /templates/data_prep.html: -------------------------------------------------------------------------------- 1 | {% extends "base.html" %} 2 | {% block content %} 3 | 25 | 26 |
27 |

Pair Watch list

28 |
29 | 30 | 31 | 32 | 33 | 34 | 35 | 36 | 37 | 38 | {% for pair in pair_list %} 39 | 40 | 41 | 42 | 43 | 44 | {% endfor %} 45 | 46 |
Ticker1Ticker2tScore
{{pair.Ticker1}} {{pair.Ticker2}} {{pair.Score}}
47 |
48 |
49 | {% endblock %} -------------------------------------------------------------------------------- /templates/index.html: -------------------------------------------------------------------------------- 1 | {% extends "base.html" %} 2 | {% block content %} 3 | 25 | 26 |
27 |
28 |
29 |

Pairs Trading with Machine Learning

30 |

31 | This project uses Machine Learning methods to group assets based on similar factor loadings, 32 | then identifies pairs within each cluster to implement pairs trading strategy. 33 | Pairs are then constructed into a market-neutral portfolio. 34 |

35 |
36 |
37 |
38 |
39 |
40 |

Machine Learning Methods

41 |

42 | Apply Principal Component Analysis (PCA) to reduce the dimension of the returns data and factor (industry) data, 43 | then group stocks using DBSCAN clustering. 44 |

45 |
46 |
47 |

Finding Pairs

48 |

49 | Run a cointegration test (ADF test for stationarity of the residual) on each pair in each cluster, then keep the most cointegrated pair. 50 |

51 |
52 |
53 |

Trading Logic

54 |

55 | Trade on the residuals of pair prices using a Bollinger Band strategy: 56 | if the residual crosses below the lower band, go long; if it crosses above the upper band, go short; 57 | otherwise, hold the current position. 58 |

59 |
60 |
61 |
62 |
63 | {% endblock %} -------------------------------------------------------------------------------- /templates/real_trade.html: -------------------------------------------------------------------------------- 1 | {% extends "base.html" %} 2 | {% block content %} 3 | 25 | 26 |
27 |

Trading Analysis

28 |
29 | 30 | 31 | 32 | 33 | 34 | 35 | 36 | 37 | 38 | 39 | {% for trade in trade_list %} 40 | 41 | 42 | 43 | 44 | 45 | 46 | {% endfor %} 47 | 48 |
ProfitTotal_TradesMaximum_DrawdownSharpe_ratio
{{trade.Profits}} {{trade.Total_Trades}} {{trade.Max_Drawdown}} {{trade.Sharpe}}
49 |
50 | Trading PnL 51 |
52 | {% endblock %} -------------------------------------------------------------------------------- /templates/trade_analysis.html: -------------------------------------------------------------------------------- 1 | {% extends "base.html" %} 2 | {% block content %} 3 | 25 | 26 |
27 |

Trading Analysis

28 |
29 | 30 | 31 | 32 | 33 | 34 | 35 | 36 | 37 | 38 | 39 | 40 | 41 | {% for trade in trade_list %} 42 | 43 | 44 | 45 | 46 | 47 | 48 | 49 | 50 | {% endfor %} 51 | 52 |
ProfitTotal_TradesProfit_TradesLoss_TradesMaximum_DrawdownSharpe_ratio
{{trade.Profit}} {{trade.Total_Trades}} {{trade.Profit_Trades}} {{trade.Loss_Trades}} {{trade.Max_Drawdown}} {{trade.Sharpe}}
53 |
54 | Backtesting PnL 55 |
56 | {% endblock %} --------------------------------------------------------------------------------