├── README.md
├── helpers
│   ├── get_features.py
│   ├── get_historical_data.py
│   ├── oanda_api_helpers.py
│   └── utils.py
├── images
│   ├── Figure_1.png
│   ├── Inspecting_features.gif
│   ├── ROC_lr_v2.png
│   ├── cnn_v1.png
│   ├── cnn_v2.png
│   ├── feature_heatmap.png
│   ├── features_example.png
│   ├── legend_one_fits_all.png
│   ├── lr_v1.png
│   ├── lr_v2.png
│   ├── lr_v3.png
│   ├── lstm_v1.png
│   ├── lstm_v2.png
│   └── portfolio_value_1.png
├── main.py
├── models.py
├── saved_models
│   ├── lr-v2-3000.data-00000-of-00001
│   ├── lr-v2-3000.index
│   └── lr-v2-3000.meta
├── train_cnn_v1.py
├── train_cnn_v2.py
├── train_logistic_regression_v1.py
├── train_logistic_regression_v2.py
├── train_logistic_regression_v3.py
├── train_lstm_v1.py
├── train_lstm_v2.py
└── train_lstm_v3.py
/README.md:
--------------------------------------------------------------------------------
1 | ---------
2 | ### Table of Contents
3 |
4 | 1. [Intro](#Intro)
5 | 2. [Trading tools and helpers](#Tools)
6 | 3. [Training models v1](#train_v1)
7 | 4. [Training models v2](#train_v2)
8 | 5. [Final remarks](#Conclusion)
9 |
10 |
11 |
12 | ---------
13 | ### 1. Intro
14 |
15 | This is a repo where I store code for training and making an automated FX trading bot.
16 |
17 | Essentially, most of the work here goes into training an accurate price-movement classification model. But the repo also contains all of the other necessary pieces, like downloading historical or recent FX data and live-managing a demo trading account using OANDA's API.
18 |
19 | By default, training is done on 15 years of hourly EUR/USD data. The dataset is split into 11 years of training data, 3 years of test data and 1.5 years of cross-validation data (keep this in mind when looking at portfolio value charts). Feature engineering is mostly done using indicators from the [ta-lib](https://github.com/mrjbq7/ta-lib) package.
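For reference, the split is chronological (no shuffling). A minimal sketch using the helper in [utils.py](helpers/utils.py), with illustrative fractions roughly matching the 11 / 3 / 1.5 year split (`input_data` and `output_data` stand for any aligned arrays):

```python
from helpers.utils import train_test_validation_split

input_train, input_test, input_cv, output_train, output_test, output_cv = \
    train_test_validation_split([input_data, output_data], split=(0.7, 0.2, 0.1))
```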
20 |
21 | Once I'm comfortable with the data exploration and the models, I should try other pairs as well. Or maybe a grand model of a huge bunch of different pairs at once!?
22 |
23 |
24 |
25 | ---------
26 | ### 2. Trading tools and helpers
27 |
28 |
29 | * **[Easy way to get historical data.](helpers/get_historical_data.py)** Simple code to download historical (or current) data of selected instruments using OANDA's API.
30 |
31 | * **[Live trading portfolio manager.](helpers/oanda_api_helpers.py)** A class to manage real-time trading, open and close positions etc. using OANDA's API wrapper.
32 |
33 | * **[Kind of ready to use trading bot.](/main.py)** Final script combining everything to live-manage a trading portfolio.
34 |
35 |
36 |
37 | ---------
38 | ### 3. Training models V1
39 |
40 | The first try is a bunch of 'quick and dirty' models with just a few features and some optimization experimentation. I've hand-picked a few financial indicators and made sure they do not correlate too much. Additionally, I've made a few dummy variables for market hours in major markets.
41 |
42 |
43 |
44 |
45 |
46 |
47 | **Predicting price direction.**
48 |
49 | Predict the direction of price in the next time period. Target values are [1, 0, 0] for up, [0, 1, 0] for down, [0, 0, 1] for flat (side note: the threshold for the minimum price change that is still considered flat is chosen so that each of the up, down and flat labels covers roughly 1/3 of the full dataset). Train by minimizing cross-entropy.
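A minimal sketch of how such a flat threshold can be picked (hypothetical names; the actual labeling lives in `price_to_binary_target` in [utils.py](helpers/utils.py)):

```python
import numpy as np

# pick delta so that ~1/3 of absolute hourly changes fall below it,
# which makes the 'flat' label roughly a third of the dataset
price_change = price[1:] / price[:-1] - 1   # price is a hypothetical close-price series
delta = np.quantile(np.abs(price_change), 1 / 3)
```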
50 |
51 | | [logistic regression](/train_logistic_regression_v1.py) | [lstm net](/train_lstm_v1.py) | [convolutional net](/train_cnn_v1.py) |
52 | | ------------------- | -------- | ----------------- |
53 | | ![lr_v1](images/lr_v1.png) | ![lstm_v1](images/lstm_v1.png) | ![cnn_v1](images/cnn_v1.png) |
54 |
55 |
56 |
57 |
58 |
59 | **Predicting optimal positions allocation**
60 |
61 | Instead of predicting price direction, allocate the funds directly across buy, sell and do-not-enter positions. For instance, [0.5, 0.2, 0.3] would indicate to buy with 0.5 of the funds, sell with 0.2 and keep 0.3 in cash. In this case there are no target labels and the model is trained by maximizing an objective function (hourly average return).
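A minimal sketch of that objective in the repo's TF1 style (names are illustrative: `y_` is the softmax allocation, `price_change` a placeholder holding next-hour returns):

```python
# net position = buy allocation minus sell allocation;
# maximize mean hourly return by minimizing its negative
position = y_[:, 0] - y_[:, 1]
objective = tf.reduce_mean(position * price_change)
train_step = tf.train.AdamOptimizer(learning_r).minimize(-objective)
```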
62 |
63 | | [logistic regression](/train_logistic_regression_v2.py) | [lstm net](/train_lstm_v2.py) | [convolutional net](/train_cnn_v2.py) |
64 | | ------------------- | -------- | ----------------- |
65 | | ![lr_v2](images/lr_v2.png) | ![lstm_v2](images/lstm_v2.png) | ![cnn_v2](images/cnn_v2.png) |
66 |
67 |
68 |
69 |
70 |
71 | **Conclusions v1:**
72 | - Optimizing by minimizing cross-entropy against target labels works (i.e. predicting price direction). Optimizing by maximizing average return without target labels does not work (i.e. predicting optimal positions allocation). Possibly because of unstable / uneven gradients?
73 | - LSTM and CNN models suffer from overfitting (and underfitting as well) that is hard to deal with. So I'll have to filter out the least important features if I want to make them work.
74 | - Learning rate makes a big difference. Training logistic regression with a really small lr converges much better. It's probably a good idea to decrease lr again after a number of iterations (see the sketch after this list).
75 | - Results are terribly (!!!) dependent on randomization. My guess is that, because the surface of the objective function is very rough, each random initialization of weights and each random pick of the first training batches leads to a new local optimum. Therefore, to find a really good fit, each model should be trained multiple times.
76 | - Sometimes the cost function jumps up and down like crazy because batches of input are not homogeneous (the set of 'rules' by which the objective function is optimized changes dramatically from batch to batch). Nonetheless, it slowly moves towards some kind of optimum (not always! it might take a few tries of training from the beginning).
77 | - Adjusting hyper-parameters is hard, but it seems it might be worth the effort.
78 |
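For the learning-rate point above, a minimal sketch of manual decay using the `learning_r` placeholder from [models.py](/models.py) (the schedule values are illustrative):

```python
learning_rate = 0.002
for step in range(30000):
    if step in (10000, 20000):   # decrease lr again after a number of iterations
        learning_rate /= 10
    x_train, y_train = get_data_batch([input_train, output_train], batch_size, sequential=False)
    sess.run(train_step, feed_dict={x: x_train, y: y_train,
                                    learning_r: learning_rate, drop_out: drop_keep_prob})
```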
79 |
80 |
81 | ---------
82 | ### 4. Training models V2
83 |
84 | This time the idea was to:
85 | 1. Create dozens of features (ta-lib indicators) of varying periods. There are roughly 80 indicators, some of which can vary in time-period, so all in all it is reasonable to create ~250 features.
86 | 2. Perform PCA to simplify everything and get rid of unimportant, highly correlated features.
87 | 3. Experiment with polynomials (a pipeline sketch follows below).
88 |
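Roughly, the v2 pipeline chains the helpers in this repo like this minimal sketch (the file path and time-periods are illustrative; NaN warm-up rows would still need dropping, e.g. with `remove_nan_rows`):

```python
import numpy as np
from helpers.get_features import get_features_v2
from helpers.utils import train_test_validation_split, min_max_scale, get_poloynomials, get_pca

oanda_data = np.load('data/EUR_USD_H1.npy')
features = get_features_v2(oanda_data, time_periods=[10], return_numpy=True)
train, test, cv = train_test_validation_split([features], split=(0.7, 0.2, 0.1))
train, test, cv = min_max_scale(train, test, cv)               # scale with train-sample parameters
train, test, cv = get_poloynomials(train, test, cv, degree=2)  # polynomial feature crosses
train, test, cv = get_pca(train, test, cv, threshold=0.01)     # keep components explaining >1% of variance
```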
89 | **Plot example of a few features after normalization**
90 | 
91 |
92 | After trying multiple ways of combining the features, polynomials and PCA, it seems that this approach did not increase the accuracy of the model. Just for future reference, I include the best ROC scores I was able to reach using this approach.
93 |
94 | **Receiver operating curve**
95 | 
96 |
97 |
98 | **Conclusions v2:**
99 |
100 | 1. Given only price and volume data, predicting price direction is not really accurate.
101 | 2. For predictions to be reasonable, more features are needed: for instance sentiment data, other macroeconomic data or whatever.
102 | 3. Failing that, the only potentially profitable strategy would be to use other models, like position sizing, and to enter trades carefully to decrease total transaction costs.
103 |
104 | Here is an example of portfolio value given the best models. Unfortunately, the results change dramatically once transaction costs are accounted for.
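The comparison can be reproduced with `portfolio_value` from [utils.py](helpers/utils.py); a minimal sketch (the cost per position change is illustrative, `y_pred` stands for a model's softmax output):

```python
from helpers.utils import get_signal, portfolio_value

signal = get_signal(y_pred)   # +1 long, -1 short, 0 stay out
value_no_costs = portfolio_value(price_change, signal, trans_costs=0.0)
value_with_costs = portfolio_value(price_change, signal, trans_costs=0.0001)
```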
105 |
106 | **Portfolio value w/ and w/o transaction costs**
107 | ![portfolio_value_1](images/portfolio_value_1.png)
108 |
109 |
110 |
111 |
112 | ---------
113 | ### 5. Final remarks
114 |
115 | **Ideas to try out someday:**
116 | 1. Use inner layers of cnn as features in logistic regression.
117 | 2. Grand-model with multiple pairs as input and output.
118 | 3. Use evolution strategies to optimize for stuff that has no smooth gradients: SL...
119 |
120 |
121 |
122 |
123 |
124 |
--------------------------------------------------------------------------------
/helpers/get_features.py:
--------------------------------------------------------------------------------
1 | """
2 | Features
3 |
4 | https://github.com/mrjbq7/ta-lib
5 | https://cryptotrader.org/talib
6 | """
7 |
8 | import pandas as pd
9 | import numpy as np
10 | from datetime import datetime
11 | from talib.abstract import *
12 | import talib
13 | from helpers.utils import extract_timeseries_from_oanda_data
14 | import pylab as plt
15 |
16 |
17 | def scale_data(input_data_no_dummies, split):
18 | """Scale NON DUMMY data given train, test, cv split"""
19 | from sklearn.preprocessing import MinMaxScaler
20 | train_split = int(len(input_data_no_dummies)*split[0])
21 | scaler = MinMaxScaler()
22 | scaler.fit(input_data_no_dummies[:train_split])
23 | return scaler.transform(input_data_no_dummies)
24 |
25 |
26 | def prep_data_for_feature_gen(data):
27 | """Restructure OANDA data to use it for TA-Lib feature generation"""
28 | inputs = {
29 | 'open': np.array([x['openMid'] for x in data]),
30 | 'high': np.array([x['highMid'] for x in data]),
31 | 'low': np.array([x['lowMid'] for x in data]),
32 | 'close': np.array([x['closeMid'] for x in data]),
33 | 'volume': np.array([float(x['volume']) for x in data])}
34 | return inputs
35 |
36 |
37 | def get_features(oanda_data):
38 | """Given OANDA data get some specified indicators using TA-Lib
39 | This is unfinished work. For now just random unstructured indicators
40 | """
41 |
42 | # price and volume
43 | price, volume = extract_timeseries_from_oanda_data(oanda_data, ['closeMid', 'volume'])
44 | price_change = np.array([float(i) / float(j) - 1 for i, j in zip(price[1:], price)])
45 | volume_change = np.array([float(i) / float(j) - 1 for i, j in zip(volume[1:], volume)])
46 | price_change = np.concatenate([[np.nan], price_change], axis=0)
47 | volume_change = np.concatenate([[np.nan], volume_change], axis=0)
48 |
49 | inputs = prep_data_for_feature_gen(oanda_data)
50 |
51 | # overlap studies
52 |     par_sar = SAREXT(inputs)
53 |     outm, outf = MAMA(inputs, fastlimit=2 / (12 + 1), slowlimit=2 / (24 + 1))  # ta-lib expects lowercase names and limits in (0, 1); 2/(n+1) maps the intended 12/24 periods
54 |     upper, middle, lower = BBANDS(inputs,
55 |                                   timeperiod=12,
56 |                                   nbdevup=2,
57 |                                   nbdevdn=2,
58 |                                   matype=talib.MA_Type.EMA)
59 |     upper = upper - price.ravel()
60 |     middle = middle - price.ravel()
61 |     lower = price.ravel() - lower
62 | 
63 |     # momentum
64 |     bop = BOP(inputs)
65 |     cci = CCI(inputs)
66 |     adx = ADX(inputs, timeperiod=24)
67 |     cmo = CMO(inputs, timeperiod=6)
68 |     will = WILLR(inputs, timeperiod=16)
69 |     slowk, slowd = STOCH(inputs,
70 |                          fastk_period=5,
71 |                          slowk_period=3,
72 |                          slowk_matype=0,
73 |                          slowd_period=3,
74 |                          slowd_matype=0)
75 |     macd1, macd2, macd3 = MACD(inputs,
76 |                                fastperiod=12,
77 |                                slowperiod=6,
78 |                                signalperiod=3)
79 |     stocf1, stockf2 = STOCHF(inputs,
80 |                              fastk_period=12,
81 |                              fastd_period=6,
82 |                              fastd_matype=talib.MA_Type.EMA)
83 |     rsi1, rsi2 = STOCHRSI(inputs,
84 |                           timeperiod=24,
85 |                           fastk_period=12,
86 |                           fastd_period=24,
87 |                           fastd_matype=talib.MA_Type.EMA)
88 |
89 | # volume indicators
90 |     ados = ADOSC(inputs, fastperiod=24, slowperiod=12)
91 |
92 | # cycle indicators
93 | ht_sine1, ht_sine2 = HT_SINE(inputs)
94 | ht_phase = HT_DCPHASE(inputs)
95 | ht_trend = HT_TRENDMODE(inputs)
96 |
97 | # price transform indicators
98 | wcp = WCLPRICE(inputs)
99 |
100 | # volatility indicators
101 |     avg_range = NATR(inputs, timeperiod=6)
102 |
103 | # TODO: pattern recognition (is this bullshit?)
104 | # pattern_indicators = []
105 | # for func in talib.get_function_groups()['Pattern Recognition']:
106 | # result = eval(func + '(inputs)') / 100
107 | # pattern_indicators.append(result)
108 | # pattern_indicators = np.array(pattern_indicators)
109 |
110 | # markets dummies
111 | time = np.array([datetime.strptime(x['time'], '%Y-%m-%dT%H:%M:%S.000000Z') for x in oanda_data])
112 | mrkt_london = [3 <= x.hour <= 11 for x in time]
113 | mrkt_ny = [8 <= x.hour <= 16 for x in time]
114 | mrkt_sydney = [17 <= x.hour <= 24 or 0 <= x.hour <= 1 for x in time]
115 | mrkt_tokyo = [19 <= x.hour <= 24 or 0 <= x.hour <= 3 for x in time]
116 |
117 | # sorting indicators
118 | all_indicators = np.array([price_change, volume_change, par_sar, outm, outf, upper, middle, lower, bop, cci, adx,
119 | cmo, macd1, macd2, macd3, stocf1, stockf2, rsi1, rsi2,
120 | ados, ht_sine1, ht_sine2, ht_phase, wcp, avg_range])
121 |
122 | all_dummies = np.array([ht_trend, mrkt_london, mrkt_ny, mrkt_sydney, mrkt_tokyo])
123 |
124 | return all_indicators.T, all_dummies.T # transpose to get (data_points, features)
125 |
126 |
127 | def get_features_v2(oanda_data, time_periods, return_numpy):
128 | """Returns all (mostly) indicators from ta-lib library for given time periods"""
129 |
130 | # load primary data
131 | inputs = prep_data_for_feature_gen(oanda_data)
132 |
133 | # get name of all the functions
134 | function_groups = ['Cycle Indicators',
135 | 'Momentum Indicators',
136 | 'Overlap Studies',
137 | 'Volume Indicators',
138 | 'Volatility Indicators',
139 | 'Statistic Functions']
140 | function_list = [talib.get_function_groups()[group] for group in function_groups]
141 | function_list = [item for sublist in function_list for item in sublist] # flatten the list
142 | function_list.remove('MAVP')
143 |
144 | # price and volume
145 | price, volume = extract_timeseries_from_oanda_data(oanda_data, ['closeMid', 'volume'])
146 | price_change = np.array([float(i) / float(j) - 1 for i, j in zip(price[1:], price)])
147 | volume_change = np.array([float(i) / float(j) - 1 for i, j in zip(volume[1:], volume)])
148 | price_change = np.concatenate([[0], price_change], axis=0)
149 | volume_change = np.concatenate([[0], volume_change], axis=0)
150 |
151 | # get all indicators
152 | df_indicators = pd.DataFrame()
153 | df_indicators['price'] = price.ravel()
154 | df_indicators['price_delta'] = price_change
155 | df_indicators['volume_change'] = volume_change
156 | for func in function_list:
157 | if 'timeperiod' in getattr(talib.abstract, func).info['parameters']:
158 | for time_period in time_periods:
159 | indicator = getattr(talib.abstract, func)(inputs, timeperiod=time_period)
160 | if any(isinstance(item, np.ndarray) for item in indicator): # if indicator returns > 1 time-series
161 | indicator_id = 0
162 | for x in indicator:
163 | df_indicators[func + '_' + str(indicator_id) + '_tp_' + str(time_period)] = x
164 | indicator_id += 1
165 | else: # if indicator returns 1 time-series
166 | df_indicators[func + '_tp_' + str(time_period)] = indicator
167 | else:
168 | indicator = getattr(talib.abstract, func)(inputs)
169 | if any(isinstance(item, np.ndarray) for item in indicator):
170 | indicator_id = 0
171 | for x in indicator:
172 | df_indicators[func + str(indicator_id)] = x
173 | indicator_id += 1
174 | else:
175 | df_indicators[func] = indicator
176 |
177 | # manual handling of features
178 | df_indicators['AD'] = df_indicators['AD'].pct_change()
179 | df_indicators['OBV'] = df_indicators['OBV'].pct_change()
180 |     df_indicators['HT_DCPERIOD'] = (df_indicators['HT_DCPERIOD'] > df_indicators['HT_DCPERIOD'].rolling(50).mean()).astype(float)
181 |     df_indicators['HT_DCPHASE'] = (df_indicators['HT_DCPHASE'] > df_indicators['HT_DCPHASE'].rolling(10).mean()).astype(float)
182 |     df_indicators['ADX_tp_10'] = (df_indicators['ADX_tp_10'] > df_indicators['ADX_tp_10'].rolling(10).mean()).astype(float)
183 |     df_indicators['MACD0'] = df_indicators['MACD0'] - df_indicators['MACD1']
184 |     df_indicators['MINUS_DI_tp_10'] = (df_indicators['MINUS_DI_tp_10'] > df_indicators['MINUS_DI_tp_10'].rolling(20).mean()).astype(float)
185 |     df_indicators['RSI_tp_10'] = (df_indicators['RSI_tp_10'] > df_indicators['RSI_tp_10'].rolling(15).mean()).astype(float)
186 |     df_indicators['ULTOSC'] = (df_indicators['ULTOSC'] > df_indicators['ULTOSC'].rolling(15).mean()).astype(float)
187 | df_indicators['BBANDS_0_tp_10'] = df_indicators['BBANDS_0_tp_10'] - df_indicators['price']
188 | df_indicators['BBANDS_1_tp_10'] = df_indicators['BBANDS_1_tp_10'] - df_indicators['price']
189 | df_indicators['BBANDS_2_tp_10'] = df_indicators['BBANDS_2_tp_10'] - df_indicators['price']
190 | df_indicators['DEMA_tp_10'] = df_indicators['DEMA_tp_10'] - df_indicators['price']
191 | df_indicators['EMA_tp_10'] = df_indicators['EMA_tp_10'] - df_indicators['price']
192 | df_indicators['HT_TRENDLINE'] = df_indicators['HT_TRENDLINE'] - df_indicators['price']
193 | df_indicators['KAMA_tp_10'] = df_indicators['KAMA_tp_10'] - df_indicators['price']
194 | df_indicators['MAMA0'] = df_indicators['MAMA0'] - df_indicators['price']
195 | df_indicators['MAMA1'] = df_indicators['MAMA1'] - df_indicators['price']
196 | df_indicators['MIDPOINT_tp_10'] = df_indicators['MIDPOINT_tp_10'] - df_indicators['price']
197 | df_indicators['MIDPRICE_tp_10'] = df_indicators['MIDPRICE_tp_10'] - df_indicators['price']
198 | df_indicators['SMA_tp_10'] = df_indicators['SMA_tp_10'] - df_indicators['price']
199 | df_indicators['T3_tp_10'] = df_indicators['T3_tp_10'] - df_indicators['price']
200 | df_indicators['TEMA_tp_10'] = df_indicators['TEMA_tp_10'] - df_indicators['price']
201 | df_indicators['TRIMA_tp_10'] = df_indicators['TRIMA_tp_10'] - df_indicators['price']
202 | df_indicators['WMA_tp_10'] = df_indicators['WMA_tp_10'] - df_indicators['price']
203 | df_indicators['SAR'] = df_indicators['SAR'] - df_indicators['price']
204 | df_indicators['LINEARREG_tp_10'] = df_indicators['LINEARREG_tp_10'] - df_indicators['price']
205 | df_indicators['LINEARREG_INTERCEPT_tp_10'] = df_indicators['LINEARREG_INTERCEPT_tp_10'] - df_indicators['price']
206 | df_indicators['TSF_tp_10'] = df_indicators['TSF_tp_10'] - df_indicators['price']
207 |
208 | # markets dummies
209 | time = np.array([datetime.strptime(x['time'], '%Y-%m-%dT%H:%M:%S.000000Z') for x in oanda_data])
210 | df_indicators['mrkt_london'] = np.array([3 <= x.hour <= 11 for x in time]).astype(int)
211 | df_indicators['mrkt_ny'] = np.array([8 <= x.hour <= 16 for x in time]).astype(int)
212 | df_indicators['mrkt_sydney'] = np.array([17 <= x.hour <= 24 or 0 <= x.hour <= 1 for x in time]).astype(int)
213 | df_indicators['mrkt_tokyo'] = np.array([19 <= x.hour <= 24 or 0 <= x.hour <= 3 for x in time]).astype(int)
214 |
215 | print('Features shape: {}'.format(df_indicators.shape))
216 |
217 |     return df_indicators.values if return_numpy else df_indicators  # .as_matrix() was removed in modern pandas
218 |
219 |
220 | # # min max scaling params (needs to be created manually (for now) or better use scilearn min_max scaler
221 | # # min max parameters for scaling
222 | # import pandas as pd
223 | # oanda_data = np.load('data\\AUD_JPY_H1.npy')[-50000:]
224 | # all_indicators, all_dummies = get_features(oanda_data)
225 | # length = int(len(all_indicators) * 0.5)
226 | # all_indicators = pd.DataFrame(all_indicators[:length, ])
227 | # all_indicators_pd = all_indicators[all_indicators.apply(lambda x: np.abs(x - x.median()) / x.std() < 3).all(axis=1)]
228 | # all_indicators_np = all_indicators_pd.as_matrix()
229 | #
230 | # min_max_parameters = np.array([np.nanmax(all_indicators_np[:, :length].T, axis=1),
231 | # np.nanmin(all_indicators_np[:, :length].T, axis=1)])
232 |
233 | # eur usd
234 | min_max_scaling = np.array([[1.86410584e-03, 2.01841085e+00, 1.19412800e+00,
235 | 1.19447352e+00, 1.19295244e+00, 2.70961491e-03,
236 | 1.32700000e-03, 4.05070743e-03, 9.86577181e-01,
237 | 2.51521519e+02, 3.64593211e+01, 5.84544775e+01,
238 | 1.52468944e-03, 1.44255282e-03, 3.38887291e-04,
239 | 9.91166078e+01, 9.54336553e+01, 1.00000000e+02,
240 | 1.00000000e+02, 7.34727536e+03, 2.47949723e-01,
241 | -5.09698958e-01, 2.06138115e+02, 1.19570250e+00,
242 | 1.04819528e-01],
243 | [-1.30372596e-03, -6.84790089e-01, -1.19592000e+00,
244 | 1.18566979e+00, 1.15180427e+00, 2.90481254e-05,
245 | -1.93000000e-03, 3.75062541e-05, -9.53846154e-01,
246 | -2.35424245e+02, 1.29761986e+01, -2.20316967e+01,
247 | 8.84276800e-05, 1.76158803e-04, -2.39463856e-04,
248 | 3.67647059e-01, 9.65742018e+00, 0.00000000e+00,
249 | -2.96059473e-15, -1.71810219e+03, -4.40536468e-01,
250 | -9.46300629e-01, 1.65643780e+02, 1.18376500e+00,
251 | 6.95891066e-02]])
252 |
253 |
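254 | # Example usage (a sketch; the path below assumes data saved by helpers/get_historical_data.py):
255 | # oanda_data = np.load('data\\EUR_USD_H1.npy')
256 | # indicators, dummies = get_features(oanda_data)  # (n, 25) indicators and (n, 5) dummies
257 | # features_df = get_features_v2(oanda_data, time_periods=[10], return_numpy=False)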
--------------------------------------------------------------------------------
/helpers/get_historical_data.py:
--------------------------------------------------------------------------------
1 | import requests
2 | import numpy as np
3 | from datetime import datetime
4 | import os
5 |
6 |
7 | def download_data_from_oanda(params):
8 | """
9 | Input: a dictionary with parameters
10 | http://developer.oanda.com/rest-live/rates/
11 | params = {'instrument': 'EUR_USD',
12 | 'candleFormat': 'midpoint',
13 | 'granularity': 'M15',
14 | 'dailyAlignment': '0',
15 | 'start': '2017-11-20',
16 | 'count': '5000'}
17 |
18 | Return: list of data up to the last available data point
19 | """
20 |
21 | # initiate variables
22 | data = []
23 | time_delta = None
24 | time_format = None
25 | finished = False
26 |
27 | while not finished:
28 |
29 | # get response
30 |         try:
31 |             response = requests.get(url='https://api-fxtrade.oanda.com/v1/candles', params=params).json()
32 |         except ValueError:
33 |             print('Something is wrong with oanda response')
34 |             continue  # retry rather than touch an undefined response
35 | # append response
36 | data = np.append(data, np.array(response['candles']))
37 |
38 | # ascertain time delta (only once)
39 | if not time_delta:
40 | time_format = '%Y-%m-%dT%H:%M:%S.000000Z'
41 | time_delta = datetime.strptime(response['candles'][-1]['time'], time_format) - \
42 | datetime.strptime(response['candles'][-2]['time'], time_format)
43 |
44 | # start from last time stamp
45 | params['start'] = (datetime.strptime(response['candles'][-1]['time'], time_format) + time_delta).isoformat('T')
46 |
47 | # check if finished
48 | finished = not response['candles'][-1]['complete']
49 | print('Done!') if finished else print(response['candles'][-1]['time'])
50 |
51 | return data
52 |
53 |
54 | def download_multiple_instruments_and_save(instrument_list, params):
55 | """
56 | Downloads specified instruments and saves it to /data/'instrument'.npy
57 |
58 | instrument_list = ["AUD_JPY", "AUD_USD", "CHF_JPY",
59 | "EUR_CAD", "EUR_CHF", "EUR_GBP",
60 | "EUR_JPY", "EUR_USD", "GBP_CHF",
61 | "GBP_JPY", "GBP_USD", "NZD_JPY",
62 | "NZD_USD", "USD_CHF", "USD_JPY"]
63 |
64 | params = {'instrument': '',
65 | 'candleFormat': 'midpoint',
66 | 'granularity': 'M15',
67 | 'dailyAlignment': '0',
68 | 'start': '2017-11-20',
69 | 'count': '5000'}
70 |
71 |     Return: None, it just saves the data
72 | """
73 |
74 | starting_time = params['start']
75 | for instrument in instrument_list:
76 |
77 | # download data
78 | params['instrument'] = instrument
79 | params['start'] = starting_time
80 | instrument_data = download_data_from_oanda(params)
81 |
82 | # save and track progress
83 | np.save('data\\{}_{}.npy'.format(instrument, params['granularity']), instrument_data)
84 | print('{} is finished!'.format(instrument))
85 |
86 |
87 | def get_latest_oanda_data(instrument, granularity, count):
88 | """Returns last oanda data (with a length of count) for a given instrument and granularity"""
89 |
90 | params = {'instrument': instrument,
91 | 'candleFormat': 'midpoint',
92 | 'granularity': granularity,
93 | 'dailyAlignment': '0',
94 | 'count': count+1} # +1 to make sure all returned candles are complete
95 | response = requests.get(url='https://api-fxtrade.oanda.com/v1/candles', params=params).json()
96 | data = np.array(response['candles'])
97 |
98 | # if last candle is complete, return full data (except first point), else omit last data point
99 | return data[1:] if data[-1]['complete'] else data[:-1]
100 |
101 |
102 | # code to download a list of instruments
103 | # download_multiple_instruments_and_save(instrument_list=["AUD_JPY", "AUD_USD", "CHF_JPY",
104 | # "EUR_CAD", "EUR_CHF", "EUR_GBP",
105 | # "EUR_JPY", "EUR_USD", "GBP_CHF",
106 | # "GBP_JPY", "GBP_USD", "NZD_JPY",
107 | # "NZD_USD", "USD_CHF", "USD_JPY"],
108 | # params={'instrument': '',
109 | # 'candleFormat': 'midpoint',
110 | # 'granularity': 'M1',
111 | # 'dailyAlignment': '0',
112 | # 'start': '2001-01-01',
113 | # 'count': '5000'})
114 |
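115 | # example: fetch the last 300 complete hourly EUR/USD candles (values are illustrative)
116 | # latest_candles = get_latest_oanda_data('EUR_USD', 'H1', 300)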
--------------------------------------------------------------------------------
/helpers/oanda_api_helpers.py:
--------------------------------------------------------------------------------
1 | """
2 | This is where all the code related to interacting with oanda is stored.
3 |
4 | Sources:
5 | https://media.readthedocs.org/pdf/oanda-api-v20/latest/oanda-api-v20.pdf
6 | https://github.com/hootnot/oanda-api-v20
7 |
8 | """
9 |
10 | import oandapyV20
11 | import oandapyV20.endpoints.orders as orders
12 | import oandapyV20.endpoints.trades as trades
13 | import oandapyV20.endpoints.accounts as accounts
14 | import oandapyV20.endpoints.positions as positions
15 | from oandapyV20.contrib.requests import MarketOrderRequest
16 | import json
17 |
18 |
19 | # TODO: make sure send_request checks if order is through on weekends and no order_book is created
20 | class TradingSession:
21 |
22 | # initiate objects
23 | def __init__(self, accountID, access_token):
24 | self.accountID = accountID
25 | self.access_token = access_token
26 | self.api = oandapyV20.API(access_token=access_token, environment="practice")
27 | self.order_book = self.oanda_order_book()
28 |
29 | # initiate methods
30 | def send_request(self, request):
31 | """
32 | Sends request to oanda API.
33 | Returns oanda's response if success, 1 if error.
34 | """
35 | try:
36 | rv = self.api.request(request)
37 | # print(json.dumps(rv, indent=2))
38 | return rv
39 | except oandapyV20.exceptions.V20Error as err:
40 | print(request.status_code, err)
41 | return 1
42 |
43 | def open_order(self, instrument, units):
44 |
45 | # check if position is already open
46 | if units < 0:
47 |             if self.order_book[instrument]['order_type'] == -1:  # already short
48 | print('Short: {} (holding)'.format(instrument))
49 | return 1
50 | elif units > 0:
51 |             if self.order_book[instrument]['order_type'] == 1:  # already long
52 | print('Long: {} (holding)'.format(instrument))
53 | return 1
54 | else:
55 | print('Units specified: 0')
56 | return 1
57 |
58 | # define parameters, create and send a request
59 | mkt_order = MarketOrderRequest(instrument=instrument, units=units)
60 | r = orders.OrderCreate(self.accountID, data=mkt_order.data)
61 | request_data = self.send_request(r)
62 |
63 | # check if request was fulfilled and save its ID
64 |         if request_data != 1:
65 | instrument = request_data['orderCreateTransaction']['instrument']
66 | self.order_book[instrument]['tradeID'] = request_data['lastTransactionID']
67 | self.order_book[instrument]['order_type'] = -1 if units < 0 else 1
68 | print('{}: {}'.format('Long' if units > 0 else 'Short', instrument))
69 | return 0
70 | else:
71 | return 1
72 |
73 | def close_order(self, instrument):
74 |
75 | # check if position exist
76 | if self.order_book[instrument]['order_type'] is None:
77 | print('Position {} does not exist'.format(instrument))
78 | return 1
79 |
80 | # create and send a request
81 | r = trades.TradeClose(accountID=self.accountID, tradeID=self.order_book[instrument]['tradeID'])
82 | request_data = self.send_request(r)
83 |
84 | # check if request was fulfilled and clear it
85 |         if request_data != 1:
86 | instrument = request_data['orderCreateTransaction']['instrument']
87 | self.order_book[instrument]['order_type'] = None
88 | self.order_book[instrument]['tradeID'] = None
89 | print('Closed: {}'.format(instrument))
90 | return 0
91 | else:
92 | return 1
93 |
94 | def check_open_positions(self):
95 | r = positions.OpenPositions(self.accountID)
96 | return self.send_request(r)
97 |
98 | def check_account_summary(self):
99 | r = accounts.AccountSummary(self.accountID)
100 | return self.send_request(r)
101 |
102 | def oanda_order_book(self):
103 | """Synchronize open positions with this object's order_book"""
104 | order_book_oanda = self.check_open_positions()
105 | order_book = {'EUR_USD': {'order_type': None, 'tradeID': None},
106 | 'AUD_JPY': {'order_type': None, 'tradeID': None}}
107 | for pos in order_book_oanda['positions']:
108 | try:
109 | trade_id = pos['long']['tradeIDs']
110 | order_type = 1
111 | except KeyError:
112 | trade_id = pos['short']['tradeIDs']
113 | order_type = -1
114 | order_book[pos['instrument']]['tradeID'] = trade_id
115 | order_book[pos['instrument']]['order_type'] = order_type
116 | return order_book
117 |
118 | def sync_with_oanda(self):
119 | self.order_book = self.oanda_order_book()
120 |
121 | def close_all_open_positions(self):
122 | """Close all opened positions"""
123 |
124 | # check oanda for open positions
125 | try:
126 | open_positions = self.check_open_positions()['positions'][0]
127 | except IndexError:
128 | self.order_book = self.oanda_order_book()
129 | print('No opened positions')
130 | return 0
131 |
132 | # get ID's of open positions
133 | trade_ids = []
134 | try:
135 | [trade_ids.append(x) for x in open_positions['short']['tradeIDs']]
136 | except KeyError:
137 | pass
138 | try:
139 | [trade_ids.append(x) for x in open_positions['long']['tradeIDs']]
140 | except KeyError:
141 | pass
142 |
143 | # close orders by ID
144 | [close_order_manually(self.accountID, self.access_token, x) for x in trade_ids]
145 | self.order_book = self.oanda_order_book()
146 | print('All positions closed')
147 | return 0
148 |
149 |
150 | def close_order_manually(accountID, access_token, tradeID):
151 | """
152 | Closes order manually using tradeID.
153 | """
154 | api = oandapyV20.API(access_token=access_token, environment="practice")
155 | request = trades.TradeClose(accountID, tradeID)
156 | rv = api.request(request)
157 | print(json.dumps(rv, indent=2))
158 | return 0
159 |
160 |
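161 | # Example usage (a sketch; the account ID and token are placeholders):
162 | # session = TradingSession(accountID='101-004-1234567-001', access_token='<token>')
163 | # session.open_order('EUR_USD', 1000)   # go long 1000 units
164 | # session.close_order('EUR_USD')
165 | # session.close_all_open_positions()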
--------------------------------------------------------------------------------
/helpers/utils.py:
--------------------------------------------------------------------------------
1 | """
2 | Utils
3 | """
4 |
5 | import numpy as np
6 | import pandas as pd
7 | import pylab as plt
8 |
9 |
10 | def remove_nan_rows(items):
11 | """
12 | Get rid of rows if at least one value is nan in at least one item in input.
13 | Inputs a list of items to remove nans
14 | Returns arrays with filtered rows and unified length.
15 | """
16 | unified_mask = np.ones(len(items[0]), dtype=bool)
17 | for item in items:
18 | mask = np.any(np.isnan(item), axis=1)
19 |         unified_mask[mask] = False
20 | return [item[unified_mask, :] for item in items]
21 |
22 |
23 | def extract_timeseries_from_oanda_data(oanda_data, items):
24 | """Given keys of oanda data, put it's contents into an array"""
25 | output = []
26 | for item in items:
27 | time_series = np.array([x[item] for x in oanda_data]).reshape(len(oanda_data), 1)
28 | output.append(time_series)
29 | return output if len(output) > 1 else output[0]
30 |
31 |
32 | def price_to_binary_target(oanda_data, delta=0.0001):
33 | """Quick and dirty way of constructing output where:
34 | [1, 0, 0] rise in price
35 | [0, 1, 0] price drop
36 | [0, 0, 1] no change (flat)
37 | """
38 | price = extract_timeseries_from_oanda_data(oanda_data, ['closeMid'])
39 | price_change = np.array([x1 / x2 - 1 for x1, x2 in zip(price[1:], price)])
40 | price_change = np.concatenate([[[0]], price_change])
41 | binary_price = np.zeros(shape=(len(price), 3))
42 | binary_price[-1] = np.nan
43 | for data_point in range(len(price_change) - 1):
44 |         if price_change[data_point+1] > 0 and price_change[data_point+1] - delta > 0:  # price will rise
45 | column = 0
46 |         elif price_change[data_point+1] < 0 and price_change[data_point+1] + delta < 0:  # price will drop
47 | column = 1
48 | else: # price will not change
49 | column = 2
50 | binary_price[data_point][column] = 1
51 |
52 | # print target label distribution
53 | data_points = len(binary_price[:-1])
54 | print('Rise: {:.2f}, Drop: {:.2f}, Flat: {:.2f}'.format(np.sum(binary_price[:-1, 0]) / data_points,
55 | np.sum(binary_price[:-1, 1]) / data_points,
56 | np.sum(binary_price[:-1, 2]) / data_points))
57 |
58 | # print df to check if no look-ahead bias is introduced
59 | print(pd.DataFrame(np.concatenate([np.around(price, 5),
60 | np.around(price_change, 4),
61 | binary_price.astype(int)], axis=1)[:10, :]))
62 |
63 | return binary_price
64 |
65 |
66 | def train_test_validation_split(list_of_items, split=(0.5, 0.35, 0.15)):
67 | """Splits data into train, test, validation samples"""
68 | train, test, cv = split
69 | id_train = int(len(list_of_items[0]) * train)
70 | id_test = int(len(list_of_items[0]) * (train + test))
71 |
72 | split_tuple = ()
73 | for item in list_of_items:
74 | train_split = item[:id_train]
75 | test_split = item[id_train:id_test]
76 | cv_split = item[id_test:]
77 | split_tuple = split_tuple + (train_split, test_split, cv_split)
78 | return split_tuple
79 |
80 |
81 | def get_signal(softmax_output):
82 | """Return an array of signals given softmax output"""
83 | signal_index = np.argmax(softmax_output, axis=1)
84 | signal = np.zeros(shape=(len(signal_index), 1))
85 | for index, point in zip(signal_index, range(len(signal))):
86 | if index == 0:
87 | signal[point] = 1
88 | elif index == 1:
89 | signal[point] = -1
90 | else:
91 | signal[point] = 0
92 | return signal
93 |
94 |
95 | def portfolio_value(price_change, signal, trans_costs=0.000):
96 | """Return portfolio value.
97 | IMPORTANT!
98 | signal received from last fully formed candle
99 | percentage price change over last fully formed candle and previous period"""
100 | # signal = signal_train
101 | # price_change = price_data
102 | signal_percent = signal[:-1] * price_change[1:]
103 | transaction_costs = np.zeros_like(signal_percent)
104 | for i in range(len(signal)-1):
105 |         transaction_costs[i] = trans_costs if signal[i] != signal[i+1] and signal[i+1] != 0 else 0
106 | value = np.cumsum(signal_percent - transaction_costs) + 1
107 | # full = np.concatenate([signal, np.concatenate([[[0]], transaction_costs], axis=0)], axis=1)
108 | return value
109 |
110 |
111 | def get_data_batch(list_of_items, batch_size, sequential):
112 | """Returns a batch of data. A batch of sequence or random points."""
113 | if sequential:
114 | indexes = np.random.randint(len(list_of_items[0]) - (batch_size+1))
115 | else:
116 | indexes = np.random.randint(0, len(list_of_items[0]), batch_size)
117 | batch_list = []
118 | for item in list_of_items:
119 | batch = item[indexes:indexes+batch_size, ...] if sequential else item[indexes, ...]
120 | batch_list.append(batch)
121 | return batch_list
122 |
123 |
124 | def get_lstm_input_output(x, y, time_steps):
125 | """Returns sequential lstm shaped data [batch_size, time_steps, features]"""
126 | data_points, _ = np.shape(x)
127 | x_batch_reshaped = []
128 | for i in range(data_points - time_steps):
129 | x_batch_reshaped.append(x[i: i+time_steps, :])
130 | return np.array(x_batch_reshaped), y[time_steps:]
131 |
132 |
133 | def get_cnn_input_output(x, y, time_steps=12):
134 | """Returns sequential cnn shaped data [batch_size, features, time_steps]"""
135 | data_points, _ = np.shape(x)
136 | x_batch_reshaped = []
137 | for i in range(data_points - time_steps):
138 | x_batch_reshaped.append(x[i:i+time_steps, :])
139 | x_batch_reshaped = np.transpose(np.array([x_batch_reshaped]), axes=(1, 3, 2, 0))
140 | return np.array(x_batch_reshaped), y[time_steps:]
141 |
142 |
143 | def plot_roc_curve(y_pred_prob, y_target):
144 | """Source: http://scikit-learn.org/stable/auto_examples/model_selection/plot_roc.html"""
145 | from sklearn.metrics import roc_curve, auc
146 | # roc curve
147 | fpr = dict()
148 | tpr = dict()
149 | roc_auc = dict()
150 | for i in range(3):
151 | fpr[i], tpr[i], _ = roc_curve(y_target[:, i], y_pred_prob[:, i])
152 | roc_auc[i] = auc(fpr[i], tpr[i])
153 |
154 | # Compute micro-average ROC curve and ROC area
155 | fpr["micro"], tpr["micro"], _ = roc_curve(y_target.ravel(), y_pred_prob.ravel())
156 | roc_auc["micro"] = auc(fpr["micro"], tpr["micro"])
157 |
158 | return roc_auc["micro"], fpr["micro"], tpr["micro"]
159 |
160 |
161 | def y_trans(y, to_1d):
162 | """Transform y data from [1, -1] to [[1, 0, 0], [0, 1, 0]] and vice versa"""
163 | if to_1d:
164 | y_flat = y.argmax(axis=1)
165 |         mapping = {0: 1, 1: -1, 2: 0}  # avoid shadowing the built-in map
166 |         y_new = np.copy(y_flat)
167 |         for k, v in mapping.items():
168 |             y_new[y_flat == k] = v
169 | return y_new.reshape(-1, 1)
170 | else:
171 | y_new = np.zeros(shape=(len(y), 3))
172 | for i in range(len(y_new)):
173 | index = [0 if y[i] == 1 else 1 if y[i] == -1 else 2][0]
174 | y_new[i, index] = 1
175 | return y_new
176 |
177 |
178 | def min_max_scale(input_train, input_test, input_cv, std_dev_threshold=2.1):
179 | from sklearn.preprocessing import MinMaxScaler
180 |
181 | # get rid of outliers
182 | input_train_df = pd.DataFrame(input_train)
183 | input_train_no_outliers = input_train_df[input_train_df.apply(
184 |         lambda x: np.abs(x - x.median()) / x.std() < std_dev_threshold).all(axis=1)].values
185 |
186 | scaler = MinMaxScaler()
187 | scaler.fit(input_train_no_outliers)
188 |
189 |     input_train_scaled = scaler.transform(input_train)  # transform only; the scaler was already fit above
190 |     input_test_scaled = scaler.transform(input_test)
191 |     input_cv_scaled = scaler.transform(input_cv)
192 |
193 | return input_train_scaled, input_test_scaled, input_cv_scaled
194 |
195 |
196 | def get_pca(input_train, input_test, input_cv, threshold=0.01):
197 | from sklearn.decomposition import PCA
198 | pca = PCA()
199 | pca.fit(input_train)
200 | plt.plot(pca.explained_variance_ratio_)
201 | nr_features = np.sum(pca.explained_variance_ratio_ > threshold)
202 |
203 |     input_train_pca = pca.transform(input_train)
204 |     input_test_pca = pca.transform(input_test)  # transform only, to avoid re-fitting on test/cv data
205 |     input_cv_pca = pca.transform(input_cv)
206 |
207 | input_train_pca = input_train_pca[:, :nr_features]
208 | input_test_pca = input_test_pca[:, :nr_features]
209 | input_cv_pca = input_cv_pca[:, :nr_features]
210 |
211 | return input_train_pca, input_test_pca, input_cv_pca
212 |
213 |
214 | def get_poloynomials(input_train, input_test, input_cv, degree=2):
215 | from sklearn.preprocessing import PolynomialFeatures
216 | poly = PolynomialFeatures(degree=degree)
217 | poly.fit(input_train)
218 |
219 |     input_train_poly = poly.transform(input_train)
220 |     input_test_poly = poly.transform(input_test)
221 |     input_cv_poly = poly.transform(input_cv)
222 |
223 | return input_train_poly, input_test_poly, input_cv_poly
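224 | 
225 | 
226 | # Example end-to-end usage (a sketch; y_pred stands for a model's softmax output):
227 | # signal = get_signal(y_pred)  # +1 long, -1 short, 0 stay out
228 | # value = portfolio_value(price_change, signal, trans_costs=0.0001)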
--------------------------------------------------------------------------------
/images/Figure_1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/raidastauras/Trading-Bot/5b375d80e13923596d0bc902a761dc78e6437ef2/images/Figure_1.png
--------------------------------------------------------------------------------
/images/Inspecting_features.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/raidastauras/Trading-Bot/5b375d80e13923596d0bc902a761dc78e6437ef2/images/Inspecting_features.gif
--------------------------------------------------------------------------------
/images/ROC_lr_v2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/raidastauras/Trading-Bot/5b375d80e13923596d0bc902a761dc78e6437ef2/images/ROC_lr_v2.png
--------------------------------------------------------------------------------
/images/cnn_v1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/raidastauras/Trading-Bot/5b375d80e13923596d0bc902a761dc78e6437ef2/images/cnn_v1.png
--------------------------------------------------------------------------------
/images/cnn_v2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/raidastauras/Trading-Bot/5b375d80e13923596d0bc902a761dc78e6437ef2/images/cnn_v2.png
--------------------------------------------------------------------------------
/images/feature_heatmap.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/raidastauras/Trading-Bot/5b375d80e13923596d0bc902a761dc78e6437ef2/images/feature_heatmap.png
--------------------------------------------------------------------------------
/images/features_example.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/raidastauras/Trading-Bot/5b375d80e13923596d0bc902a761dc78e6437ef2/images/features_example.png
--------------------------------------------------------------------------------
/images/legend_one_fits_all.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/raidastauras/Trading-Bot/5b375d80e13923596d0bc902a761dc78e6437ef2/images/legend_one_fits_all.png
--------------------------------------------------------------------------------
/images/lr_v1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/raidastauras/Trading-Bot/5b375d80e13923596d0bc902a761dc78e6437ef2/images/lr_v1.png
--------------------------------------------------------------------------------
/images/lr_v2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/raidastauras/Trading-Bot/5b375d80e13923596d0bc902a761dc78e6437ef2/images/lr_v2.png
--------------------------------------------------------------------------------
/images/lr_v3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/raidastauras/Trading-Bot/5b375d80e13923596d0bc902a761dc78e6437ef2/images/lr_v3.png
--------------------------------------------------------------------------------
/images/lstm_v1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/raidastauras/Trading-Bot/5b375d80e13923596d0bc902a761dc78e6437ef2/images/lstm_v1.png
--------------------------------------------------------------------------------
/images/lstm_v2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/raidastauras/Trading-Bot/5b375d80e13923596d0bc902a761dc78e6437ef2/images/lstm_v2.png
--------------------------------------------------------------------------------
/images/portfolio_value_1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/raidastauras/Trading-Bot/5b375d80e13923596d0bc902a761dc78e6437ef2/images/portfolio_value_1.png
--------------------------------------------------------------------------------
/main.py:
--------------------------------------------------------------------------------
1 | """
2 | This is the main code for automated FX trading
3 |
4 | """
5 |
6 | from apscheduler.schedulers.blocking import BlockingScheduler
7 | from helpers.oanda_api_helpers import TradingSession
8 | from helpers.utils import remove_nan_rows
9 | from helpers.get_features import get_features, min_max_scaling
10 | from helpers.get_historical_data import get_latest_oanda_data
11 | import tensorflow as tf
12 | import numpy as np
13 | import pandas as pd
14 | import datetime
15 | import time
16 | import pytz
17 |
18 |
19 | # some parameters
20 | close_current_positions = True
21 | start_on_spot = True
22 |
23 | # oanda access keys
24 | accountID = ''
25 | access_token = ''
26 | model_name = 'lr-v1-avg_score1.454-2000'
27 |
28 | # init trading session
29 | trading_sess = TradingSession(accountID=accountID, access_token=access_token)
30 | if close_current_positions:
31 | trading_sess.close_all_open_positions()
32 |
33 | # init tf model
34 | config = tf.ConfigProto(device_count={'GPU': 0})
35 | sess = tf.Session(config=config)
36 | saver = tf.train.import_meta_graph('saved_models/' + model_name + '.meta')
37 | saver.restore(sess, tf.train.latest_checkpoint('saved_models/'))
38 | graph = tf.get_default_graph()
39 | x = graph.get_tensor_by_name('Placeholder:0')
40 | drop_out = graph.get_tensor_by_name('strided_slice_1:0')
41 | y_ = graph.get_tensor_by_name('Softmax:0')
42 |
43 | # other variables
44 | log = pd.DataFrame()
45 | tz = pytz.timezone('Europe/Vilnius')
46 | start_time = str(datetime.datetime.now(tz))[:-13].replace(':', '-')
47 | margin_rate = float(trading_sess.check_account_summary()['account']['marginRate'])
48 | last_complete_candle_stamp = ''
49 |
50 |
51 | def do_stuff_every_period():
52 |
53 | global log
54 | global start_time
55 | global last_complete_candle_stamp
56 | global margin_rate
57 | current_time = str(datetime.datetime.now(tz))[:-13]
58 |
59 | # estimate position size
60 | account_balance = np.around(float(trading_sess.check_account_summary()['account']['balance']), 0)
61 | funds_to_commit = account_balance * (1 / margin_rate)
62 |
63 | # download latest data
64 | # always check if new candle is present, because even after 5 seconds, it might be not formed if market is very calm
65 | # make sure this loop does not loop endlessly on weekends (this is configured in scheduler)
66 | while True:
67 | oanda_data = get_latest_oanda_data('EUR_USD', 'H1', 300) # many data-points to increase EMA and such accuracy
68 | current_complete_candle_stamp = oanda_data[-1]['time']
69 | if current_complete_candle_stamp != last_complete_candle_stamp: # if new candle is complete
70 | break
71 | time.sleep(5)
72 | last_complete_candle_stamp = current_complete_candle_stamp
73 |
74 | # get features
75 | input_data_raw, input_data_dummy = get_features(oanda_data)
76 | input_data, input_data_dummy = remove_nan_rows([input_data_raw, input_data_dummy])
77 | input_data_scaled_no_dummy = (input_data - min_max_scaling[1, :]) / (min_max_scaling[0, :] - min_max_scaling[1, :])
78 | input_data_scaled = np.concatenate([input_data_scaled_no_dummy, input_data_dummy], axis=1)
79 |
80 | # estimate signal
81 | y_pred = sess.run(y_, feed_dict={x: input_data_scaled[-1:, :], drop_out: 1})
82 | order_signal_id = y_pred.argmax()
83 | order_signal = [1, -1, 0][order_signal_id] # 0 stands for buy, 1 for sell, 2 for hold
84 |
85 | # manage trading positions
86 | current_position = trading_sess.order_book['EUR_USD']['order_type']
87 | if current_position != order_signal:
88 | if current_position is not None:
89 | trading_sess.close_order('EUR_USD')
90 | trading_sess.open_order('EUR_USD', funds_to_commit * order_signal)
91 | else:
92 | print('{}: EUR_USD (holding)'.format(['Long', 'Short', 'Nothing'][order_signal_id]))
93 |
94 | # log
95 | new_log = pd.DataFrame([[current_time, oanda_data[-1]['closeMid'], y_pred]],
96 | columns=['Datetime', 'Last input Price', 'y_pred'])
97 |     log = pd.concat([log, new_log])  # DataFrame.append was removed in modern pandas
98 | log.to_csv('logs/log {}.csv'.format(start_time))
99 |
100 | print('{} | price: {:.5f} | signal: buy: {:.2f}, sell: {:.2f}, nothing: {:.2f}'
101 | .format(current_time, oanda_data[-1]['closeMid'], y_pred[0][0], y_pred[0][1], y_pred[0][2]))
102 |
103 |
104 | # Scheduler
105 | scheduler = BlockingScheduler()
106 | scheduler.add_job(do_stuff_every_period,
107 | trigger='cron',
108 | day_of_week='0-4',
109 | hour='0-23',
110 | minute='0',
111 | second='5')
112 |
113 | if start_on_spot:
114 | do_stuff_every_period()
115 | scheduler.start()
116 |
117 | # close_order_manually(accountID, access_token, 1603)
118 | # trading_sess.check_open_positions()
119 | # trading_sess.check_account_summary()
120 | # trading_sess.order_book
121 |
--------------------------------------------------------------------------------
/models.py:
--------------------------------------------------------------------------------
1 | """
2 | A set of models to train
3 | """
4 |
5 | import tensorflow as tf
6 | import numpy as np
7 |
8 |
9 | def logistic_regression(input_dim, output_dim):
10 | """Simple logistic regression
11 | Returns x and y placeholders, logits and y_ (y hat)"""
12 | tf.reset_default_graph()
13 |
14 | x = tf.placeholder(tf.float32, [None, input_dim])
15 | y = tf.placeholder(tf.float32, [None, output_dim])
16 | learning_r = tf.placeholder(tf.float32, 1)[0]
17 | drop_out = tf.placeholder(tf.float32, 1)[0]
18 |
19 | w_init = tf.contrib.layers.xavier_initializer()
20 | b_init = tf.initializers.truncated_normal(mean=0.1, stddev=0.025)
21 | w = tf.get_variable('weights1', shape=[input_dim, output_dim], initializer=w_init)
22 | b = tf.get_variable('bias1', shape=[output_dim], initializer=b_init)
23 |
24 | logits = tf.matmul(tf.nn.dropout(x, keep_prob=drop_out), w) + b
25 | y_ = tf.nn.softmax(logits)
26 |
27 | [print(var) for var in tf.trainable_variables()]
28 | return x, y, logits, y_, learning_r, drop_out
29 |
30 |
31 | def lstm_nn(input_dim, output_dim, time_steps, n_hidden):
32 | """LSTM net returns x and y placeholders, logits and y_ (y hat)"""
33 |
34 | tf.reset_default_graph()
35 |
36 | x = tf.placeholder(tf.float32, [None, time_steps, input_dim])
37 | y = tf.placeholder(tf.float32, [None, output_dim])
38 | learning_r = tf.placeholder(tf.float32, 1)[0]
39 | drop_out = tf.placeholder(tf.float32, 1)[0]
40 |
41 | w_init = tf.contrib.layers.xavier_initializer()
42 | b_init = tf.initializers.truncated_normal(mean=0.1, stddev=0.025)
43 | w = tf.get_variable('last_weights', shape=[n_hidden[-1], output_dim], initializer=w_init)
44 | # b = tf.get_variable('bias1', shape=[output_dim], initializer=b_init)
45 |
46 | x_split = tf.unstack(x, time_steps, 1)
47 |
48 | # stack lstm cells, a cell per hidden layer
49 | stacked_lstm_cells = [] # a list of lstm cells to be inputted into MultiRNNCell
50 | for layer_size in n_hidden:
51 | stacked_lstm_cells.append(tf.contrib.rnn.BasicLSTMCell(layer_size, activation=tf.nn.relu))
52 |
53 | # create the net and add dropout
54 | lstm_cell = tf.contrib.rnn.MultiRNNCell(stacked_lstm_cells)
55 | lstm_cell_with_dropout = tf.contrib.rnn.DropoutWrapper(lstm_cell, output_keep_prob=drop_out)
56 |
57 | # forward propagate
58 | outputs, state = tf.contrib.rnn.static_rnn(lstm_cell_with_dropout, x_split, dtype=tf.float32)
59 | logits = tf.matmul(outputs[-1], w) # + b # logits are used for cross entropy
60 | y_ = tf.nn.softmax(logits)
61 |
62 | [print(var) for var in tf.trainable_variables()]
63 | print([print(i) for i in outputs])
64 | print(y_)
65 | return x, y, logits, y_, learning_r, drop_out
66 |
67 |
68 | def cnn(input_dim, output_dim, time_steps, filter):
69 | """CNN returns x and y placeholders, logits and y_ (y hat)"""
70 |
71 | tf.reset_default_graph()
72 |
73 | x = tf.placeholder(tf.float32, [None, input_dim, time_steps, 1])
74 | y = tf.placeholder(tf.float32, [None, output_dim])
75 | learning_r = tf.placeholder(tf.float32, 1)[0]
76 | drop_out = tf.placeholder(tf.float32, 1)[0]
77 |
78 | conv1 = tf.layers.conv2d(inputs=x,
79 | filters=filter[0],
80 | kernel_size=(input_dim, 1),
81 | kernel_initializer=tf.truncated_normal_initializer(stddev=0.1, dtype=tf.float32),
82 | strides=1,
83 | padding='valid',
84 | activation=tf.nn.relu)
85 |     conv1_dropout = tf.layers.dropout(inputs=conv1, rate=1 - drop_out, training=True)  # rate is the drop fraction; callers feed keep_prob
86 | conv2 = tf.layers.conv2d(inputs=conv1_dropout,
87 | filters=filter[1],
88 | kernel_size=(1, time_steps),
89 | kernel_initializer=tf.truncated_normal_initializer(stddev=0.1, dtype=tf.float32),
90 | strides=1,
91 | padding='valid',
92 | activation=tf.nn.relu)
93 | logits_dense = tf.layers.dense(inputs=conv2,
94 | units=output_dim,
95 | kernel_initializer=tf.truncated_normal_initializer(stddev=0.1, dtype=tf.float32),
96 | activation=None,
97 | use_bias=False)
98 |
99 | logits = tf.reshape(logits_dense, (-1, output_dim))
100 | y_ = tf.nn.softmax(tf.reshape(logits_dense, (-1, output_dim)))
101 |
102 | [print(var) for var in tf.trainable_variables()]
103 | print(y_)
104 | return x, y, logits, y_, learning_r, drop_out
105 |
106 |
107 | def vanilla_nn(input_dim, output_dim, architecture, drop_layer=0, drop_keep_prob=0.9):
108 | """Vanilla neural net
109 | Returns x and y placeholders, logits and y_ (y hat)"""
110 | tf.reset_default_graph()
111 |
112 | x = tf.placeholder(tf.float32, [None, input_dim])
113 | y = tf.placeholder(tf.float32, [None, output_dim])
114 |
115 | w_init = tf.contrib.layers.xavier_initializer()
116 | b_init = tf.initializers.truncated_normal(mean=0.1, stddev=0.025)
117 | layer_sizes = [input_dim] + architecture + [output_dim]
118 | weights, biases = {}, {}
119 | layer_values = {0: x}
120 | for layer, size_current, size_next in zip(range(len(layer_sizes)), layer_sizes, layer_sizes[1:]):
121 |
122 | # create weights
123 | last_layer = layer == len(layer_sizes) - 2 # dummy variable for last layer
124 | weights[layer] = tf.get_variable('weights{}'.format(layer), shape=[size_current, size_next], initializer=w_init)
125 | biases[layer] = tf.get_variable('biases{}'.format(layer), shape=[size_next], initializer=b_init)
126 |
127 | # forward-propagate
128 | if not last_layer:
129 | layer_values[layer+1] = tf.nn.relu(tf.matmul(layer_values[layer], weights[layer]) + biases[layer])
130 | else:
131 | layer_values[layer+1] = tf.matmul(layer_values[layer], weights[layer]) + biases[layer]
132 | y_ = tf.nn.softmax(layer_values[layer+1])
133 | if drop_layer == layer:
134 | layer_values[layer+1] = tf.nn.dropout(layer_values[layer+1], keep_prob=drop_keep_prob)
135 |
136 | [print(var) for var in tf.trainable_variables()]
137 | print([print(value) for _, value in layer_values.items()])
138 | print(y_)
139 | return x, y, layer_values[len(layer_values)-1], y_
140 |
141 |
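142 | # Example usage (a sketch mirroring the train_* scripts; dims are illustrative):
143 | # x, y, logits, y_, learning_r, drop_out = logistic_regression(input_dim=30, output_dim=3)
144 | # cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y))
145 | # train_step = tf.train.AdamOptimizer(learning_r).minimize(cost)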
--------------------------------------------------------------------------------
/saved_models/lr-v2-3000.data-00000-of-00001:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/raidastauras/Trading-Bot/5b375d80e13923596d0bc902a761dc78e6437ef2/saved_models/lr-v2-3000.data-00000-of-00001
--------------------------------------------------------------------------------
/saved_models/lr-v2-3000.index:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/raidastauras/Trading-Bot/5b375d80e13923596d0bc902a761dc78e6437ef2/saved_models/lr-v2-3000.index
--------------------------------------------------------------------------------
/saved_models/lr-v2-3000.meta:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/raidastauras/Trading-Bot/5b375d80e13923596d0bc902a761dc78e6437ef2/saved_models/lr-v2-3000.meta
--------------------------------------------------------------------------------
/train_cnn_v1.py:
--------------------------------------------------------------------------------
1 | """
2 | Training cnn model
3 |
4 | Dropout and lr to placeholders
5 | random batches
6 | """
7 |
8 | import numpy as np
9 | import pylab as pl
10 | import tensorflow as tf
11 | from helpers.utils import price_to_binary_target, extract_timeseries_from_oanda_data, train_test_validation_split
12 | from helpers.utils import remove_nan_rows, get_signal, get_data_batch, get_cnn_input_output
13 | from models import cnn
14 | from helpers.get_features import get_features, min_max_scaling
15 |
16 |
17 | # other-params
18 | np.set_printoptions(linewidth=75*3+5, edgeitems=6)
19 | pl.rcParams.update({'font.size': 6})
20 |
21 | # hyper-params
22 | batch_size = 1024
23 | learning_rate = 0.002
24 | drop_keep_prob = 0.8
25 | value_moving_average = 50
26 | split = (0.5, 0.3, 0.2)
27 | plotting = False
28 | saving = False
29 | time_steps = 4
30 |
31 | # load data
32 | oanda_data = np.load('data\\EUR_USD_H1.npy') # [-50000:]
33 | output_data_raw = price_to_binary_target(oanda_data, delta=0.00027)
34 | price_data_raw = extract_timeseries_from_oanda_data(oanda_data, ['closeMid'])
35 | input_data_raw, input_data_dummy_raw = get_features(oanda_data)
36 | price_data_raw = np.concatenate([[[0]],
37 |                                  (price_data_raw[1:] - price_data_raw[:-1]) / (price_data_raw[:-1] + 1e-10)], axis=0)  # divide by the previous price, as in helpers
38 |
39 | # prepare data
40 | input_data, output_data, input_data_dummy, price_data = \
41 | remove_nan_rows([input_data_raw, output_data_raw, input_data_dummy_raw, price_data_raw])
42 | input_data_scaled_no_dummies = (input_data - min_max_scaling[1, :]) / (min_max_scaling[0, :] - min_max_scaling[1, :])
43 | input_data = np.concatenate([input_data_scaled_no_dummies, input_data_dummy], axis=1)
44 | input_data, output_data = get_cnn_input_output(input_data, output_data, time_steps=time_steps)
45 | price_data = price_data[-len(input_data):]
46 |
47 | # split to train, test and cross validation
48 | input_train, input_test, input_cv, output_train, output_test, output_cv, price_train, price_test, price_cv = \
49 | train_test_validation_split([input_data, output_data, price_data], split=split)
50 |
51 | # get dims
52 | _, input_dim, _, _ = np.shape(input_data)
53 | _, output_dim = np.shape(output_data)
54 |
55 | # forward-propagation
56 | x, y, logits, y_, learning_r, drop_out = cnn(input_dim, output_dim, time_steps=time_steps, filter=[3, 6])
57 |
58 | # tf cost and optimizer
59 | cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y))
60 | train_step = tf.train.AdamOptimizer(learning_r).minimize(cost)
61 |
62 | # init session
63 | cost_hist_train, cost_hist_test, value_hist_train, value_hist_test, value_hist_cv, value_hist_train_ma, \
64 | value_hist_test_ma, value_hist_cv_ma, step, step_hist, saving_score = [], [], [], [], [], [], [], [], 0, [], 0.05
65 | saver = tf.train.Saver()
66 | init = tf.global_variables_initializer()
67 | sess = tf.Session()
68 | sess.run(init)
69 |
70 | # train
71 | while True:
72 |
73 | if step == 30000:
74 | break
75 |
76 | # train model
77 | x_train, y_train = get_data_batch([input_train, output_train], batch_size, sequential=False)
78 | _, cost_train = sess.run([train_step, cost],
79 | feed_dict={x: x_train, y: y_train, learning_r: learning_rate, drop_out: drop_keep_prob})
80 |
81 | # keep track of stuff
82 | step += 1
83 | if step % 100 == 0 or step == 1:
84 |
85 | # get y_ predictions
86 | y_train_pred = sess.run(y_, feed_dict={x: input_train, drop_out: 1.0})  # dropout off at evaluation
87 | cost_test, y_test_pred = sess.run([cost, y_], feed_dict={x: input_test, y: output_test, drop_out: 1.0})
88 | y_cv_pred = sess.run(y_, feed_dict={x: input_cv, drop_out: 1.0})
89 |
90 | # get portfolio value
91 | signal_train, signal_test, signal_cv = get_signal(y_train_pred), get_signal(y_test_pred), get_signal(y_cv_pred)
92 | value_train = 1 + np.cumsum(np.sum(signal_train[:-1] * price_train[1:], axis=1))
93 | value_test = 1 + np.cumsum(np.sum(signal_test[:-1] * price_test[1:], axis=1))
94 | value_cv = 1 + np.cumsum(np.sum(signal_cv[:-1] * price_cv[1:], axis=1))
95 |
96 | # save history
97 | step_hist.append(step)
98 | cost_hist_train.append(cost_train)
99 | cost_hist_test.append(cost_test)
100 | value_hist_train.append(value_train[-1])
101 | value_hist_test.append(value_test[-1])
102 | value_hist_cv.append(value_cv[-1])
103 | value_hist_train_ma.append(np.mean(value_hist_train[-value_moving_average:]))
104 | value_hist_test_ma.append(np.mean(value_hist_test[-value_moving_average:]))
105 | value_hist_cv_ma.append(np.mean(value_hist_cv[-value_moving_average:]))
106 |
107 | print('Step {}: train {:.4f}, test {:.4f}'.format(step, cost_train, cost_test))
108 |
109 | if plotting:
110 |
111 | pl.figure(1, figsize=(3, 7), dpi=80, facecolor='w', edgecolor='k')
112 |
113 | pl.subplot(211)
114 | pl.title('cost function')
115 | pl.plot(step_hist, cost_hist_train, color='darkorange', linewidth=0.3)
116 | pl.plot(step_hist, cost_hist_test, color='dodgerblue', linewidth=0.3)
117 |
118 | pl.subplot(212)
119 | pl.title('Portfolio value')
120 | pl.plot(step_hist, value_hist_train, color='darkorange', linewidth=0.3)
121 | pl.plot(step_hist, value_hist_test, color='dodgerblue', linewidth=0.3)
122 | pl.plot(step_hist, value_hist_cv, color='magenta', linewidth=1)
123 | pl.plot(step_hist, value_hist_train_ma, color='tomato', linewidth=1.5)
124 | pl.plot(step_hist, value_hist_test_ma, color='royalblue', linewidth=1.5)
125 | pl.plot(step_hist, value_hist_cv_ma, color='black', linewidth=1.5)
126 | pl.pause(1e-10)
127 |
128 | # save if some complicated rules
129 | if saving:
130 | current_score = 0 if value_test[-1] < 0.01 or value_cv[-1] < 0.01 \
131 | else np.average([value_test[-1], value_cv[-1]])
132 | saving_score = current_score if saving_score < current_score else saving_score
133 | if saving_score == current_score and saving_score > 0.05:
134 | saver.save(sess, 'saved_models/cnn-v1-avg_score{:.3f}'.format(current_score), global_step=step)
135 | print('Model saved. Average score: {:.2f}'.format(current_score))
136 |
137 | pl.figure(2)
138 | pl.plot(value_test, linewidth=0.2)
139 | pl.plot(value_cv, linewidth=2)
140 | pl.pause(1e-10)
141 |
142 |
143 |
--------------------------------------------------------------------------------
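`get_cnn_input_output` is not included in this dump, but the `_, input_dim, _, _ = np.shape(input_data)` unpacking above implies it emits 4-D windows, presumably of shape (samples, features, time_steps, 1). A minimal sketch of such a windowing transform, as a hypothetical stand-in rather than the repo's actual helper:

```python
import numpy as np

def sliding_windows(features, targets, time_steps=4):
    """Hypothetical stand-in for get_cnn_input_output: stack the last
    `time_steps` feature rows into a (n_features, time_steps, 1) sample,
    labelled with the window's final target."""
    windows = np.stack([features[i - time_steps:i].T[:, :, None]
                        for i in range(time_steps, len(features) + 1)])
    return windows, targets[time_steps - 1:]

x = np.arange(20, dtype=float).reshape(10, 2)   # 10 rows, 2 features
y = np.arange(10).reshape(10, 1)
xw, yw = sliding_windows(x, y, time_steps=4)
print(xw.shape, yw.shape)                       # (7, 2, 4, 1) (7, 1)
```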
/train_cnn_v2.py:
--------------------------------------------------------------------------------
1 | """
2 | Training lstm v2:
3 | using model to allocate funds, i.e. maximizing return without target labels.
4 |
5 | """
6 |
7 | import numpy as np
8 | import pylab as pl
9 | import tensorflow as tf
10 | from helpers.utils import extract_timeseries_from_oanda_data, train_test_validation_split
11 | from helpers.utils import remove_nan_rows, get_data_batch, get_cnn_input_output
12 | from models import cnn
13 | from helpers.get_features import get_features, min_max_scaling
14 |
15 |
16 | # other-params
17 | np.set_printoptions(linewidth=75*3+5, edgeitems=6)
18 | pl.rcParams.update({'font.size': 6})
19 |
20 | # hyper-params
21 | batch_size = 1024
22 | learning_rate = 0.002
23 | drop_keep_prob = 0.2
24 | value_moving_average = 50
25 | split = (0.5, 0.3, 0.2)
26 | plotting = False
27 | saving = False
28 | time_steps = 4
29 |
30 | # load data
31 | oanda_data = np.load('data\\EUR_USD_H1.npy') # [-50000:]
32 | price_data_raw = extract_timeseries_from_oanda_data(oanda_data, ['closeMid'])
33 | input_data_raw, input_data_dummy = get_features(oanda_data)
34 | price_data_raw = np.concatenate([[[0]],
35 | (price_data_raw[1:] - price_data_raw[:-1]) / (price_data_raw[1:] + 1e-10)], axis=0)
36 |
37 | # prepare data
38 | input_data, price_data, input_data_dummy = remove_nan_rows([input_data_raw, price_data_raw, input_data_dummy])
39 | input_data_scaled_no_dummies = (input_data - min_max_scaling[1, :]) / (min_max_scaling[0, :] - min_max_scaling[1, :])
40 | input_data_scaled = np.concatenate([input_data_scaled_no_dummies, input_data_dummy], axis=1)
41 | input_data, _ = get_cnn_input_output(input_data_scaled, np.zeros_like(input_data_scaled), time_steps=time_steps)
42 | price_data = price_data[-len(input_data):]
43 |
44 | # split to train,test and cross validation
45 | input_train, input_test, input_cv, price_train, price_test, price_cv = \
46 | train_test_validation_split([input_data, price_data], split=split)
47 |
48 | # get dims
49 | _, input_dim, _, _ = np.shape(input_train)
50 |
51 | # forward-propagation
52 | x, y, logits, y_, learning_r, drop_out = cnn(input_dim, 3, time_steps=time_steps, filter=[1, 1])
53 |
54 | # tf cost and optimizer
55 | price_h = tf.placeholder(tf.float32, [None, 1])
56 | signals = tf.constant([[1., -1., 0.]])
57 | cost = (tf.reduce_mean(y_ * signals * price_h * 100)) # profit function
58 | train_step = tf.train.AdamOptimizer(learning_r).minimize(-cost)
59 |
60 | # init session
61 | cost_hist_train, cost_hist_test, value_hist_train, value_hist_test, value_hist_cv, value_hist_train_ma, \
62 | value_hist_test_ma, value_hist_cv_ma, step, step_hist, saving_score = [], [], [], [], [], [], [], [], 0, [], 0.05
63 | saver = tf.train.Saver()
64 | init = tf.global_variables_initializer()
65 | sess = tf.Session()
66 | sess.run(init)
67 |
68 | # train
69 | while True:
70 |
71 | if step == 30000:
72 | break
73 |
74 | # train model
75 | x_train, price_batch = get_data_batch([input_train[:-1], price_train[1:]], batch_size, sequential=False)
76 | _, cost_train = sess.run([train_step, cost],
77 | feed_dict={x: x_train, price_h: price_batch,
78 | learning_r: learning_rate, drop_out: drop_keep_prob})
79 |
80 | # keep track of stuff
81 | step += 1
82 | if step % 100 == 0 or step == 1:
83 |
84 | # get y_ predictions
85 | y_train_pred = sess.run(y_, feed_dict={x: input_train, drop_out: 1.0})  # dropout off at evaluation
86 | y_test_pred, cost_test = sess.run([y_, cost], feed_dict={x: input_test[:-1], price_h: price_test[1:], drop_out: 1.0})
87 | y_cv_pred = sess.run(y_, feed_dict={x: input_cv, drop_out: 1.0})
88 |
89 | # get portfolio value
90 | value_train = 1 + np.cumsum(np.sum(y_train_pred[:-1] * [1., -1., 0.] * price_train[1:], axis=1))
91 | value_test = 1 + np.cumsum(np.sum(y_test_pred * [1., -1., 0.] * price_test[1:], axis=1))
92 | value_cv = 1 + np.cumsum(np.sum(y_cv_pred[:-1] * [1., -1., 0.] * price_cv[1:], axis=1))
93 |
94 | # save history
95 | step_hist.append(step)
96 | cost_hist_train.append(cost_train)
97 | cost_hist_test.append(cost_test)
98 | value_hist_train.append(value_train[-1])
99 | value_hist_test.append(value_test[-1])
100 | value_hist_cv.append(value_cv[-1])
101 | value_hist_train_ma.append(np.mean(value_hist_train[-value_moving_average:]))
102 | value_hist_test_ma.append(np.mean(value_hist_test[-value_moving_average:]))
103 | value_hist_cv_ma.append(np.mean(value_hist_cv[-value_moving_average:]))
104 |
105 | print('Step {}: train {:.4f}, test {:.4f}'.format(step, cost_train, cost_test))
106 |
107 | if plotting:
108 |
109 | pl.figure(1, figsize=(3, 7), dpi=80, facecolor='w', edgecolor='k')
110 |
111 | pl.subplot(211)
112 | pl.title('Objective function')
113 | pl.plot(step_hist, cost_hist_train, color='darkorange', linewidth=0.3)
114 | pl.plot(step_hist, cost_hist_test, color='dodgerblue', linewidth=0.3)
115 |
116 | pl.subplot(212)
117 | pl.title('Portfolio value')
118 | pl.plot(step_hist, value_hist_train, color='darkorange', linewidth=0.3)
119 | pl.plot(step_hist, value_hist_test, color='dodgerblue', linewidth=0.3)
120 | pl.plot(step_hist, value_hist_cv, color='magenta', linewidth=1)
121 | pl.plot(step_hist, value_hist_train_ma, color='tomato', linewidth=1.5)
122 | pl.plot(step_hist, value_hist_test_ma, color='royalblue', linewidth=1.5)
123 | pl.plot(step_hist, value_hist_cv_ma, color='black', linewidth=1.5)
124 | pl.pause(1e-10)
125 |
126 | # save if some complicated rules
127 | if saving:
128 | current_score = 0 if value_test[-1] < 0.01 or value_cv[-1] < 0.01 \
129 | else np.average([value_test[-1], value_cv[-1]])
130 | saving_score = current_score if saving_score < current_score else saving_score
131 | if saving_score == current_score and saving_score > 0.05:
132 | saver.save(sess, 'saved_models/cnn-v2-avg_score{:.3f}'.format(current_score), global_step=step)
133 | print('Model saved. Average score: {:.2f}'.format(current_score))
134 |
135 | pl.figure(2)
136 | pl.plot(value_train, linewidth=1)
137 | pl.plot(value_test, linewidth=1)
138 | pl.plot(value_cv, linewidth=1)
139 | pl.pause(1e-10)
140 |
141 |
--------------------------------------------------------------------------------
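The v2 objective needs no labels: the softmax output `y_` is read as an allocation over [long, short, flat], multiplied elementwise by the position sign and the next hour's return, and the optimizer maximizes the mean. A small numpy illustration of the arithmetic, with invented numbers:

```python
import numpy as np

alloc = np.array([[0.7, 0.2, 0.1],          # hour 0: mostly long
                  [0.1, 0.8, 0.1]])         # hour 1: mostly short
signals = np.array([[1., -1., 0.]])         # long earns +return, short earns -return, flat earns nothing
next_ret = np.array([[0.001], [-0.002]])    # next-period price changes

pnl = alloc * signals * next_ret            # per-position contribution
print(pnl.sum(axis=1))                      # [0.0005 0.0014] -> both hours profitable
print((pnl * 100).mean())                   # ~0.0317, the scalar the optimizer maximizes
```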
/train_logistic_regression_v1.py:
--------------------------------------------------------------------------------
1 | """
2 | Training logistic regression model V1:
3 | predicting price movement.
4 | """
5 |
6 | import numpy as np
7 | import pylab as pl
8 | import tensorflow as tf
9 | from helpers.utils import price_to_binary_target, extract_timeseries_from_oanda_data, train_test_validation_split
10 | from helpers.utils import remove_nan_rows, get_signal, get_data_batch
11 | from models import logistic_regression
12 | from helpers.get_features import get_features, min_max_scaling
13 |
14 |
15 | # other-params
16 | np.set_printoptions(linewidth=75*3+5, edgeitems=6)
17 | pl.rcParams.update({'font.size': 6})
18 | np.random.seed(0)
19 | tf.set_random_seed(0)
20 |
21 | # hyper-params
22 | batch_size = 1024
23 | learning_rate = 0.002
24 | drop_keep_prob = 1
25 | value_moving_average = 50
26 | split = (0.5, 0.3, 0.2)
27 | plotting = False
28 | saving = False
29 |
30 | # load data
31 | oanda_data = np.load('data\\EUR_USD_H1.npy')[-50000:]
32 | output_data_raw = price_to_binary_target(oanda_data, delta=0.0001)
33 | price_data_raw = extract_timeseries_from_oanda_data(oanda_data, ['closeMid'])
34 | input_data_raw, input_data_dummy_raw = get_features(oanda_data)
35 | price_data_raw = np.concatenate([[[0]],
36 | (price_data_raw[1:] - price_data_raw[:-1]) / (price_data_raw[1:] + 1e-10)], axis=0)
37 |
38 | # prepare data
39 | input_data, output_data, input_data_dummy, price_data = \
40 | remove_nan_rows([input_data_raw, output_data_raw, input_data_dummy_raw, price_data_raw])
41 | input_data_scaled_no_dummies = (input_data - min_max_scaling[1, :]) / (min_max_scaling[0, :] - min_max_scaling[1, :])
42 | input_data_scaled = np.concatenate([input_data_scaled_no_dummies, input_data_dummy], axis=1)
43 |
44 | # split to train, test and cross validation
45 | input_train, input_test, input_cv, output_train, output_test, output_cv, price_train, price_test, price_cv = \
46 | train_test_validation_split([input_data_scaled, output_data, price_data], split=split)
47 |
48 | # get dims
49 | _, input_dim = np.shape(input_train)
50 | _, output_dim = np.shape(output_train)
51 |
52 | # forward-propagation
53 | x, y, logits, y_, learning_r, drop_out = logistic_regression(input_dim, output_dim)
54 |
55 | # tf cost and optimizer
56 | cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y))
57 | train_step = tf.train.AdamOptimizer(learning_r).minimize(cost)
58 |
59 | # init session
60 | cost_hist_train, cost_hist_test, value_hist_train, value_hist_test, value_hist_cv, value_hist_train_ma, \
61 | value_hist_test_ma, value_hist_cv_ma, step, step_hist, saving_score = [], [], [], [], [], [], [], [], 0, [], 0.05
62 | saver = tf.train.Saver()
63 | init = tf.global_variables_initializer()
64 | sess = tf.Session()
65 | sess.run(init)
66 |
67 | # main loop
68 | while True:
69 |
70 | if step == 2000:
71 | break
72 |
73 | # train model
74 | x_train, y_train = get_data_batch([input_train, output_train], batch_size, sequential=False)
75 | _, cost_train = sess.run([train_step, cost],
76 | feed_dict={x: x_train, y: y_train, learning_r: learning_rate, drop_out: drop_keep_prob})
77 |
78 | # keep track of stuff
79 | step += 1
80 | if step % 1 == 0 or step == 1:  # i.e. every step; the longer-running scripts use % 100
81 |
82 | # get y_ predictions
83 | y_train_pred = sess.run(y_, feed_dict={x: input_train, drop_out: drop_keep_prob})
84 | y_test_pred, cost_test = sess.run([y_, cost], feed_dict={x: input_test, y: output_test, drop_out: drop_keep_prob})
85 | y_cv_pred = sess.run(y_, feed_dict={x: input_cv, drop_out: drop_keep_prob})
86 |
87 | # get portfolio value
88 | signal_train, signal_test, signal_cv = get_signal(y_train_pred), get_signal(y_test_pred), get_signal(y_cv_pred)
89 | value_train = 1 + np.cumsum(np.sum(signal_train[:-1] * price_train[1:], axis=1))
90 | value_test = 1 + np.cumsum(np.sum(signal_test[:-1] * price_test[1:], axis=1))
91 | value_cv = 1 + np.cumsum(np.sum(signal_cv[:-1] * price_cv[1:], axis=1))
92 |
93 | # save history
94 | step_hist.append(step)
95 | cost_hist_train.append(cost_train)
96 | cost_hist_test.append(cost_test)
97 | value_hist_train.append(value_train[-1])
98 | value_hist_test.append(value_test[-1])
99 | value_hist_cv.append(value_cv[-1])
100 | value_hist_train_ma.append(np.mean(value_hist_train[-value_moving_average:]))
101 | value_hist_test_ma.append(np.mean(value_hist_test[-value_moving_average:]))
102 | value_hist_cv_ma.append(np.mean(value_hist_cv[-value_moving_average:]))
103 |
104 | print('Step {}: train {:.4f}, test {:.4f}'.format(step, cost_train, cost_test))
105 |
106 | if plotting:
107 |
108 | pl.figure(1, figsize=(3, 7), dpi=80, facecolor='w', edgecolor='k')
109 |
110 | pl.subplot(211)
111 | pl.title('cost function')
112 | pl.plot(step_hist, cost_hist_train, color='darkorange', linewidth=0.3)
113 | pl.plot(step_hist, cost_hist_test, color='dodgerblue', linewidth=0.3)
114 |
115 | pl.subplot(212)
116 | pl.title('Portfolio value')
117 | pl.plot(step_hist, value_hist_train, color='darkorange', linewidth=0.3)
118 | pl.plot(step_hist, value_hist_test, color='dodgerblue', linewidth=0.3)
119 | pl.plot(step_hist, value_hist_cv, color='magenta', linewidth=1)
120 | pl.plot(step_hist, value_hist_train_ma, color='tomato', linewidth=1.5)
121 | pl.plot(step_hist, value_hist_test_ma, color='royalblue', linewidth=1.5)
122 | pl.plot(step_hist, value_hist_cv_ma, color='black', linewidth=1.5)
123 | pl.pause(1e-10)
124 |
125 | # save if some complicated rules
126 | if saving:
127 | current_score = 0 if value_test[-1] < 0.01 or value_cv[-1] < 0.01 \
128 | else np.average([value_test[-1], value_cv[-1]])
129 | saving_score = current_score if saving_score < current_score else saving_score
130 | if saving_score == current_score and saving_score > 0.05:
131 | saver.save(sess, 'saved_models/lr-v1-avg_score{:.3f}'.format(current_score), global_step=step)
132 | print('Model saved. Average score: {:.2f}'.format(current_score))
133 |
134 | pl.figure(2)
135 | pl.plot(value_train, linewidth=1)
136 | pl.plot(value_test, linewidth=1)
137 | pl.plot(value_cv, linewidth=1)
138 | pl.pause(1e-10)
139 |
140 |
--------------------------------------------------------------------------------
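Note the one-period lag in the portfolio bookkeeping: the signal formed at hour t is paid the price change of hour t+1 (`signal[:-1] * price[1:]`), so the model is never credited for a move it could not have traded on. The exact vectors `get_signal` emits are not shown in this dump; with signed positions the arithmetic looks like:

```python
import numpy as np

position = np.array([1., -1., 0.])       # long at t0, short at t1, flat at t2 (assumed encoding)
ret = np.array([0.001, 0.002, -0.001])   # hourly price changes

pnl = position[:-1] * ret[1:]            # position at t times return at t+1
value = 1 + np.cumsum(pnl)               # [1.002 1.003]
print(value)
```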
/train_logistic_regression_v2.py:
--------------------------------------------------------------------------------
1 | """
2 | Training logistic regression v2:
3 | using model to allocate funds, i.e. maximizing return without correct labels.
4 |
5 | Things to work out:
6 | 1. price_data_raw percentage or pips? (percentage)
7 | 2. objective function normalization (yearly percentage return..? etc)
8 |
9 | Other options to set:
10 | np.set_printoptions(linewidth=75*3+5, edgeitems=6)
11 | pl.rcParams.update({'font.size': 6})
12 | """
13 |
14 |
15 | import numpy as np
16 | import pylab as pl
17 | import tensorflow as tf
18 | from helpers.utils import extract_timeseries_from_oanda_data, train_test_validation_split
19 | from helpers.utils import remove_nan_rows, get_data_batch
20 | from models import logistic_regression
21 | from helpers.get_features import get_features, min_max_scaling
22 |
23 |
24 | # other-params
25 | np.set_printoptions(linewidth=75*3+5, edgeitems=6)
26 | pl.rcParams.update({'font.size': 6})
27 |
28 | # hyper-params
29 | batch_size = 1024
30 | learning_rate = 0.002
31 | drop_keep_prob = 0.3
32 | value_moving_average = 50
33 | split = (0.5, 0.3, 0.2)
34 | plotting = False
35 | saving = False
36 |
37 | # load data
38 | oanda_data = np.load('data\\EUR_USD_H1.npy')[-50000:]
39 | price_data_raw = extract_timeseries_from_oanda_data(oanda_data, ['closeMid'])
40 | input_data_raw, input_data_dummy = get_features(oanda_data)
41 | price_data_raw = np.concatenate([[[0]],
42 | (price_data_raw[1:] - price_data_raw[:-1]) / (price_data_raw[1:] + 1e-10)], axis=0)
43 |
44 | # prepare data
45 | input_data, price_data, input_data_dummy = remove_nan_rows([input_data_raw, price_data_raw, input_data_dummy])
46 | input_data_scaled_no_dummies = (input_data - min_max_scaling[1, :]) / (min_max_scaling[0, :] - min_max_scaling[1, :])
47 | input_data_scaled = np.concatenate([input_data_scaled_no_dummies, input_data_dummy], axis=1)
48 |
49 | # split to train, test and cross validation
50 | input_train, input_test, input_cv, price_train, price_test, price_cv = \
51 | train_test_validation_split([input_data_scaled, price_data], split=split)
52 |
53 | # get dims
54 | _, input_dim = np.shape(input_data_scaled)
55 |
56 | # forward-propagation
57 | x, y, logits, y_, learning_r, drop_out = logistic_regression(input_dim, 3)
58 |
59 | # tf cost and optimizer
60 | price_h = tf.placeholder(tf.float32, [None, 1])
61 | signals = tf.constant([[1., -1., 0.]])
62 | cost = tf.reduce_mean(y_ * signals * price_h * 100) # * 24 * 251 # objective fun: annualized average hourly return
63 | train_step = tf.train.AdamOptimizer(learning_r).minimize(-cost)
64 |
65 | # init session
66 | cost_hist_train, cost_hist_test, value_hist_train, value_hist_test, value_hist_cv, value_hist_train_ma, \
67 | value_hist_test_ma, value_hist_cv_ma, step, step_hist, saving_score = [], [], [], [], [], [], [], [], 0, [], 0.05
68 | saver = tf.train.Saver()
69 | init = tf.global_variables_initializer()
70 | sess = tf.Session()
71 | sess.run(init)
72 |
73 | # main loop
74 | while True:
75 |
76 | if step == 30000:
77 | break
78 |
79 | # train model
80 | x_train, price_batch = get_data_batch([input_train[:-1], price_train[1:]], batch_size, sequential=False)
81 | _, cost_train = sess.run([train_step, cost],
82 | feed_dict={x: x_train, price_h: price_batch,
83 | learning_r: learning_rate, drop_out: drop_keep_prob})
84 |
85 | # keep track of stuff
86 | step += 1
87 | if step % 100 == 0 or step == 1:
88 |
89 | # get y_ predictions
90 | y_train_pred = sess.run(y_, feed_dict={x: input_train, drop_out: 1.0})  # dropout off at evaluation
91 | y_test_pred, cost_test = sess.run([y_, cost], feed_dict={x: input_test[:-1], price_h: price_test[1:], drop_out: 1.0})
92 | y_cv_pred = sess.run(y_, feed_dict={x: input_cv, drop_out: 1.0})
93 |
94 | # get portfolio value
95 | value_train = 1 + np.cumsum(np.sum(y_train_pred[:-1] * [1., -1., 0.] * price_train[1:], axis=1))
96 | value_test = 1 + np.cumsum(np.sum(y_test_pred * [1., -1., 0.] * price_test[1:], axis=1))
97 | value_cv = 1 + np.cumsum(np.sum(y_cv_pred[:-1] * [1., -1., 0.] * price_cv[1:], axis=1))
98 |
99 | # save history
100 | step_hist.append(step)
101 | cost_hist_train.append(cost_train)
102 | cost_hist_test.append(cost_test)
103 | value_hist_train.append(value_train[-1])
104 | value_hist_test.append(value_test[-1])
105 | value_hist_cv.append(value_cv[-1])
106 | value_hist_train_ma.append(np.mean(value_hist_train[-value_moving_average:]))
107 | value_hist_test_ma.append(np.mean(value_hist_test[-value_moving_average:]))
108 | value_hist_cv_ma.append(np.mean(value_hist_cv[-value_moving_average:]))
109 |
110 | print('Step {}: train {:.4f}, test {:.4f}'.format(step, cost_train, cost_test))
111 |
112 | if plotting:
113 |
114 | pl.figure(1, figsize=(3, 7), dpi=80, facecolor='w', edgecolor='k')
115 |
116 | pl.subplot(211)
117 | pl.title('Objective function')
118 | pl.plot(step_hist, cost_hist_train, color='darkorange', linewidth=0.3)
119 | pl.plot(step_hist, cost_hist_test, color='dodgerblue', linewidth=0.3)
120 |
121 | pl.subplot(212)
122 | pl.title('Portfolio value')
123 | pl.plot(step_hist, value_hist_train, color='darkorange', linewidth=0.3)
124 | pl.plot(step_hist, value_hist_test, color='dodgerblue', linewidth=0.3)
125 | pl.plot(step_hist, value_hist_cv, color='magenta', linewidth=1)
126 | pl.plot(step_hist, value_hist_train_ma, color='tomato', linewidth=1.5)
127 | pl.plot(step_hist, value_hist_test_ma, color='royalblue', linewidth=1.5)
128 | pl.plot(step_hist, value_hist_cv_ma, color='black', linewidth=1.5)
129 | pl.pause(1e-10)
130 |
131 | # save if some complicated rules
132 | if saving:
133 | current_score = 0 if value_test[-1] < 0.01 or value_cv[-1] < 0.01 \
134 | else np.average([value_test[-1], value_cv[-1]])
135 | saving_score = current_score if saving_score < current_score else saving_score
136 | if saving_score == current_score and saving_score > 0.05:
137 | saver.save(sess, 'saved_models/lr-v2-avg_score{:.3f}'.format(current_score), global_step=step)
138 | print('Model saved. Average score: {:.2f}'.format(current_score))
139 |
140 | pl.figure(2)
141 | pl.plot(value_train, linewidth=1)
142 | pl.plot(value_test, linewidth=1)
143 | pl.plot(value_cv, linewidth=1)
144 | pl.pause(1e-10)
145 |
146 |
--------------------------------------------------------------------------------
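Judging by the formula, `min_max_scaling` is a stored 2×n array of per-feature extremes (row 0 the maxima, row 1 the minima) computed once over the training history, so live inputs can be scaled with the same constants; the 0/1 dummies are concatenated unscaled. A sketch under those assumptions, with invented values:

```python
import numpy as np

min_max_scaling = np.array([[2.0, 10.0],   # row 0: per-feature max (assumed layout)
                            [0.0,  5.0]])  # row 1: per-feature min
features = np.array([[1.0, 7.5],
                     [2.0, 5.0]])
dummies = np.array([[1., 0.],
                    [0., 1.]])

scaled = (features - min_max_scaling[1, :]) / (min_max_scaling[0, :] - min_max_scaling[1, :])
inputs = np.concatenate([scaled, dummies], axis=1)
print(inputs)   # [[0.5 0.5 1. 0.] [1. 0. 0. 1.]]
```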
/train_logistic_regression_v3.py:
--------------------------------------------------------------------------------
1 | """
2 | Train logistic regression with many features, PCA, polynomials
3 |
4 | """
5 |
6 | import numpy as np
7 | import pylab as plt
8 | from helpers.get_features import get_features_v2
9 | from helpers.utils import portfolio_value, price_to_binary_target, get_data_batch
10 | from helpers.utils import get_signal, remove_nan_rows, train_test_validation_split, plot_roc_curve
11 | from helpers.utils import min_max_scale, get_pca, get_poloynomials
12 | import tensorflow as tf
13 | from models import logistic_regression
14 |
15 |
16 | # hyper-params
17 | batch_size = 1024
18 | learning_rate = 0.002
19 | drop_keep_prob = 0.4
20 | value_moving_average = 50
21 | split = (0.5, 0.3, 0.2)
22 | plotting = False
23 | saving = False
24 | transaction_c = 0.000
25 |
26 | # load data
27 | oanda_data = np.load('data\\EUR_USD_H1.npy')[-50000:]
28 | y_data = price_to_binary_target(oanda_data, delta=0.000275)
29 | x_data = get_features_v2(oanda_data, time_periods=[10, 25, 50, 120, 256], return_numpy=False)
30 |
31 | # separate, rearrange and remove nans
32 | price = x_data['price'].as_matrix().reshape(-1, 1)
33 | price_change = x_data['price_delta'].as_matrix().reshape(-1, 1)
34 | x_data = x_data.drop(['price', 'price_delta'], axis=1).as_matrix()
35 | price, price_change, x_data, y_data = remove_nan_rows([price, price_change, x_data, y_data])
36 |
37 | # split to train, test and cross validation
38 | input_train, input_test, input_cv, output_train, output_test, output_cv, price_train, price_test, price_cv = \
39 | train_test_validation_split([x_data, y_data, price_change], split=split)
40 |
41 | # pre-process data: scale, pca, polynomial
42 | input_train, input_test, input_cv = min_max_scale(input_train, input_test, input_cv, std_dev_threshold=2.5)
43 | # input_train, input_test, input_cv = get_pca(input_train, input_test, input_cv, threshold=0.01)
44 | input_train, input_test, input_cv = get_poloynomials(input_train, input_test, input_cv, degree=2)
45 |
46 | # get dims
47 | _, input_dim = np.shape(input_train)
48 | _, output_dim = np.shape(output_train)
49 |
50 | # forward-propagation
51 | x, y, logits, y_, learning_r, drop_out = logistic_regression(input_dim, output_dim)
52 |
53 | # tf cost and optimizer
54 | cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y))
55 | train_step = tf.train.AdamOptimizer(learning_r).minimize(cost)
56 |
57 | # init session
58 | cost_hist_train, cost_hist_test, value_hist_train, value_hist_test, value_hist_cv, value_hist_train_ma, \
59 | value_hist_test_ma, value_hist_cv_ma, step, step_hist, saving_score = [], [], [], [], [], [], [], [], 0, [], 0.05
60 | saver = tf.train.Saver()
61 | init = tf.global_variables_initializer()
62 | sess = tf.Session()
63 | sess.run(init)
64 |
65 | # main loop
66 | for _ in range(5000):
67 |
68 | if step % 1000 == 0:
69 | learning_rate *= 0.8
70 |
71 | # train model
72 | x_train, y_train = get_data_batch([input_train, output_train], batch_size, sequential=False)
73 | _, cost_train = sess.run([train_step, cost],
74 | feed_dict={x: x_train, y: y_train, learning_r: learning_rate, drop_out: drop_keep_prob})
75 |
76 | # keep track of stuff
77 | step += 1
78 | if step % 10 == 0 or step == 1:
79 |
80 | # get y_ predictions
81 | y_train_pred = sess.run(y_, feed_dict={x: input_train, drop_out: 1.0})  # dropout off at evaluation
82 | y_test_pred, cost_test = sess.run([y_, cost], feed_dict={x: input_test, y: output_test, drop_out: 1.0})
83 | y_cv_pred = sess.run(y_, feed_dict={x: input_cv, drop_out: 1.0})
84 |
85 | # get portfolio value
86 | signal_train, signal_test, signal_cv = get_signal(y_train_pred), get_signal(y_test_pred), get_signal(y_cv_pred)
87 | value_train = portfolio_value(price_train, signal_train, trans_costs=transaction_c)
88 | value_test = portfolio_value(price_test, signal_test, trans_costs=transaction_c)
89 | value_cv = portfolio_value(price_cv, signal_cv, trans_costs=transaction_c)
90 |
91 | # save history
92 | step_hist.append(step)
93 | cost_hist_train.append(cost_train)
94 | cost_hist_test.append(cost_test)
95 | value_hist_train.append(value_train[-1])
96 | value_hist_test.append(value_test[-1])
97 | value_hist_cv.append(value_cv[-1])
98 | value_hist_train_ma.append(np.mean(value_hist_train[-value_moving_average:]))
99 | value_hist_test_ma.append(np.mean(value_hist_test[-value_moving_average:]))
100 | value_hist_cv_ma.append(np.mean(value_hist_cv[-value_moving_average:]))
101 |
102 | print('Step {}: train {:.4f}, test {:.4f}'.format(step, cost_train, cost_test))
103 |
104 | if plotting:
105 |
106 | plt.figure(1, figsize=(3, 7), dpi=80, facecolor='w', edgecolor='k')
107 |
108 | plt.subplot(211)
109 | plt.title('cost function')
110 | plt.plot(step_hist, cost_hist_train, color='darkorange', linewidth=0.3)
111 | plt.plot(step_hist, cost_hist_test, color='dodgerblue', linewidth=0.3)
112 |
113 | plt.subplot(212)
114 | plt.title('Portfolio value')
115 | plt.plot(step_hist, value_hist_train, color='darkorange', linewidth=0.3)
116 | plt.plot(step_hist, value_hist_test, color='dodgerblue', linewidth=0.3)
117 | plt.plot(step_hist, value_hist_cv, color='magenta', linewidth=1)
118 | plt.plot(step_hist, value_hist_train_ma, color='tomato', linewidth=1.5)
119 | plt.plot(step_hist, value_hist_test_ma, color='royalblue', linewidth=1.5)
120 | plt.plot(step_hist, value_hist_cv_ma, color='black', linewidth=1.5)
121 | plt.pause(1e-10)
122 |
123 | # save if some complicated rules
124 | if saving:
125 | current_score = 0 if value_test[-1] < 0.01 or value_cv[-1] < 0.01 \
126 | else np.average([value_test[-1], value_cv[-1]])
127 | saving_score = current_score if saving_score < current_score else saving_score
128 | if saving_score == current_score and saving_score > 0.1:
129 | saver.save(sess, 'saved_models/lr-v3-avg_score{:.3f}'.format(current_score), global_step=step)
130 | print('Model saved. Average score: {:.2f}'.format(current_score))
131 |
132 | plt.figure(2)
133 | plt.plot(value_train, linewidth=1)
134 | plt.plot(value_test, linewidth=1)
135 | plt.plot(value_cv, linewidth=1)
136 | plt.pause(1e-10)
137 |
138 | # roc curve
139 | roc_auc_train, fpr_train, tpr_train = plot_roc_curve(y_train_pred, output_train)
140 | roc_auc_test, fpr_test, tpr_test = plot_roc_curve(y_test_pred, output_test)
141 | roc_auc_cv, fpr_cv, tpr_cv = plot_roc_curve(y_cv_pred, output_cv)
142 |
143 | plt.figure(2, figsize=(3, 3), dpi=80, facecolor='w', edgecolor='k')
144 | plt.plot(fpr_train, tpr_train, color='darkorange', lw=2, label='Train area: {:0.2f}'.format(roc_auc_train))
145 | plt.plot(fpr_test, tpr_test, color='dodgerblue', lw=2, label='Test area: {:0.2f}'.format(roc_auc_test))
146 | plt.plot(fpr_cv, tpr_cv, color='magenta', lw=2, label='CV area: {:0.2f}'.format(roc_auc_cv))
147 | plt.plot([0, 1], [0, 1], color='navy', lw=2, linestyle='--')
148 | plt.xlim([0.0, 1.0])
149 | plt.ylim([0.0, 1.0])
150 | plt.xlabel('False Positive Rate')
151 | plt.ylabel('True Positive Rate')
152 | plt.title('Receiver operating characteristic')
153 | plt.legend(loc="lower right")
154 | plt.show()
155 |
--------------------------------------------------------------------------------
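The v3 preprocessing helpers take (train, test, cv) together, which suggests each transform is fitted on the training split and merely applied to the other two. Their internals are not in this dump; a rough scikit-learn analogue (the `std_dev_threshold` clipping is omitted, and the PCA `threshold=0.01` is approximated here as retaining ~99% of the variance):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, PolynomialFeatures
from sklearn.decomposition import PCA

def fit_apply(transform, train, test, cv):
    """Fit on the training split only, then transform all three (no look-ahead)."""
    return transform.fit_transform(train), transform.transform(test), transform.transform(cv)

rng = np.random.default_rng(0)
train, test, cv = rng.normal(size=(100, 6)), rng.normal(size=(40, 6)), rng.normal(size=(20, 6))

train, test, cv = fit_apply(MinMaxScaler(), train, test, cv)
train, test, cv = fit_apply(PCA(n_components=0.99), train, test, cv)
train, test, cv = fit_apply(PolynomialFeatures(degree=2), train, test, cv)
print(train.shape, test.shape, cv.shape)
```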
/train_lstm_v1.py:
--------------------------------------------------------------------------------
1 | """
2 | Training lstm model v1
3 | """
4 |
5 | import numpy as np
6 | import pylab as pl
7 | import tensorflow as tf
8 | from sklearn.preprocessing import minmax_scale
9 | from helpers.utils import price_to_binary_target, extract_timeseries_from_oanda_data, train_test_validation_split
10 | from helpers.utils import remove_nan_rows, get_signal, portfolio_value, get_data_batch, get_lstm_input_output
11 | from models import lstm_nn
12 | from helpers.get_features import get_features, min_max_scaling
13 |
14 |
15 | # other-params
16 | np.set_printoptions(linewidth=75*3+5, edgeitems=6)
17 | pl.rcParams.update({'font.size': 6})
18 |
19 | # hyper-params
20 | batch_size = 1024
21 | learning_rate = 0.001
22 | drop_keep_prob = 0.6
23 | value_moving_average = 50
24 | split = (0.7, 0.2, 0.1)
25 | plotting = False
26 | saving = False
27 | time_steps = 5
28 |
29 | # load data
30 | oanda_data = np.load('data\\EUR_USD_H1.npy') # [-50000:]
31 | output_data_raw = price_to_binary_target(oanda_data, delta=0.00027)
32 | price_data_raw = extract_timeseries_from_oanda_data(oanda_data, ['closeMid'])
33 | input_data_raw, input_data_dummy_raw = get_features(oanda_data)
34 | price_data_raw = np.concatenate([[[0]],
35 | (price_data_raw[1:] - price_data_raw[:-1]) / (price_data_raw[1:] + 1e-10)], axis=0)
36 |
37 | # prepare data
38 | input_data, output_data, input_data_dummy, price_data = \
39 | remove_nan_rows([input_data_raw, output_data_raw, input_data_dummy_raw, price_data_raw])
40 | input_data_scaled_no_dummies = (input_data - min_max_scaling[1, :]) / (min_max_scaling[0, :] - min_max_scaling[1, :])
41 | input_data = np.concatenate([input_data_scaled_no_dummies, input_data_dummy], axis=1)
42 | input_data, output_data = get_lstm_input_output(input_data, output_data, time_steps=time_steps)
43 | price_data = price_data[-len(input_data):]
44 |
45 | # split to train,test and cross validation
46 | input_train, input_test, input_cv, output_train, output_test, output_cv, price_train, price_test, price_cv = \
47 | train_test_validation_split([input_data, output_data, price_data], split=split)
48 |
49 | # get dims
50 | _, _, input_dim = np.shape(input_data)
51 | _, output_dim = np.shape(output_data)
52 |
53 | # forward-propagation
54 | x, y, logits, y_, learning_r, drop_out = lstm_nn(input_dim, output_dim, time_steps=time_steps, n_hidden=[8])
55 |
56 | # tf cost and optimizer
57 | cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y))
58 | train_step = tf.train.AdamOptimizer(learning_r).minimize(cost)
59 |
60 | # init session
61 | cost_hist_train, cost_hist_test, value_hist_train, value_hist_test, value_hist_cv, value_hist_train_ma, \
62 | value_hist_test_ma, value_hist_cv_ma, step, step_hist, saving_score = [], [], [], [], [], [], [], [], 0, [], 0.05
63 | saver = tf.train.Saver()
64 | init = tf.global_variables_initializer()
65 | sess = tf.Session()
66 | sess.run(init)
67 |
68 | # train
69 | while True:
70 |
71 | if step == 30000:
72 | break
73 |
74 | # train model
75 | x_train, y_train = get_data_batch([input_train, output_train], batch_size, sequential=False)
76 | _, cost_train = sess.run([train_step, cost],
77 | feed_dict={x: x_train, y: y_train, learning_r: learning_rate, drop_out: drop_keep_prob})
78 |
79 | # keep track of stuff
80 | step += 1
81 | if step % 100 == 0 or step == 1:
82 |
83 | # get y_ predictions
84 | y_train_pred = sess.run(y_, feed_dict={x: input_train, drop_out: 1.0})  # dropout off at evaluation
85 | cost_test, y_test_pred = sess.run([cost, y_], feed_dict={x: input_test, y: output_test, drop_out: 1.0})
86 | y_cv_pred = sess.run(y_, feed_dict={x: input_cv, drop_out: 1.0})
87 |
88 | # get portfolio value
89 | signal_train, signal_test, signal_cv = get_signal(y_train_pred), get_signal(y_test_pred), get_signal(y_cv_pred)
90 | value_train = 1 + np.cumsum(np.sum(signal_train[:-1] * price_train[1:], axis=1))
91 | value_test = 1 + np.cumsum(np.sum(signal_test[:-1] * price_test[1:], axis=1))
92 | value_cv = 1 + np.cumsum(np.sum(signal_cv[:-1] * price_cv[1:], axis=1))
93 |
94 | # save history
95 | step_hist.append(step)
96 | cost_hist_train.append(cost_train)
97 | cost_hist_test.append(cost_test)
98 | value_hist_train.append(value_train[-1])
99 | value_hist_test.append(value_test[-1])
100 | value_hist_cv.append(value_cv[-1])
101 | value_hist_train_ma.append(np.mean(value_hist_train[-value_moving_average:]))
102 | value_hist_test_ma.append(np.mean(value_hist_test[-value_moving_average:]))
103 | value_hist_cv_ma.append(np.mean(value_hist_cv[-value_moving_average:]))
104 |
105 | print('Step {}: train {:.4f}, test {:.4f}'.format(step, cost_train, cost_test))
106 |
107 | if plotting:
108 |
109 | pl.figure(1, figsize=(3, 7), dpi=80, facecolor='w', edgecolor='k')
110 |
111 | pl.subplot(211)
112 | pl.title('cost function')
113 | pl.plot(step_hist, cost_hist_train, color='darkorange', linewidth=0.3)
114 | pl.plot(step_hist, cost_hist_test, color='dodgerblue', linewidth=0.3)
115 |
116 | pl.subplot(212)
117 | pl.title('Portfolio value')
118 | pl.plot(step_hist, value_hist_train, color='darkorange', linewidth=0.3)
119 | pl.plot(step_hist, value_hist_test, color='dodgerblue', linewidth=0.3)
120 | pl.plot(step_hist, value_hist_cv, color='magenta', linewidth=1)
121 | pl.plot(step_hist, value_hist_train_ma, color='tomato', linewidth=1.5)
122 | pl.plot(step_hist, value_hist_test_ma, color='royalblue', linewidth=1.5)
123 | pl.plot(step_hist, value_hist_cv_ma, color='black', linewidth=1.5)
124 | pl.pause(1e-10)
125 |
126 | # save if some complicated rules
127 | if saving:
128 | current_score = 0 if value_test[-1] < 0.01 or value_cv[-1] < 0.01 \
129 | else np.average([value_test[-1], value_cv[-1]])
130 | saving_score = current_score if saving_score < current_score else saving_score
131 | if saving_score == current_score and saving_score > 0.05:
132 | saver.save(sess, 'saved_models/lstm-v1-avg_score{:.3f}'.format(current_score), global_step=step)
133 | print('Model saved. Average score: {:.2f}'.format(current_score))
134 |
135 | pl.figure(2)
136 | pl.plot(value_test, linewidth=0.2)
137 | pl.plot(value_cv, linewidth=2)
138 | pl.pause(1e-10)
139 |
140 |
141 | def save_plot():
142 | pl.figure(1, figsize=(3, 7), dpi=80, facecolor='w', edgecolor='k')
143 |
144 | pl.subplot(211)
145 | pl.title('cost function')
146 | pl.plot(step_hist, cost_hist_train, color='darkorange', linewidth=0.3)
147 | pl.plot(step_hist, cost_hist_test, color='dodgerblue', linewidth=0.3)
148 |
149 | pl.subplot(212)
150 | pl.title('Portfolio value')
151 | pl.plot(step_hist, value_hist_train, color='darkorange', linewidth=0.3)
152 | pl.plot(step_hist, value_hist_test, color='dodgerblue', linewidth=0.3)
153 | pl.plot(step_hist, value_hist_cv, color='magenta', linewidth=1)
154 | pl.plot(step_hist, value_hist_train_ma, color='tomato', linewidth=1.5)
155 | pl.plot(step_hist, value_hist_test_ma, color='royalblue', linewidth=1.5)
156 | pl.plot(step_hist, value_hist_cv_ma, color='black', linewidth=1.5)
157 | pl.pause(1e-10)
158 |
159 | pl.savefig('lstm_v1_{:.3f}_{:.3f}.png'.format(learning_rate, value_cv[-1]))
160 | pl.close()
161 |
162 | save_plot()
163 |
--------------------------------------------------------------------------------
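For the LSTM the features are stacked into (samples, time_steps, features) windows; the `_, _, input_dim = np.shape(...)` unpacking implies the feature axis comes last. A minimal hypothetical stand-in for `get_lstm_input_output` under that assumption:

```python
import numpy as np

def lstm_windows(features, targets, time_steps=5):
    """Each sample is the last `time_steps` feature rows, labelled with the
    final row's target (assumed behaviour of get_lstm_input_output)."""
    x = np.stack([features[i - time_steps:i]
                  for i in range(time_steps, len(features) + 1)])
    return x, targets[time_steps - 1:]

feats = np.arange(24, dtype=float).reshape(12, 2)
targs = np.arange(12).reshape(12, 1)
xw, yw = lstm_windows(feats, targs)
print(xw.shape, yw.shape)   # (8, 5, 2) (8, 1)
```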
/train_lstm_v2.py:
--------------------------------------------------------------------------------
1 | """
2 | Training lstm v2:
3 | using model to allocate funds, i.e. maximizing return without target labels.
4 |
5 | """
6 |
7 | import numpy as np
8 | import pylab as pl
9 | import tensorflow as tf
10 | from helpers.utils import extract_timeseries_from_oanda_data, train_test_validation_split
11 | from helpers.utils import remove_nan_rows, get_data_batch, get_lstm_input_output
12 | from models import lstm_nn
13 | from helpers.get_features import get_features, min_max_scaling
14 |
15 |
16 | # other-params
17 | np.set_printoptions(linewidth=75*3+5, edgeitems=6)
18 | pl.rcParams.update({'font.size': 6})
19 |
20 | # hyper-params
21 | batch_size = 10024
22 | learning_rate = 0.05
23 | drop_keep_prob = 0.7
24 | value_moving_average = 50
25 | split = (0.7, 0.2, 0.1)
26 | plotting = False
27 | saving = False
28 | time_steps = 6
29 |
30 | # load data
31 | oanda_data = np.load('data\\EUR_USD_H1.npy') # [-50000:]
32 | price_data_raw = extract_timeseries_from_oanda_data(oanda_data, ['closeMid'])
33 | input_data_raw, input_data_dummy = get_features(oanda_data)
34 | price_data_raw = np.concatenate([[[0]],
35 | (price_data_raw[1:] - price_data_raw[:-1]) / (price_data_raw[1:] + 1e-10)], axis=0)
36 |
37 | # prepare data
38 | input_data, price_data, input_data_dummy = remove_nan_rows([input_data_raw, price_data_raw, input_data_dummy])
39 | input_data_scaled_no_dummies = (input_data - min_max_scaling[1, :]) / (min_max_scaling[0, :] - min_max_scaling[1, :])
40 | input_data_scaled = np.concatenate([input_data_scaled_no_dummies, input_data_dummy], axis=1)
41 | input_data_lstm, _ = get_lstm_input_output(input_data_scaled, np.zeros_like(input_data), time_steps=time_steps)
42 | price_data = price_data[-len(input_data_lstm):]
43 |
44 | # split to train,test and cross validation
45 | input_train, input_test, input_cv, price_train, price_test, price_cv = \
46 | train_test_validation_split([input_data_lstm, price_data], split=split)
47 |
48 | # get dims
49 | _, _, input_dim = np.shape(input_train)
50 |
51 | # forward-propagation
52 | x, y, logits, y_, learning_r, drop_out = lstm_nn(input_dim, 3, time_steps=time_steps, n_hidden=[3])
53 |
54 | # tf cost and optimizer
55 | price_h = tf.placeholder(tf.float32, [None, 1])
56 | signals = tf.constant([[1., -1., -1e-10]])
57 | cost = (tf.reduce_mean(y_ * signals * price_h * 100)) # profit function
58 | train_step = tf.train.AdamOptimizer(learning_r).minimize(-cost)
59 |
60 | # init session
61 | cost_hist_train, cost_hist_test, value_hist_train, value_hist_test, value_hist_cv, value_hist_train_ma, \
62 | value_hist_test_ma, value_hist_cv_ma, step, step_hist, saving_score = [], [], [], [], [], [], [], [], 0, [], 0.05
63 | saver = tf.train.Saver()
64 | init = tf.global_variables_initializer()
65 | sess = tf.Session()
66 | sess.run(init)
67 |
68 | # train
69 | while True:
70 |
71 | if step == 30000:
72 | break
73 |
74 | # train model
75 | x_train, price_batch = get_data_batch([input_train[:-1], price_train[1:]], batch_size, sequential=False)
76 | _, cost_train = sess.run([train_step, cost],
77 | feed_dict={x: x_train, price_h: price_batch,
78 | learning_r: learning_rate, drop_out: drop_keep_prob})
79 |
80 | # keep track of stuff
81 | step += 1
82 | if step % 100 == 0 or step == 1:
83 |
84 | # get y_ predictions
85 | y_train_pred = sess.run(y_, feed_dict={x: input_train, drop_out: 1.0})  # dropout off at evaluation
86 | y_test_pred, cost_test = sess.run([y_, cost], feed_dict={x: input_test[:-1], price_h: price_test[1:], drop_out: 1.0})
87 | y_cv_pred = sess.run(y_, feed_dict={x: input_cv, drop_out: 1.0})
88 |
89 | # get portfolio value
90 | value_train = 1 + np.cumsum(np.sum(y_train_pred[:-1] * [1., -1., 0.] * price_train[1:], axis=1))
91 | value_test = 1 + np.cumsum(np.sum(y_test_pred * [1., -1., 0.] * price_test[1:], axis=1))
92 | value_cv = 1 + np.cumsum(np.sum(y_cv_pred[:-1] * [1., -1., 0.] * price_cv[1:], axis=1))
93 |
94 | # save history
95 | step_hist.append(step)
96 | cost_hist_train.append(cost_train)
97 | cost_hist_test.append(cost_test)
98 | value_hist_train.append(value_train[-1])
99 | value_hist_test.append(value_test[-1])
100 | value_hist_cv.append(value_cv[-1])
101 | value_hist_train_ma.append(np.mean(value_hist_train[-value_moving_average:]))
102 | value_hist_test_ma.append(np.mean(value_hist_test[-value_moving_average:]))
103 | value_hist_cv_ma.append(np.mean(value_hist_cv[-value_moving_average:]))
104 |
105 | print('Step {}: train {:.4f}, test {:.4f}'.format(step, cost_train, cost_test))
106 |
107 | if plotting:
108 |
109 | pl.figure(1, figsize=(3, 7), dpi=80, facecolor='w', edgecolor='k')
110 |
111 | pl.subplot(211)
112 | pl.title('Objective function')
113 | pl.plot(step_hist, cost_hist_train, color='darkorange', linewidth=0.3)
114 | pl.plot(step_hist, cost_hist_test, color='dodgerblue', linewidth=0.3)
115 |
116 | pl.subplot(212)
117 | pl.title('Portfolio value')
118 | pl.plot(step_hist, value_hist_train, color='darkorange', linewidth=0.3)
119 | pl.plot(step_hist, value_hist_test, color='dodgerblue', linewidth=0.3)
120 | pl.plot(step_hist, value_hist_cv, color='magenta', linewidth=1)
121 | pl.plot(step_hist, value_hist_train_ma, color='tomato', linewidth=1.5)
122 | pl.plot(step_hist, value_hist_test_ma, color='royalblue', linewidth=1.5)
123 | pl.plot(step_hist, value_hist_cv_ma, color='black', linewidth=1.5)
124 | pl.pause(1e-10)
125 |
126 | # save if some complicated rules
127 | if saving:
128 | current_score = 0 if value_test[-1] < 0.01 or value_cv[-1] < 0.01 \
129 | else np.average([value_test[-1], value_cv[-1]])
130 | saving_score = current_score if saving_score < current_score else saving_score
131 | if saving_score == current_score and saving_score > 0.05:
132 | saver.save(sess, 'saved_models/lstm-v2-avg_score{:.3f}'.format(current_score), global_step=step)
133 | print('Model saved. Average score: {:.2f}'.format(current_score))
134 |
135 | pl.figure(2)
136 | pl.plot(value_train, linewidth=1)
137 | pl.plot(value_test, linewidth=1)
138 | pl.plot(value_cv, linewidth=1)
139 | pl.pause(1e-10)
140 |
141 |
--------------------------------------------------------------------------------
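One subtlety in the v2 training loops: `input_train[:-1]` is paired with `price_train[1:]` *before* random batching, so each sampled row keeps its (features at t, return at t+1) pairing intact. A hypothetical stand-in for `get_data_batch(..., sequential=False)` makes the mechanics explicit:

```python
import numpy as np

def random_batch(arrays, batch_size):
    """One shared row sample keeps corresponding rows of all arrays aligned
    (assumed behaviour of get_data_batch with sequential=False)."""
    idx = np.random.randint(0, len(arrays[0]), batch_size)
    return [a[idx] for a in arrays]

x = np.arange(10).reshape(-1, 1)        # stands in for input_train
r = np.arange(10).reshape(-1, 1) * 10   # stands in for hourly returns

xb, rb = random_batch([x[:-1], r[1:]], batch_size=4)
print(np.hstack([xb, rb]))              # every row pairs t with 10*(t+1)
```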
/train_lstm_v3.py:
--------------------------------------------------------------------------------
1 | """
2 | Training lstm model v1
3 | """
4 |
5 | import numpy as np
6 | import pylab as pl
7 | import tensorflow as tf
8 | from helpers.utils import price_to_binary_target, min_max_scale, get_poloynomials, get_pca, train_test_validation_split
9 | from helpers.utils import remove_nan_rows, get_signal, get_data_batch, get_lstm_input_output, plot_roc_curve
10 | from models import lstm_nn
11 | from helpers.get_features import get_features_v2
12 |
13 |
14 | # other-params
15 | np.set_printoptions(linewidth=75*3+5, edgeitems=6)
16 | pl.rcParams.update({'font.size': 6})
17 |
18 | # hyper-params
19 | batch_size = 1024
20 | learning_rate = 0.001
21 | drop_keep_prob = 0.6
22 | value_moving_average = 50
23 | split = (0.7, 0.2, 0.1)
24 | plotting = False
25 | saving = False
26 | time_steps = 5
27 | transaction_c = 0.000
28 |
29 | # load data
30 | oanda_data = np.load('data\\EUR_USD_H1.npy')[-50000:]
31 | y_data = price_to_binary_target(oanda_data, delta=0.000275)
32 | x_data = get_features_v2(oanda_data, time_periods=[10], return_numpy=False)
33 |
34 | # separate, rearrange and remove nans
35 | price = x_data['price'].as_matrix().reshape(-1, 1)
36 | price_change = x_data['price_delta'].as_matrix().reshape(-1, 1)
37 | x_data = x_data.drop(['price', 'price_delta'], axis=1).as_matrix()
38 | price, price_change, x_data, y_data = remove_nan_rows([price, price_change, x_data, y_data])
39 |
40 | # split to train,test and cross validation
41 | input_train, input_test, input_cv, output_train, output_test, output_cv, price_train, price_test, price_cv = \
42 | train_test_validation_split([x_data, y_data, price_change], split=split)
43 |
44 | # pre-process data: scale, pca, polynomial
45 | input_train, input_test, input_cv = min_max_scale(input_train, input_test, input_cv, std_dev_threshold=2.5)
46 | input_train, input_test, input_cv = get_pca(input_train, input_test, input_cv, threshold=0.01)
47 | # input_train, input_test, input_cv = get_poloynomials(input_train, input_test, input_cv, degree=2)
48 |
49 | # prep lstm format
50 | input_train, output_train = get_lstm_input_output(input_train, output_train, time_steps=time_steps)
51 | input_test, output_test = get_lstm_input_output(input_test, output_test, time_steps=time_steps)
52 | input_cv, output_cv = get_lstm_input_output(input_cv, output_cv, time_steps=time_steps)
53 |
54 | price_train = price_train[-len(input_train):]
55 | price_test = price_test[-len(input_test):]
56 | price_cv = price_cv[-len(input_cv):]
57 |
58 | # get dims
59 | _, _, input_dim = np.shape(input_train)
60 | _, output_dim = np.shape(output_train)
61 |
62 | # forward-propagation
63 | x, y, logits, y_, learning_r, drop_out = lstm_nn(input_dim, output_dim, time_steps=time_steps, n_hidden=[8])
64 |
65 | # tf cost and optimizer
66 | cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y))
67 | train_step = tf.train.AdamOptimizer(learning_r).minimize(cost)
68 |
69 | # init session
70 | cost_hist_train, cost_hist_test, value_hist_train, value_hist_test, value_hist_cv, value_hist_train_ma, \
71 | value_hist_test_ma, value_hist_cv_ma, step, step_hist, saving_score = [], [], [], [], [], [], [], [], 0, [], 0.05
72 | saver = tf.train.Saver()
73 | init = tf.global_variables_initializer()
74 | sess = tf.Session()
75 | sess.run(init)
76 |
77 | # train
78 | while True:
79 |
80 | if step == 30000:
81 | break
82 |
83 | # train model
84 | x_train, y_train = get_data_batch([input_train, output_train], batch_size, sequential=False)
85 | _, cost_train = sess.run([train_step, cost],
86 | feed_dict={x: x_train, y: y_train, learning_r: learning_rate, drop_out: drop_keep_prob})
87 |
88 | # keep track of stuff
89 | step += 1
90 | if step % 100 == 0 or step == 1:
91 |
92 | # get y_ predictions
93 | y_train_pred = sess.run(y_, feed_dict={x: input_train, drop_out: 1.0})  # dropout off at evaluation
94 | cost_test, y_test_pred = sess.run([cost, y_], feed_dict={x: input_test, y: output_test, drop_out: 1.0})
95 | y_cv_pred = sess.run(y_, feed_dict={x: input_cv, drop_out: 1.0})
96 |
97 | # get portfolio value
98 | signal_train, signal_test, signal_cv = get_signal(y_train_pred), get_signal(y_test_pred), get_signal(y_cv_pred)
99 | value_train = 1 + np.cumsum(np.sum(signal_train[:-1] * price_train[1:], axis=1))
100 | value_test = 1 + np.cumsum(np.sum(signal_test[:-1] * price_test[1:], axis=1))
101 | value_cv = 1 + np.cumsum(np.sum(signal_cv[:-1] * price_cv[1:], axis=1))
102 |
103 | # save history
104 | step_hist.append(step)
105 | cost_hist_train.append(cost_train)
106 | cost_hist_test.append(cost_test)
107 | value_hist_train.append(value_train[-1])
108 | value_hist_test.append(value_test[-1])
109 | value_hist_cv.append(value_cv[-1])
110 | value_hist_train_ma.append(np.mean(value_hist_train[-value_moving_average:]))
111 | value_hist_test_ma.append(np.mean(value_hist_test[-value_moving_average:]))
112 | value_hist_cv_ma.append(np.mean(value_hist_cv[-value_moving_average:]))
113 |
114 | print('Step {}: train {:.4f}, test {:.4f}'.format(step, cost_train, cost_test))
115 |
116 | if plotting:
117 |
118 | pl.figure(1, figsize=(3, 7), dpi=80, facecolor='w', edgecolor='k')
119 |
120 | pl.subplot(211)
121 | pl.title('cost function')
122 | pl.plot(step_hist, cost_hist_train, color='darkorange', linewidth=0.3)
123 | pl.plot(step_hist, cost_hist_test, color='dodgerblue', linewidth=0.3)
124 |
125 | pl.subplot(212)
126 | pl.title('Portfolio value')
127 | pl.plot(step_hist, value_hist_train, color='darkorange', linewidth=0.3)
128 | pl.plot(step_hist, value_hist_test, color='dodgerblue', linewidth=0.3)
129 | pl.plot(step_hist, value_hist_cv, color='magenta', linewidth=1)
130 | pl.plot(step_hist, value_hist_train_ma, color='tomato', linewidth=1.5)
131 | pl.plot(step_hist, value_hist_test_ma, color='royalblue', linewidth=1.5)
132 | pl.plot(step_hist, value_hist_cv_ma, color='black', linewidth=1.5)
133 | pl.pause(1e-10)
134 |
135 | # save if some complicated rules
136 | if saving:
137 | current_score = 0 if value_test[-1] < 0.01 or value_cv[-1] < 0.01 \
138 | else np.average([value_test[-1], value_cv[-1]])
139 | saving_score = current_score if saving_score < current_score else saving_score
140 | if saving_score == current_score and saving_score > 0.05:
141 | saver.save(sess, 'saved_models/lstm-v3-avg_score{:.3f}'.format(current_score), global_step=step)
142 | print('Model saved. Average score: {:.2f}'.format(current_score))
143 |
144 | pl.figure(2)
145 | pl.plot(value_test, linewidth=0.2)
146 | pl.plot(value_cv, linewidth=2)
147 | pl.pause(1e-10)
148 |
149 |
150 | # roc curve
151 | roc_auc_train, fpr_train, tpr_train = plot_roc_curve(y_train_pred, output_train)
152 | roc_auc_test, fpr_test, tpr_test = plot_roc_curve(y_test_pred, output_test)
153 | roc_auc_cv, fpr_cv, tpr_cv = plot_roc_curve(y_cv_pred, output_cv)
154 |
155 | pl.figure(2, figsize=(3, 3), dpi=80, facecolor='w', edgecolor='k')
156 | pl.plot(fpr_train, tpr_train, color='darkorange', lw=2, label='Train area: {:0.2f}'.format(roc_auc_train))
157 | pl.plot(fpr_test, tpr_test, color='dodgerblue', lw=2, label='Test area: {:0.2f}'.format(roc_auc_test))
158 | pl.plot(fpr_cv, tpr_cv, color='magenta', lw=2, label='CV area: {:0.2f}'.format(roc_auc_cv))
159 | pl.plot([0, 1], [0, 1], color='navy', lw=2, linestyle='--')
160 | pl.xlim([0.0, 1.0])
161 | pl.ylim([0.0, 1.0])
162 | pl.xlabel('False Positive Rate')
163 | pl.ylabel('True Positive Rate')
164 | pl.title('Receiver operating characteristic')
165 | pl.legend(loc="lower right")
166 | pl.show()
167 |
--------------------------------------------------------------------------------
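`plot_roc_curve` evidently returns `(auc, fpr, tpr)` for the predicted probabilities against the one-hot targets. Its averaging scheme is not visible in this dump; a micro-averaged three-class version via scikit-learn would look roughly like:

```python
import numpy as np
from sklearn.metrics import roc_curve, auc

def roc_points(y_pred, y_true):
    """Micro-average ROC over the flattened one-hot labels (assumed scheme)."""
    fpr, tpr, _ = roc_curve(y_true.ravel(), y_pred.ravel())
    return auc(fpr, tpr), fpr, tpr

y_true = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0]])
y_pred = np.array([[0.7, 0.2, 0.1], [0.3, 0.5, 0.2], [0.2, 0.2, 0.6], [0.4, 0.4, 0.2]])
roc_auc, fpr, tpr = roc_points(y_pred, y_true)
print('AUC: {:.2f}'.format(roc_auc))
```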