
## Reference
Hongyang Yang, Xiao-Yang Liu, Shan Zhong, and Anwar Walid. 2020. Deep Reinforcement Learning for Automated Stock Trading: An Ensemble Strategy. In ICAIF ’20: ACM International Conference on AI in Finance, Oct. 15–16, 2020, Manhattan, NY. ACM, New York, NY, USA.

## [Our Medium Blog](https://medium.com/@ai4finance/deep-reinforcement-learning-for-automated-stock-trading-f1dad0126a02)

## Installation
```shell
git clone https://github.com/AI4Finance-LLC/Deep-Reinforcement-Learning-for-Automated-Stock-Trading-Ensemble-Strategy-ICAIF-2020.git
```

### Prerequisites
For [OpenAI Baselines](https://github.com/openai/baselines), you'll need the system packages CMake, OpenMPI, and zlib. They can be installed as follows:

#### Ubuntu

```bash
sudo apt-get update && sudo apt-get install cmake libopenmpi-dev python3-dev zlib1g-dev libgl1-mesa-glx
```

#### Mac OS X
Installation of system packages on Mac requires [Homebrew](https://brew.sh). With Homebrew installed, run the following:
```bash
brew install cmake openmpi
```

#### Windows 10

To install stable-baselines on Windows, please look at the [documentation](https://stable-baselines.readthedocs.io/en/master/guide/install.html#prerequisites).

### Create and Activate Virtual Environment (Optional but highly recommended)
cd into this repository:
```bash
cd Deep-Reinforcement-Learning-for-Automated-Stock-Trading-Ensemble-Strategy-ICAIF-2020
```
Under the folder /Deep-Reinforcement-Learning-for-Automated-Stock-Trading-Ensemble-Strategy-ICAIF-2020, install the virtualenv package:
```bash
pip install virtualenv
```
Virtualenvs are essentially folders that contain a copy of the Python executable and all installed Python packages.

**Virtualenvs also help avoid package conflicts.**

Create a virtualenv **venv** under the folder /Deep-Reinforcement-Learning-for-Automated-Stock-Trading-Ensemble-Strategy-ICAIF-2020:
```bash
virtualenv -p python3 venv
```
To activate the virtualenv:
```
source venv/bin/activate
```
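On Windows, the activation script lives in the virtualenv's `Scripts` folder instead: run `venv\Scripts\activate`.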

## Dependencies

The script has been tested under **Python >= 3.6.0**. Install the required packages with:

```shell
pip install -r requirements.txt
```

### Questions

#### About TensorFlow 2.0: https://github.com/hill-a/stable-baselines/issues/366

If you have questions regarding TensorFlow, note that TensorFlow 2.0 is not yet supported by stable-baselines; use TensorFlow 1.15.4 instead:

```bash
pip install tensorflow==1.15.4
```
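
You can quickly confirm the pinned version (a small sanity check, not part of the original README):

```python
import tensorflow as tf
print(tf.__version__)  # expected: 1.15.4
```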

If you have questions regarding the Stable-Baselines package, please refer to the [Stable-Baselines installation guide](https://github.com/hill-a/stable-baselines). Install the Stable-Baselines package using pip:
```
pip install stable-baselines[mpi]
```

This includes an optional dependency on MPI, which enables the DDPG, GAIL, PPO1, and TRPO algorithms. If you do not need these algorithms, you can install without MPI:
```
pip install stable-baselines
```

Please read the [documentation](https://stable-baselines.readthedocs.io/) for more details and alternatives (installing from source, using Docker).
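
As a quick sanity check (a sketch, not from the original README), the core algorithms should import cleanly; the MPI-dependent ones only import when the `[mpi]` extra (mpi4py) is installed:

```python
# minimal import check for the stable-baselines install above
from stable_baselines import A2C, PPO2           # available in any install
from stable_baselines import DDPG, PPO1, TRPO    # require the [mpi] extra
print("stable-baselines imports OK")
```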

## Run DRL Ensemble Strategy
```shell
python run_DRL.py
```
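The trading environments shown below save account-value CSVs and plots to a `results/` folder (see the `to_csv` and `plt.savefig` calls in the environment code), so make sure that folder exists before running.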
## Backtesting

Use Quantopian's [pyfolio package](https://github.com/quantopian/pyfolio) to do the backtesting.

[Backtesting script](backtesting.ipynb)
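
For example, a minimal tear sheet can be generated from one of the saved account-value CSVs (a sketch; the date range below is hypothetical, and in practice the notebook aligns returns with the real trading dates):

```python
import pandas as pd
import pyfolio

# account values saved by the training environment
df = pd.read_csv('results/account_value_train.csv', index_col=0)
df.columns = ['account_value']
returns = df['account_value'].pct_change().dropna()

# pyfolio expects a return series with a DatetimeIndex; business days assumed
returns.index = pd.date_range('2009-01-01', periods=len(returns), freq='B')
pyfolio.create_returns_tear_sheet(returns)
```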

## Status
--------------------------------------------------------------------------------
/env/EnvMultipleStock_train.py:
--------------------------------------------------------------------------------
import numpy as np
import pandas as pd
from gym.utils import seeding
import gym
from gym import spaces
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt
import pickle

# shares normalization factor: 100 shares per trade
HMAX_NORMALIZE = 100
# initial amount of money in our account
INITIAL_ACCOUNT_BALANCE = 1000000
# total number of stocks in our portfolio
STOCK_DIM = 30
# transaction fee: 0.1% per trade, a reasonable percentage
TRANSACTION_FEE_PERCENT = 0.001
REWARD_SCALING = 1e-4
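# note: REWARD_SCALING shrinks the raw dollar P&L so per-step rewards are
# on the order of 1 rather than 1e4, which keeps training numerically stable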

class StockEnvTrain(gym.Env):
    """A stock trading environment for OpenAI gym"""
    metadata = {'render.modes': ['human']}

    def __init__(self, df, day=0):
        self.day = day
        self.df = df

        # action_space normalization; shape is STOCK_DIM
        self.action_space = spaces.Box(low=-1, high=1, shape=(STOCK_DIM,))
        # Shape = 181: [Current Balance] + [prices 1-30] + [owned shares 1-30]
        # + [macd 1-30] + [rsi 1-30] + [cci 1-30] + [adx 1-30]
        self.observation_space = spaces.Box(low=0, high=np.inf, shape=(181,))
        # load data from a pandas dataframe
        self.data = self.df.loc[self.day, :]
        self.terminal = False
        # initialize state
        self.state = [INITIAL_ACCOUNT_BALANCE] + \
                     self.data.adjcp.values.tolist() + \
                     [0]*STOCK_DIM + \
                     self.data.macd.values.tolist() + \
                     self.data.rsi.values.tolist() + \
                     self.data.cci.values.tolist() + \
                     self.data.adx.values.tolist()
        # initialize reward
        self.reward = 0
        self.cost = 0
        # memorize all the total balance change
        self.asset_memory = [INITIAL_ACCOUNT_BALANCE]
        self.rewards_memory = []
        self.trades = 0
        self._seed()

    def _sell_stock(self, index, action):
        # perform sell action based on the sign of the action;
        # state[index+1] is the price, state[index+STOCK_DIM+1] the holding
        if self.state[index+STOCK_DIM+1] > 0:
            # clip the sell amount to the shares actually held
            sell_amount = min(abs(action), self.state[index+STOCK_DIM+1])
            # update balance with the proceeds, net of the transaction fee
            self.state[0] += self.state[index+1] * sell_amount * \
                (1 - TRANSACTION_FEE_PERCENT)
            self.state[index+STOCK_DIM+1] -= sell_amount
            # fee is charged on the amount actually traded
            self.cost += self.state[index+1] * sell_amount * \
                TRANSACTION_FEE_PERCENT
            self.trades += 1

    def _buy_stock(self, index, action):
        # perform buy action based on the sign of the action;
        # integer division gives the max whole shares affordable at this price
        available_amount = self.state[0] // self.state[index+1]
        buy_amount = min(available_amount, action)

        # update balance: purchase cost plus the transaction fee
        self.state[0] -= self.state[index+1] * buy_amount * \
            (1 + TRANSACTION_FEE_PERCENT)
        self.state[index+STOCK_DIM+1] += buy_amount

        self.cost += self.state[index+1] * buy_amount * TRANSACTION_FEE_PERCENT
        self.trades += 1
    def step(self, actions):
        self.terminal = self.day >= len(self.df.index.unique()) - 1

        if self.terminal:
            plt.plot(self.asset_memory, 'r')
            plt.savefig('results/account_value_train.png')
            plt.close()
            end_total_asset = self.state[0] + \
                sum(np.array(self.state[1:(STOCK_DIM+1)]) *
                    np.array(self.state[(STOCK_DIM+1):(STOCK_DIM*2+1)]))
            df_total_value = pd.DataFrame(self.asset_memory)
            df_total_value.to_csv('results/account_value_train.csv')
            df_total_value.columns = ['account_value']
            df_total_value['daily_return'] = \
                df_total_value['account_value'].pct_change(1)
            # annualized Sharpe ratio of the daily account-value returns
            sharpe = (252 ** 0.5) * df_total_value['daily_return'].mean() / \
                df_total_value['daily_return'].std()
            df_rewards = pd.DataFrame(self.rewards_memory)

            return self.state, self.reward, self.terminal, {}

        else:
            # scale normalized actions up to numbers of shares to trade
            actions = actions * HMAX_NORMALIZE

            begin_total_asset = self.state[0] + \
                sum(np.array(self.state[1:(STOCK_DIM+1)]) *
                    np.array(self.state[(STOCK_DIM+1):(STOCK_DIM*2+1)]))

            argsort_actions = np.argsort(actions)
            sell_index = argsort_actions[:np.where(actions < 0)[0].shape[0]]
            buy_index = argsort_actions[::-1][:np.where(actions > 0)[0].shape[0]]
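            # argsort is ascending, so sells execute from the most negative
            # action to the least, and buys from the most positive downward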

            for index in sell_index:
                self._sell_stock(index, actions[index])

            for index in buy_index:
                self._buy_stock(index, actions[index])

            self.day += 1
            self.data = self.df.loc[self.day, :]
            # load next state: keep balance and holdings, refresh prices
            # and technical indicators for the new day
            self.state = [self.state[0]] + \
                         self.data.adjcp.values.tolist() + \
                         list(self.state[(STOCK_DIM+1):(STOCK_DIM*2+1)]) + \
                         self.data.macd.values.tolist() + \
                         self.data.rsi.values.tolist() + \
                         self.data.cci.values.tolist() + \
                         self.data.adx.values.tolist()

            end_total_asset = self.state[0] + \
                sum(np.array(self.state[1:(STOCK_DIM+1)]) *
                    np.array(self.state[(STOCK_DIM+1):(STOCK_DIM*2+1)]))
            self.asset_memory.append(end_total_asset)

            self.reward = end_total_asset - begin_total_asset
            self.rewards_memory.append(self.reward)
            self.reward = self.reward * REWARD_SCALING

        return self.state, self.reward, self.terminal, {}

    def reset(self):
        self.asset_memory = [INITIAL_ACCOUNT_BALANCE]
        self.day = 0
        self.data = self.df.loc[self.day, :]
        self.cost = 0
        self.trades = 0
        self.terminal = False
        self.rewards_memory = []
        # initialize state
        self.state = [INITIAL_ACCOUNT_BALANCE] + \
                     self.data.adjcp.values.tolist() + \
                     [0]*STOCK_DIM + \
                     self.data.macd.values.tolist() + \
                     self.data.rsi.values.tolist() + \
                     self.data.cci.values.tolist() + \
                     self.data.adx.values.tolist()
        return self.state

    def render(self, mode='human'):
        return self.state

    def _seed(self, seed=None):
        self.np_random, seed = seeding.np_random(seed)
        return [seed]
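
# Example usage (a sketch, not part of the original file): the environment
# expects a preprocessed dataframe indexed by day, with each day's rows
# holding adjcp/macd/rsi/cci/adx for the 30 tickers, and stable-baselines
# wants it wrapped in a vectorized env:
#
#   from stable_baselines.common.vec_env import DummyVecEnv
#   env = DummyVecEnv([lambda: StockEnvTrain(df)])
#   obs = env.reset()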
--------------------------------------------------------------------------------
/env/EnvMultipleStock_validation.py:
--------------------------------------------------------------------------------
import numpy as np
import pandas as pd
from gym.utils import seeding
import gym
from gym import spaces
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt
import pickle

# shares normalization factor: 100 shares per trade
HMAX_NORMALIZE = 100
# initial amount of money in our account
INITIAL_ACCOUNT_BALANCE = 1000000
# total number of stocks in our portfolio
STOCK_DIM = 30
# transaction fee: 0.1% per trade, a reasonable percentage
TRANSACTION_FEE_PERCENT = 0.001

# turbulence index: 90-150 is a reasonable threshold
# TURBULENCE_THRESHOLD = 140
REWARD_SCALING = 1e-4

class StockEnvValidation(gym.Env):
    """A stock trading environment for OpenAI gym"""
    metadata = {'render.modes': ['human']}

    def __init__(self, df, day=0, turbulence_threshold=140, iteration=''):
        self.day = day
        self.df = df
        # action_space normalization; shape is STOCK_DIM
        self.action_space = spaces.Box(low=-1, high=1, shape=(STOCK_DIM,))
        # Shape = 181: [Current Balance] + [prices 1-30] + [owned shares 1-30]
        # + [macd 1-30] + [rsi 1-30] + [cci 1-30] + [adx 1-30]
        self.observation_space = spaces.Box(low=0, high=np.inf, shape=(181,))
        # load data from a pandas dataframe
        self.data = self.df.loc[self.day, :]
        self.terminal = False
        # threshold for the market turbulence index
        self.turbulence_threshold = turbulence_threshold
        # initialize state
        self.state = [INITIAL_ACCOUNT_BALANCE] + \
                     self.data.adjcp.values.tolist() + \
                     [0]*STOCK_DIM + \
                     self.data.macd.values.tolist() + \
                     self.data.rsi.values.tolist() + \
                     self.data.cci.values.tolist() + \
                     self.data.adx.values.tolist()
        # initialize reward
        self.reward = 0
        self.turbulence = 0
        self.cost = 0
        self.trades = 0
        # memorize all the total balance change
        self.asset_memory = [INITIAL_ACCOUNT_BALANCE]
        self.rewards_memory = []
        self._seed()

        self.iteration = iteration

    def _sell_stock(self, index, action):
        # perform sell action based on the sign of the action
67 | if self.turbulence