├── requirements.txt
├── LICENSE
├── MLP_model.py
├── backtest.py
├── LSTM_model.py
├── preprocessing.py
├── get_prices.py
└── README.md

/requirements.txt:
--------------------------------------------------------------------------------
pandas-datareader
fix-yahoo-finance
numpy
pandas
tensorflow
keras
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2018 Vivek Palaniappan

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
--------------------------------------------------------------------------------
/MLP_model.py:
--------------------------------------------------------------------------------
import get_prices as hist
import tensorflow as tf
from preprocessing import DataProcessing
# import numpy as np and pandas_datareader.data as pdr if using the single test below
import fix_yahoo_finance as fix
fix.pdr_override()

start = "2003-01-01"
end = "2018-01-01"

# Download AAPL prices into stock_prices.csv and build 10-day windows
hist.get_stock_data("AAPL", start_date=start, end_date=end)
process = DataProcessing("stock_prices.csv", 0.9)
process.gen_test(10)
process.gen_train(10)

# Crude normalisation: divide prices by 200 to keep inputs roughly in [0, 1]
X_train = process.X_train / 200
Y_train = process.Y_train / 200

X_test = process.X_test / 200
Y_test = process.Y_test / 200

model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Dense(100, activation=tf.nn.relu))
model.add(tf.keras.layers.Dense(100, activation=tf.nn.relu))
model.add(tf.keras.layers.Dense(1, activation=tf.nn.relu))

model.compile(optimizer="adam", loss="mean_squared_error")

model.fit(X_train, Y_train, epochs=100)

print(model.evaluate(X_test, Y_test))

# If instead of a full backtest, you just want to see how accurate the model is for a particular prediction, run this:
# data = pdr.get_data_yahoo("AAPL", "2017-12-19", "2018-01-03")
# stock = data["Adj Close"]
# X_predict = np.array(stock).reshape((1, 10)) / 200
# print(model.predict(X_predict)*200)
--------------------------------------------------------------------------------
/backtest.py:
--------------------------------------------------------------------------------
import pandas_datareader.data as pdr
import fix_yahoo_finance as fix
import numpy as np
fix.pdr_override()

def back_test(strategy, seq_len, ticker, start_date, end_date, dim):
    """
    A simple back test for a given date period
    :param strategy: the trained model (build and fit it on training data before calling this)
    :param seq_len: number of days used for each prediction window
    :param ticker: company ticker
    :param start_date: starting date
    :type start_date: "YYYY-mm-dd"
    :param end_date: ending date
    :type end_date: "YYYY-mm-dd"
    :param dim: input shape required by the strategy: (1, seq_len, 1) for the LSTM and (1, seq_len) for the MLP
    :type dim: tuple
    :return: list of signed percentage errors, one for every window in the given date range
    """
    data = pdr.get_data_yahoo(ticker, start_date, end_date)
    stock_data = data["Adj Close"]
    errors = []
    # Walk forward one day at a time through the downloaded range
    for i in range((len(stock_data)//10)*10 - seq_len - 1):
        # stock_data is a Series, so a single positional indexer is enough
        x = np.array(stock_data.iloc[i: i + seq_len]).reshape(dim) / 200
        y = np.array(stock_data.iloc[i + seq_len + 1]) / 200
        predict = strategy.predict(x)
        if predict == 0:
            # A dead ReLU can output exactly 0; skip the window rather than loop on the same input
            continue
        error = (predict - y) / y * 100
        errors.append(error)
    total_error = np.array(errors)
    print(f"Average error = {total_error.mean()}")
    # If you want to see the full error list then print the following statement
    # print(errors)
    return errors
--------------------------------------------------------------------------------
/LSTM_model.py:
--------------------------------------------------------------------------------
import numpy as np
import get_prices as hist
import tensorflow as tf
from preprocessing import DataProcessing
import pandas_datareader.data as pdr
import fix_yahoo_finance as fix
fix.pdr_override()

start = "2003-01-01"
end = "2018-01-01"

hist.get_stock_data("AAPL", start_date=start, end_date=end)
process = DataProcessing("stock_prices.csv", 0.9)
process.gen_test(10)
process.gen_train(10)

# Add the feature axis the LSTM expects; -1 lets numpy infer the number of samples
X_train = process.X_train.reshape((-1, 10, 1)) / 200
Y_train = process.Y_train / 200

X_test = process.X_test.reshape((-1, 10, 1)) / 200
Y_test = process.Y_test / 200

model = tf.keras.Sequential()
model.add(tf.keras.layers.LSTM(20, input_shape=(10, 1), return_sequences=True))
model.add(tf.keras.layers.LSTM(20))
model.add(tf.keras.layers.Dense(1, activation=tf.nn.relu))

model.compile(optimizer="adam", loss="mean_squared_error")

model.fit(X_train, Y_train, epochs=50)

print(model.evaluate(X_test, Y_test))

# Single prediction from the ten trading days ending 2018-01-03
data = pdr.get_data_yahoo("AAPL", "2017-12-19", "2018-01-03")
stock = data["Adj Close"]
X_predict = np.array(stock).reshape((1, 10, 1)) / 200

print(model.predict(X_predict)*200)
--------------------------------------------------------------------------------
/preprocessing.py:
--------------------------------------------------------------------------------
import pandas as pd
import numpy as np


class DataProcessing:
    def __init__(self, file, train):
        self.file = pd.read_csv(file)
        self.train = train
        self.i = int(self.train * len(self.file))
        self.stock_train = self.file[0: self.i]
        self.stock_test = self.file[self.i:]
        self.input_train = []
        self.output_train = []
        self.input_test = []
        self.output_test = []

    def gen_train(self, seq_len):
        """
        Generates training data
        :param seq_len: length of window
        :return: None; sets self.X_train and self.Y_train
        """
        for i in range((len(self.stock_train)//seq_len)*seq_len - seq_len - 1):
            # Column 1 of the CSV holds the adjusted close (column 0 is the date)
            x = np.array(self.stock_train.iloc[i: i + seq_len, 1])
            # Label: the adjusted close two rows after the last day of the window
            y = np.array([self.stock_train.iloc[i + seq_len + 1, 1]], np.float64)
            self.input_train.append(x)
            self.output_train.append(y)
        self.X_train = np.array(self.input_train)
        self.Y_train = np.array(self.output_train)

    def gen_test(self, seq_len):
        """
        Generates test data
        :param seq_len: length of window
        :return: None; sets self.X_test and self.Y_test
        """
        for i in range((len(self.stock_test)//seq_len)*seq_len - seq_len - 1):
            x = np.array(self.stock_test.iloc[i: i + seq_len, 1])
            y = np.array([self.stock_test.iloc[i + seq_len + 1, 1]], np.float64)
            self.input_test.append(x)
            self.output_test.append(y)
        self.X_test = np.array(self.input_test)
        self.Y_test = np.array(self.output_test)
--------------------------------------------------------------------------------
/get_prices.py:
--------------------------------------------------------------------------------
import pandas_datareader.data as pdr
import fix_yahoo_finance as fix
import time
fix.pdr_override()


def get_stock_data(ticker, start_date, end_date):
    """
    Gets historical stock data of given tickers between dates
    :param ticker: company, or companies, whose data is to be fetched
    :type ticker: string or list of strings
    :param start_date: starting date for stock prices
    :type start_date: string of date "YYYY-mm-dd"
    :param end_date: end date for stock prices
    :type end_date: string of date "YYYY-mm-dd"
    :return: None; writes stock_prices.csv
    """
    all_data = None
    # Yahoo occasionally rejects requests, so retry a few times before waiting longer
    for attempt in range(5):
        try:
            all_data = pdr.get_data_yahoo(ticker, start_date, end_date)
            break
        except ValueError:
            print("ValueError, trying again")
            time.sleep(10)
    if all_data is None:
        print("Tried 5 times, Yahoo error. Trying after 2 minutes")
        time.sleep(120)
        all_data = pdr.get_data_yahoo(ticker, start_date, end_date)
    stock_data = all_data["Adj Close"]
    stock_data.to_csv("stock_prices.csv")


def get_sp500(start_date, end_date):
    """
    Gets sp500 price data
    :param start_date: starting date for sp500 prices
    :type start_date: string of date "YYYY-mm-dd"
    :param end_date: end date for sp500 prices
    :type end_date: string of date "YYYY-mm-dd"
    :return: None; writes sp500_data.csv
    """
    sp500_all_data = None
    for attempt in range(5):
        try:
            sp500_all_data = pdr.get_data_yahoo("SPY", start_date, end_date)
            break
        except ValueError:
            print("ValueError, trying again")
            time.sleep(10)
    if sp500_all_data is None:
        print("Tried 5 times, Yahoo error. Trying after 2 minutes")
        time.sleep(120)
        sp500_all_data = pdr.get_data_yahoo("SPY", start_date, end_date)
    sp500_data = sp500_all_data["Adj Close"]
    sp500_data.to_csv("sp500_data.csv")


if __name__ == "__main__":
    get_stock_data("AAPL", "2018-05-01", "2018-06-01")
    # get_sp500("2018-05-01", "2018-06-01")
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# IntroNeuralNetworks in Python: A Template Project
[![forthebadge made-with-python](https://ForTheBadge.com/images/badges/made-with-python.svg)](https://www.python.org/)

[![GitHub license](https://img.shields.io/badge/License-MIT-brightgreen.svg?style=flat-square)](https://github.com/VivekPa/NeuralNetworkStocks/blob/master/LICENSE.txt) [![PRs Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg?style=flat-square)](http://makeapullrequest.com)

IntroNeuralNetworks is a project that **introduces** neural networks and shows how one can use them to predict stock prices. It is built with the goal of allowing beginners to understand the fundamentals of how neural network models are built and to go through the entire workflow of machine learning. This model is in no way sophisticated, so do improve upon this base project in any way you like.

The core steps involved are: downloading stock price data from Yahoo Finance, preprocessing the dataframes into the shapes the neural network libraries expect, and finally training the neural network model and backtesting it over historical data.

This model is not meant to be used to live trade stocks with. However, with further extensions, this model can definitely be used to support your trading strategies.

I hope you find this project useful in your journey as a trader or a machine learning engineer. Personally, this is my first major machine learning and Python project, so I'd appreciate it if you **leave a star**.

*As a disclaimer, this is a purely educational project. Any backtested results do not guarantee performance in live trading. Do live trading at your own risk.*
*This guide and further analysis have been cross-posted on my blog, [Engineer Quant](https://medium.com/engineer-quant)*

## Contents
- [Contents](#contents)
- [Overview](#overview)
- [Getting Started](#getting-started)
- [Requirements](#requirements)
- [Stock Price Data](#stock-price-data)
- [Preprocessing](#preprocessing)
- [Preparing Train Dataset](#preparing-train-dataset)
- [Preparing Test Dataset](#preparing-test-dataset)
- [Neural Network Models](#neural-network-models)
- [Multilayer Perceptron Model](#multilayer-perceptron-model)
- [LSTM Model](#lstm-model)
- [Backtesting](#backtesting)
- [Stock Predictions](#stock-predictions)
- [Extensions](#extensions)
- [Getting Data](#getting-data)
- [Neural Network Model](#neural-network-model)
- [Supporting Trade](#supporting-trade)
- [Contributing](#contributing)

## Overview

The overall workflow for this project is as such:
1. Acquire the stock price data - this will give us our *features* for the model.
2. Preprocess the data - make the train and test datasets.
3. Use the neural network to learn from the training data.
4. Backtest the model across a date range.
5. Make useful stock price predictions.
6. Supplement your trading strategies with the predictions.

Although this is very general, it is essentially what you need to build your own machine learning or neural network model; a condensed sketch of how the modules in this repository fit together is shown below.
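The sketch below mirrors what `MLP_model.py` does, with the backtest from `backtest.py` added at the end. The ticker and the backtest date range are arbitrary examples, so treat this as a starting point rather than a tuned model:

```python
import get_prices as hist
import tensorflow as tf
from preprocessing import DataProcessing
from backtest import back_test

# 1. Download prices (writes stock_prices.csv)
hist.get_stock_data("AAPL", start_date="2003-01-01", end_date="2018-01-01")

# 2. Build 10-day windows, 90% train / 10% test, and scale the prices down
process = DataProcessing("stock_prices.csv", 0.9)
process.gen_train(10)
process.gen_test(10)
X_train, Y_train = process.X_train / 200, process.Y_train / 200
X_test, Y_test = process.X_test / 200, process.Y_test / 200

# 3. Train a small MLP on the windows
model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(100, activation=tf.nn.relu),
    tf.keras.layers.Dense(100, activation=tf.nn.relu),
    tf.keras.layers.Dense(1, activation=tf.nn.relu),
])
model.compile(optimizer="adam", loss="mean_squared_error")
model.fit(X_train, Y_train, epochs=100)
print(model.evaluate(X_test, Y_test))

# 4. Backtest over an out-of-sample date range; dim=(1, 10) is the MLP's input shape
back_test(model, seq_len=10, ticker="AAPL",
          start_date="2018-01-01", end_date="2018-06-01", dim=(1, 10))
```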
## Getting Started

For those of you that do not want to learn about the construction of the model (although I highly suggest you do), clone or download the project, unzip it to your preferred folder and run the following commands on your computer.

```bash
pip install -r requirements.txt
python LSTM_model.py
```
It's as simple as that!

## Requirements

For those who want a more detailed manual, this program is built in Python 3.6. If you are using an earlier version of Python, such as Python 3.5, you will run into syntax errors with the f-strings used in this project, so I do suggest that you update to Python 3.6.

```bash
pip install -r requirements.txt
```

## Stock Price Data

Now we come to the most dreaded part of any machine learning project: data acquisition and data preprocessing. As tedious and hard as it might be, it is vital to have high quality data to feed into your model. As the saying goes, "garbage in, garbage out" - and this is most applicable to machine learning models, as your model is only as good as the data it is fed. Processing the data comes in two parts: downloading the data, and forming our datasets for the model. Thanks to the Yahoo Finance API, downloading the stock price data is relatively simple (sadly, I doubt it will stay that way for long).

To download the stock price data, we use `pandas_datareader`, whose Yahoo endpoint stopped working a while ago, so we patch it with this [fix](https://github.com/ranaroussi/fix-yahoo-finance) and use `fix_yahoo_finance`. If this fails as well (maybe in the near future), you can just download the stock data directly from Yahoo for free and save it as `stock_prices.csv`.
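If you prefer to fetch the data yourself from a Python session, `get_prices.py` exposes the same helper the model scripts use (the ticker and dates below are just examples):

```python
import get_prices as hist

# Downloads the adjusted close prices and writes them to stock_prices.csv
hist.get_stock_data("AAPL", start_date="2003-01-01", end_date="2018-01-01")
```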
## Preprocessing

Once we have the stock price data for the stocks we are going to predict, we need to create the training and testing datasets.

### Preparing Train Dataset

The goal for our training dataset is to have rows of a given length (the number of prices used to predict) along with the correct prediction to evaluate our model against. I have given the user the option of choosing how much of the stock price data to use for the training data when instantiating the `DataProcessing` class. Generating the training data is done quite simply using `numpy` arrays and a for loop. You can perform this by running:

```python
process = DataProcessing("stock_prices.csv", 0.9)
process.gen_train(seq_len)
```

### Preparing Test Dataset

The test dataset is prepared in precisely the same way as the training dataset, just that the length of the data is different. This is done with the following code:

```python
process.gen_test(seq_len)
```

## Neural Network Models

Since the main goal of this project is to get acquainted with machine learning and neural networks, I will explain what models I have used and why they may be effective at predicting stock prices. If you want a more detailed explanation of neural networks, check out my blog.

### Multilayer Perceptron Model

A multilayer perceptron is the most basic kind of neural network; it uses backpropagation to learn from the training dataset. If you want more details about how the multilayer perceptron works, do read this [article](https://medium.com/engineer-quant/multilayer-perceptron-4453615c4337).

### LSTM Model

The benefit of using a Long Short-Term Memory (LSTM) network is its extra element of long-term memory: the network carries information about earlier inputs in the sequence forward as a 'memory', which allows the model to find relationships both within the input data and between the inputs and the output. Again, for more details, please read this [article](https://www.altumintelligence.com/articles/a/Time-Series-Prediction-Using-LSTM-Deep-Neural-Networks).
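For reference, these are the two architectures this project defines, condensed from `MLP_model.py` and `LSTM_model.py`; both are compiled with the Adam optimizer and a mean-squared-error loss:

```python
import tensorflow as tf

# Multilayer perceptron: takes a flat window of 10 prices (MLP_model.py)
mlp = tf.keras.models.Sequential([
    tf.keras.layers.Dense(100, activation=tf.nn.relu),
    tf.keras.layers.Dense(100, activation=tf.nn.relu),
    tf.keras.layers.Dense(1, activation=tf.nn.relu),
])

# LSTM: takes the same window reshaped to (10, 1), i.e. 10 time steps of 1 feature (LSTM_model.py)
lstm = tf.keras.Sequential([
    tf.keras.layers.LSTM(20, input_shape=(10, 1), return_sequences=True),
    tf.keras.layers.LSTM(20),
    tf.keras.layers.Dense(1, activation=tf.nn.relu),
])

mlp.compile(optimizer="adam", loss="mean_squared_error")
lstm.compile(optimizer="adam", loss="mean_squared_error")
```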
## Backtesting

My backtest system is simple in the sense that it only evaluates how well the model predicts the stock price. It does not actually consider how to trade based on these predictions (that is the topic of developing trading strategies using this model). To run just the backtest, you will need to run

```python
back_test(strategy, seq_len, ticker, start_date, end_date, dim)
```
The `dim` argument is the input shape the model expects for a single window - `(1, seq_len)` for the MLP and `(1, seq_len, 1)` for the LSTM - and it is necessary for the backtest to feed the data to the model correctly.

## Stock Predictions

Now that your model has been trained and backtested, we can use it to make stock price predictions. In order to make stock price predictions, you need to download the current data and use the model's `predict` method. Run the following code after training and backtesting the model:

```python
data = pdr.get_data_yahoo("AAPL", "2017-12-19", "2018-01-03")
stock = data["Adj Close"]
X_predict = np.array(stock).reshape((1, 10)) / 200  # use (1, 10, 1) for the LSTM model
print(model.predict(X_predict)*200)
```

## Extensions

As mentioned before, this project is highly extensible, and here are some ideas for improving it.

### Getting Data

Getting data is pretty standard using Yahoo Finance. However, you may want to look into clustering stocks by their trends (maybe by sector, or, if you want to be really precise, using k-means clustering?).

### Neural Network Model

This neural network can be improved in many ways:
1. Tuning hyperparameters: find the optimal hyperparameters that give the best predictions.
2. Backtesting: make the backtesting system more robust (I have left certain important aspects out for you to figure out). Maybe include buying and shorting?
3. Trying different neural networks: there are plenty of options; see which works best for your stocks.

### Supporting Trade

As I said earlier, this model can be used to support trading by using its predictions in your trading strategy. Examples include:
1. Simple long-short strategy: buy if the prediction is higher than the current price, and sell or short if it is lower.
2. Intraday trading: if you can get your hands on minute data or even tick data, you can use this predictor to trade.
3. Statistical arbitrage: you can also use the predictions of various stock prices to find the correlations between stocks.

## Contributing

Feel free to fork this and submit PRs. I am open to, and grateful for, any suggestions or bug fixes. Hope you enjoy this project!

---
For more content like this, check out my academic blog at [https://medium.com/engineer-quant](https://medium.com/engineer-quant)
--------------------------------------------------------------------------------