├── Readme.md
├── preprocess.py
├── loadData.py
└── stock_data
    ├── INFRATEL.csv
    └── TITAN.csv
/Readme.md:
--------------------------------------------------------------------------------
1 | ## Overview
2 | Predicts NIFTY50 index movement over a 7-day period.
3 | The model is built in Keras using LSTM layers.
4 |
5 | ## Youtube Tutorial
6 | https://www.youtube.com/watch?v=BYd9M_ragR0
7 |
8 | The project is divided into three parts:
9 | * loadData.py
10 | Scrapes Wikipedia's NIFTY50 page to get the ticker symbols
11 | Uses the Quandl API to fetch stock data for the past 5 years
12 |
13 | * preprocess.py
14 | Labels training data as 0 (sell) or 1 (buy)
15 | Scales data using the sklearn preprocessing library
16 |
17 | * Build_model.py
18 | Builds the model in Keras with LSTM layers.
19 |
20 | The model reaches 61% validation accuracy and 39% validation loss.
21 |
22 | ### Dependencies:
23 | * pandas
24 | * tensorflow
25 | * keras
26 | * sklearn
27 | * numpy
28 | * beautifulsoup4
29 | * requests
30 |
31 | ### Usage:
32 | Run the Buildmodel.py script from the command line.
33 |
34 | ### Acknowledgements:
35 | * sentdex tutorial -> https://www.youtube.com/watch?time_continue=535&v=yWkpRdpOiPY
36 | * Siraj Raval tutorial -> https://www.youtube.com/watch?v=ftMq5ps503w&vl=en
37 |
--------------------------------------------------------------------------------
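The labelling and windowing steps that the Readme describes for preprocess.py can be sketched on a toy price series as follows. This is a minimal illustration, not the repository's code: the function name `label_and_window`, the single `close` column, the toy prices, and the shortened `SERIES_LENGTH` and `PREDICT_LENGTH` values are all assumptions made for the example. Scaling is done by hand with NumPy (zero mean, unit variance), which is what `sklearn.preprocessing.scale` does with its default arguments.

```python
import numpy as np
import pandas as pd

# Toy values for illustration only; preprocess.py uses 30 and 7.
SERIES_LENGTH = 3
PREDICT_LENGTH = 2


def label_and_window(close):
    """Return (X, y): scaled price windows and the 0/1 label paired with each."""
    df = pd.DataFrame({"close": close.astype(float)})

    # Price PREDICT_LENGTH rows ahead; the trailing rows have no future price.
    df["future"] = df["close"].shift(-PREDICT_LENGTH)
    df = df.dropna()

    # 1 (buy) if the future price is at least today's price, else 0 (sell).
    labels = np.where(df["future"] >= df["close"], 1, 0)

    # Standardise to zero mean and unit variance (population std, ddof=0).
    values = df["close"].to_numpy()
    scaled = (values - values.mean()) / values.std()

    X, y = [], []
    # Same indexing as preprocess.py: a window of SERIES_LENGTH rows is
    # paired with the label of the row immediately after the window.
    for i in range(len(df) - SERIES_LENGTH):
        X.append(scaled[i:i + SERIES_LENGTH])
        y.append(labels[i + SERIES_LENGTH])
    return np.array(X), np.array(y)


prices = pd.Series([10, 11, 12, 11, 10, 9, 10, 12])
X, y = label_and_window(prices)
print(X.shape, y.tolist())
```

With eight prices and a look-ahead of two rows, six labelled rows remain, which yields three windows of three rows each. In the real pipeline each window would hold every feature column rather than a single price.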
/preprocess.py:
--------------------------------------------------------------------------------
1 | from sklearn import preprocessing
2 | import pandas as pd
3 | import numpy as np
4 | import loadData
5 |
6 |
7 | # Number of past days in each input sequence used to train the RNN
8 | SERIES_LENGTH = 30
9 | PREDICT_LENGTH = 7  # number of days ahead that the label looks
10 |
11 | TICKER="NIFTY_50"
12 |
13 | def normalize_data(df):
14 |     pass  # implement this to use a different normalizing or scaling technique
15 |
16 | def scale_data(df):
17 |     for column in df.columns:
18 |         df[column] = preprocessing.scale(df[column].values)
19 |     return df
20 |
21 | def process_data(df):
22 |     df["nifty_future_price"] = df[f"{TICKER}_Close"].shift(-PREDICT_LENGTH)
23 |
24 |     # Drop rows containing NaN values, including the last PREDICT_LENGTH rows, which have no future price
25 |     df.dropna(inplace=True)
26 |
27 |     # Label 1 if the future NIFTY price is at least today's price, 0 otherwise
28 |     df["Label"] = np.where(df["nifty_future_price"] >= df[f"{TICKER}_Close"], 1, 0)
29 |
30 |     # Drop the 'nifty_future_price' column, which is no longer required
31 |     df.drop(columns='nifty_future_price', inplace=True)
32 |     df.to_csv('nifty50_future_label.csv')
33 |
34 |     sequence = []
35 |     temp = df.loc[:, df.columns != 'Label'].copy()  # explicit copy, so scaling never writes into df
36 |     temp = scale_data(temp)
37 |     # print(f"temp{temp[:30]}")
38 |     for i in range(len(temp) - SERIES_LENGTH):
39 |         sequence.append([np.array(temp[i:i+SERIES_LENGTH]), df.iloc[i+SERIES_LENGTH, -1]])
40 |
41 |     np.random.shuffle(sequence)
42 |
43 |     X = []
44 |     y = []
45 |     buy = []
46 |     sell = []
47 |     for seq, label in sequence:
48 |         if label == 0:
49 |             sell.append([seq, label])
50 |         else:
51 |             buy.append([seq, label])
52 |     # print(f"buy :{buy[:10]}")
53 |     # print(f"sell :{sell[:10]}")
54 |     buys = len(buy)
55 |     sells = len(sell)
56 |     # print(f"original buys:{buys} original sells:{sells}")
57 |     if(buys