├── Readme.md
├── preprocess.py
├── loadData.py
└── stock_data
    ├── INFRATEL.csv
    └── TITAN.csv


/Readme.md:
--------------------------------------------------------------------------------
## Overview
Predicts NIFTY50 index movement over a 7-day period, using LSTM layers in Keras.

## YouTube Tutorial
https://www.youtube.com/watch?v=BYd9M_ragR0

The project is divided into three parts:
* loadData.py
  * Scrapes Wiki's NIFTY50 page to get the ticker symbols
  * Uses the Quandl API to fetch stock data for the past 5 years

* preprocess.py
  * Labels training data as 0 (sell) or 1 (buy)
  * Scales data using the sklearn preprocessing library

* Build_model.py
  * Builds the model in Keras with LSTM layers.

The program reaches 61% validation accuracy and 39% validation loss.

### Dependencies:
* pandas
* tensorflow
* keras
* sklearn (scikit-learn)
* numpy
* beautifulsoup4
* requests

### Usage:
Run the Buildmodel.py script on the command line.

### Acknowledgements:
* sentdex tutorial -> https://www.youtube.com/watch?time_continue=535&v=yWkpRdpOiPY
* Siraj Raval tutorial -> https://www.youtube.com/watch?v=ftMq5ps503w&vl=en

--------------------------------------------------------------------------------
/preprocess.py:
--------------------------------------------------------------------------------
from sklearn import preprocessing
import pandas as pd
import numpy as np
import loadData


# Number of consecutive days used to build each input series for the RNN
SERIES_LENGTH = 30
# Number of days ahead for which the index movement is predicted
PREDICT_LENGTH = 7

TICKER = "NIFTY_50"

def normalize_data(df):
    # Implement this to use a different normalizing/scaling technique
    pass

def scale_data(df):
    # Standardize every column to zero mean and unit variance
    for column in df.columns:
        df[column] = preprocessing.scale(df[column].values)
    return df

def process_data(df):
    # Closing price PREDICT_LENGTH days ahead of each row
    df["nifty_future_price"] = df[f"{TICKER}_Close"].shift(-PREDICT_LENGTH)

    # Drop rows containing NaN values (including the last PREDICT_LENGTH rows)
    df.dropna(inplace=True)

    # Label 1 (buy) if the future price is at or above today's price, else 0 (sell)
    df["Label"] = np.where(df["nifty_future_price"] >= df[f"{TICKER}_Close"], 1, 0)

    # 'nifty_future_price' is no longer required once the label exists
    df.drop(columns="nifty_future_price", inplace=True)
    df.to_csv('nifty50_future_label.csv')

    sequence = []
    # Copy the feature columns so scaling does not write into a view of df
    temp = df.loc[:, df.columns != 'Label'].copy()
    temp = scale_data(temp)
    # print(f"temp{temp[:30]}")
    for i in range(len(temp) - SERIES_LENGTH):
        # Pair each SERIES_LENGTH-day window with the label of the row right after it
        sequence.append([np.array(temp[i:i+SERIES_LENGTH]), df.iloc[i+SERIES_LENGTH, -1]])

    np.random.shuffle(sequence)

    X = []
    y = []
    buy = []
    sell = []
    # Split the sequences by label so the two classes can be counted
    for seq, label in sequence:
        if label == 0:
            sell.append([seq, label])
        else:
            buy.append([seq, label])
    # print(f"buy :{buy[:10]}")
    # print(f"sell :{sell[:10]}")
    buys = len(buy)
    sells = len(sell)
    # print(f"original buys:{buys} original sells:{sells}")
    if(buys