├── report
│   └── Report.pdf
├── README.md
├── 0. Preprocessing.ipynb
├── 4. RNN.ipynb
├── 5. GRU.ipynb
└── 8. RoBERTa.ipynb

/report/Report.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rohithteja/Twitter-Sentiment-Analysis-and-Tweet-Extraction/HEAD/report/Report.pdf

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Twitter-Sentiment-Analysis-and-Tweet-Extraction
Twitter sentiment analysis of the dataset taken from the Kaggle competition (https://www.kaggle.com/c/tweet-sentiment-extraction).

Twitter has long been the most engaging social media platform, and many
companies, political figures, and celebrities maintain a presence on it. This makes it a great
place for millions of users to discuss topics ranging from laws passed by
parliament to new movie releases. It is therefore interesting to analyse the sentiment of
tweets to see whether a tweet by a given user has a positive or negative impact on the community.
Going a step further, we can also find which words of a tweet contribute to its sentiment.
Public reaction can then be analysed by combining sentiment analysis with keyword
extraction of the tweets.
In this project, sentiment analysis of tweets is tested with various deep learning algorithms,
and their performance is measured using different metrics. A method for keyword
extraction from tweets is also explored (https://www.kaggle.com/yutanakamura/dear-pytorch-lovers-bert-transformers-lightning). The dataset covers a broad set of tweets, each classified into one of three sentiments: positive, negative, or neutral.
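
The three sentiment classes and the evaluation metrics mentioned above can be illustrated with a small, dependency-free sketch. It assumes the alphabetical label-to-integer mapping that pandas category codes produce for these three labels (the notebooks use `cat.codes`), and computes accuracy and macro-averaged F1 by hand. The names `encode_labels`, `accuracy`, `macro_f1` and the toy labels are illustrative assumptions, not code or results from this repository; the notebooks themselves import `accuracy_score` and `f1_score` from scikit-learn.

```python
# Alphabetical order mirrors the codes pandas assigns to these three categories.
LABELS = ["negative", "neutral", "positive"]
LABEL_TO_CODE = {name: code for code, name in enumerate(LABELS)}


def encode_labels(names):
    """Map sentiment names to integer codes (negative=0, neutral=1, positive=2)."""
    return [LABEL_TO_CODE[name] for name in names]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred, n_classes=3):
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for c in range(n_classes):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # A class with no true or predicted samples contributes 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n_classes


# Toy example with made-up labels (not results from the project).
y_true = encode_labels(["neutral", "negative", "positive", "positive"])
y_pred = encode_labels(["neutral", "negative", "positive", "neutral"])
print("accuracy:", accuracy(y_true, y_pred))
print("macro F1:", macro_f1(y_true, y_pred))
```

Macro averaging weights the three classes equally, which makes it a useful companion to plain accuracy when the classes are not perfectly balanced.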

The deep learning algorithms tested are listed as follows:
1. Multilayer Perceptron (MLP)
2. Convolutional neural network (CNN)
3. Recurrent neural network (RNN)
4. Long short-term memory (LSTM)
5. Gated recurrent unit (GRU)
6. Bi-directional LSTM (Bi-LSTM)
7. RoBERTa (Transformer architecture)
8. BERT
9. XLNet

Note - The report of the experiments is included in this repo.

--------------------------------------------------------------------------------
/0. Preprocessing.ipynb:
--------------------------------------------------------------------------------
1 | {"nbformat":4,"nbformat_minor":0,"metadata":{"colab":{"name":"0. Preprocessing.ipynb","provenance":[],"collapsed_sections":[],"mount_file_id":"1jT_ju-_wnQgF9XO_vdfu5aQFXi4u9Vlt","authorship_tag":"ABX9TyOQzKMm5vAoyw/Nl6894f7Y"},"kernelspec":{"name":"python3","display_name":"Python 3"}},"cells":[{"cell_type":"code","metadata":{"id":"TKTziNf2eq_j"},"source":["import numpy as np\n","import pandas as pd\n","from sklearn.model_selection import train_test_split\n","import re\n","from matplotlib import pyplot as plt\n","import nltk\n","from nltk.tokenize import RegexpTokenizer\n","from nltk.stem import WordNetLemmatizer\n","from nltk.corpus import stopwords\n","import string\n","import unicodedata"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"id":"I52HikOxoXDh"},"source":["! sudo apt install openjdk-8-jdk\n","! sudo update-alternatives --set java /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java\n","! pip install language-check\n","! 
pip install pycontractions"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"id":"wgUmGStToZhZ"},"source":["from pycontractions import Contractions"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"id":"y_7oAsQjqU3s"},"source":["pip install autocorrect"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"id":"ksgf1FsDerB8"},"source":["nltk.download('stopwords')\n","nltk.download('wordnet')\n","nltk.download('punkt')"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"id":"cVqcGbW_erET"},"source":["df = pd.read_csv(\"data/train.csv\")"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"id":"N9na9PIWsAgT"},"source":["df = pd.read_csv(\"data/test.csv\")"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/","height":402},"id":"Xf0Zc7I5erG8","executionInfo":{"status":"ok","timestamp":1607679866544,"user_tz":-60,"elapsed":590,"user":{"displayName":"Rohith Teja","photoUrl":"https://lh3.googleusercontent.com/-nt8x4joQmgY/AAAAAAAAAAI/AAAAAAAAAvg/AbgIIUozOq0/s64/photo.jpg","userId":"01155222072916958278"}},"outputId":"f9b4a71a-1d12-4b47-d6f4-58c3a2fb52b9"},"source":["df"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/html":["
"],"text/plain":[" textID text sentiment\n","0 f87dea47db Last session of the day http://twitpic.com/67ezh neutral\n","1 96d74cb729 Shanghai is also really exciting (precisely -... positive\n","2 eee518ae67 Recession hit Veronique Branquinho, she has to... negative\n","3 01082688c6 happy bday! positive\n","4 33987a8ee5 http://twitpic.com/4w75p - I like it!! positive\n","... ... ... ...\n","3529 e5f0e6ef4b its at 3 am, im very tired but i can`t sleep ... negative\n","3530 416863ce47 All alone in this old house again. Thanks for... positive\n","3531 6332da480c I know what you mean. My little dog is sinkin... negative\n","3532 df1baec676 _sutra what is your next youtube video gonna b... positive\n","3533 469e15c5a8 http://twitpic.com/4woj2 - omgssh ang cute n... positive\n","\n","[3534 rows x 3 columns]"]},"metadata":{"tags":[]},"execution_count":46}]},{"cell_type":"code","metadata":{"id":"ePNBm7GyerNe"},"source":["url = r'''(?i)\\b((?:https?://|www\\d{0,3}[.]|[a-z0-9.\\-]+[.][a-z]{2,4}/)(?:[^\\s()<>]+|\\(([^\\s()<>]+|(\\([^\\s()<>]+\\)))*\\))+(?:\\(([^\\s()<>]+|(\\([^\\s()<>]+\\)))*\\)|[^\\s`!()\\[\\]{};:'\".,<>?«»“”‘’]))'''"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"id":"oqo-1H6S7ZtW"},"source":["#for text\r\n","\r\n","df.text = df.text.map(lambda x:str(x).lower()) #lower case\r\n","df.text = df.text.map(lambda x:re.sub(r\"\\b[^\\s]+@[^\\s]+[.][^\\s]+\\b\", \"\", x)) #email\r\n","df.text = df.text.map(lambda x:re.sub(url, \"\", x)) #url\r\n","df.text = df.text.map(lambda x:re.sub(r'[^a-zA-z.,!?/:;\\\"\\'\\s]', \"\", x)) #numbers\r\n","df.text = df.text.map(lambda x:re.sub(r'^\\s*|\\s\\s*', ' ', x).strip()) #white space\r\n","df.text = df.text.map(lambda x:''.join([c for c in x if c not in string.punctuation])) #punctuations\r\n","df.text = df.text.map(lambda x:re.sub(r'[^a-zA-z0-9.,!?/:;\\\"\\'\\s]', '', x)) #special char\r\n","df.text = df.text.map(lambda x:unicodedata.normalize('NFKD', x).encode('ascii', 'ignore').decode('utf-8', 
'ignore')) #unicode"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"id":"_CMf_Ip1LMat"},"source":["#for selected_text\r\n","\r\n","df.selected_text = df.selected_text.map(lambda x:str(x).lower()) #lower case\r\n","df.selected_text = df.selected_text.map(lambda x:re.sub(r\"\\b[^\\s]+@[^\\s]+[.][^\\s]+\\b\", \"\", x)) #email\r\n","df.selected_text = df.selected_text.map(lambda x:re.sub(url, \"\", x)) #url\r\n","df.selected_text = df.selected_text.map(lambda x:re.sub(r'[^a-zA-Z.,!?/:;\\\"\\'\\s]', \"\", x)) #numbers\r\n","df.selected_text = df.selected_text.map(lambda x:re.sub(r'^\\s*|\\s\\s*', ' ', x).strip()) #white space\r\n","df.selected_text = df.selected_text.map(lambda x:''.join([c for c in x if c not in string.punctuation])) #punctuations\r\n","df.selected_text = df.selected_text.map(lambda x:re.sub(r'[^a-zA-Z0-9.,!?/:;\\\"\\'\\s]', '', x)) #special char\r\n","df.selected_text = df.selected_text.map(lambda x:unicodedata.normalize('NFKD', x).encode('ascii', 'ignore').decode('utf-8', 'ignore')) #unicode"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/","height":402},"id":"GHmT9315oNHv","executionInfo":{"status":"ok","timestamp":1607679897526,"user_tz":-60,"elapsed":863,"user":{"displayName":"Rohith Teja","photoUrl":"https://lh3.googleusercontent.com/-nt8x4joQmgY/AAAAAAAAAAI/AAAAAAAAAvg/AbgIIUozOq0/s64/photo.jpg","userId":"01155222072916958278"}},"outputId":"bc359afa-f863-47ad-fca3-0d7df8b9df32"},"source":["df"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/html":["
"],"text/plain":[" textID text sentiment\n","0 f87dea47db last session of the day neutral\n","1 96d74cb729 shanghai is also really exciting precisely sky... positive\n","2 eee518ae67 recession hit veronique branquinho she has to ... negative\n","3 01082688c6 happy bday positive\n","4 33987a8ee5 i like it positive\n","... ... ... ...\n","3529 e5f0e6ef4b its at am im very tired but i cant sleep but i... negative\n","3530 416863ce47 all alone in this old house again thanks for t... positive\n","3531 6332da480c i know what you mean my little dog is sinking ... negative\n","3532 df1baec676 sutra what is your next youtube video gonna be... positive\n","3533 469e15c5a8 omgssh ang cute ng bby positive\n","\n","[3534 rows x 3 columns]"]},"metadata":{"tags":[]},"execution_count":55}]},{"cell_type":"code","metadata":{"id":"XG9YI5ucqMVb"},"source":["#to use spellcheck\n","\n","from autocorrect import Speller\n","spell = Speller(lang=\"en\")\n","tokenizer = RegexpTokenizer(r'\\w+')\n","\n","df.text = df.text.map(lambda x:tokenizer.tokenize(x)) #remove punctuation and tokenize\n","df.text = df.text.map(lambda x:[spell(i) for i in x]) #spell check"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"id":"IHz0jxZiqMli"},"source":["df.to_csv(\"data/preprocessed_train.csv\",index=False)"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"id":"BlM-iKN4rJxj"},"source":[""],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"id":"JMki-KXJrJ0D"},"source":[""],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"id":"00s4XDourJ2c"},"source":[""],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"id":"EsJ2l4yErJ4p"},"source":[""],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"id":"BTjsAPptrJ7G"},"source":[""],"execution_count":null,"outputs":[]}]} -------------------------------------------------------------------------------- /4. 
RNN.ipynb: -------------------------------------------------------------------------------- 1 | {"nbformat":4,"nbformat_minor":0,"metadata":{"accelerator":"GPU","colab":{"name":"4. RNN.ipynb","provenance":[],"collapsed_sections":[],"mount_file_id":"1QhCah_wh5pgIO02spSb8VbYc5n6eVHiR","authorship_tag":"ABX9TyN8W7fgeybj7ZGgBlwrvQOZ"},"kernelspec":{"display_name":"Python 3","name":"python3"}},"cells":[{"cell_type":"code","metadata":{"id":"XnIpJnkHuT-D"},"source":["import pandas as pd\r\n","import os\r\n","import numpy as np\r\n","from sklearn.model_selection import train_test_split\r\n","import gensim\r\n","from gensim.models.word2vec import Word2Vec\r\n","import gensim.downloader as api\r\n","from keras.utils import to_categorical\r\n","import matplotlib.pyplot as plt \r\n","import keras\r\n","from time import time"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/","height":419},"id":"VijxsP322W3W","executionInfo":{"status":"ok","timestamp":1609346649560,"user_tz":-60,"elapsed":6728,"user":{"displayName":"Rohith Teja","photoUrl":"https://lh3.googleusercontent.com/-nt8x4joQmgY/AAAAAAAAAAI/AAAAAAAAAvg/AbgIIUozOq0/s64/photo.jpg","userId":"01155222072916958278"}},"outputId":"06d5b848-efa6-4d5a-c626-d130477da5ad"},"source":["df = pd.read_csv(\"data/preprocessed_train.csv\")\r\n","df"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/html":["
 | textID | text | selected_text | sentiment\n",
"0 | cb774db0d1 | id have responded if i were going | id have responded if i were going | neutral\n",
"1 | 549e992a42 | sooo sad i will miss you here in san diego | sooo sad | negative\n",
"2 | 088c60f138 | my boss is bullying me | bullying me | negative\n",
"3 | 9642c003ef | what interview leave me alone | leave me alone | negative\n",
"4 | 358bd9e861 | sons of why couldnt they put them on the rele... | sons of | negative\n",
"... | ... | ... | ... | ...\n",
"27476 | 4eac33d1c0 | wish we could come see u on denver husband los... | d lost | negative\n",
"27477 | 4f4c4fc327 | ive wondered about rake to the client has made... | dont force | negative\n",
"27478 | f67aae2310 | yay good for both of you enjoy the break you p... | yay good for both of you | positive\n",
"27479 | ed167662a5 | but it was worth it | but it was worth it | positive\n",
"27480 | 6f7127d9d7 | all this flirting going on the atg smiles yay ... | all this flirting going on the atg smiles yay ... | neutral\n",
"\n",
"27481 rows × 4 columns
"],"text/plain":[" textID ... sentiment\n","0 cb774db0d1 ... neutral\n","1 549e992a42 ... negative\n","2 088c60f138 ... negative\n","3 9642c003ef ... negative\n","4 358bd9e861 ... negative\n","... ... ... ...\n","27476 4eac33d1c0 ... negative\n","27477 4f4c4fc327 ... negative\n","27478 f67aae2310 ... positive\n","27479 ed167662a5 ... positive\n","27480 6f7127d9d7 ... neutral\n","\n","[27481 rows x 4 columns]"]},"metadata":{"tags":[]},"execution_count":3}]},{"cell_type":"code","metadata":{"id":"sEBirsml1Hgu"},"source":["# for case 1 run this code (case 1 = text)\r\n","case = \"case1-rnn\"\r\n","\r\n","#read data\r\n","df = pd.read_csv(\"data/preprocessed_train.csv\")\r\n","df.text = df.text.map(lambda x:str(x))\r\n","df.sentiment = df.sentiment.astype(\"category\")\r\n","df.sentiment = df.sentiment.cat.codes"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"id":"wxc2J9Gr2bQ-"},"source":["# for case 2 run this code (case 2 = selected text)\r\n","case = \"case2-rnn\"\r\n","\r\n","#read data\r\n","df = pd.read_csv(\"data/preprocessed_train.csv\")\r\n","df.text = df.selected_text.map(lambda x:str(x))\r\n","df.sentiment = df.sentiment.astype(\"category\")\r\n","df.sentiment = df.sentiment.cat.codes"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"id":"Kiv4fMpy2wVQ"},"source":["# train, val, test split\r\n","x_train, xtest, y_train, ytest = train_test_split(df.text.values, df.sentiment.values,stratify=df.sentiment.values, test_size=0.3,random_state=1)\r\n","y_train = to_categorical(y_train)\r\n","x_val = xtest[0:4122]\r\n","y_val = to_categorical(ytest[0:4122])\r\n","x_test = xtest[4122:]\r\n","y_test = ytest[4122:]\r\n","\r\n","#padding and tokenization\r\n","from keras.preprocessing.text import Tokenizer\r\n","from keras.preprocessing.sequence import pad_sequences\r\n","\r\n","tokenizer = Tokenizer(num_words=5000)\r\n","tokenizer.fit_on_texts(df.text.values)\r\n","\r\n","X_train = 
tokenizer.texts_to_sequences(x_train)\r\n","X_val = tokenizer.texts_to_sequences(x_val)\r\n","X_test = tokenizer.texts_to_sequences(x_test)\r\n","\r\n","vocab_size = len(tokenizer.word_index) + 1\r\n","\r\n","maxlen = 100\r\n","\r\n","X_train = pad_sequences(X_train, padding='pre', maxlen=maxlen)\r\n","X_val = pad_sequences(X_val, padding='pre', maxlen=maxlen)\r\n","X_test = pad_sequences(X_test, padding='pre', maxlen=maxlen)\r\n","\r\n","word_index = tokenizer.word_index"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"id":"MfchPQ_iuUHb"},"source":["#import glove embeddings\r\n","\r\n","embeddings_index = {}\r\n","f = open(os.path.join( 'glove.twitter.27B.100d.txt'))\r\n","for line in f:\r\n","    values = line.split()\r\n","    word = values[0]\r\n","    coefs = np.asarray(values[1:], dtype='float32')\r\n","    embeddings_index[word] = coefs\r\n","f.close()\r\n","\r\n","embedding_matrix = np.zeros((len(word_index) + 1, 100))\r\n","for word, i in word_index.items():\r\n","    embedding_vector = embeddings_index.get(word)\r\n","    if embedding_vector is not None:\r\n","        # words not found in embedding index will be all-zeros.\r\n","        embedding_matrix[i] = embedding_vector"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"id":"Z4oLKEuduUTR"},"source":["#using glove\r\n","\r\n","from keras.models import Sequential\r\n","from keras import regularizers\r\n","from keras.layers.core import Dense, Dropout, Flatten\r\n","from keras import layers\r\n","from sklearn.metrics import accuracy_score, f1_score\r\n","from keras.layers import Dense, Embedding, LSTM, SpatialDropout1D\r\n","\r\n","\r\n","def rnn_glove(activation,optimizer,epochs,batchsize):\r\n"," embedding_dim = 100\r\n","\r\n"," model = Sequential()\r\n"," model.add(layers.Embedding(input_dim=vocab_size, \r\n"," output_dim=embedding_dim, weights=[embedding_matrix],\r\n"," input_length=maxlen))\r\n"," # pass the activation argument through so the activation search below has an effect\r\n"," model.add(layers.SimpleRNN(64, activation=activation))\r\n"," model.add(Dense(3,activation='softmax'))\r\n"," 
model.compile(optimizer=optimizer,\r\n"," loss='categorical_crossentropy',\r\n"," metrics=['accuracy'])\r\n"," history = model.fit(X_train, y_train,\r\n"," epochs=epochs,\r\n"," verbose=0,\r\n"," validation_data=(X_val, y_val),\r\n"," batch_size=batchsize)\r\n"," \r\n"," return history, model"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"id":"DD3YOLNLfVq5"},"source":["activation = [ \"tanh\"]\r\n","optimizer = [\"adam\", \"SGD\", \"RMSprop\", \"Adadelta\"]\r\n","epochs = [5,10,15,20]\r\n","batchsize = [8,16,32,64,128]"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/","height":1000},"id":"dk16QKlGk3y7","executionInfo":{"status":"ok","timestamp":1609214909898,"user_tz":-60,"elapsed":8825246,"user":{"displayName":"Rohith Teja","photoUrl":"https://lh3.googleusercontent.com/-nt8x4joQmgY/AAAAAAAAAAI/AAAAAAAAAvg/AbgIIUozOq0/s64/photo.jpg","userId":"01155222072916958278"}},"outputId":"6a411f05-0098-4efb-c8ca-00c6dade73be"},"source":["#experiments using glove embeddings\r\n","\r\n","# 1. selecting activation fixing - optimizer = adam, epochs = 5, batch = 16\r\n","sel_activation = {}\r\n","for i in activation:\r\n"," history, model = rnn_glove(i,\"adam\",5,16)\r\n"," temp = {i:model.evaluate(X_val,y_val)[1]}\r\n"," sel_activation.update(temp)\r\n"," keras.backend.clear_session()\r\n","\r\n","sel_activation_final = max(sel_activation, key=sel_activation.get)\r\n","print(\"best activation function is \",sel_activation_final)\r\n","\r\n","# 2. 
selecting optimizer by fixing - activation = best, epochs = 5, batch = 16\r\n","sel_optimizer = {}\r\n","for i in optimizer:\r\n"," history, model = rnn_glove(sel_activation_final,i,5,16)\r\n"," temp = {i:model.evaluate(X_val,y_val)[1]}\r\n"," sel_optimizer.update(temp)\r\n"," keras.backend.clear_session()\r\n","\r\n","sel_optimizer_final = max(sel_optimizer, key=sel_optimizer.get)\r\n","print(\"best optimizer is \",sel_optimizer_final)\r\n","\r\n","# 3. graph epoch vs accuracy score\r\n","\r\n","acc_train_epoch = {}\r\n","acc_val_epoch = {}\r\n","for i in epochs:\r\n"," history, model = rnn_glove(sel_activation_final,sel_optimizer_final,i,16)\r\n"," temp_train = {i:model.evaluate(X_train,y_train)[1]}\r\n"," temp_val = {i:model.evaluate(X_val,y_val)[1]}\r\n"," acc_train_epoch.update(temp_train)\r\n"," acc_val_epoch.update(temp_val)\r\n"," keras.backend.clear_session()\r\n","\r\n","sel_epoch_final = max(acc_val_epoch, key=acc_val_epoch.get)\r\n","print(\"best epoch is \",sel_epoch_final)\r\n","\r\n","df_epoch_train = pd.DataFrame(list(acc_train_epoch.items()), columns=['Epochs', 'Accuracy score'])\r\n","df_epoch_val = pd.DataFrame(list(acc_val_epoch.items()), columns=['Epochs', 'Accuracy score'])\r\n","\r\n","df_epoch_val.Epochs = df_epoch_val.Epochs.map(lambda x:str(x))\r\n","df_epoch_train.Epochs = df_epoch_train.Epochs.map(lambda x:str(x))\r\n","\r\n","\r\n","plt.figure()\r\n","plt.plot(df_epoch_train.iloc[:,0],df_epoch_train.iloc[:,1],c=\"r\",label=\"train\",linestyle='--', marker='o')\r\n","plt.plot(df_epoch_val.iloc[:,0],df_epoch_val.iloc[:,1],c=\"b\",label = \"val\",linestyle='--', marker='o')\r\n","plt.title(\"Accuracy score vs Epochs\")\r\n","plt.ylabel(\"Accuracy score\")\r\n","plt.xlabel(\"Epochs\")\r\n","plt.legend()\r\n","plt.savefig(\"images/acc-epoch-glove-\"+case, bbox_inches='tight',dpi = 200)\r\n","\r\n","\r\n","# 4. 
graph batch size vs accuracy score\r\n","acc_train_batch = {}\r\n","acc_val_batch = {}\r\n","for i in batchsize:\r\n"," history, model = rnn_glove(sel_activation_final,sel_optimizer_final,sel_epoch_final,i)\r\n"," temp_train = {i:model.evaluate(X_train,y_train)[1]}\r\n"," temp_val = {i:model.evaluate(X_val,y_val)[1]}\r\n"," acc_train_batch.update(temp_train)\r\n"," acc_val_batch.update(temp_val)\r\n"," keras.backend.clear_session()\r\n","\r\n","sel_batch_final = max(acc_val_batch, key=acc_val_batch.get)\r\n","print(\"best batchsize is \",sel_batch_final)\r\n","\r\n","df_batch_train = pd.DataFrame(list(acc_train_batch.items()), columns=['Batchsize', 'Accuracy score'])\r\n","df_batch_val = pd.DataFrame(list(acc_val_batch.items()), columns=['Batchsize', 'Accuracy score'])\r\n","\r\n","df_batch_val.Batchsize = df_batch_val.Batchsize.map(lambda x:str(x))\r\n","df_batch_train.Batchsize = df_batch_train.Batchsize.map(lambda x:str(x))\r\n","\r\n","plt.figure()\r\n","plt.plot(df_batch_train.iloc[:,0],df_batch_train.iloc[:,1],c=\"r\",label=\"train\",linestyle='--', marker='o')\r\n","plt.plot(df_batch_val.iloc[:,0],df_batch_val.iloc[:,1],c=\"b\",label = \"val\",linestyle='--', marker='o')\r\n","plt.title(\"Accuracy score vs Batchsize\")\r\n","plt.ylabel(\"Accuracy score\")\r\n","plt.xlabel(\"Batchsize\")\r\n","plt.legend()\r\n","plt.savefig(\"images/acc-batch-glove-\"+case, bbox_inches='tight',dpi = 200)\r\n","\r\n","# 5. 
best model\r\n","t0 = time()\r\n","history, model = rnn_glove(sel_activation_final,sel_optimizer_final,sel_epoch_final,sel_batch_final)\r\n","pred = np.argmax(model.predict(X_test), axis=-1)\r\n","print(\"test accuracy score = \",accuracy_score(y_pred=pred, y_true=y_test))\r\n","t1 = time()\r\n","print(\"time taken is \", t1-t0)"],"execution_count":null,"outputs":[{"output_type":"stream","text":["129/129 [==============================] - 1s 9ms/step - loss: 0.7991 - accuracy: 0.6747\n","best activation function is tanh\n","129/129 [==============================] - 1s 9ms/step - loss: 0.8368 - accuracy: 0.6737\n","129/129 [==============================] - 1s 9ms/step - loss: 0.9075 - accuracy: 0.5902\n","129/129 [==============================] - 1s 9ms/step - loss: 0.6998 - accuracy: 0.7135\n","129/129 [==============================] - 1s 9ms/step - loss: 1.1566 - accuracy: 0.3719\n","best optimizer is RMSprop\n","602/602 [==============================] - 5s 9ms/step - loss: 0.5717 - accuracy: 0.7684\n","129/129 [==============================] - 1s 9ms/step - loss: 0.6976 - accuracy: 0.7232\n","602/602 [==============================] - 5s 9ms/step - loss: 0.3841 - accuracy: 0.8579\n","129/129 [==============================] - 1s 9ms/step - loss: 0.7638 - accuracy: 0.7023\n","602/602 [==============================] - 5s 9ms/step - loss: 0.2630 - accuracy: 0.9077\n","129/129 [==============================] - 1s 9ms/step - loss: 0.9372 - accuracy: 0.6900\n","602/602 [==============================] - 5s 9ms/step - loss: 0.1313 - accuracy: 0.9575\n","129/129 [==============================] - 1s 9ms/step - loss: 1.2095 - accuracy: 0.6720\n","best epoch is 5\n","602/602 [==============================] - 5s 9ms/step - loss: 0.5847 - accuracy: 0.7687\n","129/129 [==============================] - 1s 9ms/step - loss: 0.7015 - accuracy: 0.7227\n","602/602 [==============================] - 5s 9ms/step - loss: 0.5686 - accuracy: 0.7744\n","129/129 
[==============================] - 1s 9ms/step - loss: 0.6971 - accuracy: 0.7096\n","602/602 [==============================] - 5s 9ms/step - loss: 0.5793 - accuracy: 0.7651\n","129/129 [==============================] - 1s 9ms/step - loss: 0.7064 - accuracy: 0.7065\n","602/602 [==============================] - 5s 9ms/step - loss: 0.5641 - accuracy: 0.7789\n","129/129 [==============================] - 1s 9ms/step - loss: 0.7114 - accuracy: 0.7094\n","602/602 [==============================] - 5s 9ms/step - loss: 0.5709 - accuracy: 0.7685\n","129/129 [==============================] - 1s 9ms/step - loss: 0.6851 - accuracy: 0.7120\n","best batchsize is 8\n","test accuracy score = 0.7152558816395829\n","time taken is 824.2185091972351\n"],"name":"stdout"},{"output_type":"display_data","data":{"image/png":"\n","text/plain":["
"]},"metadata":{"tags":[],"needs_background":"light"}},{"output_type":"display_data","data":{"image/png":"iVBORw0KGgoAAAANSUhEUgAAAYgAAAEWCAYAAAB8LwAVAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAgAElEQVR4nO3deXxU9fX/8debTUBQEFCQJcGqVRFFDW4oRa0VaxX7rQuKda2odcWqpdWqpdpa2+8XN+yvtG5VKrXUra2KVsEVK8GiIAiiRQmIAoIsAdnO749zxwxxkkxCJpPlPB+PeWTufmaS3HM/y/1cmRkhhBBCec3yHUAIIYT6KRJECCGEjCJBhBBCyCgSRAghhIwiQYQQQsgoEkQIIYSMIkGEELaKpMmSflCD7Z6WdFYuYgq1IxJEqFTyz79c0jb5jiVkL/m9rZO0WtLnkl6S1Lca25ukXXMZo5kda2YP5PIYYetEgggVklQIHA4YcEIdH7tFXR4vV/L8OS4xs3bADsBk4ME8xhIaoEgQoTJnAq8D9wNbVAVI6inpUUlLJC2TdFfasvMlzZa0StIsSfsn87e4KpV0v6SbkveDJJVI+rGkxcB9kjpK+kdyjOXJ+x5p2+8g6T5Ji5LljyfzZ0o6Pm29lpKWStqv/AeU1DnZ7wpJn0l6WVKzyj6jpGaSrpP0oaRPJf1J0vbJssLkc54n6SPghWT+ucl3slzSREkFmb7wpNrlknLz3pL0P3Kjk2OulDRD0t5V/RLNbBMwHtgrbZ8HSpqSfO6PJd0lqVWy7KVktbeSEsipyfwhkqYnx35f0uC0wxRIejX5nT8rqXOyTWtJDyXf3wpJUyXtlCz7smoq+Yyr014maVCy7GBJryXbv5WaH+qAmcUrXhlfwDzgh8ABwAZgp2R+c+AtYDSwLdAaOCxZdjKwEOgPCNgVKEiWGbBr2v7vB25K3g8CNgK/BrYB2gCdgO8BbYH2wF+Bx9O2/yfwF6Aj0BL4RjL/GuAvaesNAWZU8Bl/Bfy/ZPuWeIlJVXzGc5PvZhegHfAo8GCyrDD5nH9KtmuTHH8esCfQArgOeK2CeM4EXk2b3gtYkXwnxwDTgA5JjHsC3SrYz2TgB8n7VsDNwEtpyw8ADk7iKQRmA1ekLS//uzoQ+Bw4Gr+w7A7skXas94Hdk887GbglWXYB8Pfkd9g8Oe525WMsF/tw4F1gu+Q4y4BvJ8c9Opnuku//j6bwynsA8aqfL+AwPCl0TqbfBUYk7w8BlgAtMmw3Ebi8gn1WlSDWA60riakfsDx53w3YDHTMsN7OwKq0E9EE4JoK9jkKeCI9riw+4/PAD9Omv558V6mTrQG7pC1/GjgvbboZUEqSOMvtuz2whrKkejNwb/L+SGBucmJvVsXvb3JyjBXAF8nJ/ahK1r8CeKyS39XvgdGVHOu6tOkfAs8k788FXgP2qWC7H5SbdxjwKbB7Mv1jkuRb7m/srHz/jzSFV1QxhYqcBTxrZkuT6T9TVs3UE/jQzDZm2K4nfjVZE0vMbF1qQlJbSb9PqnJWAi8BHSQ1T47zmZktL78TM1sEvAp8T1IH4FhgXAXH/A1+df+spA8kjcziM+4MfJg2/SGeHHZKm7cg7X0BcHtSRbIC+AwvAXTPEPsqvGQ0NJl1Wip2M3sBuAsYA3wqaayk7Sr4XACXmVkH/Kr+O8AESfsASNo9qVpbnHy3vwQ6V7Kvqn6vi9Pel+IlK/B2j4nA+KQq8FZJLTPtQFJP4BH85D83mV0AnJz67pLv7zD8AiHkWCSI8BWS2gCnAN9ITiCLgRHAvpL2xU9+vZS5AXYB8LUKdl2KVzWkdC23vPzQwj/Cr84PMrPtgIGpEJPj7JAkgEweAM7Aq7ymmN
nCTCuZ2Soz+5GZ7YI3xF8p6Sgq/4yL8BNXSi+8euyTCj7LAuACM+uQ9mpjZq9VEPvDwGmSDsGrtialxXuHmR2AVz3tDlxdwT7SP+NmM3sZT4TfSmb/Di8V7pZ8tz/Fv9eKVPZ7rezYG8zs52a2F3AonqjOLL9e8jf3OHCbmT1d7rgPlvvutjWzW6obS6i+SBAhkxOBTfhJqF/y2hN4Gf/nfgP4GLhF0rZJQ+SAZNs/AldJOiBpVN01rUF2OnC6pOZJA+c3qoijPbAWWCFpB+CG1AIz+xivurlb3pjdUtLAtG0fB/YHLsfbAzKS9J0kRuHVMJvwqqvKPuPDwAhJvSW1w6++/1JBaQO8jeMnkvokx9xe0smVfO6n8AQ0Ktnv5mS7/pIOSq7A1wDrklirlCSbvYB3klntgZXAakl7ABeV2+QTvI0l5R7gHElHyRvpuyfbVXXcIyT1TUp9K/GquEwx3wu8a2a3lpv/EHC8pGOSv5vW8g4NPTLsI9S2fNdxxav+vYBngP/NMP8UvCqhBX7V/DjeYLgUuCNtvQuBOcBqYCawXzK/CD9BrcKrHh5myzaIknLH2xmvp16N171fgF+Zt0iW74CXFD4BlgOPltv+j/iJtF0ln3UEMD9ZrwT4WdqyjJ8Rv7C6Hr+6XYKfxDomywrTY0zb1/eBGfhJcgFJu0Ilcd2T7Kd/2ryjgLeT72MpXvWU8bMl39u6ZN3VeOlhRNrygXgJYjWe+EcBr5T7HX6Mt2Gcksz7bnL8Vcn+jkk71g/Stj07tS+8imxO8v1+AtyR9vv7crvks5amxbsaODxZdhDwIl41twSvguuV7/+TpvBS8gsIodGRdD3e2HlGvmMJoSFqFDcjhVBeUiV1Hn7lHkKogWiDCI2OpPPxapynzeylqtYPIWQWVUwhhBAyihJECCGEjBpNG0Tnzp2tsLAw32GEEEKDMm3atKVm1iXTskaTIAoLCykuLs53GCGE0KBI+rCiZVHFFEIIIaNIECGEEDLKaYKQNFjSHEnz0gZBS18+OhlffrqkuclAXKllt0p6Rz6G/h3JUAghhBDqSM7aIJKxV8bg47eXAFMlPWlms1LrmNmItPUvBfZL3h8KDAD2SRa/go/bMzlX8YYQQthSLksQBwLzzOwDM1uPP9FqSCXrn4aPzQM+Lktr/EEn2+APcvmkgu1CCCHkQC4TRHe2HBO/hAzj3wMko332Jnk8o5lNwYc4/jh5TTSz2Rm2Gy6pWFLxkiVLajn8EEKdGzcOCguhWTP/Oa6ix3iEulBfGqmHAhPMn52L/LnFewI98KRypKTDy29kZmPNrMjMirp0ydiNN4TQUIwbB8OHw4cfgpn/HD48kkQe5TJBLMSfQpXSI5mXyVDKqpfAhxV+3cxWm9lqfNz/Q3ISZQihfrj2Wigt3XJeaanPD3mRywQxFdgteahKKzwJPFl+peShIx2BKWmzP8KfZtYieTjKN/CHqocQGqP1673EkMlHH3mJItS5nCUI86drXYI/j3Y28IiZvSNplKQT0lYdCoy3LUcNnIA//3YG8Bbwlpn9PVexhhDyaOVK+PrXK16+7ba+/Je/hAULKl4v1LpGM5prUVGRxVAbITQQ69bBK6/AN7/p0z/9KWzcCGPGbFnN1LYtnHsuzJgBL74Ikm9z0UXw3e/mJ/ZGRtI0MyvKtKy+NFKHEJqCtWvhzjvha1+DwYO9+gi8dHDrrTB2LBQUeCIoKPDpO++EyZPh/ffh+uvhvffg2Wd9OzP497+jCipHogQRQsi90lI/2f/617B4MQwcCDfcAEcc4cmgOjZvhjVroH17mDIFDj0UdtsNzjoLzjwTevaseh/hS1GCCCHk12efwY9/DHvuCZMmeXXRkUdWPzmA3yPRvr2/79sX7rsPuneH667zUsfRR5eVTMJWiQQRQqh9q1d7aWHoUJ/u0QNmz4YXXoBBg2rvOO3awdlne9L54AMvlSxfDjvu6Mv/+U949dWogqqhSBAhhNqzcq
3sjGfdM/IbVYPRAtgBOBi4GnhEkvIbUv1gZpvMrB9+RXegpL3TFt8NvGRmL+cnunqnBbA/8Dsz2w8/sd0I/BS4Po9xNRiSrgU24tWYAAcCm/BzWm/gR5J2qcm+m3SCAL4J/NfMlpjZBuBR4NA8x9RQlACPJrUmbwCb8QHDQiKpKplEUncs6QagC3BlPuOqZ0qAkrRS1gQ8YfQG3pI0H0+0b0rqmp8Q6y9JZwPfAYZZ2U1tp+NtOhuSqt9XgRqN09TUE8RHwMGS2iZXv0eRNJCFKj0OHAEgaXegFTH6JpK6pPUmaQMcDbwr6QfAMcBpZrY5nzHWJ2a2GFgg6evJrKOAN81sRzMrNLNCPInsn6wbEpIG4200J5hZadqij4Ajk3W2xUv579bkGC22NsiGzMz+LWkC8CZeRPsPcYv/V0h6GBgEdJZUAtwA3Avcm3S7Ww+clXYF05R1Ax6Q1By/AHvEzP4haSM+HP2UpCbuUTMblcc465NLgXGSWgEfAOfkOZ56p4L/wZ8A2wDPJX9Tr5vZhcAYvE3nHUDAfWb2do2OG//TIYQQMmnqVUwhhBAqEAkihBBCRpEgQgghZBQJIoQQQkaRIEIIIWQUCSKEhKRNkqYnI7G+KanSmyaTUUh/mMV+J0vK+kYlSa9lu24IuRQJIoQya82sXzIS60+AX1WxfgegygRRXWYWd/OHeiESRAiZbQcsB5DUTtLzSaliRjKQHMAtwNeSUsdvknV/nKzzlqRb0vZ3cvKciLmSDk/W7ZPMm56M6b9bMn918nNUsmy6pIWS7kvmn5G23e+Tm/JCqHVxo1wICUmbgBlAa/yO6CPNbFpqKHgzW5k8uOZ1YDegAPhH8iwRJB0L/Az4ppmVStrBzD6TNBmYZmY/kvRt4Eoz+6akO/G7X1N3ETc3s7WSVptZu7S4OgAvA2cDpcCtwP+Y2QZJdyf7+FMdfEWhiWnSQ22EUM7aZBRWkqea/SkZiVXALyUNxAcl7A7slGH7b+LDGpQClBti+dHk5zSgMHk/BbhWUg986I33yu8wGSPsIeD/kmR1Cf4QmKnJ8Apt8GHFQ6h1kSBCyMDMpiSlhS7At5OfByRX7fPxUkZ1fJH83ETyf2dmf5b0b+A44ClJF5jZC+W2uxEf7fS+ZFrAA2ZW48dIhpCtaIMIIQNJe+CPoF0GbA98miSHI/CqJYBVQPu0zZ4DzpHUNtnHDlUcYxfgAzO7A3gC2Kfc8uPxUsllabOfB06StGPqGJIKCCEHogQRQpk2yZPgwK/UzzKzTZLGAX+XNAMoJhk62cyWSXo1GdH2aTO7WlI/oFjSeuAp/ME3FTkF+L6kDcBi4Jflll+JV2e9kVQnPWlm10u6DnhWUjNgA3AxPlJsCLUqGqlDCCFkFFVMIYQQMooEEUIIIaNIECGEEDKKBBFCCCGjSBAhhBAyigQRQggho0gQIYQQMvr/D660X2do5T8AAAAASUVORK5CYII=\n","text/plain":["
"]},"metadata":{"tags":[],"needs_background":"light"}}]},{"cell_type":"code","metadata":{"id":"nH3le1L0ApgP"},"source":["#using keras embedding\r\n","\r\n","def rnn_keras(activation,optimizer,epochs,batchsize):\r\n"," embedding_dim = 100\r\n","\r\n"," model = Sequential()\r\n"," model.add(layers.Embedding(input_dim=vocab_size, \r\n"," output_dim=embedding_dim, \r\n"," input_length=maxlen))\r\n"," # recurrent activation under test (e.g. tanh)\r\n"," model.add(layers.SimpleRNN(64, activation=activation))\r\n"," model.add(Dense(3,activation='softmax'))\r\n"," model.compile(optimizer=optimizer,\r\n"," loss='categorical_crossentropy',\r\n"," metrics=['accuracy'])\r\n"," history = model.fit(X_train, y_train,\r\n"," epochs=epochs,\r\n"," verbose=0,\r\n"," validation_data=(X_val, y_val),\r\n"," batch_size=batchsize)\r\n"," \r\n"," return history, model"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"9F36uecY_L9k","outputId":"8dea8760-567b-49a6-cb57-82e001ce9324"},"source":["#experiments using keras embeddings\r\n","\r\n","# 1. select the activation, fixing optimizer = adam, epochs = 5, batch = 16\r\n","sel_activation = {}\r\n","for i in activation:\r\n"," history, model = rnn_keras(i,\"adam\",5,16)\r\n"," temp = {i:model.evaluate(X_val,y_val)[1]}\r\n"," sel_activation.update(temp)\r\n"," keras.backend.clear_session()\r\n","\r\n","sel_activation_final = max(sel_activation, key=sel_activation.get)\r\n","print(\"best activation function is \",sel_activation_final)\r\n","\r\n","# 2. select the optimizer, fixing activation = best, epochs = 5, batch = 16\r\n","sel_optimizer = {}\r\n","for i in optimizer:\r\n"," history, model = rnn_keras(sel_activation_final,i,5,16)\r\n"," temp = {i:model.evaluate(X_val,y_val)[1]}\r\n"," sel_optimizer.update(temp)\r\n"," keras.backend.clear_session()\r\n","\r\n","sel_optimizer_final = max(sel_optimizer, key=sel_optimizer.get)\r\n","print(\"best optimizer is \",sel_optimizer_final)\r\n","\r\n","# 3. 
graph epoch vs accuracy score\r\n","\r\n","acc_train_epoch = {}\r\n","acc_val_epoch = {}\r\n","for i in epochs:\r\n"," history, model = rnn_keras(sel_activation_final,sel_optimizer_final,i,16)\r\n"," temp_train = {i:model.evaluate(X_train,y_train)[1]}\r\n"," temp_val = {i:model.evaluate(X_val,y_val)[1]}\r\n"," acc_train_epoch.update(temp_train)\r\n"," acc_val_epoch.update(temp_val)\r\n"," keras.backend.clear_session()\r\n","\r\n","sel_epoch_final = max(acc_val_epoch, key=acc_val_epoch.get)\r\n","print(\"best epoch is \",sel_epoch_final)\r\n","\r\n","df_epoch_train = pd.DataFrame(list(acc_train_epoch.items()), columns=['Epochs', 'Accuracy score'])\r\n","df_epoch_val = pd.DataFrame(list(acc_val_epoch.items()), columns=['Epochs', 'Accuracy score'])\r\n","\r\n","df_epoch_val.Epochs = df_epoch_val.Epochs.map(lambda x:str(x))\r\n","df_epoch_train.Epochs = df_epoch_train.Epochs.map(lambda x:str(x))\r\n","\r\n","plt.figure()\r\n","plt.plot(df_epoch_train.iloc[:,0],df_epoch_train.iloc[:,1],c=\"r\",label=\"train\",linestyle='--', marker='o')\r\n","plt.plot(df_epoch_val.iloc[:,0],df_epoch_val.iloc[:,1],c=\"b\",label = \"val\",linestyle='--', marker='o')\r\n","plt.title(\"Accuracy score vs Epochs\")\r\n","plt.ylabel(\"Accuracy score\")\r\n","plt.xlabel(\"Epochs\")\r\n","plt.legend()\r\n","# saved under a keras-specific name so the GloVe plots are not overwritten\r\n","plt.savefig(\"images/acc-epoch-keras-\"+case, bbox_inches='tight',dpi = 200)\r\n","\r\n","\r\n","# 4. 
graph batch size vs accuracy score\r\n","acc_train_batch = {}\r\n","acc_val_batch = {}\r\n","for i in batchsize:\r\n"," history, model = rnn_keras(sel_activation_final,sel_optimizer_final,sel_epoch_final,i)\r\n"," temp_train = {i:model.evaluate(X_train,y_train)[1]}\r\n"," temp_val = {i:model.evaluate(X_val,y_val)[1]}\r\n"," acc_train_batch.update(temp_train)\r\n"," acc_val_batch.update(temp_val)\r\n"," keras.backend.clear_session()\r\n","\r\n","sel_batch_final = max(acc_val_batch, key=acc_val_batch.get)\r\n","print(\"best batchsize is \",sel_batch_final)\r\n","\r\n","df_batch_train = pd.DataFrame(list(acc_train_batch.items()), columns=['Batchsize', 'Accuracy score'])\r\n","df_batch_val = pd.DataFrame(list(acc_val_batch.items()), columns=['Batchsize', 'Accuracy score'])\r\n","\r\n","df_batch_val.Batchsize = df_batch_val.Batchsize.map(lambda x:str(x))\r\n","df_batch_train.Batchsize = df_batch_train.Batchsize.map(lambda x:str(x))\r\n","\r\n","plt.figure()\r\n","plt.plot(df_batch_train.iloc[:,0],df_batch_train.iloc[:,1],c=\"r\",label=\"train\",linestyle='--', marker='o')\r\n","plt.plot(df_batch_val.iloc[:,0],df_batch_val.iloc[:,1],c=\"b\",label = \"val\",linestyle='--', marker='o')\r\n","plt.title(\"Accuracy score vs Batchsize\")\r\n","plt.ylabel(\"Accuracy score\")\r\n","plt.xlabel(\"Batchsize\")\r\n","plt.legend()\r\n","# saved under a keras-specific name so the GloVe plots are not overwritten\r\n","plt.savefig(\"images/acc-batch-keras-\"+case, bbox_inches='tight',dpi = 200)\r\n","\r\n","# 5. 
best model\r\n","t0 = time()\r\n","history, model = rnn_keras(sel_activation_final,sel_optimizer_final,sel_epoch_final,sel_batch_final)\r\n","pred = np.argmax(model.predict(X_test), axis=-1)\r\n","print(\"test accuracy score = \",accuracy_score(y_pred=pred, y_true=y_test))\r\n","t1 = time()\r\n","print(\"time taken is \", t1-t0)"],"execution_count":null,"outputs":[{"output_type":"stream","text":["129/129 [==============================] - 1s 9ms/step - loss: 1.2387 - accuracy: 0.6366\n","best activation function is tanh\n","129/129 [==============================] - 1s 9ms/step - loss: 1.1789 - accuracy: 0.5961\n","129/129 [==============================] - 1s 9ms/step - loss: 0.9554 - accuracy: 0.5245\n","129/129 [==============================] - 1s 9ms/step - loss: 0.9616 - accuracy: 0.6339\n","129/129 [==============================] - 1s 9ms/step - loss: 1.0851 - accuracy: 0.4042\n","best optimizer is RMSprop\n","602/602 [==============================] - 5s 9ms/step - loss: 0.2977 - accuracy: 0.8977\n","129/129 [==============================] - 1s 9ms/step - loss: 0.9371 - accuracy: 0.6395\n","602/602 [==============================] - 5s 9ms/step - loss: 0.1199 - accuracy: 0.9614\n","129/129 [==============================] - 1s 9ms/step - loss: 1.3572 - accuracy: 0.6157\n","602/602 [==============================] - 5s 9ms/step - loss: 0.0772 - accuracy: 0.9755\n","129/129 [==============================] - 1s 9ms/step - loss: 1.6694 - accuracy: 0.5873\n","602/602 [==============================] - 5s 9ms/step - loss: 0.0516 - accuracy: 0.9839\n","129/129 [==============================] - 1s 9ms/step - loss: 2.1405 - accuracy: 0.5633\n","best epoch is 5\n","602/602 [==============================] - 5s 9ms/step - loss: 0.4626 - accuracy: 0.8178\n","129/129 [==============================] - 1s 9ms/step - loss: 0.8269 - accuracy: 0.6664\n","602/602 [==============================] - 5s 9ms/step - loss: 0.2892 - accuracy: 0.8995\n","129/129 
[==============================] - 1s 9ms/step - loss: 0.9192 - accuracy: 0.6422\n","602/602 [==============================] - 5s 9ms/step - loss: 0.2157 - accuracy: 0.9256\n","129/129 [==============================] - 1s 9ms/step - loss: 1.0571 - accuracy: 0.6359\n","602/602 [==============================] - 5s 9ms/step - loss: 0.2357 - accuracy: 0.9196\n","129/129 [==============================] - 1s 9ms/step - loss: 1.0123 - accuracy: 0.6402\n"," 73/602 [==>...........................] - ETA: 4s - loss: 0.2410 - accuracy: 0.9259"],"name":"stdout"}]},{"cell_type":"code","metadata":{"id":"M-L-vb5IOtQH"},"source":[""],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"id":"m9U3sIZIOtSp"},"source":[""],"execution_count":null,"outputs":[]}]} -------------------------------------------------------------------------------- /5. GRU.ipynb: -------------------------------------------------------------------------------- 1 | {"nbformat":4,"nbformat_minor":0,"metadata":{"accelerator":"GPU","colab":{"name":"5. 
GRU.ipynb","provenance":[],"collapsed_sections":[]},"kernelspec":{"display_name":"Python 3","name":"python3"}},"cells":[{"cell_type":"code","metadata":{"id":"XnIpJnkHuT-D"},"source":["import pandas as pd\r\n","import os\r\n","import numpy as np\r\n","from sklearn.model_selection import train_test_split\r\n","import gensim\r\n","from gensim.models.word2vec import Word2Vec\r\n","import gensim.downloader as api\r\n","from keras.utils import to_categorical\r\n","import matplotlib.pyplot as plt \r\n","import keras\r\n","from time import time"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/","height":419},"id":"VijxsP322W3W","executionInfo":{"status":"ok","timestamp":1609347653641,"user_tz":-60,"elapsed":3803,"user":{"displayName":"Rohith Teja","photoUrl":"https://lh3.googleusercontent.com/-nt8x4joQmgY/AAAAAAAAAAI/AAAAAAAAAvg/AbgIIUozOq0/s64/photo.jpg","userId":"01155222072916958278"}},"outputId":"892c309f-7134-4583-cde3-5f2b53018c99"},"source":["df = pd.read_csv(\"data/preprocessed_train.csv\")\r\n","df"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/html":["
<div>\n",
"<table>\n",
"<thead>\n",
"<tr><th></th><th>textID</th><th>text</th><th>selected_text</th><th>sentiment</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>0</th><td>cb774db0d1</td><td>id have responded if i were going</td><td>id have responded if i were going</td><td>neutral</td></tr>\n",
"<tr><th>1</th><td>549e992a42</td><td>sooo sad i will miss you here in san diego</td><td>sooo sad</td><td>negative</td></tr>\n",
"<tr><th>2</th><td>088c60f138</td><td>my boss is bullying me</td><td>bullying me</td><td>negative</td></tr>\n",
"<tr><th>3</th><td>9642c003ef</td><td>what interview leave me alone</td><td>leave me alone</td><td>negative</td></tr>\n",
"<tr><th>4</th><td>358bd9e861</td><td>sons of why couldnt they put them on the rele...</td><td>sons of</td><td>negative</td></tr>\n",
"<tr><th>...</th><td>...</td><td>...</td><td>...</td><td>...</td></tr>\n",
"<tr><th>27476</th><td>4eac33d1c0</td><td>wish we could come see u on denver husband los...</td><td>d lost</td><td>negative</td></tr>\n",
"<tr><th>27477</th><td>4f4c4fc327</td><td>ive wondered about rake to the client has made...</td><td>dont force</td><td>negative</td></tr>\n",
"<tr><th>27478</th><td>f67aae2310</td><td>yay good for both of you enjoy the break you p...</td><td>yay good for both of you</td><td>positive</td></tr>\n",
"<tr><th>27479</th><td>ed167662a5</td><td>but it was worth it</td><td>but it was worth it</td><td>positive</td></tr>\n",
"<tr><th>27480</th><td>6f7127d9d7</td><td>all this flirting going on the atg smiles yay ...</td><td>all this flirting going on the atg smiles yay ...</td><td>neutral</td></tr>\n",
"</tbody>\n",
"</table>\n",
"<p>27481 rows × 4 columns</p>\n",
"</div>
"],"text/plain":[" textID ... sentiment\n","0 cb774db0d1 ... neutral\n","1 549e992a42 ... negative\n","2 088c60f138 ... negative\n","3 9642c003ef ... negative\n","4 358bd9e861 ... negative\n","... ... ... ...\n","27476 4eac33d1c0 ... negative\n","27477 4f4c4fc327 ... negative\n","27478 f67aae2310 ... positive\n","27479 ed167662a5 ... positive\n","27480 6f7127d9d7 ... neutral\n","\n","[27481 rows x 4 columns]"]},"metadata":{"tags":[]},"execution_count":9}]},{"cell_type":"code","metadata":{"id":"sEBirsml1Hgu"},"source":["# for case 1 run this code (case 1 = text)\r\n","case = \"case1-gru\"\r\n","\r\n","#read data\r\n","df = pd.read_csv(\"data/preprocessed_train.csv\")\r\n","df.text = df.text.map(lambda x:str(x))\r\n","df.sentiment = df.sentiment.astype(\"category\")\r\n","df.sentiment = df.sentiment.cat.codes"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"id":"wxc2J9Gr2bQ-"},"source":["# for case 2 run this code (case 2 = selected text)\r\n","case = \"case2-gru\"\r\n","\r\n","#read data\r\n","df = pd.read_csv(\"data/preprocessed_train.csv\")\r\n","df.text = df.selected_text.map(lambda x:str(x))\r\n","df.sentiment = df.sentiment.astype(\"category\")\r\n","df.sentiment = df.sentiment.cat.codes"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"id":"Kiv4fMpy2wVQ"},"source":["# train, val, test split\r\n","x_train, xtest, y_train, ytest = train_test_split(df.text.values, df.sentiment.values,stratify=df.sentiment.values, test_size=0.3,random_state=1)\r\n","y_train = to_categorical(y_train)\r\n","x_val = xtest[0:4122]\r\n","y_val = to_categorical(ytest[0:4122])\r\n","x_test = xtest[4122:]\r\n","y_test = ytest[4122:]\r\n","\r\n","#padding and tokenizing\r\n","from keras.preprocessing.text import Tokenizer\r\n","from keras.preprocessing.sequence import pad_sequences\r\n","\r\n","tokenizer = Tokenizer(num_words=5000)\r\n","tokenizer.fit_on_texts(df.text.values)\r\n","\r\n","X_train = 
tokenizer.texts_to_sequences(x_train)\r\n","X_val = tokenizer.texts_to_sequences(x_val)\r\n","X_test = tokenizer.texts_to_sequences(x_test)\r\n","\r\n","vocab_size = len(tokenizer.word_index) + 1\r\n","\r\n","maxlen = 100\r\n","\r\n","X_train = pad_sequences(X_train, padding='pre', maxlen=maxlen)\r\n","X_val = pad_sequences(X_val, padding='pre', maxlen=maxlen)\r\n","X_test = pad_sequences(X_test, padding='pre', maxlen=maxlen)\r\n","\r\n","word_index = tokenizer.word_index"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"id":"MfchPQ_iuUHb"},"source":["#import glove embeddings\r\n","\r\n","embeddings_index = {}\r\n","# each line of the GloVe file holds a token followed by its 100 coefficients\r\n","with open('glove.twitter.27B.100d.txt', encoding='utf-8') as f:\r\n","    for line in f:\r\n","        values = line.split()\r\n","        word = values[0]\r\n","        coefs = np.asarray(values[1:], dtype='float32')\r\n","        embeddings_index[word] = coefs\r\n","\r\n","embedding_matrix = np.zeros((len(word_index) + 1, 100))\r\n","for word, i in word_index.items():\r\n","    embedding_vector = embeddings_index.get(word)\r\n","    # words not found in the embedding index keep their all-zero row\r\n","    if embedding_vector is not None:\r\n","        embedding_matrix[i] = embedding_vector"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"id":"Z4oLKEuduUTR"},"source":["#using glove\r\n","\r\n","from keras.models import Sequential\r\n","from keras import regularizers\r\n","from keras.layers.core import Dense, Dropout, Flatten\r\n","from keras import layers\r\n","from sklearn.metrics import accuracy_score, f1_score\r\n","from keras.layers import Dense, Embedding, LSTM, SpatialDropout1D\r\n","\r\n","\r\n","def gru_glove(activation,optimizer,epochs,batchsize):\r\n"," embedding_dim = 100\r\n","\r\n"," model = Sequential()\r\n"," model.add(layers.Embedding(input_dim=vocab_size, \r\n"," output_dim=embedding_dim, weights=[embedding_matrix],\r\n"," input_length=maxlen))\r\n"," # recurrent activation under test (e.g. tanh)\r\n"," model.add(layers.GRU(64, activation=activation))\r\n"," model.add(Dense(3,activation='softmax'))\r\n"," 
model.compile(optimizer=optimizer,\r\n"," loss='categorical_crossentropy',\r\n"," metrics=['accuracy'])\r\n"," history = model.fit(X_train, y_train,\r\n"," epochs=epochs,\r\n"," verbose=0,\r\n"," validation_data=(X_val, y_val),\r\n"," batch_size=batchsize)\r\n"," \r\n"," return history, model"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"id":"DD3YOLNLfVq5"},"source":["activation = [ \"tanh\"]\r\n","optimizer = [\"adam\", \"SGD\", \"RMSprop\", \"Adadelta\"]\r\n","epochs = [5,10,15,20]\r\n","batchsize = [8,16,32,64,128]"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/","height":1000},"id":"dk16QKlGk3y7","executionInfo":{"status":"error","timestamp":1609240215657,"user_tz":-60,"elapsed":1240623,"user":{"displayName":"Rohith Teja","photoUrl":"","userId":"15087575277418902762"}},"outputId":"c5936249-c3bc-4228-f48b-996d330336da"},"source":["#experiments using glove embeddings\r\n","\r\n","# 1. selecting activation fixing - optimizer = adam, epochs = 5, batch = 16\r\n","sel_activation = {}\r\n","for i in activation:\r\n"," history, model = gru_glove(i,\"adam\",5,16)\r\n"," temp = {i:model.evaluate(X_val,y_val)[1]}\r\n"," sel_activation.update(temp)\r\n"," keras.backend.clear_session()\r\n","\r\n","sel_activation_final = max(sel_activation, key=sel_activation.get)\r\n","print(\"best activation function is \",sel_activation_final)\r\n","\r\n","# 2. selecting optimizer by fixing - activation = best, epochs = 5, batch = 16\r\n","sel_optimizer = {}\r\n","for i in optimizer:\r\n"," history, model = gru_glove(sel_activation_final,i,5,16)\r\n"," temp = {i:model.evaluate(X_val,y_val)[1]}\r\n"," sel_optimizer.update(temp)\r\n"," keras.backend.clear_session()\r\n","\r\n","sel_optimizer_final = max(sel_optimizer, key=sel_optimizer.get)\r\n","print(\"best optimizer is \",sel_optimizer_final)\r\n","\r\n","# 3. 
graph epoch vs accuracy score\r\n","\r\n","acc_train_epoch = {}\r\n","acc_val_epoch = {}\r\n","for i in epochs:\r\n"," history, model = gru_glove(sel_activation_final,sel_optimizer_final,i,16)\r\n"," temp_train = {i:model.evaluate(X_train,y_train)[1]}\r\n"," temp_val = {i:model.evaluate(X_val,y_val)[1]}\r\n"," acc_train_epoch.update(temp_train)\r\n"," acc_val_epoch.update(temp_val)\r\n"," keras.backend.clear_session()\r\n","\r\n","sel_epoch_final = max(acc_val_epoch, key=acc_val_epoch.get)\r\n","print(\"best epoch is \",sel_epoch_final)\r\n","\r\n","df_epoch_train = pd.DataFrame(list(acc_train_epoch.items()), columns=['Epochs', 'Accuracy score'])\r\n","df_epoch_val = pd.DataFrame(list(acc_val_epoch.items()), columns=['Epochs', 'Accuracy score'])\r\n","\r\n","df_epoch_val.Epochs = df_epoch_val.Epochs.map(lambda x:str(x))\r\n","df_epoch_train.Epochs = df_epoch_train.Epochs.map(lambda x:str(x))\r\n","\r\n","\r\n","plt.figure()\r\n","plt.plot(df_epoch_train.iloc[:,0],df_epoch_train.iloc[:,1],c=\"r\",label=\"train\",linestyle='--', marker='o')\r\n","plt.plot(df_epoch_val.iloc[:,0],df_epoch_val.iloc[:,1],c=\"b\",label = \"val\",linestyle='--', marker='o')\r\n","plt.title(\"Accuracy score vs Epochs\")\r\n","plt.ylabel(\"Accuracy score\")\r\n","plt.xlabel(\"Epochs\")\r\n","plt.legend()\r\n","plt.savefig(\"images/acc-epoch-glove-\"+case, bbox_inches='tight',dpi = 200)\r\n","\r\n","\r\n","# 4. 
graph batch size vs accuracy score\r\n","acc_train_batch = {}\r\n","acc_val_batch = {}\r\n","for i in batchsize:\r\n"," history, model = gru_glove(sel_activation_final,sel_optimizer_final,sel_epoch_final,i)\r\n"," temp_train = {i:model.evaluate(X_train,y_train)[1]}\r\n"," temp_val = {i:model.evaluate(X_val,y_val)[1]}\r\n"," acc_train_batch.update(temp_train)\r\n"," acc_val_batch.update(temp_val)\r\n"," keras.backend.clear_session()\r\n","\r\n","sel_batch_final = max(acc_val_batch, key=acc_val_batch.get)\r\n","print(\"best batchsize is \",sel_batch_final)\r\n","\r\n","df_batch_train = pd.DataFrame(list(acc_train_batch.items()), columns=['Batchsize', 'Accuracy score'])\r\n","df_batch_val = pd.DataFrame(list(acc_val_batch.items()), columns=['Batchsize', 'Accuracy score'])\r\n","\r\n","df_batch_val.Batchsize = df_batch_val.Batchsize.map(lambda x:str(x))\r\n","df_batch_train.Batchsize = df_batch_train.Batchsize.map(lambda x:str(x))\r\n","\r\n","plt.figure()\r\n","plt.plot(df_batch_train.iloc[:,0],df_batch_train.iloc[:,1],c=\"r\",label=\"train\",linestyle='--', marker='o')\r\n","plt.plot(df_batch_val.iloc[:,0],df_batch_val.iloc[:,1],c=\"b\",label = \"val\",linestyle='--', marker='o')\r\n","plt.title(\"Accuracy score vs Batchsize\")\r\n","plt.ylabel(\"Accuracy score\")\r\n","plt.xlabel(\"Batchsize\")\r\n","plt.legend()\r\n","plt.savefig(\"images/acc-batch-glove-\"+case, bbox_inches='tight',dpi = 200)\r\n","\r\n","# 5. 
best model\r\n","t0 = time()\r\n","history, model = gru_glove(sel_activation_final,sel_optimizer_final,sel_epoch_final,sel_batch_final)\r\n","pred = np.argmax(model.predict(X_test), axis=-1)\r\n","print(\"test accuracy score = \",accuracy_score(y_pred=pred, y_true=y_test))\r\n","t1 = time()\r\n","print(\"time taken is \", t1-t0)"],"execution_count":null,"outputs":[{"output_type":"stream","text":["129/129 [==============================] - 1s 4ms/step - loss: 0.8080 - accuracy: 0.7152\n","best activation function is tanh\n","129/129 [==============================] - 1s 4ms/step - loss: 0.7805 - accuracy: 0.7135\n","129/129 [==============================] - 1s 4ms/step - loss: 0.7485 - accuracy: 0.6773\n","129/129 [==============================] - 1s 4ms/step - loss: 0.6540 - accuracy: 0.7368\n","129/129 [==============================] - 1s 4ms/step - loss: 1.0742 - accuracy: 0.4022\n","best optimizer is RMSprop\n","602/602 [==============================] - 3s 4ms/step - loss: 0.4555 - accuracy: 0.8216\n","129/129 [==============================] - 1s 4ms/step - loss: 0.6553 - accuracy: 0.7336\n","602/602 [==============================] - 3s 4ms/step - loss: 0.2070 - accuracy: 0.9347\n","129/129 [==============================] - 1s 4ms/step - loss: 0.8447 - accuracy: 0.7103\n","602/602 [==============================] - 2s 4ms/step - loss: 0.0582 - accuracy: 0.9842\n","129/129 [==============================] - 1s 4ms/step - loss: 1.3716 - accuracy: 0.6965\n","602/602 [==============================] - 2s 4ms/step - loss: 0.0175 - accuracy: 0.9952\n","129/129 [==============================] - 1s 4ms/step - loss: 1.9760 - accuracy: 0.6783\n","best epoch is 5\n"],"name":"stdout"},{"output_type":"stream","text":["ERROR:root:Internal Python error in the inspect module.\n","Below is the traceback from this internal error.\n","\n"],"name":"stderr"},{"output_type":"stream","text":["Traceback (most recent call last):\n"," File 
\"/usr/local/lib/python3.6/dist-packages/IPython/core/interactiveshell.py\", line 2882, in run_code\n"," exec(code_obj, self.user_global_ns, self.user_ns)\n"," File \"\", line 61, in \n"," history, model = mlp_glove(sel_activation_final,sel_optimizer_final,sel_epoch_final,i)\n"," File \"\", line 27, in mlp_glove\n"," batch_size=batchsize)\n"," File \"/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py\", line 1117, in fit\n"," self._fit_frame = tf_inspect.currentframe()\n"," File \"/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/utils/tf_inspect.py\", line 95, in currentframe\n"," return _inspect.stack()[1][0]\n"," File \"/usr/lib/python3.6/inspect.py\", line 1501, in stack\n"," return getouterframes(sys._getframe(1), context)\n"," File \"/usr/lib/python3.6/inspect.py\", line 1478, in getouterframes\n"," frameinfo = (frame,) + getframeinfo(frame, context)\n"," File \"/usr/lib/python3.6/inspect.py\", line 1448, in getframeinfo\n"," filename = getsourcefile(frame) or getfile(frame)\n"," File \"/usr/lib/python3.6/inspect.py\", line 696, in getsourcefile\n"," if getattr(getmodule(object, filename), '__loader__', None) is not None:\n"," File \"/usr/lib/python3.6/inspect.py\", line 725, in getmodule\n"," file = getabsfile(object, _filename)\n"," File \"/usr/lib/python3.6/inspect.py\", line 709, in getabsfile\n"," return os.path.normcase(os.path.abspath(_filename))\n"," File \"/usr/lib/python3.6/posixpath.py\", line 383, in abspath\n"," cwd = os.getcwd()\n","FileNotFoundError: [Errno 2] No such file or directory\n","\n","During handling of the above exception, another exception occurred:\n","\n","Traceback (most recent call last):\n"," File \"/usr/local/lib/python3.6/dist-packages/IPython/core/interactiveshell.py\", line 1823, in showtraceback\n"," stb = value._render_traceback_()\n","AttributeError: 'FileNotFoundError' object has no attribute '_render_traceback_'\n","\n","During handling of the above exception, another 
exception occurred:\n","\n","Traceback (most recent call last):\n"," File \"/usr/local/lib/python3.6/dist-packages/IPython/core/ultratb.py\", line 1132, in get_records\n"," return _fixed_getinnerframes(etb, number_of_lines_of_context, tb_offset)\n"," File \"/usr/local/lib/python3.6/dist-packages/IPython/core/ultratb.py\", line 313, in wrapped\n"," return f(*args, **kwargs)\n"," File \"/usr/local/lib/python3.6/dist-packages/IPython/core/ultratb.py\", line 358, in _fixed_getinnerframes\n"," records = fix_frame_records_filenames(inspect.getinnerframes(etb, context))\n"," File \"/usr/lib/python3.6/inspect.py\", line 1490, in getinnerframes\n"," frameinfo = (tb.tb_frame,) + getframeinfo(tb, context)\n"," File \"/usr/lib/python3.6/inspect.py\", line 1448, in getframeinfo\n"," filename = getsourcefile(frame) or getfile(frame)\n"," File \"/usr/lib/python3.6/inspect.py\", line 696, in getsourcefile\n"," if getattr(getmodule(object, filename), '__loader__', None) is not None:\n"," File \"/usr/lib/python3.6/inspect.py\", line 725, in getmodule\n"," file = getabsfile(object, _filename)\n"," File \"/usr/lib/python3.6/inspect.py\", line 709, in getabsfile\n"," return os.path.normcase(os.path.abspath(_filename))\n"," File \"/usr/lib/python3.6/posixpath.py\", line 383, in abspath\n"," cwd = os.getcwd()\n","FileNotFoundError: [Errno 2] No such file or directory\n"],"name":"stdout"},{"output_type":"error","ename":"FileNotFoundError","evalue":"ignored","traceback":["\u001b[0;31m---------------------------------------------------------------------------\u001b[0m"]},{"output_type":"display_data","data":{"image/png":"\n","text/plain":["
"]},"metadata":{"tags":[],"needs_background":"light"}}]},{"cell_type":"code","metadata":{"id":"nH3le1L0ApgP"},"source":["#using keras embedding\r\n","\r\n","def gru_keras(activation,optimizer,epochs,batchsize):\r\n"," embedding_dim = 100\r\n","\r\n"," model = Sequential()\r\n"," model.add(layers.Embedding(input_dim=vocab_size, \r\n"," output_dim=embedding_dim, \r\n"," input_length=maxlen))\r\n"," model.add(layers.GRU(64))\r\n"," model.add(Dense(3,activation='softmax'))\r\n"," model.compile(optimizer=optimizer,\r\n"," loss='categorical_crossentropy',\r\n"," metrics=['accuracy'])\r\n"," history = model.fit(X_train, y_train,\r\n"," epochs=epochs,\r\n"," verbose=0,\r\n"," validation_data=(X_val, y_val),\r\n"," batch_size=batchsize)\r\n"," \r\n"," return history, model"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/","height":1000},"id":"9F36uecY_L9k","executionInfo":{"status":"error","timestamp":1609241682232,"user_tz":-60,"elapsed":1201728,"user":{"displayName":"Rohith Teja","photoUrl":"","userId":"15087575277418902762"}},"outputId":"199f2548-be4c-441e-dcc7-1085876ba53c"},"source":["#experiments using keras embeddings\r\n","\r\n","# 1. selecting activation fixing - optimizer = adam, epochs = 5, batch = 16\r\n","sel_activation = {}\r\n","for i in activation:\r\n"," history, model = gru_keras(i,\"adam\",5,16)\r\n"," temp = {i:model.evaluate(X_val,y_val)[1]}\r\n"," sel_activation.update(temp)\r\n"," keras.backend.clear_session()\r\n","\r\n","sel_activation_final = max(sel_activation, key=sel_activation.get)\r\n","print(\"best activation function is \",sel_activation_final)\r\n","\r\n","# 2. 
selecting optimizer by fixing - activation = best, epochs = 5, batch = 16\r\n","sel_optimizer = {}\r\n","for i in optimizer:\r\n"," history, model = gru_keras(sel_activation_final,i,5,16)\r\n"," temp = {i:model.evaluate(X_val,y_val)[1]}\r\n"," sel_optimizer.update(temp)\r\n"," keras.backend.clear_session()\r\n","\r\n","sel_optimizer_final = max(sel_optimizer, key=sel_optimizer.get)\r\n","print(\"best optimizer is \",sel_optimizer_final)\r\n","\r\n","# 3. graph epoch vs accuracy score\r\n","\r\n","acc_train_epoch = {}\r\n","acc_val_epoch = {}\r\n","for i in epochs:\r\n"," history, model = gru_keras(sel_activation_final,sel_optimizer_final,i,16)\r\n"," temp_train = {i:model.evaluate(X_train,y_train)[1]}\r\n"," temp_val = {i:model.evaluate(X_val,y_val)[1]}\r\n"," acc_train_epoch.update(temp_train)\r\n"," acc_val_epoch.update(temp_val)\r\n"," keras.backend.clear_session()\r\n","\r\n","sel_epoch_final = max(acc_val_epoch, key=acc_val_epoch.get)\r\n","print(\"best epoch is \",sel_epoch_final)\r\n","\r\n","df_epoch_train = pd.DataFrame(list(acc_train_epoch.items()), columns=['Epochs', 'Accuracy score'])\r\n","df_epoch_val = pd.DataFrame(list(acc_val_epoch.items()), columns=['Epochs', 'Accuracy score'])\r\n","\r\n","df_epoch_val.Epochs = df_epoch_val.Epochs.map(lambda x:str(x))\r\n","df_epoch_train.Epochs = df_epoch_train.Epochs.map(lambda x:str(x))\r\n","\r\n","plt.figure()\r\n","plt.plot(df_epoch_train.iloc[:,0],df_epoch_train.iloc[:,1],c=\"r\",label=\"train\",linestyle='--', marker='o')\r\n","plt.plot(df_epoch_val.iloc[:,0],df_epoch_val.iloc[:,1],c=\"b\",label = \"val\",linestyle='--', marker='o')\r\n","plt.title(\"Accuracy score vs Epochs\")\r\n","plt.ylabel(\"Accuracy score\")\r\n","plt.xlabel(\"Epochs\")\r\n","plt.legend()\r\n","plt.savefig(\"images/acc-epoch-keras-\"+case, bbox_inches='tight',dpi = 200)\r\n","\r\n","\r\n","# 4. 
graph batch size vs accuracy score\r\n","acc_train_batch = {}\r\n","acc_val_batch = {}\r\n","for i in batchsize:\r\n"," history, model = gru_keras(sel_activation_final,sel_optimizer_final,sel_epoch_final,i)\r\n"," temp_train = {i:model.evaluate(X_train,y_train)[1]}\r\n"," temp_val = {i:model.evaluate(X_val,y_val)[1]}\r\n"," acc_train_batch.update(temp_train)\r\n"," acc_val_batch.update(temp_val)\r\n"," keras.backend.clear_session()\r\n","\r\n","sel_batch_final = max(acc_val_batch, key=acc_val_batch.get)\r\n","print(\"best batchsize is \",sel_batch_final)\r\n","\r\n","df_batch_train = pd.DataFrame(list(acc_train_batch.items()), columns=['Batchsize', 'Accuracy score'])\r\n","df_batch_val = pd.DataFrame(list(acc_val_batch.items()), columns=['Batchsize', 'Accuracy score'])\r\n","\r\n","df_batch_val.Batchsize = df_batch_val.Batchsize.map(lambda x:str(x))\r\n","df_batch_train.Batchsize = df_batch_train.Batchsize.map(lambda x:str(x))\r\n","\r\n","plt.figure()\r\n","plt.plot(df_batch_train.iloc[:,0],df_batch_train.iloc[:,1],c=\"r\",label=\"train\",linestyle='--', marker='o')\r\n","plt.plot(df_batch_val.iloc[:,0],df_batch_val.iloc[:,1],c=\"b\",label = \"val\",linestyle='--', marker='o')\r\n","plt.title(\"Accuracy score vs Batchsize\")\r\n","plt.ylabel(\"Accuracy score\")\r\n","plt.xlabel(\"Batchsize\")\r\n","plt.legend()\r\n","plt.savefig(\"images/acc-batch-keras-\"+case, bbox_inches='tight',dpi = 200)\r\n","\r\n","# 5. 
best model\r\n","t0 = time()\r\n","history, model = gru_keras(sel_activation_final,sel_optimizer_final,sel_epoch_final,sel_batch_final)\r\n","pred = np.argmax(model.predict(X_test), axis=-1)\r\n","print(\"test accuracy score = \",accuracy_score(y_pred=pred, y_true=y_test))\r\n","t1 = time()\r\n","print(\"time taken is \", t1-t0)"],"execution_count":null,"outputs":[{"output_type":"stream","text":["129/129 [==============================] - 1s 4ms/step - loss: 0.8845 - accuracy: 0.6895\n","best activation function is tanh\n","129/129 [==============================] - 1s 4ms/step - loss: 0.8548 - accuracy: 0.6943\n","129/129 [==============================] - 1s 4ms/step - loss: 1.0620 - accuracy: 0.4328\n","129/129 [==============================] - 1s 4ms/step - loss: 0.6852 - accuracy: 0.7091\n","129/129 [==============================] - 1s 4ms/step - loss: 1.0936 - accuracy: 0.4136\n","best optimizer is RMSprop\n","602/602 [==============================] - 2s 4ms/step - loss: 0.4635 - accuracy: 0.8244\n","129/129 [==============================] - 1s 4ms/step - loss: 0.6777 - accuracy: 0.7254\n","602/602 [==============================] - 2s 4ms/step - loss: 0.3208 - accuracy: 0.8864\n","129/129 [==============================] - 1s 4ms/step - loss: 0.8381 - accuracy: 0.6841\n","602/602 [==============================] - 2s 4ms/step - loss: 0.1924 - accuracy: 0.9357\n","129/129 [==============================] - 1s 4ms/step - loss: 1.0715 - accuracy: 0.6655\n","602/602 [==============================] - 3s 4ms/step - loss: 0.0860 - accuracy: 0.9734\n","129/129 [==============================] - 1s 4ms/step - loss: 1.4457 - accuracy: 0.6531\n","best epoch is 5\n"],"name":"stdout"},{"output_type":"stream","text":["ERROR:root:Internal Python error in the inspect module.\n","Below is the traceback from this internal error.\n","\n"],"name":"stderr"},{"output_type":"stream","text":["Traceback (most recent call last):\n"," File 
\"/usr/local/lib/python3.6/dist-packages/IPython/core/interactiveshell.py\", line 2882, in run_code\n"," exec(code_obj, self.user_global_ns, self.user_ns)\n"," File \"\", line 60, in \n"," history, model = mlp_keras(sel_activation_final,sel_optimizer_final,sel_epoch_final,i)\n"," File \"\", line 19, in mlp_keras\n"," batch_size=batchsize)\n"," File \"/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py\", line 1117, in fit\n"," self._fit_frame = tf_inspect.currentframe()\n"," File \"/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/utils/tf_inspect.py\", line 95, in currentframe\n"," return _inspect.stack()[1][0]\n"," File \"/usr/lib/python3.6/inspect.py\", line 1501, in stack\n"," return getouterframes(sys._getframe(1), context)\n"," File \"/usr/lib/python3.6/inspect.py\", line 1478, in getouterframes\n"," frameinfo = (frame,) + getframeinfo(frame, context)\n"," File \"/usr/lib/python3.6/inspect.py\", line 1448, in getframeinfo\n"," filename = getsourcefile(frame) or getfile(frame)\n"," File \"/usr/lib/python3.6/inspect.py\", line 696, in getsourcefile\n"," if getattr(getmodule(object, filename), '__loader__', None) is not None:\n"," File \"/usr/lib/python3.6/inspect.py\", line 725, in getmodule\n"," file = getabsfile(object, _filename)\n"," File \"/usr/lib/python3.6/inspect.py\", line 709, in getabsfile\n"," return os.path.normcase(os.path.abspath(_filename))\n"," File \"/usr/lib/python3.6/posixpath.py\", line 383, in abspath\n"," cwd = os.getcwd()\n","FileNotFoundError: [Errno 2] No such file or directory\n","\n","During handling of the above exception, another exception occurred:\n","\n","Traceback (most recent call last):\n"," File \"/usr/local/lib/python3.6/dist-packages/IPython/core/interactiveshell.py\", line 1823, in showtraceback\n"," stb = value._render_traceback_()\n","AttributeError: 'FileNotFoundError' object has no attribute '_render_traceback_'\n","\n","During handling of the above exception, another 
exception occurred:\n","\n","Traceback (most recent call last):\n"," File \"/usr/local/lib/python3.6/dist-packages/IPython/core/ultratb.py\", line 1132, in get_records\n"," return _fixed_getinnerframes(etb, number_of_lines_of_context, tb_offset)\n"," File \"/usr/local/lib/python3.6/dist-packages/IPython/core/ultratb.py\", line 313, in wrapped\n"," return f(*args, **kwargs)\n"," File \"/usr/local/lib/python3.6/dist-packages/IPython/core/ultratb.py\", line 358, in _fixed_getinnerframes\n"," records = fix_frame_records_filenames(inspect.getinnerframes(etb, context))\n"," File \"/usr/lib/python3.6/inspect.py\", line 1490, in getinnerframes\n"," frameinfo = (tb.tb_frame,) + getframeinfo(tb, context)\n"," File \"/usr/lib/python3.6/inspect.py\", line 1448, in getframeinfo\n"," filename = getsourcefile(frame) or getfile(frame)\n"," File \"/usr/lib/python3.6/inspect.py\", line 696, in getsourcefile\n"," if getattr(getmodule(object, filename), '__loader__', None) is not None:\n"," File \"/usr/lib/python3.6/inspect.py\", line 725, in getmodule\n"," file = getabsfile(object, _filename)\n"," File \"/usr/lib/python3.6/inspect.py\", line 709, in getabsfile\n"," return os.path.normcase(os.path.abspath(_filename))\n"," File \"/usr/lib/python3.6/posixpath.py\", line 383, in abspath\n"," cwd = os.getcwd()\n","FileNotFoundError: [Errno 2] No such file or directory\n"],"name":"stdout"},{"output_type":"error","ename":"FileNotFoundError","evalue":"ignored","traceback":["\u001b[0;31m---------------------------------------------------------------------------\u001b[0m"]},{"output_type":"display_data","data":{"image/png":"\n","text/plain":["
"]},"metadata":{"tags":[],"needs_background":"light"}}]},{"cell_type":"code","metadata":{"id":"m9U3sIZIOtSp","colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"status":"ok","timestamp":1609242538023,"user_tz":-60,"elapsed":134318,"user":{"displayName":"Rohith Teja","photoUrl":"","userId":"15087575277418902762"}},"outputId":"3c1c9faf-ecb6-4ea8-af46-f2590de7ebb3"},"source":["# 5. best model\r\n","t0 = time()\r\n","history, model = mlp_keras(sel_activation_final,sel_optimizer_final,sel_epoch_final,sel_batch_final)\r\n","pred = np.argmax(model.predict(X_test), axis=-1)\r\n","print(\"test accuracy score = \",accuracy_score(y_pred=pred, y_true=y_test))\r\n","t1 = time()\r\n","print(\"time taken is \", t1-t0)"],"execution_count":null,"outputs":[{"output_type":"stream","text":["test accuracy score = 0.7196216347319913\n","time taken is 133.69832515716553\n"],"name":"stdout"}]},{"cell_type":"code","metadata":{"id":"RBNS5ShqUqOh"},"source":[""],"execution_count":null,"outputs":[]}]} -------------------------------------------------------------------------------- /8. RoBERTa.ipynb: -------------------------------------------------------------------------------- 1 | {"nbformat":4,"nbformat_minor":0,"metadata":{"colab":{"name":"8. 
RoBERTa.ipynb","provenance":[{"file_id":"1dLIJOZ9Ar0G88j2Sreqg5nokTNm0UAyM","timestamp":1608768028418}],"collapsed_sections":[]},"kernelspec":{"name":"python3","display_name":"Python 3"},"accelerator":"GPU","widgets":{"application/vnd.jupyter.widget-state+json":{"aa165c4c8d4a4bdba30d669f59906d15":{"model_module":"@jupyter-widgets/controls","model_name":"HBoxModel","state":{"_view_name":"HBoxView","_dom_classes":[],"_model_name":"HBoxModel","_view_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_view_count":null,"_view_module_version":"1.5.0","box_style":"","layout":"IPY_MODEL_6ec0c4d438e34833979e9caa837ba5a7","_model_module":"@jupyter-widgets/controls","children":["IPY_MODEL_14fd1204a16d4c3692e57e2852041c2b","IPY_MODEL_6117124ef56747cb9660a3b0a1320814"]}},"6ec0c4d438e34833979e9caa837ba5a7":{"model_module":"@jupyter-widgets/base","model_name":"LayoutModel","state":{"_view_name":"LayoutView","grid_template_rows":null,"right":null,"justify_content":null,"_view_module":"@jupyter-widgets/base","overflow":null,"_model_module_version":"1.2.0","_view_count":null,"flex_flow":null,"width":null,"min_width":null,"border":null,"align_items":null,"bottom":null,"_model_module":"@jupyter-widgets/base","top":null,"grid_column":null,"overflow_y":null,"overflow_x":null,"grid_auto_flow":null,"grid_area":null,"grid_template_columns":null,"flex":null,"_model_name":"LayoutModel","justify_items":null,"grid_row":null,"max_height":null,"align_content":null,"visibility":null,"align_self":null,"height":null,"min_height":null,"padding":null,"grid_auto_rows":null,"grid_gap":null,"max_width":null,"order":null,"_view_module_version":"1.2.0","grid_template_areas":null,"object_position":null,"object_fit":null,"grid_auto_columns":null,"margin":null,"display":null,"left":null}},"14fd1204a16d4c3692e57e2852041c2b":{"model_module":"@jupyter-widgets/controls","model_name":"FloatProgressModel","state":{"_view_name":"ProgressView","style":"IPY_MODEL_18ac3a6a121c4a18910759f5c69e22f1","_d
om_classes":[],"description":"Epoch: 100%","_model_name":"FloatProgressModel","bar_style":"success","max":3,"_view_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","value":3,"_view_count":null,"_view_module_version":"1.5.0","orientation":"horizontal","min":0,"description_tooltip":null,"_model_module":"@jupyter-widgets/controls","layout":"IPY_MODEL_36db65bfa8214685b95ea2ea46b81bda"}},"6117124ef56747cb9660a3b0a1320814":{"model_module":"@jupyter-widgets/controls","model_name":"HTMLModel","state":{"_view_name":"HTMLView","style":"IPY_MODEL_244e5d1e610e4e7180f83fa6e3ab8ab7","_dom_classes":[],"description":"","_model_name":"HTMLModel","placeholder":"​","_view_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","value":" 3/3 [12:23<00:00, 248.00s/it]","_view_count":null,"_view_module_version":"1.5.0","description_tooltip":null,"_model_module":"@jupyter-widgets/controls","layout":"IPY_MODEL_e803ea8467d848888de63596f4ee1920"}},"18ac3a6a121c4a18910759f5c69e22f1":{"model_module":"@jupyter-widgets/controls","model_name":"ProgressStyleModel","state":{"_view_name":"StyleView","_model_name":"ProgressStyleModel","description_width":"initial","_view_module":"@jupyter-widgets/base","_model_module_version":"1.5.0","_view_count":null,"_view_module_version":"1.2.0","bar_color":null,"_model_module":"@jupyter-widgets/controls"}},"36db65bfa8214685b95ea2ea46b81bda":{"model_module":"@jupyter-widgets/base","model_name":"LayoutModel","state":{"_view_name":"LayoutView","grid_template_rows":null,"right":null,"justify_content":null,"_view_module":"@jupyter-widgets/base","overflow":null,"_model_module_version":"1.2.0","_view_count":null,"flex_flow":null,"width":null,"min_width":null,"border":null,"align_items":null,"bottom":null,"_model_module":"@jupyter-widgets/base","top":null,"grid_column":null,"overflow_y":null,"overflow_x":null,"grid_auto_flow":null,"grid_area":null,"grid_template_columns":null,"flex":null,"_model_name":"LayoutModel","justify_items":null,"gr
id_row":null,"max_height":null,"align_content":null,"visibility":null,"align_self":null,"height":null,"min_height":null,"padding":null,"grid_auto_rows":null,"grid_gap":null,"max_width":null,"order":null,"_view_module_version":"1.2.0","grid_template_areas":null,"object_position":null,"object_fit":null,"grid_auto_columns":null,"margin":null,"display":null,"left":null}},"244e5d1e610e4e7180f83fa6e3ab8ab7":{"model_module":"@jupyter-widgets/controls","model_name":"DescriptionStyleModel","state":{"_view_name":"StyleView","_model_name":"DescriptionStyleModel","description_width":"","_view_module":"@jupyter-widgets/base","_model_module_version":"1.5.0","_view_count":null,"_view_module_version":"1.2.0","_model_module":"@jupyter-widgets/controls"}},"e803ea8467d848888de63596f4ee1920":{"model_module":"@jupyter-widgets/base","model_name":"LayoutModel","state":{"_view_name":"LayoutView","grid_template_rows":null,"right":null,"justify_content":null,"_view_module":"@jupyter-widgets/base","overflow":null,"_model_module_version":"1.2.0","_view_count":null,"flex_flow":null,"width":null,"min_width":null,"border":null,"align_items":null,"bottom":null,"_model_module":"@jupyter-widgets/base","top":null,"grid_column":null,"overflow_y":null,"overflow_x":null,"grid_auto_flow":null,"grid_area":null,"grid_template_columns":null,"flex":null,"_model_name":"LayoutModel","justify_items":null,"grid_row":null,"max_height":null,"align_content":null,"visibility":null,"align_self":null,"height":null,"min_height":null,"padding":null,"grid_auto_rows":null,"grid_gap":null,"max_width":null,"order":null,"_view_module_version":"1.2.0","grid_template_areas":null,"object_position":null,"object_fit":null,"grid_auto_columns":null,"margin":null,"display":null,"left":null}},"dc452a3c7a0842309e44f5a32c1b82bd":{"model_module":"@jupyter-widgets/controls","model_name":"HBoxModel","state":{"_view_name":"HBoxView","_dom_classes":[],"_model_name":"HBoxModel","_view_module":"@jupyter-widgets/controls","_model_module_versio
n":"1.5.0","_view_count":null,"_view_module_version":"1.5.0","box_style":"","layout":"IPY_MODEL_e8412f4d3b8e46ab81d4f301157f00be","_model_module":"@jupyter-widgets/controls","children":["IPY_MODEL_e8a92849a76642a7ab2b186ae0df4ae7","IPY_MODEL_eaa7ef95bed7416aa97237bcff09b7af"]}},"e8412f4d3b8e46ab81d4f301157f00be":{"model_module":"@jupyter-widgets/base","model_name":"LayoutModel","state":{"_view_name":"LayoutView","grid_template_rows":null,"right":null,"justify_content":null,"_view_module":"@jupyter-widgets/base","overflow":null,"_model_module_version":"1.2.0","_view_count":null,"flex_flow":null,"width":null,"min_width":null,"border":null,"align_items":null,"bottom":null,"_model_module":"@jupyter-widgets/base","top":null,"grid_column":null,"overflow_y":null,"overflow_x":null,"grid_auto_flow":null,"grid_area":null,"grid_template_columns":null,"flex":null,"_model_name":"LayoutModel","justify_items":null,"grid_row":null,"max_height":null,"align_content":null,"visibility":null,"align_self":null,"height":null,"min_height":null,"padding":null,"grid_auto_rows":null,"grid_gap":null,"max_width":null,"order":null,"_view_module_version":"1.2.0","grid_template_areas":null,"object_position":null,"object_fit":null,"grid_auto_columns":null,"margin":null,"display":null,"left":null}},"e8a92849a76642a7ab2b186ae0df4ae7":{"model_module":"@jupyter-widgets/controls","model_name":"FloatProgressModel","state":{"_view_name":"ProgressView","style":"IPY_MODEL_853d9a7b580a474aafa0941123ff863b","_dom_classes":[],"description":"Iteration: 
100%","_model_name":"FloatProgressModel","bar_style":"success","max":2405,"_view_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","value":2405,"_view_count":null,"_view_module_version":"1.5.0","orientation":"horizontal","min":0,"description_tooltip":null,"_model_module":"@jupyter-widgets/controls","layout":"IPY_MODEL_4f78a0941ba24435b1ad3f9de377fcd7"}},"eaa7ef95bed7416aa97237bcff09b7af":{"model_module":"@jupyter-widgets/controls","model_name":"HTMLModel","state":{"_view_name":"HTMLView","style":"IPY_MODEL_3a2ae730cb3d470ab40b96c8e4e67058","_dom_classes":[],"description":"","_model_name":"HTMLModel","placeholder":"​","_view_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","value":" 2405/2405 [12:23<00:00, 3.23it/s]","_view_count":null,"_view_module_version":"1.5.0","description_tooltip":null,"_model_module":"@jupyter-widgets/controls","layout":"IPY_MODEL_ead4e7718c7a497bb0bc54fe8e19ae35"}},"853d9a7b580a474aafa0941123ff863b":{"model_module":"@jupyter-widgets/controls","model_name":"ProgressStyleModel","state":{"_view_name":"StyleView","_model_name":"ProgressStyleModel","description_width":"initial","_view_module":"@jupyter-widgets/base","_model_module_version":"1.5.0","_view_count":null,"_view_module_version":"1.2.0","bar_color":null,"_model_module":"@jupyter-widgets/controls"}},"4f78a0941ba24435b1ad3f9de377fcd7":{"model_module":"@jupyter-widgets/base","model_name":"LayoutModel","state":{"_view_name":"LayoutView","grid_template_rows":null,"right":null,"justify_content":null,"_view_module":"@jupyter-widgets/base","overflow":null,"_model_module_version":"1.2.0","_view_count":null,"flex_flow":null,"width":null,"min_width":null,"border":null,"align_items":null,"bottom":null,"_model_module":"@jupyter-widgets/base","top":null,"grid_column":null,"overflow_y":null,"overflow_x":null,"grid_auto_flow":null,"grid_area":null,"grid_template_columns":null,"flex":null,"_model_name":"LayoutModel","justify_items":null,"grid_row":null,"max_height":n
ull,"align_content":null,"visibility":null,"align_self":null,"height":null,"min_height":null,"padding":null,"grid_auto_rows":null,"grid_gap":null,"max_width":null,"order":null,"_view_module_version":"1.2.0","grid_template_areas":null,"object_position":null,"object_fit":null,"grid_auto_columns":null,"margin":null,"display":null,"left":null}},"3a2ae730cb3d470ab40b96c8e4e67058":{"model_module":"@jupyter-widgets/controls","model_name":"DescriptionStyleModel","state":{"_view_name":"StyleView","_model_name":"DescriptionStyleModel","description_width":"","_view_module":"@jupyter-widgets/base","_model_module_version":"1.5.0","_view_count":null,"_view_module_version":"1.2.0","_model_module":"@jupyter-widgets/controls"}},"ead4e7718c7a497bb0bc54fe8e19ae35":{"model_module":"@jupyter-widgets/base","model_name":"LayoutModel","state":{"_view_name":"LayoutView","grid_template_rows":null,"right":null,"justify_content":null,"_view_module":"@jupyter-widgets/base","overflow":null,"_model_module_version":"1.2.0","_view_count":null,"flex_flow":null,"width":null,"min_width":null,"border":null,"align_items":null,"bottom":null,"_model_module":"@jupyter-widgets/base","top":null,"grid_column":null,"overflow_y":null,"overflow_x":null,"grid_auto_flow":null,"grid_area":null,"grid_template_columns":null,"flex":null,"_model_name":"LayoutModel","justify_items":null,"grid_row":null,"max_height":null,"align_content":null,"visibility":null,"align_self":null,"height":null,"min_height":null,"padding":null,"grid_auto_rows":null,"grid_gap":null,"max_width":null,"order":null,"_view_module_version":"1.2.0","grid_template_areas":null,"object_position":null,"object_fit":null,"grid_auto_columns":null,"margin":null,"display":null,"left":null}},"ca3c971fe77d454eaf3179146487dc44":{"model_module":"@jupyter-widgets/controls","model_name":"HBoxModel","state":{"_view_name":"HBoxView","_dom_classes":[],"_model_name":"HBoxModel","_view_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_view_count":nu
ll,"_view_module_version":"1.5.0","box_style":"","layout":"IPY_MODEL_82b8beb2e5284052846d9e1a554b65e2","_model_module":"@jupyter-widgets/controls","children":["IPY_MODEL_cd664e45c6b24bc791ab4c063dd97b01","IPY_MODEL_866d76086148468f8409464495f5b65e"]}},"82b8beb2e5284052846d9e1a554b65e2":{"model_module":"@jupyter-widgets/base","model_name":"LayoutModel","state":{"_view_name":"LayoutView","grid_template_rows":null,"right":null,"justify_content":null,"_view_module":"@jupyter-widgets/base","overflow":null,"_model_module_version":"1.2.0","_view_count":null,"flex_flow":null,"width":null,"min_width":null,"border":null,"align_items":null,"bottom":null,"_model_module":"@jupyter-widgets/base","top":null,"grid_column":null,"overflow_y":null,"overflow_x":null,"grid_auto_flow":null,"grid_area":null,"grid_template_columns":null,"flex":null,"_model_name":"LayoutModel","justify_items":null,"grid_row":null,"max_height":null,"align_content":null,"visibility":null,"align_self":null,"height":null,"min_height":null,"padding":null,"grid_auto_rows":null,"grid_gap":null,"max_width":null,"order":null,"_view_module_version":"1.2.0","grid_template_areas":null,"object_position":null,"object_fit":null,"grid_auto_columns":null,"margin":null,"display":null,"left":null}},"cd664e45c6b24bc791ab4c063dd97b01":{"model_module":"@jupyter-widgets/controls","model_name":"FloatProgressModel","state":{"_view_name":"ProgressView","style":"IPY_MODEL_a125e2c861bf40f88ec5448bb0ac14a4","_dom_classes":[],"description":"Iteration: 
100%","_model_name":"FloatProgressModel","bar_style":"success","max":2405,"_view_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","value":2405,"_view_count":null,"_view_module_version":"1.5.0","orientation":"horizontal","min":0,"description_tooltip":null,"_model_module":"@jupyter-widgets/controls","layout":"IPY_MODEL_aa163f956a7a48669f70ebbc50912fa9"}},"866d76086148468f8409464495f5b65e":{"model_module":"@jupyter-widgets/controls","model_name":"HTMLModel","state":{"_view_name":"HTMLView","style":"IPY_MODEL_aff54d05e66d4eddb83fec133c002263","_dom_classes":[],"description":"","_model_name":"HTMLModel","placeholder":"​","_view_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","value":" 2405/2405 [09:09<00:00, 4.38it/s]","_view_count":null,"_view_module_version":"1.5.0","description_tooltip":null,"_model_module":"@jupyter-widgets/controls","layout":"IPY_MODEL_305cb5e80a6943e6a714908a088c0398"}},"a125e2c861bf40f88ec5448bb0ac14a4":{"model_module":"@jupyter-widgets/controls","model_name":"ProgressStyleModel","state":{"_view_name":"StyleView","_model_name":"ProgressStyleModel","description_width":"initial","_view_module":"@jupyter-widgets/base","_model_module_version":"1.5.0","_view_count":null,"_view_module_version":"1.2.0","bar_color":null,"_model_module":"@jupyter-widgets/controls"}},"aa163f956a7a48669f70ebbc50912fa9":{"model_module":"@jupyter-widgets/base","model_name":"LayoutModel","state":{"_view_name":"LayoutView","grid_template_rows":null,"right":null,"justify_content":null,"_view_module":"@jupyter-widgets/base","overflow":null,"_model_module_version":"1.2.0","_view_count":null,"flex_flow":null,"width":null,"min_width":null,"border":null,"align_items":null,"bottom":null,"_model_module":"@jupyter-widgets/base","top":null,"grid_column":null,"overflow_y":null,"overflow_x":null,"grid_auto_flow":null,"grid_area":null,"grid_template_columns":null,"flex":null,"_model_name":"LayoutModel","justify_items":null,"grid_row":null,"max_height":n
ull,"align_content":null,"visibility":null,"align_self":null,"height":null,"min_height":null,"padding":null,"grid_auto_rows":null,"grid_gap":null,"max_width":null,"order":null,"_view_module_version":"1.2.0","grid_template_areas":null,"object_position":null,"object_fit":null,"grid_auto_columns":null,"margin":null,"display":null,"left":null}},"aff54d05e66d4eddb83fec133c002263":{"model_module":"@jupyter-widgets/controls","model_name":"DescriptionStyleModel","state":{"_view_name":"StyleView","_model_name":"DescriptionStyleModel","description_width":"","_view_module":"@jupyter-widgets/base","_model_module_version":"1.5.0","_view_count":null,"_view_module_version":"1.2.0","_model_module":"@jupyter-widgets/controls"}},"305cb5e80a6943e6a714908a088c0398":{"model_module":"@jupyter-widgets/base","model_name":"LayoutModel","state":{"_view_name":"LayoutView","grid_template_rows":null,"right":null,"justify_content":null,"_view_module":"@jupyter-widgets/base","overflow":null,"_model_module_version":"1.2.0","_view_count":null,"flex_flow":null,"width":null,"min_width":null,"border":null,"align_items":null,"bottom":null,"_model_module":"@jupyter-widgets/base","top":null,"grid_column":null,"overflow_y":null,"overflow_x":null,"grid_auto_flow":null,"grid_area":null,"grid_template_columns":null,"flex":null,"_model_name":"LayoutModel","justify_items":null,"grid_row":null,"max_height":null,"align_content":null,"visibility":null,"align_self":null,"height":null,"min_height":null,"padding":null,"grid_auto_rows":null,"grid_gap":null,"max_width":null,"order":null,"_view_module_version":"1.2.0","grid_template_areas":null,"object_position":null,"object_fit":null,"grid_auto_columns":null,"margin":null,"display":null,"left":null}},"c980297feeca4330a7ec8cd7b3611d63":{"model_module":"@jupyter-widgets/controls","model_name":"HBoxModel","state":{"_view_name":"HBoxView","_dom_classes":[],"_model_name":"HBoxModel","_view_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_view_count":nu
ll,"_view_module_version":"1.5.0","box_style":"","layout":"IPY_MODEL_738d1f6643bb4d0784fc279254b24134","_model_module":"@jupyter-widgets/controls","children":["IPY_MODEL_43dd7167468747d9aac84ae9a003e42f","IPY_MODEL_91a9349ed14b49cea0bd1d2762dfa7cf"]}},"738d1f6643bb4d0784fc279254b24134":{"model_module":"@jupyter-widgets/base","model_name":"LayoutModel","state":{"_view_name":"LayoutView","grid_template_rows":null,"right":null,"justify_content":null,"_view_module":"@jupyter-widgets/base","overflow":null,"_model_module_version":"1.2.0","_view_count":null,"flex_flow":null,"width":null,"min_width":null,"border":null,"align_items":null,"bottom":null,"_model_module":"@jupyter-widgets/base","top":null,"grid_column":null,"overflow_y":null,"overflow_x":null,"grid_auto_flow":null,"grid_area":null,"grid_template_columns":null,"flex":null,"_model_name":"LayoutModel","justify_items":null,"grid_row":null,"max_height":null,"align_content":null,"visibility":null,"align_self":null,"height":null,"min_height":null,"padding":null,"grid_auto_rows":null,"grid_gap":null,"max_width":null,"order":null,"_view_module_version":"1.2.0","grid_template_areas":null,"object_position":null,"object_fit":null,"grid_auto_columns":null,"margin":null,"display":null,"left":null}},"43dd7167468747d9aac84ae9a003e42f":{"model_module":"@jupyter-widgets/controls","model_name":"FloatProgressModel","state":{"_view_name":"ProgressView","style":"IPY_MODEL_3f1bbd86cedd44d79d5b9bbb1f502bd4","_dom_classes":[],"description":"Iteration: 
100%","_model_name":"FloatProgressModel","bar_style":"success","max":2405,"_view_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","value":2405,"_view_count":null,"_view_module_version":"1.5.0","orientation":"horizontal","min":0,"description_tooltip":null,"_model_module":"@jupyter-widgets/controls","layout":"IPY_MODEL_142219a840f3408d9b951aa884fa4362"}},"91a9349ed14b49cea0bd1d2762dfa7cf":{"model_module":"@jupyter-widgets/controls","model_name":"HTMLModel","state":{"_view_name":"HTMLView","style":"IPY_MODEL_b75ec03a81894ce6a13e232ad16222ab","_dom_classes":[],"description":"","_model_name":"HTMLModel","placeholder":"​","_view_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","value":" 2405/2405 [05:54<00:00, 6.79it/s]","_view_count":null,"_view_module_version":"1.5.0","description_tooltip":null,"_model_module":"@jupyter-widgets/controls","layout":"IPY_MODEL_638ece11c7014635a228491add1be51d"}},"3f1bbd86cedd44d79d5b9bbb1f502bd4":{"model_module":"@jupyter-widgets/controls","model_name":"ProgressStyleModel","state":{"_view_name":"StyleView","_model_name":"ProgressStyleModel","description_width":"initial","_view_module":"@jupyter-widgets/base","_model_module_version":"1.5.0","_view_count":null,"_view_module_version":"1.2.0","bar_color":null,"_model_module":"@jupyter-widgets/controls"}},"142219a840f3408d9b951aa884fa4362":{"model_module":"@jupyter-widgets/base","model_name":"LayoutModel","state":{"_view_name":"LayoutView","grid_template_rows":null,"right":null,"justify_content":null,"_view_module":"@jupyter-widgets/base","overflow":null,"_model_module_version":"1.2.0","_view_count":null,"flex_flow":null,"width":null,"min_width":null,"border":null,"align_items":null,"bottom":null,"_model_module":"@jupyter-widgets/base","top":null,"grid_column":null,"overflow_y":null,"overflow_x":null,"grid_auto_flow":null,"grid_area":null,"grid_template_columns":null,"flex":null,"_model_name":"LayoutModel","justify_items":null,"grid_row":null,"max_height":n
ull,"align_content":null,"visibility":null,"align_self":null,"height":null,"min_height":null,"padding":null,"grid_auto_rows":null,"grid_gap":null,"max_width":null,"order":null,"_view_module_version":"1.2.0","grid_template_areas":null,"object_position":null,"object_fit":null,"grid_auto_columns":null,"margin":null,"display":null,"left":null}},"b75ec03a81894ce6a13e232ad16222ab":{"model_module":"@jupyter-widgets/controls","model_name":"DescriptionStyleModel","state":{"_view_name":"StyleView","_model_name":"DescriptionStyleModel","description_width":"","_view_module":"@jupyter-widgets/base","_model_module_version":"1.5.0","_view_count":null,"_view_module_version":"1.2.0","_model_module":"@jupyter-widgets/controls"}},"638ece11c7014635a228491add1be51d":{"model_module":"@jupyter-widgets/base","model_name":"LayoutModel","state":{"_view_name":"LayoutView","grid_template_rows":null,"right":null,"justify_content":null,"_view_module":"@jupyter-widgets/base","overflow":null,"_model_module_version":"1.2.0","_view_count":null,"flex_flow":null,"width":null,"min_width":null,"border":null,"align_items":null,"bottom":null,"_model_module":"@jupyter-widgets/base","top":null,"grid_column":null,"overflow_y":null,"overflow_x":null,"grid_auto_flow":null,"grid_area":null,"grid_template_columns":null,"flex":null,"_model_name":"LayoutModel","justify_items":null,"grid_row":null,"max_height":null,"align_content":null,"visibility":null,"align_self":null,"height":null,"min_height":null,"padding":null,"grid_auto_rows":null,"grid_gap":null,"max_width":null,"order":null,"_view_module_version":"1.2.0","grid_template_areas":null,"object_position":null,"object_fit":null,"grid_auto_columns":null,"margin":null,"display":null,"left":null}},"dfcc3e8bb29142aaaf0cf2a9d4ecce92":{"model_module":"@jupyter-widgets/controls","model_name":"HBoxModel","state":{"_view_name":"HBoxView","_dom_classes":[],"_model_name":"HBoxModel","_view_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_view_count":nu
ll,"_view_module_version":"1.5.0","box_style":"","layout":"IPY_MODEL_1acce1360fef474aa86b9abd11d6faab","_model_module":"@jupyter-widgets/controls","children":["IPY_MODEL_1cd501aa81f345d185b19a3f741e50d4","IPY_MODEL_500f8f5204db452f9ec05c1e392eeb00"]}},"1acce1360fef474aa86b9abd11d6faab":{"model_module":"@jupyter-widgets/base","model_name":"LayoutModel","state":{"_view_name":"LayoutView","grid_template_rows":null,"right":null,"justify_content":null,"_view_module":"@jupyter-widgets/base","overflow":null,"_model_module_version":"1.2.0","_view_count":null,"flex_flow":null,"width":null,"min_width":null,"border":null,"align_items":null,"bottom":null,"_model_module":"@jupyter-widgets/base","top":null,"grid_column":null,"overflow_y":null,"overflow_x":null,"grid_auto_flow":null,"grid_area":null,"grid_template_columns":null,"flex":null,"_model_name":"LayoutModel","justify_items":null,"grid_row":null,"max_height":null,"align_content":null,"visibility":null,"align_self":null,"height":null,"min_height":null,"padding":null,"grid_auto_rows":null,"grid_gap":null,"max_width":null,"order":null,"_view_module_version":"1.2.0","grid_template_areas":null,"object_position":null,"object_fit":null,"grid_auto_columns":null,"margin":null,"display":null,"left":null}},"1cd501aa81f345d185b19a3f741e50d4":{"model_module":"@jupyter-widgets/controls","model_name":"FloatProgressModel","state":{"_view_name":"ProgressView","style":"IPY_MODEL_ca515671c27c4905834982fd360c6649","_dom_classes":[],"description":"Evaluating: 
100%","_model_name":"FloatProgressModel","bar_style":"success","max":685,"_view_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","value":685,"_view_count":null,"_view_module_version":"1.5.0","orientation":"horizontal","min":0,"description_tooltip":null,"_model_module":"@jupyter-widgets/controls","layout":"IPY_MODEL_368dd39a241b4903ae2a084be4697c7a"}},"500f8f5204db452f9ec05c1e392eeb00":{"model_module":"@jupyter-widgets/controls","model_name":"HTMLModel","state":{"_view_name":"HTMLView","style":"IPY_MODEL_d35c7207643f49dfa3b1f4fb6a81de81","_dom_classes":[],"description":"","_model_name":"HTMLModel","placeholder":"​","_view_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","value":" 685/685 [00:40<00:00, 16.86it/s]","_view_count":null,"_view_module_version":"1.5.0","description_tooltip":null,"_model_module":"@jupyter-widgets/controls","layout":"IPY_MODEL_f9f7fb247b0a41a0b25eef76e8337cd8"}},"ca515671c27c4905834982fd360c6649":{"model_module":"@jupyter-widgets/controls","model_name":"ProgressStyleModel","state":{"_view_name":"StyleView","_model_name":"ProgressStyleModel","description_width":"initial","_view_module":"@jupyter-widgets/base","_model_module_version":"1.5.0","_view_count":null,"_view_module_version":"1.2.0","bar_color":null,"_model_module":"@jupyter-widgets/controls"}},"368dd39a241b4903ae2a084be4697c7a":{"model_module":"@jupyter-widgets/base","model_name":"LayoutModel","state":{"_view_name":"LayoutView","grid_template_rows":null,"right":null,"justify_content":null,"_view_module":"@jupyter-widgets/base","overflow":null,"_model_module_version":"1.2.0","_view_count":null,"flex_flow":null,"width":null,"min_width":null,"border":null,"align_items":null,"bottom":null,"_model_module":"@jupyter-widgets/base","top":null,"grid_column":null,"overflow_y":null,"overflow_x":null,"grid_auto_flow":null,"grid_area":null,"grid_template_columns":null,"flex":null,"_model_name":"LayoutModel","justify_items":null,"grid_row":null,"max_height":null
,"align_content":null,"visibility":null,"align_self":null,"height":null,"min_height":null,"padding":null,"grid_auto_rows":null,"grid_gap":null,"max_width":null,"order":null,"_view_module_version":"1.2.0","grid_template_areas":null,"object_position":null,"object_fit":null,"grid_auto_columns":null,"margin":null,"display":null,"left":null}},"d35c7207643f49dfa3b1f4fb6a81de81":{"model_module":"@jupyter-widgets/controls","model_name":"DescriptionStyleModel","state":{"_view_name":"StyleView","_model_name":"DescriptionStyleModel","description_width":"","_view_module":"@jupyter-widgets/base","_model_module_version":"1.5.0","_view_count":null,"_view_module_version":"1.2.0","_model_module":"@jupyter-widgets/controls"}},"f9f7fb247b0a41a0b25eef76e8337cd8":{"model_module":"@jupyter-widgets/base","model_name":"LayoutModel","state":{"_view_name":"LayoutView","grid_template_rows":null,"right":null,"justify_content":null,"_view_module":"@jupyter-widgets/base","overflow":null,"_model_module_version":"1.2.0","_view_count":null,"flex_flow":null,"width":null,"min_width":null,"border":null,"align_items":null,"bottom":null,"_model_module":"@jupyter-widgets/base","top":null,"grid_column":null,"overflow_y":null,"overflow_x":null,"grid_auto_flow":null,"grid_area":null,"grid_template_columns":null,"flex":null,"_model_name":"LayoutModel","justify_items":null,"grid_row":null,"max_height":null,"align_content":null,"visibility":null,"align_self":null,"height":null,"min_height":null,"padding":null,"grid_auto_rows":null,"grid_gap":null,"max_width":null,"order":null,"_view_module_version":"1.2.0","grid_template_areas":null,"object_position":null,"object_fit":null,"grid_auto_columns":null,"margin":null,"display":null,"left":null}}}}},"cells":[{"cell_type":"code","metadata":{"id":"gxAkXurO4ed2"},"source":["!pip install transformers"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"id":"969WOYVB4sMG"},"source":["import csv\r\n","import os\r\n","import random\r\n","from pathlib 
import Path\r\n","import numpy as np\r\n","import pandas as pd\r\n","import torch\r\n","from torch.utils.data import (DataLoader, RandomSampler, SequentialSampler,\r\n"," TensorDataset)\r\n","from torch.utils.data.distributed import DistributedSampler\r\n","from transformers import RobertaConfig, RobertaForSequenceClassification, RobertaTokenizer\r\n","from transformers import AdamW, get_linear_schedule_with_warmup\r\n","from tqdm import tqdm, trange, tqdm_notebook\r\n","from sklearn.metrics import matthews_corrcoef, f1_score"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/","height":419},"id":"Q1QTMaFx40y0","executionInfo":{"status":"ok","timestamp":1609351260576,"user_tz":-60,"elapsed":2756,"user":{"displayName":"Rohith Teja","photoUrl":"https://lh3.googleusercontent.com/-nt8x4joQmgY/AAAAAAAAAAI/AAAAAAAAAvg/AbgIIUozOq0/s64/photo.jpg","userId":"01155222072916958278"}},"outputId":"89f9c425-4584-4117-a5ea-11c53e7ab63a"},"source":["dataset = pd.read_csv(\"data/preprocessed_train.csv\")\r\n","dataset"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/html":["
<div>\n",
"<table>\n",
"  <thead>\n",
"    <tr><th></th><th>textID</th><th>text</th><th>selected_text</th><th>sentiment</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>cb774db0d1</td><td>id have responded if i were going</td><td>id have responded if i were going</td><td>neutral</td></tr>\n",
"    <tr><th>1</th><td>549e992a42</td><td>sooo sad i will miss you here in san diego</td><td>sooo sad</td><td>negative</td></tr>\n",
"    <tr><th>2</th><td>088c60f138</td><td>my boss is bullying me</td><td>bullying me</td><td>negative</td></tr>\n",
"    <tr><th>3</th><td>9642c003ef</td><td>what interview leave me alone</td><td>leave me alone</td><td>negative</td></tr>\n",
"    <tr><th>4</th><td>358bd9e861</td><td>sons of why couldnt they put them on the rele...</td><td>sons of</td><td>negative</td></tr>\n",
"    <tr><th>...</th><td>...</td><td>...</td><td>...</td><td>...</td></tr>\n",
"    <tr><th>27476</th><td>4eac33d1c0</td><td>wish we could come see u on denver husband los...</td><td>d lost</td><td>negative</td></tr>\n",
"    <tr><th>27477</th><td>4f4c4fc327</td><td>ive wondered about rake to the client has made...</td><td>dont force</td><td>negative</td></tr>\n",
"    <tr><th>27478</th><td>f67aae2310</td><td>yay good for both of you enjoy the break you p...</td><td>yay good for both of you</td><td>positive</td></tr>\n",
"    <tr><th>27479</th><td>ed167662a5</td><td>but it was worth it</td><td>but it was worth it</td><td>positive</td></tr>\n",
"    <tr><th>27480</th><td>6f7127d9d7</td><td>all this flirting going on the atg smiles yay ...</td><td>all this flirting going on the atg smiles yay ...</td><td>neutral</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"<p>27481 rows × 4 columns</p>\n",
"</div>
"],"text/plain":[" textID ... sentiment\n","0 cb774db0d1 ... neutral\n","1 549e992a42 ... negative\n","2 088c60f138 ... negative\n","3 9642c003ef ... negative\n","4 358bd9e861 ... negative\n","... ... ... ...\n","27476 4eac33d1c0 ... negative\n","27477 4f4c4fc327 ... negative\n","27478 f67aae2310 ... positive\n","27479 ed167662a5 ... positive\n","27480 6f7127d9d7 ... neutral\n","\n","[27481 rows x 4 columns]"]},"metadata":{"tags":[]},"execution_count":7}]},{"cell_type":"code","metadata":{"id":"o8f8_Z0kz4iO"},"source":["dataset.dropna(inplace=True)"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/","height":419},"id":"ZUrUmrmjz4k1","executionInfo":{"status":"ok","timestamp":1609351552294,"user_tz":-60,"elapsed":1327,"user":{"displayName":"Rohith Teja","photoUrl":"https://lh3.googleusercontent.com/-nt8x4joQmgY/AAAAAAAAAAI/AAAAAAAAAvg/AbgIIUozOq0/s64/photo.jpg","userId":"01155222072916958278"}},"outputId":"05f304a9-8b0d-4a6d-a1d9-7e5f1954d3c9"},"source":["dataset"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/html":["
<div>\n",
"<table>\n",
"  <thead>\n",
"    <tr><th></th><th>textID</th><th>text</th><th>selected_text</th><th>sentiment</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>cb774db0d1</td><td>id have responded if i were going</td><td>id have responded if i were going</td><td>neutral</td></tr>\n",
"    <tr><th>1</th><td>549e992a42</td><td>sooo sad i will miss you here in san diego</td><td>sooo sad</td><td>negative</td></tr>\n",
"    <tr><th>2</th><td>088c60f138</td><td>my boss is bullying me</td><td>bullying me</td><td>negative</td></tr>\n",
"    <tr><th>3</th><td>9642c003ef</td><td>what interview leave me alone</td><td>leave me alone</td><td>negative</td></tr>\n",
"    <tr><th>4</th><td>358bd9e861</td><td>sons of why couldnt they put them on the rele...</td><td>sons of</td><td>negative</td></tr>\n",
"    <tr><th>...</th><td>...</td><td>...</td><td>...</td><td>...</td></tr>\n",
"    <tr><th>27476</th><td>4eac33d1c0</td><td>wish we could come see u on denver husband los...</td><td>d lost</td><td>negative</td></tr>\n",
"    <tr><th>27477</th><td>4f4c4fc327</td><td>ive wondered about rake to the client has made...</td><td>dont force</td><td>negative</td></tr>\n",
"    <tr><th>27478</th><td>f67aae2310</td><td>yay good for both of you enjoy the break you p...</td><td>yay good for both of you</td><td>positive</td></tr>\n",
"    <tr><th>27479</th><td>ed167662a5</td><td>but it was worth it</td><td>but it was worth it</td><td>positive</td></tr>\n",
"    <tr><th>27480</th><td>6f7127d9d7</td><td>all this flirting going on the atg smiles yay ...</td><td>all this flirting going on the atg smiles yay ...</td><td>neutral</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"<p>27383 rows × 4 columns</p>\n",
"</div>
"],"text/plain":[" textID ... sentiment\n","0 cb774db0d1 ... neutral\n","1 549e992a42 ... negative\n","2 088c60f138 ... negative\n","3 9642c003ef ... negative\n","4 358bd9e861 ... negative\n","... ... ... ...\n","27476 4eac33d1c0 ... negative\n","27477 4f4c4fc327 ... negative\n","27478 f67aae2310 ... positive\n","27479 ed167662a5 ... positive\n","27480 6f7127d9d7 ... neutral\n","\n","[27383 rows x 4 columns]"]},"metadata":{"tags":[]},"execution_count":16}]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/","height":390},"id":"dpwDnTHA48Vc","executionInfo":{"status":"ok","timestamp":1609351584653,"user_tz":-60,"elapsed":650,"user":{"displayName":"Rohith Teja","photoUrl":"https://lh3.googleusercontent.com/-nt8x4joQmgY/AAAAAAAAAAI/AAAAAAAAAvg/AbgIIUozOq0/s64/photo.jpg","userId":"01155222072916958278"}},"outputId":"2c4a04fa-dc6e-41bb-9561-600cdec6ff08"},"source":["from sklearn.preprocessing import OrdinalEncoder\r\n","\r\n","ord_enc = OrdinalEncoder()\r\n","dataset[\"label\"] = ord_enc.fit_transform(dataset[[\"sentiment\"]])\r\n","dataset[[\"sentiment\", \"label\"]].head(11)"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/html":["
<div>\n",
"<table>\n",
"  <thead>\n",
"    <tr><th></th><th>sentiment</th><th>label</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>neutral</td><td>1.0</td></tr>\n",
"    <tr><th>1</th><td>negative</td><td>0.0</td></tr>\n",
"    <tr><th>2</th><td>negative</td><td>0.0</td></tr>\n",
"    <tr><th>3</th><td>negative</td><td>0.0</td></tr>\n",
"    <tr><th>4</th><td>negative</td><td>0.0</td></tr>\n",
"    <tr><th>5</th><td>neutral</td><td>1.0</td></tr>\n",
"    <tr><th>6</th><td>positive</td><td>2.0</td></tr>\n",
"    <tr><th>7</th><td>neutral</td><td>1.0</td></tr>\n",
"    <tr><th>8</th><td>neutral</td><td>1.0</td></tr>\n",
"    <tr><th>9</th><td>positive</td><td>2.0</td></tr>\n",
"    <tr><th>10</th><td>neutral</td><td>1.0</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
"],"text/plain":[" sentiment label\n","0 neutral 1.0\n","1 negative 0.0\n","2 negative 0.0\n","3 negative 0.0\n","4 negative 0.0\n","5 neutral 1.0\n","6 positive 2.0\n","7 neutral 1.0\n","8 neutral 1.0\n","9 positive 2.0\n","10 neutral 1.0"]},"metadata":{"tags":[]},"execution_count":17}]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/","height":204},"id":"aki1pDzn50hg","executionInfo":{"status":"ok","timestamp":1609351588399,"user_tz":-60,"elapsed":826,"user":{"displayName":"Rohith Teja","photoUrl":"https://lh3.googleusercontent.com/-nt8x4joQmgY/AAAAAAAAAAI/AAAAAAAAAvg/AbgIIUozOq0/s64/photo.jpg","userId":"01155222072916958278"}},"outputId":"0341f255-8c00-4abf-e14a-2cca6c47a15c"},"source":["dataset.head()"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/html":["
<div>\n",
"<table>\n",
"  <thead>\n",
"    <tr><th></th><th>textID</th><th>text</th><th>selected_text</th><th>sentiment</th><th>label</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>cb774db0d1</td><td>id have responded if i were going</td><td>id have responded if i were going</td><td>neutral</td><td>1.0</td></tr>\n",
"    <tr><th>1</th><td>549e992a42</td><td>sooo sad i will miss you here in san diego</td><td>sooo sad</td><td>negative</td><td>0.0</td></tr>\n",
"    <tr><th>2</th><td>088c60f138</td><td>my boss is bullying me</td><td>bullying me</td><td>negative</td><td>0.0</td></tr>\n",
"    <tr><th>3</th><td>9642c003ef</td><td>what interview leave me alone</td><td>leave me alone</td><td>negative</td><td>0.0</td></tr>\n",
"    <tr><th>4</th><td>358bd9e861</td><td>sons of why couldnt they put them on the rele...</td><td>sons of</td><td>negative</td><td>0.0</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
"],"text/plain":[" textID ... label\n","0 cb774db0d1 ... 1.0\n","1 549e992a42 ... 0.0\n","2 088c60f138 ... 0.0\n","3 9642c003ef ... 0.0\n","4 358bd9e861 ... 0.0\n","\n","[5 rows x 5 columns]"]},"metadata":{"tags":[]},"execution_count":18}]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"0TyOfPIu6EtO","executionInfo":{"status":"ok","timestamp":1609351590803,"user_tz":-60,"elapsed":1078,"user":{"displayName":"Rohith Teja","photoUrl":"https://lh3.googleusercontent.com/-nt8x4joQmgY/AAAAAAAAAAI/AAAAAAAAAvg/AbgIIUozOq0/s64/photo.jpg","userId":"01155222072916958278"}},"outputId":"e29cdb51-b9c5-49f8-9aaa-bdacfc6858bf"},"source":["dataset.shape"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":["(27383, 5)"]},"metadata":{"tags":[]},"execution_count":19}]},{"cell_type":"code","metadata":{"id":"9swHoSGh_QlL"},"source":["#run this for case 1\r\n","dataset = dataset[['text','label']]"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"id":"3aJkeeC_SWrr"},"source":["#run this for case 2\r\n","dataset = dataset[['selected_text','label']]"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/","height":204},"id":"R_GWhFJ5_W4E","executionInfo":{"status":"ok","timestamp":1609351593576,"user_tz":-60,"elapsed":706,"user":{"displayName":"Rohith Teja","photoUrl":"https://lh3.googleusercontent.com/-nt8x4joQmgY/AAAAAAAAAAI/AAAAAAAAAvg/AbgIIUozOq0/s64/photo.jpg","userId":"01155222072916958278"}},"outputId":"b52c05a0-a01d-48b7-b686-4677ecf6a98f"},"source":["dataset.head()"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/html":["
<div>\n",
"<table>\n",
"  <thead>\n",
"    <tr><th></th><th>text</th><th>label</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>id have responded if i were going</td><td>1.0</td></tr>\n",
"    <tr><th>1</th><td>sooo sad i will miss you here in san diego</td><td>0.0</td></tr>\n",
"    <tr><th>2</th><td>my boss is bullying me</td><td>0.0</td></tr>\n",
"    <tr><th>3</th><td>what interview leave me alone</td><td>0.0</td></tr>\n",
"    <tr><th>4</th><td>sons of why couldnt they put them on the rele...</td><td>0.0</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
"],"text/plain":[" text label\n","0 id have responded if i were going 1.0\n","1 sooo sad i will miss you here in san diego 0.0\n","2 my boss is bullying me 0.0\n","3 what interview leave me alone 0.0\n","4 sons of why couldnt they put them on the rele... 0.0"]},"metadata":{"tags":[]},"execution_count":21}]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"0CPo45-w54ez","executionInfo":{"status":"ok","timestamp":1609352321768,"user_tz":-60,"elapsed":1212,"user":{"displayName":"Rohith Teja","photoUrl":"https://lh3.googleusercontent.com/-nt8x4joQmgY/AAAAAAAAAAI/AAAAAAAAAvg/AbgIIUozOq0/s64/photo.jpg","userId":"01155222072916958278"}},"outputId":"2950eeae-4f0b-49f5-f858-2ff548f30a8a"},"source":["#train and val split\r\n","\r\n","train_df = dataset.iloc[:19236]\r\n","vall_df = dataset.iloc[19236:]\r\n","train_df.shape, vall_df.shape"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":["((19236, 2), (8147, 2))"]},"metadata":{"tags":[]},"execution_count":43}]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"hDlLFDsM38qS","executionInfo":{"status":"ok","timestamp":1609352336462,"user_tz":-60,"elapsed":1286,"user":{"displayName":"Rohith Teja","photoUrl":"https://lh3.googleusercontent.com/-nt8x4joQmgY/AAAAAAAAAAI/AAAAAAAAAvg/AbgIIUozOq0/s64/photo.jpg","userId":"01155222072916958278"}},"outputId":"50042d0e-8947-4fa6-9597-9da6d2fb2ce8"},"source":["#val test split\r\n","\r\n","val_df = vall_df.iloc[:4122]\r\n","test_df = vall_df.iloc[4122:]\r\n","val_df.shape, test_df.shape"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":["((4122, 2), (4025, 2))"]},"metadata":{"tags":[]},"execution_count":44}]},{"cell_type":"code","metadata":{"id":"nBYKVQtm7qQx"},"source":["save_dir = \"data/\"\r\n","train_df.to_csv(save_dir + \"train.csv\", index=False)\r\n","val_df.to_csv(save_dir + \"dev.csv\", 
index=False)"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"id":"Im_CMsfg7XML"},"source":["test_df.to_csv(save_dir + \"test.csv\", index=False)"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"id":"sGZz9CDp6Nmq"},"source":["class InputExample(object):\r\n",
"    \"\"\"A single training/test example for simple sequence classification.\"\"\"\r\n",
"\r\n",
"    def __init__(self, guid, text_a, text_b=None, label=None):\r\n",
"        \"\"\"Constructs an InputExample.\r\n",
"        Args:\r\n",
"            guid: Unique id for the example.\r\n",
"            text_a: string. The untokenized text of the first sequence. For single\r\n",
"                sequence tasks, only this sequence must be specified.\r\n",
"            text_b: (Optional) string. The untokenized text of the second sequence.\r\n",
"                Must be specified only for sequence pair tasks.\r\n",
"            label: (Optional) string. The label of the example. This should be\r\n",
"                specified for train and dev examples, but not for test examples.\r\n",
"        \"\"\"\r\n",
"        self.guid = guid\r\n",
"        self.text_a = text_a\r\n",
"        self.text_b = text_b\r\n",
"        self.label = label"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"id":"jQvqkYnY6ZkZ"},"source":["class InputFeatures(object):\r\n",
"    \"\"\"A single set of features of data.\"\"\"\r\n",
"\r\n",
"    def __init__(self, input_ids, input_mask, segment_ids, label_id):\r\n",
"        self.input_ids = input_ids\r\n",
"        self.input_mask = input_mask\r\n",
"        self.segment_ids = segment_ids\r\n",
"        self.label_id = label_id"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"id":"KycuyRUt6d4w"},"source":["\r\n",
"class DataProcessor(object):\r\n",
"    \"\"\"Base class for data converters for sequence classification data sets.\"\"\"\r\n",
"\r\n",
"    def get_train_examples(self, data_dir):\r\n",
"        \"\"\"Gets a collection of `InputExample`s for the train set.\"\"\"\r\n",
"        raise NotImplementedError()\r\n",
"\r\n",
"    def get_dev_examples(self, data_dir):\r\n",
"        \"\"\"Gets a collection of `InputExample`s for the 
dev set.\"\"\"\r\n",
"        raise NotImplementedError()\r\n",
"\r\n",
"    def get_labels(self):\r\n",
"        \"\"\"Gets the list of labels for this data set.\"\"\"\r\n",
"        raise NotImplementedError()\r\n",
"\r\n",
"    @classmethod\r\n",
"    def _read_tsv(cls, input_file, quotechar=None):\r\n",
"        \"\"\"Reads a tab separated value file.\"\"\"\r\n",
"        with open(input_file, \"r\", encoding=\"utf-8-sig\") as f:\r\n",
"            reader = csv.reader(f, delimiter=\"\\t\", quotechar=quotechar)\r\n",
"            # Python 3 only: the Python 2 `unicode` branch, which relied on an unimported `sys`, is dropped.\r\n",
"            lines = []\r\n",
"            for line in reader:\r\n",
"                lines.append(line)\r\n",
"            return lines\r\n",
"\r\n",
"class TweetProcessor(DataProcessor):\r\n",
"    \"\"\"Processor for the tweet sentiment data set.\"\"\"\r\n",
"\r\n",
"    def get_train_examples(self, data_dir):\r\n",
"        \"\"\"See base class.\"\"\"\r\n",
"        return self._create_examples(\r\n",
"            self._read_tsv(os.path.join(data_dir, \"train.csv\")), \"train\")\r\n",
"\r\n",
"    def get_dev_examples(self, data_dir):\r\n",
"        \"\"\"See base class.\"\"\"\r\n",
"        return self._create_examples(\r\n",
"            self._read_tsv(os.path.join(data_dir, \"dev.csv\")), \"dev\")\r\n",
"\r\n",
"    def get_test_examples(self, data_dir):\r\n",
"        \"\"\"See base class.\"\"\"\r\n",
"        return self._create_examples(\r\n",
"            self._read_tsv(os.path.join(data_dir, \"test.csv\")), \"test\")\r\n",
"\r\n",
"    def get_labels(self):\r\n",
"        \"\"\"See base class.\"\"\"\r\n",
"        return [0, 1, 2]\r\n",
"\r\n",
"    # Hack to stay compatible with the existing code in the transformers library:\r\n",
"    # the splits are comma-separated files, so pandas reads them instead of csv.reader.\r\n",
"    def _read_tsv(self, file_path):\r\n",
"        return pd.read_csv(file_path).values.tolist()\r\n",
"\r\n",
"    def _create_examples(self, lines, set_type):\r\n",
"        \"\"\"Creates examples for the train, dev and test sets.\"\"\"\r\n",
"        examples = []\r\n",
"        for (i, line) in enumerate(lines):\r\n",
"            # No header row to skip: pd.read_csv has already consumed it, so row 0 is data.\r\n",
"            guid = \"%s-%s\" % (set_type, i)\r\n",
"            text_a = str(line[0])\r\n",
"            # text_b = None\r\n",
"            label = line[1]\r\n",
"            examples.append(\r\n",
"                
InputExample(guid=guid, text_a=text_a, text_b=None, label=label))\r\n",
"        return examples"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"id":"6uM9OJal6iF-"},"source":["def set_seed(seed):\r\n",
"    random.seed(seed)\r\n",
"    np.random.seed(seed)\r\n",
"    torch.manual_seed(seed)\r\n"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"id":"SVAmID3W6lVc"},"source":["def simple_accuracy(preds, labels):\r\n",
"    return (preds == labels).mean()\r\n",
"\r\n",
"def acc_and_f1(preds, labels):\r\n",
"    acc = simple_accuracy(preds, labels)\r\n",
"    # Three sentiment classes: f1_score's default average='binary' raises on multiclass targets.\r\n",
"    f1 = f1_score(y_true=labels, y_pred=preds, average='weighted')\r\n",
"    return {\r\n",
"        \"acc\": acc,\r\n",
"        \"f1\": f1,\r\n",
"    }\r\n",
"\r\n",
"\r\n",
"def f1_weighted(preds, labels):\r\n",
"    return f1_score(y_true=labels, y_pred=preds, average='weighted')\r\n",
"\r\n",
"def compute_metrics(task_name, preds, labels):\r\n",
"    assert len(preds) == len(labels)\r\n",
"    if task_name == \"tweet\":\r\n",
"        return {\"acc\": acc_and_f1(preds, labels)}\r\n",
"    else:\r\n",
"        raise KeyError(task_name)"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"id":"Wk2MjBbu6npD"},"source":["def _truncate_seq_pair(tokens_a, tokens_b, max_length):\r\n",
"    \"\"\"Truncates a sequence pair in place to the maximum length.\"\"\"\r\n",
"\r\n",
"    # This is a simple heuristic which will always truncate the longer sequence\r\n",
"    # one token at a time. 
This makes more sense than truncating an equal percent\r\n",
"    # of tokens from each, since if one sequence is very short then each token\r\n",
"    # that's truncated likely contains more information than a longer sequence.\r\n",
"    while True:\r\n",
"        total_length = len(tokens_a) + len(tokens_b)\r\n",
"        if total_length <= max_length:\r\n",
"            break\r\n",
"        if len(tokens_a) > len(tokens_b):\r\n",
"            tokens_a.pop()\r\n",
"        else:\r\n",
"            tokens_b.pop()"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"id":"tRP-i1VR6qoL"},"source":["def convert_examples_to_features(examples, label_list, max_seq_length,\r\n",
"                                 tokenizer, output_mode,\r\n",
"                                 cls_token_at_end=False, pad_on_left=False,\r\n",
"                                 cls_token='[CLS]', sep_token='[SEP]', pad_token=0,\r\n",
"                                 sequence_a_segment_id=0, sequence_b_segment_id=1,\r\n",
"                                 cls_token_segment_id=1, pad_token_segment_id=0,\r\n",
"                                 mask_padding_with_zero=True):\r\n",
"    \"\"\" Converts a list of `InputExample`s into a list of `InputFeatures`.\r\n",
"        `cls_token_at_end` defines the location of the CLS token:\r\n",
"            - False (Default, BERT/XLM pattern): [CLS] + A + [SEP] + B + [SEP]\r\n",
"            - True (XLNet/GPT pattern): A + [SEP] + B + [SEP] + [CLS]\r\n",
"        `cls_token_segment_id` defines the segment id associated with the CLS token (0 for BERT, 2 for XLNet)\r\n",
"    \"\"\"\r\n",
"\r\n",
"    label_map = {label : i for i, label in enumerate(label_list)}\r\n",
"\r\n",
"    features = []\r\n",
"    for (ex_index, example) in enumerate(examples):\r\n",
"\r\n",
"        tokens_a = tokenizer.tokenize(example.text_a)\r\n",
"\r\n",
"        tokens_b = None\r\n",
"        if example.text_b:\r\n",
"            tokens_b = tokenizer.tokenize(example.text_b)\r\n",
"            # Modifies `tokens_a` and `tokens_b` in place so that the total\r\n",
"            # length is less than the specified length.\r\n",
"            # Account for [CLS], [SEP], [SEP] with \"- 3\"\r\n",
"            _truncate_seq_pair(tokens_a, tokens_b, max_seq_length - 3)\r\n",
"        else:\r\n",
"            # Account for [CLS] and [SEP] with \"- 2\"\r\n",
"            if len(tokens_a) > max_seq_length - 2:\r\n",
"                tokens_a = 
tokens_a[:(max_seq_length - 2)]\r\n","\r\n","        # The convention in BERT is:\r\n","        # (a) For sequence pairs:\r\n","        #  tokens:   [CLS] is this jack ##son ##ville ? [SEP] no it is not . [SEP]\r\n","        #  type_ids:   0   0  0    0    0     0       0   0   1  1  1  1   1   1\r\n","        # (b) For single sequences:\r\n","        #  tokens:   [CLS] the dog is hairy . [SEP]\r\n","        #  type_ids:   0   0   0   0  0     0   0\r\n","        #\r\n","        # Where \"type_ids\" are used to indicate whether this is the first\r\n","        # sequence or the second sequence. The embedding vectors for `type=0` and\r\n","        # `type=1` were learned during pre-training and are added to the wordpiece\r\n","        # embedding vector (and position vector). This is not *strictly* necessary\r\n","        # since the [SEP] token unambiguously separates the sequences, but it makes\r\n","        # it easier for the model to learn the concept of sequences.\r\n","        #\r\n","        # For classification tasks, the first vector (corresponding to [CLS]) is\r\n","        # used as the \"sentence vector\". Note that this only makes sense because\r\n","        # the entire model is fine-tuned.\r\n","        tokens = tokens_a + [sep_token]\r\n","        segment_ids = [sequence_a_segment_id] * len(tokens)\r\n","\r\n","        if tokens_b:\r\n","            tokens += tokens_b + [sep_token]\r\n","            segment_ids += [sequence_b_segment_id] * (len(tokens_b) + 1)\r\n","\r\n","        if cls_token_at_end:\r\n","            tokens = tokens + [cls_token]\r\n","            segment_ids = segment_ids + [cls_token_segment_id]\r\n","        else:\r\n","            tokens = [cls_token] + tokens\r\n","            segment_ids = [cls_token_segment_id] + segment_ids\r\n","\r\n","        input_ids = tokenizer.convert_tokens_to_ids(tokens)\r\n","\r\n","        # The mask has 1 for real tokens and 0 for padding tokens. 
Only real\r\n","        # tokens are attended to.\r\n","        input_mask = [1 if mask_padding_with_zero else 0] * len(input_ids)\r\n","\r\n","        # Zero-pad up to the sequence length.\r\n","        padding_length = max_seq_length - len(input_ids)\r\n","        if pad_on_left:\r\n","            input_ids = ([pad_token] * padding_length) + input_ids\r\n","            input_mask = ([0 if mask_padding_with_zero else 1] * padding_length) + input_mask\r\n","            segment_ids = ([pad_token_segment_id] * padding_length) + segment_ids\r\n","        else:\r\n","            input_ids = input_ids + ([pad_token] * padding_length)\r\n","            input_mask = input_mask + ([0 if mask_padding_with_zero else 1] * padding_length)\r\n","            segment_ids = segment_ids + ([pad_token_segment_id] * padding_length)\r\n","\r\n","        assert len(input_ids) == max_seq_length\r\n","        assert len(input_mask) == max_seq_length\r\n","        assert len(segment_ids) == max_seq_length\r\n","\r\n","        if output_mode == \"classification\":\r\n","            label_id = label_map[example.label]\r\n","        elif output_mode == \"regression\":\r\n","            label_id = float(example.label)\r\n","        else:\r\n","            raise KeyError(output_mode)\r\n","\r\n","        features.append(\r\n","                InputFeatures(input_ids=input_ids,\r\n","                              input_mask=input_mask,\r\n","                              segment_ids=segment_ids,\r\n","                              label_id=label_id))\r\n","    return features"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"id":"D_bnAV7b60ZK"},"source":["\r\n","processor = TweetProcessor()\r\n","label_list = processor.get_labels()\r\n","num_labels = len(label_list)"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"Z9RazYscD0bp","executionInfo":{"status":"ok","timestamp":1609352370003,"user_tz":-60,"elapsed":7293,"user":{"displayName":"Rohith Teja","photoUrl":"https://lh3.googleusercontent.com/-nt8x4joQmgY/AAAAAAAAAAI/AAAAAAAAAvg/AbgIIUozOq0/s64/photo.jpg","userId":"01155222072916958278"}},"outputId":"a19b6ac9-4c89-4fd6-8834-1e4c9165ed0f"},"source":["#RobertaConfig, 
RobertaForSequenceClassification,RobertaTokenizer\r\n","config = RobertaConfig.from_pretrained('roberta-base', num_labels=num_labels)\r\n","tokenizer = RobertaTokenizer.from_pretrained('roberta-base')\r\n","model = RobertaForSequenceClassification.from_pretrained('roberta-base', config=config)"],"execution_count":null,"outputs":[{"output_type":"stream","text":["Some weights of the model checkpoint at roberta-base were not used when initializing RobertaForSequenceClassification: ['lm_head.bias', 'lm_head.dense.weight', 'lm_head.dense.bias', 'lm_head.layer_norm.weight', 'lm_head.layer_norm.bias', 'lm_head.decoder.weight', 'roberta.pooler.dense.weight', 'roberta.pooler.dense.bias']\n","- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).\n","- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).\n","Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at roberta-base and are newly initialized: ['classifier.dense.weight', 'classifier.dense.bias', 'classifier.out_proj.weight', 'classifier.out_proj.bias']\n","You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.\n"],"name":"stderr"}]},{"cell_type":"code","metadata":{"id":"9RFaKP0l8kRK"},"source":["def load_and_cache_examples(tokenizer, dataset='train'):\r\n","    if dataset == \"train\":\r\n","        examples = processor.get_train_examples(data_dir)\r\n","    elif dataset == \"dev\":\r\n","        examples = processor.get_dev_examples(data_dir)\r\n","    else:\r\n","        examples = processor.get_test_examples(data_dir)\r\n","\r\n","    features = 
convert_examples_to_features(examples, label_list, max_seq_length, tokenizer, output_mode,\r\n","        cls_token_at_end=True,            # XLNet-style layout carried over from the source script: CLS token at the end\r\n","        cls_token=tokenizer.cls_token,\r\n","        sep_token=tokenizer.sep_token,\r\n","        cls_token_segment_id=2,           # segment ids are built but not passed to RoBERTa (token_type_ids=None)\r\n","        pad_on_left=True,                 # XLNet-style left padding, also carried over\r\n","        pad_token=tokenizer.convert_tokens_to_ids([tokenizer.pad_token])[0],\r\n","        pad_token_segment_id=4)\r\n","    # Convert to Tensors and build dataset\r\n","    all_input_ids = torch.tensor([f.input_ids for f in features], dtype=torch.long)\r\n","    all_input_mask = torch.tensor([f.input_mask for f in features], dtype=torch.long)\r\n","    all_segment_ids = torch.tensor([f.segment_ids for f in features], dtype=torch.long)\r\n","    if output_mode == \"classification\":\r\n","        all_label_ids = torch.tensor([f.label_id for f in features], dtype=torch.long)\r\n","    elif output_mode == \"regression\":\r\n","        all_label_ids = torch.tensor([f.label_id for f in features], dtype=torch.float)\r\n","\r\n","    dataset = TensorDataset(all_input_ids, all_input_mask, all_segment_ids, all_label_ids)\r\n","    return dataset"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"9vsiFZRF93rE","executionInfo":{"status":"ok","timestamp":1609352505148,"user_tz":-60,"elapsed":1246,"user":{"displayName":"Rohith Teja","photoUrl":"https://lh3.googleusercontent.com/-nt8x4joQmgY/AAAAAAAAAAI/AAAAAAAAAvg/AbgIIUozOq0/s64/photo.jpg","userId":"01155222072916958278"}},"outputId":"67afe68f-3cec-459b-e23c-c5e93a642780"},"source":["output_mode = 'classification'\r\n","max_seq_length = 60\r\n","batch_size = 8\r\n","max_grad_norm = 1.0\r\n","gradient_accumulation_steps = 2\r\n","num_train_epochs = 3\r\n","weight_decay = 0.0\r\n","device = torch.device(\"cuda\" if torch.cuda.is_available() else 
\"cpu\")\r\n","print(device)"],"execution_count":null,"outputs":[{"output_type":"stream","text":["cuda\n"],"name":"stdout"}]},{"cell_type":"code","metadata":{"id":"y4shGAUC-RfA"},"source":["learning_rate = 2e-5\r\n","adam_epsilon = 1e-8\r\n","num_warmup_steps = 0"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"id":"HGeQmWDw-bYN"},"source":["def train(train_dataset, model, tokenizer):\r\n"," \"\"\" Train the model \"\"\"\r\n"," train_sampler = RandomSampler(train_dataset)\r\n"," train_dataloader = DataLoader(train_dataset, sampler=train_sampler, batch_size=batch_size)\r\n"," num_training_steps = len(train_dataloader) // gradient_accumulation_steps * num_train_epochs\r\n"," # Prepare optimizer and schedule (linear warmup and decay)\r\n"," no_decay = ['bias', 'LayerNorm.weight']\r\n"," optimizer_grouped_parameters = [\r\n"," {'params': [p for n, p in model.named_parameters() if not any(nd in n for nd in no_decay)], 'weight_decay': weight_decay},\r\n"," {'params': [p for n, p in model.named_parameters() if any(nd in n for nd in no_decay)], 'weight_decay': 0.0}\r\n"," ]\r\n"," optimizer = AdamW(optimizer_grouped_parameters, lr=learning_rate, eps=adam_epsilon)\r\n"," scheduler = get_linear_schedule_with_warmup(optimizer, num_warmup_steps= num_warmup_steps,num_training_steps = num_training_steps)\r\n"," \r\n"," global_step = 0\r\n"," tr_loss, logging_loss = 0.0, 0.0\r\n"," model.zero_grad()\r\n"," train_iterator = tqdm_notebook(range(int(num_train_epochs)), desc=\"Epoch\")\r\n"," set_seed(42)\r\n"," for _ in train_iterator:\r\n"," epoch_iterator = tqdm_notebook(train_dataloader, desc=\"Iteration\")\r\n"," for step, batch in enumerate(epoch_iterator):\r\n"," model.train()\r\n"," batch = tuple(t.to(device) for t in batch)\r\n"," inputs = {'input_ids': batch[0],\r\n"," 'attention_mask': batch[1],\r\n"," 'token_type_ids': None, # XLM and RoBERTa don't use segment_ids\r\n"," 'labels': batch[3]}\r\n"," outputs = model(**inputs)\r\n"," loss = outputs[0] # 
model outputs are always tuple in pytorch-transformers (see doc)\r\n","            if gradient_accumulation_steps > 1:\r\n","                loss = loss / gradient_accumulation_steps\r\n","            loss.backward()\r\n","            torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)\r\n","            tr_loss += loss.item()\r\n","            if (step + 1) % gradient_accumulation_steps == 0:\r\n","                optimizer.step()\r\n","                scheduler.step()  # Update learning rate schedule; must follow optimizer.step() in PyTorch >= 1.1\r\n","                model.zero_grad()\r\n","                global_step += 1\r\n","\r\n","    return global_step, tr_loss / global_step"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"id":"6dWLmYAJ-euq"},"source":["def evaluate(model, tokenizer, prefix=\"\"):\r\n","    results = {}\r\n","    eval_dataset = load_and_cache_examples(tokenizer, dataset='dev')\r\n","    eval_batch_size = 8\r\n","    eval_sampler = SequentialSampler(eval_dataset)\r\n","    eval_dataloader = DataLoader(eval_dataset, sampler=eval_sampler, batch_size=eval_batch_size)\r\n","    eval_loss = 0.0\r\n","    nb_eval_steps = 0\r\n","    preds = None\r\n","    out_label_ids = None\r\n","    for batch in tqdm_notebook(eval_dataloader, desc=\"Evaluating\"):\r\n","        model.eval()\r\n","        batch = tuple(t.to(device) for t in batch)\r\n","        with torch.no_grad():\r\n","            inputs = {'input_ids': batch[0],\r\n","                      'attention_mask': batch[1],\r\n","                      'token_type_ids': None, # XLM and RoBERTa don't use segment_ids\r\n","                      'labels': batch[3]}\r\n","            outputs = model(**inputs)\r\n","            tmp_eval_loss, logits = outputs[:2]\r\n","            eval_loss += tmp_eval_loss.mean().item()\r\n","\r\n","        nb_eval_steps += 1\r\n","        if preds is None:\r\n","            preds = logits.detach().cpu().numpy()\r\n","            out_label_ids = inputs['labels'].detach().cpu().numpy()\r\n","        else:\r\n","            preds = np.append(preds, logits.detach().cpu().numpy(), axis=0)\r\n","            out_label_ids = np.append(out_label_ids, inputs['labels'].detach().cpu().numpy(), axis=0)\r\n","    eval_loss = eval_loss / nb_eval_steps\r\n","    if output_mode == \"classification\":\r\n","        preds = np.argmax(preds, axis=1)\r\n","    
elif output_mode == \"regression\":\r\n","        preds = np.squeeze(preds)\r\n","    result = compute_metrics(\"tweet\", preds, out_label_ids)\r\n","    return result"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/","height":335,"referenced_widgets":["aa165c4c8d4a4bdba30d669f59906d15","6ec0c4d438e34833979e9caa837ba5a7","14fd1204a16d4c3692e57e2852041c2b","6117124ef56747cb9660a3b0a1320814","18ac3a6a121c4a18910759f5c69e22f1","36db65bfa8214685b95ea2ea46b81bda","244e5d1e610e4e7180f83fa6e3ab8ab7","e803ea8467d848888de63596f4ee1920","dc452a3c7a0842309e44f5a32c1b82bd","e8412f4d3b8e46ab81d4f301157f00be","e8a92849a76642a7ab2b186ae0df4ae7","eaa7ef95bed7416aa97237bcff09b7af","853d9a7b580a474aafa0941123ff863b","4f78a0941ba24435b1ad3f9de377fcd7","3a2ae730cb3d470ab40b96c8e4e67058","ead4e7718c7a497bb0bc54fe8e19ae35","ca3c971fe77d454eaf3179146487dc44","82b8beb2e5284052846d9e1a554b65e2","cd664e45c6b24bc791ab4c063dd97b01","866d76086148468f8409464495f5b65e","a125e2c861bf40f88ec5448bb0ac14a4","aa163f956a7a48669f70ebbc50912fa9","aff54d05e66d4eddb83fec133c002263","305cb5e80a6943e6a714908a088c0398","c980297feeca4330a7ec8cd7b3611d63","738d1f6643bb4d0784fc279254b24134","43dd7167468747d9aac84ae9a003e42f","91a9349ed14b49cea0bd1d2762dfa7cf","3f1bbd86cedd44d79d5b9bbb1f502bd4","142219a840f3408d9b951aa884fa4362","b75ec03a81894ce6a13e232ad16222ab","638ece11c7014635a228491add1be51d"]},"id":"zlDfbUgj-mBF","executionInfo":{"status":"ok","timestamp":1609353103597,"user_tz":-60,"elapsed":588749,"user":{"displayName":"Rohith Teja","photoUrl":"https://lh3.googleusercontent.com/-nt8x4joQmgY/AAAAAAAAAAI/AAAAAAAAAvg/AbgIIUozOq0/s64/photo.jpg","userId":"01155222072916958278"}},"outputId":"1f17ff7a-4d66-4052-9113-34ec66e186cf"},"source":["data_dir = 'data'\r\n","model.to(device)\r\n","train_dataset = load_and_cache_examples(tokenizer, dataset=\"train\")\r\n","global_step, tr_loss = train(train_dataset, model, 
tokenizer)"],"execution_count":null,"outputs":[{"output_type":"stream","text":["/usr/local/lib/python3.6/dist-packages/ipykernel_launcher.py:18: TqdmDeprecationWarning: This function will be removed in tqdm==5.0.0\n","Please use `tqdm.notebook.tqdm` instead of `tqdm.tqdm_notebook`\n"],"name":"stderr"},{"output_type":"display_data","data":{"application/vnd.jupyter.widget-view+json":{"model_id":"aa165c4c8d4a4bdba30d669f59906d15","version_minor":0,"version_major":2},"text/plain":["HBox(children=(FloatProgress(value=0.0, description='Epoch', max=3.0, style=ProgressStyle(description_width='i…"]},"metadata":{"tags":[]}},{"output_type":"stream","text":["/usr/local/lib/python3.6/dist-packages/ipykernel_launcher.py:21: TqdmDeprecationWarning: This function will be removed in tqdm==5.0.0\n","Please use `tqdm.notebook.tqdm` instead of `tqdm.tqdm_notebook`\n"],"name":"stderr"},{"output_type":"display_data","data":{"application/vnd.jupyter.widget-view+json":{"model_id":"dc452a3c7a0842309e44f5a32c1b82bd","version_minor":0,"version_major":2},"text/plain":["HBox(children=(FloatProgress(value=0.0, description='Iteration', max=2405.0, style=ProgressStyle(description_w…"]},"metadata":{"tags":[]}},{"output_type":"stream","text":["/usr/local/lib/python3.6/dist-packages/torch/optim/lr_scheduler.py:136: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`. Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. 
See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate\n"," \"https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate\", UserWarning)\n"],"name":"stderr"},{"output_type":"stream","text":["\n"],"name":"stdout"},{"output_type":"display_data","data":{"application/vnd.jupyter.widget-view+json":{"model_id":"ca3c971fe77d454eaf3179146487dc44","version_minor":0,"version_major":2},"text/plain":["HBox(children=(FloatProgress(value=0.0, description='Iteration', max=2405.0, style=ProgressStyle(description_w…"]},"metadata":{"tags":[]}},{"output_type":"stream","text":["\n"],"name":"stdout"},{"output_type":"display_data","data":{"application/vnd.jupyter.widget-view+json":{"model_id":"c980297feeca4330a7ec8cd7b3611d63","version_minor":0,"version_major":2},"text/plain":["HBox(children=(FloatProgress(value=0.0, description='Iteration', max=2405.0, style=ProgressStyle(description_w…"]},"metadata":{"tags":[]}},{"output_type":"stream","text":["\n","\n"],"name":"stdout"}]},{"cell_type":"code","metadata":{"id":"2nd7E7A8-sRW","colab":{"base_uri":"https://localhost:8080/","height":138,"referenced_widgets":["dfcc3e8bb29142aaaf0cf2a9d4ecce92","1acce1360fef474aa86b9abd11d6faab","1cd501aa81f345d185b19a3f741e50d4","500f8f5204db452f9ec05c1e392eeb00","ca515671c27c4905834982fd360c6649","368dd39a241b4903ae2a084be4697c7a","d35c7207643f49dfa3b1f4fb6a81de81","f9f7fb247b0a41a0b25eef76e8337cd8"]},"executionInfo":{"status":"ok","timestamp":1609278897437,"user_tz":-60,"elapsed":42042,"user":{"displayName":"Yogesh Kumar Pilli","photoUrl":"","userId":"08942912479073261381"}},"outputId":"5d9ede65-0392-41ff-be03-65514a54aa90"},"source":["# Evaluation\r\n","result = evaluate(model, tokenizer, prefix=global_step)\r\n","print(result)"],"execution_count":null,"outputs":[{"output_type":"stream","text":["/usr/local/lib/python3.6/dist-packages/ipykernel_launcher.py:11: TqdmDeprecationWarning: This function will be removed in tqdm==5.0.0\n","Please use 
`tqdm.notebook.tqdm` instead of `tqdm.tqdm_notebook`\n"," # This is added back by InteractiveShellApp.init_path()\n"],"name":"stderr"},{"output_type":"display_data","data":{"application/vnd.jupyter.widget-view+json":{"model_id":"dfcc3e8bb29142aaaf0cf2a9d4ecce92","version_minor":0,"version_major":2},"text/plain":["HBox(children=(FloatProgress(value=0.0, description='Evaluating', max=685.0, style=ProgressStyle(description_w…"]},"metadata":{"tags":[]}},{"output_type":"stream","text":["\n","{'acc': 0.7791932919904274}\n"],"name":"stdout"}]},{"cell_type":"code","metadata":{"id":"GIx38YdB_oHO"},"source":["def evaluatetest(model, tokenizer, prefix=\"\"):\r\n","    results = {}\r\n","    eval_dataset = load_and_cache_examples(tokenizer, dataset='test')\r\n","    eval_batch_size = 8\r\n","    eval_sampler = SequentialSampler(eval_dataset)\r\n","    eval_dataloader = DataLoader(eval_dataset, sampler=eval_sampler, batch_size=eval_batch_size)\r\n","    eval_loss = 0.0\r\n","    nb_eval_steps = 0\r\n","    preds = None\r\n","    out_label_ids = None\r\n","    for batch in tqdm_notebook(eval_dataloader, desc=\"Evaluating\"):\r\n","        model.eval()\r\n","        batch = tuple(t.to(device) for t in batch)\r\n","        with torch.no_grad():\r\n","            inputs = {'input_ids': batch[0],\r\n","                      'attention_mask': batch[1],\r\n","                      'token_type_ids': None, # XLM and RoBERTa don't use segment_ids\r\n","                      'labels': batch[3]}\r\n","            outputs = model(**inputs)\r\n","            tmp_eval_loss, logits = outputs[:2]\r\n","            eval_loss += tmp_eval_loss.mean().item()\r\n","\r\n","        nb_eval_steps += 1\r\n","        if preds is None:\r\n","            preds = logits.detach().cpu().numpy()\r\n","            out_label_ids = inputs['labels'].detach().cpu().numpy()\r\n","        else:\r\n","            preds = np.append(preds, logits.detach().cpu().numpy(), axis=0)\r\n","            out_label_ids = np.append(out_label_ids, inputs['labels'].detach().cpu().numpy(), axis=0)\r\n","    eval_loss = eval_loss / nb_eval_steps\r\n","    if output_mode == \"classification\":\r\n","        preds = np.argmax(preds, axis=1)\r\n","    elif output_mode == 
\"regression\":\r\n"," preds = np.squeeze(preds)\r\n"," result = compute_metrics(\"tweet\", preds, out_label_ids)\r\n"," return result"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"id":"uDyFnbBM4_gb"},"source":["# Evaluation\r\n","result = evaluatetest(model, tokenizer, prefix=global_step)\r\n","print(result)"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"id":"EUIhxMbp4_jD"},"source":[""],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"id":"2RwuIGFk4_ky"},"source":[""],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"id":"p6Tgzbr04_nQ"},"source":[""],"execution_count":null,"outputs":[]}]} --------------------------------------------------------------------------------