├── LICENSE
├── README.md
├── SentimentAnalysis.ipynb
├── SentimentAnalysis_BERT.ipynb
├── SentimentAnalysis_RNN.ipynb
├── data
    ├── opinion-lexicon
    │   ├── negative-words.txt
    │   └── positive-words.txt
    ├── sample_submission.csv
    ├── sentiment140_160k_tweets_train.csv
    └── sentiment140_test.csv
└── images
    ├── inclass-competition.jpg
    └── output_7_1.png


/LICENSE:
--------------------------------------------------------------------------------
This is free and unencumbered software released into the public domain.

Anyone is free to copy, modify, publish, use, compile, sell, or
distribute this software, either in source code form or as a compiled
binary, for any purpose, commercial or non-commercial, and by any
means.

In jurisdictions that recognize copyright laws, the author or authors
of this software dedicate any and all copyright interest in the
software to the public domain. We make this dedication for the benefit
of the public at large and to the detriment of our heirs and
successors. We intend this dedication to be an overt act of
relinquishment in perpetuity of all present and future rights to this
software under copyright law.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR
OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
OTHER DEALINGS IN THE SOFTWARE.

For more information, please refer to

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
## Exploration of Sentiment Analysis

This repo provides the submission entry for an in-class NLP sentiment analysis competition held within the Microsoft AI Singapore group. The entry applies techniques learned in class to classify text as expressing positive or negative sentiment.

![jpg](images/inclass-competition.jpg)

It is recommended to install [Anaconda](https://www.anaconda.com/products/distribution), a pre-packaged Python distribution that contains all of the libraries and software needed for this project. Alternatively, you can use [Google Colaboratory](https://colab.research.google.com/), which lets you write and execute Python code in your browser.

**Data**

Data for this in-class competition comes from the [Sentiment140](https://www.kaggle.com/datasets/kazanova/sentiment140) dataset. The training and test sets are random samples of 10% and 5% of the dataset, respectively.
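
The BERT notebook maps the string labels in the `target` column (`'p'` for positive, `'n'` for negative) to integers with `label_map = {'p':1, 'n':0}` and checks class balance with `collections.Counter`. Below is a minimal standard-library sketch of those two steps, not the notebook's own pandas code. It assumes the training CSV has `target` and `text` columns, as shown in the notebook's `train.head()` output; the test CSV may not carry a `target` column, so the sketch only targets the training file. The function names are illustrative.

```python
import csv
import io
from collections import Counter
from typing import Dict, List, TextIO, Tuple

# Same mapping as the notebook's label_map: positive -> 1, negative -> 0
LABEL_MAP: Dict[str, int] = {"p": 1, "n": 0}


def load_labelled_tweets(fh: TextIO) -> List[Tuple[str, int]]:
    """Read (text, integer label) pairs from a CSV with 'target' and 'text' columns."""
    reader = csv.DictReader(fh)
    return [(row["text"], LABEL_MAP[row["target"]]) for row in reader]


def class_balance(pairs: List[Tuple[str, int]]) -> Counter:
    """Count how many examples carry each integer label."""
    return Counter(label for _, label in pairs)


if __name__ == "__main__":
    # Tiny in-memory sample using the same column layout as train.head()
    sample = io.StringIO(
        "target,ids,user,text\n"
        "p,1,alice,good morning\n"
        'n,2,bob,"rainy, sad day"\n'
        "p,3,carol,love it\n"
    )
    pairs = load_labelled_tweets(sample)
    print(class_balance(pairs))
```

To use it on the real file, open `data/sentiment140_160k_tweets_train.csv` with `newline=""` and pass the handle to `load_labelled_tweets`. For reference, the notebook's `collections.Counter(train['target'])` reports 80,000 `'p'` rows and 79,985 `'n'` rows, so the training set is close to balanced.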

## Getting started with lexicon-based and machine learning (ML) methods
Open `SentimentAnalysis.ipynb` in a Jupyter notebook environment, or [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://githubtocolab.com/KwokHing/SentimentAnalysis-Python-Demo/blob/master/SentimentAnalysis.ipynb)

Methods covered (reported accuracy in brackets, where available):

- VADER (Valence Aware Dictionary and sEntiment Reasoner) [67%]
- Naive Bayes
- Linear SVM (Support Vector Machine) [80%]
- Decision Tree
- Random Forest
- Extra Trees
- SVC [80%]

## Exploring deep learning techniques (LSTM)
Open `SentimentAnalysis_RNN.ipynb` in a Jupyter notebook environment, or [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://githubtocolab.com/KwokHing/SentimentAnalysis-Python-Demo/blob/master/SentimentAnalysis_RNN.ipynb)

The LSTM deep learning model [79%] did not outperform the SVC and linear SVM models.
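
For comparison with the lexicon-based methods listed earlier, here is a minimal sketch of a word-count classifier built on word lists like those in `data/opinion-lexicon/`. The filenames match the Hu & Liu opinion lexicon, and this sketch assumes that format: one word per line, with header comment lines starting with `;`. This is not VADER, which also weighs intensifiers, negation and punctuation. The tokenizer, the tie-break rule and all function names are illustrative assumptions.

```python
import re
from typing import Iterable, Set

# Assumed tokenizer: lowercase words that may contain apostrophes, '+' or '-' (e.g. "well-known")
TOKEN_RE = re.compile(r"[a-z][a-z'+-]*")


def load_lexicon(lines: Iterable[str]) -> Set[str]:
    """Build a word set, skipping blank lines and ';' comment lines."""
    words = set()
    for line in lines:
        word = line.strip().lower()
        if word and not word.startswith(";"):
            words.add(word)
    return words


def lexicon_score(text: str, positive: Set[str], negative: Set[str]) -> int:
    """Number of positive-lexicon tokens minus number of negative-lexicon tokens."""
    tokens = TOKEN_RE.findall(text.lower())
    return sum(t in positive for t in tokens) - sum(t in negative for t in tokens)


def lexicon_label(text: str, positive: Set[str], negative: Set[str]) -> str:
    """Map the score to the dataset's 'p'/'n' labels; a zero score becomes 'p' (arbitrary tie-break)."""
    return "p" if lexicon_score(text, positive, negative) >= 0 else "n"
```

To load the repo's lists, pass an open file handle to `load_lexicon`, for example `open("data/opinion-lexicon/positive-words.txt", encoding="latin-1")`. The `latin-1` encoding is a cautious assumption in case the files contain non-UTF-8 bytes. Counting lexicon hits ignores negation ("not good"), which is one reason lexicon baselines tend to trail trained classifiers on tweets.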

## How about the BERT Transformers model?
Open `SentimentAnalysis_BERT.ipynb` in a Jupyter notebook environment, or [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://githubtocolab.com/KwokHing/SentimentAnalysis-Python-Demo/blob/master/SentimentAnalysis_BERT.ipynb)

The state-of-the-art BERT transformer model performs slightly better, at [82%] accuracy.

--------------------------------------------------------------------------------
/SentimentAnalysis_BERT.ipynb:
--------------------------------------------------------------------------------
{"cells":[{"cell_type":"markdown","metadata":{"id":"efyTtfc_oJkh"},"source":["## 1. Adding imports & installing necessary packages ##"]},{"cell_type":"code","source":["!pip -q install transformers"],"metadata":{"id":"xQm4c8nyWICn","executionInfo":{"status":"ok","timestamp":1665398259740,"user_tz":-480,"elapsed":14984,"user":{"displayName":"Leong Kwok Hing","userId":"10103425286911153836"}},"colab":{"base_uri":"https://localhost:8080/"},"outputId":"58016851-2cab-4bd4-bf78-9422f2ea30a2"},"execution_count":1,"outputs":[{"output_type":"stream","name":"stdout","text":["\u001b[K |████████████████████████████████| 4.9 MB 27.1 MB/s \n","\u001b[K |████████████████████████████████| 163 kB 54.5 MB/s \n","\u001b[K |████████████████████████████████| 6.6 MB 48.6 MB/s \n","\u001b[?25h"]}]},{"cell_type":"code","execution_count":2,"metadata":{"id":"GaVIBWlyoKz3","colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"status":"ok","timestamp":1665398399234,"user_tz":-480,"elapsed":93287,"user":{"displayName":"Leong Kwok Hing","userId":"10103425286911153836"}},"outputId":"ef8533c7-46ad-43c5-e1bb-e7ddf63a9257"},"outputs":[{"output_type":"stream","name":"stdout","text":["Mounted at /content/gdrive\n"]}],"source":["### run this if using google colab to mount google drive as local storage\n","\n","from google.colab import drive\n","import
os\n","drive.mount('/content/gdrive')\n","\n","repo_path = '/content/gdrive/My Drive/colab/NLP-Bootcamp/'"]},{"cell_type":"code","execution_count":3,"metadata":{"id":"sdBgdze84r8s","executionInfo":{"status":"ok","timestamp":1665398485515,"user_tz":-480,"elapsed":3661,"user":{"displayName":"Leong Kwok Hing","userId":"10103425286911153836"}}},"outputs":[],"source":["from transformers import AutoTokenizer, TFAutoModelForSequenceClassification\n","\n","import tensorflow as tf\n","import pandas as pd\n","import numpy as np\n","import collections\n","%matplotlib inline\n","\n","from sklearn.model_selection import train_test_split\n","from sklearn.metrics import confusion_matrix, accuracy_score"]},{"cell_type":"markdown","metadata":{"id":"YLttTMckfNa_"},"source":["## 2. Loading Data ##"]},{"cell_type":"code","execution_count":4,"metadata":{"id":"w9EA4jMv4ywO","colab":{"base_uri":"https://localhost:8080/","height":206},"executionInfo":{"status":"ok","timestamp":1665398489725,"user_tz":-480,"elapsed":4223,"user":{"displayName":"Leong Kwok Hing","userId":"10103425286911153836"}},"outputId":"b5aa4429-64e8-4970-a9b2-69fedda95d36"},"outputs":[{"output_type":"execute_result","data":{"text/plain":[" target ids user \\\n","0 p 1978186076 ceruleanbreeze \n","1 p 1994697891 enthusiasticjen \n","2 p 2191885992 LifeRemixed \n","3 p 1753662211 lovemandy \n","4 p 2177442789 _LOVELYmanu \n","\n"," text \n","0 @nocturnalie Anyway, and now Abby and I share ... \n","1 @JoeGigantino Few times I'm trying to leave co... \n","2 @AngieGriffin Good Morning Angie I'll be in t... \n","3 had a good day driving up mountains, visiting ... \n","4 downloading some songs i love lady GaGa. "],"text/html":["\n","
\n"," "]},"metadata":{},"execution_count":4}],"source":["### run below 2 lines of code for setting train & test data path on google colab\n","trainData = os.path.join(repo_path, 'data/sentiment140_160k_tweets_train.csv')\n","testData = os.path.join(repo_path, 'data/sentiment140_test.csv')\n","\n","### run below 3 lines of code for setting train & test data path on local machine\n","'''\n","DATA = './data/'\n","trainData = DATA + 'sentiment140_160k_tweets_train.csv'\n","testData = DATA + 'sentiment140_test.csv'\n","'''\n","\n","train = pd.read_csv(trainData)\n","test = pd.read_csv(testData)\n","\n","train.head()"]},{"cell_type":"markdown","metadata":{"id":"DE0NVFR9s4o4"},"source":["Looking at distribution of *'positives'* & *'negatives'* samples in train dataset "]},{"cell_type":"code","execution_count":5,"metadata":{"id":"MF2-MSXFoJkr","colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"status":"ok","timestamp":1665398489726,"user_tz":-480,"elapsed":16,"user":{"displayName":"Leong Kwok Hing","userId":"10103425286911153836"}},"outputId":"962cfd62-2362-41f6-871d-010469af4a44"},"outputs":[{"output_type":"execute_result","data":{"text/plain":["Counter({'p': 80000, 'n': 79985})"]},"metadata":{},"execution_count":5}],"source":["collections.Counter(train['target'])"]},{"cell_type":"code","execution_count":6,"metadata":{"id":"vwyLXx_moJks","colab":{"base_uri":"https://localhost:8080/","height":293},"executionInfo":{"status":"ok","timestamp":1665398489727,"user_tz":-480,"elapsed":14,"user":{"displayName":"Leong Kwok Hing","userId":"10103425286911153836"}},"outputId":"71637eb6-d24f-42ef-8915-aa4d49ebb0d2"},"outputs":[{"output_type":"execute_result","data":{"text/plain":[""]},"metadata":{},"execution_count":6},{"output_type":"display_data","data":{"text/plain":["
"],"image/png":"iVBORw0KGgoAAAANSUhEUgAAAYMAAAEDCAYAAADX1GjKAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAVBUlEQVR4nO3df6zd9X3f8ecrdggkKdiEO4vZZkaLlcphhcAduEtVrXg1Nu1qSw0I1s1XyMKbIFs7TducaZI1CFMiVWNlIkhW8bCjLg5ljeylpq7lUFXtZOLLjwKGIN9AiG0BvsXGJGFAnb73x/m4Pbk51/cYrs81+PmQjs7n+/58vt/zOdK1X/d8v59zv6kqJElntw/N9AQkSTPPMJAkGQaSJMNAkoRhIEnCMJAkAbNnegLv1kUXXVSLFi2a6WlI0vvGY4899pdVNdSr730bBosWLWJ0dHSmpyFJ7xtJXpqsz9NEkiTDQJJkGEiSMAwkSRgGkiT6DIMk/zbJviTPJPlaknOTXJrk0SRjSb6e5Jw29iNte6z1L+o6zhda/fkk13XVV7TaWJL10/0mJUknN2UYJJkP/BtguKouA2YBNwFfBu6uqk8CR4G1bZe1wNFWv7uNI8mStt+ngRXAV5LMSjILuBdYCSwBbm5jJUkD0u9potnAeUlmAx8FXgauBR5q/ZuB1a29qm3T+pclSatvraq3q+pFYAy4uj3GquqFqnoH2NrGSpIGZMovnVXVoSS/DXwf+H/AHwOPAa9X1fE27CAwv7XnAwfavseTHAM+0ep7ug7dvc+BCfVres0lyTpgHcAll1wy1dRn3KL1fzjTU/hA+d6XfmWmp/CB4s/n9Hq//3z2c5poLp3f1C8F/i7wMTqneQauqjZW1XBVDQ8N9fxGtSTpXejnNNE/AV6sqvGq+ivgD4DPAnPaaSOABcCh1j4ELARo/RcAr3XXJ+wzWV2SNCD9hMH3gaVJPtrO/S8DngUeAT7XxowA21p7e9um9X+rOjda3g7c1FYbXQosBr4N7AUWt9VJ59C5yLz9vb81SVK/+rlm8GiSh4DHgePAE8BG4A+BrUm+2Gr3t13uB76aZAw4Quc/d6pqX5IH6QTJceD2qvoxQJLPAzvprFTaVFX7pu8tSpKm0tdfLa2qDcCGCeUX6KwEmjj2LeCGSY5zF3BXj/oOYEc/c5EkTT+/gSxJMgwkSYaBJAnDQJKEYSBJwjCQJGEYSJIwDCRJGAaSJAwDSRKGgSQJw0CShGEgScIwkCRhGEiSMAwkSfQRBkk+leTJrscbSX4ryYVJdiXZ357ntvFJck+SsSRPJbmy61gjbfz+JCNd9auSPN32uafdXlOSNCBThkFVPV9VV1TVFcBVwJvAN4D1wO6qWgzsbtsAK+nc33gxsA64DyDJhXTulnYNnTukbTgRIG3MrV37rZiWdydJ6supniZaBny3ql4CVgGbW30zsLq1VwFbqmMPMCfJxcB1wK6qOlJVR4FdwIrWd35V7amqArZ0HUuSNACnGgY3AV9r7XlV9XJrvwLMa+35wIGufQ622snqB3vUJUkD0ncYJDkH+DXg9yf2td/oaxrnNdkc1iUZTTI6Pj5+ul9Oks4ap/LJYCXweFW92rZfbad4aM+HW/0QsLBrvwWtdrL6gh71n1JVG6tquKqGh4aGTmHqkqSTOZUwuJm/PUUEsB04sSJoBNjWVV/TVhUtBY6100k7geVJ5rYLx8uBna3vjSRL2yqiNV3HkiQNwOx+BiX5GPDLwL/sKn8JeDDJWuAl4MZW3wFcD4zRWXl0C0BVHUlyJ7C3jbujqo609m3AA8B5wMPtIUkakL7CoKp+BHxiQu01OquLJo4t4PZJjrMJ2NSjPgpc1s9cJEnTz28gS5IMA0mSYSBJwjCQJGEYSJIwDCRJGAaSJAwDSRKGgSQJw0CShGEgScIwkCRhGEiSMAwkSRgGkiQMA0kShoEk
iT7DIMmcJA8l+U6S55L8fJILk+xKsr89z21jk+SeJGNJnkpyZddxRtr4/UlGuupXJXm67XNPuxeyJGlA+v1k8DvAH1XVzwKXA88B64HdVbUY2N22AVYCi9tjHXAfQJILgQ3ANcDVwIYTAdLG3Nq134r39rYkSadiyjBIcgHwi8D9AFX1TlW9DqwCNrdhm4HVrb0K2FIde4A5SS4GrgN2VdWRqjoK7AJWtL7zq2pPu3/ylq5jSZIGoJ9PBpcC48D/TPJEkt9N8jFgXlW93Ma8Asxr7fnAga79D7bayeoHe9QlSQPSTxjMBq4E7quqzwA/4m9PCQHQfqOv6Z/eT0qyLsloktHx8fHT/XKSdNboJwwOAger6tG2/RCdcHi1neKhPR9u/YeAhV37L2i1k9UX9Kj/lKraWFXDVTU8NDTUx9QlSf2YMgyq6hXgQJJPtdIy4FlgO3BiRdAIsK21twNr2qqipcCxdjppJ7A8ydx24Xg5sLP1vZFkaVtFtKbrWJKkAZjd57h/DfxeknOAF4Bb6ATJg0nWAi8BN7axO4DrgTHgzTaWqjqS5E5gbxt3R1Udae3bgAeA84CH20OSNCB9hUFVPQkM9+ha1mNsAbdPcpxNwKYe9VHgsn7mIkmafn4DWZJkGEiSDANJEoaBJAnDQJKEYSBJwjCQJGEYSJIwDCRJGAaSJAwDSRKGgSQJw0CShGEgScIwkCRhGEiSMAwkSfQZBkm+l+TpJE8mGW21C5PsSrK/Pc9t9SS5J8lYkqeSXNl1nJE2fn+Ska76Ve34Y23fTPcblSRN7lQ+GfxSVV1RVSduf7ke2F1Vi4HdbRtgJbC4PdYB90EnPIANwDXA1cCGEwHSxtzatd+Kd/2OJEmn7L2cJloFbG7tzcDqrvqW6tgDzElyMXAdsKuqjlTVUWAXsKL1nV9Ve9r9k7d0HUuSNAD9hkEBf5zksSTrWm1eVb3c2q8A81p7PnCga9+DrXay+sEedUnSgMzuc9wvVNWhJH8H2JXkO92dVVVJavqn95NaEK0DuOSSS073y0nSWaOvTwZVdag9Hwa+Qeec/6vtFA/t+XAbfghY2LX7glY7WX1Bj3qveWysquGqGh4aGupn6pKkPkwZBkk+luRnTrSB5cAzwHbgxIqgEWBba28H1rRVRUuBY+100k5geZK57cLxcmBn63sjydK2imhN17EkSQPQz2miecA32mrP2cD/qqo/SrIXeDDJWuAl4MY2fgdwPTAGvAncAlBVR5LcCext4+6oqiOtfRvwAHAe8HB7SJIGZMowqKoXgMt71F8DlvWoF3D7JMfaBGzqUR8FLutjvpKk08BvIEuSDANJkmEgScIwkCRhGEiSMAwkSRgGkiQMA0kShoEkCcNAkoRhIEnCMJAkYRhIkjAMJEkYBpIkDANJEoaBJIlTCIMks5I8keSbbfvSJI8mGUvy9STntPpH2vZY61/UdYwvtPrzSa7rqq9otbEk66fv7UmS+nEqnwx+E3iua/vLwN1V9UngKLC21dcCR1v97jaOJEuAm4BPAyuAr7SAmQXcC6wElgA3t7GSpAHpKwySLAB+Bfjdth3gWuChNmQzsLq1V7VtWv+yNn4VsLWq3q6qF4Ex4Or2GKuqF6rqHWBrGytJGpB+Pxn8d+A/AH/dtj8BvF5Vx9v2QWB+a88HDgC0/mNt/N/UJ+wzWV2SNCBThkGSXwUOV9VjA5jPVHNZl2Q0yej4+PhMT0eSPjD6+WTwWeDXknyPzimca4HfAeYkmd3GLAAOtfYhYCFA678AeK27PmGfyeo/pao2VtVwVQ0PDQ31MXVJUj+mDIOq+kJVLaiqRXQuAH+rqn4DeAT4XBs2Amxr7e1tm9b/raqqVr+prTa6FFgMfBvYCyxuq5POaa+xfVrenSSpL7OnHjKp/whsTfJF4Ang/la/H/hqkjHgCJ3/3KmqfUkeBJ4FjgO3V9WPAZJ8HtgJzAI2VdW+9zAvSdIpOqUwqKo/Af6ktV+gsxJo
4pi3gBsm2f8u4K4e9R3AjlOZiyRp+vgNZEmSYSBJMgwkSRgGkiQMA0kShoEkCcNAkoRhIEnCMJAkYRhIkjAMJEkYBpIkDANJEoaBJAnDQJKEYSBJwjCQJNFHGCQ5N8m3k/xFkn1J/kurX5rk0SRjSb7e7l9Mu8fx11v90SSLuo71hVZ/Psl1XfUVrTaWZP30v01J0sn088ngbeDaqrocuAJYkWQp8GXg7qr6JHAUWNvGrwWOtvrdbRxJltC5H/KngRXAV5LMSjILuBdYCSwBbm5jJUkDMmUYVMcP2+aH26OAa4GHWn0zsLq1V7VtWv+yJGn1rVX1dlW9CIzRuYfy1cBYVb1QVe8AW9tYSdKA9HXNoP0G/yRwGNgFfBd4vaqOtyEHgfmtPR84AND6jwGf6K5P2GeyuiRpQPoKg6r6cVVdASyg85v8z57WWU0iyboko0lGx8fHZ2IKkvSBdEqriarqdeAR4OeBOUlmt64FwKHWPgQsBGj9FwCvddcn7DNZvdfrb6yq4aoaHhoaOpWpS5JOop/VRENJ5rT2ecAvA8/RCYXPtWEjwLbW3t62af3fqqpq9ZvaaqNLgcXAt4G9wOK2OukcOheZt0/Hm5Mk9Wf21EO4GNjcVv18CHiwqr6Z5Flga5IvAk8A97fx9wNfTTIGHKHznztVtS/Jg8CzwHHg9qr6MUCSzwM7gVnApqraN23vUJI0pSnDoKqeAj7To/4CnesHE+tvATdMcqy7gLt61HcAO/qYryTpNPAbyJIkw0CSZBhIkjAMJEkYBpIkDANJEoaBJAnDQJKEYSBJwjCQJGEYSJIwDCRJGAaSJAwDSRKGgSQJw0CShGEgSaK/eyAvTPJIkmeT7Evym61+YZJdSfa357mtniT3JBlL8lSSK7uONdLG708y0lW/KsnTbZ97kuR0vFlJUm/9fDI4Dvy7qloCLAVuT7IEWA/srqrFwO62DbCSzs3uFwPrgPugEx7ABuAaOrfL3HAiQNqYW7v2W/He35okqV9ThkFVvVxVj7f2D4DngPnAKmBzG7YZWN3aq4At1bEHmJPkYuA6YFdVHamqo8AuYEXrO7+q9lRVAVu6jiVJGoBTumaQZBHwGeBRYF5Vvdy6XgHmtfZ84EDXbgdb7WT1gz3qkqQB6TsMknwc+N/Ab1XVG9197Tf6mua59ZrDuiSjSUbHx8dP98tJ0lmjrzBI8mE6QfB7VfUHrfxqO8VDez7c6oeAhV27L2i1k9UX9Kj/lKraWFXDVTU8NDTUz9QlSX3oZzVRgPuB56rqv3V1bQdOrAgaAbZ11de0VUVLgWPtdNJOYHmSue3C8XJgZ+t7I8nS9lpruo4lSRqA2X2M+SzwL4CnkzzZav8J+BLwYJK1wEvAja1vB3A9MAa8CdwCUFVHktwJ7G3j7qiqI619G/AAcB7wcHtIkgZkyjCoqj8DJlv3v6zH+AJun+RYm4BNPeqjwGVTzUWSdHr4DWRJkmEgSTIMJEkYBpIkDANJEoaBJAnDQJKEYSBJwjCQJGEYSJIwDCRJGAaSJAwDSRKGgSQJw0CShGEgScIwkCTR3z2QNyU5nOSZrtqFSXYl2d+e57Z6ktyTZCzJU0mu7NpnpI3fn2Skq35VkqfbPve0+yBLkgaon08GDwArJtTWA7urajGwu20DrAQWt8c64D7ohAewAbgGuBrYcCJA2phbu/ab+FqSpNNsyjCoqj8FjkworwI2t/ZmYHVXfUt17AHmJLkYuA7YVVVHquoosAtY0frOr6o97d7JW7qOJUkakHd7zWBeVb3c2q8A81p7PnCga9zBVjtZ/WCPuiRpgN7zBeT2G31Nw1ymlGRdktEko+Pj44N4SUk6K7zbMHi1neKhPR9u9UPAwq5xC1rtZPUFPeo9VdXGqhququGhoaF3OXVJ0kTvNgy2AydWBI0A27rqa9qqoqXAsXY6aSewPMncduF4ObCz9b2RZGlbRbSm61iSpAGZPdWAJF8D/jFwUZKDdFYFfQl4
MMla4CXgxjZ8B3A9MAa8CdwCUFVHktwJ7G3j7qiqExelb6OzYuk84OH2kCQN0JRhUFU3T9K1rMfYAm6f5DibgE096qPAZVPNQ5J0+vgNZEmSYSBJMgwkSRgGkiQMA0kShoEkCcNAkoRhIEnCMJAkYRhIkjAMJEkYBpIkDANJEoaBJAnDQJKEYSBJwjCQJHEGhUGSFUmeTzKWZP1Mz0eSziZnRBgkmQXcC6wElgA3J1kys7OSpLPHGREGwNXAWFW9UFXvAFuBVTM8J0k6a8ye6Qk084EDXdsHgWsmDkqyDljXNn+Y5PkBzO1scBHwlzM9iankyzM9A80Qfz6nz9+brONMCYO+VNVGYONMz+ODJsloVQ3P9DykXvz5HIwz5TTRIWBh1/aCVpMkDcCZEgZ7gcVJLk1yDnATsH2G5yRJZ40z4jRRVR1P8nlgJzAL2FRV+2Z4WmcTT73pTObP5wCkqmZ6DpKkGXamnCaSJM0gw0CSZBhIks6QC8gavCQfAX4dWETXz0FV3TFTc5JOSHIucBvwC0ABfwbcV1VvzejEPsAMg7PXNuAY8Bjw9gzPRZpoC/AD4H+07X8GfBW4YcZm9AHnaqKzVJJnquqymZ6H1EuSZ6tqyVQ1TR+vGZy9/m+SfzDTk5Am8XiSpSc2klwDjM7gfD7w/GRwlkryLPBJ4EU6p4kCVFX93IxOTAKSPAd8Cvh+K10CPA8cx5/T08IwOEsl6fnXC6vqpUHPRZposp/PE/w5nX6GgSTJawaSJMNAkoRhIPWUZE6S2wbwOqu937fOBIaB1NscOt+A7Us63s2/p9WAYaAZ5wVkqYckW4FVdJYzPgL8HDAX+DDwn6tqW5JFdO7B8ShwFXA9sAb458A4nft6P1ZVv53k7wP3AkPAm8CtwIXAN+l8E/wY8OtV9d0BvUXpJ/jnKKTe1gOXVdUVSWYDH62qN5JcBOxJcuJOfIuBkarak+Qf0vl7T5fTCY3H6fy5D+jcoOVfVdX+9gWqr1TVte0436yqhwb55qSJDANpagH+a5JfBP4amA/Ma30vVdWe1v4ssK39MbW3kvwfgCQfB/4R8PtJThzzI4OavNQPw0Ca2m/QOb1zVVX9VZLvAee2vh/1sf+HgNer6orTND/pPfMCstTbD4Cfae0LgMMtCH4JmOzbsX8O/NMk57ZPA78KUFVvAC8muQH+5mLz5T1eR5oxhoHUQ1W9Bvx5kmeAK4DhJE/TuUD8nUn22QtsB54CHgaepnNhGDqfLtYm+QtgH52L0wBbgX+f5Il2kVmaEa4mkqZRko9X1Q+TfBT4U2BdVT0+0/OSpuI1A2l6bWxfIjsX2GwQ6P3CTwaSJK8ZSJIMA0kShoEkCcNAkoRhIEnCMJAkAf8fRluB5QL2emMAAAAASUVORK5CYII=\n"},"metadata":{"needs_background":"light"}}],"source":["train.groupby('target').size().plot(kind='bar')"]},{"cell_type":"markdown","metadata":{"id":"xyHV7gCCxCpO"},"source":["We will find that it is a relatively well-balanced dataset"]},{"cell_type":"code","source":["# review text length of training data\n","# BERT max is 512\n","train['length'] = train['text'].apply(lambda x: len(x.split(' ')))\n","\n","train.hist(\"length\", 
bins=10)"],"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":316},"id":"h_n1L9VjWP79","executionInfo":{"status":"ok","timestamp":1665398490326,"user_tz":-480,"elapsed":611,"user":{"displayName":"Leong Kwok Hing","userId":"10103425286911153836"}},"outputId":"8038b362-15df-4a5e-d2ac-a5e467fd113e"},"execution_count":7,"outputs":[{"output_type":"execute_result","data":{"text/plain":["array([[]],\n"," dtype=object)"]},"metadata":{},"execution_count":7},{"output_type":"display_data","data":{"text/plain":["
"],"image/png":"iVBORw0KGgoAAAANSUhEUgAAAYMAAAEICAYAAAC9E5gJAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAYH0lEQVR4nO3df7DddZ3f8edLIsISJVDsHSTU4JjBsqQg3IE4ut0bUAhohT9cB4ZKcNC0Fax22NFQu4s/d7HVdaWj1sySBVxrpKyUFEE2G7nTsTMgRJHwQ0qEIMnwQ02AjbK4cd/943xSj+Fc7g3JPecrPB8zZ+73+/58vt/zPj9yX/d8z/ecpKqQJL24vWTUDUiSRs8wkCQZBpIkw0CShGEgScIwkCRhGEj/X5JNSd485OtckKSSzBnm9Uq7MgykIRpF4EgzYRhIkgwDaVdJXpJkRZIfJflZkquTHNzGdh7WWZbkx0l+muQjfdvun+TKJNuS3JvkQ0k2t7GvAP8M+F9Jtif5UN/VnjNof9KwGAbSs70fOBP4feBVwDbgC7vMeRNwJHAy8MdJ/nmrXwIsAF4DvAX41zs3qKp3AT8G/lVVza2q/zyD/UlDYRhIz/ZvgY9U1eaqegb4KPCOXd7k/VhVPV1VPwB+ABzT6u8E/qSqtlXVZuCyGV7nVPuThsIzGKRnezVwbZJ/7Kv9ChjrW3+0b/kXwNy2/Crg4b6x/uXnMtX+pKHwlYH0bA8Dp1XVvL7LflW1ZQbbPgLM71s/fJdxvyZYnWQYSM/234BPJXk1QJJXJjljhtteDVyc5KAkhwEX7jL+GL33E6ROMQykZ/s8sAb4myR/B9wCnDjDbT8ObAYeBP4WuAZ4pm/8T4H/lOSJJH+491qW9kz8z22k2ZPk3wFnVdXvj7oX6bn4ykDai5IcmuSN7bMKRwIXAdeOui9pOp5NJO1d+wJfBo4AngBWA18caUfSDMzolUGSeUmuSfLD9qnKNyQ5OMnaJPe3nwe1uUlyWZKNSe5Mclzffpa1+fcnWdZXPz7JhrbNZUmy92+qNPuq6qGqOrqqDqiqw6rqoqr65aj7kqYz08NEnwe+VVWvo/dhmHuBFcC6qloIrGvrAKcBC9tlOfAlgPZx/kvovRF3AnDJzgBpc97bt93SPbtZkqTdMe0byEkOBO4AXlN9k5PcB0xU1SNJDgUmq+rIJF9uy1/rn7fzUlX/ptW/DEy2y80taEhydv+8qRxyyCG1YMGCgWM///nPOeCAA57zdo1CF/vqYk9gX7ujiz1BN/vqYk8wvL7Wr1//06p65aCxmbxncATwE+AvkxwDrAc+AIxV1SNtzqP8+tOZh/Gbn7rc3GrPVd88oP4sSZbTe7XB2NgYn/nMZwY2vH37dubO7d4HOLvYVxd7AvvaHV3sCbrZVxd7guH1tWTJkoemGptJGMwBjgPeX1W3Jvk8vz4kBEBVVZJZP0e1qlYCKwHGx8drYmJi4LzJyUmmGhulLvbVxZ7AvnZHF3uCbvbVxZ6gG33N5D2DzcDmqrq1rV9DLxwea4eHaD8fb+Nb+M2P4M9vteeqzx9QlyQNybRhUFWPAg+3c6ah9xW799D7hObOM4KWAde15TXAue2sosXAk+1w0k3AKe1j+gcBpwA3tbGnkixuZxGd27cvSdIQzPRzBu8HvppkX+AB4N30guTqJOcDD9H76l6AG4DTgY30vn3x3QBVtTXJJ4Db2ryPV9XWtvw+4Apgf+DGdpEkDcmMwqCq7gDGBwydPGBuARdMsZ9VwKoB9duBo2fSiyRp7/PrKCRJhoEkyTCQJGEYSJJ4kX5r6YIV3xzJ9V6xtHsfg5ck8JWBJAnDQJKEYSBJwjCQJGEYSJIwDCRJvEhPLR2VDVue5LwRnda66dK3juR6Jf128JWBJMkwkCQZBpIkDANJEoaBJAnDQJKEYSBJwjCQJGEYSJIwDCRJGAaSJAwDSRKG
gSQJw0CShGEgSWKGYZBkU5INSe5IcnurHZxkbZL728+DWj1JLkuyMcmdSY7r28+yNv/+JMv66se3/W9s22Zv31BJ0tR255XBkqo6tqrG2/oKYF1VLQTWtXWA04CF7bIc+BL0wgO4BDgROAG4ZGeAtDnv7dtu6fO+RZKk3bYnh4nOAK5sy1cCZ/bVr6qeW4B5SQ4FTgXWVtXWqtoGrAWWtrFXVNUtVVXAVX37kiQNQXq/f6eZlDwIbAMK+HJVrUzyRFXNa+MBtlXVvCTXA5dW1Xfa2Drgw8AEsF9VfbLV/wh4Gphs89/c6r8HfLiq3jagj+X0Xm0wNjZ2/OrVqwf2u337dubOnTvl7dmw5clpb/NsGNsfHnt6JFfNosMOHFif7r4aFfuauS72BN3sq4s9wfD6WrJkyfq+ozu/Yab/B/KbqmpLkn8KrE3yw/7Bqqok06fKHqqqlcBKgPHx8ZqYmBg4b3JykqnGgJH9P8QXLdrBZzeM5r+d3nTOxMD6dPfVqNjXzHWxJ+hmX13sCbrR14wOE1XVlvbzceBaesf8H2uHeGg/H2/TtwCH920+v9Weqz5/QF2SNCTThkGSA5K8fOcycApwF7AG2HlG0DLgura8Bji3nVW0GHiyqh4BbgJOSXJQe+P4FOCmNvZUksXtcNO5ffuSJA3BTI5ZjAHXtrM95wD/vaq+leQ24Ook5wMPAe9s828ATgc2Ar8A3g1QVVuTfAK4rc37eFVtbcvvA64A9gdubBdJ0pBMGwZV9QBwzID6z4CTB9QLuGCKfa0CVg2o3w4cPYN+JUmzwE8gS5IMA0mSYSBJwjCQJGEYSJIwDCRJGAaSJAwDSRKGgSQJw0CShGEgScIwkCRhGEiSMAwkSRgGkiQMA0kShoEkCcNAkoRhIEnCMJAkYRhIkjAMJEkYBpIkDANJEoaBJAnDQJKEYSBJYjfCIMk+Sb6f5Pq2fkSSW5NsTPL1JPu2+sva+sY2vqBvHxe3+n1JTu2rL221jUlW7L2bJ0maid15ZfAB4N6+9U8Dn6uq1wLbgPNb/XxgW6t/rs0jyVHAWcDvAkuBL7aA2Qf4AnAacBRwdpsrSRqSGYVBkvnAW4G/aOsBTgKuaVOuBM5sy2e0ddr4yW3+GcDqqnqmqh4ENgIntMvGqnqgqn4JrG5zJUlDkqqaflJyDfCnwMuBPwTOA25pf/2T5HDgxqo6OsldwNKq2tzGfgScCHy0bfNXrX45cGO7iqVV9Z5WfxdwYlVdOKCP5cBygLGxseNXr149sN/t27czd+7cKW/Phi1PTnubZ8PY/vDY0yO5ahYdduDA+nT31ajY18x1sSfoZl9d7AmG19eSJUvWV9X4oLE5022c5G3A41W1PsnE3m5ud1TVSmAlwPj4eE1MDG5ncnKSqcYAzlvxzVnobnoXLdrBZzdMe5fPik3nTAysT3dfjYp9zVwXe4Ju9tXFnqAbfc3kN9MbgbcnOR3YD3gF8HlgXpI5VbUDmA9safO3AIcDm5PMAQ4EftZX36l/m6nqkqQhmPY9g6q6uKrmV9UCem8Af7uqzgFuBt7Rpi0DrmvLa9o6bfzb1TsWtQY4q51tdASwEPgucBuwsJ2dtG+7jjV75dZJkmZkT45ZfBhYneSTwPeBy1v9cuArSTYCW+n9cqeq7k5yNXAPsAO4oKp+BZDkQuAmYB9gVVXdvQd9SZJ2026FQVVNApNt+QF6ZwLtOufvgT+YYvtPAZ8aUL8BuGF3epEk7T1+AlmStEeHifRbZMEUZ1BdtGjHrJ5dtenSt87aviXtPb4ykCQZBpIkw0CShGEgScIwkCRhGEiSMAwkSRgGkiQMA0kShoEkCcNAkoRhIEnCMJAkYRhIkjAMJEkYBpIkDANJEoaBJAnDQJKEYSBJwjCQJGEYSJIwDCRJGAaSJAwDSRIzCIMk+yX5bpIfJLk7ycda/YgktybZmOTrSfZt9Ze19Y1tfEHfvi5u9fuSnNpXX9pqG5Os2Ps3U5L0XGby
yuAZ4KSqOgY4FliaZDHwaeBzVfVaYBtwfpt/PrCt1T/X5pHkKOAs4HeBpcAXk+yTZB/gC8BpwFHA2W2uJGlIpg2D6tneVl/aLgWcBFzT6lcCZ7blM9o6bfzkJGn11VX1TFU9CGwETmiXjVX1QFX9Eljd5kqShiRVNf2k3l/v64HX0vsr/r8At7S//klyOHBjVR2d5C5gaVVtbmM/Ak4EPtq2+atWvxy4sV3F0qp6T6u/Czixqi4c0MdyYDnA2NjY8atXrx7Y7/bt25k7d+6Ut2fDlienvc2zYWx/eOzpkVz1lGa7p0WHHfi8tpvuMRyVLvbVxZ6gm311sScYXl9LlixZX1Xjg8bmzGQHVfUr4Ngk84Brgdftxf5mrKpWAisBxsfHa2JiYuC8yclJphoDOG/FN2ehu+ldtGgHn90wo7t8aGa7p03nTDyv7aZ7DEeli311sSfoZl9d7Am60ddunU1UVU8ANwNvAOYl2flbZD6wpS1vAQ4HaOMHAj/rr++yzVR1SdKQzORsole2VwQk2R94C3AvvVB4R5u2DLiuLa9p67Txb1fvWNQa4Kx2ttERwELgu8BtwMJ2dtK+9N5kXrM3bpwkaWZmcnzgUODK9r7BS4Crq+r6JPcAq5N8Evg+cHmbfznwlSQbga30frlTVXcnuRq4B9gBXNAOP5HkQuAmYB9gVVXdvdduoSRpWtOGQVXdCbx+QP0BemcC7Vr/e+APptjXp4BPDajfANwwg34lSbPATyBLkgwDSZJhIEnCMJAkYRhIkjAMJEkYBpIkDANJEoaBJAnDQJKEYSBJwjCQJGEYSJIwDCRJGAaSJAwDSRKGgSQJw0CShGEgScIwkCRhGEiSMAwkSRgGkiQMA0kShoEkCcNAkoRhIEliBmGQ5PAkNye5J8ndST7Q6gcnWZvk/vbzoFZPksuSbExyZ5Lj+va1rM2/P8myvvrxSTa0bS5Lktm4sZKkwWbyymAHcFFVHQUsBi5IchSwAlhXVQuBdW0d4DRgYbssB74EvfAALgFOBE4ALtkZIG3Oe/u2W7rnN02SNFPThkFVPVJV32vLfwfcCxwGnAFc2aZdCZzZls8ArqqeW4B5SQ4FTgXWVtXWqtoGrAWWtrFXVNUtVVXAVX37kiQNwW69Z5BkAfB64FZgrKoeaUOPAmNt+TDg4b7NNrfac9U3D6hLkoZkzkwnJpkL/DXwwap6qv+wflVVkpqF/nbtYTm9Q0+MjY0xOTk5cN727dunHAO4aNGOWehuemP7j+66pzLbPT3X4/BcpnsMR6WLfXWxJ+hmX13sCbrR14zCIMlL6QXBV6vqG638WJJDq+qRdqjn8VbfAhzet/n8VtsCTOxSn2z1+QPmP0tVrQRWAoyPj9fExMSgaUxOTjLVGMB5K7455dhsumjRDj67Ycb5OxSz3dOmcyae13bTPYaj0sW+utgTdLOvLvYE3ehrJmcTBbgcuLeq/qxvaA2w84ygZcB1ffVz21lFi4En2+Gkm4BTkhzU3jg+BbipjT2VZHG7rnP79iVJGoKZ/En4RuBdwIYkd7TafwQuBa5Ocj7wEPDONnYDcDqwEfgF8G6Aqtqa5BPAbW3ex6tqa1t+H3AFsD9wY7tIkoZk2jCoqu8AU533f/KA+QVcMMW+VgGrBtRvB46erhdJ0uzwE8iSJMNAkmQYSJIwDCRJGAaSJAwDSRKGgSQJw0CShGEgScIwkCRhGEiSMAwkSRgGkiQMA0kShoEkCcNAkoRhIEnCMJAkYRhIkjAMJEkYBpIkDANJEoaBJAnDQJKEYSBJwjCQJGEYSJIwDCRJzCAMkqxK8niSu/pqBydZm+T+9vOgVk+Sy5JsTHJnkuP6tlnW5t+fZFlf/fgkG9o2lyXJ3r6RkqTnNpNXBlcAS3eprQDWVdVCYF1bBzgNWNguy4EvQS88gEuAE4ETgEt2Bkib896+7Xa9LknSLJs2DKrqfwNbdymfAVzZlq8EzuyrX1U9twDzkhwKnAqsraqt
VbUNWAssbWOvqKpbqqqAq/r2JUkakvR+B08zKVkAXF9VR7f1J6pqXlsOsK2q5iW5Hri0qr7TxtYBHwYmgP2q6pOt/kfA08Bkm//mVv894MNV9bYp+lhO7xUHY2Njx69evXpgv9u3b2fu3LlT3p4NW56c9jbPhrH94bGnR3LVU5rtnhYdduDz2m66x3BUuthXF3uCbvbVxZ5geH0tWbJkfVWNDxqbs6c7r6pKMn2i7AVVtRJYCTA+Pl4TExMD501OTjLVGMB5K745C91N76JFO/jshj2+y/eq2e5p0zkTz2u76R7DUeliX13sCbrZVxd7gm709XzPJnqsHeKh/Xy81bcAh/fNm99qz1WfP6AuSRqi5/sn4RpgGXBp+3ldX/3CJKvpvVn8ZFU9kuQm4E/63jQ+Bbi4qrYmeSrJYuBW4Fzgvz7PntRBC57nq7CLFu3Y41dwmy596x5tL72YTBsGSb5G75j/IUk20zsr6FLg6iTnAw8B72zTbwBOBzYCvwDeDdB+6X8CuK3N+3hV7XxT+n30zljaH7ixXSRJQzRtGFTV2VMMnTxgbgEXTLGfVcCqAfXbgaOn60OSNHv8BLIkyTCQJBkGkiQMA0kShoEkCcNAkoRhIEnCMJAkYRhIkjAMJEkYBpIkDANJEoaBJAnDQJKEYSBJwjCQJGEYSJIwDCRJGAaSJAwDSRKGgSQJw0CShGEgScIwkCRhGEiSMAwkSRgGkiQ6FAZJlia5L8nGJCtG3Y8kvZh0IgyS7AN8ATgNOAo4O8lRo+1Kkl48OhEGwAnAxqp6oKp+CawGzhhxT5L0opGqGnUPJHkHsLSq3tPW3wWcWFUX7jJvObC8rR4J3DfFLg8BfjpL7e6JLvbVxZ7AvnZHF3uCbvbVxZ5geH29uqpeOWhgzhCufK+pqpXAyunmJbm9qsaH0NJu6WJfXewJ7Gt3dLEn6GZfXewJutFXVw4TbQEO71uf32qSpCHoShjcBixMckSSfYGzgDUj7kmSXjQ6cZioqnYkuRC4CdgHWFVVd+/BLqc9lDQiXeyriz2Bfe2OLvYE3eyriz1BB/rqxBvIkqTR6sphIknSCBkGkqQXXhh05WstkqxK8niSu/pqBydZm+T+9vOgIfd0eJKbk9yT5O4kHxh1X0n2S/LdJD9oPX2s1Y9Icmt7HL/eTiwYuiT7JPl+kuu70leSTUk2JLkjye2tNurn1rwk1yT5YZJ7k7yhAz0d2e6jnZenknywA339h/ZcvyvJ19q/gZE/r15QYdCxr7W4Ali6S20FsK6qFgLr2vow7QAuqqqjgMXABe3+GWVfzwAnVdUxwLHA0iSLgU8Dn6uq1wLbgPOH2FO/DwD39q13pa8lVXVs37npo35ufR74VlW9DjiG3n020p6q6r52Hx0LHA/8Arh2lH0lOQz498B4VR1N74SZs+jC86qqXjAX4A3ATX3rFwMXj7CfBcBdfev3AYe25UOB+0Z8f10HvKUrfQG/A3wPOJHepzHnDHpch9jPfHq/LE4CrgfSkb42AYfsUhvZYwgcCDxIOyGlCz0N6PEU4P+Mui/gMOBh4GB6Z3NeD5zahefVC+qVAb++o3fa3GpdMVZVj7TlR4GxUTWSZAHweuBWRtxXOxRzB/A4sBb4EfBEVe1oU0b1OP458CHgH9v6P+lIXwX8TZL17StaYLSP4RHAT4C/bIfU/iLJASPuaVdnAV9ryyPrq6q2AJ8Bfgw8AjwJrKcDz6sXWhj81qjenwAjOa83yVzgr4EPVtVTo+6rqn5VvZfy8+l9aeHrhnn9gyR5G/B4Va0fdS8DvKmqjqN3OPSCJP+yf3AEj+Ec4DjgS1X1euDn7HLoZcTP932BtwP/Y9exYffV3p84g16Avgo4gGcfTh6JF1oYdP1rLR5LcihA+/n4sBtI8lJ6QfDVqvpGV/oCqKongJvpvUyel2TnhyJH8Ti+EXh7kk30vkX3JHrHxUfd186/Lqmqx+kdAz+B0T6Gm4HNVXVrW7+G
Xjh04nlFLzS/V1WPtfVR9vVm4MGq+klV/QPwDXrPtZE/r15oYdD1r7VYAyxry8voHbMfmiQBLgfurao/60JfSV6ZZF5b3p/eexj30guFd4yiJ4Cquriq5lfVAnrPo29X1Tmj7ivJAUlevnOZ3rHwuxjhY1hVjwIPJzmylU4G7hllT7s4m18fIoLR9vVjYHGS32n/HnfeVyN9XgEvrDeQ25svpwP/l95x54+MsI+v0Tsm+A/0/nI6n94x53XA/cDfAgcPuac30XtJfCdwR7ucPsq+gH8BfL/1dBfwx63+GuC7wEZ6L+9fNsLHcgK4vgt9tev/QbvcvfM53oHn1rHA7e1x/J/AQaPuqfV1APAz4MC+2qjvq48BP2zP968ALxv186qq/DoKSdIL7zCRJOl5MAwkSYaBJMkwkCRhGEiSMAwkSRgGkiTg/wEjZhCQASPvQgAAAABJRU5ErkJggg==\n"},"metadata":{"needs_background":"light"}}]},{"cell_type":"markdown","metadata":{"id":"5RK3QPUnbUFq"},"source":["## 3. Data (Text) Preprocessing ##"]},{"cell_type":"code","source":["x_train, x_test, y_train, y_test = train_test_split(train['text'], train['target'], test_size = 0.2, shuffle = True)"],"metadata":{"id":"HiT1tD_fYgR3","executionInfo":{"status":"ok","timestamp":1665398490327,"user_tz":-480,"elapsed":8,"user":{"displayName":"Leong Kwok Hing","userId":"10103425286911153836"}}},"execution_count":8,"outputs":[]},{"cell_type":"code","source":["# Which model to use?\n","model_name = 'bert-base-cased' # can handle upper & lower case\n","\n","# Load a tokenizer\n","tokenizer = AutoTokenizer.from_pretrained(model_name)\n","\n","# Load the model\n","model = TFAutoModelForSequenceClassification.from_pretrained(model_name, num_labels = 2) # num_labels = how many 
class"],"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":246,"referenced_widgets":["97de7fdce8544f31afb095802e2e5005","8dace0fb763e4a959360cf0d363a69aa","9314c5a3805044cba0476482029572a5","6e55c1dbc2764edb8b254a107e07519e","d4dd9b3d525e449b8b9bf818930ccacd","c33c5d6553c0499ba9c14787c3b87d1a","2292036ff6a44b798ec2655755754f64","d0c50da1124b47e9a42c825819010c20","10d2bb37154544d38a9dae60fd85a79e","68980c787d65418499b741df0c7ed0bc","d417408e9d8347ff8a7da34cf82876f0","bb3489d239c04264805bb6cfb2b0296b","a2e2e875789a438fa3806e9a075a332d","248183cb5db54926bbfda5119f9d9728","81414e8116a74555916988b8616c3f9f","23bbde8084b44b8499c8ca86846ed6b1","d7e3c6c0bee2470ab21a765d2b37a2f0","20c84709586043e1bab613c9f9e3fd52","e6506c7c4e6a433d8c1c88b2241204dd","cec20a043ed84d43892374bfa9c2e7bc","5016b882b6e64ebf8db93bcc40e2b04f","8bdc0099de8a4547afd1f60c9c85b3cd","006b4ef7097f4fa6a6dcfce779168061","a84a4dc33a9a4f4d89962c4d5aa6c8df","24bf26bd89284b0c8373c24625f3c998","1aa8e55d70fb4f55983cffdddb782d65","33ee0034b99f44df9f155fb4cda7f8b0","967a112db2ff4d5fb747b32304bff3ac","68d611daa25249058d5a4ba4d1ec06b4","562d1e446e984d2fa0c0216aecd615e7","3afe7dbeab1f45af84c0827ab592fe4b","931d28a36fe54ab0816b34287a7e8e34","f401ca1c2592411a9ef4208716441d20","46f3459bb0f0415bb1d125d7f2b73bc1","caaa9b2055e14927843f2dbfb8235e68","d3f8cb1fd1cf48378d5ac7d9e6b16008","34b5de801be64c7b8c2a7476967f5c3d","d0d737607dca46dfa799dd599c6e893e","c6744a612ff3420392f4dfada8b99ed2","684754ec0839488cbb4874ff1bbdfa85","a392f1cfa51c4b5b8ac8628b73124b04","495e719570154198bc1e51d5495401ad","cbb81f2546ae4a4b9918ee6d2d868902","3fea45b57ece4a28b2b166bb61893d48","6b8736558136436287da9fd02f38953e","5a078505b7cd4f028e9161001bd9f373","639a5bbf489d48d0aa06fdfbcd7741a2","062d938282984631bba14727c5d92008","ecd88ef215ce40f682f6d84243ac02f4","28377a7d46b34bf18b65bd9cc223748f","e973d87e106744c295eb13fcf4b273ce","7e9bac1e099a467e8cde00ea9321483b","8377fdc53af747bc8ec810930debcf90","c0d4cf13f9424775a604a0f97e8a2352","f5a0ea14
8eb9452882f918ee15416b22"]},"id":"_OeGNzrTYbx_","executionInfo":{"status":"ok","timestamp":1665398524569,"user_tz":-480,"elapsed":34249,"user":{"displayName":"Leong Kwok Hing","userId":"10103425286911153836"}},"outputId":"9b234300-da3b-42f8-bd9a-ccb456000ea7"},"execution_count":9,"outputs":[{"output_type":"display_data","data":{"text/plain":["Downloading: 0%| | 0.00/29.0 [00:00"]},"metadata":{},"execution_count":12}]},{"cell_type":"code","source":[],"metadata":{"id":"dLPqf-jyhduq","executionInfo":{"status":"ok","timestamp":1665398533256,"user_tz":-480,"elapsed":55,"user":{"displayName":"Leong Kwok Hing","userId":"10103425286911153836"}}},"execution_count":12,"outputs":[]},{"cell_type":"code","source":["# map the string labels to integer class ids\n","label_map = {'p':1, 'n':0} # integer ids starting from 0, as SparseCategoricalCrossentropy expects (no one-hot encoding needed)\n","label_map\n","\n","# apply the mapping to the train & validation labels\n","train_labels = y_train.map(label_map).values\n","valid_labels = y_test.map(label_map).values"],"metadata":{"id":"5IiOyeOehd9V","executionInfo":{"status":"ok","timestamp":1665398533256,"user_tz":-480,"elapsed":54,"user":{"displayName":"Leong Kwok Hing","userId":"10103425286911153836"}}},"execution_count":13,"outputs":[]},{"cell_type":"code","source":["# verifying labels mapping\n","\n","row = 400\n","\n","y_train.iloc[row], train_labels[row]"],"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"9Xmm73NejOUR","executionInfo":{"status":"ok","timestamp":1665398533256,"user_tz":-480,"elapsed":54,"user":{"displayName":"Leong Kwok Hing","userId":"10103425286911153836"}},"outputId":"76846d4d-b0c8-43bd-ea95-4337a3f20516"},"execution_count":14,"outputs":[{"output_type":"execute_result","data":{"text/plain":["('p', 1)"]},"metadata":{},"execution_count":14}]},{"cell_type":"code","source":["# optimizer\n","opt = tf.keras.optimizers.Adam(learning_rate = 5e-5, epsilon = 1e-8) # BERT needs a very low learning rate \n","# can use AdamW as well\n","\n","#loss\n","loss = 
tf.keras.losses.SparseCategoricalCrossentropy(from_logits = True) # from_logits=True because the classifier head outputs raw logits, not softmax probabilities\n","\n","# compile the model\n","model.compile(optimizer = opt,\n"," loss = loss,\n"," metrics = ['accuracy'])\n","# (for translation tasks, a metric such as the BLEU score would be used instead of accuracy)"],"metadata":{"id":"4l2T3FcWibTg","executionInfo":{"status":"ok","timestamp":1665398533987,"user_tz":-480,"elapsed":782,"user":{"displayName":"Leong Kwok Hing","userId":"10103425286911153836"}}},"execution_count":15,"outputs":[]},{"cell_type":"code","source":["# let's look at the model\n","model.summary()"],"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"EoOvrM7iitnH","executionInfo":{"status":"ok","timestamp":1665398533988,"user_tz":-480,"elapsed":76,"user":{"displayName":"Leong Kwok Hing","userId":"10103425286911153836"}},"outputId":"7fb7f816-ad11-41d5-83a4-e9b76babeed9"},"execution_count":16,"outputs":[{"output_type":"stream","name":"stdout","text":["Model: \"tf_bert_for_sequence_classification\"\n","_________________________________________________________________\n"," Layer (type) Output Shape Param # \n","=================================================================\n"," bert (TFBertMainLayer) multiple 108310272 \n"," \n"," dropout_37 (Dropout) multiple 0 \n"," \n"," classifier (Dense) multiple 1538 \n"," \n","=================================================================\n","Total params: 108,311,810\n","Trainable params: 108,311,810\n","Non-trainable params: 0\n","_________________________________________________________________\n"]}]},{"cell_type":"code","source":["# let's freeze the BERT layer & train only the classifier head\n","# note: per the Keras transfer-learning guide, a change to `trainable` made after compile()\n","# only takes effect in fit() once compile() is called again; otherwise all BERT weights may still be updated\n","model.layers[0].trainable = False\n","\n","model.summary()"],"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"HgY6V5w1i1-W","executionInfo":{"status":"ok","timestamp":1665398533988,"user_tz":-480,"elapsed":70,"user":{"displayName":"Leong Kwok 
Hing","userId":"10103425286911153836"}},"outputId":"9a4819cb-c30d-424d-e840-be4933c15101"},"execution_count":17,"outputs":[{"output_type":"stream","name":"stdout","text":["Model: \"tf_bert_for_sequence_classification\"\n","_________________________________________________________________\n"," Layer (type) Output Shape Param # \n","=================================================================\n"," bert (TFBertMainLayer) multiple 108310272 \n"," \n"," dropout_37 (Dropout) multiple 0 \n"," \n"," classifier (Dense) multiple 1538 \n"," \n","=================================================================\n","Total params: 108,311,810\n","Trainable params: 1,538\n","Non-trainable params: 108,310,272\n","_________________________________________________________________\n"]}]},{"cell_type":"code","source":["# train the model\n","model.fit(dict(train_tokens), train_labels,\n"," epochs = 3,\n"," batch_size = 16, # small batches (16 or 32) are typical for BERT fine-tuning and keep GPU memory use manageable\n"," validation_data=(dict(valid_tokens), valid_labels),\n"," verbose = 1)"],"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"_k-0oEN_i_e1","outputId":"b1390453-79c0-4c4b-cfe7-a18175002389","executionInfo":{"status":"ok","timestamp":1665405339698,"user_tz":-480,"elapsed":5993220,"user":{"displayName":"Leong Kwok Hing","userId":"10103425286911153836"}}},"execution_count":19,"outputs":[{"output_type":"stream","name":"stdout","text":["Epoch 1/3\n","8000/8000 [==============================] - 1992s 249ms/step - loss: 0.3918 - accuracy: 0.8259 - val_loss: 0.4000 - val_accuracy: 0.8288\n","Epoch 2/3\n","8000/8000 [==============================] - 1998s 250ms/step - loss: 0.3223 - accuracy: 0.8632 - val_loss: 0.4001 - val_accuracy: 0.8282\n","Epoch 3/3\n","8000/8000 [==============================] - 2002s 250ms/step - loss: 0.2543 - accuracy: 0.8969 - val_loss: 0.4363 - val_accuracy: 
0.8191\n"]},{"output_type":"execute_result","data":{"text/plain":[""]},"metadata":{},"execution_count":19}]},{"cell_type":"code","source":[],"metadata":{"id":"ZVsFM3yc0dZV","executionInfo":{"status":"aborted","timestamp":1665399338724,"user_tz":-480,"elapsed":2,"user":{"displayName":"Leong Kwok Hing","userId":"10103425286911153836"}}},"execution_count":null,"outputs":[]}],"metadata":{"accelerator":"GPU","colab":{"collapsed_sections":[],"provenance":[]},"kernelspec":{"display_name":"Python 3","language":"python","name":"python3"},"language_info":{"codemirror_mode":{"name":"ipython","version":3},"file_extension":".py","mimetype":"text/x-python","name":"python","nbconvert_exporter":"python","pygments_lexer":"ipython3","version":"3.7.3"},"gpuClass":"standard","widgets":{"application/vnd.jupyter.widget-state+json":{"97de7fdce8544f31afb095802e2e5005":{"model_module":"@jupyter-widgets/controls","model_name":"HBoxModel","model_module_version":"1.5.0","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"HBoxModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"HBoxView","box_style":"","children":["IPY_MODEL_8dace0fb763e4a959360cf0d363a69aa","IPY_MODEL_9314c5a3805044cba0476482029572a5","IPY_MODEL_6e55c1dbc2764edb8b254a107e07519e"],"layout":"IPY_MODEL_d4dd9b3d525e449b8b9bf818930ccacd"}},"8dace0fb763e4a959360cf0d363a69aa":{"model_module":"@jupyter-widgets/controls","model_name":"HTMLModel","model_module_version":"1.5.0","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"HTMLModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"HTMLView","description":"","description_tooltip":null,"layout":"IPY_MODEL_c33c5d6553c0499ba9c14787c3b87d1a","placeholder":"​","style":"IPY_MODEL_2292036ff6a44b798ec2655755754f64","value":"Downloading: 
100%"}},"9314c5a3805044cba0476482029572a5":{"model_module":"@jupyter-widgets/controls","model_name":"FloatProgressModel","model_module_version":"1.5.0","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"FloatProgressModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"ProgressView","bar_style":"success","description":"","description_tooltip":null,"layout":"IPY_MODEL_d0c50da1124b47e9a42c825819010c20","max":29,"min":0,"orientation":"horizontal","style":"IPY_MODEL_10d2bb37154544d38a9dae60fd85a79e","value":29}},"6e55c1dbc2764edb8b254a107e07519e":{"model_module":"@jupyter-widgets/controls","model_name":"HTMLModel","model_module_version":"1.5.0","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"HTMLModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"HTMLView","description":"","description_tooltip":null,"layout":"IPY_MODEL_68980c787d65418499b741df0c7ed0bc","placeholder":"​","style":"IPY_MODEL_d417408e9d8347ff8a7da34cf82876f0","value":" 29.0/29.0 [00:00<00:00, 
492B/s]"}},"d4dd9b3d525e449b8b9bf818930ccacd":{"model_module":"@jupyter-widgets/base","model_name":"LayoutModel","model_module_version":"1.2.0","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"c33c5d6553c0499ba9c14787c3b87d1a":{"model_module":"@jupyter-widgets/base","model_name":"LayoutModel","model_module_version":"1.2.0","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"ov
erflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"2292036ff6a44b798ec2655755754f64":{"model_module":"@jupyter-widgets/controls","model_name":"DescriptionStyleModel","model_module_version":"1.5.0","state":{"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"DescriptionStyleModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"StyleView","description_width":""}},"d0c50da1124b47e9a42c825819010c20":{"model_module":"@jupyter-widgets/base","model_name":"LayoutModel","model_module_version":"1.2.0","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"10d2bb37154544d38a9dae60fd85a79e":{"model_module":"@jupyter-widgets/controls","model_name":"ProgressStyleModel","model_module_version":"1.5.0","state":{"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"ProgressStyleModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"StyleView","bar_color":null,"description_width":""}},"68980c787d65418499b741df0c7ed0bc":{"model_mod
ule":"@jupyter-widgets/base","model_name":"LayoutModel","model_module_version":"1.2.0","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"d417408e9d8347ff8a7da34cf82876f0":{"model_module":"@jupyter-widgets/controls","model_name":"DescriptionStyleModel","model_module_version":"1.5.0","state":{"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"DescriptionStyleModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"StyleView","description_width":""}},"bb3489d239c04264805bb6cfb2b0296b":{"model_module":"@jupyter-widgets/controls","model_name":"HBoxModel","model_module_version":"1.5.0","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"HBoxModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"HBoxView","box_style":"","children":["IPY_MODEL_a2e2e875789a438fa3806e9a075a332d","IPY_MODEL_248183cb5db54926bbfda5119f9d9728","IPY_MODEL_81414e8116a74555916988b8616c3f9f"],"layout":"IPY_MODEL_23bbde8084b44b8499c8ca86846ed6b1"}}
,"a2e2e875789a438fa3806e9a075a332d":{"model_module":"@jupyter-widgets/controls","model_name":"HTMLModel","model_module_version":"1.5.0","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"HTMLModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"HTMLView","description":"","description_tooltip":null,"layout":"IPY_MODEL_d7e3c6c0bee2470ab21a765d2b37a2f0","placeholder":"​","style":"IPY_MODEL_20c84709586043e1bab613c9f9e3fd52","value":"Downloading: 100%"}},"248183cb5db54926bbfda5119f9d9728":{"model_module":"@jupyter-widgets/controls","model_name":"FloatProgressModel","model_module_version":"1.5.0","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"FloatProgressModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"ProgressView","bar_style":"success","description":"","description_tooltip":null,"layout":"IPY_MODEL_e6506c7c4e6a433d8c1c88b2241204dd","max":570,"min":0,"orientation":"horizontal","style":"IPY_MODEL_cec20a043ed84d43892374bfa9c2e7bc","value":570}},"81414e8116a74555916988b8616c3f9f":{"model_module":"@jupyter-widgets/controls","model_name":"HTMLModel","model_module_version":"1.5.0","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"HTMLModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"HTMLView","description":"","description_tooltip":null,"layout":"IPY_MODEL_5016b882b6e64ebf8db93bcc40e2b04f","placeholder":"​","style":"IPY_MODEL_8bdc0099de8a4547afd1f60c9c85b3cd","value":" 570/570 [00:00<00:00, 
16.3kB/s]"}},"23bbde8084b44b8499c8ca86846ed6b1":{"model_module":"@jupyter-widgets/base","model_name":"LayoutModel","model_module_version":"1.2.0","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"d7e3c6c0bee2470ab21a765d2b37a2f0":{"model_module":"@jupyter-widgets/base","model_name":"LayoutModel","model_module_version":"1.2.0","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"
overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"20c84709586043e1bab613c9f9e3fd52":{"model_module":"@jupyter-widgets/controls","model_name":"DescriptionStyleModel","model_module_version":"1.5.0","state":{"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"DescriptionStyleModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"StyleView","description_width":""}},"e6506c7c4e6a433d8c1c88b2241204dd":{"model_module":"@jupyter-widgets/base","model_name":"LayoutModel","model_module_version":"1.2.0","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"cec20a043ed84d43892374bfa9c2e7bc":{"model_module":"@jupyter-widgets/controls","model_name":"ProgressStyleModel","model_module_version":"1.5.0","state":{"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"ProgressStyleModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"StyleView","bar_color":null,"description_width":""}},"5016b882b6e64ebf8db93bcc40e2b04f":{"model_m
odule":"@jupyter-widgets/base","model_name":"LayoutModel","model_module_version":"1.2.0","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"8bdc0099de8a4547afd1f60c9c85b3cd":{"model_module":"@jupyter-widgets/controls","model_name":"DescriptionStyleModel","model_module_version":"1.5.0","state":{"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"DescriptionStyleModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"StyleView","description_width":""}},"006b4ef7097f4fa6a6dcfce779168061":{"model_module":"@jupyter-widgets/controls","model_name":"HBoxModel","model_module_version":"1.5.0","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"HBoxModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"HBoxView","box_style":"","children":["IPY_MODEL_a84a4dc33a9a4f4d89962c4d5aa6c8df","IPY_MODEL_24bf26bd89284b0c8373c24625f3c998","IPY_MODEL_1aa8e55d70fb4f55983cffdddb782d65"],"layout":"IPY_MODEL_33ee0034b99f44df9f155fb4cda7f8b0"
}},"a84a4dc33a9a4f4d89962c4d5aa6c8df":{"model_module":"@jupyter-widgets/controls","model_name":"HTMLModel","model_module_version":"1.5.0","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"HTMLModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"HTMLView","description":"","description_tooltip":null,"layout":"IPY_MODEL_967a112db2ff4d5fb747b32304bff3ac","placeholder":"​","style":"IPY_MODEL_68d611daa25249058d5a4ba4d1ec06b4","value":"Downloading: 100%"}},"24bf26bd89284b0c8373c24625f3c998":{"model_module":"@jupyter-widgets/controls","model_name":"FloatProgressModel","model_module_version":"1.5.0","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"FloatProgressModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"ProgressView","bar_style":"success","description":"","description_tooltip":null,"layout":"IPY_MODEL_562d1e446e984d2fa0c0216aecd615e7","max":213450,"min":0,"orientation":"horizontal","style":"IPY_MODEL_3afe7dbeab1f45af84c0827ab592fe4b","value":213450}},"1aa8e55d70fb4f55983cffdddb782d65":{"model_module":"@jupyter-widgets/controls","model_name":"HTMLModel","model_module_version":"1.5.0","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"HTMLModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"HTMLView","description":"","description_tooltip":null,"layout":"IPY_MODEL_931d28a36fe54ab0816b34287a7e8e34","placeholder":"​","style":"IPY_MODEL_f401ca1c2592411a9ef4208716441d20","value":" 213k/213k [00:00<00:00, 
234kB/s]"}},"33ee0034b99f44df9f155fb4cda7f8b0":{"model_module":"@jupyter-widgets/base","model_name":"LayoutModel","model_module_version":"1.2.0","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"967a112db2ff4d5fb747b32304bff3ac":{"model_module":"@jupyter-widgets/base","model_name":"LayoutModel","model_module_version":"1.2.0","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"o
verflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"68d611daa25249058d5a4ba4d1ec06b4":{"model_module":"@jupyter-widgets/controls","model_name":"DescriptionStyleModel","model_module_version":"1.5.0","state":{"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"DescriptionStyleModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"StyleView","description_width":""}},"562d1e446e984d2fa0c0216aecd615e7":{"model_module":"@jupyter-widgets/base","model_name":"LayoutModel","model_module_version":"1.2.0","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"3afe7dbeab1f45af84c0827ab592fe4b":{"model_module":"@jupyter-widgets/controls","model_name":"ProgressStyleModel","model_module_version":"1.5.0","state":{"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"ProgressStyleModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"StyleView","bar_color":null,"description_width":""}},"931d28a36fe54ab0816b34287a7e8e34":{"model_mo
dule":"@jupyter-widgets/base","model_name":"LayoutModel","model_module_version":"1.2.0","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"f401ca1c2592411a9ef4208716441d20":{"model_module":"@jupyter-widgets/controls","model_name":"DescriptionStyleModel","model_module_version":"1.5.0","state":{"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"DescriptionStyleModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"StyleView","description_width":""}},"46f3459bb0f0415bb1d125d7f2b73bc1":{"model_module":"@jupyter-widgets/controls","model_name":"HBoxModel","model_module_version":"1.5.0","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"HBoxModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"HBoxView","box_style":"","children":["IPY_MODEL_caaa9b2055e14927843f2dbfb8235e68","IPY_MODEL_d3f8cb1fd1cf48378d5ac7d9e6b16008","IPY_MODEL_34b5de801be64c7b8c2a7476967f5c3d"],"layout":"IPY_MODEL_d0d737607dca46dfa799dd599c6e893e"}
},"caaa9b2055e14927843f2dbfb8235e68":{"model_module":"@jupyter-widgets/controls","model_name":"HTMLModel","model_module_version":"1.5.0","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"HTMLModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"HTMLView","description":"","description_tooltip":null,"layout":"IPY_MODEL_c6744a612ff3420392f4dfada8b99ed2","placeholder":"​","style":"IPY_MODEL_684754ec0839488cbb4874ff1bbdfa85","value":"Downloading: 100%"}},"d3f8cb1fd1cf48378d5ac7d9e6b16008":{"model_module":"@jupyter-widgets/controls","model_name":"FloatProgressModel","model_module_version":"1.5.0","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"FloatProgressModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"ProgressView","bar_style":"success","description":"","description_tooltip":null,"layout":"IPY_MODEL_a392f1cfa51c4b5b8ac8628b73124b04","max":435797,"min":0,"orientation":"horizontal","style":"IPY_MODEL_495e719570154198bc1e51d5495401ad","value":435797}},"34b5de801be64c7b8c2a7476967f5c3d":{"model_module":"@jupyter-widgets/controls","model_name":"HTMLModel","model_module_version":"1.5.0","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"HTMLModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"HTMLView","description":"","description_tooltip":null,"layout":"IPY_MODEL_cbb81f2546ae4a4b9918ee6d2d868902","placeholder":"​","style":"IPY_MODEL_3fea45b57ece4a28b2b166bb61893d48","value":" 436k/436k [00:00<00:00, 
278kB/s]"}},"d0d737607dca46dfa799dd599c6e893e":{"model_module":"@jupyter-widgets/base","model_name":"LayoutModel","model_module_version":"1.2.0","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"c6744a612ff3420392f4dfada8b99ed2":{"model_module":"@jupyter-widgets/base","model_name":"LayoutModel","model_module_version":"1.2.0","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"o
verflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"684754ec0839488cbb4874ff1bbdfa85":{"model_module":"@jupyter-widgets/controls","model_name":"DescriptionStyleModel","model_module_version":"1.5.0","state":{"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"DescriptionStyleModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"StyleView","description_width":""}},"a392f1cfa51c4b5b8ac8628b73124b04":{"model_module":"@jupyter-widgets/base","model_name":"LayoutModel","model_module_version":"1.2.0","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"495e719570154198bc1e51d5495401ad":{"model_module":"@jupyter-widgets/controls","model_name":"ProgressStyleModel","model_module_version":"1.5.0","state":{"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"ProgressStyleModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"StyleView","bar_color":null,"description_width":""}},"cbb81f2546ae4a4b9918ee6d2d868902":{"model_mo
dule":"@jupyter-widgets/base","model_name":"LayoutModel","model_module_version":"1.2.0","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"3fea45b57ece4a28b2b166bb61893d48":{"model_module":"@jupyter-widgets/controls","model_name":"DescriptionStyleModel","model_module_version":"1.5.0","state":{"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"DescriptionStyleModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"StyleView","description_width":""}},"6b8736558136436287da9fd02f38953e":{"model_module":"@jupyter-widgets/controls","model_name":"HBoxModel","model_module_version":"1.5.0","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"HBoxModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"HBoxView","box_style":"","children":["IPY_MODEL_5a078505b7cd4f028e9161001bd9f373","IPY_MODEL_639a5bbf489d48d0aa06fdfbcd7741a2","IPY_MODEL_062d938282984631bba14727c5d92008"],"layout":"IPY_MODEL_ecd88ef215ce40f682f6d84243ac02f4"}
},"5a078505b7cd4f028e9161001bd9f373":{"model_module":"@jupyter-widgets/controls","model_name":"HTMLModel","model_module_version":"1.5.0","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"HTMLModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"HTMLView","description":"","description_tooltip":null,"layout":"IPY_MODEL_28377a7d46b34bf18b65bd9cc223748f","placeholder":"​","style":"IPY_MODEL_e973d87e106744c295eb13fcf4b273ce","value":"Downloading: 100%"}},"639a5bbf489d48d0aa06fdfbcd7741a2":{"model_module":"@jupyter-widgets/controls","model_name":"FloatProgressModel","model_module_version":"1.5.0","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"FloatProgressModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"ProgressView","bar_style":"success","description":"","description_tooltip":null,"layout":"IPY_MODEL_7e9bac1e099a467e8cde00ea9321483b","max":526681800,"min":0,"orientation":"horizontal","style":"IPY_MODEL_8377fdc53af747bc8ec810930debcf90","value":526681800}},"062d938282984631bba14727c5d92008":{"model_module":"@jupyter-widgets/controls","model_name":"HTMLModel","model_module_version":"1.5.0","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"HTMLModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"HTMLView","description":"","description_tooltip":null,"layout":"IPY_MODEL_c0d4cf13f9424775a604a0f97e8a2352","placeholder":"​","style":"IPY_MODEL_f5a0ea148eb9452882f918ee15416b22","value":" 527M/527M [00:10<00:00, 
51.1MB/s]"}},"ecd88ef215ce40f682f6d84243ac02f4":{"model_module":"@jupyter-widgets/base","model_name":"LayoutModel","model_module_version":"1.2.0","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"28377a7d46b34bf18b65bd9cc223748f":{"model_module":"@jupyter-widgets/base","model_name":"LayoutModel","model_module_version":"1.2.0","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"
overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"e973d87e106744c295eb13fcf4b273ce":{"model_module":"@jupyter-widgets/controls","model_name":"DescriptionStyleModel","model_module_version":"1.5.0","state":{"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"DescriptionStyleModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"StyleView","description_width":""}},"7e9bac1e099a467e8cde00ea9321483b":{"model_module":"@jupyter-widgets/base","model_name":"LayoutModel","model_module_version":"1.2.0","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"8377fdc53af747bc8ec810930debcf90":{"model_module":"@jupyter-widgets/controls","model_name":"ProgressStyleModel","model_module_version":"1.5.0","state":{"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"ProgressStyleModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"StyleView","bar_color":null,"description_width":""}},"c0d4cf13f9424775a604a0f97e8a2352":{"model_m
odule":"@jupyter-widgets/base","model_name":"LayoutModel","model_module_version":"1.2.0","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"f5a0ea148eb9452882f918ee15416b22":{"model_module":"@jupyter-widgets/controls","model_name":"DescriptionStyleModel","model_module_version":"1.5.0","state":{"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"DescriptionStyleModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"StyleView","description_width":""}}}}},"nbformat":4,"nbformat_minor":0} -------------------------------------------------------------------------------- /SentimentAnalysis_RNN.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "id": "efyTtfc_oJkh" 7 | }, 8 | "source": [ 9 | "## 1. 
Adding imports & installing necessary packages ##" 10 | ] 11 | }, 12 | { 13 | "cell_type": "code", 14 | "execution_count": null, 15 | "metadata": { 16 | "id": "GaVIBWlyoKz3", 17 | "colab": { 18 | "base_uri": "https://localhost:8080/" 19 | }, 20 | "outputId": "1fbc85d8-fc75-4df9-dd99-b9d2968fcf46" 21 | }, 22 | "outputs": [ 23 | { 24 | "output_type": "stream", 25 | "name": "stdout", 26 | "text": [ 27 | "Mounted at /content/gdrive\n" 28 | ] 29 | } 30 | ], 31 | "source": [ 32 | "### Run this if using Google Colab, to mount Google Drive as local storage\n", 33 | "\n", 34 | "from google.colab import drive\n", 35 | "import os\n", 36 | "drive.mount('/content/gdrive')\n", 37 | "\n", 38 | "repo_path = '/content/gdrive/My Drive/colab/NLP-Bootcamp/'" 39 | ] 40 | }, 41 | { 42 | "cell_type": "code", 43 | "execution_count": null, 44 | "metadata": { 45 | "id": "sdBgdze84r8s" 46 | }, 47 | "outputs": [], 48 | "source": [ 49 | "import pandas as pd\n", 50 | "import collections\n", 51 | "%matplotlib inline\n", 52 | "\n", 53 | "# Import modules to calculate accuracy and confusion matrix\n", 54 | "from sklearn.metrics import confusion_matrix, accuracy_score" 55 | ] 56 | }, 57 | { 58 | "cell_type": "markdown", 59 | "metadata": { 60 | "id": "YLttTMckfNa_" 61 | }, 62 | "source": [ 63 | "## 2. Loading Data ##" 64 | ] 65 | }, 66 | { 67 | "cell_type": "code", 68 | "execution_count": null, 69 | "metadata": { 70 | "id": "w9EA4jMv4ywO", 71 | "colab": { 72 | "base_uri": "https://localhost:8080/", 73 | "height": 206 74 | }, 75 | "outputId": "4590a838-52eb-45f6-b056-bb8685147946" 76 | }, 77 | "outputs": [ 78 | { 79 | "output_type": "execute_result", 80 | "data": { 81 | "text/plain": [ 82 | " target ids user \\\n", 83 | "0 p 1978186076 ceruleanbreeze \n", 84 | "1 p 1994697891 enthusiasticjen \n", 85 | "2 p 2191885992 LifeRemixed \n", 86 | "3 p 1753662211 lovemandy \n", 87 | "4 p 2177442789 _LOVELYmanu \n", 88 | "\n", 89 | " text \n", 90 | "0 @nocturnalie Anyway, and now Abby and I share ... 
\n", 91 | "1 @JoeGigantino Few times I'm trying to leave co... \n", 92 | "2 @AngieGriffin Good Morning Angie I'll be in t... \n", 93 | "3 had a good day driving up mountains, visiting ... \n", 94 | "4 downloading some songs i love lady GaGa. " 95 | ], 96 | "text/html": [ 97 | "\n", 98 | "
\n", 238 | " " 239 | ] 240 | }, 241 | "metadata": {}, 242 | "execution_count": 3 243 | } 244 | ], 245 | "source": [ 246 | "### On Google Colab: run the 2 lines below to set the train & test data paths\n", 247 | "trainData = os.path.join(repo_path, 'data/sentiment140_160k_tweets_train.csv')\n", 248 | "testData = os.path.join(repo_path, 'data/sentiment140_test.csv')\n", 249 | "\n", 250 | "### On a local machine: comment out the 2 lines above and remove the ''' quotes around the 3 lines below\n", 251 | "'''\n", 252 | "DATA = './data/'\n", 253 | "trainData = DATA + 'sentiment140_160k_tweets_train.csv'\n", 254 | "testData = DATA + 'sentiment140_test.csv'\n", 255 | "'''\n", 256 | "\n", 257 | "train = pd.read_csv(trainData)\n", 258 | "test = pd.read_csv(testData)\n", 259 | "\n", 260 | "train.head()" 261 | ] 262 | }, 263 | { 264 | "cell_type": "markdown", 265 | "metadata": { 266 | "id": "DE0NVFR9s4o4" 267 | }, 268 | "source": [ 269 | "Looking at the distribution of *'positive'* (`p`) and *'negative'* (`n`) samples in the training dataset" 270 | ] 271 | }, 272 | { 273 | "cell_type": "code", 274 | "execution_count": null, 275 | "metadata": { 276 | "id": "MF2-MSXFoJkr", 277 | "colab": { 278 | "base_uri": "https://localhost:8080/" 279 | }, 280 | "outputId": "1ba63640-3a77-48e5-f620-11b8260ada21" 281 | }, 282 | "outputs": [ 283 | { 284 | "output_type": "execute_result", 285 | "data": { 286 | "text/plain": [ 287 | "Counter({'p': 80000, 'n': 79985})" 288 | ] 289 | }, 290 | "metadata": {}, 291 | "execution_count": 4 292 | } 293 | ], 294 | "source": [ 295 | "collections.Counter(train['target'])" 296 | ] 297 | }, 298 | { 299 | "cell_type": "code", 300 | "execution_count": null, 301 | "metadata": { 302 | "id": "vwyLXx_moJks", 303 | "colab": { 304 | "base_uri": "https://localhost:8080/", 305 | "height": 293 306 | }, 307 | "outputId": "aceb2fa0-4236-46ab-a950-d6eb656c9cbb" 308 | }, 309 | "outputs": [ 310 | { 311 | "output_type": "execute_result", 312 | "data": { 313 | "text/plain": [ 314 | "" 315 | ] 316 | }, 317 | "metadata": 
{}, 318 | "execution_count": 5 319 | }, 320 | { 321 | "output_type": "display_data", 322 | "data": { 323 | "text/plain": [ 324 | "
" 325 | ], 326 | "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYMAAAEDCAYAAADX1GjKAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAVBUlEQVR4nO3df6zd9X3f8ecrdggkKdiEO4vZZkaLlcphhcAduEtVrXg1Nu1qSw0I1s1XyMKbIFs7TducaZI1CFMiVWNlIkhW8bCjLg5ljeylpq7lUFXtZOLLjwKGIN9AiG0BvsXGJGFAnb73x/m4Pbk51/cYrs81+PmQjs7n+/58vt/zOdK1X/d8v59zv6kqJElntw/N9AQkSTPPMJAkGQaSJMNAkoRhIEnCMJAkAbNnegLv1kUXXVSLFi2a6WlI0vvGY4899pdVNdSr730bBosWLWJ0dHSmpyFJ7xtJXpqsz9NEkiTDQJJkGEiSMAwkSRgGkiT6DIMk/zbJviTPJPlaknOTXJrk0SRjSb6e5Jw29iNte6z1L+o6zhda/fkk13XVV7TaWJL10/0mJUknN2UYJJkP/BtguKouA2YBNwFfBu6uqk8CR4G1bZe1wNFWv7uNI8mStt+ngRXAV5LMSjILuBdYCSwBbm5jJUkD0u9potnAeUlmAx8FXgauBR5q/ZuB1a29qm3T+pclSatvraq3q+pFYAy4uj3GquqFqnoH2NrGSpIGZMovnVXVoSS/DXwf+H/AHwOPAa9X1fE27CAwv7XnAwfavseTHAM+0ep7ug7dvc+BCfVres0lyTpgHcAll1wy1dRn3KL1fzjTU/hA+d6XfmWmp/CB4s/n9Hq//3z2c5poLp3f1C8F/i7wMTqneQauqjZW1XBVDQ8N9fxGtSTpXejnNNE/AV6sqvGq+ivgD4DPAnPaaSOABcCh1j4ELARo/RcAr3XXJ+wzWV2SNCD9hMH3gaVJPtrO/S8DngUeAT7XxowA21p7e9um9X+rOjda3g7c1FYbXQosBr4N7AUWt9VJ59C5yLz9vb81SVK/+rlm8GiSh4DHgePAE8BG4A+BrUm+2Gr3t13uB76aZAw4Quc/d6pqX5IH6QTJceD2qvoxQJLPAzvprFTaVFX7pu8tSpKm0tdfLa2qDcCGCeUX6KwEmjj2LeCGSY5zF3BXj/oOYEc/c5EkTT+/gSxJMgwkSYaBJAnDQJKEYSBJwjCQJGEYSJIwDCRJGAaSJAwDSRKGgSQJw0CShGEgScIwkCRhGEiSMAwkSfQRBkk+leTJrscbSX4ryYVJdiXZ357ntvFJck+SsSRPJbmy61gjbfz+JCNd9auSPN32uafdXlOSNCBThkFVPV9VV1TVFcBVwJvAN4D1wO6qWgzsbtsAK+nc33gxsA64DyDJhXTulnYNnTukbTgRIG3MrV37rZiWdydJ6supniZaBny3ql4CVgGbW30zsLq1VwFbqmMPMCfJxcB1wK6qOlJVR4FdwIrWd35V7amqArZ0HUuSNACnGgY3AV9r7XlV9XJrvwLMa+35wIGufQ622snqB3vUJUkD0ncYJDkH+DXg9yf2td/oaxrnNdkc1iUZTTI6Pj5+ul9Oks4ap/LJYCXweFW92rZfbad4aM+HW/0QsLBrvwWtdrL6gh71n1JVG6tquKqGh4aGTmHqkqSTOZUwuJm/PUUEsB04sSJoBNjWVV/TVhUtBY6100k7geVJ5rYLx8uBna3vjSRL2yqiNV3HkiQNwOx+BiX5GPDLwL/sKn8JeDDJWuAl4MZW3wFcD4zRWXl0C0BVHUlyJ7C3jbujqo609m3AA8B5wMPtIUkakL7CoKp+BHxiQu01OquLJo4t4PZJjrMJ2NSjPgpc1s9cJEnTz28gS5IMA0mSYSBJwjCQJGEYSJIwDCRJGAaSJAwDSRKGgSQJw0CShGEgScIwkCRhGEiSMAwkS
RgGkiQMA0kShoEkiT7DIMmcJA8l+U6S55L8fJILk+xKsr89z21jk+SeJGNJnkpyZddxRtr4/UlGuupXJXm67XNPuxeyJGlA+v1k8DvAH1XVzwKXA88B64HdVbUY2N22AVYCi9tjHXAfQJILgQ3ANcDVwIYTAdLG3Nq134r39rYkSadiyjBIcgHwi8D9AFX1TlW9DqwCNrdhm4HVrb0K2FIde4A5SS4GrgN2VdWRqjoK7AJWtL7zq2pPu3/ylq5jSZIGoJ9PBpcC48D/TPJEkt9N8jFgXlW93Ma8Asxr7fnAga79D7bayeoHe9QlSQPSTxjMBq4E7quqzwA/4m9PCQHQfqOv6Z/eT0qyLsloktHx8fHT/XKSdNboJwwOAger6tG2/RCdcHi1neKhPR9u/YeAhV37L2i1k9UX9Kj/lKraWFXDVTU8NDTUx9QlSf2YMgyq6hXgQJJPtdIy4FlgO3BiRdAIsK21twNr2qqipcCxdjppJ7A8ydx24Xg5sLP1vZFkaVtFtKbrWJKkAZjd57h/DfxeknOAF4Bb6ATJg0nWAi8BN7axO4DrgTHgzTaWqjqS5E5gbxt3R1Udae3bgAeA84CH20OSNCB9hUFVPQkM9+ha1mNsAbdPcpxNwKYe9VHgsn7mIkmafn4DWZJkGEiSDANJEoaBJAnDQJKEYSBJwjCQJGEYSJIwDCRJGAaSJAwDSRKGgSQJw0CShGEgScIwkCRhGEiSMAwkSfQZBkm+l+TpJE8mGW21C5PsSrK/Pc9t9SS5J8lYkqeSXNl1nJE2fn+Ska76Ve34Y23fTPcblSRN7lQ+GfxSVV1RVSduf7ke2F1Vi4HdbRtgJbC4PdYB90EnPIANwDXA1cCGEwHSxtzatd+Kd/2OJEmn7L2cJloFbG7tzcDqrvqW6tgDzElyMXAdsKuqjlTVUWAXsKL1nV9Ve9r9k7d0HUuSNAD9hkEBf5zksSTrWm1eVb3c2q8A81p7PnCga9+DrXay+sEedUnSgMzuc9wvVNWhJH8H2JXkO92dVVVJavqn95NaEK0DuOSSS073y0nSWaOvTwZVdag9Hwa+Qeec/6vtFA/t+XAbfghY2LX7glY7WX1Bj3qveWysquGqGh4aGupn6pKkPkwZBkk+luRnTrSB5cAzwHbgxIqgEWBba28H1rRVRUuBY+100k5geZK57cLxcmBn63sjydK2imhN17EkSQPQz2miecA32mrP2cD/qqo/SrIXeDDJWuAl4MY2fgdwPTAGvAncAlBVR5LcCext4+6oqiOtfRvwAHAe8HB7SJIGZMowqKoXgMt71F8DlvWoF3D7JMfaBGzqUR8FLutjvpKk08BvIEuSDANJkmEgScIwkCRhGEiSMAwkSRgGkiQMA0kShoEkCcNAkoRhIEnCMJAkYRhIkjAMJEkYBpIkDANJEoaBJIlTCIMks5I8keSbbfvSJI8mGUvy9STntPpH2vZY61/UdYwvtPrzSa7rqq9otbEk66fv7UmS+nEqnwx+E3iua/vLwN1V9UngKLC21dcCR1v97jaOJEuAm4BPAyuAr7SAmQXcC6wElgA3t7GSpAHpKwySLAB+Bfjdth3gWuChNmQzsLq1V7VtWv+yNn4VsLWq3q6qF4Ex4Or2GKuqF6rqHWBrGytJGpB+Pxn8d+A/AH/dtj8BvF5Vx9v2QWB+a88HDgC0/mNt/N/UJ+wzWV2SNCBThkGSXwUOV9VjA5jPVHNZl2Q0yej4+PhMT0eSPjD6+WTwWeDXknyPzimca4HfAeYkmd3GLAAOtfYhYCFA678AeK27PmGfyeo/pao2VtVwVQ0PDQ31MXVJUj+mDIOq+kJVLaiqRXQuAH+rqn4DeAT4XBs2Amxr7e1tm9b/raqqVr+prTa6FFgMfBvYCyxuq5POaa+xfVrenSSpL7OnHjKp/whsTfJF4Ang/la/H/hqkjHgCJ3/3KmqfUkeBJ4FjgO3V9WPAZJ8HtgJzAI2VdW+9zAvSdIpOqUwq
Ko/Af6ktV+gsxJo4pi3gBsm2f8u4K4e9R3AjlOZiyRp+vgNZEmSYSBJMgwkSRgGkiQMA0kShoEkCcNAkoRhIEnCMJAkYRhIkjAMJEkYBpIkDANJEoaBJAnDQJKEYSBJwjCQJNFHGCQ5N8m3k/xFkn1J/kurX5rk0SRjSb7e7l9Mu8fx11v90SSLuo71hVZ/Psl1XfUVrTaWZP30v01J0sn088ngbeDaqrocuAJYkWQp8GXg7qr6JHAUWNvGrwWOtvrdbRxJltC5H/KngRXAV5LMSjILuBdYCSwBbm5jJUkDMmUYVMcP2+aH26OAa4GHWn0zsLq1V7VtWv+yJGn1rVX1dlW9CIzRuYfy1cBYVb1QVe8AW9tYSdKA9HXNoP0G/yRwGNgFfBd4vaqOtyEHgfmtPR84AND6jwGf6K5P2GeyuiRpQPoKg6r6cVVdASyg85v8z57WWU0iyboko0lGx8fHZ2IKkvSBdEqriarqdeAR4OeBOUlmt64FwKHWPgQsBGj9FwCvddcn7DNZvdfrb6yq4aoaHhoaOpWpS5JOop/VRENJ5rT2ecAvA8/RCYXPtWEjwLbW3t62af3fqqpq9ZvaaqNLgcXAt4G9wOK2OukcOheZt0/Hm5Mk9Wf21EO4GNjcVv18CHiwqr6Z5Flga5IvAk8A97fx9wNfTTIGHKHznztVtS/Jg8CzwHHg9qr6MUCSzwM7gVnApqraN23vUJI0pSnDoKqeAj7To/4CnesHE+tvATdMcqy7gLt61HcAO/qYryTpNPAbyJIkw0CSZBhIkjAMJEkYBpIkDANJEoaBJAnDQJKEYSBJwjCQJGEYSJIwDCRJGAaSJAwDSRKGgSQJw0CShGEgSaK/eyAvTPJIkmeT7Evym61+YZJdSfa357mtniT3JBlL8lSSK7uONdLG708y0lW/KsnTbZ97kuR0vFlJUm/9fDI4Dvy7qloCLAVuT7IEWA/srqrFwO62DbCSzs3uFwPrgPugEx7ABuAaOrfL3HAiQNqYW7v2W/He35okqV9ThkFVvVxVj7f2D4DngPnAKmBzG7YZWN3aq4At1bEHmJPkYuA6YFdVHamqo8AuYEXrO7+q9lRVAVu6jiVJGoBTumaQZBHwGeBRYF5Vvdy6XgHmtfZ84EDXbgdb7WT1gz3qkqQB6TsMknwc+N/Ab1XVG9197Tf6mua59ZrDuiSjSUbHx8dP98tJ0lmjrzBI8mE6QfB7VfUHrfxqO8VDez7c6oeAhV27L2i1k9UX9Kj/lKraWFXDVTU8NDTUz9QlSX3oZzVRgPuB56rqv3V1bQdOrAgaAbZ11de0VUVLgWPtdNJOYHmSue3C8XJgZ+t7I8nS9lpruo4lSRqA2X2M+SzwL4CnkzzZav8J+BLwYJK1wEvAja1vB3A9MAa8CdwCUFVHktwJ7G3j7qiqI619G/AAcB7wcHtIkgZkyjCoqj8DJlv3v6zH+AJun+RYm4BNPeqjwGVTzUWSdHr4DWRJkmEgSTIMJEkYBpIkDANJEoaBJAnDQJKEYSBJwjCQJGEYSJIwDCRJGAaSJAwDSRKGgSQJw0CShGEgScIwkCTR3z2QNyU5nOSZrtqFSXYl2d+e57Z6ktyTZCzJU0mu7NpnpI3fn2Skq35VkqfbPve0+yBLkgaon08GDwArJtTWA7urajGwu20DrAQWt8c64D7ohAewAbgGuBrYcCJA2phbu/ab+FqSpNNsyjCoqj8FjkworwI2t/ZmYHVXfUt17AHmJLkYuA7YVVVHquoosAtY0frOr6o97d7JW7qOJUkakHd7zWBeVb3c2q8A81p7PnCga9zBVjtZ/WCPuiRpgN7zBeT2G31Nw1ymlGRdktEko+Pj44N4SUk6K7zbMHi1neKhPR9u9UPAwq5xC1rtZPUFPeo9VdXGqhququGhoaF3OXVJ0kTvNgy2AydWBI0A27rqa9qqoqXAsXY6aSewPMncduF4ObCz9b2RZGlbRbSm61iSpAGZPdWAJF8D/
jFwUZKDdFYFfQl4MMla4CXgxjZ8B3A9MAa8CdwCUFVHktwJ7G3j7qiqExelb6OzYuk84OH2kCQN0JRhUFU3T9K1rMfYAm6f5DibgE096qPAZVPNQ5J0+vgNZEmSYSBJMgwkSRgGkiQMA0kShoEkCcNAkoRhIEnCMJAkYRhIkjAMJEkYBpIkDANJEoaBJAnDQJKEYSBJwjCQJHEGhUGSFUmeTzKWZP1Mz0eSziZnRBgkmQXcC6wElgA3J1kys7OSpLPHGREGwNXAWFW9UFXvAFuBVTM8J0k6a8ye6Qk084EDXdsHgWsmDkqyDljXNn+Y5PkBzO1scBHwlzM9iankyzM9A80Qfz6nz9+brONMCYO+VNVGYONMz+ODJsloVQ3P9DykXvz5HIwz5TTRIWBh1/aCVpMkDcCZEgZ7gcVJLk1yDnATsH2G5yRJZ40z4jRRVR1P8nlgJzAL2FRV+2Z4WmcTT73pTObP5wCkqmZ6DpKkGXamnCaSJM0gw0CSZBhIks6QC8gavCQfAX4dWETXz0FV3TFTc5JOSHIucBvwC0ABfwbcV1VvzejEPsAMg7PXNuAY8Bjw9gzPRZpoC/AD4H+07X8GfBW4YcZm9AHnaqKzVJJnquqymZ6H1EuSZ6tqyVQ1TR+vGZy9/m+SfzDTk5Am8XiSpSc2klwDjM7gfD7w/GRwlkryLPBJ4EU6p4kCVFX93IxOTAKSPAd8Cvh+K10CPA8cx5/T08IwOEsl6fnXC6vqpUHPRZposp/PE/w5nX6GgSTJawaSJMNAkoRhIPWUZE6S2wbwOqu937fOBIaB1NscOt+A7Us63s2/p9WAYaAZ5wVkqYckW4FVdJYzPgL8HDAX+DDwn6tqW5JFdO7B8ShwFXA9sAb458A4nft6P1ZVv53k7wP3AkPAm8CtwIXAN+l8E/wY8OtV9d0BvUXpJ/jnKKTe1gOXVdUVSWYDH62qN5JcBOxJcuJOfIuBkarak+Qf0vl7T5fTCY3H6fy5D+jcoOVfVdX+9gWqr1TVte0436yqhwb55qSJDANpagH+a5JfBP4amA/Ma30vVdWe1v4ssK39MbW3kvwfgCQfB/4R8PtJThzzI4OavNQPw0Ca2m/QOb1zVVX9VZLvAee2vh/1sf+HgNer6orTND/pPfMCstTbD4Cfae0LgMMtCH4JmOzbsX8O/NMk57ZPA78KUFVvAC8muQH+5mLz5T1eR5oxhoHUQ1W9Bvx5kmeAK4DhJE/TuUD8nUn22QtsB54CHgaepnNhGDqfLtYm+QtgH52L0wBbgX+f5Il2kVmaEa4mkqZRko9X1Q+TfBT4U2BdVT0+0/OSpuI1A2l6bWxfIjsX2GwQ6P3CTwaSJK8ZSJIMA0kShoEkCcNAkoRhIEnCMJAkAf8fRluB5QL2emMAAAAASUVORK5CYII=\n" 327 | }, 328 | "metadata": { 329 | "needs_background": "light" 330 | } 331 | } 332 | ], 333 | "source": [ 334 | "train.groupby('target').size().plot(kind='bar')" 335 | ] 336 | }, 337 | { 338 | "cell_type": "markdown", 339 | "metadata": { 340 | "id": "xyHV7gCCxCpO" 341 | }, 342 | "source": [ 343 | "We will find that it is a relatively well-balanced dataset" 344 | ] 345 | }, 346 | { 347 | "cell_type": "markdown", 348 | "metadata": { 349 | "id": "5RK3QPUnbUFq" 350 | }, 351 | "source": [ 352 | "## 3. 
Data (Text) Preprocessing ##" 353 | ] 354 | }, 355 | { 356 | "cell_type": "code", 357 | "execution_count": null, 358 | "metadata": { 359 | "id": "N_p8SQxQMKrq" 360 | }, 361 | "outputs": [], 362 | "source": [ 363 | "### dictionary mapping contractions (apostrophe words) to their expanded forms\n", 364 | "\n", 365 | "appos = {\n", 366 | "\"aren't\" : \"are not\",\n", 367 | "\"can't\" : \"cannot\",\n", 368 | "\"cant\" : \"cannot\",\n", 369 | "\"couldn't\" : \"could not\",\n", 370 | "\"didn't\" : \"did not\",\n", 371 | "\"doesn't\" : \"does not\",\n", 372 | "\"don't\" : \"do not\",\n", 373 | "\"hadn't\" : \"had not\",\n", 374 | "\"hasn't\" : \"has not\",\n", 375 | "\"haven't\" : \"have not\",\n", 376 | "\"he'd\" : \"he would\",\n", 377 | "\"he'll\" : \"he will\",\n", 378 | "\"he's\" : \"he is\",\n", 379 | "\"i'd\" : \"I would\",\n", 380 | "# \"i'd\" can also mean \"I had\", but each dict key can map to only one expansion\n", 381 | "\"i'll\" : \"I will\",\n", 382 | "\"i'm\" : \"I am\",\n", 383 | "\"im\" : \"I am\",\n", 384 | "\"isn't\" : \"is not\",\n", 385 | "\"it's\" : \"it is\",\n", 386 | "\"it'll\":\"it will\",\n", 387 | "\"i've\" : \"I have\",\n", 388 | "\"let's\" : \"let us\",\n", 389 | "\"mightn't\" : \"might not\",\n", 390 | "\"mustn't\" : \"must not\",\n", 391 | "\"shan't\" : \"shall not\",\n", 392 | "\"she'd\" : \"she would\",\n", 393 | "\"she'll\" : \"she will\",\n", 394 | "\"she's\" : \"she is\",\n", 395 | "\"shouldn't\" : \"should not\",\n", 396 | "\"that's\" : \"that is\",\n", 397 | "\"there's\" : \"there is\",\n", 398 | "\"they'd\" : \"they would\",\n", 399 | "\"they'll\" : \"they will\",\n", 400 | "\"they're\" : \"they are\",\n", 401 | "\"they've\" : \"they have\",\n", 402 | "\"we'd\" : \"we would\",\n", 403 | "\"we're\" : \"we are\",\n", 404 | "\"weren't\" : \"were not\",\n", 405 | "\"we've\" : \"we have\",\n", 406 | "\"what'll\" : \"what will\",\n", 407 | "\"what're\" : \"what are\",\n", 408 | "\"what's\" : \"what is\",\n", 409 | "\"what've\" : \"what have\",\n", 410 | "\"where's\" : \"where is\",\n", 411 | "\"who'd\" : \"who would\",\n", 412 |
"\"who'll\" : \"who will\",\n", 413 | "\"who're\" : \"who are\",\n", 414 | "\"who's\" : \"who is\",\n", 415 | "\"who've\" : \"who have\",\n", 416 | "\"won't\" : \"will not\",\n", 417 | "\"wouldn't\" : \"would not\",\n", 418 | "\"you'd\" : \"you would\",\n", 419 | "\"you'll\" : \"you will\",\n", 420 | "\"you're\" : \"you are\",\n", 421 | "\"you've\" : \"you have\",\n", 422 | "\"'re\": \" are\",\n", 423 | "\"wasn't\": \"was not\",\n", 424 | "\"we'll\" : \"we will\",\n", 425 | "\"didn't\": \"did not\",\n", 426 | "\"gg\" : \"going\"\n", 427 | "}" 428 | ] 429 | }, 430 | { 431 | "cell_type": "code", 432 | "execution_count": null, 433 | "metadata": { 434 | "id": "7IbqS4m-4-EX", 435 | "colab": { 436 | "base_uri": "https://localhost:8080/", 437 | "height": 206 438 | }, 439 | "outputId": "38904185-6b96-49f0-a68c-05012f58de30" 440 | }, 441 | "outputs": [ 442 | { 443 | "output_type": "execute_result", 444 | "data": { 445 | "text/plain": [ 446 | " target ids user \\\n", 447 | "0 p 1978186076 ceruleanbreeze \n", 448 | "1 p 1994697891 enthusiasticjen \n", 449 | "2 p 2191885992 LifeRemixed \n", 450 | "3 p 1753662211 lovemandy \n", 451 | "4 p 2177442789 _LOVELYmanu \n", 452 | "\n", 453 | " text \\\n", 454 | "0 @nocturnalie Anyway, and now Abby and I share ... \n", 455 | "1 @JoeGigantino Few times I'm trying to leave co... \n", 456 | "2 @AngieGriffin Good Morning Angie I'll be in t... \n", 457 | "3 had a good day driving up mountains, visiting ... \n", 458 | "4 downloading some songs i love lady GaGa. \n", 459 | "\n", 460 | " ugc \n", 461 | "0 anyway and now abby and i share all our crops ... \n", 462 | "1 few times I am trying to leave comments in you... \n", 463 | "2 good morning angie I will be in the atl july 8... \n", 464 | "3 had a good day driving up mountains visiting k... \n", 465 | "4 downloading some songs i love lady gaga " 466 | ], 467 | "text/html": [ 468 | "\n", 469 | "
\n", 615 | " " 616 | ] 617 | }, 618 | "metadata": {}, 619 | "execution_count": 7 620 | } 621 | ], 622 | "source": [ 623 | "import re\n", 624 | "\n", 625 | "def preprocess_text(row):\n", 626 | "    text = re.sub(r'((www\\.[^\\s]+)|(https?://[^\\s]+))', '', row['text'])  # strip URLs\n", 627 | "    text = re.sub(r'@[^\\s]+', '', text)  # strip @mentions\n", 628 | "    text = text.lower().split()\n", 629 | "    reformed = [appos[word] if word in appos else word for word in text]  # expand contractions\n", 630 | "    reformed = \" \".join(reformed)\n", 631 | "    text = re.sub(r'&[^\\s]+;', '', reformed)  # strip HTML entities such as &amp;\n", 632 | "    text = re.sub(r'[^a-zA-Zа-яА-Я0-9]+', ' ', text)  # keep only Latin/Cyrillic letters and digits 0-9\n", 633 | "    text = re.sub(r' +', ' ', text)  # collapse repeated spaces\n", 634 | "    #text = re.sub(' [\\w] ', ' ', text)\n", 635 | "    return text.strip()\n", 636 | "\n", 637 | "preprocess = train  # alias, not a copy: 'ugc' is also added to train\n", 638 | "preprocess['ugc'] = preprocess.apply(preprocess_text, axis=1)\n", 639 | "\n", 640 | "preprocess.head()" 641 | ] 642 | }, 643 | { 644 | "cell_type": "markdown", 645 | "metadata": { 646 | "id": "CUsyZDEqK_4r" 647 | }, 648 | "source": [ 649 | "## 4. 
Sentiment Analysis using Deep Learning-based Method: RNN ##" 650 | ] 651 | }, 652 | { 653 | "cell_type": "code", 654 | "execution_count": null, 655 | "metadata": { 656 | "id": "ReMymperLMAh" 657 | }, 658 | "outputs": [], 659 | "source": [ 660 | "from sklearn.model_selection import train_test_split\n", 661 | "from sklearn.metrics import classification_report\n", 662 | "from keras.preprocessing.text import Tokenizer\n", 663 | "from keras.preprocessing import sequence\n", 664 | "\n", 665 | "max_features = 4000\n", 666 | "#num_words = 20\n", 667 | "embedding_size = 256\n", 668 | "lstm_dim = 256\n", 669 | "batch_size = 64\n", 670 | "num_epochs = 10\n", 671 | "\n", 672 | "# Create tokenizer by converting text into sequence of integers\n", 673 | "tokenizer = Tokenizer(num_words=max_features, split=' ')\n", 674 | "tokenizer.fit_on_texts(preprocess['ugc'].values)\n", 675 | "\n", 676 | "X = tokenizer.texts_to_sequences(preprocess['ugc'].values)\n", 677 | "X = sequence.pad_sequences(X)\n", 678 | "#X = sequence.pad_sequences(X, maxlen=num_words)\n", 679 | "y = pd.get_dummies(preprocess['target']).values\n", 680 | "\n", 681 | "# Adding 1 because of reserved 0 index\n", 682 | "vocab_size = len(tokenizer.word_index) + 1" 683 | ] 684 | }, 685 | { 686 | "cell_type": "code", 687 | "execution_count": null, 688 | "metadata": { 689 | "id": "fqZyya_Y6xhl", 690 | "colab": { 691 | "base_uri": "https://localhost:8080/" 692 | }, 693 | "outputId": "44c5cbd1-51bb-45c9-f107-45b513c341a1" 694 | }, 695 | "outputs": [ 696 | { 697 | "output_type": "stream", 698 | "name": "stderr", 699 | "text": [ 700 | "WARNING:tensorflow:Layer lstm will not use cuDNN kernels since it doesn't meet the criteria. It will use a generic GPU kernel as fallback when running on GPU.\n", 701 | "WARNING:tensorflow:Layer lstm_1 will not use cuDNN kernels since it doesn't meet the criteria. 
It will use a generic GPU kernel as fallback when running on GPU.\n" 702 | ] 703 | }, 704 | { 705 | "output_type": "stream", 706 | "name": "stdout", 707 | "text": [ 708 | "Model: \"sequential\"\n", 709 | "_________________________________________________________________\n", 710 | " Layer (type) Output Shape Param # \n", 711 | "=================================================================\n", 712 | " embedding (Embedding) (None, 38, 256) 1024000 \n", 713 | " \n", 714 | " spatial_dropout1d (SpatialD (None, 38, 256) 0 \n", 715 | " ropout1D) \n", 716 | " \n", 717 | " lstm (LSTM) (None, 38, 256) 525312 \n", 718 | " \n", 719 | " lstm_1 (LSTM) (None, 128) 197120 \n", 720 | " \n", 721 | " dense (Dense) (None, 2) 258 \n", 722 | " \n", 723 | "=================================================================\n", 724 | "Total params: 1,746,690\n", 725 | "Trainable params: 1,746,690\n", 726 | "Non-trainable params: 0\n", 727 | "_________________________________________________________________\n" 728 | ] 729 | } 730 | ], 731 | "source": [ 732 | "# Define model\n", 733 | "from keras import Sequential\n", 734 | "from keras.layers import Embedding, LSTM, Dropout, Dense, SpatialDropout1D\n", 735 | "\n", 736 | "model = Sequential()\n", 737 | "model.add(Embedding(input_dim=max_features,\n", 738 | " output_dim=embedding_size,\n", 739 | " input_length=X.shape[1]))\n", 740 | "model.add(SpatialDropout1D(0.4))\n", 741 | "model.add(LSTM(units=lstm_dim,\n", 742 | " dropout=0.2,\n", 743 | " activation='tanh',\n", 744 | " recurrent_dropout=0.2,\n", 745 | " recurrent_activation='sigmoid',\n", 746 | " return_sequences=True))\n", 747 | "model.add(LSTM(units=128,\n", 748 | " dropout=0.2,\n", 749 | " activation='tanh',\n", 750 | " recurrent_dropout=0.2,\n", 751 | " recurrent_activation='sigmoid'))\n", 752 | "model.add(Dense(2, activation='sigmoid')) \n", 753 | "\n", 754 | "model.summary()" 755 | ] 756 | }, 757 | { 758 | "cell_type": "code", 759 | "execution_count": null, 760 | "metadata": { 761 
| "id": "HRNFpdoUN3Cw" 762 | }, 763 | "outputs": [], 764 | "source": [ 765 | "# Compile model\n", 766 | "model.compile(loss='binary_crossentropy',\n", 767 | " optimizer='adam',\n", 768 | " metrics=['accuracy'])" 769 | ] 770 | }, 771 | { 772 | "cell_type": "code", 773 | "execution_count": null, 774 | "metadata": { 775 | "id": "ogRmCK1UZhCC" 776 | }, 777 | "outputs": [], 778 | "source": [ 779 | "Xtrain, Xtest, ytrain, ytest = train_test_split(X, y, test_size=0.2, shuffle=True)" 780 | ] 781 | }, 782 | { 783 | "cell_type": "code", 784 | "execution_count": null, 785 | "metadata": { 786 | "colab": { 787 | "base_uri": "https://localhost:8080/" 788 | }, 789 | "id": "AJ083iTdOKNo", 790 | "outputId": "eaf7b1cc-040a-4e3d-84eb-f8ad1e77ddf6" 791 | }, 792 | "outputs": [ 793 | { 794 | "output_type": "stream", 795 | "name": "stdout", 796 | "text": [ 797 | "Epoch 1/10\n", 798 | "2000/2000 [==============================] - 699s 346ms/step - loss: 0.4805 - accuracy: 0.7679 - val_loss: 0.4439 - val_accuracy: 0.7926\n", 799 | "Epoch 2/10\n", 800 | "2000/2000 [==============================] - 702s 351ms/step - loss: 0.4287 - accuracy: 0.8005 - val_loss: 0.4355 - val_accuracy: 0.7964\n", 801 | "Epoch 3/10\n", 802 | "2000/2000 [==============================] - 690s 345ms/step - loss: 0.4076 - accuracy: 0.8133 - val_loss: 0.4422 - val_accuracy: 0.7967\n", 803 | "Epoch 4/10\n", 804 | "2000/2000 [==============================] - 684s 342ms/step - loss: 0.3895 - accuracy: 0.8233 - val_loss: 0.4425 - val_accuracy: 0.7979\n", 805 | "Epoch 5/10\n", 806 | "2000/2000 [==============================] - 677s 339ms/step - loss: 0.3737 - accuracy: 0.8324 - val_loss: 0.4458 - val_accuracy: 0.7996\n", 807 | "Epoch 6/10\n", 808 | "2000/2000 [==============================] - 674s 337ms/step - loss: 0.3560 - accuracy: 0.8405 - val_loss: 0.4561 - val_accuracy: 0.7929\n", 809 | "Epoch 7/10\n", 810 | "2000/2000 [==============================] - 674s 337ms/step - loss: 0.3394 - accuracy: 0.8485 -
val_loss: 0.4660 - val_accuracy: 0.7936\n", 811 | "Epoch 8/10\n", 812 | "2000/2000 [==============================] - 676s 338ms/step - loss: 0.3208 - accuracy: 0.8588 - val_loss: 0.4931 - val_accuracy: 0.7899\n", 813 | "Epoch 9/10\n", 814 | "2000/2000 [==============================] - 678s 339ms/step - loss: 0.3037 - accuracy: 0.8671 - val_loss: 0.5225 - val_accuracy: 0.7865\n", 815 | "Epoch 10/10\n", 816 | "2000/2000 [==============================] - 674s 337ms/step - loss: 0.2876 - accuracy: 0.8745 - val_loss: 0.5419 - val_accuracy: 0.7852\n" 817 | ] 818 | } 819 | ], 820 | "source": [ 821 | "# Fit model\n", 822 | "history = model.fit(Xtrain, ytrain, batch_size=batch_size, epochs=num_epochs, validation_data=(Xtest, ytest))" 823 | ] 824 | }, 825 | { 826 | "cell_type": "code", 827 | "execution_count": null, 828 | "metadata": { 829 | "id": "Dft2-xIgnjCI", 830 | "colab": { 831 | "base_uri": "https://localhost:8080/" 832 | }, 833 | "outputId": "d0b90103-126f-473b-af21-0a3bfc00d373" 834 | }, 835 | "outputs": [ 836 | { 837 | "output_type": "stream", 838 | "name": "stdout", 839 | "text": [ 840 | "Accuracy: 78.52%\n" 841 | ] 842 | } 843 | ], 844 | "source": [ 845 | "# Final evaluation of the model\n", 846 | "scores = model.evaluate(Xtest, ytest, verbose=0)\n", 847 | "print(\"Accuracy: %.2f%%\" % (scores[1]*100))" 848 | ] 849 | } 850 | ], 851 | "metadata": { 852 | "accelerator": "GPU", 853 | "colab": { 854 | "collapsed_sections": [], 855 | "provenance": [] 856 | }, 857 | "kernelspec": { 858 | "display_name": "Python 3", 859 | "language": "python", 860 | "name": "python3" 861 | }, 862 | "language_info": { 863 | "codemirror_mode": { 864 | "name": "ipython", 865 | "version": 3 866 | }, 867 | "file_extension": ".py", 868 | "mimetype": "text/x-python", 869 | "name": "python", 870 | "nbconvert_exporter": "python", 871 | "pygments_lexer": "ipython3", 872 | "version": "3.7.3" 873 | }, 874 | "gpuClass": "standard" 875 | }, 876 | "nbformat": 4, 877 | "nbformat_minor": 0 878 | } 
-------------------------------------------------------------------------------- /data/opinion-lexicon/negative-words.txt: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/KwokHing/SentimentAnalysis-Python-Demo/1c661f2105c1d3997b38a612bda9d03bc9f36cbb/data/opinion-lexicon/negative-words.txt -------------------------------------------------------------------------------- /data/opinion-lexicon/positive-words.txt: -------------------------------------------------------------------------------- 1 | ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; 2 | ; 3 | ; Opinion Lexicon: Positive 4 | ; 5 | ; This file contains a list of POSITIVE opinion words (or sentiment words). 6 | ; 7 | ; This file and the papers can all be downloaded from 8 | ; http://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html 9 | ; 10 | ; If you use this list, please cite the following paper: 11 | ; 12 | ; Minqing Hu and Bing Liu. "Mining and Summarizing Customer Reviews." 13 | ; Proceedings of the ACM SIGKDD International Conference on Knowledge 14 | ; Discovery and Data Mining (KDD-2004), Aug 22-25, 2004, Seattle, 15 | ; Washington, USA, 16 | ; Notes: 17 | ; 1. The appearance of an opinion word in a sentence does not necessarily 18 | ; mean that the sentence expresses a positive or negative opinion. 19 | ; See the paper below: 20 | ; 21 | ; Bing Liu. "Sentiment Analysis and Subjectivity." An chapter in 22 | ; Handbook of Natural Language Processing, Second Edition, 23 | ; (editors: N. Indurkhya and F. J. Damerau), 2010. 24 | ; 25 | ; 2. You will notice many misspelled words in the list. They are not 26 | ; mistakes. They are included as these misspelled words appear 27 | ; frequently in social media content. 
28 | ; 29 | ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; 30 | 31 | a+ 32 | abound 33 | abounds 34 | abundance 35 | abundant 36 | accessable 37 | accessible 38 | acclaim 39 | acclaimed 40 | acclamation 41 | accolade 42 | accolades 43 | accommodative 44 | accomodative 45 | accomplish 46 | accomplished 47 | accomplishment 48 | accomplishments 49 | accurate 50 | accurately 51 | achievable 52 | achievement 53 | achievements 54 | achievible 55 | acumen 56 | adaptable 57 | adaptive 58 | adequate 59 | adjustable 60 | admirable 61 | admirably 62 | admiration 63 | admire 64 | admirer 65 | admiring 66 | admiringly 67 | adorable 68 | adore 69 | adored 70 | adorer 71 | adoring 72 | adoringly 73 | adroit 74 | adroitly 75 | adulate 76 | adulation 77 | adulatory 78 | advanced 79 | advantage 80 | advantageous 81 | advantageously 82 | advantages 83 | adventuresome 84 | adventurous 85 | advocate 86 | advocated 87 | advocates 88 | affability 89 | affable 90 | affably 91 | affectation 92 | affection 93 | affectionate 94 | affinity 95 | affirm 96 | affirmation 97 | affirmative 98 | affluence 99 | affluent 100 | afford 101 | affordable 102 | affordably 103 | afordable 104 | agile 105 | agilely 106 | agility 107 | agreeable 108 | agreeableness 109 | agreeably 110 | all-around 111 | alluring 112 | alluringly 113 | altruistic 114 | altruistically 115 | amaze 116 | amazed 117 | amazement 118 | amazes 119 | amazing 120 | amazingly 121 | ambitious 122 | ambitiously 123 | ameliorate 124 | amenable 125 | amenity 126 | amiability 127 | amiabily 128 | amiable 129 | amicability 130 | amicable 131 | amicably 132 | amity 133 | ample 134 | amply 135 | amuse 136 | amusing 137 | amusingly 138 | angel 139 | angelic 140 | apotheosis 141 | appeal 142 | appealing 143 | applaud 144 | appreciable 145 | appreciate 146 | appreciated 147 | appreciates 148 | appreciative 149 | appreciatively 150 | appropriate 151 | approval 152 | approve 153 | ardent 154 | ardently 155 | ardor 156 
| articulate 157 | aspiration 158 | aspirations 159 | aspire 160 | assurance 161 | assurances 162 | assure 163 | assuredly 164 | assuring 165 | astonish 166 | astonished 167 | astonishing 168 | astonishingly 169 | astonishment 170 | astound 171 | astounded 172 | astounding 173 | astoundingly 174 | astutely 175 | attentive 176 | attraction 177 | attractive 178 | attractively 179 | attune 180 | audible 181 | audibly 182 | auspicious 183 | authentic 184 | authoritative 185 | autonomous 186 | available 187 | aver 188 | avid 189 | avidly 190 | award 191 | awarded 192 | awards 193 | awe 194 | awed 195 | awesome 196 | awesomely 197 | awesomeness 198 | awestruck 199 | awsome 200 | backbone 201 | balanced 202 | bargain 203 | beauteous 204 | beautiful 205 | beautifullly 206 | beautifully 207 | beautify 208 | beauty 209 | beckon 210 | beckoned 211 | beckoning 212 | beckons 213 | believable 214 | believeable 215 | beloved 216 | benefactor 217 | beneficent 218 | beneficial 219 | beneficially 220 | beneficiary 221 | benefit 222 | benefits 223 | benevolence 224 | benevolent 225 | benifits 226 | best 227 | best-known 228 | best-performing 229 | best-selling 230 | better 231 | better-known 232 | better-than-expected 233 | beutifully 234 | blameless 235 | bless 236 | blessing 237 | bliss 238 | blissful 239 | blissfully 240 | blithe 241 | blockbuster 242 | bloom 243 | blossom 244 | bolster 245 | bonny 246 | bonus 247 | bonuses 248 | boom 249 | booming 250 | boost 251 | boundless 252 | bountiful 253 | brainiest 254 | brainy 255 | brand-new 256 | brave 257 | bravery 258 | bravo 259 | breakthrough 260 | breakthroughs 261 | breathlessness 262 | breathtaking 263 | breathtakingly 264 | breeze 265 | bright 266 | brighten 267 | brighter 268 | brightest 269 | brilliance 270 | brilliances 271 | brilliant 272 | brilliantly 273 | brisk 274 | brotherly 275 | bullish 276 | buoyant 277 | cajole 278 | calm 279 | calming 280 | calmness 281 | capability 282 | capable 283 | capably 284 | captivate 285 
| captivating 286 | carefree 287 | cashback 288 | cashbacks 289 | catchy 290 | celebrate 291 | celebrated 292 | celebration 293 | celebratory 294 | champ 295 | champion 296 | charisma 297 | charismatic 298 | charitable 299 | charm 300 | charming 301 | charmingly 302 | chaste 303 | cheaper 304 | cheapest 305 | cheer 306 | cheerful 307 | cheery 308 | cherish 309 | cherished 310 | cherub 311 | chic 312 | chivalrous 313 | chivalry 314 | civility 315 | civilize 316 | clarity 317 | classic 318 | classy 319 | clean 320 | cleaner 321 | cleanest 322 | cleanliness 323 | cleanly 324 | clear 325 | clear-cut 326 | cleared 327 | clearer 328 | clearly 329 | clears 330 | clever 331 | cleverly 332 | cohere 333 | coherence 334 | coherent 335 | cohesive 336 | colorful 337 | comely 338 | comfort 339 | comfortable 340 | comfortably 341 | comforting 342 | comfy 343 | commend 344 | commendable 345 | commendably 346 | commitment 347 | commodious 348 | compact 349 | compactly 350 | compassion 351 | compassionate 352 | compatible 353 | competitive 354 | complement 355 | complementary 356 | complemented 357 | complements 358 | compliant 359 | compliment 360 | complimentary 361 | comprehensive 362 | conciliate 363 | conciliatory 364 | concise 365 | confidence 366 | confident 367 | congenial 368 | congratulate 369 | congratulation 370 | congratulations 371 | congratulatory 372 | conscientious 373 | considerate 374 | consistent 375 | consistently 376 | constructive 377 | consummate 378 | contentment 379 | continuity 380 | contrasty 381 | contribution 382 | convenience 383 | convenient 384 | conveniently 385 | convience 386 | convienient 387 | convient 388 | convincing 389 | convincingly 390 | cool 391 | coolest 392 | cooperative 393 | cooperatively 394 | cornerstone 395 | correct 396 | correctly 397 | cost-effective 398 | cost-saving 399 | counter-attack 400 | counter-attacks 401 | courage 402 | courageous 403 | courageously 404 | courageousness 405 | courteous 406 | courtly 407 | covenant 408 
| cozy 409 | creative 410 | credence 411 | credible 412 | crisp 413 | crisper 414 | cure 415 | cure-all 416 | cushy 417 | cute 418 | cuteness 419 | danke 420 | danken 421 | daring 422 | daringly 423 | darling 424 | dashing 425 | dauntless 426 | dawn 427 | dazzle 428 | dazzled 429 | dazzling 430 | dead-cheap 431 | dead-on 432 | decency 433 | decent 434 | decisive 435 | decisiveness 436 | dedicated 437 | defeat 438 | defeated 439 | defeating 440 | defeats 441 | defender 442 | deference 443 | deft 444 | deginified 445 | delectable 446 | delicacy 447 | delicate 448 | delicious 449 | delight 450 | delighted 451 | delightful 452 | delightfully 453 | delightfulness 454 | dependable 455 | dependably 456 | deservedly 457 | deserving 458 | desirable 459 | desiring 460 | desirous 461 | destiny 462 | detachable 463 | devout 464 | dexterous 465 | dexterously 466 | dextrous 467 | dignified 468 | dignify 469 | dignity 470 | diligence 471 | diligent 472 | diligently 473 | diplomatic 474 | dirt-cheap 475 | distinction 476 | distinctive 477 | distinguished 478 | diversified 479 | divine 480 | divinely 481 | dominate 482 | dominated 483 | dominates 484 | dote 485 | dotingly 486 | doubtless 487 | dreamland 488 | dumbfounded 489 | dumbfounding 490 | dummy-proof 491 | durable 492 | dynamic 493 | eager 494 | eagerly 495 | eagerness 496 | earnest 497 | earnestly 498 | earnestness 499 | ease 500 | eased 501 | eases 502 | easier 503 | easiest 504 | easiness 505 | easing 506 | easy 507 | easy-to-use 508 | easygoing 509 | ebullience 510 | ebullient 511 | ebulliently 512 | ecenomical 513 | economical 514 | ecstasies 515 | ecstasy 516 | ecstatic 517 | ecstatically 518 | edify 519 | educated 520 | effective 521 | effectively 522 | effectiveness 523 | effectual 524 | efficacious 525 | efficient 526 | efficiently 527 | effortless 528 | effortlessly 529 | effusion 530 | effusive 531 | effusively 532 | effusiveness 533 | elan 534 | elate 535 | elated 536 | elatedly 537 | elation 538 | electrify 539 
| elegance 540 | elegant 541 | elegantly 542 | elevate 543 | elite 544 | eloquence 545 | eloquent 546 | eloquently 547 | embolden 548 | eminence 549 | eminent 550 | empathize 551 | empathy 552 | empower 553 | empowerment 554 | enchant 555 | enchanted 556 | enchanting 557 | enchantingly 558 | encourage 559 | encouragement 560 | encouraging 561 | encouragingly 562 | endear 563 | endearing 564 | endorse 565 | endorsed 566 | endorsement 567 | endorses 568 | endorsing 569 | energetic 570 | energize 571 | energy-efficient 572 | energy-saving 573 | engaging 574 | engrossing 575 | enhance 576 | enhanced 577 | enhancement 578 | enhances 579 | enjoy 580 | enjoyable 581 | enjoyably 582 | enjoyed 583 | enjoying 584 | enjoyment 585 | enjoys 586 | enlighten 587 | enlightenment 588 | enliven 589 | ennoble 590 | enough 591 | enrapt 592 | enrapture 593 | enraptured 594 | enrich 595 | enrichment 596 | enterprising 597 | entertain 598 | entertaining 599 | entertains 600 | enthral 601 | enthrall 602 | enthralled 603 | enthuse 604 | enthusiasm 605 | enthusiast 606 | enthusiastic 607 | enthusiastically 608 | entice 609 | enticed 610 | enticing 611 | enticingly 612 | entranced 613 | entrancing 614 | entrust 615 | enviable 616 | enviably 617 | envious 618 | enviously 619 | enviousness 620 | envy 621 | equitable 622 | ergonomical 623 | err-free 624 | erudite 625 | ethical 626 | eulogize 627 | euphoria 628 | euphoric 629 | euphorically 630 | evaluative 631 | evenly 632 | eventful 633 | everlasting 634 | evocative 635 | exalt 636 | exaltation 637 | exalted 638 | exaltedly 639 | exalting 640 | exaltingly 641 | examplar 642 | examplary 643 | excallent 644 | exceed 645 | exceeded 646 | exceeding 647 | exceedingly 648 | exceeds 649 | excel 650 | exceled 651 | excelent 652 | excellant 653 | excelled 654 | excellence 655 | excellency 656 | excellent 657 | excellently 658 | excels 659 | exceptional 660 | exceptionally 661 | excite 662 | excited 663 | excitedly 664 | excitedness 665 | excitement 666 
| excites 667 | exciting 668 | excitingly 669 | exellent 670 | exemplar 671 | exemplary 672 | exhilarate 673 | exhilarating 674 | exhilaratingly 675 | exhilaration 676 | exonerate 677 | expansive 678 | expeditiously 679 | expertly 680 | exquisite 681 | exquisitely 682 | extol 683 | extoll 684 | extraordinarily 685 | extraordinary 686 | exuberance 687 | exuberant 688 | exuberantly 689 | exult 690 | exultant 691 | exultation 692 | exultingly 693 | eye-catch 694 | eye-catching 695 | eyecatch 696 | eyecatching 697 | fabulous 698 | fabulously 699 | facilitate 700 | fair 701 | fairly 702 | fairness 703 | faith 704 | faithful 705 | faithfully 706 | faithfulness 707 | fame 708 | famed 709 | famous 710 | famously 711 | fancier 712 | fancinating 713 | fancy 714 | fanfare 715 | fans 716 | fantastic 717 | fantastically 718 | fascinate 719 | fascinating 720 | fascinatingly 721 | fascination 722 | fashionable 723 | fashionably 724 | fast 725 | fast-growing 726 | fast-paced 727 | faster 728 | fastest 729 | fastest-growing 730 | faultless 731 | fav 732 | fave 733 | favor 734 | favorable 735 | favored 736 | favorite 737 | favorited 738 | favour 739 | fearless 740 | fearlessly 741 | feasible 742 | feasibly 743 | feat 744 | feature-rich 745 | fecilitous 746 | feisty 747 | felicitate 748 | felicitous 749 | felicity 750 | fertile 751 | fervent 752 | fervently 753 | fervid 754 | fervidly 755 | fervor 756 | festive 757 | fidelity 758 | fiery 759 | fine 760 | fine-looking 761 | finely 762 | finer 763 | finest 764 | firmer 765 | first-class 766 | first-in-class 767 | first-rate 768 | flashy 769 | flatter 770 | flattering 771 | flatteringly 772 | flawless 773 | flawlessly 774 | flexibility 775 | flexible 776 | flourish 777 | flourishing 778 | fluent 779 | flutter 780 | fond 781 | fondly 782 | fondness 783 | foolproof 784 | foremost 785 | foresight 786 | formidable 787 | fortitude 788 | fortuitous 789 | fortuitously 790 | fortunate 791 | fortunately 792 | fortune 793 | fragrant 794 | free 
795 | freed 796 | freedom 797 | freedoms 798 | fresh 799 | fresher 800 | freshest 801 | friendliness 802 | friendly 803 | frolic 804 | frugal 805 | fruitful 806 | ftw 807 | fulfillment 808 | fun 809 | futurestic 810 | futuristic 811 | gaiety 812 | gaily 813 | gain 814 | gained 815 | gainful 816 | gainfully 817 | gaining 818 | gains 819 | gallant 820 | gallantly 821 | galore 822 | geekier 823 | geeky 824 | gem 825 | gems 826 | generosity 827 | generous 828 | generously 829 | genial 830 | genius 831 | gentle 832 | gentlest 833 | genuine 834 | gifted 835 | glad 836 | gladden 837 | gladly 838 | gladness 839 | glamorous 840 | glee 841 | gleeful 842 | gleefully 843 | glimmer 844 | glimmering 845 | glisten 846 | glistening 847 | glitter 848 | glitz 849 | glorify 850 | glorious 851 | gloriously 852 | glory 853 | glow 854 | glowing 855 | glowingly 856 | god-given 857 | god-send 858 | godlike 859 | godsend 860 | gold 861 | golden 862 | good 863 | goodly 864 | goodness 865 | goodwill 866 | goood 867 | gooood 868 | gorgeous 869 | gorgeously 870 | grace 871 | graceful 872 | gracefully 873 | gracious 874 | graciously 875 | graciousness 876 | grand 877 | grandeur 878 | grateful 879 | gratefully 880 | gratification 881 | gratified 882 | gratifies 883 | gratify 884 | gratifying 885 | gratifyingly 886 | gratitude 887 | great 888 | greatest 889 | greatness 890 | grin 891 | groundbreaking 892 | guarantee 893 | guidance 894 | guiltless 895 | gumption 896 | gush 897 | gusto 898 | gutsy 899 | hail 900 | halcyon 901 | hale 902 | hallmark 903 | hallmarks 904 | hallowed 905 | handier 906 | handily 907 | hands-down 908 | handsome 909 | handsomely 910 | handy 911 | happier 912 | happily 913 | happiness 914 | happy 915 | hard-working 916 | hardier 917 | hardy 918 | harmless 919 | harmonious 920 | harmoniously 921 | harmonize 922 | harmony 923 | headway 924 | heal 925 | healthful 926 | healthy 927 | hearten 928 | heartening 929 | heartfelt 930 | heartily 931 | heartwarming 932 | heaven 933 | 
heavenly 934 | helped 935 | helpful 936 | helping 937 | hero 938 | heroic 939 | heroically 940 | heroine 941 | heroize 942 | heros 943 | high-quality 944 | high-spirited 945 | hilarious 946 | holy 947 | homage 948 | honest 949 | honesty 950 | honor 951 | honorable 952 | honored 953 | honoring 954 | hooray 955 | hopeful 956 | hospitable 957 | hot 958 | hotcake 959 | hotcakes 960 | hottest 961 | hug 962 | humane 963 | humble 964 | humility 965 | humor 966 | humorous 967 | humorously 968 | humour 969 | humourous 970 | ideal 971 | idealize 972 | ideally 973 | idol 974 | idolize 975 | idolized 976 | idyllic 977 | illuminate 978 | illuminati 979 | illuminating 980 | illumine 981 | illustrious 982 | ilu 983 | imaculate 984 | imaginative 985 | immaculate 986 | immaculately 987 | immense 988 | impartial 989 | impartiality 990 | impartially 991 | impassioned 992 | impeccable 993 | impeccably 994 | important 995 | impress 996 | impressed 997 | impresses 998 | impressive 999 | impressively 1000 | impressiveness 1001 | improve 1002 | improved 1003 | improvement 1004 | improvements 1005 | improves 1006 | improving 1007 | incredible 1008 | incredibly 1009 | indebted 1010 | individualized 1011 | indulgence 1012 | indulgent 1013 | industrious 1014 | inestimable 1015 | inestimably 1016 | inexpensive 1017 | infallibility 1018 | infallible 1019 | infallibly 1020 | influential 1021 | ingenious 1022 | ingeniously 1023 | ingenuity 1024 | ingenuous 1025 | ingenuously 1026 | innocuous 1027 | innovation 1028 | innovative 1029 | inpressed 1030 | insightful 1031 | insightfully 1032 | inspiration 1033 | inspirational 1034 | inspire 1035 | inspiring 1036 | instantly 1037 | instructive 1038 | instrumental 1039 | integral 1040 | integrated 1041 | intelligence 1042 | intelligent 1043 | intelligible 1044 | interesting 1045 | interests 1046 | intimacy 1047 | intimate 1048 | intricate 1049 | intrigue 1050 | intriguing 1051 | intriguingly 1052 | intuitive 1053 | invaluable 1054 | invaluablely 1055 | 
inventive 1056 | invigorate 1057 | invigorating 1058 | invincibility 1059 | invincible 1060 | inviolable 1061 | inviolate 1062 | invulnerable 1063 | irreplaceable 1064 | irreproachable 1065 | irresistible 1066 | irresistibly 1067 | issue-free 1068 | jaw-droping 1069 | jaw-dropping 1070 | jollify 1071 | jolly 1072 | jovial 1073 | joy 1074 | joyful 1075 | joyfully 1076 | joyous 1077 | joyously 1078 | jubilant 1079 | jubilantly 1080 | jubilate 1081 | jubilation 1082 | jubiliant 1083 | judicious 1084 | justly 1085 | keen 1086 | keenly 1087 | keenness 1088 | kid-friendly 1089 | kindliness 1090 | kindly 1091 | kindness 1092 | knowledgeable 1093 | kudos 1094 | large-capacity 1095 | laud 1096 | laudable 1097 | laudably 1098 | lavish 1099 | lavishly 1100 | law-abiding 1101 | lawful 1102 | lawfully 1103 | lead 1104 | leading 1105 | leads 1106 | lean 1107 | led 1108 | legendary 1109 | leverage 1110 | levity 1111 | liberate 1112 | liberation 1113 | liberty 1114 | lifesaver 1115 | light-hearted 1116 | lighter 1117 | likable 1118 | like 1119 | liked 1120 | likes 1121 | liking 1122 | lionhearted 1123 | lively 1124 | logical 1125 | long-lasting 1126 | lovable 1127 | lovably 1128 | love 1129 | loved 1130 | loveliness 1131 | lovely 1132 | lover 1133 | loves 1134 | loving 1135 | low-cost 1136 | low-price 1137 | low-priced 1138 | low-risk 1139 | lower-priced 1140 | loyal 1141 | loyalty 1142 | lucid 1143 | lucidly 1144 | luck 1145 | luckier 1146 | luckiest 1147 | luckiness 1148 | lucky 1149 | lucrative 1150 | luminous 1151 | lush 1152 | luster 1153 | lustrous 1154 | luxuriant 1155 | luxuriate 1156 | luxurious 1157 | luxuriously 1158 | luxury 1159 | lyrical 1160 | magic 1161 | magical 1162 | magnanimous 1163 | magnanimously 1164 | magnificence 1165 | magnificent 1166 | magnificently 1167 | majestic 1168 | majesty 1169 | manageable 1170 | maneuverable 1171 | marvel 1172 | marveled 1173 | marvelled 1174 | marvellous 1175 | marvelous 1176 | marvelously 1177 | marvelousness 1178 | marvels 
1179 | master 1180 | masterful 1181 | masterfully 1182 | masterpiece 1183 | masterpieces 1184 | masters 1185 | mastery 1186 | matchless 1187 | mature 1188 | maturely 1189 | maturity 1190 | meaningful 1191 | memorable 1192 | merciful 1193 | mercifully 1194 | mercy 1195 | merit 1196 | meritorious 1197 | merrily 1198 | merriment 1199 | merriness 1200 | merry 1201 | mesmerize 1202 | mesmerized 1203 | mesmerizes 1204 | mesmerizing 1205 | mesmerizingly 1206 | meticulous 1207 | meticulously 1208 | mightily 1209 | mighty 1210 | mind-blowing 1211 | miracle 1212 | miracles 1213 | miraculous 1214 | miraculously 1215 | miraculousness 1216 | modern 1217 | modest 1218 | modesty 1219 | momentous 1220 | monumental 1221 | monumentally 1222 | morality 1223 | motivated 1224 | multi-purpose 1225 | navigable 1226 | neat 1227 | neatest 1228 | neatly 1229 | nice 1230 | nicely 1231 | nicer 1232 | nicest 1233 | nifty 1234 | nimble 1235 | noble 1236 | nobly 1237 | noiseless 1238 | non-violence 1239 | non-violent 1240 | notably 1241 | noteworthy 1242 | nourish 1243 | nourishing 1244 | nourishment 1245 | novelty 1246 | nurturing 1247 | oasis 1248 | obsession 1249 | obsessions 1250 | obtainable 1251 | openly 1252 | openness 1253 | optimal 1254 | optimism 1255 | optimistic 1256 | opulent 1257 | orderly 1258 | originality 1259 | outdo 1260 | outdone 1261 | outperform 1262 | outperformed 1263 | outperforming 1264 | outperforms 1265 | outshine 1266 | outshone 1267 | outsmart 1268 | outstanding 1269 | outstandingly 1270 | outstrip 1271 | outwit 1272 | ovation 1273 | overjoyed 1274 | overtake 1275 | overtaken 1276 | overtakes 1277 | overtaking 1278 | overtook 1279 | overture 1280 | pain-free 1281 | painless 1282 | painlessly 1283 | palatial 1284 | pamper 1285 | pampered 1286 | pamperedly 1287 | pamperedness 1288 | pampers 1289 | panoramic 1290 | paradise 1291 | paramount 1292 | pardon 1293 | passion 1294 | passionate 1295 | passionately 1296 | patience 1297 | patient 1298 | patiently 1299 | patriot 
1300 | patriotic 1301 | peace 1302 | peaceable 1303 | peaceful 1304 | peacefully 1305 | peacekeepers 1306 | peach 1307 | peerless 1308 | pep 1309 | pepped 1310 | pepping 1311 | peppy 1312 | peps 1313 | perfect 1314 | perfection 1315 | perfectly 1316 | permissible 1317 | perseverance 1318 | persevere 1319 | personages 1320 | personalized 1321 | phenomenal 1322 | phenomenally 1323 | picturesque 1324 | piety 1325 | pinnacle 1326 | playful 1327 | playfully 1328 | pleasant 1329 | pleasantly 1330 | pleased 1331 | pleases 1332 | pleasing 1333 | pleasingly 1334 | pleasurable 1335 | pleasurably 1336 | pleasure 1337 | plentiful 1338 | pluses 1339 | plush 1340 | plusses 1341 | poetic 1342 | poeticize 1343 | poignant 1344 | poise 1345 | poised 1346 | polished 1347 | polite 1348 | politeness 1349 | popular 1350 | portable 1351 | posh 1352 | positive 1353 | positively 1354 | positives 1355 | powerful 1356 | powerfully 1357 | praise 1358 | praiseworthy 1359 | praising 1360 | pre-eminent 1361 | precious 1362 | precise 1363 | precisely 1364 | preeminent 1365 | prefer 1366 | preferable 1367 | preferably 1368 | prefered 1369 | preferes 1370 | preferring 1371 | prefers 1372 | premier 1373 | prestige 1374 | prestigious 1375 | prettily 1376 | pretty 1377 | priceless 1378 | pride 1379 | principled 1380 | privilege 1381 | privileged 1382 | prize 1383 | proactive 1384 | problem-free 1385 | problem-solver 1386 | prodigious 1387 | prodigiously 1388 | prodigy 1389 | productive 1390 | productively 1391 | proficient 1392 | proficiently 1393 | profound 1394 | profoundly 1395 | profuse 1396 | profusion 1397 | progress 1398 | progressive 1399 | prolific 1400 | prominence 1401 | prominent 1402 | promise 1403 | promised 1404 | promises 1405 | promising 1406 | promoter 1407 | prompt 1408 | promptly 1409 | proper 1410 | properly 1411 | propitious 1412 | propitiously 1413 | pros 1414 | prosper 1415 | prosperity 1416 | prosperous 1417 | prospros 1418 | protect 1419 | protection 1420 | protective 1421 | 
proud 1422 | proven 1423 | proves 1424 | providence 1425 | proving 1426 | prowess 1427 | prudence 1428 | prudent 1429 | prudently 1430 | punctual 1431 | pure 1432 | purify 1433 | purposeful 1434 | quaint 1435 | qualified 1436 | qualify 1437 | quicker 1438 | quiet 1439 | quieter 1440 | radiance 1441 | radiant 1442 | rapid 1443 | rapport 1444 | rapt 1445 | rapture 1446 | raptureous 1447 | raptureously 1448 | rapturous 1449 | rapturously 1450 | rational 1451 | razor-sharp 1452 | reachable 1453 | readable 1454 | readily 1455 | ready 1456 | reaffirm 1457 | reaffirmation 1458 | realistic 1459 | realizable 1460 | reasonable 1461 | reasonably 1462 | reasoned 1463 | reassurance 1464 | reassure 1465 | receptive 1466 | reclaim 1467 | recomend 1468 | recommend 1469 | recommendation 1470 | recommendations 1471 | recommended 1472 | reconcile 1473 | reconciliation 1474 | record-setting 1475 | recover 1476 | recovery 1477 | rectification 1478 | rectify 1479 | rectifying 1480 | redeem 1481 | redeeming 1482 | redemption 1483 | refine 1484 | refined 1485 | refinement 1486 | reform 1487 | reformed 1488 | reforming 1489 | reforms 1490 | refresh 1491 | refreshed 1492 | refreshing 1493 | refund 1494 | refunded 1495 | regal 1496 | regally 1497 | regard 1498 | rejoice 1499 | rejoicing 1500 | rejoicingly 1501 | rejuvenate 1502 | rejuvenated 1503 | rejuvenating 1504 | relaxed 1505 | relent 1506 | reliable 1507 | reliably 1508 | relief 1509 | relish 1510 | remarkable 1511 | remarkably 1512 | remedy 1513 | remission 1514 | remunerate 1515 | renaissance 1516 | renewed 1517 | renown 1518 | renowned 1519 | replaceable 1520 | reputable 1521 | reputation 1522 | resilient 1523 | resolute 1524 | resound 1525 | resounding 1526 | resourceful 1527 | resourcefulness 1528 | respect 1529 | respectable 1530 | respectful 1531 | respectfully 1532 | respite 1533 | resplendent 1534 | responsibly 1535 | responsive 1536 | restful 1537 | restored 1538 | restructure 1539 | restructured 1540 | restructuring 1541 | 
retractable 1542 | revel 1543 | revelation 1544 | revere 1545 | reverence 1546 | reverent 1547 | reverently 1548 | revitalize 1549 | revival 1550 | revive 1551 | revives 1552 | revolutionary 1553 | revolutionize 1554 | revolutionized 1555 | revolutionizes 1556 | reward 1557 | rewarding 1558 | rewardingly 1559 | rich 1560 | richer 1561 | richly 1562 | richness 1563 | right 1564 | righten 1565 | righteous 1566 | righteously 1567 | righteousness 1568 | rightful 1569 | rightfully 1570 | rightly 1571 | rightness 1572 | risk-free 1573 | robust 1574 | rock-star 1575 | rock-stars 1576 | rockstar 1577 | rockstars 1578 | romantic 1579 | romantically 1580 | romanticize 1581 | roomier 1582 | roomy 1583 | rosy 1584 | safe 1585 | safely 1586 | sagacity 1587 | sagely 1588 | saint 1589 | saintliness 1590 | saintly 1591 | salutary 1592 | salute 1593 | sane 1594 | satisfactorily 1595 | satisfactory 1596 | satisfied 1597 | satisfies 1598 | satisfy 1599 | satisfying 1600 | satisified 1601 | saver 1602 | savings 1603 | savior 1604 | savvy 1605 | scenic 1606 | seamless 1607 | seasoned 1608 | secure 1609 | securely 1610 | selective 1611 | self-determination 1612 | self-respect 1613 | self-satisfaction 1614 | self-sufficiency 1615 | self-sufficient 1616 | sensation 1617 | sensational 1618 | sensationally 1619 | sensations 1620 | sensible 1621 | sensibly 1622 | sensitive 1623 | serene 1624 | serenity 1625 | sexy 1626 | sharp 1627 | sharper 1628 | sharpest 1629 | shimmering 1630 | shimmeringly 1631 | shine 1632 | shiny 1633 | significant 1634 | silent 1635 | simpler 1636 | simplest 1637 | simplified 1638 | simplifies 1639 | simplify 1640 | simplifying 1641 | sincere 1642 | sincerely 1643 | sincerity 1644 | skill 1645 | skilled 1646 | skillful 1647 | skillfully 1648 | slammin 1649 | sleek 1650 | slick 1651 | smart 1652 | smarter 1653 | smartest 1654 | smartly 1655 | smile 1656 | smiles 1657 | smiling 1658 | smilingly 1659 | smitten 1660 | smooth 1661 | smoother 1662 | smoothes 1663 | 
smoothest 1664 | smoothly 1665 | snappy 1666 | snazzy 1667 | sociable 1668 | soft 1669 | softer 1670 | solace 1671 | solicitous 1672 | solicitously 1673 | solid 1674 | solidarity 1675 | soothe 1676 | soothingly 1677 | sophisticated 1678 | soulful 1679 | soundly 1680 | soundness 1681 | spacious 1682 | sparkle 1683 | sparkling 1684 | spectacular 1685 | spectacularly 1686 | speedily 1687 | speedy 1688 | spellbind 1689 | spellbinding 1690 | spellbindingly 1691 | spellbound 1692 | spirited 1693 | spiritual 1694 | splendid 1695 | splendidly 1696 | splendor 1697 | spontaneous 1698 | sporty 1699 | spotless 1700 | sprightly 1701 | stability 1702 | stabilize 1703 | stable 1704 | stainless 1705 | standout 1706 | state-of-the-art 1707 | stately 1708 | statuesque 1709 | staunch 1710 | staunchly 1711 | staunchness 1712 | steadfast 1713 | steadfastly 1714 | steadfastness 1715 | steadiest 1716 | steadiness 1717 | steady 1718 | stellar 1719 | stellarly 1720 | stimulate 1721 | stimulates 1722 | stimulating 1723 | stimulative 1724 | stirringly 1725 | straighten 1726 | straightforward 1727 | streamlined 1728 | striking 1729 | strikingly 1730 | striving 1731 | strong 1732 | stronger 1733 | strongest 1734 | stunned 1735 | stunning 1736 | stunningly 1737 | stupendous 1738 | stupendously 1739 | sturdier 1740 | sturdy 1741 | stylish 1742 | stylishly 1743 | stylized 1744 | suave 1745 | suavely 1746 | sublime 1747 | subsidize 1748 | subsidized 1749 | subsidizes 1750 | subsidizing 1751 | substantive 1752 | succeed 1753 | succeeded 1754 | succeeding 1755 | succeeds 1756 | succes 1757 | success 1758 | successes 1759 | successful 1760 | successfully 1761 | suffice 1762 | sufficed 1763 | suffices 1764 | sufficient 1765 | sufficiently 1766 | suitable 1767 | sumptuous 1768 | sumptuously 1769 | sumptuousness 1770 | super 1771 | superb 1772 | superbly 1773 | superior 1774 | superiority 1775 | supple 1776 | support 1777 | supported 1778 | supporter 1779 | supporting 1780 | supportive 1781 | supports 
1782 | supremacy 1783 | supreme 1784 | supremely 1785 | supurb 1786 | supurbly 1787 | surmount 1788 | surpass 1789 | surreal 1790 | survival 1791 | survivor 1792 | sustainability 1793 | sustainable 1794 | swank 1795 | swankier 1796 | swankiest 1797 | swanky 1798 | sweeping 1799 | sweet 1800 | sweeten 1801 | sweetheart 1802 | sweetly 1803 | sweetness 1804 | swift 1805 | swiftness 1806 | talent 1807 | talented 1808 | talents 1809 | tantalize 1810 | tantalizing 1811 | tantalizingly 1812 | tempt 1813 | tempting 1814 | temptingly 1815 | tenacious 1816 | tenaciously 1817 | tenacity 1818 | tender 1819 | tenderly 1820 | terrific 1821 | terrifically 1822 | thank 1823 | thankful 1824 | thinner 1825 | thoughtful 1826 | thoughtfully 1827 | thoughtfulness 1828 | thrift 1829 | thrifty 1830 | thrill 1831 | thrilled 1832 | thrilling 1833 | thrillingly 1834 | thrills 1835 | thrive 1836 | thriving 1837 | thumb-up 1838 | thumbs-up 1839 | tickle 1840 | tidy 1841 | time-honored 1842 | timely 1843 | tingle 1844 | titillate 1845 | titillating 1846 | titillatingly 1847 | togetherness 1848 | tolerable 1849 | toll-free 1850 | top 1851 | top-notch 1852 | top-quality 1853 | topnotch 1854 | tops 1855 | tough 1856 | tougher 1857 | toughest 1858 | traction 1859 | tranquil 1860 | tranquility 1861 | transparent 1862 | treasure 1863 | tremendously 1864 | trendy 1865 | triumph 1866 | triumphal 1867 | triumphant 1868 | triumphantly 1869 | trivially 1870 | trophy 1871 | trouble-free 1872 | trump 1873 | trumpet 1874 | trust 1875 | trusted 1876 | trusting 1877 | trustingly 1878 | trustworthiness 1879 | trustworthy 1880 | trusty 1881 | truthful 1882 | truthfully 1883 | truthfulness 1884 | twinkly 1885 | ultra-crisp 1886 | unabashed 1887 | unabashedly 1888 | unaffected 1889 | unassailable 1890 | unbeatable 1891 | unbiased 1892 | unbound 1893 | uncomplicated 1894 | unconditional 1895 | undamaged 1896 | undaunted 1897 | understandable 1898 | undisputable 1899 | undisputably 1900 | undisputed 1901 | 
unencumbered 1902 | unequivocal 1903 | unequivocally 1904 | unfazed 1905 | unfettered 1906 | unforgettable 1907 | unity 1908 | unlimited 1909 | unmatched 1910 | unparalleled 1911 | unquestionable 1912 | unquestionably 1913 | unreal 1914 | unrestricted 1915 | unrivaled 1916 | unselfish 1917 | unwavering 1918 | upbeat 1919 | upgradable 1920 | upgradeable 1921 | upgraded 1922 | upheld 1923 | uphold 1924 | uplift 1925 | uplifting 1926 | upliftingly 1927 | upliftment 1928 | upscale 1929 | usable 1930 | useable 1931 | useful 1932 | user-friendly 1933 | user-replaceable 1934 | valiant 1935 | valiantly 1936 | valor 1937 | valuable 1938 | variety 1939 | venerate 1940 | verifiable 1941 | veritable 1942 | versatile 1943 | versatility 1944 | vibrant 1945 | vibrantly 1946 | victorious 1947 | victory 1948 | viewable 1949 | vigilance 1950 | vigilant 1951 | virtue 1952 | virtuous 1953 | virtuously 1954 | visionary 1955 | vivacious 1956 | vivid 1957 | vouch 1958 | vouchsafe 1959 | warm 1960 | warmer 1961 | warmhearted 1962 | warmly 1963 | warmth 1964 | wealthy 1965 | welcome 1966 | well 1967 | well-backlit 1968 | well-balanced 1969 | well-behaved 1970 | well-being 1971 | well-bred 1972 | well-connected 1973 | well-educated 1974 | well-established 1975 | well-informed 1976 | well-intentioned 1977 | well-known 1978 | well-made 1979 | well-managed 1980 | well-mannered 1981 | well-positioned 1982 | well-received 1983 | well-regarded 1984 | well-rounded 1985 | well-run 1986 | well-wishers 1987 | wellbeing 1988 | whoa 1989 | wholeheartedly 1990 | wholesome 1991 | whooa 1992 | whoooa 1993 | wieldy 1994 | willing 1995 | willingly 1996 | willingness 1997 | win 1998 | windfall 1999 | winnable 2000 | winner 2001 | winners 2002 | winning 2003 | wins 2004 | wisdom 2005 | wise 2006 | wisely 2007 | witty 2008 | won 2009 | wonder 2010 | wonderful 2011 | wonderfully 2012 | wonderous 2013 | wonderously 2014 | wonders 2015 | wondrous 2016 | woo 2017 | work 2018 | workable 2019 | worked 2020 | works 
2021 | world-famous 2022 | worth 2023 | worth-while 2024 | worthiness 2025 | worthwhile 2026 | worthy 2027 | wow 2028 | wowed 2029 | wowing 2030 | wows 2031 | yay 2032 | youthful 2033 | zeal 2034 | zenith 2035 | zest 2036 | zippy 2037 | -------------------------------------------------------------------------------- /images/inclass-competition.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/KwokHing/SentimentAnalysis-Python-Demo/1c661f2105c1d3997b38a612bda9d03bc9f36cbb/images/inclass-competition.jpg -------------------------------------------------------------------------------- /images/output_7_1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/KwokHing/SentimentAnalysis-Python-Demo/1c661f2105c1d3997b38a612bda9d03bc9f36cbb/images/output_7_1.png --------------------------------------------------------------------------------