├── README.md
├── Twitter Sentiment Analysis - Classical Approach VS Deep Learning.ipynb
└── images
├── dropout.png
├── embedding.png
├── love_scrable.jpg
└── sentiment_classification.png
/README.md:
--------------------------------------------------------------------------------
1 | # Twitter Sentiment Analysis - Classical Approach VS Deep Learning
2 |
3 |
4 |
5 | Photo by Gaelle Marcel on Unsplash.
6 |
7 | # Overview
8 |
9 | This project aims to explore the world of *Natural Language Processing* (NLP) by building what is known as a **Sentiment Analysis Model**: a model that analyses a given piece of text and predicts whether it expresses a positive or negative sentiment.
10 |
11 |
12 |
13 | To this end, we will be using the `sentiment140` dataset, which contains data collected from Twitter. A notable feature of this dataset is that it is *perfectly* balanced (i.e., the number of examples in each class is equal).
14 |
15 | Citing the [creators](http://help.sentiment140.com/for-students/) of this dataset:
16 |
17 | > *Our approach was unique because our training data was automatically created, as opposed to having humans manual annotate tweets. In our approach, we assume that any tweet with positive emoticons, like :), were positive, and tweets with negative emoticons, like :(, were negative. We used the Twitter Search API to collect these tweets by using keyword search*
18 |
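To get a feel for the data (and to verify that balance claim), a minimal sketch along these lines can be used to load the CSV and count the examples per class. The file name and column layout below are assumptions based on the standard `sentiment140` distribution, which ships without a header row and encodes negative tweets as 0 and positive tweets as 4:

```python
import pandas as pd

# Assumed file name and column order for the sentiment140 CSV dump
# (target: 0 = negative, 4 = positive).
columns = ["target", "id", "date", "flag", "user", "text"]
df = pd.read_csv("training.1600000.processed.noemoticon.csv",
                 encoding="latin-1", names=columns)

# A perfectly balanced dataset shows identical counts for both classes.
print(df["target"].value_counts())
```
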
19 | After a series of **cleaning and data processing** steps, and after visualizing our data in a **word cloud**, we will build a **Naive Bayesian** model. This model's goal is to correctly classify tweets as expressing positive or negative sentiment (a minimal baseline sketch of this approach is shown just below).
20 | Next, we will propose a much more advanced solution using a **deep learning** model: an **LSTM**. This will require a different kind of data cleaning and processing. Along the way, we will also discover **Word Embeddings**, **Dropout** and many other machine-learning concepts.
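
To make the classical baseline concrete, here is a minimal bag-of-words Naive Bayes sketch using scikit-learn. It is an illustrative stand-in with toy data, not the notebook's exact pipeline (which performs its own tokenization, lemmatization and cleaning first):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy examples standing in for the cleaned tweets and their labels.
tweets = ["i love this so much", "worst day ever",
          "what a great movie", "this is terrible"]
labels = ["positive", "negative", "positive", "negative"]

# Bag-of-words counts fed into a multinomial Naive Bayes classifier.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(tweets, labels)

print(model.predict(["i love rainy days", "this is the worst"]))
```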
21 |
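As a preview of the deep-learning side, the sketch below shows the general shape of such an architecture in Keras: an embedding layer feeding an LSTM, with dropout for regularization. The vocabulary size, embedding dimension, sequence length and layer sizes are placeholder values, not the notebook's actual hyperparameters:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Embedding, LSTM, Dropout, Dense

# Placeholder hyperparameters, for illustration only.
vocab_size = 10000    # words kept in the tokenizer vocabulary
embedding_dim = 100   # e.g. the size of pre-trained GloVe vectors
max_length = 50       # tweets padded/truncated to this many tokens

model = Sequential([
    Input(shape=(max_length,)),
    Embedding(vocab_size, embedding_dim),
    LSTM(64),
    Dropout(0.5),                    # randomly drops units to fight overfitting
    Dense(1, activation="sigmoid"),  # probability that a tweet is positive
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```
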
22 | Throughout this notebook, we will take advantage of every result, visualization and failure in order to better understand the data, extract insights from it and learn how to improve our model. From the type of words used in positive and negative tweets, to the vocabulary diversity in each case, to the day of the week on which these tweets occur, to the concept of overfitting and the crucial role of data when building a model, I really hope that you'll enjoy going through this notebook and gain not only technical skills from it, but analytical skills as well.
23 |
24 | ---
25 |
26 | This notebook was written by **Joseph Assaker**. Feel free to reach out with any feedback via [email](mailto:lb.josephassaker@gmail.com) or [LinkedIn](https://www.linkedin.com/in/joseph-assaker/).
27 |
28 | ---
29 |
30 | Now, let's start with the fun 🎉
31 |
32 | ### **Table of Contents:**
33 |
34 | 1. Importing and Discovering the Dataset
35 | 2. Cleaning and Processing the Data
36 | 2.1. Tokenization
37 | 2.2. Lemmatization
38 | 2.3. Cleaning the Data
39 | 3. Visualizing the Data
40 | 4. Naive Bayesian Model
41 | 4.1. Splitting the Data
42 | 4.2. Training the Model
43 | 4.3. Testing the Model
44 | 4.4. Assessing the Model
45 | 5. Deep Learning Model - LSTM
46 | 5.1. Data Pre-processing
47 | 5.1.1. Word Embeddings
48 | 5.1.2. Global Vectors for Word Representation (GloVe)
49 | 5.1.3. Data Padding
50 | 5.2. Data Transformation
51 | 5.3. Building the Model
52 | 5.4. Training the Model
53 | 5.5. Investigating Possibilities to Improve the Model
54 | 5.5.1. Regularization - Dropout
55 | 5.5.2. Inspecting the Data - Unknown Words
56 | 5.6. Predicting on Custom Data
57 | 5.7. Inspecting Wrongly Predicted Data
58 | 6. Bonus Section
59 | 7. Extra Tip: Pickling!
60 | 8. Further Work
61 |
62 |
63 | Continue reading the whole notebook [here](https://github.com/JosephAssaker/Twitter-Sentiment-Analysis-Classical-Approach-VS-Deep-Learning/blob/master/Twitter%20Sentiment%20Analysis%20-%20Classical%20Approach%20VS%20Deep%20Learning.ipynb).
64 |
65 | You can also find this notebook, and give it an upvote 😊, on [Kaggle](https://www.kaggle.com/josephassaker/twitter-sentiment-analysis-classical-vs-lstm).
66 |
--------------------------------------------------------------------------------
/images/dropout.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/JosephAssaker/Twitter-Sentiment-Analysis-Classical-Approach-VS-Deep-Learning/d6a9925f7c563e4298674a54f581ec89ee20c1d6/images/dropout.png
--------------------------------------------------------------------------------
/images/embedding.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/JosephAssaker/Twitter-Sentiment-Analysis-Classical-Approach-VS-Deep-Learning/d6a9925f7c563e4298674a54f581ec89ee20c1d6/images/embedding.png
--------------------------------------------------------------------------------
/images/love_scrable.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/JosephAssaker/Twitter-Sentiment-Analysis-Classical-Approach-VS-Deep-Learning/d6a9925f7c563e4298674a54f581ec89ee20c1d6/images/love_scrable.jpg
--------------------------------------------------------------------------------
/images/sentiment_classification.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/JosephAssaker/Twitter-Sentiment-Analysis-Classical-Approach-VS-Deep-Learning/d6a9925f7c563e4298674a54f581ec89ee20c1d6/images/sentiment_classification.png
--------------------------------------------------------------------------------