├── README.md ├── Twitter Sentiment Analysis - Classical Approach VS Deep Learning.ipynb └── images ├── dropout.png ├── embedding.png ├── love_scrable.jpg └── sentiment_classification.png /README.md: -------------------------------------------------------------------------------- 1 | # Twitter Sentiment Analysis - Classical Approach VS Deep Learning 2 | 3 | 4 | 5 | Photo by Gaelle Marcel on Unsplash. 6 | 7 | # Overview 8 | 9 | This project's aim is to explore the world of *Natural Language Processing* (NLP) by building what is known as a **Sentiment Analysis Model**: a model that analyses a given piece of text and predicts whether it expresses positive or negative sentiment. 10 | 11 |
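To make the task concrete, here is a minimal sketch of such a model: a from-scratch multinomial Naive Bayes with Laplace smoothing, trained on a few hand-written toy tweets. This is illustrative only; the notebook itself works on the full `sentiment140` dataset with proper cleaning and preprocessing.

```python
import math
from collections import Counter

def train_nb(texts, labels):
    """Collect per-class word counts for a multinomial Naive Bayes classifier."""
    counts = {}
    for text, label in zip(texts, labels):
        counts.setdefault(label, Counter()).update(text.lower().split())
    vocab = {w for c in counts.values() for w in c}
    return counts, vocab

def predict_nb(counts, vocab, text):
    """Pick the class maximizing the sum of log P(word | class),
    assuming equal class priors (sentiment140 is perfectly balanced)."""
    scores = {}
    for label, words in counts.items():
        total = sum(words.values())
        score = 0.0
        for w in text.lower().split():
            # Laplace smoothing so unseen words don't zero out a class
            score += math.log((words[w] + 1) / (total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

train_texts = ["i love this so much", "what a great day",
               "this is awful", "i hate everything about this"]
train_labels = ["positive", "positive", "negative", "negative"]

counts, vocab = train_nb(train_texts, train_labels)
print(predict_nb(counts, vocab, "i love it"))  # positive
```

Even on four toy examples, the word counts are enough to separate the two sentiments; the notebook's real model is built on 1.6 million tweets.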
12 | 13 | To this end, we will be using the `sentiment140` dataset, which contains data collected from Twitter. A notable feature of this dataset is that it is *perfectly* balanced (i.e., both classes contain an equal number of examples). 14 | 15 | Citing the [creators](http://help.sentiment140.com/for-students/) of this dataset: 16 | 17 | > *Our approach was unique because our training data was automatically created, as opposed to having humans manually annotate tweets. In our approach, we assume that any tweet with positive emoticons, like :), were positive, and tweets with negative emoticons, like :(, were negative. We used the Twitter Search API to collect these tweets by using keyword search* 18 | 19 | After a series of **cleaning and data processing** steps, and after visualizing our data in a **word cloud**, we will build a **Naive Bayesian** model whose goal is to correctly classify tweets as expressing positive or negative sentiment. 20 | Next, we will propose a much more advanced solution: a **deep learning** model, the **LSTM**. This approach requires a different kind of data cleaning and processing, and will introduce **Word Embeddings**, **Dropout** and many other machine learning concepts. 21 | 22 | Throughout this notebook, we will take advantage of every result, visualization and failure to better understand the data, extract insights from it and learn how to improve our model: from the types of words used in positive and negative tweets, to the vocabulary diversity of each class and the days of the week on which these tweets occur, to the concept of overfitting and the crucial role data plays when building a model. I hope you'll enjoy going through this notebook and gain not only technical skills but also analytical skills from it. 23 | 24 | --- 25 | 26 | This notebook is written by **Joseph Assaker**.  
Feel free to reach out for any feedback on this notebook via [email](mailto:lb.josephassaker@gmail.com) or [LinkedIn](https://www.linkedin.com/in/joseph-assaker/). 27 | 28 | --- 29 | 30 | Now, let's start with the fun 🎉 31 | 32 | ### **Table of Contents:** 33 | 34 | 1. Importing and Discovering the Dataset 35 | 2. Cleaning and Processing the Data 36 | 2.1. Tokenization 37 | 2.2. Lemmatization 38 | 2.3. Cleaning the Data 39 | 3. Visualizing the Data 40 | 4. Naive Bayesian Model 41 | 4.1. Splitting the Data 42 | 4.2. Training the Model 43 | 4.3. Testing the Model 44 | 4.4. Asserting the Model 45 | 5. Deep Learning Model - LSTM 46 | 5.1. Data Pre-processing 47 |     5.1.1. Word Embeddings 48 |     5.1.2. Global Vectors for Word Representation (GloVe) 49 |     5.1.3. Data Padding 50 | 5.2. Data Transformation 51 | 5.3. Building the Model 52 | 5.4. Training the Model 53 | 5.5. Investigating Possibilities to Improve the Model 54 |     5.5.1. Regularization - Dropout 55 |     5.5.2. Inspecting the Data - Unknown Words 56 | 5.6. Predicting on Custom Data 57 | 5.7. Inspecting Wrongly Predicted Data 58 | 6. Bonus Section 59 | 7. Extra Tip: Pickling! 60 | 8. Further Work 61 | 62 | 63 | Continue reading the whole notebook [here](https://github.com/JosephAssaker/Twitter-Sentiment-Analysis-Classical-Approach-VS-Deep-Learning/blob/master/Twitter%20Sentiment%20Analysis%20-%20Classical%20Approach%20VS%20Deep%20Learning.ipynb). 64 | 65 | You can also find this notebook, and give it an upvote 😊, on [Kaggle](https://www.kaggle.com/josephassaker/twitter-sentiment-analysis-classical-vs-lstm). 
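As a small taste of two items from the table of contents, here is a dependency-free sketch of data padding (5.1.3) and pickling (7) on toy token ids. The padding function mimics the default behaviour of Keras' `pad_sequences` ("pre" padding and truncation); the notebook uses the Keras utility itself, so this is illustrative only.

```python
import io
import pickle

def pad_sequences(seqs, maxlen, value=0):
    """Left-pad (or truncate) token-id sequences to a fixed length,
    mirroring Keras' pad_sequences defaults ('pre' padding/truncation)."""
    padded = []
    for seq in seqs:
        seq = seq[-maxlen:]                      # keep the last maxlen tokens
        padded.append([value] * (maxlen - len(seq)) + seq)
    return padded

token_ids = [[4, 7], [9, 2, 5, 1, 3]]            # tweets of unequal length
padded = pad_sequences(token_ids, maxlen=4)
print(padded)  # [[0, 0, 4, 7], [2, 5, 1, 3]]

# Pickling: serialize any Python object (e.g. a fitted tokenizer or model)
# so you don't have to rebuild it on every run; a file works the same way.
buffer = io.BytesIO()
pickle.dump(padded, buffer)
buffer.seek(0)
assert pickle.load(buffer) == padded
```

Uniform sequence lengths are what let the LSTM consume tweets in batches, and pickling the heavy preprocessing artifacts is the "extra tip" that saves time across sessions.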
66 | -------------------------------------------------------------------------------- /images/dropout.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/JosephAssaker/Twitter-Sentiment-Analysis-Classical-Approach-VS-Deep-Learning/d6a9925f7c563e4298674a54f581ec89ee20c1d6/images/dropout.png -------------------------------------------------------------------------------- /images/embedding.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/JosephAssaker/Twitter-Sentiment-Analysis-Classical-Approach-VS-Deep-Learning/d6a9925f7c563e4298674a54f581ec89ee20c1d6/images/embedding.png -------------------------------------------------------------------------------- /images/love_scrable.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/JosephAssaker/Twitter-Sentiment-Analysis-Classical-Approach-VS-Deep-Learning/d6a9925f7c563e4298674a54f581ec89ee20c1d6/images/love_scrable.jpg -------------------------------------------------------------------------------- /images/sentiment_classification.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/JosephAssaker/Twitter-Sentiment-Analysis-Classical-Approach-VS-Deep-Learning/d6a9925f7c563e4298674a54f581ec89ee20c1d6/images/sentiment_classification.png --------------------------------------------------------------------------------