├── .github └── workflows │ └── contributors.yml ├── Code_of_Conduct.md ├── Contributing.md ├── README.md ├── app ├── main.py ├── sentiment_analysis.py ├── stream.py ├── tweet_network.py ├── tweets.json ├── tweets_data.py └── twitter_config.py ├── issue_template.md ├── pull_request_template.md └── requirements.txt /.github/workflows/contributors.yml: -------------------------------------------------------------------------------- 1 | name: Add contributors 2 | on: 3 | schedule: 4 | - cron: '0 0 * * *' 5 | # push: 6 | # branches: 7 | # - main 8 | 9 | jobs: 10 | add-contributors: 11 | runs-on: ubuntu-latest 12 | steps: 13 | - uses: actions/checkout@v2 14 | - uses: BobAnkh/add-contributors@master 15 | with: 16 | CONTRIBUTOR: '# Contributors 📑' 17 | COLUMN_PER_ROW: '8' 18 | ACCESS_TOKEN: ${{secrets.GITHUB_TOKEN}} 19 | IMG_WIDTH: '80' 20 | FONT_SIZE: '12' 21 | PATH: '/README.md' 22 | COMMIT_MESSAGE: 'docs(README): update contributors' 23 | AVATAR_SHAPE: 'round' 24 | -------------------------------------------------------------------------------- /Code_of_Conduct.md: -------------------------------------------------------------------------------- 1 | # Contributor Code of Conduct 2 | 3 | ## Our Pledge 4 | 5 | For the betterment of open-source and in order to make a welcoming environment, we as contributors and maintainers pledge to making participation in our project and our community a harassment-free experience for everyone, regardless of their age, body size disability, ethnicity, sex characteristics, gender identity and expression, level of experience, education, socio-economic status, nationality, skin tone, race, religion, or sexual identity and orientation. 
6 | 7 | ## Our Standards 8 | 9 | Examples of behavior that contributes to creating a positive environment include: 10 | 11 | 12 | 13 | - Using welcoming and inclusive language 14 | 15 | 16 | - Being respectful of differing viewpoints and experiences 17 | 18 | 19 | - Gracefully accepting constructive criticism 20 | 21 | 22 | - Focusing on what is best for the community 23 | 24 | 25 | - Showing empathy towards other community members 26 | 27 | Examples of unacceptable behavior by participants include: 28 | 29 | 30 | - The use of sexualized language or imagery and unwelcome sexual attention or advances 31 | 32 | 33 | - Trolling, insulting/derogatory comments, and personal or political attacks 34 | 35 | 36 | - Public or private harassment 37 | 38 | 39 | - Publishing others' private information, such as a physical or electronic address, without explicit permission 40 | 41 | 42 | - Other conduct which could reasonably be considered inappropriate in a professional setting 43 | 44 | ## Our Responsibilities 45 | 46 | Project maintainers are responsible for clarifying the standards of acceptable behavior and are expected to take appropriate and fair corrective action in response to any instances of unacceptable behavior. 47 | 48 | Project maintainers have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct, or to ban temporarily or permanently any contributor for other behaviors that they deem inappropriate, threatening, offensive, or harmful. 49 | 50 | ## Scope 51 | 52 | This Code of Conduct applies both within project spaces and in public spaces when an individual is representing the project or its community. Examples of representing a project or community include using an official project e-mail address, posting via an official social media account, or acting as an appointed representative at an online or offline event. 
Representation of a project may be further defined and clarified by project maintainers. 53 | 54 | ## Enforcement 55 | 56 | Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by contacting the project team. All complaints will be reviewed and investigated and will result in a response that is deemed necessary and appropriate to the circumstances. The project team is obligated to maintain confidentiality with regard to the reporter of an incident. Further details of specific enforcement policies may be posted separately. 57 | 58 | Project maintainers who do not follow or enforce the Code of Conduct in good faith may face temporary or permanent repercussions as determined by other members of the project's leadership. 59 | 60 | 61 | Community leaders will follow these Community Impact Guidelines in determining 62 | the consequences for any action they deem in violation of this Code of Conduct: 63 | 64 | ### 1. Correction 65 | 66 | **Community Impact**: Use of inappropriate language or other behavior deemed 67 | unprofessional or unwelcome in the community. 68 | 69 | **Consequence**: A private, written warning from community leaders, providing 70 | clarity around the nature of the violation and an explanation of why the 71 | behavior was inappropriate. A public apology may be requested. 72 | 73 | ### 2. Warning 74 | 75 | **Community Impact**: A violation through a single incident or series 76 | of actions. 77 | 78 | **Consequence**: A warning with consequences for continued behavior. No 79 | interaction with the people involved, including unsolicited interaction with 80 | those enforcing the Code of Conduct, for a specified period of time. This 81 | includes avoiding interactions in community spaces as well as external channels 82 | like social media. Violating these terms may lead to a temporary or 83 | permanent ban. 84 | 85 | ### 3. 
Temporary Ban 86 | 87 | **Community Impact**: A serious violation of community standards, including 88 | sustained inappropriate behavior. 89 | 90 | **Consequence**: A temporary ban from any sort of interaction or public 91 | communication with the community for a specified period of time. No public or 92 | private interaction with the people involved, including unsolicited interaction 93 | with those enforcing the Code of Conduct, is allowed during this period. 94 | Violating these terms may lead to a permanent ban. 95 | 96 | ### 4. Permanent Ban 97 | 98 | **Community Impact**: Demonstrating a pattern of violation of community 99 | standards, including sustained inappropriate behavior, harassment of an 100 | individual, or aggression toward or disparagement of classes of individuals. 101 | 102 | **Consequence**: A permanent ban from any sort of public interaction within 103 | the community. 104 | -------------------------------------------------------------------------------- /Contributing.md: -------------------------------------------------------------------------------- 1 | # Contributing Guidelines 🤝 2 | 3 |
4 | 5 | [![PRs Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg?style=flat-square)](http://makeapullrequest.com) 6 |   7 | [![Open Source? Yes!](https://badgen.net/badge/Open%20Source%20%3F/Yes%21/blue?icon=github)](https://github.com/Naereen/badges/) 8 | 9 | 10 | This documentation contains a set of guidelines to help you during the contribution process. 11 | We are happy to welcome all the contributions from anyone willing to improve/add new scripts to this project. 12 | Thank you for helping out and remember, **no contribution is too small.** 13 | Being an open source contributor doesn't just mean writing code, either. You can help out by writing documentation, tests, or even giving suggestions. 🏆 14 | 15 |
16 | 17 | ## Note : All PRs to this repo must be made only to develop branch ( master is used only for deployment ). 18 | 19 | ### 0 : Issues 20 | 21 | - Always check the existing issues first. **Do not create an issue if it already exists.** Otherwise, create your own issue. 22 | - Only start working on an issue if it has been assigned to you. **Check assignees** 23 | - Every change in this project must have an associated issue. **Issue before PR** 24 | - Do not have multiple PRs for the same issue. **One PR per issue** 25 | - The assignee should make the PR in a time-bound manner (possibly 1 week), otherwise the issue may be unassigned. 26 | - If a PR closes the issue, link it to the issue. 27 | - If a change is requested, link the commit to the issue. 28 | 29 | 30 | 31 | ### 1 : Fork the Project 32 | 33 | - Fork this Repository. This will create a copy of this Repository under your GitHub profile. 34 | Keep a reference to the original project in `upstream` remote. 35 | 36 | ```bash 37 | git clone https://github.com// 38 | cd 39 | git remote add upstream https://github.com// 40 | ``` 41 | 42 | - If you have already forked the project, update your copy before working. 43 | 44 | ```bash 45 | git remote update 46 | git checkout 47 | git rebase upstream/ 48 | ``` 49 | 50 | ### 2 : Branch 51 | 52 | ### Create a new branch after setting up the project locally before making any changes, so as to avoid merge conflicts while making PRs. 53 | Use its name to identify the issue you're addressing: Feature, Bug Fix or Enhancement. 54 | 55 | ```bash 56 | # It will create a new branch with name Branch_Name and switch to that branch 57 | git checkout -b Branch_Name 58 | ``` 59 | 60 | ### 3 : Work on the issue assigned 61 | 62 | - Work on the issue(s) assigned to you. 63 | - Add all the files/folders needed. 
64 | - After you've made your changes or contribution to the project, add them to the branch you've just created by: 65 | 66 | ```bash 67 | # To add all new files to branch Branch_Name 68 | git add . 69 | 70 | # To add only a few files to Branch_Name 71 | git add 72 | ``` 73 | 74 | ### 4 : Commit 75 | 76 | - When committing, give a descriptive message for the convenience of the reviewer by: 77 | 78 | ```bash 79 | # This message gets associated with all files you have changed 80 | git commit -m "message" 81 | ``` 82 | 83 | - **NOTE**: A PR should have only one commit. Multiple commits should be squashed. 84 | 85 | ### 5 : Work Remotely 86 | 87 | ```bash 88 | # To push your work to your remote repository 89 | git push -u origin Branch_Name 90 | ``` 91 | 92 | ### 6 : Pull Request 93 | 94 | - Go to your repository in the browser and click on compare and pull requests. 95 | Then add a title and description to your pull request that explains your contribution. 96 | 97 | 98 | ### 7 : Review 99 | 100 | - 🎉🌟Congratulations! Sit and relax, you've made your contribution to this project. Wait until the PR is reviewed and incorporate the changes suggested by the community, after which the PR can be successfully merged. 101 | 🎉🎊 102 | 103 | 104 | ### Note : Do not add images, rather 👇 105 | - You can do that by hosting all your images and screenshots on any image hosting site such as [imgur](https://imgur.com/), [imgbb](https://imgbb.com/), [postimages](https://postimages.org/). 106 | - Then link your uploaded images to README files. 
107 | 108 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | 17 | 18 | --- 19 | 20 | # RealTime-TwitterDataAnalysis 21 | Collect and process real-time X (formerly Twitter) data to analyse the popularity of tweets with specific keywords or hashtags, visualize important metrics, generate Twitter networks, and map tweets and trends geographically. 22 | 23 |
24 | 25 | 26 | **Analysing Twitter data can be useful in a wide variety of fields, for example:** 27 | 28 | - In industry it can be used in **Marketing and Product Analysis** to improve an organization's business decisions. 29 | - It can be used to **Measure public opinion**, which can serve to gauge the mood of people on important topics of interest such as political or social events. 30 | - Further, it can be used for **Clustering Behavioral Groups** by identifying conversation spheres, patterns in the behaviour of different subsections of society, and also the bridges or major influencers. 31 | 32 |
33 |
34 |
35 | 36 | 37 | - **Show Real Time Plot Of Tweet Volumes and Proportion Of tweets mentioning either keyword** 38 |

39 | 40 | ![Total Volume Of tweets](https://i.postimg.cc/rsy2C9md/total-mentions.png) ---- ![Average Mentions](https://i.postimg.cc/c1TS1P9f/average-mention.png) 41 | 42 | 43 |

44 | 45 | - **Graph tweet sentiment by performing NLP** 46 |

47 | 48 | ![Tweet Sentiment](https://i.postimg.cc/HnSDS7c6/tweets-sentiment.png) 49 | 50 |

51 | 52 | - **Node Networks for Retweets And Replied Tweets | Force Directed Circular Layout Showing more Retweeted user Larger in Size** 53 | 54 |

55 | 56 | 57 |
58 | node-network.png            Circular-layout-reply-network 59 | 60 |
61 | 62 |

63 | 64 | 65 | ### **Functionality** 66 | - Stream tweets containing specific keywords in real time 67 | - Show volume metrics for selected tweets in real time 68 | - Filter tweets by any time window 69 | - Plot the prevalence of tweets regarding a particular topic of interest. 70 | - Perform sentiment analysis of tweets based on keyword and chart the results in real time 71 | - Graph Twitter networks: follow, favorite, retweet and reply networks. 72 | - Analyse Twitter networks and discover important nodes that are influencers or conversation bridges. 73 | - Display tweets geographically on a map. 74 | 75 |
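The volume metrics listed above come down to counting keyword matches per fixed time window. The following is a minimal, dependency-free sketch of that idea; the project itself relies on pandas `resample('5T')` for this step, and the `bucket_mentions` helper and the sample tweets here are hypothetical, included only for illustration.

```python
from collections import Counter
from datetime import datetime


def bucket_mentions(tweets, keyword, minutes=5):
    """Count tweets mentioning `keyword` per `minutes`-wide window.

    `tweets` is an iterable of (created_at: datetime, text: str) pairs.
    Returns a Counter keyed by the start time of each window.
    Assumes `minutes` divides 60 evenly, so windows align to the hour.
    """
    counts = Counter()
    for created_at, text in tweets:
        # Case-insensitive substring match, like str.contains(case=False).
        if keyword.lower() in text.lower():
            # Floor the timestamp to the start of its window.
            floored = created_at.minute - created_at.minute % minutes
            window = created_at.replace(minute=floored, second=0, microsecond=0)
            counts[window] += 1
    return counts


# Hypothetical sample data, not taken from a real stream.
sample = [
    (datetime(2021, 5, 1, 12, 1), "Loving the new #Google update"),
    (datetime(2021, 5, 1, 12, 4), "#google and #apple both announced things"),
    (datetime(2021, 5, 1, 12, 7), "#apple event today"),
]
google_counts = bucket_mentions(sample, "#google")
apple_counts = bucket_mentions(sample, "#apple")
```

Keying the result by window start keeps it easy to plot or to compare two keywords side by side, which is what the total-mentions chart does.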
76 | 77 | ### Domains 78 | [![Badge](https://shields.io/badge/-Data--Mining-CAF4F4?logo=appveyor&logoColor=000080)]()  79 | [![Badge](https://shields.io/badge/-Data--Manipulatio--and--Feature--Engineering-FCFFE9?logo=appveyor&logoColor=000080)]()  80 | [![Badge](https://shields.io/badge/-Big--Data--Analytics-FFF2CC?logo=appveyor&logoColor=000080)]()  81 | [![Badge](https://shields.io/badge/-Natural--Language--Processing-FDE0D9?logo=appveyor&logoColor=000080)]()  82 | [![Badge](https://shields.io/badge/-API--and--Web--Server--Communication-CAEFD1?logo=appveyor&logoColor=000080)]()  83 | [![Badge](https://shields.io/badge/-Network--Analysis-FFB8A7?logo=appveyor&logoColor=000080)]() 84 | 85 |
86 | 87 | 88 | ### Tech Stack 89 | [![Badge](https://img.shields.io/badge/Streamlit-EE4C2C?style=for-the-badge&logo=streamlit&logoColor=white)]()  90 | [![Badge](https://img.shields.io/badge/tweepy-1DA1F2?style=for-the-badge&logo=tweepy&logoColor=white)]()  91 | [![Badge](https://img.shields.io/badge/nltk-000000?style=for-the-badge&logo=nltk&logoColor=white)]()  92 | [![Badge](https://img.shields.io/badge/NetworkX-FFA500?style=for-the-badge&logo=NetworkX&logoColor=white)]()  93 | [![Badge](https://img.shields.io/badge/pandas-035a7d?style=for-the-badge&logo=pandas&logoColor=white)]()  94 | [![Badge](https://img.shields.io/badge/numpy-306998?style=for-the-badge&logo=numpy&logoColor=white)]()  95 | [![Badge](https://img.shields.io/badge/matplotlib-FFC0CB?style=for-the-badge&logo=matplotlib&logoColor=white)]() 96 | 97 |
98 |
99 | 100 | ## Setup Locally 101 | 102 | - Clone the repository on your local machine. 103 | ```sh 104 | git clone https://github.com/kaustav202/RealTime-TwitterDataAnalysis.git 105 | ``` 106 | - Go into the cloned directory 107 | - Run `pip install -r requirements.txt` to install all the dependencies. 108 | - Create a developer account on twitter: https://developer.twitter.com/en 109 | - Get your Twitter API credentials and replace the placeholders in twitter_config.py. 110 | - Go to the Twitter Developer Portal Projects & Apps page at https://developer.twitter.com/en/portal/projects-and-apps 111 | - Find the API/consumer key and secret under the Consumer Keys section of the Keys and Tokens tab of your app 112 | - Your account's access token and secret for your app can be found under the Authentication Tokens section of the Keys and Tokens tab of your app 113 | - From inside the `app/` folder, you can run `python stream.py` which adds(streams) the tweets into `tweets.json` 114 | - Run `python main.py` which is the application entry point preferably after some time so that you have more tweets to perform the analysis. 115 | - You can also perform Sentiment Analysis by running `python sentiment_analysis.py` and draw tweet network graphs by running `python tweet_network.py` 116 | - Remember that the streaming (writing) of tweets is a completely independent step that needs to be performed initially by running stream.py 117 | 118 |
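Because `tweets_data.py` reads `tweets.json` as one JSON object per line, the file can be sanity-checked before running the analysis scripts. A minimal sketch, assuming that one-object-per-line layout; `load_tweets` and the sample records are hypothetical, and only the standard library is used.

```python
import json
import os
import tempfile


def load_tweets(path):
    """Parse a file holding one JSON object per line, skipping blank lines."""
    tweets = []
    with open(path, "r", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                tweets.append(json.loads(line))
    return tweets


# Hypothetical records written to a temporary file for demonstration.
records = [
    {"text": "hello #google", "user": {"screen_name": "alice"}},
    {"text": "hi #apple", "user": {"screen_name": "bob"}},
]
with tempfile.NamedTemporaryFile(
    "w", suffix=".json", delete=False, encoding="utf-8"
) as tmp:
    # Trailing blank line mimics an unfinished or padded stream file.
    tmp.write("\n".join(json.dumps(r) for r in records) + "\n\n")
    tmp_path = tmp.name

loaded = load_tweets(tmp_path)
os.remove(tmp_path)
screen_names = [t["user"]["screen_name"] for t in loaded]
```

If `load_tweets("tweets.json")` returns an empty list, the stream has not written anything yet and the analysis scripts will have nothing to plot.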
119 | 120 | ### Data Format 121 | 122 | The data received from the Twitter streaming API is in JSON format. 123 | 124 | ### Important Module and Object Structures 125 | 126 | **The Overall Structure Of the Project** 127 | [![twitter-Info-Structure.png](https://i.postimg.cc/6p8wmzhn/twitter-Info-Structure.png)](https://postimg.cc/gxbfwVQ2) 128 | 129 | 130 | 131 | ## How to get started with contributions 132 | 133 | - Read the [Contributing Guidelines](./Contributing.md) and [Code Of Conduct](./Code_of_Conduct.md). 134 | 135 | #### Steps To Contribute 136 | 137 | - Fork this Repository. 138 | - Clone the Repository: `git clone "url of this repo"` 139 | - Check existing issues or raise a new issue of your own and ask for it to be assigned to you. 140 | - Wait for the issue to be assigned to you. 141 | - Create a branch: `git checkout -b ` 142 | - Put your code :- 143 | 144 | - Make all necessary changes or modifications to the code in your local cloned branch. 145 | - Necessary information like functionalities, screenshots, working video (if required) should be kept handy (you will need to present it when submitting the PR) 146 | 147 | - Push changes to GitHub ( on your forked repo ) : `git push origin ` 148 | - Create a new pull request to the original repo ( main branch of this repo ) 149 | - Submit your changes for review. 150 | - And Boom! You're done 🥳 151 | - The maintainers will review and merge your changes into the main branch of this project. You will be automatically notified via e-mail once the changes have been merged. 152 | 153 | #### Note : If you want your changes to count towards hacktoberfest, ensure that the issue you are working on has the #hacktoberfest label 154 | 155 | --- 156 | 157 | ![GitHub release](https://img.shields.io/github/release/kaustav202/RealTime-TwitterDataAnalysis)
158 | 159 | ![GitHub pull-requests merged](https://badgen.net/github/merged-prs/kaustav202/RealTime-TwitterDataAnalysis)   ![GitHub branches](https://badgen.net/github/branches/kaustav202/RealTime-TwitterDataAnalysis)  ![Maintenance](https://img.shields.io/badge/Maintained%3F-yes-green.svg)     ![Maintainer](https://img.shields.io/badge/maintainer-Kaustav-blue)   ![GitHub license](https://badgen.net/github/license/kaustav202/RealTime-TwitterDataAnalysis) 160 | 161 | ![GitHub forks](https://badgen.net/github/forks/kaustav202/RealTime-TwitterDataAnalysis)  ![GitHub stars](https://badgen.net/github/stars/kaustav202/RealTime-TwitterDataAnalysis)  ![GitHub issues](https://img.shields.io/github/issues/kaustav202/RealTime-TwitterDataAnalysis)  ![GitHub contributors](https://img.shields.io/github/contributors/kaustav202/RealTime-TwitterDataAnalysis) 162 | 163 | 164 | # Contributors 📑 165 | 166 | 167 | 168 | 175 | 182 | 189 | 196 | 203 | 210 | 211 |
169 | 170 | kaustav202/ 171 |
172 | kaustav202 173 |
174 |
176 | 177 | SegFal/ 178 |
179 | SegFal 180 |
181 |
183 | 184 | Rishav 185 |
186 | Rishav Mitra 187 |
188 |
190 | 191 | jatin00000/ 192 |
193 | jatin00000 194 |
195 |
197 | 198 | Manan 199 |
200 | Manan Garg 201 |
202 |
204 | 205 | Gokulakrishnan 206 |
207 | Gokulakrishnan Shankar 208 |
209 |
212 | 213 | 214 |

Project Maintainers ⚡

215 | 216 | 217 | --- 218 | 219 | ## Happy Contributing! 🧡 220 | 221 | [![forthebadge](https://forthebadge.com/images/badges/built-with-love.svg)](https://forthebadge.com) 222 | 223 | Star Mark this repository and keep contributing as you learn!! 224 | -------------------------------------------------------------------------------- /app/main.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import matplotlib.pyplot as plt 3 | from tweets_data import get_tweets 4 | from sentiment_analysis import SentimentAnalysis 5 | import pandas as pd 6 | 7 | 8 | 9 | 10 | 11 | sentiment = SentimentAnalysis() 12 | sentiment.set_hashtag_data('5T','#apple','#google') 13 | 14 | ds_tweets = get_tweets() 15 | # Find mentions of #google in all text fields 16 | google_mentions = sentiment.check_word_in_tweet('#google', ds_tweets) 17 | 18 | # Find mentions of #apple in all text fields 19 | apple_mentions = sentiment.check_word_in_tweet('#apple', ds_tweets) 20 | 21 | # Print proportion of tweets mentioning #google 22 | print("Proportion of #google tweets:", np.sum(google_mentions) / ds_tweets.shape[0]) 23 | 24 | # Print proportion of tweets mentioning #apple 25 | print("Proportion of #apple tweets:", np.sum(apple_mentions) / ds_tweets.shape[0]) 26 | 27 | 28 | 29 | 30 | ds_tweets['created_at'] = pd.to_datetime(ds_tweets["created_at"]) 31 | 32 | ds_tweets = ds_tweets.set_index("created_at") 33 | 34 | 35 | 36 | ds_tweets['google'] = sentiment.check_word_in_tweet('#google', ds_tweets) 37 | 38 | ds_tweets['apple'] = sentiment.check_word_in_tweet('#apple', ds_tweets) 39 | 40 | sum_google = ds_tweets['google'].resample('5T').sum() 41 | 42 | sum_apple = ds_tweets['apple'].resample('5T').sum() 43 | 44 | plt.plot(sum_google , color = 'blue') 45 | plt.plot(sum_apple , color = 'red') 46 | 47 | plt.xlabel('minute'); plt.ylabel('Tweet Count') 48 | plt.title('Total mentions over time') 49 | plt.legend(('#google', '#apple')) 50 | plt.show() 51 | 52 | 53 | 54 | 55 | mean_google = ds_tweets['google'].resample('5T').mean() 
56 | 57 | mean_apple = ds_tweets['apple'].resample('5T').mean() 58 | 59 | plt.plot(mean_google , color = 'blue') 60 | plt.plot(mean_apple , color = 'red') 61 | 62 | plt.xlabel('minute'); plt.ylabel('Percentage') 63 | plt.title('Average mentions over time') 64 | plt.legend(('#google', '#apple')) 65 | plt.show() 66 | sentiment.graph() 67 | -------------------------------------------------------------------------------- /app/sentiment_analysis.py: -------------------------------------------------------------------------------- 1 | from nltk.sentiment.vader import SentimentIntensityAnalyzer 2 | from tweets_data import get_tweets 3 | import pandas as pd 4 | import matplotlib.pyplot as plt 5 | 6 | class SentimentAnalysis: 7 | def __init__(self,ds_tweets = get_tweets()): 8 | self.sid = SentimentIntensityAnalyzer() 9 | self.dates = pd.to_datetime(ds_tweets['created_at']) 10 | self.ds_tweets = ds_tweets.set_index(self.dates) 11 | self.hashtag = {} 12 | self.interval = '5T' 13 | self.sentiment_scores = self.ds_tweets['text'].apply(self.sid.polarity_scores) 14 | self.sentiment = self.sentiment_scores.apply(lambda x: x["compound"]) 15 | 16 | def check_word_in_tweet(self,word, data):## This function checks columns for presence of keywords(boolean list) 17 | 18 | """Checks if a word is in a Twitter dataset's text. 19 | Checks text and extended tweet (140+ character tweets) for tweets, 20 | retweets and quoted tweets. 21 | Returns a logical pandas Series. 
22 | """ 23 | contains_column = data['text'].str.contains(word, case = False) 24 | contains_column |= data['extended_tweet-full_text'].str.contains(word, case = False) 25 | contains_column |= data["quoted_status-text"].str.contains(word, case = False) 26 | contains_column |= data["quoted_status-extended_tweet-full_text"].str.contains(word, case = False) 27 | contains_column |= data["retweeted_status-text"].str.contains(word, case = False) 28 | contains_column |= data["retweeted_status-extended_tweet-full_text"].str.contains(word, case = False) 29 | 30 | return contains_column 31 | 32 | 33 | def set_hashtag_data(self, interval='5T',*argv): 34 | for hashtags in argv: 35 | self.hashtag[hashtags] = self.sentiment[self.check_word_in_tweet(hashtags,self.ds_tweets)].resample(interval).mean() 36 | self.interval = interval 37 | return None 38 | 39 | def get_hashtag_data(self): 40 | return self.hashtag 41 | 42 | def graph(self): 43 | 44 | if(len(self.hashtag) == 0): 45 | return False 46 | 47 | for keys in self.hashtag: 48 | plt.plot(self.hashtag[keys]) 49 | plt.xlabel('minutes') 50 | plt.ylabel('Sentiment') 51 | plt.title('Sentiment of Tweets'); 52 | plt.legend(self.hashtag.keys()) 53 | plt.show() 54 | 55 | 56 | 57 | -------------------------------------------------------------------------------- /app/stream.py: -------------------------------------------------------------------------------- 1 | 2 | import tweepy 3 | from tweepy import Stream 4 | import twitter_config as tc 5 | 6 | # authorize the API Key 7 | authentication = tweepy.OAuthHandler(tc.api_key, tc.api_secret_key) 8 | 9 | # authorization to user's access token and access token secret 10 | authentication.set_access_token(tc.access_token, tc.access_token_secret) 11 | 12 | # call the api 13 | api = tweepy.API(authentication) 14 | 15 | #tweepy.models.Status 16 | 17 | 18 | 19 | 20 | from tweepy.streaming import StreamListener 21 | import json 22 | import time 23 | import sys 24 | 25 | class SListener(StreamListener): 
26 | def __init__(self, api = None, fprefix = 'streamer'): 27 | self.api = api or tweepy.API() 28 | self.counter = 0 29 | self.fprefix = fprefix 30 | self.output = open(self.fprefix , 'w') 31 | 32 | 33 | def on_data(self, data): 34 | if 'in_reply_to_status' in data: 35 | self.on_status(data) 36 | if (self.counter % 50 == 0): 37 | print("No of tweets currently in tweets.json = ", self.counter) 38 | elif 'delete' in data: 39 | delete = json.loads(data)['delete']['status'] 40 | if self.on_delete(delete['id'], delete['user_id']) is False: 41 | return False 42 | elif 'limit' in data: 43 | if self.on_limit(json.loads(data)['limit']['track']) is False: 44 | return False 45 | elif 'warning' in data: 46 | warning = json.loads(data)['warning'] 47 | print("WARNING: %s" % warning['message']) 48 | return 49 | 50 | 51 | def on_status(self, status): 52 | self.output.write(status) 53 | self.counter += 1 54 | # if (self.counter % 50 == 0): 55 | # print("No of tweets currently in tweets.json = ", self.counter) 56 | if self.counter >= 20000: 57 | self.output.close() 58 | self.output = open( self.fprefix , 'w') 59 | self.counter = 0 60 | return 61 | 62 | 63 | def on_delete(self, status_id, user_id): 64 | print("Delete notice") 65 | return 66 | 67 | 68 | def on_limit(self, track): 69 | print("WARNING: Limitation notice received, tweets missed: %d" % track) 70 | return 71 | 72 | 73 | def on_error(self, status_code): 74 | print('Encountered error with status code:', status_code) 75 | return 76 | 77 | 78 | def on_timeout(self): 79 | print("Timeout, sleeping for 60 seconds...") 80 | time.sleep(60) 81 | return 82 | 83 | 84 | 85 | # Set up words to track 86 | keywords_to_track = ['#google', '#apple'] 87 | 88 | # Instantiate the SListener object 89 | 90 | file_to_write = "tweets.json" 91 | 92 | listen = SListener(api , file_to_write) 93 | 94 | # Instantiate the Stream object 95 | stream = Stream(authentication, listen) 96 | 97 | # Begin collecting data 98 | stream.filter(track = keywords_to_track) 
99 | 100 | -------------------------------------------------------------------------------- /app/tweet_network.py: -------------------------------------------------------------------------------- 1 | import matplotlib.pyplot as plt 2 | import pandas as pd 3 | import networkx as nx 4 | from tweets_data import get_tweets 5 | 6 | 7 | 8 | ## Create retweets Subset of ds_tweets 9 | 10 | ds_retweets = get_tweets().dropna(subset = ["retweeted_status-user-screen_name"]) 11 | 12 | G_rt = nx.from_pandas_edgelist( 13 | ds_retweets, 14 | source = "user-screen_name", 15 | target = "retweeted_status-user-screen_name", 16 | create_using = nx.DiGraph()) 17 | 18 | # Print the number of nodes 19 | print('Nodes in RT network:', len(G_rt.nodes())) 20 | 21 | # Print the number of edges 22 | print('Edges in RT network:', len(G_rt.edges())) 23 | 24 | 25 | 26 | ## Create reply Subset of ds_tweets 27 | 28 | ds_replies = get_tweets().dropna(subset = ["in_reply_to_screen_name"]) 29 | 30 | # Create reply network from edgelist 31 | G_reply = nx.from_pandas_edgelist( 32 | ds_replies, 33 | source = "user-screen_name", 34 | target = "in_reply_to_screen_name", 35 | create_using = nx.DiGraph()) 36 | 37 | # Print the number of nodes 38 | print('Nodes in reply network:', len(G_reply.nodes())) 39 | 40 | # Print the number of edges 41 | print('Edges in reply network:', len(G_reply.edges())) 42 | 43 | 44 | 45 | ## Drawing the Graph 46 | 47 | pos = nx.random_layout(G_rt) 48 | 49 | # Create size list 50 | sizes = [x[1] for x in G_rt.degree()] 51 | 52 | 53 | # Draw the network 54 | nx.draw_networkx(G_rt, pos, 55 | with_labels = False, 56 | node_size = sizes, 57 | width = 0.1, alpha = 0.7, 58 | arrowsize = 2, linewidths = 0) 59 | 60 | # Turn axis off and show 61 | plt.axis('off'); plt.show() 62 | 63 | 64 | 65 | ## Nodes Of Interest 66 | 67 | # Generate in-degree centrality for retweets 68 | rt_centrality = nx.in_degree_centrality(G_rt) 69 | 70 | # Generate in-degree centrality for replies 71 | reply_centrality = nx.in_degree_centrality(G_reply) 72 | column_names = ['screen_name', 'degree_centrality'] 73 | # Store centralities in DataFrame 74 | rt = pd.DataFrame(list(rt_centrality.items()), columns = 
column_names) 75 | reply = pd.DataFrame(list(reply_centrality.items()), columns = column_names) 76 | 77 | # Print first five results in descending order of centrality 78 | print(rt.sort_values('degree_centrality', ascending = False).head()) 79 | 80 | # Print first five results in descending order of centrality 81 | print(reply.sort_values('degree_centrality', ascending = False).head()) 82 | 83 | column_names = ['screen_name', 'betweenness_centrality'] 84 | # Generate betweenness centrality for retweets 85 | rt_centrality = nx.betweenness_centrality(G_rt) 86 | 87 | # Generate betweenness centrality for replies 88 | reply_centrality = nx.betweenness_centrality(G_reply) 89 | 90 | # Store centralities in data frames 91 | rt = pd.DataFrame(list(rt_centrality.items()), columns = column_names) 92 | reply = pd.DataFrame(list(reply_centrality.items()), columns = column_names) 93 | 94 | # Print first five results in descending order of centrality 95 | print(rt.sort_values('betweenness_centrality', ascending = False).head()) 96 | 97 | # Print first five results in descending order of centrality 98 | print(reply.sort_values('betweenness_centrality', ascending = False).head()) 99 | -------------------------------------------------------------------------------- /app/tweets.json: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /app/tweets_data.py: -------------------------------------------------------------------------------- 1 | 2 | 3 | import json 4 | 5 | import pandas as pd 6 | 7 | ##reading the json file and importing as dataframe 8 | 9 | tweets_json = open("tweets.json","r").read() 10 | 11 | 12 | ## split the json file with newline char , load each tweet as dictionary and store as a list of dicts 13 | 14 | def flatten_tweets(tweets_json): 15 | """ Flattens out tweet dictionaries so relevant JSON 16 | is in a top-level dictionary.""" 17 | tweets_list = [] 18 | 19 | # Iterate through each tweet 20 | for tweet in tweets_json: 21 | tweet_obj = json.loads(tweet) 22 | 23 | 
# Store the user screen name in 'user-screen_name' 24 | tweet_obj['user-screen_name'] = tweet_obj["user"]["screen_name"] 25 | 26 | # Check if this is a 140+ character tweet 27 | if 'extended_tweet' in tweet_obj: 28 | # Store the extended tweet text in 'extended_tweet-full_text' 29 | tweet_obj['extended_tweet-full_text'] = tweet_obj["extended_tweet"]["full_text"] 30 | 31 | if 'retweeted_status' in tweet_obj: 32 | # Store the retweet user screen name in 'retweeted_status-user-screen_name' 33 | tweet_obj['retweeted_status-user-screen_name'] = tweet_obj["retweeted_status"]["user"]["screen_name"] 34 | 35 | # Store the retweet text in 'retweeted_status-text' 36 | tweet_obj['retweeted_status-text'] = tweet_obj["retweeted_status"]["text"] 37 | if 'extended_tweet' in tweet_obj["retweeted_status"]: 38 | tweet_obj["retweeted_status-extended_tweet-full_text"] = tweet_obj["retweeted_status"]["extended_tweet"]["full_text"] 39 | if 'quoted_status' in tweet_obj: 40 | tweet_obj["quoted_status-text"] = tweet_obj["quoted_status"]["text"] 41 | if 'extended_tweet' in tweet_obj["quoted_status"]: 42 | tweet_obj["quoted_status-extended_tweet-full_text"] = tweet_obj["quoted_status"]["extended_tweet"]["full_text"] 43 | tweets_list.append(tweet_obj) 44 | return tweets_list 45 | 46 | ## Converts list of json into list of dictionaries 47 | 48 | 49 | 50 | 51 | with open("tweets.json","r") as fh: 52 | tweets = fh.read().split("\n") 53 | 54 | tweets = [i for i in tweets if i] 55 | list_of_tweets = flatten_tweets(tweets) 56 | ## storing tweets as list of dictionaries 57 | 58 | 59 | 60 | # Create a DataFrame from `tweets` 61 | ds_tweets = pd.DataFrame(list_of_tweets) 62 | 63 | def get_tweets(): 64 | return ds_tweets 65 | 66 | 67 | -------------------------------------------------------------------------------- /app/twitter_config.py: -------------------------------------------------------------------------------- 1 | 2 | api_key = 'your api key' 3 | api_secret_key = 'your api secret' 4 | 
access_token = 'your api access token' 5 | access_token_secret = 'your access token secret' 6 | -------------------------------------------------------------------------------- /issue_template.md: -------------------------------------------------------------------------------- 1 | * **I'm submitting a ...** 2 | - [ ] bug report 3 | - [ ] feature request 4 | - [ ] change/modification 5 | - [ ] design issue 6 | 7 | 8 | * **Do you want to request a *feature* or report a *bug*?** 9 | 10 | 11 | 12 | * **What is the current behavior?** 13 | 14 | 15 | 16 | * **If the current behavior is a bug, please provide the steps to reproduce and if possible a minimal demo of the problem** 17 | 18 | 19 | 20 | * **What is the expected behavior?** 21 | 22 | 23 | 24 | * **What is the motivation / use case for changing the behavior?** 25 | 26 | 27 | 28 | * **Please tell us about your environment:** 29 | 30 | - Version: 31 | - Browser: [all | Chrome XX | Firefox XX | IE XX | Safari XX | Mobile Chrome XX | Android X.X Web Browser | iOS XX Safari | iOS XX UIWebView | iOS XX WKWebView ] 32 | - Language: [all | TypeScript X.X | ES6/7 | ES5 | Dart] 33 | 34 | 35 | * **Other information** (e.g. detailed explanation, stacktraces, related issues, suggestions how to fix, links for us to have context, eg. stackoverflow, gitter, etc) 36 | -------------------------------------------------------------------------------- /pull_request_template.md: -------------------------------------------------------------------------------- 1 | A similar PR may already be submitted! 2 | Please search among the [Pull request](../) before creating one. 3 | 4 | Thanks for submitting a pull request! Please provide enough information so that others can review your pull request: 5 | 6 | For more information, see the `CONTRIBUTING` guide. 7 | 8 | 9 | * **What kind of change does this PR introduce?** (Bug fix, feature, docs update, ...) 
10 | 11 | 12 | 13 | * **What is the current behavior?** (You can also link to an open issue here) 14 | 15 | 16 | 17 | * **What is the new behavior (if this is a feature change)?** 18 | 19 | 20 | 21 | * **Does this PR introduce a breaking change?** (What changes might users need to make in their application due to this PR?) 22 | 23 | 24 | 25 | * **Other information**: 26 | 27 | 28 | 29 | Explain the **motivation** for making this change. What existing problem does the pull request solve? 30 | 31 | 32 | 33 | **Test plan (required)** 34 | 35 | Demonstrate the code is solid. Example: The exact commands you ran and their output, screenshots / videos if the pull request changes UI. 36 | 37 | 38 | 39 | **Code formatting** 40 | 41 | 42 | 43 | **Closing issues** 44 | 45 | 46 | Fixes # 47 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | matplotlib==3.3.4 2 | networkx==2.5 3 | nltk==3.6.1 4 | numpy==1.20.1 5 | pandas==1.2.4 6 | tweepy==3.7.0 7 | --------------------------------------------------------------------------------