├── AdvScraper
│   ├── GetOldTweets3
│   │   ├── GetOldTweets3_Article_Scraper.ipynb
│   │   └── GetOldTweets3_Companion_Scraper.ipynb
│   ├── README.md
│   ├── Tweepy
│   │   ├── Tweepy_Article_Scraper.ipynb
│   │   ├── Tweepy_Companion_Scraper.ipynb
│   │   └── credentials.csv
│   └── Tweepy_and_GetOldTweets3.ipynb
├── BasicScraper
│   ├── GetOldTweets3_Basic_Scraper.ipynb
│   ├── README.md
│   └── Tweepy_Basic_Scraper.ipynb
├── README.md
├── ScraperV4
│   ├── README.md
│   └── Tweepy_Scraper_V4.ipynb
└── snscrape
    ├── README.md
    ├── cli-with-python
    │   ├── snscrape-python-cli.ipynb
    │   └── snscrape-python-cli.py
    └── python-wrapper
        ├── snscrape-python-wrapper.ipynb
        └── snscrape-python-wrapper.py

/AdvScraper/GetOldTweets3/GetOldTweets3_Article_Scraper.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Article Notebook for Scraping Twitter Using GetOldTweets3\n", 8 | "\n", 9 | "Package: https://github.com/Mottl/GetOldTweets3\n", 10 | "\n", 11 | "Article Read-Along: https://towardsdatascience.com/how-to-scrape-more-information-from-tweets-on-twitter-44fd540b8a1f\n", 12 | "\n", 13 | "### Notebook Author: Martin Beck\n", 14 | "#### Information current as of August 13th, 2020\n", 15 | " Dependencies: Make sure GetOldTweets3 is already installed in your Python environment. If not, you can pip install GetOldTweets3 to install the package. If you want more information on setting up, I have an article [here](https://towardsdatascience.com/how-to-scrape-tweets-from-twitter-59287e20f0f1) that goes into greater detail." 16 | ] 17 | }, 18 | { 19 | "cell_type": "markdown", 20 | "metadata": {}, 21 | "source": [ 22 | "## Notebook's Table of Contents\n", 23 | "\n", 24 | "1. [Getting More Information From Tweets](#Section1)\n", 25 | "
How to scrape more information from tweets such as favorite count, retweet count, mentions, permalinks, etc.\n", 26 | "2. [Getting User Information From Tweets](#Section2)\n", 27 | "
GetOldTweets3 does not offer any more user information than the author's screen name or Twitter @ name, which is shown in section 1.\n", 28 | "3. [Scraping Tweets With Advanced Queries](#Section3)\n", 29 | "
How to scrape for tweets using deeper queries such as searching by language of tweets, tweets within a certain location, tweets within specific date ranges, top tweets, etc.\n", 30 | "4. [Putting It All Together](#Section4)\n", 31 | "
Showcasing how you can mix and match the methods shown above to create queries that'll fulfill your data needs." 32 | ] 33 | }, 34 | { 35 | "cell_type": "markdown", 36 | "metadata": {}, 37 | "source": [ 38 | "## Imports for Notebook" 39 | ] 40 | }, 41 | { 42 | "cell_type": "code", 43 | "execution_count": 27, 44 | "metadata": {}, 45 | "outputs": [], 46 | "source": [ 47 | "# Pip install GetOldTweets3 if you don't already have the package\n", 48 | "# !pip install GetOldTweets3\n", 49 | "\n", 50 | "# Imports\n", 51 | "import GetOldTweets3 as got\n", 52 | "import pandas as pd" 53 | ] 54 | }, 55 | { 56 | "cell_type": "markdown", 57 | "metadata": {}, 58 | "source": [ 59 | "## 1. Getting More Information From Tweets \n", 60 | "[Return to Table of Contents](#TOC)\n", 61 | "
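Before walking the field-by-field list below, it can help to inspect everything a GetOldTweets3 tweet object actually exposes in one shot. A minimal sketch, assuming the package is installed and the legacy endpoints it scrapes still respond (per the repository README, they no longer do):

```python
import GetOldTweets3 as got

# Pull a single tweet and dump every attribute the Tweet object carries.
criteria = got.manager.TweetCriteria().setUsername('jack').setMaxTweets(1)
tweet = got.manager.TweetManager.getTweets(criteria)[0]

# vars() returns the instance's attribute dictionary, so each field in the
# list below (id, username, text, retweets, ...) prints with a live value.
for name, value in vars(tweet).items():
    print(f'{name}: {value}')
```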
\n", 62 | "List of information available in the tweet object with GetOldTweets3\n", 63 | "* tweet.geo: *NOTE GEO-DATA NOT WORKING BASED ON ISSUE

\n", 64 | "\n", 65 | "* tweet.id: Id of tweet\n", 66 | "* tweet.author_id: User id of tweet's author\n", 67 | "* tweet.username: Username of tweet's author, commonly called User @ name\n", 68 | "* tweet.to: If tweet is a reply, the original tweet's username\n", 69 | "* tweet.text: Text content of tweet\n", 70 | "* tweet.retweets: Count of retweets\n", 71 | "* tweet.favorites: Count of favorites\n", 72 | "* tweet.replies: Count of replies\n", 73 | "* tweet.date: Date tweet was created\n", 74 | "* tweet.formatted_date: Formatted version of when tweet was created\n", 75 | "* tweet.hashtags: Hashtags that tweet contains\n", 76 | "* tweet.mentions: Mentions of other users that tweet contains\n", 77 | "* tweet.urls: Urls that are in the tweet\n", 78 | "* tweet.permalink: Permalink of tweet itself" 79 | ] 80 | }, 81 | { 82 | "cell_type": "code", 83 | "execution_count": 35, 84 | "metadata": {}, 85 | "outputs": [], 86 | "source": [ 87 | "username = 'jack'\n", 88 | "count = 150\n", 89 | " \n", 90 | "# Creation of tweetCriteria query object with methods to specify further\n", 91 | "tweetCriteria = got.manager.TweetCriteria().setUsername(username)\\\n", 92 | ".setMaxTweets(count)\n", 93 | " \n", 94 | "# Creation of tweets iterable containing all queried tweet data\n", 95 | "tweets = got.manager.TweetManager.getTweets(tweetCriteria)\n", 96 | " \n", 97 | "# List comprehension pulling chosen tweet information from tweets\n", 98 | "# Add or remove tweet information you want in the below list comprehension\n", 99 | "tweets_list = [[tweet.id, tweet.author_id, tweet.username, tweet.to, tweet.text, tweet.retweets, tweet.favorites, tweet.replies, tweet.date, tweet.formatted_date, tweet.hashtags, tweet.mentions, tweet.urls, tweet.permalink,] for tweet in tweets]\n", 100 | " \n", 101 | "# Creation of dataframe from tweets_list\n", 102 | "# Add or remove columns as you remove tweet information\n", 103 | "tweets_df1 = pd.DataFrame(tweets_list, columns = ['Tweet Id', 'Tweet User Id', 'Tweet User','Reply to', 'Text','Retweets', 'Favorites', 'Replies', 'Datetime',\n", 104 | " 'Formatted date', 'Hashtags','Mentions','Urls','Permalink'])" 105 | ] 106 | }, 107 | { 108 | "cell_type": "code", 109 | "execution_count": 36, 110 | "metadata": {}, 111 | "outputs": [ 112 | { 113 | "data": { 114 | "text/html": [ 115 | "
\n", 116 | "\n", 129 | "\n", 130 | " \n", 131 | " \n", 132 | " \n", 133 | " \n", 134 | " \n", 135 | " \n", 136 | " \n", 137 | " \n", 138 | " \n", 139 | " \n", 140 | " \n", 141 | " \n", 142 | " \n", 143 | " \n", 144 | " \n", 145 | " \n", 146 | " \n", 147 | " \n", 148 | " \n", 149 | " \n", 150 | " \n", 151 | " \n", 152 | " \n", 153 | " \n", 154 | " \n", 155 | " \n", 156 | " \n", 157 | " \n", 158 | " \n", 159 | " \n", 160 | " \n", 161 | " \n", 162 | " \n", 163 | " \n", 164 | " \n", 165 | " \n", 166 | " \n", 167 | " \n", 168 | " \n", 169 | " \n", 170 | " \n", 171 | " \n", 172 | " \n", 173 | " \n", 174 | " \n", 175 | " \n", 176 | " \n", 177 | " \n", 178 | " \n", 179 | " \n", 180 | " \n", 181 | " \n", 182 | " \n", 183 | " \n", 184 | " \n", 185 | " \n", 186 | " \n", 187 | " \n", 188 | " \n", 189 | " \n", 190 | " \n", 191 | " \n", 192 | " \n", 193 | " \n", 194 | " \n", 195 | " \n", 196 | " \n", 197 | " \n", 198 | " \n", 199 | " \n", 200 | " \n", 201 | " \n", 202 | " \n", 203 | " \n", 204 | " \n", 205 | " \n", 206 | " \n", 207 | " \n", 208 | " \n", 209 | " \n", 210 | " \n", 211 | " \n", 212 | " \n", 213 | " \n", 214 | " \n", 215 | " \n", 216 | " \n", 217 | " \n", 218 | " \n", 219 | " \n", 220 | " \n", 221 | " \n", 222 | " \n", 223 | " \n", 224 | " \n", 225 | " \n", 226 | " \n", 227 | " \n", 228 | " \n", 229 | " \n", 230 | " \n", 231 | " \n", 232 | " \n", 233 | " \n", 234 | " \n", 235 | " \n", 236 | "
Tweet IdTweet User IdTweet UserReply toTextRetweetsFavoritesRepliesDatetimeFormatted dateHashtagsMentionsUrlsPermalink
0129476528925570662412jackjsngrJordan is incredible1161272622020-08-15 22:37:55+00:00Sat Aug 15 22:37:55 +0000 2020https://twitter.com/jsngr/status/1294635175222...https://twitter.com/jack/status/12947652892557...
1129375388415923405012jackSpaceForceDoD?74191135832020-08-13 03:38:57+00:00Thu Aug 13 03:38:57 +0000 2020https://twitter.com/spaceforcedod/status/12936...https://twitter.com/jack/status/12937538841592...
2129368763667522355212jackTwitterDevBuild on Twitter again!61949454422020-08-12 23:15:42+00:00Wed Aug 12 23:15:42 +0000 2020https://twitter.com/TwitterDev/status/12935935...https://twitter.com/jack/status/12936876366752...
3129364129745938841612jackboardroomThanks for the chat @richkleiman and Gianni! G...52385892020-08-12 20:11:34+00:00Wed Aug 12 20:11:34 +0000 2020@richkleimanhttps://twitter.com/boardroom/status/129356427...https://twitter.com/jack/status/12936412974593...
4129195627381499084812jackMayalangersegalThank you. Thank you. Thank you. @RemindMe_OfT...293162020-08-08 04:35:53+00:00Sat Aug 08 04:35:53 +0000 2020@RemindMe_OfThishttps://twitter.com/jack/status/12919562738149...
\n", 237 | "
" 238 | ], 239 | "text/plain": [ 240 | " Tweet Id Tweet User Id Tweet User Reply to \\\n", 241 | "0 1294765289255706624 12 jack jsngr \n", 242 | "1 1293753884159234050 12 jack SpaceForceDoD \n", 243 | "2 1293687636675223552 12 jack TwitterDev \n", 244 | "3 1293641297459388416 12 jack boardroom \n", 245 | "4 1291956273814990848 12 jack Mayalangersegal \n", 246 | "\n", 247 | " Text Retweets Favorites \\\n", 248 | "0 Jordan is incredible 116 1272 \n", 249 | "1 ? 741 9113 \n", 250 | "2 Build on Twitter again! 619 4945 \n", 251 | "3 Thanks for the chat @richkleiman and Gianni! G... 52 385 \n", 252 | "4 Thank you. Thank you. Thank you. @RemindMe_OfT... 2 93 \n", 253 | "\n", 254 | " Replies Datetime Formatted date Hashtags \\\n", 255 | "0 62 2020-08-15 22:37:55+00:00 Sat Aug 15 22:37:55 +0000 2020 \n", 256 | "1 583 2020-08-13 03:38:57+00:00 Thu Aug 13 03:38:57 +0000 2020 \n", 257 | "2 442 2020-08-12 23:15:42+00:00 Wed Aug 12 23:15:42 +0000 2020 \n", 258 | "3 89 2020-08-12 20:11:34+00:00 Wed Aug 12 20:11:34 +0000 2020 \n", 259 | "4 16 2020-08-08 04:35:53+00:00 Sat Aug 08 04:35:53 +0000 2020 \n", 260 | "\n", 261 | " Mentions Urls \\\n", 262 | "0 https://twitter.com/jsngr/status/1294635175222... \n", 263 | "1 https://twitter.com/spaceforcedod/status/12936... \n", 264 | "2 https://twitter.com/TwitterDev/status/12935935... \n", 265 | "3 @richkleiman https://twitter.com/boardroom/status/129356427... \n", 266 | "4 @RemindMe_OfThis \n", 267 | "\n", 268 | " Permalink \n", 269 | "0 https://twitter.com/jack/status/12947652892557... \n", 270 | "1 https://twitter.com/jack/status/12937538841592... \n", 271 | "2 https://twitter.com/jack/status/12936876366752... \n", 272 | "3 https://twitter.com/jack/status/12936412974593... \n", 273 | "4 https://twitter.com/jack/status/12919562738149... " 274 | ] 275 | }, 276 | "execution_count": 36, 277 | "metadata": {}, 278 | "output_type": "execute_result" 279 | } 280 | ], 281 | "source": [ 282 | "tweets_df1.head()" 283 | ] 284 | }, 285 | { 286 | "cell_type": "markdown", 287 | "metadata": {}, 288 | "source": [ 289 | "## 2. Getting User Information From Tweets\n", 290 | "[Return to Table of Contents](#TOC)\n", 291 | "
GetOldTweets3 is limited in the user information that is accessible. This library only allows access to a tweet author's username and user_id. If you want user information, I recommend using Tweepy for all of your scraping, or using Tweepy in tandem with GetOldTweets3 (as sketched above) to play to each library's strengths." 292 | ] 293 | }, 294 | { 295 | "cell_type": "markdown", 296 | "metadata": {}, 297 | "source": [ 298 | "## 3. Scraping Tweets With Advanced Queries\n", 299 | "[Return to Table of Contents](#TOC)\n", 300 | "
\n", 301 | "List of methods available with GetOldTweets3 to refine your queries.\n", 302 | "\n", 303 | "* setUsername(str): Setting query based on username\n", 304 | "* setMaxTweets(int): Setting maximum number of tweets to search\n", 305 | "* setQuerySearch(str): Setting query based on text\n", 306 | "* setSince(str \"yyyy-mm-dd\"): Setting lower bound date on query\n", 307 | "* setUntil(str \"yyyy-mm-dd\"): Setting upper bound date on query\n", 308 | "* setNear(str): Setting location of query search\n", 309 | "* setWithin(str): Setting radius of query search location\n", 310 | "* setLang(str): Setting language of query\n", 311 | "* setTopTweets(bool): Setting query to search only for top tweets\n", 312 | "* setEmoji(\"ignore\"/\"unicode\"/\"name\"): Setting query to search using emoji styles" 313 | ] 314 | }, 315 | { 316 | "cell_type": "code", 317 | "execution_count": 37, 318 | "metadata": {}, 319 | "outputs": [], 320 | "source": [ 321 | "username = \"BarackObama\"\n", 322 | "text_query = \"Hello\"\n", 323 | "since_date = \"2011-01-01\"\n", 324 | "until_date = \"2016-12-20\"\n", 325 | "count = 150\n", 326 | " \n", 327 | "# Creation of tweetCriteria query object with methods to specify further\n", 328 | "tweetCriteria = got.manager.TweetCriteria().setUsername(username)\\\n", 329 | ".setQuerySearch(text_query).setSince(since_date)\\\n", 330 | ".setUntil(until_date).setMaxTweets(count)\n", 331 | " \n", 332 | "# Creation of tweets iterable containing all queried tweet data\n", 333 | "tweets = got.manager.TweetManager.getTweets(tweetCriteria)\n", 334 | " \n", 335 | "# List comprehension pulling chosen tweet information from tweets\n", 336 | "# Add or remove tweet information you want in the below list comprehension\n", 337 | "tweets_list = [[tweet.id, tweet.author_id, tweet.username, tweet.text, tweet.retweets, tweet.favorites,tweet.replies,tweet.date] for tweet in tweets]\n", 338 | " \n", 339 | "# Creation of dataframe from tweets list\n", 340 | "# Add or remove columns as you remove tweet information\n", 341 | "tweets_df3 = pd.DataFrame(tweets_list, columns = ['Tweet Id', 'Tweet User Id', 'Tweet User', 'Text','Retweets', 'Favorites', \n", 342 | " 'Replies', 'Datetime'])" 343 | ] 344 | }, 345 | { 346 | "cell_type": "code", 347 | "execution_count": 38, 348 | "metadata": { 349 | "scrolled": false 350 | }, 351 | "outputs": [ 352 | { 353 | "data": { 354 | "text/html": [ 355 | "
\n", 356 | "\n", 369 | "\n", 370 | " \n", 371 | " \n", 372 | " \n", 373 | " \n", 374 | " \n", 375 | " \n", 376 | " \n", 377 | " \n", 378 | " \n", 379 | " \n", 380 | " \n", 381 | " \n", 382 | " \n", 383 | " \n", 384 | " \n", 385 | " \n", 386 | " \n", 387 | " \n", 388 | " \n", 389 | " \n", 390 | " \n", 391 | " \n", 392 | " \n", 393 | " \n", 394 | " \n", 395 | " \n", 396 | " \n", 397 | " \n", 398 | " \n", 399 | " \n", 400 | " \n", 401 | " \n", 402 | " \n", 403 | " \n", 404 | " \n", 405 | " \n", 406 | " \n", 407 | " \n", 408 | " \n", 409 | " \n", 410 | " \n", 411 | " \n", 412 | " \n", 413 | " \n", 414 | " \n", 415 | " \n", 416 | " \n", 417 | " \n", 418 | " \n", 419 | " \n", 420 | " \n", 421 | " \n", 422 | " \n", 423 | " \n", 424 | " \n", 425 | " \n", 426 | " \n", 427 | " \n", 428 | " \n", 429 | " \n", 430 | " \n", 431 | " \n", 432 | " \n", 433 | " \n", 434 | " \n", 435 | " \n", 436 | " \n", 437 | " \n", 438 | " \n", 439 | " \n", 440 | "
Tweet IdTweet User IdTweet UserTextRetweetsFavoritesRepliesDatetime
0682986933862154241813286BarackObamaHello, 2016.3506130107602016-01-01 18:09:08+00:00
1547783171199496192813286BarackObamaSay hello to friends you know and everyone you...3555907510872014-12-24 15:57:39+00:00
2457281289351999489813286BarackObamaHello, spring.58071008910402014-04-18 22:15:30+00:00
3438453976833343488813286BarackObama\"Hello OFA!\" —President Obama at the #ActionSu...134244572014-02-25 23:22:28+00:00
4265569746991333377813286BarackObama“Hello, Columbus! Hello, Ohio! Are you fired u...513208812012-11-05 21:42:16+00:00
\n", 441 | "
" 442 | ], 443 | "text/plain": [ 444 | " Tweet Id Tweet User Id Tweet User \\\n", 445 | "0 682986933862154241 813286 BarackObama \n", 446 | "1 547783171199496192 813286 BarackObama \n", 447 | "2 457281289351999489 813286 BarackObama \n", 448 | "3 438453976833343488 813286 BarackObama \n", 449 | "4 265569746991333377 813286 BarackObama \n", 450 | "\n", 451 | " Text Retweets Favorites \\\n", 452 | "0 Hello, 2016. 3506 13010 \n", 453 | "1 Say hello to friends you know and everyone you... 3555 9075 \n", 454 | "2 Hello, spring. 5807 10089 \n", 455 | "3 \"Hello OFA!\" —President Obama at the #ActionSu... 134 244 \n", 456 | "4 “Hello, Columbus! Hello, Ohio! Are you fired u... 513 208 \n", 457 | "\n", 458 | " Replies Datetime \n", 459 | "0 760 2016-01-01 18:09:08+00:00 \n", 460 | "1 1087 2014-12-24 15:57:39+00:00 \n", 461 | "2 1040 2014-04-18 22:15:30+00:00 \n", 462 | "3 57 2014-02-25 23:22:28+00:00 \n", 463 | "4 81 2012-11-05 21:42:16+00:00 " 464 | ] 465 | }, 466 | "execution_count": 38, 467 | "metadata": {}, 468 | "output_type": "execute_result" 469 | } 470 | ], 471 | "source": [ 472 | "tweets_df3.head()" 473 | ] 474 | }, 475 | { 476 | "cell_type": "markdown", 477 | "metadata": {}, 478 | "source": [ 479 | "## 4. Putting It All Together\n", 480 | "[Return to Table of Contents](#TOC)\n", 481 | "
\n", 482 | "Great, we now know how to pull more information from tweets and querying with advanced parameters. The great thing is how easy it is to mix and match whatever you want to search for. While it was shown above several times. The point is that you can mix and match the information you want from the tweets and the type of queries you conduct. It's just important that you update the column names in the pandas dataframe so you don't get errors.\n", 483 | "\n", 484 | "
\n", 485 | "Below is an example of a search for 150 top tweets with 'coronavirus' in it that occurred between August 5th and August 8th 2020 in Washington D.C." 486 | ] 487 | }, 488 | { 489 | "cell_type": "code", 490 | "execution_count": 39, 491 | "metadata": {}, 492 | "outputs": [], 493 | "source": [ 494 | "text_query = 'Coronavirus'\n", 495 | "since_date = '2020-08-05'\n", 496 | "until_date = '2020-08-10'\n", 497 | "location = 'Washington, D.C.'\n", 498 | "top_tweets = True\n", 499 | "count = 150\n", 500 | " \n", 501 | "# Creation of tweetCriteria query object with methods to specify further\n", 502 | "tweetCriteria = got.manager.TweetCriteria()\\\n", 503 | ".setQuerySearch(text_query).setSince(since_date)\\\n", 504 | ".setUntil(until_date).setNear(location).setTopTweets(top_tweets)\\\n", 505 | ".setMaxTweets(count)\n", 506 | " \n", 507 | "# Creation of tweets iterable containing all queried tweet data\n", 508 | "tweets = got.manager.TweetManager.getTweets(tweetCriteria)\n", 509 | " \n", 510 | "# List comprehension pulling chosen tweet information from tweets\n", 511 | "# Add or remove tweet information you want in the below list comprehension\n", 512 | "tweets_list = [[tweet.id, tweet.author_id, tweet.username, tweet.to, tweet.text, tweet.retweets, tweet.favorites, tweet.replies, tweet.date, tweet.mentions, tweet.urls, tweet.permalink,] for tweet in tweets]\n", 513 | " \n", 514 | "# Creation of dataframe from tweets list\n", 515 | "# Add or remove columns as you remove tweet information\n", 516 | "tweets_df4 = pd.DataFrame(tweets_list, columns = ['Tweet Id', 'Tweet User Id', 'Tweet User','Reply to', 'Text',\n", 517 | " 'Retweets', 'Favorites', 'Replies', 'Datetime', 'Mentions','Urls','Permalink'])" 518 | ] 519 | }, 520 | { 521 | "cell_type": "code", 522 | "execution_count": 40, 523 | "metadata": {}, 524 | "outputs": [ 525 | { 526 | "data": { 527 | "text/html": [ 528 | "
\n", 529 | "\n", 542 | "\n", 543 | " \n", 544 | " \n", 545 | " \n", 546 | " \n", 547 | " \n", 548 | " \n", 549 | " \n", 550 | " \n", 551 | " \n", 552 | " \n", 553 | " \n", 554 | " \n", 555 | " \n", 556 | " \n", 557 | " \n", 558 | " \n", 559 | " \n", 560 | " \n", 561 | " \n", 562 | " \n", 563 | " \n", 564 | " \n", 565 | " \n", 566 | " \n", 567 | " \n", 568 | " \n", 569 | " \n", 570 | " \n", 571 | " \n", 572 | " \n", 573 | " \n", 574 | " \n", 575 | " \n", 576 | " \n", 577 | " \n", 578 | " \n", 579 | " \n", 580 | " \n", 581 | " \n", 582 | " \n", 583 | " \n", 584 | " \n", 585 | " \n", 586 | " \n", 587 | " \n", 588 | " \n", 589 | " \n", 590 | " \n", 591 | " \n", 592 | " \n", 593 | " \n", 594 | " \n", 595 | " \n", 596 | " \n", 597 | " \n", 598 | " \n", 599 | " \n", 600 | " \n", 601 | " \n", 602 | " \n", 603 | " \n", 604 | " \n", 605 | " \n", 606 | " \n", 607 | " \n", 608 | " \n", 609 | " \n", 610 | " \n", 611 | " \n", 612 | " \n", 613 | " \n", 614 | " \n", 615 | " \n", 616 | " \n", 617 | " \n", 618 | " \n", 619 | " \n", 620 | " \n", 621 | " \n", 622 | " \n", 623 | " \n", 624 | " \n", 625 | " \n", 626 | " \n", 627 | " \n", 628 | " \n", 629 | " \n", 630 | " \n", 631 | " \n", 632 | " \n", 633 | " \n", 634 | " \n", 635 | " \n", 636 | " \n", 637 | "
Tweet IdTweet User IdTweet UserReply toTextRetweetsFavoritesRepliesDatetimeMentionsUrlsPermalink
01292610170309181447535643852JordanSchachtelNoneFauci had a very interesting Q&A this weekend ...276563922020-08-09 23:54:14+00:00https://www.cnbc.com/2020/08/07/coronavirus-va...https://twitter.com/JordanSchachtel/status/129...
11292584089833349121225265639ddale8NoneIf the president confused you about what was a...174334811432020-08-09 22:10:36+00:00https://cnn.it/31zwwirhttps://twitter.com/ddale8/status/129258408983...
2129254381123584000053809979davidalimNoneAntigen tests have been touted as a way to sca...164212272020-08-09 19:30:33+00:00@rachel_roubeinhttps://www.politico.com/news/2020/08/09/coron...https://twitter.com/davidalim/status/129254381...
3129252566042293043218956073dcexaminerNoneA Nashville, Tennessee, councilwoman wants tho...3152743902020-08-09 18:18:26+00:00https://washex.am/3kD8L1Ehttps://twitter.com/dcexaminer/status/12925256...
41292468804648394752309822757ryanstruykNoneThe United States just reached 5 million repor...9741257522020-08-09 14:32:30+00:00https://twitter.com/ryanstruyk/status/12924688...
\n", 638 | "
" 639 | ], 640 | "text/plain": [ 641 | " Tweet Id Tweet User Id Tweet User Reply to \\\n", 642 | "0 1292610170309181447 535643852 JordanSchachtel None \n", 643 | "1 1292584089833349121 225265639 ddale8 None \n", 644 | "2 1292543811235840000 53809979 davidalim None \n", 645 | "3 1292525660422930432 18956073 dcexaminer None \n", 646 | "4 1292468804648394752 309822757 ryanstruyk None \n", 647 | "\n", 648 | " Text Retweets Favorites \\\n", 649 | "0 Fauci had a very interesting Q&A this weekend ... 276 563 \n", 650 | "1 If the president confused you about what was a... 1743 3481 \n", 651 | "2 Antigen tests have been touted as a way to sca... 164 212 \n", 652 | "3 A Nashville, Tennessee, councilwoman wants tho... 315 274 \n", 653 | "4 The United States just reached 5 million repor... 974 1257 \n", 654 | "\n", 655 | " Replies Datetime Mentions \\\n", 656 | "0 92 2020-08-09 23:54:14+00:00 \n", 657 | "1 143 2020-08-09 22:10:36+00:00 \n", 658 | "2 27 2020-08-09 19:30:33+00:00 @rachel_roubein \n", 659 | "3 390 2020-08-09 18:18:26+00:00 \n", 660 | "4 52 2020-08-09 14:32:30+00:00 \n", 661 | "\n", 662 | " Urls \\\n", 663 | "0 https://www.cnbc.com/2020/08/07/coronavirus-va... \n", 664 | "1 https://cnn.it/31zwwir \n", 665 | "2 https://www.politico.com/news/2020/08/09/coron... \n", 666 | "3 https://washex.am/3kD8L1E \n", 667 | "4 \n", 668 | "\n", 669 | " Permalink \n", 670 | "0 https://twitter.com/JordanSchachtel/status/129... \n", 671 | "1 https://twitter.com/ddale8/status/129258408983... \n", 672 | "2 https://twitter.com/davidalim/status/129254381... \n", 673 | "3 https://twitter.com/dcexaminer/status/12925256... \n", 674 | "4 https://twitter.com/ryanstruyk/status/12924688... " 675 | ] 676 | }, 677 | "execution_count": 40, 678 | "metadata": {}, 679 | "output_type": "execute_result" 680 | } 681 | ], 682 | "source": [ 683 | "tweets_df4.head()" 684 | ] 685 | } 686 | ], 687 | "metadata": { 688 | "kernelspec": { 689 | "display_name": "Python 3", 690 | "language": "python", 691 | "name": "python3" 692 | }, 693 | "language_info": { 694 | "codemirror_mode": { 695 | "name": "ipython", 696 | "version": 3 697 | }, 698 | "file_extension": ".py", 699 | "mimetype": "text/x-python", 700 | "name": "python", 701 | "nbconvert_exporter": "python", 702 | "pygments_lexer": "ipython3", 703 | "version": "3.7.3" 704 | } 705 | }, 706 | "nbformat": 4, 707 | "nbformat_minor": 2 708 | } 709 | -------------------------------------------------------------------------------- /AdvScraper/GetOldTweets3/GetOldTweets3_Companion_Scraper.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Companion Notebook for Scraping Twitter Using GetOldTweets3\n", 8 | "\n", 9 | "Package: https://github.com/Mottl/GetOldTweets3\n", 10 | "\n", 11 | "Article Read-Along: https://towardsdatascience.com/how-to-scrape-more-information-from-tweets-on-twitter-44fd540b8a1f\n", 12 | "\n", 13 | "### Notebook Author: Martin Beck\n", 14 | "#### Information current as of August, 13th 2020\n", 15 | " Dependencies: Make sure GetOldTweets3 is already installed in your Python environment. If not, you can pip install GetOldTweets3 to install the package. If you want more information on setting up I have an article [here](https://towardsdatascience.com/how-to-scrape-tweets-from-twitter-59287e20f0f1) that goes into deeper detail." 
16 | ] 17 | }, 18 | { 19 | "cell_type": "markdown", 20 | "metadata": {}, 21 | "source": [ 22 | "## Notebook's Table of Contents\n", 23 | "
\n", 24 | "This companion notebook is meant to build on the scraping article and article notebook as it covers more scenarios that may come up and provides more examples.\n", 25 | "\n", 26 | "1. [Getting More Information From Tweets](#Section1)\n", 27 | "
How to scrape more information from tweets such as favorite count, retweet count, mentions, permalinks, etc.\n", 28 | "2. [Getting User Information From Tweets](#Section2)\n", 29 | "
GetOldTweets3 does not offer any more user information than the author's screen name or Twitter @ name, which is shown in section 1.\n", 30 | "3. [Scraping Tweets With Advanced Queries](#Section3)\n", 31 | "
How to scrape for tweets using deeper queries such as searching by language of tweets, tweets within a certain location, tweets within specific date ranges, top tweets, etc.\n", 32 | "4. [Putting It All Together](#Section4)\n", 33 | "
Showcasing how you can mix and match the methods shown above to create queries that'll fulfill your data needs." 34 | ] 35 | }, 36 | { 37 | "cell_type": "markdown", 38 | "metadata": {}, 39 | "source": [ 40 | "## Imports for Notebook" 41 | ] 42 | }, 43 | { 44 | "cell_type": "code", 45 | "execution_count": 3, 46 | "metadata": {}, 47 | "outputs": [], 48 | "source": [ 49 | "# Pip install GetOldTweets3 if you don't already have the package\n", 50 | "# !pip install GetOldTweets3\n", 51 | "\n", 52 | "# Imports\n", 53 | "import GetOldTweets3 as got\n", 54 | "import pandas as pd" 55 | ] 56 | }, 57 | { 58 | "cell_type": "markdown", 59 | "metadata": {}, 60 | "source": [ 61 | "## 1. Getting More Information From Tweets \n", 62 | "[Return to Table of Contents](#TOC)\n", 63 | "
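The functions in this section loop over whole username lists, so longer runs benefit from progress feedback and a short pause between users. A sketch, assuming tqdm is installed and using the scrape_user_tweets function defined later in this section:

```python
import time
from tqdm import tqdm

usernames = ['jack', 'billgates', 'random']
for username in tqdm(usernames, desc='Scraping users'):
    scrape_user_tweets(username, 150)
    time.sleep(2)  # brief pause between users to go easy on the endpoints
```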
\n", 64 | "List of information available in the tweet object with GetOldTweets3 I included everything except geo data due to issues that are currently still open.\n", 65 | "\n", 66 | "* tweet.geo: *NOTE GEO-DATA NOT WORKING BASED ON ISSUE

\n", 67 | "\n", 68 | "* tweet.id: Id of tweet\n", 69 | "* tweet.author_id: User id of tweet's author\n", 70 | "* tweet.username: Username of tweet's author, commonly called User's @ name\n", 71 | "* tweet.to: If tweet is a reply, the original tweet's username\n", 72 | "* tweet.text: Text content of tweet\n", 73 | "* tweet.retweets: Count of retweets\n", 74 | "* tweet.favorites: Count of favorites\n", 75 | "* tweet.replies: Count of replies\n", 76 | "* tweet.date: Date tweet was created\n", 77 | "* tweet.formatted_date: Formatted version of when tweet was created\n", 78 | "* tweet.hashtags: Hashtags that tweet contains\n", 79 | "* tweet.mentions: Mentions of other users that tweet contains\n", 80 | "* tweet.urls: Urls that are in the tweet\n", 81 | "* tweet.permalink: Permalink of tweet itself" 82 | ] 83 | }, 84 | { 85 | "cell_type": "markdown", 86 | "metadata": {}, 87 | "source": [ 88 | "### Query by Username\n", 89 | "I created three functions to build off of based off of various scenarios that are likely to happen for someone scraping tweets from users. After each function I call them to showcase an example of them being used.\n", 90 | "\n", 91 | "#### F1. scrape_user_tweets\n", 92 | "This function scrapes a single users tweets and exports the data as a csv or excel file\n", 93 | "\n", 94 | "#### F2. scrape_multiple_users_multifile\n", 95 | "This function scrapes multiple users based on a list and exports separate csv or excel files per user.\n", 96 | "\n", 97 | "#### F3. scrape_multiple_users_singlefile\n", 98 | "This function scrapes multiple users based on a list and exports one csv or excel file containing all tweets" 99 | ] 100 | }, 101 | { 102 | "cell_type": "code", 103 | "execution_count": 34, 104 | "metadata": {}, 105 | "outputs": [], 106 | "source": [ 107 | "def scrape_user_tweets(username, max_tweets):\n", 108 | " # Creation of query object\n", 109 | " tweetCriteria = got.manager.TweetCriteria().setUsername(username)\\\n", 110 | " .setMaxTweets(max_tweets)\n", 111 | " # Creation of list that contains all tweets\n", 112 | " tweets = got.manager.TweetManager.getTweets(tweetCriteria)\n", 113 | "\n", 114 | " # Pulling information from tweets iterable object\n", 115 | " # Add or remove tweet information you want in the below list comprehension\n", 116 | " tweets_list = [[tweet.id, tweet.author_id, tweet.username, tweet.to, tweet.text, tweet.retweets, tweet.favorites,\n", 117 | " tweet.replies,tweet.date, tweet.formatted_date, tweet.hashtags, \n", 118 | " tweet.mentions, tweet.urls, tweet.permalink,] for tweet in tweets]\n", 119 | "\n", 120 | " # Creation of dataframe from tweets list\n", 121 | " # Add or remove columns as you remove tweet information\n", 122 | " tweets_df = pd.DataFrame(tweets_list, columns = ['Tweet Id', 'Tweet User Id', 'Tweet User','Reply to', 'Text','Retweets', 'Favorites', 'Replies', 'Datetime',\n", 123 | " 'Formatted date', 'Hashtags','Mentions','Urls','Permalink'])\n", 124 | " \n", 125 | " # Removing timezone information to allow excel file download\n", 126 | " tweets_df['Datetime'] = tweets_df['Datetime'].apply(lambda x: x.replace(tzinfo=None))\n", 127 | " \n", 128 | " # Uncomment/comment below lines to decide between creating csv or excel file \n", 129 | " tweets_df.to_csv('{}-tweets.csv'.format(username), sep=',', index = False)\n", 130 | "# tweets_df.to_excel('{}-tweets.xlsx'.format(username), index = False)" 131 | ] 132 | }, 133 | { 134 | "cell_type": "code", 135 | "execution_count": 35, 136 | "metadata": {}, 137 | "outputs": [], 138 | "source": [ 139 | 
"# Creating example username to scrape from\n", 140 | "username = 'jack'\n", 141 | "\n", 142 | "# Max recent tweets pulls x amount of most recent tweets from that user\n", 143 | "max_tweets = 150\n", 144 | "\n", 145 | "# Function will scrape username, attempt to pull max_tweet amount, and create csv/excel file from data.\n", 146 | "scrape_user_tweets(username,max_tweets)" 147 | ] 148 | }, 149 | { 150 | "cell_type": "code", 151 | "execution_count": 36, 152 | "metadata": {}, 153 | "outputs": [], 154 | "source": [ 155 | "def scrape_multiple_users_multifile(username_list, max_tweets_per):\n", 156 | " # Looping through each username in user list\n", 157 | " for username in username_list:\n", 158 | " # Creation of query object\n", 159 | " tweetCriteria = got.manager.TweetCriteria().setUsername(username)\\\n", 160 | " .setMaxTweets(max_tweets_per)\n", 161 | " # Creation of list that contains all tweets\n", 162 | " tweets = got.manager.TweetManager.getTweets(tweetCriteria)\n", 163 | "\n", 164 | " # Creating list of chosen tweet data\n", 165 | " # Add or remove tweet information you want in the below list comprehension\n", 166 | " tweets_list = [[tweet.id, tweet.author_id, tweet.username, tweet.to, tweet.text, tweet.retweets, tweet.favorites,\n", 167 | " tweet.replies,tweet.date, tweet.formatted_date, tweet.hashtags, \n", 168 | " tweet.mentions, tweet.urls, tweet.permalink,] for tweet in tweets]\n", 169 | "\n", 170 | " # Creation of dataframe from tweets list\n", 171 | " # Add or remove columns as you remove tweet information\n", 172 | " tweets_df = pd.DataFrame(tweets_list, columns = ['Tweet Id', 'Tweet User Id', 'Tweet User','Reply to', 'Text','Retweets', 'Favorites', 'Replies', 'Datetime',\n", 173 | " 'Formatted date', 'Hashtags','Mentions','Urls','Permalink'])\n", 174 | " \n", 175 | " # Removing timezone information to allow excel file download\n", 176 | " tweets_df['Datetime'] = tweets_df['Datetime'].apply(lambda x: x.replace(tzinfo=None))\n", 177 | " \n", 178 | " # Uncomment/comment below lines to decide between creating csv or excel file \n", 179 | " tweets_df.to_csv('{}-tweets.csv'.format(username), sep=',', index = False)\n", 180 | "# tweets_df.to_excel('{}-tweets.xlsx'.format(username), index = False)" 181 | ] 182 | }, 183 | { 184 | "cell_type": "code", 185 | "execution_count": 37, 186 | "metadata": {}, 187 | "outputs": [], 188 | "source": [ 189 | "# Creating example user list with 3 users\n", 190 | "user_name_list = ['jack','billgates','random']\n", 191 | "\n", 192 | "# Max recent tweets pulls x amount of most recent tweets from that user\n", 193 | "max_tweets_per = 150\n", 194 | "\n", 195 | "# Function will scrape each user, attempting to pull max_tweet amount, and create csv/excel file per user.\n", 196 | "scrape_multiple_users_multifile(user_name_list, max_tweets_per)" 197 | ] 198 | }, 199 | { 200 | "cell_type": "code", 201 | "execution_count": 40, 202 | "metadata": {}, 203 | "outputs": [], 204 | "source": [ 205 | "def scrape_multiple_users_singlefile(username_list, max_tweets_per):\n", 206 | " # Creating master list to contain all tweets\n", 207 | " master_tweets_list = []\n", 208 | " \n", 209 | " # Looping through each username in user list\n", 210 | " for username in user_name_list:\n", 211 | " # Creation of query object\n", 212 | " tweetCriteria = got.manager.TweetCriteria().setUsername(username)\\\n", 213 | " .setMaxTweets(max_tweets_per)\n", 214 | " # Creation of list that contains all tweets\n", 215 | " tweets = got.manager.TweetManager.getTweets(tweetCriteria)\n", 216 | "\n", 
217 | " # Creating list of chosen tweet data\n", 218 | " # Appending new tweets per user into the master tweet list\n", 219 | " # Add or remove tweet information you want in the below list comprehension\n", 220 | " for tweet in tweets:\n", 221 | " master_tweets_list.append((tweet.id, tweet.author_id, tweet.username, tweet.to, tweet.text, tweet.retweets, tweet.favorites,\n", 222 | " tweet.replies,tweet.date, tweet.formatted_date, tweet.hashtags, \n", 223 | " tweet.mentions, tweet.urls, tweet.permalink))\n", 224 | "\n", 225 | " # Creation of dataframe from tweets list\n", 226 | " # Add or remove columns as you remove tweet information\n", 227 | " tweets_df = pd.DataFrame(master_tweets_list, columns = ['Tweet Id', 'Tweet User Id', 'Tweet User','Reply to', 'Text','Retweets', 'Favorites', 'Replies', 'Datetime',\n", 228 | " 'Formatted date', 'Hashtags','Mentions','Urls','Permalink'])\n", 229 | " \n", 230 | " # Removing timezone information to allow excel file download\n", 231 | " tweets_df['Datetime'] = tweets_df['Datetime'].apply(lambda x: x.replace(tzinfo=None))\n", 232 | " \n", 233 | " # Uncomment/comment below lines to decide between creating csv or excel file \n", 234 | " tweets_df.to_csv('multi-user-tweets.csv', sep=',', index = False)\n", 235 | "# tweets_df.to_excel('multi-user-tweets.xlsx', index = False)" 236 | ] 237 | }, 238 | { 239 | "cell_type": "code", 240 | "execution_count": 41, 241 | "metadata": {}, 242 | "outputs": [], 243 | "source": [ 244 | "# Creating example user list with 3 users\n", 245 | "user_name_list = ['jack','billgates','random']\n", 246 | "\n", 247 | "# Max recent tweets pulls x amount of most recent tweets from that user\n", 248 | "max_tweets_per = 150\n", 249 | "\n", 250 | "# Function will scrape each user, attempting to pull max_tweet amount, and create one csv/excel file containing all data name multi-user-tweets.\n", 251 | "scrape_multiple_users_singlefile(user_name_list, max_tweets_per)" 252 | ] 253 | }, 254 | { 255 | "cell_type": "markdown", 256 | "metadata": {}, 257 | "source": [ 258 | "### Query by Text Search\n", 259 | "I created a function to build off of for scraping tweets by text search.\n", 260 | "\n", 261 | "#### F1. 
scrape_text_query\n", 262 | "This function scrapes tweets from Twitter based on the text search and exports the data as a csv or excel file" 263 | ] 264 | }, 265 | { 266 | "cell_type": "code", 267 | "execution_count": 44, 268 | "metadata": {}, 269 | "outputs": [], 270 | "source": [ 271 | "def scrape_text_query(text_query, count):\n", 272 | " # Creation of query object\n", 273 | " tweetCriteria = got.manager.TweetCriteria().setQuerySearch(text_query)\\\n", 274 | " .setMaxTweets(count)\n", 275 | " # Creation of list that contains all tweets\n", 276 | " tweets = got.manager.TweetManager.getTweets(tweetCriteria)\n", 277 | "\n", 278 | " # Creating list of chosen tweet data\n", 279 | " # Add or remove tweet information you want in the below list comprehension\n", 280 | " tweets_list = [[tweet.id, tweet.author_id, tweet.username, tweet.to, tweet.text, tweet.retweets, tweet.favorites,\n", 281 | " tweet.replies,tweet.date, tweet.formatted_date, tweet.hashtags, \n", 282 | " tweet.mentions, tweet.urls, tweet.permalink,] for tweet in tweets]\n", 283 | "\n", 284 | " # Creation of dataframe from tweets\n", 285 | " # Add or remove columns as you remove tweet information\n", 286 | " tweets_df = pd.DataFrame(tweets_list, columns = ['Tweet Id', 'Tweet User Id', 'Tweet User','Reply to', 'Text','Retweets', 'Favorites', 'Replies', 'Datetime',\n", 287 | " 'Formatted date', 'Hashtags','Mentions','Urls','Permalink'])\n", 288 | " \n", 289 | " # Removing timezone information to allow excel file download\n", 290 | " tweets_df['Datetime'] = tweets_df['Datetime'].apply(lambda x: x.replace(tzinfo=None))\n", 291 | " \n", 292 | " # Uncomment/comment below lines to decide between creating csv or excel file \n", 293 | " tweets_df.to_csv('{}-tweets.csv'.format(text_query), sep=',', index = False)\n", 294 | "# tweets_df.to_excel('{}-tweets.xlsx'.format(text_query), index = False)" 295 | ] 296 | }, 297 | { 298 | "cell_type": "code", 299 | "execution_count": null, 300 | "metadata": {}, 301 | "outputs": [], 302 | "source": [ 303 | "# Input search query to scrape tweets and name csv file\n", 304 | "text_query = 'Coronavirus'\n", 305 | "\n", 306 | "# Max recent tweets pulls x amount of most recent tweets from that user\n", 307 | "max_tweets = 150\n", 308 | "\n", 309 | "# Function scrapes for tweets containing text_query, attempting to pull max_tweet amount and create csv/excel file containing data.\n", 310 | "scrape_text_query(text_query, max_tweets)" 311 | ] 312 | }, 313 | { 314 | "cell_type": "markdown", 315 | "metadata": {}, 316 | "source": [ 317 | "## 2. Getting User Information From Tweets\n", 318 | "[Return to Table of Contents](#TOC)\n", 319 | "
GetOldTweets3 is limited in the user information that is accessible. This library only allows access to a tweet author's username and user_id. If you want user information, I recommend using Tweepy for all of your scraping, or using Tweepy in tandem with GetOldTweets3 to play to each library's strengths." 320 | ] 321 | }, 322 | { 323 | "cell_type": "markdown", 324 | "metadata": {}, 325 | "source": [ 326 | "## 3. Scraping Tweets With Advanced Queries\n", 327 | "[Return to Table of Contents](#TOC)\n", 328 | "
\n", 329 | "List of methods available with GetOldTweets3 to refine your queries.\n", 330 | "\n", 331 | "* setUsername(str): Setting query based on username\n", 332 | "* setMaxTweets(int): Setting maximum number of tweets to search\n", 333 | "* setQuerySearch(str): Setting query based on text\n", 334 | "* setSince(str \"yyyy-mm-dd\"): Setting lower bound date on query\n", 335 | "* setUntil(str \"yyyy-mm-dd\"): Setting upper bound date on query\n", 336 | "* setNear(str): Setting location of query search\n", 337 | "* setWithin(str): Setting radius of query search location\n", 338 | "* setLang(str): Setting language of query\n", 339 | "* setTopTweets(bool): Setting query to search only for top tweets\n", 340 | "* setEmoji(\"ignore\"/\"unicode\"/\"name\"): Setting query to search using emoji styles" 341 | ] 342 | }, 343 | { 344 | "cell_type": "markdown", 345 | "metadata": {}, 346 | "source": [ 347 | "I created two functions to build off of that utilize the different query methods available through the TweetCriteria class. As you can see you can mix and match the above methods in any way. It's important to remember that the more restrictive you make the search the more likely that a smaller amount of tweets that will come up.\n", 348 | "\n", 349 | "#### F1. scrape_advanced_queries1\n", 350 | "This function queries by using .setUsername to set the username, .setQuerySearch to set text to query for, .setSince to set the oldest date of the tweets to query, .setUntil to set the most recent date of the tweets to query, .setMaxTweets to set the amount of tweets to query for.\n", 351 | "\n", 352 | "#### F2. scrape_advanced_queries2\n", 353 | "This function queries by using .setQuerySearch, .setNear to set a location to query for tweets around, .setWithin to set a radius restriction around the chosen location, .setLang to scrape for tweets written in a specific language, .setMaxTweets" 354 | ] 355 | }, 356 | { 357 | "cell_type": "code", 358 | "execution_count": 45, 359 | "metadata": {}, 360 | "outputs": [], 361 | "source": [ 362 | "def scrape_advanced_queries1(username, text_query, since_date, until_date, count):\n", 363 | " # Creation of query object with as many specific queries as you want\n", 364 | " tweetCriteria = got.manager.TweetCriteria().setUsername(username)\\\n", 365 | " .setQuerySearch(text_query).setSince(since_date)\\\n", 366 | " .setUntil(until_date).setMaxTweets(count)\n", 367 | " \n", 368 | " # Creation of list that contains all tweets\n", 369 | " tweets = got.manager.TweetManager.getTweets(tweetCriteria)\n", 370 | "\n", 371 | " # Creating list of chosen tweet data\n", 372 | " # Add or remove tweet information you want in the below list comprehension\n", 373 | " tweets_list = [[tweet.id, tweet.author_id, tweet.username, tweet.to, tweet.text, tweet.retweets, tweet.favorites,\n", 374 | " tweet.replies,tweet.date, tweet.formatted_date, tweet.hashtags, \n", 375 | " tweet.mentions, tweet.urls, tweet.permalink,] for tweet in tweets]\n", 376 | "\n", 377 | " # Creation of dataframe from tweets list\n", 378 | " # Add or remove columns as you remove tweet information\n", 379 | " tweets_df = pd.DataFrame(tweets_list, columns = ['Tweet Id', 'Tweet User Id', 'Tweet User','Reply to', 'Text','Retweets', 'Favorites', 'Replies', 'Datetime',\n", 380 | " 'Formatted date', 'Hashtags','Mentions','Urls','Permalink'])\n", 381 | " \n", 382 | " # Removing timezone information to allow excel file download\n", 383 | " tweets_df['Datetime'] = tweets_df['Datetime'].apply(lambda x: x.replace(tzinfo=None))\n", 384 | 
" \n", 385 | " # Uncomment/comment below lines to decide between creating csv or excel file \n", 386 | " tweets_df.to_csv('{}-tweets.csv'.format(username), sep=',', index = False)\n", 387 | "# tweets_df.to_excel('{}-tweets.xlsx'.format(username), index = False)" 388 | ] 389 | }, 390 | { 391 | "cell_type": "code", 392 | "execution_count": 46, 393 | "metadata": {}, 394 | "outputs": [], 395 | "source": [ 396 | "username = \"BarackObama\"\n", 397 | "text_query = \"Hello\"\n", 398 | "since_date = \"2011-01-01\"\n", 399 | "until_date = \"2016-12-20\"\n", 400 | "count = 150\n", 401 | "\n", 402 | "scrape_advanced_queries1(username, text_query, since_date, until_date, count)" 403 | ] 404 | }, 405 | { 406 | "cell_type": "code", 407 | "execution_count": 47, 408 | "metadata": {}, 409 | "outputs": [], 410 | "source": [ 411 | "def scrape_advanced_queries2(text_query, location, radius, language, count):\n", 412 | " # Creation of query object with as many specific queries as you want\n", 413 | " tweetCriteria = got.manager.TweetCriteria().setQuerySearch(text_query)\\\n", 414 | " .setNear(location).setWithin(radius).setLang(language).setMaxTweets(count)\n", 415 | " \n", 416 | " # Creation of list that contains all tweets\n", 417 | " tweets = got.manager.TweetManager.getTweets(tweetCriteria)\n", 418 | "\n", 419 | " # Creating list of chosen tweet data\n", 420 | " # Add or remove tweet information you want in the below list comprehension\n", 421 | " tweets_list = [[tweet.id, tweet.author_id, tweet.username, tweet.to, tweet.text, tweet.retweets, tweet.favorites,\n", 422 | " tweet.replies,tweet.date, tweet.formatted_date, tweet.hashtags, \n", 423 | " tweet.mentions, tweet.urls, tweet.permalink,] for tweet in tweets]\n", 424 | "\n", 425 | " # Creation of dataframe from tweets list\n", 426 | " # Add or remove columns as you remove tweet information\n", 427 | " tweets_df = pd.DataFrame(tweets_list, columns = ['Tweet Id', 'Tweet User Id', 'Tweet User','Reply to', 'Text','Retweets', 'Favorites', 'Replies', 'Datetime',\n", 428 | " 'Formatted date', 'Hashtags','Mentions','Urls','Permalink'])\n", 429 | " \n", 430 | " # Removing timezone information to allow excel file download\n", 431 | " tweets_df['Datetime'] = tweets_df['Datetime'].apply(lambda x: x.replace(tzinfo=None))\n", 432 | " \n", 433 | " # Uncomment/comment below lines to decide between creating csv or excel file \n", 434 | " tweets_df.to_csv('{}-tweets.csv'.format(text_query), sep=',', index = False)\n", 435 | " tweets_df.to_excel('{}-tweets.xlsx'.format(text_query), index = False)" 436 | ] 437 | }, 438 | { 439 | "cell_type": "code", 440 | "execution_count": 48, 441 | "metadata": {}, 442 | "outputs": [], 443 | "source": [ 444 | "text_query = \"Hola\"\n", 445 | "location = \"Mexico\"\n", 446 | "radius = \"100mi\"\n", 447 | "language = \"Spanish\"\n", 448 | "count = 150\n", 449 | "\n", 450 | "scrape_advanced_queries2(text_query, location, radius, language, count)" 451 | ] 452 | }, 453 | { 454 | "cell_type": "markdown", 455 | "metadata": {}, 456 | "source": [ 457 | "## 4. Putting It All Together\n", 458 | "[Return to Table of Contents](#TOC)\n", 459 | "
\n", 460 | "Great, we now know how to pull more information from tweets and querying with advanced parameters. The great thing is how easy it is to mix and match whatever you want to search for. While it was shown above several times. The point is that you can mix and match the information you want from the tweets and the type of queries you conduct. It's just important that you update the column names in the pandas dataframe so you don't get errors.\n", 461 | "\n", 462 | "
\n", 463 | "Below is an example of a search for 150 top tweets with 'coronavirus' in it that occurred between August 5th and August 8th 2020 in Washington D.C.\n" 464 | ] 465 | }, 466 | { 467 | "cell_type": "code", 468 | "execution_count": 49, 469 | "metadata": {}, 470 | "outputs": [], 471 | "source": [ 472 | "text_query = 'Coronavirus'\n", 473 | "since_date = '2020-08-05'\n", 474 | "until_date = '2020-08-10'\n", 475 | "location = 'Washington, D.C.'\n", 476 | "top_tweets = True\n", 477 | "count = 150\n", 478 | "\n", 479 | "# Creation of tweetCriteria query object with methods to specify further\n", 480 | "tweetCriteria = got.manager.TweetCriteria().setQuerySearch(text_query).setSince(since_date)\\\n", 481 | ".setUntil(until_date).setNear(location).setTopTweets(top_tweets).setMaxTweets(count)\n", 482 | "\n", 483 | "# Creation of tweets iterable containing all queried tweet data\n", 484 | "tweets = got.manager.TweetManager.getTweets(tweetCriteria)\n", 485 | "\n", 486 | "# List comprehension pulling chosen tweet information per tweet from tweets\n", 487 | "# Add or remove tweet information you want in the below list comprehension\n", 488 | "tweets_list = [[tweet.id, tweet.author_id, tweet.username, tweet.to, tweet.text, tweet.retweets, tweet.favorites,\n", 489 | " tweet.replies,tweet.date, tweet.mentions, tweet.urls, tweet.permalink,] \n", 490 | " for tweet in tweets]\n", 491 | "\n", 492 | "# Creation of dataframe from tweets list\n", 493 | "# Add or remove columns as you remove tweet information\n", 494 | "tweets_df = pd.DataFrame(tweets_list, columns = ['Tweet Id', 'Twitter User Id', 'Twitter @ Name','Reply to', 'Text','Retweets', 'Favorites', \n", 495 | " 'Replies', 'Datetime','Mentions','Urls','Permalink'])\n", 496 | "# Removing timezone information to allow excel file download\n", 497 | "tweets_df['Datetime'] = tweets_df['Datetime'].apply(lambda x: x.replace(tzinfo=None))\n", 498 | "\n", 499 | "# Uncomment/comment below lines to decide between creating csv or excel file \n", 500 | "tweets_df.to_csv('put-together-tweets.csv', sep=',', index = False)\n", 501 | "# tweets_df.to_excel('put-together-tweets.xlsx', index = False)" 502 | ] 503 | } 504 | ], 505 | "metadata": { 506 | "kernelspec": { 507 | "display_name": "Python 3", 508 | "language": "python", 509 | "name": "python3" 510 | }, 511 | "language_info": { 512 | "codemirror_mode": { 513 | "name": "ipython", 514 | "version": 3 515 | }, 516 | "file_extension": ".py", 517 | "mimetype": "text/x-python", 518 | "name": "python", 519 | "nbconvert_exporter": "python", 520 | "pygments_lexer": "ipython3", 521 | "version": "3.7.3" 522 | } 523 | }, 524 | "nbformat": 4, 525 | "nbformat_minor": 2 526 | } 527 | -------------------------------------------------------------------------------- /AdvScraper/README.md: -------------------------------------------------------------------------------- 1 | # NOTE, the following information is heavily outdated, GetOldTweets3 is no longer usable, and the Tweepy code utilizes Twitter API V1, V2 is currently used. 2 | 3 | 4 | 5 | 6 | --- 7 | --- 8 | 9 | # How to Scrape More Information From Tweets on Twitter 10 | This folder contains the jupyter notebooks for my advanced scraping tutorial published [here](https://towardsdatascience.com/how-to-scrape-tweets-from-twitter-59287e20f0f1 "written article"). 11 | 12 | This folder contains two subfolders based on two different Python packages I used to scrape tweets. 
Each sub-folder contains an article notebook that follows the code snippets in my article and a companion notebook that provides more code examples and easy-to-use functions. This folder also contains a third item titled Tweepy_and_GetOldTweets3.ipynb that utilizes both Python packages, so you can scrape with GetOldTweets3 while still having access to user information. 13 | 14 | The contents of this folder and its subfolders are shown below. 15 | 16 | 17 | * GetOldTweets3 18 |   * GetOldTweets3_Article_Scraper.ipynb 19 |   * GetOldTweets3_Companion_Scraper.ipynb 20 | * Tweepy 21 |   * Tweepy_Article_Scraper.ipynb 22 |   * Tweepy_Companion_Scraper.ipynb 23 |   * credentials.csv 24 | * Tweepy_and_GetOldTweets3.ipynb 25 | -------------------------------------------------------------------------------- /AdvScraper/Tweepy/Tweepy_Companion_Scraper.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Companion Notebook for Scraping Twitter Using Tweepy\n", 8 | "\n", 9 | "Package Github: https://github.com/tweepy/tweepy\n", 10 | "\n", 11 | "Package Documentation: https://tweepy.readthedocs.io/en/latest/\n", 12 | "\n", 13 | "Article Read-Along: https://towardsdatascience.com/how-to-scrape-more-information-from-tweets-on-twitter-44fd540b8a1f\n", 14 | "\n", 15 | "### Notebook Author: Martin Beck\n", 16 | "#### Information current as of August 13th, 2020\n", 17 | " Dependencies: Make sure Tweepy is already installed in your Python environment. If not, you can pip install Tweepy to install the package. If you want more information on setting up, I have an article [here](https://towardsdatascience.com/how-to-scrape-tweets-from-twitter-59287e20f0f1) that goes into greater detail." 18 | ] 19 | }, 20 | { 21 | "cell_type": "markdown", 22 | "metadata": {}, 23 | "source": [ 24 | "## Notebook's Table of Contents\n", 25 | "
\n", 26 | "This companion notebook is meant to build on the scraping article and article notebook as it covers more scenarios that may come up and provides more examples.\n", 27 | "\n", 28 | "0. [Credentials and Authorization](#Section0)\n", 29 | "
Setting up credentials and authorization in order to utilize Tweepy\n", 30 | "1. [Getting More Information From Tweets](#Section1)\n", 31 | "
How to scrape more information from tweets such as favorite count, retweet count, whether they're replying to someone else, the coordinates the tweet was sent from (if the user has location turned on), etc.\n", 32 | "2. [Getting User Information From Tweets](#Section2)\n", 33 | "
How to scrape user information from tweets such as their follower count, total amount of tweets, if they're a verified user, location of where account is registered, etc.\n", 34 | "3. [Scraping Tweets With Advanced Queries](#Section3)\n", 35 | "
How to scrape for tweets using deeper queries such as searching by language of tweets, tweets within a certain location, tweets within specific date ranges, top tweets, etc.\n", 36 | "4. [Putting It All Together](#Section4)\n", 37 | "
Showcasing how you can mix and match the methods shown above to create queries that'll fulfill your data needs." 38 | ] 39 | }, 40 | { 41 | "cell_type": "markdown", 42 | "metadata": {}, 43 | "source": [ 44 | "## Imports for Notebook" 45 | ] 46 | }, 47 | { 48 | "cell_type": "code", 49 | "execution_count": 11, 50 | "metadata": {}, 51 | "outputs": [], 52 | "source": [ 53 | "# Pip install Tweepy if you don't already have the package\n", 54 | "# !pip install tweepy\n", 55 | "\n", 56 | "# Imports\n", 57 | "import tweepy\n", 58 | "import pandas as pd\n", 59 | "import time" 60 | ] 61 | }, 62 | { 63 | "cell_type": "markdown", 64 | "metadata": {}, 65 | "source": [ 66 | "## 0. Credentials and Authorization\n", 67 | "[Return to Table of Contents](#TOC)\n", 68 | "
Tweepy requires credentials before you can utilize its API. The below code helps set up the notebook for authorization. I already have an article covering setting up Tweepy and getting credentials [here](https://towardsdatascience.com/how-to-scrape-tweets-from-twitter-59287e20f0f1) if further instructions are needed.\n", 69 | "\n", 70 | "You don't necessarily have to create a credentials file; however, if you find yourself sharing Tweepy code with other parties, I recommend it so you don't accidentally share your credentials. Otherwise, skip the below cell and just enter your credentials hardcoded below." 71 | ] 72 | }, 73 | { 74 | "cell_type": "code", 75 | "execution_count": 19, 76 | "metadata": {}, 77 | "outputs": [ 78 | { 79 | "data": { 80 | "text/html": [ 81 | "
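An alternative to both the CSV file and hardcoding, if you'd rather keep keys out of the project folder entirely, is to read them from environment variables; the variable names in this sketch are placeholders, not anything Tweepy itself expects:

```python
import os
import tweepy

# Keys come from the shell environment instead of a file or the notebook.
auth = tweepy.OAuthHandler(os.environ['TWITTER_CONSUMER_KEY'],
                           os.environ['TWITTER_CONSUMER_SECRET'])
auth.set_access_token(os.environ['TWITTER_ACCESS_TOKEN'],
                      os.environ['TWITTER_ACCESS_SECRET'])
api = tweepy.API(auth, wait_on_rate_limit=True)
```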
\n", 82 | "\n", 95 | "\n", 96 | " \n", 97 | " \n", 98 | " \n", 99 | " \n", 100 | " \n", 101 | " \n", 102 | " \n", 103 | " \n", 104 | " \n", 105 | " \n", 106 | " \n", 107 | " \n", 108 | " \n", 109 | " \n", 110 | " \n", 111 | " \n", 112 | " \n", 113 | " \n", 114 | " \n", 115 | " \n", 116 | " \n", 117 | " \n", 118 | " \n", 119 | " \n", 120 | " \n", 121 | " \n", 122 | " \n", 123 | " \n", 124 | " \n", 125 | "
namekey
0consumer_keyXXXXXXXXXXX
1consumer_secretXXXXXXXXXXX
2access_tokenXXXXXXXXXXX
3access_secretXXXXXXXXXXX
\n", 126 | "
" 127 | ], 128 | "text/plain": [ 129 | " name key\n", 130 | "0 consumer_key XXXXXXXXXXX\n", 131 | "1 consumer_secret XXXXXXXXXXX\n", 132 | "2 access_token XXXXXXXXXXX\n", 133 | "3 access_secret XXXXXXXXXXX" 134 | ] 135 | }, 136 | "execution_count": 19, 137 | "metadata": {}, 138 | "output_type": "execute_result" 139 | } 140 | ], 141 | "source": [ 142 | "# Loading in from csv file\n", 143 | "\n", 144 | "credentials_df = pd.read_csv('credentials.csv',header=None,names=['name','key'])\n", 145 | "\n", 146 | "credentials_df" 147 | ] 148 | }, 149 | { 150 | "cell_type": "code", 151 | "execution_count": 13, 152 | "metadata": {}, 153 | "outputs": [], 154 | "source": [ 155 | "# Credentials from csv file\n", 156 | "\n", 157 | "consumer_key = credentials_df.loc[credentials_df['name']=='consumer_key','key'].iloc[0]\n", 158 | "consumer_secret = credentials_df.loc[credentials_df['name']=='consumer_secret','key'].iloc[0]\n", 159 | "access_token = credentials_df.loc[credentials_df['name']=='access_token','key'].iloc[0]\n", 160 | "access_token_secret = credentials_df.loc[credentials_df['name']=='access_secret','key'].iloc[0]\n", 161 | "\n", 162 | "# Credentials hardcoded\n", 163 | "\n", 164 | "# consumer_key = \"XXXXX\"\n", 165 | "# consumer_secret = \"XXXXX\"\n", 166 | "# access_token = \"XXXXX\"\n", 167 | "# access_token_secret = \"XXXXXX\"\n", 168 | "\n", 169 | "auth = tweepy.OAuthHandler(consumer_key, consumer_secret)\n", 170 | "auth.set_access_token(access_token, access_token_secret)\n", 171 | "api = tweepy.API(auth,wait_on_rate_limit=True)" 172 | ] 173 | }, 174 | { 175 | "cell_type": "markdown", 176 | "metadata": {}, 177 | "source": [ 178 | "## 1. Getting More Information From Tweets\n", 179 | "[Return to Table of Contents](#TOC)\n", 180 | "
List of information available in the tweet object with Tweepy. This is not an exhaustive list, but it does contain a majority of the available information. If you want an exhaustive list of everything contained in the tweet object, there's documentation [here](https://developer.twitter.com/en/docs/twitter-api/v1/data-dictionary/overview/tweet-object) describing all the attributes. \n", 181 | "\n", 182 | "String versions of Ids (e.g., id_str, in_reply_to_status_id_str) are used instead to best keep data integrity, as Ids stored as integers can be cut off.\n", 183 | "\n", 184 | "* tweet.user: User information is covered in part 2 in greater detail
\n", 185 | "\n", 186 | "* tweet.full_text: Text content of tweet when API is told to pull all contents of tweets that have more than 140 characters

\n",
 187 |         "\n",
 188 |         "* tweet.text: Text content of tweet\n",
 189 |         "* tweet.created_at: Date tweet was created\n",
 190 |         "* tweet.id_str: Id of tweet\n",
 191 |         "* tweet.user.screen_name: Username of tweet's author\n",
 192 |         "* tweet.coordinates: Geographic location as reported by user or client. May be null, which is why the extract_coordinates function below was created\n",
 193 |         "* tweet.place: Indicates the place associated with the tweet, such as Las Vegas, NV. May be null, so the extract_place function below was created\n",
 194 |         "* tweet.retweet_count: Count of retweets\n",
 195 |         "* tweet.favorite_count: Count of favorites\n",
 196 |         "* tweet.lang: Indicates a BCP 47 language identifier corresponding to machine detected language of tweet text.\n",
 197 |         "* tweet.source: Source where tweet was posted through. Ex: Twitter Web Client\n",
 198 |         "* tweet.in_reply_to_status_id_str: If a tweet is a reply, the original tweet's id. Can be null if tweet is not a reply\n",
 199 |         "* tweet.in_reply_to_user_id_str: If a tweet is a reply, string representation of original tweet's user id\n",
 200 |         "* tweet.is_quote_status: If tweet is a quote tweet"
 201 |     ]
 202 |    },
 203 |    {
 204 |     "cell_type": "markdown",
 205 |     "metadata": {},
 206 |     "source": [
 207 |         "### Query by Username\n",
 208 |         "I created three functions to build off of, based on various scenarios that are likely to come up when scraping tweets from users. After each function I call it to showcase an example of it being used.\n",
 209 |         "\n",
 210 |         "#### F0. extract_coordinates and extract_place\n",
 211 |         "These functions check whether a tweet has coordinate or place information and extract the pertinent information from its json. They are separate functions because both fields can be null, so it's important to first check that the information exists before extracting it and replacing it in the dataframe.\n",
 212 |         "\n",
 213 |         "#### F1. scrape_user_tweets\n",
 214 |         "This function scrapes a single user's tweets and exports the data as a csv or excel file\n",
 215 |         "\n",
 216 |         "#### F2. scrape_multiple_users_multifile\n",
 217 |         "This function scrapes multiple users based on a list and exports separate csv or excel files per user.\n",
 218 |         "\n",
 219 |         "#### F3. 
scrape_multiple_users_singlefile\n", 220 | "This function scrapes multiple users based on a list and exports one csv or excel file containing all tweets" 221 | ] 222 | }, 223 | { 224 | "cell_type": "code", 225 | "execution_count": 14, 226 | "metadata": {}, 227 | "outputs": [], 228 | "source": [ 229 | "# Function created to extract coordinates from tweet if it has coordinate info\n", 230 | "# Tweets tend to have null so important to run check\n", 231 | "# Make sure to run this cell as it is used in a lot of different functions below\n", 232 | "def extract_coordinates(row):\n", 233 | " if row['Tweet Coordinates']:\n", 234 | " return row['Tweet Coordinates']['coordinates']\n", 235 | " else:\n", 236 | " return None\n", 237 | "\n", 238 | "# Function created to extract place such as city, state or country from tweet if it has place info\n", 239 | "# Tweets tend to have null so important to run check\n", 240 | "# Make sure to run this cell as it is used in a lot of different functions below\n", 241 | "def extract_place(row):\n", 242 | " if row['Place Info']:\n", 243 | " return row['Place Info'].full_name\n", 244 | " else:\n", 245 | " return None" 246 | ] 247 | }, 248 | { 249 | "cell_type": "code", 250 | "execution_count": 127, 251 | "metadata": {}, 252 | "outputs": [], 253 | "source": [ 254 | "def scrape_user_tweets(username, max_tweets):\n", 255 | " # Creation of query method using parameters\n", 256 | " tweets = tweepy.Cursor(api.user_timeline,id=username).items(max_tweets)\n", 257 | "\n", 258 | " # List comprehension pulling chosen tweet information from tweets iterable object\n", 259 | " # Add or remove tweet information you want in the below list comprehension\n", 260 | " tweets_list = [[tweet.text, tweet.created_at, tweet.id_str, tweet.user.screen_name, tweet.coordinates,\n", 261 | " tweet.place, tweet.retweet_count, tweet.favorite_count, tweet.lang,\n", 262 | " tweet.source, tweet.in_reply_to_status_id_str, \n", 263 | " tweet.in_reply_to_user_id_str, tweet.is_quote_status,\n", 264 | " ] for tweet in tweets]\n", 265 | "\n", 266 | " # Creation of dataframe from tweets_list\n", 267 | " # Add or remove columns as you remove tweet information\n", 268 | " tweets_df = pd.DataFrame(tweets_list,columns=['Tweet Text', 'Tweet Datetime', 'Tweet Id', 'Twitter @ Name', 'Tweet Coordinates', 'Place Info',\n", 269 | " 'Retweets', 'Favorites', 'Language', 'Source', 'Replied Tweet Id',\n", 270 | " 'Replied Tweet User Id Str', 'Quote Status Bool'])\n", 271 | " \n", 272 | " # Checks if there are coordinates attached to tweets, if so extracts them\n", 273 | " tweets_df['Tweet Coordinates'] = tweets_df.apply(extract_coordinates,axis=1)\n", 274 | " \n", 275 | " # Checks if there is place information available, if so extracts them\n", 276 | " tweets_df['Place Info'] = tweets_df.apply(extract_place,axis=1)\n", 277 | " \n", 278 | " # Uncomment/comment below lines to decide between creating csv or excel file \n", 279 | " tweets_df.to_csv('{}-tweets.csv'.format(username), sep=',', index = False)\n", 280 | "# tweets_df.to_excel('{}-tweets.xlsx'.format(username), index = False)" 281 | ] 282 | }, 283 | { 284 | "cell_type": "code", 285 | "execution_count": 128, 286 | "metadata": {}, 287 | "outputs": [], 288 | "source": [ 289 | "# Creating example username to scrape from\n", 290 | "username = 'random'\n", 291 | "\n", 292 | "# Max recent tweets pulls x amount of most recent tweets from that user\n", 293 | "max_tweets = 150\n", 294 | "\n", 295 | "# Function will scrape username, attempt to pull max_tweet amount, and create 
csv/excel file from data.\n",
 296 |         "scrape_user_tweets(username,max_tweets)"
 297 |     ]
 298 |    },
 299 |    {
 300 |     "cell_type": "code",
 301 |     "execution_count": 131,
 302 |     "metadata": {},
 303 |     "outputs": [],
 304 |     "source": [
 305 |         "def scrape_multiple_users_multifile(username_list, max_tweets_per):\n",
 306 |         "    # Looping through each username in user list\n",
 307 |         "    \n",
 308 |         "    for username in username_list: \n",
 309 |         "        # Creation of query method using parameters\n",
 310 |         "        tweets = tweepy.Cursor(api.user_timeline,id=username).items(max_tweets_per)\n",
 311 |         "\n",
 312 |         "        # List comprehension pulling chosen tweet information from tweets iterable object\n",
 313 |         "        # Add or remove tweet information you want in the below list comprehension\n",
 314 |         "        tweets_list = [[tweet.text, tweet.created_at, tweet.id_str, tweet.user.screen_name, tweet.coordinates,\n",
 315 |         "                    tweet.place, tweet.retweet_count, tweet.favorite_count, tweet.lang,\n",
 316 |         "                    tweet.source, tweet.in_reply_to_status_id_str, \n",
 317 |         "                    tweet.in_reply_to_user_id_str, tweet.is_quote_status,] for tweet in tweets]\n",
 318 |         "\n",
 319 |         "        # Creation of dataframe from tweets_list\n",
 320 |         "        # Add or remove columns as you remove tweet information\n",
 321 |         "        tweets_df = pd.DataFrame(tweets_list,columns=['Tweet Text', 'Tweet Datetime', 'Tweet Id', 'Twitter @ Name', 'Tweet Coordinates', 'Place Info',\n",
 322 |         "                                                'Retweets', 'Favorites', 'Language', 'Source', 'Replied Tweet Id',\n",
 323 |         "                                                'Replied Tweet User Id Str', 'Quote Status Bool'])\n",
 324 |         "    \n",
 325 |         "        # Checks if there are coordinates attached to tweets, if so extracts them\n",
 326 |         "        tweets_df['Tweet Coordinates'] = tweets_df.apply(extract_coordinates,axis=1)\n",
 327 |         "    \n",
 328 |         "        # Checks if there is place information available, if so extracts them\n",
 329 |         "        tweets_df['Place Info'] = tweets_df.apply(extract_place,axis=1)\n",
 330 |         "    \n",
 331 |         "        # Uncomment/comment below lines to decide between creating csv or excel file \n",
 332 |         "        tweets_df.to_csv('{}-tweets.csv'.format(username), sep=',', index = False)\n",
 333 |         "#         tweets_df.to_excel('{}-tweets.xlsx'.format(username), index = False)"
 334 |     ]
 335 |    },
 336 |    {
 337 |     "cell_type": "code",
 338 |     "execution_count": 130,
 339 |     "metadata": {},
 340 |     "outputs": [],
 341 |     "source": [
 342 |         "# Creating example user list with 3 users\n",
 343 |         "user_name_list = ['jack','billgates','random']\n",
 344 |         "\n",
 345 |         "# Max recent tweets pulls x amount of most recent tweets from that user\n",
 346 |         "max_tweets_per = 150\n",
 347 |         "\n",
 348 |         "# Function will scrape each user, attempting to pull max_tweet amount, and create csv/excel file per user.\n",
 349 |         "scrape_multiple_users_multifile(user_name_list, max_tweets_per)"
 350 |     ]
 351 |    },
 352 |    {
 353 |     "cell_type": "code",
 354 |     "execution_count": 134,
 355 |     "metadata": {},
 356 |     "outputs": [],
 357 |     "source": [
 358 |         "def scrape_multiple_users_singlefile(username_list, max_tweets_per):\n",
 359 |         "    # Creating master list to contain all tweets\n",
 360 |         "    master_tweets_list = []\n",
 361 |         "    \n",
 362 |         "    # Looping through each username in the username_list parameter\n",
 363 |         "    for username in username_list:\n",
 364 |         "        # Creation of query method using parameters\n",
 365 |         "        tweets = tweepy.Cursor(api.user_timeline,id=username).items(max_tweets_per)\n",
 366 |         "        \n",
 367 |         "        # List comprehension pulling chosen tweet information from tweets iterable object\n",
 368 |         "        # Appending new tweets per user into the master tweet list\n",
 369 |         "        # Add or remove tweet information you want in the below 
list comprehension\n",
 370 |         "    for tweet in tweets:\n",
 371 |         "        master_tweets_list.append((tweet.text, tweet.created_at, tweet.id_str, tweet.user.screen_name, tweet.coordinates,\n",
 372 |         "                             tweet.place, tweet.retweet_count, tweet.favorite_count, tweet.lang,\n",
 373 |         "                             tweet.source, tweet.in_reply_to_status_id_str, \n",
 374 |         "                             tweet.in_reply_to_user_id_str, tweet.is_quote_status))\n",
 375 |         "    \n",
 376 |         "    # Creation of dataframe from tweets_list\n",
 377 |         "    # Add or remove columns as you remove tweet information\n",
 378 |         "    tweets_df = pd.DataFrame(master_tweets_list,columns=['Tweet Text', 'Tweet Datetime', 'Tweet Id', 'Twitter @ Name', 'Tweet Coordinates', 'Place Info',\n",
 379 |         "                                                'Retweets', 'Favorites', 'Language', 'Source', 'Replied Tweet Id',\n",
 380 |         "                                                'Replied Tweet User Id Str', 'Quote Status Bool'])\n",
 381 |         "\n",
 382 |         "    # Checks if there are coordinates attached to tweets, if so extracts them\n",
 383 |         "    tweets_df['Tweet Coordinates'] = tweets_df.apply(extract_coordinates,axis=1)\n",
 384 |         "    \n",
 385 |         "    # Checks if there is place information available, if so extracts them\n",
 386 |         "    tweets_df['Place Info'] = tweets_df.apply(extract_place,axis=1)\n",
 387 |         "    \n",
 388 |         "    # Uncomment/comment below lines to decide between creating csv or excel file \n",
 389 |         "    tweets_df.to_csv('multi-user-tweets.csv', sep=',', index = False)\n",
 390 |         "#     tweets_df.to_excel('multi-user-tweets.xlsx', index = False)"
 391 |     ]
 392 |    },
 393 |    {
 394 |     "cell_type": "code",
 395 |     "execution_count": 133,
 396 |     "metadata": {},
 397 |     "outputs": [],
 398 |     "source": [
 399 |         "# Creating example user list with 3 users\n",
 400 |         "user_name_list = ['jack','billgates','random']\n",
 401 |         "\n",
 402 |         "# Max recent tweets pulls x amount of most recent tweets from that user\n",
 403 |         "max_tweets_per = 150\n",
 404 |         "\n",
 405 |         "# Function will scrape each user, attempting to pull max_tweet amount, and create one csv/excel file containing all data, named multi-user-tweets.\n",
 406 |         "scrape_multiple_users_singlefile(user_name_list, max_tweets_per)"
 407 |     ]
 408 |    },
 409 |    {
 410 |     "cell_type": "markdown",
 411 |     "metadata": {},
 412 |     "source": [
 413 |         "## Allowing API to Access up to 280 Characters From Tweets\n",
 414 |         "\n",
 415 |         "In the cursor parameters add tweet_mode='extended' to access tweet text that goes beyond Twitter's original 140 character limit.\n",
 416 |         "\n",
 417 |         "If tweet_mode is set to extended, the tweet attribute tweet.text becomes tweet.full_text instead.\n",
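        "\n",
        "A minimal sketch of the difference, assuming `api` is the authorized tweepy.API object from section 0 (the username is just an example):\n",
        "\n",
        "```python\n",
        "# Default mode: tweet.text may be truncated for tweets over 140 characters\n",
        "for tweet in tweepy.Cursor(api.user_timeline, id='jack').items(1):\n",
        "    print(tweet.text)\n",
        "\n",
        "# Extended mode: the untruncated text lives in tweet.full_text\n",
        "for tweet in tweepy.Cursor(api.user_timeline, id='jack', tweet_mode='extended').items(1):\n",
        "    print(tweet.full_text)\n",
        "```"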
418 | ] 419 | }, 420 | { 421 | "cell_type": "code", 422 | "execution_count": 17, 423 | "metadata": {}, 424 | "outputs": [], 425 | "source": [ 426 | "def scrape_extended_tweets(username, max_tweets):\n", 427 | " # Creation of query method using parameters\n", 428 | " tweets = tweepy.Cursor(api.user_timeline,id=username, tweet_mode='extended').items(max_tweets)\n", 429 | "\n", 430 | " # List comprehension pulling chosen tweet information from tweets iterable object\n", 431 | " # Add or remove tweet information you want in the below list comprehension\n", 432 | " tweets_list = [[tweet.full_text, tweet.created_at, tweet.id_str, tweet.user.screen_name, tweet.coordinates,\n", 433 | " tweet.place, tweet.retweet_count, tweet.favorite_count, tweet.lang,\n", 434 | " tweet.source, tweet.in_reply_to_status_id_str, \n", 435 | " tweet.in_reply_to_user_id_str, tweet.is_quote_status,\n", 436 | " ] for tweet in tweets]\n", 437 | "\n", 438 | " # Creation of dataframe from tweets_list\n", 439 | " # Add or remove columns as you remove tweet information\n", 440 | " tweets_df = pd.DataFrame(tweets_list,columns=['Tweet Text', 'Tweet Datetime', 'Tweet Id', 'Twitter @ Name', 'Tweet Coordinates', 'Place Info',\n", 441 | " 'Retweets', 'Favorites', 'Language', 'Source', 'Replied Tweet Id',\n", 442 | " 'Replied Tweet User Id Str', 'Quote Status Bool'])\n", 443 | " \n", 444 | " # Checks if there are coordinates attached to tweets, if so extracts them\n", 445 | " tweets_df['Tweet Coordinates'] = tweets_df.apply(extract_coordinates,axis=1)\n", 446 | " \n", 447 | " # Checks if there is place information available, if so extracts them\n", 448 | " tweets_df['Place Info'] = tweets_df.apply(extract_place,axis=1)\n", 449 | " \n", 450 | " # Uncomment/comment below lines to decide between creating csv or excel file \n", 451 | " tweets_df.to_csv('{}-tweets.csv'.format(username), sep=',', index = False)\n", 452 | "# tweets_df.to_excel('{}-tweets.xlsx'.format(username), index = False)" 453 | ] 454 | }, 455 | { 456 | "cell_type": "code", 457 | "execution_count": 18, 458 | "metadata": {}, 459 | "outputs": [], 460 | "source": [ 461 | "# Creating example username to scrape from\n", 462 | "username = 'billgates'\n", 463 | "\n", 464 | "# Max recent tweets pulls x amount of most recent tweets from that user\n", 465 | "max_tweets = 150\n", 466 | "\n", 467 | "# Function will scrape username, attempt to pull max_tweet amount, and create csv/excel file from data.\n", 468 | "scrape_extended_tweets(username,max_tweets)" 469 | ] 470 | }, 471 | { 472 | "cell_type": "markdown", 473 | "metadata": {}, 474 | "source": [ 475 | "### Query by Text Search\n", 476 | "I created one function to build off of for scraping tweets by text search.\n", 477 | "\n", 478 | "#### F1. 
scrape_text_query\n",
 479 |         "This function scrapes tweets from Twitter based on a text search and exports the data as a csv or excel file"
 480 |     ]
 481 |    },
 482 |    {
 483 |     "cell_type": "code",
 484 |     "execution_count": 9,
 485 |     "metadata": {},
 486 |     "outputs": [],
 487 |     "source": [
 488 |         "def scrape_text_query(text_query, max_tweets):\n",
 489 |         "    # Creation of query method using parameters\n",
 490 |         "    tweets = tweepy.Cursor(api.search,q=text_query, tweet_mode='extended').items(max_tweets)\n",
 491 |         "\n",
 492 |         "    # List comprehension pulling chosen tweet information from tweets iterable object\n",
 493 |         "    # Add or remove tweet information you want in the below list comprehension\n",
 494 |         "    tweets_list = [[tweet.full_text, tweet.created_at, tweet.id_str, tweet.user.screen_name, tweet.coordinates,\n",
 495 |         "                tweet.place, tweet.retweet_count, tweet.favorite_count, tweet.lang,\n",
 496 |         "                tweet.source, tweet.in_reply_to_status_id_str, \n",
 497 |         "                tweet.in_reply_to_user_id_str, tweet.is_quote_status,\n",
 498 |         "                ] for tweet in tweets]\n",
 499 |         "\n",
 500 |         "    # Creation of dataframe from tweets_list\n",
 501 |         "    # Add or remove columns as you remove tweet information\n",
 502 |         "    tweets_df = pd.DataFrame(tweets_list,columns=['Tweet Text', 'Tweet Datetime', 'Tweet Id', 'Twitter @ Name', 'Tweet Coordinates', 'Place Info',\n",
 503 |         "                                                'Retweets', 'Favorites', 'Language', 'Source', 'Replied Tweet Id',\n",
 504 |         "                                                'Replied Tweet User Id Str', 'Quote Status Bool'])\n",
 505 |         "\n",
 506 |         "    # Checks if there are coordinates attached to tweets, if so extracts them\n",
 507 |         "    tweets_df['Tweet Coordinates'] = tweets_df.apply(extract_coordinates,axis=1)\n",
 508 |         "    \n",
 509 |         "    # Checks if there is place information available, if so extracts them\n",
 510 |         "    tweets_df['Place Info'] = tweets_df.apply(extract_place,axis=1)\n",
 511 |         "\n",
 512 |         "    # Uncomment/comment below lines to decide between creating csv or excel file \n",
 513 |         "    tweets_df.to_csv('{}-tweets.csv'.format(text_query), sep=',', index = False)\n",
 514 |         "#     tweets_df.to_excel('{}-tweets.xlsx'.format(text_query), index = False)"
 515 |     ]
 516 |    },
 517 |    {
 518 |     "cell_type": "code",
 519 |     "execution_count": 10,
 520 |     "metadata": {},
 521 |     "outputs": [],
 522 |     "source": [
 523 |         "# Input search query to scrape tweets and name csv file\n",
 524 |         "text_query = 'Coronavirus'\n",
 525 |         "\n",
 526 |         "# Max recent tweets pulls x amount of most recent tweets from that user\n",
 527 |         "max_tweets = 150\n",
 528 |         "\n",
 529 |         "# Function scrapes for tweets containing text_query, attempting to pull max_tweet amount and create csv/excel file containing data.\n",
 530 |         "scrape_text_query(text_query, max_tweets)"
 531 |     ]
 532 |    },
 533 |    {
 534 |     "cell_type": "markdown",
 535 |     "metadata": {},
 536 |     "source": [
 537 |         "## 2. Getting User Information From Tweets\n",
 538 |         "[Return to Table of Contents](#TOC)\n",
 539 |         "\n",
 540 |         "Tweepy excels in this category, having more access to user information than GetOldTweets3.\n",
 541 |         "
List of information available in the user object with Tweepy. This is not an exhaustive list, but it does contain a majority of the available information. If you want an exhaustive list of everything contained in the user object, there's documentation [here](https://developer.twitter.com/en/docs/twitter-api/v1/data-dictionary/overview/user-object) describing all the attributes. \n",
 542 |         "\n",
 543 |         "String versions of Ids (e.g., id_str, user.id_str) are used to preserve data integrity, since Ids stored as integers can be truncated.\n",
 544 |         "\n",
 545 |         "* tweet.text: Text content of tweet\n",
 546 |         "* tweet.created_at: Date tweet was created\n",
 547 |         "* tweet.id_str: Id of tweet\n",
 548 |         "* tweet.user.name: Name of the user as they've defined it\n",
 549 |         "* tweet.user.screen_name: Username of tweet's author, commonly called User @ name\n",
 550 |         "* tweet.user.id_str: User id of tweet's author\n",
 551 |         "* tweet.user.location: User defined location for account's profile. Can be nullable\n",
 552 |         "* tweet.user.url: URL provided by user in bio. Can be nullable\n",
 553 |         "* tweet.user.description: Text in user bio. Can be nullable\n",
 554 |         "* tweet.user.verified: Boolean indicating whether user has a verified account\n",
 555 |         "* tweet.user.followers_count: Count of followers user has\n",
 556 |         "* tweet.user.friends_count: Count of other users that user is following\n",
 557 |         "* tweet.user.favourites_count: Count of tweets user has liked in the account's lifetime\n",
 558 |         "* tweet.user.statuses_count: Count of tweets (including retweets) issued by user\n",
 559 |         "* tweet.user.listed_count: Count of public lists that user is member of\n",
 560 |         "* tweet.user.created_at: Date that the user account was created on Twitter\n",
 561 |         "* tweet.user.profile_image_url_https: HTTPS-based URL pointing to user's profile image\n",
 562 |         "* tweet.user.default_profile: When true, indicates user has not altered the theme or background of user profile\n",
 563 |         "* tweet.user.default_profile_image: When true, indicates if user has not uploaded their own profile image and default image is used instead\n",
 564 |         "\n",
 565 |         "### Query by Text Search\n",
 566 |         "I created one function to build off of that searches by text and pulls all the user information available.\n",
 567 |         "\n",
 568 |         "#### F1. 
scrape_user_information\n", 569 | "This function scrapes tweets from Twitter based on the text search, pulls user information and exports the data as a csv or excel file" 570 | ] 571 | }, 572 | { 573 | "cell_type": "code", 574 | "execution_count": 137, 575 | "metadata": {}, 576 | "outputs": [], 577 | "source": [ 578 | "def scrape_user_information(text_query, max_tweets):\n", 579 | " # Creation of query method using parameters\n", 580 | " tweets = tweepy.Cursor(api.search,q=text_query).items(max_tweets)\n", 581 | "\n", 582 | " # List comprehension pulling chosen tweet information from tweets iterable object\n", 583 | " # Add or remove tweet information you want in the below list comprehension\n", 584 | " tweets_list = [[tweet.text, tweet.created_at, tweet.id_str, tweet.user.name, tweet.user.screen_name, \n", 585 | " tweet.user.id_str, tweet.user.location, tweet.user.url,\n", 586 | " tweet.user.description, tweet.user.verified, tweet.user.followers_count,\n", 587 | " tweet.user.friends_count, tweet.user.favourites_count, tweet.user.statuses_count,\n", 588 | " tweet.user.listed_count, tweet.user.created_at, tweet.user.profile_image_url_https,\n", 589 | " tweet.user.default_profile, tweet.user.default_profile_image] for tweet in tweets]\n", 590 | "\n", 591 | " # Creation of dataframe from tweets_list\n", 592 | " # Add or remove columns as you remove tweet information\n", 593 | " tweets_df = pd.DataFrame(tweets_list,columns=['Tweet Text', 'Tweet Datetime', 'Tweet Id', 'Twitter Username', 'Twitter @ name',\n", 594 | " 'Twitter User Id', 'Twitter User Location', 'URL in Bio', 'Twitter Bio',\n", 595 | " 'User Verified Status', 'Users Following Count',\n", 596 | " 'Number users this account is following', 'Users Number of Likes', 'Users Tweet Count',\n", 597 | " 'Lists Containing User', 'Account Created Time', 'Profile Image URL',\n", 598 | " 'User Default Profile', 'User Default Profile Image'])\n", 599 | "\n", 600 | " # Uncomment/comment below lines to decide between creating csv or excel file \n", 601 | " tweets_df.to_csv('{}-userinfo-tweets.csv'.format(text_query), sep=',', index = False)\n", 602 | " # tweets_df.to_excel('{}-userinfo-tweets.xlsx'.format(text_query), index = False)" 603 | ] 604 | }, 605 | { 606 | "cell_type": "code", 607 | "execution_count": 138, 608 | "metadata": { 609 | "scrolled": false 610 | }, 611 | "outputs": [], 612 | "source": [ 613 | "# Input search query to scrape tweets and name csv file\n", 614 | "text_query = 'Coronavirus'\n", 615 | "\n", 616 | "# Max recent tweets pulls x amount of most recent tweets from that user\n", 617 | "max_tweets = 150\n", 618 | "\n", 619 | "# Function scrapes for tweets containing text_query, attempting to pull max_tweet amount and create csv/excel file containing data.\n", 620 | "scrape_user_information(text_query, max_tweets)" 621 | ] 622 | }, 623 | { 624 | "cell_type": "markdown", 625 | "metadata": {}, 626 | "source": [ 627 | "## 3. Scraping Tweets With Advanced Queries\n", 628 | "[Return to Table of Contents](#TOC)\n", 629 | "
List of query methods available with Tweepy. This is not an exhaustive list but does contain a majority of the methods available. If you want an exhaustive list of everything available there's documentation [here](https://developer.twitter.com/en/docs/twitter-api/v1/tweets/search/api-reference/get-search-tweets).\n",
 630 |         "\n",
 631 |         "* q = str: Setting query based on text\n",
 632 |         "* geocode = str \"lat,long,radius\": Setting location of query and radius\n",
 633 |         "* lang = str: Setting language of query, full list of language codes [here](https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes)\n",
 634 |         "* result_type = str \"mixed\"/\"recent\"/\"popular\": Setting popularity preference of query\n",
 635 |         "* until = str \"yyyy-mm-dd\": Setting upper bound date on query, if using standard search API be cognizant of 7-day limit\n",
 636 |         "* since_id = str or int: Returns results with Id's more recent than given Id\n",
 637 |         "* max_id = str or int: Returns results with Id's older than given Id\n",
 638 |         "* count = int: Number of tweets to return per page. Max is 100, defaults to 15"
 639 |     ]
 640 |    },
 641 |    {
 642 |     "cell_type": "markdown",
 643 |     "metadata": {},
 644 |     "source": [
 645 |         "I created a function to build off of that utilizes the different query methods available with Tweepy. As you can see, you can mix and match the above methods in any way. It's important to remember that the more restrictive you make the search, the fewer tweets are likely to come up.\n",
 646 |         "\n",
 647 |         "#### F1. scrape_advanced_queries\n",
 648 |         "This function queries by using geocode to set a location to query for tweets and restrict within a certain radius, lang to scrape for tweets written in a specific language, result_type to search for tweets based on popularity, until to set an upper bound date on tweets, since_id to set a restriction on the oldest id possible, and max_id to set a restriction on the earliest id possible"
 649 |     ]
 650 |    },
 651 |    {
 652 |     "cell_type": "code",
 653 |     "execution_count": 139,
 654 |     "metadata": {},
 655 |     "outputs": [],
 656 |     "source": [
 657 |         "def scrape_advanced_queries(coordinates, language, result_type, until_date, max_tweets):\n",
 658 |         "    # Creation of query method using parameters\n",
 659 |         "    tweets = tweepy.Cursor(api.search, geocode=coordinates, lang=language, result_type = result_type, \n",
 660 |         "                       until = until_date, count = 100).items(max_tweets)\n",
 661 |         "\n",
 662 |         "    # List comprehension pulling chosen tweet information from tweets iterable object\n",
 663 |         "    # Add or remove tweet information you want in the below list comprehension\n",
 664 |         "    tweets_list = [[tweet.text, tweet.created_at, tweet.id_str, tweet.favorite_count, tweet.user.screen_name, \n",
 665 |         "                tweet.user.id_str, tweet.user.location, tweet.user.url, \n",
 666 |         "                tweet.user.verified, tweet.user.followers_count,\n",
 667 |         "                tweet.user.friends_count, tweet.user.statuses_count,\n",
 668 |         "                tweet.user.default_profile_image, tweet.lang] for tweet in tweets]\n",
 669 |         "\n",
 670 |         "    # Creation of dataframe from tweets_list\n",
 671 |         "    # Add or remove columns as you remove tweet information\n",
 672 |         "    tweets_df = pd.DataFrame(tweets_list,columns=['Tweet Text', 'Tweet Datetime', 'Tweet Id', 'Tweet Favorite Count', 'Twitter @ name',\n",
 673 |         "                                                'Twitter User Id', 'Twitter User Location', 'URL in Bio','User Verified Status', 'Users Current Following Count',\n",
 674 |         "                                                'Number of accounts user is following', 'Users Tweet Count',\n",
 675 |         "                                                'Profile Image URL','Tweet 
Language'])\n", 676 | " \n", 677 | " # Uncomment/comment below lines to decide between creating csv or excel file \n", 678 | " tweets_df.to_csv('advancedqueries-tweets.csv', sep=',', index = False)\n", 679 | " # tweets_df.to_excel('advancedqueries-tweets.xlsx', index = False)" 680 | ] 681 | }, 682 | { 683 | "cell_type": "code", 684 | "execution_count": 140, 685 | "metadata": {}, 686 | "outputs": [], 687 | "source": [ 688 | "# Example may no longer show tweets if until_date falls outside \n", 689 | "# of 7-day period from when you run cell\n", 690 | "\n", 691 | "coordinates = '19.402833,-99.141051,50mi'\n", 692 | "language = 'es'\n", 693 | "result_type = 'recent'\n", 694 | "until_date = '2020-08-10'\n", 695 | "max_tweets = 150\n", 696 | "\n", 697 | "scrape_advanced_queries(coordinates, language, result_type, until_date, max_tweets)" 698 | ] 699 | }, 700 | { 701 | "cell_type": "markdown", 702 | "metadata": {}, 703 | "source": [ 704 | "## 4. Putting It All Together\n", 705 | "[Return to Table of Contents](#TOC)\n", 706 | "
\n",
 707 |         "Great, we now know how to pull more information from tweets and how to query with advanced parameters. The great thing, as was shown above several times, is how easy it is to mix and match: you can combine whatever information you want from the tweets with whatever type of query you conduct. It's just important that you update the column names in the pandas dataframe so you don't get errors.\n",
 708 |         "\n",
 709 |         "<br>
\n",
 710 |         "Below is an example of a search for 150 tweets containing 'Coronavirus' that occurred within a 50 mile radius of Las Vegas, NV, which in this case has the geo coordinates of lat 36.169786, long -115.139858.\n"
 711 |     ]
 712 |    },
 713 |    {
 714 |     "cell_type": "code",
 715 |     "execution_count": 142,
 716 |     "metadata": {},
 717 |     "outputs": [],
 718 |     "source": [
 719 |         "text_query = 'Coronavirus'\n",
 720 |         "coordinates = '36.169786,-115.139858,50mi'\n",
 721 |         "max_tweets = 150\n",
 722 |         "\n",
 723 |         "# Creation of query method using parameters\n",
 724 |         "tweets = tweepy.Cursor(api.search, q = text_query, geocode = coordinates, count = 100).items(max_tweets)\n",
 725 |         "\n",
 726 |         "# List comprehension pulling chosen tweet information from tweets iterable object\n",
 727 |         "# Add or remove tweet information you want in the below list comprehension\n",
 728 |         "tweets_list = [[tweet.text, tweet.created_at, tweet.id_str, tweet.favorite_count, tweet.user.screen_name, \n",
 729 |         "            tweet.user.id_str, tweet.user.location, tweet.user.followers_count, tweet.coordinates, tweet.place] for tweet in tweets]\n",
 730 |         "\n",
 731 |         "# Creation of dataframe from tweets_list\n",
 732 |         "# Add or remove columns as you remove tweet information\n",
 733 |         "tweets_df = pd.DataFrame(tweets_list,columns=['Tweet Text', 'Tweet Datetime', 'Tweet Id', 'Tweet Favorite Count', 'Twitter @ name',\n",
 734 |         "                                          'Twitter User Id', 'Twitter User Location', 'Users Current Following Count', 'Tweet Coordinates', 'Place Info'])\n",
 735 |         "\n",
 736 |         "# Checks if there are coordinates attached to tweets, if so extracts them\n",
 737 |         "tweets_df['Tweet Coordinates'] = tweets_df.apply(extract_coordinates,axis=1)\n",
 738 |         "    \n",
 739 |         "# Checks if there is place information available, if so extracts them\n",
 740 |         "tweets_df['Place Info'] = tweets_df.apply(extract_place,axis=1)\n",
 741 |         "\n",
 742 |         "# Uncomment/comment below lines to decide between creating csv or excel file \n",
 743 |         "tweets_df.to_csv('put-together-tweets.csv', sep=',', index = False)\n",
 744 |         "# tweets_df.to_excel('put-together-tweets.xlsx', index = False)"
 745 |     ]
 746 |    }
 747 |   ],
 748 |   "metadata": {
 749 |    "kernelspec": {
 750 |     "display_name": "Python 3",
 751 |     "language": "python",
 752 |     "name": "python3"
 753 |    },
 754 |    "language_info": {
 755 |     "codemirror_mode": {
 756 |      "name": "ipython",
 757 |      "version": 3
 758 |     },
 759 |     "file_extension": ".py",
 760 |     "mimetype": "text/x-python",
 761 |     "name": "python",
 762 |     "nbconvert_exporter": "python",
 763 |     "pygments_lexer": "ipython3",
 764 |     "version": "3.7.3"
 765 |    }
 766 |   },
 767 |  "nbformat": 4,
 768 |  "nbformat_minor": 2
 769 | }
 770 | 
--------------------------------------------------------------------------------
/AdvScraper/Tweepy/credentials.csv:
--------------------------------------------------------------------------------
1 | consumer_key,XXXXXX
2 | consumer_secret,XXXXXX
3 | access_token,XXXXXX
4 | access_secret,XXXXXX
5 | 
--------------------------------------------------------------------------------
/AdvScraper/Tweepy_and_GetOldTweets3.ipynb:
--------------------------------------------------------------------------------
1 | {
2 |  "cells": [
3 |   {
4 |    "cell_type": "markdown",
5 |    "metadata": {},
6 |    "source": [
7 |     "# Notebook for Scraping Twitter With Tweepy and GetOldTweets3\n",
8 |     "\n",
9 |     "Tweepy Package Github: https://github.com/tweepy/tweepy\n",
10 |     "\n",
11 |     "GetOldTweets3 Package Github: https://github.com/Mottl/GetOldTweets3\n",
12 |     "\n",
13 |     "Tweepy Package Documentation: 
https://tweepy.readthedocs.io/en/latest/\n",
 14 |     "\n",
 15 |     "Article Read-Along: https://towardsdatascience.com/how-to-scrape-more-information-from-tweets-on-twitter-44fd540b8a1f\n",
 16 |     "\n",
 17 |     "### Notebook Author: Martin Beck\n",
 18 |     "#### Information current as of August, 13th 2020\n",
 19 |     " Dependencies: Make sure Tweepy and GetOldTweets3 are already installed in your Python environment. If not, you can pip install both packages. If you want more information on setting up, I have an article [here](https://towardsdatascience.com/how-to-scrape-tweets-from-twitter-59287e20f0f1) that goes into deeper detail."
 20 |    ]
 21 |   },
 22 |   {
 23 |    "cell_type": "markdown",
 24 |    "metadata": {},
 25 |    "source": [
 26 |     "## Notebook's Table of Contents\n",
 27 |     "\n",
 28 |     "0. [Credentials and Authorization](#Section0)\n",
 29 |     "
Setting up credentials and authorization in order to utilize Tweepy\n", 30 | "1. [Available Methods With Tweepy](#Section1)\n", 31 | "
Methods available with Tweepy to pull more information\n", 32 | "2. [How to Use Tweepy With GetOldTweets3](#Section2)\n", 33 | "
Examples of Tweepy's methods and how to use them on your datasets."
 34 |    ]
 35 |   },
 36 |   {
 37 |    "cell_type": "markdown",
 38 |    "metadata": {},
 39 |    "source": [
 40 |     "## Imports for Notebook"
 41 |    ]
 42 |   },
 43 |   {
 44 |    "cell_type": "code",
 45 |    "execution_count": 1,
 46 |    "metadata": {},
 47 |    "outputs": [],
 48 |    "source": [
 49 |     "# Pip install Tweepy if you don't already have the package\n",
 50 |     "# !pip install tweepy\n",
 51 |     "\n",
 52 |     "# Imports\n",
 53 |     "import tweepy\n",
 54 |     "import pandas as pd\n",
 55 |     "import GetOldTweets3 as got\n",
 56 |     "import time"
 57 |    ]
 58 |   },
 59 |   {
 60 |    "cell_type": "markdown",
 61 |    "metadata": {},
 62 |    "source": [
 63 |     "## 0. Credentials and Authorization\n",
 64 |     "[Return to Table of Contents](#TOC)\n",
 65 |     "
Tweepy requires credentials before you can utilize its API. The below code helps set up the notebook for authorization. I already have an article covering setting up Tweepy and getting credentials [here](https://towardsdatascience.com/how-to-scrape-tweets-from-twitter-59287e20f0f1) if further instructions are needed.\n",
 66 |     "\n",
 67 |     "You don't necessarily have to create a credentials file; however, if you find yourself sharing Tweepy code with other parties, I recommend one so you don't accidentally share your credentials. Otherwise, skip the below cell and just hardcode your credentials below."
 68 |    ]
 69 |   },
 70 |   {
 71 |    "cell_type": "code",
 72 |    "execution_count": 44,
 73 |    "metadata": {},
 74 |    "outputs": [
 75 |     {
 76 |      "data": {
 77 |       "text/html": [
 78 |        "<div>
\n", 79 | "\n", 92 | "\n", 93 | " \n", 94 | " \n", 95 | " \n", 96 | " \n", 97 | " \n", 98 | " \n", 99 | " \n", 100 | " \n", 101 | " \n", 102 | " \n", 103 | " \n", 104 | " \n", 105 | " \n", 106 | " \n", 107 | " \n", 108 | " \n", 109 | " \n", 110 | " \n", 111 | " \n", 112 | " \n", 113 | " \n", 114 | " \n", 115 | " \n", 116 | " \n", 117 | " \n", 118 | " \n", 119 | " \n", 120 | " \n", 121 | " \n", 122 | "
namekey
0consumer_keyXXXXXX
1consumer_secretXXXXXX
2access_tokenXXXXXX
3access_secretXXXXXX
\n", 123 | "
" 124 | ], 125 | "text/plain": [ 126 | " name key\n", 127 | "0 consumer_key XXXXXX\n", 128 | "1 consumer_secret XXXXXX\n", 129 | "2 access_token XXXXXX\n", 130 | "3 access_secret XXXXXX" 131 | ] 132 | }, 133 | "execution_count": 44, 134 | "metadata": {}, 135 | "output_type": "execute_result" 136 | } 137 | ], 138 | "source": [ 139 | "# Loading in from csv file\n", 140 | "\n", 141 | "credentials_df = pd.read_csv('credentials.csv',header=None,names=['name','key'])\n", 142 | "\n", 143 | "credentials_df" 144 | ] 145 | }, 146 | { 147 | "cell_type": "code", 148 | "execution_count": 3, 149 | "metadata": {}, 150 | "outputs": [], 151 | "source": [ 152 | "# Credentials from csv file\n", 153 | "\n", 154 | "consumer_key = credentials_df.loc[credentials_df['name']=='consumer_key','key'].iloc[0]\n", 155 | "consumer_secret = credentials_df.loc[credentials_df['name']=='consumer_secret','key'].iloc[0]\n", 156 | "access_token = credentials_df.loc[credentials_df['name']=='access_token','key'].iloc[0]\n", 157 | "access_token_secret = credentials_df.loc[credentials_df['name']=='access_secret','key'].iloc[0]\n", 158 | "\n", 159 | "# Credentials hardcoded\n", 160 | "\n", 161 | "# consumer_key = \"XXXXX\"\n", 162 | "# consumer_secret = \"XXXXX\"\n", 163 | "# access_token = \"XXXXX\"\n", 164 | "# access_token_secret = \"XXXXXX\"\n", 165 | "\n", 166 | "auth = tweepy.OAuthHandler(consumer_key, consumer_secret)\n", 167 | "auth.set_access_token(access_token, access_token_secret)\n", 168 | "api = tweepy.API(auth,wait_on_rate_limit=True)" 169 | ] 170 | }, 171 | { 172 | "cell_type": "markdown", 173 | "metadata": {}, 174 | "source": [ 175 | "## 1. Available Methods With Tweepy\n", 176 | "[Return to Table of Contents](#TOC)\n", 177 | "
For the most part there are only two relevant methods. If you're curious about what else you can do with Tweepy, the documentation is available [here](http://docs.tweepy.org/en/latest/api.html#search-methods). \n",
 178 |     "\n",
 179 |     "The relevant methods are api.get_status and api.get_user\n",
 180 |     "\n",
 181 |     "api.get_status provides you with access to Tweepy's tweet object, which by default also includes user information.\n",
 182 |     "\n",
 183 |     "api.get_user only provides you with user information. \n",
 184 |     "\n",
 185 |     "You can use either if you only care about accessing user data since they both contain it. However, if you want access to tweet information that is only available with Tweepy, such as tweet.in_reply_to_user_id_str, I'd recommend using api.get_status"
 186 |    ]
 187 |   },
 188 |   {
 189 |    "cell_type": "markdown",
 190 |    "metadata": {},
 191 |    "source": [
 192 |     "## 2. How to Use Tweepy With GetOldTweets3\n",
 193 |     "[Return to Table of Contents](#TOC)\n",
 194 |     "\n",
 195 |     "Below is a list of information accessible in both Tweepy's tweet and user object. This is not an exhaustive list for either object. If you want an exhaustive list of everything contained in Tweepy's tweet object there's documentation [here](https://developer.twitter.com/en/docs/twitter-api/v1/data-dictionary/overview/tweet-object). If you want an exhaustive list of everything contained in Tweepy's user object there's documentation [here](https://developer.twitter.com/en/docs/twitter-api/v1/data-dictionary/overview/user-object). \n",
 196 |     "\n",
 197 |     "* tweet.coordinates: Geographic location as reported by user or client. May be null, which is why the extract_coordinates function below was created\n",
 198 |     "* tweet.place: Indicates the place associated with the tweet, such as Las Vegas, NV. May be null, so the extract_place function below was created\n",
 199 |     "* tweet.lang: Indicates a BCP 47 language identifier corresponding to machine detected language of tweet text.\n",
 200 |     "* tweet.source: Source where tweet was posted through. Ex: Twitter Web Client\n",
 201 |     "* tweet.in_reply_to_status_id_str: If a tweet is a reply, the original tweet's id. Can be null if tweet is not a reply\n",
 202 |     "* tweet.in_reply_to_user_id_str: If a tweet is a reply, string representation of original tweet's user id\n",
 203 |     "* tweet.user.location: User defined location for account's profile. Can be nullable\n",
 204 |     "* tweet.user.url: URL provided by user in bio. Can be nullable\n",
 205 |     "* tweet.user.description: Text in user bio. 
Can be nullable\n",
 206 |     "* tweet.user.verified: Boolean indicating whether user has a verified account\n",
 207 |     "* tweet.user.followers_count: Count of followers user has\n",
 208 |     "* tweet.user.friends_count: Count of other users that user is following\n",
 209 |     "* tweet.user.favourites_count: Count of tweets user has liked in the account's lifetime\n",
 210 |     "* tweet.user.statuses_count: Count of tweets (including retweets) issued by user\n",
 211 |     "* tweet.user.listed_count: Count of public lists that user is member of\n",
 212 |     "* tweet.user.created_at: Date that the user account was created on Twitter\n",
 213 |     "* tweet.user.profile_image_url_https: HTTPS-based URL pointing to user's profile image\n",
 214 |     "* tweet.user.default_profile: When true, indicates user has not altered the theme or background of user profile\n",
 215 |     "* tweet.user.default_profile_image: When true, indicates if user has not uploaded their own profile image and default image is used instead\n",
 216 |     "\n",
 217 |     "Remember that Tweepy still has its request limitations, meaning that running these requests on larger datasets may take time. I've run this workaround on a smaller dataset of 5k tweets and it took me 1-2hrs to finish running. It's up to you whether you'd rather let your computer spend time running for free or spend money on using Twitter's Premium/Enterprise APIs to work with bigger datasets."
 218 |    ]
 219 |   },
 220 |   {
 221 |    "cell_type": "markdown",
 222 |    "metadata": {},
 223 |    "source": [
 224 |     "### Preparation\n",
 225 |     "\n",
 226 |     "To use Tweepy with GetOldTweets3 there is a little bit of preparation required. Depending on whether you're using the api.get_status or api.get_user method, you'll need to have the relevant information available.\n",
 227 |     "\n",
 228 |     "In the case of api.get_status, make sure you use GOT3 to scrape for tweet.id\n",
 229 |     "\n",
 230 |     "In the case of api.get_user, make sure you use GOT3 to scrape for either tweet.author_id or tweet.username\n",
 231 |     "\n",
 232 |     "I'll showcase this below.\n",
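    "\n",
    "One aside tied to the request limitations mentioned above: the walkthrough below calls api.get_status once per tweet, which is what makes larger datasets slow. A hedged sketch of a batched alternative, assuming Tweepy 3.x, where api.statuses_lookup accepts up to 100 tweet ids per request (it was renamed lookup_statuses in Tweepy 4):\n",
    "\n",
    "```python\n",
    "# Hydrate scraped tweet ids in chunks of 100 instead of one get_status call each\n",
    "tweet_ids = tweets_df['Tweet Id'].tolist()  # tweets_df comes from the GOT3 scrape below\n",
    "hydrated_tweets = []\n",
    "for i in range(0, len(tweet_ids), 100):\n",
    "    hydrated_tweets.extend(api.statuses_lookup(tweet_ids[i:i + 100]))\n",
    "```"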
233 | ] 234 | }, 235 | { 236 | "cell_type": "code", 237 | "execution_count": 4, 238 | "metadata": {}, 239 | "outputs": [], 240 | "source": [ 241 | "text_query = 'Hello'\n", 242 | "since_date = \"2020-7-20\"\n", 243 | "until_date = \"2020-7-21\"\n", 244 | "\n", 245 | "count = 150\n", 246 | " \n", 247 | "# Creation of tweetCriteria query object with methods to specify further\n", 248 | "tweetCriteria = got.manager.TweetCriteria()\\\n", 249 | ".setQuerySearch(text_query).setSince(since_date)\\\n", 250 | ".setUntil(until_date).setMaxTweets(count)\n", 251 | " \n", 252 | "# Creation of tweets iterable containing all queried tweet data\n", 253 | "tweets = got.manager.TweetManager.getTweets(tweetCriteria)\n", 254 | " \n", 255 | "# List comprehension pulling chosen tweet information from tweets\n", 256 | "# Add or remove tweet information you want in the below list comprehension\n", 257 | "tweets_list = [[tweet.id, tweet.author_id, tweet.username, tweet.text, tweet.retweets, tweet.favorites, tweet.replies, tweet.date] for tweet in tweets]\n", 258 | " \n", 259 | "# Creation of dataframe from tweets list\n", 260 | "# Add or remove columns as you remove tweet information\n", 261 | "tweets_df = pd.DataFrame(tweets_list, columns = ['Tweet Id', 'Tweet User Id', 'Tweet User', 'Text',\n", 262 | " 'Retweets', 'Favorites', 'Replies', 'Datetime'])" 263 | ] 264 | }, 265 | { 266 | "cell_type": "markdown", 267 | "metadata": {}, 268 | "source": [ 269 | "### I scraped with GetOldTweets3 making sure that I have tweet.id, and tweet.author_id or tweet.username." 270 | ] 271 | }, 272 | { 273 | "cell_type": "code", 274 | "execution_count": 41, 275 | "metadata": { 276 | "scrolled": true 277 | }, 278 | "outputs": [ 279 | { 280 | "data": { 281 | "text/html": [ 282 | "
\n", 283 | "\n", 296 | "\n", 297 | " \n", 298 | " \n", 299 | " \n", 300 | " \n", 301 | " \n", 302 | " \n", 303 | " \n", 304 | " \n", 305 | " \n", 306 | " \n", 307 | " \n", 308 | " \n", 309 | " \n", 310 | " \n", 311 | " \n", 312 | " \n", 313 | " \n", 314 | " \n", 315 | " \n", 316 | " \n", 317 | " \n", 318 | " \n", 319 | " \n", 320 | " \n", 321 | " \n", 322 | " \n", 323 | " \n", 324 | " \n", 325 | " \n", 326 | " \n", 327 | " \n", 328 | " \n", 329 | " \n", 330 | " \n", 331 | " \n", 332 | " \n", 333 | " \n", 334 | " \n", 335 | " \n", 336 | " \n", 337 | " \n", 338 | " \n", 339 | " \n", 340 | " \n", 341 | " \n", 342 | " \n", 343 | " \n", 344 | " \n", 345 | " \n", 346 | " \n", 347 | " \n", 348 | " \n", 349 | " \n", 350 | " \n", 351 | " \n", 352 | " \n", 353 | " \n", 354 | " \n", 355 | " \n", 356 | " \n", 357 | " \n", 358 | " \n", 359 | " \n", 360 | " \n", 361 | " \n", 362 | " \n", 363 | " \n", 364 | " \n", 365 | " \n", 366 | " \n", 367 | "
Tweet IdTweet User IdTweet UserTextRetweetsFavoritesRepliesDatetime
012853638588323635201182717701203972096workinclassbirdfriend..... hello0312020-07-20 23:59:59+00:00
112853638572429475841183184898070405120Soap_The_Scrubhello yes i interacted0432020-07-20 23:59:59+00:00
21285363856202698753844768299388813314kuroslaysHello lew,0002020-07-20 23:59:58+00:00
312853638560559513631214501518646247425bubsjiim nervous HELLO0002020-07-20 23:59:58+00:00
41285363852851511301811267164476841984realJakeLoganButt Stallion says hello neck gaiter0002020-07-20 23:59:58+00:00
\n", 368 | "
" 369 | ], 370 | "text/plain": [ 371 | " Tweet Id Tweet User Id Tweet User \\\n", 372 | "0 1285363858832363520 1182717701203972096 workinclassbird \n", 373 | "1 1285363857242947584 1183184898070405120 Soap_The_Scrub \n", 374 | "2 1285363856202698753 844768299388813314 kuroslays \n", 375 | "3 1285363856055951363 1214501518646247425 bubsji \n", 376 | "4 1285363852851511301 811267164476841984 realJakeLogan \n", 377 | "\n", 378 | " Text Retweets Favorites Replies \\\n", 379 | "0 friend..... hello 0 3 1 \n", 380 | "1 hello yes i interacted 0 4 3 \n", 381 | "2 Hello lew, 0 0 0 \n", 382 | "3 im nervous HELLO 0 0 0 \n", 383 | "4 Butt Stallion says hello neck gaiter 0 0 0 \n", 384 | "\n", 385 | " Datetime \n", 386 | "0 2020-07-20 23:59:59+00:00 \n", 387 | "1 2020-07-20 23:59:59+00:00 \n", 388 | "2 2020-07-20 23:59:58+00:00 \n", 389 | "3 2020-07-20 23:59:58+00:00 \n", 390 | "4 2020-07-20 23:59:58+00:00 " 391 | ] 392 | }, 393 | "execution_count": 41, 394 | "metadata": {}, 395 | "output_type": "execute_result" 396 | } 397 | ], 398 | "source": [ 399 | "# Taking a quick look at the data scraped\n", 400 | "tweets_df.head()" 401 | ] 402 | }, 403 | { 404 | "cell_type": "markdown", 405 | "metadata": {}, 406 | "source": [ 407 | "### Alright now we have our data, let's look at a row for information to test how api.get_status and api.get_user work." 408 | ] 409 | }, 410 | { 411 | "cell_type": "code", 412 | "execution_count": 6, 413 | "metadata": {}, 414 | "outputs": [ 415 | { 416 | "data": { 417 | "text/plain": [ 418 | "Tweet Id 1285363852851511301\n", 419 | "Tweet User Id 811267164476841984\n", 420 | "Tweet User realJakeLogan\n", 421 | "Text Butt Stallion says hello neck gaiter \n", 422 | "Retweets 0\n", 423 | "Favorites 0\n", 424 | "Replies 0\n", 425 | "Datetime 2020-07-20 23:59:58+00:00\n", 426 | "Name: 4, dtype: object" 427 | ] 428 | }, 429 | "execution_count": 6, 430 | "metadata": {}, 431 | "output_type": "execute_result" 432 | } 433 | ], 434 | "source": [ 435 | "# Using iloc to show a specific row of data\n", 436 | "tweets_df.iloc[4]" 437 | ] 438 | }, 439 | { 440 | "cell_type": "code", 441 | "execution_count": 7, 442 | "metadata": {}, 443 | "outputs": [ 444 | { 445 | "name": "stdout", 446 | "output_type": "stream", 447 | "text": [ 448 | "Tweet Id: 1285363852851511301\n", 449 | "User Id: 811267164476841984\n", 450 | "Username: realJakeLogan\n" 451 | ] 452 | } 453 | ], 454 | "source": [ 455 | "# Printing out the relevant information for us\n", 456 | "print(\"Tweet Id: \",tweets_df.iloc[4][0])\n", 457 | "print(\"User Id: \",tweets_df.iloc[4][1])\n", 458 | "print(\"Username: \",tweets_df.iloc[4][2])" 459 | ] 460 | }, 461 | { 462 | "cell_type": "markdown", 463 | "metadata": {}, 464 | "source": [ 465 | "### Perfect now let's test get_status and get_user with the above Tweet Id, User Id, and Username." 466 | ] 467 | }, 468 | { 469 | "cell_type": "code", 470 | "execution_count": 48, 471 | "metadata": {}, 472 | "outputs": [], 473 | "source": [ 474 | "api.get_status(1285363852851511301)" 475 | ] 476 | }, 477 | { 478 | "cell_type": "markdown", 479 | "metadata": {}, 480 | "source": [ 481 | "### There's a lot going on with that. Remember the list from above that shows the attributes of tweet and user objects? We can use that to focus on the relevant parts." 
482 | ] 483 | }, 484 | { 485 | "cell_type": "code", 486 | "execution_count": 8, 487 | "metadata": {}, 488 | "outputs": [ 489 | { 490 | "name": "stdout", 491 | "output_type": "stream", 492 | "text": [ 493 | "Salinas Valley, CA\n", 494 | "9\n", 495 | "WordPress.com\n" 496 | ] 497 | } 498 | ], 499 | "source": [ 500 | "# Using the get_status method to request for the tweet data and pull out requested information\n", 501 | "print(api.get_status(1285363852851511301).user.location)\n", 502 | "print(api.get_status(1285363852851511301).user.followers_count)\n", 503 | "print(api.get_status(1285363852851511301).source)" 504 | ] 505 | }, 506 | { 507 | "cell_type": "code", 508 | "execution_count": 9, 509 | "metadata": {}, 510 | "outputs": [ 511 | { 512 | "name": "stdout", 513 | "output_type": "stream", 514 | "text": [ 515 | "Salinas Valley, CA\n", 516 | "9\n" 517 | ] 518 | }, 519 | { 520 | "ename": "AttributeError", 521 | "evalue": "'User' object has no attribute 'source'", 522 | "output_type": "error", 523 | "traceback": [ 524 | "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m", 525 | "\u001b[1;31mAttributeError\u001b[0m Traceback (most recent call last)", 526 | "\u001b[1;32m\u001b[0m in \u001b[0;36m\u001b[1;34m\u001b[0m\n\u001b[0;32m 1\u001b[0m \u001b[0mprint\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mapi\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mget_user\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;36m811267164476841984\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mlocation\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 2\u001b[0m \u001b[0mprint\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mapi\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mget_user\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;34m'realJakeLogan'\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mfollowers_count\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m----> 3\u001b[1;33m \u001b[0mprint\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mapi\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mget_user\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;36m811267164476841984\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0msource\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m", 527 | "\u001b[1;31mAttributeError\u001b[0m: 'User' object has no attribute 'source'" 528 | ] 529 | } 530 | ], 531 | "source": [ 532 | "print(api.get_user(811267164476841984).location)\n", 533 | "print(api.get_user('realJakeLogan').followers_count)\n", 534 | "\n", 535 | "# Should throw an error because user object only has user information\n", 536 | "print(api.get_user(811267164476841984).source)" 537 | ] 538 | }, 539 | { 540 | "cell_type": "markdown", 541 | "metadata": {}, 542 | "source": [ 543 | "As you can see user information is available with either method. The only difference is api.get_status requires you to enter the user keyword as seen with user.location to look at its user information whereas api.get_user only requires .location because it is the user information. That's why we see the error above with looking at the source information with api.get_user because there is no tweet information.\n", 544 | "\n", 545 | "Lastly, as you can see api.get_user is able to use either User Id or a Twitter Username to pull up user information.\n", 546 | "\n", 547 | "These methods are great, but using it on a single item is only good for testing. 
The power really comes in when you can create a function allowing you to use it with a whole dataset." 548 | ] 549 | }, 550 | { 551 | "cell_type": "code", 552 | "execution_count": 10, 553 | "metadata": {}, 554 | "outputs": [], 555 | "source": [ 556 | "# Creating copy of original df to mess around with\n", 557 | "tweet_df_test = tweets_df.copy()" 558 | ] 559 | }, 560 | { 561 | "cell_type": "code", 562 | "execution_count": 20, 563 | "metadata": {}, 564 | "outputs": [], 565 | "source": [ 566 | "# Creating functions to request tweet or user information and extract them\n", 567 | "def extract_tweepy_tweet_info(row):\n", 568 | " tweet = api.get_status(row['Tweet Id'])\n", 569 | " return tweet.source\n", 570 | "\n", 571 | "def extract_tweepy_tweet_user_info(row):\n", 572 | " tweet = api.get_status(row['Tweet Id'])\n", 573 | " return tweet.user.statuses_count\n", 574 | " \n", 575 | "def extract_tweepy_user_info1(row):\n", 576 | " user = api.get_user(row['Tweet User Id'])\n", 577 | " return user.followers_count\n", 578 | "\n", 579 | "def extract_tweepy_user_info2(row):\n", 580 | " user = api.get_user(row['Tweet User'])\n", 581 | " return user.verified" 582 | ] 583 | }, 584 | { 585 | "cell_type": "code", 586 | "execution_count": 21, 587 | "metadata": { 588 | "scrolled": true 589 | }, 590 | "outputs": [], 591 | "source": [ 592 | "# Setting new columns to be equal to the returned data from each function\n", 593 | "tweet_df_test['Tweet Source'] = tweet_df_test.apply(extract_tweepy_tweet_info,axis=1)\n", 594 | "tweet_df_test['Tweets Count'] = tweet_df_test.apply(extract_tweepy_tweet_user_info,axis=1)\n", 595 | "tweet_df_test['Follower Count'] = tweet_df_test.apply(extract_tweepy_user_info1,axis=1)\n", 596 | "tweet_df_test['Verified Status'] = tweet_df_test.apply(extract_tweepy_user_info2,axis=1)" 597 | ] 598 | }, 599 | { 600 | "cell_type": "code", 601 | "execution_count": 26, 602 | "metadata": {}, 603 | "outputs": [ 604 | { 605 | "data": { 606 | "text/html": [ 607 | "
\n", 608 | "\n", 621 | "\n", 622 | " \n", 623 | " \n", 624 | " \n", 625 | " \n", 626 | " \n", 627 | " \n", 628 | " \n", 629 | " \n", 630 | " \n", 631 | " \n", 632 | " \n", 633 | " \n", 634 | " \n", 635 | " \n", 636 | " \n", 637 | " \n", 638 | " \n", 639 | " \n", 640 | " \n", 641 | " \n", 642 | " \n", 643 | " \n", 644 | " \n", 645 | " \n", 646 | " \n", 647 | " \n", 648 | " \n", 649 | " \n", 650 | " \n", 651 | " \n", 652 | " \n", 653 | " \n", 654 | " \n", 655 | " \n", 656 | " \n", 657 | " \n", 658 | " \n", 659 | " \n", 660 | " \n", 661 | " \n", 662 | " \n", 663 | " \n", 664 | " \n", 665 | " \n", 666 | " \n", 667 | " \n", 668 | " \n", 669 | " \n", 670 | " \n", 671 | " \n", 672 | " \n", 673 | " \n", 674 | " \n", 675 | " \n", 676 | " \n", 677 | " \n", 678 | " \n", 679 | " \n", 680 | " \n", 681 | " \n", 682 | " \n", 683 | " \n", 684 | " \n", 685 | " \n", 686 | " \n", 687 | " \n", 688 | " \n", 689 | " \n", 690 | " \n", 691 | " \n", 692 | " \n", 693 | " \n", 694 | " \n", 695 | " \n", 696 | " \n", 697 | " \n", 698 | " \n", 699 | " \n", 700 | " \n", 701 | " \n", 702 | " \n", 703 | " \n", 704 | " \n", 705 | " \n", 706 | " \n", 707 | " \n", 708 | " \n", 709 | " \n", 710 | " \n", 711 | " \n", 712 | " \n", 713 | " \n", 714 | " \n", 715 | " \n", 716 | "
Tweet IdTweet User IdTweet UserTextRetweetsFavoritesRepliesDatetimeTweet SourceFollower CountVerified StatusTweets Count
012853638588323635201182717701203972096workinclassbirdfriend..... hello0312020-07-20 23:59:59+00:00Twitter for iPhone1877False561
112853638572429475841183184898070405120Soap_The_Scrubhello yes i interacted0432020-07-20 23:59:59+00:00Twitter for iPhone1265False11815
21285363856202698753844768299388813314kuroslaysHello lew,0002020-07-20 23:59:58+00:00Twitter for iPhone1201False7332
312853638560559513631214501518646247425bubsjiim nervous HELLO0002020-07-20 23:59:58+00:00Twitter for Android568False10844
41285363852851511301811267164476841984realJakeLoganButt Stallion says hello neck gaiter0002020-07-20 23:59:58+00:00WordPress.com9False147
\n", 717 | "
" 718 | ], 719 | "text/plain": [ 720 | " Tweet Id Tweet User Id Tweet User \\\n", 721 | "0 1285363858832363520 1182717701203972096 workinclassbird \n", 722 | "1 1285363857242947584 1183184898070405120 Soap_The_Scrub \n", 723 | "2 1285363856202698753 844768299388813314 kuroslays \n", 724 | "3 1285363856055951363 1214501518646247425 bubsji \n", 725 | "4 1285363852851511301 811267164476841984 realJakeLogan \n", 726 | "\n", 727 | " Text Retweets Favorites Replies \\\n", 728 | "0 friend..... hello 0 3 1 \n", 729 | "1 hello yes i interacted 0 4 3 \n", 730 | "2 Hello lew, 0 0 0 \n", 731 | "3 im nervous HELLO 0 0 0 \n", 732 | "4 Butt Stallion says hello neck gaiter 0 0 0 \n", 733 | "\n", 734 | " Datetime Tweet Source Follower Count \\\n", 735 | "0 2020-07-20 23:59:59+00:00 Twitter for iPhone 1877 \n", 736 | "1 2020-07-20 23:59:59+00:00 Twitter for iPhone 1265 \n", 737 | "2 2020-07-20 23:59:58+00:00 Twitter for iPhone 1201 \n", 738 | "3 2020-07-20 23:59:58+00:00 Twitter for Android 568 \n", 739 | "4 2020-07-20 23:59:58+00:00 WordPress.com 9 \n", 740 | "\n", 741 | " Verified Status Tweets Count \n", 742 | "0 False 561 \n", 743 | "1 False 11815 \n", 744 | "2 False 7332 \n", 745 | "3 False 10844 \n", 746 | "4 False 147 " 747 | ] 748 | }, 749 | "execution_count": 26, 750 | "metadata": {}, 751 | "output_type": "execute_result" 752 | } 753 | ], 754 | "source": [ 755 | "# Output of data\n", 756 | "tweet_df_test.head()" 757 | ] 758 | }, 759 | { 760 | "cell_type": "markdown", 761 | "metadata": {}, 762 | "source": [ 763 | "As you can see there are now four new columns added on at the end of this dataframe.\n", 764 | "\n", 765 | "It's worth noting the above code is not done efficiently in regards to time and API requests. If you find yourself using either method to access more than one piece of information for each tweet the functions above are not the best way to do so because they send one request per tweet.attribute instead of collecting several attributes for one request.\n", 766 | "\n", 767 | "If you want to access several attributes per Tweet, there's a couple ways of doing so. Either create a list, store the data in the list then add it to the dataframe. Or create a function that will create a series and return it, then use pandas to apply this method to a dataframe. I'll showcase the former as it's easier to grasp." 
768 | ]
769 | },
770 | {
771 | "cell_type": "code",
772 | "execution_count": 24,
773 | "metadata": {},
774 | "outputs": [],
775 | "source": [
776 | "# Creating a copy of the original df to experiment with\n",
777 | "tweets_df_test_efficient = tweets_df.copy()"
778 | ]
779 | },
780 | {
781 | "cell_type": "code",
782 | "execution_count": 34,
783 | "metadata": {},
784 | "outputs": [],
785 | "source": [
786 | "# Creation of list to store scraped tweet data\n",
787 | "tweets_holding_list = []\n",
788 | "\n",
789 | "def extract_tweepy_tweet_info_efficient(row):\n",
790 | "    # Using Tweepy API to request tweet data\n",
791 | "    tweet = api.get_status(row['Tweet Id'])\n",
792 | "    \n",
793 | "    # Storing chosen tweet data in tweets_holding_list to be used later\n",
794 | "    tweets_holding_list.append((tweet.source, tweet.user.statuses_count, tweet.user.followers_count, tweet.user.verified))"
795 | ]
796 | },
797 | {
798 | "cell_type": "code",
799 | "execution_count": null,
800 | "metadata": {},
801 | "outputs": [],
802 | "source": [
803 | "# Applying the extract_tweepy_tweet_info_efficient function to store tweet data in the tweets_holding_list\n",
804 | "tweets_df_test_efficient.apply(extract_tweepy_tweet_info_efficient, axis=1)\n",
805 | "\n",
806 | "# Creating new columns from the data currently held in tweets_holding_list\n",
807 | "tweets_df_test_efficient[['Tweet Source', 'User Tweet Count', 'Follower Count', 'User Verified Status']] = pd.DataFrame(tweets_holding_list)"
808 | ]
809 | },
810 | {
811 | "cell_type": "code",
812 | "execution_count": 43,
813 | "metadata": {},
814 | "outputs": [
815 | {
816 | "data": {
817 | "text/html": [
818 | "<div>
\n", 819 | "\n", 832 | "\n", 833 | " \n", 834 | " \n", 835 | " \n", 836 | " \n", 837 | " \n", 838 | " \n", 839 | " \n", 840 | " \n", 841 | " \n", 842 | " \n", 843 | " \n", 844 | " \n", 845 | " \n", 846 | " \n", 847 | " \n", 848 | " \n", 849 | " \n", 850 | " \n", 851 | " \n", 852 | " \n", 853 | " \n", 854 | " \n", 855 | " \n", 856 | " \n", 857 | " \n", 858 | " \n", 859 | " \n", 860 | " \n", 861 | " \n", 862 | " \n", 863 | " \n", 864 | " \n", 865 | " \n", 866 | " \n", 867 | " \n", 868 | " \n", 869 | " \n", 870 | " \n", 871 | " \n", 872 | " \n", 873 | " \n", 874 | " \n", 875 | " \n", 876 | " \n", 877 | " \n", 878 | " \n", 879 | " \n", 880 | " \n", 881 | " \n", 882 | " \n", 883 | " \n", 884 | " \n", 885 | " \n", 886 | " \n", 887 | " \n", 888 | " \n", 889 | " \n", 890 | " \n", 891 | " \n", 892 | " \n", 893 | " \n", 894 | " \n", 895 | " \n", 896 | " \n", 897 | " \n", 898 | " \n", 899 | " \n", 900 | " \n", 901 | " \n", 902 | " \n", 903 | " \n", 904 | " \n", 905 | " \n", 906 | " \n", 907 | " \n", 908 | " \n", 909 | " \n", 910 | " \n", 911 | " \n", 912 | " \n", 913 | " \n", 914 | " \n", 915 | " \n", 916 | " \n", 917 | " \n", 918 | " \n", 919 | " \n", 920 | " \n", 921 | " \n", 922 | " \n", 923 | " \n", 924 | " \n", 925 | " \n", 926 | " \n", 927 | "
Tweet IdTweet User IdTweet UserTextRetweetsFavoritesRepliesDatetimeTweet SourceUser Tweet CountFollower CountUser Verified Status
012853638588323635201182717701203972096workinclassbirdfriend..... hello0312020-07-20 23:59:59+00:00Twitter for iPhone5611878False
112853638572429475841183184898070405120Soap_The_Scrubhello yes i interacted0432020-07-20 23:59:59+00:00Twitter for iPhone118191266False
21285363856202698753844768299388813314kuroslaysHello lew,0002020-07-20 23:59:58+00:00Twitter for iPhone73331200False
312853638560559513631214501518646247425bubsjiim nervous HELLO0002020-07-20 23:59:58+00:00Twitter for Android10861568False
41285363852851511301811267164476841984realJakeLoganButt Stallion says hello neck gaiter0002020-07-20 23:59:58+00:00WordPress.com1479False
\n", 928 | "
" 929 | ], 930 | "text/plain": [ 931 | " Tweet Id Tweet User Id Tweet User \\\n", 932 | "0 1285363858832363520 1182717701203972096 workinclassbird \n", 933 | "1 1285363857242947584 1183184898070405120 Soap_The_Scrub \n", 934 | "2 1285363856202698753 844768299388813314 kuroslays \n", 935 | "3 1285363856055951363 1214501518646247425 bubsji \n", 936 | "4 1285363852851511301 811267164476841984 realJakeLogan \n", 937 | "\n", 938 | " Text Retweets Favorites Replies \\\n", 939 | "0 friend..... hello 0 3 1 \n", 940 | "1 hello yes i interacted 0 4 3 \n", 941 | "2 Hello lew, 0 0 0 \n", 942 | "3 im nervous HELLO 0 0 0 \n", 943 | "4 Butt Stallion says hello neck gaiter 0 0 0 \n", 944 | "\n", 945 | " Datetime Tweet Source User Tweet Count \\\n", 946 | "0 2020-07-20 23:59:59+00:00 Twitter for iPhone 561 \n", 947 | "1 2020-07-20 23:59:59+00:00 Twitter for iPhone 11819 \n", 948 | "2 2020-07-20 23:59:58+00:00 Twitter for iPhone 7333 \n", 949 | "3 2020-07-20 23:59:58+00:00 Twitter for Android 10861 \n", 950 | "4 2020-07-20 23:59:58+00:00 WordPress.com 147 \n", 951 | "\n", 952 | " Follower Count User Verified Status \n", 953 | "0 1878 False \n", 954 | "1 1266 False \n", 955 | "2 1200 False \n", 956 | "3 568 False \n", 957 | "4 9 False " 958 | ] 959 | }, 960 | "execution_count": 43, 961 | "metadata": {}, 962 | "output_type": "execute_result" 963 | } 964 | ], 965 | "source": [ 966 | "# Output of data\n", 967 | "tweets_df_test_efficient.head()" 968 | ] 969 | }, 970 | { 971 | "cell_type": "markdown", 972 | "metadata": {}, 973 | "source": [ 974 | "There you go. That's all there is to it. It's more efficient to only run the api request once and pull all the information you need than to send a request for each tweet.attribute. It'll save a lot more time in the long run." 975 | ] 976 | } 977 | ], 978 | "metadata": { 979 | "kernelspec": { 980 | "display_name": "Python 3", 981 | "language": "python", 982 | "name": "python3" 983 | }, 984 | "language_info": { 985 | "codemirror_mode": { 986 | "name": "ipython", 987 | "version": 3 988 | }, 989 | "file_extension": ".py", 990 | "mimetype": "text/x-python", 991 | "name": "python", 992 | "nbconvert_exporter": "python", 993 | "pygments_lexer": "ipython3", 994 | "version": "3.7.3" 995 | } 996 | }, 997 | "nbformat": 4, 998 | "nbformat_minor": 2 999 | } 1000 | -------------------------------------------------------------------------------- /BasicScraper/GetOldTweets3_Basic_Scraper.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "colab_type": "text", 7 | "id": "3MDPXp5-X80r" 8 | }, 9 | "source": [ 10 | "# Scraper for Twitter using GetOldTweets3\n", 11 | "\n", 12 | "Package: https://github.com/Mottl/GetOldTweets3\n", 13 | "\n", 14 | "### Notebook Author: Martin Beck" 15 | ] 16 | }, 17 | { 18 | "cell_type": "code", 19 | "execution_count": 1, 20 | "metadata": { 21 | "colab": { 22 | "base_uri": "https://localhost:8080/", 23 | "height": 1000 24 | }, 25 | "colab_type": "code", 26 | "id": "vp7x7kWeYABh", 27 | "outputId": "af1a20c2-2262-47f8-e27f-90076bd7860b", 28 | "scrolled": true 29 | }, 30 | "outputs": [], 31 | "source": [ 32 | "# Pip install GetOldTweets3 if you don't already have the package\n", 33 | "# !pip install GetOldTweets3\n", 34 | "\n", 35 | "# Imports\n", 36 | "import GetOldTweets3 as got\n", 37 | "import pandas as pd" 38 | ] 39 | }, 40 | { 41 | "cell_type": "markdown", 42 | "metadata": { 43 | "colab_type": "text", 44 | "id": "he3accCbyaWG" 45 | }, 46 | 
"source": [ 47 | "## Query by Username\n", 48 | "Creation of queries using GetOldTweets3\n", 49 | "\n", 50 | "Function is focused on querying by username then providing a CSV file of that query using pandas." 51 | ] 52 | }, 53 | { 54 | "cell_type": "code", 55 | "execution_count": 2, 56 | "metadata": { 57 | "colab": {}, 58 | "colab_type": "code", 59 | "id": "54rhT5wfZVXD" 60 | }, 61 | "outputs": [], 62 | "source": [ 63 | "# Function that pulls tweets from a specific username and turns to csv file\n", 64 | "\n", 65 | "# Parameters: (list of twitter usernames), (max number of most recent tweets to pull from)\n", 66 | "def username_tweets_to_csv(username, count):\n", 67 | " # Creation of query object\n", 68 | " tweetCriteria = got.manager.TweetCriteria().setUsername(username)\\\n", 69 | " .setMaxTweets(count)\n", 70 | " # Creation of list that contains all tweets\n", 71 | " tweets = got.manager.TweetManager.getTweets(tweetCriteria)\n", 72 | "\n", 73 | " # Creating list of chosen tweet data\n", 74 | " user_tweets = [[tweet.date, tweet.text] for tweet in tweets]\n", 75 | "\n", 76 | " # Creation of dataframe from tweets list\n", 77 | " tweets_df = pd.DataFrame(user_tweets, columns = ['Datetime', 'Text'])\n", 78 | "\n", 79 | " # Converting dataframe to CSV\n", 80 | " tweets_df.to_csv('{}-{}k-tweets.csv'.format(username, int(count/1000)), sep=',')" 81 | ] 82 | }, 83 | { 84 | "cell_type": "code", 85 | "execution_count": null, 86 | "metadata": {}, 87 | "outputs": [], 88 | "source": [ 89 | "# Now to use the function created\n", 90 | "# Input username(s) to scrape tweets and name csv file\n", 91 | "# Max recent tweets pulls x amount of most recent tweets from that user\n", 92 | "username = 'jack'\n", 93 | "count = 2000\n", 94 | "\n", 95 | "# Calling function to turn username's past x amount of tweets into a CSV file\n", 96 | "username_tweets_to_csv(username, count)" 97 | ] 98 | }, 99 | { 100 | "cell_type": "markdown", 101 | "metadata": { 102 | "colab_type": "text", 103 | "id": "G7r4McYgyoQy" 104 | }, 105 | "source": [ 106 | "## Query by Text Search\n", 107 | "Function is focused on querying by text query then providing a CSV file of that query using pandas." 
108 | ]
109 | },
110 | {
111 | "cell_type": "code",
112 | "execution_count": 34,
113 | "metadata": {
114 | "colab": {},
115 | "colab_type": "code",
116 | "id": "JSjpix_9A5e6"
117 | },
118 | "outputs": [],
119 | "source": [
120 | "# Function that pulls tweets based on a general search query and turns them into a csv file\n",
121 | "\n",
122 | "# Parameters: (text query you want to search), (max number of most recent tweets to pull from)\n",
123 | "def text_query_to_csv(text_query, count):\n",
124 | "    # Creation of query object\n",
125 | "    tweetCriteria = got.manager.TweetCriteria().setQuerySearch(text_query)\\\n",
126 | "                                            .setMaxTweets(count)\n",
127 | "    # Creation of list that contains all tweets\n",
128 | "    tweets = got.manager.TweetManager.getTweets(tweetCriteria)\n",
129 | "\n",
130 | "    # Creating list of chosen tweet data\n",
131 | "    text_tweets = [[tweet.date, tweet.text] for tweet in tweets]\n",
132 | "\n",
133 | "    # Creation of dataframe from tweets\n",
134 | "    tweets_df = pd.DataFrame(text_tweets, columns = ['Datetime', 'Text'])\n",
135 | "\n",
136 | "    # Converting tweets dataframe to csv file\n",
137 | "    tweets_df.to_csv('{}-{}k-tweets.csv'.format(text_query, int(count/1000)), sep=',')"
138 | ]
139 | },
140 | {
141 | "cell_type": "code",
142 | "execution_count": null,
143 | "metadata": {},
144 | "outputs": [],
145 | "source": [
146 | "# Now to use the function created\n",
147 | "# Input search query to scrape tweets and name the csv file\n",
148 | "# count pulls the x most recent tweets matching that search\n",
149 | "text_query = 'USA Election 2020'\n",
150 | "count = 5000\n",
151 | "\n",
152 | "# Calling function to query X relevant tweets and create a CSV file\n",
153 | "text_query_to_csv(text_query, count)"
154 | ]
155 | }
156 | ],
157 | "metadata": {
158 | "colab": {
159 | "collapsed_sections": [],
160 | "name": "GetOldTweets3 Twitter Scraper",
161 | "provenance": []
162 | },
163 | "kernelspec": {
164 | "display_name": "Python 3",
165 | "language": "python",
166 | "name": "python3"
167 | },
168 | "language_info": {
169 | "codemirror_mode": {
170 | "name": "ipython",
171 | "version": 3
172 | },
173 | "file_extension": ".py",
174 | "mimetype": "text/x-python",
175 | "name": "python",
176 | "nbconvert_exporter": "python",
177 | "pygments_lexer": "ipython3",
178 | "version": "3.7.3"
179 | }
180 | },
181 | "nbformat": 4,
182 | "nbformat_minor": 1
183 | }
184 | --------------------------------------------------------------------------------
/BasicScraper/README.md:
1 | # NOTE: the following information is heavily outdated. GetOldTweets3 is no longer usable, and the Tweepy code utilizes Twitter API V1; V2 is what is currently used.
2 | 
3 | 
4 | 
5 | 
6 | ---
7 | ---
8 | 
9 | # How to Scrape Tweets from Twitter
10 | This folder contains my Jupyter notebooks for my basic scraping tutorial published [here](https://towardsdatascience.com/how-to-scrape-tweets-from-twitter-59287e20f0f1 "written article").
11 | 
12 | This folder's notebooks scrape tweets using two different packages in Python.
13 | * GetOldTweets3 14 | * Tweepy 15 | -------------------------------------------------------------------------------- /BasicScraper/Tweepy_Basic_Scraper.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "colab_type": "text", 7 | "id": "xh92xbMkLy28" 8 | }, 9 | "source": [ 10 | "# Scraper for Twitter using Tweepy\n", 11 | "\n", 12 | "Package Github: https://github.com/tweepy/tweepy\n", 13 | "\n", 14 | "Package Documentation: https://tweepy.readthedocs.io/en/latest/\n", 15 | "\n", 16 | "### Notebook Author: Martin Beck" 17 | ] 18 | }, 19 | { 20 | "cell_type": "code", 21 | "execution_count": 3, 22 | "metadata": { 23 | "colab": { 24 | "base_uri": "https://localhost:8080/", 25 | "height": 213 26 | }, 27 | "colab_type": "code", 28 | "id": "90OU2SDJL2Q9", 29 | "outputId": "89d239d4-dc97-43c7-fff0-cbbe793bf094" 30 | }, 31 | "outputs": [], 32 | "source": [ 33 | "# Pip install Tweepy if you don't already have the package\n", 34 | "# !pip install tweepy\n", 35 | "\n", 36 | "# Imports\n", 37 | "import tweepy\n", 38 | "import pandas as pd\n", 39 | "import time" 40 | ] 41 | }, 42 | { 43 | "cell_type": "markdown", 44 | "metadata": { 45 | "colab_type": "text", 46 | "id": "5q3dtxauP0KR" 47 | }, 48 | "source": [ 49 | "## Credentials and Authorization" 50 | ] 51 | }, 52 | { 53 | "cell_type": "code", 54 | "execution_count": 4, 55 | "metadata": { 56 | "colab": {}, 57 | "colab_type": "code", 58 | "id": "4NcOQy9XM5hR" 59 | }, 60 | "outputs": [], 61 | "source": [ 62 | "# Credentials\n", 63 | "\n", 64 | "consumer_key = \"XXXXXX\"\n", 65 | "consumer_secret = \"XXXXXX\"\n", 66 | "access_token = \"XXXXXX\"\n", 67 | "access_token_secret = \"XXXXXX\"\n", 68 | "\n", 69 | "auth = tweepy.OAuthHandler(consumer_key, consumer_secret)\n", 70 | "auth.set_access_token(access_token, access_token_secret)\n", 71 | "api = tweepy.API(auth,wait_on_rate_limit=True)" 72 | ] 73 | }, 74 | { 75 | "cell_type": "markdown", 76 | "metadata": { 77 | "colab_type": "text", 78 | "id": "LvBbNQXgM3QI" 79 | }, 80 | "source": [ 81 | "## Query by Username\n", 82 | "Creation of queries using Tweepy API\n", 83 | "\n", 84 | "Function is focused on completing the query then providing a CSV file of that query using pandas" 85 | ] 86 | }, 87 | { 88 | "cell_type": "code", 89 | "execution_count": 5, 90 | "metadata": { 91 | "colab": {}, 92 | "colab_type": "code", 93 | "id": "fguMqU2ifc5h" 94 | }, 95 | "outputs": [], 96 | "source": [ 97 | "tweets = []\n", 98 | "\n", 99 | "def username_tweets_to_csv(username,count):\n", 100 | " try: \n", 101 | " # Creation of query method using parameters\n", 102 | " tweets = tweepy.Cursor(api.user_timeline,id=username).items(count)\n", 103 | "\n", 104 | " # Pulling information from tweets iterable object\n", 105 | " tweets_list = [[tweet.created_at, tweet.id, tweet.text] for tweet in tweets]\n", 106 | "\n", 107 | " # Creation of dataframe from tweets list\n", 108 | " # Add or remove columns as you remove tweet information\n", 109 | " tweets_df = pd.DataFrame(tweets_list,columns=['Datetime', 'Tweet Id', 'Text'])\n", 110 | "\n", 111 | " # Converting dataframe to CSV \n", 112 | " tweets_df.to_csv('{}-tweets.csv'.format(username), sep=',', index = False)\n", 113 | "\n", 114 | " except BaseException as e:\n", 115 | " print('failed on_status,',str(e))\n", 116 | " time.sleep(3)" 117 | ] 118 | }, 119 | { 120 | "cell_type": "code", 121 | "execution_count": 6, 122 | "metadata": {}, 123 | "outputs": [], 124 | "source": [ 
125 | "# Input username to scrape tweets and name csv file\n", 126 | "# Max recent tweets pulls x amount of most recent tweets from that user\n", 127 | "username = 'jack'\n", 128 | "count = 150\n", 129 | "\n", 130 | "# Calling function to turn username's past X amount of tweets into a CSV file\n", 131 | "username_tweets_to_csv(username, count)" 132 | ] 133 | }, 134 | { 135 | "cell_type": "markdown", 136 | "metadata": { 137 | "colab_type": "text", 138 | "id": "jFe9EonmM6u9" 139 | }, 140 | "source": [ 141 | "## Query by Text Search\n", 142 | "Function is focused on completing the query then providing a CSV file of that query using pandas" 143 | ] 144 | }, 145 | { 146 | "cell_type": "code", 147 | "execution_count": 7, 148 | "metadata": { 149 | "colab": {}, 150 | "colab_type": "code", 151 | "id": "1hOeCFq6M83k" 152 | }, 153 | "outputs": [], 154 | "source": [ 155 | "tweets = []\n", 156 | "\n", 157 | "def text_query_to_csv(text_query,count):\n", 158 | " try:\n", 159 | " # Creation of query method using parameters\n", 160 | " tweets = tweepy.Cursor(api.search,q=text_query).items(count)\n", 161 | "\n", 162 | " # Pulling information from tweets iterable object\n", 163 | " tweets_list = [[tweet.created_at, tweet.id, tweet.text] for tweet in tweets]\n", 164 | "\n", 165 | " # Creation of dataframe from tweets list\n", 166 | " # Add or remove columns as you remove tweet information\n", 167 | " tweets_df = pd.DataFrame(tweets_list,columns=['Datetime', 'Tweet Id', 'Text'])\n", 168 | "\n", 169 | " # Converting dataframe to CSV \n", 170 | " tweets_df.to_csv('{}-tweets.csv'.format(text_query), sep=',', index = False)\n", 171 | "\n", 172 | " except BaseException as e:\n", 173 | " print('failed on_status,',str(e))\n", 174 | " time.sleep(3)" 175 | ] 176 | }, 177 | { 178 | "cell_type": "code", 179 | "execution_count": 8, 180 | "metadata": {}, 181 | "outputs": [], 182 | "source": [ 183 | "# Input search query to scrape tweets and name csv file\n", 184 | "# Max recent tweets pulls x amount of most recent tweets from that user\n", 185 | "text_query = 'USA Election 2020'\n", 186 | "count = 150\n", 187 | "\n", 188 | "# Calling function to query X amount of relevant tweets and create a CSV file\n", 189 | "text_query_to_csv(text_query, count)" 190 | ] 191 | } 192 | ], 193 | "metadata": { 194 | "colab": { 195 | "collapsed_sections": [], 196 | "name": "Tweepy Twitter Scraper", 197 | "provenance": [] 198 | }, 199 | "kernelspec": { 200 | "display_name": "Python 3", 201 | "language": "python", 202 | "name": "python3" 203 | }, 204 | "language_info": { 205 | "codemirror_mode": { 206 | "name": "ipython", 207 | "version": 3 208 | }, 209 | "file_extension": ".py", 210 | "mimetype": "text/x-python", 211 | "name": "python", 212 | "nbconvert_exporter": "python", 213 | "pygments_lexer": "ipython3", 214 | "version": "3.7.3" 215 | } 216 | }, 217 | "nbformat": 4, 218 | "nbformat_minor": 1 219 | } 220 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Scraping Tweets from Twitter 2 | This repository contains various materials that follow my series of tweet scraping articles. 3 | 4 | The **ScraperV4** folder contains materials from my Twitter scraping tutorial article available [here](https://betterprogramming.pub/how-to-scrape-tweets-from-twitter-141ed19abb10). 
5 | This article covers:
6 | 
7 | * Setting up Tweepy with Twitter API V2
8 | * Simple queries with Tweepy
9 | 
10 | ## OUTDATED MATERIALS
11 | All materials pertaining to the sections below are outdated and are left for archival reasons. Many API changes prevent scrapers like snscrape and GetOldTweets3 from working. The only version that consistently works uses Twitter API V2, as shown in the section above.
12 | If you raise any issues on my code, please refer to the specific sub-directory and Python libraries used so I know where to help.
13 | 
14 | The BasicScraper folder contains materials from my beginner scraping tutorial article available [here](https://towardsdatascience.com/how-to-scrape-tweets-from-twitter-59287e20f0f1 "written article").
15 | This article covers:
16 | * Setting up Tweepy and GetOldTweets3
17 | * Simple queries with Tweepy and GetOldTweets3
18 | 
19 | The AdvScraper folder contains materials from my advanced scraping tutorial article available [here](https://towardsdatascience.com/how-to-scrape-more-information-from-tweets-on-twitter-44fd540b8a1f "written article").
20 | This article covers:
21 | * Pulling more information from tweets with Tweepy and GetOldTweets3
22 | * Pulling user information from tweets with Tweepy and GetOldTweets3
23 | * Scraping using filters with Tweepy and GetOldTweets3
24 | 
25 | The snscrape folder contains materials from my snscrape scraping tutorial article available [here](https://medium.com/better-programming/how-to-scrape-tweets-with-snscrape-90124ed006af).
26 | This article covers:
27 | * Setting up snscrape
28 | * Different ways to use snscrape
29 | * Simple queries with snscrape
30 | --------------------------------------------------------------------------------
/ScraperV4/README.md:
1 | # How to Scrape Tweets from Twitter
2 | This folder contains my Jupyter notebooks for my updated basic scraping tutorial published [here](https://betterprogramming.pub/how-to-scrape-tweets-from-twitter-141ed19abb10).
3 | 
4 | This folder's notebook scrapes tweets using the Tweepy package in Python.
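
For orientation, the core pattern the notebook follows looks roughly like this (a minimal sketch, assuming you have a valid API V2 bearer token; see the notebook for the full functions):

```python
import tweepy

# Authenticate against Twitter API V2 with a bearer token
client = tweepy.Client("YOUR_BEARER_TOKEN")

# Grab up to 10 recent tweets matching a keyword
response = client.search_recent_tweets("dogs", max_results=10)
for tweet in response.data or []:  # .data is None when nothing matched
    print(tweet.id, tweet.text)
```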
5 | --------------------------------------------------------------------------------
/ScraperV4/Tweepy_Scraper_V4.ipynb:
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "id": "554d9ee5",
6 | "metadata": {},
7 | "source": [
8 | "# Scraper for Twitter Using Tweepy\n",
9 | "Package GitHub: https://github.com/tweepy/tweepy\n",
10 | "\n",
11 | "Package Documentation: https://tweepy.readthedocs.io/en/latest/"
12 | ]
13 | },
14 | {
15 | "cell_type": "markdown",
16 | "id": "ba8aff69",
17 | "metadata": {},
18 | "source": [
19 | "## Notebook Author: Martin Beck"
20 | ]
21 | },
22 | {
23 | "cell_type": "code",
24 | "execution_count": 1,
25 | "id": "1e38a14f",
26 | "metadata": {},
27 | "outputs": [],
28 | "source": [
29 | "# Pip install Tweepy if you don't already have the package\n",
30 | "# !pip install tweepy\n",
31 | "\n",
32 | "# Imports\n",
33 | "import tweepy\n",
34 | "import pandas as pd"
35 | ]
36 | },
37 | {
38 | "cell_type": "markdown",
39 | "id": "c8f69264",
40 | "metadata": {},
41 | "source": [
42 | "## Credentials and Authorization"
43 | ]
44 | },
45 | {
46 | "cell_type": "code",
47 | "execution_count": 2,
48 | "id": "5d396c11",
49 | "metadata": {},
50 | "outputs": [],
51 | "source": [
52 | "# Credentials\n",
53 | "bearer_token = \"XXXXXXX\"\n",
54 | "\n",
55 | "client = tweepy.Client(bearer_token)"
56 | ]
57 | },
58 | {
59 | "cell_type": "markdown",
60 | "id": "77359a30",
61 | "metadata": {},
62 | "source": [
63 | "## Query by Username\n",
64 | "Function is focused on searching by username, then providing a CSV file of that scrape using pandas"
65 | ]
66 | },
67 | {
68 | "cell_type": "code",
69 | "execution_count": 189,
70 | "id": "1dc585a4",
71 | "metadata": {},
72 | "outputs": [],
73 | "source": [
74 | "def username_search_to_csv(username, count):\n",
75 | "    try:\n",
76 | "        # Grabbing user id from username\n",
77 | "        user_id = client.get_user(username=username).data.id\n",
78 | "        \n",
79 | "        # Creation of query method using parameters\n",
80 | "        tweets = tweepy.Paginator(client.get_users_tweets, user_id, tweet_fields=[\"author_id\", \"created_at\", \"lang\", \"public_metrics\"], expansions=[\"author_id\"], max_results=100).flatten(limit = count)\n",
81 | "        \n",
82 | "        tweets_list = []\n",
83 | "        \n",
84 | "        # Pulling information from tweets generator\n",
85 | "        tweets_list = [[tweet.created_at, tweet.id, tweet.text, tweet.public_metrics[\"retweet_count\"], tweet.public_metrics[\"like_count\"]] for tweet in tweets]\n",
86 | "        \n",
87 | "        # Creation of dataframe from tweets list\n",
88 | "        tweets_df = pd.DataFrame(tweets_list, columns=[\"Created At\", \"Tweet Id\", \"Text\", \"Retweet Count\", \"Like Count\"])\n",
89 | "        \n",
90 | "        # Converting dataframe to CSV\n",
91 | "        tweets_df.to_csv(\"{}-tweets.csv\".format(username), sep=\",\", index = False)\n",
92 | "        \n",
93 | "        print(\"Completed Scrape!\")\n",
94 | "        \n",
95 | "    except BaseException as e:\n",
96 | "        print(\"failed on_status,\",str(e))"
97 | ]
98 | },
99 | {
100 | "cell_type": "code",
101 | "execution_count": 4,
102 | "id": "1abdf7d4",
103 | "metadata": {},
104 | "outputs": [],
105 | "source": [
106 | "# Input username to scrape tweets and name the csv file\n",
107 | "username = \"BillGates\"\n",
108 | "count = 10\n",
109 | "\n",
110 | "# Calling function to scrape the user's X most recent tweets and create a CSV file\n",
111 | "username_search_to_csv(username, count)"
112 | ]
113 | },
114 | {
115 | "cell_type": "markdown",
116 | "id":
"4eee4399", 117 | "metadata": {}, 118 | "source": [ 119 | "## Scrape by Keyword Search\n", 120 | "Function is focused on using Keyword Search then providing a CSV file of that scrape using pandas" 121 | ] 122 | }, 123 | { 124 | "cell_type": "code", 125 | "execution_count": 2, 126 | "id": "0f3198ff", 127 | "metadata": {}, 128 | "outputs": [], 129 | "source": [ 130 | "def keyword_search_to_csv(keyword_search, count):\n", 131 | " try:\n", 132 | " # Creation of query method using parameters\n", 133 | " tweets = tweepy.Paginator(client.search_recent_tweets, keyword_search, tweet_fields=[\"author_id\", \"created_at\", \"lang\", \"public_metrics\"], user_fields=[\"username\"]).flatten(limit = count)\n", 134 | " \n", 135 | " tweets_list = []\n", 136 | " \n", 137 | " # Pulling information from tweets generator\n", 138 | " tweets_list = [[tweet.created_at, tweet.id, tweet.text, tweet.public_metrics[\"retweet_count\"], tweet.public_metrics[\"like_count\"]]for tweet in tweets]\n", 139 | " \n", 140 | " # Creation of dataframe from tweets list\n", 141 | " tweets_df = pd.DataFrame(tweets_list, columns=[\"Created At\", \"Tweet Id\", \"Text\", \"Retweet Count\", \"Like Count\"])\n", 142 | " \n", 143 | " # Converting dataframe to CSV \n", 144 | " tweets_df.to_csv(\"{}-tweets.csv\".format(keyword_search), sep=\",\", index = False)\n", 145 | " \n", 146 | " print(\"Completed Scrape!\")\n", 147 | " \n", 148 | " except BaseException as e:\n", 149 | " print(\"failed on_status,\",str(e))" 150 | ] 151 | }, 152 | { 153 | "cell_type": "code", 154 | "execution_count": 3, 155 | "id": "8adcbae1", 156 | "metadata": {}, 157 | "outputs": [], 158 | "source": [ 159 | "# Input search query to scrape tweets and name csv file\n", 160 | "keyword_search = \"Dogs\"\n", 161 | "count = 10\n", 162 | "\n", 163 | "# Calling function to query X amount of relevant tweets and create a CSV file\n", 164 | "keyword_search_to_csv(keyword_search, count)" 165 | ] 166 | } 167 | ], 168 | "metadata": { 169 | "kernelspec": { 170 | "display_name": "Python 3 (ipykernel)", 171 | "language": "python", 172 | "name": "python3" 173 | }, 174 | "language_info": { 175 | "codemirror_mode": { 176 | "name": "ipython", 177 | "version": 3 178 | }, 179 | "file_extension": ".py", 180 | "mimetype": "text/x-python", 181 | "name": "python", 182 | "nbconvert_exporter": "python", 183 | "pygments_lexer": "ipython3", 184 | "version": "3.11.2" 185 | } 186 | }, 187 | "nbformat": 4, 188 | "nbformat_minor": 5 189 | } 190 | -------------------------------------------------------------------------------- /snscrape/README.md: -------------------------------------------------------------------------------- 1 | # NOTE, the following information is heavily outdated. Snscrape is currently unusable as mentioned in its GitHub issues: https://github.com/JustAnotherArchivist/snscrape/issues/996 2 | 3 | 4 | 5 | 6 | --- 7 | ---# How to Scrape Tweets with snscrape 8 | This folder contains the jupyter notebooks for my snscrape scraping tutorial published [here](https://medium.com/better-programming/how-to-scrape-tweets-with-snscrape-90124ed006af). 9 | 10 | This folder contains two subfolders based on what method you use with snscrape such as using the CLI commands or the Python wrapper available with snscrape. Each sub-folder contains a Jupyter notebook and Python script that follows the code snippets in my article. 11 | 12 | The contents of this folder and its subfolders are shown below. 
13 | 14 | * cli-with-python 15 | * snscrape-python-cli.ipynb 16 | * snscrape-python-cli.py 17 | * python-wrapper 18 | * snscrape-python-wrapper.ipynb 19 | * snscrape-python-wrapper.py 20 | -------------------------------------------------------------------------------- /snscrape/cli-with-python/snscrape-python-cli.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Article Notebook for Scraping Twitter Using snscrape's CLI Commands With Python\n", 8 | "
Package GitHub: https://github.com/JustAnotherArchivist/snscrape\n",
This notebook will be using the development version of snscrape\n", 10 | "\n", 11 | "Article Read-Along: https://medium.com/better-programming/how-to-scrape-tweets-with-snscrape-90124ed006af\n", 12 | "\n", 13 | "### Notebook Author: Martin Beck\n", 14 | "Information current as of November 26th, 2020
\n",
15 | "\n",
16 | "This notebook contains materials for scraping tweets from Twitter using snscrape's CLI commands with Python.\n",
17 | "\n",
18 | "Dependencies: \n",
19 | "- Your Python version must be 3.8 or higher. The development version of snscrape will not work with Python 3.7 or lower. You can download the latest Python version [here](https://www.python.org/downloads/).\n",
20 | "- The development version of snscrape; uncomment the pip install line in the cell below if you don't already have it.\n",
21 | "- Pandas; its dataframes allow easy manipulation and indexing of the data. This is more of a preference, but it's what I follow in this notebook."
22 | ]
23 | },
24 | {
25 | "cell_type": "code",
26 | "execution_count": 4,
27 | "metadata": {},
28 | "outputs": [],
29 | "source": [
30 | "# Run the pip install command below if you don't already have the library\n",
31 | "# !pip install git+https://github.com/JustAnotherArchivist/snscrape.git\n",
32 | "\n",
33 | "# Run the command below if you don't already have Pandas\n",
34 | "# !pip install pandas\n",
35 | "\n",
36 | "# Imports\n",
37 | "import os\n",
38 | "import pandas as pd"
39 | ]
40 | },
41 | {
42 | "cell_type": "markdown",
43 | "metadata": {},
44 | "source": [
45 | "# Query by Username\n",
46 | "The code below will scrape 100 tweets from a username, then provide a CSV file with Pandas"
47 | ]
48 | },
49 | {
50 | "cell_type": "code",
51 | "execution_count": null,
52 | "metadata": {
53 | "scrolled": true
54 | },
55 | "outputs": [],
56 | "source": [
57 | "# Setting variables to be used in format string command below\n",
58 | "tweet_count = 100\n",
59 | "username = \"jack\"\n",
60 | "\n",
61 | "# Using OS library to call CLI commands in Python\n",
62 | "os.system(\"snscrape --jsonl --max-results {} twitter-search 'from:{}' > user-tweets.json\".format(tweet_count, username))"
63 | ]
64 | },
65 | {
66 | "cell_type": "code",
67 | "execution_count": 6,
68 | "metadata": {},
69 | "outputs": [
70 | {
71 | "data": {
72 | "text/html": [
73 | "<div>
\n", 74 | "\n", 87 | "\n", 88 | " \n", 89 | " \n", 90 | " \n", 91 | " \n", 92 | " \n", 93 | " \n", 94 | " \n", 95 | " \n", 96 | " \n", 97 | " \n", 98 | " \n", 99 | " \n", 100 | " \n", 101 | " \n", 102 | " \n", 103 | " \n", 104 | " \n", 105 | " \n", 106 | " \n", 107 | " \n", 108 | " \n", 109 | " \n", 110 | " \n", 111 | " \n", 112 | " \n", 113 | " \n", 114 | " \n", 115 | " \n", 116 | " \n", 117 | " \n", 118 | " \n", 119 | " \n", 120 | " \n", 121 | " \n", 122 | " \n", 123 | " \n", 124 | " \n", 125 | " \n", 126 | " \n", 127 | " \n", 128 | " \n", 129 | " \n", 130 | " \n", 131 | " \n", 132 | " \n", 133 | " \n", 134 | " \n", 135 | " \n", 136 | " \n", 137 | " \n", 138 | " \n", 139 | " \n", 140 | " \n", 141 | " \n", 142 | " \n", 143 | " \n", 144 | " \n", 145 | " \n", 146 | " \n", 147 | " \n", 148 | " \n", 149 | " \n", 150 | " \n", 151 | " \n", 152 | " \n", 153 | " \n", 154 | " \n", 155 | " \n", 156 | " \n", 157 | " \n", 158 | " \n", 159 | " \n", 160 | " \n", 161 | " \n", 162 | " \n", 163 | " \n", 164 | " \n", 165 | " \n", 166 | " \n", 167 | " \n", 168 | " \n", 169 | " \n", 170 | " \n", 171 | " \n", 172 | " \n", 173 | " \n", 174 | " \n", 175 | " \n", 176 | " \n", 177 | " \n", 178 | " \n", 179 | " \n", 180 | " \n", 181 | " \n", 182 | " \n", 183 | " \n", 184 | " \n", 185 | " \n", 186 | " \n", 187 | " \n", 188 | " \n", 189 | " \n", 190 | " \n", 191 | " \n", 192 | " \n", 193 | " \n", 194 | " \n", 195 | " \n", 196 | " \n", 197 | " \n", 198 | " \n", 199 | " \n", 200 | " \n", 201 | " \n", 202 | " \n", 203 | " \n", 204 | " \n", 205 | " \n", 206 | " \n", 207 | " \n", 208 | " \n", 209 | " \n", 210 | " \n", 211 | " \n", 212 | " \n", 213 | " \n", 214 | " \n", 215 | " \n", 216 | " \n", 217 | " \n", 218 | " \n", 219 | " \n", 220 | " \n", 221 | " \n", 222 | " \n", 223 | " \n", 224 | "
urldatecontentrenderedContentiduseroutlinkstcooutlinksreplyCountretweetCountlikeCountquoteCountconversationIdlangsourcemediaretweetedTweetquotedTweetmentionedUsers
0https://twitter.com/jack/status/13324354308016...2020-11-27 21:25:36+00:00@JesseDorogusker @Square ❤️@JesseDorogusker @Square ❤️1332435430801690624{'username': 'jack', 'displayname': 'jack', 'i...[][]54822611332428871891775488und<a href=\"http://twitter.com/download/iphone\" r...NaNNaNNone[{'username': 'JesseDorogusker', 'displayname'...
1https://twitter.com/jack/status/13291496370060...2020-11-18 19:49:02+00:00@NeerajKA Welcome!@NeerajKA Welcome!1329149637006041088{'username': 'jack', 'displayname': 'jack', 'i...[][]721480081329140522565439490en<a href=\"http://twitter.com/download/iphone\" r...NaNNaNNone[{'username': 'NeerajKA', 'displayname': 'Neer...
2https://twitter.com/jack/status/13291372550263...2020-11-18 18:59:50+00:00Join @CashApp! #Bitcoin https://t.co/SbYANIZyixJoin @CashApp! #Bitcoin twitter.com/owenbjenni...1329137255026311168{'username': 'jack', 'displayname': 'jack', 'i...[https://twitter.com/owenbjennings/status/1329...[https://t.co/SbYANIZyix]58527725071321329137255026311168en<a href=\"http://twitter.com/download/iphone\" r...NaNNaN{'url': 'https://twitter.com/owenbjennings/sta...[{'username': 'CashApp', 'displayname': 'Cash ...
3https://twitter.com/jack/status/13291366656847...2020-11-18 18:57:29+00:00@kateconger @sarahintampa Nah@kateconger @sarahintampa Nah1329136665684705280{'username': 'jack', 'displayname': 'jack', 'i...[][]385176101329126492731699203und<a href=\"http://twitter.com/download/iphone\" r...NaNNaNNone[{'username': 'kateconger', 'displayname': 'o....
4https://twitter.com/jack/status/13291358061921...2020-11-18 18:54:05+00:00@mmasnick Terrible idea! And terribly false.@mmasnick Terrible idea! And terribly false.1329135806192107521{'username': 'jack', 'displayname': 'jack', 'i...[][]5113222161329128773845860352en<a href=\"http://twitter.com/download/iphone\" r...NaNNaNNone[{'username': 'mmasnick', 'displayname': 'Mike...
\n", 225 | "
" 226 | ], 227 | "text/plain": [ 228 | " url \\\n", 229 | "0 https://twitter.com/jack/status/13324354308016... \n", 230 | "1 https://twitter.com/jack/status/13291496370060... \n", 231 | "2 https://twitter.com/jack/status/13291372550263... \n", 232 | "3 https://twitter.com/jack/status/13291366656847... \n", 233 | "4 https://twitter.com/jack/status/13291358061921... \n", 234 | "\n", 235 | " date content \\\n", 236 | "0 2020-11-27 21:25:36+00:00 @JesseDorogusker @Square ❤️ \n", 237 | "1 2020-11-18 19:49:02+00:00 @NeerajKA Welcome! \n", 238 | "2 2020-11-18 18:59:50+00:00 Join @CashApp! #Bitcoin https://t.co/SbYANIZyix \n", 239 | "3 2020-11-18 18:57:29+00:00 @kateconger @sarahintampa Nah \n", 240 | "4 2020-11-18 18:54:05+00:00 @mmasnick Terrible idea! And terribly false. \n", 241 | "\n", 242 | " renderedContent id \\\n", 243 | "0 @JesseDorogusker @Square ❤️ 1332435430801690624 \n", 244 | "1 @NeerajKA Welcome! 1329149637006041088 \n", 245 | "2 Join @CashApp! #Bitcoin twitter.com/owenbjenni... 1329137255026311168 \n", 246 | "3 @kateconger @sarahintampa Nah 1329136665684705280 \n", 247 | "4 @mmasnick Terrible idea! And terribly false. 1329135806192107521 \n", 248 | "\n", 249 | " user \\\n", 250 | "0 {'username': 'jack', 'displayname': 'jack', 'i... \n", 251 | "1 {'username': 'jack', 'displayname': 'jack', 'i... \n", 252 | "2 {'username': 'jack', 'displayname': 'jack', 'i... \n", 253 | "3 {'username': 'jack', 'displayname': 'jack', 'i... \n", 254 | "4 {'username': 'jack', 'displayname': 'jack', 'i... \n", 255 | "\n", 256 | " outlinks \\\n", 257 | "0 [] \n", 258 | "1 [] \n", 259 | "2 [https://twitter.com/owenbjennings/status/1329... \n", 260 | "3 [] \n", 261 | "4 [] \n", 262 | "\n", 263 | " tcooutlinks replyCount retweetCount likeCount quoteCount \\\n", 264 | "0 [] 54 8 226 1 \n", 265 | "1 [] 72 14 800 8 \n", 266 | "2 [https://t.co/SbYANIZyix] 585 277 2507 132 \n", 267 | "3 [] 38 5 176 10 \n", 268 | "4 [] 51 13 222 16 \n", 269 | "\n", 270 | " conversationId lang \\\n", 271 | "0 1332428871891775488 und \n", 272 | "1 1329140522565439490 en \n", 273 | "2 1329137255026311168 en \n", 274 | "3 1329126492731699203 und \n", 275 | "4 1329128773845860352 en \n", 276 | "\n", 277 | " source media retweetedTweet \\\n", 278 | "0 text-query-tweets.json'.format(tweet_count, since_date, text_query, until_date))" 344 | ] 345 | }, 346 | { 347 | "cell_type": "code", 348 | "execution_count": 9, 349 | "metadata": {}, 350 | "outputs": [ 351 | { 352 | "data": { 353 | "text/html": [ 354 | "
\n", 355 | "\n", 368 | "\n", 369 | " \n", 370 | " \n", 371 | " \n", 372 | " \n", 373 | " \n", 374 | " \n", 375 | " \n", 376 | " \n", 377 | " \n", 378 | " \n", 379 | " \n", 380 | " \n", 381 | " \n", 382 | " \n", 383 | " \n", 384 | " \n", 385 | " \n", 386 | " \n", 387 | " \n", 388 | " \n", 389 | " \n", 390 | " \n", 391 | " \n", 392 | " \n", 393 | " \n", 394 | " \n", 395 | " \n", 396 | " \n", 397 | " \n", 398 | " \n", 399 | " \n", 400 | " \n", 401 | " \n", 402 | " \n", 403 | " \n", 404 | " \n", 405 | " \n", 406 | " \n", 407 | " \n", 408 | " \n", 409 | " \n", 410 | " \n", 411 | " \n", 412 | " \n", 413 | " \n", 414 | " \n", 415 | " \n", 416 | " \n", 417 | " \n", 418 | " \n", 419 | " \n", 420 | " \n", 421 | " \n", 422 | " \n", 423 | " \n", 424 | " \n", 425 | " \n", 426 | " \n", 427 | " \n", 428 | " \n", 429 | " \n", 430 | " \n", 431 | " \n", 432 | " \n", 433 | " \n", 434 | " \n", 435 | " \n", 436 | " \n", 437 | " \n", 438 | " \n", 439 | " \n", 440 | " \n", 441 | " \n", 442 | " \n", 443 | " \n", 444 | " \n", 445 | " \n", 446 | " \n", 447 | " \n", 448 | " \n", 449 | " \n", 450 | " \n", 451 | " \n", 452 | " \n", 453 | " \n", 454 | " \n", 455 | " \n", 456 | " \n", 457 | " \n", 458 | " \n", 459 | " \n", 460 | " \n", 461 | " \n", 462 | " \n", 463 | " \n", 464 | " \n", 465 | " \n", 466 | " \n", 467 | " \n", 468 | " \n", 469 | " \n", 470 | " \n", 471 | " \n", 472 | " \n", 473 | " \n", 474 | " \n", 475 | " \n", 476 | " \n", 477 | " \n", 478 | " \n", 479 | " \n", 480 | " \n", 481 | " \n", 482 | " \n", 483 | " \n", 484 | " \n", 485 | " \n", 486 | " \n", 487 | " \n", 488 | " \n", 489 | " \n", 490 | " \n", 491 | " \n", 492 | " \n", 493 | " \n", 494 | " \n", 495 | " \n", 496 | " \n", 497 | " \n", 498 | " \n", 499 | " \n", 500 | " \n", 501 | " \n", 502 | " \n", 503 | " \n", 504 | " \n", 505 | "
urldatecontentrenderedContentiduseroutlinkstcooutlinksreplyCountretweetCountlikeCountquoteCountconversationIdlangsourcemediaretweetedTweetquotedTweetmentionedUsers
0https://twitter.com/TylerPaulUtt1/status/12889...2020-07-30 23:57:02+00:00@SiBuduh @langoinstitute do you know the Ko wo...@SiBuduh @langoinstitute do you know the Ko wo...1288986997143601152{'username': 'TylerPaulUtt1', 'displayname': '...[][]10001288307058928947204en<a href=\"http://twitter.com/#!/download/ipad\" ...NoneNaNNone[{'username': 'SiBuduh', 'displayname': 'Ed Lu...
1https://twitter.com/EndlessSynthwav/status/128...2020-07-30 23:44:04+00:00@RockstarGames Any idea if the elephant rifle ...@RockstarGames Any idea if the elephant rifle ...1288983731122966534{'username': 'EndlessSynthwav', 'displayname':...[][]00001288983731122966534en<a href=\"http://twitter.com/download/android\" ...NoneNaNNone[{'username': 'RockstarGames', 'displayname': ...
2https://twitter.com/aanalyst50/status/12889677...2020-07-30 22:40:40+00:00@realDonaldTrump Trump just keeps ignoring the...@realDonaldTrump Trump just keeps ignoring the...1288967774795116550{'username': 'aanalyst50', 'displayname': 'Don...[][]00101288966119676616704en<a href=\"http://twitter.com/download/iphone\" r...NoneNaNNone[{'username': 'realDonaldTrump', 'displayname'...
3https://twitter.com/RozeyBozzy/status/12889669...2020-07-30 22:37:18+00:00@cslogan88 Famous 19th century song from Engla...@cslogan88 Famous 19th century song from Engla...1288966929236066309{'username': 'RozeyBozzy', 'displayname': 'Roz...[][]00001288965246602838017en<a href=\"https://mobile.twitter.com\" rel=\"nofo...NoneNaNNone[{'username': 'cslogan88', 'displayname': 'Chr...
4https://twitter.com/alfred_hanan/status/128896...2020-07-30 22:32:44+00:00@realDonaldTrump #RepublicanTrumpVirus.\\nLets ...@realDonaldTrump #RepublicanTrumpVirus.\\nLets ...1288965780030144512{'username': 'alfred_hanan', 'displayname': 'A...[][]00001288947487911419905en<a href=\"http://twitter.com/download/android\" ...NoneNaNNone[{'username': 'realDonaldTrump', 'displayname'...
\n", 506 | "
" 507 | ], 508 | "text/plain": [ 509 | " url \\\n", 510 | "0 https://twitter.com/TylerPaulUtt1/status/12889... \n", 511 | "1 https://twitter.com/EndlessSynthwav/status/128... \n", 512 | "2 https://twitter.com/aanalyst50/status/12889677... \n", 513 | "3 https://twitter.com/RozeyBozzy/status/12889669... \n", 514 | "4 https://twitter.com/alfred_hanan/status/128896... \n", 515 | "\n", 516 | " date \\\n", 517 | "0 2020-07-30 23:57:02+00:00 \n", 518 | "1 2020-07-30 23:44:04+00:00 \n", 519 | "2 2020-07-30 22:40:40+00:00 \n", 520 | "3 2020-07-30 22:37:18+00:00 \n", 521 | "4 2020-07-30 22:32:44+00:00 \n", 522 | "\n", 523 | " content \\\n", 524 | "0 @SiBuduh @langoinstitute do you know the Ko wo... \n", 525 | "1 @RockstarGames Any idea if the elephant rifle ... \n", 526 | "2 @realDonaldTrump Trump just keeps ignoring the... \n", 527 | "3 @cslogan88 Famous 19th century song from Engla... \n", 528 | "4 @realDonaldTrump #RepublicanTrumpVirus.\\nLets ... \n", 529 | "\n", 530 | " renderedContent id \\\n", 531 | "0 @SiBuduh @langoinstitute do you know the Ko wo... 1288986997143601152 \n", 532 | "1 @RockstarGames Any idea if the elephant rifle ... 1288983731122966534 \n", 533 | "2 @realDonaldTrump Trump just keeps ignoring the... 1288967774795116550 \n", 534 | "3 @cslogan88 Famous 19th century song from Engla... 1288966929236066309 \n", 535 | "4 @realDonaldTrump #RepublicanTrumpVirus.\\nLets ... 1288965780030144512 \n", 536 | "\n", 537 | " user outlinks tcooutlinks \\\n", 538 | "0 {'username': 'TylerPaulUtt1', 'displayname': '... [] [] \n", 539 | "1 {'username': 'EndlessSynthwav', 'displayname':... [] [] \n", 540 | "2 {'username': 'aanalyst50', 'displayname': 'Don... [] [] \n", 541 | "3 {'username': 'RozeyBozzy', 'displayname': 'Roz... [] [] \n", 542 | "4 {'username': 'alfred_hanan', 'displayname': 'A... [] [] \n", 543 | "\n", 544 | " replyCount retweetCount likeCount quoteCount conversationId lang \\\n", 545 | "0 1 0 0 0 1288307058928947204 en \n", 546 | "1 0 0 0 0 1288983731122966534 en \n", 547 | "2 0 0 1 0 1288966119676616704 en \n", 548 | "3 0 0 0 0 1288965246602838017 en \n", 549 | "4 0 0 0 0 1288947487911419905 en \n", 550 | "\n", 551 | " source media retweetedTweet \\\n", 552 | "0
user-tweets.json".format(tweet_count, username)) 25 | 26 | # Reads the json generated from the CLI command above and creates a pandas dataframe 27 | tweets_df1 = pd.read_json('user-tweets.json', lines=True) 28 | 29 | # Displays first 5 entries from dataframe 30 | # tweets_df1.head() 31 | 32 | # Export dataframe into a CSV 33 | tweets_df1.to_csv('user-tweets.csv', sep=',', index=False) 34 | 35 | 36 | # Query by text search 37 | # Setting variables to be used in format string command below 38 | tweet_count = 500 39 | text_query = "its the elephant" 40 | since_date = "2020-06-01" 41 | until_date = "2020-07-31" 42 | 43 | # Using OS library to call CLI commands in Python 44 | os.system('snscrape --jsonl --max-results {} --since {} twitter-search "{} until:{}"> text-query-tweets.json'.format(tweet_count, since_date, text_query, until_date)) 45 | 46 | # Reads the json generated from the CLI command above and creates a pandas dataframe 47 | tweets_df2 = pd.read_json('text-query-tweets.json', lines=True) 48 | 49 | # Displays first 5 entries from dataframe 50 | # tweets_df2.head() 51 | 52 | # Export dataframe into a CSV 53 | tweets_df2.to_csv('text-query-tweets.csv', sep=',', index=False) -------------------------------------------------------------------------------- /snscrape/python-wrapper/snscrape-python-wrapper.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Article Notebook for Scraping Twitter Using snscrape's Python Wrapper\n", 8 | "
Package GitHub: https://github.com/JustAnotherArchivist/snscrape\n",
This notebook will be using the development version of snscrape\n", 10 | "\n", 11 | "Article Read-Along: https://medium.com/better-programming/how-to-scrape-tweets-with-snscrape-90124ed006af\n", 12 | "\n", 13 | "### Notebook Author: Martin Beck\n", 14 | "Information current as of November 28th, 2020
\n",
15 | "\n",
16 | "This notebook contains materials for scraping tweets from Twitter using snscrape's Python Wrapper.\n",
17 | "\n",
18 | "Dependencies: \n",
19 | "- Your Python version must be 3.8 or higher. The development version of snscrape will not work with Python 3.7 or lower. You can download the latest Python version [here](https://www.python.org/downloads/).\n",
20 | "- The development version of snscrape; uncomment the pip install line in the cell below if you don't already have it.\n",
21 | "- Pandas; its dataframes allow easy manipulation and indexing of the data. This is more of a preference, but it's what I follow in this notebook."
22 | ]
23 | },
24 | {
25 | "cell_type": "code",
26 | "execution_count": 4,
27 | "metadata": {},
28 | "outputs": [],
29 | "source": [
30 | "# Run the pip install command below if you don't already have the library\n",
31 | "# !pip install git+https://github.com/JustAnotherArchivist/snscrape.git\n",
32 | "\n",
33 | "# Run the command below if you don't already have Pandas\n",
34 | "# !pip install pandas\n",
35 | "\n",
36 | "# Imports\n",
37 | "import snscrape.modules.twitter as sntwitter\n",
38 | "import pandas as pd"
39 | ]
40 | },
41 | {
42 | "cell_type": "markdown",
43 | "metadata": {},
44 | "source": [
45 | "# Query by Username\n",
46 | "The code below will scrape 100 tweets from a username, then provide a CSV file with Pandas"
47 | ]
48 | },
49 | {
50 | "cell_type": "code",
51 | "execution_count": 35,
52 | "metadata": {},
53 | "outputs": [],
54 | "source": [
55 | "# Setting variables to be used below\n",
56 | "maxTweets = 100\n",
57 | "\n",
58 | "# Creating list to append tweet data to\n",
59 | "tweets_list1 = []\n",
60 | "\n",
61 | "# Using TwitterSearchScraper to scrape data and append tweets to list\n",
62 | "for i,tweet in enumerate(sntwitter.TwitterSearchScraper('from:jack').get_items()):\n",
63 | "    if i >= maxTweets:  # stop once maxTweets tweets have been collected\n",
64 | "        break\n",
65 | "    tweets_list1.append([tweet.date, tweet.id, tweet.content, tweet.user.username])"
66 | ]
67 | },
68 | {
69 | "cell_type": "code",
70 | "execution_count": 36,
71 | "metadata": {
72 | "scrolled": false
73 | },
74 | "outputs": [
75 | {
76 | "data": {
77 | "text/html": [
78 | "<div>
\n", 79 | "\n", 92 | "\n", 93 | " \n", 94 | " \n", 95 | " \n", 96 | " \n", 97 | " \n", 98 | " \n", 99 | " \n", 100 | " \n", 101 | " \n", 102 | " \n", 103 | " \n", 104 | " \n", 105 | " \n", 106 | " \n", 107 | " \n", 108 | " \n", 109 | " \n", 110 | " \n", 111 | " \n", 112 | " \n", 113 | " \n", 114 | " \n", 115 | " \n", 116 | " \n", 117 | " \n", 118 | " \n", 119 | " \n", 120 | " \n", 121 | " \n", 122 | " \n", 123 | " \n", 124 | " \n", 125 | " \n", 126 | " \n", 127 | " \n", 128 | " \n", 129 | " \n", 130 | " \n", 131 | " \n", 132 | " \n", 133 | " \n", 134 | " \n", 135 | " \n", 136 | " \n", 137 | " \n", 138 | " \n", 139 | "
DatetimeTweet IdTextUsername
02020-11-27 21:25:36+00:001332435430801690624@JesseDorogusker @Square ❤️jack
12020-11-18 19:49:02+00:001329149637006041088@NeerajKA Welcome!jack
22020-11-18 18:59:50+00:001329137255026311168Join @CashApp! #Bitcoin https://t.co/SbYANIZyixjack
32020-11-18 18:57:29+00:001329136665684705280@kateconger @sarahintampa Nahjack
42020-11-18 18:54:05+00:001329135806192107521@mmasnick Terrible idea! And terribly false.jack
\n", 140 | "
" 141 | ], 142 | "text/plain": [ 143 | " Datetime Tweet Id \\\n", 144 | "0 2020-11-27 21:25:36+00:00 1332435430801690624 \n", 145 | "1 2020-11-18 19:49:02+00:00 1329149637006041088 \n", 146 | "2 2020-11-18 18:59:50+00:00 1329137255026311168 \n", 147 | "3 2020-11-18 18:57:29+00:00 1329136665684705280 \n", 148 | "4 2020-11-18 18:54:05+00:00 1329135806192107521 \n", 149 | "\n", 150 | " Text Username \n", 151 | "0 @JesseDorogusker @Square ❤️ jack \n", 152 | "1 @NeerajKA Welcome! jack \n", 153 | "2 Join @CashApp! #Bitcoin https://t.co/SbYANIZyix jack \n", 154 | "3 @kateconger @sarahintampa Nah jack \n", 155 | "4 @mmasnick Terrible idea! And terribly false. jack " 156 | ] 157 | }, 158 | "execution_count": 36, 159 | "metadata": {}, 160 | "output_type": "execute_result" 161 | } 162 | ], 163 | "source": [ 164 | "# Creating a dataframe from the tweets list above\n", 165 | "tweets_df1 = pd.DataFrame(tweets_list1, columns=['Datetime', 'Tweet Id', 'Text', 'Username'])\n", 166 | "\n", 167 | "# Display first 5 entries from dataframe\n", 168 | "tweets_df1.head()" 169 | ] 170 | }, 171 | { 172 | "cell_type": "code", 173 | "execution_count": 37, 174 | "metadata": {}, 175 | "outputs": [], 176 | "source": [ 177 | "# Export dataframe into a CSV\n", 178 | "tweets_df1.to_csv('user-tweets.csv', sep=',', index=False)" 179 | ] 180 | }, 181 | { 182 | "cell_type": "markdown", 183 | "metadata": {}, 184 | "source": [ 185 | "# Query by Text Search\n", 186 | "The code below will scrape for 500 tweets between June 1st, 2020 and July 31st, 2020, by a text search then provide a CSV file with Pandas" 187 | ] 188 | }, 189 | { 190 | "cell_type": "code", 191 | "execution_count": 27, 192 | "metadata": {}, 193 | "outputs": [], 194 | "source": [ 195 | "# Setting variables to be used below\n", 196 | "maxTweets = 500\n", 197 | "\n", 198 | "# Creating list to append tweet data to\n", 199 | "tweets_list2 = []\n", 200 | "\n", 201 | "# Using TwitterSearchScraper to scrape data and append tweets to list\n", 202 | "for i,tweet in enumerate(sntwitter.TwitterSearchScraper('its the elephant since:2020-06-01 until:2020-07-31').get_items()):\n", 203 | " if i>maxTweets:\n", 204 | " break\n", 205 | " tweets_list2.append([tweet.date, tweet.id, tweet.content, tweet.user.username])" 206 | ] 207 | }, 208 | { 209 | "cell_type": "code", 210 | "execution_count": 28, 211 | "metadata": { 212 | "scrolled": false 213 | }, 214 | "outputs": [ 215 | { 216 | "data": { 217 | "text/html": [ 218 | "
\n", 219 | "\n", 232 | "\n", 233 | " \n", 234 | " \n", 235 | " \n", 236 | " \n", 237 | " \n", 238 | " \n", 239 | " \n", 240 | " \n", 241 | " \n", 242 | " \n", 243 | " \n", 244 | " \n", 245 | " \n", 246 | " \n", 247 | " \n", 248 | " \n", 249 | " \n", 250 | " \n", 251 | " \n", 252 | " \n", 253 | " \n", 254 | " \n", 255 | " \n", 256 | " \n", 257 | " \n", 258 | " \n", 259 | " \n", 260 | " \n", 261 | " \n", 262 | " \n", 263 | " \n", 264 | " \n", 265 | " \n", 266 | " \n", 267 | " \n", 268 | " \n", 269 | " \n", 270 | " \n", 271 | " \n", 272 | " \n", 273 | " \n", 274 | " \n", 275 | " \n", 276 | " \n", 277 | " \n", 278 | " \n", 279 | "
DatetimeTweet IdTextUsername
02020-07-30 23:57:02+00:001288986997143601152@SiBuduh @langoinstitute do you know the Ko wo...TylerPaulUtt1
12020-07-30 23:44:04+00:001288983731122966534@RockstarGames Any idea if the elephant rifle ...EndlessSynthwav
22020-07-30 22:40:40+00:001288967774795116550@realDonaldTrump Trump just keeps ignoring the...aanalyst50
32020-07-30 22:37:18+00:001288966929236066309@cslogan88 Famous 19th century song from Engla...RozeyBozzy
42020-07-30 22:32:44+00:001288965780030144512@realDonaldTrump #RepublicanTrumpVirus.\\nLets ...alfred_hanan
\n", 280 | "
" 281 | ], 282 | "text/plain": [ 283 | " Datetime Tweet Id \\\n", 284 | "0 2020-07-30 23:57:02+00:00 1288986997143601152 \n", 285 | "1 2020-07-30 23:44:04+00:00 1288983731122966534 \n", 286 | "2 2020-07-30 22:40:40+00:00 1288967774795116550 \n", 287 | "3 2020-07-30 22:37:18+00:00 1288966929236066309 \n", 288 | "4 2020-07-30 22:32:44+00:00 1288965780030144512 \n", 289 | "\n", 290 | " Text Username \n", 291 | "0 @SiBuduh @langoinstitute do you know the Ko wo... TylerPaulUtt1 \n", 292 | "1 @RockstarGames Any idea if the elephant rifle ... EndlessSynthwav \n", 293 | "2 @realDonaldTrump Trump just keeps ignoring the... aanalyst50 \n", 294 | "3 @cslogan88 Famous 19th century song from Engla... RozeyBozzy \n", 295 | "4 @realDonaldTrump #RepublicanTrumpVirus.\\nLets ... alfred_hanan " 296 | ] 297 | }, 298 | "execution_count": 28, 299 | "metadata": {}, 300 | "output_type": "execute_result" 301 | } 302 | ], 303 | "source": [ 304 | "# Creating a dataframe from the tweets list above\n", 305 | "tweets_df2 = pd.DataFrame(tweets_list2, columns=['Datetime', 'Tweet Id', 'Text', 'Username'])\n", 306 | "\n", 307 | "# Display first 5 entries from dataframe\n", 308 | "tweets_df2.head()" 309 | ] 310 | }, 311 | { 312 | "cell_type": "code", 313 | "execution_count": 38, 314 | "metadata": {}, 315 | "outputs": [], 316 | "source": [ 317 | "# Export dataframe into a CSV\n", 318 | "tweets_df2.to_csv('text-query-tweets.csv', sep=',', index=False)" 319 | ] 320 | } 321 | ], 322 | "metadata": { 323 | "kernelspec": { 324 | "display_name": "Python 3", 325 | "language": "python", 326 | "name": "python3" 327 | }, 328 | "language_info": { 329 | "codemirror_mode": { 330 | "name": "ipython", 331 | "version": 3 332 | }, 333 | "file_extension": ".py", 334 | "mimetype": "text/x-python", 335 | "name": "python", 336 | "nbconvert_exporter": "python", 337 | "pygments_lexer": "ipython3", 338 | "version": "3.7.3" 339 | } 340 | }, 341 | "nbformat": 4, 342 | "nbformat_minor": 4 343 | } 344 | -------------------------------------------------------------------------------- /snscrape/python-wrapper/snscrape-python-wrapper.py: -------------------------------------------------------------------------------- 1 | # Script Author: Martin Beck 2 | # Medium Article Follow-Along: https://medium.com/better-programming/how-to-scrape-tweets-with-snscrape-90124ed006af 3 | 4 | # Pip install the command below if you don't have the development version of snscrape 5 | # !pip install git+https://github.com/JustAnotherArchivist/snscrape.git 6 | 7 | # Run the below command if you don't already have Pandas 8 | # !pip install pandas 9 | 10 | # Imports 11 | import snscrape.modules.twitter as sntwitter 12 | import pandas as pd 13 | 14 | # Below are two ways of scraping using the Python Wrapper. 15 | # Comment or uncomment as you need. If you currently run the script as is it will scrape both queries 16 | # then output two different csv files. 
17 | 
18 | # Query by username
19 | # Setting variables to be used below
20 | maxTweets = 100
21 | 
22 | # Creating list to append tweet data to
23 | tweets_list1 = []
24 | 
25 | # Using TwitterSearchScraper to scrape data
26 | for i,tweet in enumerate(sntwitter.TwitterSearchScraper('from:jack').get_items()):
27 |     if i >= maxTweets:  # stop once maxTweets tweets have been collected
28 |         break
29 |     tweets_list1.append([tweet.date, tweet.id, tweet.content, tweet.user.username])
30 | 
31 | # Creating a dataframe from the tweets list above
32 | tweets_df1 = pd.DataFrame(tweets_list1, columns=['Datetime', 'Tweet Id', 'Text', 'Username'])
33 | 
34 | # Display first 5 entries from dataframe
35 | # tweets_df1.head()
36 | 
37 | # Export dataframe into a CSV
38 | tweets_df1.to_csv('user-tweets.csv', sep=',', index=False)
39 | 
40 | 
41 | # Query by text search
42 | # Setting variables to be used below
43 | maxTweets = 500
44 | 
45 | # Creating list to append tweet data to
46 | tweets_list2 = []
47 | 
48 | # Using TwitterSearchScraper to scrape data and append tweets to list
49 | for i,tweet in enumerate(sntwitter.TwitterSearchScraper('its the elephant since:2020-06-01 until:2020-07-31').get_items()):
50 |     if i >= maxTweets:  # stop once maxTweets tweets have been collected
51 |         break
52 |     tweets_list2.append([tweet.date, tweet.id, tweet.content, tweet.user.username])
53 | 
54 | # Creating a dataframe from the tweets list above
55 | tweets_df2 = pd.DataFrame(tweets_list2, columns=['Datetime', 'Tweet Id', 'Text', 'Username'])
56 | 
57 | # Display first 5 entries from dataframe (head() has no visible effect in a script, so it is commented out)
58 | # tweets_df2.head()
59 | 
60 | # Export dataframe into a CSV
61 | tweets_df2.to_csv('text-query-tweets.csv', sep=',', index=False)
--------------------------------------------------------------------------------