├── README.md
└── ElonTweetsSentimentAnalysis.ipynb
/README.md:
--------------------------------------------------------------------------------
1 | # Thinkful_Bootcamp
2 | These are my capstone projects from the Thinkful Data Science bootcamp, which I completed in four of the scheduled six months. They apply various machine learning techniques (in Python) to volatility trading research.
3 |
--------------------------------------------------------------------------------
/ElonTweetsSentimentAnalysis.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "nbformat": 4,
3 | "nbformat_minor": 0,
4 | "metadata": {
5 | "colab": {
6 | "name": "ElonTweetsSentimentAnalysis.ipynb",
7 | "provenance": [],
8 | "collapsed_sections": []
9 | },
10 | "kernelspec": {
11 | "name": "python3",
12 | "display_name": "Python 3"
13 | }
14 | },
15 | "cells": [
16 | {
17 | "cell_type": "code",
18 | "metadata": {
19 | "id": "A6ydHvrgCuvl",
20 | "colab_type": "code",
21 | "colab": {}
22 | },
23 | "source": [
24 | "import nltk\n",
25 | "# Using the following tutorial for sentiment analysis: https://www.digitalocean.com/community/tutorials/how-to-perform-sentiment-analysis-in-python-3-using-the-natural-language-toolkit-nltk"
26 | ],
27 | "execution_count": 0,
28 | "outputs": []
29 | },
30 | {
31 | "cell_type": "markdown",
32 | "metadata": {
33 | "id": "lJJKcPzuMF4Y",
34 | "colab_type": "text"
35 | },
36 | "source": [
37 | "Since this is a sentiment analysis project, I found a tutorial on performing sentiment analysis with NLTK's Twitter samples corpus and will follow it for the initial model. The ultimate goal is to classify Elon Musk's tweets by sentiment and then use a Markov chain generator to create synthetic tweets in his style.\n",
38 | "\n"
39 | ]
40 | },
41 | {
42 | "cell_type": "code",
43 | "metadata": {
44 | "id": "O7qkA7VqibNI",
45 | "colab_type": "code",
46 | "outputId": "fef68a11-117e-4aa4-8507-e73d8f634d9d",
47 | "colab": {
48 | "base_uri": "https://localhost:8080/",
49 | "height": 70
50 | }
51 | },
52 | "source": [
53 | "# get twitter samples from nltk\n",
54 | "\n",
55 | "nltk.download('twitter_samples')"
56 | ],
57 | "execution_count": 0,
58 | "outputs": [
59 | {
60 | "output_type": "stream",
61 | "text": [
62 | "[nltk_data] Downloading package twitter_samples to /root/nltk_data...\n",
63 | "[nltk_data] Package twitter_samples is already up-to-date!\n"
64 | ],
65 | "name": "stdout"
66 | },
67 | {
68 | "output_type": "execute_result",
69 | "data": {
70 | "text/plain": [
71 | "True"
72 | ]
73 | },
74 | "metadata": {
75 | "tags": []
76 | },
77 | "execution_count": 50
78 | }
79 | ]
80 | },
81 | {
82 | "cell_type": "code",
83 | "metadata": {
84 | "id": "Q3iJ3mp8idQ9",
85 | "colab_type": "code",
86 | "colab": {}
87 | },
88 | "source": [
89 | "# group tweets into positive and negative\n",
90 | "\n",
91 | "from nltk.corpus import twitter_samples\n",
92 | "positive_tweets = twitter_samples.strings('positive_tweets.json')\n",
93 | "negative_tweets = twitter_samples.strings('negative_tweets.json')\n",
94 | "text = twitter_samples.strings('tweets.20150430-223406.json')"
95 | ],
96 | "execution_count": 0,
97 | "outputs": []
98 | },
99 | {
100 | "cell_type": "code",
101 | "metadata": {
102 | "id": "DTdBPcI6inJU",
103 | "colab_type": "code",
104 | "outputId": "62a7e3dd-9834-4645-93e5-1b730b9c2f23",
105 | "colab": {
106 | "base_uri": "https://localhost:8080/",
107 | "height": 70
108 | }
109 | },
110 | "source": [
111 | "nltk.download('punkt')"
112 | ],
113 | "execution_count": 0,
114 | "outputs": [
115 | {
116 | "output_type": "stream",
117 | "text": [
118 | "[nltk_data] Downloading package punkt to /root/nltk_data...\n",
119 | "[nltk_data] Unzipping tokenizers/punkt.zip.\n"
120 | ],
121 | "name": "stdout"
122 | },
123 | {
124 | "output_type": "execute_result",
125 | "data": {
126 | "text/plain": [
127 | "True"
128 | ]
129 | },
130 | "metadata": {
131 | "tags": []
132 | },
133 | "execution_count": 4
134 | }
135 | ]
136 | },
137 | {
138 | "cell_type": "code",
139 | "metadata": {
140 | "id": "msUOgzg0i8fD",
141 | "colab_type": "code",
142 | "outputId": "99be366f-5874-4192-c72d-a4e4e2f4a9bc",
143 | "colab": {
144 | "base_uri": "https://localhost:8080/",
145 | "height": 124
146 | }
147 | },
148 | "source": [
149 | "nltk.download('wordnet')\n",
150 | "nltk.download('averaged_perceptron_tagger')"
151 | ],
152 | "execution_count": 0,
153 | "outputs": [
154 | {
155 | "output_type": "stream",
156 | "text": [
157 | "[nltk_data] Downloading package wordnet to /root/nltk_data...\n",
158 | "[nltk_data] Unzipping corpora/wordnet.zip.\n",
159 | "[nltk_data] Downloading package averaged_perceptron_tagger to\n",
160 | "[nltk_data] /root/nltk_data...\n",
161 | "[nltk_data] Unzipping taggers/averaged_perceptron_tagger.zip.\n"
162 | ],
163 | "name": "stdout"
164 | },
165 | {
166 | "output_type": "execute_result",
167 | "data": {
168 | "text/plain": [
169 | "True"
170 | ]
171 | },
172 | "metadata": {
173 | "tags": []
174 | },
175 | "execution_count": 5
176 | }
177 | ]
178 | },
179 | {
180 | "cell_type": "code",
181 | "metadata": {
182 | "id": "eIdsJHCBlPrb",
183 | "colab_type": "code",
184 | "colab": {}
185 | },
186 | "source": [
187 | "# get tokens from tweets\n",
188 | "\n",
189 | "from nltk.tag import pos_tag\n",
190 | "from nltk.corpus import twitter_samples\n",
191 | "\n",
192 | "tweet_tokens = twitter_samples.tokenized('positive_tweets.json')"
193 | ],
194 | "execution_count": 0,
195 | "outputs": []
196 | },
197 | {
198 | "cell_type": "code",
199 | "metadata": {
200 | "id": "NTQSdB2blgSz",
201 | "colab_type": "code",
202 | "colab": {}
203 | },
204 | "source": [
205 | "# use nltk for lemmatization\n",
206 | "\n",
207 | "from nltk.tag import pos_tag\n",
208 | "from nltk.stem.wordnet import WordNetLemmatizer\n",
209 | "\n",
210 | "def lemmatize_sentence(tokens):\n",
211 | " lemmatizer = WordNetLemmatizer()\n",
212 | " lemmatized_sentence = []\n",
213 | " for word, tag in pos_tag(tokens):\n",
214 | " if tag.startswith('NN'):\n",
215 | " pos = 'n'\n",
216 | " elif tag.startswith('VB'):\n",
217 | " pos = 'v'\n",
218 | " else:\n",
219 | " pos = 'a'\n",
220 | " lemmatized_sentence.append(lemmatizer.lemmatize(word, pos))\n",
221 | " return lemmatized_sentence"
222 | ],
223 | "execution_count": 0,
224 | "outputs": []
225 | },
226 | {
227 | "cell_type": "code",
228 | "metadata": {
229 | "id": "43jvLP8HmZd2",
230 | "colab_type": "code",
231 | "colab": {}
232 | },
233 | "source": [
234 | "import re, string\n",
235 | "\n",
236 | "# clean tweet tokens: remove links and @mentions, lemmatize, lowercase,\n",
237 | "# and drop empty tokens, punctuation, and stop words\n",
238 | "\n",
239 | "def remove_noise(tweet_tokens, stop_words = ()):\n",
240 | "\n",
241 | " cleaned_tokens = []\n",
242 | "\n",
243 | " for token, tag in pos_tag(tweet_tokens):\n",
244 | " token = re.sub('http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+#]|[!*\\(\\),]|'\\\n",
245 | " '(?:%[0-9a-fA-F][0-9a-fA-F]))+','', token)\n",
246 | " token = re.sub(\"(@[A-Za-z0-9_]+)\",\"\", token)\n",
247 | "\n",
248 | " if tag.startswith(\"NN\"):\n",
249 | " pos = 'n'\n",
250 | " elif tag.startswith('VB'):\n",
251 | " pos = 'v'\n",
252 | " else:\n",
253 | " pos = 'a'\n",
254 | "\n",
255 | " lemmatizer = WordNetLemmatizer()\n",
256 | " token = lemmatizer.lemmatize(token, pos)\n",
257 | "\n",
258 | " if len(token) > 0 and token not in string.punctuation and token.lower() not in stop_words:\n",
259 | " cleaned_tokens.append(token.lower())\n",
260 | " return cleaned_tokens"
261 | ],
262 | "execution_count": 0,
263 | "outputs": []
264 | },
265 | {
266 | "cell_type": "code",
267 | "metadata": {
268 | "id": "63Z2e6o9mflj",
269 | "colab_type": "code",
270 | "outputId": "cabca506-0583-4284-c018-2235cb4f35e5",
271 | "colab": {
272 | "base_uri": "https://localhost:8080/",
273 | "height": 70
274 | }
275 | },
276 | "source": [
277 | "nltk.download('stopwords')"
278 | ],
279 | "execution_count": 0,
280 | "outputs": [
281 | {
282 | "output_type": "stream",
283 | "text": [
284 | "[nltk_data] Downloading package stopwords to /root/nltk_data...\n",
285 | "[nltk_data] Unzipping corpora/stopwords.zip.\n"
286 | ],
287 | "name": "stdout"
288 | },
289 | {
290 | "output_type": "execute_result",
291 | "data": {
292 | "text/plain": [
293 | "True"
294 | ]
295 | },
296 | "metadata": {
297 | "tags": []
298 | },
299 | "execution_count": 9
300 | }
301 | ]
302 | },
303 | {
304 | "cell_type": "code",
305 | "metadata": {
306 | "id": "Osn1fZxDm-ij",
307 | "colab_type": "code",
308 | "outputId": "96791166-0f21-4100-a935-e0fc880e97f1",
309 | "colab": {
310 | "base_uri": "https://localhost:8080/",
311 | "height": 35
312 | }
313 | },
314 | "source": [
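315 | "# get stop words, then preview the cleaning on the first positive tweet\n",
316 | "from nltk.corpus import stopwords\n",
317 | "stop_words = stopwords.words('english')\n",
318 | "print(remove_noise(tweet_tokens[0], stop_words))"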
319 | ],
320 | "execution_count": 0,
321 | "outputs": [
322 | {
323 | "output_type": "stream",
324 | "text": [
325 | "['#followfriday', 'top', 'engage', 'member', 'community', 'week', ':)']\n"
326 | ],
327 | "name": "stdout"
328 | }
329 | ]
330 | },
331 | {
332 | "cell_type": "code",
333 | "metadata": {
334 | "id": "OCncgRZonewy",
335 | "colab_type": "code",
336 | "colab": {}
337 | },
338 | "source": [
339 | "from nltk.corpus import stopwords\n",
340 | "stop_words = stopwords.words('english')\n",
341 | "\n",
342 | "#print(remove_noise(tweet_tokens[0], stop_words))\n",
343 | "\n",
344 | "positive_tweet_tokens = twitter_samples.tokenized('positive_tweets.json')\n",
345 | "negative_tweet_tokens = twitter_samples.tokenized('negative_tweets.json')\n",
346 | "\n",
347 | "positive_cleaned_tokens_list = []\n",
348 | "negative_cleaned_tokens_list = []\n",
349 | "\n",
350 | "for tokens in positive_tweet_tokens:\n",
351 | " positive_cleaned_tokens_list.append(remove_noise(tokens, stop_words))\n",
352 | "\n",
353 | "for tokens in negative_tweet_tokens:\n",
354 | " negative_cleaned_tokens_list.append(remove_noise(tokens, stop_words))"
355 | ],
356 | "execution_count": 0,
357 | "outputs": []
358 | },
359 | {
360 | "cell_type": "code",
361 | "metadata": {
362 | "id": "HfIyds1iniei",
363 | "colab_type": "code",
364 | "colab": {}
365 | },
366 | "source": [
367 | "# get most common positive words\n",
368 | "\n",
369 | "def get_all_words(cleaned_tokens_list):\n",
370 | " for tokens in cleaned_tokens_list:\n",
371 | " for token in tokens:\n",
372 | " yield token\n",
373 | "\n",
374 | "all_pos_words = get_all_words(positive_cleaned_tokens_list)"
375 | ],
376 | "execution_count": 0,
377 | "outputs": []
378 | },
379 | {
380 | "cell_type": "code",
381 | "metadata": {
382 | "id": "gUzE4zQCoIcN",
383 | "colab_type": "code",
384 | "outputId": "d2685554-5c12-4c31-f9c9-ff0ad66fafd8",
385 | "colab": {
386 | "base_uri": "https://localhost:8080/",
387 | "height": 55
388 | }
389 | },
390 | "source": [
391 | "from nltk import FreqDist\n",
392 | "\n",
393 | "freq_dist_pos = FreqDist(all_pos_words)\n",
394 | "print(freq_dist_pos.most_common(30))"
395 | ],
396 | "execution_count": 0,
397 | "outputs": [
398 | {
399 | "output_type": "stream",
400 | "text": [
401 | "[(':)', 3691), (':-)', 701), (':d', 658), ('thanks', 388), ('follow', 357), ('love', 333), ('...', 290), ('good', 283), ('get', 263), ('thank', 253), ('u', 245), ('day', 242), ('like', 229), ('see', 195), ('happy', 192), (\"i'm\", 183), ('great', 175), ('hi', 173), ('go', 167), ('back', 163), ('know', 161), ('new', 147), ('make', 145), (':p', 139), ('<3', 135), ('one', 131), ('..', 129), ('time', 125), ('hope', 123), ('us', 115)]\n"
402 | ],
403 | "name": "stdout"
404 | }
405 | ]
406 | },
407 | {
408 | "cell_type": "code",
409 | "metadata": {
410 | "id": "IXRN3HFJ7Z95",
411 | "colab_type": "code",
412 | "colab": {}
413 | },
414 | "source": [
415 | "def get_tweets_for_model(cleaned_tokens_list):\n",
416 | " for tweet_tokens in cleaned_tokens_list:\n",
417 | " yield dict([token, True] for token in tweet_tokens)\n",
418 | "\n",
419 | "positive_tokens_for_model = get_tweets_for_model(positive_cleaned_tokens_list)\n",
420 | "negative_tokens_for_model = get_tweets_for_model(negative_cleaned_tokens_list)"
421 | ],
422 | "execution_count": 0,
423 | "outputs": []
424 | },
425 | {
426 | "cell_type": "code",
427 | "metadata": {
428 | "id": "-Gf7mJk-VNDp",
429 | "colab_type": "code",
430 | "colab": {}
431 | },
432 | "source": [
433 | "import random\n",
434 | "\n",
435 | "positive_dataset = [(tweet_dict, \"Positive\")\n",
436 | " for tweet_dict in positive_tokens_for_model]\n",
437 | "\n",
438 | "negative_dataset = [(tweet_dict, \"Negative\")\n",
439 | " for tweet_dict in negative_tokens_for_model]\n",
440 | "\n",
441 | "dataset = positive_dataset + negative_dataset\n",
442 | "\n",
443 | "random.shuffle(dataset)\n",
444 | "\n",
445 | "train_data = dataset[:7000]\n",
446 | "test_data = dataset[7000:]"
447 | ],
448 | "execution_count": 0,
449 | "outputs": []
450 | },
451 | {
452 | "cell_type": "code",
453 | "metadata": {
454 | "id": "dsG3xsa2Vcff",
455 | "colab_type": "code",
456 | "colab": {}
457 | },
458 | "source": [
459 | "import pandas as pd\n",
460 | "import numpy as np\n"
461 | ],
462 | "execution_count": 0,
463 | "outputs": []
464 | },
465 | {
466 | "cell_type": "code",
467 | "metadata": {
468 | "id": "1DjH1CPvVqtF",
469 | "colab_type": "code",
470 | "outputId": "3672ae07-2786-4ed9-e83c-477fcd23c5eb",
471 | "colab": {
472 | "base_uri": "https://localhost:8080/",
473 | "height": 194
474 | }
475 | },
476 | "source": ["td = pd.DataFrame(train_data)\n",
477 | "td.head()"
478 | ],
479 | "execution_count": 0,
480 | "outputs": [
481 | {
482 | "output_type": "execute_result",
483 | "data": {
537 | "text/plain": [
538 | " 0 1\n",
539 | "0 {'yelaaaaaaa': True, ':(': True} Negative\n",
540 | "1 {'check': True, 'new': True, 'van': True, 'out... Positive\n",
541 | "2 {'every': True, 'night': True, 'take': True, '... Negative\n",
542 | "3 {'yes': True, 'always': True, 'selfish': True,... Positive\n",
543 | "4 {':(': True} Negative"
544 | ]
545 | },
546 | "metadata": {
547 | "tags": []
548 | },
549 | "execution_count": 17
550 | }
551 | ]
552 | },
553 | {
554 | "cell_type": "code",
555 | "metadata": {
556 | "id": "BPgePHzwWkSd",
557 | "colab_type": "code",
558 | "outputId": "4ebc6b41-7378-43f6-a5bb-b271e53480eb",
559 | "colab": {
560 | "base_uri": "https://localhost:8080/",
561 | "height": 248
562 | }
563 | },
564 | "source": [
565 | "# Use naive bayes classifier\n",
566 | "\n",
567 | "from nltk import classify\n",
568 | "from nltk import NaiveBayesClassifier\n",
569 | "classifier = NaiveBayesClassifier.train(train_data)\n",
570 | "\n",
571 | "print(\"Accuracy is:\", classify.accuracy(classifier, test_data))\n",
572 | "\n",
573 | "classifier.show_most_informative_features(10)"
574 | ],
575 | "execution_count": 0,
576 | "outputs": [
577 | {
578 | "output_type": "stream",
579 | "text": [
580 | "Accuracy is: 0.996\n",
581 | "Most Informative Features\n",
582 | " :( = True Negati : Positi = 2056.5 : 1.0\n",
583 | " :) = True Positi : Negati = 1662.7 : 1.0\n",
584 | " follower = True Positi : Negati = 36.8 : 1.0\n",
585 | " bam = True Positi : Negati = 21.3 : 1.0\n",
586 | " sad = True Negati : Positi = 19.6 : 1.0\n",
587 | " arrive = True Positi : Negati = 18.8 : 1.0\n",
588 | " community = True Positi : Negati = 15.2 : 1.0\n",
589 | " welcome = True Positi : Negati = 14.6 : 1.0\n",
590 | " blog = True Positi : Negati = 13.8 : 1.0\n",
591 | " poor = True Negati : Positi = 13.5 : 1.0\n"
593 | ],
594 | "name": "stdout"
595 | }
596 | ]
597 | },
598 | {
599 | "cell_type": "markdown",
600 | "metadata": {
601 | "id": "B_wnP9R6-3eI",
602 | "colab_type": "text"
603 | },
604 | "source": [
605 | "Let's test the model on Elon Musk's tweets. Elon Musk is a very interesting individual, so we'll start by classifying each of his tweets as positive or negative with this NLTK model, and then use a Markov chain generator to produce random tweets that read as if he might have written them."
606 | ]
607 | },
608 | {
609 | "cell_type": "code",
610 | "metadata": {
611 | "id": "IaRicHIjWswW",
612 | "colab_type": "code",
613 | "colab": {}
614 | },
615 | "source": [
616 | "# test on Elon Musk tweets\n",
617 | "\n",
618 | "elon_tweets = pd.read_csv(\"https://www.dropbox.com/s/s9tp2lv32l95r0g/user_tweets.csv?raw=1\", sep = \",\")"
619 | ],
620 | "execution_count": 0,
621 | "outputs": []
622 | },
623 | {
624 | "cell_type": "code",
625 | "metadata": {
626 | "id": "pEcaZiMBaAJ-",
627 | "colab_type": "code",
628 | "outputId": "c48f1a59-3f51-438a-dcfe-a2c22cc490d5",
629 | "colab": {
630 | "base_uri": "https://localhost:8080/",
631 | "height": 194
632 | }
633 | },
634 | "source": [
635 | "elon_tweets.head()"
636 | ],
637 | "execution_count": 0,
638 | "outputs": [
639 | {
640 | "output_type": "execute_result",
641 | "data": {
642 | "text/html": [
643 | "\n",
644 | "\n",
657 | "
\n",
658 | " \n",
659 | " \n",
660 | " | \n",
661 | " text | \n",
662 | " username | \n",
663 | " linktotweet | \n",
664 | " tweetembedcode | \n",
665 | " createdat | \n",
666 | "
\n",
667 | " \n",
668 | " \n",
669 | " \n",
670 | " | 0 | \n",
671 | " @highqualitysh1t I love the thought of a car d... | \n",
672 | " elonmusk | \n",
673 | " http://twitter.com/elonmusk/status/93704198630... | \n",
674 | " <blockquote class=\"twitter-tweet\"><p lang=\"en\"... | \n",
675 | " 2017-12-02T19:33:00 | \n",
676 | "
\n",
677 | " \n",
678 | " | 1 | \n",
679 | " @novaspivack Asimov's Foundation books should ... | \n",
680 | " elonmusk | \n",
681 | " http://twitter.com/elonmusk/status/93709071522... | \n",
682 | " <blockquote class=\"twitter-tweet\"><p lang=\"en\"... | \n",
683 | " 2017-12-02T22:46:00 | \n",
684 | "
\n",
685 | " \n",
686 | " | 2 | \n",
687 | " @novaspivack That's certainly the right way to... | \n",
688 | " elonmusk | \n",
689 | " http://twitter.com/elonmusk/status/93710961569... | \n",
690 | " <blockquote class=\"twitter-tweet\"><p lang=\"en\"... | \n",
691 | " 2017-12-03T00:01:00 | \n",
692 | "
\n",
693 | " \n",
694 | " | 3 | \n",
695 | " To preserve the transcendent majesty & spe... | \n",
696 | " elonmusk | \n",
697 | " http://twitter.com/elonmusk/status/93739733099... | \n",
698 | " <blockquote class=\"twitter-tweet\"><p lang=\"en\"... | \n",
699 | " 2017-12-03T19:05:00 | \n",
700 | "
\n",
701 | " \n",
702 | " | 4 | \n",
703 | " @harrisonlingren @JW8888888 Busted | \n",
704 | " elonmusk | \n",
705 | " http://twitter.com/elonmusk/status/93739781363... | \n",
706 | " <blockquote class=\"twitter-tweet\"><p lang=\"en\"... | \n",
707 | " 2017-12-03T19:07:00 | \n",
708 | "
\n",
709 | " \n",
710 | "
\n",
711 | "
"
712 | ],
713 | "text/plain": [
714 | " text ... createdat\n",
715 | "0 @highqualitysh1t I love the thought of a car d... ... 2017-12-02T19:33:00\n",
716 | "1 @novaspivack Asimov's Foundation books should ... ... 2017-12-02T22:46:00\n",
717 | "2 @novaspivack That's certainly the right way to... ... 2017-12-03T00:01:00\n",
718 | "3 To preserve the transcendent majesty & spe... ... 2017-12-03T19:05:00\n",
719 | "4 @harrisonlingren @JW8888888 Busted ... 2017-12-03T19:07:00\n",
720 | "\n",
721 | "[5 rows x 5 columns]"
722 | ]
723 | },
724 | "metadata": {
725 | "tags": []
726 | },
727 | "execution_count": 52
728 | }
729 | ]
730 | },
731 | {
732 | "cell_type": "code",
733 | "metadata": {
734 | "id": "WmP4XREtzbDQ",
735 | "colab_type": "code",
736 | "colab": {}
737 | },
738 | "source": [
739 | "from nltk.tokenize import word_tokenize"
740 | ],
741 | "execution_count": 0,
742 | "outputs": []
743 | },
744 | {
745 | "cell_type": "code",
746 | "metadata": {
747 | "id": "_BRVEkW9zzK2",
748 | "colab_type": "code",
749 | "colab": {}
750 | },
751 | "source": [
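752 | "# classify Elon's tweets, tokenizing and cleaning them the same way as the training data\n",
753 | "from nltk.tokenize import TweetTokenizer\n",
754 | "tweet_tokenizer = TweetTokenizer()  # twitter_samples.tokenized() uses TweetTokenizer by default, so emoticons like ':)' stay intact\n",
755 | "classifications = []\n",
756 | "elon_tweets = elon_tweets.dropna()\n",
757 | "for tweet in elon_tweets['text']:\n",
758 | " custom_tokens = remove_noise(tweet_tokenizer.tokenize(tweet), stop_words)\n",
759 | " classifications.append(classifier.classify(dict([token, True] for token in custom_tokens)))"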
760 | ],
761 | "execution_count": 0,
762 | "outputs": []
763 | },
764 | {
765 | "cell_type": "code",
766 | "metadata": {
767 | "id": "IOv_EK3A0nm4",
768 | "colab_type": "code",
769 | "outputId": "b4a0ba5d-9165-4428-b38a-15f740981eda",
770 | "colab": {
771 | "base_uri": "https://localhost:8080/",
772 | "height": 761
773 | }
774 | },
775 | "source": [
776 | "elon_tweets['class'] = classifications\n",
777 | "\n",
778 | "# split by predicted sentiment\n",
779 | "neg_elon_tweets = elon_tweets[elon_tweets['class'] == 'Negative']\n",
780 | "pos_elon_tweets = elon_tweets[elon_tweets['class'] == 'Positive']\n",
781 | "print(neg_elon_tweets.shape)\n",
782 | "print(pos_elon_tweets.shape)\n",
783 | "\n",
784 | "# drop metadata columns; errors='ignore' keeps this cell safe to re-run\n",
785 | "elon_tweets = elon_tweets.drop(columns=['linktotweet', 'tweetembedcode', 'username'], errors='ignore')"
786 | ],
787 | "execution_count": 0,
788 | "outputs": [
789 | {
790 | "output_type": "stream",
791 | "text": [
792 | "(3174, 3)\n",
793 | "(3722, 3)\n"
794 | ],
795 | "name": "stdout"
796 | }
823 | ]
824 | },
825 | {
826 | "cell_type": "code",
827 | "metadata": {
828 | "id": "9vGWuoaa7b5Y",
829 | "colab_type": "code",
830 | "outputId": "4204a0f0-9318-4e64-ae78-995f15645d27",
831 | "colab": {
832 | "base_uri": "https://localhost:8080/",
833 | "height": 927
834 | }
835 | },
836 | "source": [
837 | "elon_tweets.head(30)"
838 | ],
839 | "execution_count": 0,
840 | "outputs": [
841 | {
842 | "output_type": "execute_result",
843 | "data": {
1053 | "text/plain": [
1054 | " text ... class\n",
1055 | "0 @highqualitysh1t I love the thought of a car d... ... Positive\n",
1056 | "1 @novaspivack Asimov's Foundation books should ... ... Positive\n",
1057 | "2 @novaspivack That's certainly the right way to... ... Negative\n",
1058 | "3 To preserve the transcendent majesty & spe... ... Negative\n",
1059 | "4 @harrisonlingren @JW8888888 Busted ... Negative\n",
1060 | "5 @IvanEscobosa Yes ... Positive\n",
1061 | "6 Hat ... Negative\n",
1062 | "7 Every 5000th buyer of our boringly boring hat ... ... Positive\n",
1063 | "8 @TheRealUtkarsh Because it's stupid ... Negative\n",
1064 | "9 @maralkalajian Maybe ... Negative\n",
1065 | "10 @NefuDaBoss Beyond anything you can imagine ... Positive\n",
1066 | "11 @TheRealUtkarsh And, at the risk of stating th... ... Negative\n",
1067 | "12 That special hat delivery will take place deep... ... Negative\n",
1068 | "13 @sm_carl Deal ... Negative\n",
1069 | "14 The *real* money comes from merchandising. I l... ... Negative\n",
1070 | "15 @FortuneTech Do it ... Negative\n",
1071 | "16 @tiamaria68uk Yes ... Positive\n",
1072 | "17 On Tuesday, SpaceX will attempt to fly both an... ... Positive\n",
1073 | "18 On Tuesday, SpaceX will attempt to refly both ... ... Positive\n",
1074 | "19 @tiamaria68uk Plus a towel and a sign saying \"... ... Negative\n",
1075 | "20 Over 30,000 hats sold! ... Negative\n",
1076 | "21 After 50k hats, we will start selling The Bori... ... Negative\n",
1077 | "22 I know it's a little off-brand, but kids love it ... Positive\n",
1078 | "23 @Justin_Levy04 Safest flamethrower ever ... Negative\n",
1079 | "24 And those are just the really big ones. The Pe... ... Negative\n",
1080 | "25 @DMC_Ryan I agree ... Positive\n",
1081 | "26 35,000 ... Negative\n",
1082 | "27 @kaaningilamo Our existence cannot just be abo... ... Positive\n",
1083 | "28 It is high time that humanity went beyond Eart... ... Positive\n",
1084 | "29 10 hat aficionados, in appreciation for their ... ... Negative\n",
1085 | "\n",
1086 | "[30 rows x 3 columns]"
1087 | ]
1088 | },
1089 | "metadata": {
1090 | "tags": []
1091 | },
1092 | "execution_count": 37
1093 | }
1094 | ]
1095 | },
1096 | {
1097 | "cell_type": "markdown",
1098 | "metadata": {
1099 | "id": "n08VrV261-7a",
1100 | "colab_type": "text"
1101 | },
1102 | "source": [
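1103 | "According to the classifier trained on the NLTK tweet corpus, Elon has nearly as many negative tweets as positive ones. Several labels above look wrong (for example, *Over 30,000 hats sold!* is tagged Negative), so this is probably not a good model: the NLTK twitter corpus does not seem to be good training data for Elon's tweets."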
1104 | ]
1105 | },
1106 | {
1107 | "cell_type": "code",
1108 | "metadata": {
1109 | "id": "AD1fP9Kq82Uu",
1110 | "colab_type": "code",
1111 | "outputId": "5337a455-2a28-4f74-f9e4-01d7466331aa",
1112 | "colab": {
1113 | "base_uri": "https://localhost:8080/",
1114 | "height": 230
1115 | }
1116 | },
1117 | "source": [
1118 | "!pip install markovify\n",
1119 | "import spacy\n",
1120 | "import markovify"
1121 | ],
1122 | "execution_count": 0,
1123 | "outputs": [
1124 | {
1125 | "output_type": "stream",
1126 | "text": [
1127 | "Collecting markovify\n",
1128 | " Downloading https://files.pythonhosted.org/packages/de/c3/2e017f687e47e88eb9d8adf970527e2299fb566eba62112c2851ebb7ab93/markovify-0.8.0.tar.gz\n",
1129 | "Collecting unidecode\n",
1130 | " Downloading https://files.pythonhosted.org/packages/d0/42/d9edfed04228bacea2d824904cae367ee9efd05e6cce7ceaaedd0b0ad964/Unidecode-1.1.1-py2.py3-none-any.whl (238kB)\n",
1131 | " |████████████████████████████████| 245kB 4.1MB/s \n",
1132 | "Building wheels for collected packages: markovify\n",
1133 | " Building wheel for markovify (setup.py) ... done\n",
1134 | " Created wheel for markovify: filename=markovify-0.8.0-cp36-none-any.whl size=10694 sha256=ca8a650180069eeb9ec4dd5df4ffa3f56752c6aaee8882909120a65e399f9bef\n",
1135 | " Stored in directory: /root/.cache/pip/wheels/5d/a8/92/35e2df870ff15a65657679dca105d190ec3c854a9f75435e40\n",
1136 | "Successfully built markovify\n",
1137 | "Installing collected packages: unidecode, markovify\n",
1138 | "Successfully installed markovify-0.8.0 unidecode-1.1.1\n"
1139 | ],
1140 | "name": "stdout"
1141 | }
1142 | ]
1143 | },
1144 | {
1145 | "cell_type": "code",
1146 | "metadata": {
1147 | "id": "nTxYl2Nr9AIu",
1148 | "colab_type": "code",
1149 | "colab": {}
1150 | },
1151 | "source": [
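1152 | "nlp = spacy.load('en_core_web_sm')  # the 'en' shortcut no longer works in spaCy v3\n",
1153 | "\n",
1154 | "# raise spaCy's default max_length (1,000,000 characters) so the joined tweets can be parsed without a text-length error\n",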
1155 | "nlp.max_length = 20000000\n",
1156 | "\n",
1157 | "# all the processing work is done below, so it may take a while\n",
1158 | "twitter_doc = nlp(\" \".join(elon_tweets.text))"
1159 | ],
1160 | "execution_count": 0,
1161 | "outputs": []
1162 | },
1163 | {
1164 | "cell_type": "code",
1165 | "metadata": {
1166 | "id": "M-9tJV8-EXSp",
1167 | "colab_type": "code",
1168 | "colab": {}
1169 | },
1170 | "source": [
1171 | "elon_negative_doc = nlp(\" \".join(elon_tweets[elon_tweets[\"class\"]==\"Negative\"].text))\n",
1172 | "elon_negative_sents = \" \".join([sent.text for sent in elon_negative_doc.sents if len(sent.text) > 1])\n",
1173 | "\n"
1174 | ],
1175 | "execution_count": 0,
1176 | "outputs": []
1177 | },
1178 | {
1179 | "cell_type": "code",
1180 | "metadata": {
1181 | "id": "eCvUdz6tE04N",
1182 | "colab_type": "code",
1183 | "outputId": "b0e6f894-b32a-4e34-8379-44e670bf4047",
1184 | "colab": {
1185 | "base_uri": "https://localhost:8080/",
1186 | "height": 748
1187 | }
1188 | },
1189 | "source": [
1190 | "elon_negative_generator = markovify.Text(elon_negative_sents, state_size = 3)\n",
1191 | "\n",
1192 | "# twenty randomly generated negative sentences (make_sentence returns None if every try fails)\n",
1193 | "for i in range(20):\n",
1194 | " print(elon_negative_generator.make_sentence(tries=100))\n",
1195 | "\n",
1196 | "# twenty randomly generated negative sentences of at most 200 characters\n",
1197 | "for i in range(20):\n",
1198 | " print(elon_negative_generator.make_short_sentence(200, tries=100))"
1199 | ],
1200 | "execution_count": 0,
1201 | "outputs": [
1202 | {
1203 | "output_type": "stream",
1204 | "text": [
1205 | "@kimbal you too @austinbarnard45 @flcnhvy @Joe__Wakefield @tjq1190 @tyger_cyber @fawfulfan @_Mikemo This is a significant policy difference in Italy vs most other countries.\n",
1206 | "@BrandonJHavard No @arctechinc @BrandonJHavard No @CarolineGee8 @NathanBomey @USATODAY @Tesla @mayemusk That was a mistake.\n",
1207 | "@PPathole Black & white interior available only for Model 3 Performance https://t.co/Vejb9fTY5Q RT @SpaceX: Dragon is holding at the capture point ~10 meters from the @Space_Station.\n",
1208 | "We can solar power all of human civilization with a tiny % of the US prob helps convert some naysayers.\n",
1209 | "This is simple replacement of the Autopilot team has been working all weekend to resolve last minute issues.\n",
1210 | "@DMC_Ryan Can someone please do that!? I would def recommend this @williamwinters @austinhopperrrr @maysacha @thanr @JamesWorldSpace Exactly.\n",
1211 | "RT @CRcars: Consumer Reports tested the @Tesla Model 3 and 5-star safety ratings: name a more iconic duo.\n",
1212 | "@martinengwicht @Erdayastronaut @DiscoverMag Even connectivity at my house with the Tesla AI/autopilot team in about four weeks.\n",
1213 | "Next stop, the restaurant at the end of the year by The Detroit News! https://t.co/inyVuf1CzL RT @Tesla: In 8 years we've gone from building 500 vehicles per year in USA alone https://t.co/xoaK6v6T6f @ValueAnalyst1 @peterdevietien @scottwww @karpathy @Tesla @nvidia Or maybe pegging @flcnhvy Is that a real billboard?\n",
1214 | "@J_ump_er @neiltyson @Space_Station To be precise, my mistake.\n",
1215 | "After struggling to make the awesome version of Model 3 Performance is next-level.\n",
1216 | "@PPathole @Teslarati Yes, rather embarrassing tbh RT @SpaceX: Dragon returned home yesterday after its second month-long stay at the @Space_Station.\n",
1217 | "Will be done free of charge for those who haven't purchased it is starting to roll out.\n",
1218 | "None\n",
1219 | "RT @NASA: More than 5,600 pounds of @ISS_Research and supplies.\n",
1220 | "All major highways in Texas will have Superchargers, all the way into your garage https://t.co/DPnkC80NWH First Boring Brick store opening in ~2 months.\n",
1221 | "@wonderofscience @Treebeard1671 @Erdayastronaut @keego73 Yes.\n",
1222 | "@thehumanwire @InsideEVs @28delayslater Probably a month or two, although it may be a little racing game in the racing game in the car you need to stop every three hours for bathroom and food anyway.\n",
1223 | "ET. https://t.co/0qHhHzD4Js Atmospheric entry at 17,000 mph is like a beaver with small ideas, but it has a certain quiet dignity ...\n",
1224 | "@TeslaCharts Then you'll be fine @TeslaCharts I don't want to write Tesla a check.\n",
1225 | "Tesla will be there as soon as we have the data.\n",
1226 | "Thousands of US auto startups, but only Ford & Tesla are the only 2 American car companies to avoid bankruptcy.\n",
1227 | "Thanks @elonmusk and everyone at Tesla for making the finest car I've ever owned or driven.\n",
1228 | "Will feel like folding space from one part of a city bus, it will be something like that.\n",
1229 | "So many bots!! True https://t.co/CS6semhKm6 @JaneidyEve @slashdot Yes, if there is someone with a pattern of doing so.\n",
1230 | "Our main issue here in Boca is that it will be ~150t to LEO fully reusable @NYCHealthSystem @Tesla Happy we could be helpful!\n",
1231 | "@jeffborden_cga @marc_benton @MrTommyCampbell @Tesla When prices go down, those who already bought don't want to write Tesla a check.\n",
1232 | "@flcnhvy No, he's still much faster @teslaownersSV @Benioff Boring Co is launching a whole product line of DIY watchtowers.\n",
1233 | "@WorldAndScience Common myth, but incorrect @John_Gardi @Erdayastronaut @NotEricRalph @MartianDays @torybruno @AerojetRdyne @elon Both at the same time!\n",
1234 | "No more need to import fossil fuels for electricity, which is a super tiny % of Earth's surface, which is a strategic vulnerability.\n",
1235 | "@teslaownersSV @elontimes Excessive shades of grey @Erdayastronaut @13ericralph31 @HarryStoltz1 @flcnhvy Since Raptor produces 200 tons of force, cost per ton over time would actually be ~$1.\n",
1236 | "@SpaceForceDoD Starfleet begins @engineers_feed I try so hard to be way better.\n",
1237 | "This vehicle has seen a lot of people whose judgment I respect have suggested this.\n",
1238 | "@wonderofscience @Treebeard1671 @Erdayastronaut @keego73 Yes.\n",
1239 | "For now, it's slightly disadvantageous to have Tesla FSD computer or Tesla Nvidia-based computer.\n",
1240 | "The awareness is just a matter of courtesy & fairness.\n",
1241 | "EDT, or 3:30 UTC on June 25 - https://t.co/gtC39uBC7z https://t.co/NTblKjmFxT Falcon Heavy on Feb 6 from Apollo launchpad 39A at Cape Kennedy.\n",
1242 | "@RossSheingold @Tesla Upgrading hardware is not important for a few days Order Tesla Solar + Powerwall battery for 24/7 clean power with no blackouts.\n",
1243 | "@DMC_Ryan Yes, Model 3 Performance is next-level.\n",
1244 | "Our main issue here in Boca is that it will be information & entertainment, not trickery @Erik2be @Teslarati Sorry to hear that.\n"
1245 | ],
1246 | "name": "stdout"
1247 | }
1248 | ]
1249 | },
1250 | {
1251 | "cell_type": "markdown",
1252 | "metadata": {
1253 | "id": "cllgk9GcHzTf",
1254 | "colab_type": "text"
1255 | },
1256 | "source": [
1257 | "The Markov-generated tweets from the negative class do not read as negative *at all*. The NLTK twitter corpus seems to lean heavily on emoticons to signal sentiment, so a classifier trained on it struggles once emoticons are missing, as they are in nearly all of the tweets above. Elon Musk also seems to be a generally positive guy."
1258 | ]
1259 | },
1260 | {
1261 | "cell_type": "code",
1262 | "metadata": {
1263 | "id": "GfWthtm1JQon",
1264 | "colab_type": "code",
1265 | "colab": {}
1266 | },
1267 | "source": [
1268 | "elon_positive_doc = nlp(\" \".join(elon_tweets[elon_tweets[\"class\"]==\"Positive\"].text))\n",
1269 | "elon_positive_sents = \" \".join([sent.text for sent in elon_positive_doc.sents if len(sent.text) > 1])\n"
1270 | ],
1271 | "execution_count": 0,
1272 | "outputs": []
1273 | },
1274 | {
1275 | "cell_type": "code",
1276 | "metadata": {
1277 | "id": "gE996aWqJ3cB",
1278 | "colab_type": "code",
1279 | "outputId": "57d85dd1-14a5-4ae1-d096-5c49985370a8",
1280 | "colab": {
1281 | "base_uri": "https://localhost:8080/",
1282 | "height": 728
1283 | }
1284 | },
1285 | "source": [
1286 | "elon_positive_generator = markovify.Text(elon_positive_sents, state_size = 3)\n",
1287 | "\n",
1288 | "# twenty randomly generated positive sentences (make_sentence returns None if every try fails)\n",
1289 | "for i in range(20):\n",
1290 | " print(elon_positive_generator.make_sentence(tries=100))\n",
1291 | "\n",
1292 | "# twenty randomly generated positive sentences of at most 200 characters\n",
1293 | "for i in range(20):\n",
1294 | " print(elon_positive_generator.make_short_sentence(200, tries=100))"
1295 | ],
1296 | "execution_count": 0,
1297 | "outputs": [
1298 | {
1299 | "output_type": "stream",
1300 | "text": [
1301 | "@Kristennetten @mayapolarbear Technically, his bro RT @wonderofscience : This is what it would look a bit like a Mars simulator.\n",
1302 | "Really need to bring it to a Tesla service center. https://t.co/KCIFtliZr8 @ElectrekCo I was just a simple nucleotide, drifting alone in small crevice with 3 trillion siblings.\n",
1303 | "Essence of a good editor @lexiheft @Tesla Coming soon @CathieDWood Thank you for your trust in the @SpaceX team.\n",
1304 | "@Erdayastronaut @SpaceX Super proud of Tesla Autopilot team!\n",
1305 | "@NASA @SpaceX @Space_Station @Commercial_Crew Most likely, but this is an important clarification @ThePhoenixFlare @MKBHD @HyperChangeTV Yeah, news is actually super good.\n",
1306 | "It's amazing. https://t.co/eLqr4pLeIX @thanr Sure @aparanjape Prob early next year Just finished an engineering review with SpaceX Propulsion.\n",
1307 | "Weather is over 90% favorable for today's launch attempt - https://t.co/gtC39uBC7z RT @NASA: How many worlds exist outside our solar system?\n",
1308 | "Aiming to finish initial construction this summer, start Model 3 production ramp & trying to get to profitability.\n",
1309 | "RT @SpaceX: Webcast of Falcon 9 launch of Dragon targeted for 12:29 p.m.\n",
1310 | "Engineering team also getting feedback from the British dive team on how to improve the link quality so it can be deadly to pro divers when high.\n",
1311 | "Adding fold out solar wings would generate 30 to 40 rocket cores for ~300 missions over 5 years.\n",
1312 | "We are building the Starship prototypes locally at our launch site in Texas https://t.co/MtxkuhDDdE @grafikhure_de @StevenHardison Skin will get too hot for paint.\n",
1313 | "Apply at https://t.co/84BkZvuBn8 Looking forward to using the new Iridium constellation.\n",
1314 | "Aero surfaces & high gimbal angle main engines for landing orientation, so won't need high thrust reaction control.\n",
1315 | "EDT, or 4:29 UTC. https://t.co/gtC39uBC7z https://t.co/GziYEyZAxN RT @SpaceX: Successful deployment of 60 Starlink satellites from Pad 40 in Florida.\n",
1316 | "Obsidian Black & Metallic Silver will still be available as an upgrade from service.\n",
1317 | "Reminder to US buyers that the $7500 tax credit drops in half for Tesla on July 1.\n",
1318 | "Adding fold out solar wings would generate 30 to 40 rocket cores for ~300 missions over 5 years.\n",
1319 | "Two seater electric ATV designed to work with them @JaneidyEve Mini-sub arriving in about 17 hours.\n",
1320 | "@reto_siegrist @flcnhvy @thirdrowtesla Yes @justpaulinelol @reto_siegrist @flcnhvy @thirdrowtesla Yes @justpaulinelol @reto_siegrist @flcnhvy @thirdrowtesla Definitely!\n",
1321 | "RT @wonderofscience: Red sunset on a red planet.\n",
1322 | "As with vehicle safety, it will be in a future situation.\n",
1323 | "@ChrisEvans Great thread RT @tsrandall: In the second half of 2018, the Model 3 is the best-selling electric car, despite no advertising or paid endorsements.\n",
1324 | "@vincent13031925 @TeslaBull @scottwww @stetopinini @S_Padival @lopezlinette Indeed, very simple question.\n",
1325 | "2019 free cash flow Super proud of SpaceX propulsion/test/materials team!\n",
1326 | "Weather is 60% favorable for tonight's four-hour launch window, which opens Tuesday, August 7 at 1:18 a.m.\n",
1327 | "2019 free cash flow Super proud of SpaceX propulsion/test/materials team!\n",
1328 | "#Tesla #Model3 # RWD https://t.co/G3FnR3pwRi RT @Tesla: Model 3 recently earned the 2020 IIHS Top Safety Pick+ Award.\n",
1329 | "If not Tesla, please take a look at the timeline: https://t.co/cP1HuVwg6c https://t.co/fJCvzTo8Y7 RT @AstroAnnimal : Yes buddy, that's your Mother Earth.\n",
1330 | "RT @SpaceX: Falcon 9 and Crew Dragon are vertical ahead of today's attempt - https://t.co/gtC39uBC7z RT @NASA: How many worlds exist outside our solar system?\n",
1331 | "RT @Hyperloop: Student teams are arriving at @SpaceX for the 2018 Hyperloop Pod Competition Technically, alcohol is a solution Just saw that on a Tesla.\n",
1332 | "There need to be right.\n",
1333 | "RT @SpaceX: Falcon 9 and RADARSAT Constellation Mission have rolled out to the Tesla team, who have been working crazy hard.\n",
1334 | "Great word-of-mouth is why Model 3 is designed to burn @nichegamer Nice @justpaulinelol @Tesla Same, except where limited by EU regulations.\n",
1335 | "Also, thank you to all Tesla supporters around the world have different rules & processes, so just takes time.\n",
1336 | "EST, 1:45 UTC - https://t.co/gtC39uBC7z https://t.co/UbpdQrJeNx RT @Tesla: Model 3 configurator is now open to left-hand drive countries in Europe.\n",
1337 | "A lot of the misunderstandings @WillFealey No problem if you want a display car, order at https://t.co/46TXqRJ3C1 or visit our stores.\n",
1338 | "If all goes well, each launch of 60 satellites will generate more power than we needed, 2.7kW sending back to the grid!\n",
1339 | "Both computers will be used for GF4.\n",
1340 | "Hopefully, ready to test in a few months @brandonbernicky @tsrandall Not good enough yet.\n"
1341 | ],
1342 | "name": "stdout"
1343 | }
1344 | ]
1345 | },
1346 | {
1347 | "cell_type": "markdown",
1348 | "metadata": {
1349 | "id": "tKOir9Z3LoxG",
1350 | "colab_type": "text"
1351 | },
1352 | "source": [
1353 | "The positive *and* negative generated tweets read with roughly the same sentiment. Both sets talk a lot about Elon's companies, mostly Tesla and SpaceX."
1354 | ]
1355 | }
1356 | ]
1357 | }
--------------------------------------------------------------------------------
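The generation cells in the notebook rely on `markovify.Text(..., state_size = 3)`. The following is a minimal word-level Markov chain in plain Python, included as a rough illustration of what the state size controls. It is a hypothetical sketch: the names `build_chain` and `generate` and the toy corpus are made up here. It is not markovify's implementation, which also tracks sentence starts and ends. By default, markovify also rejects candidates that overlap too much with the source text, which is one reason `make_sentence` can return `None`.

```python
import random
from collections import defaultdict


def build_chain(words, state_size=3):
    """Map each run of `state_size` consecutive words to the words that follow it."""
    chain = defaultdict(list)
    for i in range(len(words) - state_size):
        state = tuple(words[i:i + state_size])
        chain[state].append(words[i + state_size])
    return chain


def generate(chain, start, max_words=30, rng=random):
    """Random walk over the chain, starting from the tuple `start`.

    Each step picks one of the recorded successors of the last len(start) words.
    Duplicates are kept in the lists, so frequent continuations are chosen more often.
    Stops early at a state that never had a successor in the corpus.
    """
    state_size = len(start)
    output = list(start)
    for _ in range(max_words - state_size):
        successors = chain.get(tuple(output[-state_size:]))
        if not successors:
            break  # dead end: this state only occurred at the end of the corpus
        output.append(rng.choice(successors))
    return " ".join(output)


if __name__ == "__main__":
    # toy corpus for illustration only
    corpus = "Model 3 Performance is next-level . Model 3 Performance is great ."
    chain = build_chain(corpus.split(), state_size=3)
    # seeded generator so the example is reproducible
    print(generate(chain, ("Model", "3", "Performance"), rng=random.Random(0)))
```

With a state size of 3, each new word depends on the previous three. The output therefore stays close to the source tweets, which fits the near-verbatim and repeated lines in the generated output. A smaller state size mixes sources more freely but tends to read less coherently.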