├── .github
│   └── FUNDING.yml
├── .gitignore
├── README.md
├── environment.yml
├── ted.csv
├── tutorial.ipynb
└── youtube.jpg
/.github/FUNDING.yml:
--------------------------------------------------------------------------------
1 | patreon: dataschool
2 |
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | .DS_Store
2 | .ipynb_checkpoints/
3 | data/
4 | extras/
5 | notebooks/
6 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Data Science Best Practices with pandas
2 |
3 | This tutorial was presented by Kevin Markham at PyCon on [May 2, 2019](https://us.pycon.org/2019/schedule/presentation/92/). Watch the complete [tutorial video](https://www.youtube.com/watch?v=dPwLlJkSHLo&list=PL5-da3qGB5ICCsgW1MxlZ0Hq8LL5U3u9y&index=36) on YouTube.
4 |
5 | [![Watch the complete tutorial video on YouTube](youtube.jpg)](https://www.youtube.com/watch?v=dPwLlJkSHLo&list=PL5-da3qGB5ICCsgW1MxlZ0Hq8LL5U3u9y&index=36 "Watch the complete tutorial video on YouTube")
6 |
7 | ## Jupyter Notebook
8 |
9 | The tutorial code is available as a [Jupyter notebook](tutorial.ipynb). You can run this notebook in the cloud (no installation required) by clicking the "launch binder" button:
10 |
11 | [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/justmarkham/pycon-2019-tutorial/master?filepath=tutorial.ipynb)
12 |
13 | ## What is the tutorial about?
14 |
15 | The pandas library is a powerful tool for multiple phases of the data science workflow, including data cleaning, visualization, and exploratory data analysis. However, the size and complexity of the pandas library make it challenging to discover the best way to accomplish any given task.
16 |
17 | In this tutorial, you'll use pandas to answer questions about a real-world dataset. Through each exercise, you'll learn important data science skills as well as "best practices" for using pandas. By the end of the tutorial, you'll be more fluent at using pandas to correctly and efficiently answer your own data science questions.
18 |
19 | ## How well do I need to know pandas to participate?
20 |
21 | You will get the most out of this tutorial if you are an intermediate pandas user, since the tutorial does not cover pandas basics.
22 |
23 | - If you are new to pandas, I recommend watching some videos from my free [pandas course](https://www.dataschool.io/easier-data-analysis-with-pandas/) before the tutorial.
24 | - If you just need a pandas refresher, I recommend reviewing this [Jupyter notebook](https://nbviewer.jupyter.org/github/justmarkham/pandas-videos/blob/master/pandas.ipynb), which includes all of the code from my pandas course.
25 |
26 | ## What dataset are we using?
27 |
28 | `ted.csv` is the [TED Talks dataset](https://www.kaggle.com/rounakbanik/ted-talks) from Kaggle Datasets, made available under the [CC BY-NC-SA 4.0 license](https://creativecommons.org/licenses/by-nc-sa/4.0/).
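Because the data is stored as CSV, list-like columns such as `ratings` and `tags` come back from `read_csv()` as plain strings, not as real lists or dictionaries. A minimal sketch, assuming every cell in those columns is a valid Python literal, converts them with the standard library's `ast.literal_eval`. The helper name `parse_literal_columns` and the two sample rows are illustrative only, not taken from the dataset.

```python
import ast
import io

import pandas as pd

# Two illustrative rows shaped like ted.csv's list-like columns
# (hypothetical values, NOT real rows from the dataset).
SAMPLE_CSV = '''name,tags,ratings
Talk A,"['children', 'creativity']","[{'id': 7, 'name': 'Funny', 'count': 10}]"
Talk B,"['climate change']","[{'id': 3, 'name': 'Courageous', 'count': 5}]"
'''


def parse_literal_columns(df, columns):
    """Return a copy of df with each listed column converted from its
    string representation into Python objects via ast.literal_eval."""
    out = df.copy()
    for col in columns:
        out[col] = out[col].apply(ast.literal_eval)
    return out


ted_sample = pd.read_csv(io.StringIO(SAMPLE_CSV))
print(type(ted_sample.loc[0, 'tags']))  # <class 'str'>: read_csv keeps the raw text
parsed = parse_literal_columns(ted_sample, ['tags', 'ratings'])
print(type(parsed.loc[0, 'tags']))      # <class 'list'>
print(parsed.loc[0, 'ratings'][0]['name'])  # Funny
```

`ast.literal_eval` is used rather than `eval` because it only accepts literals (strings, numbers, lists, dicts, and so on), so it cannot execute arbitrary code hidden in a cell.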
29 |
30 | ## How do I download the CSV file from GitHub?
31 |
32 | Here are three options that will work equally well:
33 |
34 | - If you want to directly download only the CSV file, **right click on the following link** and select "Save As": [`ted.csv`](https://raw.githubusercontent.com/justmarkham/pycon-2019-tutorial/master/ted.csv).
35 | - If you know how to use git, you can click the green button above and **clone the entire repository**.
36 | - If you know how to open a ZIP file, you can click the green button above and **download the entire repository**.
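If you would rather not save the file at all, `pd.read_csv()` also accepts a URL or any file-like object, not only a local path. A minimal sketch, assuming network access to GitHub's raw-content host when reading from the URL; `load_ted` is a hypothetical helper name, and the URL is the raw-file link from the first option above.

```python
import io

import pandas as pd

# Raw-file URL from the first download option above.
TED_URL = 'https://raw.githubusercontent.com/justmarkham/pycon-2019-tutorial/master/ted.csv'


def load_ted(source=TED_URL):
    """Read the TED Talks CSV from a local path, a URL, or a file-like object.

    pd.read_csv accepts all three, so 'ted.csv' works after cloning or
    unzipping, and the default URL works with network access.
    """
    return pd.read_csv(source)


# Offline demonstration with an in-memory file (values from the first two talks);
# with network access, load_ted() with no argument fetches the full file instead.
demo = load_ted(io.StringIO('comments,views\n4553,47227110\n265,3200520\n'))
print(demo.shape)  # (2, 2)
```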
37 |
38 | ## What do I need to do before the tutorial?
39 |
40 | 1. Make sure that [pandas](https://pandas.pydata.org/pandas-docs/stable/install.html) and [matplotlib](https://matplotlib.org/users/installing.html) are installed on your computer. (The easiest way to install pandas and matplotlib is by downloading the [Anaconda distribution](https://www.anaconda.com/distribution/).)
41 | 2. Download the CSV file from this repository.
42 | 3. Read the file into pandas using the `read_csv()` function to make sure everything is working.
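For step 1, a quick way to confirm that both libraries are installed (and which versions you have) without importing them is sketched below. It uses only the standard library and assumes Python 3.8 or newer for `importlib.metadata`; `installed_versions` is a hypothetical helper name.

```python
from importlib import metadata, util


def installed_versions(packages=('pandas', 'matplotlib')):
    """Map each package name to its installed version string, or None if missing.

    util.find_spec checks importability without importing the package;
    metadata.version reads the version from the installed distribution.
    """
    versions = {}
    for name in packages:
        if util.find_spec(name) is None:
            versions[name] = None
        else:
            try:
                versions[name] = metadata.version(name)
            except metadata.PackageNotFoundError:
                # Importable but without distribution metadata
                # (e.g. a standard-library module or a bare source checkout).
                versions[name] = 'unknown'
    return versions


for pkg, ver in installed_versions().items():
    print(f'{pkg}: {ver if ver else "NOT INSTALLED"}')
```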
43 |
44 | ## How can I check that pandas and matplotlib are properly installed?
45 |
46 | 1. Move the CSV file into your working directory. (This is usually the directory where you create Python scripts or notebooks.)
47 | 2. Open the Python environment of your choice.
48 | 3. If you're using the **Jupyter notebook**, run the following code:
49 |
50 | ```python
51 | import pandas as pd
52 | import matplotlib.pyplot as plt
53 | %matplotlib inline
54 | ted = pd.read_csv('ted.csv')
55 | ted.comments.plot()
56 | ```
57 |
58 | 4. If you're using **any other Python environment**, run the following code:
59 |
60 | ```python
61 | import pandas as pd
62 | import matplotlib.pyplot as plt
63 | ted = pd.read_csv('ted.csv')
64 | ted.comments.plot()
65 | plt.show()
66 | ```
67 |
68 | If you don't get any error messages and a plot appears on your screen, then pandas and matplotlib are very likely installed correctly.
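When no display is available (for example on a remote server or in a CI job), `plt.show()` may not open a window. A sketch of a headless variant, assuming matplotlib's non-interactive `Agg` backend, saves the plot to a PNG file instead of showing it. The helper `check_setup` and the output filename are hypothetical, and the in-memory sample uses the first five `comments` values shown in the notebook.

```python
import io
import os
import tempfile

import matplotlib
matplotlib.use('Agg')  # non-interactive backend: renders to files, no window needed
import matplotlib.pyplot as plt
import pandas as pd


def check_setup(csv_source, out_path):
    """Read the CSV, plot the comments column, and save the figure to out_path.

    Returns out_path so the caller can confirm the file was written.
    """
    ted = pd.read_csv(csv_source)
    ax = ted.comments.plot()
    ax.figure.savefig(out_path)
    plt.close(ax.figure)  # release the figure; nothing is displayed
    return out_path


# Small in-memory stand-in for ted.csv; pass 'ted.csv' to check the real file.
sample = io.StringIO('comments\n4553\n265\n124\n200\n593\n')
png = check_setup(sample, os.path.join(tempfile.gettempdir(), 'ted_comments_check.png'))
print(os.path.getsize(png) > 0)  # True if the plot was rendered and saved
```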
69 |
70 | ## Who is the instructor?
71 |
72 | Kevin Markham is the founder of [Data School](https://www.dataschool.io/), an online school for learning data science with Python. He is passionate about teaching data science to people who are new to the field, regardless of their educational and professional backgrounds. Previously, Kevin was the lead data science instructor for General Assembly in Washington, DC. Currently, he teaches machine learning and data analysis to over 10,000 students each month through the Data School [YouTube channel](https://www.youtube.com/dataschool). He has a degree in Computer Engineering from Vanderbilt University and lives in Asheville, North Carolina with his wife and son.
73 |
--------------------------------------------------------------------------------
/environment.yml:
--------------------------------------------------------------------------------
1 | name: pycon-2019-tutorial
2 | channels:
3 | - defaults
4 | dependencies:
5 | - pandas
6 | - matplotlib
7 |
--------------------------------------------------------------------------------
/tutorial.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# PyCon 2019: Data Science Best Practices with pandas ([video](https://www.youtube.com/watch?v=dPwLlJkSHLo&list=PL5-da3qGB5ICCsgW1MxlZ0Hq8LL5U3u9y&index=36))\n",
8 | "\n",
9 | "### GitHub repository: https://github.com/justmarkham/pycon-2019-tutorial\n",
10 | "\n",
11 | "### Instructor: Kevin Markham\n",
12 | "\n",
13 | "- Website: https://www.dataschool.io\n",
14 | "- YouTube: https://www.youtube.com/dataschool\n",
15 | "- Patreon: https://www.patreon.com/dataschool\n",
16 | "- Twitter: https://twitter.com/justmarkham\n",
17 | "- GitHub: https://github.com/justmarkham"
18 | ]
19 | },
20 | {
21 | "cell_type": "markdown",
22 | "metadata": {},
23 | "source": [
24 | "## 1. Introduction to the TED Talks dataset\n",
25 | "\n",
26 | "https://www.kaggle.com/rounakbanik/ted-talks"
27 | ]
28 | },
29 | {
30 | "cell_type": "code",
31 | "execution_count": 1,
32 | "metadata": {},
33 | "outputs": [
34 | {
35 | "data": {
36 | "text/plain": [
37 | "'0.24.2'"
38 | ]
39 | },
40 | "execution_count": 1,
41 | "metadata": {},
42 | "output_type": "execute_result"
43 | }
44 | ],
45 | "source": [
46 | "import pandas as pd\n",
47 | "pd.__version__"
48 | ]
49 | },
50 | {
51 | "cell_type": "code",
52 | "execution_count": 2,
53 | "metadata": {},
54 | "outputs": [],
55 | "source": [
56 | "import matplotlib.pyplot as plt\n",
57 | "%matplotlib inline"
58 | ]
59 | },
60 | {
61 | "cell_type": "code",
62 | "execution_count": 3,
63 | "metadata": {},
64 | "outputs": [],
65 | "source": [
66 | "ted = pd.read_csv('ted.csv')"
67 | ]
68 | },
69 | {
70 | "cell_type": "code",
71 | "execution_count": 4,
72 | "metadata": {
73 | "scrolled": false
74 | },
75 | "outputs": [
76 | {
77 | "data": {
221 | "text/plain": [
222 | " comments description duration \\\n",
223 | "0 4553 Sir Ken Robinson makes an entertaining and pro... 1164 \n",
224 | "1 265 With the same humor and humanity he exuded in ... 977 \n",
225 | "2 124 New York Times columnist David Pogue takes aim... 1286 \n",
226 | "3 200 In an emotionally charged talk, MacArthur-winn... 1116 \n",
227 | "4 593 You've never seen data presented like this. Wi... 1190 \n",
228 | "\n",
229 | " event film_date languages main_speaker \\\n",
230 | "0 TED2006 1140825600 60 Ken Robinson \n",
231 | "1 TED2006 1140825600 43 Al Gore \n",
232 | "2 TED2006 1140739200 26 David Pogue \n",
233 | "3 TED2006 1140912000 35 Majora Carter \n",
234 | "4 TED2006 1140566400 48 Hans Rosling \n",
235 | "\n",
236 | " name num_speaker published_date \\\n",
237 | "0 Ken Robinson: Do schools kill creativity? 1 1151367060 \n",
238 | "1 Al Gore: Averting the climate crisis 1 1151367060 \n",
239 | "2 David Pogue: Simplicity sells 1 1151367060 \n",
240 | "3 Majora Carter: Greening the ghetto 1 1151367060 \n",
241 | "4 Hans Rosling: The best stats you've ever seen 1 1151440680 \n",
242 | "\n",
243 | " ratings \\\n",
244 | "0 [{'id': 7, 'name': 'Funny', 'count': 19645}, {... \n",
245 | "1 [{'id': 7, 'name': 'Funny', 'count': 544}, {'i... \n",
246 | "2 [{'id': 7, 'name': 'Funny', 'count': 964}, {'i... \n",
247 | "3 [{'id': 3, 'name': 'Courageous', 'count': 760}... \n",
248 | "4 [{'id': 9, 'name': 'Ingenious', 'count': 3202}... \n",
249 | "\n",
250 | " related_talks \\\n",
251 | "0 [{'id': 865, 'hero': 'https://pe.tedcdn.com/im... \n",
252 | "1 [{'id': 243, 'hero': 'https://pe.tedcdn.com/im... \n",
253 | "2 [{'id': 1725, 'hero': 'https://pe.tedcdn.com/i... \n",
254 | "3 [{'id': 1041, 'hero': 'https://pe.tedcdn.com/i... \n",
255 | "4 [{'id': 2056, 'hero': 'https://pe.tedcdn.com/i... \n",
256 | "\n",
257 | " speaker_occupation \\\n",
258 | "0 Author/educator \n",
259 | "1 Climate advocate \n",
260 | "2 Technology columnist \n",
261 | "3 Activist for environmental justice \n",
262 | "4 Global health expert; data visionary \n",
263 | "\n",
264 | " tags \\\n",
265 | "0 ['children', 'creativity', 'culture', 'dance',... \n",
266 | "1 ['alternative energy', 'cars', 'climate change... \n",
267 | "2 ['computers', 'entertainment', 'interface desi... \n",
268 | "3 ['MacArthur grant', 'activism', 'business', 'c... \n",
269 | "4 ['Africa', 'Asia', 'Google', 'demo', 'economic... \n",
270 | "\n",
271 | " title \\\n",
272 | "0 Do schools kill creativity? \n",
273 | "1 Averting the climate crisis \n",
274 | "2 Simplicity sells \n",
275 | "3 Greening the ghetto \n",
276 | "4 The best stats you've ever seen \n",
277 | "\n",
278 | " url views \n",
279 | "0 https://www.ted.com/talks/ken_robinson_says_sc... 47227110 \n",
280 | "1 https://www.ted.com/talks/al_gore_on_averting_... 3200520 \n",
281 | "2 https://www.ted.com/talks/david_pogue_says_sim... 1636292 \n",
282 | "3 https://www.ted.com/talks/majora_carter_s_tale... 1697550 \n",
283 | "4 https://www.ted.com/talks/hans_rosling_shows_t... 12005869 "
284 | ]
285 | },
286 | "execution_count": 4,
287 | "metadata": {},
288 | "output_type": "execute_result"
289 | }
290 | ],
291 | "source": [
292 | "# each row represents a single talk\n",
293 | "ted.head()"
294 | ]
295 | },
296 | {
297 | "cell_type": "code",
298 | "execution_count": 5,
299 | "metadata": {},
300 | "outputs": [
301 | {
302 | "data": {
303 | "text/plain": [
304 | "(2550, 17)"
305 | ]
306 | },
307 | "execution_count": 5,
308 | "metadata": {},
309 | "output_type": "execute_result"
310 | }
311 | ],
312 | "source": [
313 | "# rows, columns\n",
314 | "ted.shape"
315 | ]
316 | },
317 | {
318 | "cell_type": "code",
319 | "execution_count": 6,
320 | "metadata": {},
321 | "outputs": [
322 | {
323 | "data": {
324 | "text/plain": [
325 | "comments int64\n",
326 | "description object\n",
327 | "duration int64\n",
328 | "event object\n",
329 | "film_date int64\n",
330 | "languages int64\n",
331 | "main_speaker object\n",
332 | "name object\n",
333 | "num_speaker int64\n",
334 | "published_date int64\n",
335 | "ratings object\n",
336 | "related_talks object\n",
337 | "speaker_occupation object\n",
338 | "tags object\n",
339 | "title object\n",
340 | "url object\n",
341 | "views int64\n",
342 | "dtype: object"
343 | ]
344 | },
345 | "execution_count": 6,
346 | "metadata": {},
347 | "output_type": "execute_result"
348 | }
349 | ],
350 | "source": [
351 | "# object columns are usually strings, but can also be arbitrary Python objects (lists, dictionaries)\n",
352 | "ted.dtypes"
353 | ]
354 | },
355 | {
356 | "cell_type": "code",
357 | "execution_count": 7,
358 | "metadata": {},
359 | "outputs": [
360 | {
361 | "data": {
362 | "text/plain": [
363 | "comments 0\n",
364 | "description 0\n",
365 | "duration 0\n",
366 | "event 0\n",
367 | "film_date 0\n",
368 | "languages 0\n",
369 | "main_speaker 0\n",
370 | "name 0\n",
371 | "num_speaker 0\n",
372 | "published_date 0\n",
373 | "ratings 0\n",
374 | "related_talks 0\n",
375 | "speaker_occupation 6\n",
376 | "tags 0\n",
377 | "title 0\n",
378 | "url 0\n",
379 | "views 0\n",
380 | "dtype: int64"
381 | ]
382 | },
383 | "execution_count": 7,
384 | "metadata": {},
385 | "output_type": "execute_result"
386 | }
387 | ],
388 | "source": [
389 | "# count the number of missing values in each column\n",
390 | "ted.isna().sum()"
391 | ]
392 | },
393 | {
394 | "cell_type": "markdown",
395 | "metadata": {},
396 | "source": [
397 | "## 2. Which talks provoke the most online discussion?"
398 | ]
399 | },
400 | {
401 | "cell_type": "code",
402 | "execution_count": 8,
403 | "metadata": {},
404 | "outputs": [
405 | {
406 | "data": {
550 | "text/plain": [
551 | " comments description duration \\\n",
552 | "1787 2673 Our consciousness is a fundamental aspect of o... 1117 \n",
553 | "201 2877 Jill Bolte Taylor got a research opportunity f... 1099 \n",
554 | "644 3356 Questions of good and evil, right and wrong ar... 1386 \n",
555 | "0 4553 Sir Ken Robinson makes an entertaining and pro... 1164 \n",
556 | "96 6404 Richard Dawkins urges all atheists to openly s... 1750 \n",
557 | "\n",
558 | " event film_date languages main_speaker \\\n",
559 | "1787 TED2014 1395100800 33 David Chalmers \n",
560 | "201 TED2008 1204070400 49 Jill Bolte Taylor \n",
561 | "644 TED2010 1265846400 39 Sam Harris \n",
562 | "0 TED2006 1140825600 60 Ken Robinson \n",
563 | "96 TED2002 1012608000 42 Richard Dawkins \n",
564 | "\n",
565 | " name num_speaker \\\n",
566 | "1787 David Chalmers: How do you explain consciousness? 1 \n",
567 | "201 Jill Bolte Taylor: My stroke of insight 1 \n",
568 | "644 Sam Harris: Science can answer moral questions 1 \n",
569 | "0 Ken Robinson: Do schools kill creativity? 1 \n",
570 | "96 Richard Dawkins: Militant atheism 1 \n",
571 | "\n",
572 | " published_date ratings \\\n",
573 | "1787 1405350484 [{'id': 25, 'name': 'OK', 'count': 280}, {'id'... \n",
574 | "201 1205284200 [{'id': 22, 'name': 'Fascinating', 'count': 14... \n",
575 | "644 1269249180 [{'id': 8, 'name': 'Informative', 'count': 923... \n",
576 | "0 1151367060 [{'id': 7, 'name': 'Funny', 'count': 19645}, {... \n",
577 | "96 1176689220 [{'id': 3, 'name': 'Courageous', 'count': 3236... \n",
578 | "\n",
579 | " related_talks \\\n",
580 | "1787 [{'id': 1308, 'hero': 'https://pe.tedcdn.com/i... \n",
581 | "201 [{'id': 184, 'hero': 'https://pe.tedcdn.com/im... \n",
582 | "644 [{'id': 666, 'hero': 'https://pe.tedcdn.com/im... \n",
583 | "0 [{'id': 865, 'hero': 'https://pe.tedcdn.com/im... \n",
584 | "96 [{'id': 86, 'hero': 'https://pe.tedcdn.com/ima... \n",
585 | "\n",
586 | " speaker_occupation \\\n",
587 | "1787 Philosopher \n",
588 | "201 Neuroanatomist \n",
589 | "644 Neuroscientist, philosopher \n",
590 | "0 Author/educator \n",
591 | "96 Evolutionary biologist \n",
592 | "\n",
593 | " tags \\\n",
594 | "1787 ['brain', 'consciousness', 'neuroscience', 'ph... \n",
595 | "201 ['biology', 'brain', 'consciousness', 'global ... \n",
596 | "644 ['culture', 'evolutionary psychology', 'global... \n",
597 | "0 ['children', 'creativity', 'culture', 'dance',... \n",
598 | "96 ['God', 'atheism', 'culture', 'religion', 'sci... \n",
599 | "\n",
600 | " title \\\n",
601 | "1787 How do you explain consciousness? \n",
602 | "201 My stroke of insight \n",
603 | "644 Science can answer moral questions \n",
604 | "0 Do schools kill creativity? \n",
605 | "96 Militant atheism \n",
606 | "\n",
607 | " url views \n",
608 | "1787 https://www.ted.com/talks/david_chalmers_how_d... 2162764 \n",
609 | "201 https://www.ted.com/talks/jill_bolte_taylor_s_... 21190883 \n",
610 | "644 https://www.ted.com/talks/sam_harris_science_c... 3433437 \n",
611 | "0 https://www.ted.com/talks/ken_robinson_says_sc... 47227110 \n",
612 | "96 https://www.ted.com/talks/richard_dawkins_on_m... 4374792 "
613 | ]
614 | },
615 | "execution_count": 8,
616 | "metadata": {},
617 | "output_type": "execute_result"
618 | }
619 | ],
620 | "source": [
621 | "# sort by the number of first-level comments, though this is biased in favor of older talks\n",
622 | "ted.sort_values('comments').tail()"
623 | ]
624 | },
625 | {
626 | "cell_type": "code",
627 | "execution_count": 9,
628 | "metadata": {},
629 | "outputs": [],
630 | "source": [
631 | "# correct for this bias by calculating the number of comments per view\n",
632 | "ted['comments_per_view'] = ted.comments / ted.views"
633 | ]
634 | },
635 | {
636 | "cell_type": "code",
637 | "execution_count": 10,
638 | "metadata": {},
639 | "outputs": [
640 | {
641 | "data": {
791 | "text/plain": [
792 | " comments description duration \\\n",
793 | "954 2492 Janet Echelman found her true voice as an arti... 566 \n",
794 | "694 1502 Filmmaker Sharmeen Obaid-Chinoy takes on a ter... 489 \n",
795 | "96 6404 Richard Dawkins urges all atheists to openly s... 1750 \n",
796 | "803 834 David Bismark demos a new system for voting th... 422 \n",
797 | "744 649 Hours before New York lawmakers rejected a key... 453 \n",
798 | "\n",
799 | " event film_date languages main_speaker \\\n",
800 | "954 TED2011 1299110400 35 Janet Echelman \n",
801 | "694 TED2010 1265760000 32 Sharmeen Obaid-Chinoy \n",
802 | "96 TED2002 1012608000 42 Richard Dawkins \n",
803 | "803 TEDGlobal 2010 1279065600 36 David Bismark \n",
804 | "744 New York State Senate 1259712000 0 Diane J. Savino \n",
805 | "\n",
806 | " name num_speaker \\\n",
807 | "954 Janet Echelman: Taking imagination seriously 1 \n",
808 | "694 Sharmeen Obaid-Chinoy: Inside a school for sui... 1 \n",
809 | "96 Richard Dawkins: Militant atheism 1 \n",
810 | "803 David Bismark: E-voting without fraud 1 \n",
811 | "744 Diane J. Savino: The case for same-sex marriage 1 \n",
812 | "\n",
813 | " published_date ratings \\\n",
814 | "954 1307489760 [{'id': 23, 'name': 'Jaw-dropping', 'count': 3... \n",
815 | "694 1274865960 [{'id': 23, 'name': 'Jaw-dropping', 'count': 3... \n",
816 | "96 1176689220 [{'id': 3, 'name': 'Courageous', 'count': 3236... \n",
817 | "803 1288685640 [{'id': 25, 'name': 'OK', 'count': 111}, {'id'... \n",
818 | "744 1282062180 [{'id': 25, 'name': 'OK', 'count': 100}, {'id'... \n",
819 | "\n",
820 | " related_talks \\\n",
821 | "954 [{'id': 453, 'hero': 'https://pe.tedcdn.com/im... \n",
822 | "694 [{'id': 171, 'hero': 'https://pe.tedcdn.com/im... \n",
823 | "96 [{'id': 86, 'hero': 'https://pe.tedcdn.com/ima... \n",
824 | "803 [{'id': 803, 'hero': 'https://pe.tedcdn.com/im... \n",
825 | "744 [{'id': 217, 'hero': 'https://pe.tedcdn.com/im... \n",
826 | "\n",
827 | " speaker_occupation \\\n",
828 | "954 Artist \n",
829 | "694 Filmmaker \n",
830 | "96 Evolutionary biologist \n",
831 | "803 Voting system designer \n",
832 | "744 Senator \n",
833 | "\n",
834 | " tags \\\n",
835 | "954 ['art', 'cities', 'culture', 'data', 'design',... \n",
836 | "694 ['TED Fellows', 'children', 'culture', 'film',... \n",
837 | "96 ['God', 'atheism', 'culture', 'religion', 'sci... \n",
838 | "803 ['culture', 'democracy', 'design', 'global iss... \n",
839 | "744 ['God', 'LGBT', 'culture', 'government', 'law'... \n",
840 | "\n",
841 | " title \\\n",
842 | "954 Taking imagination seriously \n",
843 | "694 Inside a school for suicide bombers \n",
844 | "96 Militant atheism \n",
845 | "803 E-voting without fraud \n",
846 | "744 The case for same-sex marriage \n",
847 | "\n",
848 | " url views \\\n",
849 | "954 https://www.ted.com/talks/janet_echelman 1832930 \n",
850 | "694 https://www.ted.com/talks/sharmeen_obaid_chino... 1057238 \n",
851 | "96 https://www.ted.com/talks/richard_dawkins_on_m... 4374792 \n",
852 | "803 https://www.ted.com/talks/david_bismark_e_voti... 543551 \n",
853 | "744 https://www.ted.com/talks/diane_j_savino_the_c... 292395 \n",
854 | "\n",
855 | " comments_per_view \n",
856 | "954 0.001360 \n",
857 | "694 0.001421 \n",
858 | "96 0.001464 \n",
859 | "803 0.001534 \n",
860 | "744 0.002220 "
861 | ]
862 | },
863 | "execution_count": 10,
864 | "metadata": {},
865 | "output_type": "execute_result"
866 | }
867 | ],
868 | "source": [
869 | "# interpretation: for every view of the same-sex marriage talk, there are 0.002 comments\n",
870 | "ted.sort_values('comments_per_view').tail()"
871 | ]
872 | },
873 | {
874 | "cell_type": "code",
875 | "execution_count": 11,
876 | "metadata": {},
877 | "outputs": [],
878 | "source": [
879 | "# make this more interpretable by inverting the calculation\n",
880 | "ted['views_per_comment'] = ted.views / ted.comments"
881 | ]
882 | },
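{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch, assuming the `ted` DataFrame loaded above. Dividing by `comments` gives `inf` for any talk with zero comments, and `NaN` if its views are also zero. It may be worth counting such rows before relying on `views_per_comment`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# sketch: count talks whose views_per_comment would be infinite or undefined\n",
"(ted.comments == 0).sum()"
]
},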
883 | {
884 | "cell_type": "code",
885 | "execution_count": 12,
886 | "metadata": {},
887 | "outputs": [
888 | {
889 | "data": {
1045 | "text/plain": [
1046 | " comments description duration \\\n",
1047 | "744 649 Hours before New York lawmakers rejected a key... 453 \n",
1048 | "803 834 David Bismark demos a new system for voting th... 422 \n",
1049 | "96 6404 Richard Dawkins urges all atheists to openly s... 1750 \n",
1050 | "694 1502 Filmmaker Sharmeen Obaid-Chinoy takes on a ter... 489 \n",
1051 | "954 2492 Janet Echelman found her true voice as an arti... 566 \n",
1052 | "\n",
1053 | " event film_date languages main_speaker \\\n",
1054 | "744 New York State Senate 1259712000 0 Diane J. Savino \n",
1055 | "803 TEDGlobal 2010 1279065600 36 David Bismark \n",
1056 | "96 TED2002 1012608000 42 Richard Dawkins \n",
1057 | "694 TED2010 1265760000 32 Sharmeen Obaid-Chinoy \n",
1058 | "954 TED2011 1299110400 35 Janet Echelman \n",
1059 | "\n",
1060 | " name num_speaker \\\n",
1061 | "744 Diane J. Savino: The case for same-sex marriage 1 \n",
1062 | "803 David Bismark: E-voting without fraud 1 \n",
1063 | "96 Richard Dawkins: Militant atheism 1 \n",
1064 | "694 Sharmeen Obaid-Chinoy: Inside a school for sui... 1 \n",
1065 | "954 Janet Echelman: Taking imagination seriously 1 \n",
1066 | "\n",
1067 | " published_date ratings \\\n",
1068 | "744 1282062180 [{'id': 25, 'name': 'OK', 'count': 100}, {'id'... \n",
1069 | "803 1288685640 [{'id': 25, 'name': 'OK', 'count': 111}, {'id'... \n",
1070 | "96 1176689220 [{'id': 3, 'name': 'Courageous', 'count': 3236... \n",
1071 | "694 1274865960 [{'id': 23, 'name': 'Jaw-dropping', 'count': 3... \n",
1072 | "954 1307489760 [{'id': 23, 'name': 'Jaw-dropping', 'count': 3... \n",
1073 | "\n",
1074 | " related_talks \\\n",
1075 | "744 [{'id': 217, 'hero': 'https://pe.tedcdn.com/im... \n",
1076 | "803 [{'id': 803, 'hero': 'https://pe.tedcdn.com/im... \n",
1077 | "96 [{'id': 86, 'hero': 'https://pe.tedcdn.com/ima... \n",
1078 | "694 [{'id': 171, 'hero': 'https://pe.tedcdn.com/im... \n",
1079 | "954 [{'id': 453, 'hero': 'https://pe.tedcdn.com/im... \n",
1080 | "\n",
1081 | " speaker_occupation \\\n",
1082 | "744 Senator \n",
1083 | "803 Voting system designer \n",
1084 | "96 Evolutionary biologist \n",
1085 | "694 Filmmaker \n",
1086 | "954 Artist \n",
1087 | "\n",
1088 | " tags \\\n",
1089 | "744 ['God', 'LGBT', 'culture', 'government', 'law'... \n",
1090 | "803 ['culture', 'democracy', 'design', 'global iss... \n",
1091 | "96 ['God', 'atheism', 'culture', 'religion', 'sci... \n",
1092 | "694 ['TED Fellows', 'children', 'culture', 'film',... \n",
1093 | "954 ['art', 'cities', 'culture', 'data', 'design',... \n",
1094 | "\n",
1095 | " title \\\n",
1096 | "744 The case for same-sex marriage \n",
1097 | "803 E-voting without fraud \n",
1098 | "96 Militant atheism \n",
1099 | "694 Inside a school for suicide bombers \n",
1100 | "954 Taking imagination seriously \n",
1101 | "\n",
1102 | " url views \\\n",
1103 | "744 https://www.ted.com/talks/diane_j_savino_the_c... 292395 \n",
1104 | "803 https://www.ted.com/talks/david_bismark_e_voti... 543551 \n",
1105 | "96 https://www.ted.com/talks/richard_dawkins_on_m... 4374792 \n",
1106 | "694 https://www.ted.com/talks/sharmeen_obaid_chino... 1057238 \n",
1107 | "954 https://www.ted.com/talks/janet_echelman 1832930 \n",
1108 | "\n",
1109 | " comments_per_view views_per_comment \n",
1110 | "744 0.002220 450.531587 \n",
1111 | "803 0.001534 651.739808 \n",
1112 | "96 0.001464 683.134291 \n",
1113 | "694 0.001421 703.886818 \n",
1114 | "954 0.001360 735.525682 "
1115 | ]
1116 | },
1117 | "execution_count": 12,
1118 | "metadata": {},
1119 | "output_type": "execute_result"
1120 | }
1121 | ],
1122 | "source": [
1123 | "# interpretation: the same-sex marriage talk gets 1 comment for every 450 views\n",
1124 | "ted.sort_values('views_per_comment').head()"
1125 | ]
1126 | },
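{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch, assuming `ted` already has the `views_per_comment` column created above. Keeping only the columns needed for interpretation, and rounding the ratio to whole views, can make the result easier to read."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# sketch: show only the relevant columns and round only the ratio column\n",
"cols = ['name', 'comments', 'views', 'views_per_comment']\n",
"ted.sort_values('views_per_comment')[cols].head().round({'views_per_comment': 0})"
]
},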
1127 | {
1128 | "cell_type": "markdown",
1129 | "metadata": {},
1130 | "source": [
1131 | "Lessons:\n",
1132 | "\n",
1133 | "1. Consider the limitations and biases of your data when analyzing it\n",
1134 | "2. Make your results understandable"
1135 | ]
1136 | },
1137 | {
1138 | "cell_type": "markdown",
1139 | "metadata": {},
1140 | "source": [
1141 | "## 3. Visualize the distribution of comments"
1142 | ]
1143 | },
1144 | {
1145 | "cell_type": "code",
1146 | "execution_count": 13,
1147 | "metadata": {},
1148 | "outputs": [
1149 | {
1150 | "data": {
1151 | "text/plain": [
1152 | ""
1153 | ]
1154 | },
1155 | "execution_count": 13,
1156 | "metadata": {},
1157 | "output_type": "execute_result"
1158 | },
1159 | {
1160 | "data": {
1161 | "image/png": "\n",
1162 | "text/plain": [
1163 | ""
1164 | ]
1165 | },
1166 | "metadata": {
1167 | "needs_background": "light"
1168 | },
1169 | "output_type": "display_data"
1170 | }
1171 | ],
1172 | "source": [
1173 | "# a line plot is not appropriate here (use one to show how a value changes over time)\n",
1174 | "ted.comments.plot()"
1175 | ]
1176 | },
1177 | {
1178 | "cell_type": "code",
1179 | "execution_count": 14,
1180 | "metadata": {},
1181 | "outputs": [
1182 | {
1183 | "data": {
1184 | "text/plain": [
1185 | ""
1186 | ]
1187 | },
1188 | "execution_count": 14,
1189 | "metadata": {},
1190 | "output_type": "execute_result"
1191 | },
1192 | {
1193 | "data": {
1195 | "text/plain": [
1196 | ""
1197 | ]
1198 | },
1199 | "metadata": {
1200 | "needs_background": "light"
1201 | },
1202 | "output_type": "display_data"
1203 | }
1204 | ],
1205 | "source": [
1206 | "# a histogram shows the frequency distribution of a single numeric variable\n",
1207 | "ted.comments.plot(kind='hist')"
1208 | ]
1209 | },
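{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch, assuming the `ted` DataFrame loaded above. The pandas histogram plot accepts a `bins` argument (default 10) that sets how many intervals the values are grouped into. The value 20 below is an arbitrary choice for illustration."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# sketch: more bins give a finer view of the same distribution (20 is arbitrary)\n",
"ted.comments.plot(kind='hist', bins=20)"
]
},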
1210 | {
1211 | "cell_type": "code",
1212 | "execution_count": 15,
1213 | "metadata": {},
1214 | "outputs": [
1215 | {
1216 | "data": {
1217 | "text/plain": [
1218 | ""
1219 | ]
1220 | },
1221 | "execution_count": 15,
1222 | "metadata": {},
1223 | "output_type": "execute_result"
1224 | },
1225 | {
1226 | "data": {
1228 | "text/plain": [
1229 | ""
1230 | ]
1231 | },
1232 | "metadata": {
1233 | "needs_background": "light"
1234 | },
1235 | "output_type": "display_data"
1236 | }
1237 | ],
1238 | "source": [
1239 | "# exclude talks with 1000 or more comments so the rest of the distribution is easier to see\n",
1240 | "ted[ted.comments < 1000].comments.plot(kind='hist')"
1241 | ]
1242 | },
1243 | {
1244 | "cell_type": "code",
1245 | "execution_count": 16,
1246 | "metadata": {},
1247 | "outputs": [
1248 | {
1249 | "data": {
1250 | "text/plain": [
1251 | "(32, 19)"
1252 | ]
1253 | },
1254 | "execution_count": 16,
1255 | "metadata": {},
1256 | "output_type": "execute_result"
1257 | }
1258 | ],
1259 | "source": [
1260 | "# check how many observations we removed from the plot\n",
1261 | "ted[ted.comments >= 1000].shape"
1262 | ]
1263 | },
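{
"cell_type": "markdown",
"metadata": {},
"source": [
"An alternative sketch for the same check, assuming the `ted` DataFrame loaded above. Summing a boolean mask counts the excluded talks directly, because `True` counts as 1. Its mean gives the fraction of all talks that were excluded."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# sketch: sum() counts talks with 1000+ comments, mean() gives their share of all talks\n",
"mask = ted.comments >= 1000\n",
"mask.sum(), mask.mean()"
]
},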
1264 | {
1265 | "cell_type": "code",
1266 | "execution_count": 17,
1267 | "metadata": {},
1268 | "outputs": [
1269 | {
1270 | "data": {
1271 | "text/plain": [
1272 | ""
1273 | ]
1274 | },
1275 | "execution_count": 17,
1276 | "metadata": {},
1277 | "output_type": "execute_result"
1278 | },
1279 | {
1280 | "data": {
1282 | "text/plain": [
1283 | ""
1284 | ]
1285 | },
1286 | "metadata": {
1287 | "needs_background": "light"
1288 | },
1289 | "output_type": "display_data"
1290 | }
1291 | ],
1292 | "source": [
1293 | "# can also write this using the query method\n",
1294 | "ted.query('comments < 1000').comments.plot(kind='hist')"
1295 | ]
1296 | },
1297 | {
1298 | "cell_type": "code",
1299 | "execution_count": 18,
1300 | "metadata": {},
1301 | "outputs": [
1302 | {
1303 | "data": {
1304 | "text/plain": [
1305 | ""
1306 | ]
1307 | },
1308 | "execution_count": 18,
1309 | "metadata": {},
1310 | "output_type": "execute_result"
1311 | },
1312 | {
1313 | "data": {
1314 | "image/png": "iVBORw0KGgoAAAANSUhEUgAAAY4AAAD8CAYAAABgmUMCAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4zLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvnQurowAAExFJREFUeJzt3X+w5XV93/HnS1ZBSJVfiyW7JBfqjtFxkkI3ijFtUzBGIHFJB1odJ27INtuZ0qohM3GxmZK20xmcsSJOO1QCJIu1RkUrW6BhyEqSyR8iizqCIN2NUrgukWtBSESD6Lt/nM+Fw7Lsns/uPfecc+/zMXPmfL+f7+ec7/tzvzu8+P4432+qCkmSRvWiSRcgSZotBockqYvBIUnqYnBIkroYHJKkLgaHJKmLwSFJ6mJwSJK6GBySpC5rJl3AOJx44ok1Nzc36TIkaabcdddd366qtQfrtyKDY25ujl27dk26DEmaKUn+7yj9PFQlSepicEiSuhgckqQuBockqYvBIUnqYnBIkroYHJKkLgaHJKmLwSFJ6rIifzl+uOa23TyR9T5w+XkTWa8k9XCPQ5LUxeCQJHUxOCRJXQwOSVIXg0OS1MXgkCR1MTgkSV0MDklSF4NDktTF4JAkdTE4JEldDA5JUheDQ5LUxeCQJHUZW3AkuS7JI0nuGWo7PsltSXa39+Nae5J8OMmeJF9JcsbQZza3/ruTbB5XvZKk0Yxzj+MPgbfs07YN2FlVG4CdbR7gHGBDe20FroJB0ACXAa8HXgdcthg2kqTJGFtwVNWfA4/u07wJ2N6mtwPnD7VfXwOfB45NcjLwS8BtVfVoVT0G3Mbzw0iStIyW+xzHK6rqYYD2flJrXwc8NNRvvrW9ULskaUKm5eR49tNWB2h//hckW5PsSrJrYWFhSYuTJD1ruYPjW+0QFO39kdY+D5wy1G89sPcA7c9TVVdX1caq2rh27dolL1ySNLDcwbEDWLwyajNw41D7O9vVVWcCj7dDWbcCb05yXDsp/ubWJkmakDXj+uIkHwd+ATgxyTyDq6MuBz6ZZAvwIHBh634LcC6wB3gSuAigqh5N8h+BO1u//1BV+55wlyQto7EFR1W9/QUWnb2fvgVc/ALfcx1w3RKWJkk6DNNyclySNCMMDklSF4NDktTF4JAkdTE4JEldDA5JUheDQ5LUxeCQJHUxOCRJXQwOSVIXg0OS1MXgkCR1MTgkSV0MDklSF4NDktTF4JAkdTE4JEldDA5JUheDQ5LUxeCQJHUxOCRJXQwOSVIXg0OS1MXgkCR1MTgkSV0MDklSF4NDktTF4JAkdTE4JEldJhIcSX4ryVeT3JPk40mOSnJqkjuS7E7yiSQvaX2PbPN72vK5SdQsSRpY9uBIsg54F7Cxql4LHAG8DXg/cEVVbQAeA7a0j2wBHquqVwJXtH6SpAmZ1KGqNcBLk6wBjgYeBs4CbmjLtwPnt+lNbZ62/OwkWcZaJUlDlj04quqbwAeABxkExuPAXcB3qurp1m0eWNem1wEPtc8+3fqfsO/3JtmaZFeSXQsLC+MdhCStYpM4VHUcg72IU4EfB44BztlP11r8yAGWPdtQdXVVbayqjWvXrl2qciVJ+5jEoao3Ad+oqoWq+gHwGeDngGPboSuA9cDeNj0PnALQlr8ceHR5S5YkLZpEcDwInJnk6Hau4mzgXuB24ILWZzNwY5ve0eZpyz9XVc/b45AkLY9JnOO4g8FJ7i8Cd7cargbeC1ySZA+DcxjXto9cC5zQ2i8Bti13zZKkZ605eJelV1WXAZft0/x14HX76ft94MLlqEuSdHD+clyS1MXgkCR1MTgkSV0MDklSF4NDktTF4JAkdTE4JEldDA5JUheDQ5LUxeCQJHUxOCRJXQwOSVKXkW5ymOS1VXXPuItZ7ea23TyR9T5w+XkTWa+k2TTqHsd/S/KFJP8qybFjrUiSNNVGCo6
q+nngHQyexLcryf9I8otjrUySNJVGPsdRVbuB32XwwKV/DHw4ydeS/NNxFSdJmj4jBUeSn05yBXAfcBbwK1X16jZ9xRjrkyRNmVGfAPhfgN8H3ldV31tsrKq9SX53LJVJkqbSqMFxLvC9qvohQJIXAUdV1ZNV9dGxVSdJmjqjnuP4E+ClQ/NHtzZJ0iozanAcVVV/szjTpo8eT0mSpGk2anB8N8kZizNJ/gHwvQP0lyStUKOe43gP8Kkke9v8ycA/H09JkqRpNlJwVNWdSX4KeBUQ4GtV9YOxViZJmkqj7nEA/Cww1z5zehKq6vqxVCVJmlqj3uTwo8DfA74M/LA1F2BwSNIqM+oex0bgNVVV4yxGkjT9Rr2q6h7g746zEEnSbBh1j+NE4N4kXwD+drGxqt56KCttt2a/Bngtg0NevwHcD3yCwXmUB4B/VlWPJQlwJYNfrz8J/HpVffFQ1itJOnyjBsfvLfF6rwT+uKouSPISBj8mfB+ws6ouT7IN2MbgTrznABva6/XAVe1dkjQBoz6P488Y7AW8uE3fCRzS//UneRnwj4Br23c/VVXfATYB21u37cD5bXoTcH0NfB44NsnJh7JuSdLhG/W26r8J3AB8pDWtAz57iOs8DVgA/iDJl5Jck+QY4BVV9TBAez9paF0PDX1+vrVJkiZg1JPjFwNvBJ6AZx7qdNIBP/HC1gBnAFdV1enAdxkclnoh2U/b867uSrI1ya4kuxYWFg6xNEnSwYwaHH9bVU8tziRZw37+4z2ieWC+qu5o8zcwCJJvLR6Cau+PDPU/Zejz64G97KOqrq6qjVW1ce3atYdYmiTpYEYNjj9L8j7gpe1Z458C/tehrLCq/gp4KMmrWtPZwL3ADmBza9sM3NimdwDvzMCZwOOLh7QkSctv1KuqtgFbgLuBfwncwuBy2kP1b4CPtSuqvg5cxCDEPplkC/AgcGHrewuDS3H3MLgc96LDWK8k6TCNepPDHzF4dOzvL8VKq+rLDH6Nvq+z99O3GJxjkSRNgVHvVfUN9nNOo6pOW/KKJElTredeVYuOYnAY6filL0eSNO1G/QHg/xt6fbOqPgScNebaJElTaNRDVWcMzb6IwR7I3xlLRZKkqTbqoar/PDT9NO0mhEtejSRp6o16VdU/GXchkqTZMOqhqksOtLyqPrg05UiSpl3PVVU/y+BX3AC/Avw5z735oCRpFeh5kNMZVfXXAEl+D/hUVf2LcRUmSZpOo96r6ieAp4bmn2LwpD5J0ioz6h7HR4EvJPmfDH5B/qvA9WOrSpI0tUa9quo/JfnfwD9sTRdV1ZfGV5YkaVqNeqgKBs8Ff6KqrgTmk5w6ppokSVNs1EfHXga8F7i0Nb0Y+O/jKkqSNL1G3eP4VeCtDB7zSlXtxVuOSNKqNGpwPNWei1EASY4ZX0mSpGk2anB8MslHgGOT/CbwJyzRQ50kSbNl1KuqPtCeNf4E8Crg31XVbWOtTJI0lQ4aHEmOAG6tqjcBhoUkrXIHPVRVVT8Enkzy8mWoR5I05Ub95fj3gbuT3Ea7sgqgqt41lqokSVNr1OC4ub0kSavcAYMjyU9U1YNVtX25CpIkTbeDneP47OJEkk+PuRZJ0gw4WHBkaPq0cRYiSZoNBwuOeoFpSdIqdbCT4z+T5AkGex4vbdO0+aqql421OknS1DlgcFTVEctViCRpNvQ8j0OSpMkFR5IjknwpyU1t/tQkdyTZneQTSV7S2o9s83va8rlJ1SxJmuwex7uB+4bm3w9cUVUbgMeALa19C/BYVb0SuKL1kyRNyESCI8l64DzgmjYf4CzghtZlO3B+m97U5mnLz279JUkTMKk9jg8BvwP8qM2fAHynqp5u8/PAuja9DngIoC1/vPWXJE3AsgdHkl8GHqmqu4ab99O1Rlg2/L1bk+xKsmthYWEJKpUk7c8k9jjeCLw1yQPAHzE4RPUhBk8XXLw8eD2wt03PA6cAtOUvBx7d90ur6uqq2lhVG9euXTv
eEUjSKrbswVFVl1bV+qqaA94GfK6q3gHcDlzQum0GbmzTO9o8bfnn2vPPJUkTME2/43gvcEmSPQzOYVzb2q8FTmjtlwDbJlSfJInRn8cxFlX1p8CftumvA6/bT5/vAxcua2GSpBc0TXsckqQZYHBIkroYHJKkLgaHJKmLwSFJ6mJwSJK6GBySpC4T/R2HpsPctpsntu4HLj9vYuuWdGjc45AkdTE4JEldDA5JUheDQ5LUxeCQJHUxOCRJXQwOSVIXg0OS1MXgkCR1MTgkSV0MDklSF4NDktTF4JAkdTE4JEldDA5JUheDQ5LUxeCQJHUxOCRJXQwOSVIXg0OS1MXgkCR1WfbgSHJKktuT3Jfkq0ne3dqPT3Jbkt3t/bjWniQfTrInyVeSnLHcNUuSnjWJPY6ngd+uqlcDZwIXJ3kNsA3YWVUbgJ1tHuAcYEN7bQWuWv6SJUmLlj04qurhqvpim/5r4D5gHbAJ2N66bQfOb9ObgOtr4PPAsUlOXuayJUnNRM9xJJkDTgfuAF5RVQ/DIFyAk1q3dcBDQx+bb237ftfWJLuS7FpYWBhn2ZK0qk0sOJL8GPBp4D1V9cSBuu6nrZ7XUHV1VW2sqo1r165dqjIlSfuYSHAkeTGD0PhYVX2mNX9r8RBUe3+ktc8Dpwx9fD2wd7lqlSQ91ySuqgpwLXBfVX1waNEOYHOb3gzcONT+znZ11ZnA44uHtCRJy2/NBNb5RuDXgLuTfLm1vQ+4HPhkki3Ag8CFbdktwLnAHuBJ4KLlLVeSNGzZg6Oq/oL9n7cAOHs//Qu4eKxFSZJG5i/HJUldDA5JUpdJnOOQnjG37eaJrPeBy8+byHqllcA9DklSF4NDktTF4JAkdTE4JEldDA5JUheDQ5LUxeCQJHUxOCRJXQwOSVIXg0OS1MXgkCR1MTgkSV0MDklSF4NDktTF26prVZrU7dzBW7pr9rnHIUnqYnBIkroYHJKkLp7jkJaZj8vVrHOPQ5LUxeCQJHUxOCRJXQwOSVIXT45Lq4Q/etRScY9DktRlZvY4krwFuBI4Arimqi6fcEmSRuQlyCvLTARHkiOA/wr8IjAP3JlkR1XdO9nKJE2zSR6em5TlCMtZOVT1OmBPVX29qp4C/gjYNOGaJGlVmpXgWAc8NDQ/39okSctsJg5VAdlPWz2nQ7IV2Npm/ybJ/Ye4rhOBbx/iZ2fVahwzrM5xr8Yxwyoad97/zOShjPknR+k0K8ExD5wyNL8e2DvcoaquBq4+3BUl2VVVGw/3e2bJahwzrM5xr8Yxw+oc9zjHPCuHqu4ENiQ5NclLgLcBOyZckyStSjOxx1FVTyf518CtDC7Hva6qvjrhsiRpVZqJ4ACoqluAW5ZhVYd9uGsGrcYxw+oc92ocM6zOcY9tzKmqg/eSJKmZlXMckqQpYXA0Sd6S5P4ke5Jsm3Q9SynJKUluT3Jfkq8meXdrPz7JbUl2t/fjWnuSfLj9Lb6S5IzJjuDQJTkiyZeS3NTmT01yRxvzJ9rFFiQ5ss3vacvnJln34UhybJIbknytbfM3rPRtneS32r/te5J8PMlRK3FbJ7kuySNJ7hlq6962STa3/ruTbO6tw+DgObc0OQd4DfD2JK+ZbFVL6mngt6vq1cCZwMVtfNuAnVW1AdjZ5mHwd9jQXluBq5a/5CXzbuC+ofn3A1e0MT8GbGntW4DHquqVwBWt36y6Evjjqvop4GcYjH/Fbusk64B3ARur6rUMLqB5GytzW/8h8JZ92rq2bZLjgcuA1zO4K8dli2Ezsqpa9S/gDcCtQ/OXApdOuq4xjvdGBvf9uh84ubWdDNzfpj8CvH2o/zP9ZunF4Pc+O4GzgJsY/JD028Cafbc7gyv23tCm17R+mfQYDmHMLwO+sW/tK3lb8+ydJY5v2+4m4JdW6rYG5oB7DnXbAm8HPjLU/px+o7zc4xhYNbc0abvlpwN3AK+oqocB2vtJrdtK+Xt8CPgd4Edt/gTgO1X1dJsfHtczY27
LH2/9Z81pwALwB+0Q3TVJjmEFb+uq+ibwAeBB4GEG2+4uVv62XtS7bQ97mxscAwe9pclKkOTHgE8D76mqJw7UdT9tM/X3SPLLwCNVdddw83661gjLZska4Azgqqo6Hfguzx662J+ZH3c7zLIJOBX4ceAYBodp9rXStvXBvNA4D3v8BsfAQW9pMuuSvJhBaHysqj7Tmr+V5OS2/GTgkda+Ev4ebwTemuQBBndTPovBHsixSRZ/vzQ8rmfG3Ja/HHh0OQteIvPAfFXd0eZvYBAkK3lbvwn4RlUtVNUPgM8AP8fK39aLerftYW9zg2NgRd/SJEmAa4H7quqDQ4t2AItXVGxmcO5jsf2d7aqMM4HHF3eFZ0VVXVpV66tqjsH2/FxVvQO4Hbigddt3zIt/iwta/5n7v9Cq+ivgoSSvak1nA/eygrc1g0NUZyY5uv1bXxzzit7WQ3q37a3Am5Mc1/bW3tzaRjfpEz3T8gLOBf4P8JfAv510PUs8tp9nsCv6FeDL7XUug+O6O4Hd7f341j8MrjL7S+BuBlerTHwchzH+XwBuatOnAV8A9gCfAo5s7Ue1+T1t+WmTrvswxvv3gV1te38WOG6lb2vg3wNfA+4BPgocuRK3NfBxBudxfsBgz2HLoWxb4Dfa+PcAF/XW4S/HJUldPFQlSepicEiSuhgckqQuBockqYvBIUnqYnBIkroYHJKkLgaHJKnL/wfyZ+IhUPrvsAAAAABJRU5ErkJggg==\n",
1315 | "text/plain": [
1316 | ""
1317 | ]
1318 | },
1319 | "metadata": {
1320 | "needs_background": "light"
1321 | },
1322 | "output_type": "display_data"
1323 | }
1324 | ],
1325 | "source": [
1326 | "# can also write this using the loc accessor\n",
1327 | "ted.loc[ted.comments < 1000, 'comments'].plot(kind='hist')"
1328 | ]
1329 | },
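{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `query` and `loc` cells above should select exactly the same rows. As a quick check, here is a minimal sketch on a small hypothetical DataFrame (`demo` and its values are made up, standing in for `ted`). It compares the two results with `pd.testing.assert_series_equal`:\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# hypothetical stand-in for the ted DataFrame\n",
"demo = pd.DataFrame({'comments': [50, 1200, 300, 999, 6404]})\n",
"\n",
"# filter rows with the query method, then select the column\n",
"via_query = demo.query('comments < 1000').comments\n",
"\n",
"# filter rows and select the column in one step with loc\n",
"via_loc = demo.loc[demo.comments < 1000, 'comments']\n",
"\n",
"# raises an AssertionError if the values, index, or name differ\n",
"pd.testing.assert_series_equal(via_query, via_loc)\n",
"print(via_loc.tolist())  # [50, 300, 999]\n",
"```"
]
},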
1330 | {
1331 | "cell_type": "code",
1332 | "execution_count": 19,
1333 | "metadata": {},
1334 | "outputs": [
1335 | {
1336 | "data": {
1337 | "text/plain": [
1338 | ""
1339 | ]
1340 | },
1341 | "execution_count": 19,
1342 | "metadata": {},
1343 | "output_type": "execute_result"
1344 | },
1345 | {
1346 | "data": {
1347 | "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYgAAAD8CAYAAABthzNFAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4zLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvnQurowAAE/NJREFUeJzt3X+w5XV93/HnS1AQEl2QC93uLrlQd4xMpsr2VteStgaM5Ufqmg40Gqfu0G226ZBGa2biYjM1mWlncCYVZJIhEjEuVEXEH2yBanDFZDpT0WWggILdFSl7s4RdIyxRVETf/eN8Lp4u3909d/d+77k/no+ZM+f7/Xw/33Pen/vd4cX350lVIUnSgV4w7gIkSQuTASFJ6mRASJI6GRCSpE4GhCSpkwEhSepkQEiSOhkQkqROBoQkqdOx4y7gaJxyyik1OTk57jIkaVG5++67v11VE4frt6gDYnJykh07doy7DElaVJL831H6eYhJktTJgJAkdTIgJEmdDAhJUicDQpLUyYCQJHUyICRJnQwISVKnXgMiyYokNyd5KMmDSV6X5OQkdyTZ2d5Pan2T5Ooku5Lcl2Rdn7VJkg6t7zupPwB8rqouTvIi4ATgPcD2qroiyRZgC/Bu4AJgbXu9FrimvS9Ik1tuO+J1H7niojmsRJL60dseRJKXAP8EuA6gqp6pqieBDcDW1m0r8OY2vQG4vga+DKxIsrKv+iRJh9bnIaYzgX3AnyW5J8mHkpwInFZVjwG091Nb/1XA7qH1p1ubJGkM+gyIY4F1wDVVdTbwPQaHkw4mHW31vE7J5iQ7kuzYt2/f3FQqSXqePgNiGpiuqrva/M0MAuPxmUNH7X3vUP81Q+uvBvYc+KFVdW1VTVXV1MTEYZ9WK0k6Qr0FRFX9NbA7ySta03nA14FtwMbWthG4pU1vA97ermZaD+yfORQlSZp/fV/F9O+Bj7YrmB4GLmUQSjcl2QQ8ClzS+t4OXAjsAp5ufSVJY9JrQFTVvcBUx6LzOvoWcFmf9UiSRued1JKkTgaEJKmTASFJ6mRASJI6GRCSpE4GhCSpkwEhSepkQEiSOhkQkqROBoQkqZMBIUnqZEBIkjoZEJKkTgaEJKmTASFJ6mRASJI6GRCSpE4GhCSpkwEhSepkQEiSOhkQkqROBoQkqZMBIUnqZEBIkjoZEJKkTr0GRJJHktyf5N4kO1rbyUnuSLKzvZ/U2pPk6iS7ktyXZF2ftUmSDm0+9iB+qapeXVVTbX4LsL2q1gLb2zzABcDa9toMXDMPtUmSDmIch5g2AFvb9FbgzUPt19fAl4EVSVaOoT5JEv0HRAF/nuTuJJtb22lV9RhAez+1ta8Cdg+tO93aJEljcGzPn39OVe1JcipwR5KHDtE3HW31vE6DoNkMcPrpp89NlZKk5+l1D6Kq9rT3vcBngNcAj88cOmrve1v3aWDN0OqrgT0dn3ltVU1V1dTExESf5UvSstZbQCQ5McnPzkwDbwQeALYBG1u3jcAtbXob8PZ2NdN6YP/MoShJ0vzr8xDTacBnksx8z8eq6nNJvgrclGQT8ChwSet/O3AhsAt4Gri0x9qY3HJbnx8vSYtebwFRVQ8Dr+po/xvgvI72Ai7rqx5J0ux4J7UkqZMBIUnqZEBIkjoZEJKkTgaEJKmTASFJ6mRASJI6GRCSpE4GhCSpkwEhSepkQEiSOhkQkqROBoQkqZMBIUnq1PdPjqrD0fwWxSNXXDSHlUjSwbkHIUnqZEBIkjoZEJKkTgaEJKmTASFJ6mRASJI6GRCSpE4GhCSpkwEhSepkQEiSOvUeEEmOSXJPklvb/BlJ7kqyM8knkryotR/X5ne15ZN91yZJOrj52IN4B/Dg0Pz7gCurai3wBLCptW8CnqiqlwNXtn6SpDHpNSCSrAYuAj7U5gOcC9zcumwF3tymN7R52vLzWn9J0hj0vQdxFfC7wE/a/Mu
AJ6vq2TY/Daxq06uA3QBt+f7WX5I0Br0FRJJfAfZW1d3DzR1da4Rlw5+7OcmOJDv27ds3B5VKkrr0uQdxDvCmJI8ANzI4tHQVsCLJzO9QrAb2tOlpYA1AW/5S4DsHfmhVXVtVU1U1NTEx0WP5krS89RYQVXV5Va2uqkngLcAXq+ptwJ3Axa3bRuCWNr2tzdOWf7GqnrcHIUmaHyMFRJJfmMPvfDfwriS7GJxjuK61Xwe8rLW/C9gyh98pSZqlUX9y9E/a/QofAT5WVU/O5kuq6kvAl9r0w8BrOvr8ALhkNp8rSerPSHsQVfWLwNsYnCPYkeRjSX6518okSWM18jmIqtoJ/B6DQ0T/FLg6yUNJ/kVfxUmSxmfUcxB/P8mVDO6IPhf451X1yjZ9ZY/1SZLGZNRzEH8E/Cnwnqr6/kxjVe1J8nu9VCZJGqtRA+JC4PtV9WOAJC8Ajq+qp6vqht6qkySNzajnIL4AvHho/oTWJklaokYNiOOr6rszM236hH5KkiQtBKMGxPeSrJuZSfIPgO8for8kaZEb9RzEO4FPJpl5btJK4Nf6KUmStBCMFBBV9dUkPw+8gsFTVx+qqh/1WpkkaaxG3YMA+IfAZFvn7CRU1fW9VCVJGruRAiLJDcDfA+4FftyaCzAgJGmJGnUPYgo4y8dvS9LyMepVTA8Af6fPQiRJC8uoexCnAF9P8hXghzONVfWmXqqSJI3dqAHx+30WIUlaeEa9zPUvkvwcsLaqvpDkBOCYfkuTJI3TqI/7/g3gZuCDrWkV8Nm+ipIkjd+oJ6kvA84BnoLnfjzo1L6KkiSN36gB8cOqemZmJsmxDO6DkCQtUaMGxF8keQ/w4vZb1J8E/nt/ZUmSxm3UgNgC7APuB/4tcDuD36eWJC1Ro17F9BMGPzn6p/2WI0laKEZ9FtO36DjnUFVnznlFkqQFYTbPYppxPHAJcPLclyNJWihGOgdRVX8z9PqrqroKOPdQ6yQ5PslXkvzvJF9L8get/YwkdyXZmeQTSV7U2o9r87va8smjHJsk6SiMeqPcuqHXVJLfBH72MKv9EDi3ql4FvBo4P8l64H3AlVW1FngC2NT6bwKeqKqXA1e2fpKkMRn1ENN/HZp+FngE+JeHWqE9Gvy7bfaF7VUM9jx+vbVvZfCcp2uADfz0mU83A3+UJD5iXJLGY9SrmH7pSD48yTHA3cDLgT8Gvgk8WVXPti7TDB7bQXvf3b7v2ST7gZcB3z6S75YkHZ1Rr2J616GWV9X7D9L+Y+DVSVYAnwFe2dVt5msOsWy4ls3AZoDTTz/9UGVJko7CqDfKTQH/jsH/5a8CfhM4i8F5iMOdi6CqngS+BKwHVrRHdQCsBva06WlgDTz3KI+XAt/p+Kxrq2qqqqYmJiZGLF+SNFuz+cGgdVX1twBJfh/4ZFX9m4OtkGQC+FFVPZnkxcAbGJx4vhO4GLgR2Ajc0lbZ1ub/V1v+Rc8/SNL4jBoQpwPPDM0/A0weZp2VwNZ2HuIFwE1VdWuSrwM3JvnPwD3Ada3/dcANSXYx2HN4y4i1SZJ6MGpA3AB8JclnGJwX+FXg+kOtUFX3AWd3tD8MvKaj/QcMbsCTJC0Ao17F9F+S/A/gH7emS6vqnv7KkiSN26gnqQFOAJ6qqg8A00nO6KkmSdICMOqd1O8F3g1c3ppeCPy3voqSJI3fqHsQvwq8CfgeQFXtYYTLWyVJi9eoAfFMu+S0AJKc2F9JkqSFYNSAuCnJBxnc5PYbwBfwx4MkaUkb9SqmP2y/Rf0U8ArgP1XVHb1WJkkaq8MGRLvR7fNV9QbAUJCkZeKwAVFVP07ydJKXVtX++ShKBze55bYjXveRKy6aw0okLXWj3kn9A+D+JHfQrmQCqKrf7qUqSdLYjRoQt7WXJGmZOGRAJDm9qh6tqq3zVZAkaWE43GWun52ZSPKpnmuRJC0ghwuI4V95O7PPQiRJC8vhAqIOMi1JWuIOd5L6VUmeYrAn8eI2TZuvqnpJr9V
JksbmkAFRVcfMVyGSpIVlNr8HIUlaRgwISVInA0KS1MmAkCR1MiAkSZ0MCElSJwNCktTJgJAkdeotIJKsSXJnkgeTfC3JO1r7yUnuSLKzvZ/U2pPk6iS7ktyXZF1ftUmSDq/PPYhngd+pqlcC64HLkpwFbAG2V9VaYHubB7gAWNtem4FreqxNknQYo/5g0KxV1WPAY236b5M8CKwCNgCvb922Al8C3t3ar6+qAr6cZEWSle1zNAf8uVJJszEv5yCSTAJnA3cBp838R7+9n9q6rQJ2D6023dokSWPQe0Ak+RngU8A7q+qpQ3XtaHveI8aTbE6yI8mOffv2zVWZkqQD9BoQSV7IIBw+WlWfbs2PJ1nZlq8E9rb2aWDN0OqrgT0HfmZVXVtVU1U1NTEx0V/xkrTM9XkVU4DrgAer6v1Di7YBG9v0RuCWofa3t6uZ1gP7Pf8gSePT20lq4BzgXwH3J7m3tb0HuAK4Kckm4FHgkrbsduBCYBfwNHBpj7VJkg6jz6uY/ifd5xUAzuvoX8BlfdUjSZod76SWJHUyICRJnQwISVInA0KS1MmAkCR1MiAkSZ0MCElSJwNCktTJgJAkdTIgJEmdDAhJUicDQpLUyYCQJHUyICRJnQwISVInA0KS1MmAkCR1MiAkSZ0MCElSJwNCktTp2HEXoMVhcsttR7X+I1dcNEeVSJov7kFIkjoZEJKkTgaEJKmTASFJ6tRbQCT5cJK9SR4Yajs5yR1Jdrb3k1p7klydZFeS+5Ks66suSdJo+tyD+Ahw/gFtW4DtVbUW2N7mAS4A1rbXZuCaHuuSJI2gt4Coqr8EvnNA8wZga5veCrx5qP36GvgysCLJyr5qkyQd3nyfgzitqh4DaO+ntvZVwO6hftOtTZI0JgvlJHU62qqzY7I5yY4kO/bt29dzWZK0fM13QDw+c+iove9t7dPAmqF+q4E9XR9QVddW1VRVTU1MTPRarCQtZ/P9qI1twEbgivZ+y1D7byW5EXgtsH/mUJSWhqN5VIeP6ZDGo7eASPJx4PXAKUmmgfcyCIabkmwCHgUuad1vBy4EdgFPA5f2VZckaTS9BURVvfUgi87r6FvAZX3VIkmavYVyklqStMAYEJKkTgaEJKmTASFJ6mRASJI6GRCSpE4GhCSp03zfSS3NmndhS+PhHoQkqZMBIUnqZEBIkjoZEJKkTgaEJKmTASFJ6uRlrlrSvERWOnLuQUiSOhkQkqROHmKSDsLDU1ruDAhpATKctBB4iEmS1MmAkCR18hCT1IOjOUQkLRTuQUiSOrkHIS0x49p78eT40mNASFoQvHJr4VlQAZHkfOADwDHAh6rqijGXJGkRGFe4LPVQWzABkeQY4I+BXwamga8m2VZVXx9vZZI09472UOB8BMyCCQjgNcCuqnoYIMmNwAbAgJAWgcV65dZirXs+LKSrmFYBu4fmp1ubJGkMFtIeRDra6nmdks3A5jb73STfOMLvOwX49hGuu1gtxzHD8hz3chwzLKNx533PTR7JmH9ulE4LKSCmgTVD86uBPQd2qqprgWuP9suS7KiqqaP9nMVkOY4Zlue4l+OYYXmOu88xL6RDTF8F1iY5I8mLgLcA28ZckyQtWwtmD6Kqnk3yW8DnGVzm+uGq+tqYy5KkZWvBBARAVd0O3D5PX3fUh6kWoeU4Zlie416OY4blOe7expyq550HliRpQZ2DkCQtIMsuIJKcn+QbSXYl2TLueuZKkjVJ7kzyYJKvJXlHaz85yR1Jdrb3k1p7klzd/g73JVk33hEcnSTHJLknya1t/owkd7Vxf6Jd+ECS49r8rrZ8cpx1H6kkK5LcnOShts1ftxy2dZL/0P59P5Dk40mOX4rbOsmHk+xN8sBQ26y3b5KNrf/OJBtnW8eyCoihx3lcAJwFvDXJWeOtas48C/xOVb0SWA9c1sa2BdheVWuB7W0eBn+Dte21Gbhm/kueU+8AHhyafx9wZRv3E8Cm1r4JeKKqXg5c2fotRh8
APldVPw+8isHYl/S2TrIK+G1gqqp+gcHFLG9haW7rjwDnH9A2q+2b5GTgvcBrGTyp4r0zoTKyqlo2L+B1wOeH5i8HLh93XT2N9RYGz7X6BrCyta0EvtGmPwi8daj/c/0W24vBPTPbgXOBWxncdPlt4NgDtzuDq+Re16aPbf0y7jHMcrwvAb51YN1LfVvz06ctnNy23a3AP1uq2xqYBB440u0LvBX44FD7/9dvlNey2oNgmTzOo+1Knw3cBZxWVY8BtPdTW7el9Le4Cvhd4Cdt/mXAk1X1bJsfHttz427L97f+i8mZwD7gz9phtQ8lOZElvq2r6q+APwQeBR5jsO3uZmlv62Gz3b5Hvd2XW0CM9DiPxSzJzwCfAt5ZVU8dqmtH26L7WyT5FWBvVd093NzRtUZYtlgcC6wDrqmqs4Hv8dPDDV2Wwphph0c2AGcAfxc4kcHhlQMtpW09ioON86jHv9wCYqTHeSxWSV7IIBw+WlWfbs2PJ1nZlq8E9rb2pfK3OAd4U5JHgBsZHGa6CliRZOY+n+GxPTfutvylwHfms+A5MA1MV9Vdbf5mBoGx1Lf1G4BvVdW+qvoR8GngH7G0t/Ww2W7fo97uyy0gluzjPJIEuA54sKreP7RoGzBz9cJGBucmZtrf3q6AWA/sn9l9XUyq6vKqWl1Vkwy25xer6m3AncDFrduB4575e1zc+i+q/6usqr8Gdid5RWs6j8Fj8Zf0tmZwaGl9khPav/eZcS/ZbX2A2W7fzwNvTHJS2/t6Y2sb3bhPxIzhxM+FwP8Bvgn8x3HXM4fj+kUGu4/3Afe214UMjrluB3a295Nb/zC4ouubwP0MrgwZ+ziO8m/weuDWNn0m8BVgF/BJ4LjWfnyb39WWnznuuo9wrK8GdrTt/VngpOWwrYE/AB4CHgBuAI5bitsa+DiD8yw/YrAnsOlIti/wr9v4dwGXzrYO76SWJHVaboeYJEkjMiAkSZ0MCElSJwNCktTJgJAkdTIgJEmdDAhJUicDQpLU6f8BGHUhs1TTeTsAAAAASUVORK5CYII=\n",
1348 | "text/plain": [
1349 | ""
1350 | ]
1351 | },
1352 | "metadata": {
1353 | "needs_background": "light"
1354 | },
1355 | "output_type": "display_data"
1356 | }
1357 | ],
1358 | "source": [
1359 | "# increase the number of bins to see more detail\n",
1360 | "ted.loc[ted.comments < 1000, 'comments'].plot(kind='hist', bins=20)"
1361 | ]
1362 | },
1363 | {
1364 | "cell_type": "code",
1365 | "execution_count": 20,
1366 | "metadata": {},
1367 | "outputs": [
1368 | {
1369 | "data": {
1370 | "text/plain": [
1371 | ""
1372 | ]
1373 | },
1374 | "execution_count": 20,
1375 | "metadata": {},
1376 | "output_type": "execute_result"
1377 | },
1378 | {
1379 | "data": {
1380 | "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYAAAAD8CAYAAAB+UHOxAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4zLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvnQurowAAFC5JREFUeJzt3X+QXWV9x/H3d+9uspCQEGClMZu4dEhbCsUf3fFH7bSOVCJaCH/Ujo5TgjKmOIJVmdbQacehnQDSDladihMlGjtW4mBboNImKcZqp5WaIBUllWT4kayhsJBLJLkh2WW//WPPxl2yJLn3ZvfuzXm/Zu6cc5773Hu+m8nu5z7PPT8iM5EklU9HqwuQJLWGASBJJWUASFJJGQCSVFIGgCSVlAEgSSVlAEhSSRkAklRSBoAklVRnqws4mrPOOiv7+vpaXYYktZWtW7c+k5k9x+o3owOgr6+PLVu2tLoMSWorEfHE8fRzCkiSSsoAkKSSMgAkqaSOGQARsTYino6IH41rOyMiNkXE9mK5oGiPiPhMROyIiB9GxOvGvWZF0X97RKyYmh9HknS8jmcE8GXg7S9pWwXcl5lLgfuKbYBLgKXFYyVwG4wGBvAJ4A3A64FPjIWGJKk1jhkAmfkdYM9LmpcD64r1dcDl49q/kqO+B5weEQuBZcCmzNyTmVVgE0eGitQWli1bRkdHBxFBR0cHy5Yta3VJUkMa/Q7g7Mx8EqBYvqJoXwTsGtdvoGh7uXaprSxbtoyNGzdy9dVX89xzz3H11VezceNGQ0Bt6USfBxCTtOVR2o98g4iVjE4fsWTJkhNXmXQCbNq0iQ9+8IN87nOfAzi8/PznP9/KsqSGNDoCeKqY2qFYPl20DwCLx/XrBXYfpf0ImbkmM/szs7+n55gnsknTKjO56aabJrTddNNNeG9ttaNGA+BuYOxInhXAXeParyiOBnojsLeYItoAXBwRC4ovfy8u2qS2EhFcf/31E9quv/56IiYb5Eoz2zGngCLia8BbgLMiYoDRo3luBr4eEVcBO4F3Fd3vBd4B7ABqwPsAMnNPRPwl8P2i319k5ku/WJZmvLe97W3cdttt3HHHHVSrVRYsWEC1WuXiiy9udWlS3Y4ZAJn5npd56qJJ+ibwoZd5n7XA2rqqk2aYK6+8ku985ztUq1UAqtUq3d3dXHnlla0tTGqAZwJLdVi9ejXXXXcd559/Ph0dHZx//vlcd911rF69utWlSXWb0VcDlWaahx9+mKeffpo5c+YAsH//ftasWcMzzzzT4sqk+jkCkOpQqVQ4cOAAwOEjfw4cOEClUmllWVJDDACpDsPDw9RqNa699lr27dvHtddeS61WY3h4uNWlSXWLmXz8cn9/f3pDGM0kEcHSpUvZsWMHmUlEcO6557J9+3bPBdCMERFbM7P/WP0cAUh12r59+4RLQWzfvr3VJUkNcQQg1SEi6OrqAmBoaGjC+kz+XVK5HO8IwKOApDoNDQ1Nui61G6eAJKmkDACpAZdddhmDg4NcdtllrS5FaphTQFKdzjvvPDZs2EBPTw+zZ8/mvPPOY9u2ba0uS6qbIwCpTk888QQLFy6ko6ODhQsX8sQTT7S6JKkhjgCkOkQEtVqNnTt3MjIycnjp5aDVjhwBSHU49dRTARgZGZmwHGuX2okjAKkO+/fv55RTTmF4ePjweQCdnZ3s37+/1aVJdXMEINXphhtu4NChQ2Qmhw4d4oYbbmh1SVJDDACpTrfeeiubN29maGiIzZs3c+utt7a6JKkhTgFJdejt7eX555/n/e9/Pzt37mTJkiUcOHCA3t7eVpcm1c0RgFSHW265hZGREX76059OWN5yyy2tLk2qmwEg1am7u5tFixYRESxatIju7u5WlyQ1xACQ6rB69WrWr1/PY489xsjICI899hjr16/3nsBqSwaAVIdt27YxMDDABRdcQKVS4YILLmBgYMB
LQagteT8AqQ6LFy/mqaeemnAZ6K6uLs4++2x27drVwsqkn/OOYNIUGBwcZGhoiLlz5xIRzJ07l6GhIQYHB1tdmlQ3DwOV6nDw4EEign379gGwb98+IoKDBw+2uDKpfo4ApDq9dNp0Jk+jSkdjAEgN8IYwOhk4BSQ14Lvf/S49PT0sWLCg1aVIDXMEIDWgWq1OWErtyACQGjB37twJS6kdGQBSHebMmQMw4Sig8e1SO2kqACLioxHx44j4UUR8LSK6I+KciLg/IrZHxPqImFX0nV1s7yie7zsRP4A0nWq1Gl1dXRPaurq6qNVqLapIalzDARARi4APA/2ZeQFQAd4NfBL4VGYuBarAVcVLrgKqmXku8Kmin9RWKpUKs2bNoq+vj46ODvr6+pg1axaVSqXVpUl1a3YKqBM4JSI6gVOBJ4G3AncWz68DLi/WlxfbFM9fFN5JW21meHiYWq3Grl27GBkZYdeuXdRqNYaHh1tdmlS3hgMgM38K/DWwk9E//HuBrcBzmTn22zAALCrWFwG7itcOF/3PbHT/UqtkJmOfXSLCE8HUtpqZAlrA6Kf6c4BXAnOASybpOvbbMdmn/SN+cyJiZURsiYgtXl9FM9XYH33/+KudNTMF9DvAY5k5mJlDwD8AvwGcXkwJAfQCu4v1AWAxQPH8fGDPS980M9dkZn9m9vf09DRRnjR15s2bR0Qwb968VpciNayZANgJvDEiTi3m8i8CHgY2A79X9FkB3FWs311sUzz/rfTjk9pQR0cH1WqVzKRardLR4dHUak/NfAdwP6Nf5j4APFS81xrg48DHImIHo3P8txcvuR04s2j/GLCqibqllhkZGTnqttQuvCGMVIejHbg2k3+XVC7eEEaaQmMXgfNicGpnBoBUp/nz50+4GNz8+fNbXJHUGANAqtPevXsn3A9g7969rS5Jaoj3A5AacM8999DT03PU7wSkmc4RgFSnSqUy4UQwrwOkdmUASHXo7Oyku7t7wsXguru76ex0MK32YwBIdZg3bx61Wo0XXngBgBdeeIFareYZwWpLBoBUh2q1yty5c3n22WcZGRnh2WefZe7cud4aUm3JAJDqMGvWLBYuXHj48s/Dw8MsXLiQWbNmtbgyqX4GgFSHgwcP8sgjj3DppZcyODjIpZdeyiOPPMLBgwdbXZpUN7+5kurU19fHhg0b6OnpYfbs2fT19fH444+3uiypbo4ApDoNDAxw4403sn//fm688UYGBgZaXZLUEANAqtOFF17I2rVrOe2001i7di0XXnhhq0uSGuIUkFSnBx544PBF4Hbv3u0RQGpbjgCkOvT29tLZ2Um1WmVkZIRqtUpnZye9vb2tLk2qmwEg1aFWqzE8PHz48g+VSoXh4WFqtVqLK5PqZwBIddizZ88RF4CLCPbsOeL21tKMZwBIdapUKofvA9zR0eHF4NS2/BJYqtPw8PDhAHjxxRe9J7DaliMASSopA0BqwNinfj/9q50ZAFIDxn8HILUr//dKDXAEoJOBASBJJWUASFJJGQCSVFIGgCSVlAEgSSVlAEhSSRkAklRSBoAklVRTARARp0fEnRHxvxGxLSLeFBFnRMSmiNheLBcUfSMiPhMROyLihxHxuhPzI0iSGtHsCODTwL9m5q8Arwa2AauA+zJzKXBfsQ1wCbC0eKwEbmty35KkJjQcABExD/gt4HaAzDyUmc8By4F1Rbd1wOXF+nLgKznqe8DpEbGw4colSU1pZgTwi8Ag8KWI+EFEfDEi5gBnZ+aTAMXyFUX/RcCuca8fKNokSS3QTAB0Aq8DbsvM1wL7+fl0z2RikrY8olPEyojYEhFbBgcHmyhPknQ0zQTAADCQmfcX23cyGghPjU3tFMunx/VfPO71vcDul75pZq7JzP7M7O/p6WmiPEnS0TQcAJn5f8CuiPjlouki4GHgbmBF0bYCuKtYvxu4ojga6I3A3rGpIqndLFiwYMJSakfN3hP4WuCrETELeBR4H6Oh8vWIuAr
YCbyr6Hsv8A5gB1Ar+kptad++fROWUjtqKgAy80Ggf5KnLpqkbwIfamZ/0kwxNDQ0YSm1I88ElqSSMgAkqaQMAEkqKQNAasCCBQuICI8CUltr9iggqZSq1eqEpdSOHAFIUkkZAJJUUgaAJJWUASBJJWUASFJJGQCSVFIGgCSVlAEgSSVlAEhSSRkAklRSBoAklZQBIEklZQBIUkkZAJJUUgaAJJWUASBJJWUASFJJGQCSVFIGgCSVlAEgSSVlAEgNiIgJS6kddba6AGkmqPcPeWZOWB7ve4zvL7WaIwCJ0T/Mx/O45ppriAgqlQoAlUqFiOCaa645rtdLM4kjAKkOn/3sZwH4whe+wIsvvkhnZycf+MAHDrdL7SRm8qeS/v7+3LJlS6vLkCbVt+qbPH7zO1tdhnSEiNiamf3H6ucUkCSVVNMBEBGViPhBRPxzsX1ORNwfEdsjYn1EzCraZxfbO4rn+5rdtySpcSdiBPBHwLZx258EPpWZS4EqcFXRfhVQzcxzgU8V/SRJLdJUAEREL/BO4IvFdgBvBe4suqwDLi/WlxfbFM9fFB5ELUkt0+wI4G+APwFGiu0zgecyc7jYHgAWFeuLgF0AxfN7i/4TRMTKiNgSEVsGBwebLE+S9HIaDoCI+F3g6czcOr55kq55HM/9vCFzTWb2Z2Z/T09Po+VJko6hmfMA3gxcFhHvALqBeYyOCE6PiM7iU34vsLvoPwAsBgYiohOYD+xpYv+SpCY0PALIzOszszcz+4B3A9/KzPcCm4HfK7qtAO4q1u8utime/1bO5JMQJOkkNxXnAXwc+FhE7GB0jv/2ov124Myi/WPAqinYtyTpOJ2QS0Fk5reBbxfrjwKvn6TPC8C7TsT+JEnN80xgSSopA0CSSsoAkKSSMgAkqaQMAEkqKQNAkkrKAJCkkjIAJKmkDABJKikDQJJKygCQpJIyACSppAwASSopA0CSSsoAkKSSMgAkqaQMAEkqKQNAkkrKAJCkkjIAJKmkDABJKikDQJJKygCQpJIyACSppAwASSopA0CSSsoAkKSSMgAkqaQMAEkqKQNAkkrKAJCkkups9IURsRj4CvALwAiwJjM/HRFnAOuBPuBx4PczsxoRAXwaeAdQA67MzAeaK1+a3Ktv2MjeA0NTvp++Vd+c0veff0oX//OJi6d0HyqvhgMAGAauy8wHIuI0YGtEbAKuBO7LzJsjYhWwCvg4cAmwtHi8AbitWEon3N4DQzx+8ztbXUbTpjpgVG4NTwFl5pNjn+Az83lgG7AIWA6sK7qtAy4v1pcDX8lR3wNOj4iFDVcuSWrKCfkOICL6gNcC9wNnZ+aTMBoSwCuKbouAXeNeNlC0vfS9VkbElojYMjg4eCLKkyRNoukAiIi5wDeAj2Tmz47WdZK2PKIhc01m9mdmf09PT7PlSZJeRlMBEBFdjP7x/2pm/kPR/NTY1E6xfLpoHwAWj3t5L7C7mf1LkhrXcAAUR/XcDmzLzFvHPXU3sKJYXwHcNa79ihj1RmDv2FSRJGn6NXMU0JuBPwAeiogHi7Y/BW4Gvh4RVwE7gXcVz93L6CGgOxg9DPR9TexbktSkhgMgM/+Dyef1AS6apH8CH2p0f5KkE8szgSWppAwASSopA0CSSsoAkKSSMgAkqaQMAEkqKQNAkkrKAJCkkjIAJKmkDABJKikDQJJKqpmLwUkz1mnnreLX1q1qdRlNO+08gPa/taVmJgNAJ6Xnt93sPYGlY3AKSJJKygCQpJIyACSppAwASSopA0CSSsoAkKSSMgAkqaQMAEkqKQNAkkrKM4F10joZzqKdf0pXq0vQScwA0ElpOi4D0bfqmyfF5SZUXk4BSVJJGQCSVFIGgCSVlAEgSSVlAEhSSRkAklRSBoAkldS0B0BEvD0ifhIROyKi/W/aKkltaloDICIqwN8ClwC/CrwnIn51OmuQJI2a7hHA64EdmfloZh4C7gCWT3MNkiSm/1IQi4Bd47YHgDeM7xARK4G
VAEuWLJm+ylRqEdHY6z5ZX//MbGg/0lSY7hHAZL9lE34jMnNNZvZnZn9PT880laWyy8xpeUgzyXQHwACweNx2L7B7mmuQJDH9AfB9YGlEnBMRs4B3A3dPcw2SJKb5O4DMHI6Ia4ANQAVYm5k/ns4aJEmjpv1+AJl5L3DvdO9XkjSRZwJLUkkZAJJUUgaAJJWUASBJJRUz+eSUiBgEnmh1HdLLOAt4ptVFSJN4VWYe80zaGR0A0kwWEVsys7/VdUiNcgpIkkrKAJCkkjIApMataXUBUjP8DkCSSsoRgCSVlAEgtVBEfCQiTm11HSonp4CkFoqIx4H+zPR8Ak07RwA6KUXEFRHxw4j4n4j4u4h4VUTcV7TdFxFLin5fjojbImJzRDwaEb8dEWsjYltEfHnc++2LiE9GxNaI+LeIeH1EfLt4zWVFn0pE/FVEfL/Yzx8W7W8p+t4ZEf8bEV+NUR8GXglsLvZfKer5UUQ8FBEfbcE/ncpkum6F58PHdD2A84GfAGcV22cA9wAriu33A/9UrH8ZuIPR25UuB34G/BqjH462Aq8p+iVwSbH+j8BGoAt4NfBg0b4S+LNifTawBTgHeAuwl9E74HUA/wX8ZtHv8XF1/jqwadzPcXqr/y19nNwPRwA6Gb0VuDOLaZXM3AO8Cfj74vm/A35zXP97MjOBh4CnMvOhzBwBfgz0FX0OAf9arD8E/HtmDhXrY30uBq6IiAeB+4EzgaXFc/+dmQPF+z447jXjPQr8YkR8NiLezmgYSVPGANDJKBj9xH40458/WCxHxq2PbY/dNGmoCIkJ/Yo/6GN9Arg2M19TPM7JzI0v2QfAi0xyM6bMrDI6ovg28CHgi8f4GaSmGAA6Gd0H/H5EnAkQEWcA/8noPagB3gv8xxTsdwPwwYjoKvb7SxEx5xiveR44reh/FtCRmd8A/hx43RTUKB027beElKZaZv44IlYD/x4RLwI/AD4MrI2IPwYGgfdNwa6/yOjUzgMREcV+Lj/Ga9YA/xIRTwIfAb4UEWMfzK6fghqlwzwMVJJKyikgSSopA0CSSsoAkKSSMgAkqaQMAEkqKQNAkkrKAJCkkjIAJKmk/h/OplPavCQBBQAAAABJRU5ErkJggg==\n",
1381 | "text/plain": [
1382 | ""
1383 | ]
1384 | },
1385 | "metadata": {
1386 | "needs_background": "light"
1387 | },
1388 | "output_type": "display_data"
1389 | }
1390 | ],
1391 | "source": [
1392 | "# boxplot can also show distributions, but it's far less useful for concentrated distributions because of outliers\n",
1393 | "ted.loc[ted.comments < 1000, 'comments'].plot(kind='box')"
1394 | ]
1395 | },
1396 | {
1397 | "cell_type": "markdown",
1398 | "metadata": {},
1399 | "source": [
1400 | "Lessons:\n",
1401 | "\n",
1402 | "1. Choose your plot type based on the question you are answering and the data type(s) you are working with\n",
1403 | "2. Use pandas one-liners to iterate through plots quickly\n",
1404 | "3. Try modifying the plot defaults\n",
1405 | "4. Creating plots involves decision-making"
1406 | ]
1407 | },
1408 | {
1409 | "cell_type": "markdown",
1410 | "metadata": {},
1411 | "source": [
1412 | "## 4. Plot the number of talks that took place each year\n",
1413 | "\n",
1414 | "Bonus exercise: calculate the average delay between filming and publishing"
1415 | ]
1416 | },
1417 | {
1418 | "cell_type": "code",
1419 | "execution_count": 21,
1420 | "metadata": {},
1421 | "outputs": [
1422 | {
1423 | "data": {
1424 | "text/plain": [
1425 | "2012 TEDxBoulder\n",
1426 | "1307 TEDxUCL\n",
1427 | "144 TEDGlobal 2007\n",
1428 | "1739 TED2014\n",
1429 | "1529 TEDGlobal 2013\n",
1430 | "1181 TEDxWomen 2011\n",
1431 | "2150 TEDYouth 2015\n",
1432 | "1719 TED2014\n",
1433 | "64 TED2007\n",
1434 | "1178 TEDxCambridge\n",
1435 | "Name: event, dtype: object"
1436 | ]
1437 | },
1438 | "execution_count": 21,
1439 | "metadata": {},
1440 | "output_type": "execute_result"
1441 | }
1442 | ],
1443 | "source": [
1444 | "# event column does not always include the year\n",
1445 | "ted.event.sample(10)"
1446 | ]
1447 | },
1448 | {
1449 | "cell_type": "code",
1450 | "execution_count": 22,
1451 | "metadata": {},
1452 | "outputs": [
1453 | {
1454 | "data": {
1455 | "text/plain": [
1456 | "0 1140825600\n",
1457 | "1 1140825600\n",
1458 | "2 1140739200\n",
1459 | "3 1140912000\n",
1460 | "4 1140566400\n",
1461 | "Name: film_date, dtype: int64"
1462 | ]
1463 | },
1464 | "execution_count": 22,
1465 | "metadata": {},
1466 | "output_type": "execute_result"
1467 | }
1468 | ],
1469 | "source": [
1470 | "# dataset documentation for film_date says \"Unix timestamp of the filming\"\n",
1471 | "ted.film_date.head()"
1472 | ]
1473 | },
1474 | {
1475 | "cell_type": "code",
1476 | "execution_count": 23,
1477 | "metadata": {},
1478 | "outputs": [
1479 | {
1480 | "data": {
1481 | "text/plain": [
1482 | "0 1970-01-01 00:00:01.140825600\n",
1483 | "1 1970-01-01 00:00:01.140825600\n",
1484 | "2 1970-01-01 00:00:01.140739200\n",
1485 | "3 1970-01-01 00:00:01.140912000\n",
1486 | "4 1970-01-01 00:00:01.140566400\n",
1487 | "Name: film_date, dtype: datetime64[ns]"
1488 | ]
1489 | },
1490 | "execution_count": 23,
1491 | "metadata": {},
1492 | "output_type": "execute_result"
1493 | }
1494 | ],
1495 | "source": [
1496 | "# results don't look right: by default, integers are interpreted as nanoseconds since the Unix epoch\n",
1497 | "pd.to_datetime(ted.film_date).head()"
1498 | ]
1499 | },
1500 | {
1501 | "cell_type": "markdown",
1502 | "metadata": {},
1503 | "source": [
1504 | "[pandas documentation for `to_datetime`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.to_datetime.html)"
1505 | ]
1506 | },
1507 | {
1508 | "cell_type": "code",
1509 | "execution_count": 24,
1510 | "metadata": {},
1511 | "outputs": [
1512 | {
1513 | "data": {
1514 | "text/plain": [
1515 | "0 2006-02-25\n",
1516 | "1 2006-02-25\n",
1517 | "2 2006-02-24\n",
1518 | "3 2006-02-26\n",
1519 | "4 2006-02-22\n",
1520 | "Name: film_date, dtype: datetime64[ns]"
1521 | ]
1522 | },
1523 | "execution_count": 24,
1524 | "metadata": {},
1525 | "output_type": "execute_result"
1526 | }
1527 | ],
1528 | "source": [
1529 | "# now the results look right: unit='s' tells pandas the integers are seconds since the Unix epoch\n",
1530 | "pd.to_datetime(ted.film_date, unit='s').head()"
1531 | ]
1532 | },
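{
"cell_type": "markdown",
"metadata": {},
"source": [
"Why `unit='s'` matters: `pd.to_datetime` treats plain integers as nanoseconds since 1970-01-01 unless told otherwise, which is why the first attempt produced dates in 1970. A small sketch using the first `film_date` value from the output above:\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"stamp = 1140825600  # first film_date value shown above\n",
"\n",
"# default unit is nanoseconds: roughly 1.14 seconds after the epoch\n",
"print(pd.to_datetime(stamp))            # 1970-01-01 00:00:01.140825600\n",
"\n",
"# unit='s' interprets the integer as seconds since the epoch\n",
"print(pd.to_datetime(stamp, unit='s'))  # 2006-02-25 00:00:00\n",
"```"
]
},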
1533 | {
1534 | "cell_type": "code",
1535 | "execution_count": 25,
1536 | "metadata": {},
1537 | "outputs": [],
1538 | "source": [
1539 | "ted['film_datetime'] = pd.to_datetime(ted.film_date, unit='s')"
1540 | ]
1541 | },
1542 | {
1543 | "cell_type": "code",
1544 | "execution_count": 26,
1545 | "metadata": {
1546 | "scrolled": false
1547 | },
1548 | "outputs": [
1549 | {
1550 | "data": {
1604 | "text/plain": [
1605 | " event film_datetime\n",
1606 | "831 TEDWomen 2010 2010-12-08\n",
1607 | "2464 TED2017 2017-04-24\n",
1608 | "2392 TEDxBeaconStreet 2016-11-19\n",
1609 | "1307 TEDxUCL 2012-06-03\n",
1610 | "2234 TED2016 2016-02-17"
1611 | ]
1612 | },
1613 | "execution_count": 26,
1614 | "metadata": {},
1615 | "output_type": "execute_result"
1616 | }
1617 | ],
1618 | "source": [
1619 | "# verify that event name matches film_datetime for a random sample\n",
1620 | "ted[['event', 'film_datetime']].sample(5)"
1621 | ]
1622 | },
1623 | {
1624 | "cell_type": "code",
1625 | "execution_count": 27,
1626 | "metadata": {},
1627 | "outputs": [
1628 | {
1629 | "data": {
1630 | "text/plain": [
1631 | "comments int64\n",
1632 | "description object\n",
1633 | "duration int64\n",
1634 | "event object\n",
1635 | "film_date int64\n",
1636 | "languages int64\n",
1637 | "main_speaker object\n",
1638 | "name object\n",
1639 | "num_speaker int64\n",
1640 | "published_date int64\n",
1641 | "ratings object\n",
1642 | "related_talks object\n",
1643 | "speaker_occupation object\n",
1644 | "tags object\n",
1645 | "title object\n",
1646 | "url object\n",
1647 | "views int64\n",
1648 | "comments_per_view float64\n",
1649 | "views_per_comment float64\n",
1650 | "film_datetime datetime64[ns]\n",
1651 | "dtype: object"
1652 | ]
1653 | },
1654 | "execution_count": 27,
1655 | "metadata": {},
1656 | "output_type": "execute_result"
1657 | }
1658 | ],
1659 | "source": [
1660 | "# new column uses the datetime data type (assigning the to_datetime result keeps its datetime64 dtype)\n",
1661 | "ted.dtypes"
1662 | ]
1663 | },
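{
"cell_type": "markdown",
"metadata": {},
"source": [
"For the bonus exercise, one possible approach is sketched below. It is not necessarily the tutorial's own solution. Convert both Unix timestamp columns with `unit='s'` and subtract them. The result is a timedelta Series, and its `mean()` is the average delay. The two-row DataFrame is hypothetical. On the real data you would use `ted.film_date` and `ted.published_date`:\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# hypothetical timestamps, in seconds since the Unix epoch\n",
"talks = pd.DataFrame({\n",
"    'film_date': [1140825600, 1140739200],\n",
"    'published_date': [1151367060, 1151367060],\n",
"})\n",
"\n",
"filmed = pd.to_datetime(talks.film_date, unit='s')\n",
"published = pd.to_datetime(talks.published_date, unit='s')\n",
"\n",
"# subtracting datetimes gives timedeltas, and mean() averages them\n",
"delay = (published - filmed).mean()\n",
"print(delay)  # 122 days 12:11:00\n",
"```"
]
},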
1664 | {
1665 | "cell_type": "code",
1666 | "execution_count": 28,
1667 | "metadata": {},
1668 | "outputs": [
1669 | {
1670 | "data": {
1671 | "text/plain": [
1672 | "0 2006\n",
1673 | "1 2006\n",
1674 | "2 2006\n",
1675 | "3 2006\n",
1676 | "4 2006\n",
1677 | "Name: film_datetime, dtype: int64"
1678 | ]
1679 | },
1680 | "execution_count": 28,
1681 | "metadata": {},
1682 | "output_type": "execute_result"
1683 | }
1684 | ],
1685 | "source": [
1686 | "# datetime columns have convenient attributes under the dt namespace\n",
1687 | "ted.film_datetime.dt.year.head()"
1688 | ]
1689 | },
1690 | {
1691 | "cell_type": "code",
1692 | "execution_count": 29,
1693 | "metadata": {},
1694 | "outputs": [
1695 | {
1696 | "data": {
1697 | "text/plain": [
1698 | "0 ted2006\n",
1699 | "1 ted2006\n",
1700 | "2 ted2006\n",
1701 | "3 ted2006\n",
1702 | "4 ted2006\n",
1703 | "Name: event, dtype: object"
1704 | ]
1705 | },
1706 | "execution_count": 29,
1707 | "metadata": {},
1708 | "output_type": "execute_result"
1709 | }
1710 | ],
1711 | "source": [
1712 | "# similar to string methods under the str namespace\n",
1713 | "ted.event.str.lower().head()"
1714 | ]
1715 | },
1716 | {
1717 | "cell_type": "code",
1718 | "execution_count": 30,
1719 | "metadata": {},
1720 | "outputs": [
1721 | {
1722 | "data": {
1723 | "text/plain": [
1724 | "2013 270\n",
1725 | "2011 270\n",
1726 | "2010 267\n",
1727 | "2012 267\n",
1728 | "2016 246\n",
1729 | "2015 239\n",
1730 | "2014 237\n",
1731 | "2009 232\n",
1732 | "2007 114\n",
1733 | "2017 98\n",
1734 | "2008 84\n",
1735 | "2005 66\n",
1736 | "2006 50\n",
1737 | "2003 33\n",
1738 | "2004 33\n",
1739 | "2002 27\n",
1740 | "1998 6\n",
1741 | "2001 5\n",
1742 | "1983 1\n",
1743 | "1991 1\n",
1744 | "1994 1\n",
1745 | "1990 1\n",
1746 | "1984 1\n",
1747 | "1972 1\n",
1748 | "Name: film_datetime, dtype: int64"
1749 | ]
1750 | },
1751 | "execution_count": 30,
1752 | "metadata": {},
1753 | "output_type": "execute_result"
1754 | }
1755 | ],
1756 | "source": [
1757 | "# count the number of talks each year using value_counts()\n",
1758 | "ted.film_datetime.dt.year.value_counts()"
1759 | ]
1760 | },
1761 | {
1762 | "cell_type": "code",
1763 | "execution_count": 31,
1764 | "metadata": {},
1765 | "outputs": [],
1789 | "source": [
1790 | "# points are plotted and connected in the order you give them to pandas (value_counts sorts by count, not by year)\n",
1791 | "ted.film_datetime.dt.year.value_counts().plot()"
1792 | ]
1793 | },
1794 | {
1795 | "cell_type": "code",
1796 | "execution_count": 32,
1797 | "metadata": {},
1798 | "outputs": [],
1822 | "source": [
1823 | "# need to sort the index before plotting\n",
1824 | "ted.film_datetime.dt.year.value_counts().sort_index().plot()"
1825 | ]
1826 | },
1827 | {
1828 | "cell_type": "code",
1829 | "execution_count": 33,
1830 | "metadata": {
1831 | "scrolled": true
1832 | },
1833 | "outputs": [
1834 | {
1835 | "data": {
1836 | "text/plain": [
1837 | "Timestamp('2017-08-27 00:00:00')"
1838 | ]
1839 | },
1840 | "execution_count": 33,
1841 | "metadata": {},
1842 | "output_type": "execute_result"
1843 | }
1844 | ],
1845 | "source": [
1846 | "# we only have partial data for 2017\n",
1847 | "ted.film_datetime.max()"
1848 | ]
1849 | },
1850 | {
1851 | "cell_type": "markdown",
1852 | "metadata": {},
1853 | "source": [
1854 | "Lessons:\n",
1855 | "\n",
1856 | "1. Read the documentation\n",
1857 | "2. Use the datetime data type for dates and times\n",
1858 | "3. Check your work as you go\n",
1859 | "4. Consider excluding data if it might not be relevant"
1860 | ]
1861 | },
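{
"cell_type": "markdown",
"metadata": {},
"source": [
"One way to act on lesson 4 is to drop the partial final year before counting talks per year. The next cell is a minimal sketch on a tiny hypothetical DataFrame named `toy` (the dates are invented, not taken from the TED data). With the real data you would apply the same filter to `ted.film_datetime`, assuming you decide that the partial 2017 data should be excluded."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# hypothetical stand-in for ted.film_datetime (invented dates)\n",
"toy = pd.DataFrame({'film_datetime': pd.to_datetime(\n",
"    ['2015-03-01', '2016-06-15', '2016-11-30', '2017-08-27'])})\n",
"\n",
"# treat the most recent year as incomplete and drop its rows\n",
"last_year = toy.film_datetime.dt.year.max()\n",
"complete = toy[toy.film_datetime.dt.year < last_year]\n",
"\n",
"# count talks per complete year, sorted by year so a line plot connects the years in order\n",
"counts = complete.film_datetime.dt.year.value_counts().sort_index()\n",
"counts"
]
},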
1862 | {
1863 | "cell_type": "markdown",
1864 | "metadata": {},
1865 | "source": [
1866 | "## 5. What were the \"best\" events in TED history to attend?"
1867 | ]
1868 | },
1869 | {
1870 | "cell_type": "code",
1871 | "execution_count": 34,
1872 | "metadata": {},
1873 | "outputs": [
1874 | {
1875 | "data": {
1876 | "text/plain": [
1877 | "TED2014 84\n",
1878 | "TED2009 83\n",
1879 | "TED2013 77\n",
1880 | "TED2016 77\n",
1881 | "TED2015 75\n",
1882 | "Name: event, dtype: int64"
1883 | ]
1884 | },
1885 | "execution_count": 34,
1886 | "metadata": {},
1887 | "output_type": "execute_result"
1888 | }
1889 | ],
1890 | "source": [
1891 | "# count the number of talks (great if you value variety, but they may not be great talks)\n",
1892 | "ted.event.value_counts().head()"
1893 | ]
1894 | },
1895 | {
1896 | "cell_type": "code",
1897 | "execution_count": 35,
1898 | "metadata": {},
1899 | "outputs": [
1900 | {
1901 | "data": {
1902 | "text/plain": [
1903 | "event\n",
1904 | "AORN Congress 149818.0\n",
1905 | "Arbejdsglaede Live 971594.0\n",
1906 | "BBC TV 521974.0\n",
1907 | "Bowery Poetry Club 676741.0\n",
1908 | "Business Innovation Factory 304086.0\n",
1909 | "Name: views, dtype: float64"
1910 | ]
1911 | },
1912 | "execution_count": 35,
1913 | "metadata": {},
1914 | "output_type": "execute_result"
1915 | }
1916 | ],
1917 | "source": [
1918 | "# use views as a proxy for \"quality of talk\"\n",
1919 | "ted.groupby('event').views.mean().head()"
1920 | ]
1921 | },
1922 | {
1923 | "cell_type": "code",
1924 | "execution_count": 36,
1925 | "metadata": {},
1926 | "outputs": [
1927 | {
1928 | "data": {
1929 | "text/plain": [
1930 | "event\n",
1931 | "TEDxNorrkoping 6569493.0\n",
1932 | "TEDxCreativeCoast 8444981.0\n",
1933 | "TEDxBloomington 9484259.5\n",
1934 | "TEDxHouston 16140250.5\n",
1935 | "TEDxPuget Sound 34309432.0\n",
1936 | "Name: views, dtype: float64"
1937 | ]
1938 | },
1939 | "execution_count": 36,
1940 | "metadata": {},
1941 | "output_type": "execute_result"
1942 | }
1943 | ],
1944 | "source": [
1945 | "# find the largest values, but we don't know how many talks are being averaged\n",
1946 | "ted.groupby('event').views.mean().sort_values().tail()"
1947 | ]
1948 | },
1949 | {
1950 | "cell_type": "code",
1951 | "execution_count": 37,
1952 | "metadata": {},
1953 | "outputs": [
1954 | {
1955 | "data": {
2014 | "text/plain": [
2015 | " count mean\n",
2016 | "event \n",
2017 | "TEDxNorrkoping 1 6569493.0\n",
2018 | "TEDxCreativeCoast 1 8444981.0\n",
2019 | "TEDxBloomington 2 9484259.5\n",
2020 | "TEDxHouston 2 16140250.5\n",
2021 | "TEDxPuget Sound 1 34309432.0"
2022 | ]
2023 | },
2024 | "execution_count": 37,
2025 | "metadata": {},
2026 | "output_type": "execute_result"
2027 | }
2028 | ],
2029 | "source": [
2030 | "# show the number of talks along with the mean (events with the highest means had only 1 or 2 talks)\n",
2031 | "ted.groupby('event').views.agg(['count', 'mean']).sort_values('mean').tail()"
2032 | ]
2033 | },
2034 | {
2035 | "cell_type": "code",
2036 | "execution_count": 38,
2037 | "metadata": {},
2038 | "outputs": [
2039 | {
2040 | "data": {
2106 | "text/plain": [
2107 | " count mean sum\n",
2108 | "event \n",
2109 | "TED2006 45 3.274345e+06 147345533\n",
2110 | "TED2015 75 2.011017e+06 150826305\n",
2111 | "TEDGlobal 2013 66 2.584163e+06 170554736\n",
2112 | "TED2014 84 2.072874e+06 174121423\n",
2113 | "TED2013 77 2.302700e+06 177307937"
2114 | ]
2115 | },
2116 | "execution_count": 38,
2117 | "metadata": {},
2118 | "output_type": "execute_result"
2119 | }
2120 | ],
2121 | "source": [
2122 | "# calculate the total views per event\n",
2123 | "ted.groupby('event').views.agg(['count', 'mean', 'sum']).sort_values('sum').tail()"
2124 | ]
2125 | },
2126 | {
2127 | "cell_type": "markdown",
2128 | "metadata": {},
2129 | "source": [
2130 | "Lessons:\n",
2131 | "\n",
2132 | "1. Think creatively about how you can use the data you have to answer your question\n",
2133 | "2. Watch out for small sample sizes"
2134 | ]
2135 | },
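{
"cell_type": "markdown",
"metadata": {},
"source": [
"To act on lesson 2, one option is to require a minimum number of talks per event before comparing means. The next cell is a minimal sketch on hypothetical toy data. The threshold `MIN_TALKS = 3` is an arbitrary assumption, and with the real data the same filter would apply to the result of `ted.groupby('event').views.agg(['count', 'mean'])`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# hypothetical toy data: E1 has one very popular talk, E2 has three talks\n",
"toy = pd.DataFrame({\n",
"    'event': ['E1', 'E2', 'E2', 'E2'],\n",
"    'views': [9000000, 2000000, 3000000, 1000000],\n",
"})\n",
"\n",
"stats = toy.groupby('event').views.agg(['count', 'mean'])\n",
"\n",
"# keep only events with enough talks for the mean to be meaningful\n",
"# (use brackets for the 'count' column, since stats.count is a DataFrame method)\n",
"MIN_TALKS = 3  # arbitrary threshold chosen for illustration\n",
"reliable = stats[stats['count'] >= MIN_TALKS].sort_values('mean')\n",
"reliable"
]
},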
2136 | {
2137 | "cell_type": "markdown",
2138 | "metadata": {},
2139 | "source": [
2140 | "## 6. Unpack the ratings data"
2141 | ]
2142 | },
2143 | {
2144 | "cell_type": "code",
2145 | "execution_count": 39,
2146 | "metadata": {},
2147 | "outputs": [
2148 | {
2149 | "data": {
2150 | "text/plain": [
2151 | "0 [{'id': 7, 'name': 'Funny', 'count': 19645}, {...\n",
2152 | "1 [{'id': 7, 'name': 'Funny', 'count': 544}, {'i...\n",
2153 | "2 [{'id': 7, 'name': 'Funny', 'count': 964}, {'i...\n",
2154 | "3 [{'id': 3, 'name': 'Courageous', 'count': 760}...\n",
2155 | "4 [{'id': 9, 'name': 'Ingenious', 'count': 3202}...\n",
2156 | "Name: ratings, dtype: object"
2157 | ]
2158 | },
2159 | "execution_count": 39,
2160 | "metadata": {},
2161 | "output_type": "execute_result"
2162 | }
2163 | ],
2164 | "source": [
2165 | "# previously, users could tag talks on the TED website (funny, inspiring, confusing, etc.)\n",
2166 | "ted.ratings.head()"
2167 | ]
2168 | },
2169 | {
2170 | "cell_type": "code",
2171 | "execution_count": 40,
2172 | "metadata": {},
2173 | "outputs": [
2174 | {
2175 | "data": {
2176 | "text/plain": [
2177 | "\"[{'id': 7, 'name': 'Funny', 'count': 19645}, {'id': 1, 'name': 'Beautiful', 'count': 4573}, {'id': 9, 'name': 'Ingenious', 'count': 6073}, {'id': 3, 'name': 'Courageous', 'count': 3253}, {'id': 11, 'name': 'Longwinded', 'count': 387}, {'id': 2, 'name': 'Confusing', 'count': 242}, {'id': 8, 'name': 'Informative', 'count': 7346}, {'id': 22, 'name': 'Fascinating', 'count': 10581}, {'id': 21, 'name': 'Unconvincing', 'count': 300}, {'id': 24, 'name': 'Persuasive', 'count': 10704}, {'id': 23, 'name': 'Jaw-dropping', 'count': 4439}, {'id': 25, 'name': 'OK', 'count': 1174}, {'id': 26, 'name': 'Obnoxious', 'count': 209}, {'id': 10, 'name': 'Inspiring', 'count': 24924}]\""
2178 | ]
2179 | },
2180 | "execution_count": 40,
2181 | "metadata": {},
2182 | "output_type": "execute_result"
2183 | }
2184 | ],
2185 | "source": [
2186 | "# two ways to examine the ratings data for the first talk\n",
2187 | "ted.loc[0, 'ratings']\n",
2188 | "ted.ratings[0]"
2189 | ]
2190 | },
2191 | {
2192 | "cell_type": "code",
2193 | "execution_count": 41,
2194 | "metadata": {},
2195 | "outputs": [
2196 | {
2197 | "data": {
2198 | "text/plain": [
2199 | "str"
2200 | ]
2201 | },
2202 | "execution_count": 41,
2203 | "metadata": {},
2204 | "output_type": "execute_result"
2205 | }
2206 | ],
2207 | "source": [
2208 | "# this is a string, not a list\n",
2209 | "type(ted.ratings[0])"
2210 | ]
2211 | },
2212 | {
2213 | "cell_type": "code",
2214 | "execution_count": 42,
2215 | "metadata": {},
2216 | "outputs": [],
2217 | "source": [
2218 | "# convert this into something useful using Python's ast module (Abstract Syntax Tree)\n",
2219 | "import ast"
2220 | ]
2221 | },
2222 | {
2223 | "cell_type": "code",
2224 | "execution_count": 43,
2225 | "metadata": {},
2226 | "outputs": [
2227 | {
2228 | "data": {
2229 | "text/plain": [
2230 | "[1, 2, 3]"
2231 | ]
2232 | },
2233 | "execution_count": 43,
2234 | "metadata": {},
2235 | "output_type": "execute_result"
2236 | }
2237 | ],
2238 | "source": [
2239 | "# literal_eval() allows you to evaluate a string containing a Python literal or container\n",
2240 | "ast.literal_eval('[1, 2, 3]')"
2241 | ]
2242 | },
2243 | {
2244 | "cell_type": "code",
2245 | "execution_count": 44,
2246 | "metadata": {},
2247 | "outputs": [
2248 | {
2249 | "data": {
2250 | "text/plain": [
2251 | "list"
2252 | ]
2253 | },
2254 | "execution_count": 44,
2255 | "metadata": {},
2256 | "output_type": "execute_result"
2257 | }
2258 | ],
2259 | "source": [
2260 | "# if you have a string representation of something, you can retrieve what it actually represents\n",
2261 | "type(ast.literal_eval('[1, 2, 3]'))"
2262 | ]
2263 | },
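{
"cell_type": "markdown",
"metadata": {},
"source": [
"Why use `ast.literal_eval()` rather than `json.loads()` or the built-in `eval()`? The ratings strings use single quotes, which JSON does not allow, and `eval()` would execute whatever code the string contains. The next cell is a small sketch (using a shortened copy of the first talk's ratings string) showing that `literal_eval()` parses the single-quoted text but raises `ValueError` for anything that is not a literal, such as a function call."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import ast\n",
"import json\n",
"\n",
"# shortened copy of the first talk's ratings string (same single-quoted format)\n",
"s = \"[{'id': 7, 'name': 'Funny', 'count': 19645}]\"\n",
"\n",
"# literal_eval accepts single-quoted Python literals\n",
"parsed = ast.literal_eval(s)\n",
"\n",
"# json.loads requires double-quoted strings, so it rejects this text\n",
"try:\n",
"    json.loads(s)\n",
"    json_ok = True\n",
"except json.JSONDecodeError:\n",
"    json_ok = False\n",
"\n",
"# literal_eval raises ValueError for expressions that are not literals\n",
"try:\n",
"    ast.literal_eval(\"__import__('os').getcwd()\")\n",
"    call_ok = True\n",
"except ValueError:\n",
"    call_ok = False\n",
"\n",
"parsed, json_ok, call_ok"
]
},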
2264 | {
2265 | "cell_type": "code",
2266 | "execution_count": 45,
2267 | "metadata": {},
2268 | "outputs": [
2269 | {
2270 | "data": {
2271 | "text/plain": [
2272 | "[{'id': 7, 'name': 'Funny', 'count': 19645},\n",
2273 | " {'id': 1, 'name': 'Beautiful', 'count': 4573},\n",
2274 | " {'id': 9, 'name': 'Ingenious', 'count': 6073},\n",
2275 | " {'id': 3, 'name': 'Courageous', 'count': 3253},\n",
2276 | " {'id': 11, 'name': 'Longwinded', 'count': 387},\n",
2277 | " {'id': 2, 'name': 'Confusing', 'count': 242},\n",
2278 | " {'id': 8, 'name': 'Informative', 'count': 7346},\n",
2279 | " {'id': 22, 'name': 'Fascinating', 'count': 10581},\n",
2280 | " {'id': 21, 'name': 'Unconvincing', 'count': 300},\n",
2281 | " {'id': 24, 'name': 'Persuasive', 'count': 10704},\n",
2282 | " {'id': 23, 'name': 'Jaw-dropping', 'count': 4439},\n",
2283 | " {'id': 25, 'name': 'OK', 'count': 1174},\n",
2284 | " {'id': 26, 'name': 'Obnoxious', 'count': 209},\n",
2285 | " {'id': 10, 'name': 'Inspiring', 'count': 24924}]"
2286 | ]
2287 | },
2288 | "execution_count": 45,
2289 | "metadata": {},
2290 | "output_type": "execute_result"
2291 | }
2292 | ],
2293 | "source": [
2294 | "# unpack the ratings data for the first talk\n",
2295 | "ast.literal_eval(ted.ratings[0])"
2296 | ]
2297 | },
2298 | {
2299 | "cell_type": "code",
2300 | "execution_count": 46,
2301 | "metadata": {},
2302 | "outputs": [
2303 | {
2304 | "data": {
2305 | "text/plain": [
2306 | "list"
2307 | ]
2308 | },
2309 | "execution_count": 46,
2310 | "metadata": {},
2311 | "output_type": "execute_result"
2312 | }
2313 | ],
2314 | "source": [
2315 | "# now we have a list (of dictionaries)\n",
2316 | "type(ast.literal_eval(ted.ratings[0]))"
2317 | ]
2318 | },
2319 | {
2320 | "cell_type": "code",
2321 | "execution_count": 47,
2322 | "metadata": {},
2323 | "outputs": [],
2324 | "source": [
2325 | "# define a function to convert an element in the ratings Series from string to list\n",
2326 | "def str_to_list(ratings_str):\n",
2327 | " return ast.literal_eval(ratings_str)"
2328 | ]
2329 | },
2330 | {
2331 | "cell_type": "code",
2332 | "execution_count": 48,
2333 | "metadata": {},
2334 | "outputs": [
2335 | {
2336 | "data": {
2337 | "text/plain": [
2338 | "[{'id': 7, 'name': 'Funny', 'count': 19645},\n",
2339 | " {'id': 1, 'name': 'Beautiful', 'count': 4573},\n",
2340 | " {'id': 9, 'name': 'Ingenious', 'count': 6073},\n",
2341 | " {'id': 3, 'name': 'Courageous', 'count': 3253},\n",
2342 | " {'id': 11, 'name': 'Longwinded', 'count': 387},\n",
2343 | " {'id': 2, 'name': 'Confusing', 'count': 242},\n",
2344 | " {'id': 8, 'name': 'Informative', 'count': 7346},\n",
2345 | " {'id': 22, 'name': 'Fascinating', 'count': 10581},\n",
2346 | " {'id': 21, 'name': 'Unconvincing', 'count': 300},\n",
2347 | " {'id': 24, 'name': 'Persuasive', 'count': 10704},\n",
2348 | " {'id': 23, 'name': 'Jaw-dropping', 'count': 4439},\n",
2349 | " {'id': 25, 'name': 'OK', 'count': 1174},\n",
2350 | " {'id': 26, 'name': 'Obnoxious', 'count': 209},\n",
2351 | " {'id': 10, 'name': 'Inspiring', 'count': 24924}]"
2352 | ]
2353 | },
2354 | "execution_count": 48,
2355 | "metadata": {},
2356 | "output_type": "execute_result"
2357 | }
2358 | ],
2359 | "source": [
2360 | "# test the function\n",
2361 | "str_to_list(ted.ratings[0])"
2362 | ]
2363 | },
2364 | {
2365 | "cell_type": "code",
2366 | "execution_count": 49,
2367 | "metadata": {},
2368 | "outputs": [
2369 | {
2370 | "data": {
2371 | "text/plain": [
2372 | "0 [{'id': 7, 'name': 'Funny', 'count': 19645}, {...\n",
2373 | "1 [{'id': 7, 'name': 'Funny', 'count': 544}, {'i...\n",
2374 | "2 [{'id': 7, 'name': 'Funny', 'count': 964}, {'i...\n",
2375 | "3 [{'id': 3, 'name': 'Courageous', 'count': 760}...\n",
2376 | "4 [{'id': 9, 'name': 'Ingenious', 'count': 3202}...\n",
2377 | "Name: ratings, dtype: object"
2378 | ]
2379 | },
2380 | "execution_count": 49,
2381 | "metadata": {},
2382 | "output_type": "execute_result"
2383 | }
2384 | ],
2385 | "source": [
2386 | "# Series apply method applies a function to every element in a Series and returns a Series\n",
2387 | "ted.ratings.apply(str_to_list).head()"
2388 | ]
2389 | },
2390 | {
2391 | "cell_type": "code",
2392 | "execution_count": 50,
2393 | "metadata": {},
2394 | "outputs": [
2395 | {
2396 | "data": {
2397 | "text/plain": [
2398 | "0 [{'id': 7, 'name': 'Funny', 'count': 19645}, {...\n",
2399 | "1 [{'id': 7, 'name': 'Funny', 'count': 544}, {'i...\n",
2400 | "2 [{'id': 7, 'name': 'Funny', 'count': 964}, {'i...\n",
2401 | "3 [{'id': 3, 'name': 'Courageous', 'count': 760}...\n",
2402 | "4 [{'id': 9, 'name': 'Ingenious', 'count': 3202}...\n",
2403 | "Name: ratings, dtype: object"
2404 | ]
2405 | },
2406 | "execution_count": 50,
2407 | "metadata": {},
2408 | "output_type": "execute_result"
2409 | }
2410 | ],
2411 | "source": [
2412 | "# lambda is a shorter alternative\n",
2413 | "ted.ratings.apply(lambda x: ast.literal_eval(x)).head()"
2414 | ]
2415 | },
2416 | {
2417 | "cell_type": "code",
2418 | "execution_count": 51,
2419 | "metadata": {},
2420 | "outputs": [
2421 | {
2422 | "data": {
2423 | "text/plain": [
2424 | "0 [{'id': 7, 'name': 'Funny', 'count': 19645}, {...\n",
2425 | "1 [{'id': 7, 'name': 'Funny', 'count': 544}, {'i...\n",
2426 | "2 [{'id': 7, 'name': 'Funny', 'count': 964}, {'i...\n",
2427 | "3 [{'id': 3, 'name': 'Courageous', 'count': 760}...\n",
2428 | "4 [{'id': 9, 'name': 'Ingenious', 'count': 3202}...\n",
2429 | "Name: ratings, dtype: object"
2430 | ]
2431 | },
2432 | "execution_count": 51,
2433 | "metadata": {},
2434 | "output_type": "execute_result"
2435 | }
2436 | ],
2437 | "source": [
2438 | "# an even shorter alternative is to apply the function directly (without lambda)\n",
2439 | "ted.ratings.apply(ast.literal_eval).head()"
2440 | ]
2441 | },
2442 | {
2443 | "cell_type": "code",
2444 | "execution_count": 52,
2445 | "metadata": {},
2446 | "outputs": [],
2447 | "source": [
2448 | "ted['ratings_list'] = ted.ratings.apply(ast.literal_eval)  # store the parsed lists in a new column"
2449 | ]
2450 | },
2451 | {
2452 | "cell_type": "code",
2453 | "execution_count": 53,
2454 | "metadata": {},
2455 | "outputs": [
2456 | {
2457 | "data": {
2458 | "text/plain": [
2459 | "[{'id': 7, 'name': 'Funny', 'count': 19645},\n",
2460 | " {'id': 1, 'name': 'Beautiful', 'count': 4573},\n",
2461 | " {'id': 9, 'name': 'Ingenious', 'count': 6073},\n",
2462 | " {'id': 3, 'name': 'Courageous', 'count': 3253},\n",
2463 | " {'id': 11, 'name': 'Longwinded', 'count': 387},\n",
2464 | " {'id': 2, 'name': 'Confusing', 'count': 242},\n",
2465 | " {'id': 8, 'name': 'Informative', 'count': 7346},\n",
2466 | " {'id': 22, 'name': 'Fascinating', 'count': 10581},\n",
2467 | " {'id': 21, 'name': 'Unconvincing', 'count': 300},\n",
2468 | " {'id': 24, 'name': 'Persuasive', 'count': 10704},\n",
2469 | " {'id': 23, 'name': 'Jaw-dropping', 'count': 4439},\n",
2470 | " {'id': 25, 'name': 'OK', 'count': 1174},\n",
2471 | " {'id': 26, 'name': 'Obnoxious', 'count': 209},\n",
2472 | " {'id': 10, 'name': 'Inspiring', 'count': 24924}]"
2473 | ]
2474 | },
2475 | "execution_count": 53,
2476 | "metadata": {},
2477 | "output_type": "execute_result"
2478 | }
2479 | ],
2480 | "source": [
2481 | "# check that the new Series looks as expected\n",
2482 | "ted.ratings_list[0]"
2483 | ]
2484 | },
2485 | {
2486 | "cell_type": "code",
2487 | "execution_count": 54,
2488 | "metadata": {},
2489 | "outputs": [
2490 | {
2491 | "data": {
2492 | "text/plain": [
2493 | "list"
2494 | ]
2495 | },
2496 | "execution_count": 54,
2497 | "metadata": {},
2498 | "output_type": "execute_result"
2499 | }
2500 | ],
2501 | "source": [
2502 | "# each element in the Series is a list\n",
2503 | "type(ted.ratings_list[0])"
2504 | ]
2505 | },
2506 | {
2507 | "cell_type": "code",
2508 | "execution_count": 55,
2509 | "metadata": {
2510 | "scrolled": true
2511 | },
2512 | "outputs": [
2513 | {
2514 | "data": {
2515 | "text/plain": [
2516 | "dtype('O')"
2517 | ]
2518 | },
2519 | "execution_count": 55,
2520 | "metadata": {},
2521 | "output_type": "execute_result"
2522 | }
2523 | ],
2524 | "source": [
2525 | "# data type of the new Series is object\n",
2526 | "ted.ratings_list.dtype"
2527 | ]
2528 | },
2529 | {
2530 | "cell_type": "code",
2531 | "execution_count": 56,
2532 | "metadata": {},
2533 | "outputs": [
2534 | {
2535 | "data": {
2536 | "text/plain": [
2537 | "comments int64\n",
2538 | "description object\n",
2539 | "duration int64\n",
2540 | "event object\n",
2541 | "film_date int64\n",
2542 | "languages int64\n",
2543 | "main_speaker object\n",
2544 | "name object\n",
2545 | "num_speaker int64\n",
2546 | "published_date int64\n",
2547 | "ratings object\n",
2548 | "related_talks object\n",
2549 | "speaker_occupation object\n",
2550 | "tags object\n",
2551 | "title object\n",
2552 | "url object\n",
2553 | "views int64\n",
2554 | "comments_per_view float64\n",
2555 | "views_per_comment float64\n",
2556 | "film_datetime datetime64[ns]\n",
2557 | "ratings_list object\n",
2558 | "dtype: object"
2559 | ]
2560 | },
2561 | "execution_count": 56,
2562 | "metadata": {},
2563 | "output_type": "execute_result"
2564 | }
2565 | ],
2566 | "source": [
2567 | "# object is not just for strings\n",
2568 | "ted.dtypes"
2569 | ]
2570 | },
2571 | {
2572 | "cell_type": "markdown",
2573 | "metadata": {},
2574 | "source": [
2575 | "Lessons:\n",
2576 | "\n",
2577 | "1. Pay attention to data types in pandas\n",
2578 | "2. Use apply any time it is necessary"
2579 | ]
2580 | },
2581 | {
2582 | "cell_type": "markdown",
2583 | "metadata": {},
2584 | "source": [
2585 | "## 7. Count the total number of ratings received by each talk\n",
2586 | "\n",
2587 | "Bonus exercises:\n",
2588 | "\n",
2589 | "- for each talk, calculate the percentage of ratings that were negative\n",
2590 | "- for each talk, calculate the average number of ratings it received per day since it was published"
2591 | ]
2592 | },
2593 | {
2594 | "cell_type": "code",
2595 | "execution_count": 57,
2596 | "metadata": {},
2597 | "outputs": [
2598 | {
2599 | "data": {
2600 | "text/plain": [
2601 | "[{'id': 7, 'name': 'Funny', 'count': 19645},\n",
2602 | " {'id': 1, 'name': 'Beautiful', 'count': 4573},\n",
2603 | " {'id': 9, 'name': 'Ingenious', 'count': 6073},\n",
2604 | " {'id': 3, 'name': 'Courageous', 'count': 3253},\n",
2605 | " {'id': 11, 'name': 'Longwinded', 'count': 387},\n",
2606 | " {'id': 2, 'name': 'Confusing', 'count': 242},\n",
2607 | " {'id': 8, 'name': 'Informative', 'count': 7346},\n",
2608 | " {'id': 22, 'name': 'Fascinating', 'count': 10581},\n",
2609 | " {'id': 21, 'name': 'Unconvincing', 'count': 300},\n",
2610 | " {'id': 24, 'name': 'Persuasive', 'count': 10704},\n",
2611 | " {'id': 23, 'name': 'Jaw-dropping', 'count': 4439},\n",
2612 | " {'id': 25, 'name': 'OK', 'count': 1174},\n",
2613 | " {'id': 26, 'name': 'Obnoxious', 'count': 209},\n",
2614 | " {'id': 10, 'name': 'Inspiring', 'count': 24924}]"
2615 | ]
2616 | },
2617 | "execution_count": 57,
2618 | "metadata": {},
2619 | "output_type": "execute_result"
2620 | }
2621 | ],
2622 | "source": [
2623 | "# expected result (for each talk) is the sum of the 'count' values\n",
2624 | "ted.ratings_list[0]"
2625 | ]
2626 | },
2627 | {
2628 | "cell_type": "code",
2629 | "execution_count": 58,
2630 | "metadata": {},
2631 | "outputs": [],
2632 | "source": [
2633 | "# start by building a simple function\n",
2634 | "def get_num_ratings(list_of_dicts):\n",
2635 | " return list_of_dicts[0]"
2636 | ]
2637 | },
2638 | {
2639 | "cell_type": "code",
2640 | "execution_count": 59,
2641 | "metadata": {},
2642 | "outputs": [
2643 | {
2644 | "data": {
2645 | "text/plain": [
2646 | "{'id': 7, 'name': 'Funny', 'count': 19645}"
2647 | ]
2648 | },
2649 | "execution_count": 59,
2650 | "metadata": {},
2651 | "output_type": "execute_result"
2652 | }
2653 | ],
2654 | "source": [
2655 | "# pass it a list, and it returns the first element in the list, which is a dictionary\n",
2656 | "get_num_ratings(ted.ratings_list[0])"
2657 | ]
2658 | },
2659 | {
2660 | "cell_type": "code",
2661 | "execution_count": 60,
2662 | "metadata": {},
2663 | "outputs": [],
2664 | "source": [
2665 | "# modify the function to return the vote count\n",
2666 | "def get_num_ratings(list_of_dicts):\n",
2667 | " return list_of_dicts[0]['count']"
2668 | ]
2669 | },
2670 | {
2671 | "cell_type": "code",
2672 | "execution_count": 61,
2673 | "metadata": {},
2674 | "outputs": [
2675 | {
2676 | "data": {
2677 | "text/plain": [
2678 | "19645"
2679 | ]
2680 | },
2681 | "execution_count": 61,
2682 | "metadata": {},
2683 | "output_type": "execute_result"
2684 | }
2685 | ],
2686 | "source": [
2687 | "# pass it a list, and it returns a value from the first dictionary in the list\n",
2688 | "get_num_ratings(ted.ratings_list[0])"
2689 | ]
2690 | },
2691 | {
2692 | "cell_type": "code",
2693 | "execution_count": 62,
2694 | "metadata": {},
2695 | "outputs": [],
2696 | "source": [
2697 | "# modify the function to sum the counts across all of the dictionaries\n",
2698 | "def get_num_ratings(list_of_dicts):\n",
2699 | " num = 0\n",
2700 | " for d in list_of_dicts:\n",
2701 | " num = num + d['count']\n",
2702 | " return num"
2703 | ]
2704 | },
2705 | {
2706 | "cell_type": "code",
2707 | "execution_count": 63,
2708 | "metadata": {},
2709 | "outputs": [
2710 | {
2711 | "data": {
2712 | "text/plain": [
2713 | "93850"
2714 | ]
2715 | },
2716 | "execution_count": 63,
2717 | "metadata": {},
2718 | "output_type": "execute_result"
2719 | }
2720 | ],
2721 | "source": [
2722 | "# looks about right\n",
2723 | "get_num_ratings(ted.ratings_list[0])"
2724 | ]
2725 | },
2726 | {
2727 | "cell_type": "code",
2728 | "execution_count": 64,
2729 | "metadata": {},
2730 | "outputs": [
2731 | {
2732 | "data": {
2733 | "text/plain": [
2734 | "[{'id': 7, 'name': 'Funny', 'count': 544},\n",
2735 | " {'id': 3, 'name': 'Courageous', 'count': 139},\n",
2736 | " {'id': 2, 'name': 'Confusing', 'count': 62},\n",
2737 | " {'id': 1, 'name': 'Beautiful', 'count': 58},\n",
2738 | " {'id': 21, 'name': 'Unconvincing', 'count': 258},\n",
2739 | " {'id': 11, 'name': 'Longwinded', 'count': 113},\n",
2740 | " {'id': 8, 'name': 'Informative', 'count': 443},\n",
2741 | " {'id': 10, 'name': 'Inspiring', 'count': 413},\n",
2742 | " {'id': 22, 'name': 'Fascinating', 'count': 132},\n",
2743 | " {'id': 9, 'name': 'Ingenious', 'count': 56},\n",
2744 | " {'id': 24, 'name': 'Persuasive', 'count': 268},\n",
2745 | " {'id': 23, 'name': 'Jaw-dropping', 'count': 116},\n",
2746 | " {'id': 26, 'name': 'Obnoxious', 'count': 131},\n",
2747 | " {'id': 25, 'name': 'OK', 'count': 203}]"
2748 | ]
2749 | },
2750 | "execution_count": 64,
2751 | "metadata": {},
2752 | "output_type": "execute_result"
2753 | }
2754 | ],
2755 | "source": [
2756 | "# check with another record\n",
2757 | "ted.ratings_list[1]"
2758 | ]
2759 | },
2760 | {
2761 | "cell_type": "code",
2762 | "execution_count": 65,
2763 | "metadata": {},
2764 | "outputs": [
2765 | {
2766 | "data": {
2767 | "text/plain": [
2768 | "2936"
2769 | ]
2770 | },
2771 | "execution_count": 65,
2772 | "metadata": {},
2773 | "output_type": "execute_result"
2774 | }
2775 | ],
2776 | "source": [
2777 | "# looks about right\n",
2778 | "get_num_ratings(ted.ratings_list[1])"
2779 | ]
2780 | },
2781 | {
2782 | "cell_type": "code",
2783 | "execution_count": 66,
2784 | "metadata": {},
2785 | "outputs": [
2786 | {
2787 | "data": {
2788 | "text/plain": [
2789 | "0 93850\n",
2790 | "1 2936\n",
2791 | "2 2824\n",
2792 | "3 3728\n",
2793 | "4 25620\n",
2794 | "Name: ratings_list, dtype: int64"
2795 | ]
2796 | },
2797 | "execution_count": 66,
2798 | "metadata": {},
2799 | "output_type": "execute_result"
2800 | }
2801 | ],
2802 | "source": [
2803 | "# apply it to every element in the Series\n",
2804 | "ted.ratings_list.apply(get_num_ratings).head()"
2805 | ]
2806 | },
2807 | {
2808 | "cell_type": "code",
2809 | "execution_count": 67,
2810 | "metadata": {},
2811 | "outputs": [
2812 | {
2813 | "data": {
2814 | "text/plain": [
2815 | "93850"
2816 | ]
2817 | },
2818 | "execution_count": 67,
2819 | "metadata": {},
2820 | "output_type": "execute_result"
2821 | }
2822 | ],
2823 | "source": [
2824 | "# another alternative is to use a generator expression\n",
2825 | "sum(d['count'] for d in ted.ratings_list[0])"
2826 | ]
2827 | },
2828 | {
2829 | "cell_type": "code",
2830 | "execution_count": 68,
2831 | "metadata": {},
2832 | "outputs": [
2833 | {
2834 | "data": {
2835 | "text/plain": [
2836 | "0 93850\n",
2837 | "1 2936\n",
2838 | "2 2824\n",
2839 | "3 3728\n",
2840 | "4 25620\n",
2841 | "Name: ratings_list, dtype: int64"
2842 | ]
2843 | },
2844 | "execution_count": 68,
2845 | "metadata": {},
2846 | "output_type": "execute_result"
2847 | }
2848 | ],
2849 | "source": [
2850 | "# use lambda to apply this approach to every talk\n",
2851 | "ted.ratings_list.apply(lambda x: sum(d['count'] for d in x)).head()"
2852 | ]
2853 | },
2854 | {
2855 | "cell_type": "code",
2856 | "execution_count": 69,
2857 | "metadata": {},
2858 | "outputs": [
2859 | {
2860 | "data": {
2861 | "text/plain": [
2862 | "93850"
2863 | ]
2864 | },
2865 | "execution_count": 69,
2866 | "metadata": {},
2867 | "output_type": "execute_result"
2868 | }
2869 | ],
2870 | "source": [
2871 | "# another alternative is to use pd.DataFrame()\n",
2872 | "pd.DataFrame(ted.ratings_list[0])['count'].sum()"
2873 | ]
2874 | },
2875 | {
2876 | "cell_type": "code",
2877 | "execution_count": 70,
2878 | "metadata": {},
2879 | "outputs": [
2880 | {
2881 | "data": {
2882 | "text/plain": [
2883 | "0 93850\n",
2884 | "1 2936\n",
2885 | "2 2824\n",
2886 | "3 3728\n",
2887 | "4 25620\n",
2888 | "Name: ratings_list, dtype: int64"
2889 | ]
2890 | },
2891 | "execution_count": 70,
2892 | "metadata": {},
2893 | "output_type": "execute_result"
2894 | }
2895 | ],
2896 | "source": [
2897 | "# use lambda to apply this approach to every talk\n",
2898 | "ted.ratings_list.apply(lambda x: pd.DataFrame(x)['count'].sum()).head()"
2899 | ]
2900 | },
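{
"cell_type": "markdown",
"metadata": {},
"source": [
"A further alternative, sketched here rather than taken from the tutorial, avoids writing a per-talk function: `Series.explode()` (available in pandas 0.25 and later) puts each rating dictionary on its own row while repeating the talk's index label, and the counts can then be summed per talk with `groupby(level=0)`. The next cell uses a tiny hypothetical Series `toy_ratings` in place of `ted.ratings_list`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# hypothetical stand-in for ted.ratings_list: one list of dicts per talk\n",
"toy_ratings = pd.Series([\n",
"    [{'id': 7, 'name': 'Funny', 'count': 5}, {'id': 1, 'name': 'Beautiful', 'count': 3}],\n",
"    [{'id': 7, 'name': 'Funny', 'count': 2}],\n",
"])\n",
"\n",
"# one row per rating dictionary; index labels repeat for each talk\n",
"exploded = toy_ratings.explode()\n",
"\n",
"# expand the dictionaries into columns (id, name, count), keeping the index\n",
"flat = pd.DataFrame(exploded.tolist(), index=exploded.index)\n",
"\n",
"# sum the counts back up for each talk\n",
"totals = flat['count'].groupby(level=0).sum()\n",
"totals"
]
},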
2901 | {
2902 | "cell_type": "code",
2903 | "execution_count": 71,
2904 | "metadata": {},
2905 | "outputs": [],
2906 | "source": [
2907 | "ted['num_ratings'] = ted.ratings_list.apply(get_num_ratings)  # store the totals in a new column"
2908 | ]
2909 | },
2910 | {
2911 | "cell_type": "code",
2912 | "execution_count": 72,
2913 | "metadata": {},
2914 | "outputs": [
2915 | {
2916 | "data": {
2917 | "text/plain": [
2918 | "count 2550.000000\n",
2919 | "mean 2436.408235\n",
2920 | "std 4226.795631\n",
2921 | "min 68.000000\n",
2922 | "25% 870.750000\n",
2923 | "50% 1452.500000\n",
2924 | "75% 2506.750000\n",
2925 | "max 93850.000000\n",
2926 | "Name: num_ratings, dtype: float64"
2927 | ]
2928 | },
2929 | "execution_count": 72,
2930 | "metadata": {},
2931 | "output_type": "execute_result"
2932 | }
2933 | ],
2934 | "source": [
2935 | "# do one more check\n",
2936 | "ted.num_ratings.describe()"
2937 | ]
2938 | },
2939 | {
2940 | "cell_type": "markdown",
2941 | "metadata": {},
2942 | "source": [
2943 | "Lessons:\n",
2944 | "\n",
2945 | "1. Write your code in small chunks, and check your work as you go\n",
2946 | "2. Lambda is best for simple functions"
2947 | ]
2948 | },
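{
"cell_type": "markdown",
"metadata": {},
"source": [
"A possible starting point for the first bonus exercise above. Which rating names count as \"negative\" is a judgment call: the set `NEGATIVE` below (Longwinded, Confusing, Unconvincing, Obnoxious) is an assumption, as are the hypothetical helper `pct_negative` and the toy ratings. With the real data you would call `ted.ratings_list.apply(pct_negative)`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# assumption: these rating names are treated as negative ('OK' is left out)\n",
"NEGATIVE = {'Longwinded', 'Confusing', 'Unconvincing', 'Obnoxious'}\n",
"\n",
"def pct_negative(list_of_dicts):\n",
"    # percentage of a talk's ratings whose name is in NEGATIVE (NaN if no ratings)\n",
"    total = sum(d['count'] for d in list_of_dicts)\n",
"    negative = sum(d['count'] for d in list_of_dicts if d['name'] in NEGATIVE)\n",
"    return 100 * negative / total if total else float('nan')\n",
"\n",
"# hypothetical ratings for two talks\n",
"toy_ratings = pd.Series([\n",
"    [{'name': 'Funny', 'count': 75}, {'name': 'Confusing', 'count': 25}],\n",
"    [{'name': 'Inspiring', 'count': 10}],\n",
"])\n",
"pct = toy_ratings.apply(pct_negative)\n",
"pct"
]
},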
2949 | {
2950 | "cell_type": "markdown",
2951 | "metadata": {},
2952 | "source": [
2953 | "## 8. Which occupations deliver the funniest TED talks on average?\n",
2954 | "\n",
2955 | "Bonus exercises:\n",
2956 | "\n",
2957 | "- for each talk, calculate the most frequent rating\n",
2958 | "- for each talk, clean the occupation data so that there's only one occupation per talk"
2959 | ]
2960 | },
2961 | {
2962 | "cell_type": "markdown",
2963 | "metadata": {},
2964 | "source": [
2965 | "### Step 1: Count the number of funny ratings"
2966 | ]
2967 | },
2968 | {
2969 | "cell_type": "code",
2970 | "execution_count": 73,
2971 | "metadata": {},
2972 | "outputs": [
2973 | {
2974 | "data": {
2975 | "text/plain": [
2976 | "0 [{'id': 7, 'name': 'Funny', 'count': 19645}, {...\n",
2977 | "1 [{'id': 7, 'name': 'Funny', 'count': 544}, {'i...\n",
2978 | "2 [{'id': 7, 'name': 'Funny', 'count': 964}, {'i...\n",
2979 | "3 [{'id': 3, 'name': 'Courageous', 'count': 760}...\n",
2980 | "4 [{'id': 9, 'name': 'Ingenious', 'count': 3202}...\n",
2981 | "Name: ratings_list, dtype: object"
2982 | ]
2983 | },
2984 | "execution_count": 73,
2985 | "metadata": {},
2986 | "output_type": "execute_result"
2987 | }
2988 | ],
2989 | "source": [
2990 | "# \"Funny\" is not always the first dictionary in the list\n",
2991 | "ted.ratings_list.head()"
2992 | ]
2993 | },
2994 | {
2995 | "cell_type": "code",
2996 | "execution_count": 74,
2997 | "metadata": {},
2998 | "outputs": [
2999 | {
3000 | "data": {
3001 | "text/plain": [
3002 | "True 2550\n",
3003 | "Name: ratings, dtype: int64"
3004 | ]
3005 | },
3006 | "execution_count": 74,
3007 | "metadata": {},
3008 | "output_type": "execute_result"
3009 | }
3010 | ],
3011 | "source": [
3012 | "# check ratings (not ratings_list) to see whether \"Funny\" is a rating type for every talk\n",
3013 | "ted.ratings.str.contains('Funny').value_counts()"
3014 | ]
3015 | },
3016 | {
3017 | "cell_type": "code",
3018 | "execution_count": 75,
3019 | "metadata": {},
3020 | "outputs": [],
3021 | "source": [
3022 | "# write a custom function that returns the \"Funny\" count from a list of rating dictionaries\n",
3023 | "def get_funny_ratings(list_of_dicts):\n",
3024 | " for d in list_of_dicts:\n",
3025 | " if d['name'] == 'Funny':\n",
3026 | " return d['count']"
3027 | ]
3028 | },
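{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# optional sketch: get_funny_ratings implicitly returns None when no dictionary is named \"Funny\",\n",
"# and apply() keeps that as a missing value, which is why checking for missing values later matters\n",
"# (toy_ratings is hypothetical data in which the second talk has no \"Funny\" rating)\n",
"import pandas as pd\n",
"toy_ratings = pd.Series([\n",
"    [{'id': 7, 'name': 'Funny', 'count': 5}, {'id': 1, 'name': 'Beautiful', 'count': 2}],\n",
"    [{'id': 1, 'name': 'Beautiful', 'count': 4}],\n",
"])\n",
"# counts the missing results; expect 1 for this toy data\n",
"toy_ratings.apply(get_funny_ratings).isna().sum()"
]
},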
3029 | {
3030 | "cell_type": "code",
3031 | "execution_count": 76,
3032 | "metadata": {},
3033 | "outputs": [
3034 | {
3035 | "data": {
3036 | "text/plain": [
3037 | "[{'id': 3, 'name': 'Courageous', 'count': 760},\n",
3038 | " {'id': 1, 'name': 'Beautiful', 'count': 291},\n",
3039 | " {'id': 2, 'name': 'Confusing', 'count': 32},\n",
3040 | " {'id': 7, 'name': 'Funny', 'count': 59},\n",
3041 | " {'id': 9, 'name': 'Ingenious', 'count': 105},\n",
3042 | " {'id': 21, 'name': 'Unconvincing', 'count': 36},\n",
3043 | " {'id': 11, 'name': 'Longwinded', 'count': 53},\n",
3044 | " {'id': 8, 'name': 'Informative', 'count': 380},\n",
3045 | " {'id': 10, 'name': 'Inspiring', 'count': 1070},\n",
3046 | " {'id': 22, 'name': 'Fascinating', 'count': 132},\n",
3047 | " {'id': 24, 'name': 'Persuasive', 'count': 460},\n",
3048 | " {'id': 23, 'name': 'Jaw-dropping', 'count': 230},\n",
3049 | " {'id': 26, 'name': 'Obnoxious', 'count': 35},\n",
3050 | " {'id': 25, 'name': 'OK', 'count': 85}]"
3051 | ]
3052 | },
3053 | "execution_count": 76,
3054 | "metadata": {},
3055 | "output_type": "execute_result"
3056 | }
3057 | ],
3058 | "source": [
3059 | "# examine a record in which \"Funny\" is not the first dictionary\n",
3060 | "ted.ratings_list[3]"
3061 | ]
3062 | },
3063 | {
3064 | "cell_type": "code",
3065 | "execution_count": 77,
3066 | "metadata": {},
3067 | "outputs": [
3068 | {
3069 | "data": {
3070 | "text/plain": [
3071 | "59"
3072 | ]
3073 | },
3074 | "execution_count": 77,
3075 | "metadata": {},
3076 | "output_type": "execute_result"
3077 | }
3078 | ],
3079 | "source": [
3080 | "# check that the function works\n",
3081 | "get_funny_ratings(ted.ratings_list[3])"
3082 | ]
3083 | },
3084 | {
3085 | "cell_type": "code",
3086 | "execution_count": 78,
3087 | "metadata": {},
3088 | "outputs": [
3089 | {
3090 | "data": {
3091 | "text/plain": [
3092 | "0 19645\n",
3093 | "1 544\n",
3094 | "2 964\n",
3095 | "3 59\n",
3096 | "4 1390\n",
3097 | "Name: funny_ratings, dtype: int64"
3098 | ]
3099 | },
3100 | "execution_count": 78,
3101 | "metadata": {},
3102 | "output_type": "execute_result"
3103 | }
3104 | ],
3105 | "source": [
3106 | "# apply it to every element in the Series\n",
3107 | "ted['funny_ratings'] = ted.ratings_list.apply(get_funny_ratings)\n",
3108 | "ted.funny_ratings.head()"
3109 | ]
3110 | },
3111 | {
3112 | "cell_type": "code",
3113 | "execution_count": 79,
3114 | "metadata": {},
3115 | "outputs": [
3116 | {
3117 | "data": {
3118 | "text/plain": [
3119 | "0"
3120 | ]
3121 | },
3122 | "execution_count": 79,
3123 | "metadata": {},
3124 | "output_type": "execute_result"
3125 | }
3126 | ],
3127 | "source": [
3128 | "# check for missing values\n",
3129 | "ted.funny_ratings.isna().sum()"
3130 | ]
3131 | },
3132 | {
3133 | "cell_type": "markdown",
3134 | "metadata": {},
3135 | "source": [
3136 | "### Step 2: Calculate the percentage of ratings that are funny"
3137 | ]
3138 | },
3139 | {
3140 | "cell_type": "code",
3141 | "execution_count": 80,
3142 | "metadata": {},
3143 | "outputs": [],
3144 | "source": [
3145 | "ted['funny_rate'] = ted.funny_ratings / ted.num_ratings"
3146 | ]
3147 | },
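{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# optional sanity check: every funny_rate should be a fraction between 0 and 1\n",
"# (this assumes num_ratings is the total of all rating counts for each talk)\n",
"ted.funny_rate.between(0, 1).all()"
]
},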
3148 | {
3149 | "cell_type": "code",
3150 | "execution_count": 81,
3151 | "metadata": {},
3152 | "outputs": [
3153 | {
3154 | "data": {
3155 | "text/plain": [
3156 | "1849 Science humorist\n",
3157 | "337 Comedian\n",
3158 | "124 Performance poet, multimedia artist\n",
3159 | "315 Expert\n",
3160 | "1168 Social energy entrepreneur\n",
3161 | "1468 Ornithologist\n",
3162 | "595 Comedian, voice artist\n",
3163 | "1534 Cartoon editor\n",
3164 | "97 Satirist\n",
3165 | "2297 Actor, writer\n",
3166 | "568 Comedian\n",
3167 | "675 Data scientist\n",
3168 | "21 Humorist, web artist\n",
3169 | "194 Jugglers\n",
3170 | "2273 Comedian and writer\n",
3171 | "2114 Comedian and writer\n",
3172 | "173 Investor\n",
3173 | "747 Comedian\n",
3174 | "1398 Comedian\n",
3175 | "685 Actor, comedian, playwright\n",
3176 | "Name: speaker_occupation, dtype: object"
3177 | ]
3178 | },
3179 | "execution_count": 81,
3180 | "metadata": {},
3181 | "output_type": "execute_result"
3182 | }
3183 | ],
3184 | "source": [
3185 | "# \"gut check\" that this calculation makes sense by examining the occupations of the funniest talks\n",
3186 | "ted.sort_values('funny_rate').speaker_occupation.tail(20)"
3187 | ]
3188 | },
3189 | {
3190 | "cell_type": "code",
3191 | "execution_count": 82,
3192 | "metadata": {},
3193 | "outputs": [
3194 | {
3195 | "data": {
3196 | "text/plain": [
3197 | "2549 Game designer\n",
3198 | "1612 Biologist\n",
3199 | "612 Sculptor\n",
3200 | "998 Penguin expert\n",
3201 | "593 Engineer\n",
3202 | "284 Space activist\n",
3203 | "1041 Biomedical engineer\n",
3204 | "1618 Spinal cord researcher\n",
3205 | "2132 Computational geneticist\n",
3206 | "442 Sculptor\n",
3207 | "426 Author, thinker\n",
3208 | "458 Educator\n",
3209 | "2437 Environmental engineer\n",
3210 | "1491 Photojournalist\n",
3211 | "1893 Forensic anthropologist\n",
3212 | "783 Marine biologist\n",
3213 | "195 Kenyan MP\n",
3214 | "772 HIV/AIDS fighter\n",
3215 | "788 Building activist\n",
3216 | "936 Neuroengineer\n",
3217 | "Name: speaker_occupation, dtype: object"
3218 | ]
3219 | },
3220 | "execution_count": 82,
3221 | "metadata": {},
3222 | "output_type": "execute_result"
3223 | }
3224 | ],
3225 | "source": [
3226 | "# examine the occupations of the least funny talks\n",
3227 | "ted.sort_values('funny_rate').speaker_occupation.head(20)"
3228 | ]
3229 | },
3230 | {
3231 | "cell_type": "markdown",
3232 | "metadata": {},
3233 | "source": [
3234 | "### Step 3: Analyze the funny rate by occupation"
3235 | ]
3236 | },
3237 | {
3238 | "cell_type": "code",
3239 | "execution_count": 83,
3240 | "metadata": {},
3241 | "outputs": [
3242 | {
3243 | "data": {
3244 | "text/plain": [
3245 | "speaker_occupation\n",
3246 | "Comedian 0.512457\n",
3247 | "Actor, writer 0.515152\n",
3248 | "Actor, comedian, playwright 0.558107\n",
3249 | "Jugglers 0.566828\n",
3250 | "Comedian and writer 0.602085\n",
3251 | "Name: funny_rate, dtype: float64"
3252 | ]
3253 | },
3254 | "execution_count": 83,
3255 | "metadata": {},
3256 | "output_type": "execute_result"
3257 | }
3258 | ],
3259 | "source": [
3260 | "# calculate the mean funny rate for each occupation\n",
3261 | "ted.groupby('speaker_occupation').funny_rate.mean().sort_values().tail()"
3262 | ]
3263 | },
3264 | {
3265 | "cell_type": "code",
3266 | "execution_count": 84,
3267 | "metadata": {},
3268 | "outputs": [
3269 | {
3270 | "data": {
3271 | "text/plain": [
3272 | "count 2544\n",
3273 | "unique 1458\n",
3274 | "top Writer\n",
3275 | "freq 45\n",
3276 | "Name: speaker_occupation, dtype: object"
3277 | ]
3278 | },
3279 | "execution_count": 84,
3280 | "metadata": {},
3281 | "output_type": "execute_result"
3282 | }
3283 | ],
3284 | "source": [
3285 | "# however, there are 1458 unique occupations across 2544 talks, so many occupations have a sample size of 1\n",
3286 | "ted.speaker_occupation.describe()"
3287 | ]
3288 | },
3289 | {
3290 | "cell_type": "markdown",
3291 | "metadata": {},
3292 | "source": [
3293 | "### Step 4: Focus on occupations that are well-represented in the data"
3294 | ]
3295 | },
3296 | {
3297 | "cell_type": "code",
3298 | "execution_count": 85,
3299 | "metadata": {},
3300 | "outputs": [
3301 | {
3302 | "data": {
3303 | "text/plain": [
3304 | "Writer 45\n",
3305 | "Artist 34\n",
3306 | "Designer 34\n",
3307 | "Journalist 33\n",
3308 | "Entrepreneur 31\n",
3309 | "Architect 30\n",
3310 | "Inventor 27\n",
3311 | "Psychologist 26\n",
3312 | "Photographer 25\n",
3313 | "Filmmaker 21\n",
3314 | "Author 20\n",
3315 | "Economist 20\n",
3316 | "Neuroscientist 20\n",
3317 | "Educator 20\n",
3318 | "Roboticist 16\n",
3319 | "Philosopher 16\n",
3320 | "Biologist 15\n",
3321 | "Physicist 14\n",
3322 | "Musician 11\n",
3323 | "Marine biologist 11\n",
3324 | "Technologist 10\n",
3325 | "Activist 10\n",
3326 | "Global health expert; data visionary 10\n",
3327 | "Historian 9\n",
3328 | "Singer/songwriter 9\n",
3329 | "Oceanographer 9\n",
3330 | "Behavioral economist 9\n",
3331 | "Poet 9\n",
3332 | "Astronomer 9\n",
3333 | "Graphic designer 9\n",
3334 | " ..\n",
3335 | "Anatomical artist 1\n",
3336 | "Literary scholar 1\n",
3337 | "Social entrepreneur, lawyer 1\n",
3338 | "Physician, bioengineer and entrepreneur 1\n",
3339 | "medical inventor 1\n",
3340 | "Mental health advocate 1\n",
3341 | "Public sector researcher 1\n",
3342 | "Speleologist 1\n",
3343 | "Disaster relief expert 1\n",
3344 | "Artist and curator 1\n",
3345 | "Finance journalist 1\n",
3346 | "Wildlife conservationist 1\n",
3347 | "Sex worker and activist 1\n",
3348 | "Connector 1\n",
3349 | "Sociologist, human rights activist 1\n",
3350 | "Author, producer 1\n",
3351 | "Painter 1\n",
3352 | "Policy expert 1\n",
3353 | "Environmental economist 1\n",
3354 | "Sound artist, composer 1\n",
3355 | "Senator 1\n",
3356 | "High school principal 1\n",
3357 | "Poet of code 1\n",
3358 | "Healthcare revolutionary 1\n",
3359 | "Circular economy advocate 1\n",
3360 | "Caregiver 1\n",
3361 | "Transportation geek 1\n",
3362 | "Music icon 1\n",
3363 | "Surprisologist 1\n",
3364 | "Psychiatrist and writer 1\n",
3365 | "Name: speaker_occupation, Length: 1458, dtype: int64"
3366 | ]
3367 | },
3368 | "execution_count": 85,
3369 | "metadata": {},
3370 | "output_type": "execute_result"
3371 | }
3372 | ],
3373 | "source": [
3374 | "# count how many times each occupation appears\n",
3375 | "ted.speaker_occupation.value_counts()"
3376 | ]
3377 | },
3378 | {
3379 | "cell_type": "code",
3380 | "execution_count": 86,
3381 | "metadata": {},
3382 | "outputs": [
3383 | {
3384 | "data": {
3385 | "text/plain": [
3386 | "pandas.core.series.Series"
3387 | ]
3388 | },
3389 | "execution_count": 86,
3390 | "metadata": {},
3391 | "output_type": "execute_result"
3392 | }
3393 | ],
3394 | "source": [
3395 | "# value_counts() returns a pandas Series, so we can use pandas methods to manipulate the output\n",
3396 | "occupation_counts = ted.speaker_occupation.value_counts()\n",
3397 | "type(occupation_counts)"
3398 | ]
3399 | },
3400 | {
3401 | "cell_type": "code",
3402 | "execution_count": 87,
3403 | "metadata": {},
3404 | "outputs": [
3405 | {
3406 | "data": {
3407 | "text/plain": [
3408 | "Writer 45\n",
3409 | "Artist 34\n",
3410 | "Designer 34\n",
3411 | "Journalist 33\n",
3412 | "Entrepreneur 31\n",
3413 | "Architect 30\n",
3414 | "Inventor 27\n",
3415 | "Psychologist 26\n",
3416 | "Photographer 25\n",
3417 | "Filmmaker 21\n",
3418 | "Author 20\n",
3419 | "Economist 20\n",
3420 | "Neuroscientist 20\n",
3421 | "Educator 20\n",
3422 | "Roboticist 16\n",
3423 | "Philosopher 16\n",
3424 | "Biologist 15\n",
3425 | "Physicist 14\n",
3426 | "Musician 11\n",
3427 | "Marine biologist 11\n",
3428 | "Technologist 10\n",
3429 | "Activist 10\n",
3430 | "Global health expert; data visionary 10\n",
3431 | "Historian 9\n",
3432 | "Singer/songwriter 9\n",
3433 | "Oceanographer 9\n",
3434 | "Behavioral economist 9\n",
3435 | "Poet 9\n",
3436 | "Astronomer 9\n",
3437 | "Graphic designer 9\n",
3438 | " ..\n",
3439 | "Legal activist 6\n",
3440 | "Photojournalist 6\n",
3441 | "Evolutionary biologist 6\n",
3442 | "Singer-songwriter 6\n",
3443 | "Performance poet, multimedia artist 6\n",
3444 | "Climate advocate 6\n",
3445 | "Techno-illusionist 6\n",
3446 | "Social entrepreneur 6\n",
3447 | "Comedian 6\n",
3448 | "Reporter 6\n",
3449 | "Writer, activist 6\n",
3450 | "Investor and advocate for moral leadership 5\n",
3451 | "Surgeon 5\n",
3452 | "Paleontologist 5\n",
3453 | "Physician 5\n",
3454 | "Tech visionary 5\n",
3455 | "Chef 5\n",
3456 | "Science writer 5\n",
3457 | "Game designer 5\n",
3458 | "Cartoonist 5\n",
3459 | "Producer 5\n",
3460 | "Violinist 5\n",
3461 | "Researcher 5\n",
3462 | "Social Media Theorist 5\n",
3463 | "Environmentalist, futurist 5\n",
3464 | "Data scientist 5\n",
3465 | "Musician, activist 5\n",
3466 | "Sculptor 5\n",
3467 | "Chemist 5\n",
3468 | "Sound consultant 5\n",
3469 | "Name: speaker_occupation, Length: 68, dtype: int64"
3470 | ]
3471 | },
3472 | "execution_count": 87,
3473 | "metadata": {},
3474 | "output_type": "execute_result"
3475 | }
3476 | ],
3477 | "source": [
3478 | "# show occupations which appear at least 5 times\n",
3479 | "occupation_counts[occupation_counts >= 5]"
3480 | ]
3481 | },
3482 | {
3483 | "cell_type": "code",
3484 | "execution_count": 88,
3485 | "metadata": {},
3486 | "outputs": [
3487 | {
3488 | "data": {
3489 | "text/plain": [
3490 | "Index(['Writer', 'Artist', 'Designer', 'Journalist', 'Entrepreneur',\n",
3491 | " 'Architect', 'Inventor', 'Psychologist', 'Photographer', 'Filmmaker',\n",
3492 | " 'Author', 'Economist', 'Neuroscientist', 'Educator', 'Roboticist',\n",
3493 | " 'Philosopher', 'Biologist', 'Physicist', 'Musician', 'Marine biologist',\n",
3494 | " 'Technologist', 'Activist', 'Global health expert; data visionary',\n",
3495 | " 'Historian', 'Singer/songwriter', 'Oceanographer',\n",
3496 | " 'Behavioral economist', 'Poet', 'Astronomer', 'Graphic designer',\n",
3497 | " 'Philanthropist', 'Novelist', 'Social psychologist', 'Engineer',\n",
3498 | " 'Computer scientist', 'Futurist', 'Astrophysicist', 'Mathematician',\n",
3499 | " 'Legal activist', 'Photojournalist', 'Evolutionary biologist',\n",
3500 | " 'Singer-songwriter', 'Performance poet, multimedia artist',\n",
3501 | " 'Climate advocate', 'Techno-illusionist', 'Social entrepreneur',\n",
3502 | " 'Comedian', 'Reporter', 'Writer, activist',\n",
3503 | " 'Investor and advocate for moral leadership', 'Surgeon',\n",
3504 | " 'Paleontologist', 'Physician', 'Tech visionary', 'Chef',\n",
3505 | " 'Science writer', 'Game designer', 'Cartoonist', 'Producer',\n",
3506 | " 'Violinist', 'Researcher', 'Social Media Theorist',\n",
3507 | " 'Environmentalist, futurist', 'Data scientist', 'Musician, activist',\n",
3508 | " 'Sculptor', 'Chemist', 'Sound consultant'],\n",
3509 | " dtype='object')"
3510 | ]
3511 | },
3512 | "execution_count": 88,
3513 | "metadata": {},
3514 | "output_type": "execute_result"
3515 | }
3516 | ],
3517 | "source": [
3518 | "# save the index of this Series\n",
3519 | "top_occupations = occupation_counts[occupation_counts >= 5].index\n",
3520 | "top_occupations"
3521 | ]
3522 | },
3523 | {
3524 | "cell_type": "markdown",
3525 | "metadata": {},
3526 | "source": [
3527 | "### Step 5: Re-analyze the funny rate by occupation (for top occupations only)"
3528 | ]
3529 | },
3530 | {
3531 | "cell_type": "code",
3532 | "execution_count": 89,
3533 | "metadata": {},
3534 | "outputs": [
3535 | {
3536 | "data": {
3537 | "text/plain": [
3538 | "(786, 24)"
3539 | ]
3540 | },
3541 | "execution_count": 89,
3542 | "metadata": {},
3543 | "output_type": "execute_result"
3544 | }
3545 | ],
3546 | "source": [
3547 | "# filter DataFrame to include only those occupations\n",
3548 | "ted_top_occupations = ted[ted.speaker_occupation.isin(top_occupations)]\n",
3549 | "ted_top_occupations.shape"
3550 | ]
3551 | },
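{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# optional sketch: an equivalent filter that maps each talk's occupation to its count\n",
"# (talk_occupation_count and ted_top_alt are illustrative names; a missing occupation maps to NaN,\n",
"# so those talks are excluded here just as they are by isin())\n",
"talk_occupation_count = ted.speaker_occupation.map(occupation_counts)\n",
"ted_top_alt = ted[talk_occupation_count >= 5]\n",
"# True if both approaches select exactly the same rows\n",
"ted_top_alt.index.equals(ted_top_occupations.index)"
]
},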
3552 | {
3553 | "cell_type": "code",
3554 | "execution_count": 90,
3555 | "metadata": {},
3556 | "outputs": [
3557 | {
3558 | "data": {
3559 | "text/plain": [
3560 | "speaker_occupation\n",
3561 | "Surgeon 0.002465\n",
3562 | "Physician 0.004515\n",
3563 | "Photojournalist 0.004908\n",
3564 | "Investor and advocate for moral leadership 0.005198\n",
3565 | "Photographer 0.007152\n",
3566 | "Environmentalist, futurist 0.007317\n",
3567 | "Violinist 0.009534\n",
3568 | "Singer-songwriter 0.010597\n",
3569 | "Chemist 0.010970\n",
3570 | "Philanthropist 0.012522\n",
3571 | "Activist 0.012539\n",
3572 | "Astrophysicist 0.013147\n",
3573 | "Oceanographer 0.014596\n",
3574 | "Paleontologist 0.015780\n",
3575 | "Social psychologist 0.015887\n",
3576 | "Tech visionary 0.016654\n",
3577 | "Sculptor 0.016960\n",
3578 | "Social Media Theorist 0.017450\n",
3579 | "Social entrepreneur 0.017921\n",
3580 | "Inventor 0.021801\n",
3581 | "Sound consultant 0.022011\n",
3582 | "Legal activist 0.022303\n",
3583 | "Historian 0.023215\n",
3584 | "Musician, activist 0.023395\n",
3585 | "Economist 0.025488\n",
3586 | "Writer, activist 0.026665\n",
3587 | "Journalist 0.027997\n",
3588 | "Computer scientist 0.029070\n",
3589 | "Architect 0.030579\n",
3590 | "Engineer 0.031711\n",
3591 | " ... \n",
3592 | "Roboticist 0.042777\n",
3593 | "Astronomer 0.044581\n",
3594 | "Psychologist 0.044984\n",
3595 | "Musician 0.045336\n",
3596 | "Physicist 0.046302\n",
3597 | "Filmmaker 0.048603\n",
3598 | "Futurist 0.050460\n",
3599 | "Behavioral economist 0.050460\n",
3600 | "Technologist 0.050965\n",
3601 | "Chef 0.054207\n",
3602 | "Science writer 0.055993\n",
3603 | "Designer 0.059287\n",
3604 | "Writer 0.060745\n",
3605 | "Game designer 0.062317\n",
3606 | "Reporter 0.066250\n",
3607 | "Evolutionary biologist 0.069157\n",
3608 | "Novelist 0.070876\n",
3609 | "Entrepreneur 0.073295\n",
3610 | "Author 0.075508\n",
3611 | "Artist 0.078939\n",
3612 | "Global health expert; data visionary 0.090306\n",
3613 | "Poet 0.107398\n",
3614 | "Graphic designer 0.135718\n",
3615 | "Techno-illusionist 0.152171\n",
3616 | "Cartoonist 0.162120\n",
3617 | "Data scientist 0.184076\n",
3618 | "Producer 0.202531\n",
3619 | "Singer/songwriter 0.252205\n",
3620 | "Performance poet, multimedia artist 0.306468\n",
3621 | "Comedian 0.512457\n",
3622 | "Name: funny_rate, Length: 68, dtype: float64"
3623 | ]
3624 | },
3625 | "execution_count": 90,
3626 | "metadata": {},
3627 | "output_type": "execute_result"
3628 | }
3629 | ],
3630 | "source": [
3631 | "# redo the previous groupby\n",
3632 | "ted_top_occupations.groupby('speaker_occupation').funny_rate.mean().sort_values()"
3633 | ]
3634 | },
3635 | {
3636 | "cell_type": "markdown",
3637 | "metadata": {},
3638 | "source": [
3639 | "Lessons:\n",
3640 | "\n",
3641 | "1. Check your assumptions about your data\n",
3642 | "2. Check whether your results are reasonable\n",
3643 | "3. Take advantage of the fact that pandas operations often output a DataFrame or a Series\n",
3644 | "4. Watch out for small sample sizes\n",
3645 | "5. Consider the impact of missing data\n",
3646 | "6. Data scientists are hilarious"
3647 | ]
3648 | }
3649 | ],
3650 | "metadata": {
3651 | "kernelspec": {
3652 | "display_name": "Python 3",
3653 | "language": "python",
3654 | "name": "python3"
3655 | },
3656 | "language_info": {
3657 | "codemirror_mode": {
3658 | "name": "ipython",
3659 | "version": 3
3660 | },
3661 | "file_extension": ".py",
3662 | "mimetype": "text/x-python",
3663 | "name": "python",
3664 | "nbconvert_exporter": "python",
3665 | "pygments_lexer": "ipython3",
3666 | "version": "3.7.3"
3667 | }
3668 | },
3669 | "nbformat": 4,
3670 | "nbformat_minor": 2
3671 | }
3672 |
--------------------------------------------------------------------------------
/youtube.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/justmarkham/pycon-2019-tutorial/d522bbda9567437189251fdb2d4477e15c8bfde4/youtube.jpg
--------------------------------------------------------------------------------