├── Clase 1
│   ├── 01_Introducción_y_red_de_transporte.ipynb
│   ├── boeing-osmnx-street-networks.pdf
│   ├── data
│   │   ├── USA-piedmont.graphml
│   │   ├── santa-monica.graphml
│   │   ├── scl-lascondes-network.graphml
│   │   └── scl-network.graphml
│   ├── images
│   │   ├── paris_bldgs.png
│   │   └── piedmont_bldgs.png
│   └── img
│       ├── datagramas.png
│       ├── libro.png
│       └── wikipedia_song.png
├── Clase 2
│   ├── 02_introducción_a_pandas-Formato-Estudiantes.ipynb
│   └── SalesJan2009.csv
├── Clase 3
│   ├── 03_Groupping_&_Apply-Estudiantes.ipynb
│   ├── grouping.png
│   └── grouping2.png
├── Clase 4
│   ├── Seaborn & Matplotlib - Formato Estudiantes.ipynb
│   ├── matplotlib.png
│   └── seaborn.png
├── Clase 5
│   ├── Introducción a MatplotLib-Formato Estudiantes.ipynb
│   ├── Ticks.png
│   ├── axes.png
│   ├── ejercicio1.png
│   ├── ejercicio11.png
│   ├── ejercicio2.png
│   ├── figureParameters.png
│   ├── figureParts.png
│   └── subplots.png
├── Clase 6
│   ├── Clase 6 - Formato Estudiantes.ipynb
│   ├── gqm.png
│   ├── mini_ejercicio1.png
│   ├── mini_ejercicio22.png
│   └── mini_ejercicio222.png
├── Clase 7
│   ├── Final Clase 7 Estudiante.ipynb
│   ├── graf1.png
│   ├── graf2.png
│   ├── graf3.png
│   ├── graf4.png
│   ├── graf5.png
│   ├── prueba_plotly.png
│   ├── timesData.csv
│   └── torpedo.png
├── Clase 8
│   ├── Clase 8.ipynb
│   ├── data-society-major-speeches-by-donald-trump.zip
│   └── data-society-twitters-about-us-airline.zip
└── README.md
/Clase 1/01_Introducción_y_red_de_transporte.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {
6 | "slideshow": {
7 | "slide_type": "slide"
8 | }
9 | },
10 | "source": [
11 | "\n",
12 | " \n",
13 | "\n",
14 | " \n",
15 | "\n",
16 | "# Workshop on Data Handling and Visualization with Python\n",
17 | "Introduction to and motivation for the workshop \n",
18 | "Felipe González P. \n",
19 | "felipe.gonzalezp.12@sansano.usm.cl \n",
20 | "\n",
21 | " \n",
22 | "Thursdays, block 7-8 \n",
23 | "Campus San Joaquin\n",
24 | "\n",
25 | "\n"
26 | ]
27 | },
28 | {
29 | "cell_type": "markdown",
30 | "metadata": {
31 | "slideshow": {
32 | "slide_type": "slide"
33 | }
34 | },
35 | "source": [
36 | "\n",
37 | "*Data by itself is useless. Data is only useful if you apply it* \n",
38 | "\n",
39 | "\n",
40 | " — Todd Park.\n",
41 | "\n",
42 | " \n"
43 | ]
44 | },
45 | {
46 | "cell_type": "markdown",
47 | "metadata": {
48 | "slideshow": {
49 | "slide_type": "slide"
50 | }
51 | },
52 | "source": [
53 | "## What Will We Do?"
54 | ]
55 | },
56 | {
57 | "cell_type": "markdown",
58 | "metadata": {
59 | "slideshow": {
60 | "slide_type": "fragment"
61 | }
62 | },
63 | "source": [
64 | "Develop skills in information analysis and visualization. We will use Python with the Anaconda distribution and the Pandas and Matplotlib libraries."
65 | ]
66 | },
67 | {
68 | "cell_type": "markdown",
69 | "metadata": {
70 | "slideshow": {
71 | "slide_type": "slide"
72 | }
73 | },
74 | "source": [
75 | "\n",
76 | "## Objectives\n"
77 | ]
78 | },
79 | {
80 | "cell_type": "markdown",
81 | "metadata": {
82 | "slideshow": {
83 | "slide_type": "fragment"
84 | }
85 | },
86 | "source": [
87 | "* Extract datasets from the web"
88 | ]
89 | },
90 | {
91 | "cell_type": "markdown",
92 | "metadata": {
93 | "slideshow": {
94 | "slide_type": "fragment"
95 | }
96 | },
97 | "source": [
98 | "* Apply data cleaning techniques\n"
99 | ]
100 | },
101 | {
102 | "cell_type": "markdown",
103 | "metadata": {
104 | "slideshow": {
105 | "slide_type": "fragment"
106 | }
107 | },
108 | "source": [
109 | "* Apply a range of information visualization techniques\n"
110 | ]
111 | },
112 | {
113 | "cell_type": "markdown",
114 | "metadata": {
115 | "slideshow": {
116 | "slide_type": "fragment"
117 | }
118 | },
119 | "source": [
120 | "* Learn about applications of information analysis and visualization in industry\n"
121 | ]
122 | },
123 | {
124 | "cell_type": "markdown",
125 | "metadata": {
126 | "slideshow": {
127 | "slide_type": "fragment"
128 | }
129 | },
130 | "source": [
131 | "* Present information about a given dataset orally and in writing"
132 | ]
133 | },
134 | {
135 | "cell_type": "markdown",
136 | "metadata": {
137 | "slideshow": {
138 | "slide_type": "slide"
139 | }
140 | },
141 | "source": [
142 | "## Evaluation\n",
143 | " "
144 | ]
145 | },
146 | {
147 | "cell_type": "markdown",
148 | "metadata": {
149 | "slideshow": {
150 | "slide_type": "fragment"
151 | }
152 | },
153 | "source": [
154 | "### Final Grade (NF) = 30% Attendance + 5% Preliminary Report + 65% Final Project (40% Presentation + 60% Written Report)\n",
155 | "*A minimum of 80% attendance is required to pass the course.*"
156 | ]
157 | },
158 | {
159 | "cell_type": "markdown",
160 | "metadata": {
161 | "slideshow": {
162 | "slide_type": "slide"
163 | }
164 | },
165 | "source": [
166 | "## References"
167 | ]
168 | },
169 | {
170 | "cell_type": "markdown",
171 | "metadata": {
172 | "slideshow": {
173 | "slide_type": "fragment"
174 | }
175 | },
176 | "source": [
177 | " \n",
178 | "\n",
179 | "\n",
180 | "\n",
181 | "Read the book here \n",
182 | "\n"
183 | ]
184 | },
185 | {
186 | "cell_type": "markdown",
187 | "metadata": {
188 | "slideshow": {
189 | "slide_type": "slide"
190 | }
191 | },
192 | "source": [
193 | "## References"
194 | ]
195 | },
196 | {
197 | "cell_type": "markdown",
198 | "metadata": {
199 | "slideshow": {
200 | "slide_type": "fragment"
201 | }
202 | },
203 | "source": [
204 | " \n"
210 | ]
211 | },
212 | {
213 | "cell_type": "markdown",
214 | "metadata": {
215 | "slideshow": {
216 | "slide_type": "slide"
217 | }
218 | },
219 | "source": [
220 | "## Anaconda\n"
221 | ]
222 | },
223 | {
224 | "cell_type": "markdown",
225 | "metadata": {
226 | "slideshow": {
227 | "slide_type": "fragment"
228 | }
229 | },
230 | "source": [
231 | "* Bills itself as: *\"The Most Popular Python Data Science Platform\"*\n",
232 | "* A freemium tool that supports Python and R. \n",
233 | "* Used for large-scale data processing, predictive analytics, and scientific computing. \n",
234 | "* Its goal is to simplify package management and deployment\n",
235 | "\n",
236 | "\n",
237 | "\n",
238 | "Download Anaconda here \n",
239 | " **Important: Python 2.7 will be used throughout the workshop**\n",
240 | "\n"
241 | ]
242 | },
243 | {
244 | "cell_type": "markdown",
245 | "metadata": {
246 | "slideshow": {
247 | "slide_type": "slide"
248 | }
249 | },
250 | "source": [
251 | "# Installing Anaconda\n",
252 | "* Anaconda installation tutorial \n",
253 | "\n",
254 | "# Running Anaconda\n",
255 | "* On the command line, type: jupyter notebook\n"
256 | ]
257 | },
258 | {
259 | "cell_type": "markdown",
260 | "metadata": {
261 | "slideshow": {
262 | "slide_type": "slide"
263 | }
264 | },
265 | "source": [
266 | " LET'S GET STARTED! "
267 | ]
268 | },
269 | {
270 | "cell_type": "markdown",
271 | "metadata": {
272 | "slideshow": {
273 | "slide_type": "slide"
274 | }
275 | },
276 | "source": [
277 | "# OpenStreetMap and the transport network\n",
278 | "\n",
279 | "Credits: Diego Caro and Eduardo Graells-Garrido of the Instituto de Data Science, Ingeniería UDD.\n",
280 | "\n",
281 | "[OpenStreetMap](https://www.openstreetmap.org/) is a collaborative initiative to create an editable map of the world. In other words, Wikipedia is to Encyclopædia Britannica what OpenStreetMap is to Google Maps.\n",
282 | "\n",
283 | "In today's exercise we will see how to compute the shortest route between two locations using information available in OpenStreetMap.\n",
284 | "\n",
285 | "To process OpenStreetMap data we will use the Python module [OSMnx](https://github.com/gboeing/osmnx). OSMnx is developed by [Geoff Boeing](https://twitter.com/gboeing). The shortest-route analysis is done with the [NetworkX](https://networkx.github.io/) module. \n",
286 | "\n"
287 | ]
288 | },
289 | {
290 | "cell_type": "markdown",
291 | "metadata": {
292 | "slideshow": {
293 | "slide_type": "slide"
294 | }
295 | },
296 | "source": [
297 | "*OSMnx lets you **download spatial geometries and construct, project, visualize, and analyze complex street networks**. It allows you to **automate the collection and computational analysis of street networks** for powerful and consistent research, transportation engineering, and urban design. OSMnx is built on top of NetworkX, matplotlib, and geopandas for rich network analytic capabilities, beautiful and simple visualizations, and fast spatial queries with R-tree.*\n",
298 | "\n",
299 | "### Documentation: http://osmnx.readthedocs.io/en/stable/osmnx.html"
300 | ]
301 | },
302 | {
303 | "cell_type": "markdown",
304 | "metadata": {
305 | "slideshow": {
306 | "slide_type": "slide"
307 | }
308 | },
309 | "source": [
310 | "See the OSMnx [paper](http://geoffboeing.com/publications/osmnx-complex-street-networks/): \n",
311 | "* Boeing, G. 2017. “OSMnx: New Methods for Acquiring, Constructing, Analyzing, and Visualizing Complex Street Networks.” Computers, Environment and Urban Systems. 65, 126-139. doi:10.1016/j.compenvurbsys.2017.05.004"
312 | ]
313 | },
314 | {
315 | "cell_type": "code",
316 | "execution_count": null,
317 | "metadata": {
318 | "slideshow": {
319 | "slide_type": "slide"
320 | }
321 | },
322 | "outputs": [],
323 | "source": [
324 | "import osmnx as ox\n",
325 | "import networkx as nx\n",
326 | "import matplotlib.pyplot as plt\n",
327 | "\n",
328 | "%config InlineBackend.figure_format = 'retina' # higher resolution\n",
329 | "%matplotlib inline\n"
330 | ]
331 | },
332 | {
333 | "cell_type": "markdown",
334 | "metadata": {
335 | "slideshow": {
336 | "slide_type": "slide"
337 | }
338 | },
339 | "source": [
340 | "# Downloading the Map\n",
341 | "The function [ox.graph_from_place(...)](http://osmnx.readthedocs.io/en/stable/osmnx.html?highlight=graph_from_place#osmnx.core.graph_from_place) downloads the map of an area (commune, region, or country). For example, the commune of San Joaquín."
342 | ]
343 | },
344 | {
345 | "cell_type": "code",
346 | "execution_count": null,
347 | "metadata": {
348 | "slideshow": {
349 | "slide_type": "fragment"
350 | }
351 | },
352 | "outputs": [],
353 | "source": [
354 | "# Download the map of the commune of Santiago\n",
355 | "#G = ox.graph_from_place('Santiago, Chile', network_type='drive')\n",
356 | "\n",
357 | "\n",
358 | "# Save it for later use :)\n",
359 | "#ox.save_graphml(G, filename='./scl-network.graphml')\n",
360 | "\n"
361 | ]
362 | },
363 | {
364 | "cell_type": "code",
365 | "execution_count": null,
366 | "metadata": {
367 | "slideshow": {
368 | "slide_type": "fragment"
369 | }
370 | },
371 | "outputs": [],
372 | "source": [
373 | "# Load a previously saved map\n",
374 | "G = ox.load_graphml('./scl-network.graphml')\n"
375 | ]
376 | },
377 | {
378 | "cell_type": "markdown",
379 | "metadata": {
380 | "slideshow": {
381 | "slide_type": "slide"
382 | }
383 | },
384 | "source": [
385 | "The following figure shows the map of the commune. Each node (point) marks a corner or street intersection, and each line represents a street."
386 | ]
387 | },
388 | {
389 | "cell_type": "code",
390 | "execution_count": null,
391 | "metadata": {
392 | "slideshow": {
393 | "slide_type": "fragment"
394 | }
395 | },
396 | "outputs": [],
397 | "source": [
398 | "ox.plot_graph(ox.project_graph(G))\n"
399 | ]
400 | },
401 | {
402 | "cell_type": "markdown",
403 | "metadata": {
404 | "slideshow": {
405 | "slide_type": "slide"
406 | }
407 | },
408 | "source": [
409 | "# How many square meters does the commune of Santiago cover?"
410 | ]
411 | },
412 | {
413 | "cell_type": "code",
414 | "execution_count": null,
415 | "metadata": {
416 | "slideshow": {
417 | "slide_type": "fragment"
418 | }
419 | },
420 | "outputs": [],
421 | "source": [
422 | "G_proj = ox.project_graph(G)\n",
423 | "nodes_proj = ox.graph_to_gdfs(G_proj, edges=False)\n",
424 | "graph_area_m = nodes_proj.unary_union.convex_hull.area\n",
425 | "graph_area_m\n",
426 | "print(\"Santiago covers \" + str(graph_area_m) + \" square meters\")\n",
427 | "print(\"Santiago covers \" + str(graph_area_m / 1e6) + \" square kilometers\") # 1 km^2 = 1e6 m^2\n"
428 | ]
429 | },
430 | {
431 | "cell_type": "markdown",
432 | "metadata": {
433 | "slideshow": {
434 | "slide_type": "slide"
435 | }
436 | },
437 | "source": [
438 | "## Shortest route\n",
439 | "\n",
440 | "In this example we will compute the shortest route between Palacio Cousiño and Palacio de La Moneda.\n",
441 | "\n",
442 | "The first step is to translate an address into a latitude/longitude coordinate. This process is known as [Geocoding](https://en.wikipedia.org/wiki/Geocoding). We can use Google Maps as a geocoder:\n",
443 | "* Palacio Cousiño https://www.google.cl/maps/place/Palacio+Cousi%C3%B1o/@-33.4497625,-70.6600015,16z/data=!4m8!1m2!2m1!1sla+moneda!3m4!1s0x9662c508abe3af79:0xa6201fda2ff7103b!8m2!3d-33.452104!4d-70.6568125\n",
444 | "* Palacio de la Moneda\n",
445 | "https://www.google.cl/maps/place/La+Moneda+Palace/@-33.4432722,-70.6560104,17z/data=!4m8!1m2!2m1!1spalacio+la+moneda!3m4!1s0x9662c5a6fd47e465:0x5d0fa12b4d88ae82!8m2!3d-33.4429091!4d-70.6538699?hl=en\n"
446 | ]
447 | },
448 | {
449 | "cell_type": "markdown",
450 | "metadata": {
451 | "slideshow": {
452 | "slide_type": "slide"
453 | }
454 | },
455 | "source": [
456 | "Define the origin and destination points"
457 | ]
458 | },
459 | {
460 | "cell_type": "code",
461 | "execution_count": null,
462 | "metadata": {
463 | "slideshow": {
464 | "slide_type": "fragment"
465 | }
466 | },
467 | "outputs": [],
468 | "source": [
469 | "origin_point = (-33.4497625,-70.6600015) # Palacio Cousiño\n",
470 | "destination_point = (-33.4432722,-70.6560104) # Palacio de La Moneda\n"
471 | ]
472 | },
473 | {
474 | "cell_type": "markdown",
475 | "metadata": {
476 | "slideshow": {
477 | "slide_type": "slide"
478 | }
479 | },
480 | "source": [
481 | "To find the route between two locations, we first need the node (intersection) nearest to the origin point and the node nearest to the destination point. The function ``ox.get_nearest_node(...)`` does this for us."
482 | ]
483 | },
484 | {
485 | "cell_type": "code",
486 | "execution_count": null,
487 | "metadata": {
488 | "slideshow": {
489 | "slide_type": "fragment"
490 | }
491 | },
492 | "outputs": [],
493 | "source": [
494 | "origin_node = ox.get_nearest_node(G, origin_point) \n",
495 | "destination_node = ox.get_nearest_node(G, destination_point)\n",
496 | "origin_node, destination_node # these are the IDs of nodes within the map"
497 | ]
498 | },
499 | {
500 | "cell_type": "markdown",
501 | "metadata": {
502 | "slideshow": {
503 | "slide_type": "slide"
504 | }
505 | },
506 | "source": [
507 | "The shortest distance is computed with the function ``nx.shortest_path_length(...)``. With ``weight='length'``, the result is given in meters."
508 | ]
509 | },
510 | {
511 | "cell_type": "code",
512 | "execution_count": null,
513 | "metadata": {
514 | "slideshow": {
515 | "slide_type": "fragment"
516 | }
517 | },
518 | "outputs": [],
519 | "source": [
520 | "distance = nx.shortest_path_length(G, origin_node, destination_node, weight='length')\n",
521 | "distance\n"
522 | ]
523 | },
524 | {
525 | "cell_type": "code",
526 | "execution_count": null,
527 | "metadata": {
528 | "slideshow": {
529 | "slide_type": "slide"
530 | }
531 | },
532 | "outputs": [],
533 | "source": [
534 | "# Find the shortest path between the origin and destination points\n",
535 | "route = nx.shortest_path(G, origin_node, destination_node, weight='length')\n",
536 | "str(route)"
537 | ]
538 | },
539 | {
540 | "cell_type": "code",
541 | "execution_count": null,
542 | "metadata": {
543 | "slideshow": {
544 | "slide_type": "slide"
545 | }
546 | },
547 | "outputs": [],
548 | "source": [
549 | "# plot the route showing origin/destination lat-long points in blue\n",
550 | "fig, ax = ox.plot_graph_route(G, route, origin_point=origin_point, destination_point=destination_point)"
551 | ]
552 | },
553 | {
554 | "cell_type": "markdown",
555 | "metadata": {
556 | "slideshow": {
557 | "slide_type": "slide"
558 | }
559 | },
560 | "source": [
561 | "# Exercises\n",
562 | "\n",
563 | "1. Compute the shortest distance between your home and the university.\n",
564 | "2. What other uses could this data have?"
565 | ]
566 | },
567 | {
568 | "cell_type": "code",
569 | "execution_count": null,
570 | "metadata": {
571 | "slideshow": {
572 | "slide_type": "slide"
573 | }
574 | },
575 | "outputs": [],
576 | "source": [
577 | "# To download the map of the whole metropolitan region instead\n",
578 | "#G = ox.graph_from_place('Provincia de Santiago, Chile', network_type='drive')\n"
579 | ]
580 | },
581 | {
582 | "cell_type": "markdown",
583 | "metadata": {
584 | "slideshow": {
585 | "slide_type": "slide"
586 | }
587 | },
588 | "source": [
589 | " More on OSMnx "
590 | ]
591 | },
592 | {
593 | "cell_type": "code",
594 | "execution_count": null,
595 | "metadata": {
596 | "slideshow": {
597 | "slide_type": "slide"
598 | }
599 | },
600 | "outputs": [],
601 | "source": [
602 | "import osmnx as ox\n",
603 | "import networkx as nx\n",
604 | "import matplotlib.cm as cm\n",
605 | "import matplotlib.colors as colors\n",
606 | "import pandas as pd\n",
607 | "%matplotlib inline\n",
608 | "ox.config(log_console=True, use_cache=True)"
609 | ]
610 | },
611 | {
612 | "cell_type": "markdown",
613 | "metadata": {},
614 | "source": [
615 | "# Download the Map"
616 | ]
617 | },
618 | {
619 | "cell_type": "code",
620 | "execution_count": null,
621 | "metadata": {
622 | "slideshow": {
623 | "slide_type": "slide"
624 | }
625 | },
626 | "outputs": [],
627 | "source": [
628 | "# Let's look at a city in the United States\n",
629 | "#M = ox.graph_from_place('Piedmont, California', network_type='drive') # use 'walk' to see footpaths\n",
630 | "#M = ox.project_graph(M)\n",
631 | "#ox.save_graphml(M, filename='./USA-piedmont.graphml')\n",
632 | "\n"
633 | ]
634 | },
635 | {
636 | "cell_type": "markdown",
637 | "metadata": {
638 | "slideshow": {
639 | "slide_type": "slide"
640 | }
641 | },
642 | "source": [
643 | "# Load the map and display it"
644 | ]
645 | },
646 | {
647 | "cell_type": "code",
648 | "execution_count": null,
649 | "metadata": {
650 | "slideshow": {
651 | "slide_type": "fragment"
652 | }
653 | },
654 | "outputs": [],
655 | "source": [
656 | "M = ox.load_graphml('./USA-piedmont.graphml')\n",
657 | "fig, ax = ox.plot_graph(M, bgcolor='k', node_size=30, node_color='#999999', node_edgecolor='none', node_zorder=2,\n",
658 | " edge_color='#555555', edge_linewidth=1.5, edge_alpha=1)"
659 | ]
660 | },
661 | {
662 | "cell_type": "markdown",
663 | "metadata": {
664 | "slideshow": {
665 | "slide_type": "slide"
666 | }
667 | },
668 | "source": [
669 | "# Compute and visualize central nodes"
670 | ]
671 | },
672 | {
673 | "cell_type": "code",
674 | "execution_count": null,
675 | "metadata": {
676 | "slideshow": {
677 | "slide_type": "fragment"
678 | }
679 | },
680 | "outputs": [],
681 | "source": [
682 | "node_centrality = nx.closeness_centrality(M)\n"
683 | ]
684 | },
685 | {
686 | "cell_type": "code",
687 | "execution_count": null,
688 | "metadata": {
689 | "slideshow": {
690 | "slide_type": "fragment"
691 | }
692 | },
693 | "outputs": [],
694 | "source": [
695 | "# Let's display it\n",
696 | "df = pd.DataFrame(data=pd.Series(node_centrality).sort_values(), columns=['cc'])\n",
697 | "df['colors'] = ox.get_colors(n=len(df), cmap='inferno', start=0.2)\n",
698 | "df = df.reindex(M.nodes())\n",
699 | "nc = df['colors'].tolist()\n",
700 | "fig, ax = ox.plot_graph(M, bgcolor='k', node_size=30, node_color=nc, node_edgecolor='none', node_zorder=2,\n",
701 | " edge_color='#555555', edge_linewidth=1.5, edge_alpha=1)"
702 | ]
703 | },
704 | {
705 | "cell_type": "markdown",
706 | "metadata": {
707 | "slideshow": {
708 | "slide_type": "slide"
709 | }
710 | },
711 | "source": [
712 | " Looking at the Arc de Triomphe "
713 | ]
714 | },
715 | {
716 | "cell_type": "code",
717 | "execution_count": null,
718 | "metadata": {
719 | "slideshow": {
720 | "slide_type": "slide"
721 | }
722 | },
723 | "outputs": [],
724 | "source": [
725 | "import osmnx as ox\n",
726 | "from IPython.display import Image\n",
727 | "\n",
728 | "# configuration \n",
729 | "img_folder = 'images'\n",
730 | "extension = 'png'\n",
731 | "size = 240"
732 | ]
733 | },
734 | {
735 | "cell_type": "code",
736 | "execution_count": null,
737 | "metadata": {
738 | "slideshow": {
739 | "slide_type": "slide"
740 | }
741 | },
742 | "outputs": [],
743 | "source": [
744 | "point = (48.873446, 2.294255)\n",
745 | "dist = 612\n",
746 | "gdf = ox.buildings_from_point(point=point, distance=dist)\n",
747 | "gdf_proj = ox.project_gdf(gdf)\n",
748 | "bbox = ox.bbox_from_point(point=point, distance=dist, project_utm=True)\n",
749 | "fig, ax = ox.plot_buildings(gdf_proj, bgcolor='#333333', color='w', figsize=(4,4), bbox=bbox,\n",
750 | " save=True, show=False, close=True, filename='paris_bldgs', dpi=90)\n",
751 | "Image('{}/{}.{}'.format(img_folder, 'paris_bldgs', extension), height=size, width=size)"
752 | ]
753 | },
754 | {
755 | "cell_type": "markdown",
756 | "metadata": {
757 | "slideshow": {
758 | "slide_type": "fragment"
759 | }
760 | },
761 | "source": [
762 | "# How cumbersome :( !"
763 | ]
764 | },
765 | {
766 | "cell_type": "markdown",
767 | "metadata": {
768 | "slideshow": {
769 | "slide_type": "slide"
770 | }
771 | },
772 | "source": [
773 | "# Now with functions"
774 | ]
775 | },
776 | {
777 | "cell_type": "code",
778 | "execution_count": null,
779 | "metadata": {
780 | "slideshow": {
781 | "slide_type": "fragment"
782 | }
783 | },
784 | "outputs": [],
785 | "source": [
786 | "# helper function to get one-square-mile street networks, building footprints, and plot them\n",
787 | "def make_plot(place, point, network_type='drive', bldg_color='orange', dpi=40,\n",
788 | " dist=805, default_width=4, street_widths=None):\n",
789 | " gdf = ox.buildings_from_point(point=point, distance=dist)\n",
790 | " gdf_proj = ox.project_gdf(gdf)\n",
791 | " fig, ax = ox.plot_figure_ground(point=point, dist=dist, network_type=network_type, default_width=default_width,\n",
792 | " street_widths=street_widths, save=False, show=False, close=True)\n",
793 | " fig, ax = ox.plot_buildings(gdf_proj, fig=fig, ax=ax, color=bldg_color, set_bounds=False,\n",
794 | " save=True, show=False, close=True, filename=place, dpi=dpi)"
795 | ]
796 | },
797 | {
798 | "cell_type": "code",
799 | "execution_count": null,
800 | "metadata": {
801 | "slideshow": {
802 | "slide_type": "fragment"
803 | }
804 | },
805 | "outputs": [],
806 | "source": [
807 | "point = (48.873446, 2.294255)\n",
808 | "place = 'paris_bldgs'\n",
809 | "\n",
810 | "make_plot(place, point)\n",
811 | "Image('{}/{}.{}'.format(img_folder, place, extension), height=size, width=size)\n"
812 | ]
813 | },
814 | {
815 | "cell_type": "markdown",
816 | "metadata": {
817 | "slideshow": {
818 | "slide_type": "slide"
819 | }
820 | },
821 | "source": [
822 | "# Get the main statistics\n",
823 | "\n",
824 | "### Documentation: https://osmnx.readthedocs.io/en/stable/osmnx.html#module-osmnx.stats"
825 | ]
826 | },
827 | {
828 | "cell_type": "code",
829 | "execution_count": null,
830 | "metadata": {
831 | "slideshow": {
832 | "slide_type": "fragment"
833 | }
834 | },
835 | "outputs": [],
836 | "source": [
837 | "basic_stats = ox.basic_stats(M)\n",
838 | "for key in basic_stats:\n",
839 | " print(\"{} {}\".format(key, basic_stats[key]))"
840 | ]
841 | },
842 | {
843 | "cell_type": "markdown",
844 | "metadata": {
845 | "slideshow": {
846 | "slide_type": "slide"
847 | }
848 | },
849 | "source": [
850 | "# Get more information\n",
851 | "### Turn topological analyses on or off\n"
852 | ]
853 | },
854 | {
855 | "cell_type": "code",
856 | "execution_count": null,
857 | "metadata": {
858 | "slideshow": {
859 | "slide_type": "fragment"
860 | }
861 | },
862 | "outputs": [],
863 | "source": [
864 | "more_stats = ox.extended_stats(M, ecc=True, bc=True, cc=True) \n",
865 | "for key in sorted(more_stats.keys()):\n",
866 | " print(key)"
867 | ]
868 | },
869 | {
870 | "cell_type": "markdown",
871 | "metadata": {
872 | "slideshow": {
873 | "slide_type": "slide"
874 | }
875 | },
876 | "source": [
877 | " Questions? "
878 | ]
879 | }
880 | ],
881 | "metadata": {
882 | "anaconda-cloud": {},
883 | "celltoolbar": "Slideshow",
884 | "kernelspec": {
885 | "display_name": "Python 2",
886 | "language": "python",
887 | "name": "python2"
888 | },
889 | "language_info": {
890 | "codemirror_mode": {
891 | "name": "ipython",
892 | "version": 2
893 | },
894 | "file_extension": ".py",
895 | "mimetype": "text/x-python",
896 | "name": "python",
897 | "nbconvert_exporter": "python",
898 | "pygments_lexer": "ipython2",
899 | "version": "2.7.14"
900 | }
901 | },
902 | "nbformat": 4,
903 | "nbformat_minor": 1
904 | }
905 |
--------------------------------------------------------------------------------
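The notebook above delegates the shortest-route computation to `nx.shortest_path(...)` and `nx.shortest_path_length(...)` with `weight='length'`; for weighted graphs NetworkX uses Dijkstra's algorithm by default. The following is a minimal sketch of that idea in plain Python, not NetworkX's actual implementation. The function name `shortest_path` and the four-node `toy_streets` graph are hypothetical and exist only for illustration; edge values stand in for street lengths in meters.

```python
import heapq


def shortest_path(graph, source, target):
    """Dijkstra on a dict-of-dicts graph where graph[u][v] is an edge length.

    Returns (total_length, list_of_nodes). Hypothetical helper for illustration.
    """
    dist = {source: 0.0}   # best known distance to each node
    prev = {}              # predecessor of each node on its best path
    queue = [(0.0, source)]
    visited = set()
    while queue:
        d, u = heapq.heappop(queue)
        if u in visited:
            continue       # stale queue entry, a shorter path was already settled
        visited.add(u)
        if u == target:
            break
        for v, length in graph.get(u, {}).items():
            candidate = d + length
            if candidate < dist.get(v, float("inf")):
                dist[v] = candidate
                prev[v] = u
                heapq.heappush(queue, (candidate, v))
    if target not in dist:
        raise ValueError("no route from {} to {}".format(source, target))
    # Walk the predecessors back from the target to rebuild the route.
    route = [target]
    while route[-1] != source:
        route.append(prev[route[-1]])
    route.reverse()
    return dist[target], route


# Toy directed street network (assumed data): intersections A-D, lengths in meters.
toy_streets = {
    "A": {"B": 120.0, "C": 300.0},
    "B": {"C": 100.0, "D": 400.0},
    "C": {"D": 150.0},
    "D": {},
}

distance_m, route = shortest_path(toy_streets, "A", "D")
print(distance_m, route)
```

In the notebook, `origin_node` and `destination_node` play the role of `"A"` and `"D"`, and the graph `G` loaded by OSMnx carries the `length` attribute on each edge.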
/Clase 1/boeing-osmnx-street-networks.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gonzalezf/Data-Analysis-and-Visualization-with-Python/475bb944a004bc3e5b2f94d7484b76fefe39a3ca/Clase 1/boeing-osmnx-street-networks.pdf
--------------------------------------------------------------------------------
/Clase 1/images/paris_bldgs.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gonzalezf/Data-Analysis-and-Visualization-with-Python/475bb944a004bc3e5b2f94d7484b76fefe39a3ca/Clase 1/images/paris_bldgs.png
--------------------------------------------------------------------------------
/Clase 1/images/piedmont_bldgs.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gonzalezf/Data-Analysis-and-Visualization-with-Python/475bb944a004bc3e5b2f94d7484b76fefe39a3ca/Clase 1/images/piedmont_bldgs.png
--------------------------------------------------------------------------------
/Clase 1/img/datagramas.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gonzalezf/Data-Analysis-and-Visualization-with-Python/475bb944a004bc3e5b2f94d7484b76fefe39a3ca/Clase 1/img/datagramas.png
--------------------------------------------------------------------------------
/Clase 1/img/libro.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gonzalezf/Data-Analysis-and-Visualization-with-Python/475bb944a004bc3e5b2f94d7484b76fefe39a3ca/Clase 1/img/libro.png
--------------------------------------------------------------------------------
/Clase 1/img/wikipedia_song.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gonzalezf/Data-Analysis-and-Visualization-with-Python/475bb944a004bc3e5b2f94d7484b76fefe39a3ca/Clase 1/img/wikipedia_song.png
--------------------------------------------------------------------------------
/Clase 2/02_introducción_a_pandas-Formato-Estudiantes.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {
6 | "slideshow": {
7 | "slide_type": "slide"
8 | }
9 | },
10 | "source": [
11 | "\n",
12 | " \n",
13 | "\n",
14 | " \n",
15 | "\n",
16 | "# Workshop on Data Handling and Visualization with Python\n",
17 | "### Introduction to Pandas \n",
18 | "\n",
19 | "Felipe González P. \n",
20 | "felipe.gonzalezp.12@sansano.usm.cl \n",
21 | "\n",
22 | "\n",
23 | "\n",
24 | "\n"
25 | ]
26 | },
27 | {
28 | "cell_type": "markdown",
29 | "metadata": {
30 | "slideshow": {
31 | "slide_type": "slide"
32 | }
33 | },
34 | "source": [
35 | "## What is NumPy?"
36 | ]
37 | },
38 | {
39 | "cell_type": "markdown",
40 | "metadata": {
41 | "slideshow": {
42 | "slide_type": "fragment"
43 | }
44 | },
45 | "source": [
46 | "NumPy is a Python extension that adds stronger support for vectors and matrices, along with a library of high-level mathematical functions for operating on them. "
47 | ]
48 | },
49 | {
50 | "cell_type": "markdown",
51 | "metadata": {
52 | "slideshow": {
53 | "slide_type": "slide"
54 | }
55 | },
56 | "source": [
57 | "## Example"
58 | ]
59 | },
60 | {
61 | "cell_type": "code",
62 | "execution_count": null,
63 | "metadata": {
64 | "slideshow": {
65 | "slide_type": "fragment"
66 | }
67 | },
68 | "outputs": [],
69 | "source": [
70 | "import numpy\n",
71 | "from matplotlib import pyplot\n",
72 | "%matplotlib inline\n",
73 | "x = numpy.linspace(0, 2 * numpy.pi,10) # try varying the last value; the default is 50\n",
74 | "y = numpy.sin(x)\n",
75 | "pyplot.plot(x, y)\n",
76 | "pyplot.show()\n",
77 | "\n"
78 | ]
79 | },
80 | {
81 | "cell_type": "markdown",
82 | "metadata": {
83 | "slideshow": {
84 | "slide_type": "slide"
85 | }
86 | },
87 | "source": [
88 | "## What is Pandas?"
89 | ]
90 | },
91 | {
92 | "cell_type": "markdown",
93 | "metadata": {
94 | "slideshow": {
95 | "slide_type": "fragment"
96 | }
97 | },
98 | "source": [
99 | "* A library built on top of NumPy for data manipulation and analysis in Python. \n",
100 | "* Addresses some of NumPy's limitations when working with labeled, mixed-type tabular data\n",
101 | "* Very useful for working with data that is not well organized. "
102 | ]
103 | },
104 | {
105 | "cell_type": "markdown",
106 | "metadata": {
107 | "slideshow": {
108 | "slide_type": "slide"
109 | }
110 | },
111 | "source": [
112 | " "
113 | ]
114 | },
115 | {
116 | "cell_type": "markdown",
117 | "metadata": {
118 | "slideshow": {
119 | "slide_type": "slide"
120 | }
121 | },
122 | "source": [
123 | "\n",
124 | "## Importing Pandas\n"
125 | ]
126 | },
127 | {
128 | "cell_type": "code",
129 | "execution_count": null,
130 | "metadata": {
131 | "slideshow": {
132 | "slide_type": "fragment"
133 | }
134 | },
135 | "outputs": [],
136 | "source": [
137 | "import pandas as pd\n",
138 | "pd.__version__\n"
139 | ]
140 | },
141 | {
142 | "cell_type": "markdown",
143 | "metadata": {
144 | "slideshow": {
145 | "slide_type": "slide"
146 | }
147 | },
148 | "source": [
149 | "# Reminder"
150 | ]
151 | },
152 | {
153 | "cell_type": "code",
154 | "execution_count": null,
155 | "metadata": {
156 | "slideshow": {
157 | "slide_type": "fragment"
158 | }
159 | },
160 | "outputs": [],
161 | "source": [
162 | "#pd.\n",
163 | "pd?"
164 | ]
165 | },
166 | {
167 | "cell_type": "markdown",
168 | "metadata": {
169 | "slideshow": {
170 | "slide_type": "slide"
171 | }
172 | },
173 | "source": [
174 | " What will we do with Pandas? "
175 | ]
176 | },
177 | {
178 | "cell_type": "markdown",
179 | "metadata": {
180 | "slideshow": {
181 | "slide_type": "slide"
182 | }
183 | },
184 | "source": [
185 | "# Data Munging\n",
186 | "\n",
187 | "The process of transforming \"raw\" data into another format to make it more suitable and valuable for analysis. \n"
188 | ]
189 | },
190 | {
191 | "cell_type": "markdown",
192 | "metadata": {
193 | "slideshow": {
194 | "slide_type": "slide"
195 | }
196 | },
197 | "source": [
198 | "# Using Pandas for data analysis \n",
199 | "## Data Munging\n",
200 | "\n",
201 | "Based on: WaveDataLab "
202 | ]
203 | },
204 | {
205 | "cell_type": "markdown",
206 | "metadata": {
207 | "slideshow": {
208 | "slide_type": "slide"
209 | }
210 | },
211 | "source": [
212 | "Import libraries"
213 | ]
214 | },
215 | {
216 | "cell_type": "code",
217 | "execution_count": null,
218 | "metadata": {
219 | "slideshow": {
220 | "slide_type": "fragment"
221 | }
222 | },
223 | "outputs": [],
224 | "source": [
225 | "%matplotlib\n",
226 | "import numpy as np\n",
227 | "import pandas as pd"
228 | ]
229 | },
230 | {
231 | "cell_type": "markdown",
232 | "metadata": {
233 | "slideshow": {
234 | "slide_type": "slide"
235 | }
236 | },
237 | "source": [
238 | "Read a CSV file"
239 | ]
240 | },
241 | {
242 | "cell_type": "code",
243 | "execution_count": null,
244 | "metadata": {
245 | "slideshow": {
246 | "slide_type": "fragment"
247 | }
248 | },
249 | "outputs": [],
250 | "source": [
251 | "ver = pd.read_csv(\"SalesJan2009.csv\", sep=\";\")\n"
252 | ]
253 | },
254 | {
255 | "cell_type": "markdown",
256 | "metadata": {
257 | "slideshow": {
258 | "slide_type": "slide"
259 | }
260 | },
261 | "source": [
262 | "View the first rows of a file"
263 | ]
264 | },
265 | {
266 | "cell_type": "code",
267 | "execution_count": null,
268 | "metadata": {
269 | "slideshow": {
270 | "slide_type": "fragment"
271 | }
272 | },
273 | "outputs": [],
274 | "source": [
275 | "ver.head(100)"
276 | ]
277 | },
278 | {
279 | "cell_type": "markdown",
280 | "metadata": {
281 | "slideshow": {
282 | "slide_type": "slide"
283 | }
284 | },
285 | "source": [
286 | "Find the number of rows and columns of a file"
287 | ]
288 | },
289 | {
290 | "cell_type": "code",
291 | "execution_count": null,
292 | "metadata": {
293 | "slideshow": {
294 | "slide_type": "fragment"
295 | }
296 | },
297 | "outputs": [],
298 | "source": [
299 | "ver.shape\n"
300 | ]
301 | },
302 | {
303 | "cell_type": "markdown",
304 | "metadata": {
305 | "slideshow": {
306 | "slide_type": "fragment"
307 | }
308 | },
309 | "source": [
310 | "Find the number of rows"
311 | ]
312 | },
313 | {
314 | "cell_type": "code",
315 | "execution_count": null,
316 | "metadata": {
317 | "slideshow": {
318 | "slide_type": "fragment"
319 | }
320 | },
321 | "outputs": [],
322 | "source": [
323 | "len(ver)"
324 | ]
325 | },
326 | {
327 | "cell_type": "markdown",
328 | "metadata": {
329 | "slideshow": {
330 | "slide_type": "fragment"
331 | }
332 | },
333 | "source": [
334 | "Find the column names"
335 | ]
336 | },
337 | {
338 | "cell_type": "code",
339 | "execution_count": null,
340 | "metadata": {
341 | "slideshow": {
342 | "slide_type": "fragment"
343 | }
344 | },
345 | "outputs": [],
346 | "source": [
347 | "ver.columns"
348 | ]
349 | },
350 | {
351 | "cell_type": "markdown",
352 | "metadata": {
353 | "slideshow": {
354 | "slide_type": "slide"
355 | }
356 | },
357 | "source": [
358 | "Show the first 5 rows of a given column"
359 | ]
360 | },
361 | {
362 | "cell_type": "code",
363 | "execution_count": null,
364 | "metadata": {
365 | "slideshow": {
366 | "slide_type": "fragment"
367 | }
368 | },
369 | "outputs": [],
370 | "source": [
371 | "ver['Name'][:5]"
372 | ]
373 | },
374 | {
375 | "cell_type": "markdown",
376 | "metadata": {
377 | "slideshow": {
378 | "slide_type": "slide"
379 | }
380 | },
381 | "source": [
382 | "### Create categorical ranges (bins) for numeric data. \n",
383 | "\n",
384 | "\n",
385 | "*You can specify how many ranges you want* \n",
386 | "\n",
387 | "When is this useful? For example, when you want to convert ages into age groups"
388 | ]
389 | },
390 | {
391 | "cell_type": "code",
392 | "execution_count": null,
393 | "metadata": {
394 | "slideshow": {
395 | "slide_type": "slide"
396 | }
397 | },
398 | "outputs": [],
399 | "source": [
400 | "latitudes = pd.cut(ver['Latitude'], 5)\n",
401 | "latitudes[:5]\n",
402 | "# Note that the ranges are equally spaced"
403 | ]
404 | },
405 | {
406 | "cell_type": "markdown",
407 | "metadata": {
408 | "slideshow": {
409 | "slide_type": "slide"
410 | }
411 | },
412 | "source": [
413 | "How many rows are there in each range?"
414 | ]
415 | },
416 | {
417 | "cell_type": "code",
418 | "execution_count": null,
419 | "metadata": {
420 | "slideshow": {
421 | "slide_type": "fragment"
422 | }
423 | },
424 | "outputs": [],
425 | "source": [
426 | "latitudes.value_counts()\n"
427 | ]
428 | },
429 | {
430 | "cell_type": "markdown",
431 | "metadata": {
432 | "slideshow": {
433 | "slide_type": "slide"
434 | }
435 | },
436 | "source": [
437 | "Sort the data by a given column"
438 | ]
439 | },
440 | {
441 | "cell_type": "code",
442 | "execution_count": null,
443 | "metadata": {
444 | "slideshow": {
445 | "slide_type": "fragment"
446 | }
447 | },
448 | "outputs": [],
449 | "source": [
450 | "ver.sort_values(by=['Price'])"
451 | ]
452 | },
453 | {
454 | "cell_type": "markdown",
455 | "metadata": {
456 | "slideshow": {
457 | "slide_type": "slide"
458 | }
459 | },
460 | "source": [
461 | "Sort by multiple columns"
462 | ]
463 | },
464 | {
465 | "cell_type": "code",
466 | "execution_count": null,
467 | "metadata": {
468 | "slideshow": {
469 | "slide_type": "fragment"
470 | }
471 | },
472 | "outputs": [],
473 | "source": [
474 | "ver.sort_values(by=['Transaction_date','Last_Login'])[:5]\n"
475 | ]
476 | },
477 | {
478 | "cell_type": "markdown",
479 | "metadata": {
480 | "slideshow": {
481 | "slide_type": "slide"
482 | }
483 | },
484 | "source": [
485 | "Put NaN first"
486 | ]
487 | },
488 | {
489 | "cell_type": "code",
490 | "execution_count": null,
491 | "metadata": {
492 | "slideshow": {
493 | "slide_type": "fragment"
494 | }
495 | },
496 | "outputs": [],
497 | "source": [
498 | "ver.sort_values(by='Payment_Type', ascending=False, na_position='first')\n"
499 | ]
500 | },
501 | {
502 | "cell_type": "markdown",
503 | "metadata": {
504 | "slideshow": {
505 | "slide_type": "slide"
506 | }
507 | },
508 | "source": [
509 | "Count the values of a given column"
510 | ]
511 | },
512 | {
513 | "cell_type": "code",
514 | "execution_count": null,
515 | "metadata": {
516 | "slideshow": {
517 | "slide_type": "fragment"
518 | }
519 | },
520 | "outputs": [],
521 | "source": [
522 | "ver['Payment_Type'].value_counts()\n",
523 | "# NaN does not appear (value_counts drops NaN by default)"
524 | ]
525 | },
526 | {
527 | "cell_type": "markdown",
528 | "metadata": {
529 | "slideshow": {
530 | "slide_type": "slide"
531 | }
532 | },
533 | "source": [
534 | "Find the type of each column"
535 | ]
536 | },
537 | {
538 | "cell_type": "code",
539 | "execution_count": null,
540 | "metadata": {
541 | "slideshow": {
542 | "slide_type": "fragment"
543 | }
544 | },
545 | "outputs": [],
546 | "source": [
547 | "ver.dtypes\n"
548 | ]
549 | },
550 | {
551 | "cell_type": "markdown",
552 | "metadata": {
553 | "slideshow": {
554 | "slide_type": "slide"
555 | }
556 | },
557 | "source": [
558 | "Get the unique values of a column"
559 | ]
560 | },
561 | {
562 | "cell_type": "code",
563 | "execution_count": null,
564 | "metadata": {
565 | "slideshow": {
566 | "slide_type": "fragment"
567 | }
568 | },
569 | "outputs": [],
570 | "source": [
571 | "ver['Country'].unique()\n"
572 | ]
573 | },
574 | {
575 | "cell_type": "markdown",
576 | "metadata": {
577 | "slideshow": {
578 | "slide_type": "slide"
579 | }
580 | },
581 | "source": [
582 | "# Mini Exercise\n",
583 | "## How many countries is merchandise shipped to?"
584 | ]
585 | },
586 | {
587 | "cell_type": "code",
588 | "execution_count": null,
589 | "metadata": {
590 | "slideshow": {
591 | "slide_type": "fragment"
592 | }
593 | },
594 | "outputs": [],
595 | "source": []
596 | },
597 | {
598 | "cell_type": "markdown",
599 | "metadata": {
600 | "slideshow": {
601 | "slide_type": "slide"
602 | }
603 | },
604 | "source": [
605 | "Make comparisons"
606 | ]
607 | },
608 | {
609 | "cell_type": "code",
610 | "execution_count": null,
611 | "metadata": {
612 | "slideshow": {
613 | "slide_type": "fragment"
614 | }
615 | },
616 | "outputs": [],
617 | "source": [
618 | "ver.loc[0:3,'Payment_Type'] == \"Visa\"\n"
619 | ]
620 | },
621 | {
622 | "cell_type": "markdown",
623 | "metadata": {
624 | "slideshow": {
625 | "slide_type": "slide"
626 | }
627 | },
628 | "source": [
629 | " Exercises "
630 | ]
631 | },
632 | {
633 | "cell_type": "markdown",
634 | "metadata": {
635 | "slideshow": {
636 | "slide_type": "slide"
637 | }
638 | },
639 | "source": [
640 | "# Part 1: Get and explore the data\n",
641 | "Thanks to: Guilherme Samora and Kevin Markham \n",
642 | "\n",
643 | "### Import pandas and numpy"
644 | ]
645 | },
646 | {
647 | "cell_type": "code",
648 | "execution_count": null,
649 | "metadata": {
650 | "slideshow": {
651 | "slide_type": "fragment"
652 | }
653 | },
654 | "outputs": [],
655 | "source": []
656 | },
657 | {
658 | "cell_type": "markdown",
659 | "metadata": {
660 | "slideshow": {
661 | "slide_type": "slide"
662 | }
663 | },
664 | "source": [
665 | "### View/download the data from [this address](https://raw.githubusercontent.com/justmarkham/DAT8/master/data/chipotle.tsv). \n",
666 | "\n",
667 | "### The data must be stored in a variable"
668 | ]
669 | },
670 | {
671 | "cell_type": "code",
672 | "execution_count": null,
673 | "metadata": {
674 | "slideshow": {
675 | "slide_type": "fragment"
676 | }
677 | },
678 | "outputs": [],
679 | "source": []
680 | },
681 | {
682 | "cell_type": "markdown",
683 | "metadata": {
684 | "slideshow": {
685 | "slide_type": "slide"
686 | }
687 | },
688 | "source": [
689 | "### Show the first 10 rows"
690 | ]
691 | },
692 | {
693 | "cell_type": "code",
694 | "execution_count": null,
695 | "metadata": {
696 | "slideshow": {
697 | "slide_type": "fragment"
698 | }
699 | },
700 | "outputs": [],
701 | "source": []
702 | },
703 | {
704 | "cell_type": "markdown",
705 | "metadata": {
706 | "slideshow": {
707 | "slide_type": "slide"
708 | }
709 | },
710 | "source": [
711 | "## Show the 'choice_description' of the fifth element of the dataset"
712 | ]
713 | },
714 | {
715 | "cell_type": "code",
716 | "execution_count": null,
717 | "metadata": {
718 | "slideshow": {
719 | "slide_type": "fragment"
720 | }
721 | },
722 | "outputs": [],
723 | "source": []
724 | },
725 | {
726 | "cell_type": "markdown",
727 | "metadata": {
728 | "slideshow": {
729 | "slide_type": "slide"
730 | }
731 | },
732 | "source": [
733 | "### How many rows does the dataset have?"
734 | ]
735 | },
736 | {
737 | "cell_type": "code",
738 | "execution_count": null,
739 | "metadata": {
740 | "slideshow": {
741 | "slide_type": "fragment"
742 | }
743 | },
744 | "outputs": [],
745 | "source": []
746 | },
747 | {
748 | "cell_type": "markdown",
749 | "metadata": {
750 | "slideshow": {
751 | "slide_type": "slide"
752 | }
753 | },
754 | "source": [
755 | "### How many columns does it have?"
756 | ]
757 | },
758 | {
759 | "cell_type": "code",
760 | "execution_count": null,
761 | "metadata": {
762 | "slideshow": {
763 | "slide_type": "fragment"
764 | }
765 | },
766 | "outputs": [],
767 | "source": []
768 | },
769 | {
770 | "cell_type": "markdown",
771 | "metadata": {
772 | "slideshow": {
773 | "slide_type": "slide"
774 | }
775 | },
776 | "source": [
777 | "### How many products were sold in total?"
778 | ]
779 | },
780 | {
781 | "cell_type": "code",
782 | "execution_count": null,
783 | "metadata": {
784 | "slideshow": {
785 | "slide_type": "fragment"
786 | }
787 | },
788 | "outputs": [],
789 | "source": []
790 | },
791 | {
792 | "cell_type": "markdown",
793 | "metadata": {
794 | "slideshow": {
795 | "slide_type": "slide"
796 | }
797 | },
798 | "source": [
799 | "### What was the total revenue?\n"
800 | ]
801 | },
802 | {
803 | "cell_type": "code",
804 | "execution_count": null,
805 | "metadata": {
806 | "slideshow": {
807 | "slide_type": "fragment"
808 | }
809 | },
810 | "outputs": [],
811 | "source": []
812 | },
813 | {
814 | "cell_type": "markdown",
815 | "metadata": {
816 | "slideshow": {
817 | "slide_type": "slide"
818 | }
819 | },
820 | "source": [
821 | "### How many orders were placed?"
822 | ]
823 | },
824 | {
825 | "cell_type": "code",
826 | "execution_count": null,
827 | "metadata": {
828 | "slideshow": {
829 | "slide_type": "fragment"
830 | }
831 | },
832 | "outputs": [],
833 | "source": []
834 | },
835 | {
836 | "cell_type": "markdown",
837 | "metadata": {
838 | "slideshow": {
839 | "slide_type": "slide"
840 | }
841 | },
842 | "source": [
843 | "### How many different products were sold?"
844 | ]
845 | },
846 | {
847 | "cell_type": "code",
848 | "execution_count": null,
849 | "metadata": {
850 | "slideshow": {
851 | "slide_type": "fragment"
852 | }
853 | },
854 | "outputs": [],
855 | "source": []
856 | },
857 | {
858 | "cell_type": "markdown",
859 | "metadata": {
860 | "slideshow": {
861 | "slide_type": "slide"
862 | }
863 | },
864 | "source": [
865 | "### How many products cost more than $10.00?"
866 | ]
867 | },
868 | {
869 | "cell_type": "code",
870 | "execution_count": null,
871 | "metadata": {
872 | "slideshow": {
873 | "slide_type": "fragment"
874 | }
875 | },
876 | "outputs": [],
877 | "source": []
878 | },
879 | {
880 | "cell_type": "markdown",
881 | "metadata": {
882 | "slideshow": {
883 | "slide_type": "slide"
884 | }
885 | },
886 | "source": [
887 | "### Of the products that were sold only once, what is the price of each item?"
888 | ]
889 | },
890 | {
891 | "cell_type": "code",
892 | "execution_count": null,
893 | "metadata": {
894 | "slideshow": {
895 | "slide_type": "fragment"
896 | }
897 | },
898 | "outputs": [],
899 | "source": []
900 | },
901 | {
902 | "cell_type": "markdown",
903 | "metadata": {
904 | "slideshow": {
905 | "slide_type": "slide"
906 | }
907 | },
908 | "source": [
909 | "### Sort the products by name"
910 | ]
911 | },
912 | {
913 | "cell_type": "code",
914 | "execution_count": null,
915 | "metadata": {
916 | "slideshow": {
917 | "slide_type": "fragment"
918 | }
919 | },
920 | "outputs": [],
921 | "source": []
922 | },
923 | {
924 | "cell_type": "markdown",
925 | "metadata": {
926 | "slideshow": {
927 | "slide_type": "slide"
928 | }
929 | },
930 | "source": [
931 | "### What is the most expensive product? At what price is it sold?"
932 | ]
933 | },
934 | {
935 | "cell_type": "code",
936 | "execution_count": null,
937 | "metadata": {
938 | "slideshow": {
939 | "slide_type": "fragment"
940 | }
941 | },
942 | "outputs": [],
943 | "source": []
944 | },
945 | {
946 | "cell_type": "markdown",
947 | "metadata": {
948 | "slideshow": {
949 | "slide_type": "slide"
950 | }
951 | },
952 | "source": [
953 | "### How many people ordered \"Veggie Salad Bowl\"?"
954 | ]
955 | },
956 | {
957 | "cell_type": "code",
958 | "execution_count": null,
959 | "metadata": {
960 | "slideshow": {
961 | "slide_type": "fragment"
962 | }
963 | },
964 | "outputs": [],
965 | "source": []
966 | },
967 | {
968 | "cell_type": "markdown",
969 | "metadata": {
970 | "slideshow": {
971 | "slide_type": "slide"
972 | }
973 | },
974 | "source": [
975 | "### How many times did people order more than one \"Canned Soda\"?"
976 | ]
977 | },
978 | {
979 | "cell_type": "code",
980 | "execution_count": null,
981 | "metadata": {
982 | "slideshow": {
983 | "slide_type": "fragment"
984 | }
985 | },
986 | "outputs": [],
987 | "source": []
988 | },
989 | {
990 | "cell_type": "markdown",
991 | "metadata": {
992 | "slideshow": {
993 | "slide_type": "slide"
994 | }
995 | },
996 | "source": [
997 | " Group data by columns with .groupby()\n",
998 | " "
999 | ]
1000 | },
1001 | {
1002 | "cell_type": "markdown",
1003 | "metadata": {
1004 | "slideshow": {
1005 | "slide_type": "slide"
1006 | }
1007 | },
1008 | "source": [
1009 | "### What is the most-ordered product?"
1010 | ]
1011 | },
1012 | {
1013 | "cell_type": "code",
1014 | "execution_count": null,
1015 | "metadata": {
1016 | "slideshow": {
1017 | "slide_type": "fragment"
1018 | }
1019 | },
1020 | "outputs": [],
1021 | "source": [
1022 | "# assumes the Chipotle data from the exercises above was stored in a variable named chipo\n",
1023 | "c = chipo.groupby('item_name')[['quantity']].sum()\n",
1024 | "c = c.sort_values(['quantity'], ascending=False)\n",
1025 | "c.head(1)"
1026 | ]
1027 | },
1028 | {
1029 | "cell_type": "markdown",
1030 | "metadata": {
1031 | "slideshow": {
1032 | "slide_type": "slide"
1033 | }
1034 | },
1035 | "source": [
1036 | "### How many items of this product were ordered?"
1037 | ]
1038 | },
1039 | {
1040 | "cell_type": "code",
1041 | "execution_count": null,
1042 | "metadata": {
1043 | "slideshow": {
1044 | "slide_type": "fragment"
1045 | }
1046 | },
1047 | "outputs": [],
1048 | "source": [
1049 | "# assumes the Chipotle data from the exercises above was stored in a variable named chipo\n",
1050 | "c = chipo.groupby('item_name')[['quantity']].sum()\n",
1051 | "c = c.sort_values(['quantity'], ascending=False)\n",
1052 | "c.head(1)"
1053 | ]
1054 | },
1055 | {
1056 | "cell_type": "markdown",
1057 | "metadata": {
1058 | "slideshow": {
1059 | "slide_type": "slide"
1060 | }
1061 | },
1062 | "source": [
1063 | "### What is the average amount per order at the store?"
1064 | ]
1065 | },
1066 | {
1067 | "cell_type": "code",
1068 | "execution_count": null,
1069 | "metadata": {
1070 | "slideshow": {
1071 | "slide_type": "fragment"
1072 | }
1073 | },
1074 | "outputs": [],
1075 | "source": [
1076 | "chipo.groupby('order_id')['item_price'].sum().mean()\n"
1077 | ]
1078 | },
1079 | {
1080 | "cell_type": "markdown",
1081 | "metadata": {
1082 | "slideshow": {
1083 | "slide_type": "slide"
1084 | }
1085 | },
1086 | "source": [
1087 | "### Which was the most-ordered item according to 'choice_description'?"
1088 | ]
1089 | },
1090 | {
1091 | "cell_type": "code",
1092 | "execution_count": null,
1093 | "metadata": {
1094 | "slideshow": {
1095 | "slide_type": "fragment"
1096 | }
1097 | },
1098 | "outputs": [],
1099 | "source": [
1100 | "c = chipo.groupby('choice_description')[['quantity']].sum()\n",
1101 | "c = c.sort_values(['quantity'], ascending=False)\n",
1102 | "c.head(1)\n",
1103 | "# Diet Coke 159"
1104 | ]
1105 | }
1106 | ],
1107 | "metadata": {
1108 | "anaconda-cloud": {},
1109 | "celltoolbar": "Slideshow",
1110 | "kernelspec": {
1111 | "display_name": "Python 2",
1112 | "language": "python",
1113 | "name": "python2"
1114 | },
1115 | "language_info": {
1116 | "codemirror_mode": {
1117 | "name": "ipython",
1118 | "version": 2
1119 | },
1120 | "file_extension": ".py",
1121 | "mimetype": "text/x-python",
1122 | "name": "python",
1123 | "nbconvert_exporter": "python",
1124 | "pygments_lexer": "ipython2",
1125 | "version": "2.7.14"
1126 | }
1127 | },
1128 | "nbformat": 4,
1129 | "nbformat_minor": 1
1130 | }
1131 |
--------------------------------------------------------------------------------
/Clase 3/03_Groupping_&_Apply-Estudiantes.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {
6 | "slideshow": {
7 | "slide_type": "slide"
8 | }
9 | },
10 | "source": [
11 | "\n",
12 | " \n",
13 | "\n",
14 | " \n",
15 | "\n",
16 | "# Workshop on Data Handling and Visualization with Python\n",
17 | "### Grouping & Apply \n",
18 | "\n",
19 | "Felipe González P. \n",
20 | "felipe.gonzalezp.12@sansano.usm.cl \n",
21 | "\n",
22 | "*Thanks to: https://github.com/justmarkham* \n",
23 | "\n",
24 | "\n",
25 | "\n",
26 | "\n"
27 | ]
28 | },
29 | {
30 | "cell_type": "markdown",
31 | "metadata": {
32 | "slideshow": {
33 | "slide_type": "slide"
34 | }
35 | },
36 | "source": [
37 | "# Dataframe\n",
38 | "\n",
39 | "A two-dimensional data structure (the data is arranged in tabular form, in rows and columns).\n",
40 | "\n",
41 | "**Advantages:**\n",
42 | "* Columns can have different types.\n",
43 | "* Its size is mutable.\n",
44 | "* Specific rows and columns can be accessed.\n",
45 | "* Operations can be performed on rows and columns\n"
46 | ]
47 | },
48 | {
49 | "cell_type": "markdown",
50 | "metadata": {
51 | "slideshow": {
52 | "slide_type": "slide"
53 | }
54 | },
55 | "source": [
56 | " "
57 | ]
58 | },
59 | {
60 | "cell_type": "markdown",
61 | "metadata": {
62 | "slideshow": {
63 | "slide_type": "slide"
64 | }
65 | },
66 | "source": [
67 | "A DataFrame can be created from the following inputs:\n",
68 | "* List\n",
69 | "* Dict\n",
70 | "* Series\n",
71 | "* Numpy ndarrays\n",
72 | "* Another DataFrame"
73 | ]
74 | },
75 | {
76 | "cell_type": "markdown",
77 | "metadata": {
78 | "slideshow": {
79 | "slide_type": "slide"
80 | }
81 | },
82 | "source": [
83 | "GroupBy "
84 | ]
85 | },
86 | {
87 | "cell_type": "markdown",
88 | "metadata": {
89 | "slideshow": {
90 | "slide_type": "slide"
91 | }
92 | },
93 | "source": [
94 | "### Introduction:\n",
95 | "\n",
96 | "GroupBy can be summarized as Split-Apply-Combine\n",
97 | "\n",
98 | "\n"
99 | ]
100 | },
101 | {
102 | "cell_type": "markdown",
103 | "metadata": {
104 | "slideshow": {
105 | "slide_type": "fragment"
106 | }
107 | },
108 | "source": [
109 | "\n"
110 | ]
111 | },
112 | {
113 | "cell_type": "markdown",
114 | "metadata": {
115 | "slideshow": {
116 | "slide_type": "slide"
117 | }
118 | },
119 | "source": [
120 | ""
121 | ]
122 | },
123 | {
124 | "cell_type": "markdown",
125 | "metadata": {
126 | "slideshow": {
127 | "slide_type": "slide"
128 | }
129 | },
130 | "source": [
131 | "# Pattern: Split - Apply - Combine\n",
132 | "\n",
133 | "* A dataset is split into small pieces\n",
134 | "* Each piece is processed/analyzed independently\n",
135 | "* All the results are combined at the end. \n",
136 | "\n",
137 | "*This pattern is similar to MapReduce (a programming model that supports parallel computing over large collections of data on clusters of computers, including commodity computing).*"
138 | ]
139 | },
140 | {
141 | "cell_type": "markdown",
142 | "metadata": {
143 | "slideshow": {
144 | "slide_type": "slide"
145 | }
146 | },
147 | "source": [
148 | " Alcohol consumption by continent "
149 | ]
150 | },
151 | {
152 | "cell_type": "markdown",
153 | "metadata": {
154 | "slideshow": {
155 | "slide_type": "slide"
156 | }
157 | },
158 | "source": [
159 | "### Import the required libraries"
160 | ]
161 | },
162 | {
163 | "cell_type": "code",
164 | "execution_count": null,
165 | "metadata": {
166 | "slideshow": {
167 | "slide_type": "fragment"
168 | }
169 | },
170 | "outputs": [],
171 | "source": [
172 | "import pandas as pd\n",
173 | "import matplotlib\n",
174 | "%matplotlib inline"
175 | ]
176 | },
177 | {
178 | "cell_type": "markdown",
179 | "metadata": {
180 | "slideshow": {
181 | "slide_type": "slide"
182 | }
183 | },
184 | "source": [
185 | "### Download the dataset from [url](https://raw.githubusercontent.com/justmarkham/DAT8/master/data/drinks.csv). "
186 | ]
187 | },
188 | {
189 | "cell_type": "markdown",
190 | "metadata": {
191 | "slideshow": {
192 | "slide_type": "slide"
193 | }
194 | },
195 | "source": [
196 | "### Store the data in a variable"
197 | ]
198 | },
199 | {
200 | "cell_type": "code",
201 | "execution_count": null,
202 | "metadata": {
203 | "slideshow": {
204 | "slide_type": "fragment"
205 | }
206 | },
207 | "outputs": [],
208 | "source": [
209 | "drinks = pd.read_csv('https://raw.githubusercontent.com/justmarkham/DAT8/master/data/drinks.csv')\n",
210 | "drinks.head() # n defaults to 5 when no value is given"
211 | ]
212 | },
213 | {
214 | "cell_type": "markdown",
215 | "metadata": {
216 | "slideshow": {
217 | "slide_type": "slide"
218 | }
219 | },
220 | "source": [
221 | "### Which continent drinks the most beer on average? Which drinks the least?"
222 | ]
223 | },
224 | {
225 | "cell_type": "code",
226 | "execution_count": null,
227 | "metadata": {
228 | "slideshow": {
229 | "slide_type": "fragment"
230 | }
231 | },
232 | "outputs": [],
233 | "source": [
234 | "drinks.groupby('continent').beer_servings.mean()"
235 | ]
236 | },
237 | {
238 | "cell_type": "markdown",
239 | "metadata": {
240 | "slideshow": {
241 | "slide_type": "slide"
242 | }
243 | },
244 | "source": [
245 | "### Print the wine consumption statistics for each continent"
246 | ]
247 | },
248 | {
249 | "cell_type": "code",
250 | "execution_count": null,
251 | "metadata": {
252 | "slideshow": {
253 | "slide_type": "fragment"
254 | }
255 | },
256 | "outputs": [],
257 | "source": [
258 | "drinks.groupby('continent').wine_servings.describe()\n"
259 | ]
260 | },
261 | {
262 | "cell_type": "code",
263 | "execution_count": null,
264 | "metadata": {
265 | "slideshow": {
266 | "slide_type": "slide"
267 | }
268 | },
269 | "outputs": [],
270 | "source": [
271 | "drinks.groupby('continent').wine_servings.describe().boxplot()\n"
272 | ]
273 | },
274 | {
275 | "cell_type": "markdown",
276 | "metadata": {
277 | "slideshow": {
278 | "slide_type": "slide"
279 | }
280 | },
281 | "source": [
282 | "### What is the mean consumption for each column in each continent?"
283 | ]
284 | },
285 | {
286 | "cell_type": "code",
287 | "execution_count": null,
288 | "metadata": {
289 | "slideshow": {
290 | "slide_type": "fragment"
291 | }
292 | },
293 | "outputs": [],
294 | "source": [
295 | "drinks.groupby('continent').mean(numeric_only=True)"
296 | ]
297 | },
298 | {
299 | "cell_type": "markdown",
300 | "metadata": {
301 | "slideshow": {
302 | "slide_type": "slide"
303 | }
304 | },
305 | "source": [
306 | "### What is the median consumption for each column in each continent?\n"
307 | ]
308 | },
309 | {
310 | "cell_type": "code",
311 | "execution_count": null,
312 | "metadata": {
313 | "slideshow": {
314 | "slide_type": "fragment"
315 | }
316 | },
317 | "outputs": [],
318 | "source": [
319 | "drinks.groupby('continent').median(numeric_only=True)"
320 | ]
321 | },
322 | {
323 | "cell_type": "markdown",
324 | "metadata": {
325 | "slideshow": {
326 | "slide_type": "slide"
327 | }
328 | },
329 | "source": [
330 | "### Print the mean, minimum and maximum *spirit* consumption grouped by continent"
331 | ]
332 | },
333 | {
334 | "cell_type": "code",
335 | "execution_count": null,
336 | "metadata": {
337 | "slideshow": {
338 | "slide_type": "fragment"
339 | }
340 | },
341 | "outputs": [],
342 | "source": [
343 | "drinks.groupby('continent').spirit_servings.agg(['mean', 'min', 'max']) # applies the mean, min and max functions\n"
344 | ]
345 | },
346 | {
347 | "cell_type": "markdown",
348 | "metadata": {
349 | "slideshow": {
350 | "slide_type": "slide"
351 | }
352 | },
353 | "source": [
354 | "Apply "
355 | ]
356 | },
357 | {
358 | "cell_type": "markdown",
359 | "metadata": {
360 | "slideshow": {
361 | "slide_type": "slide"
362 | }
363 | },
364 | "source": [
365 | " Alcohol consumption among students "
366 | ]
367 | },
368 | {
369 | "cell_type": "markdown",
370 | "metadata": {
371 | "slideshow": {
372 | "slide_type": "slide"
373 | }
374 | },
375 | "source": [
376 | "### The data we need is here: [url](https://github.com/guipsamora/pandas_exercises/blob/master/04_Apply/Students_Alcohol_Consumption/student-mat.csv).\n",
377 | "\n",
378 | "### As usual, we will store it in a variable"
379 | ]
380 | },
381 | {
382 | "cell_type": "code",
383 | "execution_count": null,
384 | "metadata": {
385 | "slideshow": {
386 | "slide_type": "fragment"
387 | }
388 | },
389 | "outputs": [],
390 | "source": [
391 | "df = pd.read_csv('https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/04_Apply/Students_Alcohol_Consumption/student-mat.csv', sep = ',')\n",
392 | "df.head()"
393 | ]
394 | },
395 | {
396 | "cell_type": "markdown",
397 | "metadata": {
398 | "slideshow": {
399 | "slide_type": "slide"
400 | }
401 | },
402 | "source": [
403 | "### That is a lot of columns! We only want those in the range [school, guardian]"
404 | ]
405 | },
406 | {
407 | "cell_type": "code",
408 | "execution_count": null,
409 | "metadata": {
410 | "slideshow": {
411 | "slide_type": "fragment"
412 | }
413 | },
414 | "outputs": [],
415 | "source": [
416 | "stud_alcoh = df.loc[:, \"school\":\"guardian\"].copy() # select by label with df.loc[row_indexer, column_indexer]; copy() avoids SettingWithCopyWarning on later assignments\n",
417 | "stud_alcoh.head()"
418 | ]
419 | },
420 | {
421 | "cell_type": "markdown",
422 | "metadata": {
423 | "slideshow": {
424 | "slide_type": "slide"
425 | }
426 | },
427 | "source": [
428 | "### Let's create a function that converts words to upper case"
429 | ]
430 | },
431 | {
432 | "cell_type": "code",
433 | "execution_count": null,
434 | "metadata": {
435 | "slideshow": {
436 | "slide_type": "fragment"
437 | }
438 | },
439 | "outputs": [],
440 | "source": [
441 | "captalizer = lambda x: x.upper()"
442 | ]
443 | },
444 | {
445 | "cell_type": "markdown",
446 | "metadata": {
447 | "slideshow": {
448 | "slide_type": "slide"
449 | }
450 | },
451 | "source": [
452 | "### Let's convert Mjob and Fjob to upper case"
453 | ]
454 | },
455 | {
456 | "cell_type": "code",
457 | "execution_count": null,
458 | "metadata": {
459 | "slideshow": {
460 | "slide_type": "fragment"
461 | }
462 | },
463 | "outputs": [],
464 | "source": [
465 | "stud_alcoh['Mjob'].apply(captalizer)\n",
466 | "stud_alcoh['Fjob'].apply(captalizer)"
467 | ]
468 | },
469 | {
470 | "cell_type": "markdown",
471 | "metadata": {
472 | "slideshow": {
473 | "slide_type": "slide"
474 | }
475 | },
476 | "source": [
477 | "### What are the last 5 elements of the dataset?"
478 | ]
479 | },
480 | {
481 | "cell_type": "code",
482 | "execution_count": null,
483 | "metadata": {
484 | "slideshow": {
485 | "slide_type": "fragment"
486 | }
487 | },
488 | "outputs": [],
489 | "source": [
490 | "stud_alcoh.tail() # n = 5 by default"
491 | ]
492 | },
493 | {
494 | "cell_type": "markdown",
495 | "metadata": {
496 | "slideshow": {
497 | "slide_type": "slide"
498 | }
499 | },
500 | "source": [
501 | "### The original dataset still has Mjob and Fjob in lower case! apply returns a new Series, so assign the result back"
502 | ]
503 | },
504 | {
505 | "cell_type": "code",
506 | "execution_count": null,
507 | "metadata": {
508 | "slideshow": {
509 | "slide_type": "fragment"
510 | }
511 | },
512 | "outputs": [],
513 | "source": [
514 | "stud_alcoh['Mjob'] = stud_alcoh['Mjob'].apply(captalizer)\n",
515 | "stud_alcoh['Fjob'] = stud_alcoh['Fjob'].apply(captalizer)\n",
516 | "stud_alcoh.tail()"
517 | ]
518 | },
519 | {
520 | "cell_type": "markdown",
521 | "metadata": {
522 | "slideshow": {
523 | "slide_type": "slide"
524 | }
525 | },
526 | "source": [
527 | "### Let's create a new column that shows True when the individual is a *legal_drinker*"
528 | ]
529 | },
530 | {
531 | "cell_type": "code",
532 | "execution_count": null,
533 | "metadata": {
534 | "slideshow": {
535 | "slide_type": "fragment"
536 | }
537 | },
538 | "outputs": [],
539 | "source": [
540 | "def majority(x):\n",
541 | "    if x > 17:\n",
542 | "        return True\n",
543 | "    else:\n",
544 | "        return False"
545 | ]
546 | },
547 | {
548 | "cell_type": "code",
549 | "execution_count": null,
550 | "metadata": {
551 | "slideshow": {
552 | "slide_type": "slide"
553 | }
554 | },
555 | "outputs": [],
556 | "source": [
557 | "stud_alcoh['legal_drinker'] = stud_alcoh['age'].apply(majority)\n",
558 | "stud_alcoh.head()"
559 | ]
560 | },
561 | {
562 | "cell_type": "markdown",
563 | "metadata": {
564 | "slideshow": {
565 | "slide_type": "slide"
566 | }
567 | },
568 | "source": [
569 | "### Let's multiply every number in the dataset by 10\n",
570 | ""
571 | ]
572 | },
573 | {
574 | "cell_type": "code",
575 | "execution_count": null,
576 | "metadata": {
577 | "slideshow": {
578 | "slide_type": "slide"
579 | }
580 | },
581 | "outputs": [],
582 | "source": [
583 | "import numbers\n",
584 | "def times10(x):\n",
585 | "    if isinstance(x, numbers.Integral) and not isinstance(x, bool): # integers only; booleans are left untouched\n",
586 | "        return 10*x\n",
587 | "    else:\n",
588 | "        return x\n"
589 | ]
590 | },
591 | {
592 | "cell_type": "code",
593 | "execution_count": null,
594 | "metadata": {
595 | "slideshow": {
596 | "slide_type": "slide"
597 | }
598 | },
599 | "outputs": [],
600 | "source": [
601 | "stud_alcoh.applymap(times10).head(10)\n",
602 | "#Applymap: Apply a function to a DataFrame that is intended to operate elementwise, \n",
603 | "#i.e. like doing map(func, series) for each series in the DataFrame"
604 | ]
605 | },
606 | {
607 | "cell_type": "markdown",
608 | "metadata": {
609 | "slideshow": {
610 | "slide_type": "slide"
611 | }
612 | },
613 | "source": [
614 | "Exercises "
615 | ]
616 | },
617 | {
618 | "cell_type": "markdown",
619 | "metadata": {
620 | "slideshow": {
621 | "slide_type": "slide"
622 | }
623 | },
624 | "source": [
625 | "### Use the data from [url](https://raw.githubusercontent.com/justmarkham/DAT8/master/data/u.user) and store it in a variable. "
626 | ]
627 | },
628 | {
629 | "cell_type": "code",
630 | "execution_count": null,
631 | "metadata": {
632 | "slideshow": {
633 | "slide_type": "fragment"
634 | }
635 | },
636 | "outputs": [],
637 | "source": [
638 | "\n"
639 | ]
640 | },
641 | {
642 | "cell_type": "markdown",
643 | "metadata": {
644 | "slideshow": {
645 | "slide_type": "slide"
646 | }
647 | },
648 | "source": [
649 | "### Show the first 5 rows of the dataset"
650 | ]
651 | },
652 | {
653 | "cell_type": "code",
654 | "execution_count": null,
655 | "metadata": {
656 | "slideshow": {
657 | "slide_type": "fragment"
658 | }
659 | },
660 | "outputs": [],
661 | "source": []
662 | },
663 | {
664 | "cell_type": "markdown",
665 | "metadata": {
666 | "slideshow": {
667 | "slide_type": "slide"
668 | }
669 | },
670 | "source": [
671 | "### What is the mean age per occupation?"
672 | ]
673 | },
674 | {
675 | "cell_type": "code",
676 | "execution_count": null,
677 | "metadata": {
678 | "slideshow": {
679 | "slide_type": "fragment"
680 | }
681 | },
682 | "outputs": [],
683 | "source": []
684 | },
685 | {
686 | "cell_type": "markdown",
687 | "metadata": {
688 | "slideshow": {
689 | "slide_type": "slide"
690 | }
691 | },
692 | "source": [
693 | "### Find the proportion of men per occupation and sort it from highest to lowest"
694 | ]
695 | },
696 | {
697 | "cell_type": "code",
698 | "execution_count": null,
699 | "metadata": {
700 | "slideshow": {
701 | "slide_type": "fragment"
702 | }
703 | },
704 | "outputs": [],
705 | "source": []
706 | },
707 | {
708 | "cell_type": "markdown",
709 | "metadata": {
710 | "slideshow": {
711 | "slide_type": "slide"
712 | }
713 | },
714 | "source": [
715 | "### For each occupation, compute the minimum and maximum age"
716 | ]
717 | },
718 | {
719 | "cell_type": "code",
720 | "execution_count": null,
721 | "metadata": {
722 | "slideshow": {
723 | "slide_type": "fragment"
724 | }
725 | },
726 | "outputs": [],
727 | "source": []
728 | },
729 | {
730 | "cell_type": "markdown",
731 | "metadata": {
732 | "slideshow": {
733 | "slide_type": "slide"
734 | }
735 | },
736 | "source": [
737 | "### For each \"occupation-gender\" combination, compute the mean age"
738 | ]
739 | },
740 | {
741 | "cell_type": "code",
742 | "execution_count": null,
743 | "metadata": {
744 | "slideshow": {
745 | "slide_type": "fragment"
746 | }
747 | },
748 | "outputs": [],
749 | "source": []
750 | },
751 | {
752 | "cell_type": "markdown",
753 | "metadata": {
754 | "slideshow": {
755 | "slide_type": "slide"
756 | }
757 | },
758 | "source": [
759 | "### For each occupation, show the percentage of men and women"
760 | ]
761 | },
762 | {
763 | "cell_type": "code",
764 | "execution_count": null,
765 | "metadata": {
766 | "slideshow": {
767 | "slide_type": "fragment"
768 | }
769 | },
770 | "outputs": [],
771 | "source": []
772 | },
773 | {
774 | "cell_type": "markdown",
775 | "metadata": {
776 | "slideshow": {
777 | "slide_type": "slide"
778 | }
779 | },
780 | "source": [
781 | "# Regiments"
782 | ]
783 | },
784 | {
785 | "cell_type": "markdown",
786 | "metadata": {
787 | "slideshow": {
788 | "slide_type": "slide"
789 | }
790 | },
791 | "source": [
792 | "### The data is as follows"
793 | ]
794 | },
795 | {
796 | "cell_type": "code",
797 | "execution_count": 1,
798 | "metadata": {
799 | "slideshow": {
800 | "slide_type": "fragment"
801 | }
802 | },
803 | "outputs": [],
804 | "source": [
805 | "raw_data = {'regiment': ['Nighthawks', 'Nighthawks', 'Nighthawks', 'Nighthawks', 'Dragoons', 'Dragoons', 'Dragoons', 'Dragoons', 'Scouts', 'Scouts', 'Scouts', 'Scouts'], \n",
806 | " 'company': ['1st', '1st', '2nd', '2nd', '1st', '1st', '2nd', '2nd','1st', '1st', '2nd', '2nd'], \n",
807 | " 'name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze', 'Jacon', 'Ryaner', 'Sone', 'Sloan', 'Piger', 'Riani', 'Ali'], \n",
808 | " 'preTestScore': [4, 24, 31, 2, 3, 4, 24, 31, 2, 3, 2, 3],\n",
809 | " 'postTestScore': [25, 94, 57, 62, 70, 25, 94, 57, 62, 70, 62, 70]}"
810 | ]
811 | },
812 | {
813 | "cell_type": "markdown",
814 | "metadata": {
815 | "slideshow": {
816 | "slide_type": "slide"
817 | }
818 | },
819 | "source": [
820 | "### Convert the dictionary into a DataFrame"
821 | ]
822 | },
823 | {
824 | "cell_type": "code",
825 | "execution_count": null,
826 | "metadata": {
827 | "slideshow": {
828 | "slide_type": "fragment"
829 | }
830 | },
831 | "outputs": [],
832 | "source": []
833 | },
834 | {
835 | "cell_type": "markdown",
836 | "metadata": {
837 | "slideshow": {
838 | "slide_type": "slide"
839 | }
840 | },
841 | "source": [
842 | "### What is the mean preTestScore of the Nighthawks regiment?"
843 | ]
844 | },
845 | {
846 | "cell_type": "code",
847 | "execution_count": null,
848 | "metadata": {
849 | "slideshow": {
850 | "slide_type": "fragment"
851 | }
852 | },
853 | "outputs": [],
854 | "source": []
855 | },
856 | {
857 | "cell_type": "markdown",
858 | "metadata": {
859 | "slideshow": {
860 | "slide_type": "slide"
861 | }
862 | },
863 | "source": [
864 | "### Show summary statistics per company"
865 | ]
866 | },
867 | {
868 | "cell_type": "code",
869 | "execution_count": null,
870 | "metadata": {
871 | "slideshow": {
872 | "slide_type": "fragment"
873 | }
874 | },
875 | "outputs": [],
876 | "source": []
877 | },
878 | {
879 | "cell_type": "markdown",
880 | "metadata": {
881 | "slideshow": {
882 | "slide_type": "slide"
883 | }
884 | },
885 | "source": [
886 | "### What is the mean preTestScore for each company?\n"
887 | ]
888 | },
889 | {
890 | "cell_type": "code",
891 | "execution_count": null,
892 | "metadata": {
893 | "slideshow": {
894 | "slide_type": "fragment"
895 | }
896 | },
897 | "outputs": [],
898 | "source": []
899 | },
900 | {
901 | "cell_type": "markdown",
902 | "metadata": {
903 | "slideshow": {
904 | "slide_type": "slide"
905 | }
906 | },
907 | "source": [
908 | "### Show the mean preTestScore by regiment and company\n"
909 | ]
910 | },
911 | {
912 | "cell_type": "code",
913 | "execution_count": null,
914 | "metadata": {
915 | "slideshow": {
916 | "slide_type": "fragment"
917 | }
918 | },
919 | "outputs": [],
920 | "source": []
921 | },
922 | {
923 | "cell_type": "markdown",
924 | "metadata": {
925 | "slideshow": {
926 | "slide_type": "slide"
927 | }
928 | },
929 | "source": [
930 | "### Group the whole DataFrame by regiment and company so that it looks like this:\n",
931 | " \n"
932 | ]
933 | },
934 | {
935 | "cell_type": "code",
936 | "execution_count": null,
937 | "metadata": {
938 | "slideshow": {
939 | "slide_type": "fragment"
940 | }
941 | },
942 | "outputs": [],
943 | "source": []
944 | },
945 | {
946 | "cell_type": "markdown",
947 | "metadata": {
948 | "slideshow": {
949 | "slide_type": "slide"
950 | }
951 | },
952 | "source": [
953 | "### How many observations are there in each regiment and company?"
954 | ]
955 | },
956 | {
957 | "cell_type": "code",
958 | "execution_count": null,
959 | "metadata": {
960 | "slideshow": {
961 | "slide_type": "fragment"
962 | }
963 | },
964 | "outputs": [],
965 | "source": []
966 | },
967 | {
968 | "cell_type": "markdown",
969 | "metadata": {
970 | "slideshow": {
971 | "slide_type": "slide"
972 | }
973 | },
974 | "source": [
975 | "### Iterate over each regiment and print all of its information:"
976 | ]
977 | },
978 | {
979 | "cell_type": "markdown",
980 | "metadata": {
981 | "slideshow": {
982 | "slide_type": "fragment"
983 | }
984 | },
985 | "source": [
986 | " "
987 | ]
988 | },
989 | {
990 | "cell_type": "code",
991 | "execution_count": null,
992 | "metadata": {
993 | "slideshow": {
994 | "slide_type": "slide"
995 | }
996 | },
997 | "outputs": [],
998 | "source": []
999 | }
1000 | ],
1001 | "metadata": {
1002 | "anaconda-cloud": {},
1003 | "celltoolbar": "Slideshow",
1004 | "kernelspec": {
1005 | "display_name": "Python 2",
1006 | "language": "python",
1007 | "name": "python2"
1008 | },
1009 | "language_info": {
1010 | "codemirror_mode": {
1011 | "name": "ipython",
1012 | "version": 2
1013 | },
1014 | "file_extension": ".py",
1015 | "mimetype": "text/x-python",
1016 | "name": "python",
1017 | "nbconvert_exporter": "python",
1018 | "pygments_lexer": "ipython2",
1019 | "version": "2.7.14"
1020 | }
1021 | },
1022 | "nbformat": 4,
1023 | "nbformat_minor": 1
1024 | }
1025 |
--------------------------------------------------------------------------------
/Clase 3/grouping.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gonzalezf/Data-Analysis-and-Visualization-with-Python/475bb944a004bc3e5b2f94d7484b76fefe39a3ca/Clase 3/grouping.png
--------------------------------------------------------------------------------
/Clase 3/grouping2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gonzalezf/Data-Analysis-and-Visualization-with-Python/475bb944a004bc3e5b2f94d7484b76fefe39a3ca/Clase 3/grouping2.png
--------------------------------------------------------------------------------
/Clase 4/matplotlib.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gonzalezf/Data-Analysis-and-Visualization-with-Python/475bb944a004bc3e5b2f94d7484b76fefe39a3ca/Clase 4/matplotlib.png
--------------------------------------------------------------------------------
/Clase 4/seaborn.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gonzalezf/Data-Analysis-and-Visualization-with-Python/475bb944a004bc3e5b2f94d7484b76fefe39a3ca/Clase 4/seaborn.png
--------------------------------------------------------------------------------
/Clase 5/Introducción a MatplotLib-Formato Estudiantes.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {
6 | "slideshow": {
7 | "slide_type": "slide"
8 | }
9 | },
10 | "source": [
11 | "\n",
12 | " \n",
13 | "\n",
14 | " \n",
15 | "\n",
16 | "# Data Handling and Visualization with Python Workshop\n",
17 | "Introduction to Matplotlib \n",
18 | "Felipe González P. \n",
19 | "felipe.gonzalezp.12@sansano.usm.cl \n",
20 | "\n",
21 | " \n",
22 | "Thursday, Block 7-8 \n",
23 | "Campus San Joaquín\n",
24 | "\n",
25 | "\n"
26 | ]
27 | },
28 | {
29 | "cell_type": "markdown",
30 | "metadata": {
31 | "slideshow": {
32 | "slide_type": "slide"
33 | }
34 | },
35 | "source": [
36 | "## Introduction\n",
37 | "\n",
38 | "Matplotlib is probably the most widely used Python package for 2D graphics. It provides a very quick way to visualize data from Python and to produce publication-quality figures in many formats. "
39 | ]
40 | },
41 | {
42 | "cell_type": "markdown",
43 | "metadata": {},
44 | "source": [
45 | "## What we need to do\n",
46 | "\n",
47 | "Whenever we work in an IPython notebook, we should run:"
48 | ]
49 | },
50 | {
51 | "cell_type": "code",
52 | "execution_count": null,
53 | "metadata": {},
54 | "outputs": [],
55 | "source": [
56 | "%matplotlib inline\n"
57 | ]
58 | },
59 | {
60 | "cell_type": "markdown",
61 | "metadata": {},
62 | "source": [
63 | "## Importing Matplotlib"
64 | ]
65 | },
66 | {
67 | "cell_type": "code",
68 | "execution_count": null,
69 | "metadata": {},
70 | "outputs": [],
71 | "source": [
72 | "from matplotlib import pyplot as plt\n"
73 | ]
74 | },
75 | {
76 | "cell_type": "markdown",
77 | "metadata": {},
78 | "source": [
79 | "## Figure\n",
80 | ""
81 | ]
82 | },
83 | {
84 | "cell_type": "markdown",
85 | "metadata": {},
86 | "source": [
87 | "## Figures, Subplots, Axes and Ticks\n",
88 | "\n",
89 | "A 'figure' in Matplotlib means the whole window in the user interface. Within this figure there can be 'subplots'.\n",
90 | "\n",
91 | "We can control the display explicitly using 'figure', 'subplot', and 'axes'. While 'subplot' is used to place plots within a grid, 'axes' gives finer control over where each plot goes. \n",
92 | "\n",
93 | "Important: \n",
94 | "*gca()* returns the current 'axes' \n",
95 | "*gcf()* returns the current figure. "
96 | ]
97 | },
98 | {
99 | "cell_type": "markdown",
100 | "metadata": {},
101 | "source": [
102 | "## Figures \n",
103 | " \n",
104 | ""
105 | ]
106 | },
107 | {
108 | "cell_type": "markdown",
109 | "metadata": {},
110 | "source": [
111 | "## Subplots\n",
112 | "\n",
113 | "You can use subplot(row, column, index), or 'gridspec' if you need something more powerful.\n",
114 | "\n",
115 | ""
116 | ]
117 | },
118 | {
119 | "cell_type": "markdown",
120 | "metadata": {},
121 | "source": [
122 | "## Axes\n",
123 | "*Axes* are very similar to subplots, but they allow plots to be placed at any location in the figure. So, if we want to put a smaller plot inside a bigger one, we do it with *axes*.\n",
124 | "\n",
125 | ""
126 | ]
127 | },
128 | {
129 | "cell_type": "markdown",
130 | "metadata": {},
131 | "source": [
132 | "## Ticks\n",
133 | " \n",
134 | ""
135 | ]
136 | },
137 | {
138 | "cell_type": "markdown",
139 | "metadata": {},
140 | "source": [
141 | "## Data\n",
142 | "\n",
143 | "Goal: draw the cosine and sine functions on the same figure. \n",
144 | "\n",
145 | "The first step is to get the data for the sine and cosine functions:"
146 | ]
147 | },
148 | {
149 | "cell_type": "code",
150 | "execution_count": null,
151 | "metadata": {},
152 | "outputs": [],
153 | "source": [
154 | "import numpy as np\n",
155 | "\n",
156 | "X = np.linspace(-np.pi, np.pi, 256, endpoint=True)  # x values (256 points)\n",
157 | "C, S = np.cos(X), np.sin(X)  # y values for cosine and sine"
158 | ]
159 | },
160 | {
161 | "cell_type": "markdown",
162 | "metadata": {},
163 | "source": [
164 | "### Simple plot\n",
165 | "\n",
166 | "Matplotlib comes with a set of default settings that allow all kinds of properties to be customized. You can control the defaults of almost every property in Matplotlib: figure size, line width, color and style, axes, grid properties, text properties, etc."
167 | ]
168 | },
169 | {
170 | "cell_type": "code",
171 | "execution_count": null,
172 | "metadata": {},
173 | "outputs": [],
174 | "source": [
175 | "import numpy as np\n",
176 | "import matplotlib.pyplot as plt\n",
177 | "\n",
178 | "X = np.linspace(-np.pi, np.pi, 256, endpoint=True)\n",
179 | "C, S = np.cos(X), np.sin(X)\n",
180 | "\n",
181 | "plt.plot(X, C)\n",
182 | "plt.plot(X, S)\n",
183 | "\n",
184 | "plt.show()"
185 | ]
186 | },
187 | {
188 | "cell_type": "markdown",
189 | "metadata": {},
190 | "source": [
191 | "## Which values can we change?"
192 | ]
193 | },
194 | {
195 | "cell_type": "code",
196 | "execution_count": null,
197 | "metadata": {},
198 | "outputs": [],
199 | "source": [
200 | "import numpy as np\n",
201 | "import matplotlib.pyplot as plt\n",
202 | "\n",
203 | "# Create a figure of 8x6 inches at 80 dots per inch\n",
204 | "plt.figure(figsize=(8, 6), dpi=80)\n",
205 | "\n",
206 | "# Create a new subplot from a 1x1 grid\n",
207 | "plt.subplot(1, 1, 1)  # row, column, index\n",
208 | "\n",
209 | "# Data\n",
210 | "X = np.linspace(-np.pi, np.pi, 256, endpoint=True)\n",
211 | "C, S = np.cos(X), np.sin(X)\n",
212 | "\n",
213 | "# Plot cosine with a solid blue line of width 1 (in points)\n",
214 | "plt.plot(X, C, color=\"blue\", linewidth=1.0, linestyle=\"-\")\n",
215 | "\n",
216 | "plt.plot(X, S, color=\"green\", linewidth=1.0, linestyle=\"-\")\n",
217 | "\n",
218 | "# Set x limits\n",
219 | "plt.xlim(-4.0, 4.0)\n",
220 | "\n",
221 | "plt.xticks(np.linspace(-4, 4, 9, endpoint=True))  # x-axis tick positions\n",
222 | "\n",
223 | "# Set y limits\n",
224 | "plt.ylim(-1.0, 1.0)\n",
225 | "\n",
226 | "plt.yticks(np.linspace(-1, 1, 5, endpoint=True))  # y-axis tick positions\n",
227 | "\n",
228 | "# Save the figure at 72 dots per inch\n",
229 | "# plt.savefig(\"ejercicio_2.png\", dpi=72)\n",
230 | "\n",
231 | "# Show the result on screen\n",
232 | "plt.show()"
233 | ]
234 | },
235 | {
236 | "cell_type": "markdown",
237 | "metadata": {},
238 | "source": [
239 | "## Changing colors and line widths"
240 | ]
241 | },
242 | {
243 | "cell_type": "code",
244 | "execution_count": null,
245 | "metadata": {},
246 | "outputs": [],
247 | "source": [
248 | "plt.figure(figsize=(10, 6), dpi=80)\n",
249 | "plt.plot(X, C, color=\"blue\", linewidth=2.5, linestyle=\"-\")\n",
250 | "plt.plot(X, S, color=\"red\", linewidth=5.5, linestyle=\"-\")"
251 | ]
252 | },
253 | {
254 | "cell_type": "markdown",
255 | "metadata": {},
256 | "source": [
257 | "## Setting limits\n",
258 | "We need to change the figure limits so that all the data fits inside it"
259 | ]
260 | },
261 | {
262 | "cell_type": "code",
263 | "execution_count": null,
264 | "metadata": {},
265 | "outputs": [],
266 | "source": [
267 | "plt.figure(figsize=(10, 6), dpi=80)\n",
268 | "plt.xlim(X.min() * 1.1, X.max() * 1.2)\n",
269 | "plt.ylim(C.min() * 1.1, C.max() * 1.2)\n",
270 | "plt.plot(X, C, color=\"blue\", linewidth=2.5, linestyle=\"-\")\n",
271 | "plt.plot(X, S, color=\"red\", linewidth=2.5, linestyle=\"-\")"
272 | ]
273 | },
274 | {
275 | "cell_type": "markdown",
276 | "metadata": {},
277 | "source": [
278 | "## Setting ticks\n",
279 | "\n",
280 | "The current ticks are not ideal because they do not show the interesting values (+/-π, +/-π/2) for sine and cosine. We will change them so that they show only these values."
281 | ]
282 | },
283 | {
284 | "cell_type": "code",
285 | "execution_count": null,
286 | "metadata": {},
287 | "outputs": [],
288 | "source": [
289 | "plt.figure(figsize=(10, 6), dpi=80)\n",
290 | "plt.plot(X, C, color=\"blue\", linewidth=2.5, linestyle=\"-\")\n",
291 | "plt.plot(X, S, color=\"red\", linewidth=2.5, linestyle=\"-\")\n",
292 | "plt.xticks([-np.pi, -np.pi/2, 0, np.pi/2, np.pi])  # tick labels are displayed as decimal numbers\n",
293 | "plt.yticks([-1, 0, +1])\n",
294 | "plt.show()"
295 | ]
296 | },
297 | {
298 | "cell_type": "markdown",
299 | "metadata": {},
300 | "source": [
301 | "## Setting tick labels\n",
302 | "\n",
303 | "The ticks are now placed correctly, but their labels are not very explicit. We could guess that 3.142 is π, but it would be better to make it explicit. When we set tick values, we can also provide a corresponding label in the second argument list. \n",
304 | "\n",
305 | "**We will use LaTeX to render the labels nicely.**"
306 | ]
307 | },
308 | {
309 | "cell_type": "code",
310 | "execution_count": null,
311 | "metadata": {},
312 | "outputs": [],
313 | "source": [
314 | "plt.figure(figsize=(10, 6), dpi=80)\n",
315 | "plt.plot(X, C, color=\"blue\", linewidth=2.5, linestyle=\"-\")\n",
316 | "plt.plot(X, S, color=\"red\", linewidth=2.5, linestyle=\"-\")\n",
317 | "plt.xticks([-np.pi, -np.pi/2, 0, np.pi/2, np.pi],\n",
318 | " [r'$-\\pi$', r'$-\\pi/2$', r'$0$', r'$+\\pi/2$', r'$+\\pi$'])\n",
319 | "\n",
320 | "plt.yticks([-1, 0, +1],\n",
321 | " [r'$-1$', r'$0$', r'$+1$'])\n",
322 | "plt.show()"
323 | ]
324 | },
325 | {
326 | "cell_type": "markdown",
327 | "metadata": {},
328 | "source": [
329 | "## Moving the spines\n",
330 | "*Spines* are the lines that connect the axis tick marks and mark the boundaries of the data area. They can be placed at arbitrary positions and, until now, they were on the border of the axes. \n",
331 | "\n",
332 | "We will change that because we want them in the middle. Since there are four of them (top / bottom / left / right), we will discard the top and right ones by setting their color to 'none', and move the bottom and left ones so that they sit at 0 in data coordinates."
333 | ]
334 | },
335 | {
336 | "cell_type": "code",
337 | "execution_count": null,
338 | "metadata": {},
339 | "outputs": [],
340 | "source": [
341 | "plt.figure(figsize=(10, 6), dpi=80)\n",
342 | "plt.plot(X, C, color=\"blue\", linewidth=2.5, linestyle=\"-\")\n",
343 | "plt.plot(X, S, color=\"red\", linewidth=2.5, linestyle=\"-\")\n",
344 | "plt.xticks([-np.pi, -np.pi/2, 0, np.pi/2, np.pi],\n",
345 | " [r'$-\\pi$', r'$-\\pi/2$', r'$0$', r'$+\\pi/2$', r'$+\\pi$'])\n",
346 | "\n",
347 | "plt.yticks([-1, 0, +1],\n",
348 | " [r'$-1$', r'$0$', r'$+1$'])\n",
349 | "ax = plt.gca()  # gca returns the current axes\n",
350 | "ax.spines['right'].set_color('none')\n",
351 | "ax.spines['top'].set_color('none')\n",
352 | "ax.xaxis.set_ticks_position('bottom')\n",
353 | "ax.spines['bottom'].set_position(('data',0))\n",
354 | "ax.yaxis.set_ticks_position('left')\n",
355 | "ax.spines['left'].set_position(('data',0))  # position type can be 'outward', 'axes', or 'data'\n",
356 | "plt.show()"
357 | ]
358 | },
359 | {
360 | "cell_type": "markdown",
361 | "metadata": {},
362 | "source": [
363 | "## Adding a legend"
364 | ]
365 | },
366 | {
367 | "cell_type": "code",
368 | "execution_count": null,
369 | "metadata": {},
370 | "outputs": [],
371 | "source": [
372 | "plt.figure(figsize=(10, 6), dpi=80)\n",
373 | "\n",
374 | "plt.plot(X, C, color=\"blue\", linewidth=2.5, linestyle=\"-\", label=\"coseno\")\n",
375 | "plt.plot(X, S, color=\"red\", linewidth=2.5, linestyle=\"-\", label=\"seno\")\n",
376 | "plt.legend(loc='upper left')\n",
377 | "\n",
378 | "\n",
379 | "\n",
380 | "plt.xticks([-np.pi, -np.pi/2, 0, np.pi/2, np.pi],\n",
381 | " [r'$-\\pi$', r'$-\\pi/2$', r'$0$', r'$+\\pi/2$', r'$+\\pi$'])\n",
382 | "plt.yticks([-1, 0, +1],\n",
383 | " [r'$-1$', r'$0$', r'$+1$'])\n",
384 | "ax = plt.gca()  # gca returns the current axes\n",
385 | "ax.spines['right'].set_color('none')\n",
386 | "ax.spines['top'].set_color('none')\n",
387 | "ax.xaxis.set_ticks_position('bottom')\n",
388 | "ax.spines['bottom'].set_position(('data',0))\n",
389 | "ax.yaxis.set_ticks_position('left')\n",
390 | "ax.spines['left'].set_position(('data',0))  # position type can be 'outward', 'axes', or 'data'\n",
391 | "\n",
392 | "\n",
393 | "plt.show()"
394 | ]
395 | },
396 | {
397 | "cell_type": "markdown",
398 | "metadata": {},
399 | "source": [
400 | "## Annotating specific points\n",
401 | "\n",
402 | "Let's annotate some interesting points using the *annotate* command.\n"
403 | ]
404 | },
405 | {
406 | "cell_type": "code",
407 | "execution_count": null,
408 | "metadata": {},
409 | "outputs": [],
410 | "source": [
411 | "plt.figure(figsize=(10, 6), dpi=80)\n",
412 | "\n",
413 | "plt.plot(X, C, color=\"blue\", linewidth=2.5, linestyle=\"-\", label=\"coseno\")\n",
414 | "plt.plot(X, S, color=\"red\", linewidth=2.5, linestyle=\"-\", label=\"seno\")\n",
415 | "plt.legend(loc='upper left')\n",
416 | "\n",
417 | "t = 2 * np.pi / 3\n",
418 | "plt.plot([t, t], [0, np.cos(t)], color='blue', linewidth=2.5, linestyle=\"--\")\n",
419 | "plt.scatter([t, ], [np.cos(t), ], 50, color='blue')\n",
420 | "\n",
421 | "plt.annotate(r'$sin(\\frac{2\\pi}{3})=\\frac{\\sqrt{3}}{2}$',\n",
422 | " xy=(t, np.sin(t)), xycoords='data',\n",
423 | " xytext=(+10, +30), textcoords='offset points', fontsize=16,\n",
424 | " arrowprops=dict(arrowstyle=\"->\"))\n",
425 | "\n",
426 | "plt.plot([t, t],[0, np.sin(t)], color='red', linewidth=2.5, linestyle=\"--\")\n",
427 | "plt.scatter([t, ],[np.sin(t), ], 50, color='red')  # try removing this line\n",
428 | "\n",
429 | "plt.annotate(r'$cos(\\frac{2\\pi}{3})=-\\frac{1}{2}$',\n",
430 | " xy=(t, np.cos(t)), xycoords='data',\n",
431 | " xytext=(-90, -50), textcoords='offset points', fontsize=16,\n",
432 | " arrowprops=dict(arrowstyle=\"->\"))\n",
433 | "\n",
434 | "plt.xticks([-np.pi, -np.pi/2, 0, np.pi/2, np.pi],\n",
435 | " [r'$-\\pi$', r'$-\\pi/2$', r'$0$', r'$+\\pi/2$', r'$+\\pi$'])\n",
436 | "plt.yticks([-1, 0, +1],\n",
437 | " [r'$-1$', r'$0$', r'$+1$'])\n",
438 | "ax = plt.gca()  # gca returns the current axes\n",
439 | "ax.spines['right'].set_color('none')\n",
440 | "ax.spines['top'].set_color('none')\n",
441 | "ax.xaxis.set_ticks_position('bottom')\n",
442 | "ax.spines['bottom'].set_position(('data',0))\n",
443 | "ax.yaxis.set_ticks_position('left')\n",
444 | "ax.spines['left'].set_position(('data',0))  # position type can be 'outward', 'axes', or 'data'\n"
445 | ]
446 | },
447 | {
448 | "cell_type": "markdown",
449 | "metadata": {},
450 | "source": [
451 | "## The devil is in the details\n",
452 | "\n",
453 | "The tick labels are now barely visible because of the blue and red lines. We can make them bigger and also adjust their properties so that they are rendered on a semi-transparent white background. This will let us see both the data and the labels."
454 | ]
455 | },
456 | {
457 | "cell_type": "code",
458 | "execution_count": null,
459 | "metadata": {},
460 | "outputs": [],
461 | "source": [
462 | "plt.figure(figsize=(10, 6), dpi=80)\n",
463 | "\n",
464 | "plt.plot(X, C, color=\"blue\", linewidth=2.5, linestyle=\"-\", label=\"coseno\")\n",
465 | "plt.plot(X, S, color=\"red\", linewidth=2.5, linestyle=\"-\", label=\"seno\")\n",
466 | "plt.legend(loc='upper left')\n",
467 | "\n",
468 | "t = 2 * np.pi / 3\n",
469 | "plt.plot([t, t], [0, np.cos(t)], color='blue', linewidth=2.5, linestyle=\"--\")\n",
470 | "plt.scatter([t, ], [np.cos(t), ], 50, color='blue')\n",
471 | "\n",
472 | "plt.annotate(r'$sin(\\frac{2\\pi}{3})=\\frac{\\sqrt{3}}{2}$',\n",
473 | " xy=(t, np.sin(t)), xycoords='data',\n",
474 | " xytext=(+10, +30), textcoords='offset points', fontsize=16,\n",
475 | " arrowprops=dict(arrowstyle=\"->\"))\n",
476 | "\n",
477 | "plt.plot([t, t],[0, np.sin(t)], color='red', linewidth=2.5, linestyle=\"--\")\n",
478 | "plt.scatter([t, ],[np.sin(t), ], 50, color='red')  # try removing this line\n",
479 | "\n",
480 | "plt.annotate(r'$cos(\\frac{2\\pi}{3})=-\\frac{1}{2}$',\n",
481 | " xy=(t, np.cos(t)), xycoords='data',\n",
482 | " xytext=(-90, -50), textcoords='offset points', fontsize=16,\n",
483 | " arrowprops=dict(arrowstyle=\"->\"))\n",
484 | "\n",
485 | "plt.xticks([-np.pi, -np.pi/2, 0, np.pi/2, np.pi],\n",
486 | " [r'$-\\pi$', r'$-\\pi/2$', r'$0$', r'$+\\pi/2$', r'$+\\pi$'])\n",
487 | "plt.yticks([-1, 0, +1],\n",
488 | " [r'$-1$', r'$0$', r'$+1$'])\n",
489 | "ax = plt.gca()  # gca returns the current axes\n",
490 | "ax.spines['right'].set_color('none')\n",
491 | "ax.spines['top'].set_color('none')\n",
492 | "ax.xaxis.set_ticks_position('bottom')\n",
493 | "ax.spines['bottom'].set_position(('data',0))\n",
494 | "ax.yaxis.set_ticks_position('left')\n",
495 | "ax.spines['left'].set_position(('data',0))  # position type can be 'outward', 'axes', or 'data'\n",
496 | "\n",
497 | "for label in ax.get_xticklabels() + ax.get_yticklabels():\n",
498 | "    label.set_fontsize(16)\n",
499 | "    label.set_bbox(dict(facecolor='white', edgecolor='None', alpha=0.65))  # try other alpha and facecolor values"
500 | ]
501 | },
502 | {
503 | "cell_type": "markdown",
504 | "metadata": {},
505 | "source": [
506 | "## Adding a title"
507 | ]
508 | },
509 | {
510 | "cell_type": "code",
511 | "execution_count": null,
512 | "metadata": {},
513 | "outputs": [],
514 | "source": [
515 | "plt.figure(figsize=(10, 6), dpi=80)\n",
516 | "\n",
517 | "plt.plot(X, C, color=\"blue\", linewidth=2.5, linestyle=\"-\", label=\"coseno\")\n",
518 | "plt.plot(X, S, color=\"red\", linewidth=2.5, linestyle=\"-\", label=\"seno\")\n",
519 | "plt.legend(loc='upper left')\n",
520 | "\n",
521 | "plt.title(\"sin(x) y cos(x)\")\n",
522 | "t = 2 * np.pi / 3\n",
523 | "plt.plot([t, t], [0, np.cos(t)], color='blue', linewidth=2.5, linestyle=\"--\")\n",
524 | "plt.scatter([t, ], [np.cos(t), ], 50, color='blue')\n",
525 | "\n",
526 | "plt.annotate(r'$sin(\\frac{2\\pi}{3})=\\frac{\\sqrt{3}}{2}$',\n",
527 | " xy=(t, np.sin(t)), xycoords='data',\n",
528 | " xytext=(+10, +30), textcoords='offset points', fontsize=16,\n",
529 | " arrowprops=dict(arrowstyle=\"->\"))\n",
530 | "\n",
531 | "plt.plot([t, t],[0, np.sin(t)], color='red', linewidth=2.5, linestyle=\"--\")\n",
532 | "plt.scatter([t, ],[np.sin(t), ], 50, color='red')  # try removing this line\n",
533 | "\n",
534 | "plt.annotate(r'$cos(\\frac{2\\pi}{3})=-\\frac{1}{2}$',\n",
535 | " xy=(t, np.cos(t)), xycoords='data',\n",
536 | " xytext=(-90, -50), textcoords='offset points', fontsize=16,\n",
537 | " arrowprops=dict(arrowstyle=\"->\"))\n",
538 | "\n",
539 | "plt.xticks([-np.pi, -np.pi/2, 0, np.pi/2, np.pi],\n",
540 | " [r'$-\\pi$', r'$-\\pi/2$', r'$0$', r'$+\\pi/2$', r'$+\\pi$'])\n",
541 | "plt.yticks([-1, 0, +1],\n",
542 | " [r'$-1$', r'$0$', r'$+1$'])\n",
543 | "ax = plt.gca()  # gca returns the current axes\n",
544 | "ax.spines['right'].set_color('none')\n",
545 | "ax.spines['top'].set_color('none')\n",
546 | "ax.xaxis.set_ticks_position('bottom')\n",
547 | "ax.spines['bottom'].set_position(('data',0))\n",
548 | "ax.yaxis.set_ticks_position('left')\n",
549 | "ax.spines['left'].set_position(('data',0))  # position type can be 'outward', 'axes', or 'data'\n",
550 | "\n",
551 | "for label in ax.get_xticklabels() + ax.get_yticklabels():\n",
552 | "    label.set_fontsize(16)\n",
553 | "    label.set_bbox(dict(facecolor='white', edgecolor='None', alpha=0.65))  # try other alpha and facecolor values"
554 | ]
555 | },
556 | {
557 | "cell_type": "markdown",
558 | "metadata": {},
559 | "source": [
560 | "## Exercise\n",
561 | "\n",
562 | "Apply everything covered above to plot the functions *cosh(x)* and *sinh(x)* from -2π to 2π, using 256 distinct values on the 'x' axis."
563 | ]
564 | },
565 | {
566 | "cell_type": "markdown",
567 | "metadata": {},
568 | "source": [
569 | "## Create the data"
570 | ]
571 | },
572 | {
573 | "cell_type": "code",
574 | "execution_count": null,
575 | "metadata": {},
576 | "outputs": [],
577 | "source": []
578 | },
579 | {
580 | "cell_type": "markdown",
581 | "metadata": {},
582 | "source": [
583 | "## Plot\n",
584 | "\n",
585 | "You should obtain the following figure\n",
586 | "\n",
587 | ""
588 | ]
589 | },
590 | {
591 | "cell_type": "code",
592 | "execution_count": null,
593 | "metadata": {},
594 | "outputs": [],
595 | "source": []
596 | },
597 | {
598 | "cell_type": "markdown",
599 | "metadata": {},
600 | "source": [
601 | "From data to visualization "
602 | ]
603 | },
604 | {
605 | "cell_type": "markdown",
606 | "metadata": {},
607 | "source": [
608 | "## Data\n",
609 | "\n",
610 | "The data shows the sales figures for a number of companies"
611 | ]
612 | },
613 | {
614 | "cell_type": "code",
615 | "execution_count": null,
616 | "metadata": {},
617 | "outputs": [],
618 | "source": [
619 | "import numpy as np\n",
620 | "import matplotlib.pyplot as plt\n",
621 | "from matplotlib.ticker import FuncFormatter\n",
622 | "\n",
623 | "data = {'Barton LLC': 109438.50,\n",
624 | " 'Frami, Hills and Schmidt': 103569.59,\n",
625 | " 'Fritsch, Russel and Anderson': 112214.71,\n",
626 | " 'Jerde-Hilpert': 112591.43,\n",
627 | " 'Keeling LLC': 100934.30,\n",
628 | " 'Koepp Ltd': 103660.54,\n",
629 | " 'Kulas Inc': 137351.96,\n",
630 | " 'Trantow-Barrows': 123381.38,\n",
631 | " 'White-Trantow': 135841.99,\n",
632 | " 'Will LLC': 104437.60}\n",
633 | "group_data = list(data.values())\n",
634 | "group_names = list(data.keys())\n",
635 | "group_mean = np.mean(group_data)"
636 | ]
637 | },
638 | {
639 | "cell_type": "markdown",
640 | "metadata": {},
641 | "source": [
642 | "How would you visualize this information? "
643 | ]
644 | },
645 | {
646 | "cell_type": "markdown",
647 | "metadata": {},
648 | "source": [
649 | "This kind of data is usually visualized with *bar plots*, where each bar corresponds to one group. \n",
650 | "\n",
651 | "To do this, we can create an instance of *Figure* and *Axes*. The *Figure* is our 'canvas', and the *Axes* is the part of that canvas on which we draw a particular visualization."
652 | ]
653 | },
654 | {
655 | "cell_type": "code",
656 | "execution_count": null,
657 | "metadata": {},
658 | "outputs": [],
659 | "source": [
660 | "fig, ax = plt.subplots()  # a figure can contain multiple 'axes'"
661 | ]
662 | },
663 | {
664 | "cell_type": "markdown",
665 | "metadata": {},
666 | "source": [
667 | "An 'Axes' instance has been created, so now we can plot on it.\n",
668 | "\n",
669 | "'barh' creates a horizontal bar chart"
670 | ]
671 | },
672 | {
673 | "cell_type": "code",
674 | "execution_count": null,
675 | "metadata": {},
676 | "outputs": [],
677 | "source": [
678 | "fig, ax = plt.subplots()\n",
679 | "ax.barh(group_names, group_data)  # try replacing barh with bar"
680 | ]
681 | },
682 | {
683 | "cell_type": "markdown",
684 | "metadata": {},
685 | "source": [
686 | "## Styles\n",
687 | "\n",
688 | "Matplotlib ships with many built-in styles that change the overall look of a plot.\n"
689 | ]
690 | },
691 | {
692 | "cell_type": "code",
693 | "execution_count": null,
694 | "metadata": {},
695 | "outputs": [],
696 | "source": [
697 | "print(plt.style.available)\n"
698 | ]
699 | },
700 | {
701 | "cell_type": "markdown",
702 | "metadata": {},
703 | "source": [
704 | "A style can be activated as follows:"
705 | ]
706 | },
707 | {
708 | "cell_type": "code",
709 | "execution_count": null,
710 | "metadata": {},
711 | "outputs": [],
712 | "source": [
713 | "plt.style.use('fivethirtyeight')  # another option to try: seaborn-dark-palette\n",
714 | "fig, ax = plt.subplots()\n",
715 | "ax.barh(group_names, group_data)"
716 | ]
717 | },
718 | {
719 | "cell_type": "markdown",
720 | "metadata": {},
721 | "source": [
722 | "## Rotate axis labels\n",
723 | "Although the plot is coming along well, the tick labels on the 'x' axis are hard to tell apart.\n",
724 | "We can rotate them to make them easier to read.\n",
725 | "\n",
726 | "When you want to change a property of many items at once, you can use the 'setp' function. It takes a list (or several lists) of Matplotlib objects and tries to set the given properties on each element.\n"
727 | ]
728 | },
729 | {
730 | "cell_type": "code",
731 | "execution_count": null,
732 | "metadata": {},
733 | "outputs": [],
734 | "source": [
735 | "fig, ax = plt.subplots()\n",
736 | "ax.barh(group_names, group_data)\n",
737 | "labels = ax.get_xticklabels()  # the tick labels of the x axis\n",
738 | "plt.setp(labels, rotation=45, horizontalalignment='right');  # setp receives the list of labels"
739 | ]
740 | },
741 | {
742 | "cell_type": "markdown",
743 | "metadata": {},
744 | "source": [
745 | "## Adjust the figure\n",
746 | "\n",
747 | "*tight_layout* automatically adjusts the subplot parameters so that the subplot fits within the available figure area.\n"
748 | ]
749 | },
750 | {
751 | "cell_type": "code",
752 | "execution_count": null,
753 | "metadata": {},
754 | "outputs": [],
755 | "source": [
756 | "\n",
757 | "fig, ax = plt.subplots()\n",
758 | "ax.barh(group_names, group_data)\n",
759 | "labels = ax.get_xticklabels()\n",
760 | "plt.setp(labels, rotation=45, horizontalalignment='right');\n",
761 | "plt.tight_layout()"
762 | ]
763 | },
764 | {
765 | "cell_type": "markdown",
766 | "metadata": {},
767 | "source": [
768 | "## Add labels to the figure\n",
769 | "\n",
770 | "Let's add axis limits, axis labels and a title."
771 | ]
772 | },
773 | {
774 | "cell_type": "code",
775 | "execution_count": null,
776 | "metadata": {},
777 | "outputs": [],
778 | "source": [
779 | "fig, ax = plt.subplots()\n",
780 | "ax.barh(group_names, group_data)\n",
781 | "labels = ax.get_xticklabels()\n",
782 | "plt.setp(labels, rotation=45, horizontalalignment='right')\n",
783 | "plt.tight_layout()\n",
784 | "ax.set(xlim=[-10000, 140000], xlabel='Total Revenue', ylabel='Company',\n",
785 | " title='Company Revenue');"
786 | ]
787 | },
788 | {
789 | "cell_type": "markdown",
790 | "metadata": {},
791 | "source": [
792 | "## Let's change the figure size"
793 | ]
794 | },
795 | {
796 | "cell_type": "code",
797 | "execution_count": null,
798 | "metadata": {},
799 | "outputs": [],
800 | "source": [
801 | "fig, ax = plt.subplots(figsize=(8, 4))\n",
802 | "ax.barh(group_names, group_data)\n",
803 | "labels = ax.get_xticklabels()\n",
804 | "plt.tight_layout()\n",
805 | "plt.setp(labels, rotation=45, horizontalalignment='right')\n",
806 | "ax.set(xlim=[-10000, 140000], xlabel='Total Revenue', ylabel='Company',\n",
807 | " title='Company Revenue');"
808 | ]
809 | },
810 | {
811 | "cell_type": "markdown",
812 | "metadata": {},
813 | "source": [
814 | "## Format the tick labels\n",
815 | "\n",
816 | "This requires the 'ticker.FuncFormatter' class.\n",
817 | "\n",
818 | "Let's define a function that takes a tick value (a number) and its position as input and returns a string."
819 | ]
820 | },
821 | {
822 | "cell_type": "code",
823 | "execution_count": null,
824 | "metadata": {},
825 | "outputs": [],
826 | "source": [
827 | "def currency(x, pos):  # tick value and tick position\n",
828 | "\n",
829 | "    if x >= 1e6:\n",
830 | "        s = '${:1.1f}M'.format(x*1e-6)\n",
831 | "    else:\n",
832 | "        s = '${:1.0f}K'.format(x*1e-3)\n",
833 | "    return s\n",
834 | "\n",
835 | "formatter = FuncFormatter(currency)"
836 | ]
837 | },
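To see what the formatter produces, the function can be called directly on a few tick values. This is a minimal sketch that repeats the 'currency' function from the cell above without Matplotlib; the sample values are chosen for illustration (137351.96 is the Kulas Inc revenue from the data, the others are made up).

```python
def currency(x, pos):
    """Format a tick value as dollars in thousands (K) or millions (M)."""
    if x >= 1e6:
        return '${:1.1f}M'.format(x * 1e-6)
    return '${:1.0f}K'.format(x * 1e-3)

# Matplotlib passes the tick value and its position; the position is unused here.
samples = [0, 20000, 137351.96, 2500000]
labels = [currency(value, pos) for pos, value in enumerate(samples)]
print(labels)  # ['$0K', '$20K', '$137K', '$2.5M']
```

Every value below one million falls through to the K branch, so a negative axis limit such as -10000 is rendered as '$-10K'.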
838 | {
839 | "cell_type": "markdown",
840 | "metadata": {},
841 | "source": [
842 | "We can apply this function to the tick labels of our figure through the 'xaxis' attribute of the Axes, which lets us work on one specific axis."
843 | ]
844 | },
845 | {
846 | "cell_type": "code",
847 | "execution_count": null,
848 | "metadata": {},
849 | "outputs": [],
850 | "source": [
851 | "fig, ax = plt.subplots(figsize=(6, 8))\n",
852 | "ax.barh(group_names, group_data)\n",
853 | "labels = ax.get_xticklabels()\n",
854 | "plt.setp(labels, rotation=45, horizontalalignment='right')\n",
855 | "ax.set(xlim=[-10000, 140000], xlabel='Total Revenue', ylabel='Company',\n",
856 | " title='Company Revenue')\n",
857 | "plt.tight_layout()\n",
858 | "ax.xaxis.set_major_formatter(formatter)  # format the major tick labels"
859 | ]
860 | },
861 | {
862 | "cell_type": "markdown",
863 | "metadata": {},
864 | "source": [
865 | "## Combine with other plot elements\n",
866 | "\n"
867 | ]
868 | },
869 | {
870 | "cell_type": "code",
871 | "execution_count": null,
872 | "metadata": {},
873 | "outputs": [],
874 | "source": [
875 | "fig, ax = plt.subplots(figsize=(6, 8))\n",
876 | "ax.barh(group_names, group_data)\n",
877 | "labels = ax.get_xticklabels()\n",
878 | "plt.setp(labels, rotation=45, horizontalalignment='right')\n",
879 | "ax.set(xlim=[-10000, 140000], xlabel='Total Revenue', ylabel='Company',\n",
880 | " title='Company Revenue')\n",
881 | "plt.tight_layout()\n",
882 | "ax.xaxis.set_major_formatter(formatter)  # format the major tick labels\n",
883 | "# Add a vertical line at the mean revenue across companies\n",
884 | "ax.axvline(group_mean, ls='--', color='r')\n",
885 | "\n",
886 | "# Annotate new companies\n",
887 | "for group in [3, 5, 8]:\n",
888 | "    ax.text(145000, group, \"New Company\", fontsize=10,\n",
889 | "            verticalalignment=\"center\")"
890 | ]
891 | },
892 | {
893 | "cell_type": "markdown",
894 | "metadata": {},
895 | "source": [
896 | "## Save the image"
897 | ]
898 | },
899 | {
900 | "cell_type": "code",
901 | "execution_count": null,
902 | "metadata": {},
903 | "outputs": [],
904 | "source": [
905 | "print(fig.canvas.get_supported_filetypes())\n"
906 | ]
907 | },
908 | {
909 | "cell_type": "code",
910 | "execution_count": null,
911 | "metadata": {},
912 | "outputs": [],
913 | "source": [
914 | "\n",
915 | "#fig.savefig('sales.png', transparent=False, dpi=80, bbox_inches=\"tight\")"
916 | ]
917 | },
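The 'savefig' call in the cell above is commented out. As a hedged sketch of what it does, the example below saves a small figure with the same keyword arguments, but writes to an in-memory buffer instead of 'sales.png' so that nothing is left on disk. The two-company data and the 'Agg' backend line are assumptions made so the snippet runs on its own outside a notebook; inside the notebook the backend line can be dropped.

```python
import io

import matplotlib
matplotlib.use('Agg')  # assumption: non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

# Two rows taken from the revenue data, just to have something to draw.
fig, ax = plt.subplots(figsize=(6, 4))
ax.barh(['Barton LLC', 'Keeling LLC'], [109438.50, 100934.30])

# Same options as the commented-out call, with a buffer in place of a file name.
buffer = io.BytesIO()
fig.savefig(buffer, format='png', transparent=False, dpi=80, bbox_inches='tight')
png_bytes = buffer.getvalue()
plt.close(fig)

# A PNG file always starts with this 8-byte signature.
print(png_bytes[:8] == b'\x89PNG\r\n\x1a\n')  # True
```

Passing a file name such as 'sales.png' instead of the buffer writes the same bytes to disk; 'bbox_inches' set to 'tight' trims the empty margin around the plot.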
918 | {
919 | "cell_type": "markdown",
920 | "metadata": {},
921 | "source": [
922 | "## Exercise"
923 | ]
924 | },
925 | {
926 | "cell_type": "markdown",
927 | "metadata": {},
928 | "source": [
929 | " "
930 | ]
931 | },
932 | {
933 | "cell_type": "code",
934 | "execution_count": null,
935 | "metadata": {},
936 | "outputs": [],
937 | "source": [
938 | "peliculas ={\n",
939 | "'Avengers: Infinity War': ('27 de abril','Walt Disney Pictures',257698183),\n",
940 | "'Black Panther': ('16 de febrero','Walt Disney Pictures',202003951),\n",
941 | "'A Quiet Place':('6 de abril','Paramount',88203562),\n",
942 | "'Ready Player One':('29 de marzo','Warner Bros',41764050),\n",
943 | "'Cincuenta sombras liberadas':('9 de febrero','Universal Pictures',38560195),\n",
944 | "'Rampage':('13 de abril','Warner Bros',35753093),\n",
945 | "'A Wrinkle in Time':('9 de marzo','Walt Disney Pictures',33123609),\n",
946 | "'Insidious: The Last Key':('5 de enero','Universal Pictures',29581355),\n",
947 | "'Pacific Rim: Uprising':('23 de marzo','Universal Pictures',28116535),\n",
948 | "'Peter Rabbit':('9 de febrero','Sony / Columbia',25010928)\n",
949 | "}"
950 | ]
951 | },
952 | {
953 | "cell_type": "code",
954 | "execution_count": null,
955 | "metadata": {},
956 | "outputs": [],
957 | "source": []
958 | }
959 | ],
960 | "metadata": {
961 | "celltoolbar": "Slideshow",
962 | "kernelspec": {
963 | "display_name": "Python 2",
964 | "language": "python",
965 | "name": "python2"
966 | },
967 | "language_info": {
968 | "codemirror_mode": {
969 | "name": "ipython",
970 | "version": 2
971 | },
972 | "file_extension": ".py",
973 | "mimetype": "text/x-python",
974 | "name": "python",
975 | "nbconvert_exporter": "python",
976 | "pygments_lexer": "ipython2",
977 | "version": "2.7.14"
978 | }
979 | },
980 | "nbformat": 4,
981 | "nbformat_minor": 2
982 | }
983 |
--------------------------------------------------------------------------------
/Clase 5/Ticks.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gonzalezf/Data-Analysis-and-Visualization-with-Python/475bb944a004bc3e5b2f94d7484b76fefe39a3ca/Clase 5/Ticks.png
--------------------------------------------------------------------------------
/Clase 5/axes.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gonzalezf/Data-Analysis-and-Visualization-with-Python/475bb944a004bc3e5b2f94d7484b76fefe39a3ca/Clase 5/axes.png
--------------------------------------------------------------------------------
/Clase 5/ejercicio1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gonzalezf/Data-Analysis-and-Visualization-with-Python/475bb944a004bc3e5b2f94d7484b76fefe39a3ca/Clase 5/ejercicio1.png
--------------------------------------------------------------------------------
/Clase 5/ejercicio11.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gonzalezf/Data-Analysis-and-Visualization-with-Python/475bb944a004bc3e5b2f94d7484b76fefe39a3ca/Clase 5/ejercicio11.png
--------------------------------------------------------------------------------
/Clase 5/ejercicio2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gonzalezf/Data-Analysis-and-Visualization-with-Python/475bb944a004bc3e5b2f94d7484b76fefe39a3ca/Clase 5/ejercicio2.png
--------------------------------------------------------------------------------
/Clase 5/figureParameters.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gonzalezf/Data-Analysis-and-Visualization-with-Python/475bb944a004bc3e5b2f94d7484b76fefe39a3ca/Clase 5/figureParameters.png
--------------------------------------------------------------------------------
/Clase 5/figureParts.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gonzalezf/Data-Analysis-and-Visualization-with-Python/475bb944a004bc3e5b2f94d7484b76fefe39a3ca/Clase 5/figureParts.png
--------------------------------------------------------------------------------
/Clase 5/subplots.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gonzalezf/Data-Analysis-and-Visualization-with-Python/475bb944a004bc3e5b2f94d7484b76fefe39a3ca/Clase 5/subplots.png
--------------------------------------------------------------------------------
/Clase 6/gqm.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gonzalezf/Data-Analysis-and-Visualization-with-Python/475bb944a004bc3e5b2f94d7484b76fefe39a3ca/Clase 6/gqm.png
--------------------------------------------------------------------------------
/Clase 6/mini_ejercicio1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gonzalezf/Data-Analysis-and-Visualization-with-Python/475bb944a004bc3e5b2f94d7484b76fefe39a3ca/Clase 6/mini_ejercicio1.png
--------------------------------------------------------------------------------
/Clase 6/mini_ejercicio22.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gonzalezf/Data-Analysis-and-Visualization-with-Python/475bb944a004bc3e5b2f94d7484b76fefe39a3ca/Clase 6/mini_ejercicio22.png
--------------------------------------------------------------------------------
/Clase 6/mini_ejercicio222.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gonzalezf/Data-Analysis-and-Visualization-with-Python/475bb944a004bc3e5b2f94d7484b76fefe39a3ca/Clase 6/mini_ejercicio222.png
--------------------------------------------------------------------------------
/Clase 7/graf1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gonzalezf/Data-Analysis-and-Visualization-with-Python/475bb944a004bc3e5b2f94d7484b76fefe39a3ca/Clase 7/graf1.png
--------------------------------------------------------------------------------
/Clase 7/graf2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gonzalezf/Data-Analysis-and-Visualization-with-Python/475bb944a004bc3e5b2f94d7484b76fefe39a3ca/Clase 7/graf2.png
--------------------------------------------------------------------------------
/Clase 7/graf3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gonzalezf/Data-Analysis-and-Visualization-with-Python/475bb944a004bc3e5b2f94d7484b76fefe39a3ca/Clase 7/graf3.png
--------------------------------------------------------------------------------
/Clase 7/graf4.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gonzalezf/Data-Analysis-and-Visualization-with-Python/475bb944a004bc3e5b2f94d7484b76fefe39a3ca/Clase 7/graf4.png
--------------------------------------------------------------------------------
/Clase 7/graf5.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gonzalezf/Data-Analysis-and-Visualization-with-Python/475bb944a004bc3e5b2f94d7484b76fefe39a3ca/Clase 7/graf5.png
--------------------------------------------------------------------------------
/Clase 7/prueba_plotly.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gonzalezf/Data-Analysis-and-Visualization-with-Python/475bb944a004bc3e5b2f94d7484b76fefe39a3ca/Clase 7/prueba_plotly.png
--------------------------------------------------------------------------------
/Clase 7/torpedo.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gonzalezf/Data-Analysis-and-Visualization-with-Python/475bb944a004bc3e5b2f94d7484b76fefe39a3ca/Clase 7/torpedo.png
--------------------------------------------------------------------------------
/Clase 8/Clase 8.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {
6 | "slideshow": {
7 | "slide_type": "slide"
8 | }
9 | },
10 | "source": [
11 | "\n",
12 | " \n",
13 | "\n",
14 | " \n",
15 | "\n",
16 | "# Workshop on Data Handling and Visualization with Python\n",
17 | "ScatterText \n",
18 | "Felipe González P. \n",
19 | "felipe.gonzalezp.12@sansano.usm.cl \n",
20 | "\n",
21 | " \n",
22 | "Thursday, Block 7-8 \n",
23 | "Campus San Joaquín\n",
24 | "\n",
25 | "\n"
26 | ]
27 | },
28 | {
29 | "cell_type": "markdown",
30 | "metadata": {
31 | "slideshow": {
32 | "slide_type": "slide"
33 | }
34 | },
35 | "source": [
36 | "# spaCy\n",
37 | "spaCy is a library that covers the key NLP tasks needed to prepare text for further analysis, including tokenization, named entity recognition and word vectors, among many others. It aims to provide a streamlined pipeline so that you reach the final result as quickly as possible. We will not use it directly, since the Scattertext library uses it automatically, but given its importance in this area it is worth mentioning."
38 | ]
39 | },
40 | {
41 | "cell_type": "markdown",
42 | "metadata": {
43 | "slideshow": {
44 | "slide_type": "slide"
45 | }
46 | },
47 | "source": [
48 | "# Scattertext\n",
49 | "\n",
50 | "Scattertext is a Python package for comparing and contrasting how words are used differently in two types of documents. It produces interactive JavaScript-based visualizations that can easily be embedded in Jupyter notebooks. Using spaCy and Empath, Scattertext can also show how emotional states and words relate to a particular topic.\n"
51 | ]
52 | },
53 | {
54 | "cell_type": "markdown",
55 | "metadata": {
56 | "slideshow": {
57 | "slide_type": "slide"
58 | }
59 | },
60 | "source": [
61 | " \n",
62 | "\n",
63 | "\n"
64 | ]
65 | },
66 | {
67 | "cell_type": "markdown",
68 | "metadata": {
69 | "slideshow": {
70 | "slide_type": "slide"
71 | }
72 | },
73 | "source": [
74 | " \n",
75 | ""
76 | ]
77 | },
78 | {
79 | "cell_type": "markdown",
80 | "metadata": {
81 | "slideshow": {
82 | "slide_type": "slide"
83 | }
84 | },
85 | "source": [
86 | "# Installation \n",
87 | "```bash\n",
88 | "conda install -c conda-forge spacy\n",
89 | "conda install -c conda-forge scattertext\n",
90 | "python -m spacy download en\n",
91 | "```"
92 | ]
93 | },
94 | {
95 | "cell_type": "markdown",
96 | "metadata": {
97 | "slideshow": {
98 | "slide_type": "slide"
99 | }
100 | },
101 | "source": [
102 | "# Import libraries "
103 | ]
104 | },
105 | {
106 | "cell_type": "code",
107 | "execution_count": 4,
108 | "metadata": {
109 | "slideshow": {
110 | "slide_type": "fragment"
111 | }
112 | },
113 | "outputs": [
114 | {
115 | "data": {
116 | "text/html": [
117 | ""
118 | ],
119 | "text/plain": [
120 | ""
121 | ]
122 | },
123 | "metadata": {},
124 | "output_type": "display_data"
125 | }
126 | ],
127 | "source": [
128 | "%matplotlib inline\n",
129 | "import pandas as pd\n",
130 | "import numpy as np\n",
131 | "import matplotlib.pyplot as plt\n",
132 | "import spacy\n",
133 | "from datetime import datetime\n",
134 | "import scattertext as st\n",
135 | "from IPython.display import IFrame\n",
136 | "from IPython.display import display, HTML\n",
137 | "display(HTML(\"\"))"
138 | ]
139 | },
140 | {
141 | "cell_type": "markdown",
142 | "metadata": {
143 | "slideshow": {
144 | "slide_type": "slide"
145 | }
146 | },
147 | "source": [
148 | "# Read the dataset\n",
149 | "Dataset obtained from GitHub "
150 | ]
151 | },
152 | {
153 | "cell_type": "code",
154 | "execution_count": 8,
155 | "metadata": {
156 | "slideshow": {
157 | "slide_type": "fragment"
158 | }
159 | },
160 | "outputs": [],
161 | "source": [
162 | "data = pd.read_csv(\"election_day_tweets_with_emojis_cleaned_VF.csv\")"
163 | ]
164 | },
165 | {
166 | "cell_type": "code",
167 | "execution_count": 9,
168 | "metadata": {
169 | "slideshow": {
170 | "slide_type": "fragment"
171 | }
172 | },
173 | "outputs": [
174 | {
175 | "data": {
176 | "text/html": [
374 | ],
375 | "text/plain": [
376 | " tweetindex text \\\n",
377 | "0 3 Is Election Day over yet \n",
378 | "1 5 Do people not vote for the Green Party because... \n",
379 | "2 13 anyways, I voted \n",
380 | "3 15 Mfs Steady Talking Bout The Election How Many ... \n",
381 | "4 18 good thing i can't vote \n",
382 | "5 21 1 vote won't change anything \n",
383 | "6 22 make sure yall get out and VOTE today!! \n",
384 | "7 38 This Election Really Show How Fucking Smart Th... \n",
385 | "8 49 This is actually shocking. But then again, hon... \n",
386 | "9 53 My snapchat story/rant about this election is ... \n",
387 | "\n",
388 | " ts hour min lang source.r clinton trump election vote \\\n",
389 | "0 11/8/16 15:57 15 1557 en IOS 0 0 1 0 \n",
390 | "1 11/8/16 15:22 15 1522 en WEB 0 0 1 1 \n",
391 | "2 11/8/16 18:09 18 1809 en IOS 0 0 0 1 \n",
392 | "3 11/8/16 16:23 16 1623 en IOS 0 0 1 1 \n",
393 | "4 11/8/16 11:12 11 1112 en IOS 0 0 0 1 \n",
394 | "5 11/8/16 23:19 23 2319 en IOS 0 0 0 1 \n",
395 | "6 11/8/16 6:31 6 631 en OTHER 0 0 0 1 \n",
396 | "7 11/8/16 22:15 22 2215 en IOS 0 0 1 0 \n",
397 | "8 11/8/16 22:32 22 2232 en IOS 0 0 0 1 \n",
398 | "9 11/8/16 13:32 13 1332 en IOS 0 0 1 0 \n",
399 | "\n",
400 | " emoji.names \\\n",
401 | "0 expressionless face \n",
402 | "1 thinking face \n",
403 | "2 hugging face \n",
404 | "3 face with tears of joy, sleeping face \n",
405 | "4 hugging face, Puerto Rico \n",
406 | "5 face with rolling eyes \n",
407 | "6 double exclamation mark \n",
408 | "7 expressionless face \n",
409 | "8 face with rolling eyes \n",
410 | "9 face with tears of joy, United States \n",
411 | "\n",
412 | " url \n",
413 | "0 twitter.com/________kenzie/status/796094330494... \n",
414 | "1 twitter.com/________owl/status/796085601938313217 \n",
415 | "2 twitter.com/_______kml/status/796127605422522373 \n",
416 | "3 twitter.com/_______KrewNate/status/79610077736... \n",
417 | "4 twitter.com/_______richard/status/796022544570... \n",
418 | "5 twitter.com/______aMoya/status/796205457979428864 \n",
419 | "6 twitter.com/______Ashlee/status/79595176059457... \n",
420 | "7 twitter.com/______raeeeeee/status/796189489907... \n",
421 | "8 twitter.com/_____armani/status/796193747176398849 \n",
422 | "9 twitter.com/_____bat/status/796057835461283840 "
423 | ]
424 | },
425 | "execution_count": 9,
426 | "metadata": {},
427 | "output_type": "execute_result"
428 | }
429 | ],
430 | "source": [
431 | "data.head(10)"
432 | ]
433 | },
434 | {
435 | "cell_type": "markdown",
436 | "metadata": {
437 | "slideshow": {
438 | "slide_type": "slide"
439 | }
440 | },
441 | "source": [
442 | "# Analyze tweets by hour "
443 | ]
444 | },
445 | {
446 | "cell_type": "code",
447 | "execution_count": 49,
448 | "metadata": {
449 | "slideshow": {
450 | "slide_type": "fragment"
451 | }
452 | },
453 | "outputs": [],
454 | "source": [
455 | "import time\n",
456 | "def transform_datetime(x):\n",
457 | "    try:  # parse 'month/day/year hour:minute' and truncate to the hour\n",
458 | "        return time.strftime('%y-%m-%d %H:00', time.strptime(x,'%m/%d/%y %H:%M'))\n",
459 | "    except (TypeError, ValueError):  # missing or malformed timestamp\n",
460 | "        return None\n",
461 | "data['created_at'] = data.ts.apply(transform_datetime)\n"
462 | ]
463 | },
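The conversion above can be checked on a few individual values before applying it to the whole column. This is a minimal sketch using only the standard library; the first two inputs are timestamps from the 'ts' column shown earlier, and the last two are made-up bad inputs. Catching only 'TypeError' and 'ValueError' is a choice made here, not necessarily what the original cell intended.

```python
import time

def transform_datetime(x):
    """Truncate an 'M/D/YY H:MM' timestamp string to the hour."""
    try:
        parsed = time.strptime(x, '%m/%d/%y %H:%M')
    except (TypeError, ValueError):
        return None  # missing or malformed timestamp
    return time.strftime('%y-%m-%d %H:00', parsed)

print(transform_datetime('11/8/16 15:57'))  # 16-11-08 15:00
print(transform_datetime('11/8/16 6:31'))   # 16-11-08 06:00
print(transform_datetime(None))             # None
print(transform_datetime('not a date'))     # None
```

Because the output puts year, month, day and a zero-padded hour in that order, sorting the resulting strings also sorts them chronologically, which is what the later 'groupby' relies on.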
464 | {
465 | "cell_type": "code",
466 | "execution_count": 50,
467 | "metadata": {
468 | "slideshow": {
469 | "slide_type": "fragment"
470 | }
471 | },
472 | "outputs": [
473 | {
474 | "data": {
475 | "text/html": [
599 | ],
600 | "text/plain": [
601 | " tweetindex text \\\n",
602 | "0 3 Is Election Day over yet \n",
603 | "1 5 Do people not vote for the Green Party because... \n",
604 | "2 13 anyways, I voted \n",
605 | "3 15 Mfs Steady Talking Bout The Election How Many ... \n",
606 | "4 18 good thing i can't vote \n",
607 | "\n",
608 | " ts hour min lang source.r clinton trump election vote \\\n",
609 | "0 11/8/16 15:57 15 1557 en IOS 0 0 1 0 \n",
610 | "1 11/8/16 15:22 15 1522 en WEB 0 0 1 1 \n",
611 | "2 11/8/16 18:09 18 1809 en IOS 0 0 0 1 \n",
612 | "3 11/8/16 16:23 16 1623 en IOS 0 0 1 1 \n",
613 | "4 11/8/16 11:12 11 1112 en IOS 0 0 0 1 \n",
614 | "\n",
615 | " emoji.names \\\n",
616 | "0 expressionless face \n",
617 | "1 thinking face \n",
618 | "2 hugging face \n",
619 | "3 face with tears of joy, sleeping face \n",
620 | "4 hugging face, Puerto Rico \n",
621 | "\n",
622 | " url created_at \n",
623 | "0 twitter.com/________kenzie/status/796094330494... 16-11-08 15:00 \n",
624 | "1 twitter.com/________owl/status/796085601938313217 16-11-08 15:00 \n",
625 | "2 twitter.com/_______kml/status/796127605422522373 16-11-08 18:00 \n",
626 | "3 twitter.com/_______KrewNate/status/79610077736... 16-11-08 16:00 \n",
627 | "4 twitter.com/_______richard/status/796022544570... 16-11-08 11:00 "
628 | ]
629 | },
630 | "execution_count": 50,
631 | "metadata": {},
632 | "output_type": "execute_result"
633 | }
634 | ],
635 | "source": [
636 | "data.head()"
637 | ]
638 | },
639 | {
640 | "cell_type": "markdown",
641 | "metadata": {
642 | "slideshow": {
643 | "slide_type": "slide"
644 | }
645 | },
646 | "source": [
647 | "# Dataset size "
648 | ]
649 | },
650 | {
651 | "cell_type": "code",
652 | "execution_count": 51,
653 | "metadata": {
654 | "slideshow": {
655 | "slide_type": "fragment"
656 | }
657 | },
658 | "outputs": [
659 | {
660 | "data": {
661 | "text/plain": [
662 | "175655"
663 | ]
664 | },
665 | "execution_count": 51,
666 | "metadata": {},
667 | "output_type": "execute_result"
668 | }
669 | ],
670 | "source": [
671 | "len(data)"
672 | ]
673 | },
674 | {
675 | "cell_type": "markdown",
676 | "metadata": {
677 | "slideshow": {
678 | "slide_type": "fragment"
679 | }
680 | },
681 | "source": [
682 | "# Let's categorize the tweets"
683 | ]
684 | },
685 | {
686 | "cell_type": "code",
687 | "execution_count": 52,
688 | "metadata": {
689 | "slideshow": {
690 | "slide_type": "fragment"
691 | }
692 | },
693 | "outputs": [],
694 | "source": [
695 | "data_clinton = data[data['clinton']==1]\n",
696 | "data_trump = data[data['trump']==1]"
697 | ]
698 | },
699 | {
700 | "cell_type": "markdown",
701 | "metadata": {
702 | "slideshow": {
703 | "slide_type": "slide"
704 | }
705 | },
706 | "source": [
707 | "How many people tweet about Clinton? How many about Trump?"
708 | ]
709 | },
710 | {
711 | "cell_type": "markdown",
712 | "metadata": {
713 | "slideshow": {
714 | "slide_type": "slide"
715 | }
716 | },
717 | "source": [
718 | "# Let's look at the tweets over time"
719 | ]
720 | },
721 | {
722 | "cell_type": "code",
723 | "execution_count": 53,
724 | "metadata": {
725 | "slideshow": {
726 | "slide_type": "fragment"
727 | }
728 | },
729 | "outputs": [],
730 | "source": [
731 | "tweets_over_time_clinton = data_clinton.groupby('created_at').size()\n",
732 | "tweets_over_time_trump = data_trump.groupby('created_at').size()\n"
733 | ]
734 | },
735 | {
736 | "cell_type": "code",
737 | "execution_count": 54,
738 | "metadata": {
739 | "slideshow": {
740 | "slide_type": "fragment"
741 | }
742 | },
743 | "outputs": [
744 | {
745 | "data": {
746 | "text/plain": [
747 | "created_at\n",
748 | "16-11-08 01:00 421\n",
749 | "16-11-08 02:00 758\n",
750 | "16-11-08 03:00 743\n",
751 | "16-11-08 04:00 545\n",
752 | "16-11-08 05:00 631\n",
753 | "16-11-08 06:00 861\n",
754 | "16-11-08 07:00 1349\n",
755 | "16-11-08 08:00 1743\n",
756 | "16-11-08 09:00 1827\n",
757 | "16-11-08 10:00 1799\n",
758 | "16-11-08 11:00 1736\n",
759 | "16-11-08 12:00 1608\n",
760 | "16-11-08 13:00 1472\n",
761 | "16-11-08 14:00 1438\n",
762 | "16-11-08 15:00 1467\n",
763 | "16-11-08 16:00 1064\n",
764 | "16-11-08 17:00 1079\n",
765 | "16-11-08 18:00 1121\n",
766 | "16-11-08 19:00 1311\n",
767 | "16-11-08 20:00 1500\n",
768 | "16-11-08 21:00 1166\n",
769 | "16-11-08 22:00 1021\n",
770 | "16-11-08 23:00 898\n",
771 | "16-11-09 00:00 685\n",
772 | "16-11-09 01:00 735\n",
773 | "16-11-09 02:00 1016\n",
774 | "16-11-09 03:00 466\n",
775 | "dtype: int64"
776 | ]
777 | },
778 | "execution_count": 54,
779 | "metadata": {},
780 | "output_type": "execute_result"
781 | }
782 | ],
783 | "source": [
784 | "tweets_over_time_clinton"
785 | ]
786 | },
787 | {
788 | "cell_type": "markdown",
789 | "metadata": {
790 | "slideshow": {
791 | "slide_type": "slide"
792 | }
793 | },
794 | "source": [
795 | "# Let's plot"
796 | ]
797 | },
798 | {
799 | "cell_type": "code",
800 | "execution_count": 58,
801 | "metadata": {},
802 | "outputs": [
803 | {
804 | "data": {
805 | "image/png": "\n",
806 | "text/plain": [
807 | ""
808 | ]
809 | },
810 | "metadata": {},
811 | "output_type": "display_data"
812 | }
813 | ],
814 | "source": [
815 | "plt.figure(figsize=(15,6))\n",
816 | "plt.title(\"Tweets over time on election day by candidate\", fontsize=20)\n",
817 | "\n",
818 | "plt.plot(tweets_over_time_clinton, label=\"clinton\")\n",
819 | "plt.plot(tweets_over_time_trump, label=\"trump\")\n",
820 | "\n",
821 | "\n",
822 | "plt.ylabel('Number of tweets', fontsize=16)\n",
823 | "plt.xlabel(\"Year-Month-Day Hour\", fontsize=16)\n",
824 | "plt.legend()\n",
825 | "plt.xticks(rotation=70)\n",
826 | "plt.grid(True)\n",
827 | "plt.show()"
828 | ]
829 | },
830 | {
831 | "cell_type": "markdown",
832 | "metadata": {
833 | "slideshow": {
834 | "slide_type": "slide"
835 | }
836 | },
837 | "source": [
838 | "# Let's use Scattertext"
839 | ]
840 | },
841 | {
842 | "cell_type": "markdown",
843 | "metadata": {
844 | "slideshow": {
845 | "slide_type": "slide"
846 | }
847 | },
848 | "source": [
849 | "## We will load a model that contains the vocabulary, entities, and syntax of the language. We will not use it directly; it is passed to Scattertext when the corpus is built"
850 | ]
851 | },
852 | {
853 | "cell_type": "code",
854 | "execution_count": 59,
855 | "metadata": {
856 | "slideshow": {
857 | "slide_type": "fragment"
858 | }
859 | },
860 | "outputs": [],
861 | "source": [
862 | "nlp = spacy.load('en_core_web_sm')  # 'en' was a shortcut link to this model; spaCy 3 removed shortcut names"
863 | ]
864 | },
865 | {
866 | "cell_type": "markdown",
867 | "metadata": {
868 | "slideshow": {
869 | "slide_type": "slide"
870 | }
871 | },
872 | "source": [
873 | "## We will add a new column to each of the two dataframes"
874 | ]
875 | },
876 | {
877 | "cell_type": "code",
878 | "execution_count": 61,
879 | "metadata": {
880 | "slideshow": {
881 | "slide_type": "fragment"
882 | }
883 | },
884 | "outputs": [],
885 | "source": [
886 | "data_clinton['category'] = 'clinton'\n",
887 | "data_trump['category'] = 'trump'\n"
888 | ]
889 | },
890 | {
891 | "cell_type": "markdown",
892 | "metadata": {
893 | "slideshow": {
894 | "slide_type": "slide"
895 | }
896 | },
897 | "source": [
898 | "## We will combine everything"
899 | ]
900 | },
901 | {
902 | "cell_type": "code",
903 | "execution_count": 62,
904 | "metadata": {
905 | "slideshow": {
906 | "slide_type": "fragment"
907 | }
908 | },
909 | "outputs": [],
910 | "source": [
911 | "data_night_election = pd.concat([data_clinton[['category','url','text']], data_trump[['category','url','text']]])\n"
912 | ]
913 | },
914 | {
915 | "cell_type": "markdown",
916 | "metadata": {
917 | "slideshow": {
918 | "slide_type": "slide"
919 | }
920 | },
921 | "source": [
922 | "# Warning\n",
923 | "\n",
924 | "If you run the following cells, keep in mind that the results take time to compute (building the corpus on the full dataset took about 17 minutes). Working on a sample of the data is recommended."
925 | ]
926 | },
927 | {
928 | "cell_type": "markdown",
929 | "metadata": {
930 | "slideshow": {
931 | "slide_type": "slide"
932 | }
933 | },
934 | "source": [
935 | "# Corpus\n",
936 | "\n",
937 | "A linguistic corpus is a large, structured collection of real examples of language use. These examples can be written texts or spoken samples.\n"
938 | ]
939 | },
940 | {
941 | "cell_type": "markdown",
942 | "metadata": {
943 | "slideshow": {
944 | "slide_type": "slide"
945 | }
946 | },
947 | "source": [
948 | "# We will create a corpus from a dataframe\n"
949 | ]
950 | },
951 | {
952 | "cell_type": "code",
953 | "execution_count": 35,
954 | "metadata": {
955 | "slideshow": {
956 | "slide_type": "fragment"
957 | }
958 | },
959 | "outputs": [
960 | {
961 | "name": "stdout",
962 | "output_type": "stream",
963 | "text": [
964 | "Duration: 0:17:39.146936\n"
965 | ]
966 | }
967 | ],
968 | "source": [
969 | "start_time = datetime.now() \n",
970 | "corpus_data_night_election = st.CorpusFromPandas(data_night_election,category_col='category', text_col='text', nlp=nlp).build() \n",
971 | "end_time = datetime.now()\n",
972 | "print('Duration: {}'.format(end_time - start_time))\n"
973 | ]
974 | },
975 | {
976 | "cell_type": "markdown",
977 | "metadata": {
978 | "slideshow": {
979 | "slide_type": "slide"
980 | }
981 | },
982 | "source": [
983 | "# Let's look at frequent terms on election night"
984 | ]
985 | },
986 | {
987 | "cell_type": "code",
988 | "execution_count": 40,
989 | "metadata": {
990 | "slideshow": {
991 | "slide_type": "fragment"
992 | }
993 | },
994 | "outputs": [
995 | {
996 | "data": {
997 | "text/plain": [
998 | "12907510"
999 | ]
1000 | },
1001 | "execution_count": 40,
1002 | "metadata": {},
1003 | "output_type": "execute_result"
1004 | }
1005 | ],
1006 | "source": [
1007 | "minimum = 500  # only terms that appear at least this many times are plotted\n",
1008 | "html_group_4_english = st.produce_scattertext_explorer(corpus_data_night_election, category='clinton', \n",
1009 | " category_name='Clinton', \n",
1010 | " not_category_name='Trump',\n",
1011 | " minimum_term_frequency = minimum,\n",
1012 | " width_in_pixels=1000, \n",
1013 | " metadata=data_night_election['url'])\n",
1014 | "open(\"Night_elections.html\", 'wb').write(html_group_4_english.encode('utf-8'))\n"
1015 | ]
1016 | },
1017 | {
1018 | "cell_type": "markdown",
1019 | "metadata": {
1020 | "slideshow": {
1021 | "slide_type": "slide"
1022 | }
1023 | },
1024 | "source": [
1025 | "# Let's do topic modeling"
1026 | ]
1027 | },
1028 | {
1029 | "cell_type": "code",
1030 | "execution_count": 42,
1031 | "metadata": {
1032 | "slideshow": {
1033 | "slide_type": "fragment"
1034 | }
1035 | },
1036 | "outputs": [],
1037 | "source": [
1038 | "feat_builder = st.FeatsFromOnlyEmpath()\n",
1039 | "empath_corpush = st.CorpusFromParsedDocuments(data_night_election,\n",
1040 | " category_col='category',\n",
1041 | " feats_from_spacy_doc=feat_builder,\n",
1042 | " parsed_col='text').build()\n"
1043 | ]
1044 | },
1045 | {
1046 | "cell_type": "code",
1047 | "execution_count": 46,
1048 | "metadata": {
1049 | "slideshow": {
1050 | "slide_type": "fragment"
1051 | }
1052 | },
1053 | "outputs": [
1054 | {
1055 | "data": {
1056 | "text/plain": [
1057 | "16687660"
1058 | ]
1059 | },
1060 | "execution_count": 46,
1061 | "metadata": {},
1062 | "output_type": "execute_result"
1063 | }
1064 | ],
1065 | "source": [
1066 | "html = st.produce_scattertext_explorer(empath_corpush,\n",
1067 | "    category='clinton',\n",
1068 | "    category_name='Clinton',\n",
1069 | "    not_category_name='Trump',\n",
1070 | "    width_in_pixels=1000,\n",
1071 | "    metadata=data_night_election['url'],\n",
1072 | "    use_non_text_features=True,\n",
1073 | "    use_full_doc=True,\n",
1074 | "    topic_model_term_lists=feat_builder.get_top_model_term_lists())\n",
1075 | "open(\"Topic-Night-Elections.html\", 'wb').write(html.encode('utf-8'))"
1076 | ]
1077 | }
1078 | ],
1079 | "metadata": {
1080 | "celltoolbar": "Slideshow",
1081 | "kernelspec": {
1082 | "display_name": "Python 3",
1083 | "language": "python",
1084 | "name": "python3"
1085 | },
1086 | "language_info": {
1087 | "codemirror_mode": {
1088 | "name": "ipython",
1089 | "version": 3
1090 | },
1091 | "file_extension": ".py",
1092 | "mimetype": "text/x-python",
1093 | "name": "python",
1094 | "nbconvert_exporter": "python",
1095 | "pygments_lexer": "ipython3",
1096 | "version": "3.6.4"
1097 | }
1098 | },
1099 | "nbformat": 4,
1100 | "nbformat_minor": 2
1101 | }
1102 |
--------------------------------------------------------------------------------
/Clase 8/data-society-major-speeches-by-donald-trump.zip:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gonzalezf/Data-Analysis-and-Visualization-with-Python/475bb944a004bc3e5b2f94d7484b76fefe39a3ca/Clase 8/data-society-major-speeches-by-donald-trump.zip
--------------------------------------------------------------------------------
/Clase 8/data-society-twitters-about-us-airline.zip:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gonzalezf/Data-Analysis-and-Visualization-with-Python/475bb944a004bc3e5b2f94d7484b76fefe39a3ca/Clase 8/data-society-twitters-about-us-airline.zip
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | ### Data Analysis and Visualization with Python
2 | #### Workshop on Data Handling and Visualization with Python
3 |
4 | **Description**: In this workshop, students develop skills in data analysis and information visualization. It uses Python with the Anaconda distribution and the Pandas and Matplotlib libraries.
5 |
6 | **Entry requirements**: Basic programming knowledge (IWI-131)
7 |
8 | **Specific competencies of the graduate profile to which the workshop contributes**:
9 |
10 | 1. Models and methods: Designs and applies statistical methods for the analysis and interpretation
11 | of data and for the design of computational experiments.
12 | 2. Models and methods: Learns to approach complex problems and propose solutions using
13 | algorithmic strategies.
14 |
15 |
16 | **Cross-cutting competencies of the graduate profile to which the workshop contributes**:
17 | 1. Communicate information effectively, orally and in writing, both within the organizations where
18 | the graduate works and with external entities.
19 | 2. Act with autonomy, flexibility, initiative, and critical thinking when facing the problems
20 | of the profession.
21 |
22 |
23 | **Objectives** (learning outcomes): On passing the course, the student will be able to:
24 |
25 | 1. Extract datasets from the web
26 | 2. Apply data cleaning techniques
27 | 3. Apply different information visualization techniques
28 | 4. Know applications of data analysis and information visualization in industry
29 | 5. Present, orally and in writing, an analysis of a given dataset.
30 |
31 | **Teaching and learning methodology**:
32 | 1. Lectures, with supporting theoretical material available to read after class.
33 | 2. Examples from industry are presented to show how data analysis and information
34 | visualization solve current problems.
35 | 3. In the preliminary report, the student must present the dataset to be used in the final
36 | project, together with the questions the project will try to answer.
37 | 4. In the final project, the student must extract valuable information from a dataset of their choice,
38 | handling the data with Pandas and visualizing the results with Matplotlib. The results
39 | are reported in a written report and in a presentation.
40 |
41 |
42 | **Evaluation**:
43 | 1. A minimum of 80% attendance at the workshop is required. Once this is met, the grade is calculated as
44 | explained below:
45 | 2. Final grade = 30% * Attendance + 5% * Preliminary Report + 65% * Final Project (40% Presentation +
46 | 60% Written Report)
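
The grading rule above can be checked with a short calculation. The sketch below is only an illustration under stated assumptions: every component is graded on the same 0 to 100 scale, the attendance percentage itself is used as the attendance grade, and a student below the 80% minimum receives no grade (returned as `None`). The function name `final_grade` and its parameters are hypothetical and not part of the course material.

```python
def final_grade(attendance, preliminary_report, presentation, written_report):
    """Combine the course components into a final grade.

    Assumption: all arguments are on a 0-100 scale, and `attendance`
    is the attendance percentage, reused directly as the attendance grade.
    """
    # Assumption: below the 80% attendance minimum no grade is computed.
    if attendance < 80:
        return None

    # The final project is 40% presentation and 60% written report.
    final_project = 0.40 * presentation + 0.60 * written_report

    # Weighted sum: 30% attendance, 5% preliminary report, 65% final project.
    return 0.30 * attendance + 0.05 * preliminary_report + 0.65 * final_project


if __name__ == "__main__":
    grade = final_grade(attendance=90, preliminary_report=80,
                        presentation=70, written_report=85)
    print(f"{grade:.2f}")  # prints 82.35
```

With these sample values the final project scores `0.40 * 70 + 0.60 * 85 = 79`, so the final grade is `27 + 4 + 51.35 = 82.35`.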
47 |
48 |
49 | **Semester schedule**
50 |
51 | | **Session No.** | **Name** | **Activity type** |
52 | | --- | --- | --- |
53 | | 1 | Introduction and motivation for the workshop | Theory and practice class. Example of how data science solves problems (OpenStreetMap and transport network). |
54 | | 2 | Introduction to Pandas | Theory and practice class. Library installation, handling objects and missing values in Pandas. |
55 | | 3 | Extracting datasets from the web | Theory and practice class. Extracting information from websites using Python + Pandas. |
56 | | 4 | Data cleaning with Pandas | Theory and practice class. Data munging. Merging datasets with Pandas. |
57 | | 5 | Introduction to information visualization | Theory and practice class. Introduction to Matplotlib. Line and scatter plots. Visualizing errors in plots. |
58 | | 6 | What is the right chart for my data? | Theory and practice class. Density and contour plots. Histograms. Customizing figures. |
59 | | 7 | Visualizing geospatial information | Theory and practice class. Visualizing data with Basemap (geographic data) and Seaborn. |
60 | | 8 | Practical Example I | Theory and practice class. Working with the Santiago 2012 origin-destination travel survey. |
61 | | 9 | Practical Example II | Theory and practice class. Working with the CASEN survey. Preliminary report due. |
62 | | 10 | Final session | Practical class. Final project presentations. Workshop wrap-up. |
63 |
64 | This course is based on the great work of Eduardo Graells, PhD (https://github.com/carnby/uddvis)
65 |
--------------------------------------------------------------------------------