├── .gitignore ├── 01_el_proyecto_scipy.ipynb ├── 02_conceptos_basicos_de_numpy.ipynb ├── 03_gestion_de_arreglos_de_numpy.ipynb ├── 04_arreglos_con_contenido_aleatorio.ipynb ├── 05_operaciones_basicas_con_arreglos.ipynb ├── 06_manipulacion_de_arreglos_de_numpy.ipynb ├── 07_gestion_y_analisis_de_datos_de_numpy.ipynb ├── 08_algebra_lineal_con_numpy.ipynb ├── 09_introduccion_a_pandas.ipynb ├── 10_tipos_de_datos_de_pandas.ipynb ├── 11_operaciones_basicas_con_dataframes.ipynb ├── 12_indices_y_multiindices.ipynb ├── 13_uniones_y_mezclas_de_dataframes.ipynb ├── 14_metodo_merge.ipynb ├── 15_metodo_filter.ipynb ├── 16_metodos_apply_y_transform.ipynb ├── 17_metodos_de_enmascaramiento.ipynb ├── 18_gestion_de_datos.ipynb ├── 19_limpieza_y_datos_faltantes.ipynb ├── 20_transformacion_de_objetos.ipynb ├── 21_metodos_groupby.ipynb ├── 22_extraccion_y_almacenamiento.ipynb ├── 23_visualizacion_de_datos_con_pandas.ipynb ├── 24_introduccion_a_matplotlib.ipynb ├── 25_elementos_de_un_grafico.ipynb ├── 26_tipos_basicos_de_graficos.ipynb ├── 27_introduccion_a_plotnine.ipynb ├── 28_introduccion_a_seaborn.ipynb ├── 29_objetos_de_seaborn.ipynb ├── 30_introduccion_a_dask.ipynb ├── 31_introduccion_a_plotly_y_dash.ipynb ├── LICENSE ├── README.md ├── data ├── Casos_Diarios_Estado_Nacional_Confirmados_20211221.csv ├── data_covid.csv └── datos_filtrados.csv ├── img ├── arquitectura_dask.png ├── ciclo.png ├── dask_cluster.png ├── grammar_of_graphics.png └── pythonista.png └── src └── 31 ├── callback.py └── hola_mundo.py /.gitignore: -------------------------------------------------------------------------------- 1 | # Byte-compiled / optimized / DLL files 2 | __pycache__/ 3 | *.py[cod] 4 | *$py.class 5 | 6 | # C extensions 7 | *.so 8 | 9 | # Distribution / packaging 10 | .Python 11 | build/ 12 | develop-eggs/ 13 | dist/ 14 | downloads/ 15 | eggs/ 16 | .eggs/ 17 | lib/ 18 | lib64/ 19 | parts/ 20 | sdist/ 21 | var/ 22 | wheels/ 23 | *.egg-info/ 24 | .installed.cfg 25 | *.egg 26 | MANIFEST 27 | 28 | # 
PyInstaller 29 | # Usually these files are written by a python script from a template 30 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 31 | *.manifest 32 | *.spec 33 | 34 | # Installer logs 35 | pip-log.txt 36 | pip-delete-this-directory.txt 37 | 38 | # Unit test / coverage reports 39 | htmlcov/ 40 | .tox/ 41 | .coverage 42 | .coverage.* 43 | .cache 44 | nosetests.xml 45 | coverage.xml 46 | *.cover 47 | .hypothesis/ 48 | .pytest_cache/ 49 | 50 | # Translations 51 | *.mo 52 | *.pot 53 | 54 | # Django stuff: 55 | *.log 56 | local_settings.py 57 | db.sqlite3 58 | 59 | # Flask stuff: 60 | instance/ 61 | .webassets-cache 62 | 63 | # Scrapy stuff: 64 | .scrapy 65 | 66 | # Sphinx documentation 67 | docs/_build/ 68 | 69 | # PyBuilder 70 | target/ 71 | 72 | # Jupyter Notebook 73 | .ipynb_checkpoints 74 | 75 | # pyenv 76 | .python-version 77 | 78 | # celery beat schedule file 79 | celerybeat-schedule 80 | 81 | # SageMath parsed files 82 | *.sage.py 83 | 84 | # Environments 85 | .env 86 | .venv 87 | env/ 88 | venv/ 89 | ENV/ 90 | env.bak/ 91 | venv.bak/ 92 | 93 | # Spyder project settings 94 | .spyderproject 95 | .spyproject 96 | 97 | # Rope project settings 98 | .ropeproject 99 | 100 | # mkdocs documentation 101 | /site 102 | 103 | # mypy 104 | .mypy_cache/ 105 | 106 | # excel files 107 | *.xlsx 108 | arreglo.* 109 | arreglos.* 110 | grafica.png 111 | imagen.jpg 112 | 113 | -------------------------------------------------------------------------------- /01_el_proyecto_scipy.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "[![img/pythonista.png](img/pythonista.png)](https://www.pythonista.io)" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "# El proyecto *Scipy*." 
15 | ] 16 | }, 17 | { 18 | "cell_type": "markdown", 19 | "metadata": {}, 20 | "source": [ 21 | "Existen herramientas y lenguajes de programación diseñados específicamente para el cómputo científico y el análisis de datos tales como:\n", 22 | "\n", 23 | "* [*Matlab*](https://la.mathworks.com/products/matlab.html).\n", 24 | "* [*SPSS* ](https://www.ibm.com/analytics/spss-statistics-software).\n", 25 | "* [*Mathematica*](https://www.wolfram.com/mathematica/).\n", 26 | "\n", 27 | "Algunos de ellos incluso ha sido publicado bajo los términos de licencias libres, como:\n", 28 | "\n", 29 | "* [*GNU Octave*](https://www.gnu.org/software/octave/).\n", 30 | "* [*R*](https://www.r-project.org/).\n", 31 | "* [*Julia*](https://julialang.org/).\n", 32 | "\n", 33 | "A diferencia de estas herramientas y lenguajes altamente especializadas en temas estadísticos y de análisis de datos, *Python* es un lenguaje de programación de propósito general. \n", 34 | "\n", 35 | "Sin embargo, debido a las características de *Python* se han creado diversos proyectos enfocados a ofrecer herramientas altamente competitivas en el tema de análisis de datos, estadística y cómputo científico." 36 | ] 37 | }, 38 | { 39 | "cell_type": "markdown", 40 | "metadata": {}, 41 | "source": [ 42 | "## Proyectos más relevantes para análisis de datos.\n", 43 | "\n", 44 | "### El proyecto *Scipy*.\n", 45 | "\n", 46 | "El proyecto [*Scipy*](https://www.scipy.org) consta de una serie de herramientas y bibliotecas especializadas en cómputo científico y análisis de datos. 
Los [componentes más importantes](https://projects.scipy.org/stackspec.html) del proyecto son: \n", 47 | "\n", 48 | "* [*Numpy*](https://numpy.org), una biblioteca que ofrece:\n", 49 | " * Gestión de arreglos multidimensionales (```np.array```).\n", 50 | " * Tipos de datos optimizados para operaciones con arreglos.\n", 51 | " * Poderosos componentes de cálculo vectorial y de álgebra lineal.\n", 52 | "* [*Pandas*](https://pandas.pydata.org/), una herramienta especializada en:\n", 53 | " * Obtención y almacenamientos de datos de diversas fuentes y con diversos formatos.\n", 54 | " * Tratamiento de los datos.\n", 55 | " * Análisis de los datos. \n", 56 | "* [*Matplotlib*](https://matplotlib.org/), una biblioteca especializada en el despliegue y visualización con una sintaxis similar a la de *Matlab*.\n", 57 | "* [*iPython*](https://ipython.org/), un intérprete de *Python* especializado en temas de análisis de datos y cómputo científico, el cual fue el origen del proyecto [*Jupyter*](https://jupyter.org/).\n", 58 | "* [Sympy](https://www.sympy.org), una herramienta que permite realizar operaciones con expresiones de álgebra simbólica.\n", 59 | "\n", 60 | "Estos componentes principales son proyectos muy maduros, cuentan con extensa documentación y soporte tanto de la comunidad como comercial." 
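Como complemento a la lista anterior, un esbozo mínimo (no forma parte del material original) para verificar desde *Python* qué componentes del proyecto están instalados y en qué versión:

```python
import importlib

# Se intenta importar cada componente; si alguno no está instalado, se indica.
for nombre in ("numpy", "scipy", "pandas", "matplotlib"):
    try:
        modulo = importlib.import_module(nombre)
        print(f"{nombre}: {modulo.__version__}")
    except ImportError:
        print(f"{nombre}: no instalado")
```

El atributo ```__version__``` está disponible en los cuatro paquetes mencionados.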
61 | ] 62 | }, 63 | { 64 | "cell_type": "markdown", 65 | "metadata": {}, 66 | "source": [ 67 | "### Los *scikits*.\n", 68 | "\n", 69 | "Los [*scikits*](https://www.scipy.org/scikits.html) son un compendio de proyectos basados en *Scipy* que ofrecen herramientas puntuales para temas muy específicos tales como:\n", 70 | "\n", 71 | "* Machine Learning.\n", 72 | "* Redes neuronales.\n", 73 | "* Análisis de imágenes.\n", 74 | "* Cómputo paralelo.\n", 75 | "* Supercómputo usando GPU.\n", 76 | "* Series de tiempo, etc.\n", 77 | "\n", 78 | "Estos proyectos no son mantenidos ni soportados por *Scipy* y su documentación y madurez no es homogenea.\n", 79 | "\n", 80 | "Los proyectos pueden ser consutados en:\n", 81 | "\n", 82 | "https://scikits.appspot.com/scikits" 83 | ] 84 | }, 85 | { 86 | "cell_type": "markdown", 87 | "metadata": {}, 88 | "source": [ 89 | "## Instalación de los componentes." 90 | ] 91 | }, 92 | { 93 | "cell_type": "code", 94 | "execution_count": null, 95 | "metadata": { 96 | "scrolled": true 97 | }, 98 | "outputs": [], 99 | "source": [ 100 | "!pip install numpy scipy pandas matplotlib" 101 | ] 102 | }, 103 | { 104 | "cell_type": "markdown", 105 | "metadata": {}, 106 | "source": [ 107 | "###

Esta obra está bajo una Licencia Creative Commons Atribución 4.0 Internacional.\n", 108 | "© José Luis Chiquete Valdivieso. 2023.
" 109 | ] 110 | } 111 | ], 112 | "metadata": { 113 | "kernelspec": { 114 | "display_name": "Python 3 (ipykernel)", 115 | "language": "python", 116 | "name": "python3" 117 | }, 118 | "language_info": { 119 | "codemirror_mode": { 120 | "name": "ipython", 121 | "version": 3 122 | }, 123 | "file_extension": ".py", 124 | "mimetype": "text/x-python", 125 | "name": "python", 126 | "nbconvert_exporter": "python", 127 | "pygments_lexer": "ipython3", 128 | "version": "3.9.2" 129 | } 130 | }, 131 | "nbformat": 4, 132 | "nbformat_minor": 4 133 | } 134 | -------------------------------------------------------------------------------- /04_arreglos_con_contenido_aleatorio.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "[![img/pythonista.png](img/pythonista.png)](https://www.pythonista.io)" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "# Arreglos con contenido aleatorio." 15 | ] 16 | }, 17 | { 18 | "cell_type": "code", 19 | "execution_count": null, 20 | "metadata": { 21 | "scrolled": true 22 | }, 23 | "outputs": [], 24 | "source": [ 25 | "import numpy as np" 26 | ] 27 | }, 28 | { 29 | "cell_type": "markdown", 30 | "metadata": {}, 31 | "source": [ 32 | "## El paquete ```np.random```." 
33 | ] 34 | }, 35 | { 36 | "cell_type": "markdown", 37 | "metadata": {}, 38 | "source": [ 39 | "https://numpy.org/doc/stable/reference/random/generator.html" 40 | ] 41 | }, 42 | { 43 | "cell_type": "markdown", 44 | "metadata": {}, 45 | "source": [ 46 | "### La función ```np.random.rand()```.\n", 47 | "\n", 48 | "* La función ```np.random.rand()``` crea un arreglo cuyos elementos son valores aleatorios que van de ```0``` a antes de ```1``` dentro de una distribución uniforme.\n", 49 | "\n", 50 | "```\n", 51 | "np.random.rand()\n", 52 | "```\n", 53 | "\n", 54 | "* Donde `````` es una secuencia de valores enteros separados por comas que definen la forma del arreglo.\n", 55 | "\n", 56 | "https://numpy.org/doc/stable/reference/random/generated/numpy.random.rand.html" 57 | ] 58 | }, 59 | { 60 | "cell_type": "markdown", 61 | "metadata": {}, 62 | "source": [ 63 | "**Ejemplo:**" 64 | ] 65 | }, 66 | { 67 | "cell_type": "markdown", 68 | "metadata": {}, 69 | "source": [ 70 | "* La siguiente celda generará un arreglo de forma ```(2, 2, 2)```conteniendo números aleatorios." 71 | ] 72 | }, 73 | { 74 | "cell_type": "code", 75 | "execution_count": null, 76 | "metadata": { 77 | "scrolled": true 78 | }, 79 | "outputs": [], 80 | "source": [ 81 | "np.random.rand(2, 2, 2)" 82 | ] 83 | }, 84 | { 85 | "cell_type": "markdown", 86 | "metadata": {}, 87 | "source": [ 88 | "### La función ```np.random.randint()```.\n", 89 | "\n", 90 | "* La función ```np.random.randint()``` crea un arreglo cuyos elementos son valores entros aleatorios en un rango dado.\n", 91 | "\n", 92 | "```\n", 93 | "np.random.randint(, , )\n", 94 | "```\n", 95 | "\n", 96 | "Donde:\n", 97 | "\n", 98 | "* `````` es el valor inicial del rango a partir del cual se generarán los números aleatorios, incluyéndolo a este.\n", 99 | "* `````` es el valor final del rango a partir del cual se generarán los números aleatorios, sin incluirlo.\n", 100 | "* `````` es un objeto ```tuple``` que definen la forma del arreglo." 
101 | ] 102 | }, 103 | { 104 | "cell_type": "markdown", 105 | "metadata": {}, 106 | "source": [ 107 | "**Ejemplos:**" 108 | ] 109 | }, 110 | { 111 | "cell_type": "markdown", 112 | "metadata": {}, 113 | "source": [ 114 | "* La siguiente celda creará un arreglo de forma ```(3, 3)``` con valores enteros que pueden ir de ```1``` a ```2```." 115 | ] 116 | }, 117 | { 118 | "cell_type": "code", 119 | "execution_count": null, 120 | "metadata": { 121 | "scrolled": true 122 | }, 123 | "outputs": [], 124 | "source": [ 125 | "np.random.randint(1, 3, (3, 3))" 126 | ] 127 | }, 128 | { 129 | "cell_type": "markdown", 130 | "metadata": {}, 131 | "source": [ 132 | "* La siguiente celda creará un arreglo de forma ```(3, 2, 4)``` con valores enteros que pueden ir de ```0``` a ```255```." 133 | ] 134 | }, 135 | { 136 | "cell_type": "code", 137 | "execution_count": null, 138 | "metadata": { 139 | "scrolled": true 140 | }, 141 | "outputs": [], 142 | "source": [ 143 | "np.random.randint(0, 256, (3, 2, 4))" 144 | ] 145 | }, 146 | { 147 | "cell_type": "markdown", 148 | "metadata": {}, 149 | "source": [ 150 | "
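El enlace citado al inicio de este capítulo documenta la interfaz moderna de generadores de *Numpy*. Como referencia, un esbozo (ejemplo propio, no incluido en el material original; la semilla ```42``` es una elección arbitraria) del equivalente de las funciones anteriores con ```np.random.default_rng()```:

```python
import numpy as np

# Generador con semilla fija para obtener resultados reproducibles.
rng = np.random.default_rng(42)

# Equivalente moderno de np.random.rand(2, 2, 2): uniforme en [0, 1).
uniforme = rng.random((2, 2, 2))

# Equivalente moderno de np.random.randint(1, 3, (3, 3)): enteros en [1, 3).
enteros = rng.integers(1, 3, (3, 3))

print(uniforme.shape, enteros.shape)
```

Al igual que ```np.random.randint()```, el método ```integers()``` excluye el límite superior de forma predeterminada.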

Esta obra está bajo una Licencia Creative Commons Atribución 4.0 Internacional.\n", 151 | "© José Luis Chiquete Valdivieso. 2023.
" 152 | ] 153 | } 154 | ], 155 | "metadata": { 156 | "kernelspec": { 157 | "display_name": "Python 3 (ipykernel)", 158 | "language": "python", 159 | "name": "python3" 160 | }, 161 | "language_info": { 162 | "codemirror_mode": { 163 | "name": "ipython", 164 | "version": 3 165 | }, 166 | "file_extension": ".py", 167 | "mimetype": "text/x-python", 168 | "name": "python", 169 | "nbconvert_exporter": "python", 170 | "pygments_lexer": "ipython3", 171 | "version": "3.9.2" 172 | } 173 | }, 174 | "nbformat": 4, 175 | "nbformat_minor": 2 176 | } 177 | -------------------------------------------------------------------------------- /08_algebra_lineal_con_numpy.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "[![img/pythonista.png](img/pythonista.png)](https://www.pythonista.io)" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "# Álgebra lineal con *Numpy*.\n", 15 | "\n", 16 | "El componente más poderoso de *Numpy* es su capacidad de realizar operaciones con arreglos, y un caso particular de ellos son las matrices numéricas.\n", 17 | "\n", 18 | "*Numpy* cuenta con una poderosa biblioteca de funciones que permiten tratar a los arreglos como estructuras algebraicas.\n", 19 | "\n", 20 | "La biblioteca especializada en álgebra lineal es [```np.linalg```](https://numpy.org/doc/stable/reference/routines.linalg.html). Pero además de esta biblioteca especializada, es posible realizar operaciones básicas de matrices y vectores."
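Como anticipo de lo anterior, un esbozo mínimo (ejemplo propio, con una matriz pequeña elegida arbitrariamente) de las operaciones que se estudiarán en este capítulo:

```python
import numpy as np

# Matriz pequeña elegida arbitrariamente para este ejemplo.
a = np.array([[1., 2.],
              [3., 4.]])

producto = a @ a                 # producto matricial
determinante = np.linalg.det(a)  # 1*4 - 2*3 = -2
inversa = np.linalg.inv(a)

# Una matriz multiplicada por su inversa produce (numéricamente) la identidad.
print(np.allclose(a @ inversa, np.eye(2)))
```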
21 | ] 22 | }, 23 | { 24 | "cell_type": "code", 25 | "execution_count": null, 26 | "metadata": {}, 27 | "outputs": [], 28 | "source": [ 29 | "import numpy as np" 30 | ] 31 | }, 32 | { 33 | "cell_type": "markdown", 34 | "metadata": {}, 35 | "source": [ 36 | "## Producto punto entre dos matrices.\n", 37 | "\n", 38 | "\n", 39 | "La función ```np.dot()```permite realizar las operaciones de [producto punto](https://es.wikipedia.org/wiki/Producto_escalar) entre dos matrices compatibles (vectores).\n", 40 | "\n", 41 | "```\n", 42 | "np.dot(,)\n", 43 | "\n", 44 | "```\n", 45 | "\n", 46 | "Donde:\n", 47 | "\n", 48 | "* Cada `````` es una arreglo que cumple con la características para realizar esta operación.\n", 49 | "\n", 50 | "https://numpy.org/doc/stable/reference/generated/numpy.dot.html" 51 | ] 52 | }, 53 | { 54 | "cell_type": "markdown", 55 | "metadata": {}, 56 | "source": [ 57 | "**Ejemplo:**" 58 | ] 59 | }, 60 | { 61 | "cell_type": "markdown", 62 | "metadata": {}, 63 | "source": [ 64 | "* La siguiente celda creará al arreglo con nombre ```arreglo_1 ``` de forma ```(3, 2)```." 65 | ] 66 | }, 67 | { 68 | "cell_type": "code", 69 | "execution_count": null, 70 | "metadata": {}, 71 | "outputs": [], 72 | "source": [ 73 | "arreglo_1 = np.array([[1, 2],\n", 74 | " [3, 4],\n", 75 | " [5, 6]])" 76 | ] 77 | }, 78 | { 79 | "cell_type": "code", 80 | "execution_count": null, 81 | "metadata": {}, 82 | "outputs": [], 83 | "source": [ 84 | "arreglo_1" 85 | ] 86 | }, 87 | { 88 | "cell_type": "markdown", 89 | "metadata": {}, 90 | "source": [ 91 | "* La siguiente celda creará al arreglo con nombre ```arreglo_2 ``` de forma ```(2, 4)```." 
92 | ] 93 | }, 94 | { 95 | "cell_type": "code", 96 | "execution_count": null, 97 | "metadata": {}, 98 | "outputs": [], 99 | "source": [ 100 | "arreglo_2 = np.array([[11, 12, 13, 14],\n", 101 | " [15, 16, 17, 18]])" 102 | ] 103 | }, 104 | { 105 | "cell_type": "code", 106 | "execution_count": null, 107 | "metadata": {}, 108 | "outputs": [], 109 | "source": [ 110 | "arreglo_2" 111 | ] 112 | }, 113 | { 114 | "cell_type": "markdown", 115 | "metadata": {}, 116 | "source": [ 117 | "* La siguiente celda realizará la operación de producto punto entre ```arreglo_1``` y ```arreglo_2```, regresando una matriz de la forma ```(3, 4)```." 118 | ] 119 | }, 120 | { 121 | "cell_type": "code", 122 | "execution_count": null, 123 | "metadata": { 124 | "scrolled": true 125 | }, 126 | "outputs": [], 127 | "source": [ 128 | "np.dot(arreglo_1, arreglo_2)" 129 | ] 130 | }, 131 | { 132 | "cell_type": "markdown", 133 | "metadata": {}, 134 | "source": [ 135 | "* El signo ```@``` es reconocido por *Numpy* como el operador de producto punto." 136 | ] 137 | }, 138 | { 139 | "cell_type": "code", 140 | "execution_count": null, 141 | "metadata": { 142 | "scrolled": true 143 | }, 144 | "outputs": [], 145 | "source": [ 146 | "arreglo_1 @ arreglo_2" 147 | ] 148 | }, 149 | { 150 | "cell_type": "markdown", 151 | "metadata": {}, 152 | "source": [ 153 | "## Producto cruz entre dos matrices." 
154 | ] 155 | }, 156 | { 157 | "cell_type": "markdown", 158 | "metadata": {}, 159 | "source": [ 160 | "La función ```np.cross()``` permite realizar las operaciones de [producto cruz](https://es.wikipedia.org/wiki/Producto_vectorial) entre dos matrices compatibles.\n", 161 | "\n", 162 | "```\n", 163 | "np.cross(,)\n", 164 | "\n", 165 | "```\n", 166 | "\n", 167 | "https://numpy.org/doc/stable/reference/generated/numpy.cross.html" 168 | ] 169 | }, 170 | { 171 | "cell_type": "markdown", 172 | "metadata": {}, 173 | "source": [ 174 | "**Ejemplo:**" 175 | ] 176 | }, 177 | { 178 | "cell_type": "markdown", 179 | "metadata": {}, 180 | "source": [ 181 | "* La siguiente celda creará un arreglo de una dimensión, de forma ```(2,)```, al que se le llamará ```vector_1```." 182 | ] 183 | }, 184 | { 185 | "cell_type": "code", 186 | "execution_count": null, 187 | "metadata": {}, 188 | "outputs": [], 189 | "source": [ 190 | "vector_1 = np.array([1, 2])" 191 | ] 192 | }, 193 | { 194 | "cell_type": "code", 195 | "execution_count": null, 196 | "metadata": { 197 | "scrolled": true 198 | }, 199 | "outputs": [], 200 | "source": [ 201 | "vector_1.shape" 202 | ] 203 | }, 204 | { 205 | "cell_type": "markdown", 206 | "metadata": {}, 207 | "source": [ 208 | "* La siguiente celda creará un arreglo de una dimensión, de forma ```(2,)```, al que se le llamará ```vector_2```." 209 | ] 210 | }, 211 | { 212 | "cell_type": "code", 213 | "execution_count": null, 214 | "metadata": {}, 215 | "outputs": [], 216 | "source": [ 217 | "vector_2 = np.array([11, 12])" 218 | ] 219 | }, 220 | { 221 | "cell_type": "code", 222 | "execution_count": null, 223 | "metadata": { 224 | "scrolled": true 225 | }, 226 | "outputs": [], 227 | "source": [ 228 | "vector_2.shape" 229 | ] 230 | }, 231 | { 232 | "cell_type": "markdown", 233 | "metadata": {}, 234 | "source": [ 235 | "* La siguiente celda ejecutará la función ```np.cross()``` con ```vector_1``` y ```vector_2```."
236 | ] 237 | }, 238 | { 239 | "cell_type": "code", 240 | "execution_count": null, 241 | "metadata": {}, 242 | "outputs": [], 243 | "source": [ 244 | "np.cross(vector_1, vector_2)" 245 | ] 246 | }, 247 | { 248 | "cell_type": "code", 249 | "execution_count": null, 250 | "metadata": {}, 251 | "outputs": [], 252 | "source": [ 253 | "np.cross(vector_1, vector_2).shape" 254 | ] 255 | }, 256 | { 257 | "cell_type": "markdown", 258 | "metadata": {}, 259 | "source": [ 260 | "## El paquete ```numpy.linalg```.\n", 261 | "\n", 262 | "La biblioteca especializada en operaciones de álgebra lineal de *Numpy* es ```numpy.linalg```.\n", 263 | "\n", 264 | "El estudio de todas las funciones contenidas en este paquete está fuera de los alcances de este curso, pero se ejemplificarán las siguientes funciones:\n", 265 | "\n", 266 | "* ```np.linalg.det()```\n", 267 | "* ```np.linalg.solve()```\n", 268 | "* ```np.linalg.inv()```\n", 269 | "\n", 270 | "https://numpy.org/doc/stable/reference/routines.linalg.html" 271 | ] 272 | }, 273 | { 274 | "cell_type": "code", 275 | "execution_count": null, 276 | "metadata": {}, 277 | "outputs": [], 278 | "source": [ 279 | "import numpy.linalg" 280 | ] 281 | }, 282 | { 283 | "cell_type": "markdown", 284 | "metadata": {}, 285 | "source": [ 286 | "### Cálculo del determinante de una matriz mediante ```numpy.linalg.det()```.\n", 287 | "\n", 288 | "\n", 289 | "\n", 290 | "**Ejemplo:**\n", 291 | "\n", 292 | "* Se calculará el determinante de la matriz:\n", 293 | "\n", 294 | "$$ \det\begin{vmatrix}0&1&2\\3&4&5\\6&7&8\end{vmatrix}$$\n", 295 | "\n", 296 | "* El cálculo del determinante es el siguiente:\n", 297 | "\n", 298 | "$$ ((0 * 4 * 8) + (1 * 5 * 6) + (2 * 3 * 7)) - ((6 * 4 * 2) + (7 * 5 * 0) + (8 * 3 * 1)) = 0$$" 299 | ] 300 | }, 301 | { 302 | "cell_type": "code", 303 | "execution_count": null, 304 | "metadata": {}, 305 | "outputs": [], 306 | "source": [ 307 | "matriz = np.arange(9).reshape(3, 3)" 308 | ] 309 | }, 310 | { 311 | "cell_type": "code", 312 |
"execution_count": null, 313 | "metadata": {}, 314 | "outputs": [], 315 | "source": [ 316 | "matriz" 317 | ] 318 | }, 319 | { 320 | "cell_type": "code", 321 | "execution_count": null, 322 | "metadata": {}, 323 | "outputs": [], 324 | "source": [ 325 | "np.linalg.det(matriz)" 326 | ] 327 | }, 328 | { 329 | "cell_type": "markdown", 330 | "metadata": {}, 331 | "source": [ 332 | "* Se calculará el determinante de la matriz:\n", 333 | "\n", 334 | "$$ \\det\\begin{vmatrix}1&1&2\\\\3&4&5\\\\6&7&8\\end{vmatrix}$$\n", 335 | "\n", 336 | "* El cálculo del determinante es el siguiente:\n", 337 | "\n", 338 | "$$ ((1 * 4 * 8) + (1 * 5 * 6) + (2 * 3 * 7)) - ((6 * 4 * 2) + (7 * 5 * 1) + (8 * 3* 1)) = -3$$" 339 | ] 340 | }, 341 | { 342 | "cell_type": "code", 343 | "execution_count": null, 344 | "metadata": {}, 345 | "outputs": [], 346 | "source": [ 347 | "matriz = np.array([[1, 1, 2],\n", 348 | " [3, 4, 5],\n", 349 | " [6, 7, 8]])" 350 | ] 351 | }, 352 | { 353 | "cell_type": "code", 354 | "execution_count": null, 355 | "metadata": {}, 356 | "outputs": [], 357 | "source": [ 358 | "np.linalg.det(matriz)" 359 | ] 360 | }, 361 | { 362 | "cell_type": "markdown", 363 | "metadata": {}, 364 | "source": [ 365 | "### Soluciones de ecuaciones lineales con la función ```np.linalg.solve()```.\n", 366 | "\n", 367 | "Un sistema de ecuaciones lineales coresponde un conjunto de ecuaciones de la forma:\n", 368 | "\n", 369 | "$$\n", 370 | "a_{11}x_1 + a_{12}x_2 + \\cdots a_{1n}x_n = y_1 \\\\\n", 371 | "a_{21}x_1 + a_{22}x_2 + \\cdots a_{2n}x_n = y_2\\\\\n", 372 | "\\vdots\\\\\n", 373 | "a_{m1}x_1 + a_{m2}x_2 + \\cdots a_{mn}x_n = y_m\n", 374 | "$$\n", 375 | "\n", 376 | "Lo cual puede ser expresado de forma matricial.\n", 377 | "\n", 378 | "$$ \n", 379 | "\\begin{bmatrix}a_{11}\\\\a_{21}\\\\ \\vdots\\\\ a_{m1}\\end{bmatrix}x_1 + \\begin{bmatrix}a_{12}\\\\a_{22}\\\\ \\vdots\\\\ a_{m2}\\end{bmatrix}x_2 + \\cdots \\begin{bmatrix}a_{m1}\\\\a_{m2}\\\\ \\vdots\\\\ a_{mn}\\end{bmatrix}x_n = 
\begin{bmatrix}y_{1}\\y_{2}\\ \vdots\\ y_{m}\end{bmatrix}\n", 380 | "$$\n", 381 | "\n", 382 | "Existen múltiples métodos para calcular los valores $x_1, x_2 \cdots x_n$ que cumplan con el sistema siempre que $m = n$.\n", 383 | "\n", 384 | "*Numpy* cuenta con la función ```np.linalg.solve()```, la cual puede calcular la solución de un sistema de ecuaciones lineales al expresarse como un par de matrices de la siguiente forma:\n", 385 | "\n", 386 | "$$ \n", 387 | "\begin{bmatrix}a_{11}&a_{12}&\cdots&a_{1n}\\a_{21}&a_{22}&\cdots&a_{2n}\\ \vdots\\ a_{n1}&a_{n2}&\cdots&a_{nn}\end{bmatrix}\begin{bmatrix}x_{1}\\x_{2}\\ \vdots\\ x_{n}\end{bmatrix} = \begin{bmatrix}y_{1}\\y_{2}\\ \vdots\\ y_{n}\end{bmatrix}\n", 388 | "$$\n", 389 | "\n", 390 | "La función ```numpy.linalg.solve()``` permite resolver sistemas de ecuaciones lineales ingresando un arreglo de dimensiones ```(n, n)``` como primer argumento y otro con dimensión ```(n)``` como segundo argumento." 391 | ] 392 | }, 393 | { 394 | "cell_type": "markdown", 395 | "metadata": {}, 396 | "source": [ 397 | "**Ejemplo:**\n", 398 | "\n", 399 | "* Para resolver el sistema de ecuaciones:\n", 400 | "\n", 401 | "$$\n", 402 | "2x_1 + 5x_2 - 3x_3 = 22.2 \\\n", 403 | "11x_1 - 4x_2 + 22x_3 = 11.6 \\\n", 404 | "54x_1 + 1x_2 + 19x_3 = -40.1 \\\n", 405 | "$$" 406 | ] 407 | }, 408 | { 409 | "cell_type": "markdown", 410 | "metadata": {}, 411 | "source": [ 412 | "* La siguiente celda creará al arreglo ```a```, el cual representa a la matriz de coeficientes del sistema de ecuaciones lineales.\n", 413 | "\n", 414 | "$$ \n", 415 | "\begin{bmatrix}2\\11\\54\end{bmatrix}x_1 + \n", 416 | "\begin{bmatrix}5\\-4\\1\end{bmatrix}x_2 + \n", 417 | "\begin{bmatrix}-3\\22\\19\end{bmatrix}x_3 \n", 418 | "$$" 419 | ] 420 | }, 421 | { 422 | "cell_type": "code", 423 | "execution_count": null, 424 | "metadata": {}, 425 | "outputs": [], 426 | "source": [ 427 | "a = np.array([[2, 5, -3],\n", 428 | " [11, -4, 22],\n", 429 | " [54, 1, 19]])" 430 | ] 431 | }, 432 | { 433
| "cell_type": "code", 434 | "execution_count": null, 435 | "metadata": {}, 436 | "outputs": [], 437 | "source": [ 438 | "a" 439 | ] 440 | }, 441 | { 442 | "cell_type": "code", 443 | "execution_count": null, 444 | "metadata": {}, 445 | "outputs": [], 446 | "source": [ 447 | "a.shape" 448 | ] 449 | }, 450 | { 451 | "cell_type": "markdown", 452 | "metadata": {}, 453 | "source": [ 454 | "* La siguiente celda corresponde a cada valor de $y$.\n", 455 | "\n", 456 | "$$\n", 457 | "\\begin{bmatrix}22.2\\\\11.6\\\\-40.1\\end{bmatrix}\n", 458 | "$$" 459 | ] 460 | }, 461 | { 462 | "cell_type": "code", 463 | "execution_count": null, 464 | "metadata": {}, 465 | "outputs": [], 466 | "source": [ 467 | "y = np.array([22.2, 11.6, -40.1])" 468 | ] 469 | }, 470 | { 471 | "cell_type": "code", 472 | "execution_count": null, 473 | "metadata": {}, 474 | "outputs": [], 475 | "source": [ 476 | "y" 477 | ] 478 | }, 479 | { 480 | "cell_type": "code", 481 | "execution_count": null, 482 | "metadata": { 483 | "scrolled": true 484 | }, 485 | "outputs": [], 486 | "source": [ 487 | "y.shape" 488 | ] 489 | }, 490 | { 491 | "cell_type": "markdown", 492 | "metadata": {}, 493 | "source": [ 494 | "* la siguiente celda resolverá el sistema de ecuaciones.\n", 495 | "\n", 496 | "$$\n", 497 | "1.80243902x_1 + 6.7549776x_2 + 2.65666999x_3 = y\n", 498 | "$$" 499 | ] 500 | }, 501 | { 502 | "cell_type": "code", 503 | "execution_count": null, 504 | "metadata": {}, 505 | "outputs": [], 506 | "source": [ 507 | "np.linalg.solve(a, y)" 508 | ] 509 | }, 510 | { 511 | "cell_type": "markdown", 512 | "metadata": {}, 513 | "source": [ 514 | "### Cálculo de la inversa de una matriz mediante ```numpy.linalg.inv()```.\n", 515 | "\n", 516 | "Es posible calcular la matriz inversa de una [matriz invertible](https://es.wikipedia.org/wiki/Matriz_invertible) usando la función ```numpy.linalg.inv()```." 
517 | ] 518 | }, 519 | { 520 | "cell_type": "markdown", 521 | "metadata": {}, 522 | "source": [ 523 | "**Ejemplo:**" 524 | ] 525 | }, 526 | { 527 | "cell_type": "markdown", 528 | "metadata": {}, 529 | "source": [ 530 | "* La siguiente celda definirá al arreglo ```a```, el cual representa a una matriz invertible." 531 | ] 532 | }, 533 | { 534 | "cell_type": "code", 535 | "execution_count": null, 536 | "metadata": { 537 | "scrolled": true 538 | }, 539 | "outputs": [], 540 | "source": [ 541 | "a = np.array([[2, 5, -3],\n", 542 | " [11, -4, 22],\n", 543 | " [54, 1, 19]])" 544 | ] 545 | }, 546 | { 547 | "cell_type": "markdown", 548 | "metadata": {}, 549 | "source": [ 550 | "* La siguiente celda calculará la matriz inversa de ```a```." 551 | ] 552 | }, 553 | { 554 | "cell_type": "code", 555 | "execution_count": null, 556 | "metadata": {}, 557 | "outputs": [], 558 | "source": [ 559 | "np.linalg.inv(a)" 560 | ] 561 | }, 562 | { 563 | "cell_type": "markdown", 564 | "metadata": {}, 565 | "source": [ 566 | "* Un método para resolver un sistema de ecuaciones lineales es el de realizar un producto punto con la inversa de los coeficientes del sistema y los valores de ```y```." 567 | ] 568 | }, 569 | { 570 | "cell_type": "code", 571 | "execution_count": null, 572 | "metadata": {}, 573 | "outputs": [], 574 | "source": [ 575 | "np.linalg.inv(a).dot(y)" 576 | ] 577 | }, 578 | { 579 | "cell_type": "markdown", 580 | "metadata": {}, 581 | "source": [ 582 | "
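Para redondear la idea anterior, un esbozo (ejemplo propio, con un sistema pequeño elegido arbitrariamente) que compara ambos métodos; en la práctica suele preferirse ```np.linalg.solve()``` por estabilidad numérica, en lugar de calcular la inversa explícitamente:

```python
import numpy as np

# Sistema elegido para el ejemplo: 3x + y = 9 ; x + 2y = 8.
a = np.array([[3., 1.],
              [1., 2.]])
y = np.array([9., 8.])

x_solve = np.linalg.solve(a, y)   # método directo
x_inv = np.linalg.inv(a) @ y      # producto punto con la inversa

print(x_solve)                    # [2. 3.]
print(np.allclose(x_solve, x_inv))
```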

Esta obra está bajo una Licencia Creative Commons Atribución 4.0 Internacional.\n", 583 | "© José Luis Chiquete Valdivieso. 2023.
" 584 | ] 585 | } 586 | ], 587 | "metadata": { 588 | "kernelspec": { 589 | "display_name": "Python 3 (ipykernel)", 590 | "language": "python", 591 | "name": "python3" 592 | }, 593 | "language_info": { 594 | "codemirror_mode": { 595 | "name": "ipython", 596 | "version": 3 597 | }, 598 | "file_extension": ".py", 599 | "mimetype": "text/x-python", 600 | "name": "python", 601 | "nbconvert_exporter": "python", 602 | "pygments_lexer": "ipython3", 603 | "version": "3.10.6" 604 | } 605 | }, 606 | "nbformat": 4, 607 | "nbformat_minor": 2 608 | } 609 | -------------------------------------------------------------------------------- /10_tipos_de_datos_de_pandas.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "[![img/pythonista.png](img/pythonista.png)](https://www.pythonista.io)" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "# Tipos de datos de *Pandas*." 15 | ] 16 | }, 17 | { 18 | "cell_type": "markdown", 19 | "metadata": {}, 20 | "source": [ 21 | "*Pandas* toma como base a *Numpy* y lo extiende para poder realizar operaciones de análisis de datos, por lo que es compatible con elementos como:\n", 22 | "\n", 23 | "* ```np.nan```.\n", 24 | "* ```np.inf```." 25 | ] 26 | }, 27 | { 28 | "cell_type": "code", 29 | "execution_count": null, 30 | "metadata": {}, 31 | "outputs": [], 32 | "source": [ 33 | "import pandas as pd\n", 34 | "import numpy as np\n", 35 | "from datetime import datetime" 36 | ] 37 | }, 38 | { 39 | "cell_type": "markdown", 40 | "metadata": {}, 41 | "source": [ 42 | "## Convenciones de nombres." 43 | ] 44 | }, 45 | { 46 | "cell_type": "markdown", 47 | "metadata": {}, 48 | "source": [ 49 | "En este capítulo se hará referencia al paquete ```pandas``` como ```pd```, al paquete ```numpy``` como ```np```y a los *dataframes* instanciados de ```pd.DataFrame``` como ```df```." 
50 | ] 51 | }, 52 | { 53 | "cell_type": "markdown", 54 | "metadata": {}, 55 | "source": [ 56 | "## Tipos de datos de *Pandas*.\n", 57 | "\n", 58 | "*Pandas* extiende y a su vez restringe los tipos de datos de *Python* y de *Numpy* a los siguientes:\n", 59 | "\n", 60 | "* ```object``` el cual representa a una cadena de caracteres.\n", 61 | "* ```int64``` es el tipo para números enteros. \n", 62 | "* ```float64``` es el tipo para números de punto flotante.\n", 63 | "* ```bool``` es el tipo para valores booleanos.\n", 64 | "* ```datetime64``` es el tipo usado para gestionar fechas y horas.\n", 65 | "* ```timedelta64``` es el tipo de diferencias de tiempo. \n", 66 | "* ```category``` es un tipo de dato que contiene una colección finita de posibles valores (no se estudiará en este curso)." 67 | ] 68 | }, 69 | { 70 | "cell_type": "markdown", 71 | "metadata": {}, 72 | "source": [ 73 | "**Ejemplo:**" 74 | ] 75 | }, 76 | { 77 | "cell_type": "markdown", 78 | "metadata": {}, 79 | "source": [ 80 | "* A continuación se creará el *dataframe* ```datos``` que define las columnas.\n", 81 | " * *nombres* de tipo ```object```.\n", 82 | " * *fechas* de tipo ```datetime64```.\n", 83 | " * *saldo* de tipo ```float64```.\n", 84 | " * *al corriente* de tipo ```bool```." 
85 | ] 86 | }, 87 | { 88 | "cell_type": "code", 89 | "execution_count": null, 90 | "metadata": {}, 91 | "outputs": [], 92 | "source": [ 93 | "datos = pd.DataFrame({'nombres':('Juan Pérez',\n", 94 | " 'María Sánchez'\n", 95 | " , 'Jorge Vargas',\n", 96 | " 'Rodrigo Martínez'),\n", 97 | " 'fechas':(datetime(1995,12,21), \n", 98 | " datetime(1989,1,13), \n", 99 | " datetime(1992,9,14), \n", 100 | " datetime(1993,7,8)),\n", 101 | " 'saldo': (2500, \n", 102 | " 5345, \n", 103 | " np.nan, \n", 104 | " 11323.2),\n", 105 | " 'al corriente':(True, \n", 106 | " True, \n", 107 | " False, \n", 108 | " True)})" 109 | ] 110 | }, 111 | { 112 | "cell_type": "code", 113 | "execution_count": null, 114 | "metadata": { 115 | "scrolled": true 116 | }, 117 | "outputs": [], 118 | "source": [ 119 | "datos" 120 | ] 121 | }, 122 | { 123 | "cell_type": "markdown", 124 | "metadata": {}, 125 | "source": [ 126 | "## El atributo ```df.dtypes```.\n", 127 | "\n", 128 | "Este atributo es una serie de *Pandas* que contiene la relación de los tipos de datos de cada columna del *dataframe*.\n", 129 | "\n", 130 | "https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.dtypes.html" 131 | ] 132 | }, 133 | { 134 | "cell_type": "markdown", 135 | "metadata": {}, 136 | "source": [ 137 | "**Ejemplo:**\n", 138 | "\n", 139 | "* A partir del *dataframe* ```datos``` se obtendrá el tipo de datos de cada columna. " 140 | ] 141 | }, 142 | { 143 | "cell_type": "code", 144 | "execution_count": null, 145 | "metadata": {}, 146 | "outputs": [], 147 | "source": [ 148 | "datos.dtypes" 149 | ] 150 | }, 151 | { 152 | "cell_type": "markdown", 153 | "metadata": {}, 154 | "source": [ 155 | "## El método ```df.astype()```.\n", 156 | "\n", 157 | "Este método regresa una copia de los datos contenidos en un *dataframe* de *Pandas* convertidos a un tipo de dato específico. 
\n", 158 | "\n", 159 | "```\n", 160 | "df.astype()\n", 161 | "```\n", 162 | "\n", 163 | "Donde:\n", 164 | "\n", 165 | "* `````` es un tipo de dato soportado por *Pandas*.\n", 166 | "\n", 167 | "https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.astype.html" 168 | ] 169 | }, 170 | { 171 | "cell_type": "markdown", 172 | "metadata": {}, 173 | "source": [ 174 | "**Ejemplos:**" 175 | ] 176 | }, 177 | { 178 | "cell_type": "markdown", 179 | "metadata": {}, 180 | "source": [ 181 | "* La siguiente celda convertirá el contenido del *dataframe* ```datos``` a ```str```, lo cual dará por resultado elementos de tipo ```object```." 182 | ] 183 | }, 184 | { 185 | "cell_type": "code", 186 | "execution_count": null, 187 | "metadata": {}, 188 | "outputs": [], 189 | "source": [ 190 | "datos" 191 | ] 192 | }, 193 | { 194 | "cell_type": "code", 195 | "execution_count": null, 196 | "metadata": {}, 197 | "outputs": [], 198 | "source": [ 199 | "datos.astype(str)" 200 | ] 201 | }, 202 | { 203 | "cell_type": "code", 204 | "execution_count": null, 205 | "metadata": {}, 206 | "outputs": [], 207 | "source": [ 208 | "datos.astype(str).dtypes" 209 | ] 210 | }, 211 | { 212 | "cell_type": "code", 213 | "execution_count": null, 214 | "metadata": {}, 215 | "outputs": [], 216 | "source": [ 217 | "datos.dtypes" 218 | ] 219 | }, 220 | { 221 | "cell_type": "markdown", 222 | "metadata": {}, 223 | "source": [ 224 | "* La siguiente celda intentará convertir el contenido de la columna ```datos['saldo']``` a ```int64```. Sin embargo, algunos de sus contenidos no pueden ser convertidos a ese tipo de datos y se generará una excepciónde tipo ```IntCastingNaNError```." 
225 | ] 226 | }, 227 | { 228 | "cell_type": "code", 229 | "execution_count": null, 230 | "metadata": {}, 231 | "outputs": [], 232 | "source": [ 233 | "datos['saldo'].astype(\"int64\")" 234 | ] 235 | }, 236 | { 237 | "cell_type": "markdown", 238 | "metadata": {}, 239 | "source": [ 240 | "## La función ```pd.to_datetime()```.\n", 241 | "\n", 242 | "Esta función permite crear una columna de tipo ```datetime64``` a partir de un *dataframe* con columnas cuyos encabezados sean:\n", 243 | "\n", 244 | "* ```year``` (obligatorio)\n", 245 | "* ```month``` (obligatorio)\n", 246 | "* ```day``` (obligatorio)\n", 247 | "* ```hour```\n", 248 | "* ```minutes```\n", 249 | "* ```seconds```\n", 250 | "\n", 251 | "```\n", 252 | "pd.to_datetime(<dataframe>)\n", 253 | "```\n", 254 | "\n", 255 | "Donde:\n", 256 | "\n", 257 | "* ```<dataframe>``` es un *dataframe* con los identificadores de las columnas dispuestos en el formato descrito.\n", 258 | "\n", 259 | "https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.to_datetime.html" 260 | ] 261 | }, 262 | { 263 | "cell_type": "markdown", 264 | "metadata": {}, 265 | "source": [ 266 | "**Ejemplo:**" 267 | ] 268 | }, 269 | { 270 | "cell_type": "markdown", 271 | "metadata": {}, 272 | "source": [ 273 | "* La siguiente celda creará al *dataframe* ```fechas``` con las columnas:\n", 274 | " * ```year```\n", 275 | " * ```month```\n", 276 | " * ```day```\n", 277 | " * ```hour```\n", 278 | " * ```minutes```\n", 279 | " * ```seconds```" 280 | ] 281 | }, 282 | { 283 | "cell_type": "code", 284 | "execution_count": null, 285 | "metadata": {}, 286 | "outputs": [], 287 | "source": [ 288 | "fechas = pd.DataFrame({'year': [1997, 1982, 1985],\n", 289 | " 'month': [1, 12, 3],\n", 290 | " 'day': [14, 5, 21],\n", 291 | " 'hour':[17, 0, 4],\n", 292 | " 'minutes':[45, 39, 28],\n", 293 | " 'seconds':[11.1803, 23.74583, 3.8798]})" 294 | ] 295 | }, 296 | { 297 | "cell_type": "code", 298 | "execution_count": null, 299 | "metadata": {}, 300 | "outputs": [], 301 | "source": [ 302 | 
"fechas" 303 | ] 304 | }, 305 | { 306 | "cell_type": "markdown", 307 | "metadata": {}, 308 | "source": [ 309 | "* A continuacion se creará la serie ```nuevas_fechas```, compuesta por elementos de tipo ```datetime64``` al aplicar la función ```pd.to_datetime()``` al *dataframe* ```fechas```." 310 | ] 311 | }, 312 | { 313 | "cell_type": "code", 314 | "execution_count": null, 315 | "metadata": {}, 316 | "outputs": [], 317 | "source": [ 318 | "nuevas_fechas = pd.to_datetime(fechas)" 319 | ] 320 | }, 321 | { 322 | "cell_type": "code", 323 | "execution_count": null, 324 | "metadata": {}, 325 | "outputs": [], 326 | "source": [ 327 | "nuevas_fechas" 328 | ] 329 | }, 330 | { 331 | "cell_type": "code", 332 | "execution_count": null, 333 | "metadata": {}, 334 | "outputs": [], 335 | "source": [ 336 | "type(nuevas_fechas)" 337 | ] 338 | }, 339 | { 340 | "cell_type": "markdown", 341 | "metadata": {}, 342 | "source": [ 343 | "## La función ```pd.to_numeric()```.\n", 344 | "\n", 345 | "Esta función transforma al contenido de un *dataframe* o serie a un formato numérico." 346 | ] 347 | }, 348 | { 349 | "cell_type": "markdown", 350 | "metadata": {}, 351 | "source": [ 352 | "**Ejemplo:**" 353 | ] 354 | }, 355 | { 356 | "cell_type": "markdown", 357 | "metadata": {}, 358 | "source": [ 359 | "* La siguiente celda transformará la serie ```nuevas_fechas``` a formato numérico." 
360 | ] 361 | }, 362 | { 363 | "cell_type": "code", 364 | "execution_count": null, 365 | "metadata": {}, 366 | "outputs": [], 367 | "source": [ 368 | "pd.to_numeric(nuevas_fechas)" 369 | ] 370 | }, 371 | { 372 | "cell_type": "code", 373 | "execution_count": null, 374 | "metadata": {}, 375 | "outputs": [], 376 | "source": [ 377 | "pd.to_datetime(pd.to_numeric(nuevas_fechas))" 378 | ] 379 | }, 380 | { 381 | "cell_type": "markdown", 382 | "metadata": {}, 383 | "source": [ 384 | "## La función ```pd.to_timedelta()```.\n", 385 | "\n", 386 | "Esta función convertirá valores numéricos a formato ```timedelta64``` usando nanosegundos como referencia." 387 | ] 388 | }, 389 | { 390 | "cell_type": "markdown", 391 | "metadata": {}, 392 | "source": [ 393 | "**Ejemplo:**" 394 | ] 395 | }, 396 | { 397 | "cell_type": "markdown", 398 | "metadata": {}, 399 | "source": [ 400 | "* La siguiente celda generará al *dataframe* ```datos``` que contiene una secuencia de 20 números." 401 | ] 402 | }, 403 | { 404 | "cell_type": "code", 405 | "execution_count": null, 406 | "metadata": {}, 407 | "outputs": [], 408 | "source": [ 409 | "datos = pd.DataFrame(np.arange(2811154301025,\n", 410 | " 2811154301125, 5).reshape(10, 2))" 411 | ] 412 | }, 413 | { 414 | "cell_type": "code", 415 | "execution_count": null, 416 | "metadata": {}, 417 | "outputs": [], 418 | "source": [ 419 | "datos" 420 | ] 421 | }, 422 | { 423 | "cell_type": "markdown", 424 | "metadata": {}, 425 | "source": [ 426 | "* Se aplicará la función ```pd.to_timedelta()``` a ```datos[1]```." 
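Como nota adicional, además de interpretar números como nanosegundos, ```pd.to_timedelta()``` acepta el parámetro ```unit``` para indicar otra unidad de referencia. Boceto ilustrativo:

```python
import pandas as pd

# unit="h" interpreta cada número como horas en lugar de nanosegundos.
deltas = pd.to_timedelta([1, 2, 3], unit="h")
print(deltas)
```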
427 | ] 428 | }, 429 | { 430 | "cell_type": "code", 431 | "execution_count": null, 432 | "metadata": { 433 | "scrolled": true 434 | }, 435 | "outputs": [], 436 | "source": [ 437 | "pd.to_timedelta(datos[1])" 438 | ] 439 | }, 440 | { 441 | "cell_type": "markdown", 442 | "metadata": {}, 443 | "source": [ 444 | "* La siguiente celda intentará ejecutar la función ```pd.to_timedelta()``` con la serie ```nuevas_fechas```, la cual contiene elementos de tipo ```datetime64```, desencadenando una excepción de tipo ```TypeError```." 445 | ] 446 | }, 447 | { 448 | "cell_type": "code", 449 | "execution_count": null, 450 | "metadata": {}, 451 | "outputs": [], 452 | "source": [ 453 | "pd.to_timedelta(nuevas_fechas)" 454 | ] 455 | }, 456 | { 457 | "cell_type": "markdown", 458 | "metadata": {}, 459 | "source": [ 460 | "

\"Licencia
Esta obra está bajo una Licencia Creative Commons Atribución 4.0 Internacional.

\n", 461 | "

© José Luis Chiquete Valdivieso. 2023.

" 462 | ] 463 | } 464 | ], 465 | "metadata": { 466 | "kernelspec": { 467 | "display_name": "Python 3 (ipykernel)", 468 | "language": "python", 469 | "name": "python3" 470 | }, 471 | "language_info": { 472 | "codemirror_mode": { 473 | "name": "ipython", 474 | "version": 3 475 | }, 476 | "file_extension": ".py", 477 | "mimetype": "text/x-python", 478 | "name": "python", 479 | "nbconvert_exporter": "python", 480 | "pygments_lexer": "ipython3", 481 | "version": "3.9.2" 482 | } 483 | }, 484 | "nbformat": 4, 485 | "nbformat_minor": 2 486 | } 487 | -------------------------------------------------------------------------------- /12_indices_y_multiindices.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "[![img/pythonista.png](img/pythonista.png)](https://www.pythonista.io)" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "# Índices y multiíndeces.\n", 15 | "\n", 16 | "Los índices e índices de columnas son objetos de *Pandas* que pueden ser tan simple como un listados de cadenas de caracteres o estructuras compleja de múltiples niveles.\n", 17 | "\n", 18 | "En este capítulo se estudiarán a los objetos instanciados de las clases ```pd.Index``` y ```pd.MultiIndex```." 
19 | ] 20 | }, 21 | { 22 | "cell_type": "code", 23 | "execution_count": null, 24 | "metadata": {}, 25 | "outputs": [], 26 | "source": [ 27 | "import pandas as pd\n", 28 | "import numpy as np" 29 | ] 30 | }, 31 | { 32 | "cell_type": "markdown", 33 | "metadata": {}, 34 | "source": [ 35 | "## El atributo ```pd.DataFrame.axes```.\n", 36 | "\n", 37 | "El atributo ```pd.DataFrame.axes``` es una lista que contiene a los índices de renglones y de columnas de un *dataframe*.\n", 38 | "\n", 39 | "* El objeto ```pd.DataFrame.axes[0]``` corresponde a los índices del *dataframe*.\n", 40 | "* El objeto ```pd.DataFrame.axes[1]``` corresponde a los índices de las columnas del *dataframe*.\n", 41 | "\n", 42 | "Los índices pueden ser de tipo ```pd.Index``` o ```pd.MultiIndex```.\n", 43 | "\n", 44 | "https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.axes.html" 45 | ] 46 | }, 47 | { 48 | "cell_type": "markdown", 49 | "metadata": {}, 50 | "source": [ 51 | "**Ejemplo:**" 52 | ] 53 | }, 54 | { 55 | "cell_type": "markdown", 56 | "metadata": {}, 57 | "source": [ 58 | "* Se creará al *dataframe* ```poblacion```. 
" 59 | ] 60 | }, 61 | { 62 | "cell_type": "code", 63 | "execution_count": null, 64 | "metadata": {}, 65 | "outputs": [], 66 | "source": [ 67 | "poblacion = pd.DataFrame({'Animal':('lobo',\n", 68 | " 'coyote',\n", 69 | " 'jaguar',\n", 70 | " 'cerdo salvaje',\n", 71 | " 'tapir',\n", 72 | " 'venado',\n", 73 | " 'ocelote',\n", 74 | " 'puma'),\n", 75 | " 'Norte_I':(12,\n", 76 | " np.NAN,\n", 77 | " None,\n", 78 | " 2,\n", 79 | " 4,\n", 80 | " 2,\n", 81 | " 14,\n", 82 | " 5\n", 83 | " ),\n", 84 | " 'Norte_II':(23,\n", 85 | " 4,\n", 86 | " 25,\n", 87 | " 21,\n", 88 | " 9,\n", 89 | " 121,\n", 90 | " 1,\n", 91 | " 2\n", 92 | " ),\n", 93 | " 'Centro_I':(15,\n", 94 | " 23,\n", 95 | " 2,\n", 96 | " 120,\n", 97 | " 40,\n", 98 | " 121,\n", 99 | " 0,\n", 100 | " 5),\n", 101 | " 'Sur_I':(28,\n", 102 | " 46,\n", 103 | " 14,\n", 104 | " 156,\n", 105 | " 79,\n", 106 | " 12,\n", 107 | " 2,\n", 108 | " np.NAN)}).set_index('Animal')" 109 | ] 110 | }, 111 | { 112 | "cell_type": "code", 113 | "execution_count": null, 114 | "metadata": {}, 115 | "outputs": [], 116 | "source": [ 117 | "poblacion" 118 | ] 119 | }, 120 | { 121 | "cell_type": "markdown", 122 | "metadata": {}, 123 | "source": [ 124 | "* Se desplegará ```poblacion.axes```." 
125 | ] 126 | }, 127 | { 128 | "cell_type": "code", 129 | "execution_count": null, 130 | "metadata": {}, 131 | "outputs": [], 132 | "source": [ 133 | "poblacion.axes" 134 | ] 135 | }, 136 | { 137 | "cell_type": "code", 138 | "execution_count": null, 139 | "metadata": {}, 140 | "outputs": [], 141 | "source": [ 142 | "poblacion.axes[0]" 143 | ] 144 | }, 145 | { 146 | "cell_type": "code", 147 | "execution_count": null, 148 | "metadata": {}, 149 | "outputs": [], 150 | "source": [ 151 | "poblacion.axes[1]" 152 | ] 153 | }, 154 | { 155 | "cell_type": "markdown", 156 | "metadata": {}, 157 | "source": [ 158 | "## La clase ```pd.Index```.\n", 159 | "\n", 160 | "Esta es la clase que permite crear índices simples y se instancia de la siguiente manera:\n", 161 | "\n", 162 | "```\n", 163 | "pd.Index(['<índice 1>', '<índice 2>',..., '<índice n>'], name='<nombre>')\n", 164 | "```\n", 165 | "Donde:\n", 166 | "\n", 167 | "* ```<índice x>``` es una cadena de caracteres correspondiente al nombre de un índice.\n", 168 | "* ```<nombre>``` es una cadena de caracteres para el atributo ```name``` del objeto ```pd.Index```." 169 | ] 170 | }, 171 | { 172 | "cell_type": "markdown", 173 | "metadata": {}, 174 | "source": [ 175 | "**Ejemplo:**" 176 | ] 177 | }, 178 | { 179 | "cell_type": "markdown", 180 | "metadata": {}, 181 | "source": [ 182 | "* Se creará el objeto ```pd.Index``` con nombre ```indice```." 183 | ] 184 | }, 185 | { 186 | "cell_type": "code", 187 | "execution_count": null, 188 | "metadata": {}, 189 | "outputs": [], 190 | "source": [ 191 | "indice = pd.Index(['N_1', 'N_2', 'C', 'S'], name='Regiones')" 192 | ] 193 | }, 194 | { 195 | "cell_type": "markdown", 196 | "metadata": {}, 197 | "source": [ 198 | "* Se asignará ```indice``` al atributo ```poblacion.columns```."
199 | ] 200 | }, 201 | { 202 | "cell_type": "code", 203 | "execution_count": null, 204 | "metadata": {}, 205 | "outputs": [], 206 | "source": [ 207 | "poblacion" 208 | ] 209 | }, 210 | { 211 | "cell_type": "code", 212 | "execution_count": null, 213 | "metadata": {}, 214 | "outputs": [], 215 | "source": [ 216 | "poblacion.columns = indice" 217 | ] 218 | }, 219 | { 220 | "cell_type": "code", 221 | "execution_count": null, 222 | "metadata": {}, 223 | "outputs": [], 224 | "source": [ 225 | "poblacion" 226 | ] 227 | }, 228 | { 229 | "cell_type": "markdown", 230 | "metadata": {}, 231 | "source": [ 232 | "### El atributo ```pd.Index.name```.\n", 233 | "\n", 234 | "Este atributo contiene el nombre del objeto ```pd.Index```, el cual será desplegado como parte de un índice." 235 | ] 236 | }, 237 | { 238 | "cell_type": "markdown", 239 | "metadata": {}, 240 | "source": [ 241 | "* Se desplegará el atributo ```name``` de ```poblacion.index```." 242 | ] 243 | }, 244 | { 245 | "cell_type": "code", 246 | "execution_count": null, 247 | "metadata": {}, 248 | "outputs": [], 249 | "source": [ 250 | "poblacion.index.name" 251 | ] 252 | }, 253 | { 254 | "cell_type": "code", 255 | "execution_count": null, 256 | "metadata": {}, 257 | "outputs": [], 258 | "source": [ 259 | "poblacion.columns.name" 260 | ] 261 | }, 262 | { 263 | "cell_type": "markdown", 264 | "metadata": {}, 265 | "source": [ 266 | "### El atributo ```pd.Index.values```.\n", 267 | "\n", 268 | "Este atributo es un objeto ```np.ndarray``` que contiene los nombres de cada índice." 269 | ] 270 | }, 271 | { 272 | "cell_type": "markdown", 273 | "metadata": {}, 274 | "source": [ 275 | "**Ejemplos:**" 276 | ] 277 | }, 278 | { 279 | "cell_type": "markdown", 280 | "metadata": {}, 281 | "source": [ 282 | "* Se desplegará el atributo ```poblacion.columns.values```."
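Como apunte al margen, los objetos ```pd.Index``` son inmutables, por lo que modificar directamente el arreglo que expone ```values``` no es una práctica recomendada. Una alternativa más segura para renombrar columnas es el método ```df.rename()```; un boceto ilustrativo con un *dataframe* hipotético:

```python
import pandas as pd

df = pd.DataFrame({'N_1': [1], 'N_2': [2], 'S': [3]})

# rename() regresa un nuevo dataframe con la columna renombrada,
# sin mutar el índice original.
renombrado = df.rename(columns={'S': 'Sur'})
print(renombrado.columns.tolist())
```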
283 | ] 284 | }, 285 | { 286 | "cell_type": "code", 287 | "execution_count": null, 288 | "metadata": {}, 289 | "outputs": [], 290 | "source": [ 291 | "poblacion.columns.values" 292 | ] 293 | }, 294 | { 295 | "cell_type": "markdown", 296 | "metadata": {}, 297 | "source": [ 298 | "* Se sustituirá el valor de ```poblacion.columns.values[3]``` por la cadena ```'Sur'```." 299 | ] 300 | }, 301 | { 302 | "cell_type": "code", 303 | "execution_count": null, 304 | "metadata": {}, 305 | "outputs": [], 306 | "source": [ 307 | "poblacion.columns.values[3] = 'Sur'" 308 | ] 309 | }, 310 | { 311 | "cell_type": "code", 312 | "execution_count": null, 313 | "metadata": {}, 314 | "outputs": [], 315 | "source": [ 316 | "poblacion" 317 | ] 318 | }, 319 | { 320 | "cell_type": "markdown", 321 | "metadata": {}, 322 | "source": [ 323 | "## La clase ```pd.MultiIndex```.\n", 324 | "\n", 325 | "Los objetos instanciados de la clase ```pd.MultiIndex``` permiten tener más de un nivel de índices.\n", 326 | "\n", 327 | "Estos objetos están conformados por:\n", 328 | "* niveles (```levels```), los cuales se van desagregando conforme descienden.\n", 329 | "* códigos de ordenamiento (```codes```), los cuales contienen listas describiendo la distribución de los índices por nivel.\n", 330 | "* nombres (```names```) correspondientes a cada nivel." 331 | ] 332 | }, 333 | { 334 | "cell_type": "markdown", 335 | "metadata": {}, 336 | "source": [ 337 | " ## Creación de un objeto ```pd.MultiIndex```.\n", 338 | " \n", 339 | " Para la creación de objetos instanciados de ```pd.MultiIndex``` se pueden utilizar los siguientes métodos de clase:\n", 340 | " \n", 341 | " * ```pd.MultiIndex.from_arrays()```.\n", 342 | " * ```pd.MultiIndex.from_tuples()```.\n", 343 | " * ```pd.MultiIndex.from_product()```.\n", 344 | " * ```pd.MultiIndex.from_frame()```. 
\n", 345 | " \n", 346 | " \n", 347 | " ```\n", 348 | " pd.MultiIndex.(, names=)\n", 349 | " ```" 350 | ] 351 | }, 352 | { 353 | "cell_type": "markdown", 354 | "metadata": {}, 355 | "source": [ 356 | "**Ejemplos:**" 357 | ] 358 | }, 359 | { 360 | "cell_type": "markdown", 361 | "metadata": {}, 362 | "source": [ 363 | "* A continuación se creará una tupla que describe diversos índices." 364 | ] 365 | }, 366 | { 367 | "cell_type": "code", 368 | "execution_count": null, 369 | "metadata": {}, 370 | "outputs": [], 371 | "source": [ 372 | "lista = []\n", 373 | "for zona in ('Norte_I', 'Sur_I', 'Centro_I'):\n", 374 | " for animal in ('jaguar', 'conejo', 'lobo'):\n", 375 | " lista.append((zona, animal)) " 376 | ] 377 | }, 378 | { 379 | "cell_type": "code", 380 | "execution_count": null, 381 | "metadata": {}, 382 | "outputs": [], 383 | "source": [ 384 | "lista" 385 | ] 386 | }, 387 | { 388 | "cell_type": "code", 389 | "execution_count": null, 390 | "metadata": {}, 391 | "outputs": [], 392 | "source": [ 393 | "tupla=tuple(lista) " 394 | ] 395 | }, 396 | { 397 | "cell_type": "code", 398 | "execution_count": null, 399 | "metadata": {}, 400 | "outputs": [], 401 | "source": [ 402 | "tupla" 403 | ] 404 | }, 405 | { 406 | "cell_type": "markdown", 407 | "metadata": {}, 408 | "source": [ 409 | "* La siguiente celda creará un objeto a partir de ```pd.MultiIndex.from_tuples```." 410 | ] 411 | }, 412 | { 413 | "cell_type": "code", 414 | "execution_count": null, 415 | "metadata": { 416 | "scrolled": true 417 | }, 418 | "outputs": [], 419 | "source": [ 420 | "pd.MultiIndex.from_tuples(tupla, names=['zona', 'animal'])" 421 | ] 422 | }, 423 | { 424 | "cell_type": "markdown", 425 | "metadata": {}, 426 | "source": [ 427 | "* A continuación se crearán objetos similares utilizando el método ```pd.MultiIndex.from_product()```." 
428 | ] 429 | }, 430 | { 431 | "cell_type": "code", 432 | "execution_count": null, 433 | "metadata": {}, 434 | "outputs": [], 435 | "source": [ 436 | "pd.MultiIndex.from_product([('Norte_I', 'Sur_I', 'Centro_I'), \n", 437 | " ('jaguar', 'conejo', 'lobo')],\n", 438 | " names=['zona', 'animal'])" 439 | ] 440 | }, 441 | { 442 | "cell_type": "markdown", 443 | "metadata": {}, 444 | "source": [ 445 | "* Se definirá al objeto ```columnas``` utilizando el método ```pd.MultiIndex.from_product()```." 446 | ] 447 | }, 448 | { 449 | "cell_type": "code", 450 | "execution_count": null, 451 | "metadata": {}, 452 | "outputs": [], 453 | "source": [ 454 | "columnas = pd.MultiIndex.from_product([('Norte_I', 'Sur_I', 'Centro_I'),\n", 455 | " ('jaguar', 'conejo', 'lobo')],\n", 456 | " names=['zona', 'animal'])" 457 | ] 458 | }, 459 | { 460 | "cell_type": "markdown", 461 | "metadata": {}, 462 | "source": [ 463 | "El *dataframe* ```poblacion``` será creado utilizando el objeto ```columnas``` para el atributo ```poblacion.columns```." 
464 | ] 465 | }, 466 | { 467 | "cell_type": "code", 468 | "execution_count": null, 469 | "metadata": {}, 470 | "outputs": [], 471 | "source": [ 472 | "poblacion = pd.DataFrame([[12, 11, 24, 32, 15, 42, 35, 11, 35],\n", 473 | " [23, 22, 54, 3, 34, 24, 39, 29, 11],\n", 474 | " [35, 32, 67, 15, 42, 34, 46, 40, 13],\n", 475 | " [33, 43, 87, 11, 61, 42, 52, 41, 15],\n", 476 | " [44, 56, 98, 16, 70, 50, 57, 41, 17],\n", 477 | " [53, 62, 103, 21, 74, 54, 69, 55, 23]], \n", 478 | " index=('enero', 'febrero', 'marzo', 'abril', 'mayo', 'junio'),\n", 479 | " columns=columnas)" 480 | ] 481 | }, 482 | { 483 | "cell_type": "code", 484 | "execution_count": null, 485 | "metadata": {}, 486 | "outputs": [], 487 | "source": [ 488 | "poblacion.columns" 489 | ] 490 | }, 491 | { 492 | "cell_type": "code", 493 | "execution_count": null, 494 | "metadata": { 495 | "scrolled": true 496 | }, 497 | "outputs": [], 498 | "source": [ 499 | "poblacion" 500 | ] 501 | }, 502 | { 503 | "cell_type": "markdown", 504 | "metadata": {}, 505 | "source": [ 506 | "# Indexado.\n", 507 | "\n", 508 | "El indexado de un índice se realiza mediante corchetes. El corchete inicial corresponde al nivel superior.\n", 509 | "\n", 510 | "```\n", 511 | "df[<índice 1>][<índice 2>]...[<índice n>]\n", 512 | "```\n", 513 | "Donde:\n", 514 | "\n", 515 | "* Cada ```<índice i>``` es el identificador del índice al que se desea acceder en el nivel ```i``` correspondiente." 516 | ] 517 | }, 518 | { 519 | "cell_type": "markdown", 520 | "metadata": {}, 521 | "source": [ 522 | "**Ejemplo:**" 523 | ] 524 | }, 525 | { 526 | "cell_type": "markdown", 527 | "metadata": {}, 528 | "source": [ 529 | "* La siguiente celda regresará un *dataframe* correspondiente a las columnas del índice de primer nivel ```poblacion['Norte_I']```."
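También es posible seleccionar una sola columna de un multiíndice con una tupla que abarque todos los niveles a la vez. Boceto ilustrativo con un *dataframe* hipotético:

```python
import pandas as pd

columnas = pd.MultiIndex.from_product([('Norte_I', 'Sur_I'),
                                       ('jaguar', 'lobo')],
                                      names=['zona', 'animal'])
df = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8]], columns=columnas)

# La tupla ('Sur_I', 'lobo') indica ambos niveles en una sola operación.
serie = df[('Sur_I', 'lobo')]
print(serie.tolist())
```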
530 | ] 531 | }, 532 | { 533 | "cell_type": "code", 534 | "execution_count": null, 535 | "metadata": { 536 | "scrolled": true 537 | }, 538 | "outputs": [], 539 | "source": [ 540 | "poblacion['Norte_I']" 541 | ] 542 | }, 543 | { 544 | "cell_type": "markdown", 545 | "metadata": {}, 546 | "source": [ 547 | "* La siguiente celda regresará la serie correspondiente al índice de segundo nivel mediante ```poblacion['Sur_I']['jaguar']```." 548 | ] 549 | }, 550 | { 551 | "cell_type": "code", 552 | "execution_count": null, 553 | "metadata": {}, 554 | "outputs": [], 555 | "source": [ 556 | "poblacion['Sur_I']['jaguar']" 557 | ] 558 | }, 559 | { 560 | "cell_type": "markdown", 561 | "metadata": {}, 562 | "source": [ 563 | "## El método ```pd.MultiIndex.droplevel()```.\n", 564 | "\n", 565 | "Este método elimina un nivel de un objeto ```pd.MultiIndex```.\n", 566 | "\n", 567 | "```\n", 568 | "<multiindex>.droplevel(<nivel>)\n", 569 | "```" 570 | ] 571 | }, 572 | { 573 | "cell_type": "markdown", 574 | "metadata": {}, 575 | "source": [ 576 | "**Ejemplo:**" 577 | ] 578 | }, 579 | { 580 | "cell_type": "markdown", 581 | "metadata": {}, 582 | "source": [ 583 | "* Se utilizará al objeto ```columnas``` creado previamente." 584 | ] 585 | }, 586 | { 587 | "cell_type": "code", 588 | "execution_count": null, 589 | "metadata": { 590 | "scrolled": true 591 | }, 592 | "outputs": [], 593 | "source": [ 594 | "columnas" 595 | ] 596 | }, 597 | { 598 | "cell_type": "markdown", 599 | "metadata": {}, 600 | "source": [ 601 | "* La siguiente celda eliminará el primer nivel del objeto ```columnas```." 602 | ] 603 | }, 604 | { 605 | "cell_type": "code", 606 | "execution_count": null, 607 | "metadata": {}, 608 | "outputs": [], 609 | "source": [ 610 | "columnas.droplevel(0)" 611 | ] 612 | }, 613 | { 614 | "cell_type": "markdown", 615 | "metadata": {}, 616 | "source": [ 617 | "* La siguiente celda creará al objeto ```nuevas_cols``` a partir de eliminar el primer nivel del objeto ```columnas```."
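A manera de contraste, el método ```pd.MultiIndex.get_level_values()``` permite extraer los valores de un solo nivel sin modificar el multiíndice. Boceto ilustrativo:

```python
import pandas as pd

mi = pd.MultiIndex.from_product([('Norte_I', 'Sur_I'),
                                 ('jaguar', 'lobo')],
                                names=['zona', 'animal'])

# Se obtiene el valor del nivel 'animal' para cada entrada del multiíndice.
print(mi.get_level_values('animal').tolist())
```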
618 | ] 619 | }, 620 | { 621 | "cell_type": "code", 622 | "execution_count": null, 623 | "metadata": {}, 624 | "outputs": [], 625 | "source": [ 626 | "nuevas_cols = poblacion.columns.droplevel('zona')" 627 | ] 628 | }, 629 | { 630 | "cell_type": "code", 631 | "execution_count": null, 632 | "metadata": { 633 | "scrolled": true 634 | }, 635 | "outputs": [], 636 | "source": [ 637 | "nuevas_cols" 638 | ] 639 | }, 640 | { 641 | "cell_type": "markdown", 642 | "metadata": {}, 643 | "source": [ 644 | "* La siguiente celda asignará al atributo ```poblacion.columns``` el objeto ```nuevas_cols```." 645 | ] 646 | }, 647 | { 648 | "cell_type": "code", 649 | "execution_count": null, 650 | "metadata": {}, 651 | "outputs": [], 652 | "source": [ 653 | "poblacion.columns = nuevas_cols" 654 | ] 655 | }, 656 | { 657 | "cell_type": "code", 658 | "execution_count": null, 659 | "metadata": { 660 | "scrolled": true 661 | }, 662 | "outputs": [], 663 | "source": [ 664 | "poblacion" 665 | ] 666 | }, 667 | { 668 | "cell_type": "code", 669 | "execution_count": null, 670 | "metadata": {}, 671 | "outputs": [], 672 | "source": [ 673 | "poblacion['jaguar']" 674 | ] 675 | }, 676 | { 677 | "cell_type": "markdown", 678 | "metadata": {}, 679 | "source": [ 680 | "

\"Licencia
Esta obra está bajo una Licencia Creative Commons Atribución 4.0 Internacional.

\n", 681 | "

© José Luis Chiquete Valdivieso. 2023.

" 682 | ] 683 | } 684 | ], 685 | "metadata": { 686 | "kernelspec": { 687 | "display_name": "Python 3 (ipykernel)", 688 | "language": "python", 689 | "name": "python3" 690 | }, 691 | "language_info": { 692 | "codemirror_mode": { 693 | "name": "ipython", 694 | "version": 3 695 | }, 696 | "file_extension": ".py", 697 | "mimetype": "text/x-python", 698 | "name": "python", 699 | "nbconvert_exporter": "python", 700 | "pygments_lexer": "ipython3", 701 | "version": "3.9.2" 702 | } 703 | }, 704 | "nbformat": 4, 705 | "nbformat_minor": 2 706 | } 707 | -------------------------------------------------------------------------------- /14_metodo_merge.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "[![img/pythonista.png](img/pythonista.png)](https://www.pythonista.io)" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "# El método ```pd.DataFrame.merge()```." 15 | ] 16 | }, 17 | { 18 | "cell_type": "code", 19 | "execution_count": null, 20 | "metadata": {}, 21 | "outputs": [], 22 | "source": [ 23 | "import pandas as pd\n", 24 | "from datetime import datetime" 25 | ] 26 | }, 27 | { 28 | "cell_type": "markdown", 29 | "metadata": {}, 30 | "source": [ 31 | "El método ```pd.DataFrame.merge()``` permite crear en un nuevo *dataframe* a partir de la relación entre el *dataframe* de origen y el que se ingresa como *argumento*, indicando las columnas en las que pueda encontrar elementos coincidentes.\n", 32 | "\n", 33 | "\n", 34 | "```\n", 35 | "df.merge(, left_on=[, , ..., ], \n", 36 | " right_on=[, , ..., ], \n", 37 | " on=[, ,.. 
],\n", 38 | " how=)\n", 39 | "```\n", 40 | "\n", 41 | "Donde:\n", 42 | "\n", 43 | "* `````` es un *dataframe* de *Pandas*.\n", 44 | "* Cada `````` es un objeto `````` que corresponde al identificador de una columna del *dataframe* que contiene al método.\n", 45 | "* `````` es un objeto `````` que corresponde al identificador de una columna del *dataframe* ``````.\n", 46 | "* Cada ``````es un objeto `````` que corresponde al identificador de una columna que comparte el mismo nombre en ambos *dataframes*.\n", 47 | "* `````` es el modo en el que se realizará la combinación y puede ser:\n", 48 | "\n", 49 | " * ```'inner'```, el cual es el valor por defecto.\n", 50 | " * ```'outer'```\n", 51 | " * ```'left'```\n", 52 | " * ```'right'```\n", 53 | " \n", 54 | "https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html" 55 | ] 56 | }, 57 | { 58 | "cell_type": "markdown", 59 | "metadata": {}, 60 | "source": [ 61 | "**Ejemplos:**" 62 | ] 63 | }, 64 | { 65 | "cell_type": "markdown", 66 | "metadata": {}, 67 | "source": [ 68 | "* La siguiente celda creará al *dataframe* ```clientes```, el cual contiene a las columnas:\n", 69 | "\n", 70 | "* ```'ident'```.\n", 71 | "* ```'nombre'```.\n", 72 | "* ```'primer apellido'```.\n", 73 | "* ```'suc_origen'```." 
74 | ] 75 | }, 76 | { 77 | "cell_type": "code", 78 | "execution_count": null, 79 | "metadata": {}, 80 | "outputs": [], 81 | "source": [ 82 | "clientes = pd.DataFrame({'ident':(19232, \n", 83 | " 19233, \n", 84 | " 19234, \n", 85 | " 19235, \n", 86 | " 19236),\n", 87 | " 'nombre':('Adriana',\n", 88 | " 'Marcos',\n", 89 | " 'Rubén',\n", 90 | " 'Samuel',\n", 91 | " 'Martha'),\n", 92 | " 'primer apellido':('Sánchez',\n", 93 | " 'García',\n", 94 | " 'Rincón',\n", 95 | " 'Oliva',\n", 96 | " 'Martínez'),\n", 97 | " 'suc_origen':('CDMX01',\n", 98 | " 'CDMX02',\n", 99 | " 'CDMX02',\n", 100 | " 'CDMX01',\n", 101 | " 'CDMX03')\n", 102 | " })" 103 | ] 104 | }, 105 | { 106 | "cell_type": "code", 107 | "execution_count": null, 108 | "metadata": { 109 | "scrolled": false 110 | }, 111 | "outputs": [], 112 | "source": [ 113 | "clientes" 114 | ] 115 | }, 116 | { 117 | "cell_type": "markdown", 118 | "metadata": {}, 119 | "source": [ 120 | "* La siguiente celda creará al *dataframe* ```sucursales```, el cual contiene a las columnas:\n", 121 | "\n", 122 | "* ```'clave'```.\n", 123 | "* ```nombre_comercial```." 
124 | ] 125 | }, 126 | { 127 | "cell_type": "code", 128 | "execution_count": null, 129 | "metadata": {}, 130 | "outputs": [], 131 | "source": [ 132 | "sucursales = pd.DataFrame({'clave':('CDMX01', \n", 133 | " 'CDMX02', \n", 134 | " 'MTY01', \n", 135 | " 'GDL01'),\n", 136 | " 'nombre_comercial':('Galerías',\n", 137 | " 'Centro',\n", 138 | " 'Puerta de la Silla',\n", 139 | " 'Minerva Plaza')})" 140 | ] 141 | }, 142 | { 143 | "cell_type": "code", 144 | "execution_count": null, 145 | "metadata": {}, 146 | "outputs": [], 147 | "source": [ 148 | "sucursales" 149 | ] 150 | }, 151 | { 152 | "cell_type": "markdown", 153 | "metadata": {}, 154 | "source": [ 155 | "* La siguiente celda creará al *dataframe* ```facturas```, el cual contiene a las columnas:\n", 156 | "\n", 157 | "* ```'folio'```.\n", 158 | "* ```'sucursal'```.\n", 159 | "* ```'monto'```.\n", 160 | "* ```'fecha'```.\n", 161 | "* ```'cliente'```." 162 | ] 163 | }, 164 | { 165 | "cell_type": "code", 166 | "execution_count": null, 167 | "metadata": {}, 168 | "outputs": [], 169 | "source": [ 170 | "facturas = pd.DataFrame({'folio':(15234, \n", 171 | " \n", 172 | " 15235, \n", 173 | " 15236, \n", 174 | " 15237, \n", 175 | " 15238, \n", 176 | " 15239, \n", 177 | " 15240,\n", 178 | " 15241,\n", 179 | " 15242),\n", 180 | " 'sucursal':('CDMX01',\n", 181 | " 'MTY01',\n", 182 | " 'CDMX02',\n", 183 | " 'CDMX02',\n", 184 | " 'MTY01',\n", 185 | " 'GDL01',\n", 186 | " 'CDMX02',\n", 187 | " 'MTY01',\n", 188 | " 'GDL01'),\n", 189 | " 'monto':(1420.00,\n", 190 | " 1532.00,\n", 191 | " 890.00,\n", 192 | " 1300.00,\n", 193 | " 3121.47,\n", 194 | " 1100.5,\n", 195 | " 12230,\n", 196 | " 230.85,\n", 197 | " 1569),\n", 198 | " 'fecha':(datetime(2019,3,11,17,24),\n", 199 | " datetime(2019,3,24,14,46),\n", 200 | " datetime(2019,3,25,17,58),\n", 201 | " datetime(2019,3,27,13,11),\n", 202 | " datetime(2019,3,31,10,25),\n", 203 | " datetime(2019,4,1,18,32),\n", 204 | " datetime(2019,4,3,11,43),\n", 205 | " datetime(2019,4,4,16,55),\n", 206 
| " datetime(2019,4,5,12,59)),\n", 207 | " 'cliente':(19234,\n", 208 | " 19232,\n", 209 | " 19235,\n", 210 | " 19233,\n", 211 | " 19236,\n", 212 | " 19237,\n", 213 | " 19232,\n", 214 | " 19233,\n", 215 | " 19232)\n", 216 | " })" 217 | ] 218 | }, 219 | { 220 | "cell_type": "code", 221 | "execution_count": null, 222 | "metadata": { 223 | "scrolled": true 224 | }, 225 | "outputs": [], 226 | "source": [ 227 | "facturas" 228 | ] 229 | }, 230 | { 231 | "cell_type": "markdown", 232 | "metadata": {}, 233 | "source": [ 234 | "* Cada una de las siguentes dos celdas regresarán un *dataframe* que relacionará los elementos de ```facturas[\"sucursal\"]``` con ```sucursales[\"clave\"]```." 235 | ] 236 | }, 237 | { 238 | "cell_type": "code", 239 | "execution_count": null, 240 | "metadata": {}, 241 | "outputs": [], 242 | "source": [ 243 | "facturas.merge(sucursales, left_on=\"sucursal\", right_on=\"clave\")" 244 | ] 245 | }, 246 | { 247 | "cell_type": "code", 248 | "execution_count": null, 249 | "metadata": { 250 | "scrolled": true 251 | }, 252 | "outputs": [], 253 | "source": [ 254 | "sucursales.merge(facturas, left_on=\"clave\", right_on=\"sucursal\")" 255 | ] 256 | }, 257 | { 258 | "cell_type": "markdown", 259 | "metadata": {}, 260 | "source": [ 261 | "* La siguente celda regresará un *dataframe* que relacionará los elementos de ```facturas[\"cliente\"]``` con ```clientes[\"ident\"]```." 262 | ] 263 | }, 264 | { 265 | "cell_type": "code", 266 | "execution_count": null, 267 | "metadata": { 268 | "scrolled": false 269 | }, 270 | "outputs": [], 271 | "source": [ 272 | "facturas.merge(clientes, left_on=\"cliente\", right_on=\"ident\")" 273 | ] 274 | }, 275 | { 276 | "cell_type": "markdown", 277 | "metadata": {}, 278 | "source": [ 279 | "* La siguente celda regresará un *dataframe* que relacionará los elementos de ```clientes[\"suc_origen\"]``` con ```sucursales[\"clave\"]```." 
280 | ] 281 | }, 282 | { 283 | "cell_type": "code", 284 | "execution_count": null, 285 | "metadata": {}, 286 | "outputs": [], 287 | "source": [ 288 | "clientes.merge(sucursales, left_on='suc_origen', right_on='clave')" 289 | ] 290 | }, 291 | { 292 | "cell_type": "markdown", 293 | "metadata": {}, 294 | "source": [ 295 | "* La siguiente celda regresará un *dataframe* que relacionará:\n", 296 | "    * Los elementos de ```facturas[\"cliente\"]``` con ```clientes[\"ident\"]```.\n", 297 | "    * Los elementos de ```facturas[\"sucursal\"]``` con ```clientes[\"suc_origen\"]```.\n", 298 | "\n", 299 | "* El *dataframe* resultante contendrá exclusivamente aquellos elementos en los que exista coincidencia en ambas relaciones." 300 | ] 301 | }, 302 | { 303 | "cell_type": "code", 304 | "execution_count": null, 305 | "metadata": { 306 | "scrolled": false 307 | }, 308 | "outputs": [], 309 | "source": [ 310 | "facturas.merge(clientes, left_on=[\"cliente\", \"sucursal\"],\n", 311 | "               right_on=[\"ident\", \"suc_origen\"])" 312 | ] 313 | }, 314 | { 315 | "cell_type": "markdown", 316 | "metadata": {}, 317 | "source": [ 318 | "* La siguiente celda regresará un *dataframe* que relacionará:\n", 319 | "    * Los elementos de ```facturas[\"cliente\"]``` con ```clientes[\"ident\"]```.\n", 320 | "    * Los elementos de ```facturas[\"sucursal\"]``` con ```clientes[\"suc_origen\"]```.\n", 321 | "    * Se ingresará el argumento ```how=\"inner\"```.\n", 322 | "\n", 323 | "* El *dataframe* resultante contendrá exclusivamente aquellos elementos en los que exista coincidencia en ambas relaciones.
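El emparejamiento por dos llaves simultáneas puede esbozarse con un par de *dataframes* hipotéticos (```izquierda``` y ```derecha``` son nombres ilustrativos, no parte del material anterior):

```python
import pandas as pd

# Dataframes hipotéticos con dos columnas llave cada uno.
izquierda = pd.DataFrame({'cliente': [1, 1, 2],
                          'sucursal': ['A', 'B', 'A'],
                          'monto': [100.0, 200.0, 300.0]})
derecha = pd.DataFrame({'ident': [1, 2],
                        'suc_origen': ['A', 'A'],
                        'nombre': ['Ana', 'Luis']})

# Solo se conservan los renglones donde AMBAS parejas de llaves coinciden:
# (1, 'A') y (2, 'A') coinciden; (1, 'B') queda fuera.
coincidencias = izquierda.merge(derecha,
                                left_on=['cliente', 'sucursal'],
                                right_on=['ident', 'suc_origen'])
```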
324 | ] 325 | }, 326 | { 327 | "cell_type": "code", 328 | "execution_count": null, 329 | "metadata": {}, 330 | "outputs": [], 331 | "source": [ 332 | "facturas.merge(clientes, left_on=[\"cliente\", \"sucursal\"],\n", 333 | "               right_on=[\"ident\", \"suc_origen\"], how=\"inner\")" 334 | ] 335 | }, 336 | { 337 | "cell_type": "markdown", 338 | "metadata": {}, 339 | "source": [ 340 | "* La siguiente celda regresará un *dataframe* que relacionará:\n", 341 | "    * Los elementos de ```facturas[\"cliente\"]``` con ```clientes[\"ident\"]```.\n", 342 | "    * Los elementos de ```facturas[\"sucursal\"]``` con ```clientes[\"suc_origen\"]```.\n", 343 | "    * Se ingresará el argumento ```how=\"outer\"```.\n", 344 | "\n", 345 | "* El *dataframe* resultante:\n", 346 | "    * Contendrá a todas las posibles relaciones entre los *dataframes* ```facturas``` y ```clientes```.\n", 347 | "    * Cuando no existan coincidencias entre *dataframes*, la información faltante será completada con valores ```np.NaN```. " 348 | ] 349 | }, 350 | { 351 | "cell_type": "code", 352 | "execution_count": null, 353 | "metadata": { 354 | "scrolled": true 355 | }, 356 | "outputs": [], 357 | "source": [ 358 | "facturas.merge(clientes, left_on=[\"cliente\", \"sucursal\"],\n", 359 | "               right_on=[\"ident\", \"suc_origen\"], how=\"outer\")" 360 | ] 361 | }, 362 | { 363 | "cell_type": "markdown", 364 | "metadata": {}, 365 | "source": [ 366 | "* La siguiente celda regresará un *dataframe* que relacionará:\n", 367 | "    * Los elementos de ```facturas[\"cliente\"]``` con ```clientes[\"ident\"]```.\n", 368 | "    * Los elementos de ```facturas[\"sucursal\"]``` con ```clientes[\"suc_origen\"]```.\n", 369 | "    * Se ingresará el argumento ```how=\"left\"```.\n", 370 | "\n", 371 | "* El *dataframe* resultante:\n", 372 | "    * Contendrá a las posibles relaciones del *dataframe* ```facturas```.\n", 373 | "    * Cuando no existan coincidencias entre *dataframes*, la información faltante será completada con valores ```np.NaN```. 
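Las cuatro variantes del argumento ```how``` pueden compararse con un esbozo mínimo; los *dataframes* ```a``` y ```b``` son hipotéticos y comparten una sola llave para que las diferencias sean visibles:

```python
import pandas as pd

a = pd.DataFrame({'llave': ['x', 'y'], 'izq': [1, 2]})
b = pd.DataFrame({'llave': ['y', 'z'], 'der': [3, 4]})

interna = a.merge(b, on='llave', how='inner')  # solo la llave común 'y'
externa = a.merge(b, on='llave', how='outer')  # 'x', 'y' y 'z'; lo faltante queda como NaN
izq = a.merge(b, on='llave', how='left')       # todas las llaves de a: 'x' y 'y'
der = a.merge(b, on='llave', how='right')      # todas las llaves de b: 'y' y 'z'
```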
" 374 | ] 375 | }, 376 | { 377 | "cell_type": "code", 378 | "execution_count": null, 379 | "metadata": { 380 | "scrolled": false 381 | }, 382 | "outputs": [], 383 | "source": [ 384 | "facturas.merge(clientes, left_on=[\"cliente\", \"sucursal\"],\n", 385 | " right_on=[\"ident\", \"suc_origen\"], how=\"left\")" 386 | ] 387 | }, 388 | { 389 | "cell_type": "markdown", 390 | "metadata": {}, 391 | "source": [ 392 | "* La siguente celda regresará un *dataframe* que relacionará:\n", 393 | " * Los elementos de ```facturas[\"cliente\"]``` con ```clientes[\"ident\"]```.\n", 394 | " * Los elementos de ```facturas[\"sucursal\"]``` con ```clientes[\"suc_origen\"]```.\n", 395 | " * Se ingresará el argumento ```how=\"right\"```.\n", 396 | "\n", 397 | "* El *dataframe* resultante:\n", 398 | " * Contendrá a las posibles relaciones del *dataframes* ```clientes```.\n", 399 | " * Cuando no existan coincidencias entre *dataframes*, la información faltante será completada con valores ```np.NaN```. " 400 | ] 401 | }, 402 | { 403 | "cell_type": "code", 404 | "execution_count": null, 405 | "metadata": { 406 | "scrolled": true 407 | }, 408 | "outputs": [], 409 | "source": [ 410 | "facturas.merge(clientes, left_on=[\"cliente\", \"sucursal\"],\n", 411 | " right_on=[\"ident\", \"suc_origen\"], how=\"right\")" 412 | ] 413 | }, 414 | { 415 | "cell_type": "markdown", 416 | "metadata": {}, 417 | "source": [ 418 | "

\"Licencia
Esta obra está bajo una Licencia Creative Commons Atribución 4.0 Internacional.

\n", 419 | "

© José Luis Chiquete Valdivieso. 2023.

" 420 | ] 421 | } 422 | ], 423 | "metadata": { 424 | "kernelspec": { 425 | "display_name": "Python 3 (ipykernel)", 426 | "language": "python", 427 | "name": "python3" 428 | }, 429 | "language_info": { 430 | "codemirror_mode": { 431 | "name": "ipython", 432 | "version": 3 433 | }, 434 | "file_extension": ".py", 435 | "mimetype": "text/x-python", 436 | "name": "python", 437 | "nbconvert_exporter": "python", 438 | "pygments_lexer": "ipython3", 439 | "version": "3.9.2" 440 | } 441 | }, 442 | "nbformat": 4, 443 | "nbformat_minor": 2 444 | } 445 | -------------------------------------------------------------------------------- /15_metodo_filter.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "[![img/pythonista.png](img/pythonista.png)](https://www.pythonista.io)" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "# El método ```pd.DataFrame.filter()```." 15 | ] 16 | }, 17 | { 18 | "cell_type": "code", 19 | "execution_count": null, 20 | "metadata": {}, 21 | "outputs": [], 22 | "source": [ 23 | "import pandas as pd\n", 24 | "from datetime import datetime" 25 | ] 26 | }, 27 | { 28 | "cell_type": "markdown", 29 | "metadata": {}, 30 | "source": [ 31 | "El método ```pd.DataFrame.filter()``` permite buscar coincidencias mediante ciertos argumentos de búsqueda sobre los índices de un *dataframe*. El resultado es un *dataframe* nuevo con los elementos coincidentes de la búsqueda.\n", 32 | "\n", 33 | "```\n", 34 | "df.filter(, axis=)\n", 35 | "```\n", 36 | "\n", 37 | "Donde:\n", 38 | "\n", 39 | "* `````` es un argumento que define los cirterios de búsqueda. 
Los parámetros disponibles para los argumentos de este método son:\n", 40 | "    * ```items``` en el que se definen los encabezados a buscar dentro de un objeto iterable.\n", 41 | "    * ```like``` en el que se define una cadena de caracteres que debe coincidir con el identificador de algún índice.\n", 42 | "    * ```regex``` define un patrón mediante una expresión regular.\n", 43 | "* ```<eje>``` puede ser:\n", 44 | "    * ```0``` para realizar la búsqueda en los índices de los renglones.\n", 45 | "    * ```1``` para realizar la búsqueda en los índices de las columnas. Este es el valor por defecto. \n", 46 | "\n", 47 | "\n", 48 | "https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.filter.html" 49 | ] 50 | }, 51 | { 52 | "cell_type": "markdown", 53 | "metadata": {}, 54 | "source": [ 55 | "**Ejemplos:**" 56 | ] 57 | }, 58 | { 59 | "cell_type": "markdown", 60 | "metadata": {}, 61 | "source": [ 62 | "* La siguiente celda definirá al *dataframe* ```facturas``` con los identificadores de columnas:\n", 63 | "    * ```'folio'```.\n", 64 | "    * ```'sucursal'```.\n", 65 | "    * ```'monto'```.\n", 66 | "    * ```'fecha'```.\n", 67 | "    * ```'id_cliente'```." 
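Los tres criterios de búsqueda pueden esbozarse de forma autocontenida con un *dataframe* hipotético (```df``` y sus columnas son nombres ilustrativos):

```python
import pandas as pd

# Dataframe hipotético para ilustrar los tres criterios de filter().
df = pd.DataFrame({'folio': [1, 2],
                   'monto': [10.5, 20.0],
                   'sucursal': ['A', 'B']})

por_items = df.filter(items=['folio', 'monto'])  # coincidencia exacta de identificadores
por_like = df.filter(like='mon')                 # subcadena dentro del identificador
por_regex = df.filter(regex=r'sal$')             # patrón de expresión regular
```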
68 | ] 69 | }, 70 | { 71 | "cell_type": "code", 72 | "execution_count": null, 73 | "metadata": {}, 74 | "outputs": [], 75 | "source": [ 76 | "facturas = pd.DataFrame({'folio':(15234, \n", 77 | " 15235, \n", 78 | " 15236, \n", 79 | " 15237, \n", 80 | " 15238, \n", 81 | " 15239, \n", 82 | " 15240,\n", 83 | " 15241,\n", 84 | " 15242),\n", 85 | " 'sucursal':('CDMX01',\n", 86 | " 'MTY01',\n", 87 | " 'CDMX02',\n", 88 | " 'CDMX02',\n", 89 | " 'MTY01',\n", 90 | " 'GDL01',\n", 91 | " 'CDMX02',\n", 92 | " 'MTY01',\n", 93 | " 'GDL01'),\n", 94 | " 'monto':(1420.00,\n", 95 | " 1532.00,\n", 96 | " 890.00,\n", 97 | " 1300.00,\n", 98 | " 3121.47,\n", 99 | " 1100.5,\n", 100 | " 12230,\n", 101 | " 230.85,\n", 102 | " 1569),\n", 103 | " 'fecha':(datetime(2019,3,11,17,24),\n", 104 | " datetime(2019,3,24,14,46),\n", 105 | " datetime(2019,3,25,17,58),\n", 106 | " datetime(2019,3,27,13,11),\n", 107 | " datetime(2019,3,31,10,25),\n", 108 | " datetime(2019,4,1,18,32),\n", 109 | " datetime(2019,4,3,11,43),\n", 110 | " datetime(2019,4,4,16,55),\n", 111 | " datetime(2019,4,5,12,59)),\n", 112 | " 'id_cliente':(19234,\n", 113 | " 19232,\n", 114 | " 19235,\n", 115 | " 19233,\n", 116 | " 19236,\n", 117 | " 19237,\n", 118 | " 19232,\n", 119 | " 19233,\n", 120 | " 19232)\n", 121 | " })" 122 | ] 123 | }, 124 | { 125 | "cell_type": "code", 126 | "execution_count": null, 127 | "metadata": { 128 | "scrolled": true 129 | }, 130 | "outputs": [], 131 | "source": [ 132 | "facturas" 133 | ] 134 | }, 135 | { 136 | "cell_type": "markdown", 137 | "metadata": {}, 138 | "source": [ 139 | "* La siguiente celda regresará un *dataframe* cuyos identificadores de columna sean exactamente ```'id_cliente'``` o ```'sucursal'```." 
140 | ] 141 | }, 142 | { 143 | "cell_type": "code", 144 | "execution_count": null, 145 | "metadata": {}, 146 | "outputs": [], 147 | "source": [ 148 | "facturas.filter(items=['id_cliente','sucursal'])" 149 | ] 150 | }, 151 | { 152 | "cell_type": "markdown", 153 | "metadata": {}, 154 | "source": [ 155 | "* La siguiente celda regresará un *dataframe* cuyos identificadores de columna incluyan la cadena ```'mon'```." 156 | ] 157 | }, 158 | { 159 | "cell_type": "code", 160 | "execution_count": null, 161 | "metadata": {}, 162 | "outputs": [], 163 | "source": [ 164 | "facturas.filter(like=\"mon\")" 165 | ] 166 | }, 167 | { 168 | "cell_type": "markdown", 169 | "metadata": {}, 170 | "source": [ 171 | "* La siguiente celda regresará un *dataframe* cuyos identificadores de columna incluyan la cadena ```'o'```." 172 | ] 173 | }, 174 | { 175 | "cell_type": "code", 176 | "execution_count": null, 177 | "metadata": { 178 | "scrolled": true 179 | }, 180 | "outputs": [], 181 | "source": [ 182 | "facturas.filter(like=\"o\")" 183 | ] 184 | }, 185 | { 186 | "cell_type": "markdown", 187 | "metadata": {}, 188 | "source": [ 189 | "* La siguiente celda regresará un *dataframe* cuyos identificadores de índice incluyan la cadena ```'1'```." 190 | ] 191 | }, 192 | { 193 | "cell_type": "code", 194 | "execution_count": null, 195 | "metadata": { 196 | "scrolled": true 197 | }, 198 | "outputs": [], 199 | "source": [ 200 | "facturas.filter(like=\"1\", axis=0)" 201 | ] 202 | }, 203 | { 204 | "cell_type": "markdown", 205 | "metadata": {}, 206 | "source": [ 207 | "* La siguiente celda regresará un *dataframe* cuyos identificadores de columna cumplan con la expresión regular ```r\"sal$\"```. 
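El filtrado sobre los identificadores de los renglones (```axis=0```) puede esbozarse con un *dataframe* hipotético; ```like``` compara contra la representación en cadena de cada etiqueta del índice, por lo que también funciona con un índice numérico:

```python
import pandas as pd

df = pd.DataFrame({'valor': range(12)})  # índice por defecto: 0 a 11

# Se conservan los renglones cuya etiqueta, como cadena, contiene '1'.
con_uno = df.filter(like='1', axis=0)
```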
208 | ] 209 | }, 210 | { 211 | "cell_type": "code", 212 | "execution_count": null, 213 | "metadata": { 214 | "scrolled": true 215 | }, 216 | "outputs": [], 217 | "source": [ 218 | "facturas.filter(regex=r\"sal$\")" 219 | ] 220 | }, 221 | { 222 | "cell_type": "markdown", 223 | "metadata": {}, 224 | "source": [ 225 | "## Ejemplo de ```pd.DataFrame.filter()``` y ```pd.DataFrame.merge()```." 226 | ] 227 | }, 228 | { 229 | "cell_type": "markdown", 230 | "metadata": {}, 231 | "source": [ 232 | "* La siguiente celda creará al *dataframe* ```clientes``` con la estructura de columnas: \n", 233 | "    * ```'id'```.\n", 234 | "    * ```'nombre'```.\n", 235 | "    * ```'apellido'```.\n", 236 | "    * ```'suc_origen'```." 237 | ] 238 | }, 239 | { 240 | "cell_type": "code", 241 | "execution_count": null, 242 | "metadata": {}, 243 | "outputs": [], 244 | "source": [ 245 | "clientes = pd.DataFrame({'id':(19232, \n", 246 | "                   19233, \n", 247 | "                   19234, \n", 248 | "                   19235, \n", 249 | "                   19236),\n", 250 | "                   'nombre':('Adriana',\n", 251 | "                   'Marcos',\n", 252 | "                   'Rubén',\n", 253 | "                   'Samuel',\n", 254 | "                   'Martha'),\n", 255 | "                   'apellido':('Sánchez',\n", 256 | "                   'García',\n", 257 | "                   'Rincón',\n", 258 | "                   'Oliva',\n", 259 | "                   'Martínez'),\n", 260 | "                   'suc_origen':('CDMX01',\n", 261 | "                   'CDMX02',\n", 262 | "                   'CDMX02',\n", 263 | "                   'CDMX01',\n", 264 | "                   'CDMX03')\n", 265 | "                  })" 266 | ] 267 | }, 268 | { 269 | "cell_type": "code", 270 | "execution_count": null, 271 | "metadata": {}, 272 | "outputs": [], 273 | "source": [ 274 | "clientes" 275 | ] 276 | }, 277 | { 278 | "cell_type": "markdown", 279 | "metadata": {}, 280 | "source": [ 281 | "* La siguiente celda combinará los métodos ```filter()``` y ```merge()``` para obtener un *dataframe* con la siguiente estructura de columnas:\n", 282 | "    * Se utilizará el método ```clientes.merge()``` para identificar coincidencias entre los elementos de ```clientes['id']``` y ```facturas['id_cliente']```.\n", 283 | "    * Al *dataframe* resultante se le aplicará el método 
```filter()``` para regresar únicamente las columnas: \n", 284 | "    * ```'folio'```. \n", 285 | "    * ```'nombre'```. \n", 286 | "    * ```'apellido'```.\n", 287 | "    * ```'monto'```." 288 | ] 289 | }, 290 | { 291 | "cell_type": "code", 292 | "execution_count": null, 293 | "metadata": { 294 | "scrolled": true 295 | }, 296 | "outputs": [], 297 | "source": [ 298 | "clientes.merge(facturas,\n", 299 | "               left_on='id',\n", 300 | "               right_on='id_cliente')" 301 | ] 302 | }, 303 | { 304 | "cell_type": "code", 305 | "execution_count": null, 306 | "metadata": {}, 307 | "outputs": [], 308 | "source": [ 309 | "clientes.filter(items=['folio', \n", 310 | "                       'nombre', \n", 311 | "                       'apellido', \n", 312 | "                       'monto'])" 313 | ] 314 | }, 315 | { 316 | "cell_type": "code", 317 | "execution_count": null, 318 | "metadata": { 319 | "scrolled": true 320 | }, 321 | "outputs": [], 322 | "source": [ 323 | "clientes.merge(facturas,\n", 324 | "               left_on='id',\n", 325 | "               right_on='id_cliente').filter(items=['folio',\n", 326 | "                                                    'nombre', \n", 327 | "                                                    'apellido',\n", 328 | "                                                    'monto'])" 329 | ] 330 | }, 331 | { 332 | "cell_type": "markdown", 333 | "metadata": {}, 334 | "source": [ 335 | "

\"Licencia
Esta obra está bajo una Licencia Creative Commons Atribución 4.0 Internacional.

\n", 336 | "

© José Luis Chiquete Valdivieso. 2023.

" 337 | ] 338 | } 339 | ], 340 | "metadata": { 341 | "kernelspec": { 342 | "display_name": "Python 3 (ipykernel)", 343 | "language": "python", 344 | "name": "python3" 345 | }, 346 | "language_info": { 347 | "codemirror_mode": { 348 | "name": "ipython", 349 | "version": 3 350 | }, 351 | "file_extension": ".py", 352 | "mimetype": "text/x-python", 353 | "name": "python", 354 | "nbconvert_exporter": "python", 355 | "pygments_lexer": "ipython3", 356 | "version": "3.9.2" 357 | } 358 | }, 359 | "nbformat": 4, 360 | "nbformat_minor": 2 361 | } 362 | -------------------------------------------------------------------------------- /16_metodos_apply_y_transform.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "[![img/pythonista.png](img/pythonista.png)](https://www.pythonista.io)" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "# Los métodos ```apply()``` y ```transform()```." 15 | ] 16 | }, 17 | { 18 | "cell_type": "code", 19 | "execution_count": null, 20 | "metadata": {}, 21 | "outputs": [], 22 | "source": [ 23 | "import pandas as pd\n", 24 | "import numpy as np" 25 | ] 26 | }, 27 | { 28 | "cell_type": "markdown", 29 | "metadata": {}, 30 | "source": [ 31 | "## *Dataframe* ilustrativo." 32 | ] 33 | }, 34 | { 35 | "cell_type": "markdown", 36 | "metadata": {}, 37 | "source": [ 38 | "El *dataframe* ```poblacion``` representa un censo poblacional de especies animales en diversas regiones geográficas.\n", 39 | "\n", 40 | "Las poblaciones de animales censadas representan los índices del *dataframe* y son:\n", 41 | "* ```'lobo'```.\n", 42 | "* ```'jaguar'```.\n", 43 | "* ```'coyote'```.\n", 44 | "* ```'halcón'```. 
\n", 45 | "* ```'lechuza'```.\n", 46 | "* ```'aguila'```.\n", 47 | "\n", 48 | "Las regiones geográficas representan la columnas del *dataframe* y son:\n", 49 | "\n", 50 | "* ```Norte_1```.\n", 51 | "* ```Norte_2```.\n", 52 | "* ```Sur_1```.\n", 53 | "* ```Sur_2```." 54 | ] 55 | }, 56 | { 57 | "cell_type": "code", 58 | "execution_count": null, 59 | "metadata": {}, 60 | "outputs": [], 61 | "source": [ 62 | "indice = ('lobo', 'jaguar', 'coyote', 'halcón', 'lechuza', 'aguila')\n", 63 | "poblacion = pd.DataFrame({'Norte_1':(25,\n", 64 | " 45,\n", 65 | " 23,\n", 66 | " 67,\n", 67 | " 14,\n", 68 | " 12),\n", 69 | " 'Norte_2':(31,\n", 70 | " 0,\n", 71 | " 23,\n", 72 | " 3,\n", 73 | " 34,\n", 74 | " 2),\n", 75 | " 'Sur_1':(0,\n", 76 | " 4,\n", 77 | " 3,\n", 78 | " 1,\n", 79 | " 1,\n", 80 | " 2),\n", 81 | " 'Sur_2':(2,\n", 82 | " 0,\n", 83 | " 12,\n", 84 | " 23,\n", 85 | " 11,\n", 86 | " 2)}, index=indice)" 87 | ] 88 | }, 89 | { 90 | "cell_type": "code", 91 | "execution_count": null, 92 | "metadata": { 93 | "scrolled": false 94 | }, 95 | "outputs": [], 96 | "source": [ 97 | "poblacion" 98 | ] 99 | }, 100 | { 101 | "cell_type": "markdown", 102 | "metadata": {}, 103 | "source": [ 104 | "## El método ```apply()```." 105 | ] 106 | }, 107 | { 108 | "cell_type": "markdown", 109 | "metadata": {}, 110 | "source": [ 111 | "El método ```apply()``` permite aplicar una función a una serie o dataframe de *Pandas*.\n", 112 | "\n", 113 | "```\n", 114 | ".apply(, axis=)\n", 115 | "```\n", 116 | "\n", 117 | "Donde:\n", 118 | "\n", 119 | "* `````` es una serie o un *dataframe* de *Pandas*.\n", 120 | "* `````` es una función de *Python* o de *Numpy*.\n", 121 | "* `````` puede ser:\n", 122 | " * ```0``` para aplicar la función a los renglones. 
Este es el valor por defecto.\n", 123 | "    * ```1``` para aplicar la función a las columnas.\n", 124 | "\n", 125 | "Este método realiza operaciones de *broadcast* dentro del objeto.\n", 126 | "\n", 127 | "Para fines prácticos se explorará el método ```pd.DataFrame.apply()``` cuya documentación puede ser consultada en:\n", 128 | "\n", 129 | "https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html" 130 | ] 131 | }, 132 | { 133 | "cell_type": "markdown", 134 | "metadata": {}, 135 | "source": [ 136 | "### Funciones aceptadas.\n", 137 | "\n", 138 | "* El método ```apply()``` permite ingresar como argumento el nombre de una función o una función *lambda* de *Python*." 139 | ] 140 | }, 141 | { 142 | "cell_type": "markdown", 143 | "metadata": {}, 144 | "source": [ 145 | "**Ejemplos:**" 146 | ] 147 | }, 148 | { 149 | "cell_type": "markdown", 150 | "metadata": {}, 151 | "source": [ 152 | "* La siguiente celda definirá a la función ```suma_dos()```." 153 | ] 154 | }, 155 | { 156 | "cell_type": "code", 157 | "execution_count": null, 158 | "metadata": {}, 159 | "outputs": [], 160 | "source": [ 161 | "def suma_dos(x:int) -> int:\n", 162 | "    '''Función que regresa el resultado de sumar 2 unidades a un entero.'''\n", 163 | "    return x + 2" 164 | ] 165 | }, 166 | { 167 | "cell_type": "markdown", 168 | "metadata": {}, 169 | "source": [ 170 | "* La siguiente celda regresará un *dataframe* que contiene el resultado de ejecutar la función ```suma_dos()``` usando a cada elemento del *dataframe* ```poblacion``` como argumento." 
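Como referencia, el uso elemental de ```apply()``` con una función *lambda* puede esbozarse de forma autocontenida; el *dataframe* ```df``` y sus columnas son hipotéticos:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

# Con una lambda vectorizable, la operación se aplica a cada elemento.
mas_dos = df.apply(lambda x: x + 2)

# También existe pd.Series.apply(), que opera elemento por elemento.
serie = df['a'].apply(lambda x: x * 10)
```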
171 | ] 172 | }, 173 | { 174 | "cell_type": "code", 175 | "execution_count": null, 176 | "metadata": {}, 177 | "outputs": [], 178 | "source": [ 179 | "poblacion" 180 | ] 181 | }, 182 | { 183 | "cell_type": "code", 184 | "execution_count": null, 185 | "metadata": {}, 186 | "outputs": [], 187 | "source": [ 188 | "poblacion.apply(suma_dos)" 189 | ] 190 | }, 191 | { 192 | "cell_type": "markdown", 193 | "metadata": {}, 194 | "source": [ 195 | "* La siguiente celda regresará un *dataframe* que contiene el resultado de ejecutar la función definida como ```lambda x: x + 2``` usando a cada elemento del *dataframe* ```poblacion``` como argumento." 196 | ] 197 | }, 198 | { 199 | "cell_type": "code", 200 | "execution_count": null, 201 | "metadata": {}, 202 | "outputs": [], 203 | "source": [ 204 | "poblacion.apply(lambda x: x + 2)" 205 | ] 206 | }, 207 | { 208 | "cell_type": "markdown", 209 | "metadata": {}, 210 | "source": [ 211 | "* La siguiente celda regresará una serie que corresponde a ejecutar la función ```suma_dos()``` a cada elemento de la serie que conforma la columna ```poblacion['Norte_2']```. " 212 | ] 213 | }, 214 | { 215 | "cell_type": "code", 216 | "execution_count": null, 217 | "metadata": { 218 | "scrolled": true 219 | }, 220 | "outputs": [], 221 | "source": [ 222 | "poblacion['Norte_2'].apply(suma_dos)" 223 | ] 224 | }, 225 | { 226 | "cell_type": "markdown", 227 | "metadata": {}, 228 | "source": [ 229 | "### *Broadcasting*." 230 | ] 231 | }, 232 | { 233 | "cell_type": "markdown", 234 | "metadata": {}, 235 | "source": [ 236 | "* La siguiente celda utilizará las propiedades de *broadcasting* para aplicar una función que suma diversos elementos a cada renglón del *dataframe* ```poblacion``` en el eje ```0```. \n", 237 | "* En vista de que el objeto ```[1, 2, 3, 4, 5, 6]``` tiene ```6``` elementos y el *dataframe* ```poblacion``` es de forma ```(6, 4)```, es posible realizar el *broadcasting*." 
238 | ] 239 | }, 240 | { 241 | "cell_type": "code", 242 | "execution_count": null, 243 | "metadata": {}, 244 | "outputs": [], 245 | "source": [ 246 | "poblacion" 247 | ] 248 | }, 249 | { 250 | "cell_type": "code", 251 | "execution_count": null, 252 | "metadata": { 253 | "scrolled": false 254 | }, 255 | "outputs": [], 256 | "source": [ 257 | "poblacion.apply(lambda x: x + [1, 2, 3, 4, 5, 6])" 258 | ] 259 | }, 260 | { 261 | "cell_type": "markdown", 262 | "metadata": {}, 263 | "source": [ 264 | "* La siguiente celda utilizará las propiedades de *broadcasting* para aplicar una función que suma diversos elementos a cada renglón del *dataframe* ```poblacion``` en el eje ```1```. \n", 265 | "* En vista de que el objeto ```[1, 2, 3, 4]``` tiene ```4``` elementos y el *dataframe* ```poblacion``` es de forma ```(6, 4)```, es posible realizar el *broadcasting*." 266 | ] 267 | }, 268 | { 269 | "cell_type": "code", 270 | "execution_count": null, 271 | "metadata": {}, 272 | "outputs": [], 273 | "source": [ 274 | "poblacion" 275 | ] 276 | }, 277 | { 278 | "cell_type": "code", 279 | "execution_count": null, 280 | "metadata": {}, 281 | "outputs": [], 282 | "source": [ 283 | "poblacion.apply(lambda x: x + [1, 2, 3, 4], axis=1)" 284 | ] 285 | }, 286 | { 287 | "cell_type": "markdown", 288 | "metadata": {}, 289 | "source": [ 290 | "* La siguiente celda aplicará la función con *broadcasting* sobre el eje ```0``` con un objeto de tamaño inadecuado. Se desencadenará una excepción del tipo ```ValueError```. 
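Las reglas de forma del *broadcasting* con ```apply()``` pueden esbozarse con un *dataframe* hipotético de forma ```(3, 2)```; la lista sumada debe medir lo mismo que el eje sobre el que opera la función:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})  # forma (3, 2)

# axis=0: la función recibe cada columna (3 elementos); la lista debe medir 3.
por_columna = df.apply(lambda x: x + [10, 20, 30])

# axis=1: la función recibe cada renglón (2 elementos); la lista debe medir 2.
por_renglon = df.apply(lambda x: x + [100, 200], axis=1)
```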
291 | ] 292 | }, 293 | { 294 | "cell_type": "code", 295 | "execution_count": null, 296 | "metadata": { 297 | "scrolled": false 298 | }, 299 | "outputs": [], 300 | "source": [ 301 | "poblacion.apply(lambda x: x + [1, 2, 3, 4])" 302 | ] 303 | }, 304 | { 305 | "cell_type": "markdown", 306 | "metadata": {}, 307 | "source": [ 308 | "### Aplicación de funciones de *Numpy*.\n", 309 | "\n", 310 | "*Numpy* cuenta con funciones de agregación capaces de realizar operaciones con la totalidad de los elementos de un arreglo, en vez de con cada uno de ellos. \n", 311 | "\n", 312 | "El método ```apply()``` es compatible con este tipo de funciones." 313 | ] 314 | }, 315 | { 316 | "cell_type": "markdown", 317 | "metadata": {}, 318 | "source": [ 319 | "**Ejemplo:**" 320 | ] 321 | }, 322 | { 323 | "cell_type": "markdown", 324 | "metadata": {}, 325 | "source": [ 326 | "* La siguiente celda realizará una sumatoria de cada elemento en el eje ```0``` (renglones) del *dataframe* ```poblacion```, usando la función ```np.sum()``` y regresará una serie con los resultados." 327 | ] 328 | }, 329 | { 330 | "cell_type": "code", 331 | "execution_count": null, 332 | "metadata": { 333 | "scrolled": true 334 | }, 335 | "outputs": [], 336 | "source": [ 337 | "poblacion.apply(np.sum)" 338 | ] 339 | }, 340 | { 341 | "cell_type": "markdown", 342 | "metadata": {}, 343 | "source": [ 344 | "* La siguiente celda realizará una sumatoria de cada elemento en el eje ```1``` (columnas) del *dataframe* ```poblacion```, usando la función ```np.sum()``` y regresará una serie con los resultados." 
345 | ] 346 | }, 347 | { 348 | "cell_type": "code", 349 | "execution_count": null, 350 | "metadata": { 351 | "scrolled": true 352 | }, 353 | "outputs": [], 354 | "source": [ 355 | "poblacion.apply(np.sum, axis=1)" 356 | ] 357 | }, 358 | { 359 | "cell_type": "markdown", 360 | "metadata": {}, 361 | "source": [ 362 | "* La siguiente celda calculará el promedio de los elementos en el eje ```0``` (renglones) del *dataframe* ```poblacion```, usando la función ```np.mean()``` y regresará una serie con los resultados." 363 | ] 364 | }, 365 | { 366 | "cell_type": "code", 367 | "execution_count": null, 368 | "metadata": { 369 | "scrolled": true 370 | }, 371 | "outputs": [], 372 | "source": [ 373 | "poblacion.apply(np.mean)" 374 | ] 375 | }, 376 | { 377 | "cell_type": "markdown", 378 | "metadata": {}, 379 | "source": [ 380 | "* La siguiente celda calculará el promedio de los elementos en el eje ```1``` (columnas) del *dataframe* ```poblacion```, usando la función ```np.mean()``` y regresará una serie con los resultados." 381 | ] 382 | }, 383 | { 384 | "cell_type": "code", 385 | "execution_count": null, 386 | "metadata": { 387 | "scrolled": true 388 | }, 389 | "outputs": [], 390 | "source": [ 391 | "poblacion.apply(np.mean, axis=1)" 392 | ] 393 | }, 394 | { 395 | "cell_type": "markdown", 396 | "metadata": {}, 397 | "source": [ 398 | "### Optimización en función de contexto de los datos." 399 | ] 400 | }, 401 | { 402 | "cell_type": "markdown", 403 | "metadata": {}, 404 | "source": [ 405 | "El método ```pd.DataFrame.apply()``` permite identificar ciertos datos que podrían causar errores o excepciones y es capaz de utilizar funciones de *Numpy* análogas que den un resultado en vez de una excepción." 
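Las agregaciones con ```apply()``` pueden esbozarse de forma autocontenida; el *dataframe* ```df``` es hipotético:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1.0, 2.0], 'b': [3.0, 5.0]})

# Una función de agregación colapsa cada columna (eje 0) o renglón (eje 1)
# a un solo valor, por lo que apply() regresa una serie.
sumas = df.apply(np.sum)               # una suma por columna
promedios = df.apply(np.mean, axis=1)  # un promedio por renglón
```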
406 | ] 407 | }, 408 | { 409 | "cell_type": "markdown", 410 | "metadata": {}, 411 | "source": [ 412 | "**Ejemplo:**" 413 | ] 414 | }, 415 | { 416 | "cell_type": "markdown", 417 | "metadata": {}, 418 | "source": [ 419 | "* La función ```np.mean()``` regresa un valor ```np.NaN``` cuando encuentra un valor ```np.NaN``` en el arreglo que se le ingresa como argumento." 420 | ] 421 | }, 422 | { 423 | "cell_type": "code", 424 | "execution_count": null, 425 | "metadata": {}, 426 | "outputs": [], 427 | "source": [ 428 | "arreglo = np.array([25, np.NaN, 23, 67, 14, 12])" 429 | ] 430 | }, 431 | { 432 | "cell_type": "code", 433 | "execution_count": null, 434 | "metadata": {}, 435 | "outputs": [], 436 | "source": [ 437 | "arreglo" 438 | ] 439 | }, 440 | { 441 | "cell_type": "code", 442 | "execution_count": null, 443 | "metadata": {}, 444 | "outputs": [], 445 | "source": [ 446 | "np.mean(arreglo)" 447 | ] 448 | }, 449 | { 450 | "cell_type": "markdown", 451 | "metadata": {}, 452 | "source": [ 453 | "* La función ```np.nanmean()``` descarta los valores ```np.NaN``` que se encuentren en el arreglo que se le ingresa como argumento y calcula el promedio con el resto de los elementos." 454 | ] 455 | }, 456 | { 457 | "cell_type": "code", 458 | "execution_count": null, 459 | "metadata": {}, 460 | "outputs": [], 461 | "source": [ 462 | "np.nanmean(arreglo)" 463 | ] 464 | }, 465 | { 466 | "cell_type": "markdown", 467 | "metadata": {}, 468 | "source": [ 469 | "* La siguiente celda creará al *dataframe* ```poblacion_nan``` a partir del *dataframe* ```poblacion```, sustituyendo el valor de ```poblacion_nan['Norte_1']['jaguar']``` por ```np.NaN```. 
470 | ] 471 | }, 472 | { 473 | "cell_type": "code", 474 | "execution_count": null, 475 | "metadata": { 476 | "scrolled": true 477 | }, 478 | "outputs": [], 479 | "source": [ 480 | "poblacion_nan = poblacion.copy()\n", 481 | "poblacion_nan['Norte_1']['jaguar'] = np.NaN\n", 482 | "poblacion_nan" 483 | ] 484 | }, 485 | { 486 | "cell_type": "markdown", 487 | "metadata": {}, 488 | "source": [ 489 | "* La siguiente celda usará la función ```np.mean()``` como argumento del método ```poblacion_nan.apply()```. El comportamiento es idéntico a usar ```np.nanmean()```.\n", 490 | "* El resultado para la columna ```Norte_1``` es ```28.2``` en vez de ```np.NaN```." 491 | ] 492 | }, 493 | { 494 | "cell_type": "code", 495 | "execution_count": null, 496 | "metadata": {}, 497 | "outputs": [], 498 | "source": [ 499 | "poblacion_nan.apply(np.mean)" 500 | ] 501 | }, 502 | { 503 | "cell_type": "code", 504 | "execution_count": null, 505 | "metadata": {}, 506 | "outputs": [], 507 | "source": [ 508 | "poblacion_nan.apply(np.nanmean)" 509 | ] 510 | }, 511 | { 512 | "cell_type": "markdown", 513 | "metadata": {}, 514 | "source": [ 515 | "## El método ```pd.DataFrame.transform()```.\n", 516 | "\n", 517 | "Este método permite aplicar una o más funciones a los elementos de un *dataframe* y crear nuevos niveles de columnas con los resultados.\n", 518 | "\n", 519 | "```\n", 520 | "df.transform([<función 1>, <función 2>, ...], axis=<eje>)\n", 521 | "```\n", 522 | "\n", 523 | "Donde:\n", 524 | "\n", 525 | "* ```<función n>``` es una función de *Python* o de *Numpy*.\n", 526 | "* ```<eje>``` puede ser:\n", 527 | "    * ```0``` para aplicar la función a los renglones. 
Este es el valor por defecto.\n", 528 | "    * ```1``` para aplicar la función a las columnas.\n", 529 | "\n", 530 | "**NOTA:** Este método no permite realizar operaciones de agregación.\n", 531 | "\n", 532 | "\n", 533 | "La documentación del método ```pd.DataFrame.transform()``` puede ser consultada en:\n", 534 | "\n", 535 | "https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.transform.html" 536 | ] 537 | }, 538 | { 539 | "cell_type": "markdown", 540 | "metadata": {}, 541 | "source": [ 542 | "**Ejemplo:**" 543 | ] 544 | }, 545 | { 546 | "cell_type": "markdown", 547 | "metadata": {}, 548 | "source": [ 549 | "* Se utilizará el *dataframe* ```poblacion``` definido previamente." 550 | ] 551 | }, 552 | { 553 | "cell_type": "code", 554 | "execution_count": null, 555 | "metadata": { 556 | "scrolled": true 557 | }, 558 | "outputs": [], 559 | "source": [ 560 | "poblacion" 561 | ] 562 | }, 563 | { 564 | "cell_type": "markdown", 565 | "metadata": {}, 566 | "source": [ 567 | "* La siguiente celda aplicará las funciones al *dataframe* ```poblacion```:\n", 568 | "    * ```lambda x: x + [1, 2, 3, 4, 5, 6]```.\n", 569 | "    * ```np.log```.\n", 570 | "    * ```np.sin```.\n", 571 | "* El *dataframe* resultante tendrá un subnivel debajo de cada columna de ```poblacion```, en el que se creará una columna con el resultado de aplicar cada función tomando a cada elemento como argumento." 572 | ] 573 | }, 574 | { 575 | "cell_type": "code", 576 | "execution_count": null, 577 | "metadata": { 578 | "scrolled": true 579 | }, 580 | "outputs": [], 581 | "source": [ 582 | "poblacion.transform([lambda x: x + [1, 2, 3, 4, 5, 6],\n", 583 | "                     np.log,\n", 584 | "                     np.sin])" 585 | ] 586 | }, 587 | { 588 | "cell_type": "markdown", 589 | "metadata": {}, 590 | "source": [ 591 | "* El método ```poblacion.transform()``` no es compatible con la función ```np.mean()```, por lo que se desencadenará una excepción ```ValueError```. 
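El comportamiento de ```transform()``` con una lista de funciones puede esbozarse de forma autocontenida; el *dataframe* ```df``` es hipotético y las columnas del resultado forman un ```MultiIndex``` cuyo segundo nivel toma el nombre de cada función:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1.0, 4.0], 'b': [9.0, 16.0]})

# Una lista de funciones produce un subnivel de columnas por cada función:
# ('a', 'sqrt'), ('a', '<lambda>'), ('b', 'sqrt'), ('b', '<lambda>').
resultado = df.transform([np.sqrt, lambda x: x + 1])
```

El acceso a cada subnivel se realiza con tuplas, por ejemplo ```resultado[('a', 'sqrt')]```.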
592 | ] 593 | }, 594 | { 595 | "cell_type": "code", 596 | "execution_count": null, 597 | "metadata": { 598 | "scrolled": false 599 | }, 600 | "outputs": [], 601 | "source": [ 602 | "poblacion.transform(np.mean)" 603 | ] 604 | }, 605 | { 606 | "cell_type": "markdown", 607 | "metadata": {}, 608 | "source": [ 609 | "* Sin embargo, es posible utilizar una función de agregación dentro de otra función que no realice una agregación por sí misma." 610 | ] 611 | }, 612 | { 613 | "cell_type": "code", 614 | "execution_count": null, 615 | "metadata": {}, 616 | "outputs": [], 617 | "source": [ 618 | "poblacion.transform(lambda x: x - x.mean())" 619 | ] 620 | }, 621 | { 622 | "cell_type": "markdown", 623 | "metadata": {}, 624 | "source": [ 625 | "

\"Licencia
Esta obra está bajo una Licencia Creative Commons Atribución 4.0 Internacional.

\n", 626 | "

© José Luis Chiquete Valdivieso. 2023.

" 627 | ] 628 | } 629 | ], 630 | "metadata": { 631 | "kernelspec": { 632 | "display_name": "Python 3 (ipykernel)", 633 | "language": "python", 634 | "name": "python3" 635 | }, 636 | "language_info": { 637 | "codemirror_mode": { 638 | "name": "ipython", 639 | "version": 3 640 | }, 641 | "file_extension": ".py", 642 | "mimetype": "text/x-python", 643 | "name": "python", 644 | "nbconvert_exporter": "python", 645 | "pygments_lexer": "ipython3", 646 | "version": "3.9.2" 647 | } 648 | }, 649 | "nbformat": 4, 650 | "nbformat_minor": 2 651 | } 652 | -------------------------------------------------------------------------------- /17_metodos_de_enmascaramiento.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "[![img/pythonista.png](img/pythonista.png)](https://www.pythonista.io)" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "# Métodos de enmascaramiento." 15 | ] 16 | }, 17 | { 18 | "cell_type": "code", 19 | "execution_count": null, 20 | "metadata": {}, 21 | "outputs": [], 22 | "source": [ 23 | "import pandas as pd" 24 | ] 25 | }, 26 | { 27 | "cell_type": "markdown", 28 | "metadata": {}, 29 | "source": [ 30 | "En este capítulo se explorarán los métodos que permiten sustituir los valores de un *dataframe* mediante otro *dataframe* con valores booleanos." 31 | ] 32 | }, 33 | { 34 | "cell_type": "markdown", 35 | "metadata": {}, 36 | "source": [ 37 | "## *Dataframe* ilustrativo. 
" 38 | ] 39 | }, 40 | { 41 | "cell_type": "markdown", 42 | "metadata": {}, 43 | "source": [ 44 | "El *dataframe* ```poblacion``` representa un censo poblacional de especies animales en diversas regiones geográficas.\n", 45 | "\n", 46 | "Las poblaciones de animales censadas representan los índices del *dataframe* y son:\n", 47 | "* ```'lobo'```.\n", 48 | "* ```'jaguar'```.\n", 49 | "* ```'coyote'```.\n", 50 | "* ```'halcón'```. \n", 51 | "* ```'lechuza'```.\n", 52 | "* ```'aguila'```.\n", 53 | "\n", 54 | "Las regiones geográficas representan la columnas del *dataframe* y son:\n", 55 | "\n", 56 | "* ```Norte_1```.\n", 57 | "* ```Norte_2```.\n", 58 | "* ```Sur_1```.\n", 59 | "* ```Sur_2```." 60 | ] 61 | }, 62 | { 63 | "cell_type": "code", 64 | "execution_count": null, 65 | "metadata": {}, 66 | "outputs": [], 67 | "source": [ 68 | "indice = ('lobo', 'jaguar', 'coyote', 'halcón', 'lechuza', 'aguila')\n", 69 | "poblacion = pd.DataFrame({'Norte_1':(25,\n", 70 | " 0,\n", 71 | " 45,\n", 72 | " 23,\n", 73 | " 67,\n", 74 | " 12),\n", 75 | " 'Norte_2':(31,\n", 76 | " 0,\n", 77 | " 23,\n", 78 | " 3,\n", 79 | " 34,\n", 80 | " 2),\n", 81 | " 'Sur_1':(0,\n", 82 | " 4,\n", 83 | " 3,\n", 84 | " 1,\n", 85 | " 1,\n", 86 | " 2),\n", 87 | " 'Sur_2':(2,\n", 88 | " 0,\n", 89 | " 12,\n", 90 | " 23,\n", 91 | " 11,\n", 92 | " 2)},\n", 93 | " index=indice)" 94 | ] 95 | }, 96 | { 97 | "cell_type": "code", 98 | "execution_count": null, 99 | "metadata": {}, 100 | "outputs": [], 101 | "source": [ 102 | "poblacion" 103 | ] 104 | }, 105 | { 106 | "cell_type": "markdown", 107 | "metadata": {}, 108 | "source": [ 109 | "## Enmascaramiento.\n", 110 | "\n", 111 | "Se entiende por \"enmascaramiento\" la aplicación sobre un *datraframe* de otro *dataframe* de tamaño idéntico, pero compuesto por valores booleanos (*dataframe* de máscara), con la finalidad de sustituir cada valor del *dataframe* original en función del valor booleano correspondiente.\n" 112 | ] 113 | }, 114 | { 115 | "cell_type": 
"markdown", 116 | "metadata": {}, 117 | "source": [ 118 | "## El método ```mask()```.\n", 119 | "\n", 120 | "El método ```mask()``` permite sutituir por un valor predeterminado a aquellos elementos cuya contraparte en el objeto usado como máscara sea ```True```.\n", 121 | "\n", 122 | "```\n", 123 | ".mask(, )\n", 124 | "```\n", 125 | "\n", 126 | "Donde:\n", 127 | "\n", 128 | "* `````` es una serie o un *dataframe*.\n", 129 | "* `````` es una serie o un *dataframe* de dimensiones idénticas a `````` donde todos su elementos son de tipo ```bool```.\n", 130 | "* ```valor``` es el valor que sutituirá a aquellos elementos en `````` cuya contraparte en `````` sea ```True```.\n" 131 | ] 132 | }, 133 | { 134 | "cell_type": "markdown", 135 | "metadata": {}, 136 | "source": [ 137 | "**Ejemplo:**" 138 | ] 139 | }, 140 | { 141 | "cell_type": "markdown", 142 | "metadata": {}, 143 | "source": [ 144 | "* Se uitlizará el *dataframe* ```poblacion``` definido previamente." 145 | ] 146 | }, 147 | { 148 | "cell_type": "code", 149 | "execution_count": null, 150 | "metadata": {}, 151 | "outputs": [], 152 | "source": [ 153 | "poblacion" 154 | ] 155 | }, 156 | { 157 | "cell_type": "markdown", 158 | "metadata": {}, 159 | "source": [ 160 | "* Se creará el *dataframe* ```poblacion_evaluada``` validando si cada elemento del ```poblacion``` es igual a ```0```." 161 | ] 162 | }, 163 | { 164 | "cell_type": "code", 165 | "execution_count": null, 166 | "metadata": {}, 167 | "outputs": [], 168 | "source": [ 169 | "poblacion_evaluada = poblacion == 0" 170 | ] 171 | }, 172 | { 173 | "cell_type": "code", 174 | "execution_count": null, 175 | "metadata": {}, 176 | "outputs": [], 177 | "source": [ 178 | "poblacion_evaluada" 179 | ] 180 | }, 181 | { 182 | "cell_type": "markdown", 183 | "metadata": {}, 184 | "source": [ 185 | "* La siguiente celda sustiruirá con la cadena ```'extinto'``` a cada valor en el *dataframe* ```poblacion``` que corresponda a ```True``` en el *dataframe* ```poblacion_evaluada```." 
186 | ] 187 | }, 188 | { 189 | "cell_type": "code", 190 | "execution_count": null, 191 | "metadata": {}, 192 | "outputs": [], 193 | "source": [ 194 | "poblacion.mask(poblacion_evaluada, 'extinto')" 195 | ] 196 | }, 197 | { 198 | "cell_type": "markdown", 199 | "metadata": {}, 200 | "source": [ 201 | "## El método ```where()```.\n", 202 | "\n", 203 | "\n", 204 | "El método ```where()``` permite sustituir por un valor predeterminado a aquellos elementos cuya contraparte en el objeto usado como máscara sea ```False```.\n", 205 | "\n", 206 | "```\n", 207 | "<df>.where(<mascara>, <valor>)\n", 208 | "```\n", 209 | "\n", 210 | "Donde:\n", 211 | "\n", 212 | "* ```<df>``` es una serie o un *dataframe*.\n", 213 | "* ```<mascara>``` es una serie o un *dataframe* de dimensiones idénticas a ```<df>``` donde todos sus elementos son de tipo ```bool```.\n", 214 | "* ```<valor>``` es el valor que sustituirá a aquellos elementos en ```<df>``` cuya contraparte en ```<mascara>``` sea ```False```.\n" 215 | ] 216 | }, 217 | { 218 | "cell_type": "markdown", 219 | "metadata": {}, 220 | "source": [ 221 | "**Ejemplo:**" 222 | ] 223 | }, 224 | { 225 | "cell_type": "markdown", 226 | "metadata": {}, 227 | "source": [ 228 | "* La siguiente celda sustituirá por la cadena ```'sin riesgo'``` a cada valor del *dataframe* ```poblacion``` para el que la condición ```poblacion < 10``` dé por resultado ```False```." 229 | ] 230 | }, 231 | { 232 | "cell_type": "code", 233 | "execution_count": null, 234 | "metadata": {}, 235 | "outputs": [], 236 | "source": [ 237 | "poblacion.where(poblacion < 10, 'sin riesgo')" 238 | ] 239 | }, 240 | { 241 | "cell_type": "markdown", 242 | "metadata": {}, 243 | "source": [ 244 | "## Ejemplo de combinación de ```mask()``` y ```where()```." 
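Dado que ```mask()``` sustituye donde la máscara es ```True``` y ```where()``` donde es ```False```, ambos métodos son duales: negar la condición con el operador ```~``` permite pasar de uno a otro. El siguiente esbozo usa un *dataframe* hipotético (los nombres ```df```, ```con_where``` y ```con_mask``` son sólo ilustrativos):

```python
import pandas as pd

# Dataframe hipotético, sólo para ilustrar la dualidad.
df = pd.DataFrame({'x': [0, 5, 12],
                   'y': [3, 0, 20]})

cond = df >= 10

# where() conserva los valores donde cond es True y sustituye el resto...
con_where = df.where(cond, 'bajo')

# ...mientras que mask() sustituye donde la máscara es True, por lo que
# negar la condición produce exactamente el mismo resultado.
con_mask = df.mask(~cond, 'bajo')
```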
245 | ] 246 | }, 247 | { 248 | "cell_type": "markdown", 249 | "metadata": {}, 250 | "source": [ 251 | "* La siguiente celda sustituirá por ```'sin riesgo'``` a aquellos elementos cuyo valor sea mayor o igual a ```10``` y sustituirá por ```'amenazados'``` a aquellos elementos cuyo valor sea menor a ```2``` en el *dataframe* ```poblacion```." 252 | ] 253 | }, 254 | { 255 | "cell_type": "code", 256 | "execution_count": null, 257 | "metadata": {}, 258 | "outputs": [], 259 | "source": [ 260 | "poblacion.where(poblacion < 10, 'sin riesgo').\\\n", 261 | "    mask(poblacion < 2, 'amenazados')" 262 | ] 263 | }, 264 | { 265 | "cell_type": "markdown", 266 | "metadata": {}, 267 | "source": [ 268 | "* La siguiente celda usará los métodos ```filter()```, ```where()``` y ```mask()``` para aplicar el criterio del ejemplo previo, pero sólo a la columna ```'Sur_2'```." 269 | ] 270 | }, 271 | { 272 | "cell_type": "code", 273 | "execution_count": null, 274 | "metadata": {}, 275 | "outputs": [], 276 | "source": [ 277 | "poblacion.filter(items=['Sur_2']).\\\n", 278 | "    where(poblacion < 10, 'sin riesgo').\\\n", 279 | "    mask(poblacion < 2, 'amenazados')" 280 | ] 281 | }, 282 | { 283 | "cell_type": "markdown", 284 | "metadata": {}, 285 | "source": [ 286 | "## El método ```query()```.\n\nEl método ```query()``` permite seleccionar los renglones de un *dataframe* que cumplan con una expresión booleana escrita como una cadena de caracteres, en la que los nombres de las columnas pueden usarse como identificadores.\n\nLas siguientes celdas muestran cómo la expresión booleana ```poblacion[\"Norte_1\"] == 0``` puede usarse para seleccionar renglones y cómo ```query()``` permite obtener el mismo resultado de forma más compacta.\n\nhttps://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.query.html" 
287 | ] 288 | }, 289 | { 290 | "cell_type": "code", 291 | "execution_count": null, 292 | "metadata": { 293 | "scrolled": true 294 | }, 295 | "outputs": [], 296 | "source": [ 297 | "poblacion" 298 | ] 299 | }, 300 | { 301 | "cell_type": "code", 302 | "execution_count": null, 303 | "metadata": {}, 304 | "outputs": [], 305 | "source": [ 306 | "poblacion[\"Norte_1\"] == 0" 307 | ] 308 | }, 309 | { 310 | "cell_type": "code", 311 | "execution_count": null, 312 | "metadata": {}, 313 | "outputs": [], 314 | "source": [ 315 | "poblacion[poblacion[\"Norte_1\"] == 0]" 316 | ] 317 | }, 318 | { 319 | "cell_type": "code", 320 | "execution_count": null, 321 | "metadata": {}, 322 | "outputs": [], 323 | "source": [ 324 | "poblacion.query('Norte_1 == 0')" 325 | ] 326 | }, 327 | { 328 | "cell_type": "markdown", 329 | "metadata": {}, 330 | "source": [ 331 | "

\"Licencia
Esta obra está bajo una Licencia Creative Commons Atribución 4.0 Internacional.

\n", 332 | "

© José Luis Chiquete Valdivieso. 2023.

" 333 | ] 334 | } 335 | ], 336 | "metadata": { 337 | "kernelspec": { 338 | "display_name": "Python 3 (ipykernel)", 339 | "language": "python", 340 | "name": "python3" 341 | }, 342 | "language_info": { 343 | "codemirror_mode": { 344 | "name": "ipython", 345 | "version": 3 346 | }, 347 | "file_extension": ".py", 348 | "mimetype": "text/x-python", 349 | "name": "python", 350 | "nbconvert_exporter": "python", 351 | "pygments_lexer": "ipython3", 352 | "version": "3.9.2" 353 | } 354 | }, 355 | "nbformat": 4, 356 | "nbformat_minor": 2 357 | } 358 | -------------------------------------------------------------------------------- /19_limpieza_y_datos_faltantes.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "[![img/pythonista.png](img/pythonista.png)](https://www.pythonista.io)" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "# Limpieza y datos faltantes." 15 | ] 16 | }, 17 | { 18 | "cell_type": "markdown", 19 | "metadata": {}, 20 | "source": [ 21 | "En este capítulo se explorarán diversos métodos enfocados a gestionar *dataframes* que no son homogéneos en sus contenidos." 22 | ] 23 | }, 24 | { 25 | "cell_type": "code", 26 | "execution_count": null, 27 | "metadata": { 28 | "scrolled": true 29 | }, 30 | "outputs": [], 31 | "source": [ 32 | "import pandas as pd\n", 33 | "import numpy as np" 34 | ] 35 | }, 36 | { 37 | "cell_type": "markdown", 38 | "metadata": {}, 39 | "source": [ 40 | "## El *dataframe* ilustrativo." 41 | ] 42 | }, 43 | { 44 | "cell_type": "markdown", 45 | "metadata": {}, 46 | "source": [ 47 | "El *dataframe* ```poblacion``` describe una serie de muestras poblacionales de animales en varias regiones geográficas." 
48 | ] 49 | }, 50 | { 51 | "cell_type": "code", 52 | "execution_count": null, 53 | "metadata": {}, 54 | "outputs": [], 55 | "source": [ 56 | "poblacion = pd.DataFrame({'Animal':('lobo',\n", 57 | " 'coyote',\n", 58 | " 'jaguar',\n", 59 | " 'cerdo salvaje',\n", 60 | " 'tapir',\n", 61 | " 'venado',\n", 62 | " 'ocelote',\n", 63 | " 'puma'),\n", 64 | " 'Norte_I':(12,\n", 65 | " np.NAN,\n", 66 | " None,\n", 67 | " 2,\n", 68 | " 4,\n", 69 | " 2,\n", 70 | " 14,\n", 71 | " 5\n", 72 | " ),\n", 73 | " 'Norte_II':(23,\n", 74 | " 4,\n", 75 | " 25,\n", 76 | " 21,\n", 77 | " 9,\n", 78 | " 121,\n", 79 | " 1,\n", 80 | " 2\n", 81 | " ),\n", 82 | " 'Centro_I':(15,\n", 83 | " 23,\n", 84 | " 2,\n", 85 | " None,\n", 86 | " 40,\n", 87 | " 121,\n", 88 | " 0,\n", 89 | " 5),\n", 90 | " 'Sur_I':(28,\n", 91 | " 46,\n", 92 | " 14,\n", 93 | " 156,\n", 94 | " 79,\n", 95 | " 12,\n", 96 | " 2,\n", 97 | " np.NAN)}).set_index('Animal')" 98 | ] 99 | }, 100 | { 101 | "cell_type": "code", 102 | "execution_count": null, 103 | "metadata": {}, 104 | "outputs": [], 105 | "source": [ 106 | "poblacion" 107 | ] 108 | }, 109 | { 110 | "cell_type": "markdown", 111 | "metadata": {}, 112 | "source": [ 113 | "## Métodos de validación de *NaN*.\n", 114 | "\n", 115 | "En muchos casos, los *dataframes* incluyen objetos de tipo ```np.NaN```. Por lo general este tipo de dato denota datos incompletos cuyo verdadero valor es desconocido.\n", 116 | "\n", 117 | "Poder transformar eficientemente ```np.NaN``` en valores relevantes requiere de experiencia y conocimiento de los datos con los que se trabaja." 118 | ] 119 | }, 120 | { 121 | "cell_type": "markdown", 122 | "metadata": {}, 123 | "source": [ 124 | "### El método ```isna()```.\n", 125 | "\n", 126 | "Este método de enmascaramiento detecta aquellos elementos que contienen valores ```np.NaN```." 
127 | ] 128 | }, 129 | { 130 | "cell_type": "markdown", 131 | "metadata": {}, 132 | "source": [ 133 | "**Ejemplo:**" 134 | ] 135 | }, 136 | { 137 | "cell_type": "markdown", 138 | "metadata": {}, 139 | "source": [ 140 | "* La siguiente celda evaluará cada elemento de ```poblacion``` validando si existe algún valor igual a ```np.NaN```. En caso de que el elemento sea ```np.NaN```, el valor dentro del *dataframe* resultante será ```True```." 141 | ] 142 | }, 143 | { 144 | "cell_type": "code", 145 | "execution_count": null, 146 | "metadata": {}, 147 | "outputs": [], 148 | "source": [ 149 | "poblacion.isna()" 150 | ] 151 | }, 152 | { 153 | "cell_type": "markdown", 154 | "metadata": {}, 155 | "source": [ 156 | "### El método ```isnull()```.\n", 157 | "\n", 158 | "Este método de enmascaramiento detecta aquellos elementos que contienen tanto a ```np.NaN``` como a ```None```." 159 | ] 160 | }, 161 | { 162 | "cell_type": "markdown", 163 | "metadata": {}, 164 | "source": [ 165 | "**Ejemplo:**" 166 | ] 167 | }, 168 | { 169 | "cell_type": "markdown", 170 | "metadata": {}, 171 | "source": [ 172 | "* Se utilizará el *dataframe* ```poblacion``` definido previamente." 173 | ] 174 | }, 175 | { 176 | "cell_type": "code", 177 | "execution_count": null, 178 | "metadata": { 179 | "scrolled": true 180 | }, 181 | "outputs": [], 182 | "source": [ 183 | "poblacion" 184 | ] 185 | }, 186 | { 187 | "cell_type": "markdown", 188 | "metadata": {}, 189 | "source": [ 190 | "* La siguiente celda evaluará cada elemento de ```poblacion``` validando si existe algún valor igual a ```np.NaN``` o ```None```. En caso de que el elemento coincida, el valor dentro del *dataframe* resultante será ```True```." 
191 | ] 192 | }, 193 | { 194 | "cell_type": "code", 195 | "execution_count": null, 196 | "metadata": {}, 197 | "outputs": [], 198 | "source": [ 199 | "poblacion.isnull()" 200 | ] 201 | }, 202 | { 203 | "cell_type": "markdown", 204 | "metadata": {}, 205 | "source": [ 206 | "### El método ```notna()```.\n", 207 | "\n", 208 | "Este método de enmascaramiento detecta aquellos elementos que contienen valores distintos a ```np.NaN```." 209 | ] 210 | }, 211 | { 212 | "cell_type": "markdown", 213 | "metadata": {}, 214 | "source": [ 215 | "**Ejemplo:**" 216 | ] 217 | }, 218 | { 219 | "cell_type": "markdown", 220 | "metadata": {}, 221 | "source": [ 222 | "* La siguiente celda evaluará a cada elemento de ```poblacion``` validando si existe algún valor igual a ```np.NaN```. En caso de que el elemento sea ```np.NaN```, el valor dentro del *dataframe* resultante será ```False```." 223 | ] 224 | }, 225 | { 226 | "cell_type": "code", 227 | "execution_count": null, 228 | "metadata": {}, 229 | "outputs": [], 230 | "source": [ 231 | "poblacion.notna()" 232 | ] 233 | }, 234 | { 235 | "cell_type": "markdown", 236 | "metadata": {}, 237 | "source": [ 238 | "### El método ```notnull()```.\n", 239 | "\n", 240 | "Este método de enmascaramiento detecta aquellos elementos que contienen valores distintos a ```np.NaN``` o a ```None```." 241 | ] 242 | }, 243 | { 244 | "cell_type": "markdown", 245 | "metadata": {}, 246 | "source": [ 247 | "**Ejemplo:**" 248 | ] 249 | }, 250 | { 251 | "cell_type": "markdown", 252 | "metadata": {}, 253 | "source": [ 254 | "* La siguiente celda evaluará a cada elemento de ```poblacion``` validando si existen valores distintos a ```np.NaN``` o a ```None```. En caso de que el elemento sea ```np.NaN``` o ```None```, el valor dentro del *dataframe* resultante será ```False```." 
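Un uso muy frecuente de estos métodos de validación, no mostrado en las celdas de este capítulo, es combinarlos con ```sum()``` para contar los datos faltantes: dado que ```True``` vale ```1```, la suma del *dataframe* booleano regresa el número de faltantes por columna. Esbozo con un *dataframe* hipotético:

```python
import numpy as np
import pandas as pd

# Dataframe hipotético con huecos, sólo para ilustrar el conteo.
df = pd.DataFrame({'a': [1.0, np.nan, 3.0],
                   'b': [np.nan, np.nan, 6.0]})

# isna() regresa un dataframe booleano; sum() lo agrega por columna.
faltantes_por_columna = df.isna().sum()

# Una segunda suma agrega el total de faltantes del dataframe completo.
total_faltantes = df.isna().sum().sum()
```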
255 | ] 256 | }, 257 | { 258 | "cell_type": "code", 259 | "execution_count": null, 260 | "metadata": {}, 261 | "outputs": [], 262 | "source": [ 263 | "poblacion.notnull()" 264 | ] 265 | }, 266 | { 267 | "cell_type": "markdown", 268 | "metadata": {}, 269 | "source": [ 270 | "## El método ```fillna()```.\n", 271 | "\n", 272 | "Este método sustituirá los valores ```np.NaN``` con el valor designado como argumento.\n", 273 | "\n", 274 | "\n", 275 | "```\n", 276 | "df.fillna(<valor>)\n", 277 | "```\n", 278 | "\n", 279 | "Donde:\n", 280 | "\n", 281 | "* ```<valor>``` es cualquier objeto de *Python*, *Numpy* o de *Pandas*." 282 | ] 283 | }, 284 | { 285 | "cell_type": "markdown", 286 | "metadata": {}, 287 | "source": [ 288 | "**Ejemplo:**" 289 | ] 290 | }, 291 | { 292 | "cell_type": "code", 293 | "execution_count": null, 294 | "metadata": { 295 | "scrolled": false 296 | }, 297 | "outputs": [], 298 | "source": [ 299 | "poblacion" 300 | ] 301 | }, 302 | { 303 | "cell_type": "markdown", 304 | "metadata": {}, 305 | "source": [ 306 | "* Las siguientes celdas sustituirán a los elementos ```np.NaN``` por el objeto ingresado como argumento." 
307 | ] 308 | }, 309 | { 310 | "cell_type": "code", 311 | "execution_count": null, 312 | "metadata": { 313 | "scrolled": true 314 | }, 315 | "outputs": [], 316 | "source": [ 317 | "poblacion.fillna(0)" 318 | ] 319 | }, 320 | { 321 | "cell_type": "code", 322 | "execution_count": null, 323 | "metadata": {}, 324 | "outputs": [], 325 | "source": [ 326 | "poblacion.fillna(15)" 327 | ] 328 | }, 329 | { 330 | "cell_type": "code", 331 | "execution_count": null, 332 | "metadata": {}, 333 | "outputs": [], 334 | "source": [ 335 | "poblacion.fillna(\"inválido\")" 336 | ] 337 | }, 338 | { 339 | "cell_type": "markdown", 340 | "metadata": {}, 341 | "source": [ 342 | "## El método ```interpolate()```.\n", 343 | "\n", 344 | "Este método realiza cálculos de interpolación para sustituir a ```np.NaN```.\n", 345 | "\n", 346 | "\n", 347 | "```\n", 348 | "df.interpolate(method=<método>, axis=<eje>)\n", 349 | "```\n", 350 | "\n", 351 | "Donde:\n", 352 | "\n", 353 | "* ```<método>``` es un método de interpolación. El valor por defecto es ```'linear'```.\n", 354 | "* El parámetro ```axis``` define el eje desde el cual se tomarán los elementos de interpolación y su valor por defecto es ```0```.\n", 355 | "\n", 356 | "https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.interpolate.html\n", 357 | "\n", 358 | "**Nota:** *Scipy* cuenta con diversos algoritmos de interpolación, los cuales pueden ser consultados en: \n", 359 | "\n", 360 | "* https://docs.scipy.org/doc/scipy/reference/interpolate.html\n", 361 | "* https://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.interp1d.html" 362 | ] 363 | }, 364 | { 365 | "cell_type": "markdown", 366 | "metadata": {}, 367 | "source": [ 368 | "**Ejemplos:**" 369 | ] 370 | }, 371 | { 372 | "cell_type": "markdown", 373 | "metadata": {}, 374 | "source": [ 375 | "* Se utilizará el *dataframe* ```poblacion``` definido previamente." 
376 | ] 377 | }, 378 | { 379 | "cell_type": "code", 380 | "execution_count": null, 381 | "metadata": { 382 | "scrolled": true 383 | }, 384 | "outputs": [], 385 | "source": [ 386 | "poblacion" 387 | ] 388 | }, 389 | { 390 | "cell_type": "markdown", 391 | "metadata": {}, 392 | "source": [ 393 | "* La siguiente celda ejecutará el método ```poblacion.interpolate()``` con los argumentos por defecto.\n", 394 | "* El *dataframe* resultante modificará aquellos elementos ```np.NaN``` aplicando una interpolación lineal que toma como datos de referencia a los de la columna a la que pertenece el elemento. " 395 | ] 396 | }, 397 | { 398 | "cell_type": "code", 399 | "execution_count": null, 400 | "metadata": { 401 | "scrolled": true 402 | }, 403 | "outputs": [], 404 | "source": [ 405 | "poblacion.interpolate()" 406 | ] 407 | }, 408 | { 409 | "cell_type": "markdown", 410 | "metadata": {}, 411 | "source": [ 412 | "* La siguiente celda ejecutará el método ```poblacion.interpolate()``` con el argumento ```axis=1```.\n", 413 | "* El *dataframe* resultante modificará aquellos elementos ```np.NaN``` aplicando una interpolación lineal que toma como datos de referencia a los del renglón al que pertenece el elemento." 414 | ] 415 | }, 416 | { 417 | "cell_type": "code", 418 | "execution_count": null, 419 | "metadata": { 420 | "scrolled": true 421 | }, 422 | "outputs": [], 423 | "source": [ 424 | "poblacion.interpolate(axis=1)" 425 | ] 426 | }, 427 | { 428 | "cell_type": "markdown", 429 | "metadata": {}, 430 | "source": [ 431 | "* La siguiente celda usará el argumento ```method=\"zero\"```.\n", 432 | "* El método ```\"zero\"``` requiere que exista un índice numérico para poder realizar la interpolación, por lo que se desencadenará una excepción de tipo ```ValueError```. 
" 433 | ] 434 | }, 435 | { 436 | "cell_type": "code", 437 | "execution_count": null, 438 | "metadata": { 439 | "scrolled": false 440 | }, 441 | "outputs": [], 442 | "source": [ 443 | "poblacion.interpolate(method=\"zero\")" 444 | ] 445 | }, 446 | { 447 | "cell_type": "markdown", 448 | "metadata": {}, 449 | "source": [ 450 | "* La siguiente celda creará un *dataframe* llamado ```poblacion_numerica```, basado en ```poblacion```en el que los índices serán numéricos y se desechará la columna ```'Animal'```." 451 | ] 452 | }, 453 | { 454 | "cell_type": "code", 455 | "execution_count": null, 456 | "metadata": {}, 457 | "outputs": [], 458 | "source": [ 459 | "poblacion_numerica = poblacion.reset_index().drop('Animal', axis=1)" 460 | ] 461 | }, 462 | { 463 | "cell_type": "code", 464 | "execution_count": null, 465 | "metadata": {}, 466 | "outputs": [], 467 | "source": [ 468 | "poblacion_numerica" 469 | ] 470 | }, 471 | { 472 | "cell_type": "markdown", 473 | "metadata": {}, 474 | "source": [ 475 | "* la siguiente celda aplicará el métódo ```poblacion_numerica.interpolate()```:\n", 476 | " * Ingresando el argumento ```axis=0```.\n", 477 | " * Ingresando el argumento ```method=\"zero\"```." 
478 | ] 479 | }, 480 | { 481 | "cell_type": "code", 482 | "execution_count": null, 483 | "metadata": {}, 484 | "outputs": [], 485 | "source": [ 486 | "poblacion_numerica.interpolate(method=\"zero\", axis=0)" 487 | ] 488 | }, 489 | { 490 | "cell_type": "markdown", 491 | "metadata": {}, 492 | "source": [ 493 | "## El método ```dropna()```.\n", 494 | "\n", 495 | "Este método desechará los renglones o columnas que contengan valores ```np.NaN```.\n", 496 | "\n", 497 | "\n", 498 | "```\n", 499 | "df.dropna(axis=<eje>)\n", 500 | "```\n", 501 | "\n", 502 | "Donde:\n", 503 | "\n", 504 | "* ```<eje>``` puede ser ```0``` para desechar los renglones que contengan ```np.NaN``` (este es el valor por defecto) o ```1``` para desechar las columnas que los contengan." 505 | ] 506 | }, 507 | { 508 | "cell_type": "markdown", 509 | "metadata": {}, 510 | "source": [ 511 | "**Ejemplo:**" 512 | ] 513 | }, 514 | { 515 | "cell_type": "code", 516 | "execution_count": null, 517 | "metadata": {}, 518 | "outputs": [], 519 | "source": [ 520 | "poblacion" 521 | ] 522 | }, 523 | { 524 | "cell_type": "code", 525 | "execution_count": null, 526 | "metadata": {}, 527 | "outputs": [], 528 | "source": [ 529 | "poblacion.dropna()" 530 | ] 531 | }, 532 | { 533 | "cell_type": "code", 534 | "execution_count": null, 535 | "metadata": {}, 536 | "outputs": [], 537 | "source": [ 538 | "poblacion.dropna(axis=1)" 539 | ] 540 | }, 541 | { 542 | "cell_type": "markdown", 543 | "metadata": {}, 544 | "source": [ 545 | "## El método ```duplicated()```.\n", 546 | "\n", 547 | "Identifica aquellos renglones duplicados." 
548 | ] 549 | }, 550 | { 551 | "cell_type": "code", 552 | "execution_count": null, 553 | "metadata": {}, 554 | "outputs": [], 555 | "source": [ 556 | "poblacion.duplicated()" 557 | ] 558 | }, 559 | { 560 | "cell_type": "code", 561 | "execution_count": null, 562 | "metadata": {}, 563 | "outputs": [], 564 | "source": [ 565 | "otra_poblacion = pd.DataFrame({'Animal':('lobo',\n", 566 | " 'coyote',\n", 567 | " 'jaguar',\n", 568 | " 'cerdo salvaje',\n", 569 | " 'tapir',\n", 570 | " 'venado',\n", 571 | " 'ocelote',\n", 572 | " 'puma'),\n", 573 | " 'Norte_I':(12,\n", 574 | " 4,\n", 575 | " None,\n", 576 | " 2,\n", 577 | " 4,\n", 578 | " 2,\n", 579 | " 14,\n", 580 | " 4\n", 581 | " ),\n", 582 | " 'Norte_II':(23,\n", 583 | " 4,\n", 584 | " 25,\n", 585 | " 21,\n", 586 | " 9,\n", 587 | " 121,\n", 588 | " 1,\n", 589 | " 4\n", 590 | " ),\n", 591 | " 'Centro_I':(15,\n", 592 | " 4,\n", 593 | " 2,\n", 594 | " 120,\n", 595 | " 40,\n", 596 | " 121,\n", 597 | " 0,\n", 598 | " 4),\n", 599 | " 'Sur_I':(28,\n", 600 | " 4,\n", 601 | " 14,\n", 602 | " 156,\n", 603 | " 79,\n", 604 | " 12,\n", 605 | " 2,\n", 606 | " 4)}).set_index('Animal')" 607 | ] 608 | }, 609 | { 610 | "cell_type": "code", 611 | "execution_count": null, 612 | "metadata": {}, 613 | "outputs": [], 614 | "source": [ 615 | "otra_poblacion" 616 | ] 617 | }, 618 | { 619 | "cell_type": "code", 620 | "execution_count": null, 621 | "metadata": {}, 622 | "outputs": [], 623 | "source": [ 624 | "otra_poblacion.duplicated()" 625 | ] 626 | }, 627 | { 628 | "cell_type": "markdown", 629 | "metadata": {}, 630 | "source": [ 631 | "## El método ```drop_duplicates()```.\n", 632 | "\n", 633 | "Este método elimina renglones duplicados." 634 | ] 635 | }, 636 | { 637 | "cell_type": "code", 638 | "execution_count": null, 639 | "metadata": {}, 640 | "outputs": [], 641 | "source": [ 642 | "otra_poblacion.drop_duplicates()" 643 | ] 644 | }, 645 | { 646 | "cell_type": "markdown", 647 | "metadata": {}, 648 | "source": [ 649 | "

\"Licencia
Esta obra está bajo una Licencia Creative Commons Atribución 4.0 Internacional.

\n", 650 | "

© José Luis Chiquete Valdivieso. 2023.

" 651 | ] 652 | } 653 | ], 654 | "metadata": { 655 | "kernelspec": { 656 | "display_name": "Python 3 (ipykernel)", 657 | "language": "python", 658 | "name": "python3" 659 | }, 660 | "language_info": { 661 | "codemirror_mode": { 662 | "name": "ipython", 663 | "version": 3 664 | }, 665 | "file_extension": ".py", 666 | "mimetype": "text/x-python", 667 | "name": "python", 668 | "nbconvert_exporter": "python", 669 | "pygments_lexer": "ipython3", 670 | "version": "3.9.2" 671 | } 672 | }, 673 | "nbformat": 4, 674 | "nbformat_minor": 2 675 | } 676 | -------------------------------------------------------------------------------- /21_metodos_groupby.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "[![img/pythonista.png](img/pythonista.png)](https://www.pythonista.io)" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "# Métodos ```groupby()```." 15 | ] 16 | }, 17 | { 18 | "cell_type": "markdown", 19 | "metadata": {}, 20 | "source": [ 21 | "*Pandas* cuenta con una funcionalidad que permite agrupar los datos idénticos en una columna o un renglón de un *dataframe* .\n" 22 | ] 23 | }, 24 | { 25 | "cell_type": "markdown", 26 | "metadata": {}, 27 | "source": [ 28 | "Tanto las series como los dataframes de *Pandas* cuentan con un método ```groupby()```.\n", 29 | "\n", 30 | "* El método ```pd.DataFrame.groupby()``` regresa un objeto ```pd.core.groupby.generic.DataFrameGroupBy```.\n", 31 | "* El método ```pd.Series.groupby()``` regresa un objeto ```pd.core.groupby.generic.SeriesGroupBy```.\n", 32 | "\n", 33 | "En este capítulo se explorará el método ```pd.DataFrame.groupby()```, asumiendo que el método```pd.Series.groupby()``` se comporta de forma similar." 
34 | ] 35 | }, 36 | { 37 | "cell_type": "code", 38 | "execution_count": null, 39 | "metadata": {}, 40 | "outputs": [], 41 | "source": [ 42 | "import pandas as pd\n", 43 | "from datetime import datetime" 44 | ] 45 | }, 46 | { 47 | "cell_type": "markdown", 48 | "metadata": {}, 49 | "source": [ 50 | "## El método ```pd.DataFrame.groupby()```.\n", 51 | "\n", 52 | "El método regresa un objeto de tipo ```pd.core.groupby.generic.DataFrameGroupBy```.\n", 53 | "\n", 54 | "```\n", 55 | "df.groupby(by=<columna>, axis=<eje>, group_keys=True)\n", 56 | "```\n", 57 | "Donde:\n", 58 | "\n", 59 | "* ```<columna>``` corresponde al identificador de la columna o índice en el que se realizará la agrupación.\n", 60 | "* El argumento ```axis``` indicará el eje al que se aplicará el método. El valor por defecto es ```0```.\n", 61 | "* El argumento ```group_keys``` le indica al método que use los valores de agrupamiento como llaves del resultado. Su valor por defecto es ```True``` y en este capítulo se asignará de forma explícita.\n", 62 | "\n", 63 | "https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.groupby.html" 64 | ] 65 | }, 66 | { 67 | "cell_type": "markdown", 68 | "metadata": {}, 69 | "source": [ 70 | "**Ejemplo:**" 71 | ] 72 | }, 73 | { 74 | "cell_type": "markdown", 75 | "metadata": {}, 76 | "source": [ 77 | "* La siguiente celda creará el *dataframe* ```facturas``` con la estructura de columnas:\n", 78 | "    * ```'folio'```.\n", 79 | "    * ```'sucursal'```.\n", 80 | "    * ```'monto'```.\n", 81 | "    * ```'fecha'```.\n", 82 | "    * ```'cliente'```." 
84 | ] 85 | }, 86 | { 87 | "cell_type": "code", 88 | "execution_count": null, 89 | "metadata": {}, 90 | "outputs": [], 91 | "source": [ 92 | "facturas = pd.DataFrame({'folio':(15234, \n", 93 | " 15235, \n", 94 | " 15236, \n", 95 | " 15237, \n", 96 | " 15238, \n", 97 | " 15239, \n", 98 | " 15240,\n", 99 | " 15241,\n", 100 | " 15242),\n", 101 | " 'sucursal':('CDMX01',\n", 102 | " 'MTY01',\n", 103 | " 'CDMX02',\n", 104 | " 'CDMX02',\n", 105 | " 'MTY01',\n", 106 | " 'GDL01',\n", 107 | " 'CDMX02',\n", 108 | " 'MTY01',\n", 109 | " 'GDL01'),\n", 110 | " 'monto':(1420.00,\n", 111 | " 1532.00,\n", 112 | " 890.00,\n", 113 | " 1300.00,\n", 114 | " 3121.47,\n", 115 | " 1100.5,\n", 116 | " 12230,\n", 117 | " 230.85,\n", 118 | " 1569),\n", 119 | " 'fecha':(datetime(2019,3,11,17,24),\n", 120 | " datetime(2019,3,24,14,46),\n", 121 | " datetime(2019,3,25,17,58),\n", 122 | " datetime(2019,3,27,13,11),\n", 123 | " datetime(2019,3,31,10,25),\n", 124 | " datetime(2019,4,1,18,32),\n", 125 | " datetime(2019,4,3,11,43),\n", 126 | " datetime(2019,4,4,16,55),\n", 127 | " datetime(2019,4,5,12,59)),\n", 128 | " 'cliente':(19234,\n", 129 | " 19232,\n", 130 | " 19235,\n", 131 | " 19233,\n", 132 | " 19236,\n", 133 | " 19237,\n", 134 | " 19232,\n", 135 | " 19233,\n", 136 | " 19232)\n", 137 | " })" 138 | ] 139 | }, 140 | { 141 | "cell_type": "code", 142 | "execution_count": null, 143 | "metadata": { 144 | "scrolled": true 145 | }, 146 | "outputs": [], 147 | "source": [ 148 | "facturas" 149 | ] 150 | }, 151 | { 152 | "cell_type": "markdown", 153 | "metadata": {}, 154 | "source": [ 155 | "* La siguiente celda agrupará aquellos elementos en los que el valor de la columna ```facturas['cliente']``` sean iguales." 
156 | ] 157 | }, 158 | { 159 | "cell_type": "code", 160 | "execution_count": null, 161 | "metadata": {}, 162 | "outputs": [], 163 | "source": [ 164 | "clientes = facturas.groupby(\"cliente\", group_keys=True)" 165 | ] 166 | }, 167 | { 168 | "cell_type": "markdown", 169 | "metadata": {}, 170 | "source": [ 171 | "* El objeto ```clientes``` es de tipo ```pd.core.groupby.generic.DataFrameGroupBy```." 172 | ] 173 | }, 174 | { 175 | "cell_type": "code", 176 | "execution_count": null, 177 | "metadata": {}, 178 | "outputs": [], 179 | "source": [ 180 | "clientes" 181 | ] 182 | }, 183 | { 184 | "cell_type": "markdown", 185 | "metadata": {}, 186 | "source": [ 187 | "## Los objetos ```core.groupby.generic.DataFrameGroupBy```.\n", 188 | "\n", 189 | "Los objetos ```core.groupby.generic.DataFrameGroupBy``` son objetos iterables que generan objetos de tipo ```tuple``` resultantes de la agrupación.\n", 190 | "\n", 191 | "* El primer elemento de la tupla corresponde al valor que se agrupa.\n", 192 | "* El segundo elemento de la tupla corresponde a un *dataframe* con los elementos agrupados.\n", 193 | "\n", 194 | "Dichos objetos contienen diversos métodos capaces de procesar los datos de cada objeto ```tuple``` que contienen." 195 | ] 196 | }, 197 | { 198 | "cell_type": "markdown", 199 | "metadata": {}, 200 | "source": [ 201 | "**Ejemplo:**" 202 | ] 203 | }, 204 | { 205 | "cell_type": "markdown", 206 | "metadata": {}, 207 | "source": [ 208 | "* La siguiente celda desplegará las tuplas contenidas en ```clientes```."
209 | ] 210 | }, 211 | { 212 | "cell_type": "code", 213 | "execution_count": null, 214 | "metadata": { 215 | "scrolled": true 216 | }, 217 | "outputs": [], 218 | "source": [ 219 | "for item in clientes:\n", 220 | "    print(f\"\"\"cliente: {item[0]}\n", 221 | "      -------\n", 222 | "{item[1]}\n", 223 | "\"\"\")" 224 | ] 225 | }, 226 | { 227 | "cell_type": "markdown", 228 | "metadata": {}, 229 | "source": [ 230 | "* La siguiente celda creará un objeto tipo ```list``` llamado ```clientes_agrupados``` a partir del objeto ```clientes```." 231 | ] 232 | }, 233 | { 234 | "cell_type": "code", 235 | "execution_count": null, 236 | "metadata": {}, 237 | "outputs": [], 238 | "source": [ 239 | "clientes_agrupados = list(clientes)" 240 | ] 241 | }, 242 | { 243 | "cell_type": "code", 244 | "execution_count": null, 245 | "metadata": { 246 | "scrolled": true 247 | }, 248 | "outputs": [], 249 | "source": [ 250 | "clientes_agrupados" 251 | ] 252 | }, 253 | { 254 | "cell_type": "markdown", 255 | "metadata": {}, 256 | "source": [ 257 | "* La siguiente celda regresará al *dataframe* que corresponde al segundo elemento de la tupla ```clientes_agrupados[0]```."
258 | ] 259 | }, 260 | { 261 | "cell_type": "code", 262 | "execution_count": null, 263 | "metadata": {}, 264 | "outputs": [], 265 | "source": [ 266 | "clientes_agrupados[0][1]" 267 | ] 268 | }, 269 | { 270 | "cell_type": "code", 271 | "execution_count": null, 272 | "metadata": {}, 273 | "outputs": [], 274 | "source": [ 275 | "type(clientes_agrupados[0][1])" 276 | ] 277 | }, 278 | { 279 | "cell_type": "markdown", 280 | "metadata": {}, 281 | "source": [ 282 | "### Indexado de los objetos ```DataFrameGroupBy```.\n", 283 | "\n", 284 | "Los objetos ```DataFrameGroupBy``` permiten el indexado de columnas propio de los *dataframes*.\n", 285 | "\n", 286 | "```\n", 287 | "<objeto>[<columna>]\n", 288 | "```\n", 289 | "Donde:\n", 290 | "\n", 291 | "* ```<objeto>``` es un objeto ```DataFrameGroupBy```.\n", 292 | "* ```<columna>``` es el identificador de una columna del *dataframe* original.\n", 293 | "\n", 294 | "También es posible indexar más de una columna a la vez usando la siguiente sintaxis.\n", 295 | "\n", 296 | "```\n", 297 | "<objeto>[[<columna 1>, <columna 2>, ... <columna n>]]\n", 298 | "```\n", 299 | "Donde:\n", 300 | "\n", 301 | "* ```<objeto>``` es un objeto ```DataFrameGroupBy```.\n", 302 | "* ```<columna x>``` es el identificador de una columna del *dataframe* original.\n", 303 | "\n" 304 | ] 305 | }, 306 | { 307 | "cell_type": "markdown", 308 | "metadata": {}, 309 | "source": [ 310 | "**Ejemplo:**" 311 | ] 312 | }, 313 | { 314 | "cell_type": "markdown", 315 | "metadata": {}, 316 | "source": [ 317 | "* La siguiente celda regresará un listado de los elementos agrupados, pero sólo se incluirá a la columna ```'fecha'```."
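El resultado de cada forma de indexado puede esbozarse con un dataframe hipotético reducido; indexar una columna regresa un ```SeriesGroupBy``` y una lista de columnas regresa un ```DataFrameGroupBy```:

```python
import pandas as pd

# Dataframe hipotético para mostrar el indexado de un objeto agrupado.
df = pd.DataFrame({
    "cliente": [19232, 19232, 19233],
    "monto": [10.0, 20.0, 30.0],
    "folio": [1, 2, 3],
})
g = df.groupby("cliente")

# Indexar con un solo identificador regresa un SeriesGroupBy.
serie_agrupada = g["monto"]

# Indexar con una lista de identificadores regresa un DataFrameGroupBy.
df_agrupado = g[["monto", "folio"]]

print(type(serie_agrupada).__name__, type(df_agrupado).__name__)
```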
318 | ] 319 | }, 320 | { 321 | "cell_type": "code", 322 | "execution_count": null, 323 | "metadata": { 324 | "scrolled": true 325 | }, 326 | "outputs": [], 327 | "source": [ 328 | "for item in clientes['fecha']:\n", 329 | "    print(f\"\"\"cliente: {item[0]}\n", 330 | "      -------\n", 331 | "{item[1]}\n", 332 | "\"\"\")" 333 | ] 334 | }, 335 | { 336 | "cell_type": "markdown", 337 | "metadata": {}, 338 | "source": [ 339 | "* La siguiente celda regresará un listado de los elementos agrupados, pero sólo se incluirá a las columnas ```'fecha'``` y ```'monto'```." 340 | ] 341 | }, 342 | { 343 | "cell_type": "code", 344 | "execution_count": null, 345 | "metadata": {}, 346 | "outputs": [], 347 | "source": [ 348 | "for item in clientes[['fecha', 'monto']]:\n", 349 | "    print(f\"\"\"cliente: {item[0]}\n", 350 | "      -------\n", 351 | "{item[1]}\n", 352 | "\"\"\")" 353 | ] 354 | }, 355 | { 356 | "cell_type": "markdown", 357 | "metadata": {}, 358 | "source": [ 359 | "### Atributos y métodos de ```DataFrameGroupBy```.\n", 360 | "\n", 361 | "Los objetos ```core.groupby.generic.DataFrameGroupBy``` cuentan con una gran cantidad de atributos y métodos que permiten analizar y manipular los datos de las tuplas que contienen dichos objetos.\n", 362 | "\n", 363 | "https://pandas.pydata.org/docs/reference/groupby.html" 364 | ] 365 | }, 366 | { 367 | "cell_type": "markdown", 368 | "metadata": {}, 369 | "source": [ 370 | "* Las siguientes celdas mostrarán algunos métodos y atributos de los objetos ```core.groupby.generic.DataFrameGroupBy```."
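Entre esos métodos destaca ```agg()```, que aplica varias funciones de reducción por grupo en una sola llamada. Un esbozo con datos hipotéticos:

```python
import pandas as pd

# Dataframe hipotético; agg() aplica varias reducciones por grupo a la vez.
df = pd.DataFrame({
    "cliente": [19232, 19232, 19233],
    "monto": [10.0, 30.0, 5.0],
})

# El resultado es un dataframe con una columna por cada función solicitada.
resumen = df.groupby("cliente")["monto"].agg(["mean", "sum", "count"])
print(resumen)
```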
371 | ] 372 | }, 373 | { 374 | "cell_type": "markdown", 375 | "metadata": {}, 376 | "source": [ 377 | "**Ejemplos:**" 378 | ] 379 | }, 380 | { 381 | "cell_type": "markdown", 382 | "metadata": {}, 383 | "source": [ 384 | "* La siguiente celda regresará al atributo ```clientes.indices```, el cual es un objeto de tipo ```dict``` donde las claves corresponden a cada valor de agrupación y los valores corresponden a un arreglo que enumera los índices en donde se encontró dicho valor de agrupación." 385 | ] 386 | }, 387 | { 388 | "cell_type": "code", 389 | "execution_count": null, 390 | "metadata": { 391 | "scrolled": true 392 | }, 393 | "outputs": [], 394 | "source": [ 395 | "clientes.indices" 396 | ] 397 | }, 398 | { 399 | "cell_type": "markdown", 400 | "metadata": {}, 401 | "source": [ 402 | "* La siguiente celda regresará una serie en la que el índice corresponde a cada valor de agrupación y los valores corresponden al número de elementos agrupados del objeto ```clientes```." 403 | ] 404 | }, 405 | { 406 | "cell_type": "code", 407 | "execution_count": null, 408 | "metadata": { 409 | "scrolled": true 410 | }, 411 | "outputs": [], 412 | "source": [ 413 | "clientes.size()" 414 | ] 415 | }, 416 | { 417 | "cell_type": "markdown", 418 | "metadata": {}, 419 | "source": [ 420 | "* La siguiente celda regresará un *dataframe* en el que el índice corresponde a cada valor de agrupación y los valores corresponden a la media estadística de los valores agrupados de cada columna restante del *dataframe* original.\n", 421 | "* El parámetro ```numeric_only=True``` le indica al método que aplique el cálculo sólo a aquellas columnas que contengan valores numéricos."
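El comportamiento de ```indices``` y ```size()``` puede verificarse con un esbozo mínimo sobre datos hipotéticos:

```python
import pandas as pd

# Dataframe hipotético para ilustrar 'indices' y 'size()'.
df = pd.DataFrame({
    "cliente": [19232, 19233, 19232],
    "monto": [1.0, 2.0, 3.0],
})
g = df.groupby("cliente")

# 'indices' es un dict: valor de agrupación -> posiciones donde aparece.
posiciones = g.indices

# 'size()' es una serie: valor de agrupación -> número de filas del grupo.
tamanios = g.size()
```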
422 | ] 423 | }, 424 | { 425 | "cell_type": "code", 426 | "execution_count": null, 427 | "metadata": { 428 | "scrolled": false 429 | }, 430 | "outputs": [], 431 | "source": [ 432 | "clientes.mean(numeric_only=True)" 433 | ] 434 | }, 435 | { 436 | "cell_type": "markdown", 437 | "metadata": {}, 438 | "source": [ 439 | "* La siguiente celda aplicará el método ```mean()``` a ```clientes['monto']```." 440 | ] 441 | }, 442 | { 443 | "cell_type": "code", 444 | "execution_count": null, 445 | "metadata": { 446 | "scrolled": true 447 | }, 448 | "outputs": [], 449 | "source": [ 450 | "clientes['monto'].mean(numeric_only=True)" 451 | ] 452 | }, 453 | { 454 | "cell_type": "markdown", 455 | "metadata": {}, 456 | "source": [ 457 | "* La siguiente celda trazará un histograma a partir de los valores en la columna ```\"monto\"``` de cada elemento agrupado." 458 | ] 459 | }, 460 | { 461 | "cell_type": "code", 462 | "execution_count": null, 463 | "metadata": {}, 464 | "outputs": [], 465 | "source": [ 466 | "clientes.hist(column=\"monto\")" 467 | ] 468 | }, 469 | { 470 | "cell_type": "markdown", 471 | "metadata": {}, 472 | "source": [ 473 | "* La siguiente celda usará el método ```apply()``` para aplicar una función que divida a cada valor entre ```100```." 474 | ] 475 | }, 476 | { 477 | "cell_type": "code", 478 | "execution_count": null, 479 | "metadata": {}, 480 | "outputs": [], 481 | "source": [ 482 | "clientes['monto'].apply(func=lambda x: x / 100)" 483 | ] 484 | }, 485 | { 486 | "cell_type": "markdown", 487 | "metadata": {}, 488 | "source": [ 489 | "

\"Licencia
Esta obra está bajo una Licencia Creative Commons Atribución 4.0 Internacional.

\n", 490 | "

© José Luis Chiquete Valdivieso. 2023.

" 491 | ] 492 | } 493 | ], 494 | "metadata": { 495 | "kernelspec": { 496 | "display_name": "Python 3 (ipykernel)", 497 | "language": "python", 498 | "name": "python3" 499 | }, 500 | "language_info": { 501 | "codemirror_mode": { 502 | "name": "ipython", 503 | "version": 3 504 | }, 505 | "file_extension": ".py", 506 | "mimetype": "text/x-python", 507 | "name": "python", 508 | "nbconvert_exporter": "python", 509 | "pygments_lexer": "ipython3", 510 | "version": "3.9.2" 511 | } 512 | }, 513 | "nbformat": 4, 514 | "nbformat_minor": 2 515 | } 516 | -------------------------------------------------------------------------------- /22_extraccion_y_almacenamiento.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "[![img/pythonista.png](img/pythonista.png)](https://www.pythonista.io)" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "# Extracción y almacenamiento de dataframes y series. " 15 | ] 16 | }, 17 | { 18 | "cell_type": "code", 19 | "execution_count": null, 20 | "metadata": {}, 21 | "outputs": [], 22 | "source": [ 23 | "!pip install openpyxl" 24 | ] 25 | }, 26 | { 27 | "cell_type": "code", 28 | "execution_count": null, 29 | "metadata": {}, 30 | "outputs": [], 31 | "source": [ 32 | "import pandas as pd\n", 33 | "import numpy as np" 34 | ] 35 | }, 36 | { 37 | "cell_type": "markdown", 38 | "metadata": {}, 39 | "source": [ 40 | "Una de las fortalezas de *Pandas* es su capacidad de extraer información de diversas fuentes de datos.\n", 41 | "\n", 42 | "En este capítulo se realizará la extracción de un dataframe a partir de un archivo de hoja de cálculo publicado en Internet." 
43 | ] 44 | }, 45 | { 46 | "cell_type": "markdown", 47 | "metadata": {}, 48 | "source": [ 49 | "## El paquete ```xlrd```.\n", 50 | "\n", 51 | "Este paquete permite realizar operaciones de lectura en hojas de cálculo en formatos ```xls``` y ```xlsx```.\n", 52 | "\n", 53 | "La documentación de ```xlrd``` está disponible en:\n", 54 | "\n", 55 | "https://xlrd.readthedocs.io/en/latest/\n", 56 | "\n", 57 | "*Pandas* utiliza ```xlrd``` para extraer información de este tipo de archivos." 58 | ] 59 | }, 60 | { 61 | "cell_type": "code", 62 | "execution_count": null, 63 | "metadata": {}, 64 | "outputs": [], 65 | "source": [ 66 | "!pip install xlrd" 67 | ] 68 | }, 69 | { 70 | "cell_type": "markdown", 71 | "metadata": {}, 72 | "source": [ 73 | "## Funciones de lectura de Pandas.\n", 74 | "\n", 75 | "* ```pd.read_clipboard()``` Permite leer datos que se encuentran en el espacio de memoria del \"clipboard\" del sistema.\n", 76 | "* ```pd.read_csv()``` Permite leer datos que se encuentran en un archivo *CSV*.\n", 77 | "* ```pd.read_excel()``` Permite leer datos que se encuentran en un archivo de *Excel*.\n", 78 | "* ```pd.read_feather()``` Permite leer datos a partir de [*feather*](https://github.com/wesm/feather).\n", 79 | "* ```pd.read_fwf()```.\n", 80 | "* ```pd.read_gbq()``` para [*Google Big Query*](https://pandas-gbq.readthedocs.io/en/latest/).\n", 81 | "* ```pd.read_hdf()``` para [HDF5](https://www.hdfgroup.org/solutions/hdf5).\n", 82 | "* ```pd.read_html()``` https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_html.html.\n", 83 | "* ```pd.read_json()```.\n", 84 | "* ```pd.read_msgpack()``` https://pandas-msgpack.readthedocs.io/en/latest/.\n", 85 | "* ```pd.read_parquet()``` https://databricks.com/glossary/what-is-parquet.\n", 86 | "* ```pd.read_pickle()```.\n", 87 | "* ```pd.read_sas()```.\n", 88 | "* ```pd.read_sql()```.\n", 89 | "* ```pd.read_sql_query()```.\n", 90 | "* ```pd.read_sql_table()```.\n", 91 | "* ```pd.read_stata()```.\n", 92 | "* ```pd.read_table()```.\n", 93 | "\n", 94 | "**Nota:** En la mayor parte de los casos los datos extraídos son almacenados en un *dataframe*." 95 | ] 96 | }, 97 | { 98 | "cell_type": "markdown", 99 | "metadata": {}, 100 | "source": [ 101 | "## Métodos de persistencia y almacenamiento de los dataframes de *Pandas*.\n", 102 | "\n", 103 | "* ```pd.DataFrame.to_clipboard()```\n", 104 | "* ```pd.DataFrame.to_csv()```\n", 105 | "* ```pd.DataFrame.to_dict()```\n", 106 | "* ```pd.DataFrame.to_excel()```\n", 107 | "* ```pd.DataFrame.to_feather()```\n", 108 | "* ```pd.DataFrame.to_gbq()```\n", 109 | "* ```pd.DataFrame.to_hdf()```\n", 110 | "* ```pd.DataFrame.to_html()```\n", 111 | "* ```pd.DataFrame.to_json()```\n", 112 | "* ```pd.DataFrame.to_latex()```\n", 113 | "* ```pd.DataFrame.to_msgpack()```\n", 114 | "* ```pd.DataFrame.to_numpy()```\n", 115 | "* ```pd.DataFrame.to_parquet()```\n", 116 | "* ```pd.DataFrame.to_pickle()```\n", 117 | "* ```pd.DataFrame.to_records()```\n", 118 | "* ```pd.DataFrame.to_sql()```\n", 119 | "* ```pd.DataFrame.to_stata()```" 120 | ] 121 | }, 122 | { 123 | "cell_type": "markdown", 124 | "metadata": {}, 125 | "source": [ 126 | "## Obtención de datos a partir de una hoja de cálculo publicada por el INEGI.\n", 127 | "\n", 128 | "A continuación se descargará el archivo localizado en https://www.inegi.org.mx/contenidos/temas/economia/cn/itaee/tabulados/ori/ITAEE_2.xlsx" 129 | ] 130 | }, 131 | { 132 | "cell_type": "markdown", 133 | "metadata": {}, 134 | "source": [ 135 | "### Obtención del archivo usando ```urllib```."
136 | ] 137 | }, 138 | { 139 | "cell_type": "code", 140 | "execution_count": null, 141 | "metadata": {}, 142 | "outputs": [], 143 | "source": [ 144 | "import urllib.request" 145 | ] 146 | }, 147 | { 148 | "cell_type": "code", 149 | "execution_count": null, 150 | "metadata": {}, 151 | "outputs": [], 152 | "source": [ 153 | "urllib.request.urlretrieve(\"https://www.inegi.org.mx/contenidos/temas/economia/cn/itaee/tabulados/ori/ITAEE_2.xlsx\", \"datos.xlsx\")" 154 | ] 155 | }, 156 | { 157 | "cell_type": "code", 158 | "execution_count": null, 159 | "metadata": { 160 | "scrolled": true 161 | }, 162 | "outputs": [], 163 | "source": [ 164 | "%ls datos.xlsx" 165 | ] 166 | }, 167 | { 168 | "cell_type": "markdown", 169 | "metadata": {}, 170 | "source": [ 171 | "### Carga del archivo con ```pd.read_excel()```." 172 | ] 173 | }, 174 | { 175 | "cell_type": "code", 176 | "execution_count": null, 177 | "metadata": {}, 178 | "outputs": [], 179 | "source": [ 180 | "original = pd.read_excel('datos.xlsx')" 181 | ] 182 | }, 183 | { 184 | "cell_type": "code", 185 | "execution_count": null, 186 | "metadata": { 187 | "scrolled": true 188 | }, 189 | "outputs": [], 190 | "source": [ 191 | "original" 192 | ] 193 | }, 194 | { 195 | "cell_type": "code", 196 | "execution_count": null, 197 | "metadata": { 198 | "scrolled": false 199 | }, 200 | "outputs": [], 201 | "source": [ 202 | "original.head(39)" 203 | ] 204 | }, 205 | { 206 | "cell_type": "markdown", 207 | "metadata": {}, 208 | "source": [ 209 | "### Uso de ```set_index()``` para definir un índice por entidad." 
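La idea de ```set_index()``` puede esbozarse con un dataframe hipotético mínimo antes de aplicarla a los datos descargados:

```python
import pandas as pd

# Dataframe hipotético: una columna pasa a ser el índice de filas.
df = pd.DataFrame({
    "Entidad": ["Aguascalientes", "Baja California"],
    "2019": [1.1, 2.2],
})

# 'inplace=True' modifica el dataframe en lugar de regresar una copia.
df.set_index("Entidad", inplace=True)

# El nombre del índice puede redefinirse mediante el atributo 'name'.
df.index.name = "Entidades"

print(df.loc["Aguascalientes", "2019"])
```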
210 | ] 211 | }, 212 | { 213 | "cell_type": "code", 214 | "execution_count": null, 215 | "metadata": {}, 216 | "outputs": [], 217 | "source": [ 218 | "original.columns" 219 | ] 220 | }, 221 | { 222 | "cell_type": "code", 223 | "execution_count": null, 224 | "metadata": { 225 | "scrolled": true 226 | }, 227 | "outputs": [], 228 | "source": [ 229 | "original.columns.values" 230 | ] 231 | }, 232 | { 233 | "cell_type": "code", 234 | "execution_count": null, 235 | "metadata": {}, 236 | "outputs": [], 237 | "source": [ 238 | "original.columns.values[0]" 239 | ] 240 | }, 241 | { 242 | "cell_type": "code", 243 | "execution_count": null, 244 | "metadata": {}, 245 | "outputs": [], 246 | "source": [ 247 | "original.set_index(original.columns.values[0], inplace=True)" 248 | ] 249 | }, 250 | { 251 | "cell_type": "code", 252 | "execution_count": null, 253 | "metadata": {}, 254 | "outputs": [], 255 | "source": [ 256 | "original.head(39)" 257 | ] 258 | }, 259 | { 260 | "cell_type": "code", 261 | "execution_count": null, 262 | "metadata": {}, 263 | "outputs": [], 264 | "source": [ 265 | "original.index.name = 'Entidades'" 266 | ] 267 | }, 268 | { 269 | "cell_type": "code", 270 | "execution_count": null, 271 | "metadata": { 272 | "scrolled": true 273 | }, 274 | "outputs": [], 275 | "source": [ 276 | "original" 277 | ] 278 | }, 279 | { 280 | "cell_type": "markdown", 281 | "metadata": {}, 282 | "source": [ 283 | "### Obtención de los datos relevantes." 284 | ] 285 | }, 286 | { 287 | "cell_type": "code", 288 | "execution_count": null, 289 | "metadata": {}, 290 | "outputs": [], 291 | "source": [ 292 | "datos = original[6:39]" 293 | ] 294 | }, 295 | { 296 | "cell_type": "code", 297 | "execution_count": null, 298 | "metadata": {}, 299 | "outputs": [], 300 | "source": [ 301 | "datos" 302 | ] 303 | }, 304 | { 305 | "cell_type": "markdown", 306 | "metadata": {}, 307 | "source": [ 308 | "### Creación de un índice de columnas adecuado." 
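Antes de construir el índice de columnas real, el siguiente esbozo reducido (2 años y 2 trimestres hipotéticos) muestra cómo ```pd.MultiIndex.from_product()``` genera columnas jerárquicas y cómo se seleccionan por nivel:

```python
import numpy as np
import pandas as pd

# Versión reducida del índice jerárquico de columnas del ejemplo.
cols = pd.MultiIndex.from_product([[2003, 2004], ["T1", "T2"]],
                                  names=("Año", "Periodo"))
df = pd.DataFrame(np.arange(8).reshape(2, 4), columns=cols)

# Seleccionar el primer nivel regresa un dataframe con el segundo nivel.
anio_2003 = df[2003]

# Encadenar ambos niveles regresa una serie.
t1_2003 = df[2003]["T1"]
```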
309 | ] 310 | }, 311 | { 312 | "cell_type": "code", 313 | "execution_count": null, 314 | "metadata": {}, 315 | "outputs": [], 316 | "source": [ 317 | "periodos = pd.MultiIndex.from_product([[x for x in range(2003, 2023)],\n", 318 | " ['T1', 'T2', 'T3', 'T4', 'Anual']], \n", 319 | " names=('Año', 'Periodo'))" 320 | ] 321 | }, 322 | { 323 | "cell_type": "code", 324 | "execution_count": null, 325 | "metadata": {}, 326 | "outputs": [], 327 | "source": [ 328 | "periodos" 329 | ] 330 | }, 331 | { 332 | "cell_type": "code", 333 | "execution_count": null, 334 | "metadata": {}, 335 | "outputs": [], 336 | "source": [ 337 | "datos.columns = periodos" 338 | ] 339 | }, 340 | { 341 | "cell_type": "code", 342 | "execution_count": null, 343 | "metadata": { 344 | "scrolled": false 345 | }, 346 | "outputs": [], 347 | "source": [ 348 | "datos" 349 | ] 350 | }, 351 | { 352 | "cell_type": "code", 353 | "execution_count": null, 354 | "metadata": { 355 | "scrolled": true 356 | }, 357 | "outputs": [], 358 | "source": [ 359 | "datos[2005]" 360 | ] 361 | }, 362 | { 363 | "cell_type": "code", 364 | "execution_count": null, 365 | "metadata": {}, 366 | "outputs": [], 367 | "source": [ 368 | "datos[2005]['T1']" 369 | ] 370 | }, 371 | { 372 | "cell_type": "code", 373 | "execution_count": null, 374 | "metadata": {}, 375 | "outputs": [], 376 | "source": [ 377 | "datos[2005]['T1'][1:]" 378 | ] 379 | }, 380 | { 381 | "cell_type": "code", 382 | "execution_count": null, 383 | "metadata": {}, 384 | "outputs": [], 385 | "source": [ 386 | "periodo = datos[2005]['T1'][1:]" 387 | ] 388 | }, 389 | { 390 | "cell_type": "code", 391 | "execution_count": null, 392 | "metadata": { 393 | "scrolled": true 394 | }, 395 | "outputs": [], 396 | "source": [ 397 | "periodo.mean()" 398 | ] 399 | }, 400 | { 401 | "cell_type": "code", 402 | "execution_count": null, 403 | "metadata": {}, 404 | "outputs": [], 405 | "source": [ 406 | "periodo.to_excel('datos_utiles.xlsx')" 407 | ] 408 | }, 409 | { 410 | "cell_type": 
"markdown", 411 | "metadata": {}, 412 | "source": [ 413 | "### Extracción y escritura en formato CVS." 414 | ] 415 | }, 416 | { 417 | "cell_type": "code", 418 | "execution_count": null, 419 | "metadata": {}, 420 | "outputs": [], 421 | "source": [ 422 | "nuevos_datos = pd.read_csv('data/datos_filtrados.csv')" 423 | ] 424 | }, 425 | { 426 | "cell_type": "code", 427 | "execution_count": null, 428 | "metadata": {}, 429 | "outputs": [], 430 | "source": [ 431 | "nuevos_datos" 432 | ] 433 | }, 434 | { 435 | "cell_type": "markdown", 436 | "metadata": {}, 437 | "source": [ 438 | "

\"Licencia
Esta obra está bajo una Licencia Creative Commons Atribución 4.0 Internacional.

\n", 439 | "

© José Luis Chiquete Valdivieso. 2023.

" 440 | ] 441 | } 442 | ], 443 | "metadata": { 444 | "kernelspec": { 445 | "display_name": "Python 3 (ipykernel)", 446 | "language": "python", 447 | "name": "python3" 448 | }, 449 | "language_info": { 450 | "codemirror_mode": { 451 | "name": "ipython", 452 | "version": 3 453 | }, 454 | "file_extension": ".py", 455 | "mimetype": "text/x-python", 456 | "name": "python", 457 | "nbconvert_exporter": "python", 458 | "pygments_lexer": "ipython3", 459 | "version": "3.9.2" 460 | } 461 | }, 462 | "nbformat": 4, 463 | "nbformat_minor": 2 464 | } 465 | -------------------------------------------------------------------------------- /23_visualizacion_de_datos_con_pandas.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "[![img/pythonista.png](img/pythonista.png)](https://www.pythonista.io)" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "# Visualización de datos con *Pandas*." 15 | ] 16 | }, 17 | { 18 | "cell_type": "code", 19 | "execution_count": null, 20 | "metadata": {}, 21 | "outputs": [], 22 | "source": [ 23 | "import pandas as pd" 24 | ] 25 | }, 26 | { 27 | "cell_type": "markdown", 28 | "metadata": {}, 29 | "source": [ 30 | "## El atributo ```pandas.Dataframe.plot```.\n", 31 | "\n", 32 | "Este atributo contiene una colección de métodos que permiten deplegar gráficos descriptivos básicos basados en *Matplotlib*.\n", 33 | "\n", 34 | "https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.plot.html" 35 | ] 36 | }, 37 | { 38 | "cell_type": "markdown", 39 | "metadata": {}, 40 | "source": [ 41 | "**Ejemplo:**" 42 | ] 43 | }, 44 | { 45 | "cell_type": "markdown", 46 | "metadata": {}, 47 | "source": [ 48 | "* El archivo ```data/datos_filtrados.csv``` contiene los datos de crecimiento económico trimestral y anual de las Entidad Federativas de la República Mexicana desde *2013*." 
49 | ] 50 | }, 51 | { 52 | "cell_type": "markdown", 53 | "metadata": {}, 54 | "source": [ 55 | "* La siguiente celda creará al *dataframe* ```datos``` a partir del archivo ```data/datos_filtrados.csv```." 56 | ] 57 | }, 58 | { 59 | "cell_type": "code", 60 | "execution_count": null, 61 | "metadata": {}, 62 | "outputs": [], 63 | "source": [ 64 | "datos = pd.read_csv('data/datos_filtrados.csv')" 65 | ] 66 | }, 67 | { 68 | "cell_type": "code", 69 | "execution_count": null, 70 | "metadata": { 71 | "scrolled": false 72 | }, 73 | "outputs": [], 74 | "source": [ 75 | "datos" 76 | ] 77 | }, 78 | { 79 | "cell_type": "markdown", 80 | "metadata": {}, 81 | "source": [ 82 | "* Las siguientes celdas modificarán el *dataframe*." 83 | ] 84 | }, 85 | { 86 | "cell_type": "code", 87 | "execution_count": null, 88 | "metadata": {}, 89 | "outputs": [], 90 | "source": [ 91 | "datos = datos.drop([0, 1])" 92 | ] 93 | }, 94 | { 95 | "cell_type": "code", 96 | "execution_count": null, 97 | "metadata": { 98 | "scrolled": true 99 | }, 100 | "outputs": [], 101 | "source": [ 102 | "datos" 103 | ] 104 | }, 105 | { 106 | "cell_type": "code", 107 | "execution_count": null, 108 | "metadata": {}, 109 | "outputs": [], 110 | "source": [ 111 | "datos.set_index('Año', inplace=True)" 112 | ] 113 | }, 114 | { 115 | "cell_type": "code", 116 | "execution_count": null, 117 | "metadata": {}, 118 | "outputs": [], 119 | "source": [ 120 | "datos" 121 | ] 122 | }, 123 | { 124 | "cell_type": "code", 125 | "execution_count": null, 126 | "metadata": {}, 127 | "outputs": [], 128 | "source": [ 129 | "datos.index.name = \"Entidad\"" 130 | ] 131 | }, 132 | { 133 | "cell_type": "code", 134 | "execution_count": null, 135 | "metadata": {}, 136 | "outputs": [], 137 | "source": [ 138 | "datos" 139 | ] 140 | }, 141 | { 142 | "cell_type": "code", 143 | "execution_count": null, 144 | "metadata": {}, 145 | "outputs": [], 146 | "source": [ 147 | "columnas = pd.MultiIndex.from_product([(x for x in range(2003, 2020)),\n", 148 | " 
('T1', 'T2', 'T3', 'T4', 'Anual')],\n", 149 | " names=['Año', 'Período'])" 150 | ] 151 | }, 152 | { 153 | "cell_type": "code", 154 | "execution_count": null, 155 | "metadata": {}, 156 | "outputs": [], 157 | "source": [ 158 | "datos.columns = columnas" 159 | ] 160 | }, 161 | { 162 | "cell_type": "code", 163 | "execution_count": null, 164 | "metadata": { 165 | "scrolled": true 166 | }, 167 | "outputs": [], 168 | "source": [ 169 | "datos" 170 | ] 171 | }, 172 | { 173 | "cell_type": "code", 174 | "execution_count": null, 175 | "metadata": {}, 176 | "outputs": [], 177 | "source": [ 178 | "datos = datos.astype(float)" 179 | ] 180 | }, 181 | { 182 | "cell_type": "code", 183 | "execution_count": null, 184 | "metadata": {}, 185 | "outputs": [], 186 | "source": [ 187 | "datos[2004]['Anual']" 188 | ] 189 | }, 190 | { 191 | "cell_type": "code", 192 | "execution_count": null, 193 | "metadata": {}, 194 | "outputs": [], 195 | "source": [ 196 | "historico_2004 = datos[2004]['Anual'][1:]" 197 | ] 198 | }, 199 | { 200 | "cell_type": "code", 201 | "execution_count": null, 202 | "metadata": {}, 203 | "outputs": [], 204 | "source": [ 205 | "historico_2004" 206 | ] 207 | }, 208 | { 209 | "cell_type": "code", 210 | "execution_count": null, 211 | "metadata": {}, 212 | "outputs": [], 213 | "source": [ 214 | "historico_2004.plot()" 215 | ] 216 | }, 217 | { 218 | "cell_type": "code", 219 | "execution_count": null, 220 | "metadata": {}, 221 | "outputs": [], 222 | "source": [ 223 | "historico_2004.plot.hist()" 224 | ] 225 | }, 226 | { 227 | "cell_type": "code", 228 | "execution_count": null, 229 | "metadata": {}, 230 | "outputs": [], 231 | "source": [ 232 | "historico_2004.plot.pie()" 233 | ] 234 | }, 235 | { 236 | "cell_type": "code", 237 | "execution_count": null, 238 | "metadata": {}, 239 | "outputs": [], 240 | "source": [ 241 | "historico_2004.plot.area()" 242 | ] 243 | }, 244 | { 245 | "cell_type": "code", 246 | "execution_count": null, 247 | "metadata": {}, 248 | "outputs": [], 249 | 
"source": [ 250 | "historico_2004.plot.bar()" 251 | ] 252 | }, 253 | { 254 | "cell_type": "code", 255 | "execution_count": null, 256 | "metadata": { 257 | "scrolled": true 258 | }, 259 | "outputs": [], 260 | "source": [ 261 | "historico_2004.plot.barh()" 262 | ] 263 | }, 264 | { 265 | "cell_type": "markdown", 266 | "metadata": {}, 267 | "source": [ 268 | "

\"Licencia
Esta obra está bajo una Licencia Creative Commons Atribución 4.0 Internacional.

\n", 269 | "

© José Luis Chiquete Valdivieso. 2021.

" 270 | ] 271 | } 272 | ], 273 | "metadata": { 274 | "kernelspec": { 275 | "display_name": "Python 3 (ipykernel)", 276 | "language": "python", 277 | "name": "python3" 278 | }, 279 | "language_info": { 280 | "codemirror_mode": { 281 | "name": "ipython", 282 | "version": 3 283 | }, 284 | "file_extension": ".py", 285 | "mimetype": "text/x-python", 286 | "name": "python", 287 | "nbconvert_exporter": "python", 288 | "pygments_lexer": "ipython3", 289 | "version": "3.9.2" 290 | } 291 | }, 292 | "nbformat": 4, 293 | "nbformat_minor": 2 294 | } 295 | -------------------------------------------------------------------------------- /24_introduccion_a_matplotlib.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "[![img/pythonista.png](img/pythonista.png)](https://www.pythonista.io)" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "# El paquete *Matplotlib*." 15 | ] 16 | }, 17 | { 18 | "cell_type": "markdown", 19 | "metadata": {}, 20 | "source": [ 21 | "[*Matlplotib*](https://matplotlib.org/) consiste en una biblioteca especializada en visualización de datos y es la base de herramientas más avanzadas como [*Seaborn*](https://seaborn.pydata.org/), [*Plotnine*](https://plotnine.readthedocs.io/en/stable/) y [*Dash*](https://dash.plotly.com/) entre muchas otras.\n", 22 | "\n", 23 | "La sintaxis de *Matplotlib* está basada en la sintaxis de *Matlab*.\n", 24 | "\n", 25 | "Por convención, el paquete ```matplotlib``` se importa como ```mpl```. Se utilizará esta nomenclatura para próximas referencia." 
26 | ] 27 | }, 28 | { 29 | "cell_type": "code", 30 | "execution_count": null, 31 | "metadata": {}, 32 | "outputs": [], 33 | "source": [ 34 | "import numpy as np" 35 | ] 36 | }, 37 | { 38 | "cell_type": "code", 39 | "execution_count": null, 40 | "metadata": {}, 41 | "outputs": [], 42 | "source": [ 43 | "!pip install matplotlib" 44 | ] 45 | }, 46 | { 47 | "cell_type": "markdown", 48 | "metadata": {}, 49 | "source": [ 50 | "## El objeto ```mpl.pyplot```.\n", 51 | "\n", 52 | "*Matplotlib* contiene una muy extensa biblioteca, la cual tiene como componente principal al objeto ```matplotlib.pyplot```. Por convención, ```matplotlib.pyplot``` se importa como ```plt```. Se utilizará esta convención para próximas referencias.\n", 53 | "\n", 54 | "El uso de ```plt``` permite crear objetos capaces de desplegar gráficos y definir múltiples características de éstos.\n", 55 | "\n", 56 | "A lo largo de este capítulo se explorarán algunos de dichos recursos.\n", 57 | "\n", 58 | "https://matplotlib.org/3.5.3/api/_as_gen/matplotlib.pyplot.html" 59 | ] 60 | }, 61 | { 62 | "cell_type": "code", 63 | "execution_count": null, 64 | "metadata": {}, 65 | "outputs": [], 66 | "source": [ 67 | "from matplotlib import pyplot as plt" 68 | ] 69 | }, 70 | { 71 | "cell_type": "markdown", 72 | "metadata": {}, 73 | "source": [ 74 | "### El método ```plt.plot()```." 75 | ] 76 | }, 77 | { 78 | "cell_type": "markdown", 79 | "metadata": {}, 80 | "source": [ 81 | "Esta función permite crear una o más gráficas en 2 dimensiones a partir de arreglos de puntos para los ejes ```x``` y ```y``` que son ingresados como argumentos.\n", 82 | "\n", 83 | "```\n", 84 | "plt.plot(, ,\n", 85 | " , , ... \n", 86 | " , , )\n", 87 | "```" 88 | ] 89 | }, 90 | { 91 | "cell_type": "markdown", 92 | "metadata": {}, 93 | "source": [ 94 | "### El método ```plt.show()```.\n", 95 | "\n", 96 | "El método ```plt. 
show()``` permite desplegar el gráfico creado con ```plt.plot()``` en el entorno desde el que se ejecutan las instrucciones.\n", 97 | "\n", 98 | "**NOTA:** En el caso de las notebooks de *Jupyter*, no es necesario usar ```plt. show()``` para que el gráfico se despliegue al ejecutar una celda con ```plt.plot()```." 99 | ] 100 | }, 101 | { 102 | "cell_type": "markdown", 103 | "metadata": {}, 104 | "source": [ 105 | "**Ejemplos:**" 106 | ] 107 | }, 108 | { 109 | "cell_type": "markdown", 110 | "metadata": {}, 111 | "source": [ 112 | "* Las siguientes celdas crearán un arreglo unidimensinal de *Numpy* con nombre ```x```, el cual contendrá ```500``` segmentos lineales que van de ```0``` a ```3π``` usando la función ```np.linspace()```." 113 | ] 114 | }, 115 | { 116 | "cell_type": "code", 117 | "execution_count": null, 118 | "metadata": { 119 | "scrolled": true 120 | }, 121 | "outputs": [], 122 | "source": [ 123 | "x = np.linspace(0, 3*np.pi, 500)" 124 | ] 125 | }, 126 | { 127 | "cell_type": "code", 128 | "execution_count": null, 129 | "metadata": {}, 130 | "outputs": [], 131 | "source": [ 132 | "x" 133 | ] 134 | }, 135 | { 136 | "cell_type": "markdown", 137 | "metadata": {}, 138 | "source": [ 139 | "* La siguente celda utilizará ```plt.plot()``` para desplegar dos gráficas que unirán cada punto definido mediante una línea, las cuales están en función del arreglo ```x```:\n", 140 | "\n", 141 | "* ```np.sin(x ** 2)```\n", 142 | "* ```np.cos(x ** 2)```" 143 | ] 144 | }, 145 | { 146 | "cell_type": "code", 147 | "execution_count": null, 148 | "metadata": {}, 149 | "outputs": [], 150 | "source": [ 151 | "plt.plot(x, np.sin(x**2), x, np.cos(x ** 2))" 152 | ] 153 | }, 154 | { 155 | "cell_type": "code", 156 | "execution_count": null, 157 | "metadata": {}, 158 | "outputs": [], 159 | "source": [ 160 | "plt.show()" 161 | ] 162 | }, 163 | { 164 | "cell_type": "markdown", 165 | "metadata": {}, 166 | "source": [ 167 | "* Las celdas anteriores crearon de forma automática un objeto de 
tipo ```plt.Figure```, el cual fue utilizado para desplegar las gráficas correspondientes." 168 | ] 169 | }, 170 | { 171 | "cell_type": "markdown", 172 | "metadata": {}, 173 | "source": [ 174 | "* La siguiente celda incluye algunas funciones de ```plt``` que definen el título del gráfico (```plt.title()```) y las etiquetas de los ejes (```plt.xlabel()``` y ```plt.ylabel()```)." 175 | ] 176 | }, 177 | { 178 | "cell_type": "code", 179 | "execution_count": null, 180 | "metadata": {}, 181 | "outputs": [], 182 | "source": [ 183 | "plt.plot(x, np.sin(x**2), x, np.cos(x ** 2))\n", 184 | "plt.title('Funciones sinusoidales')\n", 185 | "plt.xlabel('Eje de las x')\n", 186 | "plt.ylabel('f(x)')\n", 187 | "plt.show()" 188 | ] 189 | }, 190 | { 191 | "cell_type": "markdown", 192 | "metadata": {}, 193 | "source": [ 194 | "
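Las instrucciones anteriores (```plt.plot()```, ```plt.title()```, las etiquetas de los ejes y ```plt.show()```) pueden reunirse en un guion mínimo. El siguiente esbozo no forma parte del notebook original; asume un entorno sin interfaz gráfica, por lo que selecciona el backend ```Agg``` y escribe la figura en un búfer en memoria en lugar de abrir una ventana:

```python
# Esbozo mínimo del flujo plt.plot() -> título -> etiquetas -> salida.
# Supuesto: entorno sin GUI, por lo que se usa el backend "Agg".
import io

import matplotlib
matplotlib.use("Agg")  # backend que dibuja en memoria, sin ventanas

import numpy as np
from matplotlib import pyplot as plt

x = np.linspace(0, 3 * np.pi, 500)
plt.plot(x, np.sin(x ** 2), x, np.cos(x ** 2))
plt.title('Funciones sinusoidales')
plt.xlabel('Eje de las x')
plt.ylabel('f(x)')

ax = plt.gca()
print(len(ax.lines))    # plt.plot() con dos pares (x, y) crea dos líneas

# En un script interactivo aquí iría plt.show(); en su lugar se guarda
# la figura en un búfer para comprobar que se generó correctamente.
buffer = io.BytesIO()
plt.savefig(buffer, format='png')
print(buffer.getbuffer().nbytes > 0)
```

En una notebook de *Jupyter* basta con ejecutar la celda que contiene ```plt.plot()``` para que la figura se despliegue.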
Esta obra está bajo una Licencia Creative Commons Atribución 4.0 Internacional.

\n", 195 | "

© José Luis Chiquete Valdivieso. 2023.

" 196 | ] 197 | } 198 | ], 199 | "metadata": { 200 | "kernelspec": { 201 | "display_name": "Python 3 (ipykernel)", 202 | "language": "python", 203 | "name": "python3" 204 | }, 205 | "language_info": { 206 | "codemirror_mode": { 207 | "name": "ipython", 208 | "version": 3 209 | }, 210 | "file_extension": ".py", 211 | "mimetype": "text/x-python", 212 | "name": "python", 213 | "nbconvert_exporter": "python", 214 | "pygments_lexer": "ipython3", 215 | "version": "3.9.2" 216 | } 217 | }, 218 | "nbformat": 4, 219 | "nbformat_minor": 2 220 | } 221 | -------------------------------------------------------------------------------- /26_tipos_basicos_de_graficos.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "[![img/pythonista.png](img/pythonista.png)](https://www.pythonista.io)" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "# Tipos básicos de gráfico de *Matplotlib*." 15 | ] 16 | }, 17 | { 18 | "cell_type": "markdown", 19 | "metadata": {}, 20 | "source": [ 21 | "https://matplotlib.org/stable/plot_types/index.html\n", 22 | "\n", 23 | "https://matplotlib.org/stable/gallery/" 24 | ] 25 | }, 26 | { 27 | "cell_type": "code", 28 | "execution_count": null, 29 | "metadata": {}, 30 | "outputs": [], 31 | "source": [ 32 | "import matplotlib as mpl\n", 33 | "import numpy as np\n", 34 | "from matplotlib import pyplot as plt" 35 | ] 36 | }, 37 | { 38 | "cell_type": "markdown", 39 | "metadata": {}, 40 | "source": [ 41 | "## Preliminares." 
42 | ] 43 | }, 44 | { 45 | "cell_type": "code", 46 | "execution_count": null, 47 | "metadata": {}, 48 | "outputs": [], 49 | "source": [ 50 | "np.random.seed(12314124)" 51 | ] 52 | }, 53 | { 54 | "cell_type": "code", 55 | "execution_count": null, 56 | "metadata": {}, 57 | "outputs": [], 58 | "source": [ 59 | "normal = np.random.normal(50, 20, 500)" 60 | ] 61 | }, 62 | { 63 | "cell_type": "code", 64 | "execution_count": null, 65 | "metadata": {}, 66 | "outputs": [], 67 | "source": [ 68 | "normal" 69 | ] 70 | }, 71 | { 72 | "cell_type": "code", 73 | "execution_count": null, 74 | "metadata": {}, 75 | "outputs": [], 76 | "source": [ 77 | "normal2 = np.random.normal(25, 12, 500)" 78 | ] 79 | }, 80 | { 81 | "cell_type": "code", 82 | "execution_count": null, 83 | "metadata": {}, 84 | "outputs": [], 85 | "source": [ 86 | "ancho = np.random.randint(5, 60, 500)" 87 | ] 88 | }, 89 | { 90 | "cell_type": "markdown", 91 | "metadata": {}, 92 | "source": [ 93 | "## Histograma." 94 | ] 95 | }, 96 | { 97 | "cell_type": "markdown", 98 | "metadata": {}, 99 | "source": [ 100 | "https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.hist.html" 101 | ] 102 | }, 103 | { 104 | "cell_type": "code", 105 | "execution_count": null, 106 | "metadata": {}, 107 | "outputs": [], 108 | "source": [ 109 | "plt.hist(normal)" 110 | ] 111 | }, 112 | { 113 | "cell_type": "code", 114 | "execution_count": null, 115 | "metadata": { 116 | "scrolled": true 117 | }, 118 | "outputs": [], 119 | "source": [ 120 | "plt.hist(normal, histtype='step')" 121 | ] 122 | }, 123 | { 124 | "cell_type": "code", 125 | "execution_count": null, 126 | "metadata": { 127 | "scrolled": true 128 | }, 129 | "outputs": [], 130 | "source": [ 131 | "plt.hist(normal, orientation='horizontal')" 132 | ] 133 | }, 134 | { 135 | "cell_type": "code", 136 | "execution_count": null, 137 | "metadata": { 138 | "scrolled": false 139 | }, 140 | "outputs": [], 141 | "source": [ 142 | "plt.hist(normal, color='red')" 143 | ] 144 | }, 145 | { 146 | 
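Para ver numéricamente lo que ```plt.hist()``` calcula al agrupar los datos, puede usarse ```np.histogram()```, que regresa las frecuencias por intervalo y los bordes de los intervalos sin dibujar nada. El siguiente esbozo no forma parte del notebook original y reutiliza la misma semilla y distribución de las celdas anteriores:

```python
import numpy as np

# Mismos datos que en las celdas anteriores.
np.random.seed(12314124)
normal = np.random.normal(50, 20, 500)

# np.histogram() realiza el mismo conteo por intervalos (bins) que
# plt.hist(); por omisión ambos usan 10 intervalos.
frecuencias, bordes = np.histogram(normal)

print(len(frecuencias))    # → 10 intervalos
print(len(bordes))         # → 11 bordes (uno más que intervalos)
print(frecuencias.sum())   # → 500, el total de observaciones
```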
"cell_type": "code", 147 | "execution_count": null, 148 | "metadata": { 149 | "scrolled": true 150 | }, 151 | "outputs": [], 152 | "source": [ 153 | "plt.hist(normal, bins=13)" 154 | ] 155 | }, 156 | { 157 | "cell_type": "code", 158 | "execution_count": null, 159 | "metadata": {}, 160 | "outputs": [], 161 | "source": [ 162 | "plt.hist(normal, stacked=True)" 163 | ] 164 | }, 165 | { 166 | "cell_type": "markdown", 167 | "metadata": {}, 168 | "source": [ 169 | "## Histograma 2D." 170 | ] 171 | }, 172 | { 173 | "cell_type": "markdown", 174 | "metadata": {}, 175 | "source": [ 176 | "https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.hist2d.html" 177 | ] 178 | }, 179 | { 180 | "cell_type": "code", 181 | "execution_count": null, 182 | "metadata": {}, 183 | "outputs": [], 184 | "source": [ 185 | "plt.hist2d(normal, normal2)" 186 | ] 187 | }, 188 | { 189 | "cell_type": "markdown", 190 | "metadata": {}, 191 | "source": [ 192 | "## Boxplot." 193 | ] 194 | }, 195 | { 196 | "cell_type": "markdown", 197 | "metadata": {}, 198 | "source": [ 199 | "https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.boxplot.html" 200 | ] 201 | }, 202 | { 203 | "cell_type": "code", 204 | "execution_count": null, 205 | "metadata": { 206 | "scrolled": true 207 | }, 208 | "outputs": [], 209 | "source": [ 210 | "plt.boxplot(normal)" 211 | ] 212 | }, 213 | { 214 | "cell_type": "code", 215 | "execution_count": null, 216 | "metadata": {}, 217 | "outputs": [], 218 | "source": [ 219 | "plt.boxplot((normal, normal2))" 220 | ] 221 | }, 222 | { 223 | "cell_type": "markdown", 224 | "metadata": {}, 225 | "source": [ 226 | "## Scatterplot." 
227 | ] 228 | }, 229 | { 230 | "cell_type": "markdown", 231 | "metadata": {}, 232 | "source": [ 233 | "https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.scatter.html" 234 | ] 235 | }, 236 | { 237 | "cell_type": "code", 238 | "execution_count": null, 239 | "metadata": { 240 | "scrolled": true 241 | }, 242 | "outputs": [], 243 | "source": [ 244 | "plt.scatter(normal, normal2)" 245 | ] 246 | }, 247 | { 248 | "cell_type": "code", 249 | "execution_count": null, 250 | "metadata": {}, 251 | "outputs": [], 252 | "source": [ 253 | "plt.scatter(normal, normal2, s=ancho, c=ancho)" 254 | ] 255 | }, 256 | { 257 | "cell_type": "markdown", 258 | "metadata": {}, 259 | "source": [ 260 | "
Esta obra está bajo una Licencia Creative Commons Atribución 4.0 Internacional.

\n", 261 | "

© José Luis Chiquete Valdivieso. 2023.

" 262 | ] 263 | } 264 | ], 265 | "metadata": { 266 | "kernelspec": { 267 | "display_name": "Python 3 (ipykernel)", 268 | "language": "python", 269 | "name": "python3" 270 | }, 271 | "language_info": { 272 | "codemirror_mode": { 273 | "name": "ipython", 274 | "version": 3 275 | }, 276 | "file_extension": ".py", 277 | "mimetype": "text/x-python", 278 | "name": "python", 279 | "nbconvert_exporter": "python", 280 | "pygments_lexer": "ipython3", 281 | "version": "3.9.2" 282 | } 283 | }, 284 | "nbformat": 4, 285 | "nbformat_minor": 2 286 | } 287 | -------------------------------------------------------------------------------- /27_introduccion_a_plotnine.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "[![img/pythonista.png](img/pythonista.png)](https://www.pythonista.io)" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "# Introducción a *Plotnine*." 
15 | ] 16 | }, 17 | { 18 | "cell_type": "markdown", 19 | "metadata": {}, 20 | "source": [ 21 | "*Plotnine* es un proyecto que pretende implementar en *Python* las funcionalidades de [*ggplot2*](https://ggplot2.tidyverse.org/), la popular herramienta de visualización de datos para *R* y la [gramática de gráficas por capas](http://vita.had.co.nz/papers/layered-grammar.html); ambas creadas por [*Hadley Wickham*](https://hadley.nz/) y basadas en la gramática de gráficas de [*Leland Wilkinson*](https://en.wikipedia.org/wiki/Leland_Wilkinson).\n", 22 | "\n", 23 | "Aún cuando muchas de las funcionalidades de *ggplot2* ya han sido portadas, aún quedan muchas que se encuentran pendientes.\n", 24 | "\n", 25 | "\n", 26 | "La documentación oficial de *Plotnine* puede ser consultada en la siguiente liga:\n", 27 | "\n", 28 | "https://plotnine.readthedocs.io/en/stable/\n", 29 | "\n", 30 | "La siguiente liga apunta a un breve tutorial sobre *Plotnine* tomando como base a *ggplot2*: \n", 31 | "\n", 32 | "https://datascienceworkshops.com/blog/plotnine-grammar-of-graphics-for-python/" 33 | ] 34 | }, 35 | { 36 | "cell_type": "code", 37 | "execution_count": null, 38 | "metadata": { 39 | "scrolled": true 40 | }, 41 | "outputs": [], 42 | "source": [ 43 | "!pip install plotnine" 44 | ] 45 | }, 46 | { 47 | "cell_type": "code", 48 | "execution_count": null, 49 | "metadata": {}, 50 | "outputs": [], 51 | "source": [ 52 | "from plotnine import *\n", 53 | "import pandas as pd\n", 54 | "import numpy as np\n", 55 | "from datetime import datetime\n", 56 | "from typing import Any" 57 | ] 58 | }, 59 | { 60 | "cell_type": "markdown", 61 | "metadata": {}, 62 | "source": [ 63 | "## Gramática de capas de un gráfico." 
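En *plotnine*, igual que en *ggplot2*, las capas de la gramática se combinan con el operador ```+```. El siguiente esbozo conceptual no forma parte del notebook original y usa clases hipotéticas (no la implementación real de *plotnine*); solo muestra cómo el método ```__add__``` permite acumular capas sobre un gráfico base:

```python
# Clases hipotéticas que imitan la idea de la gramática por capas:
# un gráfico base al que se le suman capas con el operador "+".
class GraficoBase:
    def __init__(self, data):
        self.data = data
        self.capas = []

    def __add__(self, capa):
        # Cada "+" regresa un gráfico nuevo con una capa adicional,
        # sin modificar el gráfico original.
        nuevo = GraficoBase(self.data)
        nuevo.capas = self.capas + [capa]
        return nuevo


base = GraficoBase(data=[1, 2, 3])
grafico = base + "geom_histogram" + "theme_xkcd"

print(len(base.capas))      # → 0: el gráfico base no cambia
print(grafico.capas)        # → ['geom_histogram', 'theme_xkcd']
```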
64 | ] 65 | }, 66 | { 67 | "cell_type": "markdown", 68 | "metadata": {}, 69 | "source": [ 70 | "La gramática de capas define una estructura de elementos que conforman un gráfico.\n", 71 | "\n", 72 | "* Datos.\n", 73 | "* Mapeo.\n", 74 | "* Estética.\n", 75 | "* Objetos geométricos.\n", 76 | "* Escalas.\n", 77 | "* Especificación de facetas.\n", 78 | "* Transformaciones.\n", 79 | "* Sistema de coordenadas." 80 | ] 81 | }, 82 | { 83 | "cell_type": "markdown", 84 | "metadata": {}, 85 | "source": [ 86 | "## Sintaxis de la gramática." 87 | ] 88 | }, 89 | { 90 | "cell_type": "markdown", 91 | "metadata": {}, 92 | "source": [ 93 | "### La función ```ggplot()```." 94 | ] 95 | }, 96 | { 97 | "cell_type": "markdown", 98 | "metadata": {}, 99 | "source": [ 100 | "```\n", 101 | "ggplot(data=<datos>, mapping=<mapeo>, <otros argumentos>)\n", 102 | "```\n", 103 | "\n", 104 | "https://plotnine.readthedocs.io/en/stable/generated/plotnine.ggplot.html" 105 | ] 106 | }, 107 | { 108 | "cell_type": "markdown", 109 | "metadata": {}, 110 | "source": [ 111 | "### La función ```plotnine.mapping.aes()```.\n", 112 | "\n", 113 | "https://plotnine.readthedocs.io/en/stable/generated/plotnine.mapping.aes.html" 114 | ] 115 | }, 116 | { 117 | "cell_type": "markdown", 118 | "metadata": {}, 119 | "source": [ 120 | "### Funciones de geometría." 121 | ] 122 | }, 123 | { 124 | "cell_type": "markdown", 125 | "metadata": {}, 126 | "source": [ 127 | "https://plotnine.readthedocs.io/en/stable/api.html#geoms" 128 | ] 129 | }, 130 | { 131 | "cell_type": "markdown", 132 | "metadata": {}, 133 | "source": [ 134 | "## Funciones de temas.\n", 135 | "\n", 136 | "https://plotnine.readthedocs.io/en/stable/api.html#themes" 137 | ] 138 | }, 139 | { 140 | "cell_type": "markdown", 141 | "metadata": {}, 142 | "source": [ 143 | "## Ejemplos." 144 | ] 145 | }, 146 | { 147 | "cell_type": "markdown", 148 | "metadata": {}, 149 | "source": [ 150 | "### Ejemplo de histograma."
151 | ] 152 | }, 153 | { 154 | "cell_type": "code", 155 | "execution_count": null, 156 | "metadata": {}, 157 | "outputs": [], 158 | "source": [ 159 | "np.random.seed(23523889)" 160 | ] 161 | }, 162 | { 163 | "cell_type": "code", 164 | "execution_count": null, 165 | "metadata": {}, 166 | "outputs": [], 167 | "source": [ 168 | "arreglo_base = pd.DataFrame(np.random.normal(12, 25, 1000),\n", 169 | " columns=pd.Index(['observaciones']))\n", 170 | "arreglo_base" 171 | ] 172 | }, 173 | { 174 | "cell_type": "code", 175 | "execution_count": null, 176 | "metadata": { 177 | "scrolled": false 178 | }, 179 | "outputs": [], 180 | "source": [ 181 | "ggplot(data=arreglo_base)" 182 | ] 183 | }, 184 | { 185 | "cell_type": "code", 186 | "execution_count": null, 187 | "metadata": {}, 188 | "outputs": [], 189 | "source": [ 190 | "ggplot(data=arreglo_base, mapping=aes(x='observaciones')) + geom_histogram()" 191 | ] 192 | }, 193 | { 194 | "cell_type": "code", 195 | "execution_count": null, 196 | "metadata": { 197 | "scrolled": true 198 | }, 199 | "outputs": [], 200 | "source": [ 201 | "(ggplot(data=arreglo_base, mapping=aes(x='observaciones')) + \n", 202 | "geom_histogram(bins=10, fill='yellow', color=\"orange\"))" 203 | ] 204 | }, 205 | { 206 | "cell_type": "code", 207 | "execution_count": null, 208 | "metadata": {}, 209 | "outputs": [], 210 | "source": [ 211 | "histograma = pd.DataFrame(np.histogram(arreglo_base, bins=13)).T\n", 212 | "histograma.columns = pd.Index(['frecuencias','rangos'])\n", 213 | "histograma" 214 | ] 215 | }, 216 | { 217 | "cell_type": "code", 218 | "execution_count": null, 219 | "metadata": {}, 220 | "outputs": [], 221 | "source": [] 222 | }, 223 | { 224 | "cell_type": "markdown", 225 | "metadata": {}, 226 | "source": [ 227 | "### Ejemplo de columnas." 
228 | ] 229 | }, 230 | { 231 | "cell_type": "code", 232 | "execution_count": null, 233 | "metadata": { 234 | "scrolled": true 235 | }, 236 | "outputs": [], 237 | "source": [ 238 | "ggplot(histograma, aes(x='rangos', y='frecuencias', fill='rangos')) + geom_col()" 239 | ] 240 | }, 241 | { 242 | "cell_type": "markdown", 243 | "metadata": {}, 244 | "source": [ 245 | "### Ejemplo de líneas." 246 | ] 247 | }, 248 | { 249 | "cell_type": "code", 250 | "execution_count": null, 251 | "metadata": {}, 252 | "outputs": [], 253 | "source": [ 254 | "casos = pd.read_csv('data/data_covid.csv')\n", 255 | "columnas = casos.columns.values\n", 256 | "columnas[0] = 'Fechas'\n", 257 | "casos.columns = pd.Index(columnas)\n", 258 | "casos.columns.name = \"Entidades\"\n", 259 | "casos['Fechas'] = pd.to_datetime(casos['Fechas'])\n", 260 | "casos.set_index('Fechas', inplace=True)\n", 261 | "casos" 262 | ] 263 | }, 264 | { 265 | "cell_type": "code", 266 | "execution_count": null, 267 | "metadata": {}, 268 | "outputs": [], 269 | "source": [ 270 | "(ggplot(casos, aes(x=casos.index, y='Nacional'))\n", 271 | "+ geom_line())" 272 | ] 273 | }, 274 | { 275 | "cell_type": "code", 276 | "execution_count": null, 277 | "metadata": {}, 278 | "outputs": [], 279 | "source": [ 280 | "(ggplot(casos, aes(x=casos.index, y='Nacional'))\n", 281 | "+ geom_line() \n", 282 | "+ geom_smooth())" 283 | ] 284 | }, 285 | { 286 | "cell_type": "code", 287 | "execution_count": null, 288 | "metadata": { 289 | "scrolled": true 290 | }, 291 | "outputs": [], 292 | "source": [ 293 | "(ggplot(casos, aes(x=casos.index, y='Nacional'))\n", 294 | "+ geom_line() \n", 295 | "+ geom_smooth(span=0.07, color='red'))" 296 | ] 297 | }, 298 | { 299 | "cell_type": "code", 300 | "execution_count": null, 301 | "metadata": { 302 | "scrolled": false 303 | }, 304 | "outputs": [], 305 | "source": [ 306 | "(ggplot(casos, aes(x=casos.index, y='Nacional')) \n", 307 | " + geom_line() \n", 308 | " + geom_smooth(span=0.07, color='blue') \n", 309 | " 
+ theme_xkcd()\n", 310 | " + theme(axis_text_x=element_text(rotation=90, hjust=0.5)))" 311 | ] 312 | }, 313 | { 314 | "cell_type": "code", 315 | "execution_count": null, 316 | "metadata": {}, 317 | "outputs": [], 318 | "source": [ 319 | "(ggplot(casos, aes(x=casos.index, y='Nacional')) \n", 320 | " + geom_line(color='red') \n", 321 | " + geom_smooth(span=0.07, color='blue') \n", 322 | " + theme_tufte()\n", 323 | " + theme(axis_text_x=element_text(rotation=45, hjust=1)))" 324 | ] 325 | }, 326 | { 327 | "cell_type": "markdown", 328 | "metadata": {}, 329 | "source": [ 330 | "### Ejemplo de columnas." 331 | ] 332 | }, 333 | { 334 | "cell_type": "code", 335 | "execution_count": null, 336 | "metadata": {}, 337 | "outputs": [], 338 | "source": [ 339 | "data = casos.drop('Nacional', axis=1).T[datetime(2021,1,1)].to_frame()\n", 340 | "data.columns = pd.Index(['Casos'])\n", 341 | "data" 342 | ] 343 | }, 344 | { 345 | "cell_type": "code", 346 | "execution_count": null, 347 | "metadata": {}, 348 | "outputs": [], 349 | "source": [ 350 | "(ggplot(data, aes(x=data.index, y=data, fill='Casos')) \n", 351 | " + geom_col()\n", 352 | " + theme(axis_text_x=element_text(rotation=90, hjust=0.5)))" 353 | ] 354 | }, 355 | { 356 | "cell_type": "markdown", 357 | "metadata": {}, 358 | "source": [ 359 | "
Esta obra está bajo una Licencia Creative Commons Atribución 4.0 Internacional.

\n", 360 | "

© José Luis Chiquete Valdivieso. 2022.

" 361 | ] 362 | } 363 | ], 364 | "metadata": { 365 | "kernelspec": { 366 | "display_name": "Python 3 (ipykernel)", 367 | "language": "python", 368 | "name": "python3" 369 | }, 370 | "language_info": { 371 | "codemirror_mode": { 372 | "name": "ipython", 373 | "version": 3 374 | }, 375 | "file_extension": ".py", 376 | "mimetype": "text/x-python", 377 | "name": "python", 378 | "nbconvert_exporter": "python", 379 | "pygments_lexer": "ipython3", 380 | "version": "3.9.2" 381 | } 382 | }, 383 | "nbformat": 4, 384 | "nbformat_minor": 2 385 | } 386 | -------------------------------------------------------------------------------- /29_objetos_de_seaborn.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "[![img/pythonista.png](img/pythonista.png)](https://www.pythonista.io)" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "# Objetos de *Seaborn*." 15 | ] 16 | }, 17 | { 18 | "cell_type": "markdown", 19 | "metadata": {}, 20 | "source": [ 21 | "Aún cuando las funciones de *Seaborn* son muy populares y fáciles de desarrollar, tienen ciertas desventajas con respecto a otras soluciones basadas en gramáticas de gráficas. A partir de la versión 0.12, *Seaborn* cuenta con una biblioteca de objetos que se apega a dicha gramática.\n", 22 | "\n", 23 | "\n", 24 | "\n", 25 | "\n", 26 | "Por convención, la biblioteca de objetos de *Seaborn* ```seaborn.objects``` es importada como ```so```. En adelante, se seguirá dicha convención.\n", 27 | "\n", 28 | "https://seaborn.pydata.org/tutorial/objects_interface.html\n", 29 | "\n", 30 | "**NOTA:** Debido a que los objetos de *Seaborn* son una adición muy reciente, aún tienen funcionalidades limitadas en comparación con las funciones."
31 | ] 32 | }, 33 | { 34 | "cell_type": "code", 35 | "execution_count": null, 36 | "metadata": {}, 37 | "outputs": [], 38 | "source": [ 39 | "import seaborn as sns\n", 40 | "from seaborn import objects as so" 41 | ] 42 | }, 43 | { 44 | "cell_type": "markdown", 45 | "metadata": {}, 46 | "source": [ 47 | "## La clase ```so.Plot```.\n", 48 | "\n", 49 | "Los componentes principales de este tipo de visualizaciones son los objetos instanciados de la clase ```so.Plot```.\n", 50 | "\n", 51 | "```\n", 52 | "so.Plot(data=<datos>, x=<x>, y=<y>, <otros argumentos>)\n", 53 | "```\n", 54 | "\n", 55 | "Donde:\n", 56 | "\n", 57 | "* ```<datos>``` es un *dataset* compatible con un *dataframe* de *Pandas*.\n", 58 | "* ```<x>``` es el identificador de la columna de ```<datos>``` que se utilizará para el eje de las $x$ en caso de que se requiera.\n", 59 | "* ```<y>``` es el identificador de la columna de ```<datos>``` que se utilizará para el eje de las $y$ en caso de que se requiera.\n", 60 | "\n", 61 | "\n", 62 | "Los objetos instanciados de ```so.Plot``` cuentan con varios métodos y atributos que permiten cumplir con las funcionalidades de las capas:\n", 63 | "\n", 64 | "* Datos.\n", 65 | "* Estética.\n", 66 | "* Escala.\n", 67 | "* Facetas.\n", 68 | "\n", 69 | "https://seaborn.pydata.org/generated/seaborn.objects.Plot.html" 70 | ] 71 | }, 72 | { 73 | "cell_type": "markdown", 74 | "metadata": {}, 75 | "source": [ 76 | "### Métodos en cascada.\n", 77 | "\n", 78 | "Los métodos de los objetos instanciados de ```so.Plot``` también regresan objetos instanciados de ```so.Plot```, por lo que es posible aplicar métodos en cascada." 79 | ] 80 | }, 81 | { 82 | "cell_type": "markdown", 83 | "metadata": {}, 84 | "source": [ 85 | "### El método ```so.Plot.add()```.\n", 86 | "\n", 87 | "Este método permite añadir funcionalidades que extienden a los objetos de tipo ```so.Plot``` en las capas de objetos geométricos y estadística, principalmente.\n", 88 | "\n", 89 | "```\n", 90 | "so.Plot.add(<función>(), <función>(), ... <función>())\n",
91 | "```\n", 92 | "\n", 93 | "Donde:\n", 94 | "\n", 95 | "* Cada ```<función>``` es una función de ```seaborn.objects```." 96 | ] 97 | }, 98 | { 99 | "cell_type": "markdown", 100 | "metadata": {}, 101 | "source": [ 102 | "**Ejemplo:**" 103 | ] 104 | }, 105 | { 106 | "cell_type": "code", 107 | "execution_count": null, 108 | "metadata": {}, 109 | "outputs": [], 110 | "source": [ 111 | "dataset = sns.load_dataset(\"iris\")\n", 112 | "dataset" 113 | ] 114 | }, 115 | { 116 | "cell_type": "code", 117 | "execution_count": null, 118 | "metadata": { 119 | "scrolled": true 120 | }, 121 | "outputs": [], 122 | "source": [ 123 | "so.Plot(data=dataset, x=\"sepal_length\", y=\"sepal_width\")" 124 | ] 125 | }, 126 | { 127 | "cell_type": "code", 128 | "execution_count": null, 129 | "metadata": { 130 | "scrolled": true 131 | }, 132 | "outputs": [], 133 | "source": [ 134 | "(so.Plot(data=dataset, x=\"sepal_length\", y=\"sepal_width\")\n", 135 | " .add(so.Dots()))" 136 | ] 137 | }, 138 | { 139 | "cell_type": "code", 140 | "execution_count": null, 141 | "metadata": {}, 142 | "outputs": [], 143 | "source": [ 144 | "(so.Plot(data=dataset, x=\"sepal_length\", y=\"sepal_width\", color=\"species\")\n", 145 | " .add(so.Dots()))" 146 | ] 147 | }, 148 | { 149 | "cell_type": "code", 150 | "execution_count": null, 151 | "metadata": { 152 | "scrolled": true 153 | }, 154 | "outputs": [], 155 | "source": [ 156 | "(so.Plot(data=dataset, x='sepal_length',\n", 157 | " y='sepal_width')\n", 158 | " .facet('species'))" 159 | ] 160 | }, 161 | { 162 | "cell_type": "code", 163 | "execution_count": null, 164 | "metadata": {}, 165 | "outputs": [], 166 | "source": [ 167 | "(so.Plot(data=dataset, x='sepal_length', color='sepal_length').\n", 168 | " facet('species').\n", 169 | " add(so.Bar(), so.Hist()))" 170 | ] 171 | }, 172 | { 173 | "cell_type": "code", 174 | "execution_count": null, 175 | "metadata": { 176 | "scrolled": true 177 | }, 178 | "outputs": [], 179 | "source": [ 
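El encadenamiento descrito en la sección «Métodos en cascada» puede ilustrarse con *Python* puro. La clase siguiente es hipotética y no corresponde a la API real de ```seaborn.objects```; solo muestra el patrón de métodos que regresan objetos del mismo tipo:

```python
# Clase hipotética que imita el patrón de so.Plot: cada método regresa
# una instancia nueva, lo que permite encadenar llamadas en cascada.
class PlotConceptual:
    def __init__(self, pasos=()):
        self.pasos = tuple(pasos)

    def _extender(self, paso):
        # Se crea una instancia nueva en lugar de mutar la actual.
        return PlotConceptual(self.pasos + (paso,))

    def add(self, marca):
        return self._extender(f"add:{marca}")

    def facet(self, columna):
        return self._extender(f"facet:{columna}")


p = PlotConceptual().add("Dots").facet("species")
print(p.pasos)  # → ('add:Dots', 'facet:species')
```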
"(so.Plot(data=dataset, x='sepal_length', color='sepal_length').\n", 181 | " facet('species').\n", 182 | " add(so.Bar(), so.Hist()).scale(x=1))" 183 | ] 184 | }, 185 | { 186 | "cell_type": "code", 187 | "execution_count": null, 188 | "metadata": {}, 189 | "outputs": [], 190 | "source": [ 191 | "(so.Plot(data=dataset, x='sepal_length', color='sepal_length').\n", 192 | " add(so.Bar(), so.S()))" 193 | ] 194 | }, 195 | { 196 | "cell_type": "markdown", 197 | "metadata": {}, 198 | "source": [ 199 | "
Esta obra está bajo una Licencia Creative Commons Atribución 4.0 Internacional.

\n", 200 | "

© José Luis Chiquete Valdivieso. 2022.

" 201 | ] 202 | } 203 | ], 204 | "metadata": { 205 | "kernelspec": { 206 | "display_name": "Python 3 (ipykernel)", 207 | "language": "python", 208 | "name": "python3" 209 | }, 210 | "language_info": { 211 | "codemirror_mode": { 212 | "name": "ipython", 213 | "version": 3 214 | }, 215 | "file_extension": ".py", 216 | "mimetype": "text/x-python", 217 | "name": "python", 218 | "nbconvert_exporter": "python", 219 | "pygments_lexer": "ipython3", 220 | "version": "3.10.6" 221 | } 222 | }, 223 | "nbformat": 4, 224 | "nbformat_minor": 2 225 | } 226 | -------------------------------------------------------------------------------- /30_introduccion_a_dask.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "[![img/pythonista.png](img/pythonista.png)](https://www.pythonista.io)" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "# Introducción a *Dask*." 
15 | ] 16 | }, 17 | { 18 | "cell_type": "markdown", 19 | "metadata": {}, 20 | "source": [ 21 | "Las bibliotecas de *Scipy* tienen limitaciones en cuanto a su capacidad de escalar de forma horizontal y aún cuando son capaces de realizar *multithreading* para procesamiento en paralelo, están restringidas a la cantidad de recursos disponibles de la máquina desde la que son ejecutadas.\n", 22 | "\n", 23 | "[*Dask*](https://dask.org/) es una biblioteca general para cómputo paralelo que permite escalar sus operaciones por medio de clústers (grupos de equipos de cómputo que trabajan de forma coordinada).\n", 24 | "\n", 25 | "*Dask* consta de:\n", 26 | "\n", 27 | "* Un calendarizador de tareas dinámico (*dynamic task scheduler*).\n", 28 | "* Una colección de bibliotecas optimizadas para *Big Data*, con interfaces que extienden a *Numpy* y *Pandas*.\n", 29 | "\n", 30 | "https://docs.dask.org/en/stable/\n", 31 | "\n", 32 | "https://tutorial.dask.org/" 33 | ] 34 | }, 35 | { 36 | "cell_type": "code", 37 | "execution_count": null, 38 | "metadata": { 39 | "scrolled": true 40 | }, 41 | "outputs": [], 42 | "source": [ 43 | "!pip install dask" 44 | ] 45 | }, 46 | { 47 | "cell_type": "markdown", 48 | "metadata": {}, 49 | "source": [ 50 | "## Principales paquetes de *Dask*." 51 | ] 52 | }, 53 | { 54 | "cell_type": "markdown", 55 | "metadata": {}, 56 | "source": [ 57 | "" 58 | ] 59 | }, 60 | { 61 | "cell_type": "markdown", 62 | "metadata": {}, 63 | "source": [ 64 | "### Paquetes de colecciones de datos de *Dask*.\n", 65 | "\n", 66 | "* ```dask.array```, el cual contiene una biblioteca para manejo de arreglos similar a la de *Numpy*. Por convención, este módulo se importa como ```da```. La documentación de este paquete puede consultarse en:\n", 67 | " * https://docs.dask.org/en/stable/array.html\n", 68 | "* ```dask.dataframe```, el cual contiene una biblioteca para manejo de *dataframes* similar a la de *Pandas*. Por convención, este módulo se importa como ```dd```. 
La documentación de este paquete puede consultarse en:\n", 69 | " * https://docs.dask.org/en/stable/dataframe.html\n", 70 | "* ```dask.bags```, el cual contiene una biblioteca para manejo de *bags*, las cuales son estructuras de datos que pueden contener datos semi-estructurados y estructurados. Por convención este módulo se importa como ```db```. La documentación de este paquete puede consultarse en:\n", 71 | "https://docs.dask.org/en/stable/bag.html" 72 | ] 73 | }, 74 | { 75 | "cell_type": "markdown", 76 | "metadata": {}, 77 | "source": [ 78 | "### Evaluación perezosa (*lazy*) con el método ```compute()```." 79 | ] 80 | }, 81 | { 82 | "cell_type": "code", 83 | "execution_count": null, 84 | "metadata": {}, 85 | "outputs": [], 86 | "source": [ 87 | "import dask.dataframe as dd" 88 | ] 89 | }, 90 | { 91 | "cell_type": "code", 92 | "execution_count": null, 93 | "metadata": {}, 94 | "outputs": [], 95 | "source": [ 96 | "df = dd.read_csv('data/data_covid.csv')" 97 | ] 98 | }, 99 | { 100 | "cell_type": "code", 101 | "execution_count": null, 102 | "metadata": { 103 | "scrolled": true 104 | }, 105 | "outputs": [], 106 | "source": [ 107 | "df" 108 | ] 109 | }, 110 | { 111 | "cell_type": "code", 112 | "execution_count": null, 113 | "metadata": {}, 114 | "outputs": [], 115 | "source": [ 116 | "df.compute()" 117 | ] 118 | }, 119 | { 120 | "cell_type": "code", 121 | "execution_count": null, 122 | "metadata": {}, 123 | "outputs": [], 124 | "source": [ 125 | "type(df[\"Nacional\"])" 126 | ] 127 | }, 128 | { 129 | "cell_type": "code", 130 | "execution_count": null, 131 | "metadata": { 132 | "scrolled": true 133 | }, 134 | "outputs": [], 135 | "source": [ 136 | "df[\"Nacional\"].compute()" 137 | ] 138 | }, 139 | { 140 | "cell_type": "code", 141 | "execution_count": null, 142 | "metadata": {}, 143 | "outputs": [], 144 | "source": [ 145 | "df.loc[df[\"Nacional\"] > 50000].loc[:, ['index', 'Nacional']]" 146 | ] 147 | }, 148 | { 149 | "cell_type": "code", 150 | "execution_count": 
null, 151 | "metadata": {}, 152 | "outputs": [], 153 | "source": [ 154 | "df.loc[df[\"Nacional\"] > 50000].loc[:, ['index', 'Nacional']].compute()" 155 | ] 156 | }, 157 | { 158 | "cell_type": "markdown", 159 | "metadata": {}, 160 | "source": [ 161 | "### Bibliotecas de *Dask*.\n", 162 | "\n", 163 | "* ```dask.delayed```. Esta biblioteca permite procesar colecciones basadas en *Python* de forma paralela.\n", 164 | " * https://docs.dask.org/en/stable/delayed.html\n", 165 | "* ```dask.futures```. Es una implementación de [```concurrent.futures```](https://docs.python.org/3/library/concurrent.futures.html) de *Python* optimizada para correr en un cluster. La documentación de este paquete puede consultarse en:\n", 166 | " * https://docs.dask.org/en/stable/futures.html" 167 | ] 168 | }, 169 | { 170 | "cell_type": "markdown", 171 | "metadata": {}, 172 | "source": [ 173 | "## Despliegue de un cluster con ```Dask.Distributed```." 174 | ] 175 | }, 176 | { 177 | "cell_type": "markdown", 178 | "metadata": {}, 179 | "source": [ 180 | "*Dask* puede ser desplegado en clusters mediante el uso de varios equipos *workers* gestionados por un *scheduler*.\n", 181 | "\n", 182 | "\n", 183 | "https://distributed.dask.org/en/stable/" 184 | ] 185 | }, 186 | { 187 | "cell_type": "markdown", 188 | "metadata": {}, 189 | "source": [ 190 | "" 191 | ] 192 | }, 193 | { 194 | "cell_type": "code", 195 | "execution_count": null, 196 | "metadata": { 197 | "scrolled": true 198 | }, 199 | "outputs": [], 200 | "source": [ 201 | "!pip install \"bokeh>=2.4.2, <3\"\n", 202 | "!pip install dask distributed --upgrade" 203 | ] 204 | }, 205 | { 206 | "cell_type": "code", 207 | "execution_count": null, 208 | "metadata": {}, 209 | "outputs": [], 210 | "source": [ 211 | "!dask scheduler" 212 | ] 213 | }, 214 | { 215 | "cell_type": "markdown", 216 | "metadata": {}, 217 | "source": [ 218 | "
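La evaluación perezosa que aplican las colecciones de *Dask* puede ilustrarse con *Python* puro. La clase siguiente es hipotética y no es la implementación de *Dask*; solo muestra la idea de posponer el trabajo hasta que se invoca ```compute()```:

```python
# Clase hipotética: una operación "perezosa" solo guarda la función
# pendiente; el cálculo real ocurre hasta llamar a compute(), igual
# que con los dataframes de dask.dataframe.
class Perezoso:
    def __init__(self, funcion):
        self.funcion = funcion
        self.evaluado = False

    def compute(self):
        # Hasta este punto se ejecuta realmente el trabajo pendiente.
        self.evaluado = True
        return self.funcion()


suma = Perezoso(lambda: sum(range(10)))
print(suma.evaluado)      # → False: aún no se ha calculado nada
print(suma.compute())     # → 45
print(suma.evaluado)      # → True
```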
Esta obra está bajo una Licencia Creative Commons Atribución 4.0 Internacional.

\n", 219 | "

© José Luis Chiquete Valdivieso. 2022.

" 220 | ] 221 | } 222 | ], 223 | "metadata": { 224 | "kernelspec": { 225 | "display_name": "Python 3 (ipykernel)", 226 | "language": "python", 227 | "name": "python3" 228 | }, 229 | "language_info": { 230 | "codemirror_mode": { 231 | "name": "ipython", 232 | "version": 3 233 | }, 234 | "file_extension": ".py", 235 | "mimetype": "text/x-python", 236 | "name": "python", 237 | "nbconvert_exporter": "python", 238 | "pygments_lexer": "ipython3", 239 | "version": "3.10.6" 240 | } 241 | }, 242 | "nbformat": 4, 243 | "nbformat_minor": 2 244 | } 245 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2018 Pythonista® 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 
22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # py311 Data management with NumPy, Pandas, and Matplotlib. 2 | 3 | ## Syllabus: 4 | 5 | * SciPy and NumPy. 6 | * Introduction to Pandas. 7 | * Data types in Pandas. 8 | * Basic operations. 9 | * Append operations. 10 | * The merge method. 11 | * The filter method. 12 | * The apply method. 13 | * The groupby method. 14 | * Masking methods. 15 | * Data management. 16 | * Cleaning and missing data. 17 | * Transformations. 18 | * Indexes and multi-indexes. 19 | * Data extraction and storage. 20 | * Data visualization with pandas. 21 | * Introduction to Matplotlib. 22 | * Statistical plots. 23 | * 3D plots. 24 | * Image processing. 25 | * Parallel computing with Dask. 26 | -------------------------------------------------------------------------------- /img/arquitectura_dask.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PythonistaMX/py311/8d416bc812485edffd93bfd9fe044442b3d89dad/img/arquitectura_dask.png -------------------------------------------------------------------------------- /img/ciclo.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PythonistaMX/py311/8d416bc812485edffd93bfd9fe044442b3d89dad/img/ciclo.png -------------------------------------------------------------------------------- /img/dask_cluster.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PythonistaMX/py311/8d416bc812485edffd93bfd9fe044442b3d89dad/img/dask_cluster.png -------------------------------------------------------------------------------- /img/grammar_of_graphics.png: --------------------------------------------------------------------------------
https://raw.githubusercontent.com/PythonistaMX/py311/8d416bc812485edffd93bfd9fe044442b3d89dad/img/grammar_of_graphics.png -------------------------------------------------------------------------------- /img/pythonista.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PythonistaMX/py311/8d416bc812485edffd93bfd9fe044442b3d89dad/img/pythonista.png -------------------------------------------------------------------------------- /src/31/callback.py: -------------------------------------------------------------------------------- 1 | from dash import Dash, dcc, html, Input, Output 2 | 3 | app = Dash(__name__) 4 | 5 | app.layout = html.Div([ 6 | html.H6("Change the value in the text box to see callbacks in action!"), 7 | html.Div([ 8 | "Input: ", 9 | dcc.Input(id='my-input', value='initial value', type='text') 10 | ]), 11 | html.Br(), 12 | html.Div(id='my-output'), 13 | 14 | ]) 15 | 16 | 17 | @app.callback( 18 | Output(component_id='my-output', component_property='children'), 19 | Input(component_id='my-input', component_property='value') 20 | ) 21 | def update_output_div(input_value): 22 | return f'Output: {input_value}' 23 | 24 | 25 | if __name__ == '__main__': 26 | app.run_server(debug=True, port=8000, host="0.0.0.0") -------------------------------------------------------------------------------- /src/31/hola_mundo.py: -------------------------------------------------------------------------------- 1 | from dash import Dash, html 2 | import pandas as pd 3 | 4 | df = pd.read_csv('https://gist.githubusercontent.com/chriddyp/c78bf172206ce24f77d6363a2d754b59/raw/c353e8ef842413cae56ae3920b8fd78468aa4cb2/usa-agricultural-exports-2011.csv') 5 | 6 | 7 | def generate_table(dataframe, max_rows=10): 8 | return html.Table([ 9 | html.Thead( 10 | html.Tr([html.Th(col) for col in dataframe.columns]) 11 | ), 12 | html.Tbody([ 13 | html.Tr([ 14 | html.Td(dataframe.iloc[i][col]) for col in dataframe.columns 15 | ]) 
for i in range(min(len(dataframe), max_rows)) 16 | ]) 17 | ]) 18 | 19 | 20 | app = Dash(__name__) 21 | 22 | app.layout = html.Div([ 23 | html.H4(children='US Agriculture Exports (2011)'), 24 | generate_table(df) 25 | ]) 26 | 27 | if __name__ == '__main__': 28 | app.run_server(debug=True, port=8080, host="0.0.0.0") 29 | --------------------------------------------------------------------------------