├── .gitignore
├── 01_el_proyecto_scipy.ipynb
├── 02_conceptos_basicos_de_numpy.ipynb
├── 03_gestion_de_arreglos_de_numpy.ipynb
├── 04_arreglos_con_contenido_aleatorio.ipynb
├── 05_operaciones_basicas_con_arreglos.ipynb
├── 06_manipulacion_de_arreglos_de_numpy.ipynb
├── 07_gestion_y_analisis_de_datos_de_numpy.ipynb
├── 08_algebra_lineal_con_numpy.ipynb
├── 09_introduccion_a_pandas.ipynb
├── 10_tipos_de_datos_de_pandas.ipynb
├── 11_operaciones_basicas_con_dataframes.ipynb
├── 12_indices_y_multiindices.ipynb
├── 13_uniones_y_mezclas_de_dataframes.ipynb
├── 14_metodo_merge.ipynb
├── 15_metodo_filter.ipynb
├── 16_metodos_apply_y_transform.ipynb
├── 17_metodos_de_enmascaramiento.ipynb
├── 18_gestion_de_datos.ipynb
├── 19_limpieza_y_datos_faltantes.ipynb
├── 20_transformacion_de_objetos.ipynb
├── 21_metodos_groupby.ipynb
├── 22_extraccion_y_almacenamiento.ipynb
├── 23_visualizacion_de_datos_con_pandas.ipynb
├── 24_introduccion_a_matplotlib.ipynb
├── 25_elementos_de_un_grafico.ipynb
├── 26_tipos_basicos_de_graficos.ipynb
├── 27_introduccion_a_plotnine.ipynb
├── 28_introduccion_a_seaborn.ipynb
├── 29_objetos_de_seaborn.ipynb
├── 30_introduccion_a_dask.ipynb
├── 31_introduccion_a_plotly_y_dash.ipynb
├── LICENSE
├── README.md
├── data
│   ├── Casos_Diarios_Estado_Nacional_Confirmados_20211221.csv
│   ├── data_covid.csv
│   └── datos_filtrados.csv
├── img
│   ├── arquitectura_dask.png
│   ├── ciclo.png
│   ├── dask_cluster.png
│   ├── grammar_of_graphics.png
│   └── pythonista.png
└── src
    └── 31
        ├── callback.py
        └── hola_mundo.py
/.gitignore:
--------------------------------------------------------------------------------
1 | # Byte-compiled / optimized / DLL files
2 | __pycache__/
3 | *.py[cod]
4 | *$py.class
5 |
6 | # C extensions
7 | *.so
8 |
9 | # Distribution / packaging
10 | .Python
11 | build/
12 | develop-eggs/
13 | dist/
14 | downloads/
15 | eggs/
16 | .eggs/
17 | lib/
18 | lib64/
19 | parts/
20 | sdist/
21 | var/
22 | wheels/
23 | *.egg-info/
24 | .installed.cfg
25 | *.egg
26 | MANIFEST
27 |
28 | # PyInstaller
29 | # Usually these files are written by a python script from a template
30 | # before PyInstaller builds the exe, so as to inject date/other infos into it.
31 | *.manifest
32 | *.spec
33 |
34 | # Installer logs
35 | pip-log.txt
36 | pip-delete-this-directory.txt
37 |
38 | # Unit test / coverage reports
39 | htmlcov/
40 | .tox/
41 | .coverage
42 | .coverage.*
43 | .cache
44 | nosetests.xml
45 | coverage.xml
46 | *.cover
47 | .hypothesis/
48 | .pytest_cache/
49 |
50 | # Translations
51 | *.mo
52 | *.pot
53 |
54 | # Django stuff:
55 | *.log
56 | local_settings.py
57 | db.sqlite3
58 |
59 | # Flask stuff:
60 | instance/
61 | .webassets-cache
62 |
63 | # Scrapy stuff:
64 | .scrapy
65 |
66 | # Sphinx documentation
67 | docs/_build/
68 |
69 | # PyBuilder
70 | target/
71 |
72 | # Jupyter Notebook
73 | .ipynb_checkpoints
74 |
75 | # pyenv
76 | .python-version
77 |
78 | # celery beat schedule file
79 | celerybeat-schedule
80 |
81 | # SageMath parsed files
82 | *.sage.py
83 |
84 | # Environments
85 | .env
86 | .venv
87 | env/
88 | venv/
89 | ENV/
90 | env.bak/
91 | venv.bak/
92 |
93 | # Spyder project settings
94 | .spyderproject
95 | .spyproject
96 |
97 | # Rope project settings
98 | .ropeproject
99 |
100 | # mkdocs documentation
101 | /site
102 |
103 | # mypy
104 | .mypy_cache/
105 |
106 | # excel files
107 | *.xlsx
108 | arreglo.*
109 | arreglos.*
110 | grafica.png
111 | imagen.jpg
112 |
113 |
--------------------------------------------------------------------------------
/01_el_proyecto_scipy.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "[](https://www.pythonista.io)"
8 | ]
9 | },
10 | {
11 | "cell_type": "markdown",
12 | "metadata": {},
13 | "source": [
14 | "# El proyecto *Scipy*."
15 | ]
16 | },
17 | {
18 | "cell_type": "markdown",
19 | "metadata": {},
20 | "source": [
21 | "Existen herramientas y lenguajes de programación diseñados específicamente para el cómputo científico y el análisis de datos tales como:\n",
22 | "\n",
23 | "* [*Matlab*](https://la.mathworks.com/products/matlab.html).\n",
24 | "* [*SPSS* ](https://www.ibm.com/analytics/spss-statistics-software).\n",
25 | "* [*Mathematica*](https://www.wolfram.com/mathematica/).\n",
26 | "\n",
27 | "Algunos de ellos incluso ha sido publicado bajo los términos de licencias libres, como:\n",
28 | "\n",
29 | "* [*GNU Octave*](https://www.gnu.org/software/octave/).\n",
30 | "* [*R*](https://www.r-project.org/).\n",
31 | "* [*Julia*](https://julialang.org/).\n",
32 | "\n",
33 | "A diferencia de estas herramientas y lenguajes altamente especializadas en temas estadísticos y de análisis de datos, *Python* es un lenguaje de programación de propósito general. \n",
34 | "\n",
35 | "Sin embargo, debido a las características de *Python* se han creado diversos proyectos enfocados a ofrecer herramientas altamente competitivas en el tema de análisis de datos, estadística y cómputo científico."
36 | ]
37 | },
38 | {
39 | "cell_type": "markdown",
40 | "metadata": {},
41 | "source": [
42 | "## Proyectos más relevantes para análisis de datos.\n",
43 | "\n",
44 | "### El proyecto *Scipy*.\n",
45 | "\n",
46 | "El proyecto [*Scipy*](https://www.scipy.org) consta de una serie de herramientas y bibliotecas especializadas en cómputo científico y análisis de datos. Los [componentes más importantes](https://projects.scipy.org/stackspec.html) del proyecto son: \n",
47 | "\n",
48 | "* [*Numpy*](https://numpy.org), una biblioteca que ofrece:\n",
49 | " * Gestión de arreglos multidimensionales (```np.array```).\n",
50 | " * Tipos de datos optimizados para operaciones con arreglos.\n",
51 | " * Poderosos componentes de cálculo vectorial y de álgebra lineal.\n",
52 | "* [*Pandas*](https://pandas.pydata.org/), una herramienta especializada en:\n",
53 | " * Obtención y almacenamientos de datos de diversas fuentes y con diversos formatos.\n",
54 | " * Tratamiento de los datos.\n",
55 | " * Análisis de los datos. \n",
56 | "* [*Matplotlib*](https://matplotlib.org/), una biblioteca especializada en el despliegue y visualización con una sintaxis similar a la de *Matlab*.\n",
57 | "* [*iPython*](https://ipython.org/), un intérprete de *Python* especializado en temas de análisis de datos y cómputo científico, el cual fue el origen del proyecto [*Jupyter*](https://jupyter.org/).\n",
58 | "* [Sympy](https://www.sympy.org), una herramienta que permite realizar operaciones con expresiones de álgebra simbólica.\n",
59 | "\n",
60 | "Estos componentes principales son proyectos muy maduros, cuentan con extensa documentación y soporte tanto de la comunidad como comercial."
61 | ]
62 | },
63 | {
64 | "cell_type": "markdown",
65 | "metadata": {},
66 | "source": [
67 | "### Los *scikits*.\n",
68 | "\n",
69 | "Los [*scikits*](https://www.scipy.org/scikits.html) son un compendio de proyectos basados en *Scipy* que ofrecen herramientas puntuales para temas muy específicos tales como:\n",
70 | "\n",
71 | "* Machine Learning.\n",
72 | "* Redes neuronales.\n",
73 | "* Análisis de imágenes.\n",
74 | "* Cómputo paralelo.\n",
75 | "* Supercómputo usando GPU.\n",
76 | "* Series de tiempo, etc.\n",
77 | "\n",
78 | "Estos proyectos no son mantenidos ni soportados por *Scipy* y su documentación y madurez no es homogenea.\n",
79 | "\n",
80 | "Los proyectos pueden ser consutados en:\n",
81 | "\n",
82 | "https://scikits.appspot.com/scikits"
83 | ]
84 | },
85 | {
86 | "cell_type": "markdown",
87 | "metadata": {},
88 | "source": [
89 | "## Instalación de los componentes."
90 | ]
91 | },
92 | {
93 | "cell_type": "code",
94 | "execution_count": null,
95 | "metadata": {
96 | "scrolled": true
97 | },
98 | "outputs": [],
99 | "source": [
100 | "!pip install numpy scipy pandas matplotlib"
101 | ]
102 | },
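{
"cell_type": "markdown",
"metadata": {},
"source": [
"* The following cell is a minimal check (not part of the original lesson): it prints the installed version of each component installed above, assuming the ```pip``` command succeeded."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import scipy\n",
"import pandas as pd\n",
"import matplotlib\n",
"\n",
"# Print the name and version of each core component of the SciPy stack.\n",
"for modulo in (np, scipy, pd, matplotlib):\n",
"    print(modulo.__name__, modulo.__version__)"
]
},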
103 | {
104 | "cell_type": "markdown",
105 | "metadata": {},
106 | "source": [
107 | "###

Esta obra está bajo una Licencia Creative Commons Atribución 4.0 Internacional.
\n",
108 | "© José Luis Chiquete Valdivieso. 2023.
"
109 | ]
110 | }
111 | ],
112 | "metadata": {
113 | "kernelspec": {
114 | "display_name": "Python 3 (ipykernel)",
115 | "language": "python",
116 | "name": "python3"
117 | },
118 | "language_info": {
119 | "codemirror_mode": {
120 | "name": "ipython",
121 | "version": 3
122 | },
123 | "file_extension": ".py",
124 | "mimetype": "text/x-python",
125 | "name": "python",
126 | "nbconvert_exporter": "python",
127 | "pygments_lexer": "ipython3",
128 | "version": "3.9.2"
129 | }
130 | },
131 | "nbformat": 4,
132 | "nbformat_minor": 4
133 | }
134 |
--------------------------------------------------------------------------------
/04_arreglos_con_contenido_aleatorio.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "[](https://www.pythonista.io)"
8 | ]
9 | },
10 | {
11 | "cell_type": "markdown",
12 | "metadata": {},
13 | "source": [
14 | "# Arreglos con contenido aleatorio."
15 | ]
16 | },
17 | {
18 | "cell_type": "code",
19 | "execution_count": null,
20 | "metadata": {
21 | "scrolled": true
22 | },
23 | "outputs": [],
24 | "source": [
25 | "import numpy as np"
26 | ]
27 | },
28 | {
29 | "cell_type": "markdown",
30 | "metadata": {},
31 | "source": [
32 | "## El paquete ```np.random```."
33 | ]
34 | },
35 | {
36 | "cell_type": "markdown",
37 | "metadata": {},
38 | "source": [
39 | "https://numpy.org/doc/stable/reference/random/generator.html"
40 | ]
41 | },
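{
"cell_type": "markdown",
"metadata": {},
"source": [
"* The documentation linked above describes the newer *Generator* API. The following cell is a brief sketch (not part of the original lesson) of that API; the seed ```42``` is an arbitrary choice used only for reproducibility."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Create a Generator object; it exposes methods analogous to the legacy np.random functions.\n",
"rng = np.random.default_rng(42)\n",
"rng.random((2, 2, 2))  # uniform values in [0, 1) with shape (2, 2, 2)"
]
},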
42 | {
43 | "cell_type": "markdown",
44 | "metadata": {},
45 | "source": [
46 | "### La función ```np.random.rand()```.\n",
47 | "\n",
48 | "* La función ```np.random.rand()``` crea un arreglo cuyos elementos son valores aleatorios que van de ```0``` a antes de ```1``` dentro de una distribución uniforme.\n",
49 | "\n",
50 | "```\n",
51 | "np.random.rand()\n",
52 | "```\n",
53 | "\n",
54 | "* Donde `````` es una secuencia de valores enteros separados por comas que definen la forma del arreglo.\n",
55 | "\n",
56 | "https://numpy.org/doc/stable/reference/random/generated/numpy.random.rand.html"
57 | ]
58 | },
59 | {
60 | "cell_type": "markdown",
61 | "metadata": {},
62 | "source": [
63 | "**Ejemplo:**"
64 | ]
65 | },
66 | {
67 | "cell_type": "markdown",
68 | "metadata": {},
69 | "source": [
70 | "* La siguiente celda generará un arreglo de forma ```(2, 2, 2)```conteniendo números aleatorios."
71 | ]
72 | },
73 | {
74 | "cell_type": "code",
75 | "execution_count": null,
76 | "metadata": {
77 | "scrolled": true
78 | },
79 | "outputs": [],
80 | "source": [
81 | "np.random.rand(2, 2, 2)"
82 | ]
83 | },
84 | {
85 | "cell_type": "markdown",
86 | "metadata": {},
87 | "source": [
88 | "### La función ```np.random.randint()```.\n",
89 | "\n",
90 | "* La función ```np.random.randint()``` crea un arreglo cuyos elementos son valores entros aleatorios en un rango dado.\n",
91 | "\n",
92 | "```\n",
93 | "np.random.randint(, , )\n",
94 | "```\n",
95 | "\n",
96 | "Donde:\n",
97 | "\n",
98 | "* `````` es el valor inicial del rango a partir del cual se generarán los números aleatorios, incluyéndolo a este.\n",
99 | "* `````` es el valor final del rango a partir del cual se generarán los números aleatorios, sin incluirlo.\n",
100 | "* `````` es un objeto ```tuple``` que definen la forma del arreglo."
101 | ]
102 | },
103 | {
104 | "cell_type": "markdown",
105 | "metadata": {},
106 | "source": [
107 | "**Ejemplos:**"
108 | ]
109 | },
110 | {
111 | "cell_type": "markdown",
112 | "metadata": {},
113 | "source": [
114 | "* La siguente celda creará una arreglo de forma ```(3, 3)``` con valores enteros que pueden ir de ```1``` a ```2```."
115 | ]
116 | },
117 | {
118 | "cell_type": "code",
119 | "execution_count": null,
120 | "metadata": {
121 | "scrolled": true
122 | },
123 | "outputs": [],
124 | "source": [
125 | "np.random.randint(1, 3, (3, 3))"
126 | ]
127 | },
128 | {
129 | "cell_type": "markdown",
130 | "metadata": {},
131 | "source": [
132 | "* La siguente celda creará una arreglo de forma ```(3, 2, 4)``` con valores enteros que pueden ir de ```0``` a ```255```."
133 | ]
134 | },
135 | {
136 | "cell_type": "code",
137 | "execution_count": null,
138 | "metadata": {
139 | "scrolled": true
140 | },
141 | "outputs": [],
142 | "source": [
143 | "np.random.randint(0, 256, (3, 2, 4))"
144 | ]
145 | },
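{
"cell_type": "markdown",
"metadata": {},
"source": [
"* For reference, the following cell is a sketch (not part of the original lesson) of the *Generator* counterpart of ```np.random.randint()```."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Generator.integers() draws integers from 0 (inclusive) to 256 (exclusive), like np.random.randint().\n",
"rng = np.random.default_rng()\n",
"rng.integers(0, 256, size=(3, 2, 4))"
]
},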
146 | {
147 | "cell_type": "markdown",
148 | "metadata": {},
149 | "source": [
150 | "
Esta obra está bajo una Licencia Creative Commons Atribución 4.0 Internacional.
\n",
151 | "© José Luis Chiquete Valdivieso. 2023.
"
152 | ]
153 | }
154 | ],
155 | "metadata": {
156 | "kernelspec": {
157 | "display_name": "Python 3 (ipykernel)",
158 | "language": "python",
159 | "name": "python3"
160 | },
161 | "language_info": {
162 | "codemirror_mode": {
163 | "name": "ipython",
164 | "version": 3
165 | },
166 | "file_extension": ".py",
167 | "mimetype": "text/x-python",
168 | "name": "python",
169 | "nbconvert_exporter": "python",
170 | "pygments_lexer": "ipython3",
171 | "version": "3.9.2"
172 | }
173 | },
174 | "nbformat": 4,
175 | "nbformat_minor": 2
176 | }
177 |
--------------------------------------------------------------------------------
/08_algebra_lineal_con_numpy.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "[](https://www.pythonista.io)"
8 | ]
9 | },
10 | {
11 | "cell_type": "markdown",
12 | "metadata": {},
13 | "source": [
14 | "# Álgebra lineal con *Numpy*.\n",
15 | "\n",
16 | "El componente más poderoso de *Numpy* es su capacidad de realizar operaciones con arreglos, y un caso particular de ellos son las matrices numéricas.\n",
17 | "\n",
18 | "*Numpy* cuenta con una poderosa biblioteca de funciones que permiten tratar a los arreglos como estructuras algebraicas.\n",
19 | "\n",
20 | "La biblioteca especializada en alǵebra lineal es [```np.linalg```](https://numpy.org/doc/stable/reference/routines.linalg.html). Pero además de esta biblioteca especializada, es posible realizar operaciones básicas de matrices y vectores."
21 | ]
22 | },
23 | {
24 | "cell_type": "code",
25 | "execution_count": null,
26 | "metadata": {},
27 | "outputs": [],
28 | "source": [
29 | "import numpy as np"
30 | ]
31 | },
32 | {
33 | "cell_type": "markdown",
34 | "metadata": {},
35 | "source": [
36 | "## Producto punto entre dos matrices.\n",
37 | "\n",
38 | "\n",
39 | "La función ```np.dot()```permite realizar las operaciones de [producto punto](https://es.wikipedia.org/wiki/Producto_escalar) entre dos matrices compatibles (vectores).\n",
40 | "\n",
41 | "```\n",
42 | "np.dot(,)\n",
43 | "\n",
44 | "```\n",
45 | "\n",
46 | "Donde:\n",
47 | "\n",
48 | "* Cada `````` es una arreglo que cumple con la características para realizar esta operación.\n",
49 | "\n",
50 | "https://numpy.org/doc/stable/reference/generated/numpy.dot.html"
51 | ]
52 | },
53 | {
54 | "cell_type": "markdown",
55 | "metadata": {},
56 | "source": [
57 | "**Ejemplo:**"
58 | ]
59 | },
60 | {
61 | "cell_type": "markdown",
62 | "metadata": {},
63 | "source": [
64 | "* La siguiente celda creará al arreglo con nombre ```arreglo_1 ``` de forma ```(3, 2)```."
65 | ]
66 | },
67 | {
68 | "cell_type": "code",
69 | "execution_count": null,
70 | "metadata": {},
71 | "outputs": [],
72 | "source": [
73 | "arreglo_1 = np.array([[1, 2],\n",
74 | " [3, 4],\n",
75 | " [5, 6]])"
76 | ]
77 | },
78 | {
79 | "cell_type": "code",
80 | "execution_count": null,
81 | "metadata": {},
82 | "outputs": [],
83 | "source": [
84 | "arreglo_1"
85 | ]
86 | },
87 | {
88 | "cell_type": "markdown",
89 | "metadata": {},
90 | "source": [
91 | "* La siguiente celda creará al arreglo con nombre ```arreglo_2 ``` de forma ```(2, 4)```."
92 | ]
93 | },
94 | {
95 | "cell_type": "code",
96 | "execution_count": null,
97 | "metadata": {},
98 | "outputs": [],
99 | "source": [
100 | "arreglo_2 = np.array([[11, 12, 13, 14],\n",
101 | " [15, 16, 17, 18]])"
102 | ]
103 | },
104 | {
105 | "cell_type": "code",
106 | "execution_count": null,
107 | "metadata": {},
108 | "outputs": [],
109 | "source": [
110 | "arreglo_2"
111 | ]
112 | },
113 | {
114 | "cell_type": "markdown",
115 | "metadata": {},
116 | "source": [
117 | "* La siguiente celda realizará la operación de producto punto entre ```arreglo_1``` y ```arreglo_2```, regresando una matriz de la forma ```(3, 4)```."
118 | ]
119 | },
120 | {
121 | "cell_type": "code",
122 | "execution_count": null,
123 | "metadata": {
124 | "scrolled": true
125 | },
126 | "outputs": [],
127 | "source": [
128 | "np.dot(arreglo_1, arreglo_2)"
129 | ]
130 | },
131 | {
132 | "cell_type": "markdown",
133 | "metadata": {},
134 | "source": [
135 | "* El signo ```@``` es reconocido por *Numpy* como el operador de producto punto."
136 | ]
137 | },
138 | {
139 | "cell_type": "code",
140 | "execution_count": null,
141 | "metadata": {
142 | "scrolled": true
143 | },
144 | "outputs": [],
145 | "source": [
146 | "arreglo_1 @ arreglo_2"
147 | ]
148 | },
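{
"cell_type": "markdown",
"metadata": {},
"source": [
"* As a quick check (a sketch that is not part of the original lesson), element ```[0, 0]``` of the product is the sum of the element-wise products of row ```0``` of ```arreglo_1``` and column ```0``` of ```arreglo_2```."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Manually reproduce element [0, 0] of the matrix product and compare it with np.dot().\n",
"manual = sum(arreglo_1[0, k] * arreglo_2[k, 0] for k in range(2))\n",
"manual, np.dot(arreglo_1, arreglo_2)[0, 0]"
]
},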
149 | {
150 | "cell_type": "markdown",
151 | "metadata": {},
152 | "source": [
153 | "## Producto cruz entre dos matrices."
154 | ]
155 | },
156 | {
157 | "cell_type": "markdown",
158 | "metadata": {},
159 | "source": [
160 | "La función ```np.cross()```permite realizar las operaciones de [producto cruz](https://es.wikipedia.org/wiki/Producto_vectorial) entre dos matrices compatibles.\n",
161 | "\n",
162 | "```\n",
163 | "np.cross(,)\n",
164 | "\n",
165 | "```\n",
166 | "\n",
167 | "https://numpy.org/doc/stable/reference/generated/numpy.cross.html"
168 | ]
169 | },
170 | {
171 | "cell_type": "markdown",
172 | "metadata": {},
173 | "source": [
174 | "**Ejemplo:**"
175 | ]
176 | },
177 | {
178 | "cell_type": "markdown",
179 | "metadata": {},
180 | "source": [
181 | "* La siguiente celda creará un arreglo de una dimensión, de forma ```(1,2)``` al que se le llamará ```vector_1```."
182 | ]
183 | },
184 | {
185 | "cell_type": "code",
186 | "execution_count": null,
187 | "metadata": {},
188 | "outputs": [],
189 | "source": [
190 | "vector_1 = np.array([1, 2])"
191 | ]
192 | },
193 | {
194 | "cell_type": "code",
195 | "execution_count": null,
196 | "metadata": {
197 | "scrolled": true
198 | },
199 | "outputs": [],
200 | "source": [
201 | "vector_1.shape"
202 | ]
203 | },
204 | {
205 | "cell_type": "markdown",
206 | "metadata": {},
207 | "source": [
208 | "* La siguiente celda creará un arreglo de una dimensión, de forma ```(1,2)``` al que se le llamará ```vector_2```."
209 | ]
210 | },
211 | {
212 | "cell_type": "code",
213 | "execution_count": null,
214 | "metadata": {},
215 | "outputs": [],
216 | "source": [
217 | "vector_2 = np.array([11, 12])"
218 | ]
219 | },
220 | {
221 | "cell_type": "code",
222 | "execution_count": null,
223 | "metadata": {
224 | "scrolled": true
225 | },
226 | "outputs": [],
227 | "source": [
228 | "vector_2.shape"
229 | ]
230 | },
231 | {
232 | "cell_type": "markdown",
233 | "metadata": {},
234 | "source": [
235 | "* La siguiente celda ejecutará la función ```np.cross()``` con ```vector_1``` y ```vector_2```."
236 | ]
237 | },
238 | {
239 | "cell_type": "code",
240 | "execution_count": null,
241 | "metadata": {},
242 | "outputs": [],
243 | "source": [
244 | "np.cross(vector_1, vector_2)"
245 | ]
246 | },
247 | {
248 | "cell_type": "code",
249 | "execution_count": null,
250 | "metadata": {},
251 | "outputs": [],
252 | "source": [
253 | "np.cross(vector_1, vector_2).shape"
254 | ]
255 | },
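{
"cell_type": "markdown",
"metadata": {},
"source": [
"* With 3-component vectors, ```np.cross()``` returns the vector perpendicular to both arguments. The following cell is a brief sketch (not part of the original lesson) using two arbitrarily chosen unit vectors."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# The cross product of the x and y unit vectors is the z unit vector.\n",
"u = np.array([1, 0, 0])\n",
"v = np.array([0, 1, 0])\n",
"np.cross(u, v)"
]
},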
256 | {
257 | "cell_type": "markdown",
258 | "metadata": {},
259 | "source": [
260 | "## El paquete ```numpy.linalg```.\n",
261 | "\n",
262 | "La biblioteca especializada en operaciones de álgebra lineal de *Numpy* es ```numpy.linalg```.\n",
263 | "\n",
264 | "El estudio de todas las funciones contenidas en este paquete están fuera de los alcances de este curso, pero se ejemplificarán las funciones.\n",
265 | "\n",
266 | "* ```np.linalg.det()```\n",
267 | "* ```np.linalg.solve()```\n",
268 | "* ```np.linalg.inv()```\n",
269 | "\n",
270 | "https://numpy.org/doc/stable/reference/routines.linalg.html"
271 | ]
272 | },
273 | {
274 | "cell_type": "code",
275 | "execution_count": null,
276 | "metadata": {},
277 | "outputs": [],
278 | "source": [
279 | "import numpy.linalg"
280 | ]
281 | },
282 | {
283 | "cell_type": "markdown",
284 | "metadata": {},
285 | "source": [
286 | "### Cálculo del determinante de una matriz mediante ```numpy.linalg.det()```.\n",
287 | "\n",
288 | "\n",
289 | "\n",
290 | "**Ejemplo:**\n",
291 | "\n",
292 | "* Se calculará el determinante de la matriz:\n",
293 | "\n",
294 | "$$ \\det\\begin{vmatrix}0&1&2\\\\3&4&5\\\\6&7&8\\end{vmatrix}$$\n",
295 | "\n",
296 | "* El cálculo del determinante es el siguiente:\n",
297 | "\n",
298 | "$$ ((0 * 4 * 8) + (1 * 5 * 6) + (2 * 3 * 7)) - ((6 * 4 * 2) + (7 * 5 * 0) + (8 * 3* 1)) = 0$$"
299 | ]
300 | },
301 | {
302 | "cell_type": "code",
303 | "execution_count": null,
304 | "metadata": {},
305 | "outputs": [],
306 | "source": [
307 | "matriz = np.arange(9).reshape(3, 3)"
308 | ]
309 | },
310 | {
311 | "cell_type": "code",
312 | "execution_count": null,
313 | "metadata": {},
314 | "outputs": [],
315 | "source": [
316 | "matriz"
317 | ]
318 | },
319 | {
320 | "cell_type": "code",
321 | "execution_count": null,
322 | "metadata": {},
323 | "outputs": [],
324 | "source": [
325 | "np.linalg.det(matriz)"
326 | ]
327 | },
328 | {
329 | "cell_type": "markdown",
330 | "metadata": {},
331 | "source": [
332 | "* Se calculará el determinante de la matriz:\n",
333 | "\n",
334 | "$$ \\det\\begin{vmatrix}1&1&2\\\\3&4&5\\\\6&7&8\\end{vmatrix}$$\n",
335 | "\n",
336 | "* El cálculo del determinante es el siguiente:\n",
337 | "\n",
338 | "$$ ((1 * 4 * 8) + (1 * 5 * 6) + (2 * 3 * 7)) - ((6 * 4 * 2) + (7 * 5 * 1) + (8 * 3* 1)) = -3$$"
339 | ]
340 | },
341 | {
342 | "cell_type": "code",
343 | "execution_count": null,
344 | "metadata": {},
345 | "outputs": [],
346 | "source": [
347 | "matriz = np.array([[1, 1, 2],\n",
348 | " [3, 4, 5],\n",
349 | " [6, 7, 8]])"
350 | ]
351 | },
352 | {
353 | "cell_type": "code",
354 | "execution_count": null,
355 | "metadata": {},
356 | "outputs": [],
357 | "source": [
358 | "np.linalg.det(matriz)"
359 | ]
360 | },
361 | {
362 | "cell_type": "markdown",
363 | "metadata": {},
364 | "source": [
365 | "### Soluciones de ecuaciones lineales con la función ```np.linalg.solve()```.\n",
366 | "\n",
367 | "Un sistema de ecuaciones lineales coresponde un conjunto de ecuaciones de la forma:\n",
368 | "\n",
369 | "$$\n",
370 | "a_{11}x_1 + a_{12}x_2 + \\cdots a_{1n}x_n = y_1 \\\\\n",
371 | "a_{21}x_1 + a_{22}x_2 + \\cdots a_{2n}x_n = y_2\\\\\n",
372 | "\\vdots\\\\\n",
373 | "a_{m1}x_1 + a_{m2}x_2 + \\cdots a_{mn}x_n = y_m\n",
374 | "$$\n",
375 | "\n",
376 | "Lo cual puede ser expresado de forma matricial.\n",
377 | "\n",
378 | "$$ \n",
379 | "\\begin{bmatrix}a_{11}\\\\a_{21}\\\\ \\vdots\\\\ a_{m1}\\end{bmatrix}x_1 + \\begin{bmatrix}a_{12}\\\\a_{22}\\\\ \\vdots\\\\ a_{m2}\\end{bmatrix}x_2 + \\cdots \\begin{bmatrix}a_{m1}\\\\a_{m2}\\\\ \\vdots\\\\ a_{mn}\\end{bmatrix}x_n = \\begin{bmatrix}y_{1}\\\\y_{2}\\\\ \\vdots\\\\ y_{m}\\end{bmatrix}\n",
380 | "$$\n",
381 | "\n",
382 | "Existen múltiples métodos para calcular los valores $x_1, x_2 \\cdots x_n$ que cumplan con el sistema siempre que $m = n$.\n",
383 | "\n",
384 | "Numpy cuenta con la función ```np.linalg.solve()```, la cual puede calcular la solución de un sistema de ecuaciones lineales al expresarse como un par de matrices de la siguiente foma:\n",
385 | "\n",
386 | "$$ \n",
387 | "\\begin{bmatrix}a_{11}&a_{12}&\\cdots&a_{1n}\\\\a_{21}&a_{22}&\\cdots&a_{2n}\\\\ \\vdots\\\\ a_{n1}&a_{n2}&\\cdots&a_{nn}\\end{bmatrix}= \\begin{bmatrix}y_{1}\\\\y_{2}\\\\ \\vdots\\\\ y_{n}\\end{bmatrix}\n",
388 | "$$\n",
389 | "\n",
390 | "La función ```numpy.linagl.solve()``` permite resolver sistemas de ecuaciones lineales ingresando un arreglo de dimensiones ```(n, n)``` como primer argumente y otro con dimensión ```(n)``` como segundo argumento. "
391 | ]
392 | },
393 | {
394 | "cell_type": "markdown",
395 | "metadata": {},
396 | "source": [
397 | "**Ejemplo:**\n",
398 | "\n",
399 | "* Para resolver el sistema de ecuaciones:\n",
400 | "\n",
401 | "$$\n",
402 | "2x_1 + 5x_2 - 3x_3 = 22.2 \\\\\n",
403 | "11x_1 - 4x_2 + 22x_3 = 11.6 \\\\\n",
404 | "54x_1 + 1x_2 + 19x_3 = -40.1 \\\\\n",
405 | "$$"
406 | ]
407 | },
408 | {
409 | "cell_type": "markdown",
410 | "metadata": {},
411 | "source": [
412 | "* La siguiente celda creará al arreglo ```a```, el cual representa a la matriz de coeficientes del sistema de ecuaciones lineales.\n",
413 | "\n",
414 | "$$ \n",
415 | "\\begin{bmatrix}2\\\\11\\\\54\\end{bmatrix}x_1 + \n",
416 | "\\begin{bmatrix}5\\\\-4\\\\1\\end{bmatrix}x_2 + \n",
417 | "\\begin{bmatrix}-3\\\\22\\\\19\\end{bmatrix}x_3 \n",
418 | "$$"
419 | ]
420 | },
421 | {
422 | "cell_type": "code",
423 | "execution_count": null,
424 | "metadata": {},
425 | "outputs": [],
426 | "source": [
427 | "a = np.array([[2, 5, -3],\n",
428 | " [11, -4, 22],\n",
429 | " [54, 1, 19]])"
430 | ]
431 | },
432 | {
433 | "cell_type": "code",
434 | "execution_count": null,
435 | "metadata": {},
436 | "outputs": [],
437 | "source": [
438 | "a"
439 | ]
440 | },
441 | {
442 | "cell_type": "code",
443 | "execution_count": null,
444 | "metadata": {},
445 | "outputs": [],
446 | "source": [
447 | "a.shape"
448 | ]
449 | },
450 | {
451 | "cell_type": "markdown",
452 | "metadata": {},
453 | "source": [
454 | "* La siguiente celda corresponde a cada valor de $y$.\n",
455 | "\n",
456 | "$$\n",
457 | "\\begin{bmatrix}22.2\\\\11.6\\\\-40.1\\end{bmatrix}\n",
458 | "$$"
459 | ]
460 | },
461 | {
462 | "cell_type": "code",
463 | "execution_count": null,
464 | "metadata": {},
465 | "outputs": [],
466 | "source": [
467 | "y = np.array([22.2, 11.6, -40.1])"
468 | ]
469 | },
470 | {
471 | "cell_type": "code",
472 | "execution_count": null,
473 | "metadata": {},
474 | "outputs": [],
475 | "source": [
476 | "y"
477 | ]
478 | },
479 | {
480 | "cell_type": "code",
481 | "execution_count": null,
482 | "metadata": {
483 | "scrolled": true
484 | },
485 | "outputs": [],
486 | "source": [
487 | "y.shape"
488 | ]
489 | },
490 | {
491 | "cell_type": "markdown",
492 | "metadata": {},
493 | "source": [
494 | "* la siguiente celda resolverá el sistema de ecuaciones.\n",
495 | "\n",
496 | "$$\n",
497 | "1.80243902x_1 + 6.7549776x_2 + 2.65666999x_3 = y\n",
498 | "$$"
499 | ]
500 | },
501 | {
502 | "cell_type": "code",
503 | "execution_count": null,
504 | "metadata": {},
505 | "outputs": [],
506 | "source": [
507 | "np.linalg.solve(a, y)"
508 | ]
509 | },
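{
"cell_type": "markdown",
"metadata": {},
"source": [
"* The following cell is a quick verification sketch (not part of the original lesson): multiplying ```a``` by the solution vector should reproduce ```y``` within floating-point tolerance."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# a @ x should be (numerically) equal to y.\n",
"x = np.linalg.solve(a, y)\n",
"np.allclose(a @ x, y)"
]
},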
510 | {
511 | "cell_type": "markdown",
512 | "metadata": {},
513 | "source": [
514 | "### Cálculo de la inversa de una matriz mediante ```numpy.linalg.inv()```.\n",
515 | "\n",
516 | "Es posible calcular la matriz inversa de una [matriz invertible](https://es.wikipedia.org/wiki/Matriz_invertible) usando la función ```numpy.linalg.inv()```."
517 | ]
518 | },
519 | {
520 | "cell_type": "markdown",
521 | "metadata": {},
522 | "source": [
523 | "**Ejemplo:**"
524 | ]
525 | },
526 | {
527 | "cell_type": "markdown",
528 | "metadata": {},
529 | "source": [
530 | "* La siguiente celda definirá al arreglo ```a```, el cual representa a una matriz invertible."
531 | ]
532 | },
533 | {
534 | "cell_type": "code",
535 | "execution_count": null,
536 | "metadata": {
537 | "scrolled": true
538 | },
539 | "outputs": [],
540 | "source": [
541 | "a = np.array([[2, 5, -3],\n",
542 | " [11, -4, 22],\n",
543 | " [54, 1, 19]])"
544 | ]
545 | },
546 | {
547 | "cell_type": "markdown",
548 | "metadata": {},
549 | "source": [
550 | "* La siguiente celda calculará la matriz inversa de ```a```."
551 | ]
552 | },
553 | {
554 | "cell_type": "code",
555 | "execution_count": null,
556 | "metadata": {},
557 | "outputs": [],
558 | "source": [
559 | "np.linalg.inv(a)"
560 | ]
561 | },
562 | {
563 | "cell_type": "markdown",
564 | "metadata": {},
565 | "source": [
566 | "* Un método para resolver un sistema de ecuaciones lineales es el de realizar un producto punto con la inversa de los coeficientes del sistema y los valores de ```y```."
567 | ]
568 | },
569 | {
570 | "cell_type": "code",
571 | "execution_count": null,
572 | "metadata": {},
573 | "outputs": [],
574 | "source": [
575 | "np.linalg.inv(a).dot(y)"
576 | ]
577 | },
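{
"cell_type": "markdown",
"metadata": {},
"source": [
"* As an additional check (a sketch that is not part of the original lesson), the product of ```a``` and its inverse should be, up to floating-point error, the identity matrix."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# a @ inv(a) should be numerically equal to the 3 x 3 identity matrix.\n",
"np.allclose(a @ np.linalg.inv(a), np.eye(3))"
]
},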
578 | {
579 | "cell_type": "markdown",
580 | "metadata": {},
581 | "source": [
582 | "
Esta obra está bajo una Licencia Creative Commons Atribución 4.0 Internacional.
\n",
583 | "© José Luis Chiquete Valdivieso. 2023.
"
584 | ]
585 | }
586 | ],
587 | "metadata": {
588 | "kernelspec": {
589 | "display_name": "Python 3 (ipykernel)",
590 | "language": "python",
591 | "name": "python3"
592 | },
593 | "language_info": {
594 | "codemirror_mode": {
595 | "name": "ipython",
596 | "version": 3
597 | },
598 | "file_extension": ".py",
599 | "mimetype": "text/x-python",
600 | "name": "python",
601 | "nbconvert_exporter": "python",
602 | "pygments_lexer": "ipython3",
603 | "version": "3.10.6"
604 | }
605 | },
606 | "nbformat": 4,
607 | "nbformat_minor": 2
608 | }
609 |
--------------------------------------------------------------------------------
/10_tipos_de_datos_de_pandas.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "[](https://www.pythonista.io)"
8 | ]
9 | },
10 | {
11 | "cell_type": "markdown",
12 | "metadata": {},
13 | "source": [
14 | "# Tipos de datos de *Pandas*."
15 | ]
16 | },
17 | {
18 | "cell_type": "markdown",
19 | "metadata": {},
20 | "source": [
21 | "*Pandas* toma como base a *Numpy* y lo extiende para poder realizar operaciones de análisis de datos, por lo que es compatible con elementos como:\n",
22 | "\n",
23 | "* ```np.nan```.\n",
24 | "* ```np.inf```."
25 | ]
26 | },
27 | {
28 | "cell_type": "code",
29 | "execution_count": null,
30 | "metadata": {},
31 | "outputs": [],
32 | "source": [
33 | "import pandas as pd\n",
34 | "import numpy as np\n",
35 | "from datetime import datetime"
36 | ]
37 | },
38 | {
39 | "cell_type": "markdown",
40 | "metadata": {},
41 | "source": [
42 | "## Convenciones de nombres."
43 | ]
44 | },
45 | {
46 | "cell_type": "markdown",
47 | "metadata": {},
48 | "source": [
49 | "En este capítulo se hará referencia al paquete ```pandas``` como ```pd```, al paquete ```numpy``` como ```np```y a los *dataframes* instanciados de ```pd.DataFrame``` como ```df```."
50 | ]
51 | },
52 | {
53 | "cell_type": "markdown",
54 | "metadata": {},
55 | "source": [
56 | "## Tipos de datos de *Pandas*.\n",
57 | "\n",
58 | "*Pandas* extiende y a su vez restringe los tipos de datos de *Python* y de *Numpy* a los siguientes:\n",
59 | "\n",
60 | "* ```object``` el cual representa a una cadena de caracteres.\n",
61 | "* ```int64``` es el tipo para números enteros. \n",
62 | "* ```float64``` es el tipo para números de punto flotante.\n",
63 | "* ```bool``` es el tipo para valores booleanos.\n",
64 | "* ```datetime64``` es el tipo usado para gestionar fechas y horas.\n",
65 | "* ```timedelta64``` es el tipo de diferencias de tiempo. \n",
66 | "* ```category``` es un tipo de dato que contiene una colección finita de posibles valores (no se estudiará en este curso)."
67 | ]
68 | },
69 | {
70 | "cell_type": "markdown",
71 | "metadata": {},
72 | "source": [
73 | "**Ejemplo:**"
74 | ]
75 | },
76 | {
77 | "cell_type": "markdown",
78 | "metadata": {},
79 | "source": [
80 | "* A continuación se creará el *dataframe* ```datos``` que define las columnas.\n",
81 | " * *nombres* de tipo ```object```.\n",
82 | " * *fechas* de tipo ```datetime64```.\n",
83 | " * *saldo* de tipo ```float64```.\n",
84 | " * *al corriente* de tipo ```bool```."
85 | ]
86 | },
87 | {
88 | "cell_type": "code",
89 | "execution_count": null,
90 | "metadata": {},
91 | "outputs": [],
92 | "source": [
93 | "datos = pd.DataFrame({'nombres':('Juan Pérez',\n",
94 | " 'María Sánchez'\n",
95 | " , 'Jorge Vargas',\n",
96 | " 'Rodrigo Martínez'),\n",
97 | " 'fechas':(datetime(1995,12,21), \n",
98 | " datetime(1989,1,13), \n",
99 | " datetime(1992,9,14), \n",
100 | " datetime(1993,7,8)),\n",
101 | " 'saldo': (2500, \n",
102 | " 5345, \n",
103 | " np.nan, \n",
104 | " 11323.2),\n",
105 | " 'al corriente':(True, \n",
106 | " True, \n",
107 | " False, \n",
108 | " True)})"
109 | ]
110 | },
111 | {
112 | "cell_type": "code",
113 | "execution_count": null,
114 | "metadata": {
115 | "scrolled": true
116 | },
117 | "outputs": [],
118 | "source": [
119 | "datos"
120 | ]
121 | },
122 | {
123 | "cell_type": "markdown",
124 | "metadata": {},
125 | "source": [
126 | "## El atributo ```df.dtypes```.\n",
127 | "\n",
128 | "Este atributo es una serie de *Pandas* que contienen la relación de los tipos de datos de cada columna del *dataframe*.\n",
129 | "\n",
130 | "https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.dtypes.html"
131 | ]
132 | },
133 | {
134 | "cell_type": "markdown",
135 | "metadata": {},
136 | "source": [
137 | "**Ejemplo:**\n",
138 | "\n",
139 | "* A partir del *dataframe* ```datos``` se obtendrá el tipo de datos de cada columna. "
140 | ]
141 | },
142 | {
143 | "cell_type": "code",
144 | "execution_count": null,
145 | "metadata": {},
146 | "outputs": [],
147 | "source": [
148 | "datos.dtypes"
149 | ]
150 | },
151 | {
152 | "cell_type": "markdown",
153 | "metadata": {},
154 | "source": [
155 | "## El método ```df.astype()```.\n",
156 | "\n",
157 | "Este método permite regresar los datos contenidos en un dataframe de *Pandas* a un tipo de dato específico. \n",
158 | "\n",
159 | "```\n",
160 | "df.astype()\n",
161 | "```\n",
162 | "\n",
163 | "Donde:\n",
164 | "\n",
165 | "* `````` es un tipo de dato soportado por *Pandas*.\n",
166 | "\n",
167 | "https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.astype.html"
168 | ]
169 | },
170 | {
171 | "cell_type": "markdown",
172 | "metadata": {},
173 | "source": [
174 | "**Ejemplos:**"
175 | ]
176 | },
177 | {
178 | "cell_type": "markdown",
179 | "metadata": {},
180 | "source": [
181 | "* La siguiente celda convertirá el contenido del *dataframe* ```datos``` a ```str```, lo cual dará por resultado elementos de tipo ```object```."
182 | ]
183 | },
184 | {
185 | "cell_type": "code",
186 | "execution_count": null,
187 | "metadata": {},
188 | "outputs": [],
189 | "source": [
190 | "datos"
191 | ]
192 | },
193 | {
194 | "cell_type": "code",
195 | "execution_count": null,
196 | "metadata": {},
197 | "outputs": [],
198 | "source": [
199 | "datos.astype(str)"
200 | ]
201 | },
202 | {
203 | "cell_type": "code",
204 | "execution_count": null,
205 | "metadata": {},
206 | "outputs": [],
207 | "source": [
208 | "datos.astype(str).dtypes"
209 | ]
210 | },
211 | {
212 | "cell_type": "code",
213 | "execution_count": null,
214 | "metadata": {},
215 | "outputs": [],
216 | "source": [
217 | "datos.dtypes"
218 | ]
219 | },
220 | {
221 | "cell_type": "markdown",
222 | "metadata": {},
223 | "source": [
224 | "* La siguiente celda intentará convertir el contenido de la columna ```datos['saldo']``` a ```int64```. Sin embargo, algunos de sus contenidos no pueden ser convertidos a ese tipo de datos y se generará una excepciónde tipo ```IntCastingNaNError```."
225 | ]
226 | },
227 | {
228 | "cell_type": "code",
229 | "execution_count": null,
230 | "metadata": {},
231 | "outputs": [],
232 | "source": [
233 | "datos['saldo'].astype(\"int64\")"
234 | ]
235 | },
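{
"cell_type": "markdown",
"metadata": {},
"source": [
"* The following cells are a hedged workaround sketch (not part of the original lesson): replacing the missing value first makes the cast to ```int64``` possible, and rounding first allows a cast to the nullable ```'Int64'``` dtype, which can represent missing values as ```<NA>```."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Option 1: fill the missing value and cast (fractional parts are truncated).\n",
"datos['saldo'].fillna(0).astype('int64')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Option 2: round and cast to the nullable integer dtype, keeping the missing value as <NA>.\n",
"datos['saldo'].round().astype('Int64')"
]
},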
236 | {
237 | "cell_type": "markdown",
238 | "metadata": {},
239 | "source": [
240 | "## La función ```pd.to_datetime()```.\n",
241 | "\n",
242 | "Esta función permite crear una columna de tipo ```datetime64``` a partir de un *dataframe* con columnas cuyos encabezados sean:\n",
243 | "\n",
244 | "* ```year``` (obligatorio)\n",
245 | "* ```month``` (obligatorio)\n",
246 | "* ```day``` (obligatorio)\n",
247 | "* ```hour```\n",
248 | "* ```minutes```\n",
249 | "* ```seconds```\n",
250 | "\n",
251 | "```\n",
252 | "pd.to_datetime()\n",
253 | "```\n",
254 | "\n",
255 | "Donde:\n",
256 | "\n",
257 | "* `````` es un *dataframe* con los identificadores de las columnas dispuesto en el formato descrito.\n",
258 | "\n",
259 | "https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.to_datetime.html"
260 | ]
261 | },
262 | {
263 | "cell_type": "markdown",
264 | "metadata": {},
265 | "source": [
266 | "**Ejemplo:**"
267 | ]
268 | },
269 | {
270 | "cell_type": "markdown",
271 | "metadata": {},
272 | "source": [
273 | "* La siguiente celda creará al *dataframe* ```fechas``` con las columnas:\n",
274 | " * ```year```\n",
275 | " * ```month```\n",
276 | " * ```day```\n",
277 | " * ```hour```\n",
278 | " * ```minutes```\n",
279 | " * ```seconds```"
280 | ]
281 | },
282 | {
283 | "cell_type": "code",
284 | "execution_count": null,
285 | "metadata": {},
286 | "outputs": [],
287 | "source": [
288 | "fechas = pd.DataFrame({'year': [1997, 1982, 1985],\n",
289 | " 'month': [1, 12, 3],\n",
290 | " 'day': [14, 5, 21],\n",
291 | " 'hour':[17, 0, 4],\n",
292 | " 'minutes':[45, 39, 28],\n",
293 | " 'seconds':[11.1803, 23.74583, 3.8798]})"
294 | ]
295 | },
296 | {
297 | "cell_type": "code",
298 | "execution_count": null,
299 | "metadata": {},
300 | "outputs": [],
301 | "source": [
302 | "fechas"
303 | ]
304 | },
305 | {
306 | "cell_type": "markdown",
307 | "metadata": {},
308 | "source": [
309 | "* A continuacion se creará la serie ```nuevas_fechas```, compuesta por elementos de tipo ```datetime64``` al aplicar la función ```pd.to_datetime()``` al *dataframe* ```fechas```."
310 | ]
311 | },
312 | {
313 | "cell_type": "code",
314 | "execution_count": null,
315 | "metadata": {},
316 | "outputs": [],
317 | "source": [
318 | "nuevas_fechas = pd.to_datetime(fechas)"
319 | ]
320 | },
321 | {
322 | "cell_type": "code",
323 | "execution_count": null,
324 | "metadata": {},
325 | "outputs": [],
326 | "source": [
327 | "nuevas_fechas"
328 | ]
329 | },
330 | {
331 | "cell_type": "code",
332 | "execution_count": null,
333 | "metadata": {},
334 | "outputs": [],
335 | "source": [
336 | "type(nuevas_fechas)"
337 | ]
338 | },
339 | {
340 | "cell_type": "markdown",
341 | "metadata": {},
342 | "source": [
343 | "## La función ```pd.to_numeric()```.\n",
344 | "\n",
345 | "Esta función transforma al contenido de un *dataframe* o serie a un formato numérico."
346 | ]
347 | },
348 | {
349 | "cell_type": "markdown",
350 | "metadata": {},
351 | "source": [
352 | "**Ejemplo:**"
353 | ]
354 | },
355 | {
356 | "cell_type": "markdown",
357 | "metadata": {},
358 | "source": [
359 | "* La siguiente celda transformará la serie ```nuevas_fechas``` a formato numérico."
360 | ]
361 | },
362 | {
363 | "cell_type": "code",
364 | "execution_count": null,
365 | "metadata": {},
366 | "outputs": [],
367 | "source": [
368 | "pd.to_numeric(nuevas_fechas)"
369 | ]
370 | },
371 | {
372 | "cell_type": "code",
373 | "execution_count": null,
374 | "metadata": {},
375 | "outputs": [],
376 | "source": [
377 | "pd.to_datetime(pd.to_numeric(nuevas_fechas))"
378 | ]
379 | },
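{
"cell_type": "markdown",
"metadata": {},
"source": [
"* The following cell is a brief sketch (not part of the original lesson) of the ```errors='coerce'``` argument of ```pd.to_numeric()```, using a small, hypothetical series: values that cannot be parsed as numbers become ```NaN``` instead of raising an exception."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# 'tres' cannot be parsed as a number, so it is coerced to NaN.\n",
"pd.to_numeric(pd.Series(['1.5', '2', 'tres']), errors='coerce')"
]
},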
380 | {
381 | "cell_type": "markdown",
382 | "metadata": {},
383 | "source": [
384 | "## La función ```pd.to_timedelta()```.\n",
385 | "\n",
386 | "Esta función convertirá valores numéricos a formato ```timedelta64``` usando nanosegundos como referencia."
387 | ]
388 | },
389 | {
390 | "cell_type": "markdown",
391 | "metadata": {},
392 | "source": [
393 | "**Ejemplo:**"
394 | ]
395 | },
396 | {
397 | "cell_type": "markdown",
398 | "metadata": {},
399 | "source": [
400 | "* La siguiente celda generará al *dataframe* ```datos``` que contiene una secuencia de 20 números."
401 | ]
402 | },
403 | {
404 | "cell_type": "code",
405 | "execution_count": null,
406 | "metadata": {},
407 | "outputs": [],
408 | "source": [
409 | "datos = pd.DataFrame(np.arange(2811154301025,\n",
410 | " 2811154301125, 5).reshape(10, 2))"
411 | ]
412 | },
413 | {
414 | "cell_type": "code",
415 | "execution_count": null,
416 | "metadata": {},
417 | "outputs": [],
418 | "source": [
419 | "datos"
420 | ]
421 | },
422 | {
423 | "cell_type": "markdown",
424 | "metadata": {},
425 | "source": [
426 | "* Se aplicará la función ```pd.to_timedelta()``` a ```datos[1]```."
427 | ]
428 | },
429 | {
430 | "cell_type": "code",
431 | "execution_count": null,
432 | "metadata": {
433 | "scrolled": true
434 | },
435 | "outputs": [],
436 | "source": [
437 | "pd.to_timedelta(datos[1])"
438 | ]
439 | },
440 | {
441 | "cell_type": "markdown",
442 | "metadata": {},
443 | "source": [
444 | "* La siguiente celda intentará ejecutar la función ```pd.to_timedelta()``` al *dataframe* ```nuevas_fechas``` el cual contiene objetos de tipo ```datetime```, desencadenando una excepción de tipo ```TypeError```."
445 | ]
446 | },
447 | {
448 | "cell_type": "code",
449 | "execution_count": null,
450 | "metadata": {},
451 | "outputs": [],
452 | "source": [
453 | "pd.to_timedelta(nuevas_fechas)"
454 | ]
455 | },
456 | {
457 | "cell_type": "markdown",
458 | "metadata": {},
459 | "source": [
460 | "
Esta obra está bajo una Licencia Creative Commons Atribución 4.0 Internacional.
\n",
461 | "© José Luis Chiquete Valdivieso. 2023.
"
462 | ]
463 | }
464 | ],
465 | "metadata": {
466 | "kernelspec": {
467 | "display_name": "Python 3 (ipykernel)",
468 | "language": "python",
469 | "name": "python3"
470 | },
471 | "language_info": {
472 | "codemirror_mode": {
473 | "name": "ipython",
474 | "version": 3
475 | },
476 | "file_extension": ".py",
477 | "mimetype": "text/x-python",
478 | "name": "python",
479 | "nbconvert_exporter": "python",
480 | "pygments_lexer": "ipython3",
481 | "version": "3.9.2"
482 | }
483 | },
484 | "nbformat": 4,
485 | "nbformat_minor": 2
486 | }
487 |
--------------------------------------------------------------------------------
/12_indices_y_multiindices.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "[](https://www.pythonista.io)"
8 | ]
9 | },
10 | {
11 | "cell_type": "markdown",
12 | "metadata": {},
13 | "source": [
14 | "# Índices y multiíndeces.\n",
15 | "\n",
16 | "Los índices e índices de columnas son objetos de *Pandas* que pueden ser tan simple como un listados de cadenas de caracteres o estructuras compleja de múltiples niveles.\n",
17 | "\n",
18 | "En este capítulo se estudiarán a los objetos instanciados de las clases ```pd.Index``` y ```pd.MultiIndex```."
19 | ]
20 | },
21 | {
22 | "cell_type": "code",
23 | "execution_count": null,
24 | "metadata": {},
25 | "outputs": [],
26 | "source": [
27 | "import pandas as pd\n",
28 | "import numpy as np"
29 | ]
30 | },
31 | {
32 | "cell_type": "markdown",
33 | "metadata": {},
34 | "source": [
35 | "## El atributo ```pd.DataFrame.axes```.\n",
36 | "\n",
37 | "El atributo ```pd.DataFrame.axes``` es una lista que contiene a los índices de renglones y de columnas de un *dataframe*.\n",
38 | "\n",
39 | "* El objeto ```pd.DataFrame.axes[0]``` corresponde a los índices del *dataframe*.\n",
40 | "* El objeto ```pd.DataFrame.axes[1]``` corresponde a los índices de la columnas del *dataframe*.\n",
41 | "\n",
42 | "Los índices pueden ser de tipo ```pd.Index``` o ```pd.MultiIndex```.\n",
43 | "\n",
44 | "https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.axes.html"
45 | ]
46 | },
47 | {
48 | "cell_type": "markdown",
49 | "metadata": {},
50 | "source": [
51 | "**Ejemplo:**"
52 | ]
53 | },
54 | {
55 | "cell_type": "markdown",
56 | "metadata": {},
57 | "source": [
58 | "* Se creará al *dataframe* ```poblacion```. "
59 | ]
60 | },
61 | {
62 | "cell_type": "code",
63 | "execution_count": null,
64 | "metadata": {},
65 | "outputs": [],
66 | "source": [
67 | "poblacion = pd.DataFrame({'Animal':('lobo',\n",
68 | " 'coyote',\n",
69 | " 'jaguar',\n",
70 | " 'cerdo salvaje',\n",
71 | " 'tapir',\n",
72 | " 'venado',\n",
73 | " 'ocelote',\n",
74 | " 'puma'),\n",
75 | " 'Norte_I':(12,\n",
76 | " np.NAN,\n",
77 | " None,\n",
78 | " 2,\n",
79 | " 4,\n",
80 | " 2,\n",
81 | " 14,\n",
82 | " 5\n",
83 | " ),\n",
84 | " 'Norte_II':(23,\n",
85 | " 4,\n",
86 | " 25,\n",
87 | " 21,\n",
88 | " 9,\n",
89 | " 121,\n",
90 | " 1,\n",
91 | " 2\n",
92 | " ),\n",
93 | " 'Centro_I':(15,\n",
94 | " 23,\n",
95 | " 2,\n",
96 | " 120,\n",
97 | " 40,\n",
98 | " 121,\n",
99 | " 0,\n",
100 | " 5),\n",
101 | " 'Sur_I':(28,\n",
102 | " 46,\n",
103 | " 14,\n",
104 | " 156,\n",
105 | " 79,\n",
106 | " 12,\n",
107 | " 2,\n",
108 | " np.NAN)}).set_index('Animal')"
109 | ]
110 | },
111 | {
112 | "cell_type": "code",
113 | "execution_count": null,
114 | "metadata": {},
115 | "outputs": [],
116 | "source": [
117 | "poblacion"
118 | ]
119 | },
120 | {
121 | "cell_type": "markdown",
122 | "metadata": {},
123 | "source": [
124 | "* Se desplegará ```poblacion.axes```."
125 | ]
126 | },
127 | {
128 | "cell_type": "code",
129 | "execution_count": null,
130 | "metadata": {},
131 | "outputs": [],
132 | "source": [
133 | "poblacion.axes"
134 | ]
135 | },
136 | {
137 | "cell_type": "code",
138 | "execution_count": null,
139 | "metadata": {},
140 | "outputs": [],
141 | "source": [
142 | "poblacion.axes[0]"
143 | ]
144 | },
145 | {
146 | "cell_type": "code",
147 | "execution_count": null,
148 | "metadata": {},
149 | "outputs": [],
150 | "source": [
151 | "poblacion.axes[1]"
152 | ]
153 | },
154 | {
155 | "cell_type": "markdown",
156 | "metadata": {},
157 | "source": [
158 | "## La clase ```pd.Index```.\n",
159 | "\n",
160 | "Esta clase es la clase que permite crear índices simples y se instancia de esta manera.\n",
161 | "\n",
162 | "```\n",
163 | "pd.Index(['<índice 1>', '<índice 2>',..., '<índice n>'], name='')\n",
164 | "```\n",
165 | "Donde:\n",
166 | "\n",
167 | "* ```<índice x>``` es una cadena de caracteres correspondiente al nombre de un índice.\n",
168 | "* `````` es una cadena de caracteres para el atributo ```name``` del objeto ```pd.Index```."
169 | ]
170 | },
171 | {
172 | "cell_type": "markdown",
173 | "metadata": {},
174 | "source": [
175 | "**Ejemplo:**"
176 | ]
177 | },
178 | {
179 | "cell_type": "markdown",
180 | "metadata": {},
181 | "source": [
182 | "* Se creará el objeto ```pd.Index``` con nombre ```indice```."
183 | ]
184 | },
185 | {
186 | "cell_type": "code",
187 | "execution_count": null,
188 | "metadata": {},
189 | "outputs": [],
190 | "source": [
191 | "indice = pd.Index(['N_1', 'N_2', 'C', 'S'], name='Regiones')"
192 | ]
193 | },
194 | {
195 | "cell_type": "markdown",
196 | "metadata": {},
197 | "source": [
198 | "* Se asignará ```indice``` al atributo ```poblacion.columns```."
199 | ]
200 | },
201 | {
202 | "cell_type": "code",
203 | "execution_count": null,
204 | "metadata": {},
205 | "outputs": [],
206 | "source": [
207 | "poblacion"
208 | ]
209 | },
210 | {
211 | "cell_type": "code",
212 | "execution_count": null,
213 | "metadata": {},
214 | "outputs": [],
215 | "source": [
216 | "poblacion.columns = indice"
217 | ]
218 | },
219 | {
220 | "cell_type": "code",
221 | "execution_count": null,
222 | "metadata": {},
223 | "outputs": [],
224 | "source": [
225 | "poblacion"
226 | ]
227 | },
228 | {
229 | "cell_type": "markdown",
230 | "metadata": {},
231 | "source": [
232 | "### El atributo ```pd.Index.name```.\n",
233 | "\n",
234 | "Este atributo contiene el nombre del objeto ```pd.Index.name```, el cual será desplegado como parte de un índice."
235 | ]
236 | },
237 | {
238 | "cell_type": "markdown",
239 | "metadata": {},
240 | "source": [
241 | "* Se despegará el atributo ```name``` de ```poblacion.index```."
242 | ]
243 | },
244 | {
245 | "cell_type": "code",
246 | "execution_count": null,
247 | "metadata": {},
248 | "outputs": [],
249 | "source": [
250 | "poblacion.index.name"
251 | ]
252 | },
253 | {
254 | "cell_type": "code",
255 | "execution_count": null,
256 | "metadata": {},
257 | "outputs": [],
258 | "source": [
259 | "poblacion.columns.name"
260 | ]
261 | },
262 | {
263 | "cell_type": "markdown",
264 | "metadata": {},
265 | "source": [
266 | "### El atributo ```pd.Index.values```.\n",
267 | "\n",
268 | "Este atributo es un objeto ```np.ndarray``` que contiene los nombres de cada índice."
269 | ]
270 | },
271 | {
272 | "cell_type": "markdown",
273 | "metadata": {},
274 | "source": [
275 | "**Ejemplos:**"
276 | ]
277 | },
278 | {
279 | "cell_type": "markdown",
280 | "metadata": {},
281 | "source": [
282 | "* Se desplegará el atributo ```poblacion.columns.values```."
283 | ]
284 | },
285 | {
286 | "cell_type": "code",
287 | "execution_count": null,
288 | "metadata": {},
289 | "outputs": [],
290 | "source": [
291 | "poblacion.columns.values"
292 | ]
293 | },
294 | {
295 | "cell_type": "markdown",
296 | "metadata": {},
297 | "source": [
298 | "* Se sustituirá el valor de ```poblacion.columns.values[3]``` por la cadena ```'Sur'```."
299 | ]
300 | },
301 | {
302 | "cell_type": "code",
303 | "execution_count": null,
304 | "metadata": {},
305 | "outputs": [],
306 | "source": [
307 | "poblacion.columns.values[3] = 'Sur'"
308 | ]
309 | },
310 | {
311 | "cell_type": "code",
312 | "execution_count": null,
313 | "metadata": {},
314 | "outputs": [],
315 | "source": [
316 | "poblacion"
317 | ]
318 | },
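{
"cell_type": "markdown",
"metadata": {},
"source": [
"* Mutating ```pd.Index.values``` in place, as above, is generally discouraged because index objects are meant to be immutable. The following cell is an alternative sketch (not part of the original lesson) that uses ```df.rename()``` to return a new *dataframe* with a relabeled column, assuming the column is currently labeled ```'Sur'``` as shown above."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# rename() does not modify poblacion; it returns a new dataframe with the column relabeled.\n",
"poblacion.rename(columns={'Sur': 'Sur_I'})"
]
},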
319 | {
320 | "cell_type": "markdown",
321 | "metadata": {},
322 | "source": [
323 | "## La clase ```pd.MultiIndex```.\n",
324 | "\n",
325 | "Los objetos instanciados de la clase ```pd.MultiIndex``` permiten tener más de un nivel de índices.\n",
326 | "\n",
327 | "Estos objetos están conformados por:\n",
328 | "* niveles (```levels```), los cuales se van desagregando conforme descienden.\n",
329 | "* codigos de ordenamiento (```codes```), ls cuales contienen listas describiendo la distribución de los índices por nivel.\n",
330 | "* nombres (```names```) correspondientes a cada nivel."
331 | ]
332 | },
333 | {
334 | "cell_type": "markdown",
335 | "metadata": {},
336 | "source": [
337 | " ## Creación de un objeto ```pd.MultiIndex```.\n",
338 | " \n",
339 | " Para la creación de objetos instanciados de ```pd.MultiIndex``` se pueden utilizar los siguientes métodos de clase:\n",
340 | " \n",
341 | " * ```pd.MultiIndex.from_arrays()```.\n",
342 | " * ```pd.MultiIndex.from_tuples()```.\n",
343 | " * ```pd.MultiIndex.from_products()```.\n",
344 | " * ```pd.MultiIndex.from_dataframes()```. \n",
345 | " \n",
346 | " \n",
347 | " ```\n",
348 | " pd.MultiIndex.(, names=)\n",
349 | " ```"
350 | ]
351 | },
352 | {
353 | "cell_type": "markdown",
354 | "metadata": {},
355 | "source": [
356 | "**Ejemplos:**"
357 | ]
358 | },
359 | {
360 | "cell_type": "markdown",
361 | "metadata": {},
362 | "source": [
363 | "* A continuación se creará una tupla que describe diversos índices."
364 | ]
365 | },
366 | {
367 | "cell_type": "code",
368 | "execution_count": null,
369 | "metadata": {},
370 | "outputs": [],
371 | "source": [
372 | "lista = []\n",
373 | "for zona in ('Norte_I', 'Sur_I', 'Centro_I'):\n",
374 | " for animal in ('jaguar', 'conejo', 'lobo'):\n",
375 | " lista.append((zona, animal)) "
376 | ]
377 | },
378 | {
379 | "cell_type": "code",
380 | "execution_count": null,
381 | "metadata": {},
382 | "outputs": [],
383 | "source": [
384 | "lista"
385 | ]
386 | },
387 | {
388 | "cell_type": "code",
389 | "execution_count": null,
390 | "metadata": {},
391 | "outputs": [],
392 | "source": [
393 | "tupla=tuple(lista) "
394 | ]
395 | },
396 | {
397 | "cell_type": "code",
398 | "execution_count": null,
399 | "metadata": {},
400 | "outputs": [],
401 | "source": [
402 | "tupla"
403 | ]
404 | },
405 | {
406 | "cell_type": "markdown",
407 | "metadata": {},
408 | "source": [
409 | "* La siguiente celda creará un objeto a partir de ```pd.MultiIndex.from_tuples```."
410 | ]
411 | },
412 | {
413 | "cell_type": "code",
414 | "execution_count": null,
415 | "metadata": {
416 | "scrolled": true
417 | },
418 | "outputs": [],
419 | "source": [
420 | "pd.MultiIndex.from_tuples(tupla, names=['zona', 'animal'])"
421 | ]
422 | },
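{
"cell_type": "markdown",
"metadata": {},
"source": [
"* For completeness, the following cell is a brief sketch (not part of the original lesson) that builds an equivalent index with ```pd.MultiIndex.from_arrays()```, passing one sequence per level."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# One array per level: the first holds the zone of each entry, the second the animal.\n",
"zonas = [zona for zona, animal in tupla]\n",
"animales = [animal for zona, animal in tupla]\n",
"pd.MultiIndex.from_arrays([zonas, animales], names=['zona', 'animal'])"
]
},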
423 | {
424 | "cell_type": "markdown",
425 | "metadata": {},
426 | "source": [
427 | "* A continuación se crearán objetos similares utilizando el método ```pd.MultiIndex.from_product()```."
428 | ]
429 | },
430 | {
431 | "cell_type": "code",
432 | "execution_count": null,
433 | "metadata": {},
434 | "outputs": [],
435 | "source": [
436 | "pd.MultiIndex.from_product([('Norte_I', 'Sur_I', 'Centro_I'), \n",
437 | " ('jaguar', 'conejo', 'lobo')],\n",
438 | " names=['zona', 'animal'])"
439 | ]
440 | },
441 | {
442 | "cell_type": "markdown",
443 | "metadata": {},
444 | "source": [
445 | "* Se definirá al objeto ```columnas``` utilizando el método ```pd.MultiIndex.from_product()```."
446 | ]
447 | },
448 | {
449 | "cell_type": "code",
450 | "execution_count": null,
451 | "metadata": {},
452 | "outputs": [],
453 | "source": [
454 | "columnas = pd.MultiIndex.from_product([('Norte_I', 'Sur_I', 'Centro_I'),\n",
455 | " ('jaguar', 'conejo', 'lobo')],\n",
456 | " names=['zona', 'animal'])"
457 | ]
458 | },
459 | {
460 | "cell_type": "markdown",
461 | "metadata": {},
462 | "source": [
463 | "El *dataframe* ```poblacion``` será creado utilizando el objeto ```columnas``` para el atributo ```poblacion.columns```."
464 | ]
465 | },
466 | {
467 | "cell_type": "code",
468 | "execution_count": null,
469 | "metadata": {},
470 | "outputs": [],
471 | "source": [
472 | "poblacion = pd.DataFrame([[12, 11, 24, 32, 15, 42, 35, 11, 35],\n",
473 | " [23, 22, 54, 3, 34, 24, 39, 29, 11],\n",
474 | " [35, 32, 67, 15, 42, 34, 46, 40, 13],\n",
475 | " [33, 43, 87, 11, 61, 42, 52, 41, 15],\n",
476 | " [44, 56, 98, 16, 70, 50, 57, 41, 17],\n",
477 | " [53, 62, 103, 21, 74, 54, 69, 55, 23]], \n",
478 | " index=('enero', 'febrero', 'marzo', 'abril', 'mayo', 'junio'),\n",
479 | " columns=columnas)"
480 | ]
481 | },
482 | {
483 | "cell_type": "code",
484 | "execution_count": null,
485 | "metadata": {},
486 | "outputs": [],
487 | "source": [
488 | "poblacion.columns"
489 | ]
490 | },
491 | {
492 | "cell_type": "code",
493 | "execution_count": null,
494 | "metadata": {
495 | "scrolled": true
496 | },
497 | "outputs": [],
498 | "source": [
499 | "poblacion"
500 | ]
501 | },
502 | {
503 | "cell_type": "markdown",
504 | "metadata": {},
505 | "source": [
506 | "# Indexado.\n",
507 | "\n",
508 | "El indexado de un índice se realiza mediante corchetes. El corchete inicial corresponde al nivel superior.\n",
509 | "\n",
510 | "```\n",
511 | "df[][]...[]\n",
512 | "```\n",
513 | "Donde:\n",
514 | "\n",
515 | "* Cada `````` es el identificador del índice al que se desea acceder en el nivel ```i``` correspondiente."
516 | ]
517 | },
518 | {
519 | "cell_type": "markdown",
520 | "metadata": {},
521 | "source": [
522 | "**Ejemplo:**"
523 | ]
524 | },
525 | {
526 | "cell_type": "markdown",
527 | "metadata": {},
528 | "source": [
529 | "* La siguiente celda regreará un *dataframe* correspondiente a las columnas del índice de primer nivel ```poblacion['Norte_I']```."
530 | ]
531 | },
532 | {
533 | "cell_type": "code",
534 | "execution_count": null,
535 | "metadata": {
536 | "scrolled": true
537 | },
538 | "outputs": [],
539 | "source": [
540 | "poblacion['Norte_I']"
541 | ]
542 | },
543 | {
544 | "cell_type": "markdown",
545 | "metadata": {},
546 | "source": [
547 | "* La siguiente celda regreará un *dataframe* correspondiente a las columnas del índice de primer nivel ```poblacion['Sur_I']['jaguar']```."
548 | ]
549 | },
550 | {
551 | "cell_type": "code",
552 | "execution_count": null,
553 | "metadata": {},
554 | "outputs": [],
555 | "source": [
556 | "poblacion['Sur_I']['jaguar']"
557 | ]
558 | },
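{
"cell_type": "markdown",
"metadata": {},
"source": [
"* The same column can be selected in a single step by passing a tuple of labels to ```df.loc[]```. The following cell is a brief sketch (not part of the original lesson)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# The tuple ('Sur_I', 'jaguar') addresses one column of the multi-indexed columns.\n",
"poblacion.loc[:, ('Sur_I', 'jaguar')]"
]
},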
559 | {
560 | "cell_type": "markdown",
561 | "metadata": {},
562 | "source": [
563 | "## El metódo ```pd.MultiIndex.droplevel()```.\n",
564 | "\n",
565 | "Este método elimina un nivel de un objeto ```pd.MultiIndex```.\n",
566 | "\n",
567 | "```\n",
568 | ".droplevel()\n",
569 | "```"
570 | ]
571 | },
572 | {
573 | "cell_type": "markdown",
574 | "metadata": {},
575 | "source": [
576 | "**Ejemplo:**"
577 | ]
578 | },
579 | {
580 | "cell_type": "markdown",
581 | "metadata": {},
582 | "source": [
583 | "* Se utilizará al objeto ```columnas``` creado previamente."
584 | ]
585 | },
586 | {
587 | "cell_type": "code",
588 | "execution_count": null,
589 | "metadata": {
590 | "scrolled": true
591 | },
592 | "outputs": [],
593 | "source": [
594 | "columnas"
595 | ]
596 | },
597 | {
598 | "cell_type": "markdown",
599 | "metadata": {},
600 | "source": [
601 | "* La siguiente celda eliminará el primer nivel del objeto ```columnas```."
602 | ]
603 | },
604 | {
605 | "cell_type": "code",
606 | "execution_count": null,
607 | "metadata": {},
608 | "outputs": [],
609 | "source": [
610 | "columnas.droplevel(0)"
611 | ]
612 | },
613 | {
614 | "cell_type": "markdown",
615 | "metadata": {},
616 | "source": [
617 | "* La siguiente celda al objeto ```nuevas_cols``` a partir de eliminar el primer nivel del objeto ```columnas```."
618 | ]
619 | },
620 | {
621 | "cell_type": "code",
622 | "execution_count": null,
623 | "metadata": {},
624 | "outputs": [],
625 | "source": [
626 | "nuevas_cols = poblacion.columns.droplevel('zona')"
627 | ]
628 | },
629 | {
630 | "cell_type": "code",
631 | "execution_count": null,
632 | "metadata": {
633 | "scrolled": true
634 | },
635 | "outputs": [],
636 | "source": [
637 | "nuevas_cols"
638 | ]
639 | },
640 | {
641 | "cell_type": "markdown",
642 | "metadata": {},
643 | "source": [
644 | "* La siguiente celda asignará al atributo ```poblacion.columns``` el objeto ```nuevas_cols```."
645 | ]
646 | },
647 | {
648 | "cell_type": "code",
649 | "execution_count": null,
650 | "metadata": {},
651 | "outputs": [],
652 | "source": [
653 | "poblacion.columns = nuevas_cols"
654 | ]
655 | },
656 | {
657 | "cell_type": "code",
658 | "execution_count": null,
659 | "metadata": {
660 | "scrolled": true
661 | },
662 | "outputs": [],
663 | "source": [
664 | "poblacion"
665 | ]
666 | },
667 | {
668 | "cell_type": "code",
669 | "execution_count": null,
670 | "metadata": {},
671 | "outputs": [],
672 | "source": [
673 | "poblacion['jaguar']"
674 | ]
675 | },
676 | {
677 | "cell_type": "markdown",
678 | "metadata": {},
679 | "source": [
680 | "
Esta obra está bajo una Licencia Creative Commons Atribución 4.0 Internacional.
\n",
681 | "© José Luis Chiquete Valdivieso. 2023.
"
682 | ]
683 | }
684 | ],
685 | "metadata": {
686 | "kernelspec": {
687 | "display_name": "Python 3 (ipykernel)",
688 | "language": "python",
689 | "name": "python3"
690 | },
691 | "language_info": {
692 | "codemirror_mode": {
693 | "name": "ipython",
694 | "version": 3
695 | },
696 | "file_extension": ".py",
697 | "mimetype": "text/x-python",
698 | "name": "python",
699 | "nbconvert_exporter": "python",
700 | "pygments_lexer": "ipython3",
701 | "version": "3.9.2"
702 | }
703 | },
704 | "nbformat": 4,
705 | "nbformat_minor": 2
706 | }
707 |
--------------------------------------------------------------------------------
/14_metodo_merge.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "[](https://www.pythonista.io)"
8 | ]
9 | },
10 | {
11 | "cell_type": "markdown",
12 | "metadata": {},
13 | "source": [
14 | "# El método ```pd.DataFrame.merge()```."
15 | ]
16 | },
17 | {
18 | "cell_type": "code",
19 | "execution_count": null,
20 | "metadata": {},
21 | "outputs": [],
22 | "source": [
23 | "import pandas as pd\n",
24 | "from datetime import datetime"
25 | ]
26 | },
27 | {
28 | "cell_type": "markdown",
29 | "metadata": {},
30 | "source": [
31 | "El método ```pd.DataFrame.merge()``` permite crear en un nuevo *dataframe* a partir de la relación entre el *dataframe* de origen y el que se ingresa como *argumento*, indicando las columnas en las que pueda encontrar elementos coincidentes.\n",
32 | "\n",
33 | "\n",
34 | "```\n",
35 | "df.merge(, left_on=[, , ..., ], \n",
36 | " right_on=[, , ..., ], \n",
37 | " on=[, ,.. ],\n",
38 | " how=)\n",
39 | "```\n",
40 | "\n",
41 | "Donde:\n",
42 | "\n",
43 | "* `````` es un *dataframe* de *Pandas*.\n",
44 | "* Cada `````` es un objeto `````` que corresponde al identificador de una columna del *dataframe* que contiene al método.\n",
45 | "* `````` es un objeto `````` que corresponde al identificador de una columna del *dataframe* ``````.\n",
46 | "* Cada ``````es un objeto `````` que corresponde al identificador de una columna que comparte el mismo nombre en ambos *dataframes*.\n",
47 | "* `````` es el modo en el que se realizará la combinación y puede ser:\n",
48 | "\n",
49 | " * ```'inner'```, el cual es el valor por defecto.\n",
50 | " * ```'outer'```\n",
51 | " * ```'left'```\n",
52 | " * ```'right'```\n",
53 | " \n",
54 | "https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html"
55 | ]
56 | },
57 | {
58 | "cell_type": "markdown",
59 | "metadata": {},
60 | "source": [
61 | "**Ejemplos:**"
62 | ]
63 | },
64 | {
65 | "cell_type": "markdown",
66 | "metadata": {},
67 | "source": [
68 | "* La siguiente celda creará al *dataframe* ```clientes```, el cual contiene a las columnas:\n",
69 | "\n",
70 | "* ```'ident'```.\n",
71 | "* ```'nombre'```.\n",
72 | "* ```'primer apellido'```.\n",
73 | "* ```'suc_origen'```."
74 | ]
75 | },
76 | {
77 | "cell_type": "code",
78 | "execution_count": null,
79 | "metadata": {},
80 | "outputs": [],
81 | "source": [
82 | "clientes = pd.DataFrame({'ident':(19232, \n",
83 | " 19233, \n",
84 | " 19234, \n",
85 | " 19235, \n",
86 | " 19236),\n",
87 | " 'nombre':('Adriana',\n",
88 | " 'Marcos',\n",
89 | " 'Rubén',\n",
90 | " 'Samuel',\n",
91 | " 'Martha'),\n",
92 | " 'primer apellido':('Sánchez',\n",
93 | " 'García',\n",
94 | " 'Rincón',\n",
95 | " 'Oliva',\n",
96 | " 'Martínez'),\n",
97 | " 'suc_origen':('CDMX01',\n",
98 | " 'CDMX02',\n",
99 | " 'CDMX02',\n",
100 | " 'CDMX01',\n",
101 | " 'CDMX03')\n",
102 | " })"
103 | ]
104 | },
105 | {
106 | "cell_type": "code",
107 | "execution_count": null,
108 | "metadata": {
109 | "scrolled": false
110 | },
111 | "outputs": [],
112 | "source": [
113 | "clientes"
114 | ]
115 | },
116 | {
117 | "cell_type": "markdown",
118 | "metadata": {},
119 | "source": [
120 | "* La siguiente celda creará al *dataframe* ```sucursales```, el cual contiene a las columnas:\n",
121 | "\n",
122 | "* ```'clave'```.\n",
123 | "* ```nombre_comercial```."
124 | ]
125 | },
126 | {
127 | "cell_type": "code",
128 | "execution_count": null,
129 | "metadata": {},
130 | "outputs": [],
131 | "source": [
132 | "sucursales = pd.DataFrame({'clave':('CDMX01', \n",
133 | " 'CDMX02', \n",
134 | " 'MTY01', \n",
135 | " 'GDL01'),\n",
136 | " 'nombre_comercial':('Galerías',\n",
137 | " 'Centro',\n",
138 | " 'Puerta de la Silla',\n",
139 | " 'Minerva Plaza')})"
140 | ]
141 | },
142 | {
143 | "cell_type": "code",
144 | "execution_count": null,
145 | "metadata": {},
146 | "outputs": [],
147 | "source": [
148 | "sucursales"
149 | ]
150 | },
151 | {
152 | "cell_type": "markdown",
153 | "metadata": {},
154 | "source": [
155 | "* La siguiente celda creará al *dataframe* ```facturas```, el cual contiene a las columnas:\n",
156 | "\n",
157 | "* ```'folio'```.\n",
158 | "* ```'sucursal'```.\n",
159 | "* ```'monto'```.\n",
160 | "* ```'fecha'```.\n",
161 | "* ```'cliente'```."
162 | ]
163 | },
164 | {
165 | "cell_type": "code",
166 | "execution_count": null,
167 | "metadata": {},
168 | "outputs": [],
169 | "source": [
170 | "facturas = pd.DataFrame({'folio':(15234, \n",
171 | " \n",
172 | " 15235, \n",
173 | " 15236, \n",
174 | " 15237, \n",
175 | " 15238, \n",
176 | " 15239, \n",
177 | " 15240,\n",
178 | " 15241,\n",
179 | " 15242),\n",
180 | " 'sucursal':('CDMX01',\n",
181 | " 'MTY01',\n",
182 | " 'CDMX02',\n",
183 | " 'CDMX02',\n",
184 | " 'MTY01',\n",
185 | " 'GDL01',\n",
186 | " 'CDMX02',\n",
187 | " 'MTY01',\n",
188 | " 'GDL01'),\n",
189 | " 'monto':(1420.00,\n",
190 | " 1532.00,\n",
191 | " 890.00,\n",
192 | " 1300.00,\n",
193 | " 3121.47,\n",
194 | " 1100.5,\n",
195 | " 12230,\n",
196 | " 230.85,\n",
197 | " 1569),\n",
198 | " 'fecha':(datetime(2019,3,11,17,24),\n",
199 | " datetime(2019,3,24,14,46),\n",
200 | " datetime(2019,3,25,17,58),\n",
201 | " datetime(2019,3,27,13,11),\n",
202 | " datetime(2019,3,31,10,25),\n",
203 | " datetime(2019,4,1,18,32),\n",
204 | " datetime(2019,4,3,11,43),\n",
205 | " datetime(2019,4,4,16,55),\n",
206 | " datetime(2019,4,5,12,59)),\n",
207 | " 'cliente':(19234,\n",
208 | " 19232,\n",
209 | " 19235,\n",
210 | " 19233,\n",
211 | " 19236,\n",
212 | " 19237,\n",
213 | " 19232,\n",
214 | " 19233,\n",
215 | " 19232)\n",
216 | " })"
217 | ]
218 | },
219 | {
220 | "cell_type": "code",
221 | "execution_count": null,
222 | "metadata": {
223 | "scrolled": true
224 | },
225 | "outputs": [],
226 | "source": [
227 | "facturas"
228 | ]
229 | },
230 | {
231 | "cell_type": "markdown",
232 | "metadata": {},
233 | "source": [
234 | "* Cada una de las siguentes dos celdas regresarán un *dataframe* que relacionará los elementos de ```facturas[\"sucursal\"]``` con ```sucursales[\"clave\"]```."
235 | ]
236 | },
237 | {
238 | "cell_type": "code",
239 | "execution_count": null,
240 | "metadata": {},
241 | "outputs": [],
242 | "source": [
243 | "facturas.merge(sucursales, left_on=\"sucursal\", right_on=\"clave\")"
244 | ]
245 | },
246 | {
247 | "cell_type": "code",
248 | "execution_count": null,
249 | "metadata": {
250 | "scrolled": true
251 | },
252 | "outputs": [],
253 | "source": [
254 | "sucursales.merge(facturas, left_on=\"clave\", right_on=\"sucursal\")"
255 | ]
256 | },
257 | {
258 | "cell_type": "markdown",
259 | "metadata": {},
260 | "source": [
261 | "* La siguente celda regresará un *dataframe* que relacionará los elementos de ```facturas[\"cliente\"]``` con ```clientes[\"ident\"]```."
262 | ]
263 | },
264 | {
265 | "cell_type": "code",
266 | "execution_count": null,
267 | "metadata": {
268 | "scrolled": false
269 | },
270 | "outputs": [],
271 | "source": [
272 | "facturas.merge(clientes, left_on=\"cliente\", right_on=\"ident\")"
273 | ]
274 | },
275 | {
276 | "cell_type": "markdown",
277 | "metadata": {},
278 | "source": [
279 | "* La siguente celda regresará un *dataframe* que relacionará los elementos de ```clientes[\"suc_origen\"]``` con ```sucursales[\"clave\"]```."
280 | ]
281 | },
282 | {
283 | "cell_type": "code",
284 | "execution_count": null,
285 | "metadata": {},
286 | "outputs": [],
287 | "source": [
288 | "clientes.merge(sucursales, left_on='suc_origen', right_on='clave')"
289 | ]
290 | },
291 | {
292 | "cell_type": "markdown",
293 | "metadata": {},
294 | "source": [
295 | "* La siguente celda regresará un *dataframe* que relacionará:\n",
296 | " * Los elementos de ```facturas[\"cliente\"]``` con ```clientes[\"ident\"]```.\n",
297 | " * Los elementos de ```facturas[\"sucursal\"]``` con ```clientes[\"suc_origen\"]```.\n",
298 | "\n",
299 | "* El *dataframe* resultante contendrá exclusivamente aquellos elementos en los que exista coincidencia en ambas relaciones."
300 | ]
301 | },
302 | {
303 | "cell_type": "code",
304 | "execution_count": null,
305 | "metadata": {
306 | "scrolled": false
307 | },
308 | "outputs": [],
309 | "source": [
310 | "facturas.merge(clientes, left_on=[\"cliente\", \"sucursal\"],\n",
311 | " right_on=[\"ident\", \"suc_origen\"])"
312 | ]
313 | },
314 | {
315 | "cell_type": "markdown",
316 | "metadata": {},
317 | "source": [
318 | "* La siguente celda regresará un *dataframe* que relacionará:\n",
319 | " * Los elementos de ```facturas[\"cliente\"]``` con ```clientes[\"ident\"]```.\n",
320 | " * Los elementos de ```facturas[\"sucursal\"]``` con ```clientes[\"suc_origen\"]```.\n",
321 | " * Se ingresará el argumento ```how=\"inner\"```.\n",
322 | "\n",
323 | "* El *dataframe* resultante contendrá exclusivamente aquellos elementos en los que exista coincidencia en ambas relaciones."
324 | ]
325 | },
326 | {
327 | "cell_type": "code",
328 | "execution_count": null,
329 | "metadata": {},
330 | "outputs": [],
331 | "source": [
332 | "facturas.merge(clientes, left_on=[\"cliente\", \"sucursal\"],\n",
333 | " right_on=[\"ident\", \"suc_origen\"], how=\"inner\")"
334 | ]
335 | },
336 | {
337 | "cell_type": "markdown",
338 | "metadata": {},
339 | "source": [
340 | "* La siguente celda regresará un *dataframe* que relacionará:\n",
341 | " * Los elementos de ```facturas[\"cliente\"]``` con ```clientes[\"ident\"]```.\n",
342 | " * Los elementos de ```facturas[\"sucursal\"]``` con ```clientes[\"suc_origen\"]```.\n",
343 | " * Se ingresará el argumento ```how=\"outer\"```.\n",
344 | "\n",
345 | "* El *dataframe* resultante:\n",
346 | " * Contendrá a todas las posibles relaciones entre los *dataframes* ```facturas``` y ```clientes```.\n",
347 | " * Cuando no existan coincidencias entre *dataframes*, la información faltante será completada con valores ```np.NaN```. "
348 | ]
349 | },
350 | {
351 | "cell_type": "code",
352 | "execution_count": null,
353 | "metadata": {
354 | "scrolled": true
355 | },
356 | "outputs": [],
357 | "source": [
358 | "facturas.merge(clientes, left_on=[\"cliente\", \"sucursal\"],\n",
359 | " right_on=[\"ident\", \"suc_origen\"], how=\"outer\")"
360 | ]
361 | },
362 | {
363 | "cell_type": "markdown",
364 | "metadata": {},
365 | "source": [
366 | "* La siguente celda regresará un *dataframe* que relacionará:\n",
367 | " * Los elementos de ```facturas[\"cliente\"]``` con ```clientes[\"ident\"]```.\n",
368 | " * Los elementos de ```facturas[\"sucursal\"]``` con ```clientes[\"suc_origen\"]```.\n",
369 | " * Se ingresará el argumento ```how=\"left\"```.\n",
370 | "\n",
371 | "* El *dataframe* resultante:\n",
372 | " * Contendrá a las posibles relaciones del *dataframes* ```facturas```.\n",
373 | " * Cuando no existan coincidencias entre *dataframes*, la información faltante será completada con valores ```np.NaN```. "
374 | ]
375 | },
376 | {
377 | "cell_type": "code",
378 | "execution_count": null,
379 | "metadata": {
380 | "scrolled": false
381 | },
382 | "outputs": [],
383 | "source": [
384 | "facturas.merge(clientes, left_on=[\"cliente\", \"sucursal\"],\n",
385 | " right_on=[\"ident\", \"suc_origen\"], how=\"left\")"
386 | ]
387 | },
388 | {
389 | "cell_type": "markdown",
390 | "metadata": {},
391 | "source": [
392 | "* La siguente celda regresará un *dataframe* que relacionará:\n",
393 | " * Los elementos de ```facturas[\"cliente\"]``` con ```clientes[\"ident\"]```.\n",
394 | " * Los elementos de ```facturas[\"sucursal\"]``` con ```clientes[\"suc_origen\"]```.\n",
395 | " * Se ingresará el argumento ```how=\"right\"```.\n",
396 | "\n",
397 | "* El *dataframe* resultante:\n",
398 | " * Contendrá a las posibles relaciones del *dataframes* ```clientes```.\n",
399 | " * Cuando no existan coincidencias entre *dataframes*, la información faltante será completada con valores ```np.NaN```. "
400 | ]
401 | },
402 | {
403 | "cell_type": "code",
404 | "execution_count": null,
405 | "metadata": {
406 | "scrolled": true
407 | },
408 | "outputs": [],
409 | "source": [
410 | "facturas.merge(clientes, left_on=[\"cliente\", \"sucursal\"],\n",
411 | " right_on=[\"ident\", \"suc_origen\"], how=\"right\")"
412 | ]
413 | },
414 | {
415 | "cell_type": "markdown",
416 | "metadata": {},
417 | "source": [
418 | "
Esta obra está bajo una Licencia Creative Commons Atribución 4.0 Internacional.
\n",
419 | "© José Luis Chiquete Valdivieso. 2023.
"
420 | ]
421 | }
422 | ],
423 | "metadata": {
424 | "kernelspec": {
425 | "display_name": "Python 3 (ipykernel)",
426 | "language": "python",
427 | "name": "python3"
428 | },
429 | "language_info": {
430 | "codemirror_mode": {
431 | "name": "ipython",
432 | "version": 3
433 | },
434 | "file_extension": ".py",
435 | "mimetype": "text/x-python",
436 | "name": "python",
437 | "nbconvert_exporter": "python",
438 | "pygments_lexer": "ipython3",
439 | "version": "3.9.2"
440 | }
441 | },
442 | "nbformat": 4,
443 | "nbformat_minor": 2
444 | }
445 |
--------------------------------------------------------------------------------
/15_metodo_filter.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "[](https://www.pythonista.io)"
8 | ]
9 | },
10 | {
11 | "cell_type": "markdown",
12 | "metadata": {},
13 | "source": [
14 | "# El método ```pd.DataFrame.filter()```."
15 | ]
16 | },
17 | {
18 | "cell_type": "code",
19 | "execution_count": null,
20 | "metadata": {},
21 | "outputs": [],
22 | "source": [
23 | "import pandas as pd\n",
24 | "from datetime import datetime"
25 | ]
26 | },
27 | {
28 | "cell_type": "markdown",
29 | "metadata": {},
30 | "source": [
31 | "El método ```pd.DataFrame.filter()``` permite buscar coincidencias mediante ciertos argumentos de búsqueda sobre los índices de un *dataframe*. El resultado es un *dataframe* nuevo con los elementos coincidentes de la búsqueda.\n",
32 | "\n",
33 | "```\n",
34 | "df.filter(, axis=)\n",
35 | "```\n",
36 | "\n",
37 | "Donde:\n",
38 | "\n",
39 | "* `````` es un argumento que define los cirterios de búsqueda. Los parámetros disponibles para los argumentos de este método son:\n",
40 | " * ```items``` en el que se definen los encabezados a buscar dentro de un objeto iterable.\n",
41 | " * ```like``` en el que se define una cadena de caracteres que debe coincidir con el identificador de algún índice.\n",
42 | " * ```regex``` define un patrón mediante una expresión regular.\n",
43 | "* `````` puede ser:\n",
44 | " * ```0``` para realizar la búsqueda en los índices de los renglones.\n",
45 | " * ```1``` para realizar la búsqueda en los índices de lass columnas. Este es el valor por defecto. \n",
46 | "\n",
47 | "\n",
48 | "https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.filter.html"
49 | ]
50 | },
51 | {
52 | "cell_type": "markdown",
53 | "metadata": {},
54 | "source": [
55 | "**Ejemplos:**"
56 | ]
57 | },
58 | {
59 | "cell_type": "markdown",
60 | "metadata": {},
61 | "source": [
62 | "* La siguente celda definirá al *dataframe* ```facturas``` con los identificadores de columnas:\n",
63 | " * ```'folio'```.\n",
64 | " * ```'sucursal'```.\n",
65 | " * ```'monto'```.\n",
66 | " * ```'fecha'```.\n",
67 | " * ```'cliente'```."
68 | ]
69 | },
70 | {
71 | "cell_type": "code",
72 | "execution_count": null,
73 | "metadata": {},
74 | "outputs": [],
75 | "source": [
76 | "facturas = pd.DataFrame({'folio':(15234, \n",
77 | " 15235, \n",
78 | " 15236, \n",
79 | " 15237, \n",
80 | " 15238, \n",
81 | " 15239, \n",
82 | " 15240,\n",
83 | " 15241,\n",
84 | " 15242),\n",
85 | " 'sucursal':('CDMX01',\n",
86 | " 'MTY01',\n",
87 | " 'CDMX02',\n",
88 | " 'CDMX02',\n",
89 | " 'MTY01',\n",
90 | " 'GDL01',\n",
91 | " 'CDMX02',\n",
92 | " 'MTY01',\n",
93 | " 'GDL01'),\n",
94 | " 'monto':(1420.00,\n",
95 | " 1532.00,\n",
96 | " 890.00,\n",
97 | " 1300.00,\n",
98 | " 3121.47,\n",
99 | " 1100.5,\n",
100 | " 12230,\n",
101 | " 230.85,\n",
102 | " 1569),\n",
103 | " 'fecha':(datetime(2019,3,11,17,24),\n",
104 | " datetime(2019,3,24,14,46),\n",
105 | " datetime(2019,3,25,17,58),\n",
106 | " datetime(2019,3,27,13,11),\n",
107 | " datetime(2019,3,31,10,25),\n",
108 | " datetime(2019,4,1,18,32),\n",
109 | " datetime(2019,4,3,11,43),\n",
110 | " datetime(2019,4,4,16,55),\n",
111 | " datetime(2019,4,5,12,59)),\n",
112 | " 'id_cliente':(19234,\n",
113 | " 19232,\n",
114 | " 19235,\n",
115 | " 19233,\n",
116 | " 19236,\n",
117 | " 19237,\n",
118 | " 19232,\n",
119 | " 19233,\n",
120 | " 19232)\n",
121 | " })"
122 | ]
123 | },
124 | {
125 | "cell_type": "code",
126 | "execution_count": null,
127 | "metadata": {
128 | "scrolled": true
129 | },
130 | "outputs": [],
131 | "source": [
132 | "facturas"
133 | ]
134 | },
135 | {
136 | "cell_type": "markdown",
137 | "metadata": {},
138 | "source": [
139 | "* La siguiente celda regresará un *dataframe* cuyos identificadores de columna sean exactamente ```'id_cliente'``` o ```'sucursal'```."
140 | ]
141 | },
142 | {
143 | "cell_type": "code",
144 | "execution_count": null,
145 | "metadata": {},
146 | "outputs": [],
147 | "source": [
148 | "facturas.filter(items=['id_cliente','sucursal'])"
149 | ]
150 | },
151 | {
152 | "cell_type": "markdown",
153 | "metadata": {},
154 | "source": [
155 | "* La siguiente celda regresará un *dataframe* cuyos identificadores de columna que incluyan la cadena ```'mon'```."
156 | ]
157 | },
158 | {
159 | "cell_type": "code",
160 | "execution_count": null,
161 | "metadata": {},
162 | "outputs": [],
163 | "source": [
164 | "facturas.filter(like=\"mon\")"
165 | ]
166 | },
167 | {
168 | "cell_type": "markdown",
169 | "metadata": {},
170 | "source": [
171 | "* La siguiente celda regresará un *dataframe* cuyos identificadores de columna que incluyan la cadena ```'o'```."
172 | ]
173 | },
174 | {
175 | "cell_type": "code",
176 | "execution_count": null,
177 | "metadata": {
178 | "scrolled": true
179 | },
180 | "outputs": [],
181 | "source": [
182 | "facturas.filter(like=\"o\")"
183 | ]
184 | },
185 | {
186 | "cell_type": "markdown",
187 | "metadata": {},
188 | "source": [
189 | "* La siguiente celda regresará un *dataframe* cuyos identificadores de índice que incluyan la cadena ```'1'```."
190 | ]
191 | },
192 | {
193 | "cell_type": "code",
194 | "execution_count": null,
195 | "metadata": {
196 | "scrolled": true
197 | },
198 | "outputs": [],
199 | "source": [
200 | "facturas.filter(like=\"1\", axis=0)"
201 | ]
202 | },
203 | {
204 | "cell_type": "markdown",
205 | "metadata": {},
206 | "source": [
207 | "* La siguiente celda regresará un *dataframe* cuyos identificadores de columna cumplan con la expresión regular ```r\"sal$\"```."
208 | ]
209 | },
210 | {
211 | "cell_type": "code",
212 | "execution_count": null,
213 | "metadata": {
214 | "scrolled": true
215 | },
216 | "outputs": [],
217 | "source": [
218 | "facturas.filter(regex=r\"sal$\")"
219 | ]
220 | },
221 | {
222 | "cell_type": "markdown",
223 | "metadata": {},
224 | "source": [
225 | "## Ejemplo de ```pd.DataFrame.filter()``` y ```pd.DataFrame.merge().```"
226 | ]
227 | },
228 | {
229 | "cell_type": "markdown",
230 | "metadata": {},
231 | "source": [
232 | "* La siguiente celda creará al *dataframe* ```clientes``` con la estructura de columnas: \n",
233 | " * ```'id'```.\n",
234 | " * ```'nombre'```.\n",
235 | " * ```'apellido'```.\n",
236 | " * ```'suc_origen'```."
237 | ]
238 | },
239 | {
240 | "cell_type": "code",
241 | "execution_count": null,
242 | "metadata": {},
243 | "outputs": [],
244 | "source": [
245 | "clientes = pd.DataFrame({'id':(19232, \n",
246 | " 19233, \n",
247 | " 19234, \n",
248 | " 19235, \n",
249 | " 19236),\n",
250 | " 'nombre':('Adriana',\n",
251 | " 'Marcos',\n",
252 | " 'Rubén',\n",
253 | " 'Samuel',\n",
254 | " 'Martha'),\n",
255 | " 'apellido':('Sánchez',\n",
256 | " 'García',\n",
257 | " 'Rincón',\n",
258 | " 'Oliva',\n",
259 | " 'Martínez'),\n",
260 | " 'suc_origen':('CDMX01',\n",
261 | " 'CDMX02',\n",
262 | " 'CDMX02',\n",
263 | " 'CDMX01',\n",
264 | " 'CDMX03')\n",
265 | " })"
266 | ]
267 | },
268 | {
269 | "cell_type": "code",
270 | "execution_count": null,
271 | "metadata": {},
272 | "outputs": [],
273 | "source": [
274 | "clientes"
275 | ]
276 | },
277 | {
278 | "cell_type": "markdown",
279 | "metadata": {},
280 | "source": [
281 | "* La siguiente celda combinará los métodos ```filter()``` y ```merge()``` que resultarán en un *dataframe* con una estructura de columnas:\n",
282 | " * Su utilizará el método ```clientes.merge()``` para identificar coincidencias entre los elementos de ```clientes['id']``` y ```facturas['id_cliente']```.\n",
283 | " * Al *dataframe* resultante se le aplicará el método ```filter()```para regresar únicamente las columnas: \n",
284 | " * ```'folio'```. \n",
285 | " * ```'nombre```. \n",
286 | " * ```'apellido'```.\n",
287 | " * ```'monto'```."
288 | ]
289 | },
290 | {
291 | "cell_type": "code",
292 | "execution_count": null,
293 | "metadata": {
294 | "scrolled": true
295 | },
296 | "outputs": [],
297 | "source": [
298 | "clientes.merge(facturas,\n",
299 | " left_on='id',\n",
300 | " right_on='id_cliente')"
301 | ]
302 | },
303 | {
304 | "cell_type": "code",
305 | "execution_count": null,
306 | "metadata": {},
307 | "outputs": [],
308 | "source": [
309 | "clientes.filter(items=['folio', \n",
310 | " 'nombre', \n",
311 | " 'apellido', \n",
312 | " 'monto'])"
313 | ]
314 | },
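{
"cell_type": "markdown",
"metadata": {},
"source": [
"* Nótese que la celda anterior regresa únicamente las columnas ```'nombre'``` y ```'apellido'```, debido a que ```filter()``` ignora los identificadores que no existen en el *dataframe*; ```'folio'``` y ```'monto'``` no forman parte de ```clientes```, sino de ```facturas```."
]
},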
315 | {
316 | "cell_type": "code",
317 | "execution_count": null,
318 | "metadata": {
319 | "scrolled": true
320 | },
321 | "outputs": [],
322 | "source": [
323 | "clientes.merge(facturas,\n",
324 | " left_on='id',\n",
325 | " right_on='id_cliente').filter(items=['folio',\n",
326 | " 'nombre', \n",
327 | " 'apellido',\n",
328 | " 'monto'])"
329 | ]
330 | },
331 | {
332 | "cell_type": "markdown",
333 | "metadata": {},
334 | "source": [
335 | "
Esta obra está bajo una Licencia Creative Commons Atribución 4.0 Internacional.
\n",
336 | "© José Luis Chiquete Valdivieso. 2023.
"
337 | ]
338 | }
339 | ],
340 | "metadata": {
341 | "kernelspec": {
342 | "display_name": "Python 3 (ipykernel)",
343 | "language": "python",
344 | "name": "python3"
345 | },
346 | "language_info": {
347 | "codemirror_mode": {
348 | "name": "ipython",
349 | "version": 3
350 | },
351 | "file_extension": ".py",
352 | "mimetype": "text/x-python",
353 | "name": "python",
354 | "nbconvert_exporter": "python",
355 | "pygments_lexer": "ipython3",
356 | "version": "3.9.2"
357 | }
358 | },
359 | "nbformat": 4,
360 | "nbformat_minor": 2
361 | }
362 |
--------------------------------------------------------------------------------
/16_metodos_apply_y_transform.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "[](https://www.pythonista.io)"
8 | ]
9 | },
10 | {
11 | "cell_type": "markdown",
12 | "metadata": {},
13 | "source": [
14 | "# Los métodos ```apply()``` y ```transform()```."
15 | ]
16 | },
17 | {
18 | "cell_type": "code",
19 | "execution_count": null,
20 | "metadata": {},
21 | "outputs": [],
22 | "source": [
23 | "import pandas as pd\n",
24 | "import numpy as np"
25 | ]
26 | },
27 | {
28 | "cell_type": "markdown",
29 | "metadata": {},
30 | "source": [
31 | "## *Dataframe* ilustrativo."
32 | ]
33 | },
34 | {
35 | "cell_type": "markdown",
36 | "metadata": {},
37 | "source": [
38 | "El *dataframe* ```poblacion``` representa un censo poblacional de especies animales en diversas regiones geográficas.\n",
39 | "\n",
40 | "Las poblaciones de animales censadas representan los índices del *dataframe* y son:\n",
41 | "* ```'lobo'```.\n",
42 | "* ```'jaguar'```.\n",
43 | "* ```'coyote'```.\n",
44 | "* ```'halcón'```. \n",
45 | "* ```'lechuza'```.\n",
46 | "* ```'aguila'```.\n",
47 | "\n",
48 | "Las regiones geográficas representan la columnas del *dataframe* y son:\n",
49 | "\n",
50 | "* ```Norte_1```.\n",
51 | "* ```Norte_2```.\n",
52 | "* ```Sur_1```.\n",
53 | "* ```Sur_2```."
54 | ]
55 | },
56 | {
57 | "cell_type": "code",
58 | "execution_count": null,
59 | "metadata": {},
60 | "outputs": [],
61 | "source": [
62 | "indice = ('lobo', 'jaguar', 'coyote', 'halcón', 'lechuza', 'aguila')\n",
63 | "poblacion = pd.DataFrame({'Norte_1':(25,\n",
64 | " 45,\n",
65 | " 23,\n",
66 | " 67,\n",
67 | " 14,\n",
68 | " 12),\n",
69 | " 'Norte_2':(31,\n",
70 | " 0,\n",
71 | " 23,\n",
72 | " 3,\n",
73 | " 34,\n",
74 | " 2),\n",
75 | " 'Sur_1':(0,\n",
76 | " 4,\n",
77 | " 3,\n",
78 | " 1,\n",
79 | " 1,\n",
80 | " 2),\n",
81 | " 'Sur_2':(2,\n",
82 | " 0,\n",
83 | " 12,\n",
84 | " 23,\n",
85 | " 11,\n",
86 | " 2)}, index=indice)"
87 | ]
88 | },
89 | {
90 | "cell_type": "code",
91 | "execution_count": null,
92 | "metadata": {
93 | "scrolled": false
94 | },
95 | "outputs": [],
96 | "source": [
97 | "poblacion"
98 | ]
99 | },
100 | {
101 | "cell_type": "markdown",
102 | "metadata": {},
103 | "source": [
104 | "## El método ```apply()```."
105 | ]
106 | },
107 | {
108 | "cell_type": "markdown",
109 | "metadata": {},
110 | "source": [
111 | "El método ```apply()``` permite aplicar una función a una serie o dataframe de *Pandas*.\n",
112 | "\n",
113 | "```\n",
114 | ".apply(, axis=)\n",
115 | "```\n",
116 | "\n",
117 | "Donde:\n",
118 | "\n",
119 | "* `````` es una serie o un *dataframe* de *Pandas*.\n",
120 | "* `````` es una función de *Python* o de *Numpy*.\n",
121 | "* `````` puede ser:\n",
122 | " * ```0``` para aplicar la función a los renglones. Este es el valor por defecto.\n",
123 | " * ```1``` para aplicar la función a las columnas.\n",
124 | "\n",
125 | "Este método realiza operaciones de *broadcast* dentro del objeto.\n",
126 | "\n",
127 | "Para fines prácticos se explorará el método ```pd.DataFrame.apply()``` cuya documentación puede ser consultada en:\n",
128 | "\n",
129 | "https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html"
130 | ]
131 | },
132 | {
133 | "cell_type": "markdown",
134 | "metadata": {},
135 | "source": [
136 | "### Funciones aceptadas.\n",
137 | "\n",
138 | "* El método ```apply()``` permite ingresar como argumento el nombre de una función o una función *lambda* de *Python*."
139 | ]
140 | },
141 | {
142 | "cell_type": "markdown",
143 | "metadata": {},
144 | "source": [
145 | "**Ejemplos:**"
146 | ]
147 | },
148 | {
149 | "cell_type": "markdown",
150 | "metadata": {},
151 | "source": [
152 | "* La siguiente celda definirá a la función ```suma_dos()```."
153 | ]
154 | },
155 | {
156 | "cell_type": "code",
157 | "execution_count": null,
158 | "metadata": {},
159 | "outputs": [],
160 | "source": [
161 | "def suma_dos(x:int) -> int:\n",
162 | " ''''Función que regresa el resultado de sumar 2 unidades a un entero.'''\n",
163 | " return x + 2"
164 | ]
165 | },
166 | {
167 | "cell_type": "markdown",
168 | "metadata": {},
169 | "source": [
170 | "* La siguiente celda regresará un *dataframe* que contiene el resultado de ejecutar la función ```suma_dos()``` usando a cada elemento del *dataframe* ```población``` como argumento."
171 | ]
172 | },
173 | {
174 | "cell_type": "code",
175 | "execution_count": null,
176 | "metadata": {},
177 | "outputs": [],
178 | "source": [
179 | "poblacion"
180 | ]
181 | },
182 | {
183 | "cell_type": "code",
184 | "execution_count": null,
185 | "metadata": {},
186 | "outputs": [],
187 | "source": [
188 | "poblacion.apply(suma_dos)"
189 | ]
190 | },
191 | {
192 | "cell_type": "markdown",
193 | "metadata": {},
194 | "source": [
195 | "* La siguiente celda regresará un *dataframe* que contiene el resultado de ejecutar la función definida como ```lambda x: x + 2``` usando a cada elemento del *dataframe* ```población``` como argumento."
196 | ]
197 | },
198 | {
199 | "cell_type": "code",
200 | "execution_count": null,
201 | "metadata": {},
202 | "outputs": [],
203 | "source": [
204 | "poblacion.apply(lambda x: x + 2)"
205 | ]
206 | },
207 | {
208 | "cell_type": "markdown",
209 | "metadata": {},
210 | "source": [
211 | "* La siguiente celda regresará una serie que corresponde a ejecutar la función ```suma_dos()``` a cada elemento de la serie que conforma la columna ```poblacion['Norte_2']```. "
212 | ]
213 | },
214 | {
215 | "cell_type": "code",
216 | "execution_count": null,
217 | "metadata": {
218 | "scrolled": true
219 | },
220 | "outputs": [],
221 | "source": [
222 | "poblacion['Norte_2'].apply(suma_dos)"
223 | ]
224 | },
225 | {
226 | "cell_type": "markdown",
227 | "metadata": {},
228 | "source": [
229 | "### *Broadcasting*."
230 | ]
231 | },
232 | {
233 | "cell_type": "markdown",
234 | "metadata": {},
235 | "source": [
236 | "* La siguiente celda utilizará las propiedades de *broadcasting* para aplicar una función que suma diversos elementos a cada renglón del *dataframe* ```poblacion```el *dataframe* ```poblacion``` en el eje ```0```. \n",
237 | "* En vista de que el objeto ```[1, 2, 3, 4, 5, 6]``` tiene ```6``` elementos y el *dataframe* ```poblacion``` es de forma ```(6, 4)```, es posible realizar el *broadcasting*."
238 | ]
239 | },
240 | {
241 | "cell_type": "code",
242 | "execution_count": null,
243 | "metadata": {},
244 | "outputs": [],
245 | "source": [
246 | "poblacion"
247 | ]
248 | },
249 | {
250 | "cell_type": "code",
251 | "execution_count": null,
252 | "metadata": {
253 | "scrolled": false
254 | },
255 | "outputs": [],
256 | "source": [
257 | "poblacion.apply(lambda x: x + [1, 2, 3, 4, 5, 6])"
258 | ]
259 | },
260 | {
261 | "cell_type": "markdown",
262 | "metadata": {},
263 | "source": [
264 | "* La siguiente celda utilizará las propiedades de *broadcasting* para aplicar una función que suma diversos elementos a cada renglón del *dataframe* ```poblacion```el *dataframe* ```poblacion``` en el eje ```1```. \n",
265 | "* En vista de que el objeto ```[1, 2, 3, 4]``` tiene ```4``` elementos y el *dataframe* ```poblacion``` es de forma ```(6, 4)```, es posible realizar el *broadcasting*."
266 | ]
267 | },
268 | {
269 | "cell_type": "code",
270 | "execution_count": null,
271 | "metadata": {},
272 | "outputs": [],
273 | "source": [
274 | "poblacion"
275 | ]
276 | },
277 | {
278 | "cell_type": "code",
279 | "execution_count": null,
280 | "metadata": {},
281 | "outputs": [],
282 | "source": [
283 | "poblacion.apply(lambda x: x + [1, 2, 3, 4], axis=1)"
284 | ]
285 | },
286 | {
287 | "cell_type": "markdown",
288 | "metadata": {},
289 | "source": [
290 | "* La siguiente celda aplicará la función con *broadcasting* sobre el eje ```0``` con un objeto de tamaño iadecuado. Se desencadenará una excepción del tipo ```ValueError```."
291 | ]
292 | },
293 | {
294 | "cell_type": "code",
295 | "execution_count": null,
296 | "metadata": {
297 | "scrolled": false
298 | },
299 | "outputs": [],
300 | "source": [
301 | "poblacion.apply(lambda x: x + [1, 2, 3, 4])"
302 | ]
303 | },
304 | {
305 | "cell_type": "markdown",
306 | "metadata": {},
307 | "source": [
308 | "### Aplicación de funciones de *Numpy*.\n",
309 | "\n",
310 | "*Numpy* cuenta con funciones de agregación capaces de realizar operaciones con la totalidad de los elementos de un arreglo, en vez de con cada uno de ellos. \n",
311 | "\n",
312 | "El método ```apply()``` es compatible con este tipo de funciones."
313 | ]
314 | },
315 | {
316 | "cell_type": "markdown",
317 | "metadata": {},
318 | "source": [
319 | "**Ejemplo:**"
320 | ]
321 | },
322 | {
323 | "cell_type": "markdown",
324 | "metadata": {},
325 | "source": [
326 | "* La siguiente celda realizará una sumatoria de cada elemento en el eje ```0``` (renglones) del *dataframe* ```poblacion```, usando la función ```np.sum()``` y regresará una serie con los resultados."
327 | ]
328 | },
329 | {
330 | "cell_type": "code",
331 | "execution_count": null,
332 | "metadata": {
333 | "scrolled": true
334 | },
335 | "outputs": [],
336 | "source": [
337 | "poblacion.apply(np.sum)"
338 | ]
339 | },
340 | {
341 | "cell_type": "markdown",
342 | "metadata": {},
343 | "source": [
344 | "* La siguiente celda realizará una sumatoria de cada elemento en el eje ```1``` (columnas) del *dataframe* ```poblacion```, usando la función ```np.sum()``` y regresará una serie con los resultados."
345 | ]
346 | },
347 | {
348 | "cell_type": "code",
349 | "execution_count": null,
350 | "metadata": {
351 | "scrolled": true
352 | },
353 | "outputs": [],
354 | "source": [
355 | "poblacion.apply(np.sum, axis=1)"
356 | ]
357 | },
358 | {
359 | "cell_type": "markdown",
360 | "metadata": {},
361 | "source": [
362 | "* La siguiente celda realizará una sumatoria de cada elemento en el eje ```0``` (renglones) del *dataframe* ```poblacion```, usando la función ```np.mean()``` y regresará una serie con los resultados."
363 | ]
364 | },
365 | {
366 | "cell_type": "code",
367 | "execution_count": null,
368 | "metadata": {
369 | "scrolled": true
370 | },
371 | "outputs": [],
372 | "source": [
373 | "poblacion.apply(np.mean)"
374 | ]
375 | },
376 | {
377 | "cell_type": "markdown",
378 | "metadata": {},
379 | "source": [
380 | "* La siguiente celda realizará una sumatoria de cada elemento en el eje ```1``` (columnas) del *dataframe* ```poblacion```, usando la función ```np.mean()``` y regresará una serie con los resultados."
381 | ]
382 | },
383 | {
384 | "cell_type": "code",
385 | "execution_count": null,
386 | "metadata": {
387 | "scrolled": true
388 | },
389 | "outputs": [],
390 | "source": [
391 | "poblacion.apply(np.mean, axis=1)"
392 | ]
393 | },
394 | {
395 | "cell_type": "markdown",
396 | "metadata": {},
397 | "source": [
398 | "### Optimización en función de contexto de los datos."
399 | ]
400 | },
401 | {
402 | "cell_type": "markdown",
403 | "metadata": {},
404 | "source": [
405 | "El método ```pd.Dataframe.apply()``` permite identificar ciertos datos que podrían causar errores o excepciones y es capaz de utilizar funcione de *numpy* análogas que den un resultado en vez de una excepción."
406 | ]
407 | },
408 | {
409 | "cell_type": "markdown",
410 | "metadata": {},
411 | "source": [
412 | "**Ejemplo:**"
413 | ]
414 | },
415 | {
416 | "cell_type": "markdown",
417 | "metadata": {},
418 | "source": [
419 | "* La función ```np.mean()``` regresa un valor ```np.NaN``` cuando encuentra un valor ```np.Nan``` en el arreglo que se le ingresa como argumento."
420 | ]
421 | },
422 | {
423 | "cell_type": "code",
424 | "execution_count": null,
425 | "metadata": {},
426 | "outputs": [],
427 | "source": [
428 | "arreglo = np.array([25, np.NaN, 23, 67, 14, 12])"
429 | ]
430 | },
431 | {
432 | "cell_type": "code",
433 | "execution_count": null,
434 | "metadata": {},
435 | "outputs": [],
436 | "source": [
437 | "arreglo"
438 | ]
439 | },
440 | {
441 | "cell_type": "code",
442 | "execution_count": null,
443 | "metadata": {},
444 | "outputs": [],
445 | "source": [
446 | "np.mean(arreglo)"
447 | ]
448 | },
449 | {
450 | "cell_type": "markdown",
451 | "metadata": {},
452 | "source": [
453 | "* La función ```np.nanmean()``` descarta los valores ```np.NaN``` que se encuentren en el arreglo que se le ingresa como argumento y calcula el promedio con el resto de los elementos."
454 | ]
455 | },
456 | {
457 | "cell_type": "code",
458 | "execution_count": null,
459 | "metadata": {},
460 | "outputs": [],
461 | "source": [
462 | "np.nanmean(arreglo)"
463 | ]
464 | },
465 | {
466 | "cell_type": "markdown",
467 | "metadata": {},
468 | "source": [
469 | "* La siguiente celda creará al *dataframe* ```poblacion_nan``` a partir del *dataframe* ```poblacion```, sustituyendo el valor de ```poblacion_nan['Norte_1']['jaguar']```por ```np.NaN```."
470 | ]
471 | },
472 | {
473 | "cell_type": "code",
474 | "execution_count": null,
475 | "metadata": {
476 | "scrolled": true
477 | },
478 | "outputs": [],
479 | "source": [
480 | "poblacion_nan = poblacion.copy()\n",
481 | "poblacion_nan['Norte_1']['jaguar'] = np.NaN\n",
482 | "poblacion_nan"
483 | ]
484 | },
485 | {
486 | "cell_type": "markdown",
487 | "metadata": {},
488 | "source": [
489 | "* La siguiente celda usará la función ```np.mean()``` como argumento del método ```poblacion_nan.apply()```. El comportamiento es idéntico a usar ```np.nanmean()```.\n",
490 | "* El resultado para la columna ```Norte_1``` es ```28.2``` en vez de ```np.Nan```."
491 | ]
492 | },
493 | {
494 | "cell_type": "code",
495 | "execution_count": null,
496 | "metadata": {},
497 | "outputs": [],
498 | "source": [
499 | "poblacion_nan.apply(np.mean)"
500 | ]
501 | },
502 | {
503 | "cell_type": "code",
504 | "execution_count": null,
505 | "metadata": {},
506 | "outputs": [],
507 | "source": [
508 | "poblacion_nan.apply(np.nanmean)"
509 | ]
510 | },
511 | {
512 | "cell_type": "markdown",
513 | "metadata": {},
514 | "source": [
515 | "## El método ```pd.DataFrame.transform()```.\n",
516 | "\n",
517 | "Este método permite crear nuevos niveles con los resultados de las funciones de aplicará una o más funciones a los elementos de un *dataframe*.\n",
518 | "\n",
519 | "```\n",
520 | "df.transform(, , ..., axis=)\n",
521 | "```\n",
522 | "\n",
523 | "Donde:\n",
524 | "\n",
525 | "* `````` es una función de *Python* o de *Numpy*.\n",
526 | "* `````` puede ser:\n",
527 | " * ```0``` para aplicar la función a los renglones. Este es el valor por defecto.\n",
528 | " * ```1``` para aplicar la función a las columnas.\n",
529 | "\n",
530 | "**NOTA:** Este método no permite realizar operaciones de agregación.\n",
531 | "\n",
532 | "\n",
533 | "La documentación del método ```pd.DataFrame.transform()``` puede ser consultada en:\n",
534 | "\n",
535 | "https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.transform.html"
536 | ]
537 | },
538 | {
539 | "cell_type": "markdown",
540 | "metadata": {},
541 | "source": [
542 | "**Ejemplo:**"
543 | ]
544 | },
545 | {
546 | "cell_type": "markdown",
547 | "metadata": {},
548 | "source": [
549 | "* Se utilizará el *dataframe* ```poblacion``` definido previamente."
550 | ]
551 | },
552 | {
553 | "cell_type": "code",
554 | "execution_count": null,
555 | "metadata": {
556 | "scrolled": true
557 | },
558 | "outputs": [],
559 | "source": [
560 | "poblacion"
561 | ]
562 | },
563 | {
564 | "cell_type": "markdown",
565 | "metadata": {},
566 | "source": [
567 | "* La siguiente celda aplicará las funciones al *dataframe* ```poblacion```:\n",
568 | " * ```lambda x: x + [1, 2, 3, 4, 5, 6]```.\n",
569 | " * ```np.log```\n",
570 | " * ```np.sin```\n",
571 | "* El dataframe resultante tendrá un subnivel debajo de cada columna de ```poblacion``` para en el que e creará una columna con el resultado de aplicar la función tomando a cada elemento como argumento."
572 | ]
573 | },
574 | {
575 | "cell_type": "code",
576 | "execution_count": null,
577 | "metadata": {
578 | "scrolled": true
579 | },
580 | "outputs": [],
581 | "source": [
582 | "poblacion.transform([lambda x: x + [1, 2, 3, 4, 5, 6],\n",
583 | " np.log,\n",
584 | " np.sin])"
585 | ]
586 | },
587 | {
588 | "cell_type": "markdown",
589 | "metadata": {},
590 | "source": [
591 | "* El método ```poblacion.transform()``` no es compatible con la función ```np.mean()```, por lo que se desencadenará una excepción ```ValueError```."
592 | ]
593 | },
594 | {
595 | "cell_type": "code",
596 | "execution_count": null,
597 | "metadata": {
598 | "scrolled": false
599 | },
600 | "outputs": [],
601 | "source": [
602 | "poblacion.transform(np.mean)"
603 | ]
604 | },
605 | {
606 | "cell_type": "markdown",
607 | "metadata": {},
608 | "source": [
609 | "* Sin embargo, es posible ingresar una función de agregación en otra función que no realice agregación por si misma."
610 | ]
611 | },
612 | {
613 | "cell_type": "code",
614 | "execution_count": null,
615 | "metadata": {},
616 | "outputs": [],
617 | "source": [
618 | "poblacion.transform(lambda x: x - x.mean())"
619 | ]
620 | },
621 | {
622 | "cell_type": "markdown",
623 | "metadata": {},
624 | "source": [
625 | "
Esta obra está bajo una Licencia Creative Commons Atribución 4.0 Internacional.
\n",
626 | "© José Luis Chiquete Valdivieso. 2023.
"
627 | ]
628 | }
629 | ],
630 | "metadata": {
631 | "kernelspec": {
632 | "display_name": "Python 3 (ipykernel)",
633 | "language": "python",
634 | "name": "python3"
635 | },
636 | "language_info": {
637 | "codemirror_mode": {
638 | "name": "ipython",
639 | "version": 3
640 | },
641 | "file_extension": ".py",
642 | "mimetype": "text/x-python",
643 | "name": "python",
644 | "nbconvert_exporter": "python",
645 | "pygments_lexer": "ipython3",
646 | "version": "3.9.2"
647 | }
648 | },
649 | "nbformat": 4,
650 | "nbformat_minor": 2
651 | }
652 |
--------------------------------------------------------------------------------
/17_metodos_de_enmascaramiento.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "[](https://www.pythonista.io)"
8 | ]
9 | },
10 | {
11 | "cell_type": "markdown",
12 | "metadata": {},
13 | "source": [
14 | "# Métodos de enmascaramiento."
15 | ]
16 | },
17 | {
18 | "cell_type": "code",
19 | "execution_count": null,
20 | "metadata": {},
21 | "outputs": [],
22 | "source": [
23 | "import pandas as pd"
24 | ]
25 | },
26 | {
27 | "cell_type": "markdown",
28 | "metadata": {},
29 | "source": [
30 | "En este capítulo se explorarán los métodos que permiten sustituir los valores de un *dataframe* mediante otro *dataframe* con valores booleanos."
31 | ]
32 | },
33 | {
34 | "cell_type": "markdown",
35 | "metadata": {},
36 | "source": [
37 | "## *Dataframe* ilustrativo. "
38 | ]
39 | },
40 | {
41 | "cell_type": "markdown",
42 | "metadata": {},
43 | "source": [
44 | "El *dataframe* ```poblacion``` representa un censo poblacional de especies animales en diversas regiones geográficas.\n",
45 | "\n",
46 | "Las poblaciones de animales censadas representan los índices del *dataframe* y son:\n",
47 | "* ```'lobo'```.\n",
48 | "* ```'jaguar'```.\n",
49 | "* ```'coyote'```.\n",
50 | "* ```'halcón'```. \n",
51 | "* ```'lechuza'```.\n",
52 | "* ```'aguila'```.\n",
53 | "\n",
54 | "Las regiones geográficas representan la columnas del *dataframe* y son:\n",
55 | "\n",
56 | "* ```Norte_1```.\n",
57 | "* ```Norte_2```.\n",
58 | "* ```Sur_1```.\n",
59 | "* ```Sur_2```."
60 | ]
61 | },
62 | {
63 | "cell_type": "code",
64 | "execution_count": null,
65 | "metadata": {},
66 | "outputs": [],
67 | "source": [
68 | "indice = ('lobo', 'jaguar', 'coyote', 'halcón', 'lechuza', 'aguila')\n",
69 | "poblacion = pd.DataFrame({'Norte_1':(25,\n",
70 | " 0,\n",
71 | " 45,\n",
72 | " 23,\n",
73 | " 67,\n",
74 | " 12),\n",
75 | " 'Norte_2':(31,\n",
76 | " 0,\n",
77 | " 23,\n",
78 | " 3,\n",
79 | " 34,\n",
80 | " 2),\n",
81 | " 'Sur_1':(0,\n",
82 | " 4,\n",
83 | " 3,\n",
84 | " 1,\n",
85 | " 1,\n",
86 | " 2),\n",
87 | " 'Sur_2':(2,\n",
88 | " 0,\n",
89 | " 12,\n",
90 | " 23,\n",
91 | " 11,\n",
92 | " 2)},\n",
93 | " index=indice)"
94 | ]
95 | },
96 | {
97 | "cell_type": "code",
98 | "execution_count": null,
99 | "metadata": {},
100 | "outputs": [],
101 | "source": [
102 | "poblacion"
103 | ]
104 | },
105 | {
106 | "cell_type": "markdown",
107 | "metadata": {},
108 | "source": [
109 | "## Enmascaramiento.\n",
110 | "\n",
111 | "Se entiende por \"enmascaramiento\" la aplicación sobre un *datraframe* de otro *dataframe* de tamaño idéntico, pero compuesto por valores booleanos (*dataframe* de máscara), con la finalidad de sustituir cada valor del *dataframe* original en función del valor booleano correspondiente.\n"
112 | ]
113 | },
114 | {
115 | "cell_type": "markdown",
116 | "metadata": {},
117 | "source": [
118 | "## El método ```mask()```.\n",
119 | "\n",
120 | "El método ```mask()``` permite sutituir por un valor predeterminado a aquellos elementos cuya contraparte en el objeto usado como máscara sea ```True```.\n",
121 | "\n",
122 | "```\n",
123 | ".mask(, )\n",
124 | "```\n",
125 | "\n",
126 | "Donde:\n",
127 | "\n",
128 | "* `````` es una serie o un *dataframe*.\n",
129 | "* `````` es una serie o un *dataframe* de dimensiones idénticas a `````` donde todos su elementos son de tipo ```bool```.\n",
130 | "* ```valor``` es el valor que sutituirá a aquellos elementos en `````` cuya contraparte en `````` sea ```True```.\n"
131 | ]
132 | },
133 | {
134 | "cell_type": "markdown",
135 | "metadata": {},
136 | "source": [
137 | "**Ejemplo:**"
138 | ]
139 | },
140 | {
141 | "cell_type": "markdown",
142 | "metadata": {},
143 | "source": [
144 | "* Se uitlizará el *dataframe* ```poblacion``` definido previamente."
145 | ]
146 | },
147 | {
148 | "cell_type": "code",
149 | "execution_count": null,
150 | "metadata": {},
151 | "outputs": [],
152 | "source": [
153 | "poblacion"
154 | ]
155 | },
156 | {
157 | "cell_type": "markdown",
158 | "metadata": {},
159 | "source": [
160 | "* Se creará el *dataframe* ```poblacion_evaluada``` validando si cada elemento del ```poblacion``` es igual a ```0```."
161 | ]
162 | },
163 | {
164 | "cell_type": "code",
165 | "execution_count": null,
166 | "metadata": {},
167 | "outputs": [],
168 | "source": [
169 | "poblacion_evaluada = poblacion == 0"
170 | ]
171 | },
172 | {
173 | "cell_type": "code",
174 | "execution_count": null,
175 | "metadata": {},
176 | "outputs": [],
177 | "source": [
178 | "poblacion_evaluada"
179 | ]
180 | },
181 | {
182 | "cell_type": "markdown",
183 | "metadata": {},
184 | "source": [
185 | "* La siguiente celda sustiruirá con la cadena ```'extinto'``` a cada valor en el *dataframe* ```poblacion``` que corresponda a ```True``` en el *dataframe* ```poblacion_evaluada```."
186 | ]
187 | },
188 | {
189 | "cell_type": "code",
190 | "execution_count": null,
191 | "metadata": {},
192 | "outputs": [],
193 | "source": [
194 | "poblacion.mask(poblacion_evaluada, 'extinto')"
195 | ]
196 | },
197 | {
198 | "cell_type": "markdown",
199 | "metadata": {},
200 | "source": [
201 | "## El método ```where()```.\n",
202 | "\n",
203 | "\n",
204 | "El método ```where()``` permite sustituir por un valor predeterminado a aquellos elementos cuya contraparte en el objeto usado como máscara sea ```False```.\n",
205 | "\n",
206 | "```\n",
207 | ".where(, )\n",
208 | "```\n",
209 | "\n",
210 | "Donde:\n",
211 | "\n",
212 | "* `````` es una serie o un *dataframe*.\n",
213 | "* `````` es una serie o un *dataframe* de dimensiones idénticas a `````` donde todos su elementos son de tipo ```bool```.\n",
214 | "* ```valor``` es el valor que sutituirá a aquellos elementos en `````` cuya contraparte en `````` sea ```False```.\n"
215 | ]
216 | },
217 | {
218 | "cell_type": "markdown",
219 | "metadata": {},
220 | "source": [
221 | "**Ejemplo:**"
222 | ]
223 | },
224 | {
225 | "cell_type": "markdown",
226 | "metadata": {},
227 | "source": [
228 | "* La siguiente celda sustituirá a cada valor del *dataframe* ```poblacion``` que al validar si es menor que ```10``` de por resultado ```False``` por la cadena ```'sin riesgo'```."
229 | ]
230 | },
231 | {
232 | "cell_type": "code",
233 | "execution_count": null,
234 | "metadata": {},
235 | "outputs": [],
236 | "source": [
237 | "poblacion.where(poblacion < 10, 'sin riesgo')"
238 | ]
239 | },
240 | {
241 | "cell_type": "markdown",
242 | "metadata": {},
243 | "source": [
244 | "## Ejemplo de combinación de ```mask()``` y ```where()```."
245 | ]
246 | },
247 | {
248 | "cell_type": "markdown",
249 | "metadata": {},
250 | "source": [
251 | "* La siguiente celda sustituirá por ```'sin riesgo'``` a aquellos elementos cuyo valor sea mayor o igual ```10``` y sustituirá por ```'amenazado'``` a aquellos elementos cuyo valor sea menor a ```2``` en el *dataframe* ```poblacion```."
252 | ]
253 | },
254 | {
255 | "cell_type": "code",
256 | "execution_count": null,
257 | "metadata": {},
258 | "outputs": [],
259 | "source": [
260 | "poblacion.where(poblacion < 10, 'sin riesgo').\\\n",
261 | " mask(poblacion < 2, 'amenazados')"
262 | ]
263 | },
264 | {
265 | "cell_type": "markdown",
266 | "metadata": {},
267 | "source": [
268 | "* La siguiente celda usará los métodos ```filter()```, ```where()``` y ```mask()``` para aplicar el criterio del ejemplo previo, pero sólo a la columna ```'Sur-2```."
269 | ]
270 | },
271 | {
272 | "cell_type": "code",
273 | "execution_count": null,
274 | "metadata": {},
275 | "outputs": [],
276 | "source": [
277 | "poblacion.filter(items=['Sur_2']).\\\n",
278 | " where(poblacion < 10, 'sin riesgo').\\\n",
279 | " mask(poblacion < 2, 'amenazados')"
280 | ]
281 | },
282 | {
283 | "cell_type": "markdown",
284 | "metadata": {},
285 | "source": [
286 | "## El método ```query()```."
287 | ]
288 | },
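{
"cell_type": "markdown",
"metadata": {},
"source": [
"El método ```query()``` regresa los renglones de un *dataframe* para los que una expresión booleana, escrita como cadena de caracteres, se evalúa como ```True```. Dentro de la expresión es posible referirse a las columnas directamente por su identificador.\n",
"\n",
"```\n",
"df.query(<expresión>)\n",
"```\n",
"\n",
"Donde:\n",
"\n",
"* ```<expresión>``` es un objeto ```str``` que contiene la expresión booleana a evaluar.\n",
"\n",
"https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.query.html"
]
},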
289 | {
290 | "cell_type": "code",
291 | "execution_count": null,
292 | "metadata": {
293 | "scrolled": true
294 | },
295 | "outputs": [],
296 | "source": [
297 | "poblacion"
298 | ]
299 | },
300 | {
301 | "cell_type": "code",
302 | "execution_count": null,
303 | "metadata": {},
304 | "outputs": [],
305 | "source": [
306 | "poblacion[\"Norte_1\"] == 0"
307 | ]
308 | },
309 | {
310 | "cell_type": "code",
311 | "execution_count": null,
312 | "metadata": {},
313 | "outputs": [],
314 | "source": [
315 | "poblacion[poblacion[\"Norte_1\"] == 0]"
316 | ]
317 | },
318 | {
319 | "cell_type": "code",
320 | "execution_count": null,
321 | "metadata": {},
322 | "outputs": [],
323 | "source": [
324 | "poblacion.query('Norte_1 == 0')"
325 | ]
326 | },
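{
"cell_type": "markdown",
"metadata": {},
"source": [
"* A modo de ejemplo ilustrativo adicional, la siguiente celda usará ```query()``` con una expresión compuesta para regresar los renglones en los que ```Norte_1``` sea mayor que ```10``` y ```Sur_2``` sea menor que ```5```."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Expresión booleana compuesta sobre las columnas Norte_1 y Sur_2.\n",
"poblacion.query('Norte_1 > 10 and Sur_2 < 5')"
]
},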
327 | {
328 | "cell_type": "markdown",
329 | "metadata": {},
330 | "source": [
331 | "
Esta obra está bajo una Licencia Creative Commons Atribución 4.0 Internacional.
\n",
332 | "© José Luis Chiquete Valdivieso. 2023.
"
333 | ]
334 | }
335 | ],
336 | "metadata": {
337 | "kernelspec": {
338 | "display_name": "Python 3 (ipykernel)",
339 | "language": "python",
340 | "name": "python3"
341 | },
342 | "language_info": {
343 | "codemirror_mode": {
344 | "name": "ipython",
345 | "version": 3
346 | },
347 | "file_extension": ".py",
348 | "mimetype": "text/x-python",
349 | "name": "python",
350 | "nbconvert_exporter": "python",
351 | "pygments_lexer": "ipython3",
352 | "version": "3.9.2"
353 | }
354 | },
355 | "nbformat": 4,
356 | "nbformat_minor": 2
357 | }
358 |
--------------------------------------------------------------------------------
/19_limpieza_y_datos_faltantes.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "[](https://www.pythonista.io)"
8 | ]
9 | },
10 | {
11 | "cell_type": "markdown",
12 | "metadata": {},
13 | "source": [
14 | "# Limpieza y datos faltantes."
15 | ]
16 | },
17 | {
18 | "cell_type": "markdown",
19 | "metadata": {},
20 | "source": [
21 | "En este capítulo se explorarán diversos métodos enfocados a gestionar *dataframes* que no son homogéneos en sus contenidos."
22 | ]
23 | },
24 | {
25 | "cell_type": "code",
26 | "execution_count": null,
27 | "metadata": {
28 | "scrolled": true
29 | },
30 | "outputs": [],
31 | "source": [
32 | "import pandas as pd\n",
33 | "import numpy as np"
34 | ]
35 | },
36 | {
37 | "cell_type": "markdown",
38 | "metadata": {},
39 | "source": [
40 | "## El *dataframe* ilustrativo."
41 | ]
42 | },
43 | {
44 | "cell_type": "markdown",
45 | "metadata": {},
46 | "source": [
47 | "El *dataframe* ```poblacion``` describe una serie de muestras poblacionales de animales en varias regiones geográficas."
48 | ]
49 | },
50 | {
51 | "cell_type": "code",
52 | "execution_count": null,
53 | "metadata": {},
54 | "outputs": [],
55 | "source": [
56 | "poblacion = pd.DataFrame({'Animal':('lobo',\n",
57 | " 'coyote',\n",
58 | " 'jaguar',\n",
59 | " 'cerdo salvaje',\n",
60 | " 'tapir',\n",
61 | " 'venado',\n",
62 | " 'ocelote',\n",
63 | " 'puma'),\n",
64 | " 'Norte_I':(12,\n",
65 | " np.NAN,\n",
66 | " None,\n",
67 | " 2,\n",
68 | " 4,\n",
69 | " 2,\n",
70 | " 14,\n",
71 | " 5\n",
72 | " ),\n",
73 | " 'Norte_II':(23,\n",
74 | " 4,\n",
75 | " 25,\n",
76 | " 21,\n",
77 | " 9,\n",
78 | " 121,\n",
79 | " 1,\n",
80 | " 2\n",
81 | " ),\n",
82 | " 'Centro_I':(15,\n",
83 | " 23,\n",
84 | " 2,\n",
85 | " None,\n",
86 | " 40,\n",
87 | " 121,\n",
88 | " 0,\n",
89 | " 5),\n",
90 | " 'Sur_I':(28,\n",
91 | " 46,\n",
92 | " 14,\n",
93 | " 156,\n",
94 | " 79,\n",
95 | " 12,\n",
96 | " 2,\n",
97 | " np.NAN)}).set_index('Animal')"
98 | ]
99 | },
100 | {
101 | "cell_type": "code",
102 | "execution_count": null,
103 | "metadata": {},
104 | "outputs": [],
105 | "source": [
106 | "poblacion"
107 | ]
108 | },
109 | {
110 | "cell_type": "markdown",
111 | "metadata": {},
112 | "source": [
113 | "## Métodos de validación de *NaN*.\n",
114 | "\n",
115 | "En muchos casos, los *dataframes* incluyen objetos de tipo ```np.NaN```. Por lo general este tipo de dato denota datos incompletos cuyo verdadero valor es desconocido.\n",
116 | "\n",
117 | "Poder transformar eficientemente ```np.NaN``` en valores relevantes requiere de experiencia y conocimiento de los datos con los que se trabaja."
118 | ]
119 | },
120 | {
121 | "cell_type": "markdown",
122 | "metadata": {},
123 | "source": [
124 | "### El método ```isna()```.\n",
125 | "\n",
126 | "Este método de enmascaramiento detecta aquellos elementos que contienen valores ```np.NaN```."
127 | ]
128 | },
129 | {
130 | "cell_type": "markdown",
131 | "metadata": {},
132 | "source": [
133 | "**Ejemplo:**"
134 | ]
135 | },
136 | {
137 | "cell_type": "markdown",
138 | "metadata": {},
139 | "source": [
140 | "* La siguiente celda evaluará cada elemento de ```poblacion``` validando si existe algún valor igual a ```np.NaN```. En caso de que el elemento sea ```np.NaN```, el valor dentro del *dataframe* resultante será ```True```."
141 | ]
142 | },
143 | {
144 | "cell_type": "code",
145 | "execution_count": null,
146 | "metadata": {},
147 | "outputs": [],
148 | "source": [
149 | "poblacion.isna()"
150 | ]
151 | },
152 | {
153 | "cell_type": "markdown",
154 | "metadata": {},
155 | "source": [
156 | "### El método ```isnull()```.\n",
157 | "\n",
158 | "Este método de enmascaramineto que detecta aquellos elementos que contienen tanto a ```np.NaN``` como a ```None```."
159 | ]
160 | },
161 | {
162 | "cell_type": "markdown",
163 | "metadata": {},
164 | "source": [
165 | "**Ejemplo:**"
166 | ]
167 | },
168 | {
169 | "cell_type": "markdown",
170 | "metadata": {},
171 | "source": [
172 | "* Se utilizará al *dataframe* ```poblacion``` definido previamente."
173 | ]
174 | },
175 | {
176 | "cell_type": "code",
177 | "execution_count": null,
178 | "metadata": {
179 | "scrolled": true
180 | },
181 | "outputs": [],
182 | "source": [
183 | "poblacion"
184 | ]
185 | },
186 | {
187 | "cell_type": "markdown",
188 | "metadata": {},
189 | "source": [
190 | "* La siguiente celda evaluará cada elemento de ```poblacion``` validando si existe algún valor igual a ```np.NaN``` o ```None```. En caso de que el elemento coincida, el valor dentro del *dataframe* resultante será ```True```."
191 | ]
192 | },
193 | {
194 | "cell_type": "code",
195 | "execution_count": null,
196 | "metadata": {},
197 | "outputs": [],
198 | "source": [
199 | "poblacion.isnull()"
200 | ]
201 | },
202 | {
203 | "cell_type": "markdown",
204 | "metadata": {},
205 | "source": [
206 | "### El método ```notna()```.\n",
207 | "\n",
208 | "Este método de enmascaramiento detecta aquellos elementos que contienen valores distintos a ```np.NaN```."
209 | ]
210 | },
211 | {
212 | "cell_type": "markdown",
213 | "metadata": {},
214 | "source": [
215 | "**Ejemplo:**"
216 | ]
217 | },
218 | {
219 | "cell_type": "markdown",
220 | "metadata": {},
221 | "source": [
222 | "* La siguiente celda evaluará a cada elemento de ```poblacion``` validando si existe algún valor igual a ```np.NaN```. En caso de que el elemento sea ```np.NaN```, el valor dentro del *dataframe* resultante será ```False```."
223 | ]
224 | },
225 | {
226 | "cell_type": "code",
227 | "execution_count": null,
228 | "metadata": {},
229 | "outputs": [],
230 | "source": [
231 | "poblacion.notna()"
232 | ]
233 | },
234 | {
235 | "cell_type": "markdown",
236 | "metadata": {},
237 | "source": [
238 | "### El método ```notnull()```.\n",
239 | "\n",
240 | "Este método de enmascaramiento detecta aquellos elementos que contienen valores distintos a ```np.NaN``` o a ```None```."
241 | ]
242 | },
243 | {
244 | "cell_type": "markdown",
245 | "metadata": {},
246 | "source": [
247 | "**Ejemplo:**"
248 | ]
249 | },
250 | {
251 | "cell_type": "markdown",
252 | "metadata": {},
253 | "source": [
254 | "* La siguiente celda evaluará a cada elemento de ```poblacion``` validando si existen valore distintos a ```np.NaN``` o a ```None```. En caso de que el elemento sea ```np.NaN``` o ```None```, el valor dentro del *dataframe* resultante será ```False```."
255 | ]
256 | },
257 | {
258 | "cell_type": "code",
259 | "execution_count": null,
260 | "metadata": {},
261 | "outputs": [],
262 | "source": [
263 | "poblacion.notnull()"
264 | ]
265 | },
266 | {
267 | "cell_type": "markdown",
268 | "metadata": {},
269 | "source": [
270 | "## El método ```fillna()```.\n",
271 | "\n",
272 | "Este método sustituirá los valores ```np.NaN``` con el valor designado como argumento.\n",
273 | "\n",
274 | "\n",
275 | "```\n",
276 | "df.fillna()\n",
277 | "```\n",
278 | "\n",
279 | "Donde:\n",
280 | "\n",
281 | "* `````` es cualquier objeto de *Python*, *Numpy* o de *Pandas*."
282 | ]
283 | },
284 | {
285 | "cell_type": "markdown",
286 | "metadata": {},
287 | "source": [
288 | "**Ejemplo:**"
289 | ]
290 | },
291 | {
292 | "cell_type": "code",
293 | "execution_count": null,
294 | "metadata": {
295 | "scrolled": false
296 | },
297 | "outputs": [],
298 | "source": [
299 | "poblacion"
300 | ]
301 | },
302 | {
303 | "cell_type": "markdown",
304 | "metadata": {},
305 | "source": [
306 | "* Las siguientes celdas sustituirán a los elementos ```np.NaN``` por el objeto ingresado como argumento."
307 | ]
308 | },
309 | {
310 | "cell_type": "code",
311 | "execution_count": null,
312 | "metadata": {
313 | "scrolled": true
314 | },
315 | "outputs": [],
316 | "source": [
317 | "poblacion.fillna(0)"
318 | ]
319 | },
320 | {
321 | "cell_type": "code",
322 | "execution_count": null,
323 | "metadata": {},
324 | "outputs": [],
325 | "source": [
326 | "poblacion.fillna(15)"
327 | ]
328 | },
329 | {
330 | "cell_type": "code",
331 | "execution_count": null,
332 | "metadata": {},
333 | "outputs": [],
334 | "source": [
335 | "poblacion.fillna(\"inválido\")"
336 | ]
337 | },
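{
"cell_type": "markdown",
"metadata": {},
"source": [
"* El método ```fillna()``` también acepta un objeto de tipo ```dict``` cuyas claves son identificadores de columnas, lo que permite usar un valor de sustitución distinto para cada columna. La siguiente celda es un esbozo ilustrativo que usa ```0``` para ```'Norte_I'``` y el promedio de ```'Sur_I'``` para esa columna."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sustituye los NaN de 'Norte_I' con 0 y los de 'Sur_I' con el promedio de dicha columna.\n",
"poblacion.fillna({'Norte_I': 0, 'Sur_I': poblacion['Sur_I'].mean()})"
]
},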
338 | {
339 | "cell_type": "markdown",
340 | "metadata": {},
341 | "source": [
342 | "## El método ```interpolate()```.\n",
343 | "\n",
344 | "Este método realiza cáclulos de interpolación para sustituir a ```np.NaN```.\n",
345 | "\n",
346 | "\n",
347 | "```\n",
348 | "df.interpolate(method=, axis=)\n",
349 | "```\n",
350 | "\n",
351 | "Donde:\n",
352 | "\n",
353 | "* `````` es un método de interpolación. El valor por defecto es ```'linear'```.\n",
354 | "* El parámetro ```axis``` define el eje desde el cual se tomarán los elementos de interpolación y su valor por defecto es ```0```.\n",
355 | "\n",
356 | "https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.interpolate.html\n",
357 | "\n",
358 | "**Nota:** *Scipy* cuenta con diversos algoritmos de interpolación, los cuales pueden ser consultados en: \n",
359 | "\n",
360 | "* https://docs.scipy.org/doc/scipy/reference/interpolate.html\n",
361 | "* https://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.interp1d.html"
362 | ]
363 | },
364 | {
365 | "cell_type": "markdown",
366 | "metadata": {},
367 | "source": [
368 | "**Ejemplos:**"
369 | ]
370 | },
371 | {
372 | "cell_type": "markdown",
373 | "metadata": {},
374 | "source": [
375 | "* Se utilizará el *dataframe* ```poblacion```definido previamente."
376 | ]
377 | },
378 | {
379 | "cell_type": "code",
380 | "execution_count": null,
381 | "metadata": {
382 | "scrolled": true
383 | },
384 | "outputs": [],
385 | "source": [
386 | "poblacion"
387 | ]
388 | },
389 | {
390 | "cell_type": "markdown",
391 | "metadata": {},
392 | "source": [
393 | "* La siguiente celda ejecutará el método ```poblacion.intterpolate()``` con los argumentos por defecto.\n",
394 | "* El *dataframe* resultante modificará aquellos elementos ```np.NaN``` aplicando una interpolación lineal tomando como datos de referencia a los del renglón a la que pertence el elemento. "
395 | ]
396 | },
397 | {
398 | "cell_type": "code",
399 | "execution_count": null,
400 | "metadata": {
401 | "scrolled": true
402 | },
403 | "outputs": [],
404 | "source": [
405 | "poblacion.interpolate()"
406 | ]
407 | },
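{
"cell_type": "markdown",
"metadata": {},
"source": [
"* Por ejemplo, en la columna ```'Norte_I'``` los dos valores faltantes se encuentran entre ```12``` y ```2```, por lo que la interpolación lineal los sustituye con valores equidistantes: aproximadamente ```8.67``` y ```5.33```."
]
},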
408 | {
409 | "cell_type": "markdown",
410 | "metadata": {},
411 | "source": [
412 | "* La siguiente celda ejecutará el método ```poblacion.intterpolate()``` con el argumento ```axis=1```.\n",
413 | "* El *dataframe* resultante modificará aquellos elementos ```np.NaN``` aplicando una interpolación lineal tomando como datos de referencia a los de la columna a la que pertence el elemento."
414 | ]
415 | },
416 | {
417 | "cell_type": "code",
418 | "execution_count": null,
419 | "metadata": {
420 | "scrolled": true
421 | },
422 | "outputs": [],
423 | "source": [
424 | "poblacion.interpolate(axis=1)"
425 | ]
426 | },
427 | {
428 | "cell_type": "markdown",
429 | "metadata": {},
430 | "source": [
431 | "* La siguiente celda usará el argumento ```method=\"zero\"```.\n",
432 | "* El métódo ```\"zero\"``` requiere que exista un índice numérico para poder realizar la interpolación, por lo que se desencadenará una excepción de tipo ```ValueError```. "
433 | ]
434 | },
435 | {
436 | "cell_type": "code",
437 | "execution_count": null,
438 | "metadata": {
439 | "scrolled": false
440 | },
441 | "outputs": [],
442 | "source": [
443 | "poblacion.interpolate(method=\"zero\")"
444 | ]
445 | },
446 | {
447 | "cell_type": "markdown",
448 | "metadata": {},
449 | "source": [
450 | "* La siguiente celda creará un *dataframe* llamado ```poblacion_numerica```, basado en ```poblacion```en el que los índices serán numéricos y se desechará la columna ```'Animal'```."
451 | ]
452 | },
453 | {
454 | "cell_type": "code",
455 | "execution_count": null,
456 | "metadata": {},
457 | "outputs": [],
458 | "source": [
459 | "poblacion_numerica = poblacion.reset_index().drop('Animal', axis=1)"
460 | ]
461 | },
462 | {
463 | "cell_type": "code",
464 | "execution_count": null,
465 | "metadata": {},
466 | "outputs": [],
467 | "source": [
468 | "poblacion_numerica"
469 | ]
470 | },
471 | {
472 | "cell_type": "markdown",
473 | "metadata": {},
474 | "source": [
475 | "* la siguiente celda aplicará el métódo ```poblacion_numerica.interpolate()```:\n",
476 | " * Ingresando el argumento ```axis=0```.\n",
477 | " * Ingresando el argumento ```method=\"zero\"```."
478 | ]
479 | },
480 | {
481 | "cell_type": "code",
482 | "execution_count": null,
483 | "metadata": {},
484 | "outputs": [],
485 | "source": [
486 | "poblacion_numerica.interpolate(method=\"zero\", axis=0)"
487 | ]
488 | },
489 | {
490 | "cell_type": "markdown",
491 | "metadata": {},
492 | "source": [
493 | "## El método ```dropna()```.\n",
494 | "\n",
495 | "Este método desechará los renglones o columnas que contengan valores ```np.NaN```.\n",
496 | "\n",
497 | "\n",
498 | "```\n",
499 | "df.dropna(axis=)\n",
500 | "```\n",
501 | "\n",
502 | "Donde:\n",
503 | "\n",
504 | "* ``````"
505 | ]
506 | },
507 | {
508 | "cell_type": "markdown",
509 | "metadata": {},
510 | "source": [
511 | "**Ejemplo:**"
512 | ]
513 | },
514 | {
515 | "cell_type": "code",
516 | "execution_count": null,
517 | "metadata": {},
518 | "outputs": [],
519 | "source": [
520 | "poblacion"
521 | ]
522 | },
523 | {
524 | "cell_type": "code",
525 | "execution_count": null,
526 | "metadata": {},
527 | "outputs": [],
528 | "source": [
529 | "poblacion.dropna()"
530 | ]
531 | },
532 | {
533 | "cell_type": "code",
534 | "execution_count": null,
535 | "metadata": {},
536 | "outputs": [],
537 | "source": [
538 | "poblacion.dropna(axis=1)"
539 | ]
540 | },
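{
"cell_type": "markdown",
"metadata": {},
"source": [
"* El método ```dropna()``` acepta parámetros adicionales como ```how``` y ```thresh```. A manera de esbozo, la siguiente celda conserva únicamente las columnas que tienen al menos ```7``` valores no nulos."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Conserva las columnas con al menos 7 valores distintos de NaN.\n",
"poblacion.dropna(axis=1, thresh=7)"
]
},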
541 | {
542 | "cell_type": "markdown",
543 | "metadata": {},
544 | "source": [
545 | "## El método ```duplicated()```.\n",
546 | "\n",
547 | "Identifica aquellos renglones duplicados."
548 | ]
549 | },
550 | {
551 | "cell_type": "code",
552 | "execution_count": null,
553 | "metadata": {},
554 | "outputs": [],
555 | "source": [
556 | "poblacion.duplicated()"
557 | ]
558 | },
559 | {
560 | "cell_type": "code",
561 | "execution_count": null,
562 | "metadata": {},
563 | "outputs": [],
564 | "source": [
565 | "otra_poblacion = pd.DataFrame({'Animal':('lobo',\n",
566 | " 'coyote',\n",
567 | " 'jaguar',\n",
568 | " 'cerdo salvaje',\n",
569 | " 'tapir',\n",
570 | " 'venado',\n",
571 | " 'ocelote',\n",
572 | " 'puma'),\n",
573 | " 'Norte_I':(12,\n",
574 | " 4,\n",
575 | " None,\n",
576 | " 2,\n",
577 | " 4,\n",
578 | " 2,\n",
579 | " 14,\n",
580 | " 4\n",
581 | " ),\n",
582 | " 'Norte_II':(23,\n",
583 | " 4,\n",
584 | " 25,\n",
585 | " 21,\n",
586 | " 9,\n",
587 | " 121,\n",
588 | " 1,\n",
589 | " 4\n",
590 | " ),\n",
591 | " 'Centro_I':(15,\n",
592 | " 4,\n",
593 | " 2,\n",
594 | " 120,\n",
595 | " 40,\n",
596 | " 121,\n",
597 | " 0,\n",
598 | " 4),\n",
599 | " 'Sur_I':(28,\n",
600 | " 4,\n",
601 | " 14,\n",
602 | " 156,\n",
603 | " 79,\n",
604 | " 12,\n",
605 | " 2,\n",
606 | " 4)}).set_index('Animal')"
607 | ]
608 | },
609 | {
610 | "cell_type": "code",
611 | "execution_count": null,
612 | "metadata": {},
613 | "outputs": [],
614 | "source": [
615 | "otra_poblacion"
616 | ]
617 | },
618 | {
619 | "cell_type": "code",
620 | "execution_count": null,
621 | "metadata": {},
622 | "outputs": [],
623 | "source": [
624 | "otra_poblacion.duplicated()"
625 | ]
626 | },
627 | {
628 | "cell_type": "markdown",
629 | "metadata": {},
630 | "source": [
631 | "## El método ```drop_duplicates()```.\n",
632 | "\n",
633 | "Este método elimina renglones duplicados."
634 | ]
635 | },
636 | {
637 | "cell_type": "code",
638 | "execution_count": null,
639 | "metadata": {},
640 | "outputs": [],
641 | "source": [
642 | "otra_poblacion.drop_duplicates()"
643 | ]
644 | },
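{
"cell_type": "markdown",
"metadata": {},
"source": [
"* Tanto ```duplicated()``` como ```drop_duplicates()``` aceptan los parámetros ```subset``` y ```keep```. La siguiente celda es un ejemplo ilustrativo que conserva la última aparición de cada renglón duplicado en lugar de la primera."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Conserva la última aparición de los renglones duplicados.\n",
"otra_poblacion.drop_duplicates(keep='last')"
]
},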
645 | {
646 | "cell_type": "markdown",
647 | "metadata": {},
648 | "source": [
649 | "
Esta obra está bajo una Licencia Creative Commons Atribución 4.0 Internacional.
\n",
650 | "© José Luis Chiquete Valdivieso. 2023.
"
651 | ]
652 | }
653 | ],
654 | "metadata": {
655 | "kernelspec": {
656 | "display_name": "Python 3 (ipykernel)",
657 | "language": "python",
658 | "name": "python3"
659 | },
660 | "language_info": {
661 | "codemirror_mode": {
662 | "name": "ipython",
663 | "version": 3
664 | },
665 | "file_extension": ".py",
666 | "mimetype": "text/x-python",
667 | "name": "python",
668 | "nbconvert_exporter": "python",
669 | "pygments_lexer": "ipython3",
670 | "version": "3.9.2"
671 | }
672 | },
673 | "nbformat": 4,
674 | "nbformat_minor": 2
675 | }
676 |
--------------------------------------------------------------------------------
/21_metodos_groupby.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "[](https://www.pythonista.io)"
8 | ]
9 | },
10 | {
11 | "cell_type": "markdown",
12 | "metadata": {},
13 | "source": [
14 | "# Métodos ```groupby()```."
15 | ]
16 | },
17 | {
18 | "cell_type": "markdown",
19 | "metadata": {},
20 | "source": [
21 | "*Pandas* cuenta con una funcionalidad que permite agrupar los datos idénticos en una columna o un renglón de un *dataframe* .\n"
22 | ]
23 | },
24 | {
25 | "cell_type": "markdown",
26 | "metadata": {},
27 | "source": [
28 | "Tanto las series como los dataframes de *Pandas* cuentan con un método ```groupby()```.\n",
29 | "\n",
30 | "* El método ```pd.DataFrame.groupby()``` regresa un objeto ```pd.core.groupby.generic.DataFrameGroupBy```.\n",
31 | "* El método ```pd.Series.groupby()``` regresa un objeto ```pd.core.groupby.generic.SeriesGroupBy```.\n",
32 | "\n",
33 | "En este capítulo se explorará el método ```pd.DataFrame.groupby()```, asumiendo que el método```pd.Series.groupby()``` se comporta de forma similar."
34 | ]
35 | },
36 | {
37 | "cell_type": "code",
38 | "execution_count": null,
39 | "metadata": {},
40 | "outputs": [],
41 | "source": [
42 | "import pandas as pd\n",
43 | "from datetime import datetime"
44 | ]
45 | },
46 | {
47 | "cell_type": "markdown",
48 | "metadata": {},
49 | "source": [
50 | "## El método ```pd.DataFrame.groupby()```.\n",
51 | "\n",
52 | "El método regresa un objeto de tipo ```pd.core.groupby.generic.DataFrameGroupBy```.\n",
53 | "\n",
54 | "```\n",
55 | "df.groupby(by=, axis=, group_keys=True)\n",
56 | "```\n",
57 | "Donde:\n",
58 | "\n",
59 | "* `````` corresponde al identificador de la columna o índice en el que se realizará la agrupación.\n",
60 | "* El argumento ```axis``` indicará el eje al que se aplicará el método. El valor por defecto es ```1```.\n",
61 | "* El argumento ```group_keys``` le indica al método que use los valores de agrupamiento como llaves. El valor por defecto es ```False```, pero se recomienda asignarle el valor ```True```.\n",
62 | "\n",
63 | "https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.groupby.html"
64 | ]
65 | },
66 | {
67 | "cell_type": "markdown",
68 | "metadata": {},
69 | "source": [
70 | "**Ejemplo:**"
71 | ]
72 | },
73 | {
74 | "cell_type": "markdown",
75 | "metadata": {},
76 | "source": [
77 | "* La siguiente celda creará al dataframe ```facturas``` con la estructura de columnas:\n",
78 | "\n",
79 | " * ```'folio'```.\n",
80 | " * ```'sucursal'```.\n",
81 | " * ```'monto'```.\n",
82 | " * ```'fecha'```.\n",
83 | " * ```'cliente'```."
84 | ]
85 | },
86 | {
87 | "cell_type": "code",
88 | "execution_count": null,
89 | "metadata": {},
90 | "outputs": [],
91 | "source": [
92 | "facturas = pd.DataFrame({'folio':(15234, \n",
93 | " 15235, \n",
94 | " 15236, \n",
95 | " 15237, \n",
96 | " 15238, \n",
97 | " 15239, \n",
98 | " 15240,\n",
99 | " 15241,\n",
100 | " 15242),\n",
101 | " 'sucursal':('CDMX01',\n",
102 | " 'MTY01',\n",
103 | " 'CDMX02',\n",
104 | " 'CDMX02',\n",
105 | " 'MTY01',\n",
106 | " 'GDL01',\n",
107 | " 'CDMX02',\n",
108 | " 'MTY01',\n",
109 | " 'GDL01'),\n",
110 | " 'monto':(1420.00,\n",
111 | " 1532.00,\n",
112 | " 890.00,\n",
113 | " 1300.00,\n",
114 | " 3121.47,\n",
115 | " 1100.5,\n",
116 | " 12230,\n",
117 | " 230.85,\n",
118 | " 1569),\n",
119 | " 'fecha':(datetime(2019,3,11,17,24),\n",
120 | " datetime(2019,3,24,14,46),\n",
121 | " datetime(2019,3,25,17,58),\n",
122 | " datetime(2019,3,27,13,11),\n",
123 | " datetime(2019,3,31,10,25),\n",
124 | " datetime(2019,4,1,18,32),\n",
125 | " datetime(2019,4,3,11,43),\n",
126 | " datetime(2019,4,4,16,55),\n",
127 | " datetime(2019,4,5,12,59)),\n",
128 | " 'cliente':(19234,\n",
129 | " 19232,\n",
130 | " 19235,\n",
131 | " 19233,\n",
132 | " 19236,\n",
133 | " 19237,\n",
134 | " 19232,\n",
135 | " 19233,\n",
136 | " 19232)\n",
137 | " })"
138 | ]
139 | },
140 | {
141 | "cell_type": "code",
142 | "execution_count": null,
143 | "metadata": {
144 | "scrolled": true
145 | },
146 | "outputs": [],
147 | "source": [
148 | "facturas"
149 | ]
150 | },
151 | {
152 | "cell_type": "markdown",
153 | "metadata": {},
154 | "source": [
155 | "* La siguiente celda agrupará aquellos elementos en los que el valor de la columna ```facturas['cliente']``` sean iguales."
156 | ]
157 | },
158 | {
159 | "cell_type": "code",
160 | "execution_count": null,
161 | "metadata": {},
162 | "outputs": [],
163 | "source": [
164 | "clientes = facturas.groupby(\"cliente\", group_keys=True)"
165 | ]
166 | },
167 | {
168 | "cell_type": "markdown",
169 | "metadata": {},
170 | "source": [
171 | "* El objeto ```clientes``` es de tipo ```pd.core.groupby.generic.DataFrameGroupBy```."
172 | ]
173 | },
174 | {
175 | "cell_type": "code",
176 | "execution_count": null,
177 | "metadata": {},
178 | "outputs": [],
179 | "source": [
180 | "clientes"
181 | ]
182 | },
183 | {
184 | "cell_type": "markdown",
185 | "metadata": {},
186 | "source": [
187 | "## Los objetos ```core.groupby.generic.DataFrameGroupBy```.\n",
188 | "\n",
189 | "Los objetos ```core.groupby.generic.DataFrameGroupBy``` son iteradores que contienen a objetos de tipo ```tuple``` resultantes de la agrupación.\n",
190 | "\n",
191 | "* El primer elemento de la tupla corresponde al valor que se agrupa.\n",
192 | "* El segundo elemento de la tupla corresponde a un *dataframe* con los elementos agrupados.\n",
193 | "\n",
194 | "Dichos objetos contiene diversos métodos capaces de procesar los datos de cada objeto ```tuple``` que contiene."
195 | ]
196 | },
197 | {
198 | "cell_type": "markdown",
199 | "metadata": {},
200 | "source": [
201 | "**Ejemplo:**"
202 | ]
203 | },
204 | {
205 | "cell_type": "markdown",
206 | "metadata": {},
207 | "source": [
208 | "* La siguiente celda desplegará las tuplas contenidas en ```clientes```."
209 | ]
210 | },
211 | {
212 | "cell_type": "code",
213 | "execution_count": null,
214 | "metadata": {
215 | "scrolled": true
216 | },
217 | "outputs": [],
218 | "source": [
219 | "for item in clientes:\n",
220 | " print(f\"\"\"ciente: {item[0]}\n",
221 | " -------\n",
222 | "{item[1]}\n",
223 | "\"\"\")"
224 | ]
225 | },
226 | {
227 | "cell_type": "markdown",
228 | "metadata": {},
229 | "source": [
230 | "* La siguiente celda creará un objeto tipo ```list``` llamado ```clientes_agrupados``` a patir del objeto ```cliente```."
231 | ]
232 | },
233 | {
234 | "cell_type": "code",
235 | "execution_count": null,
236 | "metadata": {},
237 | "outputs": [],
238 | "source": [
239 | "clientes_agrupados = list(clientes)"
240 | ]
241 | },
242 | {
243 | "cell_type": "code",
244 | "execution_count": null,
245 | "metadata": {
246 | "scrolled": true
247 | },
248 | "outputs": [],
249 | "source": [
250 | "clientes_agrupados"
251 | ]
252 | },
253 | {
254 | "cell_type": "markdown",
255 | "metadata": {},
256 | "source": [
257 | "* La siguiente celda regresará al *datafame* que corresponde al segundo elemento de la tupla ```clientes_agrupados[0]```."
258 | ]
259 | },
260 | {
261 | "cell_type": "code",
262 | "execution_count": null,
263 | "metadata": {},
264 | "outputs": [],
265 | "source": [
266 | "clientes_agrupados[0][1]"
267 | ]
268 | },
269 | {
270 | "cell_type": "code",
271 | "execution_count": null,
272 | "metadata": {},
273 | "outputs": [],
274 | "source": [
275 | "type(clientes_agrupados[0][1])"
276 | ]
277 | },
278 | {
279 | "cell_type": "markdown",
280 | "metadata": {},
281 | "source": [
282 | "### Indexado de los objetos ```DataFrameGroupBy```.\n",
283 | "\n",
284 | "Los objetos ```DataFrameGroupBy``` permiten el indexado de columnas propio de los *dataframes*.\n",
285 | "\n",
286 | "```\n",
287 | "[]\n",
288 | "```\n",
289 | "Donde:\n",
290 | "\n",
291 | "* `````` es un objeto ```DataFrameGroupBy```.\n",
292 | "* `````` es el identificador de una columna del *dataframe* original.\n",
293 | "\n",
294 | "En caso de haber ingresado el parámetro ```group_keys=True```, es posible usar la siguiente sintaxis.\n",
295 | "\n",
296 | "```\n",
297 | "[[, , ... ]]\n",
298 | "```\n",
299 | "Donde:\n",
300 | "\n",
301 | "* `````` es un objeto ```DataFrameGroupBy```.\n",
302 | "* `````` es el identificador de una columna del *dataframe* original.\n",
303 | "\n"
304 | ]
305 | },
306 | {
307 | "cell_type": "markdown",
308 | "metadata": {},
309 | "source": [
310 | "**Ejemplo:**"
311 | ]
312 | },
313 | {
314 | "cell_type": "markdown",
315 | "metadata": {},
316 | "source": [
317 | "* La siguiente celda regresará un listado de los elementos agrupados, pero sólo se incluirá a la columna ```'fecha'```."
318 | ]
319 | },
320 | {
321 | "cell_type": "code",
322 | "execution_count": null,
323 | "metadata": {
324 | "scrolled": true
325 | },
326 | "outputs": [],
327 | "source": [
328 | "for item in clientes['fecha']:\n",
329 | " print(f\"\"\"ciente: {item[0]}\n",
330 | " -------\n",
331 | "{item[1]}\n",
332 | "\"\"\")"
333 | ]
334 | },
335 | {
336 | "cell_type": "markdown",
337 | "metadata": {},
338 | "source": [
339 | "* La siguiente celda regresará un listado de los elementos agrupados, pero sólo se incluirá a las columnas ```'fecha'``` y ```monto```."
340 | ]
341 | },
342 | {
343 | "cell_type": "code",
344 | "execution_count": null,
345 | "metadata": {},
346 | "outputs": [],
347 | "source": [
348 | "for item in clientes[['fecha', 'monto']]:\n",
349 | " print(f\"\"\"ciente: {item[0]}\n",
350 | " -------\n",
351 | "{item[1]}\n",
352 | "\"\"\")"
353 | ]
354 | },
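{
"cell_type": "markdown",
"metadata": {},
"source": [
"* Además del indexado por columnas, los objetos ```DataFrameGroupBy``` cuentan con el método ```get_group()```, el cual regresa el *dataframe* correspondiente a un valor de agrupación específico. La siguiente celda es un ejemplo ilustrativo con el cliente ```19232```."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Regresa el dataframe con las facturas del cliente 19232.\n",
"clientes.get_group(19232)"
]
},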
355 | {
356 | "cell_type": "markdown",
357 | "metadata": {},
358 | "source": [
359 | "### Atributos y métodos de ```DataFrameGroupBy```.\n",
360 | "\n",
361 | "Los objetos ```core.groupby.generic.DataFrameGroupBy``` cuentan con una gran cantidad de atributo y métodos que permiten analizar y manipular los datos de las tuplas que contienen dichos objetos.\n",
362 | "\n",
363 | "https://pandas.pydata.org/docs/reference/groupby.html"
364 | ]
365 | },
366 | {
367 | "cell_type": "markdown",
368 | "metadata": {},
369 | "source": [
370 | "* Las siguientes celdas mostrarán algunos métodos y atributos de los objetos ```core.groupby.generic.DataFrameGroupBy```."
371 | ]
372 | },
373 | {
374 | "cell_type": "markdown",
375 | "metadata": {},
376 | "source": [
377 | "**Ejemplos:**"
378 | ]
379 | },
380 | {
381 | "cell_type": "markdown",
382 | "metadata": {},
383 | "source": [
384 | "* La siguiente celda regresará al atributo ```clientes.indices```, el cual es un objeto de tipo ```dict``` donde las claves corresponden a cada valor de agrupación y los valores corresponden a un arreglo que enumera los índices en donde se encontró dicho valor de agrupación."
385 | ]
386 | },
387 | {
388 | "cell_type": "code",
389 | "execution_count": null,
390 | "metadata": {
391 | "scrolled": true
392 | },
393 | "outputs": [],
394 | "source": [
395 | "clientes.indices"
396 | ]
397 | },
398 | {
399 | "cell_type": "markdown",
400 | "metadata": {},
401 | "source": [
402 | "* La siguiente celda regresará una serie en el que el índice corresponden a cada valor de agrupación y los valores corresponden al numero de elementos agrupados del objeto ```cliente```."
403 | ]
404 | },
405 | {
406 | "cell_type": "code",
407 | "execution_count": null,
408 | "metadata": {
409 | "scrolled": true
410 | },
411 | "outputs": [],
412 | "source": [
413 | "clientes.size()"
414 | ]
415 | },
416 | {
417 | "cell_type": "markdown",
418 | "metadata": {},
419 | "source": [
420 | "* La siguiente celda regresará un *dataframe* en el que el índice corresponden a cada valor de agrupación y los valores corresponden a la media estadística de los valores agrupados de cada columna restante del *dataset* original de ```clientes```.\n",
421 | "* El parámetro ```numeric_only=True``` le indica al método que aplique el cálculo sólo a aquellas columnas que contengan valores numéricos."
422 | ]
423 | },
424 | {
425 | "cell_type": "code",
426 | "execution_count": null,
427 | "metadata": {
428 | "scrolled": false
429 | },
430 | "outputs": [],
431 | "source": [
432 | "clientes.mean(numeric_only=True)"
433 | ]
434 | },
435 | {
436 | "cell_type": "markdown",
437 | "metadata": {},
438 | "source": [
439 | "* La siguiente celda aplicará el método ```mean()``` a ```clientes['monto']```."
440 | ]
441 | },
442 | {
443 | "cell_type": "code",
444 | "execution_count": null,
445 | "metadata": {
446 | "scrolled": true
447 | },
448 | "outputs": [],
449 | "source": [
450 | "clientes['monto'].mean(numeric_only=True)"
451 | ]
452 | },
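{
"cell_type": "markdown",
"metadata": {},
"source": [
"* El método ```agg()``` permite aplicar varias funciones de agregación en una sola operación. La siguiente celda es un esbozo que calcula la suma, la media y el número de facturas de cada cliente a partir de la columna ```'monto'```."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Calcula varias agregaciones sobre el monto de cada cliente.\n",
"clientes['monto'].agg(['sum', 'mean', 'count'])"
]
},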
453 | {
454 | "cell_type": "markdown",
455 | "metadata": {},
456 | "source": [
457 | "* La siguiente celda trazará un histograma a partir de los valores en la columna ```\"monto\"``` de cada elemento agrupado."
458 | ]
459 | },
460 | {
461 | "cell_type": "code",
462 | "execution_count": null,
463 | "metadata": {},
464 | "outputs": [],
465 | "source": [
466 | "clientes.hist(column=\"monto\")"
467 | ]
468 | },
469 | {
470 | "cell_type": "markdown",
471 | "metadata": {},
472 | "source": [
473 | "* La siguiente celda aplicará una función que divida a cada valor entre ```1000```."
474 | ]
475 | },
476 | {
477 | "cell_type": "code",
478 | "execution_count": null,
479 | "metadata": {},
480 | "outputs": [],
481 | "source": [
482 | "clientes['monto'].apply(func=lambda x: x / 100)"
483 | ]
484 | },
485 | {
486 | "cell_type": "markdown",
487 | "metadata": {},
488 | "source": [
489 | "
Esta obra está bajo una Licencia Creative Commons Atribución 4.0 Internacional.
\n",
490 | "© José Luis Chiquete Valdivieso. 2023.
"
491 | ]
492 | }
493 | ],
494 | "metadata": {
495 | "kernelspec": {
496 | "display_name": "Python 3 (ipykernel)",
497 | "language": "python",
498 | "name": "python3"
499 | },
500 | "language_info": {
501 | "codemirror_mode": {
502 | "name": "ipython",
503 | "version": 3
504 | },
505 | "file_extension": ".py",
506 | "mimetype": "text/x-python",
507 | "name": "python",
508 | "nbconvert_exporter": "python",
509 | "pygments_lexer": "ipython3",
510 | "version": "3.9.2"
511 | }
512 | },
513 | "nbformat": 4,
514 | "nbformat_minor": 2
515 | }
516 |
--------------------------------------------------------------------------------
/22_extraccion_y_almacenamiento.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "[](https://www.pythonista.io)"
8 | ]
9 | },
10 | {
11 | "cell_type": "markdown",
12 | "metadata": {},
13 | "source": [
14 | "# Extracción y almacenamiento de dataframes y series. "
15 | ]
16 | },
17 | {
18 | "cell_type": "code",
19 | "execution_count": null,
20 | "metadata": {},
21 | "outputs": [],
22 | "source": [
23 | "!pip install openpyxl"
24 | ]
25 | },
26 | {
27 | "cell_type": "code",
28 | "execution_count": null,
29 | "metadata": {},
30 | "outputs": [],
31 | "source": [
32 | "import pandas as pd\n",
33 | "import numpy as np"
34 | ]
35 | },
36 | {
37 | "cell_type": "markdown",
38 | "metadata": {},
39 | "source": [
40 | "Una de las fortalezas de *Pandas* es su capacidad de extraer información de diversas fuentes de datos.\n",
41 | "\n",
42 | "En este capítulo se realizará la extracción de un dataframe a partir de un archivo de hoja de cálculo publicado en Internet."
43 | ]
44 | },
45 | {
46 | "cell_type": "markdown",
47 | "metadata": {},
48 | "source": [
49 | "## El paquete ```xlrd```.\n",
50 | "\n",
51 | "Este paquete permite realizar operaciones de lectura en hojas de cálculo en formatos ```xls``` y ```xlsx```. \n",
52 | "\n",
53 | "La documentaciónde ```xlrd``` está disponible en:\n",
54 | "\n",
55 | "https://xlrd.readthedocs.io/en/latest/\n",
56 | "\n",
57 | "*Pandas* utiliza ```xlrd``` para extraer información de este tipo de archivos."
58 | ]
59 | },
60 | {
61 | "cell_type": "code",
62 | "execution_count": null,
63 | "metadata": {},
64 | "outputs": [],
65 | "source": [
66 | "!pip install xlrd"
67 | ]
68 | },
69 | {
70 | "cell_type": "markdown",
71 | "metadata": {},
72 | "source": [
73 | "## Funciones de lectura de Pandas.\n",
74 | "\n",
75 | "* ```pd.read_clipboard()``` Permite leer datos que se encuentran en el espacio de memoria del \"clipboard\" en un sistemas.\n",
76 | "* ```pd.read_csv()``` Permite leer datos que se encuentran en un archivo *CSV*.\n",
77 | "* ```pd.read_excel()``` Permite leer datos que se encuentran en un archivo de *Excel*.\n",
78 | "* ```pd.read_feather()``` Permite leer datos a partir de [*feather*](https://github.com/wesm/feather).\n",
79 | "* ```pd.read_fwf()```.\n",
80 | "* ```pd.read_gbq()``` para [*Google Big Query*](https://pandas-gbq.readthedocs.io/en/latest/).\n",
81 | "* ```pd.read_hdf()``` para [HDF5](https://www.hdfgroup.org/solutions/hdf5).\n",
82 | "* ```pd.read_html()``` https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_html.html.\n",
83 | "* ```pd.read_json()```.\n",
84 | "* ```pd.read_msgpack()``` https://pandas-msgpack.readthedocs.io/en/latest/.\n",
85 | "* ```pd.read_parquet()``` https://databricks.com/glossary/what-is-parquet.\n",
86 | "* ```pd.read_pickle()```.\n",
87 | "* ```pd.read_sas()```.\n",
88 | "* ```pd. read_sql()```.\n",
89 | "* ```pd.read_sql_query()```.\n",
90 | "* ```pd.read_sql_table()```.\n",
91 | "* ```pd.read_stata()```.\n",
92 | "* ```pd.read_table()```.\n",
93 | "\n",
94 | "**Nota:** En la mayor parte de los casos los datos extraidos son almacenados en un *dataframe*."
95 | ]
96 | },
97 | {
98 | "cell_type": "markdown",
99 | "metadata": {},
100 | "source": [
101 | "## Métodos de persistencia y almacenamiento de los dataframes de *Pandas*.\n",
102 | "\n",
103 | "* ```pd.DataFrame.to_clipboard()```\n",
104 | "* ```pd.DataFrame.to_csv()```\n",
105 | "* ```pd.DataFrame.to_dict()```\n",
106 | "* ```pd.DataFrame.to_excel()```\n",
107 | "* ```pd.DataFrame.to_feather()```\n",
108 | "* ```pd.DataFrame.to_gbq()```\n",
109 | "* ```pd.DataFrame.to_hdf()```\n",
110 | "* ```pd.DataFrame.to_html```\n",
111 | "* ```pd.DataFrame.to_json()```\n",
112 | "* ```pd.DataFrame.to_latex()```\n",
113 | "* ```pd.DataFrame.to_msgpack()```\n",
114 | "* ```pd.DataFrame.to_numpy()```\n",
115 | "* ```pd.DataFrame.to_parquet()```\n",
116 | "* ```pd.DataFrame.to_pickle()```\n",
117 | "* ```pd.DataFrame.to_records()```\n",
118 | "* ```pd.DataFrame.to_sql()```\n",
119 | "* ```pd.DataFrame.to_stata()```"
120 | ]
121 | },
122 | {
123 | "cell_type": "markdown",
124 | "metadata": {},
125 | "source": [
126 | "## Obtención de datos a partir de una hoja de cálculo pulbvicada por el INEGI.\n",
127 | "\n",
128 | "A continuación se descargará el archivo localizado en https://www.inegi.org.mx/contenidos/temas/economia/cn/itaee/tabulados/ori/ITAEE_2.xlsx"
129 | ]
130 | },
131 | {
132 | "cell_type": "markdown",
133 | "metadata": {},
134 | "source": [
135 | "### Obtención del archivo usando ```urllib```."
136 | ]
137 | },
138 | {
139 | "cell_type": "code",
140 | "execution_count": null,
141 | "metadata": {},
142 | "outputs": [],
143 | "source": [
144 | "import urllib.request"
145 | ]
146 | },
147 | {
148 | "cell_type": "code",
149 | "execution_count": null,
150 | "metadata": {},
151 | "outputs": [],
152 | "source": [
153 | "urllib.request.urlretrieve(\"https://www.inegi.org.mx/contenidos/temas/economia/cn/itaee/tabulados/ori/ITAEE_2.xlsx\", \"datos.xlsx\")"
154 | ]
155 | },
156 | {
157 | "cell_type": "code",
158 | "execution_count": null,
159 | "metadata": {
160 | "scrolled": true
161 | },
162 | "outputs": [],
163 | "source": [
164 | "%ls datos.xlsx"
165 | ]
166 | },
167 | {
168 | "cell_type": "markdown",
169 | "metadata": {},
170 | "source": [
171 | "### Carga del archivo con ```pd.read_excel()```."
172 | ]
173 | },
174 | {
175 | "cell_type": "code",
176 | "execution_count": null,
177 | "metadata": {},
178 | "outputs": [],
179 | "source": [
180 | "original = pd.read_excel('datos.xlsx')"
181 | ]
182 | },
183 | {
184 | "cell_type": "code",
185 | "execution_count": null,
186 | "metadata": {
187 | "scrolled": true
188 | },
189 | "outputs": [],
190 | "source": [
191 | "original"
192 | ]
193 | },
194 | {
195 | "cell_type": "code",
196 | "execution_count": null,
197 | "metadata": {
198 | "scrolled": false
199 | },
200 | "outputs": [],
201 | "source": [
202 | "original.head(39)"
203 | ]
204 | },
205 | {
206 | "cell_type": "markdown",
207 | "metadata": {},
208 | "source": [
209 | "### Uso de ```set_index()``` para definir un índice por entidad."
210 | ]
211 | },
212 | {
213 | "cell_type": "code",
214 | "execution_count": null,
215 | "metadata": {},
216 | "outputs": [],
217 | "source": [
218 | "original.columns"
219 | ]
220 | },
221 | {
222 | "cell_type": "code",
223 | "execution_count": null,
224 | "metadata": {
225 | "scrolled": true
226 | },
227 | "outputs": [],
228 | "source": [
229 | "original.columns.values"
230 | ]
231 | },
232 | {
233 | "cell_type": "code",
234 | "execution_count": null,
235 | "metadata": {},
236 | "outputs": [],
237 | "source": [
238 | "original.columns.values[0]"
239 | ]
240 | },
241 | {
242 | "cell_type": "code",
243 | "execution_count": null,
244 | "metadata": {},
245 | "outputs": [],
246 | "source": [
247 | "original.set_index(original.columns.values[0], inplace=True)"
248 | ]
249 | },
250 | {
251 | "cell_type": "code",
252 | "execution_count": null,
253 | "metadata": {},
254 | "outputs": [],
255 | "source": [
256 | "original.head(39)"
257 | ]
258 | },
259 | {
260 | "cell_type": "code",
261 | "execution_count": null,
262 | "metadata": {},
263 | "outputs": [],
264 | "source": [
265 | "original.index.name = 'Entidades'"
266 | ]
267 | },
268 | {
269 | "cell_type": "code",
270 | "execution_count": null,
271 | "metadata": {
272 | "scrolled": true
273 | },
274 | "outputs": [],
275 | "source": [
276 | "original"
277 | ]
278 | },
279 | {
280 | "cell_type": "markdown",
281 | "metadata": {},
282 | "source": [
283 | "### Obtención de los datos relevantes."
284 | ]
285 | },
286 | {
287 | "cell_type": "code",
288 | "execution_count": null,
289 | "metadata": {},
290 | "outputs": [],
291 | "source": [
292 | "datos = original[6:39]"
293 | ]
294 | },
295 | {
296 | "cell_type": "code",
297 | "execution_count": null,
298 | "metadata": {},
299 | "outputs": [],
300 | "source": [
301 | "datos"
302 | ]
303 | },
304 | {
305 | "cell_type": "markdown",
306 | "metadata": {},
307 | "source": [
308 | "### Creación de un índice de columnas adecuado."
309 | ]
310 | },
311 | {
312 | "cell_type": "code",
313 | "execution_count": null,
314 | "metadata": {},
315 | "outputs": [],
316 | "source": [
317 | "periodos = pd.MultiIndex.from_product([[x for x in range(2003, 2023)],\n",
318 | " ['T1', 'T2', 'T3', 'T4', 'Anual']], \n",
319 | " names=('Año', 'Periodo'))"
320 | ]
321 | },
322 | {
323 | "cell_type": "code",
324 | "execution_count": null,
325 | "metadata": {},
326 | "outputs": [],
327 | "source": [
328 | "periodos"
329 | ]
330 | },
331 | {
332 | "cell_type": "code",
333 | "execution_count": null,
334 | "metadata": {},
335 | "outputs": [],
336 | "source": [
337 | "datos.columns = periodos"
338 | ]
339 | },
340 | {
341 | "cell_type": "code",
342 | "execution_count": null,
343 | "metadata": {
344 | "scrolled": false
345 | },
346 | "outputs": [],
347 | "source": [
348 | "datos"
349 | ]
350 | },
351 | {
352 | "cell_type": "code",
353 | "execution_count": null,
354 | "metadata": {
355 | "scrolled": true
356 | },
357 | "outputs": [],
358 | "source": [
359 | "datos[2005]"
360 | ]
361 | },
362 | {
363 | "cell_type": "code",
364 | "execution_count": null,
365 | "metadata": {},
366 | "outputs": [],
367 | "source": [
368 | "datos[2005]['T1']"
369 | ]
370 | },
371 | {
372 | "cell_type": "code",
373 | "execution_count": null,
374 | "metadata": {},
375 | "outputs": [],
376 | "source": [
377 | "datos[2005]['T1'][1:]"
378 | ]
379 | },
380 | {
381 | "cell_type": "code",
382 | "execution_count": null,
383 | "metadata": {},
384 | "outputs": [],
385 | "source": [
386 | "periodo = datos[2005]['T1'][1:]"
387 | ]
388 | },
389 | {
390 | "cell_type": "code",
391 | "execution_count": null,
392 | "metadata": {
393 | "scrolled": true
394 | },
395 | "outputs": [],
396 | "source": [
397 | "periodo.mean()"
398 | ]
399 | },
400 | {
401 | "cell_type": "code",
402 | "execution_count": null,
403 | "metadata": {},
404 | "outputs": [],
405 | "source": [
406 | "periodo.to_excel('datos_utiles.xlsx')"
407 | ]
408 | },
409 | {
410 | "cell_type": "markdown",
411 | "metadata": {},
412 | "source": [
413 | "### Extracción y escritura en formato CVS."
414 | ]
415 | },
416 | {
417 | "cell_type": "code",
418 | "execution_count": null,
419 | "metadata": {},
420 | "outputs": [],
421 | "source": [
422 | "nuevos_datos = pd.read_csv('data/datos_filtrados.csv')"
423 | ]
424 | },
425 | {
426 | "cell_type": "code",
427 | "execution_count": null,
428 | "metadata": {},
429 | "outputs": [],
430 | "source": [
431 | "nuevos_datos"
432 | ]
433 | },
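{
"cell_type": "markdown",
"metadata": {},
"source": [
"* De forma análoga, el método ```to_csv()``` permite almacenar un *dataframe* en un archivo *CSV*. La siguiente celda es un ejemplo ilustrativo; el nombre del archivo es sólo una propuesta."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Guarda el dataframe en un archivo CSV sin incluir la columna del índice.\n",
"# El nombre 'datos_guardados.csv' es meramente ilustrativo.\n",
"nuevos_datos.to_csv('datos_guardados.csv', index=False)"
]
},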
434 | {
435 | "cell_type": "markdown",
436 | "metadata": {},
437 | "source": [
438 | "
Esta obra está bajo una Licencia Creative Commons Atribución 4.0 Internacional.
\n",
439 | "© José Luis Chiquete Valdivieso. 2023.
"
440 | ]
441 | }
442 | ],
443 | "metadata": {
444 | "kernelspec": {
445 | "display_name": "Python 3 (ipykernel)",
446 | "language": "python",
447 | "name": "python3"
448 | },
449 | "language_info": {
450 | "codemirror_mode": {
451 | "name": "ipython",
452 | "version": 3
453 | },
454 | "file_extension": ".py",
455 | "mimetype": "text/x-python",
456 | "name": "python",
457 | "nbconvert_exporter": "python",
458 | "pygments_lexer": "ipython3",
459 | "version": "3.9.2"
460 | }
461 | },
462 | "nbformat": 4,
463 | "nbformat_minor": 2
464 | }
465 |
--------------------------------------------------------------------------------
/23_visualizacion_de_datos_con_pandas.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "[](https://www.pythonista.io)"
8 | ]
9 | },
10 | {
11 | "cell_type": "markdown",
12 | "metadata": {},
13 | "source": [
14 | "# Visualización de datos con *Pandas*."
15 | ]
16 | },
17 | {
18 | "cell_type": "code",
19 | "execution_count": null,
20 | "metadata": {},
21 | "outputs": [],
22 | "source": [
23 | "import pandas as pd"
24 | ]
25 | },
26 | {
27 | "cell_type": "markdown",
28 | "metadata": {},
29 | "source": [
30 | "## El atributo ```pandas.Dataframe.plot```.\n",
31 | "\n",
32 | "Este atributo contiene una colección de métodos que permiten deplegar gráficos descriptivos básicos basados en *Matplotlib*.\n",
33 | "\n",
34 | "https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.plot.html"
35 | ]
36 | },
37 | {
38 | "cell_type": "markdown",
39 | "metadata": {},
40 | "source": [
41 | "**Ejemplo:**"
42 | ]
43 | },
44 | {
45 | "cell_type": "markdown",
46 | "metadata": {},
47 | "source": [
48 | "* El archivo ```data/datos_filtrados.csv``` contiene los datos de crecimiento económico trimestral y anual de las Entidad Federativas de la República Mexicana desde *2013*."
49 | ]
50 | },
51 | {
52 | "cell_type": "markdown",
53 | "metadata": {},
54 | "source": [
55 | "* La siguiente celda creará al *dataframe* ```datos``` a partir del archivo ```data/datos_filtrados.csv```."
56 | ]
57 | },
58 | {
59 | "cell_type": "code",
60 | "execution_count": null,
61 | "metadata": {},
62 | "outputs": [],
63 | "source": [
64 | "datos = pd.read_csv('data/datos_filtrados.csv')"
65 | ]
66 | },
67 | {
68 | "cell_type": "code",
69 | "execution_count": null,
70 | "metadata": {
71 | "scrolled": false
72 | },
73 | "outputs": [],
74 | "source": [
75 | "datos"
76 | ]
77 | },
78 | {
79 | "cell_type": "markdown",
80 | "metadata": {},
81 | "source": [
82 | "* Las siguientes celdas modificarán el *dataframe*."
83 | ]
84 | },
85 | {
86 | "cell_type": "code",
87 | "execution_count": null,
88 | "metadata": {},
89 | "outputs": [],
90 | "source": [
91 | "datos = datos.drop([0, 1])"
92 | ]
93 | },
94 | {
95 | "cell_type": "code",
96 | "execution_count": null,
97 | "metadata": {
98 | "scrolled": true
99 | },
100 | "outputs": [],
101 | "source": [
102 | "datos"
103 | ]
104 | },
105 | {
106 | "cell_type": "code",
107 | "execution_count": null,
108 | "metadata": {},
109 | "outputs": [],
110 | "source": [
111 | "datos.set_index('Año', inplace=True)"
112 | ]
113 | },
114 | {
115 | "cell_type": "code",
116 | "execution_count": null,
117 | "metadata": {},
118 | "outputs": [],
119 | "source": [
120 | "datos"
121 | ]
122 | },
123 | {
124 | "cell_type": "code",
125 | "execution_count": null,
126 | "metadata": {},
127 | "outputs": [],
128 | "source": [
129 | "datos.index.name = \"Entidad\""
130 | ]
131 | },
132 | {
133 | "cell_type": "code",
134 | "execution_count": null,
135 | "metadata": {},
136 | "outputs": [],
137 | "source": [
138 | "datos"
139 | ]
140 | },
141 | {
142 | "cell_type": "code",
143 | "execution_count": null,
144 | "metadata": {},
145 | "outputs": [],
146 | "source": [
147 | "columnas = pd.MultiIndex.from_product([(x for x in range(2003, 2020)),\n",
148 | " ('T1', 'T2', 'T3', 'T4', 'Anual')],\n",
149 | " names=['Año', 'Período'])"
150 | ]
151 | },
152 | {
153 | "cell_type": "code",
154 | "execution_count": null,
155 | "metadata": {},
156 | "outputs": [],
157 | "source": [
158 | "datos.columns = columnas"
159 | ]
160 | },
161 | {
162 | "cell_type": "code",
163 | "execution_count": null,
164 | "metadata": {
165 | "scrolled": true
166 | },
167 | "outputs": [],
168 | "source": [
169 | "datos"
170 | ]
171 | },
172 | {
173 | "cell_type": "code",
174 | "execution_count": null,
175 | "metadata": {},
176 | "outputs": [],
177 | "source": [
178 | "datos = datos.astype(float)"
179 | ]
180 | },
181 | {
182 | "cell_type": "code",
183 | "execution_count": null,
184 | "metadata": {},
185 | "outputs": [],
186 | "source": [
187 | "datos[2004]['Anual']"
188 | ]
189 | },
190 | {
191 | "cell_type": "code",
192 | "execution_count": null,
193 | "metadata": {},
194 | "outputs": [],
195 | "source": [
196 | "historico_2004 = datos[2004]['Anual'][1:]"
197 | ]
198 | },
199 | {
200 | "cell_type": "code",
201 | "execution_count": null,
202 | "metadata": {},
203 | "outputs": [],
204 | "source": [
205 | "historico_2004"
206 | ]
207 | },
208 | {
209 | "cell_type": "code",
210 | "execution_count": null,
211 | "metadata": {},
212 | "outputs": [],
213 | "source": [
214 | "historico_2004.plot()"
215 | ]
216 | },
217 | {
218 | "cell_type": "code",
219 | "execution_count": null,
220 | "metadata": {},
221 | "outputs": [],
222 | "source": [
223 | "historico_2004.plot.hist()"
224 | ]
225 | },
226 | {
227 | "cell_type": "code",
228 | "execution_count": null,
229 | "metadata": {},
230 | "outputs": [],
231 | "source": [
232 | "historico_2004.plot.pie()"
233 | ]
234 | },
235 | {
236 | "cell_type": "code",
237 | "execution_count": null,
238 | "metadata": {},
239 | "outputs": [],
240 | "source": [
241 | "historico_2004.plot.area()"
242 | ]
243 | },
244 | {
245 | "cell_type": "code",
246 | "execution_count": null,
247 | "metadata": {},
248 | "outputs": [],
249 | "source": [
250 | "historico_2004.plot.bar()"
251 | ]
252 | },
253 | {
254 | "cell_type": "code",
255 | "execution_count": null,
256 | "metadata": {
257 | "scrolled": true
258 | },
259 | "outputs": [],
260 | "source": [
261 | "historico_2004.plot.barh()"
262 | ]
263 | },
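{
"cell_type": "markdown",
"metadata": {},
"source": [
"* El atributo ```plot``` incluye otros métodos, como ```box()``` para diagramas de caja. La siguiente celda es un ejemplo ilustrativo aplicado a ```historico_2004```."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Diagrama de caja de los valores anuales de 2004 por entidad.\n",
"historico_2004.plot.box()"
]
},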
264 | {
265 | "cell_type": "markdown",
266 | "metadata": {},
267 | "source": [
268 | "
Esta obra está bajo una Licencia Creative Commons Atribución 4.0 Internacional.
\n",
269 | "© José Luis Chiquete Valdivieso. 2021.
"
270 | ]
271 | }
272 | ],
273 | "metadata": {
274 | "kernelspec": {
275 | "display_name": "Python 3 (ipykernel)",
276 | "language": "python",
277 | "name": "python3"
278 | },
279 | "language_info": {
280 | "codemirror_mode": {
281 | "name": "ipython",
282 | "version": 3
283 | },
284 | "file_extension": ".py",
285 | "mimetype": "text/x-python",
286 | "name": "python",
287 | "nbconvert_exporter": "python",
288 | "pygments_lexer": "ipython3",
289 | "version": "3.9.2"
290 | }
291 | },
292 | "nbformat": 4,
293 | "nbformat_minor": 2
294 | }
295 |
--------------------------------------------------------------------------------
/24_introduccion_a_matplotlib.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "[](https://www.pythonista.io)"
8 | ]
9 | },
10 | {
11 | "cell_type": "markdown",
12 | "metadata": {},
13 | "source": [
14 | "# El paquete *Matplotlib*."
15 | ]
16 | },
17 | {
18 | "cell_type": "markdown",
19 | "metadata": {},
20 | "source": [
21 | "[*Matlplotib*](https://matplotlib.org/) consiste en una biblioteca especializada en visualización de datos y es la base de herramientas más avanzadas como [*Seaborn*](https://seaborn.pydata.org/), [*Plotnine*](https://plotnine.readthedocs.io/en/stable/) y [*Dash*](https://dash.plotly.com/) entre muchas otras.\n",
22 | "\n",
23 | "La sintaxis de *Matplotlib* está basada en la sintaxis de *Matlab*.\n",
24 | "\n",
25 | "Por convención, el paquete ```matplotlib``` se importa como ```mpl```. Se utilizará esta nomenclatura para próximas referencia."
26 | ]
27 | },
28 | {
29 | "cell_type": "code",
30 | "execution_count": null,
31 | "metadata": {},
32 | "outputs": [],
33 | "source": [
34 | "import numpy as np"
35 | ]
36 | },
37 | {
38 | "cell_type": "code",
39 | "execution_count": null,
40 | "metadata": {},
41 | "outputs": [],
42 | "source": [
43 | "!pip install matplotlib"
44 | ]
45 | },
46 | {
47 | "cell_type": "markdown",
48 | "metadata": {},
49 | "source": [
50 | "## El objeto ```mpl.pyplot```.\n",
51 | "\n",
52 | "*Matplotlib* contiene una muy extensa biblioteca, la cual tiene como componente principal al objeto ```matplotlib.pyplot```. Por convención, ```matplotlib.pyplot``` se importa como ```plt```. Se utilizará esta convención para próximas referencias.\n",
53 | "\n",
54 | "El uso de ```plt``` permite crear objetos capaces de desplegar gráficos y definir múltiples características de éstos.\n",
55 | "\n",
56 | "A lo largo de este capítulo se explorarán algunos de dichos recursos.\n",
57 | "\n",
58 | "https://matplotlib.org/3.5.3/api/_as_gen/matplotlib.pyplot.html"
59 | ]
60 | },
61 | {
62 | "cell_type": "code",
63 | "execution_count": null,
64 | "metadata": {},
65 | "outputs": [],
66 | "source": [
67 | "from matplotlib import pyplot as plt"
68 | ]
69 | },
70 | {
71 | "cell_type": "markdown",
72 | "metadata": {},
73 | "source": [
74 | "### El método ```plt.plot()```."
75 | ]
76 | },
77 | {
78 | "cell_type": "markdown",
79 | "metadata": {},
80 | "source": [
81 | "Esta función permite crear una o más gráficas en 2 dimensiones a partir de arreglos de puntos para los ejes ```x``` y ```y``` que son ingresados como argumentos.\n",
82 | "\n",
83 | "```\n",
84 | "plt.plot(, ,\n",
85 | " , , ... \n",
86 | " , , )\n",
87 | "```"
88 | ]
89 | },
90 | {
91 | "cell_type": "markdown",
92 | "metadata": {},
93 | "source": [
94 | "### El método ```plt.show()```.\n",
95 | "\n",
96 | "El método ```plt. show()``` permite desplegar el gráfico creado con ```plt.plot()``` en el entorno desde el que se ejecutan las instrucciones.\n",
97 | "\n",
98 | "**NOTA:** En el caso de las notebooks de *Jupyter*, no es necesario usar ```plt. show()``` para que el gráfico se despliegue al ejecutar una celda con ```plt.plot()```."
99 | ]
100 | },
101 | {
102 | "cell_type": "markdown",
103 | "metadata": {},
104 | "source": [
105 | "**Ejemplos:**"
106 | ]
107 | },
108 | {
109 | "cell_type": "markdown",
110 | "metadata": {},
111 | "source": [
112 | "* Las siguientes celdas crearán un arreglo unidimensinal de *Numpy* con nombre ```x```, el cual contendrá ```500``` segmentos lineales que van de ```0``` a ```3π``` usando la función ```np.linspace()```."
113 | ]
114 | },
115 | {
116 | "cell_type": "code",
117 | "execution_count": null,
118 | "metadata": {
119 | "scrolled": true
120 | },
121 | "outputs": [],
122 | "source": [
123 | "x = np.linspace(0, 3*np.pi, 500)"
124 | ]
125 | },
126 | {
127 | "cell_type": "code",
128 | "execution_count": null,
129 | "metadata": {},
130 | "outputs": [],
131 | "source": [
132 | "x"
133 | ]
134 | },
135 | {
136 | "cell_type": "markdown",
137 | "metadata": {},
138 | "source": [
139 | "* La siguente celda utilizará ```plt.plot()``` para desplegar dos gráficas que unirán cada punto definido mediante una línea, las cuales están en función del arreglo ```x```:\n",
140 | "\n",
141 | "* ```np.sin(x ** 2)```\n",
142 | "* ```np.cos(x ** 2)```"
143 | ]
144 | },
145 | {
146 | "cell_type": "code",
147 | "execution_count": null,
148 | "metadata": {},
149 | "outputs": [],
150 | "source": [
151 | "plt.plot(x, np.sin(x**2), x, np.cos(x ** 2))"
152 | ]
153 | },
154 | {
155 | "cell_type": "code",
156 | "execution_count": null,
157 | "metadata": {},
158 | "outputs": [],
159 | "source": [
160 | "plt.show()"
161 | ]
162 | },
163 | {
164 | "cell_type": "markdown",
165 | "metadata": {},
166 | "source": [
167 | "* Las celdas anteriores crearon de forma automática un objeto de tipo ```plt.Figure```, el cual fue utilizado para desplegar las gráficas correspondientes."
168 | ]
169 | },
170 | {
171 | "cell_type": "markdown",
172 | "metadata": {},
173 | "source": [
174 | "* La siguiente celda incluye algunas funciones de ```plt``` que definen el título del gráfico (```plt.title()```), y las estiqueta de los ejes ( ```plt.xlabel```y ```plt.ylabel```)."
175 | ]
176 | },
177 | {
178 | "cell_type": "code",
179 | "execution_count": null,
180 | "metadata": {},
181 | "outputs": [],
182 | "source": [
183 | "plt.plot(x, np.sin(x**2), x, np.cos(x ** 2))\n",
184 | "plt.title('Funciones sinusoidales')\n",
185 | "plt.xlabel('Eje de las x')\n",
186 | "plt.ylabel('f(x)')\n",
187 | "plt.show()"
188 | ]
189 | },
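{
"cell_type": "markdown",
"metadata": {},
"source": [
"* ```plt.plot()``` también acepta cadenas de formato que definen el color y el estilo de línea de cada gráfica, y la función ```plt.legend()``` permite añadir una leyenda. La siguiente celda es un esbozo ilustrativo de ambas características."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# 'r--' traza una línea roja discontinua y 'b:' una línea azul punteada.\n",
"plt.plot(x, np.sin(x**2), 'r--', x, np.cos(x**2), 'b:')\n",
"plt.legend(['sen(x**2)', 'cos(x**2)'])\n",
"plt.title('Funciones sinusoidales')\n",
"plt.show()"
]
},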
190 | {
191 | "cell_type": "markdown",
192 | "metadata": {},
193 | "source": [
194 | "
Esta obra está bajo una Licencia Creative Commons Atribución 4.0 Internacional.
\n",
195 | "© José Luis Chiquete Valdivieso. 2023.
"
196 | ]
197 | }
198 | ],
199 | "metadata": {
200 | "kernelspec": {
201 | "display_name": "Python 3 (ipykernel)",
202 | "language": "python",
203 | "name": "python3"
204 | },
205 | "language_info": {
206 | "codemirror_mode": {
207 | "name": "ipython",
208 | "version": 3
209 | },
210 | "file_extension": ".py",
211 | "mimetype": "text/x-python",
212 | "name": "python",
213 | "nbconvert_exporter": "python",
214 | "pygments_lexer": "ipython3",
215 | "version": "3.9.2"
216 | }
217 | },
218 | "nbformat": 4,
219 | "nbformat_minor": 2
220 | }
221 |
--------------------------------------------------------------------------------
/26_tipos_basicos_de_graficos.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "[](https://www.pythonista.io)"
8 | ]
9 | },
10 | {
11 | "cell_type": "markdown",
12 | "metadata": {},
13 | "source": [
14 | "# Tipos básicos de gráfico de *Matplotlib*."
15 | ]
16 | },
17 | {
18 | "cell_type": "markdown",
19 | "metadata": {},
20 | "source": [
21 | "https://matplotlib.org/stable/plot_types/index.html\n",
22 | "\n",
23 | "https://matplotlib.org/stable/gallery/"
24 | ]
25 | },
26 | {
27 | "cell_type": "code",
28 | "execution_count": null,
29 | "metadata": {},
30 | "outputs": [],
31 | "source": [
32 | "import matplotlib as mpl\n",
33 | "import numpy as np\n",
34 | "from matplotlib import pyplot as plt"
35 | ]
36 | },
37 | {
38 | "cell_type": "markdown",
39 | "metadata": {},
40 | "source": [
41 | "## Preliminares."
42 | ]
43 | },
44 | {
45 | "cell_type": "code",
46 | "execution_count": null,
47 | "metadata": {},
48 | "outputs": [],
49 | "source": [
50 | "np.random.seed(12314124)"
51 | ]
52 | },
53 | {
54 | "cell_type": "code",
55 | "execution_count": null,
56 | "metadata": {},
57 | "outputs": [],
58 | "source": [
59 | "normal = np.random.normal(50, 20, 500)"
60 | ]
61 | },
62 | {
63 | "cell_type": "code",
64 | "execution_count": null,
65 | "metadata": {},
66 | "outputs": [],
67 | "source": [
68 | "normal"
69 | ]
70 | },
71 | {
72 | "cell_type": "code",
73 | "execution_count": null,
74 | "metadata": {},
75 | "outputs": [],
76 | "source": [
77 | "normal2 = np.random.normal(25, 12, 500)"
78 | ]
79 | },
80 | {
81 | "cell_type": "code",
82 | "execution_count": null,
83 | "metadata": {},
84 | "outputs": [],
85 | "source": [
86 | "ancho = np.random.randint(5, 60, 500)"
87 | ]
88 | },
89 | {
90 | "cell_type": "markdown",
91 | "metadata": {},
92 | "source": [
93 | "## Histograma."
94 | ]
95 | },
96 | {
97 | "cell_type": "markdown",
98 | "metadata": {},
99 | "source": [
100 | "https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.hist.html"
101 | ]
102 | },
103 | {
104 | "cell_type": "code",
105 | "execution_count": null,
106 | "metadata": {},
107 | "outputs": [],
108 | "source": [
109 | "plt.hist(normal)"
110 | ]
111 | },
112 | {
113 | "cell_type": "code",
114 | "execution_count": null,
115 | "metadata": {
116 | "scrolled": true
117 | },
118 | "outputs": [],
119 | "source": [
120 | "plt.hist(normal, histtype='step')"
121 | ]
122 | },
123 | {
124 | "cell_type": "code",
125 | "execution_count": null,
126 | "metadata": {
127 | "scrolled": true
128 | },
129 | "outputs": [],
130 | "source": [
131 | "plt.hist(normal, orientation='horizontal')"
132 | ]
133 | },
134 | {
135 | "cell_type": "code",
136 | "execution_count": null,
137 | "metadata": {
138 | "scrolled": false
139 | },
140 | "outputs": [],
141 | "source": [
142 | "plt.hist(normal, color='red')"
143 | ]
144 | },
145 | {
146 | "cell_type": "code",
147 | "execution_count": null,
148 | "metadata": {
149 | "scrolled": true
150 | },
151 | "outputs": [],
152 | "source": [
153 | "plt.hist(normal, bins=13)"
154 | ]
155 | },
156 | {
157 | "cell_type": "code",
158 | "execution_count": null,
159 | "metadata": {},
160 | "outputs": [],
161 | "source": [
162 | "plt.hist(normal, stacked=True)"
163 | ]
164 | },
165 | {
166 | "cell_type": "markdown",
167 | "metadata": {},
168 | "source": [
169 | "## Histograma 2D."
170 | ]
171 | },
172 | {
173 | "cell_type": "markdown",
174 | "metadata": {},
175 | "source": [
176 | "https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.hist2d.html"
177 | ]
178 | },
179 | {
180 | "cell_type": "code",
181 | "execution_count": null,
182 | "metadata": {},
183 | "outputs": [],
184 | "source": [
185 | "plt.hist2d(normal, normal2)"
186 | ]
187 | },
188 | {
189 | "cell_type": "markdown",
190 | "metadata": {},
191 | "source": [
192 | "## Boxplot."
193 | ]
194 | },
195 | {
196 | "cell_type": "markdown",
197 | "metadata": {},
198 | "source": [
199 | "https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.boxplot.html"
200 | ]
201 | },
202 | {
203 | "cell_type": "code",
204 | "execution_count": null,
205 | "metadata": {
206 | "scrolled": true
207 | },
208 | "outputs": [],
209 | "source": [
210 | "plt.boxplot(normal)"
211 | ]
212 | },
213 | {
214 | "cell_type": "code",
215 | "execution_count": null,
216 | "metadata": {},
217 | "outputs": [],
218 | "source": [
219 | "plt.boxplot((normal, normal2))"
220 | ]
221 | },
222 | {
223 | "cell_type": "markdown",
224 | "metadata": {},
225 | "source": [
226 | "## Scatterplot."
227 | ]
228 | },
229 | {
230 | "cell_type": "markdown",
231 | "metadata": {},
232 | "source": [
233 | "https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.scatter.html"
234 | ]
235 | },
236 | {
237 | "cell_type": "code",
238 | "execution_count": null,
239 | "metadata": {
240 | "scrolled": true
241 | },
242 | "outputs": [],
243 | "source": [
244 | "plt.scatter(normal, normal2)"
245 | ]
246 | },
247 | {
248 | "cell_type": "code",
249 | "execution_count": null,
250 | "metadata": {},
251 | "outputs": [],
252 | "source": [
253 | "plt.scatter(normal, normal2, s=ancho, c=ancho)"
254 | ]
255 | },
256 | {
257 | "cell_type": "markdown",
258 | "metadata": {},
259 | "source": [
260 | "
Esta obra está bajo una Licencia Creative Commons Atribución 4.0 Internacional.
\n",
261 | "© José Luis Chiquete Valdivieso. 2023.
"
262 | ]
263 | }
264 | ],
265 | "metadata": {
266 | "kernelspec": {
267 | "display_name": "Python 3 (ipykernel)",
268 | "language": "python",
269 | "name": "python3"
270 | },
271 | "language_info": {
272 | "codemirror_mode": {
273 | "name": "ipython",
274 | "version": 3
275 | },
276 | "file_extension": ".py",
277 | "mimetype": "text/x-python",
278 | "name": "python",
279 | "nbconvert_exporter": "python",
280 | "pygments_lexer": "ipython3",
281 | "version": "3.9.2"
282 | }
283 | },
284 | "nbformat": 4,
285 | "nbformat_minor": 2
286 | }
287 |
--------------------------------------------------------------------------------
/27_introduccion_a_plotnine.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "[](https://www.pythonista.io)"
8 | ]
9 | },
10 | {
11 | "cell_type": "markdown",
12 | "metadata": {},
13 | "source": [
14 | "# Introducción a *Plotnine*."
15 | ]
16 | },
17 | {
18 | "cell_type": "markdown",
19 | "metadata": {},
20 | "source": [
21 | "*Plotnine* es un proyecto que pretende implementar en *Python* las funcionalidades de [*ggplot2*](https://ggplot2.tidyverse.org/), la popular herramienta de visualización de datos para *R* y la [gramática de gráficas por capas](http://vita.had.co.nz/papers/layered-grammar.html); ambas creadas por [*Hadley Wickham*](https://hadley.nz/) y basadas en la gramática de gráficas de [*Leland Wilkinson*](https://en.wikipedia.org/wiki/Leland_Wilkinson).\n",
22 | "\n",
23 | "Aún cuando muchas de las funcionalidades de *ggplot2* ya han sido portadas, aún quedan muchas que se encuentran pendientes.\n",
24 | "\n",
25 | "\n",
26 | "La documentación oficial de *Plotnine* puede ser consultada en la siguiente liga:\n",
27 | "\n",
28 | "https://plotnine.readthedocs.io/en/stable/\n",
29 | "\n",
30 | "La siguiente liga apunta a un breve tutorial sobre *Plotnine* tomando como base a *ggplot2*: \n",
31 | "\n",
32 | "https://datascienceworkshops.com/blog/plotnine-grammar-of-graphics-for-python/"
33 | ]
34 | },
35 | {
36 | "cell_type": "code",
37 | "execution_count": null,
38 | "metadata": {
39 | "scrolled": true
40 | },
41 | "outputs": [],
42 | "source": [
43 | "!pip install plotnine"
44 | ]
45 | },
46 | {
47 | "cell_type": "code",
48 | "execution_count": null,
49 | "metadata": {},
50 | "outputs": [],
51 | "source": [
52 | "from plotnine import *\n",
53 | "import pandas as pd\n",
54 | "import numpy as np\n",
55 | "from datetime import datetime\n",
56 | "from typing import Any"
57 | ]
58 | },
59 | {
60 | "cell_type": "markdown",
61 | "metadata": {},
62 | "source": [
63 | "## Gramática de capas de un gráfico."
64 | ]
65 | },
66 | {
67 | "cell_type": "markdown",
68 | "metadata": {},
69 | "source": [
70 | "La gramática de capas define una estructura de elementos que condformnan un gráfico.\n",
71 | "\n",
72 | "* Datos.\n",
73 | "* Mapeo.\n",
74 | "* Estética.\n",
75 | "* Objetos geométricos.\n",
76 | "* Escalas.\n",
77 | "* Especificación de facetas.\n",
78 | "* Transfromaciones.\n",
79 | "* Sistema de coordenadas."
80 | ]
81 | },
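{
"cell_type": "markdown",
"metadata": {},
"source": [
"A manera de bosquejo mínimo de esta gramática por capas (el *dataframe* ```df_ejemplo``` y sus columnas son hipotéticos y solo ilustran la sintaxis):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"from plotnine import ggplot, aes, geom_point, theme_minimal\n",
"\n",
"# df_ejemplo es un dataframe hipotético con dos columnas numéricas.\n",
"df_ejemplo = pd.DataFrame({'x': np.arange(10),\n",
"                           'y': np.random.normal(size=10)})\n",
"\n",
"# Capas: datos y mapeo (ggplot + aes), objeto geométrico (geom_point) y tema.\n",
"(ggplot(df_ejemplo, aes(x='x', y='y'))\n",
" + geom_point()\n",
" + theme_minimal())"
]
},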
82 | {
83 | "cell_type": "markdown",
84 | "metadata": {},
85 | "source": [
86 | "## Sintaxis de la gramática."
87 | ]
88 | },
89 | {
90 | "cell_type": "markdown",
91 | "metadata": {},
92 | "source": [
93 | "### La función ```ggplot()```."
94 | ]
95 | },
96 | {
97 | "cell_type": "markdown",
98 | "metadata": {},
99 | "source": [
100 | "```\n",
101 | "ggplot(data=, mapping=, )\n",
102 | "```\n",
103 | "\n",
104 | "https://plotnine.readthedocs.io/en/stable/generated/plotnine.ggplot.html"
105 | ]
106 | },
107 | {
108 | "cell_type": "markdown",
109 | "metadata": {},
110 | "source": [
111 | "### La función ```pltonine.mapping.aes()```.\n",
112 | "\n",
113 | "https://plotnine.readthedocs.io/en/stable/generated/plotnine.mapping.aes.html"
114 | ]
115 | },
116 | {
117 | "cell_type": "markdown",
118 | "metadata": {},
119 | "source": [
120 | "### Funciones de geometría."
121 | ]
122 | },
123 | {
124 | "cell_type": "markdown",
125 | "metadata": {},
126 | "source": [
127 | "https://plotnine.readthedocs.io/en/stable/api.html#geoms"
128 | ]
129 | },
130 | {
131 | "cell_type": "markdown",
132 | "metadata": {},
133 | "source": [
134 | "## Funciones de temas.\n",
135 | "\n",
136 | "https://plotnine.readthedocs.io/en/stable/api.html#themes"
137 | ]
138 | },
139 | {
140 | "cell_type": "markdown",
141 | "metadata": {},
142 | "source": [
143 | "## Ejemplos."
144 | ]
145 | },
146 | {
147 | "cell_type": "markdown",
148 | "metadata": {},
149 | "source": [
150 | "### Ejemplo de histograma."
151 | ]
152 | },
153 | {
154 | "cell_type": "code",
155 | "execution_count": null,
156 | "metadata": {},
157 | "outputs": [],
158 | "source": [
159 | "np.random.seed(23523889)"
160 | ]
161 | },
162 | {
163 | "cell_type": "code",
164 | "execution_count": null,
165 | "metadata": {},
166 | "outputs": [],
167 | "source": [
168 | "arreglo_base = pd.DataFrame(np.random.normal(12, 25, 1000),\n",
169 | " columns=pd.Index(['observaciones']))\n",
170 | "arreglo_base"
171 | ]
172 | },
173 | {
174 | "cell_type": "code",
175 | "execution_count": null,
176 | "metadata": {
177 | "scrolled": false
178 | },
179 | "outputs": [],
180 | "source": [
181 | "ggplot(data=arreglo_base)"
182 | ]
183 | },
184 | {
185 | "cell_type": "code",
186 | "execution_count": null,
187 | "metadata": {},
188 | "outputs": [],
189 | "source": [
190 | "ggplot(data=arreglo_base, mapping=aes(x='observaciones')) + geom_histogram()"
191 | ]
192 | },
193 | {
194 | "cell_type": "code",
195 | "execution_count": null,
196 | "metadata": {
197 | "scrolled": true
198 | },
199 | "outputs": [],
200 | "source": [
201 | "(ggplot(data=arreglo_base, mapping=aes(x='observaciones')) + \n",
202 | "geom_histogram(bins=10, fill='yellow', color=\"orange\"))"
203 | ]
204 | },
205 | {
206 | "cell_type": "code",
207 | "execution_count": null,
208 | "metadata": {},
209 | "outputs": [],
210 | "source": [
211 | "histograma = pd.DataFrame(np.histogram(arreglo_base, bins=13)).T\n",
212 | "histograma.columns = pd.Index(['frecuencias','rangos'])\n",
213 | "histograma"
214 | ]
215 | },
216 | {
217 | "cell_type": "code",
218 | "execution_count": null,
219 | "metadata": {},
220 | "outputs": [],
221 | "source": []
222 | },
223 | {
224 | "cell_type": "markdown",
225 | "metadata": {},
226 | "source": [
227 | "### Ejemplo de columnas."
228 | ]
229 | },
230 | {
231 | "cell_type": "code",
232 | "execution_count": null,
233 | "metadata": {
234 | "scrolled": true
235 | },
236 | "outputs": [],
237 | "source": [
238 | "ggplot(histograma, aes(x='rangos', y='frecuencias', fill='rangos')) + geom_col()"
239 | ]
240 | },
241 | {
242 | "cell_type": "markdown",
243 | "metadata": {},
244 | "source": [
245 | "### Ejemplo de de líneas."
246 | ]
247 | },
248 | {
249 | "cell_type": "code",
250 | "execution_count": null,
251 | "metadata": {},
252 | "outputs": [],
253 | "source": [
254 | "casos = pd.read_csv('data/data_covid.csv')\n",
255 | "columnas = casos.columns.values\n",
256 | "columnas[0] = 'Fechas'\n",
257 | "casos.columns = pd.Index(columnas)\n",
258 | "casos.columns.name = \"Entidades\"\n",
259 | "casos['Fechas'] = pd.to_datetime(casos['Fechas'])\n",
260 | "casos.set_index('Fechas', inplace =True)\n",
261 | "casos"
262 | ]
263 | },
264 | {
265 | "cell_type": "code",
266 | "execution_count": null,
267 | "metadata": {},
268 | "outputs": [],
269 | "source": [
270 | "(ggplot(casos, aes(x=casos.index, y='Nacional'))\n",
271 | "+ geom_line())"
272 | ]
273 | },
274 | {
275 | "cell_type": "code",
276 | "execution_count": null,
277 | "metadata": {},
278 | "outputs": [],
279 | "source": [
280 | "(ggplot(casos, aes(x=casos.index, y='Nacional'))\n",
281 | "+ geom_line() \n",
282 | "+ geom_smooth())"
283 | ]
284 | },
285 | {
286 | "cell_type": "code",
287 | "execution_count": null,
288 | "metadata": {
289 | "scrolled": true
290 | },
291 | "outputs": [],
292 | "source": [
293 | "(ggplot(casos, aes(x=casos.index, y='Nacional'))\n",
294 | "+ geom_line() \n",
295 | "+ geom_smooth(span=0.07, color='red'))"
296 | ]
297 | },
298 | {
299 | "cell_type": "code",
300 | "execution_count": null,
301 | "metadata": {
302 | "scrolled": false
303 | },
304 | "outputs": [],
305 | "source": [
306 | "(ggplot(casos, aes(x=casos.index, y='Nacional')) \n",
307 | " + geom_line() \n",
308 | " + geom_smooth(span=0.07, color='blue') \n",
309 | " + theme_xkcd()\n",
310 | " + theme(axis_text_x=element_text(rotation=90, hjust=0.5)))"
311 | ]
312 | },
313 | {
314 | "cell_type": "code",
315 | "execution_count": null,
316 | "metadata": {},
317 | "outputs": [],
318 | "source": [
319 | "(ggplot(casos, aes(x=casos.index, y='Nacional')) \n",
320 | " + geom_line(color='red') \n",
321 | " + geom_smooth(span=0.07, color='blue') \n",
322 | " + theme_tufte()\n",
323 | " + theme(axis_text_x=element_text(rotation=45, hjust=1)))"
324 | ]
325 | },
326 | {
327 | "cell_type": "markdown",
328 | "metadata": {},
329 | "source": [
330 | "### Ejemplo de columnas."
331 | ]
332 | },
333 | {
334 | "cell_type": "code",
335 | "execution_count": null,
336 | "metadata": {},
337 | "outputs": [],
338 | "source": [
339 | "data = casos.drop('Nacional', axis=1).T[datetime(2021,1,1)].to_frame()\n",
340 | "data.columns = pd.Index(['Casos'])\n",
341 | "data"
342 | ]
343 | },
344 | {
345 | "cell_type": "code",
346 | "execution_count": null,
347 | "metadata": {},
348 | "outputs": [],
349 | "source": [
350 | "(ggplot(data, aes(x=data.index, y=data, fill='Casos')) \n",
351 | " + geom_col()\n",
352 | " + theme(axis_text_x=element_text(rotation=90, hjust=0.5)))"
353 | ]
354 | },
355 | {
356 | "cell_type": "markdown",
357 | "metadata": {},
358 | "source": [
359 | "
Esta obra está bajo una Licencia Creative Commons Atribución 4.0 Internacional.
\n",
360 | "© José Luis Chiquete Valdivieso. 2022.
"
361 | ]
362 | }
363 | ],
364 | "metadata": {
365 | "kernelspec": {
366 | "display_name": "Python 3 (ipykernel)",
367 | "language": "python",
368 | "name": "python3"
369 | },
370 | "language_info": {
371 | "codemirror_mode": {
372 | "name": "ipython",
373 | "version": 3
374 | },
375 | "file_extension": ".py",
376 | "mimetype": "text/x-python",
377 | "name": "python",
378 | "nbconvert_exporter": "python",
379 | "pygments_lexer": "ipython3",
380 | "version": "3.9.2"
381 | }
382 | },
383 | "nbformat": 4,
384 | "nbformat_minor": 2
385 | }
386 |
--------------------------------------------------------------------------------
/29_objetos_de_seaborn.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "[](https://www.pythonista.io)"
8 | ]
9 | },
10 | {
11 | "cell_type": "markdown",
12 | "metadata": {},
13 | "source": [
14 | "# Objetos de *Seaborn*."
15 | ]
16 | },
17 | {
18 | "cell_type": "markdown",
19 | "metadata": {},
20 | "source": [
21 | "Aún cuando las funciones *Seaborn* son muy populares y fáciles de desarrollar, tienen ciertas desventajas con respecto a otras soluciones basadas en gramáticas de gráficas. A partir de la versión 0.12, *Seaborn* cuenta con una biblioteca de objetos que se apega a dicha gramática.\n",
22 | "\n",
23 | "\n",
24 | "
\n",
25 | "\n",
26 | "Por convención, la biblioteca de objetos de seaborn ```seaborn.objects``` es importada como ```so```. En adelente, se seguirá dicha convención.\n",
27 | "\n",
28 | "https://seaborn.pydata.org/tutorial/objects_interface.html\n",
29 | "\n",
30 | "**NOTA:** Debido a que los objetos de *Seaborn* son una adición muy reciente, aún tiene funcionalidades limitadas en comparación con las funciones."
31 | ]
32 | },
33 | {
34 | "cell_type": "code",
35 | "execution_count": null,
36 | "metadata": {},
37 | "outputs": [],
38 | "source": [
39 | "import seaborn as sns\n",
40 | "from seaborn import objects as so"
41 | ]
42 | },
43 | {
44 | "cell_type": "markdown",
45 | "metadata": {},
46 | "source": [
47 | "## La clase ```so.Plot ```.\n",
48 | "\n",
49 | "Los componentes principales de este tipo de visualizaciones son los objetos instanciados de la clase ```so.Plot```.\n",
50 | "\n",
51 | "```\n",
52 | "so.Plot(data=, x=, y=, )\n",
53 | "```\n",
54 | "\n",
55 | "Donde:\n",
56 | "\n",
57 | "* `````` es un *dataset* compatible con un *dataframe* de *Pandas*.\n",
58 | "* `````` es el identificador de la columna del `````` que se utilizará para el eje de las $x$ en caso de que se requiera.\n",
59 | "* `````` es el identificador de la columna del `````` que se utilizará para el eje de las $y$ en caso de que se requiera.\n",
60 | "\n",
61 | "\n",
62 | "Los objetos instanciados de ```so.Plot``` cuentan con varios métodos y atributos que permiten cumplir con la funcionalidades de las capas:\n",
63 | "\n",
64 | "* Datos.\n",
65 | "* Estética.\n",
66 | "* Escala.\n",
67 | "* Facetas.\n",
68 | "\n",
69 | "https://seaborn.pydata.org/generated/seaborn.objects.Plot.html"
70 | ]
71 | },
72 | {
73 | "cell_type": "markdown",
74 | "metadata": {},
75 | "source": [
76 | "### Métodos en cascada.\n",
77 | "\n",
78 | "Los métodos de loss objetos instanciados de ```so.Plot``` también regresan objetos instanciados de ```so.Plot```, por lo que es posible aplicar métodos en cascada."
79 | ]
80 | },
81 | {
82 | "cell_type": "markdown",
83 | "metadata": {},
84 | "source": [
85 | "### El método ```so.Plot.add() ```.\n",
86 | "\n",
87 | "Este método permite añadir funcionalidades que extienden a los objeto de tipo ```so.Plot.add()``` en las capas de objetos geométricos, y estadísitca, principalmente.\n",
88 | "\n",
89 | "```\n",
90 | "so.Plot.add((), (), ... ())\n",
91 | "```\n",
92 | "\n",
93 | "Donde:\n",
94 | "\n",
95 | "* Cada ``````es una función de ```seaborn.objects```."
96 | ]
97 | },
98 | {
99 | "cell_type": "markdown",
100 | "metadata": {},
101 | "source": [
102 | "**Ejemplo:**"
103 | ]
104 | },
105 | {
106 | "cell_type": "code",
107 | "execution_count": null,
108 | "metadata": {},
109 | "outputs": [],
110 | "source": [
111 | "dataset = dataset = sns.load_dataset(\"iris\")\n",
112 | "dataset"
113 | ]
114 | },
115 | {
116 | "cell_type": "code",
117 | "execution_count": null,
118 | "metadata": {
119 | "scrolled": true
120 | },
121 | "outputs": [],
122 | "source": [
123 | "so.Plot(data=dataset, x=\"sepal_length\", y=\"sepal_width\")"
124 | ]
125 | },
126 | {
127 | "cell_type": "code",
128 | "execution_count": null,
129 | "metadata": {
130 | "scrolled": true
131 | },
132 | "outputs": [],
133 | "source": [
134 | "(so.Plot(data=dataset, x=\"sepal_length\", y=\"sepal_width\")\n",
135 | " .add(so.Dots()))"
136 | ]
137 | },
138 | {
139 | "cell_type": "code",
140 | "execution_count": null,
141 | "metadata": {},
142 | "outputs": [],
143 | "source": [
144 | "(so.Plot(data=dataset, x=\"sepal_length\", y=\"sepal_width\", color=\"species\")\n",
145 | " .add(so.Dots()))"
146 | ]
147 | },
148 | {
149 | "cell_type": "code",
150 | "execution_count": null,
151 | "metadata": {
152 | "scrolled": true
153 | },
154 | "outputs": [],
155 | "source": [
156 | "(so.Plot(data=dataset, x='sepal_length',\n",
157 | " y='sepal_width')\n",
158 | " .facet('species'))"
159 | ]
160 | },
161 | {
162 | "cell_type": "code",
163 | "execution_count": null,
164 | "metadata": {},
165 | "outputs": [],
166 | "source": [
167 | "(so.Plot(data=dataset, x='sepal_length', color='sepal_length').\n",
168 | " facet('species').\n",
169 | " add(so.Bar(), so.Hist()))"
170 | ]
171 | },
172 | {
173 | "cell_type": "code",
174 | "execution_count": null,
175 | "metadata": {
176 | "scrolled": true
177 | },
178 | "outputs": [],
179 | "source": [
180 | "(so.Plot(data=dataset, x='sepal_length', color='sepal_length').\n",
181 | " facet('species').\n",
182 | " add(so.Bar(), so.Hist()).scale(x=1))"
183 | ]
184 | },
185 | {
186 | "cell_type": "code",
187 | "execution_count": null,
188 | "metadata": {},
189 | "outputs": [],
190 | "source": [
191 | "(so.Plot(data=dataset, x='sepal_length', color='sepal_length').\n",
192 | " add(so.Bar(), so.S()))"
193 | ]
194 | },
195 | {
196 | "cell_type": "markdown",
197 | "metadata": {},
198 | "source": [
199 | "
Esta obra está bajo una Licencia Creative Commons Atribución 4.0 Internacional.
\n",
200 | "© José Luis Chiquete Valdivieso. 2022.
"
201 | ]
202 | }
203 | ],
204 | "metadata": {
205 | "kernelspec": {
206 | "display_name": "Python 3 (ipykernel)",
207 | "language": "python",
208 | "name": "python3"
209 | },
210 | "language_info": {
211 | "codemirror_mode": {
212 | "name": "ipython",
213 | "version": 3
214 | },
215 | "file_extension": ".py",
216 | "mimetype": "text/x-python",
217 | "name": "python",
218 | "nbconvert_exporter": "python",
219 | "pygments_lexer": "ipython3",
220 | "version": "3.10.6"
221 | }
222 | },
223 | "nbformat": 4,
224 | "nbformat_minor": 2
225 | }
226 |
--------------------------------------------------------------------------------
/30_introduccion_a_dask.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "[](https://www.pythonista.io)"
8 | ]
9 | },
10 | {
11 | "cell_type": "markdown",
12 | "metadata": {},
13 | "source": [
14 | "# Introducción a *Dask*."
15 | ]
16 | },
17 | {
18 | "cell_type": "markdown",
19 | "metadata": {},
20 | "source": [
21 | "Las bibliotecas de *Scipy* tienen limitaciones en cuanto a su capacidad de escalar de forma horizontal y aún cuando son capaces de realizar *multithreading* para procesamiento en paralelo, están restringidas a la cantidad de recursos disponibles de la máquina dede las que són ejecutadas.\n",
22 | "\n",
23 | "[*Dask*](https://dask.org/) es una biblioteca general para cómputo paralelo que permite escalar sus operaciones por medio de clústers (grupos de equipos de cómputo que trabajan de forma coordinada).\n",
24 | "\n",
25 | "*Dask* consta de:\n",
26 | "\n",
27 | "* Un calendarizador de tareas dinámico (*dynamic task scheduler*).\n",
28 | "* Una colección de bibliotecas optimizadas para *Big Data*, con interfaces que extienden a*Numpy* y *Pandas*.\n",
29 | "\n",
30 | "https://docs.dask.org/en/stable/\n",
31 | "\n",
32 | "https://tutorial.dask.org/"
33 | ]
34 | },
35 | {
36 | "cell_type": "code",
37 | "execution_count": null,
38 | "metadata": {
39 | "scrolled": true
40 | },
41 | "outputs": [],
42 | "source": [
43 | "pip install dask"
44 | ]
45 | },
46 | {
47 | "cell_type": "markdown",
48 | "metadata": {},
49 | "source": [
50 | "## Principales paquetes de *Dask*."
51 | ]
52 | },
53 | {
54 | "cell_type": "markdown",
55 | "metadata": {},
56 | "source": [
57 | "
"
58 | ]
59 | },
60 | {
61 | "cell_type": "markdown",
62 | "metadata": {},
63 | "source": [
64 | "### Paquetes de colecciones de datos de *Dask*.\n",
65 | "\n",
66 | "* ```dask.array```, el cual contiene una biblioteca para manejo de arreglos similar a la de *Numpy*. Por convención, este módulo se importa como ```da```. La documentación de este paquete puede consultarse en:\n",
67 | " * https://docs.dask.org/en/stable/array.html\n",
68 | "* ```dask.dataframe```, el cual contiene una biblioteca para manejo de *datafames* similar a la de *Pandas*. Por convención, este módulo se importa como ```dd```. La documentación de este paquete puede consultarse en:\n",
69 | " * https://docs.dask.org/en/stable/dataframe.html\n",
70 | "* ```dask.bags```, el cual contiene una biblioteca para manejo de *bags*, las cuales son estructuras de datos que pueden contener datos semi-estructurados y estructurados. Por convención este módulo se importa como ```db```. La documentación de este paquete puede consultarse en:\n",
71 | "https://docs.dask.org/en/stable/bag.html"
72 | ]
73 | },
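{
"cell_type": "markdown",
"metadata": {},
"source": [
"Como bosquejo mínimo del uso de una de estas colecciones (```dask.array```), el siguiente código crea un arreglo particionado en bloques y calcula su media; las dimensiones y los ```chunks``` son solo ilustrativos."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import dask.array as da\n",
"\n",
"# Arreglo de 10,000 x 10,000 dividido en bloques (chunks) de 1,000 x 1,000.\n",
"arreglo = da.ones((10_000, 10_000), chunks=(1_000, 1_000))\n",
"\n",
"# La expresión solo describe un grafo de tareas...\n",
"media = arreglo.mean()\n",
"\n",
"# ...y compute() lo ejecuta, posiblemente en paralelo.\n",
"media.compute()"
]
},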
74 | {
75 | "cell_type": "markdown",
76 | "metadata": {},
77 | "source": [
78 | "### Evaluación perezosa (*lazy*) con el método ```compute()```."
79 | ]
80 | },
81 | {
82 | "cell_type": "code",
83 | "execution_count": null,
84 | "metadata": {},
85 | "outputs": [],
86 | "source": [
87 | "import dask.dataframe as dd"
88 | ]
89 | },
90 | {
91 | "cell_type": "code",
92 | "execution_count": null,
93 | "metadata": {},
94 | "outputs": [],
95 | "source": [
96 | "df = dd.read_csv('data/data_covid.csv')"
97 | ]
98 | },
99 | {
100 | "cell_type": "code",
101 | "execution_count": null,
102 | "metadata": {
103 | "scrolled": true
104 | },
105 | "outputs": [],
106 | "source": [
107 | "df"
108 | ]
109 | },
110 | {
111 | "cell_type": "code",
112 | "execution_count": null,
113 | "metadata": {},
114 | "outputs": [],
115 | "source": [
116 | "df.compute()"
117 | ]
118 | },
119 | {
120 | "cell_type": "code",
121 | "execution_count": null,
122 | "metadata": {},
123 | "outputs": [],
124 | "source": [
125 | "type(df[\"Nacional\"])"
126 | ]
127 | },
128 | {
129 | "cell_type": "code",
130 | "execution_count": null,
131 | "metadata": {
132 | "scrolled": true
133 | },
134 | "outputs": [],
135 | "source": [
136 | "df[\"Nacional\"].compute()"
137 | ]
138 | },
139 | {
140 | "cell_type": "code",
141 | "execution_count": null,
142 | "metadata": {},
143 | "outputs": [],
144 | "source": [
145 | "df.loc[df[\"Nacional\"] > 50000].loc[:, ['index', 'Nacional']]"
146 | ]
147 | },
148 | {
149 | "cell_type": "code",
150 | "execution_count": null,
151 | "metadata": {},
152 | "outputs": [],
153 | "source": [
154 | "df.loc[df[\"Nacional\"] > 50000].loc[:, ['index', 'Nacional']].compute()"
155 | ]
156 | },
157 | {
158 | "cell_type": "markdown",
159 | "metadata": {},
160 | "source": [
161 | "### Bibliotecas de *Dask*.\n",
162 | "\n",
163 | "* ```dask.delayed```. Esta biblioteca permite procesar colecciones basadas en *Python* de forma paralela.\n",
164 | " * https://docs.dask.org/en/stable/delayed.html\n",
165 | "* ```dask.futures```. Es una implementación de [```concurrent.futures```](https://docs.python.org/3/library/concurrent.futures.html) de *Python* optimizado para correr en un cluster. La documentación de este paquete puede consultarse en:\n",
166 | " * https://docs.dask.org/en/stable/futures.html"
167 | ]
168 | },
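{
"cell_type": "markdown",
"metadata": {},
"source": [
"Como bosquejo mínimo de ```dask.delayed``` (las funciones ```incrementa()``` y ```suma()``` son hipotéticas): las llamadas decoradas no se ejecutan de inmediato, sino que construyen un grafo de tareas que se evalúa con ```compute()```."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from dask import delayed\n",
"\n",
"@delayed\n",
"def incrementa(x):\n",
"    return x + 1\n",
"\n",
"@delayed\n",
"def suma(valores):\n",
"    return sum(valores)\n",
"\n",
"# Estas llamadas solo construyen el grafo de tareas.\n",
"parciales = [incrementa(i) for i in range(10)]\n",
"total = suma(parciales)\n",
"\n",
"# compute() ejecuta el grafo, potencialmente en paralelo.\n",
"total.compute()"
]
},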
169 | {
170 | "cell_type": "markdown",
171 | "metadata": {},
172 | "source": [
173 | "## Depliegue de un cluster con ```Dask.Distributed```."
174 | ]
175 | },
176 | {
177 | "cell_type": "markdown",
178 | "metadata": {},
179 | "source": [
180 | "*Dask* puede ser desplegado en clusters mediante el uso de varios equipos *workers* gestionados por un *scheduler*.\n",
181 | "\n",
182 | "\n",
183 | "https://distributed.dask.org/en/stable/"
184 | ]
185 | },
186 | {
187 | "cell_type": "markdown",
188 | "metadata": {},
189 | "source": [
190 | "
"
191 | ]
192 | },
193 | {
194 | "cell_type": "code",
195 | "execution_count": null,
196 | "metadata": {
197 | "scrolled": true
198 | },
199 | "outputs": [],
200 | "source": [
201 | "!pip install \"bokeh>=2.4.2, <3\"\n",
202 | "!pip install dask distributed --upgrade"
203 | ]
204 | },
205 | {
206 | "cell_type": "code",
207 | "execution_count": null,
208 | "metadata": {},
209 | "outputs": [],
210 | "source": [
211 | "!dask scheduler"
212 | ]
213 | },
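{
"cell_type": "markdown",
"metadata": {},
"source": [
"Como complemento, el siguiente bosquejo asume que el *scheduler* anterior está corriendo localmente en su puerto por defecto (```8786```) y que en otra terminal se inició al menos un *worker* con ```dask worker tcp://127.0.0.1:8786```; la dirección es solo ilustrativa."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from dask.distributed import Client\n",
"\n",
"# El cliente se conecta al scheduler local (dirección ilustrativa).\n",
"cliente = Client('tcp://127.0.0.1:8786')\n",
"cliente"
]
},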
214 | {
215 | "cell_type": "markdown",
216 | "metadata": {},
217 | "source": [
218 | "
Esta obra está bajo una Licencia Creative Commons Atribución 4.0 Internacional.
\n",
219 | "© José Luis Chiquete Valdivieso. 2022.
"
220 | ]
221 | }
222 | ],
223 | "metadata": {
224 | "kernelspec": {
225 | "display_name": "Python 3 (ipykernel)",
226 | "language": "python",
227 | "name": "python3"
228 | },
229 | "language_info": {
230 | "codemirror_mode": {
231 | "name": "ipython",
232 | "version": 3
233 | },
234 | "file_extension": ".py",
235 | "mimetype": "text/x-python",
236 | "name": "python",
237 | "nbconvert_exporter": "python",
238 | "pygments_lexer": "ipython3",
239 | "version": "3.10.6"
240 | }
241 | },
242 | "nbformat": 4,
243 | "nbformat_minor": 2
244 | }
245 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2018 Pythonista®
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # py311 Gestión de datos con Numpy, Pandas y Matplotlib.
2 |
3 | ## Temario:
4 |
5 | * Scipy y Numpy.
6 | * Introducción a Pandas.
7 | * Tipos de datos en Pandas.
8 | * Operaciones básicas.
9 | * Operaciones de adición.
10 | * El método merge.
11 | * El método filter.
12 | * El método apply.
13 | * El método groupby.
14 | * Métodos de enmascaramiento.
15 | * Gestión de datos.
16 | * Limpieza de datos faltantes.
17 | * Transformaciones.
18 | * Índices y multi-índices.
19 | * Extracción y almacenamiento de datos.
20 | * Visualización de datos con pandas.
21 | * Introducción a Matplotlib.
22 | * Gráficas estadísticas.
23 | * Gráficas en 3D.
24 | * Procesamiento de imágenes.
25 | * Cómputo paralelo con Dask.
26 |
--------------------------------------------------------------------------------
/img/arquitectura_dask.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/PythonistaMX/py311/8d416bc812485edffd93bfd9fe044442b3d89dad/img/arquitectura_dask.png
--------------------------------------------------------------------------------
/img/ciclo.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/PythonistaMX/py311/8d416bc812485edffd93bfd9fe044442b3d89dad/img/ciclo.png
--------------------------------------------------------------------------------
/img/dask_cluster.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/PythonistaMX/py311/8d416bc812485edffd93bfd9fe044442b3d89dad/img/dask_cluster.png
--------------------------------------------------------------------------------
/img/grammar_of_graphics.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/PythonistaMX/py311/8d416bc812485edffd93bfd9fe044442b3d89dad/img/grammar_of_graphics.png
--------------------------------------------------------------------------------
/img/pythonista.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/PythonistaMX/py311/8d416bc812485edffd93bfd9fe044442b3d89dad/img/pythonista.png
--------------------------------------------------------------------------------
/src/31/callback.py:
--------------------------------------------------------------------------------
1 | from dash import Dash, dcc, html, Input, Output
2 |
3 | app = Dash(__name__)
4 |
5 | app.layout = html.Div([
6 | html.H6("Change the value in the text box to see callbacks in action!"),
7 | html.Div([
8 | "Input: ",
9 | dcc.Input(id='my-input', value='initial value', type='text')
10 | ]),
11 | html.Br(),
12 | html.Div(id='my-output'),
13 |
14 | ])
15 |
16 |
17 | @app.callback(
18 | Output(component_id='my-output', component_property='children'),
19 | Input(component_id='my-input', component_property='value')
20 | )
21 | def update_output_div(input_value):
22 | return f'Output: {input_value}'
23 |
24 |
25 | if __name__ == '__main__':
26 | app.run_server(debug=True, port=8000, host="0.0.0.0")
--------------------------------------------------------------------------------
/src/31/hola_mundo.py:
--------------------------------------------------------------------------------
1 | from dash import Dash, html
2 | import pandas as pd
3 |
4 | df = pd.read_csv('https://gist.githubusercontent.com/chriddyp/c78bf172206ce24f77d6363a2d754b59/raw/c353e8ef842413cae56ae3920b8fd78468aa4cb2/usa-agricultural-exports-2011.csv')
5 |
6 |
7 | def generate_table(dataframe, max_rows=10):
8 | return html.Table([
9 | html.Thead(
10 | html.Tr([html.Th(col) for col in dataframe.columns])
11 | ),
12 | html.Tbody([
13 | html.Tr([
14 | html.Td(dataframe.iloc[i][col]) for col in dataframe.columns
15 | ]) for i in range(min(len(dataframe), max_rows))
16 | ])
17 | ])
18 |
19 |
20 | app = Dash(__name__)
21 |
22 | app.layout = html.Div([
23 | html.H4(children='US Agriculture Exports (2011)'),
24 | generate_table(df)
25 | ])
26 |
27 | if __name__ == '__main__':
28 | app.run_server(debug=True, port=8080, host="0.0.0.0")
29 |
--------------------------------------------------------------------------------