├── README.md ├── 1_IntruduccionPandas.ipynb ├── 2_TiposDeDatos.ipynb ├── 5_InfoDataSet.ipynb ├── 4_ImportarDatos.ipynb └── 6_TratamientoDatos.ipynb /README.md: -------------------------------------------------------------------------------- 1 | #

GitHub repository dedicated to learning Pandas and its importance in data analysis! 🐼💼

2 | #

📝 Description

3 | In this repository I have collected key concepts, practical exercises, and useful resources so that people who are starting out in data analysis can get to grips with Pandas. As someone who has been searching for a first job in this field, I understand how crucial it is to master this tool and be prepared for real-world challenges. 4 | 5 | #

🎯 Repository Contents:

6 | 7 | #

Jupyter Notebook:

8 | This notebook is dedicated to learning Pandas and its importance in data analysis, aimed at those looking for their first job in this field! 🐼 9 | 10 | #

Dataset (CSV):

11 | The dataset used in the analysis is included. 12 | 13 | https://drive.google.com/file/d/1HTCYiLHnIY72-O5qde0O5Tp5vVIuq2bl/view?usp=sharing 14 | 15 | https://drive.google.com/file/d/1v-uBSxvs0cy0iBRTSl5p2dhWNiheLKC_/view?usp=sharing 16 | 17 | #

How to Use This Repository:

18 | Clone this repository to your local machine. 19 | Open the Jupyter Notebook in your Python environment. 20 | Follow the instructions and run the code cells step by step to understand the concepts and techniques presented. 21 | Experiment with the code and the data on your own to gain a deeper understanding. 22 | #
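As a rough illustration of a first session with the notebooks, the sketch below loads a few sales records and previews them. The five sample rows are copied from the output shown in the first notebook; reading them from an in-memory string, and the `parse_dates` option, are assumptions made here so the example runs without the downloaded file.

```python
import io

import pandas as pd

# Sample rows copied from the notebook's displayed output.
# The real CSV has 300 rows; this in-memory copy is only for illustration.
CSV_TEXT = """Nombre Cliente,Tipo Producto,Precio (€/kg),Cantidad Vendida (kg),Fecha Venta,Provincia Cliente
Mas y Mas,Almendras,4.54,1212,2023-03-29,Cádiz
Mercadona,Nueces,5.84,591,2023-12-05,Sevilla
Alcampo,Cacahuetes,2.69,2831,2023-07-26,Madrid
Mas y Mas,Pistachos,6.70,516,2023-10-07,Alicante
Lidl,Nueces,6.14,1283,2023-05-12,Barcelona
"""

# With the downloaded file you would pass its local path instead of the
# StringIO object (the path depends on where you saved the CSV).
df = pd.read_csv(io.StringIO(CSV_TEXT), parse_dates=["Fecha Venta"])

# Preview the first rows, as the notebook does with df.head().
first_rows = df.head(3)

# Select two columns; the list inside the brackets returns a DataFrame.
two_columns = df[["Nombre Cliente", "Tipo Producto"]]

print(first_rows)
print(two_columns)
print(df.dtypes)
```

Passing `parse_dates` is optional; without it, `Fecha Venta` is read as plain text rather than as dates.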

If you like any of the repositories, you can click the ⭐️ button to show your support and spread the word 🦄

23 | 24 | 👩‍💻 Thank you very much, and best regards! 25 | ======= 26 | 27 | . 28 | . 29 | . 30 | . 31 | . 32 | This study notebook belongs to Data con Steven 33 | -------------------------------------------------------------------------------- /1_IntruduccionPandas.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Pandas Course" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "### What is Pandas\n", 15 | "\n", 16 | "Pandas is an open-source software library designed specifically for data manipulation and analysis in Python. It is powerful, flexible, and easy to use.\n", 17 | "With it you can:\n", 18 | "\n", 19 | "**Manipulate data:** It lets you load data from different sources, such as CSV files, databases, and even web pages. Once the data is loaded, you can filter, sort, add, or remove information.\n", 20 | "\n", 21 | "**Clean data:** The data we obtain often has missing values, typos, or duplicated information. Pandas offers tools to clean data easily.\n", 22 | "\n", 23 | "**Analyze quickly:** For example, compute statistics such as the mean, the median, or the standard deviation of your data.\n", 24 | "\n", 25 | "**Visualize data:** Besides manipulating data, Pandas works very well with data visualization libraries to create informative charts and visualizations.\n", 26 | "\n", 27 | "\n" 28 | ] 29 | }, 30 | { 31 | "cell_type": "markdown", 32 | "metadata": {}, 33 | "source": [ 34 | "### Data Structures\n", 35 | "\n", 36 | "Pandas has two main data structures:\n", 37 | "\n", 38 | "**DataFrame:** Similar to a spreadsheet in Excel or a table in a database, it is a two-dimensional structure that organizes data in rows and columns. Each column can hold a different data type (numbers, text, dates, etc.).\n", 39 | "\n", 40 | "**Series:** A one-dimensional structure similar to a list or a vector, but with an index of labels that are not limited to integer positions. You can think of a Series as a single column of a DataFrame.\n" 41 | ] 42 | }, 43 | { 44 | "cell_type": "markdown", 45 | "metadata": {}, 46 | "source": [ 47 | "### Install Pandas" 48 | ] 49 | }, 50 | { 51 | "cell_type": "markdown", 52 | "metadata": {}, 53 | "source": [ 54 | "In the conda console: \n", 55 | "\n", 56 | " conda install pandas" 57 | ] 58 | }, 59 | { 60 | "cell_type": "markdown", 61 | "metadata": {}, 62 | "source": [ 63 | "### Import Pandas" 64 | ] 65 | }, 66 | { 67 | "cell_type": "markdown", 68 | "metadata": {}, 69 | "source": [ 70 | " import pandas as pd" 71 | ] 72 | }, 73 | { 74 | "cell_type": "markdown", 75 | "metadata": {}, 76 | "source": [ 77 | "### Load Data" 78 | ] 79 | }, 80 | { 81 | "cell_type": "code", 82 | "execution_count": 10, 83 | "metadata": {}, 84 | "outputs": [ 85 | { 86 | "data": { 87 | "text/html": [ 88 | "
\n", 89 | "\n", 102 | "\n", 103 | " \n", 104 | " \n", 105 | " \n", 106 | " \n", 107 | " \n", 108 | " \n", 109 | " \n", 110 | " \n", 111 | " \n", 112 | " \n", 113 | " \n", 114 | " \n", 115 | " \n", 116 | " \n", 117 | " \n", 118 | " \n", 119 | " \n", 120 | " \n", 121 | " \n", 122 | " \n", 123 | " \n", 124 | " \n", 125 | " \n", 126 | " \n", 127 | " \n", 128 | " \n", 129 | " \n", 130 | " \n", 131 | " \n", 132 | " \n", 133 | " \n", 134 | " \n", 135 | " \n", 136 | " \n", 137 | " \n", 138 | " \n", 139 | " \n", 140 | " \n", 141 | " \n", 142 | " \n", 143 | " \n", 144 | " \n", 145 | " \n", 146 | " \n", 147 | " \n", 148 | " \n", 149 | " \n", 150 | " \n", 151 | " \n", 152 | " \n", 153 | " \n", 154 | " \n", 155 | " \n", 156 | " \n", 157 | " \n", 158 | " \n", 159 | " \n", 160 | " \n", 161 | " \n", 162 | " \n", 163 | " \n", 164 | " \n", 165 | " \n", 166 | " \n", 167 | " \n", 168 | " \n", 169 | " \n", 170 | " \n", 171 | " \n", 172 | " \n", 173 | " \n", 174 | " \n", 175 | " \n", 176 | " \n", 177 | " \n", 178 | " \n", 179 | " \n", 180 | " \n", 181 | " \n", 182 | " \n", 183 | " \n", 184 | " \n", 185 | " \n", 186 | " \n", 187 | " \n", 188 | " \n", 189 | " \n", 190 | " \n", 191 | " \n", 192 | " \n", 193 | " \n", 194 | " \n", 195 | " \n", 196 | " \n", 197 | " \n", 198 | " \n", 199 | " \n", 200 | " \n", 201 | " \n", 202 | " \n", 203 | " \n", 204 | " \n", 205 | " \n", 206 | " \n", 207 | " \n", 208 | " \n", 209 | " \n", 210 | " \n", 211 | " \n", 212 | " \n", 213 | " \n", 214 | " \n", 215 | "
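| | Nombre Cliente | Tipo Producto | Precio (€/kg) | Cantidad Vendida (kg) | Fecha Venta | Provincia Cliente |
|---|---|---|---|---|---|---|
| 0 | Mas y Mas | Almendras | 4.54 | 1212 | 2023-03-29 | Cádiz |
| 1 | Mercadona | Nueces | 5.84 | 591 | 2023-12-05 | Sevilla |
| 2 | Alcampo | Cacahuetes | 2.69 | 2831 | 2023-07-26 | Madrid |
| 3 | Mas y Mas | Pistachos | 6.70 | 516 | 2023-10-07 | Alicante |
| 4 | Lidl | Nueces | 6.14 | 1283 | 2023-05-12 | Barcelona |
| ... | ... | ... | ... | ... | ... | ... |
| 295 | Caprabo | Nueces | 9.75 | 2367 | 2023-09-15 | Sevilla |
| 296 | Alcampo | Pistachos | 3.82 | 1951 | 2023-09-19 | Madrid |
| 297 | Mas y Mas | Pistachos | 5.22 | 2811 | 2023-05-04 | Valencia |
| 298 | Mas y Mas | Cacahuetes | 8.07 | 473 | 2023-01-18 | Madrid |
| 299 | Aldi | Nueces | 5.80 | 1081 | 2023-03-23 | Sevilla |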
\n", 216 | "

300 rows × 6 columns

\n", 217 | "
" 218 | ], 219 | "text/plain": [ 220 | " Nombre Cliente Tipo Producto Precio (€/kg) Cantidad Vendida (kg) \\\n", 221 | "0 Mas y Mas Almendras 4.54 1212 \n", 222 | "1 Mercadona Nueces 5.84 591 \n", 223 | "2 Alcampo Cacahuetes 2.69 2831 \n", 224 | "3 Mas y Mas Pistachos 6.70 516 \n", 225 | "4 Lidl Nueces 6.14 1283 \n", 226 | ".. ... ... ... ... \n", 227 | "295 Caprabo Nueces 9.75 2367 \n", 228 | "296 Alcampo Pistachos 3.82 1951 \n", 229 | "297 Mas y Mas Pistachos 5.22 2811 \n", 230 | "298 Mas y Mas Cacahuetes 8.07 473 \n", 231 | "299 Aldi Nueces 5.80 1081 \n", 232 | "\n", 233 | " Fecha Venta Provincia Cliente \n", 234 | "0 2023-03-29 Cádiz \n", 235 | "1 2023-12-05 Sevilla \n", 236 | "2 2023-07-26 Madrid \n", 237 | "3 2023-10-07 Alicante \n", 238 | "4 2023-05-12 Barcelona \n", 239 | ".. ... ... \n", 240 | "295 2023-09-15 Sevilla \n", 241 | "296 2023-09-19 Madrid \n", 242 | "297 2023-05-04 Valencia \n", 243 | "298 2023-01-18 Madrid \n", 244 | "299 2023-03-23 Sevilla \n", 245 | "\n", 246 | "[300 rows x 6 columns]" 247 | ] 248 | }, 249 | "execution_count": 10, 250 | "metadata": {}, 251 | "output_type": "execute_result" 252 | } 253 | ], 254 | "source": [ 255 | "import pandas as pd\n", 256 | "df = pd.read_csv('/Users/jimenacambronero/Desktop/Proyectos para Portfolio/PythonPracticas/Datos/registros_ventas_frutos_secos.csv', delimiter=',')\n", 257 | "df" 258 | ] 259 | }, 260 | { 261 | "cell_type": "markdown", 262 | "metadata": {}, 263 | "source": [ 264 | "# Visualizar Datos" 265 | ] 266 | }, 267 | { 268 | "cell_type": "code", 269 | "execution_count": 6, 270 | "metadata": {}, 271 | "outputs": [ 272 | { 273 | "data": { 274 | "text/html": [ 275 | "
\n", 276 | "\n", 289 | "\n", 290 | " \n", 291 | " \n", 292 | " \n", 293 | " \n", 294 | " \n", 295 | " \n", 296 | " \n", 297 | " \n", 298 | " \n", 299 | " \n", 300 | " \n", 301 | " \n", 302 | " \n", 303 | " \n", 304 | " \n", 305 | " \n", 306 | " \n", 307 | " \n", 308 | " \n", 309 | " \n", 310 | " \n", 311 | " \n", 312 | " \n", 313 | " \n", 314 | " \n", 315 | " \n", 316 | " \n", 317 | " \n", 318 | " \n", 319 | " \n", 320 | " \n", 321 | " \n", 322 | " \n", 323 | " \n", 324 | " \n", 325 | " \n", 326 | " \n", 327 | " \n", 328 | " \n", 329 | " \n", 330 | " \n", 331 | " \n", 332 | " \n", 333 | " \n", 334 | " \n", 335 | " \n", 336 | " \n", 337 | " \n", 338 | " \n", 339 | " \n", 340 | " \n", 341 | " \n", 342 | " \n", 343 | " \n", 344 | " \n", 345 | " \n", 346 | " \n", 347 | " \n", 348 | "
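| | Nombre Cliente | Tipo Producto | Precio (€/kg) | Cantidad Vendida (kg) | Fecha Venta | Provincia Cliente |
|---|---|---|---|---|---|---|
| 0 | Mas y Mas | Almendras | 4.54 | 1212 | 2023-03-29 | Cádiz |
| 1 | Mercadona | Nueces | 5.84 | 591 | 2023-12-05 | Sevilla |
| 2 | Alcampo | Cacahuetes | 2.69 | 2831 | 2023-07-26 | Madrid |
| 3 | Mas y Mas | Pistachos | 6.70 | 516 | 2023-10-07 | Alicante |
| 4 | Lidl | Nueces | 6.14 | 1283 | 2023-05-12 | Barcelona |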
\n", 349 | "
" 350 | ], 351 | "text/plain": [ 352 | " Nombre Cliente Tipo Producto Precio (€/kg) Cantidad Vendida (kg) \\\n", 353 | "0 Mas y Mas Almendras 4.54 1212 \n", 354 | "1 Mercadona Nueces 5.84 591 \n", 355 | "2 Alcampo Cacahuetes 2.69 2831 \n", 356 | "3 Mas y Mas Pistachos 6.70 516 \n", 357 | "4 Lidl Nueces 6.14 1283 \n", 358 | "\n", 359 | " Fecha Venta Provincia Cliente \n", 360 | "0 2023-03-29 Cádiz \n", 361 | "1 2023-12-05 Sevilla \n", 362 | "2 2023-07-26 Madrid \n", 363 | "3 2023-10-07 Alicante \n", 364 | "4 2023-05-12 Barcelona " 365 | ] 366 | }, 367 | "execution_count": 6, 368 | "metadata": {}, 369 | "output_type": "execute_result" 370 | } 371 | ], 372 | "source": [ 373 | "# Como ver los 5 primeros registros\n", 374 | "df.head()" 375 | ] 376 | }, 377 | { 378 | "cell_type": "code", 379 | "execution_count": 7, 380 | "metadata": {}, 381 | "outputs": [ 382 | { 383 | "data": { 384 | "text/html": [ 385 | "
\n", 386 | "\n", 399 | "\n", 400 | " \n", 401 | " \n", 402 | " \n", 403 | " \n", 404 | " \n", 405 | " \n", 406 | " \n", 407 | " \n", 408 | " \n", 409 | " \n", 410 | " \n", 411 | " \n", 412 | " \n", 413 | " \n", 414 | " \n", 415 | " \n", 416 | " \n", 417 | " \n", 418 | " \n", 419 | " \n", 420 | " \n", 421 | " \n", 422 | " \n", 423 | " \n", 424 | " \n", 425 | " \n", 426 | " \n", 427 | " \n", 428 | " \n", 429 | " \n", 430 | " \n", 431 | " \n", 432 | " \n", 433 | " \n", 434 | " \n", 435 | " \n", 436 | " \n", 437 | " \n", 438 | " \n", 439 | " \n", 440 | " \n", 441 | " \n", 442 | " \n", 443 | " \n", 444 | " \n", 445 | " \n", 446 | " \n", 447 | " \n", 448 | " \n", 449 | " \n", 450 | " \n", 451 | " \n", 452 | " \n", 453 | " \n", 454 | " \n", 455 | " \n", 456 | " \n", 457 | " \n", 458 | " \n", 459 | " \n", 460 | " \n", 461 | " \n", 462 | " \n", 463 | " \n", 464 | " \n", 465 | " \n", 466 | " \n", 467 | " \n", 468 | " \n", 469 | " \n", 470 | " \n", 471 | " \n", 472 | " \n", 473 | " \n", 474 | " \n", 475 | " \n", 476 | " \n", 477 | " \n", 478 | " \n", 479 | " \n", 480 | " \n", 481 | " \n", 482 | " \n", 483 | " \n", 484 | " \n", 485 | " \n", 486 | " \n", 487 | " \n", 488 | " \n", 489 | " \n", 490 | " \n", 491 | " \n", 492 | " \n", 493 | " \n", 494 | " \n", 495 | " \n", 496 | " \n", 497 | " \n", 498 | " \n", 499 | " \n", 500 | " \n", 501 | " \n", 502 | " \n", 503 | "
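| | Nombre Cliente | Tipo Producto | Precio (€/kg) | Cantidad Vendida (kg) | Fecha Venta | Provincia Cliente |
|---|---|---|---|---|---|---|
| 0 | Mas y Mas | Almendras | 4.54 | 1212 | 2023-03-29 | Cádiz |
| 1 | Mercadona | Nueces | 5.84 | 591 | 2023-12-05 | Sevilla |
| 2 | Alcampo | Cacahuetes | 2.69 | 2831 | 2023-07-26 | Madrid |
| 3 | Mas y Mas | Pistachos | 6.70 | 516 | 2023-10-07 | Alicante |
| 4 | Lidl | Nueces | 6.14 | 1283 | 2023-05-12 | Barcelona |
| 5 | Carrefour | Almendras | 1.92 | 2741 | 2023-03-24 | Cádiz |
| 6 | Lidl | Pistachos | 3.88 | 2898 | 2023-03-08 | Sevilla |
| 7 | Alcampo | Almendras | 3.76 | 2464 | 2023-03-01 | Madrid |
| 8 | Dia | Almendras | 6.46 | 641 | 2023-09-26 | Alicante |
| 9 | Consum | Pistachos | 8.70 | 646 | 2023-11-23 | Cádiz |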
\n", 504 | "
" 505 | ], 506 | "text/plain": [ 507 | " Nombre Cliente Tipo Producto Precio (€/kg) Cantidad Vendida (kg) \\\n", 508 | "0 Mas y Mas Almendras 4.54 1212 \n", 509 | "1 Mercadona Nueces 5.84 591 \n", 510 | "2 Alcampo Cacahuetes 2.69 2831 \n", 511 | "3 Mas y Mas Pistachos 6.70 516 \n", 512 | "4 Lidl Nueces 6.14 1283 \n", 513 | "5 Carrefour Almendras 1.92 2741 \n", 514 | "6 Lidl Pistachos 3.88 2898 \n", 515 | "7 Alcampo Almendras 3.76 2464 \n", 516 | "8 Dia Almendras 6.46 641 \n", 517 | "9 Consum Pistachos 8.70 646 \n", 518 | "\n", 519 | " Fecha Venta Provincia Cliente \n", 520 | "0 2023-03-29 Cádiz \n", 521 | "1 2023-12-05 Sevilla \n", 522 | "2 2023-07-26 Madrid \n", 523 | "3 2023-10-07 Alicante \n", 524 | "4 2023-05-12 Barcelona \n", 525 | "5 2023-03-24 Cádiz \n", 526 | "6 2023-03-08 Sevilla \n", 527 | "7 2023-03-01 Madrid \n", 528 | "8 2023-09-26 Alicante \n", 529 | "9 2023-11-23 Cádiz " 530 | ] 531 | }, 532 | "execution_count": 7, 533 | "metadata": {}, 534 | "output_type": "execute_result" 535 | } 536 | ], 537 | "source": [ 538 | "# Como ver los primeros 10 registros\n", 539 | "df[:10]" 540 | ] 541 | }, 542 | { 543 | "cell_type": "code", 544 | "execution_count": null, 545 | "metadata": {}, 546 | "outputs": [], 547 | "source": [] 548 | }, 549 | { 550 | "cell_type": "code", 551 | "execution_count": 8, 552 | "metadata": {}, 553 | "outputs": [ 554 | { 555 | "data": { 556 | "text/html": [ 557 | "
\n", 558 | "\n", 571 | "\n", 572 | " \n", 573 | " \n", 574 | " \n", 575 | " \n", 576 | " \n", 577 | " \n", 578 | " \n", 579 | " \n", 580 | " \n", 581 | " \n", 582 | " \n", 583 | " \n", 584 | " \n", 585 | " \n", 586 | " \n", 587 | " \n", 588 | " \n", 589 | " \n", 590 | " \n", 591 | " \n", 592 | " \n", 593 | " \n", 594 | " \n", 595 | " \n", 596 | " \n", 597 | " \n", 598 | " \n", 599 | " \n", 600 | " \n", 601 | " \n", 602 | " \n", 603 | " \n", 604 | " \n", 605 | " \n", 606 | " \n", 607 | " \n", 608 | " \n", 609 | " \n", 610 | " \n", 611 | " \n", 612 | " \n", 613 | " \n", 614 | " \n", 615 | " \n", 616 | " \n", 617 | " \n", 618 | " \n", 619 | " \n", 620 | " \n", 621 | " \n", 622 | " \n", 623 | " \n", 624 | " \n", 625 | " \n", 626 | " \n", 627 | " \n", 628 | " \n", 629 | " \n", 630 | " \n", 631 | " \n", 632 | " \n", 633 | " \n", 634 | " \n", 635 | " \n", 636 | "
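| | Nombre Cliente | Tipo Producto |
|---|---|---|
| 0 | Mas y Mas | Almendras |
| 1 | Mercadona | Nueces |
| 2 | Alcampo | Cacahuetes |
| 3 | Mas y Mas | Pistachos |
| 4 | Lidl | Nueces |
| ... | ... | ... |
| 295 | Caprabo | Nueces |
| 296 | Alcampo | Pistachos |
| 297 | Mas y Mas | Pistachos |
| 298 | Mas y Mas | Cacahuetes |
| 299 | Aldi | Nueces |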
\n", 637 | "

300 rows × 2 columns

\n", 638 | "
" 639 | ], 640 | "text/plain": [ 641 | " Nombre Cliente Tipo Producto\n", 642 | "0 Mas y Mas Almendras\n", 643 | "1 Mercadona Nueces\n", 644 | "2 Alcampo Cacahuetes\n", 645 | "3 Mas y Mas Pistachos\n", 646 | "4 Lidl Nueces\n", 647 | ".. ... ...\n", 648 | "295 Caprabo Nueces\n", 649 | "296 Alcampo Pistachos\n", 650 | "297 Mas y Mas Pistachos\n", 651 | "298 Mas y Mas Cacahuetes\n", 652 | "299 Aldi Nueces\n", 653 | "\n", 654 | "[300 rows x 2 columns]" 655 | ] 656 | }, 657 | "execution_count": 8, 658 | "metadata": {}, 659 | "output_type": "execute_result" 660 | } 661 | ], 662 | "source": [ 663 | "# Como visualizo 2 Columnas\n", 664 | "df[['Nombre Cliente', 'Tipo Producto']] # doble corchete, sino dará error" 665 | ] 666 | } 667 | ], 668 | "metadata": { 669 | "kernelspec": { 670 | "display_name": "Python 3", 671 | "language": "python", 672 | "name": "python3" 673 | }, 674 | "language_info": { 675 | "codemirror_mode": { 676 | "name": "ipython", 677 | "version": 3 678 | }, 679 | "file_extension": ".py", 680 | "mimetype": "text/x-python", 681 | "name": "python", 682 | "nbconvert_exporter": "python", 683 | "pygments_lexer": "ipython3", 684 | "version": "3.9.13" 685 | } 686 | }, 687 | "nbformat": 4, 688 | "nbformat_minor": 2 689 | } 690 | -------------------------------------------------------------------------------- /2_TiposDeDatos.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Tipos de Datos" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "Tipos de Datos en Pandas, tomando en cuenta la capacidad de soportar vaalores nulos.\n", 15 | "\n", 16 | "**float:** Tipo de dato flotante en Numpy y puede contener valores decimales. 
It supports null values (NaN).\n", 17 | "\n", 18 | "**int:** NumPy's integer type; it does not support nulls.\n", 19 | "\n", 20 | "**object:** NumPy's object type, used to represent text or any Python object. It is versatile, but not the best choice for numeric or categorical data.\n", 21 | "\n", 22 | "**category:** A data type introduced by Pandas specifically for categorical variables. It can help save memory and speed up certain operations when you work with a limited set of categories.\n", 23 | "\n", 24 | "**bool:** NumPy's boolean type; it does not support nulls.\n", 25 | "\n", 26 | "**boolean:** A boolean type introduced by pandas that can hold nulls.\n", 27 | "\n", 28 | "**datetime64:** NumPy's date-and-time type; missing values are represented as NaT." 29 | ] 30 | }, 31 | { 32 | "cell_type": "markdown", 33 | "metadata": {}, 34 | "source": [ 35 | "### Create Data" 36 | ] 37 | }, 38 | { 39 | "cell_type": "markdown", 40 | "metadata": {}, 41 | "source": [ 42 | "Create a Series" 43 | ] 44 | }, 45 | { 46 | "cell_type": "code", 47 | "execution_count": 1, 48 | "metadata": {}, 49 | "outputs": [ 50 | { 51 | "data": { 52 | "text/plain": [ 53 | "0 Pedro\n", 54 | "1 Maria\n", 55 | "2 Juan\n", 56 | "dtype: object" 57 | ] 58 | }, 59 | "execution_count": 1, 60 | "metadata": {}, 61 | "output_type": "execute_result" 62 | } 63 | ], 64 | "source": [ 65 | "import pandas as pd\n", 66 | "\n", 67 | "serie=pd.Series(['Pedro', 'Maria','Juan'])\n", 68 | "serie" 69 | ] 70 | }, 71 | { 72 | "cell_type": "markdown", 73 | "metadata": {}, 74 | "source": [ 75 | "Indexes the Series: 0, 1, 2\n", 76 | "\n", 77 | "Shows the values: Pedro...\n", 78 | "\n", 79 | "Reports the data type: object\n" 80 | ] 81 | }, 82 | { 83 | "cell_type": "code", 84 | "execution_count": 2, 85 | "metadata": {}, 86 | "outputs": [ 87 | { 88 | "data": { 89 | "text/plain": [ 90 | 
"RangeIndex(start=0, stop=3, step=1)" 91 | ] 92 | }, 93 | "execution_count": 2, 94 | "metadata": {}, 95 | "output_type": "execute_result" 96 | } 97 | ], 98 | "source": [ 99 | "serie.index" 100 | ] 101 | }, 102 | { 103 | "cell_type": "markdown", 104 | "metadata": {}, 105 | "source": [ 106 | "RangeIndex: que es un rango.\n", 107 | "\n", 108 | "Empieza en 0.\n", 109 | "\n", 110 | "Termina en 3, por lo tanto indice 2 (Recuerda que empieza en 0)\n", 111 | "\n", 112 | "Y va de 1 en 1 (step)" 113 | ] 114 | }, 115 | { 116 | "cell_type": "code", 117 | "execution_count": 3, 118 | "metadata": {}, 119 | "outputs": [ 120 | { 121 | "data": { 122 | "text/plain": [ 123 | "array(['Pedro', 'Maria', 'Juan'], dtype=object)" 124 | ] 125 | }, 126 | "execution_count": 3, 127 | "metadata": {}, 128 | "output_type": "execute_result" 129 | } 130 | ], 131 | "source": [ 132 | "serie.values" 133 | ] 134 | }, 135 | { 136 | "cell_type": "markdown", 137 | "metadata": {}, 138 | "source": [ 139 | "Lo devuelve como un objeto de Numpy\n", 140 | "\n", 141 | "Devuelve el tipo de objeto" 142 | ] 143 | }, 144 | { 145 | "cell_type": "markdown", 146 | "metadata": {}, 147 | "source": [ 148 | "### Definir uno mismo el Indice" 149 | ] 150 | }, 151 | { 152 | "cell_type": "markdown", 153 | "metadata": {}, 154 | "source": [ 155 | "Desde una **Lista Propia**" 156 | ] 157 | }, 158 | { 159 | "cell_type": "code", 160 | "execution_count": 4, 161 | "metadata": {}, 162 | "outputs": [ 163 | { 164 | "name": "stdout", 165 | "output_type": "stream", 166 | "text": [ 167 | "Esto es el indice: [20, 22, 24, 26, 28, 30]\n", 168 | "Estos son los valores: ['Pedro', 'María', 'Juan', 'Fran', 'Victor', 'Jose']\n" 169 | ] 170 | } 171 | ], 172 | "source": [ 173 | "indice = list(range(20,31,2))\n", 174 | "print('Esto es el indice: ', indice)\n", 175 | "\n", 176 | "valores = ['Pedro', 'María', 'Juan', 'Fran', 'Victor', 'Jose']\n", 177 | "print('Estos son los valores: ',valores)\n", 178 | "\n", 179 | "#Cómo los uno?\n", 180 | 
"indice_propio= pd.Series(valores, index=indice)" 181 | ] 182 | }, 183 | { 184 | "cell_type": "code", 185 | "execution_count": 5, 186 | "metadata": {}, 187 | "outputs": [ 188 | { 189 | "data": { 190 | "text/plain": [ 191 | "20 Pedro\n", 192 | "22 María\n", 193 | "24 Juan\n", 194 | "26 Fran\n", 195 | "28 Victor\n", 196 | "30 Jose\n", 197 | "dtype: object" 198 | ] 199 | }, 200 | "execution_count": 5, 201 | "metadata": {}, 202 | "output_type": "execute_result" 203 | } 204 | ], 205 | "source": [ 206 | "indice_propio" 207 | ] 208 | }, 209 | { 210 | "cell_type": "markdown", 211 | "metadata": {}, 212 | "source": [ 213 | "Desde un **Diccionario**\n", 214 | "\n", 215 | "Crea el diccionario de las listas anteriores" 216 | ] 217 | }, 218 | { 219 | "cell_type": "code", 220 | "execution_count": 6, 221 | "metadata": {}, 222 | "outputs": [ 223 | { 224 | "data": { 225 | "text/plain": [ 226 | "{20: 'Pedro', 22: 'María', 24: 'Juan', 26: 'Fran', 28: 'Victor', 30: 'Jose'}" 227 | ] 228 | }, 229 | "execution_count": 6, 230 | "metadata": {}, 231 | "output_type": "execute_result" 232 | } 233 | ], 234 | "source": [ 235 | "# Con el método ZIP.\n", 236 | "diccionario = dict(zip(indice, valores))\n", 237 | "diccionario" 238 | ] 239 | }, 240 | { 241 | "cell_type": "code", 242 | "execution_count": 7, 243 | "metadata": {}, 244 | "outputs": [ 245 | { 246 | "name": "stdout", 247 | "output_type": "stream", 248 | "text": [ 249 | "{20: 'Pedro', 22: 'María', 24: 'Juan', 26: 'Fran', 28: 'Victor', 30: 'Jose'}\n" 250 | ] 251 | } 252 | ], 253 | "source": [ 254 | "# Otro ejemplo de como crearlo con un list comprehension, como recordatorio.\n", 255 | "diccionario_comprehension = {indice[i]: valores[i] for i in range(len(indice))}\n", 256 | "print(diccionario_comprehension)" 257 | ] 258 | }, 259 | { 260 | "cell_type": "code", 261 | "execution_count": 8, 262 | "metadata": {}, 263 | "outputs": [ 264 | { 265 | "data": { 266 | "text/plain": [ 267 | "20 Pedro\n", 268 | "22 María\n", 269 | "24 Juan\n", 270 | "26 
Fran\n", 271 | "28 Victor\n", 272 | "30 Jose\n", 273 | "dtype: object" 274 | ] 275 | }, 276 | "execution_count": 8, 277 | "metadata": {}, 278 | "output_type": "execute_result" 279 | } 280 | ], 281 | "source": [ 282 | "serie_dict = pd.Series(diccionario)\n", 283 | "serie_dict" 284 | ] 285 | }, 286 | { 287 | "cell_type": "markdown", 288 | "metadata": {}, 289 | "source": [ 290 | "From a **NumPy array**" 291 | ] 292 | }, 293 | { 294 | "cell_type": "code", 295 | "execution_count": 9, 296 | "metadata": {}, 297 | "outputs": [ 298 | { 299 | "data": { 300 | "text/plain": [ 301 | "0 Pedro\n", 302 | "1 María\n", 303 | "2 Juan\n", 304 | "3 Fran\n", 305 | "4 Victor\n", 306 | "5 Jose\n", 307 | "dtype: object" 308 | ] 309 | }, 310 | "execution_count": 9, 311 | "metadata": {}, 312 | "output_type": "execute_result" 313 | } 314 | ], 315 | "source": [ 316 | "import numpy as np\n", 317 | "np_array = np.array(valores)\n", 318 | "array_de_numpy = pd.Series(np_array)\n", 319 | "array_de_numpy" 320 | ] 321 | }, 322 | { 323 | "cell_type": "markdown", 324 | "metadata": {}, 325 | "source": [ 326 | "Now the reverse, from **Pandas** to **NumPy**" 327 | ] 328 | }, 329 | { 330 | "cell_type": "code", 331 | "execution_count": 10, 332 | "metadata": {}, 333 | "outputs": [ 334 | { 335 | "name": "stdout", 336 | "output_type": "stream", 337 | "text": [ 338 | "['Pedro' 'María' 'Juan' 'Fran' 'Victor' 'Jose']\n", 339 | "<class 'numpy.ndarray'>\n", 340 | "<class 'pandas.core.series.Series'>\n" 341 | ] 342 | } 343 | ], 344 | "source": [ 345 | "a_numpy = array_de_numpy.to_numpy()\n", 346 | "print(a_numpy)\n", 347 | "print(type(a_numpy))\n", 348 | "print(type(array_de_numpy))" 349 | ] 350 | }, 351 | { 352 | "cell_type": "markdown", 353 | "metadata": {}, 354 | "source": [ 355 | "### Create a DataFrame from a Series" 356 | ] 357 | }, 358 | { 359 | "cell_type": "code", 360 | "execution_count": 11, 361 | "metadata": {}, 362 | "outputs": [ 363 | { 364 | "data": { 365 | "text/html": [ 366 | "
\n", 367 | "\n", 380 | "\n", 381 | " \n", 382 | " \n", 383 | " \n", 384 | " \n", 385 | " \n", 386 | " \n", 387 | " \n", 388 | " \n", 389 | " \n", 390 | " \n", 391 | " \n", 392 | " \n", 393 | " \n", 394 | " \n", 395 | " \n", 396 | " \n", 397 | " \n", 398 | " \n", 399 | " \n", 400 | " \n", 401 | " \n", 402 | " \n", 403 | " \n", 404 | " \n", 405 | " \n", 406 | " \n", 407 | " \n", 408 | " \n", 409 | " \n", 410 | " \n", 411 | " \n", 412 | " \n", 413 | "
| | 0 |
|---|---|
| 0 | Pedro |
| 1 | María |
| 2 | Juan |
| 3 | Fran |
| 4 | Victor |
| 5 | Jose |
\n", 414 | "
" 415 | ], 416 | "text/plain": [ 417 | " 0\n", 418 | "0 Pedro\n", 419 | "1 María\n", 420 | "2 Juan\n", 421 | "3 Fran\n", 422 | "4 Victor\n", 423 | "5 Jose" 424 | ] 425 | }, 426 | "execution_count": 11, 427 | "metadata": {}, 428 | "output_type": "execute_result" 429 | } 430 | ], 431 | "source": [ 432 | "array_de_numpy.to_frame()" 433 | ] 434 | }, 435 | { 436 | "cell_type": "markdown", 437 | "metadata": {}, 438 | "source": [ 439 | "### Atributos de una Serie" 440 | ] 441 | }, 442 | { 443 | "cell_type": "code", 444 | "execution_count": 12, 445 | "metadata": {}, 446 | "outputs": [ 447 | { 448 | "data": { 449 | "text/plain": [ 450 | "20 Pedro\n", 451 | "22 María\n", 452 | "24 Juan\n", 453 | "26 Fran\n", 454 | "28 Victor\n", 455 | "30 Jose\n", 456 | "dtype: object" 457 | ] 458 | }, 459 | "execution_count": 12, 460 | "metadata": {}, 461 | "output_type": "execute_result" 462 | } 463 | ], 464 | "source": [ 465 | "indice_propio" 466 | ] 467 | }, 468 | { 469 | "cell_type": "code", 470 | "execution_count": 13, 471 | "metadata": {}, 472 | "outputs": [ 473 | { 474 | "data": { 475 | "text/plain": [ 476 | "dtype('O')" 477 | ] 478 | }, 479 | "execution_count": 13, 480 | "metadata": {}, 481 | "output_type": "execute_result" 482 | } 483 | ], 484 | "source": [ 485 | "# Devuelve el tipo de datos de los elementos en la Serie\n", 486 | "indice_propio.dtype" 487 | ] 488 | }, 489 | { 490 | "cell_type": "code", 491 | "execution_count": 14, 492 | "metadata": {}, 493 | "outputs": [ 494 | { 495 | "data": { 496 | "text/plain": [ 497 | "6" 498 | ] 499 | }, 500 | "execution_count": 14, 501 | "metadata": {}, 502 | "output_type": "execute_result" 503 | } 504 | ], 505 | "source": [ 506 | "# Devuelve el número total de elementos de la Serie.\n", 507 | "indice_propio.size" 508 | ] 509 | }, 510 | { 511 | "cell_type": "code", 512 | "execution_count": 15, 513 | "metadata": {}, 514 | "outputs": [ 515 | { 516 | "data": { 517 | "text/plain": [ 518 | "(6,)" 519 | ] 520 | }, 521 | "execution_count": 15, 522 | 
"metadata": {}, 523 | "output_type": "execute_result" 524 | } 525 | ], 526 | "source": [ 527 | "# Proporciona la forma (número de filas y columnas) de la Serie como una tupla.\n", 528 | "indice_propio.shape" 529 | ] 530 | }, 531 | { 532 | "cell_type": "code", 533 | "execution_count": 16, 534 | "metadata": {}, 535 | "outputs": [ 536 | { 537 | "name": "stdout", 538 | "output_type": "stream", 539 | "text": [ 540 | "RangeIndex(start=0, stop=3, step=1)\n", 541 | "0 Pedro\n", 542 | "1 Maria\n", 543 | "2 Juan\n", 544 | "dtype: object\n" 545 | ] 546 | } 547 | ], 548 | "source": [ 549 | "# Muestra la información sobre el índice de la Serie, incluyendo sus etiquetas y tipo de índice.\n", 550 | "print(serie.index)\n", 551 | "print(serie)" 552 | ] 553 | }, 554 | { 555 | "cell_type": "markdown", 556 | "metadata": {}, 557 | "source": [ 558 | "# Crear un DataFrame" 559 | ] 560 | }, 561 | { 562 | "cell_type": "markdown", 563 | "metadata": {}, 564 | "source": [ 565 | "desde un **Diccionario**" 566 | ] 567 | }, 568 | { 569 | "cell_type": "code", 570 | "execution_count": 18, 571 | "metadata": {}, 572 | "outputs": [ 573 | { 574 | "name": "stdout", 575 | "output_type": "stream", 576 | "text": [ 577 | "Este es el diccionario: \n", 578 | "\n", 579 | " nombre [Pedro, Maria, Juan]\n", 580 | "edad [30, 40, 50]\n", 581 | "dtype: object\n", 582 | "\n", 583 | "\n", 584 | "Este es el diccionario en DataFrame:\n", 585 | "\n", 586 | " nombre edad\n", 587 | "10 Pedro 30\n", 588 | "20 Maria 40\n", 589 | "30 Juan 50\n" 590 | ] 591 | } 592 | ], 593 | "source": [ 594 | "dicc1= {'nombre': ['Pedro', 'Maria', 'Juan'], 'edad': [30, 40, 50]}\n", 595 | "print('Este es el diccionario: \\n\\n', pd.Series(dicc1))\n", 596 | "\n", 597 | "df_dicc= pd.DataFrame(dicc1,['10','20','30']) #No hace falta ponerle indice si no quiero\n", 598 | "print('\\n\\nEste es el diccionario en DataFrame:\\n\\n', df_dicc)" 599 | ] 600 | }, 601 | { 602 | "cell_type": "code", 603 | "execution_count": 19, 604 | "metadata": {}, 605 | 
"outputs": [ 606 | { 607 | "data": { 608 | "text/html": [ 609 | "
\n", 610 | "\n", 623 | "\n", 624 | " \n", 625 | " \n", 626 | " \n", 627 | " \n", 628 | " \n", 629 | " \n", 630 | " \n", 631 | " \n", 632 | " \n", 633 | " \n", 634 | " \n", 635 | " \n", 636 | " \n", 637 | " \n", 638 | " \n", 639 | " \n", 640 | " \n", 641 | " \n", 642 | " \n", 643 | " \n", 644 | " \n", 645 | " \n", 646 | " \n", 647 | " \n", 648 | "
| | nombre | edad |
|---|---|---|
| 10 | Pedro | 30 |
| 20 | Maria | 40 |
| 30 | Juan | 50 |
\n", 649 | "
" 650 | ], 651 | "text/plain": [ 652 | " nombre edad\n", 653 | "10 Pedro 30\n", 654 | "20 Maria 40\n", 655 | "30 Juan 50" 656 | ] 657 | }, 658 | "execution_count": 19, 659 | "metadata": {}, 660 | "output_type": "execute_result" 661 | } 662 | ], 663 | "source": [ 664 | "df_dicc" 665 | ] 666 | }, 667 | { 668 | "cell_type": "markdown", 669 | "metadata": {}, 670 | "source": [ 671 | "Desde **Listas** de **Listas**" 672 | ] 673 | }, 674 | { 675 | "cell_type": "code", 676 | "execution_count": 20, 677 | "metadata": {}, 678 | "outputs": [ 679 | { 680 | "data": { 681 | "text/html": [ 682 | "
\n", 683 | "\n", 696 | "\n", 697 | " \n", 698 | " \n", 699 | " \n", 700 | " \n", 701 | " \n", 702 | " \n", 703 | " \n", 704 | " \n", 705 | " \n", 706 | " \n", 707 | " \n", 708 | " \n", 709 | " \n", 710 | " \n", 711 | " \n", 712 | " \n", 713 | " \n", 714 | " \n", 715 | " \n", 716 | " \n", 717 | " \n", 718 | " \n", 719 | " \n", 720 | " \n", 721 | "
| | nombre | edad |
|---|---|---|
| 10 | Pedro | 30 |
| 20 | Maria | 40 |
| 30 | Juan | 50 |
\n", 722 | "
" 723 | ], 724 | "text/plain": [ 725 | " nombre edad\n", 726 | "10 Pedro 30\n", 727 | "20 Maria 40\n", 728 | "30 Juan 50" 729 | ] 730 | }, 731 | "execution_count": 20, 732 | "metadata": {}, 733 | "output_type": "execute_result" 734 | } 735 | ], 736 | "source": [ 737 | "datos_df = [['Pedro', 30],['Maria',40],['Juan',50]]\n", 738 | "columnas_df = ['nombre', 'edad']\n", 739 | "indice_df = [10,20,30]\n", 740 | "\n", 741 | "lists_df= pd.DataFrame(data=datos_df, columns=columnas_df, index=indice_df)\n", 742 | "lists_df" 743 | ] 744 | }, 745 | { 746 | "cell_type": "markdown", 747 | "metadata": {}, 748 | "source": [ 749 | "De una **Array 2D**" 750 | ] 751 | }, 752 | { 753 | "cell_type": "code", 754 | "execution_count": 21, 755 | "metadata": {}, 756 | "outputs": [ 757 | { 758 | "data": { 759 | "text/html": [ 760 | "
\n", 761 | "\n", 774 | "\n", 775 | " \n", 776 | " \n", 777 | " \n", 778 | " \n", 779 | " \n", 780 | " \n", 781 | " \n", 782 | " \n", 783 | " \n", 784 | " \n", 785 | " \n", 786 | " \n", 787 | " \n", 788 | " \n", 789 | " \n", 790 | " \n", 791 | " \n", 792 | " \n", 793 | " \n", 794 | " \n", 795 | " \n", 796 | " \n", 797 | "
| | c1 | c2 | c3 |
|---|---|---|---|
| f1 | Pedro | Maria | Juam |
| f2 | 30 | 40 | 50 |
\n", 798 | "
" 799 | ], 800 | "text/plain": [ 801 | " c1 c2 c3\n", 802 | "f1 Pedro Maria Juam\n", 803 | "f2 30 40 50" 804 | ] 805 | }, 806 | "execution_count": 21, 807 | "metadata": {}, 808 | "output_type": "execute_result" 809 | } 810 | ], 811 | "source": [ 812 | "array_2d= np.array([['Pedro', 'Maria', 'Juam'], [30,40,50]])\n", 813 | "array_data= pd.DataFrame(array_2d, index= ['f1','f2'], columns=['c1','c2','c3'])\n", 814 | "array_data" 815 | ] 816 | }, 817 | { 818 | "cell_type": "code", 819 | "execution_count": 22, 820 | "metadata": {}, 821 | "outputs": [ 822 | { 823 | "data": { 824 | "text/plain": [ 825 | "array([['Pedro', 'Maria', 'Juam'],\n", 826 | " ['30', '40', '50']], dtype='\n", 875 | "\n", 888 | "\n", 889 | " \n", 890 | " \n", 891 | " \n", 892 | " \n", 893 | " \n", 894 | " \n", 895 | " \n", 896 | " \n", 897 | " \n", 898 | " \n", 899 | " \n", 900 | " \n", 901 | " \n", 902 | " \n", 903 | " \n", 904 | " \n", 905 | " \n", 906 | " \n", 907 | " \n", 908 | " \n", 909 | " \n", 910 | " \n", 911 | "
\n", 912 | "" 913 | ], 914 | "text/plain": [ 915 | " c1 c2 c3\n", 916 | "f1 Pedro Maria Juam\n", 917 | "f2 30 40 50" 918 | ] 919 | }, 920 | "execution_count": 24, 921 | "metadata": {}, 922 | "output_type": "execute_result" 923 | } 924 | ], 925 | "source": [ 926 | "array_data" 927 | ] 928 | }, 929 | { 930 | "cell_type": "code", 931 | "execution_count": 25, 932 | "metadata": {}, 933 | "outputs": [ 934 | { 935 | "data": { 936 | "text/plain": [ 937 | "f1 Pedro\n", 938 | "f2 30\n", 939 | "Name: c1, dtype: object" 940 | ] 941 | }, 942 | "execution_count": 25, 943 | "metadata": {}, 944 | "output_type": "execute_result" 945 | } 946 | ], 947 | "source": [ 948 | "# Attribute access returns a Series\n", 949 | "array_data.c1" 950 | ] 951 | }, 952 | { 953 | "cell_type": "code", 954 | "execution_count": 26, 955 | "metadata": {}, 956 | "outputs": [ 957 | { 958 | "data": { 959 | "text/plain": [ 960 | "pandas.core.series.Series" 961 | ] 962 | }, 963 | "execution_count": 26, 964 | "metadata": {}, 965 | "output_type": "execute_result" 966 | } 967 | ], 968 | "source": [ 969 | "type(array_data.c1)" 970 | ] 971 | }, 972 | { 973 | "cell_type": "code", 974 | "execution_count": 27, 975 | "metadata": {}, 976 | "outputs": [ 977 | { 978 | "data": { 979 | "text/plain": [ 980 | "f1 Pedro\n", 981 | "f2 30\n", 982 | "Name: c1, dtype: object" 983 | ] 984 | }, 985 | "execution_count": 27, 986 | "metadata": {}, 987 | "output_type": "execute_result" 988 | } 989 | ], 990 | "source": [ 991 | "# Single brackets also return a Series\n", 992 | "array_data['c1']" 993 | ] 994 | }, 995 | { 996 | "cell_type": "code", 997 | "execution_count": 28, 998 | "metadata": {}, 999 | "outputs": [ 1000 | { 1001 | "data": { 1002 | "text/plain": [ 1003 | "pandas.core.series.Series" 1004 | ] 1005 | }, 1006 | "execution_count": 28, 1007 | "metadata": {}, 1008 | "output_type": "execute_result" 1009 | } 1010 | ], 1011 | "source": [ 1012 | "type(array_data['c1'])" 1013 | ] 1014 | }, 1015 | { 1016 | "cell_type": "code", 1017 | "execution_count": 29, 1018 | 
"metadata": {}, 1019 | "outputs": [ 1020 | { 1021 | "data": { 1022 | "text/plain": [ 1023 | "pandas.core.frame.DataFrame" 1024 | ] 1025 | }, 1026 | "execution_count": 29, 1027 | "metadata": {}, 1028 | "output_type": "execute_result" 1029 | } 1030 | ], 1031 | "source": [ 1032 | "# Double brackets return a DataFrame\n", 1033 | "type(array_data[['c1']])" 1034 | ] 1035 | }, 1036 | { 1037 | "cell_type": "markdown", 1038 | "metadata": {}, 1039 | "source": [] 1040 | } 1041 | ], 1042 | "metadata": { 1043 | "kernelspec": { 1044 | "display_name": "Python 3", 1045 | "language": "python", 1046 | "name": "python3" 1047 | }, 1048 | "language_info": { 1049 | "codemirror_mode": { 1050 | "name": "ipython", 1051 | "version": 3 1052 | }, 1053 | "file_extension": ".py", 1054 | "mimetype": "text/x-python", 1055 | "name": "python", 1056 | "nbconvert_exporter": "python", 1057 | "pygments_lexer": "ipython3", 1058 | "version": "3.9.13" 1059 | } 1060 | }, 1061 | "nbformat": 4, 1062 | "nbformat_minor": 2 1063 | } 1064 | -------------------------------------------------------------------------------- /5_InfoDataSet.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "## Previewing a File" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "Sometimes you need to work with a data file without knowing how it is structured. You may not know whether the columns have names, which separator is used, or even whether the data starts on the first line.\n", 15 | "\n", 16 | "In that case, plain Python is enough to take a quick look at the file and pick up some clues. The process is:\n", 17 | "\n", 18 | "1. Open the file with `open`.\n", 19 | "2. Read the file's lines with `readlines`. 
This returns a list with the lines of the file, and you can slice it to keep only the ones you need.\n", 20 | "3. Close the file when you are done. It is good practice not to leave files open in your program.\n", 21 | "\n", 22 | "This preview of the first lines helps you understand how the file is structured. For example, you may find that the data does not start on the first row, that there are no column names, or that the separator is a tab or some other character. That information lets you set the right parameters in the import tools you use later." 23 | ] 24 | }, 25 | { 26 | "cell_type": "code", 27 | "execution_count": 1, 28 | "metadata": {}, 29 | "outputs": [ 30 | { 31 | "name": "stdout", 32 | "output_type": "stream", 33 | "text": [ 34 | "['\\ufeffNombre Cliente;Tipo Producto;Precio (€/kg);Cantidad Vendida (kg);Fecha Venta;Provincia Cliente;id\\n', 'Mas y Mas;Almendras;4.54;1212;29/3/23;Cádiz;389932\\n', 'Mercadona;Nueces;5.84;591;5/12/23;Sevilla;389933\\n', 'Alcampo;Cacahuetes;2.69;2831;26/7/23;Madrid;389934\\n', 'Mas y Mas;Pistachos;6.7;516;7/10/23;Alicante;389935\\n', 'Lidl;Nueces;6.14;1283;12/5/23;Barcelona;389936\\n', 'Carrefour;Almendras;1.92;2741;24/3/23;Cádiz;389937\\n', 'Lidl;Pistachos;3.88;2898;8/3/23;Sevilla;389938\\n', 'Alcampo;Almendras;3.76;2464;1/3/23;Madrid;389939\\n', 'Dia;Almendras;6.46;641;26/9/23;Alicante;389940\\n']\n" 35 | ] 36 | } 37 | ], 38 | "source": [ 39 | "previsualizacion = open('/Users/jimenacambronero/Desktop/Proyectos para Portfolio/PythonPracticas/Datos/Frutos_Secos.csv','r')\n", 40 | "print(previsualizacion.readlines()[0:10])\n", 41 | "previsualizacion.close() # close the file so the handle does not stay open" 42 | ] 43 | }, 44 | { 45 | "cell_type": "markdown", 46 | "metadata": {}, 47 | "source": [ 48 | "## Data Quality" 49 | ] 50 | }, 51 | { 52 | "cell_type": "markdown", 53 | 
"metadata": {}, 54 | "source": [ 55 | "The data quality phase is crucial in a Data Science project because it directly affects the reliability and accuracy of the results you get when analyzing and modeling data:\n", 56 | "\n", 57 | "1. Accurate decision-making: Low-quality data can lead to wrong conclusions. If your data is unreliable, decisions based on it can be mistaken and costly.\n", 58 | "\n", 59 | "2. Effective modeling: Data Science models depend on quality data to work properly. Flawed data can produce inaccurate models, which reduces the effectiveness of your analyses and predictions.\n", 60 | "\n", 61 | "3. Efficiency and time savings: Improving data quality from the start saves time on cleaning and fixing data later in the project, so your analysis tasks run more efficiently.\n" 62 | ] 63 | }, 64 | { 65 | "cell_type": "code", 66 | "execution_count": 4, 67 | "metadata": {}, 68 | "outputs": [ 69 | { 70 | "data": { 71 | "text/html": [ 72 | "
" 129 | ], 130 | "text/plain": [ 131 | " Nombre Cliente Tipo Producto Precio (€/kg) Cantidad Vendida (kg) \\\n", 132 | "id \n", 133 | "389932 Mas y Mas Almendras 4.54 1212 \n", 134 | "389933 Mercadona Nueces 5.84 591 \n", 135 | "\n", 136 | " Fecha Venta Provincia Cliente \n", 137 | "id \n", 138 | "389932 29/3/23 Cádiz \n", 139 | "389933 5/12/23 Sevilla " 140 | ] 141 | }, 142 | "execution_count": 4, 143 | "metadata": {}, 144 | "output_type": "execute_result" 145 | } 146 | ], 147 | "source": [ 148 | "import pandas as pd\n", 149 | "\n", 150 | "df = pd.read_csv('/Users/jimenacambronero/Desktop/Proyectos para Portfolio/PythonPracticas/Datos/Frutos_Secos.csv',sep= ';', index_col='id')\n", 151 | "df.head(2)" 152 | ] 153 | }, 154 | { 155 | "cell_type": "markdown", 156 | "metadata": {}, 157 | "source": [ 158 | "## Sampling" 159 | ] 160 | }, 161 | { 162 | "cell_type": "markdown", 163 | "metadata": {}, 164 | "source": [ 165 | "The `sample()` method in pandas picks a random part of your data. Picture a box full of data: with `sample()` you can draw some rows from that box according to your needs.\n", 166 | "\n", 167 | "The most important parameters are:\n", 168 | "\n", 169 | "1. **n**: How many rows to draw in total. With `n=10`, you draw 10 rows at random.\n", 170 | "\n", 171 | "2. **frac**: Instead of a count, the fraction of the total to draw. With `frac=0.2`, you take 20% of the rows. Use either `n` or `frac`, not both.\n", 172 | "\n", 173 | "3. **replace**: Whether drawn rows are put back in the box. With `False` (the default), a row can be drawn only once. With `True`, the same row can be drawn more than once.\n", 174 | "\n", 175 | "4. **random_state**: A seed that guarantees that the same number always yields the same random sample. 
This makes your work reproducible, so others can get the same results as you." 176 | ] 177 | }, 178 | { 179 | "cell_type": "code", 180 | "execution_count": 6, 181 | "metadata": {}, 182 | "outputs": [ 183 | { 184 | "data": { 185 | "text/plain": [ 186 | "(300, 6)" 187 | ] 188 | }, 189 | "execution_count": 6, 190 | "metadata": {}, 191 | "output_type": "execute_result" 192 | } 193 | ], 194 | "source": [ 195 | "# Returns the number of rows and the number of columns\n", 196 | "df.shape" 197 | ] 198 | }, 199 | { 200 | "cell_type": "code", 201 | "execution_count": 8, 202 | "metadata": {}, 203 | "outputs": [ 204 | { 205 | "data": { 206 | "text/plain": [ 207 | "(100, 6)" 208 | ] 209 | }, 210 | "execution_count": 8, 211 | "metadata": {}, 212 | "output_type": "execute_result" 213 | } 214 | ], 215 | "source": [ 216 | "muestra = df.sample(n=100)\n", 217 | "muestra.shape" 218 | ] 219 | }, 220 | { 221 | "cell_type": "markdown", 222 | "metadata": {}, 223 | "source": [ 224 | "## Overview of Our Dataset" 225 | ] 226 | }, 227 | { 228 | "cell_type": "markdown", 229 | "metadata": {}, 230 | "source": [ 231 | "### General Dataset Information" 232 | ] 233 | }, 234 | { 235 | "cell_type": "code", 236 | "execution_count": 10, 237 | "metadata": {}, 238 | "outputs": [ 239 | { 240 | "name": "stdout", 241 | "output_type": "stream", 242 | "text": [ 243 | "\n", 244 | "Int64Index: 300 entries, 389932 to 390231\n", 245 | "Data columns (total 6 columns):\n", 246 | " # Column Non-Null Count Dtype \n", 247 | "--- ------ -------------- ----- \n", 248 | " 0 Nombre Cliente 300 non-null object \n", 249 | " 1 Tipo Producto 300 non-null object \n", 250 | " 2 Precio (€/kg) 300 non-null float64\n", 251 | " 3 Cantidad Vendida (kg) 300 non-null int64 \n", 252 | " 4 Fecha Venta 300 non-null object \n", 253 | " 5 Provincia Cliente 300 non-null object \n", 254 | "dtypes: float64(1), int64(1), object(4)\n", 255 | "memory usage: 16.4+ KB\n" 256 | ] 257 | } 
258 | ], 259 | "source": [ 260 | "df.info()" 261 | ] 262 | }, 263 | { 264 | "cell_type": "code", 265 | "execution_count": 9, 266 | "metadata": {}, 267 | "outputs": [ 268 | { 269 | "name": "stdout", 270 | "output_type": "stream", 271 | "text": [ 272 | "\n", 273 | "Int64Index: 300 entries, 389932 to 390231\n", 274 | "Data columns (total 6 columns):\n", 275 | " # Column Non-Null Count Dtype \n", 276 | "--- ------ -------------- ----- \n", 277 | " 0 Nombre Cliente 300 non-null object \n", 278 | " 1 Tipo Producto 300 non-null object \n", 279 | " 2 Precio (€/kg) 300 non-null float64\n", 280 | " 3 Cantidad Vendida (kg) 300 non-null int64 \n", 281 | " 4 Fecha Venta 300 non-null object \n", 282 | " 5 Provincia Cliente 300 non-null object \n", 283 | "dtypes: float64(1), int64(1), object(4)\n", 284 | "memory usage: 83.6 KB\n" 285 | ] 286 | } 287 | ], 288 | "source": [ 289 | "# memory_usage='deep' reports the real memory footprint, including the contents of object columns\n", 290 | "# If it is too large, work with samples or change dtypes; object usually takes more memory than category for repeated values\n", 291 | "df.info(memory_usage='deep')" 292 | ] 293 | }, 294 | { 295 | "cell_type": "markdown", 296 | "metadata": {}, 297 | "source": [ 298 | "### Index" 299 | ] 300 | }, 301 | { 302 | "cell_type": "code", 303 | "execution_count": 11, 304 | "metadata": {}, 305 | "outputs": [ 306 | { 307 | "data": { 308 | "text/plain": [ 309 | "Int64Index([389932, 389933, 389934, 389935, 389936, 389937, 389938, 389939,\n", 310 | " 389940, 389941,\n", 311 | " ...\n", 312 | " 390222, 390223, 390224, 390225, 390226, 390227, 390228, 390229,\n", 313 | " 390230, 390231],\n", 314 | " dtype='int64', name='id', length=300)" 315 | ] 316 | }, 317 | "execution_count": 11, 318 | "metadata": {}, 319 | "output_type": "execute_result" 320 | } 321 | ], 322 | "source": [ 323 | "df.index" 324 | ] 325 | }, 326 | { 327 | "cell_type": "markdown", 328 | "metadata": {}, 329 | "source": [ 330 | "### Columns" 331 | ] 332 | }, 333 | { 334 | "cell_type": "code", 335 
| "execution_count": 12, 336 | "metadata": {}, 337 | "outputs": [ 338 | { 339 | "data": { 340 | "text/plain": [ 341 | "Index(['Nombre Cliente', 'Tipo Producto', 'Precio (€/kg)',\n", 342 | " 'Cantidad Vendida (kg)', 'Fecha Venta', 'Provincia Cliente'],\n", 343 | " dtype='object')" 344 | ] 345 | }, 346 | "execution_count": 12, 347 | "metadata": {}, 348 | "output_type": "execute_result" 349 | } 350 | ], 351 | "source": [ 352 | "# Show the column names\n", 353 | "df.columns" 354 | ] 355 | }, 356 | { 357 | "cell_type": "markdown", 358 | "metadata": {}, 359 | "source": [ 360 | "### Variable Types" 361 | ] 362 | }, 363 | { 364 | "cell_type": "code", 365 | "execution_count": 13, 366 | "metadata": {}, 367 | "outputs": [ 368 | { 369 | "data": { 370 | "text/plain": [ 371 | "Nombre Cliente object\n", 372 | "Tipo Producto object\n", 373 | "Precio (€/kg) float64\n", 374 | "Cantidad Vendida (kg) int64\n", 375 | "Fecha Venta object\n", 376 | "Provincia Cliente object\n", 377 | "dtype: object" 378 | ] 379 | }, 380 | "execution_count": 13, 381 | "metadata": {}, 382 | "output_type": "execute_result" 383 | } 384 | ], 385 | "source": [ 386 | "# Check the data type of each column\n", 387 | "df.dtypes" 388 | ] 389 | }, 390 | { 391 | "cell_type": "code", 392 | "execution_count": 15, 393 | "metadata": {}, 394 | "outputs": [ 395 | { 396 | "data": { 397 | "text/plain": [ 398 | "dtype('float64')" 399 | ] 400 | }, 401 | "execution_count": 15, 402 | "metadata": {}, 403 | "output_type": "execute_result" 404 | } 405 | ], 406 | "source": [ 407 | "# Check the dtype of a single column by indexing it\n", 408 | "df['Precio (€/kg)'].dtypes" 409 | ] 410 | }, 411 | { 412 | "cell_type": "markdown", 413 | "metadata": {}, 414 | "source": [ 415 | "### Summary Statistics" 416 | ] 417 | }, 418 | { 419 | "cell_type": "code", 420 | "execution_count": 16, 421 | "metadata": {}, 422 | "outputs": [ 423 | { 424 | "data": { 425 | "text/html": [ 426 | "
" 492 | ], 493 | "text/plain": [ 494 | " Precio (€/kg) Cantidad Vendida (kg)\n", 495 | "count 300.000000 300.000000\n", 496 | "mean 5.598333 1616.950000\n", 497 | "std 2.673505 842.463231\n", 498 | "min 0.970000 100.000000\n", 499 | "25% 3.525000 976.250000\n", 500 | "50% 5.475000 1668.000000\n", 501 | "75% 7.622500 2345.000000\n", 502 | "max 11.570000 2986.000000" 503 | ] 504 | }, 505 | "execution_count": 16, 506 | "metadata": {}, 507 | "output_type": "execute_result" 508 | } 509 | ], 510 | "source": [ 511 | "# By default, describe() summarizes only the numeric columns\n", 512 | "df.describe()" 513 | ] 514 | }, 515 | { 516 | "cell_type": "code", 517 | "execution_count": 17, 518 | "metadata": {}, 519 | "outputs": [ 520 | { 521 | "data": { 522 | "text/html": [ 523 | "
" 652 | ], 653 | "text/plain": [ 654 | " Nombre Cliente Tipo Producto Precio (€/kg) Cantidad Vendida (kg) \\\n", 655 | "count 300 300 300.000000 300.000000 \n", 656 | "unique 11 4 NaN NaN \n", 657 | "top Mercadona Cacahuetes NaN NaN \n", 658 | "freq 33 100 NaN NaN \n", 659 | "mean NaN NaN 5.598333 1616.950000 \n", 660 | "std NaN NaN 2.673505 842.463231 \n", 661 | "min NaN NaN 0.970000 100.000000 \n", 662 | "25% NaN NaN 3.525000 976.250000 \n", 663 | "50% NaN NaN 5.475000 1668.000000 \n", 664 | "75% NaN NaN 7.622500 2345.000000 \n", 665 | "max NaN NaN 11.570000 2986.000000 \n", 666 | "\n", 667 | " Fecha Venta Provincia Cliente \n", 668 | "count 300 300 \n", 669 | "unique 204 6 \n", 670 | "top 6/12/23 Barcelona \n", 671 | "freq 5 57 \n", 672 | "mean NaN NaN \n", 673 | "std NaN NaN \n", 674 | "min NaN NaN \n", 675 | "25% NaN NaN \n", 676 | "50% NaN NaN \n", 677 | "75% NaN NaN \n", 678 | "max NaN NaN " 679 | ] 680 | }, 681 | "execution_count": 17, 682 | "metadata": {}, 683 | "output_type": "execute_result" 684 | } 685 | ], 686 | "source": [ 687 | "# include='all' summarizes every column, numeric or not. Cacahuetes is the most frequent value, but that does not mean it is the best-selling product\n", 688 | "df.describe(include='all')" 689 | ] 690 | }, 691 | { 692 | "cell_type": "code", 693 | "execution_count": 20, 694 | "metadata": {}, 695 | "outputs": [ 696 | { 697 | "data": { 698 | "text/html": [ 699 | "
" 755 | ], 756 | "text/plain": [ 757 | " Nombre Cliente Tipo Producto Fecha Venta Provincia Cliente\n", 758 | "count 300 300 300 300\n", 759 | "unique 11 4 204 6\n", 760 | "top Mercadona Cacahuetes 6/12/23 Barcelona\n", 761 | "freq 33 100 5 57" 762 | ] 763 | }, 764 | "execution_count": 20, 765 | "metadata": {}, 766 | "output_type": "execute_result" 767 | } 768 | ], 769 | "source": [ 770 | "# Only object columns ('O' is the letter O, short for object, not a zero)\n", 771 | "df.describe(include ='O')" 772 | ] 773 | }, 774 | { 775 | "cell_type": "code", 776 | "execution_count": 22, 777 | "metadata": {}, 778 | "outputs": [ 779 | { 780 | "data": { 781 | "text/html": [ 782 | "
" 838 | ], 839 | "text/plain": [ 840 | " count unique top freq\n", 841 | "Nombre Cliente 300 11 Mercadona 33\n", 842 | "Tipo Producto 300 4 Cacahuetes 100\n", 843 | "Fecha Venta 300 204 6/12/23 5\n", 844 | "Provincia Cliente 300 6 Barcelona 57" 845 | ] 846 | }, 847 | "execution_count": 22, 848 | "metadata": {}, 849 | "output_type": "execute_result" 850 | } 851 | ], 852 | "source": [ 853 | "# View the transposed summary with .T\n", 854 | "df.describe(include='O').T" 855 | ] 856 | }, 857 | { 858 | "cell_type": "code", 859 | "execution_count": 23, 860 | "metadata": {}, 861 | "outputs": [ 862 | { 863 | "data": { 864 | "text/html": [ 865 | "
" 919 | ], 920 | "text/plain": [ 921 | " count mean std min 25% \\\n", 922 | "Precio (€/kg) 300.0 5.598333 2.673505 0.97 3.525 \n", 923 | "Cantidad Vendida (kg) 300.0 1616.950000 842.463231 100.00 976.250 \n", 924 | "\n", 925 | " 50% 75% max \n", 926 | "Precio (€/kg) 5.475 7.6225 11.57 \n", 927 | "Cantidad Vendida (kg) 1668.000 2345.0000 2986.00 " 928 | ] 929 | }, 930 | "execution_count": 23, 931 | "metadata": {}, 932 | "output_type": "execute_result" 933 | } 934 | ], 935 | "source": [ 936 | "df.describe().T" 937 | ] 938 | } 939 | ], 940 | "metadata": { 941 | "kernelspec": { 942 | "display_name": "Python 3", 943 | "language": "python", 944 | "name": "python3" 945 | }, 946 | "language_info": { 947 | "codemirror_mode": { 948 | "name": "ipython", 949 | "version": 3 950 | }, 951 | "file_extension": ".py", 952 | "mimetype": "text/x-python", 953 | "name": "python", 954 | "nbconvert_exporter": "python", 955 | "pygments_lexer": "ipython3", 956 | "version": "3.9.13" 957 | } 958 | }, 959 | "nbformat": 4, 960 | "nbformat_minor": 2 961 | } 962 | -------------------------------------------------------------------------------- /4_ImportarDatos.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Importing Data" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "3 ways to import data\n", 15 | "\n", 16 | "**CSV**\n", 17 | "\n", 18 | "**EXCEL**\n", 19 | "\n", 20 | "**CLIPBOARD**\n", 21 | "\n", 22 | "Other formats, such as pickle, come later, but once you understand how these three work, the rest will be easier." 
23 | ] 24 | }, 25 | { 26 | "cell_type": "markdown", 27 | "metadata": {}, 28 | "source": [ 29 | "## How to Import a CSV" 30 | ] 31 | }, 39 | { 40 | "cell_type": "code", 41 | "execution_count": 4, 42 | "metadata": {}, 43 | "outputs": [ 44 | { 45 | "data": { 46 | "text/html": [ 47 | "
" 186 | ], 187 | "text/plain": [ 188 | " Nombre Cliente Tipo Producto Precio (€/kg) Cantidad Vendida (kg) \\\n", 189 | "id \n", 190 | "389932 Mas y Mas Almendras 4.54 1212 \n", 191 | "389933 Mercadona Nueces 5.84 591 \n", 192 | "389934 Alcampo Cacahuetes 2.69 2831 \n", 193 | "389935 Mas y Mas Pistachos 6.70 516 \n", 194 | "389936 Lidl Nueces 6.14 1283 \n", 195 | "... ... ... ... ... \n", 196 | "390227 Caprabo Nueces 9.75 2367 \n", 197 | "390228 Alcampo Pistachos 3.82 1951 \n", 198 | "390229 Mas y Mas Pistachos 5.22 2811 \n", 199 | "390230 Mas y Mas Cacahuetes 8.07 473 \n", 200 | "390231 Aldi Nueces 5.80 1081 \n", 201 | "\n", 202 | " Fecha Venta Provincia Cliente \n", 203 | "id \n", 204 | "389932 29/3/23 Cádiz \n", 205 | "389933 5/12/23 Sevilla \n", 206 | "389934 26/7/23 Madrid \n", 207 | "389935 7/10/23 Alicante \n", 208 | "389936 12/5/23 Barcelona \n", 209 | "... ... ... \n", 210 | "390227 15/9/23 Sevilla \n", 211 | "390228 19/9/23 Madrid \n", 212 | "390229 4/5/23 Valencia \n", 213 | "390230 18/1/23 Madrid \n", 214 | "390231 23/3/23 Sevilla \n", 215 | "\n", 216 | "[300 rows x 6 columns]" 217 | ] 218 | }, 219 | "execution_count": 4, 220 | "metadata": {}, 221 | "output_type": "execute_result" 222 | } 223 | ], 224 | "source": [ 225 | "import pandas as pd\n", 226 | "df = pd.read_csv('/Users/jimenacambronero/Desktop/Proyectos para Portfolio/PythonPracticas/Datos/Frutos_Secos.csv', delimiter=';', index_col='id')\n", 227 | "\n", 228 | "df" 229 | ] 230 | }, 231 | { 232 | "cell_type": "code", 233 | "execution_count": 5, 234 | "metadata": {}, 235 | "outputs": [ 236 | { 237 | "name": "stdout", 238 | "output_type": "stream", 239 | "text": [ 240 | "\n", 241 | "Int64Index: 300 entries, 389932 to 390231\n", 242 | "Data columns (total 6 columns):\n", 243 | " # Column Non-Null Count Dtype \n", 244 | "--- ------ -------------- ----- \n", 245 | " 0 Nombre Cliente 300 non-null object \n", 246 | " 1 Tipo Producto 300 non-null object \n", 247 | " 2 Precio (€/kg) 300 non-null 
float64\n", 248 | " 3 Cantidad Vendida (kg) 300 non-null int64 \n", 249 | " 4 Fecha Venta 300 non-null object \n", 250 | " 5 Provincia Cliente 300 non-null object \n", 251 | "dtypes: float64(1), int64(1), object(4)\n", 252 | "memory usage: 16.4+ KB\n" 253 | ] 254 | } 255 | ], 256 | "source": [ 257 | "# General information\n", 258 | "df.info()" 259 | ] 260 | }, 261 | { 262 | "cell_type": "code", 263 | "execution_count": 8, 264 | "metadata": {}, 265 | "outputs": [ 266 | { 267 | "data": { 268 | "text/html": [ 269 | "
" 353 | ], 354 | "text/plain": [ 355 | " Nombre Cliente Tipo Producto Precio (€/kg) Cantidad Vendida (kg) \\\n", 356 | "id \n", 357 | "389932 Mas y Mas Almendras 4.54 1212 \n", 358 | "389933 Mercadona Nueces 5.84 591 \n", 359 | "389934 Alcampo Cacahuetes 2.69 2831 \n", 360 | "389935 Mas y Mas Pistachos 6.70 516 \n", 361 | "389936 Lidl Nueces 6.14 1283 \n", 362 | "\n", 363 | " Fecha Venta Provincia Cliente \n", 364 | "id \n", 365 | "389932 2023-03-29 Cádiz \n", 366 | "389933 2023-05-12 Sevilla \n", 367 | "389934 2023-07-26 Madrid \n", 368 | "389935 2023-07-10 Alicante \n", 369 | "389936 2023-12-05 Barcelona " 370 | ] 371 | }, 372 | "execution_count": 8, 373 | "metadata": {}, 374 | "output_type": "execute_result" 375 | } 376 | ], 377 | "source": [ 378 | "# Read the CSV file, parsing the 'Fecha Venta' column as dates.\n# Note: the raw values are day/month/year (e.g. 5/12/23); without dayfirst=True pandas reads ambiguous ones month-first, so 5/12/23 shows up as 2023-05-12.\n", 379 | "df = pd.read_csv('/Users/jimenacambronero/Desktop/Proyectos para Portfolio/PythonPracticas/Datos/Frutos_Secos.csv', delimiter=';', index_col='id', parse_dates=['Fecha Venta'])\n", 380 | "\n", 381 | "# Show the first rows of the DataFrame to check how the dates were parsed\n", 382 | "df.head()" 383 | ] 384 | }, 385 | { 386 | "cell_type": "code", 387 | "execution_count": 9, 388 | "metadata": {}, 389 | "outputs": [ 390 | { 391 | "name": "stdout", 392 | "output_type": "stream", 393 | "text": [ 394 | "<class 'pandas.core.frame.DataFrame'>\n", 395 | "Int64Index: 300 entries, 389932 to 390231\n", 396 | "Data columns (total 6 columns):\n", 397 | " # Column Non-Null Count Dtype \n", 398 | "--- ------ -------------- ----- \n", 399 | " 0 Nombre Cliente 300 non-null object \n", 400 | " 1 Tipo Producto 300 non-null object \n", 401 | " 2 Precio (€/kg) 300 non-null float64 \n", 402 | " 3 Cantidad Vendida (kg) 300 non-null int64 \n", 403 | " 4 Fecha Venta 300 non-null datetime64[ns]\n", 404 | " 5 Provincia Cliente 300 non-null object \n", 405 | "dtypes: datetime64[ns](1), float64(1), int64(1), object(3)\n", 406 | "memory usage: 16.4+ KB\n" 407 | ] 408 | } 409
| ], 410 | "source": [ 411 | "# General information. The 'Fecha Venta' column is now of type datetime64\n", 412 | "df.info()" 413 | ] 414 | }, 415 | { 416 | "cell_type": "markdown", 417 | "metadata": {}, 418 | "source": [ 419 | "## Importing from Excel" 420 | ] 421 | }, 422 | { 423 | "cell_type": "code", 424 | "execution_count": 11, 425 | "metadata": {}, 426 | "outputs": [], 427 | "source": [ 428 | "# Parse the date column while loading\n", 429 | "df_excel = pd.read_excel('/Users/jimenacambronero/Desktop/Proyectos para Portfolio/PythonPracticas/Datos/Frutos_Secos.xlsx', index_col='id', parse_dates=['Fecha Venta'])" 430 | ] 431 | }, 432 | { 433 | "cell_type": "code", 434 | "execution_count": 12, 435 | "metadata": {}, 436 | "outputs": [ 437 | { 438 | "name": "stdout", 439 | "output_type": "stream", 440 | "text": [ 441 | "<class 'pandas.core.frame.DataFrame'>\n", 442 | "Int64Index: 300 entries, 389932 to 390231\n", 443 | "Data columns (total 6 columns):\n", 444 | " # Column Non-Null Count Dtype \n", 445 | "--- ------ -------------- ----- \n", 446 | " 0 Nombre Cliente 300 non-null object \n", 447 | " 1 Tipo Producto 300 non-null object \n", 448 | " 2 Precio (€/kg) 300 non-null float64 \n", 449 | " 3 Cantidad Vendida (kg) 300 non-null int64 \n", 450 | " 4 Fecha Venta 300 non-null datetime64[ns]\n", 451 | " 5 Provincia Cliente 300 non-null object \n", 452 | "dtypes: datetime64[ns](1), float64(1), int64(1), object(3)\n", 453 | "memory usage: 16.4+ KB\n" 454 | ] 455 | } 456 | ], 457 | "source": [ 458 | "df_excel.info()" 459 | ] 460 | }, 461 | { 462 | "cell_type": "code", 463 | "execution_count": 13, 464 | "metadata": {}, 465 | "outputs": [ 466 | { 467 | "data": { 468 | "text/html": [ 469 | "
" 553 | ], 554 | "text/plain": [ 555 | " Nombre Cliente Tipo Producto Precio (€/kg) Cantidad Vendida (kg) \\\n", 556 | "id \n", 557 | "389932 Mas y Mas Almendras 4.54 1212 \n", 558 | "389933 Mercadona Nueces 5.84 591 \n", 559 | "389934 Alcampo Cacahuetes 2.69 2831 \n", 560 | "389935 Mas y Mas Pistachos 6.70 516 \n", 561 | "389936 Lidl Nueces 6.14 1283 \n", 562 | "\n", 563 | " Fecha Venta Provincia Cliente \n", 564 | "id \n", 565 | "389932 2023-03-29 Cádiz \n", 566 | "389933 2023-05-12 Sevilla \n", 567 | "389934 2023-07-26 Madrid \n", 568 | "389935 2023-07-10 Alicante \n", 569 | "389936 2023-12-05 Barcelona " 570 | ] 571 | }, 572 | "execution_count": 13, 573 | "metadata": {}, 574 | "output_type": "execute_result" 575 | } 576 | ], 577 | "source": [ 578 | "df_excel.head()" 579 | ] 580 | }, 581 | { 582 | "cell_type": "markdown", 583 | "metadata": {}, 584 | "source": [ 585 | "## Importing from the Clipboard" 586 | ] 587 | }, 588 | { 589 | "cell_type": "code", 590 | "execution_count": 14, 591 | "metadata": {}, 592 | "outputs": [ 593 | { 594 | "data": { 595 | "text/html": [ 596 | "
" 633 | ], 634 | "text/plain": [ 635 | " 0 1 \\\n", 636 | "0 ('/Users/jimenacambronero/Desktop/Proyectos para \n", 637 | "\n", 638 | " 2 3 \\\n", 639 | "0 Portfolio/PythonPracticas/Datos/Frutos_Secos.c... delimiter=';', \n", 640 | "\n", 641 | " 4 \n", 642 | "0 index_col='id' " 643 | ] 644 | }, 645 | "execution_count": 14, 646 | "metadata": {}, 647 | "output_type": "execute_result" 648 | } 649 | ], 650 | "source": [ 651 | "df_portapapeles = pd.read_clipboard(header=None)\n", 652 | "df_portapapeles.head(2)" 653 | ] 654 | }, 655 | { 656 | "cell_type": "markdown", 657 | "metadata": {}, 658 | "source": [ 659 | "# Relational Databases with SQLAlchemy" 660 | ] 661 | }, 662 | { 663 | "cell_type": "markdown", 664 | "metadata": {}, 665 | "source": [ 666 | "SQLAlchemy is a Python library that provides a flexible, high-performance way to work with relational databases. It is commonly used as an Object Relational Mapper (ORM), a technique that lets you interact with a database through objects and queries instead of writing SQL directly."
667 | ] 668 | }, 669 | { 670 | "cell_type": "code", 671 | "execution_count": 1, 672 | "metadata": {}, 673 | "outputs": [], 674 | "source": [ 675 | "import sqlalchemy as sa" 676 | ] 677 | }, 678 | { 679 | "cell_type": "code", 680 | "execution_count": 26, 681 | "metadata": {}, 682 | "outputs": [], 683 | "source": [ 684 | "# Create the connection (engine) to the database\n", 685 | "conexion = sa.create_engine('sqlite:////Users/jimenacambronero/Desktop/Proyectos para Portfolio/PythonPracticas/Datos/pruebassqlalchemy.db')" 686 | ] 687 | }, 688 | { 689 | "cell_type": "code", 690 | "execution_count": 35, 691 | "metadata": {}, 692 | "outputs": [], 693 | "source": [ 694 | "# Run the query that loads the data\n", 695 | "df_sql = pd.read_sql('SELECT * FROM Registros_Ventas', conexion)" 696 | ] 697 | }, 698 | { 699 | "cell_type": "code", 700 | "execution_count": 36, 701 | "metadata": {}, 702 | "outputs": [ 703 | { 704 | "data": { 705 | "text/html": [ 706 | "
" 781 | ], 782 | "text/plain": [ 783 | " Nombre Cliente Tipo Producto Precio (€/kg) Cantidad Vendida (kg) \\\n", 784 | "0 Mas y Mas Almendras 4.54 1212 \n", 785 | "1 Mercadona Nueces 5.84 591 \n", 786 | "2 Alcampo Cacahuetes 2.69 2831 \n", 787 | "3 Mas y Mas Pistachos 6.70 516 \n", 788 | "4 Lidl Nueces 6.14 1283 \n", 789 | "\n", 790 | " Fecha Venta Provincia Cliente \n", 791 | "0 2023-03-29 Cádiz \n", 792 | "1 2023-12-05 Sevilla \n", 793 | "2 2023-07-26 Madrid \n", 794 | "3 2023-10-07 Alicante \n", 795 | "4 2023-05-12 Barcelona " 796 | ] 797 | }, 798 | "execution_count": 36, 799 | "metadata": {}, 800 | "output_type": "execute_result" 801 | } 802 | ], 803 | "source": [ 804 | "# From here on, work with it like any other pandas DataFrame\n", 805 | "df_sql.head()" 806 | ] 807 | }, 808 | { 809 | "cell_type": "markdown", 810 | "metadata": {}, 811 | "source": [ 812 | "# Saving Data" 813 | ] 814 | }, 815 | { 816 | "cell_type": "markdown", 817 | "metadata": {}, 818 | "source": [ 819 | "CSV\n", 820 | "\n", 821 | "    df.to_csv('Prueba.csv')\n", 822 | "\n", 823 | "EXCEL\n", 824 | "\n", 825 | "    df.to_excel('Prueba.xlsx')" 826 | ] 827 | }, 828 | { 829 | "cell_type": "markdown", 830 | "metadata": {}, 831 | "source": [ 832 | "# Common Situations" 833 | ] 834 | }, 835 | { 836 | "cell_type": "markdown", 837 | "metadata": {}, 838 | "source": [ 839 | "## Non-Standard Separators\n", 840 | "\n", 841 | "Data files usually organize information in columns, and the columns are separated by a special character such as a comma, a semicolon, or a tab.\n", 842 | "\n", 843 | "Sometimes, though, a file uses some other character as the separator. The first step is therefore to find out which character separates the columns so the file can be read correctly, and then pass that character to pandas through the \"sep\" parameter.\n", 844 | "\n", 845 | "Most of the time the separator is stated in the file's documentation or in the instructions that come with it." 846 | ] 847 | }, 848 | { 849 | "cell_type": "code", 850 | "execution_count": 37, 851 | "metadata": {}, 852 | "outputs": [ 853 | { 854 | "data": { 855 | "text/html": [ 856 | "
" 901 | ], 902 | "text/plain": [ 903 | " Nombre Edad Ciudad\n", 904 | "0 Juan 25 Madrid\n", 905 | "1 Ana 30 Barcelona\n", 906 | "2 Carlos 22 Sevilla" 907 | ] 908 | }, 909 | "execution_count": 37, 910 | "metadata": {}, 911 | "output_type": "execute_result" 912 | } 913 | ], 914 | "source": [ 915 | "pd.read_csv('/Users/jimenacambronero/Desktop/Proyectos para Portfolio/PythonPracticas/Datos/ejemplo_tabulador.csv', sep='\\t').head(5)" 916 | ] 917 | }, 918 | { 919 | "cell_type": "markdown", 920 | "metadata": {}, 921 | "source": [ 922 | "## When the Data Does NOT Start on the First Row" 923 | ] 924 | }, 925 | { 926 | "cell_type": "code", 927 | "execution_count": 38, 928 | "metadata": {}, 929 | "outputs": [ 930 | { 931 | "data": { 932 | "text/html": [ 933 | "
" 972 | ], 973 | "text/plain": [ 974 | " Juan 25 Madrid\n", 975 | "0 Ana 30 Barcelona\n", 976 | "1 Carlos 22 Sevilla" 977 | ] 978 | }, 979 | "execution_count": 38, 980 | "metadata": {}, 981 | "output_type": "execute_result" 982 | } 983 | ], 984 | "source": [ 985 | "# skiprows=1 skips the first line (the header), so pandas takes the first record as the header instead\n", 986 | "pd.read_csv('/Users/jimenacambronero/Desktop/Proyectos para Portfolio/PythonPracticas/Datos/ejemplo_tabulador.csv', \n", 987 | " sep='\\t',\n", 988 | " skiprows=1).head(5)" 989 | ] 990 | }, 991 | { 992 | "cell_type": "markdown", 993 | "metadata": {}, 994 | "source": [ 995 | "## Unnamed Columns" 996 | ] 997 | }, 998 | { 999 | "cell_type": "markdown", 1000 | "metadata": {}, 1001 | "source": [ 1002 | "Sometimes we do not want to load the column names from the file and prefer to assign them later." 1003 | ] 1004 | }, 1005 | { 1006 | "cell_type": "code", 1007 | "execution_count": 39, 1008 | "metadata": {}, 1009 | "outputs": [ 1010 | { 1011 | "data": { 1012 | "text/html": [ 1013 | "
" 1064 | ], 1065 | "text/plain": [ 1066 | " 0 1 2\n", 1067 | "0 Nombre Edad Ciudad\n", 1068 | "1 Juan 25 Madrid\n", 1069 | "2 Ana 30 Barcelona\n", 1070 | "3 Carlos 22 Sevilla" 1071 | ] 1072 | }, 1073 | "execution_count": 39, 1074 | "metadata": {}, 1075 | "output_type": "execute_result" 1076 | } 1077 | ], 1078 | "source": [ 1079 | "# header=None: no row is used as the header, so pandas numbers the columns 0, 1, 2 and the original header becomes a data row\n", 1080 | "pd.read_csv('/Users/jimenacambronero/Desktop/Proyectos para Portfolio/PythonPracticas/Datos/ejemplo_tabulador.csv', \n", 1081 | " sep='\\t',\n", 1082 | " header=None).head(5)" 1083 | ] 1084 | }, 1085 | { 1086 | "cell_type": "code", 1087 | "execution_count": 40, 1088 | "metadata": {}, 1089 | "outputs": [ 1090 | { 1091 | "data": { 1092 | "text/html": [ 1093 | "
" 1126 | ], 1127 | "text/plain": [ 1128 | " Estas son mis cabeceras\n", 1129 | "0 Carlos 22 Sevilla" 1130 | ] 1131 | }, 1132 | "execution_count": 40, 1133 | "metadata": {}, 1134 | "output_type": "execute_result" 1135 | } 1136 | ], 1137 | "source": [ 1138 | "# How to set the column names while importing\n", 1139 | "# With skiprows=2 the first two lines (the original header and Juan) are skipped; header=0 then treats the next line (Ana) as the header and replaces it with `names`, so only Carlos is left. Using header=0 without skiprows would keep all three records.\n", 1140 | "\n", 1141 | "cabecera = ['Estas', 'son mis', 'cabeceras']\n", 1142 | "\n", 1143 | "pd.read_csv('/Users/jimenacambronero/Desktop/Proyectos para Portfolio/PythonPracticas/Datos/ejemplo_tabulador.csv', \n", 1144 | " sep='\\t',\n", 1145 | " skiprows=2,\n", 1146 | " header=0,\n", 1147 | " names=cabecera).head(5)" 1148 | ] 1149 | }, 1150 | { 1151 | "cell_type": "markdown", 1152 | "metadata": {}, 1153 | "source": [ 1154 | "## Unusual Null Values" 1155 | ] 1156 | }, 1157 | { 1158 | "cell_type": "markdown", 1159 | "metadata": {}, 1160 | "source": [ 1161 | "If a file uses a placeholder number such as -999 to stand for missing data, you can tell pandas to treat that value as null." 1162 | ] 1163 | }, 1164 | { 1165 | "cell_type": "code", 1166 | "execution_count": 42, 1167 | "metadata": {}, 1168 | "outputs": [ 1169 | { 1170 | "data": { 1171 | "text/html": [ 1172 | "
" 1235 | ], 1236 | "text/plain": [ 1237 | " ID Nombre Años Puntos\n", 1238 | "0 1 Alice 25.0 95.0\n", 1239 | "1 2 Bob NaN 82.0\n", 1240 | "2 3 Charlie 30.0 NaN\n", 1241 | "3 4 David 22.0 90.0\n", 1242 | "4 5 Eva 35.0 88.0" 1243 | ] 1244 | }, 1245 | "execution_count": 42, 1246 | "metadata": {}, 1247 | "output_type": "execute_result" 1248 | } 1249 | ], 1250 | "source": [ 1251 | "cabecera = ['ID', 'Nombre', 'Años', 'Puntos']\n", 1252 | "\n", 1253 | "# na_values=-999 loads NaN wherever a -999 is found\n", 1254 | "pd.read_csv('/Users/jimenacambronero/Desktop/Proyectos para Portfolio/PythonPracticas/Datos/ejemplo_valores_nulos.csv', \n", 1255 | " sep=',',\n", 1256 | " header=0,\n", 1257 | " names=cabecera,\n", 1258 | " na_values=-999).head(5)" 1259 | ] 1260 | } 1261 | ], 1262 | "metadata": { 1263 | "kernelspec": { 1264 | "display_name": "Python 3", 1265 | "language": "python", 1266 | "name": "python3" 1267 | }, 1268 | "language_info": { 1269 | "codemirror_mode": { 1270 | "name": "ipython", 1271 | "version": 3 1272 | }, 1273 | "file_extension": ".py", 1274 | "mimetype": "text/x-python", 1275 | "name": "python", 1276 | "nbconvert_exporter": "python", 1277 | "pygments_lexer": "ipython3", 1278 | "version": "3.9.13" 1279 | } 1280 | }, 1281 | "nbformat": 4, 1282 | "nbformat_minor": 2 1283 | } 1284 | -------------------------------------------------------------------------------- /6_TratamientoDatos.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Data Treatment" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "## Identifying Nulls" 15 | ] 16 | }, 17 | { 18 | "cell_type": "markdown", 19 | "metadata": {}, 20 | "source": [ 21 | "### Count per Variable" 22 | ] 23 | }, 24 | { 25 | "cell_type": "code", 26 | "execution_count": 4, 27 | "metadata": {}, 28 | "outputs": [], 42 | "source": [ 43 | "import pandas as pd\n", 44 | "\n", 45 | "# .isna() flags the null values (True where a value is missing)\n", 46 | "df = pd.read_excel('/Users/jimenacambronero/Desktop/Proyectos para Portfolio/PythonPracticas/Datos/Frutos_Secos.xlsx', index_col='id', parse_dates=['Fecha Venta'])\n", 47 | "df.isna()" 48 | ] 49 | }, 50 | { 51 | "cell_type": "code", 52 | "execution_count": 5, 53 | "metadata": {}, 54 | "outputs": [ 55 | { 56 | "data": { 57 | "text/plain": [ 58 | "Nombre Cliente 0\n", 59 | "Tipo Producto 0\n", 60 | "Precio (€/kg) 0\n", 61 | "Cantidad Vendida (kg) 0\n", 62 | "Fecha Venta 0\n", 63 | "Provincia Cliente 0\n", 64 | "dtype: int64" 65 | ] 66 | }, 67 | "execution_count": 5, 68 | "metadata": {}, 69 | "output_type": "execute_result" 70 | } 71 | ], 72 | "source": [ 73 | "# count of nulls per column\n", 74 | "df.isna().sum()" 75 | ] 76 | }, 77 | { 78 | "cell_type": "code", 79 | "execution_count": 6, 80 | "metadata": {}, 81 | "outputs": [ 82 | { 83 | "data": { 84 | "text/html": [ 85 | "
" 463 | ], 464 | "text/plain": [ 465 | " ID Nombre Edad Puntuacion\n", 466 | "0 1 Nombre1 NaN 95.0\n", 467 | "1 2 Nombre2 25.0 NaN\n", 468 | "2 3 Nombre3 NaN NaN\n", 469 | "3 4 Nombre4 30.0 NaN\n", 470 | "4 5 Nombre5 25.0 NaN\n", 471 | "5 6 Nombre6 NaN 90.0\n", 472 | "6 7 Nombre7 30.0 95.0\n", 473 | "7 8 Nombre8 30.0 NaN\n", 474 | "8 9 Nombre9 NaN 95.0\n", 475 | "9 10 Nombre10 25.0 90.0\n", 476 | "10 11 Nombre11 30.0 90.0\n", 477 | "11 12 Nombre12 25.0 90.0\n", 478 | "12 13 Nombre13 30.0 95.0\n", 479 | "13 14 Nombre14 30.0 90.0\n", 480 | "14 15 Nombre15 30.0 90.0\n", 481 | "15 16 Nombre16 NaN 95.0\n", 482 | "16 17 Nombre17 NaN NaN\n", 483 | "17 18 Nombre18 30.0 NaN\n", 484 | "18 19 Nombre19 25.0 NaN\n", 485 | "19 20 Nombre20 25.0 90.0\n", 486 | "20 21 Nombre21 25.0 NaN\n", 487 | "21 22 Nombre22 NaN 95.0\n", 488 | "22 23 Nombre23 30.0 90.0\n", 489 | "23 24 Nombre24 NaN 95.0\n", 490 | "24 25 Nombre25 30.0 NaN\n", 491 | "25 26 Nombre26 NaN NaN\n", 492 | "26 27 Nombre27 25.0 NaN\n", 493 | "27 28 Nombre28 30.0 90.0\n", 494 | "28 29 Nombre29 NaN 90.0\n", 495 | "29 30 Nombre30 25.0 90.0\n", 496 | "30 31 Nombre31 NaN 95.0\n", 497 | "31 32 Nombre32 25.0 95.0\n", 498 | "32 33 Nombre33 25.0 NaN\n", 499 | "33 34 Nombre34 30.0 95.0\n", 500 | "34 35 Nombre35 25.0 NaN\n", 501 | "35 36 Nombre36 30.0 90.0\n", 502 | "36 37 Nombre37 25.0 NaN\n", 503 | "37 38 Nombre38 NaN NaN\n", 504 | "38 39 Nombre39 30.0 NaN\n", 505 | "39 40 Nombre40 30.0 NaN\n", 506 | "40 41 Nombre41 30.0 90.0\n", 507 | "41 42 Nombre42 NaN 90.0\n", 508 | "42 43 Nombre43 30.0 NaN\n", 509 | "43 44 Nombre44 30.0 NaN\n", 510 | "44 45 Nombre45 25.0 95.0\n", 511 | "45 46 Nombre46 NaN 95.0\n", 512 | "46 47 Nombre47 25.0 95.0\n", 513 | "47 48 Nombre48 NaN 90.0\n", 514 | "48 49 Nombre49 25.0 95.0\n", 515 | "49 50 Nombre50 30.0 90.0" 516 | ] 517 | }, 518 | "execution_count": 6, 519 | "metadata": {}, 520 | "output_type": "execute_result" 521 | } 522 | ], 523 | "source": [ 524 | "# load another file that contains null records\n",
525 | "df_nulos = pd.read_csv('/Users/jimenacambronero/Desktop/Proyectos para Portfolio/PythonPracticas/Datos/registros_nulos.csv')\n", 526 | "df_nulos" 527 | ] 528 | }, 529 | { 530 | "cell_type": "code", 531 | "execution_count": 7, 532 | "metadata": {}, 533 | "outputs": [ 534 | { 535 | "data": { 536 | "text/plain": [ 537 | "ID 0\n", 538 | "Nombre 0\n", 539 | "Edad 15\n", 540 | "Puntuacion 20\n", 541 | "dtype: int64" 542 | ] 543 | }, 544 | "execution_count": 7, 545 | "metadata": {}, 546 | "output_type": "execute_result" 547 | } 548 | ], 549 | "source": [ 550 | "df_nulos.isna().sum()" 551 | ] 552 | }, 553 | { 554 | "cell_type": "code", 555 | "execution_count": 9, 556 | "metadata": {}, 557 | "outputs": [ 558 | { 559 | "data": { 560 | "text/plain": [ 561 | "Puntuacion 20\n", 562 | "Edad 15\n", 563 | "ID 0\n", 564 | "Nombre 0\n", 565 | "dtype: int64" 566 | ] 567 | }, 568 | "execution_count": 9, 569 | "metadata": {}, 570 | "output_type": "execute_result" 571 | } 572 | ], 573 | "source": [ 574 | "df_nulos.isna().sum().sort_values(ascending=False)" 575 | ] 576 | }, 577 | { 578 | "cell_type": "markdown", 579 | "metadata": {}, 580 | "source": [ 581 | "### Percentage per Variable" 582 | ] 583 | }, 584 | { 585 | "cell_type": "code", 586 | "execution_count": 12, 587 | "metadata": {}, 588 | "outputs": [ 589 | { 590 | "data": { 591 | "text/plain": [ 592 | "Puntuacion 40.0\n", 593 | "Edad 30.0\n", 594 | "ID 0.0\n", 595 | "Nombre 0.0\n", 596 | "dtype: float64" 597 | ] 598 | }, 599 | "execution_count": 12, 600 | "metadata": {}, 601 | "output_type": "execute_result" 602 | } 603 | ], 604 | "source": [ 605 | "df_nulos.isna().mean().sort_values(ascending=False)*100\n", 606 | "# the mean of the boolean mask times 100 gives the percentage of nulls per column" 607 | ] 608 | }, 609 | { 610 | "cell_type": "markdown", 611 | "metadata": {}, 612 | "source": [ 613 | "## Counting Duplicates" 614 | ] 615 | }, 616 | { 617 | "cell_type": "code", 618 | "execution_count": 13, 619 | "metadata": {}, 620 | "outputs": [], 621 | "source": [ 622 | "df_duplicados = pd.read_excel('/Users/jimenacambronero/Desktop/Proyectos para Portfolio/PythonPracticas/Datos/Frutos_Secos_duplicados.xlsx')" 623 | ] 624 | }, 625 | { 626 | "cell_type": "code", 627 | "execution_count": 14, 628 | "metadata": {}, 629 | "outputs": [ 630 | { 631 | "data": { 632 | "text/plain": [ 633 | "23" 634 | ] 635 | }, 636 | "execution_count": 14, 637 | "metadata": {}, 638 | "output_type": "execute_result" 639 | } 640 | ], 641 | "source": [ 642 | "# Find out how many duplicated rows there are\n", 643 | "df_duplicados.duplicated().sum()" 644 | ] 645 | }, 646 | { 647 | "cell_type": "markdown", 648 | "metadata": {}, 649 | "source": [ 650 | "### Locating Duplicates" 651 | ] 652 | }, 653 | { 654 | "cell_type": "code", 655 | "execution_count": 15, 656 | "metadata": {}, 657 | "outputs": [ 658 | { 659 | "data": { 660 | "text/html": [ 661 | "
" 922 | ], 923 | "text/plain": [ 924 | " Nombre Cliente Tipo Producto Precio (€/kg) Cantidad Vendida (kg) \\\n", 925 | "300 Caprabo Cacahuetes 6.24 1626 \n", 926 | "301 Caprabo Cacahuetes 6.24 1626 \n", 927 | "302 Caprabo Cacahuetes 6.24 1626 \n", 928 | "303 Caprabo Cacahuetes 6.24 1626 \n", 929 | "304 Caprabo Cacahuetes 6.24 1626 \n", 930 | "305 Caprabo Cacahuetes 6.24 1626 \n", 931 | "306 Caprabo Cacahuetes 6.24 1626 \n", 932 | "307 Caprabo Cacahuetes 6.24 1626 \n", 933 | "308 Caprabo Cacahuetes 6.24 1626 \n", 934 | "309 Caprabo Cacahuetes 6.24 1626 \n", 935 | "310 Caprabo Cacahuetes 6.24 1626 \n", 936 | "311 Caprabo Cacahuetes 6.24 1626 \n", 937 | "312 Caprabo Cacahuetes 6.24 1626 \n", 938 | "313 Caprabo Cacahuetes 6.24 1626 \n", 939 | "314 Caprabo Cacahuetes 6.24 1626 \n", 940 | "315 Caprabo Cacahuetes 6.24 1626 \n", 941 | "316 Caprabo Cacahuetes 6.24 1626 \n", 942 | "317 Caprabo Cacahuetes 6.24 1626 \n", 943 | "318 Caprabo Cacahuetes 6.24 1626 \n", 944 | "319 Caprabo Cacahuetes 6.24 1626 \n", 945 | "320 Caprabo Cacahuetes 6.24 1626 \n", 946 | "321 Caprabo Cacahuetes 6.24 1626 \n", 947 | "322 Caprabo Cacahuetes 6.24 1626 \n", 948 | "\n", 949 | " Fecha Venta Provincia Cliente id \n", 950 | "300 5/2/23 Valencia 389955 \n", 951 | "301 5/2/23 Valencia 389955 \n", 952 | "302 5/2/23 Valencia 389955 \n", 953 | "303 5/2/23 Valencia 389955 \n", 954 | "304 5/2/23 Valencia 389955 \n", 955 | "305 5/2/23 Valencia 389955 \n", 956 | "306 5/2/23 Valencia 389955 \n", 957 | "307 5/2/23 Valencia 389955 \n", 958 | "308 5/2/23 Valencia 389955 \n", 959 | "309 5/2/23 Valencia 389955 \n", 960 | "310 5/2/23 Valencia 389955 \n", 961 | "311 5/2/23 Valencia 389955 \n", 962 | "312 5/2/23 Valencia 389955 \n", 963 | "313 5/2/23 Valencia 389955 \n", 964 | "314 5/2/23 Valencia 389955 \n", 965 | "315 5/2/23 Valencia 389955 \n", 966 | "316 5/2/23 Valencia 389955 \n", 967 | "317 5/2/23 Valencia 389955 \n", 968 | "318 5/2/23 Valencia 389955 \n", 969 | "319 5/2/23 Valencia 389955 \n", 970 | "320 
5/2/23 Valencia 389955 \n", 971 | "321 5/2/23 Valencia 389955 \n", 972 | "322 5/2/23 Valencia 389955 " 973 | ] 974 | }, 975 | "execution_count": 15, 976 | "metadata": {}, 977 | "output_type": "execute_result" 978 | } 979 | ], 980 | "source": [ 981 | "df_duplicados[df_duplicados.duplicated()]" 982 | ] 983 | }, 984 | { 985 | "cell_type": "code", 986 | "execution_count": 16, 987 | "metadata": {}, 988 | "outputs": [ 989 | { 990 | "data": { 991 | "text/html": [ 992 | "
" 1263 | ], 1264 | "text/plain": [ 1265 | " Nombre Cliente Tipo Producto Precio (€/kg) Cantidad Vendida (kg) \\\n", 1266 | "23 Caprabo Cacahuetes 6.24 1626 \n", 1267 | "300 Caprabo Cacahuetes 6.24 1626 \n", 1268 | "301 Caprabo Cacahuetes 6.24 1626 \n", 1269 | "302 Caprabo Cacahuetes 6.24 1626 \n", 1270 | "303 Caprabo Cacahuetes 6.24 1626 \n", 1271 | "304 Caprabo Cacahuetes 6.24 1626 \n", 1272 | "305 Caprabo Cacahuetes 6.24 1626 \n", 1273 | "306 Caprabo Cacahuetes 6.24 1626 \n", 1274 | "307 Caprabo Cacahuetes 6.24 1626 \n", 1275 | "308 Caprabo Cacahuetes 6.24 1626 \n", 1276 | "309 Caprabo Cacahuetes 6.24 1626 \n", 1277 | "310 Caprabo Cacahuetes 6.24 1626 \n", 1278 | "311 Caprabo Cacahuetes 6.24 1626 \n", 1279 | "312 Caprabo Cacahuetes 6.24 1626 \n", 1280 | "313 Caprabo Cacahuetes 6.24 1626 \n", 1281 | "314 Caprabo Cacahuetes 6.24 1626 \n", 1282 | "315 Caprabo Cacahuetes 6.24 1626 \n", 1283 | "316 Caprabo Cacahuetes 6.24 1626 \n", 1284 | "317 Caprabo Cacahuetes 6.24 1626 \n", 1285 | "318 Caprabo Cacahuetes 6.24 1626 \n", 1286 | "319 Caprabo Cacahuetes 6.24 1626 \n", 1287 | "320 Caprabo Cacahuetes 6.24 1626 \n", 1288 | "321 Caprabo Cacahuetes 6.24 1626 \n", 1289 | "322 Caprabo Cacahuetes 6.24 1626 \n", 1290 | "\n", 1291 | " Fecha Venta Provincia Cliente id \n", 1292 | "23 5/2/23 Valencia 389955 \n", 1293 | "300 5/2/23 Valencia 389955 \n", 1294 | "301 5/2/23 Valencia 389955 \n", 1295 | "302 5/2/23 Valencia 389955 \n", 1296 | "303 5/2/23 Valencia 389955 \n", 1297 | "304 5/2/23 Valencia 389955 \n", 1298 | "305 5/2/23 Valencia 389955 \n", 1299 | "306 5/2/23 Valencia 389955 \n", 1300 | "307 5/2/23 Valencia 389955 \n", 1301 | "308 5/2/23 Valencia 389955 \n", 1302 | "309 5/2/23 Valencia 389955 \n", 1303 | "310 5/2/23 Valencia 389955 \n", 1304 | "311 5/2/23 Valencia 389955 \n", 1305 | "312 5/2/23 Valencia 389955 \n", 1306 | "313 5/2/23 Valencia 389955 \n", 1307 | "314 5/2/23 Valencia 389955 \n", 1308 | "315 5/2/23 Valencia 389955 \n", 1309 | "316 5/2/23 Valencia 389955 \n", 
1310 | "317 5/2/23 Valencia 389955 \n", 1311 | "318 5/2/23 Valencia 389955 \n", 1312 | "319 5/2/23 Valencia 389955 \n", 1313 | "320 5/2/23 Valencia 389955 \n", 1314 | "321 5/2/23 Valencia 389955 \n", 1315 | "322 5/2/23 Valencia 389955 " 1316 | ] 1317 | }, 1318 | "execution_count": 16, 1319 | "metadata": {}, 1320 | "output_type": "execute_result" 1321 | } 1322 | ], 1323 | "source": [ 1324 | "# Show every copy of each duplicated row, including the first occurrence\n", 1325 | "df_duplicados[df_duplicados.duplicated(keep=False)]" 1326 | ] 1327 | }, 1328 | { 1329 | "cell_type": "markdown", 1330 | "metadata": {}, 1331 | "source": [ 1332 | "## Unique Value Analysis\n", 1333 | "A variable that holds a single value adds nothing to your analysis" 1334 | ] 1335 | }, 1336 | { 1337 | "cell_type": "markdown", 1338 | "metadata": {}, 1339 | "source": [ 1340 | "### Number of Unique Values" 1341 | ] 1342 | }, 1343 | { 1344 | "cell_type": "code", 1345 | "execution_count": 17, 1346 | "metadata": {}, 1347 | "outputs": [], 1348 | "source": [ 1349 | "df = pd.read_excel('/Users/jimenacambronero/Desktop/Proyectos para Portfolio/PythonPracticas/Datos/Frutos_Secos_unicos.xlsx')" 1350 | ] 1351 | }, 1352 | { 1353 | "cell_type": "code", 1354 | "execution_count": 19, 1355 | "metadata": {}, 1356 | "outputs": [ 1357 | { 1358 | "data": { 1359 | "text/plain": [ 1360 | "Nombre Cliente 11\n", 1361 | "Tipo Producto 4\n", 1362 | "Precio (€/kg) 258\n", 1363 | "Cantidad Vendida (kg) 284\n", 1364 | "Fecha Venta 204\n", 1365 | "Provincia Cliente 6\n", 1366 | "id 300\n", 1367 | "Completado 1\n", 1368 | "dtype: int64" 1369 | ] 1370 | }, 1371 | "execution_count": 19, 1372 | "metadata": {}, 1373 | "output_type": "execute_result" 1374 | } 1375 | ], 1376 | "source": [ 1377 | "df.nunique()" 1378 | ] 1379 | }, 1380 | { 1381 | "cell_type": "code", 1382 | "execution_count": 20, 1383 | "metadata": {}, 1384 | "outputs": [ 1385 | { 1386 | "data": { 1387 | "text/plain": [ 1388 | "0 Si\n", 1389 | "1 Si\n", 1390 | "2 Si\n", 1391 | "3 Si\n", 1392 | "4
Si\n", 1393 | " ..\n", 1394 | "295 Si\n", 1395 | "296 Si\n", 1396 | "297 Si\n", 1397 | "298 Si\n", 1398 | "299 Si\n", 1399 | "Name: Completado, Length: 300, dtype: object" 1400 | ] 1401 | }, 1402 | "execution_count": 20, 1403 | "metadata": {}, 1404 | "output_type": "execute_result" 1405 | } 1406 | ], 1407 | "source": [ 1408 | "df['Completado'] # this column can be dropped because it is constant" 1409 | ] 1410 | }, 1411 | { 1412 | "cell_type": "markdown", 1413 | "metadata": {}, 1414 | "source": [ 1415 | "### Distinct Values" 1416 | ] 1417 | }, 1418 | { 1419 | "cell_type": "code", 1420 | "execution_count": 21, 1421 | "metadata": {}, 1422 | "outputs": [ 1423 | { 1424 | "data": { 1425 | "text/plain": [ 1426 | "array(['Cádiz', 'Sevilla', 'Madrid', 'Alicante', 'Barcelona', 'Valencia'],\n", 1427 | " dtype=object)" 1428 | ] 1429 | }, 1430 | "execution_count": 21, 1431 | "metadata": {}, 1432 | "output_type": "execute_result" 1433 | } 1434 | ], 1435 | "source": [ 1436 | "df['Provincia Cliente'].unique()" 1437 | ] 1438 | } 1439 | ], 1440 | "metadata": { 1441 | "kernelspec": { 1442 | "display_name": "Python 3", 1443 | "language": "python", 1444 | "name": "python3" 1445 | }, 1446 | "language_info": { 1447 | "codemirror_mode": { 1448 | "name": "ipython", 1449 | "version": 3 1450 | }, 1451 | "file_extension": ".py", 1452 | "mimetype": "text/x-python", 1453 | "name": "python", 1454 | "nbconvert_exporter": "python", 1455 | "pygments_lexer": "ipython3", 1456 | "version": "3.9.13" 1457 | } 1458 | }, 1459 | "nbformat": 4, 1460 | "nbformat_minor": 2 1461 | } 1462 | --------------------------------------------------------------------------------
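The checks this notebook walks through (null counts, null percentages, duplicate detection, and constant columns) can be tried without the original CSV and Excel files. The sketch below is only an illustration under stated assumptions: `df_demo` and its values are hypothetical stand-in data that reuse a few of the notebook's column names, and the final clean-up step is one possible follow-up rather than something the notebook itself performs.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data; the notebook's own files are not read here.
df_demo = pd.DataFrame(
    {
        "id": [1, 2, 3, 4, 4],
        "Edad": [25.0, np.nan, 30.0, 25.0, 25.0],
        "Puntuacion": [np.nan, np.nan, 95.0, 90.0, 90.0],
        "Completado": ["Si"] * 5,
    }
)

# Nulls per column, first as a count and then as a percentage of rows.
null_counts = df_demo.isna().sum().sort_values(ascending=False)
null_pct = df_demo.isna().mean().sort_values(ascending=False) * 100

# duplicated() flags only the repeated rows; keep=False also flags the first copy.
n_repeats = int(df_demo.duplicated().sum())
all_copies = df_demo[df_demo.duplicated(keep=False)]

# Columns with a single distinct value are constant and add no information.
# Note that nunique() ignores NaN by default.
constant_cols = [col for col in df_demo.columns if df_demo[col].nunique() == 1]

# Assumed follow-up step: drop the repeated rows and the constant columns.
df_clean = df_demo.drop_duplicates().drop(columns=constant_cols)

print(null_counts)
print(null_pct)
print(n_repeats, list(all_copies.index), constant_cols, df_clean.shape)
```

Because `nunique()` skips missing values unless `dropna=False` is passed, a column that is mostly NaN with one repeated value would also be reported as constant, so it may be worth looking at the null percentages before dropping anything.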