├── resources └── main-image-python.png ├── .gitignore ├── README.md ├── A. Datasets.md └── 2. Vectores, Eventos Aleatorios y Probabilidad.ipynb /resources/main-image-python.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/sborquez/Python-LEC/HEAD/resources/main-image-python.png -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | # Byte-compiled / optimized / DLL files 2 | __pycache__/ 3 | *.py[cod] 4 | *$py.class 5 | 6 | # C extensions 7 | *.so 8 | 9 | # Distribution / packaging 10 | .Python 11 | build/ 12 | develop-eggs/ 13 | dist/ 14 | downloads/ 15 | eggs/ 16 | .eggs/ 17 | lib/ 18 | lib64/ 19 | parts/ 20 | sdist/ 21 | var/ 22 | wheels/ 23 | *.egg-info/ 24 | .installed.cfg 25 | *.egg 26 | MANIFEST 27 | 28 | # PyInstaller 29 | # Usually these files are written by a python script from a template 30 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 
31 | *.manifest 32 | *.spec 33 | 34 | # Installer logs 35 | pip-log.txt 36 | pip-delete-this-directory.txt 37 | 38 | # Unit test / coverage reports 39 | htmlcov/ 40 | .tox/ 41 | .coverage 42 | .coverage.* 43 | .cache 44 | nosetests.xml 45 | coverage.xml 46 | *.cover 47 | .hypothesis/ 48 | .pytest_cache/ 49 | 50 | # Translations 51 | *.mo 52 | *.pot 53 | 54 | # Django stuff: 55 | *.log 56 | local_settings.py 57 | db.sqlite3 58 | 59 | # Flask stuff: 60 | instance/ 61 | .webassets-cache 62 | 63 | # Scrapy stuff: 64 | .scrapy 65 | 66 | # Sphinx documentation 67 | docs/_build/ 68 | 69 | # PyBuilder 70 | target/ 71 | 72 | # Jupyter Notebook 73 | .ipynb_checkpoints 74 | 75 | # pyenv 76 | .python-version 77 | 78 | # celery beat schedule file 79 | celerybeat-schedule 80 | 81 | # SageMath parsed files 82 | *.sage.py 83 | 84 | # Environments 85 | .env 86 | .venv 87 | env/ 88 | venv/ 89 | ENV/ 90 | env.bak/ 91 | venv.bak/ 92 | 93 | # Spyder project settings 94 | .spyderproject 95 | .spyproject 96 | 97 | # Rope project settings 98 | .ropeproject 99 | 100 | # mkdocs documentation 101 | /site 102 | 103 | # mypy 104 | .mypy_cache/ 105 | 106 | # google drive 107 | .ini 108 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Python LEC 2 | ---------------- 3 | > The practical guide to __Python__ for the **L**aboratorio de **E**stadística **C**omputacional (Computational Statistics Laboratory). 4 | 5 | This is a collection of guides that will help you get started with the tools available in [Python](https://www.python.org/) for statistical analysis. The goal of these guides is for the reader to: 6 | 7 | * Adopt good Python programming style and practices. 
8 | * Learn to use modules from the [scipy](https://www.scipy.org/) ecosystem, such as: 9 | * [numpy](http://www.numpy.org/) 10 | * [pandas](http://pandas.pydata.org/) 11 | * [jupyter-notebook](http://jupyter.org/) 12 | * [scipy.stats](https://docs.scipy.org/doc/scipy/reference/stats.html/) 13 | * [matplotlib](http://matplotlib.org/) 14 | * Practice data manipulation and apply statistical reasoning to real-world cases. 15 | 16 | ## Open in Google Colab 17 | 18 | You can open a copy directly with the following link: 19 | 20 | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/sborquez/Python-LEC/blob/master) 21 | 22 | ## Contents 23 | 24 | 1. **[Introduction and Basic Tools](https://github.com/sborquez/Python-LEC/blob/master/0.%20Introducci%C3%B3n%20y%20Herramientas%20B%C3%A1sicas.ipynb)**: 25 | Lab setup, style guides, and good practices with Python. An introduction to the tools that will be used. 26 | 2. **[Exploratory Analysis](https://github.com/sborquez/Python-LEC/blob/master/1.%20An%C3%A1lisis%20Exploratorio.ipynb)**: Load, clean, and manipulate data with pandas, and create informative visualizations with matplotlib, seaborn, and plotly. 27 | 3. **[Vectors, Random Events and Probability](https://github.com/sborquez/Python-LEC/blob/master/2.%20Vectores%2C%20Eventos%20Aleatorios%20y%20Probabilidad.ipynb)**: Learn NumPy from the basics to advanced usage. Python for computational statistics. 28 | 4. **[Distributions and Random Variables](https://github.com/sborquez/Python-LEC/blob/master/3.%20Distribuciones%20y%20Variables%20Aleatorias.ipynb)**: Use the random variables and distributions available in the scipy package. 29 | 30 | ## Appendices 31 | 32 | A. **[Datasets](https://github.com/sborquez/Python-LEC/blob/master/A.%20Datasets.md)**: A collection of assorted datasets. 33 | 34 | B. **Linear Algebra**: Coming soon ... 
35 | 36 | ## Bibliography 37 | 38 | > The full documentation for each module is available on its own website. 39 | 40 | * Python for Data Analysis, 2nd Edition by William McKinney [web](http://shop.oreilly.com/product/0636920050896.do) 41 | * Think Bayes: Bayesian Statistics in Python, Allen B. Downey · O'Reilly Media [web](https://greenteapress.com/wp/think-bayes/) 42 | * Think Stats: Exploratory Data Analysis, Allen B. Downey · O'Reilly Media [web](https://greenteapress.com/thinkstats/) 43 | * Seeing Theory, A visual introduction to probability and statistics. [web](https://seeing-theory.brown.edu/) 44 | * Kaggle: Your Home For DataScience [web](https://www.kaggle.com) 45 | 46 | -------------------------------------------------------------------------------- /A. Datasets.md: -------------------------------------------------------------------------------- 1 |

Laboratorio de Estadística Computacional
con Python

2 | 3 | 4 |

Appendix A: Datasets

5 | 6 |
Collection created by Sebastián Bórquez G. - sebastian.borquez.g@gmail.com - DI UTFSM - April 2020.
7 | 8 | # Table of Contents 9 | 10 | * [Access](#A.1) 11 | * [National](#A.2) 12 | * [International](#A.3) 13 | * [Countries](#A.3.1) 14 | * [Coronavirus Source Data](#A.3.2) 15 | * [Kaggle Challenges](#A.4) 16 | * [Pokemon Dataset](#A.4.1) 17 | * [Images](#A.5) 18 | * [Favorite Sources](#A.6) 19 | * [Private Datasets](#A.7) 20 | 21 | 22 |
23 | 24 | 25 | # Access 26 | 27 | These datasets are available on the following server. 28 | 29 | * [Labcomp](https://labcomp.cl/~sborquez/datasets/) 30 | 31 | 32 |
33 | 34 | 35 | # National 36 | 37 | A collection of data gathered by Chilean sources. 38 | 39 | ```https://labcomp.cl/~sborquez/datasets/chile/``` 40 | 41 | * Migration: [**source**](https://www.extranjeria.gob.cl/estadisticas-migratorias/) | [**direct download**](https://labcomp.cl/~sborquez/datasets/chile/Formato-WEB-PDs-2005-2016.xlsx) 42 | 43 | 44 |
45 | 46 | 47 | # International 48 | 49 | Assorted demographic and economic datasets from international sources. 50 | 51 | 52 |
53 | 54 | ## Countries 55 | 56 | Source: [countryinfo](https://github.com/porimol/countryinfo) 57 | 58 | Demographic and geographic information about the countries of the world. 59 | 60 | An outdated copy (`The last update was made on April 16, 2020 (11:30, London time).`) in `.csv` format is available here: [direct download](https://labcomp.cl/~sborquez/datasets/covid/total_cases.csv) 61 | 62 | 63 |
64 | 65 | 66 | ## Coronavirus Source Data 67 | 68 | Source: [Our World in Data](https://ourworldindata.org/coronavirus-source-data) 69 | 70 | Our complete COVID-19 dataset is a collection of the COVID-19 data maintained by Our World in Data. It is updated daily and includes data on confirmed cases, deaths, and testing. 71 | 72 | All our data can be downloaded. 73 | 74 | You find the complete Our World in Data COVID-19 dataset – together with a complete overview of our sources and more – at our GitHub repository [here](https://github.com/owid/covid-19-data/tree/master/public/data/). 75 | 76 | A copy of the total cases (`The last update was made on April 16, 2020 (11:30, London time).`) is available here: [direct download](https://labcomp.cl/~sborquez/datasets/covid/total_cases.csv) 77 | 78 | 79 |
80 | 81 | 82 | # Kaggle Challenges 83 | 84 | My collection of favorite Kaggle datasets. 85 | 86 | 87 |
88 | 89 | 90 | ## Pokemon Dataset 91 | 92 | 93 | Source: [The Complete Pokemon Dataset](https://www.kaggle.com/rounakbanik/pokemon) 94 | 95 | The Complete Pokemon Dataset is a dataset available on the Kaggle platform. It contains information about the Pokemon available up to the seventh generation. 96 | 97 | This information covers the names, generation, types, and stats of the different Pokemon, which we will use to analyze the different Pokemon generations. 98 | 99 | 100 | A copy of the dataset (`The last update was made on 2019.`) is available here: [direct download](https://labcomp.cl/~sborquez/datasets/pokemon/pokemon.csv) 101 | 102 | 103 |
104 | 105 | 106 | # Images 107 | 108 | A collection of interesting images to analyze, gathered from various unknown sources. 109 | 110 | 111 | ```https://labcomp.cl/~sborquez/datasets/images/``` 112 | 113 | [folder](https://labcomp.cl/~sborquez/datasets/images/) 114 | 115 | 116 |
117 | 118 | 119 | # Favorite Sources 120 | 121 | Some sites that serve as sources of inspiration: 122 | 123 | * Kaggle [web](https://www.kaggle.com/competitions) 124 | * UCI repository [web](http://archive.ics.uci.edu/ml/) 125 | * ECML [web](http://www.ecmlpkdd2015.org) 126 | * NIPS [web](https://nips.cc) 127 | 128 | 129 |
130 | 131 | 132 | 133 | # Private Datasets 134 | 135 | These datasets are not public, but you can contact me at sebastian.borquez.g@gmail.com to request access. 136 | 137 | 138 | 139 | ## Her2Net 140 | 141 | Histopathological images for segmenting HER2 protein overexpression in the cell membrane. 142 | 143 | 144 | 145 | 146 | ## CTA gamma rays Simulations 147 | 148 | 149 | 150 | 151 | Source: [**CTA Prod3B**](https://www.cta-observatory.org/science/cta-performance/cta-prod3b-v1/) 152 | 153 | 154 | Monte Carlo (MC) simulations for the reconstruction of gamma-ray events. 155 | 156 | 157 | -------------------------------------------------------------------------------- /2. Vectores, Eventos Aleatorios y Probabilidad.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "
\n", 8 | "

Laboratorio de Estadística Computacional
con Python

\n", 9 | "\n", 10 | " \n", 11 | "

Topic 2: Vectors, Random Events and Probability

\n", 12 | "Notebook created by Sebastián Bórquez G. - sebastian.borquez.g@gmail.com - DI UTFSM - December 2019.\n", 13 | "
\n", 14 | "\n", 15 | "## Table of Contents\n", 16 | "\n", 17 | "* [Introduction to NumPy](#2.1)\n", 18 | " * [NumPy ndarray: The Multidimensional Array Object](#2.1.1)\n", 19 | "* [Simulations](#2.2)\n", 20 | " * [Random Events](#2.2.1)\n", 21 | " * [Random Walk](#2.2.2)\n", 22 | "* [Bayes' Theorem](#2.3)\n", 23 | " " 24 | ] 25 | }, 26 | { 27 | "cell_type": "markdown", 28 | "metadata": {}, 29 | "source": [ 30 | "
\n", 31 | "\n", 32 | "## Introduction to NumPy\n", 33 | "\n", 34 | "\n", 35 | "**[NumPy](http://www.numpy.org/)\n", 36 | " (Numerical Python)** is the Python module for scientific computing. It provides the **n-dimensional array (ndarray)**, a fast and efficient multidimensional array with vectorized arithmetic operations, and it is a dependency of most modules that perform high-performance numerical work. It also includes the **numpy.random** submodule for generating random variables.\n", 37 | "\n", 38 | "To use it, add the following line to your code:\n", 39 | "\n", 40 | "```python\n", 41 | "import numpy as np\n", 42 | "```\n", 43 | "\n", 44 | "The basic commands used in this activity are reviewed below; for more information about the module, see the following [tutorial](https://docs.scipy.org/doc/numpy/user/quickstart.html)\n", 45 | "\n", 46 | "
\n", 47 | "\n", 48 | "### NumPy ndarray: The Multidimensional Array Object\n", 49 | "\n", 50 | "The central feature of NumPy is its N-dimensional array, or **ndarray**, a fast and flexible container for large datasets in Python. Arrays let you perform mathematical operations on whole blocks of data using a syntax similar to the equivalent operations on scalar elements:\n" 51 | ] 52 | }, 53 | { 54 | "cell_type": "code", 55 | "execution_count": 1, 56 | "metadata": {}, 57 | "outputs": [ 58 | { 59 | "name": "stdout", 60 | "output_type": "stream", 61 | "text": [ 62 | "<class 'numpy.ndarray'>\n" 63 | ] 64 | }, 65 | { 66 | "data": { 67 | "text/plain": [ 68 | "array([[ 0.932, -0.323, 0.147],\n", 69 | " [ 0.442, 0. , -0.231]])" 70 | ] 71 | }, 72 | "execution_count": 1, 73 | "metadata": {}, 74 | "output_type": "execute_result" 75 | } 76 | ], 77 | "source": [ 78 | "import numpy as np\n", 79 | "\n", 80 | "data = np.array([[0.932, -0.323, 0.147], [0.442, 0.0, -0.231]])\n", 81 | "\n", 82 | "print(type(data))\n", 83 | "\n", 84 | "data" 85 | ] 86 | }, 87 | { 88 | "cell_type": "code", 89 | "execution_count": 2, 90 | "metadata": {}, 91 | "outputs": [ 92 | { 93 | "data": { 94 | "text/plain": [ 95 | "array([[4.63587323, 5.8798375 , 5.3429182 ],\n", 96 | " [5.05328923, 5.5 , 5.76640805]])" 97 | ] 98 | }, 99 | "execution_count": 2, 100 | "metadata": {}, 101 | "output_type": "execute_result" 102 | } 103 | ], 104 | "source": [ 105 | "55 / (2*data + 10)" 106 | ] 107 | }, 108 | { 109 | "cell_type": "markdown", 110 | "metadata": {}, 111 | "source": [ 112 | "An _ndarray_ is a generic multidimensional container for homogeneous data; that is, **all elements must be of the same type**. 
Every array has a **shape**, a tuple giving the size of each dimension, and a **dtype**, an object describing the type of the data stored in the array:" 113 | ] 114 | }, 115 | { 116 | "cell_type": "code", 117 | "execution_count": 3, 118 | "metadata": {}, 119 | "outputs": [ 120 | { 121 | "name": "stdout", 122 | "output_type": "stream", 123 | "text": [ 124 | "Dimensiones: 2\n", 125 | "Tamaño de las dimensiones del arreglo: (2, 3)\n", 126 | "Tipos de los elementos: float64\n" 127 | ] 128 | } 129 | ], 130 | "source": [ 131 | "print(\"Dimensiones:\", data.ndim)\n", 132 | "print(\"Tamaño de las dimensiones del arreglo:\", data.shape)\n", 133 | "print(\"Tipos de los elementos:\", data.dtype)" 134 | ] 135 | }, 136 | { 137 | "cell_type": "markdown", 138 | "metadata": {}, 139 | "source": [ 140 | "### Creating Arrays\n", 141 | "\n", 142 | "The easiest way to create an array is `np.array(iterable)`, but there are other functions suited to different needs." 143 | ] 144 | }, 145 | { 146 | "cell_type": "code", 147 | "execution_count": 5, 148 | "metadata": {}, 149 | "outputs": [ 150 | { 151 | "name": "stdout", 152 | "output_type": "stream", 153 | "text": [ 154 | "Array constructor\n", 155 | "[ 2 3 5 10 -1]\n", 156 | "\n", 157 | "Ceros:\n", 158 | "[[0. 0. 0.]\n", 159 | " [0. 0. 0.]\n", 160 | " [0. 0. 0.]]\n", 161 | "\n", 162 | "Unos:\n", 163 | "[[1. 1. 1.]\n", 164 | " [1. 1. 1.]\n", 165 | " [1. 1. 1.]]\n", 166 | "\n", 167 | "Empty:\n", 168 | "[[6.23042070e-307 4.67296746e-307 1.69121096e-306]\n", 169 | " [8.01095173e-307 1.89146896e-307 1.37961302e-306]\n", 170 | " [1.05699242e-307 8.01097889e-307 1.78020169e-306]\n", 171 | " [7.56601165e-307 1.02359984e-306 1.33510679e-306]\n", 172 | " [2.22522597e-306 1.78019761e-306 1.11260144e-306]\n", 173 | " [6.89812281e-307 2.22522596e-306 2.13621350e-306]]\n", 174 | "\n", 175 | "Range:\n", 176 | "[0. 1. 2. 3. 4. 5. 6. 7. 8. 9.]\n", 177 | "\n", 178 | "Regular grid:\n", 179 | "[0. 
0.125 0.25 0.375 0.5 0.625 0.75 0.875 1. ]\n", 180 | "\n", 181 | "Random:\n", 182 | "[1.63085501 5.36278087 4.35527252 5.09886798 6.83911219 5.61694616]\n" 183 | ] 184 | } 185 | ], 186 | "source": [ 187 | "# Array constructor: np.array( python_iterable )\n", 188 | "print(\"Array constructor\")\n", 189 | "print( np.array([2, 3, 5, 10, -1]) )\n", 190 | "\n", 191 | "# Arrays of zeros: np.zeros(shape)\n", 192 | "print(\"\\nCeros:\")\n", 193 | "print( np.zeros((3,3)) )\n", 194 | "\n", 195 | "# Arrays of ones: np.ones(shape)\n", 196 | "print(\"\\nUnos:\")\n", 197 | "print( np.ones((3,3)) )\n", 198 | "\n", 199 | "# Uninitialized array: np.empty(shape); its contents are arbitrary\n", 200 | "print(\"\\nEmpty:\")\n", 201 | "print( np.empty((6,3)) )\n", 202 | "\n", 203 | "# Range: np.arange(start, stop, step)\n", 204 | "print(\"\\nRange:\")\n", 205 | "print( np.arange(0., 10., 1.) )\n", 206 | "\n", 207 | "# Regular grid: np.linspace(start, end, n_values)\n", 208 | "print(\"\\nRegular grid:\")\n", 209 | "print( np.linspace(0., 1., 9) )\n", 210 | "\n", 211 | "# Random sequence: np.random.uniform(low, high, size)\n", 212 | "print(\"\\nRandom:\")\n", 213 | "print( np.random.uniform(1, 10, size=6) )" 214 | ] 215 | }, 216 | { 217 | "cell_type": "markdown", 218 | "metadata": {}, 219 | "source": [ 220 | "### Basic mathematical operations\n", 221 | "\n", 222 | "Most NumPy operations are performed `element-wise`; for example, computing $C = A + B$ means $C[i,j] = A[i,j] + B[i,j]$." 
223 | ] 224 | }, 225 | { 226 | "cell_type": "code", 227 | "execution_count": 7, 228 | "metadata": {}, 229 | "outputs": [ 230 | { 231 | "name": "stdout", 232 | "output_type": "stream", 233 | "text": [ 234 | "[[0.57272943 0.46531185 0.92287039]\n", 235 | " [0.20186139 0.16607915 0.92157951]]\n", 236 | "[[0.27430109 0.85422118 0.50129763]\n", 237 | " [0.83846618 0.14699148 0.84202488]]\n", 238 | "Sum:\n", 239 | "[[0.84703052 1.31953303 1.42416803]\n", 240 | " [1.04032757 0.31307063 1.76360439]]\n", 241 | "\n", 242 | "Subtraction\n", 243 | "[[ 0.29842835 -0.38890933 0.42157276]\n", 244 | " [-0.63660479 0.01908767 0.07955462]]\n", 245 | "\n", 246 | "Product\n", 247 | "[[0.15710031 0.39747924 0.46263274]\n", 248 | " [0.16925395 0.02441222 0.77599288]]\n", 249 | "\n", 250 | "Matricial Product (A*B^T)\n", 251 | "[[1.01721229 1.32569097]\n", 252 | " [0.65922475 0.96965905]]\n", 253 | "\n", 254 | "Transpose\n", 255 | "[[0.57272943 0.20186139]\n", 256 | " [0.46531185 0.16607915]\n", 257 | " [0.92287039 0.92157951]]\n", 258 | "[[0.27430109 0.83846618]\n", 259 | " [0.85422118 0.14699148]\n", 260 | " [0.50129763 0.84202488]]\n", 261 | "\n", 262 | " Power\n", 263 | "[[0.32801901 0.21651512 0.85168976]\n", 264 | " [0.04074802 0.02758228 0.84930879]]\n", 265 | "\n", 266 | " np.exp()\n", 267 | "[[1.77310001 1.59251073 2.51650338]\n", 268 | " [1.22367838 1.18066655 2.51325697]]\n", 269 | "\n", 270 | " np.sin()\n", 271 | "[[0.54192795 0.44870152 0.79733728]\n", 272 | " [0.20049327 0.16531673 0.79655753]]\n", 273 | "\n", 274 | " np.cos()\n", 275 | "[[0.84042495 0.89368168 0.60353398]\n", 276 | " [0.97969508 0.98624053 0.60456274]]\n", 277 | "\n", 278 | " np.tan()\n", 279 | "[[0.64482611 0.50208203 1.32111416]\n", 280 | " [0.20464865 0.16762313 1.31757627]]\n" 281 | ] 282 | } 283 | ], 284 | "source": [ 285 | "# first we create two random arrays:\n", 286 | "A = np.random.random((2,3))\n", 287 | "B = np.random.random((2,3))\n", 288 | "print(A)\n", 289 | "print(B)\n", 290 | "\n", 291 | "# 
sum\n", 292 | "print(\"Sum:\")\n", 293 | "print( A+B )\n", 294 | "\n", 295 | "# subtraction\n", 296 | "print(\"\\nSubtraction\")\n", 297 | "print( A-B )\n", 298 | "\n", 299 | "# product\n", 300 | "print(\"\\nProduct\")\n", 301 | "print( A*B )\n", 302 | "\n", 303 | "# matricial product\n", 304 | "print(\"\\nMatricial Product (A*B^T)\")\n", 305 | "print( np.dot(A,B.T) )\n", 306 | "print(\"\\nTranspose\")\n", 307 | "print( A.T )\n", 308 | "print( B.T )\n", 309 | "\n", 310 | "\n", 311 | "\n", 312 | "# power\n", 313 | "print(\"\\n Power\")\n", 314 | "print( A**2 )\n", 315 | "\n", 316 | "# Some common mathematical functions\n", 317 | "print(\"\\n np.exp()\")\n", 318 | "print( np.exp(A) )\n", 319 | "print(\"\\n np.sin()\")\n", 320 | "print( np.sin(A) )\n", 321 | "print(\"\\n np.cos()\")\n", 322 | "print( np.cos(A))\n", 323 | "print(\"\\n np.tan()\")\n", 324 | "print( np.tan(A) )" 325 | ] 326 | }, 327 | { 328 | "cell_type": "markdown", 329 | "metadata": {}, 330 | "source": [ 331 | "### Operaciones Booleanas" 332 | ] 333 | }, 334 | { 335 | "cell_type": "code", 336 | "execution_count": 8, 337 | "metadata": {}, 338 | "outputs": [ 339 | { 340 | "name": "stdout", 341 | "output_type": "stream", 342 | "text": [ 343 | "A > B:\n", 344 | "[[False False False]\n", 345 | " [False False False]\n", 346 | " [ True True True]]\n", 347 | "\n", 348 | "A =< B:\n", 349 | "[[ True True True]\n", 350 | " [ True True True]\n", 351 | " [False False False]]\n", 352 | "\n", 353 | " A==B:\n", 354 | "[[ True True True]\n", 355 | " [False False True]\n", 356 | " [False False False]]\n", 357 | "\n", 358 | " A!=B:\n", 359 | "[[False False False]\n", 360 | " [ True True False]\n", 361 | " [ True True True]]\n", 362 | "\n", 363 | " A and B:\n", 364 | "[[ True True True]\n", 365 | " [False False True]\n", 366 | " [False False False]]\n", 367 | "[[ True True True]\n", 368 | " [False False True]\n", 369 | " [False False False]]\n", 370 | "\n", 371 | " A or B:\n", 372 | "[[ True True True]\n", 373 | " [False 
False True]\n", 374 | " [ True True True]]\n", 375 | "[[ True True True]\n", 376 | " [False False True]\n", 377 | " [ True True True]]\n", 378 | "\n", 379 | " not A:\n", 380 | "[[False False False]\n", 381 | " [ True True False]\n", 382 | " [ True True True]]\n", 383 | "[[False False False]\n", 384 | " [ True True False]\n", 385 | " [ True True True]]\n" 386 | ] 387 | } 388 | ], 389 | "source": [ 390 | "# Creating two 2d-arrays\n", 391 | "A = np.array( [[1, 2, 3], [2, 3, 5], [1, 9, 6]] )\n", 392 | "B = np.array( [[1, 2, 3], [3, 5, 5], [0, 8, 5]] )\n", 393 | "\n", 394 | "print(\"A > B:\")\n", 395 | "print( A > B )\n", 396 | "\n", 397 | "print(\"\\nA =< B:\")\n", 398 | "print( A <= B )\n", 399 | "\n", 400 | "print(\"\\n A==B:\")\n", 401 | "print( A==B )\n", 402 | "\n", 403 | "print(\"\\n A!=B:\")\n", 404 | "print( A!=B )\n", 405 | "\n", 406 | "# Creating two 2d boolean arrays\n", 407 | "C = A==B\n", 408 | "D = A>=B\n", 409 | "\n", 410 | "print(\"\\n A and B:\")\n", 411 | "print( C & D)\n", 412 | "print( np.logical_and(C,D) )\n", 413 | "\n", 414 | "print(\"\\n A or B:\")\n", 415 | "print( C | D)\n", 416 | "print( np.logical_or(C,D) )\n", 417 | "\n", 418 | "print(\"\\n not A:\")\n", 419 | "print( ~C )\n", 420 | "print( np.logical_not(C))" 421 | ] 422 | }, 423 | { 424 | "cell_type": "markdown", 425 | "metadata": {}, 426 | "source": [ 427 | "### Basic Indexing and Slicing\n", 428 | "\n", 429 | "Indexing is a broad topic in NumPy: there are many ways to select a subset of an array or individual elements. 
\n", 430 | "\n", 431 | "One-dimensional arrays behave much like Python lists: \n" 432 | ] 433 | }, 434 | { 435 | "cell_type": "code", 436 | "execution_count": 9, 437 | "metadata": {}, 438 | "outputs": [ 439 | { 440 | "name": "stdout", 441 | "output_type": "stream", 442 | "text": [ 443 | "arr: [0 1 2 3 4 5 6 7 8 9]\n", 444 | "\n", 445 | "arr[5]: 5\n", 446 | "arr[5:8]: [5 6 7]\n", 447 | "\n", 448 | "arr[5:8] = 12\n", 449 | "arr: [ 0 1 2 3 4 12 12 12 8 9]\n", 450 | "\n", 451 | "arr_slice: [12 12 12]\n", 452 | "arr: [0 1 2 3 4 0 0 0 8 9]\n" 453 | ] 454 | } 455 | ], 456 | "source": [ 457 | "# New array\n", 458 | "arr = np.arange(10)\n", 459 | "print(\"arr:\",arr)\n", 460 | "\n", 461 | "# Index an element\n", 462 | "print(\"\\narr[5]:\", arr[5])\n", 463 | "\n", 464 | "# Index a slice\n", 465 | "print(\"arr[5:8]:\", arr[5:8])\n", 466 | "\n", 467 | "# Assign a value to a slice\n", 468 | "print(\"\\narr[5:8] = 12\")\n", 469 | "arr[5:8] = 12\n", 470 | "print(\"arr:\",arr)\n", 471 | "\n", 472 | "# Slices are views (references) into the original array, not copies.\n", 473 | "arr_slice = arr[5:8]\n", 474 | "print(\"\\narr_slice:\", arr_slice)\n", 475 | "arr_slice[:] = 0\n", 476 | "\n", 477 | "print(\"arr:\", arr)" 478 | ] 479 | }, 480 | { 481 | "cell_type": "markdown", 482 | "metadata": {}, 483 | "source": [ 484 | "For *n dimensions* the syntax is `arr[i,j,k,...]`, with the indices separated by commas:" 485 | ] 486 | }, 487 | { 488 | "cell_type": "code", 489 | "execution_count": 10, 490 | "metadata": {}, 491 | "outputs": [ 492 | { 493 | "name": "stdout", 494 | "output_type": "stream", 495 | "text": [ 496 | "arr2d:\n", 497 | " [[1 2 3]\n", 498 | " [4 5 6]\n", 499 | " [7 8 9]]\n", 500 | "arr2d[2]: [7 8 9]\n", 501 | "arr2d[0, 2]: 3\n" 502 | ] 503 | } 504 | ], 505 | "source": [ 506 | "arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])\n", 507 | "print(\"arr2d:\\n\", arr2d)\n", 508 | "print(\"arr2d[2]:\",arr2d[2])\n", 509 | "\n", 510 | "print(\"arr2d[0, 2]:\", arr2d[0,2])" 511 | ] 512 | }, 
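The indexing cells above note that slices are references rather than copies. The following is a minimal sketch of that behavior, not part of the original notebook; the variable names `view` and `independent` are illustrative, and `np.shares_memory` is used only to make the sharing visible.

```python
import numpy as np

arr = np.arange(10)

# A basic slice is a view: it shares memory with the original array,
# so writing through the view also changes arr.
view = arr[5:8]
view[:] = -1
print("after editing the view:", arr)

# An explicit copy owns its own data, so editing it leaves arr untouched.
independent = arr[5:8].copy()
independent[:] = 99
print("after editing the copy:", arr)

print("view shares memory with arr:", np.shares_memory(arr, view))
print("copy shares memory with arr:", np.shares_memory(arr, independent))
```

Calling `.copy()` is the usual choice when a slice has to be modified without affecting the array it came from.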
513 | { 514 | "cell_type": "markdown", 515 | "metadata": {}, 516 | "source": [ 517 | "A _slice_ uses one of the following forms:\n", 518 | "\n", 519 | "1. `arr[i_0:i_n,...]`: slice from i_0 up to i_n (exclusive),\n", 520 | "2. `arr[i_0:i_n: i_step,...]`: slice from i_0 up to i_n in steps of i_step.\n", 521 | "\n", 522 | "If i_0 and/or i_n are omitted, the start and the end of the array are used, respectively.\n" 523 | ] 524 | }, 525 | { 526 | "cell_type": "code", 527 | "execution_count": 11, 528 | "metadata": {}, 529 | "outputs": [ 530 | { 531 | "name": "stdout", 532 | "output_type": "stream", 533 | "text": [ 534 | "arr3d:\n", 535 | " [[[ 1 2 3]\n", 536 | " [ 4 5 6]]\n", 537 | "\n", 538 | " [[ 7 8 9]\n", 539 | " [10 11 12]]]\n", 540 | "\n", 541 | "arr3d[:2, 1:, 0]:\n", 542 | " [[ 4]\n", 543 | " [10]]\n" 544 | ] 545 | } 546 | ], 547 | "source": [ 548 | "arr3d = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])\n", 549 | "\n", 550 | "print(\"arr3d:\\n\", arr3d)\n", 551 | "\n", 552 | "print(\"\\narr3d[:2, 1:, 0]:\\n\", arr3d[:2, 1:, 0])" 553 | ] 554 | }, 555 | { 556 | "cell_type": "markdown", 557 | "metadata": {}, 558 | "source": [ 559 | "### Boolean indexing\n", 560 | "\n", 561 | "It is also possible to use _masks_ or boolean conditions to select values from an array." 
562 | ] 563 | }, 564 | { 565 | "cell_type": "code", 566 | "execution_count": 12, 567 | "metadata": {}, 568 | "outputs": [ 569 | { 570 | "name": "stdout", 571 | "output_type": "stream", 572 | "text": [ 573 | "arr: [0 1 2 3 4 5 6 7]\n", 574 | "arr[mask]: [0 1 3 5 7]\n", 575 | "arr[arr%2 == 0]: [0 2 4 6]\n" 576 | ] 577 | } 578 | ], 579 | "source": [ 580 | "arr = np.arange(8)\n", 581 | "print(\"arr:\", arr)\n", 582 | "\n", 583 | "# Explicit boolean mask given as a list\n", 584 | "mask = [True, True, False, True, False, True, False, True]\n", 585 | "print(\"arr[mask]:\", arr[mask])\n", 586 | "\n", 587 | "# Boolean condition: keep only the even values\n", 588 | "print(\"arr[arr%2 == 0]:\", arr[arr % 2 == 0])" 589 | ] 590 | }, 591 | { 592 | "cell_type": "markdown", 593 | "metadata": {}, 594 | "source": [ 595 | "### Statistics methods\n", 596 | "\n", 597 | "Besides the algebraic functions, NumPy also includes common routines for statistics:" 598 | ] 599 | }, 600 | { 601 | "cell_type": "code", 602 | "execution_count": 13, 603 | "metadata": {}, 604 | "outputs": [ 605 | { 606 | "name": "stdout", 607 | "output_type": "stream", 608 | "text": [ 609 | "[[1 2 3]\n", 610 | " [3 4 5]]\n", 611 | "[2. 3. 4.]\n", 612 | "[2. 
4.]\n", 613 | "axis 0\n", 614 | "2.0\n", 615 | "3.0\n", 616 | "4.0\n", 617 | "\n", 618 | "axis 1\n", 619 | "2.0\n", 620 | "4.0\n" 621 | ] 622 | } 623 | ], 624 | "source": [ 625 | "arr = np.array([[1,2, 3],\n", 626 | " [3,4,5]])\n", 627 | "print(arr)\n", 628 | "\n", 629 | "print(arr.mean(axis=0))\n", 630 | "print(arr.mean(axis=1))\n", 631 | "\n", 632 | "#axis 0\n", 633 | "print(\"axis 0\")\n", 634 | "print(arr[:,0].mean())\n", 635 | "print(arr[:,1].mean())\n", 636 | "print(arr[:,2].mean())\n", 637 | "\n", 638 | "#axis 1\n", 639 | "print(\"\\naxis 1\")\n", 640 | "print(arr[0,:].mean())\n", 641 | "print(arr[1,:].mean())" 642 | ] 643 | }, 644 | { 645 | "cell_type": "code", 646 | "execution_count": 80, 647 | "metadata": {}, 648 | "outputs": [ 649 | { 650 | "name": "stdout", 651 | "output_type": "stream", 652 | "text": [ 653 | "arr: [ -3 5 -9 8 -5 -9 -3 6 9 0 -4 7 -7 -9 -2 -9 0 -4\n", 654 | " 9 -3 -5 -1 -1 -5 7 -5 2 -8 -5 8 -6 8 8 -10 -2 3\n", 655 | " 0 2 7 8 3 9 3 6 -5 -10 0 5 1 -6 -7 7 6 9\n", 656 | " 1 -7 4 -4 -8 -8 -1 0 -2 -6 -7 3 -5 -8 -10 -10 3 3\n", 657 | " -9 -2 8 9 2 7 8 -5 -4 0 -6 1 7 -4 4 0 -2 4\n", 658 | " -6 5 -9 -1 5 4 8 -8 -1 -6]\n", 659 | "\n", 660 | "min: -10\n", 661 | "max: 9\n", 662 | "ptp (max - min): 19\n", 663 | "\n", 664 | "mean: -0.5\n", 665 | "median: -1.0\n", 666 | "Percentil 95: 8.049999999999997\n", 667 | "\n", 668 | "std: 5.8966091951222275\n", 669 | "var: 34.77\n", 670 | "Cov: 35.121212121212125\n", 671 | "histogram\n", 672 | "\tbins: [-10. -8.1 -6.2 -4.3 -2.4 -0.5 1.4 3.3 5.2 7.1 9. 
]\n", 673 | "\tfrec: [10 9 14 8 10 10 9 8 9 13]\n", 674 | "\n" 675 | ] 676 | } 677 | ], 678 | "source": [ 679 | "arr = np.random.randint(-10, 10, 100)\n", 680 | "print(\"arr:\", arr)\n", 681 | "\n", 682 | "# range (np.ptp is used because the ndarray.ptp method was removed in NumPy 2.0)\n", 683 | "print(\"\\nmin:\", arr.min())\n", 684 | "print(\"max:\", arr.max())\n", 685 | "print(\"ptp (max - min):\", np.ptp(arr))\n", 686 | "\n", 687 | "# measures of central tendency and position\n", 688 | "print(\"\\nmean:\", np.mean(arr))\n", 689 | "print(\"median:\", np.median(arr))\n", 690 | "print(\"Percentil 95:\", np.percentile(arr, 95))\n", 691 | "\n", 692 | "# measures of the spread of a distribution\n", 693 | "# for a 1-D input np.cov returns the sample variance (ddof=1), so it differs slightly from np.var\n", 694 | "print(\"\\nstd:\", np.std(arr))\n", 695 | "print(\"var:\", np.var(arr))\n", 696 | "print(\"Cov:\", np.cov(arr))\n", 697 | "\n", 698 | "# and more\n", 699 | "print(\"histogram\")\n", 700 | "print(\"\\tbins:\", np.histogram(arr)[1])\n", 701 | "print(\"\\tfrec:\", np.histogram(arr)[0])\n", 702 | "\n", 703 | "print()" 704 | ] 705 | }, 706 | { 707 | "cell_type": "markdown", 708 | "metadata": {}, 709 | "source": [ 710 | "
\n", 710 | "\n", 711 | "## Simulations\n", 712 | "\n", 713 | "\n", 714 | "\n", 715 | "A computer **simulation** reproduces the behavior of a system by using computing power to **simulate the outputs of the mathematical model** associated with that system. \n", 716 | "\n", 717 | "In our case, the mathematical model is the probability of a **random event** occurring. A probability simulation is therefore just an **experiment** that tries to replicate the outcome of a real observation, so that the simulated frequencies agree with the probabilities of that observation. " 718 | ] 719 | }, 720 | { 721 | "cell_type": "markdown", 722 | "metadata": {}, 723 | "source": [ 724 | "
\n", 725 | "\n", 726 | "### Random Events\n", 727 | "\n", 728 | "As you may have noticed in the previous examples, another way to create arrays is the [`np.random`](https://docs.scipy.org/doc/numpy/reference/routines.random.html?highlight=random#module-numpy.random) submodule, which offers several ways to generate random data:" 729 | ] 730 | }, 731 | { 732 | "cell_type": "code", 733 | "execution_count": 71, 734 | "metadata": {}, 735 | "outputs": [ 736 | { 737 | "name": "stdout", 738 | "output_type": "stream", 739 | "text": [ 740 | "np.random.random():\n", 741 | " [[0.20891013 0.00296997 0.51148049]\n", 742 | " [0.89019044 0.15820597 0.27209639]]\n", 743 | "\n", 744 | "(b - a) * np.random.random() + a:\n", 745 | " [[8.66335984 5.65024063 4.18754926]\n", 746 | " [4.86677011 6.3331849 4.32728307]]\n", 747 | "\n", 748 | "np.random.randint():\n", 749 | " [[ 7 0 3]\n", 750 | " [10 -8 -9]]\n", 751 | "\n", 752 | "np.random.choice with replace:\n", 753 | " ['a' 'c' 'c' 'b' 'a' 'b' 'b' 'b']\n", 754 | "['a', 'a', 'b', 'b', 'b', 'c']\n", 755 | "\n", 756 | "np.random.choice without replace:\n", 757 | " ['b' 'b' 'a' 'b' 'c' 'a']\n" 758 | ] 759 | } 760 | ], 761 | "source": [ 762 | "# uniform distribution [0.0, 1.0)\n", 763 | "print(\"np.random.random():\\n\", np.random.random(size=(2, 3)))\n", 764 | "\n", 765 | "# uniform distribution [a, b)\n", 766 | "a = 4\n", 767 | "b = 10\n", 768 | "print(\"\\n(b - a) * np.random.random() + a:\\n\", (b - a) * np.random.random(size=(2, 3)) + a)\n", 769 | "\n", 770 | "# discrete (int) uniform distribution [low, high)\n", 771 | "low=-10\n", 772 | "high=11\n", 773 | "print(\"\\nnp.random.randint():\\n\", np.random.randint(low, high, size=(2, 3)))\n", 774 | "\n", 775 | "# Choice from 1D array with replacement\n", 776 | "choices = [\"a\"] * 2 + [\"b\"] * 3 + [\"c\"]\n", 777 | "print(\"\\nnp.random.choice with replace:\\n\", np.random.choice(choices, size=8))\n", 778 | "\n", 779 | "print(choices)\n", 780 | "# Choice 
from 1D array without replace\n", 781 | "print(\"\\nnp.random.choice without replace:\\n\",np.random.choice(choices, size=6, replace=False))" 782 | ] 783 | }, 784 | { 785 | "cell_type": "markdown", 786 | "metadata": {}, 787 | "source": [ 788 | "We can use these functions to simulate **random events** with known possible outcomes, for example a coin toss (`cara` is heads, `sello` is tails)." 789 | ] 790 | }, 791 | { 792 | "cell_type": "code", 793 | "execution_count": 26, 794 | "metadata": {}, 795 | "outputs": [ 796 | { 797 | "name": "stdout", 798 | "output_type": "stream", 799 | "text": [ 800 | "Lanzar 1 moneda:\n", 801 | " sello\n", 802 | "\n", 803 | "Lanzar 1 moneda 5 veces:\n", 804 | " ['sello' 'sello' 'sello' 'sello' 'sello']\n", 805 | "\n", 806 | "Lanzar 2 moneda 4 veces:\n", 807 | " [['cara' 'cara']\n", 808 | " ['cara' 'sello']\n", 809 | " ['cara' 'cara']\n", 810 | " ['cara' 'cara']]\n" 811 | ] 812 | } 813 | ], 814 | "source": [ 815 | "moneda = [\"cara\", \"sello\"]\n", 816 | "\n", 817 | "print(\"Lanzar 1 moneda:\\n\", np.random.choice(moneda))\n", 818 | "\n", 819 | "print(\"\\nLanzar 1 moneda 5 veces:\\n\", np.random.choice(moneda, size=5))\n", 820 | "\n", 821 | "print(\"\\nLanzar 2 moneda 4 veces:\\n\", np.random.choice(moneda, size=(4,2)))" 822 | ] 823 | }, 824 | { 825 | "cell_type": "markdown", 826 | "metadata": {}, 827 | "source": [ 828 | "
\n", 829 | "\n", 830 | "### Random Walk\n", 831 | "\n", 832 | "A **random walk**, abbreviated RW ([**Random Walks**](https://es.wikipedia.org/wiki/Camino_aleatorio)), is a mathematical formalization of the trajectory that results from taking successive random steps.\n", 833 | "\n", 834 | "In its most general form, a random walk is any **random process** in which the position of a particle at a given instant depends only on its position at some **previous instant** and on a **random variable** that determines the direction and length of its next step.\n", 835 | "\n", 836 | "\n", 837 | "Let $X(t)$ denote a trajectory that starts at position $X(0) = X_0$.\n", 838 | "\n", 839 | "A random walk is modeled by the following expression:\n", 840 | "\n", 841 | "$$X(t+\\tau) = X(t) + \\Phi(\\tau)$$\n", 842 | "\n", 843 | "\n", 844 | "where $\\Phi$ is the random variable that describes the probability law for taking the next step and $\\tau$ is the time interval between consecutive steps." 845 | ] 846 | }, 847 | { 848 | "cell_type": "markdown", 849 | "metadata": {}, 850 | "source": [ 851 | "#### Simulating a random walk\n", 852 | "\n", 853 | "Next we use numpy's random module to simulate a _random walk_ with $s$ steps.\n", 854 | "\n", 855 | "In this case the particle lives in a one-dimensional space, it can only move one unit to the left (-1) or one unit to the right (+1) in each time interval, and it starts at the origin.\n", 856 | "\n", 857 | "Let us look at its behavior after $s=500$ steps."
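,
"\n",
"\n",
"As a worked step (a sketch, assuming independent steps $\\Phi_1, \\dots, \\Phi_s$ that take the values $-1$ and $+1$ with probability $1/2$ each, which is the setup simulated here; $X_s$ is shorthand introduced only for the position after $s$ steps), unrolling the recursion gives:\n",
"\n",
"$$X_s = X_0 + \\sum_{i=1}^{s} \\Phi_i, \\qquad E[X_s] = X_0 + \\sum_{i=1}^{s} E[\\Phi_i] = X_0, \\qquad \\mathrm{Var}(X_s) = \\sum_{i=1}^{s} \\mathrm{Var}(\\Phi_i) = s$$\n",
"\n",
"With $X_0 = 0$ and $s = 500$, the final positions should therefore be centered at $0$ with a standard deviation of $\\sqrt{500} \\approx 22.4$."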
858 | ] 859 | }, 860 | { 861 | "cell_type": "code", 862 | "execution_count": 74, 863 | "metadata": {}, 864 | "outputs": [], 865 | "source": [ 866 | "def random_walk(X_0=0, s=500):\n", 867 | "    # Possible steps: one unit to the left or one unit to the right\n", 868 | "    step = [-1, 1]\n", 869 | "\n", 870 | "    # Each step is independent, so all s steps can be drawn at once.\n", 871 | "    # Only the steps are returned; the caller adds X_0 when computing positions.\n", 872 | "    walk = np.random.choice(step, s)\n", 873 | "    return walk\n", 873 | "#random_walk()" 874 | ] 875 | }, 876 | { 877 | "cell_type": "code", 878 | "execution_count": 17, 879 | "metadata": {}, 880 | "outputs": [ 881 | { 882 | "data": { 883 | "image/png": "\n", 884 | "text/plain": [ 885 | "
" 886 | ] 887 | }, 888 | "metadata": { 889 | "needs_background": "light" 890 | }, 891 | "output_type": "display_data" 892 | } 893 | ], 894 | "source": [ 895 | "# Number of steps\n", 896 | "s = 500\n", 897 | "\n", 898 | "# X_0, the initial position\n", 899 | "X_0 = 0\n", 900 | "walk = random_walk(X_0, s)\n", 901 | "\n", 902 | "# The cumulative sum of the steps gives the position after each step (used for the plot)\n", 903 | "position_by_step = np.insert(X_0 + walk.cumsum(), 0, X_0) # prepend the initial position\n", 904 | "final_position = X_0 + walk.sum()\n", 905 | "t_step = np.arange(s + 1) # t = 0 corresponds to the initial position X_0\n", 906 | "\n", 907 | "plt.figure(figsize=(15,5))\n", 908 | "plt.title(\"Random walk with 500 steps\")\n", 909 | "plt.ylabel(\"X(t)\")\n", 910 | "plt.xlabel(\"t\")\n", 911 | "\n", 912 | "plt.axhline(y=0, color='k')\n", 913 | "plt.plot(t_step, position_by_step);" 914 | ] 915 | }, 916 | { 917 | "cell_type": "markdown", 918 | "metadata": {}, 919 | "source": [ 920 | "We want to know where the random walks end up **on average**. To find out, we run $n$ simulations and look at how their final positions are distributed."
921 | ] 922 | }, 923 | { 924 | "cell_type": "code", 925 | "execution_count": 65, 926 | "metadata": {}, 927 | "outputs": [ 928 | { 929 | "data": { 930 | "text/plain": [ 931 | "(500, 100)" 932 | ] 933 | }, 934 | "execution_count": 65, 935 | "metadata": {}, 936 | "output_type": "execute_result" 937 | } 938 | ], 939 | "source": [ 940 | "def n_random_walk(X_0, s, n):\n", 941 | "    # Possible steps: one unit to the left or one unit to the right\n", 942 | "    step = [-1, 1]\n", 943 | "\n", 944 | "    # Each step is independent, so the s steps of all n walks can be drawn at once\n", 945 | "    walk = np.random.choice(step, (s, n))\n", 946 | "    return walk\n", 947 | "\n", 948 | "# Number of simulations\n", 949 | "n = 100\n", 950 | "\n", 951 | "# Number of steps\n", 952 | "s = 500\n", 953 | "\n", 954 | "# X_0, the initial position\n", 955 | "X_0 = 0\n", 956 | "\n", 957 | "walk = n_random_walk(X_0, s, n)\n", 958 | "walk.shape" 959 | ] 960 | }, 961 | { 962 | "cell_type": "code", 963 | "execution_count": 19, 964 | "metadata": {}, 965 | "outputs": [ 966 | { 967 | "data": { 968 | "image/png": "\n", 969 | "text/plain": [ 970 | "
" 971 | ] 972 | }, 973 | "metadata": { 974 | "needs_background": "light" 975 | }, 976 | "output_type": "display_data" 977 | } 978 | ], 979 | "source": [ 980 | "def show_random_walks(walks):\n", 981 | "    # Summing the steps of each walk (one column per walk) gives its final position.\n", 982 | "    # X_0 is read from the notebook's global scope.\n", 982 | "    final_positions = X_0 + walks.sum(axis=0)\n", 983 | "\n", 984 | "    plt.figure(figsize=(12,4))\n", 985 | "\n", 986 | "    plt.axvline(x=final_positions.mean(), color=\"k\", label=\"Mean: \"+ str(final_positions.mean()))\n", 987 | "    plt.hist(final_positions, bins=20);\n", 988 | "\n", 989 | "    plt.title(\"Distribution of final positions - \" + str(len(final_positions)) + \" simulations\")\n", 990 | "    plt.ylabel(\"Number of simulations\")\n", 991 | "    plt.xlabel(\"X(t_final)\")\n", 992 | "    plt.legend();\n", 993 | "\n", 994 | "show_random_walks(walk)" 995 | ] 996 | }, 997 | { 998 | "cell_type": "markdown", 999 | "metadata": {}, 1000 | "source": [ 1001 | "Let us see how the distribution changes as we increase the number of simulations." 1002 | ] 1003 | }, 1004 | { 1005 | "cell_type": "code", 1006 | "execution_count": 20, 1007 | "metadata": {}, 1008 | "outputs": [ 1009 | { 1010 | "data": { 1011 | "image/png": "\n", 1012 | "text/plain": [ 1013 | "
" 1014 | ] 1015 | }, 1016 | "metadata": { 1017 | "needs_background": "light" 1018 | }, 1019 | "output_type": "display_data" 1020 | } 1021 | ], 1022 | "source": [ 1023 | "walks = n_random_walk(X_0, s, n=1000)\n", 1024 | "show_random_walks(walks)" 1025 | ] 1026 | }, 1027 | { 1028 | "cell_type": "code", 1029 | "execution_count": 21, 1030 | "metadata": {}, 1031 | "outputs": [ 1032 | { 1033 | "data": { 1034 | "image/png": "\n", 1035 | "text/plain": [ 1036 | "
" 1037 | ] 1038 | }, 1039 | "metadata": { 1040 | "needs_background": "light" 1041 | }, 1042 | "output_type": "display_data" 1043 | } 1044 | ], 1045 | "source": [ 1046 | "walks = n_random_walk(X_0, s, n=100000)\n", 1047 | "show_random_walks(walks)" 1048 | ] 1049 | }, 1050 | { 1051 | "cell_type": "code", 1052 | "execution_count": 22, 1053 | "metadata": {}, 1054 | "outputs": [ 1055 | { 1056 | "data": { 1057 | "image/png": "\n", 1058 | "text/plain": [ 1059 | "
" 1060 | ] 1061 | }, 1062 | "metadata": { 1063 | "needs_background": "light" 1064 | }, 1065 | "output_type": "display_data" 1066 | } 1067 | ], 1068 | "source": [ 1069 | "walks = n_random_walk(X_0, s, n=1000000)\n", 1070 | "show_random_walks(walks)" 1071 | ] 1072 | }, 1073 | { 1074 | "cell_type": "markdown", 1075 | "metadata": {}, 1076 | "source": [ 1077 | "From these simulations we can conclude that:\n", 1078 | "\n", 1079 | "The expected final position of a one-dimensional random walk that starts at the origin is 0. Note that this is the mean of the signed position; the mean absolute distance from the origin is not 0 and grows roughly like $\\sqrt{s}$." 1080 | ] 1081 | }, 1082 | { 1083 | "cell_type": "markdown", 1084 | "metadata": {}, 1085 | "source": [ 1086 | "
\n", 1087 | "\n", 1088 | "## Bayes' Theorem\n", 1089 | "\n", 1090 | "\n", 1091 | "\n", 1092 | "\n", 1093 | "Bayes' theorem is a proposition stated by the English mathematician Thomas Bayes and extended by Pierre-Simon Laplace that expresses the conditional probability of a random event A given B:\n", 1094 | "\n", 1095 | "$$P(A|B) = \\frac{P(A)P(B|A)}{P(B)}$$\n", 1096 | "\n", 1097 | "\n", 1098 | "There is also another reading, known as the **diachronic interpretation**. Under this interpretation the expression tells us how to update our belief in an initial hypothesis **H** given the evidence provided by the data __D__.\n", 1099 | "\n", 1100 | "$$P(H|D) = P(H)\\frac{P(D|H)}{P(D)}$$\n", 1101 | "\n", 1102 | "In this interpretation the terms are known as:\n", 1103 | "\n", 1104 | " * $P(H)$: Prior, the probability of the hypothesis before seeing the data.\n", 1105 | " * $P(H|D)$: Posterior, the probability of the hypothesis after seeing the data, which is what we want to compute.\n", 1106 | " * $P(D|H)$: Likelihood, the probability of the evidence under a given hypothesis.\n", 1107 | " * $P(D)$: Normalizing constant, the probability of the evidence under any hypothesis.\n", 1108 | "\n", 1109 | "We also have **Laplace's version**: let $A_1, A_2,..., A_i,..., A_n$ be a partition of $\\Omega$ and let $B$ be any event with $P(B) > 0$ (the same hypotheses as the law of total probability). Then:\n", 1110 | "\n", 1111 | "$$P(A_i|B) = \\frac{P(A_i)P(B|A_i)}{P(B)} = \\frac{P(A_i)P(B|A_i)}{\\sum_{j=1}^n P(B|A_j)P(A_j)}$$\n" 1112 | ] 1113 | }, 1114 | { 1115 | "cell_type": "markdown", 1116 | "metadata": {}, 1117 | "source": [ 1118 | "I recommend the **3blue1brown** videos [Bayes theorem, and making probability intuitive](https://www.youtube.com/watch?v=HZGCoVF3YvM) and [The quick proof of Bayes' theorem](https://www.youtube.com/watch?v=U_85TaXbeIo) for a visual explanation."
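,
"\n",
"\n",
"For reference, a short derivation sketch (assuming only the definition of conditional probability and $P(A) > 0$, $P(B) > 0$): the joint probability can be written in two ways,\n",
"\n",
"$$P(A \\cap B) = P(A|B)P(B) = P(B|A)P(A)$$\n",
"\n",
"and dividing both sides by $P(B)$ gives\n",
"\n",
"$$P(A|B) = \\frac{P(A)P(B|A)}{P(B)}$$"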
1119 | ] 1120 | }, 1121 | { 1122 | "cell_type": "markdown", 1123 | "metadata": {}, 1124 | "source": [ 1125 | "Let us see how the theorem is used with an example.\n", 1126 | "\n", 1127 | "Suppose that on your most recent visit to the doctor you decide to get tested for a disease. If you are unlucky enough to receive a positive result, the next logical question is: **\"Given the test result, what is the probability that I really have this disease?\"** (After all, medical tests are not perfectly accurate.) Bayes' theorem tells us exactly how to compute this probability:\n", 1128 | "\n", 1129 | "$$P(E|+)=\\frac{P(+|E)P(E)}{P(+)}$$\n", 1130 | "\n", 1131 | "As the equation shows, the posterior probability of having the disease (E) given that the test was positive (+) depends on the prior probability of the disease, $P(E)$. Think of this as the incidence of the disease in the general population, which here is only $10\\%$; $S$ denotes being healthy.\n", 1132 | "\n", 1133 | "$$P(E) = 0.1$$\n", 1134 | "$$P(\\neg E) = P(S) = 0.9$$\n", 1135 | "\n", 1136 | "The posterior probability also depends on the accuracy of the test: how often does the test correctly report a negative result for a healthy patient, and how often does it report a positive result for someone who has the disease?\n", 1137 | "\n", 1138 | "$$P(-|S) = 0.75$$\n", 1139 | "$$P(+|S) = 0.25$$\n", 1140 | "$$P(-|E) = 0.25$$\n", 1141 | "$$P(+|E) = 0.75$$\n", 1142 | "\n", 1143 | "Finally, we need the overall probability of a positive result, $P(+)$, which can either be computed with the law of total probability or be given as part of the problem statement."
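,
"\n",
"\n",
"As a worked check with the numbers above (a sketch that assumes $E$ and $S$ are the only two possibilities, so the law of total probability applies):\n",
"\n",
"$$P(+) = P(+|E)P(E) + P(+|S)P(S) = 0.75 \\cdot 0.1 + 0.25 \\cdot 0.9 = 0.075 + 0.225 = 0.30$$\n",
"\n",
"$$P(E|+) = \\frac{P(+|E)P(E)}{P(+)} = \\frac{0.075}{0.30} = 0.25$$"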
1144 | ] 1145 | }, 1146 | { 1147 | "cell_type": "code", 1148 | "execution_count": 66, 1149 | "metadata": {}, 1150 | "outputs": [ 1151 | { 1152 | "data": { 1153 | "image/png": "iVBORw0KGgoAAAANSUhEUgAAAX0AAAD8CAYAAACb4nSYAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4zLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvnQurowAAE/BJREFUeJzt3Xt4VNW5x/HfCwESvKaArRohgKh4kCOYKhevXBSVExXkaoXTIrRYOCrWejsqYqXQivSitiKl3goUekQRo/gAggpSuVZFKkUEjECtUUQKBALv+SOTOOTiTJLJJGR9P8+Th9lr1t7vmnk2v2zW7FmYuwsAEIZ6NT0AAEDyEPoAEBBCHwACQugDQEAIfQAICKEPAAEh9AEgIIQ+AASE0AeAgKTU9ABKatq0qWdmZtb0MCpu1SrpnHNqehQAArVq1arP3L1ZrH61LvQzMzO1cuXKmh5GxZlJR+K4AdQJZrYlnn5M7wBAQAh9AAgIoQ8AAal1c/oAjgwHDhxQbm6u9u3bV9NDCUpqaqoyMjLUoEGDSu1P6AOolNzcXB1zzDHKzMyUmdX0cILg7srLy1Nubq5atmxZqWNU+/SOmU0zs0/N7L3qrgUgefbt26cmTZoQ+ElkZmrSpEmV/nWVjDn9JyX1SkIdAElG4CdfVd/zag99d39d0ufVXQcAEBtz+gASIvOOlxJ6vM0TrozZx8w0ZswYTZo0SZL00EMPaffu3Ro7dmxCxzJ+/HjdddddxdtdunTRsmXLqnzc7du3a/jw4Zo3b16p537yk5/oiiuuULdu3apcJ1qtCH0zGyFphCQ1b968SsdK9IkXr801WBuoCU9kn6gDuTur7fjvxHHsho0aaebsvyh76I1K/1YTbd+5V3v27Itr34r42YPj1XvIjcXbv5+Vk5AaD/9svHpeM7j4WO0zji9+bvTo0Ro+fHjCQ79W3Kfv7lPcPcvds5o1i7l0BABIkurXT9G1g4fq2SceK/Xc53mfacyIIRp8ZTcNvrKb1qxYXtz+w8HXaMDlF2ncHTerV6ez9MXneZKkm4ddp4FXXKxrunfWX/70pCTpVz8fq/x9e9X/sgt05+jhkqROp2dIkm4b+QO9sejV4pr33HKjFuTMVf6+fbpnzI/Vt0cX9e91od5e9kaZ41/w8ovqenH3Mp9r0aKF8vLytGPHjkq9N+WpFaEPAJU1YOgNynl+tr7a9eVh7b+47w5974aRmv7SIk2a8pTu/+lNkqTfT56oc7tcoD+/vETdL+ut7Z/kFu9z/0OPaGbOYs2Yt0jTpz2unV98rpvvHKtGqWmaNf8N/fy3TxxWo1d2H81/cY4k6cD+/frr0iU6/5KemvnUVEnS/y1YpomPTNU9t4xUfok7bnK3btGxxx2vho0alfvaOnbsqKVLl1b+zSlDtU/vmNkMSRdLampmuZLuc/c/VHddAGE4+phj1bvvQE2fNkWpqanF7cvfXKJN//igeHv3V1/p37u/0toVy/XwE89Kkrpe0kPHHvf1lMr0Pz6uRa8Uzq//c/sn2vrRhzo+/Vvl1j7/kh6aeN/t2p+fr6WLF+qc87ooNS1Na1Ys16Dvj5AktTz1NJ148ina8tFGnda2XfG+n326Q+nfavqNr+2EE07Qtm3bKvBuxFbtoe/ug6q7BoCwfW/YSA284iJd1f+64jY/dEhPP/+qUtPSDuvr7mUeY8Vbb2r5m4v19AuvKi2tsYb16638/PxvrNsoNVVZnc/XsiULNf/F59Trqr5FRWKOuVFqmvbnf331f8+YH2vLhnU66aSTlJOTI6nwuxBpJcZf
VUzvADjiHZeerkt7X605M58pbut84SWa+dTX0zF/X/euJKnDdzvp1XmFUzLLlizSri8LP0TdvWuXjj3ueKWlNdZHGzfonTVfL5We0iBFBw4cKLN2r+w+en7WdK1++y11vahwfr7jeV2UM2e2JGnzpo3asS1Xma3aHLZfi1attS13a/H2Aw8/qrVr1xYHviRt2LBB7dq1UyLVirt3ABz55o7qWqP1h4wYpZlPTi3evn3cRI2/+zZd27OrDh48qI7nddY9P5+sH95yu+4YdYPmvzhHWed1VbMTvqOjjjpaXS/urtnPTtO1Pbsqs3Ubte+QVXysvoOHqt+l56ttu/al5vU7X9hN/3vzSF3U83I1aNhQkjRgyDD97M4x6tuji+qnpGjcw4+Vmrtv3PgoZbRoqa0fbVLzlq1KvZ4DBw5o48aNysrKKvVcVVh5/9SpKVlZWV6V/0Slxm7ZnNhbmbeXvtcWqKueyD5R325eOqxqu/35+apXv75SUlL0t1Vv68G7btWs+WXfXVPdFr48T+vfXatRP/1fSYffsjlnzhytXr1aDzzwQKn91q9fr7Zt2x7WZmar3D3mbwiu9AEEZfu2XN028vvyQ4fUoEFD3Tvx1zU2lu6X99aXO8tesKCgoEC33nprwmsS+gCC0qJla8165fWaHkaxPoOGlNner1+/aqnHB7kAEBBCHwACQugDQEAIfQAICB/kAkiI9lNbJPR479ywJWafDi2aqM0ZZ6qgoECt2pyuByY/prS0xhWqM/a2/9H1w29U69PO0NTfTtINo7++Y2bI1Zfq6edf/Ya94/Ovf+7Q/bffpEee/HNc/Xv06KHZs2crPT29yrVL4kofwBGraCG05xa+pQYNGmj2M3+s8DHG/vI3an3aGZKkqY9MPuy5RAS+JD3zxKPqO2hoqfbfPTxBL8yaXqr9+uuv12OPlV45NBEIfQB1QodzO+vjzZskSU9PeVR9undWn+6d9ezU30mS9uz5t0YN7a9+l56vPt0765W5z0mShvXrrXV/W1NjSyiXJTs7WzNmzKj4mxAHpncAHPEKCgq09LUF6npxd73/zlq9MOtPevbFBZK7rsvuqXM6ddUnWzer2bdP1CNPzZKkUksx33znWM18cmqZ384tWkL5gm6XFi+hfPf4SYctofzRxg360XV9NHfJSjWKWu0zniWUS0pPT1d+fr7y8vLUpEmTyrwl5eJKH8ARq+jKfPCVl+g7J2fomoHXa82K5erWq7caNz5KjY86Wt179dbqt9/SqWecqeVvLtbk8fdp9V+X6Zhjj4u7zvmX9NDbS1/X/vx8vfnagsOWUO7dd4Ckw5dQjlZyCeV/rF+n/pddoP6XXaDZz/5Rj00aX7ydl5dX3K86llWWuNIHcAQrmtOPVt56YpmtTtXMlxbrjdde1a8njlPnC7vpRzf/NM46iVtCuU3b/yge8+8enqCTMprrqv6DJUlNmny99k51LKsscaUPoI4557wuem3+S9q7d4/27Pm3Fr0yTx3P7axPd2xXalqaevcZoKEjRuvv7/6t1L7JWEI5Hu6uHTt2KDMzs0L7xYMrfQAJEc8tlsnQ9qz/VHa/wbqud2Ew9xk0RG3btdfSxQs1+cF7Va9ePaWkNNDd4yeV2rcmllAuy6pVq9SpUyelpCQ+ollaOUFYWhmhOVKXVq4JJZdQLk/R0so33XSTsrOz1b172Xf8sLQyANRi37SEclnatWtXbuBXFXP6AJAE5S2hXJbhw4dX2zgIfQCV4vJy75RB9anqe07oA6iULTsPqGDPLoI/idxdeXl5So368ldFMacPoFJ++9cvNFpSi+M/k8lqejh1wvqvYt+Xn5qaqoyMjErXIPQBVMqu/EN68PW82B0Rt80Trqz2GkzvAEBACH0ACAihDwABIfQBICCEPgAEhNAHgIAQ+gAQEEIfAAJC6ANAQAh9AAgIoQ8AASH0ASAghD4ABITQB4CAEPoAEBBCHwACQugDQEAIfQAICKEPAAGJ
K/TNrJeZfWBmG83sjjKen2xmayM/G8xsZ9RzB6Oem5vIwQMAKibmf4xuZvUlPSqpp6RcSSvMbK67v1/Ux91vieo/WlKHqEPsdfezEzdkAEBlxXOlf66kje6+yd33S5op6apv6D9I0oxEDA4AkFgxr/QlnSzp46jtXEnnldXRzFpIailpUVRzqpmtlFQgaYK7P1/GfiMkjZCk5s2bxzfycmxOHVyl/QGgLovnSt/KaPNy+g6U9Bd3PxjV1tzdsyQNlvQrM2td6mDuU9w9y92zmjVrFseQAACVEU/o50o6JWo7Q9K2cvoOVImpHXffFvlzk6TFOny+HwCQRPGE/gpJbcyspZk1VGGwl7oLx8xOl5Qu6a2otnQzaxR53FRSV0nvl9wXAJAcMef03b3AzEZJmi+pvqRp7r7OzMZJWunuRb8ABkma6e7RUz9tJT1uZodU+AtmQvRdPwCA5Irng1y5e46knBJt95bYHlvGfssknVWF8QEAEohv5AJAQAh9AAgIoQ8AASH0ASAghD4ABITQB4CAEPoAEBBCHwACQugDQEAIfQAICKEPAAEh9AEgIIQ+AASE0AeAgBD6ABAQQh8AAkLoA0BACH0ACAihDwABIfQBICCEPgAEhNAHgIAQ+gAQEEIfAAJC6ANAQAh9AAgIoQ8AASH0ASAghD4ABITQB4CAEPoAEBBCHwACQugDQEAIfQAICKEPAAEh9AEgIIQ+AASE0AeAgBD6ABAQQh8AAkLoA0BACH0ACAihDwABIfQBICCEPgAEhNAHgIAQ+gAQEEIfAAJC6ANAQAh9AAgIoQ8AASH0ASAghD4ABITQB4CAEPoAEBBCHwACQugDQEAIfQAICKEPAAEh9AEgIIQ+AASE0AeAgBD6ABAQQh8AAkLoA0BACH0ACAihDwABIfQBICCEPgAEhNAHgIAQ+gAQEEIfAAJC6ANAQAh9AAgIoQ8AASH0ASAghD4ABITQB4CAEPoAEBBCHwACQugDQEAIfQAICKEPAAEh9AEgIIQ+AASE0AeAgBD6ABAQQh8AAkLoA0BACH0ACAihDwABIfQBICCEPgAEhNAHgIDEFfpm1svMPjCzjWZ2RxnPjzGz983sHTNbaGYtop47aGZrIz9zEzl4AEDFpMTqYGb1JT0qqaekXEkrzGyuu78f1W2NpCx332NmIyX9QtKAyHN73f3sBI8bAFAJ8Vzpnytpo7tvcvf9kmZKuiq6g7u/5u57IpvLJWUkdpgAgESIJ/RPlvRx1HZupK08wyS9HLWdamYrzWy5mV1diTECABIk5vSOJCujzcvsaPY9SVmSLopqbu7u28yslaRFZvauu39YYr8RkkZIUvPmzeMaeG20OXVwTQ8BwBHty2qvEM+Vfq6kU6K2MyRtK9nJzHpIultStrvnF7W7+7bIn5skLZbUoeS+7j7F3bPcPatZs2YVegEAgPjFE/orJLUxs5Zm1lDSQEmH3YVjZh0kPa7CwP80qj3dzBpFHjeV1FVS9AfAAIAkijm94+4FZjZK0nxJ9SVNc/d1ZjZO0kp3nyvpl5KOljTbzCRpq7tnS2or6XEzO6TCXzATStz1AwBIonjm9OXuOZJySrTdG/W4Rzn7LZN0VlUGCABIHL6RCwABIfQBICCEPgAEhNAHgIAQ+gAQEEIfAAJC6ANAQAh9AAgIoQ8AASH0ASAghD4ABITQB4CAEPoAEBBCHwACQugDQEAIfQAICKEPAAEh9AEgIIQ+AASE0AeAgBD6ABAQQh8AAkLoA0BACH0ACAihDwABIfQBICCEPgAEhNAHgIAQ+gAQEEIfAAJC6ANAQAh9AAgIoQ8AASH0ASAghD4ABITQB4CAEPoAEBBCHwACQugDQEAIfQAICKEPAAEh9AEgIIQ+AAQkKaFvZr3M7AMz22hmdySjJgCgtGoPfTOrL+lRSZdLOlPSIDM7s7rrAgBKS8aV/rmSNrr7JnffL2mmpKuSUBcAUEIyQv9kSR9HbedG2gAASZaShBpW
Rpsf1sFshKQRkc3dZvZBFeo1lfRZjex7/67K7oswVOX8Qgjut6qcIy3i6ZSM0M+VdErUdoakbdEd3H2KpCmJKGZmK90960jaF2HgHEEsyThHkjG9s0JSGzNraWYNJQ2UNDcJdQEAJVT7lb67F5jZKEnzJdWXNM3d11V3XQBAacmY3pG750jKSUYtVW2aqKb2RRg4RxBLtZ8j5u6xewEA6gSWYQCAgNSZ0E/WUg+x6pjZhWa22swKzOza6hoHgLrDzKaZ2adm9l5116oToR/PUg9mdtDM1prZe2Y228waR9rTzGyJmdU3s0wz2xvpV/QzJNJvgZk1iVVH0lZJ/y1penW+ZtQucZ5f9czsN5E+75rZCjNrGem3wMzSa/ZVoAY9KalXMgrVidBXfEs97HX3s929naT9kn4Uaf+BpOfc/WBk+8NIv6KfpyPtz0h6MFYdd9/s7u9IOpTwV4naLJ7za4CkkyS1d/ezJF0jaWek3zOSbkzymFFLuPvrkj5PRq26EvoVXerhDUmnRh5fJ+mFOGrMlXRFBesgTOWdXydK2u7uhyTJ3XPd/YvIc3MlDUrqKBGkuhL6MZd6KO5olqLC6Zl3I18Wa+Xum6O6tC4xvXOBJEX+cjaQ1CieOghTjPNrlqT/ipxXk8ysQ9F+kfOrUWQKEag2SblPPwliLvUgKc3M1kYevyHpDypcC2VniX4fuvvZ5dT5l6RWMeogTDHPL3fPNbPTJXWL/Cw0s37uvjDS5VMVTv/kJW/YCE1dCf3ipR4kfaLCpR4Gl+izt2SYm9leSakVqHNA0ikx6iBMcZ1f7p4v6WVJL5vZPyVdLako9FMl7U3CWBGwOjG94+4FkoqWelgvaVY8Sz1E/kld38xiBr+ZmaTvqPDDtsPqmNk4M8uO9PuumeVK6ifpcTNjyYlAlTy/zKyjmZ0UeVxPUntJWyLbRefX5poZLWqSmc2Q9Jak080s18yGVVutUL6Ra2a73f3oMtr/IGmGuy8ws0wVhnn00s7T3P03ZpYl6U5375uUAeOIEuf51UuFd4AVfS70tqQb3X0f5xeSJZjQL0/kw7Qx7n59jH6/ljQ3av4ViInzC7VNnZjeqQp3XyPptcgXvL7Je/yFREVxfqG2Cf5KHwBCEvyVPgCEhNAHgIAQ+gAQEEIfAAJC6ANAQP4feK4mL8+Rlq0AAAAASUVORK5CYII=\n", 1154 | "text/plain": [ 1155 | "
" 1156 | ] 1157 | }, 1158 | "metadata": { 1159 | "needs_background": "light" 1160 | }, 1161 | "output_type": "display_data" 1162 | } 1163 | ], 1164 | "source": [ 1165 | "# Prior\n", 1166 | "P_E = 0.1\n", 1167 | "P_S = 0.9\n", 1168 | "\n", 1169 | "# Likelihood\n", 1170 | "P_neg_S = 0.75\n", 1171 | "P_pos_S = 0.25\n", 1172 | "P_neg_E = 0.25\n", 1173 | "P_pos_E = 0.75\n", 1174 | "\n", 1175 | "# Const\n", 1176 | "P_pos = P_pos_S*P_S + P_pos_E*P_E\n", 1177 | "P_neg = P_neg_S*P_S + P_neg_E*P_E\n", 1178 | "\n", 1179 | "import matplotlib.pyplot as plt\n", 1180 | "\n", 1181 | "widths = [P_E, P_S]\n", 1182 | "indices = [widths[0]/2., widths[0] + widths[1]/2.]\n", 1183 | "\n", 1184 | "plt.bar(indices, [1,1], widths, label=\"Negativo (-)\")\n", 1185 | "plt.bar(indices, [P_pos_E,P_pos_S], widths, label=\"Positivo (+)\")\n", 1186 | "plt.legend()\n", 1187 | "plt.yticks([0, P_pos_E, P_pos_S, 1], [0, P_pos_E, P_pos_S, 1])\n", 1188 | "plt.xticks(indices + [0, P_E, 1], ['P(E)', 'P(S)']+[0, P_E, 1])\n", 1189 | "plt.axvline(x=P_E, linewidth=1, color='r')\n", 1190 | "plt.show()" 1191 | ] 1192 | }, 1193 | { 1194 | "cell_type": "markdown", 1195 | "metadata": {}, 1196 | "source": [ 1197 | "Finally, we can obtain the posterior probabilities using the formula:" 1198 | ] 1199 | }, 1200 | { 1201 | "cell_type": "code", 1202 | "execution_count": 70, 1203 | "metadata": {}, 1204 | "outputs": [ 1205 | { 1206 | "name": "stdout", 1207 | "output_type": "stream", 1208 | "text": [ 1209 | "0.25 0.7499999999999999\n", 1210 | "0.03571428571428571 0.9642857142857143\n" 1211 | ] 1212 | } 1213 | ], 1214 | "source": [ 1215 | "P_E_pos = P_pos_E*P_E/P_pos\n", 1216 | "P_E_neg = P_neg_E*P_E/P_neg\n", 1217 | "P_S_pos = P_pos_S*P_S/P_pos\n", 1218 | "P_S_neg = P_neg_S*P_S/P_neg\n", 1219 | "\n", 1220 | "print(P_E_pos, P_S_pos)\n", 1221 | "\n", 1222 | "print(P_E_neg, P_S_neg)" 1223 | ] 1224 | }, 1225 | { 1226 | "cell_type": "markdown", 1227 | "metadata": {}, 1228 | "source": [ 1229 | "Given the positive test result, what is the probability that you really have this disease?\n", 1230 | "\n", 1231 | "It is only **25%**." 1232 | ] 1233 | } 1234 | ], 1235 | "metadata": { 1236 | "kernelspec": { 1237 | "display_name": "Python 3", 1238 | "language": "python", 1239 | "name": "python3" 1240 | }, 1241 | "language_info": { 1242 | "codemirror_mode": { 1243 | "name": "ipython", 1244 | "version": 3 1245 | }, 1246 | "file_extension": ".py", 1247 | "mimetype": "text/x-python", 1248 | "name": "python", 1249 | "nbconvert_exporter": "python", 1250 | "pygments_lexer": "ipython3", 1251 | "version": "3.7.3" 1252 | } 1253 | }, 1254 | "nbformat": 4, 1255 | "nbformat_minor": 2 1256 | } 1257 | --------------------------------------------------------------------------------