├── .gitignore ├── Pipfile ├── README.md ├── data └── immo_data.zip ├── environment.yml ├── notebooks ├── 1_Introduction.ipynb ├── 2_Starting_simple.ipynb ├── 3_Did_it_converge.ipynb └── 4_Beyond_linear.ipynb ├── requirements.txt ├── solutions ├── 2_Starting_simple.ipynb ├── 3_Did_it_converge.ipynb └── 4_Beyond_linear.ipynb └── src └── utils.py /.gitignore: -------------------------------------------------------------------------------- 1 | */.ipynb_checkpoints 2 | .ipynb_checkpoints 3 | Pipfile.lock 4 | .idea 5 | *.csv 6 | */__pycache__/ 7 | 8 | -------------------------------------------------------------------------------- /Pipfile: -------------------------------------------------------------------------------- 1 | [[source]] 2 | name = "pypi" 3 | url = "https://pypi.org/simple" 4 | verify_ssl = true 5 | 6 | [dev-packages] 7 | 8 | [packages] 9 | jupyterlab = "*" 10 | pymc3 = "*" 11 | scikit-learn = "*" 12 | numpy = "*" 13 | pandas = "*" 14 | arviz = {editable = true,git = "https://github.com/arviz-devs/arviz"} 15 | matplotlib = "*" 16 | scipy = "*" 17 | bokeh = "*" 18 | seaborn = "*" 19 | ipykernel = "*" 20 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # PyLadies-Bayesian-Tutorial 2 | 3 | Repository with notebooks (and solutions) for my Bayesian tutorial at the PyLadies Meetup Feb 11, 2020. 4 | 5 | # Setup 6 | 7 | ## Download the code from Github 8 | The recommended way to download the code is through git: 9 | 10 | ``` 11 | git clone https://github.com/corriebar/PyLadies-Bayesian-Tutorial.git 12 | ``` 13 | This will download all code and create the folder `PyLadies-Bayesian-Tutorial` in your current folder. 
14 | 15 | ## Install all packages 16 | 17 | ### Using Conda (recommended) 18 | 19 | To install the packages using conda, use the following command: 20 | ``` 21 | conda env create -f environment.yml 22 | ``` 23 | To activate the environment and start the notebook from it, run 24 | ``` 25 | conda activate PyLadies-Bayesian-Tutorial 26 | ipython kernel install --user --name=$(basename $(pwd)) 27 | jupyter lab 28 | # or jupyter notebook 29 | ``` 30 | 31 | ### Using Pipenv 32 | 33 | To install pipenv, run 34 | ``` 35 | pip install pipenv 36 | ``` 37 | Then install the necessary packages, using 38 | ``` 39 | cd PyLadies-Bayesian-Tutorial 40 | pipenv install 41 | ``` 42 | To activate the environment and start the notebooks from it, run 43 | ``` 44 | pipenv shell 45 | python -m ipykernel install --user --name=$(basename $(pwd)) 46 | jupyter lab 47 | # or jupyter notebook 48 | ``` 49 | Then, inside jupyter, pick the according kernel for the notebooks. 50 | 51 | ### Using Pip 52 | 53 | You can also install the packages from the `requirements.txt` file using pip: 54 | ``` 55 | pip install -r requirements.txt 56 | ``` 57 | 58 | ## Check that it works and extract the data 59 | Open the notebook `1_Introduction.ipynb` in the folder notebooks and try to run the first cell. If it can load all the packages and runs without problems then you should be good to go for the rest of the tutorial! 60 | 61 | 62 | # The tutorial 63 | The tutorial consists of four notebooks: 64 | 65 | - [Introduction](notebooks/1_Introduction.ipynb) which contains some installation checks & extracts the data as well as short motivation why we'd want to use Bayesian methods. If you already know why to use Bayesian methods then this can easily be skipped (except for the installation cell). 66 | - In [Starting simple](notebooks/2_Starting_simple.ipynb), we have a short look at our data and the start constructing a linear regression in PyMC3. 
We then learn how to understand your prior and experiment with different priors. 67 | - In [Did it converge](notebooks/3_Did_it_converge.ipynb), we then finally run our first model and check if everything went well. We'll also have a first look at the results. 68 | - To go [beyond linear](notebooks/4_Beyond_linear.ipynb), we then extend our linear model by adding some hierarchies. 69 | 70 | The notebooks in the notebook folders contain small exercises and some missing code. 71 | If you prefer to just tag along with the tutorial or get lost at some point, the full notebooks can be found in [solutions](solutions). 72 | 73 | -------------------------------------------------------------------------------- /data/immo_data.zip: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/corriebar/PyLadies-Bayesian-Tutorial/2cb0403fdd5706c10f1d5459a45de28e15dabf59/data/immo_data.zip -------------------------------------------------------------------------------- /environment.yml: -------------------------------------------------------------------------------- 1 | name: PyLadies-Bayesian-Tutorial 2 | channels: 3 | - defaults 4 | dependencies: 5 | - python=3.7.6 6 | - pymc3=3.7 7 | - scikit-learn 8 | - matplotlib 9 | - bokeh 10 | - seaborn 11 | - ipykernel 12 | - jupyterlab 13 | - pip: 14 | - arviz==0.6.1 15 | prefix: /home/corrie/anaconda3/envs/PyLadies-Bayesian-Tutorial 16 | 17 | -------------------------------------------------------------------------------- /notebooks/1_Introduction.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "Just to check if all packages we'll use can be loaded and unzip the data if you haven't yet:" 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": 28, 13 | "metadata": {}, 14 | "outputs": [], 15 | "source": [ 16 | "import numpy as np\n", 
17 | "import pandas as pd\n", 18 | "import matplotlib.pyplot as plt\n", 19 | "import pymc3 as pm\n", 20 | "import theano\n", 21 | "import arviz as az\n", 22 | "import sys\n", 23 | "sys.path.append('../src/')\n", 24 | "\n", 25 | "\n", 26 | "# these packages are only for the intro, so not super necessary\n", 27 | "import scipy.stats as stats\n", 28 | "from sklearn.linear_model import LogisticRegression\n", 29 | "\n", 30 | "\n", 31 | "import zipfile\n", 32 | "from os import path\n", 33 | "# extract data if csv file doesn't exist yet\n", 34 | "if not path.exists(\"../data/immo_data.csv\"):\n", 35 | " with zipfile.ZipFile(\"../data/immo_data.zip\", 'r') as zip_ref:\n", 36 | " zip_ref.extractall(\"../data/\")\n", 37 | "\n", 38 | "# this should work after successfully extracting the data\n", 39 | "from utils import iqr" 40 | ] 41 | }, 42 | { 43 | "cell_type": "code", 44 | "execution_count": 2, 45 | "metadata": {}, 46 | "outputs": [], 47 | "source": [ 48 | "import numpy as np\n", 49 | "import matplotlib.pyplot as plt\n", 50 | "\n", 51 | "from sklearn.linear_model import LogisticRegression\n", 52 | "import scipy.stats as stats" 53 | ] 54 | }, 55 | { 56 | "cell_type": "code", 57 | "execution_count": 3, 58 | "metadata": {}, 59 | "outputs": [], 60 | "source": [ 61 | "plt.style.use('fivethirtyeight')" 62 | ] 63 | }, 64 | { 65 | "cell_type": "markdown", 66 | "metadata": {}, 67 | "source": [ 68 | "# Introduction: Motivation\n", 69 | "Before we start doing some Bayesian modelling, let's shortly think why we would want to use Bayesian methods.\n", 70 | "\n", 71 | "One advantage of Bayesian methods is how they deal with __Uncertainty__. Whereas traditional methods only give a point estimate, Bayesian methods return a probability distribution. From these probability distributions, we can deduce credibility intervals (the Bayesian equivalent of confidence intervals) but also allows us to make statements such as \"the real parameter is larger _x_ with a probability of _p_). 
\n", 72 | "\n", 73 | "Knowing how uncertain our model is, is especially important when we have rather little data or need to make high stake decisions (or both).\n", 74 | "\n", 75 | "We can visualize the differences regarding uncertainty between Bayesian and traditional methods using the very traditional (albeit boring) coin toss example.\n", 76 | "\n", 77 | "We're tossing a (biased) coin _N_ times and want to estimate the probability that it comes up head. This is equivalent to predicting the outcome of the next coin toss.\n", 78 | "\n", 79 | "As non-Bayesian method, I'll use the `LogisticRegression()` classifier from `sklearn` on the previous recorded coin tosses. Since we don't have any other predictors except the past tosses, it basically predicts $\\frac{\\#heads}{N}$. The Bayesian method will instead return a probability distribution over the interval $[0,1]$." 80 | ] 81 | }, 82 | { 83 | "cell_type": "code", 84 | "execution_count": 12, 85 | "metadata": {}, 86 | "outputs": [], 87 | "source": [ 88 | "p = 0.75\n", 89 | "N = 4\n", 90 | "\n", 91 | "# result of N coin tosses\n", 92 | "y = np.array([1]* int(N*p) + [0]*(N - int(N*p)))\n", 93 | "# dummy predictors\n", 94 | "X = np.ones((y.shape[0],1))\n", 95 | "X_test = [[1]]" 96 | ] 97 | }, 98 | { 99 | "cell_type": "code", 100 | "execution_count": 13, 101 | "metadata": {}, 102 | "outputs": [ 103 | { 104 | "data": { 105 | "text/plain": [ 106 | "array([[0.24999865, 0.75000135]])" 107 | ] 108 | }, 109 | "execution_count": 13, 110 | "metadata": {}, 111 | "output_type": "execute_result" 112 | } 113 | ], 114 | "source": [ 115 | "model = LogisticRegression()\n", 116 | "\n", 117 | "model.fit(X, y)\n", 118 | "\n", 119 | "model.predict_proba(X_test)" 120 | ] 121 | }, 122 | { 123 | "cell_type": "markdown", 124 | "metadata": {}, 125 | "source": [ 126 | "As expected, the classifier estimates the probability for head as 75%. However, this is based on only 4 coin tosses! 
Seeing 3 tosses coming up head and one tail, I still wouldn't be very convinced the coin is biased.\n", 127 | "\n", 128 | "Let's check the results for the Bayesian method:" 129 | ] 130 | }, 131 | { 132 | "cell_type": "code", 133 | "execution_count": 14, 134 | "metadata": {}, 135 | "outputs": [ 136 | { 137 | "data": { 138 | "image/png": "iVBORw0KGgoAAAANSUhEUgAAAh0AAAFmCAYAAADJbSh4AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjMsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+AADFEAAAgAElEQVR4nOzdeVxU9frA8c+ZYd9BFMQNBHHHcjctMfW6t6hoWppmt9S0bLGsfiaVmnlNW0w0S81yya1rWu47uSTeMs19AUVQwQ2QHc7vj5GBEURA4DDD83695nU7Z87ynO+VmWe+q6KqKkIIIYQQZU2ndQBCCCGEqBwk6RBCCCFEuZCkQwghhBDlQpIOIYQQQpQLSTqEEEIIUS6sNL6/DJ0RQgghLJNy9w6p6RBCCCFEuZCkQwghhBDlwqKTjoiICK1DqDSkrMuPlHX5kvI2pSj5asxLjZR1+dGqrC066RBCCFG6JDEQD0KSDiGEEEKUC0k6hBBCFFnLli21DkGYMUk6hBBCCFEuJOkQQgghRLmQpEMIIUSRTZo0SesQhBmz6KTDw8ND6xAqDSnr8iNlLbQUGhqqdQjCjFl00qHTWfTjVShS1uVHylpoycfHR+sQhBmTTy8hhBBFFhsbq3UIwoxJ0iGEEEKIcqH1KrNCCCEqGFVVuZKcRcztDGKSMrmUlEnM7Qwys8GnfhCfHbqGvZWCk7WOem42NPW0w8lGfsOK+5OkQwghKrn0LJX/XU0hPCaZ8Esp/B6bTHxKVsEHj1zGW3uu5Nvt72pNs6p2tKhmx9MBLjT0sC3jqIU5kqRDCCEqofQslY2RSSw5cYv15xNJzlSLduJPk2Dgh/l2n72VwdlbGaw5k8j7e+NoUsWWkHouDAh0oYEkIOIOSTqEEKIS2R+bzPfHbrHidALXU+9RmwHY6RWq2OtxtdHjZqvD1VaPtQ5+2b+Kzm/NICNb5XZGNjG3M7mSnEn2XTnL0WtpHL0Wx6T9cbT2smNim6r08nMq01VqRcUnSYcZGjZsGHZ2dsydO1frUMrdwoULeeGFF9ixYwfBwcFahyOEWVBVlQ2RSXxyMJ7wmJQCj/G00xPgZoO/mw0BrtZ4O1qhKyBB+AUYEOhisi8jSyU2OZOLiRkcvZbGkfhUMrJz3//jSip9frlIi2p2fNCmKn3qSvJRWUnSIcxGXFwc48ePL/Lxf/31F+Hh4YwZM6YMoxKi4srKVll5OoFpB+M5HJ+W7313Wx2tve1p421PDSfrEt/HWq9Q29ma2s7WtPdxIDUzmyPX0vjflVT+vpZK5p0E5NDVVJ5cd5GHq9ox8zEvgms5lviewjxJ0iHMxhtvvEGdOnVITEws0vF//fUXs2fPlqRDVErbL97mtZ2XOXrNNNnQKdDG255HqtsT4GZTYG1GYT7d+s99j7Gz0tHKy55WXvbcSstiU9Rtdl+6baz9+DMulU6roxgd5M6nHbxk5EslIkmHMAtbt25l2bJlHDhwgPbt22sdjhAV1vlb6by5+wo/nzVNzm10Ch1q2NO1thMedvoSXz/q2GHcqlUv8vGutnoGBLrQrY4jmy/cZld0bvIx5+8b/BaZxHddfXhcaj0qBUkvK5idO3fSpUsXqlatioODAx06dGDTpk0FHrt+/XqaN
WuGvb09zZs3Z+PGjSbvX716lddffx0/Pz8cHByoX78+n3zyCenp6SbHHTx4kK5du+Ls7IyTkxOdOnViz549JscEBwczbdo0fvjhB/z9/VEUhfDw8AL7VWRlZdGiRQv+97//Fese95KSksLIkSMZM2YMLVq0KNI5wcHBDB8+nJMnT6IoCsOGDTO+V5QyLkrZlWb5FvVaQtxLSmY2E/depeHisyYJh61eoaevE5+0r8bAQNcHSjgA5rw6uETnudrqCannwuRHqhHkmTuaJTIhg86roxi9PZa07EIuICyC1HRUIJs2baJ3797069ePsLAwsrOzWblyJT169GD58uUMGDDAeOyWLVtYuHAhY8aMYeLEiWzYsIE+ffqwe/du2rVrR3p6Oh06dMDBwYH33nsPLy8vTp48yYwZMzhz5gzfffcdAIcOHaJjx4507tyZuXPnkp2dzZYtW/jXv/7F5s2befTRR433XL16NceOHWPixIk89NBD+Pn5sWfPHiIiIvD09DQet2vXLv755x8CAgKKfY+CfPTRR6SlpTF58uQil+X06dP56aefWLFiBfPnz6dGjRpFLuOilF1plm9RryXEvRy8nMLQTZc4ccM0SW3rbc/TAc642T5YolGa3Gz1jA5y58DlFH46lWAcqhv29w3CnW3Z0jgTL0f5arJU8v9sBaGqKmPHjmXkyJF89dVXxv0DBgzg9ddf55133iEkJMTY4/vcuXP8+OOPPPvsswD079+frKwsPvroIzZs2MCff/7J6dOnuXr1KlWrVjVer23btixYsMC4PW7cOPr27cuPP/5o3DdkyBACAgIYP348+/fvN+6PiIhgxYoVhISEGPcFBwezbNkyxo4da9y3YsUKunfvjouLS7HvcbcjR44wY8YMVq5ciZOTU5HLs3Xr1hw7doxff/2V7t27A0Uv46KUXWmWb1GvJcTdMrJUPv4jjql/xJOVZ8hqHWdrBtZ3wd/VRrvgCqEoCm2rO9DQw5YlJ24ZO7keSdTTevl51j9Zi6aedhpHKcqCoqpFnBCmbJTpzSMjI/H19S3LW5SaP//8kxYtWnDp0iWqVzdtL718+TI+Pj4cPnyYpk2bMmzYMC5fvpyvOeXgwYO0adOGpKQkEhISqFOnDq+88grjx4/Pd00wVOl7eXlx8OBBGjRoYPJeXFwc/v7+xMfH4+HhQXBwMMnJyfzxxx8mx3377beEhoaye/du6tatS1ZWFtWrV+eLL75g0KBBxbrH3bKzs2nfvj3e3t78/PPPxv12dnZs3LjxvkNmFy1axLRp0zhx4kSxyrhq1ar3LbvLly+XWvmmp6ff91p5mdO/a0sQERFBy5YttQ4jn3+upTJ0Uwz/u5pq3GerV+gf4EyHGg7F7iBaVLtXLeKx/sNK7XqqqrL9YjIrTycYvxCcrHUs71mDXn7OpXYfYaqc/l3n+0dYpD4diqLoFEV5TVGUk4qiJCmKsltRlNaFHG+tKMqXiqLcvPOarShKycdjVQKnTp3Cy8urwC8cb29vPDw8OHPmjHFfQV86DRs2RFVVoqKi8Pb2Zs2aNWzevBkfHx+aNWvGu+++a/wCBkNtCUCrVq1wdnY2edWtWxdVVY3HAPTq1SvfPfv160dcXJwxGdm+fTtJSUn06dOnRPfIa+7cuRw5coRp06aRlJRkfIGhn0dSUhLFSZqLWsZFKbvSLN+iXEuIvL4/dpOWS8+bJBwBbjZMbOPJYzUdyyzhAEo14QBDrUfn2o680swdG8Xw95yUkc0Tv1xk9l/XS/VeQntFbV6ZAnQDngNOAX2A9YqiNFdVNbqA4z8FAoEmGDKdH4APgIkPHLGF0ul0hX6BZmRk4OxceNaflmaooszMzAQMSUKvXr04duwYO3bsYNOmTTz00EO89dZbTJ48Gb3e0M7722+/3fPagYGBxv+2tc0/lbG7uzvdunVj7dq1PPPMM6xYsYKePXsam0KKe4+8Vq5cye3bt/PVEgD07NkTgPPnzxf5V39xy
vh+ZVeUY4rz7EW5nxBpmdmM23WFuUduGPdZKfCkvzNdapdtspHj5SAP5v1d+slAU087htTM4L9x9lxLzSJbhbE7L5Otqrz6cJVSv5/QRlGTjheBYFVVcwZo/6goShegBzA/74GKotgBI4H6OQmJoigjgd2KonygatyeU1EFBgZy5coVLl68SK1atUzeO3LkCAkJCff8cs5x6NAhrK2tCQgIIDExkbS0NDw9PWnUqBGNGjXilVde4ddff6VPnz688cYbBAYGotMZKrs6dOhgcq309HTi4uKM/TIKM3jwYEaNGkVycjI///wzc+bMMXmukt7jyy+/5NatW/n2d+7cmRkzZvDwww/ftykir6KWcVHKztrautTKtyj3K6j5SVQuFxMz6L/+In9cya3dqO5gxb+buj3QxF4VSTVblQmtqjDn8A3OJ2QA8NquK9jqdbwc5K5xdKI0FHXIbAfg2F379IB9Ace2BE6qqnoxZ4eqqieATKBOSYKsDIKCgvD392fKlCn53ps0aRIdO3akdu3axn07d+40+ULOzMxk6tSpPPXUU9jb27Nq1SqaNm3KjRs3TK5lZZWbZ7q6utKrVy8++OADYy1JjilTptCjR48ixf7EE0+Qnp7O+PHjSU1NpXfv3qVyj6ZNm9KhQ4d8L0VRjO8VVPuS91mzs3PH4BW1jItSdqVZvkW5lqjcdl68TfOl50wSjhbV7JjQqorFJBw5XGz0jHvYA3/X3OcauT2Whf/c1DAqUVqK9KmmqurJvNuKogRgaGIpqLnEGyioyeXSnfciixdi5aAoCl9//TW9e/fm+vXrhISEkJWVxeLFiwkPD2fv3r0mx586dYr27dvz7rvvotfrWbRoEX/++ScREREADBo0iHnz5tGqVSvGjh1LnTp1+Pvvv5k1axYvv/yy8ZfzrFmzaNu2LY888ggvv/wyzs7OrFu3jtWrV7N169Yixe7g4ECXLl0ICwsjJCQEBwcHk/dL4x4l0aRJEyIjI1mwYAGBgYF06NChSGVclLIrzfIt6rXOnDnDmTNncHNzk46klcjiYzd5cWuMcUItnQL9AlzoXMtBk/VLmj7WrczvYWelY+xDHnz+53Ui79R4jNgSg41e4dkGrmV+f1F2ij16RVEUL2AbsEBV1ZkFvD8IeFJV1Wfu2n8AeF1V1bzfnsab53xZFsTDw8NYTW3p/vzzTz7//HP+/vtvdDodDz/8MG+//bZJ08pbb72Foih4eXmxevVqbty4QdOmTZk4cSJBQUHG41JTU5k/fz6rV68mLi6OgIAAhg4dSr9+/UzuGRkZyfTp0zlw4ACJiYnUr1+f9957j3bt2hmPeeaZZ+jYsSOjRo0qMO6tW7fy73//mzlz5hRYe1GUexRV/fr1+f7772nbtu19j50zZw7z5s3D09OTbdu2AUUr46KUXWmWb1Gu9fnnn/PFF1/wn//8h/79+9/32bOzs7l+XTrimStVhW8vWvHNhdxhr456lae9M6hlXzlaqVOyYFmMNVfSDJ//OlQ+bZhOcJV7r44rtHXXiJh8WXGxkg5FUZoA64D5qqpOvccxnYE3VVXtedf+U0BfVVWP5tktQ2YthJR1+fDy8mL8+PG89dZbWodSaWgxZDY9S+XlbTEsOpbbhOrjaMXYhzweeEbRBzV7zCDGzF5WJteOjIrEt46vyb6k9Gxm/nmNS0mGDvIOVgr7BvoRVFXm8XgQFXrILICiKJ2ALcC790o47ogCGt11ri3gc+c9IUQJREREkJiYSJ060jXKkiWkZdFr7QWThKOhhw1vt6yiecIBcGR3wcsylBUnGx3jHvagqr3h2ZMzVZ5cd5H4lMxyjUOUjqLO09EM+B7oo6rq8sKOVVX1DJCkKErzPLs7AidUVS3a8qBCiHxmz57NwoULadWqldahiDJyPTWLLmui2HrhtnFfu+r2jG3mgb1V5WhiLoiLjWHqdDu94YdzZEIG/ddHk5FVOZqZLElR/xV/BUwF/lIUxSrPSwdw5
7/zVqPMAb5VFCVQURTvO+d+VqqRC1HJzJgxg4EDB2odhigjV25nErwqkoN5Rqj08XPi+Yau6HXl32G0ovFxsuaFxm7G+vpdl5J5bddlTWMSxXff0SuKorgCj955hd319lpFUcYB54HhwKI7+8OA+sBfwA3gS1VVy6QRMHTfVT48EF/IEXeP9C1bk9p4EtquWrneU1QOeRfVE5YlOjGDzmuiOHVnwTYFGFTfhY41K95y72UxMVhRNatqx5P+zvz3ziq6YX/foJmnnczhYUbuW9OhquotVVWVe7yeUlU18s5/L8pzjqqq6muqqjqoqlpDVdVPy/QphBDCTJ29mc6jKyNNEo5hjVwrZMIBhrVXtNS9jiOtvHI7kY7ZGcvByykaRiSKo/I2EpqREydO0KdPH1xdXQkICCA0NDTfZFMFGTZsGCNHjiyHCIsmODiYadOmaR2GEBXG2ZvpdFwVaZyLQq/AS03daFvd4T5namfJR29oen9FURja0I3azoaK+sxseG7TJW5nZN/nTFERmP2Uh6Htqt2zOcMShnEeP36cdu3a8cgjjzBv3jxiY2OZOnUqR48eZdWqVVqHJ4QooaiEdB5fHWUcCmqtg5FN3WkiS7rfl41e4eWm7nx8IJ7ULJVTN9J5a/cVwjoXfVkEoQ2zTzos3ccff8wjjzzCr7/+apx9sHXr1nTo0MEikiohKqNLSRk8vjqKC4mGGg5rHYxp5kEDj3tP6y9MedpbMTDQhe+PG4YWzz1yg15+TvSuW/jCmEJb0rxSwTVu3Ng4A2mO9u3bo9friY2N1TAyIURJXL6dyeOrozh3y5BwWCkwKsh8Eo7RXy7VOgSjdtXtaZ5nkrARW2O4mizzd1RkknRUcO+//z6PP/64yb7jx4+TlZVFzZo1i3SNXbt20bJlS5ydnXn00Uf5448/8h1z8OBBunbtirOzM05OTnTq1Ik9e/bkO27z5s107NgRd3d3vL296devH2fPns133IIFC2jQoAE2NjbUqFGD1157jdTUVJNjzp07x4gRI6hVqxaOjo4EBQUxd+7cQpefF8KcxSVn0nl17igVnQIvNXWncRXzSDgA6jRqpnUIRoqi8GxDV1xtDF9lV5OzeHFrjHyGVGCSdJiR69evs337dgYPHkynTp3yLc9ekN27d9O1a1caN27MokWLaNSoEcHBwRw7ljuU+NChQ3Ts2BE7Ozvmzp1LWFgYtWrV4l//+pdJ4rF06VJ69uxJw4YN+eabb/jPf/5DXFwcnTp1IiEhwXjc4sWLGTlyJH379mXp0qX079+fsLAwDhw4YDwmLi6ONm3acPbsWSZPnszSpUvp168fb731FpMnTy6lEhOi4khMz6LHfy9w7LqhE7hOgX83caOZmU3n/U6XxlqHYMLJWsfzjdyM2+vOJTH/qKxIW1FJnw4z0rx5c6KiDDPJr1mzpkjnHD9+nE8++YQJEyYA0K9fP2JjY/n666/5+uuvARg3bhx9+/blxx9/NJ43ZMgQAgICGD9+PPv37wdg5syZvP3220ydmjsL/oABAwgICGDLli0EBQWRlZXFpEmTmD59OuPGjQOgf//+9O3bly5duhjP2759OxkZGWzdutW4hPuTTz5JkyZNTJITISxBepZK3/XRHLpqqO1TgOGN3GhezV7bwCxE4yq2dKrpwI7oZADe2nOF3n5O+DhZaxyZuJvUdJiR77//nrCwMFq1akVISAibN2++7zldunQxJhw5mjdvzrlz5wC4evUq4eHhjBs3jqSkJJPXkCFD+OOPP4wrlUZERJgkHCkpKWRkZFC/fn2io6MBwwquFy9e5MUXXzS5Z8eOHWnfvr1xu169eiQmJvLZZ5+ZrITar18/pk+fXsySEaLiylZVnt90yWRq88ENXGntLQlHaeob4IKXg2F9lsT0bN7cfUXjiERBJOkwIx07dmTkyJHs37+fHj16MGnSpPue4+/vX+D+nDbPnOSjVatWODs7m7zq1q2LqqrGY1RVZcGCB
bRt2xY7OzscHBxwdnZm27ZtxuudP38eHx8fnJycCo2refPmfPfdd8ydO5eqVavSpk0bPv74Yy5cuFDk8hCiolNVlXG7rrD8VG7z4xN1nXisRsWdh+N+OvQbqnUIBbLRKwyu72rcXn4qga0XkjSMSBREkg4zpNPpeOmll/j7778f+Fp6veGXwW+//caePXsKfAUGBgLw7rvv8uqrr9K+fXuWL1/Orl272LNnD23atDFez9ramuzsok3SM2zYMM6dO8fBgwcZMGAAO3fupH79+oSF3T3bvhDm6ZOD8Xz1V25NXnBNB3r6Fp6QV3RDJn2udQj31MDDltZ5Zit9Zcdl0jJl0rCKRPp0VHADBw5k/PjxtGzZ0mR/cnIydnYP3gEtMDAQnc6Qe3bo0MHkvfT0dOLi4nBxcSE9PZ1Zs2YZO3zm5eaW24mrcePGxMTEEBMTg4+Pj8lxeZORGzduAODu7k7z5s1p3rw5b775Jl9//TUTJkxg1KhRD/xsQmjph+M3eX9vnHG7RTU7Bga6YLo2pvmZMrAT7/+0Q+sw7ql/PRf+jk8zThr2n0PX+L82VbUOS9whNR0VXFxcXL6pwzMzM/n888/p1q3bA1/f1dWVXr168cEHH+SbWn3KlCn06NEDgKSkJNLT02nQoIHJMWvXriU8PNy4Xa9ePYKCgvL1y/jll1/4/fffjduff/45wcHB+e5pZWVl9h/KQuyKvs2ILTHG7QbuNgxv7IbOAv5tXzh+WOsQCuVqq+cp/9wJwqb8Ec+5W+kaRiTykpqOCm7mzJk8+uijdO3alaFDh5Kenk5YWBjR0dEsX768VO4xa9Ys2rZtyyOPPMLLL7+Ms7Mz69atY/Xq1WzduhUADw8PGjVqxKhRo3jllVewtbVlzZo1LFu2DFdXV5PrzZgxgx49epCZmUmXLl3Yt28fX331lclxY8eOZcmSJbRt25ZRo0bh6enJ/v37+fLLL/nwww9L5bmE0MLJ62k8ve4iOUuB1HCyYmSQO9ayPH256VjTgb2xyVxIzCQ1S2Xsjsusf7KW/KCpAKSmo4J76KGH2LdvHzY2NowcOZI333yTgIAADhw4QJ06dUrlHv7+/uzbt4+6devy/vvv8/zzz3Py5Ek2btzIo48+ajxu+fLl2NjY8OKLLzJ48GCuXLlCeHg4bdu2Nblely5d+PXXXzlw4ABDhgzh999/Z9u2bQQFBRmP8fT0JCIigscff5wpU6YwbNgwDhw4wKpVq3jnnXdK5bmEKG9xyZn0XHuBG2mGjMPFRseYZh7YW1nOR61rVW+tQ7gvnWLoVJqTYvwWmcQv56RTaUWgaDxzW5neXNYmKT9S1uVHyrp8RURE5OtTVZDUzGw6r45ib6xhmXUbncJbLapQx0XmiiiqyKhIfOv4ltr1lpy4xe5Lhrk76rvbcHSIP1ZS4wQU/d/1A8pX2JaTfgshhEZUVeWFLTHGhEMBRjRxs8iEY92cafc/qIJ40t8ZO73he+/kjXQW/iMzlWpNkg4hhHhAn0ZcY9nJ3Lk4+tdz5iEzm968qNbPNZ/J+5ysdXTLM0Q5dH8cyRkyhFZLknQIIcQDWHcukfd+v2rcfqyGA51rOWoYkcircy0H44JwMbcz+TLPvCmi/EnSIYQQJXTsWhrPbrxk7JxWz83GIubisCS2eh29/XJrO6YdjOdaSqaGEVVuknQIIUQJ3EjN4sl1F0lMN1TXV7HT83JTN4vvqPje8u1ah1Bs7X0cjOuy3ErP5pOD8RpHVHlJ0iGEEMWUma0y8Ldoztw0TDplo1MYFeSOs41e48hEQfQ6xWTCsNmHb3AhIUPDiCovSTqEEKKY3v39KlvyrBo7rJErtZwtb6RKQaY+87jWIZTIw1Xt8L0zmigtS2XS/qv3OUOUBUk6hBCiGFadTmDGoWvG7V6+TrTwkmXqKzpFUegXkFvb8f2xW5y8nlbIGaIsSNIhhBBFdPx6GsM3566p0tTTlt51zXvV2Mok0N2WRh62gGFmy
k8jrhV+gih1knQIIUQRJKZn0XfdRZLuzPNQ1V7PC40sYxG34ug98m2tQ3ggvfKMZPnhxE2iEmQxuPIkSYcQQtyHqqoM3xzDiRuGLyhrHYxs6o6DdeX7CO0zeoLWITyQADcb6rnZAJCZDf85JLUd5any/cUIIUQxffa/a6w+k2jcfq6BKzUrScfRu73duZHWITywnnlmKf326E0u35Z5O8qLJB0VyKJFi1AUJd/L0dGR4OBgdu/eXaLrhoaG0r1791KOtvwsWrSIBg0aaB2GqKT+d0vHhPDckQ7BNR1oW91Bw4i0dSvustYhPLCGHjbUcc4dyTLrf1LbUV4k6ahgatasyYYNG0xeYWFhODo60rlzZ3bs2FHsaz733HNMmTKl2OdFRkYyefLkYp/3IAq6Z9euXfn22281j0NUPleTM3n/pA1Zd6YcretqTUg9F22DEg9MURR65KntmPP3DW6kZmkYUeUhSUcF4+joSPfu3U1eQ4cO5ddff6Vv3768/XbxO3EFBATQokWLYp9XUZKOGjVq0KFDB83jEJVLVrbKcxsvEZ9u+Jh0stbxUhN3i59x9H5qN2ymdQilollVW3wcrQBIysjmK1mTpVxI0mFGxowZw6FDh0hOTtY6FCEs3tSD8SYTgA1v7Iq7ncw4+v5Pxa9trYh0ikL3PLUdX/x1naR0WYG2rEnSYUacnZ1RVZWMjNzpe3fu3EmXLl2oWrUqDg4OdOjQgU2bNpmcd3efjpztS5cu0a9fP5ycnAgICGDmzJmoqqEeediwYXTq1Im0tDQURSE4OLjQ2M6dO0ffvn1xc3PDwcGBNm3asHbtWpNjkpKSmDRpEvXr18fe3h4/Pz/eeecdEhISCr3n3X06cuI/fPgwHTp0wMHBgYceeohdu3YB8MMPP+Dn54eHhwf9+/fn8mXTNuiTJ08yYMAAfHx8cHV1pUOHDmzfnrueRGHPfurUqfs+pzB/Oy7eJnR/nHG7h68jTapY5lL1xfXDh+O0DqHUtKxmh6e9IZG8nprF3CNS21HWJOkwI+vWraNWrVq4uroCsGnTJrp27YqnpydhYWEsWrSI6tWr06NHD1asWFHotaKjo2nTpg0xMTHMnz+fESNG8H//93/MnTsXgDfffJNPP/0Ua2trNmzYwPTp0+95rQsXLhASEsLVq1eZNWsWCxYsoFWrVgwcOJDly5cbj+vduzc//PADY8aM4aeffuLNN99k1apVPP3008W+59mzZ3nsscfw8fHh+++/p2bNmjz55JNMnjyZESNGMGLECL766iuOHDnCiBEjjOedPHmSVq1acfPmTaZNm8a3335L/fr16datGwcOHCg0jgsXLtC+ffv7Pqcwb5dvZzJoQzTZd/px1LbLpo+fc+EnVSLhqxdrHUKp0esUutXJre348q/rZOb8Hy/KhJXWAQhTqqqSlJRksi8+Pp41a9YwZcoUZsyYYT/gDe0AACAASURBVDxu7NixjBw5kq+++sp47IABA3j99dd55513CAkJuecS2//88w/du3dn7dq12NgYxqxnZGTw448/MmrUKJo2bcq1a9fQ6XT3Hfny/vvv4+/vz65du9DrDb8annnmGVq3bs2bb75JSEgI8fHx7Nq1i4MHD9KyZUvjuV26dGHixImkpKQU655nzpxh8ODBLFmyBIA+ffrg6urKxIkT+emnnxgwYAAA1atXp1u3bmRkZGBtbc38+fMJCgpi06ZNxrIJCQkhISGBefPm0aZNm3vG8f7779OwYUN27Nhxz+fM2S/MU7Zq6MdxJdnQqdDZWscT3mnoK3k/DkvWztueX84mkpiRzcXETNaeTaSfdBYuM1LTUcGcOnUKZ2dnk5efnx8TJ07k//7v/xgzZgwAf/31F2fOnOG9997Ld4133nmHqKgojh49es/71KhRg6VLlxoTDoDAwEBiYmLueU5BsrKyWLt2LcOHDyclJYWkpCTj68knnyQxMZHDhw/j7u6Op6cns2bN4ty5c8bzGzRow
MqVK7G3L97aFbVq1WLevHnGbTs7O6pXr86///1vY8IBULduXTIzM7l+3VBtOmPGDMLDw40JR1paGklJSTRp0oTz58/f9zlfe+21Qp9TmLdPD15j20VDPw4FGNHEDWf5aWbRrPUKj9bIHQL9pXQoLVNmn3SEhoaazGlx6NAhDh06hKIo+Pn5oSgKoaGhAPj4+BiPyxnN8dJLL5mcHxMTw7p160z2ffPNNwAm+/r06QMYfmHn3f+gateuzZ49e0xehw4dIj4+nv/7v/8zHnfq1Cm8vLyoXr16vmt4e3vj4eHBmTNn7nmfJk2a4O7unm9/Tp+OooqLiyMxMZHRo0fnS5bc3NxITEzk3Llz2NjYsG7dOs6ePYu/vz+BgYG8+uqrHDx4sFj3y9GoUSOcnPKvefH44wWvgJn3uXbt2kXv3r1xc3PDzs4OZ2dnQkNDC332nOfs379/oc8pzNe+mGQm7sudj6O7rxMN76zTIXJ9uvUfrUModY/VcCCnMmv3pWT+upqqbUAWzOxz+NDQUGNSkZeqqkRGRuLr62vcV9Cv+G+++caYVOTw8fEp8AuooH3r1q0rftCFsLe3L9LwUJ1OV+iXZEZGBs7OZd8OndOcMH36dNq1a1fgMYGBgQC0bduW/fv3c/78ebZv386WLVsIDg6mf//+xonRytqGDRvo3bs3AwYMYM6cOVSvXh1ra2uWLFnC8ePH73leznMuWLCAevXqFXhMznMK83MzNYtBGy4Z5+Pwd7Wmj58s5FaQqGOHcauW/8eOOXO309Oimh0HrxiSja8OX+e7rj4aR2WZzD7pqKwCAwO5cuUKFy9epFatWibvHTlyhISEhHL5EvT09KRKlSqkpaXlS5aysrKIjY2lSpUqJCcnk5CQgLe3N35+fsbOnv/88w9NmjRh3LhxPPzww2Ue76effsro0aNN+sEAhIeHF5p05Dxnampqoc8pzI+qqry0LZaoRMOoMAcrhRGN3aQfxz3MeXUw8/62vCaITrUcjUnHkhO3+LRDNTzt5SuytJl980plFRQUhL+/f4EzjU6aNImOHTtSu3btB7qHlZXVfZtbFEVh8ODBhIWFcfPmTZP3Fi9eTFBQEKqqcuDAAerUqUNUVFS+e+Rcp6j3fBDXr1/PN6X6yZMn8814enccOc/5ySefFPqcwvx8e/QmK08nGLeHNHSlinzZVDp1XaxNpkaff+Tmfc4QJSF/WWZKURS+/vprevfuzfXr1wkJCSErK4vFixcTHh7O3r17H/ge9evXJzs7m//85z889NBDdO3atcDjQkND+eWXX2jZsiWvvfYaXl5e7Ny5k/nz57N48WKsrKwIDg6mZ8+etGnThtdee42GDRty+vRpvvjiC3r27EmzZs2Kdc+SevTRR5k2bRp2dnZ4eXlx4MABZs6cma8pqqA4QkND+e233wp9TmFejl1L47VdufO4PFbDgebVitepWVgGRVF4vJYDC4/dAmDO39d5q0UVrPVS41Wa5FPSjHXr1o09e/YQGhrKyJEj0el0tGvXjn379tG4ceMHvn7VqlWZP38+EyZM4ObNm6SmFty5ysPDg5UrVzJ37lw++eQT4uPj8ff3Z+nSpYSEhACGP+hVq1Yxd+5cwsLCOH/+PH5+frz99tuMHj3aWNNR1HuW1JQpU0hMTGTChAncunWLxo0bM2/ePC5fvsz69esLfXYPDw9+//133nvvvXs+pzAfaZnZDN4QTUqmoYbKx9FK1lUpgmc/mKl1CGWmhZc9q04bhs9GJ2Xy37OJhATKv4nSpGhcJVymN7+7I6koO1LW5UfKunS8tfsyn/3P0DfBSgfvtfKkhlP+5eojoyLxreNbztFVThWhrH85l8iv5w1zJXXwsWfPAD9N4ykrERERJnMmlZF81UTSp0MIUelsvZBkTDgA+gW4FJhwiPxeDvLQOoQy1THP8NnwmBT+vJqibUAWRpIOIUSlci0lk6GbcofPN6liS6eaDoWcISoTV1vD8Nkc849Kh
9LSJEmHEKLSUFWVf2+NJfZ2JmCY5vz5hq7lMkeMMB+P5ZmhdOmJWyRnyOqzpUWSDiFEpfHdPzf5+Wyicfv5Rq642Mp6OcXR9LFuWodQ5uq52VDtzuqzt9KzWX0m4T5niKKSpEMIUSmcuZnOaztzh8cG13SgqacsV19cY2Yv0zqEMqcoCo/45NZ2fCdNLKVGkg4hhMXLzFYZsvESyXeGx1Z3sKJfgAyFLInZYwZpHUK5aFfd3tihdNelZE7fSNM2IAshSYcQwuJNOxjP/suGUQh6BV5o4oaNTPpUIkd2b9I6hHLhZqunSZXcBf8W/CO1HaVBkg4hhEWLuJLChwfijNt96jpT21mGx4r765CniWXRsVtkZstSBw9Kkg4hhMVKzshmyMZLZN4ZfODvak23Oo7aBiXMRpMqtrjYGL4mLydn8tudScNEyUnSIYSwWBN+v8qJG+kA2OoVhjdyQyfDYx+IJa4wey96nUK76rlr8Xz3zw0No7EMknQIISzS5qgkvvor9wtyQD0XqjrIclMPaveqRVqHUK7a52li+fV8ErG3MzSMxvxJ0iGEsDg3UrN4YUvurKNBnra095HVY0vDko/e0DqEcuXlYEU9NxsAslT4/s4qtKJkJOmoYPbu3cujjz6Kk5MT3t7eTJgwgfT0dJNjvvzySxRFKfB196qsoaGhuLi4EBgYWOBy96GhoXz++edFji84OJhp06aV7OEqMEt9rsrqtV2XuZRkmHXUyVrHkAYy66goubwJ63f/3ETjhVLNmtQ1ViDHjx+na9euPPXUU7z66qvExMTw4YcfcuLECX7++Wfjh2Z8fDytWrXio48+yncNGxsb43//97//Zfr06Xz22Wfs3buX/v37c/bsWeztDX9AcXFxzJgxg2+//bbIMU6fPp2qVasW+9l27txJdHQ0zz33XLHPFaI4fj6TwA/Hc3+NPtvARWYdFQ+kRTV7lp9MIDVL5czNdA5eSaW1t9SclYQkHRXIzJkzCQ4OZsmSJcZ9jzzyCO3atWPdunU88cQTAFy7do3AwEC6d+9e6PW++eYbRo0axahRoxgxYgQ1a9Zk/fr1hISEAPDpp5/i5+fHgAEDihxj69atS/BkhqRj//79knSIMnU1OZOXt8Uat9t429O8mnw5lKbRXy7VOoRyZ6NXeLiaHftiDXO9LDlxS5KOEpLmlQpk27ZtDBw40GRfq1atCAkJYeHChcZ98fHxeHp63vd658+fJygoCDDUgDRq1Ijz588DcPnyZebMmUNoaCg6nfwzEOZPVVVGboslLiULADdbHQMDZdbR0lanUTOtQ9BEmzxJxk+nZM6OkirWt42iKN6KolxUFOXFQo7xUBRFLeD1zIOHa9liYmJo2LBhvv19+vRhz549xu34+HiqVKli3M7IKLg3dbVq1YiOjjZuR0dHU61aNQA++eQT6tevT9++fYsV4919H3K2Dx06RNu2bXF0dKRFixZs2pQ7a6Gvry8ffvghmzZtQlEUQkNDje9t3LiRdu3a4eDggJubG3369OHo0aMm9/T19WX58uV89tln1KhRg+rVq7Nq1ap8CRrAjRs3aNy4sclzL1++nNatW+Pi4kLNmjV54YUXuHLlSrGeW1R8P564ZbKY29CGbjhaS0Jd2t7p0ljrEDRR393GOGfHleQsdly8rXFE5qnIf5GKotgAawC3+xxaFYgErO96/VSyECsPFxcXbt7MP9VuvXr1uHbtGomJhg/Ua9euceDAAVq3bo2DgwM2NjY0aNCAn34yLeK+ffsSFhbG0aNHmT17NjExMXTv3p1Lly7xzTff8NFHH5VK57rNmzczaNAgAgMD+f7772nZsiV9+vTh8OHDAHz//fc8++yztGjRgg0bNhibWNavX0/v3r0JDAxk4cKFzJo1C0VRaN++PadOnTK5x8yZM/nkk0947733WLp0KbVq1WL16tX5koeff/6ZW7duUaNGDcCQXA0fPpyOHTuycOFCPvzwQyIiIujZs+c9k
zVhfqITMxi7I3cxt8dqONA4zxTWQjwonaLQ0it3gcAlJ2QUS0kUp09HLyAWOH2f4zyBeFVVM0scVSXVpUsX5s+fT9euXY37MjMzmTRpEgCJiYk4OzsTHx/P2bNnmTp1Kk2aNOH69eusWbOGZ555Br1eT//+/QF4+eWX+eWXX2jatCl6vZ6vvvoKb29vRo8eTdOmTenTpw8AWVlZ6PUl72i3Y8cOXn/9dWbOnAlA//79OX36NCtWrKBZs2Z07NiRHTt2EB8fb+yHkp2dzZgxY5gwYQKTJ082Xmv48OG88MILfPDBByxfvty4/9ChQ+zdu5c2bdoY9/n6+rJixQrGjh1r3LdixQr69+9vTKY+++wzvvjiC1566SXjMU899RQ1a9Zk3759PPbYYyV+blExqKrKv7fGcCvdMO2op72efgHOGkclLFEbb3u2X0wGYM3ZRMIys7G3ktq04ihyaamq+jMQAtyvIasqcO1BgiqO0NDQew4f9fPzu+d7ZfXK23RQXO+99x5r165l7NixREVFcezYMZ544gl27twJgKurKwCPP/44ixcvZuzYsXTq1Il+/fqxZMkSXn31VT744APj9ezs7Ni6dSuHDx/mwoULjBo1igsXLvDdd9/x8ccfo6oqr7/+Ovb29vj4+LB58+YSxd2zZ09effVVk32BgYHExMTc4wxDEnHp0iVGjx5NUlKSyWvYsGFs2LDBZFhav379TBIOgGeeeYalS3M7tV27di1fv5j4+HiThCM5ORlbW1tq1apl7N8izNuCf26yMcpQ1a0Azzd0xU6+CMpMh35DtQ5BM3Wcralmb/iBlpieza8yLXqxFesvU1XV7CIc5gl4K4ryX0VRTiuK8reiKP9RFMWpZCFWHk2aNGHZsmX88MMP+Pr60rhxY6ytrQkLC8POzg5HR8OaEYsXL+bpp5/Od/6IESM4fvy4SRONoigEBQXh4+MDwOTJk2nVqhXdunXjxx9/ZMGCBSxcuJCnn36aQYMGkZCQUOy4H3300QL3FzaW/ezZs2RmZlKjRg2cnZ1NXh07diQhIYFr13Jz1169euW7xqBBg9i/f78xeVizZg0+Pj60bdvWeExGRgYzZ87koYcewtraGkdHR5ydnTl9+rSMtbcAFxIyeGN3bhPb47UcCXSXZpWyNGRS0ef1sTSKotAqT4dSaWIpvrL4OVAV8AK+BjoBLwIBwI9lcC+L07dvX2JiYoiIiCA6Opq1a9diY2ND/fr173uura3hw/buCcJynD9/noULF/Lxxx8DhuRlzJgxPPvss3z55ZfY29uzbt260nuYQuj1ehwdHdmzZ889X87OuVXkOc+WV+PGjWnatCnLli0DDE0rISEhJv1Uhg4dytSpU3niiSf4+eef2b17N3v27KF27dpl/5CiTKmqyotbY0i406xSzV7PU/7SrFLWpgzspHUImmrtlZt0/BaZxI3ULA2jMT9lMU9HGLBcVdWcuutoRVFC7vxvDVVVLxV0UkRExD0v6OHhcc9hncOGDWPYsGEPFnEpi4yMLNF5MTEx2Nvb4+7uTpUqVcjIyCAyMtLYlyEyMpLo6GimTp3KrFmz8n0Rb9y4EScnJ27fvl1gDG+//TatWrXCz8+PyMhIIiMj6d69u/HYmjVrcuTIkULjT01N5caNG8Zjcrbvfu7ExETS0tKM+27evElKSopx29nZmdu3b5OamkpAQIDJPVJSUkhKSiI21jDfQmZmJnFxcQXG1aNHDxYvXky3bt3YsWMHr7zyivG46Oholi9fbuzXkpeVlRXx8fH5nqOk/9+Vt6LEmZ2dzfXrlrs415rLerZcyPkbUOnmkUJMdFSZ3CsyKrJMrmuOLhw/XKblYQ5l7W1rzeU0HelZKjO3/s2T3uaZeBT2vVtSLVu2LPT9Uk86VFW9Cdy8a1+moiiXgVpAgUnH/QIticjISHx9fUv9umXlnXfeQVVVVqxYYdyXnJzMxo0bWbZsGb6+vnh5ebF37162bt3KK6+8Y
nJcWFgYgwYNwt/fP9+1T58+zX//+1927txpLBN/f3+uXr2Kr68vmZmZREVF8dBDDxVaZnZ2dri7uxuPydkGTM5zdnbG2trauM/T0xNbW1vjtq+vL0FBQcyePZs1a9aYJJXPP/88N27c4JdffgEMCULVqlULjGvUqFFMnz6dBQsWULt2bZ566inje9evX0dRFDp16oSTU27r3rx584iKisLT0zPfc5jDv5fi/LuuW7du2Qajkchb6Xx14BxgqOXoWtuJDvXKZk6OyKhIfOv4lsm1zVVZlYe5lHUHJYlVpw2jCX9Pdefjlr7aBlQCERERZfK9ez+lnnQoijId+F1V1bV59tXA0MRyv5EvldqoUaPo0qULw4YNo0+fPpw9e5bvvvuOrl27Gv9x2NvbM336dEaNGsXx48fp2LEjiYmJfPnll6Snp/PJJ58UeO0PP/yQzp070759e+O+ESNGMHz4cOrWrcu+ffvIzs42jmgpbU2aNGHmzJksX76cevXq0aJFC+bNm8fjjz9O165dGTJkCDqdjmXLlrF3714OHjxYpOv6+vrSrl07wsLCGD9+vMl7DRo0oEqVKjz33HPGYbqLFi1ix44dODg4FHQ5YQayVZURW2NIyjAkHF4Oep6oK80q5cW1qrfWIWiulZc9q08nogI7o5O5lJRBDSdrrcMyCw/cp0MxsFJyG9K3A18pitJFURQXRVFaAWuBb1RVLbdRLeYoODiY7du3c/r0aZ577jlmz57N888/z9dff21y3EsvvcTixYsJDw/nueee45133uHhhx9m7969JpOG5bh9+zbbtm3Lt1ZLSEgI48aN46233iI8PJyVK1caO6uWtieffJLBgwfz4osv8v777wPQtm1b9uzZg42NDW+88QYvvfQSycnJ7N69m8DAwCJfe9CgQaiqmm86dwcHB+NcHkOHDuWFF17Azs6OQ4cOSZ8OM/bNkRvGYYsKMKyRGzZ6WcytvEzfdkzrEDTnZqsn0N2wzpUK/HSq+B3wKyuluD34FUVZBISrqvrtne1gYAfQSVXVnXf2DQImYqjduAzMB6YUMPqlTIcPmFvzSllKTk4u01/3UtblpzKXdVRCOk1+OGes5fhXHUf6BZTtVOfmUuVfXtbNmUaf0RPK5NrmVNbhMcnGhQXbVbdn70A/jSMqnnJqXsn3a6DYNR2qqg7LSTjubO9UVVXJSTju7FumqmojVVVtVFWtrarqx0UcbivKiDQnCHNnGK0Sa0w4vB30POEnzSrlbf3c6VqHUCE85GmH7s5X6r7YFC4lyQzHRSEz6AghzMK3R2+y9UKeScAauWEtzSpCI042Ouq52Ri3/5tn3R9xb5J0CCEqvAsJGby5J3cSsC61HanralPIGUKUvebVctdiWX1a+nUUhSQdQogKLWdtlcR0Ga1SEby3fLvWIVQYD1e1M3Za2HUpmfgUWXLsfiTpEEJUaAuP3WRz3maVhjJaRVQMrrZ66roahspmq7BWmljuS5IOIUSFdSkpgzd2ma6t4u8mzSpamvrM41qHUKE8nKeJZc0ZSTruR5IOIUSFpKoqI7fFGpesryprq4gK6OGquUnHlgtJ3EozzynRy4skHUKICmnpyQTW51k6fGhDV2lWERWOp70VtZ0Nk3tnZCPL3d+HJB1CiArnyu1MXt152bgdXNNBlqyvIHqPfFvrECqcvLUdq8/IKJbCSNIhhKhwxuyM5fqdJcOr2Ol5WppVKoyymo3UnDWvlrvc/YbIJJIzZC7Me7HopCM7W/6PLy9S1uXH0st61ekE4wqeAM81cMXOyqI/qszK250baR1ChePtaEV1R0MTS0qmysYoaWK5F4v+S75+/brWIVQaUtblx5LL+lpKJq/siDVud/Cxp1EVaVapSG7FXb7/QZVQ3iYWGcVybxaddAghzMu4XVe4mmxoVnGz1dG/Xtku5iZEack7O+m6c4mkZVp2jWRJSdIhhKgQfjufyI8nbhm3n23gir00q1Q4tRs20zqEC
qmmkxWe9noAEtKz2XUpWeOIKib5ixZCaC4hLYuXt+U2q7T2siPI066QM4RW3v9ph9YhVEiKotAsz79ZGTpbMEk6hBCaezv8KtFJhnUrnK11DAx01TgicS8/fDhO6xAqrKaeuf2P1p1LRFVVDaOpmCTpEEJoaufF28w7csO4PbC+C0428tFUUYWvXqx1CBVWPTcb7O5MYHc+IYMTN9I1jqjikb9sIYRmbmdkM2JrjHG7mactLatJs4owT1Y6hUYeubUd68/JKJa7SdIhhNDMB/uucu5WBgAOVgqDG7iiKDLVuTBfeZtY1ku/jnwk6RBCaOJAbDKf/5k750j/ei642eo1jEgUxadb/9E6hAqtSRVbctLm32OSuZEqC8DlJUmHEKLcpWVm88KWWLLv9LNr6GHDI9XtCz9JVAhRxw5rHUKF5mKrp46LNQBZKmyS2UlNSNIhhCh3Uw/Gc+x6GgC2eoXnpFnFbMx5dbDWIVR4QdLEck+SdAghytXfcalMPRhv3H7K3xlPeysNIxKidDXNM1/HhsgkMrNl6GwOSTqEEOUmM1tlxNYYcmaI9ne1Jrimg7ZBCVHKajlZ4WZr+Hq9nprF/tgUjSOqOCTpEEKUm8//vEbElVQArBQY0tAVnTSrmJVnP5ipdQgVnqIoNMmzUOGv52XobA5JOoQQ5eLMzXQm7o0zbveq60x1R2sNIxIl8Vj/YVqHYBbyTuMv/TpySdIhhChz2arKi1tiSM0ytG3XcrKiW21HjaMSJfFykIfWIZiFBh425KxXePRaGpG3ZHZSkKRDCFEO5h+5aVx1U6fA0IZu6HXSrCIsl61eRwP3PE0skVLbAZJ0CCHKWHRiBuPDrxi3/1Xbkdou0qwiLF/efh0yJbqBJB1CiDKjqiojt8eSmG4YruLloKeXn7PGUYkH0fSxblqHYDbyTom+61IyqTnDtioxSTqEEGVm2ckEfs3TiW5oQzds9NKsYs7GzF6mdQhmw9PeCi8Hw9T+KZkqv8ckaxyR9iTpEEKUibjkTF7dedm4HVzTgQA3Gw0jEqVh9phBWodgVhrmWXV2c9RtDSOpGCTpEEKUidd2XebancWuPOz0PO0vzSqW4MjuTVqHYFbyLnW/+YIkHZJ0CCFK3bpziSw7mWDcfraBC3ZW8nEjKp9AdxtyBmr9FZfK1eRMbQPSmHwKCCFK1c3ULEZuizVut/W2p0kVu0LOEMJy2VvpqOua26y4tZLXdkjSIYQoVW+HXyHmtuHXnLO1jgGBLhpHJErTvL+vax2C2WnkkZt0bK7kS91L0iGEKDXbL95m/tGbxu1B9V1wtJaPGUuye9UirUMwO3f361DVyrvqrHwaCCFKxe2MbP69Nca4/XBVO1p42WsYkSgLSz56Q+sQzE4dF2scrAwdO2JvZ/LPtTSNI9KOJB1CiFIxce9Vzt3KAMDBSmFQfWlWEQJApyg0yFPbsaUS9+uQpEMI8cD2xybz+Z+5bf0h9VxwtdVrGJEQFYtJE0sl7tchSYcQ4oGkZWbzwpYYclqpG3rY0K66NKtYqtFfLtU6BLPUME9n0so8JbokHUKIBzL5j3iOXzcs222rV3iugSuKIlOdW6o6jZppHYJZ8rS3opq9TIkuSYcQosQOx6UyLSLeuP2UvzOe9lYaRiTK2jtdGmsdgtlqVEX6dUjSIYQokcxslRe2xJBTS+zvak1wTQdtgxKiAmsk67BI0iGEKJkZh67xv6upAFjpDCvI6qRZRYh7yjsl+p+VdEp0STqEEMV28noaofvjjNt9/JzxdpRmlcqgQ7+hWodgtuytdNR1sTZuV8Yp0SXpEEIUS7aqMmJrDGlZhvEqtZ2t6FrbUeOoRHkZMulzrUMwa3mXut8RLUmHEEIUavZf1/k9JgUAnWJoVtHrpFmlspgysJPWIZi1+u65ScfO6Mo3gkWSDiFEkZ27lc67v181bveo40QtZ+tCzhCW5sLxw1qHYNZ8Xa3JWY7ozM10ohMztA2onEnSIYQoE
…(base64-encoded PNG output omitted)…\n", 139 | "text/plain": [ 140 | "
" 141 | ] 142 | }, 143 | "metadata": { 144 | "needs_background": "light" 145 | }, 146 | "output_type": "display_data" 147 | } 148 | ], 149 | "source": [ 150 | "dist = stats.beta\n", 151 | "x = np.linspace(0, 1, 100)\n", 152 | "\n", 153 | "heads = y.sum()\n", 154 | "prob_dens = dist.pdf(x, 1 + heads, 1 + N - heads)\n", 155 | "cred_interval = dist.interval(0.95, 1 + heads, 1 + N - heads)\n", 156 | "plt.plot(x, prob_dens, label=f\"observe {N} tosses,\\n {heads} heads\")\n", 157 | "plt.fill_between(x, 0, prob_dens, alpha=0.4)\n", 158 | "plt.hlines(y=0, xmin=cred_interval[0], xmax=cred_interval[1], label=\"95% interval\")\n", 159 | "plt.axvline(0.75, color=\"k\", linestyle=\"--\", lw=1, label=\"Point estimate\")\n", 160 | "\n", 161 | "leg = plt.legend()\n", 162 | "leg.get_frame().set_alpha(0.4)" 163 | ] 164 | }, 165 | { 166 | "cell_type": "markdown", 167 | "metadata": {}, 168 | "source": [ 169 | "The distribution covers a wide range of the possible _head_ probabilities. From the distribution, we can derive the 95% interval, that is, the interval that contains the true parameter with a probability of 95%, which here is $[0.28, 0.95]$. 
The Bayesian model is thus very uncertain about the true parameter.\n", 170 | "\n", 171 | "We can do the same experiment with more tosses:" 172 | ] 173 | }, 174 | { 175 | "cell_type": "code", 176 | "execution_count": 25, 177 | "metadata": {}, 178 | "outputs": [ 179 | { 180 | "data": { 181 | "text/plain": [ 182 | "array([[0.25000016, 0.74999984]])" 183 | ] 184 | }, 185 | "execution_count": 25, 186 | "metadata": {}, 187 | "output_type": "execute_result" 188 | } 189 | ], 190 | "source": [ 191 | "N = 100\n", 192 | "y = np.array([1]* int(N*p) + [0]*(N - int(N*p)))\n", 193 | "X = np.ones((y.shape[0],1))\n", 194 | "\n", 195 | "model.fit(X,y)\n", 196 | "model.predict_proba(X_test)" 197 | ] 198 | }, 199 | { 200 | "cell_type": "code", 201 | "execution_count": 26, 202 | "metadata": {}, 203 | "outputs": [ 204 | { 205 | "data": { 206 | "image/png": "iVBORw0KGgoAAAANSUhEUgAAAhAAAAFmCAYAAAA8k6PIAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjMsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+AADFEAAAgAElEQVR4nOzdeXhTVfoH8O9JuqaUlrZA2YtA2UFldahsAwMqFRGKbCrIiMAPHXcRVAqCIKPggiCigEWgIOgozCAjAgKCaOsIsshayg5doCvdkvP7IzTN7Zq0SW6W7+d5+tjcnHvuyQ0mb8/yHiGlBBEREZE1NGo3gIiIiFwPAwgiIiKyGgMIIiIishoDCCIiIrIaAwgiIiKympcN6+JyDiIiIvckSh9gDwQRERFZjQEEERERWc1lAoiEhAS1m+AxeK8di/fbcXivlYQo0yttM7zXjqXG/XaZAIKIiGyLX/JUEwwgiIiIyGoMIIiIPFTXrl3VbgK5MAYQREREZDUGEERERGQ1BhBERB5q1qxZajeBXJjLBBAhISFqN8Fj8F47Fu83qSU2NlbtJpALc5kAQqNxmaa6PN5rx+L9JrU0bNhQ7SaQC+MnFxGRh7py5YraTSAXxgCCiIiIrGbL3TiJiMgF7L+ci1+u3kKnO+9SuynkwhhAEBF5kFM38tF/czLy9RIRT2/An+n5aBPiq3azyAVxCIOIyIMs/+MG8vUSAHBuxQz8ZUMS9l3KVblV5IoYQBAReYj8IgM+P55RcuDnTbiRb8CAr5Kx+VSmeg0jl8QAwgmMHz8ekydPVrsZDnfgwAH4+PggPj6+3OczMjIwbtw41KlTB0FBQRg7dizS0tKqXY7I0/3rTBZSb+kBALV9Sj7+8/USMf++iPf/x/9vyHIMIEgVKSkpiImJQWFhYbnPSykxePBgJCYm4r333sOHH36IhIQEDB06tFrlyvP7779jyZIlNnk9RK7gkyM3TL/3aaQDANTz1wIAJIBnf7yG/Zc5nEGW4SRKcji9Xo/Ro0ejQYMGMBgM5ZbZvHkzjhw5ghMnTpiS3fTu3RutW7fGjh07MGDAAKvKlac4gJg2bZqNXyGR8zl1Ix87LxiDAwGgV0MdonYchVdwGD48lI5zmcZ
g/l9nsvCXhjoVW0qugj0Q5HD5+fmoU6cONm3aBB8fn3LLfPPNNxgxYoQiU15ERAQGDhyIrVu3Wl2OyNN9euSm6feOYb6o46dF8rFDqOWjwX0RtUzP7WMPBFmIAYSd7d69GwMGDEDdunWh0+kQFRWF7du3l1t269at6Ny5M/z9/XH33Xfju+++Uzx//fp1PPfcc2jevDl0Oh1at26N+fPno6CgQFHu119/xcCBAxEYGIhatWqhX79+2Lt3r6JM3759sWDBAqxZswYtWrSAEAL79u1D3759y7RLr9ejS5cu+O2336y6RkV0Oh2+/PJLNGvWrMIyv/zyC6Kiosoc7969O44ePWp1udL69u2LCRMm4MSJExBCYPz48abnLHnPLHkvLH2/Dh06VOW9tLQuovIU6CVWHy8JIKJu9zAsfWYMAKBFkLfpucTrecgrKr9nkMgcAwg72r59OwYOHIiwsDAsW7YMq1evRoMGDXDfffdh48aNirLff/89hg8fjgEDBmDNmjW46667EB0djQMHDgAACgoKEBUVhV27dmHGjBmIj4/H3//+d7z33nuYMmWKqZ7ExET06dMHfn5++Pjjj7Fs2TI0adIEf/vb38p8KW3evBmTJ0/Gk08+iW3btqF58+bYu3cvDh8+rCj3448/4ujRo2jZsqXV16iulJSUcgOMRo0a4erVq1aXK23hwoV4/vnn0bhxY2zbtg0vvPACAMveM0veC2ver1GjRlV6Ly2ti6gi357NwvVc4+TJYF8NOoQq8z4E+mhRX2ecC1Ggl0i4lufwNpILklLa6seukpKS7H0JmzIYDLJVq1Zy2rRpZZ579tlnZUREhDQYDFJKKR9//HEJQH7xxReKco8//rgcPHiwlFLKn3/+WQKQ169fV5TZs2ePHD9+vOlxVFSUHDt2bJlrzp49W/bo0cP0uE+fPhKA3Lhxo6Jc//795cSJExXHnnrqKTl06FCrr2GJZs2ayfXr15c5rtVq5U8//VTm+Nq1a2WLFi2sLleeVatWydatW5seW/qeWfJeWPN+md/bYub30tK6qGq//vqr2k1QxcDN5yQWH5VYfFQ+8K9kufxwulx+OF0CMP3eK/6sqcyCX1JqfE1PvddqccD9LvO9zwDCTn777TcphJCXL18u89yVK1ekEEIePnxYSmkMFAYNGlSm3C+//CKFEDInJ0deuXJF+vj4yOeee67cOqWU8tq1axKA/PXXX2VWVpbi5+zZs1IIIdPS0qSUxgCiW7duZepYsWKFDA8Pl3q9XkopZVFRkaxbt65ct26d1dewREUBhK+vr/zhhx/KHP/8889lhw4drC5XntIBhKXvmSXvhTXv1zfffFPpvbSkLrKMJ36pnbmZbwoMxOKjcv4vKaagYewbi0y/P/bdRVO5If9KrvF1PfFeq0mNAIJDGHZy8uRJ1K9fHw0aNCjzXHh4OEJCQnD69GnTsYiIiDLl2rZtCyklkpOTER4ejq+++gr//e9/0bBhQ3Tu3Bmvvvoq/vzzT1P5s2fPAgC6deuGwMBAxc8dd9wBKaWpDAA88MADZa45fPhwpKenY8+ePQCAnTt3Ijs7G9HR0dW6RnWFhobi5s2bZY6np6cjKCjI6nKWsPQ9s+S9sOb9Gjp0aKX30pK6iCryqdnSzfahvgjx05oe9x4x3vR7i6CSCc37r9yCQUqHtI9cF5dx2olGozF28VSgsLAQgYGBldaRn58PACgqKgJg/MJ/4IEHcOzYMezatQvbt2/HnXfeiRdffBFz586FVmv8YPjPf/5TYd2RkZGm3319y+a/r1OnDu69916sW7cOffv2xcaNG3H//fejVi3jLG1rr1FdrVq1wrFjx/Dwww8rjp85cwbNmze3upwlrHnPqnovLClTfC9XrVplml9SWvG9tOR6ROX5+nSW6feoUsszn+oUguWH0wEA9XVa1PLWILvQgPQ8PU7cKEBb7pFBlWAAYSeRkZG4du0aLly4gCZNmiie++OPP5CZmVn
lF21iYiK8vb3RsmVLZGVlIT8/H2FhYWjXrh3atWuH//u//8O///1vREdH4/nnn0dkZCQ0GmOnUumVCQUFBUhJSUHt2rWrbPvQoUMRGxuL9957D19//TWWLl2qeF22uEZVoqKi8MMPP+C1115THN+zZw8mTpxodTlLWPqeWfJeeHt72+z9suR6ISEhVr1W8gw5hQacuGFcqSNg7IGoiBACLYK8cSjV+IfLT5dzGUBQpTiEYSedOnVCixYtMG/evDLPzZo1C3369EHTpk1Nx3bv3o2MjJIc9UVFRXjrrbfw0EMPwd/fH5s2bULHjh1x48YNRV1eXiUxYFBQEB544AG88cYbpt6LYvPmzcN9991nUdsHDBiA/Px8vPTSS8jLy8OQIUNsfo2qjBkzBnv27DGtQgGMqa+PHz+OESNGWF2uPF5eXopEVpa+Z5a8F9a8X4sXL670XlpSF1F5/kjNQ3GfWrjOCz5aUWn5FsElwxjcYIuqwk8gOxFC4KOPPsKQIUOQnp6OmJgY6PV6xMXFYd++fdi/f7+i/MmTJ9GrVy+8+uqr0Gq1WL16Nf73v/8hISEBADB69GgsX74c3bp1w9NPP41mzZrh8OHDWLx4MZ566inTX6CLFy9Gz5498Ze//AVPPfUUAgMDsWXLFmzevBk7duywqO3+/v548MEHsWzZMsTExECnU3Z72uIaVWnXrh3GjRuHYcOGYebMmahbty5mzpyJadOmKZJGWVquPB06dMC5c+ewcuVKREZGIioqyqL3zJL3wpr3q3v37pXeS0vrOn36NE6fPo2OHTuiUaNGNnkfyLX9nlKyHLNxYNmP+469ByketzSbB/HTlVv2axi5h/JmVlbzx65cbRVGsQMHDshBgwbJkJAQGRYWJqOjo+WRI0cUZR5//HE5fvx4OXPmTNmoUSPp5+cno6Kiysyqzc3NlW+++aZs0aKFDAgIkF27dpWff/55mWueOnVKjhgxQoaFhUlvb2959913y507dyrK9OnTR86fP7/cNiclJclvv/1WApCbNm0qt4wl17BERaswpJQyJydHxsTESH9/f9m4cWP58ssvy6KiomqXK89bb70lg4ODFasxLHnPLHkvLH2/du3aVeW9tKSuWbNmSQBy1apVFr12T+RpKwMm77hsWlkx7NvzphUXFf0s+T1Ner1/1HTO1ezCal/b0+612tRYhSGk7Wba2nXK7rlz58pdqUC2x3vtWLa83/Xr18fbb7+tyKxJJRISEtC1a1e1m+Ew98Qn4eerxp6Ep++sgw6hfornl0wbjWlL1iuO/TMhFaczjPtifDWkMYa1rN6cJk+712pzwP0uM/7FORBEbiIhIQFZWVkVruggz6I3SBxOLRnCaFLLu0yZP/aUTauvmAfBfTGoEgwgiNzEkiVLsGrVqnL3BiHPcyajALlFxo7h2j4aBPlqqzjDyDwfxE+XOQ+CKsZJlERu4p133kFYWJjazSAnoZhAWcvyj3rzHojfrt/CrSID/L34tyaV5VQBROyB65h9MLWSEscc1hYAmNUjDLH31HPoNYmqi8EDmTuUUrI0uHFg2eELAKYkUuZqeWsQrtPiaq4ehQbg16u30LtxgN3aSa6LYSURkRsy74Eob/4DAOzZtLrc4y05D4IswADCgc6dOwchRKU/u3fvBgB88MEHFZbJy6t8q92+fftiwYIFDnhFlomIiEB8fLzazSDyKJYMYayd83y5xzkPgizhXEMY99SrcMjAHZYW1q9fH9u2bSv3uVOnTuGZZ54xJT9KTU1Ft27dMGfOnDJlfXx8yhwjIiqWkluEyznGPXS8NUB9nXUf9ebzIPZfyTWu+ReVZ7Ekz+NUAYS78/f3x+DBg8t9bvfu3ejZs6dpf4y0tDRERkZWWJ6IqCKHzJZvNqrlDa3Gui//ev5a+HsJ3CqSuJlvwLVcPcID+HVBShzCcALF6ZKfeOIJ07HU1FROiiOiarF0BcbUD9aVe1wIgbr+JeedySiwXePIbTC
AcAL/+c9/kJmZiVGjRpmOpaamIjQ01PS4sLDQ6no3bdqEtm3bIigoCPfddx9OnDhRpsx3332He+65BzqdDsHBwYiOji63XHx8PLp3747atWujcePGeOKJJ3Dt2jVFGYPBgLfffhsRERHw8fFB8+bNMXv2bMWGVQBw6NAhjBw5EuHh4ahVqxZ69OiBjRs3Wv36iKh8ihUYFUygBIBm7TpX+Fxd/5K8EWduMoCgshhAOIGVK1ciJiYGgYGBpmNpaWk4ePAgunfvDp1OBx8fH7Rp0wYbNmywqM5169Zh1KhRGDx4MD777DPodDr07t1b8aW/detWDBkyBJGRkVi1ahUWL14MIQRiYmJw8uRJU7n58+djwoQJ6NOnD1atWoXZs2cjISEB999/vyKwmTt3LubOnYvJkyfjiy++QJ8+fTB79mxcuHDBVOb48ePo2bMnCgoKsHjxYqxZswb33nsvxo4dizVr1tTkNhLRbYoVGBUs4QSAVwa0r/A58wDiNHsgqBwc1FLZ9evXsXXrVuzcuVNxPDU1FWfOnMFbb72FDh06ID09HV999RVGjRoFrVZb5VbVf/zxB9atW4fRo0cDAB5++GHcfffdiIuLw0svvQSDwYBp06Zh+vTpmDt3rum8CRMmYOTIkXjjjTdMKyfeffddvP/++5g0aZKp3EMPPYTGjRvjwIED6N27N27evIl//vOfWLNmDR566CEAwMiRIzFw4ECMGzfOdN63336LFi1a4F//+pfp2LBhw9CyZcsyPRpEZL28IgP+vGHeA1G9j/m6ZhMv2QNB5WEAobK4uDjccccduPfeexXH+/fvj2HDhmHYsGGmY8OHD0dYWBjeeOONKgOIv//976bgAQA0Gg3uvPNOnD17FgCQmJiIS5cuYerUqcjOzlacO3z4cEyaNMk08zo1VZncKzc3F76+vmjSpAmSkpLQu3dv7NixAwEBAabgodjYsWMxc+ZM0+PIyEicPXsWK1aswOjRo1GrVi0AwOTJk6u6VURkgWPp+Si6PWpY118Lv2pmkaxnPoSRYf0QKrk/DmGobOXKlZgwYUKZ43FxcYrgodjEiRNx/Phx3Lx5s9J6W7RoUe7x4t1Xz5w5g6KiIjRq1AiBgYGKn1GjRiEzMxNpaWkAjPMvFi1ahDvvvBPe3t4ICAhAYGAgTp06ZaovKSnJtIKkMsOGDUNsbCxef/11hISEoF+/fli0aBFSUlKqPJeIqmZJAqliUcMfq/A580mUp9kDQeVgD4SKDhw4gJMnT+Lxxx+3+BxfX18AqDKZVFW0Wi0CAgLw3XfflXnuypUraNCggWlOxmOPPYbvv/8eU6dOxdy5cxEUFAQhBMaOHWs6x9vbu8xkyYq8/PLLeP7553Hw4EHs3r0bX375JWJjYxEXF1emB4OIrKNMYV35R/yjs96r8LkgXw28NECRAUjL0yMjX2/xhlzkGRhAqGjlypW477770KBBA8Xx5ORkvPjii1izZg38/PwUzyUmJiIwMBB169at0bXbtGmDnJwchIaGom3btornjh8/jjp16sDX1xfnzp1DfHw8EhIS0KVLF0U5f39/0+/t27dHbGws8vLyyrTZPLBITU2Fr68vAgMD0atXL/Tq1QszZ87ESy+9hFmzZjGAIKoh5RLOynsg5j3SDzM37Cr3Oc3tpZxXbiekOpNRgLvr+ZdbljwThzBUkpOTgw0bNmDixIllnqtXrx6+//57fPbZZ4rjubm5mDNnjmkiZU107NgRnTp1wquvvlqm5+C1114zTZhMT0+HEAKtW7dWlFm+fDlOnz5tety7d2/4+vri448/VpT76KOPFKswXnjhBYwcOdI09FHMy8uLme6IakhKiUMWrsAAgPPHD1X6vGIlBocxqBT2QKjkyy+/hL+/P4YMGVLmOX9/fyxcuBBTpkzB8ePH0adPH2RlZeGDDz5AQUEB5s+fb5M2LF++HP3798fAgQPx6KOPQqPRYP369fjpp5+QkJAAwNhTERoainH
jxplWU6xevRq7du2CTqcz1eXr64sFCxZg8uTJSEtLw913341t27YhPj5eUe61115Dr1690K9fP0yYMAEBAQHYsWMHPvvsM8TFxdnkdRF5quTMQmQUGP8g0HkJ1PGt2d+IxnkQxiGRMzc5kZKULPrXJYTQCSE+FEJcFkLkCCG+E0JE2Ldp7q1hw4aYNWsWvLzKj+EmTZqEuLg47Nu3D+PGjcMrr7yCu+66C/v371ckmKqJnj17Yu/evfDx8cHzzz+PSZMmITc3F/Hx8aYJkTqdDps3b8a1a9fw2GOP4YknnoCfnx8SExPRtGlTRX0TJkzAypUr8fXXX+Pxxx9HcnIyDh48qBhuadWqFX7//Xe0aNEC06dPx5NPPonz589j9+7dilUjRGS90vkfqurVC6obXunzimRSzAVBpYjSXcnlFhJiKYAmAJ4GkAXgeQCDpZTmg+JVV1QD7rCZlqvgvXYs3m/HSUhIQNeuXdVuht3M+yUFr+03rmjq30SHRyKDalTfkbQ8fPj7DQBA38Y67BoRYfG57n6vnY0D7neZaNTS/q2/AZgppTwnpUyTUs4EcIcQoqFNm0dERNV21ixfQz3/qkeotyxdUOnzXMpJlbE0gDgFYHjxAyFEdwCFALh4n4jISZhnjKyrq3qi9daPF1b6fKif1vRn56XsIuQVWbZUmzyDpQHEMwAeFUJ8K4R4AcC3AJ6QUnJWDRGRkzCfp1DXgh6IqnhpBEL9jIGIBJCUyY98KmHRHAgAEEKMArAGQA6AowBGSynPmxUxVVQ8g788ISEh0Gi4epRITQaDAenp6Wo3g2wo3wDcu98fEgICEi+1KIC2ipXR86Pvxqtbfqu0zPpL3jh3y/iZ/W7bfPQO1duqyeTkSs2pKPOvyaIQVQgxH8YhjAEAfgbwHIADQojuUspLVVzUJjjRzHF4rx1Lrft9xx13OPyaanPniX3H0/Mh958BAIT4eaFFRNVT1GbE70SzZhGVlml6KwPnLuUCADR1m6Lr3ZatAnPne+2M1LjfVXYFCCGaAfg/AH2klD9KKfOllAsAfA2AOyARETmBs2bDF2H+tks5zaWcVBFLxhLqA7gkpbxS6ngKAF/bN4mIiKylmEBpYQDx1qj+VZbhSgyqiCUBxO8A/IUQ84QQjYUQQbfnQ0wF8Ll9m0dERJYw33LbFhMoS+ritt5UvioDCCllAYD+AO4AkAjgIoApAB6UUh61b/OIiMgSyhUYNhzCMFsOei6zAHqDXXMGkguxKEyVUp4FwDzDRERO6mw1lnAOmfxylWV8tRrU9tEgs8CAQgNwIasQEUE+1W4nuQ+upyQicnEGKRVZKC1JIgUA0VOnW1ROsSsnJ1LSbQwgiIhc3OXsIuTrjUMLAd4C/l6WfbS//Nd2FpUz79HgrpxUjAGEHa1evRpCiDI/AQEB6Nu3L/bs2VOtemNjYzF48GAbt9ZxVq9ejTZt2qjdDCK3Ud0MlBkpVy0qV0/HpZxUFgMIO2vcuDG2bdum+Fm2bBkCAgLw17/+Fbt27bK6znHjxmHevHlWn3fu3DnMnTvX6vNqorxrDhw4EJ9++qnq7SByF/aaQFlSJ5dyUlkMIOwsICAAgwcPVvw89thj+Pe//42HH34YL79c9SSm0lq2bIkuXbpUXbAUZwkgGjVqhKioKNXbQeQuzOc/hFnRA9G0bWeLyjGZFJWHAYSKpk2bhsTEROTm5qrdFCJyYdVJIgUAMzdY1gNaV2c2ByKjAJbuoUTujQGEigIDAyGlRGFhyV8Pu3fvxoABA1C3bl3odDpERUVh+/btivNKz4Eofnzp0iUMHz4ctWrVQsuWLbFo0SLT/+jjx49Hv379kJ+fDyEE+vbtW2nbTp48iYcffhjBwcHQ6XTo0aMHvvnmG0WZ7OxszJo1C61bt4a/vz+aN2+OV155BZmZmZVes/QciOL2Hzp0CFFRUdDpdLjzzjvx448/AgDWrFmD5s2bIyQkBCNGjMDVq8px2xMnTmDkyJF
o2LAhgoKCEBUVhZ07d5qer+y1W/I6iZxddYcw1sx+1qJyAV4C/l7GvZRyCiWu5XJDLWIAoaotW7agSZMmCAoKAgBs374dAwcORFhYGJYtW4bVq1ejQYMGuO+++7Bx48ZK67p48SJ69OiBy5cvY8WKFZg4cSJee+01fPzxxwCAF154AW+//Ta8vb2xbds2LFy4sMK6Ll26hF69euH69etYvHgxVq5ciW7duuGRRx5BfHy8qdyQIUOwZs0aTJs2DRs2bMALL7yATZs2YdiwYVZf88yZM+jduzcaNmyIzz//HI0bN8bQoUMxd+5cTJw4ERMnTsSHH36IP/74AxMnTjSdd+LECXTr1g03b97EggUL8Omnn6J169YYNGgQDh48WGk7zp8/b9HrJHJ21c1CuW9znEXlhBAcxqAybJfvlMolpUR2drbiWGpqKr766ivMmzcP77zzjqnc008/jcmTJ+PDDz80lR05ciSee+45vPLKK4iJiYEQ5e/Pe/ToUQwePBjffPMNfHyMSV4KCwvxxRdfYMqUKejYsSPS0tKg0WiqXMHxzjvvoG3btti1axe0WuOHxqhRo9C9e3e88MILiImJQWpqKn788Uf8+uuvih3gBgwYgNdffx23bt2y6pqnT5/GmDFjsHbtWgBAdHQ0goKC8Prrr2PDhg0YOXIkAKBBgwYYNGgQCgsL4e3tjRUrVqBTp07Yvn276d7ExMQgMzMTy5cvR48ePSpsx8yZM6t8ncXHiZzVzTw90vOMPQJeGiDI1z5/F9b198L5rCIAxiGTXg11drkOuQ72QNjZyZMnERgYqPhp3rw5Xn/9dbz22muYNm0aAOD333/H6dOnMWPGjDJ1vPLKK0hOTsaRI0cqvE6jRo2wbt06U/AAAJGRkbh8+bJV7dXr9dixYwf+8Y9/4NatW8jOzjb9DB06FFlZWTh06BDq1KmDsLAwLF68GGfPnjWd36ZNG3z55Zfw9/e36rpNmjTB8uXLTY/9/PzQoEEDPPnkk6bgATBuQV1UVIT09HQAxmBn3759puAhPz8f2dnZ6NChA5KSkip9nd98802Vr5PI2Z3NNBu+8POCpoI/MmqKPRBUmlMFELGxsYp8CYmJiUhMTIQQAs2bN4cQArGxsQCAhg0bmsoVr0iYNGmS4vzLly9jy5YtimOffPIJACiORUdHAzD+1Wt+3BaaNm2KvXv3Kn4SExORmpqK1157zVTu5MmTqF+/Pho0aFCmjvDwcISEhOD06dMVXqdDhw6oU6dOmePWTnZKSUlBdnY2RowYUSbwCQ4ORlZWFs6ePQsfHx9s2bIFZ86cQYsWLRAZGYlnnnkGv/76q1XXK9auXTvUqlWrzPH+/cvfLdD8df34448YMmQIgoOD4efnh8DAQMTGxlb62lNSUpCVlVXl6yRyduaJncIszEBZ7O0dlm9nZD6R8iw31SI42RBGbGysKUAwJ6XEuXPnEBERYTpW3l/Wn3zyiSlAKNawYcNyv0jKO7ZlyxbrG10Ff39/i5YsajSaSr/wCgsLERgYaMumlau4y37lypVo1apVuWUiIyMBAD179sTPP/+MpKQk7Ny5E99//z369u2LESNGmJJo2du2bdswZMgQjBw5EkuXLkWDBg3g7e2NtWvX4vjx4xWeZ83rJHJmNckBkXzsEILrlf2jpTwhviV1X8pmAEFOFkB4ssjISFy7dg0XLlxAkyZNFM/98ccfyMzMdMgXWlhYGOrUqYO8vLwygY9er8eVK1cQGhqK3NxcZGZmIjw8HM2bNzdNdDx69Cg6dOiAZ599FnfddZfd2/v2229j6tSpinkjALBv375KA4iwsDCEhoZW+TqJnF11s1ACwNJnxmD54XSLygabza24mF1k1XXIPTnVEIYn69SpE1q0aFFuhslZs2ahT58+aNq0aY2u4eXlVeWQhhACDz74IObPn4+bN28qnouLi0OnTp0gpcTBgwfRrNWLN5AAACAASURBVFkzJCcnl7lGcT2WXrM
m0tPTy6TFPnHiRJlMl6XbIYTAmDFjqnydRM7urJ2zUBar41dS98XsQv7/QeyBcBZCCHz00UcYMmQI0tPTERMTA71ej7i4OOzbtw/79++v8TVat24Ng8GAf/7zn7jzzjsxcODAcss9++yz2LdvH7p27Yp//OMfqF+/Pnbv3o0VK1YgLi4OXl5e6Nu3L+6//3706NED//jHP9C2bVucOnUK77//Pu6//3507tzZqmtW17333osFCxbAz88P9evXx8GDB7Fo0aIywz3ltSM2Nhb/+c9/Kn2dRM7OfA6EtT0Q1vD30sBXK5Cvl7hVJHEj34AQP65S8mT8hHQigwYNwt69exEbG4vJkydDo9HgnnvuwYEDB9C+ffsa11+3bl2sWLEC06dPx82bN5GXl1duueDgYPz000+YMWMG5s+fj9TUVLRo0QLr1q1DTEwMAGPAs2nTJnz88cdYtmwZkpKS0Lx5c7z88suYOnWqqQfC0mtW17x585CVlYXp06cjIyMD7du3x/Lly3H16lVs3bq10tceEhJS5eskcmYFeokLt+cjCAChVvZAjH1jkVXl6/hqcPV2EqmLWYUMIDycsGE3lF37s0pPoiT74b12LN5vx0lISFDkLXF1J2/ko/XnZwAYv9wXRNW36/Xe+18ajqcbh0z+PbQJ7m9e8cRud7vXzs4B97vMrHjOgSAiclFnq5mBsthTnUKsKh+sWInBiZSejgEEEZGLsvc23qXV8VVOpCTPxgCCiMhFme/CGaaz/5S2OlzKSWYYQBARuaia9kB07D3IqvLB5ks5s9gD4ekYQBARuSjlLpzWBxDTlqy3qrxyCIM9EJ6OAQQRkQuSUpZKImX9EMaSaaOtKs85EGSOAQQRkQu6nqvHrSLj6nl/L4EAb+s/zv/Ys92q8gHeAl63L5NZYEBWgd7qa5L7YABBROSCks3mIIQ6KKGTEELRC8GlnJ6NAQQRkQtKziwJIByZEZLDGFSMAQQRkQs6n1XzAMLSnTjNBfuZLeXMYg+EJ2MAQUTkgmwxhLFn02qrzwlmDwTdxgCCiMgFJWeWrMCobg/E2jnPW30Ol3JSMQYQdrZ//37ce++9qFWrFsLDwzF9+nQUFBQoynzwwQcQQpT7U3r3ytjYWNSuXRuRkZHlbvEdGxuL9957z+L29e3bFwsWLKjei3Ni7vq6iIqdNxs+cNQkSoBzIKgEt/O2o+PHj2PgwIF46KGH8Mwzz+Dy5cuYPXs2/vzzT3z99demLa9TU1PRrVs3zJkzp0wdPj4+pt//9a9/YeHChXj33Xexf/9+jBgxAmfOnIG/vz8AICUlBe+88w4+/fRTi9u4cOFC1K1b1+rXtnv3bly8eBHjxo2z+lwiqjlb9EBURx3FHAgGEJ6MAYQdLVq0CH379sXatWtNx/7yl7/gnnvuwZYtW/Dggw8CANLS0hAZGYnBgwdXWt8nn3yCKVOmYMqUKZg4cSIaN26MrVu3IiYmBgDw9ttvo3nz5hg5cqTFbezevXs1XpkxgPj5558ZQBCpIKtAjxv5BgCAlwACfarXmTz1g3VWn6NYxpnDIQxPxiEMO/rhhx/wyCOPKI5169YNMTExWLVqlelYamoqwsLCqqwvKSkJnTp1AmDsmWjXrh2SkpIAAFevXsXSpUsRGxsLjYZvK5E7K70CQ3O7N9Nazdp1tvqcQB8NNLcvl3pLj7wiQ7WuTa6P3zR2dPnyZbRt27bM8ejoaOzdu9f0ODU1FaGhoabHhYXldwvWq1cPFy9eND2+ePEi6tWrBwCYP38+WrdujYcfftiqNpaeK9C3b18sW7YM+/fvR8+ePREQEIAuXbpg+/aSjHURERGYPXs2tm/fDiEEYmNjTc999913uOeee6DT6RAcHIzo6GgcOXJEcc2IiAjEx8fj3XffRaNGjdCgQQNs2rSpTLAFADdu3ED79u0Vrzs+Ph7du3dH7dq10bhxYzzxxBO4du2aVa+byJXZKgfEKwPaW32ORgjFSgwmk/J
cDCDsqHbt2rh582aZ461atUJaWhqysrIAGIcwDh48iO7du0On08HHxwdt2rTBhg0bFOc9/PDDWLZsGY4cOYIlS5bg8uXLGDx4MC5duoRPPvkEc+bMMc2rqIm9e/eib9++iIyMxOeff46uXbsiOjoahw4dAgB8/vnnGDt2LLp06YJt27aZhjG2bt2KIUOGIDIyEqtWrcLixYshhECvXr1w8uRJxTUWLVqE+fPnY8aMGVi3bh2aNGmCzZs3lwkEvv76a2RkZKBRo0YAjIHShAkT0KdPH6xatQqzZ89GQkIC7r///goDLyJ3o1YSqWLBim29+f+dp+IcCDsaMGAAVqxYgYEDB5qOFRUVYdasWQCArKwsBAYGIjU1FWfOnMFbb72FDh06ID09HV999RVGjRoFrVaLESNGAACeeuopfPvtt+jYsSO0Wi0+/PBDhIeHY+rUqejYsSOio6MBAHq9Hlpt9T9UDhw4gDlz5uD1118HAIwYMQKnTp3Cxo0b0blzZ/Tp0we7du1Camqqad6GwWDAtGnTMH36dMydO9dU14QJE/DEE0/gjTfeQHx8vOl4YmIi9u/fjx49epiORUREYOPGjXj66adNxzZu3IgRI0aYAqN3330X77//PiZNmmQq89BDD6Fx48Y4cOAAevfuXe3XTeQqbJFEqiaM8yCMbeBSTs/lVD0QsbGxFS5nbN68eYXP2evHvGu+OmbMmIFvvvkGTz/9NJKTk3Hs2DE8+OCD2L17NwAgKCgIANC/f3/ExcXh6aefRr9+/TB8+HCsXbsWzzzzDN544w1TfX5+ftixYwcOHTqE8+fPY8qUKTh//jw+++wzvPnmm5BS4rnnnoO/vz8aNmyI//73v9Vqd79+/UzBQ7HIyEhcvny5wnMSExNx6dIlTJ06FdnZ2Yqf8ePHY9u2bZBSmsoPHz5cETwAwKhRo7BuXcmkrrS0tDLzSFJTUxXBQ25uLnx9fdGkSRPTfBAid2erfTCihj9WrfMUyaS4EsNjOVUA4W46dOiA9evXY82aNYiIiED79u3h7e2NZcuWwc/PDwEBAQCAuLg4DBs2rMz5EydOxPHjxxXDIEIIdOrUCQ0bNgQAzJ07F926dcOgQYPwxRdfYOXKlVi1ahWGDRuG0aNHIzMz0+p2d+vWrdzj5gFAaWfOnEFRUREaNWqEwMBAxU+fPn2QmZmJtLQ0U/kHHnigTB2jR4/Gzz//bAoEvvrqKzRs2BA9e/Y0lSksLMSiRYtw5513wtvbGwEBAQgMDMSpU6cqbR+RO7HVEMajsyzPGWNOsZSTQxgei0MYdvbwww9j8ODBOH78OMLDw9GoUSOsX78erVu3rvJcX19fACiTTKpYUlISVq1aZeppiIuLw7Rp0zB27FiMGjUK33zzDbZs2YKxY8fa7gVVQKvVIiAgAN99912FZQIDA02/F782c+3bt0fHjh2xfv16zJgxAxs3bkRMTIxiXsdjjz2G77//HlOnTsXcuXMRFBQEIYRDXiORszhvox6IeY/0w8wNu6w+jztyEuBkAURsbGyFwwbnzp1DRESEQ9tTUxcuXIBOp0NoaCi6dOliOn7s2DH87W9/AwAkJyfjxRdfxJo1a+Dn56c4PzExEYGBgRUmepozZw7uvfde9OvXDwBw6dIlREZGAjB+obds2RKXLl2yx0sro02bNsjJyUFoaGiZlSe5ubnIzMwsN2gobcyYMVi7di2efPJJ7Nq1C/PmzTM9d+7cOcTHxyMhIUFxPwGYkmkRubtCvcTl2/kXBIDgGgQQ548fqtZ5zEZJAIcw7OrFF1/ElClTFMdyc3MRHx9vSvZUr149fP/99/jss8/KlJszZ45pImVpp06dwhdffIE333zTdKxp06Y4duwYAONkzRMnTqBp06a2flkAAC8vLxgMJeu/O3bsiE6dOuHVV19VHAeAKVOmKOYtVGbUqFE4cuQIYmNj0bRpU0Wiq/T0dAghyvTeLF++HKdPn67BqyFyHRezC2G4PVp
X20cDb03NV15ZS7kKgz0QnsqpeiDczZQpUzBgwACMHz8e0dHROHPmDD777DMMHDgQXbt2BWD8y3nhwoWYMmUKjh8/jj59+iArKwsffPABCgoKMH/+/HLrnj17Nv7617+iV69epmMTJ07EhAkTcMcdd+DAgQMwGAymlRm21qFDByxatAjx8fFo1aoVunTpguXLl6N///4YOHAgHn30UWg0Gqxfvx779+/Hr7/+alG9ERERuOeee7Bs2TK89NJLiufatGmD0NBQjBs3zrR0dPXq1di1axd0Op3NXyORM7LlCoyguuHVO89XCwFAAriaU4RCvYS31vGBDKmLPRB21LdvX+zcuROnTp3CuHHjsGTJEjz++OP46KOPFOUmTZqEuLg47Nu3D+PGjcMrr7yCu+66C/v371ckmCqWk5ODH374oczeGTExMXj22Wfx4osvYt++ffjyyy9NEzVtbejQoRgzZgz+/ve/Y+bMmQCAnj17Yu/evfDx8cHzzz+PSZMmITc3F3v27DENrVhi9OjRkFKWScmt0+lMuSIee+wxPPHEE/Dz80NiYqLdelqInI35BMqabqK18Idj1TrPSyNM6bMlgCtMae2RhA1nrtt1CrwrzoGwp9zcXLv91c177Vi8346TkJBg6v1zVW8eTMEbB1IAAH9rGoDhrWpXu64tSxcgeur0ap371i+ppuWkP42MwF8aKj+P3OFeuxIH3O8yXUzsgXBR7LIn8ky2HMLY+vHCap/LpZzEAIKIyIUohjD8HZ+Fshj3wyAGEERELsQ8C2WIr3oBBJdyEgMIIiIXIaXEeRv2QMyI31ntcxVLObPYA+GJGEAQEbmIlFt65OmN89X9vQT8vdT7CK/jxx4IT8cAgojIRdh6G++3RvWv9rnKIQz2QHgiBhBERC5C7W28zZlPorycUwi9gZvZeRoGEERELsJW23jbgo9WIMDbmBqgyABcv8VeCE9jcQAhhOgjhEgUQuQKIQ4JIQbZs2FERKRk6yGMIZNfrtH5imEMTqT0OBYFEEKIFgDWAXgRQDiANwGsF0I0tmPbiIjIjK2HMKqbhbKYchiDAYSnsbQHYgqA96SUu6SUmVLKTQA+A9DSfk1TKr3DI9kP77Vj8X6TpWy5DwYAvPzXdjU6P8hsKeeVHK7E8DSWBhDDYeyBMJFSviSl3G3zFlUgPT3dUZfyeLzXjsX7TZZKtnEPREbK1RqdH+zDHghPVmUAIYTQAggD0FIIsVMIcVoIsUEI4bDeByIiT5ddYEB6nh4A4CWA2j7qz4E374G4zKWcHqfK3TiFEPUBXARwHsB0AP8DEA3jfIg7pZQpt4uaKkpISLBLY4mIPNXZXIFHfvMHAAR7S0xpVlDjOlc9OwYT3ltXdcEKnMrRYNMVbwDAX+ro8X77/Bq3iZxHqd09y+zG6WVBHQJAAYB+Usrzt48tFkK0BzAewD+ruKhNcGtYx+G9dizeb8dx5XudkpQF/HYBAFC/li8imjWscZ2zv95fswoyC4AraQCAHK0OXbt2ND3lyvfaFalxvy3pA0sBUAjgcqnjhwFE2LpBRERUlq3nPwDAmtnP1uj8IM6B8GhVBhBSSj2ARAD3l3qqHYAL9mgUEREpnbdDEql9m+NqdH5tH42pXzvllh6Femaj9CSWDGEAwHwAnwkhbgJIADAQwCMAOtirYUREVMLWSaRsQasRCPTRILPAuBT5am4RmgR6q9wqchSLpvFKKXfAOIHyMwA3AcwBECOlvGTHthER0W3OtA+GuWDFSgzmgvAklvZAQEq5HsB6O7aFiIgqYI85EG/vOFrjOoJ8tcDtNNacB+FZ1F9ITERElSoySFwyy7MQ4mubACL52KEa1xFklo+CAYRnYQBBROTkLmcXoXi37No+GnhryyzJr5alz4ypcR2K/TCYTMqjMIAgInJyyVklSaOcaf4DcHsI47Yr7IHwKAwgiIic3HmzrbKdLYBQTKLkhloehQEEEZGTS860Tw/E2DcW1bgObqj
luRhAEBE5OfMeCFslkQKA3iPG17gObqjluRhAEBE5OXvlgHiqU0iN6wg0y0aZlqdHfpGhxnWSa2AAQUTk5JwxC2UxjRCKXghOpPQcDCCIiJyYlNJps1AWM99UiwGE52AAQUTkxG7kG5BdaBwW8NUKBHjZJgcEAHTsPcgm9ShXYjCA8BQMIIiInNh58+ELXy2EsF0AMW2JbXYnMM8FwQDCczCAICJyYuZ7YNSx8fDFkmmjbVIPN9TyTAwgiIicmPn8B1su4QSAP/Zst0k9QcwF4ZEYQBAROTHlBErn/MjmKgzP5Jz/GomICIBzL+EsFsw5EB6JAQQRkROz5xLO5YfTbVJPEOdAeCQGEERETsy8B8LWcyD2bFptk3pqeWugub045Ea+AbeYjdIjMIAgInJS+UUGXM01DgkIKIcKbGHtnOdtUo9GCAT5cB6Ep2EAQUTkpC6abU4V7KuBVmO7HBC2psgFwU21PAIDCCIiJ5Xs5CmszQVzJYbHYQBBROSkztt5BcbUD9bZrC5lLghOpPQEDCCIiJxUsh2TSAFAs3adbVaXMhsleyA8AQMIIiInZe9dOF8Z0N5mdXE/DM/DAIKIyEk5+zbe5pTJpDiE4QkYQBAROSlXyEJZzHwZJ4cwPAMDCCIiJySltHsPRNTwx2xWl3kPBFdheAYGEERETuh6rh75egkA0HkJ+HvZ/uP60Vnv2ayuAG8B7e00FRkFBtzS26xqclIMIIiInJAj5j/Me6SfzeoSQih6IVILnDfpFdkGAwgiIifkiCRS548fsml95ptqpTCAcHsMIIiInJArrcAoFuzDHghPwgCCiMgJ2TsLJQAE1Q23aX212QPhURhAEBE5IUcMYSz84ZhN6+McCM/CAIKIyAk5Yghjy9IFNq0vmD0QHoUBBBGREzJPImWPfTAAYOvHC21aXxDnQHgUBhBERE4mp9CAtDxjIgWtAGr7uMZHtaIHIp8BhLtzjX+VREQe5ILZ8EUdPy00wjW+jM031LpeICClVLE1ZG8MIIiInIxiDwxf+y3hnBG/06b16bwEfDTGYCfPIJCRb7Bp/eRcGEAQETmZpMwC0++h/q6RAwIozkZZ8rVyibtyujUGEERETiYpo+SLN8yOSaTeGtXf5nWaL+W8xF053RoDCCIiJ5NkNoQR5kI9EAAQ7GfWA8EAwq0xgCAicjLmQxhh/l4qtsR65j0QF7M5hOHOGEAQETkZ8yEMe+WAAIAhk1+2eZ11OIThMRhAEBE5kawCvSkHhJdGucOlrUVPnW7zOhWTKNkD4dYYQBAROZHSvQ/2zAHx8l/b2bxO9kB4DgYQREROJEmRwtq+8x8yUq7avE7FKgwu43RrDCCIiJxIUob5BErXWoEBGNNuF/eZXM/VI7+IyaTcFQMIIiIn4sglnE3bdrZ5nVqNUMzbuJLDYQx3xQCCiMiJKJZw2nEFBgDM3LDLLvUqhzEYQLgrBhBERE5EMYnSzj0Qa2Y/a5d6uRLDMzCAICJyElJKhyaR2rc5zi71Mp21Z2AAQUTkJFJv6ZFTaNwC208rEODlGtt4l8ZslJ6BAQQRkZMoPYFS2DEHhD3V8eV+GJ7A6gBCCPGgEEIKISJs3xwiIs9lvoTTnimsi72946hd6lUOYbAHwl1ZFUAIIUIALANQUFVZIiKyjiKJlANyQCQfO2SXejkHwjNY2wPxIYBNAFLs0BYiIo+mSCJl5yyUALD0mTF2qVexCiOnCFJKu1yH1GVxACGEeAhADwCv2q85RESey5FJpOzJz0sDX40xaCjQS6Te0qvcIrIHiwIIIUQojEMXT0opc+3bJCIiz+QuAQQABHqV9DpwGMM9CUu6loQQ6wHckFJOvf34IoAoKeU5s2KmihISEmzcTCIi96aXQNR+fxRJ48qLF+7Ih4+d18n977vNuGvwcLvUvf6SN87dMr6Axe3yEBXCPTFcTdeuXc0fllkSVOUgmxDiYRiHLjpV86I2kZCQYJd6qSzea8fi/XYcZ77
XF7IKUfTTKQBALW8NIptH2P2aEU+9YLe6A69dNP2ua9AcXTvWsdu1SJ1/25bEt9MANABwTgiRKoRIvf34NyHEh3ZtHRGRh1BjF86nOoXYrW7lEAaXcrojS6b5PgLAt9SxXwEMA3DC5i0iIvJAiiWcDsgBYW/mAcRFzoFwS1UGEFLKMks2hRB6AFellDfs0ioiIg+jRg+EPQWafbuwB8I9MZU1EZETOGe+AsNBPRAdew+yW92BWq7CcHfVylQipWxs64YQEXky5RJO+yeRAoBpS9bbrW7OgXB/7IEgInIC5tt4OyKNNQAsmTbabnXrtID29sK/G/kG5BZyGae7YQBBRKSyAr3ExSxjN78AEOKgIYw/9my3W91CAEHcVMutMYAgIlLZ+axCUya+YF8NvDWuuY13adzW270xgCAiUpliG28HzX9wBMWunDkMINwNAwgiIpUlqbACAwCWH063a/3BHMJwawwgiIhUplYOiD2bVtu1/mAOYbg1BhBERCpTaxfOtXOet2v95j0QF9kD4XYYQBARqUwxB8IN0lgXq6MYwmAPhLthAEFEpLIzGY5PIuUIwX7mQxjsgXA3DCCIiFSUdqsIaXl6AIC3RjlvwN6mfrDOrvUH+5T0QFzJKYLeICspTa6GAQQRkYpO3SwZvqiv84JGOC4HRLN2ne1av7dWIMDb+Hr0Erh+i8MY7oQBBBGRik7cKAkg6ukcO3zxyoD2dr+G+TyI4myb5B4YQBARqejkDWUPhLthLgj3xQCCiEhFJ2/mm36vr3OfFRjFzOd0XORKDLfCAIKISEXmQxjhDu6BiBr+mN2vYb4x2Pks9kC4EwYQREQqMUiJUyrOgXh01nt2v4Z5XotzmQwg3AkDCCIilVzIKkSe3ri0MdBbgwBvx34kz3ukn92vYZ7X4lxmQSUlydUwgCAiUslJRe+D4+c/nD9+yO7XYA+E+2IAQUSkkpM33XsFBgAE+WqgvZ3aIuWWHjmFBnUbRDbDAIKISCUnVF7CGVQ33O7X0AihmEiZzF4It8EAgohIJSdvmC/hdHwAsfCHYw65jnIYg/Mg3AUDCCIilSh7IBw/B2LL0gUOuU6oP+dBuCMGEEREKsgrMpi68wWAuir0QGz9eKFDrsOJlO6JAQQRkQrOZBSgeG/KUD8tvDWO20TL0RhAuCcGEEREKlAMXwS45wqMYswF4Z4YQBARqUCxiZa/OntgzIjf6ZDrsAfCPTGAICJSwQnzFRhu3gPBXBDuiQEEEZEKnGEb77dG9XfIdZgLwj0xgCAiUoEnZKE0x1wQ7ocBBBGRg6Xn6ZF6Sw8A8NYAwb7u/1HMXBDux/3/1RIROZnSGSg1Qp0lnEMmv+ywa3EipfthAEFE5GBq74FRLHrqdIddiwGE+2EAQUTkYMptvNULIF7+azuHXYu5INwPAwgiIgc7ebNkCCNchT0wimWkXHXYtdgD4X4YQBAROZizDGE4EnNBuB8GEEREDmSQEqecZAijadvODrsWc0G4HwYQREQOdDGrCHl64zZagd4aBHir9zE8c8Muh16PuSDcCwMIIiIHMk9hXU/F+Q8AsGb2sw69HnNBuBcGEEREDmSegTJc5fkP+zbHOfR6nEjpXhhAEBE50JFUz9lEqzQGEO6FAQQRkQP9npJn+r1xLW8VW+J4zAXhXhhAEBE5iN4g8UdaSQDRpJa6PRBv7zjq0OuxB8K9MIAgInKQMxkFyCk0rsCo7aNBbV91J1EmHzvk0OsxF4R7YQBBROQgzjZ8sfSZMQ69HnNBuBcGEEREDnIopWQCZZNAz5pAWYy5INwHAwgiIgdxth4INTAXhPtgAEFE5CCHUs0mUDpBD8TYNxY5/JqcSOk+GEAQETlASm4RLmUXAQC8NUA9f/UDiN4jxjv8mgwg3AcDCCIiBzDvfWhUyxtajVCxNUZPdQpx+DWZC8J9MIAgInIA8wmUjVXO/6Am9kC4DwYQREQOwAmURqVzQWQXMBeEq2IAQUTkAOYBRJN
A5wggOvYe5PBraoRAmNlKDPPdScm1WBRACCE0Qoh/CCFOCCGyhRB7hBDd7d04IiJ3kFdkwJ83nG8IY9qS9apct4HZJmLH0hlAuCpLeyDmAXgcwDgAjQB8AmCrEKKxvRpGROQujqXno+h2T31dfy38vJyj83fJtNGqXLdBQEkPzLE0BhCuytJ/xX8H8KiU8lcpZYaU8gsA/wFwn/2aRkTkHpQTKJ1j+AIA/tizXZXrNmQPhFuwtB8tCsDJUse0APxt2xwiIvejnP/gHMMXalIOYXApp6uyqAdCSnlCSimLHwshWgKIBvCtvRpGROQuuAJDqb7OC8VZMM5mFOBWEVdiuCJhFhdYdoIQ9QH8AGCllNI8D6qpooSEBNu0jojIxUkJ9P/ZH9l641fm1Gb5CGIMgWXJPrhZaLwna++8hcha1n0Xkf117drV/GGZzGdW9aUJIToA2AJgRangobKL2kRCQoJd6qWyeK8di/fbcdS41+cyCpD902kAgM5LoFOLZhBC/SyUALBn02q7pbM+l3wOEc0iKny+6c103Ew1zn8Q4S3RtU2QXdrhKdT4t23xVGAhRD8A3wN4VUr5lv2aRETkPkoPXzhL8AAAa+c8r9q1wzmR0uVZ1AMhhOgM4HMA0VJKjk8QEVnoUGrJl6OzJJByBuYrMY4zgHBJlg5hfAjgLQC/CyHMzzFIKTn7hYioAsoeCK7AKMZkUq6vyiEMIUQQgHsBLANQWOrnK7u2jojIxR1ywhTWxaZ+sE61a4frSgKIUzcLUKDnJEpXU2U4LKXMQDmzL4mIqHI38/RIur3jpEYox/2dQbN2nVW7tp+XBiF+WqTn6VFkAE7fLEC7UF/V2kPWc458qkREbmj/lVzT741qecFb41x/i70yoL2q1+cwhmtjAEFEmnQw7QAAEalJREFUZCc/XiwJICKD+dd1aYoAgntiuBwGEEREdrLnUkkA0SrYR8WWOCf2QLg2BhBERHaQU2hAwvVbpsfOGEBEDX9M1eszgHBtDCCIiOzgwJVc0xbeDQO8UMvH+T5uH531nqrXb2C2EuPkjQIUGbgSw5U4379oIiI34ArDF/Me6afq9XXeGgTdDqzy9RJJGdyZ05UwgCAisgPzCZSt6jhnAHH++CG1m8CtvV0YAwgiIhvLKzLg4NWS+Q+RTtoD4Qy4EsN1MYAgIrKxX67eQv7tzIr1/LUI8tWq3KLyBdUNV7sJnEjpwhhAEBHZmGL+g5MOXwDAwh+Oqd0EBhAujAEEEZGNmQcQzjx8sWXpArWbgIYBJfuDHE/Ph0FyJYarYABBRGRDhXqpSGHtrCswAGDrxwvVbgJq+WgQ6G38KrpVJJF8e+8Qcn4MIIiIbOi367eQU2j8KzrET4tQf+faQMsZmQ9jHOcwhstgAEFEZEM/usjwhTMJ5zwIl8QAgojIhlwhgVSxGfE71W4CAGUPxJFUBhCuggEEEZGN6A0S+1xkBYYzaVyrJID42Sx/Bjk3BhBERDZyODUPGQXGDTBq+2hQz9858z8Ue2tUf7WbAACIqO0DrTD+fuJGAa7nFqnbILIIAwgiIhspvXxTCKFia1yHj1YgonbJcs59l3MrKU3OggEEEZGN7LyQY/qdwxfWMZ8vsvcSAwhXwACCiMgGMvP12J5cEkC0DfFVsTWWGTL5ZbWbYNLSLIDYwwDCJTCAICKygS1J2ab9LxrX8kJ9nfPnf4ieOl3tJpi0CPJB8YDP7yl5yMzXq9oeqhoDCCIiG9h4MsP0e5f6/iq2xHIv/7Wd2k0w0Xlr0Oj2agyDBA5c4WoMZ8cAgoiohjLy9fjObPiiSz0/FVtjuYyUq2o3QcF8GGMvJ1I6PQYQREQ19O3ZLBTcHr5oEugawxfOiBMpXQsDCCKiGtpwMtP0e9d6rjF8AQBN23ZWuwkK5gHEwau3kF9kULE1VBUGEERENXAjT4//JmebHnep7xrDFwAwc8MutZugEOSrRd3bybfy9RIJ1/NUbhFVhgEEEVENfHM
mC4W3/1BuFuiNui60++aa2c+q3YQyWnIYw2UwgCAiqoGNp0qGL1yp9wEA9m2OU7sJZXAehOtgAEFEVE3peXp8f75k+KKri6y+cGbmPRA/Xc6F3iBVbA1VhgEEEVE1/etMJorn+TWv7Y1QFxq+cFb1/LWo7WP8asooMOBIGrf3dlYMIIiIqmnDCdcdvgCAt3ccVbsJZQghOIzhIhhAEBFVQ+qtIvxwwTx5lOss3yyWfOyQ2k0oFydSugYGEERE1bDij5u4nTsKLYK8EeKnVbdB1bD0mTFqN6FcrUplpJSS8yCcEQMIIiIr5RQasOh/aabHvRrqVGyN+2lUywt+WuPWWldyinAmo1DlFlF5GEAQEVnp48M3kHrLuFtkqJ8WPcNdb/jCmWlKzYP40izTJzkPBhBERFa4VWTAPxNTTY8HNwuAViMqOcN5jX1jkdpNqFA3s6Dss6M3YOAwhtNhAEFEZIVPj9zEtVxj70MdXw3uceHhi94jxqvdhArdVdcP/l7GwOxMRiF+vMjJlM6GAQQRkYXyiwx4O6Gk9+FvzWrB20V7HwDgqU4hajehQj5agR6KXoibKraGysMAgojIQquPZeBSdhEAoLaPBlEu3PvgCszv76ZTmbiRp1exNVQaAwgiIgsU6iUWmPc+NA2Aj9Z1ex9cQZNAbzQNNGb3zNdLrP0zQ+UWkTkGEEREFvjizwycyzQuJ6zlrUHvxq7f+9Cx9yC1m1Al816IT4/cYE4IJ8IAgoioCjfz9Ij9OcX0eEDTAPhqXf/jc9qS9Wo3oUrd6vvD+/atPpSaj9+u56nbIDJx/f8DiIjsSEqJiTsu43yWsfchwEugrxv0PgDAkmmj1W5ClXTeGkWa8E+PcDKls2AAQURUiWWHb+Cr01mmx+PaBsHfyz0+Ov/Ys13tJlikV8OSAGLdiQzkFhpUbA0Vc4//C4iI7OD363l4bs810+O+jXW42wU3zXJ1rYJ9UM/fuNdIZoEBm04xM6UzYABBRFSOrAI9Rv7nIgpu75jVpJYXRrSsrXKrPJMQQrHfyPu/p6PIwMmUamMAQURUipQSU3ZexambBQAAX63Akx3rwNvNlm0uP5yudhMsdk8DfxTf/t+u52HuwZTKTyC7YwBBRGRGSonXD6Qocg6MbROE+jovFVtlH3s2rVa7CRYL8tViyB2Bpsdv/pKKfZeY3lpNDCCIiG7LKzJgzLZLmPdLScKoXg38FSmV3cnaOc+r3QSrDG4WgMjbu3QaJDD2u0u4yeyUqmEAQUQE4HpuEfpvTka82dbR7UJ8Map1kIqtInMaITChfTB0tzfZOp9ViKd+uMLkUiphAEFEHu9oWh56xCfhwJVbpmN9GukwrXMdpqt2MiF+WjzatiSo23gqE58fY4prNTCAICKPdSm7EFN3XsFda8+a0lQLAI9E1sbo1rWhdeGdNi0x9YN1ajehWu6u548os9wQ03Zfwa4LOSq2yDNZHEAIISYJIc4LIW4JIbYJIRrZs2FERPZyLacIz/14FS1WncaywzdQnJfIVyswtXMd9G8SACHcO3gAgGbtOqvdhGobGVkb9XXG3BA5hRL9Nyfj/3ZeQXYBk0w5ikUBhBDiQQCvAngIQAMAfwKIt2O7iIhs6npuEbZc02LE1gtovuoU3vtfOvL1JWPnLYN98ErXUHQK81OxlY71yoD2ajeh2ny1GjzZoQ4CvEoCvaWHb6DjF2fww/lsFVvmOSxdl/Q8gNellL8BgBDiRQDnhRAdpJRH7NY6IqJquFVkwIn0AhxNz8fRtDzsvJCLX67egoQvgCxF2Yja3hh6RyDahvh4RK+DO2kS6I1ZPeti7Z8ZOJSaDwA4l1mIAV+dxwPNa2Fws1r4W7MAtArme2sPVQYQQggtgJ4AhhYfk1LqhRA/AugKwCEBRIEByC9i15Qj8F47lrvd79Lz4YsfS1n6d2k6JiVgAKA3SOgloJcSegNQaJAoNEgU6CUKDBL5eonsAgNyigzILjAgu9CAtDw
9ruUW4VpuEa7n6nExuxBJGYVl2lFak0AvPHhHIDqG+vLLxYUF+WoxpVMd/HotD/EnMpBTZHzn/52UjX8nGXsimgV6o09jHZoGeqNBgBfCA7wQrvNCbR8NfLUCvtri/wpoBCCEccWHgPH30v86nO1fi0alf7+iquUvQogwAElSysBSx/8JIE1KueD2Ibuuo/F5/ygKpbO9bUTkKjQCaORrQLfGQegU5otwnZfdAoctSxdg68cL7VK3Oxoy+WVET51e43oy8vVYfyIT/0vxrC2/p3Wug8cDL+H/27vbELmuOo7j399sdpNtnkyJyW7iQxqfqqZWaF6oic9UQosxCRQRW/Eh2BcKFhW0ICERFIVKEQSNLyQNSIXQ1iJiqRrWRomarcHG0KJo2jx13UZj09RNstn9++LMrpPJJjt3du+dmZvfBy6zc+bMnf/8984955575s7atWvzfJnLPiyNnMLoBqaa3jrOFeZQDA4OZgurIeW8kIuZzT4RLOsJXjUvWDFvnNf2BmsWjrFgDsAwnCMtOekeOZ3fykuoe+Q08//1jxmvZz7wpZUwtFQceqnCX850cfhshXPj5T74HB4ehoWz3/ZO1yFppAPxArBwivIlwLFmXrQZ3fsPo7YbOCqnGB9HFX/DtyhlzPeVhnxrh4Mr0uR9kUYIuiqiqzp8XBH0VERPV3Wp/r2gu8L87srk7fXzKizrncPy6+aw7Lou+ubPYfXinil/cntwcDDvozQA/jawgkdyf5XyeNuqFXz8fbfksu7RseAPQyMcOnWOof9eZOjli5O3Z0eD82PjnB+LySWA8Uin18bjyqfk2smKvuXA8UK27VrTnsIAkHQcWB8Rz9aU7QW+ExE/rxblmteiPvjmXBfN+S6Oc30pSbldxdG5LlYB+b7sCL7Rw55HgS2Ta5EWkSZQ7p+duMzMzKyTNPo1zp3AXkkHgSeB+4AHI6JzfgvWzMwu0d/f3+oQrIM1NAIREU+RrgWxB3iONKPxnhzjMjOznJ08ebLVIVgHa3jmVkTsjoilEbEkIu6KiJHpn2VmZu1q+/btrQ7BOli5pn6bmVnDduzY0eoQrIO5A2FmZmaZuQNhZmZmmbkDYWZ2jcrnqsF2rXAHwszMzDJzB8LM7BrlK0XaTLgDYWZmZpk19FsYDWrH3xgxMzOzmWv6tzDMzMzMJrkDYWZmZpk1+mNajbhseMPMzMzKySMQZmZmllnbdCAkfVbSUUkjkn4haeVV6i6S9GNJZyUNS9peYKilkDHfKyXtrub6n5K+L2lRkfF2siy5rnveQ5IGcg6vVDJu1xVJ2yQ9L+lFSXskLS8y3k6XMd83SxqQ9JKkIUnfkNQ2bVC7k9Qn6ZikrdPUK6x9bIt/nqSNwL3AJqAfeAb4yVWesgsYBW4A3gPcIekzOYdZGlnyLakXeAI4BdwMrAUWAz8qJNgO18S2PfG8jwG35xtduTSR668A64BbgDcDZ4Af5hxmaWTcjywEHqs+/mrg/cCHgM8VEmyHk9QDPAy8ooHquyiqfYyIli/AAHBnzf0u4ASwZoq6q0gf9N6asg3AYKvfR6csGfN9O3Cgrmwu8DIwr9Xvpd2XLLmuqbMcGCLtnAda/R46Zcm4XXcBx4H+mrJFwK9b/T46ZcmY7/XAn+rKNgGPt/p9dMICbAYeAh4Atl6lXqHtY8tHICR1Ae8AfjZRFhFjwG9IR7v11pM+5CM1ZQPATZJmc1JoKTWR733AR+rKKqSdRU9OYZZCE7mesJM0wnMg1wBLpMn9yNMR8XxN/TMR8cG8Yy2DJvL9LLBa0ltrym4Fns4xzNKIiEeAO5j+ekuFto/t0OAuAUYj4sW68hNA3xT1+0hHDpMi4pykM8BS0pGbXVmmfEfEGVKPttY9wL7qY3ZlWbdtJN0J3Ah8FHh3vuGVStZcrwaOSPoqcBdpyPdh4FsRcSHXSMsh637kuKR7gd9Kuh94I7AG+EDukZZERIxL037ZsdD2sR06EN2k4fB640w
9RyNrfbvUjPInaTOpA/GuWY6rjDLlWlI/cD+wMSLON7CzsP/Lul0vI3UcHgM+CVwAvgn8APh0PiGWSjP7kd3Ae0lzT3pI+a7vgNjMFNo+tkOD+wKwcIryJVx+5AswTDpX2Wh9u1TWfE+S9AXge8CGiPh7DrGVTdZc7wR2RcT+XKMqp6y5FvArYEtEHIiIP5NGfTZLemV+YZZGpnxLuh74I6mjdgNpBGId6Zy+zZ5C28eWdyAi4iJwWtKquodeDxyZ4inPAW+pLZD0GuA/EXE2jxjLpIl8o+S7pCO1d0bEwTxjLIssua5uwx8Gtko6JekUadLUuur9dQWE3LGa2K6PAUNRnWVWXcfZat36dVidJvL9eeCZiPhERAxHxBFgI3CrpNflGuy1pdD2seUdiKpHgS0Td6rXGFgLTHUk9gRwo6Ta82wbquXWmCz5Bvg68CZgXUQczT+8Umk01ydIX2+7CXh7dbkbGKz+PVhEsB0uy3a9D7hN0tya+r2ko+NjOcdZFlny3Ucagag1Qhpun3t5dWtSoe1jO8yBgDR0u1fSQeBJ4D7gwYj4t9KJ4C5gLJILknYDD1QvqNEDTEyEssY0nG9JbyDNeVgFXKibyTtWewRnU2o012PUTX6qjkKcj4jj9Su1KWXZjxyVtBfYLenLpIbs26TJwZ6I3ZiG8w38lJTrw6TO22Lga8BR/E2MprW6fWyLEYiIeAr4IrCHNATTS2q0IE26Ga3eTtgGnAb+CvwS2BERvyss4A6XMd+3AQtIF5IarVvqv95pdZrYtq1JTeT6btJ+5FC1/mLgU0XF2+my5DsiHiedxthGGm37PWli3yYfhMxIS9tH+X9nZmZmWbXFCISZmZl1FncgzMzMLDN3IMzMzCwzdyDMzMwsM3cgzMzMLDN3IMzMzCwzdyDMzMwsM3cgzMzMLDN3IMzMzCyz/wG99QkT0e2BDAAAAABJRU5ErkJggg==\n", 207 | "text/plain": [ 208 | "
" 209 | ] 210 | }, 211 | "metadata": { 212 | "needs_background": "light" 213 | }, 214 | "output_type": "display_data" 215 | } 216 | ], 217 | "source": [ 218 | "heads = y.sum()\n", 219 | "prob_dens = dist.pdf(x, 1 + heads, 1 + N - heads)\n", 220 | "cred_interval = dist.interval(0.95, 1 + heads, 1 + N - heads)\n", 221 | "\n", 222 | "plt.plot(x, prob_dens, label=f\"observe {N} tosses,\\n {heads} heads\" )\n", 223 | "plt.fill_between(x, 0, prob_dens, alpha=0.4)\n", 224 | "plt.hlines(y=0, xmin=cred_interval[0], xmax=cred_interval[1], label=\"95% interval\")\n", 225 | "plt.axvline(0.75, color=\"k\", linestyle=\"--\", lw=1, label=\"Point estimate\")\n", 226 | "\n", 227 | "plt.legend()\n", 228 | "plt.show()" 229 | ] 230 | }, 231 | { 232 | "cell_type": "markdown", 233 | "metadata": {}, 234 | "source": [ 235 | "Whereas the logistic regression model gives the exact same result, the Bayesian model outputs a much smaller interval, $[0.66, 0.82]$ highlighting the increased confidence. We can now be much more certain the coin is indeed biased.\n", 236 | "\n", 237 | "Another motivation to use Bayesian methods is that we can handcraft very flexible models which allows us to include domain knowledge. Such handcrafted models are also easier to interpret than so called black box models. Very often, we can build bespoke models where the parameters have a direct interpretation in the context of the problem we're trying to solve. We will see this also later in this tutorial in our problem.\n", 238 | "\n", 239 | "\n", 240 | "### Disclaimer: What this tutorial is not\n", 241 | "This tutorial is definitely not complete. It is not an introduction to the theory behind Bayesian methods and will therefore not even mention Bayes formula. I will try to keep mathematical notation and formulas to a minimum and to only focus on how to use Bayesian methods in Python.\n", 242 | "I will also not explain how the algorithms to compute these methods work behind the scenes. 
\n", 243 | "\n", 244 | "### Further Resources\n", 245 | "If you like to learn more about Bayesian modelling, I can recommend the following resources:\n", 246 | "\n", 247 | "- For a gentle introduction to Bayesian methods, I recommend Richard McElreath's __\"[Statistical Rethinking](https://xcelab.net/rm/statistical-rethinking/)\"__. It focusses on the application of Bayesian methods but also gives the ideas and intuitions for the theory behind these methods and how they're computed. The second edition is coming out very soon and the [Berlin Bayesians meetup](https://www.meetup.com/de-DE/BerlinBayesians/) will read it in their reading group! The book also comes with accompanying lecture videos and uses a lot of code. The book itself uses R but there are many translations of the book in other probabilistic programming languages including to PyMC3.\n", 248 | "\n", 249 | "- The [Berlin Bayesian Meetup](https://www.meetup.com/de-DE/BerlinBayesians/) not only has a reading group but also hosts regular talk events and a community on Slack where you can get help if you get stuck.\n", 250 | "\n", 251 | "- __\"Bayesian Data Analysis\"__ by Gelman et al goes deep into Bayesian theory and also talks a bit about the computational Bayesian methods work.\n", 252 | "\n", 253 | "- The tutorial is based on a [blog post](https://tech.europace.de/hauspreise-in-berlin/) (in German and with R) and some talks I gave previously, e.g. 
at the [PyData](https://www.youtube.com/watch?v=WbNmcvxRwow&t=123s)" 254 | ] 255 | } 256 | ], 257 | "metadata": { 258 | "kernelspec": { 259 | "display_name": "PyLadies-Bayesian-Tutorial", 260 | "language": "python", 261 | "name": "pyladies-bayesian-tutorial" 262 | }, 263 | "language_info": { 264 | "codemirror_mode": { 265 | "name": "ipython", 266 | "version": 3 267 | }, 268 | "file_extension": ".py", 269 | "mimetype": "text/x-python", 270 | "name": "python", 271 | "nbconvert_exporter": "python", 272 | "pygments_lexer": "ipython3", 273 | "version": "3.7.4" 274 | } 275 | }, 276 | "nbformat": 4, 277 | "nbformat_minor": 4 278 | } 279 | -------------------------------------------------------------------------------- /notebooks/2_Starting_simple.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Starting simple: Linear Regression\n", 8 | "As so often, we will start relatively simple with a Linear Regression. This may sound a bit boring, but don't worry: I will show you later how to extend this model to something slightly more complex.\n", 9 | "\n", 10 | "## Short Look at the Data\n", 11 | "For this tutorial, I will use rental data that I scraped from Immoscout24. For a more detailed description and information on how I scraped the data set, you can check its description on [kaggle](https://www.kaggle.com/corrieaar/apartment-rental-offers-in-germany) where I occasionally also update the data.\n", 12 | "For this analysis, we will concentrate on rental offers in Berlin, but of course feel free to try out different cities or areas later!" 
13 | ] 14 | }, 15 | { 16 | "cell_type": "code", 17 | "execution_count": null, 18 | "metadata": {}, 19 | "outputs": [], 20 | "source": [ 21 | "import sys\n", 22 | "sys.path.append('../src/')\n", 23 | "\n", 24 | "import numpy as np\n", 25 | "import pandas as pd\n", 26 | "import matplotlib.pyplot as plt\n", 27 | "import csv\n", 28 | "\n", 29 | "from utils import iqr, iqr_rule" 30 | ] 31 | }, 32 | { 33 | "cell_type": "code", 34 | "execution_count": null, 35 | "metadata": {}, 36 | "outputs": [], 37 | "source": [ 38 | "plt.style.use(\"fivethirtyeight\")" 39 | ] 40 | }, 41 | { 42 | "cell_type": "code", 43 | "execution_count": null, 44 | "metadata": {}, 45 | "outputs": [], 46 | "source": [ 47 | "d = pd.read_csv(\"../data/immo_data.csv\", dtype={\"geo_plz\": str})" 48 | ] 49 | }, 50 | { 51 | "cell_type": "markdown", 52 | "metadata": {}, 53 | "source": [ 54 | "Before using the data, we will do a bit of preprocessing: We remove outliers where either the living area or the total rent is too low or too high. To remove outliers, the [interquartile range (IQR)](https://en.wikipedia.org/wiki/Interquartile_range#Outliers) is used. The IQR rule marks everything as an outlier that's too far from the middle range of the data. Most of the data points we throw away this way are typos or similar entries with unreasonable input values.\n", 55 | "For a more thorough analysis, it might be useful to check that we don't throw away real cases and instead incorporate the outliers in further analysis.\n", 56 | "\n", 57 | "As target variable we will use the variable `totalRent`, which in most cases is the sum of the base rent and a service fee (heating etc.)." 
58 | ] 59 | }, 60 | { 61 | "cell_type": "code", 62 | "execution_count": null, 63 | "metadata": {}, 64 | "outputs": [], 65 | "source": [ 66 | "d[\"totalRent\"] = np.where(d[\"totalRent\"].isnull(), d[\"baseRent\"], d[\"totalRent\"])\n", 67 | "\n", 68 | "# since log doesn't work with 0, we replace 0 with 0.5\n", 69 | "# seems reasonable to say that a rent of 0€ is the same as 50ct\n", 70 | "d[\"livingSpace_m\"] = np.where(d[\"livingSpace\"] <= 0, 0.5, d[\"livingSpace\"])\n", 71 | "d[\"totalRent_m\"] = np.where(d[\"totalRent\"] <= 0, 0.5, d[\"totalRent\"])\n", 72 | "d[\"logRent\"] = np.log(d[\"totalRent_m\"])\n", 73 | "d[\"logSpace\"] = np.log(d[\"livingSpace_m\"])\n", 74 | "\n", 75 | "not_outlier = iqr_rule(d[\"logSpace\"], factor=1.5) & iqr_rule(d[\"logRent\"], factor=1.5)\n", 76 | "d = d[not_outlier]\n", 77 | "berlin = d[(d.regio1 == \"Berlin\")].copy()" 78 | ] 79 | }, 80 | { 81 | "cell_type": "markdown", 82 | "metadata": {}, 83 | "source": [ 84 | "In this analysis, we want to predict the rent (`totalRent`) from the living area (`livingSpace`).\n", 85 | "\n", 86 | "You can have a short look at the data and these two variables!\n", 87 | "\n", 88 | "For example:\n", 89 | "- What is the average rent in Berlin?\n", 90 | "- What is the average size of a flat in Berlin?\n", 91 | "- Plot rent vs living space" 92 | ] 93 | }, 94 | { 95 | "cell_type": "code", 96 | "execution_count": null, 97 | "metadata": {}, 98 | "outputs": [], 99 | "source": [] 100 | }, 101 | { 102 | "cell_type": "code", 103 | "execution_count": null, 104 | "metadata": {}, 105 | "outputs": [], 106 | "source": [] 107 | }, 108 | { 109 | "cell_type": "code", 110 | "execution_count": null, 111 | "metadata": {}, 112 | "outputs": [], 113 | "source": [] 114 | }, 115 | { 116 | "cell_type": "code", 117 | "execution_count": null, 118 | "metadata": {}, 119 | "outputs": [], 120 | "source": [] 121 | }, 122 | { 123 | "cell_type": "markdown", 124 | "metadata": {}, 125 | "source": [ 126 | "Before working with the data, we 
will rescale and normalize the living area and also rescale the total rent:" 127 | ] 128 | }, 129 | { 130 | "cell_type": "code", 131 | "execution_count": null, 132 | "metadata": {}, 133 | "outputs": [], 134 | "source": [ 135 | "berlin[\"livingSpace_s\"] = (berlin[\"livingSpace\"] - berlin[\"livingSpace\"].mean()) / np.std(berlin[\"livingSpace\"])\n", 136 | "berlin[\"totalRent_s\"] = berlin[\"totalRent\"] / 100\n", 137 | "\n", 138 | "# saving for later\n", 139 | "berlin.to_csv(\"../data/berlin.csv\", encoding=\"utf-8\", quoting=csv.QUOTE_NONNUMERIC) # special quoting necessary because otherwise description messes up" 140 | ] 141 | }, 142 | { 143 | "cell_type": "markdown", 144 | "metadata": {}, 145 | "source": [ 146 | "We will have to standardize/destandardize area a few times, so we will use small helper functions for this:" 147 | ] 148 | }, 149 | { 150 | "cell_type": "code", 151 | "execution_count": null, 152 | "metadata": {}, 153 | "outputs": [], 154 | "source": [ 155 | "# these functions are also saved in src/utils.py\n", 156 | "def standardize_area(x):\n", 157 | " return ( x - berlin[\"livingSpace\"].mean()) / np.std(berlin[\"livingSpace\"])\n", 158 | " \n", 159 | "def destandardize_area(x):\n", 160 | " return ( x * np.std(berlin[\"livingSpace\"]) ) + berlin[\"livingSpace\"].mean() " 161 | ] 162 | }, 163 | { 164 | "cell_type": "code", 165 | "execution_count": null, 166 | "metadata": {}, 167 | "outputs": [], 168 | "source": [ 169 | "plt.figure(figsize=(9,9))\n", 170 | "plt.scatter(berlin.livingSpace_s, berlin.totalRent_s, s=4)\n", 171 | "plt.title(\"Rent by (standardized) living area\")\n", 172 | "plt.xlabel(\"Living Area [sqm]\")\n", 173 | "plt.ylabel(\"Monthly Rent [100€]\")\n", 174 | "plt.axvline(x=0, c=\"k\", linewidth=1)\n", 175 | "plt.axhline(y=np.mean(berlin.totalRent_s), c=\"k\", linewidth=1, linestyle=\"--\", label=\"Average Rent\")\n", 176 | "plt.legend()\n", 177 | "plt.show()" 178 | ] 179 | }, 180 | { 181 | "cell_type": "markdown", 182 | "metadata": 
{}, 183 | "source": [ 184 | "Looking at the plot, the relationship between living area and monthly rent looks roughly linear. Indeed, the bigger the flat, the more expensive it should be. So we will start our modelling with a linear model.\n", 185 | "\n", 186 | "So in math notation, our model can be written as follows:\n", 187 | "\n", 188 | "$$ \text{rent} \approx \alpha + \beta \text{area} $$\n", 189 | "\n", 190 | "This is the same as\n", 191 | "$$ \text{rent} = \alpha + \beta \text{area} + \epsilon$$\n", 192 | "where $\epsilon$ is normally distributed, i.e. $\epsilon \sim \text{Normal}(0, \sigma)$. This can be rewritten as\n", 193 | "$$ \text{rent} \sim \text{Normal}(\alpha + \beta \text{area}, \sigma).$$\n", 194 | "\n", 195 | "For easier reading, we rewrite this again:\n", 196 | "$$\begin{align*} \text{rent} &\sim \text{Normal}(\mu, \sigma) \\\n", 197 | "\mu &= \alpha + \beta \text{area}\n", 198 | "\end{align*}$$\n", 199 | "This will be our first model!" 200 | ] 201 | }, 202 | { 203 | "cell_type": "markdown", 204 | "metadata": {}, 205 | "source": [ 206 | "## Adding a Bayesian ingredient: Priors\n", 207 | "\n", 208 | "Before implementing this model, let's briefly think about what the parameters $\alpha$ and $\beta$ mean here.\n", 209 | "\n", 210 | "We will use the model with our rescaled data, and thus the intercept $\alpha$ is __the rental price of an average-sized flat__. (For average-sized flats, the scaled area is 0.)\n", 211 | "\n", 212 | "$\beta$ is then the __increase in rent__ if the flat is one standard deviation larger. One standard deviation is 29 sqm, which is roughly the average size of a room (check living space divided by number of rooms `noRooms`). Thus $\beta$ is roughly the increase in rent if the flat had one more room. \n", 213 | "\n", 214 | "$\sigma$ is __how much the rent can differ__ for two flats of the same size. 
Concretely, it is how much the rent can differ from the average rent for flats of the same size. As our model says that the rent is normally distributed, about 95% of the cases should be within $2\sigma$ of the average rent.\n", 215 | "As error term, $\sigma$ is always positive.\n", 216 | "\n", 217 | "Thinking about what the parameters mean beforehand is very important in a Bayesian analysis since we need to specify priors. Priors are what we think the parameters could be before seeing the data. And, obviously, to be able to say what range the parameters would be in, it would be good to know what the parameters mean.\n", 218 | "\n", 219 | "\n", 220 | "If we don't know anything about the problem, we might want to specify priors that are very uninformative and vague. \n", 221 | "We could for example specify $\alpha$ and $\beta$ as being somewhere between -10,000 and +10,000:" 222 | ] 223 | }, 224 | { 225 | "cell_type": "code", 226 | "execution_count": null, 227 | "metadata": {}, 228 | "outputs": [], 229 | "source": [ 230 | "import pymc3 as pm" 231 | ] 232 | }, 233 | { 234 | "cell_type": "code", 235 | "execution_count": null, 236 | "metadata": {}, 237 | "outputs": [], 238 | "source": [ 239 | "with pm.Model() as mod:\n", 240 | "    alpha = pm.Normal(\"alpha\", mu=0, sigma=10000)\n", 241 | "    beta = ..." 242 | ] 243 | }, 244 | { 245 | "cell_type": "markdown", 246 | "metadata": {}, 247 | "source": [ 248 | "A PyMC-Model is specified in a context. Before we can actually specify the model, we need to specify the priors, since, as usual in Python, each variable we want to use needs to be declared beforehand.\n", 249 | "In PyMC, you always need to specify the name of the variable twice. 
This is so that the variable knows its own name.\n", 250 | "\n", 251 | "If you print the model, it renders nicely in a more mathy-looking description:" 252 | ] 253 | }, 254 | { 255 | "cell_type": "code", 256 | "execution_count": null, 257 | "metadata": {}, 258 | "outputs": [], 259 | "source": [] 260 | }, 261 | { 262 | "cell_type": "markdown", 263 | "metadata": {}, 264 | "source": [ 265 | "We can add to the model by opening the context again.\n", 266 | "For example, to add a prior for sigma, we can proceed as follows:" 267 | ] 268 | }, 269 | { 270 | "cell_type": "code", 271 | "execution_count": null, 272 | "metadata": {}, 273 | "outputs": [], 274 | "source": [ 275 | "with mod:\n", 276 | "    sigma = ..." 277 | ] 278 | }, 279 | { 280 | "cell_type": "markdown", 281 | "metadata": {}, 282 | "source": [ 283 | "Since $\sigma$ as error term is always positive, we need to use a distribution that is also always positive. One commonly used distribution for this is the Half-Normal: a normal distribution that is cut in half so it only takes positive values.\n", 284 | "Other commonly used distributions for $\sigma$ are for example the Exponential or the Half-Cauchy.\n", 285 | "\n", 286 | "Now that we specified some priors, we can write out the complete model:" 287 | ] 288 | }, 289 | { 290 | "cell_type": "code", 291 | "execution_count": null, 292 | "metadata": {}, 293 | "outputs": [], 294 | "source": [ 295 | "with mod:\n", 296 | "    mu = ...\n", 297 | "    \n", 298 | "    rent = ..." 299 | ] 300 | }, 301 | { 302 | "cell_type": "markdown", 303 | "metadata": {}, 304 | "source": [ 305 | "The PyMC-Model is written very similarly to how the model was specified above. 
\n", 306 | "\n", 307 | "Note that for the outcome variable, we need to specify the observed data.\n", 308 | "Usually the whole model is written as one:" 309 | ] 310 | }, 311 | { 312 | "cell_type": "code", 313 | "execution_count": null, 314 | "metadata": {}, 315 | "outputs": [], 316 | "source": [ 317 | "with pm.Model() as mod:\n", 318 | " alpha = pm.Normal(\"alpha\", mu=0, sigma=10000)\n", 319 | " beta = pm.Normal(\"beta\", mu=0, sigma=10000)\n", 320 | " sigma = pm.HalfNormal(\"sigma\", sigma=10000)\n", 321 | " \n", 322 | " mu = alpha + beta*berlin[\"livingSpace_s\"] \n", 323 | " rent = pm.Normal(\"rent\", mu=mu, sigma=sigma,\n", 324 | " observed=berlin[\"totalRent_s\"])" 325 | ] 326 | }, 327 | { 328 | "cell_type": "markdown", 329 | "metadata": {}, 330 | "source": [ 331 | "Especially in the beginning, when starting out with Bayesian modelling, picking priors can seem a bit daring. \n", 332 | "Does it make a difference if we use $\\text{Normal}(0,100)$ or $\\text{Normal}(0, 1000)$?\n", 333 | "What's with using different distributions?\n", 334 | "\n", 335 | "\n", 336 | "There are a few tips that help a bit with picking a good prior. The first one is to visualize your priors. We can do this with PyMC by sampling from our priors.\n", 337 | "This is the similar to sampling from the specified distributions using numpy or scipy. However, on top of sampling from the probability distributions, it then also computes $\\mu$ (the linear part of the model) using the samples and the predictor variables. It then uses the computed $\\mu$ and the samples from $\\sigma$ to sample possible outcome values. Even though we specified the target variable (in Machine Learning this one is usually called `y`) it does not use this (yet) and is thus very quick." 338 | ] 339 | }, 340 | { 341 | "cell_type": "code", 342 | "execution_count": null, 343 | "metadata": {}, 344 | "outputs": [], 345 | "source": [ 346 | "with mod:\n", 347 | " priors = ..." 
348 | ] 349 | }, 350 | { 351 | "cell_type": "markdown", 352 | "metadata": {}, 353 | "source": [ 354 | "We will use [ArviZ](https://arviz-devs.github.io/arviz/) to keep track of the different outputs computed from our model and to visualize them." 355 | ] 356 | }, 357 | { 358 | "cell_type": "code", 359 | "execution_count": null, 360 | "metadata": {}, 361 | "outputs": [], 362 | "source": [ 363 | "import arviz as az\n", 364 | "pm_data = az.from_pymc3(prior = priors)" 365 | ] 366 | }, 367 | { 368 | "cell_type": "markdown", 369 | "metadata": {}, 370 | "source": [ 371 | "ArviZ comes with many plots that are useful for analyzing Bayesian models.\n", 372 | "Let's look at the priors for our three model parameters:" 373 | ] 374 | }, 375 | { 376 | "cell_type": "code", 377 | "execution_count": null, 378 | "metadata": {}, 379 | "outputs": [], 380 | "source": [] 381 | }, 382 | { 383 | "cell_type": "markdown", 384 | "metadata": {}, 385 | "source": [ 386 | "This doesn't look super interesting: it is basically just a density plot of the distributions we gave for the priors.\n", 387 | "More interesting is to visualize what rental prices these prior distributions would produce given the predictor data.\n", 388 | "Unfortunately, there is no ArviZ plot for this yet, but we can do this ourselves without too much work.\n", 389 | "\n", 390 | "The object `priors` contains a numpy array for `rent`. 
Check what it contains:" 391 | ] 392 | }, 393 | { 394 | "cell_type": "code", 395 | "execution_count": null, 396 | "metadata": {}, 397 | "outputs": [], 398 | "source": [] 399 | }, 400 | { 401 | "cell_type": "markdown", 402 | "metadata": {}, 403 | "source": [ 404 | "For each observation in our dataframe (7092 observations), it computed 1000 samples of possible rent prices, using the samples from `alpha`, `beta`, and `sigma` together with the corresponding living area from this observation.\n", 405 | "\n", 406 | "We can flatten the matrix to obtain one big array and plot a histogram:" 407 | ] 408 | }, 409 | { 410 | "cell_type": "code", 411 | "execution_count": null, 412 | "metadata": {}, 413 | "outputs": [], 414 | "source": [ 415 | "prior_rents = priors[\"rent\"].flatten()*100\n", 416 | "# plot histogram\n", 417 | "plt.hist(prior_rents, alpha=0.9, ec=\"darkblue\", bins=70)\n", 418 | "plt.title(\"Histogram over possible range of rental prices\")\n", 419 | "plt.xlabel(\"Monthly Rent [€]\")\n", 420 | "plt.show()" 421 | ] 422 | }, 423 | { 424 | "cell_type": "markdown", 425 | "metadata": {}, 426 | "source": [ 427 | "Compare this with the histogram over the actual rental prices:" 428 | ] 429 | }, 430 | { 431 | "cell_type": "code", 432 | "execution_count": null, 433 | "metadata": {}, 434 | "outputs": [], 435 | "source": [ 436 | "plt.hist(berlin[\"totalRent_s\"]*100, alpha=0.9, ec=\"darkblue\", bins=70)\n", 437 | "plt.title(\"Histogram over the actual rental prices\")\n", 438 | "plt.xlabel(\"Monthly Rent [€]\")\n", 439 | "plt.show()" 440 | ] 441 | }, 442 | { 443 | "cell_type": "markdown", 444 | "metadata": {}, 445 | "source": [ 446 | "The histograms don't look very similar and the range of the sampled rents is completely off! A rent of up to 50,000€ per month doesn't sound very realistic.\n", 447 | "\n", 448 | "\n", 449 | "Another good way to understand the prior better is to visualize the model that it would produce. 
In our case, the model is a line determined by the intercept $\alpha$ and the slope $\beta$. \n", 450 | "We can thus sample 50 $\alpha$s and $\beta$s and combine them with the range of area values:" 451 | ] 452 | }, 453 | { 454 | "cell_type": "code", 455 | "execution_count": null, 456 | "metadata": {}, 457 | "outputs": [], 458 | "source": [ 459 | "area_s = np.linspace(start=-2, stop=3.5, num=50)\n", 460 | "draws = np.random.choice(len(priors[\"alpha\"]), replace=False, size=50)\n", 461 | "\n", 462 | "alpha = ...\n", 463 | "beta = ...\n", 464 | "\n", 465 | "mu = ...\n", 466 | "\n", 467 | "\n", 468 | "plt.plot(destandardize_area(area_s), mu*100, c=\"#737373\", alpha=0.5)\n", 469 | "plt.xlabel(\"Living Area [sqm]\")\n", 470 | "plt.ylabel(\"Price [€]\")\n", 471 | "plt.title(\"Linear model according to our prior\")\n", 472 | "plt.show()" 473 | ] 474 | }, 475 | { 476 | "cell_type": "markdown", 477 | "metadata": {}, 478 | "source": [ 479 | "Exercise: \n", 480 | "Use `plt.axhline(y=my_rent)` to compare the sampled lines with your own rent, the most expensive yet still reasonable rent you can think of, and the cheapest possible rent you can think of. I also like to google the most expensive known rent in Berlin (or elsewhere) and see how it compares to the model.\n", 481 | "\n", 482 | "You could also add these benchmarks to the histogram using `plt.axvline(x=my_rent)`.\n", 483 | "\n", 484 | "These priors are obviously not very realistic! Luckily, we all know a bit about rents, so even without looking at the data, we can think of better priors.\n", 485 | "\n", 486 | "Write a new model `mod_informed` that uses the same linear part as above, but better priors. You can use the same distributions as above and only change their parameters. 
I recommend trying various priors and checking what effects they have on the resulting model and its outputs.\n", 487 | "\n", 488 | "Remember: \n", 489 | "- $\\alpha$, the intercept, is the rental price of an average-sized flat (the average flat has 77sqm).\n", 490 | "- $\\beta$, the slope, is roughly the increase in rent when the living area grows by one standardized unit.\n", 491 | "- $\\sigma$ is how much the rent can differ for two flats of the same size. As an error term, $\\sigma$ is always positive.\n", 492 | "\n", 493 | "Always keep in mind the scale of your data!" 494 | ] 495 | }, 496 | { 497 | "cell_type": "code", 498 | "execution_count": null, 499 | "metadata": {}, 500 | "outputs": [], 501 | "source": [ 502 | "with pm.Model() as mod_informed:\n", 503 | "    # new model goes here\n", 504 | "    \n", 505 | "    mu = ...\n", 506 | "    \n", 507 | "    rent = ...\n", 508 | "    \n", 509 | "    priors = ..." 510 | ] 511 | }, 512 | { 513 | "cell_type": "markdown", 514 | "metadata": {}, 515 | "source": [ 516 | "I made functions for the above plot, so you don't have to copy the whole plot code: `compare_hist(priors, berlin)` plots the two histograms and `draw_models(priors, berlin)` plots sampled lines from our model:" 517 | ] 518 | }, 519 | { 520 | "cell_type": "code", 521 | "execution_count": null, 522 | "metadata": {}, 523 | "outputs": [], 524 | "source": [ 525 | "from utils import compare_hist, draw_models\n", 526 | "\n", 527 | "compare_hist(priors, berlin)\n", 528 | "plt.show()" 529 | ] 530 | }, 531 | { 532 | "cell_type": "code", 533 | "execution_count": null, 534 | "metadata": {}, 535 | "outputs": [], 536 | "source": [ 537 | "draw_models(priors, berlin)\n", 538 | "plt.show()" 539 | ] 540 | }, 541 | { 542 | "cell_type": "markdown", 543 | "metadata": {}, 544 | "source": [ 545 | "Questions to explore regarding different priors:\n", 546 | "- We've seen above what happens when we use overly vague priors; what happens if we use very narrow priors? 
What problems could arise?\n", 547 | "- So far, we've only used the normal distribution for the priors. Try out some different distributions, e.g.\n", 548 | "    - Uniform distribution, e.g. over -1000 to +1000\n", 549 | "    - Student-T distribution\n", 550 | "- Since we know that rents should increase for larger flats, the slope $\\beta$ should be positive. How could we bias our priors towards positive values?" 551 | ] 552 | }, 553 | { 554 | "cell_type": "markdown", 555 | "metadata": {}, 556 | "source": [ 557 | "To understand better how something works, it often helps to implement it yourself. Try implementing sampling from the prior using only numpy or scipy." 558 | ] 559 | } 560 | ], 561 | "metadata": { 562 | "kernelspec": { 563 | "display_name": "PyLadies-Bayesian-Tutorial", 564 | "language": "python", 565 | "name": "pyladies-bayesian-tutorial" 566 | }, 567 | "language_info": { 568 | "codemirror_mode": { 569 | "name": "ipython", 570 | "version": 3 571 | }, 572 | "file_extension": ".py", 573 | "mimetype": "text/x-python", 574 | "name": "python", 575 | "nbconvert_exporter": "python", 576 | "pygments_lexer": "ipython3", 577 | "version": "3.7.4" 578 | } 579 | }, 580 | "nbformat": 4, 581 | "nbformat_minor": 4 582 | } 583 | -------------------------------------------------------------------------------- /notebooks/3_Did_it_converge.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Adding the data\n", 8 | "After having tried out various priors, we can now proceed and feed some data to our model.\n", 9 | "Let's first load the data and write down our model again:" 10 | ] 11 | }, 12 | { 13 | "cell_type": "code", 14 | "execution_count": null, 15 | "metadata": {}, 16 | "outputs": [], 17 | "source": [ 18 | "import pandas as pd\n", 19 | "import numpy as np\n", 20 | "import matplotlib.pyplot as plt\n", 21 | "\n", 22 | "import 
pymc3 as pm\n", 23 | "import arviz as az\n", 24 | "\n", 25 | "import sys\n", 26 | "sys.path.append('../src/')\n", 27 | "from utils import standardize_area, destandardize_area" 28 | ] 29 | }, 30 | { 31 | "cell_type": "code", 32 | "execution_count": null, 33 | "metadata": {}, 34 | "outputs": [], 35 | "source": [ 36 | "plt.style.use(\"fivethirtyeight\")\n", 37 | "plt.rcParams['figure.figsize'] = 9, 9" 38 | ] 39 | }, 40 | { 41 | "cell_type": "code", 42 | "execution_count": null, 43 | "metadata": {}, 44 | "outputs": [], 45 | "source": [ 46 | "berlin = pd.read_csv(\"../data/berlin.csv\", index_col=0)" 47 | ] 48 | }, 49 | { 50 | "cell_type": "code", 51 | "execution_count": null, 52 | "metadata": {}, 53 | "outputs": [], 54 | "source": [ 55 | "with pm.Model() as mod:\n", 56 | "    alpha = pm.Normal(\"alpha\", mu=0, sigma=10)\n", 57 | "    beta = pm.Normal(\"beta\", mu=1, sigma=5)\n", 58 | "    \n", 59 | "    sigma = pm.HalfNormal(\"sigma\", sigma=5)\n", 60 | "    \n", 61 | "    mu = alpha + beta*berlin[\"livingSpace_s\"]\n", 62 | "    \n", 63 | "    rent = pm.Normal(\"rent\", mu=mu, sigma=sigma,\n", 64 | "                    observed = berlin[\"totalRent_s\"])\n", 65 | "    \n", 66 | "    priors = pm.sample_prior_predictive()" 67 | ] 68 | }, 69 | { 70 | "cell_type": "markdown", 71 | "metadata": {}, 72 | "source": [ 73 | "We already specified all necessary data in the model but we still need to tell it to start estimating the parameters (in ML speak: train the model).\n", 74 | "In Bayesian modelling, this process is also called sampling since we're trying to get a probability distribution, the posterior, by sampling from it.\n", 75 | "The command is thus also called `pm.sample()` in PyMC3. The result is a sample from our target probability distribution and is often called the _trace_. This is because the successive sampling traces out the probability distribution. This [visualization](https://chi-feng.github.io/mcmc-demo/app.html#HamiltonianMC,standard) might help to give an intuition of what happens in the background. 
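As an aside, the prior-predictive step that `pm.sample_prior_predictive()` automates here can be written out in plain NumPy, which also answers the exercise from the end of the previous notebook (a sketch; the prior scales below match the model above, and `living_space_s` is an arbitrary example value):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Draw parameters from the priors of the model above.
alpha = rng.normal(0, 10, size=n)
beta = rng.normal(1, 5, size=n)
sigma = np.abs(rng.normal(0, 5, size=n))  # HalfNormal(5)

# Push each parameter draw through the likelihood for one
# (standardized) living area value.
living_space_s = 0.5
mu = alpha + beta * living_space_s
rent_s = rng.normal(mu, sigma)  # 1000 prior-predictive rents
```

Repeating this for every row of the data gives exactly the matrix of prior-predictive rents we plotted earlier.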
\n", 76 | "\n", 77 | "With `draws`, we determine how big a sample we want and `tune` determines how many tuning steps the algorithm takes to find and explore the target probability distribution before it starts sampling. Sampling can easily be parallelized by letting the sampler run multiple so-called chains. It is recommended to have at least two chains, but you can go up to as many chains as you have cores (usually four).\n", 78 | "You will get `draws * chains` samples in total." 79 | ] 80 | }, 81 | { 82 | "cell_type": "code", 83 | "execution_count": null, 84 | "metadata": {}, 85 | "outputs": [], 86 | "source": [ 87 | "with mod:\n", 88 | "    trace = ..." 89 | ] 90 | }, 91 | { 92 | "cell_type": "markdown", 93 | "metadata": {}, 94 | "source": [ 95 | "We again feed all the different artifacts into the ArviZ InferenceData object." 96 | ] 97 | }, 98 | { 99 | "cell_type": "code", 100 | "execution_count": null, 101 | "metadata": {}, 102 | "outputs": [], 103 | "source": [ 104 | "pm_data = az.from_pymc3(...)\n", 105 | "pm_data" 106 | ] 107 | }, 108 | { 109 | "cell_type": "markdown", 110 | "metadata": {}, 111 | "source": [ 112 | "What we first need to check is whether our model actually converged. For this, we will mostly look at two things:\n", 113 | "- trace plots\n", 114 | "- different summary statistics\n", 115 | "\n", 116 | "Let's start with the trace plots. Again, ArviZ provides a handy function for these:" 117 | ] 118 | }, 119 | { 120 | "cell_type": "code", 121 | "execution_count": null, 122 | "metadata": {}, 123 | "outputs": [], 124 | "source": [] 125 | }, 126 | { 127 | "cell_type": "markdown", 128 | "metadata": {}, 129 | "source": [ 130 | "On the left, we see the estimated probability distribution for our parameters. On the right, we see how the samples trace through the distribution. The $y$-axes on the right are the same as the $x$-axes on the left. 
The $x$-axes on the right show the sample number.\n", 131 | "\n", 132 | "\n", 133 | "So what do we need to look out for when checking these trace plots? In short, any non-random looking patterns. These trace plots actually all look good, so it's probably best to show a few examples that fail. Sometimes the best way to understand something is trying to break it.\n", 134 | "\n", 135 | "\n", 136 | "Go play around with this model and try to make it fail!\n", 137 | "\n", 138 | "For most failures, you will also see warnings from PyMC3. Another sign that something is wrong with your model is if the sampling process takes very long. This is a relatively simple model with not too much data, so it shouldn't take more than a few seconds." 139 | ] 140 | }, 141 | { 142 | "cell_type": "markdown", 143 | "metadata": {}, 144 | "source": [ 145 | "Some hints on what to try:\n", 146 | "- make the priors really really narrow\n", 147 | "- use priors with hard boundaries (e.g. Uniform) on a wrong range, e.g. a negative range for beta\n", 148 | "- use almost no tuning steps, e.g. `tune=10`\n", 149 | "- use a very small number of samples, e.g. 
`draws=100`" 150 | ] 151 | }, 152 | { 153 | "cell_type": "code", 154 | "execution_count": null, 155 | "metadata": {}, 156 | "outputs": [], 157 | "source": [ 158 | "with pm.Model() as failed_mod:\n", 159 | "    \n", 160 | "    ...\n", 161 | "    \n", 162 | "    failed_trace = pm.sample(draws=100, tune=100)\n", 163 | "    \n", 164 | "failed_data = az.from_pymc3(trace = failed_trace, model = failed_mod)" 165 | ] 166 | }, 167 | { 168 | "cell_type": "code", 169 | "execution_count": null, 170 | "metadata": {}, 171 | "outputs": [], 172 | "source": [ 173 | "az.plot_trace(failed_data)\n", 174 | "plt.show()" 175 | ] 176 | }, 177 | { 178 | "cell_type": "markdown", 179 | "metadata": {}, 180 | "source": [ 181 | "The next thing to check after the trace plots is the summary statistics.\n", 182 | "In particular, there are\n", 183 | "- ESS, the effective sample size, an estimate of the number of effectively independent samples\n", 184 | "- R_hat, a diagnostic that compares the different chains\n", 185 | "\n", 186 | "__ESS:__ As we're consecutively sampling from the probability space, sample $n$ is often slightly auto-correlated with sample $n-1$ and sample $n+1$. For this reason, the final sample cannot be treated as a completely independent sample from our target probability distribution, the posterior. ESS estimates how many _effective_ samples we have. If everything worked fine, this should be close to the actual number of samples. If it is much lower, there might be a problem with your model. This model is a very simple one and NUTS, the algorithm used by PyMC3, is very efficient, so the effective number of samples is even higher than the actual number of samples! In general, you'll want the ESS to be at least larger than 25% of the actual number of samples. An ESS smaller than that often indicates that your model has some problems with the sampling. 
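The idea behind ESS can be made concrete with a crude NumPy estimator: divide the number of draws by one plus twice the summed positive autocorrelations. ArviZ's `az.ess` uses a more careful multi-chain estimator, so treat this only as an illustration:

```python
import numpy as np

def ess(x):
    """Crude effective sample size: n / (1 + 2 * sum of positive autocorrelations)."""
    n = len(x)
    x = x - x.mean()
    # Autocorrelation at lags 0..n-1, each normalized by the number of terms.
    acf = np.correlate(x, x, mode="full")[n - 1:] / (np.arange(n, 0, -1) * x.var())
    rho = acf[1:]
    # Sum lags only until the autocorrelation first drops below zero.
    cut = np.argmax(rho < 0) if np.any(rho < 0) else len(rho)
    return n / (1 + 2 * rho[:cut].sum())

rng = np.random.default_rng(0)
iid = rng.normal(size=4000)          # independent draws: ESS close to 4000

# AR(1) chain: each draw strongly correlated with the previous one.
ar = np.zeros(4000)
for t in range(1, 4000):
    ar[t] = 0.9 * ar[t - 1] + rng.normal()

print(ess(iid), ess(ar))
```

The autocorrelated chain carries far less information per draw, so its effective sample size comes out much smaller than its nominal 4000 draws.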
Also note that how many effective samples you need depends heavily on the goal of your analysis: If you only want to estimate the mean and a 50% interval, you will already get robust estimates with around 300 effective samples. If you need high precision and want to estimate a 95% or even 99% interval, you will need many more effective samples to estimate the tails correctly!\n", 187 | "\n", 188 | "__R_hat:__ The R_hat diagnostic checks if the different chains converged to the same value. This diagnostic should be very close to 1, definitely smaller than 1.05 (some argue even smaller than 1.005); anything larger indicates a problem.\n", 189 | "\n", 190 | "\n", 191 | "We can check these values with ArviZ:" 192 | ] 193 | }, 194 | { 195 | "cell_type": "code", 196 | "execution_count": null, 197 | "metadata": {}, 198 | "outputs": [], 199 | "source": [] 200 | }, 201 | { 202 | "cell_type": "markdown", 203 | "metadata": {}, 204 | "source": [ 205 | "For this model, all summary and diagnostic statistics look good: the ESS is even larger than the actual number of samples and R_hat is very close to 1. Good!\n", 206 | "\n", 207 | "In the same summary, we can also see the point estimates for the three parameters in the column `mean`. 
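The R_hat comparison of chains can likewise be sketched in a few lines of NumPy. This is a simplified, non-split version; ArviZ's `az.rhat` computes a more robust rank-normalized split-R_hat, so this is only meant to make the idea concrete:

```python
import numpy as np

def rhat(chains):
    """Basic (non-split) R_hat for an array of shape (n_chains, n_samples)."""
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    W = chains.var(axis=1, ddof=1).mean()   # within-chain variance
    B = n * chain_means.var(ddof=1)         # between-chain variance
    var_hat = (n - 1) / n * W + B / n       # pooled variance estimate
    return np.sqrt(var_hat / W)

rng = np.random.default_rng(1)
good = rng.normal(size=(4, 1000))                   # four chains, same target
bad = good + np.array([0., 0., 0., 5.])[:, None]    # one chain stuck elsewhere
print(rhat(good), rhat(bad))
```

When all chains explore the same distribution, between-chain and within-chain variance agree and R_hat is near 1; a chain stuck in a different region inflates the between-chain variance and pushes R_hat well above 1.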
The estimates for alpha and beta should be very close to the coefficients we would get from `sklearn.linear_model.LinearRegression()` (feel free to try this out).\n", 208 | "\n", 209 | "HPD is the highest posterior density interval: 94% of the probability mass lies between `hpd_3%` and `hpd_97%`.\n", 210 | "\n", 211 | "Let's have a look at some visualizations of the results:" 212 | ] 213 | }, 214 | { 215 | "cell_type": "code", 216 | "execution_count": null, 217 | "metadata": {}, 218 | "outputs": [], 219 | "source": [ 220 | "fig, ax = plt.subplots(figsize=(20, 9))\n", 221 | "az.plot_forest(pm_data,\n", 222 | "              # combine the different chains\n", 223 | "              combined=True,\n", 224 | "              kind=\"ridgeplot\", ridgeplot_alpha=0.6, \n", 225 | "              linewidth=1,\n", 226 | "              ax=ax)\n", 227 | "plt.show()" 228 | ] 229 | }, 230 | { 231 | "cell_type": "markdown", 232 | "metadata": {}, 233 | "source": [ 234 | "Or we can visualize the resulting model as before, this time not using the prior but the posterior, that is, the trace object:" 235 | ] 236 | }, 237 | { 238 | "cell_type": "code", 239 | "execution_count": null, 240 | "metadata": {}, 241 | "outputs": [], 242 | "source": [ 243 | "from utils import draw_models\n", 244 | "\n", 245 | "...\n", 246 | "\n", 247 | "plt.scatter(berlin[\"livingSpace\"], berlin[\"totalRent_s\"]*100, s=4)\n", 248 | "plt.title(\"Linear model\")\n", 249 | "plt.show()" 250 | ] 251 | }, 252 | { 253 | "cell_type": "markdown", 254 | "metadata": {}, 255 | "source": [ 256 | "Instead of drawing a few sample lines from the model, we can also compute mu for each sample and use ArviZ to plot credibility intervals (hpd intervals):" 257 | ] 258 | }, 259 | { 260 | "cell_type": "code", 261 | "execution_count": null, 262 | "metadata": {}, 263 | "outputs": [], 264 | "source": [ 265 | "area_s = np.linspace(start=-2, stop=3.5, num=50)\n", 266 | "\n", 267 | "mu_pred = ...\n", 268 | "\n", 269 | "# destandardize area again\n", 270 | "area = destandardize_area(area_s)\n", 271 | "\n", 272 | 
"plt.plot(area, mu_pred.mean(1)*100, alpha=0.3, c=\"k\")\n", 273 | "plt.scatter(berlin[\"livingSpace\"], berlin[\"totalRent_s\"]*100, s=4, alpha=0.4)\n", 274 | "\n", 275 | "az.plot_hpd(area, mu_pred.T*100, credible_interval=0.83)\n", 276 | "\n", 277 | "\n", 278 | "\n", 279 | "\n", 280 | "plt.xlabel('Living Area [sqm]')\n", 281 | "plt.ylabel('Rent [€]')\n", 282 | "plt.title('Visualizing the uncertainty in our model')\n", 283 | "plt.show()" 284 | ] 285 | }, 286 | { 287 | "cell_type": "markdown", 288 | "metadata": {}, 289 | "source": [ 290 | "Our model is actually very certain about where mu is. This makes sense, because we have quite a bit of data for this relatively simple model.\n", 291 | "\n", 292 | "But this is not all the uncertainty our model is aware of. We also have the estimated sigma, which tells us how far the rent can deviate from this mean mu. We can compute the predicted ranges for rent as follows:" 293 | ] 294 | }, 295 | { 296 | "cell_type": "code", 297 | "execution_count": null, 298 | "metadata": {}, 299 | "outputs": [], 300 | "source": [ 301 | "import scipy.stats as stats" 302 | ] 303 | }, 304 | { 305 | "cell_type": "code", 306 | "execution_count": null, 307 | "metadata": {}, 308 | "outputs": [], 309 | "source": [ 310 | "rent_pred = stats.norm.rvs(mu_pred, trace['sigma'])\n", 311 | "\n", 312 | "plt.plot(area, mu_pred.mean(1)*100, alpha=0.3, c=\"k\")\n", 313 | "plt.scatter(berlin[\"livingSpace\"], berlin[\"totalRent_s\"]*100, s=4, alpha=0.7)\n", 314 | "\n", 315 | "az.plot_hpd(area, rent_pred.T*100, credible_interval=0.83, \n", 316 | "            fill_kwargs={\"alpha\": 0.5})\n", 317 | "\n", 318 | "\n", 319 | "\n", 320 | "plt.xlabel('Living Area [sqm]')\n", 321 | "plt.ylabel('Rent [€]')\n", 322 | "plt.title('Visualizing the uncertainty in our model')\n", 323 | "plt.show()" 324 | ] 325 | }, 326 | { 327 | "cell_type": "markdown", 328 | "metadata": {}, 329 | "source": [ 330 | "The red area should now contain around 83% of our data if the model is good.\n", 331 | 
"\n", 332 | "The same way as we computed the `mu_pred` and `rent_pred`, we could also predict the rent price for a new flat. \n", 333 | "\n", 334 | "Can you predict the rent price, for example, for your own flat?" 335 | ] 336 | }, 337 | { 338 | "cell_type": "code", 339 | "execution_count": null, 340 | "metadata": {}, 341 | "outputs": [], 342 | "source": [ 343 | "area_of_my_own_flat = []\n", 344 | "\n", 345 | "# don't forget to standardize\n", 346 | "area_of_my_own_flat_s = ..." 347 | ] 348 | }, 349 | { 350 | "cell_type": "markdown", 351 | "metadata": {}, 352 | "source": [ 353 | "There is also a less manual way to get predictions on new data. Unfortunately, it is not as straightforward as with scikit-learn. You will need to set up the predictor variables used inside the model as a data container:" 354 | ] 355 | }, 356 | { 357 | "cell_type": "code", 358 | "execution_count": null, 359 | "metadata": {}, 360 | "outputs": [], 361 | "source": [ 362 | "with pm.Model() as mod:\n", 363 | "    # create data containers for predictor variable\n", 364 | "    area = pm.Data(\"area\", berlin[\"livingSpace_s\"])\n", 365 | "    \n", 366 | "    # rest stays the same\n", 367 | "    alpha = pm.Normal(\"alpha\", mu=0, sigma=10)\n", 368 | "    beta = pm.Normal(\"beta\", mu=0, sigma=5)\n", 369 | "    \n", 370 | "    sigma = pm.HalfNormal(\"sigma\", sigma=5)\n", 371 | "    \n", 372 | "    mu = alpha + beta*area\n", 373 | "    \n", 374 | "    rent = pm.Normal(\"rent\", mu=mu, sigma=sigma,\n", 375 | "                    observed = berlin[\"totalRent_s\"])\n", 376 | "    \n", 377 | "    trace = pm.sample(draws=1000, tune=1000)" 378 | ] 379 | }, 380 | { 381 | "cell_type": "markdown", 382 | "metadata": {}, 383 | "source": [ 384 | "We can then swap the content of this data container with new data and compute the prediction for the new data:" 385 | ] 386 | }, 387 | { 388 | "cell_type": "code", 389 | "execution_count": null, 390 | "metadata": {}, 391 | "outputs": [], 392 | "source": [ 393 | "# unfortunately there is an issue that predicting a single 
obs\n", 394 | "# doesn't work correctly https://github.com/pymc-devs/pymc3/issues/3640#issuecomment-563897443\n", 395 | "# so just use two\n", 396 | "new_area = standardize_area([74, 97])\n", 397 | "with mod:\n", 398 | "    pm.set_data({\"area\": new_area})\n", 399 | "    \n", 400 | "    post_pred = pm.sample_posterior_predictive(trace, samples=1000)" 401 | ] 402 | }, 403 | { 404 | "cell_type": "markdown", 405 | "metadata": {}, 406 | "source": [ 407 | "At first, it might seem inconvenient and more difficult to work with samples as predictions instead of point estimates. While this indeed means we often need to handle multidimensional arrays, the advantage is that we can see the whole probability distribution over rents for one flat.\n", 408 | "\n", 409 | "Plot the histogram over rents for the first flat!\n", 410 | "(Don't forget to destandardize the rent price again!)" 411 | ] 412 | }, 413 | { 414 | "cell_type": "code", 415 | "execution_count": null, 416 | "metadata": {}, 417 | "outputs": [], 418 | "source": [] 419 | }, 420 | { 421 | "cell_type": "markdown", 422 | "metadata": {}, 423 | "source": [ 424 | "The samples also allow us to answer questions such as: how likely is it that the rent of this flat is below 600€? \n", 425 | "\n", 426 | "For this, we check which points in our sample are smaller than 600€ and then take the mean of the resulting boolean vector:" 427 | ] 428 | }, 429 | { 430 | "cell_type": "code", 431 | "execution_count": null, 432 | "metadata": {}, 433 | "outputs": [], 434 | "source": [] 435 | }, 436 | { 437 | "cell_type": "markdown", 438 | "metadata": {}, 439 | "source": [ 440 | "Cool, eh? 
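In code this is just a comparison plus a mean, sketched here with simulated draws (in the notebook, the samples would come from `post_pred["rent"]`, destandardized back to euros; the draws below are only stand-ins):

```python
import numpy as np

# Simulated stand-in for the posterior-predictive rents of one flat, in euros.
rng = np.random.default_rng(1)
rent_samples = rng.normal(loc=700, scale=150, size=4000)

# The mean of a boolean vector is the fraction of True values,
# i.e. the estimated probability that the rent is below 600€.
p_below_600 = (rent_samples < 600).mean()
print(round(p_below_600, 2))
```

The same pattern answers any event-probability question about the prediction: replace the comparison with whatever condition you care about.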
(Though the result is maybe a bit depressing...)" 441 | ] 442 | } 443 | ], 444 | "metadata": { 445 | "kernelspec": { 446 | "display_name": "PyLadies-Bayesian-Tutorial", 447 | "language": "python", 448 | "name": "pyladies-bayesian-tutorial" 449 | }, 450 | "language_info": { 451 | "codemirror_mode": { 452 | "name": "ipython", 453 | "version": 3 454 | }, 455 | "file_extension": ".py", 456 | "mimetype": "text/x-python", 457 | "name": "python", 458 | "nbconvert_exporter": "python", 459 | "pygments_lexer": "ipython3", 460 | "version": "3.7.4" 461 | } 462 | }, 463 | "nbformat": 4, 464 | "nbformat_minor": 4 465 | } 466 | -------------------------------------------------------------------------------- /notebooks/4_Beyond_linear.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Beyond Linear: Going Hierarchical\n", 8 | "All well and good, but linear models with only one predictor can be a bit boring, right? Also, most people know that size is not the only factor determining the rental price. If you've lived in Berlin for a while, you know that certain areas are much more expensive than others.\n", 9 | "\n", 10 | "Unfortunately, this data set contains neither the coordinates for each flat nor the exact address. But for each flat we have a ZIP code (the PLZ).\n", 11 | "We will now extend our model to incorporate the location by using the ZIP code. We do so by training one linear model per ZIP code." 
12 | ] 13 | }, 14 | { 15 | "cell_type": "code", 16 | "execution_count": null, 17 | "metadata": {}, 18 | "outputs": [], 19 | "source": [ 20 | "import numpy as np\n", 21 | "import pandas as pd\n", 22 | "import matplotlib.pyplot as plt\n", 23 | "import seaborn as sns\n", 24 | "\n", 25 | "import pymc3 as pm\n", 26 | "import theano\n", 27 | "\n", 28 | "import sys\n", 29 | "sys.path.append('../src/')\n", 30 | "from utils import standardize_area, destandardize_area\n", 31 | "\n", 32 | "berlin = pd.read_csv(\"../data/berlin.csv\", index_col=0, dtype={\"geo_plz\":str})" 33 | ] 34 | }, 35 | { 36 | "cell_type": "code", 37 | "execution_count": null, 38 | "metadata": {}, 39 | "outputs": [], 40 | "source": [ 41 | "plt.style.use(\"fivethirtyeight\")\n", 42 | "plt.rcParams['figure.figsize'] = 9, 9" 43 | ] 44 | }, 45 | { 46 | "cell_type": "markdown", 47 | "metadata": {}, 48 | "source": [ 49 | "One model per zip code works well if a zip code has many observations:" 50 | ] 51 | }, 52 | { 53 | "cell_type": "code", 54 | "execution_count": null, 55 | "metadata": {}, 56 | "outputs": [], 57 | "source": [ 58 | "sns.regplot(x=berlin[\"livingSpace\"][berlin.geo_plz == \"13583\"], y=berlin[\"totalRent\"][berlin.geo_plz == \"13583\"], \n", 59 | "           color=\"#42b883\", ci=0, scatter_kws={\"s\": 40}, label=\"Spandau\")\n", 60 | "sns.regplot(x=berlin[\"livingSpace\"][berlin.geo_plz == \"10405\"], y=berlin[\"totalRent\"][berlin.geo_plz == \"10405\"], \n", 61 | "           color=\"#f07855\", ci=0, scatter_kws={\"s\": 40}, label=\"Prenzlauer Berg\")\n", 62 | "\n", 63 | "plt.scatter(berlin[\"livingSpace\"], berlin[\"totalRent\"], s=3, c=\"gray\", alpha=0.3)\n", 64 | "plt.ylabel(\"Monthly Rent [€]\")\n", 65 | "plt.xlabel(\"Living Area [sqm]\")\n", 66 | "plt.legend()\n", 67 | "plt.show()" 68 | ] 69 | }, 70 | { 71 | "cell_type": "markdown", 72 | "metadata": {}, 73 | "source": [ 74 | "If a zip code has only a handful of observations or fewer, things are a bit more difficult. 
In this zip code, we only have two observations, and a model fit on these two points would result in a negative slope, which doesn't make much sense. In cases where we have little data, we would prefer the model to be closer to a model fit on all data, as we did before. After all, even in Blankenburg, bigger flats should in general be more expensive." 75 | ] 76 | }, 77 | { 78 | "cell_type": "code", 79 | "execution_count": null, 80 | "metadata": {}, 81 | "outputs": [], 82 | "source": [ 83 | "plt.scatter(berlin[\"livingSpace\"], berlin[\"totalRent\"], s=3, c=\"gray\", alpha=0.3)\n", 84 | "plt.scatter(berlin[\"livingSpace\"][berlin.geo_plz == \"13129\"], \n", 85 | "            berlin[\"totalRent\"][berlin.geo_plz == \"13129\"], s=40, label=\"Blankenburg\")\n", 86 | "plt.legend()\n", 87 | "plt.show()" 88 | ] 89 | }, 90 | { 91 | "cell_type": "markdown", 92 | "metadata": {}, 93 | "source": [ 94 | "We can achieve this by saying that for each linear model (one for each zip code), the parameters determining the slope and intercept of this model come from a common distribution. 
So all slope parameters, for example, would come from a Normal distribution centered around some value close to 4.4 (the slope parameter we obtained in our last model), but the slope for Prenzlauer Berg would be higher than average while Spandau would be lower than average, and places like Blankenburg would stay close to the mean (and also have higher uncertainty).\n", 95 | "\n", 96 | "Before, our model looked like this:\n", 97 | "\n", 98 | "$$\\begin{align*}\n", 99 | "\\text{rent} &\\sim \\text{Normal}(\\mu, \\sigma) \\\\\n", 100 | "\\mu &= \\alpha + \\beta \\text{area} \\\\\n", 101 | "\\\\\n", 102 | "\\alpha &\\sim \\text{Normal}(0, 10) \\\\\n", 103 | "\\beta &\\sim \\text{Normal}(0, 5) \\\\\n", 104 | "\\\\\n", 105 | "\\sigma &\\sim \\text{HalfNormal}(5) \n", 106 | "\\end{align*}$$\n", 107 | "\n", 108 | "We now extend this as follows:\n", 109 | "$$\\begin{align*}\n", 110 | "\\text{rent} &\\sim \\text{Normal}(\\mu, \\sigma) \\\\\n", 111 | "\\mu &= \\alpha_{[\\text{ZIP}]} + \\beta_{[\\text{ZIP}]} \\text{area} \\\\\n", 112 | "\\\\\n", 113 | "\\alpha_{[\\text{ZIP}]} &\\sim \\text{Normal}(\\mu_{\\alpha}, \\sigma_{\\alpha}) \\\\\n", 114 | "\\beta_{[\\text{ZIP}]} &\\sim \\text{Normal}(\\mu_{\\beta}, \\sigma_{\\beta}) \\\\\n", 115 | "\\\\\n", 116 | "\\mu_{\\alpha} &\\sim \\text{Normal}(0, 10) \\\\\n", 117 | "\\mu_{\\beta} &\\sim \\text{Normal}(0, 5) \\\\\n", 118 | "\\\\\n", 119 | "\\sigma, \\sigma_{\\alpha}, \\sigma_{\\beta} &\\sim \\text{HalfNormal}(5) \n", 120 | "\\end{align*}$$\n", 121 | "\n", 122 | "This looks like a lot more formulas, but the most important changes are only in the second, third and fourth line; the lines below mostly repeat priors from above.\n", 123 | "\n", 124 | "In the second line, we use a linear model similar to above, but with different $\\alpha$ and $\\beta$ for each ZIP code. As before, we need to define priors for these two parameters. Unlike above, however, we now put so-called hyperpriors on the parameters of these priors. 
$\\mu_{\\alpha}$ is now the expected mean for the intercepts of each ZIP code and $\\sigma_{\\alpha}$ determines how much of a difference there can be between the intercept in Spandau and the intercept in Prenzlauer Berg.\n", 125 | "\n", 126 | "Don't worry if so many formulas are not your cup of tea; we'll now look at the code for the model. Before writing the model, however, we need to map the zip codes to an index variable:" 127 | ] 128 | }, 129 | { 130 | "cell_type": "code", 131 | "execution_count": null, 132 | "metadata": {}, 133 | "outputs": [], 134 | "source": [ 135 | "berlin[\"zip\"] = berlin.geo_plz.map(str.strip)\n", 136 | "zip_codes = np.sort(berlin.zip.unique())\n", 137 | "num_zip_codes = len(zip_codes)\n", 138 | "zip_lookup = dict(zip(zip_codes, range(num_zip_codes)))\n", 139 | "berlin[\"zip_code\"] = berlin.zip.replace(zip_lookup).values" 140 | ] 141 | }, 142 | { 143 | "cell_type": "markdown", 144 | "metadata": {}, 145 | "source": [ 146 | "And a small helper function to map from ZIP string to ZIP code:" 147 | ] 148 | }, 149 | { 150 | "cell_type": "code", 151 | "execution_count": null, 152 | "metadata": {}, 153 | "outputs": [], 154 | "source": [ 155 | "def map_zip_codes(zip_strings, zip_lookup=zip_lookup):\n", 156 | "    return pd.Series(zip_strings).replace(zip_lookup).values" 157 | ] 158 | }, 159 | { 160 | "cell_type": "code", 161 | "execution_count": null, 162 | "metadata": {}, 163 | "outputs": [], 164 | "source": [ 165 | "with pm.Model() as hier_model:\n", 166 | "    mu_alpha = ...\n", 167 | "    sigma_alpha = ...\n", 168 | "    \n", 169 | "    mu_beta = ...\n", 170 | "    sigma_beta = ...\n", 171 | "    \n", 172 | "    alpha = ...\n", 173 | "    \n", 174 | "    beta = ...\n", 175 | "    \n", 176 | "\n", 177 | "    sigma = ...\n", 178 | "    \n", 179 | "    mu = ...\n", 180 | "    \n", 181 | "    rent = pm.Normal(\"rent\", mu=mu, sd=sigma, observed=berlin[\"totalRent_s\"])\n", 182 | "    \n", 183 | "    trace = pm.sample(random_seed=2020, chains=4, \n", 184 | "                      draws=1000, tune=1000)" 185 | ] 186 | 
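To make the two-level structure concrete, here is one prior draw from the hierarchical model written out in plain NumPy (a generative sketch of the formulas above, not a substitute for the PyMC3 model; the ZIP index and area value at the end are arbitrary examples):

```python
import numpy as np

rng = np.random.default_rng(2020)
num_zip_codes = 208

# Hyperpriors: one shared distribution for all intercepts, one for all slopes.
mu_alpha = rng.normal(0, 10)
sigma_alpha = abs(rng.normal(0, 5))
mu_beta = rng.normal(0, 5)
sigma_beta = abs(rng.normal(0, 5))

# One intercept and one slope per ZIP code, drawn from the shared distributions.
alpha = rng.normal(mu_alpha, sigma_alpha, size=num_zip_codes)
beta = rng.normal(mu_beta, sigma_beta, size=num_zip_codes)

# Likelihood for a flat with standardized area a in ZIP code index z.
sigma = abs(rng.normal(0, 5))
z, a = 42, 0.3
rent_s = rng.normal(alpha[z] + beta[z] * a, sigma)
```

During inference the flow runs the other way: the data in each ZIP code pulls its `alpha[z]` and `beta[z]` towards the local fit, while the shared `mu_alpha`/`sigma_alpha` pulls sparse ZIP codes back towards the overall mean.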
}, 187 | { 188 | "cell_type": "markdown", 189 | "metadata": {}, 190 | "source": [ 191 | "We will need to do the convergence checks again. For this, we first collect the different model outputs in an ArviZ Data object:" 192 | ] 193 | }, 194 | { 195 | "cell_type": "code", 196 | "execution_count": null, 197 | "metadata": {}, 198 | "outputs": [], 199 | "source": [ 200 | "import arviz as az\n", 201 | "pm_data = ..." 202 | ] 203 | }, 204 | { 205 | "cell_type": "markdown", 206 | "metadata": {}, 207 | "source": [ 208 | "One thing where the ArviZ Data object is very useful is dealing with high-dimensional model artifacts. With hierarchical models such as this one, we get one parameter per level (here the ZIP codes) and 4000 samples per parameter, coming from four chains (at least in the beginning, we want to keep the samples of each chain separated). ArviZ makes it easier to keep track of all the different dimensions and coordinates:" 209 | ] 210 | }, 211 | { 212 | "cell_type": "code", 213 | "execution_count": null, 214 | "metadata": {}, 215 | "outputs": [], 216 | "source": [ 217 | "pm_data.posterior" 218 | ] 219 | }, 220 | { 221 | "cell_type": "markdown", 222 | "metadata": {}, 223 | "source": [ 224 | "There are the dimensions `alpha_dim_0` and `beta_dim_0` that represent the ZIP codes. We have 208 ZIP codes and since we mapped them to an index, the coordinates for these dimensions are simply the integers from 0 to 207. 
But we can give them meaningful names by providing ArviZ with the original ZIP code strings:" 225 | ] 226 | }, 227 | { 228 | "cell_type": "code", 229 | "execution_count": null, 230 | "metadata": {}, 231 | "outputs": [], 232 | "source": [ 233 | "pm_data = az.from_pymc3(model=hier_model, trace=trace,\n", 234 | "                       # create a new coordinate\n", 235 | "                        ...\n", 236 | "                       # this coordinate is used by the dimension alpha and beta\n", 237 | "                        ...\n", 238 | "                       )" 239 | ] 240 | }, 241 | { 242 | "cell_type": "code", 243 | "execution_count": null, 244 | "metadata": {}, 245 | "outputs": [], 246 | "source": [ 247 | "pm_data.posterior" 248 | ] 249 | }, 250 | { 251 | "cell_type": "markdown", 252 | "metadata": {}, 253 | "source": [ 254 | "There's much more that can be done with the ArviZ data object, but that would go beyond the scope of this tutorial. A small introduction on how to work with them can be found [here](https://github.com/corriebar/arviz/blob/xarray-example/doc/notebooks/Working%20with%20InferenceData.ipynb).\n", 255 | "\n", 256 | "\n", 257 | "Next, we'll check the trace plots. Remember, ArviZ has a function for this. Only problem: we now have many alpha and beta parameters, one for each ZIP code. This is way too much to plot! Use the function parameter `var_names` to only select the parameters from the model that don't use ZIP code as index." 258 | ] 259 | }, 260 | { 261 | "cell_type": "code", 262 | "execution_count": null, 263 | "metadata": {}, 264 | "outputs": [], 265 | "source": [] 266 | }, 267 | { 268 | "cell_type": "markdown", 269 | "metadata": {}, 270 | "source": [ 271 | "Next, we'll check the three different summary/diagnostic statistics. These are R_hat, MCSE (Monte Carlo Standard Error), and ESS (Effective Sample Size). 
You can get the summary table using ArviZ again:" 272 | ] 273 | }, 274 | { 275 | "cell_type": "code", 276 | "execution_count": null, 277 | "metadata": {}, 278 | "outputs": [], 279 | "source": [] 280 | }, 281 | { 282 | "cell_type": "markdown", 283 | "metadata": {}, 284 | "source": [ 285 | "Unfortunately, since we have so many parameters, we can't check them easily by hand. What we can do instead is to plot a histogram for each of the diagnostics:" 286 | ] 287 | }, 288 | { 289 | "cell_type": "code", 290 | "execution_count": null, 291 | "metadata": {}, 292 | "outputs": [], 293 | "source": [ 294 | "# rhat\n", 295 | "summ = az.summary(pm_data, round_to=5)\n", 296 | "\n", 297 | "...\n", 298 | "plt.title(\"R_hat\")\n", 299 | "plt.show()" 300 | ] 301 | }, 302 | { 303 | "cell_type": "markdown", 304 | "metadata": {}, 305 | "source": [ 306 | "Check for yourself the ESS diagnostics:" 307 | ] 308 | }, 309 | { 310 | "cell_type": "code", 311 | "execution_count": null, 312 | "metadata": {}, 313 | "outputs": [], 314 | "source": [ 315 | "# ess\n", 316 | "...\n", 317 | "plt.title(\"ESS Mean\")\n", 318 | "plt.show()" 319 | ] 320 | }, 321 | { 322 | "cell_type": "markdown", 323 | "metadata": {}, 324 | "source": [ 325 | "Before, we checked how well our model fit the data by comparing the plot of the linear model to our data. Since we now have a collection of linear models, this would be rather difficult. What we can do instead is a so-called posterior predictive check: we compare the predicted distribution of outcomes to the actual distribution of outcomes." 326 | ] 327 | }, 328 | { 329 | "cell_type": "code", 330 | "execution_count": null, 331 | "metadata": {}, 332 | "outputs": [], 333 | "source": [ 334 | "with hier_model:\n", 335 | "    posterior_pred = ..."
336 | ] 337 | }, 338 | { 339 | "cell_type": "code", 340 | "execution_count": null, 341 | "metadata": {}, 342 | "outputs": [], 343 | "source": [ 344 | "fig, ax = plt.subplots(nrows=2, ncols=2, figsize=(14,14), sharex=True)\n", 345 | "ax = ax.ravel()\n", 346 | "\n", 347 | "...\n", 348 | "\n", 349 | "ax[0].set_title(\"Original data\")\n", 350 | "\n", 351 | "sample_nums = np.random.choice(posterior_pred[\"rent\"].shape[0], size=3, replace=False)\n", 352 | "for i, samp in enumerate(sample_nums):\n", 353 | " \n", 354 | " ...\n", 355 | " ax[i+1].set_title(f\"Sample {i+1}\")\n", 356 | "plt.suptitle(\"Comparing Original Data to Predicted Data\")\n", 357 | "plt.show()" 358 | ] 359 | }, 360 | { 361 | "cell_type": "markdown", 362 | "metadata": {}, 363 | "source": [ 364 | "Even though it is difficult to visualize all models, we can pick out a few and check how the model differs across ZIP codes. For selecting the posterior samples of a single ZIP code, the ArviZ object becomes very helpful. If we, for example, want to extract the samples for Blankenfelde (the ZIP code 13129 from above with so few observations), we get the data as follows:" 365 | ] 366 | }, 367 | { 368 | "cell_type": "code", 369 | "execution_count": null, 370 | "metadata": {}, 371 | "outputs": [], 372 | "source": [ 373 | "blankenfelde = ... 
\n" 374 | ] 375 | }, 376 | { 377 | "cell_type": "code", 378 | "execution_count": null, 379 | "metadata": {}, 380 | "outputs": [], 381 | "source": [ 382 | "area_s = np.linspace(start=-2, stop=3.5, num=50)\n", 383 | "\n", 384 | "mu_pred_blankenfelde = ...\n", 385 | "\n", 386 | "\n", 387 | "# destandardize area again\n", 388 | "area = destandardize_area(area_s)\n", 389 | "\n", 390 | "plt.plot(area, mu_pred_blankenfelde.mean(1)*100, alpha=0.3, c=\"k\")\n", 391 | "plt.scatter(berlin[\"livingSpace\"], berlin[\"totalRent_s\"]*100, s=4, alpha=0.4, c=\"grey\")\n", 392 | "\n", 393 | "plt.title(\"Uncertainty (mu) for Blankenburg\")\n", 394 | "\n", 395 | "\n", 396 | "az.plot_hpd(area, mu_pred_blankenfelde.T*100, credible_interval=0.83)\n", 397 | "\n", 398 | "plt.scatter(berlin[\"livingSpace\"][berlin.geo_plz == \"13129\"], \n", 399 | " berlin[\"totalRent\"][berlin.geo_plz == \"13129\"], s=40, label=\"Blankenburg\")\n", 400 | "plt.xlabel('Living Area [sqm]')\n", 401 | "plt.ylabel('Rent [€]')\n", 402 | "plt.legend()\n", 403 | "plt.show()" 404 | ] 405 | }, 406 | { 407 | "cell_type": "markdown", 408 | "metadata": {}, 409 | "source": [ 410 | "We can compare this with one of the ZIP codes from above that had more data.\n", 411 | "Go and make the same plot for a different ZIP code (of your choice)!" 412 | ] 413 | }, 414 | { 415 | "cell_type": "code", 416 | "execution_count": null, 417 | "metadata": {}, 418 | "outputs": [], 419 | "source": [] 420 | }, 421 | { 422 | "cell_type": "code", 423 | "execution_count": null, 424 | "metadata": {}, 425 | "outputs": [], 426 | "source": [] 427 | }, 428 | { 429 | "cell_type": "markdown", 430 | "metadata": {}, 431 | "source": [ 432 | "If you want, you can also check what the full uncertainty looks like for these ZIP codes. Remember, for this you'll need to compute the predictions for rent." 
433 | ] 434 | }, 435 | { 436 | "cell_type": "code", 437 | "execution_count": null, 438 | "metadata": {}, 439 | "outputs": [], 440 | "source": [ 441 | "import scipy.stats as stats" 442 | ] 443 | }, 444 | { 445 | "cell_type": "code", 446 | "execution_count": null, 447 | "metadata": {}, 448 | "outputs": [], 449 | "source": [ 450 | "rent_blankenfelde = ...\n", 451 | "\n", 452 | "plt.plot(area, mu_pred_blankenfelde.mean(1)*100, alpha=0.3, c=\"k\")\n", 453 | "plt.scatter(berlin[\"livingSpace\"], berlin[\"totalRent_s\"]*100, s=4, alpha=0.7, c=\"grey\")\n", 454 | "\n", 455 | "az.plot_hpd(area, mu_pred_blankenfelde.T*100, credible_interval=0.83, \n", 456 | " fill_kwargs={\"alpha\": 0.5})\n", 457 | "\n", 458 | "az.plot_hpd(area, rent_blankenfelde.T*100, credible_interval=0.83, \n", 459 | " fill_kwargs={\"alpha\": 0.5})\n", 460 | "plt.scatter(berlin[\"livingSpace\"][berlin.geo_plz == \"13129\"], \n", 461 | " berlin[\"totalRent\"][berlin.geo_plz == \"13129\"], s=40, label=\"Blankenburg\")\n", 462 | "\n", 463 | "plt.legend()\n", 464 | "plt.title(\"Full uncertainty for Blankenfelde\")\n", 465 | "plt.xlabel('Living Area [sqm]')\n", 466 | "plt.ylabel('Rent [€]')\n", 467 | "plt.show()" 468 | ] 469 | }, 470 | { 471 | "cell_type": "code", 472 | "execution_count": null, 473 | "metadata": {}, 474 | "outputs": [], 475 | "source": [ 476 | "# repeat for a ZIP code of your choice" 477 | ] 478 | }, 479 | { 480 | "cell_type": "markdown", 481 | "metadata": {}, 482 | "source": [ 483 | "Instead of computing the rent predictions by hand, we could also use a PyMC data container to handle predictions on new data.\n", 484 | "\n", 485 | "Unfortunately, because of a still-open issue, we can't use the PyMC data container to update the ZIP code indices and instead need to use a shared Theano variable. However, both types are updated in a similar fashion." 
486 | ] 487 | }, 488 | { 489 | "cell_type": "code", 490 | "execution_count": null, 491 | "metadata": {}, 492 | "outputs": [], 493 | "source": [ 494 | "zips = ...\n", 495 | "\n", 496 | "with pm.Model() as hier_model:\n", 497 | " \n", 498 | " area = ...\n", 499 | " \n", 500 | " mu_alpha = pm.Normal(\"mu_alpha\", mu=0, sigma=10)\n", 501 | " sigma_alpha = pm.HalfNormal(\"sigma_alpha\", sigma=5)\n", 502 | " \n", 503 | " mu_beta = pm.Normal(\"mu_beta\", mu=0, sigma=5)\n", 504 | " sigma_beta = pm.HalfNormal(\"sigma_beta\", sigma=5)\n", 505 | " \n", 506 | " alpha = pm.Normal(\"alpha\", mu=mu_alpha, sd=sigma_alpha, \n", 507 | " shape=num_zip_codes)\n", 508 | " \n", 509 | " beta = pm.Normal(\"beta\", mu=mu_beta, sd=sigma_beta, \n", 510 | " shape=num_zip_codes)\n", 511 | " \n", 512 | "\n", 513 | " sigma = pm.HalfNormal(\"sigma\", sigma=5)\n", 514 | " \n", 515 | " mu = alpha[zips] + beta[zips]*area\n", 516 | " \n", 517 | " rent = pm.Normal(\"rent\", mu=mu, sd=sigma, observed=berlin[\"totalRent_s\"])\n", 518 | " \n", 519 | " trace = pm.sample(random_seed=2020, chains=4, \n", 520 | " draws=1000, tune=1000)" 521 | ] 522 | }, 523 | { 524 | "cell_type": "markdown", 525 | "metadata": {}, 526 | "source": [ 527 | "Feel free to change the area and ZIP code data to for example your own flat." 
528 | ] 529 | }, 530 | { 531 | "cell_type": "code", 532 | "execution_count": null, 533 | "metadata": {}, 534 | "outputs": [], 535 | "source": [ 536 | "more_flats = pd.DataFrame({\"area\": standardize_area(np.array([100, 240, 74])), \n", 537 | " \"zip_code\": [\"10243\", \"10179\", \"12047\"]})\n", 538 | "\n", 539 | "more_flats[\"zip\"] = map_zip_codes(more_flats[\"zip_code\"])" 540 | ] 541 | }, 542 | { 543 | "cell_type": "code", 544 | "execution_count": null, 545 | "metadata": {}, 546 | "outputs": [], 547 | "source": [ 548 | "with hier_model:\n", 549 | " zips.set_value(more_flats[\"zip\"])\n", 550 | " pm.set_data({\"area\": more_flats[\"area\"]})\n", 551 | " post_pred = pm.sample_posterior_predictive(trace, samples=1000)" 552 | ] 553 | }, 554 | { 555 | "cell_type": "markdown", 556 | "metadata": {}, 557 | "source": [ 558 | "As before, we can now plot this as a histogram:" 559 | ] 560 | }, 561 | { 562 | "cell_type": "code", 563 | "execution_count": null, 564 | "metadata": {}, 565 | "outputs": [], 566 | "source": [ 567 | "y_pred = post_pred[\"rent\"][:,2]*100\n", 568 | "\n", 569 | "plt.hist(y_pred, ec=\"darkblue\", alpha=0.9, bins=20)\n", 570 | "plt.title(\"Rental price distribution\\nfor a flat of 74sqm in 12047\")\n", 571 | "plt.show()" 572 | ] 573 | }, 574 | { 575 | "cell_type": "markdown", 576 | "metadata": {}, 577 | "source": [ 578 | "Or ask for the probability that your flat would have a rent lower than your own rent:" 579 | ] 580 | }, 581 | { 582 | "cell_type": "code", 583 | "execution_count": null, 584 | "metadata": {}, 585 | "outputs": [], 586 | "source": [] 587 | } 588 | ], 589 | "metadata": { 590 | "kernelspec": { 591 | "display_name": "PyLadies-Bayesian-Tutorial", 592 | "language": "python", 593 | "name": "pyladies-bayesian-tutorial" 594 | }, 595 | "language_info": { 596 | "codemirror_mode": { 597 | "name": "ipython", 598 | "version": 3 599 | }, 600 | "file_extension": ".py", 601 | "mimetype": "text/x-python", 602 | "name": "python", 603 | 
"nbconvert_exporter": "python", 604 | "pygments_lexer": "ipython3", 605 | "version": "3.7.4" 606 | } 607 | }, 608 | "nbformat": 4, 609 | "nbformat_minor": 4 610 | } 611 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | -e git+https://github.com/arviz-devs/arviz@685b259b8618b9182ed348d0f06f4ca4cf303d44#egg=arviz 2 | attrs==19.3.0 3 | backcall==0.1.0 4 | bleach==3.1.0 5 | bokeh==1.4.0 6 | cftime==1.0.4.2 7 | cycler==0.10.0 8 | decorator==4.4.1 9 | defusedxml==0.6.0 10 | entrypoints==0.3 11 | h5py==2.10.0 12 | importlib-metadata==1.5.0 13 | ipykernel==5.1.4 14 | ipython==7.12.0 15 | ipython-genutils==0.2.0 16 | jedi==0.16.0 17 | Jinja2==2.11.1 18 | joblib==0.14.1 19 | json5==0.9.0 20 | jsonschema==3.2.0 21 | jupyter-client==5.3.4 22 | jupyter-core==4.6.1 23 | jupyterlab==1.2.6 24 | jupyterlab-server==1.0.6 25 | kiwisolver==1.1.0 26 | MarkupSafe==1.1.1 27 | matplotlib==3.1.3 28 | mistune==0.8.4 29 | nbconvert==5.6.1 30 | nbformat==5.0.4 31 | netCDF4==1.5.3 32 | notebook==6.0.3 33 | numpy==1.18.1 34 | packaging==20.1 35 | pandas==1.0.1 36 | pandocfilters==1.4.2 37 | parso==0.6.1 38 | patsy==0.5.1 39 | pexpect==4.8.0 40 | pickleshare==0.7.5 41 | Pillow==7.0.0 42 | prometheus-client==0.7.1 43 | prompt-toolkit==3.0.3 44 | ptyprocess==0.6.0 45 | Pygments==2.5.2 46 | pymc3==3.8 47 | pyparsing==2.4.6 48 | pyrsistent==0.15.7 49 | python-dateutil==2.8.1 50 | pytz==2019.3 51 | PyYAML==5.3 52 | pyzmq==18.1.1 53 | scikit-learn==0.22.1 54 | scipy==1.4.1 55 | seaborn==0.10.0 56 | Send2Trash==1.5.0 57 | six==1.14.0 58 | terminado==0.8.3 59 | testpath==0.4.4 60 | Theano==1.0.4 61 | tornado==6.0.3 62 | tqdm==4.42.1 63 | traitlets==4.3.3 64 | wcwidth==0.1.8 65 | webencodings==0.5.1 66 | xarray==0.15.0 67 | zipp==2.1.0 68 | -------------------------------------------------------------------------------- /solutions/2_Starting_simple.ipynb: 
-------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Starting simple: Linear Regression\n", 8 | "As so often, we will start relatively simple, with a linear regression. That may sound a bit boring, but don't worry: I will show you later how to extend this model to something slightly more complex.\n", 9 | "\n", 10 | "## Short Look at the Data\n", 11 | "For this tutorial, I will use rental data that I scraped from Immoscout24. For a more detailed description and information on how I scraped the data set, you can check its description on [kaggle](https://www.kaggle.com/corrieaar/apartment-rental-offers-in-germany), where I occasionally also update the data.\n", 12 | "For this analysis, we will concentrate on rental offers in Berlin, but of course feel free to try out different cities or areas!" 13 | ] 14 | }, 15 | { 16 | "cell_type": "code", 17 | "execution_count": null, 18 | "metadata": {}, 19 | "outputs": [], 20 | "source": [ 21 | "import sys\n", 22 | "sys.path.append('../src/')\n", 23 | "\n", 24 | "import numpy as np\n", 25 | "import pandas as pd\n", 26 | "import matplotlib.pyplot as plt\n", 27 | "import csv\n", 28 | "\n", 29 | "from utils import iqr, iqr_rule" 30 | ] 31 | }, 32 | { 33 | "cell_type": "code", 34 | "execution_count": null, 35 | "metadata": {}, 36 | "outputs": [], 37 | "source": [ 38 | "plt.style.use(\"fivethirtyeight\") # or your own favorite style\n", 39 | "plt.rcParams['figure.figsize'] = 9, 9" 40 | ] 41 | }, 42 | { 43 | "cell_type": "code", 44 | "execution_count": null, 45 | "metadata": {}, 46 | "outputs": [], 47 | "source": [ 48 | "d = pd.read_csv(\"../data/immo_data.csv\", dtype={\"geo_plz\": str})" 49 | ] 50 | }, 51 | { 52 | "cell_type": "markdown", 53 | "metadata": {}, 54 | "source": [ 55 | "Before using the data, we will do a bit of preprocessing: we remove outliers where either the living area or the total rent is too low 
or too high. To remove outliers, the [interquartile range (IQR)](https://en.wikipedia.org/wiki/Interquartile_range#Outliers) is used. The IQR rule marks everything that lies too far from the middle range of the data as an outlier. Most of the data we throw away this way were typos or similar entries with unreasonable input values.\n", 56 | "For a more thorough analysis, it might be useful to check that we don't throw away real cases and instead incorporate the outliers in further analysis.\n", 57 | "\n", 58 | "As target variable, we will use `totalRent`, which in most cases is the sum of the base rent and a service fee." 59 | ] 60 | }, 61 | { 62 | "cell_type": "code", 63 | "execution_count": null, 64 | "metadata": {}, 65 | "outputs": [], 66 | "source": [ 67 | "d[\"totalRent\"] = np.where(d[\"totalRent\"].isnull(), d[\"baseRent\"], d[\"totalRent\"])\n", 68 | "\n", 69 | "# since log doesn't work with 0, we replace 0 with 0.5\n", 70 | "# it seems reasonable to say that a rent of 0€ is the same as 50ct\n", 71 | "d[\"livingSpace_m\"] = np.where(d[\"livingSpace\"] <= 0, 0.5, d[\"livingSpace\"])\n", 72 | "d[\"totalRent_m\"] = np.where(d[\"totalRent\"] <= 0, 0.5, d[\"totalRent\"])\n", 73 | "d[\"logRent\"] = np.log(d[\"totalRent_m\"])\n", 74 | "d[\"logSpace\"] = np.log(d[\"livingSpace_m\"])\n", 75 | "\n", 76 | "not_outlier = iqr_rule(d[\"logSpace\"], factor=1.5) & iqr_rule(d[\"logRent\"], factor=1.5)\n", 77 | "d = d[not_outlier]\n", 78 | "berlin = d[(d.regio1 == \"Berlin\") ]" 79 | ] 80 | }, 81 | { 82 | "cell_type": "markdown", 83 | "metadata": {}, 84 | "source": [ 85 | "In this analysis, we want to predict the rent (`totalRent`) by the living area (`livingSpace`).\n", 86 | "\n", 87 | "You can have a short look at the data and these two variables!\n", 88 | "\n", 89 | "For example:\n", 90 | "- What is the average rent in Berlin?\n", 91 | "- What is the average size of a flat in Berlin?\n", 92 | "- Plot rent vs living space" 93 | ] 94 | }, 95 | { 96 | "cell_type": "code", 97 | 
"execution_count": null, 98 | "metadata": {}, 99 | "outputs": [], 100 | "source": [ 101 | "berlin.shape" 102 | ] 103 | }, 104 | { 105 | "cell_type": "code", 106 | "execution_count": null, 107 | "metadata": {}, 108 | "outputs": [], 109 | "source": [ 110 | "berlin.iloc[0]" 111 | ] 112 | }, 113 | { 114 | "cell_type": "code", 115 | "execution_count": null, 116 | "metadata": {}, 117 | "outputs": [], 118 | "source": [ 119 | "# average rent\n", 120 | "berlin[\"totalRent\"].mean()" 121 | ] 122 | }, 123 | { 124 | "cell_type": "code", 125 | "execution_count": null, 126 | "metadata": {}, 127 | "outputs": [], 128 | "source": [ 129 | "berlin[\"totalRent\"].median()" 130 | ] 131 | }, 132 | { 133 | "cell_type": "markdown", 134 | "metadata": {}, 135 | "source": [ 136 | "The difference in mean and median suggests that the distribution is not normally distributed. If you plot the histogram, you can also see that the distribution is skewed." 137 | ] 138 | }, 139 | { 140 | "cell_type": "code", 141 | "execution_count": null, 142 | "metadata": {}, 143 | "outputs": [], 144 | "source": [ 145 | "# average living area\n", 146 | "berlin[\"livingSpace\"].mean()" 147 | ] 148 | }, 149 | { 150 | "cell_type": "code", 151 | "execution_count": null, 152 | "metadata": {}, 153 | "outputs": [], 154 | "source": [ 155 | "berlin[\"livingSpace\"].median()" 156 | ] 157 | }, 158 | { 159 | "cell_type": "code", 160 | "execution_count": null, 161 | "metadata": {}, 162 | "outputs": [], 163 | "source": [ 164 | "berlin.date.value_counts().plot.bar()\n", 165 | "plt.title(\"Number of cases per scrape time\")\n", 166 | "plt.show()" 167 | ] 168 | }, 169 | { 170 | "cell_type": "code", 171 | "execution_count": null, 172 | "metadata": {}, 173 | "outputs": [], 174 | "source": [ 175 | "berlin[[\"totalRent\", \"livingSpace\", \"noParkSpaces\", \"picturecount\", \"yearConstructed\"]].corr()" 176 | ] 177 | }, 178 | { 179 | "cell_type": "markdown", 180 | "metadata": {}, 181 | "source": [ 182 | "Living space is strongly 
positively correlated with the total rent, as we would expect. Interestingly, the construction year doesn't seem to be strongly correlated with total rent; this might be because the relationship is not linear: Altbau tends to be very expensive in Berlin, Plattenbau from the 70s less so, and flats in new buildings are expensive again. Picture count is also positively correlated with the total rent, but it is a good example that correlation does not imply causation." 183 | ] 184 | }, 185 | { 186 | "cell_type": "code", 187 | "execution_count": null, 188 | "metadata": {}, 189 | "outputs": [], 190 | "source": [ 191 | "berlin[\"livingSpace\"].mean()" 192 | ] 193 | }, 194 | { 195 | "cell_type": "code", 196 | "execution_count": null, 197 | "metadata": {}, 198 | "outputs": [], 199 | "source": [ 200 | "berlin[\"totalRent\"].mean()" 201 | ] 202 | }, 203 | { 204 | "cell_type": "code", 205 | "execution_count": null, 206 | "metadata": {}, 207 | "outputs": [], 208 | "source": [ 209 | "berlin[\"totalRent\"].median()" 210 | ] 211 | }, 212 | { 213 | "cell_type": "code", 214 | "execution_count": null, 215 | "metadata": {}, 216 | "outputs": [], 217 | "source": [ 218 | "plt.figure(figsize=(9,9))\n", 219 | "plt.scatter(berlin.livingSpace, berlin.totalRent, s=4)\n", 220 | "plt.title(\"Rent by living area\")\n", 221 | "plt.xlabel(\"Living Area [sqm]\")\n", 222 | "plt.ylabel(\"Monthly Rent [€]\")\n", 223 | "plt.show()" 224 | ] 225 | }, 226 | { 227 | "cell_type": "markdown", 228 | "metadata": {}, 229 | "source": [ 230 | "Before working with the data, we will rescale and normalize the living area and also rescale the total rent:" 231 | ] 232 | }, 233 | { 234 | "cell_type": "code", 235 | "execution_count": null, 236 | "metadata": {}, 237 | "outputs": [], 238 | "source": [ 239 | "berlin[\"livingSpace_s\"] = (berlin[\"livingSpace\"] - berlin[\"livingSpace\"].mean()) / np.std(berlin[\"livingSpace\"])\n", 240 | "berlin[\"totalRent_s\"] = berlin[\"totalRent\"] / 100\n", 241 | "\n", 242 | "# saving 
for later\n", 243 | "berlin.to_csv(\"../data/berlin.csv\", encoding=\"utf-8\", quoting=csv.QUOTE_NONNUMERIC) # special quoting necessary because otherwise description messes up" 244 | ] 245 | }, 246 | { 247 | "cell_type": "markdown", 248 | "metadata": {}, 249 | "source": [ 250 | "We will have to standardize/destandardize area a few times, so we will use small helper functions for this:" 251 | ] 252 | }, 253 | { 254 | "cell_type": "code", 255 | "execution_count": null, 256 | "metadata": {}, 257 | "outputs": [], 258 | "source": [ 259 | "def standardize_area(x):\n", 260 | " return ( x - berlin[\"livingSpace\"].mean()) / np.std(berlin[\"livingSpace\"])\n", 261 | " \n", 262 | "def destandardize_area(x):\n", 263 | " return ( x * np.std(berlin[\"livingSpace\"]) ) + berlin[\"livingSpace\"].mean() " 264 | ] 265 | }, 266 | { 267 | "cell_type": "code", 268 | "execution_count": null, 269 | "metadata": {}, 270 | "outputs": [], 271 | "source": [ 272 | "plt.figure(figsize=(9,9))\n", 273 | "plt.scatter(berlin.livingSpace_s, berlin.totalRent_s, s=4)\n", 274 | "plt.title(\"Rent by living area\")\n", 275 | "plt.xlabel(\"Living Area [sqm]\")\n", 276 | "plt.ylabel(\"Monthly Rent [100€]\")\n", 277 | "plt.axvline(x=0, c=\"k\", linewidth=1)\n", 278 | "plt.axhline(y=np.mean(berlin.totalRent_s), c=\"k\", linewidth=1, linestyle=\"--\", label=\"Average Rent\")\n", 279 | "plt.legend()\n", 280 | "plt.show()" 281 | ] 282 | }, 283 | { 284 | "cell_type": "markdown", 285 | "metadata": {}, 286 | "source": [ 287 | "Looking at the plot, it roughly looks like a linear relationship between area and monthly rent. Indeed, the bigger the flat, the more expensive it should be. 
So we will start our modelling with a linear model.\n", 288 | "\n", 289 | "In math notation, our model can be written as follows:\n", 290 | "\n", 291 | "$$ \\text{rent} \\approx \\alpha + \\beta \\text{area} $$\n", 292 | "\n", 293 | "This is the same as\n", 294 | "$$ \\text{rent} = \\alpha + \\beta \\text{area} + \\epsilon$$\n", 295 | "where $\\epsilon$ is normally distributed, i.e. $\\epsilon \\sim \\text{Normal}(0, \\sigma)$. This can be rewritten as\n", 296 | "$$ \\text{rent} \\sim \\text{Normal}(\\alpha + \\beta \\text{area}, \\sigma).$$\n", 297 | "\n", 298 | "For easier reading, we rewrite this again:\n", 299 | "$$\\begin{align*} \\text{rent} &\\sim \\text{Normal}(\\mu, \\sigma) \\\\\n", 300 | "\\mu &= \\alpha + \\beta \\text{area}\n", 301 | "\\end{align*}$$\n", 302 | "This will be our first model!" 303 | ] 304 | }, 305 | { 306 | "cell_type": "markdown", 307 | "metadata": {}, 308 | "source": [ 309 | "## Adding a Bayesian ingredient: Priors\n", 310 | "\n", 311 | "Before implementing this model, let's briefly think about what the parameters $\\alpha$ and $\\beta$ mean here.\n", 312 | "\n", 313 | "We will use the model with our rescaled data, and thus the intercept $\\alpha$ is the rental price of an average-sized flat (for average-sized flats, the scaled area is 0).\n", 314 | "\n", 315 | "$\\beta$ is then the increase in rent if the flat is one standard deviation larger. One standard deviation is 29sqm, which is roughly the average size of a room (check living space divided by the number of rooms _noRooms_). Thus $\\beta$ is roughly the increase in rent if the flat had one more room. \n", 316 | "\n", 317 | "$\\sigma$ is how much the rent can differ for two flats of the same size. Concretely, it is how much the rent can differ from the average rent for flats of the same size. 
As our model says that the rent is normally distributed, about 95% of the cases should be within $2\\sigma$ of the average rent.\n", 318 | "As an error term, $\\sigma$ is always positive.\n", 319 | "\n", 320 | "Thinking about what the parameters mean beforehand is very important in a Bayesian analysis since we need to specify priors. Priors are what we think the parameters could be before seeing the data. And, obviously, to be able to say what range the parameters should be in, it helps to know what the parameters mean.\n", 321 | "\n", 322 | "\n", 323 | "If we don't know anything about the problem, we might want to specify priors that are very uninformative and vague. \n", 324 | "We could, for example, specify $\\alpha$ and $\\beta$ as being somewhere between -10,000 and +10,000:" 325 | ] 326 | }, 327 | { 328 | "cell_type": "code", 329 | "execution_count": null, 330 | "metadata": {}, 331 | "outputs": [], 332 | "source": [ 333 | "import pymc3 as pm" 334 | ] 335 | }, 336 | { 337 | "cell_type": "code", 338 | "execution_count": null, 339 | "metadata": {}, 340 | "outputs": [], 341 | "source": [ 342 | "with pm.Model() as mod:\n", 343 | " alpha = pm.Normal('alpha', mu=0, sigma=10000)\n", 344 | " beta = pm.Normal('beta', mu=0, sigma=10000)" 345 | ] 346 | }, 347 | { 348 | "cell_type": "markdown", 349 | "metadata": {}, 350 | "source": [ 351 | "A PyMC model is specified in a context. Before we can actually specify the model, we need to specify the priors since, as usual in Python, each variable we want to use needs to be declared beforehand.\n", 352 | "In PyMC, you always need to specify the name of the variable twice. 
This is so that the variable knows its own name.\n", 353 | "\n", 354 | "If you print the model, it renders nicely in a more math-like description:" 355 | ] 356 | }, 357 | { 358 | "cell_type": "code", 359 | "execution_count": null, 360 | "metadata": {}, 361 | "outputs": [], 362 | "source": [ 363 | "mod" 364 | ] 365 | }, 366 | { 367 | "cell_type": "markdown", 368 | "metadata": {}, 369 | "source": [ 370 | "We can add to the model by opening the context again.\n", 371 | "To add, for example, a prior for sigma, we can proceed as follows:" 372 | ] 373 | }, 374 | { 375 | "cell_type": "code", 376 | "execution_count": null, 377 | "metadata": {}, 378 | "outputs": [], 379 | "source": [ 380 | "with mod:\n", 381 | " sigma = pm.HalfNormal('sigma', sigma=10000)" 382 | ] 383 | }, 384 | { 385 | "cell_type": "markdown", 386 | "metadata": {}, 387 | "source": [ 388 | "Since $\\sigma$ as an error term is always positive, we need to use a distribution that is also always positive. One commonly used distribution for this is the half-normal: a normal distribution cut in half so that it is only positive.\n", 389 | "Other commonly used distributions for $\\sigma$ are, for example, the Exponential or the Half-Cauchy.\n", 390 | "\n", 391 | "Now that we have specified some priors, we can write out the complete model:" 392 | ] 393 | }, 394 | { 395 | "cell_type": "code", 396 | "execution_count": null, 397 | "metadata": {}, 398 | "outputs": [], 399 | "source": [ 400 | "with mod:\n", 401 | " mu = alpha + beta*berlin[\"livingSpace_s\"]\n", 402 | " \n", 403 | " rent = pm.Normal('rent', mu=mu, sigma=sigma,\n", 404 | " observed=berlin[\"totalRent_s\"])" 405 | ] 406 | }, 407 | { 408 | "cell_type": "markdown", 409 | "metadata": {}, 410 | "source": [ 411 | "The PyMC model is written very similarly to how the model was specified above. 
\n", 412 | "\n", 413 | "Note that for the outcome variable, we need to specify the observed data.\n", 414 | "Usually the whole model is written as one:" 415 | ] 416 | }, 417 | { 418 | "cell_type": "code", 419 | "execution_count": null, 420 | "metadata": {}, 421 | "outputs": [], 422 | "source": [ 423 | "with pm.Model() as mod:\n", 424 | " alpha = pm.Normal(\"alpha\", mu=0, sigma=10000)\n", 425 | " beta = pm.Normal(\"beta\", mu=0, sigma=10000)\n", 426 | " sigma = pm.HalfNormal(\"sigma\", sigma=10000)\n", 427 | " \n", 428 | " mu = alpha + beta*berlin[\"livingSpace_s\"] \n", 429 | " rent = pm.Normal(\"rent\", mu=mu, sigma=sigma,\n", 430 | " observed=berlin[\"totalRent_s\"])" 431 | ] 432 | }, 433 | { 434 | "cell_type": "markdown", 435 | "metadata": {}, 436 | "source": [ 437 | "Especially in the beginning, when starting out with Bayesian modelling, picking priors can seem a bit daunting. \n", 438 | "Does it make a difference if we use $\\text{Normal}(0,100)$ or $\\text{Normal}(0, 1000)$?\n", 439 | "What about using different distributions?\n", 440 | "\n", 441 | "\n", 442 | "There are a few tips that help a bit with picking a good prior. The first one is to visualize your priors. We can do this with PyMC by sampling from our priors.\n", 443 | "This is similar to sampling from the specified distributions using numpy or scipy. However, on top of sampling from the probability distributions, it also computes $\\mu$ (the linear part of the model) using the samples and the predictor variables. It then uses the computed $\\mu$ and the samples from $\\sigma$ to sample possible outcome values. Even though we specified the target variable (in machine learning usually called `y`), the prior sampling does not use it (yet) and is thus very quick." 
444 | ] 445 | }, 446 | { 447 | "cell_type": "code", 448 | "execution_count": null, 449 | "metadata": {}, 450 | "outputs": [], 451 | "source": [ 452 | "with mod:\n", 453 | " priors = pm.sample_prior_predictive(samples=1000)" 454 | ] 455 | }, 456 | { 457 | "cell_type": "markdown", 458 | "metadata": {}, 459 | "source": [ 460 | "We will use [ArviZ](https://arviz-devs.github.io/arviz/) to keep track of the different outputs computed from our model and to visualize them." 461 | ] 462 | }, 463 | { 464 | "cell_type": "code", 465 | "execution_count": null, 466 | "metadata": {}, 467 | "outputs": [], 468 | "source": [ 469 | "import arviz as az\n", 470 | "pm_data = az.from_pymc3(prior=priors)" 471 | ] 472 | }, 473 | { 474 | "cell_type": "markdown", 475 | "metadata": {}, 476 | "source": [ 477 | "ArviZ comes with many plots that are useful for analyzing Bayesian models.\n", 478 | "Let's start by looking at the priors for our three model parameters:" 479 | ] 480 | }, 481 | { 482 | "cell_type": "code", 483 | "execution_count": null, 484 | "metadata": {}, 485 | "outputs": [], 486 | "source": [ 487 | "az.plot_density(pm_data, \n", 488 | " # we want to plot the prior\n", 489 | " group=\"prior\", \n", 490 | " # and only the following variables\n", 491 | " var_names=[\"alpha\", \"beta\", \"sigma\"],\n", 492 | " # just some settings to make it prettier\n", 493 | " shade=0.3, bw=8, figsize=(20,6), credible_interval=1)\n", 494 | "plt.show()" 495 | ] 496 | }, 497 | { 498 | "cell_type": "markdown", 499 | "metadata": {}, 500 | "source": [ 501 | "This doesn't look super interesting; it is basically just a density plot of the distributions we gave for the priors.\n", 502 | "More interesting is to visualize what rental prices these prior distributions would produce given the predictor data.\n", 503 | "Unfortunately, there is no ArviZ plot for this yet, but we can do this ourselves without too much work.\n", 504 | "\n", 505 | "The object `priors` contains a numpy array for `rent`. 
Check what it contains:" 506 | ] 507 | }, 508 | { 509 | "cell_type": "code", 510 | "execution_count": null, 511 | "metadata": {}, 512 | "outputs": [], 513 | "source": [ 514 | "priors[\"rent\"].shape" 515 | ] 516 | }, 517 | { 518 | "cell_type": "markdown", 519 | "metadata": {}, 520 | "source": [ 521 | "For each observation in our dataframe (7092 observations), it computed 1000 samples of possible rent prices, using the samples from `alpha`, `beta`, and `sigma` together with the corresponding living area from that observation.\n", 522 | "\n", 523 | "We can flatten the matrix to obtain one big array and plot a histogram:" 524 | ] 525 | }, 526 | { 527 | "cell_type": "code", 528 | "execution_count": null, 529 | "metadata": {}, 530 | "outputs": [], 531 | "source": [ 532 | "plt.hist(priors[\"rent\"].flatten()*100, alpha=0.9, ec=\"darkblue\", bins=70)\n", 533 | "plt.title(\"Histogram over possible range of rental prices\")\n", 534 | "plt.xlabel(\"Monthly Rent [€]\")\n", 535 | "plt.show()" 536 | ] 537 | }, 538 | { 539 | "cell_type": "markdown", 540 | "metadata": {}, 541 | "source": [ 542 | "Compare this with the histogram over the actual rental prices:" 543 | ] 544 | }, 545 | { 546 | "cell_type": "code", 547 | "execution_count": null, 548 | "metadata": {}, 549 | "outputs": [], 550 | "source": [ 551 | "plt.hist(berlin[\"totalRent_s\"]*100, alpha=0.9, ec=\"darkblue\", bins=70)\n", 552 | "plt.title(\"Histogram over the actual rental prices\")\n", 553 | "plt.xlabel(\"Monthly Rent [€]\")\n", 554 | "plt.show()" 555 | ] 556 | }, 557 | { 558 | "cell_type": "markdown", 559 | "metadata": {}, 560 | "source": [ 561 | "The histograms don't look very similar and the range of the sampled rents is completely off! A rent of up to 50,000,000€ per month doesn't sound very realistic.\n", 562 | "\n", 563 | "\n", 564 | "Another good way to understand the prior is to visualize the model that it would produce. 
In our case, the model is a line determined by the intercept $\\alpha$ and the slope $\\beta$. \n", 565 | "We can thus sample 50 $\\alpha$s and $\\beta$s and compute the resulting lines over the range of area values:" 566 | ] 567 | }, 568 | { 569 | "cell_type": "code", 570 | "execution_count": null, 571 | "metadata": {}, 572 | "outputs": [], 573 | "source": [ 574 | "area_s = np.linspace(start=-2, stop=3.5, num=50)\n", 575 | "draws = np.random.choice(len(priors[\"alpha\"]), replace=False, size=50)\n", 576 | "alpha = priors[\"alpha\"][draws]\n", 577 | "beta = priors[\"beta\"][draws]\n", 578 | "\n", 579 | "mu = alpha + beta * area_s[:, None]\n", 580 | "\n", 581 | "\n", 582 | "plt.plot(destandardize_area(area_s), mu*100, c=\"#737373\", alpha=0.5)\n", 583 | "plt.xlabel(\"Living Area [sqm]\")\n", 584 | "plt.ylabel(\"Price [€]\")\n", 585 | "plt.title(\"Linear model according to our prior\")\n", 586 | "plt.axhline(y=900, color=\"#fc4f30\", label=\"My own rent\")\n", 587 | "plt.axhline(y=5000, color='#e5ae38', label=\"Most expensive realistic rent I can think of\")\n", 588 | "plt.axhline(y=0, label=\"Free rent\")\n", 589 | "plt.legend()\n", 590 | "plt.show()" 591 | ] 592 | }, 593 | { 594 | "cell_type": "markdown", 595 | "metadata": {}, 596 | "source": [ 597 | "Exercise: Use `plt.axhline(y=my_rent)` to compare the sampled lines with your own rent, the most expensive reasonable rent you can think of, and the cheapest possible rent you can think of. I also like to google the most expensive known rent in Berlin (or elsewhere) and see how it compares to the model.\n", 598 | "\n", 599 | "You could also add these benchmarks to the histogram using `plt.axvline(x=my_rent)`.\n", 600 | "\n", 601 | "These priors are obviously not very realistic! Luckily, we all know a bit about rents, so even without looking at the data, we can think of better priors.\n", 602 | "\n", 603 | "Write a new model `mod_informed` that uses the same linear part as above, but better priors. 
You can use the same distributions as above and only change their parameters. I recommend trying out various priors and checking what effects they have on the resulting model and its outputs.\n", 604 | "\n", 605 | "Remember: \n", 606 | "- $\\alpha$, the intercept, is the rental price of an average-sized flat (an average-sized flat has 77 sqm).\n", 607 | "- $\\beta$, the slope, is roughly the increase in rent if the flat had one more room.\n", 608 | "- $\\sigma$ is how much the rent can differ for two flats of the same size. As an error term, $\\sigma$ is always positive.\n", 609 | "\n", 610 | "Always keep in mind the scale of your data!" 611 | ] 612 | }, 613 | { 614 | "cell_type": "code", 615 | "execution_count": null, 616 | "metadata": {}, 617 | "outputs": [], 618 | "source": [ 619 | "with pm.Model() as mod_informed:\n", 620 | " # these are definitely not the only good priors;\n", 621 | " # many priors, even the original ones,\n", 622 | " # work quite well for this problem\n", 623 | " alpha = pm.Normal(\"alpha\", mu=0, sigma=10)\n", 624 | " beta = pm.Normal(\"beta\", mu=0, sigma=5)\n", 625 | " \n", 626 | " sigma = pm.HalfNormal(\"sigma\", sigma=5)\n", 627 | " \n", 628 | " mu = alpha + beta*berlin[\"livingSpace_s\"]\n", 629 | " \n", 630 | " rent = pm.Normal(\"rent\", mu=mu, sigma=sigma,\n", 631 | " observed=berlin[\"totalRent_s\"])\n", 632 | " \n", 633 | " priors = pm.sample_prior_predictive(samples=1000)" 634 | ] 635 | }, 636 | { 637 | "cell_type": "markdown", 638 | "metadata": {}, 639 | "source": [ 640 | "I made functions for the above plots, so you don't have to copy the whole plot code: `compare_hist(priors, berlin)` to plot the two histograms and `draw_models(priors, berlin)` to plot sampled lines from our model:" 641 | ] 642 | }, 643 | { 644 | "cell_type": "code", 645 | "execution_count": null, 646 | "metadata": {}, 647 | "outputs": [], 648 | "source": [ 649 | "from utils import compare_hist, draw_models\n", 650 | "\n", 651 | 
"compare_hist(priors, berlin)\n", 652 | "plt.show()" 653 | ] 654 | }, 655 | { 656 | "cell_type": "code", 657 | "execution_count": null, 658 | "metadata": {}, 659 | "outputs": [], 660 | "source": [ 661 | "draw_models(priors, berlin)\n", 662 | "plt.axhline(y=900, color=\"#fc4f30\", label=\"My own rent\")\n", 663 | "plt.axhline(y=5000, color='#e5ae38', label=\"Most expensive realistic rent I can think of\")\n", 664 | "plt.axhline(y=0, label=\"Free rent\")\n", 665 | "plt.legend()\n", 666 | "plt.show()" 667 | ] 668 | }, 669 | { 670 | "cell_type": "markdown", 671 | "metadata": {}, 672 | "source": [ 673 | "The priors still cover a very wide range of values, even some values that are not very realistic. They still cover negative values, allow negative slopes, and go up to some relatively extreme values. But that's fine; after all, the model will still see the data and learn from it. This way, we managed to tell the model a plausible range of possible values.\n", 674 | "\n", 675 | "\n", 676 | "Note: It is not good practice to deduce priors from the data we will train on later. So comparing the prior distributions to the original distribution is not something you should do to get the prior distribution close to the data distribution. I only presented the data here to help get an idea of what ranges we're talking about. It's usually better to deduce priors from domain knowledge or external knowledge and not from the data itself!" 677 | ] 678 | }, 679 | { 680 | "cell_type": "markdown", 681 | "metadata": {}, 682 | "source": [ 683 | "Questions to explore regarding different priors:\n", 684 | "- We've seen above what happens when we use overly vague priors; what happens if we use very narrow priors? What problems could arise?" 
685 | ] 686 | }, 687 | { 688 | "cell_type": "code", 689 | "execution_count": null, 690 | "metadata": {}, 691 | "outputs": [], 692 | "source": [ 693 | "with pm.Model() as mod_narrow:\n", 694 | " alpha = pm.Normal(\"alpha\", mu=0, sigma=0.5)\n", 695 | " beta = pm.Normal(\"beta\", mu=0, sigma=0.5)\n", 696 | " \n", 697 | " sigma = pm.HalfNormal(\"sigma\", sigma=0.5)\n", 698 | " \n", 699 | " mu = alpha + beta*berlin[\"livingSpace_s\"]\n", 700 | " \n", 701 | " rent = pm.Normal(\"rent\", mu=mu, sigma=sigma,\n", 702 | " observed=berlin[\"totalRent_s\"])\n", 703 | " \n", 704 | " priors_narrow = pm.sample_prior_predictive(samples=1000)" 705 | ] 706 | }, 707 | { 708 | "cell_type": "code", 709 | "execution_count": null, 710 | "metadata": {}, 711 | "outputs": [], 712 | "source": [ 713 | "compare_hist(priors_narrow, berlin)\n", 714 | "plt.show()" 715 | ] 716 | }, 717 | { 718 | "cell_type": "markdown", 719 | "metadata": {}, 720 | "source": [ 721 | "This way, the model considers only a very small range of values plausible, and unfortunately, the real range is not included!\n", 722 | "Using very (too) narrow priors is the same as using (too) heavy regularization in Machine Learning: The model underfits and cannot learn from the data because we restrict it too strongly. \n", 723 | "\n", 724 | "In this case, since the normal distribution puts at least a tiny bit of probability mass even on far-away values and we have a lot of data, the model still manages to learn the correct values. But we're making it quite difficult for the model! If we had less data, this would certainly not be the case.\n", 725 | "\n", 726 | "A good source on which priors to pick for which problem is the [Stan Prior Choice Recommendations](https://github.com/stan-dev/stan/wiki/Prior-Choice-Recommendations)." 727 | ] 728 | }, 729 | { 730 | "cell_type": "markdown", 731 | "metadata": {}, 732 | "source": [ 733 | "- So far, we've only used the normal distribution for the priors. 
Try out some different distributions, e.g.\n", 734 | " - Uniform distribution, e.g. over -100 to +100" 735 | ] 736 | }, 737 | { 738 | "cell_type": "code", 739 | "execution_count": null, 740 | "metadata": {}, 741 | "outputs": [], 742 | "source": [ 743 | "with pm.Model() as mod_uniform:\n", 744 | " alpha = pm.Uniform(\"alpha\", lower=-100, upper=100)\n", 745 | " beta = pm.Uniform(\"beta\", lower=-100, upper=100)\n", 746 | " \n", 747 | " sigma = pm.HalfNormal(\"sigma\", sigma=0.5)\n", 748 | " \n", 749 | " mu = alpha + beta*berlin[\"livingSpace_s\"]\n", 750 | " \n", 751 | " rent = pm.Normal(\"rent\", mu=mu, sigma=sigma,\n", 752 | " observed=berlin[\"totalRent_s\"])\n", 753 | " \n", 754 | " priors_uniform = pm.sample_prior_predictive(samples=1000)" 755 | ] 756 | }, 757 | { 758 | "cell_type": "code", 759 | "execution_count": null, 760 | "metadata": {}, 761 | "outputs": [], 762 | "source": [ 763 | "compare_hist(priors_uniform, berlin)\n", 764 | "plt.show()" 765 | ] 766 | }, 767 | { 768 | "cell_type": "markdown", 769 | "metadata": {}, 770 | "source": [ 771 | "The prior predictive distributions actually look quite similar to the ones we got above. In this case, the results will also look very similar.\n", 772 | "\n", 773 | "\n", 774 | "However, when we use a uniform prior distribution, we introduce hard borders. Hard borders are generally not a good idea: \n", 775 | "They can make it difficult for the sampling algorithm (using, for example, $\\text{Uniform}(-1000, +1000)$ breaks the sampling!) 
but they're especially problematic when the true parameter value is not inside this hard border range:" 776 | ] 777 | }, 778 | { 779 | "cell_type": "code", 780 | "execution_count": null, 781 | "metadata": {}, 782 | "outputs": [], 783 | "source": [ 784 | "with pm.Model() as mod_uniform_narrow:\n", 785 | " alpha = pm.Uniform(\"alpha\", lower=-0.5, upper=0.5)\n", 786 | " beta = pm.Uniform(\"beta\", lower=-0.5, upper=0.5)\n", 787 | " \n", 788 | " sigma = pm.HalfNormal(\"sigma\", sigma=0.5)\n", 789 | " \n", 790 | " mu = alpha + beta*berlin[\"livingSpace_s\"]\n", 791 | " \n", 792 | " rent = pm.Normal(\"rent\", mu=mu, sigma=sigma,\n", 793 | " observed=berlin[\"totalRent_s\"])\n", 794 | " \n", 795 | " trace_uniform_narrow = pm.sample(draws=1000, tune=1000)\n", 796 | " priors_uniform_narrow = pm.sample_prior_predictive(samples=1000)" 797 | ] 798 | }, 799 | { 800 | "cell_type": "markdown", 801 | "metadata": {}, 802 | "source": [ 803 | "This model puts zero probability mass on the correct parameter values: It will thus fail and always produce wrong results, no matter how much data we feed it. This is not so much visible in the prior predictive distribution (except for it being far off from plausible values) but shows in the results:" 804 | ] 805 | }, 806 | { 807 | "cell_type": "code", 808 | "execution_count": null, 809 | "metadata": {}, 810 | "outputs": [], 811 | "source": [ 812 | "az.summary(trace_uniform_narrow)" 813 | ] 814 | }, 815 | { 816 | "cell_type": "markdown", 817 | "metadata": {}, 818 | "source": [ 819 | "Both alpha and beta are exactly at the border of their prior distributions, and if we compare them later to our actual model, we will see that these values are completely off.\n", 820 | "\n", 821 | "So unless you have good reasons for it, I would not recommend using the Uniform distribution as a prior." 
822 | ] 823 | }, 824 | { 825 | "cell_type": "markdown", 826 | "metadata": {}, 827 | "source": [ 828 | " - Student-T distribution\n", 829 | " \n", 830 | "The Student-T distribution is often used instead of the Normal distribution for the rent distribution. The t-distribution has much heavier tails, which makes it much more robust against outliers; such a model is thus often called a robust regression. To use the T-distribution, we will have to estimate another parameter, $\\nu$ (nu), the degrees of freedom." 831 | ] 832 | }, 833 | { 834 | "cell_type": "code", 835 | "execution_count": null, 836 | "metadata": {}, 837 | "outputs": [], 838 | "source": [ 839 | "with pm.Model() as mod_robust:\n", 840 | " alpha = pm.Normal(\"alpha\", mu=0, sigma=10)\n", 841 | " beta = pm.Normal(\"beta\", mu=0, sigma=5)\n", 842 | " \n", 843 | " sigma = pm.HalfNormal(\"sigma\", sigma=0.5)\n", 844 | " \n", 845 | " mu = alpha + beta*berlin[\"livingSpace_s\"]\n", 846 | " \n", 847 | " # nu, the degrees of freedom, is a new parameter\n", 848 | " # this is a commonly used default prior for nu\n", 849 | " nu = pm.Gamma(\"nu\", alpha=2, beta=0.1)\n", 850 | " \n", 851 | " rent = pm.StudentT(\"rent\", nu=nu, mu=mu, sigma=sigma,\n", 852 | " observed=berlin[\"totalRent_s\"])\n", 853 | " \n", 854 | " trace_robust = pm.sample(draws=1000, tune=1000)\n", 855 | " priors_robust = pm.sample_prior_predictive(samples=1000)" 856 | ] 857 | }, 858 | { 859 | "cell_type": "code", 860 | "execution_count": null, 861 | "metadata": {}, 862 | "outputs": [], 863 | "source": [ 864 | "az.plot_density(az.from_pymc3(prior=priors_robust, model=mod_robust), \n", 865 | " group=\"prior\",\n", 866 | " var_names=[\"alpha\", \"beta\", \"sigma\", \"nu\"],\n", 867 | " shade=0.3, credible_interval=1, bw=8)\n", 868 | "plt.show()" 869 | ] 870 | }, 871 | { 872 | "cell_type": "markdown", 873 | "metadata": {}, 874 | "source": [ 875 | "Since for large values of nu, the T-distribution approaches a Normal distribution, we can even check how 
\"normal\" our data is. In a robust regression, it is even possible to [detect outliers](https://docs.pymc.io/notebooks/GLM-robust-with-outlier-detection.html)." 876 | ] 877 | }, 878 | { 879 | "cell_type": "code", 880 | "execution_count": null, 881 | "metadata": {}, 882 | "outputs": [], 883 | "source": [ 884 | "az.summary(trace_robust)" 885 | ] 886 | }, 887 | { 888 | "cell_type": "markdown", 889 | "metadata": {}, 890 | "source": [ 891 | "In this case, $\\nu$ is relatively high, indicating that our data is pretty normal. Well, kind of by design since I cleaned out the outliers before.\n", 892 | "You can try out how this changes if you leave the outliers (or at least some of them) in." 893 | ] 894 | }, 895 | { 896 | "cell_type": "markdown", 897 | "metadata": {}, 898 | "source": [ 899 | "- Since we know that rents should increase for larger flats, we know that the slope $\\beta$ should be positive. How could we bias our priors towards positive values?\n", 900 | "\n", 901 | "One possibility is to just shift the mean of beta a bit up:" 902 | ] 903 | }, 904 | { 905 | "cell_type": "code", 906 | "execution_count": null, 907 | "metadata": {}, 908 | "outputs": [], 909 | "source": [ 910 | "with pm.Model() as mod_pos_slope:\n", 911 | " alpha = pm.Normal(\"alpha\", mu=5, sigma=10)\n", 912 | " beta = pm.Normal(\"beta\", mu=2, sigma=5)\n", 913 | " \n", 914 | " sigma = pm.HalfNormal(\"sigma\", sigma=5)\n", 915 | " \n", 916 | " mu = alpha + beta*berlin[\"livingSpace_s\"]\n", 917 | " \n", 918 | " rent = pm.Normal(\"rent\", mu=mu, sigma=sigma,\n", 919 | " observed=berlin[\"totalRent_s\"])\n", 920 | " \n", 921 | " priors_pos_slope = pm.sample_prior_predictive(samples=1000)" 922 | ] 923 | }, 924 | { 925 | "cell_type": "code", 926 | "execution_count": null, 927 | "metadata": {}, 928 | "outputs": [], 929 | "source": [ 930 | "draw_models(priors_pos_slope, berlin)\n", 931 | "plt.show()" 932 | ] 933 | }, 934 | { 935 | "cell_type": "markdown", 936 | "metadata": {}, 937 | "source": [ 938 | 
"This way, there are still a few lines with negative slopes, but most now have a positive slope.\n", 939 | "\n", 940 | "Since there is quite a lot of data in this problem, using this slightly adapted prior doesn't make too much of a difference, though.\n", 941 | "\n", 942 | "\n", 943 | "Another option is to consider a transformation of the data: For both the living area and the rental price, it might make sense to log-transform these variables." 944 | ] 945 | }, 946 | { 947 | "cell_type": "markdown", 948 | "metadata": {}, 949 | "source": [ 950 | "Optional exercises for at home: \n", 951 | "To understand how something works, it often helps to implement it yourself. Try implementing sampling from the prior using only numpy or scipy." 952 | ] 953 | }, 954 | { 955 | "cell_type": "code", 956 | "execution_count": null, 957 | "metadata": {}, 958 | "outputs": [], 959 | "source": [ 960 | "# implementing prior sampling\n", 961 | "n = 1000\n", 962 | "alpha = np.random.normal(loc=0, scale=10, size=n)\n", 963 | "beta = np.random.normal(loc=0, scale=5, size=n)\n", 964 | "\n", 965 | "sigma = np.random.normal(loc=0, scale=5, size=n)\n", 966 | "sigma = np.absolute(sigma)\n", 967 | "\n", 968 | "mu = alpha + beta*berlin[\"livingSpace_s\"].values[:,None]\n", 969 | "\n", 970 | "# transpose so the draw axis comes first, then add the chain axis\n", 971 | "rent = np.random.normal(loc=mu, scale=sigma).T.reshape(1, n, len(berlin))" 972 | ] 973 | }, 974 | { 975 | "cell_type": "code", 976 | "execution_count": null, 977 | "metadata": {}, 978 | "outputs": [], 979 | "source": [ 980 | "own_data = az.from_dict(prior = {\"alpha\":alpha,\n", 981 | " \"beta\":beta,\n", 982 | " \"sigma\":sigma},\n", 983 | " prior_predictive={\"rent\":rent})" 984 | ] 985 | }, 986 | { 987 | "cell_type": "code", 988 | "execution_count": null, 989 | "metadata": {}, 990 | "outputs": [], 991 | "source": [ 992 | "az.plot_density(own_data, group=\"prior\",\n", 993 | " shade=0.3, bw=8, credible_interval=1)\n", 994 | "plt.show()" 995 | ] 996 | }, 997 | { 998 | "cell_type": "code", 999 | 
"execution_count": null, 1000 | "metadata": {}, 1001 | "outputs": [], 1002 | "source": [ 1003 | "prior = {\"alpha\":alpha,\n", 1004 | " \"beta\":beta,\n", 1005 | " \"sigma\":sigma,\n", 1006 | " \"rent\": rent}\n", 1007 | "compare_hist(prior, berlin)\n", 1008 | "plt.show()" 1009 | ] 1010 | }, 1011 | { 1012 | "cell_type": "code", 1013 | "execution_count": null, 1014 | "metadata": {}, 1015 | "outputs": [], 1016 | "source": [ 1017 | "draw_models(prior, berlin)\n", 1018 | "plt.show()" 1019 | ] 1020 | } 1021 | ], 1022 | "metadata": { 1023 | "kernelspec": { 1024 | "display_name": "PyLadies-Bayesian-Tutorial", 1025 | "language": "python", 1026 | "name": "pyladies-bayesian-tutorial" 1027 | }, 1028 | "language_info": { 1029 | "codemirror_mode": { 1030 | "name": "ipython", 1031 | "version": 3 1032 | }, 1033 | "file_extension": ".py", 1034 | "mimetype": "text/x-python", 1035 | "name": "python", 1036 | "nbconvert_exporter": "python", 1037 | "pygments_lexer": "ipython3", 1038 | "version": "3.7.4" 1039 | } 1040 | }, 1041 | "nbformat": 4, 1042 | "nbformat_minor": 4 1043 | } 1044 | -------------------------------------------------------------------------------- /solutions/4_Beyond_linear.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Beyond Linear: Going Hierarchical\n", 8 | "All nice and well, but linear models with only one predictor can be a bit boring, right? Also, most people know that size is not the only factor determining the rental price. If you've lived in Berlin for a while, you know that certain areas are much more expensive than others.\n", 9 | "\n", 10 | "Unfortunately, this data set contains neither the coordinates of each flat nor the exact address. But for each flat we have a ZIP code (the PLZ).\n", 11 | "We will now extend our model to incorporate the location by using the ZIP code. 
We do so by training one linear model per ZIP code." 12 | ] 13 | }, 14 | { 15 | "cell_type": "code", 16 | "execution_count": null, 17 | "metadata": {}, 18 | "outputs": [], 19 | "source": [ 20 | "import numpy as np\n", 21 | "import pandas as pd\n", 22 | "import matplotlib.pyplot as plt\n", 23 | "import seaborn as sns\n", 24 | "\n", 25 | "import pymc3 as pm\n", 26 | "import theano\n", 27 | "\n", 28 | "import sys\n", 29 | "sys.path.append('../src/')\n", 30 | "from utils import standardize_area, destandardize_area\n", 31 | "\n", 32 | "berlin = pd.read_csv(\"../data/berlin.csv\", index_col=0, dtype={\"geo_plz\":str})" 33 | ] 34 | }, 35 | { 36 | "cell_type": "code", 37 | "execution_count": null, 38 | "metadata": {}, 39 | "outputs": [], 40 | "source": [ 41 | "plt.style.use(\"fivethirtyeight\")" 42 | ] 43 | }, 44 | { 45 | "cell_type": "markdown", 46 | "metadata": {}, 47 | "source": [ 48 | "One model per zip code works well if a zip code has many observations:" 49 | ] 50 | }, 51 | { 52 | "cell_type": "code", 53 | "execution_count": null, 54 | "metadata": {}, 55 | "outputs": [], 56 | "source": [ 57 | "sns.regplot(x=berlin[\"livingSpace\"][berlin.geo_plz == \"13583\"], y=berlin[\"totalRent\"][berlin.geo_plz == \"13583\"], \n", 58 | " color=\"#42b883\", ci=0, scatter_kws={\"s\": 40}, label=\"Spandau\")\n", 59 | "sns.regplot(x=berlin[\"livingSpace\"][berlin.geo_plz == \"10405\"], y=berlin[\"totalRent\"][berlin.geo_plz == \"10405\"], \n", 60 | " color=\"#f07855\", ci=0, scatter_kws={\"s\": 40}, label=\"Prenzlauer Berg\")\n", 61 | "\n", 62 | "plt.scatter(berlin[\"livingSpace\"], berlin[\"totalRent\"], s=3, c=\"gray\", alpha=0.3)\n", 63 | "plt.ylabel(\"Monthly Rent [€]\")\n", 64 | "plt.xlabel(\"Living Area [sqm]\")\n", 65 | "plt.legend()\n", 66 | "plt.show()" 67 | ] 68 | }, 69 | { 70 | "cell_type": "markdown", 71 | "metadata": {}, 72 | "source": [ 73 | "If a zip code has only a handful of observations or fewer, it is a bit more difficult. 
In this zip code, we only have two observations, and a model fit on these two points would result in a negative slope, which doesn't make much sense. In cases where we have little data, we would prefer the model to be closer to a model fit on all data, as we did before. After all, even in Blankenburg, bigger flats should in general be more expensive." 74 | ] 75 | }, 76 | { 77 | "cell_type": "code", 78 | "execution_count": null, 79 | "metadata": {}, 80 | "outputs": [], 81 | "source": [ 82 | "plt.scatter(berlin[\"livingSpace\"], berlin[\"totalRent\"], s=3, c=\"gray\", alpha=0.3)\n", 83 | "plt.scatter(berlin[\"livingSpace\"][berlin.geo_plz == \"13129\"], \n", 84 | " berlin[\"totalRent\"][berlin.geo_plz == \"13129\"], s=40, label=\"Blankenburg\")\n", 85 | "plt.legend()\n", 86 | "plt.show()" 87 | ] 88 | }, 89 | { 90 | "cell_type": "markdown", 91 | "metadata": {}, 92 | "source": [ 93 | "We can achieve this by saying that for each linear model (one for each zip code), the parameters determining the slope and intercept of this model come from a common distribution. 
All slope parameters, for example, would come from a Normal distribution centered around some value close to 4.4 (the slope parameter we obtained in our last model), but the slope for Prenzlauer Berg would be higher than average while Spandau would be lower than average, and places like Blankenburg would stay close to the mean (and also have higher uncertainty).\n", 94 | "\n", 95 | "Before, our model looked like this:\n", 96 | "\n", 97 | "$$\\begin{align*}\n", 98 | "\\text{rent} &\\sim \\text{Normal}(\\mu, \\sigma) \\\\\n", 99 | "\\mu &= \\alpha + \\beta \\text{area} \\\\\n", 100 | "\\\\\n", 101 | "\\alpha &\\sim \\text{Normal}(0, 10) \\\\\n", 102 | "\\beta &\\sim \\text{Normal}(0, 5) \\\\\n", 103 | "\\\\\n", 104 | "\\sigma &\\sim \\text{HalfNormal}(5) \n", 105 | "\\end{align*}$$\n", 106 | "\n", 107 | "We now extend this as follows:\n", 108 | "$$\\begin{align*}\n", 109 | "\\text{rent} &\\sim \\text{Normal}(\\mu, \\sigma) \\\\\n", 110 | "\\mu &= \\alpha_{[\\text{ZIP}]} + \\beta_{[\\text{ZIP}]} \\text{area} \\\\\n", 111 | "\\\\\n", 112 | "\\alpha_{[\\text{ZIP}]} &\\sim \\text{Normal}(\\mu_{\\alpha}, \\sigma_{\\alpha}) \\\\\n", 113 | "\\beta_{[\\text{ZIP}]} &\\sim \\text{Normal}(\\mu_{\\beta}, \\sigma_{\\beta}) \\\\\n", 114 | "\\\\\n", 115 | "\\mu_{\\alpha} &\\sim \\text{Normal}(0, 10) \\\\\n", 116 | "\\mu_{\\beta} &\\sim \\text{Normal}(0, 5) \\\\\n", 117 | "\\\\\n", 118 | "\\sigma, \\sigma_{\\alpha}, \\sigma_{\\beta} &\\sim \\text{HalfNormal}(5) \n", 119 | "\\end{align*}$$\n", 120 | "\n", 121 | "This looks like a lot more formulas, but the most important changes are only in the second, third, and fourth lines; the lines below mostly repeat the priors from above.\n", 122 | "\n", 123 | "In the second line, we use a linear model similar to above, but with different $\\alpha$ and $\\beta$ for each ZIP code. As before, we need to define priors for these two parameters. Unlike above, however, we now put so-called hyperpriors on the parameters of these priors. 
$\\mu_{\\alpha}$ is now the expected mean for the intercepts of each ZIP code and $\\sigma_{\\alpha}$ determines how much of a difference there can be between the intercept in Spandau and the intercept in Prenzlauer Berg.\n", 124 | "\n", 125 | "Don't worry if so many formulas are not your cup of tea; we'll now look at the code for the model. Before writing the model, however, we need to map the ZIP codes to an index variable:" 126 | ] 127 | }, 128 | { 129 | "cell_type": "code", 130 | "execution_count": null, 131 | "metadata": {}, 132 | "outputs": [], 133 | "source": [ 134 | "berlin[\"zip\"] = berlin.geo_plz.map(str.strip)\n", 135 | "zip_codes = np.sort(berlin.zip.unique())\n", 136 | "num_zip_codes = len(zip_codes)\n", 137 | "zip_lookup = dict(zip(zip_codes, range(num_zip_codes)))\n", 138 | "berlin[\"zip_code\"] = berlin.zip.replace(zip_lookup).values" 139 | ] 140 | }, 141 | { 142 | "cell_type": "markdown", 143 | "metadata": {}, 144 | "source": [ 145 | "And a small helper function to map from ZIP string to ZIP code:" 146 | ] 147 | }, 148 | { 149 | "cell_type": "code", 150 | "execution_count": null, 151 | "metadata": {}, 152 | "outputs": [], 153 | "source": [ 154 | "def map_zip_codes(zip_strings, zip_lookup=zip_lookup):\n", 155 | " return pd.Series(zip_strings).replace(zip_lookup).values" 156 | ] 157 | }, 158 | { 159 | "cell_type": "code", 160 | "execution_count": null, 161 | "metadata": {}, 162 | "outputs": [], 163 | "source": [ 164 | "zips = berlin[\"zip_code\"].values\n", 165 | "\n", 166 | "with pm.Model() as hier_model:\n", 167 | " \n", 168 | " mu_alpha = pm.Normal(\"mu_alpha\", mu=0, sigma=10)\n", 169 | " sigma_alpha = pm.HalfNormal(\"sigma_alpha\", sigma=5)\n", 170 | " \n", 171 | " mu_beta = pm.Normal(\"mu_beta\", mu=0, sigma=5)\n", 172 | " sigma_beta = pm.HalfNormal(\"sigma_beta\", sigma=5)\n", 173 | " \n", 174 | " alpha = pm.Normal(\"alpha\", mu=mu_alpha, sd=sigma_alpha, \n", 175 | " shape=num_zip_codes)\n", 176 | " \n", 177 | " beta = pm.Normal(\"beta", 
mu=mu_beta, sd=sigma_beta, \n", 178 | " shape=num_zip_codes)\n", 179 | " \n", 180 | "\n", 181 | " sigma = pm.HalfNormal(\"sigma\", sigma=5)\n", 182 | " \n", 183 | " mu = alpha[zips] + beta[zips]*berlin[\"livingSpace_s\"]\n", 184 | " \n", 185 | " rent = pm.Normal(\"rent\", mu=mu, sd=sigma, observed=berlin[\"totalRent_s\"])\n", 186 | " \n", 187 | " trace = pm.sample(random_seed=2020, chains=2, \n", 188 | " draws=1000, tune=1000)" 189 | ] 190 | }, 191 | { 192 | "cell_type": "markdown", 193 | "metadata": {}, 194 | "source": [ 195 | "We will need to do the convergence checks again. For this, we first collect the different model outputs in an ArviZ Data object:" 196 | ] 197 | }, 198 | { 199 | "cell_type": "code", 200 | "execution_count": null, 201 | "metadata": {}, 202 | "outputs": [], 203 | "source": [ 204 | "import arviz as az\n", 205 | "pm_data = az.from_pymc3(model=hier_model, trace=trace)" 206 | ] 207 | }, 208 | { 209 | "cell_type": "markdown", 210 | "metadata": {}, 211 | "source": [ 212 | "One area where the ArviZ Data object is very useful is dealing with high-dimensional model artifacts. With hierarchical models such as this one, we get one parameter per level (here the ZIP codes) and 2000 samples per parameter, coming from two chains (at least in the beginning, we want to keep the samples of each chain separated). ArviZ makes it easier to keep track of all the different dimensions and coordinates:" 213 | ] 214 | }, 215 | { 216 | "cell_type": "code", 217 | "execution_count": null, 218 | "metadata": {}, 219 | "outputs": [], 220 | "source": [ 221 | "pm_data.posterior" 222 | ] 223 | }, 224 | { 225 | "cell_type": "markdown", 226 | "metadata": {}, 227 | "source": [ 228 | "There are the dimensions `alpha_dim_0` and `beta_dim_0` that represent the ZIP codes. We have 208 ZIP codes, and since we mapped them to an index, the coordinates for these dimensions are simply the integers from 0 to 207. 
But we can give them meaningful names by providing ArviZ with the original ZIP code strings:" 229 | ] 230 | }, 231 | { 232 | "cell_type": "code", 233 | "execution_count": null, 234 | "metadata": {}, 235 | "outputs": [], 236 | "source": [ 237 | "pm_data = az.from_pymc3(model=hier_model, trace=trace,\n", 238 | " # create a new coordinate\n", 239 | " coords={'zip_code': zip_codes},\n", 240 | " # this coordinate is used by the dimensions of alpha and beta\n", 241 | " dims={\"alpha\": [\"zip_code\"], \"beta\": [\"zip_code\"]} )" 242 | ] 243 | }, 244 | { 245 | "cell_type": "code", 246 | "execution_count": null, 247 | "metadata": {}, 248 | "outputs": [], 249 | "source": [ 250 | "pm_data.posterior" 251 | ] 252 | }, 253 | { 254 | "cell_type": "markdown", 255 | "metadata": {}, 256 | "source": [ 257 | "There's much more that can be done with the ArviZ data object, but that would go beyond the scope of this tutorial. A small introduction on how to work with it can be found [here](https://github.com/corriebar/arviz/blob/xarray-example/doc/notebooks/Working%20with%20InferenceData.ipynb).\n", 258 | "\n", 259 | "\n", 260 | "Next, we'll check the trace plots. Remember, ArviZ has a function for this. Only problem: we now have many alpha and beta parameters, one for each ZIP code. That is way too many to plot! Use the function parameter `var_names` to only select the parameters from the model that don't use the ZIP code as index." 261 | ] 262 | }, 263 | { 264 | "cell_type": "code", 265 | "execution_count": null, 266 | "metadata": {}, 267 | "outputs": [], 268 | "source": [ 269 | "az.plot_trace(pm_data, \n", 270 | " var_names = [\"mu_alpha\", \"mu_beta\", \"sigma_alpha\", \"sigma_beta\", \"sigma\"]\n", 271 | " )\n", 272 | "plt.show()" 273 | ] 274 | }, 275 | { 276 | "cell_type": "markdown", 277 | "metadata": {}, 278 | "source": [ 279 | "Next, we'll check the three different summary/diagnostic statistics. These were R_hat, MCSE (Monte Carlo Standard Error), and ESS (Effective Sample Size). 
You can get the summary table using ArviZ again:" 280 | ] 281 | }, 282 | { 283 | "cell_type": "code", 284 | "execution_count": null, 285 | "metadata": {}, 286 | "outputs": [], 287 | "source": [ 288 | "az.summary(pm_data, round_to=4)" 289 | ] 290 | }, 291 | { 292 | "cell_type": "markdown", 293 | "metadata": {}, 294 | "source": [ 295 | "Unfortunately, since we have so many parameters, we can't check them easily by hand. What we can do instead is plot a histogram for each of the diagnostics:" 296 | ] 297 | }, 298 | { 299 | "cell_type": "code", 300 | "execution_count": null, 301 | "metadata": {}, 302 | "outputs": [], 303 | "source": [ 304 | "summ = az.summary(pm_data, round_to=5)\n", 305 | "plt.hist(summ[\"r_hat\"], bins=50, alpha=0.9, ec=\"darkblue\")\n", 306 | "plt.title(\"R_hat\")\n", 307 | "plt.show()" 308 | ] 309 | }, 310 | { 311 | "cell_type": "markdown", 312 | "metadata": {}, 313 | "source": [ 314 | "Check for yourself the ESS diagnostics:" 315 | ] 316 | }, 317 | { 318 | "cell_type": "code", 319 | "execution_count": null, 320 | "metadata": {}, 321 | "outputs": [], 322 | "source": [ 323 | "summ = az.summary(pm_data, round_to=5)\n", 324 | "plt.hist(summ[\"ess_mean\"], bins=50, alpha=0.9, ec=\"darkblue\")\n", 325 | "plt.title(\"ESS Mean\")\n", 326 | "plt.show()" 327 | ] 328 | }, 329 | { 330 | "cell_type": "markdown", 331 | "metadata": {}, 332 | "source": [ 333 | "Before, we checked how well our model fit the data by comparing the plot of the linear model to our data. Since we now have a collection of linear models, this would be rather difficult. What we can do instead is a so-called posterior predictive check: We compare the predicted distribution of outcomes to the actual distribution of outcomes." 
334 | ] 335 | }, 336 | { 337 | "cell_type": "code", 338 | "execution_count": null, 339 | "metadata": {}, 340 | "outputs": [], 341 | "source": [ 342 | "with hier_model:\n", 343 | " posterior_pred = pm.sample_posterior_predictive(trace=trace)" 344 | ] 345 | }, 346 | { 347 | "cell_type": "code", 348 | "execution_count": null, 349 | "metadata": {}, 350 | "outputs": [], 351 | "source": [ 352 | "fig, ax = plt.subplots(nrows=2, ncols=2, figsize=(14,14), sharex=True)\n", 353 | "ax = ax.ravel()\n", 354 | "ax[0].hist(berlin[\"totalRent\"], ec=\"darkblue\", alpha=0.9, bins=20)\n", 355 | "ax[0].set_title(\"Original data\")\n", 356 | "sample_nums = np.random.choice(posterior_pred[\"rent\"].shape[0], size=3, replace=False)\n", 357 | "for i, samp in enumerate(sample_nums):\n", 358 | " ax[i+1].hist(posterior_pred[\"rent\"][samp]*100, ec=\"#a93900\",color=\"#fc4f30\", alpha=0.8, bins=20)\n", 359 | " ax[i+1].set_title(f\"Sample {i+1}\")\n", 360 | "plt.suptitle(\"Comparing Original Data to Predicted Data\")\n", 361 | "plt.show()" 362 | ] 363 | }, 364 | { 365 | "cell_type": "markdown", 366 | "metadata": {}, 367 | "source": [ 368 | "Even though it is difficult to visualize all models, we can pick out a few and check how the model differs for different ZIP codes. The ArviZ object now becomes very helpful for selecting the posterior samples of a single ZIP code. 
If we, for example, want to extract the samples for Blankenburg (the ZIP code above with so few observations), we get the data as follows:" 369 | ] 370 | }, 371 | { 372 | "cell_type": "code", 373 | "execution_count": null, 374 | "metadata": {}, 375 | "outputs": [], 376 | "source": [ 377 | "blankenburg = pm_data.posterior.sel(zip_code=\"13129\")\n", 378 | "# need to stack chains and draws so that the resulting object only has one dimension left\n", 379 | "blankenburg = blankenburg.stack(samples = [\"chain\", \"draw\"])" 380 | ] 381 | }, 382 | { 383 | "cell_type": "code", 384 | "execution_count": null, 385 | "metadata": {}, 386 | "outputs": [], 387 | "source": [ 388 | "area_s = np.linspace(start=-2, stop=3.5, num=50)\n", 389 | "\n", 390 | "mu_pred_blankenburg = blankenburg.alpha.values + blankenburg.beta.values * area_s[:,None]\n", 391 | "\n", 392 | "# destandardize area again\n", 393 | "area = destandardize_area(area_s)\n", 394 | "\n", 395 | "plt.plot(area, mu_pred_blankenburg.mean(1)*100, alpha=0.3, c=\"k\")\n", 396 | "plt.scatter(berlin[\"livingSpace\"], berlin[\"totalRent_s\"]*100, s=4, alpha=0.4, c=\"grey\")\n", 397 | "\n", 398 | "plt.title(\"Uncertainty (mu) for Blankenburg\")\n", 399 | "\n", 400 | "\n", 401 | "az.plot_hpd(area, mu_pred_blankenburg.T*100, credible_interval=0.83, fill_kwargs={\"color\": \"#008FD5\"})\n", 402 | "plt.scatter(berlin[\"livingSpace\"][berlin.geo_plz == \"13129\"], \n", 403 | " berlin[\"totalRent\"][berlin.geo_plz == \"13129\"], s=40, label=\"Blankenburg\")\n", 404 | "plt.xlabel('Living Area [sqm]')\n", 405 | "plt.ylabel('Rent [€]')\n", 406 | "plt.legend()\n", 407 | "plt.show()" 408 | ] 409 | }, 410 | { 411 | "cell_type": "markdown", 412 | "metadata": {}, 413 | "source": [ 414 | "We can compare this with one of the ZIP codes from above that had more data.\n", 415 | "Go and make the same plot for a different ZIP code (of your choice)!" 
416 | ] 417 | }, 418 | { 419 | "cell_type": "code", 420 | "execution_count": null, 421 | "metadata": {}, 422 | "outputs": [], 423 | "source": [ 424 | "prenzl_berg = pm_data.posterior.sel(zip_code = \"10405\")\n", 425 | "prenzl_berg = prenzl_berg.stack(samples = [\"draw\", \"chain\"])\n", 426 | "\n", 427 | "spandau = pm_data.posterior.sel(zip_code = \"13583\")\n", 428 | "spandau = spandau.stack(samples = [\"draw\", \"chain\"])" 429 | ] 430 | }, 431 | { 432 | "cell_type": "code", 433 | "execution_count": null, 434 | "metadata": {}, 435 | "outputs": [], 436 | "source": [ 437 | "mu_pred_prenzl_berg = prenzl_berg.alpha.values + prenzl_berg.beta.values * area_s[:,None]\n", 438 | "mu_pred_spandau = spandau.alpha.values + spandau.beta.values * area_s[:, None]\n", 439 | "\n", 440 | "plt.plot(area, mu_pred_prenzl_berg.mean(1)*100, alpha=0.3)\n", 441 | "plt.plot(area, mu_pred_spandau.mean(1)*100, alpha=0.3)\n", 442 | "plt.scatter(berlin[\"livingSpace\"], berlin[\"totalRent_s\"]*100, s=4, alpha=0.4, c=\"grey\")\n", 443 | "\n", 444 | "plt.scatter(berlin[\"livingSpace\"][berlin.geo_plz == \"10405\"], \n", 445 | " berlin[\"totalRent\"][berlin.geo_plz == \"10405\"], s=40, label=\"Prenzlauer Berg\")\n", 446 | "\n", 447 | "plt.scatter(berlin[\"livingSpace\"][berlin.geo_plz == \"13583\"], \n", 448 | " berlin[\"totalRent\"][berlin.geo_plz == \"13583\"], s=40, label=\"Spandau\")\n", 449 | "\n", 450 | "plt.title(\"Uncertainty (mu) \\nfor Prenzlauer Berg & Spandau\")\n", 451 | "plt.legend()\n", 452 | "\n", 453 | "az.plot_hpd(area, mu_pred_prenzl_berg.T*100, credible_interval=0.83, fill_kwargs={\"color\": \"#008FD5\"})\n", 454 | "\n", 455 | "az.plot_hpd(area, mu_pred_spandau.T*100, credible_interval=0.83, fill_kwargs={\"color\": \"#FC4F30\"})\n", 456 | "plt.xlabel('Living Area [sqm]')\n", 457 | "plt.ylabel('Rent [€]')\n", 458 | "plt.show()" 459 | ] 460 | }, 461 | { 462 | "cell_type": "markdown", 463 | "metadata": {}, 464 | "source": [ 465 | "We can see that there is much less uncertainty 
for ZIP codes with more data. The model is also very sure that both the intercept and slope are higher for Prenzlauer Berg than for Spandau, just as we would expect.\n", 466 | "\n", 467 | "If you want, you can also check what the full uncertainty looks like for these ZIP codes. Remember, for this you'll need to compute the predictions for rent." 468 | ] 469 | }, 470 | { 471 | "cell_type": "code", 472 | "execution_count": null, 473 | "metadata": {}, 474 | "outputs": [], 475 | "source": [ 476 | "import scipy.stats as stats" 477 | ] 478 | }, 479 | { 480 | "cell_type": "code", 481 | "execution_count": null, 482 | "metadata": {}, 483 | "outputs": [], 484 | "source": [ 485 | "rent_blankenfelde = stats.norm.rvs(mu_pred_blankenfelde, trace['sigma'])\n", 486 | "\n", 487 | "plt.plot(area, mu_pred_blankenfelde.mean(1)*100, alpha=0.3, c=\"k\")\n", 488 | "plt.scatter(berlin[\"livingSpace\"], berlin[\"totalRent_s\"]*100, s=4, alpha=0.7, c=\"grey\")\n", 489 | "\n", 490 | "az.plot_hpd(area, mu_pred_blankenfelde.T*100, credible_interval=0.83, \n", 491 | " fill_kwargs={\"alpha\": 0.5})\n", 492 | "\n", 493 | "az.plot_hpd(area, rent_blankenfelde.T*100, credible_interval=0.83, \n", 494 | " fill_kwargs={\"alpha\": 0.5})\n", 495 | "plt.scatter(berlin[\"livingSpace\"][berlin.geo_plz == \"13129\"], \n", 496 | " berlin[\"totalRent\"][berlin.geo_plz == \"13129\"], s=40, label=\"Blankenburg\")\n", 497 | "plt.legend()\n", 498 | "plt.title(\"Full uncertainty for Blankenfelde\")\n", 499 | "plt.xlabel('Living Area [sqm]')\n", 500 | "plt.ylabel('Rent [€]')\n", 501 | "plt.show()" 502 | ] 503 | }, 504 | { 505 | "cell_type": "code", 506 | "execution_count": null, 507 | "metadata": {}, 508 | "outputs": [], 509 | "source": [ 510 | "rent_prenzl_berg = stats.norm.rvs(mu_pred_prenzl_berg, trace['sigma'])\n", 511 | "rent_spandau = stats.norm.rvs(mu_pred_spandau, trace['sigma'])\n", 512 | "\n", 513 | "plt.plot(area, mu_pred_prenzl_berg.mean(1)*100, alpha=0.3)\n", 514 | "plt.plot(area, 
mu_pred_spandau.mean(1)*100, alpha=0.3)\n", 515 | "plt.scatter(berlin[\"livingSpace\"], berlin[\"totalRent_s\"]*100, s=4, alpha=0.4, c=\"grey\")\n", 516 | "\n", 517 | "plt.title(\"Full uncertainty \\nfor Prenzlauer Berg & Spandau\")\n", 518 | "\n", 519 | "az.plot_hpd(area, mu_pred_prenzl_berg.T*100, credible_interval=0.83, fill_kwargs={\"color\": \"#008FD5\"})\n", 520 | "az.plot_hpd(area, rent_prenzl_berg.T*100, credible_interval=0.83, fill_kwargs={\"color\": \"#008FD5\"})\n", 521 | "\n", 522 | "az.plot_hpd(area, mu_pred_spandau.T*100, credible_interval=0.83, fill_kwargs={\"color\": \"#FC4F30\"})\n", 523 | "az.plot_hpd(area, rent_spandau.T*100, credible_interval=0.83, fill_kwargs={\"color\": \"#FC4F30\"})\n", 524 | "\n", 525 | "plt.scatter(berlin[\"livingSpace\"][berlin.geo_plz == \"10405\"], \n", 526 | " berlin[\"totalRent\"][berlin.geo_plz == \"10405\"], s=40, label=\"Prenzlauer Berg\")\n", 527 | "\n", 528 | "plt.scatter(berlin[\"livingSpace\"][berlin.geo_plz == \"13583\"], \n", 529 | " berlin[\"totalRent\"][berlin.geo_plz == \"13583\"], s=40, label=\"Spandau\")\n", 530 | "\n", 531 | "plt.legend()\n", 532 | "plt.xlabel('Living Area [sqm]')\n", 533 | "plt.ylabel('Rent [€]')\n", 534 | "plt.show()" 535 | ] 536 | }, 537 | { 538 | "cell_type": "markdown", 539 | "metadata": {}, 540 | "source": [ 541 | "Instead of computing the rent predictions by hand, we could also use the PyMC data container to handle predictions on new data.\n", 542 | "\n", 543 | "Unfortunately, because of a still-open issue, we can't use the PyMC data container to update the ZIP code indices but need to use a shared Theano variable. However, both types are updated in a similar fashion." 
544 | ] 545 | }, 546 | { 547 | "cell_type": "code", 548 | "execution_count": null, 549 | "metadata": {}, 550 | "outputs": [], 551 | "source": [ 552 | "zips = theano.shared(berlin[\"zip_code\"].values)\n", 553 | "# idx variables cannot be used with pm.Data() so far, because of a bug\n", 554 | "# see here: https://discourse.pymc.io/t/integer-values-with-pm-data/3776\n", 555 | "# and here: https://github.com/pymc-devs/pymc3/issues/3493\n", 556 | "\n", 557 | "\n", 558 | "with pm.Model() as hier_model:\n", 559 | " area = pm.Data(\"area\", berlin[\"livingSpace_s\"])\n", 560 | " #zips = pm.Data(\"zips\", d[\"zip_code\"].values)\n", 561 | " \n", 562 | " mu_alpha = pm.Normal(\"mu_alpha\", mu=0, sigma=10)\n", 563 | " sigma_alpha = pm.HalfNormal(\"sigma_alpha\", sigma=5)\n", 564 | " \n", 565 | " mu_beta = pm.Normal(\"mu_beta\", mu=0, sigma=5)\n", 566 | " sigma_beta = pm.HalfNormal(\"sigma_beta\", sigma=5)\n", 567 | " \n", 568 | " alpha = pm.Normal(\"alpha\", mu=mu_alpha, sd=sigma_alpha, \n", 569 | " shape=num_zip_codes)\n", 570 | " \n", 571 | " beta = pm.Normal(\"beta\", mu=mu_beta, sd=sigma_beta, \n", 572 | " shape=num_zip_codes)\n", 573 | " \n", 574 | "\n", 575 | " sigma = pm.HalfNormal(\"sigma\", sigma=5)\n", 576 | " \n", 577 | " mu = alpha[zips] + beta[zips]*area\n", 578 | " \n", 579 | " rent = pm.Normal(\"rent\", mu=mu, sd=sigma, observed=berlin[\"totalRent_s\"])\n", 580 | " \n", 581 | " trace = pm.sample(random_seed=2020, chains=4, \n", 582 | " draws=1000, tune=1000)" 583 | ] 584 | }, 585 | { 586 | "cell_type": "markdown", 587 | "metadata": {}, 588 | "source": [ 589 | "Feel free to change the area and ZIP code data to, for example, your own flat." 
590 | ] 591 | }, 592 | { 593 | "cell_type": "code", 594 | "execution_count": null, 595 | "metadata": {}, 596 | "outputs": [], 597 | "source": [ 598 | "more_flats = pd.DataFrame({\"area\": standardize_area(np.array([100, 240, 74])), \n", 599 | " \"zip_code\": [\"10243\", \"10179\", \"12047\"]})\n", 600 | "\n", 601 | "more_flats[\"zip\"] = map_zip_codes(more_flats[\"zip_code\"])" 602 | ] 603 | }, 604 | { 605 | "cell_type": "code", 606 | "execution_count": null, 607 | "metadata": {}, 608 | "outputs": [], 609 | "source": [ 610 | "with hier_model:\n", 611 | " zips.set_value(more_flats[\"zip\"])\n", 612 | " pm.set_data({\"area\": more_flats[\"area\"]})\n", 613 | " post_pred = pm.sample_posterior_predictive(trace, samples=1000)" 614 | ] 615 | }, 616 | { 617 | "cell_type": "markdown", 618 | "metadata": {}, 619 | "source": [ 620 | "As before, we can now plot this as a histogram:" 621 | ] 622 | }, 623 | { 624 | "cell_type": "code", 625 | "execution_count": null, 626 | "metadata": {}, 627 | "outputs": [], 628 | "source": [ 629 | "y_pred = post_pred[\"rent\"][:,2]*100\n", 630 | "\n", 631 | "plt.hist(y_pred, ec=\"darkblue\", alpha=0.9, bins=20)\n", 632 | "plt.title(\"Rental price distribution\\nfor a flat of 74sqm in 12047\")\n", 633 | "plt.show()" 634 | ] 635 | }, 636 | { 637 | "cell_type": "markdown", 638 | "metadata": {}, 639 | "source": [ 640 | "Or ask for the probability that your flat would have a rent lower than your own rent:" 641 | ] 642 | }, 643 | { 644 | "cell_type": "code", 645 | "execution_count": null, 646 | "metadata": {}, 647 | "outputs": [], 648 | "source": [ 649 | "np.mean(y_pred < 900)" 650 | ] 651 | } 652 | ], 653 | "metadata": { 654 | "kernelspec": { 655 | "display_name": "PyLadies-Bayesian-Tutorial", 656 | "language": "python", 657 | "name": "pyladies-bayesian-tutorial" 658 | }, 659 | "language_info": { 660 | "codemirror_mode": { 661 | "name": "ipython", 662 | "version": 3 663 | }, 664 | "file_extension": ".py", 665 | "mimetype": "text/x-python", 666 | 
"name": "python", 667 | "nbconvert_exporter": "python", 668 | "pygments_lexer": "ipython3", 669 | "version": "3.7.4" 670 | } 671 | }, 672 | "nbformat": 4, 673 | "nbformat_minor": 4 674 | } 675 | -------------------------------------------------------------------------------- /src/utils.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import pandas as pd 3 | import matplotlib.pyplot as plt 4 | 5 | DATA = pd.read_csv("../data/immo_data.csv", dtype={"geo_plz": str}) 6 | 7 | def iqr(data): 8 | """compute the interquartile range (excluding nan)""" 9 | return np.nanquantile(data, 0.75) - np.nanquantile(data, 0.25) 10 | 11 | def iqr_rule(data, factor=1.5): 12 | """returns an outlier filter mask using the iqr rule""" 13 | iqr_ = iqr(data) 14 | upper_fence = np.nanquantile(data, 0.75) + factor*iqr_ 15 | lower_fence = np.nanquantile(data, 0.25) - factor*iqr_ 16 | return (data <= upper_fence) & (data >= lower_fence) 17 | 18 | def preprocess_data(data): 19 | data["totalRent"] = np.where(data["totalRent"].isnull(), data["baseRent"], data["totalRent"]) 20 | 21 | # since log doesn't work with 0, we replace 0 with 0.5 22 | # seems reasonable to say that a rent of 0€ is the same as 50ct 23 | data["livingSpace_m"] = np.where(data["livingSpace"] <= 0, 0.5, data["livingSpace"]) 24 | data["totalRent_m"] = np.where(data["totalRent"] <= 0, 0.5, data["totalRent"]) 25 | data["logRent"] = np.log(data["totalRent_m"]) 26 | data["logSpace"] = np.log(data["livingSpace_m"]) 27 | 28 | not_outlier = iqr_rule(data["logSpace"], factor=1.5) & iqr_rule(data["logRent"], factor=1.5) 29 | d = data[not_outlier] 30 | berlin = d[(d.regio1 == "Berlin")].copy() 31 | 32 | berlin["livingSpace_s"] = (berlin["livingSpace"] - berlin["livingSpace"].mean()) / np.std(berlin["livingSpace"]) 33 | berlin["totalRent_s"] = berlin["totalRent"] / 100 34 | 35 | return berlin 36 | 37 | BERLIN = preprocess_data(DATA) 38 | 39 | 40 | def compare_hist(priors, data): 
41 | fig, ax = plt.subplots(figsize=(20,9), nrows=1, ncols=2) 42 | ax[0].hist(priors["rent"].flatten()*100, alpha=0.9, ec="darkblue", bins=70) 43 | ax[0].set_title("Histogram over possible range of rental prices") 44 | ax[0].set_xlabel("Monthly Rent [€]") 45 | ax[1].hist(data["totalRent_s"]*100, alpha=0.9, ec="darkblue", bins=70) 46 | ax[1].set_title("Histogram over the actual rental prices") 47 | ax[1].set_xlabel("Monthly Rent [€]") 48 | return fig, ax 49 | 50 | 51 | def draw_models(priors, data): 52 | area_s = np.linspace(start=-2, stop=3.5, num=50) 53 | draws = np.random.choice(len(priors["alpha"]), replace=False, size=50) 54 | alpha = priors["alpha"][draws] 55 | beta = priors["beta"][draws] 56 | 57 | mu = alpha + beta * area_s[:, None] 58 | 59 | fig, ax = plt.subplots(figsize=(9,9)) 60 | ax.plot(area_s*np.std(data["livingSpace"]) + data["livingSpace"].mean(), mu*100, c="#737373", alpha=0.5) 61 | ax.set_xlabel("Living Area [sqm]", fontdict={"fontsize": 22}) 62 | ax.set_ylabel("Price [€]", fontdict={"fontsize": 22}) 63 | ax.set_title("Linear model according to our prior") 64 | return fig, ax 65 | 66 | 67 | def standardize_area(x): 68 | return (x - BERLIN["livingSpace"].mean()) / np.std(BERLIN["livingSpace"]) 69 | 70 | 71 | def destandardize_area(x): 72 | return (x * np.std(BERLIN["livingSpace"])) + BERLIN["livingSpace"].mean() 73 | --------------------------------------------------------------------------------