├── .gitignore ├── README.md ├── Index.ipynb ├── Solutions-03.ipynb ├── Solutions-02.ipynb ├── 04-Implementing-MCMC.ipynb ├── 02-Simple-Bayesian-Modeling.ipynb ├── Solutions-04.ipynb ├── Solutions-05.ipynb ├── 03-Bayesian-Modeling-With-MCMC.ipynb ├── 01-Introduction.ipynb └── 05-Radial-Velocity.ipynb /.gitignore: -------------------------------------------------------------------------------- 1 | *~ 2 | .DS_store 3 | .ipynb_checkpoints -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Bayesian Methods in Astronomy 2 | 3 | [![DOI](https://zenodo.org/badge/doi/10.5281/zenodo.50993.svg)](http://dx.doi.org/10.5281/zenodo.50993) 4 | 5 | 6 | Materials for the Bayesian Methods in Astronomy workshop at the 227th American Astronomical Society meeting. 7 | 8 | View all the content [here](https://github.com/jakevdp/AAS227Workshop/blob/master/Index.ipynb) 9 | 10 | 11 | ## Requirements 12 | 13 | For this workshop we will be using Python, and in particular *Python version 3.4-3.5*. In addition, we will make use of the following packages, along with their dependencies. 14 | 15 | **Note that familiarity with the following packages is considered a pre-requisite to the workshop. We will not be spending time helping people with setting up or installing these packages.** 16 | 17 | - [NumPy](http://numpy.org) (Numerical Python) for efficient manipulation of array-based data 18 | - [SciPy](http://scipy.org) (Scientific Python) for optimization and other routines 19 | - [Pandas](http://pandas.pydata.org) for reading data files into arrays 20 | - [Matplotlib](http://matplotlib.org) for scientific visualization 21 | - [IPython notebook](http://ipython.org) as an interactive computing environment 22 | 23 | To install these dependencies, I *highly* recommend using the [miniconda](http://conda.pydata.org/miniconda.html) Python installer *(be sure to download the Python 3.5 version of miniconda)*. Once this is installed, run the following commands in your terminal: 24 | 25 | ``` 26 | $ conda install numpy scipy pandas matplotlib ipython-notebook pip 27 | ``` 28 | 29 | On top of these prerequisites, we will be introducing two relatively lightweight packages designed to enable efficient Bayesian computation in Python. 30 | They are: 31 | 32 | - [emcee](http://dan.iel.fm/emcee/) for Markov Chain Monte Carlo sampling of Bayesian posteriors 33 | - [corner.py](https://pypi.python.org/pypi/corner/1.0.0) for visualization of multidimensional posteriors 34 | 35 | These can be installed via the Python Package Index, using the ``pip`` command: 36 | 37 | ``` 38 | $ pip install emcee 39 | $ pip install corner 40 | ``` 41 | 42 | ## Workshop Material 43 | 44 | All the material for this workshop, as well as the exercises we will work on together, is contained in IPython notebooks within this repository. 45 | Please clone this repository (if you are familiar with Git) or download the zip file (link is to the right on the main repository page) to download this content; note that I will likely update the material in the days prior to the workshop, so please plan to git-pull or re-download the zip file the morning of the workshop. 
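
A quick way to confirm that everything is importable before the workshop (this one-liner is just a convenience check, not part of the workshop materials):

```
$ python -c "import numpy, scipy, pandas, matplotlib, emcee, corner"
```

If the command prints nothing, the required packages are installed.
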
46 | -------------------------------------------------------------------------------- /Index.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Bayesian Methods in Astronomy: Hands-on Statistics" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "This is the content for the *Bayesian Methods in Astronomy* workshop, presented at the 227th meeting of the American Astronomical Society.\n", 15 | "The full repository can be found on GitHub: [http://github.com/jakevdp/AAS227Workshop](https://github.com/jakevdp/AAS227Workshop)" 16 | ] 17 | }, 18 | { 19 | "cell_type": "markdown", 20 | "metadata": {}, 21 | "source": [ 22 | "## Contents\n", 23 | "\n", 24 | "### [1. Introduction: Probability and Bayes' Rule](01-Introduction.ipynb)\n", 25 | "\n", 26 | "### [2. Simple Bayesian Modeling](02-Simple-Bayesian-Modeling.ipynb)\n", 27 | "- Solutions to exercise and breakout [here](Solutions-02.ipynb)\n", 28 | " \n", 29 | "### [3. Bayesian Modeling via Sampling](03-Bayesian-Modeling-With-MCMC.ipynb)\n", 30 | "- Solutions to breakout [here](Solutions-03.ipynb)\n", 31 | "\n", 32 | "### [4. Implementing MCMC](04-Implementing-MCMC.ipynb)\n", 33 | "- Solutions to breakout [here](Solutions-04.ipynb)\n", 34 | "\n", 35 | "### [5. Application: Radial Velocity Planet Searches](05-Radial-Velocity.ipynb)\n", 36 | "- Solutions to breakout [here](Solutions-05.ipynb)" 37 | ] 38 | }, 39 | { 40 | "cell_type": "markdown", 41 | "metadata": {}, 42 | "source": [ 43 | "## Requirements\n", 44 | "\n", 45 | "For this workshop we will be using Python, and in particular *Python version 3.4-3.5*. In addition, we will make use of the following packages, along with their dependencies.\n", 46 | "\n", 47 | "**Note that familiarity with the following packages is considered a pre-requisite to the workshop. We will not be spending time helping people with setting up or installing these packages.**\n", 48 | "\n", 49 | "- [NumPy](http://numpy.org) (Numerical Python) for efficient manipulation of array-based data\n", 50 | "- [SciPy](http://scipy.org) (Scientific Python) for optimization and other routines\n", 51 | "- [Pandas](http://pandas.pydata.org) for reading data files into arrays\n", 52 | "- [Matplotlib](http://matplotlib.org) for scientific visualization\n", 53 | "- [IPython notebook](http://ipython.org) as an interactive computing environment\n", 54 | "\n", 55 | "To install these dependencies, I *highly* recommend using the [miniconda](http://conda.pydata.org/miniconda.html) Python installer *(be sure to download the Python 3.5 version of miniconda)*. 
Once this is installed, run the following commands in your terminal:\n", 56 | "\n", 57 | "```\n", 58 | "$ conda install numpy scipy pandas matplotlib ipython-notebook pip\n", 59 | "```\n", 60 | "\n", 61 | "On top of these prerequisites, we will be introducing two relatively lightweight packages designed to enable efficient Bayesian computation in Python.\n", 62 | "They are:\n", 63 | "\n", 64 | "- [emcee](http://dan.iel.fm/emcee/) for Markov Chain Monte Carlo sampling of Bayesian posteriors\n", 65 | "- [corner.py](https://pypi.python.org/pypi/corner/1.0.0) for visualization of multidimensional posteriors\n", 66 | "\n", 67 | "These can be installed via the Python Package Index, using the ``pip`` command:\n", 68 | "\n", 69 | "```\n", 70 | "$ pip install emcee\n", 71 | "$ pip install corner\n", 72 | "```" 73 | ] 74 | }, 75 | { 76 | "cell_type": "markdown", 77 | "metadata": {}, 78 | "source": [ 79 | "## Workshop Material\n", 80 | "\n", 81 | "All the material for this workshop, as well as the exercises we will work on together, is contained in IPython notebooks within this repository.\n", 82 | "Please clone this repository (if you are familiar with Git) or download the zip file (link is to the right on the main repository page) to download this content; note that I will likely update the material in the days prior to the workshop, so please plan to git-pull or re-download the zip file the morning of the workshop." 83 | ] 84 | } 85 | ], 86 | "metadata": { 87 | "kernelspec": { 88 | "display_name": "Python 3.5", 89 | "language": "", 90 | "name": "python3.5" 91 | }, 92 | "language_info": { 93 | "codemirror_mode": { 94 | "name": "ipython", 95 | "version": 3 96 | }, 97 | "file_extension": ".py", 98 | "mimetype": "text/x-python", 99 | "name": "python", 100 | "nbconvert_exporter": "python", 101 | "pygments_lexer": "ipython3", 102 | "version": "3.5.1" 103 | } 104 | }, 105 | "nbformat": 4, 106 | "nbformat_minor": 0 107 | } 108 | -------------------------------------------------------------------------------- /Solutions-03.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Solutions-03" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "This contains one possible solution for the three-parameter linear model with intrinsic scatter.\n", 15 | "\n", 16 | "Recall that we are working with this data:" 17 | ] 18 | }, 19 | { 20 | "cell_type": "code", 21 | "execution_count": null, 22 | "metadata": { 23 | "collapsed": false 24 | }, 25 | "outputs": [], 26 | "source": [ 27 | "%matplotlib inline\n", 28 | "import matplotlib.pyplot as plt\n", 29 | "import numpy as np\n", 30 | "import seaborn; seaborn.set() # for plot formatting\n", 31 | "\n", 32 | "def make_data_scatter(intercept, slope, scatter,\n", 33 | " N=20, dy=2, rseed=42):\n", 34 | " rand = np.random.RandomState(rseed)\n", 35 | " x = 100 * rand.rand(20)\n", 36 | " y = intercept + slope * x\n", 37 | " y += np.sqrt(dy ** 2 + scatter ** 2) * rand.randn(20)\n", 38 | " return x, y, dy * np.ones_like(x)\n", 39 | "\n", 40 | "\n", 41 | "# (intercept, slope, intrinsic scatter)\n", 42 | "theta = (25, 0.5, 3.0)\n", 43 | "x, y, dy = make_data_scatter(*theta)\n", 44 | "plt.errorbar(x, y, dy, fmt='o');" 45 | ] 46 | }, 47 | { 48 | "cell_type": "markdown", 49 | "metadata": {}, 50 | "source": [ 51 | "### Define the prior, likelihood, and posterior\n", 52 | "\n", 53 | "The likelihood for this model looks 
very similar to what we used above, except that the intrinsic scatter is added *in quadrature* to the measurement error.\n", 54 | "If $\\varepsilon_i$ is the measurement error on the point $(x_i, y_i)$, and $\\sigma$ is the intrinsic scatter, then the likelihood should look like this:\n", 55 | "\n", 56 | "$$\n", 57 | "P(x_i,y_i\\mid\\theta) = \\frac{1}{\\sqrt{2\\pi(\\varepsilon_i^2 + \\sigma^2)}} \\exp\\left(\\frac{-\\left[y_i - y(x_i;\\theta)\\right]^2}{2(\\varepsilon_i^2 + \\sigma^2)}\\right)\n", 58 | "$$\n", 59 | "\n", 60 | "For the prior, you can use either a flat or symmetric prior on the slope and intercept, but on the intrinsic scatter $\\sigma$ it is best to use a scale-invariant Jeffreys Prior:\n", 61 | "\n", 62 | "$$\n", 63 | "P(\\sigma)\\propto\\sigma^{-1}\n", 64 | "$$\n", 65 | "\n", 66 | "As discussed before, this has the nice feature that the resulting posterior will not depend on the units of measurement." 67 | ] 68 | }, 69 | { 70 | "cell_type": "code", 71 | "execution_count": null, 72 | "metadata": { 73 | "collapsed": true 74 | }, 75 | "outputs": [], 76 | "source": [ 77 | "# Define functions to compute the log-prior, log-likelihood, and log-posterior\n", 78 | "\n", 79 | "# theta = [intercept, slope, scatter]\n", 80 | "\n", 81 | "def log_prior(theta):\n", 82 | " if theta[2] <= 0 or np.any(np.abs(theta[:2]) > 1000):\n", 83 | " return -np.inf # log(0)\n", 84 | " else:\n", 85 | " # Jeffreys Prior\n", 86 | " return -np.log(theta[2])\n", 87 | " \n", 88 | "def log_likelihood(theta, x, y, dy):\n", 89 | " y_model = theta[0] + theta[1] * x\n", 90 | " S = dy ** 2 + theta[2] ** 2\n", 91 | " return -0.5 * np.sum(np.log(2 * np.pi * S) +\n", 92 | " (y - y_model) ** 2 / S)\n", 93 | "\n", 94 | "def log_posterior(theta, x, y, dy):\n", 95 | " return log_prior(theta) + log_likelihood(theta, x, y, dy)" 96 | ] 97 | }, 98 | { 99 | "cell_type": "markdown", 100 | "metadata": {}, 101 | "source": [ 102 | "### Sampling from the Posterior" 103 | ] 104 | }, 105 | { 106 | "cell_type": "code", 107 | "execution_count": null, 108 | "metadata": { 109 | "collapsed": true 110 | }, 111 | "outputs": [], 112 | "source": [ 113 | "# Using emcee, create and initialize a sampler and draw 200 samples from the posterior.\n", 114 | "# Remember to think about what starting guesses should you use!\n", 115 | "# You can use the above as a template\n", 116 | "\n", 117 | "import emcee\n", 118 | "\n", 119 | "ndim = 3 # number of parameters in the model\n", 120 | "nwalkers = 50 # number of MCMC walkers\n", 121 | "\n", 122 | "# initialize walkers\n", 123 | "starting_guesses = np.random.randn(nwalkers, ndim)\n", 124 | "starting_guesses[:, 2] = np.random.rand(nwalkers)\n", 125 | "\n", 126 | "sampler = emcee.EnsembleSampler(nwalkers, ndim, log_posterior,\n", 127 | " args=[x, y, dy])\n", 128 | "pos, prob, state = sampler.run_mcmc(starting_guesses, 200)" 129 | ] 130 | }, 131 | { 132 | "cell_type": "markdown", 133 | "metadata": {}, 134 | "source": [ 135 | "### Visualizing the Chains" 136 | ] 137 | }, 138 | { 139 | "cell_type": "code", 140 | "execution_count": null, 141 | "metadata": { 142 | "collapsed": false 143 | }, 144 | "outputs": [], 145 | "source": [ 146 | "# Plot the three chains as above\n", 147 | "\n", 148 | "fig, ax = plt.subplots(3, sharex=True)\n", 149 | "for i in range(3):\n", 150 | " ax[i].plot(sampler.chain[:, :, i].T, '-k', alpha=0.2);" 151 | ] 152 | }, 153 | { 154 | "cell_type": "markdown", 155 | "metadata": {}, 156 | "source": [ 157 | "### Restarting and getting a clean sample" 158 | ] 159 | }, 160 | { 161 | 
"cell_type": "code", 162 | "execution_count": null, 163 | "metadata": { 164 | "collapsed": false 165 | }, 166 | "outputs": [], 167 | "source": [ 168 | "# Are your chains stabilized? Reset them and get a clean sample\n", 169 | "\n", 170 | "sampler.reset()\n", 171 | "pos, prob, state = sampler.run_mcmc(pos, 1000)\n", 172 | "\n", 173 | "fig, ax = plt.subplots(3, sharex=True)\n", 174 | "for i in range(3):\n", 175 | " ax[i].plot(sampler.chain[:, :, i].T, '-k', alpha=0.2);" 176 | ] 177 | }, 178 | { 179 | "cell_type": "markdown", 180 | "metadata": {}, 181 | "source": [ 182 | "### Visualizing the results" 183 | ] 184 | }, 185 | { 186 | "cell_type": "code", 187 | "execution_count": null, 188 | "metadata": { 189 | "collapsed": false 190 | }, 191 | "outputs": [], 192 | "source": [ 193 | "# Use corner.py to visualize the three-dimensional posterior\n", 194 | "\n", 195 | "import corner\n", 196 | "corner.corner(sampler.flatchain, truths=theta,\n", 197 | " labels=['intercept', 'slope', 'scatter']);" 198 | ] 199 | }, 200 | { 201 | "cell_type": "markdown", 202 | "metadata": {}, 203 | "source": [ 204 | "And visualizing the model over the data:" 205 | ] 206 | }, 207 | { 208 | "cell_type": "code", 209 | "execution_count": null, 210 | "metadata": { 211 | "collapsed": false 212 | }, 213 | "outputs": [], 214 | "source": [ 215 | "# Next plot ~100 of the samples as models over the data to get an idea of the fit\n", 216 | "\n", 217 | "chain = sampler.flatchain\n", 218 | "\n", 219 | "plt.errorbar(x, y, dy, fmt='o');\n", 220 | "\n", 221 | "thetas = [chain[i] for i in np.random.choice(chain.shape[0], 100)]\n", 222 | "\n", 223 | "xfit = np.linspace(0, 100)\n", 224 | "for i in range(100):\n", 225 | " theta = thetas[i]\n", 226 | " plt.plot(xfit, theta[0] + theta[1] * xfit,\n", 227 | " color='black', alpha=0.05);" 228 | ] 229 | } 230 | ], 231 | "metadata": { 232 | "kernelspec": { 233 | "display_name": "Python 3", 234 | "language": "python", 235 | "name": "python3" 236 | }, 237 | "language_info": { 238 | "codemirror_mode": { 239 | "name": "ipython", 240 | "version": 3 241 | }, 242 | "file_extension": ".py", 243 | "mimetype": "text/x-python", 244 | "name": "python", 245 | "nbconvert_exporter": "python", 246 | "pygments_lexer": "ipython3", 247 | "version": "3.5.1" 248 | } 249 | }, 250 | "nbformat": 4, 251 | "nbformat_minor": 0 252 | } 253 | -------------------------------------------------------------------------------- /Solutions-02.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Solutions-02" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "### Setup Code" 15 | ] 16 | }, 17 | { 18 | "cell_type": "code", 19 | "execution_count": null, 20 | "metadata": { 21 | "collapsed": false 22 | }, 23 | "outputs": [], 24 | "source": [ 25 | "%matplotlib inline\n", 26 | "import matplotlib.pyplot as plt\n", 27 | "import numpy as np\n", 28 | "import seaborn; seaborn.set() # for plot formatting" 29 | ] 30 | }, 31 | { 32 | "cell_type": "code", 33 | "execution_count": null, 34 | "metadata": { 35 | "collapsed": false 36 | }, 37 | "outputs": [], 38 | "source": [ 39 | "# Making the data\n", 40 | "\n", 41 | "def make_data(intercept, slope, N=20, dy=5, rseed=42):\n", 42 | " rand = np.random.RandomState(rseed)\n", 43 | " x = 100 * rand.rand(N)\n", 44 | " y = intercept + slope * x\n", 45 | " y += dy * rand.randn(N)\n", 46 | " return x, y, dy * np.ones_like(x)\n", 47 | 
"\n", 48 | "theta_true = [25, 0.5]\n", 49 | "x, y, dy = make_data(*theta_true)\n", 50 | "plt.errorbar(x, y, dy, fmt='o');" 51 | ] 52 | }, 53 | { 54 | "cell_type": "markdown", 55 | "metadata": {}, 56 | "source": [ 57 | "### Quick Exercise\n", 58 | "\n", 59 | "1. Write a Python function which computes the log-likelihood given a parameter vector $\\theta$, an array of errors $\\varepsilon$, and an array of $x$ and $y$ values\n", 60 | "\n", 61 | "2. Use tools in [``scipy.optimize``](http://docs.scipy.org/doc/scipy/reference/optimize.html) to maximize this likelihood (i.e. minimize the negative log-likelihood). How close is this result to the input ``theta_true`` above?" 62 | ] 63 | }, 64 | { 65 | "cell_type": "markdown", 66 | "metadata": {}, 67 | "source": [ 68 | "#### Solution" 69 | ] 70 | }, 71 | { 72 | "cell_type": "code", 73 | "execution_count": null, 74 | "metadata": { 75 | "collapsed": true 76 | }, 77 | "outputs": [], 78 | "source": [ 79 | "def log_likelihood(theta, x, y, dy):\n", 80 | " y_model = theta[0] + theta[1] * x\n", 81 | " return -0.5 * np.sum(np.log(2 * np.pi * dy ** 2) +\n", 82 | " (y - y_model) ** 2 / dy ** 2)" 83 | ] 84 | }, 85 | { 86 | "cell_type": "code", 87 | "execution_count": null, 88 | "metadata": { 89 | "collapsed": false 90 | }, 91 | "outputs": [], 92 | "source": [ 93 | "from scipy import optimize\n", 94 | "\n", 95 | "def minfunc(theta, x, y, dy):\n", 96 | " return -log_likelihood(theta, x, y, dy)\n", 97 | "\n", 98 | "result = optimize.minimize(minfunc, x0=[0, 0], args=(x, y, dy))\n", 99 | "print(\"result:\", result.x)\n", 100 | "print(\"input\", theta_true)" 101 | ] 102 | }, 103 | { 104 | "cell_type": "markdown", 105 | "metadata": {}, 106 | "source": [ 107 | "### Breakout\n", 108 | "\n", 109 | "1. Using matplotlib, plot the posterior probability distribution for the slope and intercept, once for each prior. I would suggest using ``plt.contourf()`` or ``plt.pcolor()``. How different are the distributions?\n", 110 | "\n", 111 | "2. Modify the dataset – how do the results change if you have very few data points or very large errors?\n", 112 | "\n", 113 | "3. If you finish this quickly, try adding 1-sigma and 2-sigma contours to your plot, keeping in mind that the probabilities are not normalized! You can add them to your plot with ``plt.contour()``." 
114 | ] 115 | }, 116 | { 117 | "cell_type": "markdown", 118 | "metadata": {}, 119 | "source": [ 120 | "#### Setup\n", 121 | "\n", 122 | "These are the two prior functions we defined in the notebook:" 123 | ] 124 | }, 125 | { 126 | "cell_type": "code", 127 | "execution_count": null, 128 | "metadata": { 129 | "collapsed": true 130 | }, 131 | "outputs": [], 132 | "source": [ 133 | "def log_flat_prior(theta):\n", 134 | " if np.all(np.abs(theta) < 1000):\n", 135 | " return 0\n", 136 | " else:\n", 137 | " return -np.inf # log(0)\n", 138 | " \n", 139 | "def log_symmetric_prior(theta):\n", 140 | " if np.abs(theta[0]) < 1000:\n", 141 | " return -1.5 * np.log(1 + theta[1] ** 2)\n", 142 | " else:\n", 143 | " return -np.inf # log(0)" 144 | ] 145 | }, 146 | { 147 | "cell_type": "markdown", 148 | "metadata": {}, 149 | "source": [ 150 | "#### Solution" 151 | ] 152 | }, 153 | { 154 | "cell_type": "markdown", 155 | "metadata": {}, 156 | "source": [ 157 | "We'll start by defining a function which takes a two-dimensional grid of likelihoods and returns 1, 2, and 3-sigma contours.\n", 158 | "This acts by sorting and normalizing the values and then finding the locations of the $0.68^2$, $0.95^2$, and $0.997^2$ cutoffs:" 159 | ] 160 | }, 161 | { 162 | "cell_type": "code", 163 | "execution_count": null, 164 | "metadata": { 165 | "collapsed": true 166 | }, 167 | "outputs": [], 168 | "source": [ 169 | "def contour_levels(grid):\n", 170 | " \"\"\"Compute 1, 2, 3-sigma contour levels for a gridded 2D posterior\"\"\"\n", 171 | " sorted_ = np.sort(grid.ravel())[::-1]\n", 172 | " pct = np.cumsum(sorted_) / np.sum(sorted_)\n", 173 | " cutoffs = np.searchsorted(pct, np.array([0.68, 0.95, 0.997]) ** 2)\n", 174 | " return sorted_[cutoffs]" 175 | ] 176 | }, 177 | { 178 | "cell_type": "markdown", 179 | "metadata": {}, 180 | "source": [ 181 | "Now we define a function to compute and plot the results of the Bayesian analysis:" 182 | ] 183 | }, 184 | { 185 | "cell_type": "code", 186 | "execution_count": null, 187 | "metadata": { 188 | "collapsed": true 189 | }, 190 | "outputs": [], 191 | "source": [ 192 | "def plot_results(x, y, dy,\n", 193 | " slope_limits=(0.3, 0.7),\n", 194 | " intercept_limits=(15, 35)):\n", 195 | " # 1. Evaluate the log probability on the grid (once for each prior)\n", 196 | " slope_range = np.linspace(*slope_limits)\n", 197 | " intercept_range = np.linspace(*intercept_limits)\n", 198 | "\n", 199 | " log_P1 = [[log_likelihood([b, m], x, y, dy) + log_flat_prior([b, m])\n", 200 | " for m in slope_range] for b in intercept_range]\n", 201 | " log_P2 = [[log_likelihood([b, m], x, y, dy) + log_symmetric_prior([b, m])\n", 202 | " for m in slope_range] for b in intercept_range]\n", 203 | "\n", 204 | " # For convenience, we normalize the probability density such that the maximum is 1\n", 205 | " P1 = np.exp(log_P1 - np.max(log_P1))\n", 206 | " P2 = np.exp(log_P2 - np.max(log_P2))\n", 207 | "\n", 208 | " # 2. 
Create two subplots and plot contours showing the results\n", 209 | " fig, ax = plt.subplots(1, 2, figsize=(16, 6),\n", 210 | " sharex=True, sharey=True)\n", 211 | " \n", 212 | " ax[0].contourf(slope_range, intercept_range, P1, 100, cmap='Blues')\n", 213 | " ax[0].contour(slope_range, intercept_range, P1, contour_levels(P1), colors='black')\n", 214 | " ax[0].set_title('Flat Prior')\n", 215 | "\n", 216 | " ax[1].contourf(slope_range, intercept_range, P2, 100, cmap='Blues')\n", 217 | " ax[1].contour(slope_range, intercept_range, P2, contour_levels(P2), colors='black')\n", 218 | " ax[1].set_title('Symmetric Prior')\n", 219 | "\n", 220 | " # 3. Add grids and set axis labels\n", 221 | " for axi in ax:\n", 222 | " axi.grid('on', linestyle=':', color='gray', alpha=0.5)\n", 223 | " axi.set_axisbelow(False)\n", 224 | " axi.set_xlabel('slope')\n", 225 | " axi.set_ylabel('intercept')" 226 | ] 227 | }, 228 | { 229 | "cell_type": "code", 230 | "execution_count": null, 231 | "metadata": { 232 | "collapsed": false 233 | }, 234 | "outputs": [], 235 | "source": [ 236 | "plot_results(x, y, dy)" 237 | ] 238 | }, 239 | { 240 | "cell_type": "markdown", 241 | "metadata": {}, 242 | "source": [ 243 | "We see that the form of the prior in this case makes very little difference to the final posterior. In general, this often ends up being the case: for all the worrying about the effect of the prior, when you have enough data to constrain your model well, the prior has very little effect." 244 | ] 245 | }, 246 | { 247 | "cell_type": "markdown", 248 | "metadata": {}, 249 | "source": [ 250 | "Let's use some different data and see what happens:" 251 | ] 252 | }, 253 | { 254 | "cell_type": "code", 255 | "execution_count": null, 256 | "metadata": { 257 | "collapsed": false 258 | }, 259 | "outputs": [], 260 | "source": [ 261 | "x2, y2, dy2 = make_data(*theta_true, N=3, dy=40)\n", 262 | "plt.errorbar(x2, y2, dy2, fmt='o');" 263 | ] 264 | }, 265 | { 266 | "cell_type": "code", 267 | "execution_count": null, 268 | "metadata": { 269 | "collapsed": false 270 | }, 271 | "outputs": [], 272 | "source": [ 273 | "plot_results(x2, y2, dy2,\n", 274 | " slope_limits=(-2, 5),\n", 275 | " intercept_limits=(-300, 200))" 276 | ] 277 | }, 278 | { 279 | "cell_type": "markdown", 280 | "metadata": {}, 281 | "source": [ 282 | "We see here that the form of the prior **does** have a clear effect in the case where the data don't constrain the model well (in this case, three points with very large error bars).\n", 283 | "This encodes exactly what you would scientifically expect: if you don't have very good data, it is unlikely to change your views of the world (which are of course encoded in the prior)."
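
One way to quantify this (a sketch that reuses the data ``x2, y2, dy2`` and the functions defined above; the grid limits are the same arbitrary choices as in the plot): compare the posterior mean of the slope under the two priors.

```python
slope_range = np.linspace(-2, 5, 200)
intercept_range = np.linspace(-300, 200, 200)

log_P1 = np.array([[log_likelihood([b, m], x2, y2, dy2) + log_flat_prior([b, m])
                    for m in slope_range] for b in intercept_range])
log_P2 = np.array([[log_likelihood([b, m], x2, y2, dy2) + log_symmetric_prior([b, m])
                    for m in slope_range] for b in intercept_range])

for name, logP in [('flat', log_P1), ('symmetric', log_P2)]:
    P = np.exp(logP - logP.max())
    P /= P.sum()
    mean_slope = np.sum(P.sum(axis=0) * slope_range)  # marginalize over the intercept
    print(name, 'prior: posterior mean slope =', mean_slope)
```

If the discussion above is right, the two numbers should differ for this poorly-constraining dataset, while repeating the comparison with the original dataset should give nearly identical values.
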
284 | ] 285 | } 286 | ], 287 | "metadata": { 288 | "kernelspec": { 289 | "display_name": "Python 3", 290 | "language": "python", 291 | "name": "python3" 292 | }, 293 | "language_info": { 294 | "codemirror_mode": { 295 | "name": "ipython", 296 | "version": 3 297 | }, 298 | "file_extension": ".py", 299 | "mimetype": "text/x-python", 300 | "name": "python", 301 | "nbconvert_exporter": "python", 302 | "pygments_lexer": "ipython3", 303 | "version": "3.5.1" 304 | } 305 | }, 306 | "nbformat": 4, 307 | "nbformat_minor": 0 308 | } 309 | -------------------------------------------------------------------------------- /04-Implementing-MCMC.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Breakout: Do-It-Yourself MCMC\n", 8 | "\n", 9 | "As we have seen, the standard Bayesian approach to model fitting involves sampling the posterior, usually via a variant of Markov Chain Monte Carlo (MCMC). Though there are many very sophisticated MCMC samplers out there, the most simple algorithm (Metropolis-Hastings) is rather straightforward to code.\n", 10 | "\n", 11 | "Here we'll walk through creating our own Metropolis-Hastings sampler from scratch, in order to better understand exactly what is going on under the hood.\n", 12 | "\n", 13 | "If you'd like to view one possible solution, take a look at the [Solutions-04](Solutions-04.ipynb) notebook (but again, try to make an honest effort at this before you peek!)" 14 | ] 15 | }, 16 | { 17 | "cell_type": "markdown", 18 | "metadata": {}, 19 | "source": [ 20 | "## Preliminaries\n", 21 | "\n", 22 | "As usual, we start with some imports:" 23 | ] 24 | }, 25 | { 26 | "cell_type": "code", 27 | "execution_count": null, 28 | "metadata": { 29 | "collapsed": false 30 | }, 31 | "outputs": [], 32 | "source": [ 33 | "%matplotlib inline\n", 34 | "import numpy as np\n", 35 | "import matplotlib.pyplot as plt\n", 36 | "from scipy import stats\n", 37 | "\n", 38 | "# use seaborn plotting defaults\n", 39 | "# If you don't have seaborn installed, you can comment this out.\n", 40 | "import seaborn as sns; sns.set()" 41 | ] 42 | }, 43 | { 44 | "cell_type": "markdown", 45 | "metadata": {}, 46 | "source": [ 47 | "## Metropolis-Hastings Procedure\n", 48 | "\n", 49 | "Recall the Metropolis-Hastings procedure:\n", 50 | "\n", 51 | "1. Define a posterior $p(\\theta~|~D, I)$\n", 52 | "2. Define a *proposal density* $p(\\theta_{i + 1}~|~\\theta_i)$, which must be a symmetric function, but otherwise is unconstrained (a Gaussian is the usual choice).\n", 53 | "3. Choose a starting point $\\theta_0$\n", 54 | "4. Repeat the following:\n", 55 | "\n", 56 | " 1. Given $\\theta_i$, draw a new $\\theta_{i + 1}$ from the proposal distribution\n", 57 | " \n", 58 | " 2. Compute the *acceptance ratio*\n", 59 | " $$\n", 60 | " a = \\frac{p(\\theta_{i + 1}~|~D,I)}{p(\\theta_i~|~D,I)}\n", 61 | " $$\n", 62 | " \n", 63 | " 3. If $a \\ge 1$, the proposal is more likely: accept the draw and add $\\theta_{i + 1}$ to the chain.\n", 64 | " \n", 65 | " 4. If $a < 1$, then accept the point with probability $a$: this can be done by drawing a uniform random number $r$ and checking if $a < r$. If the point is accepted, add $\\theta_{i + 1}$ to the chain. If not, then add $\\theta_i$ to the chain *again*.\n", 66 | " \n", 67 | "The goal is to produce a \"chain\", i.e. 
a list of $\\theta$ values, where each $\\theta$ is a vector of parameters for your model.\n", 68 | "Here we'll write a simple Metropolis-Hastings sampler in Python.\n", 69 | "\n", 70 | "Note that the ``np.random.randn()`` function will be useful: it returns a pseudorandom value drawn from a standard normal distribution (i.e. mean of zero and variance of 1)." 71 | ] 72 | }, 73 | { 74 | "cell_type": "markdown", 75 | "metadata": {}, 76 | "source": [ 77 | "## The Data\n", 78 | "\n", 79 | "We'll use data drawn from a straight line model" 80 | ] 81 | }, 82 | { 83 | "cell_type": "code", 84 | "execution_count": null, 85 | "metadata": { 86 | "collapsed": false 87 | }, 88 | "outputs": [], 89 | "source": [ 90 | "def make_data(intercept, slope, N=20, dy=2, rseed=42):\n", 91 | " rand = np.random.RandomState(rseed)\n", 92 | " x = 100 * rand.rand(20)\n", 93 | " y = intercept + slope * x\n", 94 | " y += dy * rand.randn(20)\n", 95 | " return x, y, dy * np.ones_like(x)\n", 96 | "\n", 97 | "theta_true = (2, 0.5)\n", 98 | "x, y, dy = make_data(*theta_true)" 99 | ] 100 | }, 101 | { 102 | "cell_type": "markdown", 103 | "metadata": {}, 104 | "source": [ 105 | "## Exercise\n", 106 | "\n", 107 | "Walk through all the following steps, filling-in the code along the way." 108 | ] 109 | }, 110 | { 111 | "cell_type": "markdown", 112 | "metadata": {}, 113 | "source": [ 114 | "First plot the data to see what we're looking at (Use a ``plt.errorbar()`` plot with the provided data)" 115 | ] 116 | }, 117 | { 118 | "cell_type": "code", 119 | "execution_count": null, 120 | "metadata": { 121 | "collapsed": false 122 | }, 123 | "outputs": [], 124 | "source": [] 125 | }, 126 | { 127 | "cell_type": "markdown", 128 | "metadata": {}, 129 | "source": [ 130 | "We're going to fit a line to the data, as we've done through the lecture:" 131 | ] 132 | }, 133 | { 134 | "cell_type": "code", 135 | "execution_count": null, 136 | "metadata": { 137 | "collapsed": false 138 | }, 139 | "outputs": [], 140 | "source": [ 141 | "def model(theta, x):\n", 142 | " # the `theta` argument is a list of parameter values, e.g., theta = [m, b] for a line\n", 143 | " pass" 144 | ] 145 | }, 146 | { 147 | "cell_type": "markdown", 148 | "metadata": {}, 149 | "source": [ 150 | "---\n", 151 | "\n", 152 | "We'll start with the assumption that the data are independent and identically distributed so that the likelihood is simply a product of Gaussians (one big Gaussian). We'll also assume that the uncertainties reported are correct, and that there are no uncertainties on the `x` data. We need to define a function that will evaluate the (ln)likelihood of the data, given a particular choice of your model parameters. A good way to structure this function is as follows:" 153 | ] 154 | }, 155 | { 156 | "cell_type": "code", 157 | "execution_count": null, 158 | "metadata": { 159 | "collapsed": false 160 | }, 161 | "outputs": [], 162 | "source": [ 163 | "def log_likelihood(theta, x, y, dy):\n", 164 | " # we will pass the parameters (theta) to the model function\n", 165 | " # the other arguments are the data\n", 166 | " pass " 167 | ] 168 | }, 169 | { 170 | "cell_type": "markdown", 171 | "metadata": {}, 172 | "source": [ 173 | "What about priors? Remember your prior only depends on the model parameters, but be careful about what kind of prior you are specifying for each parameter. Do we need to properly normalize the probabilities?" 
174 | ] 175 | }, 176 | { 177 | "cell_type": "code", 178 | "execution_count": null, 179 | "metadata": { 180 | "collapsed": false 181 | }, 182 | "outputs": [], 183 | "source": [ 184 | "def log_prior(theta):\n", 185 | " pass" 186 | ] 187 | }, 188 | { 189 | "cell_type": "markdown", 190 | "metadata": {}, 191 | "source": [ 192 | "Now we can define a function that evaluates the log-posterior probability, which is just the sum of the log-prior and log-likelihood:" 193 | ] 194 | }, 195 | { 196 | "cell_type": "code", 197 | "execution_count": null, 198 | "metadata": { 199 | "collapsed": false 200 | }, 201 | "outputs": [], 202 | "source": [ 203 | "def log_posterior(theta, x, y, dy):\n", 204 | " return log_prior(theta) + log_likelihood(theta, x, y, dy)" 205 | ] 206 | }, 207 | { 208 | "cell_type": "markdown", 209 | "metadata": {}, 210 | "source": [ 211 | "Now write a function to actually run a Metropolis-Hastings MCMC sampler. Ford (2005) includes a great step-by-step walkthrough of the Metropolis-Hastings algorithm, and we'll base our code on that. Fill in the steps mentioned in the comments below:" 212 | ] 213 | }, 214 | { 215 | "cell_type": "code", 216 | "execution_count": null, 217 | "metadata": { 218 | "collapsed": false 219 | }, 220 | "outputs": [], 221 | "source": [ 222 | "def run_mcmc(log_posterior, nsteps, theta0, stepsize, args=()):\n", 223 | " \"\"\"\n", 224 | " Run a Markov Chain Monte Carlo\n", 225 | " \n", 226 | " Parameters\n", 227 | " ----------\n", 228 | " log_posterior: callable\n", 229 | " our function to compute the posterior\n", 230 | " nsteps: int\n", 231 | " the number of steps in the chain\n", 232 | " theta0: list\n", 233 | " the starting guess for parameters theta\n", 234 | " stepsize: float\n", 235 | " a parameter controlling the size of the random step\n", 236 | " e.g. it could be the width of the Gaussian distribution\n", 237 | " args: tuple (optional)\n", 238 | " additional arguments (data) passed to log_posterior\n", 239 | " \"\"\"\n", 240 | " # Create the array of size (nsteps, ndim) to hold the chain\n", 241 | " # Initialize the first row of this with theta0\n", 242 | " \n", 243 | " # Create the array of size nsteps to hold the log-likelihoods for each point\n", 244 | " # Initialize the first entry of this with the log likelihood at theta0\n", 245 | " \n", 246 | " # Loop from step 1 to nsteps (row 0 already holds theta0)\n", 247 | " for i in range(1, nsteps):\n", 248 | " # Randomly draw a new theta from the proposal distribution.\n", 249 | " # for example, you can do a normally-distributed step by utilizing\n", 250 | " # the np.random.randn() function\n", 251 | " \n", 252 | " # Calculate the probability for the new state\n", 253 | " \n", 254 | " # Compare it to the probability of the old state\n", 255 | " # Using the acceptance probability function\n", 256 | " # (remember that you've computed the log probability, not the probability!)\n", 257 | " \n", 258 | " # Choose a random number r between 0 and 1 to compare with p_accept\n", 259 | " \n", 260 | " # If p_accept > 1 or p_accept > r, accept the step\n", 261 | " # Save the position to the i^th row of the chain\n", 262 | " \n", 263 | " # Save the probability to the i^th entry of the array\n", 264 | " \n", 265 | " # Else, do not accept the step\n", 266 | " # Set the position and probability equal to the previous values\n", 267 | " \n", 268 | " # Return the chain and probabilities" 269 | ] 270 | }, 271 | { 272 | "cell_type": "markdown", 273 | "metadata": {}, 274 | "source": [ 275 | "Now run the MCMC code on the data provided."
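
Once your ``run_mcmc`` is filled in, a call might look like the sketch below; the starting guess, step size, and number of steps are arbitrary choices here, and the unpacking depends on what you decide to return.

```python
chain = run_mcmc(log_posterior, nsteps=10000, theta0=[1.0, 1.0],
                 stepsize=0.1, args=(x, y, dy))
# or `chain, log_probs = ...` if you choose to return the probabilities as well
```
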
276 | ] 277 | }, 278 | { 279 | "cell_type": "code", 280 | "execution_count": null, 281 | "metadata": { 282 | "collapsed": false 283 | }, 284 | "outputs": [], 285 | "source": [] 286 | }, 287 | { 288 | "cell_type": "markdown", 289 | "metadata": {}, 290 | "source": [ 291 | "Plot the position of the walker as a function of step number for each of the parameters. Are the chains converged? " 292 | ] 293 | }, 294 | { 295 | "cell_type": "code", 296 | "execution_count": null, 297 | "metadata": { 298 | "collapsed": false 299 | }, 300 | "outputs": [], 301 | "source": [] 302 | }, 303 | { 304 | "cell_type": "markdown", 305 | "metadata": {}, 306 | "source": [ 307 | "Make histograms of the samples for each parameter. Should you include all of the samples? " 308 | ] 309 | }, 310 | { 311 | "cell_type": "code", 312 | "execution_count": null, 313 | "metadata": { 314 | "collapsed": false 315 | }, 316 | "outputs": [], 317 | "source": [] 318 | }, 319 | { 320 | "cell_type": "markdown", 321 | "metadata": {}, 322 | "source": [ 323 | "Report to us your constraints on the model parameters.\n", 324 | "This is the number for the abstract – the challenge is to figure out how to accurately summarize a multi-dimensional posterior (which is **the result** in Bayesianism) with a few numbers (which is what readers want to see as they skim the arXiv).\n", 325 | "\n", 326 | "What numbers should you use?" 327 | ] 328 | }, 329 | { 330 | "cell_type": "code", 331 | "execution_count": null, 332 | "metadata": { 333 | "collapsed": false 334 | }, 335 | "outputs": [], 336 | "source": [] 337 | } 338 | ], 339 | "metadata": { 340 | "kernelspec": { 341 | "display_name": "Python 3", 342 | "language": "python", 343 | "name": "python3" 344 | }, 345 | "language_info": { 346 | "codemirror_mode": { 347 | "name": "ipython", 348 | "version": 3 349 | }, 350 | "file_extension": ".py", 351 | "mimetype": "text/x-python", 352 | "name": "python", 353 | "nbconvert_exporter": "python", 354 | "pygments_lexer": "ipython3", 355 | "version": "3.5.1" 356 | } 357 | }, 358 | "nbformat": 4, 359 | "nbformat_minor": 0 360 | } 361 | -------------------------------------------------------------------------------- /02-Simple-Bayesian-Modeling.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Simple Bayesian Modeling\n", 8 | "\n", 9 | "Here we'll apply what we learned in the previous section, and use Python to fit some Bayesian models.\n", 10 | "We'll start with the classic model fitting problem: **Fitting a line to data**.\n", 11 | "\n", 12 | "We'll begin with some standard Python imports:" 13 | ] 14 | }, 15 | { 16 | "cell_type": "code", 17 | "execution_count": null, 18 | "metadata": { 19 | "collapsed": false 20 | }, 21 | "outputs": [], 22 | "source": [ 23 | "%matplotlib inline\n", 24 | "import matplotlib.pyplot as plt\n", 25 | "import numpy as np\n", 26 | "import seaborn; seaborn.set() # for plot formatting" 27 | ] 28 | }, 29 | { 30 | "cell_type": "markdown", 31 | "metadata": {}, 32 | "source": [ 33 | "## The Data\n", 34 | "\n", 35 | "For the sake of this demonstration, let's start by creating some data that we will fit with a straight line:" 36 | ] 37 | }, 38 | { 39 | "cell_type": "code", 40 | "execution_count": null, 41 | "metadata": { 42 | "collapsed": false 43 | }, 44 | "outputs": [], 45 | "source": [ 46 | "def make_data(intercept, slope, N=20, dy=5, rseed=42):\n", 47 | " rand = np.random.RandomState(rseed)\n", 48 | " x = 100 * 
rand.rand(N)\n", 49 | " y = intercept + slope * x\n", 50 | " y += dy * rand.randn(N)\n", 51 | " return x, y, dy * np.ones_like(x)\n", 52 | "\n", 53 | "theta_true = [25, 0.5]\n", 54 | "x, y, dy = make_data(*theta_true)\n", 55 | "plt.errorbar(x, y, dy, fmt='o');" 56 | ] 57 | }, 58 | { 59 | "cell_type": "markdown", 60 | "metadata": {}, 61 | "source": [ 62 | "## The Model\n", 63 | "\n", 64 | "Next we need to specify a model. We're fitting a straight line to data, so we'll need a slope and an intercept; i.e.\n", 65 | "\n", 66 | "$$\n", 67 | "y_M(x) = mx + b\n", 68 | "$$\n", 69 | "\n", 70 | "where our parameter vector might be \n", 71 | "\n", 72 | "$$\n", 73 | "\\theta = [b, m]\n", 74 | "$$\n", 75 | "\n", 76 | "But this is only half the picture: what we mean by a \"model\" in a Bayesian sense is not only this expected value $y_M(x;\\theta)$, but a **probability distribution** for our data.\n", 77 | "That is, we need an expression to compute the likelihood $P(D\\mid\\theta)$ for our data as a function of the parameters $\\theta$." 78 | ] 79 | }, 80 | { 81 | "cell_type": "markdown", 82 | "metadata": {}, 83 | "source": [ 84 | "Here we are given data with simple errorbars, which imply that the probability for any *single* data point is a normal distribution about the true value. That is,\n", 85 | "\n", 86 | "$$\n", 87 | "y_i \\sim \\mathcal{N}(y_M(x_i;\\theta), \\varepsilon_i)\n", 88 | "$$\n", 89 | "\n", 90 | "or, in other words,\n", 91 | "\n", 92 | "$$\n", 93 | "P(x_i,y_i\\mid\\theta) = \\frac{1}{\\sqrt{2\\pi\\varepsilon_i^2}} \\exp\\left(\\frac{-\\left[y_i - y_M(x_i;\\theta)\\right]^2}{2\\varepsilon_i^2}\\right)\n", 94 | "$$\n", 95 | "\n", 96 | "where $\\varepsilon_i$ are the (known) measurement errors indicated by the errorbars." 97 | ] 98 | }, 99 | { 100 | "cell_type": "markdown", 101 | "metadata": {}, 102 | "source": [ 103 | "Assuming all the points are independent, we can find the full likelihood by multiplying the individual likelihoods together:\n", 104 | "\n", 105 | "$$\n", 106 | "P(D\\mid\\theta) = \\prod_{i=1}^N P(x_i,y_i\\mid\\theta)\n", 107 | "$$\n", 108 | "\n", 109 | "For convenience (and also for numerical accuracy) this is often expressed in terms of the log-likelihood:\n", 110 | "\n", 111 | "$$\n", 112 | "\\log P(D\\mid\\theta) = -\\frac{1}{2}\\sum_{i=1}^N\\left(\\log(2\\pi\\varepsilon_i^2) + \\frac{\\left[y_i - y_M(x_i;\\theta)\\right]^2}{\\varepsilon_i^2}\\right)\n", 113 | "$$" 114 | ] 115 | }, 116 | { 117 | "cell_type": "markdown", 118 | "metadata": {}, 119 | "source": [ 120 | "### Quick Exercise\n", 121 | "\n", 122 | "1. Write a Python function which computes the log-likelihood given a parameter vector $\\theta$, an array of errors $\\varepsilon$, and an array of $x$ and $y$ values\n", 123 | "\n", 124 | "2. Use tools in [``scipy.optimize``](http://docs.scipy.org/doc/scipy/reference/optimize.html) to maximize this likelihood (i.e. minimize the negative log-likelihood). How close is this result to the input ``theta_true`` above?"
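
If you'd like a cross-check while working on part 1 (a sketch, not the solution): ``scipy.stats.norm.logpdf`` evaluates the same Gaussian log-density term by term, so for any trial parameter vector your function should reproduce the sum below. The trial values here are arbitrary.

```python
from scipy import stats

theta_test = [20.0, 0.4]                      # arbitrary trial parameters, not the fit result
y_model = theta_test[0] + theta_test[1] * x   # uses x, y, dy defined above
print(np.sum(stats.norm.logpdf(y, loc=y_model, scale=dy)))
```
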
125 | ] 126 | }, 127 | { 128 | "cell_type": "code", 129 | "execution_count": null, 130 | "metadata": { 131 | "collapsed": true 132 | }, 133 | "outputs": [], 134 | "source": [ 135 | "# Write your code here" 136 | ] 137 | }, 138 | { 139 | "cell_type": "code", 140 | "execution_count": null, 141 | "metadata": { 142 | "collapsed": true 143 | }, 144 | "outputs": [], 145 | "source": [] 146 | }, 147 | { 148 | "cell_type": "code", 149 | "execution_count": null, 150 | "metadata": { 151 | "collapsed": true 152 | }, 153 | "outputs": [], 154 | "source": [] 155 | }, 156 | { 157 | "cell_type": "code", 158 | "execution_count": null, 159 | "metadata": { 160 | "collapsed": false 161 | }, 162 | "outputs": [], 163 | "source": [] 164 | }, 165 | { 166 | "cell_type": "markdown", 167 | "metadata": {}, 168 | "source": [ 169 | "See the [Solutions-02](Solutions-02.ipynb#Quick-Exercise) notebook for one possible approach." 170 | ] 171 | }, 172 | { 173 | "cell_type": "markdown", 174 | "metadata": {}, 175 | "source": [ 176 | "## The Prior\n", 177 | "\n", 178 | "We have computed the likelihood, now we need to think about the prior $P(\\theta\\mid I)$.\n", 179 | "\n", 180 | "This is where Bayesianism gets a bit controversial... what can we actually say about the slope and intercept before we fit our data?\n", 181 | "\n", 182 | "There are a couple approaches to choosing priors that you'll come across in practice:" 183 | ] 184 | }, 185 | { 186 | "cell_type": "markdown", 187 | "metadata": {}, 188 | "source": [ 189 | "### 0. Conjugate Priors\n", 190 | "\n", 191 | "In the days before computational power caught up with the aspirations of Bayesians, it was important to choose priors which would make the posterior analytically computable. A [conjugate prior](https://en.wikipedia.org/wiki/Conjugate_prior) is a prior which, due to its mathematical relation to the likelihood, makes the result analytically computable.\n", 192 | "The problem is that the form of the conjugate prior is very rarely defensible on any grounds other than computational convenience, and so this is **almost never a good choice**.\n", 193 | "You'll still occasionally hear people attack Bayesian approaches because of the use of conjugate priors: these people's arguments are decades outdated, and you should treat them appropriately." 194 | ] 195 | }, 196 | { 197 | "cell_type": "markdown", 198 | "metadata": {}, 199 | "source": [ 200 | "### 1. Empirical Priors\n", 201 | "\n", 202 | "Empirical Priors are priors which are actually posteriors from previous studies of the same phenomenon. For example, it's common in Supernova cosmology studies to use the WMAP results as a prior: that is, we actually plug-in a *real result* and use our new data to improve on that. This situation is where Bayesian approaches really shine.\n", 203 | "\n", 204 | "For our linear fit, you might imagine that our $x, y$ data is a more accurate version of a previous experiment, where we've found that the intercept is $\\theta_0 = 50 \\pm 30$ and the slope is $\\theta_1 = 1.0 \\pm 0.5$.\n", 205 | "In this case, we'd encode this prior knowledge in the prior distribution itself." 206 | ] 207 | }, 208 | { 209 | "cell_type": "markdown", 210 | "metadata": {}, 211 | "source": [ 212 | "### 2. Flat Priors\n", 213 | "\n", 214 | "If you don't have an empirical prior, you might be tempted to simply use a *flat prior* – i.e. a prior that is constant between two reasonable limits (i.e. 
equal probability slopes from -1000 to +1000).\n", 215 | "\n", 216 | "The problem is that flat priors are not always non-informative! For example, a flat prior on the slope will effectively give a higher weight to larger slopes.\n", 217 | "We can see this straightforwardly by plotting regularly-spaced slopes between 0 and 20:" 218 | ] 219 | }, 220 | { 221 | "cell_type": "code", 222 | "execution_count": null, 223 | "metadata": { 224 | "collapsed": false 225 | }, 226 | "outputs": [], 227 | "source": [ 228 | "xx = np.linspace(-1, 1)\n", 229 | "for slope in np.linspace(0, 20, 100):\n", 230 | " plt.plot(xx, slope * xx, '-k', linewidth=1)\n", 231 | "plt.axis([-1, 1, -1, 1], aspect='equal');" 232 | ] 233 | }, 234 | { 235 | "cell_type": "markdown", 236 | "metadata": {}, 237 | "source": [ 238 | "The density of the lines is a proxy for the probability of those slopes with a flat prior.\n", 239 | "This is an important point to realize: **flat priors are not necessarily minimally informative**." 240 | ] 241 | }, 242 | { 243 | "cell_type": "markdown", 244 | "metadata": {}, 245 | "source": [ 246 | "### 3. Non-informative Priors\n", 247 | "\n", 248 | "What we *really* want in cases where no empirical prior is available is a **non-informative prior**. Among other things, such a prior should not depend on the units of the data.\n", 249 | "Perhaps the most principled approach to choosing non-informative priors was the *principle of maximum entropy* advocated by Jaynes ([book](http://omega.albany.edu:8008/JaynesBook.html)).\n", 250 | "\n", 251 | "Similar in spirit is the commonly-used [Jeffreys Prior](https://en.wikipedia.org/wiki/Jeffreys_prior), which in many cases of interest amounts to a \"scale invariant\" prior: a flat prior on the logarithm of the parameter.\n", 252 | "\n", 253 | "In the case of the linear slope, we often want a prior which does not artificially over-weight large slopes: there are a couple possible approaches to this (see http://arxiv.org/abs/1411.5018 for some discussion). For our situation, we might use a flat prior on the angle the line makes with the x-axis, which gives\n", 254 | "\n", 255 | "$$\n", 256 | "P(m) \\propto (1 + m^2)^{-3/2}\n", 257 | "$$\n", 258 | "\n", 259 | "For lack of a better term, I like to call this a \"symmetric prior\" on the slope (because it's the same whether we're fitting $y = mx + b$ or $x = m^\\prime y + b^\\prime$)." 
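
To see how differently the two choices weight steep slopes, here is a small sketch plotting both (unnormalized) prior densities as a function of the slope; the plotting range is an arbitrary choice.

```python
m = np.linspace(-10, 10, 1000)
plt.plot(m, np.ones_like(m), label='flat prior (unnormalized)')
plt.plot(m, (1 + m ** 2) ** -1.5, label='symmetric prior $(1 + m^2)^{-3/2}$')
plt.xlabel('slope $m$')
plt.ylabel('prior density (unnormalized)')
plt.legend();
```
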
260 | ] 261 | }, 262 | { 263 | "cell_type": "markdown", 264 | "metadata": {}, 265 | "source": [ 266 | "### Implementation\n", 267 | "\n", 268 | "Let's define two python functions to compute the options for our prior: we'll use both a (log) flat prior and a (log) symmetric prior.\n", 269 | "In general, we need not worry about the normalization of the prior or the likelihood, which makes our lives easier:" 270 | ] 271 | }, 272 | { 273 | "cell_type": "code", 274 | "execution_count": null, 275 | "metadata": { 276 | "collapsed": true 277 | }, 278 | "outputs": [], 279 | "source": [ 280 | "def log_flat_prior(theta):\n", 281 | " if np.all(np.abs(theta) < 1000):\n", 282 | " return 0 # log(1)\n", 283 | " else:\n", 284 | " return -np.inf # log(0)\n", 285 | " \n", 286 | "def log_symmetric_prior(theta):\n", 287 | " if np.abs(theta[0]) < 1000:\n", 288 | " return -1.5 * np.log(1 + theta[1] ** 2)\n", 289 | " else:\n", 290 | " return -np.inf # log(0)" 291 | ] 292 | }, 293 | { 294 | "cell_type": "markdown", 295 | "metadata": {}, 296 | "source": [ 297 | "With these defined, we now have what we need to compute the log posterior as a function of the model parameters.\n", 298 | "You might be tempted to maximize this posterior in the same way that we did with the likelihood above, but this is not a Bayesian result! The Bayesian result is a (possibly marginalized) posterior probability for our parameters.\n", 299 | "The mode of a probability distribution is perhaps slightly informative, but it is in no way a Bayesian result.\n", 300 | "\n", 301 | "In the breakout, we will take a few minutes now to plot the posterior probability as a function of the slope and intercept." 302 | ] 303 | }, 304 | { 305 | "cell_type": "markdown", 306 | "metadata": {}, 307 | "source": [ 308 | "## Breakout\n", 309 | "\n", 310 | "1. Using matplotlib, plot the posterior probability distribution for the slope and intercept, once for each prior. I would suggest using ``plt.contourf()`` or ``plt.pcolor()``. How different are the distributions?\n", 311 | "\n", 312 | "2. Modify the dataset – how do the results change if you have very few data points or very large errors?\n", 313 | "\n", 314 | "3. If you finish this quickly, try adding 1-sigma and 2-sigma contours to your plot, keeping in mind that the probabilities are not normalized! You can add them to your plot with ``plt.contour()``." 
315 | ] 316 | }, 317 | { 318 | "cell_type": "code", 319 | "execution_count": null, 320 | "metadata": { 321 | "collapsed": false 322 | }, 323 | "outputs": [], 324 | "source": [ 325 | "# Write your code here" 326 | ] 327 | }, 328 | { 329 | "cell_type": "code", 330 | "execution_count": null, 331 | "metadata": { 332 | "collapsed": false 333 | }, 334 | "outputs": [], 335 | "source": [] 336 | }, 337 | { 338 | "cell_type": "code", 339 | "execution_count": null, 340 | "metadata": { 341 | "collapsed": true 342 | }, 343 | "outputs": [], 344 | "source": [] 345 | }, 346 | { 347 | "cell_type": "code", 348 | "execution_count": null, 349 | "metadata": { 350 | "collapsed": true 351 | }, 352 | "outputs": [], 353 | "source": [] 354 | }, 355 | { 356 | "cell_type": "code", 357 | "execution_count": null, 358 | "metadata": { 359 | "collapsed": true 360 | }, 361 | "outputs": [], 362 | "source": [] 363 | }, 364 | { 365 | "cell_type": "code", 366 | "execution_count": null, 367 | "metadata": { 368 | "collapsed": true 369 | }, 370 | "outputs": [], 371 | "source": [] 372 | }, 373 | { 374 | "cell_type": "markdown", 375 | "metadata": {}, 376 | "source": [ 377 | "See the [Solutions-02](Solutions-02.ipynb#Breakout) notebook for one possible approach to answering these questions. But give this a good try before simply clicking! It's by working through this yourself that you will really learn the material." 378 | ] 379 | } 380 | ], 381 | "metadata": { 382 | "kernelspec": { 383 | "display_name": "Python 3", 384 | "language": "python", 385 | "name": "python3" 386 | }, 387 | "language_info": { 388 | "codemirror_mode": { 389 | "name": "ipython", 390 | "version": 3 391 | }, 392 | "file_extension": ".py", 393 | "mimetype": "text/x-python", 394 | "name": "python", 395 | "nbconvert_exporter": "python", 396 | "pygments_lexer": "ipython3", 397 | "version": "3.5.1" 398 | } 399 | }, 400 | "nbformat": 4, 401 | "nbformat_minor": 0 402 | } 403 | -------------------------------------------------------------------------------- /Solutions-04.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Solutions-04\n", 8 | "\n", 9 | "As we have seen, the standard Bayesian approach to model fitting involves sampling the posterior, usually via a variant of Markov Chain Monte Carlo (MCMC). 
Though there are many very sophisticated MCMC samplers out there, the simplest algorithm (Metropolis-Hastings) is rather straightforward to code.\n", 10 | "\n", 11 | "Here we'll walk through creating our own Metropolis-Hastings sampler from scratch, in order to better understand exactly what is going on under the hood.\n", 12 | "\n", 13 | "This notebook contains one possible solution to the [04-Implementing-MCMC](04-Implementing-MCMC.ipynb) breakout (try to make an honest effort at that notebook before reading on!)" 14 | ] 15 | }, 16 | { 17 | "cell_type": "markdown", 18 | "metadata": {}, 19 | "source": [ 20 | "## Preliminaries\n", 21 | "\n", 22 | "As usual, we start with some imports:" 23 | ] 24 | }, 25 | { 26 | "cell_type": "code", 27 | "execution_count": null, 28 | "metadata": { 29 | "collapsed": false 30 | }, 31 | "outputs": [], 32 | "source": [ 33 | "%matplotlib inline\n", 34 | "import numpy as np\n", 35 | "import matplotlib.pyplot as plt\n", 36 | "from scipy import stats\n", 37 | "\n", 38 | "# use seaborn plotting defaults\n", 39 | "# If this causes an error, you can comment it out.\n", 40 | "import seaborn; seaborn.set()" 41 | ] 42 | }, 43 | { 44 | "cell_type": "markdown", 45 | "metadata": {}, 46 | "source": [ 47 | "## Metropolis-Hastings Procedure\n", 48 | "\n", 49 | "Recall the Metropolis-Hastings procedure:\n", 50 | "\n", 51 | "1. Define a posterior $p(\\theta~|~D, I)$\n", 52 | "2. Define a *proposal density* $p(\\theta_{i + 1}~|~\\theta_i)$, which must be a symmetric function, but otherwise is unconstrained (a Gaussian is the usual choice).\n", 53 | "3. Choose a starting point $\\theta_0$\n", 54 | "4. Repeat the following:\n", 55 | "\n", 56 | " 1. Given $\\theta_i$, draw a new $\\theta_{i + 1}$ from the proposal distribution\n", 57 | " \n", 58 | " 2. Compute the *acceptance ratio*\n", 59 | " $$\n", 60 | " a = \\frac{p(\\theta_{i + 1}~|~D,I)}{p(\\theta_i~|~D,I)}\n", 61 | " $$\n", 62 | " \n", 63 | " 3. If $a \\ge 1$, the proposal is more likely: accept the draw and add $\\theta_{i + 1}$ to the chain.\n", 64 | " \n", 65 | " 4. If $a < 1$, then accept the point with probability $a$: this can be done by drawing a uniform random number $r$ and checking if $r < a$. If the point is accepted, add $\\theta_{i + 1}$ to the chain. If not, then add $\\theta_i$ to the chain *again*.\n", 66 | " \n", 67 | "The goal is to produce a \"chain\", i.e. a list of $\\theta$ values, where each $\\theta$ is a vector of parameters for your model.\n", 68 | "Here we'll write a simple Metropolis-Hastings sampler in Python.\n", 69 | "\n", 70 | "Note that the ``np.random.randn()`` function will be useful: it returns a pseudorandom value drawn from a standard normal distribution (i.e. mean of zero and variance of 1)."
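
One detail worth spelling out before writing the sampler: since we work with *log*-probabilities, "accept with probability $a$" becomes a comparison of log-quantities. The fragment below is purely illustrative (the two log-posterior values are made up):

```python
import numpy as np

log_p_old, log_p_new = -12.3, -11.8            # hypothetical log-posterior values
log_a = log_p_new - log_p_old                  # log of the acceptance ratio a
r = np.random.rand()
accept = (log_a >= 0) or (log_a > np.log(r))   # i.e. a >= 1, or r < a
print(accept)
```
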
71 | ] 72 | }, 73 | { 74 | "cell_type": "markdown", 75 | "metadata": {}, 76 | "source": [ 77 | "## The Data\n", 78 | "\n", 79 | "We'll use data drawn from a straight line model" 80 | ] 81 | }, 82 | { 83 | "cell_type": "code", 84 | "execution_count": null, 85 | "metadata": { 86 | "collapsed": false 87 | }, 88 | "outputs": [], 89 | "source": [ 90 | "def make_data(intercept, slope, N=20, dy=2, rseed=42):\n", 91 | " rand = np.random.RandomState(rseed)\n", 92 | " x = 100 * rand.rand(20)\n", 93 | " y = intercept + slope * x\n", 94 | " y += dy * rand.randn(20)\n", 95 | " return x, y, dy * np.ones_like(x)\n", 96 | "\n", 97 | "theta_true = (2, 0.5)\n", 98 | "x, y, dy = make_data(*theta_true)" 99 | ] 100 | }, 101 | { 102 | "cell_type": "markdown", 103 | "metadata": {}, 104 | "source": [ 105 | "## Exercise\n", 106 | "\n", 107 | "Walk through all the following steps, filling-in the code along the way." 108 | ] 109 | }, 110 | { 111 | "cell_type": "markdown", 112 | "metadata": {}, 113 | "source": [ 114 | "First plot the data to see what we're looking at (Use a ``plt.errorbar()`` plot with the provided data)" 115 | ] 116 | }, 117 | { 118 | "cell_type": "code", 119 | "execution_count": null, 120 | "metadata": { 121 | "collapsed": false 122 | }, 123 | "outputs": [], 124 | "source": [ 125 | "plt.errorbar(x, y, dy, fmt='o');" 126 | ] 127 | }, 128 | { 129 | "cell_type": "markdown", 130 | "metadata": {}, 131 | "source": [ 132 | "We're going to fit a line to the data, as we've done through the lecture:" 133 | ] 134 | }, 135 | { 136 | "cell_type": "code", 137 | "execution_count": null, 138 | "metadata": { 139 | "collapsed": false 140 | }, 141 | "outputs": [], 142 | "source": [ 143 | "def model(theta, x):\n", 144 | " # the `theta` argument is a list of parameter values, e.g., theta = [m, b] for a line\n", 145 | " return theta[0] + theta[1] * x" 146 | ] 147 | }, 148 | { 149 | "cell_type": "markdown", 150 | "metadata": {}, 151 | "source": [ 152 | "---\n", 153 | "\n", 154 | "We'll start with the assumption that the data are independent and identically distributed so that the likelihood is simply a product of Gaussians (one big Gaussian). We'll also assume that the uncertainties reported are correct, and that there are no uncertainties on the `x` data. We need to define a function that will evaluate the (ln)likelihood of the data, given a particular choice of your model parameters. A good way to structure this function is as follows:" 155 | ] 156 | }, 157 | { 158 | "cell_type": "code", 159 | "execution_count": null, 160 | "metadata": { 161 | "collapsed": false 162 | }, 163 | "outputs": [], 164 | "source": [ 165 | "def ln_likelihood(theta, x, y, dy):\n", 166 | " # we will pass the parameters (theta) to the model function\n", 167 | " # the other arguments are the data\n", 168 | " return -0.5 * np.sum(np.log(2 * np.pi * dy ** 2)\n", 169 | " + ((y - model(theta, x)) / dy) ** 2)" 170 | ] 171 | }, 172 | { 173 | "cell_type": "markdown", 174 | "metadata": {}, 175 | "source": [ 176 | "What about priors? Remember your prior only depends on the model parameters, but be careful about what kind of prior you are specifying for each parameter. Do we need to properly normalize the probabilities?" 
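
Because Metropolis-Hastings only ever uses *ratios* of posterior values, neither the prior nor the likelihood needs to be normalized. The solution below simply uses a completely flat (improper) prior; a bounded variant in the spirit of the earlier notebooks might look like this sketch (the ±1000 bounds are an arbitrary choice carried over from notebook 02):

```python
def ln_prior_bounded(theta):
    # flat inside a large box, zero outside: log(1) = 0, log(0) = -inf
    if np.all(np.abs(theta) < 1000):
        return 0.0
    return -np.inf
```
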
177 | ] 178 | }, 179 | { 180 | "cell_type": "code", 181 | "execution_count": null, 182 | "metadata": { 183 | "collapsed": false 184 | }, 185 | "outputs": [], 186 | "source": [ 187 | "def ln_prior(theta):\n", 188 | " # flat prior: log(1) = 0\n", 189 | " return 0" 190 | ] 191 | }, 192 | { 193 | "cell_type": "markdown", 194 | "metadata": {}, 195 | "source": [ 196 | "Now we can define a function that evaluates the log-posterior probability, which is just the sum of the log-prior and log-likelihood:" 197 | ] 198 | }, 199 | { 200 | "cell_type": "code", 201 | "execution_count": null, 202 | "metadata": { 203 | "collapsed": false 204 | }, 205 | "outputs": [], 206 | "source": [ 207 | "def ln_posterior(theta, x, y, dy):\n", 208 | " return ln_prior(theta) + ln_likelihood(theta, x, y, dy)" 209 | ] 210 | }, 211 | { 212 | "cell_type": "markdown", 213 | "metadata": {}, 214 | "source": [ 215 | "Now write a function to actually run a Metropolis-Hastings MCMC sampler. Ford (2005) includes a great step-by-step walkthrough of the Metropolis-Hastings algorithm, and we'll base our code on that. Fill-in the steps mentioned in the comments below:" 216 | ] 217 | }, 218 | { 219 | "cell_type": "code", 220 | "execution_count": null, 221 | "metadata": { 222 | "collapsed": false 223 | }, 224 | "outputs": [], 225 | "source": [ 226 | "def run_mcmc(ln_posterior, nsteps, ndim, theta0, stepsize, args=()):\n", 227 | " \"\"\"\n", 228 | " Run a Markov Chain Monte Carlo\n", 229 | " \n", 230 | " Parameters\n", 231 | " ----------\n", 232 | " ln_posterior: callable\n", 233 | " our function to compute the posterior\n", 234 | " nsteps: int\n", 235 | " the number of steps in the chain\n", 236 | " theta0: list\n", 237 | " the starting guess for theta\n", 238 | " stepsize: float\n", 239 | " a parameter controlling the size of the random step\n", 240 | " e.g. 
it could be the width of the Gaussian distribution\n", 241 | " args: tuple (optional)\n", 242 | " additional arguments passed to ln_posterior\n", 243 | " \"\"\"\n", 244 | " # Create the array of size (nsteps, ndims) to hold the chain\n", 245 | " # Initialize the first row of this with theta0\n", 246 | " chain = np.zeros((nsteps, ndim))\n", 247 | " chain[0] = theta0\n", 248 | " \n", 249 | " # Create the array of size nsteps to hold the log-posterior values for each point\n", 250 | " # Initialize the first entry of this with the log posterior at theta0\n", 251 | " log_likes = np.zeros(nsteps)\n", 252 | " log_likes[0] = ln_posterior(chain[0], *args)\n", 253 | " \n", 254 | " # Loop for nsteps\n", 255 | " for i in range(1, nsteps):\n", 256 | " # Randomly draw a new theta from the proposal distribution.\n", 257 | " # for example, you can do a normally-distributed step by utilizing\n", 258 | " # the np.random.randn() function\n", 259 | " theta_new = chain[i - 1] + stepsize * np.random.randn(ndim)\n", 260 | " \n", 261 | " # Calculate the log-posterior probability for the new state\n", 262 | " log_like_new = ln_posterior(theta_new, *args)\n", 263 | " \n", 264 | " # Compare it to the probability of the old state\n", 265 | " # Using the acceptance probability function\n", 266 | " # (remember that you've computed the log probability, not the probability!)\n", 267 | " log_p_accept = log_like_new - log_likes[i - 1]\n", 268 | " \n", 269 | " # Choose a random number r between 0 and 1 to compare with p_accept\n", 270 | " r = np.random.rand()\n", 271 | " \n", 272 | " # If p_accept>1 or p_accept>r, accept the step\n", 273 | " # Else, do not accept the step\n", 274 | " if log_p_accept > np.log(r):\n", 275 | " chain[i] = theta_new\n", 276 | " log_likes[i] = log_like_new\n", 277 | " else:\n", 278 | " chain[i] = chain[i - 1]\n", 279 | " log_likes[i] = log_likes[i - 1]\n", 280 | " \n", 281 | " return chain" 282 | ] 283 | }, 284 | { 285 | "cell_type": "markdown", 286 | "metadata": {}, 287 | "source": [ 288 | "Now run the MCMC code on the data provided." 289 | ] 290 | }, 291 | { 292 | "cell_type": "code", 293 | "execution_count": null, 294 | "metadata": { 295 | "collapsed": false 296 | }, 297 | "outputs": [], 298 | "source": [ 299 | "chain = run_mcmc(ln_posterior, 10000, 2, [0, 1], 0.1, (x, y, dy))" 300 | ] 301 | }, 302 | { 303 | "cell_type": "markdown", 304 | "metadata": {}, 305 | "source": [ 306 | "Plot the position of the walker as a function of step number for each of the parameters. Are the chains converged? " 307 | ] 308 | }, 309 | { 310 | "cell_type": "code", 311 | "execution_count": null, 312 | "metadata": { 313 | "collapsed": false 314 | }, 315 | "outputs": [], 316 | "source": [ 317 | "fig, ax = plt.subplots(2)\n", 318 | "ax[0].plot(chain[:, 0])\n", 319 | "ax[1].plot(chain[:, 1]);" 320 | ] 321 | }, 322 | { 323 | "cell_type": "code", 324 | "execution_count": null, 325 | "metadata": { 326 | "collapsed": true 327 | }, 328 | "outputs": [], 329 | "source": [ 330 | "# Now that we've burned-in, let's get a fresh chain\n", 331 | "chain = run_mcmc(ln_posterior, 20000, 2, chain[-1], 0.1, (x, y, dy))" 332 | ] 333 | }, 334 | { 335 | "cell_type": "code", 336 | "execution_count": null, 337 | "metadata": { 338 | "collapsed": false 339 | }, 340 | "outputs": [], 341 | "source": [ 342 | "fig, ax = plt.subplots(2)\n", 343 | "ax[0].plot(chain[:, 0])\n", 344 | "ax[1].plot(chain[:, 1]);" 345 | ] 346 | }, 347 | { 348 | "cell_type": "markdown", 349 | "metadata": {}, 350 | "source": [ 351 | "Make histograms of the samples for each parameter. 
Should you include all of the samples? " 352 | ] 353 | }, 354 | { 355 | "cell_type": "code", 356 | "execution_count": null, 357 | "metadata": { 358 | "collapsed": false 359 | }, 360 | "outputs": [], 361 | "source": [ 362 | "fig, ax = plt.subplots(2)\n", 363 | "ax[0].hist(chain[:, 0], alpha=0.5)\n", 364 | "ax[1].hist(chain[:, 1], alpha=0.5);" 365 | ] 366 | }, 367 | { 368 | "cell_type": "markdown", 369 | "metadata": {}, 370 | "source": [ 371 | "It's also sometimes useful to view a two-dimensional histogram:" 372 | ] 373 | }, 374 | { 375 | "cell_type": "code", 376 | "execution_count": null, 377 | "metadata": { 378 | "collapsed": false 379 | }, 380 | "outputs": [], 381 | "source": [ 382 | "plt.hist2d(chain[:, 0], chain[:, 1], bins=30,\n", 383 | " cmap='Blues')\n", 384 | "plt.xlabel('intercept')\n", 385 | "plt.ylabel('slope')\n", 386 | "plt.grid(False);" 387 | ] 388 | }, 389 | { 390 | "cell_type": "markdown", 391 | "metadata": {}, 392 | "source": [ 393 | "Report to us your constraints on the model parameters.\n", 394 | "This is the number for the abstract – the challenge is to figure out how to accurately summarize a multi-dimensional posterior (which is **the result** in Bayesianism) with a few numbers (which is what readers want to see as they skim the arXiv).\n", 395 | "\n", 396 | "What numbers should you use?" 397 | ] 398 | }, 399 | { 400 | "cell_type": "code", 401 | "execution_count": null, 402 | "metadata": { 403 | "collapsed": false 404 | }, 405 | "outputs": [], 406 | "source": [ 407 | "theta_best = chain.mean(0)\n", 408 | "theta_std = chain.std(0)\n", 409 | "\n", 410 | "print(\"true intercept:\", theta_true[0])\n", 411 | "print(\"true slope:\", theta_true[1])\n", 412 | "print()\n", 413 | "print(\"intercept = {0:.1f} +/- {1:.1f}\".format(theta_best[0], theta_std[0]))\n", 414 | "print(\"slope = {0:.2f} +/- {1:.2f}\".format(theta_best[1], theta_std[1]))" 415 | ] 416 | } 417 | ], 418 | "metadata": { 419 | "kernelspec": { 420 | "display_name": "Python 3", 421 | "language": "python", 422 | "name": "python3" 423 | }, 424 | "language_info": { 425 | "codemirror_mode": { 426 | "name": "ipython", 427 | "version": 3 428 | }, 429 | "file_extension": ".py", 430 | "mimetype": "text/x-python", 431 | "name": "python", 432 | "nbconvert_exporter": "python", 433 | "pygments_lexer": "ipython3", 434 | "version": "3.5.1" 435 | } 436 | }, 437 | "nbformat": 4, 438 | "nbformat_minor": 0 439 | } 440 | -------------------------------------------------------------------------------- /Solutions-05.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Solution-05\n", 8 | "\n", 9 | "This notebook contains the solution to the Bayesian model of 47 Ursae Majoris.\n", 10 | "\n", 11 | "Your task to find the orbital parameters of the first planet in this system, using data from table 1 of [Fischer et al 2002](http://iopscience.iop.org/article/10.1086/324336/meta).\n", 12 | "\n", 13 | "The following IPython magic command will create the data file ``47UrsaeMajoris.txt`` in the current directory:" 14 | ] 15 | }, 16 | { 17 | "cell_type": "code", 18 | "execution_count": null, 19 | "metadata": { 20 | "collapsed": false 21 | }, 22 | "outputs": [], 23 | "source": [ 24 | "%%file 47UrsaeMajoris.txt\n", 25 | "date rv rv_err\n", 26 | "6959.737 -60.48 14.00\n", 27 | "7194.912 -53.60 7.49\n", 28 | "7223.798 -38.36 6.14\n", 29 | "7964.893 0.60 8.19\n", 30 | "8017.730 -28.29 10.57\n", 31 | "8374.771 -40.25 
9.37\n", 32 | "8647.897 42.37 11.41\n", 33 | "8648.910 32.64 11.02\n", 34 | "8670.878 55.45 11.45\n", 35 | "8745.691 51.78 8.76\n", 36 | "8992.061 4.49 11.21\n", 37 | "9067.771 -14.63 7.00\n", 38 | "9096.734 -26.06 6.79\n", 39 | "9122.691 -47.38 7.91\n", 40 | "9172.686 -38.22 10.55\n", 41 | "9349.912 -52.21 9.52\n", 42 | "9374.964 -48.69 8.67\n", 43 | "9411.839 -36.01 12.81\n", 44 | "9481.720 -52.46 13.40\n", 45 | "9767.918 38.58 5.48\n", 46 | "9768.908 36.68 5.02\n", 47 | "9802.789 37.93 3.85\n", 48 | "10058.079 15.82 3.45\n", 49 | "10068.980 15.46 4.63\n", 50 | "10072.012 21.20 4.09\n", 51 | "10088.994 1.30 4.25\n", 52 | "10089.947 6.12 3.70\n", 53 | "10091.900 0.00 4.16\n", 54 | "10120.918 4.07 4.16\n", 55 | "10124.905 0.29 3.74\n", 56 | "10125.823 -1.87 3.79\n", 57 | "10127.898 -0.68 4.10\n", 58 | "10144.877 -4.13 5.26\n", 59 | "10150.797 -8.14 4.18\n", 60 | "10172.829 -10.79 4.43\n", 61 | "10173.762 -9.33 5.43\n", 62 | "10181.742 -23.87 3.28\n", 63 | "10187.740 -16.70 4.67\n", 64 | "10199.730 -16.29 3.98\n", 65 | "10203.733 -21.84 4.92\n", 66 | "10214.731 -24.51 3.67\n", 67 | "10422.018 -56.63 4.23\n", 68 | "10438.001 -39.61 3.91\n", 69 | "10442.027 -44.62 4.05\n", 70 | "10502.853 -32.05 4.69\n", 71 | "10504.859 -39.08 4.65\n", 72 | "10536.845 -22.46 5.18\n", 73 | "10537.842 -22.83 4.16\n", 74 | "10563.673 -17.47 4.03\n", 75 | "10579.697 -11.01 3.84\n", 76 | "10610.719 -8.67 3.52\n", 77 | "10793.957 37.00 3.78\n", 78 | "10795.039 41.85 4.80\n", 79 | "10978.684 36.42 5.01\n", 80 | "11131.066 13.56 6.61\n", 81 | "11175.027 -3.74 8.17\n", 82 | "11242.842 -21.85 5.43\n", 83 | "11303.712 -48.75 4.63\n", 84 | "11508.070 -51.65 8.37\n", 85 | "11536.064 -72.44 4.73\n", 86 | "11540.999 -57.58 5.97\n", 87 | "11607.916 -43.94 4.94\n", 88 | "11626.771 -39.14 7.03\n", 89 | "11627.754 -50.88 6.21\n", 90 | "11628.727 -51.52 5.87\n", 91 | "11629.832 -51.86 4.60\n", 92 | "11700.693 -24.58 5.20\n", 93 | "11861.049 14.64 5.33\n", 94 | "11874.068 14.15 5.75\n", 95 | "11881.045 18.02 4.15\n", 96 | "11895.068 16.96 4.60\n", 97 | "11906.014 11.73 4.07\n", 98 | "11907.011 22.83 4.38\n", 99 | "11909.042 23.42 3.78\n", 100 | "11910.955 18.34 4.33\n", 101 | "11914.067 15.45 5.37\n", 102 | "11915.048 24.05 3.82\n", 103 | "11916.033 23.16 3.67\n", 104 | "11939.969 27.53 5.08\n", 105 | "11946.960 21.44 4.18\n", 106 | "11969.902 30.99 4.58\n", 107 | "11971.894 38.36 5.01\n", 108 | "11998.779 33.82 3.93\n", 109 | "11999.820 27.52 3.98\n", 110 | "12000.858 23.40 4.07\n", 111 | "12028.740 37.08 4.95\n", 112 | "12033.746 26.28 5.24\n", 113 | "12040.759 31.12 3.54\n", 114 | "12041.719 34.04 3.45\n", 115 | "12042.695 31.38 3.98\n", 116 | "12073.723 21.81 4.73" 117 | ] 118 | }, 119 | { 120 | "cell_type": "markdown", 121 | "metadata": {}, 122 | "source": [ 123 | "But first, we'll copy some relevant material from the first notebook." 
124 | ] 125 | }, 126 | { 127 | "cell_type": "markdown", 128 | "metadata": {}, 129 | "source": [ 130 | "## Preliminaries" 131 | ] 132 | }, 133 | { 134 | "cell_type": "code", 135 | "execution_count": null, 136 | "metadata": { 137 | "collapsed": false 138 | }, 139 | "outputs": [], 140 | "source": [ 141 | "%matplotlib inline\n", 142 | "import matplotlib.pyplot as plt\n", 143 | "import numpy as np\n", 144 | "import pandas as pd\n", 145 | "import seaborn; seaborn.set() #nice plot formatting" 146 | ] 147 | }, 148 | { 149 | "cell_type": "markdown", 150 | "metadata": {}, 151 | "source": [ 152 | "## The Radial Velocity Model" 153 | ] 154 | }, 155 | { 156 | "cell_type": "code", 157 | "execution_count": null, 158 | "metadata": { 159 | "collapsed": true 160 | }, 161 | "outputs": [], 162 | "source": [ 163 | "from collections import namedtuple\n", 164 | "params = namedtuple('params', ['T', 'e', 'K', 'V', 'omega', 'chi', 's'])" 165 | ] 166 | }, 167 | { 168 | "cell_type": "code", 169 | "execution_count": null, 170 | "metadata": { 171 | "collapsed": true 172 | }, 173 | "outputs": [], 174 | "source": [ 175 | "from scipy import optimize\n", 176 | "\n", 177 | "@np.vectorize\n", 178 | "def compute_E(M, e):\n", 179 | " \"\"\"Solve Kepler's eqns for eccentric anomaly given mean anomaly\"\"\"\n", 180 | " f = lambda E, M=M, e=e: E - e * np.sin(E) - M\n", 181 | " return optimize.brentq(f, 0, 2 * np.pi)\n", 182 | "\n", 183 | "\n", 184 | "def radial_velocity(t, theta):\n", 185 | " \"\"\"Compute radial velocity given orbital parameters\"\"\"\n", 186 | " T, e, K, V, omega, chi = theta[:6]\n", 187 | " \n", 188 | " # compute mean anomaly (0 <= M < 2pi)\n", 189 | " M = 2 * np.pi * ((t / T + chi) % 1)\n", 190 | " \n", 191 | " # solve for eccentric anomaly\n", 192 | " E = compute_E(M, e)\n", 193 | " \n", 194 | " # compute true anomaly\n", 195 | " f = 2 * np.arctan2(np.sqrt(1 + e) * np.sin(E / 2),\n", 196 | " np.sqrt(1 - e) * np.cos(E / 2))\n", 197 | " \n", 198 | " # compute radial velocity\n", 199 | " return V - K * (np.sin(f + omega) + e * np.sin(omega))" 200 | ] 201 | }, 202 | { 203 | "cell_type": "markdown", 204 | "metadata": {}, 205 | "source": [ 206 | "## The Prior, Likelihood, and Posterior" 207 | ] 208 | }, 209 | { 210 | "cell_type": "code", 211 | "execution_count": null, 212 | "metadata": { 213 | "collapsed": false 214 | }, 215 | "outputs": [], 216 | "source": [ 217 | "theta_lim = params(T=(0.2, 2000),\n", 218 | " e=(0, 1),\n", 219 | " K=(0.01, 2000),\n", 220 | " V=(-2000, 2000),\n", 221 | " omega=(0, 2 * np.pi),\n", 222 | " chi=(0, 1),\n", 223 | " s=(0.001, 100))\n", 224 | "theta_min, theta_max = map(np.array, zip(*theta_lim))\n", 225 | "\n", 226 | "def log_prior(theta):\n", 227 | " if np.any(theta < theta_min) or np.any(theta > theta_max):\n", 228 | " return -np.inf # log(0)\n", 229 | " \n", 230 | " # Jeffreys Prior on T, K, and s\n", 231 | " return -np.sum(np.log(theta[[0, 2, 6]]))\n", 232 | "\n", 233 | "def log_likelihood(theta, t, rv, rv_err):\n", 234 | " sq_err = rv_err ** 2 + theta[6] ** 2\n", 235 | " rv_model = radial_velocity(t, theta)\n", 236 | " return -0.5 * np.sum(np.log(sq_err) + (rv - rv_model) ** 2 / sq_err)\n", 237 | "\n", 238 | "def log_posterior(theta, t, rv, rv_err):\n", 239 | " ln_prior = log_prior(theta)\n", 240 | " if np.isinf(ln_prior):\n", 241 | " return ln_prior\n", 242 | " else:\n", 243 | " return ln_prior + log_likelihood(theta, t, rv, rv_err)\n", 244 | " \n", 245 | "def make_starting_guess(t, rv, rv_err):\n", 246 | " model = LombScargleFast()\n", 247 | " 
model.optimizer.set(period_range=theta_lim.T,\n", 248 | " quiet=True)\n", 249 | " model.fit(t, rv, rv_err)\n", 250 | "\n", 251 | " rv_range = 0.5 * (np.max(rv) - np.min(rv))\n", 252 | " rv_center = np.mean(rv)\n", 253 | " return params(T=model.best_period,\n", 254 | " e=0.1,\n", 255 | " K=rv_range,\n", 256 | " V=rv_center,\n", 257 | " omega=np.pi,\n", 258 | " chi=0.5,\n", 259 | " s=rv_err.mean())" 260 | ] 261 | }, 262 | { 263 | "cell_type": "markdown", 264 | "metadata": {}, 265 | "source": [ 266 | "## The Data" 267 | ] 268 | }, 269 | { 270 | "cell_type": "markdown", 271 | "metadata": {}, 272 | "source": [ 273 | "After running the above command, your data will be in a file called ``47UrsaeMajoris.txt``.\n", 274 | "\n", 275 | "An easy way to load data in this form is the [pandas](http://pandas.pydata.org) package, which implements a DataFrame object (basically, a labeled data table).\n", 276 | "Reading the CSV file is a one-line operation:" 277 | ] 278 | }, 279 | { 280 | "cell_type": "code", 281 | "execution_count": null, 282 | "metadata": { 283 | "collapsed": false 284 | }, 285 | "outputs": [], 286 | "source": [ 287 | "import pandas as pd\n", 288 | "data = pd.read_csv('47UrsaeMajoris.txt', delim_whitespace=True)\n", 289 | "t, rv, rv_err = data.values.T" 290 | ] 291 | }, 292 | { 293 | "cell_type": "markdown", 294 | "metadata": {}, 295 | "source": [ 296 | "### Visualize the Data" 297 | ] 298 | }, 299 | { 300 | "cell_type": "code", 301 | "execution_count": null, 302 | "metadata": { 303 | "collapsed": false 304 | }, 305 | "outputs": [], 306 | "source": [ 307 | "# Start by Visualizing the Data\n", 308 | "plt.errorbar(t, rv, rv_err, fmt='.');" 309 | ] 310 | }, 311 | { 312 | "cell_type": "markdown", 313 | "metadata": {}, 314 | "source": [ 315 | "### Compute the Periodogram" 316 | ] 317 | }, 318 | { 319 | "cell_type": "code", 320 | "execution_count": null, 321 | "metadata": { 322 | "collapsed": false 323 | }, 324 | "outputs": [], 325 | "source": [ 326 | "# Compute the periodogram to look for significant periodicity\n", 327 | "\n", 328 | "from gatspy.periodic import LombScargleFast\n", 329 | "model = LombScargleFast()\n", 330 | "\n", 331 | "model.fit(t, rv, rv_err)\n", 332 | "periods, power = model.periodogram_auto()\n", 333 | "plt.semilogx(periods, power)\n", 334 | "plt.xlim(0, 10000);" 335 | ] 336 | }, 337 | { 338 | "cell_type": "markdown", 339 | "metadata": {}, 340 | "source": [ 341 | "### Initialize and run the MCMC Chain" 342 | ] 343 | }, 344 | { 345 | "cell_type": "code", 346 | "execution_count": null, 347 | "metadata": { 348 | "collapsed": false 349 | }, 350 | "outputs": [], 351 | "source": [ 352 | "theta_guess = make_starting_guess(t, rv, rv_err)\n", 353 | "theta_guess" 354 | ] 355 | }, 356 | { 357 | "cell_type": "code", 358 | "execution_count": null, 359 | "metadata": { 360 | "collapsed": false 361 | }, 362 | "outputs": [], 363 | "source": [ 364 | "import emcee\n", 365 | "\n", 366 | "ndim = len(theta_guess) # number of parameters in the model\n", 367 | "nwalkers = 50 # number of MCMC walkers\n", 368 | "\n", 369 | "# start with a tight distribution of theta around the initial guess\n", 370 | "rng = np.random.RandomState(42)\n", 371 | "starting_guesses = theta_guess * (1 + 0.1 * rng.randn(nwalkers, ndim))\n", 372 | "\n", 373 | "# create the sampler; fix the random state for replicability\n", 374 | "sampler = emcee.EnsembleSampler(nwalkers, ndim, log_posterior, args=(t, rv, rv_err))\n", 375 | "sampler.random_state = rng\n", 376 | "\n", 377 | "# time and run the MCMC\n", 378 | "%time pos, prob, 
state = sampler.run_mcmc(starting_guesses, 1000)" 379 | ] 380 | }, 381 | { 382 | "cell_type": "markdown", 383 | "metadata": {}, 384 | "source": [ 385 | "### Plot the chains: have they stabilized?" 386 | ] 387 | }, 388 | { 389 | "cell_type": "code", 390 | "execution_count": null, 391 | "metadata": { 392 | "collapsed": false 393 | }, 394 | "outputs": [], 395 | "source": [ 396 | "def plot_chains(sampler):\n", 397 | " fig, ax = plt.subplots(7, figsize=(8, 10), sharex=True)\n", 398 | " for i in range(7):\n", 399 | " ax[i].plot(sampler.chain[:, :, i].T, '-k', alpha=0.2);\n", 400 | " ax[i].set_ylabel(params._fields[i])\n", 401 | "\n", 402 | "plot_chains(sampler)" 403 | ] 404 | }, 405 | { 406 | "cell_type": "code", 407 | "execution_count": null, 408 | "metadata": { 409 | "collapsed": false 410 | }, 411 | "outputs": [], 412 | "source": [ 413 | "sampler.reset()\n", 414 | "%time pos, prob, state = sampler.run_mcmc(pos, 1000)" 415 | ] 416 | }, 417 | { 418 | "cell_type": "code", 419 | "execution_count": null, 420 | "metadata": { 421 | "collapsed": false 422 | }, 423 | "outputs": [], 424 | "source": [ 425 | "plot_chains(sampler)" 426 | ] 427 | }, 428 | { 429 | "cell_type": "markdown", 430 | "metadata": {}, 431 | "source": [] 432 | }, 433 | { 434 | "cell_type": "markdown", 435 | "metadata": {}, 436 | "source": [ 437 | "### Make a Corner Plot for the Parameters" 438 | ] 439 | }, 440 | { 441 | "cell_type": "code", 442 | "execution_count": null, 443 | "metadata": { 444 | "collapsed": false 445 | }, 446 | "outputs": [], 447 | "source": [ 448 | "import corner\n", 449 | "corner.corner(sampler.flatchain[:, :2], \n", 450 | " labels=params._fields[:2]);" 451 | ] 452 | }, 453 | { 454 | "cell_type": "markdown", 455 | "metadata": {}, 456 | "source": [ 457 | "### Plot the Model Fits over the Data" 458 | ] 459 | }, 460 | { 461 | "cell_type": "code", 462 | "execution_count": null, 463 | "metadata": { 464 | "collapsed": false 465 | }, 466 | "outputs": [], 467 | "source": [ 468 | "t_fit = np.linspace(t.min(), t.max(), 1000)\n", 469 | "rv_fit = [radial_velocity(t_fit, sampler.flatchain[i])\n", 470 | " for i in rng.choice(sampler.flatchain.shape[0], 200)]\n", 471 | "\n", 472 | "plt.figure(figsize=(14, 6))\n", 473 | "plt.errorbar(t, rv, rv_err, fmt='.k')\n", 474 | "plt.plot(t_fit, np.transpose(rv_fit), '-k', alpha=0.01)\n", 475 | "plt.xlabel('time (days)')\n", 476 | "plt.ylabel('radial velocity (km/s)');" 477 | ] 478 | }, 479 | { 480 | "cell_type": "markdown", 481 | "metadata": {}, 482 | "source": [ 483 | "### Results: Report your (joint) uncertainties on period and eccentricity" 484 | ] 485 | }, 486 | { 487 | "cell_type": "code", 488 | "execution_count": null, 489 | "metadata": { 490 | "collapsed": false 491 | }, 492 | "outputs": [], 493 | "source": [ 494 | "mean = sampler.flatchain.mean(0)\n", 495 | "std = sampler.flatchain.std(0)\n", 496 | "\n", 497 | "print(\"Period = {0:.0f} +/- {1:.0f} days\".format(mean[0], std[0]))\n", 498 | "print(\"eccentricity = {0:.2f} +/- {1:.2f}\".format(mean[1], std[1]))" 499 | ] 500 | }, 501 | { 502 | "cell_type": "markdown", 503 | "metadata": {}, 504 | "source": [ 505 | "In this case, a simple mean and standard deviation is probably not the best summary of the data, because the posterior has multiple modes. 
We could isolate the strongest mode and then get a better estimate:" 506 | ] 507 | }, 508 | { 509 | "cell_type": "code", 510 | "execution_count": null, 511 | "metadata": { 512 | "collapsed": false 513 | }, 514 | "outputs": [], 515 | "source": [ 516 | "chain = sampler.flatchain[sampler.flatchain[:, 0] > 1050]\n", 517 | "mean = chain.mean(0)\n", 518 | "std = chain.std(0)\n", 519 | "\n", 520 | "print(\"Period = {0:.0f} +/- {1:.0f} days\".format(mean[0], std[0]))\n", 521 | "print(\"eccentricity = {0:.2f} +/- {1:.2f}\".format(mean[1], std[1]))\n", 522 | "corner.corner(chain[:, :2], \n", 523 | " labels=params._fields[:2]);" 524 | ] 525 | }, 526 | { 527 | "cell_type": "markdown", 528 | "metadata": {}, 529 | "source": [ 530 | "This is a better representation of the parameter estimates for the most prominant peak." 531 | ] 532 | }, 533 | { 534 | "cell_type": "markdown", 535 | "metadata": {}, 536 | "source": [ 537 | "## Extra Credit\n", 538 | "\n", 539 | "If you finish early, try tackling this...\n", 540 | "\n", 541 | "One reason we see multiple modes in the posterior is that there are actually multiple planets in the system! For example, the source of the above data is a paper which actually reports *two* detected planets.\n", 542 | "\n", 543 | "Build a Bayesian model which models two planets simultaneously: can you find signals from both planets in the data?" 544 | ] 545 | } 546 | ], 547 | "metadata": { 548 | "kernelspec": { 549 | "display_name": "Python 3", 550 | "language": "python", 551 | "name": "python3" 552 | }, 553 | "language_info": { 554 | "codemirror_mode": { 555 | "name": "ipython", 556 | "version": 3 557 | }, 558 | "file_extension": ".py", 559 | "mimetype": "text/x-python", 560 | "name": "python", 561 | "nbconvert_exporter": "python", 562 | "pygments_lexer": "ipython3", 563 | "version": "3.5.1" 564 | } 565 | }, 566 | "nbformat": 4, 567 | "nbformat_minor": 0 568 | } 569 | -------------------------------------------------------------------------------- /03-Bayesian-Modeling-With-MCMC.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Bayesian Modeling with MCMC" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "In the previous section we explored a Bayesian solution to a straight line fit.\n", 15 | "The result made use of the evaluation of a posterior across a grid of parameters: a strategy that *will not* scale to higher-dimensional models." 16 | ] 17 | }, 18 | { 19 | "cell_type": "markdown", 20 | "metadata": {}, 21 | "source": [ 22 | "## The Curse of Dimensionality\n", 23 | "\n", 24 | "The reason it will not scale is one of the effects of the ubiquitous \"Curse of Dimensionality\". To understand this, consider how many evaluations we need for an $N$-dimensional grid with 100 samples per dimension\n", 25 | "\n", 26 | "In one dimension, we have $100$ points.\n", 27 | "\n", 28 | "In two dimensions we have $100^2 = 10,000$ evaluations.\n", 29 | "\n", 30 | "In three dimensions, we have $100^3 = 1,000,000$ evaluations.\n", 31 | "\n", 32 | "In $N$ dimensions, we have $100^N$ evaluations, and as $N$ grows this quickly becomes untenable! 
For example, if we have only six model parameters, this \"dense grid\" approach will require evaluating the posterior at one trillion grid points, the results of which would require several terabytes of memory just to store!\n", 33 | "\n", 34 | "Evidently the dense grid strategy will not work for any but the simplest Bayesian models." 35 | ] 36 | }, 37 | { 38 | "cell_type": "markdown", 39 | "metadata": {}, 40 | "source": [ 41 | "## Circumventing the Curse with Sampling\n", 42 | "\n", 43 | "An idea that revolutionized Bayesian modeling (and made possible the wide variety of Bayesian approaches used in practice today) is *Markov Chain Monte Carlo* (MCMC), an approach that allows one to efficiently draw (pseudo)random samples from a posterior distribution even in relatively high dimensions." 44 | ] 45 | }, 46 | { 47 | "cell_type": "markdown", 48 | "metadata": {}, 49 | "source": [ 50 | "### The Metropolis-Hastings Sampler\n", 51 | "\n", 52 | "Perhaps the simplest of MCMC samplers is the *Metropolis-Hastings Sampler*.\n", 53 | "This provides a procedure which, given a pseudo-random number generator, selects a chain of points which (in the long-term limit) will be a representative sample from the posterior. The procedure is surprisingly simple:\n", 54 | "\n", 55 | "1. Define a posterior $p(\\theta~|~D, I)$\n", 56 | "2. Define a *proposal density* $p(\\theta_{i + 1}~|~\\theta_i)$, which must be a symmetric function, but otherwise is unconstrained (a Gaussian is the usual choice).\n", 57 | "3. Choose a starting point $\\theta_0$\n", 58 | "4. Repeat the following:\n", 59 | "\n", 60 | " 1. Given $\\theta_i$, draw a new $\\theta_{i + 1}$ from the proposal distribution\n", 61 | " \n", 62 | " 2. Compute the *acceptance ratio*\n", 63 | " $$\n", 64 | " a = \\frac{p(\\theta_{i + 1}~|~D,I)}{p(\\theta_i~|~D,I)}\n", 65 | " $$\n", 66 | " \n", 67 | " 3. If $a \\ge 1$, the proposal is more likely: accept the draw and add $\\theta_{i + 1}$ to the chain.\n", 68 | " \n", 69 | " 4. If $a < 1$, then accept the point with probability $a$: this can be done by drawing a uniform random number $r$ and checking if $r < a$. If the point is accepted, add $\\theta_{i + 1}$ to the chain. If not, then add $\\theta_i$ to the chain *again*.\n", 70 | "\n", 71 | "There are a few caveats to be aware of when using MCMC:\n", 72 | "\n", 73 | "#### 1. The procedure is provably correct... but only in the long-term limit!\n", 74 | "\n", 75 | "Sometimes the long-term limit is **very** long. What we're looking for is \"stabilization\" of the MCMC chain, meaning that it has reached a statistical equilibrium. There is a vast literature on how to measure stabilization of an MCMC chain. Here we'll use the sloppy but intuitive LAI approach (i.e. Look At It).\n", 76 | "\n", 77 | "#### 2. The size of the proposal distribution is *very* important\n", 78 | "\n", 79 | "- If your proposal distribution is too small, it will take too long for your chain to move, and you have the danger of getting stuck in a local maximum for a long (but not infinite) time.\n", 80 | "\n", 81 | "- If your proposal distribution is too large, most proposed steps will land in regions of much lower probability and be rejected, so the chain will rarely move and will not efficiently explore the space around a particular peak.\n", 82 | "\n", 83 | "In general, choosing an appropriate scale for the proposal distribution is one of the most difficult parts of using the MCMC procedure above.\n", 84 | "More sophisticated methods (such as what we will use below) have built-in ways to estimate this along the way, but it's still something to be aware of!\n", 85 | "\n", 86 | "#### 3. 
Fast Stabilization can be helped by good initialization\n", 87 | "\n", 88 | "In practice, assuring that MCMC will stabilize quickly has a lot to do with choosing a suitable initialization. For this purpose, it can be useful to find the maximum a posteriori (MAP) value, and initialize the chain with this." 89 | ] 90 | }, 91 | { 92 | "cell_type": "markdown", 93 | "metadata": {}, 94 | "source": [ 95 | "A bit later in the session, we will spend some time actually coding a simple Metropolis-Hastings sampler which uses the above procedure.\n", 96 | "But first let's take a look at MCMC in action." 97 | ] 98 | }, 99 | { 100 | "cell_type": "markdown", 101 | "metadata": {}, 102 | "source": [ 103 | "## Sampling with ``emcee``\n", 104 | "\n", 105 | "There are several good Python approaches to Bayesian computation with MCMC. You can read about some of them in this blog post: [How to Be a Bayesian in Python](http://jakevdp.github.io/blog/2014/06/14/frequentism-and-bayesianism-4-bayesian-in-python/).\n", 106 | "\n", 107 | "Here we'll focus on [``emcee``](http://dan.iel.fm/emcee/), a lightweight Python package developed by Dan Foreman-Mackey and collaborators.\n", 108 | "One benefit of ``emcee`` is that it uses an *ensemble sampler* which automatically tunes the shape and size of the proposal distribution (you can read more details in the ``emcee`` documentation)." 109 | ] 110 | }, 111 | { 112 | "cell_type": "markdown", 113 | "metadata": {}, 114 | "source": [ 115 | "### Creating some Data" 116 | ] 117 | }, 118 | { 119 | "cell_type": "code", 120 | "execution_count": null, 121 | "metadata": { 122 | "collapsed": false 123 | }, 124 | "outputs": [], 125 | "source": [ 126 | "%matplotlib inline\n", 127 | "import matplotlib.pyplot as plt\n", 128 | "import numpy as np\n", 129 | "import seaborn; seaborn.set() # for plot formatting" 130 | ] 131 | }, 132 | { 133 | "cell_type": "code", 134 | "execution_count": null, 135 | "metadata": { 136 | "collapsed": false 137 | }, 138 | "outputs": [], 139 | "source": [ 140 | "def make_data(intercept, slope, N=20, dy=5, rseed=42):\n", 141 | " rand = np.random.RandomState(rseed)\n", 142 | " x = 100 * rand.rand(N)\n", 143 | " y = intercept + slope * x\n", 144 | " y += dy * rand.randn(N)\n", 145 | " return x, y, dy * np.ones_like(x)\n", 146 | "\n", 147 | "theta_true = (25, 0.5)\n", 148 | "x, y, dy = make_data(*theta_true)\n", 149 | "\n", 150 | "plt.errorbar(x, y, dy, fmt='o');" 151 | ] 152 | }, 153 | { 154 | "cell_type": "markdown", 155 | "metadata": {}, 156 | "source": [ 157 | "### Defining our Posterior" 158 | ] 159 | }, 160 | { 161 | "cell_type": "code", 162 | "execution_count": null, 163 | "metadata": { 164 | "collapsed": true 165 | }, 166 | "outputs": [], 167 | "source": [ 168 | "# theta = [intercept, slope]\n", 169 | "\n", 170 | "def log_prior(theta):\n", 171 | " if np.all(np.abs(theta) < 1000):\n", 172 | " return 0\n", 173 | " else:\n", 174 | " return -np.inf # log(0)\n", 175 | " \n", 176 | "def log_likelihood(theta, x, y, dy):\n", 177 | " y_model = theta[0] + theta[1] * x\n", 178 | " return -0.5 * np.sum(np.log(2 * np.pi * dy ** 2) +\n", 179 | " (y - y_model) ** 2 / dy ** 2)\n", 180 | "\n", 181 | "def log_posterior(theta, x, y, dy):\n", 182 | " return log_prior(theta) + log_likelihood(theta, x, y, dy)" 183 | ] 184 | }, 185 | { 186 | "cell_type": "markdown", 187 | "metadata": {}, 188 | "source": [ 189 | "### Using ``emcee`` to Sample" 190 | ] 191 | }, 192 | { 193 | "cell_type": "code", 194 | "execution_count": null, 195 | "metadata": { 196 | "collapsed": false 197 | }, 198 | 
"outputs": [], 199 | "source": [ 200 | "import emcee\n", 201 | "\n", 202 | "ndim = 2 # number of parameters in the model\n", 203 | "nwalkers = 50 # number of MCMC walkers\n", 204 | "\n", 205 | "# initialize walkers\n", 206 | "starting_guesses = np.random.randn(nwalkers, ndim)\n", 207 | "\n", 208 | "sampler = emcee.EnsembleSampler(nwalkers, ndim, log_posterior,\n", 209 | " args=[x, y, dy])\n", 210 | "pos, prob, state = sampler.run_mcmc(starting_guesses, 200)" 211 | ] 212 | }, 213 | { 214 | "cell_type": "markdown", 215 | "metadata": {}, 216 | "source": [ 217 | "### Plotting the chains" 218 | ] 219 | }, 220 | { 221 | "cell_type": "code", 222 | "execution_count": null, 223 | "metadata": { 224 | "collapsed": false 225 | }, 226 | "outputs": [], 227 | "source": [ 228 | "fig, ax = plt.subplots(2, sharex=True)\n", 229 | "for i in range(2):\n", 230 | " ax[i].plot(sampler.chain[:, :, i].T, '-k', alpha=0.2);" 231 | ] 232 | }, 233 | { 234 | "cell_type": "markdown", 235 | "metadata": {}, 236 | "source": [ 237 | "### Restarting after burn-in" 238 | ] 239 | }, 240 | { 241 | "cell_type": "code", 242 | "execution_count": null, 243 | "metadata": { 244 | "collapsed": true 245 | }, 246 | "outputs": [], 247 | "source": [ 248 | "sampler.reset()\n", 249 | "pos, prob, state = sampler.run_mcmc(pos, 1000)" 250 | ] 251 | }, 252 | { 253 | "cell_type": "code", 254 | "execution_count": null, 255 | "metadata": { 256 | "collapsed": false 257 | }, 258 | "outputs": [], 259 | "source": [ 260 | "fig, ax = plt.subplots(2, sharex=True)\n", 261 | "for i in range(2):\n", 262 | " ax[i].plot(sampler.chain[:, :, i].T, '-k', alpha=0.2);" 263 | ] 264 | }, 265 | { 266 | "cell_type": "markdown", 267 | "metadata": {}, 268 | "source": [ 269 | "### Visualizing the Posterior" 270 | ] 271 | }, 272 | { 273 | "cell_type": "markdown", 274 | "metadata": {}, 275 | "source": [ 276 | "Using the [corner.py](https://pypi.python.org/pypi/corner) package, we can take a look at this multi-dimensional posterior, along with the input values for the parameters:" 277 | ] 278 | }, 279 | { 280 | "cell_type": "code", 281 | "execution_count": null, 282 | "metadata": { 283 | "collapsed": false 284 | }, 285 | "outputs": [], 286 | "source": [ 287 | "import corner\n", 288 | "corner.corner(sampler.flatchain, labels=['intercept', 'slope']);" 289 | ] 290 | }, 291 | { 292 | "cell_type": "markdown", 293 | "metadata": {}, 294 | "source": [ 295 | "Another way to visualize the posterior is to plot the model over the data.\n", 296 | "Each point in the two-dimensional space above corresponds to a possible model for our data; if we select ~100 of these at random and plot them over our data, it will give us a good idea of the spread in the model results:" 297 | ] 298 | }, 299 | { 300 | "cell_type": "code", 301 | "execution_count": null, 302 | "metadata": { 303 | "collapsed": false 304 | }, 305 | "outputs": [], 306 | "source": [ 307 | "chain = sampler.flatchain\n", 308 | "\n", 309 | "plt.errorbar(x, y, dy, fmt='o');\n", 310 | "\n", 311 | "thetas = [chain[i] for i in np.random.choice(chain.shape[0], 100)]\n", 312 | "\n", 313 | "xfit = np.linspace(0, 100)\n", 314 | "for i in range(100):\n", 315 | " theta = thetas[i]\n", 316 | " plt.plot(xfit, theta[0] + theta[1] * xfit,\n", 317 | " color='black', alpha=0.05);" 318 | ] 319 | }, 320 | { 321 | "cell_type": "markdown", 322 | "metadata": {}, 323 | "source": [ 324 | "## Breakout: Linear Fit with Intrinsic Scatter\n", 325 | "\n", 326 | "Above we have done a simple model, where the data is drawn from a straight line.\n", 327 | "\n", 328 
| "Often, however, we will be modeling relationships where there is some intrinsic scatter in the model itself: that is, even if the data were *perfectly* measured, they would not fall along a perfect straight line, but would have some (unknown) scatter about that line.\n", 329 | "\n", 330 | "Here we'll make a slightly more complicated model in which we will fit for the slope, intercept, and intrinsic scatter all at once." 331 | ] 332 | }, 333 | { 334 | "cell_type": "code", 335 | "execution_count": null, 336 | "metadata": { 337 | "collapsed": false 338 | }, 339 | "outputs": [], 340 | "source": [ 341 | "def make_data_scatter(intercept, slope, scatter,\n", 342 | " N=20, dy=2, rseed=42):\n", 343 | " rand = np.random.RandomState(rseed)\n", 344 | " x = 100 * rand.rand(20)\n", 345 | " y = intercept + slope * x\n", 346 | " y += np.sqrt(dy ** 2 + scatter ** 2) * rand.randn(20)\n", 347 | " return x, y, dy * np.ones_like(x)\n", 348 | "\n", 349 | "\n", 350 | "# (intercept, slope, intrinsic scatter)\n", 351 | "theta = (25, 0.5, 3.0)\n", 352 | "x, y, dy = make_data_scatter(*theta)\n", 353 | "plt.errorbar(x, y, dy, fmt='o');" 354 | ] 355 | }, 356 | { 357 | "cell_type": "markdown", 358 | "metadata": {}, 359 | "source": [ 360 | "The following will walk you through using ``emcee`` to fit a model to this data.\n", 361 | "Feel free to copy and adapt the code from the simple linear example above; if you'd like to see a solution, you can look at the [Solutions-03](Solutions-03.ipynb) notebook – but please resist the temptation to look before you have made a good attempt at this yourself!" 362 | ] 363 | }, 364 | { 365 | "cell_type": "markdown", 366 | "metadata": {}, 367 | "source": [ 368 | "### Define the prior, likelihood, and posterior\n", 369 | "\n", 370 | "The likelihood for this model looks very similar to what we used above, except that the intrinsic scatter is added *in quadrature* to the measurement error.\n", 371 | "If $\\varepsilon_i$ is the measurement error on the point $(x_i, y_i)$, and $\\sigma$ is the intrinsic scatter, then the likelihood should look like this:\n", 372 | "\n", 373 | "$$\n", 374 | "P(x_i,y_i\\mid\\theta) = \\frac{1}{\\sqrt{2\\pi(\\varepsilon_i^2 + \\sigma^2)}} \\exp\\left(\\frac{-\\left[y_i - y(x_i;\\theta)\\right]^2}{2(\\varepsilon_i^2 + \\sigma^2)}\\right)\n", 375 | "$$\n", 376 | "\n", 377 | "For the prior, you can use either a flat or symmetric prior on the slope and intercept, but on the intrinsic scatter $\\sigma$ it is best to use a scale-invariant Jeffreys Prior:\n", 378 | "\n", 379 | "$$\n", 380 | "P(\\sigma)\\propto\\sigma^{-1}\n", 381 | "$$\n", 382 | "\n", 383 | "As discussed before, this has the nice feature that the resulting posterior will not depend on the units of measurement." 
384 | ] 385 | }, 386 | { 387 | "cell_type": "code", 388 | "execution_count": null, 389 | "metadata": { 390 | "collapsed": false 391 | }, 392 | "outputs": [], 393 | "source": [ 394 | "# Define functions to compute the log-prior, log-likelihood, and log-posterior\n", 395 | "\n", 396 | "# theta = [intercept, slope, scatter]\n", 397 | "\n", 398 | "def log_prior(theta):\n", 399 | " # fill this in\n", 400 | " pass\n", 401 | " \n", 402 | "def log_likelihood(theta, x, y, dy):\n", 403 | " # fill this in\n", 404 | " pass\n", 405 | "\n", 406 | "def log_posterior(theta, x, y, dy):\n", 407 | " # fill this in\n", 408 | " pass" 409 | ] 410 | }, 411 | { 412 | "cell_type": "markdown", 413 | "metadata": {}, 414 | "source": [ 415 | "### Sampling from the Posterior" 416 | ] 417 | }, 418 | { 419 | "cell_type": "code", 420 | "execution_count": null, 421 | "metadata": { 422 | "collapsed": false 423 | }, 424 | "outputs": [], 425 | "source": [ 426 | "# Using emcee, create and initialize a sampler and draw 200 samples from the posterior.\n", 427 | "# Remember to think about what starting guesses should you use!\n", 428 | "# You can use the above as a template\n", 429 | "\n" 430 | ] 431 | }, 432 | { 433 | "cell_type": "markdown", 434 | "metadata": {}, 435 | "source": [ 436 | "### Visualizing the Chains" 437 | ] 438 | }, 439 | { 440 | "cell_type": "code", 441 | "execution_count": null, 442 | "metadata": { 443 | "collapsed": false 444 | }, 445 | "outputs": [], 446 | "source": [ 447 | "# Plot the three chains as above\n", 448 | "\n" 449 | ] 450 | }, 451 | { 452 | "cell_type": "markdown", 453 | "metadata": {}, 454 | "source": [ 455 | "### Resetting and getting a clean sample" 456 | ] 457 | }, 458 | { 459 | "cell_type": "code", 460 | "execution_count": null, 461 | "metadata": { 462 | "collapsed": false 463 | }, 464 | "outputs": [], 465 | "source": [ 466 | "# Are your chains stabilized? 
Reset them and get a clean sample" 467 | ] 468 | }, 469 | { 470 | "cell_type": "markdown", 471 | "metadata": {}, 472 | "source": [ 473 | "### Visualizing the results" 474 | ] 475 | }, 476 | { 477 | "cell_type": "code", 478 | "execution_count": null, 479 | "metadata": { 480 | "collapsed": false 481 | }, 482 | "outputs": [], 483 | "source": [ 484 | "# Use corner.py to visualize the three-dimensional posterior\n", 485 | "\n" 486 | ] 487 | }, 488 | { 489 | "cell_type": "code", 490 | "execution_count": null, 491 | "metadata": { 492 | "collapsed": true 493 | }, 494 | "outputs": [], 495 | "source": [ 496 | "# Next plot ~100 of the samples as models over the data to get an idea of the fit\n", 497 | "\n" 498 | ] 499 | } 500 | ], 501 | "metadata": { 502 | "kernelspec": { 503 | "display_name": "Python 3", 504 | "language": "python", 505 | "name": "python3" 506 | }, 507 | "language_info": { 508 | "codemirror_mode": { 509 | "name": "ipython", 510 | "version": 3 511 | }, 512 | "file_extension": ".py", 513 | "mimetype": "text/x-python", 514 | "name": "python", 515 | "nbconvert_exporter": "python", 516 | "pygments_lexer": "ipython3", 517 | "version": "3.5.1" 518 | } 519 | }, 520 | "nbformat": 4, 521 | "nbformat_minor": 0 522 | } 523 | -------------------------------------------------------------------------------- /01-Introduction.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Bayesian Methods in Astronomy: Hands-on Statistics" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "## Big Picture: What Are We After?\n", 15 | "\n", 16 | "There are two fundamental types of statistical questions we'll want to answer:\n", 17 | "\n", 18 | "#### 1. Model Fitting\n", 19 | "*Given this Model, what parameters best fit my data?*\n", 20 | "\n", 21 | "Examples:\n", 22 | "\n", 23 | "- What are the slope and intercept of a line of best-fit?\n", 24 | "- What are the parameters of the best quadratic fit?\n", 25 | "- What is the frequency, amplitude, and phase of a sinusoidal fit?\n", 26 | "- What are the orbital parameters of a planet in this radial velocity data?\n", 27 | "\n", 28 | "#### 2. Model Selection\n", 29 | "\n", 30 | "*Given two potential Models, which better describes my data?*\n", 31 | "\n", 32 | "Examples:\n", 33 | "\n", 34 | "- Is there a linear trend in this data?\n", 35 | "- Does a linear or quadratic fit describe our data better?\n", 36 | "- Is there a periodic signal in this timeseries?\n", 37 | "- Does this star have a planet around it? Does this star have two planets around it?\n", 38 | "\n", 39 | "Often one of the two models is a *null hypothesis*, or a baseline model in which the effect you're interested in is not observed." 
40 | ] 41 | }, 42 | { 43 | "cell_type": "markdown", 44 | "metadata": {}, 45 | "source": [ 46 | "## Frequentist vs Bayesian Approaches\n", 47 | "\n", 48 | "Both the model fitting and model selection problems can be approached from either a *frequentist* or a *Bayesian* standpoint.\n", 49 | "Fundamentally, the difference between these lies in the **definition of probability** that they use:\n", 50 | "\n", 51 | "- **A frequentist probability is a measure *long-run frequency* of (real or imagined) repeated trials.** Among other things, this generally means that observed data can be described probabilistically (you can repeat the experiment and get a different realization) while model parameters are fixed, and cannot be described probabilistically (the universe remains the same no matter how many times you observe it). \n", 52 | "\n", 53 | "- **A Bayesian probability is a *quantification of belief*.** Among other things, this generally means that observed data are treated as fixed (you know exactly what values you measured) while model parameters – including the \"true\" values of the data reflected by noisy measurements – are treated probabilistically (your degree of knowledge about the universe changes as you gather more data).\n", 54 | "\n", 55 | "Perhaps surprisingly to the uninitiated, scientists and statisticians have been vehemently fighting in favor of one approach or the other for decades.\n", 56 | "For more a more fleshed-out discussion of these different definitions and their consequences, you can see my [series of blog posts](http://jakevdp.github.io/blog/2014/03/11/frequentism-and-bayesianism-a-practical-intro/) on the topic." 57 | ] 58 | }, 59 | { 60 | "cell_type": "markdown", 61 | "metadata": {}, 62 | "source": [ 63 | "## The Bayesian Problem Setting\n", 64 | "\n", 65 | "Thus the end-goal of a Bayesian analysis is a probabilistic statement about the universe.\n", 66 | "Roughly we want to measure\n", 67 | "\n", 68 | "$$\n", 69 | "P(science)\n", 70 | "$$\n", 71 | "\n", 72 | "Where \"science\" might be encapsulated in the cosmological model, the mass of a planet around a star, or whatever else we're interested in learning about." 73 | ] 74 | }, 75 | { 76 | "cell_type": "markdown", 77 | "metadata": {}, 78 | "source": [ 79 | "We don't of course measure this without reference to data, so more specifically we want to measure\n", 80 | "\n", 81 | "$$\n", 82 | "P(science~|~data)\n", 83 | "$$\n", 84 | "\n", 85 | "which should be read \"the probability of the science *given* the data.\"" 86 | ] 87 | }, 88 | { 89 | "cell_type": "markdown", 90 | "metadata": {}, 91 | "source": [ 92 | "Of course, we should be explicit that this measurement is not done in a vaccum: generally before observing any data we have *some* degree of background information that informs the science, so we should actually write\n", 93 | "\n", 94 | "$$\n", 95 | "P(science~|~data, background\\ info)\n", 96 | "$$\n", 97 | "\n", 98 | "This should be read \"the probability of the science given the data *and* the background information\"." 99 | ] 100 | }, 101 | { 102 | "cell_type": "markdown", 103 | "metadata": {}, 104 | "source": [ 105 | "Finally, there are often things in the scientific model that we don't particularly care about: these are known as \"nuisance parameters\". 
As an example of a nuisance parameter, if you are finding a planet in radial velocity data, the secular motion of the star is *extremely* important to model correctly, but in the end you don't really care about its best-fit value.\n", 106 | "\n", 107 | "With that in mind, we can write:\n", 108 | "\n", 109 | "$$\n", 110 | "P(science,nuisance\\ parameters~|~data, background\\ info)\n", 111 | "$$\n", 112 | "\n", 113 | "Where as before the comma should be read as an \"and\"." 114 | ] 115 | }, 116 | { 117 | "cell_type": "markdown", 118 | "metadata": {}, 119 | "source": [ 120 | "This is starting to get a bit cumbersome, so let's create some symbols that will let us express this more easily:\n", 121 | "\n", 122 | "$$\n", 123 | "P(\\theta_S, \\theta_N~|~D, I)\n", 124 | "$$\n", 125 | "\n", 126 | "- $\\theta_S$ represents the \"science\": the set of parameters that we are interested in constraining\n", 127 | "- $\\theta_N$ represents the \"nuisance parameters\": the set of parameters that are important in the model, but are not particularly interesting for the scientific result.\n", 128 | "- $D$ represents the \"observed data\"\n", 129 | "- $I$ represents the information or knowledge you had before observing the data, including whatever made you choose the model you're fitting.\n", 130 | "\n", 131 | "Finally, we'll often just write $\\theta = (\\theta_S, \\theta_N)$ as a shorthand for all the model parameters.\n", 132 | "\n", 133 | "This quantity, $P(\\theta~|~D,I)$ is called the \"posterior probability\" and determining this quantity is the ultimate goal of a Bayesian analysis.\n", 134 | "\n", 135 | "Now all we need to do is compute it!\n", 136 | "\n", 137 | "The core problem is this: **We do not have a way to directly calculate** $P(\\theta~|~D,I)$. We often do have an expression for $P(D~|~\\theta,I)$, but these two expressions are **not** equal.\n", 138 | "\n", 139 | "$$\n", 140 | "P(\\theta~|~D,I) \\ne P(D~|~\\theta,I)\n", 141 | "$$\n", 142 | "\n", 143 | "\n", 144 | "We need to figure out how these two expressions are related." 145 | ] 146 | }, 147 | { 148 | "cell_type": "markdown", 149 | "metadata": {}, 150 | "source": [ 151 | "## The Nitty Gritty: Thinking about Probability" 152 | ] 153 | }, 154 | { 155 | "cell_type": "markdown", 156 | "metadata": {}, 157 | "source": [ 158 | "### Normalization of Probability\n", 159 | "\n", 160 | "We will generally be dealing with probability *densities*, that is, $P(A) dA$ is the probability of a value falling between $A$ and $A + dA$.\n", 161 | "\n", 162 | "Probability densities are normalized such that the union of all possible events has a probability of unity; mathematically that criterion looks like this:\n", 163 | "\n", 164 | "$$\n", 165 | "\\int P(A) dA = 1\n", 166 | "$$\n", 167 | "\n", 168 | "Among other things, consider the units implied by this expression: because probability is dimensionless, the units of $P(A)$ must be the inverse of the units of $A$.\n", 169 | "This can be very useful to keep in mind as you manipulate probabilistic expressions!" 
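A quick numerical illustration of this normalization (and of the units point), using a toy Gaussian density; the choice of a Gaussian and the numbers here are arbitrary:

```python
import numpy as np
from scipy import stats

# a Gaussian probability density for a quantity A, say measured in meters
A = np.linspace(-10, 10, 10001)
pdf_m = stats.norm(loc=0, scale=1).pdf(A)
print(np.trapz(pdf_m, A))       # ~1.0: total probability is dimensionless

# the same distribution with A expressed in centimeters:
# the density values shrink by 100x, but the integral is still ~1.0
A_cm = 100 * A
pdf_cm = stats.norm(loc=0, scale=100).pdf(A_cm)
print(np.trapz(pdf_cm, A_cm))
```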
170 | ] 171 | }, 172 | { 173 | "cell_type": "markdown", 174 | "metadata": {}, 175 | "source": [ 176 | "### Joint Probability\n", 177 | "\n", 178 | "When we're talking about two different variables, we can write a joint probability:\n", 179 | "\n", 180 | "$$\n", 181 | "P(A,B)\n", 182 | "$$\n", 183 | "\n", 184 | "This should be read \"probability of $A$ and $B$\".\n", 185 | "\n", 186 | "In the case that $A$ and $B$ are independent, $P(A,B) = P(A)P(B)$" 187 | ] 188 | }, 189 | { 190 | "cell_type": "markdown", 191 | "metadata": {}, 192 | "source": [ 193 | "### Conditional probability\n", 194 | "\n", 195 | "Sometimes we want to express the probability of one variable conditioned on the value of another variable. We can write it like this:\n", 196 | "\n", 197 | "$$\n", 198 | "P(A\\mid B) = \\frac{P(A, B)}{P(B)}\n", 199 | "$$\n", 200 | "\n", 201 | "This should be read \"the probability of $A$ given $B$\".\n", 202 | "\n", 203 | "Rearranging this we get the more common expression of this:\n", 204 | "\n", 205 | "$$\n", 206 | "P(A, B) = P(A\\mid B)P(B)\n", 207 | "$$\n", 208 | "\n", 209 | "Note that conditional probabilities are still probabilities, so that the normalization is the same as above:\n", 210 | "\n", 211 | "$$\n", 212 | "\\int P(A\\mid B)dA = 1\n", 213 | "$$\n", 214 | "\n", 215 | "This makes clear that $P(A \\mid B)$ has units equivalent to $1/A$ – and no units related to $B$. Among other things, this means that expressions like $\\int P(A\\mid B)dB$ should raise a very big red flag!" 216 | ] 217 | }, 218 | { 219 | "cell_type": "markdown", 220 | "metadata": {}, 221 | "source": [ 222 | "### Marginalization\n", 223 | "\n", 224 | "Given the above two properties, we can quite easily show that\n", 225 | "$$\n", 226 | "\\int P(A, B) dA = P(B)\n", 227 | "$$\n", 228 | "This is known as *marginalization* – we have integrated *A* out of the joint probability, and we are left with the raw probability of *B*." 229 | ] 230 | }, 231 | { 232 | "cell_type": "markdown", 233 | "metadata": {}, 234 | "source": [ 235 | "### Bayes' Rule\n", 236 | "\n", 237 | "The definition of conditional probability is entirely symmetric, so we can write\n", 238 | "\n", 239 | "$$\n", 240 | "P(A, B) = P(B, A)\n", 241 | "$$\n", 242 | "\n", 243 | "$$\n", 244 | "P(A\\mid B)P(B) = P(B\\mid A)P(A)\n", 245 | "$$\n", 246 | "\n", 247 | "which is more commonly rearranged in this form:\n", 248 | "\n", 249 | "$$\n", 250 | "P(A\\mid B) = \\frac{P(B\\mid A)P(A)}{P(B)}\n", 251 | "$$\n", 252 | "\n", 253 | "This is known as *Bayes' Theorem* or *Bayes' Rule*, and is important because it gives a formula for \"flipping\" conditional probabilities." 254 | ] 255 | }, 256 | { 257 | "cell_type": "markdown", 258 | "metadata": {}, 259 | "source": [ 260 | "## From Bayes' Rule to Bayesian Inference\n", 261 | "\n", 262 | "If we replace these labels, we find the usual expression of Bayes' theorem as it relates to model fitting:\n", 263 | "\n", 264 | "$$\n", 265 | "P(\\theta \\mid D) = \\frac{P(D\\mid\\theta)P(\\theta)}{P(D)}\n", 266 | "$$\n", 267 | "\n", 268 | "Technically all the probabilities should all be conditioned on the information $I$:\n", 269 | "\n", 270 | "$$\n", 271 | "P(\\theta \\mid D,I) = \\frac{P(D \\mid \\theta,I)P(\\theta \\mid I)}{P(D \\mid I)}\n", 272 | "$$\n", 273 | "\n", 274 | "Recall $\\theta$ is the model we're interested in, $D$ is the observed data, and $I$ encodes all the prior information, including what led us to choose the particular model we're using." 
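These identities are easy to verify numerically. The snippet below uses a small 2x2 joint-probability table with made-up values to check marginalization, conditional probability, and Bayes' rule:

```python
import numpy as np

# joint probability table P(A, B) for two binary variables (made-up numbers)
P_AB = np.array([[0.10, 0.30],     # rows index A, columns index B
                 [0.20, 0.40]])
assert np.isclose(P_AB.sum(), 1.0)

# marginalization: P(A) = sum_B P(A, B),  P(B) = sum_A P(A, B)
P_A = P_AB.sum(axis=1)
P_B = P_AB.sum(axis=0)

# conditional probability: P(A|B) = P(A, B) / P(B),  P(B|A) = P(A, B) / P(A)
P_A_given_B = P_AB / P_B
P_B_given_A = P_AB / P_A[:, None]

# Bayes' rule: P(A|B) = P(B|A) P(A) / P(B)
bayes = P_B_given_A * P_A[:, None] / P_B
print(np.allclose(P_A_given_B, bayes))   # True
```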
275 | ] 276 | }, 277 | { 278 | "cell_type": "markdown", 279 | "metadata": {}, 280 | "source": [ 281 | "### Wherefore the Controversy?\n", 282 | "\n", 283 | "When we write Bayes' rule this way, we're all of a sudden doing something controversial: can you see where this controversy lies?\n", 284 | "\n", 285 | "Two controversial points:\n", 286 | "\n", 287 | "- We have a probability distribution over model parameters. A frequentist would say this is meaningless!\n", 288 | "\n", 289 | "- The answer depends on the prior $P(\\theta\\mid I)$. This is the probability of the model parameters without any data: how are we supposed to know that?\n", 290 | "\n", 291 | "Nevertheless, applying Bayes' rule in this manner gives us a means of quantifying our knowledge of the parameters $\\theta$ given observed data" 292 | ] 293 | }, 294 | { 295 | "cell_type": "markdown", 296 | "metadata": {}, 297 | "source": [ 298 | "### Exploring the Terms in Bayesian Inference\n", 299 | "\n", 300 | "We have four terms in the above expression, and we need to make sure we understand them:\n", 301 | "\n", 302 | "#### $P(\\theta\\mid D, I)$ is the *posterior*.\n", 303 | "This is the quantity we want to compute: our knowledge of the model given the data & background knowledge (including the choice of model).\n", 304 | "\n", 305 | "#### $P(D\\mid\\theta,I)$ is the *likelihood*.\n", 306 | "This measures the probability of seeing our data given the model. This is identical to the quantity maximized in frequentist *maximum-likelihood* approaches.\n", 307 | "\n", 308 | "#### $P(\\theta\\mid I)$ is the *prior*.\n", 309 | "This encodes any knowledge we had about the answer before measuring the current data.\n", 310 | "\n", 311 | "#### $P(D\\mid I)$ is the *Fully Marginalized Likelihood*\n", 312 | "You might prefer the acronym *FML* (it's also called the *Evidence* among other things). In the context of model fitting, it acts as a normalization constant and in most cases can be ignored." 313 | ] 314 | }, 315 | { 316 | "cell_type": "markdown", 317 | "metadata": {}, 318 | "source": [ 319 | "### Aside on the FML\n", 320 | "\n", 321 | "In general the Fully Marginalized Likelihood (FML) is **very costly** to compute, which makes the acronym doubly appropriate in any situation where you actually need it.\n", 322 | "To see why it's so costly, consider that the FML can be expressed as an integral using the identities we covered above:\n", 323 | "$$\n", 324 | "P(D\\mid I) = \\int P(D\\mid\\theta, I) P(\\theta) d\\theta\n", 325 | "$$\n", 326 | "In other words, it is the integral over the likelihood for *all possible values of theta*.\n", 327 | "When your likelihood is a complicated function of many parameters, computing this integral can become extremely costly (a manifestation of the *curse of dimensionality*).\n", 328 | "\n", 329 | "In general, for **model fitting**, you can ignore the FML as a simple normalization term. In **model selection**, the FML can become important." 330 | ] 331 | }, 332 | { 333 | "cell_type": "markdown", 334 | "metadata": {}, 335 | "source": [ 336 | "## What is the Point?\n", 337 | "\n", 338 | "At first blush, this might all seem needlessly complicated. Why not simply maximize the likelihood and be done with it? 
Why multiply by a prior at all?\n", 339 | "\n", 340 | "There are a couple good reasons to go through all of this:" 341 | ] 342 | }, 343 | { 344 | "cell_type": "markdown", 345 | "metadata": {}, 346 | "source": [ 347 | "### \"Purity\"\n", 348 | "\n", 349 | "Many advocates of the Bayesian approach argue for it's philosophical purity: you quantify knowledge in terms of a probability, then follow the math to compute the answer.\n", 350 | "The fact that you need to specify a prior might be inconvenient, but we can't simply pretend it away.\n", 351 | "There are good reasons to think that the Bayesian posterior is just the quantity we wish to compute; in that case we should compute it, however inconvenient!\n", 352 | "\n", 353 | "Perhaps the most vocal 20th century proponent of this view was Jaynes; I'd highly suggest looking at his book, *Probability Theory: The Logic of Science* ([PDF here](http://bayes.wustl.edu/etj/prob/book.pdf))." 354 | ] 355 | }, 356 | { 357 | "cell_type": "markdown", 358 | "metadata": {}, 359 | "source": [ 360 | "### Parameter Uncertainties\n", 361 | "\n", 362 | "Whether frequentist or Bayesian, the maximum likelihood \"point estimate\" is only a small part of the picture. What we're really interested in scientifically is the *uncertainty* of the estimates. So simply reporting a point estimate is not appropriate.\n", 363 | "\n", 364 | "In frequentist approaches, \"error bars\" are generally computed from *Confidence Intervals*, which effectively measure $P(\\hat\\theta\\mid\\theta)$, rather than $P(\\theta\\mid D)$.\n", 365 | "It takes some mental gymnastics to relate the confidence interval to the quantity we as scientists have in mind when we say \"uncertainty\".\n", 366 | "\n", 367 | "In the Bayesian approach, we are actually measuring $P(\\theta\\mid D)$ from the beginning.\n", 368 | "\n", 369 | "For some approachable reading on frequentist vs. Bayesian uncertainties, I'd suggest [The Fallacy of Placing Confidence in Confidence Intervals](http://learnbayes.org/papers/confidenceIntervalsFallacy/), as well as my (rather opinionated) blog post on the topic, [Confidence, Credibility, and why Frequentism and Science do not Mix](http://jakevdp.github.io/blog/2014/06/12/frequentism-and-bayesianism-3-confidence-credibility/)." 370 | ] 371 | }, 372 | { 373 | "cell_type": "markdown", 374 | "metadata": {}, 375 | "source": [ 376 | "### Marginalization of nuisance parameters\n", 377 | "\n", 378 | "Additionally, Bayesian approaches offer a very natural way to systematically account for nuisance parameters.\n", 379 | "\n", 380 | "Consider that we now have a recipe for computing the posterior, $P(\\theta\\mid D,I)$. Recall that in general our model parameters $\\theta$ contain some parameters of interest (what we called $\\theta_S$, or the \"science\") and some nuisance parameters (what we called $\\theta_N$). 
That is, we're really measuring\n", 381 | "\n", 382 | "$$\n", 383 | "P(\theta_S,\theta_N\mid D, I)\n", 384 | "$$\n", 385 | "\n", 386 | "What we're really interested in, though, is $P(\theta_S\mid D, I)$, which we can compute via a straightforward *marginalization* integral:\n", 387 | "\n", 388 | "$$\n", 389 | "P(\theta_S\mid D, I) = \int P(\theta_S, \theta_N\mid D, I) d\theta_N\n", 390 | "$$\n", 391 | "\n", 392 | "where the integral is over the entire space of $\theta_N$.\n", 393 | "This quantity, the marginalized posterior, is our final result of interest.\n", 394 | "\n", 395 | "At first glance, you might think this problem would be nearly as difficult as the computation of the FML above.\n", 396 | "We'll see later, however, that a feature of the MCMC approaches typically used for large Bayesian problems is that such marginalized posteriors come essentially for free." 397 | ] 398 | }, 399 | { 400 | "cell_type": "markdown", 401 | "metadata": {}, 402 | "source": [ 403 | "## Looking forward\n", 404 | "\n", 405 | "We now have all the probabilistic machinery we need to do Bayesian inference... next we will get our fingers on our keyboards and fit some Bayesian models!" 406 | ] 407 | } 408 | ], 409 | "metadata": { 410 | "kernelspec": { 411 | "display_name": "Python 3", 412 | "language": "python", 413 | "name": "python3" 414 | }, 415 | "language_info": { 416 | "codemirror_mode": { 417 | "name": "ipython", 418 | "version": 3 419 | }, 420 | "file_extension": ".py", 421 | "mimetype": "text/x-python", 422 | "name": "python", 423 | "nbconvert_exporter": "python", 424 | "pygments_lexer": "ipython3", 425 | "version": "3.5.1" 426 | } 427 | }, 428 | "nbformat": 4, 429 | "nbformat_minor": 0 430 | } 431 | -------------------------------------------------------------------------------- /05-Radial-Velocity.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Bayesian Radial Velocity Planet Search" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "One area in astronomy where Bayesian approaches have been applied with great success is in the hunt for extrasolar planets.\n", 15 | "There are several benefits of Bayesian modeling in this situation. In particular:\n", 16 | "\n", 17 | "1. Bayesian modeling can account for the many nuisance parameters typical in a planet search. For example, when seeking evidence of a planet around a star, we need our model to account for things like the phase and the longitude of periastron, but we don't necessarily care about these parameters in the end.\n", 18 | "\n", 19 | "2. Often we have very important prior information – for example, we might have a very good constraint on the period from eclipse data, and use this to model orbital parameters with radial velocity data.\n", 20 | "\n", 21 | "3. The forward-modeling aspect of Bayesian approaches can be advantageous when dealing with detectors that have strong systematic uncertainties. For example, this idea is key to some of the recent analyses of K2 data.\n", 22 | "\n", 23 | "Here we'll take a look at a Bayesian approach to determining orbital parameters of a planet from radial velocity (RV) measurements.\n", 24 | "We'll start with some simulated data for which we know the correct answer, and then take a look at some real RV measurements from the literature." 
25 | ] 26 | }, 27 | { 28 | "cell_type": "markdown", 29 | "metadata": {}, 30 | "source": [ 31 | "## Preliminaries" 32 | ] 33 | }, 34 | { 35 | "cell_type": "markdown", 36 | "metadata": {}, 37 | "source": [ 38 | "As usual, we start with some imports" 39 | ] 40 | }, 41 | { 42 | "cell_type": "code", 43 | "execution_count": null, 44 | "metadata": { 45 | "collapsed": false 46 | }, 47 | "outputs": [], 48 | "source": [ 49 | "%matplotlib inline\n", 50 | "import matplotlib.pyplot as plt\n", 51 | "import numpy as np\n", 52 | "import pandas as pd\n", 53 | "import seaborn; seaborn.set() #nice plot formatting" 54 | ] 55 | }, 56 | { 57 | "cell_type": "markdown", 58 | "metadata": {}, 59 | "source": [ 60 | "## The Radial Velocity Model" 61 | ] 62 | }, 63 | { 64 | "cell_type": "markdown", 65 | "metadata": {}, 66 | "source": [ 67 | "The first important step is to define a mathematical (and computational) model of how the parameters of interest are reflected in our observations." 68 | ] 69 | }, 70 | { 71 | "cell_type": "markdown", 72 | "metadata": {}, 73 | "source": [ 74 | "Some references relating to what we're going to compute below:\n", 75 | "\n", 76 | "- Balan 2009: http://adsabs.harvard.edu/abs/2009MNRAS.394.1936B\n", 77 | "- Exofit Manual: http://www.star.ucl.ac.uk/~lahav/ExoFitv2.pdf\n", 78 | "- Hou 2014: http://arxiv.org/pdf/1401.6128.pdf" 79 | ] 80 | }, 81 | { 82 | "cell_type": "markdown", 83 | "metadata": {}, 84 | "source": [ 85 | "The equation for radial velocity is this:\n", 86 | "\n", 87 | "$$\n", 88 | "v(t) = V - K[ \\sin(f + \\omega) + e \\sin(\\omega)]\n", 89 | "$$\n", 90 | "\n", 91 | "where $V$ is the overall velocity of the system, and\n", 92 | "\n", 93 | "$$\n", 94 | "K = \\frac{m_p}{m_s + m_p} \\frac{2\\pi}{T}\\frac{a \\sin i}{\\sqrt{1 - e^2}}\n", 95 | "$$\n", 96 | "\n", 97 | "The true anomaly $f$ satisfies\n", 98 | "\n", 99 | "$$\n", 100 | "\\cos(f) = \\frac{\\cos(E) - e}{1 - e\\cos E}\n", 101 | "$$\n", 102 | "\n", 103 | "Rearranging this we can write\n", 104 | "$$\n", 105 | "f = 2 \\cdot{\\rm atan2}\\left(\\sqrt{1 + e}\\sin(E/2), \\sqrt{1 - e} \\cos(E/2)\\right)\n", 106 | "$$\n", 107 | "\n", 108 | "The eccentric anomaly $E$ satisfies\n", 109 | "$$\n", 110 | "M = E - e\\sin E\n", 111 | "$$\n", 112 | "\n", 113 | "and the mean anomaly is\n", 114 | "$$\n", 115 | "M = \\frac{2\\pi}{T}(t + \\tau)\n", 116 | "$$\n", 117 | "\n", 118 | "and $\\tau$ is the time of pericenter passage, which we'll parametrize with the parameter $\\chi = \\tau / T$" 119 | ] 120 | }, 121 | { 122 | "cell_type": "markdown", 123 | "metadata": {}, 124 | "source": [ 125 | "These are the parameters needed to compute the radial velocity:\n", 126 | "\n", 127 | "- $T$: orbital period\n", 128 | "- $K$: amplitude of RV oscillation\n", 129 | "- $V$: secular offset of RV oscillation\n", 130 | "- $e$: eccentricity\n", 131 | "- $\\omega$: longitude of periastron\n", 132 | "- $\\chi$: dimensionless phase offset\n", 133 | "\n", 134 | "Additionally, we will fit a scatter parameter $s$ which accounts for global data errors not reflected in the reported uncertainties (this is very similar to the third parameter from the linear fit we saw earlier)\n", 135 | "\n", 136 | "For convenience, we'll store these parameters in a ``namedtuple``:" 137 | ] 138 | }, 139 | { 140 | "cell_type": "code", 141 | "execution_count": null, 142 | "metadata": { 143 | "collapsed": true 144 | }, 145 | "outputs": [], 146 | "source": [ 147 | "from collections import namedtuple\n", 148 | "params = namedtuple('params', ['T', 'e', 'K', 'V', 'omega', 'chi', 's'])" 149 | ] 
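A quick, optional sanity check (an addition, not part of the original notebook): a ``namedtuple`` instance is still an ordinary tuple, so it can be indexed positionally wherever the code below expects a generic parameter vector ``theta``, while still allowing readable named access. The example values here are the ones we'll use for the simulated data later on.

```python
from collections import namedtuple

# as defined in the cell above
params = namedtuple('params', ['T', 'e', 'K', 'V', 'omega', 'chi', 's'])

# positional indexing and named access refer to the same values
theta_example = params(T=700, e=0.38, K=60, V=12, omega=3.10, chi=0.67, s=1)
assert theta_example[0] == theta_example.T   # period
assert theta_example[2] == theta_example.K   # RV amplitude
assert theta_example[6] == theta_example.s   # extra scatter term
print(theta_example._fields)                 # ('T', 'e', 'K', 'V', 'omega', 'chi', 's')
```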
150 | }, 151 | { 152 | "cell_type": "markdown", 153 | "metadata": {}, 154 | "source": [ 155 | "Here is a function to compute the observed radial velocity as a function of these parameters:" 156 | ] 157 | }, 158 | { 159 | "cell_type": "code", 160 | "execution_count": null, 161 | "metadata": { 162 | "collapsed": true 163 | }, 164 | "outputs": [], 165 | "source": [ 166 | "from scipy import optimize\n", 167 | "\n", 168 | "@np.vectorize\n", 169 | "def compute_E(M, e):\n", 170 | " \"\"\"Solve Kepler's eqns for eccentric anomaly given mean anomaly\"\"\"\n", 171 | " f = lambda E, M=M, e=e: E - e * np.sin(E) - M\n", 172 | " return optimize.brentq(f, 0, 2 * np.pi)\n", 173 | "\n", 174 | "\n", 175 | "def radial_velocity(t, theta):\n", 176 | " \"\"\"Compute radial velocity given orbital parameters\"\"\"\n", 177 | " T, e, K, V, omega, chi = theta[:6]\n", 178 | " \n", 179 | " # compute mean anomaly (0 <= M < 2pi)\n", 180 | " M = 2 * np.pi * ((t / T + chi) % 1)\n", 181 | " \n", 182 | " # solve for eccentric anomaly\n", 183 | " E = compute_E(M, e)\n", 184 | " \n", 185 | " # compute true anomaly\n", 186 | " f = 2 * np.arctan2(np.sqrt(1 + e) * np.sin(E / 2),\n", 187 | " np.sqrt(1 - e) * np.cos(E / 2))\n", 188 | " \n", 189 | " # compute radial velocity\n", 190 | " return V - K * (np.sin(f + omega) + e * np.sin(omega))" 191 | ] 192 | }, 193 | { 194 | "cell_type": "markdown", 195 | "metadata": {}, 196 | "source": [ 197 | "Just to get a sense of whether we've done this correctly, let's use IPython's interactive features to see how the parameters change the observed RV curve (you may have to first ``pip install ipywidgets``)" 198 | ] 199 | }, 200 | { 201 | "cell_type": "code", 202 | "execution_count": null, 203 | "metadata": { 204 | "collapsed": false 205 | }, 206 | "outputs": [], 207 | "source": [ 208 | "from ipywidgets import interact\n", 209 | "\n", 210 | "def plot_RV(T, e, K, V, omega, chi):\n", 211 | " t = np.linspace(0, 5, 200)\n", 212 | " theta = [T, e, K, V, omega, chi]\n", 213 | " plt.plot(t, radial_velocity(t, theta))\n", 214 | " \n", 215 | "interact(plot_RV,\n", 216 | " T=(0, 5.), K=(0, 2000.), V=(-2000., 2000.),\n", 217 | " e=(0, 1.), omega=(0, 2 * np.pi), chi=(0, 1.));" 218 | ] 219 | }, 220 | { 221 | "cell_type": "markdown", 222 | "metadata": {}, 223 | "source": [ 224 | "The model seems to be working as expected!" 
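Before moving on, it can be reassuring to check the model numerically as well as visually. The following cell is an addition (not part of the original notebook): it verifies that ``compute_E`` satisfies Kepler's equation $M = E - e\sin E$ to roughly the solver tolerance, and that the radial velocity curve repeats with period $T$.

```python
# Numerical sanity checks for the model functions defined above
import numpy as np

rng = np.random.RandomState(0)
M_test = 2 * np.pi * rng.rand(100)       # mean anomalies in [0, 2*pi)
e_test = rng.uniform(0, 0.9, 100)        # a range of eccentricities

# Kepler's equation should be satisfied to near the brentq tolerance
E_test = compute_E(M_test, e_test)
print(np.abs(E_test - e_test * np.sin(E_test) - M_test).max())

# the RV curve should be periodic with period T
theta_check = [2.0, 0.3, 100.0, 10.0, 1.0, 0.25]   # [T, e, K, V, omega, chi]
t_check = np.linspace(0, 3, 7)
print(np.allclose(radial_velocity(t_check, theta_check),
                  radial_velocity(t_check + 2.0, theta_check)))   # expect True
```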
225 | ] 226 | }, 227 | { 228 | "cell_type": "markdown", 229 | "metadata": {}, 230 | "source": [ 231 | "## Fitting Simulated Data\n", 232 | "\n", 233 | "Now let's generate some simulated data so that we can explore a Bayesian approach to modeling the radial velocity effects of a planet orbiting a star.\n", 234 | "We'll choose some reasonable parameters and create some data:" 235 | ] 236 | }, 237 | { 238 | "cell_type": "code", 239 | "execution_count": null, 240 | "metadata": { 241 | "collapsed": true 242 | }, 243 | "outputs": [], 244 | "source": [ 245 | "theta_sim = params(T=700, e=0.38, K=60, V=12,\n", 246 | " omega=3.10, chi=0.67, s=1)\n", 247 | "Nobs = 50\n", 248 | "\n", 249 | "rng = np.random.RandomState(0)\n", 250 | "t_sim = 1400 + 600 * rng.rand(Nobs)\n", 251 | "err_sim = 5 + 5 * rng.rand(Nobs)\n", 252 | "rv_sim = radial_velocity(t_sim, theta_sim) + err_sim * rng.randn(Nobs)" 253 | ] 254 | }, 255 | { 256 | "cell_type": "code", 257 | "execution_count": null, 258 | "metadata": { 259 | "collapsed": false 260 | }, 261 | "outputs": [], 262 | "source": [ 263 | "plt.errorbar(t_sim, rv_sim, err_sim, fmt='.k');\n", 264 | "xlim = plt.xlim()\n", 265 | "t_fit = np.linspace(xlim[0], xlim[1], 500)\n", 266 | "plt.plot(t_fit, radial_velocity(t_fit, theta_sim), color='gray', lw=8, alpha=0.2);" 267 | ] 268 | }, 269 | { 270 | "cell_type": "markdown", 271 | "metadata": {}, 272 | "source": [ 273 | "## Bayesian Fit" 274 | ] 275 | }, 276 | { 277 | "cell_type": "markdown", 278 | "metadata": {}, 279 | "source": [ 280 | "Next let's use ``emcee`` to do a Bayesian model fit to this data.\n", 281 | "We'll follow some of the references above and use a flat prior on most parameters, and a Jeffreys Prior on the scale factors:" 282 | ] 283 | }, 284 | { 285 | "cell_type": "code", 286 | "execution_count": null, 287 | "metadata": { 288 | "collapsed": false 289 | }, 290 | "outputs": [], 291 | "source": [ 292 | "theta_lim = params(T=(0.2, 2000),\n", 293 | " e=(0, 1),\n", 294 | " K=(0.01, 2000),\n", 295 | " V=(-2000, 2000),\n", 296 | " omega=(0, 2 * np.pi),\n", 297 | " chi=(0, 1),\n", 298 | " s=(0.001, 100))\n", 299 | "theta_min, theta_max = map(np.array, zip(*theta_lim))\n", 300 | "\n", 301 | "def log_prior(theta):\n", 302 | " if np.any(theta < theta_min) or np.any(theta > theta_max):\n", 303 | " return -np.inf # log(0)\n", 304 | " \n", 305 | " # Jeffreys Prior on T, K, and s\n", 306 | " return -np.sum(np.log(theta[[0, 2, 6]]))\n", 307 | "\n", 308 | "def log_likelihood(theta, t, rv, rv_err):\n", 309 | " sq_err = rv_err ** 2 + theta[6] ** 2\n", 310 | " rv_model = radial_velocity(t, theta)\n", 311 | " return -0.5 * np.sum(np.log(sq_err) + (rv - rv_model) ** 2 / sq_err)\n", 312 | "\n", 313 | "def log_posterior(theta, t, rv, rv_err):\n", 314 | " ln_prior = log_prior(theta)\n", 315 | " if np.isinf(ln_prior):\n", 316 | " return ln_prior\n", 317 | " else:\n", 318 | " return ln_prior + log_likelihood(theta, t, rv, rv_err)" 319 | ] 320 | }, 321 | { 322 | "cell_type": "markdown", 323 | "metadata": {}, 324 | "source": [ 325 | "With the likelihood model in place, we can now use ``emcee`` to sample the posterior and view the resulting chains:" 326 | ] 327 | }, 328 | { 329 | "cell_type": "code", 330 | "execution_count": null, 331 | "metadata": { 332 | "collapsed": false 333 | }, 334 | "outputs": [], 335 | "source": [ 336 | "import emcee\n", 337 | "\n", 338 | "ndim = len(theta_sim) # number of parameters in the model\n", 339 | "nwalkers = 50 # number of MCMC walkers\n", 340 | "\n", 341 | "# start with theta near the midpoint of the prior 
range\n", 342 | "rng = np.random.RandomState(42)\n", 343 | "theta_guess = 0.5 * (theta_min + theta_max)\n", 344 | "theta_range = (theta_max - theta_min)\n", 345 | "starting_guesses = theta_guess + 0.05 * theta_range * rng.randn(nwalkers, ndim)\n", 346 | "\n", 347 | "# create the sampler; fix the random state for replicability\n", 348 | "sampler = emcee.EnsembleSampler(nwalkers, ndim, log_posterior, args=(t_sim, rv_sim, err_sim))\n", 349 | "sampler.random_state = rng\n", 350 | "\n", 351 | "# time and run the MCMC\n", 352 | "%time pos, prob, state = sampler.run_mcmc(starting_guesses, 1000)" 353 | ] 354 | }, 355 | { 356 | "cell_type": "code", 357 | "execution_count": null, 358 | "metadata": { 359 | "collapsed": false 360 | }, 361 | "outputs": [], 362 | "source": [ 363 | "def plot_chains(sampler):\n", 364 | " fig, ax = plt.subplots(7, figsize=(8, 10), sharex=True)\n", 365 | " for i in range(7):\n", 366 | " ax[i].plot(sampler.chain[:, :, i].T, '-k', alpha=0.2);\n", 367 | " ax[i].set_ylabel(params._fields[i])\n", 368 | "\n", 369 | "plot_chains(sampler)" 370 | ] 371 | }, 372 | { 373 | "cell_type": "markdown", 374 | "metadata": {}, 375 | "source": [ 376 | "Yikes! it's all over the place!\n", 377 | "\n", 378 | "The issue here is that our initialization was haphazard and the posterior is extremely multimodal (especially in *T*); given a number of steps approaching infinity, the MCMC algorithm would converge, but we don't have an infinite amount of time to wait! Instead we can more carefully initialize the walkers.\n", 379 | "\n", 380 | "First, let's use a Lomb-Scargle periodogram to find a suitable guess at the period.\n", 381 | "The [gatspy](http://www.astroml.org/gatspy/) package has a nice implementation (first you need to ``pip install gatspy``)" 382 | ] 383 | }, 384 | { 385 | "cell_type": "code", 386 | "execution_count": null, 387 | "metadata": { 388 | "collapsed": false 389 | }, 390 | "outputs": [], 391 | "source": [ 392 | "from gatspy.periodic import LombScargleFast\n", 393 | "model = LombScargleFast()\n", 394 | "\n", 395 | "model.fit(t_sim, rv_sim, err_sim)\n", 396 | "periods, power = model.periodogram_auto()\n", 397 | "plt.semilogx(periods, power)\n", 398 | "plt.xlim(0, 10000);" 399 | ] 400 | }, 401 | { 402 | "cell_type": "markdown", 403 | "metadata": {}, 404 | "source": [ 405 | "Now we can choose a sensible starting point with this period, and with other parameters estimated from the data" 406 | ] 407 | }, 408 | { 409 | "cell_type": "code", 410 | "execution_count": null, 411 | "metadata": { 412 | "collapsed": false 413 | }, 414 | "outputs": [], 415 | "source": [ 416 | "def make_starting_guess(t, rv, rv_err):\n", 417 | " model = LombScargleFast()\n", 418 | " model.optimizer.set(period_range=theta_lim.T,\n", 419 | " quiet=True)\n", 420 | " model.fit(t, rv, rv_err)\n", 421 | "\n", 422 | " rv_range = 0.5 * (np.max(rv) - np.min(rv))\n", 423 | " rv_center = np.mean(rv)\n", 424 | " return params(T=model.best_period,\n", 425 | " e=0.1,\n", 426 | " K=rv_range,\n", 427 | " V=rv_center,\n", 428 | " omega=np.pi,\n", 429 | " chi=0.5,\n", 430 | " s=rv_err.mean())\n", 431 | "\n", 432 | "theta_guess = make_starting_guess(t_sim, rv_sim, err_sim)\n", 433 | "theta_guess" 434 | ] 435 | }, 436 | { 437 | "cell_type": "code", 438 | "execution_count": null, 439 | "metadata": { 440 | "collapsed": false 441 | }, 442 | "outputs": [], 443 | "source": [ 444 | "sampler.reset()\n", 445 | "start = theta_guess * (1 + 0.01 * rng.randn(nwalkers, ndim))\n", 446 | "pos, prob, state = sampler.run_mcmc(start, 1000)" 447 | ] 448 
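Before trusting the new run, it's worth a quick diagnostic. This cell is an addition (not part of the original notebook): emcee reports a per-walker acceptance fraction, and values roughly in the 0.2-0.5 range are usually taken as a sign of a healthy ensemble.

```python
# Crude MCMC health check: the ensemble's mean acceptance fraction
import numpy as np

print("mean acceptance fraction: {0:.3f}"
      .format(np.mean(sampler.acceptance_fraction)))
```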
| }, 449 | { 450 | "cell_type": "code", 451 | "execution_count": null, 452 | "metadata": { 453 | "collapsed": false 454 | }, 455 | "outputs": [], 456 | "source": [ 457 | "plot_chains(sampler)" 458 | ] 459 | }, 460 | { 461 | "cell_type": "markdown", 462 | "metadata": {}, 463 | "source": [ 464 | "Much more reasonable! The trace appears to have stabilized by the end of the run, so let's reset and get a clean 1000 samples from the posterior:" 465 | ] 466 | }, 467 | { 468 | "cell_type": "code", 469 | "execution_count": null, 470 | "metadata": { 471 | "collapsed": true 472 | }, 473 | "outputs": [], 474 | "source": [ 475 | "sampler.reset()\n", 476 | "pos, prob, state = sampler.run_mcmc(pos, 1000)" 477 | ] 478 | }, 479 | { 480 | "cell_type": "markdown", 481 | "metadata": {}, 482 | "source": [ 483 | "Using the [corner.py](https://pypi.python.org/pypi/corner) package, we can take a look at this multi-dimensional posterior, along with the input values for the parameters:" 484 | ] 485 | }, 486 | { 487 | "cell_type": "code", 488 | "execution_count": null, 489 | "metadata": { 490 | "collapsed": false 491 | }, 492 | "outputs": [], 493 | "source": [ 494 | "import corner\n", 495 | "corner.corner(sampler.flatchain, labels=params._fields, truths=theta_sim);" 496 | ] 497 | }, 498 | { 499 | "cell_type": "markdown", 500 | "metadata": {}, 501 | "source": [ 502 | "We can view the fit to our data by sampling from the posterior and computing the model associated with each sample: because the trace is already a sample from the posterior, we can simply draw randomly from these points:" 503 | ] 504 | }, 505 | { 506 | "cell_type": "code", 507 | "execution_count": null, 508 | "metadata": { 509 | "collapsed": false 510 | }, 511 | "outputs": [], 512 | "source": [ 513 | "rng = np.random.RandomState(42)\n", 514 | "\n", 515 | "rv_fit = [radial_velocity(t_fit, sampler.flatchain[i])\n", 516 | " for i in rng.choice(sampler.flatchain.shape[0], 200)]\n", 517 | "\n", 518 | "plt.errorbar(t_sim, rv_sim, err_sim, fmt='.k')\n", 519 | "plt.plot(t_fit, np.transpose(rv_fit), '-k', alpha=0.01)\n", 520 | "plt.xlabel('time (days)')\n", 521 | "plt.ylabel('radial velocity (km/s)');" 522 | ] 523 | }, 524 | { 525 | "cell_type": "markdown", 526 | "metadata": {}, 527 | "source": [ 528 | "Finally, we can treat everything but the period and eccentricity as nuisance parameters (i.e. marginalize over them) and take a look at our parameter constraints:" 529 | ] 530 | }, 531 | { 532 | "cell_type": "code", 533 | "execution_count": null, 534 | "metadata": { 535 | "collapsed": false 536 | }, 537 | "outputs": [], 538 | "source": [ 539 | "import corner\n", 540 | "corner.corner(sampler.flatchain[:, :2], \n", 541 | " labels=params._fields[:2],\n", 542 | " truths=theta_sim[:2]);" 543 | ] 544 | }, 545 | { 546 | "cell_type": "markdown", 547 | "metadata": {}, 548 | "source": [ 549 | "This is the marginalized posterior distribution for the period and eccentricity (i.e. integrating over all the other parameters as nuisance parameters).\n", 550 | "We see that, as we might hope, the true values lie well within the uncertainty implied by the marginalized posterior!" 
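To report these constraints as numbers rather than a plot, one option (again an addition, not part of the original notebook) is to quote the median and a central 68% credible interval for each parameter of interest directly from the flattened chain:

```python
# Summarize the marginalized posterior for T and e with percentiles
import numpy as np

for i, name in enumerate(params._fields[:2]):   # 'T' and 'e'
    lo, mid, hi = np.percentile(sampler.flatchain[:, i], [16, 50, 84])
    print("{0}: {1:.3f} (+{2:.3f} / -{3:.3f})".format(name, mid, hi - mid, mid - lo))
```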
551 | ] 552 | }, 553 | { 554 | "cell_type": "markdown", 555 | "metadata": {}, 556 | "source": [ 557 | "## Breakout: 47 Ursae Majoris\n", 558 | "\n", 559 | "Your task now is to repeat this analysis using some real data, which we'll take from table 1 of [Fischer et al 2002](http://iopscience.iop.org/article/10.1086/324336/meta).\n", 560 | "\n", 561 | "If you're curious about one possible solution to this problem, see [Solutions-05](Solutions-05.ipynb). As usual, try to fight the temptation to peek at this until after you've given the problem a reasonable effort!\n", 562 | "\n", 563 | "The following IPython magic command will create the data file ``47UrsaeMajoris.txt`` in the current directory:" 564 | ] 565 | }, 566 | { 567 | "cell_type": "code", 568 | "execution_count": null, 569 | "metadata": { 570 | "collapsed": false 571 | }, 572 | "outputs": [], 573 | "source": [ 574 | "%%file 47UrsaeMajoris.txt\n", 575 | "date rv rv_err\n", 576 | "6959.737 -60.48 14.00\n", 577 | "7194.912 -53.60 7.49\n", 578 | "7223.798 -38.36 6.14\n", 579 | "7964.893 0.60 8.19\n", 580 | "8017.730 -28.29 10.57\n", 581 | "8374.771 -40.25 9.37\n", 582 | "8647.897 42.37 11.41\n", 583 | "8648.910 32.64 11.02\n", 584 | "8670.878 55.45 11.45\n", 585 | "8745.691 51.78 8.76\n", 586 | "8992.061 4.49 11.21\n", 587 | "9067.771 -14.63 7.00\n", 588 | "9096.734 -26.06 6.79\n", 589 | "9122.691 -47.38 7.91\n", 590 | "9172.686 -38.22 10.55\n", 591 | "9349.912 -52.21 9.52\n", 592 | "9374.964 -48.69 8.67\n", 593 | "9411.839 -36.01 12.81\n", 594 | "9481.720 -52.46 13.40\n", 595 | "9767.918 38.58 5.48\n", 596 | "9768.908 36.68 5.02\n", 597 | "9802.789 37.93 3.85\n", 598 | "10058.079 15.82 3.45\n", 599 | "10068.980 15.46 4.63\n", 600 | "10072.012 21.20 4.09\n", 601 | "10088.994 1.30 4.25\n", 602 | "10089.947 6.12 3.70\n", 603 | "10091.900 0.00 4.16\n", 604 | "10120.918 4.07 4.16\n", 605 | "10124.905 0.29 3.74\n", 606 | "10125.823 -1.87 3.79\n", 607 | "10127.898 -0.68 4.10\n", 608 | "10144.877 -4.13 5.26\n", 609 | "10150.797 -8.14 4.18\n", 610 | "10172.829 -10.79 4.43\n", 611 | "10173.762 -9.33 5.43\n", 612 | "10181.742 -23.87 3.28\n", 613 | "10187.740 -16.70 4.67\n", 614 | "10199.730 -16.29 3.98\n", 615 | "10203.733 -21.84 4.92\n", 616 | "10214.731 -24.51 3.67\n", 617 | "10422.018 -56.63 4.23\n", 618 | "10438.001 -39.61 3.91\n", 619 | "10442.027 -44.62 4.05\n", 620 | "10502.853 -32.05 4.69\n", 621 | "10504.859 -39.08 4.65\n", 622 | "10536.845 -22.46 5.18\n", 623 | "10537.842 -22.83 4.16\n", 624 | "10563.673 -17.47 4.03\n", 625 | "10579.697 -11.01 3.84\n", 626 | "10610.719 -8.67 3.52\n", 627 | "10793.957 37.00 3.78\n", 628 | "10795.039 41.85 4.80\n", 629 | "10978.684 36.42 5.01\n", 630 | "11131.066 13.56 6.61\n", 631 | "11175.027 -3.74 8.17\n", 632 | "11242.842 -21.85 5.43\n", 633 | "11303.712 -48.75 4.63\n", 634 | "11508.070 -51.65 8.37\n", 635 | "11536.064 -72.44 4.73\n", 636 | "11540.999 -57.58 5.97\n", 637 | "11607.916 -43.94 4.94\n", 638 | "11626.771 -39.14 7.03\n", 639 | "11627.754 -50.88 6.21\n", 640 | "11628.727 -51.52 5.87\n", 641 | "11629.832 -51.86 4.60\n", 642 | "11700.693 -24.58 5.20\n", 643 | "11861.049 14.64 5.33\n", 644 | "11874.068 14.15 5.75\n", 645 | "11881.045 18.02 4.15\n", 646 | "11895.068 16.96 4.60\n", 647 | "11906.014 11.73 4.07\n", 648 | "11907.011 22.83 4.38\n", 649 | "11909.042 23.42 3.78\n", 650 | "11910.955 18.34 4.33\n", 651 | "11914.067 15.45 5.37\n", 652 | "11915.048 24.05 3.82\n", 653 | "11916.033 23.16 3.67\n", 654 | "11939.969 27.53 5.08\n", 655 | "11946.960 21.44 4.18\n", 656 | "11969.902 30.99 4.58\n", 
657 | "11971.894 38.36 5.01\n", 658 | "11998.779 33.82 3.93\n", 659 | "11999.820 27.52 3.98\n", 660 | "12000.858 23.40 4.07\n", 661 | "12028.740 37.08 4.95\n", 662 | "12033.746 26.28 5.24\n", 663 | "12040.759 31.12 3.54\n", 664 | "12041.719 34.04 3.45\n", 665 | "12042.695 31.38 3.98\n", 666 | "12073.723 21.81 4.73" 667 | ] 668 | }, 669 | { 670 | "cell_type": "markdown", 671 | "metadata": {}, 672 | "source": [ 673 | "An easy way to load data in this form is the [pandas](http://pandas.pydata.org) package, which implements a DataFrame object (basically, a labeled data table).\n", 674 | "Reading the CSV file is a one-line operation:" 675 | ] 676 | }, 677 | { 678 | "cell_type": "code", 679 | "execution_count": null, 680 | "metadata": { 681 | "collapsed": false 682 | }, 683 | "outputs": [], 684 | "source": [ 685 | "import pandas as pd\n", 686 | "data = pd.read_csv('47UrsaeMajoris.txt', delim_whitespace=True)\n", 687 | "t, rv, rv_err = data.values.T" 688 | ] 689 | }, 690 | { 691 | "cell_type": "markdown", 692 | "metadata": {}, 693 | "source": [ 694 | "With this data in hand, you can now start to explore it and search for a planet in the radial wobbles of the star." 695 | ] 696 | }, 697 | { 698 | "cell_type": "markdown", 699 | "metadata": {}, 700 | "source": [ 701 | "### Visualize the Data" 702 | ] 703 | }, 704 | { 705 | "cell_type": "code", 706 | "execution_count": null, 707 | "metadata": { 708 | "collapsed": false 709 | }, 710 | "outputs": [], 711 | "source": [ 712 | "# Fill-in code to visualize the data\n", 713 | "\n" 714 | ] 715 | }, 716 | { 717 | "cell_type": "markdown", 718 | "metadata": {}, 719 | "source": [ 720 | "### Compute the Periodogram" 721 | ] 722 | }, 723 | { 724 | "cell_type": "code", 725 | "execution_count": null, 726 | "metadata": { 727 | "collapsed": false 728 | }, 729 | "outputs": [], 730 | "source": [ 731 | "# Compute the periodogram to look for significant periodicity\n", 732 | "\n" 733 | ] 734 | }, 735 | { 736 | "cell_type": "markdown", 737 | "metadata": {}, 738 | "source": [ 739 | "### Initialize and run the MCMC Chain" 740 | ] 741 | }, 742 | { 743 | "cell_type": "code", 744 | "execution_count": null, 745 | "metadata": { 746 | "collapsed": false 747 | }, 748 | "outputs": [], 749 | "source": [ 750 | "# Fill-in your code here\n", 751 | "\n" 752 | ] 753 | }, 754 | { 755 | "cell_type": "markdown", 756 | "metadata": {}, 757 | "source": [ 758 | "### Plot the chains: have they stabilized?" 
759 | ] 760 | }, 761 | { 762 | "cell_type": "code", 763 | "execution_count": null, 764 | "metadata": { 765 | "collapsed": false 766 | }, 767 | "outputs": [], 768 | "source": [ 769 | "# Fill-in your code here\n", 770 | "\n" 771 | ] 772 | }, 773 | { 774 | "cell_type": "markdown", 775 | "metadata": {}, 776 | "source": [ 777 | "### If necessary, reset and re-run the sampler" 778 | ] 779 | }, 780 | { 781 | "cell_type": "code", 782 | "execution_count": null, 783 | "metadata": { 784 | "collapsed": false 785 | }, 786 | "outputs": [], 787 | "source": [ 788 | "# Fill-in your code here\n", 789 | "\n" 790 | ] 791 | }, 792 | { 793 | "cell_type": "markdown", 794 | "metadata": {}, 795 | "source": [ 796 | "### Make a Corner Plot for the Parameters" 797 | ] 798 | }, 799 | { 800 | "cell_type": "code", 801 | "execution_count": null, 802 | "metadata": { 803 | "collapsed": false 804 | }, 805 | "outputs": [], 806 | "source": [ 807 | "# Fill-in your code here\n", 808 | "\n" 809 | ] 810 | }, 811 | { 812 | "cell_type": "markdown", 813 | "metadata": {}, 814 | "source": [ 815 | "### Plot the Model Fits over the Data" 816 | ] 817 | }, 818 | { 819 | "cell_type": "code", 820 | "execution_count": null, 821 | "metadata": { 822 | "collapsed": false 823 | }, 824 | "outputs": [], 825 | "source": [ 826 | "# Fill-in your code here\n", 827 | "\n" 828 | ] 829 | }, 830 | { 831 | "cell_type": "markdown", 832 | "metadata": {}, 833 | "source": [ 834 | "### Results: Report your (joint) uncertainties on period and eccentricity" 835 | ] 836 | }, 837 | { 838 | "cell_type": "code", 839 | "execution_count": null, 840 | "metadata": { 841 | "collapsed": true 842 | }, 843 | "outputs": [], 844 | "source": [ 845 | "# Fill-in your code here\n", 846 | "\n" 847 | ] 848 | }, 849 | { 850 | "cell_type": "markdown", 851 | "metadata": {}, 852 | "source": [ 853 | "## Extra Credit\n", 854 | "\n", 855 | "If you finish early, try tackling this...\n", 856 | "\n", 857 | "The source of the above data is a paper which actually reports *two* detected planets. Build a Bayesian model which models both of them at once: can you find signals from both planets in the data?" 858 | ] 859 | } 860 | ], 861 | "metadata": { 862 | "kernelspec": { 863 | "display_name": "Python 3", 864 | "language": "python", 865 | "name": "python3" 866 | }, 867 | "language_info": { 868 | "codemirror_mode": { 869 | "name": "ipython", 870 | "version": 3 871 | }, 872 | "file_extension": ".py", 873 | "mimetype": "text/x-python", 874 | "name": "python", 875 | "nbconvert_exporter": "python", 876 | "pygments_lexer": "ipython3", 877 | "version": "3.5.1" 878 | } 879 | }, 880 | "nbformat": 4, 881 | "nbformat_minor": 0 882 | } 883 | --------------------------------------------------------------------------------