├── .gitignore
├── README.rst
├── SciPy Lecture 1.pdf
├── SciPy Lecture 4.pdf
├── bioassay.py
├── cov.py
├── mean.py
├── obs.py
├── triangular.py
├── truncated_metropolis.py
└── weibull.py
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
*.pyc
--------------------------------------------------------------------------------
/README.rst:
--------------------------------------------------------------------------------
An Introduction to Bayesian Statistical Modeling using PyMC
===========================================================

PyMC is a Python module that implements Bayesian statistical models and fitting algorithms, including Markov chain Monte Carlo. Its flexibility and extensibility make it applicable to a large suite of problems across all quantitative disciplines. This hands-on tutorial will introduce users to the key components of PyMC and how to employ them to construct, fit and diagnose models. Though some familiarity with statistics is assumed, the tutorial will begin with a brief overview of Bayesian inference, including an introduction to Markov chain Monte Carlo.

Installing PyMC
---------------

PyMC is known to run on Mac OS X, Linux and Windows, but in theory should be
able to work on just about any platform for which Python, a Fortran compiler
and the NumPy module are available. However, installing some extra
dependencies can greatly improve PyMC's performance and versatility.
The following describes the required and optional dependencies and takes you
through the installation process.

Dependencies
------------

PyMC requires some prerequisite packages to be present on the system.
Fortunately, there are currently only a few dependencies, and all are
freely available online.

* `Python`_ version 2.5 or 2.6.

* `NumPy`_ (1.4 or newer): The fundamental scientific programming package; it provides a
  multidimensional array type and many useful functions for numerical analysis.

* `Matplotlib (optional)`_ : 2D plotting library which produces publication-quality
  figures in a variety of image formats and interactive environments.

* `pyTables (optional)`_ : Package for managing hierarchical datasets, designed
  to cope efficiently and easily with extremely large amounts of data.
  Requires the `HDF5`_ library.

* `pydot (optional)`_ : Python interface to Graphviz's Dot language; it allows
  PyMC to create both directed and non-directed graphical representations of models.
  Requires the `Graphviz`_ library.

* `SciPy (optional)`_ : Library of algorithms for mathematics, science
  and engineering.

* `IPython (optional)`_ : An enhanced interactive Python shell and an
  architecture for interactive parallel computing.

* `nose (optional)`_ : A test discovery-based unittest extension (required
  to run the test suite).


There are prebuilt distributions that include all required dependencies. For
Mac OS X users, we recommend the `MacPython`_ distribution or the
`Enthought Python Distribution`_ on OS X 10.5 (Leopard), or the Python 2.6.1 that
ships with OS X 10.6 (Snow Leopard). Windows users should download and install the
`Enthought Python Distribution`_, which comes bundled with these prerequisites.
Note that, depending on how current these distributions are, some packages
may need to be updated manually.

For Mac OS X 10.6 (Snow Leopard) users, a script for installing all the key dependencies, as well as a recent build of PyMC, can be downloaded from the `SciPy Superpack page`_.

If instead of installing the prebuilt binaries you prefer (or have) to build
``pymc`` yourself, make sure you have a Fortran and a C compiler. There are free
compilers (gfortran, gcc) available on all platforms. Other compilers have not been
tested with PyMC but may work nonetheless.


.. _`Python`: http://www.python.org/

.. _`NumPy`: http://www.scipy.org/NumPy

.. _`Matplotlib (optional)`: http://matplotlib.sourceforge.net/

.. _`MacPython`: http://www.activestate.com/Products/ActivePython/

.. _`Enthought Python Distribution`: http://www.enthought.com/products/epddownload.php

.. _`SciPy (optional)`: http://www.scipy.org/

.. _`IPython (optional)`: http://ipython.scipy.org/

.. _`pyTables (optional)`: http://www.pytables.org/moin

.. _`HDF5`: http://www.hdfgroup.org/HDF5/

.. _`pydot (optional)`: http://code.google.com/p/pydot/

.. _`Graphviz`: http://www.graphviz.org/

.. _`nose (optional)`: http://somethingaboutorange.com/mrl/projects/nose/

.. _`SciPy Superpack page`: http://stronginference.com/scipy-superpack/

Compiling the source code
-------------------------

You can check out the latest development source of the code from the `GitHub`_
repository::

    git clone git://github.com/pymc-devs/pymc.git pymc

Then move into the ``pymc`` directory and follow the platform-specific instructions.

Though this code is technically development source, it contains important bug fixes and features absent from the previous release (2.1) and is relatively stable. Hence, we recommend using the latest development code if possible. A new release is in the works, but will not be complete prior to SciPy 2011.

Windows
~~~~~~~

One way to compile PyMC on Windows is to install `MinGW`_ and `MSYS`_. MinGW is
the GNU Compiler Collection (GCC) augmented with Windows-specific headers and
libraries. MSYS is a POSIX-like console (bash) with UNIX command line tools.
Download the `Automated MinGW Installer`_ and double-click on it to launch
the installation process. You will be asked to select which
components are to be installed: make sure the g77 compiler is selected and
proceed with the instructions. Then download and install `MSYS-1.0.exe`_,
launch it and again follow the on-screen instructions.

Once this is done, launch the MSYS console, change into the PyMC directory and
type::

    python setup.py install

This will build the C and Fortran extensions and copy the libraries and Python
modules into the ``C:/Python26/Lib/site-packages/pymc`` directory.

.. _`GitHub`: http://github.com

.. _`MinGW`: http://www.mingw.org/

.. _`MSYS`: http://www.mingw.org/wiki/MSYS

.. _`Automated MinGW Installer`: http://sourceforge.net/projects/mingw/files/

.. _`MSYS-1.0.exe`: http://downloads.sourceforge.net/mingw/MSYS-1.0.11.exe

Mac OS X or Linux
~~~~~~~~~~~~~~~~~

In a terminal, type::

    python setup.py config_fc --fcompiler gnu95 build
    python setup.py install

The above assumes that you have gfortran installed and available. The
``sudo`` command may be required to install PyMC into the Python ``site-packages``
directory if it has restricted privileges.

In addition, the python2.6-dev package may be required to install PyMC on Linux systems. On Ubuntu or Debian, we have had success by installing the following prior to building PyMC::

    sudo apt-get install ipython python-setuptools python-dev python-nose \
        python-tk python-numpy python-matplotlib python-scipy python-networkx \
        gfortran libatlas-base-dev


Running the test suite
----------------------

``pymc`` comes with a set of tests that verify that the critical components
of the code work as expected. To run these tests, users must have `nose`_
installed. The tests are launched from a Python shell::

    import pymc
    pymc.test()

If any tests fail, messages detailing the nature of the failures will appear.
Should this happen (it shouldn't), please report the problem on the
`issue tracker`_, specifying the PyMC version you are using and your environment.

.. _`nose`: http://somethingaboutorange.com/mrl/projects/nose/

.. _`issue tracker`: http://github.com/pymc-devs/pymc/issues


Code for BDA Project Template
-----------------------------

Here is `a template`_ for a project to do Bayesian data analysis with PyMC.

.. _`a template`: https://github.com/aflaxman/pymc-project-template

Code for the Human Development Index vs Total Fertility Rate example
--------------------------------------------------------------------

Code to `replicate examples`_ from the tutorial.

.. _`replicate examples`: https://github.com/aflaxman/pymc-example-tfr-hdi
--------------------------------------------------------------------------------
/SciPy Lecture 1.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fonnesbeck/pymc_tutorial/46b12d84569b517f19aec2961bb052a94de06a51/SciPy Lecture 1.pdf
--------------------------------------------------------------------------------
/SciPy Lecture 4.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fonnesbeck/pymc_tutorial/46b12d84569b517f19aec2961bb052a94de06a51/SciPy Lecture 4.pdf
--------------------------------------------------------------------------------
/bioassay.py:
--------------------------------------------------------------------------------
from pymc import *
from numpy import array

# Dose-response data: deaths out of n animals at each dose
n = [5]*4
dose = [-.86, -.3, -.05, .73]
response = [0, 1, 3, 5]

# Vague priors on the logistic regression coefficients
alpha = Normal('alpha', mu=0, tau=0.01)
beta = Normal('beta', mu=0, tau=0.01)

# Probability of death at each dose, via the inverse-logit link
theta = Lambda('theta', lambda a=alpha, b=beta: invlogit(a + b*array(dose)))

@observed
def deaths(value=response, n=n, p=theta):
    """deaths ~ binomial(n, p)"""
    return binomial_like(value, n, p)
--------------------------------------------------------------------------------
/cov.py:
--------------------------------------------------------------------------------
from pymc.gp import *
from pymc.gp.cov_funs import matern
from numpy import *

# A Matern covariance function; the commented-out lines show alternative
# covariance representations
C = Covariance(eval_fun=matern.euclidean, diff_degree=1.4, amp=.4, scale=1.)
# C = Covariance(eval_fun=matern.euclidean, diff_degree=1.4, amp=.4, scale=1., rank_limit=100)
# C = FullRankCovariance(eval_fun=matern.euclidean, diff_degree=1.4, amp=.4, scale=1.)
# C = NearlyFullRankCovariance(eval_fun=matern.euclidean, diff_degree=1.4, amp=.4, scale=1.)

#### - Plot - ####
if __name__ == '__main__':
    from pylab import *

    x = arange(-1., 1., .01)
    clf()

    # Plot the covariance function
    subplot(1, 2, 1)
    contourf(x, x, C(x, x).view(ndarray), origin='lower', extent=(-1., 1., -1., 1.), cmap=cm.bone)
    xlabel('x')
    ylabel('y')
    title('C(x,y)')
    axis('tight')
    colorbar()

    # Plot a slice of the covariance function
    subplot(1, 2, 2)
    plot(x, C(x, 0).view(ndarray).ravel(), 'k-')
    xlabel('x')
    ylabel('C(x,0)')
    title('A slice of C')

    # show()
--------------------------------------------------------------------------------
/mean.py:
--------------------------------------------------------------------------------
from pymc.gp import *

# Quadratic prior mean function for the GP
def quadfun(x, a, b, c):
    return (a * x ** 2 + b * x + c)

M = Mean(quadfun, a=1., b=.5, c=2.)
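
#### - Plot - ####
# Usage sketch (an addition to the original file): evaluate and plot the prior
# mean on a mesh, mirroring the plotting blocks in cov.py and obs.py. It
# assumes pymc.gp Mean objects are callable on a mesh, as in the GP examples.
if __name__ == '__main__':
    from pylab import *

    x = arange(-1., 1., .01)
    clf()

    plot(x, asarray(M(x)).ravel(), 'k-')

    xlabel('x')
    ylabel('M(x)')
    title('The quadratic prior mean function')
    axis('tight')

    # show()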
--------------------------------------------------------------------------------
/obs.py:
--------------------------------------------------------------------------------
# Import the mean and covariance
from mean import M
from cov import C
from pymc.gp import *
from numpy import *

# Impose observations on the GP
o = array([-.5, .5])
V = array([.002, .002])
data = array([3.1, 2.9])
observe(M, C, obs_mesh=o, obs_V=V, obs_vals=data)

# Generate realizations
f_list = [Realization(M, C) for i in range(3)]

x = arange(-1., 1., .01)

#### - Plot - ####
if __name__ == '__main__':
    from pylab import *

    x = arange(-1., 1., .01)
    clf()

    plot_envelope(M, C, mesh=x)
    for f in f_list:
        plot(x, f(x))

    xlabel('x')
    ylabel('f(x)')
    title('Three realizations of the observed GP')
    axis('tight')

    # show()
--------------------------------------------------------------------------------
/triangular.py:
--------------------------------------------------------------------------------
from numpy import log, random, sqrt, zeros, atleast_1d, inf

def triangular_like(x, mode, minval, maxval):
    """Log-likelihood of the triangular distribution"""

    x = atleast_1d(x)

    # Check for support
    if any(x < minval) or any(x > maxval):
        return -inf

    # Log-likelihood of values left of the mode
    like = sum(log(2*(x[x <= mode] - minval)) - log(mode - minval) - log(maxval - minval))

    # Log-likelihood of values right of the mode
    like += sum(log(2*(maxval - x[x > mode])) - log(maxval - minval) - log(maxval - mode))

    return like

def rtriangular(mode, minval, maxval, size=1):
    """Generate triangular random numbers by inverse-CDF transformation"""

    # Uniform random numbers
    z = atleast_1d(random.random(size))

    # Threshold for the transformation
    threshold = (mode - minval)/(maxval - minval)

    # Transform the uniforms
    u = atleast_1d(zeros(size))
    u[z <= threshold] = minval + sqrt(z[z <= threshold]*(maxval - minval)*(mode - minval))
    u[z > threshold] = maxval - sqrt((1 - z[z > threshold])*(maxval - minval)*(maxval - mode))

    return u
--------------------------------------------------------------------------------
/truncated_metropolis.py:
--------------------------------------------------------------------------------
import pymc


class TruncatedMetropolis(pymc.Metropolis):
    """Metropolis step method with truncated-normal proposals."""

    def __init__(self, stochastic, low_bound, up_bound, *args, **kwargs):
        self.low_bound = low_bound
        self.up_bound = up_bound
        pymc.Metropolis.__init__(self, stochastic, *args, **kwargs)

    # The propose method generates proposal values
    def propose(self):
        tau = 1./(self.adaptive_scale_factor * self.proposal_sd)**2
        self.stochastic.value = \
            pymc.rtruncnorm(self.stochastic.value, tau, self.low_bound, self.up_bound)

    # The Hastings factor accounts for the asymmetric proposal distribution
    def hastings_factor(self):
        tau = 1./(self.adaptive_scale_factor * self.proposal_sd)**2
        cur_val = self.stochastic.value
        last_val = self.stochastic.last_value

        lp_for = pymc.truncnorm_like(cur_val, last_val, tau,
                                     self.low_bound, self.up_bound)
        lp_bak = pymc.truncnorm_like(last_val, cur_val, tau,
                                     self.low_bound, self.up_bound)

        if self.verbose > 1:
            print self._id + ': Hastings factor %f' % (lp_bak - lp_for)
        return lp_bak - lp_for
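
# Usage sketch (an addition to the original file): register the custom step
# method with an MCMC sampler. The model below is hypothetical and purely
# illustrative; it assumes PyMC 2's standard MCMC.use_step_method API.
if __name__ == '__main__':
    from numpy import random

    # A bounded mean parameter with some synthetic normal data
    mu = pymc.Uniform('mu', lower=0., upper=10., value=5.)
    y = pymc.Normal('y', mu=mu, tau=1., value=random.normal(5., 1., 20),
                    observed=True)

    M = pymc.MCMC([mu, y])
    # Sample mu with truncated-normal proposals restricted to its (0, 10) support
    M.use_step_method(TruncatedMetropolis, mu, 0., 10.)
    M.sample(5000, burn=1000)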
--------------------------------------------------------------------------------
/weibull.py:
--------------------------------------------------------------------------------
import pymc

# Some fake data
alpha = 3
beta = 5
N = 100
dataset = pymc.rweibull(alpha, beta, N)

# Model
a = pymc.Uniform('a', lower=0, upper=10, value=5, doc='Weibull alpha parameter')
b = pymc.Uniform('b', lower=0, upper=10, value=5, doc='Weibull beta parameter')
like = pymc.Weibull('like', alpha=a, beta=b, value=dataset, observed=True)
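
# Fitting sketch (an addition to the original file): fit the model above by
# MCMC. It assumes PyMC 2's standard MCMC API; the sampling settings are
# illustrative only.
if __name__ == '__main__':
    M = pymc.MCMC([a, b, like])
    M.sample(10000, burn=5000)

    # Posterior means of the Weibull parameters
    print M.stats()['a']['mean'], M.stats()['b']['mean']
--------------------------------------------------------------------------------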