├── Bayesian Inference.mlappinstall
├── LICENSE.txt
├── README.md
├── entropy.m
├── invert_VBAtoolbox.m
├── invert_monte_carlo.m
├── invert_variational_laplace.m
├── invert_variational_numeric.m
├── invert_variational_stochastic.m
├── log_joint.m
├── log_likelihood.m
├── main.m
├── sigmoid.m
└── simulate_data.m

/Bayesian Inference.mlappinstall:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lionel-rigoux/tutorial-bayesian-inference/8553fa3c1251c3dff291bed919fe8ef8bffc15fd/Bayesian Inference.mlappinstall
--------------------------------------------------------------------------------
/LICENSE.txt:
--------------------------------------------------------------------------------
MIT License
-----------

Copyright (c) 2017 Lionel Rigoux (lionel-rigoux.github.io)
Permission is hereby granted, free of charge, to any person
obtaining a copy of this software and associated documentation
files (the "Software"), to deal in the Software without
restriction, including without limitation the rights to use,
copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the
Software is furnished to do so, subject to the following
conditions:

The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
OTHER DEALINGS IN THE SOFTWARE.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# A practical tutorial on Bayesian inference

The goal of this repo is to provide a gentle introduction to numerical methods for Bayesian inference. Papers on the topic are usually quite abstract and general, and existing implementations are too complex to be reverse engineered.

Here, you'll find different numerical solutions to a single, simple model: the logistic regression (see below). The various algorithms are deliberately reduced to their bare minimum in order to provide simple working examples. Hopefully, this code will provide some insight into the different approaches, their strengths, and their limitations.

## The model: logistic regression

Imagine a simple psychophysics experiment in which we present the subject with a sequence of stimuli of varying intensity. The probability of the subject detecting the stimulus increases with its intensity:

![By Meerpirat - Own work, CC BY-SA 4.0, Link](https://upload.wikimedia.org/wikipedia/commons/thumb/d/d0/Psychometric_function_with_artificial_data.png/295px-Psychometric_function_with_artificial_data.png)

Let `x` denote the stimulus intensity.
If we assume that the task is calibrated such that the subject is at chance level for a neutral stimulus (`x = 0`), we can write the [psychometric function](https://en.wikipedia.org/wiki/Psychometric_function) that maps the stimulus intensity to the probability of detection using a sigmoid function:

![](http://latex.codecogs.com/gif.latex?s%28x%2C%5Ctheta%29%20%3D%20%5Cfrac%7B1%7D%7B1+e%5E%7B-%5Ctheta%20x%7D%7D)

where theta is a parameter that captures the sensitivity of the subject to changes in intensity. This function is implemented in `sigmoid.m`.

At each trial, the subject can only give a binary answer (`y`): "seen" (`y = 1`) or "not seen" (`y = 0`). Formally, we can describe the probability distribution of the responses, a.k.a. the likelihood function, as a [Bernoulli distribution](https://en.wikipedia.org/wiki/Bernoulli_distribution), i.e.:

![](http://latex.codecogs.com/gif.latex?p%28y%7C%5Ctheta%29%20%3D%20s%28x%2C%5Ctheta%29%5Ey%20%5B1-s%28x%2C%5Ctheta%29%5D%5E%7B1-y%7D)

The function `simulate_data(true_theta)` will simulate 100 artificial responses for a sequence of stimuli between -5 and 5, for a given sensitivity parameter `true_theta`.

## The solutions

Our goal is to perform Bayesian inference to invert the logistic model.

The function `[posterior, logEvidence] = main()` will generate artificial data, define a prior, and run a set of inversion routines that approximate both the posterior and the (log) model evidence. More precisely, it implements:

- An MCMC scheme: the Metropolis-Hastings algorithm, with a rough approximation of the model evidence via the harmonic estimator (`invert_monte_carlo.m`).
- A variational-Laplace scheme, as implemented in SPM or the VBA toolbox (`invert_variational_laplace.m`).
- A variational procedure without the Laplace approximation. Although this method is never used in practice, it helps dissociate the influence of the Gaussian approximation of the posterior from that of the Laplace approximation (`invert_variational_numeric.m`).
- A stochastic-gradient "black box" scheme (`invert_variational_stochastic.m`), as commonly found in machine learning.
- An easy inversion using the VBA toolbox (`invert_VBAtoolbox.m`).

## References

### Conjugacy

- Murphy, K. P. (2007). Conjugate Bayesian analysis of the Gaussian distribution.

### Variational inference

- Zhang, C., Butepage, J., Kjellstrom, H., & Mandt, S. (2018). Advances in variational inference. IEEE Transactions on Pattern Analysis and Machine Intelligence.

#### Laplace

- Daunizeau, J. (2017). The variational Laplace approach to approximate Bayesian inference. arXiv preprint arXiv:1703.02089.
- Friston, K., Mattout, J., Trujillo-Barreto, N., Ashburner, J., & Penny, W. (2007). Variational free energy and the Laplace approximation. Neuroimage, 34(1), 220-234.

#### Stochastic Gradient

- Ranganath, R., Gerrish, S., & Blei, D. (2014, April). Black box variational inference. In Artificial Intelligence and Statistics (pp. 814-822).

### Markov chain Monte Carlo methods

- Neal, R. M. (1993). Probabilistic inference using Markov chain Monte Carlo methods. Technical Report CRG-TR-93-1, Department of Computer Science, University of Toronto.
- Geyer, C. J. (2011). Introduction to Markov chain Monte Carlo. In Handbook of Markov Chain Monte Carlo. Chapman & Hall/CRC.
- Andrieu, C., De Freitas, N., Doucet, A., & Jordan, M. I. (2003). An introduction to MCMC for machine learning. Machine Learning, 50(1-2), 5-43.

#### Deep Inference

- Nolan, S., Smerzi, A., & Pezzè, L. (2021). Machine learning approach to Bayesian parameter estimation. npj Quantum Information, 7, 169.

### Model selection

- Friel, N., & Wyse, J. (2012). Estimating the evidence–a review. Statistica Neerlandica, 66(3), 288-308.
- Stephan, K. E., Penny, W. D., Daunizeau, J., Moran, R. J., & Friston, K. J. (2009). Bayesian model selection for group studies. Neuroimage, 46(4), 1004-1017.
- Rigoux, L., Stephan, K. E., Friston, K. J., & Daunizeau, J. (2014). Bayesian model selection for group studies—revisited. Neuroimage, 84, 971-985.
- Penny, W. D., Stephan, K. E., Daunizeau, J., Rosa, M. J., Friston, K. J., Schofield, T. M., & Leff, A. P. (2010). Comparing families of dynamic causal models. PLoS Computational Biology, 6(3), e1000709.

### Hierarchical approaches

- Piray, P., Dezfouli, A., Heskes, T., Frank, M. J., & Daw, N. D. (2019). Hierarchical Bayesian inference for concurrent model fitting and comparison for group studies. PLoS Computational Biology, 15(6), e1007043.
--------------------------------------------------------------------------------
/entropy.m:
--------------------------------------------------------------------------------
function H = entropy (q)
% Entropy of the univariate Gaussian
% -------------------------------------------------------------------------
% $$ H[q] = E[-\log q(\theta)]_q = \frac{1}{2} \log(2 \pi e \Sigma) $$
% -------------------------------------------------------------------------
H = 0.5 * (log (2 * pi * q.sigma) + 1);
end
--------------------------------------------------------------------------------
/invert_VBAtoolbox.m:
--------------------------------------------------------------------------------
function [posterior, logEvidence] = invert_VBAtoolbox(data, prior)
% Bayesian logistic regression using the VBA toolbox (variational Laplace)
% -------------------------------------------------------------------------
% This script is a minimal demo showing how to run the inference given only
% the specification of the model prediction, letting the toolbox do all
% the work.
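%
% Note: this demo assumes that the VBA toolbox (github.com/MBB-team/VBA-toolbox)
% is installed and on the MATLAB path, so that VBA_sigmoid and
% VBA_NLStateSpaceModel are available.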
% -------------------------------------------------------------------------

%% =========================================================================
% Model definition
% =========================================================================
% Note that in the toolbox, parameters of static models are called "phi"

% mapping between input and response
function [gx] = g_logistic(~, param, input, ~)
    gx = VBA_sigmoid(param * input);
end

% number of parameters
dim.n_phi = 1;

% indicate we are fitting binary data
options.sources.type = 1;

%% =========================================================================
% Inference
% =========================================================================

% specify the prior
options.priors.muPhi = prior.mu;
options.priors.SigmaPhi = prior.sigma;
options.tolFun = 1e-4;
options.GNtolFun = 1e-4;

% call the inversion routine
[post, out] = VBA_NLStateSpaceModel (data.y, data.x, [], @g_logistic, dim, options);

%% =========================================================================
% Wrapping up
% =========================================================================

% rename for consistency with the other demos
posterior.mu = post.muPhi;
posterior.sigma = post.SigmaPhi;
logEvidence = out.F;

end
--------------------------------------------------------------------------------
/invert_monte_carlo.m:
--------------------------------------------------------------------------------
function [posterior, logEvidence, theta] = invert_monte_carlo (data, prior)
% Bayesian logistic regression using MCMC/sampling (Metropolis-Hastings)
% -------------------------------------------------------------------------
% This script implements a Metropolis-Hastings algorithm to generate samples
% from the posterior of our logistic problem.
% We also compute the so-called harmonic estimator of the model evidence
% using the posterior samples.
% -------------------------------------------------------------------------

%% Metropolis-Hastings algorithm
% =========================================================================
% Here, we use a Markov chain to generate samples and an accept/reject rule
% based on the joint density to collect samples from the posterior.

% Initialisation
% -------------------------------------------------------------------------
% number of samples
N = 1e6;
% variance of the proposal distribution
proposal_sigma = 0.15;
% starting values (pre-allocate the chain for speed)
theta = zeros(1, N);
old = log_joint(data, prior, theta(1));

% Main loop
% -------------------------------------------------------------------------
for t = 2 : N

    % propose a new sample with a random step (Markov-chain jump)
    proposal = theta(t-1) + sqrt (proposal_sigma) * randn();

    % compute the (log) joint probability of the proposal
    new = log_joint (data, prior, proposal);

    % do we get warmer?
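    % Since the Gaussian proposal is symmetric, the Metropolis-Hastings
    % acceptance probability is min(1, p(y,proposal) / p(y,theta(t-1))),
    % computed here in log-space; comparing the (possibly > 1) ratio to a
    % uniform draw implements exactly that rule.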
    accept_prob = exp (new - old);
    accept = accept_prob > rand(1);
    if accept % if yes, confirm the jump
        theta(t) = proposal;
        old = new;
    else % otherwise, stay in place
        theta(t) = theta(t-1);
    end
end

%% Posterior characterization
% =========================================================================
% Before we can use the law of large numbers, we need to clean up the
% samples to ensure they are memoryless (no effect of the starting point)
% and independent (no autocorrelation).

% If we were doing things properly, we would run multiple chains and
% compute convergence scores, e.g. the Gelman-Rubin diagnostic.

% Remove the "burn-in" phase. See "Geweke diagnostic"
% -------------------------------------------------------------------------
theta(1:100) = [];

% De-correlation
% -------------------------------------------------------------------------
% Compute the autocorrelation for increasing lags
AR = nan(1, 100);
for lag = 1 : 100
    AR(lag) = corr(theta(lag+1:end)', theta(1:end-lag)');
end
% find the minimum lag for which the autocorrelation is negligible
optlag = find(AR < .05, 1);
% decimate the samples accordingly
theta = theta(1:optlag:end);

% Posterior moments
% -------------------------------------------------------------------------
% We can now use the law of large numbers to approximate the sufficient
% statistics of the posterior distribution.
posterior.mu = mean (theta);
posterior.sigma = var (theta);

%% Model evidence
% =========================================================================
% Although one can approximate the model evidence using samples from the
% prior, it is better to use samples from the posterior because they better
% explore the region where the likelihood is high. Here, we apply the
% so-called harmonic estimator, which uses samples from the posterior:
% \theta_t ~ p(\theta|y)
%
% $$ p(y) \approx N / \sum_t [1 / p(y|\theta_t)] $$
%
% Note that this estimator tends to overestimate the evidence and is quite
% insensitive to the prior:
% See https://radfordneal.wordpress.com/2008/08/17/the-harmonic-mean-of-the-likelihood-worst-monte-carlo-method-ever/
% Better (but slightly more complicated) estimators exist, like the one in
% Chib & Jeliazkov (2001) for the Metropolis-Hastings output.

ll = nan(1, numel (theta));
for t = 1 : numel (theta)
    ll(t) = log_likelihood (data, theta(t));
end

logEvidence = log (numel (ll)) - logsumexp (-ll);

end

% =========================================================================
% Returns log(sum(exp(a))) while avoiding numerical underflow.
function s = logsumexp(a)
ma = max(a);
s = ma + log(sum(exp(a - ma)));
end
--------------------------------------------------------------------------------
/invert_variational_laplace.m:
--------------------------------------------------------------------------------
function [posterior, logEvidence] = invert_variational_laplace (data, prior)
% Bayesian logistic regression using variational Laplace
% -------------------------------------------------------------------------
% This script implements a simple variational inference scheme to compute
% the (Gaussian approximation of the) posterior distribution and the
% (Free energy approximation of the) model evidence for our logistic
% model.
% Using the Laplace approximation to the expected log joint, all
% values except the posterior mean can be derived analytically.
% -------------------------------------------------------------------------

% -------------------------------------------------------------------------
% 1a) The posterior mean is the MAP estimate. This is usually found with a
% regularized Gauss-Newton scheme. For the sake of simplicity, we prefer
% here the built-in Matlab optimization function.

posterior.mu = fminsearch(@(theta) - log_joint(data, prior, theta), 0);

% -------------------------------------------------------------------------
% 1b) The posterior variance has an analytical solution:
%
% $$ \Sigma^* = - [d^2/d\theta^2 \log p(y,\theta)]^{-1} $$
%

% Second order derivative of the log prior
d2dtheta2_logPrior = - 1 / prior.sigma;

% Second order derivative of the log likelihood. This is the most difficult
% term to derive, and it can be approximated via numerical differentiation
% if necessary. Here, we use known identities of the log-sigmoid
% derivatives to simplify the expression.
sx = sigmoid (data.x, posterior.mu);
d2dtheta2_logLikelihood = sum(- data.x.^2 .* sx .* (1-sx));

% Second order derivative of the log joint
posterior.sigma = - inv (d2dtheta2_logPrior + d2dtheta2_logLikelihood);

% -------------------------------------------------------------------------
% 2) The (log) model evidence can be approximated by the Free energy, which
% is itself approximated via the Laplace approximation at the optimal q.

logEvidence = free_energy_laplace (data, prior, posterior);

end

%% =========================================================================
% Free energy:
% $$ F = E[\log p(y,\theta)]_q + H[q] $$
% =========================================================================
function F = free_energy_laplace (data, prior, q)
F = ...
    Eq_log_joint_laplace (data, prior, q) ...
    + entropy (q);
end

%% =========================================================================
% Evaluates the expectation of the log joint distribution, i.e.:
%
% $$ E[\log p(y, \theta)]_q = \int \log p(y,\theta) q(\theta) d\theta $$
%
% using the Laplace approximation:
%
% $$ E[\log p(y, \theta)]_q \approx
%        \log p(y, \theta^*)
%        + 1/2 tr[\Sigma^* d^2/d\theta^2 \log p(y, \theta^*)]
% $$
%
% Here, we use the fact that:
%
% $$ \Sigma^* = - [d^2/d\theta^2 \log p(y, \theta^*)]^{-1} $$
%
% to simplify the second term of the approximation.
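%
% In one dimension, this substitution gives
%   1/2 tr[\Sigma^* (-1/\Sigma^*)] = -1/2,
% which is the constant correction term used below.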
% =========================================================================
function elj = Eq_log_joint_laplace (data, prior, q)
elj = log_joint (data, prior, q.mu) - 0.5;
end
--------------------------------------------------------------------------------
/invert_variational_numeric.m:
--------------------------------------------------------------------------------
function [posterior, logEvidence] = invert_variational_numeric(data, prior)
% Bayesian logistic regression using variational inference (with sampling)
% -------------------------------------------------------------------------
% This script implements a simple variational inference scheme to compute
% the (Gaussian approximation of the) posterior distribution and the
% (Free energy approximation of the) model evidence for our logistic
% model. In this case, we do not use the Laplace approximation and instead
% compute the expected log joint via sampling.
% No one uses this in practice; this code is intended for educational
% purposes only!
% -------------------------------------------------------------------------

% Assuming that the posterior takes a Gaussian form:
%   q(\theta) = N(mu, Sigma)
% the inference reduces to an optimization problem: finding the moments
% mu and Sigma which maximize the Free energy.

% Maximize the Free energy. For the sake of simplicity, we quickstart the
% search from a good initial guess.
[muSigma, mF] = fminsearch ( ...
    @(x) - free_energy(data, prior, struct('mu', x(1), 'sigma', x(2)^2)), ...
    [2; 1], ... starting point of the search
    struct('TolFun', 1e-2, 'TolX', 1e-2) ... no need to be more precise than the sampling
);

% Wrapping up
posterior.mu = muSigma(1);
posterior.sigma = muSigma(2)^2;
logEvidence = - mF;

end

%% =========================================================================
% Free energy:
% $$ F = E[\log p(y,\theta)]_q + H[q] $$
% =========================================================================
function F = free_energy (data, prior, q)
F = ...
    Eq_log_joint (data, prior, q) ...
    + entropy (q);
end

%% =========================================================================
% Computes the expectation of the log joint distribution, i.e.:
%
% $$ E[\log p(y, \theta)]_q = \int \log p(y,\theta) q(\theta) d\theta $$
%
% This is done via a sampling approach as follows:
%   - for t = 1 : N
%       - sample \theta_t from q(\theta)
%       - compute \log p(y, \theta_t)
% According to the law of large numbers, the mean of the \log p(y, \theta_t)
% will converge, with increasing N, to the value of the expectation.
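%
% A minimal sketch of the same estimator, assuming the helper functions of
% this repo are on the path (the function below does the same thing, with
% pre-allocation and a parfor loop):
%
%   elj = mean (arrayfun (@(k) ...
%       log_joint (data, prior, q.mu + sqrt (q.sigma) * randn ()), 1 : N));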
% =========================================================================
function elj = Eq_log_joint(data, prior, q)
% initialisation
% -------------------------------------------------------------------------
% number of samples
N = 1e4;
% memory pre-allocation
lj = nan(1, N);

% Sampling procedure
% -------------------------------------------------------------------------
parfor t = 1 : N
    % draw theta from q(\theta) = N(mu, Sigma)
    theta = q.mu + sqrt(q.sigma) * randn();
    % compute the corresponding log joint
    lj(t) = log_joint (data, prior, theta);
end

% Apply the law of large numbers
% -------------------------------------------------------------------------
elj = mean (lj);
end
--------------------------------------------------------------------------------
/invert_variational_stochastic.m:
--------------------------------------------------------------------------------
function [q, logEvidence] = invert_variational_stochastic (data, prior)
% Bayesian logistic regression using stochastic gradient inference
% -------------------------------------------------------------------------
% This script implements a so-called "black box" variational inference scheme.
% This approach relies on the Free energy approximation (aka the ELBO). Rather
% than trying to directly approximate the expected log joint (e.g. with the
% Laplace approximation), we first derive the gradient of the ELBO with
% respect to the variational parameters (mu, sigma) and approximate this
% gradient via sampling.
%
% See Ranganath, Gerrish & Blei (2014), Black Box Variational Inference
% -------------------------------------------------------------------------

% Assuming that the posterior takes a Gaussian form:
%   q(\theta) = N(mu, Sigma)
% the inference reduces to an optimization problem: finding the moments
% mu and Sigma which maximize the Free energy.
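%
% Concretely, the loop below uses the score-function form of the ELBO
% gradient,
%
%   dF/dlambda = E_q[ d log q(\theta; lambda) / d lambda
%                     * (log p(y,\theta) - log q(\theta; lambda)) ],
%
% with lambda = (mu, sqrt(Sigma)), approximated by a Monte Carlo average over
% a batch of draws from q, plus a control variate to reduce its variance
% (see the reference above).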

%% =========================================================================
% Free energy optimisation
% =========================================================================

% Meta parameters
% -------------------------------------------------------------------------
% number of random draws for the gradient estimation
batchSize = 1e3;

% convergence criteria
epsilon = 0.001;
maxIter = 1e3;

% learning rate scaling
eta = 1;

% Stochastic gradient ascent
% -------------------------------------------------------------------------
% initialisation
q = prior;
h = nan(2, batchSize);
f = nan(2, batchSize);

% follow the gradient until convergence
for t = 1 : maxIter

    % draw a random batch of parameters from the variational distribution
    z = q.mu + sqrt(q.sigma) * randn(1, batchSize);

    % compute scores for all elements of the batch
    for i = 1 : batchSize
        h(:,i) = grad_score(q, z(i));
        f(:,i) = h(:,i) * (log_joint(data, prior, z(i)) - score(q, z(i)));
    end

    % approximate the gradient as an expectation
    for d = 1 : 2
        % control variate
        c = cov(h(d,:), f(d,:));
        a(d) = c(1,2) / var(h(d,:));

        % approximate gradient
        gL(d, t) = mean(f(d,:) - a(d) * h(d,:));
    end

    % adjust the learning rate (adaGrad)
    rho = eta ./ sqrt(sum(gL.^2, 2));

    % update the variational moments
    delta = rho .* gL(:,t);
    q.mu = q.mu + delta(1);
    q.sigma = (sqrt(q.sigma) + delta(2))^2;

    % check for convergence
    if norm(delta) < epsilon
        break
    end

end

%% =========================================================================
% model evidence
% =========================================================================

% Using Jensen's inequality, we can write the lower bound on the log
% model evidence as:
%
% $$ \log p(y) >= E_q[\log p(y,\theta) - \log q(\theta)] $$
%
% We can then use the law of large numbers to approximate the expectation
% via sampling.

nSamples = 1e5;
Li = nan(1, nSamples);
for i = 1 : nSamples
    theta = q.mu + sqrt(q.sigma) * randn();
    Li(i) = log_joint(data, prior, theta) - score(q, theta);
end
logEvidence = mean(Li);

end

%% =========================================================================
% Subfunctions
% =========================================================================

% scoring function: log q(\theta), with q = N(mu, sigma)
function s = score (q, z_i)
s = - 0.5*log(2*pi*q.sigma) - 0.5 * ((z_i-q.mu).^2)/q.sigma;
end

% gradient of the scoring function with respect to the variational parameters
function g = grad_score(q, z_i)
% d/d mu
g(1,1) = (z_i-q.mu)/q.sigma;
% d/d sqrt(Sigma), i.e. with respect to the standard deviation (which is
% also the parameter updated in the main loop)
sq_sigma = sqrt(q.sigma);
g(2,1) = ((z_i-q.mu)^2)/sq_sigma^3 - 1/sq_sigma;
end
--------------------------------------------------------------------------------
/log_joint.m:
--------------------------------------------------------------------------------
function z = log_joint(data, prior, theta)

% compute log joint
z = - 0.5*log(2*pi*prior.sigma) - 0.5 * ((theta-prior.mu).^2)/prior.sigma ... % Gaussian prior
    + log_likelihood(data, theta);
end
--------------------------------------------------------------------------------
/log_likelihood.m:
--------------------------------------------------------------------------------
function ll = log_likelihood(data, theta)

% model prediction
sx = sigmoid(data.x, theta);
% avoid numerical overflow
sx = max(min(sx, 1-eps), eps);
% compute the log likelihood
ll = sum( ... % aggregate over observations
    data.y.*log(sx) + (1-data.y).*log(1-sx) ... % Bernoulli log-likelihood
);
end
--------------------------------------------------------------------------------
/main.m:
--------------------------------------------------------------------------------
function [posterior, logEvidence] = main ()

% Simulate data according to our logistic model with a known parameter
% -------------------------------------------------------------------------
true_theta = 3;
data = simulate_data (true_theta);

% Define a prior
% -------------------------------------------------------------------------
prior.mu = 0;
prior.sigma = 5;

% Solve the inference problem...
% -------------------------------------------------------------------------

% - using sampling (MCMC)
tic
[posterior(1), logEvidence(1), samples] = invert_monte_carlo (data, prior);
toc

% - using the variational-Laplace approach
tic
[posterior(2), logEvidence(2)] = invert_variational_laplace (data, prior);
toc

% - using a plain variational scheme without the Laplace approximation
tic
[posterior(3), logEvidence(3)] = invert_variational_numeric (data, prior);
toc

% - using variational inference with stochastic gradients ("black box")
tic
[posterior(4), logEvidence(4)] = invert_variational_stochastic (data, prior);
toc

% - using the VBA toolbox (variational Laplace)
tic
[posterior(5), logEvidence(5)] = invert_VBAtoolbox (data, prior);
toc

% Plot the results
% -------------------------------------------------------------------------

x = linspace(- 6, 6, 200);
figure();
hold on
plot_gaussian(x, prior, 'r');
plot_likelihood(x, data, 'b');
plot_joint(x, data, prior, 'm');
plot_samples(samples, [.3 .6 .6]);
plot_gaussian(x, posterior(2), 'g');
plot_gaussian(x, posterior(4), 'g--');
legend({'prior', 'likelihood', 'joint', 'MCMC', 'variational Laplace', 'blackbox'});

end

% plotting helpers

function h = plot_gaussian(x, p, color)
h = plot(x, normpdf (x, p.mu, sqrt(p.sigma)), color);
end

function plot_samples(samples, color)
histogram (samples, ...
    'Normalization', 'pdf', ...
    'FaceColor', color, ...
    'EdgeColor', color)
end

function h = plot_likelihood(x, data, color)
p = nan(1, numel (x));
for t = 1 : numel (x)
    p(t) = exp(log_likelihood(data, x(t)));
end
p = p / sum (p * (x(2)-x(1)));
h = plot(x, p, color);
end

function h = plot_joint(x, data, prior, color)
p = nan(1, numel (x));
for t = 1 : numel (x)
    p(t) = exp(log_joint(data, prior, x(t)));
end
p = p / sum (p * (x(2)-x(1)));
h = plot(x, p, color);
end
--------------------------------------------------------------------------------
/sigmoid.m:
--------------------------------------------------------------------------------
function z = sigmoid (x, theta)
z = 1 ./ (1 + exp(- theta * x));
end
--------------------------------------------------------------------------------
/simulate_data.m:
--------------------------------------------------------------------------------
function data = simulate_data (theta)
% Simulate binary responses from a logistic model
% -------------------------------------------------------------------------
% -------------------------------------------------------------------------

% true parameter of the model: the slope of the sigmoid mapping
data.theta = theta;

% experimental manipulation (e.g. stimulus intensity)
data.x = linspace(-5, 5, 100);

% predictions of the model
data.sx = sigmoid (data.x, data.theta);

% generate binary responses from the model predictions
data.y = +(rand(1, numel(data.x)) < data.sx);
--------------------------------------------------------------------------------