├── Bayesian Inference.mlappinstall
├── LICENSE.txt
├── README.md
├── entropy.m
├── invert_VBAtoolbox.m
├── invert_monte_carlo.m
├── invert_variational_laplace.m
├── invert_variational_numeric.m
├── invert_variational_stochastic.m
├── log_joint.m
├── log_likelihood.m
├── main.m
├── sigmoid.m
└── simulate_data.m
/Bayesian Inference.mlappinstall:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lionel-rigoux/tutorial-bayesian-inference/8553fa3c1251c3dff291bed919fe8ef8bffc15fd/Bayesian Inference.mlappinstall
--------------------------------------------------------------------------------
/LICENSE.txt:
--------------------------------------------------------------------------------
1 | MIT License
2 | -----------
3 |
4 | Copyright (c) 2017 Lionel Rigoux (lionel-rigoux.github.io)
5 | Permission is hereby granted, free of charge, to any person
6 | obtaining a copy of this software and associated documentation
7 | files (the "Software"), to deal in the Software without
8 | restriction, including without limitation the rights to use,
9 | copy, modify, merge, publish, distribute, sublicense, and/or sell
10 | copies of the Software, and to permit persons to whom the
11 | Software is furnished to do so, subject to the following
12 | conditions:
13 |
14 | The above copyright notice and this permission notice shall be
15 | included in all copies or substantial portions of the Software.
16 |
17 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
18 | EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
19 | OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
20 | NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
21 | HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
22 | WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
23 | FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
24 | OTHER DEALINGS IN THE SOFTWARE.
25 |
26 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # A practical tutorial on Bayesian inference
2 |
3 | The goal of this repo is to provide a gentle introduction to numerical methods for Bayesian inference. Papers on the topic are usually quite abstract and general, and existing implementations are too complex to be reverse engineered.
4 |
5 | Here, you'll find different numerical solutions to a single, simple model: the logistic regression (see below). The various algorithms are deliberately reduced to their bare minimum in order to provide simple working examples. Hopefully, this code will provide some insight into the different approaches, their strengths, and their limitations.
6 |
7 | ## The model: logistic regression
8 |
9 | Imagine a simple psychophysics experiment in which we present the subject with a sequence of stimuli of various intensities. The probability of the subject detecting the stimulus increases with its intensity:
10 |
11 | 
12 |
13 | Let's denote by `x` the stimulus intensity. If we assume that the task is calibrated such that the subject is at chance level for a neutral stimulus (`x = 0`), we can write the [psychometric function](https://en.wikipedia.org/wiki/Psychometric_function) that maps the stimulus intensity to the probability of detection using a sigmoid function:
14 |
15 | 
16 |
17 | where theta is a parameter that captures the sensitivity of the subject to changes in intensity. This function is implemented in `sigmoid.m`.
18 |
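For instance (a minimal sketch, assuming the repository folder is on the MATLAB path):

```matlab
sigmoid(0, 3)   % = 0.5: chance-level detection for a neutral stimulus, whatever theta
sigmoid(2, 3)   % ~ 0.998: an intense stimulus is almost always detected when theta = 3
```
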
19 | At each trial, the subject can only give a binary answer (`y`), "seen" (`y = 1`) or "not seen" (`y = 0`). Formally, we can describe the probability distribution of the responses, a.k.a. the likelihood function, as a [Bernoulli distribution](https://en.wikipedia.org/wiki/Bernoulli_distribution), i.e.:
20 |
21 | 
22 |
23 | The function `simulate_data(true_theta)` will simulate 100 artificial responses for a sequence of stimuli between -5 and 5 for a given sensitivity parameter `true_theta`.
24 |
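For example, a minimal sketch of the simulation step (the field names below are those set by `simulate_data.m`):

```matlab
data = simulate_data(3);   % simulate a subject with sensitivity theta = 3
% data.x  : 100 stimulus intensities, linearly spaced between -5 and 5
% data.sx : detection probabilities, sigmoid(data.x, 3)
% data.y  : binary responses, drawn as rand(1, numel(data.x)) < data.sx
```
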
25 | ## The solutions
26 |
27 | Our goal is to perform Bayesian inference to invert the logistic model.
28 |
29 | The function `[posterior, logEvidence] = main()` will generate artificial data, define a prior, and run a set of inversion routines that will approximate both the posterior and the (log-)model evidence. More precisely, it will implement:
30 |
31 | - An MCMC scheme: the Metropolis-Hastings algorithm, with a rough approximation of the model evidence via the Harmonic estimator (`invert_monte_carlo.m`).
32 | - A Variational-Laplace scheme, as implemented in SPM or the VBA toolbox (`invert_variational_laplace.m`).
33 | - A Variational procedure without Laplace. Although this method is never used in practice, it can help dissociate the influence of the Gaussian approximation of the posterior from the Laplace approximation (`invert_variational_numeric.m`).
34 | - A Stochastic Gradient "blackbox" scheme (`invert_variational_stochastic.m`), as usually found in machine learning.
35 | - An easy inversion using the VBA toolbox (`invert_VBAtoolbox.m`).
36 |
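As a minimal usage sketch (mirroring what `main.m` does), every routine shares the same `(data, prior)` interface, so each can also be run on its own:

```matlab
data = simulate_data(3);            % artificial responses for a true theta of 3

prior.mu    = 0;                    % Gaussian prior over theta
prior.sigma = 5;

% any of the invert_* functions can be swapped in here
[posterior, logEvidence] = invert_variational_laplace(data, prior);
```
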
37 | ## References
38 |
39 | ### Conjugacy
40 |
41 | - Murphy, K. P. (2007). Conjugate Bayesian analysis of the Gaussian distribution.
42 |
43 | ### Variational inference
44 |
45 | - Zhang, C., Butepage, J., Kjellstrom, H., & Mandt, S. (2018). Advances in variational inference. IEEE Transactions on Pattern Analysis and Machine Intelligence.
46 |
47 | #### Laplace
48 |
49 | - Daunizeau, J. (2017). The variational Laplace approach to approximate Bayesian inference. arXiv preprint arXiv:1703.02089.
50 | - Friston, K., Mattout, J., Trujillo-Barreto, N., Ashburner, J., & Penny, W. (2007). Variational free energy and the Laplace approximation. Neuroimage, 34(1), 220-234.
51 |
52 | #### Stochastic Gradient
53 |
54 | - Ranganath, R., Gerrish, S., & Blei, D. (2014, April). Black box variational inference. In Artificial Intelligence and Statistics (pp. 814-822).
55 |
56 | ### Markov chain Monte Carlo methods
57 |
58 | - Neal, R. M. (1993). Probabilistic inference using Markov chain Monte Carlo methods (pp. 93-1). Toronto, Ontario, Canada: Department of Computer Science, University of Toronto.
59 | - Geyer, C. J. (2011). Introduction to Markov chain Monte Carlo. In Handbook of Markov Chain Monte Carlo.
60 | - Andrieu, C., De Freitas, N., Doucet, A., & Jordan, M. I. (2003). An introduction to MCMC for machine learning. Machine learning, 50(1-2), 5-43.
61 |
62 | #### Deep Inference
63 |
64 | - Nolan, S., Smerzi, A., & Pezzè, L. (2021). Machine learning approach to Bayesian parameter estimation. npj Quantum Information, 7, 169.
65 |
66 | ### Model selection
67 |
68 | - Friel, N., & Wyse, J. (2012). Estimating the evidence–a review. Statistica Neerlandica, 66(3), 288-308.
69 | - Stephan, K. E., Penny, W. D., Daunizeau, J., Moran, R. J., & Friston, K. J. (2009). Bayesian model selection for group studies. Neuroimage, 46(4), 1004-1017.
70 | - Rigoux, L., Stephan, K. E., Friston, K. J., & Daunizeau, J. (2014). Bayesian model selection for group studies—revisited. Neuroimage, 84, 971-985.
71 | - Penny, W. D., Stephan, K. E., Daunizeau, J., Rosa, M. J., Friston, K. J., Schofield, T. M., & Leff, A. P. (2010). Comparing families of dynamic causal models. PLoS computational biology, 6(3), e1000709.
72 |
73 | ### Hierarchical approaches
74 |
75 | - Piray, P., Dezfouli, A., Heskes, T., Frank, M. J., & Daw, N. D. (2019). Hierarchical Bayesian inference for concurrent model fitting and comparison for group studies. PLoS computational biology, 15(6), e1007043.
76 |
--------------------------------------------------------------------------------
/entropy.m:
--------------------------------------------------------------------------------
1 | function H = entropy (q)
2 | % Entropy of the univariate Gaussian
3 | % -------------------------------------------------------------------------
4 | % $$ H[q] = E[- \log q(\theta)]_q = 1/2 log(2e\pi\Sigma) $$
5 | % -------------------------------------------------------------------------
6 | H = 0.5 * (log (2 * pi * q.sigma) + 1);
7 | end
--------------------------------------------------------------------------------
/invert_VBAtoolbox.m:
--------------------------------------------------------------------------------
1 | function [posterior, logEvidence] = invert_VBAtoolbox(data, prior)
2 | % Bayesian logistic regression using the VBA toolbox (variational laplace)
3 | % -------------------------------------------------------------------------
4 | % This script is a minimal demo showing how to run the inference given only
5 | % the specification of the model prediction, letting the toolbox do all
6 | % the work.
7 | % -------------------------------------------------------------------------
8 |
9 | %% =========================================================================
10 | % Model definition
11 | % =========================================================================
12 | % Note that in the toolbox, parameters of static models are called "phi"
13 |
14 | % mapping between input and response
15 | function [gx] = g_logistic(~,param,input,~)
16 | gx = VBA_sigmoid(param * input);
17 | end
18 |
19 | % number of parameters
20 | dim.n_phi = 1;
21 |
22 | % indicate we are fitting binary data
23 | options.sources.type = 1;
24 |
25 | %% =========================================================================
26 | % Inference
27 | % =========================================================================
28 |
29 | % specify the prior
30 | options.priors.muPhi = prior.mu;
31 | options.priors.SigmaPhi = prior.sigma;
32 | options.tolFun = 1e-4;
33 | options.GNtolFun = 1e-4;
34 |
35 | % call the inversion routine
36 | [post, out] = VBA_NLStateSpaceModel (data.y, data.x, [], @g_logistic, dim, options);
37 |
38 |
39 | %% =========================================================================
40 | % Wrapping up
41 | % =========================================================================
42 |
43 | % rename for consistency with the other demos
44 | posterior.mu = post.muPhi;
45 | posterior.sigma = post.SigmaPhi;
46 | logEvidence = out.F;
47 |
48 | end
49 |
--------------------------------------------------------------------------------
/invert_monte_carlo.m:
--------------------------------------------------------------------------------
1 | function [posterior, logEvidence, theta] = invert_monte_carlo (data, prior)
2 | % Bayesian logistic regression using MCMC/sampling (Metropolis-Hastings)
3 | % -------------------------------------------------------------------------
4 | % This script implements a Metropolis-Hastings algorithm to generate samples
5 | % from the posterior of our logistic problem.
6 | % We also compute the so-called harmonic estimator of the model evidence
7 | % using the posterior samples.
8 | % -------------------------------------------------------------------------
9 |
10 | %% Metropolis-Hastings algorithm
11 | % ========================================================================
12 | % Here, we use a Markov Chain to generate samples and an accept/reject rule
13 | % based on the joint density to collect samples from the posterior.
14 |
15 | % Initialisation
16 | % ---------------------------------------------------------------------
17 | % number of samples
18 | N = 1e6;
19 | % variance of the proposal distribution
20 | proposal_sigma = 0.15;
21 | % starting values
22 | theta = 0;
23 | old = log_joint(data, prior, theta);
24 |
25 | % Main loop
26 | % ---------------------------------------------------------------------
27 | for t = 2 : N
28 |
29 | % propose a new sample with a random step (Markov-Chain jump)
30 | proposal = theta(t-1) + sqrt (proposal_sigma) * randn();
31 |
32 | % compute the (log) joint probability of the proposal
33 | new = log_joint (data, prior, proposal);
34 |
35 | % do we get warmer?
36 | accept_prob = exp (new - old);
37 | accept = accept_prob > rand(1) ;
38 | if accept % if yes, confirm the jump
39 | theta(t) = proposal;
40 | old = new;
41 |
42 | else % otherwise, stay in place
43 | theta(t) = theta(t-1);
44 | end
45 | end
46 |
47 | %% Posterior characterization
48 | % ========================================================================
49 | % Before we can use the law of large numbers, we need to clean up the
50 | % samples to ensure they are memoryless (no effect of the starting point)
51 | % and independent (no autocorrelation).
52 |
53 | % If we were doing things properly, we would have run multiple chains and
54 | % computed convergence scores, e.g. the Gelman-Rubin diagnostic...
55 |
56 | % Remove "burn in" phase. See "Geweke diagnostic"
57 | % ---------------------------------------------------------------------
58 | theta(1:100) = [];
59 |
60 | % De-correlation
61 | % ---------------------------------------------------------------------
62 | % Compute autocorrelation for increasing lag
63 | for lag = 1 : 100
64 | AR(lag) = corr(theta(lag+1:end)', theta(1:end-lag)');
65 | end
66 | % find minimum lag to have negligible autocorrelation
67 | optlag = find(AR<.05, 1);
68 | % decimate the samples accordingly
69 | theta = theta(1:optlag:end);
70 |
71 | % Posterior moments
72 | % ---------------------------------------------------------------------
73 | % We can now use the law of large numbers to approximate the sufficient
74 | % statistics of the posterior distribution.
75 | posterior.mu = mean (theta);
76 | posterior.sigma = var (theta);
77 |
78 | %% Model evidence
79 | % ========================================================================
80 | % While one could approximate the model evidence using samples from the prior,
81 | % it is better to do it using samples from the posterior, because they better
82 | % explore the region where the likelihood is high. Here, we apply the so-called Harmonic
83 | % estimator which uses samples from the posterior: \theta_t ~ p(\theta|y)
84 | %
85 | % $$ p(y) \approx N / sum [1 / p(y|\theta_t)] $$
86 | %
87 | % Note that this estimator tends to overestimate the evidence and is quite
88 | % insensitive to the prior:
89 | % See https://radfordneal.wordpress.com/2008/08/17/the-harmonic-mean-of-the-likelihood-worst-monte-carlo-method-ever/
90 | % Better (but slightly more complicated) estimators exist, like the one in
91 | % Chib & Jeliazkov for the Metropolis-Hastings output.
92 |
93 | for t = 1 : numel (theta)
94 | ll(t) = log_likelihood (data, theta(t));
95 | end
96 |
97 | logEvidence = log (numel (ll)) - logsumexp (-ll);
98 |
99 | end
100 |
101 | % ========================================================================
102 | % Returns log(sum(exp(a))) while avoiding numerical underflow.
103 | function s = logsumexp(a)
104 | ma = max(a);
105 | s = ma + log(sum(exp(a - ma)));
106 | end
107 |
--------------------------------------------------------------------------------
/invert_variational_laplace.m:
--------------------------------------------------------------------------------
1 | function [posterior, logEvidence] = invert_variational_laplace (data, prior)
2 | % Bayesian logistic regression using variational Laplace
3 | % -------------------------------------------------------------------------
4 | % This script implements a simple variational inference scheme to compute
5 | % the (Gaussian approximation of the) posterior distribution and the
6 | % (Free energy approximation of the) model evidence for our logistic
7 | % model. Using the Laplace approximation to the expected log joint, all
8 | % values except the posterior mean can be derived analytically
9 | % -------------------------------------------------------------------------
10 |
11 | % -------------------------------------------------------------------------
12 | % 1a) the posterior mean is the MAP. This is usually solved using a
13 | % regularized Gauss-Newton scheme. For the sake of simplicity, we prefer
14 | % here the built-in Matlab optimization function.
15 |
16 | posterior.mu = fminsearch(@(theta) - log_joint(data, prior, theta), 0);
17 |
18 | % -------------------------------------------------------------------------
19 | % 1b) The posterior variance has an analytical solution:
20 | %
21 | %    $$ \Sigma^* = - [d^2/d\theta^2 \log p(y,\theta)]^{-1} $$
22 | %
23 |
24 | % Second order derivative of the log prior
25 | d2dtheta2_logPrior = - 1 / prior.sigma;
26 |
27 | % Second order derivative of the log likelihood. This is the most difficult
28 | % term to derive, and can be approximated via automatic numeric
29 | % differentiation if necessary. Here, we used known identities of the
30 | % log-sigmoid derivatives to simplify the expression.
31 | sx = sigmoid (data.x, posterior.mu);
32 | d2dtheta2_logLikelihood = sum(- data.x.^2 .* sx .* (1-sx));
33 |
34 | % Second order derivative of the log joint
35 | posterior.sigma = - inv (d2dtheta2_logPrior + d2dtheta2_logLikelihood);
36 |
37 | % -------------------------------------------------------------------------
38 | % 2) The (log) model evidence can be approximated by the Free energy, which
39 | % is itself approximated via the Laplace approximation at the optimal q.
40 |
41 | logEvidence = free_energy_laplace (data, prior, posterior);
42 |
43 | end
44 |
45 | %% =========================================================================
46 | % Free energy:
47 | %   $$ F = E[\log p(y,\theta)]_q + H[q] $$
48 | % =========================================================================
49 | function F = free_energy_laplace (data, prior, q)
50 | F = ...
51 | Eq_log_joint_laplace (data, prior, q) ...
52 | + entropy (q);
53 | end
54 |
55 | %% =========================================================================
56 | % Evaluates the expectation of the log joint distribution, ie:
57 | %
58 | % $$ E[\log p(y, \theta)]_q = \int \log p(y,\theta) q(\theta) d\theta $$
59 | %
60 | % using the Laplace approximation:
61 | %
62 | % $$ E[\log p(y, \theta)]_q \approx
63 | % \log p(y, \theta^*)
64 | % + 1/2 tr[Sigma d^2/d\theta^2 \log p(y, \theta)]
65 | % $$
66 | %
67 | % Here, we used the fact that:
68 | %
69 | % $$ Sigma^* = - [d^2/d\theta^2 \log p(y, \theta)]^-1 $$
70 | %
71 | % to simplify the second term of the approximation: the trace reduces to tr[-I] = -1 (a single parameter), hence the constant -0.5 below.
72 | % =========================================================================
73 | function elj = Eq_log_joint_laplace (data, prior, q)
74 | elj = log_joint (data, prior, q.mu) - 0.5;
75 | end
76 |
77 |
78 |
79 |
80 |
81 |
--------------------------------------------------------------------------------
/invert_variational_numeric.m:
--------------------------------------------------------------------------------
1 | function [posterior, logEvidence] = invert_variational_numeric(data, prior)
2 | % Bayesian logistic regression using variational inference (with sampling)
3 | % -------------------------------------------------------------------------
4 | % This script implements a simple variational inference scheme to compute
5 | % the (Gaussian approximation of the) posterior distribution and the
6 | % (Free energy approximation of the) model evidence for our logistic
7 | % model. In this case, we do not use the Laplace approximation and compute
8 | % the expected log-joint via sampling.
9 | % No one uses this in practice; this code is only intended for
10 | % educational purposes!
11 | % -------------------------------------------------------------------------
12 |
13 | % Assuming that the posterior takes a Gaussian form:
14 | % q(\theta) = N(mu, Sigma)
15 | % the inference reduces to an optimization problem: finding the moments
16 | % mu and Sigma which maximize the Free energy.
17 |
18 | % Starting point of the search. For the sake of simplicity, we quickstart
19 | % the algorithm with a good guess.
20 |
21 | % Maximize the Free Energy
22 | [muSigma, mF] = fminsearch ( ...
23 | @(x) - free_energy(data, prior, struct('mu', x(1), 'sigma', x(2)^2)), ...
24 | [2; 1], ... starting point of the search
25 | struct('TolFun', 1e-2, 'TolX', 1e-2) ... no need to be more precise than the sampling
26 | );
27 |
28 | % Wrapping up
29 | posterior.mu = muSigma(1);
30 | posterior.sigma = muSigma(2)^2;
31 | logEvidence = - mF;
32 |
33 | end
34 |
35 | %% =========================================================================
36 | % Free energy:
37 | %   $$ F = E[\log p(y,\theta)]_q + H[q] $$
38 | % =========================================================================
39 | function F = free_energy (data, prior, q)
40 | F = ...
41 | Eq_log_joint (data, prior, q) ...
42 | + entropy (q);
43 | end
44 |
45 | %% =========================================================================
46 | % Computes the expectation of the log joint distribution, ie:
47 | %
48 | % $$ E[\log p(y, \theta)]_q = \int \log p(y,\theta) q(\theta) d\theta $$
49 | %
50 | % This is done via a sampling approach as follows:
51 | % - for t = 1 : N
52 | % - sample \theta_t from q(\theta)
53 | % - compute \log p(y, \theta_t)
54 | % According to the law of large numbers, the mean of the \log p(y, \theta_t)
55 | % will converge, with increasing N, to the value of the expectation.
56 | % =========================================================================
57 | function elj = Eq_log_joint(data, prior, q)
58 | % initialisation
59 | % ---------------------------------------------------------------------
60 | % number of samples
61 | N = 1e4;
62 | % memory pre-allocation
63 | lj = nan(1, N);
64 |
65 | % Sampling procedure
66 | % ---------------------------------------------------------------------
67 | parfor t = 1 : N
68 | % draw theta from q(\theta) = N(mu, Sigma)
69 | theta = q.mu + sqrt(q.sigma) * randn();
70 | % compute the corresponding log joint
71 | lj(t) = log_joint (data, prior, theta);
72 | end
73 |
74 | % Apply the law of large numbers
75 | % ---------------------------------------------------------------------
76 | elj = mean (lj);
77 | end
78 |
79 |
80 |
--------------------------------------------------------------------------------
/invert_variational_stochastic.m:
--------------------------------------------------------------------------------
1 | function [q, logEvidence] = invert_variational_stochastic (data, prior)
2 | % Bayesian logistic regression using stochastic gradient inference
3 | % -------------------------------------------------------------------------
4 | % This script implements a so-called "blackbox" variational inference scheme.
5 | % This approach relies on the Free Energy approximation (aka ELBO). Rather
6 | % than trying to directly approximate the expected energy (e.g. with the Laplace
7 | % approximation), we first derive the gradient of the ELBO wrt the variational
8 | % parameters (mu, sigma) and approximate this gradient via sampling.
9 | %
10 | % See Ranganath, Gerrish & Blei (2013) Black Box Variational Inference
11 | % -------------------------------------------------------------------------
12 |
13 | % Assuming that the posterior takes a Gaussian form:
14 | % q(\theta) = N(mu, Sigma)
15 | % the inference reduces to an optimization problem: finding the moments
16 | % mu and Sigma which maximize the Free energy.
17 |
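% As a reminder, a sketch of the estimator used below (following Ranganath et al.):
% writing lambda = (mu, sqrt(Sigma)) for the variational parameters, the
% score-function form of the Free energy gradient is
%
%   $$ \nabla_\lambda F = E_q[ \nabla_\lambda \log q(\theta;\lambda)
%                              ( \log p(y,\theta) - \log q(\theta;\lambda) ) ] $$
%
% which is approximated below by a Monte Carlo average over a batch of samples
% drawn from q, with a control variate to reduce the variance of the estimate.
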
18 | %% =========================================================================
19 | % Free energy optimisation
20 | % =========================================================================
21 |
22 | % Meta parameters
23 | % -------------------------------------------------------------------------
24 | % number of random draws for the gradient estimation
25 | batchSize = 1e3;
26 |
27 | % convergence criteria
28 | epsilon = 0.001;
29 | maxIter = 1e3;
30 |
31 | % learning rate scaling
32 | eta = 1;
33 |
34 | % Stochastic gradient ascent
35 | % -------------------------------------------------------------------------
36 | % initialisation
37 | q = prior;
38 |
39 | % follow gradient until convergence
40 | for t = 1 : maxIter
41 |
42 | % draw a random batch of parameters from the variational distribution
43 | z = q.mu + sqrt(q.sigma) * randn(1,batchSize);
44 |
45 | % compute scores for all elements of the batch
46 | for i = 1 : batchSize
47 | h(:,i) = grad_score(q, z(i));
48 | f(:,i) = h(:,i) * (log_joint(data, prior, z(i)) - score(q, z(i)));
49 | end
50 |
51 | % approximate gradient as expectation
52 | for d = 1 : 2
53 | % control variate
54 | c = cov(h(d,:),f(d,:));
55 | a(d) = c(1,2)/var(h(d,:));
56 |
57 | % approximate gradient
58 | gL(d, t) = mean(f(d,:) - a(d)*h(d,:));
59 | end
60 |
61 | % adjust learning rate (adaGrad)
62 | rho = eta./sqrt(sum(gL.^2,2));
63 |
64 | % update variational moments
65 | delta = rho .* gL(:,t);
66 | q.mu = q.mu + delta(1);
67 | q.sigma = (sqrt(q.sigma) + delta(2))^2;
68 |
69 | % check for convergence
70 | if norm(delta) < epsilon
71 | break
72 | end
73 |
74 | end
75 |
76 | %% =========================================================================
77 | % model evidence
78 | % =========================================================================
79 |
80 | % Using Jensen's inequality, we can write a lower bound on the log
81 | % model evidence as:
82 | %
83 | %   $$ \log p(y) >= E_q[\log p(y,\theta) - \log q(\theta)] $$
84 | %
85 | % We can then use the law of large numbers to approximate the expectation
86 | % via sampling.
87 |
88 | for i = 1 : 1e5
89 | theta = q.mu + sqrt(q.sigma) * randn();
90 |     Li(i) = log_joint(data, prior, theta) - score(q, theta);
91 | end
92 | logEvidence = mean(Li);
93 |
94 | end
95 |
96 | %% =========================================================================
97 | % Subfunction
98 | % =========================================================================
99 |
100 | % scoring function: log q(\theta), with q(\theta) = N(mu, Sigma)
101 | function s = score (q, z_i)
102 | s = - 0.5*log(2*pi*q.sigma) - 0.5 * ((z_i-q.mu).^2)/q.sigma;
103 | end
104 |
105 | % gradient of the scoring function
106 | function g = grad_score(q, z_i)
107 |     % d/d\mu
108 |     g(1,1) = (z_i-q.mu)/q.sigma;
109 |     % d/d sqrt(\Sigma), i.e. the derivative wrt the standard deviation (which is what gets updated above)
110 |     sq_sigma = sqrt(q.sigma);
111 |     g(2,1) = ((z_i-q.mu)^2)/sq_sigma^3 - 1/sq_sigma;
112 | end
113 |
--------------------------------------------------------------------------------
/log_joint.m:
--------------------------------------------------------------------------------
1 | function z = log_joint(data, prior, theta)
2 |
3 | % compute log joint
4 | z = - 0.5*log(2*pi*prior.sigma) - 0.5 * ((theta-prior.mu).^2)/prior.sigma ... % Gaussian prior
5 | + log_likelihood(data, theta);
6 | end
--------------------------------------------------------------------------------
/log_likelihood.m:
--------------------------------------------------------------------------------
1 | function ll = log_likelihood(data, theta)
2 |
3 | % model prediction
4 | sx = sigmoid(data.x, theta);
5 | % avoid numerical overflow
6 | sx = max(min(sx,1-eps), eps);
7 | % compute log likelihood
8 | ll = sum( ... % aggregate over observations
9 |     data.y.*log(sx) + (1-data.y).*log(1-sx) ... % Bernoulli log-likelihood
10 | );
11 | end
--------------------------------------------------------------------------------
/main.m:
--------------------------------------------------------------------------------
1 | function [posterior, logEvidence] = main ()
2 |
3 | % Simulate data according to our logistic model with a known parameter
4 | % -------------------------------------------------------------------------
5 | true_theta = 3;
6 | data = simulate_data (true_theta);
7 |
8 | % Define a prior
9 | % -------------------------------------------------------------------------
10 | prior.mu = 0;
11 | prior.sigma = 5;
12 |
13 | % Solve the inference problem...
14 | % -------------------------------------------------------------------------
15 |
16 | % - using sampling (MCMC)
17 | tic
18 | [posterior(1), logEvidence(1), samples] = invert_monte_carlo (data, prior);
19 | toc
20 |
21 | % - using the variational-Laplace approach
22 | tic
23 | [posterior(2), logEvidence(2)] = invert_variational_laplace (data, prior);
24 | toc
25 |
26 | % - using a plain variational scheme without the Laplace approximation
27 | tic
28 | [posterior(3), logEvidence(3)] = invert_variational_numeric (data, prior);
29 | toc
30 |
31 | % - using variational inference with stochastic gradient ("blackbox")
32 | tic
33 | [posterior(4), logEvidence(4)] = invert_variational_stochastic (data, prior);
34 | toc
35 |
36 | % - using the VBA toolbox (variational Laplace)
37 | tic
38 | [posterior(5), logEvidence(5)] = invert_VBAtoolbox (data, prior);
39 | toc
40 |
41 | % Plot the results
42 | % -------------------------------------------------------------------------
43 |
44 | x = linspace(- 6, 6, 200);
45 | figure();
46 | hold on
47 | plot_gaussian(x, prior, 'r');
48 | plot_likelihood(x, data, 'b');
49 | plot_joint(x, data, prior, 'm');
50 | plot_samples(samples, [.3 .6 .6]);
51 | plot_gaussian(x, posterior(2), 'g');
52 | plot_gaussian(x, posterior(4), 'g--');
53 | legend({'prior','likelihood', 'joint', 'MCMC', 'variational Laplace', 'blackbox'});
54 |
55 | end
56 |
57 | % plotting helpers
58 |
59 | function h = plot_gaussian(x, p, color)
60 | h = plot(x, normpdf (x, p.mu, sqrt(p.sigma)), color);
61 | end
62 |
63 | function plot_samples(samples, color)
64 | histogram (samples, ...
65 | 'Normalization', 'pdf', ...
66 | 'FaceColor', color, ...
67 | 'EdgeColor', color)
68 | end
69 |
70 | function h = plot_likelihood(x, data, color)
71 | for t = 1 : numel (x)
72 | p(t) = exp(log_likelihood(data, x(t)));
73 | end
74 | p = p / sum (p * (x(2)-x(1)));
75 | h = plot(x, p, color);
76 | end
77 |
78 | function h = plot_joint(x, data, prior, color)
79 | for t = 1 : numel (x)
80 | p(t) = exp(log_joint(data, prior, x(t)));
81 | end
82 | p = p / sum (p * (x(2)-x(1)));
83 | h = plot(x, p, color);
84 | end
85 |
86 |
87 |
88 |
--------------------------------------------------------------------------------
/sigmoid.m:
--------------------------------------------------------------------------------
1 | function z = sigmoid (x, theta)
2 | z = 1 ./ (1 + exp(- theta * x));
3 | end
--------------------------------------------------------------------------------
/simulate_data.m:
--------------------------------------------------------------------------------
1 | function data = simulate_data (theta)
2 | % Simulate binary responses from a logistic model
3 | % -------------------------------------------------------------------------
4 | % -------------------------------------------------------------------------
5 |
6 | % true parameter of the model: the slope of the sigmoid mapping
7 | data.theta = theta;
8 |
9 | % experimental manipulation (eg: stimulus intensity)
10 | data.x = linspace(-5,5,100);
11 |
12 | % predictions of the model
13 | data.sx = sigmoid (data.x, data.theta);
14 |
15 | % generate binary responses from model predictions
16 | data.y = +(rand(1, numel(data.x)) < data.sx) ;
17 |
18 |
19 |
--------------------------------------------------------------------------------