├── images
│   ├── hmm.png
│   ├── lds.png
│   └── transition.png
├── Readme.md
├── 01-hmm-em-algorithm.ipynb
└── 02-lds-em-algorithm.ipynb

/images/hmm.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tsmatz/hmm-lds-em-algorithm/HEAD/images/hmm.png
--------------------------------------------------------------------------------
/images/lds.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tsmatz/hmm-lds-em-algorithm/HEAD/images/lds.png
--------------------------------------------------------------------------------
/images/transition.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tsmatz/hmm-lds-em-algorithm/HEAD/images/transition.png
--------------------------------------------------------------------------------
/Readme.md:
--------------------------------------------------------------------------------
1 | # Estimate Hidden Markov Models (HMM) and Linear Dynamical Systems (LDS)
2 |
3 | **\- Estimate Sequential Data with Hidden States in Python \-**
4 |
5 | In this repository, I'll introduce the **EM algorithm**, a machine learning method for analyzing sequential data, applied to two models: Hidden Markov Models (HMM) and Linear Dynamical Systems (LDS).<br>
6 | Both of these models are mixture models, in which the choice of mixture component for each observation depends on the choice of component for the previous observation.
7 |
8 | To give you a clear picture, I'll provide mathematical descriptions along with concrete examples in Python.
9 |
10 | - [Hidden Markov Models (HMM)](01-hmm-em-algorithm.ipynb)
11 | - [Linear Dynamical Systems (LDS)](02-lds-em-algorithm.ipynb)
12 |
13 | When there are sequential patterns in data, i.e., when the data points are not independent and identically distributed (i.i.d.), these tutorials will help you find such patterns in a Markov chain.
14 |
15 | > Note : For time series analysis of sequential data based on AR (auto-regressive) and MA (moving average) models, see "[Introduce Time Series Analysis with ARIMA](https://tsmatz.wordpress.com/2017/07/26/time-series-arima-r-tutorial-01-ar-ma/)".
16 |
17 | ## Reference
18 |
19 | This program code is inspired by Chapter 13 in "[Pattern Recognition and Machine Learning](https://www.microsoft.com/en-us/research/uploads/prod/2006/01/Bishop-Pattern-Recognition-and-Machine-Learning-2006.pdf?ranMID=24542&ranEAID=TnL5HPStwNw&ranSiteID=TnL5HPStwNw-g4zE85KQgCXaCQfYBhtuFQ&epi=TnL5HPStwNw-g4zE85KQgCXaCQfYBhtuFQ&irgwc=1&OCID=AID2200057_aff_7593_1243925&tduid=%28ir__vhvv9m6caokf6nb62oprh029if2xo0rux3ga300300%29%287593%29%281243925%29%28TnL5HPStwNw-g4zE85KQgCXaCQfYBhtuFQ%29%28%29&irclickid=_vhvv9m6caokf6nb62oprh029if2xo0rux3ga300300)" (Christopher M. Bishop, Microsoft).<br>
20 | For further reading, see "Pattern Recognition and Machine Learning".
21 |
--------------------------------------------------------------------------------
/01-hmm-em-algorithm.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "id": "fda67945",
6 | "metadata": {},
7 | "source": [
8 | "# Hidden Markov Models (HMM) in Python\n",
9 | "\n",
10 | "HMM can handle a wide range of distributions, such as discrete tables, Gaussians, and mixtures of Gaussians.\n",
11 | "\n",
12 | "An HMM assumes latent (hidden) discrete multinomial variables $ \{\mathbf{z}_n\} $, which generate the corresponding observations $ \{\mathbf{x}_n\} $. (See below.)<br>
\n", 13 | "The observers can see only $ \\{\\mathbf{x}_n\\} $, and the model will then be estimated using obervations, $ \\{\\mathbf{x}_n\\} $.\n", 14 | "\n", 15 | "![Hidden Markov Models](images/hmm.png)\n", 16 | "\n", 17 | "In HMM, $ p(\\mathbf{z}_n|\\mathbf{z}_{n-1}) $ is called a **transition probability**, and $ p(\\mathbf{x}_n|\\mathbf{z}_n) $ is a **emission probability**.\n", 18 | "\n", 19 | "> In this notebook, I denote a scalar variable by normal letter (such as, $ x $), and denote a vector (incl. a matrix) by bold letter (such as, $ \\mathbf{x} $).\n", 20 | "\n", 21 | "*back to [Readme](https://github.com/tsmatz/hmm-lds-em-algorithm/)*" 22 | ] 23 | }, 24 | { 25 | "cell_type": "markdown", 26 | "id": "7561acd9", 27 | "metadata": {}, 28 | "source": [ 29 | "## Sampling in Hidden Markov Models (Generate sample data)\n", 30 | "\n", 31 | "First of all, we'll generate sample data (observations) by using the distribution of Hidden Markov Models (HMM).\n", 32 | "\n", 33 | "As I mentioned above, the distribution of the latent (hidden) variables $ \\{\\mathbf{z}_n\\} $ is discrete, and it then corresponds to a table of transitions.\n", 34 | "\n", 35 | "For sampling, first I'll create a set of latent (hidden) variables, $ \\{\\mathbf{z}_n\\} $, in which it has 3 states (i.e, $ K=3 $) with the following transition probabilities $ p(\\mathbf{z}_n|\\mathbf{z}_{n-1}) $.\n", 36 | "\n", 37 | "![HMM Discrete Transition](images/transition.png)\n", 38 | "\n", 39 | "$$ A = \\begin{bmatrix} 0.7 & 0.15 & 0.15 \\\\ 0.0 & 0.5 & 0.5 \\\\ 0.3 & 0.35 & 0.35 \\end{bmatrix} $$\n", 40 | "\n", 41 | "From now, I'll use the letter $ k \\in \\{0, 1, 2\\} $ for the corresponding 3 states, and I assume $ \\mathbf{z}_n = (z_{n,0}, z_{n,1}, z_{n,2}) $, in which $ z_{n,k^{\\prime}}=1 $ and $ z_{n,k \\neq k^{\\prime}}=0 $ in state $ k^{\\prime} $." 42 | ] 43 | }, 44 | { 45 | "cell_type": "code", 46 | "execution_count": 1, 47 | "id": "e8c961a3", 48 | "metadata": {}, 49 | "outputs": [ 50 | { 51 | "data": { 52 | "text/plain": [ 53 | "array([0, 0, 0, ..., 0, 0, 2])" 54 | ] 55 | }, 56 | "execution_count": 1, 57 | "metadata": {}, 58 | "output_type": "execute_result" 59 | } 60 | ], 61 | "source": [ 62 | "import numpy as np\n", 63 | "\n", 64 | "np.random.seed(1000) # For debugging and reproducibility\n", 65 | "\n", 66 | "N = 1000\n", 67 | "\n", 68 | "Z = np.array([0])\n", 69 | "for n in range(N):\n", 70 | " prev_z = Z[len(Z) - 1]\n", 71 | " if prev_z == 0:\n", 72 | " post_z = np.random.choice(3, size=1, p=[0.7, 0.15, 0.15])\n", 73 | " elif prev_z == 1:\n", 74 | " post_z = np.random.choice(3, size=1, p=[0.0, 0.5, 0.5])\n", 75 | " elif prev_z == 2:\n", 76 | " post_z = np.random.choice(3, size=1, p=[0.3, 0.35, 0.35])\n", 77 | " Z = np.append(Z, post_z)\n", 78 | "Z" 79 | ] 80 | }, 81 | { 82 | "cell_type": "markdown", 83 | "id": "4751b087", 84 | "metadata": {}, 85 | "source": [ 86 | "Next I'll create the corresponding observation $ \\{\\mathbf{x}_n\\} $ for sampling.
\n", 87 | "Here I assume 2-dimensional **Gaussian distribution** $ \\mathcal{N}(\\mathbf{\\mu}_k, \\mathbf{\\Sigma}_k) $ for emission probabilities $ p(\\mathbf{x}_n|\\mathbf{z}_n) $, when $ \\mathbf{z}_n $ belongs to $ k $. ($ k=0,1,2 $)
\n", 88 | "In order to simplify, I also assume that parameters $ \\mathbf{\\mu}_k, \\mathbf{\\Sigma}_k $ are independent for different components $ k=0, 1, 2 $.\n", 89 | "\n", 90 | "In this example, I set $ \\mathbf{\\mu}_k, \\mathbf{\\Sigma}_k $ as follows.\n", 91 | "\n", 92 | "$$ \\mathbf{\\mu}_0=(16.0, 1.0), \\;\\; \\mathbf{\\Sigma}_0 = \\begin{bmatrix} 4.0 & 3.5 \\\\ 3.5 & 4.0 \\end{bmatrix} $$\n", 93 | "\n", 94 | "$$ \\mathbf{\\mu}_1=(1.0, 16.0), \\;\\; \\mathbf{\\Sigma}_1 = \\begin{bmatrix} 4.0 & 0.0 \\\\ 0.0 & 1.0 \\end{bmatrix} $$\n", 95 | "\n", 96 | "$$ \\mathbf{\\mu}_2=(-5.0, -5.0), \\;\\; \\mathbf{\\Sigma}_2 = \\begin{bmatrix} 1.0 & 0.0 \\\\ 0.0 & 4.0 \\end{bmatrix} $$" 97 | ] 98 | }, 99 | { 100 | "cell_type": "code", 101 | "execution_count": 2, 102 | "id": "9d18dbe8", 103 | "metadata": {}, 104 | "outputs": [ 105 | { 106 | "data": { 107 | "text/plain": [ 108 | "array([[16.10996367, -0.05478763],\n", 109 | " [18.15392063, 3.77525205],\n", 110 | " [16.73825958, 0.59324625],\n", 111 | " ...,\n", 112 | " [14.2188323 , -1.0984775 ],\n", 113 | " [18.41063372, 5.28130838],\n", 114 | " [-3.64054111, -4.00216984]])" 115 | ] 116 | }, 117 | "execution_count": 2, 118 | "metadata": {}, 119 | "output_type": "execute_result" 120 | } 121 | ], 122 | "source": [ 123 | "X = np.empty((0,2))\n", 124 | "for z_n in Z:\n", 125 | " if z_n == 0:\n", 126 | " x_n = np.random.multivariate_normal(\n", 127 | " mean=[16.0, 1.0],\n", 128 | " cov=[[4.0,3.5],[3.5,4.0]],\n", 129 | " size=1)\n", 130 | " elif z_n == 1:\n", 131 | " x_n = np.random.multivariate_normal(\n", 132 | " mean=[1.0, 16.0],\n", 133 | " cov=[[4.0,0.0],[0.0,1.0]],\n", 134 | " size=1)\n", 135 | " elif z_n ==2:\n", 136 | " x_n = np.random.multivariate_normal(\n", 137 | " mean=[-5.0, -5.0],\n", 138 | " cov=[[1.0,0.0],[0.0,4.0]],\n", 139 | " size=1)\n", 140 | " X = np.vstack((X, x_n))\n", 141 | "X" 142 | ] 143 | }, 144 | { 145 | "cell_type": "markdown", 146 | "id": "eb896c56", 147 | "metadata": {}, 148 | "source": [ 149 | "## EM algorithm in Hidden Markov Models (HMM)\n", 150 | "\n", 151 | "Now, using the given observation $ \\{ \\mathbf{x}_n \\} $, let's try to estimate the optimimal parameters in HMM.\n", 152 | "\n", 153 | "When I denote unknown parameters by $ \\mathbf{\\theta} $, our goal is to get the optimal parameters $ \\mathbf{\\theta} $ to maximize the following (1).\n", 154 | "\n", 155 | "$$ p(\\mathbf{X}|\\mathbf{\\theta}) = \\sum_{\\mathbf{Z}} p(\\mathbf{X},\\mathbf{Z}|\\mathbf{\\theta}) \\;\\;\\;\\;(1) $$\n", 156 | "\n", 157 | "where $ \\mathbf{Z} = \\{\\mathbf{z}_n\\} $ and $ \\mathbf{X} = \\{\\mathbf{x}_n\\} $\n", 158 | "\n", 159 | "In this example, I use the following parameters as $ \\mathbf{\\theta} = \\{ \\mathbf{\\pi}, \\mathbf{A}, \\mathbf{\\mu}, \\mathbf{\\Sigma} \\} $.\n", 160 | "\n", 161 | "- $ \\pi_k (k \\in \\{0, 1, 2\\}) $ : The possibility (scalar) for component $ k $ in initial latent node $ \\mathbf{z}_0 $. ($ \\Sigma_k \\pi_k = 1 $)\n", 162 | "- $ A_{j,k} \\; (j, k \\in \\{0, 1, 2\\}) $ : The transition probability (scalar) for the latent variable $ \\mathbf{z}_{n-1} $ to $ \\mathbf{z}_n $, in which $ \\mathbf{z}_{n-1} $ belongs to $ j $ and $ \\mathbf{z}_n $ belongs to $ k $. 
($ \sum_k A_{j,k} = 1 $)\n",
163 | "- $ \mathbf{\mu}_k $ : The mean (2-dimensional vector) for the Gaussian distribution in the emission probability $ p(\mathbf{x}_n|\mathbf{z}_n) $ when the latent variable $ \mathbf{z}_n $ belongs to $ k $.\n",
164 | "- $ \mathbf{\Sigma}_k $ : The covariance matrix ($ 2 \times 2 $ matrix) for the Gaussian distribution in the emission probability $ p(\mathbf{x}_n|\mathbf{z}_n) $ when the latent variable $ \mathbf{z}_n $ belongs to $ k $.\n",
165 | "\n",
166 | "In (1), the number of parameters rapidly increases when the number of states $ K $ increases (in this example, $ K = 3 $). Furthermore, (1) contains a summation (not a multiplication) over the latent states, so the log likelihood leads to a complex expression in maximum likelihood estimation (MLE). (A small illustration of the scale of this summation follows.)<br>
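(Aside, not part of the original notebook: a quick sense of the scale of the summation in (1). With this example's $ K = 3 $ states and $ N = 1000 $ observations, the sum ranges over $ K^N $ hidden state sequences, which is hopeless to enumerate directly.)

```python
K, N = 3, 1000

# The sum over Z in (1) ranges over K**N hidden state sequences.
num_paths = K ** N
print(len(str(num_paths)))  # 478 -- even the count of terms has ~478 digits
```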
\n", 167 | "Therefore, it will be difficult to directly apply maximum likelihood estimation (MLE) for the expression (1).\n", 168 | "\n", 169 | "> Note : Please see [here](https://tsmatz.wordpress.com/2017/08/30/regression-in-machine-learning-math-for-beginners/) for the idea of maximum likelihood estimation (MLE).\n", 170 | "\n", 171 | "In practice, the expectation–maximization algorithm (shortly, **EM algorithm**) can often be applied to solve parameters in HMM.
\n", 172 | "In this example, I'll also apply EM algorithm, instead of MLE.\n", 173 | "\n", 174 | "In EM algorithm for HMM, we start with initial parameters $ \\mathbf{\\theta}^{old} $, and optimize (find) new $ \\mathbf{\\theta} $ to maximize the following expression (2).
\n", 175 | "By repeating this operation, we can expect to reach to the likelihood parameters $ \\hat{\\mathbf{\\theta}} $.\n", 176 | "\n", 177 | "$$ Q(\\mathbf{\\theta}, \\mathbf{\\theta}^{old}) = \\sum_{\\mathbf{Z}} p(\\mathbf{Z}|\\mathbf{X}, \\mathbf{\\theta}^{old}) \\ln p(\\mathbf{X}, \\mathbf{Z}|\\mathbf{\\theta}) \\;\\;\\;\\;(2) $$\n", 178 | "\n", 179 | "> Note : For the essential idea of EM algorithm, see Chapter 9 in \"[Pattern Recognition and Machine Learning](https://www.microsoft.com/en-us/research/uploads/prod/2006/01/Bishop-Pattern-Recognition-and-Machine-Learning-2006.pdf?ranMID=24542&ranEAID=TnL5HPStwNw&ranSiteID=TnL5HPStwNw-g4zE85KQgCXaCQfYBhtuFQ&epi=TnL5HPStwNw-g4zE85KQgCXaCQfYBhtuFQ&irgwc=1&OCID=AID2200057_aff_7593_1243925&tduid=%28ir__vhvv9m6caokf6nb62oprh029if2xo0rux3ga300300%29%287593%29%281243925%29%28TnL5HPStwNw-g4zE85KQgCXaCQfYBhtuFQ%29%28%29&irclickid=_vhvv9m6caokf6nb62oprh029if2xo0rux3ga300300)\" (Christopher M. Bishop, Microsoft)\n", 180 | "\n", 181 | "Now I denote the discrete probability $ p(\\mathbf{z}_n|\\mathbf{X},\\mathbf{\\theta}^{old}) $ by $ \\gamma(z_{n,k}) \\; (k=0,1,2) $, in which $ \\gamma(z_{n,k}) $ represents the probability of $ \\mathbf{z}_n $ for belonging to $ k $.
\n", 182 | "I also denote the discrete probability $ p(\\mathbf{z}_{n-1}, \\mathbf{z}_n | \\mathbf{X},\\mathbf{\\theta}^{old}) $ by $ \\xi(z_{n-1,j}, z_{n,k}) \\; (j,k=0,1,2) $, in which $ \\xi(z_{n-1,j}, z_{n,k}) $ represents the joint probability that $ \\mathbf{z}_{n-1} $ belongs to $ j $ and $ \\mathbf{z}_n $ belongs to $ k $. \n", 183 | "\n", 184 | "In Gaussian HMM (in above model), the equation (2) is written as follows, using $ \\gamma() $ and $ \\xi() $.\n", 185 | "\n", 186 | "$$ Q(\\mathbf{\\theta}, \\mathbf{\\theta}^{old}) = \\sum_{k=0}^{K-1} \\gamma(z_{0,k}) \\ln{\\pi_k} + \\sum_{n=1}^{N-1} \\sum_{j=0}^{K-1} \\sum_{k=0}^{K-1} \\xi(z_{n-1,j},z_{n,k}) \\ln{A_{j,k}} + \\sum_{n=0}^{N-1} \\sum_{k=0}^{K-1} \\gamma(z_{n,k}) \\ln{p(\\mathbf{x}_n|\\mathbf{\\mu}_k, \\mathbf{\\Sigma}_k)} \\;\\;\\;\\;(3)$$\n", 187 | "\n", 188 | "where\n", 189 | "\n", 190 | "$$ \\gamma(\\mathbf{z}_n) = p(\\mathbf{z}_n|\\mathbf{X},\\mathbf{\\theta}^{old}) $$\n", 191 | "\n", 192 | "$$ \\xi(\\mathbf{z}_{n-1}, \\mathbf{z}_n) = p(\\mathbf{z}_{n-1}, \\mathbf{z}_n|\\mathbf{X},\\mathbf{\\theta}^{old}) $$\n", 193 | "\n", 194 | "It's known that $ \\gamma() $ and $ \\xi() $ can be given by the following $ \\alpha() $ and $ \\beta() $, which are determined recursively. (i.e, We can first determine all $ \\alpha() $ and $ \\beta() $ recursively, and then we can obtain $ \\gamma() $ and $ \\xi() $ with known $ \\alpha(), \\beta() $.)\n", 195 | "\n", 196 | "$$ \\gamma(z_{n,k}) = \\frac{\\alpha(z_{n,k})\\beta(z_{n,k})}{\\sum_{k=0}^{K-1} \\alpha(z_{n,k})\\beta(z_{n,k})} $$\n", 197 | "\n", 198 | "$$ \\xi(z_{n-1,j},z_{n,k}) = \\frac{\\alpha(z_{n-1,j})p(\\mathbf{x}_n|\\mathbf{\\mu}_k^{old}, \\mathbf{\\Sigma}_k^{old})A_{j,k}^{old}\\beta(z_{n,k})}{\\sum_{j=0}^{K-1} \\sum_{k=0}^{K-1} \\alpha(z_{n-1,j})p(\\mathbf{x}_n|\\mathbf{\\mu}_k^{old}, \\mathbf{\\Sigma}_k^{old})A_{j,k}^{old}\\beta(z_{n,k})} $$\n", 199 | "\n", 200 | "where all $ \\alpha() $ and $ \\beta() $ are recursively given by\n", 201 | "\n", 202 | "$$ \\alpha(z_{n,k}) = p(\\mathbf{x}_n|\\mathbf{\\mu}_k^{old}, \\mathbf{\\Sigma}_k^{old}) \\sum_{j=0}^{K-1} A_{jk}^{old} \\alpha(z_{n-1,j}) $$\n", 203 | "\n", 204 | "$$ \\beta(z_{n-1,k}) = \\sum_{j=0}^{K-1} A^{old}_{k,j} p(\\mathbf{x}_{n}|\\mathbf{\\mu}_j^{old}, \\mathbf{\\Sigma}_j^{old}) \\beta(z_{n,j}) $$\n", 205 | "\n", 206 | "Now we need the starting condition for recursion, $ \\alpha() $ and $ \\beta() $, and these are given as follows.\n", 207 | "\n", 208 | "$$ \\alpha(z_{0,k}) = \\pi_k^{old} p(\\mathbf{x}_0|\\mathbf{\\mu}_k^{old}, \\mathbf{\\Sigma}_k^{old}) $$\n", 209 | "\n", 210 | "$$ \\beta(z_{N-1,k}) = 1 $$\n", 211 | "\n", 212 | "> Note : Here I have showed these properties in Gaussian HMM without any proofs, but you can refer Chapter 13 in \"[Pattern Recognition and Machine Learning](https://www.microsoft.com/en-us/research/uploads/prod/2006/01/Bishop-Pattern-Recognition-and-Machine-Learning-2006.pdf?ranMID=24542&ranEAID=TnL5HPStwNw&ranSiteID=TnL5HPStwNw-g4zE85KQgCXaCQfYBhtuFQ&epi=TnL5HPStwNw-g4zE85KQgCXaCQfYBhtuFQ&irgwc=1&OCID=AID2200057_aff_7593_1243925&tduid=%28ir__vhvv9m6caokf6nb62oprh029if2xo0rux3ga300300%29%287593%29%281243925%29%28TnL5HPStwNw-g4zE85KQgCXaCQfYBhtuFQ%29%28%29&irclickid=_vhvv9m6caokf6nb62oprh029if2xo0rux3ga300300)\" (Christopher M. Bishop, Microsoft) for details. 
(In this notebook, I'm using the same notation as in this book.)\n",
213 | "\n",
214 | "Once you have got $ \gamma() $ and $ \xi() $, you can get the optimal $ \mathbf{\theta} = \{ \mathbf{\pi}, \mathbf{A}, \mathbf{\mu}, \mathbf{\Sigma} \} $ to maximize (3) as follows, by applying Lagrange multipliers.\n",
215 | "\n",
216 | "$$ \pi_k = \frac{\gamma(z_{0,k})}{\sum_{j=0}^{K-1} \gamma(z_{0,j})} $$\n",
217 | "\n",
218 | "$$ A_{j,k} = \frac{\sum_{n=1}^{N-1} \xi(z_{n-1,j},z_{n,k})}{\sum_{l=0}^{K-1} \sum_{n=1}^{N-1} \xi(z_{n-1,j},z_{n,l})} $$\n",
219 | "\n",
220 | "$$ \mathbf{\mu}_k = \frac{\sum_{n=0}^{N-1} \gamma(z_{n,k}) \mathbf{x}_n}{\sum_{n=0}^{N-1} \gamma(z_{n,k})} $$\n",
221 | "\n",
222 | "$$ \mathbf{\Sigma}_k = \frac{\sum_{n=0}^{N-1} \gamma(z_{n,k}) (\mathbf{x}_n-\mathbf{\mu}_k) (\mathbf{x}_n-\mathbf{\mu}_k)^T}{\sum_{n=0}^{N-1} \gamma(z_{n,k})} $$\n",
223 | "\n",
224 | "You repeat this process by replacing $ \mathbf{\theta}^{old} $ with this new $ \mathbf{\theta} $, and you will eventually get the optimal results $ \hat{\mathbf{\theta}} $ that maximize (1).\n",
225 | "\n",
226 | "In practice, $ \alpha() $ and $ \beta() $ will quickly go to zero (because they are recursively multiplied by $ p(\mathbf{x}_n|\mathbf{\mu}_k^{old}, \mathbf{\Sigma}_k^{old}) $ and $ A_{j,k}^{old} $), and the values will then exceed the dynamic range of floating-point precision when $ N $ is large.<br>
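(Aside, not part of the original notebook: a minimal numeric illustration of this underflow in plain Python. The factor 0.1 is just a stand-in for a typical per-step probability value.)

```python
p = 1.0
for n in range(400):
    p *= 0.1   # repeatedly multiply by a probability-like factor below 1
print(p)       # prints 0.0 -- the true value 1e-400 underflows double precision
```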
\n", 227 | "For this reason, the coefficients, called **scaling factors**, will be introduced to normalize $ \\alpha() $ and $ \\beta() $ in each step $ n $. (See the following source code.) The scaling factors will be canceled in EM algorithms, however, when you monitor the value of likelihood functions, you'll need to record scaling factors and apply these facotrs. (See Chapter 13 in \"[Pattern Recognition and Machine Learning](https://www.microsoft.com/en-us/research/uploads/prod/2006/01/Bishop-Pattern-Recognition-and-Machine-Learning-2006.pdf?ranMID=24542&ranEAID=TnL5HPStwNw&ranSiteID=TnL5HPStwNw-g4zE85KQgCXaCQfYBhtuFQ&epi=TnL5HPStwNw-g4zE85KQgCXaCQfYBhtuFQ&irgwc=1&OCID=AID2200057_aff_7593_1243925&tduid=%28ir__vhvv9m6caokf6nb62oprh029if2xo0rux3ga300300%29%287593%29%281243925%29%28TnL5HPStwNw-g4zE85KQgCXaCQfYBhtuFQ%29%28%29&irclickid=_vhvv9m6caokf6nb62oprh029if2xo0rux3ga300300)\" for details.)" 228 | ] 229 | }, 230 | { 231 | "cell_type": "markdown", 232 | "id": "fba1a21e", 233 | "metadata": {}, 234 | "source": [ 235 | "## Apply algorithm in Python" 236 | ] 237 | }, 238 | { 239 | "cell_type": "markdown", 240 | "id": "4f60f500", 241 | "metadata": {}, 242 | "source": [ 243 | "## 0. Prerequisites" 244 | ] 245 | }, 246 | { 247 | "cell_type": "code", 248 | "execution_count": null, 249 | "id": "f4baaa78", 250 | "metadata": {}, 251 | "outputs": [], 252 | "source": [ 253 | "!pip3 install numpy\n", 254 | "!pip3 install scipy" 255 | ] 256 | }, 257 | { 258 | "cell_type": "markdown", 259 | "id": "aa3be6ac", 260 | "metadata": {}, 261 | "source": [ 262 | "## 1. Initialize parameters\n", 263 | "\n", 264 | "First, initialize $ \\mathbf{\\theta} = \\{ \\pi_k, \\mathbf{A}_{j,k}, \\mathbf{\\mu}_k, \\mathbf{\\Sigma}_k \\} $ as follows.\n", 265 | "\n", 266 | "- $ \\pi_0 = 0.3, \\pi_1 = 0.3, \\pi_2 = 0.4 $\n", 267 | "- $ A_{i,j} = 0.4 $ if $ i=j $, and $ A_{i,j} = 0.3 $ otherwise\n", 268 | "- $ \\mathbf{\\mu}_k = (1.0, 1.0) \\;\\;\\; (k = 0,1,2) $\n", 269 | "- $ \\mathbf{\\Sigma}_k = \\begin{bmatrix} 1.0 & 0.5 \\\\ 0.5 & 1.0 \\end{bmatrix} \\;\\;\\; (k = 0,1,2) $\n", 270 | "\n", 271 | "In this example, I set the fixed values. However, in practice, K-means will be used to determine initial $ \\mathbf{\\mu}_k $ and $ \\mathbf{\\Sigma}_k $, in order to speed up optimization." 272 | ] 273 | }, 274 | { 275 | "cell_type": "code", 276 | "execution_count": 3, 277 | "id": "b114a5c1", 278 | "metadata": {}, 279 | "outputs": [], 280 | "source": [ 281 | "# Initialize parameters\n", 282 | "theta_old = {\n", 283 | " \"pi\":[0.3, 0.3, 0.4],\n", 284 | " \"A\":[[0.4,0.3,0.3],[0.3,0.4,0.3],[0.3,0.3,0.4]],\n", 285 | " \"mu\":[[1.0,1.0],[1.0,1.0],[1.0,1.0]],\n", 286 | " \"Sigma\":[\n", 287 | " [[1.0,0.5],[0.5,1.0]],\n", 288 | " [[1.0,0.5],[0.5,1.0]],\n", 289 | " [[1.0,0.5],[0.5,1.0]]\n", 290 | " ]\n", 291 | "}" 292 | ] 293 | }, 294 | { 295 | "cell_type": "markdown", 296 | "id": "bb646728", 297 | "metadata": {}, 298 | "source": [ 299 | "## 2. Get $ \\alpha() $ and $ \\beta() $" 300 | ] 301 | }, 302 | { 303 | "cell_type": "markdown", 304 | "id": "2d8a4fc8", 305 | "metadata": {}, 306 | "source": [ 307 | "Now I set the starting condition, $ \\alpha(z_{0,k}) $. 
:\n", 308 | "\n", 309 | "$$ \\alpha(z_{0,k}) = \\pi_k^{old} p(\\mathbf{x}_0|\\mathbf{\\mu}_k^{old}, \\mathbf{\\Sigma}_k^{old}) $$\n", 310 | "\n", 311 | "And we can recursively obtain all $ \\alpha(z_{n,k}) $ as follows.\n", 312 | "\n", 313 | "$$ \\alpha(z_{n,k}) = p(\\mathbf{x}_n|\\mathbf{\\mu}_k^{old}, \\mathbf{\\Sigma}_k^{old}) \\sum_{j=0}^{K-1} A_{jk}^{old} \\alpha(z_{n-1,j}) $$\n", 314 | "\n", 315 | "As I mentioned above, I also introduce a scaling factor in each steps to prevent the overflow of dynamic range. (Here I don't record these scaling factors.)" 316 | ] 317 | }, 318 | { 319 | "cell_type": "code", 320 | "execution_count": 4, 321 | "id": "179fc0b8", 322 | "metadata": {}, 323 | "outputs": [], 324 | "source": [ 325 | "from scipy.stats import multivariate_normal\n", 326 | "\n", 327 | "def get_alpha():\n", 328 | " alpha = np.empty((0,3))\n", 329 | "\n", 330 | " # Get initial alpha_0\n", 331 | " alpha_0 = np.array([])\n", 332 | " for k in range(3):\n", 333 | " p_dist = multivariate_normal(\n", 334 | " mean=theta_old[\"mu\"][k],\n", 335 | " cov=theta_old[\"Sigma\"][k])\n", 336 | " alpha_0 = np.append(alpha_0, theta_old[\"pi\"][k] * p_dist.pdf(X[0]))\n", 337 | " alpha_0 = alpha_0 / alpha_0.sum() # apply scaling\n", 338 | " alpha = np.vstack((alpha, alpha_0))\n", 339 | "\n", 340 | " # Get all elements recursively\n", 341 | " for n in range(1, N):\n", 342 | " alpha_n = np.array([])\n", 343 | " for k in range(3):\n", 344 | " p_dist = multivariate_normal(\n", 345 | " mean=theta_old[\"mu\"][k],\n", 346 | " cov=theta_old[\"Sigma\"][k])\n", 347 | " alpha_n = np.append(\n", 348 | " alpha_n,\n", 349 | " p_dist.pdf(X[n]) * sum((theta_old[\"A\"][j][k] * alpha[n-1][j]) for j in range(3)))\n", 350 | " alpha_n = alpha_n / alpha_n.sum() # apply scaling\n", 351 | " alpha = np.vstack((alpha, alpha_n))\n", 352 | "\n", 353 | " return alpha" 354 | ] 355 | }, 356 | { 357 | "cell_type": "markdown", 358 | "id": "7a216fa5", 359 | "metadata": {}, 360 | "source": [ 361 | "I also set the starting condition, $ \\beta(z_{N-1,k}) $. :\n", 362 | "\n", 363 | "$$ \\beta(z_{N-1,k}) = 1 $$\n", 364 | "\n", 365 | "And we can recursively obtain all $ \\beta(z_{n,k}) $ as follows.\n", 366 | "\n", 367 | "$$ \\beta(z_{n-1,k}) = \\sum_{j=0}^{K-1} A^{old}_{k,j} p(\\mathbf{x}_{n}|\\mathbf{\\mu}_j^{old}, \\mathbf{\\Sigma}_j^{old}) \\beta(z_{n,j}) $$\n", 368 | "\n", 369 | "As I mentioned above, I also introduce a scaling factor in each steps to prevent the overflow of dynamic range. In practice, the scaling factors can be shared between $ \\alpha() $ and $ \\beta() $ (and you can then use these shared values for getting values of likelihood function), but in this example, I simply normalize values in each steps." 
370 | ]
371 | },
372 | {
373 | "cell_type": "code",
374 | "execution_count": 5,
375 | "id": "6e94369b",
376 | "metadata": {},
377 | "outputs": [],
378 | "source": [
379 | "def get_beta():\n",
380 | " beta_rev = np.empty((0,3))\n",
381 | "\n",
382 | " # Get initial beta_{N-1}\n",
383 | " beta_last = np.array([1.0, 1.0, 1.0])\n",
384 | " beta_last = beta_last / beta_last.sum() # apply scaling\n",
385 | " beta_rev = np.vstack((beta_rev, beta_last))\n",
386 | "\n",
387 | " # Get all elements recursively (beta_rev[n] holds beta_{N-1-n}, which pairs with x_{N-n})\n",
388 | " for n in range(1, N):\n",
389 | " beta_rev_n = np.array([])\n",
390 | " for k in range(3):\n",
391 | " beta_rev_n_k = 0\n",
392 | " for j in range(3):\n",
393 | " p_dist = multivariate_normal(\n",
394 | " mean=theta_old[\"mu\"][j],\n",
395 | " cov=theta_old[\"Sigma\"][j])\n",
396 | " beta_rev_n_k = theta_old[\"A\"][k][j] * p_dist.pdf(X[N-n]) * beta_rev[n-1][j] + beta_rev_n_k\n",
397 | " beta_rev_n = np.append(beta_rev_n, beta_rev_n_k)\n",
398 | " beta_rev_n = beta_rev_n / beta_rev_n.sum() # apply scaling\n",
399 | " beta_rev = np.vstack((beta_rev, beta_rev_n))\n",
400 | "\n",
401 | " # Reverse results\n",
402 | " beta = np.flip(beta_rev, axis=0)\n",
403 | " \n",
404 | " return beta"
405 | ]
406 | },
407 | {
408 | "cell_type": "markdown",
409 | "id": "e486bdfc",
410 | "metadata": {},
411 | "source": [
412 | "## 3. Get $ \gamma() $ and $ \xi() $"
413 | ]
414 | },
415 | {
416 | "cell_type": "markdown",
417 | "id": "ee549e91",
418 | "metadata": {},
419 | "source": [
420 | "Now we obtain $ \gamma() $ and $ \xi() $ with the previous $ \alpha() $ and $ \beta() $.<br>
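(Aside, not part of the original notebook: once the `gamma` and `xi` arrays are computed by the two functions below, a quick sanity check can catch indexing mistakes, since each posterior must sum to one and marginalizing $ \xi() $ must recover $ \gamma() $.)

```python
import numpy as np

def check_posteriors(gamma, xi):
    # gamma has shape (N, 3); xi has shape (N-1, 3, 3).
    assert np.allclose(gamma.sum(axis=1), 1.0)
    assert np.allclose(xi.sum(axis=(1, 2)), 1.0)
    # Summing xi(z_{n-1}, z_n) over z_{n-1} recovers gamma(z_n) for n >= 1.
    assert np.allclose(xi.sum(axis=1), gamma[1:])
```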
\n", 421 | "First we get $ \\gamma() $ as follows. (I note that the value is not normalized.)\n", 422 | "\n", 423 | "$$ \\gamma(z_{n,k}) = \\frac{\\alpha(z_{n,k})\\beta(z_{n,k})}{\\sum_{k=0}^{K-1} \\alpha(z_{n,k})\\beta(z_{n,k})} $$" 424 | ] 425 | }, 426 | { 427 | "cell_type": "code", 428 | "execution_count": 6, 429 | "id": "4282314b", 430 | "metadata": {}, 431 | "outputs": [], 432 | "source": [ 433 | "def get_gamma(alpha, beta):\n", 434 | " gamma = np.empty((0,3))\n", 435 | "\n", 436 | " for n in range(N):\n", 437 | " gamma_n = np.array([])\n", 438 | " for k in range(3):\n", 439 | " gamma_n = np.append(gamma_n, alpha[n][k] * beta[n][k])\n", 440 | " gamma_n = gamma_n / gamma_n.sum()\n", 441 | " gamma = np.vstack((gamma, gamma_n))\n", 442 | "\n", 443 | " return gamma" 444 | ] 445 | }, 446 | { 447 | "cell_type": "markdown", 448 | "id": "4f0da770", 449 | "metadata": {}, 450 | "source": [ 451 | "Next we also get $ \\xi() $ as follows. (I note that the value is not normalized.)\n", 452 | "\n", 453 | "$$ \\xi(z_{n-1,j},z_{n,k}) = \\frac{\\alpha(z_{n-1,j})p(\\mathbf{x}_n|\\mathbf{\\mu}_k^{old}, \\mathbf{\\Sigma}_k^{old})A_{j,k}^{old}\\beta(z_{n,k})}{\\sum_{j=0}^{K-1} \\sum_{k=0}^{K-1} \\alpha(z_{n-1,j})p(\\mathbf{x}_n|\\mathbf{\\mu}_k^{old}, \\mathbf{\\Sigma}_k^{old})A_{j,k}^{old}\\beta(z_{n,k})} $$" 454 | ] 455 | }, 456 | { 457 | "cell_type": "code", 458 | "execution_count": 7, 459 | "id": "1ec9dfd4", 460 | "metadata": {}, 461 | "outputs": [], 462 | "source": [ 463 | "def get_xi(alpha, beta):\n", 464 | " xi = np.empty((0,3,3))\n", 465 | "\n", 466 | " for n in range(1, N):\n", 467 | " xi_n = np.zeros((3,3), dtype=np.float64)\n", 468 | " for j in range(3):\n", 469 | " for k in range(3):\n", 470 | " p_dist = multivariate_normal(\n", 471 | " mean=theta_old[\"mu\"][k],\n", 472 | " cov=theta_old[\"Sigma\"][k])\n", 473 | " xi_n[j][k] = alpha[n-1][j] * p_dist.pdf(X[n]) * theta_old[\"A\"][j][k] * beta[n][k]\n", 474 | " xi_n = xi_n / xi_n.sum()\n", 475 | " xi = np.vstack((xi, [xi_n]))\n", 476 | "\n", 477 | " return xi" 478 | ] 479 | }, 480 | { 481 | "cell_type": "markdown", 482 | "id": "2e27e5dc", 483 | "metadata": {}, 484 | "source": [ 485 | "## 4. Get new (optimal) parameters $ \\mathbf{\\theta} $\n", 486 | "\n", 487 | "Finally, get new $ \\mathbf{\\theta} = \\{ \\pi_k, A, \\mathbf{\\mu}, \\mathbf{\\Sigma} \\} $ using previous $ \\gamma() $ and $ \\xi() $." 488 | ] 489 | }, 490 | { 491 | "cell_type": "markdown", 492 | "id": "60333624", 493 | "metadata": {}, 494 | "source": [ 495 | "First, $ \\pi_k \\; (k=0,1,2) $ is given as follows. 
(The obtained $ \\gamma() $ is fed into the following equation.)\n", 496 | "\n", 497 | "$$ \\pi_k = \\frac{\\gamma(z_{0,k})}{\\sum_{j=0}^{K-1} \\gamma(z_{0,j})} $$" 498 | ] 499 | }, 500 | { 501 | "cell_type": "code", 502 | "execution_count": 8, 503 | "id": "c0b47105", 504 | "metadata": {}, 505 | "outputs": [], 506 | "source": [ 507 | "def get_pi_new(gamma):\n", 508 | " pi_new = np.array([])\n", 509 | "\n", 510 | " denom = sum(gamma[0][j] for j in range(3))\n", 511 | " for k in range(3):\n", 512 | " pi_new = np.append(pi_new, gamma[0][k] / denom)\n", 513 | "\n", 514 | " return pi_new" 515 | ] 516 | }, 517 | { 518 | "cell_type": "markdown", 519 | "id": "44ae3cdf", 520 | "metadata": {}, 521 | "source": [ 522 | "$ A_{j,k} \\; (j,k=0,1,2) $ is given as follows.\n", 523 | "\n", 524 | "$$ A_{j,k} = \\frac{\\sum_{n=1}^{N-1} \\xi(z_{n-1,j},z_{n,k})}{\\sum_{l=0}^{K-1} \\sum_{n=1}^{N-1} \\xi(z_{n-1,j},z_{n,l})} $$" 525 | ] 526 | }, 527 | { 528 | "cell_type": "code", 529 | "execution_count": 9, 530 | "id": "8a686f2e", 531 | "metadata": {}, 532 | "outputs": [], 533 | "source": [ 534 | "def get_A_new(xi):\n", 535 | " A_new = np.zeros((3,3), dtype=np.float64)\n", 536 | "\n", 537 | " for j in range(3):\n", 538 | " for k in range(3):\n", 539 | " denom = 0\n", 540 | " for l in range(3):\n", 541 | " for n in range(1, N):\n", 542 | " denom = denom + xi[n-1][j][l]\n", 543 | " A_new[j][k] = sum(xi[n-1][j][k] for n in range(1, N)) / denom\n", 544 | "\n", 545 | " return A_new" 546 | ] 547 | }, 548 | { 549 | "cell_type": "markdown", 550 | "id": "ff6dae4a", 551 | "metadata": {}, 552 | "source": [ 553 | "$ \\mathbf{\\mu}_{k} \\; (k=0,1,2) $ is given as follows.\n", 554 | "\n", 555 | "$$ \\mathbf{\\mu}_k = \\frac{\\sum_{n=0}^{N-1} \\gamma(z_{n,k}) \\mathbf{x}_n}{\\sum_{n=0}^{N-1} \\gamma(z_{n,k})} $$" 556 | ] 557 | }, 558 | { 559 | "cell_type": "code", 560 | "execution_count": 10, 561 | "id": "98940314", 562 | "metadata": {}, 563 | "outputs": [], 564 | "source": [ 565 | "def get_mu_new(gamma):\n", 566 | " mu_new = np.zeros((3,2), dtype=np.float64)\n", 567 | "\n", 568 | " for k in range(3):\n", 569 | " denom = sum(gamma[n][k] for n in range(N))\n", 570 | " numer_x = sum(gamma[n][k] * X[n][0] for n in range(N))\n", 571 | " mu_new[k][0] = numer_x / denom\n", 572 | " numer_y = sum(gamma[n][k] * X[n][1] for n in range(N))\n", 573 | " mu_new[k][1] = numer_y / denom\n", 574 | "\n", 575 | " return mu_new" 576 | ] 577 | }, 578 | { 579 | "cell_type": "markdown", 580 | "id": "9fa5659c", 581 | "metadata": {}, 582 | "source": [ 583 | "$ \\mathbf{\\Sigma}_{k} \\; (k=0,1,2) $ is given as follows.\n", 584 | "\n", 585 | "$$ \\mathbf{\\Sigma}_k = \\frac{\\sum_{n=0}^{N-1} \\gamma(z_{n,k}) (\\mathbf{x}_n-\\mathbf{\\mu}_k) (\\mathbf{x}_n-\\mathbf{\\mu}_k)^T}{\\sum_{n=0}^{N-1} \\gamma(z_{n,k})} $$" 586 | ] 587 | }, 588 | { 589 | "cell_type": "code", 590 | "execution_count": 11, 591 | "id": "52d9e96a", 592 | "metadata": {}, 593 | "outputs": [], 594 | "source": [ 595 | "def get_Sigma_new(gamma, mu_new):\n", 596 | " Sigma_new = np.empty((0,2,2))\n", 597 | "\n", 598 | " for k in range(3):\n", 599 | " denom = sum(gamma[n][k] for n in range(N))\n", 600 | " numer = np.zeros((2, 2), dtype=np.float64)\n", 601 | " for n in range(N):\n", 602 | " sub = np.subtract(X[n], mu_new[k])\n", 603 | " sub = np.array([sub])\n", 604 | " sub_t = sub.transpose()\n", 605 | " numer = numer + gamma[n][k] * np.matmul(sub_t, sub)\n", 606 | " Sigma_new = np.vstack((Sigma_new, [numer / denom]))\n", 607 | "\n", 608 | " return Sigma_new" 609 | ] 610 | }, 611 | { 612 | 
"cell_type": "markdown", 613 | "id": "7786d473", 614 | "metadata": {}, 615 | "source": [ 616 | "## 5. Run algorithm" 617 | ] 618 | }, 619 | { 620 | "cell_type": "code", 621 | "execution_count": 12, 622 | "id": "566f0062", 623 | "metadata": {}, 624 | "outputs": [ 625 | { 626 | "name": "stdout", 627 | "output_type": "stream", 628 | "text": [ 629 | "Running iteration 100 ...\n", 630 | "Done\n" 631 | ] 632 | } 633 | ], 634 | "source": [ 635 | "for loop in range(100):\n", 636 | " print(\"Running iteration {} ...\".format(loop + 1), end=\"\\r\")\n", 637 | " # Get alpha and beta\n", 638 | " alpha = get_alpha()\n", 639 | " beta = get_beta()\n", 640 | " # Get gamma and xi\n", 641 | " gamma = get_gamma(alpha, beta)\n", 642 | " xi = get_xi(alpha, beta)\n", 643 | " # Get optimized new parameters\n", 644 | " pi_new = get_pi_new(gamma)\n", 645 | " A_new = get_A_new(xi)\n", 646 | " mu_new = get_mu_new(gamma)\n", 647 | " Sigma_new = get_Sigma_new(gamma, mu_new)\n", 648 | " # Replace theta and repeat\n", 649 | " theta_old[\"pi\"] = pi_new\n", 650 | " theta_old[\"A\"] = A_new\n", 651 | " theta_old[\"mu\"] = mu_new\n", 652 | " theta_old[\"Sigma\"] = Sigma_new\n", 653 | "\n", 654 | "print(\"\\nDone\")" 655 | ] 656 | }, 657 | { 658 | "cell_type": "markdown", 659 | "id": "1a09c21f", 660 | "metadata": {}, 661 | "source": [ 662 | "Here is the estimated results for parameters. (See below.)
\n", 663 | "I note that the obtained class $ k $ is transposed against the original $ k $ and thus I have rearranged its order in the following results.\n", 664 | "\n", 665 | "$$ A^{pred} = \\begin{bmatrix} 0.71929825 & 0.06427734 & 0.21642441 \\\\ 0.0 & 0.23535639 & 0.76464361 \\\\ 0.32796274 & 0.20301841 & 0.46901885 \\end{bmatrix} \\;\\; A^{label} = \\begin{bmatrix} 0.7 & 0.15 & 0.15 \\\\ 0.0 & 0.5 & 0.5 \\\\ 0.3 & 0.35 & 0.35 \\end{bmatrix} $$\n", 666 | "\n", 667 | "$$ \\mathbf{\\mu}_0^{pred}=(16.07330508, 0.9698968) \\;\\; \\mathbf{\\mu}_0^{label}=(16.0, 1.0), \\\\ \\mathbf{\\Sigma}_0^{pred} = \\begin{bmatrix} 3.99411283 & 3.47671034 \\\\ 3.47671034 & 4.06141304 \\end{bmatrix} \\;\\; \\mathbf{\\Sigma}_0^{label} = \\begin{bmatrix} 4.0 & 3.5 \\\\ 3.5 & 4.0 \\end{bmatrix} $$\n", 668 | "\n", 669 | "$$ \\mathbf{\\mu}_1^{pred}=(1.10889991, 16.13350166) \\;\\; \\mathbf{\\mu}_1^{label}=(1.0, 16.0), \\\\ \\mathbf{\\Sigma}_1^{pred} = \\begin{bmatrix} 4.5466816 & 0.06402755 \\\\ 0.06402755 & 0.93057687 \\end{bmatrix} \\;\\; \\mathbf{\\Sigma}_1^{label} = \\begin{bmatrix} 4.0 & 0.0 \\\\ 0.0 & 1.0 \\end{bmatrix} $$\n", 670 | "\n", 671 | "$$ \\mathbf{\\mu}_2^{pred}=(-3.16503151, 1.4672828) \\;\\; \\mathbf{\\mu}_2^{label}=(-5.0, -5.0), \\\\ \\mathbf{\\Sigma}_2^{pred} = \\begin{bmatrix} 9.10191777 & 26.13881586 \\\\ 26.13881586 & 97.13257096 \\end{bmatrix} \\;\\; \\mathbf{\\Sigma}_2^{label} = \\begin{bmatrix} 1.0 & 0.0 \\\\ 0.0 & 4.0 \\end{bmatrix} $$" 672 | ] 673 | }, 674 | { 675 | "cell_type": "code", 676 | "execution_count": 13, 677 | "id": "e725701f", 678 | "metadata": {}, 679 | "outputs": [ 680 | { 681 | "name": "stdout", 682 | "output_type": "stream", 683 | "text": [ 684 | "A\n", 685 | "[[0.46901885 0.20301841 0.32796274]\n", 686 | " [0.76464361 0.23535639 0. 
]\n", 687 | " [0.21642441 0.06427734 0.71929825]]\n", 688 | "Mu\n", 689 | "[[-3.16503151 1.4672828 ]\n", 690 | " [ 1.10889991 16.13350166]\n", 691 | " [16.07330508 0.9698968 ]]\n", 692 | "Sigma\n", 693 | "[[[ 9.10191777 26.13881586]\n", 694 | " [26.13881586 97.13257096]]\n", 695 | "\n", 696 | " [[ 4.5466816 0.06402755]\n", 697 | " [ 0.06402755 0.93057687]]\n", 698 | "\n", 699 | " [[ 3.99411283 3.47671034]\n", 700 | " [ 3.47671034 4.06141304]]]\n" 701 | ] 702 | } 703 | ], 704 | "source": [ 705 | "np.set_printoptions(suppress=True)\n", 706 | "print(\"A\")\n", 707 | "print(A_new)\n", 708 | "print(\"Mu\")\n", 709 | "print(mu_new)\n", 710 | "print(\"Sigma\")\n", 711 | "print(Sigma_new)" 712 | ] 713 | }, 714 | { 715 | "cell_type": "code", 716 | "execution_count": null, 717 | "id": "e7d9865e", 718 | "metadata": {}, 719 | "outputs": [], 720 | "source": [] 721 | } 722 | ], 723 | "metadata": { 724 | "kernelspec": { 725 | "display_name": "Python 3", 726 | "language": "python", 727 | "name": "python3" 728 | }, 729 | "language_info": { 730 | "codemirror_mode": { 731 | "name": "ipython", 732 | "version": 3 733 | }, 734 | "file_extension": ".py", 735 | "mimetype": "text/x-python", 736 | "name": "python", 737 | "nbconvert_exporter": "python", 738 | "pygments_lexer": "ipython3", 739 | "version": "3.6.9" 740 | } 741 | }, 742 | "nbformat": 4, 743 | "nbformat_minor": 5 744 | } 745 | -------------------------------------------------------------------------------- /02-lds-em-algorithm.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "id": "fda67945", 6 | "metadata": {}, 7 | "source": [ 8 | "# Linear Dynamical Systems (LDS) in Python\n", 9 | "\n", 10 | "Historically, hidden Markov models (HMM) and linear dynamical systems (LDS) were developed independently. However, as you will find in this example, LDS has the deep relationship with HMM.
\n", 11 | "In this example, I also apply the same approach in hidden Markov models (HMM) example, **EM algorithm**, for Linear dynamical systems (LDS). (See [here](./01-hmm-em-algorithm.ipynb) for HMM.)\n", 12 | "\n", 13 | "Same as HMM, LDS assumes the latent (hidden) multinomial variables $ \\{\\mathbf{z}_{n}\\} $, and it then generates the corresponding observations, $ \\{\\mathbf{x}_{n}\\} $. The observers can see only $ \\{\\mathbf{x}_{n}\\} $, and the model will be estimated using observation, $ \\{\\mathbf{x}_{n}\\} $.
\n", 14 | "However, unlike HMM, the probability for the latent (hidden) variables $ \\{\\mathbf{z}_{n}\\} $ is not discrete. (i.e, It's continuous.)
\n", 15 | "See below.\n", 16 | "\n", 17 | "![Linear Dynamical Systems](images/lds.png)\n", 18 | "\n", 19 | "In this model, $ p(\\mathbf{z}_{n}|\\mathbf{z}_{n-1}) $ is called a **transition probability**, and $ p(\\mathbf{x}_{n}|\\mathbf{z}_n) $ is a **emission probability**.\n", 20 | "\n", 21 | "> In this notebook, I denote a scalar variable by normal letter (such as, $ x $), and denote a vector (incl. a matrix) by bold letter (such as, $ \\mathbf{x} $).\n", 22 | "\n", 23 | "*back to [Readme](https://github.com/tsmatz/hmm-lds-em-algorithm/)*" 24 | ] 25 | }, 26 | { 27 | "cell_type": "markdown", 28 | "id": "7561acd9", 29 | "metadata": {}, 30 | "source": [ 31 | "## Sampling in Linear Dynamical Systems (Generate sample data)\n", 32 | "\n", 33 | "First of all, we'll generate sample data (observations) by using the distribution of Linear Dynamical Systems (LDS).\n", 34 | "\n", 35 | "Here I assume that both $ p(\\mathbf{z}_{n}|\\mathbf{z}_{n-1}) $ (transition probability) and $ p(\\mathbf{x}_{n}|\\mathbf{z}_n) $ (emission probability) are a Gaussian distribution as follows. (Note that **I have omitted bias terms** in this example, in order to simplify examples.)\n", 36 | "\n", 37 | "$$ p(\\mathbf{z}_{n}|\\mathbf{z}_{n-1}) = \\mathcal{N}(\\mathbf{z}_{n}|\\mathbf{A}\\mathbf{z}_{n-1}, \\mathbf{\\Gamma}) $$\n", 38 | "\n", 39 | "$$ p(\\mathbf{x}_{n}|\\mathbf{z}_{n}) = \\mathcal{N}(\\mathbf{x}_{n}|\\mathbf{C}\\mathbf{z}_{n}, \\mathbf{\\Sigma}) $$\n", 40 | "\n", 41 | "In this example, I assume that $ \\{ \\mathbf{z}_{n} \\} $ is 3 dimensional space, and $ \\{ \\mathbf{x}_{n} \\} $ is 2 dimensional space.\n", 42 | "\n", 43 | "For sampling, first I'll create the latent (hidden) distribution $ \\{\\mathbf{z}_n\\} $ with the following parameters.
\n", 44 | "I note that $ \\mathbf{A} $ is a rotation matrix in every axis - x, y, and z.\n", 45 | "\n", 46 | "$$ \\mathbf{A} = \\begin{bmatrix} 0.750 & 0.433 & -0.500 \\\\ -0.217 & 0.875 & 0.433 \\\\ 0.625 & -0.217 & 0.750 \\end{bmatrix} $$\n", 47 | "\n", 48 | "$$ \\mathbf{\\Gamma} = \\begin{bmatrix} 1.5 & 0.1 & 0.0 \\\\ 0.1 & 2.0 & 0.3 \\\\ 0.0 & 0.3 & 1.0 \\end{bmatrix} $$" 49 | ] 50 | }, 51 | { 52 | "cell_type": "code", 53 | "execution_count": 1, 54 | "id": "e8c961a3", 55 | "metadata": {}, 56 | "outputs": [ 57 | { 58 | "data": { 59 | "text/plain": [ 60 | "array([[ 23. , 24. , 25. ],\n", 61 | " [ 15.71236153, 27.90702904, 28.17441948],\n", 62 | " [ 9.28832674, 32.27866956, 25.05656425],\n", 63 | " ...,\n", 64 | " [-32.29319196, 7.81322471, 76.74943814],\n", 65 | " [-59.97730124, 48.80904779, 35.1615163 ],\n", 66 | " [-42.51320807, 70.05011635, -21.98105043]])" 67 | ] 68 | }, 69 | "execution_count": 1, 70 | "metadata": {}, 71 | "output_type": "execute_result" 72 | } 73 | ], 74 | "source": [ 75 | "import numpy as np\n", 76 | "import math\n", 77 | "\n", 78 | "np.random.seed(1000) # For debugging and reproducibility\n", 79 | "\n", 80 | "N = 2000\n", 81 | "\n", 82 | "# rotate pi / 6 radian in any axis\n", 83 | "A = np.matmul(\n", 84 | " np.matmul(\n", 85 | " np.array([\n", 86 | " [1.0,0.0,0.0],\n", 87 | " [0.0,math.cos(math.pi/6),math.sin(math.pi/6)],\n", 88 | " [0.0,-1.0*math.sin(math.pi/6),math.cos(math.pi/6)]\n", 89 | " ]),\n", 90 | " np.array([\n", 91 | " [math.cos(math.pi/6),0.0,-1.0*math.sin(math.pi/6)],\n", 92 | " [0.0,1.0,0.0],\n", 93 | " [math.sin(math.pi/6),0.0,math.cos(math.pi/6)]\n", 94 | " ])),\n", 95 | " np.array([\n", 96 | " [math.cos(math.pi/6),math.sin(math.pi/6),0.0],\n", 97 | " [-1.0*math.sin(math.pi/6),math.cos(math.pi/6),0.0],\n", 98 | " [0.0,0.0,1.0]\n", 99 | " ])\n", 100 | ")\n", 101 | "\n", 102 | "Gamma = np.array([\n", 103 | " [1.5, 0.1, 0.0],\n", 104 | " [0.1, 2.0, 0.3],\n", 105 | " [0.0, 0.3, 1.0]\n", 106 | "])\n", 107 | "\n", 108 | "Z = np.array([[23.0, 24.0, 25.0]])\n", 109 | "for n in range(N):\n", 110 | " z_prev = Z[len(Z) - 1]\n", 111 | " mean = np.matmul(A, z_prev)\n", 112 | " z_post = np.random.multivariate_normal(\n", 113 | " mean=mean,\n", 114 | " cov=Gamma,\n", 115 | " size=1)\n", 116 | " Z = np.vstack((Z, z_post))\n", 117 | "Z" 118 | ] 119 | }, 120 | { 121 | "cell_type": "markdown", 122 | "id": "4751b087", 123 | "metadata": {}, 124 | "source": [ 125 | "Next I'll create the corresponding observation $ \\{\\mathbf{x}_n\\} $ (2 dimensional space) with the following parameters in $ p(\\mathbf{x}_{n}|\\mathbf{z}_{n}) = \\mathcal{N}(\\mathbf{x}_{n}|\\mathbf{C}\\mathbf{z}_{n}, \\mathbf{\\Sigma}) $.\n", 126 | "\n", 127 | "$$ \\mathbf{C} = \\begin{bmatrix} 1.0 & 1.0 & 0.0 \\\\ 0.0 & 1.0 & 1.0 \\end{bmatrix} $$\n", 128 | "\n", 129 | "$$ \\mathbf{\\Sigma} = \\begin{bmatrix} 1.0 & 0.2 \\\\ 0.2 & 2.0 \\end{bmatrix} $$" 130 | ] 131 | }, 132 | { 133 | "cell_type": "code", 134 | "execution_count": 2, 135 | "id": "9d18dbe8", 136 | "metadata": {}, 137 | "outputs": [ 138 | { 139 | "data": { 140 | "text/plain": [ 141 | "array([[ 47.68902127, 48.86610347],\n", 142 | " [ 43.94953287, 54.53438865],\n", 143 | " [ 41.39095419, 55.59938853],\n", 144 | " ...,\n", 145 | " [-24.57596382, 84.39783143],\n", 146 | " [-12.60208378, 84.9819878 ],\n", 147 | " [ 29.52247208, 49.04898829]])" 148 | ] 149 | }, 150 | "execution_count": 2, 151 | "metadata": {}, 152 | "output_type": "execute_result" 153 | } 154 | ], 155 | "source": [ 156 | "C = np.array([\n", 157 | " [1.0, 1.0, 0.0],\n", 158 | " 
[0.0, 1.0, 1.0],\n", 159 | "])\n", 160 | "\n", 161 | "Sigma = np.array([\n", 162 | " [1.0,0.2],\n", 163 | " [0.2,2.0],\n", 164 | "])\n", 165 | "\n", 166 | "X = np.empty((0,2))\n", 167 | "for z_n in Z:\n", 168 | " mean = np.matmul(C, z_n)\n", 169 | " x_n = np.random.multivariate_normal(\n", 170 | " mean=mean,\n", 171 | " cov=Sigma,\n", 172 | " size=1)\n", 173 | " X = np.vstack((X, x_n))\n", 174 | "X" 175 | ] 176 | }, 177 | { 178 | "cell_type": "markdown", 179 | "id": "eb896c56", 180 | "metadata": {}, 181 | "source": [ 182 | "## EM algorithm in Linear Dynamical Systems (LDS)\n", 183 | "\n", 184 | "Now, using the given observation $ \\{ \\mathbf{x}_n \\} $, let's try to estimate the optimimal parameters in LDS.\n", 185 | "\n", 186 | "When I denote unknown parameters by $ \\mathbf{\\theta} $, our goal is to get the optimal parameters $ \\mathbf{\\theta} $ to maximize the following (1).\n", 187 | "\n", 188 | "$$ p(\\mathbf{X}|\\mathbf{\\theta}) = \\sum_{\\mathbf{Z}} p(\\mathbf{X},\\mathbf{Z}|\\mathbf{\\theta}) \\;\\;\\;\\;(1) $$\n", 189 | "\n", 190 | "where $ \\mathbf{Z} = \\{\\mathbf{z}_n\\} $ and $ \\mathbf{X} = \\{\\mathbf{x}_n\\} $\n", 191 | "\n", 192 | "As I have mentioned in [HMM](./01-hmm-em-algorithm.ipynb), it's difficult to apply [maximum likelihood estimation (MLE)](https://tsmatz.wordpress.com/2017/08/30/regression-in-machine-learning-math-for-beginners/) for the expression (1), and I will then apply **EM algorithm** (**E**xpectation–**M**aximization algorithm) to solve unknown parameters.\n", 193 | "\n", 194 | "In EM algorithm for LDS, we start with initial parameters $ \\mathbf{\\theta}^{old} $, and optimize (find) new $ \\mathbf{\\theta} $ to maximize the following expression (2).
\n", 195 | "By repeating this operation, we can expect to reach to the likelihood parameters $ \\hat{\\mathbf{\\theta}} $.\n", 196 | "\n", 197 | "$$ Q(\\mathbf{\\theta}, \\mathbf{\\theta}^{old}) = \\sum_{\\mathbf{Z}} p(\\mathbf{Z}|\\mathbf{X}, \\mathbf{\\theta}^{old}) \\ln p(\\mathbf{X}, \\mathbf{Z}|\\mathbf{\\theta}) \\;\\;\\;\\;(2) $$\n", 198 | "\n", 199 | "> Note : For the essential idea of EM algorithm, see Chapter 9 in \"[Pattern Recognition and Machine Learning](https://www.microsoft.com/en-us/research/uploads/prod/2006/01/Bishop-Pattern-Recognition-and-Machine-Learning-2006.pdf?ranMID=24542&ranEAID=TnL5HPStwNw&ranSiteID=TnL5HPStwNw-g4zE85KQgCXaCQfYBhtuFQ&epi=TnL5HPStwNw-g4zE85KQgCXaCQfYBhtuFQ&irgwc=1&OCID=AID2200057_aff_7593_1243925&tduid=%28ir__vhvv9m6caokf6nb62oprh029if2xo0rux3ga300300%29%287593%29%281243925%29%28TnL5HPStwNw-g4zE85KQgCXaCQfYBhtuFQ%29%28%29&irclickid=_vhvv9m6caokf6nb62oprh029if2xo0rux3ga300300)\" (Christopher M. Bishop, Microsoft)\n", 200 | "\n", 201 | "In LDS, we use the following parameters as $ \\mathbf{\\theta} = \\{ \\mathbf{m}_0, \\mathbf{S}_0, \\mathbf{A}, \\mathbf{\\Gamma}, \\mathbf{C}, \\mathbf{\\Sigma} \\} $.\n", 202 | "\n", 203 | "- $ \\mathbf{m}_0, \\mathbf{S}_0 $ : Gaussian distribution's parameters (mean, variance) in initial latent node $ p(\\mathbf{z}_0) = \\mathcal{N}(\\mathbf{m}_0, \\mathbf{S}_0) $.\n", 204 | "- $ \\mathbf{A}, \\mathbf{\\Gamma} $ : parameters in transition probability $ p(\\mathbf{z}_{n}|\\mathbf{z}_{n-1}) = \\mathcal{N}(\\mathbf{z}_{n}|\\mathbf{A}\\mathbf{z}_{n-1}, \\mathbf{\\Gamma}) $.
\n", 205 | "(In this example, I have omitted a bias term for simplicity.)\n", 206 | "- $ \\mathbf{C}, \\mathbf{\\Sigma} $ : parameters in emission probability $ p(\\mathbf{x}_{n}|\\mathbf{z}_{n}) = \\mathcal{N}(\\mathbf{x}_{n}|\\mathbf{C}\\mathbf{z}_{n}, \\mathbf{\\Sigma}) $.
\n", 207 | "(In this example, I have omitted a bias term for simplicity.)\n", 208 | "\n", 209 | "Now I denote the probability $ p(\\mathbf{z}_n|\\mathbf{x}_1, \\ldots, \\mathbf{x}_n, \\mathbf{\\theta}^{old}) $ by $ \\alpha(z_n) = \\mathcal{N}(\\mathbf{z}_{n}|\\mathbf{\\mu}_n, \\mathbf{V}_n) $.
\n", 210 | "It's known that $ \\mathbf{\\mu}_n $ and $ \\mathbf{V}_n $ can be recursively given by $ \\mathbf{\\theta}^{old} $ and $ \\mathbf{X} $ as follows.\n", 211 | "\n", 212 | "$$ \\mathbf{\\mu}_n = \\mathbf{A}^{old} \\mathbf{\\mu}_{n-1} + \\mathbf{K}_{n-1} (\\mathbf{x}_n - \\mathbf{C}^{old} \\mathbf{A}^{old} \\mathbf{\\mu}_{n-1}) $$\n", 213 | "\n", 214 | "$$ \\mathbf{V}_n = (\\mathbf{I} - \\mathbf{K}_{n-1} \\mathbf{C}^{old}) \\mathbf{P}_{n-1} $$\n", 215 | "\n", 216 | "where\n", 217 | "\n", 218 | "$ \\mathbf{P}_{n} = \\mathbf{A}^{old} \\mathbf{V}_{n} (\\mathbf{A}^{old})^{T} + \\mathbf{\\Gamma}^{old} $\n", 219 | "\n", 220 | "and\n", 221 | "\n", 222 | "$ \\mathbf{K}_n = \\mathbf{P}_{n} (\\mathbf{C}^{old})^T (\\mathbf{C}^{old} \\mathbf{P}_{n} (\\mathbf{C}^{old})^T + \\mathbf{\\Sigma}^{old})^{-1} $\n", 223 | "\n", 224 | "> Note : In [HMM](./01-hmm-em-algorithm.ipynb), $ \\alpha(z_n) $ is defined as $ \\alpha(z_n)=p(\\mathbf{x}_1, \\ldots, \\mathbf{x}_n, \\mathbf{z}_n |\\mathbf{\\theta}^{old}) $. As I mentioned in HMM, the series of these variables $ \\alpha(z_n) $ will go to zero exponentially, when $ N $ is large. It will then eventually exceed the dynamic range of precision in computation, and we should introduce a scaling factor in each recursive steps.
\n", 225 | "> In this example, $ \\alpha(z_n) $ is already a scaled variable $ \\alpha(z_n)=p(\\mathbf{z}_n|\\mathbf{x}_1, \\ldots, \\mathbf{x}_n, \\mathbf{\\theta}^{old}) $ and you don't need to scale in recursion. Therefore, when you monitor the value of likelihood functions, you need to calculate scaling factors $ \\{ c_n \\} $ in each steps. In this example, we don't use $ \\{ c_n \\} $, but please see Chapter 13 in \"[Pattern Recognition and Machine Learning](https://www.microsoft.com/en-us/research/uploads/prod/2006/01/Bishop-Pattern-Recognition-and-Machine-Learning-2006.pdf?ranMID=24542&ranEAID=TnL5HPStwNw&ranSiteID=TnL5HPStwNw-g4zE85KQgCXaCQfYBhtuFQ&epi=TnL5HPStwNw-g4zE85KQgCXaCQfYBhtuFQ&irgwc=1&OCID=AID2200057_aff_7593_1243925&tduid=%28ir__vhvv9m6caokf6nb62oprh029if2xo0rux3ga300300%29%287593%29%281243925%29%28TnL5HPStwNw-g4zE85KQgCXaCQfYBhtuFQ%29%28%29&irclickid=_vhvv9m6caokf6nb62oprh029if2xo0rux3ga300300)\" (Christopher M. Bishop, Microsoft) for details.\n", 226 | "\n", 227 | "In this recursion, the starting condition $ \\mathbf{\\mu}_0, \\mathbf{V}_0 $ is :\n", 228 | "\n", 229 | "$$ \\mathbf{\\mu}_0 = \\mathbf{m}_0^{old} + \\mathbf{K}_0 (\\mathbf{x}_0 - \\mathbf{C}^{old} \\mathbf{m}_0^{old}) $$\n", 230 | "\n", 231 | "$$ \\mathbf{V}_0 = (\\mathbf{I} - \\mathbf{S}_0^{old} (\\mathbf{C}^{old})^T (\\mathbf{C}^{old} \\mathbf{S}_0^{old} (\\mathbf{C}^{old})^T + \\mathbf{\\Sigma}^{old})^{-1} \\mathbf{C}^{old}) \\mathbf{S}_0^{old} $$\n", 232 | "\n", 233 | "> Note : When we assume $ \\mathbf{P}_{-1} = \\mathbf{S}_0 $, then $ \\mathbf{V}_0 $ is denoted by $ (\\mathbf{I} - \\mathbf{K}_{-1} \\mathbf{C}^{old}) \\mathbf{P}_{-1} $. As you can find, it is consistent with the previous expression $ \\mathbf{V}_n $.
\n", 234 | "> This $ \\mathbf{K} $ is called a Kalman gain matrix.
\n", 235 | "> Same as $ \\mathbf{V}_0 $, $ \\mathbf{\\mu}_0 $ is also related with $ \\mathbf{m}_0 $.\n", 236 | "\n", 237 | "Now I also denote the probability $ p(\\mathbf{z}_n|\\mathbf{X},\\mathbf{\\theta}^{old}) $ by $ \\gamma(z_n) = \\mathcal{N}(\\mathbf{z}_{n}|\\hat{\\mathbf{\\mu}}_n, \\hat{\\mathbf{V}}_n) $.\n", 238 | "\n", 239 | "Once you've done previous forward recursion for $ \\{ \\mathbf{\\mu}_n \\} $ and $ \\{ \\mathbf{V}_n \\} $, run the following backward recursion and then get $ \\{ \\hat{\\mathbf{\\mu}}_n \\} $ and $ \\{ \\hat{\\mathbf{V}}_n \\} $ by using $ \\mathbf{\\theta}^{old} , \\mathbf{X} $, and previous $ \\{ \\mathbf{\\mu}_n \\}, \\{ \\mathbf{V}_n \\} $.\n", 240 | "\n", 241 | "$$ \\hat{\\mathbf{\\mu}}_n = \\mathbf{\\mu}_n + \\mathbf{J}_n (\\hat{\\mathbf{\\mu}}_{n+1} - \\mathbf{A}^{old} \\mathbf{\\mu}_n) $$\n", 242 | "\n", 243 | "$$ \\hat{\\mathbf{V}}_n = \\mathbf{V}_n + \\mathbf{J}_n (\\hat{\\mathbf{V}}_{n+1} - \\mathbf{P}_n) \\mathbf{J}_n^T $$\n", 244 | "\n", 245 | "where $ \\mathbf{J}_n = \\mathbf{V}_n (\\mathbf{A}^{old})^T (\\mathbf{P}_n)^{-1} $.\n", 246 | "\n", 247 | "Once you have got these variables, the obtained $ \\{ \\hat{\\mathbf{\\mu}}_n \\} $ and $ \\{ \\hat{\\mathbf{V}}_n \\} $ are fed into the following equations and you can then get the following expectations.\n", 248 | "\n", 249 | "$$ \\mathbb{E}[\\mathbf{z}_n] = \\hat{\\mathbf{\\mu}}_n $$\n", 250 | "\n", 251 | "$$ \\mathbb{E}[\\mathbf{z}_n \\mathbf{z}_{n-1}^T] = \\hat{\\mathbf{V}}_n \\mathbf{J}_{n-1}^{T} + \\hat{\\mathbf{\\mu}}_n \\hat{\\mathbf{\\mu}}_{n-1}^T $$\n", 252 | "\n", 253 | "$$ \\mathbb{E}[\\mathbf{z}_n \\mathbf{z}_n^T] = \\hat{\\mathbf{V}}_n + \\hat{\\mathbf{\\mu}}_n \\hat{\\mathbf{\\mu}}_n^T $$\n", 254 | "\n", 255 | "> Note : $ \\mathbb{E}[\\mathbf{z}_{n-1} \\mathbf{z}_{n}^T] = \\left( \\hat{\\mathbf{V}}_n \\mathbf{J}_{n-1}^{T} + \\hat{\\mathbf{\\mu}}_n \\hat{\\mathbf{\\mu}}_{n-1}^T \\right)^T $\n", 256 | "\n", 257 | "With these expectation values, you can get the optimal $ \\mathbf{\\theta} = \\{ \\mathbf{m}_0, \\mathbf{S}_0, \\mathbf{A}, \\mathbf{\\Gamma}, \\mathbf{C}, \\mathbf{\\Sigma} \\} $ to maximize (2) as follows.\n", 258 | "\n", 259 | "$$ \\mathbf{m}_0^{new} = \\mathbb{E}[\\mathbf{z}_0] $$\n", 260 | "\n", 261 | "$$ \\mathbf{S}_0^{new} = \\mathbb{E}[\\mathbf{z}_0 \\mathbf{z}_0^T] - \\mathbb{E}[\\mathbf{z}_0]\\mathbb{E}[\\mathbf{z}_0^T] $$\n", 262 | "\n", 263 | "$$ \\mathbf{A}^{new} = \\left( \\sum_{n=1}^{N-1} \\mathbb{E}[\\mathbf{z}_n \\mathbf{z}_{n-1}^T] \\right) \\left( \\sum_{n=0}^{N-2} \\mathbb{E}[\\mathbf{z}_{n} \\mathbf{z}_{n}^T] \\right)^{-1} $$\n", 264 | "\n", 265 | "$$ \\mathbf{\\Gamma}^{new} = \\frac{1}{N - 1} \\sum_{n=1}^{N-1} \\left\\{ \\mathbb{E}[\\mathbf{z}_n \\mathbf{z}_n^T] - \\mathbf{A}^{new} \\mathbb{E}[\\mathbf{z}_{n-1} \\mathbf{z}_n^T] - \\mathbb{E}[\\mathbf{z}_n \\mathbf{z}_{n-1}^{T}](\\mathbf{A}^{new})^T + \\mathbf{A}^{new} \\mathbb{E}[\\mathbf{z}_{n-1}\\mathbf{z}_{n-1}^T](\\mathbf{A}^{new})^T \\right\\} $$\n", 266 | "\n", 267 | "$$ \\mathbf{C}^{new} = \\left( \\sum_{n=0}^{N-1} \\mathbf{x}_n \\mathbb{E}[\\mathbf{z}_n]^T \\right) \\left( \\sum_{n=0}^{N-1} \\mathbb{E}[\\mathbf{z}_n \\mathbf{z}_n^T] \\right)^{-1} $$\n", 268 | "\n", 269 | "$$ \\mathbf{\\Sigma}^{new} = \\frac{1}{N} \\sum_{n=0}^{N-1} \\left\\{ \\mathbf{x}_n \\mathbf{x}_n^T - \\mathbf{C}^{new} \\mathbb{E}[\\mathbf{z}_n] \\mathbf{x}_n^T - \\mathbf{x}_n \\mathbb{E}[\\mathbf{z}_n^T](\\mathbf{C}^{new})^T + \\mathbf{C}^{new} \\mathbb{E}[\\mathbf{z}_n \\mathbf{z}_n^T] (\\mathbf{C}^{new})^T \\right\\} 
$$\n", 270 | "\n", 271 | "You should repeat this process by replacing $ \\mathbf{\\theta}^{old} $ with new $ \\mathbf{\\theta} $, and you will eventually get the optimal results $ \\hat{\\mathbf{\\theta}} $.\n", 272 | "\n", 273 | "> Note : For these properties, please refer Chapter 13 in \"[Pattern Recognition and Machine Learning](https://www.microsoft.com/en-us/research/uploads/prod/2006/01/Bishop-Pattern-Recognition-and-Machine-Learning-2006.pdf?ranMID=24542&ranEAID=TnL5HPStwNw&ranSiteID=TnL5HPStwNw-g4zE85KQgCXaCQfYBhtuFQ&epi=TnL5HPStwNw-g4zE85KQgCXaCQfYBhtuFQ&irgwc=1&OCID=AID2200057_aff_7593_1243925&tduid=%28ir__vhvv9m6caokf6nb62oprh029if2xo0rux3ga300300%29%287593%29%281243925%29%28TnL5HPStwNw-g4zE85KQgCXaCQfYBhtuFQ%29%28%29&irclickid=_vhvv9m6caokf6nb62oprh029if2xo0rux3ga300300)\" (Christopher M. Bishop, Microsoft)." 274 | ] 275 | }, 276 | { 277 | "cell_type": "markdown", 278 | "id": "fba1a21e", 279 | "metadata": {}, 280 | "source": [ 281 | "## Apply algorithm in Python" 282 | ] 283 | }, 284 | { 285 | "cell_type": "markdown", 286 | "id": "4f60f500", 287 | "metadata": {}, 288 | "source": [ 289 | "## 0. Prerequisites" 290 | ] 291 | }, 292 | { 293 | "cell_type": "code", 294 | "execution_count": null, 295 | "id": "de7be26b", 296 | "metadata": {}, 297 | "outputs": [], 298 | "source": [ 299 | "!pip3 install numpy\n", 300 | "!pip3 install scipy" 301 | ] 302 | }, 303 | { 304 | "cell_type": "markdown", 305 | "id": "aa3be6ac", 306 | "metadata": {}, 307 | "source": [ 308 | "## 1. Initialize parameters\n", 309 | "\n", 310 | "First, initialize $ \\mathbf{\\theta} = \\{ \\mathbf{m}_0, \\mathbf{S}_0, \\mathbf{A}, \\mathbf{\\Gamma}, \\mathbf{C}, \\mathbf{\\Sigma} \\} $.
\n", 311 | "In this example, I set the primitive fixed values as follows.\n", 312 | "\n", 313 | "- $ \\mathbf{m}_0 = (10.0, 10.0, 10.0) $\n", 314 | "- $ \\mathbf{S}_0 = \\begin{bmatrix} 1.0 & 0.5 & 0.5 \\\\ 0.5 & 1.0 & 0.5 \\\\ 0.5 & 0.5 & 1.0 \\end{bmatrix} $\n", 315 | "- $ \\mathbf{A} = \\begin{bmatrix} 1.0 & 1.1 & 1.2 \\\\ 1.3 & 1.4 & 1.5 \\\\ 1.6 & 1.7 & 1.8 \\end{bmatrix} $\n", 316 | "- $ \\mathbf{\\Gamma} = \\begin{bmatrix} 1.0 & 0.5 & 0.5 \\\\ 0.5 & 1.0 & 0.5 \\\\ 0.5 & 0.5 & 1.0 \\end{bmatrix} $\n", 317 | "- $ \\mathbf{C} = \\begin{bmatrix} 1.0 & 1.0 & 1.0 \\\\ 1.0 & 1.0 & 1.0 \\end{bmatrix} $\n", 318 | "- $ \\mathbf{\\Sigma} = \\begin{bmatrix} 1.0 & 0.5 \\\\ 0.5 & 1.0 \\end{bmatrix} $" 319 | ] 320 | }, 321 | { 322 | "cell_type": "code", 323 | "execution_count": 3, 324 | "id": "b114a5c1", 325 | "metadata": {}, 326 | "outputs": [], 327 | "source": [ 328 | "# Initialize parameters\n", 329 | "class theta:\n", 330 | " m0 = np.empty((0,3))\n", 331 | " S0 = np.empty((0,3,3))\n", 332 | " A = np.empty((0,3,3))\n", 333 | " Gamma = np.empty((0,3,3))\n", 334 | " C = np.empty((0,2,3))\n", 335 | " Sigma = np.empty((0,2,2))\n", 336 | "\n", 337 | " def __init__(self, m0, S0, A, Gamma, C, Sigma):\n", 338 | " self.m0 = m0\n", 339 | " self.S0 = S0\n", 340 | " self.A = A\n", 341 | " self.Gamma = Gamma\n", 342 | " self.C = C\n", 343 | " self.Sigma = Sigma\n", 344 | "\n", 345 | "theta_old = theta(\n", 346 | " m0=np.array([10.0, 10.0, 10.0]),\n", 347 | " S0=np.array([\n", 348 | " [1.0, 0.5, 0.5],\n", 349 | " [0.5, 1.0, 0.5],\n", 350 | " [0.5, 0.5, 1.0]\n", 351 | " ]),\n", 352 | " A=np.array([\n", 353 | " [1.0, 1.1, 1.2],\n", 354 | " [1.3, 1.4, 1.5],\n", 355 | " [1.6, 1.7, 1.8]\n", 356 | " ]),\n", 357 | " Gamma=np.array([\n", 358 | " [1.0, 0.5, 0.5],\n", 359 | " [0.5, 1.0, 0.5],\n", 360 | " [0.5, 0.5, 1.0]\n", 361 | " ]),\n", 362 | " C=np.array([\n", 363 | " [1.0, 1.0, 1.0],\n", 364 | " [1.0, 1.0, 1.0],\n", 365 | " ]),\n", 366 | " Sigma=np.array([\n", 367 | " [1.0, 0.5],\n", 368 | " [0.5, 1.0]\n", 369 | " ])\n", 370 | ")" 371 | ] 372 | }, 373 | { 374 | "cell_type": "markdown", 375 | "id": "bb646728", 376 | "metadata": {}, 377 | "source": [ 378 | "## 2. Get $ \\{ \\mathbf{\\mu}_n \\} $ and $ \\{ \\mathbf{V}_n \\} $" 379 | ] 380 | }, 381 | { 382 | "cell_type": "markdown", 383 | "id": "619b21d7", 384 | "metadata": {}, 385 | "source": [ 386 | "Now I set the starting condition, $ \\mathbf{V}_0 $ as follows. 
:\n", 387 | "\n", 388 | "$$ \\mathbf{V}_0 = (\\mathbf{I} - \\mathbf{S}_0^{old} (\\mathbf{C}^{old})^T (\\mathbf{C}^{old} \\mathbf{S}_0^{old} (\\mathbf{C}^{old})^T + \\mathbf{\\Sigma}^{old})^{-1} \\mathbf{C}^{old}) \\mathbf{S}_0^{old} $$\n", 389 | "\n", 390 | "where $ \\mathbf{P}_{n} = \\mathbf{A}^{old} \\mathbf{V}_{n} (\\mathbf{A}^{old})^{T} + \\mathbf{\\Gamma}^{old} $ and $ \\mathbf{K}_n = \\mathbf{P}_{n} (\\mathbf{C}^{old})^T (\\mathbf{C}^{old} \\mathbf{P}_{n} (\\mathbf{C}^{old})^T + \\mathbf{\\Sigma}^{old})^{-1} $\n", 391 | "\n", 392 | "As I mentioned above, this is equivalent to :\n", 393 | "\n", 394 | "$$ \\mathbf{V}_0 = (\\mathbf{I} - \\mathbf{K}_{-1} \\mathbf{C}^{old}) \\mathbf{S}_0^{old} $$\n", 395 | "\n", 396 | "where $ \\mathbf{K}_{-1} = \\mathbf{S}_0^{old} (\\mathbf{C}^{old})^T (\\mathbf{C}^{old} \\mathbf{S}_0^{old} (\\mathbf{C}^{old})^T + \\mathbf{\\Sigma}^{old})^{-1} $\n", 397 | "\n", 398 | "And we can recursively obtain all $ \\{ \\mathbf{V}_n \\} $ as follows.\n", 399 | "\n", 400 | "$$ \\mathbf{V}_n = (\\mathbf{I} - \\mathbf{K}_{n-1} \\mathbf{C}^{old}) \\mathbf{P}_{n-1} $$" 401 | ] 402 | }, 403 | { 404 | "cell_type": "code", 405 | "execution_count": 4, 406 | "id": "05c6c762", 407 | "metadata": {}, 408 | "outputs": [], 409 | "source": [ 410 | "from scipy.stats import multivariate_normal\n", 411 | "\n", 412 | "def P(V_n):\n", 413 | " res = np.matmul(theta_old.A, V_n)\n", 414 | " res = np.matmul(res, theta_old.A.transpose())\n", 415 | " res = res + theta_old.Gamma\n", 416 | " return res\n", 417 | "\n", 418 | "def K(P_n):\n", 419 | " res = np.matmul(P_n, theta_old.C.transpose())\n", 420 | " inv = np.matmul(theta_old.C, P_n)\n", 421 | " inv = np.matmul(inv, theta_old.C.transpose())\n", 422 | " inv = inv + theta_old.Sigma\n", 423 | " inv = np.linalg.inv(inv)\n", 424 | " res = np.matmul(res, inv)\n", 425 | " return res\n", 426 | "\n", 427 | "def get_V():\n", 428 | " V = np.empty((0,3,3))\n", 429 | " I = np.array([[1.0,0.0,0.0],[0.0,1.0,0.0],[0.0,0.0,1.0]])\n", 430 | "\n", 431 | " # Get initial V_0\n", 432 | " K_minus1 = K(theta_old.S0)\n", 433 | " V_0 = np.matmul(\n", 434 | " np.subtract(I, np.matmul(K_minus1, theta_old.C)),\n", 435 | " theta_old.S0)\n", 436 | " V = np.vstack((V, [V_0]))\n", 437 | "\n", 438 | " # Get all elements recursively\n", 439 | " for n in range(1, N):\n", 440 | " P_n_minus1 = P(V[n-1])\n", 441 | " V_n = np.matmul(\n", 442 | " np.subtract(I, np.matmul(K(P_n_minus1),theta_old.C)),\n", 443 | " P_n_minus1)\n", 444 | " V = np.vstack((V, [V_n]))\n", 445 | "\n", 446 | " return V" 447 | ] 448 | }, 449 | { 450 | "cell_type": "markdown", 451 | "id": "2d8a4fc8", 452 | "metadata": {}, 453 | "source": [ 454 | "I also set the starting condition, $ \\mathbf{\\mu}_0 $ as follows. 
:\n", 455 | "\n", 456 | "$$ \\mathbf{\\mu}_0 = \\mathbf{m}_0^{old} + \\mathbf{K}_0 (\\mathbf{x}_0 - \\mathbf{C}^{old} \\mathbf{m}_0^{old}) $$\n", 457 | "\n", 458 | "And we can recursively obtain all $ \\{ \\mathbf{\\mu}_n \\} $ as follows.\n", 459 | "\n", 460 | "$$ \\mathbf{\\mu}_n = \\mathbf{A}^{old} \\mathbf{\\mu}_{n-1} + \\mathbf{K}_{n-1} (\\mathbf{x}_n - \\mathbf{C}^{old} \\mathbf{A}^{old} \\mathbf{\\mu}_{n-1}) $$" 461 | ] 462 | }, 463 | { 464 | "cell_type": "code", 465 | "execution_count": 5, 466 | "id": "179fc0b8", 467 | "metadata": {}, 468 | "outputs": [], 469 | "source": [ 470 | "def get_mu(V):\n", 471 | " mu = np.empty((0,3))\n", 472 | "\n", 473 | " # Get initial mu_0\n", 474 | " P_0 = P(V[0])\n", 475 | " K_0 = K(P_0)\n", 476 | " theta_old_m0_T = np.array([theta_old.m0]).transpose()\n", 477 | " X_0_T = np.array([X[0]]).transpose()\n", 478 | " mu_0_T = np.add(\n", 479 | " theta_old_m0_T,\n", 480 | " np.matmul(\n", 481 | " K_0,\n", 482 | " np.subtract(\n", 483 | " X_0_T,\n", 484 | " np.matmul(theta_old.C,theta_old_m0_T)\n", 485 | " )\n", 486 | " )\n", 487 | " )\n", 488 | " mu_0 = np.squeeze(mu_0_T.transpose())\n", 489 | " mu = np.vstack((mu, mu_0))\n", 490 | "\n", 491 | " # Get all elements recursively\n", 492 | " for n in range(1, N):\n", 493 | " P_n_minus1 = P(V[n-1])\n", 494 | " K_n_minus1 = K(P_n_minus1)\n", 495 | " mu_n_minus1_T = np.array([mu[n-1]]).transpose()\n", 496 | " mu_n_former_T = np.matmul(theta_old.A, mu_n_minus1_T)\n", 497 | " X_n_T = np.array([X[n]]).transpose()\n", 498 | " sub_n_T = np.subtract(\n", 499 | " X_n_T,\n", 500 | " np.matmul(\n", 501 | " np.matmul(theta_old.C,theta_old.A),\n", 502 | " mu_n_minus1_T\n", 503 | " )\n", 504 | " )\n", 505 | " mu_n_latter_T = np.matmul(\n", 506 | " K_n_minus1,\n", 507 | " sub_n_T)\n", 508 | " mu_n_T = np.add(mu_n_former_T, mu_n_latter_T)\n", 509 | " mu_n = np.squeeze(mu_n_T.transpose())\n", 510 | " mu = np.vstack((mu, mu_n))\n", 511 | "\n", 512 | " return mu" 513 | ] 514 | }, 515 | { 516 | "cell_type": "markdown", 517 | "id": "e486bdfc", 518 | "metadata": {}, 519 | "source": [ 520 | "## 3. 
Get $ \\{ \\hat{\\mathbf{\\mu}}_n \\} $ and $ \\{ \\hat{\\mathbf{V}}_n \\} $" 521 | ] 522 | }, 523 | { 524 | "cell_type": "markdown", 525 | "id": "ee549e91", 526 | "metadata": {}, 527 | "source": [ 528 | "Now we obtain $ \\{ \\hat{\\mathbf{\\mu}}_n \\} $ and $ \\{ \\hat{\\mathbf{V}}_n \\} $ by running backward recursion with previous $ \\{ \\mathbf{\\mu}_n \\} $ and $ \\{ \\mathbf{V}_n \\} $.\n", 529 | "\n", 530 | "First we get $ \\{ \\hat{\\mathbf{V}}_n \\} $ as follows.\n", 531 | "\n", 532 | "$$ \\hat{\\mathbf{V}}_n = \\mathbf{V}_n + \\mathbf{J}_n (\\hat{\\mathbf{V}}_{n+1} - \\mathbf{P}_n) \\mathbf{J}_n^T $$\n", 533 | "\n", 534 | "where $ \\mathbf{J}_n = \\mathbf{V}_n (\\mathbf{A}^{old})^T (\\mathbf{P}_n)^{-1} $" 535 | ] 536 | }, 537 | { 538 | "cell_type": "code", 539 | "execution_count": 6, 540 | "id": "4282314b", 541 | "metadata": {}, 542 | "outputs": [], 543 | "source": [ 544 | "def J(V_n, P_n):\n", 545 | " return np.matmul(\n", 546 | " np.matmul(V_n,theta_old.A.transpose()),\n", 547 | " np.linalg.inv(P_n)\n", 548 | " )\n", 549 | "\n", 550 | "def get_V_hat(V):\n", 551 | " V_hat_rev = np.empty((0,3,3))\n", 552 | " V_hat_rev = np.vstack((V_hat_rev, [V[N-1]]))\n", 553 | "\n", 554 | " for n in range(1, N):\n", 555 | " V_n = V[N-n-1]\n", 556 | " P_n = P(V_n)\n", 557 | " J_n = J(V_n, P_n)\n", 558 | " V_hat_n = np.add(\n", 559 | " V_n,\n", 560 | " np.matmul(\n", 561 | " np.matmul(\n", 562 | " J_n,\n", 563 | " np.subtract(V_hat_rev[n-1],P_n)\n", 564 | " ),\n", 565 | " J_n.transpose()\n", 566 | " )\n", 567 | " )\n", 568 | " V_hat_rev = np.vstack((V_hat_rev, [V_hat_n]))\n", 569 | "\n", 570 | " # Reverse results\n", 571 | " V_hat = np.flip(V_hat_rev, axis=0)\n", 572 | "\n", 573 | " return V_hat" 574 | ] 575 | }, 576 | { 577 | "cell_type": "markdown", 578 | "id": "4f0da770", 579 | "metadata": {}, 580 | "source": [ 581 | "Next we also get $ \\{ \\hat{\\mathbf{\\mu}}_n \\} $ as follows.\n", 582 | "\n", 583 | "$$ \\hat{\\mathbf{\\mu}}_n = \\mathbf{\\mu}_n + \\mathbf{J}_n (\\hat{\\mathbf{\\mu}}_{n+1} - \\mathbf{A}^{old} \\mathbf{\\mu}_n) $$" 584 | ] 585 | }, 586 | { 587 | "cell_type": "code", 588 | "execution_count": 7, 589 | "id": "1ec9dfd4", 590 | "metadata": {}, 591 | "outputs": [], 592 | "source": [ 593 | "def get_mu_hat(mu, V):\n", 594 | " mu_hat_rev = np.empty((0,3))\n", 595 | " mu_hat_rev = np.vstack((mu_hat_rev, mu[N-1]))\n", 596 | "\n", 597 | " for n in range(1, N):\n", 598 | " mu_n = mu[N-n-1]\n", 599 | " mu_n_T = np.array([mu_n]).transpose()\n", 600 | " mu_hat_rev_n_minus1_T = np.array([mu_hat_rev[n-1]]).transpose()\n", 601 | " V_n = V[N-n-1]\n", 602 | " P_n = P(V_n)\n", 603 | " J_n = J(V_n, P_n)\n", 604 | " mu_hat_n_T = np.add(\n", 605 | " mu_n_T,\n", 606 | " np.matmul(\n", 607 | " J_n,\n", 608 | " np.subtract(\n", 609 | " mu_hat_rev_n_minus1_T,\n", 610 | " np.matmul(theta_old.A,mu_n_T)\n", 611 | " )\n", 612 | " )\n", 613 | " )\n", 614 | " mu_hat_n = np.squeeze(mu_hat_n_T.transpose())\n", 615 | " mu_hat_rev = np.vstack((mu_hat_rev, mu_hat_n))\n", 616 | "\n", 617 | " # Reverse results\n", 618 | " mu_hat = np.flip(mu_hat_rev, axis=0)\n", 619 | "\n", 620 | " return mu_hat" 621 | ] 622 | }, 623 | { 624 | "cell_type": "markdown", 625 | "id": "b57fd509", 626 | "metadata": {}, 627 | "source": [ 628 | "## 4. Get Expectations\n", 629 | "\n", 630 | "Now we get the following array of expectations.\n", 631 | "\n", 632 | "1. $ \\mathbb{E}[\\mathbf{z}_n] = \\hat{\\mathbf{\\mu}}_n \\;\\; (n=0, \\ldots, N-1) $\n", 633 | "2. 
$ \\mathbb{E}[\\mathbf{z}_n \\mathbf{z}_{n-1}^T] = \\hat{\\mathbf{V}}_n \\mathbf{J}_{n-1}^{T} + \\hat{\\mathbf{\\mu}}_n \\hat{\\mathbf{\\mu}}_{n-1}^T \\;\\; (n=1, \\ldots, N-1) $\n", 634 | "3. $ \\mathbb{E}[\\mathbf{z}_{n-1} \\mathbf{z}_{n}^T] = \\left( \\hat{\\mathbf{V}}_n \\mathbf{J}_{n-1}^{T} + \\hat{\\mathbf{\\mu}}_n \\hat{\\mathbf{\\mu}}_{n-1}^T \\right)^T \\;\\; (n=1, \\ldots, N-1) $\n", 635 | "4. $ \\mathbb{E}[\\mathbf{z}_n \\mathbf{z}_n^T] = \\hat{\\mathbf{V}}_n + \\hat{\\mathbf{\\mu}}_n \\hat{\\mathbf{\\mu}}_n^T \\;\\; (n=0, \\ldots, N-1) $" 636 | ] 637 | }, 638 | { 639 | "cell_type": "markdown", 640 | "id": "8d4b49e1", 641 | "metadata": {}, 642 | "source": [ 643 | "### (1) $ \\mathbb{E}[\\mathbf{z}_n] = \\hat{\\mathbf{\\mu}}_n \\;\\; (n=0, \\ldots, N-1)$" 644 | ] 645 | }, 646 | { 647 | "cell_type": "code", 648 | "execution_count": 8, 649 | "id": "225e3890", 650 | "metadata": {}, 651 | "outputs": [], 652 | "source": [ 653 | "def get_E1(mu_hat):\n", 654 | " return mu_hat" 655 | ] 656 | }, 657 | { 658 | "cell_type": "markdown", 659 | "id": "5bfb1d88", 660 | "metadata": {}, 661 | "source": [ 662 | "### (2) $ \\mathbb{E}[\\mathbf{z}_n \\mathbf{z}_{n-1}^T] = \\hat{\\mathbf{V}}_n \\mathbf{J}_{n-1}^{T} + \\hat{\\mathbf{\\mu}}_n \\hat{\\mathbf{\\mu}}_{n-1}^T \\;\\; (n=1, \\ldots, N-1)$\n", 663 | "\n", 664 | "### (3) $ \\mathbb{E}[\\mathbf{z}_{n-1} \\mathbf{z}_{n}^T] = \\left( \\hat{\\mathbf{V}}_n \\mathbf{J}_{n-1}^{T} + \\hat{\\mathbf{\\mu}}_n \\hat{\\mathbf{\\mu}}_{n-1}^T \\right)^T \\;\\; (n=1, \\ldots, N-1) $" 665 | ] 666 | }, 667 | { 668 | "cell_type": "code", 669 | "execution_count": 9, 670 | "id": "3e82019e", 671 | "metadata": {}, 672 | "outputs": [], 673 | "source": [ 674 | "def get_E2_E3(V, mu_hat, V_hat):\n", 675 | " E2 = np.empty((0,3,3))\n", 676 | " E3 = np.empty((0,3,3))\n", 677 | " for n in range(1,N):\n", 678 | " P_n_minus1 = P(V[n-1])\n", 679 | " J_n_minus1 = J(V[n-1], P_n_minus1)\n", 680 | " mu_hat_n_T = np.array([mu_hat[n]]).transpose()\n", 681 | " mu_hat_n_minus1 = np.array([mu_hat[n-1]])\n", 682 | " E2_n = np.add(\n", 683 | " np.matmul(\n", 684 | " V_hat[n],\n", 685 | " J_n_minus1.transpose()\n", 686 | " ),\n", 687 | " np.matmul(\n", 688 | " mu_hat_n_T,\n", 689 | " mu_hat_n_minus1\n", 690 | " )\n", 691 | " )\n", 692 | " E2 = np.vstack((E2, [E2_n]))\n", 693 | " E3_n = E2_n.transpose()\n", 694 | " E3 = np.vstack((E3, [E3_n]))\n", 695 | " return E2, E3" 696 | ] 697 | }, 698 | { 699 | "cell_type": "markdown", 700 | "id": "ecc0c74e", 701 | "metadata": {}, 702 | "source": [ 703 | "### (4) $ \\mathbb{E}[\\mathbf{z}_n \\mathbf{z}_n^T] = \\hat{\\mathbf{V}}_n + \\hat{\\mathbf{\\mu}}_n \\hat{\\mathbf{\\mu}}_n^T \\;\\; (n=0, \\ldots, N-1) $" 704 | ] 705 | }, 706 | { 707 | "cell_type": "code", 708 | "execution_count": 10, 709 | "id": "8d1777ee", 710 | "metadata": {}, 711 | "outputs": [], 712 | "source": [ 713 | "def get_E4(mu_hat, V_hat):\n", 714 | " E4 = np.empty((0,3,3))\n", 715 | " for n in range(N):\n", 716 | " mu_hat_n = np.array([mu_hat[n]])\n", 717 | " mu_hat_n_T = mu_hat_n.transpose()\n", 718 | " E4_n = np.add(\n", 719 | " V_hat[n],\n", 720 | " np.matmul(mu_hat_n_T, mu_hat_n)\n", 721 | " )\n", 722 | " E4 = np.vstack((E4, [E4_n]))\n", 723 | " return E4" 724 | ] 725 | }, 726 | { 727 | "cell_type": "markdown", 728 | "id": "2e27e5dc", 729 | "metadata": {}, 730 | "source": [ 731 | "## 5. 
Get new (optimal) parameters $ \\mathbf{\\theta} $\n", 732 | "\n", 733 | "Finally, get new $ \\mathbf{\\theta} = \\{ \\mathbf{m}_0, \\mathbf{S}_0, \\mathbf{A}, \\mathbf{\\Gamma}, \\mathbf{C}, \\mathbf{\\Sigma} \\} $ using previous E1, E2, E3, and E4." 734 | ] 735 | }, 736 | { 737 | "cell_type": "markdown", 738 | "id": "60333624", 739 | "metadata": {}, 740 | "source": [ 741 | "First, $ \\mathbf{m}_0 $ is given as follows.\n", 742 | "\n", 743 | "$$ \\mathbf{m}_0^{new} = \\mathbb{E}[\\mathbf{z}_0] $$" 744 | ] 745 | }, 746 | { 747 | "cell_type": "code", 748 | "execution_count": 11, 749 | "id": "1e03c9e0", 750 | "metadata": {}, 751 | "outputs": [], 752 | "source": [ 753 | "def get_m0_new(E1):\n", 754 | " return E1[0]" 755 | ] 756 | }, 757 | { 758 | "cell_type": "markdown", 759 | "id": "9f3e71ec", 760 | "metadata": {}, 761 | "source": [ 762 | "$ \\mathbf{S}_0 $ is given as follows.\n", 763 | "\n", 764 | "$$ \\mathbf{S}_0^{new} = \\mathbb{E}[\\mathbf{z}_0 \\mathbf{z}_0^T] - \\mathbb{E}[\\mathbf{z}_0]\\mathbb{E}[\\mathbf{z}_0^T] $$" 765 | ] 766 | }, 767 | { 768 | "cell_type": "code", 769 | "execution_count": 12, 770 | "id": "91b2e0d7", 771 | "metadata": {}, 772 | "outputs": [], 773 | "source": [ 774 | "def get_S0_new(E1, E4):\n", 775 | " E1_0 = np.array([E1[0]])\n", 776 | " E1_0_T = E1_0.transpose()\n", 777 | " return np.subtract(\n", 778 | " E4[0],\n", 779 | " np.matmul(E1_0_T, E1_0)\n", 780 | " )" 781 | ] 782 | }, 783 | { 784 | "cell_type": "markdown", 785 | "id": "c4ee688b", 786 | "metadata": {}, 787 | "source": [ 788 | "$ \\mathbf{A} $ is given as follows.\n", 789 | "\n", 790 | "$$ \\mathbf{A}^{new} = \\left( \\sum_{n=1}^{N-1} \\mathbb{E}[\\mathbf{z}_n \\mathbf{z}_{n-1}^T] \\right) \\left( \\sum_{n=0}^{N-2} \\mathbb{E}[\\mathbf{z}_{n} \\mathbf{z}_{n}^T] \\right)^{-1} $$" 791 | ] 792 | }, 793 | { 794 | "cell_type": "code", 795 | "execution_count": 13, 796 | "id": "f5826d67", 797 | "metadata": {}, 798 | "outputs": [], 799 | "source": [ 800 | "def get_A_new(E2, E4):\n", 801 | " return np.matmul(\n", 802 | " np.sum(E2, axis=0),\n", 803 | " np.linalg.inv(np.sum(E4[:N-1], axis=0))\n", 804 | " )" 805 | ] 806 | }, 807 | { 808 | "cell_type": "markdown", 809 | "id": "1163c8a5", 810 | "metadata": {}, 811 | "source": [ 812 | "$ \\mathbf{\\Gamma} $ is given as follows.\n", 813 | "\n", 814 | "$$ \\mathbf{\\Gamma}^{new} = \\frac{1}{N - 1} \\sum_{n=1}^{N-1} \\left\\{ \\mathbb{E}[\\mathbf{z}_n \\mathbf{z}_n^T] - \\mathbf{A}^{new} \\mathbb{E}[\\mathbf{z}_{n-1} \\mathbf{z}_n^T] - \\mathbb{E}[\\mathbf{z}_n \\mathbf{z}_{n-1}^{T}](\\mathbf{A}^{new})^T + \\mathbf{A}^{new} \\mathbb{E}[\\mathbf{z}_{n-1}\\mathbf{z}_{n-1}^T](\\mathbf{A}^{new})^T \\right\\} $$" 815 | ] 816 | }, 817 | { 818 | "cell_type": "code", 819 | "execution_count": 14, 820 | "id": "7b86809f", 821 | "metadata": {}, 822 | "outputs": [], 823 | "source": [ 824 | "def get_Gamma_new(E2, E3, E4, A_new):\n", 825 | " elems = np.empty((0,3,3))\n", 826 | " for n in range(1, N):\n", 827 | " elems_n = np.add(\n", 828 | " np.subtract(\n", 829 | " np.subtract(\n", 830 | " E4[n],\n", 831 | " np.matmul(\n", 832 | " A_new,\n", 833 | " E3[n-1]\n", 834 | " )\n", 835 | " ),\n", 836 | " np.matmul(\n", 837 | " E2[n-1],\n", 838 | " A_new.transpose()\n", 839 | " )\n", 840 | " ),\n", 841 | " np.matmul(\n", 842 | " np.matmul(\n", 843 | " A_new,\n", 844 | " E4[n-1]\n", 845 | " ),\n", 846 | " A_new.transpose()\n", 847 | " )\n", 848 | " )\n", 849 | " elems = np.vstack((elems, [elems_n]))\n", 850 | " return np.sum(elems, axis=0) / (N-1)" 851 | ] 852 | }, 853 | { 854 | 
"cell_type": "markdown", 855 | "id": "551c888b", 856 | "metadata": {}, 857 | "source": [ 858 | "$ \\mathbf{C} $ is given as follows.\n", 859 | "\n", 860 | "$$ \\mathbf{C}^{new} = \\left( \\sum_{n=0}^{N-1} \\mathbf{x}_n \\mathbb{E}[\\mathbf{z}_n]^T \\right) \\left( \\sum_{n=0}^{N-1} \\mathbb{E}[\\mathbf{z}_n \\mathbf{z}_n^T] \\right)^{-1} $$" 861 | ] 862 | }, 863 | { 864 | "cell_type": "code", 865 | "execution_count": 15, 866 | "id": "5d0de357", 867 | "metadata": {}, 868 | "outputs": [], 869 | "source": [ 870 | "def get_C_new_former(E1):\n", 871 | " elems = np.empty((0,2,3))\n", 872 | " for n in range(N):\n", 873 | " x_n_T = np.array([X[n]]).transpose()\n", 874 | " E1_n = np.array([E1[n]])\n", 875 | " elems_n = np.matmul(x_n_T, E1_n)\n", 876 | " elems = np.vstack((elems, [elems_n]))\n", 877 | " return np.sum(elems, axis=0)\n", 878 | "\n", 879 | "def get_C_new(E1, E4):\n", 880 | " return np.matmul(\n", 881 | " get_C_new_former(E1),\n", 882 | " np.linalg.inv(np.sum(E4, axis=0))\n", 883 | " )" 884 | ] 885 | }, 886 | { 887 | "cell_type": "markdown", 888 | "id": "ad4ad181", 889 | "metadata": {}, 890 | "source": [ 891 | "$ \\mathbf{\\Sigma} $ is given as follows.\n", 892 | "\n", 893 | "$$ \\mathbf{\\Sigma}^{new} = \\frac{1}{N} \\sum_{n=0}^{N-1} \\left\\{ \\mathbf{x}_n \\mathbf{x}_n^T - \\mathbf{C}^{new} \\mathbb{E}[\\mathbf{z}_n] \\mathbf{x}_n^T - \\mathbf{x}_n \\mathbb{E}[\\mathbf{z}_n^T](\\mathbf{C}^{new})^T + \\mathbf{C}^{new} \\mathbb{E}[\\mathbf{z}_n \\mathbf{z}_n^T] (\\mathbf{C}^{new})^T \\right\\} $$" 894 | ] 895 | }, 896 | { 897 | "cell_type": "code", 898 | "execution_count": 16, 899 | "id": "ec94e176", 900 | "metadata": {}, 901 | "outputs": [], 902 | "source": [ 903 | "def get_Sigma_new(E1, E4, C_new):\n", 904 | " elems = np.empty((0,2,2))\n", 905 | " for n in range(N):\n", 906 | " x_n = np.array([X[n]])\n", 907 | " x_n_T = x_n.transpose()\n", 908 | " E1_n = np.array([E1[n]])\n", 909 | " E1_n_T = E1_n.transpose()\n", 910 | " elem_n = np.add(\n", 911 | " np.subtract(\n", 912 | " np.subtract(\n", 913 | " np.matmul(x_n_T, x_n),\n", 914 | " np.matmul(\n", 915 | " np.matmul(\n", 916 | " C_new,\n", 917 | " E1_n_T\n", 918 | " ),\n", 919 | " x_n\n", 920 | " )\n", 921 | " ),\n", 922 | " np.matmul(\n", 923 | " np.matmul(\n", 924 | " x_n_T,\n", 925 | " E1_n\n", 926 | " ),\n", 927 | " C_new.transpose()\n", 928 | " )\n", 929 | " ),\n", 930 | " np.matmul(\n", 931 | " np.matmul(\n", 932 | " C_new,\n", 933 | " E4[n]\n", 934 | " ),\n", 935 | " C_new.transpose()\n", 936 | " )\n", 937 | " )\n", 938 | " elems = np.vstack((elems, [elem_n]))\n", 939 | " return np.sum(elems, axis=0) / N" 940 | ] 941 | }, 942 | { 943 | "cell_type": "markdown", 944 | "id": "7786d473", 945 | "metadata": {}, 946 | "source": [ 947 | "## 5. 
Run algorithm"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "566f0062",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Running iteration 100 ...\n",
      "Done\n"
     ]
    }
   ],
   "source": [
    "for loop in range(100):\n",
    "    print(\"Running iteration {} ...\".format(loop + 1), end=\"\\r\")\n",
    "    # E-step : get mu and V (forward recursion)\n",
    "    V = get_V()\n",
    "    mu = get_mu(V)\n",
    "    # E-step : get mu_hat and V_hat (backward recursion)\n",
    "    V_hat = get_V_hat(V)\n",
    "    mu_hat = get_mu_hat(mu, V)\n",
    "    # E-step : get expectation values\n",
    "    E1 = get_E1(mu_hat)\n",
    "    E2, E3 = get_E2_E3(V, mu_hat, V_hat)\n",
    "    E4 = get_E4(mu_hat, V_hat)\n",
    "    # M-step : get optimized new parameters\n",
    "    m0_new = get_m0_new(E1)\n",
    "    S0_new = get_S0_new(E1, E4)\n",
    "    A_new = get_A_new(E2, E4)\n",
    "    Gamma_new = get_Gamma_new(E2, E3, E4, A_new)\n",
    "    C_new = get_C_new(E1, E4)\n",
    "    Sigma_new = get_Sigma_new(E1, E4, C_new)\n",
    "    # Replace theta and repeat\n",
    "    theta_old.m0 = m0_new\n",
    "    theta_old.S0 = S0_new\n",
    "    theta_old.A = A_new\n",
    "    theta_old.Gamma = Gamma_new\n",
    "    theta_old.C = C_new\n",
    "    theta_old.Sigma = Sigma_new\n",
    "\n",
    "print(\"\\nDone\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "17595e06",
   "metadata": {},
   "source": [
    "Here are the estimated results for the parameters.\n",
    "\n",
    "Note that the result is not unique (the latent space is only identified up to transformations such as scaling and rotation), so this is one of many locally optimal parameter sets.
\n",
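    "\n",
    "As a side note, instead of always running a fixed number of iterations (100 in the loop above), you could stop early once the parameters stop changing. The following is only a rough sketch of such a check (it assumes you keep a deep copy of the previous parameters at the start of each iteration, e.g. with ``copy.deepcopy(theta_old)``) :\n",
    "\n",
    "```python\n",
    "# sketch : a possible stopping criterion for the EM loop above\n",
    "def params_converged(prev, cur, tol=1e-6):\n",
    "    # compare the largest absolute change across a few representative parameters\n",
    "    pairs = [(prev.A, cur.A), (prev.C, cur.C), (prev.m0, cur.m0)]\n",
    "    return all(np.max(np.abs(p - c)) < tol for p, c in pairs)\n",
    "```\n",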
\n", 1004 | "For your reference, I show you the simulated data $ \\{ \\mathbf{x}_n \\} $ (first 5 points in sequence) compared with observations in below.\n", 1005 | "\n", 1006 | "> Note : Here I don't go so far, but you can also find the most probable sequence of hidden states for a given observation sequence." 1007 | ] 1008 | }, 1009 | { 1010 | "cell_type": "code", 1011 | "execution_count": 18, 1012 | "id": "7e45c752", 1013 | "metadata": {}, 1014 | "outputs": [ 1015 | { 1016 | "name": "stdout", 1017 | "output_type": "stream", 1018 | "text": [ 1019 | "m0\n", 1020 | "[ 2.57496192 6.3276932 11.234217 ]\n", 1021 | "S0\n", 1022 | "[[5.24798097e-05 2.79583705e-05 8.54545730e-05]\n", 1023 | " [2.79583705e-05 2.07409258e-04 1.78504551e-04]\n", 1024 | " [8.54545730e-05 1.78504551e-04 3.76386362e-04]]\n", 1025 | "A\n", 1026 | "[[ 0.78563464 -0.54698622 0.37223684]\n", 1027 | " [ 0.89487513 0.87198508 -0.1177087 ]\n", 1028 | " [ 0.10580215 0.43482559 0.71643318]]\n", 1029 | "Gamma\n", 1030 | "[[0.00832995 0.02118287 0.03410333]\n", 1031 | " [0.02118287 0.06644179 0.10506952]\n", 1032 | " [0.03410333 0.10506952 0.17151925]]\n", 1033 | "C\n", 1034 | "[[-15.90921342 -3.14027935 9.66084486]\n", 1035 | " [ 19.55862629 7.56719346 -4.3960726 ]]\n", 1036 | "Sigma\n", 1037 | "[[0.96929266 0.04601912]\n", 1038 | " [0.04601912 1.87507473]]\n" 1039 | ] 1040 | } 1041 | ], 1042 | "source": [ 1043 | "print(\"m0\")\n", 1044 | "print(m0_new)\n", 1045 | "print(\"S0\")\n", 1046 | "print(S0_new)\n", 1047 | "print(\"A\")\n", 1048 | "print(A_new)\n", 1049 | "print(\"Gamma\")\n", 1050 | "print(Gamma_new)\n", 1051 | "print(\"C\")\n", 1052 | "print(C_new)\n", 1053 | "print(\"Sigma\")\n", 1054 | "print(Sigma_new)" 1055 | ] 1056 | }, 1057 | { 1058 | "cell_type": "code", 1059 | "execution_count": 19, 1060 | "id": "e7d9865e", 1061 | "metadata": {}, 1062 | "outputs": [ 1063 | { 1064 | "name": "stdout", 1065 | "output_type": "stream", 1066 | "text": [ 1067 | "***** Observation *****\n", 1068 | "[[47.68902127 48.86610347]\n", 1069 | " [43.94953287 54.53438865]\n", 1070 | " [41.39095419 55.59938853]\n", 1071 | " [45.94610067 55.73904685]\n", 1072 | " [53.12714417 48.54030761]]\n", 1073 | "***** Simulated Result *****\n", 1074 | "[[47.49019418 47.29451052]\n", 1075 | " [44.14951187 57.2202532 ]\n", 1076 | " [42.05716531 56.68219724]\n", 1077 | " [43.07225681 55.11272219]\n", 1078 | " [52.83202734 51.55821952]]\n" 1079 | ] 1080 | } 1081 | ], 1082 | "source": [ 1083 | "# Show simulated results\n", 1084 | "\n", 1085 | "N_sim = 4\n", 1086 | "\n", 1087 | "Z_sim = np.array([m0_new])\n", 1088 | "for n in range(N_sim):\n", 1089 | " z_prev = Z_sim[len(Z_sim) - 1]\n", 1090 | " mean = np.matmul(A_new, z_prev)\n", 1091 | " z_post = np.random.multivariate_normal(\n", 1092 | " mean=mean,\n", 1093 | " cov=Gamma_new,\n", 1094 | " size=1)\n", 1095 | " Z_sim = np.vstack((Z_sim, z_post))\n", 1096 | "X_sim = np.empty((0,2))\n", 1097 | "for z_n in Z_sim:\n", 1098 | " mean = np.matmul(C_new, z_n)\n", 1099 | " x_n = np.random.multivariate_normal(\n", 1100 | " mean=mean,\n", 1101 | " cov=Sigma_new,\n", 1102 | " size=1)\n", 1103 | " X_sim = np.vstack((X_sim, x_n))\n", 1104 | "print(\"***** Observation *****\")\n", 1105 | "print(X[:5])\n", 1106 | "print(\"***** Simulated Result *****\")\n", 1107 | "print(X_sim)" 1108 | ] 1109 | }, 1110 | { 1111 | "cell_type": "code", 1112 | "execution_count": null, 1113 | "id": "0b4c5fc4", 1114 | "metadata": {}, 1115 | "outputs": [], 1116 | "source": [] 1117 | } 1118 | ], 1119 | "metadata": { 1120 | "kernelspec": { 1121 | 
"display_name": "Python 3", 1122 | "language": "python", 1123 | "name": "python3" 1124 | }, 1125 | "language_info": { 1126 | "codemirror_mode": { 1127 | "name": "ipython", 1128 | "version": 3 1129 | }, 1130 | "file_extension": ".py", 1131 | "mimetype": "text/x-python", 1132 | "name": "python", 1133 | "nbconvert_exporter": "python", 1134 | "pygments_lexer": "ipython3", 1135 | "version": "3.6.9" 1136 | } 1137 | }, 1138 | "nbformat": 4, 1139 | "nbformat_minor": 5 1140 | } 1141 | --------------------------------------------------------------------------------