├── .gitignore
├── README.md
└── prob-rob
    ├── Probabilistic-robotics.apkg
    └── ch3
        ├── ch3-solutions.tex
        └── understanding-covariance.tex

/.gitignore:
--------------------------------------------------------------------------------
1 | # Compiled files
2 | *.o
3 | *.so
4 | *.rlib
5 | *.dll
6 | 
7 | # Executables
8 | *.exe
9 | 
10 | # Generated by Cargo
11 | /target/
12 | build/*
13 | reference/*
14 | 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | probabilistic-robotics
2 | ======================
3 | 
4 | I'm building a robotics curriculum for a K-8 school, and as part of that
5 | process I'm trying to master the computational parts of mobile robotics (I also
6 | want to build a robotic mower for my pasture...). I'm going through two books
7 | to do that: [Introduction to Autonomous Mobile Robots](http://amzn.com/0262015358)
8 | and [Probabilistic Robotics](http://amzn.com/0262201623).
9 | 
10 | These are my notes, in the form of Anki flash cards, along with my attempts at
11 | solutions to all of the exercises in both books. I will keep the two books'
12 | answer sets separate for easier reference.
13 | 
14 | I'm hoping that this repo will turn into a kind of self-directed MOOC for
15 | mobile robotics. I like online courses a lot, but I tend to think they don't go
16 | deep enough most of the time. That will definitely not be the case here: if we
17 | read the books and do all of the note cards and exercises, my hope is that
18 | we'll end up with deep knowledge of the subject.
19 | 
20 | ### Caveat Emptor
21 | 
22 | I'm not an expert in this stuff (yet), and I don't have an external reference
23 | for checking my work, so it's likely that some of my solutions will be
24 | incomplete, incorrect, or otherwise flawed! I would be extremely happy to get
25 | pull requests with corrections, alternative solutions, additional flashcards
26 | (or better versions of the existing ones), or anything else that fits with the
27 | idea of self-teaching mobile robotics. So far, both of these books are
28 | excellent, and I highly recommend them if you're interested in this kind of
29 | stuff. If you do read them, please check back here from time to time to see
30 | what we've come up with for study aids.
31 | 
32 | ### What's in here
33 | 
34 | 1. Anki decks, one for Probabilistic Robotics and the other for Intro to
35 | Autonomous Mobile Robots. To use them, you'll need to get [Anki](http://ankisrs.net/).
36 | 2. LaTeX files for my solutions to the exercises. I try to include a lot of
37 | explanation of why I'm doing certain things, as a way to help my future self
38 | when studying this stuff later. Hopefully you'll find them handy as well.
39 | 3. Miscellaneous programs from the books or other sources of inspiration,
40 | probably mostly written in Rust, Python, or Julia. I welcome pull requests with
41 | bug fixes and alternative solutions or example programs, in any language.
42 | 
43 | 
44 | 
--------------------------------------------------------------------------------
/prob-rob/Probabilistic-robotics.apkg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aethertap/probabilistic-robotics/a6c22d6332f91addb0b617347cad5583d651c418/prob-rob/Probabilistic-robotics.apkg
--------------------------------------------------------------------------------
/prob-rob/ch3/ch3-solutions.tex:
--------------------------------------------------------------------------------
1 | 
2 | \documentclass[10pt]{article}
3 | \usepackage{amsmath}
4 | \begin{document}
5 | \title{Solutions for Chapter 3 of Probabilistic Robotics}
6 | 
7 | \section{Problem 1}
8 | \begin{enumerate}
9 | \item \textit{In this and the following exercises, you are asked to design a Kalman
10 | filter for a simple dynamical system: a car with linear dynamics moving in a
11 | linear environment. Assume $\delta t = 1$ for simplicity. The position of the
12 | car at time $t$ is given by $x_t$. Its velocity is given by $\dot{x}_t$, and its
13 | acceleration [from time $t-1$ to time $t$] is given by $\ddot{x}_t$. Suppose the
14 | acceleration is set randomly at each point in time, according to a Gaussian with
15 | zero mean and covariance $\sigma^2 = 1$.}
16 | 
17 | One thing I note: At first it sounded to me like we would know what the
18 | acceleration value was at each time step. This threw me off for a long time,
19 | until I realized that this is more like modeling a random wind blowing - you
20 | don't know what the effect is at any given moment; all you know is its
21 | statistical behavior. Since we don't \textit{know} the value of the
22 | acceleration, it \textit{can't} be part of our state! That may be obvious to
23 | others reading the problem, but I worked on it for two days (!) before I figured
24 | it out. C'est la vie.
25 | 
26 | \begin{enumerate}
27 | \item \textit{What is a minimal state vector for the Kalman filter (so that
28 | the resulting system is Markovian)?}
29 | 
30 | In order to be Markovian, we have to have a state vector such that the
31 | future and the past are independent given the present state. If the state
32 | vector has the position and velocity, this condition is met. All of the
33 | acceleration values of the past are completely summarized in the position
34 | and velocity, so keeping acceleration as a state variable doesn't tell us
35 | anything new in terms of predicting the future. The acceleration is set
36 | randomly at each time step, but given that we know the position and velocity
37 | we don't need to know any past acceleration in order to compute the future
38 | given the state and the control (which in this case would be setting the
39 | acceleration).
40 | 
41 | Our state vector is thus:
42 | $$\begin{pmatrix}x_t \\ \dot{x}_t\end{pmatrix}$$
43 | 
44 | \item \textit{For your state vector, design the state transition probability
45 | $p(x_t | u_t,x_{t-1})$. Hint: this transition function will possess linear
46 | matrices $A$ and $B$ and a noise covariance $R$.}
47 | 
48 | We will use the moments parameterization since we're building a Kalman filter.
49 | The state transition function is of the form
50 | $$x_t = A x_{t-1} + B u_t + \epsilon_t$$
51 | 
52 | where $\epsilon_t$ is a zero-mean Gaussian random variable driven by the random
53 | acceleration given in the problem. Since we never know the actual acceleration,
54 | it can't be treated as a control input; instead we incorporate it as error
55 | (process noise) in the position and velocity.
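For completeness, part (b) literally asks for the transition \textit{probability},
which is just the model above written out as a density: a multivariate normal over
the state with mean $A x_{t-1} + B u_t$ and covariance $R = cov(\epsilon_t)$. This
is only a restatement of the standard linear-Gaussian form, nothing new:

\begin{gather}
p(x_t | u_t, x_{t-1}) = \det\left(2\pi R\right)^{-\frac{1}{2}}
\exp\left\{-\frac{1}{2}\left(x_t - A x_{t-1} - B u_t\right)^T R^{-1}
\left(x_t - A x_{t-1} - B u_t\right)\right\}
\end{gather}

(One wrinkle: the $R$ we derive below turns out to be singular, so the formal
$R^{-1}$ above doesn't exist for this particular problem; the prediction
equations we actually use never need to invert $R$, so this is just for
reference.)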
56 | 
57 | The mean for our distribution is given by $A x_{t-1} + B u_t$, and the
58 | covariance is given by $A\Sigma_{t-1}A^T + cov(\epsilon_t)$. The
59 | $A\Sigma_{t-1}A^T$ term carries forward the uncertainty from the previous state
60 | updated by the state transition function (i.e. we need to apply the same
61 | transformation to our error bounds that we apply to the estimate, otherwise
62 | the error bounds will become meaningless). The $cov(\epsilon_t)$ term is the
63 | new error added in by the random acceleration.
64 | 
65 | All that remains is to use the equations of motion to derive matrices for $A$
66 | and $B$. We know that:
67 | \begin{gather}
68 | x_t = x_{t-1} + \left(\frac{\dot{x}_{t-1}+\dot{x}_t}{2}\right) \Delta_t \\
69 | \dot{x}_t = \dot{x}_{t-1} + \ddot{x}_t\Delta_t
70 | \end{gather}
71 | 
72 | Using our expression for $\dot{x}_t$ in the first equation yields:
73 | 
74 | \begin{gather}
75 | x_t = x_{t-1} + \dot{x}_{t-1}\Delta_t + \frac{1}{2}\ddot{x}_t\Delta_t^2
76 | \end{gather}
77 | 
78 | We use the average velocity between the start and end of the time slice to
79 | calculate the new $x_t$ value, because the velocity has changed in the
80 | interval as a result of the acceleration ($\ddot{x}_{t}$).
81 | 
82 | There is no control action in this model, since there is no explicit (known)
83 | acceleration (i.e. the acceleration value is just a probability distribution
84 | and we don't know its value at any given time). That means $B u_t = \vec{0}$.
85 | 
86 | Now we can write our prediction of the overall mean, in matrix form:
87 | 
88 | $$
89 | \overline{\mu}_t = \begin{pmatrix}x_t \\ \dot{x}_t\end{pmatrix} =
90 | \begin{pmatrix}1 & \Delta_t \\ 0 & 1\end{pmatrix} \begin{pmatrix}x_{t-1} \\
91 | \dot{x}_{t-1} \end{pmatrix}
92 | $$
93 | 
94 | Next we need to model the covariance. Since we are given the variance of the
95 | acceleration (=1), we have to figure out how that maps into the variance of the
96 | two state variables and build the process-noise covariance matrix accordingly.
97 | 
98 | From our equations of motion, we know that $x_t$ depends on $\ddot{x}_t$
99 | scaled by $\frac{\Delta_t^2}{2}$, and we know that $\dot{x}_t$ depends on
100 | $\ddot{x}_t$ scaled by $\Delta_t$.
101 | 
102 | That gives us what we need to figure out how the random acceleration should
103 | affect our estimates for position and velocity. The standard deviation of the
104 | velocity picks up the acceleration's standard deviation scaled by $\Delta_t$,
105 | and that of the position picks it up scaled by $\frac{\Delta_t^2}{2}$.
106 | 
107 | If we knew what the acceleration was at each step, we could add a correction
108 | term to the above expression for the mean to get an exact answer. This is what
109 | that term would look like:
110 | \begin{gather}
111 | \delta_t = \begin{pmatrix}\frac{\Delta_t^2}{2} \\ \Delta_t\end{pmatrix} \ddot{x}_t
112 | \end{gather}
113 | 
114 | However, we don't have knowledge of $\ddot{x}_t$; all we know is that it's
115 | Gaussian with mean zero and standard deviation 1. We therefore need to turn that
116 | equation into a multivariate Gaussian distribution, which when added to our
117 | mean will yield the total probability distribution for $x_t$. In order to do
118 | that, we need to calculate a covariance matrix based on the vector $\delta_t$
119 | (which is telling us how much the acceleration impacts each of the state
120 | variables).
121 | 
122 | We know the variance of the acceleration, but we need to figure out how that
123 | maps into the variance of the position and velocity. Our $\delta_t$ vector gives
124 | us the scaling factors - if we knew the standard deviation of $\ddot{x}_t$,
125 | then we could scale it by the values in $\delta_t$ to see how it would change
126 | the standard deviation of $x_t$ and $\dot{x}_t$. However, since we're working
127 | with a Gaussian distribution, we want to use variance. That means that we're
128 | working with the squared standard deviation, so we need to apply the
129 | appropriate analogous operation to $\delta_t$ in order to make a covariance
130 | matrix. We can do that by taking $cov(\delta_t)$, which is the coefficient
131 | vector times its own transpose, scaled by the acceleration's variance (see
132 | understanding-covariance.tex in this directory for a derivation of \textit{why}):
133 | 
134 | \begin{align}
135 | R = cov\left(\delta_t\right) &=
136 | \begin{pmatrix}\frac{\Delta_t^2}{2} \\ \Delta_t\end{pmatrix}
137 | \begin{pmatrix}\frac{\Delta_t^2}{2} & \Delta_t\end{pmatrix} \sigma_{\ddot{x}_t}^2 \\
138 | &=
139 | \begin{pmatrix}\frac{\Delta_t^4}{4} & \frac{\Delta_t^3}{2} \\
140 | \frac{\Delta_t^3}{2} & \Delta_t^2 \end{pmatrix} \sigma_{\ddot{x}_t}^2
141 | \end{align}
142 | 
143 | Now we have the moments of our probability $p(x_t | u_t,x_{t-1})$:
144 | \begin{gather}
145 | \overline{\mu}_t = \begin{pmatrix}1 & \Delta_t \\ 0 & 1\end{pmatrix}
146 | \begin{pmatrix}x_{t-1} \\ \dot{x}_{t-1}\end{pmatrix}\\
147 | \overline{\Sigma}_t = \begin{pmatrix}1 & \Delta_t \\ 0 & 1\end{pmatrix}
148 | \Sigma_{t-1}
149 | \begin{pmatrix}1 & 0 \\ \Delta_t & 1\end{pmatrix} +
150 | \begin{pmatrix}\frac{\Delta_t^4}{4} & \frac{\Delta_t^3}{2} \\
151 | \frac{\Delta_t^3}{2} & \Delta_t^2 \end{pmatrix} \sigma_{\ddot{x}_t}^2
152 | \end{gather}
153 | 
154 | We have to provide an initial value for $\Sigma_0$. In this case, I think it's
155 | reasonable to assume that it is $0$. The uncertainty from the random
156 | accelerations at each timestep will propagate into $\Sigma_t$, so it will not
157 | remain $0$ for long.
158 | 
159 | Later, the Kalman gain will be used to select how much to weight the
160 | prediction versus the correction (measurement), based on the relative
161 | magnitudes of the covariances.
162 | 
163 | \item \textit{Implement the state prediction step of the Kalman filter. Assume
164 | that at time $t=0$ we know $x_0 = \dot{x}_0 = \ddot{x}_0 = 0$. Compute the state
165 | distributions for times $t=1,2,...,5$.} (See the Python sketch after Problem 2, below.)
166 | 
167 | \item \textit{For each value of $t$, plot the joint posterior over $x$ and
168 | $\dot{x}$ in a diagram, where $x$ is the horizontal and $\dot{x}$ is the
169 | vertical axis. For each posterior, you are asked to plot an uncertainty
170 | ellipse, which is the ellipse of points that are one standard deviation away
171 | from the mean. Hint: if you do not have access to a mathematics library, you
172 | can create those ellipses by analyzing the eigenvalues of the covariance
173 | matrix.}
174 | 
175 | \item \textit{What will happen to the correlation between $x_t$ and $\dot{x}_t$
176 | as $t\to \infty$?}
177 | 
178 | \end{enumerate}
179 | 
180 | \item \textit{In Chapter 3.2.4, we derived the prediction step of the KF. This
181 | step is often derived with Z transforms or Fourier transforms, using the
182 | Convolution theorem. Re-derive the prediction step using transforms.}
183 | 
184 | I didn't do this one; it seems like a lot more work than it's worth for my
185 | goals. Maybe I'll come back to it...
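As a partial answer to parts (c) and (d) of Problem 1 above, here is a minimal
Python/NumPy sketch of the prediction loop just derived. It assumes $\Delta_t = 1$,
$\sigma^2 = 1$, and $\Sigma_0 = 0$ as discussed, and simply prints
$\overline{\Sigma}_t$ for $t = 1,\dots,5$, along with the $1\sigma$ ellipse
semi-axes obtained from the eigen-decomposition (per the hint in part (d)) and
the correlation coefficient. Treat it as a numerical check rather than a
polished solution.

\begin{verbatim}
import numpy as np

dt = 1.0          # Delta_t = 1, as given in the problem
sigma2 = 1.0      # variance of the random acceleration

A = np.array([[1.0, dt],
              [0.0, 1.0]])
delta = np.array([[dt**2 / 2.0],
                  [dt]])
R = delta @ delta.T * sigma2   # process noise covariance derived above

mu = np.zeros((2, 1))          # x_0 = xdot_0 = 0
Sigma = np.zeros((2, 2))       # Sigma_0 = 0, as assumed above

for t in range(1, 6):
    # Prediction step; there is no control, so B*u_t contributes nothing
    mu = A @ mu
    Sigma = A @ Sigma @ A.T + R

    # Eigen-decomposition gives the axes of the 1-sigma uncertainty ellipse
    evals, evecs = np.linalg.eigh(Sigma)
    semi_axes = np.sqrt(np.maximum(evals, 0.0))  # clip tiny negative round-off
    corr = Sigma[0, 1] / np.sqrt(Sigma[0, 0] * Sigma[1, 1])
    print(f"t={t}: Sigma={Sigma.tolist()}, "
          f"semi-axes={semi_axes}, corr={corr:.3f}")
\end{verbatim}

The printed correlation coefficient is also a quick way to start building
intuition for the last part of the problem.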
186 | 
187 | \item \textit{We noted in the text that the EKF linearization is an
188 | approximation. To see how bad this approximation is, we ask you to work out
189 | an example. Suppose we have a mobile robot operating in a planar environment.
190 | Its state is its $x$-$y$ location and its global heading $\theta$. Suppose we know
191 | $x$ and $y$ with high certainty, but the orientation $\theta$ is unknown. This is
192 | reflected in our initial estimate:}
193 | 
194 | \begin{align}
195 | \mu &= \begin{pmatrix}0 \\ 0 \\ 0\end{pmatrix} \\
196 | \Sigma &= \begin{pmatrix}0.01 & 0 & 0\\ 0 & 0.01 & 0 \\ 0 & 0 & 10000\end{pmatrix}
197 | \end{align}
198 | 
199 | \begin{enumerate}
200 | \item \textit{Draw, graphically, your best model of the posterior over
201 | the robot pose after the robot moves $d=1$ units forward. For this
202 | exercise, we assume that the robot moves flawlessly without any noise.
203 | Thus, the expected location of the robot after motion will be:}
204 | \begin{align}
205 | \begin{pmatrix}x' \\ y' \\ \theta'\end{pmatrix} =
206 | \begin{pmatrix}x + \cos\theta \\ y + \sin\theta \\ \theta\end{pmatrix}
207 | \end{align}
208 | \item \textit{Now develop this motion into a prediction step for the
209 | EKF. For that, you have to generate a new Gaussian estimate of the
210 | robot pose using the linearized model. You should give the exact
211 | mathematical equations for each of these steps, and state the Gaussian
212 | that results.}
213 | \item \textit{Draw the uncertainty ellipse of the Gaussian and
214 | compare it with your intuitive solution.}
215 | \item \textit{Now incorporate a measurement. Our measurement shall be
216 | a noisy projection of the x-coordinate of the robot, with covariance
217 | $Q=0.01$. Specify the measurement model. Now apply the measurement
218 | both to your intuitive posterior, and formally to the EKF estimate using
219 | the standard EKF machinery. Give the exact result of the EKF, and compare
220 | it with the result of your intuitive analysis.}
221 | \item \textit{Discuss the difference between your estimate of the
222 | posterior, and the Gaussian produced by the EKF. How significant
223 | are those differences? What can be changed to make the approximation
224 | more accurate? What would have happened if the initial orientation had
225 | been known, but not the robot's y-coordinate?}
226 | \end{enumerate}
227 | 
228 | \item \textit{The Kalman filter in Table 3.1 lacked a constant additive term
229 | in the motion and the measurement models. Extend this algorithm to contain
230 | such terms.}
231 | 
232 | \item \textit{Prove (via example) the existence of a sparse information
233 | matrix in multivariate Gaussians (of dimension $d$) that correlate all $d$
234 | variables with correlation coefficients that are $\epsilon$-close to 1. We
235 | say an information matrix is sparse if all but a constant number of elements
236 | in each row and column are zero.}
237 | \end{enumerate}
238 | 
239 | \end{document}
240 | 
--------------------------------------------------------------------------------
/prob-rob/ch3/understanding-covariance.tex:
--------------------------------------------------------------------------------
1 | \documentclass[12pt]{article}
2 | \usepackage[margin=0.5in]{geometry}
3 | \usepackage{amsmath}
4 | \author{Erik Lee}
5 | \date{2014-10-24}
6 | \title{A digression on the mysteries of covariance matrices}
7 | \begin{document}
8 | \maketitle
9 | 
10 | I struggled for a long time trying to get some intuition about why it is that
11 | the covariance matrix should be given by $\vec{v} \vec{v}^T$, where $\vec{v}$ is
12 | the vector of coefficients for the source of variance in the equations of
13 | motion. That sentence is a tortured mess; let me try it with math.
14 | 
15 | If we have these equations of motion:
16 | 
17 | \begin{align}
18 | x_t &= x_{t-1} + \dot{x}_{t-1} \Delta_t + \frac{\Delta_t^2}{2} \ddot{x}_t \\
19 | \dot{x}_t &= \dot{x}_{t-1} + \Delta_t \ddot{x}_t
20 | \end{align}
21 | 
22 | and we want to figure out how the variance of $\ddot{x}$ affects the variances
23 | of $x$ and $\dot{x}$, why should the answer be to make a vector out of the
24 | coefficients of $\ddot{x}$ from each equation and multiply it by its transpose:
25 | 
26 | \begin{align}
27 | cov_a(x,\dot{x}) &= \begin{pmatrix}\frac{\Delta_t^2}{2} \\ \Delta_t\end{pmatrix}
28 | \begin{pmatrix}\frac{\Delta_t^2}{2} & \Delta_t\end{pmatrix}\sigma_a^2\\
29 | &= \begin{pmatrix}\frac{\Delta_t^4}{4} & \frac{\Delta_t^3}{2} \\
30 | \frac{\Delta_t^3}{2} & \Delta_t^2 \end{pmatrix}\sigma_a^2
31 | \end{align}
32 | 
33 | To try to figure this out, I decided to start at the base definition of
34 | covariance and work from there. tl;dr: Since $x$ and $\dot{x}$ are linear
35 | functions with respect to $\ddot{x}$, their covariance term is just the
36 | product of their $\ddot{x}$ coefficients (times $\sigma_a^2$), and their
37 | variances are the squares of those coefficients. If you do the number
38 | crunching, you see that the vector product above gives you all possible
39 | pairwise products of the coefficients, so it's a convenient way to get the
40 | data you need. It also happens to organize it in a way that's predictable,
41 | and can thus be used in the formula for a multivariate Gaussian distribution.
42 | Read on below if you want to see the math I did to get there.
43 | 
44 | Here's the setup: We have two variables that are functions of each other and of
45 | a third variable. The third variable (in this case acceleration) has a variance,
46 | and we want to see how the variance of that third variable propagates
47 | through to the other two variables, given their function definitions:
48 | 
49 | \begin{align}
50 | x &= f(y,a)\\
51 | y &= g(a) \\
52 | cov(x,y) &= E(x y) - E(x)E(y) \\
53 | &= \sum_i f(y,a_i)g(a_i)p(a_i) - \left(\sum_j f(y,a_j)p(a_j)\right)\left(\sum_k g(a_k)p(a_k)\right)
54 | \end{align}
55 | 
56 | Now, that pretty much stopped me, so I decided to assume the functions $f$ and
57 | $g$ are linear in $a$ (which happens to be the case here):
58 | 
59 | \begin{align}
60 | x &= f(y,a)\\
61 | &= f_1(y)F_1a + f_2(y)F_2 + F_3\\
62 | y &= g(a) \\
63 | &= G_1g_1(a) + G_2
64 | \end{align}
65 | 
66 | Now, since all we care about is how changes in $a$ affect the outcome, we can
67 | treat everything that doesn't have an $a$ term as constant, and simplify our
68 | expressions to this:
69 | 
70 | \begin{align}
71 | x &= f(a)\\
72 | &= F_1 a + F_2\\
73 | y &= g(a) \\
74 | &= G_1 a + G_2
75 | \end{align}
76 | 
77 | Now we plug those formulas into the covariance definition to see what the
78 | covariance of $x$ and $y$ is:
79 | 
80 | \begin{align}
81 | & cov(x,y) = E\left(f(a)g(a)\right) -
82 | E\left(f(a)\right)E\left(g(a)\right) \\
83 | &= \sum_i \left[\left(F_1 a_i + F_2\right)\left(G_1 a_i + G_2 \right) -
84 | \left(\sum_j (F_1 a_j+ F_2) p(a_j)\right)
85 | \left(\sum_k (G_1 a_k + G_2) p(a_k)\right)\right]p(a_i)\\
86 | &= \sum_i \left[\left (F_1 G_1 a_i^2 + (F_1 G_2 + F_2 G_1)a_i + F_2 G_2\right) -
87 | \left(F_2 + F_1 \sum_j a_j p(a_j)\right)\left(G_2 + G_1\sum_k a_k
88 | p(a_k)\right)\right]p(a_i)\\
89 | &= \sum_i \left[\left (F_1 G_1 a_i^2 + (F_1 G_2 + F_2 G_1)a_i+F_2 G_2\right) -
90 | \left(F_2 + F_1 E(a)\right)\left(G_2+G_1 E(a)\right)\right]p(a_i)
91 | \end{align}
92 | 
93 | In the above, we just noticed that the sum terms on the right simplify down to
94 | the expected value of $a$ passed through the original functions $f$ and $g$,
95 | which is a constant with respect to the outer sum, so we can pull it out:
96 | 
97 | \begin{align}
98 | &cov(x,y) = \sum_i \left[\left (F_1 G_1 a_i^2 + (F_1 G_2 + F_2 G_1)a_i+F_2 G_2\right) -
99 | \left(F_2 + F_1 E(a)\right)\left(G_2+G_1 E(a)\right)\right]p(a_i)\\
100 | &= \left(\sum_i \left (F_1 G_1 a_i^2 + (F_1 G_2 + F_2 G_1)a_i+F_2 G_2\right)p(a_i)\right) -
101 | \left(F_2 G_2 + (F_2 G_1 + F_1 G_2) E(a) + F_1 G_1 E(a)^2\right)
102 | \end{align}
103 | 
104 | Now we can start to see a pattern. The left term looks a lot like $E(a^2)$ (with
105 | some extra stuff), and the right side looks like $E(a)E(a)$ (with some extra
106 | stuff):
107 | 
108 | \begin{align}
109 | &cov(x,y) = \left(\sum_i (F_1 G_2 + F_2 G_1)a_ip(a_i)\right) + F_1 G_1 E(a^2)
110 | -(F_2 G_1 + F_1 G_2) E(a) - F_1 G_1 E(a)^2
111 | \end{align}
112 | 
113 | Now we notice that the two terms that aren't quadratic both cancel!
114 | 
115 | \begin{align}
116 | &cov(x,y) = \left(\sum_i (F_1 G_2 + F_2 G_1)a_ip(a_i)\right) + F_1 G_1 E(a^2)
117 | -(F_2 G_1 + F_1 G_2) E(a) - F_1 G_1 E(a)^2\\
118 | &= (F_1 G_2 + F_2 G_1)E(a) -(F_2 G_1 + F_1 G_2) E(a) + F_1 G_1 E(a^2) - F_1
119 | G_1 E(a)^2\\
120 | &= F_1 G_1 E(a^2) - F_1 G_1 E(a)^2\\
121 | &= F_1 G_1 (E(a^2) - E(a)^2) \\
122 | &= F_1 G_1 \sigma_a^2
123 | \end{align}
124 | 
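Before summing up, here is a quick numerical sanity check of that result in
Python/NumPy. This is just me double-checking the algebra by simulation; the
particular values of $F_1$, $F_2$, $G_1$, $G_2$, and $\sigma_a$ below are
arbitrary made-up numbers.

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
F1, F2 = 0.5, 3.0      # arbitrary coefficients for x = F1*a + F2
G1, G2 = 1.7, -2.0     # arbitrary coefficients for y = G1*a + G2
sigma_a = 1.3          # arbitrary standard deviation for a

a = rng.normal(0.0, sigma_a, size=1_000_000)
x = F1 * a + F2
y = G1 * a + G2

# Empirical covariance vs. the derived result F1*G1*sigma_a^2
print(np.cov(x, y)[0, 1])       # close to 1.4365 (up to sampling noise)
print(F1 * G1 * sigma_a**2)     # 1.4365 exactly
\end{verbatim}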
125 | Phew! Okay, so what we have learned is that the constant factors in the linear
126 | equation all cancel out (no surprise, since we're measuring variance). All we
127 | have to do to figure out what $a$'s variance contributes to $x$'s and $y$'s
128 | variances is to take the coefficients of the terms with $a$ in them from the
129 | equations for $x$ and $y$ and multiply them together! So, it turns out that the
130 | vector product
131 | 
132 | \begin{align}
133 | cov(\vec{x}_a) &= \begin{pmatrix}x_a \\ \dot{x}_a\end{pmatrix}\begin{pmatrix}x_a & \dot{x}_a\end{pmatrix}\sigma_a^2\\
134 | &= \begin{pmatrix}\frac{\Delta_t^2}{2} \\ \Delta_t\end{pmatrix}
135 | \begin{pmatrix}\frac{\Delta_t^2}{2} & \Delta_t\end{pmatrix}\sigma_a^2\\
136 | &= \begin{pmatrix}\frac{\Delta_t^4}{4} & \frac{\Delta_t^3}{2} \\
137 | \frac{\Delta_t^3}{2} & \Delta_t^2 \end{pmatrix}\sigma_a^2
138 | \end{align}
139 | 
140 | (where $x_a$ and $\dot{x}_a$ are the $a$ coefficients of $x$ and $\dot{x}$)
141 | gives us exactly what we're looking for. In general, the covariance of any two
142 | of the variables follows the coefficient product rule above, so what we want is
143 | a matrix that holds all possible pairwise products of the coefficients from the
144 | vector. This is exactly what the outer product above yields, and it is why you
145 | can take the $a$ coefficients, multiply them together, and make a covariance matrix.
146 | 
147 | Hopefully that is as useful to your understanding as it was to mine. I couldn't
148 | see the relationship between the $a$ coefficients of $x,y$ and the covariance
149 | matrix before doing this.
150 | 
151 | \end{document}
152 | 
--------------------------------------------------------------------------------