├── cover
│   ├── cover.pdf
│   ├── full_cover.png
│   ├── README.md
│   └── gen_q_wiener.py
├── dissertation.pdf
├── scripts
│   ├── requirements.txt
│   ├── README.md
│   ├── vanishing_prior_cov.py
│   ├── TME_positive_definite_softplus.py
│   ├── showcase_gp_sinusoidal.py
│   ├── draw_ssdgp_m12.py
│   ├── showcase_gp_rectangular.py
│   ├── draw_ssdgp_m32.py
│   ├── disc_err_dgp_m12.py
│   ├── spectro_temporal.py
│   └── TME_estimation_benes.py
├── thesis_latex
│   ├── figs
│   │   ├── drift-est.pdf
│   │   ├── gp-kfs-eq.pdf
│   │   ├── ais-r-ssdgp.pdf
│   │   ├── imu-r-ssdgp.pdf
│   │   ├── r-ssgp-admm.pdf
│   │   ├── drift-est-dw.pdf
│   │   ├── drift-est-tanh.pdf
│   │   ├── ssdgp-reg-rect.pdf
│   │   ├── ssdgp-reg-sine.pdf
│   │   ├── tme-benes-all.pdf
│   │   ├── tme-benes-cov.pdf
│   │   ├── tme-benes-nn.pdf
│   │   ├── tme-benes-x3.pdf
│   │   ├── vanishing-cov.pdf
│   │   ├── disc-err_dgp_m12.pdf
│   │   ├── tme-benes-filter.pdf
│   │   ├── tme-ct3d-filter.pdf
│   │   ├── gp-fail-example-m12.pdf
│   │   ├── gp-fail-example-m32.pdf
│   │   ├── gp-fail-example-m52.pdf
│   │   ├── gp-fail-example-rbf.pdf
│   │   ├── gravit-wave-ssdgp.pdf
│   │   ├── samples_ssdgp_m12.pdf
│   │   ├── samples_ssdgp_m32.pdf
│   │   ├── tme-benes-nonlinear.pdf
│   │   ├── tme-benes-smoother.pdf
│   │   ├── tme-ct3d-smoother.pdf
│   │   ├── tme-softplus-mineigs.pdf
│   │   ├── gp-fail-example-sinu-m12.pdf
│   │   ├── gp-fail-example-sinu-m32.pdf
│   │   ├── gp-fail-example-sinu-m52.pdf
│   │   ├── spectro-temporal-demo1.pdf
│   │   ├── tme-duffing-smoother-x1.pdf
│   │   ├── tme-duffing-smoother-x2.pdf
│   │   ├── tme-benes-filter-smoother.pdf
│   │   ├── tme-duffing-filter-smoother.pdf
│   │   ├── gp-sign-fit-example.tex
│   │   ├── ssdgp-identifiability-graph.tex
│   │   ├── dgp-example-2.tex
│   │   └── dgp-binary-tree.tex
│   ├── title-pages
│   │   ├── README.md
│   │   ├── backcover.pdf
│   │   └── title-pages.pdf
│   ├── sRGB_IEC61966-2-1_black_scaled.icc
│   ├── dissertation.xmpdata
│   ├── aalto_licenses
│   │   ├── README.md
│   │   ├── README-aaltologo.md
│   │   ├── README.txt
│   │   ├── LICENSES.txt
│   │   ├── LICENSES-aaltologo.txt
│   │   └── sRGB_IEC61966-2-1_black_scaled.icc-COPYRIGHT
│   ├── modifications_of_aaltoseries.txt
│   ├── README.md
│   ├── fouriernc2.sty
│   ├── list_of_papers.tex
│   ├── zmacro.tex
│   ├── ch6.tex
│   ├── fourier2.sty
│   ├── ch1.tex
│   ├── dissertation.tex
│   └── ch5.tex
├── .gitignore
├── lectio_praecursoria
│   ├── README.md
│   ├── figs
│   │   └── path-graph.tex
│   ├── scripts
│   │   ├── draw_gp_samples.py
│   │   └── kfs_anime.py
│   ├── zz.cls
│   ├── z_marcro.tex
│   └── slide.tex
├── license.txt
├── errata.md
├── .github
│   └── workflows
│       └── latex_compile.yml
└── README.md
/cover/cover.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zgbkdlm/dissertation/HEAD/cover/cover.pdf
--------------------------------------------------------------------------------
/dissertation.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zgbkdlm/dissertation/HEAD/dissertation.pdf
--------------------------------------------------------------------------------
/cover/full_cover.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zgbkdlm/dissertation/HEAD/cover/full_cover.png
--------------------------------------------------------------------------------
/scripts/requirements.txt:
--------------------------------------------------------------------------------
1 | numpy
2 | gpflow
3 | matplotlib
4 | scipy
5 | scikit-learn
6 | sympy
7 | tme>=0.1.4
8 |
--------------------------------------------------------------------------------
/thesis_latex/figs/drift-est.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zgbkdlm/dissertation/HEAD/thesis_latex/figs/drift-est.pdf
--------------------------------------------------------------------------------
/thesis_latex/figs/gp-kfs-eq.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zgbkdlm/dissertation/HEAD/thesis_latex/figs/gp-kfs-eq.pdf
--------------------------------------------------------------------------------
/cover/README.md:
--------------------------------------------------------------------------------
1 | Code to generate the dissertation cover. Run `python gen_q_wiener.py` to generate the file `cover.pdf`.
2 |
--------------------------------------------------------------------------------
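The script name `gen_q_wiener.py` suggests that the cover artwork is produced from samples of a Q-Wiener process. As a minimal sketch of such a simulation (the actual script may differ; the function name, grids, and eigenvalues below are hypothetical), a Q-Wiener process on $[0, 1]$ can be sampled via a truncated Karhunen–Loève expansion, assuming sine eigenfunctions for the covariance operator $Q$:

```python
import numpy as np

def simulate_q_wiener(ts, xs, lambdas, rng):
    """Simulate a Q-Wiener process W(t, x) on [0, 1] via a truncated
    Karhunen-Loeve expansion  W(t, x) = sum_j sqrt(lambda_j) e_j(x) B_j(t),
    assuming eigenfunctions e_j(x) = sqrt(2) sin(j pi x) of Q and
    independent standard Brownian motions B_j."""
    J = len(lambdas)
    # Brownian motions B_j on the time grid, built from independent increments.
    dW = rng.standard_normal((len(ts) - 1, J)) * np.sqrt(np.diff(ts))[:, None]
    B = np.vstack([np.zeros((1, J)), np.cumsum(dW, axis=0)])  # B_j(ts[0]) = 0
    # Eigenfunctions evaluated on the spatial grid: e[k, j] = e_j(xs[k]).
    e = np.sqrt(2.0) * np.sin(np.outer(xs, np.arange(1, J + 1) * np.pi))
    # W[i, k] = sum_j sqrt(lambda_j) e_j(xs[k]) B_j(ts[i]).
    return (B * np.sqrt(np.asarray(lambdas))) @ e.T

rng = np.random.default_rng(0)
ts = np.linspace(0.0, 1.0, 101)
xs = np.linspace(0.0, 1.0, 50)
W = simulate_q_wiener(ts, xs, lambdas=[1.0, 0.5, 0.25], rng=rng)
```

Plotting the rows of `W` over `xs` gives the evolving random field; truncating at more terms `J` adds finer spatial detail.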
/thesis_latex/figs/ais-r-ssdgp.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zgbkdlm/dissertation/HEAD/thesis_latex/figs/ais-r-ssdgp.pdf
--------------------------------------------------------------------------------
/thesis_latex/figs/imu-r-ssdgp.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zgbkdlm/dissertation/HEAD/thesis_latex/figs/imu-r-ssdgp.pdf
--------------------------------------------------------------------------------
/thesis_latex/figs/r-ssgp-admm.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zgbkdlm/dissertation/HEAD/thesis_latex/figs/r-ssgp-admm.pdf
--------------------------------------------------------------------------------
/thesis_latex/figs/drift-est-dw.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zgbkdlm/dissertation/HEAD/thesis_latex/figs/drift-est-dw.pdf
--------------------------------------------------------------------------------
/thesis_latex/figs/drift-est-tanh.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zgbkdlm/dissertation/HEAD/thesis_latex/figs/drift-est-tanh.pdf
--------------------------------------------------------------------------------
/thesis_latex/figs/ssdgp-reg-rect.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zgbkdlm/dissertation/HEAD/thesis_latex/figs/ssdgp-reg-rect.pdf
--------------------------------------------------------------------------------
/thesis_latex/figs/ssdgp-reg-sine.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zgbkdlm/dissertation/HEAD/thesis_latex/figs/ssdgp-reg-sine.pdf
--------------------------------------------------------------------------------
/thesis_latex/figs/tme-benes-all.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zgbkdlm/dissertation/HEAD/thesis_latex/figs/tme-benes-all.pdf
--------------------------------------------------------------------------------
/thesis_latex/figs/tme-benes-cov.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zgbkdlm/dissertation/HEAD/thesis_latex/figs/tme-benes-cov.pdf
--------------------------------------------------------------------------------
/thesis_latex/figs/tme-benes-nn.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zgbkdlm/dissertation/HEAD/thesis_latex/figs/tme-benes-nn.pdf
--------------------------------------------------------------------------------
/thesis_latex/figs/tme-benes-x3.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zgbkdlm/dissertation/HEAD/thesis_latex/figs/tme-benes-x3.pdf
--------------------------------------------------------------------------------
/thesis_latex/figs/vanishing-cov.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zgbkdlm/dissertation/HEAD/thesis_latex/figs/vanishing-cov.pdf
--------------------------------------------------------------------------------
/thesis_latex/title-pages/README.md:
--------------------------------------------------------------------------------
1 | # README
2 |
3 | Title pages and backcover generated from the Aalto Publication Platform.
4 |
--------------------------------------------------------------------------------
/thesis_latex/figs/disc-err_dgp_m12.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zgbkdlm/dissertation/HEAD/thesis_latex/figs/disc-err_dgp_m12.pdf
--------------------------------------------------------------------------------
/thesis_latex/figs/tme-benes-filter.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zgbkdlm/dissertation/HEAD/thesis_latex/figs/tme-benes-filter.pdf
--------------------------------------------------------------------------------
/thesis_latex/figs/tme-ct3d-filter.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zgbkdlm/dissertation/HEAD/thesis_latex/figs/tme-ct3d-filter.pdf
--------------------------------------------------------------------------------
/thesis_latex/title-pages/backcover.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zgbkdlm/dissertation/HEAD/thesis_latex/title-pages/backcover.pdf
--------------------------------------------------------------------------------
/thesis_latex/figs/gp-fail-example-m12.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zgbkdlm/dissertation/HEAD/thesis_latex/figs/gp-fail-example-m12.pdf
--------------------------------------------------------------------------------
/thesis_latex/figs/gp-fail-example-m32.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zgbkdlm/dissertation/HEAD/thesis_latex/figs/gp-fail-example-m32.pdf
--------------------------------------------------------------------------------
/thesis_latex/figs/gp-fail-example-m52.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zgbkdlm/dissertation/HEAD/thesis_latex/figs/gp-fail-example-m52.pdf
--------------------------------------------------------------------------------
/thesis_latex/figs/gp-fail-example-rbf.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zgbkdlm/dissertation/HEAD/thesis_latex/figs/gp-fail-example-rbf.pdf
--------------------------------------------------------------------------------
/thesis_latex/figs/gravit-wave-ssdgp.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zgbkdlm/dissertation/HEAD/thesis_latex/figs/gravit-wave-ssdgp.pdf
--------------------------------------------------------------------------------
/thesis_latex/figs/samples_ssdgp_m12.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zgbkdlm/dissertation/HEAD/thesis_latex/figs/samples_ssdgp_m12.pdf
--------------------------------------------------------------------------------
/thesis_latex/figs/samples_ssdgp_m32.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zgbkdlm/dissertation/HEAD/thesis_latex/figs/samples_ssdgp_m32.pdf
--------------------------------------------------------------------------------
/thesis_latex/figs/tme-benes-nonlinear.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zgbkdlm/dissertation/HEAD/thesis_latex/figs/tme-benes-nonlinear.pdf
--------------------------------------------------------------------------------
/thesis_latex/figs/tme-benes-smoother.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zgbkdlm/dissertation/HEAD/thesis_latex/figs/tme-benes-smoother.pdf
--------------------------------------------------------------------------------
/thesis_latex/figs/tme-ct3d-smoother.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zgbkdlm/dissertation/HEAD/thesis_latex/figs/tme-ct3d-smoother.pdf
--------------------------------------------------------------------------------
/thesis_latex/title-pages/title-pages.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zgbkdlm/dissertation/HEAD/thesis_latex/title-pages/title-pages.pdf
--------------------------------------------------------------------------------
/thesis_latex/figs/tme-softplus-mineigs.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zgbkdlm/dissertation/HEAD/thesis_latex/figs/tme-softplus-mineigs.pdf
--------------------------------------------------------------------------------
/thesis_latex/figs/gp-fail-example-sinu-m12.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zgbkdlm/dissertation/HEAD/thesis_latex/figs/gp-fail-example-sinu-m12.pdf
--------------------------------------------------------------------------------
/thesis_latex/figs/gp-fail-example-sinu-m32.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zgbkdlm/dissertation/HEAD/thesis_latex/figs/gp-fail-example-sinu-m32.pdf
--------------------------------------------------------------------------------
/thesis_latex/figs/gp-fail-example-sinu-m52.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zgbkdlm/dissertation/HEAD/thesis_latex/figs/gp-fail-example-sinu-m52.pdf
--------------------------------------------------------------------------------
/thesis_latex/figs/spectro-temporal-demo1.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zgbkdlm/dissertation/HEAD/thesis_latex/figs/spectro-temporal-demo1.pdf
--------------------------------------------------------------------------------
/thesis_latex/figs/tme-duffing-smoother-x1.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zgbkdlm/dissertation/HEAD/thesis_latex/figs/tme-duffing-smoother-x1.pdf
--------------------------------------------------------------------------------
/thesis_latex/figs/tme-duffing-smoother-x2.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zgbkdlm/dissertation/HEAD/thesis_latex/figs/tme-duffing-smoother-x2.pdf
--------------------------------------------------------------------------------
/thesis_latex/figs/tme-benes-filter-smoother.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zgbkdlm/dissertation/HEAD/thesis_latex/figs/tme-benes-filter-smoother.pdf
--------------------------------------------------------------------------------
/thesis_latex/sRGB_IEC61966-2-1_black_scaled.icc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zgbkdlm/dissertation/HEAD/thesis_latex/sRGB_IEC61966-2-1_black_scaled.icc
--------------------------------------------------------------------------------
/thesis_latex/figs/tme-duffing-filter-smoother.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zgbkdlm/dissertation/HEAD/thesis_latex/figs/tme-duffing-filter-smoother.pdf
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | *.asc
2 | *.ase
3 | *.asp
4 | *.aux
5 | *.bbl
6 | *.blg
7 | *.brf
8 | *.log
9 | *.out
10 | *.synctex.gz
11 | *.toc
12 | *.nav
13 | *.snm
14 | pdfa.xmpi
15 | .idea
16 | __pycache__
17 |
--------------------------------------------------------------------------------
/lectio_praecursoria/README.md:
--------------------------------------------------------------------------------
1 | # About
2 |
3 | Lectio praecursoria given at the public defence of the dissertation.
4 |
5 | # Notes
6 |
7 | If you encounter missing-figure errors, run the scripts in `./scripts` and copy the generated figures to the corresponding paths.
8 |
--------------------------------------------------------------------------------
/thesis_latex/dissertation.xmpdata:
--------------------------------------------------------------------------------
1 | \Title{State-space deep Gaussian processes with applications}
2 | \Author{Zheng Zhao}
3 | \Subject{State-space methods for deep Gaussian processes}
4 | \Keywords{Gaussian processes, machine learning, state space, stochastic differential equations, stochastic filtering}
5 |
--------------------------------------------------------------------------------
/thesis_latex/aalto_licenses/README.md:
--------------------------------------------------------------------------------
1 | This project contains the aaltoseries class file and docs.
2 |
3 | The zip file 'examples.zip' contains a number of example files. In order to
4 | compile them, copy aaltoseries.cls from root into this directory, and also
5 | obtain a copy of aaltologo.sty and place it there.
6 |
7 |
--------------------------------------------------------------------------------
/license.txt:
--------------------------------------------------------------------------------
1 | Unless otherwise stated, all rights belong to the author Zheng Zhao. This repository contains files covered by different licenses; please check the respective licenses before you use them.
2 |
3 | You are free to download, display, and print ./dissertation.pdf for your own personal use. Commercial use of it is prohibited.
4 |
5 |
--------------------------------------------------------------------------------
/thesis_latex/aalto_licenses/README-aaltologo.md:
--------------------------------------------------------------------------------
1 | This is the home of the file aaltologo.sty, which generates the Aalto logos for documents and also defines relevant parts of the Aalto visual identity:
2 | specifically, the colours and fonts used, and the official names of Schools, Institutes, etc. as used in the various logos, in Finnish, English, and Swedish.
3 |
--------------------------------------------------------------------------------
/errata.md:
--------------------------------------------------------------------------------
1 | # Errata
2 |
3 | 1. Page 86, Assumption 4.21, $i$ should be replaced with 2.
4 | 2. Page 54, Example (3.10), the sentence "In this case, $|k|$ should be less than 0.5" is inaccurate. It should read "less than or equal to 0.5".
5 | 3. Page 44, Equation (3.11), the Itô integral there is missing a $(\nabla_X \phi)^\trans$ factor. Similarly, in Equation (3.14), the Itô integral term is missing.
6 |
7 | If you spot any error/typo/inaccuracy in the thesis, please feel free to submit an Issue or drop me an email.
8 |
--------------------------------------------------------------------------------
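The third erratum above concerns Itô's formula. For reference, in generic notation (which may differ from the thesis's, and with the arguments $X(s)$ of the integrands suppressed), Itô's formula for a scalar function $\phi$ of the solution of $\mathrm{d}X = f(X)\,\mathrm{d}t + L(X)\,\mathrm{d}W$ reads

$$
\phi(X(t)) = \phi(X(0)) + \int_0^t \Big[ (\nabla_X \phi)^{\mathsf{T}} f + \tfrac{1}{2}\operatorname{tr}\big( L\, L^{\mathsf{T}} \nabla_X \nabla_X^{\mathsf{T}} \phi \big) \Big] \,\mathrm{d}s + \int_0^t (\nabla_X \phi)^{\mathsf{T}} L \,\mathrm{d}W(s),
$$

so the stochastic (Itô) integral indeed carries the $(\nabla_X \phi)^{\mathsf{T}}$ factor referred to in the erratum.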
/thesis_latex/aalto_licenses/README.txt:
--------------------------------------------------------------------------------
1 | aaltologo.sty -- A LaTeX package for creating Aalto University logos
2 |
3 | See the accompanying documentation in aaltologo.pdf for a user manual. Read the file
4 | LICENSES.txt for publication licenses of the files aaltologo.sty and aaltologo.pdf.
5 | The publication licenses are also given in the documentation.
6 |
7 | For installation, move/copy aaltologo.sty to a location where LaTeX can find it.
8 |
9 | Enjoy creating Aalto University logos.
10 |
--------------------------------------------------------------------------------
/thesis_latex/modifications_of_aaltoseries.txt:
--------------------------------------------------------------------------------
1 | Zheng Zhao made the following changes to the aaltoseries.cls class.
2 |
3 | 1. Removed the "DRAFT" signs.
4 | 2. Changed \@date to a fixed date, October 4, in the preface environment.
5 | 3. Changed "List of Publications" to "List of publications".
6 | 4. Added copyright notice for \addpublication with "submitted" option.
7 |
8 | You can also clearly see the differences by using the `diff` command.
9 |
10 | Disclaimer: the changes above do not create a derivative of aaltoseries; therefore, the CC Attribution No-Derivative license is not violated.
11 |
--------------------------------------------------------------------------------
/.github/workflows/latex_compile.yml:
--------------------------------------------------------------------------------
1 | name: Dissertation latex compile
2 | on:
3 | workflow_dispatch:
4 | inputs:
5 | name:
6 | description: 'Workflow run name'
7 | required: true
8 | default: 'Manual unittest'
9 |
10 | jobs:
11 | build_latex:
12 | runs-on: ubuntu-latest
13 | steps:
14 | - name: Set up Git repository
15 | uses: actions/checkout@v2
16 | - name: Compile LaTeX document
17 | uses: xu-cheng/latex-action@v2
18 | with:
19 | root_file: dissertation.tex
20 | latexmk_shell_escape: true
21 | pre_compile: "cd thesis_latex"
22 | post_compile: "latexmk -c"
23 |
24 |
--------------------------------------------------------------------------------
/thesis_latex/README.md:
--------------------------------------------------------------------------------
1 | # README
2 |
3 | Latex source of the thesis titled "State-space deep Gaussian processes with applications".
4 |
5 | The main tex file is `dissertation.tex`. To compile the thesis, it is recommended to use TeX Live version *2019* or newer.
6 |
7 | # Licenses
8 |
9 | The licenses for `aaltoseries.cls`, `aaltologo.sty`, and `sRGB_IEC61966-2-1_black_scaled.icc` are found in `./aalto_licenses`. Note that `aaltoseries.cls` has been modified; see `modifications_of_aaltoseries.txt`. As for the licenses for `fourier2.sty` and `fouriernc2.sty`, please inquire with the aaltoseries developers.
10 |
11 | `z_marcro.tex` is under CC BY 4.0 license.
12 |
13 | You are free to compile, download, display, and print the thesis and its LaTeX source files, as long as you cite it properly. Commercial use is prohibited.
14 |
--------------------------------------------------------------------------------
/thesis_latex/aalto_licenses/LICENSES.txt:
--------------------------------------------------------------------------------
1 | The aaltoseries class has been published under the Creative Commons Attribution No-Derivative license (http://creativecommons.org/licenses/by-nd/1.0/). This means that you CAN use the class freely in your own documents BUT it also means that you CANNOT base your own Aalto University publication series class/package upon this class. However, you CAN use this class as an example in designing your own classes/packages that implement the publication series recommendations of another university or company.
2 |
3 | The documentation for the aaltoseries class has been published under the Creative Commons Attribution license (http://creativecommons.org/licenses/by/1.0/). This means that you CAN freely write your own documentation for the aaltoseries class based on this document as long as you give credit for the original documentation by citing it appropriately.
4 |
5 |
--------------------------------------------------------------------------------
/thesis_latex/aalto_licenses/LICENSES-aaltologo.txt:
--------------------------------------------------------------------------------
1 | The aaltologo package has been published under Creative Commons Attribution
2 | No-Derivative license (http://creativecommons.org/licenses/by-nd/1.0/). This means
3 | that you CAN use the package freely in your own documents, packages and classes
4 | BUT it also means that you CANNOT base your own Aalto University logo package upon
5 | this package. Furthermore, if you want to write your own Aalto University logo
6 | package, you NEED to contact the Aalto University Marketing and Communications BEFORE
7 | publishing your package. However, you can use this package as an example in designing
8 | your own packages implementing the visual identity of a company/university.
9 |
10 | The documentation for the aaltologo package has been published under the Creative
11 | Commons Attribution license (http://creativecommons.org/licenses/by/1.0/). This means
12 | that you CAN freely write your own documentation for the aaltologo package based on
13 | the document as long as you give credit for the original documentation by citing it
14 | appropriately.
15 |
16 |
17 |
--------------------------------------------------------------------------------
/thesis_latex/aalto_licenses/sRGB_IEC61966-2-1_black_scaled.icc-COPYRIGHT:
--------------------------------------------------------------------------------
1 | For the file sRGB_IEC61966-2-1_black_scaled.icc:
2 |
3 | Copyright International Color Consortium, 2009
4 |
5 | It is hereby acknowledged that the file "sRGB_IEC61966-2-1_black
6 | scaled.icc" is provided "AS IS" WITH NO EXPRESS OR IMPLIED WARRANTY.
7 |
8 | Licensing
9 |
10 | This profile is made available by the International Color Consortium,
11 | and may be copied, distributed, embedded, made, used, and sold without
12 | restriction. Altered versions of this profile shall have the original
13 | identification and copyright information removed and shall not be
14 | misrepresented as the original profile.
15 |
16 | Terms of use
17 |
18 | To anyone who acknowledges that the file "sRGB_IEC61966-2-1_black
19 | scaled.icc" is provided "AS IS" WITH NO EXPRESS OR IMPLIED WARRANTY,
20 | permission to use, copy and distribute these file for any purpose is
21 | hereby granted without fee, provided that the file is not changed
22 | including the ICC copyright notice tag, and that the name of ICC shall
23 | not be used in advertising or publicity pertaining to distribution of
24 | the software without specific, written prior permission. ICC makes no
25 | representations about the suitability of this software for any
26 | purpose.
27 |
28 |
--------------------------------------------------------------------------------
/thesis_latex/fouriernc2.sty:
--------------------------------------------------------------------------------
1 | \def\fileversion{1.0}%
2 | \def\filedate{2005/12/20}%
3 | \NeedsTeXFormat{LaTeX2e}%
4 | \ProvidesPackage{fouriernc}%
5 | [\filedate\space\fileversion\space fouriernc package]%
6 |
7 | %The metrics for the 'upright' option have not been tuned.
8 | \DeclareOption{sloped}{\PassOptionsToPackage{sloped}{fourier}}
9 | %\DeclareOption{upright}{\PassOptionsToPackage{upright}{fourier}}
10 |
11 | \ExecuteOptions{sloped}
12 | \ProcessOptions
13 | \RequirePackage{fourier2}
14 |
15 | %\ifsloped
16 | \DeclareSymbolFont{letters}{FML}{fncmi}{m}{it}
17 | \DeclareSymbolFont{otherletters}{FML}{fncm}{m}{it}
18 | \SetSymbolFont{letters}{bold}{FML}{fncmi}{b}{it}
19 | \SetSymbolFont{otherletters}{bold}{FML}{fncm}{b}{it}
20 | %\else
21 | % \DeclareSymbolFont{letters}{FML}{fncm}{m}{it}
22 | % \DeclareSymbolFont{otherletters}{FML}{fncmi}{m}{it}
23 | % \SetSymbolFont{letters}{bold}{FML}{fncm}{b}{it}
24 | % \SetSymbolFont{otherletters}{bold}{FML}{fncmi}{b}{it}
25 | %\fi
26 |
27 | \renewcommand{\rmdefault}{fnc}
28 |
29 | \DeclareFontSubstitution{FML}{fncmi}{m}{it}
30 | \DeclareFontSubstitution{FMS}{fncm}{m}{n}
31 |
32 | \DeclareSymbolFont{operators}{T1}{fnc}{m}{n}
33 | \SetSymbolFont{operators}{bold}{T1}{fnc}{b}{n}
34 | \DeclareSymbolFont{symbols}{FMS}{fncm}{m}{n}
35 | \DeclareMathAlphabet{\mathbf}{T1}{fnc}{b}{n}
36 | \DeclareMathAlphabet{\mathrm}{T1}{fnc}{m}{n}
37 | \DeclareMathAlphabet{\mathit}{T1}{fnc}{m}{it}
38 | \DeclareMathAlphabet{\mathcal}{FMS}{fncm}{m}{n}
39 |
40 | \endinput
41 |
--------------------------------------------------------------------------------
/scripts/README.md:
--------------------------------------------------------------------------------
1 | This folder contains Python/Matlab scripts that generate some of the figures in the dissertation. Specifically, the scripts in this folder are as follows.
2 |
3 | 1. `disc_err_dgp_m12.py`: Compute the discretisation errors for a Matern DGP. Related to **Figure 4.3**.
4 |
5 | 2. `draw_ssdgp_m12.py`: Draw samples from a Matern 1/2 SS-DGP. Related to **Figure 4.4**.
6 |
7 | 3. `draw_ssdgp_m32.py`: Draw samples from a Matern 3/2 SS-DGP. Related to **Figure 4.5**.
8 |
9 | 4. `showcase_gp_rectangular.py`: Perform GP regression for a rectangular signal. Related to **Figure 1.1**.
10 |
11 | 5. `showcase_gp_sinusoidal.py`: Perform GP regression for a sinusoidal signal. Related to **Figure 1.1**.
12 |
13 | 6. `spectro_temporal.py`: Spectro-temporal state-space method for estimation of spectrogram. Related to **Figure 5.2**.
14 |
15 | 7. `TME_estimation_benes.py`: Use TME to estimate a few expectations of a Benes SDE. Related to **Figure 3.1**.
16 |
17 | 8. `TME_positive_definite_softplus.py`: Analyse the positive definiteness of the TME covariance estimator for an SDE. Related to **Figure 3.2**.
18 |
19 | 9. `vanishing_prior_cov.py`: Estimate a cross-covariance of an SS-DGP. Related to **Figure 4.8**.
20 |
21 | # Requirements
22 |
23 | In order to run the scripts, you need to install a few packages as follows.
24 |
25 | `pip install numpy scipy scikit-learn sympy tme matplotlib`.
26 |
27 | Additionally, if you want to run scripts `showcase_gp_rectangular.py` and `showcase_gp_sinusoidal.py`, you need to install `gpflow`, that is, `pip install gpflow`.
28 |
29 | # License
30 |
31 | You are free to do anything you want with the scripts in this folder, except that `spectro_temporal.py` is under the MIT license. I do not give any warranty of any kind.
32 |
--------------------------------------------------------------------------------
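The SS-DGP sampling scripts listed above (e.g. `draw_ssdgp_m12.py`) build on state-space representations of GPs. As a minimal sketch of the underlying idea only, not the actual script (the function name and parameters below are hypothetical): a Matérn 1/2 GP is an Ornstein–Uhlenbeck process, so exact sample paths can be drawn recursively from its discretised state-space model:

```python
import numpy as np

def sample_matern12_ss(ell, sigma, ts, rng):
    """Draw one sample path of a Matern 1/2 GP (lengthscale ell, stationary
    std sigma) by exact discretisation of its state-space representation,
    the Ornstein-Uhlenbeck SDE  dX = -(1/ell) X dt + sigma sqrt(2/ell) dW."""
    xs = np.empty(ts.shape)
    # Start from the stationary distribution N(0, sigma^2).
    xs[0] = sigma * rng.standard_normal()
    for k in range(1, len(ts)):
        dt = ts[k] - ts[k - 1]
        a = np.exp(-dt / ell)                            # transition coefficient
        q = sigma**2 * (1.0 - np.exp(-2.0 * dt / ell))   # process-noise variance
        xs[k] = a * xs[k - 1] + np.sqrt(q) * rng.standard_normal()
    return xs

rng = np.random.default_rng(0)
ts = np.linspace(0.0, 1.0, 1000)
path = sample_matern12_ss(ell=0.5, sigma=1.0, ts=ts, rng=rng)
```

An SS-DGP stacks such models hierarchically, e.g. by letting `ell` or `sigma` at one layer be a transformed sample path from another; the sketch above covers only a single layer.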
/thesis_latex/figs/gp-sign-fit-example.tex:
--------------------------------------------------------------------------------
1 | \tikzset{every picture/.style={line width=0.75pt}} %set default line width to 0.75pt
2 |
3 | \begin{tikzpicture}[x=0.75pt,y=0.75pt,yscale=-1,xscale=1]
4 | %uncomment if require: \path (0,300); %set diagram left start at 0, and has height of 300
5 |
6 | %Straight Lines [id:da2875292600189103]
7 | \draw [line width=1.5] (100,160) -- (100,34) ;
8 | \draw [shift={(100,30)}, rotate = 450] [fill={rgb, 255:red, 0; green, 0; blue, 0 } ][line width=0.08] [draw opacity=0] (13.4,-6.43) -- (0,0) -- (13.4,6.44) -- (8.9,0) -- cycle ;
9 | %Straight Lines [id:da3046472264214819]
10 | \draw [line width=1.5] (80,140) -- (226,140) ;
11 | \draw [shift={(230,140)}, rotate = 180] [fill={rgb, 255:red, 0; green, 0; blue, 0 } ][line width=0.08] [draw opacity=0] (13.4,-6.43) -- (0,0) -- (13.4,6.44) -- (8.9,0) -- cycle ;
12 | %Straight Lines [id:da9835771066030095]
13 | \draw [line width=0.75] (100,140) -- (130,70) ;
14 | %Straight Lines [id:da7083475840908604]
15 | \draw [line width=0.75] (130,70) -- (220,70) ;
16 | %Straight Lines [id:da20127916146270386]
17 | \draw [dash pattern={on 0.84pt off 2.51pt}] (130,70) -- (130,140) ;
18 | %Straight Lines [id:da677805480574649]
19 | \draw [dash pattern={on 0.84pt off 2.51pt}] (130,70) -- (100,70) ;
20 | %Straight Lines [id:da35275281353074295]
21 | \draw [dash pattern={on 0.84pt off 2.51pt}] (170,70) -- (170,140) ;
22 |
23 | % Text Node
24 | \draw (81,62.4) node [anchor=north west][inner sep=0.75pt] {$1$};
25 | % Text Node
26 | \draw (82,142.4) node [anchor=north west][inner sep=0.75pt] {$t_0$};
27 | % Text Node
28 | \draw (124,142.4) node [anchor=north west][inner sep=0.75pt] {$t_{1}$};
29 | % Text Node
30 | \draw (164,142.4) node [anchor=north west][inner sep=0.75pt] {$t_{2}$};
31 | % Text Node
32 | \draw (67,42.4) node [anchor=north west][inner sep=0.75pt] {$u(t)$};
33 | % Text Node
34 | \draw (204,142.4) node [anchor=north west][inner sep=0.75pt] {$t$};
35 |
36 |
37 | \end{tikzpicture}
--------------------------------------------------------------------------------
/lectio_praecursoria/figs/path-graph.tex:
--------------------------------------------------------------------------------
1 | \tikzset{every picture/.style={line width=0.75pt}} %set default line width to 0.75pt
2 |
3 | \begin{tikzpicture}[x=0.7pt,y=0.7pt,yscale=-1,xscale=1]
4 | %uncomment if require: \path (0,300); %set diagram left start at 0, and has height of 300
5 |
6 | %Shape: Circle [id:dp821654118938725]
7 | \draw [line width=1.5] (40,90) .. controls (40,78.95) and (48.95,70) .. (60,70) .. controls (71.05,70) and (80,78.95) .. (80,90) .. controls (80,101.05) and (71.05,110) .. (60,110) .. controls (48.95,110) and (40,101.05) .. (40,90) -- cycle ;
8 | %Shape: Circle [id:dp9693294242254202]
9 | \draw [line width=1.5] (110,90) .. controls (110,78.95) and (118.95,70) .. (130,70) .. controls (141.05,70) and (150,78.95) .. (150,90) .. controls (150,101.05) and (141.05,110) .. (130,110) .. controls (118.95,110) and (110,101.05) .. (110,90) -- cycle ;
10 | %Straight Lines [id:da7555295343349109]
11 | \draw [line width=1.5] (110,90) -- (84,90) ;
12 | \draw [shift={(80,90)}, rotate = 360] [fill={rgb, 255:red, 0; green, 0; blue, 0 } ][line width=0.08] [draw opacity=0] (13.4,-6.43) -- (0,0) -- (13.4,6.44) -- (8.9,0) -- cycle ;
13 | %Straight Lines [id:da1673080699648295]
14 | \draw [line width=1.5] (220,90) -- (194,90) ;
15 | \draw [shift={(190,90)}, rotate = 360] [fill={rgb, 255:red, 0; green, 0; blue, 0 } ][line width=0.08] [draw opacity=0] (13.4,-6.43) -- (0,0) -- (13.4,6.44) -- (8.9,0) -- cycle ;
16 | %Shape: Circle [id:dp9339346589837028]
17 | \draw [line width=1.5] (220,90) .. controls (220,78.95) and (228.95,70) .. (240,70) .. controls (251.05,70) and (260,78.95) .. (260,90) .. controls (260,101.05) and (251.05,110) .. (240,110) .. controls (228.95,110) and (220,101.05) .. (220,90) -- cycle ;
18 |
19 | % Text Node
20 | \draw (60,90) node [font=\large] {$U_{0}^{1}$};
21 | % Text Node
22 | \draw (130,90) node [font=\large] {$U_{1}^{2}$};
23 | % Text Node
24 | \draw (161,81) node [anchor=north west][inner sep=0.75pt] [align=left] {...};
25 | % Text Node
26 | \draw (240,90) node [font=\large] {$U_{L-1}^{L}$};
27 |
28 |
29 | \end{tikzpicture}
30 |
31 |
--------------------------------------------------------------------------------
/cover/gen_q_wiener.py:
--------------------------------------------------------------------------------
1 | # Simulate an H^1_0([0, S])-valued Wiener process. This generates the cover image of the thesis.
2 | # Zheng Zhao 2019 2021
3 | #
4 | # Reference: Gabriel J. Lord et al., An Introduction to Computational Stochastic PDEs, 2014.
5 | #
6 | # Note: the Aalto platform does not support RGBA colour; hence alpha cannot be used.
7 | #
8 |
9 | import numpy as np
10 | import matplotlib.pyplot as plt
11 | from matplotlib import cm
12 |
13 | np.random.seed(1901)
14 |
15 | # Paras
16 | r = 2
17 | J = 2 ** 7
18 | K = J - 1
19 |
20 | S = 2
21 | xs = np.linspace(0, S, K)
22 |
23 | dt = 5e-3
24 | ts = np.arange(dt, 2 + dt, dt)
25 |
26 | ps = np.arange(1, K + 1).reshape(1, -1) * 1.0
27 | lam_j = ps ** (-(2 * r + 1))
28 | sheet_jk = ps.T * ps
29 |
30 | # Simulate Wiener processes
31 | normal_incs = np.random.randn(ts.size, K)
32 | dW = np.dot(np.sqrt(2 * lam_j * dt / S) * normal_incs / np.sqrt(dt), np.sin(np.pi * sheet_jk / J))
33 | WW = np.cumsum(dW, 0)
34 |
35 | # Plot
36 | fig, ax = plt.subplots(subplot_kw={"projection": "3d"})
37 |
38 | colours = cm.magma(np.linspace(0, 0.9, ts.shape[0]))
39 | for t, Wt, colour in zip(ts, WW, colours):
40 | _, = ax.plot3D(xs, [t] * K, Wt, linewidth=0.1, color=colour, alpha=1.)
41 |
42 | ax.grid(False)
43 |
44 | ax.set_axis_off()
45 |
46 | ax.xaxis.set_ticklabels([])
47 | ax.yaxis.set_ticklabels([])
48 | ax.zaxis.set_ticklabels([])
49 |
50 | ax.xaxis.set_pane_color((1.0, 1.0, 1.0, 0.0))
51 | ax.yaxis.set_pane_color((1.0, 1.0, 1.0, 0.0))
52 | ax.zaxis.set_pane_color((1.0, 1.0, 1.0, 0.0))
53 |
54 | # Transparent spines
55 | ax.w_xaxis.line.set_color((1.0, 1.0, 1.0, 0.0))
56 | ax.w_yaxis.line.set_color((1.0, 1.0, 1.0, 0.0))
57 | ax.w_zaxis.line.set_color((1.0, 1.0, 1.0, 0.0))
58 |
59 | ax.xaxis.pane.fill = False
60 | ax.yaxis.pane.fill = False
61 |
62 | ax.view_init(23, 25)
63 |
64 | bbox = fig.bbox_inches.from_bounds(1.91, 1.48, 2.775, 1.843)
65 |
66 | # Save in pdf and png
67 | png_meta_data = {'Title': 'A Q-Wiener process realisation',
68 | 'Author': 'Zheng Zhao',
69 | 'Copyright': 'Zheng Zhao',
70 | 'Description': 'https://github.com/zgbkdlm/dissertation'}
71 | pdf_meta_data = {'Title': 'A Q-Wiener process realisation',
72 | 'Author': 'Zheng Zhao'}
73 | plt.savefig('cover.png', dpi=1200, transparent=True, metadata=png_meta_data, bbox_inches=bbox)
74 | plt.savefig('cover.pdf', bbox_inches=bbox, metadata=pdf_meta_data)
75 |
76 |
--------------------------------------------------------------------------------
/lectio_praecursoria/scripts/draw_gp_samples.py:
--------------------------------------------------------------------------------
1 | """
2 | Draw GP samples
3 |
4 | """
5 | import jax
6 | import math
7 | import jax.numpy as jnp
8 | import jax.scipy
9 | import jax.scipy.optimize
10 | import matplotlib.pyplot as plt
11 | from jax.config import config
12 |
13 | config.update("jax_enable_x64", True)
14 |
15 | plt.rcParams.update({
16 | 'text.usetex': True,
17 | 'text.latex.preamble': r'\usepackage{fouriernc}',
18 | 'font.family': "serif",
19 | 'font.serif': 'New Century Schoolbook',
20 | 'font.size': 18})
21 |
22 | # Random seed
23 | key = jax.random.PRNGKey(6666)
24 |
25 | jndarray = jnp.ndarray
26 |
27 |
28 | def m12_cov(t1: float, t2: float, s: float, ell: float) -> float:
29 | """Matern 1/2"""
30 | return s ** 2 * jnp.exp(-jnp.abs(t1 - t2) / ell)
31 |
32 |
33 | def m32_cov(t1: float, t2: float, s: float, ell: float) -> float:
34 | """Matern 3/2"""
35 | z = math.sqrt(3) * jnp.abs(t1 - t2) / ell
36 | return s ** 2 * (1 + z) * jnp.exp(-z)
37 |
38 |
39 | vectorised_m12_cov = jax.vmap(jax.vmap(m12_cov, in_axes=[0, None, None, None]), in_axes=[None, 0, None, None])
40 | vectorised_m32_cov = jax.vmap(jax.vmap(m32_cov, in_axes=[0, None, None, None]), in_axes=[None, 0, None, None])
41 |
42 |
43 | # Times
44 | ts = jnp.linspace(0, 1, 1000)
45 | num_mcs = 10
46 |
47 | # Paras
48 | s = 1.
49 | ell = 1.
50 |
51 | # Compute mean and covariances
52 | mean = jnp.zeros_like(ts)
53 |
54 | for cov_func, cov_name in zip([vectorised_m12_cov, vectorised_m32_cov],
55 | ['m12', 'm32']):
56 |
57 | cov = cov_func(ts, ts, s, ell)
58 | fig, ax = plt.subplots(figsize=(8.8, 6.6))
59 | plt.xlim(0, 1)
60 | plt.ylabel('$U(t)$', fontsize=20)
61 | plt.xlabel('$t$', fontsize=20)
62 |
63 | for i in range(num_mcs):
64 |
65 | # Random key
66 | key, subkey = jax.random.split(key)
67 |
68 | # Draw!
69 | gp_sample = jax.random.multivariate_normal(key=subkey, mean=mean, cov=cov)
70 |
71 | # Plot
72 | ax.plot(ts, gp_sample, linewidth=1)
73 |
74 | if cov_func is vectorised_m12_cov:
75 | plt.subplots_adjust(top=0.995, bottom=0.09, left=0.08, right=0.981, hspace=0.2, wspace=0.2)
76 | else:
77 | plt.subplots_adjust(top=0.995, bottom=0.09, left=0.105, right=0.981, hspace=0.2, wspace=0.2)
78 |
79 | plt.savefig(f'../figs/gp-sample-{cov_name}.pdf')
80 | plt.cla()
81 |
82 |
--------------------------------------------------------------------------------
/thesis_latex/figs/ssdgp-identifiability-graph.tex:
--------------------------------------------------------------------------------
1 | \tikzset{every picture/.style={line width=0.75pt}} %set default line width to 0.75pt
2 |
3 | \begin{tikzpicture}[x=0.6pt,y=0.6pt,yscale=-1,xscale=1]
4 | %uncomment if require: \path (0,300); %set diagram left start at 0, and has height of 300
5 |
6 | %Shape: Circle [id:dp821654118938725]
7 | \draw [line width=1.5] (130,70) .. controls (130,58.95) and (138.95,50) .. (150,50) .. controls (161.05,50) and (170,58.95) .. (170,70) .. controls (170,81.05) and (161.05,90) .. (150,90) .. controls (138.95,90) and (130,81.05) .. (130,70) -- cycle ;
8 | %Shape: Circle [id:dp9693294242254202]
9 | \draw [line width=1.5] (200,40) .. controls (200,28.95) and (208.95,20) .. (220,20) .. controls (231.05,20) and (240,28.95) .. (240,40) .. controls (240,51.05) and (231.05,60) .. (220,60) .. controls (208.95,60) and (200,51.05) .. (200,40) -- cycle ;
10 | %Straight Lines [id:da2336152744929989]
11 | \draw [line width=1.5] (200,100) -- (173.38,83.14) ;
12 | \draw [shift={(170,81)}, rotate = 392.35] [fill={rgb, 255:red, 0; green, 0; blue, 0 } ][line width=0.08] [draw opacity=0] (13.4,-6.43) -- (0,0) -- (13.4,6.44) -- (8.9,0) -- cycle ;
13 | %Straight Lines [id:da7555295343349109]
14 | \draw [line width=1.5] (200,40) -- (173.33,57.78) ;
15 | \draw [shift={(170,60)}, rotate = 326.31] [fill={rgb, 255:red, 0; green, 0; blue, 0 } ][line width=0.08] [draw opacity=0] (13.4,-6.43) -- (0,0) -- (13.4,6.44) -- (8.9,0) -- cycle ;
16 | %Shape: Rectangle [id:dp057979059306741076]
17 | \draw [dash pattern={on 5.63pt off 4.5pt}][line width=1.5] (120,10) -- (320,10) -- (320,130) -- (120,130) -- cycle ;
18 | %Shape: Rectangle [id:dp30596298266066313]
19 | \draw [line width=1.5] (270,20) -- (310,20) -- (310,60) -- (270,60) -- cycle ;
20 | %Shape: Rectangle [id:dp9621262435565225]
21 | \draw [line width=1.5] (200,80) -- (240,80) -- (240,120) -- (200,120) -- cycle ;
22 | %Straight Lines [id:da8176933026073245]
23 | \draw [line width=1.5] (270,40) -- (244,40) ;
24 | \draw [shift={(240,40)}, rotate = 360] [fill={rgb, 255:red, 0; green, 0; blue, 0 } ][line width=0.08] [draw opacity=0] (13.4,-6.43) -- (0,0) -- (13.4,6.44) -- (8.9,0) -- cycle ;
25 |
26 | % Text Node
27 | \draw (150,70) node [font=\large] {$U_{0}^{1}$};
28 | % Text Node
29 | \draw (220,40) node [font=\large] {$U_{1}^{2}$};
30 | % Text Node
31 | \draw (290,37) node [font=\large] {$\varphi $};
32 | % Text Node
33 | \draw (220,96) node [font=\large] {$\psi $};
34 | % Text Node
35 | \draw (301,112) node [font=\large] {$\mathcal{V}$};
36 |
37 |
38 | \end{tikzpicture}
39 |
--------------------------------------------------------------------------------
/scripts/vanishing_prior_cov.py:
--------------------------------------------------------------------------------
1 | # Numerically plot the vanishing covariance problem for Matern SS-DGP. This gives Figure 4.8 in the thesis.
2 | #
3 | # Zheng Zhao 2021
4 | #
5 | import os
6 | import math
7 | import numpy as np
8 | import matplotlib.pyplot as plt
9 |
10 | l2 = 0.1
11 | s2 = 0.1
12 | l3 = 0.1
13 | s3 = 0.1
14 |
15 |
16 | def g(x):
17 | return np.exp(x)
18 |
19 |
20 | def a_b(x):
21 | """Return a(x) and b(x)
22 | """
23 | return -1 / np.array([g(x[1]), l2, l3]) * x, \
24 | math.sqrt(2) * np.diag([g(x[2]) / np.sqrt(g(x[1])), s2 / np.sqrt(l2), s3 / np.sqrt(l3)])
25 |
26 |
27 | def euler_maruyama(x0, dt, num_steps):
28 | xx = np.zeros(shape=(num_steps, x0.shape[0]))
29 | x = x0
30 | for i in range(num_steps):
31 | ax, bx = a_b(x)
32 | x = x + ax * dt + np.sqrt(dt) * bx @ np.random.randn(3)
33 | xx[i] = x
34 | return xx
35 |
36 |
37 | if __name__ == '__main__':
38 |
39 | path_figs = '../thesis/figs'
40 | plt.rcParams.update({
41 | 'text.usetex': True,
42 | 'text.latex.preamble': r'\usepackage{fouriernc}',
43 | 'font.family': "serif",
44 | 'font.serif': 'New Century Schoolbook',
45 | 'font.size': 20})
46 |
47 | np.random.seed(2020)
48 |
49 | end_T = 1
50 | num_steps = 1000
51 | dt = end_T / num_steps
52 | tt = np.linspace(dt, end_T, num_steps)
53 |
54 | num_mc = 20000
55 |
56 | m0 = np.array([1., 1., 1.])
57 | P0 = np.array([[1., 0., 0.5],
58 | [0., 1., 0.],
59 | [0.5, 0., 1.]])
60 | P0_chol = np.linalg.cholesky(P0)
61 |
62 | # Compute Euler Maruyama
63 | xx = np.zeros(shape=(num_mc, num_steps, 3))
64 | for mc in range(num_mc):
65 | x0 = m0 + P0_chol @ np.random.randn(3)
66 | xx[mc] = euler_maruyama(x0, dt, num_steps=num_steps)
67 |
68 | Ex = np.mean(xx, axis=0)[:, 0]
69 | Ey = np.mean(xx, axis=0)[:, 2]
70 | Exy = np.mean(xx[:, :, 0] * xx[:, :, 2], axis=0)
71 |
72 | cov = Exy - Ex * Ey
73 |
74 | # Plot
75 | plt.figure(figsize=(7, 4))
76 | plt.plot(tt, cov, c='black', linewidth=3, label=r'$\mathrm{Cov}\,\big[U^1_0(t), U^3_1(t)\big]$')
77 |
78 | plt.grid(linestyle='--', alpha=0.3, which='both')
79 |
80 | plt.xlabel('$t$')
81 | plt.ylabel(r'$\mathrm{Cov}\,\big[U^1_0(t), U^3_1(t)\big]$')
82 | plt.xlim(0, end_T)
83 | plt.yticks([0.0, 0.1, 0.2, 0.3, 0.4, 0.5])
84 |
85 | plt.legend(loc='upper right', fontsize=20)
86 |
87 | plt.tight_layout(pad=0.1)
88 |
89 | plt.savefig(os.path.join(path_figs, 'vanishing-cov.pdf'))
90 | # plt.show()
91 |
--------------------------------------------------------------------------------
/scripts/TME_positive_definite_softplus.py:
--------------------------------------------------------------------------------
1 | # This will generate Figure 3.1 in the thesis.
2 | #
3 | # Zheng Zhao 2020
4 | #
5 | import os
6 | import sympy
7 | import numpy as np
8 | import matplotlib.pyplot as plt
9 | import tme.base_sympy as tme
10 |
11 | from sympy import lambdify
12 | from matplotlib.ticker import MultipleLocator
13 |
14 | if __name__ == '__main__':
15 |
16 | # Initial value and paras
17 | np.random.seed(666)
18 | x0 = np.array([[0.],
19 | [0.]])
20 |
21 | # Example SDE
22 | kappa = sympy.Symbol('k')
23 | x = sympy.MatrixSymbol('x', 2, 1)
24 | f = sympy.Matrix([[sympy.log(1 + sympy.exp(x[0])) + kappa * x[1]],
25 | [sympy.log(1 + sympy.exp(x[1])) + kappa * x[0]]])
26 | L = sympy.eye(2)
27 | dt_sym = sympy.Symbol('dt')
28 |
29 | # TME
30 | tme_mean, tme_cov = tme.mean_and_cov(x, f, L, dt_sym,
31 | order=2, simp=True)
32 |
33 | # Cov
34 | cov_func = lambdify([x, kappa, dt_sym], tme_cov, 'numpy')
35 |
36 | # Compute
37 | # xx = np.linspace(0, 1, 200)
38 | kk = np.linspace(-4, 4, 500)
39 | dts = np.linspace(0.01, 6, 500)
40 |
41 | mineigs = np.zeros((kk.size, dts.size))
42 |
43 | for i in range(kk.size):
44 | for j in range(dts.size):
45 | eig, _ = np.linalg.eigh(cov_func(x0, kk[i], dts[j]))
46 | mineigs[i, j] = eig.min()
47 |
48 | # Plot
49 | path_figs = '../thesis/figs'
50 | plt.rcParams.update({
51 | 'text.usetex': True,
52 | 'text.latex.preamble': r'\usepackage{fouriernc}',
53 | 'font.family': "serif",
54 | 'font.serif': 'New Century Schoolbook',
55 | 'font.size': 18})
56 |
57 | fig = plt.figure()
58 |
59 | grid_dt, grid_k = np.meshgrid(dts, kk)  # grid_dt holds the dt values, grid_k the kappa values
60 |
61 | mineigs_crop = mineigs.copy()
62 | mineigs_crop[mineigs_crop < 0] = -1.
63 |
64 | ax = plt.axes()
65 | ax.xaxis.set_major_locator(MultipleLocator(1))
66 | ax.yaxis.set_major_locator(MultipleLocator(1))
67 |
68 | cnt = plt.contourf(grid_k, grid_dt, mineigs_crop, cmap=plt.cm.Blues_r)
69 | for c in cnt.collections:
70 | c.set_edgecolor("face")
71 |
72 | cbar = plt.colorbar()
73 | cbar_ticks = [tick.get_text() for tick in cbar.ax.get_yticklabels()]
74 | cbar_ticks[0] = '<0'
75 |
76 | cbar.ax.set_yticklabels(cbar_ticks)
77 |
78 | plt.axvline(-0.5, c='red', linestyle='--', alpha=0.5)
79 | plt.axvline(0.5, c='red', linestyle='--', alpha=0.5)
80 |
81 | plt.xlabel(r'$\kappa$')
82 | plt.ylabel(r'$\Delta t$')
83 | plt.title(r'$\lambda_{\mathrm{min}}(\Sigma_2(\Delta t))$')
84 |
85 | plt.tight_layout(pad=0.1)
86 |
87 | # plt.show()
88 |
89 | plt.savefig(os.path.join(path_figs, 'tme-softplus-mineigs.pdf'))
90 |
--------------------------------------------------------------------------------
/lectio_praecursoria/zz.cls:
--------------------------------------------------------------------------------
1 | \NeedsTeXFormat{LaTeX2e}
2 | \ProvidesClass{zz}[2020/10/11 Zheng Zhao's minimalism beamer class]
3 |
4 | % Dependencies
5 | \RequirePackage{lastpage}
6 | \RequirePackage{calc}
7 | \RequirePackage[dvipsnames]{xcolor}
8 |
9 | % Commands and definitions
10 | \newlength\titlepagesep \setlength\titlepagesep{0.2cm}
11 | \newlength\titlepageauthorsep \setlength\titlepageauthorsep{0.6cm}
12 | \newlength\footery \setlength\footery{8cm}
13 |
14 | \definecolor{footergray}{gray}{0.5}
15 | \newcommand{\setcmapBeijing}{%
16 | \definecolor{frametitlecolor}{gray}{0.2}
17 | \definecolor{mastercolour}{named}{RoyalPurple}
18 | \definecolor{secondcolour}{RGB}{255, 121, 19}
19 | }
20 | \newcommand{\setcmapHelsinki}{%
21 | \definecolor{frametitlecolor}{gray}{0.2}
22 | \definecolor{mastercolour}{RGB}{66, 140, 212}
23 | \definecolor{secondcolour}{RGB}{255, 156, 218}
24 | }
25 | \newcommand{\setcmapReykjavik}{%
26 | \definecolor{frametitlecolor}{gray}{0.2}
27 | \definecolor{mastercolour}{gray}{0.4}
28 | \definecolor{secondcolour}{gray}{0.4}
29 | }
30 | \setcmapHelsinki
31 |
32 | \newif\ifseriffont\seriffontfalse
33 | \newif\iffullfooter\fullfooterfalse
34 |
35 | % Parse options and load beamer
36 | \DeclareOption{garamond}{%
37 | }
38 | \DeclareOption{seriffont}{\seriffonttrue}
39 | \DeclareOption{fullfooter}{\fullfootertrue}
40 | \DeclareOption{cmap=Beijing}{\setcmapBeijing}
41 | \DeclareOption{cmap=Helsinki}{\setcmapHelsinki}
42 | \DeclareOption*{\PassOptionsToClass{\CurrentOption}{beamer}}
43 | \ProcessOptions\relax
44 | \LoadClass{beamer}
45 |
46 | % Commands and definitions that depend on options
47 | \renewcommand{\titlepage}{%
48 | {%
49 | \setbeamertemplate{footline}{}
50 | \frame[t, noframenumbering]{%
51 | \vspace{2cm}
52 | \centering{
53 | {\Large \scshape \textbf{\inserttitle}}\\[0.4cm]
54 | \insertsubtitle\\[1.8cm]
55 | \insertauthor\\[\titlepageauthorsep]
56 | {\scriptsize \insertinstitute}\\[\titlepagesep]
57 | {\scriptsize \insertdate}
58 | }
59 | }
60 | }
61 | }
62 |
63 | % Beamer customisations
64 | \iffullfooter
65 | \newcommand{\footertext}{\beamer@shorttitle}
66 | \else
67 | \newcommand{\footertext}{~}
68 | \fi
69 | \setbeamertemplate{footline}{%
70 | \noindent
71 | \begin{minipage}{.45\paperwidth}
72 | \vspace{-0.5cm}
73 | \hspace{\beamer@leftmargin}
74 | \footertext
75 | \end{minipage}
76 | \hfill
77 | \begin{minipage}{.45\paperwidth}
78 | \vspace{-0.5cm}
79 | \hspace{.35\paperwidth minus \beamer@rightmargin}
80 | {%
81 | \color{footergray}
82 | \tiny
83 | \arabic{page}/\pageref{LastPage}
84 | }
85 | \end{minipage}
86 | }
87 |
88 | \setbeamertemplate{navigation symbols}{}
89 | \setbeamertemplate{frametitle}{%
90 | {%
91 | \color{frametitlecolor}
92 | \vspace{0.2cm}\insertframetitle\\[-0.15cm]
93 | \rule{\widthof{\insertframetitle}}{1.5pt}
94 | }
95 | }
96 |
97 | % Fonts
98 | \ifseriffont
99 | % \usefonttheme{structuresmallcapsserif}
100 | \usefonttheme{serif}
101 | \fi
102 | \setbeamerfont{section title}{size=\normalsize,series=\bfseries}
103 | \setbeamerfont{frametitle}{series=\bfseries, shape=\scshape, family=\rmfamily}
104 | %\setbeamerfont{framesubtitle}{series=\rmfamily}
105 | %\setbeamerfont{caption}{series=\rmfamily}
106 | %\AtBeginDocument{\rmfamily}
107 |
108 | % Colours
109 | \setbeamercolor{structure}{fg=mastercolour}
110 | \setbeamercolor{alerted text}{fg=secondcolour}
111 | \setbeamercolor{example text}{fg=mastercolour}
112 |
--------------------------------------------------------------------------------
/scripts/showcase_gp_sinusoidal.py:
--------------------------------------------------------------------------------
1 | # Show the GP regression on a composite sinusoidal signal, and generate Figure 1.1.
2 | #
3 | # Zheng Zhao 2021
4 | #
5 | import os
6 | import math
7 | import numpy as np
8 | import gpflow
9 | import matplotlib.pyplot as plt
10 | from typing import Tuple
11 | from matplotlib.ticker import MultipleLocator
12 |
13 | path_figs = '../thesis/figs'
14 | np.random.seed(666)
15 | plt.rcParams.update({
16 | 'text.usetex': True,
17 | 'text.latex.preamble': r'\usepackage{fouriernc}',
18 | 'font.family': "serif",
19 | 'font.serif': 'New Century Schoolbook',
20 | 'font.size': 20})
21 |
22 |
23 | def sinu(t: np.ndarray,
24 | r: float) -> Tuple[np.ndarray, np.ndarray]:
25 | """Composite sinusoidal signal. Return the signal and an noisy measurement of it.
26 | """
27 | ft = np.sin(7 * np.pi * np.cos(2 * np.pi * (t ** 2))) ** 2 / \
28 | (np.cos(5 * np.pi * t) + 2)
29 | return ft, ft + math.sqrt(r) * np.random.randn(*t.shape)
30 |
31 |
32 | # Simulate measurements
33 | t = np.linspace(0, 1, 400).reshape(-1, 1)
34 | r = 0.004
35 | ft, y = sinu(t, r)
36 |
37 | # GPflow
38 | ell = 1.
39 | sigma = 1.
40 |
41 | m12 = gpflow.kernels.Matern12(lengthscales=ell, variance=sigma)
42 | m32 = gpflow.kernels.Matern32(lengthscales=ell, variance=sigma)
43 | m52 = gpflow.kernels.Matern52(lengthscales=ell, variance=sigma)
44 |
45 | # Plots
46 | for name, label, cov in zip(['m12', 'm32', 'm52'],
47 | [r'Mat\'ern $1\,/\,2$', r'Mat\'ern $3\,/\,2$', r'Mat\'ern $5\,/\,2$'],
48 | [m12, m32, m52]):
49 | print(f'GP regression with {name} cov function')
50 | model = gpflow.models.GPR(data=(t, y), kernel=cov, mean_function=None)
51 | model.likelihood.variance.assign(r)
52 |
53 | opt = gpflow.optimizers.Scipy()
54 | opt_logs = opt.minimize(model.training_loss, model.trainable_variables,
55 | method='L-BFGS-B',
56 | options={'disp': True})
57 |
58 | m, P = model.predict_f(t)
59 |
60 | # Plot and save
61 | fig = plt.figure(figsize=(16, 8))
62 | ax = plt.axes()
63 | plt.plot(t, ft, c='black', alpha=0.8, linestyle='--', linewidth=2, label='True signal')
64 | plt.scatter(t, y, s=15, c='black', edgecolors='none', alpha=0.3, label='Measurements')
65 | plt.plot(t, m, c='black', linewidth=3, label=label)
66 | plt.fill_between(
67 | t[:, 0],
68 | m[:, 0] - 1.96 * np.sqrt(P[:, 0]),
69 | m[:, 0] + 1.96 * np.sqrt(P[:, 0]),
70 | color='black',
71 | edgecolor='none',
72 | alpha=0.2,
73 | )
74 |
75 | plt.grid(linestyle='--', alpha=0.3, which='both')
76 |
77 | plt.xlim(0, 1)
78 | plt.ylim(-0.2, 1.2)
79 |
80 | ax.xaxis.set_major_locator(MultipleLocator(0.2))
81 | ax.xaxis.set_minor_locator(MultipleLocator(0.1))
82 | ax.xaxis.set_major_formatter('{x:.1f}')
83 |
84 | ax.yaxis.set_major_locator(MultipleLocator(0.4))
85 | ax.yaxis.set_minor_locator(MultipleLocator(0.1))
86 | ax.yaxis.set_major_formatter('{x:.1f}')
87 |
88 | plt.xlabel('$t$', fontsize=24)
89 | plt.title('$\\ell \\approx {:.2f}, \\quad \\sigma \\approx {:.2f}$'.format(cov.lengthscales.numpy(),
90 | cov.variance.numpy()))
91 | plt.legend(loc='upper left', fontsize='large')
92 |
93 | plt.tight_layout(pad=0.1)
94 |
95 | filename = 'gp-fail-example-sinu-' + name + '.pdf'
96 | plt.savefig(os.path.join(path_figs, filename))
97 |
--------------------------------------------------------------------------------
/thesis_latex/list_of_papers.tex:
--------------------------------------------------------------------------------
1 | %!TEX root = dissertation.tex
2 | \addpublication{Zheng Zhao, Toni Karvonen, Roland Hostettler, and Simo S\"{a}rkk\"{a}}{Taylor moment expansion for continuous-discrete Gaussian filtering}{IEEE Transactions on Automatic Control}{Volume 66, Issue 9, Pages 4460--4467}{December}{2020}{Zheng Zhao, Toni Karvonen, Roland Hostettler, and Simo S\"{a}rkk\"{a}}{paperTME}
3 | \addcontribution{Zheng Zhao wrote the article and produced the results. The stability analysis is mainly due to Toni Karvonen. Roland Hostettler gave useful comments. Simo S\"{a}rkk\"{a} contributed the idea.}
4 | \adderrata{In Example 7, the coefficient $\Phi_{x,2}$ should be multiplied by a factor of $2$.}
5 |
6 | \addpublication{Zheng Zhao, Muhammad Emzir, and Simo S\"{a}rkk\"{a}}{Deep state-space Gaussian processes}{Statistics and Computing}{Volume 31, Issue 6, Article number 75, Pages 1--26}{September}{2021}{Zheng Zhao, Muhammad Emzir, and Simo S\"{a}rkk\"{a}}{paperSSDGP}
7 | \addcontribution{Zheng Zhao wrote the article and produced the results. Muhammad Emzir and Simo S\"{a}rkk\"{a} gave useful comments.}
8 |
9 | \addpublication{Zheng Zhao, Simo S\"{a}rkk\"{a}, and Ali Bahrami Rad}{Kalman-based spectro-temporal ECG analysis using deep convolutional networks for atrial fibrillation detection}{Journal of Signal Processing Systems}{Volume 92, Issue 7, Pages 621--636}{April}{2020}{Zheng Zhao, Simo S\"{a}rkk\"{a}, and Ali Bahrami Rad}{paperKFSECG}
10 | \addcontribution{Zheng Zhao wrote the article and produced the results. Ali Bahrami Rad helped with the experiments. Simo S\"{a}rkk\"{a} came up with the spectro-temporal idea.}
11 |
12 | \addpublication[conference]{Zheng Zhao, Filip Tronarp, Roland Hostettler, and Simo S\"{a}rkk\"{a}}{State-space Gaussian process for drift estimation in stochastic differential equations}{Proceedings of the 45th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)}{Barcelona, Spain, Pages 5295--5299}{May}{2020}{IEEE}{paperDRIFT}
13 | \addcontribution{Zheng Zhao wrote the article and produced the results. Filip Tronarp provided code for the iterated posterior linearisation filter. Roland Hostettler gave useful comments. The idea was due to Simo S\"{a}rkk\"{a}.}
14 |
15 | \addpublication[conference]{Zheng Zhao, Simo S\"{a}rkk\"{a}, and Ali Bahrami Rad}{Spectro-temporal ECG analysis for atrial fibrillation detection}{Proceedings of the IEEE 28th International Workshop on Machine Learning for Signal Processing (MLSP)}{Aalborg, Denmark, 6 pages}{September}{2018}{IEEE}{paperKFSECGCONF}
16 | \addcontribution{Zheng Zhao wrote the article and produced the results. Ali Bahrami Rad helped with the experiments. Simo S\"{a}rkk\"{a} came up with the spectro-temporal idea.}
17 |
18 | \addpublication[accepted]{Sarang Thombre, Zheng Zhao, Henrik Ramm-Schmidt, Jos\'{e} M. Vallet Garc\'{\i}a, Tuomo Malkam\"{a}ki, Sergey Nikolskiy, Toni Hammarberg, Hiski Nuortie, M. Zahidul H. Bhuiyan, Simo S\"{a}rkk\"{a}, and Ville V. Lehtola}{Sensors and AI techniques for situational awareness in autonomous ships: a review}{IEEE Transactions on Intelligent Transportation Systems}{20 pages}{September}{2020}{IEEE}{paperMARITIME}
19 | \addcontribution{Zheng Zhao wrote the reviews of AI techniques and produced corresponding results.}
20 |
21 | \addpublication[submitted]{Zheng Zhao, Rui Gao, and Simo S\"{a}rkk\"{a}}{Hierarchical non-stationary temporal Gaussian processes with $L^1$-regularization}{Statistics and Computing}{20 pages}{May}{2021}{Zheng Zhao, Rui Gao, and Simo S\"{a}rkk\"{a}}{paperRNSSGP}
22 | \addcontribution{Zheng Zhao wrote the article and produced the results. Rui Gao contributed the convergence analysis. Simo S\"{a}rkk\"{a} gave useful comments.}
23 |
--------------------------------------------------------------------------------
/scripts/draw_ssdgp_m12.py:
--------------------------------------------------------------------------------
1 | # Draw Matern 1/2 SS-DGP samples and generate Figure 4.4 in the thesis
2 | #
3 | # Zheng Zhao
4 | #
5 | import os
6 | import math
7 | import numpy as np
8 | import matplotlib.pyplot as plt
9 |
10 | l2 = 2.
11 | s2 = 2.
12 | l3 = 2.
13 | s3 = 2.
14 |
15 |
16 | def g(x):
17 | """Transformation function
18 | """
19 | return np.exp(x)
20 |
21 |
22 | def a_b(x):
23 | """Return SDE drift and dispersion function a and b
24 | """
25 | return -1 / np.array([g(x[1]), l2, l3]) * x, \
26 | math.sqrt(2) * np.diag([g(x[2]) / np.sqrt(g(x[1])), s2 / np.sqrt(l2), s3 / np.sqrt(l3)])
27 |
28 |
29 | def euler_maruyama(x0, dt, num_steps, int_steps):
30 | xx = np.zeros(shape=(num_steps, x0.shape[0]))
31 | x = x0
32 | ddt = dt / int_steps
33 | for i in range(num_steps):
34 | for j in range(int_steps):
35 | ax, bx = a_b(x)
36 | x = x + ax * ddt + np.sqrt(ddt) * bx @ np.random.randn(3)
37 | xx[i] = x
38 | return xx
39 |
40 |
41 | if __name__ == '__main__':
42 | path_figs = '../thesis/figs'
43 | plt.rcParams.update({
44 | 'text.usetex': True,
45 | 'text.latex.preamble': r'\usepackage{fouriernc}',
46 | 'font.family': "serif",
47 | 'font.serif': 'New Century Schoolbook',
48 | 'font.size': 20})
49 |
50 | np.random.seed(2020)
51 |
52 | end_T = 10
53 | num_steps = 1000
54 | int_steps = 10
55 | dt = end_T / num_steps
56 | tt = np.linspace(dt, end_T, num_steps)
57 |
58 | num_mc = 2
59 |
60 | # Compute Euler--Maruyama
61 | xx = np.zeros(shape=(num_mc, num_steps, 3))
62 | for mc in range(num_mc):
63 | x0 = np.random.randn(3)
64 | xx[mc] = euler_maruyama(x0, dt, num_steps=num_steps, int_steps=int_steps)
65 |
66 | colours = ('black', 'tab:blue', 'tab:purple')
67 | markers = ('.', 'x', '1')
68 |
69 | # Plot u
70 | fig, (ax1, ax2, ax3) = plt.subplots(nrows=3, figsize=(12, 13), sharex=True)
71 | for mc in range(num_mc):
72 | ax1.plot(tt, xx[mc, :, 0],
73 | linewidth=2, c=colours[mc],
74 | marker=markers[mc], markevery=200, markersize=16,
75 | label=f'Sample {mc + 1}')
76 |
77 | ax1.grid(linestyle='--', alpha=0.3, which='both')
78 |
79 | ax1.set_ylabel('$U^1_0(t)$')
80 | ax1.set_xlim(0, end_T)
81 | ax1.set_xticks(np.arange(0, end_T + 1, 1))
82 |
83 | ax1.legend(ncol=2, loc='upper left', fontsize=18)
84 |
85 | # Plot ell
86 | for mc in range(num_mc):
87 | ax2.plot(tt, xx[mc, :, 1],
88 | linewidth=2, c=colours[mc],
89 | marker=markers[mc], markevery=200, markersize=16,
90 | label=f'Sample {mc + 1}')
91 |
92 | ax2.grid(linestyle='--', alpha=0.3, which='both')
93 |
94 | ax2.set_ylabel('$U^2_1(t)$')
95 | ax2.set_xlim(0, end_T)
96 | ax2.set_xticks(np.arange(0, end_T + 1, 1))
97 |
98 | # Plot sigma
99 | for mc in range(num_mc):
100 | ax3.plot(tt, xx[mc, :, 2],
101 | linewidth=2, c=colours[mc],
102 | marker=markers[mc], markevery=200, markersize=16,
103 | label=f'Sample {mc + 1}')
104 |
105 | ax3.grid(linestyle='--', alpha=0.3, which='both')
106 |
107 | ax3.set_xlabel('$t$')
108 | ax3.set_ylabel('$U^3_1(t)$')
109 | ax3.set_xlim(0, end_T)
110 | ax3.set_xticks(np.arange(0, end_T + 1, 1))
111 |
112 | plt.tight_layout(pad=0.1)
113 | plt.subplots_adjust(bottom=0.053)
114 | plt.savefig(os.path.join(path_figs, 'samples_ssdgp_m12.pdf'))
115 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Doctoral dissertation of Zheng Zhao
2 |
3 |
4 | [](https://github.com/zgbkdlm/dissertation/actions/workflows/latex_compile.yml)
5 |
6 | This thesis is mainly concerned with state-space methods for a class of deep Gaussian process (DGP) regression problems. As an example, one can think of a family of DGPs as solutions to stochastic differential equations (SDEs), and view their regression problems as filtering and smoothing problems. Additionally, this thesis presents a few applications of (D)GPs, such as system identification of SDEs and spectro-temporal signal analysis.
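
To make the state-space viewpoint above concrete, here is a minimal, generic sketch of a single Kalman filter prediction/update step (textbook equations in plain NumPy, not code from this repository; `kf_step` and its argument names are illustrative). State-space (D)GP regression amounts to iterating such steps over the data, with the model matrices obtained from an SDE discretisation:

```python
import numpy as np

# One textbook Kalman filter step for the linear-Gaussian model
#   x_k = A x_{k-1} + q_k,  q_k ~ N(0, Q),
#   y_k = H x_k + r_k,      r_k ~ N(0, R).
def kf_step(m, P, y, A, Q, H, R):
    # Prediction: propagate the posterior mean and covariance through the dynamics
    m_pred = A @ m
    P_pred = A @ P @ A.T + Q
    # Update: condition the prediction on the measurement y
    S = H @ P_pred @ H.T + R           # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)  # Kalman gain
    m_new = m_pred + K @ (y - H @ m_pred)
    P_new = P_pred - K @ S @ K.T
    return m_new, P_new
```

For the non-linear SDEs used in the thesis, the exact prediction step is intractable and is replaced by an approximation such as the Taylor moment expansion.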
7 |
8 | Supervisor: Prof. Simo Särkkä.
9 |
10 | Pre-examiners: Prof. Kody J. H. Law from the University of Manchester and Prof. David Duvenaud from the University of Toronto.
11 |
12 | Opponent: Prof. Manfred Opper from the University of Birmingham.
13 |
14 | The public defence of the thesis will be streamed online on December 10, 2021 at noon (Helsinki time) via the Zoom link https://aalto.zoom.us/j/67529212279. It is free and open to the public; everyone is welcome to attend.
15 |
16 | More details regarding the thesis itself can be found in its title pages.
17 |
18 | # Contents
19 |
20 | The dissertation is in `./dissertation.pdf`. Feel free to download and read~~
21 |
22 | **Note that you may also find an "official" version in [aaltodoc](http://urn.fi/URN:ISBN:978-952-64-0603-9) published by Aalto University, the content of which is identical to `./dissertation.pdf`. However, the aaltodoc version has many readability issues, so please use `./dissertation.pdf` instead.**
23 |
24 | 1. `./dissertation.pdf`. The PDF of the thesis.
25 | 2. `./errata.md`. Errata of the thesis.
26 | 3. `./cover`. This folder contains a Python script that generates the cover image.
27 | 4. `./lectio_praecursoria`. This folder contains the presentation at the public defence of the thesis.
28 | 5. `./scripts`. This folder contains Python scripts that are used to generate some of the figures in the thesis.
29 | 6. `./thesis_latex`. This folder contains the LaTeX source of the thesis. Compiling the TeX files here generates the same PDF as `./dissertation.pdf`.
30 |
31 | # Satellite repositories
32 |
33 | 1. [https://github.com/zgbkdlm/ssdgp](https://github.com/zgbkdlm/ssdgp) contains an implementation of state-space deep Gaussian processes.
34 | 2. [https://github.com/zgbkdlm/tme](https://github.com/zgbkdlm/tme) and [https://github.com/zgbkdlm/tmefs](https://github.com/zgbkdlm/tmefs) contain implementations of the Taylor moment expansion (TME) method and its filter and smoother applications.
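
For a self-contained flavour of what the SS-DGP scripts in `./scripts` do, the following minimal NumPy sketch draws one Matérn 1/2 SS-DGP prior sample by Euler–Maruyama, mirroring the drift/dispersion construction in `scripts/draw_ssdgp_m12.py` (the parameter values and the random seed here are illustrative, not those used for the thesis figures):

```python
import numpy as np

# Illustrative hyperparameters of the two latent GPs (length scale and magnitude).
l2 = s2 = l3 = s3 = 2.0

def a_b(x):
    """Drift a(x) and dispersion b(x) of the 3-dimensional SS-DGP SDE."""
    g = np.exp  # the positive transformation applied to the latent GPs
    a = -1 / np.array([g(x[1]), l2, l3]) * x
    b = np.sqrt(2) * np.diag([g(x[2]) / np.sqrt(g(x[1])),
                              s2 / np.sqrt(l2),
                              s3 / np.sqrt(l3)])
    return a, b

rng = np.random.default_rng(2020)
num_steps, int_steps, dt = 1000, 10, 0.01
ddt = dt / int_steps  # finer inner step for numerical stability
x = rng.standard_normal(3)
xs = np.zeros((num_steps, 3))
for i in range(num_steps):
    for _ in range(int_steps):
        ax, bx = a_b(x)
        x = x + ax * ddt + np.sqrt(ddt) * bx @ rng.standard_normal(3)
    xs[i] = x  # xs[:, 0] is the SS-DGP sample; xs[:, 1:] are the latent GPs
```

Here the first state component is the DGP itself, whose length scale and magnitude are driven by the two latent GP components.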
35 |
36 | # Citation
37 |
38 | Bibtex:
39 |
40 | ```bibtex
41 | @phdthesis{Zhao2021Thesis,
42 | title = {State-space deep Gaussian processes with applications},
43 | author = {Zheng Zhao},
44 | school = {Aalto University},
45 | year = {2021},
46 | }
47 | ```
48 |
49 | Plain text:
50 |
51 | Zheng Zhao. *State-space deep Gaussian processes with applications*. PhD thesis, Aalto University, 2021.
52 |
53 | # License
54 |
55 | Unless otherwise stated, all rights belong to the author, Zheng Zhao. This repository consists of files covered by different licenses; please check the individual licenses before you use them.
56 |
57 | You are free to download, display, and print `./dissertation.pdf` for your own personal use. Commercial use of it is prohibited.
58 |
59 | # Acknowledgement
60 |
61 | I would like to thank [Adrien (Monte) Corenflos](https://adriencorenflos.github.io/), [Christos Merkatas](https://cmerkatas.github.io/), [Dennis Yeung](https://www.linkedin.com/in/dptyeung/?originalSubdomain=fi), and [Sakira Hassan](https://sakira.github.io/) for their time and effort in reviewing and checking the language of the thesis.
62 |
63 | # Contact
64 |
65 | Zheng Zhao, zheng.zhao@aalto.fi
66 |
--------------------------------------------------------------------------------
/scripts/showcase_gp_rectangular.py:
--------------------------------------------------------------------------------
1 | # Show the GP regression on a magnitude-varying rectangular wave signal, and generate Figure 1.1.
2 | #
3 | # Zheng Zhao 2021
4 | #
5 | import os
6 | import math
7 | import numpy as np
8 | import gpflow
9 | import matplotlib.pyplot as plt
10 |
11 | from typing import Tuple
12 | from matplotlib.ticker import MultipleLocator
13 |
14 | path_figs = '../thesis_latex/figs'
15 | np.random.seed(666)
16 | plt.rcParams.update({
17 | 'text.usetex': True,
18 | 'text.latex.preamble': r'\usepackage{fouriernc}',
19 | 'font.family': "serif",
20 | 'font.serif': 'New Century Schoolbook',
21 | 'font.size': 20})
22 |
23 |
24 | def rect(t: np.ndarray,
25 | r: float) -> Tuple[np.ndarray, np.ndarray]:
26 |     """Rectangular wave signal. Return the signal and a noisy measurement of it.
27 | """
28 | tau = (t - np.min(t)) / (np.max(t) - np.min(t))
29 |
30 | p = np.linspace(1 / 6, 5 / 6, 5)
31 |
32 | y = np.zeros_like(t)
33 | y[(tau >= 0) & (tau < p[0])] = 0
34 | y[(tau >= p[0]) & (tau < p[1])] = 1
35 | y[(tau >= p[1]) & (tau < p[2])] = 0
36 | y[(tau >= p[2]) & (tau < p[3])] = 0.6
37 | y[(tau >= p[3]) & (tau < p[4])] = 0
38 | y[tau >= p[4]] = 0.4
39 |
40 | return y, y + math.sqrt(r) * np.random.randn(*t.shape)
41 |
42 |
43 | # Simulate measurements
44 | t = np.linspace(0, 1, 400).reshape(-1, 1)
45 | r = 0.004
46 | ft, y = rect(t, r)
47 |
48 | # GPflow
49 | ell = 1.
50 | sigma = 1.
51 |
52 | m12 = gpflow.kernels.Matern12(lengthscales=ell, variance=sigma)
53 | m32 = gpflow.kernels.Matern32(lengthscales=ell, variance=sigma)
54 | m52 = gpflow.kernels.Matern52(lengthscales=ell, variance=sigma)
55 | rbf = gpflow.kernels.SquaredExponential(lengthscales=ell, variance=sigma)
56 |
57 | # Plots
58 | for name, label, cov in zip(['m12', 'm32', 'm52', 'rbf'],
59 | [r'Mat\'ern $1\,/\,2$', r'Mat\'ern $3\,/\,2$', r'Mat\'ern $5\,/\,2$', r'RBF'],
60 | [m12, m32, m52, rbf]):
61 | print(f'GP regression with {name} cov function')
62 | model = gpflow.models.GPR(data=(t, y), kernel=cov, mean_function=None)
63 | model.likelihood.variance.assign(r)
64 |
65 | opt = gpflow.optimizers.Scipy()
66 | opt_logs = opt.minimize(model.training_loss, model.trainable_variables,
67 | method='L-BFGS-B',
68 | options={'disp': True})
69 |
70 | m, P = model.predict_f(t)
71 |
72 | # Plot and save
73 | fig = plt.figure(figsize=(16, 8))
74 | ax = plt.axes()
75 | plt.plot(t, ft, c='black', alpha=0.8, linestyle='--', linewidth=2, label='True signal')
76 | plt.scatter(t, y, s=15, c='black', edgecolors='none', alpha=0.3, label='Measurements')
77 | plt.plot(t, m, c='black', linewidth=3, label=label)
78 | plt.fill_between(
79 | t[:, 0],
80 | m[:, 0] - 1.96 * np.sqrt(P[:, 0]),
81 | m[:, 0] + 1.96 * np.sqrt(P[:, 0]),
82 | color='black',
83 | edgecolor='none',
84 | alpha=0.2,
85 | )
86 |
87 | plt.grid(linestyle='--', alpha=0.3, which='both')
88 |
89 | plt.xlim(0, 1)
90 | plt.ylim(-0.2, 1.2)
91 |
92 | ax.xaxis.set_major_locator(MultipleLocator(0.2))
93 | ax.xaxis.set_minor_locator(MultipleLocator(0.1))
94 | ax.xaxis.set_major_formatter('{x:.1f}')
95 |
96 | ax.yaxis.set_major_locator(MultipleLocator(0.4))
97 | ax.yaxis.set_minor_locator(MultipleLocator(0.1))
98 | ax.yaxis.set_major_formatter('{x:.1f}')
99 |
100 | plt.xlabel('$t$', fontsize=24)
101 | plt.title('$\\ell \\approx {:.2f}, \\quad \\sigma \\approx {:.2f}$'.format(cov.lengthscales.numpy(),
102 | cov.variance.numpy()))
103 | plt.legend(loc='upper right', fontsize='large')
104 |
105 | plt.tight_layout(pad=0.1)
106 |
107 | filename = 'gp-fail-example-' + name + '.pdf'
108 | plt.savefig(os.path.join(path_figs, filename))
109 |
--------------------------------------------------------------------------------
/scripts/draw_ssdgp_m32.py:
--------------------------------------------------------------------------------
1 | # Draw Matern 3/2 SS-DGP samples and generate Figure 4.4 in the thesis
2 | #
3 | # Zheng Zhao
4 | #
5 | import os
6 | import math
7 | import numpy as np
8 | import matplotlib.pyplot as plt
9 |
10 | l2 = 0.5
11 | s2 = 2.
12 | l3 = 0.5
13 | s3 = 2.
14 |
15 |
16 | def g(x):
17 | """Transformation function
18 | """
19 | return np.exp(x)
20 |
21 |
22 | def a_b(x):
23 | """Return SDE drift and dispersion function a and b
24 | """
25 | kappa1 = math.sqrt(3) / g(x[2])
26 | kappa2 = math.sqrt(3) / l2
27 | kappa3 = math.sqrt(3) / l3
28 | return np.array([[0, 1, 0, 0, 0, 0],
29 | [- kappa1 ** 2, -2 * kappa1, 0, 0, 0, 0],
30 | [0, 0, 0, 1, 0, 0],
31 | [0, 0, - kappa2 ** 2, -2 * kappa2, 0, 0],
32 | [0, 0, 0, 0, 0, 1],
33 | [0, 0, 0, 0, - kappa3 ** 2, -2 * kappa3]]) @ x, \
34 | 2 * np.diag([0.,
35 | g(x[4]) * kappa1 ** 1.5,
36 | 0.,
37 | s2 * kappa2 ** 1.5,
38 | 0.,
39 | s3 * kappa3 ** 1.5])
40 |
41 |
42 | def euler_maruyama(x0, dt, num_steps, int_steps):
43 | xx = np.zeros(shape=(num_steps, x0.shape[0]))
44 | x = x0
45 | ddt = dt / int_steps
46 | for i in range(num_steps):
47 | for j in range(int_steps):
48 | ax, bx = a_b(x)
49 | x = x + ax * ddt + np.sqrt(ddt) * bx @ np.random.randn(6)
50 | xx[i] = x
51 | return xx
52 |
53 |
54 | if __name__ == '__main__':
55 |
56 |     path_figs = '../thesis_latex/figs'
57 | plt.rcParams.update({
58 | 'text.usetex': True,
59 | 'text.latex.preamble': r'\usepackage{fouriernc}',
60 | 'font.family': "serif",
61 | 'font.serif': 'New Century Schoolbook',
62 | 'font.size': 20})
63 |
64 | np.random.seed(2020)
65 |
66 | end_T = 10
67 | num_steps = 1000
68 | int_steps = 10
69 | dt = end_T / num_steps
70 | tt = np.linspace(dt, end_T, num_steps)
71 |
72 | num_mc = 3
73 |
74 | # Compute Euler--Maruyama
75 | xx = np.zeros(shape=(num_mc, num_steps, 6))
76 | for mc in range(num_mc):
77 | x0 = np.random.randn(6)
78 | xx[mc] = euler_maruyama(x0, dt, num_steps=num_steps, int_steps=int_steps)
79 |
80 | colours = ('black', 'tab:blue', 'tab:purple')
81 | markers = ('.', 'x', '1')
82 |
83 | # Plot u
84 | fig, (ax1, ax2, ax3) = plt.subplots(nrows=3, figsize=(12, 13), sharex=True)
85 | for mc in range(num_mc):
86 | ax1.plot(tt, xx[mc, :, 0],
87 | linewidth=2, c=colours[mc],
88 | marker=markers[mc], markevery=200, markersize=16,
89 | label=f'Sample {mc + 1}')
90 |
91 | ax1.grid(linestyle='--', alpha=0.3, which='both')
92 |
93 | ax1.set_ylabel('$\\overline{U}^1_0(t)$')
94 | ax1.set_xlim(0, end_T)
95 | ax1.set_xticks(np.arange(0, end_T + 1, 1))
96 |
97 | ax1.legend(ncol=3, loc='lower left', fontsize=18)
98 |
99 | # Plot ell
100 | for mc in range(num_mc):
101 | ax2.plot(tt, xx[mc, :, 2],
102 | linewidth=2, c=colours[mc],
103 | marker=markers[mc], markevery=200, markersize=16,
104 |                  label=f'Sample {mc + 1}')
105 |
106 | ax2.grid(linestyle='--', alpha=0.3, which='both')
107 |
108 | ax2.set_ylabel('$\\overline{U}^2_1(t)$')
109 | ax2.set_xlim(0, end_T)
110 | ax2.set_xticks(np.arange(0, end_T + 1, 1))
111 |
112 | # Plot sigma
113 | for mc in range(num_mc):
114 | ax3.plot(tt, xx[mc, :, 4],
115 | linewidth=2, c=colours[mc],
116 | marker=markers[mc], markevery=200, markersize=16,
117 |                  label=f'Sample {mc + 1}')
118 |
119 | ax3.grid(linestyle='--', alpha=0.3, which='both')
120 |
121 | ax3.set_xlabel('$t$')
122 | ax3.set_ylabel('$\\overline{U}^3_1(t)$')
123 | ax3.set_xlim(0, end_T)
124 | ax3.set_xticks(np.arange(0, end_T + 1, 1))
125 |
126 | plt.tight_layout(pad=0.1)
127 | plt.subplots_adjust(bottom=0.053)
128 | plt.savefig(os.path.join(path_figs, 'samples_ssdgp_m32.pdf'))
129 |
--------------------------------------------------------------------------------
/thesis_latex/figs/dgp-example-2.tex:
--------------------------------------------------------------------------------
1 | \tikzset{every picture/.style={line width=0.75pt}} %set default line width to 0.75pt
2 |
3 | \begin{tikzpicture}[x=0.6pt,y=0.6pt,yscale=-1,xscale=1]
4 | %uncomment if require: \path (0,300); %set diagram left start at 0, and has height of 300
5 |
6 | %Shape: Circle [id:dp821654118938725]
7 | \draw [line width=1.5] (130,70) .. controls (130,58.95) and (138.95,50) .. (150,50) .. controls (161.05,50) and (170,58.95) .. (170,70) .. controls (170,81.05) and (161.05,90) .. (150,90) .. controls (138.95,90) and (130,81.05) .. (130,70) -- cycle ;
8 | %Shape: Circle [id:dp7532382575897598]
9 | \draw [line width=1.5] (80,140) .. controls (80,128.95) and (88.95,120) .. (100,120) .. controls (111.05,120) and (120,128.95) .. (120,140) .. controls (120,151.05) and (111.05,160) .. (100,160) .. controls (88.95,160) and (80,151.05) .. (80,140) -- cycle ;
10 | %Shape: Circle [id:dp5902978501813476]
11 | \draw [line width=1.5] (180,140) .. controls (180,128.95) and (188.95,120) .. (200,120) .. controls (211.05,120) and (220,128.95) .. (220,140) .. controls (220,151.05) and (211.05,160) .. (200,160) .. controls (188.95,160) and (180,151.05) .. (180,140) -- cycle ;
12 | %Straight Lines [id:da9016881764549007]
13 | \draw [line width=1.5] (137.17,92.83) -- (110,120) ;
14 | \draw [shift={(140,90)}, rotate = 135] [fill={rgb, 255:red, 0; green, 0; blue, 0 } ][line width=0.08] [draw opacity=0] (13.4,-6.43) -- (0,0) -- (13.4,6.44) -- (8.9,0) -- cycle ;
15 | %Straight Lines [id:da026586992089503658]
16 | \draw [line width=1.5] (162.83,92.83) -- (190,120) ;
17 | \draw [shift={(160,90)}, rotate = 45] [fill={rgb, 255:red, 0; green, 0; blue, 0 } ][line width=0.08] [draw opacity=0] (13.4,-6.43) -- (0,0) -- (13.4,6.44) -- (8.9,0) -- cycle ;
18 | %Straight Lines [id:da8305170432215097]
19 | \draw [line width=1.5] (150,94) -- (150,120) ;
20 | \draw [shift={(150,90)}, rotate = 90] [fill={rgb, 255:red, 0; green, 0; blue, 0 } ][line width=0.08] [draw opacity=0] (13.4,-6.43) -- (0,0) -- (13.4,6.44) -- (8.9,0) -- cycle ;
21 | %Shape: Circle [id:dp5384623722514443]
22 | \draw [line width=1.5] (130,140) .. controls (130,128.95) and (138.95,120) .. (150,120) .. controls (161.05,120) and (170,128.95) .. (170,140) .. controls (170,151.05) and (161.05,160) .. (150,160) .. controls (138.95,160) and (130,151.05) .. (130,140) -- cycle ;
23 | %Shape: Circle [id:dp16437012307038867]
24 | \draw [line width=1.5] (80,210) .. controls (80,198.95) and (88.95,190) .. (100,190) .. controls (111.05,190) and (120,198.95) .. (120,210) .. controls (120,221.05) and (111.05,230) .. (100,230) .. controls (88.95,230) and (80,221.05) .. (80,210) -- cycle ;
25 | %Shape: Circle [id:dp734554865123197]
26 | \draw [line width=1.5] (240,70) .. controls (240,58.95) and (248.95,50) .. (260,50) .. controls (271.05,50) and (280,58.95) .. (280,70) .. controls (280,81.05) and (271.05,90) .. (260,90) .. controls (248.95,90) and (240,81.05) .. (240,70) -- cycle ;
27 | %Shape: Circle [id:dp48143471465976706]
28 | \draw [line width=1.5] (240,140) .. controls (240,128.95) and (248.95,120) .. (260,120) .. controls (271.05,120) and (280,128.95) .. (280,140) .. controls (280,151.05) and (271.05,160) .. (260,160) .. controls (248.95,160) and (240,151.05) .. (240,140) -- cycle ;
29 | %Straight Lines [id:da6721182832003427]
30 | \draw [line width=1.5] (100,164) -- (100,171) -- (100,190) ;
31 | \draw [shift={(100,160)}, rotate = 90] [fill={rgb, 255:red, 0; green, 0; blue, 0 } ][line width=0.08] [draw opacity=0] (13.4,-6.43) -- (0,0) -- (13.4,6.44) -- (8.9,0) -- cycle ;
32 | %Straight Lines [id:da34999125713295975]
33 | \draw [line width=1.5] (260,94) -- (260,120) ;
34 | \draw [shift={(260,90)}, rotate = 90] [fill={rgb, 255:red, 0; green, 0; blue, 0 } ][line width=0.08] [draw opacity=0] (13.4,-6.43) -- (0,0) -- (13.4,6.44) -- (8.9,0) -- cycle ;
35 | %Shape: Rectangle [id:dp18541931062860906]
36 | \draw [dash pattern={on 5.63pt off 4.5pt}][line width=1.5] (60,40) -- (290,40) -- (290,240) -- (60,240) -- cycle ;
37 | %Shape: Rectangle [id:dp6446344359743517]
38 | \draw [dash pattern={on 5.63pt off 4.5pt}][line width=1.5] (70,110) -- (230,110) -- (230,170) -- (70,170) -- cycle ;
39 |
40 | % Text Node
41 | \draw (150,70) node [font=\large] {$U_{0}^{1}$};
42 | % Text Node
43 | \draw (100,140) node [font=\large] {$U_{1}^{3}$};
44 | % Text Node
45 | \draw (200,140) node [font=\large] {$U_{1}^{5}$};
46 | % Text Node
47 | \draw (150,140) node [font=\large] {$U_{1}^{4}$};
48 | % Text Node
49 | \draw (100,210) node [font=\large] {$U_{3}^{7}$};
50 | % Text Node
51 | \draw (260,70) node [font=\large] {$U_{0}^{2}$};
52 | % Text Node
53 | \draw (260,140) node [font=\large] {$U_{2}^{6}$};
54 | % Text Node
55 | \draw (73,87.4) node [anchor=north west][inner sep=0.75pt] {$\mathcal{U}^{1}$};
56 | % Text Node
57 | \draw (261,212.4) node [anchor=north west][inner sep=0.75pt] {$\mathcal{V}$};
58 |
59 |
60 | \end{tikzpicture}
61 |
--------------------------------------------------------------------------------
/thesis_latex/figs/dgp-binary-tree.tex:
--------------------------------------------------------------------------------
1 | \tikzset{every picture/.style={line width=0.75pt}} %set default line width to 0.75pt
2 |
3 | \begin{tikzpicture}[x=0.6pt,y=0.6pt,yscale=-1,xscale=1]
4 | %uncomment if require: \path (0,300); %set diagram left start at 0, and has height of 300
5 |
6 | %Shape: Circle [id:dp821654118938725]
7 | \draw [line width=1.5] (200,30) .. controls (200,18.95) and (208.95,10) .. (220,10) .. controls (231.05,10) and (240,18.95) .. (240,30) .. controls (240,41.05) and (231.05,50) .. (220,50) .. controls (208.95,50) and (200,41.05) .. (200,30) -- cycle ;
8 | %Shape: Circle [id:dp7532382575897598]
9 | \draw [line width=1.5] (150,100) .. controls (150,88.95) and (158.95,80) .. (170,80) .. controls (181.05,80) and (190,88.95) .. (190,100) .. controls (190,111.05) and (181.05,120) .. (170,120) .. controls (158.95,120) and (150,111.05) .. (150,100) -- cycle ;
10 | %Shape: Circle [id:dp5902978501813476]
11 | \draw [line width=1.5] (250,100) .. controls (250,88.95) and (258.95,80) .. (270,80) .. controls (281.05,80) and (290,88.95) .. (290,100) .. controls (290,111.05) and (281.05,120) .. (270,120) .. controls (258.95,120) and (250,111.05) .. (250,100) -- cycle ;
12 | %Straight Lines [id:da9016881764549007]
13 | \draw [line width=1.5] (207.17,52.83) -- (180,80) ;
14 | \draw [shift={(210,50)}, rotate = 135] [fill={rgb, 255:red, 0; green, 0; blue, 0 } ][line width=0.08] [draw opacity=0] (13.4,-6.43) -- (0,0) -- (13.4,6.44) -- (8.9,0) -- cycle ;
15 | %Straight Lines [id:da026586992089503658]
16 | \draw [line width=1.5] (232.83,52.83) -- (260,80) ;
17 | \draw [shift={(230,50)}, rotate = 45] [fill={rgb, 255:red, 0; green, 0; blue, 0 } ][line width=0.08] [draw opacity=0] (13.4,-6.43) -- (0,0) -- (13.4,6.44) -- (8.9,0) -- cycle ;
18 | %Shape: Circle [id:dp7932559631044391]
19 | \draw [line width=1.5] (100,170) .. controls (100,158.95) and (108.95,150) .. (120,150) .. controls (131.05,150) and (140,158.95) .. (140,170) .. controls (140,181.05) and (131.05,190) .. (120,190) .. controls (108.95,190) and (100,181.05) .. (100,170) -- cycle ;
20 | %Shape: Circle [id:dp8827895580667542]
21 | \draw [line width=1.5] (170,170) .. controls (170,158.95) and (178.95,150) .. (190,150) .. controls (201.05,150) and (210,158.95) .. (210,170) .. controls (210,181.05) and (201.05,190) .. (190,190) .. controls (178.95,190) and (170,181.05) .. (170,170) -- cycle ;
22 | %Shape: Circle [id:dp1311962702995253]
23 | \draw [line width=1.5] (230,170) .. controls (230,158.95) and (238.95,150) .. (250,150) .. controls (261.05,150) and (270,158.95) .. (270,170) .. controls (270,181.05) and (261.05,190) .. (250,190) .. controls (238.95,190) and (230,181.05) .. (230,170) -- cycle ;
24 | %Shape: Circle [id:dp055039676738894316]
25 | \draw [line width=1.5] (300,170) .. controls (300,158.95) and (308.95,150) .. (320,150) .. controls (331.05,150) and (340,158.95) .. (340,170) .. controls (340,181.05) and (331.05,190) .. (320,190) .. controls (308.95,190) and (300,181.05) .. (300,170) -- cycle ;
26 | %Straight Lines [id:da3720861628053822]
27 | \draw [line width=1.5] (156.8,122.4) -- (120,150) ;
28 | \draw [shift={(160,120)}, rotate = 143.13] [fill={rgb, 255:red, 0; green, 0; blue, 0 } ][line width=0.08] [draw opacity=0] (13.4,-6.43) -- (0,0) -- (13.4,6.44) -- (8.9,0) -- cycle ;
29 | %Straight Lines [id:da5765340159287278]
30 | \draw [line width=1.5] (181.26,123.79) -- (190,150) ;
31 | \draw [shift={(180,120)}, rotate = 71.57] [fill={rgb, 255:red, 0; green, 0; blue, 0 } ][line width=0.08] [draw opacity=0] (13.4,-6.43) -- (0,0) -- (13.4,6.44) -- (8.9,0) -- cycle ;
32 | %Straight Lines [id:da8546030008258803]
33 | \draw [line width=1.5] (258.74,123.79) -- (250,150) ;
34 | \draw [shift={(260,120)}, rotate = 108.43] [fill={rgb, 255:red, 0; green, 0; blue, 0 } ][line width=0.08] [draw opacity=0] (13.4,-6.43) -- (0,0) -- (13.4,6.44) -- (8.9,0) -- cycle ;
35 | %Straight Lines [id:da21589979146129545]
36 | \draw [line width=1.5] (283.2,122.4) -- (320,150) ;
37 | \draw [shift={(280,120)}, rotate = 36.87] [fill={rgb, 255:red, 0; green, 0; blue, 0 } ][line width=0.08] [draw opacity=0] (13.4,-6.43) -- (0,0) -- (13.4,6.44) -- (8.9,0) -- cycle ;
38 | %Shape: Rectangle [id:dp24854624791356805]
39 | \draw [dash pattern={on 5.63pt off 4.5pt}][line width=1.5] (140,70) -- (300,70) -- (300,130) -- (140,130) -- cycle ;
40 | %Shape: Rectangle [id:dp15950228071348826]
41 | \draw [dash pattern={on 5.63pt off 4.5pt}][line width=1.5] (220,140) -- (350,140) -- (350,200) -- (220,200) -- cycle ;
42 | %Shape: Rectangle [id:dp8128591939925793]
43 | \draw [dash pattern={on 5.63pt off 4.5pt}][line width=1.5] (90,0) -- (360,0) -- (360,210) -- (90,210) -- cycle ;
44 |
45 | % Text Node
46 | \draw (340.5,25) node {$\mathcal{V}$};
47 | % Text Node
48 | \draw (316,80.5) node {$\mathcal{U}^{1}$};
49 | % Text Node
50 | \draw (339,124.5) node {$\mathcal{U}^{3}$};
51 | % Text Node
52 | \draw (220,30) node [font=\large] {$U_{0}^{1}$};
53 | % Text Node
54 | \draw (170,100) node [font=\large] {$U_{1}^{2}$};
55 | % Text Node
56 | \draw (270,100) node [font=\large] {$U_{1}^{3}$};
57 | % Text Node
58 | \draw (120,170) node [font=\large] {$U_{2}^{4}$};
59 | % Text Node
60 | \draw (190,170) node [font=\large] {$U_{2}^{5}$};
61 | % Text Node
62 | \draw (250,170) node [font=\large] {$U_{3}^{6}$};
63 | % Text Node
64 | \draw (320,170) node [font=\large] {$U_{3}^{7}$};
65 |
66 | \end{tikzpicture}
67 |
--------------------------------------------------------------------------------
/scripts/disc_err_dgp_m12.py:
--------------------------------------------------------------------------------
1 | # Compare different discretisation schemes on an SS-DGP and generate Figure 4.3 in the thesis.
2 | # This is done purely in sympy and numpy.
3 | #
4 | # Zheng Zhao 2020
5 | #
6 | import os
7 | import math
8 | import numpy as np
9 | import matplotlib.pyplot as plt
10 | import sympy as sp
11 |
12 | import tme.base_sympy as tme
13 | from sympy import lambdify
14 |
15 | l2 = 1.
16 | s2 = 0.1
17 | l3 = 1.
18 | s3 = 0.1
19 |
20 |
21 | # Transformation function
22 | def g(x):
23 | return np.exp(0.5 * x)
24 |
25 |
26 | def g_sym(x):
27 | return sp.exp(0.5 * x)
28 |
29 |
30 | # SS-DGP drift
31 | def a(x):
32 | return -np.array([x[0] / g(x[1]),
33 | x[1] / l2,
34 | x[2] / l3])
35 |
36 |
37 | # SS-DGP dispersion
38 | def b(x):
39 | return math.sqrt(2) * np.diag([g(x[2]) / np.sqrt(g(x[1])),
40 | s2 / math.sqrt(l2),
41 | s3 / math.sqrt(l3)])
42 |
43 |
44 | def euler_maruyama(x0, dt, dws):
45 | xx = np.zeros(shape=(dws.shape[0], x0.shape[0]))
46 | x = x0
47 | for idx, dw in enumerate(dws):
48 | x = x + a(x) * dt + b(x) @ dw
49 | xx[idx] = x
50 | return xx
51 |
52 |
53 | # Locally conditional discretisation method, giving
54 | # x_k \approx F(x_{k-1}) x_{k-1} + q_{k-1}, where q_{k-1} ~ N(0, Q(x_{k-1}))
55 | def lcd_F(x, dt):
56 | return np.diag(np.exp(-dt / np.array([g(x[1]), l2, l3])))
57 |
58 |
59 | def lcd_Q(x, dt):
60 | return np.diag([g(x[2]) ** 2 * (1 - np.exp(-2 * dt / g(x[1]))),
61 | s2 ** 2 * (1 - np.exp(-2 * dt / l2)),
62 | s3 ** 2 * (1 - np.exp(-2 * dt / l3))])
63 |
64 |
65 | def lcd(x0, dt, dws):
66 | """Locally conditional discretisation for Matern 1/2 SS-DGPs
67 | """
68 | xx = np.zeros(shape=(dws.shape[0], x0.shape[0]))
69 | x = x0
70 | for idx, dw in enumerate(dws):
71 | x = lcd_F(x, dt) @ x + np.sqrt(lcd_Q(x, dt)) @ dw / np.sqrt(dt)
72 | xx[idx] = x
73 | return xx
74 |
75 |
76 | def local_sum(x, factor):
77 | target_shape = (int(x.shape[0] / factor), x.shape[1])
78 | xx = np.zeros(target_shape)
79 | for i in range(target_shape[0]):
80 | xx[i] = np.sum(x[i * factor:(i + 1) * factor], axis=0)
81 | return xx
82 |
83 |
84 | def give_tme_symbols(order=3, simp=True):
85 | """Give mean and covariance symbols of Taylor moment expansion.
86 | """
87 | # Symbols
88 | x = sp.MatrixSymbol('x', 3, 1)
89 | a_sym = -sp.Matrix([x[0] / g_sym(x[1]),
90 | x[1] / sp.S(l2),
91 | x[2] / sp.S(l3)])
92 | b_sym = sp.sqrt(sp.S(2)) * sp.Matrix([[g_sym(x[2]) / sp.sqrt(g_sym(x[1])), 0, 0],
93 | [0, sp.S(s2) / sp.sqrt(sp.S(l2)), 0],
94 | [0, 0, sp.S(s3) / sp.sqrt(sp.S(l3))]
95 | ])
96 | dt_sym = sp.Symbol('dt')
97 | tme_mean, tme_cov = tme.mean_and_cov(x, a_sym, b_sym, dt_sym,
98 | order=order, simp=simp)
99 | tme_mean_func = lambdify([x, dt_sym], tme_mean, 'numpy')
100 | tme_cov_func = lambdify([x, dt_sym], tme_cov, 'numpy')
101 | return tme_mean_func, tme_cov_func
102 |
103 |
104 | def tme_disc(x0, dt, dws, f, Q):
105 |     """Taylor moment expansion discretisation. This demonstrates how to use TME in sympy; in practice,
106 |     please use the JAX implementation instead.
107 | """
108 | xx = np.zeros(shape=(dws.shape[0], x0.shape[0], 1))
109 | x = x0.reshape(-1, 1)
110 | for idx, dw in enumerate(dws):
111 | x = f(x, dt) + (np.linalg.cholesky(Q(x, dt)) @ dw / np.sqrt(dt))[:, None]
112 | xx[idx] = x
113 | return xx[:, :, 0]
114 |
115 |
116 | def abs_err(x1, x2):
117 | return np.sum(np.abs(x1 - x2))
118 |
119 |
120 | if __name__ == '__main__':
121 |     path_figs = '../thesis_latex/figs'
122 | plt.rcParams.update({
123 | 'text.usetex': True,
124 | 'text.latex.preamble': r'\usepackage{fouriernc}',
125 | 'font.family': "serif",
126 | 'font.serif': 'New Century Schoolbook',
127 | 'font.size': 20})
128 |
129 | np.random.seed(2020)
130 |
131 | end_T = 10
132 | num_steps = 100
133 | dt = end_T / num_steps
134 | tt = np.linspace(dt, end_T, num_steps)
135 |
136 | boost_factor = 1000
137 | boost_num_steps = num_steps * boost_factor
138 | boost_dt = dt / boost_factor
139 | boost_tt = np.linspace(boost_dt, end_T, boost_num_steps)
140 |     boost_dws = np.sqrt(boost_dt) * np.random.randn(boost_num_steps, 3)
141 |
142 | x0 = np.zeros(shape=(3,))
143 |
144 | # Compute very accurate discretisation
145 | boost_xx = euler_maruyama(x0, boost_dt, boost_dws)
146 | exact_xx = boost_xx[boost_factor - 1::boost_factor]
147 | exact_tt = boost_tt[boost_factor - 1::boost_factor]
148 |
149 | # Compute Euler Maruyama
150 | dws = local_sum(boost_dws, boost_factor)
151 | em_xx = euler_maruyama(x0, dt, dws)
152 |
153 | # Compute locally conditional discretisation
154 | lcd_xx = lcd(x0, dt, dws)
155 |
156 | # Compute TME
157 | tme_order = 3
158 | tme_mean_func, tme_cov_func = give_tme_symbols(order=tme_order, simp=True)
159 | tme_xx = tme_disc(x0, dt, dws, tme_mean_func, tme_cov_func)
160 |
161 | # Compute abs error
162 | err_dim = 0
163 | err_em = abs_err(em_xx[:, err_dim], exact_xx[:, err_dim])
164 | err_lcd = abs_err(lcd_xx[:, err_dim], exact_xx[:, err_dim])
165 | err_tme = abs_err(tme_xx[:, err_dim], exact_xx[:, err_dim])
166 | print(f'Euler--Maruyama abs err: {err_em}')
167 | print(f'LCD abs err: {err_lcd}')
168 | print(f'TME abs err: {err_tme}')
169 |
170 | # Plot
171 | plt.figure(figsize=(16, 8))
172 | plt.plot(tt, exact_xx[:, 0],
173 | c='black', linewidth=3, label='Numerical exact')
174 | plt.plot(tt, em_xx[:, 0],
175 | c='tab:blue', linewidth=3, linestyle=(0, (1, 1)),
176 | label=f'Euler--Maruyama (abs. err. $\\approx$ {err_em:.1f})')
177 | plt.plot(tt, lcd_xx[:, 0],
178 | c='tab:purple', linewidth=3, linestyle=(0, (5, 1)),
179 | label=f'LCD (abs. err. $\\approx$ {err_lcd:.1f})')
180 | plt.plot(tt, tme_xx[:, 0],
181 | c='tab:red', linewidth=3, linestyle=(0, (3, 1, 1, 1)),
182 | label=f'TME-{tme_order} (abs. err. $\\approx$ {err_tme:.1f})')
183 |
184 | plt.legend(loc='lower left', fontsize=24)
185 |
186 | plt.grid(linestyle='--', alpha=0.3, which='both')
187 | plt.xlim(0, end_T)
188 |
189 | plt.xlabel('$t$', fontsize=26)
190 | plt.ylabel('$U^1_0(t)$', fontsize=26)
191 | plt.xticks(np.arange(0, end_T + 1, 1))
192 |
193 | plt.tight_layout(pad=0.1)
194 | plt.savefig(os.path.join(path_figs, 'disc-err_dgp_m12.pdf'))
195 |
--------------------------------------------------------------------------------
/scripts/spectro_temporal.py:
--------------------------------------------------------------------------------
1 | # Probabilistic state-space spectro-temporal estimation. This will generate Figure 5.2 in the thesis.
2 | #
3 | # Zheng Zhao 2020
4 | #
5 | import os
6 | import math
7 | import numpy as np
8 | import matplotlib.pyplot as plt
9 |
10 | from scipy.linalg import cho_factor, cho_solve
11 | from typing import Tuple
12 |
13 |
14 | def test_signal(ts: np.ndarray, R: float) -> Tuple[np.ndarray, np.ndarray]:
15 | """Generate a test sinusoidal signal with multiple freq bands
16 |
17 | Parameters
18 | ----------
19 | ts : np.ndarray
20 | Time instances.
21 | R : float
22 | Measurement noise variance.
23 |
24 | Returns
25 | -------
26 | zt, yt : np.ndarray
27 | Ground truth signal and its noisy measurements, respectively.
28 | """
29 | t1 = ts[ts < 1 / 3]
30 | t2 = ts[(ts >= 1 / 3) & (ts < 2 / 3)]
31 | t3 = ts[ts >= 2 / 3]
32 | zt = np.concatenate([np.sin(2 * math.pi * 10 * t1),
33 | np.sin(2 * math.pi * 40 * t2) + np.sin(2 * math.pi * 60 * t2),
34 | np.sin(2 * math.pi * 90 * t3)],
35 | axis=0)
36 | yt = zt + math.sqrt(R) * np.random.randn(ts.size)
37 | return zt, yt
38 |
39 |
40 | def kf_rts(F: np.ndarray, Q: np.ndarray,
41 | H: np.ndarray, R: float,
42 | y: np.ndarray,
43 | m0: np.ndarray, p0: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
44 | """Simple enough Kalman filter and RTS smoother.
45 |
46 | x_k = F x_{k-1} + q_{k-1},
47 | y_k = H x_k + r_k,
48 |
49 | Parameters
50 | ----------
51 | F : np.ndarray
52 | State transition.
53 | Q : np.ndarray
54 | State covariance.
55 | H : np.ndarray
56 | Measurement matrix.
57 | R : float
58 | Measurement noise variance.
59 | y : np.ndarray
60 | Measurements.
61 |     m0, p0 : np.ndarray
62 | Initial mean and cov.
63 |
64 | Returns
65 | -------
66 | ms, ps : np.ndarray
67 | Smoothing posterior mean and covariances.
68 | """
69 | dim_x = m0.size
70 | num_y = y.size
71 |
72 | mm = np.zeros(shape=(num_y, dim_x))
73 | pp = np.zeros(shape=(num_y, dim_x, dim_x))
74 |
75 | mm_pred = mm.copy()
76 | pp_pred = pp.copy()
77 |
78 | m = m0
79 | p = p0
80 |
81 | # Filtering pass
82 | for k in range(num_y):
83 | # Pred
84 | m = F @ m
85 | p = F @ p @ F.T + Q
86 | mm_pred[k] = m
87 | pp_pred[k] = p
88 |
89 | # Update
90 | Hk = H[k]
91 | S = Hk @ p @ Hk.T + R
92 | K = p @ Hk.T / S
93 | m = m + K * (y[k] - Hk @ m)
94 | p = p - np.outer(K, K) * S
95 |
96 | # Save
97 | mm[k] = m
98 | pp[k] = p
99 |
100 | # Smoothing pass
101 | ms = mm.copy()
102 | ps = pp.copy()
103 | for k in range(num_y - 2, -1, -1):
104 | (c, low) = cho_factor(pp_pred[k + 1])
105 | G = pp[k] @ cho_solve((c, low), F).T
106 | ms[k] = mm[k] + G @ (ms[k + 1] - mm_pred[k + 1])
107 | ps[k] = pp[k] + G @ (ps[k + 1] - pp_pred[k + 1]) @ G.T
108 |
109 | return ms, ps
110 |
111 |
112 | def generate_spectro_temporal_ssm(ell: float, sigma: float,
113 | ts: np.ndarray, dt: float,
114 | freqs: np.ndarray):
115 |     """Generate the state-space model for spectro-temporal analysis. Only implemented for the Matern 1/2 prior, with
116 |     uniform ell and sigma parameters for all frequency components.
117 |
118 | Parameters
119 | ----------
120 | ell : float
121 |         Length scale of the Matern 1/2 prior.
122 | sigma : float
123 |         Magnitude scale of the Matern 1/2 prior.
124 | ts : np.ndarray
125 | Time instances.
126 | dt : float
127 | Time interval. (It is left as an exercise for you to implement varying dt)
128 | freqs : np.ndarray
129 | Frequencies.
130 |
131 | Returns
132 | -------
133 | F, Q, H
134 | State coefficients.
135 | """
136 |     dim_x = 2 * freqs.size + 1  # DC component + cosines + sines
137 |
138 | lam = 1 / ell
139 | q = 2 * sigma ** 2 / ell
140 |
141 | F = math.exp(-lam * dt) * np.eye(dim_x)
142 | Q = q / (2 * lam) * (1 - math.exp(-2 * lam * dt)) * np.eye(dim_x)
143 |
144 | H = np.array([[1.]
145 | + [np.cos(2 * math.pi * f * t) for f in freqs]
146 | + [np.sin(2 * math.pi * f * t) for f in freqs] for t in ts])
147 | return F, Q, H
148 |
149 |
150 | if __name__ == '__main__':
151 | # Parameters of priors
152 | ell = 0.1
153 | sigma = 0.5
154 |
155 | # Generate a signal and measurements
156 | fs = 1000
157 | ts = np.linspace(0, 1, fs)
158 | R = 0.01
159 | zt, yt = test_signal(ts=ts, R=R)
160 |
161 | # Generate state-space GP model
162 | # Order of Fourier expansions
163 | N = 100
164 | freqs = np.linspace(1, 100, N)
165 | F, Q, H = generate_spectro_temporal_ssm(ell=ell, sigma=sigma, ts=ts, dt=1 / fs,
166 | freqs=freqs)
167 |
168 | # Kalman filtering and smoothing
169 | m0 = np.zeros(shape=(2 * N + 1,))
170 | p0 = 1. * np.eye(2 * N + 1)
171 |
172 | # Discarded smoothing covariance ps
173 | ms, _ = kf_rts(F=F, Q=Q, H=H, R=R + 0.01, y=yt, m0=m0, p0=p0)
174 |
175 | # Draw spectrogram sqrt(a^2 + b^2)
176 | spectrogram = np.sqrt(ms[:, 1:N + 1] ** 2 + ms[:, N + 1:] ** 2)
177 |
178 | # Plot
179 |     path_figs = '../thesis_latex/figs'
180 |
181 | plt.rcParams.update({
182 | 'text.usetex': True,
183 | 'text.latex.preamble': r'\usepackage{fouriernc}',
184 | 'font.family': "serif",
185 | 'font.serif': 'New Century Schoolbook',
186 | 'font.size': 20})
187 |
188 | # Plot signal
189 | fig, axs = plt.subplots(nrows=1, ncols=2, figsize=(16, 6))
190 |
191 | axs[0].plot(ts, zt, linewidth=2, c='black', label='Signal')
192 | axs[0].scatter(ts, yt, s=10, c='tab:purple', edgecolors='none', alpha=0.4, label='Measurements')
193 | axs[0].set_xlabel('$t$', fontsize=24)
194 |
195 | axs[0].grid(linestyle='--', alpha=0.3, which='both')
196 | axs[0].legend(loc='upper left', fontsize=16)
197 |
198 | # Plot spectrogram and true freq bands
199 | mesh_ts, mesh_freqs = np.meshgrid(ts, freqs, indexing='ij')
200 | axs[1].contourf(mesh_ts, mesh_freqs, spectrogram, levels=4, cmap=plt.cm.Blues_r)
201 |
202 | axs[1].axhline(y=10, xmin=ts[0], xmax=ts[ts < 1 / 3][-1],
203 | c='black', linewidth=2, linestyle='--')
204 | axs[1].axhline(y=40, xmin=ts[(ts >= 1 / 3) & (ts < 2 / 3)][0], xmax=ts[(ts >= 1 / 3) & (ts < 2 / 3)][-1],
205 | c='black', linewidth=2, linestyle='--')
206 | axs[1].axhline(y=60, xmin=ts[(ts >= 1 / 3) & (ts < 2 / 3)][0], xmax=ts[(ts >= 1 / 3) & (ts < 2 / 3)][-1],
207 | c='black', linewidth=2, linestyle='--')
208 | axs[1].axhline(y=90, xmin=ts[ts >= 2 / 3][0], xmax=ts[ts >= 2 / 3][-1],
209 | c='black', linewidth=2, linestyle='--')
210 |
211 | axs[1].set_xlabel('$t$', fontsize=24)
212 | axs[1].set_ylabel('Frequency')
213 |
214 | plt.subplots_adjust(left=0.044, bottom=0.11, right=0.989, top=0.977, wspace=0.134, hspace=0.2)
215 | plt.savefig(os.path.join(path_figs, 'spectro-temporal-demo1.pdf'))
216 |
--------------------------------------------------------------------------------
/lectio_praecursoria/z_marcro.tex:
--------------------------------------------------------------------------------
1 | %!TEX root = dissertation.tex
2 |
3 | % To use this macro you need packages: amsmath, amssymb, bm, mathtools
4 | %
5 | % Zheng Zhao @ 2019
6 | % zz@zabemon.com
7 | %
8 | % License: Creative Commons Attribution 4.0 International (CC BY 4.0)
9 | %
10 |
11 | % Adaptive bold math font command
12 | \newcommand{\cu}[1]{
13 | \ifcat\noexpand#1\relax
14 | \bm{#1}
15 | \else
16 | \mathbf{#1}
17 | \fi
18 | }
19 |
20 | \newcommand{\tash}[2]{\frac{\partial #1}{\partial #2}}
21 | \newcommand{\tashh}[3]{\frac{\partial^2 #1}{\partial #2 \, \partial #3}}
22 |
23 | % Slightly smaller spacing than a pure mathop
24 | \newcommand{\diff}{\mathop{}\!\mathrm{d}}
25 |
26 | % Complex
27 | \newcommand{\imag}{\mathrm{i}}
28 |
29 | % Exponential
30 | \newcommand{\expp}{\mathrm{e}}
31 |
32 | % \mid used in condition in probability e.g., E[x \mid y]
33 | \newcommand{\cond}{{\;|\;}}
34 | \newcommand{\condbig}{{\;\big|\;}}
35 | \newcommand{\condBig}{{\;\Big|\;}}
36 | \newcommand{\condbigg}{{\;\bigg|\;}}
37 | \newcommand{\condBigg}{{\;\Bigg|\;}}
38 |
39 | \let\sup\relax
40 | \let\inf\relax
41 | \let\lim\relax
42 | \DeclareMathOperator*{\argmin}{arg\,min\,} % Argmin
43 | \DeclareMathOperator*{\argmax}{arg\,max\,} % Argmax
44 | \DeclareMathOperator*{\sup}{sup\,} % sup better spacing
45 | \DeclareMathOperator*{\inf}{inf\,} % inf
46 | \DeclareMathOperator*{\lim}{lim\,} % lim
47 |
48 | \newcommand{\sgn}{\operatorname{sgn}} % sign function
49 |
50 | \newcommand{\expecsym}{\operatorname{\mathbb{E}}} % Expec
51 | \newcommand{\covsym}{\operatorname{Cov}} % Covariance
52 | \newcommand{\varrsym}{\operatorname{Var}} % Variance
53 | \newcommand{\diagsym}{\operatorname{diag}} % Diagonal matrix
54 | \newcommand{\tracesym}{\operatorname{tr}} % Trace
55 |
56 | % Two problems for E, Cov, Var etc. with brackets
57 | % 1. \operatorname does not give space for bracket, thus we need to manually add \, after E. If \left\right is used then no need to add space.
58 | % 2. \left\right does not give correct vertical spacing. The brackets will be shifted down slightly.
59 | % Solution is to use \left\right when it is inevitable.
60 | % Use \expec when you do not want auto-height
61 | % Use \expec* when you want auto-height
62 | % Use \expecsym when you want to fully define the behaviour, which only gives the E symbol without brackets.
63 | \let\expec\relax
64 | \let\cov\relax
65 | \let\varr\relax
66 | \let\diag\relax
67 | \let\trace\relax
68 |
69 | \makeatletter
70 | % E [ ]
71 | \newcommand{\expec}{\@ifstar{\@expecauto}{\@expecnoauto}}
72 | \newcommand{\@expecauto}[1]{\expecsym \left[ #1 \right]}
73 | \newcommand{\@expecnoauto}[1]{\expecsym \, [#1]}
74 | \newcommand{\expecbig}[1]{\expecsym \big[ #1 \big]}
75 | \newcommand{\expecBig}[1]{\expecsym \Big[ #1 \Big]}
76 | \newcommand{\expecbigg}[1]{\expecsym \bigg[ #1 \bigg]}
77 | \newcommand{\expecBigg}[1]{\expecsym \Bigg[ #1 \Bigg]}
78 |
79 |
80 | % Cov [ ]
81 | \newcommand{\cov}{\@ifstar{\@covauto}{\@covnoauto}}
82 | \newcommand{\@covauto}[1]{\covsym \left[ #1 \right]}
83 | \newcommand{\@covnoauto}[1]{\covsym \, [#1]}
84 | \newcommand{\covbig}[1]{\covsym \big[ #1 \big]}
85 | \newcommand{\covBig}[1]{\covsym \Big[ #1 \Big]}
86 | \newcommand{\covbigg}[1]{\covsym \bigg[ #1 \bigg]}
87 | \newcommand{\covBigg}[1]{\covsym \Bigg[ #1 \Bigg]}
88 |
89 | % Var [ ]
90 | \newcommand{\varr}{\@ifstar{\@varrauto}{\@varrnoauto}}
91 | \newcommand{\@varrauto}[1]{\varrsym \left[ #1 \right]}
92 | \newcommand{\@varrnoauto}[1]{\varrsym \, [#1]}
93 | \newcommand{\varrbig}[1]{\varrsym \big[ #1 \big]}
94 | \newcommand{\varrBig}[1]{\varrsym \Big[ #1 \Big]}
95 | \newcommand{\varrbigg}[1]{\varrsym \bigg[ #1 \bigg]}
96 | \newcommand{\varrBigg}[1]{\varrsym \Bigg[ #1 \Bigg]}
97 |
98 | % Diag ( )
99 | \newcommand{\diag}{\@ifstar{\@diagauto}{\@diagnoauto}}
100 | \newcommand{\@diagauto}[1]{\diagsym \left( #1 \right)}
101 | \newcommand{\@diagnoauto}[1]{\diagsym \, (#1)}
102 | \newcommand{\diagbig}[1]{\diagsym \big( #1 \big)}
103 | \newcommand{\diagBig}[1]{\diagsym \Big( #1 \Big)}
104 | \newcommand{\diagbigg}[1]{\diagsym \bigg( #1 \bigg)}
105 | \newcommand{\diagBigg}[1]{\diagsym \Bigg( #1 \Bigg)}
106 |
107 | % tr ( )
108 | \newcommand{\trace}{\@ifstar{\@traceauto}{\@tracenoauto}}
109 | \newcommand{\@traceauto}[1]{\tracesym \left( #1 \right)}
110 | \newcommand{\@tracenoauto}[1]{\tracesym \, (#1)}
111 | \newcommand{\tracebig}[1]{\tracesym \big( #1 \big)}
112 | \newcommand{\traceBig}[1]{\tracesym \Big( #1 \Big)}
113 | \newcommand{\tracebigg}[1]{\tracesym \bigg( #1 \bigg)}
114 | \newcommand{\traceBigg}[1]{\tracesym \Bigg( #1 \Bigg)}
115 | \makeatother
116 |
117 | \newcommand{\A}{\mathcal{A}} % Generator
118 | \newcommand{\Am}{\overline{\mathcal{A}}} % Generator
119 |
120 | % Transpose symbol using (DIN) EN ISO 80000-2:2013 standard
121 | \newcommand*{\trans}{{\mkern-1.5mu\mathsf{T}}}
122 |
123 | \newcommand*{\T}{\mathbb{T}} % Set of temporal variables
124 | \newcommand*{\R}{\mathbb{R}} % Set of real numbers
125 | \newcommand*{\Q}{\mathbb{Q}} % Set of rational numbers
126 | \newcommand*{\N}{\mathbb{N}} % Set of natural numbers
127 | \newcommand*{\Z}{\mathbb{Z}} % Set of integers
128 |
129 | \newcommand*{\BB}{\mathcal{B}} % Borel sigma-algebra
130 | \newcommand*{\FF}{\mathcal{F}} % Sigma-algebra
131 | \newcommand*{\PP}{\mathbb{P}} % Probability measure
132 | \newcommand*{\GP}{\mathrm{GP}} % GP
133 |
134 | \newcommand{\mineig}{\lambda_{\mathrm{min}}}
135 | \newcommand{\maxeig}{\lambda_{\mathrm{max}}}
136 |
137 | % Norm and inner product
138 | %% use \norm* to enable auto-height
139 |
140 | %% Some notes on these paired delimiters:
141 | %% It is argued that there should be no space between operator and delimiter, but this might not be suitable in some cases. Indeed log(x) should have no space between log and (, but log |x| with a mathop{} spacing looks absolutely much prettier than log|x| because here |x| is an argument. Think, shouldn't it be log(|x|) in full expansion, and we ignored () with spacing?
142 | %% See discussion in https://tex.stackexchange.com/questions/461806/missing-space-with-declarepaireddelimiter
143 | %
144 | \let\norm\relax
145 | \DeclarePairedDelimiter{\normbracket}{\lVert}{\rVert}
146 | \newcommand{\norm}{\normbracket}
147 | \newcommand{\normbig}[1]{\big \lVert #1 \big \rVert}
148 | \newcommand{\normBig}[1]{\Big \lVert #1 \Big\rVert}
149 | \newcommand{\normbigg}[1]{\bigg \lVert #1 \bigg\rVert}
150 | \newcommand{\normBigg}[1]{\Bigg \lVert #1 \Bigg\rVert}
151 | %\makeatletter
152 | %\newcommand{\norm}{\@ifstar{\@normnoauto}{\@normauto}}
153 | %\newcommand{\@normauto}[1]{\left\lVert#1\right\rVert}
154 | %\newcommand{\@normnoauto}[1]{\lVert#1\rVert}
155 | %\makeatother
156 |
157 | \let\innerp\relax
158 | \DeclarePairedDelimiter{\innerpbracket}{\langle}{\rangle}
159 | \newcommand{\innerp}{\innerpbracket}
160 | %\makeatletter
161 | %\newcommand{\innerp}{\@ifstar{\@inpnoautp}{\@inpauto}}
162 | %\newcommand{\@inpauto}[2]{\left\langle#1, #2\right\rangle}
163 | %\newcommand{\@inpnoautp}[2]{\left#1, #2\rangle}
164 | %\makeatother
165 |
166 | \let\abs\relax
167 | \DeclarePairedDelimiter{\absbracket}{\lvert}{\rvert}
168 | \newcommand{\abs}{\absbracket}
169 | \newcommand{\absbig}[1]{\big \lvert #1 \big \rvert}
170 | \newcommand{\absBig}[1]{\Big \lvert #1 \Big\rvert}
171 | \newcommand{\absbigg}[1]{\bigg \lvert #1 \bigg\rvert}
172 | \newcommand{\absBigg}[1]{\Bigg \lvert #1 \Bigg\rvert}
173 | %\makeatletter
174 | %\newcommand{\abs}{\@ifstar{\@absnoauto}{\@absauto}}
175 | %\newcommand{\@absauto}[1]{\left\lvert#1\right\rvert}
176 | %\newcommand{\@absnoauto}[1]{\lvert#1\rvert}
177 | %\makeatother
178 |
179 | % Some functions
180 | \newcommand{\mBesselsec}{\operatorname{K}_\nu}
181 | \newcommand{\jacob}{\operatorname{J}}
182 | \newcommand{\hessian}{\operatorname{H}}
183 |
184 | % Literals
185 | \def\matern{Mat\'{e}rn }
186 |
187 | % Theorem envs
188 | % Dummy env for those sharing the same numbering system.
189 | %
190 | \makeatletter
191 | \@ifundefined{thmnumcounter}{}
192 | {%
193 | \newtheorem{envcounter}{EnvcounterDummy}[\thmnumcounter]
194 | \newtheorem{theorem}[envcounter]{Theorem}
195 | \newtheorem{proposition}[envcounter]{Proposition}
196 | \newtheorem{lemma}[envcounter]{Lemma}
197 | \newtheorem{corollary}[envcounter]{Corollary}
198 | \newtheorem{remark}[envcounter]{Remark}
199 | \newtheorem{example}[envcounter]{Example}
200 | \newtheorem{definition}[envcounter]{Definition}
201 | \newtheorem{algorithm}[envcounter]{Algorithm}
202 | \newtheorem{assumption}[envcounter]{Assumption}
204 | }
205 | \makeatother
206 |
--------------------------------------------------------------------------------
/thesis_latex/zmacro.tex:
--------------------------------------------------------------------------------
1 | %!TEX root = main.tex
2 | % Generic macro definitions for a number of math operations.
3 | % Version 2.0, last updated 02.06.2022.
4 | %
5 | % To use this macro you need packages: amsmath, amssymb, bm, mathtools
6 | %
7 | % Zheng Zhao @ 2019
8 | % zz@zabemon.com
9 | %
10 | % License: Creative Commons Attribution 4.0 International (CC BY 4.0)
11 | %
12 |
13 | % Adaptive bold math font command
14 | \newcommand{\cu}[1]{
15 | \ifcat\noexpand#1\relax
16 | \bm{#1}
17 | \else
18 | \mathbf{#1}
19 | \fi
20 | }
21 |
22 | \newcommand{\tash}[2]{\frac{\partial #1}{\partial #2}}
23 | \newcommand{\tashh}[3]{\frac{\partial^2 #1}{\partial #2 \, \partial #3}}
24 |
25 | % Slightly smaller spacing than a pure mathop
26 | \newcommand{\diff}{\mathop{}\!\mathrm{d}}
27 |
28 | % Complex
29 | \newcommand{\imag}{\mathrm{i}}
30 |
31 | % Exponential
32 | \newcommand{\expp}{\mathrm{e}}
33 |
34 | % \mid used in condition in probability e.g., E[x \mid y]
35 | \newcommand{\cond}{{\;|\;}}
36 | \newcommand{\condbig}{{\;\big|\;}}
37 | \newcommand{\condBig}{{\;\Big|\;}}
38 | \newcommand{\condbigg}{{\;\bigg|\;}}
39 | \newcommand{\condBigg}{{\;\Bigg|\;}}
40 |
41 | \let\sup\relax
42 | \let\inf\relax
43 | \let\lim\relax
44 | \DeclareMathOperator*{\argmin}{arg\,min\,} % Argmin
45 | \DeclareMathOperator*{\argmax}{arg\,max\,} % Argmax
46 | \DeclareMathOperator*{\sup}{sup\,} % sup better spacing
47 | \DeclareMathOperator*{\inf}{inf\,} % inf
48 | \DeclareMathOperator*{\lim}{lim\,} % lim
49 | \DeclareMathOperator*{\oprepeat}{\cdots} % repeat operation
50 |
51 | \newcommand{\sgn}{\operatorname{sgn}} % sign function
52 |
53 | \newcommand{\expecsym}{\operatorname{\mathbb{E}}} % Expec
54 | \newcommand{\covsym}{\operatorname{Cov}} % Covariance
55 | \newcommand{\varrsym}{\operatorname{Var}} % Variance
56 | \newcommand{\diagsym}{\operatorname{diag}} % Diagonal matrix
57 | \newcommand{\tracesym}{\operatorname{tr}} % Trace
58 |
59 | % Two problems for E, Cov, Var etc. with brackets
60 | % 1. \operatorname does not give space for bracket, thus we need to manually add \, after E. If \left\right is used then no need to add space.
61 | % 2. \left\right does not give correct vertical spacing. The brackets will be shifted down slightly.
62 | % Solution is to use \left\right when it is inevitable.
63 | % Use \expec when you do not want auto-height
64 | % Use \expec* when you want auto-height
65 | % Use \expecsym when you want to fully define the behaviour, which only gives the E symbol without brackets.
66 | \let\expec\relax
67 | \let\cov\relax
68 | \let\varr\relax
69 | \let\diag\relax
70 | \let\trace\relax
71 |
72 | \makeatletter
73 | % E [ ]
74 | \newcommand{\expec}{\@ifstar{\@expecauto}{\@expecnoauto}}
75 | \newcommand{\@expecauto}[1]{\expecsym \left[ #1 \right]}
76 | \newcommand{\@expecnoauto}[1]{\expecsym [#1]}
77 | \newcommand{\expecbig}[1]{\expecsym \bigl[ #1 \bigr]}
78 | \newcommand{\expecBig}[1]{\expecsym \Bigl[ #1 \Bigr]}
79 | \newcommand{\expecbigg}[1]{\expecsym \biggl[ #1 \biggr]}
80 | \newcommand{\expecBigg}[1]{\expecsym \Biggl[ #1 \Biggr]}
81 |
82 |
83 | % Cov [ ]
84 | \newcommand{\cov}{\@ifstar{\@covauto}{\@covnoauto}}
85 | \newcommand{\@covauto}[1]{\covsym \left[ #1 \right]}
86 | \newcommand{\@covnoauto}[1]{\covsym [#1]}
87 | \newcommand{\covbig}[1]{\covsym \bigl[ #1 \bigr]}
88 | \newcommand{\covBig}[1]{\covsym \Bigl[ #1 \Bigr]}
89 | \newcommand{\covbigg}[1]{\covsym \biggl[ #1 \biggr]}
90 | \newcommand{\covBigg}[1]{\covsym \Biggl[ #1 \Biggr]}
91 |
92 | % Var [ ]
93 | \newcommand{\varr}{\@ifstar{\@varrauto}{\@varrnoauto}}
94 | \newcommand{\@varrauto}[1]{\varrsym \left[ #1 \right]}
95 | \newcommand{\@varrnoauto}[1]{\varrsym [#1]}
96 | \newcommand{\varrbig}[1]{\varrsym \bigl[ #1 \bigr]}
97 | \newcommand{\varrBig}[1]{\varrsym \Bigl[ #1 \Bigr]}
98 | \newcommand{\varrbigg}[1]{\varrsym \biggl[ #1 \biggr]}
99 | \newcommand{\varrBigg}[1]{\varrsym \Biggl[ #1 \Biggr]}
100 |
101 | % Diag ( )
102 | \newcommand{\diag}{\@ifstar{\@diagauto}{\@diagnoauto}}
103 | \newcommand{\@diagauto}[1]{\diagsym \left( #1 \right)}
104 | \newcommand{\@diagnoauto}[1]{\diagsym (#1)}
105 | \newcommand{\diagbig}[1]{\diagsym \bigl( #1 \bigr)}
106 | \newcommand{\diagBig}[1]{\diagsym \Bigl( #1 \Bigr)}
107 | \newcommand{\diagbigg}[1]{\diagsym \biggl( #1 \biggr)}
108 | \newcommand{\diagBigg}[1]{\diagsym \Biggl( #1 \Biggr)}
109 |
110 | % tr ( )
111 | \newcommand{\trace}{\@ifstar{\@traceauto}{\@tracenoauto}}
112 | \newcommand{\@traceauto}[1]{\tracesym \left( #1 \right)}
113 | \newcommand{\@tracenoauto}[1]{\tracesym (#1)}
114 | \newcommand{\tracebig}[1]{\tracesym \bigl( #1 \bigr)}
115 | \newcommand{\traceBig}[1]{\tracesym \Bigl( #1 \Bigr)}
116 | \newcommand{\tracebigg}[1]{\tracesym \biggl( #1 \biggr)}
117 | \newcommand{\traceBigg}[1]{\tracesym \Biggl( #1 \Biggr)}
118 | \makeatother
119 |
120 | \newcommand{\A}{\mathcal{A}} % Generator
121 | \newcommand{\Am}{\overline{\mathcal{A}}} % Generator
122 |
123 | % Transpose symbol using (DIN) EN ISO 80000-2:2013 standard
124 | \newcommand*{\trans}{{\mkern-1.5mu\mathsf{T}}}
125 |
126 | \newcommand*{\T}{\mathbb{T}} % Set of temporal variables
127 | \newcommand*{\R}{\mathbb{R}} % Set of real numbers
128 | \newcommand*{\Q}{\mathbb{Q}} % Set of rational numbers
129 | \newcommand*{\N}{\mathbb{N}} % Set of natural numbers
130 | \newcommand*{\Z}{\mathbb{Z}} % Set of integers
131 |
132 | \newcommand*{\BB}{\mathcal{B}} % Borel sigma-algebra
133 | \newcommand*{\FF}{\mathcal{F}} % Sigma-algebra
134 | \newcommand*{\PP}{\mathbb{P}} % Probability measure
135 | \newcommand*{\GP}{\mathrm{GP}} % GP
136 |
137 | \newcommand{\mineig}{\lambda_{\mathrm{min}}}
138 | \newcommand{\maxeig}{\lambda_{\mathrm{max}}}
139 |
140 | % Norm and inner product
141 | %% use \norm* to enable auto-height
142 |
143 | %% Some notes on these paired delimiters:
144 | %% It is argued that there should be no space between operator and delimiter, but this might not be suitable in some cases. Indeed log(x) should have no space between log and (, but log |x| with a mathop{} spacing looks absolutely much prettier than log|x| because here |x| is an argument. Think, shouldn't it be log(|x|) in full expansion, and we ignored () with spacing?
145 | %% See discussion in https://tex.stackexchange.com/questions/461806/missing-space-with-declarepaireddelimiter
146 | %
147 | \let\norm\relax
148 | \DeclarePairedDelimiter{\normbracket}{\lVert}{\rVert}
149 | \newcommand{\norm}{\normbracket}
150 | \newcommand{\normbig}[1]{\big \lVert #1 \big \rVert}
151 | \newcommand{\normBig}[1]{\Big \lVert #1 \Big\rVert}
152 | \newcommand{\normbigg}[1]{\bigg \lVert #1 \bigg\rVert}
153 | \newcommand{\normBigg}[1]{\Bigg \lVert #1 \Bigg\rVert}
154 | %\makeatletter
155 | %\newcommand{\norm}{\@ifstar{\@normnoauto}{\@normauto}}
156 | %\newcommand{\@normauto}[1]{\left\lVert#1\right\rVert}
157 | %\newcommand{\@normnoauto}[1]{\lVert#1\rVert}
158 | %\makeatother
159 |
160 | \let\innerp\relax
161 | \DeclarePairedDelimiter{\innerpbracket}{\langle}{\rangle}
162 | \newcommand{\innerp}{\innerpbracket}
163 | %\makeatletter
164 | %\newcommand{\innerp}{\@ifstar{\@inpnoautp}{\@inpauto}}
165 | %\newcommand{\@inpauto}[2]{\left\langle#1, #2\right\rangle}
166 | %\newcommand{\@inpnoautp}[2]{\left#1, #2\rangle}
167 | %\makeatother
168 |
169 | \let\abs\relax
170 | \DeclarePairedDelimiter{\absbracket}{\lvert}{\rvert}
171 | \newcommand{\abs}{\absbracket}
172 | \newcommand{\absbig}[1]{\big \lvert #1 \big \rvert}
173 | \newcommand{\absBig}[1]{\Big \lvert #1 \Big\rvert}
174 | \newcommand{\absbigg}[1]{\bigg \lvert #1 \bigg\rvert}
175 | \newcommand{\absBigg}[1]{\Bigg \lvert #1 \Bigg\rvert}
176 | %\makeatletter
177 | %\newcommand{\abs}{\@ifstar{\@absnoauto}{\@absauto}}
178 | %\newcommand{\@absauto}[1]{\left\lvert#1\right\rvert}
179 | %\newcommand{\@absnoauto}[1]{\lvert#1\rvert}
180 | %\makeatother
181 |
182 | % Some functions
183 | \newcommand{\mBesselsec}{\operatorname{K_\nu}}
184 | \newcommand{\jacob}{\mathrm{J}}
185 | \newcommand{\hessian}{\mathrm{H}}
186 |
187 | % Literals
188 | \def\matern{Mat\'{e}rn }
189 |
190 | % Theorem envs
191 | % Dummy env for those sharing the same numbering system.
192 | % If you would like to customise your environment numbering, you can define a command e.g., \thmnumcounter{section} in your main tex.
193 | % If \thmnumcounter is undefined, it is assumed that you will deal with defining theorem lemma etc by yourself.
194 | \makeatletter
195 | \@ifundefined{thmenvcounter}{}
196 | {%
197 | \newtheorem{envcounter}{EnvcounterDummy}[\thmenvcounter]
198 | \newtheorem{theorem}[envcounter]{Theorem}
199 | \newtheorem{proposition}[envcounter]{Proposition}
200 | \newtheorem{lemma}[envcounter]{Lemma}
201 | \newtheorem{corollary}[envcounter]{Corollary}
202 | \newtheorem{remark}[envcounter]{Remark}
203 | \newtheorem{example}[envcounter]{Example}
204 | \newtheorem{definition}[envcounter]{Definition}
205 | \newtheorem{algorithm}[envcounter]{Algorithm}
206 | \newtheorem{assumption}[envcounter]{Assumption}
207 | }
208 | \makeatother
209 |
--------------------------------------------------------------------------------
/lectio_praecursoria/scripts/kfs_anime.py:
--------------------------------------------------------------------------------
1 | """
2 | Generate animation of filtering and smoothing operations.
3 |
4 | Zheng Zhao, 2021
5 | """
6 | import math
7 | import numpy as np
8 | import scipy.linalg
9 | import matplotlib.pyplot as plt
10 | from matplotlib.animation import FuncAnimation
11 | from matplotlib import animation
12 | from typing import Tuple
13 |
14 |
15 | def lti_sde_to_disc(A: np.ndarray, B: np.ndarray, dt: float) -> Tuple[np.ndarray, np.ndarray]:
16 | # Axelsson and Gustafsson 2015
17 | dim = A.shape[0]
18 |
19 | F = scipy.linalg.expm(A * dt)
20 | phi = np.vstack([np.hstack([A, np.outer(B, B)]), np.hstack([np.zeros_like(A), -A.T])])
21 | AB = scipy.linalg.expm(phi * dt) @ np.vstack([np.zeros_like(A), np.eye(dim)])
22 | Q = AB[0:dim, :] @ F.T
23 | return F, Q
24 |
25 |
26 | def simulate_data_from_disc_ss(F: np.ndarray, Q: np.ndarray,
27 | H: np.ndarray, R: float,
28 | m0: np.ndarray, p0: np.ndarray,
29 | T: int) -> Tuple[np.ndarray, np.ndarray]:
30 | dim_x = m0.size
31 |
32 | xs = np.empty((T, dim_x))
33 | ys = np.empty((T, ))
34 |
35 | x = m0 + np.linalg.cholesky(p0) @ np.random.randn(dim_x)
36 | for k in range(T):
37 | x = F @ x + np.linalg.cholesky(Q) @ np.random.randn(dim_x)
38 | y = H @ x + math.sqrt(R) * np.random.randn()
39 | xs[k] = x
40 | ys[k] = y
41 | return xs, ys
42 |
43 |
44 | def kf_rts(F: np.ndarray, Q: np.ndarray,
45 | H: np.ndarray, R: float,
46 | y: np.ndarray,
47 | m0: np.ndarray, p0: np.ndarray) -> Tuple[np.ndarray, np.ndarray, np.ndarray, np.ndarray]:
48 | """A Kalman filter and RTS smoother implementation can't be simpler.
49 |
50 | x_k = F x_{k-1} + q_{k-1},
51 | y_k = H x_k + r_k,
52 |
53 | Parameters
54 | ----------
55 | F : np.ndarray
56 | State transition.
57 | Q : np.ndarray
58 | State covariance.
59 | H : np.ndarray
60 | Measurement matrix (should give 1d measurement for simplicity).
61 | R : float
62 | Measurement noise variance.
63 | y : np.ndarray
64 | Measurements.
65 | m0, p0 : np.ndarray
66 | Initial mean and cov.
67 | Returns
68 | -------
69 | mfs, pfs, mss, pss : np.ndarray
70 | Filtering and smoothing means and covariances.
71 | """
72 | dim_x = m0.size
73 | num_y = y.size
74 |
75 | mfs = np.zeros(shape=(num_y, dim_x))
76 | pfs = np.zeros(shape=(num_y, dim_x, dim_x))
77 |
78 | mps = mfs.copy()
79 | pps = pfs.copy()
80 |
81 | m = m0
82 | p = p0
83 |
84 | # Filtering pass
85 | for k in range(num_y):
86 |
87 | # Pred
88 | m = F @ m
89 | p = F @ p @ F.T + Q
90 |
91 | mps[k] = m
92 | pps[k] = p
93 |
94 | # Update
95 | S = H @ p @ H.T + R
96 | K = p @ H.T / S
97 | m = m + K @ (y[k] - H @ m)
98 | p = p - K @ S @ K.T
99 |
100 | # Save
101 | mfs[k] = m
102 | pfs[k] = p
103 |
104 | # Smoothing pass
105 | mss = mfs.copy()
106 | pss = pfs.copy()
107 | for k in range(num_y - 2, -1, -1):
108 | (c, low) = scipy.linalg.cho_factor(pps[k + 1])
109 | G = pfs[k] @ scipy.linalg.cho_solve((c, low), F).T
110 | mss[k] = mfs[k] + G @ (mss[k + 1] - mps[k + 1])
111 | pss[k] = pfs[k] + G @ (pss[k + 1] - pps[k + 1]) @ G.T
112 |
113 | return mfs, pfs, mss, pss
114 |
115 |
116 | if __name__ == "__main__":
117 |
118 | np.random.seed(666666)
119 |
120 | plt.rcParams.update({
121 | 'text.usetex': True,
122 | 'text.latex.preamble': r'\usepackage{fouriernc}',
123 | 'font.family': "serif",
124 | 'font.serif': 'New Century Schoolbook',
125 | 'font.size': 18})
126 | anime_writer = animation.ImageMagickWriter()
127 |
128 | # Matern 3/2 coefficients
129 | ell = 2.
130 | sigma = 1.
131 |
132 | A = np.array([[0., 1.],
133 | [-3 / ell ** 2, -2 * math.sqrt(3) / ell]])
134 | B = np.array([0., sigma * math.sqrt(12 * math.sqrt(3)) / ell ** (3 / 2)])
135 |
136 | m0 = np.zeros((2, ))
137 | p0 = np.array([[sigma ** 2, 0.],
138 | [0., 3 * sigma ** 2 / ell ** 2]])
139 |
140 | dt = 0.1
141 | F, Q = lti_sde_to_disc(A, B, dt)
142 |
143 | H = np.array([[1., 0.]])
144 | R = 0.1
145 |
146 | # Generate data
147 | T = 100
148 | ts = np.linspace(dt, T * dt, T)
149 | xs, ys = simulate_data_from_disc_ss(F, Q, H, R, m0, p0, T)
150 |
151 | # Filtering and smoothing
152 | mfs, pfs, mss, pss = kf_rts(F, Q, H, R, ys, m0, p0)
153 |
154 | # Animation for filtering
155 | # Updating a PolyCollection in matplotlib is awkward,
156 | # so the fill_between band is cleared and redrawn on every frame below
157 | fig, ax = plt.subplots(figsize=(8.8, 6.6))
158 |
159 | line_true, = ax.plot(ts[0], xs[0, 0], c='black', linewidth=3, label='True signal $X(t)$')
160 | sct_data = ax.scatter(ts[0], ys[0], s=4, c='purple', label='Measurement $Y_k$')
161 | line_mfs, = ax.plot(ts[0], mfs[0, 0], c='tab:blue', linewidth=3, label='Filtering mean')
162 | ax.fill_between(
163 | ts[:1],
164 | mfs[:1, 0] - 1.96 * np.sqrt(pfs[:1, 0, 0]),
165 | mfs[:1, 0] + 1.96 * np.sqrt(pfs[:1, 0, 0]),
166 | color='tab:blue',
167 | edgecolor='none',
168 | alpha=0.1,
169 | label='.95 confidence'
170 | )
171 | k_line = ax.axvline(ts[0], c='black', linestyle='--')
172 | k_text = ax.text(ts[0], 0., '$k=0$', fontsize=18)
173 |
174 | ax.set_xlim(0, T * dt)
175 | ax.set_ylim(-2.5, 1.5)
176 | ax.legend(loc='upper left', ncol=2, fontsize=18)
177 | ax.set_xlabel('$t$', fontsize=18)
178 |
179 | plt.subplots_adjust(top=.986, bottom=.084, left=.063, right=.988)
180 |
181 | def anime_func(frame):
182 | line_true.set_data(ts[:frame], xs[:frame, 0])
183 | line_mfs.set_data(ts[:frame], mfs[:frame, 0])
184 | k_line.set_data((ts[frame-1], ts[frame-1]), (0, 1))
185 | k_text.set_text(f'$k={frame}$')
186 | k_text.set_position((ts[frame], 0.))
187 | for coll in list(ax.collections): coll.remove()  # ax.collections.clear() was removed in matplotlib >= 3.7
188 | ax.fill_between(
189 | ts[:frame],
190 | mfs[:frame, 0] - 1.96 * np.sqrt(pfs[:frame, 0, 0]),
191 | mfs[:frame, 0] + 1.96 * np.sqrt(pfs[:frame, 0, 0]),
192 | color='tab:blue',
193 | edgecolor='none',
194 | alpha=0.1
195 | )
196 | ax.scatter(ts[:frame], ys[:frame], s=4, c='purple')
197 |
198 | ani = FuncAnimation(fig, anime_func,
199 | frames=T, interval=10,
200 | repeat=False)
201 |
202 | ani.save('../figs/animes/filter.png', writer=anime_writer)
203 | # plt.show()
204 |
205 | plt.close(fig)
206 |
207 | # Animation for smoothing
208 | fig, ax = plt.subplots(figsize=(8.8, 6.6))
209 |
210 | ax.plot(ts, xs[:, 0], c='black', linewidth=3, label='True signal $X(t)$')
211 | ax.scatter(ts, ys, s=4, c='purple', label='Measurement $Y_k$')
212 | line_mss, = ax.plot(ts, mss[:, 0], c='tab:blue', linewidth=3, label='Smoothing mean')
213 | ax.fill_between(
214 | ts[:1],
215 | mss[:1, 0] - 1.96 * np.sqrt(pss[:1, 0, 0]),
216 | mss[:1, 0] + 1.96 * np.sqrt(pss[:1, 0, 0]),
217 | color='tab:blue',
218 | edgecolor='none',
219 | alpha=0.1,
220 | label='.95 confidence'
221 | )
222 | k_line = ax.axvline(ts[0], c='black', linestyle='--')
223 | k_text = ax.text(ts[0], 0., '$k=0$', fontsize=18)
224 |
225 | ax.set_xlim(0, T * dt)
226 | ax.set_ylim(-2.5, 1.5)
227 | ax.legend(loc='upper left', ncol=2, fontsize=18)
228 | ax.set_xlabel('$t$', fontsize=18)
229 |
230 | plt.subplots_adjust(top=.986, bottom=.084, left=.063, right=.988)
231 |
232 | def anime_func(frame):
233 | line_mss.set_data(ts[:frame], mss[:frame, 0])
234 | k_line.set_data((ts[frame-1], ts[frame-1]), (0, 1))
235 | k_text.set_text(f'$k={frame}$')
236 | k_text.set_position((ts[frame], 0.))
237 | for coll in list(ax.collections): coll.remove()  # ax.collections.clear() was removed in matplotlib >= 3.7
238 | ax.fill_between(
239 | ts[:frame],
240 | mss[:frame, 0] - 1.96 * np.sqrt(pss[:frame, 0, 0]),
241 | mss[:frame, 0] + 1.96 * np.sqrt(pss[:frame, 0, 0]),
242 | color='tab:blue',
243 | edgecolor='none',
244 | alpha=0.1
245 | )
246 | ax.scatter(ts, ys, s=4, c='purple')
247 |
248 | ani = FuncAnimation(fig, anime_func,
249 | frames=T, interval=10,
250 | repeat=False)
251 |
252 | ani.save('../figs/animes/smoother.png', writer=anime_writer)
253 | # plt.show()
254 |
255 |
--------------------------------------------------------------------------------
/thesis_latex/ch6.tex:
--------------------------------------------------------------------------------
1 | %!TEX root = dissertation.tex
2 | \chapter{Summary and discussion}
3 | \label{chap:summary}
4 | In this chapter we present a concise summary of Publications I--VII, as well as a discussion of a few unsolved problems and possible future extensions.
5 |
6 | \section{Summary of publications}
7 | This section briefly summarises the contributions of Publications~I--VII and highlights their significance.
8 |
9 | \subsection*{Publication~\cp{paperTME} (Chapter~\ref{chap:tme})}
10 | This paper proposes a new class of non-linear continuous-discrete Gaussian filters and smoothers by using the Taylor moment expansion (TME) scheme to predict the means and covariances from SDEs. The main significance of this paper is that the TME method can provide asymptotically exact solutions of the predictive mean and covariances required in the Gaussian filtering and smoothing steps. Secondly, the paper analyses the positive definiteness of TME covariance approximations and thereupon presents a few sufficient conditions to guarantee the positive definiteness. Lastly, the paper analyses the stability of TME Gaussian filters.
11 |
12 | \subsection*{Publication~\cp{paperSSDGP} (Chapter~\ref{chap:dssgp})}
13 | This paper introduces state-space representations of a class of deep Gaussian processes (DGPs). More specifically, the paper defines DGPs as vector-valued stochastic processes over collections of conditional GPs and then represents DGPs as hierarchical systems of the SDE representations of their conditional GPs. The main significance of this paper is that the resulting state-space DGPs (SS-DGPs) are Markov processes, so that the SS-DGP regression problem can be solved by continuous-discrete filtering and smoothing methods at a computational cost that is linear in the number of measurements. Secondly, the paper identifies a class of SS-DGPs for which Gaussian filtering and smoothing methods fail to learn the posterior distributions of their state components. Finally, the paper features a real application of SS-DGPs in modelling a gravitational wave signal.
14 |
15 | \subsection*{Publication~\cp{paperKFSECG} (Section~\ref{sec:spectro-temporal})}
16 | This paper is an extension of Publication~\cp{paperKFSECGCONF}. In particular, quasi-periodic SDEs are used to model the Fourier coefficients instead of the Ornstein--Uhlenbeck ones used in Publication~\cp{paperKFSECGCONF}. This consideration leads to state-space models whose measurement representations are time-invariant; therefore, one can use steady-state Kalman filters and smoothers to solve the spectro-temporal estimation problem at a lower computational cost compared to Publication~\cp{paperKFSECGCONF}. This paper also expands the experiments for atrial fibrillation detection by taking more classifiers into account.
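To illustrate the steady-state idea in Python (this is an illustrative sketch, not the publication's code; the function name is hypothetical): for a time-invariant state-space model, the predicted filter covariance converges to a fixed point of the discrete algebraic Riccati equation, which SciPy can solve directly, so the Kalman gain can be precomputed once.

```python
import numpy as np
import scipy.linalg


def steady_state_kalman_gain(F, Q, H, R):
    """Steady-state predicted covariance and gain for the model
    x_k = F x_{k-1} + q_{k-1},  y_k = H x_k + r_k.

    The predicted covariance P solves the discrete algebraic Riccati
    equation, hence the gain K = P H^T (H P H^T + R)^{-1} is constant.
    """
    P = scipy.linalg.solve_discrete_are(F.T, H.T, Q, R)
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    return K, P
```

With a fixed gain, every filter step reduces to a single affine update of the mean, which is what makes the steady-state variant cheaper than running the full Kalman covariance recursion.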
17 |
18 | \subsection*{Publication~\cp{paperDRIFT} (Section~\ref{sec:drift-est})}
19 | This paper is concerned with the state-space GP approach for estimating unknown drift functions of SDEs from partially observed trajectories. This approach is significant mainly in terms of computation, as the computational complexity scales linearly in the number of measurements. In addition, the state-space GP approach allows for using high-order It\^{o}--Taylor expansions in order to give accurate SDE discretisations without the necessity to compute the covariance matrices of the derivatives of the GP prior.
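As a hedged sketch of the discretisation step mentioned above (illustrative only; the paper uses higher-order It\^{o}--Taylor expansions, whereas this shows the simplest, order-0.5 member of that family, the Euler--Maruyama scheme, for a scalar SDE):

```python
import numpy as np


def euler_maruyama(drift, dispersion, x0, dt, num_steps, rng):
    """Simulate dX = drift(X) dt + dispersion dW with the Euler-Maruyama
    scheme; higher-order Ito-Taylor expansions add correction terms."""
    xs = np.empty(num_steps + 1)
    xs[0] = x0
    for k in range(num_steps):
        xs[k + 1] = (xs[k] + drift(xs[k]) * dt
                     + dispersion * np.sqrt(dt) * rng.standard_normal())
    return xs
```

A trajectory discretised this way yields pairwise transition terms linking consecutive states, and it is through such terms that the GP posterior over the unknown drift is formed.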
20 |
21 | \subsection*{Publication~\cp{paperKFSECGCONF} (Section~\ref{sec:spectro-temporal})}
22 | This paper introduces a state-space probabilistic spectro-temporal estimation method and thereupon applies the method for detecting atrial fibrillation from electrocardiogram signals. The so-called probabilistic spectro-temporal estimation is a GP regression-based model for estimating the coefficients of Fourier expansions. The main significance of this paper is that the state-space framework allows for dealing with large sets of measurements and high-order Fourier expansions. Also, the combination of the spectro-temporal estimation method and deep convolutional neural networks shows efficacy for classifying a class of electrocardiogram signals.
23 |
24 | \subsection*{Publication~\cp{paperMARITIME} (Section~\ref{sec:maritime})}
25 | This paper reviews sensor technologies and machine learning methods for autonomous maritime vessel navigation. In particular, the paper lists and reviews a number of studies that use deep learning and GP methods for vessel trajectory analysis, ship detection and classification, and ship tracking. The paper also features a ship detection example by using a deep convolutional neural network.
26 |
27 | \subsection*{Publication~\cp{paperRNSSGP} (Section~\ref{sec:l1-r-dgp})}
28 | This paper solves $L^1$-regularised DGP regression problems under the alternating direction method of multipliers (ADMM) framework. The significance of this paper is that one can introduce regularisation (e.g., sparseness or total variation) at any level of the DGP component hierarchy. Secondly, the paper provides a general framework that allows for regularising both batch and state-space DGPs. Finally, the paper presents a convergence analysis for the proposed ADMM solution of $L^1$-regularised DGP regression problems.
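The splitting behind this kind of solver can be sketched on a generic $L^1$-regularised least-squares problem (a minimal illustration of scaled-form ADMM, not the paper's DGP-specific algorithm; all names are ours):

```python
import numpy as np


def admm_l1(A, y, lam, rho=1.0, num_iters=200):
    """Scaled-form ADMM for min_x 0.5 ||A x - y||^2 + lam ||x||_1.

    x-update: a ridge-like linear solve; z-update: soft-thresholding
    (the proximal operator of the L1 norm); u: scaled dual variable.
    """
    n = A.shape[1]
    x, z, u = (np.zeros(n) for _ in range(3))
    M = np.linalg.inv(A.T @ A + rho * np.eye(n))
    Aty = A.T @ y
    for _ in range(num_iters):
        x = M @ (Aty + rho * (z - u))
        v = x + u
        z = np.sign(v) * np.maximum(np.abs(v) - lam / rho, 0.0)
        u = u + x - z
    return z
```

The appeal of the splitting is that the non-smooth $L^1$ term is isolated in the z-update, so regularisation can be attached to any chosen component of the hierarchy while the remaining subproblem stays smooth.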
29 |
30 | \section{Discussion}
31 | Finally, we end this thesis with a discussion of some unsolved problems and possible future extensions.
32 |
33 | \subsection*{Positive definiteness analysis for high-order and high-dimensional TME covariance approximation}
34 | Theorem~\ref{thm:tme-cov-pd} provides a sufficient condition to guarantee the positive definiteness of TME covariance approximations. However, the use of Theorem~\ref{thm:tme-cov-pd} soon becomes infeasible as the expansion order $M$ and the state dimension $d$ grow large. In practice, it can be easier to check the positive definiteness numerically when $d$ is small.
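The numerical check alluded to above can be as simple as inspecting the smallest eigenvalue (an illustrative helper, not thesis code):

```python
import numpy as np


def is_positive_definite(cov, tol=0.0):
    """Numerically check a symmetric matrix for positive definiteness
    via its smallest eigenvalue (eigvalsh assumes symmetry)."""
    return bool(np.linalg.eigvalsh(cov).min() > tol)
```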
35 |
36 | \subsection*{Practical implementation of TME}
37 | A practical challenge with implementing TME consists in the presence of derivative terms in $\A$ (see, Equation~\eqref{equ:generator-ito}). This in turn implies that the iterated generator $\A^M$ further requires the computation of derivatives of the SDE coefficients up to order $M$. While the derivatives of $\A$ are easily computed by hand, the derivatives in $\A^M$ require more consideration as they involve numerous applications of the chain rule, not to mention the multidimensional operator $\Am$ in Remark~\ref{remark:multidim-generator}.
38 |
39 | While in our current implementation we chose symbolic differentiation (for ease of implementation as well as portability across languages), several arguments can be made against it. Symbolic differentiation explicitly computes full Jacobians where only vector-Jacobian or Jacobian-vector products are needed, inducing an unnecessary overhead that grows with the dimension of the problem. Moreover, symbolic differentiation is usually decoupled from modern differentiable programming frameworks and their optimisations for parallel hardware (e.g., GPUs), and hence may incur a loss of performance on such platforms.
40 |
41 | Automatic differentiation frameworks, for instance, TensorFlow and JAX, are well suited to computing the derivatives in $\Am$. Furthermore, they provide efficient Jacobian-vector/vector-Jacobian products. We hence argue that these tools are worthwhile for performance improvements in the future\footnote{By the time of the pre-examination of this thesis, the TME method had been implemented in JAX as an open-source library (see Section~\ref{sec:codes}).}.
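The gain is easy to see in miniature: forward-mode automatic differentiation propagates a tangent alongside each value, producing a Jacobian-vector product without ever forming the Jacobian. The toy dual-number class below is only a conceptual sketch of what such frameworks do internally, not their actual implementation:

```python
import math

class Dual:
    """A (value, tangent) pair: forward-mode AD in miniature."""
    def __init__(self, val, tan=0.0):
        self.val, self.tan = val, tan

    def __mul__(self, other):
        # Product rule, applied locally -- no full Jacobian is formed.
        return Dual(self.val * other.val,
                    self.val * other.tan + self.tan * other.val)

    def tanh(self):
        t = math.tanh(self.val)
        return Dual(t, (1.0 - t * t) * self.tan)

def jvp(f, xs, vs):
    # Evaluate f and the Jacobian-vector product J(xs) @ vs in one pass.
    outs = f([Dual(x, v) for x, v in zip(xs, vs)])
    return [o.val for o in outs], [o.tan for o in outs]
```

For example, `jvp(lambda ds: [d.tanh() for d in ds], [0.0], [1.0])` returns values `[0.0]` and tangents `[1.0]`, since $\tanh'(0) = 1$.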
42 |
43 | \subsection*{Generalisation of the identifiability analysis}
44 | The identifiability analysis in Section~\ref{sec:identi-problem} is limited to SS-DGPs whose GP elements are one-dimensional. This dimension assumption is used in order to derive Equation~\eqref{equ:vanish-cov-eq1} in closed form. However, it is of interest to see whether Lemma~\ref{lemma:vanishing-prior-cov} can be generalised to SS-DGPs that have multidimensional GP elements.
45 |
46 | The abstract Gaussian filter in Algorithm~\ref{alg:abs-gf} assumes that the prediction steps are done exactly. However, this assumption may not always be realistic, because Gaussian filters often involve numerical integration to predict through SDEs, for example, by using sigma-point methods. Hence, it is important to verify whether Lemma~\ref{lemma:vanishing-prior-cov} still holds when the filtering predictions are computed by such numerical means.
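As a concrete instance of such numerical means, a one-dimensional Gauss--Hermite rule approximates a predicted moment $\operatorname{E}[f(X)]$ for $X \sim \mathrm{N}(m, P)$ by a weighted sum over deterministically placed sigma points (illustrative sketch; the helper name is ours):

```python
import numpy as np

def gauss_hermite_expectation(f, mean, var, order=5):
    # Probabilists' Gauss--Hermite nodes/weights satisfy
    #   \int g(x) exp(-x^2 / 2) dx  ~=  sum_i w_i g(x_i),
    # so E[f(X)] for X ~ N(mean, var) is the normalised weighted sum
    # of f over the sigma points mean + sqrt(var) * x_i.
    nodes, weights = np.polynomial.hermite_e.hermegauss(order)
    sigma_points = mean + np.sqrt(var) * nodes
    return np.sum(weights * f(sigma_points)) / np.sqrt(2.0 * np.pi)
```

An order-$p$ rule is exact only for polynomial integrands up to degree $2p - 1$, so the resulting Gaussian-filter prediction is, in general, inexact.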
47 |
48 | \subsection*{Spatio-temporal SS-DGPs}
49 | SS-DGPs are stochastic processes defined on temporal domains. In order to model spatio-temporal data, it is necessary to generalise SS-DGPs to take values in infinite-dimensional spaces~\citep{Giuseppe2014}. A path for this generalisation is to leverage the stochastic partial differential equation (SPDE) representations of spatio-temporal GPs. To see this, let us consider an $\mathbb{H}$-valued stochastic process $U \colon \T \to \mathbb{H}$ governed by a well-defined SPDE
50 | %
51 | \begin{equation}
52 | \diff U(t) = A \, U(t) \diff t + B \diff W(t) \nonumber
53 | \end{equation}
54 | %
55 | with some boundary and initial conditions, where $A\colon \mathbb{H} \to \mathbb{H}$ and $B\colon \mathbb{W} \to \mathbb{H}$ are linear operators, and $W\colon \T \to \mathbb{W}$ is a $\mathbb{W}$-valued Wiener process. Then we can borrow the idea presented in Section~\ref{sec:ssdgp} to form a spatio-temporal SS-DGP by hierarchically composing SPDEs of the form above.
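As a purely illustrative sketch (the second-level process $V$ and the operators $A_V$ and $B_V$ below are hypothetical placeholders, not defined elsewhere in this thesis), such a hierarchical composition could, for instance, let one SPDE drive the dispersion of another:
%
\begin{equation}
\begin{aligned}
\diff U(t) &= A \, U(t) \diff t + B\bigl(V(t)\bigr) \diff W_1(t), \\
\diff V(t) &= A_V \, V(t) \diff t + B_V \diff W_2(t),
\end{aligned} \nonumber
\end{equation}
%
where the dispersion operator of the first layer now depends on the second-layer process $V$, mirroring the temporal SS-DGP construction in Section~\ref{sec:ssdgp}.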
56 |
57 | A different path for generalising SS-DGPs is shown by~\citet{Emzir2020}. Specifically, they build deep Gaussian fields based on the SPDE representations of \matern fields~\citep{Whittle1954, Lindgren2011}. However, we should note that this approach gives random fields instead of spatio-temporal processes.
58 |
--------------------------------------------------------------------------------
/thesis_latex/fourier2.sty:
--------------------------------------------------------------------------------
1 | \def\fileversion{1.4}%
2 | \def\filedate{2005/01/01}%
3 | \NeedsTeXFormat{LaTeX2e}%
4 | \ProvidesPackage{fourier}%
5 | [\filedate\space\fileversion\space fourier-GUTenberg package]%
6 | \DeclareFontEncoding{FML}{}{}
7 | \DeclareFontSubstitution{FML}{futm}{m}{it}
8 | \DeclareFontEncoding{FMS}{}{}
9 | \DeclareFontSubstitution{FMS}{futm}{m}{n}
10 | %\DeclareFontEncoding{FMX}{}{}
11 | % \DeclareFontSubstitution{FMX}{futm}{m}{n}
12 | %%
13 | \newif\ifsloped\newif\ifpoorman\poormantrue
14 | \newif\ifwidespace\widespacefalse
15 | \DeclareOption{widespace}{\widespacetrue}
16 | %%
17 | \DeclareOption{poorman}{\def\textfamilyextension{s}%
18 | \def\mathfamilyextension{s}}
19 | \DeclareOption{expert}{\def\textfamilyextension{x}%
20 | \def\mathfamilyextension{x}\poormanfalse}
21 | \DeclareOption{oldstyle}{\def\textfamilyextension{j}%
22 | \def\mathfamilyextension{x}\poormanfalse}
23 | \DeclareOption{fulloldstyle}{\def\textfamilyextension{j}%
24 | \def\mathfamilyextension{j}\poormanfalse}
25 | \DeclareOption{sloped}{\slopedtrue}
26 | \DeclareOption{upright}{\slopedfalse}
27 | \ExecuteOptions{sloped,poorman}
28 | \ProcessOptions
29 | %%
30 |
31 | %%
32 | \ifwidespace
33 | \DeclareRobustCommand{\SetFourierSpace}{%
34 | \fontdimen2\font=1.23\fontdimen2\font}
35 | \fi
36 | \ifpoorman\else
37 | \newcommand*{\sbseries}{\fontseries{sb}\selectfont}
38 | \newcommand*{\blackseries}{\fontseries{eb}\selectfont}
39 | \newcommand*{\titleshape}{\fontshape{tt}\selectfont}
40 | \DeclareTextFontCommand{\textsb}{\sbseries}%
41 | \DeclareTextFontCommand{\textblack}{\blackseries}%
42 | \DeclareTextFontCommand{\texttitle}{\titleshape}%
43 | \newcommand*{\oldstyle}{\fontfamily{futj}\selectfont}
44 | \newcommand*{\lining}{\fontfamily{futx}\selectfont}
45 | \fi
46 | \renewcommand{\rmdefault}{fut\textfamilyextension}
47 | \RequirePackage[T1]{fontenc}
48 | \RequirePackage{textcomp}
49 | \RequirePackage{fourier-orns}
50 | \DeclareSymbolFont{operators}{T1}{fut\mathfamilyextension}{m}{n}%
51 | \SetSymbolFont{operators}{bold}{T1}{fut\mathfamilyextension}{b}{n}%
52 |
53 | %
54 | \def\addFourierGreekPrefix#1{other}
55 | \newcommand{\othergreek}[1]{\expandafter\csname\expandafter%
56 | \addFourierGreekPrefix\string#1\endcsname}
57 | %
58 | \ifsloped
59 | \DeclareSymbolFont{letters}{FML}{futmi}{m}{it}%
60 | \DeclareSymbolFont{otherletters}{FML}{futm}{m}{it}
61 | \SetSymbolFont{letters}{bold}{FML}{futmi}{b}{it}
62 | \SetSymbolFont{otherletters}{bold}{FML}{futm}{b}{it}
63 | \DeclareMathSymbol{\Gamma}{\mathord}{otherletters}{000}
64 | \DeclareMathSymbol{\Delta}{\mathord}{otherletters}{001}
65 | \DeclareMathSymbol{\Theta}{\mathord}{otherletters}{002}
66 | \DeclareMathSymbol{\Lambda}{\mathord}{otherletters}{003}
67 | \DeclareMathSymbol{\Xi}{\mathord}{otherletters}{004}
68 | \DeclareMathSymbol{\Pi}{\mathord}{otherletters}{005}
69 | \DeclareMathSymbol{\Sigma}{\mathord}{otherletters}{006}
70 | \DeclareMathSymbol{\Upsilon}{\mathord}{otherletters}{007}
71 | \DeclareMathSymbol{\Phi}{\mathord}{otherletters}{008}
72 | \DeclareMathSymbol{\Psi}{\mathord}{otherletters}{009}
73 | \DeclareMathSymbol{\Omega}{\mathord}{otherletters}{010}
74 | \DeclareMathSymbol{\otherGamma}{\mathord}{letters}{000}
75 | \DeclareMathSymbol{\otherDelta}{\mathord}{letters}{001}
76 | \DeclareMathSymbol{\otherTheta}{\mathord}{letters}{002}
77 | \DeclareMathSymbol{\otherLambda}{\mathord}{letters}{003}
78 | \DeclareMathSymbol{\otherXi}{\mathord}{letters}{004}
79 | \DeclareMathSymbol{\otherPi}{\mathord}{letters}{005}
80 | \DeclareMathSymbol{\otherSigma}{\mathord}{letters}{006}
81 | \DeclareMathSymbol{\otherUpsilon}{\mathord}{letters}{007}
82 | \DeclareMathSymbol{\otherPhi}{\mathord}{letters}{008}
83 | \DeclareMathSymbol{\otherPsi}{\mathord}{letters}{009}
84 | \DeclareMathSymbol{\otherOmega}{\mathord}{letters}{010}
85 | \else
86 | \DeclareSymbolFont{letters}{FML}{futm}{m}{it}%
87 | \DeclareSymbolFont{otherletters}{FML}{futmi}{m}{it}
88 | \SetSymbolFont{letters}{bold}{FML}{futm}{b}{it}
89 | \SetSymbolFont{otherletters}{bold}{FML}{futmi}{b}{it}
90 | \DeclareMathSymbol{\Gamma}{\mathord}{letters}{000}
91 | \DeclareMathSymbol{\Delta}{\mathord}{letters}{001}
92 | \DeclareMathSymbol{\Theta}{\mathord}{letters}{002}
93 | \DeclareMathSymbol{\Lambda}{\mathord}{letters}{003}
94 | \DeclareMathSymbol{\Xi}{\mathord}{letters}{004}
95 | \DeclareMathSymbol{\Pi}{\mathord}{letters}{005}
96 | \DeclareMathSymbol{\Sigma}{\mathord}{letters}{006}
97 | \DeclareMathSymbol{\Upsilon}{\mathord}{letters}{007}
98 | \DeclareMathSymbol{\Phi}{\mathord}{letters}{008}
99 | \DeclareMathSymbol{\Psi}{\mathord}{letters}{009}
100 | \DeclareMathSymbol{\Omega}{\mathord}{letters}{010}
101 | \DeclareMathSymbol{\otherGamma}{\mathord}{otherletters}{000}
102 | \DeclareMathSymbol{\otherDelta}{\mathord}{otherletters}{001}
103 | \DeclareMathSymbol{\otherTheta}{\mathord}{otherletters}{002}
104 | \DeclareMathSymbol{\otherLambda}{\mathord}{otherletters}{003}
105 | \DeclareMathSymbol{\otherXi}{\mathord}{otherletters}{004}
106 | \DeclareMathSymbol{\otherPi}{\mathord}{otherletters}{005}
107 | \DeclareMathSymbol{\otherSigma}{\mathord}{otherletters}{006}
108 | \DeclareMathSymbol{\otherUpsilon}{\mathord}{otherletters}{007}
109 | \DeclareMathSymbol{\otherPhi}{\mathord}{otherletters}{008}
110 | \DeclareMathSymbol{\otherPsi}{\mathord}{otherletters}{009}
111 | \DeclareMathSymbol{\otherOmega}{\mathord}{otherletters}{010}
112 | \fi
113 | \DeclareSymbolFont{symbols}{FMS}{futm}{m}{n}%
114 | %\DeclareSymbolFont{largesymbols}{FMX}{futm}{m}{n}
115 | \DeclareMathAlphabet{\mathbf}{T1}{fut\mathfamilyextension}{bx}{n}%
116 | \DeclareMathAlphabet{\mathrm}{T1}{fut\mathfamilyextension}{m}{n}%
117 | \DeclareMathAlphabet{\mathit}{T1}{fut\mathfamilyextension}{m}{it}%
118 | \DeclareMathAlphabet{\mathcal}{FMS}{futm}{m}{n}%
119 | \DeclareMathSymbol{\varkappa}{\mathord}{letters}{128}
120 | \DeclareMathSymbol{\varvarrho}{\mathord}{letters}{129}
121 | \DeclareMathSymbol{+}{\mathbin}{symbols}{128}
122 | \DeclareMathSymbol{=}{\mathrel}{symbols}{129}
123 | \DeclareMathSymbol{<}{\mathrel}{symbols}{130}
124 | \DeclareMathSymbol{>}{\mathrel}{symbols}{131}
125 | \DeclareMathSymbol{\leqslant}{\mathrel}{symbols}{132}
126 | \DeclareMathSymbol{\geqslant}{\mathrel}{symbols}{133}
127 | \DeclareMathSymbol{\parallelslant}{\mathrel}{symbols}{134}
128 | \DeclareMathSymbol{\thething}{\mathord}{symbols}{135}
129 | \DeclareMathSymbol{\vDash}{\mathrel}{symbols}{136}
130 | \DeclareMathSymbol{\blacktriangleleft}{\mathrel}{symbols}{137}
131 | \DeclareMathSymbol{\blacktriangleright}{\mathrel}{symbols}{138}
132 | \DeclareMathSymbol{\nleqslant}{\mathrel}{symbols}{139}
133 | \DeclareMathSymbol{\ngeqslant}{\mathrel}{symbols}{140}
134 | \DeclareMathSymbol{\parallel}{\mathrel}{symbols}{141}
135 | \DeclareMathSymbol{\nparallel}{\mathrel}{symbols}{142}
136 | \DeclareMathSymbol{\nparallelslant}{\mathrel}{symbols}{143}
137 | \DeclareMathSymbol{\nvDash}{\mathrel}{symbols}{144}
138 | \DeclareMathSymbol{\intercal}{\mathbin}{symbols}{145}
139 | \DeclareMathSymbol{\hslash}{\mathord}{symbols}{146}
140 | \DeclareMathSymbol{\nexists}{\mathord}{symbols}{147}
141 | \DeclareMathSymbol{\complement}{\mathord}{symbols}{148}
142 | \DeclareMathSymbol{\varsubsetneq}{\mathrel}{symbols}{149}
143 | \DeclareMathSymbol{\xswordsup}{\mathord}{symbols}{150}
144 | \DeclareMathSymbol{\xswordsdown}{\mathord}{symbols}{151}
145 | \let\notin\@undefined
146 | \DeclareMathSymbol{\notin}{\mathrel}{symbols}{155}
147 | \DeclareMathSymbol{\notowns}{\mathrel}{symbols}{156}
148 | \DeclareMathSymbol{\hbar}{\mathord}{symbols}{157}
149 | \DeclareMathSymbol{\smallsetminus}{\mathbin}{symbols}{158}
150 | \DeclareMathSymbol{\subsetneqq}{\mathrel}{symbols}{159}
151 | \DeclareMathSymbol{\rightrightarrows}{\mathrel}{symbols}{160}
152 | \DeclareMathSymbol{\leftleftarrows}{\mathrel}{symbols}{161}
153 | \DeclareMathSymbol{\square}{\mathord}{symbols}{162}
154 | \DeclareMathSymbol{\curvearrowleft}{\mathrel}{symbols}{163}
155 | \DeclareMathSymbol{\curvearrowright}{\mathrel}{symbols}{164}
156 | \DeclareMathSymbol{\blacksquare}{\mathord}{symbols}{165}
157 | \DeclareMathSymbol{\otheralpha}{\mathord}{otherletters}{011}
158 | \DeclareMathSymbol{\otherbeta}{\mathord}{otherletters}{012}
159 | \DeclareMathSymbol{\othergamma}{\mathord}{otherletters}{013}
160 | \DeclareMathSymbol{\otherdelta}{\mathord}{otherletters}{014}
161 | \DeclareMathSymbol{\otherepsilon}{\mathord}{otherletters}{015}
162 | \DeclareMathSymbol{\otherzeta}{\mathord}{otherletters}{016}
163 | \DeclareMathSymbol{\othereta}{\mathord}{otherletters}{017}
164 | \DeclareMathSymbol{\othertheta}{\mathord}{otherletters}{018}
165 | \DeclareMathSymbol{\otheriota}{\mathord}{otherletters}{019}
166 | \DeclareMathSymbol{\otherkappa}{\mathord}{otherletters}{020}
167 | \DeclareMathSymbol{\otherlambda}{\mathord}{otherletters}{021}
168 | \DeclareMathSymbol{\othermu}{\mathord}{otherletters}{022}
169 | \DeclareMathSymbol{\othernu}{\mathord}{otherletters}{023}
170 | \DeclareMathSymbol{\otherxi}{\mathord}{otherletters}{024}
171 | \DeclareMathSymbol{\otherpi}{\mathord}{otherletters}{025}
172 | \DeclareMathSymbol{\otherrho}{\mathord}{otherletters}{026}
173 | \DeclareMathSymbol{\othersigma}{\mathord}{otherletters}{027}
174 | \DeclareMathSymbol{\othertau}{\mathord}{otherletters}{028}
175 | \DeclareMathSymbol{\otherupsilon}{\mathord}{otherletters}{029}
176 | \DeclareMathSymbol{\otherphi}{\mathord}{otherletters}{030}
177 | \DeclareMathSymbol{\otherchi}{\mathord}{otherletters}{031}
178 | \DeclareMathSymbol{\otherpsi}{\mathord}{otherletters}{032}
179 | \DeclareMathSymbol{\otheromega}{\mathord}{otherletters}{033}
180 | \DeclareMathSymbol{\othervarepsilon}{\mathord}{otherletters}{034}
181 | \DeclareMathSymbol{\othervartheta}{\mathord}{otherletters}{035}
182 | \DeclareMathSymbol{\othervarpi}{\mathord}{otherletters}{036}
183 | \DeclareMathSymbol{\othervarrho}{\mathord}{otherletters}{037}
184 | \DeclareMathSymbol{\othervarsigma}{\mathord}{otherletters}{038}
185 | \DeclareMathSymbol{\othervarphi}{\mathord}{otherletters}{039}
186 | \DeclareMathSymbol{\varkappa}{\mathord}{letters}{128}
187 | \DeclareMathSymbol{\varvarrho}{\mathord}{letters}{129}
188 | \DeclareMathSymbol{\varpartialdiff}{\mathord}{letters}{130}
189 | \DeclareMathSymbol{\varvarpi}{\mathord}{letters}{131}
190 | \DeclareMathSymbol{\othervarkappa}{\mathord}{otherletters}{128}
191 | \DeclareMathSymbol{\othervarvarrho}{\mathord}{otherletters}{129}
192 | \DeclareMathSymbol{\othervarvarpi}{\mathord}{otherletters}{131}
193 |
194 | % No MathDelimiters! - MV
195 |
196 | \DeclareMathAccent{\acute}{\mathalpha}{operators}{1}
197 | \DeclareMathAccent{\grave}{\mathalpha}{operators}{0}
198 | \DeclareMathAccent{\ddot}{\mathalpha}{operators}{4}
199 | \DeclareMathAccent{\tilde}{\mathalpha}{operators}{3}
200 | \DeclareMathAccent{\bar}{\mathalpha}{operators}{9}
201 | \DeclareMathAccent{\breve}{\mathalpha}{operators}{8}
202 | \DeclareMathAccent{\check}{\mathalpha}{operators}{7}
203 | \DeclareMathAccent{\hat}{\mathalpha}{operators}{2}
204 | \DeclareMathAccent{\dot}{\mathalpha}{operators}{10}
205 | \DeclareMathAccent{\mathring}{\mathalpha}{operators}{6}
206 | \DeclareMathAccent{\wideparen}{\mathord}{largesymbols}{148}
207 | %%
208 |
209 | \DeclareMathAccent{\widearc}{\mathord}{largesymbols}{216}
210 | \DeclareMathAccent{\wideOarc}{\mathord}{largesymbols}{228}
211 | %%
212 | \def\defaultscriptratio{.76}
213 | \def\defaultscriptscriptratio{.6}
214 | \DeclareMathSizes{5} {6} {6} {6}
215 | \DeclareMathSizes{6} {6} {6} {6}
216 | \DeclareMathSizes{7} {6.8} {6} {6}
217 | \DeclareMathSizes{8} {8} {6.8}{6}
218 | \DeclareMathSizes{9} {9} {7.6}{6}
219 | \DeclareMathSizes{10} {10} {7.6}{6}
220 | \DeclareMathSizes{10.95}{10.95}{8.3}{6}
221 | \DeclareMathSizes{12} {12} {9} {7}
222 | \DeclareMathSizes{14.4} {14.4} {10} {8}
223 | \DeclareMathSizes{17.28}{17.28}{12} {9}
224 | \DeclareMathSizes{20.74}{20.74}{14.4}{10}
225 | \DeclareMathSizes{24.88}{24.88}{17.28}{12}
226 | \thinmuskip=2mu
227 | \medmuskip=2.5mu plus 1mu minus 2.5mu
228 | \thickmuskip=3.5mu plus 2.5mu
229 | %%
230 | \delimiterfactor850
231 | %%
232 | \DeclareFontFamily{U}{futm}{}
233 | \DeclareFontShape{U}{futm}{m}{n}{
234 | <-> s * [.92] fourier-bb
235 | }{}
236 | \DeclareSymbolFont{Ufutm}{U}{futm}{m}{n}
237 | \DeclareSymbolFontAlphabet{\math@bb}{Ufutm}
238 | \AtBeginDocument{\let\mathbb\math@bb %
239 |
240 | \ifx\overset\@undefined\else
241 | \newcommand{\widering}[1]{\overset{\smash{\vbox to .2ex{%
242 | \hbox{$\mathring{}$}}}}{\wideparen{#1}}}
243 | \fi
244 | %
245 | \def\accentclass@{0} % I'm unsure whether this is ok
246 | }
247 | %
248 | %
249 | \endinput
250 |
251 |
--------------------------------------------------------------------------------
/scripts/TME_estimation_benes.py:
--------------------------------------------------------------------------------
1 | # Demonstrate TME on a Benes model for some expectation approximations. This will generate Figure 3.1 in the thesis.
2 | #
3 | # Zheng Zhao 2020
4 | #
5 | import os
6 | import math
7 | import sympy
8 | import numpy as np
9 | import matplotlib.pyplot as plt
10 | import tme.base_sympy as tme
11 |
12 | from typing import Callable
13 | from sympy import lambdify
14 |
15 |
16 | def rieman1D(x: np.ndarray,
17 | f: Callable,
18 | *args, **kwargs):
19 |     r"""Riemann-sum approximation of the integral
20 |     \int f(x, *args, **kwargs) dx \approx \sum_i f(x_i) (x_i - x_{i-1}).
21 |     Can be replaced by :code:`np.trapz`.
22 |     """
23 | return np.sum(f(x[1:], *args, **kwargs) * np.diff(x))
24 |
25 |
26 | def benesPDF(x: np.ndarray,
27 | x0: float,
28 | dt: float):
29 | """
30 | Transition density of the Benes model.
31 |     See p. 214 of Sarkka (2019).
32 | """
33 | return 1 / math.sqrt(2 * math.pi * dt) * np.cosh(x) / np.cosh(x0) \
34 | * math.exp(-0.5 * dt) * np.exp(-0.5 / dt * (x - x0) ** 2)
35 |
36 |
37 | def f_mean(x: np.ndarray,
38 | x0: float,
39 | dt: float):
40 | """Expectation integrand.
41 | """
42 | return x * benesPDF(x, x0, dt)
43 |
44 |
45 | def f_x2(x: np.ndarray,
46 | x0: float,
47 | dt: float):
48 | """Expectation integrand.
49 | """
50 | return x ** 2 * benesPDF(x, x0, dt)
51 |
52 |
53 | def f_x3(x: np.ndarray,
54 | x0: float,
55 | dt: float):
56 | """Expectation integrand.
57 | """
58 | return x ** 3 * benesPDF(x, x0, dt)
59 |
60 |
61 | def f_nonlinear(x: np.ndarray,
62 | x0: float,
63 | dt: float):
64 | """Expectation integrand.
65 | """
66 | return np.sin(x) * benesPDF(x, x0, dt)
67 |
68 |
69 | def softplus(x):
70 | return np.log(1 + np.exp(x))
71 |
72 |
73 | def softplus_sympy(x):
74 | return sympy.log(1 + sympy.exp(x))
75 |
76 |
77 | def f_nn(x: np.ndarray,
78 | x0: float,
79 | dt: float):
80 |     """
81 |     A toy-level neural network with two softplus layers:
82 |     NN(x) = softplus(softplus(x))
83 |     """
84 |     return softplus(softplus(x)) * benesPDF(x, x0, dt)
85 |
86 |
87 | def em_mean(f: Callable,
88 | x0: float,
89 | dt: float):
90 | """E[x | x0] by Euler Maruyama
91 | """
92 | return x0 + f(x0) * dt
93 |
94 |
95 | def em_cov(f: Callable,
96 | x0: float,
97 | dt: float):
98 | """Var[x | x0] by Euler Maruyama
99 | """
100 | return dt
101 |
102 |
103 | def em_x3(f: Callable,
104 | x0: float,
105 | dt: float):
106 | """E[x^3 | x0] by Euler Maruyama
107 | """
108 |     return x0 ** 3 + 3 * x0 ** 2 * f(x0) * dt + 3 * x0 * dt \
109 |         + 3 * x0 * f(x0) ** 2 * dt ** 2 \
110 |         + f(x0) ** 3 * dt ** 3 + 3 * f(x0) * dt ** 2
111 |
112 |
113 | def ito15_mean(f: Callable,
114 | dfdx: Callable,
115 | d2fdx2: Callable,
116 | x0: float,
117 | dt: float):
118 | """E[x | x0] by Ito-1.5
119 | """
120 | return x0 + f(x0) * dt + (dfdx(x0) * f(x0) + 0.5 * d2fdx2(x0)) * dt ** 2 / 2
121 |
122 |
123 | def ito15_cov(f: Callable,
124 | dfdx: Callable,
125 | d2fdx2: Callable,
126 | x0: float,
127 | dt: float):
128 | """Cov[x | x0] by Ito-1.5
129 | """
130 | return dt + dfdx(x0) ** 2 * dt ** 3 / 3 + dfdx(x0) * dt ** 2
131 |
132 |
133 | def ito15_x3(f: Callable,
134 | dfdx: Callable,
135 | d2fdx2: Callable,
136 | x0: float,
137 | dt: float):
138 | """E[x^3 | x0] by Ito-1.5
139 | """
140 | z = x0 + f(x0) * dt + (dfdx(x0) * f(x0) + 0.5 * d2fdx2(x0)) * dt ** 2 / 2
141 | return z ** 3 + 3 * z * dt + 3 * z * dfdx(x0) * dt ** 2 + z * dfdx(x0) * dt ** 3
142 |
143 |
144 | tanh = lambda u: np.tanh(u)
145 | dtanh = lambda u: 1 - np.tanh(u) ** 2
146 | ddtanh = lambda u: 2 * (np.tanh(u) ** 3 - np.tanh(u))
147 |
148 | if __name__ == '__main__':
149 |
150 | # Initial value and paras
151 | np.random.seed(666)
152 | x0 = 0.5
153 | T = np.linspace(0.01, 4, 200)
154 |
155 | # Riemannian range
156 | range_dx = np.linspace(x0 - 20, x0 + 20, 100000)
157 |
158 | # Benes SDE
159 | x = sympy.MatrixSymbol('x', 1, 1)
160 | f = sympy.Matrix([sympy.tanh(x[0])])
161 | L = sympy.eye(1)
162 | dt_sym = sympy.Symbol('dt')
163 |
164 | # TME
165 | tme_mean, tme_cov = tme.mean_and_cov(x, f, L, dt_sym,
166 | order=3, simp=True)
167 | tme_x3 = tme.expectation(sympy.Matrix([x[0] ** 3]), x, f, L, dt_sym,
168 | order=3, simp=True)
169 | tme_nonlinear3 = tme.expectation(sympy.Matrix([sympy.sin(x[0])]), x, f, L, dt_sym,
170 | order=2, simp=True)
171 | tme_nonlinear4 = tme.expectation(sympy.Matrix([sympy.sin(x[0])]), x, f, L, dt_sym,
172 | order=3, simp=True)
173 | tme_nn = tme.expectation(sympy.Matrix([softplus_sympy(softplus_sympy(x[0]))]), x, f, L, dt_sym,
174 | order=3, simp=True)
175 |
176 | tme_mean_func = lambdify([x, dt_sym], tme_mean, 'numpy')
177 | tme_cov_func = lambdify([x, dt_sym], tme_cov, 'numpy')
178 | tme_x3_func = lambdify([x, dt_sym], tme_x3, 'numpy')
179 | tme_nonlinear_func3 = lambdify([x, dt_sym], tme_nonlinear3, 'numpy')
180 | tme_nonlinear_func4 = lambdify([x, dt_sym], tme_nonlinear4, 'numpy')
181 | tme_nonlinear_nn = lambdify([x, dt_sym], tme_nn, 'numpy')
182 |
183 | # Result containers
184 | tme_mean_result = np.zeros_like(T)
185 | tme_cov_result = np.zeros_like(T)
186 | tme_x3_result = np.zeros_like(T)
187 | tme_nonlinear3_result = np.zeros_like(T)
188 | tme_nonlinear4_result = np.zeros_like(T)
189 | tme_nn_result = np.zeros_like(T)
190 |
191 | riem_mean_result = np.zeros_like(T)
192 | riem_cov_result = np.zeros_like(T)
193 | riem_x3_result = np.zeros_like(T)
194 | riem_nonlinear_result = np.zeros_like(T)
195 | riem_nn_result = np.zeros_like(T)
196 |
197 | em_mean_result = np.zeros_like(T)
198 | em_cov_result = np.zeros_like(T)
199 | em_x3_result = np.zeros_like(T)
200 |
201 | ito15_mean_result = np.zeros_like(T)
202 | ito15_cov_result = np.zeros_like(T)
203 | ito15_x3_result = np.zeros_like(T)
204 |
205 | for idx, t in enumerate(T):
206 | tme_mean_result[idx] = tme_mean_func(np.array([[x0]]), np.array([[t]]))
207 | tme_cov_result[idx] = tme_cov_func(np.array([[x0]]), np.array([[t]]))
208 | tme_x3_result[idx] = tme_x3_func(np.array([[x0]]), np.array([[t]]))
209 | tme_nonlinear3_result[idx] = tme_nonlinear_func3(np.array([[x0]]), np.array([[t]]))
210 | tme_nonlinear4_result[idx] = tme_nonlinear_func4(np.array([[x0]]), np.array([[t]]))
211 | tme_nn_result[idx] = tme_nonlinear_nn(np.array([[x0]]), np.array([[t]]))
212 |
213 | riem_mean_result[idx] = rieman1D(range_dx, f_mean, x0=x0, dt=t)
214 | riem_cov_result[idx] = rieman1D(range_dx, f_x2, x0=x0, dt=t) - riem_mean_result[idx] ** 2
215 | riem_x3_result[idx] = rieman1D(range_dx, f_x3, x0=x0, dt=t)
216 | riem_nonlinear_result[idx] = rieman1D(range_dx, f_nonlinear, x0=x0, dt=t)
217 | riem_nn_result[idx] = rieman1D(range_dx, f_nn, x0=x0, dt=t)
218 |
219 | em_mean_result[idx] = em_mean(lambda z: np.tanh(z), x0, t)
220 | em_cov_result[idx] = em_cov(lambda z: np.tanh(z), x0, t)
221 | em_x3_result[idx] = em_x3(lambda z: np.tanh(z), x0, t)
222 |
223 | ito15_mean_result[idx] = ito15_mean(tanh, dtanh, ddtanh, x0, t)
224 | ito15_cov_result[idx] = ito15_cov(tanh, dtanh, ddtanh, x0, t)
225 | ito15_x3_result[idx] = ito15_x3(tanh, dtanh, ddtanh, x0, t)
226 |
227 | # Plot
228 |     path_figs = '../thesis_latex/figs'
229 | plt.rcParams.update({
230 | 'text.usetex': True,
231 | 'text.latex.preamble': r'\usepackage{fouriernc}',
232 | 'font.family': "serif",
233 | 'font.serif': 'New Century Schoolbook',
234 | 'font.size': 15})
235 |
236 | fig, axs = plt.subplots(nrows=4, ncols=1, figsize=(9, 12), sharex=True)
237 |
238 | # No need to show the mean because the results are identical
239 | # plt.figure()
240 | # plt.plot(T, tme_mean_result, label='TME')
241 | # plt.plot(T, riem_mean_result, label='Exact')
242 | # plt.plot(T, em_mean_result, label='EM')
243 | # plt.plot(T, ito15_mean_result, label='Ito15')
244 | # plt.legend()
245 | # plt.savefig(os.path.join(path_figs, 'tme-benes-mean.pdf'))
246 |
247 | # Variance
248 | axs[0].plot(T, riem_cov_result,
249 | c='black',
250 | linewidth=3, marker='+', markevery=20,
251 | markersize=17,
252 | label='Exact')
253 | axs[0].plot(T, tme_cov_result,
254 | c='tab:blue',
255 | linewidth=3, marker='.', markevery=20,
256 | markersize=17,
257 | label='TME-3')
258 | axs[0].plot(T, em_cov_result,
259 | c='tab:red',
260 | linewidth=3, marker='1', markevery=20,
261 | markersize=17,
262 | label='Euler--Maruyama')
263 | axs[0].plot(T, ito15_cov_result,
264 | c='tab:purple',
265 | linewidth=3, marker='2', markevery=20,
266 | markersize=17,
267 | label=r'It\^{o}-1.5')
268 |
269 | axs[0].grid(linestyle='--', alpha=0.3, which='both')
270 |
271 | axs[0].set_ylabel(r'$\mathrm{Var}\,[X(t) \mid X(t_0)]$')
272 | axs[0].legend(loc='upper left', fontsize=17)
273 |
274 | # X^3
275 | axs[1].plot(T, riem_x3_result,
276 | c='black',
277 | linewidth=3, marker='+', markevery=20,
278 | markersize=17,
279 | label='Exact')
280 | axs[1].plot(T, tme_x3_result,
281 | c='tab:blue',
282 | linewidth=3, marker='.', markevery=20,
283 | markersize=17,
284 | label='TME-3')
285 | axs[1].plot(T, em_x3_result,
286 | c='tab:red',
287 | linewidth=3, marker='1', markevery=20,
288 | markersize=17,
289 | label='Euler--Maruyama')
290 | axs[1].plot(T, ito15_x3_result,
291 | c='tab:purple',
292 | linewidth=3, marker='2', markevery=20,
293 | markersize=17,
294 | label=r'It\^{o}-1.5')
295 |
296 | axs[1].grid(linestyle='--', alpha=0.3, which='both')
297 |
298 | axs[1].set_ylabel(r'$\mathbb{E} \, [X^3(t) \mid X(t_0)]$')
299 | axs[1].legend(loc='upper left', fontsize=17)
300 |
301 | # Nonlinear function only by TME
302 | axs[2].plot(T, riem_nonlinear_result,
303 | c='black',
304 | linewidth=3, marker='+', markevery=20,
305 | markersize=17,
306 | label='Exact')
307 | axs[2].plot(T, tme_nonlinear3_result,
308 | c='tab:purple',
309 | linewidth=3, marker='.', markevery=20,
310 | markersize=17,
311 | label='TME-2')
312 | axs[2].plot(T, tme_nonlinear4_result,
313 | c='tab:blue',
314 | linewidth=3, marker='x', markevery=20,
315 | markersize=17,
316 | label='TME-3')
317 |
318 | axs[2].grid(linestyle='--', alpha=0.3, which='both')
319 | axs[2].set_ylim(-6, 4)
320 |
321 | axs[2].set_ylabel(r'$\mathbb{E}\, [\sin(X(t)) \mid X(t_0)]$')
322 | axs[2].legend(loc='lower left', fontsize=17)
323 |
324 | # Neural network
325 | axs[3].plot(T, riem_nn_result,
326 | c='black',
327 | linewidth=3, marker='+', markevery=20,
328 | markersize=17,
329 | label='Exact')
330 | axs[3].plot(T, tme_nn_result,
331 | c='tab:blue',
332 | linewidth=3, marker='.', markevery=20,
333 | markersize=17,
334 | label='TME-3')
335 |
336 | axs[3].grid(linestyle='--', alpha=0.3, which='both')
337 | # plt.ylim(-1, 1)
338 |
339 | axs[3].set_xlim(0, 4)
340 |
341 | axs[3].set_xlabel('$t$', fontsize=16)
342 | axs[3].set_ylabel(r'$\mathbb{E}\, [\log(1+\exp(\log(1+\exp(X(t))))) \mid X(t_0)]$')
343 | axs[3].legend(loc='upper left', fontsize=17)
344 |
345 | plt.tight_layout(pad=0.1)
346 | plt.savefig(os.path.join(path_figs, 'tme-benes-all.pdf'))
347 | # plt.show()
348 |
--------------------------------------------------------------------------------
/thesis_latex/ch1.tex:
--------------------------------------------------------------------------------
1 | %!TEX root = dissertation.tex
2 | \chapter{Introduction}
3 | \label{chap:intro}
4 | In signal processing, statistics, and machine learning, it is common to consider that noisy measurements/data are generated from a latent, unknown function. In statistics, this is often regarded as a regression problem over the space of functions. Specifically, Bayesian statistics impose a prior belief over the latent function of interest in the form of a probability distribution. It is therefore of vital importance to choose the prior appropriately, since it will encode the characteristics of the underlying function. In recent decades, Gaussian processes\footnote{In the statistics and applied probability literature, Gaussian processes can also be found under the name of Gaussian fields, in particular when they are multidimensional in the input. Depending on the context, we may use one or the other terminology interchangeably.}~\citep[GPs,][]{Carl2006GPML} have become a popular family of prior distributions over functions, and they have been used successfully in numerous applications~\citep{Roberts2013, Hennig2015, Kocijan2016}.
5 |
6 | Formally, GPs are function-valued random variables that have Gaussian distributions fully determined by their mean and covariance functions. The choice of mean and covariance functions is in itself arbitrary, which allows for representing functions with various properties. As an example, \matern covariance functions are used as priors to functions with different degrees of differentiability~\citep{Carl2006GPML}. However, the use of GPs in practice usually involves two main challenges.
7 |
8 | The first challenge lies in the expensive \textit{computational cost} of training and parameter estimation. Due to the necessity of inverting covariance matrices during the learning phase, the computational complexity of standard GP regression and parameter estimation is cubic in the number of measurements. This makes GPs computationally infeasible for large-scale datasets. Moreover, when the sampled data points are densely located, the covariance matrices that need inversion may be numerically singular or close to singular, making the learning process unstable.
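For concreteness, the cubic cost already shows up in the textbook GP posterior-mean computation, where an $n \times n$ covariance matrix must be factorised (an illustrative sketch with a squared-exponential covariance function; the names are ours):

```python
import numpy as np

def rbf(a, b, ell=1.0, sigma=1.0):
    # Squared-exponential (RBF) covariance function.
    return sigma ** 2 * np.exp(-0.5 * (a - b) ** 2 / ell ** 2)

def gp_posterior_mean(x_train, y_train, x_test, noise_var=1e-8):
    # The Cholesky factorisation of the n x n matrix K dominates the cost:
    # it grows as O(n^3) in the number of measurements n.
    K = rbf(x_train[:, None], x_train[None, :]) + noise_var * np.eye(len(x_train))
    Ks = rbf(x_test[:, None], x_train[None, :])
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    return Ks @ alpha
```

With densely located inputs, `K` above becomes nearly singular, which is exactly the numerical instability described in the text.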
9 |
10 | The second challenge is related to the modelling of irregular functions, such as piecewise smooth functions, or functions that have time-varying features (e.g., frequency or volatility). Many commonly used GPs (e.g., with \matern covariance functions) fail to cover these irregular functions mainly because their probability distributions are invariant under translation (i.e., they are said to be \textit{stationary}). This behaviour is illustrated in Figure~\ref{fig:gp-fail}, where we show that a \matern GP poorly fits two irregular functions (i.e., a rectangular signal and a composite sinusoidal signal), because the GP's parameters/features are assumed to be constant over time. Specifically, in the rectangular signal example, in order to model the discontinuities, the \matern GP recovers a small global length scale ($\ell \approx 0.04$), which results in a poor fit in the continuous and flat parts. Similarly, in the composite sinusoidal signal example, the GP learns a small global length scale ($\ell \approx 0.01$) in order to model the high-frequency sections of the signal. This, too, results in a poor fit in the low-frequency section of the signal.
11 |
12 | \begin{figure}[t!]
13 | \centering
14 | \includegraphics[width=.95\linewidth]{figs/gp-fail-example-m32}\\
15 | \includegraphics[width=.95\linewidth]{figs/gp-fail-example-sinu-m32}
16 | \caption{Mat\'{e}rn $\nu=3\,/\,2$ GP regression on a magnitude-varying rectangular signal (top) and a composite sinusoidal signal (bottom). The parameters $\ell$ and $\sigma$ are learnt by maximum likelihood estimation. The figures are taken from~\citet{Zhao2020SSDGP}.}
17 | \label{fig:gp-fail}
18 | \end{figure}
19 |
20 | The main aim of this thesis is thus to introduce a new class of non-stationary (Gaussian) Markov processes, which we name \textit{state-space deep Gaussian processes (SS-DGPs)}\footnote{Please note that although the name includes the term Gaussian, SS-DGPs are typically not Gaussian distributed, but instead hierarchically conditionally Gaussian, hence the name.}. These are able to address the aforementioned computational and non-stationarity challenges by hierarchically composing the state-space representations of GPs. Indeed, SS-DGPs are computationally efficient models due to their Markovian structure. More precisely, this means that the resulting regression problem can be solved in linear computational time (with respect to the number of measurements) by using Bayesian filtering and smoothing methods. Moreover, due to their hierarchical nature, SS-DGPs are capable of changing their features/characteristics (e.g., length scale) over time, thereby inducing a rich class of priors compatible with irregular functions. The thesis ends with a collection of applications of state-space (deep) GPs.
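To illustrate why the Markovian route is linear in the number of measurements, consider the scalar Kalman filter: each measurement is absorbed with a constant amount of work. The sketch below is a generic textbook filter, not the specific SS-DGP algorithms of this thesis:

```python
import numpy as np

def scalar_kalman_filter(ys, a, q, h, r, m0, p0):
    # One forward sweep over the data; each step costs O(1), so the whole
    # pass is O(n) in the number of measurements n, in contrast with the
    # O(n^3) cost of batch GP regression.
    m, p = m0, p0
    filtered_means = []
    for y in ys:
        # Predict through the linear dynamics x_k = a * x_{k-1} + noise.
        m, p = a * m, a * p * a + q
        # Update with the measurement y_k = h * x_k + noise.
        s = h * p * h + r
        k = p * h / s
        m, p = m + k * (y - h * m), (1.0 - k * h) * p
        filtered_means.append(m)
    return np.array(filtered_means)
```

Appending a Rauch--Tung--Striebel backward sweep for smoothing keeps the overall cost linear as well.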
21 |
22 | \section{Bibliographical notes}
23 | \label{sec:literature-review}
24 | In this section we provide a short and non-exhaustive review of related works in the GP literature. In particular, we focus on works that aim at reducing the computational complexity of GPs and at introducing non-stationarity in GPs.
25 |
26 | \subsection*{Scalable Gaussian processes}
27 | We now give a list of GP methods and approximations that are commonly used to reduce the computational costs of GP regression and parameter learning.
28 |
29 | \subsubsection*{Sparse approximations of Gaussian processes}
30 | Sparse GPs approximate full-rank GPs with sparse representations by using, for example, inducing points~\citep{Snelson2006}, subsets of data~\citep{Snelson2007, Csato2002}, or approximations of marginal likelihoods~\citep{Titsias2009}, mostly relying on so-called pseudo-inputs. These approaches can reduce the computational complexity to quadratic in the number of pseudo-inputs and linear in the number of data points. In practice, the number and positions of the pseudo-inputs used in the sparse representation must either be assigned by human experts or learnt from data~\citep{Hensman2013}. For more comprehensive reviews of sparse GPs, see, for example,~\citet{Quinonero2005unifying, Chalupka2013, LiuHaitao2020}.
31 |
32 | \subsubsection*{Gaussian Markov random fields}
33 | Gaussian Markov random fields~\citep[GMRFs,][]{Rue2005Book} are indexed collections of Gaussian random variables that have a Markov property (defined on a graph). They are computationally efficient models because their precision matrices are sparse by construction. Methodologies for solving the regression and parameter learning problems on GMRFs can be found, for example, in~\citet{Rue2007, Rue2009N}. However, GMRFs are usually only approximations of Gaussian fields~\citep[see, e.g.,][Chapter 5]{Rue2005Book}, although explicit representations exist for some specific Gaussian fields~\citep{Lindgren2011}.
34 |
35 | \subsubsection*{State-space representations of Gaussian processes}
36 | State-space Gaussian processes (SS-GPs) are (temporal) Markov GPs that are solutions of stochastic differential equations~\citep[SDEs,][]{Simo2013SSGP, Sarkka2019}. Due to their Markovian structure, probability distributions of SS-GPs factorise sequentially in the time dimension. The regression problem can therefore be solved efficiently in linear time with respect to the number of data points. Moreover, leveraging the sparse structure of the precision matrix~\citep{Grigorievskiy2017} or the associativity of the Kalman filtering and smoothing operations~\citep{Corenflos2021SSGP} can lead to a sublinear computational complexity.
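To make the linear-time claim concrete, the following is a minimal NumPy sketch (ours, not the thesis companion code) of SS-GP regression with the \matern $\nu = 3\,/\,2$ covariance: the GP is represented by a two-dimensional SDE, discretised in closed form, and the data are swept once with a Kalman filter, giving $O(T)$ cost in the number of measurements.

```python
import numpy as np

def matern32_disc(dt, ell=1.0, sigma=1.0):
    # Closed-form discretisation of the Matern nu = 3/2 SDE, whose drift
    # matrix is F = [[0, 1], [-lam^2, -2 lam]] with lam = sqrt(3) / ell.
    lam = np.sqrt(3.0) / ell
    A = np.exp(-lam * dt) * np.array([[1.0 + lam * dt, dt],
                                      [-lam**2 * dt, 1.0 - lam * dt]])
    Pinf = np.diag([sigma**2, lam**2 * sigma**2])  # stationary covariance
    Q = Pinf - A @ Pinf @ A.T                      # exact process noise cov.
    return A, Q, Pinf

def ssgp_filter(ts, ys, ell=1.0, sigma=1.0, R=1e-2):
    # One forward Kalman sweep: linear time in the number of measurements.
    H = np.array([[1.0, 0.0]])
    _, _, Pinf = matern32_disc(1.0, ell, sigma)
    m, P = np.zeros((2, 1)), Pinf.copy()
    t_prev, means = ts[0] - 1.0, []
    for t, y in zip(ts, ys):
        A, Q, _ = matern32_disc(t - t_prev, ell, sigma)
        m, P = A @ m, A @ P @ A.T + Q                   # predict
        S = (H @ P @ H.T).item() + R
        K = P @ H.T / S
        m = m + K * (y - (H @ m).item())                # update
        P = P - S * (K @ K.T)
        means.append(m[0, 0])
        t_prev = t
    return np.array(means)

# Filtering a noisy sine signal of length 100 costs 100 cheap steps.
ts = np.linspace(0.0, 5.0, 100)
ys = np.sin(ts) + 0.1 * np.random.default_rng(0).standard_normal(100)
posterior_mean = ssgp_filter(ts, ys)
```

A backward Rauch--Tung--Striebel smoothing sweep, run after the filter, would yield the full GP posterior at the same asymptotic cost.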
37 |
38 | \subsubsection*{Other data-scalable Gaussian processes}
39 | \citet{Rasmussen2002GPexperts, Meeds2006} form mixtures of GPs by splitting the dataset into batches resulting in a computational complexity that is cubic in the batch size. This methodology can further be made parallel \citep{ZhangMinyi2019}. \citet{Lazaro2010} approximate stationary GPs with sparse spectral representations (i.e., trigonometric expansions). \citet{Gardner2018} and \citet{KeWang2019} use conjugate gradients and stochastic trace estimation to efficiently compute the marginal log-likelihood of standard GPs, as well as their gradients with respect to parameters, resulting in a quadratic computational complexity in the number of data points.
40 |
41 | \subsection*{Non-stationary Gaussian processes}
42 | Below we give a list of methods that have been introduced in order to induce non-stationarity in GPs.
43 |
44 | \subsubsection*{Non-stationary covariance function-based Gaussian processes}
45 | Non-stationary covariance functions can be constructed by making their parameters (e.g., length scale or magnitude) depend on the data position. For instance, \citet{Gibbs} and \citet{Higdon1999non} present specific examples of covariance functions where the length scale parameter depends on the spatial location. On the other hand, \citet{Paciorek2004, Paciorek2006} generalise these constructions to turn any stationary covariance function into a non-stationary one. There also exist other non-stationary covariance functions, such as the polynomial and neural network covariance functions~\citep{Williams1998, Carl2006GPML}, but we do not review them here as they are not within the scope of this thesis.
46 |
47 | \subsubsection*{Composition-based Gaussian processes}
48 | \citet{Sampson1992, Schmidt2003, Carl2006GPML} show that it is possible to construct a non-stationary GP as the pullback of an existing stationary GP by a non-linear transformation. Formally, given a stationary GP $U\colon E \to \R$, one can find a suitable transformation $\Upsilon\colon \T \to E$, such that the composition $U \circ \Upsilon\colon \T\to\R$ is a non-stationary GP on $\T$. For example, \citet{Calandra2016ManifoldGP} and~\citet{Wilson2016DeepKernel} choose $\Upsilon$ as neural networks.
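Concretely (a sketch in our notation, with $C$ the stationary covariance function of $U$ and $C_{U \circ \Upsilon}$ denoting the covariance function of the composition), the covariance of the composed process follows directly from that of $U$:

```latex
\begin{equation}
  C_{U \circ \Upsilon}(t, t') = C\big( \Upsilon(t), \Upsilon(t') \big),
\end{equation}
```

which depends on $\Upsilon(t) - \Upsilon(t')$ rather than on $t - t'$, and is therefore non-stationary in $t$ whenever $\Upsilon$ is non-linear.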
49 |
50 | \subsubsection*{Warping-based Gaussian processes}
51 | Conversely to the composition paradigm above, it is also possible to transform GPs the other way around, that is, to consider that GPs are the transformations of some non-Gaussian processes by non-linear functions~\citep{Snelson2004}. Computing the marginal log-likelihood function of these warped GPs is then done by leveraging the change-of-variables formula for Lebesgue integrals (when it applies). However, the warping can be computationally demanding, as the change-of-variables formula requires computing the determinant of the Jacobian of the inverse transformation. This issue can be mitigated, for example, by writing the warping scheme as multiple layers of elementary functions which have explicit inverses~\citep{Rios2019}.
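As a sketch (assuming, as in \citet{Snelson2004}, a zero-mean GP and an increasing element-wise warping $g$, with $g(y_{1:T})$ denoting element-wise application to the observations), the change-of-variables formula gives the marginal log-likelihood

```latex
\begin{equation}
  \log p(y_{1:T})
  = \log \mathrm{N}\big( g(y_{1:T}) \mid 0, C_{1:T} \big)
  + \sum_{k=1}^{T} \log \abs{ g'(y_k) },
\end{equation}
```

where the sum of log-derivative terms accounts for the warping; the Gaussian term alone recovers the standard GP marginal log-likelihood when $g$ is the identity.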
52 |
53 |
54 | \subsection*{Deep Gaussian processes}
55 | The deterministic constructions for introducing non-stationarity in GPs can be further extended in order to give a class of non-stationary, non-Gaussian processes that can also represent irregular functions. While they are different in structure, the three subclasses of models presented below are usually all referred to as deep Gaussian processes (DGPs) in the literature.
56 |
57 | \subsubsection*{Composition-based deep Gaussian processes}
58 | \citet{Gredilla2012} extends the aforementioned pullback idea by taking $\Upsilon\colon \T\to E$ to be a GP instead of a deterministic mapping in order to overcome the overfitting problem. The resulting compositions of the form $U \circ \Upsilon\colon \T\to\R$ may not necessarily be GPs anymore, but they may provide a more flexible family of priors than that of deterministic compositions. This construction can be applied recursively, leading to a subclass of DGPs~\citep{Damianou2013}. However, the training of these DGPs is found to be challenging and requires approximate inference methods~\citep{Bui2016, Salimbeni2017Doubly}. Moreover, \citet{Duvenaud2014Thesis, Duvenaud2014} show that increasing the depth of DGPs can lead to a representation pathology, where samples of DGPs tend, with high probability, to be flat and to exhibit sudden jumps. This problem can be mitigated by making the latent GP components explicitly depend on the original inputs~\citep{Duvenaud2014}.
59 |
60 | \subsubsection*{Hierarchical parametrisation-based deep Gaussian processes}
61 | A similar idea to compositional DGPs is to model the parameters of GPs as latent GPs. The posterior distribution of the joint model can then be computed by successive applications of Bayes' rule. As an example,~\citet{Roininen2016} consider putting a GP prior on the length scale parameter of a \matern GP and use Metropolis-within-Gibbs to sample from the posterior distribution. Similarly,~\citet{Salimbeni2017ns} model the length scale parameter of the non-stationary covariance function introduced by~\citet{Paciorek2004} as a GP, but use a variational approximation to approximate its posterior distribution. Other sampling techniques to recover the posterior distribution of these models can be found, for example, in~\citet{Heinonen2016, Karla2020}.
62 |
63 | \citet{Zhao2020SSDGP} and~\citet{Emzir2020} show that this hierarchy in parametrisation can be done recursively, leading to another subclass of DGPs that can be represented by stochastic (partial) differential equations. The relationship between the composition-based and parametrisation-based DGPs is also briefly discussed in~\citet{Dunlop2018JMLR}.
64 |
65 | \section{Reproducibility}
66 | \label{sec:codes}
67 | In order to allow for reproducibility of our work, we provide the following implementations.
68 |
69 | \begin{itemize}
70 | \item Taylor moment expansion (Chapter~\ref{chap:tme}). Python and Matlab codes for it are available at \href{https://github.com/zgbkdlm/tme}{https://github.com/zgbkdlm/tme}.
71 | \item State-space deep Gaussian processes (Chapter~\ref{chap:dssgp}). Python and Matlab codes for it are available at \href{https://github.com/zgbkdlm/ssdgp}{https://github.com/zgbkdlm/ssdgp}.
72 | \item The Python codes for reproducing Figures~\ref{fig:gp-fail}, \ref{fig:disc-err-dgp-m12}, \ref{fig:ssdgp-vanishing-cov}, and~\ref{fig:spectro-temporal-demo}, as well as the simulations in Examples~\ref{example-tme-benes}, \ref{example:tme-softplus}, \ref{example:ssdgp-m12}, and~\ref{example:ssdgp-m32} are available at \href{https://github.com/zgbkdlm/dissertation}{https://github.com/zgbkdlm/dissertation}.
73 | \end{itemize}
74 |
75 | \section{Outline of the thesis}
76 | \label{sec:outline}
77 | This thesis consists of seven publications together with an overview of them, and it is organised as follows.
78 |
79 | In Chapter~\ref{chap:cd-smoothing} we review stochastic differential equations (SDEs) and Bayesian continuous-discrete filtering and smoothing (CD-FS) problems. This chapter lays out the preliminary definitions and results that are needed in the rest of the thesis.
80 |
81 | Chapter~\ref{chap:tme} (related to Publication~\cp{paperTME}) shows how to solve Gaussian approximated CD-FS problems by using the Taylor moment expansion (TME) method. This chapter also features some numerical demonstrations and analyses the positive definiteness of TME covariance approximations as well as the stability of TME Gaussian filters.
82 |
83 | Chapter~\ref{chap:dssgp} (related to Publications~\cp{paperSSDGP} and~\cp{paperRNSSGP}) introduces SS-DGPs. In particular, after defining DGPs formally, we introduce their state-space representations. Secondly, we show how to sample from SS-DGPs by combining discretisation and numerical integration. Thirdly, we illustrate the construction of SS-DGPs in the \matern sense. Fourthly, we represent SS-DGP regression problems as CD-FS problems that we can then solve using the methods introduced in Chapter~\ref{chap:cd-smoothing}. Finally, we explain how DGPs can be regularised in the $L^1$ sense, in particular to promote sparsity at any level of the DGP component hierarchy.
84 |
85 | Chapter~\ref{chap:apps} (related to Publications~\cp{paperDRIFT},~\cp{paperKFSECG},~\cp{paperKFSECGCONF},~\cp{paperSSDGP}, and~\cp{paperMARITIME}) introduces various applications of state-space (deep) GPs. These include estimation of the drift functions in SDEs, probabilistic spectro-temporal signal analysis, as well as modelling real-world signals (from astrophysics, human motion, and maritime navigation) with SS-DGPs.
86 |
87 | Finally, Chapter~\ref{chap:summary} offers a summary of the contributions of the seven publications presented in this thesis, and concludes with a discussion of unsolved problems and possible future extensions.
88 |
--------------------------------------------------------------------------------
/thesis_latex/dissertation.tex:
--------------------------------------------------------------------------------
1 | % Metadata for pdfx
2 | \RequirePackage{filecontents}
3 | \begin{filecontents*}{dissertation.xmpdata}
4 | \Title{State-space deep Gaussian processes with applications}
5 | \Author{Zheng Zhao}
6 | \Subject{State-space methods for deep Gaussian processes}
7 | \Keywords{Gaussian processes, machine learning, state space, stochastic differential equations, stochastic filtering}
8 | \end{filecontents*}
9 |
10 | \documentclass[dissertation,final,vertlayout,pdfa,nologo,math]{aaltoseries}
11 |
12 | % Kludge to make sure we have utf8 input (check that this file is utf8!)
13 | \makeatletter
14 | \@ifpackageloaded{inputenc}{%
15 | \inputencoding{utf8}}{%
16 | \usepackage[utf8]{inputenc}}
17 | \makeatother
18 |
19 | % hyperref is pre-loaded by aaltoseries
20 | \hypersetup{bookmarks=true, colorlinks=false, pagebackref=true, hypertexnames=true, hidelinks}
21 |
22 | % Enable backref especially for bibliography
23 | \usepackage[hyperpageref]{backref}
24 | \renewcommand*{\backref}[1]{}
25 | \renewcommand*{\backrefalt}[4]{{
26 | \ifcase #1 Not cited.%
27 | \or Cited on page~#2.%
28 | \else Cited on pages #2.%
29 | \fi%
30 | }}
31 |
32 | \usepackage[english]{babel}
33 | \usepackage{amsmath,amsthm,amssymb,bm}
34 | % Adjustment set by the aaltoseries developers
35 | \interdisplaylinepenalty=2500
36 | \renewcommand*{\arraystretch}{1.2}
37 | \setlength{\jot}{8pt}
38 |
39 | % Enable the following to suppress page headers and numbers on
40 | % content-less left (even-numbered) pages. Fixes a bug in aaltoseries
41 | \usepackage{emptypage}
42 |
43 | \usepackage[SchoolofEngineering]{aaltologo}
44 |
45 | \usepackage{CJKutf8}
46 | \usepackage[round, authoryear]{natbib}
47 | \usepackage{graphicx}
48 | \usepackage{mathtools}
49 | \usepackage[shortlabels]{enumitem}
50 |
51 | \usepackage{tikz}
52 | \usetikzlibrary{fadings}
53 | \usetikzlibrary{patterns}
54 | \usetikzlibrary{shadows.blur}
55 | \usetikzlibrary{shapes}
56 |
57 | \newcommand{\thmenvcounter}{chapter}
58 | \input{zmacro.tex}
59 |
60 | \newcommand{\zz}[1]{{\color{red} #1}}
61 |
62 | \newcommand*{\hilite}[1]{
63 | \setlength{\fboxsep}{3mm}%
64 | \begin{center}\colorbox{orange}{\parbox{0.9\columnwidth}{\textit{#1}}}\end{center}%
65 | }
66 |
67 | \author{Zheng Zhao}
68 | \title{State-Space Deep Gaussian Processes with Applications}
69 |
70 | \begin{document}
71 |
72 | \includepdf[pages=-]{title-pages/title-pages.pdf}
73 |
74 | \draftabstract{This thesis is mainly concerned with state-space approaches for solving deep (temporal) Gaussian process (DGP) regression problems. More specifically, we represent DGPs as hierarchically composed systems of stochastic differential equations (SDEs), and we consequently solve the DGP regression problem by using state-space filtering and smoothing methods. The resulting state-space DGP (SS-DGP) models generate a rich class of priors compatible with modelling a number of irregular signals/functions. Moreover, due to their Markovian structure, SS-DGP regression problems can be solved efficiently by using Bayesian filtering and smoothing methods. The second contribution of this thesis is that we solve continuous-discrete Gaussian filtering and smoothing problems by using the Taylor moment expansion (TME) method. This induces a class of filters and smoothers that can be asymptotically exact in predicting the mean and covariance of solutions of SDEs. Moreover, the TME method and TME filters and smoothers are compatible with simulating SS-DGPs and solving their regression problems. Lastly, this thesis features a number of applications of state-space (deep) GPs. These applications mainly include (i) estimation of unknown drift functions of SDEs from partially observed trajectories and (ii) estimation of spectro-temporal features of signals.}
75 |
76 | \setcounter{page}{0}
77 |
78 | %% Preface
79 | % Note: I myself changed this environment.
80 | \begin{preface}[Helsinki]{\large{\begin{CJK}{UTF8}{bkai}\\趙~正\end{CJK}}
81 | }
82 | The research work in this thesis has been carried out in the Department of Electrical Engineering and Automation, Aalto University, during the years 2018--2021. My doctoral studies officially started in April of 2018, while most of the pivotal work came in 2020--2021. During this time, my doctoral research was financially supported by the Academy of Finland and the Aalto ELEC Doctoral School. The Aalto Scientific Computing team and the Aalto Learning Center also provided useful computational and literature resources for my studies. I particularly enjoyed the spring, autumn, and winter in Finland, which allowed me to find inner peace and focus on my research.
83 |
84 | I would like to offer my greatest gratitude to Prof. Simo S\"{a}rkk\"{a}, who is my supervisor and mentor, and without whom this work would never have been possible. After finishing my master's studies at Beijing University of Technology in 2017, I found myself lost in finding a ``meaningful'' way of life in the never-sleeping metropolis that is Beijing. This quest was fulfilled when Simo offered me the opportunity of pursuing a doctoral degree under his supervision. Despite my bewilderment about the research path in the beginning, Simo's patience and valuable guidance led me to a research area that I am fascinated by. Over the years, Simo's help, support, and friendship have helped me become a qualified and independent researcher. I think very highly of Simo's supervision, and I almost surely could not have found a better supervisor.
85 |
86 | During my years on campus, I owe great thanks to Rui Gao (\begin{CJK}{UTF8}{bkai}高~睿\end{CJK}), who is a brilliant, learned, and erudite researcher.
87 |
88 | I would like to thank the few people who have accompanied me through joy and sorrow, namely Adrien Corenflos and Christos Merkatas. I thank you for your friendship and for relieving me from solitude\footnote{This was written under constraint.}.
89 |
90 | During my years at Aalto University, I have shared my office with Marco Soldati, Juha Sarmavuori, Janne Myll\"{a}rinen, Fei Wang (\begin{CJK}{UTF8}{bkai}王~斐\end{CJK}), Jiaqi Liu (\begin{CJK}{UTF8}{bkai}劉~佳琦\end{CJK}), Ajinkya Gorad, Masaya Murata (\begin{CJK}{UTF8}{bkai}村田~真哉\end{CJK}), and Otto Kangasmaa. I thank them all for filling the office with happiness and joy. I especially thank Marco Soldati, who offered me honest friendship and lasagne, and taught me many useful Italian phrases. My thanks also go to Lauri Palva, Zenith Purisha, Joel Jaskari, Sakira Hassan, Fatemeh Yaghoobi, Abubakar Yamin, Zaeed Khan, Xiaofeng Ma (\begin{CJK}{UTF8}{bkai}馬~曉峰\end{CJK}), Prof. Ivan Vujaklija, Dennis Yeung, Wendy Lam, Prof. Ilkka Laakso, Marko Mikkonen, Noora Matilainen, Juhani Kataja, Linda Srbova, and Tuomas Turunen. All these amazing people made working at Aalto a real pleasure. I would also like to give my thanks to Laila Aikala, who kindly offered me a peaceful place to stay in Espoo.
91 |
92 | I warmly thank Prof. Leo K\"{a}rkk\"{a}inen for the collaboration on the AI in Health Technology course and our inspiring discussions on many Thursdays and Fridays. I particularly enjoyed the collaboration with Muhammad Fuady Emzir, who offered me knowledge generously and with no reservations. Many thanks go to my coauthors Prof. Roland Hostettler, Prof. Ali Bahrami Rad, Filip Tronarp, and Toni Karvonen. I also appreciated the collaboration with Sarang Thombre and Toni Hammarberg from the Finnish Geospatial Research Institute, Prof. Ville V. Lehtola from the University of Twente, and Tuomas Lumikari from Helsinki University Hospital. I also thank Prof. Lassi Roininen and Prof. Arno Solin for their time and valuable advice.
93 |
94 | Lastly, I would like to thank my parents and sister who support me persistently as always.
95 |
96 | \end{preface}
97 |
98 | %% Table of contents of the dissertation
99 | \clearpage
100 | \tableofcontents
101 |
102 | % To be defined before generating list of publications. Use 'no' if no acknowledgement
103 | \languagecheck{Adrien Corenflos, Christos Merkatas, and Dennis Yeung}
104 |
105 | %% This is for article dissertations. Remove if you write a monograph dissertation.
106 | % The actual publications are entered manually one by one as shown further down:
107 | % use \addpublication, \addcontribution, \adderrata, and addpublicationpdf.
108 | % The last adds the actual article, the other three enter related information
109 | % that will be collected in lists -- like this one.
110 | %
111 | % Uncomment and edit as needed
112 | \def\authorscontributionname{Author's contribution}
113 | \listofpublications
114 |
115 | %%% Add lists of figures and tables as you usually do (\listoffigures, \listoftables)
116 | %\listoffigures
117 |
118 | %% Add list of abbreviations, list of symbols, etc., using your preferred package/method.
119 | \abbreviations
120 |
121 | \begin{description}[style=multiline,leftmargin=3cm]
122 | \item[CD-FS] Continuous-discrete filtering and smoothing
123 | \item[DGP] Deep Gaussian process
124 | \item[GFS] Gaussian approximated density filter and smoother
125 | \item[GMRF] Gaussian Markov random field
126 | \item[GP] Gaussian process
127 | \item[It\^{o}-1.5] It\^{o}--Taylor strong order 1.5
128 | \item[LCD] Locally conditional discretisation
129 | \item[MAP] Maximum a posteriori
130 | \item[MCMC] Markov chain Monte Carlo
131 | \item[MLE] Maximum likelihood estimation
132 | \item[NSGP] Non-stationary Gaussian process
133 | \item[ODE] Ordinary differential equation
134 | \item[PDE] Partial differential equation
135 | \item[RBF] Radial basis function
136 | \item[R-DGP] Regularised (batch) deep Gaussian process
137 | \item[R-SS-DGP] Regularised state-space deep Gaussian process
138 | \item[RTS] Rauch--Tung--Striebel
139 | \item[SDE] Stochastic differential equation
140 | \item[SS-DGP] State-space deep Gaussian process
141 | \item[SS-GP] State-space Gaussian process
142 | \item[TME] Taylor moment expansion
143 | \end{description}
144 |
145 | \symbols
146 |
147 | \begin{description}[style=multiline,leftmargin=3cm]
148 | \item[$a$] Drift function of SDE
149 | \item[$A$] Drift matrix of linear SDE
150 | \item[$\A$] Infinitesimal generator
151 | \item[$\Am$] Multidimensional infinitesimal generator
152 | \item[$b$] Dispersion function of SDE
153 | \item[$B$] Dispersion matrix of linear SDE
154 | \item[$c$] Constant
155 | \item[$\mathcal{C}^k(\Omega; \Pi)$] Space of $k$ times continuously differentiable functions on $\Omega$ mapping to $\Pi$
156 | \item[$C(t,t')$] Covariance function
157 | \item[$C_{\mathrm{Mat.}}(t,t')$] \matern covariance function
158 | \item[$C_{\mathrm{NS}}(t,t')$] Non-stationary \matern covariance function
159 | \item[$C_{1:T}$] Covariance/Gram matrix by evaluating the covariance function $C(t, t')$ on Cartesian grid $(t_1,\ldots, t_T) \times (t_1,\ldots, t_T)$
160 | \item[$\covsym$] Covariance
161 | \item[$\cov{X \mid Y}$] Conditional covariance of random variable $X$ given another random variable $Y$
162 | \item[$\cov{X \mid y}$] Conditional covariance of random variable $X$ given the realisation $y$ of random variable $Y$
163 | \item[$d$] Dimension of state variable
164 | \item[$d_i$] Dimension of the $i$-th GP element
165 | \item[$d_y$] Dimension of measurement variable
166 | \item[$\det$] Determinant
167 | \item[$\diagsym$] Diagonal matrix
168 | \item[$\expecsym$] Expectation
169 | \item[$\expec{X \mid \mathcal{F}}$] Conditional expectation of $X$ given sigma-algebra $\mathcal{F}$
170 | \item[$\expec{X \cond Y}$] Conditional expectation of $X$ given the sigma-algebra generated by random variable $Y$
171 | \item[$\expec{X \cond y}$] Conditional expectation of $X$ given the realisation $y$ of random variable $Y$
172 | \item[$f$] Approximate transition function in discrete state-space model
173 | \item[$f^M$] $M$-order TME approximated transition function in discrete state-space model
174 | \item[$\check{f}$] Exact transition function in discrete state-space model
175 | \item[$\mathring{f}_j$] $j$-th frequency component
176 | \item[$\FF$] Sigma-algebra
177 | \item[$\FF_t$] Filtration
178 | \item[$\FF_t^W$] Filtration generated by $W$ and initial random variable
179 | \item[$g$] Transformation function
180 | \item[$\mathrm{GP}(0, C(t,t'))$] Zero-mean Gaussian process with covariance function $C(t,t')$
181 | \item[$h$] Measurement function
182 | \item[$H$] Measurement matrix
183 | \item[$\hessian_x f$] Hessian matrix of $f$ with respect to $x$
184 | \item[$I$] Identity matrix
185 | \item[$J$] Set of conditional dependencies of GP elements
186 | \item[$\jacob_x f$] Jacobian matrix of $f$ with respect to $x$
187 | \item[$K$] Kalman gain
188 | \item[$\mBesselsec$] Modified Bessel function of the second kind with parameter $\nu$
189 | \item[$\ell$] Length scale parameter
190 | \item[$\mathcal{L}^\mathrm{A}$] Augmented Lagrangian function
191 | \item[$\mathcal{L}^\mathrm{B}$] MAP objective function of batch DGP
192 | \item[$\mathcal{L}^\mathrm{B-REG}$] $L^1$-regularisation term for batch DGP
193 | \item[$\mathcal{L}^\mathrm{S}$] MAP objective function of state-space DGP
194 | \item[$\mathcal{L}^\mathrm{S-REG}$] $L^1$-regularisation term for state-space DGP
195 | \item[$m(t)$] Mean function
196 | \item[$m^-_k$] Predictive mean at time $t_k$
197 | \item[$m^f_k$] Filtering mean at time $t_k$
198 | \item[$m^s_k$] Smoothing mean at time $t_k$
199 | \item[$M$] Order of Taylor moment expansion
200 | \item[$N$] Order of Fourier expansion
201 | \item[$\mathrm{N}(x\mid m, P)$] Normal probability density function with mean $m$ and covariance $P$
202 | \item[$\N$] Set of natural numbers
203 | \item[$O$] Big $O$ notation
204 | \item[$p_X(x)$] Probability density function of random variable $X$
205 | \item[$p_{X \cond Y}(x\cond y)$] Conditional probability density function of $X$ given $Y$ taking value $y$
206 | \item[$P^-_k$] Predictive covariance at time $t_k$
207 | \item[$P^f_k$] Filtering covariance at time $t_k$
208 | \item[$P^s_k$] Smoothing covariance at time $t_k$
209 | \item[$P^{i,j}_k$] Filtering covariance of the $i$ and $j$-th state elements at time $t_k$
210 | \item[$\mathbb{P}$] Probability measure
211 | \item[$q_k$] Approximate process noise in discretised state-space model at time $t_k$
212 | \item[$\check{q}_k$] Exact process noise in discretised state-space model at time $t_k$
213 | \item[$Q_k$] Covariance of process noise $q_k$
214 | \item[$R_{M, \phi}$] Remainder of $M$-order TME approximation for target function $\phi$
215 | \item[$\R$] Set of real numbers
216 | \item[$\R_{>0}$] Set of positive real numbers
217 | \item[$\R_{<0}$] Set of negative real numbers
218 | \item[$\sgn$] Sign function
219 | \item[$\mathcal{S}_{m, P}$] Sigma-point approximation of Gaussian integral with mean $m$ and covariance $P$
220 | \item[$t$] Temporal variable
221 | \item[$\tracesym$] Trace
222 | \item[$t_0$] Initial time
223 | \item[$T$] Number of measurements
224 | \item[$\T$] Temporal domain $\T\coloneqq [t_0, \infty)$
225 | \item[$U$] (State-space) GP
226 | \item[$U^i_{j_i}$] (State-space) GP element in $\mathcal{V}$ indexed by $i$, which is a parent of the $j_i$-th GP element in $\mathcal{V}$
227 | \item[$U_{1:T}$] Collection of $U(t_1), U(t_2),\ldots, U(t_T)$
228 | \item[$\mathcal{U}^i$] Collection of parents of $U^i_{j_i}$
229 | \item[$V$] (State-space) deep GP
230 | \item[$V_k$] Shorthand of $V(t_k)$
231 | \item[$V_{1:T}$] Collection of $V(t_1), V(t_2),\ldots, V(t_T)$
232 | \item[$\mathcal{V}$] Collection of GP elements
233 | \item[$\varrsym$] Variance
234 | \item[$w$] Dimension of Wiener process
235 | \item[$W$] Wiener process
236 | \item[$X$] Stochastic process
237 | \item[$X_0$] Initial random variable
238 | \item[$X_k$] Shorthand of $X(t_k)$
239 | \item[$Y_k$] Measurement random variable at time $t_k$
240 | \item[$Y_{1:T}$] Collection of $Y_1, Y_2,\ldots, Y_T$
241 |
242 | \item[$\gamma$] Dimension of the state variable of Mat\'{e}rn GP
243 | \item[$\Gamma$] Shorthand of $b(x) \, b(x)^\trans$
244 | \item[$\varGamma$] Gamma function
245 | \item[$\Delta t$] Time interval $t-s$
246 | \item[$\Delta t_k$] Time interval $t_k-t_{k-1}$
247 | \item[$\eta$] Multiplier for augmented Lagrangian function
248 | \item[$\theta$] Auxiliary variable used in augmented Lagrangian function
249 | \item[$\Theta_{r}$] $r$-th polynomial coefficient in TME covariance approximation
250 | \item[$\mineig$] Minimum eigenvalue
251 | \item[$\maxeig$] Maximum eigenvalue
252 | \item[$\Lambda(t)$] Solution of a matrix ordinary differential equation
253 | \item[$\cu{\Lambda}(t, s)$] Shorthand of $\Lambda(t) \, (\Lambda(s))^{-1}$
254 | \item[$\xi_k$] Measurement noise at time $t_k$
255 | \item[$\Xi_k$] Variance of measurement noise $\xi_k$
256 | \item[$\rho$] Penalty parameter in augmented Lagrangian function
257 | \item[$\sigma$] Magnitude (scale) parameter
258 | \item[$\Sigma_M$] $M$-order TME covariance approximant
259 | \item[$\phi$] Target function
260 | \item[$\phi_{ij}$] $i,j$-th element of $\phi$
261 | \item[$\phi^\mathrm{I}$] $\phi^\mathrm{I}(x) \coloneqq x$
262 | \item[$\phi^\mathrm{II}$] $\phi^\mathrm{II}(x) \coloneqq x \, x^\trans$
263 | \item[$\Phi$] Sparsity inducing matrix
264 | \item[$\chi(\Delta t)$] Polynomial of $\Delta t$ associated with TME covariance approximation
265 | \item[$\Omega$] Sample space
266 |
267 | \item[$(\Omega, \FF, \FF_t, \PP)$] Filtered probability space with sample space $\Omega$, sigma-algebra $\FF$, filtration $\FF_t$, and probability measure $\PP$
268 | \item[$\abs{\cdot}$] Absolute value
269 | \item[$\norm{\cdot}_p$] $L^p$ norm or $L^p$-induced matrix norm
270 | \item[$\norm{\cdot}_G$] Euclidean norm weighted by a non-singular matrix $G$
271 | \item[$\nabla_x f$] Gradient of $f$ with respect to $x$
272 | \item[$\binom{\cdot}{\cdot}$] Binomial coefficient
273 | \item[$\innerp{\cdot, \cdot}$] Inner product
274 | \item[$\circ$] Mapping composition
275 | \item[$\coloneqq$] By definition
276 | \item[$\times$] Cartesian product
277 | \item[$a \, \wedge \, b$] Minimum of $a$ and $b$
278 | \end{description}
279 |
280 | \input{ch1}
281 | \input{ch2}
282 | \input{ch3}
283 | \input{ch4}
284 | \input{ch5}
285 | \input{ch6}
286 |
287 | \renewcommand{\bibname}{References}
288 | \bibliographystyle{plainnat}
289 | \bibliography{refs}
290 |
291 | % Errata list, if you have errors in the publications.
292 | \errata
293 |
294 | \input{list_of_papers.tex}
295 |
296 | \includepdf[pages=-]{title-pages/backcover.pdf}
297 |
298 | \end{document}
299 |
--------------------------------------------------------------------------------
/lectio_praecursoria/slide.tex:
--------------------------------------------------------------------------------
1 | \documentclass[seriffont, cmap=Beijing, 10pt]{zz}
2 |
3 | \newcommand\hmmax{0}
4 | \newcommand\bmmax{0}
5 |
6 | \usepackage[utf8]{inputenc}
7 | \usepackage[T1]{fontenc}
8 | %\usepackage{fouriernc}
9 | \usepackage{amsmath, amssymb, bm, mathtools}
10 | \usepackage{animate}
11 | \usepackage{graphicx}
12 |
13 | \usepackage{tikz}
14 | \usetikzlibrary{fadings}
15 | \usetikzlibrary{patterns}
16 | \usetikzlibrary{shadows.blur}
17 | \usetikzlibrary{shapes}
18 |
19 | \title{Lectio Praecursoria}
20 | \subtitle{State-Space Deep Gaussian Processes with Applications}
21 |
22 | \date[10 December 2021]{10 December 2021}
23 | \institute{Aalto University}
24 |
25 | \author[Zheng Zhao]{Zheng Zhao}
26 |
27 | \setbeamercovered{transparent}
28 | \setbeamertemplate{section in toc}[circle]
29 |
30 | % Change toc item spacing
31 | % https://tex.stackexchange.com/questions/170268/separation-space-between-tableofcontents-items-in-beamer
32 | \usepackage{etoolbox}
33 | \makeatletter
34 | \patchcmd{\beamer@sectionintoc}
35 | {\vfill}
36 | {\vskip2.\itemsep}
37 | {}
38 | {}
39 | \makeatother
40 |
41 | % Footnote without numbering
42 | % https://tex.stackexchange.com/questions/30720/footnote-without-a-marker
43 | \newcommand\blfootnote[1]{%
44 | \begingroup
45 | \renewcommand\thefootnote{}\footnote{\scriptsize#1}%
46 | \addtocounter{footnote}{-1}%
47 | \endgroup
48 | }
49 |
50 | \input{../thesis_latex/zmacro.tex}
51 |
52 | \begin{document}
53 |
54 | \titlepage
55 |
56 | \begin{frame}{The dissertation}
57 | \noindent
58 | \begin{minipage}{.48\textwidth}
59 | \begin{figure}
60 | \centering
61 | \fbox{\includegraphics[trim={2cm 2cm 2cm 2cm},width=.8\linewidth,clip]{../thesis_latex/title-pages/title-pages}}
62 | \end{figure}
63 | \end{minipage}
64 | \hfill
65 | \begin{minipage}{.48\textwidth}
66 | \begin{block}{}
67 | Available online:\\ \url{https://github.com/zgbkdlm/dissertation}\\
68 | or scan the QR code
69 | \end{block}
70 | \begin{block}{}
71 | \begin{figure}
72 | \centering
73 | \includegraphics[width=.5\linewidth]{figs/qr-code-thesis}
74 | \end{figure}
75 | \end{block}
76 | \begin{block}{}
77 | Companion codes in Python and Matlab are also available in $^\wedge$
78 | \end{block}
79 | \end{minipage}
80 | \end{frame}
81 |
82 | \begin{frame}{Contents}
83 | This dissertation mainly consists of:
84 | \begin{block}{}
85 | \tableofcontents
86 | \end{block}
87 | \end{frame}
88 |
89 | \section{Continuous-discrete filtering and smoothing with Taylor moment expansion}
90 | \begin{frame}{Contents}
91 | \begin{block}{}
92 | \tableofcontents[currentsection]
93 | \end{block}
94 | \end{frame}
95 |
96 | \begin{frame}{Stochastic filtering}
97 | \begin{block}{}
98 | Consider a system
99 | %
100 | \begin{equation}
101 | \begin{split}
102 | \diff X(t) &= a(X(t)) \diff t + b(X(t)) \diff W(t), \quad X(t_0) = X_0,\\
103 | Y_k &= h(X(t_k)) + \xi_k, \quad \xi_k \sim \mathrm{N}(0, \Xi_k),
104 | \end{split}
105 | \end{equation}
106 | %
107 | and a set of data $y_{1:T} = \lbrace y_1, y_2,\ldots, y_T \rbrace$. The goals are to estimate
108 | \begin{itemize}
109 | \item the (marginal) \alert{filtering} distributions
110 | %
111 | \begin{equation}
112 | p(x_k \cond y_{1:k}), \quad \text{for }k=1,2,\ldots,
113 | \end{equation}
114 | %
115 | \item and the (marginal) \alert{smoothing} distributions
116 | %
117 | \begin{equation}
118 | p(x_k \cond y_{1:T}), \quad \text{for }k=1,2,\ldots,T,
119 | \end{equation}
120 | %
121 | \end{itemize}
122 | \end{block}
123 | \blfootnote{$p(x \cond y)$ abbreviates $p_{X \cond Y}(x \cond y)$.}
124 | \end{frame}
125 |
126 | \begin{frame}{Stochastic filtering}
127 | \begin{block}{}
128 | \begin{figure}
129 | \centering
130 | \animategraphics[autoplay, loop, width=.49\linewidth]{10}{figs/animes/filter-}{0}{99}
131 | \animategraphics[autoplay, loop, width=.49\linewidth]{10}{figs/animes/smoother-}{0}{99}
132 | \caption{Filtering (left) and smoothing (right).}
133 | % dummy image
134 | \end{figure}
135 | \end{block}
136 | \end{frame}
137 |
138 | \begin{frame}{Stochastic filtering}
139 | \begin{block}{}
140 | Solving the \alert{filtering} and \alert{smoothing} problems usually involves computing
141 | %
142 | \begin{equation}
143 | \expec{\phi(X(t)) \cond X(s)}
144 | \end{equation}
145 | %
146 | for $t\geq s \in\T$ and some \alert{target function $\phi$}.
147 | \end{block}
148 | \begin{block}{}
149 | For instance, in \alert{Gaussian} approximate filtering and smoothing, we choose $\phi^\mathrm{I}(x)\coloneqq x$ and $\phi^\mathrm{II}(x)\coloneqq x \, x^\trans$ in order to approximate
150 | %
151 | \begin{equation}
152 | \begin{split}
153 | p(x_k \cond y_{1:k}) &\approx \mathrm{N}\big(x_k \cond m^f_k, P^f_k\big), \\
154 | p(x_k \cond y_{1:T}) &\approx \mathrm{N}\big(x_k \cond m^s_k, P^s_k\big).
155 | \end{split}
156 | \end{equation}
157 | %
158 | \end{block}
159 | \end{frame}
160 |
161 | \begin{frame}{Stochastic filtering}
162 | \begin{block}{}
163 | Thanks to D. Florens-Zmirou and D. Dacunha-Castelle, for any $\phi\in \mathcal{C}^{2\,(M+1)}(\R^d;\R)$, one can expand
164 | %
165 | \begin{equation}
166 | \expec{\phi(X(t)) \cond X(s)} = \sum^M_{r=0} \A^r\phi(X(s))\,\frac{\Delta t^r}{r!} + R_{M, \phi}(X(s), \Delta t),
167 | \end{equation}
168 | %
169 | where
170 | %
171 | \begin{equation}
172 | \begin{split}
173 | \A\phi(x) &\coloneqq (\nabla_x\phi(x))^\trans \, a(x) + \frac{1}{2} \, \tracebig{\Gamma(x) \, \hessian_x\phi(x)},\\
174 | \Gamma(x) &\coloneqq b(x) \, b(x)^\trans.
175 | \end{split}
176 | \end{equation}
177 | %
178 | \end{block}
179 | \begin{block}{}
180 | We call this \alert{Taylor moment expansion (TME)}, detailed in Section 3.3.
181 | \end{block}
182 | \end{frame}
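For illustration, the expansion above can be evaluated mechanically for scalar SDEs whose drift $a$ and squared diffusion $b^2$ are polynomials, by applying the generator $\A$ repeatedly. The following is a minimal sketch (not the companion code of the dissertation); the Taylor weights $\Delta t^r/r!$ are assumed:

```python
import math

import numpy as np
from numpy.polynomial import polynomial as P

def generator(phi, a, b2):
    """A phi = phi' * a + 0.5 * b2 * phi'' for a scalar SDE, with all
    functions given as polynomial coefficient arrays (low order first)."""
    return P.polyadd(P.polymul(P.polyder(phi), a),
                     0.5 * P.polymul(b2, P.polyder(phi, 2)))

def tme(phi, a, b2, x, dt, order):
    """Taylor moment expansion of E[phi(X(t)) | X(s) = x] with dt = t - s."""
    total, term = 0.0, phi
    for r in range(order + 1):
        total += P.polyval(x, term) * dt**r / math.factorial(r)
        term = generator(term, a, b2)
    return total

# Ornstein--Uhlenbeck SDE dX = -X dt + dW: a(x) = -x, b^2 = 1.
# TME-2 mean from x = 1 is 1 - dt + dt^2/2, close to the exact exp(-dt).
m1 = tme([0.0, 1.0], [0.0, -1.0], [1.0], x=1.0, dt=0.1, order=2)
```

For this linear SDE the TME-2 mean matches the exact conditional mean $e^{-\Delta t}$ up to third order, and the second moment (take $\phi(x) = x^2$) is obtained in the same way.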
183 |
184 | \begin{frame}{Stochastic filtering}
185 | \begin{block}{}
186 | However, the TME approximation to the \alert{covariance} $\cov{X(t) \cond X(s)}$ might not be \alert{positive definite}. Detailed in \alert{Theorem~3.5}.
187 | \end{block}
188 | \begin{block}{}
189 | This problem can be numerically addressed by:
190 | \begin{itemize}
191 | \item Choosing a small \alert{time interval} $t-s$.
192 | \item Increasing the expansion \alert{order} $M$ if the SDE coefficients are regular enough.
193 | \item Tuning the SDE coefficients so that it is positive definite for all $t-s\in\R_{>0}$ (see \alert{Corollary 3.6}).
194 | \item Tuning the SDE coefficients so that it is positive definite for all $X(s)\in\R^d$ (see \alert{Lemma 3.8}).
195 | \end{itemize}
196 | \end{block}
197 | \end{frame}
198 |
199 | \begin{frame}{Stochastic filtering}
200 | \begin{block}{}
201 | \begin{example}
202 | \begin{equation}
203 | \begin{split}
204 | \diff X^1(t) &= \big( \log(1+\exp(X^1(t))) + \kappa\,X^2(t) \big)\diff t + \diff W_1(t),\\
205 | \diff X^2(t) &= \big( \log(1+\exp(X^2(t))) + \kappa\,X^1(t) \big)\diff t + \diff W_2(t),\\
206 | X^1(t_0)&=X^2(t_0)=0,
207 | \end{split}
208 | \end{equation}
209 | where $\kappa\in\R$ is a \alert{tunable} parameter. By applying \alert{Corollary 3.6}, the TME-2 covariance approximation to this SDE is positive definite for \alert{all} $t-t_0\in\R_{>0}$, if \alert{$\abs{\kappa}\leq0.5$}.
210 | \end{example}
211 | \end{block}
212 | \end{frame}
213 |
214 | %\begin{frame}{Stochastic filtering}
215 | % \begin{block}{}
216 | % \begin{figure}
217 | % \centering
218 | % \includegraphics[width=.7\linewidth]{../thesis_latex/figs/tme-softplus-mineigs}
219 | % \caption{The minimum eigenvalues of TME-2 approximated $\cov{X(t) \cond X(t_0)}$ (denote $\Sigma_2$) w.r.t. $\Delta t=t-t_0$ and $\kappa$.}
220 | % \end{figure}
221 | % \end{block}
222 | %\end{frame}
223 |
224 | \begin{frame}{Stochastic filtering}
225 | \begin{block}{}
226 | \alert{Section 3.6} details how to run Gaussian filters and smoothers with the TME method.
227 | \end{block}
228 | \begin{block}{}
229 | Under a few assumptions on the system, the TME Gaussian filters and smoothers are \alert{stable} in the sense that (\alert{Theorem 3.17})
230 | %
231 | \begin{equation}
232 | \expecBig{\normbig{X_k - m^f_k}_2^2} \leq (c^f_1)^k \, \trace{P_0} + c^f_2, \quad k=1,2,\ldots
233 | \end{equation}
234 | %
235 | and
236 | \begin{equation}
237 | \begin{split}
238 | \expecBig{\normbig{X_k - m^s_k}_2^2} &\leq c^f_0(k) + (c^s_1)^{T-k} \, c_2 + c_3,\\
239 | k&=1,2,\ldots,T, \quad T=1,2,\ldots,
240 | \end{split}
241 | \end{equation}
242 | where $c^f_1<1$, $c^s_1<1$, and $c^f_0(k)$ depends on $\expec{\norm{X_k - m^f_k}_2^2}$ \alert{only}.
243 | \end{block}
244 | \end{frame}
245 |
246 | \begin{frame}{Stochastic filtering}
247 | \begin{figure}
248 | \centering
249 | \includegraphics[width=.6\linewidth]{../thesis_latex/figs/tme-duffing-filter-smoother}\\
250 | \includegraphics[width=.4\linewidth]{../thesis_latex/figs/tme-duffing-smoother-x1}
251 | \includegraphics[width=.4\linewidth]{../thesis_latex/figs/tme-duffing-smoother-x2}
252 | \caption{TME on Duffing-van der Pol (\alert{Example 3.19}).}
253 | \end{figure}
254 | \end{frame}
255 |
256 | \section{State-space deep Gaussian processes}
257 | \begin{frame}{Contents}
258 | \begin{block}{}
259 | \tableofcontents[currentsection]
260 | \end{block}
261 | \end{frame}
262 |
263 | %\begin{frame}{Gaussian processes}
264 | % \begin{block}{}
265 | % $U\colon\T\to\R^d$ is said to be a \alert{Gaussian process (GP)}, if for every $t_10},
383 | % \end{split}
384 | % \end{equation}
385 | % %
386 | % then it is \alert{hard} for Gaussian filters and smoothers to estimate the posterior distribution of $\sigma$ from data.
387 | % \end{block}
388 | % \begin{block}{}
389 | % The \alert{Kalman gain} for $\sigma$ converges to zero as $k\to\infty$.
390 | % \end{block}
391 | % \begin{block}{}
392 | % These are detailed in \alert{Section 4.8}.
393 | % \end{block}
394 | %\end{frame}
395 |
396 | \section{Applications of state-space (deep) Gaussian processes}
397 | \begin{frame}{Contents}
398 | \begin{block}{}
399 | \tableofcontents[currentsection]
400 | \end{block}
401 | \end{frame}
402 |
403 | \begin{frame}{Probabilistic Drift Estimation}
404 | \begin{block}{}
405 | Consider an SDE
406 | %
407 | \begin{equation}
408 | \diff X(t) = a(X(t)) \diff t + b \diff W(t), \quad X(t_0) = X_0,
409 | \end{equation}
410 | %
411 | where the drift function $a$ is \alert{unknown}. The task is to estimate $a$ from a set of partial observations \alert{$x(t_1), x(t_2), \ldots, x(t_T)$} of the SDE.
412 | \end{block}
413 | \begin{block}{}
414 | One can assume that
415 | %
416 | \begin{equation}
417 | a(x) \sim \mathrm{SSGP}(0, C(x, x'))
418 | \end{equation}
419 | %
420 | then build an \alert{approximate likelihood} model from any \alert{discretisation} of the SDE.
421 | \end{block}
422 | \begin{block}{}
423 | If necessary, let $a$ follow an SS-DGP.
424 | \end{block}
425 | \end{frame}
426 |
427 | \begin{frame}{Probabilistic Drift Estimation}
428 | \begin{block}{}
429 | Essentially, the estimation model reads
430 | %
431 | \begin{equation}
432 | \begin{split}
433 | a(x) &\sim \mathrm{SSGP}(0, C(x, x')),\\
434 | X(t_k) - X(t_{k-1}) &\approx f_{k-1}(X_{k-1}) + q_{k-1}(X_{k-1}),
435 | \end{split}
436 | \end{equation}
437 | %
438 | where $f_{k-1}$ and $q_{k-1}$ are some \alert{non-linear functions and random variables} of $a$ and its \alert{derivatives} (depending on the discretisation).
439 | \end{block}
440 | \begin{block}{}
441 | What are the \alert{upsides} of placing an SS-(D)GP prior on $a$?
442 | \begin{itemize}
443 | \item \alert{Linear} time computational complexity.
444 | \item Derivatives of $a$ appear as \alert{state components}, no need to compute the covariance matrices of derivatives.
445 | \item Amenable to \alert{high-order discretisation schemes/accurate likelihood approximation}.
446 | \end{itemize}
447 | \end{block}
448 | \end{frame}
449 |
450 | %\begin{frame}{}
451 | % \begin{figure}
452 | % \centering
453 | % \includegraphics[width=\linewidth]{../thesis_latex/figs/drift-est}
454 | % \caption{Left: $a(x) = 3 \, (x-x^3)$. Right: $a(x) = \tanh(x)$.}
455 | % \end{figure}
456 | %\end{frame}
457 |
458 | \begin{frame}{Spectro-temporal Analysis}
459 | \begin{block}{}
460 | Consider any periodic signal $z\colon\T\to\R$. We may want to approximate it by \alert{Fourier expansion}:
461 | %
462 | \begin{equation}
463 | z(t) \approx \alpha_0 + \sum^N_{n=1} \big[ \alpha_n \cos(2 \, \pi \, f_n \, t) + \beta_n\sin(2 \, \pi \, f_n \, t) \big].
464 | \end{equation}
465 | %
466 | \alert{GP estimation} of the coefficients \alert{$\lbrace \alpha_0, \alpha_n,\beta_n \rbrace_{n=1}^N$}:
467 | %
468 | \begin{equation}
469 | \begin{split}
470 | \alpha_0(t) &\sim \mathrm{SSGP}(0, C^0_\alpha(t, t')), \\
471 | \alpha_n(t) &\sim \mathrm{SSGP}(0, C^n_\alpha(t, t')), \\
472 | \beta_n(t) &\sim \mathrm{SSGP}(0, C^n_\beta(t, t')), \\
473 | Y_k \alert{=} \alpha_0(t_k) + \sum^N_{n=1} \big[ \alpha_n(t_k) &\cos(2 \, \pi \, f_n \, t_k) + \beta_n(t_k)\sin(2 \, \pi \, f_n \, t_k) \big] + \xi_k,\nonumber
474 | \end{split}
475 | \end{equation}
476 | \end{block}
477 | \end{frame}
478 |
479 | \begin{frame}{Spectro-temporal Analysis}
480 | \begin{block}{}
481 | However, the approach is \alert{computationally demanding}. One needs to store \alert{$2 \, N+1$} covariance matrices of dimension \alert{$T\times T$} and compute their \alert{inverses}.
482 | \end{block}
483 | \begin{block}{}
484 | If we use the state-space approach, then the problem reduces to computing \alert{$T$} covariance matrices of dimension \alert{$2 \, N+1$}. Beneficial when \alert{$T\gg N$}.
485 | \end{block}
486 | \begin{block}{}
487 | With a clever choice of \alert{stationary} state-space prior, the said covariance matrices are no longer a problem: they are replaced by a \alert{pre-computed and data-independent} stationary covariance matrix, which is even faster.
488 | \end{block}
489 | \begin{block}{}
490 | Detailed in \alert{Section 5.2}.
491 | \end{block}
492 | \end{frame}
493 |
494 | %\begin{frame}{}
495 | % \begin{figure}
496 | % \centering
497 | % \includegraphics[width=\linewidth]{../thesis_latex/figs/spectro-temporal-demo1}
498 | % \caption{Spectrogram (right, contour plot) of a sinusoidal signal (left) estimated by RTS smoother. Dashed black lines stand for the ground truth frequencies.}
499 | % \end{figure}
500 | %\end{frame}
501 |
502 | \begin{frame}
503 | \noindent
504 | \begin{minipage}{.48\textwidth}
505 | \begin{figure}
506 | \centering
507 | \fbox{\includegraphics[trim={2cm 2cm 2cm 2cm},width=.8\linewidth,clip]{../thesis_latex/title-pages/title-pages}}
508 | \end{figure}
509 | \end{minipage}
510 | \hfill
511 | \begin{minipage}{.48\textwidth}
512 | \begin{block}{}
513 | Thank you!
514 | \end{block}
515 | \begin{block}{}
516 | \begin{figure}
517 | \centering
518 | \includegraphics[width=.5\linewidth]{figs/qr-code-thesis}
519 | \end{figure}
520 | \end{block}
521 | \end{minipage}
522 | \end{frame}
523 |
524 | \end{document}
--------------------------------------------------------------------------------
/thesis_latex/ch5.tex:
--------------------------------------------------------------------------------
1 | %!TEX root = dissertation.tex
2 | \chapter{Applications}
3 | \label{chap:apps}
4 | In this chapter, we present the experimental results in Publications~\cp{paperDRIFT}, \cp{paperKFSECG}, \cp{paperKFSECGCONF}, \cp{paperSSDGP}, and~\cp{paperMARITIME}. These works are mainly concerned with the applications of state-space (deep) GPs. Specifically, in Section~\ref{sec:drift-est} we show how to use the SS-GP regression method to estimate unknown drift functions in SDEs. Similarly, under that same state-space framework, in Section~\ref{sec:spectro-temporal} we show how to estimate the posterior distributions of the Fourier coefficients of signals. Sections~\ref{sec:apps-ssdgp} and~\ref{sec:maritime} illustrate how SS-DGPs can be used to model real-world signals, such as gravitational waves, accelerometer recordings of human motion, and maritime vessel trajectories.
5 |
6 | \section{Drift estimation in stochastic differential equations}
7 | \label{sec:drift-est}
8 | Consider a scalar-valued stochastic process $X \colon \T \to \R$ governed by a stochastic differential equation
9 | %
10 | \begin{equation}
11 | \diff X(t) = a(X(t)) \diff t + b \diff W(t), \quad X(t_0) = X_0,
12 | \label{equ:drift-est-sde}
13 | \end{equation}
14 | %
15 | where $b\in\R$ is a constant, $W\colon \T\to\R$ is a Wiener process, and $a\colon \R\to\R$ is an \emph{unknown} drift function. Suppose that we have measurement random variables $X(t_1), X(t_2), \ldots, X(t_T)$ of $X$ at time instances $t_1, t_2, \ldots, t_T\in\T$. The goal is then to estimate the drift function $a$ from these measurements.
16 |
17 | One way to proceed is to assume a parametric form of function $a=a_\vartheta(\cdot)$ and estimate its parameters $\vartheta$ by using, for example, maximum likelihood estimation~\citep{Zmirou1986, Yoshida1992, Kessler1997, Sahalia2003} or Monte Carlo methods~\citep{Roberts2001, Beskos2006}.
18 |
19 | In this chapter, we are mainly concerned with the GP regression approach for estimating the unknown $a$~\citep{Papaspiliopoulos2012, Ruttor2013, Garcia2017, Batz2018, Opper2019}. The key idea of this approach is to assume that the unknown drift function is distributed according to a GP, that is
20 | %
21 | \begin{equation}
22 | a(x) \sim \mathrm{GP}\bigl(0, C(x, x')\bigr).
23 | \end{equation}
24 | %
25 | Having at our disposal measurements $X(t_1), X(t_2), \ldots, X(t_T)$ observed directly from SDE~\eqref{equ:drift-est-sde}, we can formulate the problem of estimating $a$ as a GP regression problem. In order to do so, we discretise the SDE in Equation~\eqref{equ:drift-est-sde} and thereupon define the measurement model as
26 | %
27 | \begin{equation}
28 | Y_k \coloneqq X(t_{k}) - X(t_{k-1}) = \check{f}_{k-1}(X(t_{k-1})) + \check{q}_{k-1}(X(t_{k-1}))
29 | \end{equation}
30 | %
31 | for $k=1,2,\ldots, T$, where the function $\check{f}_{k-1}\colon \R \to \R$ and the random variable $\check{q}_{k-1}$ represent the exact discretisation of $X$ at $t_{k}$ from $t_{k-1}$. We write the GP regression model for estimating the drift function by
32 | %
33 | \begin{equation}
34 | \begin{split}
35 | a(x) &\sim \mathrm{GP}(0, C\bigl(x, x')\bigr),\\
36 | Y_k &= \check{f}_{k-1}(X_{k-1}) + \check{q}_{k-1}(X_{k-1}).
37 | \label{equ:drift-est-reg-model}
38 | \end{split}
39 | \end{equation}
40 | %
41 | The goal now is to estimate the posterior density of $a(x)$ for all $x\in\R$ from a set of data $y_{1:T} = \lbrace x_k - x_{k-1} \colon k=1,2,\ldots, T \rbrace$.
42 |
43 | However, the exact discretisation of non-linear SDEs is rarely possible. In practice, we often have to approximate $\check{f}_{k-1}$ and $\check{q}_{k-1}$ by using, for instance, the Euler--Maruyama scheme, Milstein's method, or, more generally, It\^{o}--Taylor expansions~\citep{Kloeden1992}. As an example, application of the Euler--Maruyama method to Equation~\eqref{equ:drift-est-sde} gives
44 | %
45 | \begin{equation}
46 | \begin{split}
47 | \check{f}_{k-1}(x) &\approx a(x) \, \Delta t_k, \\
48 | \check{q}_{k-1}(x) &\approx b \, \delta_k,
49 | \end{split}
50 | \end{equation}
51 | %
52 | where $\Delta t_k \coloneqq t_{k} - t_{k-1}$ and $\delta_k \sim \mathrm{N}(0, \Delta t_k)$.
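For a concrete (and deliberately simplified) picture of this construction, the sketch below performs the Euler--Maruyama-based estimation in batch form. The squared-exponential prior, the Ornstein--Uhlenbeck ground truth, and all numerical values are assumptions for illustration; the state-space solver of \citet{ZhaoZheng2020Drift} replaces the cubic-cost linear solve used here.

```python
import numpy as np

def rbf(x1, x2, ell=0.5, s=1.0):
    """Squared-exponential covariance C(x, x') (an assumed choice of prior)."""
    return s**2 * np.exp(-0.5 * (x1[:, None] - x2[None, :])**2 / ell**2)

# Simulate dX = -X dt + b dW (true drift a(x) = -x) with Euler--Maruyama.
rng = np.random.default_rng(0)
b, dt, n = 0.5, 0.05, 2000
x = np.zeros(n + 1)
for k in range(n):
    x[k + 1] = x[k] - x[k] * dt + b * np.sqrt(dt) * rng.standard_normal()

# Pseudo-measurements: y_k = x_k - x_{k-1} ~ a(x_{k-1}) dt + b sqrt(dt) eps_k.
xin, y = x[:-1], np.diff(x)

# GP posterior mean of a at test points x*:
#   E[a(x*) | y] = dt * C(x*, xin) (dt^2 C(xin, xin) + b^2 dt I)^{-1} y.
w = np.linalg.solve(dt**2 * rbf(xin, xin) + b**2 * dt * np.eye(n), y)
xs = np.array([-0.3, 0.0, 0.3])
a_est = dt * rbf(xs, xin) @ w
```

With the fixed seed, the posterior mean recovers the sign and approximate slope of the true drift near the bulk of the data.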
53 |
54 | However, the discretisation by the Euler--Maruyama scheme can sometimes be crude, especially when the discretisation step is relatively large, making the measurement representation obtained from it inaccurate. \citet{ZhaoZheng2020Drift} show that if the prior of $a$ is chosen with certain regularity, it is possible to leverage high-order It\^{o}--Taylor expansions in order to discretise the SDE with higher accuracy. As an example, suppose that the GP prior of $a$ is twice-differentiable almost surely. Then, the It\^{o}--Taylor strong order 1.5 (It\^{o}-1.5) method~\citep{Kloeden1992} gives
55 | %
56 | \begin{equation}
57 | \begin{split}
58 | \check{f}_{k-1}(x) &\approx a(x) \, \Delta t_{k} + \frac{1}{2} \Big( \frac{\diff a}{\diff x}(x) \, a(x) + \frac{1}{2}\, \frac{\diff^2 a}{\diff x^2}(x) \, b^2 \Big) \, \Delta t_k^2,\\
59 | \check{q}_{k-1}(x) &\approx b\,\delta_{1,k} + \frac{\diff a}{\diff x}(x) \, b \, \delta_{2, k},
60 | \label{equ:drift-est-ito15}
61 | \end{split}
62 | \end{equation}
63 | %
64 | where
65 | %
66 | \begin{equation}
67 | \begin{bmatrix}
68 | \delta_{1,k} \\
69 | \delta_{2,k}
70 | \end{bmatrix} \sim \mathrm{N}\left(
71 | \begin{bmatrix}
72 | 0\\0
73 | \end{bmatrix},
74 | \begin{bmatrix}
75 | \frac{(\Delta t_k)^3}{3} & \frac{(\Delta t_k)^2}{2}\\
76 | \frac{(\Delta t_k)^2}{2} & \Delta t_k
77 | \end{bmatrix}\right).
78 | \end{equation}
79 | %
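As a quick numerical sanity check of this covariance (a standalone snippet, not part of the cited publications), the correlated pair $(\delta_{1,k}, \delta_{2,k})$ can be sampled via a Cholesky factor:

```python
import numpy as np

dt = 0.1
# Joint covariance of (delta_1, delta_2) in the Ito-1.5 scheme.
cov = np.array([[dt**3 / 3.0, dt**2 / 2.0],
                [dt**2 / 2.0, dt]])
L = np.linalg.cholesky(cov)  # cov = L L^T; valid since det = dt^4 / 12 > 0

rng = np.random.default_rng(42)
samples = L @ rng.standard_normal((2, 200_000))
emp_cov = samples @ samples.T / samples.shape[1]  # empirical second moment
```

The matrix is positive definite for every $\Delta t_k > 0$ (its determinant is $\Delta t_k^4/12$), so the Cholesky factorisation always exists.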
80 | Indeed, using a higher-order It\^{o}--Taylor expansion can lead to a better measurement representation; however, this in turn requires more computations and limits the choice of the prior model. It is also worth mentioning that if one uses high-order It\^{o}--Taylor expansions -- such as the one in Equation~\eqref{equ:drift-est-ito15} -- the resulting measurement representation in the GP regression model~\eqref{equ:drift-est-reg-model} is no longer linear with respect to $a$. Consequently, the GP regression problem may no longer admit a closed-form solution.
81 |
82 | One problem of this GP regression-based drift estimation approach is that the computation can be demanding if the number of measurements $T$ is large. Moreover, if the measurements are densely located, then the covariance matrices used in GP regression may be numerically close to singular. These two issues are already discussed in the Introduction and Section~\ref{sec:ssgp}. In addition, the GP regression model is not amenable to high-order It\^{o}--Taylor expansions, as these expansions result in non-linear measurement representations and require computing the derivatives of $a$ up to a certain order.
83 |
84 | \begin{figure}[t!]
85 | \centering
86 | \includegraphics[width=.99\linewidth]{figs/drift-est}
87 | \caption{Estimation of drift functions $a(x)=3\,(x-x^3)$ (left) and $a(x)=\tanh(x)$ (right) by \citet{ZhaoZheng2020Drift}. UKFS stands for unscented Kalman filter and RTS smoother. Shaded area stands for 0.95 confidence interval associated with the UKFS estimation.}
88 | \label{fig:drift-est}
89 | \end{figure}
90 |
91 | \citet{ZhaoZheng2020Drift} address the problems above by solving the GP regression problem in Equation~\eqref{equ:drift-est-reg-model} under the state-space framework. More precisely, they put an SS-GP prior over the unknown $a$ instead of a standard batch GP prior. The main benefit of doing so for this application is that the SS-GP regression solvers are computationally more efficient for large-scale measurements compared to the standard batch GP regression (see, Introduction and Section~\ref{sec:ssgp}). Moreover, in order to use high-order It\^{o}--Taylor expansions, \citet{ZhaoZheng2020Drift} consider putting SS-GP priors of the \matern family over $a$, so that the derivatives of $a$ naturally appear as the state components of $a$ (see, Section~\ref{sec:deep-matern}). In this way, computing the covariance matrices of the derivatives of $a$ is no longer needed.
92 |
93 | \begin{remark}
94 | Note that the SS-GP approach requires treating $X(t_1), X(t_2), \ldots ,\allowbreak X(t_T)$ as time variables and sorting their data $x_{1:T}=\lbrace x_1,x_2,\ldots,x_T \rbrace$ in temporal order.
95 | \end{remark}
96 |
97 | In Figure~\ref{fig:drift-est}, we show a representative result from~\citet{ZhaoZheng2020Drift}, where the SS-GP approach is employed to approximate the drift functions of two SDEs. In particular, the solutions are obtained by using the It\^{o}-1.5 discretisation, and an unscented Kalman filter and an RTS smoother. For more details regarding the experiments the reader is referred to~\citet{ZhaoZheng2020Drift}.
98 |
99 | \section{Probabilistic spectro-temporal signal analysis}
100 | \label{sec:spectro-temporal}
101 | Let $z\colon \T \to \R$ be a periodic signal. In signal processing, it is often of interest to approximate the signal by Fourier expansions of the form
102 | %
103 | \begin{equation}
104 | z(t) \approx \alpha_0 + \sum^{N}_{n=1} \big[\alpha_n \cos(2 \, \pi \, \mathring{f}_n \, t) + \beta_n \sin(2 \, \pi \, \mathring{f}_n \, t)\big],
105 | \end{equation}
106 | %
107 | where $\big\lbrace \mathring{f}_n\colon n=1,2,\ldots,N \big\rbrace$ stand for the frequency components, and $N$ is a given expansion order. When $z$ satisfies certain conditions~\citep{Katznelson2004}, the representation in the equation above converges as $N\to\infty$ (in various modes).
108 |
109 | Let us denote $y_k\coloneqq y(t_k)$ and suppose that we have a set of measurement data $y_{1:T}=\lbrace y_k\colon k=1,2,\ldots,T \rbrace$ of the signal at time instances $t_1, t_2, \ldots, t_T\in\T$. In order to quantify the truncation and measurement errors, we introduce Gaussian random variables $\xi_k \sim \mathrm{N}(0, \Xi_k)$ for $k=1,2,\ldots,T$ and let
110 | %
111 | \begin{equation}
112 | \begin{split}
113 | Y_k = \alpha_0 + \sum^{N}_{n=1} \big[\alpha_n \cos(2 \, \pi \, \mathring{f}_n \, t_k) + \beta_n \sin(2 \, \pi \, \mathring{f}_n \, t_k)\big] + \xi_k
114 | \label{equ:spectro-temporal-y}
115 | \end{split}
116 | \end{equation}
117 | %
118 | represent the random measurements of $z$ at $t_k$. The goal now is to estimate the coefficients $\lbrace \alpha_0, \alpha_n, \beta_n \colon n=1,2,\ldots, N \rbrace$ from the data $y_{1:T}$. We call this problem the \emph{spectro-temporal estimation} problem.
119 |
120 | One way to proceed is by using the MLE method~\citep{Bretthorst1988}, but~\citet{QiYuan2002, ZhaoZheng2018KF, ZhaoZheng2020KFECG} show that we can also consider this spectro-temporal estimation problem as a GP regression problem. More precisely, the modelling assumption is that
121 | %
122 | \begin{equation}
123 | \begin{split}
124 | \alpha_0(t) &\sim \mathrm{GP}\big(0, C^0_\alpha(t, t')\big),\\
125 | \alpha_n(t) &\sim \mathrm{GP}\big(0, C^n_\alpha(t, t')\big),\\
126 | \beta_n(t) &\sim \mathrm{GP}\big(0, C^n_\beta(t, t')\big),
127 | \label{equ:spectro-temporal-gp-priors}
128 | \end{split}
129 | \end{equation}
130 | %
131 | for $n=1,2,\ldots, N$, and that the measurements follow
132 | %
133 | \begin{equation}
134 | Y_k = \alpha_0(t_k) + \sum^{N}_{n=1} \big[\alpha_n(t_k) \, \cos(2 \, \pi \, \mathring{f}_n \, t_k) + \beta_n(t_k) \, \sin(2 \, \pi \, \mathring{f}_n \, t_k)\big] + \xi_k,\nonumber
135 | \end{equation}
136 | %
137 | for $k=1,2,\ldots,T$. This results in a standard GP regression problem; therefore, the posterior distributions of the coefficients $\lbrace \alpha_0, \alpha_n, \beta_n \colon n=1,2,\ldots, N \rbrace$ have a closed-form solution. However, solving this GP regression problem is, in practice, infeasible when the expansion order $N$ and the number of measurements $T$ are large. This is due to the fact that one needs to compute $2\,N+1$ covariance matrices of dimension $T\times T$ and compute their inverses.
138 |
139 | \citet{ZhaoZheng2018KF} propose to solve this spectro-temporal GP regression problem under the state-space framework, that is, by replacing the GP priors in Equation~\eqref{equ:spectro-temporal-gp-priors} with their SDE representations. Since SS-GPs have already been extensively discussed in previous sections, we omit the resulting state-space spectro-temporal estimation formulations. However, the details can be found in Section~\ref{sec:ssgp} and in~\citet{ZhaoZheng2018KF}.
140 |
141 | The computational cost of the state-space spectro-temporal estimation method is substantially lower than that of standard batch GP methods. Indeed, Kalman filters and smoothers only need to compute one $E$-dimensional covariance matrix at each time step (see, Algorithm~\ref{alg:kfs}) instead of those required by batch GP methods. The dimension $E$ is equal to the sum of all the state dimensions of the SS-GPs $\lbrace \alpha_0, \alpha_n, \beta_n \colon n=1,2,\ldots, N \rbrace$.
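To make this cost structure concrete, the following sketch (an illustration with assumed Ornstein--Uhlenbeck coefficient priors and noise levels, not the exact configuration of \citet{ZhaoZheng2018KF}) runs a Kalman filter that carries a single $E$-dimensional covariance per step, here with one frequency band so that $E = 2\,N+1 = 3$:

```python
import numpy as np

rng = np.random.default_rng(1)

# One frequency band (N = 1), so the state dimension is E = 2 N + 1 = 3.
f1, dt, T = 2.0, 0.01, 500
t = dt * np.arange(1, T + 1)

# Synthetic signal with constant coefficients (alpha_0, alpha_1, beta_1).
theta_true = np.array([1.0, 0.8, 0.5])
H = np.stack([np.ones(T),
              np.cos(2.0 * np.pi * f1 * t),
              np.sin(2.0 * np.pi * f1 * t)], axis=1)
y = H @ theta_true + 0.1 * rng.standard_normal(T)

# Slowly varying Ornstein--Uhlenbeck priors on each coefficient.
lam, s2, R = 0.1, 1.0, 0.1**2
A = np.exp(-lam * dt) * np.eye(3)
Q = s2 * (1.0 - np.exp(-2.0 * lam * dt)) * np.eye(3)

# Kalman filter: only one 3-dimensional covariance is carried per step.
m, P = np.zeros(3), s2 * np.eye(3)
for k in range(T):
    m, P = A @ m, A @ P @ A.T + Q   # predict
    h = H[k]
    S = h @ P @ h + R               # scalar innovation variance
    K = P @ h / S                   # Kalman gain
    m = m + K * (y[k] - h @ m)      # update
    P = P - np.outer(K, h @ P)
```

After processing the data, the filtered mean `m` approximates the true coefficients; each step manipulates only $3\times 3$ matrices, never a $T\times T$ one.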
142 |
143 | \citet{ZhaoZheng2020KFECG} further extend the state-space spectro-temporal estimation method by putting quasi-periodic SDE priors~\citep{Solin2014} over the Fourier coefficients instead of the Ornstein--Uhlenbeck SDE priors used in~\citet{ZhaoZheng2018KF}. This consideration generates a time-invariant version of the measurement model in Equation~\eqref{equ:spectro-temporal-y}, thus, one can apply steady-state Kalman filters and smoothers (SS-KFSs) in order to achieve lower computational costs. The computational cost is further reduced because SS-KFSs do not need to compute the $E$-dimensional covariances of the state in their filtering and smoothing loops. Instead, the state covariances in SS-KFSs are replaced by a pre-computed steady covariance matrix obtained as the solution of its discrete algebraic Riccati equation (DARE). Moreover, solving the DARE is independent of data/measurements, which is especially useful when the model is known or fixed. However, SS-KFSs may not always be computationally efficient when $N \gg T$, since solving an $E$-dimensional DARE can be demanding when $E$ is large.
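To illustrate what the DARE delivers, consider a minimal scalar example (the model $A = H = 1$ and the noise variances here are assumptions for illustration only): the steady-state predicted covariance, which SS-KFSs pre-compute once, is the fixed point of the Riccati recursion.

```python
import math

# Scalar random-walk model: x_{k+1} = x_k + w_k, w_k ~ N(0, q);
# measurements y_k = x_k + v_k, v_k ~ N(0, r).
q, r = 0.2, 1.0

# Iterate the Riccati recursion for the predicted covariance to a fixed point.
P = 1.0
for _ in range(10_000):
    P_upd = P - P**2 / (P + r)   # measurement update
    P = P_upd + q                # prediction

# The fixed point solves the scalar DARE  P^2 - q P - q r = 0.
P_exact = 0.5 * (q + math.sqrt(q**2 + 4.0 * q * r))
```

The recursion is a contraction for $P > 0$, so the iteration converges to the unique positive DARE root, independently of the data, which is exactly why the steady covariance can be pre-computed.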
144 |
145 | \citet{ZhaoZheng2018KF, ZhaoZheng2020KFECG} show that the state-space spectro-temporal estimation method can be a useful feature extraction mechanism for detecting atrial fibrillation from electrocardiogram signals. More specifically, the spectro-temporal method estimates the spectrogram images of atrial fibrillation signals. These images are then fed to a deep convolutional neural network classifier which is tasked with recognising atrial fibrillation manifestations.
146 |
147 | Since the measurement noises $\lbrace \xi_k\colon k=1,2,\ldots,T \rbrace$ in Equation~\eqref{equ:spectro-temporal-y} encode the truncation and measurement errors, it is also of interest to estimate them. This is done in~\citet{GaoRui2019ALKS}, where the variances $\Xi_k$ of $\xi_k$ for $k=1,2,\ldots, T$ are estimated under the alternating direction method of multipliers.
148 |
149 | \begin{figure}[t!]
150 | \centering
151 | \includegraphics[width=.99\linewidth]{figs/spectro-temporal-demo1}
152 | \caption{Spectrogram (right, contour plot) of a sinusoidal signal (left) generated by Kalman filtering and RTS smoothing using the method in Section~\ref{sec:spectro-temporal}. Dashed black lines (right) stand for the ground truth frequencies.}
153 | \label{fig:spectro-temporal-demo}
154 | \end{figure}
155 |
156 | Figure~\ref{fig:spectro-temporal-demo} illustrates an example of using the state-space spectro-temporal estimation method to estimate the spectrogram of a sinusoidal signal with multiple frequency bands.
157 |
158 | \section{Signal modelling with SS-DGPs}
159 | \label{sec:apps-ssdgp}
160 | In this section, we apply SS-DGPs for modelling gravitational waves and human motion (i.e., acceleration). We consider these as SS-DGP regression problems, where the measurement models are assumed to be linear with respect to the SS-DGPs with additive Gaussian noises. As for their priors, we chose the Mat\'{e}rn $\nu=3 \, / \, 2$ SS-DGP in Example~\ref{example:ssdgp-m32}, except that the parent GPs $U^2_1$ and $U^3_1$ use the Mat\'{e}rn $\nu=1 \, / \, 2$ representation.
161 |
162 | \subsection*{Modelling gravitational waves}
163 | Gravitational waves are curvatures of spacetime caused by the movement of objects with mass~\citep{Maggiore2008}. Since Albert Einstein theoretically predicted the existence of gravitational waves from a linearised field equation in 1916~\citep{EinsteinGW1937, Hill2017}, much effort has been devoted to observing their presence~\citep{Blair1991}. In 2015, the laser interferometer gravitational-wave observatory (LIGO) team first observed a gravitational wave from the merging of a black hole binary~\citep[event GW150914,][]{LIGO2016}. This wave/signal is challenging for standard GPs to fit because the frequency of the signal changes over time. It is therefore of interest to see if SS-DGPs can fit this gravitational wave signal.
164 |
165 | \begin{figure}[t!]
166 | \centering
167 | \includegraphics[width=.99\linewidth]{figs/gravit-wave-ssdgp}
168 | \caption{Mat\'{e}rn $\nu=3\,/\,2$ SS-DGP regression (solved by cubature Kalman filter and smoother) for the gravitational wave in event GW150914 (Hanford, Washington). The shaded area stands for 0.95 confidence interval. Details about the data can be found in~\citet{Zhao2020SSDGP}.}
169 | \label{fig:gravit-wave-ssdgp}
170 | \end{figure}
171 |
172 | Figure~\ref{fig:gravit-wave-ssdgp} plots the SS-DGP fit for the gravitational wave observed in the event GW150914. In the same figure, we also show the fit from a \matern $\nu=3\,/\,2$ GP as well as a waveform (which is regarded as the ground truth) computed from numerical relativity (purple dashed lines) for comparison. Details about the experiment and data can be found in~\citet{Zhao2020SSDGP}.
173 |
174 | Figure~\ref{fig:gravit-wave-ssdgp} shows that the GP fails to give a reasonable fit to the gravitational wave because the GP over-adapts to the high-frequency section of the signal around $0.4$~s. On the contrary, the SS-DGP does not have such a problem, and its fit is closer to the numerical relativity waveform compared to that of the GP. Moreover, the estimated length scale (in log transformation) can interpret the data in the sense that the length scale value decreases as the signal frequency increases.
175 |
176 | \subsection*{Modelling human motion}
177 | We apply the regularised SS-DGP (R-SS-DGP) presented in Section~\ref{sec:l1-r-dgp} to fit an accelerometer recording of human motion. The reason for using the R-SS-DGP here is that the recording (see, the first row of Figure~\ref{fig:imu-r-ssdgp}) is found to have some sharp changes and artefacts. Hence, we aim to test whether we can use sparse length scale and magnitude to describe such data. The collection of accelerometer recordings and the experiment settings are detailed in~\citet{Hostettler2018} and~\citet{Zhao2021RSSGP}, respectively.
178 |
\begin{figure}[t!]
\centering
\includegraphics[width=.99\linewidth]{figs/imu-r-ssdgp}
\caption{Human motion modelling with an R-SS-DGP. The GP here uses a \matern $\nu=3\,/\,2$ covariance function. The shaded area stands for the 0.95 confidence interval.}
\label{fig:imu-r-ssdgp}
\end{figure}

A demonstrative result is shown in Figure~\ref{fig:imu-r-ssdgp}. We see that the fit of the R-SS-DGP is smoother than that of the GP, and its posterior variance is also reasonably smaller. The figure also shows that the GP does not handle the artefacts well, for example, around times $t=55$~s and $62$~s. Finally, we find that the learnt length scale and magnitude (in log transformation) are sparse, and that they respond sharply to the abrupt signal changes and artefacts.

\section{Maritime situational awareness}
\label{sec:maritime}
Another application area of (deep) GPs is autonomous maritime navigation. In~\citet{Sarang2020}, we present a literature review of the sensor technologies and machine learning methods for autonomous vessel navigation. In particular, we show that GP-based methods are able to analyse ship trajectories~\citep{Rong2019}, detect navigation abnormalities~\citep{Kowalska2012, Smith2014}, and detect/classify vessels~\citep{XiaoZ2017}.

\begin{figure}[t!]
\centering
\includegraphics[width=.99\linewidth]{figs/ais-r-ssdgp}
\caption{Modelling an AIS recording (speed over ground) of MS Finlandia with an R-SS-DGP. The GP here uses a \matern $\nu=3\,/\,2$ covariance function. The shaded area stands for the 0.95 confidence interval.}
\label{fig:ais-ssdgp}
\end{figure}

In Figure~\ref{fig:ais-ssdgp}, we present an example of fitting an automatic identification system (AIS) recording with an R-SS-DGP. The recording was taken from MS Finlandia (Helsinki--Tallinn) by Fleetrange Oy on December 10, 2020. We see from the figure that the fit of the R-SS-DGP is smoother than that of the GP. Moreover, the learnt length scale and magnitude parameters stay flat and jump at the acceleration/deceleration points.
