├── cover ├── cover.pdf ├── full_cover.png ├── README.md └── gen_q_wiener.py ├── dissertation.pdf ├── scripts ├── requirements.txt ├── README.md ├── vanishing_prior_cov.py ├── TME_positive_definite_softplus.py ├── showcase_gp_sinusoidal.py ├── draw_ssdgp_m12.py ├── showcase_gp_rectangular.py ├── draw_ssdgp_m32.py ├── disc_err_dgp_m12.py ├── spectro_temporal.py └── TME_estimation_benes.py ├── thesis_latex ├── figs │ ├── drift-est.pdf │ ├── gp-kfs-eq.pdf │ ├── ais-r-ssdgp.pdf │ ├── imu-r-ssdgp.pdf │ ├── r-ssgp-admm.pdf │ ├── drift-est-dw.pdf │ ├── drift-est-tanh.pdf │ ├── ssdgp-reg-rect.pdf │ ├── ssdgp-reg-sine.pdf │ ├── tme-benes-all.pdf │ ├── tme-benes-cov.pdf │ ├── tme-benes-nn.pdf │ ├── tme-benes-x3.pdf │ ├── vanishing-cov.pdf │ ├── disc-err_dgp_m12.pdf │ ├── tme-benes-filter.pdf │ ├── tme-ct3d-filter.pdf │ ├── gp-fail-example-m12.pdf │ ├── gp-fail-example-m32.pdf │ ├── gp-fail-example-m52.pdf │ ├── gp-fail-example-rbf.pdf │ ├── gravit-wave-ssdgp.pdf │ ├── samples_ssdgp_m12.pdf │ ├── samples_ssdgp_m32.pdf │ ├── tme-benes-nonlinear.pdf │ ├── tme-benes-smoother.pdf │ ├── tme-ct3d-smoother.pdf │ ├── tme-softplus-mineigs.pdf │ ├── gp-fail-example-sinu-m12.pdf │ ├── gp-fail-example-sinu-m32.pdf │ ├── gp-fail-example-sinu-m52.pdf │ ├── spectro-temporal-demo1.pdf │ ├── tme-duffing-smoother-x1.pdf │ ├── tme-duffing-smoother-x2.pdf │ ├── tme-benes-filter-smoother.pdf │ ├── tme-duffing-filter-smoother.pdf │ ├── gp-sign-fit-example.tex │ ├── ssdgp-identifiability-graph.tex │ ├── dgp-example-2.tex │ └── dgp-binary-tree.tex ├── title-pages │ ├── README.md │ ├── backcover.pdf │ └── title-pages.pdf ├── sRGB_IEC61966-2-1_black_scaled.icc ├── dissertation.xmpdata ├── aalto_licenses │ ├── README.md │ ├── README-aaltologo.md │ ├── README.txt │ ├── LICENSES.txt │ ├── LICENSES-aaltologo.txt │ └── sRGB_IEC61966-2-1_black_scaled.icc-COPYRIGHT ├── modifications_of_aaltoseries.txt ├── README.md ├── fouriernc2.sty ├── list_of_papers.tex ├── zmacro.tex ├── ch6.tex ├── fourier2.sty ├── 
ch1.tex ├── dissertation.tex └── ch5.tex ├── .gitignore ├── lectio_praecursoria ├── README.md ├── figs │ └── path-graph.tex ├── scripts │ ├── draw_gp_samples.py │ └── kfs_anime.py ├── zz.cls ├── z_marcro.tex └── slide.tex ├── license.txt ├── errata.md ├── .github └── workflows │ └── latex_compile.yml └── README.md /cover/cover.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zgbkdlm/dissertation/HEAD/cover/cover.pdf -------------------------------------------------------------------------------- /dissertation.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zgbkdlm/dissertation/HEAD/dissertation.pdf -------------------------------------------------------------------------------- /cover/full_cover.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zgbkdlm/dissertation/HEAD/cover/full_cover.png -------------------------------------------------------------------------------- /scripts/requirements.txt: -------------------------------------------------------------------------------- 1 | numpy 2 | gpflow 3 | matplotlib 4 | scipy 5 | scikit-learn 6 | sympy 7 | tme>=0.1.4 8 | -------------------------------------------------------------------------------- /thesis_latex/figs/drift-est.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zgbkdlm/dissertation/HEAD/thesis_latex/figs/drift-est.pdf -------------------------------------------------------------------------------- /thesis_latex/figs/gp-kfs-eq.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zgbkdlm/dissertation/HEAD/thesis_latex/figs/gp-kfs-eq.pdf -------------------------------------------------------------------------------- /cover/README.md: 
-------------------------------------------------------------------------------- 1 | Code to generate the dissertation cover. Run `python gen_q_wiener.py` to generate the file `cover.pdf`. 2 | -------------------------------------------------------------------------------- /thesis_latex/figs/ais-r-ssdgp.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zgbkdlm/dissertation/HEAD/thesis_latex/figs/ais-r-ssdgp.pdf -------------------------------------------------------------------------------- /thesis_latex/figs/imu-r-ssdgp.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zgbkdlm/dissertation/HEAD/thesis_latex/figs/imu-r-ssdgp.pdf -------------------------------------------------------------------------------- /thesis_latex/figs/r-ssgp-admm.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zgbkdlm/dissertation/HEAD/thesis_latex/figs/r-ssgp-admm.pdf -------------------------------------------------------------------------------- /thesis_latex/figs/drift-est-dw.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zgbkdlm/dissertation/HEAD/thesis_latex/figs/drift-est-dw.pdf -------------------------------------------------------------------------------- /thesis_latex/figs/drift-est-tanh.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zgbkdlm/dissertation/HEAD/thesis_latex/figs/drift-est-tanh.pdf -------------------------------------------------------------------------------- /thesis_latex/figs/ssdgp-reg-rect.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zgbkdlm/dissertation/HEAD/thesis_latex/figs/ssdgp-reg-rect.pdf 
-------------------------------------------------------------------------------- /thesis_latex/figs/ssdgp-reg-sine.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zgbkdlm/dissertation/HEAD/thesis_latex/figs/ssdgp-reg-sine.pdf -------------------------------------------------------------------------------- /thesis_latex/figs/tme-benes-all.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zgbkdlm/dissertation/HEAD/thesis_latex/figs/tme-benes-all.pdf -------------------------------------------------------------------------------- /thesis_latex/figs/tme-benes-cov.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zgbkdlm/dissertation/HEAD/thesis_latex/figs/tme-benes-cov.pdf -------------------------------------------------------------------------------- /thesis_latex/figs/tme-benes-nn.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zgbkdlm/dissertation/HEAD/thesis_latex/figs/tme-benes-nn.pdf -------------------------------------------------------------------------------- /thesis_latex/figs/tme-benes-x3.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zgbkdlm/dissertation/HEAD/thesis_latex/figs/tme-benes-x3.pdf -------------------------------------------------------------------------------- /thesis_latex/figs/vanishing-cov.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zgbkdlm/dissertation/HEAD/thesis_latex/figs/vanishing-cov.pdf -------------------------------------------------------------------------------- /thesis_latex/title-pages/README.md: -------------------------------------------------------------------------------- 1 | # README 2 | 3 | Title 
pages and back cover generated from the Aalto Publication Platform. 4 | -------------------------------------------------------------------------------- /thesis_latex/figs/disc-err_dgp_m12.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zgbkdlm/dissertation/HEAD/thesis_latex/figs/disc-err_dgp_m12.pdf -------------------------------------------------------------------------------- /thesis_latex/figs/tme-benes-filter.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zgbkdlm/dissertation/HEAD/thesis_latex/figs/tme-benes-filter.pdf -------------------------------------------------------------------------------- /thesis_latex/figs/tme-ct3d-filter.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zgbkdlm/dissertation/HEAD/thesis_latex/figs/tme-ct3d-filter.pdf -------------------------------------------------------------------------------- /thesis_latex/title-pages/backcover.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zgbkdlm/dissertation/HEAD/thesis_latex/title-pages/backcover.pdf -------------------------------------------------------------------------------- /thesis_latex/figs/gp-fail-example-m12.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zgbkdlm/dissertation/HEAD/thesis_latex/figs/gp-fail-example-m12.pdf -------------------------------------------------------------------------------- /thesis_latex/figs/gp-fail-example-m32.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zgbkdlm/dissertation/HEAD/thesis_latex/figs/gp-fail-example-m32.pdf --------------------------------------------------------------------------------
/thesis_latex/figs/gp-fail-example-m52.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zgbkdlm/dissertation/HEAD/thesis_latex/figs/gp-fail-example-m52.pdf -------------------------------------------------------------------------------- /thesis_latex/figs/gp-fail-example-rbf.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zgbkdlm/dissertation/HEAD/thesis_latex/figs/gp-fail-example-rbf.pdf -------------------------------------------------------------------------------- /thesis_latex/figs/gravit-wave-ssdgp.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zgbkdlm/dissertation/HEAD/thesis_latex/figs/gravit-wave-ssdgp.pdf -------------------------------------------------------------------------------- /thesis_latex/figs/samples_ssdgp_m12.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zgbkdlm/dissertation/HEAD/thesis_latex/figs/samples_ssdgp_m12.pdf -------------------------------------------------------------------------------- /thesis_latex/figs/samples_ssdgp_m32.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zgbkdlm/dissertation/HEAD/thesis_latex/figs/samples_ssdgp_m32.pdf -------------------------------------------------------------------------------- /thesis_latex/figs/tme-benes-nonlinear.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zgbkdlm/dissertation/HEAD/thesis_latex/figs/tme-benes-nonlinear.pdf -------------------------------------------------------------------------------- /thesis_latex/figs/tme-benes-smoother.pdf: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/zgbkdlm/dissertation/HEAD/thesis_latex/figs/tme-benes-smoother.pdf -------------------------------------------------------------------------------- /thesis_latex/figs/tme-ct3d-smoother.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zgbkdlm/dissertation/HEAD/thesis_latex/figs/tme-ct3d-smoother.pdf -------------------------------------------------------------------------------- /thesis_latex/title-pages/title-pages.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zgbkdlm/dissertation/HEAD/thesis_latex/title-pages/title-pages.pdf -------------------------------------------------------------------------------- /thesis_latex/figs/tme-softplus-mineigs.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zgbkdlm/dissertation/HEAD/thesis_latex/figs/tme-softplus-mineigs.pdf -------------------------------------------------------------------------------- /thesis_latex/figs/gp-fail-example-sinu-m12.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zgbkdlm/dissertation/HEAD/thesis_latex/figs/gp-fail-example-sinu-m12.pdf -------------------------------------------------------------------------------- /thesis_latex/figs/gp-fail-example-sinu-m32.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zgbkdlm/dissertation/HEAD/thesis_latex/figs/gp-fail-example-sinu-m32.pdf -------------------------------------------------------------------------------- /thesis_latex/figs/gp-fail-example-sinu-m52.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zgbkdlm/dissertation/HEAD/thesis_latex/figs/gp-fail-example-sinu-m52.pdf 
-------------------------------------------------------------------------------- /thesis_latex/figs/spectro-temporal-demo1.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zgbkdlm/dissertation/HEAD/thesis_latex/figs/spectro-temporal-demo1.pdf -------------------------------------------------------------------------------- /thesis_latex/figs/tme-duffing-smoother-x1.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zgbkdlm/dissertation/HEAD/thesis_latex/figs/tme-duffing-smoother-x1.pdf -------------------------------------------------------------------------------- /thesis_latex/figs/tme-duffing-smoother-x2.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zgbkdlm/dissertation/HEAD/thesis_latex/figs/tme-duffing-smoother-x2.pdf -------------------------------------------------------------------------------- /thesis_latex/figs/tme-benes-filter-smoother.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zgbkdlm/dissertation/HEAD/thesis_latex/figs/tme-benes-filter-smoother.pdf -------------------------------------------------------------------------------- /thesis_latex/sRGB_IEC61966-2-1_black_scaled.icc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zgbkdlm/dissertation/HEAD/thesis_latex/sRGB_IEC61966-2-1_black_scaled.icc -------------------------------------------------------------------------------- /thesis_latex/figs/tme-duffing-filter-smoother.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zgbkdlm/dissertation/HEAD/thesis_latex/figs/tme-duffing-filter-smoother.pdf -------------------------------------------------------------------------------- 
/.gitignore: -------------------------------------------------------------------------------- 1 | *.asc 2 | *.ase 3 | *.asp 4 | *.aux 5 | *.bbl 6 | *.blg 7 | *.brf 8 | *.log 9 | *.out 10 | *.synctex.gz 11 | *.toc 12 | *.nav 13 | *.snm 14 | pdfa.xmpi 15 | .idea 16 | __pycache__ 17 | -------------------------------------------------------------------------------- /lectio_praecursoria/README.md: -------------------------------------------------------------------------------- 1 | # About 2 | 3 | Lectio praecursoria given at the public defence of the dissertation. 4 | 5 | # Notes 6 | 7 | In case of missing-figure errors, you need to run the scripts in `./scripts` and copy the generated pictures to the corresponding path. 8 | -------------------------------------------------------------------------------- /thesis_latex/dissertation.xmpdata: -------------------------------------------------------------------------------- 1 | \Title{State-space deep Gaussian processes with applications} 2 | \Author{Zheng Zhao} 3 | \Subject{State-space methods for deep Gaussian processes} 4 | \Keywords{Gaussian processes, machine learning, state space, stochastic differential equations, stochastic filtering} 5 | -------------------------------------------------------------------------------- /thesis_latex/aalto_licenses/README.md: -------------------------------------------------------------------------------- 1 | This project contains the aaltoseries class file and docs. 2 | 3 | The zip file 'examples.zip' contains a number of example files. In order to 4 | compile them, copy aaltoseries.cls from root into this directory, and also 5 | obtain a copy of aaltologo.sty and place it there. 6 | 7 | -------------------------------------------------------------------------------- /license.txt: -------------------------------------------------------------------------------- 1 | Unless otherwise stated, all rights belong to the author Zheng Zhao. 
This repository contains files covered by different licenses; please check their licenses before you use them. 2 | 3 | You are free to download, display, and print ./dissertation.pdf for your own personal use. Commercial use of it is prohibited. 4 | 5 | -------------------------------------------------------------------------------- /thesis_latex/aalto_licenses/README-aaltologo.md: -------------------------------------------------------------------------------- 1 | This is the home of the file aaltologo.sty, which generates the Aalto logos for documents and also otherwise defines relevant parts of the Aalto visual identity, 2 | specifically, colours and fonts used, and the official names of Schools, Institutes, etc. as used in the various logos, in Finnish, English and Swedish. 3 | -------------------------------------------------------------------------------- /errata.md: -------------------------------------------------------------------------------- 1 | # Errata 2 | 3 | 1. Page 86, Assumption 4.21, $i$ should be replaced with 2. 4 | 2. Page 54, Example (3.10), the sentence "In this case, |k| should be less than 0.5" is inaccurate. It should be "less than or equal to 0.5". 5 | 3. Page 44, Equation (3.11), the Ito integral there is missing a $(\nabla_X \phi)^\trans$. Similarly, in Equation (3.14), the Ito integral is missing. 6 | 7 | If you spot any error/typo/inaccuracy in the thesis, please feel free to submit an Issue or drop me an email. 8 | -------------------------------------------------------------------------------- /thesis_latex/aalto_licenses/README.txt: -------------------------------------------------------------------------------- 1 | aaltologo.sty -- A LaTeX package for creating Aalto University logos 2 | 3 | See the accompanying documentation in aaltologo.pdf for a user manual. Read the file 4 | LICENSES.txt for publication licenses of the files aaltologo.sty and aaltologo.pdf. 5 | The publication licenses are also given in the documentation.
6 | 7 | For installation, move/copy aaltologo.sty to a location where LaTeX can find it. 8 | 9 | Enjoy creating Aalto University logos. 10 | -------------------------------------------------------------------------------- /thesis_latex/modifications_of_aaltoseries.txt: -------------------------------------------------------------------------------- 1 | Zheng Zhao made the following changes to the aaltoseries.cls class. 2 | 3 | 1. Removed the "DRAFT" signs. 4 | 2. Changed \@date to a fixed date, October 4, in the preface environment. 5 | 3. Changed "List of Publications" to "List of publications". 6 | 4. Added a copyright notice for \addpublication with the "submitted" option. 7 | 8 | You can also clearly see the differences by using the `diff` command. 9 | 10 | Disclaimer: The changes above do not create a derivative of aaltoseries; therefore, the CC Attribution No-Derivative license is not violated. 11 | -------------------------------------------------------------------------------- /.github/workflows/latex_compile.yml: -------------------------------------------------------------------------------- 1 | name: Dissertation latex compile 2 | on: 3 | workflow_dispatch: 4 | inputs: 5 | name: 6 | description: 'Workflow run name' 7 | required: true 8 | default: 'Manual unittest' 9 | 10 | jobs: 11 | build_latex: 12 | runs-on: ubuntu-latest 13 | steps: 14 | - name: Set up Git repository 15 | uses: actions/checkout@v2 16 | - name: Compile LaTeX document 17 | uses: xu-cheng/latex-action@v2 18 | with: 19 | root_file: dissertation.tex 20 | latexmk_shell_escape: true 21 | pre_compile: "cd thesis_latex" 22 | post_compile: "latexmk -c" 23 | 24 | -------------------------------------------------------------------------------- /thesis_latex/README.md: -------------------------------------------------------------------------------- 1 | # README 2 | 3 | LaTeX source of the thesis titled "State-space deep Gaussian processes with applications". 4 | 5 | The main tex file is `dissertation.tex`.
It is recommended to use TeX Live version *2019* or later to compile the thesis. 6 | 7 | # Licenses 8 | 9 | The licenses for `aaltoseries.cls`, `aaltologo.sty`, and `sRGB_IEC61966-2-1_black_scaled.icc` are found in `./aalto_licenses`. Note that `aaltoseries.cls` is modified; see `modifications_of_aaltoseries.txt`. As for the licenses for `fourier2.sty` and `fouriernc2.sty`, please inquire with the aaltoseries developers. 10 | 11 | `z_marcro.tex` is under the CC BY 4.0 license. 12 | 13 | You are free to compile, download, display, and print the thesis and its LaTeX source files, as long as you give proper citation. Commercial use is prohibited. 14 | -------------------------------------------------------------------------------- /thesis_latex/aalto_licenses/LICENSES.txt: -------------------------------------------------------------------------------- 1 | The aaltoseries class has been published under the Creative Commons Attribution No-Derivative license (http://creativecommons.org/licenses/by-nd/1.0/). This means that you CAN use the class freely in your own documents BUT it also means that you CANNOT base your own Aalto University publication series class/package upon this class. However, you CAN use this class as an example in designing your own classes/packages that implement the publication series recommendations of another university or company. 2 | 3 | The documentation for the aaltoseries class has been published under the Creative Commons Attribution license (http://creativecommons.org/licenses/by/1.0/). This means that you CAN freely write your own documentation for the aaltoseries class based on this document as long as you give credit for the original documentation by citing it appropriately.
4 | 5 | -------------------------------------------------------------------------------- /thesis_latex/aalto_licenses/LICENSES-aaltologo.txt: -------------------------------------------------------------------------------- 1 | The aaltologo package has been published under Creative Commons Attribution 2 | No-Derivative license (http://creativecommons.org/licenses/by-nd/1.0/). This means 3 | that you CAN use the package freely in your own documents, packages and classes 4 | BUT it also means that you CANNOT base your own Aalto University logo package upon 5 | this package. Furthermore, if you want to write your own Aalto University logo 6 | package, you NEED to contact the Aalto University Marketing and Communications BEFORE 7 | publishing your package. However, you can use this package as an example in designing 8 | your own packages implementing the visual identity of a company/university. 9 | 10 | The documentation for the aaltologo package has been published under the Creative 11 | Commons Attribution license (http://creativecommons.org/licenses/by/1.0/). This means 12 | that you CAN freely write your own documentation for the aaltologo package based on 13 | the document as long as you give credit for the original documentation by citing it 14 | appropriately. 15 | 16 | 17 | -------------------------------------------------------------------------------- /thesis_latex/aalto_licenses/sRGB_IEC61966-2-1_black_scaled.icc-COPYRIGHT: -------------------------------------------------------------------------------- 1 | For the file sRGB_IEC61966-2-1_black_scaled.icc: 2 | 3 | Copyright International Color Consortium, 2009 4 | 5 | It is hereby acknowledged that the file "sRGB_IEC61966-2-1_black 6 | scaled.icc" is provided "AS IS" WITH NO EXPRESS OR IMPLIED WARRANTY. 7 | 8 | Licensing 9 | 10 | This profile is made available by the International Color Consortium, 11 | and may be copied, distributed, embedded, made, used, and sold without 12 | restriction. 
Altered versions of this profile shall have the original 13 | identification and copyright information removed and shall not be 14 | misrepresented as the original profile. 15 | 16 | Terms of use 17 | 18 | To anyone who acknowledges that the file "sRGB_IEC61966-2-1_black 19 | scaled.icc" is provided "AS IS" WITH NO EXPRESS OR IMPLIED WARRANTY, 20 | permission to use, copy and distribute these file for any purpose is 21 | hereby granted without fee, provided that the file is not changed 22 | including the ICC copyright notice tag, and that the name of ICC shall 23 | not be used in advertising or publicity pertaining to distribution of 24 | the software without specific, written prior permission. ICC makes no 25 | representations about the suitability of this software for any 26 | purpose. 27 | 28 | -------------------------------------------------------------------------------- /thesis_latex/fouriernc2.sty: -------------------------------------------------------------------------------- 1 | \def\fileversion{1.0}% 2 | \def\filedate{2005/12/20}% 3 | \NeedsTeXFormat{LaTeX2e}% 4 | \ProvidesPackage{fouriernc}% 5 | [\filedate\space\fileversion\space fouriernc package]% 6 | 7 | %The metrics for the 'upright' option have not been tuned. 
8 | \DeclareOption{sloped}{\PassOptionsToPackage{sloped}{fourier}} 9 | %\DeclareOption{upright}{\PassOptionsToPackage{upright}{fourier}} 10 | 11 | \ExecuteOptions{sloped} 12 | \ProcessOptions 13 | \RequirePackage{fourier2} 14 | 15 | %\ifsloped 16 | \DeclareSymbolFont{letters}{FML}{fncmi}{m}{it} 17 | \DeclareSymbolFont{otherletters}{FML}{fncm}{m}{it} 18 | \SetSymbolFont{letters}{bold}{FML}{fncmi}{b}{it} 19 | \SetSymbolFont{otherletters}{bold}{FML}{fncm}{b}{it} 20 | %\else 21 | % \DeclareSymbolFont{letters}{FML}{fncm}{m}{it} 22 | % \DeclareSymbolFont{otherletters}{FML}{fncmi}{m}{it} 23 | % \SetSymbolFont{letters}{bold}{FML}{fncm}{b}{it} 24 | % \SetSymbolFont{otherletters}{bold}{FML}{fncmi}{b}{it} 25 | %\fi 26 | 27 | \renewcommand{\rmdefault}{fnc} 28 | 29 | \DeclareFontSubstitution{FML}{fncmi}{m}{it} 30 | \DeclareFontSubstitution{FMS}{fncm}{m}{n} 31 | 32 | \DeclareSymbolFont{operators}{T1}{fnc}{m}{n} 33 | \SetSymbolFont{operators}{bold}{T1}{fnc}{b}{n} 34 | \DeclareSymbolFont{symbols}{FMS}{fncm}{m}{n} 35 | \DeclareMathAlphabet{\mathbf}{T1}{fnc}{b}{n} 36 | \DeclareMathAlphabet{\mathrm}{T1}{fnc}{m}{n} 37 | \DeclareMathAlphabet{\mathit}{T1}{fnc}{m}{it} 38 | \DeclareMathAlphabet{\mathcal}{FMS}{fncm}{m}{n} 39 | 40 | \endinput 41 | -------------------------------------------------------------------------------- /scripts/README.md: -------------------------------------------------------------------------------- 1 | This folder contains Python/Matlab scripts that generate some of the figures in the dissertation. Specifically, the scripts in this folder are as follows. 2 | 3 | 1. `disc_err_dgp_m12.py`: Compute the discretisation errors for a Matern DGP. Related to **Figure 4.3**. 4 | 5 | 2. `draw_ssdgp_m12.py`: Draw samples from a Matern 1/2 SS-DGP. Related to **Figure 4.4**. 6 | 7 | 3. `draw_ssdgp_m32.py`: Draw samples from a Matern 3/2 SS-DGP. Related to **Figure 4.5**. 8 | 9 | 4. `showcase_gp_rectangular.py`: Perform GP regression for a rectangular signal. 
Related to **Figure 1.1**. 10 | 11 | 5. `showcase_gp_sinusoidal.py`: Perform GP regression for a sinusoidal signal. Related to **Figure 1.1**. 12 | 13 | 6. `spectro_temporal.py`: Spectro-temporal state-space method for estimating the spectrogram. Related to **Figure 5.2**. 14 | 15 | 7. `TME_estimation_benes.py`: Use TME to estimate a few expectations of a Benes SDE. Related to **Figure 3.1**. 16 | 17 | 8. `TME_positive_definite_softplus.py`: Analyse the positive definiteness of the TME covariance estimator for an SDE. Related to **Figure 3.2**. 18 | 19 | 9. `vanishing_prior_cov.py`: Estimate a cross-covariance of an SS-DGP. Related to **Figure 4.8**. 20 | 21 | # Requirements 22 | 23 | In order to run the scripts, you need to install a few packages as follows. 24 | 25 | `pip install numpy scipy scikit-learn sympy tme matplotlib`. 26 | 27 | Additionally, if you want to run the scripts `showcase_gp_rectangular.py` and `showcase_gp_sinusoidal.py`, you also need to install `gpflow`, that is, `pip install gpflow`. 28 | 29 | # License 30 | 31 | You are free to do anything you want with the scripts in this folder, except that `spectro_temporal.py` is under the MIT license. I do not give any warranty of any kind.
32 | -------------------------------------------------------------------------------- /thesis_latex/figs/gp-sign-fit-example.tex: -------------------------------------------------------------------------------- 1 | \tikzset{every picture/.style={line width=0.75pt}} %set default line width to 0.75pt 2 | 3 | \begin{tikzpicture}[x=0.75pt,y=0.75pt,yscale=-1,xscale=1] 4 | %uncomment if require: \path (0,300); %set diagram left start at 0, and has height of 300 5 | 6 | %Straight Lines [id:da2875292600189103] 7 | \draw [line width=1.5] (100,160) -- (100,34) ; 8 | \draw [shift={(100,30)}, rotate = 450] [fill={rgb, 255:red, 0; green, 0; blue, 0 } ][line width=0.08] [draw opacity=0] (13.4,-6.43) -- (0,0) -- (13.4,6.44) -- (8.9,0) -- cycle ; 9 | %Straight Lines [id:da3046472264214819] 10 | \draw [line width=1.5] (80,140) -- (226,140) ; 11 | \draw [shift={(230,140)}, rotate = 180] [fill={rgb, 255:red, 0; green, 0; blue, 0 } ][line width=0.08] [draw opacity=0] (13.4,-6.43) -- (0,0) -- (13.4,6.44) -- (8.9,0) -- cycle ; 12 | %Straight Lines [id:da9835771066030095] 13 | \draw [line width=0.75] (100,140) -- (130,70) ; 14 | %Straight Lines [id:da7083475840908604] 15 | \draw [line width=0.75] (130,70) -- (220,70) ; 16 | %Straight Lines [id:da20127916146270386] 17 | \draw [dash pattern={on 0.84pt off 2.51pt}] (130,70) -- (130,140) ; 18 | %Straight Lines [id:da677805480574649] 19 | \draw [dash pattern={on 0.84pt off 2.51pt}] (130,70) -- (100,70) ; 20 | %Straight Lines [id:da35275281353074295] 21 | \draw [dash pattern={on 0.84pt off 2.51pt}] (170,70) -- (170,140) ; 22 | 23 | % Text Node 24 | \draw (81,62.4) node [anchor=north west][inner sep=0.75pt] {$1$}; 25 | % Text Node 26 | \draw (82,142.4) node [anchor=north west][inner sep=0.75pt] {$t_0$}; 27 | % Text Node 28 | \draw (124,142.4) node [anchor=north west][inner sep=0.75pt] {$t_{1}$}; 29 | % Text Node 30 | \draw (164,142.4) node [anchor=north west][inner sep=0.75pt] {$t_{2}$}; 31 | % Text Node 32 | \draw (67,42.4) node [anchor=north 
west][inner sep=0.75pt] {$u(t)$}; 33 | % Text Node 34 | \draw (204,142.4) node [anchor=north west][inner sep=0.75pt] {$t$}; 35 | 36 | 37 | \end{tikzpicture} -------------------------------------------------------------------------------- /lectio_praecursoria/figs/path-graph.tex: -------------------------------------------------------------------------------- 1 | \tikzset{every picture/.style={line width=0.75pt}} %set default line width to 0.75pt 2 | 3 | \begin{tikzpicture}[x=0.7pt,y=0.7pt,yscale=-1,xscale=1] 4 | %uncomment if require: \path (0,300); %set diagram left start at 0, and has height of 300 5 | 6 | %Shape: Circle [id:dp821654118938725] 7 | \draw [line width=1.5] (40,90) .. controls (40,78.95) and (48.95,70) .. (60,70) .. controls (71.05,70) and (80,78.95) .. (80,90) .. controls (80,101.05) and (71.05,110) .. (60,110) .. controls (48.95,110) and (40,101.05) .. (40,90) -- cycle ; 8 | %Shape: Circle [id:dp9693294242254202] 9 | \draw [line width=1.5] (110,90) .. controls (110,78.95) and (118.95,70) .. (130,70) .. controls (141.05,70) and (150,78.95) .. (150,90) .. controls (150,101.05) and (141.05,110) .. (130,110) .. controls (118.95,110) and (110,101.05) .. (110,90) -- cycle ; 10 | %Straight Lines [id:da7555295343349109] 11 | \draw [line width=1.5] (110,90) -- (84,90) ; 12 | \draw [shift={(80,90)}, rotate = 360] [fill={rgb, 255:red, 0; green, 0; blue, 0 } ][line width=0.08] [draw opacity=0] (13.4,-6.43) -- (0,0) -- (13.4,6.44) -- (8.9,0) -- cycle ; 13 | %Straight Lines [id:da1673080699648295] 14 | \draw [line width=1.5] (220,90) -- (194,90) ; 15 | \draw [shift={(190,90)}, rotate = 360] [fill={rgb, 255:red, 0; green, 0; blue, 0 } ][line width=0.08] [draw opacity=0] (13.4,-6.43) -- (0,0) -- (13.4,6.44) -- (8.9,0) -- cycle ; 16 | %Shape: Circle [id:dp9339346589837028] 17 | \draw [line width=1.5] (220,90) .. controls (220,78.95) and (228.95,70) .. (240,70) .. controls (251.05,70) and (260,78.95) .. (260,90) .. controls (260,101.05) and (251.05,110) .. 
(240,110) .. controls (228.95,110) and (220,101.05) .. (220,90) -- cycle ; 18 | 19 | % Text Node 20 | \draw (60,90) node [font=\large] {$U_{0}^{1}$}; 21 | % Text Node 22 | \draw (130,90) node [font=\large] {$U_{1}^{2}$}; 23 | % Text Node 24 | \draw (161,81) node [anchor=north west][inner sep=0.75pt] [align=left] {...}; 25 | % Text Node 26 | \draw (240,90) node [font=\large] {$U_{L-1}^{L}$}; 27 | 28 | 29 | \end{tikzpicture} 30 | 31 | -------------------------------------------------------------------------------- /cover/gen_q_wiener.py: -------------------------------------------------------------------------------- 1 | # Simulate an H^1_0([0, S])-valued Wiener process. This generates the cover image of the thesis. 2 | # Zheng Zhao 2019 2021 3 | # 4 | # Reference: Gabriel J. Lord et al., 2014 spde book. 5 | # 6 | # note: Aalto platform does not support RGBA colour hence, cannot use alpha. 7 | # 8 | 9 | import numpy as np 10 | import matplotlib.pyplot as plt 11 | from matplotlib import cm 12 | 13 | np.random.seed(1901) 14 | 15 | # Paras 16 | r = 2 17 | J = 2 ** 7 18 | K = J - 1 19 | 20 | S = 2 21 | xs = np.linspace(0, S, K) 22 | 23 | dt = 5e-3 24 | ts = np.arange(dt, 2 + dt, dt) 25 | 26 | ps = np.arange(1, K + 1).reshape(1, -1) * 1.0 27 | lam_j = ps ** (-(2 * r + 1)) 28 | sheet_jk = ps.T * ps 29 | 30 | # Simulate Wiener processes 31 | normal_incs = np.random.randn(ts.size, K) 32 | dW = np.dot(np.sqrt(2 * lam_j * dt / S) * normal_incs / np.sqrt(dt), np.sin(np.pi * sheet_jk / J)) 33 | WW = np.cumsum(dW, 0) 34 | 35 | # Plot 36 | fig, ax = plt.subplots(subplot_kw={"projection": "3d"}) 37 | 38 | colours = cm.magma(np.linspace(0, 0.9, ts.shape[0])) 39 | for t, Wt, colour in zip(ts, WW, colours): 40 | _, = ax.plot3D(xs, [t] * K, Wt, linewidth=0.1, color=colour, alpha=1.) 
41 | 42 | ax.grid(False) 43 | 44 | ax.set_axis_off() 45 | 46 | ax.xaxis.set_ticklabels([]) 47 | ax.yaxis.set_ticklabels([]) 48 | ax.zaxis.set_ticklabels([]) 49 | 50 | ax.xaxis.set_pane_color((1.0, 1.0, 1.0, 0.0)) 51 | ax.yaxis.set_pane_color((1.0, 1.0, 1.0, 0.0)) 52 | ax.zaxis.set_pane_color((1.0, 1.0, 1.0, 0.0)) 53 | 54 | # Transparent spines 55 | ax.w_xaxis.line.set_color((1.0, 1.0, 1.0, 0.0)) 56 | ax.w_yaxis.line.set_color((1.0, 1.0, 1.0, 0.0)) 57 | ax.w_zaxis.line.set_color((1.0, 1.0, 1.0, 0.0)) 58 | 59 | ax.xaxis.pane.fill = False 60 | ax.yaxis.pane.fill = False 61 | 62 | ax.view_init(23, 25) 63 | 64 | bbox = fig.bbox_inches.from_bounds(1.91, 1.48, 2.775, 1.843) 65 | 66 | # Save in pdf and png 67 | png_meta_data = {'Title': 'A Q-Wiener process realisation', 68 | 'Author': 'Zheng Zhao', 69 | 'Copyright': 'Zheng Zhao', 70 | 'Description': 'https://github.com/zgbkdlm/dissertation'} 71 | pdf_meta_data = {'Title': 'A Q-Wiener process realisation', 72 | 'Author': 'Zheng Zhao'} 73 | plt.savefig('cover.png', dpi=1200, transparent=True, metadata=png_meta_data, bbox_inches=bbox) 74 | plt.savefig('cover.pdf', bbox_inches=bbox, metadata=pdf_meta_data) 75 | 76 | -------------------------------------------------------------------------------- /lectio_praecursoria/scripts/draw_gp_samples.py: -------------------------------------------------------------------------------- 1 | """ 2 | Draw GP samples 3 | 4 | """ 5 | import jax 6 | import math 7 | import jax.numpy as jnp 8 | import jax.scipy 9 | import jax.scipy.optimize 10 | import matplotlib.pyplot as plt 11 | from jax.config import config 12 | 13 | config.update("jax_enable_x64", True) 14 | 15 | plt.rcParams.update({ 16 | 'text.usetex': True, 17 | 'text.latex.preamble': r'\usepackage{fouriernc}', 18 | 'font.family': "serif", 19 | 'font.serif': 'New Century Schoolbook', 20 | 'font.size': 18}) 21 | 22 | # Random seed 23 | key = jax.random.PRNGKey(6666) 24 | 25 | jndarray = jnp.ndarray 26 | 27 | 28 | def m12_cov(t1: float, t2: 
float, s: float, ell: float) -> float: 29 | """Matern 1/2""" 30 | return s ** 2 * jnp.exp(-jnp.abs(t1 - t2) / ell) 31 | 32 | 33 | def m32_cov(t1: float, t2: float, s: float, ell: float) -> float: 34 | """Matern 3/2""" 35 | z = math.sqrt(3) * jnp.abs(t1 - t2) / ell 36 | return s ** 2 * (1 + z) * jnp.exp(-z) 37 | 38 | 39 | vectorised_m12_cov = jax.vmap(jax.vmap(m12_cov, in_axes=[0, None, None, None]), in_axes=[None, 0, None, None]) 40 | vectorised_m32_cov = jax.vmap(jax.vmap(m32_cov, in_axes=[0, None, None, None]), in_axes=[None, 0, None, None]) 41 | 42 | 43 | # Times 44 | ts = jnp.linspace(0, 1, 1000) 45 | num_mcs = 10 46 | 47 | # Paras 48 | s = 1. 49 | ell = 1. 50 | 51 | # Compute mean and covariances 52 | mean = jnp.zeros_like(ts) 53 | 54 | for cov_func, cov_name in zip([vectorised_m12_cov, vectorised_m32_cov], 55 | ['m12', 'm32']): 56 | 57 | cov = cov_func(ts, ts, s, ell) 58 | fig, ax = plt.subplots(figsize=(8.8, 6.6)) 59 | plt.xlim(0, 1) 60 | plt.ylabel('$U(t)$', fontsize=20) 61 | plt.xlabel('$t$', fontsize=20) 62 | 63 | for i in range(num_mcs): 64 | 65 | # Random key 66 | key, subkey = jax.random.split(key) 67 | 68 | # Draw! 
69 | gp_sample = jax.random.multivariate_normal(key=subkey, mean=mean, cov=cov) 70 | 71 | # Plot 72 | ax.plot(ts, gp_sample, linewidth=1) 73 | 74 | if cov_func is vectorised_m12_cov: 75 | plt.subplots_adjust(top=0.995, bottom=0.09, left=0.08, right=0.981, hspace=0.2, wspace=0.2) 76 | else: 77 | plt.subplots_adjust(top=0.995, bottom=0.09, left=0.105, right=0.981, hspace=0.2, wspace=0.2) 78 | 79 | plt.savefig(f'../figs/gp-sample-{cov_name}.pdf') 80 | plt.cla() 81 | 82 | -------------------------------------------------------------------------------- /thesis_latex/figs/ssdgp-identifiability-graph.tex: -------------------------------------------------------------------------------- 1 | \tikzset{every picture/.style={line width=0.75pt}} %set default line width to 0.75pt 2 | 3 | \begin{tikzpicture}[x=0.6pt,y=0.6pt,yscale=-1,xscale=1] 4 | %uncomment if require: \path (0,300); %set diagram left start at 0, and has height of 300 5 | 6 | %Shape: Circle [id:dp821654118938725] 7 | \draw [line width=1.5] (130,70) .. controls (130,58.95) and (138.95,50) .. (150,50) .. controls (161.05,50) and (170,58.95) .. (170,70) .. controls (170,81.05) and (161.05,90) .. (150,90) .. controls (138.95,90) and (130,81.05) .. (130,70) -- cycle ; 8 | %Shape: Circle [id:dp9693294242254202] 9 | \draw [line width=1.5] (200,40) .. controls (200,28.95) and (208.95,20) .. (220,20) .. controls (231.05,20) and (240,28.95) .. (240,40) .. controls (240,51.05) and (231.05,60) .. (220,60) .. controls (208.95,60) and (200,51.05) .. 
(200,40) -- cycle ; 10 | %Straight Lines [id:da2336152744929989] 11 | \draw [line width=1.5] (200,100) -- (173.38,83.14) ; 12 | \draw [shift={(170,81)}, rotate = 392.35] [fill={rgb, 255:red, 0; green, 0; blue, 0 } ][line width=0.08] [draw opacity=0] (13.4,-6.43) -- (0,0) -- (13.4,6.44) -- (8.9,0) -- cycle ; 13 | %Straight Lines [id:da7555295343349109] 14 | \draw [line width=1.5] (200,40) -- (173.33,57.78) ; 15 | \draw [shift={(170,60)}, rotate = 326.31] [fill={rgb, 255:red, 0; green, 0; blue, 0 } ][line width=0.08] [draw opacity=0] (13.4,-6.43) -- (0,0) -- (13.4,6.44) -- (8.9,0) -- cycle ; 16 | %Shape: Rectangle [id:dp057979059306741076] 17 | \draw [dash pattern={on 5.63pt off 4.5pt}][line width=1.5] (120,10) -- (320,10) -- (320,130) -- (120,130) -- cycle ; 18 | %Shape: Rectangle [id:dp30596298266066313] 19 | \draw [line width=1.5] (270,20) -- (310,20) -- (310,60) -- (270,60) -- cycle ; 20 | %Shape: Rectangle [id:dp9621262435565225] 21 | \draw [line width=1.5] (200,80) -- (240,80) -- (240,120) -- (200,120) -- cycle ; 22 | %Straight Lines [id:da8176933026073245] 23 | \draw [line width=1.5] (270,40) -- (244,40) ; 24 | \draw [shift={(240,40)}, rotate = 360] [fill={rgb, 255:red, 0; green, 0; blue, 0 } ][line width=0.08] [draw opacity=0] (13.4,-6.43) -- (0,0) -- (13.4,6.44) -- (8.9,0) -- cycle ; 25 | 26 | % Text Node 27 | \draw (150,70) node [font=\large] {$U_{0}^{1}$}; 28 | % Text Node 29 | \draw (220,40) node [font=\large] {$U_{1}^{2}$}; 30 | % Text Node 31 | \draw (290,37) node [font=\large] {$\varphi $}; 32 | % Text Node 33 | \draw (220,96) node [font=\large] {$\psi $}; 34 | % Text Node 35 | \draw (301,112) node [font=\large] {$\mathcal{V}$}; 36 | 37 | 38 | \end{tikzpicture} 39 | -------------------------------------------------------------------------------- /scripts/vanishing_prior_cov.py: -------------------------------------------------------------------------------- 1 | # Numerically plot the vanishing covariance problem for Matern SS-DGP. 
This gives Figure 4.8 in the thesis. 2 | # 3 | # Zheng Zhao 2021 4 | # 5 | import os 6 | import math 7 | import numpy as np 8 | import matplotlib.pyplot as plt 9 | 10 | l2 = 0.1 11 | s2 = 0.1 12 | l3 = 0.1 13 | s3 = 0.1 14 | 15 | 16 | def g(x): 17 | return np.exp(x) 18 | 19 | 20 | def a_b(x): 21 | """Return a(x) and b(x) 22 | """ 23 | return -1 / np.array([g(x[1]), l2, l3]) * x, \ 24 | math.sqrt(2) * np.diag([g(x[2]) / np.sqrt(g(x[1])), s2 / np.sqrt(l2), s3 / np.sqrt(l3)]) 25 | 26 | 27 | def euler_maruyama(x0, dt, num_steps): 28 | xx = np.zeros(shape=(num_steps, x0.shape[0])) 29 | x = x0 30 | for i in range(num_steps): 31 | ax, bx = a_b(x) 32 | x = x + ax * dt + np.sqrt(dt) * bx @ np.random.randn(3) 33 | xx[i] = x 34 | return xx 35 | 36 | 37 | if __name__ == '__main__': 38 | 39 | path_figs = '../thesis/figs' 40 | plt.rcParams.update({ 41 | 'text.usetex': True, 42 | 'text.latex.preamble': r'\usepackage{fouriernc}', 43 | 'font.family': "serif", 44 | 'font.serif': 'New Century Schoolbook', 45 | 'font.size': 20}) 46 | 47 | np.random.seed(2020) 48 | 49 | end_T = 1 50 | num_steps = 1000 51 | dt = end_T / num_steps 52 | tt = np.linspace(dt, end_T, num_steps) 53 | 54 | num_mc = 20000 55 | 56 | m0 = np.array([1., 1., 1.]) 57 | P0 = np.array([[1., 0., 0.5], 58 | [0., 1., 0.], 59 | [0.5, 0., 1.]]) 60 | P0_chol = np.linalg.cholesky(P0) 61 | 62 | # Compute Euler Maruyama 63 | xx = np.zeros(shape=(num_mc, num_steps, 3)) 64 | for mc in range(num_mc): 65 | x0 = m0 + P0_chol @ np.random.randn(3) 66 | xx[mc] = euler_maruyama(x0, dt, num_steps=num_steps) 67 | 68 | Ex = np.mean(xx, axis=0)[:, 0] 69 | Ey = np.mean(xx, axis=0)[:, 2] 70 | Exy = np.mean(xx[:, :, 0] * xx[:, :, 2], axis=0) 71 | 72 | cov = Exy - Ex * Ey 73 | 74 | # Plot 75 | plt.figure(figsize=(7, 4)) 76 | plt.plot(tt, cov, c='black', linewidth=3, label=r'$\mathrm{Cov}\,\big[U^1_0(t), U^3_1(t)\big]$') 77 | 78 | plt.grid(linestyle='--', alpha=0.3, which='both') 79 | 80 | plt.xlabel('$t$') 81 | 
plt.ylabel(r'$\mathrm{Cov}\,\big[U^1_0(t), U^3_1(t)\big]$') 82 | plt.xlim(0, end_T) 83 | plt.yticks([0.0, 0.1, 0.2, 0.3, 0.4, 0.5]) 84 | 85 | plt.legend(loc='upper right', fontsize=20) 86 | 87 | plt.tight_layout(pad=0.1) 88 | 89 | plt.savefig(os.path.join(path_figs, 'vanishing-cov.pdf')) 90 | # plt.show() 91 | -------------------------------------------------------------------------------- /scripts/TME_positive_definite_softplus.py: -------------------------------------------------------------------------------- 1 | # This will generate Figure 3.1 in the thesis. 2 | # 3 | # Zheng Zhao 2020 4 | # 5 | import os 6 | import sympy 7 | import numpy as np 8 | import matplotlib.pyplot as plt 9 | import tme.base_sympy as tme 10 | 11 | from sympy import lambdify 12 | from matplotlib.ticker import MultipleLocator 13 | 14 | if __name__ == '__main__': 15 | 16 | # Initial value and paras 17 | np.random.seed(666) 18 | x0 = np.array([[0.], 19 | [0.]]) 20 | 21 | # Example SDE 22 | kappa = sympy.Symbol('k') 23 | x = sympy.MatrixSymbol('x', 2, 1) 24 | f = sympy.Matrix([[sympy.log(1 + sympy.exp(x[0])) + kappa * x[1]], 25 | [sympy.log(1 + sympy.exp(x[1])) + kappa * x[0]]]) 26 | L = sympy.eye(2) 27 | dt_sym = sympy.Symbol('dt') 28 | 29 | # TME 30 | tme_mean, tme_cov = tme.mean_and_cov(x, f, L, dt_sym, 31 | order=2, simp=True) 32 | 33 | # Cov 34 | cov_func = lambdify([x, kappa, dt_sym], tme_cov, 'numpy') 35 | 36 | # Compute 37 | # xx = np.linspace(0, 1, 200) 38 | kk = np.linspace(-4, 4, 500) 39 | dts = np.linspace(0.01, 6, 500) 40 | 41 | mineigs = np.zeros((kk.size, dts.size)) 42 | 43 | for i in range(kk.size): 44 | for j in range(dts.size): 45 | eig, _ = np.linalg.eigh(cov_func(x0, kk[i], dts[j])) 46 | mineigs[i, j] = eig.min() 47 | 48 | # Plot 49 | path_figs = '../thesis/figs' 50 | plt.rcParams.update({ 51 | 'text.usetex': True, 52 | 'text.latex.preamble': r'\usepackage{fouriernc}', 53 | 'font.family': "serif", 54 | 'font.serif': 'New Century Schoolbook', 55 | 'font.size': 18}) 56 | 57 
| fig = plt.figure() 58 | 59 | grid_k, grid_dt = np.meshgrid(dts, kk) 60 | 61 | mineigs_crop = mineigs 62 | mineigs_crop[mineigs_crop < 0] = -1. 63 | 64 | ax = plt.axes() 65 | ax.xaxis.set_major_locator(MultipleLocator(1)) 66 | ax.yaxis.set_major_locator(MultipleLocator(1)) 67 | 68 | cnt = plt.contourf(grid_dt, grid_k, mineigs_crop, cmap=plt.cm.Blues_r) 69 | for c in cnt.collections: 70 | c.set_edgecolor("face") 71 | 72 | cbar = plt.colorbar() 73 | cbar_ticks = [tick.get_text() for tick in cbar.ax.get_yticklabels()] 74 | cbar_ticks[0] = '<0' 75 | 76 | cbar.ax.set_yticklabels(cbar_ticks) 77 | 78 | plt.axvline(-0.5, c='red', linestyle='--', alpha=0.5) 79 | plt.axvline(0.5, c='red', linestyle='--', alpha=0.5) 80 | 81 | plt.xlabel(r'$\kappa$') 82 | plt.ylabel(r'$\Delta t$') 83 | plt.title(r'$\lambda_{\mathrm{min}}(\Sigma_2(\Delta t))$') 84 | 85 | plt.tight_layout(pad=0.1) 86 | 87 | # plt.show() 88 | 89 | plt.savefig(os.path.join(path_figs, 'tme-softplus-mineigs.pdf')) 90 | -------------------------------------------------------------------------------- /lectio_praecursoria/zz.cls: -------------------------------------------------------------------------------- 1 | \NeedsTeXFormat{LaTeX2e} 2 | \ProvidesClass{zz}[2020/10/11 Zheng Zhao's minimalism beamer class] 3 | 4 | % Dependencies 5 | \RequirePackage{lastpage} 6 | \RequirePackage{calc} 7 | \RequirePackage[dvipsnames]{xcolor} 8 | 9 | % Commands and definitions 10 | \newlength\titlepagesep \setlength\titlepagesep{0.2cm} 11 | \newlength\titlepageauthorsep \setlength\titlepageauthorsep{0.6cm} 12 | \newlength\footery \setlength\footery{8cm} 13 | 14 | \definecolor{footergray}{gray}{0.5} 15 | \newcommand{\setcmapBeijing}{% 16 | \definecolor{frametitlecolor}{gray}{0.2} 17 | \definecolor{mastercolour}{named}{RoyalPurple} 18 | \definecolor{secondcolour}{RGB}{255, 121, 19} 19 | } 20 | \newcommand{\setcmapHelsinki}{% 21 | \definecolor{frametitlecolor}{gray}{0.2} 22 | \definecolor{mastercolour}{RGB}{66, 140, 212} 23 | 
\definecolor{secondcolour}{RGB}{255, 156, 218} 24 | } 25 | \newcommand{\setcmapReykjavik}{% 26 | \definecolor{frametitlecolor}{gray}{0.2} 27 | \definecolor{mastercolour}{gray}{0.4} 28 | \definecolor{secondcolour}{gray}{0.4} 29 | } 30 | \setcmapHelsinki 31 | 32 | \newif\ifseriffont\seriffontfalse 33 | \newif\iffullfooter\fullfooterfalse 34 | 35 | % Parse options and load beamer 36 | \DeclareOption{garamond}{% 37 | } 38 | \DeclareOption{seriffont}{\seriffonttrue} 39 | \DeclareOption{fullfooter}{\fullfootertrue} 40 | \DeclareOption{cmap=Beijing}{\setcmapBeijing} 41 | \DeclareOption{cmap=Helsinki}{\setcmapHelsinki} 42 | \DeclareOption*{\PassOptionsToClass{\CurrentOption}{beamer}} 43 | \ProcessOptions\relax 44 | \LoadClass{beamer} 45 | 46 | % Commands and definitions that depend on options 47 | \renewcommand{\titlepage}{% 48 | {% 49 | \setbeamertemplate{footline}{} 50 | \frame[t, noframenumbering]{% 51 | \vspace{2cm} 52 | \centering{ 53 | {\Large \scshape \textbf{\inserttitle}}\\[0.4cm] 54 | \insertsubtitle\\[1.8cm] 55 | \insertauthor\\[\titlepageauthorsep] 56 | {\scriptsize \insertinstitute}\\[\titlepagesep] 57 | {\scriptsize \insertdate} 58 | } 59 | } 60 | } 61 | } 62 | 63 | % Beamer customisations 64 | \iffullfooter 65 | \newcommand{\footertext}{\beamer@shorttitle} 66 | \else 67 | \newcommand{\footertext}{~} 68 | \fi 69 | \setbeamertemplate{footline}{% 70 | \noindent 71 | \begin{minipage}{.45\paperwidth} 72 | \vspace{-0.5cm} 73 | \hspace{\beamer@leftmargin} 74 | \footertext 75 | \end{minipage} 76 | \hfill 77 | \begin{minipage}{.45\paperwidth} 78 | \vspace{-0.5cm} 79 | \hspace{.35\paperwidth minus \beamer@rightmargin} 80 | {% 81 | \color{footergray} 82 | \tiny 83 | \arabic{page}/\pageref{LastPage} 84 | } 85 | \end{minipage} 86 | } 87 | 88 | \setbeamertemplate{navigation symbols}{} 89 | \setbeamertemplate{frametitle}{% 90 | {% 91 | \color{frametitlecolor} 92 | \vspace{0.2cm}\insertframetitle\\[-0.15cm] 93 | \rule{\widthof{\insertframetitle}}{1.5pt} 94 | } 95 | } 96 | 
97 | % Fonts 98 | \ifseriffont 99 | % \usefonttheme{structuresmallcapsserif} 100 | \usefonttheme{serif} 101 | \fi 102 | \setbeamerfont{section title}{size=\normalsize,series=\bfseries} 103 | \setbeamerfont{frametitle}{series=\bfseries, shape=\scshape, family=\rmfamily} 104 | %\setbeamerfont{framesubtitle}{series=\rmfamily} 105 | %\setbeamerfont{caption}{series=\rmfamily} 106 | %\AtBeginDocument{\rmfamily} 107 | 108 | % Colours 109 | \setbeamercolor{structure}{fg=mastercolour} 110 | \setbeamercolor{alerted text}{fg=secondcolour} 111 | \setbeamercolor{example text}{fg=mastercolour} 112 | -------------------------------------------------------------------------------- /scripts/showcase_gp_sinusoidal.py: -------------------------------------------------------------------------------- 1 | # Show the GP regression on a composite sinusoidal signal, and generate Figure 1.1. 2 | # 3 | # Zheng Zhao 2021 4 | # 5 | import os 6 | import math 7 | import numpy as np 8 | import gpflow 9 | import matplotlib.pyplot as plt 10 | from typing import Tuple 11 | from matplotlib.ticker import MultipleLocator 12 | 13 | path_figs = '../thesis/figs' 14 | np.random.seed(666) 15 | plt.rcParams.update({ 16 | 'text.usetex': True, 17 | 'text.latex.preamble': r'\usepackage{fouriernc}', 18 | 'font.family': "serif", 19 | 'font.serif': 'New Century Schoolbook', 20 | 'font.size': 20}) 21 | 22 | 23 | def sinu(t: np.ndarray, 24 | r: float) -> Tuple[np.ndarray, np.ndarray]: 25 | """Composite sinusoidal signal. Return the signal and a noisy measurement of it. 26 | """ 27 | ft = np.sin(7 * np.pi * np.cos(2 * np.pi * (t ** 2))) ** 2 / \ 28 | (np.cos(5 * np.pi * t) + 2) 29 | return ft, ft + math.sqrt(r) * np.random.randn(*t.shape) 30 | 31 | 32 | # Simulate measurements 33 | t = np.linspace(0, 1, 400).reshape(-1, 1) 34 | r = 0.004 35 | ft, y = sinu(t, r) 36 | 37 | # GPflow 38 | ell = 1. 39 | sigma = 1.
40 | 41 | m12 = gpflow.kernels.Matern12(lengthscales=ell, variance=sigma) 42 | m32 = gpflow.kernels.Matern32(lengthscales=ell, variance=sigma) 43 | m52 = gpflow.kernels.Matern52(lengthscales=ell, variance=sigma) 44 | 45 | # Plots 46 | for name, label, cov in zip(['m12', 'm32', 'm52'], 47 | [r'Mat\'ern $1\,/\,2$', r'Mat\'ern $3\,/\,2$', r'Mat\'ern $5\,/\,2$'], 48 | [m12, m32, m52]): 49 | print(f'GP regression with {name} cov function') 50 | model = gpflow.models.GPR(data=(t, y), kernel=cov, mean_function=None) 51 | model.likelihood.variance.assign(r) 52 | 53 | opt = gpflow.optimizers.Scipy() 54 | opt_logs = opt.minimize(model.training_loss, model.trainable_variables, 55 | method='L-BFGS-B', 56 | options={'disp': True}) 57 | 58 | m, P = model.predict_f(t) 59 | 60 | # Plot and save 61 | fig = plt.figure(figsize=(16, 8)) 62 | ax = plt.axes() 63 | plt.plot(t, ft, c='black', alpha=0.8, linestyle='--', linewidth=2, label='True signal') 64 | plt.scatter(t, y, s=15, c='black', edgecolors='none', alpha=0.3, label='Measurements') 65 | plt.plot(t, m, c='black', linewidth=3, label=label) 66 | plt.fill_between( 67 | t[:, 0], 68 | m[:, 0] - 1.96 * np.sqrt(P[:, 0]), 69 | m[:, 0] + 1.96 * np.sqrt(P[:, 0]), 70 | color='black', 71 | edgecolor='none', 72 | alpha=0.2, 73 | ) 74 | 75 | plt.grid(linestyle='--', alpha=0.3, which='both') 76 | 77 | plt.xlim(0, 1) 78 | plt.ylim(-0.2, 1.2) 79 | 80 | ax.xaxis.set_major_locator(MultipleLocator(0.2)) 81 | ax.xaxis.set_minor_locator(MultipleLocator(0.1)) 82 | ax.xaxis.set_major_formatter('{x:.1f}') 83 | 84 | ax.yaxis.set_major_locator(MultipleLocator(0.4)) 85 | ax.yaxis.set_minor_locator(MultipleLocator(0.1)) 86 | ax.yaxis.set_major_formatter('{x:.1f}') 87 | 88 | plt.xlabel('$t$', fontsize=24) 89 | plt.title('$\\ell \\approx {:.2f}, \\quad \\sigma \\approx {:.2f}$'.format(cov.lengthscales.numpy(), 90 | cov.variance.numpy())) 91 | plt.legend(loc='upper left', fontsize='large') 92 | 93 | plt.tight_layout(pad=0.1) 94 | 95 | filename = 
'gp-fail-example-sinu-' + name + '.pdf' 96 | plt.savefig(os.path.join(path_figs, filename)) 97 | -------------------------------------------------------------------------------- /thesis_latex/list_of_papers.tex: -------------------------------------------------------------------------------- 1 | %!TEX root = dissertation.tex 2 | \addpublication{Zheng Zhao, Toni Karvonen, Roland Hostettler, and Simo S\"{a}rkk\"{a}}{Taylor moment expansion for continuous-discrete Gaussian filtering}{IEEE Transactions on Automatic Control}{Volume 66, Issue 9, Pages 4460--4467}{December}{2020}{Zheng Zhao, Toni Karvonen, Roland Hostettler, and Simo S\"{a}rkk\"{a}}{paperTME} 3 | \addcontribution{Zheng Zhao wrote the article and produced the results. The stability analysis is mainly due to Toni Karvonen. Roland Hostettler gave useful comments. Simo S\"{a}rkk\"{a} contributed the idea.} 4 | \adderrata{In Example 7, the coefficient $\Phi_{x,2}$ should multiply with a factor $2$.} 5 | 6 | \addpublication{Zheng Zhao, Muhammad Emzir, and Simo S\"{a}rkk\"{a}}{Deep state-space Gaussian processes}{Statistics and Computing}{Volume 31, Issue 6, Article number 75, Pages 1--26}{September}{2021}{Zheng Zhao, Muhammad Emzir, and Simo S\"{a}rkk\"{a}}{paperSSDGP} 7 | \addcontribution{Zheng Zhao wrote the article and produced the results. Muhammad Emzir and Simo S\"{a}rkk\"{a} gave useful comments.} 8 | 9 | \addpublication{Zheng Zhao, Simo S\"{a}rkk\"{a}, and Ali Bahrami Rad}{Kalman-based spectro-temporal ECG analysis using deep convolutional networks for atrial fibrillation detection}{Journal of Signal Processing Systems}{Volume 92, Issue 7, Pages 621--636}{April}{2020}{Zheng Zhao, Simo S\"{a}rkk\"{a}, and Ali Bahrami Rad}{paperKFSECG} 10 | \addcontribution{Zheng Zhao wrote the article and produced the results. Ali Bahrami Rad helped with the experiments. 
Simo S\"{a}rkk\"{a} came up with the spectro-temporal idea.} 11 | 12 | \addpublication[conference]{Zheng Zhao, Filip Tronarp, Roland Hostettler, and Simo S\"{a}rkk\"{a}}{State-space Gaussian process for drift estimation in stochastic differential equations}{Proceedings of the 45th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)}{Barcelona, Spain, Pages 5295--5299}{May}{2020}{IEEE}{paperDRIFT} 13 | \addcontribution{Zheng Zhao wrote the article and produced the results. Filip Tronarp provided code for the iterated posterior linearisation filter. Roland Hostettler gave useful comments. The idea was due to Simo S\"{a}rkk\"{a}.} 14 | 15 | \addpublication[conference]{Zheng Zhao, Simo S\"{a}rkk\"{a}, and Ali Bahrami Rad}{Spectro-temporal ECG analysis for atrial fibrillation detection}{Proceedings of the IEEE 28th International Workshop on Machine Learning for Signal Processing (MLSP)}{Aalborg, Denmark, 6 pages}{September}{2018}{IEEE}{paperKFSECGCONF} 16 | \addcontribution{Zheng Zhao wrote the article and produced the results. Ali Bahrami Rad helped with the experiments. Simo S\"{a}rkk\"{a} came up with the spectro-temporal idea.} 17 | 18 | \addpublication[accepted]{Sarang Thombre, Zheng Zhao, Henrik Ramm-Schmidt, Jos\'{e} M. Vallet García, Tuomo Malkam\"{a}ki, Sergey Nikolskiy, Toni Hammarberg, Hiski Nuortie, M. Zahidul H. Bhuiyan, Simo S\"{a}rkk\"{a}, and Ville V.
Lehtola}{Sensors and AI techniques for situational awareness in autonomous ships: a review}{IEEE Transactions on Intelligent Transportation Systems}{20 pages}{September}{2020}{IEEE}{paperMARITIME} 19 | \addcontribution{Zheng Zhao wrote the reviews of AI techniques and produced corresponding results.} 20 | 21 | \addpublication[submitted]{Zheng Zhao, Rui Gao, and Simo S\"{a}rkk\"{a}}{Hierarchical Non-stationary temporal Gaussian processes with $L^1$-regularization}{Statistics and Computing}{20 pages}{May}{2021}{Zheng Zhao, Rui Gao, and Simo S\"{a}rkk\"{a}}{paperRNSSGP} 22 | \addcontribution{Zheng Zhao wrote the article and produced the results. Rui Gao contributed the convergence analysis. Simo S\"{a}rkk\"{a} gave useful comments.} 23 | -------------------------------------------------------------------------------- /scripts/draw_ssdgp_m12.py: -------------------------------------------------------------------------------- 1 | # Draw Matern 1/2 SS-DGP samples and generate Figure 4.4 in the thesis 2 | # 3 | # Zheng Zhao 4 | # 5 | import os 6 | import math 7 | import numpy as np 8 | import matplotlib.pyplot as plt 9 | 10 | l2 = 2. 11 | s2 = 2. 12 | l3 = 2. 13 | s3 = 2. 
14 | 15 | 16 | def g(x): 17 | """Transformation function 18 | """ 19 | return np.exp(x) 20 | 21 | 22 | def a_b(x): 23 | """Return SDE drift and dispersion function a and b 24 | """ 25 | return -1 / np.array([g(x[1]), l2, l3]) * x, \ 26 | math.sqrt(2) * np.diag([g(x[2]) / np.sqrt(g(x[1])), s2 / np.sqrt(l2), s3 / np.sqrt(l3)]) 27 | 28 | 29 | def euler_maruyama(x0, dt, num_steps, int_steps): 30 | xx = np.zeros(shape=(num_steps, x0.shape[0])) 31 | x = x0 32 | ddt = dt / int_steps 33 | for i in range(num_steps): 34 | for j in range(int_steps): 35 | ax, bx = a_b(x) 36 | x = x + ax * ddt + np.sqrt(ddt) * bx @ np.random.randn(3) 37 | xx[i] = x 38 | return xx 39 | 40 | 41 | if __name__ == '__main__': 42 | path_figs = '../thesis/figs' 43 | plt.rcParams.update({ 44 | 'text.usetex': True, 45 | 'text.latex.preamble': r'\usepackage{fouriernc}', 46 | 'font.family': "serif", 47 | 'font.serif': 'New Century Schoolbook', 48 | 'font.size': 20}) 49 | 50 | np.random.seed(2020) 51 | 52 | end_T = 10 53 | num_steps = 1000 54 | int_steps = 10 55 | dt = end_T / num_steps 56 | tt = np.linspace(dt, end_T, num_steps) 57 | 58 | num_mc = 2 59 | 60 | # Compute Euler--Maruyama 61 | xx = np.zeros(shape=(num_mc, num_steps, 3)) 62 | for mc in range(num_mc): 63 | x0 = np.random.randn(3) 64 | xx[mc] = euler_maruyama(x0, dt, num_steps=num_steps, int_steps=int_steps) 65 | 66 | colours = ('black', 'tab:blue', 'tab:purple') 67 | markers = ('.', 'x', '1') 68 | 69 | # Plot u 70 | fig, (ax1, ax2, ax3) = plt.subplots(nrows=3, figsize=(12, 13), sharex=True) 71 | for mc in range(num_mc): 72 | ax1.plot(tt, xx[mc, :, 0], 73 | linewidth=2, c=colours[mc], 74 | marker=markers[mc], markevery=200, markersize=16, 75 | label=f'Sample {mc + 1}') 76 | 77 | ax1.grid(linestyle='--', alpha=0.3, which='both') 78 | 79 | ax1.set_ylabel('$U^1_0(t)$') 80 | ax1.set_xlim(0, end_T) 81 | ax1.set_xticks(np.arange(0, end_T + 1, 1)) 82 | 83 | ax1.legend(ncol=2, loc='upper left', fontsize=18) 84 | 85 | # Plot ell 86 | for mc in 
range(num_mc): 87 | ax2.plot(tt, xx[mc, :, 1], 88 | linewidth=2, c=colours[mc], 89 | marker=markers[mc], markevery=200, markersize=16, 90 | label=f'Sample {mc + 1}') 91 | 92 | ax2.grid(linestyle='--', alpha=0.3, which='both') 93 | 94 | ax2.set_ylabel('$U^2_1(t)$') 95 | ax2.set_xlim(0, end_T) 96 | ax2.set_xticks(np.arange(0, end_T + 1, 1)) 97 | 98 | # Plot sigma 99 | for mc in range(num_mc): 100 | ax3.plot(tt, xx[mc, :, 2], 101 | linewidth=2, c=colours[mc], 102 | marker=markers[mc], markevery=200, markersize=16, 103 | label=f'Sample {mc + 1}') 104 | 105 | ax3.grid(linestyle='--', alpha=0.3, which='both') 106 | 107 | ax3.set_xlabel('$t$') 108 | ax3.set_ylabel('$U^3_1(t)$') 109 | ax3.set_xlim(0, end_T) 110 | ax3.set_xticks(np.arange(0, end_T + 1, 1)) 111 | 112 | plt.tight_layout(pad=0.1) 113 | plt.subplots_adjust(bottom=0.053) 114 | plt.savefig(os.path.join(path_figs, 'samples_ssdgp_m12.pdf')) 115 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Doctoral dissertation of Zheng Zhao 2 | thesis 3 | 4 | [![Dissertation latex compile](https://github.com/zgbkdlm/dissertation/actions/workflows/latex_compile.yml/badge.svg)](https://github.com/zgbkdlm/dissertation/actions/workflows/latex_compile.yml) 5 | 6 | This thesis is mainly concerned with state-space methods for a class of deep Gaussian process (DGP) regression problems. As an example, one can think of a family of DGPs as solutions to stochastic differential equations (SDEs), and view their regression problems as filtering and smoothing problems. Additionally, this thesis presents a few applications of (D)GPs, such as system identification of SDEs and spectro-temporal signal analysis. 7 | 8 | Supervisor: Prof. Simo Särkkä. 9 | 10 | Pre-examiners: Prof. Kody J. H. Law from The University of Manchester and Prof. David Duvenaud from the University of Toronto. 11 | 12 | Opponent: Prof.
Manfred Opper from the University of Birmingham. 13 | 14 | The public defence of the thesis will be streamed online on December 10, 2021 at noon (Helsinki time) via Zoom link https://aalto.zoom.us/j/67529212279. It is free and open to the public; you are welcome to attend. 15 | 16 | More details regarding the thesis itself can be found in its title pages. 17 | 18 | # Contents 19 | 20 | The dissertation is in `./dissertation.pdf`. Feel free to download and read~~ 21 | 22 | **Note that you may also find an "official" version in [aaltodoc](http://urn.fi/URN:ISBN:978-952-64-0603-9) published by Aalto University, the content of which is identical to `./dissertation.pdf`. However, the aaltodoc version has many readability issues; please use `./dissertation.pdf` instead.** 23 | 24 | 1. `./dissertation.pdf`. The PDF of the thesis. 25 | 2. `./errata.md`. Errata of the thesis. 26 | 3. `./cover`. This folder contains a Python script that generates the cover image. 27 | 4. `./lectio_praecursoria`. This folder contains the presentation at the public defence of the thesis. 28 | 5. `./scripts`. This folder contains Python scripts that are used to generate some of the figures in the thesis. 29 | 6. `./thesis_latex`. This folder contains the LaTeX source of the thesis. Compiling the tex files here will generate a PDF identical to `./dissertation.pdf`. 30 | 31 | # Satellite repositories 32 | 33 | 1. [https://github.com/zgbkdlm/ssdgp](https://github.com/zgbkdlm/ssdgp) contains an implementation of state-space deep Gaussian processes. 34 | 2. [https://github.com/zgbkdlm/tme](https://github.com/zgbkdlm/tme) and [https://github.com/zgbkdlm/tmefs](https://github.com/zgbkdlm/tmefs) contain implementations of the Taylor moment expansion method and its filter and smoother applications.
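The SS-DGP priors in this thesis are defined through SDEs, and the scripts in `./scripts` draw samples from them by discretising those SDEs with the Euler–Maruyama scheme. For a rough taste of how little machinery this needs, below is a minimal NumPy sketch of sampling a Matérn 1/2 SS-DGP path, stripped down from `./scripts/draw_ssdgp_m12.py` (the step size and parameter values here are illustrative, and the helper names are mine):

```python
import numpy as np

# Illustrative parameters: U^1_0 is the GP layer, U^2_1 and U^3_1
# parameterise its length-scale and magnitude (cf. ./scripts/draw_ssdgp_m12.py).
l2, s2, l3, s3 = 2.0, 2.0, 2.0, 2.0


def g(x):
    # Positive transformation linking the layers.
    return np.exp(x)


def a_b(x):
    # Drift a(x) and dispersion b(x) of the Matern 1/2 SS-DGP SDE.
    a = -x / np.array([g(x[1]), l2, l3])
    b = np.sqrt(2) * np.diag([g(x[2]) / np.sqrt(g(x[1])),
                              s2 / np.sqrt(l2),
                              s3 / np.sqrt(l3)])
    return a, b


def euler_maruyama(x0, dt, num_steps, rng):
    # Discretise the SDE: x_{k+1} = x_k + a(x_k) dt + b(x_k) sqrt(dt) w_k.
    xs = np.zeros((num_steps, x0.size))
    x = x0
    for i in range(num_steps):
        a, b = a_b(x)
        x = x + a * dt + np.sqrt(dt) * b @ rng.standard_normal(3)
        xs[i] = x
    return xs


rng = np.random.default_rng(2020)
sample = euler_maruyama(rng.standard_normal(3), dt=1e-3, num_steps=1000, rng=rng)
print(sample.shape)  # (1000, 3)
```

The full script additionally uses inner integration steps for numerical stability and Matplotlib styling to produce Figure 4.4 in the thesis.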
35 | 36 | # Citation 37 | 38 | Bibtex: 39 | 40 | ```bibtex 41 | @phdthesis{Zhao2021Thesis, 42 | title = {State-space deep Gaussian processes with applications}, 43 | author = {Zheng Zhao}, 44 | school = {Aalto University}, 45 | year = {2021}, 46 | } 47 | ``` 48 | 49 | Plain text: 50 | 51 | Zheng Zhao. *State-space deep Gaussian processes with applications*. PhD thesis, Aalto University, 2021. 52 | 53 | # License 54 | 55 | Unless otherwise stated, all rights belong to the author Zheng Zhao. This repository consists of files covered by different licenses; please check their licenses before you use them. 56 | 57 | You are free to download, display, and print `./dissertation.pdf` for your own personal use. Commercial use of it is prohibited. 58 | 59 | # Acknowledgement 60 | 61 | I would like to thank [Adrien (Monte) Corenflos](https://adriencorenflos.github.io/), [Christos Merkatas](https://cmerkatas.github.io/), [Dennis Yeung](https://www.linkedin.com/in/dptyeung/?originalSubdomain=fi), and [Sakira Hassan](https://sakira.github.io/) for their time and effort in reviewing and checking the language of the thesis. 62 | 63 | # Contact 64 | 65 | Zheng Zhao, zheng.zhao@aalto.fi 66 | -------------------------------------------------------------------------------- /scripts/showcase_gp_rectangular.py: -------------------------------------------------------------------------------- 1 | # Show the GP regression on a magnitude-varying rectangular wave signal, and generate Figure 1.1.
2 | # 3 | # Zheng Zhao 2021 4 | # 5 | import os 6 | import math 7 | import numpy as np 8 | import gpflow 9 | import matplotlib.pyplot as plt 10 | 11 | from typing import Tuple 12 | from matplotlib.ticker import MultipleLocator 13 | 14 | path_figs = '../thesis/figs' 15 | np.random.seed(666) 16 | plt.rcParams.update({ 17 | 'text.usetex': True, 18 | 'text.latex.preamble': r'\usepackage{fouriernc}', 19 | 'font.family': "serif", 20 | 'font.serif': 'New Century Schoolbook', 21 | 'font.size': 20}) 22 | 23 | 24 | def rect(t: np.ndarray, 25 | r: float) -> Tuple[np.ndarray, np.ndarray]: 26 | """Rectangle signal. Return the signal and a noisy measurement of it. 27 | """ 28 | tau = (t - np.min(t)) / (np.max(t) - np.min(t)) 29 | 30 | p = np.linspace(1 / 6, 5 / 6, 5) 31 | 32 | y = np.zeros_like(t) 33 | y[(tau >= 0) & (tau < p[0])] = 0 34 | y[(tau >= p[0]) & (tau < p[1])] = 1 35 | y[(tau >= p[1]) & (tau < p[2])] = 0 36 | y[(tau >= p[2]) & (tau < p[3])] = 0.6 37 | y[(tau >= p[3]) & (tau < p[4])] = 0 38 | y[tau >= p[4]] = 0.4 39 | 40 | return y, y + math.sqrt(r) * np.random.randn(*t.shape) 41 | 42 | 43 | # Simulate measurements 44 | t = np.linspace(0, 1, 400).reshape(-1, 1) 45 | r = 0.004 46 | ft, y = rect(t, r) 47 | 48 | # GPflow 49 | ell = 1. 50 | sigma = 1.
51 | 52 | m12 = gpflow.kernels.Matern12(lengthscales=ell, variance=sigma) 53 | m32 = gpflow.kernels.Matern32(lengthscales=ell, variance=sigma) 54 | m52 = gpflow.kernels.Matern52(lengthscales=ell, variance=sigma) 55 | rbf = gpflow.kernels.SquaredExponential(lengthscales=ell, variance=sigma) 56 | 57 | # Plots 58 | for name, label, cov in zip(['m12', 'm32', 'm52', 'rbf'], 59 | [r'Mat\'ern $1\,/\,2$', r'Mat\'ern $3\,/\,2$', r'Mat\'ern $5\,/\,2$', r'RBF'], 60 | [m12, m32, m52, rbf]): 61 | print(f'GP regression with {name} cov function') 62 | model = gpflow.models.GPR(data=(t, y), kernel=cov, mean_function=None) 63 | model.likelihood.variance.assign(r) 64 | 65 | opt = gpflow.optimizers.Scipy() 66 | opt_logs = opt.minimize(model.training_loss, model.trainable_variables, 67 | method='L-BFGS-B', 68 | options={'disp': True}) 69 | 70 | m, P = model.predict_f(t) 71 | 72 | # Plot and save 73 | fig = plt.figure(figsize=(16, 8)) 74 | ax = plt.axes() 75 | plt.plot(t, ft, c='black', alpha=0.8, linestyle='--', linewidth=2, label='True signal') 76 | plt.scatter(t, y, s=15, c='black', edgecolors='none', alpha=0.3, label='Measurements') 77 | plt.plot(t, m, c='black', linewidth=3, label=label) 78 | plt.fill_between( 79 | t[:, 0], 80 | m[:, 0] - 1.96 * np.sqrt(P[:, 0]), 81 | m[:, 0] + 1.96 * np.sqrt(P[:, 0]), 82 | color='black', 83 | edgecolor='none', 84 | alpha=0.2, 85 | ) 86 | 87 | plt.grid(linestyle='--', alpha=0.3, which='both') 88 | 89 | plt.xlim(0, 1) 90 | plt.ylim(-0.2, 1.2) 91 | 92 | ax.xaxis.set_major_locator(MultipleLocator(0.2)) 93 | ax.xaxis.set_minor_locator(MultipleLocator(0.1)) 94 | ax.xaxis.set_major_formatter('{x:.1f}') 95 | 96 | ax.yaxis.set_major_locator(MultipleLocator(0.4)) 97 | ax.yaxis.set_minor_locator(MultipleLocator(0.1)) 98 | ax.yaxis.set_major_formatter('{x:.1f}') 99 | 100 | plt.xlabel('$t$', fontsize=24) 101 | plt.title('$\\ell \\approx {:.2f}, \\quad \\sigma \\approx {:.2f}$'.format(cov.lengthscales.numpy(), 102 | cov.variance.numpy())) 103 | 
plt.legend(loc='upper right', fontsize='large') 104 | 105 | plt.tight_layout(pad=0.1) 106 | 107 | filename = 'gp-fail-example-' + name + '.pdf' 108 | plt.savefig(os.path.join(path_figs, filename)) 109 | -------------------------------------------------------------------------------- /scripts/draw_ssdgp_m32.py: -------------------------------------------------------------------------------- 1 | # Draw Matern 3/2 SS-DGP samples and generate Figure 4.4 in the thesis 2 | # 3 | # Zheng Zhao 4 | # 5 | import os 6 | import math 7 | import numpy as np 8 | import matplotlib.pyplot as plt 9 | 10 | l2 = 0.5 11 | s2 = 2. 12 | l3 = 0.5 13 | s3 = 2. 14 | 15 | 16 | def g(x): 17 | """Transformation function 18 | """ 19 | return np.exp(x) 20 | 21 | 22 | def a_b(x): 23 | """Return SDE drift and dispersion function a and b 24 | """ 25 | kappa1 = math.sqrt(3) / g(x[2]) 26 | kappa2 = math.sqrt(3) / l2 27 | kappa3 = math.sqrt(3) / l3 28 | return np.array([[0, 1, 0, 0, 0, 0], 29 | [- kappa1 ** 2, -2 * kappa1, 0, 0, 0, 0], 30 | [0, 0, 0, 1, 0, 0], 31 | [0, 0, - kappa2 ** 2, -2 * kappa2, 0, 0], 32 | [0, 0, 0, 0, 0, 1], 33 | [0, 0, 0, 0, - kappa3 ** 2, -2 * kappa3]]) @ x, \ 34 | 2 * np.diag([0., 35 | g(x[4]) * kappa1 ** 1.5, 36 | 0., 37 | s2 * kappa2 ** 1.5, 38 | 0., 39 | s3 * kappa3 ** 1.5]) 40 | 41 | 42 | def euler_maruyama(x0, dt, num_steps, int_steps): 43 | xx = np.zeros(shape=(num_steps, x0.shape[0])) 44 | x = x0 45 | ddt = dt / int_steps 46 | for i in range(num_steps): 47 | for j in range(int_steps): 48 | ax, bx = a_b(x) 49 | x = x + ax * ddt + np.sqrt(ddt) * bx @ np.random.randn(6) 50 | xx[i] = x 51 | return xx 52 | 53 | 54 | if __name__ == '__main__': 55 | 56 | path_figs = '../thesis/figs' 57 | plt.rcParams.update({ 58 | 'text.usetex': True, 59 | 'text.latex.preamble': r'\usepackage{fouriernc}', 60 | 'font.family': "serif", 61 | 'font.serif': 'New Century Schoolbook', 62 | 'font.size': 20}) 63 | 64 | np.random.seed(2020) 65 | 66 | end_T = 10 67 | num_steps = 1000 68 | int_steps = 10 
69 | dt = end_T / num_steps 70 | tt = np.linspace(dt, end_T, num_steps) 71 | 72 | num_mc = 3 73 | 74 | # Compute Euler--Maruyama 75 | xx = np.zeros(shape=(num_mc, num_steps, 6)) 76 | for mc in range(num_mc): 77 | x0 = np.random.randn(6) 78 | xx[mc] = euler_maruyama(x0, dt, num_steps=num_steps, int_steps=int_steps) 79 | 80 | colours = ('black', 'tab:blue', 'tab:purple') 81 | markers = ('.', 'x', '1') 82 | 83 | # Plot u 84 | fig, (ax1, ax2, ax3) = plt.subplots(nrows=3, figsize=(12, 13), sharex=True) 85 | for mc in range(num_mc): 86 | ax1.plot(tt, xx[mc, :, 0], 87 | linewidth=2, c=colours[mc], 88 | marker=markers[mc], markevery=200, markersize=16, 89 | label=f'Sample {mc + 1}') 90 | 91 | ax1.grid(linestyle='--', alpha=0.3, which='both') 92 | 93 | ax1.set_ylabel('$\\overline{U}^1_0(t)$') 94 | ax1.set_xlim(0, end_T) 95 | ax1.set_xticks(np.arange(0, end_T + 1, 1)) 96 | 97 | ax1.legend(ncol=3, loc='lower left', fontsize=18) 98 | 99 | # Plot ell 100 | for mc in range(num_mc): 101 | ax2.plot(tt, xx[mc, :, 2], 102 | linewidth=2, c=colours[mc], 103 | marker=markers[mc], markevery=200, markersize=16, 104 | label=f'Sample {mc + 1}') 105 | 106 | ax2.grid(linestyle='--', alpha=0.3, which='both') 107 | 108 | ax2.set_ylabel('$\\overline{U}^2_1(t)$') 109 | ax2.set_xlim(0, end_T) 110 | ax2.set_xticks(np.arange(0, end_T + 1, 1)) 111 | 112 | # Plot sigma 113 | for mc in range(num_mc): 114 | ax3.plot(tt, xx[mc, :, 4], 115 | linewidth=2, c=colours[mc], 116 | marker=markers[mc], markevery=200, markersize=16, 117 | label=f'Sample {mc + 1}') 118 | 119 | ax3.grid(linestyle='--', alpha=0.3, which='both') 120 | 121 | ax3.set_xlabel('$t$') 122 | ax3.set_ylabel('$\\overline{U}^3_1(t)$') 123 | ax3.set_xlim(0, end_T) 124 | ax3.set_xticks(np.arange(0, end_T + 1, 1)) 125 | 126 | plt.tight_layout(pad=0.1) 127 | plt.subplots_adjust(bottom=0.053) 128 | plt.savefig(os.path.join(path_figs, 'samples_ssdgp_m32.pdf')) 129 | --------------------------------------------------------------------------------
/thesis_latex/figs/dgp-example-2.tex: -------------------------------------------------------------------------------- 1 | \tikzset{every picture/.style={line width=0.75pt}} %set default line width to 0.75pt 2 | 3 | \begin{tikzpicture}[x=0.6pt,y=0.6pt,yscale=-1,xscale=1] 4 | %uncomment if require: \path (0,300); %set diagram left start at 0, and has height of 300 5 | 6 | %Shape: Circle [id:dp821654118938725] 7 | \draw [line width=1.5] (130,70) .. controls (130,58.95) and (138.95,50) .. (150,50) .. controls (161.05,50) and (170,58.95) .. (170,70) .. controls (170,81.05) and (161.05,90) .. (150,90) .. controls (138.95,90) and (130,81.05) .. (130,70) -- cycle ; 8 | %Shape: Circle [id:dp7532382575897598] 9 | \draw [line width=1.5] (80,140) .. controls (80,128.95) and (88.95,120) .. (100,120) .. controls (111.05,120) and (120,128.95) .. (120,140) .. controls (120,151.05) and (111.05,160) .. (100,160) .. controls (88.95,160) and (80,151.05) .. (80,140) -- cycle ; 10 | %Shape: Circle [id:dp5902978501813476] 11 | \draw [line width=1.5] (180,140) .. controls (180,128.95) and (188.95,120) .. (200,120) .. controls (211.05,120) and (220,128.95) .. (220,140) .. controls (220,151.05) and (211.05,160) .. (200,160) .. controls (188.95,160) and (180,151.05) .. 
(180,140) -- cycle ; 12 | %Straight Lines [id:da9016881764549007] 13 | \draw [line width=1.5] (137.17,92.83) -- (110,120) ; 14 | \draw [shift={(140,90)}, rotate = 135] [fill={rgb, 255:red, 0; green, 0; blue, 0 } ][line width=0.08] [draw opacity=0] (13.4,-6.43) -- (0,0) -- (13.4,6.44) -- (8.9,0) -- cycle ; 15 | %Straight Lines [id:da026586992089503658] 16 | \draw [line width=1.5] (162.83,92.83) -- (190,120) ; 17 | \draw [shift={(160,90)}, rotate = 45] [fill={rgb, 255:red, 0; green, 0; blue, 0 } ][line width=0.08] [draw opacity=0] (13.4,-6.43) -- (0,0) -- (13.4,6.44) -- (8.9,0) -- cycle ; 18 | %Straight Lines [id:da8305170432215097] 19 | \draw [line width=1.5] (150,94) -- (150,120) ; 20 | \draw [shift={(150,90)}, rotate = 90] [fill={rgb, 255:red, 0; green, 0; blue, 0 } ][line width=0.08] [draw opacity=0] (13.4,-6.43) -- (0,0) -- (13.4,6.44) -- (8.9,0) -- cycle ; 21 | %Shape: Circle [id:dp5384623722514443] 22 | \draw [line width=1.5] (130,140) .. controls (130,128.95) and (138.95,120) .. (150,120) .. controls (161.05,120) and (170,128.95) .. (170,140) .. controls (170,151.05) and (161.05,160) .. (150,160) .. controls (138.95,160) and (130,151.05) .. (130,140) -- cycle ; 23 | %Shape: Circle [id:dp16437012307038867] 24 | \draw [line width=1.5] (80,210) .. controls (80,198.95) and (88.95,190) .. (100,190) .. controls (111.05,190) and (120,198.95) .. (120,210) .. controls (120,221.05) and (111.05,230) .. (100,230) .. controls (88.95,230) and (80,221.05) .. (80,210) -- cycle ; 25 | %Shape: Circle [id:dp734554865123197] 26 | \draw [line width=1.5] (240,70) .. controls (240,58.95) and (248.95,50) .. (260,50) .. controls (271.05,50) and (280,58.95) .. (280,70) .. controls (280,81.05) and (271.05,90) .. (260,90) .. controls (248.95,90) and (240,81.05) .. (240,70) -- cycle ; 27 | %Shape: Circle [id:dp48143471465976706] 28 | \draw [line width=1.5] (240,140) .. controls (240,128.95) and (248.95,120) .. (260,120) .. controls (271.05,120) and (280,128.95) .. (280,140) .. 
controls (280,151.05) and (271.05,160) .. (260,160) .. controls (248.95,160) and (240,151.05) .. (240,140) -- cycle ; 29 | %Straight Lines [id:da6721182832003427] 30 | \draw [line width=1.5] (100,164) -- (100,171) -- (100,190) ; 31 | \draw [shift={(100,160)}, rotate = 90] [fill={rgb, 255:red, 0; green, 0; blue, 0 } ][line width=0.08] [draw opacity=0] (13.4,-6.43) -- (0,0) -- (13.4,6.44) -- (8.9,0) -- cycle ; 32 | %Straight Lines [id:da34999125713295975] 33 | \draw [line width=1.5] (260,94) -- (260,120) ; 34 | \draw [shift={(260,90)}, rotate = 90] [fill={rgb, 255:red, 0; green, 0; blue, 0 } ][line width=0.08] [draw opacity=0] (13.4,-6.43) -- (0,0) -- (13.4,6.44) -- (8.9,0) -- cycle ; 35 | %Shape: Rectangle [id:dp18541931062860906] 36 | \draw [dash pattern={on 5.63pt off 4.5pt}][line width=1.5] (60,40) -- (290,40) -- (290,240) -- (60,240) -- cycle ; 37 | %Shape: Rectangle [id:dp6446344359743517] 38 | \draw [dash pattern={on 5.63pt off 4.5pt}][line width=1.5] (70,110) -- (230,110) -- (230,170) -- (70,170) -- cycle ; 39 | 40 | % Text Node 41 | \draw (150,70) node [font=\large] {$U_{0}^{1}$}; 42 | % Text Node 43 | \draw (100,140) node [font=\large] {$U_{1}^{3}$}; 44 | % Text Node 45 | \draw (200,140) node [font=\large] {$U_{1}^{5}$}; 46 | % Text Node 47 | \draw (150,140) node [font=\large] {$U_{1}^{4}$}; 48 | % Text Node 49 | \draw (100,210) node [font=\large] {$U_{3}^{7}$}; 50 | % Text Node 51 | \draw (260,70) node [font=\large] {$U_{0}^{2}$}; 52 | % Text Node 53 | \draw (260,140) node [font=\large] {$U_{2}^{6}$}; 54 | % Text Node 55 | \draw (73,87.4) node [anchor=north west][inner sep=0.75pt] {$\mathcal{U}^{1}$}; 56 | % Text Node 57 | \draw (261,212.4) node [anchor=north west][inner sep=0.75pt] {$\mathcal{V}$}; 58 | 59 | 60 | \end{tikzpicture} 61 | -------------------------------------------------------------------------------- /thesis_latex/figs/dgp-binary-tree.tex: -------------------------------------------------------------------------------- 1 | \tikzset{every 
picture/.style={line width=0.75pt}} %set default line width to 0.75pt 2 | 3 | \begin{tikzpicture}[x=0.6pt,y=0.6pt,yscale=-1,xscale=1] 4 | %uncomment if require: \path (0,300); %set diagram left start at 0, and has height of 300 5 | 6 | %Shape: Circle [id:dp821654118938725] 7 | \draw [line width=1.5] (200,30) .. controls (200,18.95) and (208.95,10) .. (220,10) .. controls (231.05,10) and (240,18.95) .. (240,30) .. controls (240,41.05) and (231.05,50) .. (220,50) .. controls (208.95,50) and (200,41.05) .. (200,30) -- cycle ; 8 | %Shape: Circle [id:dp7532382575897598] 9 | \draw [line width=1.5] (150,100) .. controls (150,88.95) and (158.95,80) .. (170,80) .. controls (181.05,80) and (190,88.95) .. (190,100) .. controls (190,111.05) and (181.05,120) .. (170,120) .. controls (158.95,120) and (150,111.05) .. (150,100) -- cycle ; 10 | %Shape: Circle [id:dp5902978501813476] 11 | \draw [line width=1.5] (250,100) .. controls (250,88.95) and (258.95,80) .. (270,80) .. controls (281.05,80) and (290,88.95) .. (290,100) .. controls (290,111.05) and (281.05,120) .. (270,120) .. controls (258.95,120) and (250,111.05) .. (250,100) -- cycle ; 12 | %Straight Lines [id:da9016881764549007] 13 | \draw [line width=1.5] (207.17,52.83) -- (180,80) ; 14 | \draw [shift={(210,50)}, rotate = 135] [fill={rgb, 255:red, 0; green, 0; blue, 0 } ][line width=0.08] [draw opacity=0] (13.4,-6.43) -- (0,0) -- (13.4,6.44) -- (8.9,0) -- cycle ; 15 | %Straight Lines [id:da026586992089503658] 16 | \draw [line width=1.5] (232.83,52.83) -- (260,80) ; 17 | \draw [shift={(230,50)}, rotate = 45] [fill={rgb, 255:red, 0; green, 0; blue, 0 } ][line width=0.08] [draw opacity=0] (13.4,-6.43) -- (0,0) -- (13.4,6.44) -- (8.9,0) -- cycle ; 18 | %Shape: Circle [id:dp7932559631044391] 19 | \draw [line width=1.5] (100,170) .. controls (100,158.95) and (108.95,150) .. (120,150) .. controls (131.05,150) and (140,158.95) .. (140,170) .. controls (140,181.05) and (131.05,190) .. (120,190) .. 
controls (108.95,190) and (100,181.05) .. (100,170) -- cycle ; 20 | %Shape: Circle [id:dp8827895580667542] 21 | \draw [line width=1.5] (170,170) .. controls (170,158.95) and (178.95,150) .. (190,150) .. controls (201.05,150) and (210,158.95) .. (210,170) .. controls (210,181.05) and (201.05,190) .. (190,190) .. controls (178.95,190) and (170,181.05) .. (170,170) -- cycle ; 22 | %Shape: Circle [id:dp1311962702995253] 23 | \draw [line width=1.5] (230,170) .. controls (230,158.95) and (238.95,150) .. (250,150) .. controls (261.05,150) and (270,158.95) .. (270,170) .. controls (270,181.05) and (261.05,190) .. (250,190) .. controls (238.95,190) and (230,181.05) .. (230,170) -- cycle ; 24 | %Shape: Circle [id:dp055039676738894316] 25 | \draw [line width=1.5] (300,170) .. controls (300,158.95) and (308.95,150) .. (320,150) .. controls (331.05,150) and (340,158.95) .. (340,170) .. controls (340,181.05) and (331.05,190) .. (320,190) .. controls (308.95,190) and (300,181.05) .. (300,170) -- cycle ; 26 | %Straight Lines [id:da3720861628053822] 27 | \draw [line width=1.5] (156.8,122.4) -- (120,150) ; 28 | \draw [shift={(160,120)}, rotate = 143.13] [fill={rgb, 255:red, 0; green, 0; blue, 0 } ][line width=0.08] [draw opacity=0] (13.4,-6.43) -- (0,0) -- (13.4,6.44) -- (8.9,0) -- cycle ; 29 | %Straight Lines [id:da5765340159287278] 30 | \draw [line width=1.5] (181.26,123.79) -- (190,150) ; 31 | \draw [shift={(180,120)}, rotate = 71.57] [fill={rgb, 255:red, 0; green, 0; blue, 0 } ][line width=0.08] [draw opacity=0] (13.4,-6.43) -- (0,0) -- (13.4,6.44) -- (8.9,0) -- cycle ; 32 | %Straight Lines [id:da8546030008258803] 33 | \draw [line width=1.5] (258.74,123.79) -- (250,150) ; 34 | \draw [shift={(260,120)}, rotate = 108.43] [fill={rgb, 255:red, 0; green, 0; blue, 0 } ][line width=0.08] [draw opacity=0] (13.4,-6.43) -- (0,0) -- (13.4,6.44) -- (8.9,0) -- cycle ; 35 | %Straight Lines [id:da21589979146129545] 36 | \draw [line width=1.5] (283.2,122.4) -- (320,150) ; 37 | \draw 
[shift={(280,120)}, rotate = 36.87] [fill={rgb, 255:red, 0; green, 0; blue, 0 } ][line width=0.08] [draw opacity=0] (13.4,-6.43) -- (0,0) -- (13.4,6.44) -- (8.9,0) -- cycle ; 38 | %Shape: Rectangle [id:dp24854624791356805] 39 | \draw [dash pattern={on 5.63pt off 4.5pt}][line width=1.5] (140,70) -- (300,70) -- (300,130) -- (140,130) -- cycle ; 40 | %Shape: Rectangle [id:dp15950228071348826] 41 | \draw [dash pattern={on 5.63pt off 4.5pt}][line width=1.5] (220,140) -- (350,140) -- (350,200) -- (220,200) -- cycle ; 42 | %Shape: Rectangle [id:dp8128591939925793] 43 | \draw [dash pattern={on 5.63pt off 4.5pt}][line width=1.5] (90,0) -- (360,0) -- (360,210) -- (90,210) -- cycle ; 44 | 45 | % Text Node 46 | \draw (340.5,25) node {$\mathcal{V}$}; 47 | % Text Node 48 | \draw (316,80.5) node {$\mathcal{U}^{1}$}; 49 | % Text Node 50 | \draw (339,124.5) node {$\mathcal{U}^{3}$}; 51 | % Text Node 52 | \draw (220,30) node [font=\large] {$U_{0}^{1}$}; 53 | % Text Node 54 | \draw (170,100) node [font=\large] {$U_{1}^{2}$}; 55 | % Text Node 56 | \draw (270,100) node [font=\large] {$U_{1}^{3}$}; 57 | % Text Node 58 | \draw (120,170) node [font=\large] {$U_{2}^{4}$}; 59 | % Text Node 60 | \draw (190,170) node [font=\large] {$U_{2}^{5}$}; 61 | % Text Node 62 | \draw (250,170) node [font=\large] {$U_{3}^{6}$}; 63 | % Text Node 64 | \draw (320,170) node [font=\large] {$U_{3}^{7}$}; 65 | 66 | \end{tikzpicture} 67 | -------------------------------------------------------------------------------- /scripts/disc_err_dgp_m12.py: -------------------------------------------------------------------------------- 1 | # Compare different discretisation schemes on an SS-DGP and Generate Figure 4.3 in the thesis 2 | # This is done purely in sympy and numpy. 3 | # 4 | # Zheng Zhao 2020 5 | # 6 | import os 7 | import math 8 | import numpy as np 9 | import matplotlib.pyplot as plt 10 | import sympy as sp 11 | 12 | import tme.base_sympy as tme 13 | from sympy import lambdify 14 | 15 | l2 = 1. 
16 | s2 = 0.1 17 | l3 = 1. 18 | s3 = 0.1 19 | 20 | 21 | # Transformation function 22 | def g(x): 23 | return np.exp(0.5 * x) 24 | 25 | 26 | def g_sym(x): 27 | return sp.exp(0.5 * x) 28 | 29 | 30 | # SS-DGP drift 31 | def a(x): 32 | return -np.array([x[0] / g(x[1]), 33 | x[1] / l2, 34 | x[2] / l3]) 35 | 36 | 37 | # SS-DGP dispersion 38 | def b(x): 39 | return math.sqrt(2) * np.diag([g(x[2]) / np.sqrt(g(x[1])), 40 | s2 / math.sqrt(l2), 41 | s3 / math.sqrt(l3)]) 42 | 43 | 44 | def euler_maruyama(x0, dt, dws): 45 | xx = np.zeros(shape=(dws.shape[0], x0.shape[0])) 46 | x = x0 47 | for idx, dw in enumerate(dws): 48 | x = x + a(x) * dt + b(x) @ dw 49 | xx[idx] = x 50 | return xx 51 | 52 | 53 | # Locally conditional discretisation (LCD) method, giving 54 | # x_k \approx F(x_{k-1}) x_{k-1} + q(x_{k-1}) 55 | def lcd_F(x, dt): 56 | return np.diag(np.exp(-dt / np.array([g(x[1]), l2, l3]))) 57 | 58 | 59 | def lcd_Q(x, dt): 60 | return np.diag([g(x[2]) ** 2 * (1 - np.exp(-2 * dt / g(x[1]))), 61 | s2 ** 2 * (1 - np.exp(-2 * dt / l2)), 62 | s3 ** 2 * (1 - np.exp(-2 * dt / l3))]) 63 | 64 | 65 | def lcd(x0, dt, dws): 66 | """Locally conditional discretisation for Matern 1/2 SS-DGPs 67 | """ 68 | xx = np.zeros(shape=(dws.shape[0], x0.shape[0])) 69 | x = x0 70 | for idx, dw in enumerate(dws): 71 | x = lcd_F(x, dt) @ x + np.sqrt(lcd_Q(x, dt)) @ dw / np.sqrt(dt) 72 | xx[idx] = x 73 | return xx 74 | 75 | 76 | def local_sum(x, factor): 77 | target_shape = (int(x.shape[0] / factor), x.shape[1]) 78 | xx = np.zeros(target_shape) 79 | for i in range(target_shape[0]): 80 | xx[i] = np.sum(x[i * factor:(i + 1) * factor], axis=0) 81 | return xx 82 | 83 | 84 | def give_tme_symbols(order=3, simp=True): 85 | """Give mean and covariance symbols of Taylor moment expansion.
86 | """ 87 | # Symbols 88 | x = sp.MatrixSymbol('x', 3, 1) 89 | a_sym = -sp.Matrix([x[0] / g_sym(x[1]), 90 | x[1] / sp.S(l2), 91 | x[2] / sp.S(l3)]) 92 | b_sym = sp.sqrt(sp.S(2)) * sp.Matrix([[g_sym(x[2]) / sp.sqrt(g_sym(x[1])), 0, 0], 93 | [0, sp.S(s2) / sp.sqrt(sp.S(l2)), 0], 94 | [0, 0, sp.S(s3) / sp.sqrt(sp.S(l3))] 95 | ]) 96 | dt_sym = sp.Symbol('dt') 97 | tme_mean, tme_cov = tme.mean_and_cov(x, a_sym, b_sym, dt_sym, 98 | order=order, simp=simp) 99 | tme_mean_func = lambdify([x, dt_sym], tme_mean, 'numpy') 100 | tme_cov_func = lambdify([x, dt_sym], tme_cov, 'numpy') 101 | return tme_mean_func, tme_cov_func 102 | 103 | 104 | def tme_disc(x0, dt, dws, f, Q): 105 | """Taylor moment expansion discretisation. This demonstrates how to use TME in SymPy; in practice, 106 | please use the JAX implementation. 107 | """ 108 | xx = np.zeros(shape=(dws.shape[0], x0.shape[0], 1)) 109 | x = x0.reshape(-1, 1) 110 | for idx, dw in enumerate(dws): 111 | x = f(x, dt) + (np.linalg.cholesky(Q(x, dt)) @ dw / np.sqrt(dt))[:, None] 112 | xx[idx] = x 113 | return xx[:, :, 0] 114 | 115 | 116 | def abs_err(x1, x2): 117 | return np.sum(np.abs(x1 - x2)) 118 | 119 | 120 | if __name__ == '__main__': 121 | path_figs = '../thesis/figs' 122 | plt.rcParams.update({ 123 | 'text.usetex': True, 124 | 'text.latex.preamble': r'\usepackage{fouriernc}', 125 | 'font.family': "serif", 126 | 'font.serif': 'New Century Schoolbook', 127 | 'font.size': 20}) 128 | 129 | np.random.seed(2020) 130 | 131 | end_T = 10 132 | num_steps = 100 133 | dt = end_T / num_steps 134 | tt = np.linspace(dt, end_T, num_steps) 135 | 136 | boost_factor = 1000 137 | boost_num_steps = num_steps * boost_factor 138 | boost_dt = dt / boost_factor 139 | boost_tt = np.linspace(boost_dt, end_T, boost_num_steps) 140 | boost_dws = np.sqrt(boost_dt) * np.random.randn(boost_num_steps, 3)  # Brownian increments scaled by the fine step 141 | 142 | x0 = np.zeros(shape=(3,)) 143 | 144 | # Compute very accurate discretisation 145 | boost_xx = euler_maruyama(x0, boost_dt,
boost_dws) 146 | exact_xx = boost_xx[boost_factor - 1::boost_factor] 147 | exact_tt = boost_tt[boost_factor - 1::boost_factor] 148 | 149 | # Compute Euler Maruyama 150 | dws = local_sum(boost_dws, boost_factor) 151 | em_xx = euler_maruyama(x0, dt, dws) 152 | 153 | # Compute locally conditional discretisation 154 | lcd_xx = lcd(x0, dt, dws) 155 | 156 | # Compute TME 157 | tme_order = 3 158 | tme_mean_func, tme_cov_func = give_tme_symbols(order=tme_order, simp=True) 159 | tme_xx = tme_disc(x0, dt, dws, tme_mean_func, tme_cov_func) 160 | 161 | # Compute abs error 162 | err_dim = 0 163 | err_em = abs_err(em_xx[:, err_dim], exact_xx[:, err_dim]) 164 | err_lcd = abs_err(lcd_xx[:, err_dim], exact_xx[:, err_dim]) 165 | err_tme = abs_err(tme_xx[:, err_dim], exact_xx[:, err_dim]) 166 | print(f'Euler--Maruyama abs err: {err_em}') 167 | print(f'LCD abs err: {err_lcd}') 168 | print(f'TME abs err: {err_tme}') 169 | 170 | # Plot 171 | plt.figure(figsize=(16, 8)) 172 | plt.plot(tt, exact_xx[:, 0], 173 | c='black', linewidth=3, label='Numerical exact') 174 | plt.plot(tt, em_xx[:, 0], 175 | c='tab:blue', linewidth=3, linestyle=(0, (1, 1)), 176 | label=f'Euler--Maruyama (abs. err. $\\approx$ {err_em:.1f})') 177 | plt.plot(tt, lcd_xx[:, 0], 178 | c='tab:purple', linewidth=3, linestyle=(0, (5, 1)), 179 | label=f'LCD (abs. err. $\\approx$ {err_lcd:.1f})') 180 | plt.plot(tt, tme_xx[:, 0], 181 | c='tab:red', linewidth=3, linestyle=(0, (3, 1, 1, 1)), 182 | label=f'TME-{tme_order} (abs. err. 
$\\approx$ {err_tme:.1f})') 183 | 184 | plt.legend(loc='lower left', fontsize=24) 185 | 186 | plt.grid(linestyle='--', alpha=0.3, which='both') 187 | plt.xlim(0, end_T) 188 | 189 | plt.xlabel('$t$', fontsize=26) 190 | plt.ylabel('$U^1_0(t)$', fontsize=26) 191 | plt.xticks(np.arange(0, end_T + 1, 1)) 192 | 193 | plt.tight_layout(pad=0.1) 194 | plt.savefig(os.path.join(path_figs, 'disc-err_dgp_m12.pdf')) 195 | -------------------------------------------------------------------------------- /scripts/spectro_temporal.py: -------------------------------------------------------------------------------- 1 | # Probabilistic state-space spectro-temporal estimation. This will generate Figure 5.2 in the thesis. 2 | # 3 | # Zheng Zhao 2020 4 | # 5 | import os 6 | import math 7 | import numpy as np 8 | import matplotlib.pyplot as plt 9 | 10 | from scipy.linalg import cho_factor, cho_solve 11 | from typing import Tuple 12 | 13 | 14 | def test_signal(ts: np.ndarray, R: float) -> Tuple[np.ndarray, np.ndarray]: 15 | """Generate a test sinusoidal signal with multiple freq bands 16 | 17 | Parameters 18 | ---------- 19 | ts : np.ndarray 20 | Time instances. 21 | R : float 22 | Measurement noise variance. 23 | 24 | Returns 25 | ------- 26 | zt, yt : np.ndarray 27 | Ground truth signal and its noisy measurements, respectively. 28 | """ 29 | t1 = ts[ts < 1 / 3] 30 | t2 = ts[(ts >= 1 / 3) & (ts < 2 / 3)] 31 | t3 = ts[ts >= 2 / 3] 32 | zt = np.concatenate([np.sin(2 * math.pi * 10 * t1), 33 | np.sin(2 * math.pi * 40 * t2) + np.sin(2 * math.pi * 60 * t2), 34 | np.sin(2 * math.pi * 90 * t3)], 35 | axis=0) 36 | yt = zt + math.sqrt(R) * np.random.randn(ts.size) 37 | return zt, yt 38 | 39 | 40 | def kf_rts(F: np.ndarray, Q: np.ndarray, 41 | H: np.ndarray, R: float, 42 | y: np.ndarray, 43 | m0: np.ndarray, p0: np.ndarray) -> Tuple[np.ndarray, np.ndarray]: 44 | """Simple enough Kalman filter and RTS smoother. 
45 | 46 | x_k = F x_{k-1} + q_{k-1}, 47 | y_k = H x_k + r_k, 48 | 49 | Parameters 50 | ---------- 51 | F : np.ndarray 52 | State transition matrix. 53 | Q : np.ndarray 54 | Process noise covariance. 55 | H : np.ndarray 56 | Measurement matrix. 57 | R : float 58 | Measurement noise variance. 59 | y : np.ndarray 60 | Measurements. 61 | m0, p0 : np.ndarray 62 | Initial mean and covariance. 63 | 64 | Returns 65 | ------- 66 | ms, ps : np.ndarray 67 | Smoothing posterior means and covariances. 68 | """ 69 | dim_x = m0.size 70 | num_y = y.size 71 | 72 | mm = np.zeros(shape=(num_y, dim_x)) 73 | pp = np.zeros(shape=(num_y, dim_x, dim_x)) 74 | 75 | mm_pred = mm.copy() 76 | pp_pred = pp.copy() 77 | 78 | m = m0 79 | p = p0 80 | 81 | # Filtering pass 82 | for k in range(num_y): 83 | # Pred 84 | m = F @ m 85 | p = F @ p @ F.T + Q 86 | mm_pred[k] = m 87 | pp_pred[k] = p 88 | 89 | # Update 90 | Hk = H[k] 91 | S = Hk @ p @ Hk.T + R 92 | K = p @ Hk.T / S 93 | m = m + K * (y[k] - Hk @ m) 94 | p = p - np.outer(K, K) * S 95 | 96 | # Save 97 | mm[k] = m 98 | pp[k] = p 99 | 100 | # Smoothing pass 101 | ms = mm.copy() 102 | ps = pp.copy() 103 | for k in range(num_y - 2, -1, -1): 104 | (c, low) = cho_factor(pp_pred[k + 1]) 105 | G = pp[k] @ cho_solve((c, low), F).T 106 | ms[k] = mm[k] + G @ (ms[k + 1] - mm_pred[k + 1]) 107 | ps[k] = pp[k] + G @ (ps[k + 1] - pp_pred[k + 1]) @ G.T 108 | 109 | return ms, ps 110 | 111 | 112 | def generate_spectro_temporal_ssm(ell: float, sigma: float, 113 | ts: np.ndarray, dt: float, 114 | freqs: np.ndarray): 115 | """Generate the state-space model for spectro-temporal analysis. Only implemented for the Matern 1/2 prior with 116 | uniform parameters ell and sigma for all frequency components. 117 | 118 | Parameters 119 | ---------- 120 | ell : float 121 | Length scale of the Matern 1/2 prior. 122 | sigma : float 123 | Magnitude scale of the Matern 1/2 prior. 124 | ts : np.ndarray 125 | Time instances. 126 | dt : float 127 | Time interval.
(It is left as an exercise for you to implement varying dt) 128 | freqs : np.ndarray 129 | Frequencies. 130 | 131 | Returns 132 | ------- 133 | F, Q, H 134 | State coefficients. 135 | """ 136 | dim_x = 2 * freqs.size + 1  # avoid relying on the global N 137 | 138 | lam = 1 / ell 139 | q = 2 * sigma ** 2 / ell 140 | 141 | F = math.exp(-lam * dt) * np.eye(dim_x) 142 | Q = q / (2 * lam) * (1 - math.exp(-2 * lam * dt)) * np.eye(dim_x) 143 | 144 | H = np.array([[1.] 145 | + [np.cos(2 * math.pi * f * t) for f in freqs] 146 | + [np.sin(2 * math.pi * f * t) for f in freqs] for t in ts]) 147 | return F, Q, H 148 | 149 | 150 | if __name__ == '__main__': 151 | # Parameters of priors 152 | ell = 0.1 153 | sigma = 0.5 154 | 155 | # Generate a signal and measurements 156 | fs = 1000 157 | ts = np.linspace(0, 1, fs) 158 | R = 0.01 159 | zt, yt = test_signal(ts=ts, R=R) 160 | 161 | # Generate state-space GP model 162 | # Order of Fourier expansions 163 | N = 100 164 | freqs = np.linspace(1, 100, N) 165 | F, Q, H = generate_spectro_temporal_ssm(ell=ell, sigma=sigma, ts=ts, dt=1 / fs, 166 | freqs=freqs) 167 | 168 | # Kalman filtering and smoothing 169 | m0 = np.zeros(shape=(2 * N + 1,)) 170 | p0 = 1.
* np.eye(2 * N + 1) 171 | 172 | # Discarded smoothing covariance ps 173 | ms, _ = kf_rts(F=F, Q=Q, H=H, R=R + 0.01, y=yt, m0=m0, p0=p0) 174 | 175 | # Draw spectrogram sqrt(a^2 + b^2) 176 | spectrogram = np.sqrt(ms[:, 1:N + 1] ** 2 + ms[:, N + 1:] ** 2) 177 | 178 | # Plot 179 | path_figs = '../thesis/figs' 180 | 181 | plt.rcParams.update({ 182 | 'text.usetex': True, 183 | 'text.latex.preamble': r'\usepackage{fouriernc}', 184 | 'font.family': "serif", 185 | 'font.serif': 'New Century Schoolbook', 186 | 'font.size': 20}) 187 | 188 | # Plot signal 189 | fig, axs = plt.subplots(nrows=1, ncols=2, figsize=(16, 6)) 190 | 191 | axs[0].plot(ts, zt, linewidth=2, c='black', label='Signal') 192 | axs[0].scatter(ts, yt, s=10, c='tab:purple', edgecolors='none', alpha=0.4, label='Measurements') 193 | axs[0].set_xlabel('$t$', fontsize=24) 194 | 195 | axs[0].grid(linestyle='--', alpha=0.3, which='both') 196 | axs[0].legend(loc='upper left', fontsize=16) 197 | 198 | # Plot spectrogram and true freq bands 199 | mesh_ts, mesh_freqs = np.meshgrid(ts, freqs, indexing='ij') 200 | axs[1].contourf(mesh_ts, mesh_freqs, spectrogram, levels=4, cmap=plt.cm.Blues_r) 201 | 202 | axs[1].axhline(y=10, xmin=ts[0], xmax=ts[ts < 1 / 3][-1], 203 | c='black', linewidth=2, linestyle='--') 204 | axs[1].axhline(y=40, xmin=ts[(ts >= 1 / 3) & (ts < 2 / 3)][0], xmax=ts[(ts >= 1 / 3) & (ts < 2 / 3)][-1], 205 | c='black', linewidth=2, linestyle='--') 206 | axs[1].axhline(y=60, xmin=ts[(ts >= 1 / 3) & (ts < 2 / 3)][0], xmax=ts[(ts >= 1 / 3) & (ts < 2 / 3)][-1], 207 | c='black', linewidth=2, linestyle='--') 208 | axs[1].axhline(y=90, xmin=ts[ts >= 2 / 3][0], xmax=ts[ts >= 2 / 3][-1], 209 | c='black', linewidth=2, linestyle='--') 210 | 211 | axs[1].set_xlabel('$t$', fontsize=24) 212 | axs[1].set_ylabel('Frequency') 213 | 214 | plt.subplots_adjust(left=0.044, bottom=0.11, right=0.989, top=0.977, wspace=0.134, hspace=0.2) 215 | plt.savefig(os.path.join(path_figs, 'spectro-temporal-demo1.pdf')) 216 | 
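As a side note on the model above: the Matern 1/2 (Ornstein--Uhlenbeck) discretisation used in `generate_spectro_temporal_ssm` has stationary variance `sigma ** 2`, which gives a quick way to sanity-check the scalar `F` and `Q` formulas. A minimal sketch, not part of the original script (the parameter values mirror those in `__main__`):

```python
import math

# Scalar Matern 1/2 discretisation: x_k = F x_{k-1} + q_{k-1}, q ~ N(0, Q),
# with F = exp(-dt / ell) and Q = sigma^2 (1 - exp(-2 dt / ell)).
ell, sigma, dt = 0.1, 0.5, 1 / 1000

lam = 1 / ell
q = 2 * sigma ** 2 / ell
F = math.exp(-lam * dt)
Q = q / (2 * lam) * (1 - math.exp(-2 * lam * dt))

# Iterating the variance recursion p <- F^2 p + Q converges to the
# stationary variance Q / (1 - F^2) = sigma^2 of the prior.
p = 0.0
for _ in range(10_000):
    p = F ** 2 * p + Q

print(abs(p - sigma ** 2) < 1e-10)  # True
```

This is why the smoother's marginal variances stay bounded by roughly `sigma ** 2` per frequency component regardless of the number of measurements.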
-------------------------------------------------------------------------------- /lectio_praecursoria/z_marcro.tex: -------------------------------------------------------------------------------- 1 | %!TEX root = dissertation.tex 2 | 3 | % To use these macros you need the packages: amsmath, amssymb, bm, mathtools 4 | % 5 | % Zheng Zhao @ 2019 6 | % zz@zabemon.com 7 | % 8 | % License: Creative Commons Attribution 4.0 International (CC BY 4.0) 9 | % 10 | 11 | % Adaptive bold math font command 12 | \newcommand{\cu}[1]{ 13 | \ifcat\noexpand#1\relax 14 | \bm{#1} 15 | \else 16 | \mathbf{#1} 17 | \fi 18 | } 19 | 20 | \newcommand{\tash}[2]{\frac{\partial #1}{\partial #2}} 21 | \newcommand{\tashh}[3]{\frac{\partial^2 #1}{\partial #2 \, \partial #3}} 22 | 23 | % Slightly smaller spacing than a pure mathop 24 | \newcommand{\diff}{\mathop{}\!\mathrm{d}} 25 | 26 | % Complex 27 | \newcommand{\imag}{\mathrm{i}} 28 | 29 | % Exponential 30 | \newcommand{\expp}{\mathrm{e}} 31 | 32 | % \mid used in condition in probability e.g., E[x \mid y] 33 | \newcommand{\cond}{{\;|\;}} 34 | \newcommand{\condbig}{{\;\big|\;}} 35 | \newcommand{\condBig}{{\;\Big|\;}} 36 | \newcommand{\condbigg}{{\;\bigg|\;}} 37 | \newcommand{\condBigg}{{\;\Bigg|\;}} 38 | 39 | \let\sup\relax 40 | \let\inf\relax 41 | \let\lim\relax 42 | \DeclareMathOperator*{\argmin}{arg\,min\,} % Argmin 43 | \DeclareMathOperator*{\argmax}{arg\,max\,} % Argmax 44 | \DeclareMathOperator*{\sup}{sup\,} % sup better spacing 45 | \DeclareMathOperator*{\inf}{inf\,} % inf 46 | \DeclareMathOperator*{\lim}{lim\,} % lim 47 | 48 | \newcommand{\sgn}{\operatorname{sgn}} % sign function 49 | 50 | \newcommand{\expecsym}{\operatorname{\mathbb{E}}} % Expec 51 | \newcommand{\covsym}{\operatorname{Cov}} % Covariance 52 | \newcommand{\varrsym}{\operatorname{Var}} % Variance 53 | \newcommand{\diagsym}{\operatorname{diag}} % Diagonal matrix 54 | \newcommand{\tracesym}{\operatorname{tr}} % Trace 55 | 56 | % Two problems for E, Cov, Var etc.
with brackets 57 | % 1. \operatorname does not give space for the bracket, so we need to manually add \, after E. If \left\right is used then there is no need to add the space. 58 | % 2. \left\right does not give correct vertical spacing. The brackets will be shifted down slightly. 59 | % The solution is to use \left\right only when it is inevitable. 60 | % Use \expec when you do not want auto-height 61 | % Use \expec* when you want auto-height 62 | % Use \expecsym when you want to fully define the behaviour, which only gives the E symbol without brackets. 63 | \let\expec\relax 64 | \let\cov\relax 65 | \let\varr\relax 66 | \let\diag\relax 67 | \let\trace\relax 68 | 69 | \makeatletter 70 | % E [ ] 71 | \newcommand{\expec}{\@ifstar{\@expecauto}{\@expecnoauto}} 72 | \newcommand{\@expecauto}[1]{\expecsym \left[ #1 \right]} 73 | \newcommand{\@expecnoauto}[1]{\expecsym \, [#1]} 74 | \newcommand{\expecbig}[1]{\expecsym \big[ #1 \big]} 75 | \newcommand{\expecBig}[1]{\expecsym \Big[ #1 \Big]} 76 | \newcommand{\expecbigg}[1]{\expecsym \bigg[ #1 \bigg]} 77 | \newcommand{\expecBigg}[1]{\expecsym \Bigg[ #1 \Bigg]} 78 | 79 | 80 | % Cov [ ] 81 | \newcommand{\cov}{\@ifstar{\@covauto}{\@covnoauto}} 82 | \newcommand{\@covauto}[1]{\covsym \left[ #1 \right]} 83 | \newcommand{\@covnoauto}[1]{\covsym \, [#1]} 84 | \newcommand{\covbig}[1]{\covsym \big[ #1 \big]} 85 | \newcommand{\covBig}[1]{\covsym \Big[ #1 \Big]} 86 | \newcommand{\covbigg}[1]{\covsym \bigg[ #1 \bigg]} 87 | \newcommand{\covBigg}[1]{\covsym \Bigg[ #1 \Bigg]} 88 | 89 | % Var [ ] 90 | \newcommand{\varr}{\@ifstar{\@varrauto}{\@varrnoauto}} 91 | \newcommand{\@varrauto}[1]{\varrsym \left[ #1 \right]} 92 | \newcommand{\@varrnoauto}[1]{\varrsym \, [#1]} 93 | \newcommand{\varrbig}[1]{\varrsym \big[ #1 \big]} 94 | \newcommand{\varrBig}[1]{\varrsym \Big[ #1 \Big]} 95 | \newcommand{\varrbigg}[1]{\varrsym \bigg[ #1 \bigg]} 96 | \newcommand{\varrBigg}[1]{\varrsym \Bigg[ #1 \Bigg]} 97 | 98 | % Diag ( ) 99 |
\newcommand{\diag}{\@ifstar{\@diagauto}{\@diagnoauto}} 100 | \newcommand{\@diagauto}[1]{\diagsym \left( #1 \right)} 101 | \newcommand{\@diagnoauto}[1]{\diagsym \, (#1)} 102 | \newcommand{\diagbig}[1]{\diagsym \big( #1 \big)} 103 | \newcommand{\diagBig}[1]{\diagsym \Big( #1 \Big)} 104 | \newcommand{\diagbigg}[1]{\diagsym \bigg( #1 \bigg)} 105 | \newcommand{\diagBigg}[1]{\diagsym \Bigg( #1 \Bigg)} 106 | 107 | % tr ( ) 108 | \newcommand{\trace}{\@ifstar{\@traceauto}{\@tracenoauto}} 109 | \newcommand{\@traceauto}[1]{\tracesym \left( #1 \right)} 110 | \newcommand{\@tracenoauto}[1]{\tracesym \, (#1)} 111 | \newcommand{\tracebig}[1]{\tracesym \big( #1 \big)} 112 | \newcommand{\traceBig}[1]{\tracesym \Big( #1 \Big)} 113 | \newcommand{\tracebigg}[1]{\tracesym \bigg( #1 \bigg)} 114 | \newcommand{\traceBigg}[1]{\tracesym \Bigg( #1 \Bigg)} 115 | \makeatother 116 | 117 | \newcommand{\A}{\mathcal{A}} % Generator 118 | \newcommand{\Am}{\overline{\mathcal{A}}} % Generator 119 | 120 | % Transpose symbol using (DIN) EN ISO 80000-2:2013 standard 121 | \newcommand*{\trans}{{\mkern-1.5mu\mathsf{T}}} 122 | 123 | \newcommand*{\T}{\mathbb{T}} % Set of temporal variables 124 | \newcommand*{\R}{\mathbb{R}} % Set of real numbers 125 | \newcommand*{\Q}{\mathbb{Q}} % Set of rational numbers 126 | \newcommand*{\N}{\mathbb{N}} % Set of natural numbers 127 | \newcommand*{\Z}{\mathbb{Z}} % Set of integers 128 | 129 | \newcommand*{\BB}{\mathcal{B}} % Borel sigma-algebra 130 | \newcommand*{\FF}{\mathcal{F}} % Sigma-algebra 131 | \newcommand*{\PP}{\mathbb{P}} % Probability measure 132 | \newcommand*{\GP}{\mathrm{GP}} % GP 133 | 134 | \newcommand{\mineig}{\lambda_{\mathrm{min}}} 135 | \newcommand{\maxeig}{\lambda_{\mathrm{max}}} 136 | 137 | % Norm and inner product 138 | %% use \norm* to enable auto-height 139 | 140 | %% Some notes on these paired delimiters: 141 | %% It is argued that there should be no space between operator and delimiter, but this might not be suitable in some cases.
Indeed log(x) should have no space between log and (, but log |x| with a mathop{} spacing looks absolutely much prettier than log|x| because here |x| is an argument. Think, shouldn't it be log(|x|) in full expansion, and we ignored () with spacing? 142 | %% See discussion in https://tex.stackexchange.com/questions/461806/missing-space-with-declarepaireddelimiter 143 | % 144 | \let\norm\relax 145 | \DeclarePairedDelimiter{\normbracket}{\lVert}{\rVert} 146 | \newcommand{\norm}{\normbracket} 147 | \newcommand{\normbig}[1]{\big \lVert #1 \big \rVert} 148 | \newcommand{\normBig}[1]{\Big \lVert #1 \Big\rVert} 149 | \newcommand{\normbigg}[1]{\bigg \lVert #1 \bigg\rVert} 150 | \newcommand{\normBigg}[1]{\Bigg \lVert #1 \Bigg\rVert} 151 | %\makeatletter 152 | %\newcommand{\norm}{\@ifstar{\@normnoauto}{\@normauto}} 153 | %\newcommand{\@normauto}[1]{\left\lVert#1\right\rVert} 154 | %\newcommand{\@normnoauto}[1]{\lVert#1\rVert} 155 | %\makeatother 156 | 157 | \let\innerp\relax 158 | \DeclarePairedDelimiter{\innerpbracket}{\langle}{\rangle} 159 | \newcommand{\innerp}{\innerpbracket} 160 | %\makeatletter 161 | %\newcommand{\innerp}{\@ifstar{\@inpnoautp}{\@inpauto}} 162 | %\newcommand{\@inpauto}[2]{\left\langle#1, #2\right\rangle} 163 | %\newcommand{\@inpnoautp}[2]{\left#1, #2\rangle} 164 | %\makeatother 165 | 166 | \let\abs\relax 167 | \DeclarePairedDelimiter{\absbracket}{\lvert}{\rvert} 168 | \newcommand{\abs}{\absbracket} 169 | \newcommand{\absbig}[1]{\big \lvert #1 \big \rvert} 170 | \newcommand{\absBig}[1]{\Big \lvert #1 \Big\rvert} 171 | \newcommand{\absbigg}[1]{\bigg \lvert #1 \bigg\rvert} 172 | \newcommand{\absBigg}[1]{\Bigg \lvert #1 \Bigg\rvert} 173 | %\makeatletter 174 | %\newcommand{\abs}{\@ifstar{\@absnoauto}{\@absauto}} 175 | %\newcommand{\@absauto}[1]{\left\lvert#1\right\rvert} 176 | %\newcommand{\@absnoauto}[1]{\lvert#1\rvert} 177 | %\makeatother 178 | 179 | % Some functions 180 | \newcommand{\mBesselsec}{\operatorname{K}_\nu} 181 | 
\newcommand{\jacob}{\operatorname{J}} 182 | \newcommand{\hessian}{\operatorname{H}} 183 | 184 | % Literals 185 | \def\matern{Mat\'{e}rn } 186 | 187 | % Theorem envs 188 | % Dummy env for those sharing the same numbering system. 189 | % 190 | \makeatletter 191 | \@ifundefined{thmnumcounter}{} 192 | {% 193 | \newtheorem{envcounter}{EnvcounterDummy}[\thmnumcounter] 194 | \newtheorem{theorem}[envcounter]{Theorem} 195 | \newtheorem{proposition}[envcounter]{Proposition} 196 | \newtheorem{lemma}[envcounter]{Lemma} 197 | \newtheorem{corollary}[envcounter]{Corollary} 198 | \newtheorem{remark}[envcounter]{Remark} 199 | \newtheorem{example}[envcounter]{Example} 200 | \newtheorem{definition}[envcounter]{Definition} 201 | \newtheorem{algorithm}[envcounter]{Algorithm} 202 | \newtheorem{assumption}[envcounter]{Assumption} 203 | } 204 | \makeatother 205 | -------------------------------------------------------------------------------- /thesis_latex/zmacro.tex: -------------------------------------------------------------------------------- 1 | %!TEX root = main.tex 2 | % Generic macro definitions for a number of math operations. 3 | % Version 2.0, last updated 02.06.2022.
4 | % 5 | % To use this macro you need packages: amsmath, amssymb, bm, mathtools 6 | % 7 | % Zheng Zhao @ 2019 8 | % zz@zabemon.com 9 | % 10 | % License: Creative Commons Attribution 4.0 International (CC BY 4.0) 11 | % 12 | 13 | % Adaptive bold math font command 14 | \newcommand{\cu}[1]{ 15 | \ifcat\noexpand#1\relax 16 | \bm{#1} 17 | \else 18 | \mathbf{#1} 19 | \fi 20 | } 21 | 22 | \newcommand{\tash}[2]{\frac{\partial #1}{\partial #2}} 23 | \newcommand{\tashh}[3]{\frac{\partial^2 #1}{\partial #2 \, \partial #3}} 24 | 25 | % Slightly smaller spacing than a pure mathop 26 | \newcommand{\diff}{\mathop{}\!\mathrm{d}} 27 | 28 | % Complex 29 | \newcommand{\imag}{\mathrm{i}} 30 | 31 | % Exponential 32 | \newcommand{\expp}{\mathrm{e}} 33 | 34 | % \mid used in condition in probability e.g., E[x \mid y] 35 | \newcommand{\cond}{{\;|\;}} 36 | \newcommand{\condbig}{{\;\big|\;}} 37 | \newcommand{\condBig}{{\;\Big|\;}} 38 | \newcommand{\condbigg}{{\;\bigg|\;}} 39 | \newcommand{\condBigg}{{\;\Bigg|\;}} 40 | 41 | \let\sup\relax 42 | \let\inf\relax 43 | \let\lim\relax 44 | \DeclareMathOperator*{\argmin}{arg\,min\,} % Argmin 45 | \DeclareMathOperator*{\argmax}{arg\,max\,} % Argmax 46 | \DeclareMathOperator*{\sup}{sup\,} % sup better spacing 47 | \DeclareMathOperator*{\inf}{inf\,} % inf 48 | \DeclareMathOperator*{\lim}{lim\,} % lim 49 | \DeclareMathOperator*{\oprepeat}{\cdots} % repeat operation 50 | 51 | \newcommand{\sgn}{\operatorname{sgn}} % sign function 52 | 53 | \newcommand{\expecsym}{\operatorname{\mathbb{E}}} % Expec 54 | \newcommand{\covsym}{\operatorname{Cov}} % Covariance 55 | \newcommand{\varrsym}{\operatorname{Var}} % Variance 56 | \newcommand{\diagsym}{\operatorname{diag}} % Diagonal matrix 57 | \newcommand{\tracesym}{\operatorname{tr}} % Trace 58 | 59 | % Two problems for E, Cov, Var etc. with brackets 60 | % 1. \operatorname does not give space for bracket, thus we need to manually add \, after E. If \left\right is used then no need to add space. 61 | % 2.
\left\right does not give correct vertical spacing. The brackets will be shifted down slightly. 62 | % Solution is to use \left\right when it is inevitable. 63 | % Use \expec when you do not want auto-height 64 | % Use \expec* when you want auto-height 65 | % Use \expecsym when you want to fully define the behaviour, which only gives the E symbol without brackets. 66 | \let\expec\relax 67 | \let\cov\relax 68 | \let\varr\relax 69 | \let\diag\relax 70 | \let\trace\relax 71 | 72 | \makeatletter 73 | % E [ ] 74 | \newcommand{\expec}{\@ifstar{\@expecauto}{\@expecnoauto}} 75 | \newcommand{\@expecauto}[1]{\expecsym \left[ #1 \right]} 76 | \newcommand{\@expecnoauto}[1]{\expecsym [#1]} 77 | \newcommand{\expecbig}[1]{\expecsym \bigl[ #1 \bigr]} 78 | \newcommand{\expecBig}[1]{\expecsym \Bigl[ #1 \Bigr]} 79 | \newcommand{\expecbigg}[1]{\expecsym \biggl[ #1 \biggr]} 80 | \newcommand{\expecBigg}[1]{\expecsym \Biggl[ #1 \Biggr]} 81 | 82 | 83 | % Cov [ ] 84 | \newcommand{\cov}{\@ifstar{\@covauto}{\@covnoauto}} 85 | \newcommand{\@covauto}[1]{\covsym \left[ #1 \right]} 86 | \newcommand{\@covnoauto}[1]{\covsym [#1]} 87 | \newcommand{\covbig}[1]{\covsym \bigl[ #1 \bigr]} 88 | \newcommand{\covBig}[1]{\covsym \Bigl[ #1 \Bigr]} 89 | \newcommand{\covbigg}[1]{\covsym \biggl[ #1 \biggr]} 90 | \newcommand{\covBigg}[1]{\covsym \Biggl[ #1 \Biggr]} 91 | 92 | % Var [ ] 93 | \newcommand{\varr}{\@ifstar{\@varrauto}{\@varrnoauto}} 94 | \newcommand{\@varrauto}[1]{\varrsym \left[ #1 \right]} 95 | \newcommand{\@varrnoauto}[1]{\varrsym [#1]} 96 | \newcommand{\varrbig}[1]{\varrsym \bigl[ #1 \bigr]} 97 | \newcommand{\varrBig}[1]{\varrsym \Bigl[ #1 \Bigr]} 98 | \newcommand{\varrbigg}[1]{\varrsym \biggl[ #1 \biggr]} 99 | \newcommand{\varrBigg}[1]{\varrsym \Biggl[ #1 \Biggr]} 100 | 101 | % Diag ( ) 102 | \newcommand{\diag}{\@ifstar{\@diagauto}{\@diagnoauto}} 103 | \newcommand{\@diagauto}[1]{\diagsym \left( #1 \right)} 104 | \newcommand{\@diagnoauto}[1]{\diagsym (#1)} 105 | \newcommand{\diagbig}[1]{\diagsym
\bigl( #1 \bigr)} 106 | \newcommand{\diagBig}[1]{\diagsym \Bigl( #1 \Bigr)} 107 | \newcommand{\diagbigg}[1]{\diagsym \biggl( #1 \biggr)} 108 | \newcommand{\diagBigg}[1]{\diagsym \Biggl( #1 \Biggr)} 109 | 110 | % tr ( ) 111 | \newcommand{\trace}{\@ifstar{\@traceauto}{\@tracenoauto}} 112 | \newcommand{\@traceauto}[1]{\tracesym \left( #1 \right)} 113 | \newcommand{\@tracenoauto}[1]{\tracesym (#1)} 114 | \newcommand{\tracebig}[1]{\tracesym \bigl( #1 \bigr)} 115 | \newcommand{\traceBig}[1]{\tracesym \Bigl( #1 \Bigr)} 116 | \newcommand{\tracebigg}[1]{\tracesym \biggl( #1 \biggr)} 117 | \newcommand{\traceBigg}[1]{\tracesym \Biggl( #1 \Biggr)} 118 | \makeatother 119 | 120 | \newcommand{\A}{\mathcal{A}} % Generator 121 | \newcommand{\Am}{\overline{\mathcal{A}}} % Generator 122 | 123 | % Transpose symbol using (DIN) EN ISO 80000-2:2013 standard 124 | \newcommand*{\trans}{{\mkern-1.5mu\mathsf{T}}} 125 | 126 | \newcommand*{\T}{\mathbb{T}} % Set of temporal variables 127 | \newcommand*{\R}{\mathbb{R}} % Set of real numbers 128 | \newcommand*{\Q}{\mathbb{Q}} % Set of rational numbers 129 | \newcommand*{\N}{\mathbb{N}} % Set of natural numbers 130 | \newcommand*{\Z}{\mathbb{Z}} % Set of integers 131 | 132 | \newcommand*{\BB}{\mathcal{B}} % Borel sigma-algebra 133 | \newcommand*{\FF}{\mathcal{F}} % Sigma-algebra 134 | \newcommand*{\PP}{\mathbb{P}} % Probability measure 135 | \newcommand*{\GP}{\mathrm{GP}} % GP 136 | 137 | \newcommand{\mineig}{\lambda_{\mathrm{min}}} 138 | \newcommand{\maxeig}{\lambda_{\mathrm{max}}} 139 | 140 | % Norm and inner product 141 | %% use \norm* to enable auto-height 142 | 143 | %% Some notes on these paired delimiters: 144 | %% It is argued that there should be no space between operator and delimiter, but this might not be suitable in some cases. Indeed log(x) should have no space between log and (, but log |x| with a mathop{} spacing looks absolutely much prettier than log|x| because here |x| is an argument.
Think, shouldn't it be log(|x|) in full expansion, and we ignored () with spacing? 145 | %% See discussion in https://tex.stackexchange.com/questions/461806/missing-space-with-declarepaireddelimiter 146 | % 147 | \let\norm\relax 148 | \DeclarePairedDelimiter{\normbracket}{\lVert}{\rVert} 149 | \newcommand{\norm}{\normbracket} 150 | \newcommand{\normbig}[1]{\big \lVert #1 \big \rVert} 151 | \newcommand{\normBig}[1]{\Big \lVert #1 \Big\rVert} 152 | \newcommand{\normbigg}[1]{\bigg \lVert #1 \bigg\rVert} 153 | \newcommand{\normBigg}[1]{\Bigg \lVert #1 \Bigg\rVert} 154 | %\makeatletter 155 | %\newcommand{\norm}{\@ifstar{\@normnoauto}{\@normauto}} 156 | %\newcommand{\@normauto}[1]{\left\lVert#1\right\rVert} 157 | %\newcommand{\@normnoauto}[1]{\lVert#1\rVert} 158 | %\makeatother 159 | 160 | \let\innerp\relax 161 | \DeclarePairedDelimiter{\innerpbracket}{\langle}{\rangle} 162 | \newcommand{\innerp}{\innerpbracket} 163 | %\makeatletter 164 | %\newcommand{\innerp}{\@ifstar{\@inpnoautp}{\@inpauto}} 165 | %\newcommand{\@inpauto}[2]{\left\langle#1, #2\right\rangle} 166 | %\newcommand{\@inpnoautp}[2]{\left#1, #2\rangle} 167 | %\makeatother 168 | 169 | \let\abs\relax 170 | \DeclarePairedDelimiter{\absbracket}{\lvert}{\rvert} 171 | \newcommand{\abs}{\absbracket} 172 | \newcommand{\absbig}[1]{\big \lvert #1 \big \rvert} 173 | \newcommand{\absBig}[1]{\Big \lvert #1 \Big\rvert} 174 | \newcommand{\absbigg}[1]{\bigg \lvert #1 \bigg\rvert} 175 | \newcommand{\absBigg}[1]{\Bigg \lvert #1 \Bigg\rvert} 176 | %\makeatletter 177 | %\newcommand{\abs}{\@ifstar{\@absnoauto}{\@absauto}} 178 | %\newcommand{\@absauto}[1]{\left\lvert#1\right\rvert} 179 | %\newcommand{\@absnoauto}[1]{\lvert#1\rvert} 180 | %\makeatother 181 | 182 | % Some functions 183 | \newcommand{\mBesselsec}{\operatorname{K_\nu}} 184 | \newcommand{\jacob}{\mathrm{J}} 185 | \newcommand{\hessian}{\mathrm{H}} 186 | 187 | % Literals 188 | \def\matern{Mat\'{e}rn } 189 | 190 | % Theorem envs 191 | % Dummy env for those sharing the same 
numbering system. 192 | % If you would like to customise your environment numbering, you can define a command, e.g., \newcommand{\thmenvcounter}{section}, in your main tex. 193 | % If \thmenvcounter is undefined, it is assumed that you will deal with defining theorem, lemma, etc. by yourself. 194 | \makeatletter 195 | \@ifundefined{thmenvcounter}{} 196 | {% 197 | \newtheorem{envcounter}{EnvcounterDummy}[\thmenvcounter] 198 | \newtheorem{theorem}[envcounter]{Theorem} 199 | \newtheorem{proposition}[envcounter]{Proposition} 200 | \newtheorem{lemma}[envcounter]{Lemma} 201 | \newtheorem{corollary}[envcounter]{Corollary} 202 | \newtheorem{remark}[envcounter]{Remark} 203 | \newtheorem{example}[envcounter]{Example} 204 | \newtheorem{definition}[envcounter]{Definition} 205 | \newtheorem{algorithm}[envcounter]{Algorithm} 206 | \newtheorem{assumption}[envcounter]{Assumption} 207 | } 208 | \makeatother 209 | -------------------------------------------------------------------------------- /lectio_praecursoria/scripts/kfs_anime.py: -------------------------------------------------------------------------------- 1 | """ 2 | Generate animation of filtering and smoothing operations.
3 | 4 | Zheng Zhao, 2021 5 | """ 6 | import math 7 | import numpy as np 8 | import scipy.linalg 9 | import matplotlib.pyplot as plt 10 | from matplotlib.animation import FuncAnimation 11 | from matplotlib import animation 12 | from typing import Tuple 13 | 14 | 15 | def lti_sde_to_disc(A: np.ndarray, B: np.ndarray, dt: float) -> Tuple[np.ndarray, np.ndarray]: 16 | # Matrix-fraction discretisation, see Axelsson and Gustafsson 2015 17 | dim = A.shape[0] 18 | 19 | F = scipy.linalg.expm(A * dt) 20 | phi = np.vstack([np.hstack([A, np.outer(B, B)]), np.hstack([np.zeros_like(A), -A.T])]) 21 | AB = scipy.linalg.expm(phi * dt) @ np.vstack([np.zeros_like(A), np.eye(dim)]) 22 | Q = AB[0:dim, :] @ F.T 23 | return F, Q 24 | 25 | 26 | def simulate_data_from_disc_ss(F: np.ndarray, Q: np.ndarray, 27 | H: np.ndarray, R: float, 28 | m0: np.ndarray, p0: np.ndarray, 29 | T: int) -> Tuple[np.ndarray, np.ndarray]: 30 | dim_x = m0.size 31 | 32 | xs = np.empty((T, dim_x)) 33 | ys = np.empty((T, )) 34 | 35 | x = m0 + np.linalg.cholesky(p0) @ np.random.randn(dim_x) 36 | for k in range(T): 37 | x = F @ x + np.linalg.cholesky(Q) @ np.random.randn(dim_x) 38 | y = H @ x + math.sqrt(R) * np.random.randn() 39 | xs[k] = x 40 | ys[k] = y 41 | return xs, ys 42 | 43 | 44 | def kf_rts(F: np.ndarray, Q: np.ndarray, 45 | H: np.ndarray, R: float, 46 | y: np.ndarray, 47 | m0: np.ndarray, p0: np.ndarray) -> Tuple[np.ndarray, np.ndarray, np.ndarray, np.ndarray]: 48 | """A minimal Kalman filter and RTS smoother implementation for the model 49 | 50 | x_k = F x_{k-1} + q_{k-1}, 51 | y_k = H x_k + r_k. 52 | 53 | Parameters 54 | ---------- 55 | F : np.ndarray 56 | State transition. 57 | Q : np.ndarray 58 | State covariance. 59 | H : np.ndarray 60 | Measurement matrix (should give 1d measurement for simplicity). 61 | R : float 62 | Measurement noise variance. 63 | y : np.ndarray 64 | Measurements. 65 | m0, p0 : np.ndarray 66 | Initial mean and cov. 67 | Returns 68 | ------- 69 | mfs, pfs, mss, pss : np.ndarray 70 | Filtering and smoothing means and covariances.
71 | """ 72 | dim_x = m0.size 73 | num_y = y.size 74 | 75 | mfs = np.zeros(shape=(num_y, dim_x)) 76 | pfs = np.zeros(shape=(num_y, dim_x, dim_x)) 77 | 78 | mps = mfs.copy() 79 | pps = pfs.copy() 80 | 81 | m = m0 82 | p = p0 83 | 84 | # Filtering pass 85 | for k in range(num_y): 86 | 87 | # Pred 88 | m = F @ m 89 | p = F @ p @ F.T + Q 90 | 91 | mps[k] = m 92 | pps[k] = p 93 | 94 | # Update 95 | S = H @ p @ H.T + R 96 | K = p @ H.T / S 97 | m = m + K @ (y[k] - H @ m) 98 | p = p - K @ S @ K.T 99 | 100 | # Save 101 | mfs[k] = m 102 | pfs[k] = p 103 | 104 | # Smoothing pass 105 | mss = mfs.copy() 106 | pss = pfs.copy() 107 | for k in range(num_y - 2, -1, -1): 108 | (c, low) = scipy.linalg.cho_factor(pps[k + 1]) 109 | G = pfs[k] @ scipy.linalg.cho_solve((c, low), F).T 110 | mss[k] = mfs[k] + G @ (mss[k + 1] - mps[k + 1]) 111 | pss[k] = pfs[k] + G @ (pss[k + 1] - pps[k + 1]) @ G.T 112 | 113 | return mfs, pfs, mss, pss 114 | 115 | 116 | if __name__ == "__main__": 117 | 118 | np.random.seed(666666) 119 | 120 | plt.rcParams.update({ 121 | 'text.usetex': True, 122 | 'text.latex.preamble': r'\usepackage{fouriernc}', 123 | 'font.family': "serif", 124 | 'font.serif': 'New Century Schoolbook', 125 | 'font.size': 18}) 126 | anime_writer = animation.ImageMagickWriter() 127 | 128 | # Matern 3/2 coefficients 129 | ell = 2. 130 | sigma = 1. 
131 | 132 | A = np.array([[0., 1.], 133 | [-3 / ell ** 2, -2 * math.sqrt(3) / ell]]) 134 | B = np.array([0., sigma * math.sqrt(12 * math.sqrt(3)) / ell ** (3 / 2)]) 135 | 136 | m0 = np.zeros((2, )) 137 | p0 = np.array([[sigma ** 2, 0.], 138 | [0., 3 * sigma ** 2 / ell ** 2]]) 139 | 140 | dt = 0.1 141 | F, Q = lti_sde_to_disc(A, B, dt) 142 | 143 | H = np.array([[1., 0.]]) 144 | R = 0.1 145 | 146 | # Generate data 147 | T = 100 148 | ts = np.linspace(dt, T * dt, T) 149 | xs, ys = simulate_data_from_disc_ss(F, Q, H, R, m0, p0, T) 150 | 151 | # Filtering and smoothing 152 | mfs, pfs, mss, pss = kf_rts(F, Q, H, R, ys, m0, p0) 153 | 154 | # Animation for filtering 155 | # Updating polycollection in matplotlib is difficult, 156 | # the code in the following for fill_between might look ugly 157 | fig, ax = plt.subplots(figsize=(8.8, 6.6)) 158 | 159 | line_true, = ax.plot(ts[0], xs[0, 0], c='black', linewidth=3, label='True signal $X(t)$') 160 | sct_data = ax.scatter(ts[0], ys[0], s=4, c='purple', label='Measurement $Y_k$') 161 | line_mfs, = ax.plot(ts[0], mfs[0, 0], c='tab:blue', linewidth=3, label='Filtering mean') 162 | ax.fill_between( 163 | ts[:1], 164 | mfs[:1, 0] - 1.96 * np.sqrt(pfs[:1, 0, 0]), 165 | mfs[:1, 0] + 1.96 * np.sqrt(pfs[:1, 0, 0]), 166 | color='tab:blue', 167 | edgecolor='none', 168 | alpha=0.1, 169 | label='.95 confidence' 170 | ) 171 | k_line = ax.axvline(ts[0], c='black', linestyle='--') 172 | k_text = ax.text(ts[0], 0., '$k=0$', fontsize=18) 173 | 174 | ax.set_xlim(0, T * dt) 175 | ax.set_ylim(-2.5, 1.5) 176 | ax.legend(loc='upper left', ncol=2, fontsize=18) 177 | ax.set_xlabel('$t$', fontsize=18) 178 | 179 | plt.subplots_adjust(top=.986, bottom=.084, left=.063, right=.988) 180 | 181 | def anime_func(frame): 182 | line_true.set_data(ts[:frame], xs[:frame, 0]) 183 | line_mfs.set_data(ts[:frame], mfs[:frame, 0]) 184 | k_line.set_data((ts[frame-1], ts[frame-1]), (0, 1)) 185 | k_text.set_text(f'$k={frame}$') 186 | k_text.set_position((ts[frame], 0.)) 187 
| ax.collections.clear() 188 | ax.fill_between( 189 | ts[:frame], 190 | mfs[:frame, 0] - 1.96 * np.sqrt(pfs[:frame, 0, 0]), 191 | mfs[:frame, 0] + 1.96 * np.sqrt(pfs[:frame, 0, 0]), 192 | color='tab:blue', 193 | edgecolor='none', 194 | alpha=0.1 195 | ) 196 | ax.scatter(ts[:frame], ys[:frame], s=4, c='purple') 197 | 198 | ani = FuncAnimation(fig, anime_func, 199 | frames=T, interval=10, 200 | repeat=False) 201 | 202 | ani.save('../figs/animes/filter.png', writer=anime_writer) 203 | # plt.show() 204 | 205 | plt.close(fig) 206 | 207 | # Animation for smoothing 208 | fig, ax = plt.subplots(figsize=(8.8, 6.6)) 209 | 210 | ax.plot(ts, xs[:, 0], c='black', linewidth=3, label='True signal $X(t)$') 211 | ax.scatter(ts, ys, s=4, c='purple', label='Measurement $Y_k$') 212 | line_mss, = ax.plot(ts, mss[:, 0], c='tab:blue', linewidth=3, label='Smoothing mean') 213 | ax.fill_between( 214 | ts[:1], 215 | mss[:1, 0] - 1.96 * np.sqrt(pss[:1, 0, 0]), 216 | mss[:1, 0] + 1.96 * np.sqrt(pss[:1, 0, 0]), 217 | color='tab:blue', 218 | edgecolor='none', 219 | alpha=0.1, 220 | label='.95 confidence' 221 | ) 222 | k_line = ax.axvline(ts[0], c='black', linestyle='--') 223 | k_text = ax.text(ts[0], 0., '$k=0$', fontsize=18) 224 | 225 | ax.set_xlim(0, T * dt) 226 | ax.set_ylim(-2.5, 1.5) 227 | ax.legend(loc='upper left', ncol=2, fontsize=18) 228 | ax.set_xlabel('$t$', fontsize=18) 229 | 230 | plt.subplots_adjust(top=.986, bottom=.084, left=.063, right=.988) 231 | 232 | def anime_func(frame): 233 | line_mss.set_data(ts[:frame], mss[:frame, 0]) 234 | k_line.set_data((ts[frame-1], ts[frame-1]), (0, 1)) 235 | k_text.set_text(f'$k={frame}$') 236 | k_text.set_position((ts[frame], 0.)) 237 | ax.collections.clear() 238 | ax.fill_between( 239 | ts[:frame], 240 | mss[:frame, 0] - 1.96 * np.sqrt(pss[:frame, 0, 0]), 241 | mss[:frame, 0] + 1.96 * np.sqrt(pss[:frame, 0, 0]), 242 | color='tab:blue', 243 | edgecolor='none', 244 | alpha=0.1 245 | ) 246 | ax.scatter(ts, ys, s=4, c='purple') 247 | 248 | ani = 
FuncAnimation(fig, anime_func, 249 | frames=T, interval=10, 250 | repeat=False) 251 | 252 | ani.save('../figs/animes/smoother.png', writer=anime_writer) 253 | # plt.show() 254 | 255 | -------------------------------------------------------------------------------- /thesis_latex/ch6.tex: -------------------------------------------------------------------------------- 1 | %!TEX root = dissertation.tex 2 | \chapter{Summary and discussion} 3 | \label{chap:summary} 4 | In this chapter we present a concise summary of Publications I--VII as well as a discussion of a few unsolved problems and possible future extensions. 5 | 6 | \section{Summary of publications} 7 | This section briefly summarises the contributions of Publications~I--VII and highlights their significance. 8 | 9 | \subsection*{Publication~\cp{paperTME} (Chapter~\ref{chap:tme})} 10 | This paper proposes a new class of non-linear continuous-discrete Gaussian filters and smoothers by using the Taylor moment expansion (TME) scheme to predict the means and covariances from SDEs. The main significance of this paper is that the TME method can provide asymptotically exact solutions of the predictive mean and covariances required in the Gaussian filtering and smoothing steps. Secondly, the paper analyses the positive definiteness of TME covariance approximations and thereupon presents a few sufficient conditions to guarantee the positive definiteness. Lastly, the paper analyses the stability of TME Gaussian filters. 11 | 12 | \subsection*{Publication~\cp{paperSSDGP} (Chapter~\ref{chap:dssgp})} 13 | This paper introduces state-space representations of a class of deep Gaussian processes (DGPs). More specifically, the paper defines DGPs as vector-valued stochastic processes over collections of conditional GPs; thereupon, the paper represents DGPs as hierarchical systems of the SDE representations of their conditional GPs.
The main significance of this paper is that the resulting state-space DGPs (SS-DGPs) are Markov processes, so that the SS-DGP regression problem is computationally cheap (i.e., linear with respect to the number of measurements) by using continuous-discrete filtering and smoothing methods. Secondly, the paper identifies that for a certain class of SS-DGPs the Gaussian filtering and smoothing methods fail to learn the posterior distributions of their state components. Finally, the paper features a real application of SS-DGPs in modelling a gravitational wave signal. 14 | 15 | \subsection*{Publication~\cp{paperKFSECG} (Section~\ref{sec:spectro-temporal})} 16 | This paper is an extension of Publication~\cp{paperKFSECGCONF}. In particular, quasi-periodic SDEs are used to model the Fourier coefficients instead of the Ornstein--Uhlenbeck ones used in Publication~\cp{paperKFSECGCONF}. This consideration leads to state-space models for which the measurement representations are time-invariant; therefore, one can use steady-state Kalman filters and smoothers to solve the spectro-temporal estimation problem with lower computational cost compared to Publication~\cp{paperKFSECGCONF}. This paper also expands the experiments for atrial fibrillation detection by taking into account more classifiers. 17 | 18 | \subsection*{Publication~\cp{paperDRIFT} (Section~\ref{sec:drift-est})} 19 | This paper is concerned with the state-space GP approach for estimating unknown drift functions of SDEs from partially observed trajectories. This approach is significant mainly in terms of computation, as the computational complexity scales linearly in the number of measurements. In addition, the state-space GP approach allows for using high-order It\^{o}--Taylor expansions in order to give accurate SDE discretisations without the necessity to compute the covariance matrices of the derivatives of the GP prior.
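As an aside on the high-order It\^{o}--Taylor discretisation mentioned above, the following is a minimal, self-contained sketch (not code from the publication; the function names are illustrative) contrasting one Euler--Maruyama step with one strong order-1.5 It\^{o}--Taylor step for a scalar SDE dx = f(x) dt + L dW with additive noise, written in the standard Kloeden--Platen form:

```python
import numpy as np


def euler_maruyama_step(x, f, L, dt, dw):
    """One Euler--Maruyama step for dx = f(x) dt + L dW (additive noise)."""
    return x + f(x) * dt + L * dw


def ito_taylor_15_step(x, f, df, ddf, L, dt, dw, dz):
    """One strong order-1.5 Ito--Taylor step for additive noise.

    The increments must satisfy dw ~ N(0, dt), dz ~ N(0, dt^3 / 3)
    with Cov(dw, dz) = dt^2 / 2.
    """
    a = f(x)
    return (x + a * dt + L * dw
            + df(x) * L * dz
            + 0.5 * (a * df(x) + 0.5 * L ** 2 * ddf(x)) * dt ** 2)


def correlated_increments(rng, dt, n):
    """Sample n pairs (dW, dZ) with the joint Gaussian law required above."""
    xi1, xi2 = rng.standard_normal((2, n))
    dw = np.sqrt(dt) * xi1
    dz = 0.5 * dt ** 1.5 * (xi1 + xi2 / np.sqrt(3.0))
    return dw, dz
```

Setting the noise increments to zero exposes the deterministic gain: for the Ornstein--Uhlenbeck drift f(x) = -x with dt = 0.1, the order-1.5 step gives 0.905, much closer to the exact factor exp(-0.1) ≈ 0.90484 than the Euler step's 0.9.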
20 | 21 | \subsection*{Publication~\cp{paperKFSECGCONF} (Section~\ref{sec:spectro-temporal})} 22 | This paper introduces a state-space probabilistic spectro-temporal estimation method and thereupon applies the method for detecting atrial fibrillation from electrocardiogram signals. The so-called probabilistic spectro-temporal estimation is a GP regression-based model for estimating the coefficients of Fourier expansions. The main significance of this paper is that the state-space framework allows for dealing with large sets of measurements and high-order Fourier expansions. Also, the combination of the spectro-temporal estimation method and deep convolutional neural networks shows efficacy for classifying a class of electrocardiogram signals. 23 | 24 | \subsection*{Publication~\cp{paperMARITIME} (Section~\ref{sec:maritime})} 25 | This paper reviews sensor technologies and machine learning methods for autonomous maritime vessel navigation. In particular, the paper lists and reviews a number of studies that use deep learning and GP methods for vessel trajectory analysis, ship detection and classification, and ship tracking. The paper also features a ship detection example by using a deep convolutional neural network. 26 | 27 | \subsection*{Publication~\cp{paperRNSSGP} (Section~\ref{sec:l1-r-dgp})} 28 | This paper solves $L^1$-regularised DGP regression problems under the alternating direction method of multipliers (ADMM) framework. The significance of this paper is that one can introduce regularisation (e.g., sparseness or total variation) at any level of the DGP component hierarchy. Secondly, the paper provides a general framework that allows for regularising both batch and state-space DGPs. Finally, the paper presents a convergence analysis for the proposed ADMM solution of $L^1$-regularised DGP regression problems. 29 | 30 | \section{Discussion} 31 | Finally, we end this thesis with discussion on some unsolved problems and possible future extensions. 
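One practical note before the open problems: the discussion that follows observes that positive definiteness of approximated covariance matrices can be checked numerically when the state dimension is small. A minimal sketch of such a check (illustrative only; is_positive_definite is a hypothetical helper, not one of the thesis scripts):

```python
import numpy as np


def is_positive_definite(cov, tol=0.0):
    """Numerically check positive definiteness of a covariance matrix
    via the smallest eigenvalue of its symmetrised version."""
    sym = 0.5 * (cov + cov.T)  # guard against round-off asymmetry
    return bool(np.linalg.eigvalsh(sym).min() > tol)
```

For small dimensions this is cheap: np.linalg.eigvalsh exploits symmetry, and symmetrising first guards against the tiny asymmetries that moment approximations can introduce.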
32 | 33 | \subsection*{Positive definiteness analysis for high-order and high-dimensional TME covariance approximation} 34 | Theorem~\ref{thm:tme-cov-pd} provides a sufficient condition to guarantee the positive definiteness of TME covariance approximations. However, the use of Theorem~\ref{thm:tme-cov-pd} soon becomes infeasible as the expansion order $M$ and the state dimension $d$ grow large. In practice, it can be easier to check the positive definiteness numerically when $d$ is small. 35 | 36 | \subsection*{Practical implementation of TME} 37 | A practical challenge in implementing TME is the presence of derivative terms in $\A$ (see Equation~\eqref{equ:generator-ito}). This in turn implies that the iterated generator $\A^M$ further requires the computation of derivatives of the SDE coefficients up to order $M$. While the derivatives of $\A$ are easily computed by hand, the derivatives in $\A^M$ require more consideration as they involve numerous applications of the chain rule, not to mention the multidimensional operator $\Am$ in Remark~\ref{remark:multidim-generator}. 38 | 39 | While in our current implementation we chose to use symbolic differentiation (for ease of implementation as well as portability across languages), several things can be said against using it. Symbolic differentiation explicitly computes full Jacobians, whereas only vector-Jacobian/Jacobian-vector products would be necessary. This induces an unnecessary overhead that grows with the dimension of the problem. Also, symbolic differentiation is usually independent of the philosophy of modern differentiable programming frameworks and the optimisation for parallelisable hardware (e.g., GPUs); hence it may incur a loss of performance on these. 40 | 41 | Automatic differentiation tools, for instance TensorFlow and JAX, are amenable to computing the derivatives in $\Am$. Furthermore, they provide efficient computations for Jacobian-vector/vector-Jacobian products.
We hence argue that these tools are worthwhile for performance improvement in the future\footnote{By the time of the pre-examination of this thesis, the TME method had been implemented in JAX as an open-source library (see Section~\ref{sec:codes}).}. 42 | 43 | \subsection*{Generalisation of the identifiability analysis} 44 | The identifiability analysis in Section~\ref{sec:identi-problem} is limited to SS-DGPs for which the GP elements are one-dimensional. This dimension assumption is used in order to derive Equation~\eqref{equ:vanish-cov-eq1} in closed form. However, it is of interest to see whether we can generalise Lemma~\ref{lemma:vanishing-prior-cov} to SS-DGPs that have multidimensional GP elements. 45 | 46 | The abstract Gaussian filter in Algorithm~\ref{alg:abs-gf} assumes that the prediction steps are done exactly. However, this assumption may not always be realistic because Gaussian filters often involve numerical integrations to predict through SDEs, for example, by using sigma-point methods. Hence, it is important to verify whether Lemma~\ref{lemma:vanishing-prior-cov} still holds when one computes the filtering predictions by some numerical means. 47 | 48 | \subsection*{Spatio-temporal SS-DGPs} 49 | SS-DGPs are stochastic processes defined on temporal domains. In order to model spatio-temporal data, it is necessary to generalise SS-DGPs to take values in infinite-dimensional spaces~\citep{Giuseppe2014}. A path for this generalisation is to leverage the stochastic partial differential equation (SPDE) representations of spatio-temporal GPs.
To see this, let us consider an $\mathbb{H}$-valued stochastic process $U \colon \T \to \mathbb{H}$ governed by a well-defined SPDE 50 | % 51 | \begin{equation} 52 | \diff U(t) = A \, U(t) \diff t + B \diff W(t) \nonumber 53 | \end{equation} 54 | % 55 | with some boundary and initial conditions, where $A\colon \mathbb{H} \to \mathbb{H}$ and $B\colon \mathbb{W} \to \mathbb{H}$ are linear operators, and $W\colon \T \to \mathbb{W}$ is a $\mathbb{W}$-valued Wiener process. Then we can borrow the idea presented in Section~\ref{sec:ssdgp} to form a spatio-temporal SS-DGP by hierarchically composing such SPDEs of the form above. 56 | 57 | A different path for generalising SS-DGPs is shown by~\citet{Emzir2020}. Specifically, they build deep Gaussian fields based on the SPDE representations of \matern fields~\citep{Whittle1954, Lindgren2011}. However, we should note that this approach gives random fields instead of spatio-temporal processes. 58 | -------------------------------------------------------------------------------- /thesis_latex/fourier2.sty: -------------------------------------------------------------------------------- 1 | \def\fileversion{1.4}% 2 | \def\filedate{2005/01/01}% 3 | \NeedsTeXFormat{LaTeX2e}% 4 | \ProvidesPackage{fourier}% 5 | [\filedate\space\fileversion\space fourier-GUTenberg package]% 6 | \DeclareFontEncoding{FML}{}{} 7 | \DeclareFontSubstitution{FML}{futm}{m}{it} 8 | \DeclareFontEncoding{FMS}{}{} 9 | \DeclareFontSubstitution{FMS}{futm}{m}{n} 10 | %\DeclareFontEncoding{FMX}{}{} 11 | % \DeclareFontSubstitution{FMX}{futm}{m}{n} 12 | %% 13 | \newif\ifsloped\newif\ifpoorman\poormantrue 14 | \newif\ifwidespace\widespacefalse 15 | \DeclareOption{widespace}{\widespacetrue} 16 | %% 17 | \DeclareOption{poorman}{\def\textfamilyextension{s}% 18 | \def\mathfamilyextension{s}} 19 | \DeclareOption{expert}{\def\textfamilyextension{x}% 20 | \def\mathfamilyextension{x}\poormanfalse} 21 | \DeclareOption{oldstyle}{\def\textfamilyextension{j}% 22 | 
\def\mathfamilyextension{x}\poormanfalse} 23 | \DeclareOption{fulloldstyle}{\def\textfamilyextension{j}% 24 | \def\mathfamilyextension{j}\poormanfalse} 25 | \DeclareOption{sloped}{\slopedtrue} 26 | \DeclareOption{upright}{\slopedfalse} 27 | \ExecuteOptions{sloped,poorman} 28 | \ProcessOptions 29 | %% 30 | 31 | %% 32 | \ifwidespace 33 | \DeclareRobustCommand{\SetFourierSpace}{% 34 | \fontdimen2\font=1.23\fontdimen2\font} 35 | \fi 36 | \ifpoorman\else 37 | \newcommand*{\sbseries}{\fontseries{sb}\selectfont} 38 | \newcommand*{\blackseries}{\fontseries{eb}\selectfont} 39 | \newcommand*{\titleshape}{\fontshape{tt}\selectfont} 40 | \DeclareTextFontCommand{\textsb}{\sbseries}% 41 | \DeclareTextFontCommand{\textblack}{\blackseries}% 42 | \DeclareTextFontCommand{\texttitle}{\titleshape}% 43 | \newcommand*{\oldstyle}{\fontfamily{futj}\selectfont} 44 | \newcommand*{\lining}{\fontfamily{futx}\selectfont} 45 | \fi 46 | \renewcommand{\rmdefault}{fut\textfamilyextension} 47 | \RequirePackage[T1]{fontenc} 48 | \RequirePackage{textcomp} 49 | \RequirePackage{fourier-orns} 50 | \DeclareSymbolFont{operators}{T1}{fut\mathfamilyextension}{m}{n}% 51 | \SetSymbolFont{operators}{bold}{T1}{fut\mathfamilyextension}{b}{n}% 52 | 53 | % 54 | \def\addFourierGreekPrefix#1{other} 55 | \newcommand{\othergreek}[1]{\expandafter\csname\expandafter% 56 | \addFourierGreekPrefix\string#1\endcsname} 57 | % 58 | \ifsloped 59 | \DeclareSymbolFont{letters}{FML}{futmi}{m}{it}% 60 | \DeclareSymbolFont{otherletters}{FML}{futm}{m}{it} 61 | \SetSymbolFont{letters}{bold}{FML}{futmi}{b}{it} 62 | \SetSymbolFont{otherletters}{bold}{FML}{futm}{b}{it} 63 | \DeclareMathSymbol{\Gamma}{\mathord}{otherletters}{000} 64 | \DeclareMathSymbol{\Delta}{\mathord}{otherletters}{001} 65 | \DeclareMathSymbol{\Theta}{\mathord}{otherletters}{002} 66 | \DeclareMathSymbol{\Lambda}{\mathord}{otherletters}{003} 67 | \DeclareMathSymbol{\Xi}{\mathord}{otherletters}{004} 68 | \DeclareMathSymbol{\Pi}{\mathord}{otherletters}{005} 69 | 
\DeclareMathSymbol{\Sigma}{\mathord}{otherletters}{006} 70 | \DeclareMathSymbol{\Upsilon}{\mathord}{otherletters}{007} 71 | \DeclareMathSymbol{\Phi}{\mathord}{otherletters}{008} 72 | \DeclareMathSymbol{\Psi}{\mathord}{otherletters}{009} 73 | \DeclareMathSymbol{\Omega}{\mathord}{otherletters}{010} 74 | \DeclareMathSymbol{\otherGamma}{\mathord}{letters}{000} 75 | \DeclareMathSymbol{\otherDelta}{\mathord}{letters}{001} 76 | \DeclareMathSymbol{\otherTheta}{\mathord}{letters}{002} 77 | \DeclareMathSymbol{\otherLambda}{\mathord}{letters}{003} 78 | \DeclareMathSymbol{\otherXi}{\mathord}{letters}{004} 79 | \DeclareMathSymbol{\otherPi}{\mathord}{letters}{005} 80 | \DeclareMathSymbol{\otherSigma}{\mathord}{letters}{006} 81 | \DeclareMathSymbol{\otherUpsilon}{\mathord}{letters}{007} 82 | \DeclareMathSymbol{\otherPhi}{\mathord}{letters}{008} 83 | \DeclareMathSymbol{\otherPsi}{\mathord}{letters}{009} 84 | \DeclareMathSymbol{\otherOmega}{\mathord}{letters}{010} 85 | \else 86 | \DeclareSymbolFont{letters}{FML}{futm}{m}{it}% 87 | \DeclareSymbolFont{otherletters}{FML}{futmi}{m}{it} 88 | \SetSymbolFont{letters}{bold}{FML}{futm}{b}{it} 89 | \SetSymbolFont{otherletters}{bold}{FML}{futmi}{b}{it} 90 | \DeclareMathSymbol{\Gamma}{\mathord}{letters}{000} 91 | \DeclareMathSymbol{\Delta}{\mathord}{letters}{001} 92 | \DeclareMathSymbol{\Theta}{\mathord}{letters}{002} 93 | \DeclareMathSymbol{\Lambda}{\mathord}{letters}{003} 94 | \DeclareMathSymbol{\Xi}{\mathord}{letters}{004} 95 | \DeclareMathSymbol{\Pi}{\mathord}{letters}{005} 96 | \DeclareMathSymbol{\Sigma}{\mathord}{letters}{006} 97 | \DeclareMathSymbol{\Upsilon}{\mathord}{letters}{007} 98 | \DeclareMathSymbol{\Phi}{\mathord}{letters}{008} 99 | \DeclareMathSymbol{\Psi}{\mathord}{letters}{009} 100 | \DeclareMathSymbol{\Omega}{\mathord}{letters}{010} 101 | \DeclareMathSymbol{\otherGamma}{\mathord}{otherletters}{000} 102 | \DeclareMathSymbol{\otherDelta}{\mathord}{otherletters}{001} 103 | 
\DeclareMathSymbol{\otherTheta}{\mathord}{otherletters}{002} 104 | \DeclareMathSymbol{\otherLambda}{\mathord}{otherletters}{003} 105 | \DeclareMathSymbol{\otherXi}{\mathord}{otherletters}{004} 106 | \DeclareMathSymbol{\otherPi}{\mathord}{otherletters}{005} 107 | \DeclareMathSymbol{\otherSigma}{\mathord}{otherletters}{006} 108 | \DeclareMathSymbol{\otherUpsilon}{\mathord}{otherletters}{007} 109 | \DeclareMathSymbol{\otherPhi}{\mathord}{otherletters}{008} 110 | \DeclareMathSymbol{\otherPsi}{\mathord}{otherletters}{009} 111 | \DeclareMathSymbol{\otherOmega}{\mathord}{otherletters}{010} 112 | \fi 113 | \DeclareSymbolFont{symbols}{FMS}{futm}{m}{n}% 114 | %\DeclareSymbolFont{largesymbols}{FMX}{futm}{m}{n} 115 | \DeclareMathAlphabet{\mathbf}{T1}{fut\mathfamilyextension}{bx}{n}% 116 | \DeclareMathAlphabet{\mathrm}{T1}{fut\mathfamilyextension}{m}{n}% 117 | \DeclareMathAlphabet{\mathit}{T1}{fut\mathfamilyextension}{m}{it}% 118 | \DeclareMathAlphabet{\mathcal}{FMS}{futm}{m}{n}% 119 | \DeclareMathSymbol{\varkappa}{\mathord}{letters}{128} 120 | \DeclareMathSymbol{\varvarrho}{\mathord}{letters}{129} 121 | \DeclareMathSymbol{+}{\mathbin}{symbols}{128} 122 | \DeclareMathSymbol{=}{\mathrel}{symbols}{129} 123 | \DeclareMathSymbol{<}{\mathrel}{symbols}{130} 124 | \DeclareMathSymbol{>}{\mathrel}{symbols}{131} 125 | \DeclareMathSymbol{\leqslant}{\mathrel}{symbols}{132} 126 | \DeclareMathSymbol{\geqslant}{\mathrel}{symbols}{133} 127 | \DeclareMathSymbol{\parallelslant}{\mathrel}{symbols}{134} 128 | \DeclareMathSymbol{\thething}{\mathord}{symbols}{135} 129 | \DeclareMathSymbol{\vDash}{\mathrel}{symbols}{136} 130 | \DeclareMathSymbol{\blacktriangleleft}{\mathrel}{symbols}{137} 131 | \DeclareMathSymbol{\blacktriangleright}{\mathrel}{symbols}{138} 132 | \DeclareMathSymbol{\nleqslant}{\mathrel}{symbols}{139} 133 | \DeclareMathSymbol{\ngeqslant}{\mathrel}{symbols}{140} 134 | \DeclareMathSymbol{\parallel}{\mathrel}{symbols}{141} 135 | \DeclareMathSymbol{\nparallel}{\mathrel}{symbols}{142} 136 
| \DeclareMathSymbol{\nparallelslant}{\mathrel}{symbols}{143} 137 | \DeclareMathSymbol{\nvDash}{\mathrel}{symbols}{144} 138 | \DeclareMathSymbol{\intercal}{\mathbin}{symbols}{145} 139 | \DeclareMathSymbol{\hslash}{\mathord}{symbols}{146} 140 | \DeclareMathSymbol{\nexists}{\mathord}{symbols}{147} 141 | \DeclareMathSymbol{\complement}{\mathord}{symbols}{148} 142 | \DeclareMathSymbol{\varsubsetneq}{\mathrel}{symbols}{149} 143 | \DeclareMathSymbol{\xswordsup}{\mathord}{symbols}{150} 144 | \DeclareMathSymbol{\xswordsdown}{\mathord}{symbols}{151} 145 | \let\notin\@undefined 146 | \DeclareMathSymbol{\notin}{\mathrel}{symbols}{155} 147 | \DeclareMathSymbol{\notowns}{\mathrel}{symbols}{156} 148 | \DeclareMathSymbol{\hbar}{\mathord}{symbols}{157} 149 | \DeclareMathSymbol{\smallsetminus}{\mathbin}{symbols}{158} 150 | \DeclareMathSymbol{\subsetneqq}{\mathrel}{symbols}{159} 151 | \DeclareMathSymbol{\rightrightarrows}{\mathrel}{symbols}{160} 152 | \DeclareMathSymbol{\leftleftarrows}{\mathrel}{symbols}{161} 153 | \DeclareMathSymbol{\square}{\mathord}{symbols}{162} 154 | \DeclareMathSymbol{\curvearrowleft}{\mathrel}{symbols}{163} 155 | \DeclareMathSymbol{\curvearrowright}{\mathrel}{symbols}{164} 156 | \DeclareMathSymbol{\blacksquare}{\mathord}{symbols}{165} 157 | \DeclareMathSymbol{\otheralpha}{\mathord}{otherletters}{011} 158 | \DeclareMathSymbol{\otherbeta}{\mathord}{otherletters}{012} 159 | \DeclareMathSymbol{\othergamma}{\mathord}{otherletters}{013} 160 | \DeclareMathSymbol{\otherdelta}{\mathord}{otherletters}{014} 161 | \DeclareMathSymbol{\otherepsilon}{\mathord}{otherletters}{015} 162 | \DeclareMathSymbol{\otherzeta}{\mathord}{otherletters}{016} 163 | \DeclareMathSymbol{\othereta}{\mathord}{otherletters}{017} 164 | \DeclareMathSymbol{\othertheta}{\mathord}{otherletters}{018} 165 | \DeclareMathSymbol{\otheriota}{\mathord}{otherletters}{019} 166 | \DeclareMathSymbol{\otherkappa}{\mathord}{otherletters}{020} 167 | \DeclareMathSymbol{\otherlambda}{\mathord}{otherletters}{021} 
168 | \DeclareMathSymbol{\othermu}{\mathord}{otherletters}{022} 169 | \DeclareMathSymbol{\othernu}{\mathord}{otherletters}{023} 170 | \DeclareMathSymbol{\otherxi}{\mathord}{otherletters}{024} 171 | \DeclareMathSymbol{\otherpi}{\mathord}{otherletters}{025} 172 | \DeclareMathSymbol{\otherrho}{\mathord}{otherletters}{026} 173 | \DeclareMathSymbol{\othersigma}{\mathord}{otherletters}{027} 174 | \DeclareMathSymbol{\othertau}{\mathord}{otherletters}{028} 175 | \DeclareMathSymbol{\otherupsilon}{\mathord}{otherletters}{029} 176 | \DeclareMathSymbol{\otherphi}{\mathord}{otherletters}{030} 177 | \DeclareMathSymbol{\otherchi}{\mathord}{otherletters}{031} 178 | \DeclareMathSymbol{\otherpsi}{\mathord}{otherletters}{032} 179 | \DeclareMathSymbol{\otheromega}{\mathord}{otherletters}{033} 180 | \DeclareMathSymbol{\othervarepsilon}{\mathord}{otherletters}{034} 181 | \DeclareMathSymbol{\othervartheta}{\mathord}{otherletters}{035} 182 | \DeclareMathSymbol{\othervarpi}{\mathord}{otherletters}{036} 183 | \DeclareMathSymbol{\othervarrho}{\mathord}{otherletters}{037} 184 | \DeclareMathSymbol{\othervarsigma}{\mathord}{otherletters}{038} 185 | \DeclareMathSymbol{\othervarphi}{\mathord}{otherletters}{039} 186 | \DeclareMathSymbol{\varkappa}{\mathord}{letters}{128} 187 | \DeclareMathSymbol{\varvarrho}{\mathord}{letters}{129} 188 | \DeclareMathSymbol{\varpartialdiff}{\mathord}{letters}{130} 189 | \DeclareMathSymbol{\varvarpi}{\mathord}{letters}{131} 190 | \DeclareMathSymbol{\othervarkappa}{\mathord}{otherletters}{128} 191 | \DeclareMathSymbol{\othervarvarrho}{\mathord}{otherletters}{129} 192 | \DeclareMathSymbol{\othervarvarpi}{\mathord}{otherletters}{131} 193 | 194 | % No MathDelimiters! 
- MV 195 | 196 | \DeclareMathAccent{\acute}{\mathalpha}{operators}{1} 197 | \DeclareMathAccent{\grave}{\mathalpha}{operators}{0} 198 | \DeclareMathAccent{\ddot}{\mathalpha}{operators}{4} 199 | \DeclareMathAccent{\tilde}{\mathalpha}{operators}{3} 200 | \DeclareMathAccent{\bar}{\mathalpha}{operators}{9} 201 | \DeclareMathAccent{\breve}{\mathalpha}{operators}{8} 202 | \DeclareMathAccent{\check}{\mathalpha}{operators}{7} 203 | \DeclareMathAccent{\hat}{\mathalpha}{operators}{2} 204 | \DeclareMathAccent{\dot}{\mathalpha}{operators}{10} 205 | \DeclareMathAccent{\mathring}{\mathalpha}{operators}{6} 206 | \DeclareMathAccent{\wideparen}{\mathord}{largesymbols}{148} 207 | %% 208 | 209 | \DeclareMathAccent{\widearc}{\mathord}{largesymbols}{216} 210 | \DeclareMathAccent{\wideOarc}{\mathord}{largesymbols}{228} 211 | %% 212 | \def\defaultscriptratio{.76} 213 | \def\defaultscriptscriptratio{.6} 214 | \DeclareMathSizes{5} {6} {6} {6} 215 | \DeclareMathSizes{6} {6} {6} {6} 216 | \DeclareMathSizes{7} {6.8} {6} {6} 217 | \DeclareMathSizes{8} {8} {6.8}{6} 218 | \DeclareMathSizes{9} {9} {7.6}{6} 219 | \DeclareMathSizes{10} {10} {7.6}{6} 220 | \DeclareMathSizes{10.95}{10.95}{8.3}{6} 221 | \DeclareMathSizes{12} {12} {9} {7} 222 | \DeclareMathSizes{14.4} {14.4} {10} {8} 223 | \DeclareMathSizes{17.28}{17.28}{12} {9} 224 | \DeclareMathSizes{20.74}{20.74}{14.4}{10} 225 | \DeclareMathSizes{24.88}{24.88}{17.28}{12} 226 | \thinmuskip=2mu 227 | \medmuskip=2.5mu plus 1mu minus 2.5mu 228 | \thickmuskip=3.5mu plus 2.5mu 229 | %% 230 | \delimiterfactor850 231 | %% 232 | \DeclareFontFamily{U}{futm}{} 233 | \DeclareFontShape{U}{futm}{m}{n}{ 234 | <-> s * [.92] fourier-bb 235 | }{} 236 | \DeclareSymbolFont{Ufutm}{U}{futm}{m}{n} 237 | \DeclareSymbolFontAlphabet{\math@bb}{Ufutm} 238 | \AtBeginDocument{\let\mathbb\math@bb % 239 | 240 | \ifx\overset\@undefined\else 241 | \newcommand{\widering}[1]{\overset{\smash{\vbox to .2ex{% 242 | \hbox{$\mathring{}$}}}}{\wideparen{#1}}} 243 | \fi 244 | % 245 | 
\def\accentclass@{0} % I'm unsure whether this is ok 246 | } 247 | % 248 | % 249 | \endinput 250 | 251 | -------------------------------------------------------------------------------- /scripts/TME_estimation_benes.py: -------------------------------------------------------------------------------- 1 | # Demonstrate TME on a Benes model for some expectation approximations. This will generate Figure 3.1 in the thesis. 2 | # 3 | # Zheng Zhao 2020 4 | # 5 | import os 6 | import math 7 | import sympy 8 | import numpy as np 9 | import matplotlib.pyplot as plt 10 | import tme.base_sympy as tme 11 | 12 | from typing import Callable 13 | from sympy import lambdify 14 | 15 | 16 | def rieman1D(x: np.ndarray, 17 | f: Callable, 18 | *args, **kwargs): 19 | r"""Riemannian computation of an integral 20 | \int f(x, *args, **kwargs) dx \approx \sum f(x_i) (x_i - x_i-1). 21 | Can be replaced by :code:`np.trapz`. 22 | """ 23 | return np.sum(f(x[1:], *args, **kwargs) * np.diff(x)) 24 | 25 | 26 | def benesPDF(x: np.ndarray, 27 | x0: float, 28 | dt: float): 29 | """ 30 | Transition density of the Benes model. 31 | See, pp. 214 of Sarkka 2019. 32 | """ 33 | return 1 / math.sqrt(2 * math.pi * dt) * np.cosh(x) / np.cosh(x0) \ 34 | * math.exp(-0.5 * dt) * np.exp(-0.5 / dt * (x - x0) ** 2) 35 | 36 | 37 | def f_mean(x: np.ndarray, 38 | x0: float, 39 | dt: float): 40 | """Expectation integrand. 41 | """ 42 | return x * benesPDF(x, x0, dt) 43 | 44 | 45 | def f_x2(x: np.ndarray, 46 | x0: float, 47 | dt: float): 48 | """Expectation integrand. 49 | """ 50 | return x ** 2 * benesPDF(x, x0, dt) 51 | 52 | 53 | def f_x3(x: np.ndarray, 54 | x0: float, 55 | dt: float): 56 | """Expectation integrand. 57 | """ 58 | return x ** 3 * benesPDF(x, x0, dt) 59 | 60 | 61 | def f_nonlinear(x: np.ndarray, 62 | x0: float, 63 | dt: float): 64 | """Expectation integrand. 
65 | """ 66 | return np.sin(x) * benesPDF(x, x0, dt) 67 | 68 | 69 | def softplus(x): 70 | return np.log(1 + np.exp(x)) 71 | 72 | 73 | def softplus_sympy(x): 74 | return sympy.log(1 + sympy.exp(x)) 75 | 76 | 77 | def f_nn(x: np.ndarray, 78 | x0: float, 79 | dt: float): 80 | """ 81 | A toy-level neural network with a single perceptron 82 | NN(x) = sigmoid(x) 83 | """ 84 | return softplus(softplus(x)) * benesPDF(x, x0, dt) 85 | 86 | 87 | def em_mean(f: Callable, 88 | x0: float, 89 | dt: float): 90 | """E[x | x0] by Euler Maruyama 91 | """ 92 | return x0 + f(x0) * dt 93 | 94 | 95 | def em_cov(f: Callable, 96 | x0: float, 97 | dt: float): 98 | """Var[x | x0] by Euler Maruyama 99 | """ 100 | return dt 101 | 102 | 103 | def em_x3(f: Callable, 104 | x0: float, 105 | dt: float): 106 | """E[x^3 | x0] by Euler Maruyama 107 | """ 108 | return x0 ** 3 + 3 * x0 ** 2 * f(x0) * dt \ 109 | + 3 * x0 * dt + f(x0) ** 3 * dt ** 3 \ 110 | + 3 * f(x0) * dt ** 2 111 | 112 | 113 | def ito15_mean(f: Callable, 114 | dfdx: Callable, 115 | d2fdx2: Callable, 116 | x0: float, 117 | dt: float): 118 | """E[x | x0] by Ito-1.5 119 | """ 120 | return x0 + f(x0) * dt + (dfdx(x0) * f(x0) + 0.5 * d2fdx2(x0)) * dt ** 2 / 2 121 | 122 | 123 | def ito15_cov(f: Callable, 124 | dfdx: Callable, 125 | d2fdx2: Callable, 126 | x0: float, 127 | dt: float): 128 | """Cov[x | x0] by Ito-1.5 129 | """ 130 | return dt + dfdx(x0) ** 2 * dt ** 3 / 3 + dfdx(x0) * dt ** 2 131 | 132 | 133 | def ito15_x3(f: Callable, 134 | dfdx: Callable, 135 | d2fdx2: Callable, 136 | x0: float, 137 | dt: float): 138 | """E[x^3 | x0] by Ito-1.5 139 | """ 140 | z = x0 + f(x0) * dt + (dfdx(x0) * f(x0) + 0.5 * d2fdx2(x0)) * dt ** 2 / 2 141 | return z ** 3 + 3 * z * dt + 3 * z * dfdx(x0) * dt ** 2 + z * dfdx(x0) * dt ** 3 142 | 143 | 144 | tanh = lambda u: np.tanh(u) 145 | dtanh = lambda u: 1 - np.tanh(u) ** 2 146 | ddtanh = lambda u: 2 * (np.tanh(u) ** 3 - np.tanh(u)) 147 | 148 | if __name__ == '__main__': 149 | 150 | # Initial value and paras 
151 | np.random.seed(666) 152 | x0 = 0.5 153 | T = np.linspace(0.01, 4, 200) 154 | 155 | # Riemannian range 156 | range_dx = np.linspace(x0 - 20, x0 + 20, 100000) 157 | 158 | # Benes SDE 159 | x = sympy.MatrixSymbol('x', 1, 1) 160 | f = sympy.Matrix([sympy.tanh(x[0])]) 161 | L = sympy.eye(1) 162 | dt_sym = sympy.Symbol('dt') 163 | 164 | # TME 165 | tme_mean, tme_cov = tme.mean_and_cov(x, f, L, dt_sym, 166 | order=3, simp=True) 167 | tme_x3 = tme.expectation(sympy.Matrix([x[0] ** 3]), x, f, L, dt_sym, 168 | order=3, simp=True) 169 | tme_nonlinear3 = tme.expectation(sympy.Matrix([sympy.sin(x[0])]), x, f, L, dt_sym, 170 | order=2, simp=True) 171 | tme_nonlinear4 = tme.expectation(sympy.Matrix([sympy.sin(x[0])]), x, f, L, dt_sym, 172 | order=3, simp=True) 173 | tme_nn = tme.expectation(sympy.Matrix([softplus_sympy(softplus_sympy(x[0]))]), x, f, L, dt_sym, 174 | order=3, simp=True) 175 | 176 | tme_mean_func = lambdify([x, dt_sym], tme_mean, 'numpy') 177 | tme_cov_func = lambdify([x, dt_sym], tme_cov, 'numpy') 178 | tme_x3_func = lambdify([x, dt_sym], tme_x3, 'numpy') 179 | tme_nonlinear_func3 = lambdify([x, dt_sym], tme_nonlinear3, 'numpy') 180 | tme_nonlinear_func4 = lambdify([x, dt_sym], tme_nonlinear4, 'numpy') 181 | tme_nonlinear_nn = lambdify([x, dt_sym], tme_nn, 'numpy') 182 | 183 | # Result containers 184 | tme_mean_result = np.zeros_like(T) 185 | tme_cov_result = np.zeros_like(T) 186 | tme_x3_result = np.zeros_like(T) 187 | tme_nonlinear3_result = np.zeros_like(T) 188 | tme_nonlinear4_result = np.zeros_like(T) 189 | tme_nn_result = np.zeros_like(T) 190 | 191 | riem_mean_result = np.zeros_like(T) 192 | riem_cov_result = np.zeros_like(T) 193 | riem_x3_result = np.zeros_like(T) 194 | riem_nonlinear_result = np.zeros_like(T) 195 | riem_nn_result = np.zeros_like(T) 196 | 197 | em_mean_result = np.zeros_like(T) 198 | em_cov_result = np.zeros_like(T) 199 | em_x3_result = np.zeros_like(T) 200 | 201 | ito15_mean_result = np.zeros_like(T) 202 | ito15_cov_result = 
np.zeros_like(T) 203 | ito15_x3_result = np.zeros_like(T) 204 | 205 | for idx, t in enumerate(T): 206 | tme_mean_result[idx] = tme_mean_func(np.array([[x0]]), np.array([[t]])) 207 | tme_cov_result[idx] = tme_cov_func(np.array([[x0]]), np.array([[t]])) 208 | tme_x3_result[idx] = tme_x3_func(np.array([[x0]]), np.array([[t]])) 209 | tme_nonlinear3_result[idx] = tme_nonlinear_func3(np.array([[x0]]), np.array([[t]])) 210 | tme_nonlinear4_result[idx] = tme_nonlinear_func4(np.array([[x0]]), np.array([[t]])) 211 | tme_nn_result[idx] = tme_nonlinear_nn(np.array([[x0]]), np.array([[t]])) 212 | 213 | riem_mean_result[idx] = rieman1D(range_dx, f_mean, x0=x0, dt=t) 214 | riem_cov_result[idx] = rieman1D(range_dx, f_x2, x0=x0, dt=t) - riem_mean_result[idx] ** 2 215 | riem_x3_result[idx] = rieman1D(range_dx, f_x3, x0=x0, dt=t) 216 | riem_nonlinear_result[idx] = rieman1D(range_dx, f_nonlinear, x0=x0, dt=t) 217 | riem_nn_result[idx] = rieman1D(range_dx, f_nn, x0=x0, dt=t) 218 | 219 | em_mean_result[idx] = em_mean(lambda z: np.tanh(z), x0, t) 220 | em_cov_result[idx] = em_cov(lambda z: np.tanh(z), x0, t) 221 | em_x3_result[idx] = em_x3(lambda z: np.tanh(z), x0, t) 222 | 223 | ito15_mean_result[idx] = ito15_mean(tanh, dtanh, ddtanh, x0, t) 224 | ito15_cov_result[idx] = ito15_cov(tanh, dtanh, ddtanh, x0, t) 225 | ito15_x3_result[idx] = ito15_x3(tanh, dtanh, ddtanh, x0, t) 226 | 227 | # Plot 228 | path_figs = '../thesis/figs' 229 | plt.rcParams.update({ 230 | 'text.usetex': True, 231 | 'text.latex.preamble': r'\usepackage{fouriernc}', 232 | 'font.family': "serif", 233 | 'font.serif': 'New Century Schoolbook', 234 | 'font.size': 15}) 235 | 236 | fig, axs = plt.subplots(nrows=4, ncols=1, figsize=(9, 12), sharex=True) 237 | 238 | # No need to show the mean because the results are identical 239 | # plt.figure() 240 | # plt.plot(T, tme_mean_result, label='TME') 241 | # plt.plot(T, riem_mean_result, label='Exact') 242 | # plt.plot(T, em_mean_result, label='EM') 243 | # plt.plot(T, 
ito15_mean_result, label='Ito15') 244 | # plt.legend() 245 | # plt.savefig(os.path.join(path_figs, 'tme-benes-mean.pdf')) 246 | 247 | # Variance 248 | axs[0].plot(T, riem_cov_result, 249 | c='black', 250 | linewidth=3, marker='+', markevery=20, 251 | markersize=17, 252 | label='Exact') 253 | axs[0].plot(T, tme_cov_result, 254 | c='tab:blue', 255 | linewidth=3, marker='.', markevery=20, 256 | markersize=17, 257 | label='TME-3') 258 | axs[0].plot(T, em_cov_result, 259 | c='tab:red', 260 | linewidth=3, marker='1', markevery=20, 261 | markersize=17, 262 | label='Euler--Maruyama') 263 | axs[0].plot(T, ito15_cov_result, 264 | c='tab:purple', 265 | linewidth=3, marker='2', markevery=20, 266 | markersize=17, 267 | label=r'It\^{o}-1.5') 268 | 269 | axs[0].grid(linestyle='--', alpha=0.3, which='both') 270 | 271 | axs[0].set_ylabel(r'$\mathrm{Var}\,[X(t) \mid X(t_0)]$') 272 | axs[0].legend(loc='upper left', fontsize=17) 273 | 274 | # X^3 275 | axs[1].plot(T, riem_x3_result, 276 | c='black', 277 | linewidth=3, marker='+', markevery=20, 278 | markersize=17, 279 | label='Exact') 280 | axs[1].plot(T, tme_x3_result, 281 | c='tab:blue', 282 | linewidth=3, marker='.', markevery=20, 283 | markersize=17, 284 | label='TME-3') 285 | axs[1].plot(T, em_x3_result, 286 | c='tab:red', 287 | linewidth=3, marker='1', markevery=20, 288 | markersize=17, 289 | label='Euler--Maruyama') 290 | axs[1].plot(T, ito15_x3_result, 291 | c='tab:purple', 292 | linewidth=3, marker='2', markevery=20, 293 | markersize=17, 294 | label=r'It\^{o}-1.5') 295 | 296 | axs[1].grid(linestyle='--', alpha=0.3, which='both') 297 | 298 | axs[1].set_ylabel(r'$\mathbb{E} \, [X^3(t) \mid X(t_0)]$') 299 | axs[1].legend(loc='upper left', fontsize=17) 300 | 301 | # Nonlinear function only by TME 302 | axs[2].plot(T, riem_nonlinear_result, 303 | c='black', 304 | linewidth=3, marker='+', markevery=20, 305 | markersize=17, 306 | label='Exact') 307 | axs[2].plot(T, tme_nonlinear3_result, 308 | c='tab:purple', 309 | linewidth=3, 
marker='.', markevery=20, 310 | markersize=17, 311 | label='TME-2') 312 | axs[2].plot(T, tme_nonlinear4_result, 313 | c='tab:blue', 314 | linewidth=3, marker='x', markevery=20, 315 | markersize=17, 316 | label='TME-3') 317 | 318 | axs[2].grid(linestyle='--', alpha=0.3, which='both') 319 | axs[2].set_ylim(-6, 4) 320 | 321 | axs[2].set_ylabel(r'$\mathbb{E}\, [\sin(X(t)) \mid X(t_0)]$') 322 | axs[2].legend(loc='lower left', fontsize=17) 323 | 324 | # Neural network 325 | axs[3].plot(T, riem_nn_result, 326 | c='black', 327 | linewidth=3, marker='+', markevery=20, 328 | markersize=17, 329 | label='Exact') 330 | axs[3].plot(T, tme_nn_result, 331 | c='tab:blue', 332 | linewidth=3, marker='.', markevery=20, 333 | markersize=17, 334 | label='TME-3') 335 | 336 | axs[3].grid(linestyle='--', alpha=0.3, which='both') 337 | # plt.ylim(-1, 1) 338 | 339 | axs[3].set_xlim(0, 4) 340 | 341 | axs[3].set_xlabel('$t$', fontsize=16) 342 | axs[3].set_ylabel(r'$\mathbb{E}\, [\log(1+\exp(\log(1+\exp(X(t))))) \mid X(t_0)]$') 343 | axs[3].legend(loc='upper left', fontsize=17) 344 | 345 | plt.tight_layout(pad=0.1) 346 | plt.savefig(os.path.join(path_figs, 'tme-benes-all.pdf')) 347 | # plt.show() 348 | -------------------------------------------------------------------------------- /thesis_latex/ch1.tex: -------------------------------------------------------------------------------- 1 | %!TEX root = dissertation.tex 2 | \chapter{Introduction} 3 | \label{chap:intro} 4 | In signal processing, statistics, and machine learning, it is common to consider that noisy measurements/data are generated from a latent, unknown, function. In statistics, this is often regarded as a regression problem over the space of functions. Specifically, Bayesian statistics impose a prior belief over the latent function of interest in the form of a probability distribution. It is therefore of vital importance to choose the prior appropriately, since it will encode the characteristics of the underlying function. 
In recent decades, Gaussian processes\footnote{In the statistics and applied probability literature, Gaussian processes can also be found under the name of Gaussian fields, in particular when they are multidimensional in the input. Depending on the context, we may use one or the other terminology interchangeably.}~\citep[GPs,][]{Carl2006GPML} have become a popular family of prior distributions over functions, and they have been used successfully in numerous applications~\citep{Roberts2013, Hennig2015, Kocijan2016}. 5 | 6 | Formally, GPs are function-valued random variables that have Gaussian distributions fully determined by their mean and covariance functions. The choice of mean and covariance functions is in itself arbitrary, which allows for representing functions with various properties. As an example, \matern covariance functions are used as priors for functions with different degrees of differentiability~\citep{Carl2006GPML}. However, the use of GPs in practice usually involves two main challenges. 7 | 8 | The first challenge lies in the expensive \textit{computational cost} of training and parameter estimation. Due to the necessity of inverting covariance matrices during the learning phase, the computational complexity of standard GP regression and parameter estimation is cubic in the number of measurements. This makes GPs computationally infeasible for large-scale datasets. Moreover, when the sampled data points are densely located, the covariance matrices that need inversion may happen to be numerically singular or close to singular, making the learning process unstable. 9 | 10 | The second challenge is related to the modelling of irregular functions, such as piecewise smooth functions, or functions that have time-varying features (e.g., frequency or volatility).
Many commonly-used GPs (e.g., with \matern covariance functions) fail to cover these irregular functions mainly because their probability distributions are invariant under translation (i.e., they are said to be \textit{stationary}). This behaviour is illustrated in Figure~\ref{fig:gp-fail}, where we show that a \matern GP poorly fits two irregular functions (i.e., a rectangular signal and a composite sinusoidal signal), because the GP's parameters/features are assumed to be constant over time. Specifically, in the rectangular signal example, in order to model the discontinuities, the \matern GP recovers a small global length scale ($\ell \approx 0.04$), which results in a poor fit in the continuous and flat parts. Similarly, in the composite sinusoidal signal example, the GP learns a small global length scale ($\ell \approx 0.01$) in order to model the high-frequency sections of the signal. This, too, results in a poor fit in the low-frequency section of the signal. 11 | 12 | \begin{figure}[t!] 13 | \centering 14 | \includegraphics[width=.95\linewidth]{figs/gp-fail-example-m32}\\ 15 | \includegraphics[width=.95\linewidth]{figs/gp-fail-example-sinu-m32} 16 | \caption{Mat\'{e}rn $\nu=3\,/\,2$ GP regression on a magnitude-varying rectangular signal (top) and a composite sinusoidal signal (bottom). The parameters $\ell$ and $\sigma$ are learnt by maximum likelihood estimation. The figures are taken from~\citet{Zhao2020SSDGP}.} 17 | \label{fig:gp-fail} 18 | \end{figure} 19 | 20 | The main aim of this thesis is thus to introduce a new class of non-stationary (Gaussian) Markov processes that we name \textit{state-space deep Gaussian processes (SS-DGPs)}\footnote{Please note that although the name includes the term Gaussian, SS-DGPs are typically not Gaussian distributed, but instead hierarchically conditionally Gaussian, hence the name.}.
These are able to address the aforementioned computational and non-stationarity challenges by hierarchically composing the state-space representations of GPs. Indeed, SS-DGPs are computationally efficient models due to their Markovian structure. More precisely, this means that the resulting regression problem can be solved in linear computational time (with respect to the number of measurements) by using Bayesian filtering and smoothing methods. Moreover, due to their hierarchical nature, SS-DGPs are capable of changing their features/characteristics (e.g., length scale) over time, thereby inducing a rich class of priors compatible with irregular functions. The thesis ends with a collection of applications of state-space (deep) GPs. 21 | 22 | \section{Bibliographical notes} 23 | \label{sec:literature-review} 24 | In this section, we provide a short and non-exhaustive review of related works in the GP literature. In particular, we focus on works that specifically aim at reducing the computational complexity of GPs and at introducing non-stationarity in GPs. 25 | 26 | \subsection*{Scalable Gaussian processes} 27 | We now give a list of GP methods and approximations that are commonly used to reduce the computational costs of GP regression and parameter learning. 28 | 29 | \subsubsection*{Sparse approximations of Gaussian processes} 30 | Sparse GPs approximate full-rank GPs with sparse representations by using, for example, inducing points~\citep{Snelson2006}, subsets of data~\citep{Snelson2007, Csato2002}, or approximations of marginal likelihoods~\citep{Titsias2009}, mostly relying on so-called pseudo-inputs. These approaches can reduce the computational complexity to quadratic in the number of pseudo-inputs and linear in the number of data points. In practice, the number and position of the pseudo-inputs used in the sparse representation must either be assigned by human experts or learnt from data~\citep{Hensman2013}.
For more comprehensive reviews of sparse GPs, see, for example,~\citet{Quinonero2005unifying, Chalupka2013, LiuHaitao2020}. 31 | 32 | \subsubsection*{Gaussian Markov random fields} 33 | Gaussian Markov random fields~\citep[GMRFs,][]{Rue2005Book} are indexed collections of Gaussian random variables that have a Markov property (defined on a graph). They are computationally efficient models because their precision matrices are sparse by construction. Methodologies for solving the regression and parameter learning problems on GMRFs can be found, for example, in~\citet{Rue2007, Rue2009N}. However, GMRFs are usually only approximations of Gaussian fields~\citep[see, e.g.,][Chapter 5]{Rue2005Book}, although explicit representations exist for some specific Gaussian fields~\citep{Lindgren2011}. 34 | 35 | \subsubsection*{State-space representations of Gaussian processes} 36 | State-space Gaussian processes (SS-GPs) are (temporal) Markov GPs that are solutions of stochastic differential equations~\citep[SDEs,][]{Simo2013SSGP, Sarkka2019}. Due to their Markovian structure, the probability distributions of SS-GPs factorise sequentially in the time dimension. The regression problem can therefore be solved efficiently in linear time with respect to the number of data points. Moreover, leveraging the sparse structure of the precision matrix~\citep{Grigorievskiy2017} or the associativity of the Kalman filtering and smoothing operations~\citep{Corenflos2021SSGP} can lead to a sublinear computational complexity. 37 | 38 | \subsubsection*{Other data-scalable Gaussian processes} 39 | \citet{Rasmussen2002GPexperts, Meeds2006} form mixtures of GPs by splitting the dataset into batches, resulting in a computational complexity that is cubic in the batch size. This methodology can further be made parallel \citep{ZhangMinyi2019}. \citet{Lazaro2010} approximate stationary GPs with sparse spectral representations (i.e., trigonometric expansions).
\citet{Gardner2018} and \citet{KeWang2019} use conjugate gradients and stochastic trace estimation to efficiently compute the marginal log-likelihood of standard GPs, as well as their gradients with respect to parameters, resulting in a quadratic computational complexity in the number of data points. 40 | 41 | \subsection*{Non-stationary Gaussian processes} 42 | Below, we give a list of methods that have been introduced to induce non-stationarity in GPs. 43 | 44 | \subsubsection*{Non-stationary covariance function-based Gaussian processes} 45 | Non-stationary covariance functions can be constructed by making their parameters (e.g., length scale or magnitude) depend on the data position. For instance, \citet{Gibbs} and \citet{Higdon1999non} present specific examples of covariance functions where the length scale parameter depends on the spatial location. On the other hand, \citet{Paciorek2004, Paciorek2006} generalise these constructions to turn any stationary covariance function into a non-stationary one. Other non-stationary covariance functions exist, such as the polynomial or neural network covariance functions~\citep{Williams1998, Carl2006GPML}, but we do not review them here as they are not within the scope of this thesis. 46 | 47 | \subsubsection*{Composition-based Gaussian processes} 48 | \citet{Sampson1992, Schmidt2003, Carl2006GPML} show that it is possible to construct a non-stationary GP as the pullback of an existing stationary GP by a non-linear transformation. Formally, given a stationary GP $U\colon E \to \R$, one can find a suitable transformation $\Upsilon\colon \T \to E$, such that the composition $U \circ \Upsilon\colon \T\to\R$ is a non-stationary GP on $\T$. For example, \citet{Calandra2016ManifoldGP} and~\citet{Wilson2016DeepKernel} choose $\Upsilon$ as neural networks.
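To make the pullback construction above concrete, here is a minimal NumPy sketch (not part of this repository's scripts) that builds the non-stationary covariance $k_{\mathrm{ns}}(s, t) = k(\Upsilon(s), \Upsilon(t))$ from a stationary RBF covariance $k$; the warping $\Upsilon(t) = \tanh(3(t - 1/2))$ is a hypothetical choice made purely for illustration.

```python
import numpy as np


def k_rbf(s, t, ell=1.0, sigma=1.0):
    # Stationary RBF covariance k(s, t) = sigma^2 * exp(-(s - t)^2 / (2 * ell^2)).
    return sigma ** 2 * np.exp(-0.5 * (s - t) ** 2 / ell ** 2)


def warp(t):
    # Hypothetical monotone warping Upsilon: T -> E (any smooth map would do).
    return np.tanh(3.0 * (t - 0.5))


def ns_cov(ts, ell=1.0, sigma=1.0):
    # Covariance of the pullback process U o Upsilon:
    # k_ns(s, t) = k(Upsilon(s), Upsilon(t)), which is non-stationary in t.
    u = warp(ts)
    return k_rbf(u[:, None], u[None, :], ell, sigma)


ts = np.linspace(0.0, 1.0, 200)
C = ns_cov(ts, ell=0.3)

# Draw one sample path of the warped GP (a small jitter keeps Cholesky stable).
chol = np.linalg.cholesky(C + 1e-8 * np.eye(ts.size))
sample = chol @ np.random.default_rng(0).standard_normal(ts.size)
```

Because this $\Upsilon$ stretches distances around $t = 1/2$ and compresses them near the endpoints, sample paths of the warped process fluctuate faster in the middle of the interval even though the underlying covariance $k$ is stationary.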
49 | 50 | \subsubsection*{Warping-based Gaussian processes} 51 | Conversely to the composition paradigm above, it is also possible to transform GPs the other way around, that is, to regard GPs as transformations of non-Gaussian processes by non-linear functions~\citep{Snelson2004}. Computing the marginal log-likelihood function of these warped GPs is then done by leveraging the change-of-variables formula for Lebesgue integrals (when it applies). However, the warping can be computationally demanding, as the change-of-variables formula requires computing the inverse of the transformation and the determinant of its Jacobian. This issue can be mitigated, for example, by composing the warping from multiple layers of elementary functions which have explicit inverses~\citep{Rios2019}. 52 | 53 | 54 | \subsection*{Deep Gaussian processes} 55 | The deterministic constructions for introducing non-stationarity in GPs can be further extended in order to give a class of non-stationary non-Gaussian processes that can also represent irregular functions. While they are different in structure, the three subclasses of models presented below are usually all referred to as deep Gaussian processes (DGPs) in the literature. 56 | 57 | \subsubsection*{Composition-based deep Gaussian processes} 58 | \citet{Gredilla2012} extends the aforementioned pullback idea by taking $\Upsilon\colon \T\to E$ to be a GP instead of a deterministic mapping in order to overcome the overfitting problem. The resulting compositions of the form $U \circ \Upsilon\colon \T\to\R$ need not be GPs anymore but may provide a more flexible family of priors than that of deterministic compositions. This construction can be done recursively, leading to a subclass of DGPs~\citep{Damianou2013}. However, the training of these DGPs is found to be challenging and requires approximate inference methods~\citep{Bui2016, Salimbeni2017Doubly}.
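A single sample of such a two-layer composition $U \circ \Upsilon$ can be drawn in a few lines (an illustrative sketch, not code from the thesis; the RBF kernels and their length scales for both layers are arbitrary choices):

```python
# Sketch of one sample of a two-layer composition-based DGP: draw a GP
# warping Upsilon on a grid, then draw the outer GP U at the warped inputs,
# giving one realisation of V = U composed with Upsilon.
import numpy as np

def rbf_gram(x, ell=1.0, sigma=1.0):
    """Gram matrix of the RBF kernel evaluated at the points x."""
    d = x[:, None] - x[None, :]
    return sigma**2 * np.exp(-0.5 * d**2 / ell**2)

rng = np.random.default_rng(1)
ts = np.linspace(0.0, 2.0, 100)
jitter = 1e-8 * np.eye(ts.size)   # numerical stabiliser for the Gram matrices

# Inner layer: warping Upsilon ~ GP(0, C) evaluated on the grid.
upsilon = rng.multivariate_normal(np.zeros(ts.size), rbf_gram(ts, ell=0.5) + jitter)

# Outer layer: U evaluated at the warped inputs Upsilon(t).
v = rng.multivariate_normal(np.zeros(ts.size), rbf_gram(upsilon, ell=0.3) + jitter)
```

Because the inner layer is random, repeating this with fresh seeds produces samples whose local behaviour varies from draw to draw, which is exactly the extra flexibility over a deterministic warping.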
Moreover, \citet{Duvenaud2014Thesis, Duvenaud2014} show that increasing the depth of DGPs can lead to a representation pathology, where samples of DGPs tend, with high probability, to be flat and to exhibit sudden jumps. This problem can be mitigated by making the latent GP components explicitly depend on their original inputs~\citep{Duvenaud2014}. 59 | 60 | \subsubsection*{Hierarchical parametrisation-based deep Gaussian processes} 61 | A similar idea to compositional DGPs is to model the parameters of GPs as latent GPs. The posterior distribution of the joint model can then be computed by successive applications of Bayes' rule. As an example,~\citet{Roininen2016} consider putting a GP prior on the length scale parameter of a \matern GP and use Metropolis-within-Gibbs to sample from the posterior distribution. Similarly,~\citet{Salimbeni2017ns} model the length scale parameter of the non-stationary covariance function introduced by~\citet{Paciorek2004} as a GP, but use a variational approximation of its posterior distribution. Other sampling techniques to recover the posterior distribution of these models can be found, for example, in~\citet{Heinonen2016, Karla2020}. 62 | 63 | \citet{Zhao2020SSDGP} and~\citet{Emzir2020} show that this hierarchy in parametrisation can be applied recursively, leading to another subclass of DGPs that can be represented by stochastic (partial) differential equations. The relationship between the composition-based and parametrisation-based DGPs is also briefly discussed in~\citet{Dunlop2018JMLR}. 64 | 65 | \section{Reproducibility} 66 | \label{sec:codes} 67 | In order to allow for reproducibility of our work, we provide the following implementations. 68 | 69 | \begin{itemize} 70 | \item Taylor moment expansion (Chapter~\ref{chap:tme}). Python and Matlab codes for it are available at \href{https://github.com/zgbkdlm/tme}{https://github.com/zgbkdlm/tme}. 71 | \item State-space deep Gaussian processes (Chapter~\ref{chap:dssgp}).
Python and Matlab codes for it are available at \href{https://github.com/zgbkdlm/ssdgp}{https://github.com/zgbkdlm/ssdgp}. 72 | \item The Python codes for reproducing Figures~\ref{fig:gp-fail}, \ref{fig:disc-err-dgp-m12}, \ref{fig:ssdgp-vanishing-cov}, and~\ref{fig:spectro-temporal-demo}, as well as the simulations in Examples~\ref{example-tme-benes}, \ref{example:tme-softplus}, \ref{example:ssdgp-m12}, and~\ref{example:ssdgp-m32} are available at \href{https://github.com/zgbkdlm/dissertation}{https://github.com/zgbkdlm/dissertation}. 73 | \end{itemize} 74 | 75 | \section{Outline of the thesis} 76 | \label{sec:outline} 77 | This thesis consists of seven publications together with overviews of them, and it is organised as follows. 78 | 79 | In Chapter~\ref{chap:cd-smoothing} we review stochastic differential equations (SDEs) and Bayesian continuous-discrete filtering and smoothing (CD-FS) problems. This chapter lays out the preliminary definitions and results that are needed in the rest of the thesis. 80 | 81 | Chapter~\ref{chap:tme} (related to Publication~\cp{paperTME}) shows how to solve Gaussian approximated CD-FS problems by using the Taylor moment expansion (TME) method. This chapter also features some numerical demonstrations and analyses the positive definiteness of TME covariance approximations as well as the stability of TME Gaussian filters. 82 | 83 | Chapter~\ref{chap:dssgp} (related to Publications~\cp{paperSSDGP} and~\cp{paperRNSSGP}) introduces SS-DGPs. In particular, we first define DGPs formally and introduce their state-space representations. Secondly, we present how to sample from SS-DGPs by combining discretisation and numerical integration. Thirdly, we illustrate the construction of SS-DGPs in the \matern sense. Fourthly, we represent SS-DGP regression problems as CD-FS problems that we can then solve using the methods introduced in Chapter~\ref{chap:cd-smoothing}.
Finally, we explain how DGPs can be regularised in the $L^1$ sense, in particular to promote sparsity at any level of the DGP component hierarchy. 84 | 85 | Chapter~\ref{chap:apps} (related to Publications~\cp{paperDRIFT},~\cp{paperKFSECG},~\cp{paperKFSECGCONF},~\cp{paperSSDGP}, and~\cp{paperMARITIME}) introduces various applications of state-space (deep) GPs. These include estimation of the drift functions in SDEs, probabilistic spectro-temporal signal analysis, as well as modelling real-world signals (from astrophysics, human motion, and maritime navigation) with SS-DGPs. 86 | 87 | Finally, Chapter~\ref{chap:summary} offers a summary of the contributions of the seven publications presented in this thesis, and concludes with a discussion of unsolved problems and possible future extensions. 88 | -------------------------------------------------------------------------------- /thesis_latex/dissertation.tex: -------------------------------------------------------------------------------- 1 | % Metadata for pdfx 2 | \RequirePackage{filecontents} 3 | \begin{filecontents*}{dissertation.xmpdata} 4 | \Title{State-space deep Gaussian processes with applications} 5 | \Author{Zheng Zhao} 6 | \Subject{State-space methods for deep Gaussian processes} 7 | \Keywords{Gaussian processes, machine learning, state space, stochastic differential equations, stochastic filtering} 8 | \end{filecontents*} 9 | 10 | \documentclass[dissertation,final,vertlayout,pdfa,nologo,math]{aaltoseries} 11 | 12 | % Kludge to make sure we have utf8 input (check that this file is utf8!) 
13 | \makeatletter 14 | \@ifpackageloaded{inputenc}{% 15 | \inputencoding{utf8}}{% 16 | \usepackage[utf8]{inputenc}} 17 | \makeatother 18 | 19 | % hyperref is pre-loaded by aaltoseries 20 | \hypersetup{bookmarks=true, colorlinks=false, pagebackref=true, hypertexnames=true, hidelinks} 21 | 22 | % Enable backref especially for bibliography 23 | \usepackage[hyperpageref]{backref} 24 | \renewcommand*{\backref}[1]{} 25 | \renewcommand*{\backrefalt}[4]{{ 26 | \ifcase #1 Not cited.% 27 | \or Cited on page~#2.% 28 | \else Cited on pages #2.% 29 | \fi% 30 | }} 31 | 32 | \usepackage[english]{babel} 33 | \usepackage{amsmath,amsthm,amssymb,bm} 34 | % Adjustment set by the aaltoseries developers 35 | \interdisplaylinepenalty=2500 36 | \renewcommand*{\arraystretch}{1.2} 37 | \setlength{\jot}{8pt} 38 | 39 | % Enable the following to suppress page headers and numbers on 40 | % content-less left (even-numbered) pages. Fixes a bug in aaltoseries 41 | \usepackage{emptypage} 42 | 43 | \usepackage[SchoolofEngineering]{aaltologo} 44 | 45 | \usepackage{CJKutf8} 46 | \usepackage[round, authoryear]{natbib} 47 | \usepackage{graphicx} 48 | \usepackage{mathtools} 49 | \usepackage[shortlabels]{enumitem} 50 | 51 | \usepackage{tikz} 52 | \usetikzlibrary{fadings} 53 | \usetikzlibrary{patterns} 54 | \usetikzlibrary{shadows.blur} 55 | \usetikzlibrary{shapes} 56 | 57 | \newcommand{\thmenvcounter}{chapter} 58 | \input{zmacro.tex} 59 | 60 | \newcommand{\zz}[1]{{\color{red} #1}} 61 | 62 | \newcommand*{\hilite}[1]{ 63 | \setlength{\fboxsep}{3mm}% 64 | \begin{center}\colorbox{orange}{\parbox{0.9\columnwidth}{\textit{#1}}}\end{center}% 65 | } 66 | 67 | \author{Zheng Zhao} 68 | \title{State-Space Deep Gaussian Processes with Applications} 69 | 70 | \begin{document} 71 | 72 | \includepdf[pages=-]{title-pages/title-pages.pdf} 73 | 74 | \draftabstract{This thesis is mainly concerned with state-space approaches for solving deep (temporal) Gaussian process (DGP) regression problems. 
More specifically, we represent DGPs as hierarchically composed systems of stochastic differential equations (SDEs), and we consequently solve the DGP regression problem by using state-space filtering and smoothing methods. The resulting state-space DGP (SS-DGP) models generate a rich class of priors compatible with modelling a number of irregular signals/functions. Moreover, due to their Markovian structure, SS-DGP regression problems can be solved efficiently by using Bayesian filtering and smoothing methods. The second contribution of this thesis is that we solve continuous-discrete Gaussian filtering and smoothing problems by using the Taylor moment expansion (TME) method. This induces a class of filters and smoothers that can be asymptotically exact in predicting the mean and covariance of SDE solutions. Moreover, the TME method and TME filters and smoothers are compatible with simulating SS-DGPs and solving their regression problems. Lastly, this thesis features a number of applications of state-space (deep) GPs. These applications mainly include (i) estimation of unknown drift functions of SDEs from partially observed trajectories and (ii) estimation of spectro-temporal features of signals.} 75 | 76 | \setcounter{page}{0} 77 | 78 | %% Preface 79 | % Note: I myself changed this environment. 80 | \begin{preface}[Helsinki]{\large{\begin{CJK}{UTF8}{bkai}\\趙~正\end{CJK}} 81 | } 82 | The research work in this thesis has been carried out in the Department of Electrical Engineering and Automation, Aalto University, during the years 2018--2021. My doctoral studies officially started in April of 2018, while most of the pivotal work came in 2020--2021. During this time, my doctoral research was financially supported by the Academy of Finland and the Aalto ELEC Doctoral School. The Aalto Scientific Computing team and the Aalto Learning Center also provided useful computational and literature resources for my studies.
I particularly enjoyed the Spring, Autumn, and Winter in Finland, which allowed me to find inner peace and focus on my research. 83 | 84 | I would like to offer my greatest gratitude to Prof. Simo S\"{a}rkk\"{a} who is my supervisor and mentor, and without whom this work would never have been possible. After finishing my master's studies at Beijing University of Technology in 2017, I found myself lost in finding a ``meaningful'' way of life in the never-sleeping metropolis that is Beijing. This quest was fulfilled when Simo offered me the opportunity of pursuing a doctoral degree under his supervision. Despite my initial bewilderment about the research path, Simo's patience and valuable guidance led me to a research area that I am fascinated by. Over the years, Simo's help, support, and friendship have helped me become a qualified and independent researcher. I think very highly of Simo's supervision, and I almost surely could not have found a better supervisor. 85 | 86 | During my years on campus, I owe a great thanks to Rui Gao (\begin{CJK}{UTF8}{bkai}高~睿\end{CJK}) who is a brilliant, learned, and erudite researcher. 87 | 88 | I would like to thank these few people that have accompanied me through joy and sorrow, I name: Adrien Corenflos and Christos Merkatas. I thank you for the friendship and relieving me from solitude\footnote{This was written under constraint.}. 89 | 90 | During my years at Aalto University, I have shared my office with Marco Soldati, Juha Sarmavuori, Janne Myll\"{a}rinen, Fei Wang (\begin{CJK}{UTF8}{bkai}王~斐\end{CJK}), Jiaqi Liu (\begin{CJK}{UTF8}{bkai}劉~佳琦\end{CJK}), Ajinkya Gorad, Masaya Murata (\begin{CJK}{UTF8}{bkai}村田~真哉\end{CJK}), and Otto Kangasmaa. I thank them all for filling the office with happiness and joy. I especially thank Marco Soldati who offered me honest friendship, lasagne, and taught me many useful Italian phrases.
My thanks also go to Lauri Palva, Zenith Purisha, Joel Jaskari, Sakira Hassan, Fatemeh Yaghoobi, Abubakar Yamin, Zaeed Khan, Xiaofeng Ma (\begin{CJK}{UTF8}{bkai}馬~曉峰\end{CJK}), Prof. Ivan Vujaklija, Dennis Yeung, Wendy Lam, Prof. Ilkka Laakso, Marko Mikkonen, Noora Matilainen, Juhani Kataja, Linda Srbova, and Tuomas Turunen. All these amazing people made working at Aalto a real pleasure. I would also like to give my thanks to Laila Aikala who kindly offered me a peaceful place to stay in Espoo. 91 | 92 | I warmly thank Prof. Leo K\"{a}rkk\"{a}inen for the collaboration on the AI in Health Technology course and our inspiring discussions on many Thursdays and Fridays. I particularly enjoyed the collaboration with Muhammad Fuady Emzir who offered me knowledge generously and with no reservations. Many thanks go to my coauthors Prof. Roland Hostettler, Prof. Ali Bahrami Rad, Filip Tronarp, and Toni Karvonen. I also appreciated the collaboration with Sarang Thombre and Toni Hammarberg from the Finnish Geospatial Research Institute, Prof. Ville V. Lehtola from the University of Twente, and Tuomas Lumikari from Helsinki University Hospital. I also thank Prof. Lassi Roininen and Prof. Arno Solin for their time and valuable advice. 93 | 94 | Lastly, I would like to thank my parents and sister who support me persistently as always. 95 | 96 | \end{preface} 97 | 98 | %% Table of contents of the dissertation 99 | \clearpage 100 | \tableofcontents 101 | 102 | % To be defined before generating list of publications. Use 'no' if no acknowledgement 103 | \languagecheck{Adrien Corenflos, Christos Merkatas, and Dennis Yeung} 104 | 105 | %% This is for article dissertations. Remove if you write a monograph dissertation. 106 | % The actual publications are entered manually one by one as shown further down: 107 | % use \addpublication, \addcontribution, \adderrata, and addpublicationpdf.
108 | % The last adds the actual article, the other three enter related information 109 | % that will be collected in lists -- like this one. 110 | % 111 | % Uncomment and edit as needed 112 | \def\authorscontributionname{Author's contribution} 113 | \listofpublications 114 | 115 | %%% Add lists of figures and tables as you usually do (\listoffigures, \listoftables) 116 | %\listoffigures 117 | 118 | %% Add list of abbreviations, list of symbols, etc., using your preferred package/method. 119 | \abbreviations 120 | 121 | \begin{description}[style=multiline,leftmargin=3cm] 122 | \item[CD-FS] Continuous-discrete filtering and smoothing 123 | \item[DGP] Deep Gaussian process 124 | \item[GFS] Gaussian approximated density filter and smoother 125 | \item[GMRF] Gaussian Markov random field 126 | \item[GP] Gaussian process 127 | \item[It\^{o}-1.5] It\^{o}--Taylor strong order 1.5 128 | \item[LCD] Locally conditional discretisation 129 | \item[MAP] Maximum a posteriori 130 | \item[MCMC] Markov chain Monte Carlo 131 | \item[MLE] Maximum likelihood estimation 132 | \item[NSGP] Non-stationary Gaussian process 133 | \item[ODE] Ordinary differential equation 134 | \item[PDE] Partial differential equation 135 | \item[RBF] Radial basis function 136 | \item[R-DGP] Regularised (batch) deep Gaussian process 137 | \item[R-SS-DGP] Regularised state-space deep Gaussian process 138 | \item[RTS] Rauch--Tung--Striebel 139 | \item[SDE] Stochastic differential equation 140 | \item[SS-DGP] State-space deep Gaussian process 141 | \item[SS-GP] State-space Gaussian process 142 | \item[TME] Taylor moment expansion 143 | \end{description} 144 | 145 | \symbols 146 | 147 | \begin{description}[style=multiline,leftmargin=3cm] 148 | \item[$a$] Drift function of SDE 149 | \item[$A$] Drift matrix of linear SDE 150 | \item[$\A$] Infinitesimal generator 151 | \item[$\Am$] Multidimensional infinitesimal generator 152 | \item[$b$] Dispersion function of SDE 153 | \item[$B$] Dispersion matrix of linear SDE 
154 | \item[$c$] Constant 155 | \item[$\mathcal{C}^k(\Omega; \Pi)$] Space of $k$ times continuously differentiable functions on $\Omega$ mapping to $\Pi$ 156 | \item[$C(t,t')$] Covariance function 157 | \item[$C_{\mathrm{Mat.}}(t,t')$] \matern covariance function 158 | \item[$C_{\mathrm{NS}}(t,t')$] Non-stationary \matern covariance function 159 | \item[$C_{1:T}$] Covariance/Gram matrix by evaluating the covariance function $C(t, t')$ on Cartesian grid $(t_1,\ldots, t_T) \times (t_1,\ldots, t_T)$ 160 | \item[$\covsym$] Covariance 161 | \item[$\cov{X \mid Y}$] Conditional covariance of random variable $X$ given another random variable $Y$ 162 | \item[$\cov{X \mid y}$] Conditional covariance of random variable $X$ given the realisation $y$ of random variable $Y$ 163 | \item[$d$] Dimension of state variable 164 | \item[$d_i$] Dimension of the $i$-th GP element 165 | \item[$d_y$] Dimension of measurement variable 166 | \item[$\det$] Determinant 167 | \item[$\diagsym$] Diagonal matrix 168 | \item[$\expecsym$] Expectation 169 | \item[$\expec{X \mid \mathcal{F}}$] Conditional expectation of $X$ given sigma-algebra $\mathcal{F}$ 170 | \item[$\expec{X \cond Y}$] Conditional expectation of $X$ given the sigma-algebra generated by random variable $Y$ 171 | \item[$\expec{X \cond y}$] Conditional expectation of $X$ given the realisation $y$ of random variable $Y$ 172 | \item[$f$] Approximate transition function in discrete state-space model 173 | \item[$f^M$] $M$-order TME approximated transition function in discrete state-space model 174 | \item[$\check{f}$] Exact transition function in discrete state-space model 175 | \item[$\mathring{f}_j$] $j$-th frequency component 176 | \item[$\FF$] Sigma-algebra 177 | \item[$\FF_t$] Filtration 178 | \item[$\FF_t^W$] Filtration generated by $W$ and initial random variable 179 | \item[$g$] Transformation function 180 | \item[$\mathrm{GP}(0, C(t,t'))$] Zero-mean Gaussian process with covariance function $C(t,t')$. 
181 | \item[$h$] Measurement function 182 | \item[$H$] Measurement matrix 183 | \item[$\hessian_x f$] Hessian matrix of $f$ with respect to $x$ 184 | \item[$I$] Identity matrix 185 | \item[$J$] Set of conditional dependencies of GP elements 186 | \item[$\jacob_x f$] Jacobian matrix of $f$ with respect to $x$ 187 | \item[$K$] Kalman gain 188 | \item[$\mBesselsec$] Modified Bessel function of the second kind with parameter $\nu$ 189 | \item[$\ell$] Length scale parameter 190 | \item[$\mathcal{L}^\mathrm{A}$] Augmented Lagrangian function 191 | \item[$\mathcal{L}^\mathrm{B}$] MAP objective function of batch DGP 192 | \item[$\mathcal{L}^\mathrm{B-REG}$] $L^1$-regularisation term for batch DGP 193 | \item[$\mathcal{L}^\mathrm{S}$] MAP objective function of state-space DGP 194 | \item[$\mathcal{L}^\mathrm{S-REG}$] $L^1$-regularisation term for state-space DGP 195 | \item[$m(t)$] Mean function 196 | \item[$m^-_k$] Predictive mean at time $t_k$ 197 | \item[$m^f_k$] Filtering mean at time $t_k$ 198 | \item[$m^s_k$] Smoothing mean at time $t_k$ 199 | \item[$M$] Order of Taylor moment expansion 200 | \item[$N$] Order of Fourier expansion 201 | \item[$\mathrm{N}(x\mid m, P)$] Normal probability density function with mean $m$ and covariance $P$ 202 | \item[$\N$] Set of natural numbers 203 | \item[$O$] Big $O$ notation 204 | \item[$p_X(x)$] Probability density function of random variable $X$ 205 | \item[$p_{X \cond Y}(x\cond y)$] Conditional probability density function of $X$ given $Y$ taking value $y$ 206 | \item[$P^-_k$] Predictive covariance at time $t_k$ 207 | \item[$P^f_k$] Filtering covariance at time $t_k$ 208 | \item[$P^s_k$] Smoothing covariance at time $t_k$ 209 | \item[$P^{i,j}_k$] Filtering covariance of the $i$ and $j$-th state elements at time $t_k$ 210 | \item[$\mathbb{P}$] Probability measure 211 | \item[$q_k$] Approximate process noise in discretised state-space model at time $t_k$ 212 | \item[$\check{q}_k$] Exact process noise in discretised state-space model 
at time $t_k$ 213 | \item[$Q_k$] Covariance of process noise $q_k$ 214 | \item[$R_{M, \phi}$] Remainder of $M$-order TME approximation for target function $\phi$ 215 | \item[$\R$] Set of real numbers 216 | \item[$\R_{>0}$] Set of positive real numbers 217 | \item[$\R_{<0}$] Set of negative real numbers 218 | \item[$\sgn$] Sign function 219 | \item[$\mathcal{S}_{m, P}$] Sigma-point approximation of Gaussian integral with mean $m$ and covariance $P$ 220 | \item[$t$] Temporal variable 221 | \item[$\tracesym$] Trace 222 | \item[$t_0$] Initial time 223 | \item[$T$] Number of measurements 224 | \item[$\T$] Temporal domain $\T\coloneqq [t_0, \infty)$ 225 | \item[$U$] (State-space) GP 226 | \item[$U^i_{j_i}$] (State-space) GP element in $\mathcal{V}$ indexed by $i$, and it is also a parent of the $j_i$-th GP element in $\mathcal{V}$ 227 | \item[$U_{1:T}$] Collection of $U(t_1), U(t_2),\ldots, U(t_T)$ 228 | \item[$\mathcal{U}^i$] Collection of parents of $U^i_{j_i}$ 229 | \item[$V$] (State-space) deep GP 230 | \item[$V_k$] Shorthand of $V(t_k)$ 231 | \item[$V_{1:T}$] Collection of $V(t_1), V(t_2),\ldots, V(t_T)$ 232 | \item[$\mathcal{V}$] Collection of GP elements 233 | \item[$\varrsym$] Variance 234 | \item[$w$] Dimension of Wiener process 235 | \item[$W$] Wiener process 236 | \item[$X$] Stochastic process 237 | \item[$X_0$] Initial random variable 238 | \item[$X_k$] Shorthand of $X(t_k)$ 239 | \item[$Y_k$] Measurement random variable at time $t_k$ 240 | \item[$Y_{1:T}$] Collection of $Y_1, Y_2,\ldots, Y_T$ 241 | 242 | \item[$\gamma$] Dimension of the state variable of Mat\'{e}rn GP 243 | \item[$\Gamma$] Shorthand of $b(x) \, b(x)^\trans$ 244 | \item[$\varGamma$] Gamma function 245 | \item[$\Delta t$] Time interval $t-s$ 246 | \item[$\Delta t_k$] Time interval $t_k-t_{k-1}$ 247 | \item[$\eta$] Multiplier for augmented Lagrangian function 248 | \item[$\theta$] Auxiliary variable used in augmented Lagrangian function 249 | \item[$\Theta_{r}$] $r$-th polynomial coefficient in 
TME covariance approximation 250 | \item[$\mineig$] Minimum eigenvalue 251 | \item[$\maxeig$] Maximum eigenvalue 252 | \item[$\Lambda(t)$] Solution of a matrix ordinary differential equation 253 | \item[$\cu{\Lambda}(t, s)$] Shorthand of $\Lambda(t) \, (\Lambda(s))^{-1}$ 254 | \item[$\xi_k$] Measurement noise at time $t_k$ 255 | \item[$\Xi_k$] Variance of measurement noise $\xi_k$ 256 | \item[$\rho$] Penalty parameter in augmented Lagrangian function 257 | \item[$\sigma$] Magnitude (scale) parameter 258 | \item[$\Sigma_M$] $M$-order TME covariance approximant 259 | \item[$\phi$] Target function 260 | \item[$\phi_{ij}$] $i,j$-th element of $\phi$ 261 | \item[$\phi^\mathrm{I}$] $\phi^\mathrm{I}(x) \coloneqq x$ 262 | \item[$\phi^\mathrm{II}$] $\phi^\mathrm{II}(x) \coloneqq x \, x^\trans$ 263 | \item[$\Phi$] Sparsity inducing matrix 264 | \item[$\chi(\Delta t)$] Polynomial of $\Delta t$ associated with TME covariance approximation 265 | \item[$\Omega$] Sample space 266 | 267 | \item[$(\Omega, \FF, \FF_t, \PP)$] Filtered probability space with sample space $\Omega$, sigma-algebra $\FF$, filtration $\FF_t$, and probability measure $\PP$ 268 | \item[$\abs{\cdot}$] Absolute value 269 | \item[$\norm{\cdot}_p$] $L^p$ norm or $L^p$-induced matrix norm 270 | \item[$\norm{\cdot}_G$] Euclidean norm weighted by a non-singular matrix $G$ 271 | \item[$\nabla_x f$] Gradient of $f$ with respect to $x$ 272 | \item[$\binom{\cdot}{\cdot}$] Binomial coefficient 273 | \item[$\innerp{\cdot, \cdot}$] Inner product 274 | \item[$\circ$] Mapping composition 275 | \item[$\coloneqq$] By definition 276 | \item[$\times$] Cartesian product 277 | \item[$a \, \wedge \, b$] Minimum of $a$ and $b$ 278 | \end{description} 279 | 280 | \input{ch1} 281 | \input{ch2} 282 | \input{ch3} 283 | \input{ch4} 284 | \input{ch5} 285 | \input{ch6} 286 | 287 | \renewcommand{\bibname}{References} 288 | \bibliographystyle{plainnat} 289 | \bibliography{refs} 290 | 291 | % Errata list, if you have errors in the 
publications. 292 | \errata 293 | 294 | \input{list_of_papers.tex} 295 | 296 | \includepdf[pages=-]{title-pages/backcover.pdf} 297 | 298 | \end{document} 299 | -------------------------------------------------------------------------------- /lectio_praecursoria/slide.tex: -------------------------------------------------------------------------------- 1 | \documentclass[seriffont, cmap=Beijing, 10pt]{zz} 2 | 3 | \newcommand\hmmax{0} 4 | \newcommand\bmmax{0} 5 | 6 | \usepackage[utf8]{inputenc} 7 | \usepackage[T1]{fontenc} 8 | %\usepackage{fouriernc} 9 | \usepackage{amsmath, amssymb, bm, mathtools} 10 | \usepackage{animate} 11 | \usepackage{graphicx} 12 | 13 | \usepackage{tikz} 14 | \usetikzlibrary{fadings} 15 | \usetikzlibrary{patterns} 16 | \usetikzlibrary{shadows.blur} 17 | \usetikzlibrary{shapes} 18 | 19 | \title{Lectio Praecursoria} 20 | \subtitle{State-Space Deep Gaussian Processes with Applications} 21 | 22 | \date[10 December 2021]{10 December 2021} 23 | \institute{Aalto University} 24 | 25 | \author[Zheng Zhao]{Zheng Zhao} 26 | 27 | \setbeamercovered{transparent} 28 | \setbeamertemplate{section in toc}[circle] 29 | 30 | % Change toc item spacing 31 | % https://tex.stackexchange.com/questions/170268/separation-space-between-tableofcontents-items-in-beamer 32 | \usepackage{etoolbox} 33 | \makeatletter 34 | \patchcmd{\beamer@sectionintoc} 35 | {\vfill} 36 | {\vskip2.\itemsep} 37 | {} 38 | {} 39 | \makeatother 40 | 41 | % Footnote without numbering 42 | % https://tex.stackexchange.com/questions/30720/footnote-without-a-marker 43 | \newcommand\blfootnote[1]{% 44 | \begingroup 45 | \renewcommand\thefootnote{}\footnote{\scriptsize#1}% 46 | \addtocounter{footnote}{-1}% 47 | \endgroup 48 | } 49 | 50 | \input{../thesis_latex/z_marcro.tex} 51 | 52 | \begin{document} 53 | 54 | \titlepage 55 | 56 | \begin{frame}{The dissertation} 57 | \noindent 58 | \begin{minipage}{.48\textwidth} 59 | \begin{figure} 60 | \centering 61 | \fbox{\includegraphics[trim={2cm 2cm 2cm 
2cm},width=.8\linewidth,clip]{../thesis_latex/title-pages/title-pages}} 62 | \end{figure} 63 | \end{minipage} 64 | \hfill 65 | \begin{minipage}{.48\textwidth} 66 | \begin{block}{} 67 | Available online:\\ \url{https://github.com/zgbkdlm/dissertation}\\ 68 | or scan the QR code 69 | \end{block} 70 | \begin{block}{} 71 | \begin{figure} 72 | \centering 73 | \includegraphics[width=.5\linewidth]{figs/qr-code-thesis} 74 | \end{figure} 75 | \end{block} 76 | \begin{block}{} 77 | Companion codes in Python and Matlab are also in $^\wedge$ 78 | \end{block} 79 | \end{minipage} 80 | \end{frame} 81 | 82 | \begin{frame}{Contents} 83 | This dissertation mainly consists of: 84 | \begin{block}{} 85 | \tableofcontents 86 | \end{block} 87 | \end{frame} 88 | 89 | \section{Continuous-discrete filtering and smoothing with Taylor moment expansion} 90 | \begin{frame}{Contents} 91 | \begin{block}{} 92 | \tableofcontents[currentsection] 93 | \end{block} 94 | \end{frame} 95 | 96 | \begin{frame}{Stochastic filtering} 97 | \begin{block}{} 98 | Consider a system 99 | % 100 | \begin{equation} 101 | \begin{split} 102 | \diff X(t) &= a(X(t)) \diff t + b(X(t)) \diff W(t), \quad X(t_0) = X_0,\\ 103 | Y_k &= h(X(t_k)) + \xi_k, \quad \xi_k \sim \mathrm{N}(0, \Xi_k), 104 | \end{split} 105 | \end{equation} 106 | % 107 | and a set of data $y_{1:T} = \lbrace y_1, y_2,\ldots, y_T \rbrace$. 
The goals are to estimate 108 | \begin{itemize} 109 | \item the (marginal) \alert{filtering} distributions 110 | % 111 | \begin{equation} 112 | p(x_k \cond y_{1:k}), \quad \text{for }k=1,2,\ldots, 113 | \end{equation} 114 | % 115 | \item and the (marginal) \alert{smoothing} distributions 116 | % 117 | \begin{equation} 118 | p(x_k \cond y_{1:T}), \quad \text{for }k=1,2,\ldots,T, 119 | \end{equation} 120 | % 121 | \end{itemize} 122 | \end{block} 123 | \blfootnote{$p(x \cond y)$ abbreviates $p_{X \cond Y}(x \cond y)$.} 124 | \end{frame} 125 | 126 | \begin{frame}{Stochastic filtering} 127 | \begin{block}{} 128 | \begin{figure} 129 | \centering 130 | \animategraphics[autoplay, loop, width=.49\linewidth]{10}{figs/animes/filter-}{0}{99} 131 | \animategraphics[autoplay, loop, width=.49\linewidth]{10}{figs/animes/smoother-}{0}{99} 132 | \caption{Filtering (left) and smoothing (right).} 133 | % dummy image 134 | \end{figure} 135 | \end{block} 136 | \end{frame} 137 | 138 | \begin{frame}{Stochastic filtering} 139 | \begin{block}{} 140 | Solving the \alert{filtering} and \alert{smoothing} problems usually involves computing 141 | % 142 | \begin{equation} 143 | \expec{\phi(X(t)) \cond X(s)} 144 | \end{equation} 145 | % 146 | for $t\geq s \in\T$ and some \alert{target function $\phi$}. 147 | \end{block} 148 | \begin{block}{} 149 | For instance, in \alert{Gaussian} approximate filtering and smoothing, we choose $\phi^\mathrm{I}(x)\coloneqq x$ and $\phi^\mathrm{II}(x)\coloneqq x \, x^\trans$ in order to approximate 150 | % 151 | \begin{equation} 152 | \begin{split} 153 | p(x_k \cond y_{1:k}) &\approx \mathrm{N}\big(x_k \cond m^f_k, P^f_k\big), \\ 154 | p(x_k \cond y_{1:T}) &\approx \mathrm{N}\big(x_k \cond m^s_k, P^s_k\big). 155 | \end{split} 156 | \end{equation} 157 | % 158 | \end{block} 159 | \end{frame} 160 | 161 | \begin{frame}{Stochastic filtering} 162 | \begin{block}{} 163 | Thanks to D. Florens-Zmirou and D. 
Dacunha-Castelle, for any $\phi\in \mathcal{C}^{2(M+1)}(\R^d;\R)$ the conditional expectation can be expanded as 164 | % 165 | \begin{equation} 166 | \expec{\phi(X(t)) \cond X(s)} = \sum^M_{r=0} \frac{\Delta t^r}{r!} \, \A^r\phi(X(s)) + R_{M, \phi}(X(s), \Delta t), 167 | \end{equation} 168 | % 169 | where 170 | % 171 | \begin{equation} 172 | \begin{split} 173 | \A\phi(x) &\coloneqq (\nabla_x\phi(x))^\trans \, a(x) + \frac{1}{2} \, \tracebig{\Gamma(x) \, \hessian_x\phi(x)},\\ 174 | \Gamma(x) &\coloneqq b(x) \, b(x)^\trans. 175 | \end{split} 176 | \end{equation} 177 | % 178 | \end{block} 179 | \begin{block}{} 180 | We call this the \alert{Taylor moment expansion (TME)}, detailed in Section 3.3. 181 | \end{block} 182 | \end{frame} 183 | 184 | \begin{frame}{Stochastic filtering} 185 | \begin{block}{} 186 | However, the TME approximation to the \alert{covariance} $\cov{X(t) \cond X(s)}$ might not be \alert{positive definite}. Detailed in \alert{Theorem~3.5}. 187 | \end{block} 188 | \begin{block}{} 189 | This problem can be numerically addressed by: 190 | \begin{itemize} 191 | \item choosing a small \alert{time interval} $t-s$; 192 | \item increasing the expansion \alert{order} $M$ if the SDE coefficients are regular enough; 193 | \item tuning the SDE coefficients so that the approximation is positive definite for all $t-s\in\R_{>0}$ (see \alert{Corollary 3.6}); 194 | \item tuning the SDE coefficients so that the approximation is positive definite for all $X(s)\in\R^d$ (see \alert{Lemma 3.8}). 195 | \end{itemize} 196 | \end{block} 197 | \end{frame} 198 | 199 | \begin{frame}{Stochastic filtering} 200 | \begin{block}{} 201 | \begin{example} 202 | \begin{equation} 203 | \begin{split} 204 | \diff X^1(t) &= \big( \log(1+\exp(X^1(t))) + \kappa\,X^2(t) \big)\diff t + \diff W_1(t),\\ 205 | \diff X^2(t) &= \big( \log(1+\exp(X^2(t))) + \kappa\,X^1(t) \big)\diff t + \diff W_2(t),\\ 206 | X^1(t_0)&=X^2(t_0)=0, 207 | \end{split} 208 | \end{equation} 209 | where $\kappa\in\R$ is a \alert{tunable} parameter.
By applying \alert{Corollary 3.6}, the TME-2 covariance approximation to this SDE is positive definite for \alert{all} $t-t_0\in\R_{>0}$, if \alert{$\abs{\kappa}\leq0.5$}. 210 | \end{example} 211 | \end{block} 212 | \end{frame} 213 | 214 | %\begin{frame}{Stochastic filtering} 215 | % \begin{block}{} 216 | % \begin{figure} 217 | % \centering 218 | % \includegraphics[width=.7\linewidth]{../thesis_latex/figs/tme-softplus-mineigs} 219 | % \caption{The minimum eigenvalues of TME-2 approximated $\cov{X(t) \cond X(t_0)}$ (denoted by $\Sigma_2$) w.r.t. $\Delta t=t-t_0$ and $\kappa$.} 220 | % \end{figure} 221 | % \end{block} 222 | %\end{frame} 223 | 224 | \begin{frame}{Stochastic filtering} 225 | \begin{block}{} 226 | \alert{Section 3.6} details how to run Gaussian filters and smoothers with the TME method. 227 | \end{block} 228 | \begin{block}{} 229 | Under a few assumptions on the system, the TME Gaussian filters and smoothers are \alert{stable} in the sense that (\alert{Theorem 3.17}) 230 | % 231 | \begin{equation} 232 | \expecBig{\normbig{X_k - m^f_k}_2^2} \leq (c^f_1)^k \, \trace{P_0} + c^f_2, \quad k=1,2,\ldots 233 | \end{equation} 234 | % 235 | and 236 | \begin{equation} 237 | \begin{split} 238 | \expecBig{\normbig{X_k - m^s_k}_2^2} &\leq c^f_0(k) + (c^s_1)^{T-k} \, c_2 + c_3,\\ 239 | k&=1,2,\ldots,T, \quad T=1,2,\ldots, 240 | \end{split} 241 | \end{equation} 242 | where $c^f_1<1$, $c^s_1<1$, and $c^f_0(k)$ depends on $\expec{\norm{X_k - m^f_k}_2^2}$ \alert{only}.
243 | \end{block} 244 | \end{frame} 245 | 246 | \begin{frame}{Stochastic filtering} 247 | \begin{figure} 248 | \centering 249 | \includegraphics[width=.6\linewidth]{../thesis_latex/figs/tme-duffing-filter-smoother}\\ 250 | \includegraphics[width=.4\linewidth]{../thesis_latex/figs/tme-duffing-smoother-x1} 251 | \includegraphics[width=.4\linewidth]{../thesis_latex/figs/tme-duffing-smoother-x2} 252 | \caption{TME on Duffing-van der Pol (\alert{Example 3.19}).} 253 | \end{figure} 254 | \end{frame} 255 | 256 | \section{State-space deep Gaussian processes} 257 | \begin{frame}{Contents} 258 | \begin{block}{} 259 | \tableofcontents[currentsection] 260 | \end{block} 261 | \end{frame} 262 | 263 | %\begin{frame}{Gaussian processes} 264 | % \begin{block}{} 265 | % $U\colon\T\to\R^d$ is said to be a \alert{Gaussian process (GP)}, if for every $t_10}, 383 | % \end{split} 384 | % \end{equation} 385 | % % 386 | % then it is \alert{hard} for Gaussian filters and smoothers to estimate the posterior distribution of $\sigma$ from data. 387 | % \end{block} 388 | % \begin{block}{} 389 | % The \alert{Kalman gain} for $\sigma$ converges to zero as $k\to\infty$. 390 | % \end{block} 391 | % \begin{block}{} 392 | % These are detailed in \alert{Section 4.8}. 393 | % \end{block} 394 | %\end{frame} 395 | 396 | \section{Applications of state-space (deep) Gaussian processes} 397 | \begin{frame}{Contents} 398 | \begin{block}{} 399 | \tableofcontents[currentsection] 400 | \end{block} 401 | \end{frame} 402 | 403 | \begin{frame}{Probabilistic Drift Estimation} 404 | \begin{block}{} 405 | Consider an SDE 406 | % 407 | \begin{equation} 408 | \diff X(t) = a(X(t)) \diff t + b \diff W(t), \quad X(t_0) = X_0, 409 | \end{equation} 410 | % 411 | where the drift function $a$ is \alert{unknown}. The task is to estimate $a$ from a set of partial observations \alert{$x(t_1), x(t_2), \ldots, x(t_T)$} of the SDE. 
412 | \end{block} 413 | \begin{block}{} 414 | One can assume that 415 | % 416 | \begin{equation} 417 | a(x) \sim \mathrm{SSGP}(0, C(x, x')) 418 | \end{equation} 419 | % 420 | and then build an \alert{approximate likelihood} model from any \alert{discretisation} of the SDE. 421 | \end{block} 422 | \begin{block}{} 423 | If necessary, let $a$ follow an SS-DGP. 424 | \end{block} 425 | \end{frame} 426 | 427 | \begin{frame}{Probabilistic Drift Estimation} 428 | \begin{block}{} 429 | Essentially, the estimation model reads 430 | % 431 | \begin{equation} 432 | \begin{split} 433 | a(x) &\sim \mathrm{SSGP}(0, C(x, x')),\\ 434 | X(t_k) - X(t_{k-1}) &\approx f_{k-1}(X_{k-1}) + q_{k-1}(X_{k-1}), 435 | \end{split} 436 | \end{equation} 437 | % 438 | where $f_{k-1}$ and $q_{k-1}$ are some \alert{non-linear functions and random variables} of $a$ and its \alert{derivatives} (depending on the discretisation). 439 | \end{block} 440 | \begin{block}{} 441 | What are the \alert{upsides} of placing an SS-(D)GP prior on $a$? 442 | \begin{itemize} 443 | \item \alert{Linear} time computational complexity. 444 | \item Derivatives of $a$ appear as \alert{state components}, so there is no need to compute the covariance matrices of the derivatives. 445 | \item Amenable to \alert{high-order discretisation schemes/accurate likelihood approximations}. 446 | \end{itemize} 447 | \end{block} 448 | \end{frame} 449 | 450 | %\begin{frame}{} 451 | % \begin{figure} 452 | % \centering 453 | % \includegraphics[width=\linewidth]{../thesis_latex/figs/drift-est} 454 | % \caption{Left: $a(x) = 3 \, (x-x^3)$. Right: $a(x) = \tanh(x)$.} 455 | % \end{figure} 456 | %\end{frame} 457 | 458 | \begin{frame}{Spectro-temporal Analysis} 459 | \begin{block}{} 460 | Consider any periodic signal $z\colon\T\to\R$. We may want to approximate it by a \alert{Fourier expansion}: 461 | % 462 | \begin{equation} 463 | z(t) \approx \alpha_0 + \sum^N_{n=1} \big[ \alpha_n \cos(2 \, \pi \, f_n \, t) + \beta_n\sin(2 \, \pi \, f_n \, t) \big].
464 | \end{equation} 465 | % 466 | \alert{GP estimation} of the coefficients \alert{$\lbrace \alpha_0, \alpha_n,\beta_n \rbrace_{n=1}^N$}: 467 | % 468 | \begin{equation} 469 | \begin{split} 470 | \alpha_0(t) &\sim \mathrm{SSGP}(0, C^0_\alpha(t, t')), \\ 471 | \alpha_n(t) &\sim \mathrm{SSGP}(0, C^n_\alpha(t, t')), \\ 472 | \beta_n(t) &\sim \mathrm{SSGP}(0, C^n_\beta(t, t')), \\ 473 | Y_k \alert{=} \alpha_0(t_k) + \sum^N_{n=1} \big[ \alpha_n(t_k) &\cos(2 \, \pi \, f_n \, t_k) + \beta_n(t_k)\sin(2 \, \pi \, f_n \, t_k) \big] + \xi_k,\nonumber 474 | \end{split} 475 | \end{equation} 476 | \end{block} 477 | \end{frame} 478 | 479 | \begin{frame}{Spectro-temporal Analysis} 480 | \begin{block}{} 481 | However, the approach is \alert{computationally demanding}. One needs to store and compute \alert{$2 \, N+1$} covariance matrices of dimension \alert{$T\times T$} and their \alert{inverses}. 482 | \end{block} 483 | \begin{block}{} 484 | If we use the state-space approach, then the problem reduces to computing \alert{$T$} covariance matrices of dimension \alert{$2 \, N+1$}. Beneficial when \alert{$T\gg N$}. 485 | \end{block} 486 | \begin{block}{} 487 | With a clever choice of \alert{stationary} state-space prior, the said covariance matrices are no longer a problem: they are replaced by a \alert{pre-computed and data-independent} stationary covariance matrix. Even faster. 488 | \end{block} 489 | \begin{block}{} 490 | Detailed in \alert{Section 5.2}. 491 | \end{block} 492 | \end{frame} 493 | 494 | %\begin{frame}{} 495 | % \begin{figure} 496 | % \centering 497 | % \includegraphics[width=\linewidth]{../thesis_latex/figs/spectro-temporal-demo1} 498 | % \caption{Spectrogram (right, contour plot) of a sinusoidal signal (left) estimated by RTS smoother.
Dashed black lines stand for the ground truth frequencies.} 499 | % \end{figure} 500 | %\end{frame} 501 | 502 | \begin{frame} 503 | \noindent 504 | \begin{minipage}{.48\textwidth} 505 | \begin{figure} 506 | \centering 507 | \fbox{\includegraphics[trim={2cm 2cm 2cm 2cm},width=.8\linewidth,clip]{../thesis_latex/title-pages/title-pages}} 508 | \end{figure} 509 | \end{minipage} 510 | \hfill 511 | \begin{minipage}{.48\textwidth} 512 | \begin{block}{} 513 | Thank you! 514 | \end{block} 515 | \begin{block}{} 516 | \begin{figure} 517 | \centering 518 | \includegraphics[width=.5\linewidth]{figs/qr-code-thesis} 519 | \end{figure} 520 | \end{block} 521 | \end{minipage} 522 | \end{frame} 523 | 524 | \end{document} -------------------------------------------------------------------------------- /thesis_latex/ch5.tex: -------------------------------------------------------------------------------- 1 | %!TEX root = dissertation.tex 2 | \chapter{Applications} 3 | \label{chap:apps} 4 | In this chapter, we present the experimental results in Publications~\cp{paperDRIFT}, \cp{paperKFSECG}, \cp{paperKFSECGCONF}, \cp{paperSSDGP}, and~\cp{paperMARITIME}. These works are mainly concerned with the applications of state-space (deep) GPs. Specifically, in Section~\ref{sec:drift-est} we show how to use the SS-GP regression method to estimate unknown drift functions in SDEs. Similarly, under that same state-space framework, in Section~\ref{sec:spectro-temporal} we show how to estimate the posterior distributions of the Fourier coefficients of signals. Sections~\ref{sec:apps-ssdgp} and~\ref{sec:maritime} illustrate how SS-DGPs can be used to model real-world signals, such as gravitational waves, accelerometer recordings of human motion, and maritime vessel trajectories. 
5 | 6 | \section{Drift estimation in stochastic differential equations} 7 | \label{sec:drift-est} 8 | Consider a scalar-valued stochastic process $X \colon \T \to \R$ governed by a stochastic differential equation 9 | % 10 | \begin{equation} 11 | \diff X(t) = a(X(t)) \diff t + b \diff W(t), \quad X(t_0) = X_0, 12 | \label{equ:drift-est-sde} 13 | \end{equation} 14 | % 15 | where $b\in\R$ is a constant, $W\colon \T\to\R$ is a Wiener process, and $a\colon \R\to\R$ is an \emph{unknown} drift function. Suppose that we have measurement random variables $X(t_1), X(t_2), \ldots, X(t_T)$ of $X$ at time instances $t_1, t_2, \ldots, t_T\in\T$. The goal is to estimate the drift function $a$ from these measurements. 16 | 17 | One way to proceed is to assume a parametric form $a=a_\vartheta(\cdot)$ of the function and estimate its parameters $\vartheta$ by using, for example, maximum likelihood estimation~\citep{Zmirou1986, Yoshida1992, Kessler1997, Sahalia2003} or Monte Carlo methods~\citep{Roberts2001, Beskos2006}. 18 | 19 | In this chapter, we are mainly concerned with the GP regression approach for estimating the unknown $a$~\citep{Papaspiliopoulos2012, Ruttor2013, Garcia2017, Batz2018, Opper2019}. The key idea of this approach is to assume that the unknown drift function is distributed according to a GP, that is, 20 | % 21 | \begin{equation} 22 | a(x) \sim \mathrm{GP}\bigl(0, C(x, x')\bigr). 23 | \end{equation} 24 | % 25 | Having at our disposal measurements $X(t_1), X(t_2), \ldots, X(t_T)$ observed directly from SDE~\eqref{equ:drift-est-sde}, we can formulate the problem of estimating $a$ as a GP regression problem.
In order to do so, we discretise the SDE in Equation~\eqref{equ:drift-est-sde} and thereupon define the measurement model as 26 | % 27 | \begin{equation} 28 | Y_k \coloneqq X(t_{k}) - X(t_{k-1}) = \check{f}_{k-1}(X(t_{k-1})) + \check{q}_{k-1}(X(t_{k-1})) 29 | \end{equation} 30 | % 31 | for $k=1,2,\ldots, T$, where the function $\check{f}_{k-1}\colon \R \to \R$ and the random variable $\check{q}_{k-1}$ represent the exact discretisation of $X$ at $t_{k}$ from $t_{k-1}$. We write the GP regression model for estimating the drift function as 32 | % 33 | \begin{equation} 34 | \begin{split} 35 | a(x) &\sim \mathrm{GP}\bigl(0, C(x, x')\bigr),\\ 36 | Y_k &= \check{f}_{k-1}(X_{k-1}) + \check{q}_{k-1}(X_{k-1}). 37 | \label{equ:drift-est-reg-model} 38 | \end{split} 39 | \end{equation} 40 | % 41 | The goal now is to estimate the posterior density of $a(x)$ for all $x\in\R$ from a set of data $y_{1:T} = \lbrace x_k - x_{k-1} \colon k=1,2,\ldots, T \rbrace$. 42 | 43 | However, the exact discretisation of non-linear SDEs is rarely possible. In practice, we often have to approximate $\check{f}_{k-1}$ and $\check{q}_{k-1}$ by using, for instance, the Euler--Maruyama scheme, Milstein's method, or, more generally, It\^{o}--Taylor expansions~\citep{Kloeden1992}. As an example, application of the Euler--Maruyama method to Equation~\eqref{equ:drift-est-sde} gives 44 | % 45 | \begin{equation} 46 | \begin{split} 47 | \check{f}_{k-1}(x) &\approx a(x) \, \Delta t_k, \\ 48 | \check{q}_{k-1} &\approx b \, \delta_k, 49 | \end{split} 50 | \end{equation} 51 | % 52 | where $\Delta t_k \coloneqq t_{k} - t_{k-1}$ and $\delta_k \sim \mathrm{N}(0, \Delta t_k)$.
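The Euler--Maruyama construction above can be sketched numerically. The following is a minimal Python illustration (not code from the publications; the drift $a(x)=3\,(x-x^3)$, the step size, and the sample size are our illustrative assumptions) that simulates the SDE and forms the regression pairs $(x_{k-1}, \, y_k = x_k - x_{k-1})$:

```python
import numpy as np

rng = np.random.default_rng(0)

def a(x):
    # illustrative double-well drift, same form as in the drift-est figure
    return 3.0 * (x - x**3)

b = 1.0    # constant diffusion coefficient
dt = 0.01  # discretisation step Delta t_k (assumed uniform here)
T = 1000   # number of measurements

# Euler--Maruyama simulation:
# X_k = X_{k-1} + a(X_{k-1}) dt + b * delta_k,  delta_k ~ N(0, dt)
x = np.empty(T + 1)
x[0] = 0.0
for k in range(1, T + 1):
    x[k] = x[k - 1] + a(x[k - 1]) * dt + b * np.sqrt(dt) * rng.normal()

# Regression data for the drift-estimation model: inputs are x_{k-1},
# targets are y_k = x_k - x_{k-1} ~ a(x_{k-1}) dt + b delta_k
inputs = x[:-1]
targets = np.diff(x)
```

In a GP (or SS-GP) regression over these pairs, `inputs` plays the role of the covariate $x$ at which $a$ is evaluated and `targets` the role of the noisy observations $y_k$.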
53 | 54 | However, the discretisation by the Euler--Maruyama scheme can sometimes be crude, especially when the discretisation step is relatively large, making the measurement representation obtained from it inaccurate. \citet{ZhaoZheng2020Drift} show that if the prior of $a$ is chosen with certain regularity, it is possible to leverage high-order It\^{o}--Taylor expansions in order to discretise the SDE with higher accuracy. As an example, suppose that the GP prior of $a$ is twice-differentiable almost surely. Then, the It\^{o}--Taylor strong order 1.5 (It\^{o}-1.5) method~\citep{Kloeden1992} gives 55 | % 56 | \begin{equation} 57 | \begin{split} 58 | \check{f}_{k-1}(x) &\approx a(x) \, \Delta t_{k} + \frac{1}{2} \Big( \frac{\diff a}{\diff x}(x) \, a(x) + \frac{1}{2}\, \frac{\diff^2 a}{\diff x^2}(x) \, b^2 \Big) \, \Delta t_k^2,\\ 59 | \check{q}_{k-1}(x) &\approx b\,\delta_{2,k} + \frac{\diff a}{\diff x}(x) \, b \, \delta_{1, k}, 60 | \label{equ:drift-est-ito15} 61 | \end{split} 62 | \end{equation} 63 | % 64 | where 65 | % 66 | \begin{equation} 67 | \begin{bmatrix} 68 | \delta_{1,k} \\ 69 | \delta_{2,k} 70 | \end{bmatrix} \sim \mathrm{N}\left( 71 | \begin{bmatrix} 72 | 0\\0 73 | \end{bmatrix}, 74 | \begin{bmatrix} 75 | \frac{(\Delta t_k)^3}{3} & \frac{(\Delta t_k)^2}{2}\\ 76 | \frac{(\Delta t_k)^2}{2} & \Delta t_k 77 | \end{bmatrix}\right), 78 | \end{equation} 79 | % 80 | so that the Wiener increment $\delta_{2,k}$ of variance $\Delta t_k$ is the one scaled by $b$, consistently with the Euler--Maruyama case. Indeed, using a higher-order It\^{o}--Taylor expansion can lead to a better measurement representation; however, this in turn requires more computations and limits the choice of the prior model. It is also worth mentioning that if one uses the approximations of high-order It\^{o}--Taylor expansions -- such as the one in Equation~\eqref{equ:drift-est-ito15} -- the resulting measurement representation in the GP regression model~\eqref{equ:drift-est-reg-model} is no longer linear with respect to $a$. Consequently, the GP regression solution may not admit a closed-form solution.
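The conditional mean and noise variance implied by the It\^{o}-1.5 representation can be computed directly from these formulas. A small sketch (the function name and inputs are ours, for illustration only), where the coefficient $b$ is paired with the Brownian increment of variance $\Delta t_k$:

```python
import numpy as np

def ito15_increment_moments(a, da, d2a, x, b, dt):
    """Mean and noise variance of the Ito-1.5 increment approximation.

    a, da, d2a: the drift and its first two derivatives (callables);
    x: current state; b: diffusion constant; dt: step Delta t_k.
    """
    # Deterministic part f_check of the increment
    f = a(x) * dt + 0.5 * (da(x) * a(x) + 0.5 * d2a(x) * b**2) * dt**2
    # Joint covariance of (delta_1, delta_2): delta_1 = iterated integral
    # (variance dt^3/3), delta_2 = Brownian increment (variance dt)
    C = np.array([[dt**3 / 3.0, dt**2 / 2.0],
                  [dt**2 / 2.0, dt]])
    # Noise part is c1*delta_1 + c2*delta_2 with c = (da(x)*b, b),
    # so its variance is the quadratic form c^T C c
    c = np.array([da(x) * b, b])
    var_q = float(c @ C @ c)
    return f, var_q

# Example with the linear drift a(x) = x, for which da = 1 and d2a = 0
f, var_q = ito15_increment_moments(
    a=lambda x: x, da=lambda x: 1.0, d2a=lambda x: 0.0,
    x=1.0, b=1.0, dt=0.1)
```

Note how the leading-order noise variance is $b^2 \, \Delta t_k$, matching the Euler--Maruyama case, while the extra terms are higher order in $\Delta t_k$.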
81 | 82 | One problem of this GP regression-based drift estimation approach is that the computation can be demanding if the number of measurements $T$ is large. Moreover, if the measurements are densely located, then the covariance matrices used in GP regression may be numerically close to singular. These two issues are already discussed in the Introduction and Section~\ref{sec:ssgp}. In addition, the GP regression model is not amenable to high-order It\^{o}--Taylor expansions, as these expansions result in non-linear measurement representations and require computing the derivatives of $a$ up to a certain order. 83 | 84 | \begin{figure}[t!] 85 | \centering 86 | \includegraphics[width=.99\linewidth]{figs/drift-est} 87 | \caption{Estimation of drift functions $a(x)=3\,(x-x^3)$ (left) and $a(x)=\tanh(x)$ (right) by \citet{ZhaoZheng2020Drift}. UKFS stands for unscented Kalman filter and RTS smoother. Shaded area stands for 0.95 confidence interval associated with the UKFS estimation.} 88 | \label{fig:drift-est} 89 | \end{figure} 90 | 91 | \citet{ZhaoZheng2020Drift} address the problems above by solving the GP regression problem in Equation~\eqref{equ:drift-est-reg-model} under the state-space framework. More precisely, they put an SS-GP prior over the unknown $a$ instead of a standard batch GP. The main benefit of doing so for this application is that the SS-GP regression solvers are computationally more efficient for large-scale measurements compared to the standard batch GP regression (see, the Introduction and Section~\ref{sec:ssgp}). Moreover, in order to use high-order It\^{o}--Taylor expansions, \citet{ZhaoZheng2020Drift} consider putting SS-GP priors over $a$ of the \matern family, so that the derivatives of $a$ naturally appear as the state components of $a$ (see, Section~\ref{sec:deep-matern}). In this way, computing the covariance matrices of the derivatives of $a$ is no longer needed.
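The \matern state-space construction referred to above is standard and can be made concrete. As is well known (and not specific to the publication), a \matern $\nu=3\,/\,2$ GP prior admits a two-dimensional SDE representation whose second state component is exactly the derivative of the process; a sketch with $\lambda = \sqrt{3}\,/\,\ell$:

```python
import numpy as np

def matern32_ss(ell, sigma):
    """State-space form of a Matern nu = 3/2 GP prior on a(.).

    The state is s(x) = [a(x), da/dx(x)], governed by
    ds = F s dx + L dB(x), so the derivative of a is carried
    as the second state component. ell: length scale, sigma: magnitude.
    """
    lam = np.sqrt(3.0) / ell
    F = np.array([[0.0, 1.0],
                  [-lam**2, -2.0 * lam]])      # companion-form drift matrix
    L = np.array([[0.0], [1.0]])               # noise enters the last state
    q = 4.0 * lam**3 * sigma**2                # white-noise spectral density
    Pinf = np.array([[sigma**2, 0.0],
                     [0.0, lam**2 * sigma**2]])  # stationary covariance
    return F, L, q, Pinf

# Sanity check: Pinf solves the Lyapunov equation
# F Pinf + Pinf F^T + q L L^T = 0.
F, L, q, Pinf = matern32_ss(ell=1.0, sigma=1.0)
residual = F @ Pinf + Pinf @ F.T + q * (L @ L.T)
```

Because the derivative is a state component, the Itô-1.5 terms involving $\diff a / \diff x$ become linear read-outs of the state rather than quantities requiring extra covariance computations.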
92 | 93 | \begin{remark} 94 | Note that the SS-GP approach requires treating $X(t_1), X(t_2), \ldots ,\allowbreak X(t_T)$ as time variables and sorting their data $x_{1:T}=\lbrace x_1,x_2,\ldots,x_T \rbrace$ in temporal order. 95 | \end{remark} 96 | 97 | In Figure~\ref{fig:drift-est}, we show a representative result from~\citet{ZhaoZheng2020Drift}, where the SS-GP approach is employed to approximate the drift functions of two SDEs. In particular, the solutions are obtained by using the It\^{o}-1.5 discretisation, and an unscented Kalman filter and an RTS smoother. For more details regarding the experiments, the reader is referred to~\citet{ZhaoZheng2020Drift}. 98 | 99 | \section{Probabilistic spectro-temporal signal analysis} 100 | \label{sec:spectro-temporal} 101 | Let $z\colon \T \to \R$ be a periodic signal. In signal processing, it is often of interest to approximate the signal by Fourier expansions of the form 102 | % 103 | \begin{equation} 104 | z(t) \approx \alpha_0 + \sum^{N}_{n=1} \big[\alpha_n \cos(2 \, \pi \, \mathring{f}_n \, t) + \beta_n \sin(2 \, \pi \, \mathring{f}_n \, t)\big], 105 | \end{equation} 106 | % 107 | where $\big\lbrace \mathring{f}_n\colon n=1,2,\ldots,N \big\rbrace$ stand for the frequency components, and $N$ is a given expansion order. When $z$ satisfies certain conditions~\citep{Katznelson2004}, the representation in the equation above converges as $N\to\infty$ (in various modes). 108 | 109 | Let us denote by $y_k$ the measurement of the signal at $t_k$ and suppose that we have a set of measurement data $y_{1:T}=\lbrace y_k\colon k=1,2,\ldots,T \rbrace$ at time instances $t_1, t_2, \ldots, t_T\in\T$.
In order to quantify the truncation and measurement errors, we introduce Gaussian random variables $\xi_k \sim \mathrm{N}(0, \Xi_k)$ for $k=1,2,\ldots,T$ and let 110 | % 111 | \begin{equation} 112 | \begin{split} 113 | Y_k = \alpha_0 + \sum^{N}_{n=1} \big[\alpha_n \cos(2 \, \pi \, \mathring{f}_n \, t_k) + \beta_n \sin(2 \, \pi \, \mathring{f}_n \, t_k)\big] + \xi_k 114 | \label{equ:spectro-temporal-y} 115 | \end{split} 116 | \end{equation} 117 | % 118 | represent the random measurements of $z$ at $t_k$. The goal now is to estimate the coefficients $\lbrace \alpha_0, \alpha_n, \beta_n \colon n=1,2,\ldots, N \rbrace$ from the data $y_{1:T}$. We call this problem the \emph{spectro-temporal estimation} problem. 119 | 120 | One way to proceed is by using the MLE method~\citep{Bretthorst1988}, but~\citet{QiYuan2002, ZhaoZheng2018KF, ZhaoZheng2020KFECG} show that we can also consider this spectro-temporal estimation problem as a GP regression problem. More precisely, the modelling assumption is that 121 | % 122 | \begin{equation} 123 | \begin{split} 124 | \alpha_0(t) &\sim \mathrm{GP}\big(0, C^0_\alpha(t, t')\big),\\ 125 | \alpha_n(t) &\sim \mathrm{GP}\big(0, C^n_\alpha(t, t')\big),\\ 126 | \beta_n(t) &\sim \mathrm{GP}\big(0, C^n_\beta(t, t')\big), 127 | \label{equ:spectro-temporal-gp-priors} 128 | \end{split} 129 | \end{equation} 130 | % 131 | for $n=1,2,\ldots, N$, and that the measurements follow 132 | % 133 | \begin{equation} 134 | Y_k = \alpha_0(t_k) + \sum^{N}_{n=1} \big[\alpha_n(t_k) \, \cos(2 \, \pi \, \mathring{f}_n \, t_k) + \beta_n(t_k) \, \sin(2 \, \pi \, \mathring{f}_n \, t_k)\big] + \xi_k,\nonumber 135 | \end{equation} 136 | % 137 | for $k=1,2,\ldots,T$. This results in a standard GP regression problem; therefore, the posterior distributions of the coefficients $\lbrace \alpha_0, \alpha_n, \beta_n \colon n=1,2,\ldots, N \rbrace$ have a closed-form solution.
However, solving this GP regression problem is, in practice, infeasible when the expansion order $N$ and the number of measurements $T$ are large. This is due to the fact that one needs to compute $2\,N+1$ covariance matrices of dimension $T\times T$ and their inverses. 138 | 139 | \citet{ZhaoZheng2018KF} propose to solve this spectro-temporal GP regression problem under the state-space framework, that is, by replacing the GP priors in Equation~\eqref{equ:spectro-temporal-gp-priors} with their SDE representations. Since SS-GPs have already been extensively discussed in previous sections, we omit the resulting state-space spectro-temporal estimation formulations. However, the details can be found in Section~\ref{sec:ssgp} and in~\citet{ZhaoZheng2018KF}. 140 | 141 | The computational cost of the state-space spectro-temporal estimation method is substantially lower than that of standard batch GP methods. Indeed, Kalman filters and smoothers only need to compute one $E$-dimensional covariance matrix at each time step (see, Algorithm~\ref{alg:kfs}) instead of those required by batch GP methods. The dimension $E$ is equal to the sum of all the state dimensions of the SS-GPs $\lbrace \alpha_0, \alpha_n, \beta_n \colon n=1,2,\ldots, N \rbrace$. 142 | 143 | \citet{ZhaoZheng2020KFECG} further extend the state-space spectro-temporal estimation method by putting quasi-periodic SDE priors~\citep{Solin2014} over the Fourier coefficients instead of the Ornstein--Uhlenbeck SDE priors used in~\citet{ZhaoZheng2018KF}. This consideration generates a time-invariant version of the measurement model in Equation~\eqref{equ:spectro-temporal-y}; thus, one can apply steady-state Kalman filters and smoothers (SS-KFSs) in order to achieve lower computational costs. The computational cost is further reduced because SS-KFSs do not need to compute the $E$-dimensional covariances of the state in their filtering and smoothing loops.
Instead, the state covariances in SS-KFSs are replaced by a pre-computed steady covariance matrix obtained as the solution of its discrete algebraic Riccati equation (DARE). Moreover, solving the DARE is independent of data/measurements, which is especially useful when the model is known or fixed. However, SS-KFSs may not always be computationally efficient when $N \gg T$, since solving an $E$-dimensional DARE can be demanding when $E$ is large. 144 | 145 | \citet{ZhaoZheng2018KF, ZhaoZheng2020KFECG} show that the state-space spectro-temporal estimation method can be a useful feature extraction mechanism for detecting atrial fibrillation from electrocardiogram signals. More specifically, the spectro-temporal method estimates the spectrogram images of atrial fibrillation signals. These images are then fed to a deep convolutional neural network classifier which is tasked with recognising atrial fibrillation manifestations. 146 | 147 | Since the measurement noises $\lbrace \xi_k\colon k=1,2,\ldots,T \rbrace$ in Equation~\eqref{equ:spectro-temporal-y} encode the truncation and measurement errors, it is also of interest to estimate them. This is done in~\citet{GaoRui2019ALKS}, where the variances $\Xi_k$ of $\xi_k$ for $k=1,2,\ldots, T$ are estimated under the alternating direction method of multipliers. 148 | 149 | \begin{figure}[t!] 150 | \centering 151 | \includegraphics[width=.99\linewidth]{figs/spectro-temporal-demo1} 152 | \caption{Spectrogram (right, contour plot) of a sinusoidal signal (left) generated by Kalman filtering and RTS smoothing using the method in Section~\ref{sec:spectro-temporal}. Dashed black lines (right) stand for the ground truth frequencies.} 153 | \label{fig:spectro-temporal-demo} 154 | \end{figure} 155 | 156 | Figure~\ref{fig:spectro-temporal-demo} illustrates an example of using the state-space spectro-temporal estimation method to estimate the spectrogram of a sinusoidal signal with multiple frequency bands. 
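The steady-state construction described above can be sketched with a toy model. The following minimal example (the scalar random-walk model is our illustrative assumption, not the quasi-periodic prior of the publication) computes the steady-state predicted covariance from the DARE and the corresponding fixed Kalman gain, using SciPy's solver:

```python
import numpy as np
from scipy.linalg import solve_discrete_are

# Scalar random-walk state observed with Gaussian noise (toy model):
A = np.array([[1.0]])   # transition matrix
H = np.array([[1.0]])   # measurement matrix
Q = np.array([[0.1]])   # process noise covariance
R = np.array([[1.0]])   # measurement noise covariance

# solve_discrete_are(a, b, q, r) solves
#   a^T X a - X - a^T X b (r + b^T X b)^{-1} b^T X a + q = 0;
# passing a = A^T and b = H^T makes X the steady-state *predicted*
# covariance P of the Kalman filter, i.e. the DARE fixed point
#   P = A P A^T - A P H^T (H P H^T + R)^{-1} H P A^T + Q.
P = solve_discrete_are(A.T, H.T, Q, R)

# Fixed Kalman gain, pre-computable and independent of the data:
K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)
```

For this scalar model the DARE reduces to $P^2 - 0.1\,P - 0.1 = 0$, so $P$ and $K$ can be checked by hand; in the filtering loop, $P$ and $K$ then stay constant across all time steps, which is exactly the saving exploited by the SS-KFSs.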
157 | 158 | \section{Signal modelling with SS-DGPs} 159 | \label{sec:apps-ssdgp} 160 | In this section, we apply SS-DGPs for modelling gravitational waves and human motion (i.e., acceleration). We consider these as SS-DGP regression problems, where the measurement models are assumed to be linear with respect to the SS-DGPs with additive Gaussian noises. As for their priors, we chose the Mat\'{e}rn $\nu=3 \, / \, 2$ SS-DGP in Example~\ref{example:ssdgp-m32}, except that the parent GPs $U^2_1$ and $U^3_1$ use the Mat\'{e}rn $\nu=1 \, / \, 2$ representation. 161 | 162 | \subsection*{Modelling gravitational waves} 163 | Gravitational waves are curvatures of spacetime caused by the movement of objects with mass~\citep{Maggiore2008}. Since the time Albert Einstein predicted the existence of gravitational waves theoretically from a linearised field equation in 1916~\citep{EinsteinGW1937, Hill2017}, much effort has been devoted to observing their presence~\citep{Blair1991}. In 2015, the Laser Interferometer Gravitational-Wave Observatory (LIGO) team first observed a gravitational wave from the merging of a black hole binary~\citep[event GW150914,][]{LIGO2016}. This wave/signal is challenging for standard GPs to fit because the frequency of the signal changes over time. It is therefore of interest to see whether SS-DGPs can fit this gravitational wave signal. 164 | 165 | \begin{figure}[t!] 166 | \centering 167 | \includegraphics[width=.99\linewidth]{figs/gravit-wave-ssdgp} 168 | \caption{Mat\'{e}rn $\nu=3\,/\,2$ SS-DGP regression (solved by cubature Kalman filter and smoother) for the gravitational wave in event GW150914 (Hanford, Washington). The shaded area stands for 0.95 confidence interval. Details about the data can be found in~\citet{Zhao2020SSDGP}.} 169 | \label{fig:gravit-wave-ssdgp} 170 | \end{figure} 171 | 172 | Figure~\ref{fig:gravit-wave-ssdgp} plots the SS-DGP fit for the gravitational wave observed in the event GW150914.
In the same figure, we also show the fit from a \matern $\nu=3\,/\,2$ GP as well as a waveform (which is regarded as the ground truth) computed from numerical relativity (purple dashed lines) for comparison. Details about the experiment and data can be found in~\citet{Zhao2020SSDGP}. 173 | 174 | Figure~\ref{fig:gravit-wave-ssdgp} shows that the GP fails to give a reasonable fit to the gravitational wave because the GP over-adapts to the high-frequency section of the signal around $0.4$~s. On the contrary, the SS-DGP does not have such a problem, and its fit is closer to the numerical relativity waveform compared to that of the GP. Moreover, the estimated length scale (in log transformation) is interpretable in the sense that the length scale value decreases as the signal frequency increases. 175 | 176 | \subsection*{Modelling human motion} 177 | We apply the regularised SS-DGP (R-SS-DGP) presented in Section~\ref{sec:l1-r-dgp} to fit an accelerometer recording of human motion. The reason for using R-SS-DGP here is that the recording (see, the first row of Figure~\ref{fig:imu-r-ssdgp}) is found to have some sharp changes and artefacts. Hence, we aim at testing whether sparse length scale and magnitude can describe such data. The collection of accelerometer recordings and the experiment settings are detailed in~\citet{Hostettler2018} and~\citet{Zhao2021RSSGP}, respectively. 178 | 179 | \begin{figure}[t!] 180 | \centering 181 | \includegraphics[width=.99\linewidth]{figs/imu-r-ssdgp} 182 | \caption{Human motion modelling with an R-SS-DGP. The GP here uses a \matern $\nu=3\,/\,2$ covariance function. Shaded area stands for 0.95 confidence interval.} 183 | \label{fig:imu-r-ssdgp} 184 | \end{figure} 185 | 186 | A demonstrative result is shown in Figure~\ref{fig:imu-r-ssdgp}. We see that the fit of the R-SS-DGP is smoother than that of the GP. Moreover, the posterior variance of the R-SS-DGP is also found to be reasonably smaller than that of the GP.
The figure also shows that the GP does not handle the artefacts well, for example, around times $t=55$~s and $62$~s. Finally, we find that the learnt length scale and magnitude (in log transformation) are sparse, and that they can respond sharply to the abrupt signal changes and artefacts. 187 | 188 | \section{Maritime situational awareness} 189 | \label{sec:maritime} 190 | Another application area of (deep) GPs is autonomous maritime navigation. In~\citet{Sarang2020}, we present a literature review on the sensor technology and machine learning methods for autonomous vessel navigation. In particular, we show that GP-based methods are able to analyse ship trajectories~\citep{Rong2019}, detect navigation abnormality~\citep{Kowalska2012, Smith2014}, and detect/classify vessels~\citep{XiaoZ2017}. 191 | 192 | \begin{figure}[t!] 193 | \centering 194 | \includegraphics[width=.99\linewidth]{figs/ais-r-ssdgp} 195 | \caption{Modelling AIS recording (speed over ground) of MS Finlandia with an R-SS-DGP. The GP here uses a \matern $\nu=3\,/\,2$ covariance function. Shaded area stands for 0.95 confidence interval.} 196 | \label{fig:ais-ssdgp} 197 | \end{figure} 198 | 199 | In Figure~\ref{fig:ais-ssdgp}, we present an example of fitting an automatic identification system (AIS) recording by using an R-SS-DGP. The recording is taken from MS Finlandia (Helsinki--Tallinn) by Fleetrange Oy on December 10, 2020. We see from the figure that the fit of the R-SS-DGP is smoother than that of the GP. Moreover, the learnt length scale and magnitude parameters are flat and jump at the acceleration/deceleration points. 200 | --------------------------------------------------------------------------------