├── .gitignore
├── LICENSE
├── Makefile
├── README.md
├── _config.yml
├── _data
│   ├── options.yml
│   └── social.yml
├── _includes
│   ├── footer.html
│   ├── head.html
│   ├── header.html
│   └── print-footer.html
├── _layouts
│   ├── default.html
│   ├── full-width.html
│   ├── page.html
│   └── post.html
├── _plugins
│   ├── fullwidth.rb
│   ├── main_column_img.rb
│   ├── margin_figure.rb
│   ├── marginnote.rb
│   ├── mathjaxtag.rb
│   ├── newthought.rb
│   └── sidenote.rb
├── _sass
│   ├── _fonts.scss
│   ├── _settings.scss
│   └── _syntax-highlighting.scss
├── autoregressive
│   ├── autoregressive.png
│   ├── fvsbn.png
│   ├── index.md
│   ├── index.tex
│   └── nade.png
├── css
│   ├── tufte.css
│   ├── tufte.orginal.css
│   └── tufte.scss
├── docs
│   ├── LICENSE
│   ├── Makefile
│   ├── autoregressive
│   │   ├── autoregressive.png
│   │   ├── fvsbn.png
│   │   ├── index.html
│   │   ├── index.tex
│   │   └── nade.png
│   ├── css
│   │   ├── tufte.css
│   │   └── tufte.orginal.css
│   ├── flow
│   │   ├── flow-graphical.PNG
│   │   ├── iaf.PNG
│   │   ├── index.html
│   │   └── maf.PNG
│   ├── fonts
│   │   ├── et-bembo
│   │   │   ├── et-bembo-bold-line-figures
│   │   │   │   ├── et-bembo-bold-line-figures.eot
│   │   │   │   ├── et-bembo-bold-line-figures.svg
│   │   │   │   ├── et-bembo-bold-line-figures.ttf
│   │   │   │   └── et-bembo-bold-line-figures.woff
│   │   │   ├── et-bembo-display-italic-old-style-figures
│   │   │   │   ├── et-bembo-display-italic-old-style-figures.eot
│   │   │   │   ├── et-bembo-display-italic-old-style-figures.svg
│   │   │   │   ├── et-bembo-display-italic-old-style-figures.ttf
│   │   │   │   └── et-bembo-display-italic-old-style-figures.woff
│   │   │   ├── et-bembo-roman-line-figures
│   │   │   │   ├── et-bembo-roman-line-figures.eot
│   │   │   │   ├── et-bembo-roman-line-figures.svg
│   │   │   │   ├── et-bembo-roman-line-figures.ttf
│   │   │   │   └── et-bembo-roman-line-figures.woff
│   │   │   ├── et-bembo-roman-old-style-figures
│   │   │   │   ├── et-bembo-roman-old-style-figures.eot
│   │   │   │   ├── et-bembo-roman-old-style-figures.svg
│   │   │   │   ├── et-bembo-roman-old-style-figures.ttf
│   │   │   │   └── et-bembo-roman-old-style-figures.woff
│   │   │   └── et-bembo-semi-bold-old-style-figures
│   │   │       ├── et-bembo-semi-bold-old-style-figures.eot
│   │   │       ├── et-bembo-semi-bold-old-style-figures.svg
│   │   │       ├── et-bembo-semi-bold-old-style-figures.ttf
│   │   │       └── et-bembo-semi-bold-old-style-figures.woff
│   │   ├── et-book
│   │   │   ├── et-book-bold-line-figures
│   │   │   │   ├── et-book-bold-line-figures.eot
│   │   │   │   ├── et-book-bold-line-figures.svg
│   │   │   │   ├── et-book-bold-line-figures.ttf
│   │   │   │   └── et-book-bold-line-figures.woff
│   │   │   ├── et-book-display-italic-old-style-figures
│   │   │   │   ├── et-book-display-italic-old-style-figures.eot
│   │   │   │   ├── et-book-display-italic-old-style-figures.svg
│   │   │   │   ├── et-book-display-italic-old-style-figures.ttf
│   │   │   │   └── et-book-display-italic-old-style-figures.woff
│   │   │   ├── et-book-roman-line-figures
│   │   │   │   ├── et-book-roman-line-figures.eot
│   │   │   │   ├── et-book-roman-line-figures.svg
│   │   │   │   ├── et-book-roman-line-figures.ttf
│   │   │   │   └── et-book-roman-line-figures.woff
│   │   │   ├── et-book-roman-old-style-figures
│   │   │   │   ├── et-book-roman-old-style-figures.eot
│   │   │   │   ├── et-book-roman-old-style-figures.svg
│   │   │   │   ├── et-book-roman-old-style-figures.ttf
│   │   │   │   └── et-book-roman-old-style-figures.woff
│   │   │   └── et-book-semi-bold-old-style-figures
│   │   │       ├── et-book-semi-bold-old-style-figures.eot
│   │   │       ├── et-book-semi-bold-old-style-figures.svg
│   │   │       ├── et-book-semi-bold-old-style-figures.ttf
│   │   │       └── et-book-semi-bold-old-style-figures.woff
│   │   ├── icomoon.eot
│   │   ├── icomoon.svg
│   │   ├── icomoon.ttf
│   │   └── icomoon.woff
│   ├── gan
│   │   ├── cyclegan_gendisc.png
│   │   ├── gan.png
│   │   ├── index.html
│   │   └── index.tex
│   ├── index.html
│   ├── introduction
│   │   ├── index.html
│   │   ├── learning.png
│   │   ├── learning_1.png
│   │   └── learning_2.png
│   └── vae
│       ├── index.html
│       ├── klgap.png
│       └── vae.png
├── flow
│   ├── flow-graphical.PNG
│   ├── flow-graphical.png
│   ├── iaf.PNG
│   ├── iaf.png
│   ├── index.md
│   ├── maf.PNG
│   └── maf.png
├── fonts
│   ├── et-bembo
│   │   ├── et-bembo-bold-line-figures
│   │   │   ├── et-bembo-bold-line-figures.eot
│   │   │   ├── et-bembo-bold-line-figures.svg
│   │   │   ├── et-bembo-bold-line-figures.ttf
│   │   │   └── et-bembo-bold-line-figures.woff
│   │   ├── et-bembo-display-italic-old-style-figures
│   │   │   ├── et-bembo-display-italic-old-style-figures.eot
│   │   │   ├── et-bembo-display-italic-old-style-figures.svg
│   │   │   ├── et-bembo-display-italic-old-style-figures.ttf
│   │   │   └── et-bembo-display-italic-old-style-figures.woff
│   │   ├── et-bembo-roman-line-figures
│   │   │   ├── et-bembo-roman-line-figures.eot
│   │   │   ├── et-bembo-roman-line-figures.svg
│   │   │   ├── et-bembo-roman-line-figures.ttf
│   │   │   └── et-bembo-roman-line-figures.woff
│   │   ├── et-bembo-roman-old-style-figures
│   │   │   ├── et-bembo-roman-old-style-figures.eot
│   │   │   ├── et-bembo-roman-old-style-figures.svg
│   │   │   ├── et-bembo-roman-old-style-figures.ttf
│   │   │   └── et-bembo-roman-old-style-figures.woff
│   │   └── et-bembo-semi-bold-old-style-figures
│   │       ├── et-bembo-semi-bold-old-style-figures.eot
│   │       ├── et-bembo-semi-bold-old-style-figures.svg
│   │       ├── et-bembo-semi-bold-old-style-figures.ttf
│   │       └── et-bembo-semi-bold-old-style-figures.woff
│   ├── et-book
│   │   ├── et-book-bold-line-figures
│   │   │   ├── et-book-bold-line-figures.eot
│   │   │   ├── et-book-bold-line-figures.svg
│   │   │   ├── et-book-bold-line-figures.ttf
│   │   │   └── et-book-bold-line-figures.woff
│   │   ├── et-book-display-italic-old-style-figures
│   │   │   ├── et-book-display-italic-old-style-figures.eot
│   │   │   ├── et-book-display-italic-old-style-figures.svg
│   │   │   ├── et-book-display-italic-old-style-figures.ttf
│   │   │   └── et-book-display-italic-old-style-figures.woff
│   │   ├── et-book-roman-line-figures
│   │   │   ├── et-book-roman-line-figures.eot
│   │   │   ├── et-book-roman-line-figures.svg
│   │   │   ├── et-book-roman-line-figures.ttf
│   │   │   └── et-book-roman-line-figures.woff
│   │   ├── et-book-roman-old-style-figures
│   │   │   ├── et-book-roman-old-style-figures.eot
│   │   │   ├── et-book-roman-old-style-figures.svg
│   │   │   ├── et-book-roman-old-style-figures.ttf
│   │   │   └── et-book-roman-old-style-figures.woff
│   │   └── et-book-semi-bold-old-style-figures
│   │       ├── et-book-semi-bold-old-style-figures.eot
│   │       ├── et-book-semi-bold-old-style-figures.svg
│   │       ├── et-book-semi-bold-old-style-figures.ttf
│   │       └── et-book-semi-bold-old-style-figures.woff
│   ├── icomoon.eot
│   ├── icomoon.svg
│   ├── icomoon.ttf
│   └── icomoon.woff
├── gan
│   ├── cyclegan_gendisc.png
│   ├── gan.png
│   ├── index.md
│   └── index.tex
├── index.md
├── introduction
│   ├── index.md
│   ├── learning.png
│   ├── learning_1.png
│   └── learning_2.png
└── vae
    ├── index.md
    ├── klgap.png
    └── vae.png
-------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | _site 2 | .sass-cache 3 | .DS_Store 4 | config.codekit -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | The MIT License (MIT) 2 | 3 | Copyright (c) 2018 Aditya Grover, Stefano Ermon 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice
and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /Makefile: -------------------------------------------------------------------------------- 1 | TEMPDIR := $(shell mktemp -d -t tmp.XXX) 2 | 3 | publish: 4 | echo 'hmmm' 5 | cp -r ./_site/* $(TEMPDIR) 6 | cd $(TEMPDIR) && \ 7 | ls -a && \ 8 | git init && \ 9 | git add . && \ 10 | git commit -m 'publish site' && \ 11 | git remote add origin https://github.com/deepgenerativemodels/notes.git && \ 12 | git push origin master:refs/heads/gh-pages --force 13 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Notes on Deep Generative Models 2 | 3 | These notes form a concise introductory course on deep generative models. They are based on Stanford [CS236](https://deepgenerativemodels.github.io/), taught by [Aditya Grover](http://aditya-grover.github.io/) and [Stefano Ermon](http://cs.stanford.edu/~ermon/), and have been written by [Aditya Grover](http://aditya-grover.github.io/), with the [help](https://github.com/deepgenerativemodels/notes/commits/master) of many students and course staff. 4 | 5 | The compiled version is available [here](https://deepgenerativemodels.github.io/notes/index.html). 6 | 7 | ## Contributing 8 | 9 | This material is under construction! Although we have written up most of it, you will probably find several typos. If you do, please let us know, or submit a pull request with your fixes via GitHub. 10 | 11 | 12 | The notes are written in Markdown and are compiled into HTML using Jekyll. Please add your changes directly to the Markdown source code. In order to install Jekyll, you can follow the instructions posted on the Jekyll website (https://jekyllrb.com/docs/installation/). 13 | 14 | Note that Jekyll is only supported on GNU/Linux, Unix, or macOS. Thus, if you run Windows 10 on your local machine, you will have to install Bash on Ubuntu on Windows. Microsoft provides instructions on how to do that, and Jekyll's website offers helpful instructions on how to proceed through the rest of the process. 15 | 16 | To compile Markdown to HTML (i.e., after you have made changes to the Markdown and want them to be accessible to students viewing the docs), 17 | run the following commands from the root of your cloned version of the https://github.com/deepgenerativemodels/notes repo: 18 | 1) rm -r docs/ 19 | 2) jekyll serve # This should create a folder called _site. Note: This creates a running server; press Ctrl-C to stop the server before proceeding 20 | 3) mv _site docs # Change the name of the _site folder to "docs". This won't work if the server is still running.
21 | 4) git add file_names 22 | 5) git commit -am "your commit message describing what you did" 23 | 6) git push origin master 24 | 25 | Note that if you cloned the deepgenerativemodels/notes repo directly onto your local machine (instead of forking it) then you may see an error like "remote: Permission to deepgenerativemodels/notes.git denied to userjanedoe". If that is the case, then you need to fork the repo first. Then, if your GitHub profile were userjanedoe, you would need to first push your local updates to your forked repo like so: 26 | 27 | git push https://github.com/userjanedoe/notes.git master 28 | 29 | And then you could go and submit the pull request through the GitHub website. 30 | -------------------------------------------------------------------------------- /_config.yml: -------------------------------------------------------------------------------- 1 | baseurl: /notes 2 | title: Deep Generative Models 3 | subtitle: Lecture notes 4 | author: Aditya Grover 5 | simple_search: http://google.com/search 6 | description: Lecture notes for Deep Generative Models. 7 | name: notes 8 | markdown_ext: "markdown,mkdown,mkdn,mkd,md" 9 | permalink: /articles/:short_year/:title 10 | timezone: America/New_York 11 | excerpt_separator: # you can specify your own separator, of course. 12 | exclude: ['Gemfile', 'Gemfile.lock', 'Rakefile', 'README.md'] 13 | destination: docs 14 | google_analytics: UA-129020129-1 15 | post: 16 | template: _post.txt 17 | extension: md 18 | page: 19 | template: _page.txt 20 | extension: md 21 | editor: gvim 22 | git: 23 | branch: master 24 | transfer: 25 | command: rsync 26 | settings: -av 27 | source: _site/ -------------------------------------------------------------------------------- /_data/options.yml: -------------------------------------------------------------------------------- 1 | mathjax: true 2 | lato_font_load: true -------------------------------------------------------------------------------- /_data/social.yml: -------------------------------------------------------------------------------- 1 | - link: //www.twitter.com/twitter_handle 2 | icon: icon-twitter 3 | - link: //plus.google.com/+googlePlusName 4 | icon: icon-googleplus 5 | - link: //github.com/GithubHandle 6 | icon: icon-github 7 | - link: //www.flickr.com/photos/FlickrUserID 8 | icon: icon-flickr 9 | - link: /feed 10 | icon: icon-feed -------------------------------------------------------------------------------- /_includes/footer.html: -------------------------------------------------------------------------------- 1 | 16 | -------------------------------------------------------------------------------- /_includes/head.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | {% if page.title %}{{ page.title }}{% else %}{{ site.title }}{% endif %} 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | {% if site.data.options.lato_font_load %} 15 | 16 | {% endif %} 17 | 18 | {% if site.data.options.mathjax %} 19 | 20 | {% endif %} 21 | 22 | 32 | 33 | 34 | 35 | 36 | -------------------------------------------------------------------------------- /_includes/header.html: -------------------------------------------------------------------------------- 1 | 2 |
3 | 8 |
9 | -------------------------------------------------------------------------------- /_includes/print-footer.html: -------------------------------------------------------------------------------- 1 | {% if page.date %}{{ page.title }} - {{ page.date | date: "%B %-d, %Y" }} - {{site.author}}{% else %}{{ page.title }} - {{site.author}}{% endif %} -------------------------------------------------------------------------------- /_layouts/default.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | {% include head.html %} 4 | 5 | {% include header.html %} 6 |
7 | {{ content }} 8 |
9 | {% include print-footer.html %} 10 | {% include footer.html %} 11 | 12 | 13 | -------------------------------------------------------------------------------- /_layouts/full-width.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | {% include head.html %} 4 | 5 | {% include header.html %} 6 |
7 | {{ content }} 8 |
9 | {% include print-footer.html %} 10 | {% include footer.html %} 11 | 12 | 13 | -------------------------------------------------------------------------------- /_layouts/page.html: -------------------------------------------------------------------------------- 1 | --- 2 | layout: default 3 | --- 4 |

{{ page.title | capitalize }}

5 |

{{ page.date | date: "%B %-d, %Y" }}

6 | 7 | 8 | {{ content }} 9 | -------------------------------------------------------------------------------- /_layouts/post.html: -------------------------------------------------------------------------------- 1 | --- 2 | layout: default 3 | --- 4 |

{{ page.title | capitalize }}

5 |

{{ page.date | date: "%B %-d, %Y" }}

6 | 7 | 8 | 32 | 33 | 34 | {{ content }} 35 | 36 | -------------------------------------------------------------------------------- /_plugins/fullwidth.rb: -------------------------------------------------------------------------------- 1 | ## This has a fairly harmless hack that wraps the img tag in a div to prevent it from being 2 | ## wrapped in a paragraph tag instead, which would totally fuck things up layout-wise 3 | ## Usage {% fullwidth 'path/to/image' 'caption goes here in quotes' %} 4 | # 5 | module Jekyll 6 | class RenderFullWidthTag < Liquid::Tag 7 | 8 | require "shellwords" 9 | 10 | def initialize(tag_name, text, tokens) 11 | super 12 | @text = text.shellsplit 13 | end 14 | 15 | def render(context) 16 | baseurl = context.registers[:site].config['baseurl'] 17 | if @text[0].start_with?('http://', 'https://','//') 18 | "<figure class='fullwidth'><img src='#{@text[0]}'/>"+ 19 | "<figcaption>#{@text[1]}</figcaption></figure>" 20 | else 21 | "<figure class='fullwidth'><img src='#{baseurl}/#{@text[0]}'/>"+ 22 | "<figcaption>#{@text[1]}</figcaption></figure>
" 23 | end 24 | end 25 | end 26 | end 27 | 28 | Liquid::Template.register_tag('fullwidth', Jekyll::RenderFullWidthTag) 29 | -------------------------------------------------------------------------------- /_plugins/main_column_img.rb: -------------------------------------------------------------------------------- 1 | ## Liquid tag 'maincolumn-figure' used to add image data that fits within the 2 | ## main column area of the layout 3 | ## Usage {% maincolumn 'path/to/image' 'This is the caption' %} 4 | # 5 | module Jekyll 6 | class RenderMainColumnTag < Liquid::Tag 7 | 8 | require "shellwords" 9 | 10 | def initialize(tag_name, text, tokens) 11 | super 12 | @text = text.shellsplit 13 | end 14 | 15 | def render(context) 16 | baseurl = context.registers[:site].config['baseurl'] 17 | if @text[0].start_with?('http://', 'https://','//') 18 | "
<figure><img src='#{@text[0]}'/><figcaption>#{@text[1]}</figcaption></figure>" 19 | else 20 | "<figure><img src='#{baseurl}/#{@text[0]}'/><figcaption>#{@text[1]}</figcaption></figure>" 21 | end 22 | end 23 | end 24 | 25 | 26 | Liquid::Template.register_tag('maincolumn', Jekyll::RenderMainColumnTag) 27 | -------------------------------------------------------------------------------- /_plugins/margin_figure.rb: -------------------------------------------------------------------------------- 1 | ## Liquid tag 'marginfigure' used to add image data that fits within the margin 2 | ## area of the layout 3 | ## Usage {% marginfigure 'margin-id-whatever' 'path/to/image' 'This is the caption' %} 4 | # 5 | module Jekyll 6 | class RenderMarginFigureTag < Liquid::Tag 7 | 8 | require "shellwords" 9 | 10 | def initialize(tag_name, text, tokens) 11 | super 12 | @text = text.shellsplit 13 | end 14 | 15 | def render(context) 16 | baseurl = context.registers[:site].config['baseurl'] 17 | if @text[1].start_with?('http://', 'https://', '//') 18 | "<label for='#{@text[0]}' class='margin-toggle'>&#8853;</label>"+ 19 | "<input type='checkbox' id='#{@text[0]}' class='margin-toggle'/>"+ 20 | "<span class='marginnote'><img class='fullwidth' src='#{@text[1]}'/>#{@text[2]}</span>" 21 | else 22 | "<label for='#{@text[0]}' class='margin-toggle'>&#8853;</label>"+ 23 | "<input type='checkbox' id='#{@text[0]}' class='margin-toggle'/>"+ 24 | "<span class='marginnote'><img class='fullwidth' src='#{baseurl}/#{@text[1]}'/>#{@text[2]}</span>" 25 | end 26 | end 27 | end 28 | 29 | 30 | Liquid::Template.register_tag('marginfigure', Jekyll::RenderMarginFigureTag) 31 | -------------------------------------------------------------------------------- /_plugins/marginnote.rb: -------------------------------------------------------------------------------- 1 | module Jekyll 2 | class RenderMarginNoteTag < Liquid::Tag 3 | 4 | require "shellwords" 5 | 6 | def initialize(tag_name, text, tokens) 7 | super 8 | @text = text.shellsplit 9 | end 10 | 11 | def render(context) 12 | "<label for='#{@text[0]}' class='margin-toggle'>&#8853;</label><input type='checkbox' id='#{@text[0]}' class='margin-toggle'/><span class='marginnote'>#{@text[1]}</span> " 13 | end 14 | end 15 | end 16 | 17 | Liquid::Template.register_tag('marginnote', Jekyll::RenderMarginNoteTag) 18 | 19 | -------------------------------------------------------------------------------- /_plugins/mathjaxtag.rb: -------------------------------------------------------------------------------- 1 | module Jekyll 2 | class MathJaxBlockTag < Liquid::Tag 3 | def render(context) 4 | '<div class="mathblock"><script type="math/tex; mode=display">' 5 | end 6 | end 7 | class MathJaxInlineTag < Liquid::Tag 8 | def render(context) 9 | '<script type="math/tex">' 10 | end 11 | end 12 | class MathJaxEndBlockTag < Liquid::Tag 13 | def render(context) 14 | '</script></div>' 15 | end 16 | end 17 | class MathJaxEndInlineTag < Liquid::Tag 18 | def render(context) 19 | '</script>' 20 | end 21 | end 22 | end 23 | 24 | Liquid::Template.register_tag('math', Jekyll::MathJaxBlockTag) 25 | Liquid::Template.register_tag('m', Jekyll::MathJaxInlineTag) 26 | Liquid::Template.register_tag('endmath', Jekyll::MathJaxEndBlockTag) 27 | Liquid::Template.register_tag('em', Jekyll::MathJaxEndInlineTag) -------------------------------------------------------------------------------- /_plugins/newthought.rb: -------------------------------------------------------------------------------- 1 | ## Newthought tag will render anything in the tag with small caps 2 | ## Usage {% newthought 'Your text string here' %} will render a span 3 | ## YOUR TEXT STRING HERE (sort of, you know, small caps) if you are using the tufte.css file 4 | 5 | module Jekyll 6 | class RenderNewThoughtTag < Liquid::Tag 7 | 8 | require "shellwords" 9 | 10 | def initialize(tag_name, text, tokens) 11 | super 12 | @text = text.shellsplit 13 | end 14 | 15 | 16 | def render(context) 17 | "<span class='newthought'>#{@text[0]}</span> " 18 | end 19 | end 20 | end 21 | 22 | Liquid::Template.register_tag('newthought', Jekyll::RenderNewThoughtTag) -------------------------------------------------------------------------------- /_plugins/sidenote.rb: -------------------------------------------------------------------------------- 1 | module Jekyll 2 | class RenderSideNoteTag < Liquid::Tag 3 | 4 | require "shellwords" 5 | 6 | def initialize(tag_name, text, tokens) 7 | super 8 | @text = text.shellsplit 9 | end 10 | 11 | def render(context) 12 | "<label for='#{@text[0]}' class='margin-toggle sidenote-number'></label><input type='checkbox' id='#{@text[0]}' class='margin-toggle'/><span class='sidenote'>#{@text[1]}</span> " 13 | end 14 | end 15 | end 16 | 17 | Liquid::Template.register_tag('sidenote', Jekyll::RenderSideNoteTag) 18 | 19 | -------------------------------------------------------------------------------- /_sass/_fonts.scss: -------------------------------------------------------------------------------- 1 | // Font imports file. If you don't want these fonts, comment these out and add your own into the fonts directory 2 | // and point the src attribute to the file.
3 | // 4 | 5 | @charset "UTF-8"; 6 | // 7 | // @font-face { 8 | // font-family: ETBembo; 9 | // src: url("../fonts/et-bembo/et-bembo-roman-line-figures/et-bembo-roman-line-figures.eot"); 10 | // src: url("../fonts/et-bembo/et-bembo-roman-line-figures/et-bembo-roman-line-figures.eot?#iefix") format("embedded-opentype"), url("../fonts/et-bembo/et-bembo-roman-line-figures/et-bembo-roman-line-figures.woff") format("woff"), url("../fonts/et-bembo/et-bembo-roman-line-figures/et-bembo-roman-line-figures.ttf") format("truetype"), url("../fonts/et-bembo/et-bembo-roman-line-figures/et-bembo-roman-line-figures.svg#etbemboromanosf") format("svg"); 11 | // font-weight: normal; 12 | // font-style: normal 13 | // } 14 | // 15 | // @font-face { 16 | // font-family: ETBembo; 17 | // src: url("../fonts/et-bembo/et-bembo-display-italic-old-style-figures/et-bembo-display-italic-old-style-figures.eot"); 18 | // src: url("../fonts/et-bembo/et-bembo-display-italic-old-style-figures/et-bembo-display-italic-old-style-figures.eot?#iefix") format("embedded-opentype"), url("../fonts/et-bembo/et-bembo-display-italic-old-style-figures/et-bembo-display-italic-old-style-figures.woff") format("woff"), url("../fonts/et-bembo/et-bembo-display-italic-old-style-figures/et-bembo-display-italic-old-style-figures.ttf") format("truetype"), url("../fonts/et-bembo/et-bembo-display-italic-old-style-figures/et-bembo-display-italic-old-style-figures.svg#etbemboromanosf") format("svg"); 19 | // font-weight: normal; 20 | // font-style: italic 21 | // } 22 | // 23 | // @font-face { 24 | // font-family: ETBembo; 25 | // src: url("../fonts/et-bembo/et-bembo-bold-line-figures/et-bembo-bold-line-figures.eot"); 26 | // src: url("../fonts/et-bembo/et-bembo-bold-line-figures/et-bembo-bold-line-figures.eot?#iefix") format("embedded-opentype"), url("../fonts/et-bembo/et-bembo-bold-line-figures/et-bembo-bold-line-figures.woff") format("woff"), url("../fonts/et-bembo/et-bembo-bold-line-figures/et-bembo-bold-line-figures.ttf") format("truetype"), url("../fonts/et-bembo/et-bembo-bold-line-figures/et-bembo-bold-line-figures.svg#etbemboromanosf") format("svg"); 27 | // font-weight: bold; 28 | // font-style: normal 29 | // } 30 | // 31 | // @font-face { 32 | // font-family: ETBemboRomanOldStyle; 33 | // src: url("../fonts/et-bembo/et-bembo-roman-old-style-figures/et-bembo-roman-old-style-figures.eot"); 34 | // src: url("../fonts/et-bembo/et-bembo-roman-old-style-figures/et-bembo-roman-old-style-figures.eot?#iefix") format("embedded-opentype"), url("../fonts/et-bembo/et-bembo-roman-old-style-figures/et-bembo-roman-old-style-figures.woff") format("woff"), url("../fonts/et-bembo/et-bembo-roman-old-style-figures/et-bembo-roman-old-style-figures.ttf") format("truetype"), url("../fonts/et-bembo/et-bembo-roman-old-style-figures/et-bembo-roman-old-style-figures.svg#etbemboromanosf") format("svg"); 35 | // font-weight: normal; 36 | // font-style: normal; 37 | // } 38 | 39 | 40 | @font-face { 41 | font-family: "et-book"; 42 | src: url("../fonts/et-book/et-book-roman-line-figures/et-book-roman-line-figures.eot"); 43 | src: url("../fonts/et-book/et-book-roman-line-figures/et-book-roman-line-figures.eot?#iefix") format("embedded-opentype"), url("../fonts/et-book/et-book-roman-line-figures/et-book-roman-line-figures.woff") format("woff"), url("../fonts/et-book/et-book-roman-line-figures/et-book-roman-line-figures.ttf") format("truetype"), url("../fonts/et-book/et-book-roman-line-figures/et-book-roman-line-figures.svg#etbookromanosf") format("svg"); 44 | font-weight: 
normal; 45 | font-style: normal 46 | } 47 | 48 | @font-face { 49 | font-family: "et-book"; 50 | src: url("../fonts/et-book/et-book-display-italic-old-style-figures/et-book-display-italic-old-style-figures.eot"); 51 | src: url("../fonts/et-book/et-book-display-italic-old-style-figures/et-book-display-italic-old-style-figures.eot?#iefix") format("embedded-opentype"), url("../fonts/et-book/et-book-display-italic-old-style-figures/et-book-display-italic-old-style-figures.woff") format("woff"), url("../fonts/et-book/et-book-display-italic-old-style-figures/et-book-display-italic-old-style-figures.ttf") format("truetype"), url("../fonts/et-book/et-book-display-italic-old-style-figures/et-book-display-italic-old-style-figures.svg#etbookromanosf") format("svg"); 52 | font-weight: normal; 53 | font-style: italic 54 | } 55 | 56 | @font-face { 57 | font-family: "et-book"; 58 | src: url("../fonts/et-book/et-book-bold-line-figures/et-book-bold-line-figures.eot"); 59 | src: url("../fonts/et-book/et-book-bold-line-figures/et-book-bold-line-figures.eot?#iefix") format("embedded-opentype"), url("../fonts/et-book/et-book-bold-line-figures/et-book-bold-line-figures.woff") format("woff"), url("../fonts/et-book/et-book-bold-line-figures/et-book-bold-line-figures.ttf") format("truetype"), url("../fonts/et-book/et-book-bold-line-figures/et-book-bold-line-figures.svg#etbookromanosf") format("svg"); 60 | font-weight: bold; 61 | font-style: normal 62 | } 63 | 64 | @font-face { 65 | font-family: "et-book-roman-old-style"; 66 | src: url("../fonts/et-book/et-book-roman-old-style-figures/et-book-roman-old-style-figures.eot"); 67 | src: url("../fonts/et-book/et-book-roman-old-style-figures/et-book-roman-old-style-figures.eot?#iefix") format("embedded-opentype"), url("../fonts/et-book/et-book-roman-old-style-figures/et-book-roman-old-style-figures.woff") format("woff"), url("../fonts/et-book/et-book-roman-old-style-figures/et-book-roman-old-style-figures.ttf") format("truetype"), url("../fonts/et-book/et-book-roman-old-style-figures/et-book-roman-old-style-figures.svg#etbookromanosf") format("svg"); 68 | font-weight: normal; 69 | font-style: normal; 70 | } 71 | -------------------------------------------------------------------------------- /_sass/_settings.scss: -------------------------------------------------------------------------------- 1 | /* This file contains all the constants for colors and font styles */ 2 | 3 | $body-font: et-book, Palatino, "Palatino Linotype", "Palatino LT STD", "Book Antiqua", Georgia, serif; 4 | // $body-font: ETBembo, Palatino, "Palatino Linotype", "Palatino LT STD", "Book Antiqua", Georgia, serif; 5 | // Note that Gill Sans is the top of the stack and corresponds to what is used in Tufte's books 6 | // However, it is not a free font, so if it is not present on the computer that is viewing the webpage 7 | // The free Google 'Lato' font is used instead. It is similar. 8 | $sans-font: "Gill Sans", "Gill Sans MT", "Lato", Calibri, sans-serif; 9 | $code-font: Consolas, "Liberation Mono", Menlo, Courier, monospace; 10 | $url-font: "Lucida Console", "Lucida Sans Typewriter", Monaco, "Bitstream Vera Sans Mono", monospace; 11 | $text-color: #111; 12 | $bg-color: #fffff8; 13 | $contrast-color: #a00000; 14 | $border-color: #333333; 15 | $link-style: underline; // choices are 'color' or 'underline'. 
Default is color using $contrast-color set above 16 | 17 | 18 | 19 | -------------------------------------------------------------------------------- /_sass/_syntax-highlighting.scss: -------------------------------------------------------------------------------- 1 | /** 2 | * Syntax highlighting styles 3 | */ 4 | $spacing-unit: 30px; 5 | %vertical-rhythm { 6 | margin-bottom: $spacing-unit / 2; 7 | } 8 | 9 | .highlight { 10 | background: #fffff8; 11 | @extend %vertical-rhythm; 12 | 13 | .c { color: #998; font-style: italic } // Comment 14 | .err { color: #a61717; background-color: #e3d2d2 } // Error 15 | .k { font-weight: bold } // Keyword 16 | .o { font-weight: bold } // Operator 17 | .cm { color: #998; font-style: italic } // Comment.Multiline 18 | .cp { color: #999; font-weight: bold } // Comment.Preproc 19 | .c1 { color: #998; font-style: italic } // Comment.Single 20 | .cs { color: #999; font-weight: bold; font-style: italic } // Comment.Special 21 | .gd { color: #000; background-color: #fdd } // Generic.Deleted 22 | .gd .x { color: #000; background-color: #faa } // Generic.Deleted.Specific 23 | .ge { font-style: italic } // Generic.Emph 24 | .gr { color: #a00 } // Generic.Error 25 | .gh { color: #999 } // Generic.Heading 26 | .gi { color: #000; background-color: #dfd } // Generic.Inserted 27 | .gi .x { color: #000; background-color: #afa } // Generic.Inserted.Specific 28 | .go { color: #888 } // Generic.Output 29 | .gp { color: #555 } // Generic.Prompt 30 | .gs { font-weight: bold } // Generic.Strong 31 | .gu { color: #aaa } // Generic.Subheading 32 | .gt { color: #a00 } // Generic.Traceback 33 | .kc { font-weight: bold } // Keyword.Constant 34 | .kd { font-weight: bold } // Keyword.Declaration 35 | .kp { font-weight: bold } // Keyword.Pseudo 36 | .kr { font-weight: bold } // Keyword.Reserved 37 | .kt { color: #458; font-weight: bold } // Keyword.Type 38 | .m { color: #099 } // Literal.Number 39 | .s { color: #d14 } // Literal.String 40 | .na { color: #008080 } // Name.Attribute 41 | .nb { color: #0086B3 } // Name.Builtin 42 | .nc { color: #458; font-weight: bold } // Name.Class 43 | .no { color: #008080 } // Name.Constant 44 | .ni { color: #800080 } // Name.Entity 45 | .ne { color: #900; font-weight: bold } // Name.Exception 46 | .nf { color: #900; font-weight: bold } // Name.Function 47 | .nn { color: #555 } // Name.Namespace 48 | .nt { color: #000080 } // Name.Tag 49 | .nv { color: #008080 } // Name.Variable 50 | .ow { font-weight: bold } // Operator.Word 51 | .w { color: #bbb } // Text.Whitespace 52 | .mf { color: #099 } // Literal.Number.Float 53 | .mh { color: #099 } // Literal.Number.Hex 54 | .mi { color: #099 } // Literal.Number.Integer 55 | .mo { color: #099 } // Literal.Number.Oct 56 | .sb { color: #d14 } // Literal.String.Backtick 57 | .sc { color: #d14 } // Literal.String.Char 58 | .sd { color: #d14 } // Literal.String.Doc 59 | .s2 { color: #d14 } // Literal.String.Double 60 | .se { color: #d14 } // Literal.String.Escape 61 | .sh { color: #d14 } // Literal.String.Heredoc 62 | .si { color: #d14 } // Literal.String.Interpol 63 | .sx { color: #d14 } // Literal.String.Other 64 | .sr { color: #009926 } // Literal.String.Regex 65 | .s1 { color: #d14 } // Literal.String.Single 66 | .ss { color: #990073 } // Literal.String.Symbol 67 | .bp { color: #999 } // Name.Builtin.Pseudo 68 | .vc { color: #008080 } // Name.Variable.Class 69 | .vg { color: #008080 } // Name.Variable.Global 70 | .vi { color: #008080 } // Name.Variable.Instance 71 | .il { color: #099 } // 
Literal.Number.Integer.Long 72 | } 73 | -------------------------------------------------------------------------------- /autoregressive/autoregressive.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/deepgenerativemodels/notes/bd2303339eaaea884870125b473cc1ae8c980d51/autoregressive/autoregressive.png -------------------------------------------------------------------------------- /autoregressive/fvsbn.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/deepgenerativemodels/notes/bd2303339eaaea884870125b473cc1ae8c980d51/autoregressive/fvsbn.png -------------------------------------------------------------------------------- /autoregressive/index.md: -------------------------------------------------------------------------------- 1 | --- 2 | layout: post 3 | title: Autoregressive Models 4 | --- 5 | 6 | We begin our study of generative modeling with autoregressive models. As before, we assume we are given access to a dataset $$\mathcal{D}$$ of $$n$$-dimensional datapoints $$\mathbf{x}$$. For simplicity, we assume the datapoints are binary, i.e., $$\mathbf{x} \in \{0,1\}^n$$. 7 | 8 | Representation 9 | ============== 10 | 11 | By the chain rule of probability, we can factorize the joint distribution over the $$n$$ dimensions as 12 | 13 | {% math %} 14 | p(\mathbf{x}) = \prod\limits_{i=1}^{n}p(x_i \vert x_1, x_2, \ldots, x_{i-1}) = 15 | \prod\limits_{i=1}^{n} p(x_i \vert \mathbf{x}_{< i}) 16 | {% endmath %} 17 | 18 | where $$\mathbf{x}_{< i}=[x_1, x_2, \ldots, x_{i-1}]$$ denotes the vector of random variables with index less than $$i$$. 19 | 20 | The chain rule factorization can be expressed graphically as a Bayesian network. 21 | 22 | 23 | <figure>
24 | <img src="autoregressive.png" alt="drawing" /> 25 | <figcaption> 26 | Graphical model for an autoregressive Bayesian network with no conditional independence assumptions. 27 | </figcaption> 28 | </figure>
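To build intuition for the factorization, it can be checked numerically on a toy joint distribution. The following is a minimal NumPy sketch; the random joint distribution and the particular configuration are arbitrary placeholders, not from the text:

```python
import numpy as np

rng = np.random.default_rng(0)

# An arbitrary joint distribution over three binary variables (x1, x2, x3).
p = rng.random((2, 2, 2))
p /= p.sum()

x1, x2, x3 = 1, 0, 1  # an arbitrary configuration

# Conditionals obtained by marginalizing the joint.
p_x1 = p.sum(axis=(1, 2))[x1]                            # p(x1)
p_x2 = p.sum(axis=2)[x1, x2] / p.sum(axis=(1, 2))[x1]    # p(x2 | x1)
p_x3 = p[x1, x2, x3] / p.sum(axis=2)[x1, x2]             # p(x3 | x1, x2)

# Chain rule: the joint equals the product of the conditionals.
assert np.isclose(p[x1, x2, x3], p_x1 * p_x2 * p_x3)
```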
29 | 30 | Such a Bayesian network that makes no conditional independence assumptions is said to obey the *autoregressive* property. 31 | The term *autoregressive* originates from the literature on time-series models where observations from the previous time-steps are used to predict the value at the current time step. Here, we fix an ordering of the variables $$x_1, x_2, \ldots, x_n$$ and the distribution for the $$i$$-th random variable depends on the values of all the preceding random variables in the chosen ordering $$x_1, x_2, \ldots, x_{i-1}$$. 32 | 33 | If we allow every conditional $$p(x_i \vert \mathbf{x}_{< i})$$ to be specified in tabular form, then such a representation is fully general and can represent any possible distribution over $$n$$ random variables. However, the space complexity of such a representation grows exponentially with $$n$$. 34 | 35 | To see why, let us consider the conditional for the last dimension, given by $$p(x_n \vert \mathbf{x}_{< n})$$. In order to fully specify this conditional, we need to specify a probability distribution for each of the $$2^{n-1}$$ configurations of the variables $$x_1, x_2, \ldots, x_{n-1}$$. For any one of these configurations, the probabilities of $$x_n = 0$$ and $$x_n = 1$$ must sum to one, so a single parameter suffices per configuration, and the total number of parameters for specifying this conditional is $$2^{n-1}$$. Hence, a tabular representation for the conditionals is impractical for learning the joint distribution factorized via the chain rule. 36 | 37 | In an *autoregressive generative model*, the conditionals are specified as parameterized functions with a fixed number of parameters. That is, we assume the conditional distributions $$p(x_i \vert \mathbf{x}_{< i})$$ correspond to a Bernoulli random variable and learn a function that maps the preceding random variables $$x_1, x_2, \ldots, x_{i-1}$$ to the 38 | mean of this distribution. Hence, we have 39 | {% math %} 40 | p_{\theta_i}(x_i \vert \mathbf{x}_{< i}) = \mathrm{Bern}(f_i(x_1, x_2, \ldots, x_{i-1})) 41 | {% endmath %} 42 | where $$\theta_i$$ denotes the set of parameters used to specify the mean 43 | function $$f_i: \{0,1\}^{i-1}\rightarrow [0,1]$$. 44 | 45 | 46 | The number of parameters of an autoregressive generative model is given by $$\sum_{i=1}^n \vert \theta_i \vert$$. As we shall see in the examples below, this is far smaller than in the tabular setting considered previously. Unlike the tabular setting, however, an autoregressive generative model cannot represent all possible distributions. Its expressiveness is limited because we restrict the conditional distributions to be Bernoulli random variables with means specified via a restricted class of parameterized functions. 47 |
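To make the counting explicit, the following sketch compares the tabular parameter count with that of a parameterization using $$i$$ parameters for the $$i$$-th conditional, as in the linear model introduced below (the values of $$n$$ are arbitrary):

```python
def tabular_params(n):
    # The i-th conditional needs one Bernoulli mean per configuration
    # of the preceding i - 1 binary variables: 2^(i-1) parameters.
    return sum(2 ** (i - 1) for i in range(1, n + 1))  # = 2^n - 1

def linear_params(n):
    # A linear parameterization with i parameters for the i-th conditional.
    return sum(i for i in range(1, n + 1))  # = n(n+1)/2 = O(n^2)

for n in (10, 20, 30):
    print(n, tabular_params(n), linear_params(n))
# n = 30 already requires 2^30 - 1 = 1,073,741,823 tabular parameters,
# versus 465 for the linear parameterization.
```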
48 | <figure> 49 | <img src="fvsbn.png" alt="drawing" /> 50 | <figcaption> 51 | A fully visible sigmoid belief network over four variables. The conditionals are denoted by \(\widehat{x}_1, \widehat{x}_2, \widehat{x}_3, \widehat{x}_4\) respectively. 52 | </figcaption> 53 | </figure> 54 |
In the simplest case, we can specify the function as a linear combination of the input elements followed by a sigmoid non-linearity (to restrict the output to lie between 0 and 1). This gives us the formulation of a *fully-visible sigmoid belief network* ([FVSBN](https://papers.nips.cc/paper/1153-does-the-wake-sleep-algorithm-produce-good-density-estimators.pdf)). 55 | 56 | {% math %} 57 | f_i(x_1, x_2, \ldots, x_{i-1}) =\sigma(\alpha^{(i)}_0 + \alpha^{(i)}_1 x_1 + \ldots + \alpha^{(i)}_{i-1} x_{i-1}) 58 | {% endmath %} 59 | 60 | where $$\sigma$$ denotes the sigmoid function and $$\theta_i=\{\alpha^{(i)}_0,\alpha^{(i)}_1, \ldots, \alpha^{(i)}_{i-1}\}$$ denotes the parameters of the mean function. The conditional for variable $$i$$ requires $$i$$ parameters, and hence the total number of parameters in the model is given by $$\sum_{i=1}^n i = O(n^2)$$. Note that this is far fewer than the exponential number of parameters required in the tabular case. 61 | 62 | A natural way to increase the expressiveness of an autoregressive generative model is to use more flexible parameterizations for the mean function, e.g., multi-layer perceptrons (MLP). For example, consider the case of a neural network with one hidden layer. The mean function for variable $$i$$ can be expressed as 63 | 64 | {% math %} 65 | \mathbf{h}_i = \sigma(A_i \mathbf{x}_{< i} + \mathbf{c}_i)\\ 66 | f_i(x_1, x_2, \ldots, x_{i-1}) =\sigma(\boldsymbol{\alpha}^{(i)}\mathbf{h}_i +b_i ) 67 | {% endmath %} 68 | 69 | where $$\mathbf{h}_i \in \mathbb{R}^d$$ denotes the hidden layer activations for the MLP and $$\theta_i = \{A_i \in \mathbb{R}^{d\times (i-1)}, \mathbf{c}_i \in \mathbb{R}^d, \boldsymbol{\alpha}^{(i)}\in \mathbb{R}^d, b_i \in \mathbb{R}\}$$ 70 | is the set of parameters for the mean function $$f_i(\cdot)$$. The total number of parameters in this model is dominated by the matrices $$A_i$$ and given by $$O(n^2 d)$$. 71 | 72 |
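For concreteness, here is a minimal NumPy sketch of both mean-function parameterizations above; the toy sizes and random initial parameters are placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 4, 8  # number of variables and hidden units (toy sizes)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# FVSBN: f_i is logistic regression on the preceding variables.
# alpha[i] stores [alpha_0, alpha_1, ..., alpha_{i-1}] for conditional i + 1.
alpha = [0.1 * rng.normal(size=i + 1) for i in range(n)]

def fvsbn_means(x):
    """Return the Bernoulli means f_i(x_{<i}) for i = 1, ..., n."""
    return np.array([
        sigmoid(alpha[i][0] + alpha[i][1:] @ x[:i]) for i in range(n)
    ])

# One-hidden-layer MLP per conditional: h_i = sigma(A_i x_{<i} + c_i),
# f_i = sigma(alpha^(i) . h_i + b_i). A[0] has shape (d, 0), so the first
# conditional reduces to an unconditional Bernoulli mean.
A = [0.1 * rng.normal(size=(d, i)) for i in range(n)]
c = [0.1 * rng.normal(size=d) for _ in range(n)]
w = [0.1 * rng.normal(size=d) for _ in range(n)]
b = np.zeros(n)

def mlp_means(x):
    out = []
    for i in range(n):
        h = sigmoid(A[i] @ x[:i] + c[i])  # hidden activations for conditional i + 1
        out.append(sigmoid(w[i] @ h + b[i]))
    return np.array(out)

x = rng.integers(0, 2, size=n).astype(float)
print(fvsbn_means(x), mlp_means(x))
```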
73 | <figure> 74 | <img src="nade.png" alt="drawing" /> 75 | <figcaption> 76 | A neural autoregressive density estimator over four variables. The conditionals are denoted by \(\widehat{x}_1, \widehat{x}_2, \widehat{x}_3, \widehat{x}_4\) respectively. The blue connections denote the tied weights \(W[., i]\) used for computing the hidden layer activations. 77 | </figcaption> 78 | </figure>
79 | 80 | The *Neural Autoregressive Density Estimator* ([NADE](http://proceedings.mlr.press/v15/larochelle11a/larochelle11a.pdf)) provides an alternate MLP-based parameterization that is more statistically and computationally efficient than the vanilla approach. In NADE, parameters are shared across the functions used for evaluating the conditionals. In particular, the hidden layer activations are specified as 81 | 82 | {% math %} 83 | \mathbf{h}_i = \sigma(W_{., < i} \mathbf{x}_{< i} + \mathbf{c})\\ 84 | f_i(x_1, x_2, \ldots, x_{i-1}) =\sigma(\boldsymbol{\alpha}^{(i)}\mathbf{h}_i +b_i ) 85 | {% endmath %} 86 | where $$\theta=\{W\in \mathbb{R}^{d\times n}, \mathbf{c} \in \mathbb{R}^d, \{\boldsymbol{\alpha}^{(i)}\in \mathbb{R}^d\}^n_{i=1}, \{b_i \in \mathbb{R}\}^n_{i=1}\}$$ is 87 | the full set of parameters for the mean functions $$f_1(\cdot), f_2(\cdot), \ldots, f_n(\cdot)$$. The weight matrix $$W$$ and the bias vector $$\mathbf{c}$$ are shared across the conditionals. Sharing parameters offers two benefits: 88 | 89 | 1. The total number of parameters gets reduced from $$O(n^2 d)$$ to $$O(nd)$$ \[readers are encouraged to check!\]. 90 | 91 | 2. The hidden unit activations can be evaluated in $$O(nd)$$ time via the following recursive strategy (see the sketch after this list): 92 | {% math %} 93 | \mathbf{h}_i = \sigma(\mathbf{a}_i)\\ 94 | \mathbf{a}_{i+1} = \mathbf{a}_{i} + W[., i]x_i 95 | {% endmath %} 96 | with the base case given by $$\mathbf{a}_1=\mathbf{c}$$. 97 |
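For concreteness, a minimal NumPy sketch of this recursive evaluation follows; the toy sizes and random parameters are placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 4, 8  # toy sizes

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Shared NADE parameters.
W = 0.1 * rng.normal(size=(d, n))  # tied weights; column W[:, i] pairs with x_i
c = np.zeros(d)                    # shared hidden bias
V = 0.1 * rng.normal(size=(n, d))  # per-conditional output weights alpha^(i)
b = np.zeros(n)                    # per-conditional output biases

def nade_means(x):
    """All Bernoulli means f_i(x_{<i}) in O(nd) total time via
    a_{i+1} = a_i + W[:, i] * x_i, with base case a_1 = c."""
    a = c.copy()
    means = np.empty(n)
    for i in range(n):
        h = sigmoid(a)                       # h_i depends only on x_{<i}
        means[i] = sigmoid(V[i] @ h + b[i])
        a += W[:, i] * x[i]                  # O(d) update instead of O(id)
    return means

x = rng.integers(0, 2, size=n).astype(float)
print(nade_means(x))
```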
98 | 99 | ### Extensions to NADE 100 | 101 | The [RNADE](https://arxiv.org/abs/1306.0186) algorithm extends NADE to learn generative models over real-valued data. Here, the conditionals are modeled via a continuous distribution such as an equi-weighted mixture of $$K$$ Gaussians. Instead of learning a mean function, we now learn the means $$\mu_{i,1}, \mu_{i,2},\ldots, \mu_{i,K}$$ and variances $$\Sigma_{i,1}, \Sigma_{i,2},\ldots, \Sigma_{i,K}$$ of the $$K$$ Gaussians for every conditional. For statistical and computational efficiency, a single function $$g_i: \mathbb{R}^{i-1}\rightarrow\mathbb{R}^{2K}$$ outputs all the means and variances of the $$K$$ Gaussians for the $$i$$-th conditional distribution. 102 | 103 | Notice that NADE requires specifying a single, fixed ordering of the variables. The choice of ordering can lead to different models. The [EoNADE](https://arxiv.org/abs/1310.1757) algorithm allows training an ensemble of NADE models with different orderings. 104 | 105 | Learning and inference 106 | ====================== 107 | 108 | Recall that learning a generative model involves optimizing the closeness between the data and model distributions. One commonly used notion of closeness is the KL divergence between the data and the model distributions. 109 | 110 | {% math %} 111 | \min_{\theta\in \mathcal{M}}d_{KL} 112 | (p_{\mathrm{data}}, p_{\theta}) = \mathbb{E}_{\mathbf{x} \sim p_{\mathrm{data}} }\left[\log p_{\mathrm{data}}(\mathbf{x}) - \log p_{\theta}(\mathbf{x})\right] 113 | {% endmath %} 114 | 115 | Before moving any further, we make two comments about the KL divergence. First, we note that the KL divergence between any two distributions is asymmetric. As we navigate through this chapter, the reader is encouraged to think about what could go wrong if we decided to optimize the reverse KL divergence instead. Second, the KL divergence heavily penalizes any model distribution $$p_\theta$$ which assigns low probability to a datapoint that is likely to be sampled under $$p_{\mathrm{data}}$$. In the extreme case, if the density $$p_\theta(\mathbf{x})$$ evaluates to zero for a datapoint sampled from $$p_{\mathrm{data}}$$, the objective evaluates to $$+\infty$$. 116 | 117 | Since $$p_{\mathrm{data}}$$ does not depend on $$\theta$$, we can equivalently recover the optimal parameters via maximum likelihood estimation. 118 | 119 | {% math %} 120 | \max_{\theta\in \mathcal{M}}\mathbb{E}_{\mathbf{x} \sim p_{\mathrm{data}} }\left[\log p_{\theta}(\mathbf{x})\right]. 121 | {% endmath %} 122 | 123 | Here, $$\log p_{\theta}(\mathbf{x})$$ is referred to as the log-likelihood of the datapoint $$\mathbf{x}$$ with respect to the model distribution $$p_\theta$$. 124 | 125 | To approximate the expectation over the unknown $$p_{\mathrm{data}}$$, we make an assumption: points in the dataset $$\mathcal{D}$$ are sampled i.i.d. from $$p_{\mathrm{data}}$$. This allows us to obtain an unbiased Monte Carlo estimate of the objective as 126 | 127 | {% math %} 128 | \max_{\theta\in \mathcal{M}}\frac{1}{\vert \mathcal{D} \vert} \sum_{\mathbf{x} \in\mathcal{D} }\log p_{\theta}(\mathbf{x}) = \mathcal{L}(\theta \vert \mathcal{D}). 129 | {% endmath %} 130 | 131 | 132 | The maximum likelihood estimation (MLE) objective has an intuitive interpretation: pick the model parameters $$\theta \in \mathcal{M}$$ that maximize the log-probability of the observed datapoints in $$\mathcal{D}$$. 133 | 134 | In practice, we optimize the MLE objective using mini-batch gradient ascent. The algorithm operates in iterations. At every iteration $$t$$, we sample a mini-batch $$\mathcal{B}_t$$ of datapoints drawn randomly from the dataset ($$\vert \mathcal{B}_t\vert < \vert \mathcal{D} \vert$$) and compute gradients of the objective evaluated for the mini-batch. The parameters at iteration $$t+1$$ are then given via the following update rule 135 | {% math %} 136 | \theta^{(t+1)} = \theta^{(t)} + r_t \nabla_\theta\mathcal{L}(\theta^{(t)} \vert \mathcal{B}_t) 137 | {% endmath %} 138 | 139 | where $$\theta^{(t+1)}$$ and $$\theta^{(t)}$$ are the parameters at iterations $$t+1$$ and $$t$$ respectively, and $$r_t$$ is the learning rate at iteration $$t$$. Typically, we only specify the initial learning rate $$r_1$$ and update the rate based on a schedule. [Variants](http://cs231n.github.io/optimization-1/) of stochastic gradient ascent, such as RMSprop and Adam, employ modified update rules that work slightly better in practice. 140 | 141 | From a practical standpoint, we must think about how to choose hyperparameters (such as the initial learning rate) and a stopping criterion for the gradient updates. For both of these questions, we follow the standard practice in machine learning of monitoring the objective on a validation dataset. Consequently, we choose the hyperparameters with the best performance on the validation dataset and stop updating the parameters when the validation log-likelihoods cease to improve[^1]. 142 | 143 | Now that we have a well-defined objective and optimization procedure, the only remaining task is to evaluate the objective in the context of an autoregressive generative model. To this end, we substitute the factorized joint distribution of an autoregressive model in the MLE objective to get 144 | 145 | {% math %} 146 | \max_{\theta \in \mathcal{M}}\frac{1}{\vert \mathcal{D} \vert} \sum_{\mathbf{x} \in\mathcal{D} }\sum_{i=1}^n\log p_{\theta_i}(x_i \vert \mathbf{x}_{< i}) 147 | {% endmath %} 148 | 149 | where $$\theta = \{\theta_1, \theta_2, \ldots, \theta_n\}$$ now denotes the 150 | collective set of parameters for the conditionals. 151 |
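As an end-to-end illustration, here is a minimal sketch of mini-batch gradient ascent on this factorized objective for the FVSBN parameterization, together with the sequential sampling routine discussed in the next section. The synthetic dataset, batch size, learning rate, and iteration count are arbitrary placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
data = rng.integers(0, 2, size=(1000, n)).astype(float)  # stand-in for the dataset D

alpha = [np.zeros(i + 1) for i in range(n)]  # theta_i = [bias, weights]

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_prob(x):
    """log p_theta(x) = sum_i log p_{theta_i}(x_i | x_{<i})."""
    lp = 0.0
    for i in range(n):
        f = sigmoid(alpha[i][0] + alpha[i][1:] @ x[:i])
        lp += x[i] * np.log(f) + (1.0 - x[i]) * np.log(1.0 - f)
    return lp

def minibatch_step(batch, lr):
    """One gradient ascent step on L(theta | B_t)."""
    grads = [np.zeros_like(a) for a in alpha]
    for x in batch:
        for i in range(n):
            f = sigmoid(alpha[i][0] + alpha[i][1:] @ x[:i])
            err = x[i] - f  # d log Bern(x_i; f) / d logit
            grads[i][0] += err
            grads[i][1:] += err * x[:i]
    for i in range(n):
        alpha[i] += lr * grads[i] / len(batch)

for t in range(500):
    batch = data[rng.choice(len(data), size=64, replace=False)]
    minibatch_step(batch, lr=0.5)

# Monitor the average log-likelihood (use a validation split in practice).
print(np.mean([log_prob(x) for x in data]))

def sample():
    """Sequential (ancestral) sampling: draw x_1, then x_2 | x_1, and so on."""
    x = np.zeros(n)
    for i in range(n):
        f = sigmoid(alpha[i][0] + alpha[i][1:] @ x[:i])
        x[i] = float(rng.random() < f)
    return x
```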
152 | Inference in an autoregressive model is straightforward. For density estimation of an arbitrary point $$\mathbf{x}$$, we simply evaluate the log-conditionals $$\log p_{\theta_i}(x_i \vert \mathbf{x}_{< i})$$ for each $$i$$ and add these up to obtain the log-likelihood assigned by the model to $$\mathbf{x}$$. Since we know the conditioning vector $$\mathbf{x}$$ in advance, each of the conditionals can be evaluated in parallel. Hence, density estimation is efficient on modern hardware. 153 | 154 | Sampling from an autoregressive model is a sequential procedure. Here, we first sample $$x_1$$, then we sample $$x_2$$ conditioned on the sampled $$x_1$$, followed by $$x_3$$ conditioned on both $$x_1$$ and $$x_2$$, and so on until we sample $$x_n$$ conditioned on the previously sampled $$\mathbf{x}_{< n}$$. For applications requiring real-time generation of high-dimensional data such as audio synthesis, this sequential sampling can be an expensive process. Later in this course, we will discuss how Parallel WaveNet, an autoregressive model, sidesteps this expensive sampling process. 155 | 156 | 157 | 158 | Finally, an autoregressive model does not directly learn unsupervised representations of the data. In the next few lectures, we will look at latent variable models (e.g., variational autoencoders) which explicitly learn latent representations of the data. 159 | 160 | 165 | 166 | Footnotes 167 | ============== 168 | 169 | [^1]: Given the non-convex nature of such problems, the optimization procedure can get stuck in local optima. Hence, early stopping will generally not be optimal but is a very practical strategy. 170 | -------------------------------------------------------------------------------- /autoregressive/index.tex: -------------------------------------------------------------------------------- 1 | \section{Autoregressive Models} 2 | 3 | We begin our study with autoregressive generative models. As before, we assume we are given access to a dataset $\mathcal{D}$ of $n$-dimensional datapoints $\mathbf{x}$. For simplicity, we assume the datapoints are binary, i.e., $\mathbf{x} \in \{0,1\}^n$. 4 | 5 | \section{Representation} 6 | 7 | By the chain rule of probability, we can factorize the joint distribution over the $n$ dimensions as: 8 | 9 | \begin{equation} 10 | p(\mathbf{x}) = \prod\limits_{i=1}^{n}p(x_i \vert x_1, x_2, \ldots, x_{i-1}) = \prod\limits_{i=1}^{n} p(x_i \vert \mathbf{x}_{