├── .gitignore ├── README.md ├── related_paper_queue.md └── src ├── DGM4NLP.jpg ├── MI1.png ├── MI2.png ├── MI3.png ├── MI4.png ├── MINotes.md ├── MINotes.pdf ├── VI4NLP_Recipe.pdf ├── annotated_arae.pdf ├── roadmap.01.png └── titlepage.jpeg /.gitignore: -------------------------------------------------------------------------------- 1 | src/.DS_Store 2 | .DS_Store 3 | local -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | 2 | 3 | ![title](src/titlepage.jpeg) 4 | 5 | DGMs 4 NLP. Deep Generative Models for Natural Language Processing. A Roadmap. 6 | 7 | Yao Fu, University of Edinburgh, yao.fu@ed.ac.uk 8 | 9 | \*\*Update\*\*: [How does GPT Obtain its Ability? Tracing Emergent Abilities of Language Models to their Sources](https://yaofu.notion.site/How-does-GPT-Obtain-its-Ability-Tracing-Emergent-Abilities-of-Language-Models-to-their-Sources-b9a57ac0fcf74f30a1ab9e3e36fa1dc1) 10 | 11 | \*\*Update\*\*: [A Closer Look at Language Model Emergent Abilities](https://yaofu.notion.site/A-Closer-Look-at-Large-Language-Models-Emergent-Abilities-493876b55df5479d80686f68a1abd72f) 12 | 13 | \*\*Update\*\*: [Large Language Models](#large-language-models) 14 | 15 | \*\*Update\*\*: [Long-range Dependency](#long-range-dependency); [Why S4 is Good at Long Sequence: Remembering a Sequence with Online Function Approximation](https://yaofu.notion.site/Why-S4-is-Good-at-Long-Sequence-Remembering-a-Sequence-with-Online-Function-Approximation-836fc54a49aa413b84997a265132f13f) 16 | 17 | \*\*TODO 1\*\*: Calibration; Prompting; Long-range transformers; State-space Models 18 | 19 | \*\*TODO 2\*\*: Matrix Factorization and Word embedding; Kernels; Gaussian Process 20 | 21 | \*\*TODO 3\*\*: Relationship between inference and RL; 22 | 23 | 24 | 25 | ---- 26 | ## Introduction 27 | 28 | ### Prelude 29 | 30 | (written in early 2019, originated from the [DGM seminar at Columbia](http://stat.columbia.edu/~cunningham/teaching/GR8201/)) 31 | 32 | Why do we want deep generative models? Because we want to learn the basic factors that generate language. Human language contains rich latent factors: the continuous ones might be emotion or intention; the discrete/structural ones might be POS/NER tags or syntax trees. Many of them are latent because in most cases we only observe the sentence itself. They are also generative: humans produce language based on an overall idea, the current emotion, the syntax, and all the other factors we can or cannot name. 33 | 34 | How do we model the generative process of language in a statistically principled way? Can we have a flexible framework that allows us to incorporate explicit supervision signals when we have labels, add distant supervision or logical/statistical constraints when we have no labels but do have other prior knowledge, or simply infer whatever makes the most sense when we have no labels or prior knowledge at all? Is it possible to exploit the modeling power of advanced neural architectures while remaining mathematical and probabilistic? DGMs allow us to achieve these goals. 35 | 36 | Let us begin the journey.
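To make the Prelude concrete before the reading list starts: the simplest instance of a deep generative model of language is a sentence VAE in the spirit of Bowman et al. 2015 (the first paper under [Generation](#generation)), where a continuous latent code z is inferred from a sentence and the sentence is regenerated from z. Below is a minimal illustrative sketch only, assuming PyTorch; the class and parameter names (`SentenceVAE`, `vocab_size`, the layer sizes) are hypothetical and not part of this repository.

```python
# Minimal sentence-VAE sketch (illustrative only, not this repo's code).
# Assumes PyTorch; all names and sizes are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SentenceVAE(nn.Module):
    def __init__(self, vocab_size=10000, embed_dim=128, hidden_dim=256, latent_dim=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.encoder = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.to_mu = nn.Linear(hidden_dim, latent_dim)      # q(z|x) mean
        self.to_logvar = nn.Linear(hidden_dim, latent_dim)  # q(z|x) log-variance
        self.z_to_h = nn.Linear(latent_dim, hidden_dim)     # condition the decoder on z
        self.decoder = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens):
        # tokens: [batch, seq_len] integer ids. Pipeline: x -> q(z|x) -> z -> p(x|z)
        emb = self.embed(tokens)
        _, h = self.encoder(emb)                            # h: [1, batch, hidden]
        mu, logvar = self.to_mu(h[-1]), self.to_logvar(h[-1])
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
        h0 = torch.tanh(self.z_to_h(z)).unsqueeze(0)
        dec_out, _ = self.decoder(emb, h0)                  # teacher forcing on observed tokens
        logits = self.out(dec_out)
        # Negative ELBO = reconstruction loss + KL(q(z|x) || N(0, I)); minimized during training.
        recon = F.cross_entropy(logits[:, :-1].reshape(-1, logits.size(-1)),
                                tokens[:, 1:].reshape(-1))
        kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
        return recon + kl
```

Everything below can be read as refinements of this picture: richer latent structure (tags, trees, segmentations), better inference (VI, sampling, reparameterization tricks), and stronger generators (transformers, diffusion models, large language models).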
37 | 38 | ### Chronology 39 | * 2013: VAE 40 | * 2014: GAN; Sequence to sequence; Attention Mechanism 41 | * 2015: Normalizing Flow; Diffusion Models 42 | * 2016: Gumbel-softmax; Google's Neural Machine Translation System (GNMT) 43 | * 2017: Transformers; ELMo 44 | * 2018: BERT 45 | * 2019: Probing and Bertology; GPT2 46 | * 2020: GPT3; Contrastive Learning; Compositional Generalization; Diffusion Models 47 | * 2021: Prompting; Score-based Generative Models; 48 | * 2022: State-space Models 49 | 50 | ## Table of Contents 51 | 52 | ![roadmap](src/roadmap.01.png) 53 | 54 | - [Introduction](#introduction) 55 | - [Prelude](#prelude) 56 | - [Chronology](#chronology) 57 | - [Table of Contents](#table-of-contents) 58 | - [Resources](#resources) 59 | - [DGM Seminars](#dgm-seminars) 60 | - [Courses](#courses) 61 | - [Books](#books) 62 | - [NLP Side](#nlp-side) 63 | - [Generation](#generation) 64 | - [Decoding and Search, General](#decoding-and-search-general) 65 | - [Constrained Decoding](#constrained-decoding) 66 | - [Non-autoregressive Decoding](#non-autoregressive-decoding) 67 | - [Decoding from Pretrained Language Model](#decoding-from-pretrained-language-model) 68 | - [Structured Prediction](#structured-prediction) 69 | - [Syntax](#syntax) 70 | - [Semantics](#semantics) 71 | - [Grammar Induction](#grammar-induction) 72 | - [Compositionality](#compositionality) 73 | - [ML Side](#ml-side) 74 | - [Sampling Methods](#sampling-methods) 75 | - [Variational Inference, VI](#variational-inference-vi) 76 | - [VAEs](#vaes) 77 | - [Reparameterization](#reparameterization) 78 | - [GANs](#gans) 79 | - [Flows](#flows) 80 | - [Score-based Generative Models](#score-based-generative-models) 81 | - [Diffusion Models](#diffusion-models) 82 | - [Advanced Topics](#advanced-topics) 83 | - [Neural Architectures](#neural-architectures) 84 | - [RNNs](#rnns) 85 | - [Transformers](#transformers) 86 | - [Language Model Pretraining](#language-model-pretraining) 87 | - [Neural Network Learnability](#neural-network-learnability) 88 | - [Long-range Transformers](#long-range-transformers) 89 | - [State-Space Models](#state-space-models) 90 | - [Large Language Models](#large-language-models) 91 | - [Solutions and Frameworks for Running Large Language Models](#solutions-and-frameworks-for-running-large-language-models) 92 | - [List of Large Language Models](#list-of-large-language-models) 93 | - [Emergent Abilities](#emergent-abilities) 94 | - [Optimization](#optimization) 95 | - [Gradient Estimation](#gradient-estimation) 96 | - [Discrete Structures](#discrete-structures) 97 | - [Inference](#inference) 98 | - [Efficient Inference](#efficient-inference) 99 | - [Posterior Regularization](#posterior-regularization) 100 | - [Geometry](#geometry) 101 | - [Randomization](#randomization) 102 | - [Generalization Theory](#generalization-theory) 103 | - [Representation](#representation) 104 | - [Information Theory](#information-theory) 105 | - [Disentanglement and Interpretability](#disentanglement-and-interpretability) 106 | - [Invariance](#invariance) 107 | - [Analysis and Critics](#analysis-and-critics) 108 | 109 | Citation: 110 | ``` 111 | @article{yao2019DGM4NLP, 112 | title = "Deep Generative Models for Natural Language Processing", 113 | author = "Yao Fu", 114 | year = "2019", 115 | url = "https://github.com/FranxYao/Deep-Generative-Models-for-Natural-Language-Processing" 116 | } 117 | ``` 118 | 119 | ## Resources 120 |
121 | * [How to write Variational Inference and Generative Models for NLP: a recipe](https://github.com/FranxYao/Deep-Generative-Models-for-Natural-Language-Processing/blob/master/src/VI4NLP_Recipe.pdf). This is strongly suggested for beginners writing papers about VAEs for NLP. 122 | 123 | * A Tutorial on Deep Latent Variable Models of Natural Language ([link](https://arxiv.org/abs/1812.06834)), EMNLP 18 124 | * Yoon Kim, Sam Wiseman and Alexander M. Rush, Harvard 125 | 126 | * Latent Structure Models for NLP. ACL 2019 tutorial [link](https://deep-spin.github.io/tutorial/) 127 | * André Martins, Tsvetomila Mihaylova, Nikita Nangia, Vlad Niculae. 128 | 129 | ### DGM Seminars 130 | 131 | * Columbia STAT 8201 - [Deep Generative Models](http://stat.columbia.edu/~cunningham/teaching/GR8201/), by [John Cunningham](https://stat.columbia.edu/~cunningham/) 132 | 133 | * Stanford CS 236 - [Deep Generative Models](https://deepgenerativemodels.github.io/), by Stefano Ermon 134 | 135 | * U Toronto CS 2541 - [Differentiable Inference and Generative Models](https://www.cs.toronto.edu/~duvenaud/courses/csc2541/index.html), CS 2547 [Learning Discrete Latent Structures](https://duvenaud.github.io/learn-discrete/), CSC 2547 Fall 2019: [Learning to Search](https://duvenaud.github.io/learning-to-search/). By David Duvenaud 136 | 137 | * U Toronto STA 4273 Winter 2021 - [Minimizing Expectations](https://www.cs.toronto.edu/~cmaddis/courses/sta4273_w21/). By Chris Maddison 138 | 139 | * Berkeley CS294-158 - [Deep Unsupervised Learning](https://sites.google.com/view/berkeley-cs294-158-sp20/home). By Pieter Abbeel 140 | 141 | * Columbia STCS 8101 - [Representation Learning: A Probabilistic Perspective](http://www.cs.columbia.edu/~blei/seminar/2020-representation/index.html). By David Blei 142 | 143 | * Stanford CS324 - [Large Language Models](https://stanford-cs324.github.io/winter2022/). By Percy Liang, Tatsunori Hashimoto and Christopher Ré 144 | 145 | * U Toronto CSC2541 - [Neural Net Training Dynamics](https://www.cs.toronto.edu/~rgrosse/courses/csc2541_2021/). By Roger Grosse. 146 | 147 | ### Courses 148 | 149 | The foundation of DGMs is built upon probabilistic graphical models, so we take a look at the following resources: 150 | 151 | * Blei's Foundation of Graphical Models course, STAT 6701 at Columbia ([link](http://www.cs.columbia.edu/~blei/fogm/2019F/index.html)) 152 | * Foundations of probabilistic modeling, graphical models, and approximate inference. 153 | 154 | * Xing's Probabilistic Graphical Models, 10-708 at CMU ([link](https://sailinglab.github.io/pgm-spring-2019/)) 155 | * A really heavy course with extensive materials. 156 | * 5 modules in total: exact inference, approximate inference, DGMs, reinforcement learning, and non-parametrics. 157 | * All the lecture notes, video recordings, and homeworks are open-sourced. 158 | 159 | * Collins' Natural Language Processing, COMS 4995 at Columbia ([link](http://www.cs.columbia.edu/~mcollins/cs4705-spring2019/)) 160 | * Many inference methods for structured models are introduced. Also take a look at related notes from [Collins' homepage](http://www.cs.columbia.edu/~mcollins/) 161 | * Also check out [bilibili](https://www.bilibili.com/video/av29608234?from=search&seid=10252913399572988135) 162 | 163 | ### Books 164 | 165 | * Pattern Recognition and Machine Learning. Christopher M. Bishop. 2006 166 | * Probably the most classical textbook 167 | * The _core part_ of this book, in my understanding, is chapters 8-13, especially chapter 10, which introduces variational inference.
168 | 169 | * Machine Learning: A Probabilistic Perspective. Kevin P. Murphy. 2012 170 | * Compared with the PRML Bishop book, this book may be used as a super-detailed handbook for various graphical models and inference methods. 171 | 172 | * Graphical Models, Exponential Families, and Variational Inference. 2008 173 | * Martin J. Wainwright and Michael I. Jordan 174 | 175 | * Linguistic Structure Prediction. 2011 176 | * Noah Smith 177 | 178 | * The Syntactic Process. 2000 179 | * Mark Steedman 180 | 181 | ---- 182 | 183 | 184 | ## NLP Side 185 | 186 | 187 | ### Generation 188 | 189 | * Generating Sentences from a Continuous Space, CoNLL 15 190 | * Samuel R. Bowman, Luke Vilnis, Oriol Vinyals, Andrew M. Dai, Rafal Jozefowicz, Samy Bengio 191 | 192 | * Neural variational inference for text processing, ICML 16 193 | * Yishu Miao, Lei Yu, Phil Blunsom, DeepMind 194 | 195 | * Learning Neural Templates for Text Generation. EMNLP 2018 196 | * Sam Wiseman, Stuart M. Shieber, Alexander Rush. Harvard 197 | 198 | * Residual Energy Based Models for Text Generation. ICLR 20 199 | * Yuntian Deng, Anton Bakhtin, Myle Ott, Arthur Szlam, Marc'Aurelio Ranzato. Harvard and FAIR 200 | 201 | * Paraphrase Generation with Latent Bag of Words. NeurIPS 2019. 202 | * Yao Fu, Yansong Feng, and John P. Cunningham. Columbia 203 | 204 | 205 | 206 | ### Decoding and Search, General 207 | 208 | * Fairseq Decoding Library. [[github](https://github.com/pytorch/fairseq/blob/master/fairseq/search.py)] 209 | 210 | * Controllable Neural Text Generation [[Lil'Log](https://lilianweng.github.io/lil-log/2021/01/02/controllable-neural-text-generation.html)] 211 | 212 | * Best-First Beam Search. TACL 2020 213 | * Clara Meister, Tim Vieira, Ryan Cotterell 214 | 215 | * The Curious Case of Neural Text Degeneration. ICLR 2020 216 | * Ari Holtzman, Jan Buys, Li Du, Maxwell Forbes, Yejin Choi 217 | 218 | * Comparison of Diverse Decoding Methods from Conditional Language Models. ACL 2019 219 | * Daphne Ippolito, Reno Kriz, Maria Kustikova, João Sedoc, Chris Callison-Burch 220 | 221 | * Stochastic Beams and Where to Find Them: The Gumbel-Top-k Trick for Sampling Sequences Without Replacement. ICML 19 222 | * Wouter Kool, Herke van Hoof, Max Welling 223 | 224 | * Conditional Poisson Stochastic Beam Search. EMNLP 2021 225 | * Clara Meister, Afra Amini, Tim Vieira, Ryan Cotterell 226 | 227 | * Massive-scale Decoding for Text Generation using Lattices. 2021 228 | * Jiacheng Xu and Greg Durrett 229 | 230 | 231 | 232 | ### Constrained Decoding 233 | * Lexically Constrained Decoding for Sequence Generation Using Grid Beam Search. ACL 2017 234 | * Chris Hokamp, Qun Liu 235 | 236 | * Fast Lexically Constrained Decoding with Dynamic Beam Allocation for Neural Machine Translation. NAACL 2018 237 | * Matt Post, David Vilar 238 | 239 | * Improved Lexically Constrained Decoding for Translation and Monolingual Rewriting. NAACL 2019 240 | * J. Edward Hu, Huda Khayrallah, Ryan Culkin, Patrick Xia, Tongfei Chen, Matt Post, Benjamin Van Durme 241 | 242 | * Towards Decoding as Continuous Optimisation in Neural Machine Translation. EMNLP 2017 243 | * Cong Duy Vu Hoang, Gholamreza Haffari and Trevor Cohn. 244 | 245 | * Gradient-guided Unsupervised Lexically Constrained Text Generation. EMNLP 2020 246 | * Lei Sha 247 | 248 | * Controlled Text Generation as Continuous Optimization with Multiple Constraints. 
2021 249 | * Sachin Kumar, Eric Malmi, Aliaksei Severyn, Yulia Tsvetkov 250 | 251 | * NeuroLogic Decoding: (Un)supervised Neural Text Generation with Predicate Logic Constraints. NAACL 2021 252 | * Ximing Lu, Peter West, Rowan Zellers, Ronan Le Bras, Chandra Bhagavatula, Yejin Choi 253 | 254 | * NeuroLogic A*esque Decoding: Constrained Text Generation with Lookahead Heuristics. 2021 255 | * Ximing Lu, Sean Welleck, Peter West, Liwei Jiang, Jungo Kasai, Daniel Khashabi, Ronan Le Bras, Lianhui Qin, Youngjae Yu, Rowan Zellers, Noah A. Smith, Yejin Choi 256 | 257 | * COLD Decoding: Energy-based Constrained Text Generation with Langevin Dynamics. 2022 258 | * Lianhui Qin, Sean Welleck, Daniel Khashabi, Yejin Choi 259 | 260 | 261 | ### Non-autoregressive Decoding 262 | 263 | Note: I have not fully gone through this chapter, please give me suggestions! 264 | 265 | * Non-Autoregressive Neural Machine Translation. ICLR 2018 266 | * Jiatao Gu, James Bradbury, Caiming Xiong, Victor O.K. Li, Richard Socher 267 | 268 | * Fully Non-autoregressive Neural Machine Translation: Tricks of the Trade. 269 | * Jiatao Gu, Xiang Kong. 270 | 271 | * Fast Decoding in Sequence Models Using Discrete Latent Variables. ICML 2021 272 | * Łukasz Kaiser, Aurko Roy, Ashish Vaswani, Niki Parmar, Samy Bengio, Jakob Uszkoreit, Noam Shazeer 273 | 274 | * Cascaded Text Generation with Markov Transformers. Arxiv 20 275 | * Yuntian Deng and Alexander Rush 276 | 277 | * Glancing Transformer for Non-Autoregressive Neural Machine Translation. ACL 2021 278 | * Lihua Qian, Hao Zhou, Yu Bao, Mingxuan Wang, Lin Qiu, Weinan Zhang, Yong Yu, Lei Li 279 | * This one is now deployed inside Bytedance 280 | 281 | 282 | ### Decoding from Pretrained Language Model 283 | 284 | TODO: more about it 285 | 286 | * Prompt Papers, ThuNLP ([link](https://github.com/thunlp/PromptPapers)) 287 | 288 | * CTRL: A Conditional Transformer Language Model for Controllable Generation. Arxiv 2019 289 | * Nitish Shirish Keskar, Bryan McCann, Lav R. Varshney, Caiming Xiong, Richard Socher 290 | 291 | * Plug and Play Language Models: a Simple Approach to Controlled Text Generation 292 | * Sumanth Dathathri, Andrea Madotto, Janice Lan, Jane Hung, Eric Frank, Piero Molino, Jason Yosinski, Rosanne Liu 293 | 294 | ### Structured Prediction 295 | 296 | * Torch-Struct: Deep Structured Prediction Library. [github](https://github.com/harvardnlp/pytorch-struct), [paper](https://arxiv.org/abs/2002.00876), [documentation](http://nlp.seas.harvard.edu/pytorch-struct/) 297 | * Alexander M. Rush. Cornell University 298 | 299 | * An introduction to Conditional Random Fields. 2012 300 | * Charles Sutton and Andrew McCallum. 301 | 302 | 303 | * Inside-Outside and Forward-Backward Algorithms Are Just Backprop. 2016. 304 | * Jason Eisner 305 | * Learning with Fenchel-Young Losses. JMLR 2019 306 | * Mathieu Blondel, André F. T. Martins, Vlad Niculae 307 | 308 | * Structured Attention Networks. ICLR 2017 309 | * Yoon Kim, Carl Denton, Luong Hoang, Alexander M. Rush 310 | 311 | * Differentiable Dynamic Programming for Structured Prediction and Attention. ICML 2018 312 | * Arthur Mensch and Mathieu Blondel. 313 | 314 | 315 | 316 | ### Syntax 317 | 318 | * Recurrent Neural Network Grammars. NAACL 16 319 | * Chris Dyer, Adhiguna Kuncoro, Miguel Ballesteros, and Noah Smith. 
320 | 321 | * Unsupervised Recurrent Neural Network Grammars, NAACL 19 322 | * Yoon Kim, Alexander Rush, Lei Yu, Adhiguna Kuncoro, Chris Dyer, and Gabor Melis 323 | 324 | * Differentiable Perturb-and-Parse: Semi-Supervised Parsing with a Structured Variational Autoencoder, ICLR 19 325 | * Caio Corro, Ivan Titov, Edinburgh 326 | 327 | 328 | ### Semantics 329 | 330 | * The Syntactic Process. 2000 331 | * Mark Steedman 332 | 333 | * Linguistically-Informed Self-Attention for Semantic Role Labeling. EMNLP 2018 Best paper award 334 | * Emma Strubell, Patrick Verga, Daniel Andor, David Weiss and Andrew McCallum. UMass Amherst and Google AI Language 335 | 336 | * Semantic Parsing with Semi-Supervised Sequential Autoencoders. 2016 337 | * Tomas Kocisky, Gabor Melis, Edward Grefenstette, Chris Dyer, Wang Ling, Phil Blunsom, Karl Moritz Hermann 338 | 339 | ### Grammar Induction 340 | * Grammar Induction and Unsupervised Learning, paper list. ([link](https://github.com/FranxYao/nlp-fundamental-frontier/blob/main/nlp/grammar_induction.md)) 341 | * Yao Fu 342 | 343 | ### Compositionality 344 | 345 | * [Compositional Generalization in NLP](https://github.com/FranxYao/CompositionalGeneralizationNLP). Paper list 346 | * Yao Fu 347 | 348 | * Generalization without Systematicity: On the Compositional Skills of Sequence-to-Sequence Recurrent Networks. ICML 2018 349 | * Brenden Lake and Marco Baroni 350 | 351 | * Improving Text-to-SQL Evaluation Methodology. ACL 2018 352 | * Catherine Finegan-Dollak, Jonathan K. Kummerfeld, Li Zhang, Karthik Ramanathan, Sesh Sadasivam, Rui Zhang, Dragomir Radev 353 | 354 | ---- 355 | 356 | ## ML Side 357 | 358 | 359 | ### Sampling Methods 360 | 361 | * Probabilistic inference using Markov chain Monte Carlo methods. 1993 362 | * Radford M Neal 363 | 364 | * Elements of Sequential Monte Carlo ([link](https://arxiv.org/abs/1903.04797)) 365 | * Christian A. Naesseth, Fredrik Lindsten, Thomas B. Schön 366 | 367 | * A Conceptual Introduction to Hamiltonian Monte Carlo ([link](https://arxiv.org/abs/1701.02434)) 368 | * Michael Betancourt 369 | 370 | * Candidate Sampling ([link](https://www.tensorflow.org/extras/candidate_sampling.pdf)) 371 | * Google TensorFlow Blog 372 | 373 | * Noise-contrastive estimation: A new estimation principle for unnormalized statistical models. AISTATS 2010 374 | * Michael Gutmann, Aapo Hyvärinen. University of Helsinki 375 | 376 | * A* Sampling. NIPS 2014 Best paper award 377 | * Chris J. Maddison, Daniel Tarlow, Tom Minka. University of Toronto and MSR 378 | 379 | 380 | 381 | ### Variational Inference, VI 382 | 383 | * Cambridge Variational Inference Reading Group ([link](http://www.statslab.cam.ac.uk/~sp825/vi.html)) 384 | * Sam Power. University of Cambridge 385 | 386 | * Variational Inference: A Review for Statisticians. 387 | * David M. Blei, Alp Kucukelbir, Jon D. McAuliffe. 388 | 389 | * Stochastic Variational Inference 390 | * Matthew D. Hoffman, David M. Blei, Chong Wang, John Paisley 391 | 392 | * Variational Bayesian Inference with Stochastic Search. ICML 12 393 | * John Paisley, David Blei, Michael Jordan. Berkeley and Princeton 394 | 395 | 396 | 397 | ### VAEs 398 | 399 | * Auto-Encoding Variational Bayes, ICLR 14 400 | * Diederik P. Kingma, Max Welling 401 | 402 | * beta-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework. ICLR 2017 403 | * Irina Higgins, Loic Matthey, Arka Pal, Christopher Burgess, Xavier Glorot, Matthew Botvinick, Shakir Mohamed, Alexander Lerchner 404 |
405 | * Importance Weighted Autoencoders. ICLR 2016 406 | * Yuri Burda, Roger Grosse, Ruslan Salakhutdinov 407 | 408 | * Stochastic Backpropagation and Approximate Inference in Deep Generative Models. ICML 14 409 | * Danilo Jimenez Rezende, Shakir Mohamed, Daan Wierstra 410 | * Reparameterization with deep Gaussian models. 411 | 412 | * Semi-amortized variational autoencoders, ICML 18 413 | * Yoon Kim, Sam Wiseman, Andrew C. Miller, David Sontag, Alexander M. Rush, Harvard 414 | 415 | * Adversarially Regularized Autoencoders, ICML 18 416 | * Jake (Junbo) Zhao, Yoon Kim, Kelly Zhang, Alexander M. Rush, Yann LeCun. 417 | 418 | 419 | 420 | 421 | ### Reparameterization 422 | More on reparameterization: how to reparameterize Gaussian mixtures, permutation matrices, and rejection samplers (Gamma and Dirichlet). 423 | 424 | * Stochastic Backpropagation through Mixture Density Distributions, Arxiv 16 425 | * Alex Graves 426 | 427 | * Reparameterization Gradients through Acceptance-Rejection Sampling Algorithms. AISTATS 2017 428 | * Christian A. Naesseth, Francisco J. R. Ruiz, Scott W. Linderman, David M. Blei 429 | 430 | * Implicit Reparameterization Gradients. NeurIPS 2018. 431 | * Michael Figurnov, Shakir Mohamed, and Andriy Mnih 432 | 433 | * Categorical Reparameterization with Gumbel-Softmax. ICLR 2017 434 | * Eric Jang, Shixiang Gu, Ben Poole 435 | 436 | * The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables. ICLR 2017 437 | * Chris J. Maddison, Andriy Mnih, and Yee Whye Teh 438 | 439 | * Invertible Gaussian Reparameterization: Revisiting the Gumbel-Softmax. 2020 440 | * Andres Potapczynski, Gabriel Loaiza-Ganem, John P. Cunningham 441 | 442 | * Reparameterizable Subset Sampling via Continuous Relaxations. IJCAI 2019 443 | * Sang Michael Xie and Stefano Ermon 444 | 445 | 446 | 447 | 448 | 449 | ### GANs 450 | 451 | * Generative Adversarial Networks, NIPS 14 452 | * Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio 453 | 454 | * Towards principled methods for training generative adversarial networks, ICLR 2017 455 | * Martin Arjovsky and Léon Bottou 456 | 457 | * Wasserstein GAN 458 | * Martin Arjovsky, Soumith Chintala, Léon Bottou 459 | 460 | * InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets. NIPS 2016 461 | * Xi Chen, Yan Duan, Rein Houthooft, John Schulman, Ilya Sutskever, Pieter Abbeel. UC Berkeley. OpenAI 462 | 463 | * Adversarially Learned Inference. ICLR 2017 464 | * Vincent Dumoulin, Ishmael Belghazi, Ben Poole, Olivier Mastropietro, Alex Lamb, Martin Arjovsky, Aaron Courville 465 | 466 | ### Flows 467 | 468 | * Flow Based Deep Generative Models, from [Lil'Log](https://lilianweng.github.io/lil-log/2018/10/13/flow-based-deep-generative-models.html) 469 | 470 | * Variational Inference with Normalizing Flows, ICML 15 471 | * Danilo Jimenez Rezende, Shakir Mohamed 472 | 473 | * Learning About Language with Normalizing Flows 474 | * Graham Neubig, CMU, [slides](http://www.phontron.com/slides/neubig19generative.pdf) 475 | 476 | * Improved Variational Inference with Inverse Autoregressive Flow 477 | * Diederik P Kingma, Tim Salimans, Rafal Jozefowicz, Xi Chen, Ilya Sutskever, Max Welling 478 | 479 | * Density estimation using Real NVP. ICLR 17 480 | * Laurent Dinh, Jascha Sohl-Dickstein, Samy Bengio 481 | 482 | * Unsupervised Learning of Syntactic Structure with Invertible Neural Projections. 
EMNLP 2018 483 | * Junxian He, Graham Neubig, Taylor Berg-Kirkpatrick 484 | 485 | * Latent Normalizing Flows for Discrete Sequences. ICML 2019. 486 | * Zachary M. Ziegler and Alexander M. Rush 487 | 488 | * Discrete Flows: Invertible Generative Models of Discrete Data. 2019 489 | * Dustin Tran, Keyon Vafa, Kumar Krishna Agrawal, Laurent Dinh, Ben Poole 490 | 491 | * FlowSeq: Non-Autoregressive Conditional Sequence Generation with Generative Flow. EMNLP 2019 492 | * Xuezhe Ma, Chunting Zhou, Xian Li, Graham Neubig, Eduard Hovy 493 | 494 | * Variational Neural Machine Translation with Normalizing Flows. ACL 2020 495 | * Hendra Setiawan, Matthias Sperber, Udhay Nallasamy, Matthias Paulik. Apple 496 | 497 | * On the Sentence Embeddings from Pre-trained Language Models. EMNLP 2020 498 | * Bohan Li, Hao Zhou, Junxian He, Mingxuan Wang, Yiming Yang, Lei Li 499 | 500 | ### Score-based Generative Models 501 | > FY: Need to see how score-based generative models and diffusion models can be used for discrete sequences 502 | 503 | * [Generative Modeling by Estimating Gradients of the Data Distribution](https://yang-song.github.io/blog/2021/score/). Blog 2021 504 | * Yang Song 505 | 506 | * [Score Based Generative Modeling Papers](https://scorebasedgenerativemodeling.github.io/) 507 | * researchers at the University of Oxford 508 | 509 | * [Generative Modeling by Estimating Gradients of the Data Distribution](https://arxiv.org/abs/1907.05600). NeurIPS 2019 510 | * Yang Song, Stefano Ermon 511 | 512 | ### Diffusion Models 513 | 514 | * [What are Diffusion Models?](https://lilianweng.github.io/lil-log/2021/07/11/diffusion-models.html) 2021 515 | * Lilian Weng 516 | 517 | * [Awesome-Diffusion-Models](https://github.com/heejkoo/Awesome-Diffusion-Models) 518 | * Heejoon Koo 519 | 520 | * [Deep Unsupervised Learning using Nonequilibrium Thermodynamics](https://arxiv.org/abs/1503.03585). 2015 521 | * Jascha Sohl-Dickstein, Eric A. Weiss, Niru Maheswaranathan, Surya Ganguli 522 | 523 | * [Denoising Diffusion Probabilistic Models](https://arxiv.org/abs/2006.11239). NeurIPS 2020 524 | * Jonathan Ho, Ajay Jain, Pieter Abbeel 525 | 526 | * [Argmax Flows and Multinomial Diffusion: Learning Categorical Distributions](https://arxiv.org/abs/2102.05379). NeurIPS 2021 527 | * Emiel Hoogeboom, Didrik Nielsen, Priyank Jaini, Patrick Forré, Max Welling 528 | 529 | * [Structured Denoising Diffusion Models in Discrete State-Spaces](https://arxiv.org/abs/2107.03006). NeurIPS 2021 530 | * Jacob Austin, Daniel D. Johnson, Jonathan Ho, Daniel Tarlow, Rianne van den Berg 531 | 532 | * [Autoregressive Diffusion Models](https://arxiv.org/abs/2110.02037). ICLR 2022 533 | * Emiel Hoogeboom, Alexey A. Gritsenko, Jasmijn Bastings, Ben Poole, Rianne van den Berg, Tim Salimans 534 | 535 | * [Diffusion-LM Improves Controllable Text Generation](https://arxiv.org/abs/2205.14217). 2022 536 | * Xiang Lisa Li, John Thickstun, Ishaan Gulrajani, Percy Liang, Tatsunori B. Hashimoto 537 | 538 | * [Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding](). 2022 539 | * Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily Denton, Seyed Kamyar Seyed Ghasemipour, Burcu Karagol Ayan, S. 
Sara Mahdavi, Raphael Gontijo Lopes, Tim Salimans, Jonathan Ho, David J Fleet, Mohammad Norouzi 540 | 541 | ---- 542 | ## Advanced Topics 543 | 544 | ### Neural Architectures 545 | 546 | 547 | #### RNNs 548 | 549 | * Ordered Neurons: Integrating Tree Structures into Recurrent Neural Networks 550 | * Yikang Shen, Shawn Tan, Alessandro Sordoni, Aaron Courville. Mila, MSR 551 | 552 | * RNNs can generate bounded hierarchical languages with optimal memory 553 | * John Hewitt, Michael Hahn, Surya Ganguli, Percy Liang, Christopher D. Manning 554 | 555 | #### Transformers 556 | 557 | * Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned. ACL 2019 558 | * Elena Voita, David Talbot, Fedor Moiseev, Rico Sennrich, Ivan Titov 559 | 560 | * Theoretical Limitations of Self-Attention in Neural Sequence Models. TACL 2019 561 | * Michael Hahn 562 | 563 | * Rethinking Attention with Performers. 2020 564 | * Krzysztof Choromanski, Valerii Likhosherstov, David Dohan, Xingyou Song, Andreea Gane, Tamas Sarlos, Peter Hawkins, Jared Davis, Afroz Mohiuddin, Lukasz Kaiser, David Belanger, Lucy Colwell, Adrian Weller 565 | 566 | #### Language Model Pretraining 567 | 568 | * THUNLP: Pre-trained Language Model paper list ([link](https://github.com/thunlp/PLMpapers)) 569 | * Xiaozhi Wang and Zhengyan Zhang, Tsinghua University 570 | 571 | * Tomohide Shibata's [BERT-related Papers](https://github.com/tomohideshibata/BERT-related-papers) 572 | 573 | #### Neural Network Learnability 574 | * [Neural Network Learnability](https://github.com/FranxYao/Semantics-and-Compositional-Generalization-in-Natural-Language-Processing#neural-network-learnability). Yao Fu 575 | 576 | 577 | #### Long-range Transformers 578 | 579 | * Long Range Arena: A Benchmark for Efficient Transformers 580 | * Yi Tay, Mostafa Dehghani, Samira Abnar, Yikang Shen, Dara Bahri, Philip Pham, Jinfeng Rao, Liu Yang, Sebastian Ruder, Donald Metzler 581 | 582 | #### State-Space Models 583 | 584 | * HiPPO: Recurrent Memory with Optimal Polynomial Projections. NeurIPS 2020 585 | * Albert Gu, Tri Dao, Stefano Ermon, Atri Rudra, Christopher Ré 586 | 587 | * Combining Recurrent, Convolutional, and Continuous-time Models with the Linear State Space Layer. NeurIPS 2021 588 | * Albert Gu, Isys Johnson, Karan Goel, Khaled Saab, Tri Dao, Atri Rudra, Christopher Ré 589 | 590 | * Efficiently Modeling Long Sequences with Structured State Spaces. ICLR 2022 591 | * Albert Gu, Karan Goel, and Christopher Ré 592 | 593 | * [Why S4 is Good at Long Sequence: Remembering a Sequence with Online Function Approximation.](https://yaofu.notion.site/Why-S4-is-Good-at-Long-Sequence-Remembering-a-Sequence-with-Online-Function-Approximation-836fc54a49aa413b84997a265132f13f) 2022 594 | * Yao Fu 595 | 596 | 597 | ### Large Language Models 598 | 599 | #### Solutions and Frameworks for Running Large Language Models 600 | 601 | * Serving OPT-175B using Alpa (350 GB GPU memory in total) [link](https://alpa.ai/tutorials/opt_serving.html) 602 | 603 | #### List of Large Language Models 604 | 605 | * GPT3 (175B). Language Models are Few-Shot Learners. May 2020 606 | 607 | * Megatron-Turing NLG (530B). Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model. Jan 2022 608 | 609 | * LaMDA (137B). LaMDA: Language Models for Dialog Applications. Jan 2022 610 | 611 | * Gopher (280B). Scaling Language Models: Methods, Analysis & Insights from Training Gopher. Dec 2021 612 |
613 | * Chinchilla (70B). Training Compute-Optimal Large Language Models. Mar 2022 614 | 615 | * PaLM (540B). PaLM: Scaling Language Modeling with Pathways. Apr 2022 616 | 617 | * OPT (175B). OPT: Open Pre-trained Transformer Language Models. May 2022 618 | 619 | * BLOOM (176B): BigScience Large Open-science Open-access Multilingual Language Model. May 2022 620 | 621 | * BlenderBot 3 (175B): a deployed conversational agent that continually learns to responsibly engage. Aug 2022 622 | 623 | 624 | 625 | #### Emergent Abilities 626 | 627 | * Scaling Laws for Neural Language Models. 2020 628 | * Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B. Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, Dario Amodei 629 | 630 | * Emergent Abilities of Large Language Models. 2022 631 | * Jason Wei, Yi Tay, Rishi Bommasani, Colin Raffel, Barret Zoph, Sebastian Borgeaud, Dani Yogatama, Maarten Bosma, Denny Zhou, Donald Metzler, Ed H. Chi, Tatsunori Hashimoto, Oriol Vinyals, Percy Liang, Jeff Dean, William Fedus. 632 | 633 | 634 | ### Optimization 635 | 636 | #### Gradient Estimation 637 | 638 | * [Minimizing Expectations](https://www.cs.toronto.edu/~cmaddis/courses/sta4273_w21/). Chris Maddison 639 | 640 | * Monte Carlo Gradient Estimation in Machine Learning 641 | * Shakir Mohamed, Mihaela Rosca, Michael Figurnov, Andriy Mnih. DeepMind 642 | 643 | * Variational Inference for Monte Carlo Objectives. ICML 16 644 | * Andriy Mnih, Danilo J. Rezende. DeepMind 645 | 646 | * REBAR: Low-variance, unbiased gradient estimates for discrete latent variable models. NIPS 17 647 | * George Tucker, Andriy Mnih, Chris J. Maddison, Dieterich Lawson, Jascha Sohl-Dickstein. Google Brain, DeepMind, Oxford 648 | 649 | * Backpropagation Through the Void: Optimizing Control Variates for Black-box Gradient Estimation. ICLR 18 650 | * Will Grathwohl, Dami Choi, Yuhuai Wu, Geoffrey Roeder, David Duvenaud. U Toronto and Vector Institute 651 | 652 | * Backpropagating through Structured Argmax using a SPIGOT. ACL 2018 Best Paper Honorable Mention. 653 | * Hao Peng, Sam Thomson, and Noah A. Smith 654 | 655 | * Understanding the Mechanics of SPIGOT: Surrogate Gradients for Latent Structure Learning. EMNLP 2020 656 | * Tsvetomila Mihaylova, Vlad Niculae, and André F. T. Martins 657 | 658 | 659 | 660 | #### Discrete Structures 661 | 662 | * Learning with Differentiable Perturbed Optimizers. NeurIPS 2020 663 | * Quentin Berthet, Mathieu Blondel, Olivier Teboul, Marco Cuturi, Jean-Philippe Vert, Francis Bach 664 | 665 | * Gradient Estimation with Stochastic Softmax Tricks. NeurIPS 2020 666 | * Max B. Paulus, Dami Choi, Daniel Tarlow, Andreas Krause, Chris J. Maddison. 667 | 668 | * Differentiable Dynamic Programming for Structured Prediction and Attention. ICML 18 669 | * Arthur Mensch, Mathieu Blondel. Inria Parietal and NTT Communication Science Laboratories 670 | 671 | * Stochastic Optimization of Sorting Networks via Continuous Relaxations 672 | * Aditya Grover, Eric Wang, Aaron Zweig, Stefano Ermon 673 | 674 | * Differentiable Ranks and Sorting using Optimal Transport. NeurIPS 2019 675 | * Marco Cuturi, Olivier Teboul, Jean-Philippe Vert 676 | 677 | * Reparameterizing the Birkhoff Polytope for Variational Permutation Inference. AISTATS 2018 678 | * Scott W. Linderman, Gonzalo E. Mena, Hal Cooper, Liam Paninski, John P. Cunningham. 679 | 680 | * A Regularized Framework for Sparse and Structured Neural Attention. NeurIPS 2017 681 |
682 | * SparseMAP: Differentiable Sparse Structured Inference. ICML 2018 683 | 684 | 685 | ### Inference 686 | 687 | * Topics in Advanced Inference. Yingzhen Li. ([Link](http://yingzhenli.net/home/pdf/topics_approx_infer.pdf)) 688 | 689 | #### Efficient Inference 690 | 691 | * Nested Named Entity Recognition with Partially-Observed TreeCRFs. AAAI 2021 692 | * Yao Fu, Chuanqi Tan, Mosha Chen, Songfang Huang, Fei Huang 693 | 694 | * Rao-Blackwellized Stochastic Gradients for Discrete Distributions. ICML 2019. 695 | * Runjing Liu, Jeffrey Regier, Nilesh Tripuraneni, Michael I. Jordan, Jon McAuliffe 696 | 697 | * Efficient Marginalization of Discrete and Structured Latent Variables via Sparsity. NeurIPS 2020 698 | * Gonçalo M. Correia, Vlad Niculae, Wilker Aziz, André F. T. Martins 699 | 700 | 701 | #### Posterior Regularization 702 | 703 | * Posterior Regularization for Structured Latent Variable Models. JMLR 2010 704 | * Kuzman Ganchev, João Graça, Jennifer Gillenwater, Ben Taskar. 705 | 706 | * Posterior Control of Blackbox Generation. 2019 707 | * Xiang Lisa Li and Alexander M. Rush. 708 | 709 | * Dependency Grammar Induction with a Neural Variational Transition-based Parser. AAAI 2019 710 | * Bowen Li, Jianpeng Cheng, Yang Liu, Frank Keller 711 | 712 | 713 | ### Geometry 714 | 715 | * A Short Course in Differential Geometry and Topology (Chinese translation: 微分几何与拓扑学简明教程) 716 | * A. S. Mishchenko and A. T. Fomenko (米先珂,福明珂) 717 | 718 | * Only Bayes Should Learn a Manifold (On the Estimation of Differential Geometric Structure from Data). Arxiv 2018 719 | * Søren Hauberg 720 | 721 | * The Riemannian Geometry of Deep Generative Models. CVPRW 2018 722 | * Hang Shao, Abhishek Kumar, P. Thomas Fletcher 723 | 724 | * The Geometry of Deep Generative Image Models and Its Applications. ICLR 2021 725 | * Binxu Wang and Carlos R. Ponce 726 | 727 | * Metrics for Deep Generative Models. AISTATS 2017 728 | * Nutan Chen, Alexej Klushyn, Richard Kurle, Xueyan Jiang, Justin Bayer, Patrick van der Smagt 729 | 730 | * First-Order Algorithms for Min-Max Optimization in Geodesic Metric Spaces. 2022 731 | * Michael I. Jordan, Tianyi Lin, Emmanouil V. Vlatakis-Gkaragkounis 732 | 733 | ### Randomization 734 | 735 | * Random Features for Large-Scale Kernel Machines. NeurIPS 2007 736 | * Ali Rahimi, Benjamin Recht 737 | 738 | * Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions. SIAM 2011 739 | * Nathan Halko, Per-Gunnar Martinsson, Joel A. Tropp 740 | 741 | * Efficient optimization of loops and limits with randomized telescoping sums. ICML 2019 742 | * Alex Beatson, Ryan P Adams 743 | 744 | * Telescoping Density-Ratio Estimation. NeurIPS 2020 745 | * Benjamin Rhodes, Kai Xu, Michael U. Gutmann 746 | 747 | * Bias-Free Scalable Gaussian Processes via Randomized Truncations. ICML 2021 748 | * Andres Potapczynski, Luhuan Wu, Dan Biderman, Geoff Pleiss, John P Cunningham 749 | 750 | * Randomized Automatic Differentiation. ICLR 2021 751 | * Deniz Oktay, Nick McGreivy, Joshua Aduol, Alex Beatson, Ryan P. Adams 752 | 753 | * Scaling Structured Inference with Randomization. 2021 754 | * Yao Fu, John Cunningham, Mirella Lapata 755 | 756 | 757 | 758 | ### Generalization Theory 759 | 760 | * CS229T. Statistical Learning Theory. 2016 761 | * Percy Liang 762 | 763 | 764 | ### Representation 765 | 766 | #### Information Theory 767 | 768 | * Elements of Information Theory. Cover and Thomas. 1991 769 | 770 | * On Variational Bounds of Mutual Information. ICML 2019 771 |
* Ben Poole, Sherjil Ozair, Aaron van den Oord, Alexander A. Alemi, George Tucker 772 | * A comprehensive discussion of all these MI variational bounds 773 | 774 | * Learning Deep Representations By Mutual Information Estimation And Maximization. ICLR 2019 775 | * R Devon Hjelm, Alex Fedorov, Samuel Lavoie-Marchildon, Karan Grewal, Phil Bachman, Adam Trischler, and Yoshua Bengio 776 | * A detailed comparison between different MI estimators, section 3.2. 777 | 778 | * MINE: Mutual Information Neural Estimation. ICML 2018 779 | * Mohamed Ishmael Belghazi, Aristide Baratin, Sai Rajeswar, Sherjil Ozair, Yoshua Bengio, Aaron Courville, R Devon Hjelm 780 | 781 | * Deep Variational Information Bottleneck. ICLR 2017 782 | * Alexander A. Alemi, Ian Fischer, Joshua V. Dillon, Kevin Murphy. Google Research 783 | 784 | 785 | 786 | #### Disentanglement and Interpretability 787 | 788 | * Identifying Bayesian Mixture Models 789 | * Michael Betancourt 790 | 791 | * Disentangling Disentanglement in Variational Autoencoders. ICML 2019 792 | * Emile Mathieu, Tom Rainforth, N. Siddharth, Yee Whye Teh 793 | 794 | * Challenging Common Assumptions in the Unsupervised Learning of Disentangled Representations. ICML 2019 795 | * Francesco Locatello, Stefan Bauer, Mario Lucic, Gunnar Rätsch, Sylvain Gelly, Bernhard Schölkopf, Olivier Bachem 796 | 797 | 798 | 799 | 800 | 801 | #### Invariance 802 | 803 | * Emergence of Invariance and Disentanglement in Deep Representations 804 | * Alessandro Achille and Stefano Soatto. UCLA. JMLR 2018 805 | 806 | * Invariant Risk Minimization 807 | * Martin Arjovsky, Léon Bottou, Ishaan Gulrajani, David Lopez-Paz. 2019. 808 | 809 | 810 | 811 | 812 | 813 | 814 | 815 | 816 | 817 | 818 | ### Analysis and Critics 819 | 820 | * Fixing a Broken ELBO. ICML 2018. 821 | * Alexander A. Alemi, Ben Poole, Ian Fischer, Joshua V. Dillon, Rif A. Saurous, Kevin Murphy 822 | 823 | * Tighter Variational Bounds are Not Necessarily Better. ICML 2018 824 | * Tom Rainforth, Adam R. Kosiorek, Tuan Anh Le, Chris J. Maddison, Maximilian Igl, Frank Wood, Yee Whye Teh 825 | 826 | * The continuous Bernoulli: fixing a pervasive error in variational autoencoders. NeurIPS 2019 827 | * Gabriel Loaiza-Ganem and John P. Cunningham. Columbia. 828 | 829 | * Do Deep Generative Models Know What They Don't Know? ICLR 2019 830 | * Eric Nalisnick, Akihiro Matsukawa, Yee Whye Teh, Dilan Gorur, Balaji Lakshminarayanan 831 | 832 | * Effective Estimation of Deep Generative Language Models. ACL 2020 833 | * Tom Pelsmaeker and Wilker Aziz. University of Edinburgh and University of Amsterdam 834 | 835 | * How Good is the Bayes Posterior in Deep Neural Networks Really? ICML 2020 836 | * Florian Wenzel, Kevin Roth, Bastiaan S. Veeling, Jakub Świątkowski, Linh Tran, Stephan Mandt, Jasper Snoek, Tim Salimans, Rodolphe Jenatton, Sebastian Nowozin 837 | 838 | * A statistical theory of cold posteriors in deep neural networks. ICLR 2021 839 | * Laurence Aitchison 840 | 841 | * Limitations of Autoregressive Models and Their Alternatives. NAACL 2021 842 | * Chu-Cheng Lin, Aaron Jaech, Xin Li, Matthew R. Gormley, Jason Eisner 843 | 844 | -------------------------------------------------------------------------------- /related_paper_queue.md: -------------------------------------------------------------------------------- 1 | * Latent Variable Model for Multi-modal Translation. ACL 19 2 | * Iacer Calixto, Miguel Rios and Wilker Aziz 3 | 4 | * Interpretable Neural Predictions with Differentiable Binary Variables. ACL 2019 5 | * Joost Bastings, Wilker Aziz and Ivan Titov. 
6 | 7 | * Lagging Inference Networks and Posterior Collapse in Variational Autoencoders, ICLR 19 8 | * Junxian He, Daniel Spokoyny, Graham Neubig, Taylor Berg-Kirkpatrick 9 | 10 | * Spherical Latent Spaces for Stable Variational Autoencoders, EMNLP 18 11 | * Jiacheng Xu and Greg Durrett, UT Austin 12 | 13 | * Avoiding Latent Variable Collapse with Generative Skip Models, AISTATS 19 14 | * Adji B. Dieng, Yoon Kim, Alexander M. Rush, David M. Blei 15 | 16 | * The Annotated Gumbel-softmax. Yao Fu. 2020 ([link](https://github.com/FranxYao/Annotated-Gumbel-Softmax-and-Score-Function)) 17 | 18 | * Continuous Hierarchical Representations with Poincaré Variational Auto-Encoders 19 | * Emile Mathieu, Charline Le Lan, Chris J. Maddison, Ryota Tomioka, Yee Whye Teh 20 | 21 | 22 | * Direct Optimization through arg max for Discrete Variational Auto-Encoder 23 | * Guy Lorberbom, Andreea Gane, Tommi Jaakkola, Tamir Hazan 24 | 25 | 26 | 27 | * My [notes on mutual information](src/MINotes.md). Yao Fu, 2019. [pdf](src/MINotes.pdf) 28 | * Basics of information theory 29 | 30 | * Generating Informative and Diverse Conversational Responses via Adversarial Information Maximization, NIPS 18 31 | * Yizhe Zhang, Michel Galley, Jianfeng Gao, Zhe Gan, Xiujun Li, Chris Brockett, Bill Dolan 32 | 33 | * Discovering Discrete Latent Topics with Neural Variational Inference, ICML 17 34 | * Yishu Miao, Edward Grefenstette, Phil Blunsom. Oxford 35 | 36 | * TopicRNN: A Recurrent Neural Network with Long-Range Semantic Dependency, ICLR 17 37 | * Adji B. Dieng, Chong Wang, Jianfeng Gao, John William Paisley 38 | 39 | * Topic Aware Neural Response Generation, AAAI 17 40 | * Chen Xing, Wei Wu, Yu Wu, Jie Liu, Yalou Huang, Ming Zhou, Wei-Ying Ma -------------------------------------------------------------------------------- /src/DGM4NLP.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/FranxYao/Deep-Generative-Models-for-Natural-Language-Processing/2f9f98fcf1da5a81dea9f2f796e8e640457f591f/src/DGM4NLP.jpg -------------------------------------------------------------------------------- /src/MI1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/FranxYao/Deep-Generative-Models-for-Natural-Language-Processing/2f9f98fcf1da5a81dea9f2f796e8e640457f591f/src/MI1.png -------------------------------------------------------------------------------- /src/MI2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/FranxYao/Deep-Generative-Models-for-Natural-Language-Processing/2f9f98fcf1da5a81dea9f2f796e8e640457f591f/src/MI2.png -------------------------------------------------------------------------------- /src/MI3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/FranxYao/Deep-Generative-Models-for-Natural-Language-Processing/2f9f98fcf1da5a81dea9f2f796e8e640457f591f/src/MI3.png -------------------------------------------------------------------------------- /src/MI4.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/FranxYao/Deep-Generative-Models-for-Natural-Language-Processing/2f9f98fcf1da5a81dea9f2f796e8e640457f591f/src/MI4.png -------------------------------------------------------------------------------- /src/MINotes.md: 
-------------------------------------------------------------------------------- 1 | # Mutual Information Estimation and Representation Learning 2 | 3 | mi1 5 | 6 | mi2 8 | 9 | mi3 11 | 12 | mi4 14 | -------------------------------------------------------------------------------- /src/MINotes.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/FranxYao/Deep-Generative-Models-for-Natural-Language-Processing/2f9f98fcf1da5a81dea9f2f796e8e640457f591f/src/MINotes.pdf -------------------------------------------------------------------------------- /src/VI4NLP_Recipe.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/FranxYao/Deep-Generative-Models-for-Natural-Language-Processing/2f9f98fcf1da5a81dea9f2f796e8e640457f591f/src/VI4NLP_Recipe.pdf -------------------------------------------------------------------------------- /src/annotated_arae.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/FranxYao/Deep-Generative-Models-for-Natural-Language-Processing/2f9f98fcf1da5a81dea9f2f796e8e640457f591f/src/annotated_arae.pdf -------------------------------------------------------------------------------- /src/roadmap.01.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/FranxYao/Deep-Generative-Models-for-Natural-Language-Processing/2f9f98fcf1da5a81dea9f2f796e8e640457f591f/src/roadmap.01.png -------------------------------------------------------------------------------- /src/titlepage.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/FranxYao/Deep-Generative-Models-for-Natural-Language-Processing/2f9f98fcf1da5a81dea9f2f796e8e640457f591f/src/titlepage.jpeg --------------------------------------------------------------------------------