├── NAR-ACL 2022.pdf
├── README.md
└── index.html

/NAR-ACL 2022.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/NAR-tutorial/acl2022/e3b0c43e7226f6ef7beeffddfa2ffd117cc53419/NAR-ACL 2022.pdf
--------------------------------------------------------------------------------

/README.md:
--------------------------------------------------------------------------------
# Non-Autoregressive Sequence Generation
Tutorial @ [ACL 2022](https://www.2022.aclweb.org/tutorials), May 22, 2022

## Speakers
[Jiatao Gu](https://jiataogu.me/), Facebook AI Research, jgu@fb.com
[Xu Tan](https://www.microsoft.com/en-us/research/people/xuta/), Microsoft Research Asia, xuta@microsoft.com


## Abstract
Non-autoregressive sequence generation (NAR) attempts to generate the entire or partial output sequences in parallel to speed up the generation process and avoid potential issues (e.g., label bias, exposure bias) in autoregressive generation. While it has received much research attention and has been applied in many sequence generation tasks in natural language and speech, naive NAR models still face many challenges in closing the performance gap with state-of-the-art autoregressive models because of a lack of modeling power. In this tutorial, we will provide a thorough introduction and review of non-autoregressive sequence generation in four sections: 1) Background, which covers the motivation of NAR generation, the problem definition, the evaluation protocol, and the comparison with standard autoregressive generation approaches. 2) Method, which includes different aspects: model architecture, objective function, training data, learning paradigm, and additional inference tricks. 3) Application, which covers different tasks in text and speech generation, and some advanced topics in applications. 4) Conclusion, in which we describe several research challenges and discuss potential future research directions. We hope this tutorial can serve both academic researchers and industry practitioners working on non-autoregressive sequence generation.


## Outline

PART 1 Introduction (~ 20 minutes)
  1.1 Problem definition
  1.2 Evaluation protocol
  1.3 Multi-modality problem

PART 2 Methods (~ 80 minutes)
  2.1 Model architectures
    2.1.1 Fully NAR models
    2.1.2 Iteration-based NAR models
    2.1.3 Partially NAR models
    2.1.4 Locally AR models
    2.1.5 NAR models with latent variables
  2.2 Objective functions
    2.2.1 Loss with latent variables
    2.2.2 Loss beyond token level
  2.3 Training data
  2.4 Learning paradigms
    2.4.1 Curriculum learning
    2.4.2 Self-supervised pre-training
  2.5 Inference methods and tricks

PART 3 Applications (~ 60 minutes)
  3.1 Task overview in text/speech/image generation
  3.2 NAR generation tasks
    3.2.1 Neural machine translation
    3.2.2 Text error correction
    3.2.3 Automatic speech recognition
    3.2.4 Text-to-speech / singing voice synthesis
    3.2.5 Image (pixel/token) generation
  3.3 Summary of NAR Applications
    3.3.1 Benefits of NAR for different tasks
    3.3.2 Addressing target-target/source dependency
    3.3.3 Data difficulty vs. model capacity
    3.3.4 Streaming vs. NAR, AR vs. iterative NAR

PART 4 Open problems, future directions, Q&A (~ 20 minutes)

## Materials
[Slides](https://nar-tutorial.github.io/acl2022/NAR-ACL%202022.pdf)
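
## AR vs. NAR decoding (illustrative sketch)

The abstract contrasts autoregressive (AR) generation, which emits one token per sequential step, with NAR generation, which fills all output positions in parallel. The toy Python sketch below illustrates only that control-flow difference; `toy_model`, `ar_decode`, and `nar_decode` are hypothetical stand-ins invented for this illustration, not code from the tutorial or from any NAR toolkit.

```python
from typing import List

def toy_model(src: List[str], prefix: List[str], position: int) -> str:
    """Hypothetical stand-in for a trained sequence model: it ignores its
    inputs and simply returns a canned token for each output position."""
    canned = ["hello", "world", "!"]
    return canned[position] if position < len(canned) else "<eos>"

def ar_decode(src: List[str], max_len: int = 8) -> List[str]:
    # Autoregressive decoding: one model call per output token, strictly
    # sequential, because step t conditions on the t tokens generated so far.
    out: List[str] = []
    for t in range(max_len):
        tok = toy_model(src, out, t)
        if tok == "<eos>":
            break
        out.append(tok)
    return out

def nar_decode(src: List[str], target_len: int) -> List[str]:
    # Non-autoregressive decoding: the output length is decided first (here it
    # is simply given), then every position is predicted without looking at
    # the other output tokens, so all positions could be computed in parallel
    # within a single forward pass.
    return [toy_model(src, [], t) for t in range(target_len)]

if __name__ == "__main__":
    src = ["bonjour", "le", "monde"]
    print(ar_decode(src))                  # ['hello', 'world', '!']
    print(nar_decode(src, target_len=3))   # ['hello', 'world', '!']
```

Because NAR positions are predicted independently, tokens from several equally valid outputs can be mixed within one hypothesis; this is the multi-modality problem listed in Part 1 of the outline, and the iterative and partially NAR models in Part 2 reintroduce some output-side dependency to address it.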

--------------------------------------------------------------------------------

/index.html:
--------------------------------------------------------------------------------
Non-Autoregressive Sequence Generation | Tutorial @ ACL 2022

Non-Autoregressive Sequence Generation
Tutorial @ ACL 2022, May 22, 2022

Speakers

Jiatao Gu, Facebook AI Research, jgu@fb.com
Xu Tan, Microsoft Research Asia, xuta@microsoft.com

Abstract

Non-autoregressive sequence generation (NAR) attempts to generate the entire or partial output sequences in parallel to speed up the generation process and avoid potential issues (e.g., label bias, exposure bias) in autoregressive generation. While it has received much research attention and has been applied in many sequence generation tasks in natural language and speech, naive NAR models still face many challenges in closing the performance gap with state-of-the-art autoregressive models because of a lack of modeling power. In this tutorial, we will provide a thorough introduction and review of non-autoregressive sequence generation in four sections: 1) Background, which covers the motivation of NAR generation, the problem definition, the evaluation protocol, and the comparison with standard autoregressive generation approaches. 2) Method, which includes different aspects: model architecture, objective function, training data, learning paradigm, and additional inference tricks. 3) Application, which covers different tasks in text and speech generation, and some advanced topics in applications. 4) Conclusion, in which we describe several research challenges and discuss potential future research directions. We hope this tutorial can serve both academic researchers and industry practitioners working on non-autoregressive sequence generation.

Outline

1. Introduction (~ 20 minutes)
   1.1 Problem definition
   1.2 Evaluation protocol
   1.3 Multi-modality problem

2. Methods (~ 80 minutes)
   2.1 Model architectures
       2.1.1 Fully NAR models
       2.1.2 Iteration-based NAR models
       2.1.3 Partially NAR models
       2.1.4 Locally AR models
       2.1.5 NAR models with latent variables
   2.2 Objective functions
       2.2.1 Loss with latent variables
       2.2.2 Loss beyond token level
   2.3 Training data
   2.4 Learning paradigms
       2.4.1 Curriculum learning
       2.4.2 Self-supervised pre-training
   2.5 Inference methods and tricks

3. Applications (~ 60 minutes)
   3.1 Task overview in text/speech/image generation
   3.2 NAR generation tasks
       3.2.1 Neural machine translation
       3.2.2 Text error correction
       3.2.3 Automatic speech recognition
       3.2.4 Text-to-speech / singing voice synthesis
       3.2.5 Image (pixel/token) generation
   3.3 Summary of NAR Applications
       3.3.1 Benefits of NAR for different tasks
       3.3.2 Addressing target-target/source dependency
       3.3.3 Data difficulty vs. model capacity
       3.3.4 Streaming vs. NAR, AR vs. iterative NAR

4. Open problems, future directions, Q&A (~ 20 minutes)

Materials

Slides

--------------------------------------------------------------------------------