├── README.md
├── index.html
└── photo
    ├── taoqin.jpg
    └── xuta.jpg


/README.md:
--------------------------------------------------------------------------------
# Neural Text to Speech Synthesis
Tutorial @ [IJCAI 2021](http://ijcai-21.org), August 19-26, 2021


## Speakers
[Xu Tan](https://www.microsoft.com/en-us/research/people/xuta/), Microsoft Research Asia, xuta@microsoft.com
[Tao Qin](https://www.microsoft.com/en-us/research/people/taoqin/), Microsoft Research Asia, taoqin@microsoft.com



## Abstract
Text to speech (TTS), which aims to synthesize natural and intelligible speech from text, has been a hot research topic in the artificial intelligence community and has become an important product service in industry. With the development of deep learning and artificial intelligence, neural network based TTS has significantly improved the quality of synthesized speech in recent years. In this tutorial, we will give an introduction to neural text to speech, which consists of four parts. In the first part, we will briefly overview the history of TTS technology. In the second part, we will introduce the key components in neural TTS, including text analysis, acoustic models, and vocoders. In the third part, we will review the work that pushes the frontier of TTS research and cover practical TTS products, including end-to-end TTS, non-autoregressive and lightweight TTS, robust/expressive/controllable TTS, low-resource TTS, and custom voice adaptation. At the end of the tutorial, we will describe several challenges of TTS and discuss future research directions.



## Outline

1. Background
2. Key components in TTS
2.1 Text analysis
2.2 Acoustic model
2.3 Vocoder
2.4 Towards end-to-end TTS
3. Advanced topics in TTS
3.1 Fast TTS
3.2 Low-resource TTS
3.3 Robust TTS
3.4 Expressive TTS
3.5 Adaptive TTS
4. Challenges and future directions

## Materials
[Slides](https://www.microsoft.com/en-us/research/uploads/prod/2023/04/TTS.ijcai21-642be55185047.pdf)
[Project page](https://www.microsoft.com/en-us/research/project/text-to-speech/)
[Speech demo page](https://speechresearch.github.io/)


## Other Related Links
[TTS tutorial](https://www.microsoft.com/en-us/research/uploads/prod/2021/02/ISCSLP2021-TTS-Tutorial.pdf) @ [ISCSLP 2021](https://www.iscslp2021.org/program/tutorials/)
[A talk on FastSpeech](https://resource.gtcevent.cn/gtc2020/pdf/CNS20269.pdf) @ NVIDIA GTC China 2020
[A talk on low-resource TTS](https://mp.weixin.qq.com/s/qEhsoWwi2MEL5Ude5QvBag)
[A webinar talk on TTS](https://www.youtube.com/watch?v=MA8PCvmr8B0)
[A talk on Towards Efficient Machine Learning for Speech and Music Applications](https://www.microsoft.com/en-us/research/uploads/prod/2021/07/Efficient-ML-for-Speech-and-Music-Xu-Tan.pdf)


--------------------------------------------------------------------------------
/index.html:
--------------------------------------------------------------------------------
Neural Text to Speech Synthesis | Tutorial @ IJCAI 2021

Neural Text to Speech Synthesis

Tutorial @ IJCAI 2021, August 19-26, 2021

Speakers

Xu Tan, Microsoft Research Asia, xuta@microsoft.com
Tao Qin, Microsoft Research Asia, taoqin@microsoft.com

Abstract

Text to speech (TTS), which aims to synthesize natural and intelligible speech from text, has been a hot research topic in the artificial intelligence community and has become an important product service in industry. With the development of deep learning and artificial intelligence, neural network based TTS has significantly improved the quality of synthesized speech in recent years. In this tutorial, we will give an introduction to neural text to speech, which consists of four parts. In the first part, we will briefly overview the history of TTS technology. In the second part, we will introduce the key components in neural TTS, including text analysis, acoustic models, and vocoders. In the third part, we will review the work that pushes the frontier of TTS research and cover practical TTS products, including end-to-end TTS, non-autoregressive and lightweight TTS, robust/expressive/controllable TTS, low-resource TTS, and custom voice adaptation. At the end of the tutorial, we will describe several challenges of TTS and discuss future research directions.

Outline

1. Background
2. Key components in TTS
2.1 Text analysis
2.2 Acoustic model
2.3 Vocoder
2.4 Towards end-to-end TTS
3. Advanced topics in TTS
3.1 Fast TTS
3.2 Low-resource TTS
3.3 Robust TTS
3.4 Expressive TTS
3.5 Adaptive TTS
4. Challenges and future directions

Materials

Slides
Project page
Speech demo page

TTS tutorial @ ISCSLP 2021
A talk on low-resource TTS @ Jiangmen
A talk on FastSpeech @ NVIDIA GTC China 2020
A webinar talk on TTS @ Microsoft Research
A talk on Towards Efficient Machine Learning for Speech and Music Applications


--------------------------------------------------------------------------------
/photo/taoqin.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tts-tutorial/ijcai2021/a31f1f7e2f4fc33e7f4a51ea6dd08faac466b50d/photo/taoqin.jpg


--------------------------------------------------------------------------------
/photo/xuta.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tts-tutorial/ijcai2021/a31f1f7e2f4fc33e7f4a51ea6dd08faac466b50d/photo/xuta.jpg


--------------------------------------------------------------------------------