├── README.md
└── pics
    ├── TTS_tasks.png
    ├── TTS_topics.png
    ├── VC_tasks.png
    └── VC_topics.png


/README.md:
--------------------------------------------------------------------------------
# ICASSP2022 TTS&VC Summary

A summary of the TTS and VC papers at ICASSP 2022, with the focus on TTS. It tallies the sessions, topics, and tasks/motivations, along with the number of papers for each.
* Among the TTS papers, acoustic model is the largest topic, followed by expressiveness and prosody.
* In the acoustic model topic, most papers are about model architecture, followed by duration modeling in the AM.
* In the expressiveness topic, most papers are about disentanglement.
* In the prosody topic, most papers are about control.
* In the front-end topic, most papers are about G2P and polyphone disambiguation.
* In the vocoder topic, most papers aim to improve efficiency.
* In the multimodal topic, most papers are about dubbing.


***

## Content
### [TTS](#tts)
* [Sessions](#tts_sessions)
* [Topics](#tts_topics)
* [Tasks](#tts_tasks)
    * [Acoustic model](#am)
    * [Expressiveness](#expressiveness)
    * [Prosody](#prosody)
    * [Front-end](#front_end)
    * [Vocoder](#vocoder)
    * [Adaptation](#adaptation)
    * [Multimodal](#multimodal)
    * [Multi-lingual/Cross-lingual TTS](#multilingual_crosslingual)
    * [Singing voice synthesis](#svs)
    * [Speech editing](#speech_editing)
    * [Others](#others)

### [VC](#vc)
* [Sessions](#vc_sessions)
* [Tasks](#vc_tasks)

***


## TTS
### TTS Sessions
| | Sessions | #Sessions | #Papers |
| ------------- | ------------- | ------------- | ------------- |
| 1 | Expressiveness/Adaptation | 4 | 24 |
| 2 | General topic | 2 | 12 |
| 3 | Novel acoustic model | 1 | 6 |
| 4 | Front-end | 1 | 6 |
| 5 | Vocoder and evaluation | 1 | 6 |
| 6 | Multi-lingual/Multimodal | 1 | 6 |
| 7 | Singing Voice and others | 1 | 5 |
| Total | 7 | 11 | 65 |


### TTS Topics
| | Topics | #Papers |
| ------------- | ------------- | ------------- |
| 1 | Acoustic model | 12 |
| 2 | Expressiveness | 10 |
| 3 | Prosody | 9 |
| 4 | Front-end | 8 |
| 5 | Vocoder | 6 |
| 6 | Adaptation | 5 |
| 7 | Multimodal | 5 |
| 8 | Multi-lingual/Cross-lingual | 4 |
| 9 | Singing voice synthesis | 2 |
| 10 | Speech editing | 2 |
| 11 | Others | 2 |
| Total | 11 | 65 |

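
The Total row of the TTS Topics table can be cross-checked with a few lines of Python. This is only a minimal sketch: the counts are copied by hand from the table, and the names `tts_topic_papers` and `topic_shares` are hypothetical helpers introduced here for illustration, not part of this repository.

```python
# Paper counts copied by hand from the "TTS Topics" table (assumed transcription).
tts_topic_papers = {
    "Acoustic model": 12,
    "Expressiveness": 10,
    "Prosody": 9,
    "Front-end": 8,
    "Vocoder": 6,
    "Adaptation": 5,
    "Multimodal": 5,
    "Multi-lingual/Cross-lingual": 4,
    "Singing voice synthesis": 2,
    "Speech editing": 2,
    "Others": 2,
}


def topic_shares(counts):
    """Return each topic's share of all papers as a percentage (one decimal)."""
    total = sum(counts.values())
    return {topic: round(100 * n / total, 1) for topic, n in counts.items()}


# Number of papers across all topics; should match the table's Total row.
total_papers = sum(tts_topic_papers.values())

# Percentage share per topic.
shares = topic_shares(tts_topic_papers)

# The three topics with the most papers, largest first.
top3 = sorted(tts_topic_papers, key=tts_topic_papers.get, reverse=True)[:3]

print(total_papers)
print(top3)
```

The same check applies to any of the per-task tables by swapping in that table's counts.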


### TTS Task/Motivation
#### AM (Acoustic model)
| | Tasks | #Papers |
| ------------- | ------------- | ------------- |
| 1 | Model | 3 |
| 2 | Duration | 2 |
| 3 | Analysis | 2 |
| 4 | Input | 1 |
| 5 | Speaker | 1 |
| 6 | Efficiency | 1 |
| 7 | Noisy data | 1 |
| 8 | Incremental TTS | 1 |
| Total | 8 | 12 |

#### Expressiveness
| | Tasks | #Papers |
| ------------- | ------------- | ------------- |
| 1 | Disentanglement | 3 |
| 2 | Emotion | 2 |
| 3 | Low-quality | 1 |
| 4 | Adaptation | 1 |
| 5 | Reference selection | 1 |
| 6 | Conversational TTS | 1 |
| 7 | Low-resource | 1 |
| Total | 7 | 10 |


#### Prosody
| | Tasks | #Papers |
| ------------- | ------------- | ------------- |
| 1 | Control | 4 |
| 2 | Rich prosody | 2 |
| 3 | Cross-sentence context | 2 |
| 4 | Word-level prosody | 1 |
| Total | 4 | 9 |


#### Front-end
| | Tasks | #Papers |
| ------------- | ------------- | ------------- |
| 1 | G2P | 2 |
| 2 | Polyphone disambiguation | 2 |
| 3 | Prosodic structure prediction | 1 |
| 4 | POS model compression | 1 |
| 5 | End-to-end text normalization | 1 |
| 6 | Mathematical formulas | 1 |
| Total | 6 | 8 |

#### Vocoder
| | Tasks | #Papers |
| ------------- | ------------- | ------------- |
| 1 | Efficiency | 4 |
| 2 | New method | 2 |
| Total | 2 | 6 |

#### Adaptation
| | Tasks | #Papers |
| ------------- | ------------- | ------------- |
| 1 | Speaker generation | 1 |
| 2 | VC for postprocessing | 1 |
| 3 | Multimodal | 1 |
| 4 | Low-quality data | 1 |
| 5 | New structure | 1 |
| Total | 5 | 5 |


#### Multimodal
| | Tasks | #Papers |
| ------------- | ------------- | ------------- |
| 1 | Dubbing | 4 |
| 2 | Speech-to-animation | 1 |
| Total | 2 | 5 |


#### Multi-lingual/Cross-lingual TTS
| | Tasks | #Papers |
| ------------- | ------------- | ------------- |
| 1 | Data augmentation | 1 |
| 2 | Lifelong learning | 1 |
| 3 | Triple loss | 1 |
| 4 | Improved structure | 1 |
| Total | 4 | 4 |


#### Singing voice synthesis
| | Tasks | #Papers |
| ------------- | ------------- | ------------- |
| 1 | End-to-end | 1 |
| 2 | Melody unsupervision | 1 |
| Total | 2 | 2 |



#### Speech editing
| | Tasks | #Papers |
| ------------- | ------------- | ------------- |
| 1 | Speech editing | 2 |
| Total | 1 | 2 |



#### Others
| | Tasks | #Papers |
| ------------- | ------------- | ------------- |
| 1 | MOS net | 1 |
| 2 | Phase reconstruction | 1 |
| Total | 2 | 2 |


****
## VC
### VC Sessions
| | Sessions | #Sessions | #Papers |
| ------------- | ------------- | ------------- | ------------- |
| 1 | Conversion | 2 | 12 |
| 2 | Representation | 1 | 6 |
| 3 | Singing voice and others | 1 | 6 |
| Total | 3 | 4 | 24 |



### VC Tasks

| | Topics | #Papers |
| ------------- | ------------- | ------------- |
| 1 | VC | 7 |
| 2 | One-shot/Representation | 6 |
| 3 | Singing VC | 3 |
| 4 | Dysarthric speech | 2 |
| 5 | Noise robust | 2 |
| 6 | Pronunciation robust | 1 |
| 7 | Streaming VC | 1 |
| 8 | Data augmentation | 1 |
| 9 | Tool | 1 |
| Total | 9 | 24 |

--------------------------------------------------------------------------------
/pics/TTS_tasks.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lmxue/ICASSP2022_TTS_VC_Summary/ef93606430817764d283476c60f4cbc53f284ef4/pics/TTS_tasks.png

--------------------------------------------------------------------------------
/pics/TTS_topics.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lmxue/ICASSP2022_TTS_VC_Summary/ef93606430817764d283476c60f4cbc53f284ef4/pics/TTS_topics.png

--------------------------------------------------------------------------------
/pics/VC_tasks.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lmxue/ICASSP2022_TTS_VC_Summary/ef93606430817764d283476c60f4cbc53f284ef4/pics/VC_tasks.png

--------------------------------------------------------------------------------
/pics/VC_topics.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lmxue/ICASSP2022_TTS_VC_Summary/ef93606430817764d283476c60f4cbc53f284ef4/pics/VC_topics.png

--------------------------------------------------------------------------------