├── README.md
└── pics
    ├── TTS_tasks.png
    ├── TTS_topics.png
    ├── VC_tasks.png
    └── VC_topics.png


/README.md:
--------------------------------------------------------------------------------
  1 | # ICASSP2022 TTS&VC Summary
  2 | 
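  3 | This repository summarizes the TTS and VC papers at ICASSP2022, with a focus on TTS. It tallies the sessions, topics, and tasks/motivations, along with the number of papers for each.
  4 | * Among the TTS topics, acoustic model has the most papers, followed by expressiveness and prosody.
  5 | * In the acoustic model topic, papers on model structure form the largest group, followed by papers on duration modeling in the AM.
  6 | * In the expressiveness topic, papers on disentanglement form the largest group.
  7 | * In the prosody topic, papers on control form the largest group.
  8 | * In the front-end topic, papers on G2P and polyphone disambiguation form the largest groups.
  9 | * In the vocoder topic, most papers aim to improve efficiency.
 10 | * In the multimodal topic, most papers are about dubbing.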
 11 | 
 12 | 
 13 | *** 
 14 | 
 15 | ## Content
 16 | ### [TTS](#tts)
 17 | * [Sessions](#tts_sessions)
 18 | * [Topics](#tts_topics)
 19 | * [Tasks](#tts_tasks)
 20 |     * [Acoustic model](#am)
 21 |     * [Expressiveness](#expressiveness)
 22 |     * [Prosody](#prosody)
 23 |     * [Front-end](#front_end)
 24 |     * [Vocoder](#vocoder)
 25 |     * [Adaptation](#adaptation)
 26 |     * [Multimodal](#multimodal)
 27 |     * [Multi-lingual/Cross-lingual TTS](#multilingual_crosslingual)
 28 |     * [Singing voice synthesis](#svs)
 29 |     * [Speech editing](#speech_editing)
 30 |     * [Others](#others)
 31 | 
 32 | ### [VC](#vc)
 33 | * [Sessions](#vc_sessions)
 34 | * [Tasks](#vc_tasks)
 35 | 
 36 | *** 
 37 | 
 38 | 
 39 | ## TTS <span id='tts'/>
 40 | ### TTS Sessions  <span id='tts_sessions'/>
 41 | |          | Sessions        | #Sessions     | #Papers |
 42 | | ------------- | ------------- | ------------- | ------------- |
 43 | |1 | Expressiveness/Adaptation  | 4  | 24  |
 44 | |2 | General topic  | 2  | 12  |
 45 | |3 | Novel acoustic model  | 1  | 6  |
 46 | |4 | Front-end  | 1  | 6  |
 47 | |5 | Vocoder and evaluation  | 1  | 6  |
 48 | |6 | Multi-lingual/Multimodal  | 1  | 6  |
 49 | |7 | Singing Voice and others | 1  | 5  |
 50 | | Total | 7  | 11   | 65   |
 51 | 
 52 | <div><img src='./pics/TTS_topics.png' width=500 alt=''> </img></div> 
 53 | 
 54 | ### TTS Topics  <span id='tts_topics'/>
 55 | |    | Topics  | #Papers | 
 56 | | ------------- | ------------- | ------------- | 
 57 | |1 | Acoustic model  | 12  |
 58 | |2 | Expressiveness  | 10  | 
 59 | |3 | Prosody    | 9  |
 60 | |4 | Front-end  | 8  | 
 61 | |5 | Vocoder  | 6 | 
 62 | |6 | Adaptation  | 5  | 
 63 | |7 | Multimodal | 5  | 
 64 | |8 | Multi-lingual/Cross-lingual | 4  |
 65 | |9 | Singing voice synthesis | 2  | 
 66 | |10 | Speech editing | 2  | 
 67 | |11 | Others | 2  | 
 68 | | Total | 11   | 65 | 
 69 | 
 70 | <div><img src='./pics/TTS_tasks.png' width=500 alt=''> </img></div> 
 71 | 
 72 | 
 73 | ### TTS Task/Motivation  <span id='tts_tasks'/>
 74 | #### AM (Acoustic model) <span id='am'/>
 75 | |    | Tasks  | #Papers | 
 76 | | ------------- | ------------- | ------------- | 
 77 | |1 | Model  | 3  |
 78 | |2 | Duration  | 2  | 
 79 | |3 | Analysis    | 2  |
 80 | |4 | Input  | 1  | 
 81 | |5 | Speaker  | 1 | 
 82 | |6 | Efficiency  | 1  | 
 83 | |7 | Noisy data | 1  | 
 84 | |8 | Incremental TTS | 1  |
 85 | | Total | 8   | 12 | 
 86 | 
 87 | #### Expressiveness <span id='expressiveness'/>
 88 | |    | Tasks  | #Papers | 
 89 | | ------------- | ------------- | ------------- | 
 90 | |1 | Disentanglement  | 3  |
 91 | |2 | Emotion  | 2  | 
 92 | |3 | Low-quality    | 1  |
 93 | |4 | Adaptation  | 1  | 
 94 | |5 | Reference selection  | 1 | 
 95 | |6 | Conversational TTS  | 1  | 
 96 | |7 | Low-resource | 1  | 
 97 | | Total | 7   | 10 | 
 98 | 
 99 | 
100 | #### Prosody <span id='prosody'/>
101 | |    | Tasks  | #Papers | 
102 | | ------------- | ------------- | ------------- | 
103 | |1 | Control  | 4  |
104 | |2 | Rich prosody  | 2  | 
105 | |3 | Cross-sentence context    | 2  |
106 | |4 | Word-level prosody  | 1  | 
107 | | Total | 4   | 9 | 
108 | 
109 | 
110 | #### Front-end <span id='front_end'/>
111 | |    | Tasks  | #Papers | 
112 | | ------------- | ------------- | ------------- | 
113 | |1 | G2P  | 2  |
114 | |2 | Polyphone disambiguation   | 2  | 
115 | |3 | Prosodic structure prediction     | 1  |
116 | |4 | POS model compression   | 1  |
117 | |5 | End-to-end text normalization   | 1 | 
118 | |6 | Mathematical formulas   | 1  |  
119 | | Total | 6   | 8 | 
120 | 
121 | #### Vocoder <span id='vocoder'/>
122 | |    | Tasks  | #Papers | 
123 | | ------------- | ------------- | ------------- | 
124 | |1 | Efficiency  | 4  |
125 | |2 | New method   | 2  | 
126 | | Total | 2   | 6 | 
127 | 
128 | #### Adaptation <span id='adaptation'/>
129 | |    | Tasks  | #Papers | 
130 | | ------------- | ------------- | ------------- | 
131 | |1 | Speaker generation  | 1  |
132 | |2 | VC for postprocessing  | 1  | 
133 | |3 | Multimodal     | 1  |
134 | |4 | Low-quality data   | 1  |
135 | |5 | New structure   | 1 | 
136 | | Total | 5   | 5 | 
137 | 
138 | 
139 | #### Multimodal <span id='multimodal'/>
140 | |    | Tasks  | #Papers | 
141 | | ------------- | ------------- | ------------- | 
142 | |1 | Dubbing  | 4  |
143 | |2 | Speech-to-animation  | 1  | 
144 | | Total | 2   | 5 | 
145 | 
146 | 
147 | #### Multi-lingual/Cross-lingual TTS <span id='multilingual_crosslingual'/>
148 | |    | Tasks  | #Papers | 
149 | | ------------- | ------------- | ------------- | 
150 | |1 | Data augmentation  | 1  |
151 | |2 | Lifelong learning  | 1  | 
152 | |3 | Triplet loss     | 1  |
153 | |4 | Improved structure   | 1  |
154 | | Total | 4   | 4 | 
155 | 
156 | 
157 | #### Singing voice synthesis <span id='svs'/>
158 | |    | Tasks  | #Papers | 
159 | | ------------- | ------------- | ------------- | 
160 | |1 | End-to-end | 1  |
161 | |2 | Melody unsupervision  | 1  | 
162 | | Total | 2   | 2 | 
163 | 
164 | 
165 | 
166 | #### Speech editing <span id='speech_editing'/>
167 | |    | Tasks  | #Papers | 
168 | | ------------- | ------------- | ------------- | 
169 | |1 | Speech editing | 2  | 
170 | | Total | 1   | 2 | 
171 | 
172 | 
173 | 
174 | #### Others  <span id='others'/>
175 | |    | Tasks  | #Papers | 
176 | | ------------- | ------------- | ------------- | 
177 | |1 | MOS net | 1  |
178 | |2 | Phase reconstruction  | 1  | 
179 | | Total | 2   | 2 | 
180 | 
181 | 
182 | ****
183 | ## VC  <span id='vc'/>
184 | ### VC Sessions  <span id='vc_sessions'/>
185 | |          | Sessions        | #Sessions     | #Papers |
186 | | ------------- | ------------- | ------------- | ------------- |
187 | |1 | Conversion  | 2  | 12  |
188 | |2 | Representation  | 1  | 6  |
189 | |3 | Singing voice and others  | 1  | 6  |
190 | | Total | 3  | 4   | 24   |
191 | 
192 | 
193 | <div><img src='./pics/VC_topics.png' width=500 alt=''> </img></div> 
194 | 
195 | ### VC Tasks  <span id='vc_tasks'/>
196 | 
197 | |    | Tasks  | #Papers | 
198 | | ------------- | ------------- |------------- |
199 | |1 | VC  | 7  |
200 | |2 | One-shot/Representation  | 6  | 
201 | |3 | Singing VC    | 3  |
202 | |4 | Dysarthric speech  | 2  | 
203 | |5 | Noise robust  | 2 | 
204 | |6 | Pronunciation robust  | 1  | 
205 | |7 | Streaming VC | 1  | 
206 | |8 | Data augmentation | 1  |
207 | |9 | Tool | 1  | 
208 | | Total | 9   | 24 | 
209 | 
210 | <div><img src='./pics/VC_tasks.png' width=500 alt=''> </img></div> 


--------------------------------------------------------------------------------
/pics/TTS_tasks.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lmxue/ICASSP2022_TTS_VC_Summary/ef93606430817764d283476c60f4cbc53f284ef4/pics/TTS_tasks.png


--------------------------------------------------------------------------------
/pics/TTS_topics.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lmxue/ICASSP2022_TTS_VC_Summary/ef93606430817764d283476c60f4cbc53f284ef4/pics/TTS_topics.png


--------------------------------------------------------------------------------
/pics/VC_tasks.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lmxue/ICASSP2022_TTS_VC_Summary/ef93606430817764d283476c60f4cbc53f284ef4/pics/VC_tasks.png


--------------------------------------------------------------------------------
/pics/VC_topics.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lmxue/ICASSP2022_TTS_VC_Summary/ef93606430817764d283476c60f4cbc53f284ef4/pics/VC_topics.png


--------------------------------------------------------------------------------