├── wavs
    ├── 5K_jeong_5.wav
    ├── 5K_kang_5.wav
    ├── 5K_yoon_5.wav
    ├── D12-250_4.wav
    ├── D12-250_5.wav
    ├── D12-500_4.wav
    ├── D12-500_5.wav
    ├── D13-250_4.wav
    ├── D13-250_5.wav
    ├── D13-500_4.wav
    ├── D13-500_5.wav
    ├── D14-250_4.wav
    ├── D14-250_5.wav
    ├── D14-500_4.wav
    ├── D14-500_5.wav
    ├── judy-250_2.wav
    ├── judy-250_3.wav
    ├── judy-250_5.wav
    ├── judy-500_2.wav
    ├── judy-500_3.wav
    ├── judy-500_5.wav
    ├── judy_5K_3.wav
    ├── kang_250_4.wav
    ├── kang_250_5.wav
    ├── kang_500_4.wav
    ├── kang_500_5.wav
    ├── mary-250_2.wav
    ├── mary-250_3.wav
    ├── mary-250_5.wav
    ├── mary-500_2.wav
    ├── mary-500_3.wav
    ├── mary-500_5.wav
    ├── mary_5K_3.wav
    ├── yoon_250_4.wav
    ├── yoon_250_5.wav
    ├── yoon_500_4.wav
    ├── yoon_500_5.wav
    ├── D12-500-5k_4.wav
    ├── D12-500-5k_5.wav
    ├── D13-500-5k_4.wav
    ├── D13-500-5k_5.wav
    ├── D14-500-5k_4.wav
    ├── D14-500-5k_5.wav
    ├── jeong_250_4.wav
    ├── jeong_250_5.wav
    ├── jeong_500_4.wav
    ├── jeong_500_5.wav
    ├── miller-250_2.wav
    ├── miller-250_3.wav
    ├── miller-250_5.wav
    ├── miller-500_2.wav
    ├── miller-500_3.wav
    ├── miller-500_5.wav
    └── miller_5K_3.wav
├── styles.css
└── index.html

/wavs/5K_jeong_5.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DSAIL-SKKU/Transfer-Learning/HEAD/wavs/5K_jeong_5.wav
--------------------------------------------------------------------------------
/wavs/5K_kang_5.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DSAIL-SKKU/Transfer-Learning/HEAD/wavs/5K_kang_5.wav
--------------------------------------------------------------------------------
/wavs/5K_yoon_5.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DSAIL-SKKU/Transfer-Learning/HEAD/wavs/5K_yoon_5.wav
--------------------------------------------------------------------------------
/wavs/D12-250_4.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DSAIL-SKKU/Transfer-Learning/HEAD/wavs/D12-250_4.wav
--------------------------------------------------------------------------------
/wavs/D12-250_5.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DSAIL-SKKU/Transfer-Learning/HEAD/wavs/D12-250_5.wav
--------------------------------------------------------------------------------
/wavs/D12-500_4.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DSAIL-SKKU/Transfer-Learning/HEAD/wavs/D12-500_4.wav
--------------------------------------------------------------------------------
/wavs/D12-500_5.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DSAIL-SKKU/Transfer-Learning/HEAD/wavs/D12-500_5.wav
--------------------------------------------------------------------------------
/wavs/D13-250_4.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DSAIL-SKKU/Transfer-Learning/HEAD/wavs/D13-250_4.wav
--------------------------------------------------------------------------------
/wavs/D13-250_5.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DSAIL-SKKU/Transfer-Learning/HEAD/wavs/D13-250_5.wav
--------------------------------------------------------------------------------
/wavs/D13-500_4.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DSAIL-SKKU/Transfer-Learning/HEAD/wavs/D13-500_4.wav
--------------------------------------------------------------------------------
/wavs/D13-500_5.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DSAIL-SKKU/Transfer-Learning/HEAD/wavs/D13-500_5.wav
--------------------------------------------------------------------------------
/wavs/D14-250_4.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DSAIL-SKKU/Transfer-Learning/HEAD/wavs/D14-250_4.wav
--------------------------------------------------------------------------------
/wavs/D14-250_5.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DSAIL-SKKU/Transfer-Learning/HEAD/wavs/D14-250_5.wav
--------------------------------------------------------------------------------
/wavs/D14-500_4.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DSAIL-SKKU/Transfer-Learning/HEAD/wavs/D14-500_4.wav
--------------------------------------------------------------------------------
/wavs/D14-500_5.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DSAIL-SKKU/Transfer-Learning/HEAD/wavs/D14-500_5.wav
--------------------------------------------------------------------------------
/wavs/judy-250_2.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DSAIL-SKKU/Transfer-Learning/HEAD/wavs/judy-250_2.wav
--------------------------------------------------------------------------------
/wavs/judy-250_3.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DSAIL-SKKU/Transfer-Learning/HEAD/wavs/judy-250_3.wav
--------------------------------------------------------------------------------
/wavs/judy-250_5.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DSAIL-SKKU/Transfer-Learning/HEAD/wavs/judy-250_5.wav
--------------------------------------------------------------------------------
/wavs/judy-500_2.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DSAIL-SKKU/Transfer-Learning/HEAD/wavs/judy-500_2.wav
--------------------------------------------------------------------------------
/wavs/judy-500_3.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DSAIL-SKKU/Transfer-Learning/HEAD/wavs/judy-500_3.wav
--------------------------------------------------------------------------------
/wavs/judy-500_5.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DSAIL-SKKU/Transfer-Learning/HEAD/wavs/judy-500_5.wav
--------------------------------------------------------------------------------
/wavs/judy_5K_3.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DSAIL-SKKU/Transfer-Learning/HEAD/wavs/judy_5K_3.wav
--------------------------------------------------------------------------------
/wavs/kang_250_4.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DSAIL-SKKU/Transfer-Learning/HEAD/wavs/kang_250_4.wav
--------------------------------------------------------------------------------
/wavs/kang_250_5.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DSAIL-SKKU/Transfer-Learning/HEAD/wavs/kang_250_5.wav
--------------------------------------------------------------------------------
/wavs/kang_500_4.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DSAIL-SKKU/Transfer-Learning/HEAD/wavs/kang_500_4.wav
--------------------------------------------------------------------------------
/wavs/kang_500_5.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DSAIL-SKKU/Transfer-Learning/HEAD/wavs/kang_500_5.wav
--------------------------------------------------------------------------------
/wavs/mary-250_2.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DSAIL-SKKU/Transfer-Learning/HEAD/wavs/mary-250_2.wav
--------------------------------------------------------------------------------
/wavs/mary-250_3.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DSAIL-SKKU/Transfer-Learning/HEAD/wavs/mary-250_3.wav
--------------------------------------------------------------------------------
/wavs/mary-250_5.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DSAIL-SKKU/Transfer-Learning/HEAD/wavs/mary-250_5.wav
--------------------------------------------------------------------------------
/wavs/mary-500_2.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DSAIL-SKKU/Transfer-Learning/HEAD/wavs/mary-500_2.wav
--------------------------------------------------------------------------------
/wavs/mary-500_3.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DSAIL-SKKU/Transfer-Learning/HEAD/wavs/mary-500_3.wav
--------------------------------------------------------------------------------
/wavs/mary-500_5.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DSAIL-SKKU/Transfer-Learning/HEAD/wavs/mary-500_5.wav
--------------------------------------------------------------------------------
/wavs/mary_5K_3.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DSAIL-SKKU/Transfer-Learning/HEAD/wavs/mary_5K_3.wav
--------------------------------------------------------------------------------
/wavs/yoon_250_4.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DSAIL-SKKU/Transfer-Learning/HEAD/wavs/yoon_250_4.wav
--------------------------------------------------------------------------------
/wavs/yoon_250_5.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DSAIL-SKKU/Transfer-Learning/HEAD/wavs/yoon_250_5.wav
--------------------------------------------------------------------------------
/wavs/yoon_500_4.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DSAIL-SKKU/Transfer-Learning/HEAD/wavs/yoon_500_4.wav
--------------------------------------------------------------------------------
/wavs/yoon_500_5.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DSAIL-SKKU/Transfer-Learning/HEAD/wavs/yoon_500_5.wav
--------------------------------------------------------------------------------
/wavs/D12-500-5k_4.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DSAIL-SKKU/Transfer-Learning/HEAD/wavs/D12-500-5k_4.wav
--------------------------------------------------------------------------------
/wavs/D12-500-5k_5.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DSAIL-SKKU/Transfer-Learning/HEAD/wavs/D12-500-5k_5.wav
--------------------------------------------------------------------------------
/wavs/D13-500-5k_4.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DSAIL-SKKU/Transfer-Learning/HEAD/wavs/D13-500-5k_4.wav
--------------------------------------------------------------------------------
/wavs/D13-500-5k_5.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DSAIL-SKKU/Transfer-Learning/HEAD/wavs/D13-500-5k_5.wav
--------------------------------------------------------------------------------
/wavs/D14-500-5k_4.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DSAIL-SKKU/Transfer-Learning/HEAD/wavs/D14-500-5k_4.wav
--------------------------------------------------------------------------------
/wavs/D14-500-5k_5.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DSAIL-SKKU/Transfer-Learning/HEAD/wavs/D14-500-5k_5.wav
--------------------------------------------------------------------------------
/wavs/jeong_250_4.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DSAIL-SKKU/Transfer-Learning/HEAD/wavs/jeong_250_4.wav
--------------------------------------------------------------------------------
/wavs/jeong_250_5.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DSAIL-SKKU/Transfer-Learning/HEAD/wavs/jeong_250_5.wav
--------------------------------------------------------------------------------
/wavs/jeong_500_4.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DSAIL-SKKU/Transfer-Learning/HEAD/wavs/jeong_500_4.wav
--------------------------------------------------------------------------------
/wavs/jeong_500_5.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DSAIL-SKKU/Transfer-Learning/HEAD/wavs/jeong_500_5.wav
--------------------------------------------------------------------------------
/wavs/miller-250_2.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DSAIL-SKKU/Transfer-Learning/HEAD/wavs/miller-250_2.wav
--------------------------------------------------------------------------------
/wavs/miller-250_3.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DSAIL-SKKU/Transfer-Learning/HEAD/wavs/miller-250_3.wav
--------------------------------------------------------------------------------
/wavs/miller-250_5.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DSAIL-SKKU/Transfer-Learning/HEAD/wavs/miller-250_5.wav
--------------------------------------------------------------------------------
/wavs/miller-500_2.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DSAIL-SKKU/Transfer-Learning/HEAD/wavs/miller-500_2.wav
--------------------------------------------------------------------------------
/wavs/miller-500_3.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DSAIL-SKKU/Transfer-Learning/HEAD/wavs/miller-500_3.wav
--------------------------------------------------------------------------------
/wavs/miller-500_5.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DSAIL-SKKU/Transfer-Learning/HEAD/wavs/miller-500_5.wav
--------------------------------------------------------------------------------
/wavs/miller_5K_3.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DSAIL-SKKU/Transfer-Learning/HEAD/wavs/miller_5K_3.wav
--------------------------------------------------------------------------------
/styles.css:
--------------------------------------------------------------------------------
.heading {
  margin-top: 40px;
  font-weight: 900;
}
span {
  font-weight: 900;
}
.audio-list {
  display: flex;
  /* justify-content: space-between; */
}
.audio-element {
  margin-right: 10px;
}
h3 {
  margin-top: 40px;
}
.footer {
  margin-top: 40px;
}
--------------------------------------------------------------------------------
/index.html:
--------------------------------------------------------------------------------
Toward Natural and Intelligible Speech Synthesis: An Empirical Study on Transfer Learning

Toward Natural and Intelligible Speech Synthesis: An Empirical Study on Transfer Learning

Authors: Jeewoo Yoon, Seong Choi, Taihu Li, and Jinyoung Han*

Abstract: Transfer learning with well-maintained, pre-trained data is known to be useful for synthesizing natural and intelligible speech from a small amount of data. However, little attention has been paid to answering the following research questions with empirically grounded evidence: (1) "How much pre-trained (source) speech data (e.g., 10K utterances or 10 hours) used in transfer learning is enough for generating natural and intelligible speech?" and (2) "For generating natural and intelligible speech, how much (target) speech data should at least be provided?" Both questions are essential for the quality of speech synthesis. To answer them, this paper conducts extensive experiments on speech synthesis with multiple source and target datasets that differ in length, speaker, and language. We show that intelligible and natural speech can be synthesized with only 500 utterances of target data using transfer learning. Our work also reveals that at least 5,000 utterances of pre-trained source data are required to synthesize decent speech.

Experiment 1: Using different target datasets with different lengths

The model was pre-trained with 10K utterances and fine-tuned with 250 and 500 utterances
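
The sample labels listed below follow a regular pattern (language prefix, speaker index, number of fine-tuning utterances). As a minimal sketch, the grid of Experiment 1 conditions can be enumerated as follows. The reading of the prefixes E, K, and C as English, Korean, and Chinese is an assumption inferred from the sample sentences, and `condition_labels` is a hypothetical helper that is not part of the repository.

```python
from itertools import product

# Assumption: E, K, C stand for English, Korean, and Chinese,
# inferred from the language of the sample sentences on this page.
LANGUAGES = ["E", "K", "C"]
SPEAKERS = [1, 2, 3]        # SPK1..SPK3 for each language
TARGET_SIZES = [250, 500]   # number of fine-tuning (target) utterances


def condition_labels(languages=LANGUAGES, speakers=SPEAKERS, sizes=TARGET_SIZES):
    """Return labels such as 'E-SPK1-250', grouped by language, then size, then speaker."""
    return [
        f"{lang}-SPK{spk}-{size}"
        for lang, size, spk in product(languages, sizes, speakers)
    ]


labels = condition_labels()
print(len(labels))
print(labels[:6])
```

Iterating over language first, then target size, then speaker reproduces the order in which the page lists its samples: all three speakers at 250 utterances, followed by the same speakers at 500.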

"I was so drunk last night that I can not remember anything."

E-SPK1-250

E-SPK2-250

E-SPK3-250

E-SPK1-500

E-SPK2-500

E-SPK3-500

"Tell me if you need anything."

E-SPK1-250

E-SPK2-250

E-SPK3-250

E-SPK1-500

E-SPK2-500

E-SPK3-500

"이 옷 정말 예쁘지 않니?" (English: "Isn't this outfit really pretty?")

K-SPK1-250

K-SPK2-250

K-SPK3-250

K-SPK1-500

K-SPK2-500

K-SPK3-500

"요즘에는 예식장 구하기가 하늘의 별 따기야." (English: "These days, finding a wedding hall is like plucking a star from the sky.")

K-SPK1-250

K-SPK2-250

K-SPK3-250

K-SPK1-500

K-SPK2-500

K-SPK3-500

"气质这块拿捏得死死的" (English: "They have the poise part completely nailed.")

C-SPK1-250

C-SPK2-250

C-SPK3-250

C-SPK1-500

C-SPK2-500

C-SPK3-500

"他的写作水平不敢恭维" (English: "His writing ability is hard to compliment.")

C-SPK1-250

C-SPK2-250

C-SPK3-250

C-SPK1-500

C-SPK2-500

C-SPK3-500

Experiment 2: Using different source datasets with different lengths

The model was pre-trained with 5K utterances and fine-tuned with 500 utterances

"I was so drunk last night that I can not remember anything."

E-SPK1-500

E-SPK2-500

E-SPK3-500

"요즘에는 예식장 구하기가 하늘의 별 따기야." (English: "These days, finding a wedding hall is like plucking a star from the sky.")

297 |
298 |
299 |

K-SPK1-500

300 | 303 |
304 |
305 |

K-SPK2-500

306 | 309 |
310 |
311 |

K-SPK3-500

312 | 315 |
316 |
317 | 318 |

"气质这块拿捏得死死的"

C-SPK1-500

C-SPK2-500

C-SPK3-500
--------------------------------------------------------------------------------