# Indonesian TTS using Coqui TTS

Models are available in the [Releases](https://github.com/Wikidepia/indonesian-tts/releases/) tab.

**DO NOT USE FOR COMMERCIAL PURPOSES!**

## Model changelog

### v1.2 (Aug 12, 2022)

Fine-tuned from the v1.1 model on:

- 4 hours of audiobook data
- 2,000 samples from Azure TTS
- High-quality TTS data for Javanese & Sundanese

### v1.1 (Aug 6, 2022)

Fine-tuned from the LJSpeech model on:

- 4 hours of audiobook data
- 2,000 samples from Azure TTS

### v1.0 (Jun 23, 2022)

Trained from scratch on:

- 4 hours of audiobook data

## Example

`Ardi (Azure)`:

https://user-images.githubusercontent.com/72781956/183240414-b1127e83-6ddd-427c-b58d-386c377f15b4.mp4

`Gadis (Azure)`:

https://user-images.githubusercontent.com/72781956/183240420-a5d0d335-af4a-4563-a744-40f6795955c5.mp4

`Wibowo (Audiobook)`:

https://user-images.githubusercontent.com/72781956/184360026-c81ac336-c9f1-48ee-97fb-907d66b7f343.mp4

## How to use

You need [`g2p-id`](https://github.com/Wikidepia/g2p-id) to convert graphemes to phonemes; the model expects phonemized input.

Use the `tts` command from Coqui TTS to synthesize speech:

```
tts --text "saja səˈdanʔ ˈbərada di dʒaˈkarta." \
    --model_path checkpoint.pth \
    --config_path config.json \
    --speaker_idx wibowo \
    --out_path output.wav
```

You can list all speaker IDs with `--list_speaker_idxs`:

```
tts --model_path checkpoint.pth \
    --config_path config.json \
    --list_speaker_idxs
```

## Data

- [Indonesian Azure TTS](https://depia.wiki/files/azure-tts.tar)

## Citations

```bibtex
@misc{https://doi.org/10.48550/arxiv.2106.06103,
  doi       = {10.48550/ARXIV.2106.06103},
  url       = {https://arxiv.org/abs/2106.06103},
  author    = {Kim, Jaehyeon and Kong, Jungil and Son, Juhee},
  title     = {Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech},
  publisher = {arXiv},
  year      = {2021},
  copyright = {arXiv.org perpetual, non-exclusive license}
}
```

```bibtex
@inproceedings{kjartansson-etal-tts-sltu2018,
  title     = {{A Step-by-Step Process for Building TTS Voices Using Open Source Data and Framework for Bangla, Javanese, Khmer, Nepali, Sinhala, and Sundanese}},
  author    = {Keshan Sodimana and Knot Pipatsrisawat and Linne Ha and Martin Jansche and Oddur Kjartansson and Pasindu De Silva and Supheakmungkol Sarin},
  booktitle = {Proc. The 6th Intl. Workshop on Spoken Language Technologies for Under-Resourced Languages (SLTU)},
  year      = {2018},
  address   = {Gurugram, India},
  month     = aug,
  pages     = {66--70},
  url       = {http://dx.doi.org/10.21437/SLTU.2018-14}
}
```
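## Python usage

If you prefer calling the model from Python rather than the `tts` CLI shown above, here is a minimal sketch using Coqui TTS's `Synthesizer` class. The checkpoint, config, speaker, and phonemized text mirror the CLI example; the keyword argument names follow the 2022-era Coqui TTS API and may differ in other versions, and the input text is assumed to have already been converted to phonemes with `g2p-id` (its API is not shown here).

```python
# Minimal sketch: synthesize speech with the released checkpoint via Coqui TTS.
# Assumes a 2022-era Coqui TTS install; argument names may differ in newer versions.
from TTS.utils.synthesizer import Synthesizer

synthesizer = Synthesizer(
    tts_checkpoint="checkpoint.pth",   # model checkpoint from the Releases tab
    tts_config_path="config.json",     # matching config from the Releases tab
    use_cuda=False,
)

# Input must already be phonemized (e.g. with g2p-id).
text = "saja səˈdanʔ ˈbərada di dʒaˈkarta."

# Pick a speaker listed by `tts --list_speaker_idxs`, e.g. "wibowo".
wav = synthesizer.tts(text, speaker_name="wibowo")
synthesizer.save_wav(wav, "output.wav")
```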