# Indonesian TTS using Coqui TTS

Models are available in the [Releases](https://github.com/Wikidepia/indonesian-tts/releases/) tab.

**DO NOT USE FOR COMMERCIAL PURPOSES!**

## Model changelog

### v1.2 (Aug 12, 2022)

Fine-tuned from the v1.1 model on:

- 4 hours of audiobook data
- 2,000 samples from Azure TTS
- High-quality TTS data for Javanese & Sundanese

### v1.1 (Aug 6, 2022)

Fine-tuned from the LJSpeech model on:

- 4 hours of audiobook data
- 2,000 samples from Azure TTS

### v1.0 (Jun 23, 2022)

Trained from scratch on:

- 4 hours of audiobook data

## Example

`Ardi (Azure)`:

https://user-images.githubusercontent.com/72781956/183240414-b1127e83-6ddd-427c-b58d-386c377f15b4.mp4

`Gadis (Azure)`:

https://user-images.githubusercontent.com/72781956/183240420-a5d0d335-af4a-4563-a744-40f6795955c5.mp4

`Wibowo (Audiobook)`:

https://user-images.githubusercontent.com/72781956/184360026-c81ac336-c9f1-48ee-97fb-907d66b7f343.mp4

## How to use

You need [`g2p-id`](https://github.com/Wikidepia/g2p-id) to convert graphemes to phonemes; the model expects phonemized input.

Use the `tts` command from Coqui TTS to synthesize speech:

```
tts --text "saja səˈdanʔ ˈbərada di dʒaˈkarta." \
    --model_path checkpoint.pth \
    --config_path config.json \
    --speaker_idx wibowo \
    --out_path output.wav
```

You can list all speaker IDs with `--list_speaker_idxs`:

```
tts --model_path checkpoint.pth \
    --config_path config.json \
    --list_speaker_idxs
```

## Data

- [Indonesian Azure TTS](https://depia.wiki/files/azure-tts.tar)

## Citations

```bibtex
@misc{https://doi.org/10.48550/arxiv.2106.06103,
  doi       = {10.48550/ARXIV.2106.06103},
  url       = {https://arxiv.org/abs/2106.06103},
  author    = {Kim, Jaehyeon and Kong, Jungil and Son, Juhee},
  title     = {Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech},
  publisher = {arXiv},
  year      = {2021},
  copyright = {arXiv.org perpetual, non-exclusive license}
}
```

```bibtex
@inproceedings{kjartansson-etal-tts-sltu2018,
  title     = {{A Step-by-Step Process for Building TTS Voices Using Open Source Data and Framework for Bangla, Javanese, Khmer, Nepali, Sinhala, and Sundanese}},
  author    = {Keshan Sodimana and Knot Pipatsrisawat and Linne Ha and Martin Jansche and Oddur Kjartansson and Pasindu De Silva and Supheakmungkol Sarin},
  booktitle = {Proc. The 6th Intl. Workshop on Spoken Language Technologies for Under-Resourced Languages (SLTU)},
  year      = {2018},
  address   = {Gurugram, India},
  month     = aug,
  pages     = {66--70},
  url       = {http://dx.doi.org/10.21437/SLTU.2018-14}
}
```
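## Python usage

If you prefer calling the model from Python rather than the `tts` CLI shown above, here is a minimal sketch using Coqui TTS's `Synthesizer` class. The checkpoint, config, speaker, and phonemized text mirror the CLI example; the keyword argument names follow the 2022-era Coqui TTS API and may differ in other versions, and the input text is assumed to have already been converted to phonemes with `g2p-id` (its API is not shown here).

```python
# Minimal sketch: synthesize speech with the released checkpoint via Coqui TTS.
# Assumes a 2022-era Coqui TTS install; argument names may differ in newer versions.
from TTS.utils.synthesizer import Synthesizer

synthesizer = Synthesizer(
    tts_checkpoint="checkpoint.pth",   # model checkpoint from the Releases tab
    tts_config_path="config.json",     # matching config from the Releases tab
    use_cuda=False,
)

# Input must already be phonemized (e.g. with g2p-id).
text = "saja səˈdanʔ ˈbərada di dʒaˈkarta."

# Pick a speaker listed by `tts --list_speaker_idxs`, e.g. "wibowo".
wav = synthesizer.tts(text, speaker_name="wibowo")
synthesizer.save_wav(wav, "output.wav")
```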