├── outputs
│   └── .placeholder
├── Windows_app_start.bat
├── demo_info
│   ├── ui.png
│   ├── Rogger_sample_aa.wav
│   ├── Rogger_sample_en.wav
│   └── Rogger_sample_ru.wav
├── targets
│   └── Rogger.wav
├── requirements.txt
├── texts.json
├── languages.json
├── appTerminal.py
├── LICENSE
├── .gitignore
├── app.py
├── app2.py
└── README.md

/outputs/.placeholder:
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
/Windows_app_start.bat:
--------------------------------------------------------------------------------
@echo off
cd /d "%~dp0"
call venv\Scripts\activate
python app.py
--------------------------------------------------------------------------------
/demo_info/ui.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/BoltzmannEntropy/xtts2-ui/HEAD/demo_info/ui.png
--------------------------------------------------------------------------------
/targets/Rogger.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/BoltzmannEntropy/xtts2-ui/HEAD/targets/Rogger.wav
--------------------------------------------------------------------------------
/demo_info/Rogger_sample_aa.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/BoltzmannEntropy/xtts2-ui/HEAD/demo_info/Rogger_sample_aa.wav
--------------------------------------------------------------------------------
/demo_info/Rogger_sample_en.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/BoltzmannEntropy/xtts2-ui/HEAD/demo_info/Rogger_sample_en.wav
--------------------------------------------------------------------------------
/demo_info/Rogger_sample_ru.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/BoltzmannEntropy/xtts2-ui/HEAD/demo_info/Rogger_sample_ru.wav
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
gradio==4.7.1
TTS==0.21.*
soundfile==0.12.1
transformers==4.33.0
streamlit
audio-recorder-streamlit
--------------------------------------------------------------------------------
/texts.json:
--------------------------------------------------------------------------------
[
  "You can add sample texts here, so that you can generate many voices and prepare a dataset for training purposes.",
  "Hello, this is a test.",
  "This is another test.",
  "This is the final test."
]
--------------------------------------------------------------------------------
/languages.json:
--------------------------------------------------------------------------------
{
  "Arabic": "ar",
  "Chinese": "zh-cn",
  "Czech": "cs",
  "Dutch": "nl",
  "English": "en",
  "French": "fr",
  "German": "de",
  "Hungarian": "hu",
  "Italian": "it",
  "Japanese": "ja",
  "Korean": "ko",
  "Polish": "pl",
  "Portuguese": "pt",
  "Russian": "ru",
  "Spanish": "es",
  "Turkish": "tr"
}
--------------------------------------------------------------------------------
/appTerminal.py:
--------------------------------------------------------------------------------
import json
from pathlib import Path

# Importing app loads the XTTS model as a side effect
from app import gen_voice, update_speakers

def generate_voices_from_file(file_path):
    # Load the texts from the JSON file
    with open(file_path, 'r') as f:
        texts = json.load(f)

    # Get the list of speakers from the targets/ folder
    speakers = update_speakers()

    # For each text, generate a voice for each speaker
    for text in texts:
        for speaker in speakers:
            gen_voice(text, speaker, speed=0.8, english="English")

if __name__ == "__main__":
    generate_voices_from_file(Path('texts.json'))
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2023 Shlomo Kashani

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
outputs/*
!outputs/.placeholder
targets/*
!targets/Rogger.wav

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
.pybuilder/
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
# For a library or package, you might want to ignore these files since the code is
# intended to run in multiple environments; otherwise, check them in:
# .python-version

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock

# poetry
# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
# This is especially recommended for binary packages to ensure reproducibility, and is more
# commonly ignored for libraries.
# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
#poetry.lock

# pdm
# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
#pdm.lock
# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
# in version control.
# https://pdm.fming.dev/#use-with-ide
.pdm.toml

# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# pytype static type analyzer
.pytype/

# Cython debug symbols
cython_debug/

# PyCharm
# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/
--------------------------------------------------------------------------------
/app.py:
--------------------------------------------------------------------------------
import gradio as gr
import torch
import platform
import random
import json
from pathlib import Path
from TTS.api import TTS
import uuid
import html
import soundfile as sf

def is_mac_os():
    return platform.system() == 'Darwin'

params = {
    "activate": True,
    "autoplay": True,
    "show_text": False,
    "remove_trailing_dots": False,
    "voice": "Rogger.wav",
    "language": "English",
    "model_name": "tts_models/multilingual/multi-dataset/xtts_v2",
}

# SUPPORTED_FORMATS = ['wav', 'mp3', 'flac', 'ogg']
SAMPLE_RATE = 16000

# Set the default speaker name
default_speaker_name = "Rogger"

# Use the GPU when available; fall back to the CPU on macOS or CUDA-less machines
if is_mac_os() or not torch.cuda.is_available():
    device = torch.device('cpu')
else:
    device = torch.device('cuda:0')

# Load model
tts = TTS(model_name=params["model_name"]).to(device)

# Make sure the output folder exists before generating audio
Path('outputs').mkdir(exist_ok=True)

# # Random sentence (assuming harvard_sentences.txt is in the correct path)
# def random_sentence():
#     with open(Path("harvard_sentences.txt")) as f:
#         return random.choice(list(f))

# Voice generation function
def gen_voice(string, spk, speed, english):
    string = html.unescape(string)
    # Short UUID suffix so repeated runs don't overwrite each other
    short_uuid = str(uuid.uuid4())[:8]
    output_file = Path(f"outputs/{spk}-{short_uuid}.wav")
    this_dir = str(Path(__file__).parent.resolve())
    tts.tts_to_file(
        text=string,
        speed=speed,
        file_path=output_file,
        speaker_wav=[f"{this_dir}/targets/{spk}.wav"],
        language=languages[english]
    )
    return output_file

def update_speakers():
    speakers = {p.stem: str(p) for p in Path('targets').glob("*.wav")}
    return list(speakers.keys())

def update_dropdown(_=None, selected_speaker=default_speaker_name):
    return gr.Dropdown(choices=update_speakers(), value=selected_speaker, label="Select Speaker")

def handle_recorded_audio(audio_data, filename="user_entered"):
    # Keep the current speaker list if nothing was recorded or uploaded
    if not audio_data:
        return update_dropdown()
    if not filename:
        filename = "user_entered"

    sample_rate, audio_content = audio_data
    save_path = f"targets/{filename}.wav"

    # Write the audio content to a WAV file
    sf.write(save_path, audio_content, sample_rate)

    # Return a new Dropdown with the updated speakers list, selecting the new recording
    return update_dropdown(selected_speaker=filename)


# Load the language data
with open(Path('languages.json'), encoding='utf8') as f:
    languages = json.load(f)

# Gradio Blocks interface
with gr.Blocks() as app:

    gr.Markdown("### TTS-based Voice Cloning.")

    with gr.Row():
        with gr.Column():
            text_input = gr.Textbox(lines=2, label="Speechify this Text", value="Even in the darkest nights, a single spark of hope can ignite the fire of determination within us, guiding us towards a future we dare to dream.")
            speed_slider = gr.Slider(label='Speed', minimum=0.1, maximum=1.99, value=0.8, step=0.01)
            language_dropdown = gr.Dropdown(list(languages.keys()), label="Language/Accent", value="English")

    gr.Markdown("### Speaker Selection and Voice Cloning")

    with gr.Row():
        with gr.Column():
            speaker_dropdown = update_dropdown()
            refresh_button = gr.Button("Refresh Speakers")
        with gr.Column():
            filename_input = gr.Textbox(label="Add new Speaker", placeholder="Enter a name for your recording/upload to save as")
            save_button = gr.Button("Save Below Recording")

    refresh_button.click(fn=update_dropdown, inputs=[], outputs=speaker_dropdown)

    with gr.Row():
        record_button = gr.Audio(label="Record Your Voice")

    # All three events pass (audio, filename), matching handle_recorded_audio's signature
    save_button.click(fn=handle_recorded_audio, inputs=[record_button, filename_input], outputs=speaker_dropdown)
    record_button.stop_recording(fn=handle_recorded_audio, inputs=[record_button, filename_input], outputs=speaker_dropdown)
    record_button.upload(fn=handle_recorded_audio, inputs=[record_button, filename_input], outputs=speaker_dropdown)

    submit_button = gr.Button("Convert")

    with gr.Column():
        audio_output = gr.Audio()

    submit_button.click(
        fn=gen_voice,
        inputs=[text_input, speaker_dropdown, speed_slider, language_dropdown],
        outputs=audio_output
    )

if __name__ == "__main__":
    app.launch()
--------------------------------------------------------------------------------
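A quick way to sanity-check `gen_voice` without launching the Gradio UI is to call it directly. This is a minimal sketch, not part of the repo: `smoke_test.py` is a hypothetical file name, and it assumes at least one `.wav` exists under `targets/`.

```python
# smoke_test.py (hypothetical): exercise app.py's gen_voice outside the UI.
# Importing app loads the XTTS model, so the first run takes a while.
from app import gen_voice, update_speakers

speakers = update_speakers()  # speaker names are the file stems in targets/
assert speakers, "add at least one .wav to targets/ first"

# english takes a key from languages.json (e.g. "English"), not an ISO code
out = gen_voice("Quick smoke test.", speakers[0], speed=0.8, english="English")
print(f"Wrote {out}")
```
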
/app2.py:
--------------------------------------------------------------------------------
import json
import os
import random
import uuid
import html
from pathlib import Path

import torch
import librosa
import streamlit as st
from audio_recorder_streamlit import audio_recorder
from scipy.io.wavfile import write
from TTS.api import TTS

params = {
    "activate": True,
    "autoplay": True,
    "show_text": False,
    "remove_trailing_dots": False,
    "voice": "Rogger.wav",
    "language": "English",
    "model_name": "tts_models/multilingual/multi-dataset/xtts_v2",
    # "model_path": "./models/",
    # "config_path": "./models/config.json"
}

SUPPORTED_FORMATS = ['wav']
SAMPLE_RATE = 16000

os.makedirs(os.path.join(".", "targets"), exist_ok=True)
os.makedirs(os.path.join(".", "outputs"), exist_ok=True)

speakers = {p.stem: str(p) for p in Path('targets').glob("*.wav")}

device = "cuda:0" if torch.cuda.is_available() else "cpu"
print(f"Device: {device}")

# Cache the model so Streamlit's script reruns don't reload it on every interaction
@st.cache_resource
def load_model():
    print("[XTTS] Loading XTTS...")
    tts = TTS(model_name=params["model_name"]).to(device)
    # model_path=params["model_path"],
    # config_path=params["config_path"]
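    # (Optional) Loading from a local checkout instead of the model hub is the
    # intent of the commented model_path/config_path entries above. Assumption:
    # they would be passed as TTS(model_path=..., config_path=...); verify this
    # against your installed TTS version before relying on it.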
    return tts

tts = load_model()

this_dir = str(Path(__file__).parent.resolve())

def get_available_voices():
    return sorted([voice.name for voice in Path(f"{this_dir}/targets").glob("*.wav")])

# Unused helper; requires a harvard_sentences.txt file next to this script
def random_sentence():
    with open(Path("harvard_sentences.txt")) as f:
        return random.choice(list(f))

st.title("TTS-based Voice Cloning in 16 Languages.")
# st.image('logo.png', width=150)

st.header('Text to speech generation')

with open(Path(f"{this_dir}/languages.json"), encoding='utf8') as f:
    languages = json.load(f)

with st.sidebar:
    voice_list = get_available_voices()
    print(voice_list)
    st.title("Text to Voice")
    english = st.radio(
        label="Choose your language", options=languages, index=0, horizontal=True)

    default_speaker_name = "Rogger"
    speaker_name = st.selectbox('Select target speaker:', options=[None] + list(speakers.keys()),
                                index=list(speakers.keys()).index(default_speaker_name) + 1 if default_speaker_name in speakers else 0)

    wav_tgt = None
    if speaker_name is not None:
        wav_tgt, _ = librosa.load(speakers[speaker_name], sr=22000)
        wav_tgt, _ = librosa.effects.trim(wav_tgt, top_db=20)

        st.write('Selected Target:')
        st.audio(wav_tgt, sample_rate=22000)

text = st.text_area('Enter text to convert to audio format',
                    value="Hello")
speed = st.slider('Speed', 0.1, 1.99, 0.8, 0.01)

st.caption("Optional Microphone Recording. Download and rename your recording before using.")
audio_bytes = audio_recorder()
if audio_bytes:
    st.audio(audio_bytes, format="audio/wav")

def gen_voice(string, spk):
    string = html.unescape(string)
    # Generate a short UUID so repeated runs don't overwrite each other
    short_uuid = str(uuid.uuid4())[:8]
    output_file = Path(f"outputs/{spk}-{short_uuid}.wav")
    tts.tts_to_file(
        text=string,
        speed=speed,
        file_path=output_file,
        speaker_wav=[f"{this_dir}/targets/{spk}.wav"],
        language=languages[english]
    )

    return output_file

# Upload the WAV file
st.caption("For the audio file, use the name of your Target, for instance ABIDA.wav")
new_tgt = st.file_uploader('Upload a new TARGET audio WAV file:', type=SUPPORTED_FORMATS, accept_multiple_files=False)
if new_tgt is not None:
    # Get the original file name
    file_name = new_tgt.name

    # Save the file to the file system
    file_path = os.path.join("./targets/", file_name)
    st.info(f"Original file name: {file_name}")

    # Extract the file name without the extension
    file_name_without_extension = os.path.splitext(file_name)[0]

    # Use librosa to load and process the WAV file
    wav_tgt, _ = librosa.load(new_tgt, sr=22000)
    wav_tgt, _ = librosa.effects.trim(wav_tgt, top_db=20)

    # Use scipy.io.wavfile.write to save the processed WAV file
    write('./targets/' + file_name_without_extension + '.wav', 22000, wav_tgt)

    st.success(f"New target saved successfully to {file_path}")

if st.button('Convert'):
    if speaker_name is None:
        st.warning('Please select a target speaker first.')
    else:
        # Run TTS
        st.info('Converting ... please wait ...')
        output_file = gen_voice(text, speaker_name)

        st.write(f'Target voice: {speaker_name}')

        audio_file = open(output_file, 'rb')
        audio_bytes = audio_file.read()
        st.audio(audio_bytes, format='audio/wav')
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# XTTS-2-UI: A User Interface for XTTS-2 Text-Based Voice Cloning

This repository contains the essential code for cloning any voice using just text and a 10-second audio sample of the target voice. XTTS-2-UI is simple to set up and use. [Example Results 🔊](#sample-audio-examples)

Works in [16 languages](#language-support) and has built-in voice recording/uploading.
Note: Don't expect ElevenLabs-level quality; it is not there yet.

## Model
The model used is `tts_models/multilingual/multi-dataset/xtts_v2`. For more details, refer to [Hugging Face - XTTS-v2](https://huggingface.co/coqui/XTTS-v2) and its specific version [XTTS-v2 Version 2.0.2](https://huggingface.co/coqui/XTTS-v2/tree/v2.0.2).
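Under the hood, both UIs drive the same Coqui TTS call. A minimal sketch, assuming the repo's `targets/` and `outputs/` folders exist:

```python
# Minimal XTTS-v2 cloning call, as used by app.py/app2.py
# (the first run downloads the model and asks you to accept the license).
from TTS.api import TTS

tts = TTS(model_name="tts_models/multilingual/multi-dataset/xtts_v2")
tts.tts_to_file(
    text="Hello from a cloned voice.",
    speaker_wav=["targets/Rogger.wav"],  # ~10 s reference clip of the target voice
    language="en",                       # ISO code from languages.json
    file_path="outputs/hello.wav",
)
```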

![XTTS-2-UI interface](demo_info/ui.png)

## Table of Contents

- [XTTS-2-UI: A User Interface for XTTS-2 Text-Based Voice Cloning](#xtts-2-ui-a-user-interface-for-xtts-2-text-based-voice-cloning)
  - [Model](#model)
  - [Table of Contents](#table-of-contents)
  - [Setup](#setup)
  - [Inference](#inference)
  - [Target Voices Dataset](#target-voices-dataset)
  - [Sample Audio Examples](#sample-audio-examples)
  - [Language Support](#language-support)
  - [Notes](#notes)
  - [Credits](#credits)

## Setup

To set up this project, follow these steps in a terminal:

1. **Clone the Repository**

   - Clone the repository to your local machine.
     ```bash
     git clone https://github.com/pbanuru/xtts2-ui.git
     cd xtts2-ui
     ```

2. **Create a Virtual Environment**

   - Run the following command to create a Python virtual environment:
     ```bash
     python -m venv venv
     ```
   - Activate the virtual environment:
     - Windows:
       ```bash
       # cmd prompt
       venv\Scripts\activate
       ```
       or
       ```bash
       # git bash
       source venv/Scripts/activate
       ```
     - Linux/Mac:
       ```bash
       source venv/bin/activate
       ```

3. **Install PyTorch**

   - If you have an Nvidia CUDA-enabled GPU, choose the appropriate PyTorch installation command:
     - Before installing PyTorch, check your CUDA version by running:
       ```bash
       nvcc --version
       ```
     - For CUDA 12.1:
       ```bash
       pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
       ```
     - For CUDA 11.8:
       ```bash
       pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
       ```
   - If you don't have a CUDA-enabled GPU, follow the instructions on the [PyTorch website](https://pytorch.org/get-started/locally/) to install the appropriate version of PyTorch for your system.

4. **Install Other Required Packages**

   - Install the direct dependencies:
     ```bash
     pip install -r requirements.txt
     ```
   - Upgrade the TTS package to the latest version:
     ```bash
     pip install --upgrade TTS
     ```

After completing these steps, your setup should be complete and you can start using the project.

Models will be downloaded automatically upon first use.

Download paths:
- macOS: `/Users/USR/Library/Application Support/tts/tts_models--multilingual--multi-dataset--xtts_v2`
- Windows: `C:\Users\YOUR-USER-ACCOUNT\AppData\Local\tts\tts_models--multilingual--multi-dataset--xtts_v2`
- Linux: `/home/${USER}/.local/share/tts/tts_models--multilingual--multi-dataset--xtts_v2`

## Inference

To run the application:

```
python app.py
OR
streamlit run app2.py
```

Alternatively, you can run it from the terminal itself: put sample input texts in `texts.json` and generate multiple audios with multiple speakers (you may need to adjust `appTerminal.py`):
```
python appTerminal.py
```

On initial use, you will need to agree to the terms:

```
[XTTS] Loading XTTS...
 > tts_models/multilingual/multi-dataset/xtts_v2 has been updated, clearing model cache...
 > You must agree to the terms of service to use this model.
 | > Please see the terms of service at https://coqui.ai/cpml.txt
 | > "I have read, understood and agreed to the Terms and Conditions." - [y/n]
 | | >
```
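For unattended runs (for example, batch generation with `appTerminal.py` on a fresh machine), recent Coqui TTS releases check the `COQUI_TOS_AGREED` environment variable and skip the interactive prompt. Treat this as an assumption and verify it against your installed TTS version; a sketch:

```python
# Hypothetical non-interactive run: accept the CPML terms via an environment
# variable *before* the model is downloaded/loaded on import.
import os
os.environ["COQUI_TOS_AGREED"] = "1"  # assumption: supported by your TTS version

from appTerminal import generate_voices_from_file  # triggers the model load

generate_voices_from_file("texts.json")
```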
If your model is re-downloading on each run, please consult [Issue 4723 on GitHub](https://github.com/oobabooga/text-generation-webui/issues/4723#issuecomment-1826120220).

## Target Voices Dataset
The dataset consists of a single folder named `targets`, pre-populated with several voices for testing purposes.

To add more voices (if you don't want to go through the GUI), create a 24 kHz WAV file of approximately 10 seconds and place it under the `targets` folder (see the preparation sketch at the end of this README).
You can use yt-dlp to download a voice from YouTube for cloning:
```
yt-dlp -x --audio-format wav "https://www.youtube.com/watch?"
```

## Sample Audio Examples

| Language | Audio Sample Link |
|----------|-------------------|
| English  | [▶️](demo_info/Rogger_sample_en.wav) |
| Russian  | [▶️](demo_info/Rogger_sample_ru.wav) |
| Arabic   | [▶️](demo_info/Rogger_sample_aa.wav) |

## Language Support
Arabic, Chinese, Czech, Dutch, English, French, German, Hungarian, Italian, Japanese [(see notes)](#notes), Korean, Polish, Portuguese, Russian, Spanish, Turkish

## Notes
If you would like to select **Japanese** as the target language, you must install a dictionary first.
```bash
# Lite version
pip install fugashi[unidic-lite]
```
or, for more serious processing:
```bash
# Full version
pip install fugashi[unidic]
python -m unidic download
```
More details [here](https://github.com/polm/fugashi#installing-a-dictionary).

## Credits
1. Heavily based on https://github.com/kanttouchthis/text_generation_webui_xtts/
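As referenced in [Target Voices Dataset](#target-voices-dataset), here is a minimal sketch for preparing a new target voice outside the GUI. It is a hypothetical helper, not part of the repo; it mirrors the load/trim steps `app2.py` applies to uploads, but resamples to the 24 kHz recommended above:

```python
# prepare_target.py (hypothetical): turn any speech clip into a targets/ entry.
import librosa
import soundfile as sf

SRC = "my_recording.wav"   # any ~10 s speech clip; adjust to your file
NAME = "MyVoice"           # the file stem becomes the speaker name in the UI

wav, sr = librosa.load(SRC, sr=24000)          # resample to 24 kHz
wav, _ = librosa.effects.trim(wav, top_db=20)  # strip leading/trailing silence
sf.write(f"targets/{NAME}.wav", wav, 24000)
```
--------------------------------------------------------------------------------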