├── LICENSE
├── README.md
├── ai_voicetalk_local.py
├── chat_params.json
├── completion_params.json
├── creation_params.json
├── female.json
├── male.json
├── requirements.txt
├── start.bat
└── voices
    ├── voice1.json
    ├── voice2.json
    ├── voice3.json
    ├── voice4.json
    └── voice5.json

/LICENSE:
--------------------------------------------------------------------------------
Coqui Public Model License 1.0.0

Copyright (c) 2023 Kolja Beigel

This license allows only non-commercial use of a machine learning model and its outputs.

For details please refer to [Coqui Public Model License 1.0.0](https://coqui.ai/cpml).

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Local AI Voice Chat

Talk with an AI in real time, completely locally on your PC, with a customizable AI personality and voice.

> **Hint:** *Anybody interested in state-of-the-art voice solutions please also have a look at [Linguflex](https://github.com/KoljaB/Linguflex). It lets you control your environment by speaking and is one of the most capable and sophisticated open-source assistants currently available.*

> **Note:** If you run into the error 'General synthesis error: isin() received an invalid combination of arguments', this is caused by a newer transformers library version introducing an incompatibility with Coqui TTS (see [here](https://github.com/KoljaB/RealtimeTTS/issues/85)). Please downgrade to an older transformers version: `pip install transformers==4.38.2`, or upgrade RealtimeTTS to the latest version: `pip install realtimetts==0.4.1`.

## About the Project

Integrates the powerful Zephyr 7B language model with real-time speech-to-text and text-to-speech libraries to create a fast and engaging voice-based local chatbot.

https://github.com/KoljaB/LocalAIVoiceChat/assets/7604638/cebacdad-8a57-4a03-bfd1-a469730dda51

> **Hint:** If you run into problems installing llama.cpp, please also have a look at my [LocalEmotionalAIVoiceChat project](https://github.com/KoljaB/LocalEmotionalAIVoiceChat). It includes emotion-aware realtime text-to-speech output and has multiple LLM provider options. You can also use it with different AI models.

## Tech Stack

- **[llama_cpp](https://github.com/ggerganov/llama.cpp)** with Zephyr 7B
  - library interface for llama-based language models
- **[RealtimeSTT](https://github.com/KoljaB/RealtimeSTT)** with faster_whisper
  - real-time speech-to-text transcription library
- **[RealtimeTTS](https://github.com/KoljaB/RealtimeTTS)** with Coqui XTTS
  - real-time text-to-speech synthesis library

## Notes

This software is in an experimental alpha state and does not provide production-ready stability. The current XTTS model used for synthesis still has glitches, and Zephyr - while really good for a 7B model - of course cannot compete with the answer quality of GPT-4, Claude or Perplexity.

Please take this as a first attempt to provide an early version of a local realtime chatbot.

### Updates

- Update to Coqui XTTS 2.0 model
- Bugfix to RealtimeTTS (download of Coqui model did not work properly)

### Prerequisites

You will need a GPU with around 8 GB VRAM to run this in real-time.
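If you are unsure whether your machine qualifies, a quick way to check is the sketch below (it assumes PyTorch is already installed; under ROCm builds of PyTorch the CUDA API reports the AMD GPU):

```python
# Quick GPU/VRAM sanity check - assumes PyTorch is installed.
import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"GPU: {props.name}, VRAM: {props.total_memory / 1024**3:.1f} GB")
else:
    print("No compatible GPU detected; real-time performance is unlikely.")
```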
#### For NVIDIA users

- **NVIDIA CUDA Toolkit 11.8**:
  - Access the [NVIDIA CUDA Toolkit Archive](https://developer.nvidia.com/cuda-11-8-0-download-archive).
  - Choose version 11.x and follow the instructions for downloading and installation.

- **NVIDIA cuDNN 8.7.0 for CUDA 11.x**:
  - Navigate to the [NVIDIA cuDNN Archive](https://developer.nvidia.com/rdp/cudnn-archive).
  - Locate and download "cuDNN v8.7.0 (November 28th, 2022), for CUDA 11.x".
  - Follow the provided installation guide.

#### For AMD users

- **Install ROCm v5.7.1**
  - Download the [ROCm SDK version 5.7.1](https://www.amd.com/en/developer/resources/rocm-hub/hip-sdk.html).
  - Follow the provided installation guide.

- **FFmpeg**:

  Install FFmpeg according to your operating system:

  - **Ubuntu/Debian**:
    ```shell
    sudo apt update && sudo apt install ffmpeg
    ```

  - **Arch Linux**:
    ```shell
    sudo pacman -S ffmpeg
    ```

  - **macOS (Homebrew)**:
    ```shell
    brew install ffmpeg
    ```

  - **Windows (Chocolatey)**:
    ```shell
    choco install ffmpeg
    ```

  - **Windows (Scoop)**:
    ```shell
    scoop install ffmpeg
    ```

### Installation Steps

1. Clone the repository or download the source code package.

2. Install llama.cpp
   - (For AMD users) Before the next step, set the environment variable `LLAMA_HIPBLAS` to `on`.

   - Official way:
     ```shell
     pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir --verbose
     ```

   - If the official installation does not work for you, please install [text-generation-webui](https://github.com/oobabooga/text-generation-webui), which provides some excellent wheels for a lot of platforms and environments.

3. Install the realtime libraries:
   ```shell
   pip install RealtimeSTT==0.1.7
   pip install RealtimeTTS==0.2.7
   ```

4. Download zephyr-7b-beta.Q5_K_M.gguf from [here](https://huggingface.co/TheBloke/zephyr-7B-beta-GGUF/tree/main).
   - Open creation_params.json and enter the file path to the downloaded model into `model_path`.
   - Adjust `n_gpu_layers` (0-35; raise it if you have more VRAM) and `n_threads` (number of CPU threads; I recommend not using all available cores and leaving some free for TTS). A quick load test is sketched after these steps.

5. If dependency conflicts occur, install specific versions of the conflicting libraries:
   ```shell
   pip install networkx==2.8.8
   pip install typing_extensions==4.8.0
   pip install fsspec==2023.6.0
   pip install imageio==2.31.6
   pip install numpy==1.24.3
   pip install requests==2.31.0
   ```
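To verify that llama.cpp can load the model from step 4 before starting the full application, you can run a minimal sketch like the following. The file path and parameter values here are only examples; use the values from your creation_params.json:

```python
# Minimal load test for the downloaded GGUF model (path/values are examples).
from llama_cpp import Llama

llm = Llama(
    model_path="zephyr-7b-beta.Q5_K_M.gguf",  # adjust to your download location
    n_gpu_layers=35,  # lower this if you run out of VRAM
    n_threads=6,      # leave some cores free for TTS
)
result = llm("<|user|>\nSay hello.\n<|assistant|>\n", max_tokens=16)
print(result["choices"][0]["text"])
```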
## Running the Application

```shell
python ai_voicetalk_local.py
```

## Customize

### Change AI personality

Open chat_params.json to change the talk scenario.
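All string fields in chat_params.json can use the placeholders `{char}`, `{user}` and `{scenario}`, which the main script expands at startup. A simplified sketch of that substitution (the example values below are purely illustrative):

```python
# Simplified from replace_placeholders() in ai_voicetalk_local.py.
def replace_placeholders(params: dict, char: str, user: str, scenario: str = "") -> dict:
    for key, value in params.items():
        if isinstance(value, str):
            value = value.replace("{char}", char).replace("{user}", user)
            if scenario:
                value = value.replace("{scenario}", scenario)
            params[key] = value
    return params

expanded = replace_placeholders(
    {"system_prompt": "You as {char}. {scenario}"},
    char="Lina", user="John", scenario="You just met John at a bar.",
)
print(expanded["system_prompt"])  # -> "You as Lina. You just met John at a bar."
```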
### Change AI Voice

- Open ai_voicetalk_local.py.
- Find this line: `coqui_engine = CoquiEngine(cloning_reference_wav="female.wav", language="en")`
- Change `"female.wav"` to the filename of a WAV file (44100 or 22050 Hz, mono, 16-bit) containing the voice to clone.
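You can also switch the cloning reference at runtime, the same way the voice-selection loop in ai_voicetalk_local.py does. A minimal sketch, assuming `coqui_engine` and `stream` are already initialized as in the script:

```python
import os

# Point the engine at another reference WAV and preview the result.
voice_path = os.path.join("voices", "voice2.wav")
coqui_engine.set_voice(voice_path)
stream.feed("This is how voice number two sounds.").play()
```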
### Speech end detection

If the first sentence is already transcribed before you get to the second one, raise `post_speech_silence_duration` on AudioToTextRecorder:
```python
AudioToTextRecorder(model="tiny.en", language="en", spinner=False, post_speech_silence_duration=1.5)
```

## Contributing

Contributions to enhance or improve the project are warmly welcomed. Feel free to open a pull request with your proposed changes or fixes.

## License

The project is under the [Coqui Public Model License 1.0.0](https://coqui.ai/cpml).

This license allows only non-commercial use of a machine learning model and its outputs.

## Contact

Kolja Beigel
- Email: [kolja.beigel@web.de](mailto:kolja.beigel@web.de)

Feel free to reach out for any queries or support related to this project.

--------------------------------------------------------------------------------
/ai_voicetalk_local.py:
--------------------------------------------------------------------------------
if __name__ == '__main__':
    from RealtimeTTS import TextToAudioStream, CoquiEngine
    from RealtimeSTT import AudioToTextRecorder
    import llama_cpp
    import torch
    import json
    import os

    output = ""
    llama_cpp_cuda = None

    # Prefer the CUDA build of llama.cpp if a CUDA GPU is available;
    # otherwise fall back to the build imported above.
    if torch.cuda.is_available():
        try:
            print("Trying to import llama_cpp_cuda")
            import llama_cpp_cuda
        except Exception:
            print("llama_cpp_cuda import failed")
            llama_cpp_cuda = None
    elif torch.version.hip:
        try:
            print("Trying to import llama_cpp")
            import llama_cpp
        except Exception:
            print("ROCm is not available")
            llama_cpp = None

    def llama_cpp_lib():
        if llama_cpp_cuda is None:
            print("llama_cpp_lib: return llama_cpp")
            return llama_cpp
        else:
            print("llama_cpp_lib: return llama_cpp_cuda")
            return llama_cpp_cuda

    Llama = llama_cpp_lib().Llama

    history = []

    def replace_placeholders(params, char, user, scenario=""):
        # Substitute {char}, {user} and (optionally) {scenario} in all string values.
        for key in params:
            if isinstance(params[key], str):
                params[key] = params[key].replace("{char}", char)
                params[key] = params[key].replace("{user}", user)
                if scenario:
                    params[key] = params[key].replace("{scenario}", scenario)
        return params

    def write_file(file_path, content, mode='w'):
        with open(file_path, mode) as f:
            f.write(content)

    def clear_console():
        os.system('clear' if os.name == 'posix' else 'cls')

    def encode(string):
        return model.tokenize(string.encode() if isinstance(string, str) else string)

    def count_tokens(string):
        return len(encode(string))

    def create_prompt():
        # Build the Zephyr-style prompt from the system prompt, the optional
        # initial message and the accumulated chat history.
        prompt = f'<|system|>\n{chat_params["system_prompt"]}\n'

        if chat_params["initial_message"]:
            prompt += f"<|assistant|>\n{chat_params['initial_message']}\n"

        return prompt + "".join(history) + "<|assistant|>"

    def generate():
        # Stream completion chunks from llama.cpp, skipping leading whitespace.
        global output
        output = ""
        prompt = create_prompt()
        write_file('last_prompt.txt', prompt)
        completion_params['prompt'] = prompt
        first_chunk = True
        for completion_chunk in model.create_completion(**completion_params):
            text = completion_chunk['choices'][0]['text']
            if first_chunk and text.isspace():
                continue
            first_chunk = False
            output += text
            yield text

    with open('creation_params.json') as f:
        creation_params = json.load(f)
    with open('completion_params.json') as f:
        completion_params = json.load(f)
    with open('chat_params.json') as f:
        chat_params = json.load(f)

    chat_params = replace_placeholders(chat_params, chat_params["char"], chat_params["user"])
    chat_params = replace_placeholders(chat_params, chat_params["char"], chat_params["user"], chat_params["scenario"])

    if not completion_params['logits_processor']:
        completion_params['logits_processor'] = None

    # Initialize AI Model
    print("Initializing LLM llama.cpp model ...")
    model = Llama(**creation_params)
    print("llama.cpp model initialized")

    print("Initializing TTS CoquiEngine ...")
    # For debug logging:
    # import logging
    # logging.basicConfig(format='AI Voicetalk: %(message)s', level=logging.DEBUG)
    # coqui_engine = CoquiEngine(cloning_reference_wav="female.wav", language="en", level=logging.DEBUG)
    coqui_engine = CoquiEngine(cloning_reference_wav="female.wav", language="en", speed=1.0)

    print("Initializing STT AudioToTextRecorder ...")
    # stream = TextToAudioStream(coqui_engine, log_characters=True, level=logging.DEBUG)
    stream = TextToAudioStream(coqui_engine, log_characters=True)
    recorder = AudioToTextRecorder(model="tiny.en", language="en", spinner=False)

    print()
    # Let the user pick and preview one of the bundled voices.
    while True:
        voice_number = input("Select voice (1-5): ")
        voice_path = os.path.join("voices", f"voice{voice_number}.wav")
        coqui_engine.set_voice(voice_path)

        stream.feed(f"This is how voice number {voice_number} sounds.").play()
        accept_voice = input("Accept voice (y/n): ")
        if accept_voice.lower() != "n":
            break

    clear_console()
    print(f'Scenario: {chat_params["scenario"]}\n\n')

    # Main talk loop: transcribe the user, generate the reply and speak it.
    while True:
        print(f'>>> {chat_params["user"]}: ', end="", flush=True)
        print(f'{(user_text := recorder.text())}\n<<< {chat_params["char"]}: ', end="", flush=True)
        history.append(f"<|user|>\n{user_text}\n")

        # Drop the oldest user/assistant pair while the prompt exceeds the
        # context window (8192 tokens) minus headroom for the answer.
        tokens_history = count_tokens(create_prompt())
        while tokens_history > 8192 - 500:
            history.pop(0)
            history.pop(0)
            tokens_history = count_tokens(create_prompt())

        generator = generate()
        stream.feed(generator)
        stream.play(fast_sentence_fragment=True, buffer_threshold_seconds=999, minimum_sentence_length=18, log_synthesized_text=True)
        history.append(f"<|assistant|>\n{output}\n")
        write_file('last_prompt.txt', create_prompt())
--------------------------------------------------------------------------------
/chat_params.json:
--------------------------------------------------------------------------------
{
    "char": "Lina",
    "user": "John",
    "initial_message": "Hey, I am {char}.",
    "scenario": "As {char} you are a 31 year old single woman and a journalist on vacation. {user} is a 28-year-old male professional poker player. You ({char}) and {user} just met at a hotel bar in Las Vegas.",
    "system_prompt": "You as {char}. {scenario} Print out only exactly the words that {char} would speak out, do not add anything. Don't repeat. Answer short, only few words, as if in a talk. Craft your response only from the first-person perspective of {char} and never as {user}."
}
--------------------------------------------------------------------------------
/completion_params.json:
--------------------------------------------------------------------------------
{
    "prompt": "",
    "max_tokens": 250,
    "temperature": 0.7,
    "top_p": 0.9,
    "top_k": 20,
    "repeat_penalty": 1.2,
    "tfs_z": 1,
    "mirostat_mode": 0,
    "mirostat_tau": 5,
    "mirostat_eta": 0.1,
    "stream": true,
    "stop": ["</s>","<|user|>","</s>","