├── LICENSE
├── README.md
├── README_en.md
├── README_zh.md
├── gpt_sovits_api.py
├── main_ollama.py
└── requirements.txt

/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2024 HaxxorCialtion

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
--------------------------------------------------------------------------------

/README.md:
--------------------------------------------------------------------------------
# 智能语音助手系统

## This project contains two README files:
- [English version](README_en.md)
- [中文版](README_zh.md)

## 项目概述

这是一个基于关键词检测、语音识别、语音合成和对话生成的智能语音助手系统。该系统能够通过特定的唤醒词(如"hey bro")启动与用户的语音对话,并利用先进的自然语言处理技术提供智能回复。

## 主要功能

1. **关键词检测**:使用Porcupine实时监听唤醒词,启动对话。
2. **语音录制和检测**:采用WebRTC VAD进行语音活动检测,录制有效语音片段。
3. **语音识别(ASR)**:使用SenseVoice Small模型将录制的语音转换为文本。
4. **对话生成**:调用Ollama API(兼容OpenAI API),根据上下文生成助手的文本回复。
5. **语音合成(TTS)**:将助手的回复通过语音合成输出,模拟人声对话。
6. **对话历史保存**:定期将对话内容保存为JSON文件,便于后续分析。

## 技术栈

- Python
- Porcupine(关键词检测)
- WebRTC VAD(语音活动检测)
- SenseVoice Small(语音识别)
- Ollama API(对话生成)
- GPT-SoVITS(语音合成)
- PyAudio, NumPy, SciPy(音频处理)

### TTS部分

- 本项目基于GPT-SoVITS-v2,感谢开源社区工作者的贡献!

#### GPT-SoVITS TTS 项目

这是一个基于GPT-SoVITS API的文本到语音(TTS)项目。该项目允许用户根据不同的情感生成和播放语音,使用预定义的参考音频来影响输出的语音风格。

##### 功能特点

- 支持多种情感的语音生成(高兴、抑郁、激动、平静、纠结)
- 使用参考音频来控制语音风格
- 实时生成并播放WAV格式的音频文件
- 可自定义文本输入和输出文件名
- 提供TTS处理时间统计
- 可以添加一个情感识别模型来决定参考音频,从而控制音频合成情感(to do)

## 使用指南

1. 克隆仓库:
   - git clone https://github.com/HaxxorCialtion/ASR_LLM_TTS_py.git
   - cd ASR_LLM_TTS_py

2. 安装依赖:
   - pip install -r requirements.txt

3. 准备必要的API密钥和模型:
   - 获取Porcupine API密钥
   - 下载SenseVoice Small模型文件
   - 确保Ollama API服务已经运行
   - 开启GPT-SoVITS API服务

4. 配置系统:
   - 在脚本中填入Porcupine API密钥
   - 设置ASR模型路径
   - 配置Ollama API端点(默认为本地)
   - 配置GPT-SoVITS模型和参考音频

## 使用方法

1. 运行主脚本:
   python main_ollama.py

2. 等待系统提示"Listening for wake word..."

3. 说出唤醒词(默认为"hey bro")开始对话

4. 与语音助手进行自然语言交互

5. 超时休眠后,需再次说出唤醒词触发

6. 结束本轮对话即自动保存对话记录

## 主要特性

- 实时语音交互
- 智能对话生成
- 自然语音合成
- 长时间无语音自动休眠
- 对话历史记录

## 自定义设置

- 修改`settings`变量来自定义助手的角色和背景
- 调整`max_silence_duration`和`min_speech_duration`等参数来优化语音检测
- 更换唤醒词和对应的模型文件

## 许可证

本项目采用 MIT 许可证 - 查看 [LICENSE](LICENSE) 文件了解详情

## 联系方式

项目维护者:HaxxorCialtion - cialtion@outlook.com
Bilibili视频地址:https://www.bilibili.com/video/BV1pftreQEbu

## 致谢

- [Porcupine](https://github.com/Picovoice/porcupine) - 用于唤醒词检测
- [SenseVoice](https://github.com/FunAudioLLM/SenseVoice) - 提供ASR模型
- [Ollama](https://github.com/ollama/ollama) - 本地大语言模型服务
- [Qwen2.5](https://github.com/QwenLM/Qwen2.5) - 大语言模型
- [GPT-SoVITS](https://github.com/RVC-Boss/GPT-SoVITS) - 用于语音合成
--------------------------------------------------------------------------------

/README_en.md:
--------------------------------------------------------------------------------
# Intelligent Voice Assistant System

## Project Overview

This is an intelligent voice assistant system based on keyword detection, speech recognition, speech synthesis, and dialogue generation. The system initiates a voice conversation with the user through a specific wake word (such as "hey bro") and provides intelligent replies using natural language processing.

## Main Features

- **Keyword Detection**: Real-time monitoring of wake words using Porcupine to initiate conversation.
- **Voice Recording and Detection**: Voice activity detection using WebRTC VAD to record valid voice segments.
- **Speech Recognition (ASR)**: Converting recorded voice to text using the SenseVoice Small model.
- **Dialogue Generation**: Generating the assistant's text replies based on context by calling the Ollama API (compatible with the OpenAI API); see the sketch below.
- **Speech Synthesis (TTS)**: Outputting the assistant's replies through speech synthesis to simulate human voice conversation.
- **Dialogue History Saving**: Regularly saving dialogue content as JSON files for subsequent analysis.
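The dialogue-generation step reduces to a single POST request against the local Ollama endpoint. A minimal sketch of that call, mirroring `dp_chat()` in `main_ollama.py` (it assumes Ollama is running locally with the `qwen2.5` model pulled):

```python
import requests

# Minimal, non-streaming chat request against a local Ollama service,
# as issued by dp_chat() in main_ollama.py.
payload = {
    "model": "qwen2.5",
    "messages": [
        {"role": "system", "content": "You are a concise voice-chat assistant."},
        {"role": "user", "content": "Hello!"},
    ],
    "stream": False,
}
response = requests.post("http://localhost:11434/api/chat", json=payload)
print(response.json().get("message", {}).get("content", ""))
```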
## Technology Stack

- Python
- Porcupine (Keyword Detection)
- WebRTC VAD (Voice Activity Detection)
- SenseVoice Small (Speech Recognition)
- Ollama API (Dialogue Generation)
- GPT-SoVITS (Speech Synthesis)
- PyAudio, NumPy, SciPy (Audio Processing)

## TTS Part

This project builds on GPT-SoVITS v2; many thanks to the open-source community for their contributions!

### GPT-SoVITS TTS Project

This is a Text-to-Speech (TTS) project based on the GPT-SoVITS API. It allows users to generate and play speech with different emotions, using predefined reference audio to influence the style of the output voice.

### Features

- Supports voice generation with various emotions (happy, depressed, excited, calm, conflicted).
- Uses reference audio to control voice style.
- Generates and plays WAV-format audio files in real time.
- Customizable text input and output filenames.
- Provides TTS processing time statistics.
- An emotion recognition model could be added to choose the reference audio, and thus the emotion of the synthesized speech (to do).
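The two API calls behind this, as issued by `gpt_sovits_api.py`, are shown in the minimal sketch below (it assumes the GPT-SoVITS API service is listening on its default address `http://127.0.0.1:9880` and that the reference-audio file exists on the server side):

```python
import requests

base_url = "http://127.0.0.1:9880"

# 1. Switch the reference audio; this controls the emotion/style of the output.
requests.post(f"{base_url}/change_refer", json={
    "refer_wav_path": "./参考音频/要吸收和消化掉这些对吧?我会努力的!.wav",
    "prompt_text": "要吸收和消化掉这些对吧?我会努力的!",
    "prompt_language": "zh",
})

# 2. Run inference; the response body is the synthesized WAV audio.
audio = requests.post(f"{base_url}/", json={"text": "晚上好,博士!", "text_language": "zh"})
with open("output.wav", "wb") as f:
    f.write(audio.content)
```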
## Usage Guide

1. Clone the repository:
   - git clone https://github.com/HaxxorCialtion/ASR_LLM_TTS_py.git
   - cd ASR_LLM_TTS_py

2. Install dependencies:
   - pip install -r requirements.txt

3. Prepare the necessary API keys and models:
   - Obtain a Porcupine API key.
   - Download the SenseVoice Small model files.
   - Ensure the Ollama API service is running.
   - Start the GPT-SoVITS API service.

4. Configure the system:
   - Fill in the Porcupine API key in the script.
   - Set the ASR model path.
   - Configure the Ollama API endpoint (local by default).
   - Configure the GPT-SoVITS model and reference audio.

5. Run the main script:
   - python main_ollama.py
   - Wait for the system to prompt "Listening for wake word..."
   - Speak the wake word (default is "hey bro") to start the conversation.
   - Engage in natural language interaction with the voice assistant.
   - If the session times out, speak the wake word again.
   - The conversation record is saved automatically at the end of each round.

## Highlights

- Real-time voice interaction.
- Intelligent dialogue generation.
- Natural voice synthesis.
- Automatic sleep mode after a long period without speech.
- Dialogue history recording.
- Customizable settings.

## Customization

- Modify the `settings` variable to customize the assistant's role and background.
- Adjust parameters such as `max_silence_duration` and `min_speech_duration` to optimize voice detection.
- Change the wake word and the corresponding model file (see the sketch below).
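A minimal sketch of loading a custom wake word with Porcupine, as done in `main_ollama.py` (the access key and the `.ppn` path below are placeholders; custom keyword files are trained on the Picovoice Console):

```python
import pvporcupine

# Placeholders: substitute your own access key and keyword file.
porcupine = pvporcupine.create(
    access_key="YOUR_PICOVOICE_ACCESS_KEY",
    keyword_paths=["./my-wake-word_en_windows_v3_0_0.ppn"],
)
print(porcupine.sample_rate, porcupine.frame_length)  # audio format Porcupine expects
```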
## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## Contact

Project Maintainer: HaxxorCialtion - cialtion@outlook.com

Bilibili Video: https://www.bilibili.com/video/BV1pftreQEbu

## Acknowledgements

- [Porcupine](https://github.com/Picovoice/porcupine)
- [SenseVoice](https://github.com/FunAudioLLM/SenseVoice)
- [Ollama](https://github.com/ollama/ollama)
- [Qwen2.5](https://github.com/QwenLM/Qwen2.5)
- [GPT-SoVITS](https://github.com/RVC-Boss/GPT-SoVITS)
--------------------------------------------------------------------------------

/README_zh.md:
--------------------------------------------------------------------------------
# 智能语音助手系统

## 项目概述

这是一个基于关键词检测、语音识别、语音合成和对话生成的智能语音助手系统。该系统能够通过特定的唤醒词(如"hey bro")启动与用户的语音对话,并利用先进的自然语言处理技术提供智能回复。

## 主要功能

1. **关键词检测**:使用Porcupine实时监听唤醒词,启动对话。
2. **语音录制和检测**:采用WebRTC VAD进行语音活动检测,录制有效语音片段。
3. **语音识别(ASR)**:使用SenseVoice Small模型将录制的语音转换为文本。
4. **对话生成**:调用Ollama API(兼容OpenAI API),根据上下文生成助手的文本回复。
5. **语音合成(TTS)**:将助手的回复通过语音合成输出,模拟人声对话。
6. **对话历史保存**:定期将对话内容保存为JSON文件,便于后续分析。

## 技术栈

- Python
- Porcupine(关键词检测)
- WebRTC VAD(语音活动检测)
- SenseVoice Small(语音识别)
- Ollama API(对话生成)
- GPT-SoVITS(语音合成)
- PyAudio, NumPy, SciPy(音频处理)

### TTS部分

- 本项目基于GPT-SoVITS-v2,感谢开源社区工作者的贡献!

#### GPT-SoVITS TTS 项目

这是一个基于GPT-SoVITS API的文本到语音(TTS)项目。该项目允许用户根据不同的情感生成和播放语音,使用预定义的参考音频来影响输出的语音风格。

##### 功能特点

- 支持多种情感的语音生成(高兴、抑郁、激动、平静、纠结)
- 使用参考音频来控制语音风格
- 实时生成并播放WAV格式的音频文件
- 可自定义文本输入和输出文件名
- 提供TTS处理时间统计
- 可以添加一个情感识别模型来决定参考音频,从而控制音频合成情感(to do)

## 使用指南

1. 克隆仓库:
   - git clone https://github.com/HaxxorCialtion/ASR_LLM_TTS_py.git
   - cd ASR_LLM_TTS_py

2. 安装依赖:
   - pip install -r requirements.txt

3. 准备必要的API密钥和模型:
   - 获取Porcupine API密钥
   - 下载SenseVoice Small模型文件
   - 确保Ollama API服务已经运行
   - 开启GPT-SoVITS API服务

4. 配置系统:
   - 在脚本中填入Porcupine API密钥
   - 设置ASR模型路径
   - 配置Ollama API端点(默认为本地)
   - 配置GPT-SoVITS模型和参考音频

## 使用方法

1. 运行主脚本:
   python main_ollama.py

2. 等待系统提示"Listening for wake word..."

3. 说出唤醒词(默认为"hey bro")开始对话

4. 与语音助手进行自然语言交互

5. 超时休眠后,需再次说出唤醒词触发

6. 结束本轮对话即自动保存对话记录

## 主要特性

- 实时语音交互
- 智能对话生成
- 自然语音合成
- 长时间无语音自动休眠
- 对话历史记录

## 自定义设置

- 修改`settings`变量来自定义助手的角色和背景
- 调整`max_silence_duration`和`min_speech_duration`等参数来优化语音检测
- 更换唤醒词和对应的模型文件

## 许可证

本项目采用 MIT 许可证 - 查看 [LICENSE](LICENSE) 文件了解详情

## 联系方式

项目维护者:HaxxorCialtion - cialtion@outlook.com
Bilibili视频地址:https://www.bilibili.com/video/BV1pftreQEbu

## 致谢

- [Porcupine](https://github.com/Picovoice/porcupine) - 用于唤醒词检测
- [SenseVoice](https://github.com/FunAudioLLM/SenseVoice) - 提供ASR模型
- [Ollama](https://github.com/ollama/ollama) - 本地大语言模型服务
- [Qwen2.5](https://github.com/QwenLM/Qwen2.5) - 大语言模型
- [GPT-SoVITS](https://github.com/RVC-Boss/GPT-SoVITS) - 用于语音合成
--------------------------------------------------------------------------------

/gpt_sovits_api.py:
--------------------------------------------------------------------------------
import requests
import os
import time
import pyaudio
import wave

# Base URL of the local GPT-SoVITS API service
base_url = "http://127.0.0.1:9880"

def get_audio_file_path(emotion):
    """Return the reference-audio path for the given emotion."""
    paths = {
        "高兴": "./参考音频/要吸收和消化掉这些对吧?我会努力的!.wav",
        "抑郁": "./参考音频/虽然可惜依旧不能算正式干员的样子.wav",
        "激动": "./参考音频/这下坏人的数量又减少了呢!都是多亏了博士!.wav",
        "平静": "./参考音频/嗯?博士怎么放炮了几个坏人?我去替博士收拾一下吧。.wav",
        "纠结": "./参考音频/不要吧,那我会很困扰啊。.wav"
    }
    return paths.get(emotion)

def change_reference_audio(emotion, prompt_language="zh"):
    """Switch the reference audio on the TTS server; the prompt text is taken from the file name."""
    refer_wav_path = get_audio_file_path(emotion)
    filename = os.path.basename(refer_wav_path)
    prompt_text = os.path.splitext(filename)[0]
    data = {
        "refer_wav_path": refer_wav_path,
        "prompt_text": prompt_text,
        "prompt_language": prompt_language
    }

    response = requests.post(f"{base_url}/change_refer", json=data)
    if response.status_code != 200:
        print(f"Failed to switch reference audio: {response.status_code}")

def save_audio_from_response(url, data, output_file):
    """Run inference and save the returned audio to output_file."""
    try:
        response = requests.post(url, json=data)

        if response.status_code == 200:
            with open(output_file, "wb") as f:
                f.write(response.content)
            return output_file
        else:
            print(f"TTS request failed with status code {response.status_code}")
    except Exception as e:
        print(e)

def play_wav_file(wav_file):
    # Open the WAV file
    wf = wave.open(wav_file, 'rb')

    # Create a PyAudio instance
    p = pyaudio.PyAudio()

    # Open an output stream matching the file's format
    stream = p.open(format=p.get_format_from_width(wf.getsampwidth()),
                    channels=wf.getnchannels(),
                    rate=wf.getframerate(),
                    output=True)

    # Read and play the audio in chunks
    data = wf.readframes(1024)
    while len(data) > 0:
        stream.write(data)
        data = wf.readframes(1024)

    # Stop and close the stream
    stream.stop_stream()
    stream.close()

    # Release PyAudio
    p.terminate()

def gpt_sovits(temp_text="长时间没和我交流,已待机", emotion="高兴", output_file="temp_12.wav"):
    t1 = time.time()
    url = f"{base_url}/"
    data = {
        "text": f"{temp_text}",
        "text_language": "zh",
        "cut_punc": ",。!?!、:;?.,、—‘’“”《》【】()[]{}「」『』‖|…‥・﹏﹋﹌·・~-−—―「」『』〝〞",
        # "cut_punc": "。",
        "top_k": 20,
        "top_p": 1.0,
        "temperature": 1,
        "speed": 1.0
    }
    change_reference_audio(emotion, "zh")
    wav_file = save_audio_from_response(url, data, output_file)
    t2 = time.time()
    print(f"TTS time: {t2 - t1} seconds")
    # Play the synthesized audio (only if inference succeeded)
    if wav_file:
        play_wav_file(wav_file)

"""
Example of driving the API with a single GET request:
http://127.0.0.1:9880?text=晚上好,博士!&refer_wav_path=E:\AI_tools\resperpy\参考音频\虽然可惜依旧不能算正式干员的样子.wav&prompt_text=虽然可惜依旧不能算正式干员的样子&prompt_language=zh&text_language=zh&cut_punc=,。!?!、:;?.,、—‘’“”《》【】()[]{}「」『』‖|…‥・﹏﹋﹌·・~-−—―「」『』〝〞&top_k=20&top_p=1.0&temperature=1&speed=1.0
"""
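# --- Usage sketch (illustrative addition, not part of the original module) ---
# Assumes the GPT-SoVITS API service is running on port 9880 and that the
# reference-audio files listed in get_audio_file_path() exist on the server.
if __name__ == "__main__":
    gpt_sovits("晚上好,博士!", emotion="平静", output_file="demo.wav")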
url = f"{base_url}/" 79 | data = { 80 | "text": f"{temp_text}", 81 | "text_language": "zh", 82 | "cut_punc": ",。!?!、:;?.,、—‘’“”《》【】()[]{}「」『』‖|…‥・﹏﹋﹌·・~-−—―「」『』〝〞", 83 | # "cut_punc": "。", 84 | "top_k": 20, 85 | "top_p": 1.0, 86 | "temperature": 1, 87 | "speed": 1.0 88 | } 89 | change_reference_audio(emotion, "zh") 90 | wav_file = save_audio_from_response(url, data, output_file) 91 | t2 = time.time() 92 | print(f"TTS耗时: {t2 - t1} seconds") 93 | # 播放音频 94 | play_wav_file(wav_file) 95 | 96 | """ 97 | $$ http://127.0.0.1:9880?text=晚上好,博士!&refer_wav_path=E:\AI_tools\resperpy\参考音频\虽然可惜依旧不能算正式干员的样子.wav&prompt_text=虽然可惜依旧不能算正式干员的样子&prompt_language=zh&text_language=zh&cut_punc=,。!?!、:;?.,、—‘’“”《》【】()[]{}「」『』‖|…‥・﹏﹋﹌·・~-−—―「」『』〝〞&top_k=20&top_p=1.0&temperature=1&speed=1.0 98 | """ -------------------------------------------------------------------------------- /main_ollama.py: -------------------------------------------------------------------------------- 1 | """ 2 | ASR LLM TTS 部分需要的模型和文件需要自己配置,详情参考README.md 3 | """ 4 | import pvporcupine 5 | import pyaudio 6 | import numpy as np 7 | from funasr_onnx import SenseVoiceSmall 8 | from funasr_onnx.utils.postprocess_utils import rich_transcription_postprocess 9 | from scipy.io.wavfile import write 10 | import sounddevice as sd 11 | import time 12 | import webrtcvad 13 | import os 14 | import requests 15 | import json 16 | from datetime import datetime 17 | import gpt_sovits_api 18 | import soundfile as sf 19 | 20 | def record_audio_vad(filename, sample_rate, vad): 21 | print("Recording started with VAD...") 22 | 23 | audio = [] 24 | silence_frames = 0 25 | speech_frames = 0 26 | max_silence_duration = 1.5 # 增加到1.5秒 27 | min_speech_duration = 0.5 # 最小语音持续时间为0.5秒 28 | speech_started = False 29 | valid_speech = False 30 | 31 | stream = sd.InputStream(samplerate=sample_rate, channels=1, dtype='int16') 32 | with stream: 33 | while True: 34 | frame = stream.read(frame_length)[0] 35 | frame = frame.flatten() 36 | 37 | is_speech = vad.is_speech(frame.tobytes(), sample_rate) 38 | 39 | if is_speech: 40 | speech_frames += 1 41 | silence_frames = 0 42 | if not speech_started: 43 | speech_started = True 44 | print("Speech detected, recording started.") 45 | else: 46 | silence_frames += 1 47 | if speech_started: 48 | audio.append(frame) 49 | 50 | if speech_started: 51 | audio.append(frame) 52 | 53 | # 检查是否达到最小语音持续时间 54 | if speech_frames * frame_duration / 1000 >= min_speech_duration: 55 | valid_speech = True 56 | 57 | # 检查是否达到最大静音持续时间 58 | if silence_frames * frame_duration / 1000 > max_silence_duration: 59 | if valid_speech: 60 | print("Silence detected after valid speech. Stopping recording.") 61 | break 62 | else: 63 | print("Short noise detected. Resetting.") 64 | audio = [] 65 | silence_frames = 0 66 | speech_frames = 0 67 | speech_started = False 68 | valid_speech = False 69 | 70 | if len(audio) > 0 and valid_speech: 71 | audio_data = np.concatenate(audio, axis=0) 72 | write(filename, sample_rate, audio_data) 73 | print(f"Recording finished. Saved to {filename}") 74 | return filename 75 | else: 76 | print("No valid speech detected. 
No audio file saved.") 77 | return None 78 | 79 | def transcribe_audio(wav_file, model): 80 | print(f"Processing audio file {wav_file}...") 81 | t1 = time.time() 82 | res = model([wav_file], language="zh", use_itn=True) 83 | transcription = [rich_transcription_postprocess(i) for i in res] 84 | print(f"ASR耗时: {time.time() - t1} seconds") 85 | return transcription[0] # 返回第一个(也是唯一的)转录结果 86 | 87 | def tts(temp_text="你好", emotion="抑郁", output_file=f"temp_12.wav"): 88 | gpt_sovits_api.gpt_sovits(temp_text, emotion, output_file) 89 | 90 | def dp_chat(message: str, stream=False): 91 | global conversation_history 92 | t1 = time.time() 93 | 94 | conversation_history.append({"role": "user", "content": message}) 95 | 96 | payload = { 97 | "model": "qwen2.5", 98 | "messages": conversation_history, 99 | "stream": stream, 100 | } 101 | url = "http://localhost:11434/api/chat" 102 | response = requests.post(url, json=payload) 103 | if stream: 104 | # 逐行读取流式响应内容 105 | for line in response.iter_lines(): 106 | if line: 107 | # 尝试将每一行解析为 JSON 108 | try: 109 | data = json.loads(line.decode('utf-8')) 110 | print(data) # 打印每一个 JSON 数据块 111 | except json.JSONDecodeError as e: 112 | print(f"JSON 解析失败: {e}") 113 | else: 114 | response_json = response.json() 115 | assistant_response = response_json.get("message", {}).get("content", "") 116 | 117 | t2 = time.time() 118 | print(f"API response time: {t2 - t1} seconds") 119 | 120 | conversation_history.append({"role": "assistant", "content": assistant_response}) 121 | 122 | tts(assistant_response) 123 | 124 | return assistant_response 125 | 126 | def play_audio(file_path): 127 | """触发唤醒词后,播放hello文件""" 128 | data, fs = sf.read(file_path, dtype='float32') # 读取音频文件 129 | sd.play(data, fs) # 播放音频 130 | sd.wait() # 等待音频播放结束 131 | 132 | def continuous_conversation(model, vad, sleep_time=10): 133 | max_silence_duration = sleep_time # 最大静音时长为 60 秒 134 | while True: 135 | audio_filename = "input_audio.wav" 136 | start_time = time.time() 137 | 138 | # 记录开始录音的时间 139 | recorded_file = record_audio_vad(audio_filename, sample_rate, vad) 140 | 141 | if recorded_file is None: 142 | print("No valid speech detected. Please try again.") 143 | continue 144 | 145 | # 如果超过了指定时间没有检测到有效语音,退出对话 146 | if time.time() - start_time > max_silence_duration: 147 | print("No speech detected for 60 seconds. 
Conversation ended.") 148 | play_audio("./sleep.wav") 149 | break 150 | 151 | # 语音转文字 152 | transcription_result = transcribe_audio(recorded_file, model) 153 | if transcription_result.lower() in ['退出', '结束对话', 'exit', 'quit']: 154 | print("对话结束") 155 | break 156 | 157 | # 生成助手回复 158 | response = dp_chat(transcription_result) 159 | print("User:", transcription_result) 160 | print("Assistant:", response) 161 | 162 | def save_conversation_history(): 163 | if not os.path.exists("conversation_logs"): 164 | os.makedirs("conversation_logs") 165 | 166 | timestamp = datetime.now().strftime("%Y%m%d_%H%M%S") 167 | filename = f"conversation_logs/conversation_{timestamp}.json" 168 | 169 | with open(filename, 'w', encoding='utf-8') as f: 170 | json.dump(conversation_history, f, ensure_ascii=False, indent=2) 171 | 172 | print(f"Conversation history saved to {filename}") 173 | 174 | def start_service(): 175 | print("Initializing Porcupine and ASR model...") 176 | 177 | # 加载ASR模型 178 | model = SenseVoiceSmall(model_dir, batch_size=10, quantize=True) 179 | 180 | # 检测音频流中是否存在语音,它可以区分音频信号中的语音和背景噪音 181 | vad = webrtcvad.Vad() 182 | vad.set_mode(2) # 参数范围0-3,越大越严格,越能忽略更多的背景噪音 183 | 184 | pa = pyaudio.PyAudio() 185 | stream = pa.open(rate=porcupine.sample_rate, # 音频流的采样率 186 | channels=1, # 音频流的通道数 187 | format=pyaudio.paInt16, # 音频数据的格式 188 | input=True, # 指定这个音频流是用于输入(采集音频) 189 | frames_per_buffer=porcupine.frame_length) # 缓冲区大小,表示每次从音频输入设备读取多少帧 190 | 191 | print("Listening for wake word...") 192 | 193 | try: 194 | while True: 195 | pcm = stream.read(porcupine.frame_length) # 从麦克风音频输入流中读取指定长度的音频数据 196 | pcm = np.frombuffer(pcm, dtype=np.int16) 197 | 198 | keyword_index = porcupine.process(pcm) # 调用process方法检测是否包含唤醒词 199 | if keyword_index >= 0: 200 | print("Wake word detected! Starting conversation...") 201 | stream.stop_stream() # 上文已经获取到唤醒词,则先停止当前音频流,准备之后的对话 202 | # 额外功能:播放当前文件夹下的hello.wav 203 | play_audio("hello.wav") 204 | continuous_conversation(model, vad) 205 | stream.start_stream() 206 | print("Conversation ended. Listening for wake word again...") 207 | except KeyboardInterrupt: 208 | print("Stopping service...") 209 | finally: 210 | stream.close() 211 | pa.terminate() 212 | save_conversation_history() 213 | 214 | 215 | if __name__ == "__main__": 216 | # 配置Porcupine关键词检测 217 | porcupine = pvporcupine.create( 218 | access_key=f"输入你的API", 219 | keyword_paths=["./hey-bro_en_windows_v3_0_0/hey-bro_en_windows_v3_0_0.ppn"] 220 | ) 221 | 222 | # ASR模型路径 223 | model_dir = "conversation_logs/sensevoice-small-onnx-quant" 224 | sample_rate = 16000 225 | frame_duration = 30 226 | frame_length = int(sample_rate * frame_duration / 1000) 227 | 228 | settings = ("你的名字是水月,我是博士,我从事理论与计算化学的工作,需要学习数学物理化学计算机的交叉知识,你是我的助手,你和博士现在都还不够优秀," 229 | "你们会在之后的学习生涯中慢慢变得更加优秀。") 230 | # 全局变量用于存储对话历史 231 | conversation_history = [ 232 | {"role": "system", "content": f"你将扮演一个和我用语音聊天的对象,回复就和正常说话一样,得简短。{settings}"}, 233 | {"role": "user", "content": "我是谁?你又是谁?"}, 234 | {"role": "assistant", "content": "你是博士,我是水月"} 235 | ] 236 | 237 | start_service() -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | requests 2 | playsound 3 | pyaudio 4 | numpy 5 | funasr-onnx 6 | scipy 7 | sounddevice 8 | webrtcvad 9 | soundfile 10 | --------------------------------------------------------------------------------