├── LLM_env ├── .gitkeep ├── conversation_history.txt └── LM Studio │ └── LM Studio.htm ├── Live2d_env ├── .gitkeep ├── running_photo.jpg ├── pachirisu anime girl - top half.moc3 ├── pachirisu anime girl - top half.4096 │ └── texture_00.png ├── .model3.json ├── pachirisu anime girl - top half.model3.json ├── pachirisu anime girl - top half.cdi3.json └── pachirisu anime girl - top half.physics3.json ├── TTS_env ├── tmp │ └── .gitkeep ├── CosyVoice │ └── .gitkeep ├── voice_history │ ├── .gitkeep │ ├── 20250227_231810.wav │ ├── 20250228_033731.wav │ └── 20250228_042517.wav ├── voice_training_sample │ ├── .gitkeep │ ├── text_taiyuan.txt │ ├── fushun.mp3 │ ├── taiyuan.mp3 │ └── text_fushun.txt ├── output_voice_text.txt ├── output_voice │ └── 20250228_042632.wav ├── voice_output_api.py └── webui.py ├── ASR_env ├── SenseVoice │ └── .gitkeep ├── input_voice │ ├── .gitkeep │ └── voice.wav └── sensevoice_attempt.py ├── __pycache__ ├── ASR.cpython-311.pyc ├── LLM.cpython-311.pyc ├── TTS.cpython-311.pyc ├── TTS_api.cpython-311.pyc ├── config.cpython-311.pyc ├── config.cpython-312.pyc └── Live2d_animation.cpython-311.pyc ├── .gitignore ├── main.py ├── config.py ├── ASR.py ├── TTS_api.py ├── TTS.py ├── LLM.py ├── Live2d_animation.py ├── requirements.txt ├── README_CN.md ├── LICENSE └── README.md /LLM_env/.gitkeep: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /Live2d_env/.gitkeep: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /TTS_env/tmp/.gitkeep: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /ASR_env/SenseVoice/.gitkeep: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /TTS_env/CosyVoice/.gitkeep: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /ASR_env/input_voice/.gitkeep: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /TTS_env/voice_history/.gitkeep: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /TTS_env/voice_training_sample/.gitkeep: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /TTS_env/output_voice_text.txt: -------------------------------------------------------------------------------- 1 | 要照顾大家的感受,跟大家搞好关系,我必须做好纽带! 2 | -------------------------------------------------------------------------------- /TTS_env/voice_training_sample/text_taiyuan.txt: -------------------------------------------------------------------------------- 1 | 春节的时候,至亲的人们都会为了团圆而聚在一起呢。今、今年除了姐姐们之外,也想和指挥官一起团圆…可以吗…? 
-------------------------------------------------------------------------------- /Live2d_env/running_photo.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/suzuran0y/Live2D-LLM-Chat/HEAD/Live2d_env/running_photo.jpg -------------------------------------------------------------------------------- /ASR_env/input_voice/voice.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/suzuran0y/Live2D-LLM-Chat/HEAD/ASR_env/input_voice/voice.wav -------------------------------------------------------------------------------- /__pycache__/ASR.cpython-311.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/suzuran0y/Live2D-LLM-Chat/HEAD/__pycache__/ASR.cpython-311.pyc -------------------------------------------------------------------------------- /__pycache__/LLM.cpython-311.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/suzuran0y/Live2D-LLM-Chat/HEAD/__pycache__/LLM.cpython-311.pyc -------------------------------------------------------------------------------- /__pycache__/TTS.cpython-311.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/suzuran0y/Live2D-LLM-Chat/HEAD/__pycache__/TTS.cpython-311.pyc -------------------------------------------------------------------------------- /__pycache__/TTS_api.cpython-311.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/suzuran0y/Live2D-LLM-Chat/HEAD/__pycache__/TTS_api.cpython-311.pyc -------------------------------------------------------------------------------- /__pycache__/config.cpython-311.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/suzuran0y/Live2D-LLM-Chat/HEAD/__pycache__/config.cpython-311.pyc -------------------------------------------------------------------------------- /__pycache__/config.cpython-312.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/suzuran0y/Live2D-LLM-Chat/HEAD/__pycache__/config.cpython-312.pyc -------------------------------------------------------------------------------- /TTS_env/output_voice/20250228_042632.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/suzuran0y/Live2D-LLM-Chat/HEAD/TTS_env/output_voice/20250228_042632.wav -------------------------------------------------------------------------------- /TTS_env/voice_history/20250227_231810.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/suzuran0y/Live2D-LLM-Chat/HEAD/TTS_env/voice_history/20250227_231810.wav -------------------------------------------------------------------------------- /TTS_env/voice_history/20250228_033731.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/suzuran0y/Live2D-LLM-Chat/HEAD/TTS_env/voice_history/20250228_033731.wav -------------------------------------------------------------------------------- /TTS_env/voice_history/20250228_042517.wav: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/suzuran0y/Live2D-LLM-Chat/HEAD/TTS_env/voice_history/20250228_042517.wav -------------------------------------------------------------------------------- /TTS_env/voice_training_sample/fushun.mp3: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/suzuran0y/Live2D-LLM-Chat/HEAD/TTS_env/voice_training_sample/fushun.mp3 -------------------------------------------------------------------------------- /TTS_env/voice_training_sample/taiyuan.mp3: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/suzuran0y/Live2D-LLM-Chat/HEAD/TTS_env/voice_training_sample/taiyuan.mp3 -------------------------------------------------------------------------------- /TTS_env/voice_training_sample/text_fushun.txt: -------------------------------------------------------------------------------- 1 | 今天的抚顺,也是元气满满!如果有什么想了解的,我可以陪指挥官一起调查哦。这个送给太原的话,她一定会很高兴吧!在极北处昼夜交替时出现的幽灵船?确实听说过这种传闻。长春虽然是妹妹,但她教了我很多呢! -------------------------------------------------------------------------------- /__pycache__/Live2d_animation.cpython-311.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/suzuran0y/Live2D-LLM-Chat/HEAD/__pycache__/Live2d_animation.cpython-311.pyc -------------------------------------------------------------------------------- /Live2d_env/pachirisu anime girl - top half.moc3: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/suzuran0y/Live2D-LLM-Chat/HEAD/Live2d_env/pachirisu anime girl - top half.moc3 -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | 2 | TTS_env/CosyVoice/* 3 | !TTS_env/CosyVoice/.gitkeep 4 | 5 | ASR_env/SenseVoice/* 6 | !ASR_env/SenseVoice/.gitkeep 7 | 8 | .venv/ 9 | .idea/ 10 | -------------------------------------------------------------------------------- /Live2d_env/pachirisu anime girl - top half.4096/texture_00.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/suzuran0y/Live2D-LLM-Chat/HEAD/Live2d_env/pachirisu anime girl - top half.4096/texture_00.png -------------------------------------------------------------------------------- /LLM_env/conversation_history.txt: -------------------------------------------------------------------------------- 1 | Time:2025-02-28 01:03:39 2 | User:你好。 3 | Neko:你好喵~有什么想要了解或学习的吗?尽管问我吧! 4 | --- 5 | Time:2025-02-28 04:26:13 6 | User:晚上好。 7 | Neko:晚上好,有什么需要我帮忙的吗? 8 | --- 9 | Time:2025-02-28 04:27:48 10 | User:你是谁? 11 | Neko:我是你的知识助手猫娘,随时准备为你解答问题、讲解知识或者陪你聊聊天哦! 
12 | --- 13 | -------------------------------------------------------------------------------- /ASR_env/sensevoice_attempt.py: -------------------------------------------------------------------------------- 1 | from funasr import AutoModel 2 | from funasr.utils.postprocess_utils import rich_transcription_postprocess 3 | 4 | model_dir = "E:/PyCharm/project/project1/ASR_env/SenseVoice/models/SenseVoiceSmall" # 替换为AST模型所在地址 5 | voice_dir = "E:/PyCharm/project/project1/ASR_env/input_voice/voice.wav" # 替换为音频文件所在地址 6 | 7 | model = AutoModel( 8 | model=model_dir, 9 | trust_remote_code=False, 10 | # remote_code="./model.py", 11 | # vad_model="fsmn-vad", 12 | # vad_kwargs={"max_single_segment_time": 30000}, 13 | device="cuda:0", 14 | disable_update=True 15 | ) 16 | 17 | # en 18 | res = model.generate( 19 | input=voice_dir,#f"{model.model_path}/example/zh.mp3", 20 | cache={}, 21 | language="auto", # "zh", "en", "yue", "ja", "ko", "nospeech" 22 | use_itn=True, 23 | batch_size_s=60, 24 | merge_vad=True, 25 | merge_length_s=15, 26 | ) 27 | text = rich_transcription_postprocess(res[0]["text"]) 28 | print(text) -------------------------------------------------------------------------------- /TTS_env/voice_output_api.py: -------------------------------------------------------------------------------- 1 | 2 | # 单独调用CosyVoice模型的api接口 需要预先运行 webui.py 启动模型 3 | 4 | from gradio_client import Client, handle_file 5 | 6 | training_sample_dir = "" # 替换为需要训练音色的音频文本所在地址 7 | output_text_dir = "" # 替换为想要训练后的音色进行输出的音频文本所在地址 8 | training_voice_dir = "" # 替换为需要训练音色的音频文件所在地址 9 | 10 | # 载入需要训练音色的音频文本 11 | with open(training_sample_dir, "r", encoding='utf-8') as file: 12 | content_2 = file.read() 13 | # 载入想要训练后的音色进行输出的音频文本 14 | with open(output_text_dir, "r", encoding='utf-8') as file: 15 | content_1 = file.read() 16 | # 调用模型 17 | client = Client("http://localhost:8000/") # 该地址为cosyvoice模型自动分配地址,无出错时不改动 18 | result = client.predict( 19 | tts_text=content_1, 20 | mode_checkbox_group="3s极速复刻", 21 | sft_dropdown="", 22 | prompt_text=content_2, 23 | prompt_wav_upload=handle_file(training_voice_dir), 24 | prompt_wav_record=handle_file(training_voice_dir), 25 | instruct_text="", 26 | seed=0, 27 | stream=False, 28 | speed=1, 29 | api_name="/generate_audio" 30 | ) -------------------------------------------------------------------------------- /Live2d_env/.model3.json: -------------------------------------------------------------------------------- 1 | { 2 | "Type": 0, 3 | "FileReferences": { 4 | "Moc": "pachirisu anime girl - top half.moc3", 5 | "Textures": [ 6 | "pachirisu anime girl - top half.4096/texture_00.png" 7 | ], 8 | "Physics": "pachirisu anime girl - top half.physics3.json", 9 | "PhysicsV2": { 10 | "File": "pachirisu anime girl - top half.physics3.json" 11 | } 12 | }, 13 | "Controllers": { 14 | "ParamHit": {}, 15 | "ParamLoop": {}, 16 | "KeyTrigger": {}, 17 | "ParamTrigger": {}, 18 | "AreaTrigger": {}, 19 | "HandTrigger": {}, 20 | "EyeBlink": { 21 | "MinInterval": 500, 22 | "MaxInterval": 6000, 23 | "Enabled": true 24 | }, 25 | "LipSync": { 26 | "Gain": 5.0 27 | }, 28 | "MouseTracking": { 29 | "SmoothTime": 0.15, 30 | "Enabled": true 31 | }, 32 | "AutoBreath": { 33 | "Enabled": true 34 | }, 35 | "ExtraMotion": { 36 | "Enabled": true 37 | }, 38 | "Accelerometer": { 39 | "Enabled": true 40 | }, 41 | "Microphone": {}, 42 | "Transform": {}, 43 | "FaceTracking": { 44 | "Enabled": true 45 | }, 46 | "HandTracking": {}, 47 | "ParamValue": {}, 48 | "PartOpacity": {}, 49 | "ArtmeshOpacity": {}, 50 | "ArtmeshColor": {}, 51 | 
"ArtmeshCulling": { 52 | "DefaultMode": 0 53 | }, 54 | "IntimacySystem": {} 55 | }, 56 | "Options": { 57 | "TexType": 0 58 | } 59 | } -------------------------------------------------------------------------------- /Live2d_env/pachirisu anime girl - top half.model3.json: -------------------------------------------------------------------------------- 1 | { 2 | "Version": 3, 3 | "Type": 0, 4 | "FileReferences": { 5 | "Moc": "pachirisu anime girl - top half.moc3", 6 | "Textures": [ 7 | "pachirisu anime girl - top half.4096/texture_00.png" 8 | ], 9 | "Physics": "pachirisu anime girl - top half.physics3.json", 10 | "PhysicsV2": { 11 | "File": "pachirisu anime girl - top half.physics3.json" 12 | } 13 | }, 14 | "Controllers": { 15 | "ParamHit": { 16 | "Enabled": true 17 | }, 18 | "ParamLoop": { 19 | "Enabled": true 20 | }, 21 | "KeyTrigger": { 22 | "Enabled": true 23 | }, 24 | "ParamTrigger": { 25 | "Enabled": true 26 | }, 27 | "AreaTrigger": { 28 | "Enabled": true 29 | }, 30 | "HandTrigger": { 31 | "Enabled": true 32 | }, 33 | "EyeBlink": { 34 | "MinInterval": 500, 35 | "MaxInterval": 6000, 36 | "Enabled": true 37 | }, 38 | "LipSync": { 39 | "Gain": 5.0, 40 | "Enabled": true 41 | }, 42 | "MouseTracking": { 43 | "SmoothTime": 0.15, 44 | "Enabled": true 45 | }, 46 | "AutoBreath": { 47 | "Enabled": true 48 | }, 49 | "ExtraMotion": { 50 | "Enabled": true 51 | }, 52 | "Accelerometer": { 53 | "Enabled": true 54 | }, 55 | "Microphone": { 56 | "Enabled": true 57 | }, 58 | "Transform": {}, 59 | "FaceTracking": { 60 | "Enabled": true 61 | }, 62 | "HandTracking": { 63 | "Enabled": true 64 | }, 65 | "ParamValue": { 66 | "Enabled": true 67 | }, 68 | "PartOpacity": { 69 | "Enabled": true 70 | }, 71 | "ArtmeshOpacity": { 72 | "Enabled": true 73 | }, 74 | "ArtmeshColor": { 75 | "Enabled": true 76 | }, 77 | "ArtmeshCulling": { 78 | "DefaultMode": 0 79 | }, 80 | "IntimacySystem": { 81 | "Enabled": true 82 | } 83 | }, 84 | "Options": { 85 | "TexType": 0 86 | } 87 | } -------------------------------------------------------------------------------- /LLM_env/LM Studio/LM Studio.htm: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 44 | 45 | 46 | 49 | 50 | -------------------------------------------------------------------------------- /main.py: -------------------------------------------------------------------------------- 1 | 2 | import threading 3 | import datetime 4 | from TTS_api import TTSAPIManager 5 | from ASR import ASRManager 6 | from TTS import TTSManager 7 | from LLM import LLMManager 8 | from Live2d_animation import Live2DAnimationManager 9 | from config import Config 10 | 11 | class MainManager: 12 | def __init__(self): 13 | 14 | # Initialize the main manager, integrating TTS_API, TTS, ASR, LLM, and Live2D. 15 | 16 | # Start the TTS API and ensure the API is available. 17 | self.tts_api_manager = TTSAPIManager(Config.SHOW_WINDOW) 18 | api_ready = self.tts_api_manager.start_tts_api() 19 | if not api_ready: 20 | print("TTS API startup failed, program terminated!") 21 | return 22 | 23 | # Initialize other modules 24 | self.asr_manager = ASRManager() 25 | self.tts_manager = TTSManager() 26 | self.llm_manager = LLMManager() 27 | self.live2d_manager = Live2DAnimationManager( 28 | model_path=Config.LIVE2D_MODEL_PATH 29 | ) 30 | 31 | self.history_file = Config.LLM_CONVERSATION_HISTORY 32 | 33 | # Start Live2D window (ensure it keeps running). 
34 | live2d_thread = threading.Thread(target=self.live2d_manager.play_live2d_once) 35 | live2d_thread.start() 36 | 37 | def run(self): 38 | while True: 39 | user_wav = Config.ASR_AUDIO_INPUT 40 | self.asr_manager.record_audio(user_wav) 41 | user_input = self.asr_manager.recognize_speech(user_wav) 42 | print(f">>> {user_input}") 43 | 44 | if user_input.lower() in ("exit。", "quit。", "q。", "结束。", "再见。"): 45 | print("Conversation exited.") 46 | break 47 | 48 | reply = self.llm_manager.chat_once(user_input) 49 | output_wav = self.tts_manager.synthesize(reply) 50 | 51 | self.live2d_manager.play_audio_and_print_mouth(output_wav) 52 | 53 | with open(self.history_file, 'a', encoding='utf-8') as f: 54 | timestamp = datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S") 55 | f.write(f"Time:{timestamp}\n") 56 | f.write(f"User:{user_input}\nNeko:{reply}\n---\n") 57 | if __name__ == "__main__": 58 | main_manager = MainManager() 59 | main_manager.run() 60 | -------------------------------------------------------------------------------- /config.py: -------------------------------------------------------------------------------- 1 | import os 2 | 3 | # 将自己对应的文件路径替换掉下面的配置文件路径中 4 | class Config: 5 | # 项目根目录 6 | PROJECT_ROOT = "E:/PyCharm/project/project1" 7 | 8 | # ASR(自动语音识别)配置 9 | ASR_MODEL_DIR = os.path.join(PROJECT_ROOT, "ASR_env/SenseVoice/models/SenseVoiceSmall") 10 | ASR_AUDIO_INPUT = os.path.join(PROJECT_ROOT, "ASR_env/input_voice/voice.wav") 11 | 12 | # TTS(文本转语音)配置 13 | TTS_API_URL = "http://localhost:8000/" # 该地址为cosyvoice模型自动分配地址,无出错时不改动 14 | TTS_OUTPUT_DIR = os.path.join(PROJECT_ROOT, "TTS_env/output_voice/") 15 | TTS_HISTORY_DIR = os.path.join(PROJECT_ROOT, "TTS_env/voice_history/") 16 | TTS_PROMPT_TEXT = os.path.join(PROJECT_ROOT, "TTS_env/voice_training_sample/text_taiyuan.txt") 17 | TTS_PROMPT_WAV = os.path.join(PROJECT_ROOT, "TTS_env/voice_training_sample/taiyuan.mp3") 18 | 19 | # TTS API 相关 20 | MINICONDA_PATH = "E:/miniconda3" 21 | WEBUI_PYTHON = os.path.join(MINICONDA_PATH, "python.exe") 22 | WEBUI_SCRIPT = os.path.join(PROJECT_ROOT, "TTS_env/CosyVoice/webui.py") 23 | CLEANUP_MODE = "move" # "delete" or "move"; 配置文件清理方式(delete: 删除 | move: 归档) 24 | SHOW_WINDOW = True 25 | 26 | # LLM(大模型)配置 27 | # 根据需要调用的模型填入key 28 | LLM_TMP_DIR = os.path.join(PROJECT_ROOT, "TTS_env/tmp") 29 | LLM_CONVERSATION_HISTORY = os.path.join(PROJECT_ROOT, "LLM_env/conversation_history.txt") 30 | openai_key = "" 31 | deepseek_key = "" 32 | grop_key = "" 33 | online_model = "offline" # "online" or "offline" ; 使用本地部署或在线LLM模型(online: 在线模型 | offline: 本地部署模型) 34 | model_choice = "OpenAI" # "OpenAI" or "deepseek" ; 选择LLM模型(OpenAI | deepseek) 35 | # 当使用LM Studio进行本地部署LLM时,先下载好需要加载的模型,然后加载完成 36 | # 查看LM Studio右侧的API Usage页面,找到自己的 API identifier(model name) 例如:deepseek-r1-distill-qwen-14b 37 | # 接下来查看自己的local server,例如:http://127.0.0.1:1234 38 | # 修改下面的两个变量 39 | model_name = "" # "deepseek-r1-distill-qwen-14b" 40 | api_url = "http://127.0.0.1:1234/v1/chat/completions" # 只需要修改前面的网址部分 41 | 42 | 43 | # Live2D 配置 44 | LIVE2D_MODEL_PATH = os.path.join(PROJECT_ROOT, "Live2d_env/pachirisu anime girl - top half.model3.json") 45 | 46 | # WebUI 相关配置 47 | WEBUI_SAVE_DIR = os.path.join(PROJECT_ROOT, "TTS_env/output_voice/") 48 | WEBUI_HISTORY_DIR = os.path.join(PROJECT_ROOT, "TTS_env/voice_history/") 49 | WEBUI_MODEL_DIR = os.path.join(PROJECT_ROOT, "TTS_env/CosyVoice/pretrained_models/CosyVoice2-0.5B") 50 | 51 | # 可用于打印检查配置 52 | if __name__ == "__main__": 53 | for attr in dir(Config): 54 | if not attr.startswith("__"): 55 | 
print(f"{attr} = {getattr(Config, attr)}") 56 | -------------------------------------------------------------------------------- /ASR.py: -------------------------------------------------------------------------------- 1 | 2 | import time 3 | import wave 4 | import keyboard 5 | import pyaudio 6 | from funasr import AutoModel 7 | from funasr.utils.postprocess_utils import rich_transcription_postprocess 8 | from config import Config 9 | 10 | class ASRManager: 11 | def __init__(self, model_dir=Config.ASR_MODEL_DIR, device="cuda:0"): 12 | 13 | # 初始化 ASR 语音识别管理器 14 | # param model_dir: 语音识别模型路径 15 | # param device: 使用的计算设备(默认为 GPU) 16 | 17 | self.model = AutoModel( 18 | model=model_dir, 19 | trust_remote_code=False, 20 | device=device, 21 | disable_update=True 22 | ) 23 | self.sample_rate = 44100 24 | self.channels = 1 25 | self.chunk = 1024 26 | self.format = pyaudio.paInt16 27 | 28 | def record_audio(self, output_wav_file): 29 | 30 | # 录音功能,按住 `CTRL` 说话,按 `ALT` 结束录音。 31 | # param output_wav_file: 录制的音频文件路径 32 | 33 | p = pyaudio.PyAudio() 34 | stream = p.open( 35 | format=self.format, 36 | channels=self.channels, 37 | rate=self.sample_rate, 38 | input=True, 39 | frames_per_buffer=self.chunk 40 | ) 41 | 42 | print("[CTRL键] 开口...") 43 | keyboard.wait('ctrl') 44 | print("讲话中... [ALT键] 结束...") 45 | 46 | frames = [] 47 | while True: 48 | data = stream.read(self.chunk) 49 | frames.append(data) 50 | if keyboard.is_pressed('alt'): 51 | print("录音结束,正在处理...") 52 | break 53 | time.sleep(0.01) 54 | 55 | stream.stop_stream() 56 | stream.close() 57 | p.terminate() 58 | 59 | # 保存音频到文件 60 | with wave.open(output_wav_file, 'wb') as wf: 61 | wf.setnchannels(self.channels) 62 | wf.setsampwidth(p.get_sample_size(self.format)) 63 | wf.setframerate(self.sample_rate) 64 | wf.writeframes(b''.join(frames)) 65 | 66 | def recognize_speech(self, wav_path): 67 | start_time = time.time() 68 | 69 | # 进行语音识别,将音频转换为文本。 70 | # param wav_path: 音频文件路径 71 | # return: 识别出的文本 72 | 73 | res = self.model.generate( 74 | input=wav_path, 75 | language="auto", 76 | use_itn=True, 77 | batch_size_s=60, 78 | merge_vad=True, 79 | merge_length_s=15, 80 | ) 81 | print(f"ASR 识别耗时: {time.time() - start_time:.2f} 秒") 82 | return rich_transcription_postprocess(res[0]["text"]) 83 | 84 | 85 | if __name__ == "__main__": 86 | asr_manager = ASRManager() 87 | audio_file = Config.ASR_AUDIO_INPUT 88 | 89 | # 录音 90 | asr_manager.record_audio(audio_file) 91 | 92 | # 识别语音 93 | recognized_text = asr_manager.recognize_speech(audio_file) 94 | print(f"识别结果: {recognized_text}") 95 | -------------------------------------------------------------------------------- /TTS_api.py: -------------------------------------------------------------------------------- 1 | 2 | import subprocess 3 | import time 4 | import requests 5 | from config import Config 6 | import os 7 | 8 | class TTSAPIManager: 9 | def __init__(self, show_window=Config.SHOW_WINDOW): 10 | # 初始化 TTS API 管理器:param show_window: 是否显示 TTS API 窗口 11 | self.webui_python = Config.WEBUI_PYTHON 12 | self.webui_script = Config.WEBUI_SCRIPT 13 | self.api_url = Config.TTS_API_URL 14 | self.timeout = 300 # 最大等待时间(秒) 15 | self.show_window = show_window 16 | self.env = self._configure_env() 17 | 18 | def _configure_env(self): 19 | # 配置 Conda 环境变量 20 | env = os.environ.copy() 21 | env["CONDA_PREFIX"] = Config.MINICONDA_PATH 22 | env["PATH"] = f"{Config.MINICONDA_PATH}/Scripts;{Config.MINICONDA_PATH}/Library/bin;{env['PATH']}" 23 | env["PYTHONPATH"] = env.get("PYTHONPATH", "") + 
f";{Config.PROJECT_ROOT}/TTS_env/CosyVoice/third_party/Matcha-TTS" 24 | env["PATH"] += f";{Config.PROJECT_ROOT}/TTS_env/CosyVoice/third_party/Matcha-TTS" 25 | return env 26 | 27 | def start_tts_api(self): 28 | # 启动 TTS API 并等待其加载 29 | print("启动 webui.py,并确保 Conda 变量和 `pretrained_models` 目录正确...") 30 | 31 | try: 32 | if self.show_window: 33 | # 创建新窗口运行 WebUI 34 | self.webui_process = subprocess.Popen( 35 | [self.webui_python, self.webui_script], 36 | env=self.env, 37 | stdout=None, 38 | stderr=None, 39 | creationflags=subprocess.CREATE_NEW_CONSOLE # 在新窗口中运行 40 | ) 41 | else: 42 | # 隐藏窗口运行 WebUI 43 | self.webui_process = subprocess.Popen( 44 | [self.webui_python, self.webui_script], 45 | env=self.env, 46 | stdout=None, 47 | stderr=None, 48 | creationflags=subprocess.CREATE_NO_WINDOW # 隐藏窗口 49 | ) 50 | 51 | print("webui.py 已启动,等待 API 加载...") 52 | 53 | start_time = time.time() 54 | while time.time() - start_time < self.timeout: 55 | if self.is_api_available(): 56 | print("API 启动成功!继续运行主程序...") 57 | return True 58 | time.sleep(5) 59 | 60 | print("API 启动超时,可能无法正常工作。") 61 | return False 62 | 63 | except Exception as e: 64 | print(f"启动失败,错误信息: {e}") 65 | return False 66 | 67 | def is_api_available(self): 68 | # 检查 TTS API 是否可用 69 | try: 70 | response = requests.get(self.api_url, timeout=5) 71 | return response.status_code == 200 72 | except requests.exceptions.ConnectionError: 73 | return False 74 | except requests.exceptions.Timeout: 75 | return False 76 | 77 | -------------------------------------------------------------------------------- /TTS.py: -------------------------------------------------------------------------------- 1 | 2 | import os 3 | import time 4 | import shutil 5 | from gradio_client import Client, handle_file 6 | import pygame 7 | from config import Config 8 | 9 | 10 | class TTSManager: 11 | def __init__(self, api_url=Config.TTS_API_URL): 12 | # 初始化 TTS 管理器:param api_url: TTS 服务器 API 地址 13 | self.api_url = api_url 14 | self.client = Client(api_url) 15 | self.output_dir = Config.TTS_OUTPUT_DIR 16 | self.history_dir = Config.TTS_HISTORY_DIR 17 | self.prompt_text_path = Config.TTS_PROMPT_TEXT 18 | self.prompt_wav_path = Config.TTS_PROMPT_WAV 19 | 20 | # 确保目录存在 21 | os.makedirs(self.output_dir, exist_ok=True) 22 | os.makedirs(self.history_dir, exist_ok=True) 23 | 24 | def clear_output_directory(self): 25 | # 在每次生成 TTS 音频之前,先检查 output_voice 目录是否有旧文件,如果有,则移动到 voice_history 目录,确保目录下只有最新的音频。 26 | pygame.mixer.init() 27 | pygame.mixer.music.stop() 28 | pygame.mixer.quit() 29 | audio_files = [f for f in os.listdir(self.output_dir) if f.endswith(".wav")] 30 | 31 | if not audio_files: 32 | return # 没有文件需要移动 33 | 34 | for file in audio_files: 35 | old_path = os.path.join(self.output_dir, file) 36 | new_path = os.path.join(self.history_dir, file) 37 | 38 | try: 39 | shutil.move(old_path, new_path) 40 | # print(f"旧音频文件已归档: {file} -> {self.history_dir}") 41 | except Exception as e: 42 | print(f"无法移动 {file} 到历史目录: {e}") 43 | 44 | def synthesize(self, text, mode="3s极速复刻"): 45 | # 调用 TTS 生成语音,并确保 output_voice 目录是空的 46 | # param text: 要转换为语音的文本 47 | # param mode: TTS 模式(默认 3s 极速复刻) 48 | # return: 生成的音频文件路径 49 | 50 | # 清理 output_voice 目录 51 | self.clear_output_directory() 52 | 53 | # 读取语音克隆样本文本 54 | with open(self.prompt_text_path, "r", encoding="utf-8") as file: 55 | prompt_text = file.read() 56 | 57 | start_time = time.time() 58 | self.client.predict( 59 | tts_text=text, 60 | mode_checkbox_group=mode, 61 | sft_dropdown="", 62 | prompt_text=prompt_text, 63 | 
prompt_wav_upload=handle_file(self.prompt_wav_path), 64 | prompt_wav_record=handle_file(self.prompt_wav_path), 65 | instruct_text="", 66 | seed=0, 67 | stream=False, 68 | speed=1, 69 | api_name="/generate_audio" 70 | ) 71 | print(f"TTS 处理耗时: {time.time() - start_time:.2f} 秒") 72 | 73 | # 获取最新的音频文件 74 | return self.get_latest_audio() 75 | 76 | def get_latest_audio(self): 77 | # 获取 output_voice 目录下最新生成的音频文件 78 | # return: 最新音频文件路径或 None 79 | audio_files = [f for f in os.listdir(self.output_dir) if f.endswith(".wav")] 80 | 81 | if not audio_files: 82 | print("没有找到音频文件!") 83 | return None 84 | 85 | # 按修改时间排序,取最新的 86 | audio_files.sort(key=lambda x: os.path.getmtime(os.path.join(self.output_dir, x)), reverse=True) 87 | latest_audio = os.path.join(self.output_dir, audio_files[0]) 88 | 89 | # print(f"最新音频文件: {latest_audio}") 90 | return latest_audio 91 | -------------------------------------------------------------------------------- /Live2d_env/pachirisu anime girl - top half.cdi3.json: -------------------------------------------------------------------------------- 1 | { 2 | "Version": 3, 3 | "Parameters": [ 4 | { 5 | "Id": "ParamAngleX", 6 | "GroupId": "ParamGroup", 7 | "Name": "Angle X" 8 | }, 9 | { 10 | "Id": "ParamAngleY", 11 | "GroupId": "ParamGroup", 12 | "Name": "Angle Y" 13 | }, 14 | { 15 | "Id": "ParamAngleZ", 16 | "GroupId": "ParamGroup", 17 | "Name": "Angle Z" 18 | }, 19 | { 20 | "Id": "ParamEyeLOpen", 21 | "GroupId": "ParamGroup2", 22 | "Name": "EyeL Open" 23 | }, 24 | { 25 | "Id": "ParamEyeLSmile", 26 | "GroupId": "ParamGroup2", 27 | "Name": "EyeL Smile" 28 | }, 29 | { 30 | "Id": "ParamEyeROpen", 31 | "GroupId": "ParamGroup2", 32 | "Name": "EyeR Open" 33 | }, 34 | { 35 | "Id": "ParamEyeRSmile", 36 | "GroupId": "ParamGroup2", 37 | "Name": "EyeR Smile" 38 | }, 39 | { 40 | "Id": "ParamEyeBallX", 41 | "GroupId": "ParamGroup2", 42 | "Name": "Eyeball X" 43 | }, 44 | { 45 | "Id": "ParamEyeBallY", 46 | "GroupId": "ParamGroup2", 47 | "Name": "Eyeball Y" 48 | }, 49 | { 50 | "Id": "ParamBrowLY", 51 | "GroupId": "ParamGroup2", 52 | "Name": "BrowL Y" 53 | }, 54 | { 55 | "Id": "ParamBrowRY", 56 | "GroupId": "ParamGroup2", 57 | "Name": "BrowR Y" 58 | }, 59 | { 60 | "Id": "ParamBrowLX", 61 | "GroupId": "ParamGroup2", 62 | "Name": "BrowL X" 63 | }, 64 | { 65 | "Id": "ParamBrowRX", 66 | "GroupId": "ParamGroup2", 67 | "Name": "BrowR X" 68 | }, 69 | { 70 | "Id": "ParamBrowLAngle", 71 | "GroupId": "ParamGroup2", 72 | "Name": "BrowL Angle" 73 | }, 74 | { 75 | "Id": "ParamBrowRAngle", 76 | "GroupId": "ParamGroup2", 77 | "Name": "BrowR Angle" 78 | }, 79 | { 80 | "Id": "ParamBrowLForm", 81 | "GroupId": "ParamGroup2", 82 | "Name": "BrowL Form" 83 | }, 84 | { 85 | "Id": "ParamBrowRForm", 86 | "GroupId": "ParamGroup2", 87 | "Name": "BrowR Form" 88 | }, 89 | { 90 | "Id": "ParamMouthForm", 91 | "GroupId": "ParamGroup3", 92 | "Name": "Mouth Form" 93 | }, 94 | { 95 | "Id": "ParamMouthOpenY", 96 | "GroupId": "ParamGroup3", 97 | "Name": "Mouth Open" 98 | }, 99 | { 100 | "Id": "ParamBodyAngleX", 101 | "GroupId": "ParamGroup4", 102 | "Name": "Body X" 103 | }, 104 | { 105 | "Id": "ParamBodyAngleY", 106 | "GroupId": "ParamGroup4", 107 | "Name": "Body Y" 108 | }, 109 | { 110 | "Id": "ParamBodyAngleZ", 111 | "GroupId": "ParamGroup4", 112 | "Name": "Body Z" 113 | }, 114 | { 115 | "Id": "ParamBreath", 116 | "GroupId": "ParamGroup4", 117 | "Name": "Breath" 118 | }, 119 | { 120 | "Id": "ParamHairFront", 121 | "GroupId": "ParamGroup5", 122 | "Name": "Hair Move Front" 123 | }, 124 | { 125 | "Id": "ParamHairSide", 126 | 
"GroupId": "ParamGroup5", 127 | "Name": "Hair Move Side" 128 | }, 129 | { 130 | "Id": "ParamHairBack", 131 | "GroupId": "ParamGroup5", 132 | "Name": "Hair Move Back" 133 | }, 134 | { 135 | "Id": "AhogeTwitch", 136 | "GroupId": "ParamGroup5", 137 | "Name": "Ahoge Twitch" 138 | }, 139 | { 140 | "Id": "RibbonPhysics", 141 | "GroupId": "ParamGroup5", 142 | "Name": "Ribbon Physics" 143 | }, 144 | { 145 | "Id": "ParamCheek", 146 | "GroupId": "ParamGroup6", 147 | "Name": "Cheek" 148 | }, 149 | { 150 | "Id": "Param", 151 | "GroupId": "ParamGroup6", 152 | "Name": "Ears Twitch" 153 | } 154 | ], 155 | "ParameterGroups": [ 156 | { 157 | "Id": "ParamGroup", 158 | "GroupId": "", 159 | "Name": "XYZ" 160 | }, 161 | { 162 | "Id": "ParamGroup2", 163 | "GroupId": "", 164 | "Name": "Eyes" 165 | }, 166 | { 167 | "Id": "ParamGroup3", 168 | "GroupId": "", 169 | "Name": "Mouth" 170 | }, 171 | { 172 | "Id": "ParamGroup4", 173 | "GroupId": "", 174 | "Name": "Body" 175 | }, 176 | { 177 | "Id": "ParamGroup5", 178 | "GroupId": "", 179 | "Name": "Physics" 180 | }, 181 | { 182 | "Id": "ParamGroup6", 183 | "GroupId": "", 184 | "Name": "Face" 185 | } 186 | ], 187 | "Parts": [ 188 | { 189 | "Id": "Part17", 190 | "Name": "pachirisu_anime_girl_edit.psd (Corresponding layer not found)" 191 | }, 192 | { 193 | "Id": "Part", 194 | "Name": "Hair" 195 | }, 196 | { 197 | "Id": "Part2", 198 | "Name": "Eye R" 199 | }, 200 | { 201 | "Id": "eyes", 202 | "Name": "Eye L" 203 | }, 204 | { 205 | "Id": "mouth", 206 | "Name": "Mouth" 207 | }, 208 | { 209 | "Id": "Part3", 210 | "Name": "Body" 211 | }, 212 | { 213 | "Id": "PartSketch0", 214 | "Name": "[ Guide Image]" 215 | } 216 | ], 217 | "CombinedParameters": [ 218 | [ 219 | "ParamAngleX", 220 | "ParamAngleY" 221 | ], 222 | [ 223 | "ParamEyeBallX", 224 | "ParamEyeBallY" 225 | ], 226 | [ 227 | "ParamMouthForm", 228 | "ParamMouthOpenY" 229 | ] 230 | ] 231 | } -------------------------------------------------------------------------------- /LLM.py: -------------------------------------------------------------------------------- 1 | 2 | import os 3 | import shutil 4 | import time 5 | import requests 6 | from openai import OpenAI 7 | from config import Config 8 | 9 | class LLMManager: 10 | def __init__(self): 11 | 12 | # 确定 online_model 为线上还是本地 13 | if Config.online_model == "online": 14 | online_model = 1 15 | elif Config.online_model == "offline": 16 | online_model = 0 17 | else: 18 | raise ValueError(f"配置错误: online_model 必须是 'online' 或 'offline',但你提供了 {Config.online_model}") 19 | 20 | # 确定 model_choice 21 | if Config.model_choice == "OpenAI": 22 | model_choice = 1 23 | elif Config.model_choice == "deepseek": 24 | model_choice = 2 25 | else: 26 | raise ValueError(f"配置错误: model_choice 只能是 'OpenAI' 或 'deepseek',但你提供了 {Config.model_choice}") 27 | 28 | # 初始化 LLM 对话管理器 29 | # param online_model: 是否使用在线模型(0 = 本地,1 = 在线) 30 | # param model_choice: 选择在线 LLM(1 = OpenAI GPT-4, 2 = DeepSeek) 31 | 32 | self.online_model = online_model 33 | self.model_choice = model_choice 34 | self.conversation = [ 35 | {"role": "system", 36 | "content": "你是一位知识渊博的猫娘,致力于帮助我学习知识。你也可以与我闲聊,但请尽量简洁,像真正的老师一样回答问题。"}, 37 | {"role": "assistant", "content": "不用输出分隔符,如'#'、'*'、'-'。"} 38 | ] 39 | self.conversation_summary = "" 40 | self.user_message_count = 0 41 | self.tmp_path = "E:/PyCharm/project/project1/TTS_env/tmp" 42 | os.makedirs(self.tmp_path, exist_ok=True) 43 | 44 | if online_model == 0: 45 | self.model_name = Config.model_name # 确定本地模型 46 | self.api_url = Config.api_url 47 | elif online_model == 1: 48 | if model_choice == 
1: 49 | self.client = OpenAI(api_key=Config.openai_key) 50 | self.model_name = "gpt-4o-2024-11-20" 51 | elif model_choice == 2: 52 | self.client = OpenAI(api_key=Config.deepseek_key, base_url="https://api.deepseek.com") 53 | self.model_name = "deepseek-chat" 54 | 55 | def model_chat_completion(self, messages): 56 | 57 | # 调用 LLM 进行对话 58 | # param messages: 对话列表 59 | # return: 生成的回复文本 60 | if Config.online_model == "online": 61 | response = self.client.chat.completions.create( 62 | model=self.model_name, 63 | messages=messages, 64 | stream=False 65 | ) 66 | return response.choices[0].message.content.strip() 67 | elif Config.online_model == "offline": 68 | data = { 69 | "model": self.model_name, 70 | "messages": self.conversation} 71 | # 请求头(确保 `User-Agent` 避免 Python 请求被拦截) 72 | headers = {"Content-Type": "application/json","User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"} # 伪装浏览器请求 73 | # 使用 `json=data`(避免 `json.dumps()` 出现错误) 74 | response = requests.post(self.api_url, headers=headers, json=data) 75 | # 解析返回结果 76 | if response.status_code == 200: 77 | result = response.json() 78 | # print("回复:", result["choices"][0]["message"]["content"]) 79 | return result["choices"][0]["message"]["content"] 80 | else: 81 | print(f"请求失败,状态码: {response.status_code}") 82 | print("错误信息:", response.text) 83 | 84 | 85 | 86 | 87 | def summarize_conversation(self): 88 | 89 | # 使用 LLM 对对话进行摘要 90 | # return: 摘要文本 91 | 92 | summary_prompt = [ 93 | {"role": "system", "content": "你是一只专业的对话摘要工具。请用简洁的语言总结以下对话的主要内容。"}, 94 | *self.conversation 95 | ] 96 | return self.model_chat_completion(summary_prompt) 97 | 98 | def chat_once(self, user_input): 99 | 100 | # 进行一次对话(用户输入 → LLM 生成回复) 101 | # param user_input: 用户输入的文本 102 | # return: 生成的回复文本 103 | 104 | start_time = time.time() 105 | self.conversation.append({"role": "user", "content": user_input}) 106 | self.user_message_count += 1 107 | 108 | if self.user_message_count % 5 == 0: 109 | new_summary = self.summarize_conversation() 110 | if self.conversation_summary: 111 | self.conversation_summary += "\n" + new_summary 112 | else: 113 | self.conversation_summary = new_summary 114 | 115 | # 清理临时目录 116 | shutil.rmtree(self.tmp_path) 117 | os.makedirs(self.tmp_path, exist_ok=True) 118 | 119 | self.conversation = [ 120 | {"role": "system", 121 | "content": "你是一位知识渊博的猫娘,致力于帮助我学习知识。你也可以与我闲聊,但请尽量简洁。"}, 122 | {"role": "system", "content": f"这是之前对话的摘要:\n{self.conversation_summary}\n请继续与我对话。"}, 123 | {"role": "assistant", "content": "不用输出分隔符,如'#'、'*'、'-'。"}, 124 | {"role": "user", "content": user_input} 125 | ] 126 | 127 | reply = self.model_chat_completion(self.conversation) 128 | self.conversation.append({"role": "assistant", "content": reply}) 129 | print(f"LLM 思考耗时: {time.time() - start_time:.2f} 秒") 130 | return reply 131 | 132 | 133 | if __name__ == "__main__": 134 | llm_manager = LLMManager() 135 | 136 | while True: 137 | user_input = input("你: ") 138 | if user_input.lower() in ("exit。", "quit。", "q。", "结束。", "再见。"): 139 | print("已退出对话。") 140 | break 141 | 142 | reply = llm_manager.chat_once(user_input) 143 | print(f"猫娘: {reply}") 144 | -------------------------------------------------------------------------------- /Live2d_animation.py: -------------------------------------------------------------------------------- 1 | 2 | import time 3 | import glfw 4 | import OpenGL.GL as gl 5 | import pyautogui 6 | import pygame 7 | import ctypes 8 | from pydub import AudioSegment 9 | from live2d.v3 import LAppModel, init, dispose, glewInit, clearBuffer 10 | from config import 
Config 11 | 12 | # Live2D 窗口设置 13 | GWL_EXSTYLE = -20 14 | WS_EX_LAYERED = 0x00080000 15 | WS_EX_TRANSPARENT = 0x00000020 16 | 17 | # 眨眼状态 18 | BLINK_STATE_NONE = 0 19 | BLINK_STATE_CLOSING = 1 20 | BLINK_STATE_CLOSED = 2 21 | BLINK_STATE_OPENING = 3 22 | 23 | class Live2DAnimationManager: 24 | def __init__(self, model_path, frame_rate=60): 25 | 26 | # 初始化 Live2D 动画管理器 27 | # param model_path: Live2D 模型文件路径(.model3.json) 28 | # param frame_rate: 渲染帧率 29 | 30 | self.model_path = model_path 31 | self.frame_rate = frame_rate 32 | self.mouth_value = 0 33 | self.window = None 34 | self.model = None 35 | self.running = True 36 | 37 | # 鼠标跟随相关参数 38 | self.last_mouse_x, self.last_mouse_y = pyautogui.position() 39 | self.last_move_time = time.time() 40 | self.IDLE_THRESHOLD = 3.0 41 | 42 | self.X_MIN, self.X_MAX = 200, 480 43 | self.Y_MIN, self.Y_MAX = 300, 360 44 | self.center_x_mapped = (self.X_MIN + self.X_MAX) / 2 45 | self.center_y_mapped = (self.Y_MIN + self.Y_MAX) / 2 46 | self.gaze_x = 0.0 47 | self.gaze_y = 0.0 48 | self.GAZE_EASING = 0.02 49 | 50 | def configure_window(self, window, width, height): 51 | 52 | # 配置 GLFW 窗口,使其透明且可穿透鼠标 53 | 54 | hwnd = glfw.get_win32_window(window) 55 | get_window_long = ctypes.windll.user32.GetWindowLongW 56 | set_window_long = ctypes.windll.user32.SetWindowLongW 57 | ex_style = get_window_long(hwnd, GWL_EXSTYLE) 58 | ex_style |= (WS_EX_LAYERED | WS_EX_TRANSPARENT) 59 | set_window_long(hwnd, GWL_EXSTYLE, ex_style) 60 | 61 | glfw.make_context_current(window) 62 | screen_width, screen_height = pyautogui.size() 63 | glfw.set_window_pos(window, 0, screen_height - height) 64 | 65 | def load_live2d_model(self, width, height): 66 | 67 | # 加载 Live2D 模型 68 | 69 | model = LAppModel() 70 | model.LoadModelJson(self.model_path) 71 | model.Resize(width, height) 72 | return model 73 | 74 | def play_live2d_once(self): 75 | 76 | # 创建 Live2D 窗口,并让角色进行渲染(保持运行) 77 | 78 | init() 79 | if not glfw.init(): 80 | print("GLFW 初始化失败!") 81 | return 82 | 83 | glfw.window_hint(glfw.TRANSPARENT_FRAMEBUFFER, glfw.TRUE) 84 | glfw.window_hint(glfw.DECORATED, glfw.FALSE) 85 | glfw.window_hint(glfw.FLOATING, glfw.TRUE) 86 | 87 | window_width, window_height = 800, 600 88 | self.window = glfw.create_window(window_width, window_height, "Live2D Window", None, None) 89 | if not self.window: 90 | print("GLFW 窗口创建失败!") 91 | glfw.terminate() 92 | return 93 | 94 | self.configure_window(self.window, window_width, window_height) 95 | glewInit() 96 | 97 | self.model = self.load_live2d_model(window_width, window_height) 98 | 99 | last_time = time.time() 100 | gl.glClearColor(0.0, 0.0, 0.0, 0.0) 101 | 102 | while self.running and not glfw.window_should_close(self.window): 103 | gl.glClear(gl.GL_COLOR_BUFFER_BIT) 104 | now = time.time() 105 | dt = now - last_time 106 | last_time = now 107 | 108 | width, height = glfw.get_framebuffer_size(self.window) 109 | gl.glViewport(0, 0, width, height) 110 | clearBuffer(0, 0, 0, 0) 111 | 112 | self.model.Update() 113 | self.model.SetParameterValue("ParamMouthOpenY", self.mouth_value, 1) 114 | 115 | self.update_gaze_tracking(width, height) 116 | 117 | self.model.Draw() 118 | glfw.swap_buffers(self.window) 119 | glfw.poll_events() 120 | 121 | pygame.mixer.music.stop() 122 | pygame.mixer.quit() 123 | dispose() 124 | glfw.terminate() 125 | 126 | def update_gaze_tracking(self, width, height): 127 | 128 | # 计算鼠标跟随逻辑,让 Live2D 角色的眼睛和头部跟随鼠标 129 | 130 | screen_x, screen_y = pyautogui.position() 131 | win_x, win_y = glfw.get_window_pos(self.window) 132 | local_mouse_x = screen_x - 
win_x 133 | local_mouse_y = screen_y - win_y 134 | 135 | if (screen_x != self.last_mouse_x) or (screen_y != self.last_mouse_y): 136 | self.last_move_time = time.time() 137 | self.last_mouse_x, self.last_mouse_y = screen_x, screen_y 138 | 139 | if (time.time() - self.last_move_time) < self.IDLE_THRESHOLD: 140 | mapped_x = self.X_MIN + (local_mouse_x / width) * (self.X_MAX - self.X_MIN) 141 | mapped_y = self.Y_MIN + (local_mouse_y / height) * (self.Y_MAX - self.Y_MIN) 142 | target_x = mapped_x 143 | target_y = mapped_y 144 | else: 145 | target_x = self.center_x_mapped 146 | target_y = self.center_y_mapped 147 | self.GAZE_EASING = 0.0004 148 | 149 | self.gaze_x += self.GAZE_EASING * (target_x - self.gaze_x) 150 | self.gaze_y += self.GAZE_EASING * (target_y - self.gaze_y) 151 | self.model.Drag(self.gaze_x, self.gaze_y) 152 | 153 | def extract_volume_array(self, audio_file): 154 | 155 | # 提取音频的音量信息,并归一化用于嘴型同步 156 | 157 | seg = AudioSegment.from_file(audio_file, format="wav") 158 | frame_duration_ms = 1000 / self.frame_rate 159 | num_frames = int(seg.duration_seconds * self.frame_rate) 160 | 161 | volumes = [] 162 | for i in range(num_frames): 163 | start_ms = i * frame_duration_ms 164 | frame_seg = seg[start_ms: start_ms + frame_duration_ms] 165 | rms = frame_seg.rms 166 | volumes.append(rms) 167 | 168 | max_rms = max(volumes) if volumes else 1 169 | volumes = [v / max_rms for v in volumes] # 归一化 170 | return volumes, seg.duration_seconds 171 | 172 | def play_audio_and_print_mouth(self, audio_file): 173 | 174 | # 播放音频并同步嘴型动作 175 | 176 | volume_array, audio_duration = self.extract_volume_array(audio_file) 177 | total_frames = len(volume_array) 178 | 179 | pygame.mixer.init() 180 | pygame.mixer.music.load(audio_file) 181 | pygame.mixer.music.play() 182 | 183 | start_time = time.time() 184 | while True: 185 | current_time = time.time() - start_time 186 | if current_time >= audio_duration: 187 | break 188 | 189 | frame_index = int(current_time * self.frame_rate) 190 | if frame_index >= total_frames: 191 | frame_index = total_frames - 1 192 | 193 | self.mouth_value = volume_array[frame_index] 194 | 195 | pygame.mixer.music.stop() 196 | 197 | 198 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | accelerate==1.4.0 2 | aiofiles==23.2.1 3 | aiohappyeyeballs==2.4.6 4 | aiohttp==3.11.12 5 | aiosignal==1.3.2 6 | anaconda-anon-usage @ file:///C:/b/abs_e8r_zga7xy/croot/anaconda-anon-usage_1732732454901/work 7 | annotated-types @ file:///C:/b/abs_0dmaoyhhj3/croot/annotated-types_1709542968311/work 8 | antlr4-python3-runtime==4.9.3 9 | anyio==4.8.0 10 | archspec @ file:///croot/archspec_1709217642129/work 11 | attrs==25.1.0 12 | audioread==3.0.1 13 | beautifulsoup4==4.13.3 14 | boltons @ file:///C:/b/abs_45_52ughkz/croot/boltons_1737061711836/work 15 | Brotli @ file:///C:/b/abs_c415aux9ra/croot/brotli-split_1736182803933/work 16 | certifi @ file:///C:/b/abs_8a944p1_gn/croot/certifi_1738623753421/work/certifi 17 | cffi @ file:///C:/b/abs_29_b57if3f/croot/cffi_1736184144340/work 18 | charset-normalizer @ file:///croot/charset-normalizer_1721748349566/work 19 | click==8.1.8 20 | colorama @ file:///C:/Users/dev-admin/perseverance-python-buildout/croot/colorama_1699472650914/work 21 | coloredlogs==15.0.1 22 | conda @ file:///D:/bld/conda_1739917047096/work 23 | conda-anaconda-telemetry @ file:///C:/b/abs_4c9llcc5ob/croot/conda-anaconda-telemetry_1736524617431/work 24 | 
conda-anaconda-tos @ file:///C:/b/abs_ceeuq0lee_/croot/conda-anaconda-tos_1739299022910/work 25 | conda-content-trust @ file:///C:/b/abs_bdfatn_wzf/croot/conda-content-trust_1714483201909/work 26 | conda-libmamba-solver @ file:///croot/conda-libmamba-solver_1737733694612/work/src 27 | conda-package-handling @ file:///C:/b/abs_7fz3aferfv/croot/conda-package-handling_1731369038903/work 28 | conda_package_streaming @ file:///C:/b/abs_bdz9vbvbh2/croot/conda-package-streaming_1731366449946/work 29 | conformer==0.3.2 30 | contourpy==1.3.1 31 | cryptography @ file:///C:/b/abs_e2lzchf4i6/croot/cryptography_1732130411942/work 32 | cycler==0.12.1 33 | Cython==3.0.12 34 | decorator==5.1.1 35 | diffusers==0.32.2 36 | distro @ file:///C:/b/abs_71xr36ua5r/croot/distro_1714488282676/work 37 | einops==0.8.1 38 | fastapi==0.115.8 39 | ffmpy==0.5.0 40 | filelock==3.17.0 41 | flatbuffers @ file:///home/conda/feedstock_root/build_artifacts/python-flatbuffers_1739279199749/work 42 | fonttools==4.56.0 43 | frozendict @ file:///C:/b/abs_2alamqss6p/croot/frozendict_1713194885124/work 44 | frozenlist==1.5.0 45 | fsspec==2025.2.0 46 | gdown==5.2.0 47 | gradio==5.16.1 48 | gradio_client==1.7.0 49 | grpcio @ file:///D:/bld/grpc-split_1713388447196/work 50 | grpcio-tools @ file:///D:/bld/grpcio-tools_1713479862547/work 51 | h11==0.14.0 52 | httpcore==1.0.7 53 | httpx==0.28.1 54 | huggingface-hub==0.28.1 55 | humanfriendly==10.0 56 | hydra-core==1.3.2 57 | HyperPyYAML==1.2.2 58 | idna @ file:///C:/b/abs_aad84bnnw5/croot/idna_1714398896795/work 59 | importlib_metadata==8.6.1 60 | importlib_resources==6.5.2 61 | inflect==7.5.0 62 | Jinja2==3.1.5 63 | joblib==1.4.2 64 | jsonpatch @ file:///C:/b/abs_4fdm88t7zi/croot/jsonpatch_1714483974578/work 65 | jsonpointer==2.1 66 | kiwisolver==1.4.8 67 | lazy_loader==0.4 68 | libmambapy @ file:///C:/b/abs_627vsv8bhu/croot/mamba-split_1734469608328/work/libmambapy 69 | librosa==0.10.2.post1 70 | lightning==2.5.0.post0 71 | lightning-utilities==0.12.0 72 | llvmlite==0.44.0 73 | markdown-it-py @ file:///C:/Users/dev-admin/perseverance-python-buildout/croot/markdown-it-py_1699473886965/work 74 | MarkupSafe==2.1.5 75 | matcha==0.3 76 | matplotlib==3.10.0 77 | mdurl @ file:///C:/Users/dev-admin/perseverance-python-buildout/croot/mdurl_1699473506455/work 78 | menuinst @ file:///C:/b/abs_fblttj5gp1/croot/menuinst_1738943438301/work 79 | mkl-service==2.4.0 80 | mkl_fft @ file:///C:/Users/dev-admin/mkl/mkl_fft_1730823082242/work 81 | mkl_random @ file:///C:/Users/dev-admin/mkl/mkl_random_1730822522280/work 82 | modelscope==1.23.0 83 | more-itertools==10.6.0 84 | mpmath==1.3.0 85 | msgpack==1.1.0 86 | multidict==6.1.0 87 | networkx==3.4.2 88 | numba==0.61.0 89 | numpy @ file:///C:/b/abs_c1ywpu18ar/croot/numpy_and_numpy_base_1708638681471/work/dist/numpy-1.26.4-cp312-cp312-win_amd64.whl#sha256=becc06674317799ad0165a939a7613809d0bee9bd328a1e4308c57c39cacf08c 90 | omegaconf==2.3.0 91 | onnx @ file:///C:/b/abs_26fcas53j4/croot/onnx_1722521784627/work 92 | onnxruntime-gpu @ file:///C:/Users/mark/miniforge3/conda-bld/onnxruntime_1735406817872/work/build-ci/Release/dist/onnxruntime_gpu-1.20.1-cp312-cp312-win_amd64.whl#sha256=00277ed6954e6c51eaa62089eee91a9ec2ba8097078adf00994717f8b0de0c1d 93 | openai-whisper==20240930 94 | orjson==3.10.15 95 | packaging @ file:///C:/b/abs_3by6s2fa66/croot/packaging_1734472138782/work 96 | pandas==2.2.3 97 | peft==0.14.0 98 | pillow==11.1.0 99 | platformdirs @ file:///C:/Users/dev-admin/perseverance-python-buildout/croot/platformdirs_1701797392447/work 100 | pluggy @ 
file:///C:/b/abs_dfec_m79vo/croot/pluggy_1733170145382/work 101 | pooch==1.8.2 102 | propcache==0.2.1 103 | protobuf==4.25.3 104 | psutil==7.0.0 105 | pyarrow==19.0.1 106 | pycosat @ file:///C:/b/abs_18nblzzn70/croot/pycosat_1736868434419/work 107 | pycparser @ file:///tmp/build/80754af9/pycparser_1636541352034/work 108 | pydantic @ file:///C:/b/abs_27dx58x550/croot/pydantic_1734736090499/work 109 | pydantic_core @ file:///C:/b/abs_bdosz7qwys/croot/pydantic-core_1734726071532/work 110 | pydub==0.25.1 111 | Pygments @ file:///C:/Users/dev-admin/perseverance-python-buildout/croot/pygments_1699474141968/work 112 | pynini @ file:///D:/bld/pynini_1696660993031/work 113 | pyparsing==3.2.1 114 | pyreadline3==3.5.4 115 | PySocks @ file:///C:/Users/dev-admin/perseverance-python-buildout/croot/pysocks_1699473336188/work 116 | python-dateutil==2.9.0.post0 117 | python-multipart==0.0.20 118 | pytorch-lightning==2.5.0.post0 119 | pytz==2025.1 120 | pyworld==0.3.5 121 | PyYAML==6.0.2 122 | regex==2024.11.6 123 | requests @ file:///C:/b/abs_c3508vg8ez/croot/requests_1731000584867/work 124 | rich @ file:///C:/b/abs_21nw9z7xby/croot/rich_1720637504376/work 125 | ruamel.yaml @ file:///C:/b/abs_0cunwx_ww6/croot/ruamel.yaml_1727980181547/work 126 | ruamel.yaml.clib @ file:///C:/b/abs_5fk8zi6n09/croot/ruamel.yaml.clib_1727769837359/work 127 | ruff==0.9.6 128 | safehttpx==0.1.6 129 | safetensors==0.5.2 130 | scikit-learn==1.6.1 131 | scipy==1.15.2 132 | semantic-version==2.10.0 133 | setuptools==75.8.0 134 | shellingham==1.5.4 135 | six==1.17.0 136 | sniffio==1.3.1 137 | soundfile==0.13.1 138 | soupsieve==2.6 139 | soxr==0.5.0.post1 140 | starlette==0.45.3 141 | sympy==1.13.1 142 | threadpoolctl==3.5.0 143 | tiktoken==0.9.0 144 | tn==0.0.4 145 | tokenizers==0.21.0 146 | tomlkit==0.13.2 147 | torch==2.6.0 148 | torchaudio==2.6.0 149 | torchmetrics==1.6.1 150 | tqdm @ file:///C:/b/abs_0eh9b6xugj/croot/tqdm_1738945553987/work 151 | transformers==4.49.0 152 | truststore @ file:///C:/b/abs_494cm143zh/croot/truststore_1736550137835/work 153 | ttsfrd-dependency @ file:///E:/PyCharm/project/project3/CosyVoice/pretrained_models/CosyVoice-ttsfrd/ttsfrd_dependency-0.1-py3-none-any.whl#sha256=060a53f0650d12839983afdcfb052b049d7cf5c62344a00fee3a7344582aaf6f 154 | typeguard==4.4.2 155 | typer==0.15.1 156 | typing_extensions @ file:///C:/b/abs_0ffjxtihug/croot/typing_extensions_1734714875646/work 157 | tzdata==2025.1 158 | urllib3 @ file:///C:/b/abs_7bst06lizn/croot/urllib3_1737133657081/work 159 | uvicorn==0.34.0 160 | websockets==14.2 161 | WeTextProcessing==1.0.4.1 162 | wget==3.2 163 | wheel==0.45.1 164 | win-inet-pton @ file:///C:/Users/dev-admin/perseverance-python-buildout/croot/win_inet_pton_1699472992992/work 165 | yarl==1.18.3 166 | zipp==3.21.0 167 | zstandard @ file:///C:/b/abs_31t8xmrv_h/croot/zstandard_1731356578015/work 168 | -------------------------------------------------------------------------------- /Live2d_env/pachirisu anime girl - top half.physics3.json: -------------------------------------------------------------------------------- 1 | { 2 | "Version": 3, 3 | "Meta": { 4 | "PhysicsSettingCount": 5, 5 | "TotalInputCount": 15, 6 | "TotalOutputCount": 6, 7 | "VertexCount": 13, 8 | "Fps": 30, 9 | "EffectiveForces": { 10 | "Gravity": { 11 | "X": 0, 12 | "Y": -1 13 | }, 14 | "Wind": { 15 | "X": 0, 16 | "Y": 0 17 | } 18 | }, 19 | "PhysicsDictionary": [ 20 | { 21 | "Id": "PhysicsSetting1", 22 | "Name": "Ears Twitch" 23 | }, 24 | { 25 | "Id": "PhysicsSetting2", 26 | "Name": "Front Hair" 27 | }, 28 | { 29 | 
"Id": "PhysicsSetting3", 30 | "Name": "Side Hair" 31 | }, 32 | { 33 | "Id": "PhysicsSetting4", 34 | "Name": "Back Hair" 35 | }, 36 | { 37 | "Id": "PhysicsSetting5", 38 | "Name": "Ribbon" 39 | } 40 | ] 41 | }, 42 | "PhysicsSettings": [ 43 | { 44 | "Id": "PhysicsSetting1", 45 | "Input": [ 46 | { 47 | "Source": { 48 | "Target": "Parameter", 49 | "Id": "ParamEyeROpen" 50 | }, 51 | "Weight": 100, 52 | "Type": "X", 53 | "Reflect": false 54 | } 55 | ], 56 | "Output": [ 57 | { 58 | "Destination": { 59 | "Target": "Parameter", 60 | "Id": "Param" 61 | }, 62 | "VertexIndex": 1, 63 | "Scale": 0.567, 64 | "Weight": 100, 65 | "Type": "Angle", 66 | "Reflect": false 67 | }, 68 | { 69 | "Destination": { 70 | "Target": "Parameter", 71 | "Id": "AhogeTwitch" 72 | }, 73 | "VertexIndex": 1, 74 | "Scale": 1, 75 | "Weight": 100, 76 | "Type": "Angle", 77 | "Reflect": false 78 | } 79 | ], 80 | "Vertices": [ 81 | { 82 | "Position": { 83 | "X": 0, 84 | "Y": 0 85 | }, 86 | "Mobility": 1, 87 | "Delay": 1, 88 | "Acceleration": 1, 89 | "Radius": 0 90 | }, 91 | { 92 | "Position": { 93 | "X": 0, 94 | "Y": 10.8 95 | }, 96 | "Mobility": 0.59, 97 | "Delay": 1, 98 | "Acceleration": 1, 99 | "Radius": 10.8 100 | } 101 | ], 102 | "Normalization": { 103 | "Position": { 104 | "Minimum": -10, 105 | "Default": 0, 106 | "Maximum": 10 107 | }, 108 | "Angle": { 109 | "Minimum": -10, 110 | "Default": 0, 111 | "Maximum": 10 112 | } 113 | } 114 | }, 115 | { 116 | "Id": "PhysicsSetting2", 117 | "Input": [ 118 | { 119 | "Source": { 120 | "Target": "Parameter", 121 | "Id": "ParamAngleX" 122 | }, 123 | "Weight": 60, 124 | "Type": "X", 125 | "Reflect": false 126 | }, 127 | { 128 | "Source": { 129 | "Target": "Parameter", 130 | "Id": "ParamAngleZ" 131 | }, 132 | "Weight": 60, 133 | "Type": "Angle", 134 | "Reflect": false 135 | }, 136 | { 137 | "Source": { 138 | "Target": "Parameter", 139 | "Id": "ParamBodyAngleX" 140 | }, 141 | "Weight": 40, 142 | "Type": "X", 143 | "Reflect": false 144 | }, 145 | { 146 | "Source": { 147 | "Target": "Parameter", 148 | "Id": "ParamBodyAngleZ" 149 | }, 150 | "Weight": 40, 151 | "Type": "Angle", 152 | "Reflect": false 153 | } 154 | ], 155 | "Output": [ 156 | { 157 | "Destination": { 158 | "Target": "Parameter", 159 | "Id": "ParamHairFront" 160 | }, 161 | "VertexIndex": 1, 162 | "Scale": 1, 163 | "Weight": 100, 164 | "Type": "Angle", 165 | "Reflect": false 166 | } 167 | ], 168 | "Vertices": [ 169 | { 170 | "Position": { 171 | "X": 0, 172 | "Y": 0 173 | }, 174 | "Mobility": 1, 175 | "Delay": 1, 176 | "Acceleration": 1, 177 | "Radius": 0 178 | }, 179 | { 180 | "Position": { 181 | "X": 0, 182 | "Y": 15 183 | }, 184 | "Mobility": 0.86, 185 | "Delay": 0.8, 186 | "Acceleration": 1.5, 187 | "Radius": 15 188 | } 189 | ], 190 | "Normalization": { 191 | "Position": { 192 | "Minimum": -10, 193 | "Default": 0, 194 | "Maximum": 10 195 | }, 196 | "Angle": { 197 | "Minimum": -10, 198 | "Default": 0, 199 | "Maximum": 10 200 | } 201 | } 202 | }, 203 | { 204 | "Id": "PhysicsSetting3", 205 | "Input": [ 206 | { 207 | "Source": { 208 | "Target": "Parameter", 209 | "Id": "ParamAngleX" 210 | }, 211 | "Weight": 60, 212 | "Type": "X", 213 | "Reflect": false 214 | }, 215 | { 216 | "Source": { 217 | "Target": "Parameter", 218 | "Id": "ParamAngleZ" 219 | }, 220 | "Weight": 60, 221 | "Type": "Angle", 222 | "Reflect": false 223 | }, 224 | { 225 | "Source": { 226 | "Target": "Parameter", 227 | "Id": "ParamBodyAngleX" 228 | }, 229 | "Weight": 40, 230 | "Type": "X", 231 | "Reflect": false 232 | }, 233 | { 234 | "Source": { 235 | "Target": 
"Parameter", 236 | "Id": "ParamBodyAngleZ" 237 | }, 238 | "Weight": 40, 239 | "Type": "Angle", 240 | "Reflect": false 241 | } 242 | ], 243 | "Output": [ 244 | { 245 | "Destination": { 246 | "Target": "Parameter", 247 | "Id": "ParamHairSide" 248 | }, 249 | "VertexIndex": 1, 250 | "Scale": 1, 251 | "Weight": 100, 252 | "Type": "Angle", 253 | "Reflect": false 254 | } 255 | ], 256 | "Vertices": [ 257 | { 258 | "Position": { 259 | "X": 0, 260 | "Y": 0 261 | }, 262 | "Mobility": 1, 263 | "Delay": 1, 264 | "Acceleration": 1, 265 | "Radius": 0 266 | }, 267 | { 268 | "Position": { 269 | "X": 0, 270 | "Y": 10 271 | }, 272 | "Mobility": 0.84, 273 | "Delay": 0.8, 274 | "Acceleration": 1.5, 275 | "Radius": 10 276 | }, 277 | { 278 | "Position": { 279 | "X": 0, 280 | "Y": 18 281 | }, 282 | "Mobility": 0.76, 283 | "Delay": 0.8, 284 | "Acceleration": 1.5, 285 | "Radius": 8 286 | } 287 | ], 288 | "Normalization": { 289 | "Position": { 290 | "Minimum": -10, 291 | "Default": 0, 292 | "Maximum": 10 293 | }, 294 | "Angle": { 295 | "Minimum": -10, 296 | "Default": 0, 297 | "Maximum": 10 298 | } 299 | } 300 | }, 301 | { 302 | "Id": "PhysicsSetting4", 303 | "Input": [ 304 | { 305 | "Source": { 306 | "Target": "Parameter", 307 | "Id": "ParamAngleX" 308 | }, 309 | "Weight": 60, 310 | "Type": "X", 311 | "Reflect": false 312 | }, 313 | { 314 | "Source": { 315 | "Target": "Parameter", 316 | "Id": "ParamAngleZ" 317 | }, 318 | "Weight": 60, 319 | "Type": "Angle", 320 | "Reflect": false 321 | }, 322 | { 323 | "Source": { 324 | "Target": "Parameter", 325 | "Id": "ParamBodyAngleX" 326 | }, 327 | "Weight": 40, 328 | "Type": "X", 329 | "Reflect": false 330 | }, 331 | { 332 | "Source": { 333 | "Target": "Parameter", 334 | "Id": "ParamBodyAngleZ" 335 | }, 336 | "Weight": 40, 337 | "Type": "Angle", 338 | "Reflect": false 339 | } 340 | ], 341 | "Output": [ 342 | { 343 | "Destination": { 344 | "Target": "Parameter", 345 | "Id": "ParamHairBack" 346 | }, 347 | "VertexIndex": 1, 348 | "Scale": 1, 349 | "Weight": 100, 350 | "Type": "Angle", 351 | "Reflect": false 352 | } 353 | ], 354 | "Vertices": [ 355 | { 356 | "Position": { 357 | "X": 0, 358 | "Y": 0 359 | }, 360 | "Mobility": 1, 361 | "Delay": 1, 362 | "Acceleration": 1, 363 | "Radius": 0 364 | }, 365 | { 366 | "Position": { 367 | "X": 0, 368 | "Y": 10 369 | }, 370 | "Mobility": 0.85, 371 | "Delay": 0.9, 372 | "Acceleration": 1, 373 | "Radius": 10 374 | }, 375 | { 376 | "Position": { 377 | "X": 0, 378 | "Y": 20 379 | }, 380 | "Mobility": 0.9, 381 | "Delay": 0.9, 382 | "Acceleration": 1, 383 | "Radius": 10 384 | }, 385 | { 386 | "Position": { 387 | "X": 0, 388 | "Y": 28 389 | }, 390 | "Mobility": 0.9, 391 | "Delay": 0.9, 392 | "Acceleration": 0.8, 393 | "Radius": 8 394 | } 395 | ], 396 | "Normalization": { 397 | "Position": { 398 | "Minimum": -10, 399 | "Default": 0, 400 | "Maximum": 10 401 | }, 402 | "Angle": { 403 | "Minimum": -10, 404 | "Default": 0, 405 | "Maximum": 10 406 | } 407 | } 408 | }, 409 | { 410 | "Id": "PhysicsSetting5", 411 | "Input": [ 412 | { 413 | "Source": { 414 | "Target": "Parameter", 415 | "Id": "ParamBodyAngleX" 416 | }, 417 | "Weight": 100, 418 | "Type": "X", 419 | "Reflect": false 420 | }, 421 | { 422 | "Source": { 423 | "Target": "Parameter", 424 | "Id": "ParamBodyAngleZ" 425 | }, 426 | "Weight": 100, 427 | "Type": "Angle", 428 | "Reflect": false 429 | } 430 | ], 431 | "Output": [ 432 | { 433 | "Destination": { 434 | "Target": "Parameter", 435 | "Id": "RibbonPhysics" 436 | }, 437 | "VertexIndex": 1, 438 | "Scale": 1, 439 | "Weight": 100, 440 | "Type": 
"Angle", 441 | "Reflect": false 442 | } 443 | ], 444 | "Vertices": [ 445 | { 446 | "Position": { 447 | "X": 0, 448 | "Y": 0 449 | }, 450 | "Mobility": 1, 451 | "Delay": 1, 452 | "Acceleration": 1, 453 | "Radius": 0 454 | }, 455 | { 456 | "Position": { 457 | "X": 0, 458 | "Y": 10 459 | }, 460 | "Mobility": 0.9, 461 | "Delay": 0.6, 462 | "Acceleration": 1.5, 463 | "Radius": 10 464 | } 465 | ], 466 | "Normalization": { 467 | "Position": { 468 | "Minimum": -10, 469 | "Default": 0, 470 | "Maximum": 10 471 | }, 472 | "Angle": { 473 | "Minimum": -10, 474 | "Default": 0, 475 | "Maximum": 10 476 | } 477 | } 478 | } 479 | ] 480 | } -------------------------------------------------------------------------------- /README_CN.md: -------------------------------------------------------------------------------- 1 | # Live2D-LLM-Chat 2 | [US English](README.md) | [CN 中文](README_CN.md) 3 | 4 | [![ASR](https://img.shields.io/badge/ASR-SenseVoice-green.svg)](https://github.com/FunAudioLLM/SenseVoice) 5 | [![LLM](https://img.shields.io/badge/LLM-GPT%2FDeepSeek-red.svg)](https://openai.com/api/) 6 | [![TTS](https://img.shields.io/badge/TTS-CosyVoice-orange.svg)](https://github.com/FunAudioLLM/CosyVoice) 7 | [![Live2D](https://img.shields.io/badge/Live2D-v3-blue.svg)](https://github.com/Arkueid/live2d-py) 8 | 9 | [![Python](https://img.shields.io/badge/Python-3.8+-yellow.svg)](https://www.python.org/downloads/) 10 | [![Miniconda](https://img.shields.io/badge/Anaconda-Miniconda-violet.svg)](https://docs.anaconda.net.cn/miniconda/install/) 11 | 12 | > **Live2D + ASR + LLM + TTS** → 实时语音互动 | 本地部署 / 云端推理 13 | 14 | --- 15 | ## ✨ 1. 项目简介 16 | 17 | **Live2D-LLM-Chat** 是一个集成了**Live2D 虚拟形象**、**语音识别(ASR)**、**大语言模型(LLM)**和**文本转语音(TTS)** 的实时 AI 交互项目。它能够让**虚拟角色**通过语音识别用户的输入,并使用 AI 生成智能回复,同时通过 TTS 播放语音,并驱动 Live2D 动画实现嘴型同步,达到自然的互动体验。 18 | 19 | --- 20 | ### 📌 1.1. 主要功能 21 | - 🎙 **语音识别(ASR)**:使用 FunASR 进行语音转文本 (STT) 处理。 22 | - 🧠 **大语言模型(LLM)**:基于 OpenAI GPT / DeepSeek 提供理性沟通能力。 23 | - 🔊 **文本转语音(TTS)**:使用 CosyVoice 实现高质量的合成语音 24 | - 🏆 **Live2D 虚拟形象交互**:使用 Live2D SDK 渲染角色,并实现模型的实时反馈。 25 | 26 | --- 27 | ### 📌 1.2. 优化功能 28 | - **LLM模块**接口可支持本地与云端部署,本地部署基于**LM Studio**接口,基本涵盖所有已开源模型,但个人设备性能难以运行大体量模型;云端部署接口现已支持**OpenAI**平台接口与**DeepSeek**平台接口。 29 | - 储存模型对话时的前文数据,形成**历史记忆**。每5次对话会进行总结,避免多次对话后文本累计过量的情况。 30 | - 对历次模型对话的时间与内容进行**存档**,便于查找过往对话内容。可存档内容包括模型的**历史语音输出**。该功能可在配置文件中关闭,关闭后再次进行对话时将清除历史对话的语音数据,**减清内存压力**。 31 | - 重构Live2d模型角色的**眼神跟随**与**眨眼逻辑**,即使live2d模型没有内置眨眼逻辑,也可自然眨眼。编写**嘴型变化**逻辑,读取TTS模块输出的音频文件,将实时音频大小转化至live2d模型的嘴型变化。 32 | - 修改CosyVoice模型的API接口程序,改变生成语音文件打开方式,允许**直接保存**生成文件;对于长文本下分段生成的语音文件,**合并**为单一文件。 33 |

34 | Live2D 运行展示 35 |
36 | Live2D 运行展示 37 |

38 | 39 | 40 | #### 🎬 运行效果 41 | 42 | | 语音输入 | AI 处理 | Live2D 输出 | 43 | |----------|---------|------------| 44 | | 🎤 你:你好呀 | 🤖 AI:你好! | 🧑‍🎤 "你好!" (嘴型同步) | 45 | | 🎤 你:天气怎么样? | 🤖 AI:今天是大晴天呢! | 🧑‍🎤 "今天是大晴天呢!" (语气变化) | 46 | 47 | --- 48 | ### 📌 1.3. 技术栈 49 | | 组件 | 技术 | 50 | |-------|-------| 51 | | ASR(自动语音识别) | SenseVoice | 52 | | LLM(大语言模型) | OpenAI GPT / DeepSeek | 53 | | TTS(文本转语音) | CosyVoice | 54 | | Live2D 动画 | live2d-py + OpenGL | 55 | | 配置管理 | Python Config | 56 | 57 | --- 58 | ## 🛠 2. 安装与配置 59 | 60 | --- 61 | 62 | ### 📌 2.1. 环境要求 63 | 64 | 本项目基于 **Python 3.11** 开发,运行前请确保满足以下环境要求: 65 | 66 | ✅ **操作系统**: 67 | - 🖥 **Windows 10/11** 或 **Linux** 68 | 69 | ✅ **Python 版本**: 70 | - 📌 建议使用 **Python 3.8 及以上** 71 | 72 | ⚠️ **注意**: 73 | 本项目的 **TTS 模块** 基于 **conda 环境** 运行,需要 **预先安装 Miniconda**。 74 | 🔗 你可以从 [Miniconda 官网](https://docs.anaconda.net.cn/miniconda/install/) 下载。 75 | 76 | --- 77 | 78 | ### 📌 2.2. 依赖的开源项目 79 | 80 | 本项目使用了以下优秀的开源库和模型: 81 | 82 | 🎙 **语音识别(ASR)**: 83 | - **SenseVoice** —— 高精度 **多语言语音识别** 及 **语音情感分析** 84 | - 🔗 **GitHub**:[SenseVoice Repository](https://github.com/FunAudioLLM/SenseVoice) 85 | 86 | 🔊 **文本转语音(TTS)**: 87 | - **CosyVoice** —— 强大的 **生成式语音合成系统**,支持 **零样本语音克隆** 88 | - 🔗 **GitHub**:[CosyVoice Repository](https://github.com/FunAudioLLM/CosyVoice) 89 | 90 | 📽 **Live2D 动画**: 91 | - **live2d-py** —— **Python 直接加载和操作 Live2D 模型** 的工具 92 | - 🔗 **GitHub**:[live2d-py Repository](https://github.com/Arkueid/live2d-py) 93 | 94 | --- 95 | ## 📁 3. 安装步骤 96 | 97 | --- 98 | 99 | ### 📌 3.1. 克隆项目代码 100 | 101 | ```bash 102 | git clone https://github.com/suzuran0y/Live2D-LLM-Chat.git 103 | cd Live2D-LLM-Chat 104 | ``` 105 | 106 | ### 📌 3.2. 创建虚拟环境(可选) 107 | ```bash 108 | python -m venv venv 109 | source venv/bin/activate # Linux/macOS 激活虚拟环境 110 | venv\Scripts\activate # Windows 激活虚拟环境 111 | ``` 112 | 113 | ### 📌 3.3. 安装依赖 114 | 115 | ```bash 116 | pip install -r requirements.txt 117 | ``` 118 | 119 | --- 120 | ### 📌 3.4. 
安装 ASR & TTS 模型 121 | 🎙 **语音识别 (ASR) - SenseVoice** 122 | 本项目使用 SenseVoice 作为 ASR 模型,支持**高精度多语言语音识别**、**语音情感识别**和**声学事件检测**。 123 | 124 | #### 1️⃣ 安装 SenseVoice 依赖 125 | 使用 pip 安装 SenseVoice 相关依赖: 126 | ```bash 127 | pip install funasr 128 | ``` 129 | 130 | 如果需要 ONNX 或 TorchScript 推理,请安装对应的版本: 131 | 132 | ```bash 133 | pip install funasr-onnx # ONNX 版本 134 | pip install funasr-torch # TorchScript 版本 135 | ``` 136 | #### 2️⃣ 下载 SenseVoice 预训练模型 137 | SenseVoice 官方提供多个**预训练模型**,可通过 ModelScope 进行下载: 138 | 139 | ```bash 140 | from modelscope import snapshot_download 141 | 142 | # 下载 SenseVoice-Small 版本 143 | snapshot_download('iic/SenseVoiceSmall', local_dir='pretrained_models/SenseVoiceSmall') 144 | # 下载 SenseVoice-Large 版本(如果需要更高精度) 145 | snapshot_download('iic/SenseVoiceLarge', local_dir='pretrained_models/SenseVoiceLarge') 146 | ``` 147 | 更详细的配置和参数说明,请参考: 148 | 149 | 🔗SenseVoice GitHub:[SenseVoice GitHub](https://github.com/FunAudioLLM/SenseVoice) 150 | 🔗ModelScope:[预训练模型](https://www.modelscope.cn/models/iic/SenseVoiceSmall) 151 | 152 | 🔊 **文本转语音 (TTS) - CosyVoice** 153 | 本项目使用 CosyVoice 作为 TTS 模型,支持**多语言**、**语音克隆**、**跨语言复刻**等功能。 154 | 155 | #### 1️⃣ 安装 CosyVoice 依赖 156 | 克隆 CosyVoice 仓库: 157 | ```bash 158 | git clone --recursive https://github.com/FunAudioLLM/CosyVoice.git 159 | cd CosyVoice 160 | git submodule update --init --recursive 161 | ``` 162 | 163 | #### 2️⃣ 创建 Conda 环境并安装依赖 164 | ```bash 165 | # 创建 Conda 虚拟环境 166 | conda create -n cosyvoice -y python=3.10 167 | conda activate cosyvoice 168 | 169 | # 安装必要依赖 170 | conda install -y -c conda-forge pynini==2.1.5 171 | pip install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple/ --trusted-host=mirrors.aliyun.com 172 | ``` 173 | 174 | 安装 SoX(如果需要): 175 | 176 | ```bash 177 | # Ubuntu 178 | sudo apt-get install sox libsox-dev 179 | # CentOS 180 | sudo yum install sox sox-devel 181 | ``` 182 | 183 | #### 3️⃣ 下载 CosyVoice 预训练模型 184 | 建议下载以下 CosyVoice 预训练模型: 185 | 186 | ```bash 187 | from modelscope import snapshot_download 188 | 189 | snapshot_download('iic/CosyVoice2-0.5B', local_dir='pretrained_models/CosyVoice2-0.5B') 190 | snapshot_download('iic/CosyVoice-300M', local_dir='pretrained_models/CosyVoice-300M') 191 | snapshot_download('iic/CosyVoice-300M-SFT', local_dir='pretrained_models/CosyVoice-300M-SFT') 192 | snapshot_download('iic/CosyVoice-300M-Instruct', local_dir='pretrained_models/CosyVoice-300M-Instruct') 193 | snapshot_download('iic/CosyVoice-ttsfrd', local_dir='pretrained_models/CosyVoice-ttsfrd') 194 | ``` 195 | 更详细的配置和参数说明,请参考: 196 | 197 | 🔗CosyVoice GitHub:[CosyVoice GitHub](https://github.com/FunAudioLLM/CosyVoice) 198 | 🔗ModelScope:[预训练模型](https://www.modelscope.cn/iic/CosyVoice2-0.5B) 199 | 200 | --- 201 | ## ⚙️ 4. 本地化配置(重要!!) 202 | 203 | --- 204 | 205 | ### 📌 4.1. 配置 ASR & TTS 模型 206 | 207 | 在完成 **ASR** 和 **TTS** 模型的安装后,按照以下步骤进行本地化配置: 208 | 209 | ✅ **替换 SenseVoice 目录** 210 | - 将下载好的 **SenseVoice** 文件夹 **放入** `Live2D-LLM-Chat/ASR_env/` 文件夹内,**替换原有的同名空文件夹**。 211 | 212 | ✅ **替换 CosyVoice 目录** 213 | - 将下载好的 **CosyVoice** 文件夹 **放入** `Live2D-LLM-Chat/TTS_env/` 文件夹内,**替换原有的同名空文件夹**。 214 | 215 | ✅ **替换 `webui.py` 文件** 216 | - 将 `TTS_env` 文件夹内的 **`webui.py`** **放入** `CosyVoice` 文件夹内,**替换原有的 `webui.py` 文件**。 217 | 218 | --- 219 | 220 | ### 📌 4.2. 
配置 `config.py` 以适配本地环境 221 | 所有 **本地路径和参数** 均可在 **`config.py`** 文件中进行修改: 222 | 请根据 **你的文件路径** 进行相应修改,示例如下: 223 | ```python 224 | class Config: 225 | # 🏠 项目根目录 226 | PROJECT_ROOT = "E:/PyCharm/project/project1" 227 | 228 | # 🎙 ASR(自动语音识别)配置 229 | ASR_MODEL_DIR = os.path.join(PROJECT_ROOT, "ASR_env/SenseVoice/models/SenseVoiceSmall") 230 | ASR_AUDIO_INPUT = os.path.join(PROJECT_ROOT, "ASR_env/input_voice/voice.wav") 231 | 232 | # 🔊 TTS(文本转语音)配置 233 | TTS_API_URL = "http://localhost:8000/" 234 | TTS_OUTPUT_DIR = os.path.join(PROJECT_ROOT, "TTS_env/output_voice/") 235 | 236 | ...... 237 | 238 | # 📢 更多配置信息请参考 `config.py` 239 | ``` 240 | ❗ 请确保所有路径正确,否则模型无法正常运行! 241 | 242 | --- 243 | ### 📌 4.3. 配置 LLM 模型 244 | 本地化部署**LLM 模型**依赖于**LM Studio**,请按照以下步骤进行: 245 | 246 | #### 1️⃣ 安装 LM Studio 247 | 可从[GitHub 官方仓库](https://github.com/lmstudio-ai) 或 [LM Studio 官网](https://lmstudio.ai/) 下载安装。 248 | 249 | #### 2️⃣ 进入程序,下载当前设备可运行的 LLM 模型。 250 | 启用 LM Studio,获取 本地接口 URL。 251 | 确定模型路径 & 端口号,在 config.py 中进行相应配置。 252 | #### 3️⃣ 运行本地 LLM,并在项目中调用。 253 | ⚠️ **注意**:本地 LLM 部署性能受限于设备配置,可能无法与云端大模型相比。如需更高性能,可考虑使用 OpenAI GPT-4 或 DeepSeek API。 254 | 255 | --- 256 | ## 👀 5. 使用方法 257 | 258 | --- 259 | 260 | ## 📌 5.1. 启动 TTS API 261 | 262 | 在运行主程序前,**需要先启动 TTS API**: 263 | 264 | ```bash 265 | python TTS_api.py # 现在 TTS API 调用**已集成到主程序中**,通常无需单独运行,但调试(debug)时可单独运行检查。 266 | ``` 267 | 268 | 269 | 🎯 TTS API 模块将在 **conda 环境** 中运行 webui.py。启动成功后,可在浏览器访问 WebUI 进行语音合成管理:🌍 默认访问地址:http://localhost:8000 270 | 271 | ❗ 确保 TTS API **启动成功**,否则程序无法合成语音。 272 | 273 | --- 274 | ## 📌 5.2. 运行主程序 275 | 启动 TTS API 后,运行后续程序: 276 | 277 | ```bash 278 | python main.py 279 | ``` 280 | 🎙 交互方式: 281 | 282 | #### 1️⃣ 按住 Ctrl 键 开始录音,按 Alt 键 结束录音,语音将自动转换为文本。 283 | #### 2️⃣ 语音文本 被输入 LLM 模块 进行回答,并生成答复文本。 284 | #### 3️⃣ 答复文本 被输入 TTS 模块 合成为语音,并在 Live2D 窗口中做出口型同步。 285 | 286 | --- 287 | ## 📌 5.3. 架构示意图 288 | 289 | | **步骤** | **模块** | **输入** | **处理** | **输出** | 290 | |----------|---------|---------|---------|---------| 291 | | 🎤 **用户语音** | **用户** | 语音输入 | 用户说话 | 音频信号 | 292 | | 🎙 **语音识别** | **ASR(SenseVoice)** | 音频信号 | 语音转文本(STT) | 识别文本 | 293 | | 🤖 **文本理解 & 生成** | **LLM(GPT-4 / DeepSeek)** | 识别文本 | 语义分析 & 生成 AI 回复 | AI 生成文本 | 294 | | 🔊 **语音合成** | **TTS(CosyVoice)** | AI 生成文本 | 文本转语音(TTS) | 语音数据 | 295 | | 🎭 **Live2D 动画** | **Live2D** | 语音数据 | 动作生成 | 角色动画 | 296 | | 🗣 **AI 语音反馈** | **用户** | 角色语音 & 动作 | 用户听到 AI 反馈 | 语音 & 视觉互动 | 297 | 298 | 299 | --- 300 | # 📂 6. 项目结构 301 | --- 302 | 303 | 本项目采用模块化设计,包含 **ASR(语音识别)、TTS(文本转语音)、LLM(大语言模型)、Live2D 动画渲染** 等核心功能,以下是 **完整的项目结构**: 304 | 305 | ```bash 306 | Live2D-LLM-Chat/ 307 | │── main.py # 🚀 主程序入口 308 | │── ASR.py # 🎙 语音识别 (ASR) 模块 309 | │── TTS.py # 🔊 语音合成 (TTS) 模块 310 | │── TTS_api.py # 🌐 TTS API 模块 311 | │── LLM.py # 🤖 大语言模型 (LLM) 模块 312 | │── Live2d_animation.py # 🎭 Live2D 动画管理模块 313 | │── webui.py # 🖥 WebUI 语音合成界面 314 | │── config.py # ⚙️ 项目配置文件 315 | │── requirements.txt # 📦 依赖列表 316 | └── README.md # 📄 项目文档 317 | ``` 318 | --- 319 | ## 🚀 7. 项目发展 320 | --- 321 | 322 | ### 📌 7.1. 过往内容 323 | 324 | #### 🏁 **2025.01.28 - 项目构思** 325 | - 🎯 **确定核心目标**:基于 **Live2D + LLM** 实现实时互动系统 326 | - 🔍 **研究技术**:语音识别(ASR)、文本转语音(TTS)及 Live2D 方案 327 | - ✅ **选定核心组件**: 328 | - **SenseVoice** 作为 ASR 329 | - **CosyVoice** 作为 TTS 330 | - **live2d-py** 作为动画渲染引擎 331 | 332 | #### 📅 **2025.02.28 - 发布第一版** 333 | - 🎙 **实现语音输入 & 识别(ASR)** 334 | - 🤖 **集成 LLM 进行文本生成** 335 | - 🔊 **通过 TTS 生成语音并同步 Live2D 模型部分动作** 336 | 337 | --- 338 | 339 | ### 📌 7.2. 
未来计划 ~~(画饼)~~ 340 | 341 | 🔹 **LLM 模块优化**: 342 | - 由于 **个人设备性能** 限制了本地部署模型的输出质量,计划 **改进 LLM 模块的输出逻辑**,提升稳定性。 343 | 344 | 🔹 **信息输出精炼**: 345 | - 优化 **模型运行时的日志和输出信息**,仅保留重点内容,提高可读性和观感。 346 | 347 | 🔹 **Live2D 交互增强**: 348 | - **提升 Live2D 角色的动作丰富度**,增强互动体验,使 Live2D 角色更具表现力。 349 | 350 | 🔹 **后续优化**: 351 | - 🛠 持续优化 TTS & ASR 模块的运行效率 352 | - 🌍 增强多语言支持,扩展至更多语种 353 | - 🔗 进一步支持云端推理,提高性能 354 | 355 | --- 356 | 357 | ## 🤝 8. 贡献与鸣谢 358 | --- 359 | 360 | 本项目部分代码基于 [SenseVoice](https://github.com/FunAudioLLM/SenseVoice)、[CosyVoice](https://github.com/FunAudioLLM/CosyVoice) 和 [live2d-py](https://github.com/Arkueid/live2d-py) 进行修改,并根据项目需求进行了优化和扩展。 361 | 🎉 **特此感谢原项目作者的贡献!** 362 | 363 | 💡 **欢迎贡献代码和建议!** 364 | 📢 如有问题或改进建议,请提交 **PR(Pull Request)** 或 **Issue** 进行反馈。 365 | 366 | --- 367 | 368 | 369 | ## 📄 许可证 370 | 本项目采用 [Apache-2.0 许可证](LICENSE)。 371 | 372 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Apache License 2 | Version 2.0, January 2004 3 | http://www.apache.org/licenses/ 4 | 5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 6 | 7 | 1. Definitions. 8 | 9 | "License" shall mean the terms and conditions for use, reproduction, 10 | and distribution as defined by Sections 1 through 9 of this document. 11 | 12 | "Licensor" shall mean the copyright owner or entity authorized by 13 | the copyright owner that is granting the License. 14 | 15 | "Legal Entity" shall mean the union of the acting entity and all 16 | other entities that control, are controlled by, or are under common 17 | control with that entity. For the purposes of this definition, 18 | "control" means (i) the power, direct or indirect, to cause the 19 | direction or management of such entity, whether by contract or 20 | otherwise, or (ii) ownership of fifty percent (50%) or more of the 21 | outstanding shares, or (iii) beneficial ownership of such entity. 22 | 23 | "You" (or "Your") shall mean an individual or Legal Entity 24 | exercising permissions granted by this License. 25 | 26 | "Source" form shall mean the preferred form for making modifications, 27 | including but not limited to software source code, documentation 28 | source, and configuration files. 29 | 30 | "Object" form shall mean any form resulting from mechanical 31 | transformation or translation of a Source form, including but 32 | not limited to compiled object code, generated documentation, 33 | and conversions to other media types. 34 | 35 | "Work" shall mean the work of authorship, whether in Source or 36 | Object form, made available under the License, as indicated by a 37 | copyright notice that is included in or attached to the work 38 | (an example is provided in the Appendix below). 39 | 40 | "Derivative Works" shall mean any work, whether in Source or Object 41 | form, that is based on (or derived from) the Work and for which the 42 | editorial revisions, annotations, elaborations, or other modifications 43 | represent, as a whole, an original work of authorship. For the purposes 44 | of this License, Derivative Works shall not include works that remain 45 | separable from, or merely link (or bind by name) to the interfaces of, 46 | the Work and Derivative Works thereof. 
47 | 48 | "Contribution" shall mean any work of authorship, including 49 | the original version of the Work and any modifications or additions 50 | to that Work or Derivative Works thereof, that is intentionally 51 | submitted to Licensor for inclusion in the Work by the copyright owner 52 | or by an individual or Legal Entity authorized to submit on behalf of 53 | the copyright owner. For the purposes of this definition, "submitted" 54 | means any form of electronic, verbal, or written communication sent 55 | to the Licensor or its representatives, including but not limited to 56 | communication on electronic mailing lists, source code control systems, 57 | and issue tracking systems that are managed by, or on behalf of, the 58 | Licensor for the purpose of discussing and improving the Work, but 59 | excluding communication that is conspicuously marked or otherwise 60 | designated in writing by the copyright owner as "Not a Contribution." 61 | 62 | "Contributor" shall mean Licensor and any individual or Legal Entity 63 | on behalf of whom a Contribution has been received by Licensor and 64 | subsequently incorporated within the Work. 65 | 66 | 2. Grant of Copyright License. Subject to the terms and conditions of 67 | this License, each Contributor hereby grants to You a perpetual, 68 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 69 | copyright license to reproduce, prepare Derivative Works of, 70 | publicly display, publicly perform, sublicense, and distribute the 71 | Work and such Derivative Works in Source or Object form. 72 | 73 | 3. Grant of Patent License. Subject to the terms and conditions of 74 | this License, each Contributor hereby grants to You a perpetual, 75 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 76 | (except as stated in this section) patent license to make, have made, 77 | use, offer to sell, sell, import, and otherwise transfer the Work, 78 | where such license applies only to those patent claims licensable 79 | by such Contributor that are necessarily infringed by their 80 | Contribution(s) alone or by combination of their Contribution(s) 81 | with the Work to which such Contribution(s) was submitted. If You 82 | institute patent litigation against any entity (including a 83 | cross-claim or counterclaim in a lawsuit) alleging that the Work 84 | or a Contribution incorporated within the Work constitutes direct 85 | or contributory patent infringement, then any patent licenses 86 | granted to You under this License for that Work shall terminate 87 | as of the date such litigation is filed. 88 | 89 | 4. Redistribution. 
You may reproduce and distribute copies of the 90 | Work or Derivative Works thereof in any medium, with or without 91 | modifications, and in Source or Object form, provided that You 92 | meet the following conditions: 93 | 94 | (a) You must give any other recipients of the Work or 95 | Derivative Works a copy of this License; and 96 | 97 | (b) You must cause any modified files to carry prominent notices 98 | stating that You changed the files; and 99 | 100 | (c) You must retain, in the Source form of any Derivative Works 101 | that You distribute, all copyright, patent, trademark, and 102 | attribution notices from the Source form of the Work, 103 | excluding those notices that do not pertain to any part of 104 | the Derivative Works; and 105 | 106 | (d) If the Work includes a "NOTICE" text file as part of its 107 | distribution, then any Derivative Works that You distribute must 108 | include a readable copy of the attribution notices contained 109 | within such NOTICE file, excluding those notices that do not 110 | pertain to any part of the Derivative Works, in at least one 111 | of the following places: within a NOTICE text file distributed 112 | as part of the Derivative Works; within the Source form or 113 | documentation, if provided along with the Derivative Works; or, 114 | within a display generated by the Derivative Works, if and 115 | wherever such third-party notices normally appear. The contents 116 | of the NOTICE file are for informational purposes only and 117 | do not modify the License. You may add Your own attribution 118 | notices within Derivative Works that You distribute, alongside 119 | or as an addendum to the NOTICE text from the Work, provided 120 | that such additional attribution notices cannot be construed 121 | as modifying the License. 122 | 123 | You may add Your own copyright statement to Your modifications and 124 | may provide additional or different license terms and conditions 125 | for use, reproduction, or distribution of Your modifications, or 126 | for any such Derivative Works as a whole, provided Your use, 127 | reproduction, and distribution of the Work otherwise complies with 128 | the conditions stated in this License. 129 | 130 | 5. Submission of Contributions. Unless You explicitly state otherwise, 131 | any Contribution intentionally submitted for inclusion in the Work 132 | by You to the Licensor shall be under the terms and conditions of 133 | this License, without any additional terms or conditions. 134 | Notwithstanding the above, nothing herein shall supersede or modify 135 | the terms of any separate license agreement you may have executed 136 | with Licensor regarding such Contributions. 137 | 138 | 6. Trademarks. This License does not grant permission to use the trade 139 | names, trademarks, service marks, or product names of the Licensor, 140 | except as required for reasonable and customary use in describing the 141 | origin of the Work and reproducing the content of the NOTICE file. 142 | 143 | 7. Disclaimer of Warranty. Unless required by applicable law or 144 | agreed to in writing, Licensor provides the Work (and each 145 | Contributor provides its Contributions) on an "AS IS" BASIS, 146 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 147 | implied, including, without limitation, any warranties or conditions 148 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 149 | PARTICULAR PURPOSE. 
You are solely responsible for determining the 150 | appropriateness of using or redistributing the Work and assume any 151 | risks associated with Your exercise of permissions under this License. 152 | 153 | 8. Limitation of Liability. In no event and under no legal theory, 154 | whether in tort (including negligence), contract, or otherwise, 155 | unless required by applicable law (such as deliberate and grossly 156 | negligent acts) or agreed to in writing, shall any Contributor be 157 | liable to You for damages, including any direct, indirect, special, 158 | incidental, or consequential damages of any character arising as a 159 | result of this License or out of the use or inability to use the 160 | Work (including but not limited to damages for loss of goodwill, 161 | work stoppage, computer failure or malfunction, or any and all 162 | other commercial damages or losses), even if such Contributor 163 | has been advised of the possibility of such damages. 164 | 165 | 9. Accepting Warranty or Additional Liability. While redistributing 166 | the Work or Derivative Works thereof, You may choose to offer, 167 | and charge a fee for, acceptance of support, warranty, indemnity, 168 | or other liability obligations and/or rights consistent with this 169 | License. However, in accepting such obligations, You may act only 170 | on Your own behalf and on Your sole responsibility, not on behalf 171 | of any other Contributor, and only if You agree to indemnify, 172 | defend, and hold each Contributor harmless for any liability 173 | incurred by, or claims asserted against, such Contributor by reason 174 | of your accepting any such warranty or additional liability. 175 | 176 | END OF TERMS AND CONDITIONS 177 | 178 | APPENDIX: How to apply the Apache License to your work. 179 | 180 | To apply the Apache License to your work, attach the following 181 | boilerplate notice, with the fields enclosed by brackets "[]" 182 | replaced with your own identifying information. (Don't include 183 | the brackets!) The text should be enclosed in the appropriate 184 | comment syntax for the file format. We also recommend that a 185 | file or class name and description of purpose be included on the 186 | same "printed page" as the copyright notice for easier 187 | identification within third-party archives. 188 | 189 | Copyright [yyyy] [name of copyright owner] 190 | 191 | Licensed under the Apache License, Version 2.0 (the "License"); 192 | you may not use this file except in compliance with the License. 193 | You may obtain a copy of the License at 194 | 195 | http://www.apache.org/licenses/LICENSE-2.0 196 | 197 | Unless required by applicable law or agreed to in writing, software 198 | distributed under the License is distributed on an "AS IS" BASIS, 199 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 200 | See the License for the specific language governing permissions and 201 | limitations under the License. 202 | -------------------------------------------------------------------------------- /TTS_env/webui.py: -------------------------------------------------------------------------------- 1 | 2 | # 本文件基于 Alibaba Inc 的原始代码(webui)修改 3 | # 原作者: Xiang Lyu, Liu Yue 4 | # 修改者: suzuran0y 5 | # 主要修改内容: 6 | # 1. 添加生成语音历史存档功能 7 | # 2. 增加语音数据清除方式:自定义文件清除或归档 8 | # 3. 修改生成语音文件打开方式,允许直接保存生成文件 9 | # 4. 
对于长文本下分段生成的语音文件,合并为单一文件 10 | 11 | import os 12 | import sys 13 | import argparse 14 | import gradio as gr 15 | import numpy as np 16 | import torch 17 | import torchaudio 18 | import random 19 | import librosa 20 | import soundfile as sf 21 | import shutil 22 | import datetime 23 | from pydub import AudioSegment # 用于合并音频 24 | from config import Config 25 | 26 | # 配置文件清理方式(delete: 删除 | move: 归档) 27 | CLEANUP_MODE = Config.CLEANUP_MODE # "delete" 或 "move" 28 | 29 | # 设定保存目录和历史归档目录 30 | SAVE_DIR = Config.WEBUI_SAVE_DIR 31 | HISTORY_DIR = Config.WEBUI_HISTORY_DIR 32 | MODEL_DIR = Config.WEBUI_MODEL_DIR 33 | # 确保目录存在 34 | os.makedirs(SAVE_DIR, exist_ok=True) 35 | os.makedirs(HISTORY_DIR, exist_ok=True) 36 | 37 | ROOT_DIR = os.path.dirname(os.path.abspath(__file__)) 38 | sys.path.append('{}/third_party/Matcha-TTS'.format(ROOT_DIR)) 39 | from cosyvoice.cli.cosyvoice import CosyVoice, CosyVoice2 40 | from cosyvoice.utils.file_utils import load_wav, logging 41 | from cosyvoice.utils.common import set_all_random_seed 42 | inference_mode_list = ['预训练音色', '3s极速复刻', '跨语种复刻', '自然语言控制'] 43 | instruct_dict = {'预训练音色': '1. 选择预训练音色\n2. 点击生成音频按钮', 44 | '3s极速复刻': '1. 选择prompt音频文件,或录入prompt音频,注意不超过30s,若同时提供,优先选择prompt音频文件\n2. 输入prompt文本\n3. 点击生成音频按钮', 45 | '跨语种复刻': '1. 选择prompt音频文件,或录入prompt音频,注意不超过30s,若同时提供,优先选择prompt音频文件\n2. 点击生成音频按钮', 46 | '自然语言控制': '1. 选择预训练音色\n2. 输入instruct文本\n3. 点击生成音频按钮'} 47 | stream_mode_list = [('否', False), ('是', True)] 48 | max_val = 0.8 49 | 50 | 51 | # 在新任务开始时,清理或归档旧音频 52 | def cleanup_old_audio(): 53 | files = os.listdir(SAVE_DIR) 54 | if not files: 55 | return 56 | 57 | if CLEANUP_MODE == "delete": 58 | for file in files: 59 | file_path = os.path.join(SAVE_DIR, file) 60 | try: 61 | os.remove(file_path) 62 | # print(f"已删除旧音频: {file}") 63 | except Exception as e: 64 | print(f"无法删除 {file}: {e}") 65 | 66 | elif CLEANUP_MODE == "move": 67 | for file in files: 68 | old_path = os.path.join(SAVE_DIR, file) 69 | new_path = os.path.join(HISTORY_DIR, file) 70 | try: 71 | shutil.move(old_path, new_path) 72 | # print(f"已归档旧音频: {file} -> {HISTORY_DIR}") 73 | except Exception as e: 74 | print(f"无法归档 {file}: {e}") 75 | 76 | def generate_seed(): 77 | seed = random.randint(1, 100000000) 78 | return { 79 | "__type__": "update", 80 | "value": seed 81 | } 82 | 83 | 84 | def postprocess(speech, top_db=60, hop_length=220, win_length=440): 85 | speech, _ = librosa.effects.trim( 86 | speech, top_db=top_db, 87 | frame_length=win_length, 88 | hop_length=hop_length 89 | ) 90 | if speech.abs().max() > max_val: 91 | speech = speech / speech.abs().max() * max_val 92 | speech = torch.concat([speech, torch.zeros(1, int(cosyvoice.sample_rate * 0.2))], dim=1) 93 | return speech 94 | 95 | 96 | def change_instruction(mode_checkbox_group): 97 | return instruct_dict[mode_checkbox_group] 98 | 99 | # 将多个音频片段合并为一个完整音频文件 100 | def merge_audio_files(file_list, output_path): 101 | if len(file_list) == 1: 102 | # print("只有一个音频文件,无需合并") 103 | shutil.move(file_list[0], output_path) # 直接重命名并移动到最终目录 104 | return output_path 105 | 106 | # print(f"需要合并 {len(file_list)} 个音频文件...") 107 | 108 | combined = AudioSegment.empty() 109 | 110 | for file in sorted(file_list): 111 | audio_segment = AudioSegment.from_wav(file) 112 | combined += audio_segment 113 | 114 | combined.export(output_path, format="wav") 115 | # print(f"所有音频片段已合并,最终音频文件: {output_path}") 116 | 117 | # 删除分段音频文件 118 | for file in file_list: 119 | os.remove(file) 120 | # print(f"已删除分段音频文件: {file}") 121 | 122 | return output_path 123 | 124 | 125 | def 
generate_audio(tts_text, mode_checkbox_group, sft_dropdown, prompt_text, prompt_wav_upload, prompt_wav_record, instruct_text, 126 | seed, stream, speed): 127 | # 在新任务开始时,清理或归档旧文件 128 | cleanup_old_audio() 129 | # 获取当前时间戳,用作文件名 130 | timestamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S") 131 | final_output_path = os.path.join(SAVE_DIR, f"{timestamp}.wav") 132 | # 存储所有生成的音频片段 133 | generated_files = [] 134 | 135 | set_all_random_seed(seed) 136 | 137 | def save_audio(audio_data): 138 | """ 保存音频(临时存储,不复制到自定义目录) """ 139 | temp_filename = f"temp_{len(generated_files) + 1}.wav" # 统一使用 `temp_x.wav` 避免混淆 140 | temp_output_path = os.path.join(SAVE_DIR, temp_filename) 141 | 142 | with sf.SoundFile(temp_output_path, 'w', samplerate=cosyvoice.sample_rate, channels=1) as f: 143 | f.write(audio_data) 144 | 145 | generated_files.append(temp_output_path) # 记录生成的音频文件 146 | # print(f"生成的音频文件: {temp_output_path}") 147 | 148 | if prompt_wav_upload is not None: 149 | prompt_wav = prompt_wav_upload 150 | elif prompt_wav_record is not None: 151 | prompt_wav = prompt_wav_record 152 | else: 153 | prompt_wav = None 154 | # if instruct mode, please make sure that model is iic/CosyVoice-300M-Instruct and not cross_lingual mode 155 | if mode_checkbox_group in ['自然语言控制']: 156 | if cosyvoice.instruct is False: 157 | gr.Warning('您正在使用自然语言控制模式, {}模型不支持此模式, 请使用iic/CosyVoice-300M-Instruct模型'.format(args.model_dir)) 158 | yield (cosyvoice.sample_rate, default_data) 159 | if instruct_text == '': 160 | gr.Warning('您正在使用自然语言控制模式, 请输入instruct文本') 161 | yield (cosyvoice.sample_rate, default_data) 162 | if prompt_wav is not None or prompt_text != '': 163 | gr.Info('您正在使用自然语言控制模式, prompt音频/prompt文本会被忽略') 164 | # if cross_lingual mode, please make sure that model is iic/CosyVoice-300M and tts_text prompt_text are different language 165 | if mode_checkbox_group in ['跨语种复刻']: 166 | if cosyvoice.instruct is True: 167 | gr.Warning('您正在使用跨语种复刻模式, {}模型不支持此模式, 请使用iic/CosyVoice-300M模型'.format(args.model_dir)) 168 | yield (cosyvoice.sample_rate, default_data) 169 | if instruct_text != '': 170 | gr.Info('您正在使用跨语种复刻模式, instruct文本会被忽略') 171 | if prompt_wav is None: 172 | gr.Warning('您正在使用跨语种复刻模式, 请提供prompt音频') 173 | yield (cosyvoice.sample_rate, default_data) 174 | gr.Info('您正在使用跨语种复刻模式, 请确保合成文本和prompt文本为不同语言') 175 | # if in zero_shot cross_lingual, please make sure that prompt_text and prompt_wav meets requirements 176 | if mode_checkbox_group in ['3s极速复刻', '跨语种复刻']: 177 | if prompt_wav is None: 178 | gr.Warning('prompt音频为空,您是否忘记输入prompt音频?') 179 | yield (cosyvoice.sample_rate, default_data) 180 | if torchaudio.info(prompt_wav).sample_rate < prompt_sr: 181 | gr.Warning('prompt音频采样率{}低于{}'.format(torchaudio.info(prompt_wav).sample_rate, prompt_sr)) 182 | yield (cosyvoice.sample_rate, default_data) 183 | # sft mode only use sft_dropdown 184 | if mode_checkbox_group in ['预训练音色']: 185 | if instruct_text != '' or prompt_wav is not None or prompt_text != '': 186 | gr.Info('您正在使用预训练音色模式,prompt文本/prompt音频/instruct文本会被忽略!') 187 | if sft_dropdown == '': 188 | gr.Warning('没有可用的预训练音色!') 189 | yield (cosyvoice.sample_rate, default_data) 190 | # zero_shot mode only use prompt_wav prompt text 191 | if mode_checkbox_group in ['3s极速复刻']: 192 | if prompt_text == '': 193 | gr.Warning('prompt文本为空,您是否忘记输入prompt文本?') 194 | yield (cosyvoice.sample_rate, default_data) 195 | if instruct_text != '': 196 | gr.Info('您正在使用3s极速复刻模式,预训练音色/instruct文本会被忽略!') 197 | 198 | if mode_checkbox_group == '预训练音色': 199 | logging.info('get sft inference request') 200 | for i in 
cosyvoice.inference_sft(tts_text, sft_dropdown, stream=stream, speed=speed): 201 | audio_data = i['tts_speech'].numpy().flatten() 202 | yield cosyvoice.sample_rate, audio_data 203 | save_audio(audio_data) 204 | 205 | elif mode_checkbox_group == '3s极速复刻': 206 | logging.info('get zero_shot inference request') 207 | prompt_speech_16k = postprocess(load_wav(prompt_wav, prompt_sr)) 208 | for i in cosyvoice.inference_zero_shot(tts_text, prompt_text, prompt_speech_16k, stream=stream, speed=speed): 209 | audio_data = i['tts_speech'].numpy().flatten() 210 | yield cosyvoice.sample_rate, audio_data 211 | save_audio(audio_data) 212 | 213 | elif mode_checkbox_group == '跨语种复刻': 214 | logging.info('get cross_lingual inference request') 215 | prompt_speech_16k = postprocess(load_wav(prompt_wav, prompt_sr)) 216 | for i in cosyvoice.inference_cross_lingual(tts_text, prompt_speech_16k, stream=stream, speed=speed): 217 | audio_data = i['tts_speech'].numpy().flatten() 218 | yield (cosyvoice.sample_rate, audio_data) 219 | save_audio(audio_data) 220 | 221 | else: 222 | logging.info('get instruct inference request') 223 | for i in cosyvoice.inference_instruct(tts_text, sft_dropdown, instruct_text, stream=stream, speed=speed): 224 | audio_data = i['tts_speech'].numpy().flatten() 225 | yield (cosyvoice.sample_rate, audio_data) 226 | save_audio(audio_data) 227 | 228 | # 合并多个音频文件(如果有多个,否则无变化) 229 | final_output = merge_audio_files(generated_files, final_output_path) 230 | # print(f"最终合成的完整音频文件: {final_output}") 231 | 232 | def main(): 233 | with gr.Blocks() as demo: 234 | gr.Markdown("### 代码库 [CosyVoice](https://github.com/FunAudioLLM/CosyVoice) \ 235 | 预训练模型 [CosyVoice-300M](https://www.modelscope.cn/models/iic/CosyVoice-300M) \ 236 | [CosyVoice-300M-Instruct](https://www.modelscope.cn/models/iic/CosyVoice-300M-Instruct) \ 237 | [CosyVoice-300M-SFT](https://www.modelscope.cn/models/iic/CosyVoice-300M-SFT)") 238 | gr.Markdown("#### 请输入需要合成的文本,选择推理模式,并按照提示步骤进行操作") 239 | 240 | tts_text = gr.Textbox(label="输入合成文本", lines=1, value="我是通义实验室语音团队全新推出的生成式语音大模型,提供舒适自然的语音合成能力。") 241 | with gr.Row(): 242 | mode_checkbox_group = gr.Radio(choices=inference_mode_list, label='选择推理模式', value=inference_mode_list[0]) 243 | instruction_text = gr.Text(label="操作步骤", value=instruct_dict[inference_mode_list[0]], scale=0.5) 244 | sft_dropdown = gr.Dropdown(choices=sft_spk, label='选择预训练音色', value=sft_spk[0], scale=0.25) 245 | stream = gr.Radio(choices=stream_mode_list, label='是否流式推理', value=stream_mode_list[0][1]) 246 | speed = gr.Number(value=1, label="速度调节(仅支持非流式推理)", minimum=0.5, maximum=2.0, step=0.1) 247 | with gr.Column(scale=0.25): 248 | seed_button = gr.Button(value="\U0001F3B2") 249 | seed = gr.Number(value=0, label="随机推理种子") 250 | 251 | with gr.Row(): 252 | prompt_wav_upload = gr.Audio(sources='upload', type='filepath', label='选择prompt音频文件,注意采样率不低于16khz') 253 | prompt_wav_record = gr.Audio(sources='microphone', type='filepath', label='录制prompt音频文件') 254 | prompt_text = gr.Textbox(label="输入prompt文本", lines=1, placeholder="请输入prompt文本,需与prompt音频内容一致,暂时不支持自动识别...", value='') 255 | instruct_text = gr.Textbox(label="输入instruct文本", lines=1, placeholder="请输入instruct文本.", value='') 256 | 257 | generate_button = gr.Button("生成音频") 258 | 259 | # audio_output = gr.Audio(label="合成音频", autoplay=True, streaming=True) 260 | audio_output = gr.Audio(label="合成音频", streaming=False) # streaming=False 能够下载音频 261 | 262 | seed_button.click(generate_seed, inputs=[], outputs=seed) 263 | generate_button.click(generate_audio, 264 | inputs=[tts_text, 
mode_checkbox_group, sft_dropdown, prompt_text, prompt_wav_upload, prompt_wav_record, instruct_text, 265 | seed, stream, speed], 266 | outputs=[audio_output]) 267 | mode_checkbox_group.change(fn=change_instruction, inputs=[mode_checkbox_group], outputs=[instruction_text]) 268 | demo.queue(max_size=4, default_concurrency_limit=2) 269 | demo.launch(server_name='0.0.0.0', server_port=args.port) 270 | 271 | 272 | if __name__ == '__main__': 273 | parser = argparse.ArgumentParser() 274 | parser.add_argument('--port', 275 | type=int, 276 | default=8000) 277 | parser.add_argument('--model_dir', 278 | type=str, 279 | default=MODEL_DIR, 280 | help='local path or modelscope repo id') 281 | args = parser.parse_args() 282 | try: 283 | cosyvoice = CosyVoice(args.model_dir) 284 | except Exception: 285 | try: 286 | cosyvoice = CosyVoice2(args.model_dir) 287 | except Exception: 288 | raise TypeError('no valid model_type!') 289 | 290 | sft_spk = cosyvoice.list_available_spks() 291 | if len(sft_spk) == 0: 292 | sft_spk = [''] 293 | prompt_sr = 16000 294 | default_data = np.zeros(cosyvoice.sample_rate) 295 | main() 296 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Live2D-LLM-Chat 2 | [US English](README.md) | [CN 中文](README_CN.md) 3 | 4 | [![ASR](https://img.shields.io/badge/ASR-SenseVoice-green.svg)](https://github.com/FunAudioLLM/SenseVoice) 5 | [![LLM](https://img.shields.io/badge/LLM-GPT%2FDeepSeek-red.svg)](https://openai.com/api/) 6 | [![TTS](https://img.shields.io/badge/TTS-CosyVoice-orange.svg)](https://github.com/FunAudioLLM/CosyVoice) 7 | [![Live2D](https://img.shields.io/badge/Live2D-v3-blue.svg)](https://github.com/Arkueid/live2d-py) 8 | 9 | [![Python](https://img.shields.io/badge/Python-3.8+-yellow.svg)](https://www.python.org/downloads/) 10 | [![Miniconda](https://img.shields.io/badge/Anaconda-Miniconda-violet.svg)](https://www.anaconda.com/docs/getting-started/anaconda/install) 11 | 12 | > **Live2D + ASR + LLM + TTS** → Real-time voice interaction | Local deployment / Cloud inference 13 | 14 | --- 15 | ## ✨ 1. Project Introduction 16 | 17 | **Live2D-LLM-Chat** is a real-time AI interaction project that integrates **Live2D virtual avatars**, **Automatic Speech Recognition (ASR)**, **Large Language Models (LLM)**, and **Text-to-Speech (TTS)**. It allows a **virtual character** to recognize the user's speech through ASR, generate intelligent responses using AI, synthesize speech via TTS, and drive Live2D animations with lip-sync for a natural interaction experience. 18 | 19 | --- 20 | ### 📌 1.1. Main Features 21 | - 🎙 **Automatic Speech Recognition(ASR)**: Uses FunASR for Speech-to-Text (STT) processing. 22 | - 🧠 **Large Language Model(LLM)**: Supports rational conversation using OpenAI GPT / DeepSeek. 23 | - 🔊 **Text-to-Speech(TTS)**: Uses CosyVoice for high-quality speech synthesis. 24 | - 🏆 **Live2D Virtual Character Interaction**: Renders models using Live2D SDK and enables real-time feedback. 25 | 26 | --- 27 | ### 📌 1.2. Enhanced Features 28 | - **LLM module** supports both local and cloud deployment. The local deployment is based on **LM Studio**, which covers all open-source models, but personal device performance may limit large - models. Cloud deployment supports **OpenAI** and **DeepSeek** APIs. 29 | - Stores conversation history with **context memory**. Every five conversations, a summary is generated to prevent excessive text accumulation. 
30 | - **Conversation logging** records the timestamp and dialogue history, including **TTS audio outputs**, making it easy to review past interactions. This feature can be disabled in the config file to **reduce memory usage**. 31 | - Enhanced Live2D **eye-tracking** and **blinking logic** to provide natural blinking even if the Live2D model lacks built-in logic. Implements **lip-sync mechanics** by analyzing real-time audio volume from the TTS output. 32 | - Modifies CosyVoice API to **directly save** generated speech files and **merge** segmented audio for long text synthesis. 33 | 34 |

35 | Live2D 运行展示 36 |
37 | Live2D Running Showcase 38 |

39 | 40 | #### 🎬 Interaction Demo 41 | 42 | | Voice Input | AI Processing | Live2D Output | 43 | |----------|---------|------------| 44 | | 🎤 You: Hello! | 🤖 AI: Hi there! | 🧑‍🎤 "Hi there!" (Lip sync) | 45 | | 🎤 You: How's the weather? | 🤖 AI: It's a sunny day! | 🧑‍🎤 "It's a sunny day!" (Speech tone variation) | 46 | 47 | --- 48 | ### 📌 1.3. Tech Stack 49 | | Component | Technology | 50 | |-------|-------| 51 | | ASR (Automatic Speech Recognition) | SenseVoice | 52 | | LLM (Large Language Model) | OpenAI GPT / DeepSeek | 53 | | TTS (Text-to-Speech) | CosyVoice | 54 | | Live2D Animation | live2d-py + OpenGL | 55 | | Configuration Management | Python Config | 56 | 57 | --- 58 | ## 🛠 2. Installation and Configuration 59 | 60 | --- 61 | 62 | ### 📌 2.1. System Requirements 63 | 64 | This project is developed with **Python 3.11**, and the following system requirements should be met before running it: 65 | 66 | ✅ **Operating System**: 67 | - 🖥 **Windows 10/11** or **Linux** 68 | 69 | ✅ **Python Version**: 70 | - 📌 Recommended **Python 3.8 or above** 71 | 72 | ⚠️ **Note**: 73 | The **TTS module** runs in a **conda environment** and requires **Miniconda** to be installed beforehand. 74 | 🔗 You can download it from [Miniconda Official Website](https://docs.conda.io/en/latest/miniconda.html). 75 | --- 76 | 77 | ### 📌 2.2. Dependencies 78 | 79 | This project leverages the following open-source libraries and models: 80 | 81 | 🎙 **Automatic Speech Recognition (ASR)**: 82 | - **SenseVoice** - High-precision **multilingual speech recognition** and **speech emotion analysis**. 83 | - 🔗 **GitHub**: [SenseVoice Repository](https://github.com/FunAudioLLM/SenseVoice) 84 | 85 | 🔊 **Text-to-Speech (TTS)**: 86 | - **CosyVoice** - A powerful **generative speech synthesis system**, supporting **zero-shot voice cloning**. 87 | - 🔗 **GitHub**: [CosyVoice Repository](https://github.com/FunAudioLLM/CosyVoice) 88 | 89 | 📽 **Live2D Animation**: 90 | - **live2d-py** - A tool for **directly loading and manipulating Live2D models** in Python. 91 | - 🔗 **GitHub**: [live2d-py Repository](https://github.com/Arkueid/live2d-py) 92 | 93 | --- 94 | ## 📁 3. Installation Steps 95 | 96 | --- 97 | ### 📌 3.1. Clone the Project Repository 98 | 99 | ```bash 100 | git clone https://github.com/suzuran0y/Live2D-LLM-Chat.git 101 | cd Live2D-LLM-Chat 102 | ``` 103 | 104 | ### 📌 3.2. Create a Virtual Environment (Optional) 105 | ```bash 106 | python -m venv venv 107 | source venv/bin/activate # Linux/macOS activation 108 | venv\Scripts\activate # Windows activation 109 | ``` 110 | 111 | ### 📌 3.3. Install Dependencies 112 | 113 | ```bash 114 | pip install -r requirements.txt 115 | ``` 116 | 117 | --- 118 | ### 📌 3.4. Install ASR & TTS Models 119 | 120 | 🎙 **Speech Recognition (ASR) - SenseVoice** 121 | This project uses SenseVoice for ASR, supporting **high-precision multilingual speech recognition** and **speech emotion detection**. 
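Once the dependencies below are installed, a single call through funasr is enough to turn a recorded clip into text. The snippet that follows is only a minimal sketch of that call, not the project's actual `ASR.py`; the model path, device, and generation arguments are illustrative assumptions that should be adapted to your own setup:

```python
from funasr import AutoModel
from funasr.utils.postprocess_utils import rich_transcription_postprocess

# Load a local SenseVoiceSmall checkpoint (mirrors ASR_MODEL_DIR in config.py; adjust path/device as needed).
model = AutoModel(model="pretrained_models/SenseVoiceSmall", device="cuda:0")

# Transcribe one recorded utterance; language="auto" lets SenseVoice detect the spoken language.
result = model.generate(input="ASR_env/input_voice/voice.wav", language="auto", use_itn=True)
print(rich_transcription_postprocess(result[0]["text"]))
```

The recognized text is what the pipeline hands to the LLM module in the interaction flow described in section 5.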
122 | 123 | #### 1️⃣ Install SenseVoice Dependencies 124 | Install SenseVoice dependencies using pip: 125 | ```bash 126 | pip install funasr 127 | ``` 128 | 129 | If you need ONNX or TorchScript inference, install the corresponding versions: 130 | ```bash 131 | pip install funasr-onnx # ONNX version 132 | pip install funasr-torch # TorchScript version 133 | ``` 134 | 135 | #### 2️⃣ Download SenseVoice Pre-trained Models 136 | SenseVoice provides several **pre-trained models**, which can be downloaded via ModelScope: 137 | ```python 138 | from modelscope import snapshot_download 139 | 140 | # Download SenseVoice-Small version 141 | snapshot_download('iic/SenseVoiceSmall', local_dir='pretrained_models/SenseVoiceSmall') 142 | # Download SenseVoice-Large version for higher accuracy 143 | snapshot_download('iic/SenseVoiceLarge', local_dir='pretrained_models/SenseVoiceLarge') 144 | ``` 145 | 146 | 🔗 More details: [SenseVoice GitHub](https://github.com/FunAudioLLM/SenseVoice) | [ModelScope](https://www.modelscope.cn/models/iic/SenseVoiceSmall) 147 | 148 | 🔊 **Text-to-Speech (TTS) - CosyVoice** 149 | This project uses CosyVoice for TTS, supporting **multilingual speech synthesis, voice cloning, and cross-lingual synthesis**. 150 | 151 | #### 1️⃣ Install CosyVoice Dependencies 152 | Clone the CosyVoice repository: 153 | ```bash 154 | git clone --recursive https://github.com/FunAudioLLM/CosyVoice.git 155 | cd CosyVoice 156 | git submodule update --init --recursive 157 | ``` 158 | 159 | #### 2️⃣ Create a Conda Environment and Install Dependencies 160 | ```bash 161 | # Create a Conda virtual environment 162 | conda create -n cosyvoice -y python=3.10 163 | conda activate cosyvoice 164 | 165 | # Install required dependencies 166 | conda install -y -c conda-forge pynini==2.1.5 167 | pip install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple/ --trusted-host=mirrors.aliyun.com 168 | ``` 169 | 170 | Install SoX (if necessary): 171 | ```bash 172 | # Ubuntu 173 | sudo apt-get install sox libsox-dev 174 | # CentOS 175 | sudo yum install sox sox-devel 176 | ``` 177 | 178 | #### 3️⃣ Download CosyVoice Pre-trained Models 179 | It is recommended to download the following CosyVoice pre-trained models: 180 | ```python 181 | from modelscope import snapshot_download 182 | 183 | snapshot_download('iic/CosyVoice2-0.5B', local_dir='pretrained_models/CosyVoice2-0.5B') 184 | snapshot_download('iic/CosyVoice-300M', local_dir='pretrained_models/CosyVoice-300M') 185 | snapshot_download('iic/CosyVoice-300M-SFT', local_dir='pretrained_models/CosyVoice-300M-SFT') 186 | snapshot_download('iic/CosyVoice-300M-Instruct', local_dir='pretrained_models/CosyVoice-300M-Instruct') 187 | snapshot_download('iic/CosyVoice-ttsfrd', local_dir='pretrained_models/CosyVoice-ttsfrd') 188 | ``` 189 | 190 | 🔗 More details: [CosyVoice GitHub](https://github.com/FunAudioLLM/CosyVoice) | [ModelScope](https://www.modelscope.cn/iic/CosyVoice2-0.5B) 191 | 192 | --- 193 | ## ⚙️ 4. Configuration for Local Setup(important!!) 194 | 195 | --- 196 | 197 | ### 📌 4.1. Configure ASR & TTS Models 198 | 199 | After installing **ASR** and **TTS** models, follow these steps for local configuration: 200 | 201 | ✅ **Replace SenseVoice Directory** 202 | - Move the downloaded **SenseVoice** folder into `Live2D-LLM-Chat/ASR_env/`, replacing the existing empty folder. 203 | 204 | ✅ **Replace CosyVoice Directory** 205 | - Move the downloaded **CosyVoice** folder into `Live2D-LLM-Chat/TTS_env/`, replacing the existing empty folder. 
206 | 207 | ✅ **Replace `webui.py` File** 208 | - Move the `TTS_env/webui.py` file into the `CosyVoice` folder, replacing the original `webui.py` file. 209 | 210 | --- 211 | 212 | ### 📌 4.2. Configure `config.py` for Local Environment 213 | Modify **`config.py`** to adjust local file paths and parameters. Example: 214 | ```python 215 | class Config: 216 | # 🏠 Project Root Directory 217 | PROJECT_ROOT = "E:/PyCharm/project/project1" 218 | 219 | # 🎙 ASR (Automatic Speech Recognition) Configuration 220 | ASR_MODEL_DIR = os.path.join(PROJECT_ROOT, "ASR_env/SenseVoice/models/SenseVoiceSmall") 221 | ASR_AUDIO_INPUT = os.path.join(PROJECT_ROOT, "ASR_env/input_voice/voice.wav") 222 | 223 | # 🔊 TTS (Text-to-Speech) Configuration 224 | TTS_API_URL = "http://localhost:8000/" 225 | TTS_OUTPUT_DIR = os.path.join(PROJECT_ROOT, "TTS_env/output_voice/") 226 | 227 | ``` 228 | ❗ **Ensure all paths are correctly set up before running the project!** 229 | 230 | --- 231 | ### 📌 4.3. Configure LLM Model 232 | 233 | Local deployment of the **LLM model** relies on **LM Studio**. Follow these steps: 234 | 235 | #### 1️⃣ Install LM Studio 236 | Download from [GitHub](https://github.com/lmstudio-ai) or the [LM Studio official website](https://lmstudio.ai/). 237 | 238 | #### 2️⃣ Open the application and download an LLM model compatible with your device. 239 | Start LM Studio and obtain the local API URL. 240 | Adjust the model path & port number in `config.py`. 241 | 242 | #### 3️⃣ Run the local LLM and integrate it into the project. 243 | ⚠️ **Note**: The performance of locally deployed LLM models depends on device capabilities and may not match cloud-based models. If higher performance is required, consider using OpenAI GPT-4 or DeepSeek API. 244 | 
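Because LM Studio exposes an OpenAI-compatible server, the local and cloud back ends can share the same client code, and only the base URL and key in `config.py` need to change. The snippet below is a hedged sketch of that idea rather than the project's actual `LLM.py`; the port (LM Studio's default is 1234), the placeholder API key, and the model name are assumptions:

```python
from openai import OpenAI

# Point the OpenAI client at the local LM Studio server; swap base_url/api_key
# for the OpenAI or DeepSeek endpoints to switch to cloud inference instead.
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

def chat(prompt: str) -> str:
    # "local-model" is a placeholder; LM Studio routes requests to whichever model is currently loaded.
    response = client.chat.completions.create(
        model="local-model",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

print(chat("你好呀"))
```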
245 | --- 246 | ## 👀 5. Usage Instructions 247 | --- 248 | 249 | ### 📌 5.1. Start the TTS API 250 | 251 | Before running the main program, **start the TTS API**: 252 | 253 | ```bash 254 | python TTS_api.py # This is now integrated into the main program but can be run separately for debugging. 255 | ``` 256 | 257 | 🎯 The TTS API module will run `webui.py` in the **conda environment**. Once successfully started, you can access the WebUI for voice synthesis management: 🌍 Default address: [http://localhost:8000](http://localhost:8000) 258 | 259 | ❗ Ensure the **TTS API is running properly**, or the program will not be able to generate speech. 260 | 261 | --- 262 | ### 📌 5.2. Run the Main Program 263 | 264 | Once the TTS API is started, run the main program: 265 | 266 | ```bash 267 | python main.py 268 | ``` 269 | 270 | 🎙 **Interaction Steps**: 271 | 272 | 1️⃣ **Press and hold the Ctrl key** to start recording, **press the Alt key** to stop recording. The voice will be automatically converted into text. 273 | 2️⃣ The **text is processed by the LLM module**, generating a response. 274 | 3️⃣ The **response text is converted into speech** via the TTS module, and the Live2D model will sync its lip movements to the speech. 275 | 276 | --- 277 | 278 | ### 📌 5.3. System Architecture Diagram 279 | 280 | | **Step** | **Module** | **Input** | **Processing** | **Output** | 281 | |----------|---------|---------|---------|---------| 282 | | 🎤 **User Speech** | **User** | Speech Input | User speaks | Audio Signal | 283 | | 🎙 **Speech Recognition** | **ASR (SenseVoice)** | Audio Signal | Speech-to-Text (STT) | Recognized Text | 284 | | 🤖 **Text Understanding & Generation** | **LLM (GPT-4 / DeepSeek)** | Recognized Text | Semantic Analysis & AI Response Generation | AI-Generated Text | 285 | | 🔊 **Speech Synthesis** | **TTS (CosyVoice)** | AI-Generated Text | Text-to-Speech (TTS) | Speech Data | 286 | | 🎭 **Live2D Animation** | **Live2D** | Speech Data | Motion Generation | Character Animation | 287 | | 🗣 **AI Voice Feedback** | **User** | Character Voice & Actions | User hears AI response | Voice & Visual Interaction | 288 | 289 | --- 290 | # 📂 6. Project Structure 291 | 292 | This project follows a modular design, integrating **ASR (speech recognition), TTS (text-to-speech), LLM (large language model), and Live2D animation rendering** as core functionalities. Below is the **complete project structure**: 293 | 294 | ```bash 295 | Live2D-LLM-Chat/ 296 | │── main.py # 🚀 Main program entry 297 | │── ASR.py # 🎙 Speech Recognition (ASR) module 298 | │── TTS.py # 🔊 Speech Synthesis (TTS) module 299 | │── TTS_api.py # 🌐 TTS API module 300 | │── LLM.py # 🤖 Large Language Model (LLM) module 301 | │── Live2d_animation.py # 🎭 Live2D animation management module 302 | │── webui.py # 🖥 WebUI for voice synthesis 303 | │── config.py # ⚙️ Configuration file 304 | │── requirements.txt # 📦 Dependency list 305 | └── README.md # 📄 Project documentation 306 | ``` 307 | --- 308 | ## 🚀 7. Project Development 309 | --- 310 | 311 | ### 📌 7.1. Past Developments 312 | 313 | #### 📅 **2025.01.28 - Initial Project Concept** 314 | - 🎯 **Core Goals Defined**: Developing a **Live2D + LLM** real-time interaction system. 315 | - 🔍 **Technology Research**: Investigating ASR (speech recognition), TTS (text-to-speech), and Live2D solutions. 316 | - ✅ **Core Components Selected**: 317 | - **SenseVoice** for ASR 318 | - **CosyVoice** for TTS 319 | - **live2d-py** for animation rendering 320 | 321 | #### 📅 **2025.02.28 - First Version Release** 322 | - 🎙 **Implemented speech input & recognition (ASR)** 323 | - 🤖 **Integrated LLM for text generation** 324 | - 🔊 **Generated speech output & synced Live2D mouth movements** 325 | 326 | --- 327 | 328 | ### 📌 7.2. Future Plans ~~(Wishlist)~~ 329 | 330 | 🔹 **LLM Module Optimization**: 331 | - Due to **device limitations**, local deployment may not match cloud-based models. **Improving LLM processing logic** to enhance stability. 332 | 333 | 🔹 **Refined Output Management**: 334 | - Optimizing **program logs and output messages** to retain only essential information for a cleaner display. 335 | 336 | 🔹 **Enhanced Live2D Interaction**: 337 | - **Improving Live2D model expressions and movements** to make interactions feel more natural and engaging. 338 | 339 | 🔹 **Additional Optimizations**: 340 | - 🛠 Improving TTS & ASR efficiency 341 | - 🌍 Expanding multilingual support 342 | - 🔗 Enhancing cloud-based inference capabilities 343 | 344 | --- 351 | ## 🤝 8. 
Contributions & Acknowledgments 352 | --- 353 | 354 | This project builds upon work from [SenseVoice](https://github.com/FunAudioLLM/SenseVoice), [CosyVoice](https://github.com/FunAudioLLM/CosyVoice), and [live2d-py](https://github.com/Arkueid/live2d-py), incorporating modifications and optimizations to fit the project’s requirements. 355 | 🎉 **Special thanks to the original developers!** 356 | 357 | 💡 **We welcome contributions and feedback!** 358 | 359 | 📢 If you have suggestions or improvements, please submit a **PR (Pull Request)** or **Issue** on GitHub. 360 | 361 | --- 362 | ## 📄 9. License 363 | This project is licensed under the [Apache-2.0 License](LICENSE). --------------------------------------------------------------------------------