├── LLM_env ├── .gitkeep ├── conversation_history.txt └── LM Studio │ └── LM Studio.htm ├── Live2d_env ├── .gitkeep ├── running_photo.jpg ├── pachirisu anime girl - top half.moc3 ├── pachirisu anime girl - top half.4096 │ └── texture_00.png ├── .model3.json ├── pachirisu anime girl - top half.model3.json ├── pachirisu anime girl - top half.cdi3.json └── pachirisu anime girl - top half.physics3.json ├── TTS_env ├── tmp │ └── .gitkeep ├── CosyVoice │ └── .gitkeep ├── voice_history │ ├── .gitkeep │ ├── 20250227_231810.wav │ ├── 20250228_033731.wav │ └── 20250228_042517.wav ├── voice_training_sample │ ├── .gitkeep │ ├── text_taiyuan.txt │ ├── fushun.mp3 │ ├── taiyuan.mp3 │ └── text_fushun.txt ├── output_voice_text.txt ├── output_voice │ └── 20250228_042632.wav ├── voice_output_api.py └── webui.py ├── ASR_env ├── SenseVoice │ └── .gitkeep ├── input_voice │ ├── .gitkeep │ └── voice.wav └── sensevoice_attempt.py ├── __pycache__ ├── ASR.cpython-311.pyc ├── LLM.cpython-311.pyc ├── TTS.cpython-311.pyc ├── TTS_api.cpython-311.pyc ├── config.cpython-311.pyc ├── config.cpython-312.pyc └── Live2d_animation.cpython-311.pyc ├── .gitignore ├── main.py ├── config.py ├── ASR.py ├── TTS_api.py ├── TTS.py ├── LLM.py ├── Live2d_animation.py ├── requirements.txt ├── README_CN.md ├── LICENSE └── README.md /LLM_env/.gitkeep: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /Live2d_env/.gitkeep: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /TTS_env/tmp/.gitkeep: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /ASR_env/SenseVoice/.gitkeep: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /TTS_env/CosyVoice/.gitkeep: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /ASR_env/input_voice/.gitkeep: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /TTS_env/voice_history/.gitkeep: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /TTS_env/voice_training_sample/.gitkeep: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /TTS_env/output_voice_text.txt: -------------------------------------------------------------------------------- 1 | 要照顾大家的感受,跟大家搞好关系,我必须做好纽带! 2 | -------------------------------------------------------------------------------- /TTS_env/voice_training_sample/text_taiyuan.txt: -------------------------------------------------------------------------------- 1 | 春节的时候,至亲的人们都会为了团圆而聚在一起呢。今、今年除了姐姐们之外,也想和指挥官一起团圆…可以吗…? 
-------------------------------------------------------------------------------- /Live2d_env/running_photo.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/suzuran0y/Live2D-LLM-Chat/HEAD/Live2d_env/running_photo.jpg -------------------------------------------------------------------------------- /ASR_env/input_voice/voice.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/suzuran0y/Live2D-LLM-Chat/HEAD/ASR_env/input_voice/voice.wav -------------------------------------------------------------------------------- /__pycache__/ASR.cpython-311.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/suzuran0y/Live2D-LLM-Chat/HEAD/__pycache__/ASR.cpython-311.pyc -------------------------------------------------------------------------------- /__pycache__/LLM.cpython-311.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/suzuran0y/Live2D-LLM-Chat/HEAD/__pycache__/LLM.cpython-311.pyc -------------------------------------------------------------------------------- /__pycache__/TTS.cpython-311.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/suzuran0y/Live2D-LLM-Chat/HEAD/__pycache__/TTS.cpython-311.pyc -------------------------------------------------------------------------------- /__pycache__/TTS_api.cpython-311.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/suzuran0y/Live2D-LLM-Chat/HEAD/__pycache__/TTS_api.cpython-311.pyc -------------------------------------------------------------------------------- /__pycache__/config.cpython-311.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/suzuran0y/Live2D-LLM-Chat/HEAD/__pycache__/config.cpython-311.pyc -------------------------------------------------------------------------------- /__pycache__/config.cpython-312.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/suzuran0y/Live2D-LLM-Chat/HEAD/__pycache__/config.cpython-312.pyc -------------------------------------------------------------------------------- /TTS_env/output_voice/20250228_042632.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/suzuran0y/Live2D-LLM-Chat/HEAD/TTS_env/output_voice/20250228_042632.wav -------------------------------------------------------------------------------- /TTS_env/voice_history/20250227_231810.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/suzuran0y/Live2D-LLM-Chat/HEAD/TTS_env/voice_history/20250227_231810.wav -------------------------------------------------------------------------------- /TTS_env/voice_history/20250228_033731.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/suzuran0y/Live2D-LLM-Chat/HEAD/TTS_env/voice_history/20250228_033731.wav -------------------------------------------------------------------------------- /TTS_env/voice_history/20250228_042517.wav: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/suzuran0y/Live2D-LLM-Chat/HEAD/TTS_env/voice_history/20250228_042517.wav -------------------------------------------------------------------------------- /TTS_env/voice_training_sample/fushun.mp3: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/suzuran0y/Live2D-LLM-Chat/HEAD/TTS_env/voice_training_sample/fushun.mp3 -------------------------------------------------------------------------------- /TTS_env/voice_training_sample/taiyuan.mp3: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/suzuran0y/Live2D-LLM-Chat/HEAD/TTS_env/voice_training_sample/taiyuan.mp3 -------------------------------------------------------------------------------- /TTS_env/voice_training_sample/text_fushun.txt: -------------------------------------------------------------------------------- 1 | 今天的抚顺,也是元气满满!如果有什么想了解的,我可以陪指挥官一起调查哦。这个送给太原的话,她一定会很高兴吧!在极北处昼夜交替时出现的幽灵船?确实听说过这种传闻。长春虽然是妹妹,但她教了我很多呢! -------------------------------------------------------------------------------- /__pycache__/Live2d_animation.cpython-311.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/suzuran0y/Live2D-LLM-Chat/HEAD/__pycache__/Live2d_animation.cpython-311.pyc -------------------------------------------------------------------------------- /Live2d_env/pachirisu anime girl - top half.moc3: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/suzuran0y/Live2D-LLM-Chat/HEAD/Live2d_env/pachirisu anime girl - top half.moc3 -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | 2 | TTS_env/CosyVoice/* 3 | !TTS_env/CosyVoice/.gitkeep 4 | 5 | ASR_env/SenseVoice/* 6 | !ASR_env/SenseVoice/.gitkeep 7 | 8 | .venv/ 9 | .idea/ 10 | -------------------------------------------------------------------------------- /Live2d_env/pachirisu anime girl - top half.4096/texture_00.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/suzuran0y/Live2D-LLM-Chat/HEAD/Live2d_env/pachirisu anime girl - top half.4096/texture_00.png -------------------------------------------------------------------------------- /LLM_env/conversation_history.txt: -------------------------------------------------------------------------------- 1 | Time:2025-02-28 01:03:39 2 | User:你好。 3 | Neko:你好喵~有什么想要了解或学习的吗?尽管问我吧! 4 | --- 5 | Time:2025-02-28 04:26:13 6 | User:晚上好。 7 | Neko:晚上好,有什么需要我帮忙的吗? 8 | --- 9 | Time:2025-02-28 04:27:48 10 | User:你是谁? 11 | Neko:我是你的知识助手猫娘,随时准备为你解答问题、讲解知识或者陪你聊聊天哦! 
12 | --- 13 | -------------------------------------------------------------------------------- /ASR_env/sensevoice_attempt.py: -------------------------------------------------------------------------------- 1 | from funasr import AutoModel 2 | from funasr.utils.postprocess_utils import rich_transcription_postprocess 3 | 4 | model_dir = "E:/PyCharm/project/project1/ASR_env/SenseVoice/models/SenseVoiceSmall" # 替换为AST模型所在地址 5 | voice_dir = "E:/PyCharm/project/project1/ASR_env/input_voice/voice.wav" # 替换为音频文件所在地址 6 | 7 | model = AutoModel( 8 | model=model_dir, 9 | trust_remote_code=False, 10 | # remote_code="./model.py", 11 | # vad_model="fsmn-vad", 12 | # vad_kwargs={"max_single_segment_time": 30000}, 13 | device="cuda:0", 14 | disable_update=True 15 | ) 16 | 17 | # en 18 | res = model.generate( 19 | input=voice_dir,#f"{model.model_path}/example/zh.mp3", 20 | cache={}, 21 | language="auto", # "zh", "en", "yue", "ja", "ko", "nospeech" 22 | use_itn=True, 23 | batch_size_s=60, 24 | merge_vad=True, 25 | merge_length_s=15, 26 | ) 27 | text = rich_transcription_postprocess(res[0]["text"]) 28 | print(text) -------------------------------------------------------------------------------- /TTS_env/voice_output_api.py: -------------------------------------------------------------------------------- 1 | 2 | # 单独调用CosyVoice模型的api接口 需要预先运行 webui.py 启动模型 3 | 4 | from gradio_client import Client, handle_file 5 | 6 | training_sample_dir = "" # 替换为需要训练音色的音频文本所在地址 7 | output_text_dir = "" # 替换为想要训练后的音色进行输出的音频文本所在地址 8 | training_voice_dir = "" # 替换为需要训练音色的音频文件所在地址 9 | 10 | # 载入需要训练音色的音频文本 11 | with open(training_sample_dir, "r", encoding='utf-8') as file: 12 | content_2 = file.read() 13 | # 载入想要训练后的音色进行输出的音频文本 14 | with open(output_text_dir, "r", encoding='utf-8') as file: 15 | content_1 = file.read() 16 | # 调用模型 17 | client = Client("http://localhost:8000/") # 该地址为cosyvoice模型自动分配地址,无出错时不改动 18 | result = client.predict( 19 | tts_text=content_1, 20 | mode_checkbox_group="3s极速复刻", 21 | sft_dropdown="", 22 | prompt_text=content_2, 23 | prompt_wav_upload=handle_file(training_voice_dir), 24 | prompt_wav_record=handle_file(training_voice_dir), 25 | instruct_text="", 26 | seed=0, 27 | stream=False, 28 | speed=1, 29 | api_name="/generate_audio" 30 | ) -------------------------------------------------------------------------------- /Live2d_env/.model3.json: -------------------------------------------------------------------------------- 1 | { 2 | "Type": 0, 3 | "FileReferences": { 4 | "Moc": "pachirisu anime girl - top half.moc3", 5 | "Textures": [ 6 | "pachirisu anime girl - top half.4096/texture_00.png" 7 | ], 8 | "Physics": "pachirisu anime girl - top half.physics3.json", 9 | "PhysicsV2": { 10 | "File": "pachirisu anime girl - top half.physics3.json" 11 | } 12 | }, 13 | "Controllers": { 14 | "ParamHit": {}, 15 | "ParamLoop": {}, 16 | "KeyTrigger": {}, 17 | "ParamTrigger": {}, 18 | "AreaTrigger": {}, 19 | "HandTrigger": {}, 20 | "EyeBlink": { 21 | "MinInterval": 500, 22 | "MaxInterval": 6000, 23 | "Enabled": true 24 | }, 25 | "LipSync": { 26 | "Gain": 5.0 27 | }, 28 | "MouseTracking": { 29 | "SmoothTime": 0.15, 30 | "Enabled": true 31 | }, 32 | "AutoBreath": { 33 | "Enabled": true 34 | }, 35 | "ExtraMotion": { 36 | "Enabled": true 37 | }, 38 | "Accelerometer": { 39 | "Enabled": true 40 | }, 41 | "Microphone": {}, 42 | "Transform": {}, 43 | "FaceTracking": { 44 | "Enabled": true 45 | }, 46 | "HandTracking": {}, 47 | "ParamValue": {}, 48 | "PartOpacity": {}, 49 | "ArtmeshOpacity": {}, 50 | "ArtmeshColor": {}, 51 | 
"ArtmeshCulling": { 52 | "DefaultMode": 0 53 | }, 54 | "IntimacySystem": {} 55 | }, 56 | "Options": { 57 | "TexType": 0 58 | } 59 | } -------------------------------------------------------------------------------- /Live2d_env/pachirisu anime girl - top half.model3.json: -------------------------------------------------------------------------------- 1 | { 2 | "Version": 3, 3 | "Type": 0, 4 | "FileReferences": { 5 | "Moc": "pachirisu anime girl - top half.moc3", 6 | "Textures": [ 7 | "pachirisu anime girl - top half.4096/texture_00.png" 8 | ], 9 | "Physics": "pachirisu anime girl - top half.physics3.json", 10 | "PhysicsV2": { 11 | "File": "pachirisu anime girl - top half.physics3.json" 12 | } 13 | }, 14 | "Controllers": { 15 | "ParamHit": { 16 | "Enabled": true 17 | }, 18 | "ParamLoop": { 19 | "Enabled": true 20 | }, 21 | "KeyTrigger": { 22 | "Enabled": true 23 | }, 24 | "ParamTrigger": { 25 | "Enabled": true 26 | }, 27 | "AreaTrigger": { 28 | "Enabled": true 29 | }, 30 | "HandTrigger": { 31 | "Enabled": true 32 | }, 33 | "EyeBlink": { 34 | "MinInterval": 500, 35 | "MaxInterval": 6000, 36 | "Enabled": true 37 | }, 38 | "LipSync": { 39 | "Gain": 5.0, 40 | "Enabled": true 41 | }, 42 | "MouseTracking": { 43 | "SmoothTime": 0.15, 44 | "Enabled": true 45 | }, 46 | "AutoBreath": { 47 | "Enabled": true 48 | }, 49 | "ExtraMotion": { 50 | "Enabled": true 51 | }, 52 | "Accelerometer": { 53 | "Enabled": true 54 | }, 55 | "Microphone": { 56 | "Enabled": true 57 | }, 58 | "Transform": {}, 59 | "FaceTracking": { 60 | "Enabled": true 61 | }, 62 | "HandTracking": { 63 | "Enabled": true 64 | }, 65 | "ParamValue": { 66 | "Enabled": true 67 | }, 68 | "PartOpacity": { 69 | "Enabled": true 70 | }, 71 | "ArtmeshOpacity": { 72 | "Enabled": true 73 | }, 74 | "ArtmeshColor": { 75 | "Enabled": true 76 | }, 77 | "ArtmeshCulling": { 78 | "DefaultMode": 0 79 | }, 80 | "IntimacySystem": { 81 | "Enabled": true 82 | } 83 | }, 84 | "Options": { 85 | "TexType": 0 86 | } 87 | } -------------------------------------------------------------------------------- /LLM_env/LM Studio/LM Studio.htm: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 44 | 45 | 46 | 49 | 50 | -------------------------------------------------------------------------------- /main.py: -------------------------------------------------------------------------------- 1 | 2 | import threading 3 | import datetime 4 | from TTS_api import TTSAPIManager 5 | from ASR import ASRManager 6 | from TTS import TTSManager 7 | from LLM import LLMManager 8 | from Live2d_animation import Live2DAnimationManager 9 | from config import Config 10 | 11 | class MainManager: 12 | def __init__(self): 13 | 14 | # Initialize the main manager, integrating TTS_API, TTS, ASR, LLM, and Live2D. 15 | 16 | # Start the TTS API and ensure the API is available. 17 | self.tts_api_manager = TTSAPIManager(Config.SHOW_WINDOW) 18 | api_ready = self.tts_api_manager.start_tts_api() 19 | if not api_ready: 20 | print("TTS API startup failed, program terminated!") 21 | return 22 | 23 | # Initialize other modules 24 | self.asr_manager = ASRManager() 25 | self.tts_manager = TTSManager() 26 | self.llm_manager = LLMManager() 27 | self.live2d_manager = Live2DAnimationManager( 28 | model_path=Config.LIVE2D_MODEL_PATH 29 | ) 30 | 31 | self.history_file = Config.LLM_CONVERSATION_HISTORY 32 | 33 | # Start Live2D window (ensure it keeps running). 
34 | live2d_thread = threading.Thread(target=self.live2d_manager.play_live2d_once) 35 | live2d_thread.start() 36 | 37 | def run(self): 38 | while True: 39 | user_wav = Config.ASR_AUDIO_INPUT 40 | self.asr_manager.record_audio(user_wav) 41 | user_input = self.asr_manager.recognize_speech(user_wav) 42 | print(f">>> {user_input}") 43 | 44 | if user_input.lower() in ("exit。", "quit。", "q。", "结束。", "再见。"): 45 | print("Conversation exited.") 46 | break 47 | 48 | reply = self.llm_manager.chat_once(user_input) 49 | output_wav = self.tts_manager.synthesize(reply) 50 | 51 | self.live2d_manager.play_audio_and_print_mouth(output_wav) 52 | 53 | with open(self.history_file, 'a', encoding='utf-8') as f: 54 | timestamp = datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S") 55 | f.write(f"Time:{timestamp}\n") 56 | f.write(f"User:{user_input}\nNeko:{reply}\n---\n") 57 | if __name__ == "__main__": 58 | main_manager = MainManager() 59 | main_manager.run() 60 | -------------------------------------------------------------------------------- /config.py: -------------------------------------------------------------------------------- 1 | import os 2 | 3 | # 将自己对应的文件路径替换掉下面的配置文件路径中 4 | class Config: 5 | # 项目根目录 6 | PROJECT_ROOT = "E:/PyCharm/project/project1" 7 | 8 | # ASR(自动语音识别)配置 9 | ASR_MODEL_DIR = os.path.join(PROJECT_ROOT, "ASR_env/SenseVoice/models/SenseVoiceSmall") 10 | ASR_AUDIO_INPUT = os.path.join(PROJECT_ROOT, "ASR_env/input_voice/voice.wav") 11 | 12 | # TTS(文本转语音)配置 13 | TTS_API_URL = "http://localhost:8000/" # 该地址为cosyvoice模型自动分配地址,无出错时不改动 14 | TTS_OUTPUT_DIR = os.path.join(PROJECT_ROOT, "TTS_env/output_voice/") 15 | TTS_HISTORY_DIR = os.path.join(PROJECT_ROOT, "TTS_env/voice_history/") 16 | TTS_PROMPT_TEXT = os.path.join(PROJECT_ROOT, "TTS_env/voice_training_sample/text_taiyuan.txt") 17 | TTS_PROMPT_WAV = os.path.join(PROJECT_ROOT, "TTS_env/voice_training_sample/taiyuan.mp3") 18 | 19 | # TTS API 相关 20 | MINICONDA_PATH = "E:/miniconda3" 21 | WEBUI_PYTHON = os.path.join(MINICONDA_PATH, "python.exe") 22 | WEBUI_SCRIPT = os.path.join(PROJECT_ROOT, "TTS_env/CosyVoice/webui.py") 23 | CLEANUP_MODE = "move" # "delete" or "move"; 配置文件清理方式(delete: 删除 | move: 归档) 24 | SHOW_WINDOW = True 25 | 26 | # LLM(大模型)配置 27 | # 根据需要调用的模型填入key 28 | LLM_TMP_DIR = os.path.join(PROJECT_ROOT, "TTS_env/tmp") 29 | LLM_CONVERSATION_HISTORY = os.path.join(PROJECT_ROOT, "LLM_env/conversation_history.txt") 30 | openai_key = "" 31 | deepseek_key = "" 32 | grop_key = "" 33 | online_model = "offline" # "online" or "offline" ; 使用本地部署或在线LLM模型(online: 在线模型 | offline: 本地部署模型) 34 | model_choice = "OpenAI" # "OpenAI" or "deepseek" ; 选择LLM模型(OpenAI | deepseek) 35 | # 当使用LM Studio进行本地部署LLM时,先下载好需要加载的模型,然后加载完成 36 | # 查看LM Studio右侧的API Usage页面,找到自己的 API identifier(model name) 例如:deepseek-r1-distill-qwen-14b 37 | # 接下来查看自己的local server,例如:http://127.0.0.1:1234 38 | # 修改下面的两个变量 39 | model_name = "" # "deepseek-r1-distill-qwen-14b" 40 | api_url = "http://127.0.0.1:1234/v1/chat/completions" # 只需要修改前面的网址部分 41 | 42 | 43 | # Live2D 配置 44 | LIVE2D_MODEL_PATH = os.path.join(PROJECT_ROOT, "Live2d_env/pachirisu anime girl - top half.model3.json") 45 | 46 | # WebUI 相关配置 47 | WEBUI_SAVE_DIR = os.path.join(PROJECT_ROOT, "TTS_env/output_voice/") 48 | WEBUI_HISTORY_DIR = os.path.join(PROJECT_ROOT, "TTS_env/voice_history/") 49 | WEBUI_MODEL_DIR = os.path.join(PROJECT_ROOT, "TTS_env/CosyVoice/pretrained_models/CosyVoice2-0.5B") 50 | 51 | # 可用于打印检查配置 52 | if __name__ == "__main__": 53 | for attr in dir(Config): 54 | if not attr.startswith("__"): 55 | 
print(f"{attr} = {getattr(Config, attr)}") 56 | -------------------------------------------------------------------------------- /ASR.py: -------------------------------------------------------------------------------- 1 | 2 | import time 3 | import wave 4 | import keyboard 5 | import pyaudio 6 | from funasr import AutoModel 7 | from funasr.utils.postprocess_utils import rich_transcription_postprocess 8 | from config import Config 9 | 10 | class ASRManager: 11 | def __init__(self, model_dir=Config.ASR_MODEL_DIR, device="cuda:0"): 12 | 13 | # 初始化 ASR 语音识别管理器 14 | # param model_dir: 语音识别模型路径 15 | # param device: 使用的计算设备(默认为 GPU) 16 | 17 | self.model = AutoModel( 18 | model=model_dir, 19 | trust_remote_code=False, 20 | device=device, 21 | disable_update=True 22 | ) 23 | self.sample_rate = 44100 24 | self.channels = 1 25 | self.chunk = 1024 26 | self.format = pyaudio.paInt16 27 | 28 | def record_audio(self, output_wav_file): 29 | 30 | # 录音功能,按住 `CTRL` 说话,按 `ALT` 结束录音。 31 | # param output_wav_file: 录制的音频文件路径 32 | 33 | p = pyaudio.PyAudio() 34 | stream = p.open( 35 | format=self.format, 36 | channels=self.channels, 37 | rate=self.sample_rate, 38 | input=True, 39 | frames_per_buffer=self.chunk 40 | ) 41 | 42 | print("[CTRL键] 开口...") 43 | keyboard.wait('ctrl') 44 | print("讲话中... [ALT键] 结束...") 45 | 46 | frames = [] 47 | while True: 48 | data = stream.read(self.chunk) 49 | frames.append(data) 50 | if keyboard.is_pressed('alt'): 51 | print("录音结束,正在处理...") 52 | break 53 | time.sleep(0.01) 54 | 55 | stream.stop_stream() 56 | stream.close() 57 | p.terminate() 58 | 59 | # 保存音频到文件 60 | with wave.open(output_wav_file, 'wb') as wf: 61 | wf.setnchannels(self.channels) 62 | wf.setsampwidth(p.get_sample_size(self.format)) 63 | wf.setframerate(self.sample_rate) 64 | wf.writeframes(b''.join(frames)) 65 | 66 | def recognize_speech(self, wav_path): 67 | start_time = time.time() 68 | 69 | # 进行语音识别,将音频转换为文本。 70 | # param wav_path: 音频文件路径 71 | # return: 识别出的文本 72 | 73 | res = self.model.generate( 74 | input=wav_path, 75 | language="auto", 76 | use_itn=True, 77 | batch_size_s=60, 78 | merge_vad=True, 79 | merge_length_s=15, 80 | ) 81 | print(f"ASR 识别耗时: {time.time() - start_time:.2f} 秒") 82 | return rich_transcription_postprocess(res[0]["text"]) 83 | 84 | 85 | if __name__ == "__main__": 86 | asr_manager = ASRManager() 87 | audio_file = Config.ASR_AUDIO_INPUT 88 | 89 | # 录音 90 | asr_manager.record_audio(audio_file) 91 | 92 | # 识别语音 93 | recognized_text = asr_manager.recognize_speech(audio_file) 94 | print(f"识别结果: {recognized_text}") 95 | -------------------------------------------------------------------------------- /TTS_api.py: -------------------------------------------------------------------------------- 1 | 2 | import subprocess 3 | import time 4 | import requests 5 | from config import Config 6 | import os 7 | 8 | class TTSAPIManager: 9 | def __init__(self, show_window=Config.SHOW_WINDOW): 10 | # 初始化 TTS API 管理器:param show_window: 是否显示 TTS API 窗口 11 | self.webui_python = Config.WEBUI_PYTHON 12 | self.webui_script = Config.WEBUI_SCRIPT 13 | self.api_url = Config.TTS_API_URL 14 | self.timeout = 300 # 最大等待时间(秒) 15 | self.show_window = show_window 16 | self.env = self._configure_env() 17 | 18 | def _configure_env(self): 19 | # 配置 Conda 环境变量 20 | env = os.environ.copy() 21 | env["CONDA_PREFIX"] = Config.MINICONDA_PATH 22 | env["PATH"] = f"{Config.MINICONDA_PATH}/Scripts;{Config.MINICONDA_PATH}/Library/bin;{env['PATH']}" 23 | env["PYTHONPATH"] = env.get("PYTHONPATH", "") + 
f";{Config.PROJECT_ROOT}/TTS_env/CosyVoice/third_party/Matcha-TTS" 24 | env["PATH"] += f";{Config.PROJECT_ROOT}/TTS_env/CosyVoice/third_party/Matcha-TTS" 25 | return env 26 | 27 | def start_tts_api(self): 28 | # 启动 TTS API 并等待其加载 29 | print("启动 webui.py,并确保 Conda 变量和 `pretrained_models` 目录正确...") 30 | 31 | try: 32 | if self.show_window: 33 | # 创建新窗口运行 WebUI 34 | self.webui_process = subprocess.Popen( 35 | [self.webui_python, self.webui_script], 36 | env=self.env, 37 | stdout=None, 38 | stderr=None, 39 | creationflags=subprocess.CREATE_NEW_CONSOLE # 在新窗口中运行 40 | ) 41 | else: 42 | # 隐藏窗口运行 WebUI 43 | self.webui_process = subprocess.Popen( 44 | [self.webui_python, self.webui_script], 45 | env=self.env, 46 | stdout=None, 47 | stderr=None, 48 | creationflags=subprocess.CREATE_NO_WINDOW # 隐藏窗口 49 | ) 50 | 51 | print("webui.py 已启动,等待 API 加载...") 52 | 53 | start_time = time.time() 54 | while time.time() - start_time < self.timeout: 55 | if self.is_api_available(): 56 | print("API 启动成功!继续运行主程序...") 57 | return True 58 | time.sleep(5) 59 | 60 | print("API 启动超时,可能无法正常工作。") 61 | return False 62 | 63 | except Exception as e: 64 | print(f"启动失败,错误信息: {e}") 65 | return False 66 | 67 | def is_api_available(self): 68 | # 检查 TTS API 是否可用 69 | try: 70 | response = requests.get(self.api_url, timeout=5) 71 | return response.status_code == 200 72 | except requests.exceptions.ConnectionError: 73 | return False 74 | except requests.exceptions.Timeout: 75 | return False 76 | 77 | -------------------------------------------------------------------------------- /TTS.py: -------------------------------------------------------------------------------- 1 | 2 | import os 3 | import time 4 | import shutil 5 | from gradio_client import Client, handle_file 6 | import pygame 7 | from config import Config 8 | 9 | 10 | class TTSManager: 11 | def __init__(self, api_url=Config.TTS_API_URL): 12 | # 初始化 TTS 管理器:param api_url: TTS 服务器 API 地址 13 | self.api_url = api_url 14 | self.client = Client(api_url) 15 | self.output_dir = Config.TTS_OUTPUT_DIR 16 | self.history_dir = Config.TTS_HISTORY_DIR 17 | self.prompt_text_path = Config.TTS_PROMPT_TEXT 18 | self.prompt_wav_path = Config.TTS_PROMPT_WAV 19 | 20 | # 确保目录存在 21 | os.makedirs(self.output_dir, exist_ok=True) 22 | os.makedirs(self.history_dir, exist_ok=True) 23 | 24 | def clear_output_directory(self): 25 | # 在每次生成 TTS 音频之前,先检查 output_voice 目录是否有旧文件,如果有,则移动到 voice_history 目录,确保目录下只有最新的音频。 26 | pygame.mixer.init() 27 | pygame.mixer.music.stop() 28 | pygame.mixer.quit() 29 | audio_files = [f for f in os.listdir(self.output_dir) if f.endswith(".wav")] 30 | 31 | if not audio_files: 32 | return # 没有文件需要移动 33 | 34 | for file in audio_files: 35 | old_path = os.path.join(self.output_dir, file) 36 | new_path = os.path.join(self.history_dir, file) 37 | 38 | try: 39 | shutil.move(old_path, new_path) 40 | # print(f"旧音频文件已归档: {file} -> {self.history_dir}") 41 | except Exception as e: 42 | print(f"无法移动 {file} 到历史目录: {e}") 43 | 44 | def synthesize(self, text, mode="3s极速复刻"): 45 | # 调用 TTS 生成语音,并确保 output_voice 目录是空的 46 | # param text: 要转换为语音的文本 47 | # param mode: TTS 模式(默认 3s 极速复刻) 48 | # return: 生成的音频文件路径 49 | 50 | # 清理 output_voice 目录 51 | self.clear_output_directory() 52 | 53 | # 读取语音克隆样本文本 54 | with open(self.prompt_text_path, "r", encoding="utf-8") as file: 55 | prompt_text = file.read() 56 | 57 | start_time = time.time() 58 | self.client.predict( 59 | tts_text=text, 60 | mode_checkbox_group=mode, 61 | sft_dropdown="", 62 | prompt_text=prompt_text, 63 | 
prompt_wav_upload=handle_file(self.prompt_wav_path), 64 | prompt_wav_record=handle_file(self.prompt_wav_path), 65 | instruct_text="", 66 | seed=0, 67 | stream=False, 68 | speed=1, 69 | api_name="/generate_audio" 70 | ) 71 | print(f"TTS 处理耗时: {time.time() - start_time:.2f} 秒") 72 | 73 | # 获取最新的音频文件 74 | return self.get_latest_audio() 75 | 76 | def get_latest_audio(self): 77 | # 获取 output_voice 目录下最新生成的音频文件 78 | # return: 最新音频文件路径或 None 79 | audio_files = [f for f in os.listdir(self.output_dir) if f.endswith(".wav")] 80 | 81 | if not audio_files: 82 | print("没有找到音频文件!") 83 | return None 84 | 85 | # 按修改时间排序,取最新的 86 | audio_files.sort(key=lambda x: os.path.getmtime(os.path.join(self.output_dir, x)), reverse=True) 87 | latest_audio = os.path.join(self.output_dir, audio_files[0]) 88 | 89 | # print(f"最新音频文件: {latest_audio}") 90 | return latest_audio 91 | -------------------------------------------------------------------------------- /Live2d_env/pachirisu anime girl - top half.cdi3.json: -------------------------------------------------------------------------------- 1 | { 2 | "Version": 3, 3 | "Parameters": [ 4 | { 5 | "Id": "ParamAngleX", 6 | "GroupId": "ParamGroup", 7 | "Name": "Angle X" 8 | }, 9 | { 10 | "Id": "ParamAngleY", 11 | "GroupId": "ParamGroup", 12 | "Name": "Angle Y" 13 | }, 14 | { 15 | "Id": "ParamAngleZ", 16 | "GroupId": "ParamGroup", 17 | "Name": "Angle Z" 18 | }, 19 | { 20 | "Id": "ParamEyeLOpen", 21 | "GroupId": "ParamGroup2", 22 | "Name": "EyeL Open" 23 | }, 24 | { 25 | "Id": "ParamEyeLSmile", 26 | "GroupId": "ParamGroup2", 27 | "Name": "EyeL Smile" 28 | }, 29 | { 30 | "Id": "ParamEyeROpen", 31 | "GroupId": "ParamGroup2", 32 | "Name": "EyeR Open" 33 | }, 34 | { 35 | "Id": "ParamEyeRSmile", 36 | "GroupId": "ParamGroup2", 37 | "Name": "EyeR Smile" 38 | }, 39 | { 40 | "Id": "ParamEyeBallX", 41 | "GroupId": "ParamGroup2", 42 | "Name": "Eyeball X" 43 | }, 44 | { 45 | "Id": "ParamEyeBallY", 46 | "GroupId": "ParamGroup2", 47 | "Name": "Eyeball Y" 48 | }, 49 | { 50 | "Id": "ParamBrowLY", 51 | "GroupId": "ParamGroup2", 52 | "Name": "BrowL Y" 53 | }, 54 | { 55 | "Id": "ParamBrowRY", 56 | "GroupId": "ParamGroup2", 57 | "Name": "BrowR Y" 58 | }, 59 | { 60 | "Id": "ParamBrowLX", 61 | "GroupId": "ParamGroup2", 62 | "Name": "BrowL X" 63 | }, 64 | { 65 | "Id": "ParamBrowRX", 66 | "GroupId": "ParamGroup2", 67 | "Name": "BrowR X" 68 | }, 69 | { 70 | "Id": "ParamBrowLAngle", 71 | "GroupId": "ParamGroup2", 72 | "Name": "BrowL Angle" 73 | }, 74 | { 75 | "Id": "ParamBrowRAngle", 76 | "GroupId": "ParamGroup2", 77 | "Name": "BrowR Angle" 78 | }, 79 | { 80 | "Id": "ParamBrowLForm", 81 | "GroupId": "ParamGroup2", 82 | "Name": "BrowL Form" 83 | }, 84 | { 85 | "Id": "ParamBrowRForm", 86 | "GroupId": "ParamGroup2", 87 | "Name": "BrowR Form" 88 | }, 89 | { 90 | "Id": "ParamMouthForm", 91 | "GroupId": "ParamGroup3", 92 | "Name": "Mouth Form" 93 | }, 94 | { 95 | "Id": "ParamMouthOpenY", 96 | "GroupId": "ParamGroup3", 97 | "Name": "Mouth Open" 98 | }, 99 | { 100 | "Id": "ParamBodyAngleX", 101 | "GroupId": "ParamGroup4", 102 | "Name": "Body X" 103 | }, 104 | { 105 | "Id": "ParamBodyAngleY", 106 | "GroupId": "ParamGroup4", 107 | "Name": "Body Y" 108 | }, 109 | { 110 | "Id": "ParamBodyAngleZ", 111 | "GroupId": "ParamGroup4", 112 | "Name": "Body Z" 113 | }, 114 | { 115 | "Id": "ParamBreath", 116 | "GroupId": "ParamGroup4", 117 | "Name": "Breath" 118 | }, 119 | { 120 | "Id": "ParamHairFront", 121 | "GroupId": "ParamGroup5", 122 | "Name": "Hair Move Front" 123 | }, 124 | { 125 | "Id": "ParamHairSide", 126 | 
"GroupId": "ParamGroup5", 127 | "Name": "Hair Move Side" 128 | }, 129 | { 130 | "Id": "ParamHairBack", 131 | "GroupId": "ParamGroup5", 132 | "Name": "Hair Move Back" 133 | }, 134 | { 135 | "Id": "AhogeTwitch", 136 | "GroupId": "ParamGroup5", 137 | "Name": "Ahoge Twitch" 138 | }, 139 | { 140 | "Id": "RibbonPhysics", 141 | "GroupId": "ParamGroup5", 142 | "Name": "Ribbon Physics" 143 | }, 144 | { 145 | "Id": "ParamCheek", 146 | "GroupId": "ParamGroup6", 147 | "Name": "Cheek" 148 | }, 149 | { 150 | "Id": "Param", 151 | "GroupId": "ParamGroup6", 152 | "Name": "Ears Twitch" 153 | } 154 | ], 155 | "ParameterGroups": [ 156 | { 157 | "Id": "ParamGroup", 158 | "GroupId": "", 159 | "Name": "XYZ" 160 | }, 161 | { 162 | "Id": "ParamGroup2", 163 | "GroupId": "", 164 | "Name": "Eyes" 165 | }, 166 | { 167 | "Id": "ParamGroup3", 168 | "GroupId": "", 169 | "Name": "Mouth" 170 | }, 171 | { 172 | "Id": "ParamGroup4", 173 | "GroupId": "", 174 | "Name": "Body" 175 | }, 176 | { 177 | "Id": "ParamGroup5", 178 | "GroupId": "", 179 | "Name": "Physics" 180 | }, 181 | { 182 | "Id": "ParamGroup6", 183 | "GroupId": "", 184 | "Name": "Face" 185 | } 186 | ], 187 | "Parts": [ 188 | { 189 | "Id": "Part17", 190 | "Name": "pachirisu_anime_girl_edit.psd (Corresponding layer not found)" 191 | }, 192 | { 193 | "Id": "Part", 194 | "Name": "Hair" 195 | }, 196 | { 197 | "Id": "Part2", 198 | "Name": "Eye R" 199 | }, 200 | { 201 | "Id": "eyes", 202 | "Name": "Eye L" 203 | }, 204 | { 205 | "Id": "mouth", 206 | "Name": "Mouth" 207 | }, 208 | { 209 | "Id": "Part3", 210 | "Name": "Body" 211 | }, 212 | { 213 | "Id": "PartSketch0", 214 | "Name": "[ Guide Image]" 215 | } 216 | ], 217 | "CombinedParameters": [ 218 | [ 219 | "ParamAngleX", 220 | "ParamAngleY" 221 | ], 222 | [ 223 | "ParamEyeBallX", 224 | "ParamEyeBallY" 225 | ], 226 | [ 227 | "ParamMouthForm", 228 | "ParamMouthOpenY" 229 | ] 230 | ] 231 | } -------------------------------------------------------------------------------- /LLM.py: -------------------------------------------------------------------------------- 1 | 2 | import os 3 | import shutil 4 | import time 5 | import requests 6 | from openai import OpenAI 7 | from config import Config 8 | 9 | class LLMManager: 10 | def __init__(self): 11 | 12 | # 确定 online_model 为线上还是本地 13 | if Config.online_model == "online": 14 | online_model = 1 15 | elif Config.online_model == "offline": 16 | online_model = 0 17 | else: 18 | raise ValueError(f"配置错误: online_model 必须是 'online' 或 'offline',但你提供了 {Config.online_model}") 19 | 20 | # 确定 model_choice 21 | if Config.model_choice == "OpenAI": 22 | model_choice = 1 23 | elif Config.model_choice == "deepseek": 24 | model_choice = 2 25 | else: 26 | raise ValueError(f"配置错误: model_choice 只能是 'OpenAI' 或 'deepseek',但你提供了 {Config.model_choice}") 27 | 28 | # 初始化 LLM 对话管理器 29 | # param online_model: 是否使用在线模型(0 = 本地,1 = 在线) 30 | # param model_choice: 选择在线 LLM(1 = OpenAI GPT-4, 2 = DeepSeek) 31 | 32 | self.online_model = online_model 33 | self.model_choice = model_choice 34 | self.conversation = [ 35 | {"role": "system", 36 | "content": "你是一位知识渊博的猫娘,致力于帮助我学习知识。你也可以与我闲聊,但请尽量简洁,像真正的老师一样回答问题。"}, 37 | {"role": "assistant", "content": "不用输出分隔符,如'#'、'*'、'-'。"} 38 | ] 39 | self.conversation_summary = "" 40 | self.user_message_count = 0 41 | self.tmp_path = "E:/PyCharm/project/project1/TTS_env/tmp" 42 | os.makedirs(self.tmp_path, exist_ok=True) 43 | 44 | if online_model == 0: 45 | self.model_name = Config.model_name # 确定本地模型 46 | self.api_url = Config.api_url 47 | elif online_model == 1: 48 | if model_choice == 
1: 49 | self.client = OpenAI(api_key=Config.openai_key) 50 | self.model_name = "gpt-4o-2024-11-20" 51 | elif model_choice == 2: 52 | self.client = OpenAI(api_key=Config.deepseek_key, base_url="https://api.deepseek.com") 53 | self.model_name = "deepseek-chat" 54 | 55 | def model_chat_completion(self, messages): 56 | 57 | # 调用 LLM 进行对话 58 | # param messages: 对话列表 59 | # return: 生成的回复文本 60 | if Config.online_model == "online": 61 | response = self.client.chat.completions.create( 62 | model=self.model_name, 63 | messages=messages, 64 | stream=False 65 | ) 66 | return response.choices[0].message.content.strip() 67 | elif Config.online_model == "offline": 68 | data = { 69 | "model": self.model_name, 70 | "messages": self.conversation} 71 | # 请求头(确保 `User-Agent` 避免 Python 请求被拦截) 72 | headers = {"Content-Type": "application/json","User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"} # 伪装浏览器请求 73 | # 使用 `json=data`(避免 `json.dumps()` 出现错误) 74 | response = requests.post(self.api_url, headers=headers, json=data) 75 | # 解析返回结果 76 | if response.status_code == 200: 77 | result = response.json() 78 | # print("回复:", result["choices"][0]["message"]["content"]) 79 | return result["choices"][0]["message"]["content"] 80 | else: 81 | print(f"请求失败,状态码: {response.status_code}") 82 | print("错误信息:", response.text) 83 | 84 | 85 | 86 | 87 | def summarize_conversation(self): 88 | 89 | # 使用 LLM 对对话进行摘要 90 | # return: 摘要文本 91 | 92 | summary_prompt = [ 93 | {"role": "system", "content": "你是一只专业的对话摘要工具。请用简洁的语言总结以下对话的主要内容。"}, 94 | *self.conversation 95 | ] 96 | return self.model_chat_completion(summary_prompt) 97 | 98 | def chat_once(self, user_input): 99 | 100 | # 进行一次对话(用户输入 → LLM 生成回复) 101 | # param user_input: 用户输入的文本 102 | # return: 生成的回复文本 103 | 104 | start_time = time.time() 105 | self.conversation.append({"role": "user", "content": user_input}) 106 | self.user_message_count += 1 107 | 108 | if self.user_message_count % 5 == 0: 109 | new_summary = self.summarize_conversation() 110 | if self.conversation_summary: 111 | self.conversation_summary += "\n" + new_summary 112 | else: 113 | self.conversation_summary = new_summary 114 | 115 | # 清理临时目录 116 | shutil.rmtree(self.tmp_path) 117 | os.makedirs(self.tmp_path, exist_ok=True) 118 | 119 | self.conversation = [ 120 | {"role": "system", 121 | "content": "你是一位知识渊博的猫娘,致力于帮助我学习知识。你也可以与我闲聊,但请尽量简洁。"}, 122 | {"role": "system", "content": f"这是之前对话的摘要:\n{self.conversation_summary}\n请继续与我对话。"}, 123 | {"role": "assistant", "content": "不用输出分隔符,如'#'、'*'、'-'。"}, 124 | {"role": "user", "content": user_input} 125 | ] 126 | 127 | reply = self.model_chat_completion(self.conversation) 128 | self.conversation.append({"role": "assistant", "content": reply}) 129 | print(f"LLM 思考耗时: {time.time() - start_time:.2f} 秒") 130 | return reply 131 | 132 | 133 | if __name__ == "__main__": 134 | llm_manager = LLMManager() 135 | 136 | while True: 137 | user_input = input("你: ") 138 | if user_input.lower() in ("exit。", "quit。", "q。", "结束。", "再见。"): 139 | print("已退出对话。") 140 | break 141 | 142 | reply = llm_manager.chat_once(user_input) 143 | print(f"猫娘: {reply}") 144 | -------------------------------------------------------------------------------- /Live2d_animation.py: -------------------------------------------------------------------------------- 1 | 2 | import time 3 | import glfw 4 | import OpenGL.GL as gl 5 | import pyautogui 6 | import pygame 7 | import ctypes 8 | from pydub import AudioSegment 9 | from live2d.v3 import LAppModel, init, dispose, glewInit, clearBuffer 10 | from config import 
Config 11 | 12 | # Live2D 窗口设置 13 | GWL_EXSTYLE = -20 14 | WS_EX_LAYERED = 0x00080000 15 | WS_EX_TRANSPARENT = 0x00000020 16 | 17 | # 眨眼状态 18 | BLINK_STATE_NONE = 0 19 | BLINK_STATE_CLOSING = 1 20 | BLINK_STATE_CLOSED = 2 21 | BLINK_STATE_OPENING = 3 22 | 23 | class Live2DAnimationManager: 24 | def __init__(self, model_path, frame_rate=60): 25 | 26 | # 初始化 Live2D 动画管理器 27 | # param model_path: Live2D 模型文件路径(.model3.json) 28 | # param frame_rate: 渲染帧率 29 | 30 | self.model_path = model_path 31 | self.frame_rate = frame_rate 32 | self.mouth_value = 0 33 | self.window = None 34 | self.model = None 35 | self.running = True 36 | 37 | # 鼠标跟随相关参数 38 | self.last_mouse_x, self.last_mouse_y = pyautogui.position() 39 | self.last_move_time = time.time() 40 | self.IDLE_THRESHOLD = 3.0 41 | 42 | self.X_MIN, self.X_MAX = 200, 480 43 | self.Y_MIN, self.Y_MAX = 300, 360 44 | self.center_x_mapped = (self.X_MIN + self.X_MAX) / 2 45 | self.center_y_mapped = (self.Y_MIN + self.Y_MAX) / 2 46 | self.gaze_x = 0.0 47 | self.gaze_y = 0.0 48 | self.GAZE_EASING = 0.02 49 | 50 | def configure_window(self, window, width, height): 51 | 52 | # 配置 GLFW 窗口,使其透明且可穿透鼠标 53 | 54 | hwnd = glfw.get_win32_window(window) 55 | get_window_long = ctypes.windll.user32.GetWindowLongW 56 | set_window_long = ctypes.windll.user32.SetWindowLongW 57 | ex_style = get_window_long(hwnd, GWL_EXSTYLE) 58 | ex_style |= (WS_EX_LAYERED | WS_EX_TRANSPARENT) 59 | set_window_long(hwnd, GWL_EXSTYLE, ex_style) 60 | 61 | glfw.make_context_current(window) 62 | screen_width, screen_height = pyautogui.size() 63 | glfw.set_window_pos(window, 0, screen_height - height) 64 | 65 | def load_live2d_model(self, width, height): 66 | 67 | # 加载 Live2D 模型 68 | 69 | model = LAppModel() 70 | model.LoadModelJson(self.model_path) 71 | model.Resize(width, height) 72 | return model 73 | 74 | def play_live2d_once(self): 75 | 76 | # 创建 Live2D 窗口,并让角色进行渲染(保持运行) 77 | 78 | init() 79 | if not glfw.init(): 80 | print("GLFW 初始化失败!") 81 | return 82 | 83 | glfw.window_hint(glfw.TRANSPARENT_FRAMEBUFFER, glfw.TRUE) 84 | glfw.window_hint(glfw.DECORATED, glfw.FALSE) 85 | glfw.window_hint(glfw.FLOATING, glfw.TRUE) 86 | 87 | window_width, window_height = 800, 600 88 | self.window = glfw.create_window(window_width, window_height, "Live2D Window", None, None) 89 | if not self.window: 90 | print("GLFW 窗口创建失败!") 91 | glfw.terminate() 92 | return 93 | 94 | self.configure_window(self.window, window_width, window_height) 95 | glewInit() 96 | 97 | self.model = self.load_live2d_model(window_width, window_height) 98 | 99 | last_time = time.time() 100 | gl.glClearColor(0.0, 0.0, 0.0, 0.0) 101 | 102 | while self.running and not glfw.window_should_close(self.window): 103 | gl.glClear(gl.GL_COLOR_BUFFER_BIT) 104 | now = time.time() 105 | dt = now - last_time 106 | last_time = now 107 | 108 | width, height = glfw.get_framebuffer_size(self.window) 109 | gl.glViewport(0, 0, width, height) 110 | clearBuffer(0, 0, 0, 0) 111 | 112 | self.model.Update() 113 | self.model.SetParameterValue("ParamMouthOpenY", self.mouth_value, 1) 114 | 115 | self.update_gaze_tracking(width, height) 116 | 117 | self.model.Draw() 118 | glfw.swap_buffers(self.window) 119 | glfw.poll_events() 120 | 121 | pygame.mixer.music.stop() 122 | pygame.mixer.quit() 123 | dispose() 124 | glfw.terminate() 125 | 126 | def update_gaze_tracking(self, width, height): 127 | 128 | # 计算鼠标跟随逻辑,让 Live2D 角色的眼睛和头部跟随鼠标 129 | 130 | screen_x, screen_y = pyautogui.position() 131 | win_x, win_y = glfw.get_window_pos(self.window) 132 | local_mouse_x = screen_x - 
win_x 133 | local_mouse_y = screen_y - win_y 134 | 135 | if (screen_x != self.last_mouse_x) or (screen_y != self.last_mouse_y): 136 | self.last_move_time = time.time() 137 | self.last_mouse_x, self.last_mouse_y = screen_x, screen_y 138 | 139 | if (time.time() - self.last_move_time) < self.IDLE_THRESHOLD: 140 | mapped_x = self.X_MIN + (local_mouse_x / width) * (self.X_MAX - self.X_MIN) 141 | mapped_y = self.Y_MIN + (local_mouse_y / height) * (self.Y_MAX - self.Y_MIN) 142 | target_x = mapped_x 143 | target_y = mapped_y 144 | else: 145 | target_x = self.center_x_mapped 146 | target_y = self.center_y_mapped 147 | self.GAZE_EASING = 0.0004 148 | 149 | self.gaze_x += self.GAZE_EASING * (target_x - self.gaze_x) 150 | self.gaze_y += self.GAZE_EASING * (target_y - self.gaze_y) 151 | self.model.Drag(self.gaze_x, self.gaze_y) 152 | 153 | def extract_volume_array(self, audio_file): 154 | 155 | # 提取音频的音量信息,并归一化用于嘴型同步 156 | 157 | seg = AudioSegment.from_file(audio_file, format="wav") 158 | frame_duration_ms = 1000 / self.frame_rate 159 | num_frames = int(seg.duration_seconds * self.frame_rate) 160 | 161 | volumes = [] 162 | for i in range(num_frames): 163 | start_ms = i * frame_duration_ms 164 | frame_seg = seg[start_ms: start_ms + frame_duration_ms] 165 | rms = frame_seg.rms 166 | volumes.append(rms) 167 | 168 | max_rms = max(volumes) if volumes else 1 169 | volumes = [v / max_rms for v in volumes] # 归一化 170 | return volumes, seg.duration_seconds 171 | 172 | def play_audio_and_print_mouth(self, audio_file): 173 | 174 | # 播放音频并同步嘴型动作 175 | 176 | volume_array, audio_duration = self.extract_volume_array(audio_file) 177 | total_frames = len(volume_array) 178 | 179 | pygame.mixer.init() 180 | pygame.mixer.music.load(audio_file) 181 | pygame.mixer.music.play() 182 | 183 | start_time = time.time() 184 | while True: 185 | current_time = time.time() - start_time 186 | if current_time >= audio_duration: 187 | break 188 | 189 | frame_index = int(current_time * self.frame_rate) 190 | if frame_index >= total_frames: 191 | frame_index = total_frames - 1 192 | 193 | self.mouth_value = volume_array[frame_index] 194 | 195 | pygame.mixer.music.stop() 196 | 197 | 198 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | accelerate==1.4.0 2 | aiofiles==23.2.1 3 | aiohappyeyeballs==2.4.6 4 | aiohttp==3.11.12 5 | aiosignal==1.3.2 6 | anaconda-anon-usage @ file:///C:/b/abs_e8r_zga7xy/croot/anaconda-anon-usage_1732732454901/work 7 | annotated-types @ file:///C:/b/abs_0dmaoyhhj3/croot/annotated-types_1709542968311/work 8 | antlr4-python3-runtime==4.9.3 9 | anyio==4.8.0 10 | archspec @ file:///croot/archspec_1709217642129/work 11 | attrs==25.1.0 12 | audioread==3.0.1 13 | beautifulsoup4==4.13.3 14 | boltons @ file:///C:/b/abs_45_52ughkz/croot/boltons_1737061711836/work 15 | Brotli @ file:///C:/b/abs_c415aux9ra/croot/brotli-split_1736182803933/work 16 | certifi @ file:///C:/b/abs_8a944p1_gn/croot/certifi_1738623753421/work/certifi 17 | cffi @ file:///C:/b/abs_29_b57if3f/croot/cffi_1736184144340/work 18 | charset-normalizer @ file:///croot/charset-normalizer_1721748349566/work 19 | click==8.1.8 20 | colorama @ file:///C:/Users/dev-admin/perseverance-python-buildout/croot/colorama_1699472650914/work 21 | coloredlogs==15.0.1 22 | conda @ file:///D:/bld/conda_1739917047096/work 23 | conda-anaconda-telemetry @ file:///C:/b/abs_4c9llcc5ob/croot/conda-anaconda-telemetry_1736524617431/work 24 | 
conda-anaconda-tos @ file:///C:/b/abs_ceeuq0lee_/croot/conda-anaconda-tos_1739299022910/work 25 | conda-content-trust @ file:///C:/b/abs_bdfatn_wzf/croot/conda-content-trust_1714483201909/work 26 | conda-libmamba-solver @ file:///croot/conda-libmamba-solver_1737733694612/work/src 27 | conda-package-handling @ file:///C:/b/abs_7fz3aferfv/croot/conda-package-handling_1731369038903/work 28 | conda_package_streaming @ file:///C:/b/abs_bdz9vbvbh2/croot/conda-package-streaming_1731366449946/work 29 | conformer==0.3.2 30 | contourpy==1.3.1 31 | cryptography @ file:///C:/b/abs_e2lzchf4i6/croot/cryptography_1732130411942/work 32 | cycler==0.12.1 33 | Cython==3.0.12 34 | decorator==5.1.1 35 | diffusers==0.32.2 36 | distro @ file:///C:/b/abs_71xr36ua5r/croot/distro_1714488282676/work 37 | einops==0.8.1 38 | fastapi==0.115.8 39 | ffmpy==0.5.0 40 | filelock==3.17.0 41 | flatbuffers @ file:///home/conda/feedstock_root/build_artifacts/python-flatbuffers_1739279199749/work 42 | fonttools==4.56.0 43 | frozendict @ file:///C:/b/abs_2alamqss6p/croot/frozendict_1713194885124/work 44 | frozenlist==1.5.0 45 | fsspec==2025.2.0 46 | gdown==5.2.0 47 | gradio==5.16.1 48 | gradio_client==1.7.0 49 | grpcio @ file:///D:/bld/grpc-split_1713388447196/work 50 | grpcio-tools @ file:///D:/bld/grpcio-tools_1713479862547/work 51 | h11==0.14.0 52 | httpcore==1.0.7 53 | httpx==0.28.1 54 | huggingface-hub==0.28.1 55 | humanfriendly==10.0 56 | hydra-core==1.3.2 57 | HyperPyYAML==1.2.2 58 | idna @ file:///C:/b/abs_aad84bnnw5/croot/idna_1714398896795/work 59 | importlib_metadata==8.6.1 60 | importlib_resources==6.5.2 61 | inflect==7.5.0 62 | Jinja2==3.1.5 63 | joblib==1.4.2 64 | jsonpatch @ file:///C:/b/abs_4fdm88t7zi/croot/jsonpatch_1714483974578/work 65 | jsonpointer==2.1 66 | kiwisolver==1.4.8 67 | lazy_loader==0.4 68 | libmambapy @ file:///C:/b/abs_627vsv8bhu/croot/mamba-split_1734469608328/work/libmambapy 69 | librosa==0.10.2.post1 70 | lightning==2.5.0.post0 71 | lightning-utilities==0.12.0 72 | llvmlite==0.44.0 73 | markdown-it-py @ file:///C:/Users/dev-admin/perseverance-python-buildout/croot/markdown-it-py_1699473886965/work 74 | MarkupSafe==2.1.5 75 | matcha==0.3 76 | matplotlib==3.10.0 77 | mdurl @ file:///C:/Users/dev-admin/perseverance-python-buildout/croot/mdurl_1699473506455/work 78 | menuinst @ file:///C:/b/abs_fblttj5gp1/croot/menuinst_1738943438301/work 79 | mkl-service==2.4.0 80 | mkl_fft @ file:///C:/Users/dev-admin/mkl/mkl_fft_1730823082242/work 81 | mkl_random @ file:///C:/Users/dev-admin/mkl/mkl_random_1730822522280/work 82 | modelscope==1.23.0 83 | more-itertools==10.6.0 84 | mpmath==1.3.0 85 | msgpack==1.1.0 86 | multidict==6.1.0 87 | networkx==3.4.2 88 | numba==0.61.0 89 | numpy @ file:///C:/b/abs_c1ywpu18ar/croot/numpy_and_numpy_base_1708638681471/work/dist/numpy-1.26.4-cp312-cp312-win_amd64.whl#sha256=becc06674317799ad0165a939a7613809d0bee9bd328a1e4308c57c39cacf08c 90 | omegaconf==2.3.0 91 | onnx @ file:///C:/b/abs_26fcas53j4/croot/onnx_1722521784627/work 92 | onnxruntime-gpu @ file:///C:/Users/mark/miniforge3/conda-bld/onnxruntime_1735406817872/work/build-ci/Release/dist/onnxruntime_gpu-1.20.1-cp312-cp312-win_amd64.whl#sha256=00277ed6954e6c51eaa62089eee91a9ec2ba8097078adf00994717f8b0de0c1d 93 | openai-whisper==20240930 94 | orjson==3.10.15 95 | packaging @ file:///C:/b/abs_3by6s2fa66/croot/packaging_1734472138782/work 96 | pandas==2.2.3 97 | peft==0.14.0 98 | pillow==11.1.0 99 | platformdirs @ file:///C:/Users/dev-admin/perseverance-python-buildout/croot/platformdirs_1701797392447/work 100 | pluggy @ 
file:///C:/b/abs_dfec_m79vo/croot/pluggy_1733170145382/work 101 | pooch==1.8.2 102 | propcache==0.2.1 103 | protobuf==4.25.3 104 | psutil==7.0.0 105 | pyarrow==19.0.1 106 | pycosat @ file:///C:/b/abs_18nblzzn70/croot/pycosat_1736868434419/work 107 | pycparser @ file:///tmp/build/80754af9/pycparser_1636541352034/work 108 | pydantic @ file:///C:/b/abs_27dx58x550/croot/pydantic_1734736090499/work 109 | pydantic_core @ file:///C:/b/abs_bdosz7qwys/croot/pydantic-core_1734726071532/work 110 | pydub==0.25.1 111 | Pygments @ file:///C:/Users/dev-admin/perseverance-python-buildout/croot/pygments_1699474141968/work 112 | pynini @ file:///D:/bld/pynini_1696660993031/work 113 | pyparsing==3.2.1 114 | pyreadline3==3.5.4 115 | PySocks @ file:///C:/Users/dev-admin/perseverance-python-buildout/croot/pysocks_1699473336188/work 116 | python-dateutil==2.9.0.post0 117 | python-multipart==0.0.20 118 | pytorch-lightning==2.5.0.post0 119 | pytz==2025.1 120 | pyworld==0.3.5 121 | PyYAML==6.0.2 122 | regex==2024.11.6 123 | requests @ file:///C:/b/abs_c3508vg8ez/croot/requests_1731000584867/work 124 | rich @ file:///C:/b/abs_21nw9z7xby/croot/rich_1720637504376/work 125 | ruamel.yaml @ file:///C:/b/abs_0cunwx_ww6/croot/ruamel.yaml_1727980181547/work 126 | ruamel.yaml.clib @ file:///C:/b/abs_5fk8zi6n09/croot/ruamel.yaml.clib_1727769837359/work 127 | ruff==0.9.6 128 | safehttpx==0.1.6 129 | safetensors==0.5.2 130 | scikit-learn==1.6.1 131 | scipy==1.15.2 132 | semantic-version==2.10.0 133 | setuptools==75.8.0 134 | shellingham==1.5.4 135 | six==1.17.0 136 | sniffio==1.3.1 137 | soundfile==0.13.1 138 | soupsieve==2.6 139 | soxr==0.5.0.post1 140 | starlette==0.45.3 141 | sympy==1.13.1 142 | threadpoolctl==3.5.0 143 | tiktoken==0.9.0 144 | tn==0.0.4 145 | tokenizers==0.21.0 146 | tomlkit==0.13.2 147 | torch==2.6.0 148 | torchaudio==2.6.0 149 | torchmetrics==1.6.1 150 | tqdm @ file:///C:/b/abs_0eh9b6xugj/croot/tqdm_1738945553987/work 151 | transformers==4.49.0 152 | truststore @ file:///C:/b/abs_494cm143zh/croot/truststore_1736550137835/work 153 | ttsfrd-dependency @ file:///E:/PyCharm/project/project3/CosyVoice/pretrained_models/CosyVoice-ttsfrd/ttsfrd_dependency-0.1-py3-none-any.whl#sha256=060a53f0650d12839983afdcfb052b049d7cf5c62344a00fee3a7344582aaf6f 154 | typeguard==4.4.2 155 | typer==0.15.1 156 | typing_extensions @ file:///C:/b/abs_0ffjxtihug/croot/typing_extensions_1734714875646/work 157 | tzdata==2025.1 158 | urllib3 @ file:///C:/b/abs_7bst06lizn/croot/urllib3_1737133657081/work 159 | uvicorn==0.34.0 160 | websockets==14.2 161 | WeTextProcessing==1.0.4.1 162 | wget==3.2 163 | wheel==0.45.1 164 | win-inet-pton @ file:///C:/Users/dev-admin/perseverance-python-buildout/croot/win_inet_pton_1699472992992/work 165 | yarl==1.18.3 166 | zipp==3.21.0 167 | zstandard @ file:///C:/b/abs_31t8xmrv_h/croot/zstandard_1731356578015/work 168 | -------------------------------------------------------------------------------- /Live2d_env/pachirisu anime girl - top half.physics3.json: -------------------------------------------------------------------------------- 1 | { 2 | "Version": 3, 3 | "Meta": { 4 | "PhysicsSettingCount": 5, 5 | "TotalInputCount": 15, 6 | "TotalOutputCount": 6, 7 | "VertexCount": 13, 8 | "Fps": 30, 9 | "EffectiveForces": { 10 | "Gravity": { 11 | "X": 0, 12 | "Y": -1 13 | }, 14 | "Wind": { 15 | "X": 0, 16 | "Y": 0 17 | } 18 | }, 19 | "PhysicsDictionary": [ 20 | { 21 | "Id": "PhysicsSetting1", 22 | "Name": "Ears Twitch" 23 | }, 24 | { 25 | "Id": "PhysicsSetting2", 26 | "Name": "Front Hair" 27 | }, 28 | { 29 | 
"Id": "PhysicsSetting3", 30 | "Name": "Side Hair" 31 | }, 32 | { 33 | "Id": "PhysicsSetting4", 34 | "Name": "Back Hair" 35 | }, 36 | { 37 | "Id": "PhysicsSetting5", 38 | "Name": "Ribbon" 39 | } 40 | ] 41 | }, 42 | "PhysicsSettings": [ 43 | { 44 | "Id": "PhysicsSetting1", 45 | "Input": [ 46 | { 47 | "Source": { 48 | "Target": "Parameter", 49 | "Id": "ParamEyeROpen" 50 | }, 51 | "Weight": 100, 52 | "Type": "X", 53 | "Reflect": false 54 | } 55 | ], 56 | "Output": [ 57 | { 58 | "Destination": { 59 | "Target": "Parameter", 60 | "Id": "Param" 61 | }, 62 | "VertexIndex": 1, 63 | "Scale": 0.567, 64 | "Weight": 100, 65 | "Type": "Angle", 66 | "Reflect": false 67 | }, 68 | { 69 | "Destination": { 70 | "Target": "Parameter", 71 | "Id": "AhogeTwitch" 72 | }, 73 | "VertexIndex": 1, 74 | "Scale": 1, 75 | "Weight": 100, 76 | "Type": "Angle", 77 | "Reflect": false 78 | } 79 | ], 80 | "Vertices": [ 81 | { 82 | "Position": { 83 | "X": 0, 84 | "Y": 0 85 | }, 86 | "Mobility": 1, 87 | "Delay": 1, 88 | "Acceleration": 1, 89 | "Radius": 0 90 | }, 91 | { 92 | "Position": { 93 | "X": 0, 94 | "Y": 10.8 95 | }, 96 | "Mobility": 0.59, 97 | "Delay": 1, 98 | "Acceleration": 1, 99 | "Radius": 10.8 100 | } 101 | ], 102 | "Normalization": { 103 | "Position": { 104 | "Minimum": -10, 105 | "Default": 0, 106 | "Maximum": 10 107 | }, 108 | "Angle": { 109 | "Minimum": -10, 110 | "Default": 0, 111 | "Maximum": 10 112 | } 113 | } 114 | }, 115 | { 116 | "Id": "PhysicsSetting2", 117 | "Input": [ 118 | { 119 | "Source": { 120 | "Target": "Parameter", 121 | "Id": "ParamAngleX" 122 | }, 123 | "Weight": 60, 124 | "Type": "X", 125 | "Reflect": false 126 | }, 127 | { 128 | "Source": { 129 | "Target": "Parameter", 130 | "Id": "ParamAngleZ" 131 | }, 132 | "Weight": 60, 133 | "Type": "Angle", 134 | "Reflect": false 135 | }, 136 | { 137 | "Source": { 138 | "Target": "Parameter", 139 | "Id": "ParamBodyAngleX" 140 | }, 141 | "Weight": 40, 142 | "Type": "X", 143 | "Reflect": false 144 | }, 145 | { 146 | "Source": { 147 | "Target": "Parameter", 148 | "Id": "ParamBodyAngleZ" 149 | }, 150 | "Weight": 40, 151 | "Type": "Angle", 152 | "Reflect": false 153 | } 154 | ], 155 | "Output": [ 156 | { 157 | "Destination": { 158 | "Target": "Parameter", 159 | "Id": "ParamHairFront" 160 | }, 161 | "VertexIndex": 1, 162 | "Scale": 1, 163 | "Weight": 100, 164 | "Type": "Angle", 165 | "Reflect": false 166 | } 167 | ], 168 | "Vertices": [ 169 | { 170 | "Position": { 171 | "X": 0, 172 | "Y": 0 173 | }, 174 | "Mobility": 1, 175 | "Delay": 1, 176 | "Acceleration": 1, 177 | "Radius": 0 178 | }, 179 | { 180 | "Position": { 181 | "X": 0, 182 | "Y": 15 183 | }, 184 | "Mobility": 0.86, 185 | "Delay": 0.8, 186 | "Acceleration": 1.5, 187 | "Radius": 15 188 | } 189 | ], 190 | "Normalization": { 191 | "Position": { 192 | "Minimum": -10, 193 | "Default": 0, 194 | "Maximum": 10 195 | }, 196 | "Angle": { 197 | "Minimum": -10, 198 | "Default": 0, 199 | "Maximum": 10 200 | } 201 | } 202 | }, 203 | { 204 | "Id": "PhysicsSetting3", 205 | "Input": [ 206 | { 207 | "Source": { 208 | "Target": "Parameter", 209 | "Id": "ParamAngleX" 210 | }, 211 | "Weight": 60, 212 | "Type": "X", 213 | "Reflect": false 214 | }, 215 | { 216 | "Source": { 217 | "Target": "Parameter", 218 | "Id": "ParamAngleZ" 219 | }, 220 | "Weight": 60, 221 | "Type": "Angle", 222 | "Reflect": false 223 | }, 224 | { 225 | "Source": { 226 | "Target": "Parameter", 227 | "Id": "ParamBodyAngleX" 228 | }, 229 | "Weight": 40, 230 | "Type": "X", 231 | "Reflect": false 232 | }, 233 | { 234 | "Source": { 235 | "Target": 
"Parameter", 236 | "Id": "ParamBodyAngleZ" 237 | }, 238 | "Weight": 40, 239 | "Type": "Angle", 240 | "Reflect": false 241 | } 242 | ], 243 | "Output": [ 244 | { 245 | "Destination": { 246 | "Target": "Parameter", 247 | "Id": "ParamHairSide" 248 | }, 249 | "VertexIndex": 1, 250 | "Scale": 1, 251 | "Weight": 100, 252 | "Type": "Angle", 253 | "Reflect": false 254 | } 255 | ], 256 | "Vertices": [ 257 | { 258 | "Position": { 259 | "X": 0, 260 | "Y": 0 261 | }, 262 | "Mobility": 1, 263 | "Delay": 1, 264 | "Acceleration": 1, 265 | "Radius": 0 266 | }, 267 | { 268 | "Position": { 269 | "X": 0, 270 | "Y": 10 271 | }, 272 | "Mobility": 0.84, 273 | "Delay": 0.8, 274 | "Acceleration": 1.5, 275 | "Radius": 10 276 | }, 277 | { 278 | "Position": { 279 | "X": 0, 280 | "Y": 18 281 | }, 282 | "Mobility": 0.76, 283 | "Delay": 0.8, 284 | "Acceleration": 1.5, 285 | "Radius": 8 286 | } 287 | ], 288 | "Normalization": { 289 | "Position": { 290 | "Minimum": -10, 291 | "Default": 0, 292 | "Maximum": 10 293 | }, 294 | "Angle": { 295 | "Minimum": -10, 296 | "Default": 0, 297 | "Maximum": 10 298 | } 299 | } 300 | }, 301 | { 302 | "Id": "PhysicsSetting4", 303 | "Input": [ 304 | { 305 | "Source": { 306 | "Target": "Parameter", 307 | "Id": "ParamAngleX" 308 | }, 309 | "Weight": 60, 310 | "Type": "X", 311 | "Reflect": false 312 | }, 313 | { 314 | "Source": { 315 | "Target": "Parameter", 316 | "Id": "ParamAngleZ" 317 | }, 318 | "Weight": 60, 319 | "Type": "Angle", 320 | "Reflect": false 321 | }, 322 | { 323 | "Source": { 324 | "Target": "Parameter", 325 | "Id": "ParamBodyAngleX" 326 | }, 327 | "Weight": 40, 328 | "Type": "X", 329 | "Reflect": false 330 | }, 331 | { 332 | "Source": { 333 | "Target": "Parameter", 334 | "Id": "ParamBodyAngleZ" 335 | }, 336 | "Weight": 40, 337 | "Type": "Angle", 338 | "Reflect": false 339 | } 340 | ], 341 | "Output": [ 342 | { 343 | "Destination": { 344 | "Target": "Parameter", 345 | "Id": "ParamHairBack" 346 | }, 347 | "VertexIndex": 1, 348 | "Scale": 1, 349 | "Weight": 100, 350 | "Type": "Angle", 351 | "Reflect": false 352 | } 353 | ], 354 | "Vertices": [ 355 | { 356 | "Position": { 357 | "X": 0, 358 | "Y": 0 359 | }, 360 | "Mobility": 1, 361 | "Delay": 1, 362 | "Acceleration": 1, 363 | "Radius": 0 364 | }, 365 | { 366 | "Position": { 367 | "X": 0, 368 | "Y": 10 369 | }, 370 | "Mobility": 0.85, 371 | "Delay": 0.9, 372 | "Acceleration": 1, 373 | "Radius": 10 374 | }, 375 | { 376 | "Position": { 377 | "X": 0, 378 | "Y": 20 379 | }, 380 | "Mobility": 0.9, 381 | "Delay": 0.9, 382 | "Acceleration": 1, 383 | "Radius": 10 384 | }, 385 | { 386 | "Position": { 387 | "X": 0, 388 | "Y": 28 389 | }, 390 | "Mobility": 0.9, 391 | "Delay": 0.9, 392 | "Acceleration": 0.8, 393 | "Radius": 8 394 | } 395 | ], 396 | "Normalization": { 397 | "Position": { 398 | "Minimum": -10, 399 | "Default": 0, 400 | "Maximum": 10 401 | }, 402 | "Angle": { 403 | "Minimum": -10, 404 | "Default": 0, 405 | "Maximum": 10 406 | } 407 | } 408 | }, 409 | { 410 | "Id": "PhysicsSetting5", 411 | "Input": [ 412 | { 413 | "Source": { 414 | "Target": "Parameter", 415 | "Id": "ParamBodyAngleX" 416 | }, 417 | "Weight": 100, 418 | "Type": "X", 419 | "Reflect": false 420 | }, 421 | { 422 | "Source": { 423 | "Target": "Parameter", 424 | "Id": "ParamBodyAngleZ" 425 | }, 426 | "Weight": 100, 427 | "Type": "Angle", 428 | "Reflect": false 429 | } 430 | ], 431 | "Output": [ 432 | { 433 | "Destination": { 434 | "Target": "Parameter", 435 | "Id": "RibbonPhysics" 436 | }, 437 | "VertexIndex": 1, 438 | "Scale": 1, 439 | "Weight": 100, 440 | "Type": 
"Angle", 441 | "Reflect": false 442 | } 443 | ], 444 | "Vertices": [ 445 | { 446 | "Position": { 447 | "X": 0, 448 | "Y": 0 449 | }, 450 | "Mobility": 1, 451 | "Delay": 1, 452 | "Acceleration": 1, 453 | "Radius": 0 454 | }, 455 | { 456 | "Position": { 457 | "X": 0, 458 | "Y": 10 459 | }, 460 | "Mobility": 0.9, 461 | "Delay": 0.6, 462 | "Acceleration": 1.5, 463 | "Radius": 10 464 | } 465 | ], 466 | "Normalization": { 467 | "Position": { 468 | "Minimum": -10, 469 | "Default": 0, 470 | "Maximum": 10 471 | }, 472 | "Angle": { 473 | "Minimum": -10, 474 | "Default": 0, 475 | "Maximum": 10 476 | } 477 | } 478 | } 479 | ] 480 | } -------------------------------------------------------------------------------- /README_CN.md: -------------------------------------------------------------------------------- 1 | # Live2D-LLM-Chat 2 | [US English](README.md) | [CN 中文](README_CN.md) 3 | 4 | [![ASR](https://img.shields.io/badge/ASR-SenseVoice-green.svg)](https://github.com/FunAudioLLM/SenseVoice) 5 | [![LLM](https://img.shields.io/badge/LLM-GPT%2FDeepSeek-red.svg)](https://openai.com/api/) 6 | [![TTS](https://img.shields.io/badge/TTS-CosyVoice-orange.svg)](https://github.com/FunAudioLLM/CosyVoice) 7 | [![Live2D](https://img.shields.io/badge/Live2D-v3-blue.svg)](https://github.com/Arkueid/live2d-py) 8 | 9 | [![Python](https://img.shields.io/badge/Python-3.8+-yellow.svg)](https://www.python.org/downloads/) 10 | [![Miniconda](https://img.shields.io/badge/Anaconda-Miniconda-violet.svg)](https://docs.anaconda.net.cn/miniconda/install/) 11 | 12 | > **Live2D + ASR + LLM + TTS** → 实时语音互动 | 本地部署 / 云端推理 13 | 14 | --- 15 | ## ✨ 1. 项目简介 16 | 17 | **Live2D-LLM-Chat** 是一个集成了**Live2D 虚拟形象**、**语音识别(ASR)**、**大语言模型(LLM)**和**文本转语音(TTS)** 的实时 AI 交互项目。它能够让**虚拟角色**通过语音识别用户的输入,并使用 AI 生成智能回复,同时通过 TTS 播放语音,并驱动 Live2D 动画实现嘴型同步,达到自然的互动体验。 18 | 19 | --- 20 | ### 📌 1.1. 主要功能 21 | - 🎙 **语音识别(ASR)**:使用 FunASR 进行语音转文本 (STT) 处理。 22 | - 🧠 **大语言模型(LLM)**:基于 OpenAI GPT / DeepSeek 提供理性沟通能力。 23 | - 🔊 **文本转语音(TTS)**:使用 CosyVoice 实现高质量的合成语音 24 | - 🏆 **Live2D 虚拟形象交互**:使用 Live2D SDK 渲染角色,并实现模型的实时反馈。 25 | 26 | --- 27 | ### 📌 1.2. 优化功能 28 | - **LLM模块**接口可支持本地与云端部署,本地部署基于**LM Studio**接口,基本涵盖所有已开源模型,但个人设备性能难以运行大体量模型;云端部署接口现已支持**OpenAI**平台接口与**DeepSeek**平台接口。 29 | - 储存模型对话时的前文数据,形成**历史记忆**。每5次对话会进行总结,避免多次对话后文本累计过量的情况。 30 | - 对历次模型对话的时间与内容进行**存档**,便于查找过往对话内容。可存档内容包括模型的**历史语音输出**。该功能可在配置文件中关闭,关闭后再次进行对话时将清除历史对话的语音数据,**减清内存压力**。 31 | - 重构Live2d模型角色的**眼神跟随**与**眨眼逻辑**,即使live2d模型没有内置眨眼逻辑,也可自然眨眼。编写**嘴型变化**逻辑,读取TTS模块输出的音频文件,将实时音频大小转化至live2d模型的嘴型变化。 32 | - 修改CosyVoice模型的API接口程序,改变生成语音文件打开方式,允许**直接保存**生成文件;对于长文本下分段生成的语音文件,**合并**为单一文件。 33 |

34 | Live2D 运行展示 35 |
36 | Live2D 运行展示 37 |

38 | 39 | 40 | #### 🎬 运行效果 41 | 42 | | 语音输入 | AI 处理 | Live2D 输出 | 43 | |----------|---------|------------| 44 | | 🎤 你:你好呀 | 🤖 AI:你好! | 🧑‍🎤 "你好!" (嘴型同步) | 45 | | 🎤 你:天气怎么样? | 🤖 AI:今天是大晴天呢! | 🧑‍🎤 "今天是大晴天呢!" (语气变化) | 46 | 47 | --- 48 | ### 📌 1.3. 技术栈 49 | | 组件 | 技术 | 50 | |-------|-------| 51 | | ASR(自动语音识别) | SenseVoice | 52 | | LLM(大语言模型) | OpenAI GPT / DeepSeek | 53 | | TTS(文本转语音) | CosyVoice | 54 | | Live2D 动画 | live2d-py + OpenGL | 55 | | 配置管理 | Python Config | 56 | 57 | --- 58 | ## 🛠 2. 安装与配置 59 | 60 | --- 61 | 62 | ### 📌 2.1. 环境要求 63 | 64 | 本项目基于 **Python 3.11** 开发,运行前请确保满足以下环境要求: 65 | 66 | ✅ **操作系统**: 67 | - 🖥 **Windows 10/11** 或 **Linux** 68 | 69 | ✅ **Python 版本**: 70 | - 📌 建议使用 **Python 3.8 及以上** 71 | 72 | ⚠️ **注意**: 73 | 本项目的 **TTS 模块** 基于 **conda 环境** 运行,需要 **预先安装 Miniconda**。 74 | 🔗 你可以从 [Miniconda 官网](https://docs.anaconda.net.cn/miniconda/install/) 下载。 75 | 76 | --- 77 | 78 | ### 📌 2.2. 依赖的开源项目 79 | 80 | 本项目使用了以下优秀的开源库和模型: 81 | 82 | 🎙 **语音识别(ASR)**: 83 | - **SenseVoice** —— 高精度 **多语言语音识别** 及 **语音情感分析** 84 | - 🔗 **GitHub**:[SenseVoice Repository](https://github.com/FunAudioLLM/SenseVoice) 85 | 86 | 🔊 **文本转语音(TTS)**: 87 | - **CosyVoice** —— 强大的 **生成式语音合成系统**,支持 **零样本语音克隆** 88 | - 🔗 **GitHub**:[CosyVoice Repository](https://github.com/FunAudioLLM/CosyVoice) 89 | 90 | 📽 **Live2D 动画**: 91 | - **live2d-py** —— **Python 直接加载和操作 Live2D 模型** 的工具 92 | - 🔗 **GitHub**:[live2d-py Repository](https://github.com/Arkueid/live2d-py) 93 | 94 | --- 95 | ## 📁 3. 安装步骤 96 | 97 | --- 98 | 99 | ### 📌 3.1. 克隆项目代码 100 | 101 | ```bash 102 | git clone https://github.com/suzuran0y/Live2D-LLM-Chat.git 103 | cd Live2D-LLM-Chat 104 | ``` 105 | 106 | ### 📌 3.2. 创建虚拟环境(可选) 107 | ```bash 108 | python -m venv venv 109 | source venv/bin/activate # Linux/macOS 激活虚拟环境 110 | venv\Scripts\activate # Windows 激活虚拟环境 111 | ``` 112 | 113 | ### 📌 3.3. 安装依赖 114 | 115 | ```bash 116 | pip install -r requirements.txt 117 | ``` 118 | 119 | --- 120 | ### 📌 3.4. 
安装 ASR & TTS 模型 121 | 🎙 **语音识别 (ASR) - SenseVoice** 122 | 本项目使用 SenseVoice 作为 ASR 模型,支持**高精度多语言语音识别**、**语音情感识别**和**声学事件检测**。 123 | 124 | #### 1️⃣ 安装 SenseVoice 依赖 125 | 使用 pip 安装 SenseVoice 相关依赖: 126 | ```bash 127 | pip install funasr 128 | ``` 129 | 130 | 如果需要 ONNX 或 TorchScript 推理,请安装对应的版本: 131 | 132 | ```bash 133 | pip install funasr-onnx # ONNX 版本 134 | pip install funasr-torch # TorchScript 版本 135 | ``` 136 | #### 2️⃣ 下载 SenseVoice 预训练模型 137 | SenseVoice 官方提供多个**预训练模型**,可通过 ModelScope 进行下载: 138 | 139 | ```bash 140 | from modelscope import snapshot_download 141 | 142 | # 下载 SenseVoice-Small 版本 143 | snapshot_download('iic/SenseVoiceSmall', local_dir='pretrained_models/SenseVoiceSmall') 144 | # 下载 SenseVoice-Large 版本(如果需要更高精度) 145 | snapshot_download('iic/SenseVoiceLarge', local_dir='pretrained_models/SenseVoiceLarge') 146 | ``` 147 | 更详细的配置和参数说明,请参考: 148 | 149 | 🔗SenseVoice GitHub:[SenseVoice GitHub](https://github.com/FunAudioLLM/SenseVoice) 150 | 🔗ModelScope:[预训练模型](https://www.modelscope.cn/models/iic/SenseVoiceSmall) 151 | 152 | 🔊 **文本转语音 (TTS) - CosyVoice** 153 | 本项目使用 CosyVoice 作为 TTS 模型,支持**多语言**、**语音克隆**、**跨语言复刻**等功能。 154 | 155 | #### 1️⃣ 安装 CosyVoice 依赖 156 | 克隆 CosyVoice 仓库: 157 | ```bash 158 | git clone --recursive https://github.com/FunAudioLLM/CosyVoice.git 159 | cd CosyVoice 160 | git submodule update --init --recursive 161 | ``` 162 | 163 | #### 2️⃣ 创建 Conda 环境并安装依赖 164 | ```bash 165 | # 创建 Conda 虚拟环境 166 | conda create -n cosyvoice -y python=3.10 167 | conda activate cosyvoice 168 | 169 | # 安装必要依赖 170 | conda install -y -c conda-forge pynini==2.1.5 171 | pip install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple/ --trusted-host=mirrors.aliyun.com 172 | ``` 173 | 174 | 安装 SoX(如果需要): 175 | 176 | ```bash 177 | # Ubuntu 178 | sudo apt-get install sox libsox-dev 179 | # CentOS 180 | sudo yum install sox sox-devel 181 | ``` 182 | 183 | #### 3️⃣ 下载 CosyVoice 预训练模型 184 | 建议下载以下 CosyVoice 预训练模型: 185 | 186 | ```bash 187 | from modelscope import snapshot_download 188 | 189 | snapshot_download('iic/CosyVoice2-0.5B', local_dir='pretrained_models/CosyVoice2-0.5B') 190 | snapshot_download('iic/CosyVoice-300M', local_dir='pretrained_models/CosyVoice-300M') 191 | snapshot_download('iic/CosyVoice-300M-SFT', local_dir='pretrained_models/CosyVoice-300M-SFT') 192 | snapshot_download('iic/CosyVoice-300M-Instruct', local_dir='pretrained_models/CosyVoice-300M-Instruct') 193 | snapshot_download('iic/CosyVoice-ttsfrd', local_dir='pretrained_models/CosyVoice-ttsfrd') 194 | ``` 195 | 更详细的配置和参数说明,请参考: 196 | 197 | 🔗CosyVoice GitHub:[CosyVoice GitHub](https://github.com/FunAudioLLM/CosyVoice) 198 | 🔗ModelScope:[预训练模型](https://www.modelscope.cn/iic/CosyVoice2-0.5B) 199 | 200 | --- 201 | ## ⚙️ 4. 本地化配置(重要!!) 202 | 203 | --- 204 | 205 | ### 📌 4.1. 配置 ASR & TTS 模型 206 | 207 | 在完成 **ASR** 和 **TTS** 模型的安装后,按照以下步骤进行本地化配置: 208 | 209 | ✅ **替换 SenseVoice 目录** 210 | - 将下载好的 **SenseVoice** 文件夹 **放入** `Live2D-LLM-Chat/ASR_env/` 文件夹内,**替换原有的同名空文件夹**。 211 | 212 | ✅ **替换 CosyVoice 目录** 213 | - 将下载好的 **CosyVoice** 文件夹 **放入** `Live2D-LLM-Chat/TTS_env/` 文件夹内,**替换原有的同名空文件夹**。 214 | 215 | ✅ **替换 `webui.py` 文件** 216 | - 将 `TTS_env` 文件夹内的 **`webui.py`** **放入** `CosyVoice` 文件夹内,**替换原有的 `webui.py` 文件**。 217 | 218 | --- 219 | 220 | ### 📌 4.2. 
配置 `config.py` 以适配本地环境 221 | 所有 **本地路径和参数** 均可在 **`config.py`** 文件中进行修改: 222 | 请根据 **你的文件路径** 进行相应修改,示例如下: 223 | ```python 224 | class Config: 225 | # 🏠 项目根目录 226 | PROJECT_ROOT = "E:/PyCharm/project/project1" 227 | 228 | # 🎙 ASR(自动语音识别)配置 229 | ASR_MODEL_DIR = os.path.join(PROJECT_ROOT, "ASR_env/SenseVoice/models/SenseVoiceSmall") 230 | ASR_AUDIO_INPUT = os.path.join(PROJECT_ROOT, "ASR_env/input_voice/voice.wav") 231 | 232 | # 🔊 TTS(文本转语音)配置 233 | TTS_API_URL = "http://localhost:8000/" 234 | TTS_OUTPUT_DIR = os.path.join(PROJECT_ROOT, "TTS_env/output_voice/") 235 | 236 | ...... 237 | 238 | # 📢 更多配置信息请参考 `config.py` 239 | ``` 240 | ❗ 请确保所有路径正确,否则模型无法正常运行! 241 | 242 | --- 243 | ### 📌 4.3. 配置 LLM 模型 244 | 本地化部署**LLM 模型**依赖于**LM Studio**,请按照以下步骤进行: 245 | 246 | #### 1️⃣ 安装 LM Studio 247 | 可从[GitHub 官方仓库](https://github.com/lmstudio-ai) 或 [LM Studio 官网](https://lmstudio.ai/) 下载安装。 248 | 249 | #### 2️⃣ 进入程序,下载当前设备可运行的 LLM 模型。 250 | 启用 LM Studio,获取 本地接口 URL。 251 | 确定模型路径 & 端口号,在 config.py 中进行相应配置。 252 | #### 3️⃣ 运行本地 LLM,并在项目中调用。 253 | ⚠️ **注意**:本地 LLM 部署性能受限于设备配置,可能无法与云端大模型相比。如需更高性能,可考虑使用 OpenAI GPT-4 或 DeepSeek API。 254 | 255 | --- 256 | ## 👀 5. 使用方法 257 | 258 | --- 259 | 260 | ## 📌 5.1. 启动 TTS API 261 | 262 | 在运行主程序前,**需要先启动 TTS API**: 263 | 264 | ```bash 265 | python TTS_api.py # 现在 TTS API 调用**已集成到主程序中**,通常无需单独运行,但调试(debug)时可单独运行检查。 266 | ``` 267 | 268 | 269 | 🎯 TTS API 模块将在 **conda 环境** 中运行 webui.py。启动成功后,可在浏览器访问 WebUI 进行语音合成管理:🌍 默认访问地址:http://localhost:8000 270 | 271 | ❗ 确保 TTS API **启动成功**,否则程序无法合成语音。 272 | 273 | --- 274 | ## 📌 5.2. 运行主程序 275 | 启动 TTS API 后,运行后续程序: 276 | 277 | ```bash 278 | python main.py 279 | ``` 280 | 🎙 交互方式: 281 | 282 | #### 1️⃣ 按住 Ctrl 键 开始录音,按 Alt 键 结束录音,语音将自动转换为文本。 283 | #### 2️⃣ 语音文本 被输入 LLM 模块 进行回答,并生成答复文本。 284 | #### 3️⃣ 答复文本 被输入 TTS 模块 合成为语音,并在 Live2D 窗口中做出口型同步。 285 | 286 | --- 287 | ## 📌 5.3. 架构示意图 288 | 289 | | **步骤** | **模块** | **输入** | **处理** | **输出** | 290 | |----------|---------|---------|---------|---------| 291 | | 🎤 **用户语音** | **用户** | 语音输入 | 用户说话 | 音频信号 | 292 | | 🎙 **语音识别** | **ASR(SenseVoice)** | 音频信号 | 语音转文本(STT) | 识别文本 | 293 | | 🤖 **文本理解 & 生成** | **LLM(GPT-4 / DeepSeek)** | 识别文本 | 语义分析 & 生成 AI 回复 | AI 生成文本 | 294 | | 🔊 **语音合成** | **TTS(CosyVoice)** | AI 生成文本 | 文本转语音(TTS) | 语音数据 | 295 | | 🎭 **Live2D 动画** | **Live2D** | 语音数据 | 动作生成 | 角色动画 | 296 | | 🗣 **AI 语音反馈** | **用户** | 角色语音 & 动作 | 用户听到 AI 反馈 | 语音 & 视觉互动 | 297 | 298 | 299 | --- 300 | # 📂 6. 项目结构 301 | --- 302 | 303 | 本项目采用模块化设计,包含 **ASR(语音识别)、TTS(文本转语音)、LLM(大语言模型)、Live2D 动画渲染** 等核心功能,以下是 **完整的项目结构**: 304 | 305 | ```bash 306 | Live2D-LLM-Chat/ 307 | │── main.py # 🚀 主程序入口 308 | │── ASR.py # 🎙 语音识别 (ASR) 模块 309 | │── TTS.py # 🔊 语音合成 (TTS) 模块 310 | │── TTS_api.py # 🌐 TTS API 模块 311 | │── LLM.py # 🤖 大语言模型 (LLM) 模块 312 | │── Live2d_animation.py # 🎭 Live2D 动画管理模块 313 | │── webui.py # 🖥 WebUI 语音合成界面 314 | │── config.py # ⚙️ 项目配置文件 315 | │── requirements.txt # 📦 依赖列表 316 | └── README.md # 📄 项目文档 317 | ``` 318 | --- 319 | ## 🚀 7. 项目发展 320 | --- 321 | 322 | ### 📌 7.1. 过往内容 323 | 324 | #### 🏁 **2025.01.28 - 项目构思** 325 | - 🎯 **确定核心目标**:基于 **Live2D + LLM** 实现实时互动系统 326 | - 🔍 **研究技术**:语音识别(ASR)、文本转语音(TTS)及 Live2D 方案 327 | - ✅ **选定核心组件**: 328 | - **SenseVoice** 作为 ASR 329 | - **CosyVoice** 作为 TTS 330 | - **live2d-py** 作为动画渲染引擎 331 | 332 | #### 📅 **2025.02.28 - 发布第一版** 333 | - 🎙 **实现语音输入 & 识别(ASR)** 334 | - 🤖 **集成 LLM 进行文本生成** 335 | - 🔊 **通过 TTS 生成语音并同步 Live2D 模型部分动作** 336 | 337 | --- 338 | 339 | ### 📌 7.2. 
未来计划 ~~(画饼)~~ 340 | 341 | 🔹 **LLM 模块优化**: 342 | - 由于 **个人设备性能** 限制了本地部署模型的输出质量,计划 **改进 LLM 模块的输出逻辑**,提升稳定性。 343 | 344 | 🔹 **信息输出精炼**: 345 | - 优化 **模型运行时的日志和输出信息**,仅保留重点内容,提高可读性和观感。 346 | 347 | 🔹 **Live2D 交互增强**: 348 | - **提升 Live2D 角色的动作丰富度**,增强互动体验,使 Live2D 角色更具表现力。 349 | 350 | 🔹 **后续优化**: 351 | - 🛠 持续优化 TTS & ASR 模块的运行效率 352 | - 🌍 增强多语言支持,扩展至更多语种 353 | - 🔗 进一步支持云端推理,提高性能 354 | 355 | --- 356 | 357 | ## 🤝 8. 贡献与鸣谢 358 | --- 359 | 360 | 本项目部分代码基于 [SenseVoice](https://github.com/FunAudioLLM/SenseVoice)、[CosyVoice](https://github.com/FunAudioLLM/CosyVoice) 和 [live2d-py](https://github.com/Arkueid/live2d-py) 进行修改,并根据项目需求进行了优化和扩展。 361 | 🎉 **特此感谢原项目作者的贡献!** 362 | 363 | 💡 **欢迎贡献代码和建议!** 364 | 📢 如有问题或改进建议,请提交 **PR(Pull Request)** 或 **Issue** 进行反馈。 365 | 366 | --- 367 | 368 | 369 | ## 📄 许可证 370 | 本项目采用 [Apache-2.0 许可证](LICENSE)。 371 | 372 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Apache License 2 | Version 2.0, January 2004 3 | http://www.apache.org/licenses/ 4 | 5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 6 | 7 | 1. Definitions. 8 | 9 | "License" shall mean the terms and conditions for use, reproduction, 10 | and distribution as defined by Sections 1 through 9 of this document. 11 | 12 | "Licensor" shall mean the copyright owner or entity authorized by 13 | the copyright owner that is granting the License. 14 | 15 | "Legal Entity" shall mean the union of the acting entity and all 16 | other entities that control, are controlled by, or are under common 17 | control with that entity. For the purposes of this definition, 18 | "control" means (i) the power, direct or indirect, to cause the 19 | direction or management of such entity, whether by contract or 20 | otherwise, or (ii) ownership of fifty percent (50%) or more of the 21 | outstanding shares, or (iii) beneficial ownership of such entity. 22 | 23 | "You" (or "Your") shall mean an individual or Legal Entity 24 | exercising permissions granted by this License. 25 | 26 | "Source" form shall mean the preferred form for making modifications, 27 | including but not limited to software source code, documentation 28 | source, and configuration files. 29 | 30 | "Object" form shall mean any form resulting from mechanical 31 | transformation or translation of a Source form, including but 32 | not limited to compiled object code, generated documentation, 33 | and conversions to other media types. 34 | 35 | "Work" shall mean the work of authorship, whether in Source or 36 | Object form, made available under the License, as indicated by a 37 | copyright notice that is included in or attached to the work 38 | (an example is provided in the Appendix below). 39 | 40 | "Derivative Works" shall mean any work, whether in Source or Object 41 | form, that is based on (or derived from) the Work and for which the 42 | editorial revisions, annotations, elaborations, or other modifications 43 | represent, as a whole, an original work of authorship. For the purposes 44 | of this License, Derivative Works shall not include works that remain 45 | separable from, or merely link (or bind by name) to the interfaces of, 46 | the Work and Derivative Works thereof. 
47 | 48 | "Contribution" shall mean any work of authorship, including 49 | the original version of the Work and any modifications or additions 50 | to that Work or Derivative Works thereof, that is intentionally 51 | submitted to Licensor for inclusion in the Work by the copyright owner 52 | or by an individual or Legal Entity authorized to submit on behalf of 53 | the copyright owner. For the purposes of this definition, "submitted" 54 | means any form of electronic, verbal, or written communication sent 55 | to the Licensor or its representatives, including but not limited to 56 | communication on electronic mailing lists, source code control systems, 57 | and issue tracking systems that are managed by, or on behalf of, the 58 | Licensor for the purpose of discussing and improving the Work, but 59 | excluding communication that is conspicuously marked or otherwise 60 | designated in writing by the copyright owner as "Not a Contribution." 61 | 62 | "Contributor" shall mean Licensor and any individual or Legal Entity 63 | on behalf of whom a Contribution has been received by Licensor and 64 | subsequently incorporated within the Work. 65 | 66 | 2. Grant of Copyright License. Subject to the terms and conditions of 67 | this License, each Contributor hereby grants to You a perpetual, 68 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 69 | copyright license to reproduce, prepare Derivative Works of, 70 | publicly display, publicly perform, sublicense, and distribute the 71 | Work and such Derivative Works in Source or Object form. 72 | 73 | 3. Grant of Patent License. Subject to the terms and conditions of 74 | this License, each Contributor hereby grants to You a perpetual, 75 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 76 | (except as stated in this section) patent license to make, have made, 77 | use, offer to sell, sell, import, and otherwise transfer the Work, 78 | where such license applies only to those patent claims licensable 79 | by such Contributor that are necessarily infringed by their 80 | Contribution(s) alone or by combination of their Contribution(s) 81 | with the Work to which such Contribution(s) was submitted. If You 82 | institute patent litigation against any entity (including a 83 | cross-claim or counterclaim in a lawsuit) alleging that the Work 84 | or a Contribution incorporated within the Work constitutes direct 85 | or contributory patent infringement, then any patent licenses 86 | granted to You under this License for that Work shall terminate 87 | as of the date such litigation is filed. 88 | 89 | 4. Redistribution. 
You may reproduce and distribute copies of the 90 | Work or Derivative Works thereof in any medium, with or without 91 | modifications, and in Source or Object form, provided that You 92 | meet the following conditions: 93 | 94 | (a) You must give any other recipients of the Work or 95 | Derivative Works a copy of this License; and 96 | 97 | (b) You must cause any modified files to carry prominent notices 98 | stating that You changed the files; and 99 | 100 | (c) You must retain, in the Source form of any Derivative Works 101 | that You distribute, all copyright, patent, trademark, and 102 | attribution notices from the Source form of the Work, 103 | excluding those notices that do not pertain to any part of 104 | the Derivative Works; and 105 | 106 | (d) If the Work includes a "NOTICE" text file as part of its 107 | distribution, then any Derivative Works that You distribute must 108 | include a readable copy of the attribution notices contained 109 | within such NOTICE file, excluding those notices that do not 110 | pertain to any part of the Derivative Works, in at least one 111 | of the following places: within a NOTICE text file distributed 112 | as part of the Derivative Works; within the Source form or 113 | documentation, if provided along with the Derivative Works; or, 114 | within a display generated by the Derivative Works, if and 115 | wherever such third-party notices normally appear. The contents 116 | of the NOTICE file are for informational purposes only and 117 | do not modify the License. You may add Your own attribution 118 | notices within Derivative Works that You distribute, alongside 119 | or as an addendum to the NOTICE text from the Work, provided 120 | that such additional attribution notices cannot be construed 121 | as modifying the License. 122 | 123 | You may add Your own copyright statement to Your modifications and 124 | may provide additional or different license terms and conditions 125 | for use, reproduction, or distribution of Your modifications, or 126 | for any such Derivative Works as a whole, provided Your use, 127 | reproduction, and distribution of the Work otherwise complies with 128 | the conditions stated in this License. 129 | 130 | 5. Submission of Contributions. Unless You explicitly state otherwise, 131 | any Contribution intentionally submitted for inclusion in the Work 132 | by You to the Licensor shall be under the terms and conditions of 133 | this License, without any additional terms or conditions. 134 | Notwithstanding the above, nothing herein shall supersede or modify 135 | the terms of any separate license agreement you may have executed 136 | with Licensor regarding such Contributions. 137 | 138 | 6. Trademarks. This License does not grant permission to use the trade 139 | names, trademarks, service marks, or product names of the Licensor, 140 | except as required for reasonable and customary use in describing the 141 | origin of the Work and reproducing the content of the NOTICE file. 142 | 143 | 7. Disclaimer of Warranty. Unless required by applicable law or 144 | agreed to in writing, Licensor provides the Work (and each 145 | Contributor provides its Contributions) on an "AS IS" BASIS, 146 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 147 | implied, including, without limitation, any warranties or conditions 148 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 149 | PARTICULAR PURPOSE. 
You are solely responsible for determining the 150 | appropriateness of using or redistributing the Work and assume any 151 | risks associated with Your exercise of permissions under this License. 152 | 153 | 8. Limitation of Liability. In no event and under no legal theory, 154 | whether in tort (including negligence), contract, or otherwise, 155 | unless required by applicable law (such as deliberate and grossly 156 | negligent acts) or agreed to in writing, shall any Contributor be 157 | liable to You for damages, including any direct, indirect, special, 158 | incidental, or consequential damages of any character arising as a 159 | result of this License or out of the use or inability to use the 160 | Work (including but not limited to damages for loss of goodwill, 161 | work stoppage, computer failure or malfunction, or any and all 162 | other commercial damages or losses), even if such Contributor 163 | has been advised of the possibility of such damages. 164 | 165 | 9. Accepting Warranty or Additional Liability. While redistributing 166 | the Work or Derivative Works thereof, You may choose to offer, 167 | and charge a fee for, acceptance of support, warranty, indemnity, 168 | or other liability obligations and/or rights consistent with this 169 | License. However, in accepting such obligations, You may act only 170 | on Your own behalf and on Your sole responsibility, not on behalf 171 | of any other Contributor, and only if You agree to indemnify, 172 | defend, and hold each Contributor harmless for any liability 173 | incurred by, or claims asserted against, such Contributor by reason 174 | of your accepting any such warranty or additional liability. 175 | 176 | END OF TERMS AND CONDITIONS 177 | 178 | APPENDIX: How to apply the Apache License to your work. 179 | 180 | To apply the Apache License to your work, attach the following 181 | boilerplate notice, with the fields enclosed by brackets "[]" 182 | replaced with your own identifying information. (Don't include 183 | the brackets!) The text should be enclosed in the appropriate 184 | comment syntax for the file format. We also recommend that a 185 | file or class name and description of purpose be included on the 186 | same "printed page" as the copyright notice for easier 187 | identification within third-party archives. 188 | 189 | Copyright [yyyy] [name of copyright owner] 190 | 191 | Licensed under the Apache License, Version 2.0 (the "License"); 192 | you may not use this file except in compliance with the License. 193 | You may obtain a copy of the License at 194 | 195 | http://www.apache.org/licenses/LICENSE-2.0 196 | 197 | Unless required by applicable law or agreed to in writing, software 198 | distributed under the License is distributed on an "AS IS" BASIS, 199 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 200 | See the License for the specific language governing permissions and 201 | limitations under the License. 202 | -------------------------------------------------------------------------------- /TTS_env/webui.py: -------------------------------------------------------------------------------- 1 | 2 | # 本文件基于 Alibaba Inc 的原始代码(webui)修改 3 | # 原作者: Xiang Lyu, Liu Yue 4 | # 修改者: suzuran0y 5 | # 主要修改内容: 6 | # 1. 添加生成语音历史存档功能 7 | # 2. 增加语音数据清除方式:自定义文件清除或归档 8 | # 3. 修改生成语音文件打开方式,允许直接保存生成文件 9 | # 4. 
对于长文本下分段生成的语音文件,合并为单一文件 10 | 11 | import os 12 | import sys 13 | import argparse 14 | import gradio as gr 15 | import numpy as np 16 | import torch 17 | import torchaudio 18 | import random 19 | import librosa 20 | import soundfile as sf 21 | import shutil 22 | import datetime 23 | from pydub import AudioSegment # 用于合并音频 24 | from config import Config 25 | 26 | # 配置文件清理方式(delete: 删除 | move: 归档) 27 | CLEANUP_MODE = Config.CLEANUP_MODE # "delete" 或 "move" 28 | 29 | # 设定保存目录和历史归档目录 30 | SAVE_DIR = Config.WEBUI_SAVE_DIR 31 | HISTORY_DIR = Config.WEBUI_HISTORY_DIR 32 | MODEL_DIR = Config.WEBUI_MODEL_DIR 33 | # 确保目录存在 34 | os.makedirs(SAVE_DIR, exist_ok=True) 35 | os.makedirs(HISTORY_DIR, exist_ok=True) 36 | 37 | ROOT_DIR = os.path.dirname(os.path.abspath(__file__)) 38 | sys.path.append('{}/third_party/Matcha-TTS'.format(ROOT_DIR)) 39 | from cosyvoice.cli.cosyvoice import CosyVoice, CosyVoice2 40 | from cosyvoice.utils.file_utils import load_wav, logging 41 | from cosyvoice.utils.common import set_all_random_seed 42 | inference_mode_list = ['预训练音色', '3s极速复刻', '跨语种复刻', '自然语言控制'] 43 | instruct_dict = {'预训练音色': '1. 选择预训练音色\n2. 点击生成音频按钮', 44 | '3s极速复刻': '1. 选择prompt音频文件,或录入prompt音频,注意不超过30s,若同时提供,优先选择prompt音频文件\n2. 输入prompt文本\n3. 点击生成音频按钮', 45 | '跨语种复刻': '1. 选择prompt音频文件,或录入prompt音频,注意不超过30s,若同时提供,优先选择prompt音频文件\n2. 点击生成音频按钮', 46 | '自然语言控制': '1. 选择预训练音色\n2. 输入instruct文本\n3. 点击生成音频按钮'} 47 | stream_mode_list = [('否', False), ('是', True)] 48 | max_val = 0.8 49 | 50 | 51 | # 在新任务开始时,清理或归档旧音频 52 | def cleanup_old_audio(): 53 | files = os.listdir(SAVE_DIR) 54 | if not files: 55 | return 56 | 57 | if CLEANUP_MODE == "delete": 58 | for file in files: 59 | file_path = os.path.join(SAVE_DIR, file) 60 | try: 61 | os.remove(file_path) 62 | # print(f"已删除旧音频: {file}") 63 | except Exception as e: 64 | print(f"无法删除 {file}: {e}") 65 | 66 | elif CLEANUP_MODE == "move": 67 | for file in files: 68 | old_path = os.path.join(SAVE_DIR, file) 69 | new_path = os.path.join(HISTORY_DIR, file) 70 | try: 71 | shutil.move(old_path, new_path) 72 | # print(f"已归档旧音频: {file} -> {HISTORY_DIR}") 73 | except Exception as e: 74 | print(f"无法归档 {file}: {e}") 75 | 76 | def generate_seed(): 77 | seed = random.randint(1, 100000000) 78 | return { 79 | "__type__": "update", 80 | "value": seed 81 | } 82 | 83 | 84 | def postprocess(speech, top_db=60, hop_length=220, win_length=440): 85 | speech, _ = librosa.effects.trim( 86 | speech, top_db=top_db, 87 | frame_length=win_length, 88 | hop_length=hop_length 89 | ) 90 | if speech.abs().max() > max_val: 91 | speech = speech / speech.abs().max() * max_val 92 | speech = torch.concat([speech, torch.zeros(1, int(cosyvoice.sample_rate * 0.2))], dim=1) 93 | return speech 94 | 95 | 96 | def change_instruction(mode_checkbox_group): 97 | return instruct_dict[mode_checkbox_group] 98 | 99 | # 将多个音频片段合并为一个完整音频文件 100 | def merge_audio_files(file_list, output_path): 101 | if len(file_list) == 1: 102 | # print("只有一个音频文件,无需合并") 103 | shutil.move(file_list[0], output_path) # 直接重命名并移动到最终目录 104 | return output_path 105 | 106 | # print(f"需要合并 {len(file_list)} 个音频文件...") 107 | 108 | combined = AudioSegment.empty() 109 | 110 | for file in sorted(file_list): 111 | audio_segment = AudioSegment.from_wav(file) 112 | combined += audio_segment 113 | 114 | combined.export(output_path, format="wav") 115 | # print(f"所有音频片段已合并,最终音频文件: {output_path}") 116 | 117 | # 删除分段音频文件 118 | for file in file_list: 119 | os.remove(file) 120 | # print(f"已删除分段音频文件: {file}") 121 | 122 | return output_path 123 | 124 | 125 | def 
generate_audio(tts_text, mode_checkbox_group, sft_dropdown, prompt_text, prompt_wav_upload, prompt_wav_record, instruct_text, 126 | seed, stream, speed): 127 | # 在新任务开始时,清理或归档旧文件 128 | cleanup_old_audio() 129 | # 获取当前时间戳,用作文件名 130 | timestamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S") 131 | final_output_path = os.path.join(SAVE_DIR, f"{timestamp}.wav") 132 | # 存储所有生成的音频片段 133 | generated_files = [] 134 | 135 | set_all_random_seed(seed) 136 | 137 | def save_audio(audio_data): 138 | """ 保存音频(临时存储,不复制到自定义目录) """ 139 | temp_filename = f"temp_{len(generated_files) + 1}.wav" # 统一使用 `temp_x.wav` 避免混淆 140 | temp_output_path = os.path.join(SAVE_DIR, temp_filename) 141 | 142 | with sf.SoundFile(temp_output_path, 'w', samplerate=cosyvoice.sample_rate, channels=1) as f: 143 | f.write(audio_data) 144 | 145 | generated_files.append(temp_output_path) # 记录生成的音频文件 146 | # print(f"生成的音频文件: {temp_output_path}") 147 | 148 | if prompt_wav_upload is not None: 149 | prompt_wav = prompt_wav_upload 150 | elif prompt_wav_record is not None: 151 | prompt_wav = prompt_wav_record 152 | else: 153 | prompt_wav = None 154 | # if instruct mode, please make sure that model is iic/CosyVoice-300M-Instruct and not cross_lingual mode 155 | if mode_checkbox_group in ['自然语言控制']: 156 | if cosyvoice.instruct is False: 157 | gr.Warning('您正在使用自然语言控制模式, {}模型不支持此模式, 请使用iic/CosyVoice-300M-Instruct模型'.format(args.model_dir)) 158 | yield (cosyvoice.sample_rate, default_data) 159 | if instruct_text == '': 160 | gr.Warning('您正在使用自然语言控制模式, 请输入instruct文本') 161 | yield (cosyvoice.sample_rate, default_data) 162 | if prompt_wav is not None or prompt_text != '': 163 | gr.Info('您正在使用自然语言控制模式, prompt音频/prompt文本会被忽略') 164 | # if cross_lingual mode, please make sure that model is iic/CosyVoice-300M and tts_text prompt_text are different language 165 | if mode_checkbox_group in ['跨语种复刻']: 166 | if cosyvoice.instruct is True: 167 | gr.Warning('您正在使用跨语种复刻模式, {}模型不支持此模式, 请使用iic/CosyVoice-300M模型'.format(args.model_dir)) 168 | yield (cosyvoice.sample_rate, default_data) 169 | if instruct_text != '': 170 | gr.Info('您正在使用跨语种复刻模式, instruct文本会被忽略') 171 | if prompt_wav is None: 172 | gr.Warning('您正在使用跨语种复刻模式, 请提供prompt音频') 173 | yield (cosyvoice.sample_rate, default_data) 174 | gr.Info('您正在使用跨语种复刻模式, 请确保合成文本和prompt文本为不同语言') 175 | # if in zero_shot cross_lingual, please make sure that prompt_text and prompt_wav meets requirements 176 | if mode_checkbox_group in ['3s极速复刻', '跨语种复刻']: 177 | if prompt_wav is None: 178 | gr.Warning('prompt音频为空,您是否忘记输入prompt音频?') 179 | yield (cosyvoice.sample_rate, default_data) 180 | if torchaudio.info(prompt_wav).sample_rate < prompt_sr: 181 | gr.Warning('prompt音频采样率{}低于{}'.format(torchaudio.info(prompt_wav).sample_rate, prompt_sr)) 182 | yield (cosyvoice.sample_rate, default_data) 183 | # sft mode only use sft_dropdown 184 | if mode_checkbox_group in ['预训练音色']: 185 | if instruct_text != '' or prompt_wav is not None or prompt_text != '': 186 | gr.Info('您正在使用预训练音色模式,prompt文本/prompt音频/instruct文本会被忽略!') 187 | if sft_dropdown == '': 188 | gr.Warning('没有可用的预训练音色!') 189 | yield (cosyvoice.sample_rate, default_data) 190 | # zero_shot mode only use prompt_wav prompt text 191 | if mode_checkbox_group in ['3s极速复刻']: 192 | if prompt_text == '': 193 | gr.Warning('prompt文本为空,您是否忘记输入prompt文本?') 194 | yield (cosyvoice.sample_rate, default_data) 195 | if instruct_text != '': 196 | gr.Info('您正在使用3s极速复刻模式,预训练音色/instruct文本会被忽略!') 197 | 198 | if mode_checkbox_group == '预训练音色': 199 | logging.info('get sft inference request') 200 | for i in 
cosyvoice.inference_sft(tts_text, sft_dropdown, stream=stream, speed=speed): 201 | audio_data = i['tts_speech'].numpy().flatten() 202 | yield cosyvoice.sample_rate, audio_data 203 | save_audio(audio_data) 204 | 205 | elif mode_checkbox_group == '3s极速复刻': 206 | logging.info('get zero_shot inference request') 207 | prompt_speech_16k = postprocess(load_wav(prompt_wav, prompt_sr)) 208 | for i in cosyvoice.inference_zero_shot(tts_text, prompt_text, prompt_speech_16k, stream=stream, speed=speed): 209 | audio_data = i['tts_speech'].numpy().flatten() 210 | yield cosyvoice.sample_rate, audio_data 211 | save_audio(audio_data) 212 | 213 | elif mode_checkbox_group == '跨语种复刻': 214 | logging.info('get cross_lingual inference request') 215 | prompt_speech_16k = postprocess(load_wav(prompt_wav, prompt_sr)) 216 | for i in cosyvoice.inference_cross_lingual(tts_text, prompt_speech_16k, stream=stream, speed=speed): 217 | audio_data = i['tts_speech'].numpy().flatten() 218 | yield (cosyvoice.sample_rate, audio_data) 219 | save_audio(audio_data) 220 | 221 | else: 222 | logging.info('get instruct inference request') 223 | for i in cosyvoice.inference_instruct(tts_text, sft_dropdown, instruct_text, stream=stream, speed=speed): 224 | audio_data = i['tts_speech'].numpy().flatten() 225 | yield (cosyvoice.sample_rate, audio_data) 226 | save_audio(audio_data) 227 | 228 | # 合并多个音频文件(如果有多个,否则无变化) 229 | final_output = merge_audio_files(generated_files, final_output_path) 230 | # print(f"最终合成的完整音频文件: {final_output}") 231 | 232 | def main(): 233 | with gr.Blocks() as demo: 234 | gr.Markdown("### 代码库 [CosyVoice](https://github.com/FunAudioLLM/CosyVoice) \ 235 | 预训练模型 [CosyVoice-300M](https://www.modelscope.cn/models/iic/CosyVoice-300M) \ 236 | [CosyVoice-300M-Instruct](https://www.modelscope.cn/models/iic/CosyVoice-300M-Instruct) \ 237 | [CosyVoice-300M-SFT](https://www.modelscope.cn/models/iic/CosyVoice-300M-SFT)") 238 | gr.Markdown("#### 请输入需要合成的文本,选择推理模式,并按照提示步骤进行操作") 239 | 240 | tts_text = gr.Textbox(label="输入合成文本", lines=1, value="我是通义实验室语音团队全新推出的生成式语音大模型,提供舒适自然的语音合成能力。") 241 | with gr.Row(): 242 | mode_checkbox_group = gr.Radio(choices=inference_mode_list, label='选择推理模式', value=inference_mode_list[0]) 243 | instruction_text = gr.Text(label="操作步骤", value=instruct_dict[inference_mode_list[0]], scale=0.5) 244 | sft_dropdown = gr.Dropdown(choices=sft_spk, label='选择预训练音色', value=sft_spk[0], scale=0.25) 245 | stream = gr.Radio(choices=stream_mode_list, label='是否流式推理', value=stream_mode_list[0][1]) 246 | speed = gr.Number(value=1, label="速度调节(仅支持非流式推理)", minimum=0.5, maximum=2.0, step=0.1) 247 | with gr.Column(scale=0.25): 248 | seed_button = gr.Button(value="\U0001F3B2") 249 | seed = gr.Number(value=0, label="随机推理种子") 250 | 251 | with gr.Row(): 252 | prompt_wav_upload = gr.Audio(sources='upload', type='filepath', label='选择prompt音频文件,注意采样率不低于16khz') 253 | prompt_wav_record = gr.Audio(sources='microphone', type='filepath', label='录制prompt音频文件') 254 | prompt_text = gr.Textbox(label="输入prompt文本", lines=1, placeholder="请输入prompt文本,需与prompt音频内容一致,暂时不支持自动识别...", value='') 255 | instruct_text = gr.Textbox(label="输入instruct文本", lines=1, placeholder="请输入instruct文本.", value='') 256 | 257 | generate_button = gr.Button("生成音频") 258 | 259 | # audio_output = gr.Audio(label="合成音频", autoplay=True, streaming=True) 260 | audio_output = gr.Audio(label="合成音频", streaming=False) # streaming=False 能够下载音频 261 | 262 | seed_button.click(generate_seed, inputs=[], outputs=seed) 263 | generate_button.click(generate_audio, 264 | inputs=[tts_text, 
mode_checkbox_group, sft_dropdown, prompt_text, prompt_wav_upload, prompt_wav_record, instruct_text, 265 | seed, stream, speed], 266 | outputs=[audio_output]) 267 | mode_checkbox_group.change(fn=change_instruction, inputs=[mode_checkbox_group], outputs=[instruction_text]) 268 | demo.queue(max_size=4, default_concurrency_limit=2) 269 | demo.launch(server_name='0.0.0.0', server_port=args.port) 270 | 271 | 272 | if __name__ == '__main__': 273 | parser = argparse.ArgumentParser() 274 | parser.add_argument('--port', 275 | type=int, 276 | default=8000) 277 | parser.add_argument('--model_dir', 278 | type=str, 279 | default=MODEL_DIR, 280 | help='local path or modelscope repo id') 281 | args = parser.parse_args() 282 | try: 283 | cosyvoice = CosyVoice(args.model_dir) 284 | except Exception: 285 | try: 286 | cosyvoice = CosyVoice2(args.model_dir) 287 | except Exception: 288 | raise TypeError('no valid model_type!') 289 | 290 | sft_spk = cosyvoice.list_available_spks() 291 | if len(sft_spk) == 0: 292 | sft_spk = [''] 293 | prompt_sr = 16000 294 | default_data = np.zeros(cosyvoice.sample_rate) 295 | main() 296 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Live2D-LLM-Chat 2 | [US English](README.md) | [CN 中文](README_CN.md) 3 | 4 | [![ASR](https://img.shields.io/badge/ASR-SenseVoice-green.svg)](https://github.com/FunAudioLLM/SenseVoice) 5 | [![LLM](https://img.shields.io/badge/LLM-GPT%2FDeepSeek-red.svg)](https://openai.com/api/) 6 | [![TTS](https://img.shields.io/badge/TTS-CosyVoice-orange.svg)](https://github.com/FunAudioLLM/CosyVoice) 7 | [![Live2D](https://img.shields.io/badge/Live2D-v3-blue.svg)](https://github.com/Arkueid/live2d-py) 8 | 9 | [![Python](https://img.shields.io/badge/Python-3.8+-yellow.svg)](https://www.python.org/downloads/) 10 | [![Miniconda](https://img.shields.io/badge/Anaconda-Miniconda-violet.svg)](https://www.anaconda.com/docs/getting-started/anaconda/install) 11 | 12 | > **Live2D + ASR + LLM + TTS** → Real-time voice interaction | Local deployment / Cloud inference 13 | 14 | --- 15 | ## ✨ 1. Project Introduction 16 | 17 | **Live2D-LLM-Chat** is a real-time AI interaction project that integrates **Live2D virtual avatars**, **Automatic Speech Recognition (ASR)**, **Large Language Models (LLM)**, and **Text-to-Speech (TTS)**. It allows a **virtual character** to recognize the user's speech through ASR, generate intelligent responses using AI, synthesize speech via TTS, and drive Live2D animations with lip-sync for a natural interaction experience. 18 | 19 | --- 20 | ### 📌 1.1. Main Features 21 | - 🎙 **Automatic Speech Recognition(ASR)**: Uses FunASR for Speech-to-Text (STT) processing. 22 | - 🧠 **Large Language Model(LLM)**: Supports rational conversation using OpenAI GPT / DeepSeek. 23 | - 🔊 **Text-to-Speech(TTS)**: Uses CosyVoice for high-quality speech synthesis. 24 | - 🏆 **Live2D Virtual Character Interaction**: Renders models using Live2D SDK and enables real-time feedback. 25 | 26 | --- 27 | ### 📌 1.2. Enhanced Features 28 | - **LLM module** supports both local and cloud deployment. The local deployment is based on **LM Studio**, which covers all open-source models, but personal device performance may limit large - models. Cloud deployment supports **OpenAI** and **DeepSeek** APIs. 29 | - Stores conversation history with **context memory**. Every five conversations, a summary is generated to prevent excessive text accumulation. 
30 | - **Conversation logging** records the timestamp and dialogue history, including **TTS audio outputs**, making it easy to review past interactions. This feature can be disabled in the config file to **reduce memory usage**. 31 | - Enhanced Live2D **eye-tracking** and **blinking logic** to provide natural blinking even if the Live2D model lacks built-in logic. Implements **lip-sync mechanics** by analyzing real-time audio volume from the TTS output. 32 | - Modifies CosyVoice API to **directly save** generated speech files and **merge** segmented audio for long text synthesis. 33 | 34 |

35 | Live2D 运行展示 36 |
37 | Live2D Running Showcase 38 |

39 | 40 | #### 🎬 Interaction Demo 41 | 42 | | Voice Input | AI Processing | Live2D Output | 43 | |----------|---------|------------| 44 | | 🎤 You: Hello! | 🤖 AI: Hi there! | 🧑‍🎤 "Hi there!" (Lip sync) | 45 | | 🎤 You: How's the weather? | 🤖 AI: It's a sunny day! | 🧑‍🎤 "It's a sunny day!" (Speech tone variation) | 46 | 47 | --- 48 | ### 📌 1.3. Tech Stack 49 | | Component | Technology | 50 | |-------|-------| 51 | | ASR (Automatic Speech Recognition) | SenseVoice | 52 | | LLM (Large Language Model) | OpenAI GPT / DeepSeek | 53 | | TTS (Text-to-Speech) | CosyVoice | 54 | | Live2D Animation | live2d-py + OpenGL | 55 | | Configuration Management | Python Config | 56 | 57 | --- 58 | ## 🛠 2. Installation and Configuration 59 | 60 | --- 61 | 62 | ### 📌 2.1. System Requirements 63 | 64 | This project is developed with **Python 3.11**, and the following system requirements should be met before running it: 65 | 66 | ✅ **Operating System**: 67 | - 🖥 **Windows 10/11** or **Linux** 68 | 69 | ✅ **Python Version**: 70 | - 📌 Recommended **Python 3.8 or above** 71 | 72 | ⚠️ **Note**: 73 | The **TTS module** runs in a **conda environment** and requires **Miniconda** to be installed beforehand. 74 | 🔗 You can download it from [Miniconda Official Website](https://docs.conda.io/en/latest/miniconda.html). 75 | --- 76 | 77 | ### 📌 2.2. Dependencies 78 | 79 | This project leverages the following open-source libraries and models: 80 | 81 | 🎙 **Automatic Speech Recognition (ASR)**: 82 | - **SenseVoice** - High-precision **multilingual speech recognition** and **speech emotion analysis**. 83 | - 🔗 **GitHub**: [SenseVoice Repository](https://github.com/FunAudioLLM/SenseVoice) 84 | 85 | 🔊 **Text-to-Speech (TTS)**: 86 | - **CosyVoice** - A powerful **generative speech synthesis system**, supporting **zero-shot voice cloning**. 87 | - 🔗 **GitHub**: [CosyVoice Repository](https://github.com/FunAudioLLM/CosyVoice) 88 | 89 | 📽 **Live2D Animation**: 90 | - **live2d-py** - A tool for **directly loading and manipulating Live2D models** in Python. 91 | - 🔗 **GitHub**: [live2d-py Repository](https://github.com/Arkueid/live2d-py) 92 | 93 | --- 94 | ## 📁 3. Installation Steps 95 | 96 | --- 97 | ### 📌 3.1. Clone the Project Repository 98 | 99 | ```bash 100 | git clone https://github.com/suzuran0y/Live2D-LLM-Chat.git 101 | cd Live2D-LLM-Chat 102 | ``` 103 | 104 | ### 📌 3.2. Create a Virtual Environment (Optional) 105 | ```bash 106 | python -m venv venv 107 | source venv/bin/activate # Linux/macOS activation 108 | venv\Scripts\activate # Windows activation 109 | ``` 110 | 111 | ### 📌 3.3. Install Dependencies 112 | 113 | ```bash 114 | pip install -r requirements.txt 115 | ``` 116 | 117 | --- 118 | ### 📌 3.4. Install ASR & TTS Models 119 | 120 | 🎙 **Speech Recognition (ASR) - SenseVoice** 121 | This project uses SenseVoice for ASR, supporting **high-precision multilingual speech recognition** and **speech emotion detection**. 
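Once the dependencies below are installed, a single call through funasr is enough to turn a recorded clip into text. The snippet that follows is only a minimal sketch of that call, not the project's actual `ASR.py`; the model path, device, and generation arguments are illustrative assumptions that should be adapted to your own setup:

```python
from funasr import AutoModel
from funasr.utils.postprocess_utils import rich_transcription_postprocess

# Load a local SenseVoiceSmall checkpoint (mirrors ASR_MODEL_DIR in config.py; adjust path/device as needed).
model = AutoModel(model="pretrained_models/SenseVoiceSmall", device="cuda:0")

# Transcribe one recorded utterance; language="auto" lets SenseVoice detect the spoken language.
result = model.generate(input="ASR_env/input_voice/voice.wav", language="auto", use_itn=True)
print(rich_transcription_postprocess(result[0]["text"]))
```

The recognized text is what the pipeline hands to the LLM module in the interaction flow described in section 5.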
122 | 123 | #### 1️⃣ Install SenseVoice Dependencies 124 | Install SenseVoice dependencies using pip: 125 | ```bash 126 | pip install funasr 127 | ``` 128 | 129 | If you need ONNX or TorchScript inference, install the corresponding versions: 130 | ```bash 131 | pip install funasr-onnx # ONNX version 132 | pip install funasr-torch # TorchScript version 133 | ``` 134 | 135 | #### 2️⃣ Download SenseVoice Pre-trained Models 136 | SenseVoice provides several **pre-trained models**, which can be downloaded via ModelScope: 137 | ```python 138 | from modelscope import snapshot_download 139 | 140 | # Download SenseVoice-Small version 141 | snapshot_download('iic/SenseVoiceSmall', local_dir='pretrained_models/SenseVoiceSmall') 142 | # Download SenseVoice-Large version for higher accuracy 143 | snapshot_download('iic/SenseVoiceLarge', local_dir='pretrained_models/SenseVoiceLarge') 144 | ``` 145 | 146 | 🔗 More details: [SenseVoice GitHub](https://github.com/FunAudioLLM/SenseVoice) | [ModelScope](https://www.modelscope.cn/models/iic/SenseVoiceSmall) 147 | 148 | 🔊 **Text-to-Speech (TTS) - CosyVoice** 149 | This project uses CosyVoice for TTS, supporting **multilingual speech synthesis, voice cloning, and cross-lingual synthesis**. 150 | 151 | #### 1️⃣ Install CosyVoice Dependencies 152 | Clone the CosyVoice repository: 153 | ```bash 154 | git clone --recursive https://github.com/FunAudioLLM/CosyVoice.git 155 | cd CosyVoice 156 | git submodule update --init --recursive 157 | ``` 158 | 159 | #### 2️⃣ Create a Conda Environment and Install Dependencies 160 | ```bash 161 | # Create a Conda virtual environment 162 | conda create -n cosyvoice -y python=3.10 163 | conda activate cosyvoice 164 | 165 | # Install required dependencies 166 | conda install -y -c conda-forge pynini==2.1.5 167 | pip install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple/ --trusted-host=mirrors.aliyun.com 168 | ``` 169 | 170 | Install SoX (if necessary): 171 | ```bash 172 | # Ubuntu 173 | sudo apt-get install sox libsox-dev 174 | # CentOS 175 | sudo yum install sox sox-devel 176 | ``` 177 | 178 | #### 3️⃣ Download CosyVoice Pre-trained Models 179 | It is recommended to download the following CosyVoice pre-trained models: 180 | ```python 181 | from modelscope import snapshot_download 182 | 183 | snapshot_download('iic/CosyVoice2-0.5B', local_dir='pretrained_models/CosyVoice2-0.5B') 184 | snapshot_download('iic/CosyVoice-300M', local_dir='pretrained_models/CosyVoice-300M') 185 | snapshot_download('iic/CosyVoice-300M-SFT', local_dir='pretrained_models/CosyVoice-300M-SFT') 186 | snapshot_download('iic/CosyVoice-300M-Instruct', local_dir='pretrained_models/CosyVoice-300M-Instruct') 187 | snapshot_download('iic/CosyVoice-ttsfrd', local_dir='pretrained_models/CosyVoice-ttsfrd') 188 | ``` 189 | 190 | 🔗 More details: [CosyVoice GitHub](https://github.com/FunAudioLLM/CosyVoice) | [ModelScope](https://www.modelscope.cn/iic/CosyVoice2-0.5B) 191 | 192 | --- 193 | ## ⚙️ 4. Configuration for Local Setup(important!!) 194 | 195 | --- 196 | 197 | ### 📌 4.1. Configure ASR & TTS Models 198 | 199 | After installing **ASR** and **TTS** models, follow these steps for local configuration: 200 | 201 | ✅ **Replace SenseVoice Directory** 202 | - Move the downloaded **SenseVoice** folder into `Live2D-LLM-Chat/ASR_env/`, replacing the existing empty folder. 203 | 204 | ✅ **Replace CosyVoice Directory** 205 | - Move the downloaded **CosyVoice** folder into `Live2D-LLM-Chat/TTS_env/`, replacing the existing empty folder. 
206 | 207 | ✅ **Replace `webui.py` File** 208 | - Move the `TTS_env/webui.py` file into the `CosyVoice` folder, replacing the original `webui.py` file. 209 | 210 | --- 211 | 212 | ### 📌 4.2. Configure `config.py` for Local Environment 213 | Modify **`config.py`** to adjust local file paths and parameters. Example: 214 | ```python 215 | class Config: 216 | # 🏠 Project Root Directory 217 | PROJECT_ROOT = "E:/PyCharm/project/project1" 218 | 219 | # 🎙 ASR (Automatic Speech Recognition) Configuration 220 | ASR_MODEL_DIR = os.path.join(PROJECT_ROOT, "ASR_env/SenseVoice/models/SenseVoiceSmall") 221 | ASR_AUDIO_INPUT = os.path.join(PROJECT_ROOT, "ASR_env/input_voice/voice.wav") 222 | 223 | # 🔊 TTS (Text-to-Speech) Configuration 224 | TTS_API_URL = "http://localhost:8000/" 225 | TTS_OUTPUT_DIR = os.path.join(PROJECT_ROOT, "TTS_env/output_voice/") 226 | 227 | ``` 228 | ❗ **Ensure all paths are correctly set up before running the project!** 229 | 230 | --- 231 | ### 📌 4.3. Configure LLM Model 232 | 233 | Local deployment of the **LLM model** relies on **LM Studio**. Follow these steps: 234 | 235 | #### 1️⃣ Install LM Studio 236 | Download from [GitHub](https://github.com/lmstudio-ai) or the [LM Studio official website](https://lmstudio.ai/). 237 | 238 | #### 2️⃣ Open the application and download an LLM model compatible with your device. 239 | Start LM Studio and obtain the local API URL. 240 | Adjust the model path & port number in `config.py`. 241 | 242 | #### 3️⃣ Run the local LLM and integrate it into the project. 243 | ⚠️ **Note**: The performance of locally deployed LLM models depends on device capabilities and may not match cloud-based models. If higher performance is required, consider using OpenAI GPT-4 or DeepSeek API. 244 | 
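Because LM Studio exposes an OpenAI-compatible server, the local and cloud back ends can share the same client code, and only the base URL and key in `config.py` need to change. The snippet below is a hedged sketch of that idea rather than the project's actual `LLM.py`; the port (LM Studio's default is 1234), the placeholder API key, and the model name are assumptions:

```python
from openai import OpenAI

# Point the OpenAI client at the local LM Studio server; swap base_url/api_key
# for the OpenAI or DeepSeek endpoints to switch to cloud inference instead.
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

def chat(prompt: str) -> str:
    # "local-model" is a placeholder; LM Studio routes requests to whichever model is currently loaded.
    response = client.chat.completions.create(
        model="local-model",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

print(chat("你好呀"))
```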
245 | --- 246 | ## 👀 5. Usage Instructions 247 | --- 248 | 249 | ### 📌 5.1. Start the TTS API 250 | 251 | Before running the main program, **start the TTS API**: 252 | 253 | ```bash 254 | python TTS_api.py # This is now integrated into the main program but can be run separately for debugging. 255 | ``` 256 | 257 | 🎯 The TTS API module will run `webui.py` in the **conda environment**. Once successfully started, you can access the WebUI for voice synthesis management: 🌍 Default address: [http://localhost:8000](http://localhost:8000) 258 | 259 | ❗ Ensure the **TTS API is running properly**, or the program will not be able to generate speech. 260 | 261 | --- 262 | ### 📌 5.2. Run the Main Program 263 | 264 | Once the TTS API is started, run the main program: 265 | 266 | ```bash 267 | python main.py 268 | ``` 269 | 270 | 🎙 **Interaction Steps**: 271 | 272 | 1️⃣ **Press and hold the Ctrl key** to start recording, **press the Alt key** to stop recording. The voice will be automatically converted into text. 273 | 2️⃣ The **text is processed by the LLM module**, generating a response. 274 | 3️⃣ The **response text is converted into speech** via the TTS module, and the Live2D model will sync its lip movements to the speech. 275 | 276 | --- 277 | 278 | ### 📌 5.3. System Architecture Diagram 279 | 280 | | **Step** | **Module** | **Input** | **Processing** | **Output** | 281 | |----------|---------|---------|---------|---------| 282 | | 🎤 **User Speech** | **User** | Speech Input | User speaks | Audio Signal | 283 | | 🎙 **Speech Recognition** | **ASR (SenseVoice)** | Audio Signal | Speech-to-Text (STT) | Recognized Text | 284 | | 🤖 **Text Understanding & Generation** | **LLM (GPT-4 / DeepSeek)** | Recognized Text | Semantic Analysis & AI Response Generation | AI-Generated Text | 285 | | 🔊 **Speech Synthesis** | **TTS (CosyVoice)** | AI-Generated Text | Text-to-Speech (TTS) | Speech Data | 286 | | 🎭 **Live2D Animation** | **Live2D** | Speech Data | Motion Generation | Character Animation | 287 | | 🗣 **AI Voice Feedback** | **User** | Character Voice & Actions | User hears AI response | Voice & Visual Interaction | 288 | 289 | --- 290 | # 📂 6. Project Structure 291 | 292 | This project follows a modular design, integrating **ASR (speech recognition), TTS (text-to-speech), LLM (large language model), and Live2D animation rendering** as core functionalities. Below is the **complete project structure**: 293 | 294 | ```bash 295 | Live2D-LLM-Chat/ 296 | │── main.py # 🚀 Main program entry 297 | │── ASR.py # 🎙 Speech Recognition (ASR) module 298 | │── TTS.py # 🔊 Speech Synthesis (TTS) module 299 | │── TTS_api.py # 🌐 TTS API module 300 | │── LLM.py # 🤖 Large Language Model (LLM) module 301 | │── Live2d_animation.py # 🎭 Live2D animation management module 302 | │── webui.py # 🖥 WebUI for voice synthesis 303 | │── config.py # ⚙️ Configuration file 304 | │── requirements.txt # 📦 Dependency list 305 | └── README.md # 📄 Project documentation 306 | ``` 307 | --- 308 | ## 🚀 7. Project Development 309 | --- 310 | 311 | ### 📌 7.1. Past Developments 312 | 313 | #### 📅 **2025.01.28 - Initial Project Concept** 314 | - 🎯 **Core Goals Defined**: Developing a **Live2D + LLM** real-time interaction system. 315 | - 🔍 **Technology Research**: Investigating ASR (speech recognition), TTS (text-to-speech), and Live2D solutions. 316 | - ✅ **Core Components Selected**: 317 | - **SenseVoice** for ASR 318 | - **CosyVoice** for TTS 319 | - **live2d-py** for animation rendering 320 | 321 | #### 📅 **2025.02.28 - First Version Release** 322 | - 🎙 **Implemented speech input & recognition (ASR)** 323 | - 🤖 **Integrated LLM for text generation** 324 | - 🔊 **Generated speech output & synced Live2D mouth movements** 325 | 326 | --- 327 | 328 | ### 📌 7.2. Future Plans ~~(Wishlist)~~ 329 | 330 | 🔹 **LLM Module Optimization**: 331 | - Due to **device limitations**, local deployment may not match cloud-based models. **Improving LLM processing logic** to enhance stability. 332 | 333 | 🔹 **Refined Output Management**: 334 | - Optimizing **program logs and output messages** to retain only essential information for a cleaner display. 335 | 336 | 🔹 **Enhanced Live2D Interaction**: 337 | - **Improving Live2D model expressions and movements** to make interactions feel more natural and engaging. 338 | 339 | 🔹 **Additional Optimizations**: 340 | - 🛠 Improving TTS & ASR efficiency 341 | - 🌍 Expanding multilingual support 342 | - 🔗 Enhancing cloud-based inference capabilities 343 | 344 | --- 351 | ## 🤝 8. 
Contributions & Acknowledgments 352 | --- 353 | 354 | This project builds upon work from [SenseVoice](https://github.com/FunAudioLLM/SenseVoice), [CosyVoice](https://github.com/FunAudioLLM/CosyVoice), and [live2d-py](https://github.com/Arkueid/live2d-py), incorporating modifications and optimizations to fit the project’s requirements. 355 | 🎉 **Special thanks to the original developers!** 356 | 357 | 💡 **We welcome contributions and feedback!** 358 | 359 | 📢 If you have suggestions or improvements, please submit a **PR (Pull Request)** or **Issue** on GitHub. 360 | 361 | --- 362 | ## 📄 9. License 363 | This project is licensed under the [Apache-2.0 License](LICENSE). --------------------------------------------------------------------------------