├── .gitignore
├── None
│   └── text.0_0
├── README.md
├── README_EN.md
├── Windows_user_tutorial.md
├── asserts
│   ├── 2024-09-08-14-37-55.png
│   ├── 2024-09-08-14-39-00.png
│   ├── 2024-09-08-14-40-02.png
│   ├── 2024-09-08-16-43-48.png
│   ├── 2024-09-08-16-43-56.png
│   ├── 2024-09-08-16-44-01.png
│   └── 2024-09-08-16-44-07.png
├── config.json
├── data
│   └── database.xlsx
├── dialogs_output_202408201256.md
├── docker-compose.yml
├── env.template
├── img
│   ├── RAG_1.png
│   ├── RAG_2.png
│   ├── algo_long_response.png
│   ├── algo_long_screenShot.png
│   ├── excel.png
│   ├── fix_response.png
│   ├── fix_screenShot.png
│   ├── general_response.png
│   ├── general_screenShot.png
│   ├── image.png
│   ├── img.gif
│   └── web.gif
├── interview
│   ├── ChatgptManager.py
│   ├── Config.py
│   ├── DialogManager.py
│   ├── Interviewer.py
│   ├── MultimodeManager.py
│   ├── R2Uploader.py
│   ├── RagManager.py
│   ├── RequestSender.py
│   ├── Rookie.py
│   ├── SaveFile.py
│   ├── ScreenShot.py
│   ├── __init__.py
│   ├── audioTest.py
│   └── main.py
├── macOS_user_tutorial.md
├── poetry.md
├── prompt
│   ├── algorithm.md
│   ├── fix.md
│   ├── general.md
│   ├── long_screenshot.md
│   ├── ocr.md
│   └── personality.md
├── pyproject.toml
├── screenshots
│   ├── long_screenshot_20240829_212734.png
│   ├── screenshot_20240829_160848.png
│   ├── screenshot_20240829_213839.png
│   └── screenshot_20240829_225747.png
├── start_funasr_services.sh
├── tests
│   └── __init__.py
├── tmp
│   └── text.0_0
└── workflows
    ├── prompt_1
    │   ├── step1.md
    │   └── step2.md
    └── prompt_2
        └── step1.md

/.gitignore:
--------------------------------------------------------------------------------
1 | .env
2 | *.wav
3 | !.env.template
4 | # Byte-compiled / optimized / DLL files
5 | __pycache__/
6 | interview/__pycache__/
7 | *.py[cod]
8 | *$py.class
9 |
10 | # C extensions
11 | *.so
12 |
13 | screenshots/*
14 | !screenshots/long_screenshot_20240829_212734.png
15 | !screenshots/screenshot_20240829_160848.png
16 | !screenshots/screenshot_20240829_213839.png
17 | !screenshots/screenshot_20240829_225747.png
18 |
19 | dialogs_output*
20 | !dialogs_output_202408201256.md
21 |
22 | # Distribution / packaging
23 | .Python
24 | env/
25 | venv/
26 | ENV/
27 | env.bak/
28 | venv.bak/
29 | .pytest_cache/
30 |
31 | # Poetry specific
32 | poetry.lock
33 | .cache/
34 |
35 | # Installer logs
36 | pip-log.txt
37 | pip-delete-this-directory.txt
38 |
39 | # Unit test / coverage reports
40 | htmlcov/
41 | .tox/
42 | .nox/
43 | .coverage
44 | .coverage.*
45 | .cache
46 | nosetests.xml
47 | coverage.xml
48 | *.cover
49 | *.py,cover
50 | .hypothesis/
51 | .pytest_cache/
52 | .coverage/
53 |
54 | # MyPy
55 | .mypy_cache/
56 | .dmypy.json
57 | dmypy.json
58 |
59 | # Pyre type checker
60 | .pyre/
61 |
62 | # Jupyter Notebook
63 | .ipynb_checkpoints
64 |
65 | # pyenv
66 | .python-version
67 |
68 | # Celery stuff
69 | celerybeat-schedule
70 | celerybeat.pid
71 |
72 | # SageMath parsed files
73 | *.sage.py
74 |
75 | # Environments
76 | .env
77 | .env.*
78 | .venv
79 | venv/
80 | ENV/
81 |
82 | # Virtualenv
83 | venv/
84 | ENV/
85 |
86 | # Spyder project settings
87 | .spyderproject
88 | .spyproject
89 |
90 | # Rope project settings
91 | .ropeproject
92 |
93 | # IDEs
94 | .vscode/
95 | .idea/
96 | *.sublime-project
97 | *.sublime-workspace
98 |
99 | # MacOS-specific files
100 | .DS_Store
101 | .AppleDouble
102 | .LSOverride
103 |
104 | # Thumbnails
105 | ._*
106 |
107 | # Trashes
108 | .Trashes
109 |
110 | # macOS specific files
111 | .DS_Store
112 | Icon?
113 | ._*
114 | .Spotlight-V100
115 | .Trashes
116 | .DS_Store?
117 |
118 | # Linux-specific files
119 | *~
120 |
121 | # Windows-specific files
122 | Thumbs.db
123 | ehthumbs.db
124 | Desktop.ini
125 | $RECYCLE.BIN/
126 | *.cab
127 | *.msi
128 | *.msm
129 | *.msp
130 | *.lnk
131 |
--------------------------------------------------------------------------------
/None/text.0_0:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AowerDmax/Simple-Interview-Audio-Processing/4857f35a8356854b611e33952528ce7c834353df/None/text.0_0
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Simple Interview Audio Processing
2 |
3 | > **Disclaimer**: This project is a technical exercise only. Using it to cheat in interviews, or for any commercial purpose, is strictly forbidden. The author accepts no responsibility for any legal issues arising from its use. If this project causes you any trouble, please contact the author to have it removed.
4 |
5 | This is a simple audio-stream-based example project that demonstrates basic audio-processing features. It was inspired by the expensive paid tools sold online; this project is a plain implementation showing that such functionality does not require a complicated technology stack.
6 |
7 | ---
8 |
9 | [Windows deployment tutorial](Windows_user_tutorial.md)
10 |
11 | [macOS deployment tutorial](macOS_user_tutorial.md)
12 |
13 | ## Future features
14 |
15 | 1. Build a persona with `multimodal input` + a `database`, and use `Tampermonkey` to fully automate `personality tests`.
16 | 2. OCR accuracy on mathematical formulas is low, so GPT sometimes misreads the problem and answers nonsense.
17 | 3. The current answers sound far too GPT-flavored; I will crawl model solutions and interview write-ups as external RAG data, and improve the prompts.
18 | 4. GPT's accuracy is still too low; add cross-checking between Claude, GPT, and possibly other models.
19 | 5. Most important: simplify the deployment process and its difficulty, so that anyone can use this easily (considering a packaged user-facing UI).
20 | 6. If you like this project, would you give it a star? If there is a `Feature` you want, tell me in `Issues` or anywhere else.
21 |
22 | ## New Feature
23 |
24 | Added multimodal support and hotkey-triggered GPT calls.
25 |
26 | #### MacOS
27 |
28 | ##### Conversation query
29 |
30 | Press `Command (⌘) + Option/Alt (⌥) + h` to ask `GPT` about the current conversation.
31 |
32 | ##### Screenshot query
33 |
34 | Press `Command (⌘) + Option/Alt (⌥) + a` to query with the `algorithm Prompt`.
35 |
36 | Press `Command (⌘) + Option/Alt (⌥) + p` to query with the `personality Prompt`.
37 |
38 | Press `Command (⌘) + Option/Alt (⌥) + g` to query with the `general Prompt`.
39 |
40 | Press `Command (⌘) + Option/Alt (⌥) + l` to query with the `long_screenshot Prompt` (stitched screenshots).
41 |
42 | Press `Command (⌘) + Option/Alt (⌥) + f` to query with the `fix Prompt`.
43 |
44 | Press `Command (⌘) + Option/Alt (⌥) + o` to query with the `ocr Prompt`.
45 |
46 | #### Windows
47 |
48 | ##### Conversation query
49 |
50 | Press `Ctrl + Alt + h` to ask `GPT` about the current conversation.
51 |
52 | ##### Screenshot query
53 |
54 | Press `Ctrl + Alt + a` to query with the `algorithm Prompt`.
55 |
56 | Press `Ctrl + Alt + p` to query with the `personality Prompt`.
57 |
58 | Press `Ctrl + Alt + g` to query with the `general Prompt`.
59 |
60 | Press `Ctrl + Alt + l` to query with the `long_screenshot Prompt` (stitched screenshots).
61 |
62 | Press `Ctrl + Alt + f` to query with the `fix Prompt`.
63 |
64 | Press `Ctrl + Alt + o` to query with the `ocr Prompt`.
65 |
66 | ### General Q&A demo
67 |
68 | ![general_screenShot](img/general_screenShot.png)
69 | ![general_response](img/general_response.png)
70 |
71 | ### Long algorithm-question demo
72 |
73 | ![algo_long_screenShot](img/algo_long_screenShot.png)
74 | ![algo_long_response](img/algo_long_response.png)
75 |
76 | ### Code fixing
77 |
78 | ![fix_screenShot](img/fix_screenShot.png)
79 | ![fix_response](img/fix_response.png)
80 |
81 | ### Web demo
82 |
83 | The clip below: a local audio file is played to imitate the system audio saying `你为什么要使用消息队列呢?` ("Why do you use a message queue?").
84 |
85 | The `web` page picks up the sentence and streams it out.
86 |
87 | `ChatGPT (the LLM assistant)` streams out the answer to the question.
88 |
89 | Supports `local deployment` and `server deployment`. The `flex` layout displays well on `phones`, `tablets`, and `desktops`.
90 | ![Web](img/web.gif)
91 |
92 | [Repository for the web part](https://github.com/AowerDmax/websocket-redis)
93 |
94 | ---
95 |
96 | ### RAG demo
97 |
98 | The demo below: Q&A content you prepare in advance can be stored in the `RAG` database, and the `RAG` database is searched before `ChatGPT (the LLM assistant)` is asked.
99 |
100 | ![Rag_1](img/RAG_1.png)
101 |
102 | ![Rag_2](img/RAG_2.png)
103 |
104 | [Repository for the RAG part](https://github.com/AowerDmax/websocket-redis)
105 |
106 | #### How to use the RAG knowledge base
107 |
108 | 1. In the `.env` file, set `RAG_ENABLED=True`.
109 | 2. Put your knowledge-base `xlsx` files in the `data` folder; all `xlsx` files are found recursively.
110 | 3. File format: two columns, the first is `Q`, the second is `A`, holding the question and the answer. (You can build knowledge-base files from content exported by `FastGPT`.)
111 | 4. Restart the `docker` containers: `docker-compose restart`.
112 |
113 | ![Excel](img/excel.png)
114 |
115 | **Q:** Can I store my own project in the `RAG` knowledge base?
116 |
117 | **A:** For now you have to prepare it yourself: ask GPT, or write the questions by hand; alternatively, take one of the "AI code review" projects online, change its `prompt`, and have it generate a `RAG` knowledge base for your own project.
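For a sense of what the ingestion boils down to, here is a minimal Python sketch that loads the two-column Q/A workbooks into Meilisearch. The index name `rag`, the document field names, and the use of the `meilisearch` and `openpyxl` packages are assumptions for illustration; the real ingestion is done by the dockerized `websocket-redis` service and may differ.

```python
# Hypothetical sketch: load Q/A pairs from data/*.xlsx into Meilisearch.
# The index name "rag" and the document shape are assumptions; the real
# ingestion lives in the websocket-redis service.
from pathlib import Path

import meilisearch                   # pip install meilisearch
from openpyxl import load_workbook   # pip install openpyxl

client = meilisearch.Client("http://127.0.0.1:7700")  # MEILISEARCH_HOST/PORT from .env
index = client.index("rag")

docs, doc_id = [], 0
for xlsx in Path("data").rglob("*.xlsx"):  # recurse, matching the behavior described above
    sheet = load_workbook(xlsx, read_only=True).active
    for q, a in sheet.iter_rows(min_row=1, max_col=2, values_only=True):
        if q and a:  # skip empty rows
            docs.append({"id": doc_id, "question": str(q), "answer": str(a)})
            doc_id += 1

index.add_documents(docs)  # Meilisearch indexes asynchronously
print(f"queued {len(docs)} Q/A pairs for indexing")
```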
118 |
119 | ---
120 |
121 | ### Terminal demo
122 |
123 | The clip below: a Bilibili video is played to imitate the system audio; `Interview` picks up the `Redis`-related content.
124 |
125 | The `Rookie` user answers `我不知道` ("I don't know") into the microphone.
126 |
127 | `ChatGPT (the LLM assistant)` streams out the answer to the question.
128 |
129 | ![Demo](img/img.gif)
130 |
131 | ---
132 |
133 | ### Workflow demo
134 |
135 | The image below: `interview (system audio)` asks a question about `hashing`.
136 |
137 | `ChatGPT (the LLM assistant)` first gives a brief answer: hash tables, hash functions, hash collisions.
138 |
139 | It then answers each of the three points in targeted detail.
140 |
141 | `Streaming output` keeps the `output speed` up.
142 |
143 | ![Audio Configuration](img/image.png)
144 |
145 | ---
146 |
147 | ## Features
148 |
149 | - **Audio source handling**: reads system audio as the `interviewer` source and the microphone as the `Rookie` source, cleanly separating input and output.
150 | - **Streaming output**: streams the `interviewer`, `Rookie`, and `ChatGPT` sides of the dialog.
151 | - **Custom settings**: configurable dialog depth and printout; control the maximum number of records kept for `interviewer`, `Rookie`, and `ChatGPT`.
152 | - **Prompt workflow**: processes all files in the `prompt` folder in a preset workflow order. The current workflow replies with a quick summary first, then answers each point in targeted detail.
153 | - **Saving dialog records**: run `python interview/SaveFile.py` to save the dialog records as a Markdown file (a sketch of the export follows this list).
154 | - **openai-style API support**: `ChatGPT`, `Oaipro`, `Deepseek`, `通义千问`, and any API converted to the `openai` format through `newApi` or `OneApi`.
155 | - **Web display**: streams the `Interview Dialog` to a web page; supports `local` and `server` deployment; the `flex` layout displays well on `phones`, `tablets`, and `desktops`.
156 | - **External RAG knowledge base**: store your prepared questions and answers in a `RAG` knowledge base; before `ChatGPT` is asked, related entries are retrieved from `RAG` as supporting data, and the `number of supporting entries` is configurable.
157 | - **Graceful shutdown**: when the `program` exits, a `dialogs_output_YYMMDDHHMM.md` file is generated automatically.
158 |
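The export feature needs very little machinery, since the dialog lists live in Redis (see `interview/DialogManager.py`). A simplified sketch of what such an export involves; the Redis key names match the code, while the exact output formatting here is illustrative:

```python
# Simplified sketch of a SaveFile-style export: read the three Redis dialog
# lists and dump them into dialogs_output_YYYYMMDDHHMM.md. Key names match
# DialogManager; the output formatting is illustrative.
import json
import time

import redis

r = redis.StrictRedis(host="localhost", port=6379, decode_responses=True)

entries = []
for key in ("dialog_manager:interviewer",
            "dialog_manager:rookie",
            "dialog_manager:chatgpt"):
    entries += [json.loads(item) for item in r.lrange(key, 0, -1)]

entries.sort(key=lambda e: e["time"])  # entries carry "[HH:MM:SS]" timestamps

filename = time.strftime("dialogs_output_%Y%m%d%H%M.md")
with open(filename, "w", encoding="utf-8") as f:
    for e in entries:
        f.write(f"{e['time']} {e['list_name']} :\n\n{e['text']}\n\n")
print(f"saved {len(entries)} entries to {filename}")
```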
159 | ## Installation
160 |
161 | ### Recommended installation
162 |
163 | #### 1. Clone the repository
164 |
165 | ```bash
166 | git clone https://github.com/AowerDmax/Simple-Interview-Audio-Processing.git
167 |
168 | cd Simple-Interview-Audio-Processing
169 | ```
170 |
171 | #### 2. Start the services
172 |
173 | Start the services with `docker-compose`:
174 |
175 | ```bash
176 | docker-compose up -d
177 | ```
178 |
179 | #### 3. Set up the docker containers
180 |
181 | ```bash
182 | sudo chmod +x start_funasr_services.sh
183 | ./start_funasr_services.sh
184 | ```
185 |
186 | #### 4. Install the dependencies
187 |
188 | If you are new to `Poetry`, here is a short introduction: [Poetry primer](poetry.md)
189 |
190 | Install the dependencies with Poetry:
191 |
192 | ```bash
193 | poetry install
194 | ```
195 |
196 | Enter the virtual environment:
197 |
198 | ```bash
199 | poetry shell
200 | ```
201 |
202 | #### 5. Configure the `.env` file
203 |
204 | Copy the template file and adjust it as needed:
205 |
206 | ```bash
207 | cp env.template .env
208 | ```
209 |
210 | Focus on `AGGREGATE_DEVICE_INDEX`, `MIC_DEVICE_INDEX`, `RAG_ENABLED`, and the GPT `baseurl` and `API` settings.
211 |
212 | `MEILISEARCH_DEEP` sets how many supporting entries the `RAG` search returns.
213 |
214 | `ROOKIE_DIALOG_LEN`, `CHATGPT_DIALOG_LEN`, and `INTERVIEWER_DIALOG_LEN` set how many messages of each kind are shown in the `terminal`; the dialog history passed to `GPT` follows the same limits.
215 |
216 | #### 6. Run the project
217 |
218 | Run the main program:
219 |
220 | ```bash
221 | python interview/main.py
222 | ```
223 |
224 | ### Manual installation
225 |
226 | #### 1. Start the services
227 |
228 | Start the services with `docker-compose`:
229 |
230 | ```bash
231 | docker-compose up -d
232 | ```
233 |
234 | #### FunASR Interview service
235 |
236 | Enter the Docker container:
237 |
238 | ```bash
239 | docker-compose exec funasr_interview bash
240 | ```
241 |
242 | Inside the Docker container, start the service with:
243 |
244 | ```bash
245 | cd FunASR/runtime
246 | nohup bash run_server_2pass.sh \
247 | --model-dir damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-onnx \
248 | --online-model-dir damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online-onnx \
249 | --vad-dir damo/speech_fsmn_vad_zh-cn-16k-common-onnx \
250 | --punc-dir damo/punc_ct-transformer_zh-cn-common-vad_realtime-vocab272727-onnx \
251 | --lm-dir damo/speech_ngram_lm_zh-cn-ai-wesp-fst \
252 | --itn-dir thuduj12/fst_itn_zh \
253 | --certfile 0 \
254 | --hotword ../../hotwords.txt > log.txt 2>&1 &
255 | ```
256 |
257 | #### FunASR Rookie service
258 |
259 | Enter the Docker container:
260 |
261 | ```bash
262 | docker-compose exec funasr_rookie bash
263 | ```
264 |
265 | Inside the Docker container, start the service with:
266 |
267 | ```bash
268 | cd FunASR/runtime
269 | nohup bash run_server_2pass.sh \
270 | --model-dir damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-onnx \
271 | --online-model-dir damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online-onnx \
272 | --vad-dir damo/speech_fsmn_vad_zh-cn-16k-common-onnx \
273 | --punc-dir damo/punc_ct-transformer_zh-cn-common-vad_realtime-vocab272727-onnx \
274 | --lm-dir damo/speech_ngram_lm_zh-cn-ai-wesp-fst \
275 | --itn-dir thuduj12/fst_itn_zh \
276 | --certfile 0 \
277 | --hotword ../../hotwords.txt > log.txt 2>&1 &
278 | ```
279 |
280 | #### 2. Install the dependencies
281 |
282 | Install the dependencies with Poetry:
283 |
284 | ```bash
285 | poetry install
286 | ```
287 |
288 | Enter the virtual environment:
289 |
290 | ```bash
291 | poetry shell
292 | ```
293 |
294 | #### 3. Configure the `.env` file
295 |
296 | Copy the template file and adjust it as needed:
297 |
298 | ```bash
299 | cp env.template .env
300 | ```
301 |
302 | Focus on `AGGREGATE_DEVICE_INDEX`, `MIC_DEVICE_INDEX`, and the GPT `baseurl` and `API` settings.
303 |
304 | #### 4. Run the project
305 |
306 | Run the main program:
307 |
308 | ```bash
309 | python interview/main.py
310 | ```
311 |
312 | ## Audio Configuration on macOS and Windows
313 |
314 | ### Capturing audio with BlackHole (macOS)
315 |
316 | On macOS, BlackHole is a virtual audio driver that routes audio seamlessly between applications. Configuration steps:
317 |
318 | 1. **Create an Aggregate Device**:
319 |
320 | - Open the **Audio MIDI Setup** application.
321 | - Create an Aggregate Device that combines **BlackHole 16ch** and your Bluetooth headset.
322 | - Make sure **BlackHole 16ch** is the output device and the Bluetooth headset is the input device.
323 |
324 | 2. **Create a Multi-Output Device**:
325 |
326 | - Create a Multi-Output Device with **BlackHole 16ch** and the Bluetooth headset as outputs.
327 | - Set the Multi-Output Device as the system default output device.
328 |
329 | 3. **Run the audio test**:
330 | - Use `python interview/audioTest.py` to list all audio input/output devices, and make sure the selected output rate is 16K (a minimal device-listing sketch follows).
331 |
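`audioTest.py` itself is not reproduced in this document, but device discovery with PyAudio amounts to only a few lines. A minimal sketch (the real script also records short test clips):

```python
# Minimal device-listing sketch, roughly what interview/audioTest.py starts
# with: enumerate PyAudio devices so you can pick the right values for
# AGGREGATE_DEVICE_INDEX and MIC_DEVICE_INDEX in .env.
import pyaudio

pa = pyaudio.PyAudio()
for i in range(pa.get_device_count()):
    info = pa.get_device_info_by_index(i)
    print(f"Device {i}: {info['name']}")
    print(f"  inputs={info['maxInputChannels']}  outputs={info['maxOutputChannels']}"
          f"  default rate={info['defaultSampleRate']}")
pa.terminate()
```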
332 | ### Capturing audio on Windows
333 |
334 | On Windows you can use similar virtual audio devices, such as **VB-CABLE Virtual Audio Device** or **VoiceMeeter**, to get the same kind of capture that BlackHole provides on macOS. Steps for VB-CABLE:
335 |
336 | 1. **Install VB-CABLE Virtual Audio Device**:
337 |
338 | - Go to the [VB-Audio website](https://vb-audio.com/Cable/) and download the VB-CABLE installer.
339 | - Install VB-CABLE Virtual Audio Device; after installation it appears as a virtual audio device on your system.
340 |
341 | 2. **Configure the audio devices**:
342 |
343 | - Open the **Sound control panel** and go to the **Playback** and **Recording** tabs.
344 | - In **Playback**, set `VB-CABLE Input` as the default playback device; this captures the system audio.
345 | - In **Recording**, select `VB-CABLE Output` as the default recording device so that applications can read the system audio.
346 | - If you also need to capture the microphone, route the microphone into `VB-CABLE Output`, or use VoiceMeeter for more advanced setups.
347 |
348 | 3. **Run the audio test**:
349 |
350 | - Use `python interview/audioTest.py` to list all audio input/output devices and pick suitable recording and playback devices on Windows.
351 | - Make sure the selected devices use a 16K sample rate, for compatibility with the ASR models.
352 |
353 | With these steps you can capture audio for this project on both macOS and Windows.
354 |
355 | ---
356 |
357 | ## changelog
358 |
359 | - 2024.8.29: added multimodal support; GPT is now triggered by hotkeys
360 | - 2024.8.20: web frontend; external knowledge base
361 | - 2024.8.16: ASR speech recognition and GPT querying complete
362 |
363 | Thanks for using this project! If you have any questions or suggestions, feel free to reach out.
364 |
--------------------------------------------------------------------------------
/README_EN.md:
--------------------------------------------------------------------------------
1 | # Simple Interview Audio Processing
2 |
3 | > **Disclaimer**: This project is intended solely for technical practice and should not be used for cheating in interviews or any commercial purposes. The author is not responsible for any legal issues arising from the use of this project. If this project causes any inconvenience, please contact the author for removal.
4 |
5 | This is a simple, audio-stream-based example project designed to demonstrate basic audio processing capabilities. The inspiration for this project comes from the high-priced paid projects available online. This project aims to show that such functionalities do not require complex technical stacks.
6 |
7 | ![Demo](img/img.gif)
8 |
9 | ---
10 |
11 | ![Audio Configuration](img/image.png)
12 |
13 | ## Features
14 |
15 | - **Audio Source Processing**: Capture system audio as the `interviewer` source and microphone audio as the `Rookie` source, accurately distinguishing between input and output.
16 | - **Streaming Output**: Supports streaming dialogue output for `interviewer`, `Rookie`, and `ChatGPT`.
17 | - **Custom Settings**: Customize dialogue depth and print content, and control the maximum number of dialogues for `interviewer`, `Rookie`, and `ChatGPT`.
18 | - **Prompt Workflow**: Processes all files in the `prompt` folder according to a preset workflow order.
19 | - **Save Dialogue Records**: Save dialogue records as Markdown files by running `python interview/SaveFile.py`.
20 |
21 | ## Installation
22 |
23 | ### 1. Start the Services
24 |
25 | Use `docker-compose` to start the services:
26 |
27 | ```bash
28 | docker-compose up -d
29 | ```
30 |
31 | #### FunASR Interview Service
32 |
33 | Enter the Docker container:
34 |
35 | ```bash
36 | docker-compose exec funasr_interview bash
37 | ```
38 |
39 | Inside the Docker container, run the following command to start the service:
40 |
41 | ```bash
42 | cd FunASR/runtime
43 | nohup bash run_server_2pass.sh \
44 | --model-dir damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-onnx \
45 | --online-model-dir damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online-onnx \
46 | --vad-dir damo/speech_fsmn_vad_zh-cn-16k-common-onnx \
47 | --punc-dir damo/punc_ct-transformer_zh-cn-common-vad_realtime-vocab272727-onnx \
48 | --lm-dir damo/speech_ngram_lm_zh-cn-ai-wesp-fst \
49 | --itn-dir thuduj12/fst_itn_zh \
50 | --certfile 0 \
51 | --hotword ../../hotwords.txt > log.txt 2>&1 &
52 | ```
53 |
54 | #### FunASR Rookie Service
55 |
56 | Enter the Docker container:
57 |
58 | ```bash
59 | docker-compose exec funasr_rookie bash
60 | ```
61 |
62 | Inside the Docker container, run the following command to start the service:
63 |
64 | ```bash
65 | cd FunASR/runtime
66 | nohup bash run_server_2pass.sh \
67 | --model-dir damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-onnx \
68 | --online-model-dir damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online-onnx \
69 | --vad-dir damo/speech_fsmn_vad_zh-cn-16k-common-onnx \
70 | --punc-dir damo/punc_ct-transformer_zh-cn-common-vad_realtime-vocab272727-onnx \
71 | --lm-dir damo/speech_ngram_lm_zh-cn-ai-wesp-fst \
72 | --itn-dir thuduj12/fst_itn_zh \
73 | --certfile 0 \
74 | --hotword ../../hotwords.txt > log.txt 2>&1 &
75 | ```
76 |
77 | ### 2. Install Dependencies
78 |
79 | Use Poetry to install dependencies:
80 |
81 | ```bash
82 | poetry install
83 | ```
84 |
85 | Enter the virtual environment:
86 |
87 | ```bash
88 | poetry shell
89 | ```
90 |
91 | ### 3. Configure the `.env` File
92 |
93 | Copy the template file and modify it as needed:
94 |
95 | ```bash
96 | cp env.template .env
97 | ```
98 |
99 | Make sure to adjust `AGGREGATE_DEVICE_INDEX`, `MIC_DEVICE_INDEX`, and the GPT `baseurl` and `API` settings.
100 |
101 | ### 4. Run the Project
102 |
103 | Run the main program:
104 |
105 | ```bash
106 | python interview/main.py
107 | ```
108 |
109 | ## Audio Configuration on macOS and Windows
110 |
111 | ### Using BlackHole for Audio Capture (macOS)
112 |
113 | On macOS, BlackHole is a virtual audio driver that allows seamless audio routing between applications. Here's how to set it up:
114 |
115 | 1. **Configure an Aggregate Device**:
116 |
117 | - Open the **Audio MIDI Setup** application.
118 | - Create an Aggregate Device, selecting **BlackHole 2ch** and your Bluetooth headset.
119 | - Ensure that **BlackHole 2ch** is used as the output device and the Bluetooth headset as the input device.
120 |
121 | 2. **Configure a Multi-Output Device**:
122 |
123 | - Create a Multi-Output Device, selecting **BlackHole 2ch** and the Bluetooth headset as output devices.
124 | - Set the Multi-Output Device as the system's default output device.
125 |
126 | 3. **Run Audio Test**:
127 | - Use `python interview/audioTest.py` to list all audio input and output devices, and ensure that the sample rate is set to 16K (if your hardware cannot, see the conversion sketch below).
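The 16K requirement comes from the FunASR models. If your device only offers 44.1K/48K stereo, that is fine: the client downmixes and resamples before streaming. A stripped-down version of the conversion that `interview/Interviewer.py` performs:

```python
# Stripped-down version of the conversion done in interview/Interviewer.py:
# int16 frames at the device's channel count and rate -> mono int16 at 16 kHz
# for FunASR.
import numpy as np

def to_mono_16k(raw: bytes, channels: int, rate: int, target_rate: int = 16000) -> bytes:
    samples = np.frombuffer(raw, dtype=np.int16)
    mono = samples.reshape(-1, channels).mean(axis=1)  # average the channels
    n_out = int(len(mono) * target_rate / rate)        # linear-interpolation resample
    resampled = np.interp(np.linspace(0, len(mono), n_out),
                          np.arange(len(mono)), mono)
    return resampled.astype(np.int16).tobytes()
```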
128 |
129 | ### Audio Capture on Windows
130 |
131 | On Windows, you can use similar virtual audio devices like **VB-CABLE Virtual Audio Device** or **VoiceMeeter** to achieve functionality similar to BlackHole on macOS. Here's how to use VB-CABLE:
132 |
133 | 1. **Install VB-CABLE Virtual Audio Device**:
134 |
135 | - Visit the [VB-Audio official website](https://vb-audio.com/Cable/) and download the VB-CABLE installer.
136 | - Install VB-CABLE Virtual Audio Device, which will then appear as a virtual audio device on your system.
137 |
138 | 2. **Configure Audio Devices**:
139 |
140 | - Open the **Sound Control Panel** and navigate to the **Playback** and **Recording** tabs.
141 | - In the **Playback** tab, set `VB-CABLE Input` as the default playback device to capture system audio.
142 | - In the **Recording** tab, set `VB-CABLE Output` as the default recording device to allow applications to capture system audio input.
143 | - If you also need to capture microphone audio, set the microphone as an input to `VB-CABLE Output`, or configure advanced settings with VoiceMeeter.
144 |
145 | 3. **Run Audio Test**:
146 |
147 | - Use `python interview/audioTest.py` to list all audio input and output devices on Windows, ensuring that the selected devices are configured correctly for recording and playback.
148 | - Ensure the sample rate is set to 16K to be compatible with the ASR model.
149 |
150 | With these steps, you can easily set up audio capture on both macOS and Windows for use in your project.
151 |
152 | ---
153 |
154 | Thank you for using this project! If you have any questions or suggestions, please feel free to reach out.
155 |
--------------------------------------------------------------------------------
/Windows_user_tutorial.md:
--------------------------------------------------------------------------------
1 | # Windows User tutorial
2 |
3 | This article explains in detail how to deploy this tool on `Windows`.
4 |
5 | The tool has two main parts: `screenshot GPT queries` and `voice-transcript GPT queries`. The good news is that the two modules are completely independent, so if you only need one of them, just follow the matching section of this tutorial.
6 |
7 | First, make sure `Poetry` is installed locally. If it is not, install it with the following command (`in PowerShell`):
8 |
9 | ```bash
10 | (Invoke-WebRequest -Uri https://install.python-poetry.org -UseBasicParsing).Content | py -
11 | ```
12 |
13 | Add `Poetry` to the system environment variables.
14 |
15 | After installation, make sure `Poetry` is on your system path; verify the installation with:
16 |
17 | ```bash
18 | poetry --version
19 | ```
20 |
21 | This prints Poetry's version, confirming the installation succeeded.
22 |
23 | Clone this project and enter the project root directory:
24 |
25 | ```bash
26 | git clone https://github.com/AowerDmax/Simple-Interview-Audio-Processing.git
27 | cd Simple-Interview-Audio-Processing
28 | ```
29 |
30 | In the project root, run the following command to install all project dependencies:
31 |
32 | ```bash
33 | poetry install
34 | ```
35 |
36 | Enter the virtual environment:
37 |
38 | ```bash
39 | poetry shell
40 | ```
41 |
42 | Copy the `.env` file:
43 |
44 | ```
45 | cp env.template .env
46 | ```
47 |
48 | **Note:** you need to run `poetry shell` every time you enter this project; after that, `python xxx.py` runs `python` files inside the virtual environment.
49 |
50 | Start the `docker` containers:
51 |
52 | ```
53 | docker-compose up -d
54 | # If later steps fail, the problem may be here; troubleshoot with the commands below
55 | docker-compose down
56 | docker-compose up
57 | # Check the logs to see which service failed to start
58 | docker-compose restart
59 | docker-compose restart <服务名>
60 | ```
61 |
62 | ## Voice-transcript GPT queries
63 |
64 | This part needs:
65 |
66 | 1. An `api` key and `Base_url` in `openai` format (`deepseek`, `oaipro`, `通义千问`); see the request sketch after this list for what this format looks like on the wire
67 | 2. Alibaba's `funASR` for real-time speech recognition
68 | 3. [Virtual audio cable](https://pan.baidu.com/s/1yWoZ7dErbjsdQjI5Dtj5-w) (extraction code: ib9p) for separating and sampling the audio.
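`interview/RequestSender.py` is the project's real client and is not reproduced in this document; the sketch below is only a hypothetical minimal call showing what "openai format" concretely means. The URL, key, and model are the placeholder values from `.env` (`CHATGPT_BASE_URL`, `AUTHORIZATION`, `MODEL`).

```python
# Hypothetical minimal "openai-format" chat request, using the .env values.
# The project's real client (interview/RequestSender.py) streams responses;
# this sketch only illustrates the wire format.
import requests

BASE_URL = "https://api.oaipro.com/v1/chat/completions"  # CHATGPT_BASE_URL
API_KEY = "sk-xxxxx"                                     # AUTHORIZATION

payload = {
    "model": "gpt-4o",                                   # MODEL
    "messages": [{"role": "user", "content": "Why use a message queue?"}],
    "stream": False,
}
resp = requests.post(BASE_URL,
                     headers={"Authorization": f"Bearer {API_KEY}"},
                     json=payload, timeout=60)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```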
69 |
70 | #### Virtual audio cable setup
71 |
72 | ##### Download and install
73 |
74 | [Virtual audio cable](https://pan.baidu.com/s/1yWoZ7dErbjsdQjI5Dtj5-w) (extraction code: ib9p)
75 |
76 | Install the version matching your operating system's architecture.
77 |
78 | ##### Configure Virtual audio cable
79 |
80 | In the sound settings, do the following:
81 |
82 | ![2024-09-08-16-43-48](asserts/2024-09-08-16-43-48.png)
83 | ![2024-09-08-16-43-56](asserts/2024-09-08-16-43-56.png)
84 | ![2024-09-08-16-44-01](asserts/2024-09-08-16-44-01.png)
85 |
86 | Find `Audio Repeater` in `Programs (win)`:
87 |
88 | ![2024-09-08-16-44-07](asserts/2024-09-08-16-44-07.png)
89 |
90 | Now test that your speakers can play sound and that your microphone input works.
91 |
92 | #### Test the environment
93 |
94 | Use `python interview/audioTest.py` to list all audio input and output devices.
95 |
96 | ```
97 | ❯ python interview/audioTest.py
98 | Device 0: Microsoft Sound Mapper - Input
99 | Device 1: 麦克风 (Realtek(R) Audio)
100 | Device 2: CABLE Output (VB-Audio Virtual
101 | Device 3: Line 1 (Virtual Audio Cable)
102 | Device 4: Microsoft Sound Mapper - Output
103 | Device 5: Line 1 (Virtual Audio Cable)
104 | Device 6: CABLE Input (VB-Audio Virtual C
105 | Device 7: 扬声器 (Realtek(R) Audio)
106 | Device 8: 主声音捕获驱动程序
107 | Device 9: 麦克风 (Realtek(R) Audio)
108 | Device 10: CABLE Output (VB-Audio Virtual Cable)
109 | Device 11: Line 1 (Virtual Audio Cable)
110 | Device 12: 主声音驱动程序
111 | Device 13: Line 1 (Virtual Audio Cable)
112 | Device 14: CABLE Input (VB-Audio Virtual Cable)
113 | Device 15: 扬声器 (Realtek(R) Audio)
114 | Device 16: Line 1 (Virtual Audio Cable)
115 | Device 17: CABLE Input (VB-Audio Virtual Cable)
116 | Device 18: 扬声器 (Realtek(R) Audio)
117 | Device 19: CABLE Output (VB-Audio Virtual Cable)
118 | Device 20: Line 1 (Virtual Audio Cable)
119 | Device 21: 麦克风 (Realtek(R) Audio)
120 | Device 22: 立体声混音 (Realtek HD Audio Stereo input)
121 | Device 23: 麦克风 (Realtek HD Audio Mic input)
122 | Device 24: Speakers (Realtek HD Audio output)
123 | Device 25: Mic 1 (Virtual Cable 1)
124 | Device 26: Line Out (Virtual Cable 1)
125 | Device 27: CABLE Output (VB-Audio Point)
126 | Device 28: Speakers (VB-Audio Point)
127 | Device 1:
128 | Name: 麦克风 (Realtek(R) Audio)
129 | Max Input Channels: 2
130 | Max Output Channels: 0
131 | Default Sample Rate: 44100.0
132 | Device 3:
133 | Name: Line 1 (Virtual Audio Cable)
134 | Max Input Channels: 8
135 | Max Output Channels: 0
136 | Default Sample Rate: 44100.0
137 | Recording...
138 | Recording finished and saved to files.
139 | ```
140 |
141 | Here I pick `Device 1` as the microphone input and `Device 3` as the system-audio input.
142 |
143 | The first run usually fails; from the listed audio input devices, pick the matching microphone input and system-audio input.
144 | After the script runs, `system_audio.wav` and `mic_audio.wav` are created in the project root, recording the `system audio` and the `microphone audio` respectively. (Note: the script reads both inputs while it runs, so make sure both actually receive signal.)
145 | Play these files to check that your setup works; the sketch below shows the same check done by hand.
146 |
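If you want to reproduce that check manually, the core of it is two input streams and two WAV writers. A sketch; the durations, rates, and the device indices (1 = mic, 3 = Line 1) are assumptions taken from the listing above and will differ on your machine:

```python
# Sketch of the recording check: capture a few seconds from the two selected
# devices and write mic_audio.wav / system_audio.wav. Duration, rate, channel
# count, and the device indices are illustrative assumptions.
import wave

import pyaudio

def record(device_index: int, filename: str, seconds: int = 5,
           rate: int = 44100, channels: int = 2) -> None:
    pa = pyaudio.PyAudio()
    stream = pa.open(format=pyaudio.paInt16, channels=channels, rate=rate,
                     input=True, input_device_index=device_index,
                     frames_per_buffer=1024)
    frames = [stream.read(1024) for _ in range(int(rate / 1024 * seconds))]
    stream.stop_stream()
    stream.close()
    with wave.open(filename, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(pa.get_sample_size(pyaudio.paInt16))  # 2 bytes for int16
        wf.setframerate(rate)
        wf.writeframes(b"".join(frames))
    pa.terminate()

record(1, "mic_audio.wav")      # Device 1: 麦克风 (Realtek(R) Audio)
record(3, "system_audio.wav")   # Device 3: Line 1 (Virtual Audio Cable)
```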
147 | #### Configure the .env file
148 |
149 | In the `.env` file:
150 |
151 | Focus on `AGGREGATE_DEVICE_INDEX`, `MIC_DEVICE_INDEX`, `RAG_ENABLED`, and the GPT `baseurl` and `API` settings.
152 |
153 | `MEILISEARCH_DEEP` sets how many supporting entries the `RAG` search returns.
154 |
155 | `ROOKIE_DIALOG_LEN`, `CHATGPT_DIALOG_LEN`, and `INTERVIEWER_DIALOG_LEN` set how many messages of each kind are shown in the `terminal`; the dialog history passed to `GPT` follows the same limits.
156 |
157 | #### Configure docker
158 |
159 | ##### FunASR Interview service
160 |
161 | Enter the Docker container:
162 |
163 | ```bash
164 | docker-compose exec funasr_interview bash
165 | ```
166 |
167 | Inside the Docker container, start the service with:
168 |
169 | ```bash
170 | cd FunASR/runtime
171 |
172 |
173 | nohup bash run_server_2pass.sh \
174 | --model-dir damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-onnx \
175 | --online-model-dir damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online-onnx \
176 | --vad-dir damo/speech_fsmn_vad_zh-cn-16k-common-onnx \
177 | --punc-dir damo/punc_ct-transformer_zh-cn-common-vad_realtime-vocab272727-onnx \
178 | --lm-dir damo/speech_ngram_lm_zh-cn-ai-wesp-fst \
179 | --itn-dir thuduj12/fst_itn_zh \
180 | --certfile 0 \
181 | --hotword ../../hotwords.txt > log.txt 2>&1 &
182 | ```
183 |
184 | ##### FunASR Rookie service
185 |
186 | Enter the Docker container:
187 |
188 | ```bash
189 | docker-compose exec funasr_rookie bash
190 | ```
191 |
192 | Inside the Docker container, start the service with:
193 |
194 | ```bash
195 | cd FunASR/runtime
196 |
197 |
198 | nohup bash run_server_2pass.sh \
199 | --model-dir damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-onnx \
200 | --online-model-dir damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online-onnx \
201 | --vad-dir damo/speech_fsmn_vad_zh-cn-16k-common-onnx \
202 | --punc-dir damo/punc_ct-transformer_zh-cn-common-vad_realtime-vocab272727-onnx \
203 | --lm-dir damo/speech_ngram_lm_zh-cn-ai-wesp-fst \
204 | --itn-dir thuduj12/fst_itn_zh \
205 | --certfile 0 \
206 | --hotword ../../hotwords.txt > log.txt 2>&1 &
207 | ```
208 |
209 | #### Run
210 |
211 | ```
212 | python interview/main.py
213 | ```
214 |
215 | Press `Ctrl + Alt + h` to query `GPT`.
216 |
217 | ## Screenshot GPT queries
218 |
219 | This part needs:
220 |
221 | 1. An `api` key and `Base_url` from `openai` (or an official relay). (`deepseek` and the other text-only models will not work here; `multimodal` support is required.)
222 | 2. An `OSS` such as `CloudFlare`'s `R2` (used as an image host; free image hosts may be supported later). **Note: it must not be an OSS that only allows access from mainland China; ChatGPT in the US must be able to reach it.**
223 |
224 | [R2 setup tutorial (found online)](https://blog.huacai.one/post/3)
225 |
226 | In the `.env` file:
227 |
228 | ```
229 | # R2 Access Key ID
230 | # This is your R2 account's Access Key ID, used to authenticate API requests.
231 | # R2 Secret Access Key
232 | # This is your R2 account's Secret Access Key, used along with the Access Key ID to authenticate API requests.
233 | # R2 Bucket Name
234 | # The name of the R2 bucket you wish to interact with. This is where your data will be stored.
235 | # R2 Region
236 | # The region where your R2 bucket is located. For example, 'us-west-1'.
237 | # R2 Endpoint URL
238 | # The endpoint URL for accessing the R2 service. This typically includes the protocol and the domain name.
239 | # For example: https://your-endpoint-url.com
240 | # R2 Custom Domain
241 | # If you have a custom domain configured for accessing your R2 bucket, specify it here. It can be the same as the R2 Endpoint URL.
242 | R2_ACCESS_KEY_ID=your-access-key-id-here
243 | R2_SECRET_ACCESS_KEY=your-secret-access-key-here
244 | R2_BUCKET_NAME=your-bucket-name-here
245 | R2_REGION=your-region-here
246 | R2_ENDPOINT_URL=https://your-endpoint-url-here
247 | R2_CUSTOM_DOMAIN=your-custom-domain-here
248 | ```
249 |
250 | ### Usage
251 |
252 | ```
253 | python interview/main.py
254 | ```
255 |
256 | Press `Ctrl + Alt + a` to query with the `algorithm Prompt`.
257 |
258 | Press `Ctrl + Alt + p` to query with the `personality Prompt`.
259 |
260 | Press `Ctrl + Alt + g` to query with the `general Prompt`.
261 |
262 | Press `Ctrl + Alt + l` to query with the `long_screenshot Prompt` (stitched screenshots).
263 |
264 | Press `Ctrl + Alt + f` to query with the `fix Prompt`.
265 |
266 | Press `Ctrl + Alt + o` to query with the `ocr Prompt`.
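The shortcuts above are defined in `config.json`. A hypothetical sketch of wiring such a config to global hotkeys, assuming the `pynput` library; the project's real listener (presumably in `interview/main.py`) is not reproduced in this document:

```python
# Hypothetical global-hotkey wiring for the shortcuts in config.json,
# assuming the pynput library. The real handlers take screenshots and
# call GPT; here we only log the trigger.
import json

from pynput import keyboard

with open("config.json", encoding="utf-8") as f:
    shortcuts = json.load(f)["shortcuts"]

def make_handler(name):
    def handler():
        print(f"shortcut triggered: {name}")
    return handler

# config.json stores key lists such as ["<ctrl>", "<alt>", "a"];
# pynput's GlobalHotKeys expects the joined form "<ctrl>+<alt>+a".
hotkeys = {"+".join(keys): make_handler(name) for name, keys in shortcuts.items()}

with keyboard.GlobalHotKeys(hotkeys) as listener:
    listener.join()
```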
267 |
268 | ## QA
269 |
270 | Some problems you may run into, and their fixes:
271 |
272 | ### FunASR
273 |
274 | This is the `real-time speech-to-text` service provided by `Alibaba`, but it only supports mono audio at a 16k sample rate. Many computers do not support that configuration, so I `resample`; just make sure your `.env` contains the correct values.
275 | Test with `python interview/audioTest.py`.
276 |
277 | **Q:** Error: 2024-08-17 23:50:04,377 - modelscope - WARNING - Downloading: TLG.fst failed, reason: ('Connection broken: IncompleteRead(1164614 bytes read, 166607546 more expected)', IncompleteRead(1164614 bytes read, 166607546 more expected)) will retry
278 |
279 | **A:** `modelScope`'s download service hit a problem, likely because you are on a network outside China, which makes the `modelScope` model downloads fail. If this happens, in the `docker` step
280 | use the following command:
281 |
282 | ```bash
283 | docker-compose exec <service-name> bash
284 | ```
285 |
286 | Inside the Docker container, start the service with:
287 |
288 | ```bash
289 | cd FunASR/runtime
290 |
291 |
292 | bash run_server_2pass.sh \
293 | --model-dir damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-onnx \
294 | --online-model-dir damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online-onnx \
295 | --vad-dir damo/speech_fsmn_vad_zh-cn-16k-common-onnx \
296 | --punc-dir damo/punc_ct-transformer_zh-cn-common-vad_realtime-vocab272727-onnx \
297 | --lm-dir damo/speech_ngram_lm_zh-cn-ai-wesp-fst \
298 | --itn-dir thuduj12/fst_itn_zh \
299 | --certfile 0 \
300 | --hotword ../../hotwords.txt
301 | ```
302 |
303 | Look at the concrete error and network-failure messages and fix accordingly.
304 |
305 | ### websocket
306 |
307 | **Q:** The websocket connection fails, or the web page stays unreachable for a long time.
308 |
309 | **A:** Check whether the docker containers and the local machine are in different countries; that causes this. Deploy on a LAN, or within the same region, if you can.
310 |
--------------------------------------------------------------------------------
/asserts/2024-09-08-14-37-55.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AowerDmax/Simple-Interview-Audio-Processing/4857f35a8356854b611e33952528ce7c834353df/asserts/2024-09-08-14-37-55.png
--------------------------------------------------------------------------------
/asserts/2024-09-08-14-39-00.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AowerDmax/Simple-Interview-Audio-Processing/4857f35a8356854b611e33952528ce7c834353df/asserts/2024-09-08-14-39-00.png
--------------------------------------------------------------------------------
/asserts/2024-09-08-14-40-02.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AowerDmax/Simple-Interview-Audio-Processing/4857f35a8356854b611e33952528ce7c834353df/asserts/2024-09-08-14-40-02.png
--------------------------------------------------------------------------------
/asserts/2024-09-08-16-43-48.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AowerDmax/Simple-Interview-Audio-Processing/4857f35a8356854b611e33952528ce7c834353df/asserts/2024-09-08-16-43-48.png
--------------------------------------------------------------------------------
/asserts/2024-09-08-16-43-56.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AowerDmax/Simple-Interview-Audio-Processing/4857f35a8356854b611e33952528ce7c834353df/asserts/2024-09-08-16-43-56.png
--------------------------------------------------------------------------------
/asserts/2024-09-08-16-44-01.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AowerDmax/Simple-Interview-Audio-Processing/4857f35a8356854b611e33952528ce7c834353df/asserts/2024-09-08-16-44-01.png
--------------------------------------------------------------------------------
/asserts/2024-09-08-16-44-07.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AowerDmax/Simple-Interview-Audio-Processing/4857f35a8356854b611e33952528ce7c834353df/asserts/2024-09-08-16-44-07.png
--------------------------------------------------------------------------------
/config.json:
--------------------------------------------------------------------------------
1 | {
2 |   "shortcuts": {
3 |     "algorithm": ["<ctrl>", "<alt>", "a"],
4 |     "personality": ["<ctrl>", "<alt>", "p"],
5 |     "general": ["<ctrl>", "<alt>", "g"],
6 |     "long_screenshot": ["<ctrl>", "<alt>", "l"],
7 |     "help": ["<ctrl>", "<alt>", "h"],
8 |     "fix": ["<ctrl>", "<alt>", "f"],
9 |     "ocr": ["<ctrl>", "<alt>", "o"],
10 |     "exit": ["<ctrl>", "c"]
11 |   },
12 |   "save_dir": "./screenshots",
13 |   "scroll_delay": 1.5,
14 |   "scroll_amount": 17,
15 |   "max_screenshots": 3
16 | }
--------------------------------------------------------------------------------
/data/database.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AowerDmax/Simple-Interview-Audio-Processing/4857f35a8356854b611e33952528ce7c834353df/data/database.xlsx
--------------------------------------------------------------------------------
/dialogs_output_202408201256.md:
--------------------------------------------------------------------------------
1 | [12:55:40] 🎤 interviewer :
2 |
3 | 你为什么要使用redis呢
4 |
5 | [12:55:48] 🤖 chatgpt :
6 |
7 | ```markdown
8 | 为了提高多进程间通信的效率和速度。
9 | ```
10 |
11 | [12:56:15] 🤖 chatgpt :
12 |
13 | 在技术面试中,当被问及为何选择使用Redis时,我的回答会包含以下几个方面:
14 |
15 | ### 1. Redis的基本特性
16 | Redis(Remote Dictionary Server)是一个开源的内存数据库,它以键值对的形式存储数据,并支持多种数据结构,如字符串、哈希、列表、集合、有序集合等。Redis的主要特点是数据存储在内存中,这使得其访问速度非常快,通常可以达到微秒级别的响应时间。
17 |
18 | ### 2. 多进程间通信的需求
19 | 在现代应用开发中,多进程架构是常见的设计模式,特别是在需要处理大量并发请求的场景下。多进程间通信(IPC)是确保这些进程能够高效、可靠地交换信息的关键。传统的IPC方法,如文件、管道、消息队列等,虽然可行,但在性能和实时性上往往不能满足高并发系统的需求。
20 |
21 | ### 3. Redis在多进程间通信中的应用
22 | Redis因其高性能和丰富的数据结构,非常适合作为多进程间通信的中间件。例如,在一个分布式系统中,不同的服务进程可能需要共享某些状态或数据。使用Redis,这些进程可以快速地读写共享数据,而无需担心数据一致性和并发控制的问题。
23 |
24 | ### 4. 具体应用场景举例
25 | 假设我们有一个在线游戏服务器,它需要处理成千上万的玩家请求。为了提高处理能力,服务器采用了多进程架构,每个进程负责处理一部分玩家请求。在这种情况下,Redis可以用来存储玩家的实时状态(如位置、分数等),这样即使玩家请求被分配到不同的进程处理,也能保证状态的一致性和实时更新。
26 |
27 | ### 5. Redis的优缺点
28 | **优点:**
29 | - **高性能:** 内存存储,快速读写。
30 | - **丰富的数据结构:** 支持多种数据类型,便于处理复杂的数据交互。
31 | - **原子操作:** 所有操作都是原子的,适合并发环境。
32 | - **持久化:** 支持数据持久化,防止数据丢失。
33 |
34 | **缺点:**
35 | - **内存限制:** 由于数据存储在内存中,因此存储容量受限于物理内存大小。
36 | - **成本:** 内存数据库的成本通常高于磁盘数据库。
37 |
38 | ### 6. 
结论 39 | 综上所述,Redis因其高性能和丰富的数据结构,在多进程间通信中扮演了重要角色。虽然存在一些限制,但在大多数高并发、实时性要求高的应用场景中,Redis都是一个优秀的选择。在实际应用中,我们需要根据具体的业务需求和资源情况,权衡利弊,做出合适的技术选型。 -------------------------------------------------------------------------------- /docker-compose.yml: -------------------------------------------------------------------------------- 1 | version: "3.8" 2 | 3 | services: 4 | app: 5 | image: ghcr.io/aowerdmax/websocket-redis/websocket_redis:latest 6 | ports: 7 | - "${GO_APP_PORT}:${GO_APP_PORT}" 8 | volumes: 9 | - ./data:/app/data 10 | - ./.env:/app/.env 11 | depends_on: 12 | - meilisearch 13 | - redis 14 | env_file: 15 | - .env 16 | environment: 17 | MEILISEARCH_HOST: "meilisearch" 18 | MEILISEARCH_PORT: 7700 19 | REDIS_HOST: "redis" 20 | REDIS_PORT: 6379 21 | restart: on-failure 22 | networks: 23 | - app_network 24 | 25 | meilisearch: 26 | image: getmeili/meilisearch:latest 27 | ports: 28 | - "${MEILISEARCH_PORT}:7700" 29 | environment: 30 | MEILI_ENV: "development" 31 | networks: 32 | - app_network 33 | 34 | funasr_interview: 35 | image: registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-online-cpu-0.1.10 36 | container_name: funasr_interview 37 | ports: 38 | - "${INTERVIEWER_PORT}:10095" 39 | volumes: 40 | - ./funasr-runtime-resources/models_interview:/workspace/models 41 | privileged: true 42 | stdin_open: true 43 | tty: true 44 | networks: 45 | - app_network 46 | 47 | funasr_rookie: 48 | image: registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-online-cpu-0.1.10 49 | container_name: funasr_rookie 50 | ports: 51 | - "${ROOKIE_PORT}:10095" 52 | volumes: 53 | - ./funasr-runtime-resources/models_rookie:/workspace/models 54 | privileged: true 55 | stdin_open: true 56 | tty: true 57 | networks: 58 | - app_network 59 | 60 | redis: 61 | image: redis:latest 62 | ports: 63 | - "${REDIS_PORT}:6379" 64 | networks: 65 | - app_network 66 | command: ["redis-server", "--appendonly", "yes"] 67 | 68 | networks: 69 | app_network: 70 | -------------------------------------------------------------------------------- /env.template: -------------------------------------------------------------------------------- 1 | CHATGPT_BASE_URL=https://api.oaipro.com/v1/chat/completions 2 | AUTHORIZATION=sk-xxxxx 3 | MODEL=gpt-4o 4 | SSL=0 5 | MODE=2pass 6 | ROOKIE_HOST=localhost 7 | ROOKIE_PORT=10097 8 | INTERVIEWER_HOST=localhost 9 | INTERVIEWER_PORT=10096 10 | AUDIO_FS=48000 11 | AUDIO_CHANNEL=2 12 | INTERVIEWER_WAIT_TIME=5000 13 | ROOKIE_WAIT_TIME=5000 14 | ROOKIE_DIALOG_LEN=1 15 | CHATGPT_DIALOG_LEN=4 16 | INTERVIEWER_DIALOG_LEN=5 17 | REDIS_HOST=localhost 18 | REDIS_PORT=6379 19 | AGGREGATE_DEVICE_INDEX=12 20 | MIC_DEVICE_INDEX=2 21 | MEILISEARCH_HOST=127.0.0.1 22 | MEILISEARCH_PORT=7700 23 | MEILISEARCH_DEEP=1 24 | RAG_ENABLED=True 25 | INTERVAL_TIME=500 26 | DATA_QUEUE_KEYS=dialog_manager:chatgpt,dialog_manager:interviewer,dialog_manager:rookie 27 | GO_APP_HOST=0.0.0.0 28 | GO_APP_PORT=8080 29 | WS_HOST=127.0.0.1 30 | WS_PORT=8080 31 | 32 | # R2 Access Key ID 33 | # This is your R2 account's Access Key ID, used to authenticate API requests. 34 | # R2 Secret Access Key 35 | # This is your R2 account's Secret Access Key, used along with the Access Key ID to authenticate API requests. 36 | # R2 Bucket Name 37 | # The name of the R2 bucket you wish to interact with. This is where your data will be stored. 38 | # R2 Region 39 | # The region where your R2 bucket is located. For example, 'us-west-1'. 40 | # R2 Endpoint URL 41 | # The endpoint URL for accessing the R2 service. 
This typically includes the protocol and the domain name.
42 | # For example: https://your-endpoint-url.com
43 | # R2 Custom Domain
44 | # If you have a custom domain configured for accessing your R2 bucket, specify it here. It can be the same as the R2 Endpoint URL.
45 | R2_ACCESS_KEY_ID=your-access-key-id-here
46 | R2_SECRET_ACCESS_KEY=your-secret-access-key-here
47 | R2_BUCKET_NAME=your-bucket-name-here
48 | R2_REGION=your-region-here
49 | R2_ENDPOINT_URL=https://your-endpoint-url-here
50 | R2_CUSTOM_DOMAIN=your-custom-domain-here
--------------------------------------------------------------------------------
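A hypothetical upload sketch for the R2 variables above: R2 speaks the S3 API, so `boto3` works against it when pointed at the R2 endpoint. The project's real implementation is `interview/R2Uploader.py`, which is not reproduced in this document; the file names and placeholder values here are illustrative.

```python
# Hypothetical R2 upload via boto3 (R2 is S3-compatible). Placeholder values
# mirror env.template; the real code lives in interview/R2Uploader.py.
import boto3

s3 = boto3.client(
    "s3",
    aws_access_key_id="your-access-key-id-here",          # R2_ACCESS_KEY_ID
    aws_secret_access_key="your-secret-access-key-here",  # R2_SECRET_ACCESS_KEY
    region_name="your-region-here",                       # R2_REGION
    endpoint_url="https://your-endpoint-url-here",        # R2_ENDPOINT_URL
)
s3.upload_file("screenshots/screenshot.png", "your-bucket-name-here",
               "screenshot.png", ExtraArgs={"ContentType": "image/png"})

# The public URL passed to the multimodal model is then the custom domain
# (R2_CUSTOM_DOMAIN) plus the object key.
print("https://your-custom-domain-here/screenshot.png")
```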
/img/RAG_1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AowerDmax/Simple-Interview-Audio-Processing/4857f35a8356854b611e33952528ce7c834353df/img/RAG_1.png
--------------------------------------------------------------------------------
/img/RAG_2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AowerDmax/Simple-Interview-Audio-Processing/4857f35a8356854b611e33952528ce7c834353df/img/RAG_2.png
--------------------------------------------------------------------------------
/img/algo_long_response.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AowerDmax/Simple-Interview-Audio-Processing/4857f35a8356854b611e33952528ce7c834353df/img/algo_long_response.png
--------------------------------------------------------------------------------
/img/algo_long_screenShot.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AowerDmax/Simple-Interview-Audio-Processing/4857f35a8356854b611e33952528ce7c834353df/img/algo_long_screenShot.png
--------------------------------------------------------------------------------
/img/excel.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AowerDmax/Simple-Interview-Audio-Processing/4857f35a8356854b611e33952528ce7c834353df/img/excel.png
--------------------------------------------------------------------------------
/img/fix_response.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AowerDmax/Simple-Interview-Audio-Processing/4857f35a8356854b611e33952528ce7c834353df/img/fix_response.png
--------------------------------------------------------------------------------
/img/fix_screenShot.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AowerDmax/Simple-Interview-Audio-Processing/4857f35a8356854b611e33952528ce7c834353df/img/fix_screenShot.png
--------------------------------------------------------------------------------
/img/general_response.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AowerDmax/Simple-Interview-Audio-Processing/4857f35a8356854b611e33952528ce7c834353df/img/general_response.png
--------------------------------------------------------------------------------
/img/general_screenShot.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AowerDmax/Simple-Interview-Audio-Processing/4857f35a8356854b611e33952528ce7c834353df/img/general_screenShot.png
--------------------------------------------------------------------------------
/img/image.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AowerDmax/Simple-Interview-Audio-Processing/4857f35a8356854b611e33952528ce7c834353df/img/image.png
--------------------------------------------------------------------------------
/img/img.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AowerDmax/Simple-Interview-Audio-Processing/4857f35a8356854b611e33952528ce7c834353df/img/img.gif
--------------------------------------------------------------------------------
/img/web.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AowerDmax/Simple-Interview-Audio-Processing/4857f35a8356854b611e33952528ce7c834353df/img/web.gif
--------------------------------------------------------------------------------
/interview/ChatgptManager.py:
--------------------------------------------------------------------------------
1 | import os
2 | import time
3 | import asyncio
4 | from DialogManager import DialogManager
5 | from RequestSender import RequestSender
6 | from Config import Config
7 | from RagManager import RagManager
8 |
9 | class ChatgptManager:
10 |     def __init__(self):
11 |         self.dialog = DialogManager()
12 |         self.request_sender = RequestSender()
13 |         self.prompt_directory = "./workflows/prompt_2"
14 |         self.last_processed_id = 3  # id of the newest interviewer entry already handled
15 |         self.rag = RagManager()
16 |
17 |     async def run_workflow(self):
18 |         files = sorted(os.listdir(self.prompt_directory))
19 |         content_list = []
20 |
21 |         for file_name in files:
22 |             file_path = os.path.join(self.prompt_directory, file_name)
23 |             if file_name.endswith('.md') and os.path.isfile(file_path):
24 |                 with open(file_path, 'r', encoding='utf-8') as file:
25 |                     content = file.read()
26 |
27 |                 content = await self._replace_content(content)
28 |
29 |                 content_list.append({
30 |                     "content": content,
31 |                     "role": "user"
32 |                 })
33 |
34 |                 receiveData = await self.request_sender.send_post_request(content_list)
35 |                 content_list.append({
36 |                     "content": receiveData,
37 |                     "role": "assistant"
38 |                 })
39 |
40 |     async def _replace_content(self, content):
41 |         content = content.replace("${conversation}", self.dialog.get_dialog())
42 |         if Config.RAG_ENABLED:
43 |             rag_answers = await self.rag.query_meilisearch()
44 |             if rag_answers:
45 |                 RAG_content = "\n下面是辅助数据:\n" + "\n".join(rag_answers)
46 |                 content += RAG_content
47 |         return content
48 |
49 |     def should_process(self):
50 |         interviewer_list = self.dialog._get_list(self.dialog.interviewer_key)
51 |
52 |         if not interviewer_list:
53 |             return False
54 |
55 |         last_entry = interviewer_list[-1]
56 |         if last_entry['id'] > self.last_processed_id:
57 |             if last_entry['text'].strip() == "":  # an empty entry marks the end of an utterance (Interviewer adds one on silence)
58 |                 self.last_processed_id = last_entry['id']
59 |                 return True
60 |
61 |         return False
62 |
63 | def chatgpt_process():
64 |     chatgpt_manager = ChatgptManager()
65 |     while True:
66 |         if chatgpt_manager.should_process():
67 |             asyncio.run(chatgpt_manager.run_workflow())
68 |
69 |         time.sleep(Config.INTERVIEWER_WAIT_TIME / 1000)
70 |
71 | if __name__ == "__main__":
72 |     chatgpt_process()
--------------------------------------------------------------------------------
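`ChatgptManager.run_workflow` walks `workflows/prompt_2/*.md` in filename order and substitutes `${conversation}` before each request. The repository's actual step files are not shown in this document; a hypothetical `step1.md` illustrating the expected shape:

```markdown
<!-- Hypothetical workflows/prompt_2/step1.md; the real step files may differ. -->
You are an interview assistant. Below is the transcript so far:

${conversation}

Give a one-paragraph answer to the interviewer's latest question.
```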
/interview/Config.py:
--------------------------------------------------------------------------------
1 | from dotenv import load_dotenv, find_dotenv
2 | import os
3 |
4 | dotenv_path = find_dotenv()
5 | if dotenv_path:
6 |     print(f"Loading .env file from: {dotenv_path}")
7 |     load_dotenv(dotenv_path)
8 | else:
9 |     print("No .env file found")
10 |
11 | class Config:
12 |     ROOKIE_HOST = os.getenv("ROOKIE_HOST", "localhost")
13 |     ROOKIE_PORT = int(os.getenv("ROOKIE_PORT", 10095))
14 |     INTERVIEWER_HOST = os.getenv("INTERVIEWER_HOST", "localhost")
15 |     INTERVIEWER_PORT = int(os.getenv("INTERVIEWER_PORT", 10095))
16 |     CHUNK_SIZE = [int(x) for x in os.getenv("CHUNK_SIZE", "5,10,5").split(",")]
17 |     CHUNK_INTERVAL = int(os.getenv("CHUNK_INTERVAL", 10))
18 |     HOTWORD = os.getenv("HOTWORD", "")
19 |     AUDIO_IN = os.getenv("AUDIO_IN", None)
20 |     AUDIO_FS = int(os.getenv("AUDIO_FS", 48000))
21 |     AUDIO_CHANNEL = int(os.getenv("AUDIO_CHANNEL", 2))
22 |     SEND_WITHOUT_SLEEP = os.getenv("SEND_WITHOUT_SLEEP", "True").lower() in ("true", "1", "yes")
23 |     THREAD_NUM = int(os.getenv("THREAD_NUM", 1))
24 |     WORDS_MAX_PRINT = int(os.getenv("WORDS_MAX_PRINT", 10000))
25 |     OUTPUT_DIR = os.getenv("OUTPUT_DIR", None)
26 |     SSL = int(os.getenv("SSL", 1))  # integer: 1 enables SSL, 0 disables it
27 |     USE_ITN = int(os.getenv("USE_ITN", 1))  # integer: 1 uses ITN, 0 does not
28 |     MODE = os.getenv("MODE", "2pass")
29 |     INTERVIEWER_WAIT_TIME = int(os.getenv("INTERVIEWER_WAIT_TIME", 5000))
30 |     ROOKIE_WAIT_TIME = int(os.getenv("ROOKIE_WAIT_TIME", 5000))
31 |     ROOKIE_DIALOG_LEN = int(os.getenv("ROOKIE_DIALOG_LEN", 1))
32 |     CHATGPT_DIALOG_LEN = int(os.getenv("CHATGPT_DIALOG_LEN", 2))
33 |     INTERVIEWER_DIALOG_LEN = int(os.getenv("INTERVIEWER_DIALOG_LEN", 3))
34 |     REDIS_HOST = os.getenv("REDIS_HOST", "127.0.0.1")
35 |     REDIS_PORT = int(os.getenv("REDIS_PORT", 6379))
36 |     AGGREGATE_DEVICE_INDEX = int(os.getenv("AGGREGATE_DEVICE_INDEX", 13))
37 |     MIC_DEVICE_INDEX = int(os.getenv("MIC_DEVICE_INDEX", 2))
38 |     MEILISEARCH_HOST = os.getenv("MEILISEARCH_HOST", "127.0.0.1")
39 |     MEILISEARCH_PORT = int(os.getenv("MEILISEARCH_PORT", 7700))
40 |     MEILISEARCH_DEEP = int(os.getenv("MEILISEARCH_DEEP", 1))
41 |     RAG_ENABLED = os.getenv("RAG_ENABLED", "False").lower() in ("true", "1", "yes")
42 |     R2_ACCESS_KEY_ID = os.getenv('R2_ACCESS_KEY_ID')
43 |     R2_SECRET_ACCESS_KEY = os.getenv('R2_SECRET_ACCESS_KEY')
44 |     R2_BUCKET_NAME = os.getenv('R2_BUCKET_NAME')
45 |     R2_REGION = os.getenv('R2_REGION')
46 |     R2_ENDPOINT_URL = os.getenv('R2_ENDPOINT_URL')
47 |     R2_CUSTOM_DOMAIN = os.getenv('R2_CUSTOM_DOMAIN')
48 |
--------------------------------------------------------------------------------
/interview/DialogManager.py:
--------------------------------------------------------------------------------
1 | import os
2 | import time
3 | import random
4 | import redis
5 | import json
6 | from multiprocessing import Process
7 | from Config import Config
8 | from redis.exceptions import LockNotOwnedError
9 | import platform
10 |
11 |
12 | class DialogManager:
13 |     _instance = None
14 |
15 |     def __new__(cls, *args, **kwargs):
16 |         if not cls._instance:
17 |             cls._instance = super(DialogManager, cls).__new__(cls)
18 |             cls._instance._initialize()
19 |         return cls._instance
20 |
21 |     def _initialize(self):
22 |         self.redis = redis.StrictRedis(host=Config.REDIS_HOST, port=Config.REDIS_PORT, decode_responses=True)
23 |
24 |         self.interviewer_key = "dialog_manager:interviewer"
25 |         self.rookie_key = "dialog_manager:rookie"
26 |         self.chatgpt_key = "dialog_manager:chatgpt"
27 |         self.next_id = "dialog_manager:next_id"
28 |         self.system = platform.system()
29 |
30 |         self.interviewer_icons = ["🎤"]
31 |         self.rookie_icons = ["😅"]
32 |         self.chatgpt_icons = ["🤖"]
33 |
34 |     def _get_next_id(self):
35 |         return self.redis.incr("dialog_manager:next_id")
36 |
37 |     def _get_icon(self, list_name):
38 |         if list_name == "interviewer":
39 |             return random.choice(self.interviewer_icons)
40 |         elif list_name == "rookie":
41 |             return random.choice(self.rookie_icons)
42 |         elif list_name == "chatgpt":
43 |             return random.choice(self.chatgpt_icons)
44 |
45 |     def _get_valid_entries(self, key, max_len):
46 |         entries = self._get_list(key)
47 |         valid_entries = [item for item in entries if item['text'].strip() != ""]
48 |
49 |         if len(valid_entries) > max_len:
50 |             return valid_entries[-max_len:]
51 |         else:
52 |             return valid_entries
53 |
54 |     def _notify_change(self, out=True):
55 |         interviewer_list = self._get_valid_entries(self.interviewer_key, Config.INTERVIEWER_DIALOG_LEN)
56 |         rookie_list = self._get_valid_entries(self.rookie_key, Config.ROOKIE_DIALOG_LEN)
57 |         chatgpt_list = self._get_valid_entries(self.chatgpt_key, Config.CHATGPT_DIALOG_LEN)
58 |
59 |         combined_list = interviewer_list + rookie_list + chatgpt_list
60 |
61 |         sorted_combined_list = sorted(combined_list, key=lambda x: x['time'])
62 |
63 |         output = []
64 |         for entry in sorted_combined_list:
65 |             icon = self._get_icon(entry['list_name'])
66 |             output.append(f"{entry['time']} {icon} {entry['list_name']} : {entry['text']}")
67 |
68 |         result = "\n".join(output)
69 |         if out:
70 |             if self.system == "Windows":
71 |                 os.system('cls')
72 |             elif self.system == "Darwin" or self.system == "Linux":
73 |                 os.system('clear')
74 |             print(result)
75 |         return result
76 |
77 |     def _get_list(self, key):
78 |         return [json.loads(item) for item in self.redis.lrange(key, 0, -1)]
79 |
80 |     def _add_to_list(self, key, element):
81 |         lock = self.redis.lock("dialog_manager_lock", timeout=10, blocking_timeout=5)  # serialize writers across processes
82 |         try:
83 |             lock.acquire()
84 |             element['id'] = self._get_next_id()
85 |             self.redis.rpush(key, json.dumps(element))
86 |             self._notify_change()
87 |         finally:
88 |             try:
89 |                 lock.release()
90 |             except LockNotOwnedError:
91 |                 pass
92 |
93 |     def _update_last_in_list(self, key, element):
94 |         lock = self.redis.lock("dialog_manager_lock", timeout=10, blocking_timeout=5)
95 |         try:
96 |             lock.acquire()
97 |             current_list = self._get_list(key)
98 |             if current_list:
99 |                 element['id'] = current_list[-1]['id']
100 |                 self.redis.lset(key, -1, json.dumps(element))
101 |             else:
102 |                 element['id'] = self._get_next_id()
103 |                 self.redis.rpush(key, json.dumps(element))
104 |             self._notify_change()
105 |         finally:
106 |             try:
107 |                 lock.release()
108 |             except LockNotOwnedError:
109 |                 pass
110 |
111 |     def get_last_interviewer(self):
112 |         interviewer_list = self._get_valid_entries(self.interviewer_key, 1)
113 |         return interviewer_list[-1] if interviewer_list else None  # None instead of IndexError when empty
114 |
115 |     def get_last_rookie(self):
116 |         rookie_list = self._get_valid_entries(self.rookie_key, 1)
117 |         if rookie_list:
118 |             return rookie_list[-1]
119 |         return None
120 |
121 |     def get_dialog(self):
122 |         interviewer_list = self._get_valid_entries(self.interviewer_key, Config.INTERVIEWER_DIALOG_LEN)
123 |         rookie_list = self._get_valid_entries(self.rookie_key, Config.ROOKIE_DIALOG_LEN)
124 |
125 |         combined_list = interviewer_list + rookie_list
126 |
127 |         sorted_combined_list = sorted(combined_list, key=lambda x: x['time'])
128 |
129 |         output = []
130 |         for entry in sorted_combined_list:
131 |             icon = self._get_icon(entry['list_name'])
132 |             output.append(f"{entry['time']} {icon} {entry['list_name']} : {entry['text']}")
133 |
134 |         result = "\n".join(output)
135 |         return result
136 |
137 |     def add_to_interviewer(self, element):
138 |         current_time = time.strftime("[%H:%M:%S]", time.localtime())
139 |         entry = {'time': current_time, 'list_name': 'interviewer', 'text': element}
140 | 
self._add_to_list(self.interviewer_key, entry) 141 | 142 | def add_to_rookie(self, element): 143 | current_time = time.strftime("[%H:%M:%S]", time.localtime()) 144 | entry = {'time': current_time, 'list_name': 'rookie', 'text': element} 145 | self._add_to_list(self.rookie_key, entry) 146 | 147 | def add_to_chatgpt(self, element): 148 | current_time = time.strftime("[%H:%M:%S]", time.localtime()) 149 | entry = {'time': current_time, 'list_name': 'chatgpt', 'text': element} 150 | self._add_to_list(self.chatgpt_key, entry) 151 | 152 | def update_last_interviewer(self, element): 153 | current_time = time.strftime("[%H:%M:%S]", time.localtime()) 154 | entry = {'time': current_time, 'list_name': 'interviewer', 'text': element} 155 | self._update_last_in_list(self.interviewer_key, entry) 156 | 157 | def update_last_rookie(self, element): 158 | current_time = time.strftime("[%H:%M:%S]", time.localtime()) 159 | entry = {'time': current_time, 'list_name': 'rookie', 'text': element} 160 | self._update_last_in_list(self.rookie_key, entry) 161 | 162 | def update_last_chatgpt(self, element): 163 | current_time = time.strftime("[%H:%M:%S]", time.localtime()) 164 | entry = {'time': current_time, 'list_name': 'chatgpt', 'text': element} 165 | self._update_last_in_list(self.chatgpt_key, entry) 166 | 167 | def clear_all(self): 168 | lock = self.redis.lock("dialog_manager_lock", timeout=10, blocking_timeout=5) 169 | try: 170 | lock.acquire() 171 | self.redis.delete(self.interviewer_key) 172 | self.redis.delete(self.rookie_key) 173 | self.redis.delete(self.chatgpt_key) 174 | self.redis.delete(self.next_id) 175 | print("All keys have been cleared.") 176 | finally: 177 | try: 178 | lock.release() 179 | except LockNotOwnedError: 180 | pass 181 | 182 | 183 | def process_1(): 184 | dialog_manager = DialogManager() 185 | dialog_manager.add_to_interviewer("Question 1") 186 | time.sleep(1) 187 | dialog_manager.add_to_rookie("Answer 1") 188 | time.sleep(1) 189 | dialog_manager.add_to_chatgpt("Suggestion 1") 190 | time.sleep(1) 191 | 192 | def process_2(): 193 | dialog_manager = DialogManager() 194 | dialog_manager.add_to_interviewer("Question 2") 195 | time.sleep(1) 196 | dialog_manager.add_to_rookie("Answer 2") 197 | time.sleep(1) 198 | dialog_manager.add_to_rookie("Answer 3") 199 | time.sleep(1) 200 | dialog_manager.add_to_chatgpt("Suggestion 2") 201 | time.sleep(1) 202 | 203 | def process_3(): 204 | dialog_manager = DialogManager() 205 | dialog_manager.update_last_interviewer("Updated Question 2") 206 | dialog_manager.update_last_rookie("Updated Answer 5") 207 | dialog_manager.update_last_chatgpt("Updated Suggestion 2") 208 | time.sleep(1) 209 | dialog_manager.update_last_interviewer("Question 3 (new)") 210 | dialog_manager.update_last_rookie("Answer 4 (new)") 211 | dialog_manager.update_last_chatgpt("Suggestion 3 (new)") 212 | 213 | if __name__ == "__main__": 214 | p1 = Process(target=process_1) 215 | p2 = Process(target=process_2) 216 | p3 = Process(target=process_3) 217 | 218 | p1.start() 219 | p2.start() 220 | p3.start() 221 | 222 | p1.join() 223 | p2.join() 224 | p3.join() 225 | 226 | print('end') 227 | -------------------------------------------------------------------------------- /interview/Interviewer.py: -------------------------------------------------------------------------------- 1 | import pyaudio 2 | import json 3 | import asyncio 4 | import numpy as np 5 | import websockets 6 | from Config import Config 7 | import os 8 | import time 9 | from queue import Queue 10 | from multiprocessing import 
Process 11 | from DialogManager import DialogManager 12 | from ChatgptManager import ChatgptManager 13 | import multiprocessing 14 | 15 | class Interview: 16 | def __init__(self): 17 | self.voices = [] 18 | self.offline_msg_done = False 19 | self.websocket = None 20 | self.text_print_2pass_offline = "" 21 | self.text_print_2pass_online = "" 22 | self.dialog = DialogManager() 23 | self.chatgpt = ChatgptManager() 24 | 25 | 26 | async def ws_client(self, id="Interview", chunk_begin=0, chunk_size=1): 27 | while True: 28 | for i in range(chunk_begin, chunk_begin + chunk_size): 29 | self.offline_msg_done = False 30 | self.voices = Queue() 31 | 32 | uri = f"ws://{Config.INTERVIEWER_HOST}:{Config.INTERVIEWER_PORT}" 33 | ssl_context = None 34 | 35 | print("Connecting to", uri) 36 | try: 37 | async with websockets.connect(uri, subprotocols=["binary"], ping_interval=None, ssl=ssl_context) as self.websocket: 38 | task1 = asyncio.create_task(self.record_system_voice()) 39 | task2 = asyncio.create_task(self.message(id)) 40 | await asyncio.gather(task1, task2) 41 | except websockets.exceptions.ConnectionClosedError as e: 42 | print(f"WebSocket connection closed with error: {e}") 43 | await asyncio.sleep(3) 44 | except Exception as e: 45 | print(f"Interviewer error occurred: {e}") 46 | import traceback 47 | traceback.print_exc() 48 | await asyncio.sleep(3) 49 | 50 | async def record_system_voice(self): 51 | FORMAT = pyaudio.paInt16 52 | CHANNELS = Config.AUDIO_CHANNEL 53 | RATE = Config.AUDIO_FS 54 | chunk_size = 60 * Config.CHUNK_SIZE[1] / Config.CHUNK_INTERVAL 55 | CHUNK = int(RATE / 1000 * chunk_size) 56 | audio = pyaudio.PyAudio() 57 | TARGET_RATE = 16000 58 | TARGET_CHANNELS = 1 59 | 60 | 61 | while True: 62 | try: 63 | audio = pyaudio.PyAudio() 64 | system_stream = audio.open(format=FORMAT, 65 | channels=CHANNELS, 66 | rate=RATE, 67 | input=True, 68 | input_device_index=Config.AGGREGATE_DEVICE_INDEX, 69 | frames_per_buffer=CHUNK) 70 | break 71 | except OSError as e: 72 | print(f"Error opening audio stream: {e}") 73 | except Exception as e: 74 | print(f"pyaudio error occurred: {e}") 75 | 76 | fst_dict, hotword_msg = self.prepare_hotword_message() 77 | 78 | use_itn = Config.USE_ITN == 1 79 | message = json.dumps({ 80 | "mode": Config.MODE, 81 | "chunk_size": Config.CHUNK_SIZE, 82 | "chunk_interval": Config.CHUNK_INTERVAL, 83 | "wav_name": "system_voice", 84 | "is_speaking": True, 85 | "hotwords": hotword_msg, 86 | "itn": use_itn 87 | }) 88 | 89 | await self.websocket.send(message) 90 | 91 | def convert_to_mono(data, channels): 92 | audio_data = np.frombuffer(data, dtype=np.int16) 93 | mono_data = audio_data.reshape((-1, channels)).mean(axis=1) 94 | return mono_data.astype(np.int16) 95 | 96 | def resample_audio(data, original_rate, target_rate): 97 | audio_data = np.frombuffer(data, dtype=np.int16) 98 | resampled_data = np.interp(np.linspace(0, len(audio_data), int(len(audio_data) * target_rate / original_rate)), 99 | np.arange(len(audio_data)), audio_data) 100 | return resampled_data.astype(np.int16) 101 | 102 | while True: 103 | data = system_stream.read(CHUNK) 104 | 105 | try: 106 | if CHANNELS == TARGET_CHANNELS and RATE == TARGET_RATE: 107 | await self.websocket.send(data) 108 | else: 109 | mono_data = convert_to_mono(data, CHANNELS) 110 | 111 | resampled_data = resample_audio(mono_data, RATE, TARGET_RATE) 112 | 113 | output_data = resampled_data.tobytes() 114 | 115 | await self.websocket.send(output_data) 116 | await asyncio.sleep(0.01) 117 | except websockets.exceptions.ConnectionClosedError as 
e: 118 | print(f"Connection closed with error: {e}") 119 | break 120 | 121 | async def message(self, id): 122 | if Config.OUTPUT_DIR is not None: 123 | ibest_writer = open(os.path.join(Config.OUTPUT_DIR, f"text.{id}"), "a", encoding="utf-8") 124 | else: 125 | ibest_writer = None 126 | try: 127 | while True: 128 | try: 129 | meg = await asyncio.wait_for(self.websocket.recv(), timeout=Config.INTERVIEWER_WAIT_TIME / 1000) 130 | meg = json.loads(meg) 131 | wav_name = meg.get("wav_name", "demo") 132 | text = meg["text"] 133 | timestamp = "" 134 | self.offline_msg_done = meg.get("is_final", False) 135 | 136 | if "timestamp" in meg: 137 | timestamp = meg["timestamp"] 138 | 139 | if ibest_writer is not None: 140 | if timestamp != "": 141 | text_write_line = f"{wav_name}\t{text}\t{timestamp}\n" 142 | else: 143 | text_write_line = f"{wav_name}\t{text}\n" 144 | ibest_writer.write(text_write_line) 145 | 146 | if 'mode' not in meg: 147 | continue 148 | 149 | if meg["mode"] == "2pass-online": 150 | self.text_print_2pass_online += "{}".format(text) 151 | text_print = self.text_print_2pass_offline + self.text_print_2pass_online 152 | text_print = text_print[-Config.WORDS_MAX_PRINT:] 153 | self.dialog.update_last_interviewer(text_print) 154 | else: 155 | self.text_print_2pass_online = "" 156 | text_print = self.text_print_2pass_offline + "{}".format(text) 157 | text_print = text_print[-Config.WORDS_MAX_PRINT:] 158 | self.dialog.update_last_interviewer(text_print) 159 | self.text_print_2pass_offline += "{}".format(text) 160 | except asyncio.TimeoutError: 161 | if self.text_print_2pass_offline: 162 | self.text_print_2pass_offline = "" 163 | self.dialog.add_to_interviewer("") 164 | except Exception as e: 165 | print("Interview receive ws message Exception:", e) 166 | 167 | 168 | def prepare_hotword_message(self): 169 | fst_dict = {} 170 | hotword_msg = "" 171 | if Config.HOTWORD.strip(): 172 | with open(Config.HOTWORD) as f_scp: 173 | hot_lines = f_scp.readlines() 174 | for line in hot_lines: 175 | words = line.strip().split(" ") 176 | if len(words) < 2: 177 | print("Please check format of hotwords") 178 | continue 179 | try: 180 | fst_dict[" ".join(words[:-1])] = int(words[-1]) 181 | except ValueError: 182 | print("Please check format of hotwords") 183 | hotword_msg = json.dumps(fst_dict) 184 | return fst_dict, hotword_msg 185 | 186 | 187 | def interview_thread(id, chunk_begin, chunk_size): 188 | while True: 189 | try: 190 | interview = Interview() 191 | asyncio.run(interview.ws_client(id, chunk_begin, chunk_size)) 192 | except Exception as e: 193 | print(f"Interview thread encountered an error: {e}") 194 | time.sleep(3) 195 | 196 | if __name__ == '__main__': 197 | p = Process(target=interview_thread, args=("interviewer", 0, 1)) 198 | p.start() 199 | p.join() 200 | -------------------------------------------------------------------------------- /interview/MultimodeManager.py: -------------------------------------------------------------------------------- 1 | import asyncio 2 | from DialogManager import DialogManager 3 | from RequestSender import RequestSender 4 | from Config import Config 5 | 6 | class MultimodeManager: 7 | def __init__(self): 8 | self.request_sender = RequestSender() 9 | self.prompt_directory = "./prompt" 10 | self.dialog = DialogManager() 11 | 12 | async def multimode_process(self, category, link): 13 | content_list = [] 14 | 15 | file_path = self.prompt_directory + "/" + category + ".md" 16 | print(file_path) 17 | with open(file_path, 'r', encoding='utf-8') as file: # prompt 文件是 UTF-8 中文文本, 显式指定编码避免 Windows 默认 GBK 解码报错 18 | file_content = file.read() 19 | if 
not self.dialog.get_last_rookie(): 20 | self.dialog.update_last_rookie(file_content + f' \n Description') 21 | else: 22 | print(self.dialog.get_last_rookie()) 23 | self.dialog.add_to_rookie(file_content + f' \n Description') 24 | 25 | content = [ 26 | {"type": "text", "text": file_content}, 27 | {"type": "image_url", "image_url": {"url": link}} 28 | ] 29 | 30 | content_list.append({ 31 | "content": content, 32 | "role": "user" 33 | }) 34 | 35 | print(content_list) 36 | 37 | await self.request_sender.send_post_request(content_list) 38 | 39 | self.dialog.add_to_rookie("") 40 | 41 | async def main(): 42 | multimodeManager = MultimodeManager() 43 | category = "algorithm" 44 | link = "xxxxxx" 45 | await multimodeManager.multimode_process(category, link) 46 | 47 | if __name__ == '__main__': 48 | asyncio.run(main()) 49 | 50 | 51 | 52 | 53 | -------------------------------------------------------------------------------- /interview/R2Uploader.py: -------------------------------------------------------------------------------- 1 | import boto3 2 | from botocore.client import Config as boto3Config 3 | from Config import Config 4 | import hashlib 5 | import os 6 | from datetime import datetime 7 | import requests 8 | from requests.exceptions import RequestException, SSLError 9 | from requests.adapters import HTTPAdapter 10 | from urllib3.util.retry import Retry 11 | 12 | class R2Uploader: 13 | def __init__(self): 14 | self.access_key_id = Config.R2_ACCESS_KEY_ID 15 | self.secret_access_key = Config.R2_SECRET_ACCESS_KEY 16 | self.bucket_name = Config.R2_BUCKET_NAME 17 | self.region = Config.R2_REGION 18 | self.endpoint_url = Config.R2_ENDPOINT_URL 19 | self.custom_domain = Config.R2_CUSTOM_DOMAIN 20 | 21 | self.s3_client = boto3.client( 22 | 's3', 23 | region_name=self.region, 24 | endpoint_url=self.endpoint_url, 25 | aws_access_key_id=self.access_key_id, 26 | aws_secret_access_key=self.secret_access_key, 27 | config=boto3Config(signature_version='s3v4') 28 | ) 29 | 30 | def _calculate_md5(self, file_path): 31 | hash_md5 = hashlib.md5() 32 | with open(file_path, "rb") as f: 33 | for chunk in iter(lambda: f.read(4096), b""): 34 | hash_md5.update(chunk) 35 | return hash_md5.hexdigest() 36 | 37 | def _construct_object_name(self, file_path): 38 | md5_hash = self._calculate_md5(file_path) 39 | ext = os.path.splitext(file_path)[1] 40 | now = datetime.now() 41 | object_name = f"{now.year}/{now.month}/{md5_hash}{ext}" 42 | return object_name 43 | 44 | def upload_image(self, file_path): 45 | try: 46 | object_name = self._construct_object_name(file_path) 47 | self.s3_client.upload_file( 48 | file_path, 49 | self.bucket_name, 50 | object_name, 51 | ExtraArgs={'ACL': 'public-read'} 52 | ) 53 | print(f"File {file_path} uploaded to {object_name} in R2 bucket.") 54 | return object_name 55 | except Exception as e: 56 | print(f"Failed to upload {file_path} to R2 bucket: {e}") 57 | return None 58 | 59 | def get_file_url(self, object_name): 60 | if self.custom_domain: 61 | return f"{self.custom_domain}/{self.bucket_name}/{object_name}" 62 | else: 63 | return f"{self.endpoint_url}/{self.bucket_name}/{object_name}" 64 | 65 | def get_markdown_link(self, object_name): 66 | url = self.get_file_url(object_name) 67 | return f"![Image]({url})" 68 | 69 | def get_html_link(self, object_name): 70 | url = self.get_file_url(object_name) 71 | return f'Image' 72 | 73 | def upload_and_get_link(self, file_path, link_type="url"): 74 | object_name = self.upload_image(file_path) 75 | if object_name: 76 | if link_type == "markdown": 77 | 
return self.get_markdown_link(object_name) 78 | elif link_type == "html": 79 | return self.get_html_link(object_name) 80 | else: 81 | return self.get_file_url(object_name) 82 | else: 83 | return None 84 | 85 | def download_image_from_url(self, img_url): 86 | headers = { 87 | "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36", 88 | "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8", 89 | "Accept-Language": "en-US,en;q=0.5", 90 | "Connection": "keep-alive" 91 | } 92 | 93 | retry_strategy = Retry( 94 | total=5, 95 | backoff_factor=1, 96 | status_forcelist=[429, 500, 502, 503, 504], 97 | allowed_methods=["HEAD", "GET", "OPTIONS"] # urllib3 1.26+ renamed method_whitelist to allowed_methods 98 | ) 99 | 100 | adapter = HTTPAdapter(max_retries=retry_strategy) 101 | session = requests.Session() 102 | session.mount("https://", adapter) 103 | session.mount("http://", adapter) 104 | 105 | try: 106 | response = session.get(img_url, headers=headers, stream=True, timeout=(10, 30)) 107 | response.raise_for_status() 108 | except SSLError as e: 109 | print(f"SSL error occurred, retrying once: {e}") 110 | try: 111 | # Retry once with the same session and adapter (no special TLS version is forced) 112 | session.mount("https://", adapter) 113 | response = session.get(img_url, headers=headers, stream=True) 114 | response.raise_for_status() 115 | except Exception as e: 116 | print(f"Retry after SSL error failed for {img_url}: {e}") 117 | return None 118 | except RequestException as e: 119 | print(f"Failed to download image from {img_url}: {e}") 120 | return None 121 | 122 | tmp_dir = "./tmp" 123 | if not os.path.exists(tmp_dir): 124 | os.makedirs(tmp_dir) 125 | 126 | file_name = os.path.basename(img_url) 127 | file_path = os.path.join(tmp_dir, file_name) 128 | 129 | try: 130 | with open(file_path, 'wb') as out_file: 131 | for chunk in response.iter_content(chunk_size=8192): 132 | out_file.write(chunk) 133 | print(f"Image downloaded successfully from {img_url}") 134 | return file_path 135 | except Exception as e: 136 | print(f"Unexpected error occurred while saving image from {img_url}: {e}") 137 | return None 138 | 139 | def upload_image_from_url(self, img_url, link_type="url"): 140 | file_path = self.download_image_from_url(img_url) 141 | if file_path: 142 | return self.upload_and_get_link(file_path, link_type) 143 | return None 144 | 145 | if __name__ == "__main__": 146 | uploader = R2Uploader() 147 | file_path = './0aa00c3f-97fb-42b6-9297-812b2b0b7b61.png' 148 | link_type = 'markdown' # 'url', 'markdown', or 'html' 149 | 150 | link = uploader.upload_and_get_link(file_path, link_type) 151 | print(f"Generated link: {link}") 152 | -------------------------------------------------------------------------------- /interview/RagManager.py: -------------------------------------------------------------------------------- 1 | import asyncio 2 | from meilisearch import Client 3 | from DialogManager import DialogManager 4 | from Config import Config 5 | import json 6 | class RagManager: 7 | def __init__(self): 8 | self.dialog = DialogManager() 9 | self.config = Config() 10 | self.meilisearch_client = Client(f"http://{self.config.MEILISEARCH_HOST}:{self.config.MEILISEARCH_PORT}") 11 | 12 | async def query_meilisearch(self): 13 | query_object = self.dialog.get_last_interviewer() 14 | query_text = query_object['text'] 15 | print(f"query_text: {query_text}") 16 | index = self.meilisearch_client.index('qa_pairs') 17 | 18 | results = index.search(query_text, { 19 | 'limit': self.config.MEILISEARCH_DEEP 20 | })
21 | answers = [hit['a'] for hit in results['hits'] if 'a' in hit] 22 | return answers 23 | 24 | async def main(): 25 | rag = RagManager() 26 | answers = await rag.query_meilisearch() 27 | for ans in answers: 28 | print(ans) 29 | 30 | if __name__ == '__main__': 31 | asyncio.run(main()) -------------------------------------------------------------------------------- /interview/RequestSender.py: -------------------------------------------------------------------------------- 1 | import aiohttp 2 | import json 3 | from dotenv import load_dotenv 4 | import os 5 | import logging 6 | from DialogManager import DialogManager 7 | import time 8 | import asyncio 9 | logging.basicConfig(level=logging.INFO) 10 | logger = logging.getLogger(__name__) 11 | 12 | class RequestSender: 13 | def __init__(self): 14 | load_dotenv() 15 | self.url = os.getenv('CHATGPT_BASE_URL') 16 | self.headers = { 17 | 'Authorization': "Bearer " + os.getenv('AUTHORIZATION'), 18 | 'Content-Type': 'application/json', 19 | "Accept": 'application/json' 20 | } 21 | self.dialog = DialogManager() 22 | self.model = os.getenv('MODEL') 23 | 24 | async def send_post_request(self, contentList): 25 | async with aiohttp.ClientSession() as session: 26 | payload = { 27 | "model": self.model, 28 | "messages": contentList, 29 | "stream": True 30 | } 31 | receiveData = "" 32 | buffer = "" 33 | buffer_time = 0.5 34 | self.dialog.add_to_chatgpt("") 35 | try: 36 | async with session.post(self.url, headers=self.headers, data=json.dumps(payload)) as response: 37 | if response.status == 200: 38 | last_update_time = time.time() 39 | async for line in response.content: 40 | if line.startswith(b'data: '): 41 | sse_message = line.decode('utf-8').strip()[6:] 42 | if sse_message == '[DONE]': 43 | receiveData += buffer 44 | self.dialog.update_last_chatgpt(receiveData) 45 | await asyncio.sleep(1) # non-blocking; time.sleep(1) here would stall the event loop mid-stream 46 | self.dialog.add_to_chatgpt("") 47 | break 48 | message_data = self.extract_id_and_content(json.loads(sse_message)) 49 | if 'content' in message_data and message_data['content']: 50 | buffer += message_data['content'] 51 | 52 | if time.time() - last_update_time > buffer_time: 53 | receiveData += buffer 54 | self.dialog.update_last_chatgpt(receiveData) 55 | buffer = "" 56 | last_update_time = time.time() 57 | 58 | else: 59 | logger.debug(f"Received non-data line: {line}") 60 | except aiohttp.ClientError as e: 61 | logger.error(f"Request failed: {str(e)}") 62 | 63 | return receiveData 64 | 65 | def extract_id_and_content(self, response_data): 66 | try: 67 | response_id = response_data.get('id') 68 | content = response_data.get('choices', [])[0].get('delta', {}).get('content') 69 | return { 70 | "id": response_id, 71 | "content": content 72 | } 73 | except (IndexError, KeyError, TypeError) as e: 74 | logger.error(f"Error extracting id and content: {e}") 75 | return {"error": "Invalid response format"} 76 | 77 | -------------------------------------------------------------------------------- /interview/Rookie.py: -------------------------------------------------------------------------------- 1 | import pyaudio 2 | import numpy as np 3 | import json 4 | import asyncio 5 | import websockets 6 | from Config import Config 7 | import time 8 | import os 9 | from queue import Queue 10 | from multiprocessing import Process 11 | from DialogManager import DialogManager 12 | 13 | class Rookie: 14 | def __init__(self): 15 | self.voices = [] 16 | self.offline_msg_done = False 17 | self.websocket = None 18 | self.text_print_2pass_offline = "" 19 | self.text_print_2pass_online = "" 20 | self.dialog = 
DialogManager() 21 | 22 | async def ws_client(self, id="Rookie", chunk_begin=0, chunk_size=1): 23 | while True: 24 | for i in range(chunk_begin, chunk_begin + chunk_size): 25 | self.offline_msg_done = False 26 | self.voices = Queue() 27 | 28 | uri = f"ws://{Config.ROOKIE_HOST}:{Config.ROOKIE_PORT}" 29 | ssl_context = None 30 | 31 | print("Connecting to", uri) 32 | try: 33 | async with websockets.connect(uri, subprotocols=["binary"], ping_interval=None, ssl=ssl_context) as self.websocket: 34 | task1 = asyncio.create_task(self.record_microphone()) 35 | task2 = asyncio.create_task(self.message(id)) 36 | await asyncio.gather(task1, task2) 37 | except websockets.exceptions.ConnectionClosedError as e: 38 | print(f"WebSocket connection closed with error: {e}") 39 | await asyncio.sleep(3) 40 | except Exception as e: 41 | print(f"Rookie error occurred: {e}") 42 | import traceback 43 | traceback.print_exc() 44 | await asyncio.sleep(3) 45 | 46 | async def record_microphone(self): 47 | FORMAT = pyaudio.paInt16 48 | CHANNELS = Config.AUDIO_CHANNEL 49 | RATE = Config.AUDIO_FS 50 | chunk_size = 60 * Config.CHUNK_SIZE[1] / Config.CHUNK_INTERVAL 51 | CHUNK = int(RATE / 1000 * chunk_size) 52 | audio = pyaudio.PyAudio() 53 | 54 | TARGET_RATE = 16000 55 | TARGET_CHANNELS = 1 56 | 57 | while True: 58 | try: 59 | audio = pyaudio.PyAudio() 60 | mic_stream = audio.open(format=FORMAT, 61 | channels=CHANNELS, 62 | rate=RATE, 63 | input=True, 64 | input_device_index=Config.MIC_DEVICE_INDEX, 65 | frames_per_buffer=CHUNK) 66 | break 67 | except OSError as e: 68 | print(f"Error opening audio stream: {e}") 69 | await asyncio.sleep(3) 70 | 71 | fst_dict, hotword_msg = self.prepare_hotword_message() 72 | 73 | use_itn = Config.USE_ITN == 1 74 | message = json.dumps({ 75 | "mode": Config.MODE, 76 | "chunk_size": Config.CHUNK_SIZE, 77 | "chunk_interval": Config.CHUNK_INTERVAL, 78 | "wav_name": "system_voice", 79 | "is_speaking": True, 80 | "hotwords": hotword_msg, 81 | "itn": use_itn 82 | }) 83 | 84 | await self.websocket.send(message) 85 | def convert_to_mono(data, channels): 86 | audio_data = np.frombuffer(data, dtype=np.int16) 87 | mono_data = audio_data.reshape((-1, channels)).mean(axis=1) 88 | return mono_data.astype(np.int16) 89 | 90 | def resample_audio(data, original_rate, target_rate): 91 | audio_data = np.frombuffer(data, dtype=np.int16) 92 | resampled_data = np.interp(np.linspace(0, len(audio_data), int(len(audio_data) * target_rate / original_rate)), 93 | np.arange(len(audio_data)), audio_data) 94 | return resampled_data.astype(np.int16) 95 | 96 | 97 | while True: 98 | data = mic_stream.read(CHUNK) 99 | try: 100 | if CHANNELS == TARGET_CHANNELS and RATE == TARGET_RATE: 101 | await self.websocket.send(data) 102 | else: 103 | mono_data = convert_to_mono(data, CHANNELS) 104 | 105 | resampled_data = resample_audio(mono_data, RATE, TARGET_RATE) 106 | 107 | output_data = resampled_data.tobytes() 108 | 109 | await self.websocket.send(output_data) 110 | await asyncio.sleep(0.01) 111 | except websockets.exceptions.ConnectionClosedError as e: 112 | print(f"Connection closed with error: {e}") 113 | break 114 | 115 | async def message(self, id): 116 | if Config.OUTPUT_DIR is not None: 117 | ibest_writer = open(os.path.join(Config.OUTPUT_DIR, f"text.{id}"), "a", encoding="utf-8") 118 | else: 119 | ibest_writer = None 120 | try: 121 | while True: 122 | try: 123 | meg = await asyncio.wait_for(self.websocket.recv(), timeout=Config.ROOKIE_WAIT_TIME / 1000) 124 | meg = json.loads(meg) 125 | wav_name = meg.get("wav_name", "demo") 
126 | text = meg["text"] 127 | timestamp = "" 128 | self.offline_msg_done = meg.get("is_final", False) 129 | 130 | if "timestamp" in meg: 131 | timestamp = meg["timestamp"] 132 | 133 | if ibest_writer is not None: 134 | if timestamp != "": 135 | text_write_line = f"{wav_name}\t{text}\t{timestamp}\n" 136 | else: 137 | text_write_line = f"{wav_name}\t{text}\n" 138 | ibest_writer.write(text_write_line) 139 | 140 | if 'mode' not in meg: 141 | continue 142 | 143 | if meg["mode"] == "2pass-online": 144 | self.text_print_2pass_online += "{}".format(text) 145 | text_print = self.text_print_2pass_offline + self.text_print_2pass_online 146 | text_print = text_print[-Config.WORDS_MAX_PRINT:] 147 | self.dialog.update_last_rookie(text_print) 148 | else: 149 | self.text_print_2pass_online = "" 150 | text_print = self.text_print_2pass_offline + "{}".format(text) 151 | text_print = text_print[-Config.WORDS_MAX_PRINT:] 152 | self.dialog.update_last_rookie(text_print) 153 | self.text_print_2pass_offline += "{}".format(text) 154 | except asyncio.TimeoutError: 155 | if self.text_print_2pass_offline: 156 | self.text_print_2pass_offline = "" 157 | self.dialog.add_to_rookie("") 158 | 159 | except Exception as e: 160 | print("Exception:", e) 161 | 162 | def prepare_hotword_message(self): 163 | fst_dict = {} 164 | hotword_msg = "" 165 | if Config.HOTWORD.strip(): 166 | with open(Config.HOTWORD) as f_scp: 167 | hot_lines = f_scp.readlines() 168 | for line in hot_lines: 169 | words = line.strip().split(" ") 170 | if len(words) < 2: 171 | print("Please check format of hotwords") 172 | continue 173 | try: 174 | fst_dict[" ".join(words[:-1])] = int(words[-1]) 175 | except ValueError: 176 | print("Please check format of hotwords") 177 | hotword_msg = json.dumps(fst_dict) 178 | return fst_dict, hotword_msg 179 | 180 | 181 | def rookie_thread(id, chunk_begin, chunk_size): 182 | while True: 183 | try: 184 | rookie = Rookie() 185 | asyncio.run(rookie.ws_client(id, chunk_begin, chunk_size)) 186 | except Exception as e: 187 | print(f"Rookie thread encountered an error: {e}") 188 | time.sleep(3) 189 | 190 | if __name__ == '__main__': 191 | p = Process(target=rookie_thread, args=("rookie", 0, 1)) 192 | p.start() 193 | p.join() 194 | -------------------------------------------------------------------------------- /interview/SaveFile.py: -------------------------------------------------------------------------------- 1 | import redis 2 | import os 3 | import json 4 | from Config import Config 5 | 6 | class SaveFile: 7 | def __init__(self): 8 | self.redis = redis.StrictRedis(host=Config.REDIS_HOST, port=Config.REDIS_PORT, decode_responses=True) 9 | 10 | self.interviewer_key = "dialog_manager:interviewer" 11 | self.rookie_key = "dialog_manager:rookie" 12 | self.chatgpt_key = "dialog_manager:chatgpt" 13 | 14 | def _get_valid_entries(self, key, max_len): 15 | entries = [json.loads(item) for item in self.redis.lrange(key, 0, -1)] 16 | valid_entries = [item for item in entries if item['text'].strip() != ""] 17 | 18 | if len(valid_entries) > max_len: 19 | return valid_entries[-max_len:] 20 | else: 21 | return valid_entries 22 | 23 | def _get_icon(self, list_name): 24 | if list_name == "interviewer": 25 | return "🎤" 26 | elif list_name == "rookie": 27 | return "😅" 28 | elif list_name == "chatgpt": 29 | return "🤖" 30 | 31 | def export_dialogs_to_file(self, filename): 32 | interviewer_list = self._get_valid_entries(self.interviewer_key, 3000) 33 | rookie_list = self._get_valid_entries(self.rookie_key, 3000) 34 | chatgpt_list = 
self._get_valid_entries(self.chatgpt_key, 3000) 35 | 36 | combined_list = interviewer_list + rookie_list + chatgpt_list 37 | 38 | # 对提取出的元素按时间排序 39 | sorted_combined_list = sorted(combined_list, key=lambda x: x['time']) 40 | 41 | output = [] 42 | for entry in sorted_combined_list: 43 | icon = self._get_icon(entry['list_name']) 44 | output.append(f"{entry['time']} {icon} {entry['list_name']} : \n\n {entry['text']}") 45 | 46 | result = "\n\n".join(output) 47 | 48 | with open(filename, 'w') as file: 49 | file.write(result) 50 | 51 | print(f"Dialogs have been saved to {filename}") 52 | 53 | # 示例用法 54 | if __name__ == "__main__": 55 | dialog_manager = SaveFile() 56 | 57 | dialog_manager.export_dialogs_to_file("dialogs_output.md") 58 | -------------------------------------------------------------------------------- /interview/ScreenShot.py: -------------------------------------------------------------------------------- 1 | import json 2 | import os 3 | from datetime import datetime 4 | from pynput import keyboard 5 | import pyautogui 6 | import asyncio 7 | from PIL import Image 8 | from multiprocessing import Process 9 | from R2Uploader import R2Uploader 10 | from MultimodeManager import MultimodeManager 11 | from DialogManager import DialogManager 12 | from ChatgptManager import ChatgptManager 13 | import platform 14 | 15 | 16 | class ScreenshotManager: 17 | CONFIG_FILE = "config.json" 18 | DEFAULT_CONFIG = { 19 | "shortcuts": { 20 | "algorithm": ["", "", "a"], 21 | "personality": ["", "", "p"], 22 | "general": ["", "", "g"], 23 | "long_screenshot": ["", "", "l"], 24 | "help": ["", "", "h"], 25 | "fix": ["", "", "f"], 26 | "ocr": ["", "", "o"], 27 | "exit": ["", "c"] 28 | }, 29 | "save_dir": "./screenshots", 30 | "scroll_delay": 1.5, 31 | "scroll_amount": 17, 32 | "max_screenshots": 3 33 | } 34 | 35 | def __init__(self): 36 | self.config = self.load_config() 37 | self.current_keys = set() 38 | self.uploader = R2Uploader() 39 | self.link_type = 'url' 40 | self.prompt_dir = "./prompt/" 41 | self.dialog = DialogManager() 42 | self.multimodeManager = MultimodeManager() 43 | self.loop = None 44 | self.chatgpt_manager = ChatgptManager() 45 | self.system = platform.system() 46 | os.makedirs(self.config['save_dir'], exist_ok=True) 47 | 48 | def load_config(self): 49 | if os.path.exists(self.CONFIG_FILE): 50 | with open(self.CONFIG_FILE, 'r') as f: 51 | return json.load(f) 52 | return self.DEFAULT_CONFIG 53 | 54 | def save_config(self): 55 | with open(self.CONFIG_FILE, 'w') as f: 56 | json.dump(self.config, f, indent=4) 57 | 58 | def map_key_for_platform(self, key): 59 | if self.system == 'Windows': 60 | return self.map_key_for_windows(key) 61 | elif self.system == 'Darwin': 62 | return self.map_key_for_macos(key) 63 | elif self.system == 'Linux': 64 | return self.map_key_for_linux(key) 65 | return key 66 | 67 | def map_key_for_windows(self, key): 68 | if hasattr(key, 'vk'): 69 | vk = key.vk 70 | if 65 <= vk <= 90: 71 | return chr(vk).lower() 72 | if 48 <= vk <= 57: 73 | return chr(vk) 74 | 75 | key_mapping = { 76 | 'Key.ctrl_l': '', 77 | 'Key.ctrl_r': '', 78 | 'Key.alt_l': '', 79 | 'Key.alt_r': '', 80 | 'Key.shift_l': '', 81 | 'Key.shift_r': '', 82 | 'Key.cmd': '', 83 | 'Key.enter': '', 84 | 'Key.space': '', 85 | 'Key.tab': '', 86 | 'Key.esc': '', 87 | 'Key.backspace': '', 88 | } 89 | return key_mapping.get(str(key), None) 90 | 91 | def map_key_for_macos(self, key): 92 | key_mapping = { 93 | 'Key.cmd': '', 94 | 'Key.ctrl': '', 95 | 'Key.alt': '', 96 | 'Key.shift': '', 97 | 'Key.enter': '', 98 | 
'Key.space': '', 99 | 'Key.tab': '', 100 | 'Key.esc': '', 101 | 'Key.backspace': '', 102 | 'Key.caps_lock': '', 103 | 'Key.right': '', 104 | 'Key.left': '', 105 | 'Key.up': '', 106 | 'Key.down': '' 107 | } 108 | if isinstance(key, keyboard.KeyCode): 109 | return key.char.lower() if key.char else None 110 | else: 111 | return key_mapping.get(str(key), None) 112 | 113 | 114 | def map_key_for_linux(self, key): 115 | if hasattr(key, 'vk'): 116 | vk = key.vk 117 | if 65 <= vk <= 90: 118 | return chr(vk).lower() 119 | if 48 <= vk <= 57: 120 | return chr(vk) 121 | 122 | key_mapping = { 123 | 'Key.ctrl_l': '', 124 | 'Key.ctrl_r': '', 125 | 'Key.alt_l': '', 126 | 'Key.alt_r': '', 127 | 'Key.shift_l': '', 128 | 'Key.shift_r': '', 129 | 'Key.enter': '', 130 | 'Key.space': '', 131 | 'Key.tab': '', 132 | 'Key.esc': '', 133 | 'Key.backspace': '', 134 | } 135 | return key_mapping.get(str(key), None) 136 | 137 | def take_screenshot(self): 138 | screenshot = pyautogui.screenshot() 139 | filename = datetime.now().strftime("screenshot_%Y%m%d_%H%M%S.png") 140 | filepath = os.path.join(self.config['save_dir'], filename) 141 | screenshot.save(filepath) 142 | print(f"Screenshot saved as {filepath}") 143 | return filepath 144 | 145 | async def take_long_screenshot(self): 146 | print("Taking long screenshot...") 147 | screen_width, screen_height = pyautogui.size() 148 | mouse_x, mouse_y = pyautogui.position() 149 | 150 | screenshots = [] 151 | total_height = 0 152 | 153 | for i in range(self.config['max_screenshots']): 154 | print(f"Taking screenshot {i+1}/{self.config['max_screenshots']}...") 155 | 156 | current_screenshot = await self.safe_screenshot() 157 | if not current_screenshot: 158 | print(f"Failed to take screenshot {i+1}. Stopping capture.") 159 | break 160 | 161 | screenshots.append(current_screenshot) 162 | total_height += current_screenshot.height 163 | 164 | if i < self.config['max_screenshots'] - 1: 165 | print(f"Scrolling down {self.config['scroll_amount']} pixels...") 166 | pyautogui.scroll(-self.config['scroll_amount']) 167 | await asyncio.sleep(self.config['scroll_delay']) 168 | 169 | if not screenshots: 170 | print("No screenshots were captured. Aborting.") 171 | return None 172 | 173 | print(f"Captured {len(screenshots)} screenshots. 
Stitching them together...") 174 | long_screenshot = Image.new('RGB', (screen_width, total_height)) 175 | 176 | current_height = 0 177 | for screenshot in screenshots: 178 | long_screenshot.paste(screenshot, (0, current_height)) 179 | current_height += screenshot.height 180 | 181 | filename = datetime.now().strftime("long_screenshot_%Y%m%d_%H%M%S.png") 182 | filepath = os.path.join(self.config['save_dir'], filename) 183 | long_screenshot.save(filepath) 184 | print(f"Long screenshot saved as {filepath}") 185 | 186 | pyautogui.moveTo(mouse_x, mouse_y) 187 | return filepath 188 | 189 | 190 | async def async_screenshot(self): 191 | return await asyncio.to_thread(pyautogui.screenshot) 192 | 193 | async def safe_screenshot(self, timeout=5): 194 | try: 195 | return await asyncio.wait_for(self.async_screenshot(), timeout) 196 | except asyncio.TimeoutError: 197 | print(f"Screenshot timed out after {timeout} seconds.") 198 | return None 199 | 200 | 201 | async def generate_question(self, category, link): 202 | print(f"Generating {category} question...") 203 | await self.multimodeManager.multimode_process(category, link) 204 | 205 | def on_press(self, key): 206 | asyncio.run_coroutine_threadsafe(self._async_on_press(key), self.loop) 207 | 208 | async def _async_on_press(self, key): 209 | filepath = None # the "help" and "exit" branches never set it; avoids an UnboundLocalError below 210 | key_char = self.map_key_for_platform(key) 211 | 212 | if key_char is None: 213 | return 214 | 215 | self.current_keys.add(key_char) 216 | 217 | for category, shortcut in self.config['shortcuts'].items(): 218 | if all(k in self.current_keys for k in shortcut): 219 | if category == "long_screenshot": 220 | filepath = await self.take_long_screenshot() 221 | elif category == "help": 222 | await self.run_chatgpt_workflow() 223 | elif category == "exit": 224 | raise KeyboardInterrupt("Exit shortcut triggered") 225 | else: 226 | filepath = self.take_screenshot() 227 | 228 | if filepath: 229 | link = self.uploader.upload_and_get_link(filepath, self.link_type) 230 | print(f"Generated link: {link}") 231 | await self.generate_question(category, link) 232 | 233 | 234 | def on_release(self, key): 235 | key_char = self.map_key_for_platform(key) 236 | 237 | if key_char is None: 238 | return 239 | 240 | if key_char in self.current_keys: 241 | self.current_keys.remove(key_char) 242 | 243 | async def run_chatgpt_workflow(self): 244 | print("Running ChatGPT workflow...") 245 | await self.chatgpt_manager.run_workflow() 246 | 247 | async def run(self): 248 | self.loop = asyncio.get_running_loop() 249 | 250 | with keyboard.Listener( 251 | on_press=self.on_press, 252 | on_release=self.on_release 253 | ) as listener: 254 | print("Shortcut program is running. 
Press Ctrl+C to exit.") 255 | try: 256 | await asyncio.Event().wait() # Run forever 257 | except KeyboardInterrupt: 258 | print("\nProgram terminated.") 259 | finally: 260 | listener.stop() 261 | 262 | def listen_key_process(): 263 | manager = ScreenshotManager() 264 | asyncio.run(manager.run()) 265 | 266 | if __name__ == "__main__": 267 | process = Process(target=listen_key_process) 268 | process.start() 269 | process.join() -------------------------------------------------------------------------------- /interview/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AowerDmax/Simple-Interview-Audio-Processing/4857f35a8356854b611e33952528ce7c834353df/interview/__init__.py -------------------------------------------------------------------------------- /interview/audioTest.py: -------------------------------------------------------------------------------- 1 | import pyaudio 2 | import wave 3 | from Config import Config 4 | 5 | # 初始化 PyAudio 6 | audio = pyaudio.PyAudio() 7 | 8 | # 列出所有音频输入设备 9 | for i in range(audio.get_device_count()): 10 | device_info = audio.get_device_info_by_index(i) 11 | print(f"Device {i}: {device_info['name']}") 12 | 13 | def print_device_info(audio, index): 14 | try: 15 | info = audio.get_device_info_by_index(index) 16 | print(f"Device {index}:") 17 | print(f" Name: {info['name']}") 18 | print(f" Max Input Channels: {info['maxInputChannels']}") 19 | print(f" Max Output Channels: {info['maxOutputChannels']}") 20 | print(f" Default Sample Rate: {info['defaultSampleRate']}") 21 | except Exception as e: 22 | print(f"Error getting info for device {index}: {e}") 23 | 24 | # 选择 Aggregate Device 和麦克风的设备索引 25 | aggregate_device_index = 12 # 根据上面的列表选择适合的设备索引 26 | mic_device_index = 1 # 替换为你的麦克风设备索引 27 | 28 | print_device_info(audio, aggregate_device_index) 29 | print_device_info(audio, mic_device_index) 30 | 31 | 32 | 33 | # 配置参数 34 | FORMAT = pyaudio.paInt16 35 | System_RATE = 48000 # 单声道可能会兼容不同的设备 36 | Mic_RATE = 16000 # 单声道可能会兼容不同的设备 37 | System_CHANNELS = 1 # 单声道可能会兼容不同的设备 38 | Mic_CHANNELS = 1 # 单声道可能会兼容不同的设备 39 | chunk_size = 60 * Config.CHUNK_SIZE[1] / Config.CHUNK_INTERVAL 40 | Mic_CHUNK = int(Mic_RATE / 1000 * chunk_size) 41 | System_CHUNK = int(System_RATE / 1000 * chunk_size) 42 | # 打开系统音频输入流(使用聚合设备) 43 | system_stream = audio.open(format=FORMAT, 44 | channels=System_CHANNELS, 45 | rate=System_RATE, 46 | input=True, 47 | input_device_index=aggregate_device_index, 48 | frames_per_buffer=System_CHUNK) 49 | 50 | # 打开麦克风输入流 51 | mic_stream = audio.open(format=FORMAT, 52 | channels=Mic_CHANNELS, 53 | rate=Mic_RATE, 54 | input=True, 55 | input_device_index=mic_device_index, 56 | frames_per_buffer=Mic_CHUNK) 57 | 58 | print("Recording...") 59 | 60 | # 捕获音频数据 61 | system_frames = [] 62 | mic_frames = [] 63 | 64 | for _ in range(50): 65 | system_data = system_stream.read(System_CHUNK) 66 | mic_data = mic_stream.read(Mic_CHUNK) 67 | system_frames.append(system_data) 68 | mic_frames.append(mic_data) 69 | 70 | # 停止并关闭流 71 | system_stream.stop_stream() 72 | system_stream.close() 73 | 74 | mic_stream.stop_stream() 75 | mic_stream.close() 76 | 77 | audio.terminate() 78 | 79 | # 保存系统音频到文件 80 | with wave.open("system_audio.wav", 'wb') as wf: 81 | wf.setnchannels(System_CHANNELS) 82 | wf.setsampwidth(audio.get_sample_size(FORMAT)) 83 | wf.setframerate(System_RATE) 84 | wf.writeframes(b''.join(system_frames)) 85 | 86 | # 保存麦克风音频到文件 87 | with wave.open("mic_audio.wav", 'wb') as wf: 88 | 
wf.setnchannels(Mic_CHANNELS) 89 | wf.setsampwidth(audio.get_sample_size(FORMAT)) 90 | wf.setframerate(Mic_RATE) 91 | wf.writeframes(b''.join(mic_frames)) 92 | 93 | print("Recording finished and saved to files.") 94 | -------------------------------------------------------------------------------- /interview/main.py: -------------------------------------------------------------------------------- 1 | import asyncio 2 | import signal 3 | import sys 4 | import time 5 | from multiprocessing import Process 6 | from Rookie import Rookie 7 | from Interviewer import Interview 8 | from DialogManager import DialogManager 9 | from ScreenShot import listen_key_process 10 | from SaveFile import SaveFile 11 | from datetime import datetime 12 | 13 | def interview_thread(id, chunk_begin, chunk_size): 14 | while True: 15 | try: 16 | interview = Interview() 17 | asyncio.run(interview.ws_client(id, chunk_begin, chunk_size)) 18 | except Exception as e: 19 | print(f"Interview thread encountered an error: {e}") 20 | time.sleep(3) 21 | 22 | def rookie_thread(id, chunk_begin, chunk_size): 23 | while True: 24 | try: 25 | rookie = Rookie() 26 | asyncio.run(rookie.ws_client(id, chunk_begin, chunk_size)) 27 | except Exception as e: 28 | print(f"Rookie thread encountered an error: {e}") 29 | time.sleep(3) 30 | 31 | def handle_exit(signum, frame): 32 | print("\nGracefully shutting down...") 33 | 34 | interview_process.terminate() 35 | rookie_process.terminate() 36 | listen_key_process.terminate() 37 | 38 | interview_process.join() 39 | rookie_process.join() 40 | listen_key_process.join() 41 | 42 | saveFile = SaveFile() 43 | current_time = datetime.now().strftime("%Y%m%d%H%M") 44 | file_name = f"dialogs_output_{current_time}.md" 45 | saveFile.export_dialogs_to_file(file_name) 46 | print(f"Dialogs saved to {file_name}. Exiting.") 47 | 48 | sys.exit(0) 49 | 50 | if __name__ == '__main__': 51 | signal.signal(signal.SIGINT, handle_exit) 52 | 53 | dialog_manager = DialogManager() 54 | dialog_manager.clear_all() 55 | dialog_manager.add_to_interviewer("") 56 | dialog_manager.add_to_chatgpt("") 57 | dialog_manager.add_to_rookie("") 58 | 59 | interview_process = Process(target=interview_thread, args=("interviewer", 0, 1)) 60 | rookie_process = Process(target=rookie_thread, args=("rookie", 0, 1)) 61 | listen_key_process = Process(target=listen_key_process) 62 | 63 | interview_process.start() 64 | rookie_process.start() 65 | listen_key_process.start() 66 | 67 | try: 68 | interview_process.join() 69 | rookie_process.join() 70 | listen_key_process.join() 71 | except KeyboardInterrupt: 72 | handle_exit(None, None) 73 | -------------------------------------------------------------------------------- /macOS_user_tutorial.md: -------------------------------------------------------------------------------- 1 | # MacOS User tutorial 2 | 3 | 在本篇文章中, 将详细介绍在 `MacOS` 系统下如何部署本工具. 4 | 5 | 本工具主要有两个部分. `截屏询问GPT`, `语音记录询问GPT`. 很高兴的是,这两个模块是互不干扰的, 所以如果你只需要任意一个单独的模块,只需要遵循对应的教程就可以了. 6 | 7 | 首先,你需要确保本地已经安装了 `Poetry`。如果尚未安装,可以使用以下命令进行安装: 8 | 9 | ```bash 10 | curl -sSL https://install.python-poetry.org | python3 - 11 | ``` 12 | 13 | 添加`Poetry`到系统环境变量中, 可以添加到`.bashrc`或者 `.zshrc` 中长久生效. 
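例如, 如果你使用 `zsh`, 可以在 `~/.zshrc` 末尾追加类似下面的内容(这里假设 Poetry 安装在官方安装脚本的默认目录 `$HOME/.local/bin`, 如果你的安装路径不同, 请自行调整):

```bash
# 将 Poetry 所在目录加入 PATH(官方安装脚本的默认位置, 仅作示例)
export PATH="$HOME/.local/bin:$PATH"
```

追加后执行 `source ~/.zshrc`(或重新打开终端)使配置生效.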
14 | 15 | 安装完成后,确保 `Poetry` 已经添加到你的系统路径中,可以通过以下命令验证安装是否成功: 16 | 17 | ```bash 18 | poetry --version 19 | ``` 20 | 21 | 这将输出 Poetry 的版本信息,确认安装成功。 22 | 23 | 克隆本项目,并进入项目根目录 24 | 25 | ```bash 26 | git clone https://github.com/AowerDmax/Simple-Interview-Audio-Processing.git 27 | cd Simple-Interview-Audio-Processing 28 | ``` 29 | 30 | 在项目根目录下,运行以下命令来安装项目的所有依赖项: 31 | 32 | ```bash 33 | poetry install 34 | ``` 35 | 36 | 进入虚拟环境 37 | 38 | ```bash 39 | poetry shell 40 | ``` 41 | 42 | 复制`.env` 文件 43 | 44 | ``` 45 | cp env.template .env 46 | ``` 47 | 48 | **注意:**, 你每次进入这个项目都需要`poetry shell`, 然后你就可以使用`python xxx.py` 使用虚拟环境运行`python` 文件了. 49 | 50 | 启动 `docker` 容器. 51 | 52 | ``` 53 | docker-compose up -d 54 | # 如果你发现后面的步骤出现问题,这里可能会出现问题,使用下面的命令排查 55 | docker-compose down 56 | docker-compose up 57 | 查看log日志,看看是哪个服务没有正常启动 58 | docker-compose restart 59 | docker-compose restart <服务名> 60 | ``` 61 | 62 | ## 语音记录询问 GPT 63 | 64 | 这部分的需要的东西有: 65 | 66 | 1. `openai` ( `deepseek`, `oaipro`, `通义千问` )格式的`api`和`Base_url` 67 | 2. 阿里的 `funASR` 进行实时语音识别 68 | 3. [BlackHole 16ch](https://github.com/ExistentialAudio/BlackHole)进行语音的分离和采样. 69 | 70 | #### BlackHole 16ch 设置 71 | 72 | ##### 下载安装 73 | 74 | Option 1: Install via Homebrew 75 | 76 | ``` 77 | brew install blackhole-16ch 78 | ``` 79 | 80 | Option 2: Download Installer 81 | 从链接中安装[BlackHole 16ch](https://existential.audio/blackhole/). 可以选择不捐赠,然后填入自己的邮箱和个人信息,然后在邮箱里面找下载链接.(如果有能力可以捐赠他,**和我没关系**.) 82 | 83 | ##### 进行 BlackHole 设置 84 | 85 | `Command` + `Space` 打开聚焦搜索, 搜索`音频MIDI设置`. 86 | 根据下面的截图创建聚合设备和多输出设备, 一定要注意,采样速率和主设备都要保证和下面的图一致. 因为`FunASR`是只支持 `单声道` 和 `16000` 采样率的. 87 | 88 | ![2024-09-08-14-37-55](asserts/2024-09-08-14-37-55.png) 89 | 90 | ![2024-09-08-14-39-00](asserts/2024-09-08-14-39-00.png) 91 | 92 | ![2024-09-08-14-40-02](asserts/2024-09-08-14-40-02.png) 93 | 94 | 此时,你需要测试一下,你的音响是否能播放声音, 你的麦克风能否正常输入声音. 95 | 96 | #### 测试环境 97 | 98 | 使用 `python interview/audioTest.py` 来获取所有音频输入输出设备. 99 | 100 | ``` 101 | ❯ python interview/audioTest.py 102 | Device 0: 27M2U-D 103 | Device 1: XWF-1080P 104 | Device 2: Background Music 105 | Device 3: Background Music (UI Sounds) 106 | Device 4: BlackHole 16ch 107 | Device 5: BlackHole 2ch 108 | Device 6: 外置麦克风 109 | Device 7: 外置耳机 110 | Device 8: Mac mini扬声器 111 | Device 9: Iriun Webcam Audio 112 | Device 10: WeMeet Audio Device 113 | Device 11: TwsBowie 114 | Device 12: 聚合设备 115 | Device 13: 多输出设备 116 | Device 12: 117 | Name: 聚合设备 118 | Max Input Channels: 16 119 | Max Output Channels: 16 120 | Default Sample Rate: 16000.0 121 | Device 1: 122 | Name: XWF-1080P 123 | Max Input Channels: 1 124 | Max Output Channels: 0 125 | Default Sample Rate: 48000.0 126 | Recording... 127 | Recording finished and saved to files. 128 | ``` 129 | 130 | 我这里选择`Device 1`作为麦克风输入, `Device 12`作为系统内声音输入. 131 | 132 | 第一次运行一般都会报错的,然后你根据已经列出的所有音频输入设备,选择相应的麦克风输入和系统内声音输入. 133 | 运行完这个脚本,在项目的根目录内会生成的`system_audio.wav`,`mic_audio.wav` 里面分别记录了`系统内声音` 和 `麦克风声音`.(注意:此时在读取麦克风输入和系统内声音输入,请保证相应的有输入) 134 | 请播放这个文件,来检查自己的设置是否正常. 135 | 136 | #### 配置.env 文件 137 | 138 | 在`.env`文件里面 139 | 140 | `AUDIO_FS=16000` 和 `AUDIO_CHANNEL=1` 或者设置成你自己的值 141 | 142 | 重点修改 `AGGREGATE_DEVICE_INDEX`、`MIC_DEVICE_INDEX` `RAG_ENABLED` 以及 GPT 的 `baseurl` 和 `API` 配置。 143 | 144 | 可以通过 `MEILISEARCH_DEEP` 来设置 `RAG` 搜索辅助知识的数量 145 | 146 | 可以通过 `ROOKIE_DIALOG_LEN`, `CHATGPT_DIALOG_LEN`, `INTERVIEWER_DIALOG_LEN` 来分别设置 `终端` 中 各类消息显示的数量. 同时在传入 `GPT` 问答时的对话记录的时候也遵循这个设置. 
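下面给出一份 `.env` 片段作为参考(仅为示意: 设备索引、各类条数等取值是假设的示例值, 请按你机器的实际情况填写, 具体键名以 `env.template` 为准):

```bash
# FunASR 只接受 16000Hz / 单声道, 其他采样设置会由脚本重采样
AUDIO_FS=16000
AUDIO_CHANNEL=1

# 由 python interview/audioTest.py 列出的设备索引(示例值, 请替换)
AGGREGATE_DEVICE_INDEX=12
MIC_DEVICE_INDEX=1

# RAG 开关与检索辅助知识的条数(条数为假设的示例值)
RAG_ENABLED=True
MEILISEARCH_DEEP=3

# 终端中各类消息的显示数量, 同时约束传给 GPT 的对话条数(示例值)
INTERVIEWER_DIALOG_LEN=10
ROOKIE_DIALOG_LEN=10
CHATGPT_DIALOG_LEN=10

# GPT 接口配置: RequestSender.py 会直接向 CHATGPT_BASE_URL 发送 POST,
# 因此这里应填完整的 chat completions 端点(URL 和模型名仅为示例)
CHATGPT_BASE_URL=https://api.openai.com/v1/chat/completions
AUTHORIZATION=your-api-key-here
MODEL=gpt-4o
```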
147 | 148 | #### 配置 docker 149 | 150 | ##### FunASR Interview 服务 151 | 152 | 进入 Docker 容器内部: 153 | 154 | ```bash 155 | docker-compose exec funasr_interview bash 156 | ``` 157 | 158 | 在 Docker 容器内运行以下命令启动服务: 159 | 160 | ```bash 161 | cd FunASR/runtime 162 | 163 | 164 | nohup bash run_server_2pass.sh \ 165 | --model-dir damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-onnx \ 166 | --online-model-dir damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online-onnx \ 167 | --vad-dir damo/speech_fsmn_vad_zh-cn-16k-common-onnx \ 168 | --punc-dir damo/punc_ct-transformer_zh-cn-common-vad_realtime-vocab272727-onnx \ 169 | --lm-dir damo/speech_ngram_lm_zh-cn-ai-wesp-fst \ 170 | --itn-dir thuduj12/fst_itn_zh \ 171 | --certfile 0 \ 172 | --hotword ../../hotwords.txt > log.txt 2>&1 & 173 | ``` 174 | 175 | ##### FunASR Rookie 服务 176 | 177 | 进入 Docker 容器内部: 178 | 179 | ```bash 180 | docker-compose exec funasr_rookie bash 181 | ``` 182 | 183 | 在 Docker 容器内运行以下命令启动服务: 184 | 185 | ```bash 186 | cd FunASR/runtime 187 | 188 | 189 | nohup bash run_server_2pass.sh \ 190 | --model-dir damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-onnx \ 191 | --online-model-dir damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online-onnx \ 192 | --vad-dir damo/speech_fsmn_vad_zh-cn-16k-common-onnx \ 193 | --punc-dir damo/punc_ct-transformer_zh-cn-common-vad_realtime-vocab272727-onnx \ 194 | --lm-dir damo/speech_ngram_lm_zh-cn-ai-wesp-fst \ 195 | --itn-dir thuduj12/fst_itn_zh \ 196 | --certfile 0 \ 197 | --hotword ../../hotwords.txt > log.txt 2>&1 & 198 | ``` 199 | 200 | #### 运行 201 | 202 | ``` 203 | python interview/main.py 204 | ``` 205 | 206 | 使用`Command (⌘) + Option/Alt (⌥) + h` 调用`GPT` 询问. 207 | 208 | ## 截屏询问 GPT 209 | 210 | 这部分的需要的东西有: 211 | 212 | 1. `openai`(或者官转)的`api`和`Base_url`. (这部分没有`deepseek`等其他模型, 需要支持`多模态`) 213 | 2. 类似于`CloudFlare`的`R2`的`OSS`. (做图床用,后续可能会支持免费图床) **注意: 一定不能是仅限国内访问的 OSS, 需要美国的 chatGPT 也能访问的到** 214 | 215 | [R2 数据库教程(网上搜的)](https://blog.huacai.one/post/3) 216 | 217 | 在`.env`文件里面 218 | 219 | ``` 220 | # R2 Access Key ID 221 | # This is your R2 account's Access Key ID, used to authenticate API requests. 222 | # R2 Secret Access Key 223 | # This is your R2 account's Secret Access Key, used along with the Access Key ID to authenticate API requests. 224 | # R2 Bucket Name 225 | # The name of the R2 bucket you wish to interact with. This is where your data will be stored. 226 | # R2 Region 227 | # The region where your R2 bucket is located. For example, 'us-west-1'. 228 | # R2 Endpoint URL 229 | # The endpoint URL for accessing the R2 service. This typically includes the protocol and the domain name. 230 | # For example: https://your-endpoint-url.com 231 | # R2 Custom Domain 232 | # If you have a custom domain configured for accessing your R2 bucket, specify it here. 可以和R2 Endpoint URL相同 233 | R2_ACCESS_KEY_ID=your-access-key-id-here 234 | R2_SECRET_ACCESS_KEY=your-secret-access-key-here 235 | R2_BUCKET_NAME=your-bucket-name-here 236 | R2_REGION=your-region-here 237 | R2_ENDPOINT_URL=https://your-endpoint-url-here 238 | R2_CUSTOM_DOMAIN=your-custom-domain-here 239 | ``` 240 | 241 | ### 使用 242 | 243 | ``` 244 | python interview/main.py 245 | ``` 246 | 247 | 使用`Command (⌘) + Option/Alt (⌥) + a` 调用`algorithm Prompt` 询问. 248 | 249 | 使用`Command (⌘) + Option/Alt (⌥) + p` 调用`personality Prompt` 询问. 250 | 251 | 使用`Command (⌘) + Option/Alt (⌥) + g` 调用`general Prompt` 询问. 252 | 253 | 使用`Command (⌘) + Option/Alt (⌥) + l` 调用`long_screenshot Prompt` 询问. 
(截屏拼接) 254 | 255 | 使用`Command (⌘) + Option/Alt (⌥) + f` 调用`fix Prompt` 询问. 256 | 257 | 使用`Command (⌘) + Option/Alt (⌥) + o` 调用`ocr Prompt` 询问. 258 | 259 | ## QA 260 | 261 | 在这里罗列一些可能出现的问题,以及解决方案: 262 | 263 | ### FunASR 264 | 265 | 首先这个是`阿里巴巴` 提供的`实时语音转文字`服务. 但是它只支持单声道、16k 采样率. 很多时候我们的电脑不支持这种设置, 所以我在代码里做了`重采样`, 你只需要在你的`.env`中填写设备的真实参数即可. 266 | 通过`python interview/audioTest.py` 进行测试. 267 | 268 | **Q:** 报错: 2024-08-17 23:50:04,377 - modelscope - WARNING - Downloading: TLG.fst failed, reason: ('Connection broken: IncompleteRead(1164614 bytes read, 166607546 more expected)', IncompleteRead(1164614 bytes read, 166607546 more expected)) will retry 269 | 270 | **A:** 这是因为`modelScope`的下载服务出了问题, 可能是你在使用国外的网络, 导致`modelScope`中的模型下载失败. 如果出现了这个问题, 在`docker`的部分 271 | 使用下面的命令: 272 | 273 | ```bash 274 | docker-compose exec <服务名> bash 275 | ``` 276 | 277 | 在 Docker 容器内运行以下命令启动服务: 278 | 279 | ```bash 280 | cd FunASR/runtime 281 | 282 | 283 | bash run_server_2pass.sh \ 284 | --model-dir damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-onnx \ 285 | --online-model-dir damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online-onnx \ 286 | --vad-dir damo/speech_fsmn_vad_zh-cn-16k-common-onnx \ 287 | --punc-dir damo/punc_ct-transformer_zh-cn-common-vad_realtime-vocab272727-onnx \ 288 | --lm-dir damo/speech_ngram_lm_zh-cn-ai-wesp-fst \ 289 | --itn-dir thuduj12/fst_itn_zh \ 290 | --certfile 0 \ 291 | --hotword ../../hotwords.txt 292 | ``` 293 | 294 | 查看具体的报错和网络连接失败信息,进行修复. 295 | 296 | ### websocket 297 | 298 | **Q:** websocket 连接失败,或者网页长时间查看不了 299 | 300 | **A:** 检查 docker 相关的容器和本地机器是不是跨国部署了, 跨国访问的网络延迟会导致这种情况, 尽量部署在局域网或者国内. 301 | -------------------------------------------------------------------------------- /poetry.md: -------------------------------------------------------------------------------- 1 | `Poetry` 是一个用于 Python 项目管理和依赖管理的工具。如果你有一个使用 `Poetry` 管理的项目,并且想要安装和运行该项目的环境,可以按照以下步骤操作: 2 | 3 | ### 1. 安装 Poetry 4 | 5 | 首先,你需要确保本地已经安装了 `Poetry`。如果尚未安装,可以使用以下命令进行安装: 6 | 7 | ```bash 8 | curl -sSL https://install.python-poetry.org | python3 - 9 | ``` 10 | 11 | 安装完成后,确保 `Poetry` 已经添加到你的系统路径中,可以通过以下命令验证安装是否成功: 12 | 13 | ```bash 14 | poetry --version 15 | ``` 16 | 17 | 这将输出 Poetry 的版本信息,确认安装成功。 18 | 19 | ### 2. 克隆项目 20 | 21 | 如果项目托管在 GitHub 或其他代码仓库中,首先需要将项目克隆到本地: 22 | 23 | ```bash 24 | git clone <repository-url> 25 | cd <project-directory> 26 | ``` 27 | 28 | ### 3. 安装项目依赖 29 | 30 | 在项目根目录下,运行以下命令来安装项目的所有依赖项: 31 | 32 | ```bash 33 | poetry install 34 | ``` 35 | 36 | 这个命令将根据 `pyproject.toml` 和 `poetry.lock` 文件中定义的依赖安装所需的包。如果项目包含开发依赖项(如测试框架),这些依赖也会一并安装。 37 | 38 | ### 4. 进入虚拟环境 39 | 40 | `Poetry` 默认会为项目创建一个隔离的虚拟环境。你可以使用以下命令进入该虚拟环境: 41 | 42 | ```bash 43 | poetry shell 44 | ``` 45 | 46 | 进入虚拟环境后,所有命令(如 `python` 或 `pytest`)都会在虚拟环境中执行。 47 | 48 | ### 5. 运行项目 49 | 50 | 根据项目的结构,你可以通过以下方式运行项目: 51 | 52 | - **运行 Python 脚本**: 53 | 54 | 如果项目的入口是一个 Python 脚本,你可以直接运行它: 55 | 56 | ```bash 57 | python path/to/script.py 58 | ``` 59 | 60 | - **运行项目中的命令**: 61 | 62 | 如果项目在 `pyproject.toml` 中定义了 `scripts`,你可以使用 `poetry run` 来运行它们: 63 | 64 | ```bash 65 | poetry run <script-name> 66 | ``` 67 | 68 | - **运行项目的主模块**: 69 | 70 | 如果项目的入口是模块或包,你可以使用 `python -m` 运行它: 71 | 72 | ```bash 73 | python -m <module-name> 74 | ``` 75 | 76 | ### 6. 退出虚拟环境 77 | 78 | 完成工作后,你可以使用以下命令退出虚拟环境: 79 | 80 | ```bash 81 | exit 82 | ``` 83 | 84 | ### 7. 
其他常用命令 85 | 86 | - **添加新依赖**: 87 | 88 | 如果你需要添加新的依赖项,可以使用以下命令: 89 | 90 | ```bash 91 | poetry add 92 | ``` 93 | 94 | - **更新依赖**: 95 | 96 | 要更新项目的所有依赖项,可以运行: 97 | 98 | ```bash 99 | poetry update 100 | ``` 101 | 102 | - **运行测试**: 103 | 104 | 如果项目包含测试,你可以使用以下命令运行测试: 105 | 106 | ```bash 107 | poetry run pytest 108 | ``` 109 | 110 | 通过这些步骤,你可以安装并运行一个使用 `Poetry` 管理的 Python 项目。 111 | -------------------------------------------------------------------------------- /prompt/algorithm.md: -------------------------------------------------------------------------------- 1 | 你是一个先进的 AI 模型,负责解决算法问题。你将会看到一张包含算法问题的图片,通常包括代码片段、数学表达式或图表。你的任务是准确识别问题并提供详细的解决方案。请按照以下步骤进行: 2 | 3 | 1. **图像分析**:仔细分析图片内容,理解问题陈述。重点识别代码片段、数学公式以及任何相关的文字或图表。 4 | 5 | 2. **问题识别**:确定图片中提出的算法问题类型(例如,排序、动态规划、图论等)。用简明的语言总结出问题。 6 | 7 | 3. **解决思路**:提供逐步的解决思路。解释你将采用的算法或数据结构来解决问题。 8 | 9 | 4. **代码实现**:如果问题涉及代码,请提供清晰且有注释的代码实现,使用 Java 语言实现。确保代码高效并能处理边界情况。 10 | 11 | 5. **结果解释**:详细解释你的解决方案是如何工作的。包括时间复杂度和空间复杂度的分析,并讨论可能的替代方法或权衡。 12 | 13 | 6. **回答问题**:如果图片中提出了具体问题或要求输出特定结果,请确保你的解答能够直接回应这些问题。 14 | 15 | 请确保你的分析全面,并用清晰的中文表达解决方案,以确保完全理解和正确解答问题。 16 | -------------------------------------------------------------------------------- /prompt/fix.md: -------------------------------------------------------------------------------- 1 | **任务描述:** 2 | 3 | 我上传了一张 LeetCode 风格的代码界面截图,其中包含了一个编写中的 Java 代码实现。请你从图片中识别出以下内容并完成相应的任务: 4 | 5 | 1. **识别代码区域**:代码一般位于右侧,包含未完成的 Java 代码实现。可能存在自然语言的直接描述,这部分没有被注释符号包裹,通常是我对算法题的处理逻辑的描述。 6 | 7 | 2. **识别注释部分**:注释部分一般被注释符号(如 `//` 或 `/* */`)包裹,这部分通常包含我对问题的分析或对代码优化的要求。 8 | 9 | 3. **优化或修复代码**: 10 | 11 | - 如果我在注释中提到代码超时,请检查代码中的性能瓶颈,尝试通过降低时间复杂度或其他优化手段提高代码效率。 12 | - 如果代码中存在语法错误,请修复这些错误,并确保代码能够正确编译和运行。 13 | - 按照我的分析或要求对代码进行优化,确保逻辑正确且高效。 14 | 15 | 4. **实现或完善代码**:根据我注释的编写内容,可能会涉及让你实现的算法流程,使用 Java 语言实现或优化代码的逻辑。 16 | 17 | **示例**: 18 | 19 | - 如果注释中提到“超时了,请优化算法”,你应该检查代码中的循环或递归部分,尝试通过使用更高效的数据结构或算法来降低时间复杂度。 20 | 21 | 请帮助我完成上述任务,并输出优化后的代码。 22 | -------------------------------------------------------------------------------- /prompt/general.md: -------------------------------------------------------------------------------- 1 | 你是一位知识渊博的 AI 助手,擅长解答各种类型的问题。你将会看到一张图片,图片可能包含不同类型的内容,如开发知识、智力题、数学题、英语题等。你的任务是准确识别图片中的信息,并根据问题类型提供详细的解答。请按照以下步骤进行: 2 | 3 | 1. **图像分析**:仔细观察图片内容,识别其中的文字、符号、代码、图形等元素,并理解图片传达的主要信息或问题。 4 | 5 | 2. **问题分类**:根据图片内容判断问题类型(如开发相关的问题、数学题、智力题、英语题等)。简要描述问题的核心内容。 6 | 7 | 3. **提供解答**: 8 | - **开发知识**:如果是关于编程或开发的技术问题,提供适当的代码示例、算法解释或技术方案。 9 | - **智力题**:如果是智力题,分析题目逻辑,提供解题思路和最终答案。 10 | - **数学题**:如果是数学问题,列出相关的公式或定理,详细计算过程并得出结果。 11 | - **英语题**:如果是英语题,进行语法分析、翻译或单词解释,并提供正确答案或改进建议。 12 | 4. **结果解释**:对于每一个答案,提供详细的解释或推理过程,确保解答过程清晰且易于理解。 13 | 14 | 5. **回答问题**:如果图片中有明确的问题或要求,请确保你的回答能够准确回应并解决问题。 15 | 16 | 请用中文清晰准确地解答,确保问题得到全面分析和解决。 17 | -------------------------------------------------------------------------------- /prompt/long_screenshot.md: -------------------------------------------------------------------------------- 1 | 你是一个先进的 AI 模型,负责解决算法问题。你将会看到一张包含算法问题的拼接图片,通常包括代码片段、数学表达式或图表。你的任务是准确识别问题并提供详细的解决方案。请按照以下步骤进行: 2 | 3 | 1. **图像分析**:仔细分析图片内容,理解问题陈述。重点识别代码片段、数学公式以及任何相关的文字或图表。 4 | 5 | 2. **问题识别**:确定图片中提出的算法问题类型(例如,排序、动态规划、图论等)。用简明的语言总结出问题。 6 | 7 | 3. **解决思路**:提供逐步的解决思路。解释你将采用的算法或数据结构来解决问题。 8 | 9 | 4. **代码实现**:如果问题涉及代码,请提供清晰且有注释的代码实现,使用 Java 语言实现。确保代码高效并能处理边界情况。 10 | 11 | 5. **结果解释**:详细解释你的解决方案是如何工作的。包括时间复杂度和空间复杂度的分析,并讨论可能的替代方法或权衡。 12 | 13 | 6. 
**回答问题**:如果图片中提出了具体问题或要求输出特定结果,请确保你的解答能够直接回应这些问题。 14 | 15 | 请确保你的分析全面,并用清晰的中文表达解决方案,以确保完全理解和正确解答问题。 16 | -------------------------------------------------------------------------------- /prompt/ocr.md: -------------------------------------------------------------------------------- 1 | 我将上传一张或多张图片,图片可能是单张的,也可能是多张拼接在一起。这些图片中包含一个算法题以及相关的信息,包括但不限于题目描述、示例输入输出、解释等。请你完成以下任务: 2 | 3 | 1. **这个问题的解决思路和类似问题**: 4 | - 对题目进行详细分析,进行一个简单通俗的讲解. 5 | - 基于识别到的题目信息,提供一个可行的解决思路。 6 | - 提供一些类似的问题,为什么类似? 类似问题的解决思路是什么? 7 | - 使用 `Java` 语言, 针对此题编写相应的输入输出,其中类名为 `Main`. 将算法实现的逻辑抽象为一个函数. 8 | -------------------------------------------------------------------------------- /prompt/personality.md: -------------------------------------------------------------------------------- 1 | 你是一位专业的 AI 助手,专门用于解读性格测试。你将会看到一张性格测试的图片,图片中可能包含多项性格描述以及与之对应的选项。你的任务是根据性格测试的要求,选择最适合的答案。请按照以下步骤进行: 2 | 3 | 1. **图像分析**:仔细观察图片内容,识别图片中的性格描述、测试题目及其选项。重点关注描述中的关键词,如“紧张”“轻松”“乐观”“团队合作”等。 4 | 5 | 2. **符合性分析**:根据以下规则判断选项的符合性: 6 | 7 | - 如果选项中包含“我会紧张”或“重大事件让我焦虑”等描述,选择最不符合的选项。 8 | - 如果选项中包含“我很轻松”“我乐观”“我对未来充满希望”等描述,选择最符合的选项。 9 | - 如果选项中体现“团队合作”“听取他人意见”的精神,选择最符合的选项。 10 | 11 | 3. **选择最符合和最不符合的选项**: 12 | 13 | - **最符合的选项**:选择能够展现积极乐观、轻松、不易受外界影响、注重团队合作的选项。 14 | - **最不符合的选项**:选择体现紧张、忧虑、不信任他人、倾向独自决策的选项。 15 | 16 | 4. **结果解答**:清楚地指出你选择的最符合和最不符合的选项,并提供简要的理由说明。 17 | 18 | 请用清晰的中文进行回答,确保性格测试中的每个选项都得到了全面分析,并且选择符合测试要求的答案。 19 | -------------------------------------------------------------------------------- /pyproject.toml: -------------------------------------------------------------------------------- 1 | [tool.poetry] 2 | name = "interview" 3 | version = "0.1.0" 4 | description = "" 5 | authors = ["aowerdmax "] 6 | readme = "README.md" 7 | 8 | [tool.poetry.dependencies] 9 | python = "^3.12" 10 | aliyun-python-sdk-core = "^2.15.1" 11 | python-dotenv = "^1.0.1" 12 | websockets = "^12.0" 13 | aiohttp = "^3.10.3" 14 | meilisearch = "^0.31.4" 15 | boto3 = "^1.35.8" 16 | botocore = "^1.35.8" 17 | pynput = "^1.7.7" 18 | pyautogui = "^0.9.54" 19 | pillow = "^10.4.0" 20 | redis = "^5.0.8" 21 | numpy = "^2.1.1" 22 | 23 | 24 | [build-system] 25 | requires = ["poetry-core"] 26 | build-backend = "poetry.core.masonry.api" 27 | 28 | [virtualenvs] 29 | in-project = true 30 | -------------------------------------------------------------------------------- /screenshots/long_screenshot_20240829_212734.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AowerDmax/Simple-Interview-Audio-Processing/4857f35a8356854b611e33952528ce7c834353df/screenshots/long_screenshot_20240829_212734.png -------------------------------------------------------------------------------- /screenshots/screenshot_20240829_160848.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AowerDmax/Simple-Interview-Audio-Processing/4857f35a8356854b611e33952528ce7c834353df/screenshots/screenshot_20240829_160848.png -------------------------------------------------------------------------------- /screenshots/screenshot_20240829_213839.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AowerDmax/Simple-Interview-Audio-Processing/4857f35a8356854b611e33952528ce7c834353df/screenshots/screenshot_20240829_213839.png -------------------------------------------------------------------------------- /screenshots/screenshot_20240829_225747.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/AowerDmax/Simple-Interview-Audio-Processing/4857f35a8356854b611e33952528ce7c834353df/screenshots/screenshot_20240829_225747.png -------------------------------------------------------------------------------- /start_funasr_services.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | # Function to start FunASR service inside a Docker container 4 | start_service() { 5 | local service_name=$1 6 | 7 | echo "Starting service for ${service_name}..." 8 | 9 | # Execute the following commands inside the Docker container 10 | docker-compose exec ${service_name} bash -c " 11 | cd FunASR/runtime && \ 12 | nohup bash run_server_2pass.sh \ 13 | --model-dir damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-onnx \ 14 | --online-model-dir damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online-onnx \ 15 | --vad-dir damo/speech_fsmn_vad_zh-cn-16k-common-onnx \ 16 | --punc-dir damo/punc_ct-transformer_zh-cn-common-vad_realtime-vocab272727-onnx \ 17 | --lm-dir damo/speech_ngram_lm_zh-cn-ai-wesp-fst \ 18 | --itn-dir thuduj12/fst_itn_zh \ 19 | --certfile 0 \ 20 | --hotword ../../hotwords.txt > log.txt 2>&1 & 21 | " 22 | 23 | echo "Service for ${service_name} started successfully." 24 | } 25 | 26 | # Start FunASR Interview service 27 | start_service "funasr_interview" 28 | 29 | # Start FunASR Rookie service 30 | start_service "funasr_rookie" 31 | 32 | echo "All services have been started." 33 | -------------------------------------------------------------------------------- /tests/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AowerDmax/Simple-Interview-Audio-Processing/4857f35a8356854b611e33952528ce7c834353df/tests/__init__.py -------------------------------------------------------------------------------- /tmp/text.0_0: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AowerDmax/Simple-Interview-Audio-Processing/4857f35a8356854b611e33952528ce7c834353df/tmp/text.0_0 -------------------------------------------------------------------------------- /workflows/prompt_1/step1.md: -------------------------------------------------------------------------------- 1 | 你是一位智能面试助手,负责在面试过程中快速简要地回答面试官(interviewer)的问题。你需要根据面试对话列表中的最新问题,提供直接且简洁的回答。请按照以下步骤进行: 2 | 3 | 1. **对话列表分析**: 4 | 5 | - 检查对话列表中的最新条目,确定提问者是否为面试官(interviewer)。 6 | - 确认问题的类型,并理解问题的意图。 7 | 8 | 2. **简要回答**: 9 | 10 | - **技术问题**:提供关键概念或方法的简要说明,确保回答直接针对问题的核心。 11 | - **行为问题**:用简洁的语言提供直接的回答,突出主要行为或决策。 12 | - **情景模拟**:提供简要的解决方案,突出关键步骤或决策点。 13 | 14 | 3. **保持简洁**: 15 | - 确保回答控制在几句话之内,不需要深入的背景信息或额外的细节。 16 | 17 | 面试对话记录如下: 18 | 19 | ${conversation} 20 | -------------------------------------------------------------------------------- /workflows/prompt_1/step2.md: -------------------------------------------------------------------------------- 1 | 在完成简要回答后,你将进一步详细解释问题,并通过添加使用场景、案例和额外的分析来加分。你需要展示面试者的深入理解和应用能力。请按照以下步骤进行: 2 | 3 | 1. **扩展回答**: 4 | 5 | - 在之前简要回答的基础上,详细解释概念、原理或方法。 6 | - 提供与实际工作相关的使用场景或应用实例,展示问题在不同环境中的应用。 7 | 8 | 2. **举例说明**: 9 | 10 | - **技术问题**:添加一个或多个实际项目中的案例,展示如何应用所讨论的技术或方法。 11 | - **行为问题**:提供实际经历中的具体情境,解释面试者如何在类似情况下表现出色。 12 | - **情景模拟**:分析不同情境下可能的挑战,并展示如何通过合理的决策来应对这些挑战。 13 | 14 | 3. 
**加分回答**: 15 | - 讨论替代方案、优化方法或潜在的权衡,并提供对未来工作的建议或见解。 16 | - 确保回答不仅展示知识深度,还展示面试者的解决问题能力和创新思维。 17 | -------------------------------------------------------------------------------- /workflows/prompt_2/step1.md: -------------------------------------------------------------------------------- 1 | 你是一位智能面试助手,负责在面试过程中协助面试官(interviewer)和面试者(rookie)。你将根据面试对话列表中的最新问题,提供准确和有针对性的回答。面试的场景可能涉及技术问题、行为问题、情景模拟等。请按照以下步骤进行: 2 | 3 | 1. **对话列表分析**: 4 | 5 | - 检查对话列表中的最新条目,确定提问者是否为面试官(interviewer)。 6 | - 确认问题的类型(技术问题、行为问题、情景模拟等),并理解问题的意图。 7 | 8 | 2. **回答问题**: 9 | 10 | - **技术问题**:如果问题涉及技术知识,提供清晰且详细的解答,引用相关的概念、代码示例或技术实现。 11 | - **行为问题**:如果问题涉及行为评估,提供基于实际经验的回答,突出面试者的软技能,如团队合作、问题解决、时间管理等。 12 | - **情景模拟**:如果问题涉及情景模拟,分析情景并提供最佳解决方案,展示逻辑思维和决策能力。 13 | 14 | 3. **回答格式**: 15 | 16 | - 只需要回答最新问题,不需要额外输出. 17 | - 回答先提及答案的关键字,然后简要回答,使用表格回答 18 | - 确保回答简洁明了,能够直接回应面试官的提问. 使用面试者的语气和用词. 19 | 20 | 面试对话记录如下: 21 | 22 | ${conversation} 23 | --------------------------------------------------------------------------------
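`workflows` 目录下的提示词模板通过 `${conversation}` 占位符接收整理好的面试对话记录. 下面是一个最小化的替换示意(假设用 Python 标准库的 `string.Template` 来填充; 实际 `ChatgptManager` 的读取与拼接实现可能不同):

```python
from string import Template

# 读取一个 workflow 步骤的提示词模板(仓库内的真实文件路径)
with open("workflows/prompt_1/step1.md", "r", encoding="utf-8") as f:
    template = Template(f.read())

# 用对话记录替换 ${conversation} 占位符; safe_substitute 在缺少变量时不会抛异常
conversation = "🎤 interviewer :\n\n 你为什么要使用消息队列呢?"
prompt = template.safe_substitute(conversation=conversation)
print(prompt)
```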