├── .gitignore
├── None
│   └── text.0_0
├── README.md
├── README_EN.md
├── Windows_user_tutorial.md
├── asserts
│   ├── 2024-09-08-14-37-55.png
│   ├── 2024-09-08-14-39-00.png
│   ├── 2024-09-08-14-40-02.png
│   ├── 2024-09-08-16-43-48.png
│   ├── 2024-09-08-16-43-56.png
│   ├── 2024-09-08-16-44-01.png
│   └── 2024-09-08-16-44-07.png
├── config.json
├── data
│   └── database.xlsx
├── dialogs_output_202408201256.md
├── docker-compose.yml
├── env.template
├── img
│   ├── RAG_1.png
│   ├── RAG_2.png
│   ├── algo_long_response.png
│   ├── algo_long_screenShot.png
│   ├── excel.png
│   ├── fix_response.png
│   ├── fix_screenShot.png
│   ├── general_response.png
│   ├── general_screenShot.png
│   ├── image.png
│   ├── img.gif
│   └── web.gif
├── interview
│   ├── ChatgptManager.py
│   ├── Config.py
│   ├── DialogManager.py
│   ├── Interviewer.py
│   ├── MultimodeManager.py
│   ├── R2Uploader.py
│   ├── RagManager.py
│   ├── RequestSender.py
│   ├── Rookie.py
│   ├── SaveFile.py
│   ├── ScreenShot.py
│   ├── __init__.py
│   ├── audioTest.py
│   └── main.py
├── macOS_user_tutorial.md
├── poetry.md
├── prompt
│   ├── algorithm.md
│   ├── fix.md
│   ├── general.md
│   ├── long_screenshot.md
│   ├── ocr.md
│   └── personality.md
├── pyproject.toml
├── screenshots
│   ├── long_screenshot_20240829_212734.png
│   ├── screenshot_20240829_160848.png
│   ├── screenshot_20240829_213839.png
│   └── screenshot_20240829_225747.png
├── start_funasr_services.sh
├── tests
│   └── __init__.py
├── tmp
│   └── text.0_0
└── workflows
    ├── prompt_1
    │   ├── step1.md
    │   └── step2.md
    └── prompt_2
        └── step1.md

/.gitignore:
--------------------------------------------------------------------------------
1 | .env
2 | *.wav
3 | !.env.template
4 | # Byte-compiled / optimized / DLL files
5 | __pycache__/
6 | interview/__pycache__/
7 | *.py[cod]
8 | *$py.class
9 |
10 | # C extensions
11 | *.so
12 |
13 | screenshots/*
14 | !screenshots/long_screenshot_20240829_212734.png
15 | !screenshots/screenshot_20240829_160848.png
16 | !screenshots/screenshot_20240829_213839.png
17 | !screenshots/screenshot_20240829_225747.png
18 |
19 | dialogs_output*
20 | !dialogs_output_202408201256.md
21 |
22 | # Distribution / packaging
23 | .Python
24 | env/
25 | venv/
26 | ENV/
27 | env.bak/
28 | venv.bak/
29 | .pytest_cache/
30 |
31 | # Poetry specific
32 | poetry.lock
33 | .cache/
34 |
35 | # Installer logs
36 | pip-log.txt
37 | pip-delete-this-directory.txt
38 |
39 | # Unit test / coverage reports
40 | htmlcov/
41 | .tox/
42 | .nox/
43 | .coverage
44 | .coverage.*
45 | .cache
46 | nosetests.xml
47 | coverage.xml
48 | *.cover
49 | *.py,cover
50 | .hypothesis/
51 | .pytest_cache/
52 | .coverage/
53 |
54 | # MyPy
55 | .mypy_cache/
56 | .dmypy.json
57 | dmypy.json
58 |
59 | # Pyre type checker
60 | .pyre/
61 |
62 | # Jupyter Notebook
63 | .ipynb_checkpoints
64 |
65 | # pyenv
66 | .python-version
67 |
68 | # Celery stuff
69 | celerybeat-schedule
70 | celerybeat.pid
71 |
72 | # SageMath parsed files
73 | *.sage.py
74 |
75 | # Environments
76 | .env
77 | .env.*
78 | .venv
79 | venv/
80 | ENV/
81 |
82 | # Virtualenv
83 | venv/
84 | ENV/
85 |
86 | # Spyder project settings
87 | .spyderproject
88 | .spyproject
89 |
90 | # Rope project settings
91 | .ropeproject
92 |
93 | # IDEs
94 | .vscode/
95 | .idea/
96 | *.sublime-project
97 | *.sublime-workspace
98 |
99 | # MacOS-specific files
100 | .DS_Store
101 | .AppleDouble
102 | .LSOverride
103 |
104 | # Thumbnails
105 | ._*
106 |
107 | # Trashes
108 | .Trashes
109 |
110 | # macOS specific files
111 | .DS_Store
112 | Icon?
113 | ._*
114 | .Spotlight-V100
115 | .Trashes
116 | .DS_Store?
117 |
118 | # Linux-specific files
119 | *~
120 |
121 | # Windows-specific files
122 | Thumbs.db
123 | ehthumbs.db
124 | Desktop.ini
125 | $RECYCLE.BIN/
126 | *.cab
127 | *.msi
128 | *.msm
129 | *.msp
130 | *.lnk
131 |
--------------------------------------------------------------------------------
/None/text.0_0:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AowerDmax/Simple-Interview-Audio-Processing/4857f35a8356854b611e33952528ce7c834353df/None/text.0_0
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Simple Interview Audio Processing
2 |
3 | > **Disclaimer**: This project is a technical exercise only. Using it to cheat in interviews, or for any commercial purpose, is strictly forbidden. The author accepts no responsibility for any legal issues arising from its use. If this project causes you any trouble, please contact the author to have it removed.
4 |
5 | This is a simple audio-stream-based example project that demonstrates basic audio-processing features. It was inspired by the expensive paid tools sold online; this project is a plain implementation showing that such functionality does not require a complicated technology stack.
6 |
7 | ---
8 |
9 | [Windows deployment tutorial](Windows_user_tutorial.md)
10 |
11 | [macOS deployment tutorial](macOS_user_tutorial.md)
12 |
13 | ## Future features
14 |
15 | 1. Build a persona with `multimodal input` + a `database`, and use `Tampermonkey` to fully automate `personality tests`.
16 | 2. OCR accuracy on mathematical formulas is low, so GPT sometimes misreads the problem and answers nonsense.
17 | 3. The current answers sound far too GPT-flavored; I will crawl model solutions and interview write-ups as external RAG data, and improve the prompts.
18 | 4. GPT's accuracy is still too low; add cross-checking between Claude, GPT, and possibly other models.
19 | 5. Most important: simplify the deployment process and its difficulty, so that anyone can use this easily (considering a packaged user-facing UI).
20 | 6. If you like this project, would you give it a star? If there is a `Feature` you want, tell me in `Issues` or anywhere else.
21 |
22 | ## New Feature
23 |
24 | Added multimodal support and hotkey-triggered GPT calls.
25 |
26 | #### MacOS
27 |
28 | ##### Conversation query
29 |
30 | Press `Command (⌘) + Option/Alt (⌥) + h` to ask `GPT` about the current conversation.
31 |
32 | ##### Screenshot query
33 |
34 | Press `Command (⌘) + Option/Alt (⌥) + a` to query with the `algorithm Prompt`.
35 |
36 | Press `Command (⌘) + Option/Alt (⌥) + p` to query with the `personality Prompt`.
37 |
38 | Press `Command (⌘) + Option/Alt (⌥) + g` to query with the `general Prompt`.
39 |
40 | Press `Command (⌘) + Option/Alt (⌥) + l` to query with the `long_screenshot Prompt` (stitched screenshots).
41 |
42 | Press `Command (⌘) + Option/Alt (⌥) + f` to query with the `fix Prompt`.
43 |
44 | Press `Command (⌘) + Option/Alt (⌥) + o` to query with the `ocr Prompt`.
45 |
46 | #### Windows
47 |
48 | ##### Conversation query
49 |
50 | Press `Ctrl + Alt + h` to ask `GPT` about the current conversation.
51 |
52 | ##### Screenshot query
53 |
54 | Press `Ctrl + Alt + a` to query with the `algorithm Prompt`.
55 |
56 | Press `Ctrl + Alt + p` to query with the `personality Prompt`.
57 |
58 | Press `Ctrl + Alt + g` to query with the `general Prompt`.
59 |
60 | Press `Ctrl + Alt + l` to query with the `long_screenshot Prompt` (stitched screenshots).
61 |
62 | Press `Ctrl + Alt + f` to query with the `fix Prompt`.
63 |
64 | Press `Ctrl + Alt + o` to query with the `ocr Prompt`.
65 |
66 | ### General Q&A demo
67 |
68 | ![general_screenShot](img/general_screenShot.png)
69 | ![general_response](img/general_response.png)
70 |
71 | ### Long algorithm-question demo
72 |
73 | ![algo_long_screenShot](img/algo_long_screenShot.png)
74 | ![algo_long_response](img/algo_long_response.png)
75 |
76 | ### Code fixing
77 |
78 | ![fix_screenShot](img/fix_screenShot.png)
79 | ![fix_response](img/fix_response.png)
80 |
81 | ### Web demo
82 |
83 | The clip below: a local audio file is played to imitate the system audio saying `你为什么要使用消息队列呢?` ("Why do you use a message queue?").
84 |
85 | The `web` page picks up the sentence and streams it out.
86 |
87 | `ChatGPT (the LLM assistant)` streams out the answer to the question.
88 |
89 | Supports `local deployment` and `server deployment`. The `flex` layout displays well on `phones`, `tablets`, and `desktops`.
90 | ![Web](img/web.gif)
91 |
92 | [Repository for the web part](https://github.com/AowerDmax/websocket-redis)
93 |
94 | ---
95 |
96 | ### RAG demo
97 |
98 | The demo below: Q&A content you prepare in advance can be stored in the `RAG` database, and the `RAG` database is searched before `ChatGPT (the LLM assistant)` is asked.
99 |
100 | ![Rag_1](img/RAG_1.png)
101 |
102 | ![Rag_2](img/RAG_2.png)
103 |
104 | [Repository for the RAG part](https://github.com/AowerDmax/websocket-redis)
105 |
106 | #### How to use the RAG knowledge base
107 |
108 | 1. In the `.env` file, set `RAG_ENABLED=True`.
109 | 2. Put your knowledge-base `xlsx` files in the `data` folder; all `xlsx` files are found recursively.
110 | 3. File format: two columns, the first is `Q`, the second is `A`, holding the question and the answer. (You can build knowledge-base files from content exported by `FastGPT`.)
111 | 4. Restart the `docker` containers: `docker-compose restart`.
112 |
113 | ![Excel](img/excel.png)
114 |
115 | **Q:** Can I store my own project in the `RAG` knowledge base?
116 |
117 | **A:** For now you have to prepare it yourself: ask GPT, or write the questions by hand; alternatively, take one of the "AI code review" projects online, change its `prompt`, and have it generate a `RAG` knowledge base for your own project.
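For a sense of what the ingestion boils down to, here is a minimal Python sketch that loads the two-column Q/A workbooks into Meilisearch. The index name `rag`, the document field names, and the use of the `meilisearch` and `openpyxl` packages are assumptions for illustration; the real ingestion is done by the dockerized `websocket-redis` service and may differ.

```python
# Hypothetical sketch: load Q/A pairs from data/*.xlsx into Meilisearch.
# The index name "rag" and the document shape are assumptions; the real
# ingestion lives in the websocket-redis service.
from pathlib import Path

import meilisearch                   # pip install meilisearch
from openpyxl import load_workbook   # pip install openpyxl

client = meilisearch.Client("http://127.0.0.1:7700")  # MEILISEARCH_HOST/PORT from .env
index = client.index("rag")

docs, doc_id = [], 0
for xlsx in Path("data").rglob("*.xlsx"):  # recurse, matching the behavior described above
    sheet = load_workbook(xlsx, read_only=True).active
    for q, a in sheet.iter_rows(min_row=1, max_col=2, values_only=True):
        if q and a:  # skip empty rows
            docs.append({"id": doc_id, "question": str(q), "answer": str(a)})
            doc_id += 1

index.add_documents(docs)  # Meilisearch indexes asynchronously
print(f"queued {len(docs)} Q/A pairs for indexing")
```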
118 |
119 | ---
120 |
121 | ### Terminal demo
122 |
123 | The clip below: a Bilibili video is played to imitate the system audio; `Interview` picks up the `Redis`-related content.
124 |
125 | The `Rookie` user answers `我不知道` ("I don't know") into the microphone.
126 |
127 | `ChatGPT (the LLM assistant)` streams out the answer to the question.
128 |
129 | ![Demo](img/img.gif)
130 |
131 | ---
132 |
133 | ### Workflow demo
134 |
135 | The image below: `interview (system audio)` asks a question about `hashing`.
136 |
137 | `ChatGPT (the LLM assistant)` first gives a brief answer: hash tables, hash functions, hash collisions.
138 |
139 | It then answers each of the three points in targeted detail.
140 |
141 | `Streaming output` keeps the `output speed` up.
142 |
143 | ![Audio Configuration](img/image.png)
144 |
145 | ---
146 |
147 | ## Features
148 |
149 | - **Audio source handling**: reads system audio as the `interviewer` source and the microphone as the `Rookie` source, cleanly separating input and output.
150 | - **Streaming output**: streams the `interviewer`, `Rookie`, and `ChatGPT` sides of the dialog.
151 | - **Custom settings**: configurable dialog depth and printout; control the maximum number of records kept for `interviewer`, `Rookie`, and `ChatGPT`.
152 | - **Prompt workflow**: processes all files in the `prompt` folder in a preset workflow order. The current workflow replies with a quick summary first, then answers each point in targeted detail.
153 | - **Saving dialog records**: run `python interview/SaveFile.py` to save the dialog records as a Markdown file (a sketch of the export follows this list).
154 | - **openai-style API support**: `ChatGPT`, `Oaipro`, `Deepseek`, `通义千问`, and any API converted to the `openai` format through `newApi` or `OneApi`.
155 | - **Web display**: streams the `Interview Dialog` to a web page; supports `local` and `server` deployment; the `flex` layout displays well on `phones`, `tablets`, and `desktops`.
156 | - **External RAG knowledge base**: store your prepared questions and answers in a `RAG` knowledge base; before `ChatGPT` is asked, related entries are retrieved from `RAG` as supporting data, and the `number of supporting entries` is configurable.
157 | - **Graceful shutdown**: when the `program` exits, a `dialogs_output_YYMMDDHHMM.md` file is generated automatically.
158 |
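The export feature needs very little machinery, since the dialog lists live in Redis (see `interview/DialogManager.py`). A simplified sketch of what such an export involves; the Redis key names match the code, while the exact output formatting here is illustrative:

```python
# Simplified sketch of a SaveFile-style export: read the three Redis dialog
# lists and dump them into dialogs_output_YYYYMMDDHHMM.md. Key names match
# DialogManager; the output formatting is illustrative.
import json
import time

import redis

r = redis.StrictRedis(host="localhost", port=6379, decode_responses=True)

entries = []
for key in ("dialog_manager:interviewer",
            "dialog_manager:rookie",
            "dialog_manager:chatgpt"):
    entries += [json.loads(item) for item in r.lrange(key, 0, -1)]

entries.sort(key=lambda e: e["time"])  # entries carry "[HH:MM:SS]" timestamps

filename = time.strftime("dialogs_output_%Y%m%d%H%M.md")
with open(filename, "w", encoding="utf-8") as f:
    for e in entries:
        f.write(f"{e['time']} {e['list_name']} :\n\n{e['text']}\n\n")
print(f"saved {len(entries)} entries to {filename}")
```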
159 | ## Installation
160 |
161 | ### Recommended installation
162 |
163 | #### 1. Clone the repository
164 |
165 | ```bash
166 | git clone https://github.com/AowerDmax/Simple-Interview-Audio-Processing.git
167 |
168 | cd Simple-Interview-Audio-Processing
169 | ```
170 |
171 | #### 2. Start the services
172 |
173 | Start the services with `docker-compose`:
174 |
175 | ```bash
176 | docker-compose up -d
177 | ```
178 |
179 | #### 3. Set up the docker containers
180 |
181 | ```bash
182 | sudo chmod +x start_funasr_services.sh
183 | ./start_funasr_services.sh
184 | ```
185 |
186 | #### 4. Install the dependencies
187 |
188 | If you are new to `Poetry`, here is a short introduction: [Poetry primer](poetry.md)
189 |
190 | Install the dependencies with Poetry:
191 |
192 | ```bash
193 | poetry install
194 | ```
195 |
196 | Enter the virtual environment:
197 |
198 | ```bash
199 | poetry shell
200 | ```
201 |
202 | #### 5. Configure the `.env` file
203 |
204 | Copy the template file and adjust it as needed:
205 |
206 | ```bash
207 | cp env.template .env
208 | ```
209 |
210 | Focus on `AGGREGATE_DEVICE_INDEX`, `MIC_DEVICE_INDEX`, `RAG_ENABLED`, and the GPT `baseurl` and `API` settings.
211 |
212 | `MEILISEARCH_DEEP` sets how many supporting entries the `RAG` search returns.
213 |
214 | `ROOKIE_DIALOG_LEN`, `CHATGPT_DIALOG_LEN`, and `INTERVIEWER_DIALOG_LEN` set how many messages of each kind are shown in the `terminal`; the dialog history passed to `GPT` follows the same limits.
215 |
216 | #### 6. Run the project
217 |
218 | Run the main program:
219 |
220 | ```bash
221 | python interview/main.py
222 | ```
223 |
224 | ### Manual installation
225 |
226 | #### 1. Start the services
227 |
228 | Start the services with `docker-compose`:
229 |
230 | ```bash
231 | docker-compose up -d
232 | ```
233 |
234 | #### FunASR Interview service
235 |
236 | Enter the Docker container:
237 |
238 | ```bash
239 | docker-compose exec funasr_interview bash
240 | ```
241 |
242 | Inside the Docker container, start the service with:
243 |
244 | ```bash
245 | cd FunASR/runtime
246 | nohup bash run_server_2pass.sh \
247 | --model-dir damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-onnx \
248 | --online-model-dir damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online-onnx \
249 | --vad-dir damo/speech_fsmn_vad_zh-cn-16k-common-onnx \
250 | --punc-dir damo/punc_ct-transformer_zh-cn-common-vad_realtime-vocab272727-onnx \
251 | --lm-dir damo/speech_ngram_lm_zh-cn-ai-wesp-fst \
252 | --itn-dir thuduj12/fst_itn_zh \
253 | --certfile 0 \
254 | --hotword ../../hotwords.txt > log.txt 2>&1 &
255 | ```
256 |
257 | #### FunASR Rookie service
258 |
259 | Enter the Docker container:
260 |
261 | ```bash
262 | docker-compose exec funasr_rookie bash
263 | ```
264 |
265 | Inside the Docker container, start the service with:
266 |
267 | ```bash
268 | cd FunASR/runtime
269 | nohup bash run_server_2pass.sh \
270 | --model-dir damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-onnx \
271 | --online-model-dir damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online-onnx \
272 | --vad-dir damo/speech_fsmn_vad_zh-cn-16k-common-onnx \
273 | --punc-dir damo/punc_ct-transformer_zh-cn-common-vad_realtime-vocab272727-onnx \
274 | --lm-dir damo/speech_ngram_lm_zh-cn-ai-wesp-fst \
275 | --itn-dir thuduj12/fst_itn_zh \
276 | --certfile 0 \
277 | --hotword ../../hotwords.txt > log.txt 2>&1 &
278 | ```
279 |
280 | #### 2. Install the dependencies
281 |
282 | Install the dependencies with Poetry:
283 |
284 | ```bash
285 | poetry install
286 | ```
287 |
288 | Enter the virtual environment:
289 |
290 | ```bash
291 | poetry shell
292 | ```
293 |
294 | #### 3. Configure the `.env` file
295 |
296 | Copy the template file and adjust it as needed:
297 |
298 | ```bash
299 | cp env.template .env
300 | ```
301 |
302 | Focus on `AGGREGATE_DEVICE_INDEX`, `MIC_DEVICE_INDEX`, and the GPT `baseurl` and `API` settings.
303 |
304 | #### 4. Run the project
305 |
306 | Run the main program:
307 |
308 | ```bash
309 | python interview/main.py
310 | ```
311 |
312 | ## Audio Configuration on macOS and Windows
313 |
314 | ### Capturing audio with BlackHole (macOS)
315 |
316 | On macOS, BlackHole is a virtual audio driver that routes audio seamlessly between applications. Configuration steps:
317 |
318 | 1. **Create an Aggregate Device**:
319 |
320 | - Open the **Audio MIDI Setup** application.
321 | - Create an Aggregate Device that combines **BlackHole 16ch** and your Bluetooth headset.
322 | - Make sure **BlackHole 16ch** is the output device and the Bluetooth headset is the input device.
323 |
324 | 2. **Create a Multi-Output Device**:
325 |
326 | - Create a Multi-Output Device with **BlackHole 16ch** and the Bluetooth headset as outputs.
327 | - Set the Multi-Output Device as the system default output device.
328 |
329 | 3. **Run the audio test**:
330 | - Use `python interview/audioTest.py` to list all audio input/output devices, and make sure the selected output rate is 16K (a minimal device-listing sketch follows).
331 |
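`audioTest.py` itself is not reproduced in this document, but device discovery with PyAudio amounts to only a few lines. A minimal sketch (the real script also records short test clips):

```python
# Minimal device-listing sketch, roughly what interview/audioTest.py starts
# with: enumerate PyAudio devices so you can pick the right values for
# AGGREGATE_DEVICE_INDEX and MIC_DEVICE_INDEX in .env.
import pyaudio

pa = pyaudio.PyAudio()
for i in range(pa.get_device_count()):
    info = pa.get_device_info_by_index(i)
    print(f"Device {i}: {info['name']}")
    print(f"  inputs={info['maxInputChannels']}  outputs={info['maxOutputChannels']}"
          f"  default rate={info['defaultSampleRate']}")
pa.terminate()
```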
332 | ### Capturing audio on Windows
333 |
334 | On Windows you can use similar virtual audio devices, such as **VB-CABLE Virtual Audio Device** or **VoiceMeeter**, to get the same kind of capture that BlackHole provides on macOS. Steps for VB-CABLE:
335 |
336 | 1. **Install VB-CABLE Virtual Audio Device**:
337 |
338 | - Go to the [VB-Audio website](https://vb-audio.com/Cable/) and download the VB-CABLE installer.
339 | - Install VB-CABLE Virtual Audio Device; after installation it appears as a virtual audio device on your system.
340 |
341 | 2. **Configure the audio devices**:
342 |
343 | - Open the **Sound control panel** and go to the **Playback** and **Recording** tabs.
344 | - In **Playback**, set `VB-CABLE Input` as the default playback device; this captures the system audio.
345 | - In **Recording**, select `VB-CABLE Output` as the default recording device so that applications can read the system audio.
346 | - If you also need to capture the microphone, route the microphone into `VB-CABLE Output`, or use VoiceMeeter for more advanced setups.
347 |
348 | 3. **Run the audio test**:
349 |
350 | - Use `python interview/audioTest.py` to list all audio input/output devices and pick suitable recording and playback devices on Windows.
351 | - Make sure the selected devices use a 16K sample rate, for compatibility with the ASR models.
352 |
353 | With these steps you can capture audio for this project on both macOS and Windows.
354 |
355 | ---
356 |
357 | ## changelog
358 |
359 | - 2024.8.29: added multimodal support; GPT is now triggered by hotkeys
360 | - 2024.8.20: web frontend; external knowledge base
361 | - 2024.8.16: ASR speech recognition and GPT querying complete
362 |
363 | Thanks for using this project! If you have any questions or suggestions, feel free to reach out.
364 |
--------------------------------------------------------------------------------
/README_EN.md:
--------------------------------------------------------------------------------
1 | # Simple Interview Audio Processing
2 |
3 | > **Disclaimer**: This project is intended solely for technical practice and should not be used for cheating in interviews or any commercial purposes. The author is not responsible for any legal issues arising from the use of this project. If this project causes any inconvenience, please contact the author for removal.
4 |
5 | This is a simple, audio-stream-based example project designed to demonstrate basic audio processing capabilities. The inspiration for this project comes from the high-priced paid projects available online. This project aims to show that such functionalities do not require complex technical stacks.
6 |
7 | ![Demo](img/img.gif)
8 |
9 | ---
10 |
11 | ![Audio Configuration](img/image.png)
12 |
13 | ## Features
14 |
15 | - **Audio Source Processing**: Capture system audio as the `interviewer` source and microphone audio as the `Rookie` source, accurately distinguishing between input and output.
16 | - **Streaming Output**: Supports streaming dialogue output for `interviewer`, `Rookie`, and `ChatGPT`.
17 | - **Custom Settings**: Customize dialogue depth and print content, and control the maximum number of dialogues for `interviewer`, `Rookie`, and `ChatGPT`.
18 | - **Prompt Workflow**: Processes all files in the `prompt` folder according to a preset workflow order.
19 | - **Save Dialogue Records**: Save dialogue records as Markdown files by running `python interview/SaveFile.py`.
20 |
21 | ## Installation
22 |
23 | ### 1. Start the Services
24 |
25 | Use `docker-compose` to start the services:
26 |
27 | ```bash
28 | docker-compose up -d
29 | ```
30 |
31 | #### FunASR Interview Service
32 |
33 | Enter the Docker container:
34 |
35 | ```bash
36 | docker-compose exec funasr_interview bash
37 | ```
38 |
39 | Inside the Docker container, run the following command to start the service:
40 |
41 | ```bash
42 | cd FunASR/runtime
43 | nohup bash run_server_2pass.sh \
44 | --model-dir damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-onnx \
45 | --online-model-dir damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online-onnx \
46 | --vad-dir damo/speech_fsmn_vad_zh-cn-16k-common-onnx \
47 | --punc-dir damo/punc_ct-transformer_zh-cn-common-vad_realtime-vocab272727-onnx \
48 | --lm-dir damo/speech_ngram_lm_zh-cn-ai-wesp-fst \
49 | --itn-dir thuduj12/fst_itn_zh \
50 | --certfile 0 \
51 | --hotword ../../hotwords.txt > log.txt 2>&1 &
52 | ```
53 |
54 | #### FunASR Rookie Service
55 |
56 | Enter the Docker container:
57 |
58 | ```bash
59 | docker-compose exec funasr_rookie bash
60 | ```
61 |
62 | Inside the Docker container, run the following command to start the service:
63 |
64 | ```bash
65 | cd FunASR/runtime
66 | nohup bash run_server_2pass.sh \
67 | --model-dir damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-onnx \
68 | --online-model-dir damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online-onnx \
69 | --vad-dir damo/speech_fsmn_vad_zh-cn-16k-common-onnx \
70 | --punc-dir damo/punc_ct-transformer_zh-cn-common-vad_realtime-vocab272727-onnx \
71 | --lm-dir damo/speech_ngram_lm_zh-cn-ai-wesp-fst \
72 | --itn-dir thuduj12/fst_itn_zh \
73 | --certfile 0 \
74 | --hotword ../../hotwords.txt > log.txt 2>&1 &
75 | ```
76 |
77 | ### 2. Install Dependencies
78 |
79 | Use Poetry to install dependencies:
80 |
81 | ```bash
82 | poetry install
83 | ```
84 |
85 | Enter the virtual environment:
86 |
87 | ```bash
88 | poetry shell
89 | ```
90 |
91 | ### 3. Configure the `.env` File
92 |
93 | Copy the template file and modify it as needed:
94 |
95 | ```bash
96 | cp env.template .env
97 | ```
98 |
99 | Make sure to adjust `AGGREGATE_DEVICE_INDEX`, `MIC_DEVICE_INDEX`, and the GPT `baseurl` and `API` settings.
100 |
101 | ### 4. Run the Project
102 |
103 | Run the main program:
104 |
105 | ```bash
106 | python interview/main.py
107 | ```
108 |
109 | ## Audio Configuration on macOS and Windows
110 |
111 | ### Using BlackHole for Audio Capture (macOS)
112 |
113 | On macOS, BlackHole is a virtual audio driver that allows seamless audio routing between applications. Here's how to set it up:
114 |
115 | 1. **Configure an Aggregate Device**:
116 |
117 | - Open the **Audio MIDI Setup** application.
118 | - Create an Aggregate Device, selecting **BlackHole 2ch** and your Bluetooth headset.
119 | - Ensure that **BlackHole 2ch** is used as the output device and the Bluetooth headset as the input device.
120 |
121 | 2. **Configure a Multi-Output Device**:
122 |
123 | - Create a Multi-Output Device, selecting **BlackHole 2ch** and the Bluetooth headset as output devices.
124 | - Set the Multi-Output Device as the system's default output device.
125 |
126 | 3. **Run Audio Test**:
127 | - Use `python interview/audioTest.py` to list all audio input and output devices, and ensure that the sample rate is set to 16K (if your hardware cannot, see the conversion sketch below).
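The 16K requirement comes from the FunASR models. If your device only offers 44.1K/48K stereo, that is fine: the client downmixes and resamples before streaming. A stripped-down version of the conversion that `interview/Interviewer.py` performs:

```python
# Stripped-down version of the conversion done in interview/Interviewer.py:
# int16 frames at the device's channel count and rate -> mono int16 at 16 kHz
# for FunASR.
import numpy as np

def to_mono_16k(raw: bytes, channels: int, rate: int, target_rate: int = 16000) -> bytes:
    samples = np.frombuffer(raw, dtype=np.int16)
    mono = samples.reshape(-1, channels).mean(axis=1)  # average the channels
    n_out = int(len(mono) * target_rate / rate)        # linear-interpolation resample
    resampled = np.interp(np.linspace(0, len(mono), n_out),
                          np.arange(len(mono)), mono)
    return resampled.astype(np.int16).tobytes()
```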
128 |
129 | ### Audio Capture on Windows
130 |
131 | On Windows, you can use similar virtual audio devices like **VB-CABLE Virtual Audio Device** or **VoiceMeeter** to achieve functionality similar to BlackHole on macOS. Here's how to use VB-CABLE:
132 |
133 | 1. **Install VB-CABLE Virtual Audio Device**:
134 |
135 | - Visit the [VB-Audio official website](https://vb-audio.com/Cable/) and download the VB-CABLE installer.
136 | - Install VB-CABLE Virtual Audio Device, which will then appear as a virtual audio device on your system.
137 |
138 | 2. **Configure Audio Devices**:
139 |
140 | - Open the **Sound Control Panel** and navigate to the **Playback** and **Recording** tabs.
141 | - In the **Playback** tab, set `VB-CABLE Input` as the default playback device to capture system audio.
142 | - In the **Recording** tab, set `VB-CABLE Output` as the default recording device to allow applications to capture system audio input.
143 | - If you also need to capture microphone audio, set the microphone as an input to `VB-CABLE Output`, or configure advanced settings with VoiceMeeter.
144 |
145 | 3. **Run Audio Test**:
146 |
147 | - Use `python interview/audioTest.py` to list all audio input and output devices on Windows, ensuring that the selected devices are configured correctly for recording and playback.
148 | - Ensure the sample rate is set to 16K to be compatible with the ASR model.
149 |
150 | With these steps, you can easily set up audio capture on both macOS and Windows for use in your project.
151 |
152 | ---
153 |
154 | Thank you for using this project! If you have any questions or suggestions, please feel free to reach out.
155 |
--------------------------------------------------------------------------------
/Windows_user_tutorial.md:
--------------------------------------------------------------------------------
1 | # Windows User tutorial
2 |
3 | This article explains in detail how to deploy this tool on `Windows`.
4 |
5 | The tool has two main parts: `screenshot GPT queries` and `voice-transcript GPT queries`. The good news is that the two modules are completely independent, so if you only need one of them, just follow the matching section of this tutorial.
6 |
7 | First, make sure `Poetry` is installed locally. If it is not, install it with the following command (`in PowerShell`):
8 |
9 | ```bash
10 | (Invoke-WebRequest -Uri https://install.python-poetry.org -UseBasicParsing).Content | py -
11 | ```
12 |
13 | Add `Poetry` to the system environment variables.
14 |
15 | After installation, make sure `Poetry` is on your system path; verify the installation with:
16 |
17 | ```bash
18 | poetry --version
19 | ```
20 |
21 | This prints Poetry's version, confirming the installation succeeded.
22 |
23 | Clone this project and enter the project root directory:
24 |
25 | ```bash
26 | git clone https://github.com/AowerDmax/Simple-Interview-Audio-Processing.git
27 | cd Simple-Interview-Audio-Processing
28 | ```
29 |
30 | In the project root, run the following command to install all project dependencies:
31 |
32 | ```bash
33 | poetry install
34 | ```
35 |
36 | Enter the virtual environment:
37 |
38 | ```bash
39 | poetry shell
40 | ```
41 |
42 | Copy the `.env` file:
43 |
44 | ```
45 | cp env.template .env
46 | ```
47 |
48 | **Note:** you need to run `poetry shell` every time you enter this project; after that, `python xxx.py` runs `python` files inside the virtual environment.
49 |
50 | Start the `docker` containers:
51 |
52 | ```
53 | docker-compose up -d
54 | # If later steps fail, the problem may be here; troubleshoot with the commands below
55 | docker-compose down
56 | docker-compose up
57 | # Check the logs to see which service failed to start
58 | docker-compose restart
59 | docker-compose restart <服务名>
60 | ```
61 |
62 | ## Voice-transcript GPT queries
63 |
64 | This part needs:
65 |
66 | 1. An `api` key and `Base_url` in `openai` format (`deepseek`, `oaipro`, `通义千问`); see the request sketch after this list for what this format looks like on the wire
67 | 2. Alibaba's `funASR` for real-time speech recognition
68 | 3. [Virtual audio cable](https://pan.baidu.com/s/1yWoZ7dErbjsdQjI5Dtj5-w) (extraction code: ib9p) for separating and sampling the audio.
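`interview/RequestSender.py` is the project's real client and is not reproduced in this document; the sketch below is only a hypothetical minimal call showing what "openai format" concretely means. The URL, key, and model are the placeholder values from `.env` (`CHATGPT_BASE_URL`, `AUTHORIZATION`, `MODEL`).

```python
# Hypothetical minimal "openai-format" chat request, using the .env values.
# The project's real client (interview/RequestSender.py) streams responses;
# this sketch only illustrates the wire format.
import requests

BASE_URL = "https://api.oaipro.com/v1/chat/completions"  # CHATGPT_BASE_URL
API_KEY = "sk-xxxxx"                                     # AUTHORIZATION

payload = {
    "model": "gpt-4o",                                   # MODEL
    "messages": [{"role": "user", "content": "Why use a message queue?"}],
    "stream": False,
}
resp = requests.post(BASE_URL,
                     headers={"Authorization": f"Bearer {API_KEY}"},
                     json=payload, timeout=60)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```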
69 |
70 | #### Virtual audio cable setup
71 |
72 | ##### Download and install
73 |
74 | [Virtual audio cable](https://pan.baidu.com/s/1yWoZ7dErbjsdQjI5Dtj5-w) (extraction code: ib9p)
75 |
76 | Install the version matching your operating system's architecture.
77 |
78 | ##### Configure Virtual audio cable
79 |
80 | In the sound settings, do the following:
81 |
82 | ![2024-09-08-16-43-48](asserts/2024-09-08-16-43-48.png)
83 | ![2024-09-08-16-43-56](asserts/2024-09-08-16-43-56.png)
84 | ![2024-09-08-16-44-01](asserts/2024-09-08-16-44-01.png)
85 |
86 | Find `Audio Repeater` in `Programs (win)`:
87 |
88 | ![2024-09-08-16-44-07](asserts/2024-09-08-16-44-07.png)
89 |
90 | Now test that your speakers can play sound and that your microphone input works.
91 |
92 | #### Test the environment
93 |
94 | Use `python interview/audioTest.py` to list all audio input and output devices.
95 |
96 | ```
97 | ❯ python interview/audioTest.py
98 | Device 0: Microsoft Sound Mapper - Input
99 | Device 1: 麦克风 (Realtek(R) Audio)
100 | Device 2: CABLE Output (VB-Audio Virtual
101 | Device 3: Line 1 (Virtual Audio Cable)
102 | Device 4: Microsoft Sound Mapper - Output
103 | Device 5: Line 1 (Virtual Audio Cable)
104 | Device 6: CABLE Input (VB-Audio Virtual C
105 | Device 7: 扬声器 (Realtek(R) Audio)
106 | Device 8: 主声音捕获驱动程序
107 | Device 9: 麦克风 (Realtek(R) Audio)
108 | Device 10: CABLE Output (VB-Audio Virtual Cable)
109 | Device 11: Line 1 (Virtual Audio Cable)
110 | Device 12: 主声音驱动程序
111 | Device 13: Line 1 (Virtual Audio Cable)
112 | Device 14: CABLE Input (VB-Audio Virtual Cable)
113 | Device 15: 扬声器 (Realtek(R) Audio)
114 | Device 16: Line 1 (Virtual Audio Cable)
115 | Device 17: CABLE Input (VB-Audio Virtual Cable)
116 | Device 18: 扬声器 (Realtek(R) Audio)
117 | Device 19: CABLE Output (VB-Audio Virtual Cable)
118 | Device 20: Line 1 (Virtual Audio Cable)
119 | Device 21: 麦克风 (Realtek(R) Audio)
120 | Device 22: 立体声混音 (Realtek HD Audio Stereo input)
121 | Device 23: 麦克风 (Realtek HD Audio Mic input)
122 | Device 24: Speakers (Realtek HD Audio output)
123 | Device 25: Mic 1 (Virtual Cable 1)
124 | Device 26: Line Out (Virtual Cable 1)
125 | Device 27: CABLE Output (VB-Audio Point)
126 | Device 28: Speakers (VB-Audio Point)
127 | Device 1:
128 | Name: 麦克风 (Realtek(R) Audio)
129 | Max Input Channels: 2
130 | Max Output Channels: 0
131 | Default Sample Rate: 44100.0
132 | Device 3:
133 | Name: Line 1 (Virtual Audio Cable)
134 | Max Input Channels: 8
135 | Max Output Channels: 0
136 | Default Sample Rate: 44100.0
137 | Recording...
138 | Recording finished and saved to files.
139 | ```
140 |
141 | Here I pick `Device 1` as the microphone input and `Device 3` as the system-audio input.
142 |
143 | The first run usually fails; from the listed audio input devices, pick the matching microphone input and system-audio input.
144 | After the script runs, `system_audio.wav` and `mic_audio.wav` are created in the project root, recording the `system audio` and the `microphone audio` respectively. (Note: the script reads both inputs while it runs, so make sure both actually receive signal.)
145 | Play these files to check that your setup works; the sketch below shows the same check done by hand.
146 |
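If you want to reproduce that check manually, the core of it is two input streams and two WAV writers. A sketch; the durations, rates, and the device indices (1 = mic, 3 = Line 1) are assumptions taken from the listing above and will differ on your machine:

```python
# Sketch of the recording check: capture a few seconds from the two selected
# devices and write mic_audio.wav / system_audio.wav. Duration, rate, channel
# count, and the device indices are illustrative assumptions.
import wave

import pyaudio

def record(device_index: int, filename: str, seconds: int = 5,
           rate: int = 44100, channels: int = 2) -> None:
    pa = pyaudio.PyAudio()
    stream = pa.open(format=pyaudio.paInt16, channels=channels, rate=rate,
                     input=True, input_device_index=device_index,
                     frames_per_buffer=1024)
    frames = [stream.read(1024) for _ in range(int(rate / 1024 * seconds))]
    stream.stop_stream()
    stream.close()
    with wave.open(filename, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(pa.get_sample_size(pyaudio.paInt16))  # 2 bytes for int16
        wf.setframerate(rate)
        wf.writeframes(b"".join(frames))
    pa.terminate()

record(1, "mic_audio.wav")      # Device 1: 麦克风 (Realtek(R) Audio)
record(3, "system_audio.wav")   # Device 3: Line 1 (Virtual Audio Cable)
```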
147 | #### Configure the .env file
148 |
149 | In the `.env` file:
150 |
151 | Focus on `AGGREGATE_DEVICE_INDEX`, `MIC_DEVICE_INDEX`, `RAG_ENABLED`, and the GPT `baseurl` and `API` settings.
152 |
153 | `MEILISEARCH_DEEP` sets how many supporting entries the `RAG` search returns.
154 |
155 | `ROOKIE_DIALOG_LEN`, `CHATGPT_DIALOG_LEN`, and `INTERVIEWER_DIALOG_LEN` set how many messages of each kind are shown in the `terminal`; the dialog history passed to `GPT` follows the same limits.
156 |
157 | #### Configure docker
158 |
159 | ##### FunASR Interview service
160 |
161 | Enter the Docker container:
162 |
163 | ```bash
164 | docker-compose exec funasr_interview bash
165 | ```
166 |
167 | Inside the Docker container, start the service with:
168 |
169 | ```bash
170 | cd FunASR/runtime
171 |
172 |
173 | nohup bash run_server_2pass.sh \
174 | --model-dir damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-onnx \
175 | --online-model-dir damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online-onnx \
176 | --vad-dir damo/speech_fsmn_vad_zh-cn-16k-common-onnx \
177 | --punc-dir damo/punc_ct-transformer_zh-cn-common-vad_realtime-vocab272727-onnx \
178 | --lm-dir damo/speech_ngram_lm_zh-cn-ai-wesp-fst \
179 | --itn-dir thuduj12/fst_itn_zh \
180 | --certfile 0 \
181 | --hotword ../../hotwords.txt > log.txt 2>&1 &
182 | ```
183 |
184 | ##### FunASR Rookie service
185 |
186 | Enter the Docker container:
187 |
188 | ```bash
189 | docker-compose exec funasr_rookie bash
190 | ```
191 |
192 | Inside the Docker container, start the service with:
193 |
194 | ```bash
195 | cd FunASR/runtime
196 |
197 |
198 | nohup bash run_server_2pass.sh \
199 | --model-dir damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-onnx \
200 | --online-model-dir damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online-onnx \
201 | --vad-dir damo/speech_fsmn_vad_zh-cn-16k-common-onnx \
202 | --punc-dir damo/punc_ct-transformer_zh-cn-common-vad_realtime-vocab272727-onnx \
203 | --lm-dir damo/speech_ngram_lm_zh-cn-ai-wesp-fst \
204 | --itn-dir thuduj12/fst_itn_zh \
205 | --certfile 0 \
206 | --hotword ../../hotwords.txt > log.txt 2>&1 &
207 | ```
208 |
209 | #### Run
210 |
211 | ```
212 | python interview/main.py
213 | ```
214 |
215 | Press `Ctrl + Alt + h` to query `GPT`.
216 |
217 | ## Screenshot GPT queries
218 |
219 | This part needs:
220 |
221 | 1. An `api` key and `Base_url` from `openai` (or an official relay). (`deepseek` and the other text-only models will not work here; `multimodal` support is required.)
222 | 2. An `OSS` such as `CloudFlare`'s `R2` (used as an image host; free image hosts may be supported later). **Note: it must not be an OSS that only allows access from mainland China; ChatGPT in the US must be able to reach it.**
223 |
224 | [R2 setup tutorial (found online)](https://blog.huacai.one/post/3)
225 |
226 | In the `.env` file:
227 |
228 | ```
229 | # R2 Access Key ID
230 | # This is your R2 account's Access Key ID, used to authenticate API requests.
231 | # R2 Secret Access Key
232 | # This is your R2 account's Secret Access Key, used along with the Access Key ID to authenticate API requests.
233 | # R2 Bucket Name
234 | # The name of the R2 bucket you wish to interact with. This is where your data will be stored.
235 | # R2 Region
236 | # The region where your R2 bucket is located. For example, 'us-west-1'.
237 | # R2 Endpoint URL
238 | # The endpoint URL for accessing the R2 service. This typically includes the protocol and the domain name.
239 | # For example: https://your-endpoint-url.com
240 | # R2 Custom Domain
241 | # If you have a custom domain configured for accessing your R2 bucket, specify it here. It can be the same as the R2 Endpoint URL.
242 | R2_ACCESS_KEY_ID=your-access-key-id-here
243 | R2_SECRET_ACCESS_KEY=your-secret-access-key-here
244 | R2_BUCKET_NAME=your-bucket-name-here
245 | R2_REGION=your-region-here
246 | R2_ENDPOINT_URL=https://your-endpoint-url-here
247 | R2_CUSTOM_DOMAIN=your-custom-domain-here
248 | ```
249 |
250 | ### Usage
251 |
252 | ```
253 | python interview/main.py
254 | ```
255 |
256 | Press `Ctrl + Alt + a` to query with the `algorithm Prompt`.
257 |
258 | Press `Ctrl + Alt + p` to query with the `personality Prompt`.
259 |
260 | Press `Ctrl + Alt + g` to query with the `general Prompt`.
261 |
262 | Press `Ctrl + Alt + l` to query with the `long_screenshot Prompt` (stitched screenshots).
263 |
264 | Press `Ctrl + Alt + f` to query with the `fix Prompt`.
265 |
266 | Press `Ctrl + Alt + o` to query with the `ocr Prompt`.
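The shortcuts above are defined in `config.json`. A hypothetical sketch of wiring such a config to global hotkeys, assuming the `pynput` library; the project's real listener (presumably in `interview/main.py`) is not reproduced in this document:

```python
# Hypothetical global-hotkey wiring for the shortcuts in config.json,
# assuming the pynput library. The real handlers take screenshots and
# call GPT; here we only log the trigger.
import json

from pynput import keyboard

with open("config.json", encoding="utf-8") as f:
    shortcuts = json.load(f)["shortcuts"]

def make_handler(name):
    def handler():
        print(f"shortcut triggered: {name}")
    return handler

# config.json stores key lists such as ["<ctrl>", "<alt>", "a"];
# pynput's GlobalHotKeys expects the joined form "<ctrl>+<alt>+a".
hotkeys = {"+".join(keys): make_handler(name) for name, keys in shortcuts.items()}

with keyboard.GlobalHotKeys(hotkeys) as listener:
    listener.join()
```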
267 |
268 | ## QA
269 |
270 | Some problems you may run into, and their fixes:
271 |
272 | ### FunASR
273 |
274 | This is the `real-time speech-to-text` service provided by `Alibaba`, but it only supports mono audio at a 16k sample rate. Many computers do not support that configuration, so I `resample`; just make sure your `.env` contains the correct values.
275 | Test with `python interview/audioTest.py`.
276 |
277 | **Q:** Error: 2024-08-17 23:50:04,377 - modelscope - WARNING - Downloading: TLG.fst failed, reason: ('Connection broken: IncompleteRead(1164614 bytes read, 166607546 more expected)', IncompleteRead(1164614 bytes read, 166607546 more expected)) will retry
278 |
279 | **A:** `modelScope`'s download service hit a problem, likely because you are on a network outside China, which makes the `modelScope` model downloads fail. If this happens, in the `docker` step
280 | use the following command:
281 |
282 | ```bash
283 | docker-compose exec <service-name> bash
284 | ```
285 |
286 | Inside the Docker container, start the service with:
287 |
288 | ```bash
289 | cd FunASR/runtime
290 |
291 |
292 | bash run_server_2pass.sh \
293 | --model-dir damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-onnx \
294 | --online-model-dir damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online-onnx \
295 | --vad-dir damo/speech_fsmn_vad_zh-cn-16k-common-onnx \
296 | --punc-dir damo/punc_ct-transformer_zh-cn-common-vad_realtime-vocab272727-onnx \
297 | --lm-dir damo/speech_ngram_lm_zh-cn-ai-wesp-fst \
298 | --itn-dir thuduj12/fst_itn_zh \
299 | --certfile 0 \
300 | --hotword ../../hotwords.txt
301 | ```
302 |
303 | Look at the concrete error and network-failure messages and fix accordingly.
304 |
305 | ### websocket
306 |
307 | **Q:** The websocket connection fails, or the web page stays unreachable for a long time.
308 |
309 | **A:** Check whether the docker containers and the local machine are in different countries; that causes this. Deploy on a LAN, or within the same region, if you can.
310 |
--------------------------------------------------------------------------------
/asserts/2024-09-08-14-37-55.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AowerDmax/Simple-Interview-Audio-Processing/4857f35a8356854b611e33952528ce7c834353df/asserts/2024-09-08-14-37-55.png
--------------------------------------------------------------------------------
/asserts/2024-09-08-14-39-00.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AowerDmax/Simple-Interview-Audio-Processing/4857f35a8356854b611e33952528ce7c834353df/asserts/2024-09-08-14-39-00.png
--------------------------------------------------------------------------------
/asserts/2024-09-08-14-40-02.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AowerDmax/Simple-Interview-Audio-Processing/4857f35a8356854b611e33952528ce7c834353df/asserts/2024-09-08-14-40-02.png
--------------------------------------------------------------------------------
/asserts/2024-09-08-16-43-48.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AowerDmax/Simple-Interview-Audio-Processing/4857f35a8356854b611e33952528ce7c834353df/asserts/2024-09-08-16-43-48.png
--------------------------------------------------------------------------------
/asserts/2024-09-08-16-43-56.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AowerDmax/Simple-Interview-Audio-Processing/4857f35a8356854b611e33952528ce7c834353df/asserts/2024-09-08-16-43-56.png
--------------------------------------------------------------------------------
/asserts/2024-09-08-16-44-01.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AowerDmax/Simple-Interview-Audio-Processing/4857f35a8356854b611e33952528ce7c834353df/asserts/2024-09-08-16-44-01.png
--------------------------------------------------------------------------------
/asserts/2024-09-08-16-44-07.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AowerDmax/Simple-Interview-Audio-Processing/4857f35a8356854b611e33952528ce7c834353df/asserts/2024-09-08-16-44-07.png
--------------------------------------------------------------------------------
/config.json:
--------------------------------------------------------------------------------
1 | {
2 |   "shortcuts": {
3 |     "algorithm": ["<ctrl>", "<alt>", "a"],
4 |     "personality": ["<ctrl>", "<alt>", "p"],
5 |     "general": ["<ctrl>", "<alt>", "g"],
6 |     "long_screenshot": ["<ctrl>", "<alt>", "l"],
7 |     "help": ["<ctrl>", "<alt>", "h"],
8 |     "fix": ["<ctrl>", "<alt>", "f"],
9 |     "ocr": ["<ctrl>", "<alt>", "o"],
10 |     "exit": ["<ctrl>", "c"]
11 |   },
12 |   "save_dir": "./screenshots",
13 |   "scroll_delay": 1.5,
14 |   "scroll_amount": 17,
15 |   "max_screenshots": 3
16 | }
--------------------------------------------------------------------------------
/data/database.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AowerDmax/Simple-Interview-Audio-Processing/4857f35a8356854b611e33952528ce7c834353df/data/database.xlsx
--------------------------------------------------------------------------------
/dialogs_output_202408201256.md:
--------------------------------------------------------------------------------
1 | [12:55:40] 🎤 interviewer :
2 |
3 | 你为什么要使用redis呢
4 |
5 | [12:55:48] 🤖 chatgpt :
6 |
7 | ```markdown
8 | 为了提高多进程间通信的效率和速度。
9 | ```
10 |
11 | [12:56:15] 🤖 chatgpt :
12 |
13 | 在技术面试中,当被问及为何选择使用Redis时,我的回答会包含以下几个方面:
14 |
15 | ### 1. Redis的基本特性
16 | Redis(Remote Dictionary Server)是一个开源的内存数据库,它以键值对的形式存储数据,并支持多种数据结构,如字符串、哈希、列表、集合、有序集合等。Redis的主要特点是数据存储在内存中,这使得其访问速度非常快,通常可以达到微秒级别的响应时间。
17 |
18 | ### 2. 多进程间通信的需求
19 | 在现代应用开发中,多进程架构是常见的设计模式,特别是在需要处理大量并发请求的场景下。多进程间通信(IPC)是确保这些进程能够高效、可靠地交换信息的关键。传统的IPC方法,如文件、管道、消息队列等,虽然可行,但在性能和实时性上往往不能满足高并发系统的需求。
20 |
21 | ### 3. Redis在多进程间通信中的应用
22 | Redis因其高性能和丰富的数据结构,非常适合作为多进程间通信的中间件。例如,在一个分布式系统中,不同的服务进程可能需要共享某些状态或数据。使用Redis,这些进程可以快速地读写共享数据,而无需担心数据一致性和并发控制的问题。
23 |
24 | ### 4. 具体应用场景举例
25 | 假设我们有一个在线游戏服务器,它需要处理成千上万的玩家请求。为了提高处理能力,服务器采用了多进程架构,每个进程负责处理一部分玩家请求。在这种情况下,Redis可以用来存储玩家的实时状态(如位置、分数等),这样即使玩家请求被分配到不同的进程处理,也能保证状态的一致性和实时更新。
26 |
27 | ### 5. Redis的优缺点
28 | **优点:**
29 | - **高性能:** 内存存储,快速读写。
30 | - **丰富的数据结构:** 支持多种数据类型,便于处理复杂的数据交互。
31 | - **原子操作:** 所有操作都是原子的,适合并发环境。
32 | - **持久化:** 支持数据持久化,防止数据丢失。
33 |
34 | **缺点:**
35 | - **内存限制:** 由于数据存储在内存中,因此存储容量受限于物理内存大小。
36 | - **成本:** 内存数据库的成本通常高于磁盘数据库。
37 |
38 | ### 6. 
结论 39 | 综上所述,Redis因其高性能和丰富的数据结构,在多进程间通信中扮演了重要角色。虽然存在一些限制,但在大多数高并发、实时性要求高的应用场景中,Redis都是一个优秀的选择。在实际应用中,我们需要根据具体的业务需求和资源情况,权衡利弊,做出合适的技术选型。 -------------------------------------------------------------------------------- /docker-compose.yml: -------------------------------------------------------------------------------- 1 | version: "3.8" 2 | 3 | services: 4 | app: 5 | image: ghcr.io/aowerdmax/websocket-redis/websocket_redis:latest 6 | ports: 7 | - "${GO_APP_PORT}:${GO_APP_PORT}" 8 | volumes: 9 | - ./data:/app/data 10 | - ./.env:/app/.env 11 | depends_on: 12 | - meilisearch 13 | - redis 14 | env_file: 15 | - .env 16 | environment: 17 | MEILISEARCH_HOST: "meilisearch" 18 | MEILISEARCH_PORT: 7700 19 | REDIS_HOST: "redis" 20 | REDIS_PORT: 6379 21 | restart: on-failure 22 | networks: 23 | - app_network 24 | 25 | meilisearch: 26 | image: getmeili/meilisearch:latest 27 | ports: 28 | - "${MEILISEARCH_PORT}:7700" 29 | environment: 30 | MEILI_ENV: "development" 31 | networks: 32 | - app_network 33 | 34 | funasr_interview: 35 | image: registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-online-cpu-0.1.10 36 | container_name: funasr_interview 37 | ports: 38 | - "${INTERVIEWER_PORT}:10095" 39 | volumes: 40 | - ./funasr-runtime-resources/models_interview:/workspace/models 41 | privileged: true 42 | stdin_open: true 43 | tty: true 44 | networks: 45 | - app_network 46 | 47 | funasr_rookie: 48 | image: registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-online-cpu-0.1.10 49 | container_name: funasr_rookie 50 | ports: 51 | - "${ROOKIE_PORT}:10095" 52 | volumes: 53 | - ./funasr-runtime-resources/models_rookie:/workspace/models 54 | privileged: true 55 | stdin_open: true 56 | tty: true 57 | networks: 58 | - app_network 59 | 60 | redis: 61 | image: redis:latest 62 | ports: 63 | - "${REDIS_PORT}:6379" 64 | networks: 65 | - app_network 66 | command: ["redis-server", "--appendonly", "yes"] 67 | 68 | networks: 69 | app_network: 70 | -------------------------------------------------------------------------------- /env.template: -------------------------------------------------------------------------------- 1 | CHATGPT_BASE_URL=https://api.oaipro.com/v1/chat/completions 2 | AUTHORIZATION=sk-xxxxx 3 | MODEL=gpt-4o 4 | SSL=0 5 | MODE=2pass 6 | ROOKIE_HOST=localhost 7 | ROOKIE_PORT=10097 8 | INTERVIEWER_HOST=localhost 9 | INTERVIEWER_PORT=10096 10 | AUDIO_FS=48000 11 | AUDIO_CHANNEL=2 12 | INTERVIEWER_WAIT_TIME=5000 13 | ROOKIE_WAIT_TIME=5000 14 | ROOKIE_DIALOG_LEN=1 15 | CHATGPT_DIALOG_LEN=4 16 | INTERVIEWER_DIALOG_LEN=5 17 | REDIS_HOST=localhost 18 | REDIS_PORT=6379 19 | AGGREGATE_DEVICE_INDEX=12 20 | MIC_DEVICE_INDEX=2 21 | MEILISEARCH_HOST=127.0.0.1 22 | MEILISEARCH_PORT=7700 23 | MEILISEARCH_DEEP=1 24 | RAG_ENABLED=True 25 | INTERVAL_TIME=500 26 | DATA_QUEUE_KEYS=dialog_manager:chatgpt,dialog_manager:interviewer,dialog_manager:rookie 27 | GO_APP_HOST=0.0.0.0 28 | GO_APP_PORT=8080 29 | WS_HOST=127.0.0.1 30 | WS_PORT=8080 31 | 32 | # R2 Access Key ID 33 | # This is your R2 account's Access Key ID, used to authenticate API requests. 34 | # R2 Secret Access Key 35 | # This is your R2 account's Secret Access Key, used along with the Access Key ID to authenticate API requests. 36 | # R2 Bucket Name 37 | # The name of the R2 bucket you wish to interact with. This is where your data will be stored. 38 | # R2 Region 39 | # The region where your R2 bucket is located. For example, 'us-west-1'. 40 | # R2 Endpoint URL 41 | # The endpoint URL for accessing the R2 service. 
This typically includes the protocol and the domain name.
42 | # For example: https://your-endpoint-url.com
43 | # R2 Custom Domain
44 | # If you have a custom domain configured for accessing your R2 bucket, specify it here. It can be the same as the R2 Endpoint URL.
45 | R2_ACCESS_KEY_ID=your-access-key-id-here
46 | R2_SECRET_ACCESS_KEY=your-secret-access-key-here
47 | R2_BUCKET_NAME=your-bucket-name-here
48 | R2_REGION=your-region-here
49 | R2_ENDPOINT_URL=https://your-endpoint-url-here
50 | R2_CUSTOM_DOMAIN=your-custom-domain-here
--------------------------------------------------------------------------------
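A hypothetical upload sketch for the R2 variables above: R2 speaks the S3 API, so `boto3` works against it when pointed at the R2 endpoint. The project's real implementation is `interview/R2Uploader.py`, which is not reproduced in this document; the file names and placeholder values here are illustrative.

```python
# Hypothetical R2 upload via boto3 (R2 is S3-compatible). Placeholder values
# mirror env.template; the real code lives in interview/R2Uploader.py.
import boto3

s3 = boto3.client(
    "s3",
    aws_access_key_id="your-access-key-id-here",          # R2_ACCESS_KEY_ID
    aws_secret_access_key="your-secret-access-key-here",  # R2_SECRET_ACCESS_KEY
    region_name="your-region-here",                       # R2_REGION
    endpoint_url="https://your-endpoint-url-here",        # R2_ENDPOINT_URL
)
s3.upload_file("screenshots/screenshot.png", "your-bucket-name-here",
               "screenshot.png", ExtraArgs={"ContentType": "image/png"})

# The public URL passed to the multimodal model is then the custom domain
# (R2_CUSTOM_DOMAIN) plus the object key.
print("https://your-custom-domain-here/screenshot.png")
```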
/img/RAG_1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AowerDmax/Simple-Interview-Audio-Processing/4857f35a8356854b611e33952528ce7c834353df/img/RAG_1.png
--------------------------------------------------------------------------------
/img/RAG_2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AowerDmax/Simple-Interview-Audio-Processing/4857f35a8356854b611e33952528ce7c834353df/img/RAG_2.png
--------------------------------------------------------------------------------
/img/algo_long_response.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AowerDmax/Simple-Interview-Audio-Processing/4857f35a8356854b611e33952528ce7c834353df/img/algo_long_response.png
--------------------------------------------------------------------------------
/img/algo_long_screenShot.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AowerDmax/Simple-Interview-Audio-Processing/4857f35a8356854b611e33952528ce7c834353df/img/algo_long_screenShot.png
--------------------------------------------------------------------------------
/img/excel.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AowerDmax/Simple-Interview-Audio-Processing/4857f35a8356854b611e33952528ce7c834353df/img/excel.png
--------------------------------------------------------------------------------
/img/fix_response.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AowerDmax/Simple-Interview-Audio-Processing/4857f35a8356854b611e33952528ce7c834353df/img/fix_response.png
--------------------------------------------------------------------------------
/img/fix_screenShot.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AowerDmax/Simple-Interview-Audio-Processing/4857f35a8356854b611e33952528ce7c834353df/img/fix_screenShot.png
--------------------------------------------------------------------------------
/img/general_response.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AowerDmax/Simple-Interview-Audio-Processing/4857f35a8356854b611e33952528ce7c834353df/img/general_response.png
--------------------------------------------------------------------------------
/img/general_screenShot.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AowerDmax/Simple-Interview-Audio-Processing/4857f35a8356854b611e33952528ce7c834353df/img/general_screenShot.png
--------------------------------------------------------------------------------
/img/image.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AowerDmax/Simple-Interview-Audio-Processing/4857f35a8356854b611e33952528ce7c834353df/img/image.png
--------------------------------------------------------------------------------
/img/img.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AowerDmax/Simple-Interview-Audio-Processing/4857f35a8356854b611e33952528ce7c834353df/img/img.gif
--------------------------------------------------------------------------------
/img/web.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AowerDmax/Simple-Interview-Audio-Processing/4857f35a8356854b611e33952528ce7c834353df/img/web.gif
--------------------------------------------------------------------------------
/interview/ChatgptManager.py:
--------------------------------------------------------------------------------
1 | import os
2 | import time
3 | import asyncio
4 | from DialogManager import DialogManager
5 | from RequestSender import RequestSender
6 | from Config import Config
7 | from RagManager import RagManager
8 |
9 | class ChatgptManager:
10 |     def __init__(self):
11 |         self.dialog = DialogManager()
12 |         self.request_sender = RequestSender()
13 |         self.prompt_directory = "./workflows/prompt_2"
14 |         self.last_processed_id = 3  # id of the newest interviewer entry already handled
15 |         self.rag = RagManager()
16 |
17 |     async def run_workflow(self):
18 |         files = sorted(os.listdir(self.prompt_directory))
19 |         content_list = []
20 |
21 |         for file_name in files:
22 |             file_path = os.path.join(self.prompt_directory, file_name)
23 |             if file_name.endswith('.md') and os.path.isfile(file_path):
24 |                 with open(file_path, 'r', encoding='utf-8') as file:
25 |                     content = file.read()
26 |
27 |                 content = await self._replace_content(content)
28 |
29 |                 content_list.append({
30 |                     "content": content,
31 |                     "role": "user"
32 |                 })
33 |
34 |                 receiveData = await self.request_sender.send_post_request(content_list)
35 |                 content_list.append({
36 |                     "content": receiveData,
37 |                     "role": "assistant"
38 |                 })
39 |
40 |     async def _replace_content(self, content):
41 |         content = content.replace("${conversation}", self.dialog.get_dialog())
42 |         if Config.RAG_ENABLED:
43 |             rag_answers = await self.rag.query_meilisearch()
44 |             if rag_answers:
45 |                 RAG_content = "\n下面是辅助数据:\n" + "\n".join(rag_answers)
46 |                 content += RAG_content
47 |         return content
48 |
49 |     def should_process(self):
50 |         interviewer_list = self.dialog._get_list(self.dialog.interviewer_key)
51 |
52 |         if not interviewer_list:
53 |             return False
54 |
55 |         last_entry = interviewer_list[-1]
56 |         if last_entry['id'] > self.last_processed_id:
57 |             if last_entry['text'].strip() == "":  # an empty entry marks the end of an utterance (Interviewer adds one on silence)
58 |                 self.last_processed_id = last_entry['id']
59 |                 return True
60 |
61 |         return False
62 |
63 | def chatgpt_process():
64 |     chatgpt_manager = ChatgptManager()
65 |     while True:
66 |         if chatgpt_manager.should_process():
67 |             asyncio.run(chatgpt_manager.run_workflow())
68 |
69 |         time.sleep(Config.INTERVIEWER_WAIT_TIME / 1000)
70 |
71 | if __name__ == "__main__":
72 |     chatgpt_process()
--------------------------------------------------------------------------------
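`ChatgptManager.run_workflow` walks `workflows/prompt_2/*.md` in filename order and substitutes `${conversation}` before each request. The repository's actual step files are not shown in this document; a hypothetical `step1.md` illustrating the expected shape:

```markdown
<!-- Hypothetical workflows/prompt_2/step1.md; the real step files may differ. -->
You are an interview assistant. Below is the transcript so far:

${conversation}

Give a one-paragraph answer to the interviewer's latest question.
```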
/interview/Config.py:
--------------------------------------------------------------------------------
1 | from dotenv import load_dotenv, find_dotenv
2 | import os
3 |
4 | dotenv_path = find_dotenv()
5 | if dotenv_path:
6 |     print(f"Loading .env file from: {dotenv_path}")
7 |     load_dotenv(dotenv_path)
8 | else:
9 |     print("No .env file found")
10 |
11 | class Config:
12 |     ROOKIE_HOST = os.getenv("ROOKIE_HOST", "localhost")
13 |     ROOKIE_PORT = int(os.getenv("ROOKIE_PORT", 10095))
14 |     INTERVIEWER_HOST = os.getenv("INTERVIEWER_HOST", "localhost")
15 |     INTERVIEWER_PORT = int(os.getenv("INTERVIEWER_PORT", 10095))
16 |     CHUNK_SIZE = [int(x) for x in os.getenv("CHUNK_SIZE", "5,10,5").split(",")]
17 |     CHUNK_INTERVAL = int(os.getenv("CHUNK_INTERVAL", 10))
18 |     HOTWORD = os.getenv("HOTWORD", "")
19 |     AUDIO_IN = os.getenv("AUDIO_IN", None)
20 |     AUDIO_FS = int(os.getenv("AUDIO_FS", 48000))
21 |     AUDIO_CHANNEL = int(os.getenv("AUDIO_CHANNEL", 2))
22 |     SEND_WITHOUT_SLEEP = os.getenv("SEND_WITHOUT_SLEEP", "True").lower() in ("true", "1", "yes")
23 |     THREAD_NUM = int(os.getenv("THREAD_NUM", 1))
24 |     WORDS_MAX_PRINT = int(os.getenv("WORDS_MAX_PRINT", 10000))
25 |     OUTPUT_DIR = os.getenv("OUTPUT_DIR", None)
26 |     SSL = int(os.getenv("SSL", 1))  # integer: 1 enables SSL, 0 disables it
27 |     USE_ITN = int(os.getenv("USE_ITN", 1))  # integer: 1 uses ITN, 0 does not
28 |     MODE = os.getenv("MODE", "2pass")
29 |     INTERVIEWER_WAIT_TIME = int(os.getenv("INTERVIEWER_WAIT_TIME", 5000))
30 |     ROOKIE_WAIT_TIME = int(os.getenv("ROOKIE_WAIT_TIME", 5000))
31 |     ROOKIE_DIALOG_LEN = int(os.getenv("ROOKIE_DIALOG_LEN", 1))
32 |     CHATGPT_DIALOG_LEN = int(os.getenv("CHATGPT_DIALOG_LEN", 2))
33 |     INTERVIEWER_DIALOG_LEN = int(os.getenv("INTERVIEWER_DIALOG_LEN", 3))
34 |     REDIS_HOST = os.getenv("REDIS_HOST", "127.0.0.1")
35 |     REDIS_PORT = int(os.getenv("REDIS_PORT", 6379))
36 |     AGGREGATE_DEVICE_INDEX = int(os.getenv("AGGREGATE_DEVICE_INDEX", 13))
37 |     MIC_DEVICE_INDEX = int(os.getenv("MIC_DEVICE_INDEX", 2))
38 |     MEILISEARCH_HOST = os.getenv("MEILISEARCH_HOST", "127.0.0.1")
39 |     MEILISEARCH_PORT = int(os.getenv("MEILISEARCH_PORT", 7700))
40 |     MEILISEARCH_DEEP = int(os.getenv("MEILISEARCH_DEEP", 1))
41 |     RAG_ENABLED = os.getenv("RAG_ENABLED", "False").lower() in ("true", "1", "yes")
42 |     R2_ACCESS_KEY_ID = os.getenv('R2_ACCESS_KEY_ID')
43 |     R2_SECRET_ACCESS_KEY = os.getenv('R2_SECRET_ACCESS_KEY')
44 |     R2_BUCKET_NAME = os.getenv('R2_BUCKET_NAME')
45 |     R2_REGION = os.getenv('R2_REGION')
46 |     R2_ENDPOINT_URL = os.getenv('R2_ENDPOINT_URL')
47 |     R2_CUSTOM_DOMAIN = os.getenv('R2_CUSTOM_DOMAIN')
48 |
--------------------------------------------------------------------------------
/interview/DialogManager.py:
--------------------------------------------------------------------------------
1 | import os
2 | import time
3 | import random
4 | import redis
5 | import json
6 | from multiprocessing import Process
7 | from Config import Config
8 | from redis.exceptions import LockNotOwnedError
9 | import platform
10 |
11 |
12 | class DialogManager:
13 |     _instance = None
14 |
15 |     def __new__(cls, *args, **kwargs):
16 |         if not cls._instance:
17 |             cls._instance = super(DialogManager, cls).__new__(cls)
18 |             cls._instance._initialize()
19 |         return cls._instance
20 |
21 |     def _initialize(self):
22 |         self.redis = redis.StrictRedis(host=Config.REDIS_HOST, port=Config.REDIS_PORT, decode_responses=True)
23 |
24 |         self.interviewer_key = "dialog_manager:interviewer"
25 |         self.rookie_key = "dialog_manager:rookie"
26 |         self.chatgpt_key = "dialog_manager:chatgpt"
27 |         self.next_id = "dialog_manager:next_id"
28 |         self.system = platform.system()
29 |
30 |         self.interviewer_icons = ["🎤"]
31 |         self.rookie_icons = ["😅"]
32 |         self.chatgpt_icons = ["🤖"]
33 |
34 |     def _get_next_id(self):
35 |         return self.redis.incr("dialog_manager:next_id")
36 |
37 |     def _get_icon(self, list_name):
38 |         if list_name == "interviewer":
39 |             return random.choice(self.interviewer_icons)
40 |         elif list_name == "rookie":
41 |             return random.choice(self.rookie_icons)
42 |         elif list_name == "chatgpt":
43 |             return random.choice(self.chatgpt_icons)
44 |
45 |     def _get_valid_entries(self, key, max_len):
46 |         entries = self._get_list(key)
47 |         valid_entries = [item for item in entries if item['text'].strip() != ""]
48 |
49 |         if len(valid_entries) > max_len:
50 |             return valid_entries[-max_len:]
51 |         else:
52 |             return valid_entries
53 |
54 |     def _notify_change(self, out=True):
55 |         interviewer_list = self._get_valid_entries(self.interviewer_key, Config.INTERVIEWER_DIALOG_LEN)
56 |         rookie_list = self._get_valid_entries(self.rookie_key, Config.ROOKIE_DIALOG_LEN)
57 |         chatgpt_list = self._get_valid_entries(self.chatgpt_key, Config.CHATGPT_DIALOG_LEN)
58 |
59 |         combined_list = interviewer_list + rookie_list + chatgpt_list
60 |
61 |         sorted_combined_list = sorted(combined_list, key=lambda x: x['time'])
62 |
63 |         output = []
64 |         for entry in sorted_combined_list:
65 |             icon = self._get_icon(entry['list_name'])
66 |             output.append(f"{entry['time']} {icon} {entry['list_name']} : {entry['text']}")
67 |
68 |         result = "\n".join(output)
69 |         if out:
70 |             if self.system == "Windows":
71 |                 os.system('cls')
72 |             elif self.system == "Darwin" or self.system == "Linux":
73 |                 os.system('clear')
74 |             print(result)
75 |         return result
76 |
77 |     def _get_list(self, key):
78 |         return [json.loads(item) for item in self.redis.lrange(key, 0, -1)]
79 |
80 |     def _add_to_list(self, key, element):
81 |         lock = self.redis.lock("dialog_manager_lock", timeout=10, blocking_timeout=5)  # serialize writers across processes
82 |         try:
83 |             lock.acquire()
84 |             element['id'] = self._get_next_id()
85 |             self.redis.rpush(key, json.dumps(element))
86 |             self._notify_change()
87 |         finally:
88 |             try:
89 |                 lock.release()
90 |             except LockNotOwnedError:
91 |                 pass
92 |
93 |     def _update_last_in_list(self, key, element):
94 |         lock = self.redis.lock("dialog_manager_lock", timeout=10, blocking_timeout=5)
95 |         try:
96 |             lock.acquire()
97 |             current_list = self._get_list(key)
98 |             if current_list:
99 |                 element['id'] = current_list[-1]['id']
100 |                 self.redis.lset(key, -1, json.dumps(element))
101 |             else:
102 |                 element['id'] = self._get_next_id()
103 |                 self.redis.rpush(key, json.dumps(element))
104 |             self._notify_change()
105 |         finally:
106 |             try:
107 |                 lock.release()
108 |             except LockNotOwnedError:
109 |                 pass
110 |
111 |     def get_last_interviewer(self):
112 |         interviewer_list = self._get_valid_entries(self.interviewer_key, 1)
113 |         return interviewer_list[-1] if interviewer_list else None  # None instead of IndexError when empty
114 |
115 |     def get_last_rookie(self):
116 |         rookie_list = self._get_valid_entries(self.rookie_key, 1)
117 |         if rookie_list:
118 |             return rookie_list[-1]
119 |         return None
120 |
121 |     def get_dialog(self):
122 |         interviewer_list = self._get_valid_entries(self.interviewer_key, Config.INTERVIEWER_DIALOG_LEN)
123 |         rookie_list = self._get_valid_entries(self.rookie_key, Config.ROOKIE_DIALOG_LEN)
124 |
125 |         combined_list = interviewer_list + rookie_list
126 |
127 |         sorted_combined_list = sorted(combined_list, key=lambda x: x['time'])
128 |
129 |         output = []
130 |         for entry in sorted_combined_list:
131 |             icon = self._get_icon(entry['list_name'])
132 |             output.append(f"{entry['time']} {icon} {entry['list_name']} : {entry['text']}")
133 |
134 |         result = "\n".join(output)
135 |         return result
136 |
137 |     def add_to_interviewer(self, element):
138 |         current_time = time.strftime("[%H:%M:%S]", time.localtime())
139 |         entry = {'time': current_time, 'list_name': 'interviewer', 'text': element}
140 | 
self._add_to_list(self.interviewer_key, entry) 141 | 142 | def add_to_rookie(self, element): 143 | current_time = time.strftime("[%H:%M:%S]", time.localtime()) 144 | entry = {'time': current_time, 'list_name': 'rookie', 'text': element} 145 | self._add_to_list(self.rookie_key, entry) 146 | 147 | def add_to_chatgpt(self, element): 148 | current_time = time.strftime("[%H:%M:%S]", time.localtime()) 149 | entry = {'time': current_time, 'list_name': 'chatgpt', 'text': element} 150 | self._add_to_list(self.chatgpt_key, entry) 151 | 152 | def update_last_interviewer(self, element): 153 | current_time = time.strftime("[%H:%M:%S]", time.localtime()) 154 | entry = {'time': current_time, 'list_name': 'interviewer', 'text': element} 155 | self._update_last_in_list(self.interviewer_key, entry) 156 | 157 | def update_last_rookie(self, element): 158 | current_time = time.strftime("[%H:%M:%S]", time.localtime()) 159 | entry = {'time': current_time, 'list_name': 'rookie', 'text': element} 160 | self._update_last_in_list(self.rookie_key, entry) 161 | 162 | def update_last_chatgpt(self, element): 163 | current_time = time.strftime("[%H:%M:%S]", time.localtime()) 164 | entry = {'time': current_time, 'list_name': 'chatgpt', 'text': element} 165 | self._update_last_in_list(self.chatgpt_key, entry) 166 | 167 | def clear_all(self): 168 | lock = self.redis.lock("dialog_manager_lock", timeout=10, blocking_timeout=5) 169 | try: 170 | lock.acquire() 171 | self.redis.delete(self.interviewer_key) 172 | self.redis.delete(self.rookie_key) 173 | self.redis.delete(self.chatgpt_key) 174 | self.redis.delete(self.next_id) 175 | print("All keys have been cleared.") 176 | finally: 177 | try: 178 | lock.release() 179 | except LockNotOwnedError: 180 | pass 181 | 182 | 183 | def process_1(): 184 | dialog_manager = DialogManager() 185 | dialog_manager.add_to_interviewer("Question 1") 186 | time.sleep(1) 187 | dialog_manager.add_to_rookie("Answer 1") 188 | time.sleep(1) 189 | dialog_manager.add_to_chatgpt("Suggestion 1") 190 | time.sleep(1) 191 | 192 | def process_2(): 193 | dialog_manager = DialogManager() 194 | dialog_manager.add_to_interviewer("Question 2") 195 | time.sleep(1) 196 | dialog_manager.add_to_rookie("Answer 2") 197 | time.sleep(1) 198 | dialog_manager.add_to_rookie("Answer 3") 199 | time.sleep(1) 200 | dialog_manager.add_to_chatgpt("Suggestion 2") 201 | time.sleep(1) 202 | 203 | def process_3(): 204 | dialog_manager = DialogManager() 205 | dialog_manager.update_last_interviewer("Updated Question 2") 206 | dialog_manager.update_last_rookie("Updated Answer 5") 207 | dialog_manager.update_last_chatgpt("Updated Suggestion 2") 208 | time.sleep(1) 209 | dialog_manager.update_last_interviewer("Question 3 (new)") 210 | dialog_manager.update_last_rookie("Answer 4 (new)") 211 | dialog_manager.update_last_chatgpt("Suggestion 3 (new)") 212 | 213 | if __name__ == "__main__": 214 | p1 = Process(target=process_1) 215 | p2 = Process(target=process_2) 216 | p3 = Process(target=process_3) 217 | 218 | p1.start() 219 | p2.start() 220 | p3.start() 221 | 222 | p1.join() 223 | p2.join() 224 | p3.join() 225 | 226 | print('end') 227 | -------------------------------------------------------------------------------- /interview/Interviewer.py: -------------------------------------------------------------------------------- 1 | import pyaudio 2 | import json 3 | import asyncio 4 | import numpy as np 5 | import websockets 6 | from Config import Config 7 | import os 8 | import time 9 | from queue import Queue 10 | from multiprocessing import 
Process 11 | from DialogManager import DialogManager 12 | from ChatgptManager import ChatgptManager 13 | import multiprocessing 14 | 15 | class Interview: 16 | def __init__(self): 17 | self.voices = [] 18 | self.offline_msg_done = False 19 | self.websocket = None 20 | self.text_print_2pass_offline = "" 21 | self.text_print_2pass_online = "" 22 | self.dialog = DialogManager() 23 | self.chatgpt = ChatgptManager() 24 | 25 | 26 | async def ws_client(self, id="Interview", chunk_begin=0, chunk_size=1): 27 | while True: 28 | for i in range(chunk_begin, chunk_begin + chunk_size): 29 | self.offline_msg_done = False 30 | self.voices = Queue() 31 | 32 | uri = f"ws://{Config.INTERVIEWER_HOST}:{Config.INTERVIEWER_PORT}" 33 | ssl_context = None 34 | 35 | print("Connecting to", uri) 36 | try: 37 | async with websockets.connect(uri, subprotocols=["binary"], ping_interval=None, ssl=ssl_context) as self.websocket: 38 | task1 = asyncio.create_task(self.record_system_voice()) 39 | task2 = asyncio.create_task(self.message(id)) 40 | await asyncio.gather(task1, task2) 41 | except websockets.exceptions.ConnectionClosedError as e: 42 | print(f"WebSocket connection closed with error: {e}") 43 | await asyncio.sleep(3) 44 | except Exception as e: 45 | print(f"Interviewer error occurred: {e}") 46 | import traceback 47 | traceback.print_exc() 48 | await asyncio.sleep(3) 49 | 50 | async def record_system_voice(self): 51 | FORMAT = pyaudio.paInt16 52 | CHANNELS = Config.AUDIO_CHANNEL 53 | RATE = Config.AUDIO_FS 54 | chunk_size = 60 * Config.CHUNK_SIZE[1] / Config.CHUNK_INTERVAL 55 | CHUNK = int(RATE / 1000 * chunk_size) 56 | audio = pyaudio.PyAudio() 57 | TARGET_RATE = 16000 58 | TARGET_CHANNELS = 1 59 | 60 | 61 | while True: 62 | try: 63 | audio = pyaudio.PyAudio() 64 | system_stream = audio.open(format=FORMAT, 65 | channels=CHANNELS, 66 | rate=RATE, 67 | input=True, 68 | input_device_index=Config.AGGREGATE_DEVICE_INDEX, 69 | frames_per_buffer=CHUNK) 70 | break 71 | except OSError as e: 72 | print(f"Error opening audio stream: {e}") 73 | except Exception as e: 74 | print(f"pyaudio error occurred: {e}") 75 | 76 | fst_dict, hotword_msg = self.prepare_hotword_message() 77 | 78 | use_itn = Config.USE_ITN == 1 79 | message = json.dumps({ 80 | "mode": Config.MODE, 81 | "chunk_size": Config.CHUNK_SIZE, 82 | "chunk_interval": Config.CHUNK_INTERVAL, 83 | "wav_name": "system_voice", 84 | "is_speaking": True, 85 | "hotwords": hotword_msg, 86 | "itn": use_itn 87 | }) 88 | 89 | await self.websocket.send(message) 90 | 91 | def convert_to_mono(data, channels): 92 | audio_data = np.frombuffer(data, dtype=np.int16) 93 | mono_data = audio_data.reshape((-1, channels)).mean(axis=1) 94 | return mono_data.astype(np.int16) 95 | 96 | def resample_audio(data, original_rate, target_rate): 97 | audio_data = np.frombuffer(data, dtype=np.int16) 98 | resampled_data = np.interp(np.linspace(0, len(audio_data), int(len(audio_data) * target_rate / original_rate)), 99 | np.arange(len(audio_data)), audio_data) 100 | return resampled_data.astype(np.int16) 101 | 102 | while True: 103 | data = system_stream.read(CHUNK) 104 | 105 | try: 106 | if CHANNELS == TARGET_CHANNELS and RATE == TARGET_RATE: 107 | await self.websocket.send(data) 108 | else: 109 | mono_data = convert_to_mono(data, CHANNELS) 110 | 111 | resampled_data = resample_audio(mono_data, RATE, TARGET_RATE) 112 | 113 | output_data = resampled_data.tobytes() 114 | 115 | await self.websocket.send(output_data) 116 | await asyncio.sleep(0.01) 117 | except websockets.exceptions.ConnectionClosedError as 
e: 118 | print(f"Connection closed with error: {e}") 119 | break 120 | 121 | async def message(self, id): 122 | if Config.OUTPUT_DIR is not None: 123 | ibest_writer = open(os.path.join(Config.OUTPUT_DIR, f"text.{id}"), "a", encoding="utf-8") 124 | else: 125 | ibest_writer = None 126 | try: 127 | while True: 128 | try: 129 | meg = await asyncio.wait_for(self.websocket.recv(), timeout=Config.INTERVIEWER_WAIT_TIME / 1000) 130 | meg = json.loads(meg) 131 | wav_name = meg.get("wav_name", "demo") 132 | text = meg["text"] 133 | timestamp = "" 134 | self.offline_msg_done = meg.get("is_final", False) 135 | 136 | if "timestamp" in meg: 137 | timestamp = meg["timestamp"] 138 | 139 | if ibest_writer is not None: 140 | if timestamp != "": 141 | text_write_line = f"{wav_name}\t{text}\t{timestamp}\n" 142 | else: 143 | text_write_line = f"{wav_name}\t{text}\n" 144 | ibest_writer.write(text_write_line) 145 | 146 | if 'mode' not in meg: 147 | continue 148 | 149 | if meg["mode"] == "2pass-online": 150 | self.text_print_2pass_online += "{}".format(text) 151 | text_print = self.text_print_2pass_offline + self.text_print_2pass_online 152 | text_print = text_print[-Config.WORDS_MAX_PRINT:] 153 | self.dialog.update_last_interviewer(text_print) 154 | else: 155 | self.text_print_2pass_online = "" 156 | text_print = self.text_print_2pass_offline + "{}".format(text) 157 | text_print = text_print[-Config.WORDS_MAX_PRINT:] 158 | self.dialog.update_last_interviewer(text_print) 159 | self.text_print_2pass_offline += "{}".format(text) 160 | except asyncio.TimeoutError: 161 | if self.text_print_2pass_offline: 162 | self.text_print_2pass_offline = "" 163 | self.dialog.add_to_interviewer("") 164 | except Exception as e: 165 | print("Interview receive ws message Exception:", e) 166 | 167 | 168 | def prepare_hotword_message(self): 169 | fst_dict = {} 170 | hotword_msg = "" 171 | if Config.HOTWORD.strip(): 172 | with open(Config.HOTWORD) as f_scp: 173 | hot_lines = f_scp.readlines() 174 | for line in hot_lines: 175 | words = line.strip().split(" ") 176 | if len(words) < 2: 177 | print("Please check format of hotwords") 178 | continue 179 | try: 180 | fst_dict[" ".join(words[:-1])] = int(words[-1]) 181 | except ValueError: 182 | print("Please check format of hotwords") 183 | hotword_msg = json.dumps(fst_dict) 184 | return fst_dict, hotword_msg 185 | 186 | 187 | def interview_thread(id, chunk_begin, chunk_size): 188 | while True: 189 | try: 190 | interview = Interview() 191 | asyncio.run(interview.ws_client(id, chunk_begin, chunk_size)) 192 | except Exception as e: 193 | print(f"Interview thread encountered an error: {e}") 194 | time.sleep(3) 195 | 196 | if __name__ == '__main__': 197 | p = Process(target=interview_thread, args=("interviewer", 0, 1)) 198 | p.start() 199 | p.join() 200 | -------------------------------------------------------------------------------- /interview/MultimodeManager.py: -------------------------------------------------------------------------------- 1 | import asyncio 2 | from DialogManager import DialogManager 3 | from RequestSender import RequestSender 4 | from Config import Config 5 | 6 | class MultimodeManager: 7 | def __init__(self): 8 | self.request_sender = RequestSender() 9 | self.prompt_directory = "./prompt" 10 | self.dialog = DialogManager() 11 | 12 | async def multimode_process(self, category, link): 13 | content_list = [] 14 | 15 | file_path = self.prompt_directory + "/" + category + ".md" 16 | print(file_path) 17 | with open(file_path, 'r', encoding='utf-8') as file: # prompt 文件是 UTF-8 中文文本, 显式指定编码避免 Windows 默认 GBK 解码报错 18 | file_content = file.read() 19 | if 
not self.dialog.get_last_rookie(): 20 | self.dialog.update_last_rookie(file_content + f' \n Description') 21 | else: 22 | print(self.dialog.get_last_rookie()) 23 | self.dialog.add_to_rookie(file_content + f' \n Description') 24 | 25 | content = [ 26 | {"type": "text", "text": file_content}, 27 | {"type": "image_url", "image_url": {"url": link}} 28 | ] 29 | 30 | content_list.append({ 31 | "content": content, 32 | "role": "user" 33 | }) 34 | 35 | print(content_list) 36 | 37 | await self.request_sender.send_post_request(content_list) 38 | 39 | self.dialog.add_to_rookie("") 40 | 41 | async def main(): 42 | multimodeManager = MultimodeManager() 43 | category = "algorithm" 44 | link = "xxxxxx" 45 | await multimodeManager.multimode_process(category, link) 46 | 47 | if __name__ == '__main__': 48 | asyncio.run(main()) 49 | 50 | 51 | 52 | 53 | -------------------------------------------------------------------------------- /interview/R2Uploader.py: -------------------------------------------------------------------------------- 1 | import boto3 2 | from botocore.client import Config as boto3Config 3 | from Config import Config 4 | import hashlib 5 | import os 6 | from datetime import datetime 7 | import requests 8 | from requests.exceptions import RequestException, SSLError 9 | from requests.adapters import HTTPAdapter 10 | from urllib3.util.retry import Retry 11 | 12 | class R2Uploader: 13 | def __init__(self): 14 | self.access_key_id = Config.R2_ACCESS_KEY_ID 15 | self.secret_access_key = Config.R2_SECRET_ACCESS_KEY 16 | self.bucket_name = Config.R2_BUCKET_NAME 17 | self.region = Config.R2_REGION 18 | self.endpoint_url = Config.R2_ENDPOINT_URL 19 | self.custom_domain = Config.R2_CUSTOM_DOMAIN 20 | 21 | self.s3_client = boto3.client( 22 | 's3', 23 | region_name=self.region, 24 | endpoint_url=self.endpoint_url, 25 | aws_access_key_id=self.access_key_id, 26 | aws_secret_access_key=self.secret_access_key, 27 | config=boto3Config(signature_version='s3v4') 28 | ) 29 | 30 | def _calculate_md5(self, file_path): 31 | hash_md5 = hashlib.md5() 32 | with open(file_path, "rb") as f: 33 | for chunk in iter(lambda: f.read(4096), b""): 34 | hash_md5.update(chunk) 35 | return hash_md5.hexdigest() 36 | 37 | def _construct_object_name(self, file_path): 38 | md5_hash = self._calculate_md5(file_path) 39 | ext = os.path.splitext(file_path)[1] 40 | now = datetime.now() 41 | object_name = f"{now.year}/{now.month}/{md5_hash}{ext}" 42 | return object_name 43 | 44 | def upload_image(self, file_path): 45 | try: 46 | object_name = self._construct_object_name(file_path) 47 | self.s3_client.upload_file( 48 | file_path, 49 | self.bucket_name, 50 | object_name, 51 | ExtraArgs={'ACL': 'public-read'} 52 | ) 53 | print(f"File {file_path} uploaded to {object_name} in R2 bucket.") 54 | return object_name 55 | except Exception as e: 56 | print(f"Failed to upload {file_path} to R2 bucket: {e}") 57 | return None 58 | 59 | def get_file_url(self, object_name): 60 | if self.custom_domain: 61 | return f"{self.custom_domain}/{self.bucket_name}/{object_name}" 62 | else: 63 | return f"{self.endpoint_url}/{self.bucket_name}/{object_name}" 64 | 65 | def get_markdown_link(self, object_name): 66 | url = self.get_file_url(object_name) 67 | return f"![Image]({url})" 68 | 69 | def get_html_link(self, object_name): 70 | url = self.get_file_url(object_name) 71 | return f'Image' 72 | 73 | def upload_and_get_link(self, file_path, link_type="url"): 74 | object_name = self.upload_image(file_path) 75 | if object_name: 76 | if link_type == "markdown": 77 | 
return self.get_markdown_link(object_name) 78 | elif link_type == "html": 79 | return self.get_html_link(object_name) 80 | else: 81 | return self.get_file_url(object_name) 82 | else: 83 | return None 84 | 85 | def download_image_from_url(self, img_url): 86 | headers = { 87 | "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36", 88 | "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8", 89 | "Accept-Language": "en-US,en;q=0.5", 90 | "Connection": "keep-alive" 91 | } 92 | 93 | retry_strategy = Retry( 94 | total=5, 95 | backoff_factor=1, 96 | status_forcelist=[429, 500, 502, 503, 504], 97 | allowed_methods=["HEAD", "GET", "OPTIONS"] # urllib3 1.26+ renamed method_whitelist to allowed_methods 98 | ) 99 | 100 | adapter = HTTPAdapter(max_retries=retry_strategy) 101 | session = requests.Session() 102 | session.mount("https://", adapter) 103 | session.mount("http://", adapter) 104 | 105 | try: 106 | response = session.get(img_url, headers=headers, stream=True, timeout=(10, 30)) 107 | response.raise_for_status() 108 | except SSLError as e: 109 | print(f"SSL error occurred, retrying once: {e}") 110 | try: 111 | # Retry once with the same session and adapter (no special TLS version is forced) 112 | session.mount("https://", adapter) 113 | response = session.get(img_url, headers=headers, stream=True) 114 | response.raise_for_status() 115 | except Exception as e: 116 | print(f"Retry after SSL error failed for {img_url}: {e}") 117 | return None 118 | except RequestException as e: 119 | print(f"Failed to download image from {img_url}: {e}") 120 | return None 121 | 122 | tmp_dir = "./tmp" 123 | if not os.path.exists(tmp_dir): 124 | os.makedirs(tmp_dir) 125 | 126 | file_name = os.path.basename(img_url) 127 | file_path = os.path.join(tmp_dir, file_name) 128 | 129 | try: 130 | with open(file_path, 'wb') as out_file: 131 | for chunk in response.iter_content(chunk_size=8192): 132 | out_file.write(chunk) 133 | print(f"Image downloaded successfully from {img_url}") 134 | return file_path 135 | except Exception as e: 136 | print(f"Unexpected error occurred while saving image from {img_url}: {e}") 137 | return None 138 | 139 | def upload_image_from_url(self, img_url, link_type="url"): 140 | file_path = self.download_image_from_url(img_url) 141 | if file_path: 142 | return self.upload_and_get_link(file_path, link_type) 143 | return None 144 | 145 | if __name__ == "__main__": 146 | uploader = R2Uploader() 147 | file_path = './0aa00c3f-97fb-42b6-9297-812b2b0b7b61.png' 148 | link_type = 'markdown' # 'url', 'markdown', or 'html' 149 | 150 | link = uploader.upload_and_get_link(file_path, link_type) 151 | print(f"Generated link: {link}") 152 | -------------------------------------------------------------------------------- /interview/RagManager.py: -------------------------------------------------------------------------------- 1 | import asyncio 2 | from meilisearch import Client 3 | from DialogManager import DialogManager 4 | from Config import Config 5 | import json 6 | class RagManager: 7 | def __init__(self): 8 | self.dialog = DialogManager() 9 | self.config = Config() 10 | self.meilisearch_client = Client(f"http://{self.config.MEILISEARCH_HOST}:{self.config.MEILISEARCH_PORT}") 11 | 12 | async def query_meilisearch(self): 13 | query_object = self.dialog.get_last_interviewer() 14 | query_text = query_object['text'] 15 | print(f"query_text: {query_text}") 16 | index = self.meilisearch_client.index('qa_pairs') 17 | 18 | results = index.search(query_text, { 19 | 'limit': self.config.MEILISEARCH_DEEP 20 | })
21 | answers = [hit['a'] for hit in results['hits'] if 'a' in hit] 22 | return answers 23 | 24 | async def main(): 25 | rag = RagManager() 26 | answers = await rag.query_meilisearch() 27 | for ans in answers: 28 | print(ans) 29 | 30 | if __name__ == '__main__': 31 | asyncio.run(main()) -------------------------------------------------------------------------------- /interview/RequestSender.py: -------------------------------------------------------------------------------- 1 | import aiohttp 2 | import json 3 | from dotenv import load_dotenv 4 | import os 5 | import logging 6 | from DialogManager import DialogManager 7 | import time 8 | import asyncio 9 | logging.basicConfig(level=logging.INFO) 10 | logger = logging.getLogger(__name__) 11 | 12 | class RequestSender: 13 | def __init__(self): 14 | load_dotenv() 15 | self.url = os.getenv('CHATGPT_BASE_URL') 16 | self.headers = { 17 | 'Authorization': "Bearer " + os.getenv('AUTHORIZATION'), 18 | 'Content-Type': 'application/json', 19 | "Accept": 'application/json' 20 | } 21 | self.dialog = DialogManager() 22 | self.model = os.getenv('MODEL') 23 | 24 | async def send_post_request(self, contentList): 25 | async with aiohttp.ClientSession() as session: 26 | payload = { 27 | "model": self.model, 28 | "messages": contentList, 29 | "stream": True 30 | } 31 | receiveData = "" 32 | buffer = "" 33 | buffer_time = 0.5 34 | self.dialog.add_to_chatgpt("") 35 | try: 36 | async with session.post(self.url, headers=self.headers, data=json.dumps(payload)) as response: 37 | if response.status == 200: 38 | last_update_time = time.time() 39 | async for line in response.content: 40 | if line.startswith(b'data: '): 41 | sse_message = line.decode('utf-8').strip()[6:] 42 | if sse_message == '[DONE]': 43 | receiveData += buffer 44 | self.dialog.update_last_chatgpt(receiveData) 45 | await asyncio.sleep(1) # non-blocking; time.sleep(1) here would stall the event loop mid-stream 46 | self.dialog.add_to_chatgpt("") 47 | break 48 | message_data = self.extract_id_and_content(json.loads(sse_message)) 49 | if 'content' in message_data and message_data['content']: 50 | buffer += message_data['content'] 51 | 52 | if time.time() - last_update_time > buffer_time: 53 | receiveData += buffer 54 | self.dialog.update_last_chatgpt(receiveData) 55 | buffer = "" 56 | last_update_time = time.time() 57 | 58 | else: 59 | logger.debug(f"Received non-data line: {line}") 60 | except aiohttp.ClientError as e: 61 | logger.error(f"Request failed: {str(e)}") 62 | 63 | return receiveData 64 | 65 | def extract_id_and_content(self, response_data): 66 | try: 67 | response_id = response_data.get('id') 68 | content = response_data.get('choices', [])[0].get('delta', {}).get('content') 69 | return { 70 | "id": response_id, 71 | "content": content 72 | } 73 | except (IndexError, KeyError, TypeError) as e: 74 | logger.error(f"Error extracting id and content: {e}") 75 | return {"error": "Invalid response format"} 76 | 77 | -------------------------------------------------------------------------------- /interview/Rookie.py: -------------------------------------------------------------------------------- 1 | import pyaudio 2 | import numpy as np 3 | import json 4 | import asyncio 5 | import websockets 6 | from Config import Config 7 | import time 8 | import os 9 | from queue import Queue 10 | from multiprocessing import Process 11 | from DialogManager import DialogManager 12 | 13 | class Rookie: 14 | def __init__(self): 15 | self.voices = [] 16 | self.offline_msg_done = False 17 | self.websocket = None 18 | self.text_print_2pass_offline = "" 19 | self.text_print_2pass_online = "" 20 | self.dialog = 
DialogManager() 21 | 22 | async def ws_client(self, id="Rookie", chunk_begin=0, chunk_size=1): 23 | while True: 24 | for i in range(chunk_begin, chunk_begin + chunk_size): 25 | self.offline_msg_done = False 26 | self.voices = Queue() 27 | 28 | uri = f"ws://{Config.ROOKIE_HOST}:{Config.ROOKIE_PORT}" 29 | ssl_context = None 30 | 31 | print("Connecting to", uri) 32 | try: 33 | async with websockets.connect(uri, subprotocols=["binary"], ping_interval=None, ssl=ssl_context) as self.websocket: 34 | task1 = asyncio.create_task(self.record_microphone()) 35 | task2 = asyncio.create_task(self.message(id)) 36 | await asyncio.gather(task1, task2) 37 | except websockets.exceptions.ConnectionClosedError as e: 38 | print(f"WebSocket connection closed with error: {e}") 39 | await asyncio.sleep(3) 40 | except Exception as e: 41 | print(f"Rookie error occurred: {e}") 42 | import traceback 43 | traceback.print_exc() 44 | await asyncio.sleep(3) 45 | 46 | async def record_microphone(self): 47 | FORMAT = pyaudio.paInt16 48 | CHANNELS = Config.AUDIO_CHANNEL 49 | RATE = Config.AUDIO_FS 50 | chunk_size = 60 * Config.CHUNK_SIZE[1] / Config.CHUNK_INTERVAL 51 | CHUNK = int(RATE / 1000 * chunk_size) 52 | audio = pyaudio.PyAudio() 53 | 54 | TARGET_RATE = 16000 55 | TARGET_CHANNELS = 1 56 | 57 | while True: 58 | try: 59 | audio = pyaudio.PyAudio() 60 | mic_stream = audio.open(format=FORMAT, 61 | channels=CHANNELS, 62 | rate=RATE, 63 | input=True, 64 | input_device_index=Config.MIC_DEVICE_INDEX, 65 | frames_per_buffer=CHUNK) 66 | break 67 | except OSError as e: 68 | print(f"Error opening audio stream: {e}") 69 | await asyncio.sleep(3) 70 | 71 | fst_dict, hotword_msg = self.prepare_hotword_message() 72 | 73 | use_itn = Config.USE_ITN == 1 74 | message = json.dumps({ 75 | "mode": Config.MODE, 76 | "chunk_size": Config.CHUNK_SIZE, 77 | "chunk_interval": Config.CHUNK_INTERVAL, 78 | "wav_name": "system_voice", 79 | "is_speaking": True, 80 | "hotwords": hotword_msg, 81 | "itn": use_itn 82 | }) 83 | 84 | await self.websocket.send(message) 85 | def convert_to_mono(data, channels): 86 | audio_data = np.frombuffer(data, dtype=np.int16) 87 | mono_data = audio_data.reshape((-1, channels)).mean(axis=1) 88 | return mono_data.astype(np.int16) 89 | 90 | def resample_audio(data, original_rate, target_rate): 91 | audio_data = np.frombuffer(data, dtype=np.int16) 92 | resampled_data = np.interp(np.linspace(0, len(audio_data), int(len(audio_data) * target_rate / original_rate)), 93 | np.arange(len(audio_data)), audio_data) 94 | return resampled_data.astype(np.int16) 95 | 96 | 97 | while True: 98 | data = mic_stream.read(CHUNK) 99 | try: 100 | if CHANNELS == TARGET_CHANNELS and RATE == TARGET_RATE: 101 | await self.websocket.send(data) 102 | else: 103 | mono_data = convert_to_mono(data, CHANNELS) 104 | 105 | resampled_data = resample_audio(mono_data, RATE, TARGET_RATE) 106 | 107 | output_data = resampled_data.tobytes() 108 | 109 | await self.websocket.send(output_data) 110 | await asyncio.sleep(0.01) 111 | except websockets.exceptions.ConnectionClosedError as e: 112 | print(f"Connection closed with error: {e}") 113 | break 114 | 115 | async def message(self, id): 116 | if Config.OUTPUT_DIR is not None: 117 | ibest_writer = open(os.path.join(Config.OUTPUT_DIR, f"text.{id}"), "a", encoding="utf-8") 118 | else: 119 | ibest_writer = None 120 | try: 121 | while True: 122 | try: 123 | meg = await asyncio.wait_for(self.websocket.recv(), timeout=Config.ROOKIE_WAIT_TIME / 1000) 124 | meg = json.loads(meg) 125 | wav_name = meg.get("wav_name", "demo") 
126 | text = meg["text"] 127 | timestamp = "" 128 | self.offline_msg_done = meg.get("is_final", False) 129 | 130 | if "timestamp" in meg: 131 | timestamp = meg["timestamp"] 132 | 133 | if ibest_writer is not None: 134 | if timestamp != "": 135 | text_write_line = f"{wav_name}\t{text}\t{timestamp}\n" 136 | else: 137 | text_write_line = f"{wav_name}\t{text}\n" 138 | ibest_writer.write(text_write_line) 139 | 140 | if 'mode' not in meg: 141 | continue 142 | 143 | if meg["mode"] == "2pass-online": 144 | self.text_print_2pass_online += "{}".format(text) 145 | text_print = self.text_print_2pass_offline + self.text_print_2pass_online 146 | text_print = text_print[-Config.WORDS_MAX_PRINT:] 147 | self.dialog.update_last_rookie(text_print) 148 | else: 149 | self.text_print_2pass_online = "" 150 | text_print = self.text_print_2pass_offline + "{}".format(text) 151 | text_print = text_print[-Config.WORDS_MAX_PRINT:] 152 | self.dialog.update_last_rookie(text_print) 153 | self.text_print_2pass_offline += "{}".format(text) 154 | except asyncio.TimeoutError: 155 | if self.text_print_2pass_offline: 156 | self.text_print_2pass_offline = "" 157 | self.dialog.add_to_rookie("") 158 | 159 | except Exception as e: 160 | print("Exception:", e) 161 | 162 | def prepare_hotword_message(self): 163 | fst_dict = {} 164 | hotword_msg = "" 165 | if Config.HOTWORD.strip(): 166 | with open(Config.HOTWORD) as f_scp: 167 | hot_lines = f_scp.readlines() 168 | for line in hot_lines: 169 | words = line.strip().split(" ") 170 | if len(words) < 2: 171 | print("Please check format of hotwords") 172 | continue 173 | try: 174 | fst_dict[" ".join(words[:-1])] = int(words[-1]) 175 | except ValueError: 176 | print("Please check format of hotwords") 177 | hotword_msg = json.dumps(fst_dict) 178 | return fst_dict, hotword_msg 179 | 180 | 181 | def rookie_thread(id, chunk_begin, chunk_size): 182 | while True: 183 | try: 184 | rookie = Rookie() 185 | asyncio.run(rookie.ws_client(id, chunk_begin, chunk_size)) 186 | except Exception as e: 187 | print(f"Rookie thread encountered an error: {e}") 188 | time.sleep(3) 189 | 190 | if __name__ == '__main__': 191 | p = Process(target=rookie_thread, args=("rookie", 0, 1)) 192 | p.start() 193 | p.join() 194 | -------------------------------------------------------------------------------- /interview/SaveFile.py: -------------------------------------------------------------------------------- 1 | import redis 2 | import os 3 | import json 4 | from Config import Config 5 | 6 | class SaveFile: 7 | def __init__(self): 8 | self.redis = redis.StrictRedis(host=Config.REDIS_HOST, port=Config.REDIS_PORT, decode_responses=True) 9 | 10 | self.interviewer_key = "dialog_manager:interviewer" 11 | self.rookie_key = "dialog_manager:rookie" 12 | self.chatgpt_key = "dialog_manager:chatgpt" 13 | 14 | def _get_valid_entries(self, key, max_len): 15 | entries = [json.loads(item) for item in self.redis.lrange(key, 0, -1)] 16 | valid_entries = [item for item in entries if item['text'].strip() != ""] 17 | 18 | if len(valid_entries) > max_len: 19 | return valid_entries[-max_len:] 20 | else: 21 | return valid_entries 22 | 23 | def _get_icon(self, list_name): 24 | if list_name == "interviewer": 25 | return "🎤" 26 | elif list_name == "rookie": 27 | return "😅" 28 | elif list_name == "chatgpt": 29 | return "🤖" 30 | 31 | def export_dialogs_to_file(self, filename): 32 | interviewer_list = self._get_valid_entries(self.interviewer_key, 3000) 33 | rookie_list = self._get_valid_entries(self.rookie_key, 3000) 34 | chatgpt_list = 
self._get_valid_entries(self.chatgpt_key, 3000) 35 | 36 | combined_list = interviewer_list + rookie_list + chatgpt_list 37 | 38 | # 对提取出的元素按时间排序 39 | sorted_combined_list = sorted(combined_list, key=lambda x: x['time']) 40 | 41 | output = [] 42 | for entry in sorted_combined_list: 43 | icon = self._get_icon(entry['list_name']) 44 | output.append(f"{entry['time']} {icon} {entry['list_name']} : \n\n {entry['text']}") 45 | 46 | result = "\n\n".join(output) 47 | 48 | with open(filename, 'w') as file: 49 | file.write(result) 50 | 51 | print(f"Dialogs have been saved to {filename}") 52 | 53 | # 示例用法 54 | if __name__ == "__main__": 55 | dialog_manager = SaveFile() 56 | 57 | dialog_manager.export_dialogs_to_file("dialogs_output.md") 58 | -------------------------------------------------------------------------------- /interview/ScreenShot.py: -------------------------------------------------------------------------------- 1 | import json 2 | import os 3 | from datetime import datetime 4 | from pynput import keyboard 5 | import pyautogui 6 | import asyncio 7 | from PIL import Image 8 | from multiprocessing import Process 9 | from R2Uploader import R2Uploader 10 | from MultimodeManager import MultimodeManager 11 | from DialogManager import DialogManager 12 | from ChatgptManager import ChatgptManager 13 | import platform 14 | 15 | 16 | class ScreenshotManager: 17 | CONFIG_FILE = "config.json" 18 | DEFAULT_CONFIG = { 19 | "shortcuts": { 20 | "algorithm": ["", "", "a"], 21 | "personality": ["", "", "p"], 22 | "general": ["", "", "g"], 23 | "long_screenshot": ["", "", "l"], 24 | "help": ["", "", "h"], 25 | "fix": ["", "", "f"], 26 | "ocr": ["", "", "o"], 27 | "exit": ["", "c"] 28 | }, 29 | "save_dir": "./screenshots", 30 | "scroll_delay": 1.5, 31 | "scroll_amount": 17, 32 | "max_screenshots": 3 33 | } 34 | 35 | def __init__(self): 36 | self.config = self.load_config() 37 | self.current_keys = set() 38 | self.uploader = R2Uploader() 39 | self.link_type = 'url' 40 | self.prompt_dir = "./prompt/" 41 | self.dialog = DialogManager() 42 | self.multimodeManager = MultimodeManager() 43 | self.loop = None 44 | self.chatgpt_manager = ChatgptManager() 45 | self.system = platform.system() 46 | os.makedirs(self.config['save_dir'], exist_ok=True) 47 | 48 | def load_config(self): 49 | if os.path.exists(self.CONFIG_FILE): 50 | with open(self.CONFIG_FILE, 'r') as f: 51 | return json.load(f) 52 | return self.DEFAULT_CONFIG 53 | 54 | def save_config(self): 55 | with open(self.CONFIG_FILE, 'w') as f: 56 | json.dump(self.config, f, indent=4) 57 | 58 | def map_key_for_platform(self, key): 59 | if self.system == 'Windows': 60 | return self.map_key_for_windows(key) 61 | elif self.system == 'Darwin': 62 | return self.map_key_for_macos(key) 63 | elif self.system == 'Linux': 64 | return self.map_key_for_linux(key) 65 | return key 66 | 67 | def map_key_for_windows(self, key): 68 | if hasattr(key, 'vk'): 69 | vk = key.vk 70 | if 65 <= vk <= 90: 71 | return chr(vk).lower() 72 | if 48 <= vk <= 57: 73 | return chr(vk) 74 | 75 | key_mapping = { 76 | 'Key.ctrl_l': '', 77 | 'Key.ctrl_r': '', 78 | 'Key.alt_l': '', 79 | 'Key.alt_r': '', 80 | 'Key.shift_l': '', 81 | 'Key.shift_r': '', 82 | 'Key.cmd': '', 83 | 'Key.enter': '', 84 | 'Key.space': '', 85 | 'Key.tab': '', 86 | 'Key.esc': '', 87 | 'Key.backspace': '', 88 | } 89 | return key_mapping.get(str(key), None) 90 | 91 | def map_key_for_macos(self, key): 92 | key_mapping = { 93 | 'Key.cmd': '', 94 | 'Key.ctrl': '', 95 | 'Key.alt': '', 96 | 'Key.shift': '', 97 | 'Key.enter': '', 98 | 
'Key.space': '', 99 | 'Key.tab': '', 100 | 'Key.esc': '', 101 | 'Key.backspace': '', 102 | 'Key.caps_lock': '', 103 | 'Key.right': '', 104 | 'Key.left': '', 105 | 'Key.up': '', 106 | 'Key.down': '' 107 | } 108 | if isinstance(key, keyboard.KeyCode): 109 | return key.char.lower() if key.char else None 110 | else: 111 | return key_mapping.get(str(key), None) 112 | 113 | 114 | def map_key_for_linux(self, key): 115 | if hasattr(key, 'vk'): 116 | vk = key.vk 117 | if 65 <= vk <= 90: 118 | return chr(vk).lower() 119 | if 48 <= vk <= 57: 120 | return chr(vk) 121 | 122 | key_mapping = { 123 | 'Key.ctrl_l': '', 124 | 'Key.ctrl_r': '', 125 | 'Key.alt_l': '', 126 | 'Key.alt_r': '', 127 | 'Key.shift_l': '', 128 | 'Key.shift_r': '', 129 | 'Key.enter': '', 130 | 'Key.space': '', 131 | 'Key.tab': '', 132 | 'Key.esc': '', 133 | 'Key.backspace': '', 134 | } 135 | return key_mapping.get(str(key), None) 136 | 137 | def take_screenshot(self): 138 | screenshot = pyautogui.screenshot() 139 | filename = datetime.now().strftime("screenshot_%Y%m%d_%H%M%S.png") 140 | filepath = os.path.join(self.config['save_dir'], filename) 141 | screenshot.save(filepath) 142 | print(f"Screenshot saved as {filepath}") 143 | return filepath 144 | 145 | async def take_long_screenshot(self): 146 | print("Taking long screenshot...") 147 | screen_width, screen_height = pyautogui.size() 148 | mouse_x, mouse_y = pyautogui.position() 149 | 150 | screenshots = [] 151 | total_height = 0 152 | 153 | for i in range(self.config['max_screenshots']): 154 | print(f"Taking screenshot {i+1}/{self.config['max_screenshots']}...") 155 | 156 | current_screenshot = await self.safe_screenshot() 157 | if not current_screenshot: 158 | print(f"Failed to take screenshot {i+1}. Stopping capture.") 159 | break 160 | 161 | screenshots.append(current_screenshot) 162 | total_height += current_screenshot.height 163 | 164 | if i < self.config['max_screenshots'] - 1: 165 | print(f"Scrolling down {self.config['scroll_amount']} pixels...") 166 | pyautogui.scroll(-self.config['scroll_amount']) 167 | await asyncio.sleep(self.config['scroll_delay']) 168 | 169 | if not screenshots: 170 | print("No screenshots were captured. Aborting.") 171 | return None 172 | 173 | print(f"Captured {len(screenshots)} screenshots. 
Stitching them together...") 174 | long_screenshot = Image.new('RGB', (screen_width, total_height)) 175 | 176 | current_height = 0 177 | for screenshot in screenshots: 178 | long_screenshot.paste(screenshot, (0, current_height)) 179 | current_height += screenshot.height 180 | 181 | filename = datetime.now().strftime("long_screenshot_%Y%m%d_%H%M%S.png") 182 | filepath = os.path.join(self.config['save_dir'], filename) 183 | long_screenshot.save(filepath) 184 | print(f"Long screenshot saved as {filepath}") 185 | 186 | pyautogui.moveTo(mouse_x, mouse_y) 187 | return filepath 188 | 189 | 190 | async def async_screenshot(self): 191 | return await asyncio.to_thread(pyautogui.screenshot) 192 | 193 | async def safe_screenshot(self, timeout=5): 194 | try: 195 | return await asyncio.wait_for(self.async_screenshot(), timeout) 196 | except asyncio.TimeoutError: 197 | print(f"Screenshot timed out after {timeout} seconds.") 198 | return None 199 | 200 | 201 | async def generate_question(self, category, link): 202 | print(f"Generating {category} question...") 203 | await self.multimodeManager.multimode_process(category, link) 204 | 205 | def on_press(self, key): 206 | asyncio.run_coroutine_threadsafe(self._async_on_press(key), self.loop) 207 | 208 | async def _async_on_press(self, key): 209 | filepath = None # the "help" and "exit" branches never set it; avoids an UnboundLocalError below 210 | key_char = self.map_key_for_platform(key) 211 | 212 | if key_char is None: 213 | return 214 | 215 | self.current_keys.add(key_char) 216 | 217 | for category, shortcut in self.config['shortcuts'].items(): 218 | if all(k in self.current_keys for k in shortcut): 219 | if category == "long_screenshot": 220 | filepath = await self.take_long_screenshot() 221 | elif category == "help": 222 | await self.run_chatgpt_workflow() 223 | elif category == "exit": 224 | raise KeyboardInterrupt("Exit shortcut triggered") 225 | else: 226 | filepath = self.take_screenshot() 227 | 228 | if filepath: 229 | link = self.uploader.upload_and_get_link(filepath, self.link_type) 230 | print(f"Generated link: {link}") 231 | await self.generate_question(category, link) 232 | 233 | 234 | def on_release(self, key): 235 | key_char = self.map_key_for_platform(key) 236 | 237 | if key_char is None: 238 | return 239 | 240 | if key_char in self.current_keys: 241 | self.current_keys.remove(key_char) 242 | 243 | async def run_chatgpt_workflow(self): 244 | print("Running ChatGPT workflow...") 245 | await self.chatgpt_manager.run_workflow() 246 | 247 | async def run(self): 248 | self.loop = asyncio.get_running_loop() 249 | 250 | with keyboard.Listener( 251 | on_press=self.on_press, 252 | on_release=self.on_release 253 | ) as listener: 254 | print("Shortcut program is running. 
Press Ctrl+C to exit.") 255 | try: 256 | await asyncio.Event().wait() # Run forever 257 | except KeyboardInterrupt: 258 | print("\nProgram terminated.") 259 | finally: 260 | listener.stop() 261 | 262 | def listen_key_process(): 263 | manager = ScreenshotManager() 264 | asyncio.run(manager.run()) 265 | 266 | if __name__ == "__main__": 267 | process = Process(target=listen_key_process) 268 | process.start() 269 | process.join() -------------------------------------------------------------------------------- /interview/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AowerDmax/Simple-Interview-Audio-Processing/4857f35a8356854b611e33952528ce7c834353df/interview/__init__.py -------------------------------------------------------------------------------- /interview/audioTest.py: -------------------------------------------------------------------------------- 1 | import pyaudio 2 | import wave 3 | from Config import Config 4 | 5 | # 初始化 PyAudio 6 | audio = pyaudio.PyAudio() 7 | 8 | # 列出所有音频输入设备 9 | for i in range(audio.get_device_count()): 10 | device_info = audio.get_device_info_by_index(i) 11 | print(f"Device {i}: {device_info['name']}") 12 | 13 | def print_device_info(audio, index): 14 | try: 15 | info = audio.get_device_info_by_index(index) 16 | print(f"Device {index}:") 17 | print(f" Name: {info['name']}") 18 | print(f" Max Input Channels: {info['maxInputChannels']}") 19 | print(f" Max Output Channels: {info['maxOutputChannels']}") 20 | print(f" Default Sample Rate: {info['defaultSampleRate']}") 21 | except Exception as e: 22 | print(f"Error getting info for device {index}: {e}") 23 | 24 | # 选择 Aggregate Device 和麦克风的设备索引 25 | aggregate_device_index = 12 # 根据上面的列表选择适合的设备索引 26 | mic_device_index = 1 # 替换为你的麦克风设备索引 27 | 28 | print_device_info(audio, aggregate_device_index) 29 | print_device_info(audio, mic_device_index) 30 | 31 | 32 | 33 | # 配置参数 34 | FORMAT = pyaudio.paInt16 35 | System_RATE = 48000 # 单声道可能会兼容不同的设备 36 | Mic_RATE = 16000 # 单声道可能会兼容不同的设备 37 | System_CHANNELS = 1 # 单声道可能会兼容不同的设备 38 | Mic_CHANNELS = 1 # 单声道可能会兼容不同的设备 39 | chunk_size = 60 * Config.CHUNK_SIZE[1] / Config.CHUNK_INTERVAL 40 | Mic_CHUNK = int(Mic_RATE / 1000 * chunk_size) 41 | System_CHUNK = int(System_RATE / 1000 * chunk_size) 42 | # 打开系统音频输入流(使用聚合设备) 43 | system_stream = audio.open(format=FORMAT, 44 | channels=System_CHANNELS, 45 | rate=System_RATE, 46 | input=True, 47 | input_device_index=aggregate_device_index, 48 | frames_per_buffer=System_CHUNK) 49 | 50 | # 打开麦克风输入流 51 | mic_stream = audio.open(format=FORMAT, 52 | channels=Mic_CHANNELS, 53 | rate=Mic_RATE, 54 | input=True, 55 | input_device_index=mic_device_index, 56 | frames_per_buffer=Mic_CHUNK) 57 | 58 | print("Recording...") 59 | 60 | # 捕获音频数据 61 | system_frames = [] 62 | mic_frames = [] 63 | 64 | for _ in range(50): 65 | system_data = system_stream.read(System_CHUNK) 66 | mic_data = mic_stream.read(Mic_CHUNK) 67 | system_frames.append(system_data) 68 | mic_frames.append(mic_data) 69 | 70 | # 停止并关闭流 71 | system_stream.stop_stream() 72 | system_stream.close() 73 | 74 | mic_stream.stop_stream() 75 | mic_stream.close() 76 | 77 | audio.terminate() 78 | 79 | # 保存系统音频到文件 80 | with wave.open("system_audio.wav", 'wb') as wf: 81 | wf.setnchannels(System_CHANNELS) 82 | wf.setsampwidth(audio.get_sample_size(FORMAT)) 83 | wf.setframerate(System_RATE) 84 | wf.writeframes(b''.join(system_frames)) 85 | 86 | # 保存麦克风音频到文件 87 | with wave.open("mic_audio.wav", 'wb') as wf: 88 | 
wf.setnchannels(Mic_CHANNELS) 89 | wf.setsampwidth(audio.get_sample_size(FORMAT)) 90 | wf.setframerate(Mic_RATE) 91 | wf.writeframes(b''.join(mic_frames)) 92 | 93 | print("Recording finished and saved to files.") 94 | -------------------------------------------------------------------------------- /interview/main.py: -------------------------------------------------------------------------------- 1 | import asyncio 2 | import signal 3 | import sys 4 | import time 5 | from multiprocessing import Process 6 | from Rookie import Rookie 7 | from Interviewer import Interview 8 | from DialogManager import DialogManager 9 | from ScreenShot import listen_key_process 10 | from SaveFile import SaveFile 11 | from datetime import datetime 12 | 13 | def interview_thread(id, chunk_begin, chunk_size): 14 | while True: 15 | try: 16 | interview = Interview() 17 | asyncio.run(interview.ws_client(id, chunk_begin, chunk_size)) 18 | except Exception as e: 19 | print(f"Interview thread encountered an error: {e}") 20 | time.sleep(3) 21 | 22 | def rookie_thread(id, chunk_begin, chunk_size): 23 | while True: 24 | try: 25 | rookie = Rookie() 26 | asyncio.run(rookie.ws_client(id, chunk_begin, chunk_size)) 27 | except Exception as e: 28 | print(f"Rookie thread encountered an error: {e}") 29 | time.sleep(3) 30 | 31 | def handle_exit(signum, frame): 32 | print("\nGracefully shutting down...") 33 | 34 | interview_process.terminate() 35 | rookie_process.terminate() 36 | listen_key_process.terminate() 37 | 38 | interview_process.join() 39 | rookie_process.join() 40 | listen_key_process.join() 41 | 42 | saveFile = SaveFile() 43 | current_time = datetime.now().strftime("%Y%m%d%H%M") 44 | file_name = f"dialogs_output_{current_time}.md" 45 | saveFile.export_dialogs_to_file(file_name) 46 | print(f"Dialogs saved to {file_name}. Exiting.") 47 | 48 | sys.exit(0) 49 | 50 | if __name__ == '__main__': 51 | signal.signal(signal.SIGINT, handle_exit) 52 | 53 | dialog_manager = DialogManager() 54 | dialog_manager.clear_all() 55 | dialog_manager.add_to_interviewer("") 56 | dialog_manager.add_to_chatgpt("") 57 | dialog_manager.add_to_rookie("") 58 | 59 | interview_process = Process(target=interview_thread, args=("interviewer", 0, 1)) 60 | rookie_process = Process(target=rookie_thread, args=("rookie", 0, 1)) 61 | listen_key_process = Process(target=listen_key_process) 62 | 63 | interview_process.start() 64 | rookie_process.start() 65 | listen_key_process.start() 66 | 67 | try: 68 | interview_process.join() 69 | rookie_process.join() 70 | listen_key_process.join() 71 | except KeyboardInterrupt: 72 | handle_exit(None, None) 73 | -------------------------------------------------------------------------------- /macOS_user_tutorial.md: -------------------------------------------------------------------------------- 1 | # MacOS User tutorial 2 | 3 | 在本篇文章中, 将详细介绍在 `MacOS` 系统下如何部署本工具. 4 | 5 | 本工具主要有两个部分. `截屏询问GPT`, `语音记录询问GPT`. 很高兴的是,这两个模块是互不干扰的, 所以如果你只需要任意一个单独的模块,只需要遵循对应的教程就可以了. 6 | 7 | 首先,你需要确保本地已经安装了 `Poetry`。如果尚未安装,可以使用以下命令进行安装: 8 | 9 | ```bash 10 | curl -sSL https://install.python-poetry.org | python3 - 11 | ``` 12 | 13 | 添加`Poetry`到系统环境变量中, 可以添加到`.bashrc`或者 `.zshrc` 中长久生效. 
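例如, 如果你使用 `zsh`, 可以在 `~/.zshrc` 末尾追加类似下面的内容(这里假设 Poetry 安装在官方安装脚本的默认目录 `$HOME/.local/bin`, 如果你的安装路径不同, 请自行调整):

```bash
# 将 Poetry 所在目录加入 PATH(官方安装脚本的默认位置, 仅作示例)
export PATH="$HOME/.local/bin:$PATH"
```

追加后执行 `source ~/.zshrc`(或重新打开终端)使配置生效.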
14 | 15 | 安装完成后,确保 `Poetry` 已经添加到你的系统路径中,可以通过以下命令验证安装是否成功: 16 | 17 | ```bash 18 | poetry --version 19 | ``` 20 | 21 | 这将输出 Poetry 的版本信息,确认安装成功。 22 | 23 | 克隆本项目,并进入项目根目录 24 | 25 | ```bash 26 | git clone https://github.com/AowerDmax/Simple-Interview-Audio-Processing.git 27 | cd Simple-Interview-Audio-Processing 28 | ``` 29 | 30 | 在项目根目录下,运行以下命令来安装项目的所有依赖项: 31 | 32 | ```bash 33 | poetry install 34 | ``` 35 | 36 | 进入虚拟环境 37 | 38 | ```bash 39 | poetry shell 40 | ``` 41 | 42 | 复制`.env` 文件 43 | 44 | ``` 45 | cp env.template .env 46 | ``` 47 | 48 | **注意:**, 你每次进入这个项目都需要`poetry shell`, 然后你就可以使用`python xxx.py` 使用虚拟环境运行`python` 文件了. 49 | 50 | 启动 `docker` 容器. 51 | 52 | ``` 53 | docker-compose up -d 54 | # 如果你发现后面的步骤出现问题,这里可能会出现问题,使用下面的命令排查 55 | docker-compose down 56 | docker-compose up 57 | 查看log日志,看看是哪个服务没有正常启动 58 | docker-compose restart 59 | docker-compose restart <服务名> 60 | ``` 61 | 62 | ## 语音记录询问 GPT 63 | 64 | 这部分的需要的东西有: 65 | 66 | 1. `openai` ( `deepseek`, `oaipro`, `通义千问` )格式的`api`和`Base_url` 67 | 2. 阿里的 `funASR` 进行实时语音识别 68 | 3. [BlackHole 16ch](https://github.com/ExistentialAudio/BlackHole)进行语音的分离和采样. 69 | 70 | #### BlackHole 16ch 设置 71 | 72 | ##### 下载安装 73 | 74 | Option 1: Install via Homebrew 75 | 76 | ``` 77 | brew install blackhole-16ch 78 | ``` 79 | 80 | Option 2: Download Installer 81 | 从链接中安装[BlackHole 16ch](https://existential.audio/blackhole/). 可以选择不捐赠,然后填入自己的邮箱和个人信息,然后在邮箱里面找下载链接.(如果有能力可以捐赠他,**和我没关系**.) 82 | 83 | ##### 进行 BlackHole 设置 84 | 85 | `Command` + `Space` 打开聚焦搜索, 搜索`音频MIDI设置`. 86 | 根据下面的截图创建聚合设备和多输出设备, 一定要注意,采样速率和主设备都要保证和下面的图一致. 因为`FunASR`是只支持 `单声道` 和 `16000` 采样率的. 87 | 88 | ![2024-09-08-14-37-55](asserts/2024-09-08-14-37-55.png) 89 | 90 | ![2024-09-08-14-39-00](asserts/2024-09-08-14-39-00.png) 91 | 92 | ![2024-09-08-14-40-02](asserts/2024-09-08-14-40-02.png) 93 | 94 | 此时,你需要测试一下,你的音响是否能播放声音, 你的麦克风能否正常输入声音. 95 | 96 | #### 测试环境 97 | 98 | 使用 `python interview/audioTest.py` 来获取所有音频输入输出设备. 99 | 100 | ``` 101 | ❯ python interview/audioTest.py 102 | Device 0: 27M2U-D 103 | Device 1: XWF-1080P 104 | Device 2: Background Music 105 | Device 3: Background Music (UI Sounds) 106 | Device 4: BlackHole 16ch 107 | Device 5: BlackHole 2ch 108 | Device 6: 外置麦克风 109 | Device 7: 外置耳机 110 | Device 8: Mac mini扬声器 111 | Device 9: Iriun Webcam Audio 112 | Device 10: WeMeet Audio Device 113 | Device 11: TwsBowie 114 | Device 12: 聚合设备 115 | Device 13: 多输出设备 116 | Device 12: 117 | Name: 聚合设备 118 | Max Input Channels: 16 119 | Max Output Channels: 16 120 | Default Sample Rate: 16000.0 121 | Device 1: 122 | Name: XWF-1080P 123 | Max Input Channels: 1 124 | Max Output Channels: 0 125 | Default Sample Rate: 48000.0 126 | Recording... 127 | Recording finished and saved to files. 128 | ``` 129 | 130 | 我这里选择`Device 1`作为麦克风输入, `Device 12`作为系统内声音输入. 131 | 132 | 第一次运行一般都会报错的,然后你根据已经列出的所有音频输入设备,选择相应的麦克风输入和系统内声音输入. 133 | 运行完这个脚本,在项目的根目录内会生成的`system_audio.wav`,`mic_audio.wav` 里面分别记录了`系统内声音` 和 `麦克风声音`.(注意:此时在读取麦克风输入和系统内声音输入,请保证相应的有输入) 134 | 请播放这个文件,来检查自己的设置是否正常. 135 | 136 | #### 配置.env 文件 137 | 138 | 在`.env`文件里面 139 | 140 | `AUDIO_FS=16000` 和 `AUDIO_CHANNEL=1` 或者设置成你自己的值 141 | 142 | 重点修改 `AGGREGATE_DEVICE_INDEX`、`MIC_DEVICE_INDEX` `RAG_ENABLED` 以及 GPT 的 `baseurl` 和 `API` 配置。 143 | 144 | 可以通过 `MEILISEARCH_DEEP` 来设置 `RAG` 搜索辅助知识的数量 145 | 146 | 可以通过 `ROOKIE_DIALOG_LEN`, `CHATGPT_DIALOG_LEN`, `INTERVIEWER_DIALOG_LEN` 来分别设置 `终端` 中 各类消息显示的数量. 同时在传入 `GPT` 问答时的对话记录的时候也遵循这个设置. 
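下面给出一份 `.env` 片段作为参考(仅为示意: 设备索引、各类条数等取值是假设的示例值, 请按你机器的实际情况填写, 具体键名以 `env.template` 为准):

```bash
# FunASR 只接受 16000Hz / 单声道, 其他采样设置会由脚本重采样
AUDIO_FS=16000
AUDIO_CHANNEL=1

# 由 python interview/audioTest.py 列出的设备索引(示例值, 请替换)
AGGREGATE_DEVICE_INDEX=12
MIC_DEVICE_INDEX=1

# RAG 开关与检索辅助知识的条数(条数为假设的示例值)
RAG_ENABLED=True
MEILISEARCH_DEEP=3

# 终端中各类消息的显示数量, 同时约束传给 GPT 的对话条数(示例值)
INTERVIEWER_DIALOG_LEN=10
ROOKIE_DIALOG_LEN=10
CHATGPT_DIALOG_LEN=10

# GPT 接口配置: RequestSender.py 会直接向 CHATGPT_BASE_URL 发送 POST,
# 因此这里应填完整的 chat completions 端点(URL 和模型名仅为示例)
CHATGPT_BASE_URL=https://api.openai.com/v1/chat/completions
AUTHORIZATION=your-api-key-here
MODEL=gpt-4o
```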
147 | 148 | #### 配置 docker 149 | 150 | ##### FunASR Interview 服务 151 | 152 | 进入 Docker 容器内部: 153 | 154 | ```bash 155 | docker-compose exec funasr_interview bash 156 | ``` 157 | 158 | 在 Docker 容器内运行以下命令启动服务: 159 | 160 | ```bash 161 | cd FunASR/runtime 162 | 163 | 164 | nohup bash run_server_2pass.sh \ 165 | --model-dir damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-onnx \ 166 | --online-model-dir damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online-onnx \ 167 | --vad-dir damo/speech_fsmn_vad_zh-cn-16k-common-onnx \ 168 | --punc-dir damo/punc_ct-transformer_zh-cn-common-vad_realtime-vocab272727-onnx \ 169 | --lm-dir damo/speech_ngram_lm_zh-cn-ai-wesp-fst \ 170 | --itn-dir thuduj12/fst_itn_zh \ 171 | --certfile 0 \ 172 | --hotword ../../hotwords.txt > log.txt 2>&1 & 173 | ``` 174 | 175 | ##### FunASR Rookie 服务 176 | 177 | 进入 Docker 容器内部: 178 | 179 | ```bash 180 | docker-compose exec funasr_rookie bash 181 | ``` 182 | 183 | 在 Docker 容器内运行以下命令启动服务: 184 | 185 | ```bash 186 | cd FunASR/runtime 187 | 188 | 189 | nohup bash run_server_2pass.sh \ 190 | --model-dir damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-onnx \ 191 | --online-model-dir damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online-onnx \ 192 | --vad-dir damo/speech_fsmn_vad_zh-cn-16k-common-onnx \ 193 | --punc-dir damo/punc_ct-transformer_zh-cn-common-vad_realtime-vocab272727-onnx \ 194 | --lm-dir damo/speech_ngram_lm_zh-cn-ai-wesp-fst \ 195 | --itn-dir thuduj12/fst_itn_zh \ 196 | --certfile 0 \ 197 | --hotword ../../hotwords.txt > log.txt 2>&1 & 198 | ``` 199 | 200 | #### 运行 201 | 202 | ``` 203 | python interview/main.py 204 | ``` 205 | 206 | 使用`Command (⌘) + Option/Alt (⌥) + h` 调用`GPT` 询问. 207 | 208 | ## 截屏询问 GPT 209 | 210 | 这部分的需要的东西有: 211 | 212 | 1. `openai`(或者官转)的`api`和`Base_url`. (这部分没有`deepseek`等其他模型, 需要支持`多模态`) 213 | 2. 类似于`CloudFlare`的`R2`的`OSS`. (做图床用,后续可能会支持免费图床) **注意: 一定不能是仅限国内访问的 OSS, 需要美国的 chatGPT 也能访问的到** 214 | 215 | [R2 数据库教程(网上搜的)](https://blog.huacai.one/post/3) 216 | 217 | 在`.env`文件里面 218 | 219 | ``` 220 | # R2 Access Key ID 221 | # This is your R2 account's Access Key ID, used to authenticate API requests. 222 | # R2 Secret Access Key 223 | # This is your R2 account's Secret Access Key, used along with the Access Key ID to authenticate API requests. 224 | # R2 Bucket Name 225 | # The name of the R2 bucket you wish to interact with. This is where your data will be stored. 226 | # R2 Region 227 | # The region where your R2 bucket is located. For example, 'us-west-1'. 228 | # R2 Endpoint URL 229 | # The endpoint URL for accessing the R2 service. This typically includes the protocol and the domain name. 230 | # For example: https://your-endpoint-url.com 231 | # R2 Custom Domain 232 | # If you have a custom domain configured for accessing your R2 bucket, specify it here. 可以和R2 Endpoint URL相同 233 | R2_ACCESS_KEY_ID=your-access-key-id-here 234 | R2_SECRET_ACCESS_KEY=your-secret-access-key-here 235 | R2_BUCKET_NAME=your-bucket-name-here 236 | R2_REGION=your-region-here 237 | R2_ENDPOINT_URL=https://your-endpoint-url-here 238 | R2_CUSTOM_DOMAIN=your-custom-domain-here 239 | ``` 240 | 241 | ### 使用 242 | 243 | ``` 244 | python interview/main.py 245 | ``` 246 | 247 | 使用`Command (⌘) + Option/Alt (⌥) + a` 调用`algorithm Prompt` 询问. 248 | 249 | 使用`Command (⌘) + Option/Alt (⌥) + p` 调用`personality Prompt` 询问. 250 | 251 | 使用`Command (⌘) + Option/Alt (⌥) + g` 调用`general Prompt` 询问. 252 | 253 | 使用`Command (⌘) + Option/Alt (⌥) + l` 调用`long_screenshot Prompt` 询问. 
(截屏拼接) 254 | 255 | 使用`Command (⌘) + Option/Alt (⌥) + f` 调用`fix Prompt` 询问. 256 | 257 | 使用`Command (⌘) + Option/Alt (⌥) + o` 调用`ocr Prompt` 询问. 258 | 259 | ## QA 260 | 261 | 在这里罗列一些可能出现的问题,以及解决方案: 262 | 263 | ### FunASR 264 | 265 | 首先这个是`阿里巴巴` 提供的`实时语音转文字`服务. 但是它只支持单声道、16k 采样率. 很多时候我们的电脑不支持这种设置, 所以我在代码里做了`重采样`, 你只需要在你的`.env`中填写设备的真实参数即可. 266 | 通过`python interview/audioTest.py` 进行测试. 267 | 268 | **Q:** 报错: 2024-08-17 23:50:04,377 - modelscope - WARNING - Downloading: TLG.fst failed, reason: ('Connection broken: IncompleteRead(1164614 bytes read, 166607546 more expected)', IncompleteRead(1164614 bytes read, 166607546 more expected)) will retry 269 | 270 | **A:** 这是因为`modelScope`的下载服务出了问题, 可能是你在使用国外的网络, 导致`modelScope`中的模型下载失败. 如果出现了这个问题, 在`docker`的部分 271 | 使用下面的命令: 272 | 273 | ```bash 274 | docker-compose exec <服务名> bash 275 | ``` 276 | 277 | 在 Docker 容器内运行以下命令启动服务: 278 | 279 | ```bash 280 | cd FunASR/runtime 281 | 282 | 283 | bash run_server_2pass.sh \ 284 | --model-dir damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-onnx \ 285 | --online-model-dir damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online-onnx \ 286 | --vad-dir damo/speech_fsmn_vad_zh-cn-16k-common-onnx \ 287 | --punc-dir damo/punc_ct-transformer_zh-cn-common-vad_realtime-vocab272727-onnx \ 288 | --lm-dir damo/speech_ngram_lm_zh-cn-ai-wesp-fst \ 289 | --itn-dir thuduj12/fst_itn_zh \ 290 | --certfile 0 \ 291 | --hotword ../../hotwords.txt 292 | ``` 293 | 294 | 查看具体的报错和网络连接失败信息,进行修复. 295 | 296 | ### websocket 297 | 298 | **Q:** websocket 连接失败,或者网页长时间查看不了 299 | 300 | **A:** 检查 docker 相关的容器和本地机器是不是跨国部署了, 跨国访问的网络延迟会导致这种情况, 尽量部署在局域网或者国内. 301 | -------------------------------------------------------------------------------- /poetry.md: -------------------------------------------------------------------------------- 1 | `Poetry` 是一个用于 Python 项目管理和依赖管理的工具。如果你有一个使用 `Poetry` 管理的项目,并且想要安装和运行该项目的环境,可以按照以下步骤操作: 2 | 3 | ### 1. 安装 Poetry 4 | 5 | 首先,你需要确保本地已经安装了 `Poetry`。如果尚未安装,可以使用以下命令进行安装: 6 | 7 | ```bash 8 | curl -sSL https://install.python-poetry.org | python3 - 9 | ``` 10 | 11 | 安装完成后,确保 `Poetry` 已经添加到你的系统路径中,可以通过以下命令验证安装是否成功: 12 | 13 | ```bash 14 | poetry --version 15 | ``` 16 | 17 | 这将输出 Poetry 的版本信息,确认安装成功。 18 | 19 | ### 2. 克隆项目 20 | 21 | 如果项目托管在 GitHub 或其他代码仓库中,首先需要将项目克隆到本地: 22 | 23 | ```bash 24 | git clone <repository-url> 25 | cd <project-directory> 26 | ``` 27 | 28 | ### 3. 安装项目依赖 29 | 30 | 在项目根目录下,运行以下命令来安装项目的所有依赖项: 31 | 32 | ```bash 33 | poetry install 34 | ``` 35 | 36 | 这个命令将根据 `pyproject.toml` 和 `poetry.lock` 文件中定义的依赖安装所需的包。如果项目包含开发依赖项(如测试框架),这些依赖也会一并安装。 37 | 38 | ### 4. 进入虚拟环境 39 | 40 | `Poetry` 默认会为项目创建一个隔离的虚拟环境。你可以使用以下命令进入该虚拟环境: 41 | 42 | ```bash 43 | poetry shell 44 | ``` 45 | 46 | 进入虚拟环境后,所有命令(如 `python` 或 `pytest`)都会在虚拟环境中执行。 47 | 48 | ### 5. 运行项目 49 | 50 | 根据项目的结构,你可以通过以下方式运行项目: 51 | 52 | - **运行 Python 脚本**: 53 | 54 | 如果项目的入口是一个 Python 脚本,你可以直接运行它: 55 | 56 | ```bash 57 | python path/to/script.py 58 | ``` 59 | 60 | - **运行项目中的命令**: 61 | 62 | 如果项目在 `pyproject.toml` 中定义了 `scripts`,你可以使用 `poetry run` 来运行它们: 63 | 64 | ```bash 65 | poetry run <script-name> 66 | ``` 67 | 68 | - **运行项目的主模块**: 69 | 70 | 如果项目的入口是模块或包,你可以使用 `python -m` 运行它: 71 | 72 | ```bash 73 | python -m <module-name> 74 | ``` 75 | 76 | ### 6. 退出虚拟环境 77 | 78 | 完成工作后,你可以使用以下命令退出虚拟环境: 79 | 80 | ```bash 81 | exit 82 | ``` 83 | 84 | ### 7. 
其他常用命令 85 | 86 | - **添加新依赖**: 87 | 88 | 如果你需要添加新的依赖项,可以使用以下命令: 89 | 90 | ```bash 91 | poetry add 92 | ``` 93 | 94 | - **更新依赖**: 95 | 96 | 要更新项目的所有依赖项,可以运行: 97 | 98 | ```bash 99 | poetry update 100 | ``` 101 | 102 | - **运行测试**: 103 | 104 | 如果项目包含测试,你可以使用以下命令运行测试: 105 | 106 | ```bash 107 | poetry run pytest 108 | ``` 109 | 110 | 通过这些步骤,你可以安装并运行一个使用 `Poetry` 管理的 Python 项目。 111 | -------------------------------------------------------------------------------- /prompt/algorithm.md: -------------------------------------------------------------------------------- 1 | 你是一个先进的 AI 模型,负责解决算法问题。你将会看到一张包含算法问题的图片,通常包括代码片段、数学表达式或图表。你的任务是准确识别问题并提供详细的解决方案。请按照以下步骤进行: 2 | 3 | 1. **图像分析**:仔细分析图片内容,理解问题陈述。重点识别代码片段、数学公式以及任何相关的文字或图表。 4 | 5 | 2. **问题识别**:确定图片中提出的算法问题类型(例如,排序、动态规划、图论等)。用简明的语言总结出问题。 6 | 7 | 3. **解决思路**:提供逐步的解决思路。解释你将采用的算法或数据结构来解决问题。 8 | 9 | 4. **代码实现**:如果问题涉及代码,请提供清晰且有注释的代码实现,使用 Java 语言实现。确保代码高效并能处理边界情况。 10 | 11 | 5. **结果解释**:详细解释你的解决方案是如何工作的。包括时间复杂度和空间复杂度的分析,并讨论可能的替代方法或权衡。 12 | 13 | 6. **回答问题**:如果图片中提出了具体问题或要求输出特定结果,请确保你的解答能够直接回应这些问题。 14 | 15 | 请确保你的分析全面,并用清晰的中文表达解决方案,以确保完全理解和正确解答问题。 16 | -------------------------------------------------------------------------------- /prompt/fix.md: -------------------------------------------------------------------------------- 1 | **任务描述:** 2 | 3 | 我上传了一张 LeetCode 风格的代码界面截图,其中包含了一个编写中的 Java 代码实现。请你从图片中识别出以下内容并完成相应的任务: 4 | 5 | 1. **识别代码区域**:代码一般位于右侧,包含未完成的 Java 代码实现。可能存在自然语言的直接描述,这部分没有被注释符号包裹,通常是我对算法题的处理逻辑的描述。 6 | 7 | 2. **识别注释部分**:注释部分一般被注释符号(如 `//` 或 `/* */`)包裹,这部分通常包含我对问题的分析或对代码优化的要求。 8 | 9 | 3. **优化或修复代码**: 10 | 11 | - 如果我在注释中提到代码超时,请检查代码中的性能瓶颈,尝试通过降低时间复杂度或其他优化手段提高代码效率。 12 | - 如果代码中存在语法错误,请修复这些错误,并确保代码能够正确编译和运行。 13 | - 按照我的分析或要求对代码进行优化,确保逻辑正确且高效。 14 | 15 | 4. **实现或完善代码**:根据我注释的编写内容,可能会涉及让你实现的算法流程,使用 Java 语言实现或优化代码的逻辑。 16 | 17 | **示例**: 18 | 19 | - 如果注释中提到“超时了,请优化算法”,你应该检查代码中的循环或递归部分,尝试通过使用更高效的数据结构或算法来降低时间复杂度。 20 | 21 | 请帮助我完成上述任务,并输出优化后的代码。 22 | -------------------------------------------------------------------------------- /prompt/general.md: -------------------------------------------------------------------------------- 1 | 你是一位知识渊博的 AI 助手,擅长解答各种类型的问题。你将会看到一张图片,图片可能包含不同类型的内容,如开发知识、智力题、数学题、英语题等。你的任务是准确识别图片中的信息,并根据问题类型提供详细的解答。请按照以下步骤进行: 2 | 3 | 1. **图像分析**:仔细观察图片内容,识别其中的文字、符号、代码、图形等元素,并理解图片传达的主要信息或问题。 4 | 5 | 2. **问题分类**:根据图片内容判断问题类型(如开发相关的问题、数学题、智力题、英语题等)。简要描述问题的核心内容。 6 | 7 | 3. **提供解答**: 8 | - **开发知识**:如果是关于编程或开发的技术问题,提供适当的代码示例、算法解释或技术方案。 9 | - **智力题**:如果是智力题,分析题目逻辑,提供解题思路和最终答案。 10 | - **数学题**:如果是数学问题,列出相关的公式或定理,详细计算过程并得出结果。 11 | - **英语题**:如果是英语题,进行语法分析、翻译或单词解释,并提供正确答案或改进建议。 12 | 4. **结果解释**:对于每一个答案,提供详细的解释或推理过程,确保解答过程清晰且易于理解。 13 | 14 | 5. **回答问题**:如果图片中有明确的问题或要求,请确保你的回答能够准确回应并解决问题。 15 | 16 | 请用中文清晰准确地解答,确保问题得到全面分析和解决。 17 | -------------------------------------------------------------------------------- /prompt/long_screenshot.md: -------------------------------------------------------------------------------- 1 | 你是一个先进的 AI 模型,负责解决算法问题。你将会看到一张包含算法问题的拼接图片,通常包括代码片段、数学表达式或图表。你的任务是准确识别问题并提供详细的解决方案。请按照以下步骤进行: 2 | 3 | 1. **图像分析**:仔细分析图片内容,理解问题陈述。重点识别代码片段、数学公式以及任何相关的文字或图表。 4 | 5 | 2. **问题识别**:确定图片中提出的算法问题类型(例如,排序、动态规划、图论等)。用简明的语言总结出问题。 6 | 7 | 3. **解决思路**:提供逐步的解决思路。解释你将采用的算法或数据结构来解决问题。 8 | 9 | 4. **代码实现**:如果问题涉及代码,请提供清晰且有注释的代码实现,使用 Java 语言实现。确保代码高效并能处理边界情况。 10 | 11 | 5. **结果解释**:详细解释你的解决方案是如何工作的。包括时间复杂度和空间复杂度的分析,并讨论可能的替代方法或权衡。 12 | 13 | 6. 
**回答问题**:如果图片中提出了具体问题或要求输出特定结果,请确保你的解答能够直接回应这些问题。 14 | 15 | 请确保你的分析全面,并用清晰的中文表达解决方案,以确保完全理解和正确解答问题。 16 | -------------------------------------------------------------------------------- /prompt/ocr.md: -------------------------------------------------------------------------------- 1 | 我将上传一张或多张图片,图片可能是单张的,也可能是多张拼接在一起。这些图片中包含一个算法题以及相关的信息,包括但不限于题目描述、示例输入输出、解释等。请你完成以下任务: 2 | 3 | 1. **这个问题的解决思路和类似问题**: 4 | - 对题目进行详细分析,进行一个简单通俗的讲解. 5 | - 基于识别到的题目信息,提供一个可行的解决思路。 6 | - 提供一些类似的问题,为什么类似? 类似问题的解决思路是什么? 7 | - 使用 `Java` 语言, 针对此题编写相应的输入输出,其中类名为 `Main`. 将算法实现的逻辑抽象为一个函数. 8 | -------------------------------------------------------------------------------- /prompt/personality.md: -------------------------------------------------------------------------------- 1 | 你是一位专业的 AI 助手,专门用于解读性格测试。你将会看到一张性格测试的图片,图片中可能包含多项性格描述以及与之对应的选项。你的任务是根据性格测试的要求,选择最适合的答案。请按照以下步骤进行: 2 | 3 | 1. **图像分析**:仔细观察图片内容,识别图片中的性格描述、测试题目及其选项。重点关注描述中的关键词,如“紧张”“轻松”“乐观”“团队合作”等。 4 | 5 | 2. **符合性分析**:根据以下规则判断选项的符合性: 6 | 7 | - 如果选项中包含“我会紧张”或“重大事件让我焦虑”等描述,选择最不符合的选项。 8 | - 如果选项中包含“我很轻松”“我乐观”“我对未来充满希望”等描述,选择最符合的选项。 9 | - 如果选项中体现“团队合作”“听取他人意见”的精神,选择最符合的选项。 10 | 11 | 3. **选择最符合和最不符合的选项**: 12 | 13 | - **最符合的选项**:选择能够展现积极乐观、轻松、不易受外界影响、注重团队合作的选项。 14 | - **最不符合的选项**:选择体现紧张、忧虑、不信任他人、倾向独自决策的选项。 15 | 16 | 4. **结果解答**:清楚地指出你选择的最符合和最不符合的选项,并提供简要的理由说明。 17 | 18 | 请用清晰的中文进行回答,确保性格测试中的每个选项都得到了全面分析,并且选择符合测试要求的答案。 19 | -------------------------------------------------------------------------------- /pyproject.toml: -------------------------------------------------------------------------------- 1 | [tool.poetry] 2 | name = "interview" 3 | version = "0.1.0" 4 | description = "" 5 | authors = ["aowerdmax "] 6 | readme = "README.md" 7 | 8 | [tool.poetry.dependencies] 9 | python = "^3.12" 10 | aliyun-python-sdk-core = "^2.15.1" 11 | python-dotenv = "^1.0.1" 12 | websockets = "^12.0" 13 | aiohttp = "^3.10.3" 14 | meilisearch = "^0.31.4" 15 | boto3 = "^1.35.8" 16 | botocore = "^1.35.8" 17 | pynput = "^1.7.7" 18 | pyautogui = "^0.9.54" 19 | pillow = "^10.4.0" 20 | redis = "^5.0.8" 21 | numpy = "^2.1.1" 22 | 23 | 24 | [build-system] 25 | requires = ["poetry-core"] 26 | build-backend = "poetry.core.masonry.api" 27 | 28 | [virtualenvs] 29 | in-project = true 30 | -------------------------------------------------------------------------------- /screenshots/long_screenshot_20240829_212734.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AowerDmax/Simple-Interview-Audio-Processing/4857f35a8356854b611e33952528ce7c834353df/screenshots/long_screenshot_20240829_212734.png -------------------------------------------------------------------------------- /screenshots/screenshot_20240829_160848.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AowerDmax/Simple-Interview-Audio-Processing/4857f35a8356854b611e33952528ce7c834353df/screenshots/screenshot_20240829_160848.png -------------------------------------------------------------------------------- /screenshots/screenshot_20240829_213839.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AowerDmax/Simple-Interview-Audio-Processing/4857f35a8356854b611e33952528ce7c834353df/screenshots/screenshot_20240829_213839.png -------------------------------------------------------------------------------- /screenshots/screenshot_20240829_225747.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/AowerDmax/Simple-Interview-Audio-Processing/4857f35a8356854b611e33952528ce7c834353df/screenshots/screenshot_20240829_225747.png -------------------------------------------------------------------------------- /start_funasr_services.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | # Function to start FunASR service inside a Docker container 4 | start_service() { 5 | local service_name=$1 6 | 7 | echo "Starting service for ${service_name}..." 8 | 9 | # Execute the following commands inside the Docker container 10 | docker-compose exec ${service_name} bash -c " 11 | cd FunASR/runtime && \ 12 | nohup bash run_server_2pass.sh \ 13 | --model-dir damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-onnx \ 14 | --online-model-dir damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online-onnx \ 15 | --vad-dir damo/speech_fsmn_vad_zh-cn-16k-common-onnx \ 16 | --punc-dir damo/punc_ct-transformer_zh-cn-common-vad_realtime-vocab272727-onnx \ 17 | --lm-dir damo/speech_ngram_lm_zh-cn-ai-wesp-fst \ 18 | --itn-dir thuduj12/fst_itn_zh \ 19 | --certfile 0 \ 20 | --hotword ../../hotwords.txt > log.txt 2>&1 & 21 | " 22 | 23 | echo "Service for ${service_name} started successfully." 24 | } 25 | 26 | # Start FunASR Interview service 27 | start_service "funasr_interview" 28 | 29 | # Start FunASR Rookie service 30 | start_service "funasr_rookie" 31 | 32 | echo "All services have been started." 33 | -------------------------------------------------------------------------------- /tests/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AowerDmax/Simple-Interview-Audio-Processing/4857f35a8356854b611e33952528ce7c834353df/tests/__init__.py -------------------------------------------------------------------------------- /tmp/text.0_0: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AowerDmax/Simple-Interview-Audio-Processing/4857f35a8356854b611e33952528ce7c834353df/tmp/text.0_0 -------------------------------------------------------------------------------- /workflows/prompt_1/step1.md: -------------------------------------------------------------------------------- 1 | 你是一位智能面试助手,负责在面试过程中快速简要地回答面试官(interviewer)的问题。你需要根据面试对话列表中的最新问题,提供直接且简洁的回答。请按照以下步骤进行: 2 | 3 | 1. **对话列表分析**: 4 | 5 | - 检查对话列表中的最新条目,确定提问者是否为面试官(interviewer)。 6 | - 确认问题的类型,并理解问题的意图。 7 | 8 | 2. **简要回答**: 9 | 10 | - **技术问题**:提供关键概念或方法的简要说明,确保回答直接针对问题的核心。 11 | - **行为问题**:用简洁的语言提供直接的回答,突出主要行为或决策。 12 | - **情景模拟**:提供简要的解决方案,突出关键步骤或决策点。 13 | 14 | 3. **保持简洁**: 15 | - 确保回答控制在几句话之内,不需要深入的背景信息或额外的细节。 16 | 17 | 面试对话记录如下: 18 | 19 | ${conversation} 20 | -------------------------------------------------------------------------------- /workflows/prompt_1/step2.md: -------------------------------------------------------------------------------- 1 | 在完成简要回答后,你将进一步详细解释问题,并通过添加使用场景、案例和额外的分析来加分。你需要展示面试者的深入理解和应用能力。请按照以下步骤进行: 2 | 3 | 1. **扩展回答**: 4 | 5 | - 在之前简要回答的基础上,详细解释概念、原理或方法。 6 | - 提供与实际工作相关的使用场景或应用实例,展示问题在不同环境中的应用。 7 | 8 | 2. **举例说明**: 9 | 10 | - **技术问题**:添加一个或多个实际项目中的案例,展示如何应用所讨论的技术或方法。 11 | - **行为问题**:提供实际经历中的具体情境,解释面试者如何在类似情况下表现出色。 12 | - **情景模拟**:分析不同情境下可能的挑战,并展示如何通过合理的决策来应对这些挑战。 13 | 14 | 3. 
**加分回答**: 15 | - 讨论替代方案、优化方法或潜在的权衡,并提供对未来工作的建议或见解。 16 | - 确保回答不仅展示知识深度,还展示面试者的解决问题能力和创新思维。 17 | -------------------------------------------------------------------------------- /workflows/prompt_2/step1.md: -------------------------------------------------------------------------------- 1 | 你是一位智能面试助手,负责在面试过程中协助面试官(interviewer)和面试者(rookie)。你将根据面试对话列表中的最新问题,提供准确和有针对性的回答。面试的场景可能涉及技术问题、行为问题、情景模拟等。请按照以下步骤进行: 2 | 3 | 1. **对话列表分析**: 4 | 5 | - 检查对话列表中的最新条目,确定提问者是否为面试官(interviewer)。 6 | - 确认问题的类型(技术问题、行为问题、情景模拟等),并理解问题的意图。 7 | 8 | 2. **回答问题**: 9 | 10 | - **技术问题**:如果问题涉及技术知识,提供清晰且详细的解答,引用相关的概念、代码示例或技术实现。 11 | - **行为问题**:如果问题涉及行为评估,提供基于实际经验的回答,突出面试者的软技能,如团队合作、问题解决、时间管理等。 12 | - **情景模拟**:如果问题涉及情景模拟,分析情景并提供最佳解决方案,展示逻辑思维和决策能力。 13 | 14 | 3. **回答格式**: 15 | 16 | - 只需要回答最新问题,不需要额外输出. 17 | - 回答先提及答案的关键字,然后简要回答,使用表格回答 18 | - 确保回答简洁明了,能够直接回应面试官的提问. 使用面试者的语气和用词. 19 | 20 | 面试对话记录如下: 21 | 22 | ${conversation} 23 | --------------------------------------------------------------------------------
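`workflows` 目录下的提示词模板通过 `${conversation}` 占位符接收整理好的面试对话记录. 下面是一个最小化的替换示意(假设用 Python 标准库的 `string.Template` 来填充; 实际 `ChatgptManager` 的读取与拼接实现可能不同):

```python
from string import Template

# 读取一个 workflow 步骤的提示词模板(仓库内的真实文件路径)
with open("workflows/prompt_1/step1.md", "r", encoding="utf-8") as f:
    template = Template(f.read())

# 用对话记录替换 ${conversation} 占位符; safe_substitute 在缺少变量时不会抛异常
conversation = "🎤 interviewer :\n\n 你为什么要使用消息队列呢?"
prompt = template.safe_substitute(conversation=conversation)
print(prompt)
```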