├── .gitignore
├── .pre-commit-config.yaml
├── LICENSE
├── PROJECT_PLANS.md
├── README.md
├── assets
    └── logo.jpg
├── config.json
├── pyproject.toml
├── src
    └── whisperchain
    │   ├── cli
    │       ├── run.py
    │       ├── run_client.py
    │       └── run_server.py
    │   ├── client
    │       ├── key_listener.py
    │       └── stream_client.py
    │   ├── core
    │       ├── __init__.py
    │       ├── audio.py
    │       ├── chain.py
    │       └── config.py
    │   ├── prompts
    │       └── transcription_cleanup.txt
    │   ├── server
    │       ├── __init__.py
    │       └── server.py
    │   ├── ui
    │       └── streamlit_app.py
    │   └── utils
    │       ├── decorators.py
    │       ├── logger.py
    │       ├── secrets.py
    │       └── segment.py
└── tests
    ├── test_audio_capture.py
    ├── test_chain.py
    ├── test_key_listener.py
    ├── test_pywhispercpp.py
    ├── test_stream_client.py
    └── test_utils.py


/.gitignore:
--------------------------------------------------------------------------------
# Python cache
__pycache__/
*.pyc
*.egg-info/

# Build
build/

--------------------------------------------------------------------------------
/.pre-commit-config.yaml:
--------------------------------------------------------------------------------
default_language_version:
  python: python3

repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.4.0
    hooks:
      - id: check-ast
      - id: check-merge-conflict
      - id: check-yaml
      - id: end-of-file-fixer
      - id: trailing-whitespace
        args: [--markdown-linebreak-ext=md]

  - repo: https://github.com/psf/black
    rev: 23.3.0
    hooks:
      - id: black
        language_version: python3
        args: ["--line-length", "99"]

  - repo: https://github.com/pycqa/isort
    rev: 5.12.0
    hooks:
      - id: isort
        exclude: README.md
        args: ["--profile", "black"]

  # jupyter notebook cell output clearing
  - repo: https://github.com/kynan/nbstripout
    rev: 0.6.1
    hooks:
      - id: nbstripout

--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
Copyright (c) 2025, Chris Choy
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

* Redistributions of source code must retain the above copyright notice, this
  list of conditions and the following disclaimer.

* Redistributions in binary form must reproduce the above copyright notice,
  this list of conditions and the following disclaimer in the documentation
  and/or other materials provided with the distribution.

* Neither the name of the PyAutoGUI nor the names of its
  contributors may be used to endorse or promote products derived from
  this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
--------------------------------------------------------------------------------
/PROJECT_PLANS.md:
--------------------------------------------------------------------------------
# Project Plan

## Final Product

- Press to talk
- Transcribe speech to text using pywhispercpp
  - Use [pywhispercpp](https://github.com/absadiki/pywhispercpp) for Whisper.cpp integration
  - Support Apple silicon chips (M1, M2, ...)
  - Support CUDA for GPU acceleration
  - Support real-time transcription via WebSocket
- Use LangChain to parse the text and clean up the text
  - E.g. "Ehh what is the emm wheather like in SF? no, Salt Lake City" -> "What is the weather like in Salt Lake City?"
  - Support multiple LLM providers

## Milestones

- [x] Speech to text setup
  - [x] Install pywhispercpp with CoreML support for Apple Silicon
  - [x] Basic transcription test
- [x] Basic git commit hook to check if the code is formatted
  - [x] Format the code
- [x] Voice Processing Server
  - [x] FastAPI server setup
  - [x] Audio upload endpoint
  - [x] Streaming audio support
- [x] LangChain Integration
  - [x] Test OpenAI API Key loading
  - [x] Chain configuration
  - [x] Text processing pipeline
  - [x] Response formatting
  - [ ] Support other LLMs (DeepSeek, Gemini, ...)
  - [ ] Local LLM support
- [ ] Press to talk
  - [x] Key listener
  - [x] Capture a hot key regardless of the current application
  - [x] Put the final result in the system clipboard
  - [ ] Show an icon when voice control is active
- [x] Command line interface
  - [x] Add a command line interface using `click`
- [x] Web UI
  - [x] Streamlit UI
  - [x] Visualize (input audio), transcription, and output text
  - [x] Visualize transcription history
  - [ ] Prompt config
  - [ ] LangChain config
- [ ] Context Management
  - [ ] System prompt configuration
  - [ ] Chat history persistence
- [ ] Documentation
  - [ ] API Documentation
  - [ ] Usage examples and guides

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Whisper Chain

Whisper Chain Logo

## Overview

Typing is boring; use your voice to speed up your workflow. This project combines:
- Real-time speech recognition using Whisper.cpp
- Transcription cleanup using LangChain
- Global hotkey support for voice control
- Automatic clipboard integration for the cleaned transcription

## Requirements

- Python 3.8+
- OpenAI API key
- For macOS:
  - ffmpeg (for audio processing)
  - portaudio (for audio capture)

## Installation

1. Install system dependencies (macOS):
```bash
# Install ffmpeg and portaudio using Homebrew
brew install ffmpeg portaudio
```

2. Install the project:
```bash
pip install whisperchain
```

## Configuration

WhisperChain looks for configuration in the following locations:
1. Environment variables
2. .env file in the current directory
3. ~/.whisperchain/.env file

On first run, if no configuration is found, you will be prompted to enter your OpenAI API key. The key is saved in `~/.whisperchain/.env` for future use.

You can also set your OpenAI API key manually in any of these ways:
```bash
# Option 1: Environment variable
export OPENAI_API_KEY=your-api-key-here

# Option 2: Create .env file in current directory
echo "OPENAI_API_KEY=your-api-key-here" > .env

# Option 3: Create global config
mkdir -p ~/.whisperchain
echo "OPENAI_API_KEY=your-api-key-here" > ~/.whisperchain/.env
```

## Usage

1. Start the application:
```bash
# Run with default settings
whisperchain

# Run with custom configuration
whisperchain --config config.json

# Override specific settings
whisperchain --port 8080 --hotkey "++t" --model "large" --debug
```

3. Use the global hotkey (`++r` by default. `+