├── .gitignore
├── LICENSE
├── README.md
├── main.py
├── requirements.txt
└── run_meeting_summarizer.sh
/.gitignore:
--------------------------------------------------------------------------------
.venv
/flagged
transcript.txt
/whisper.cpp
.DS_Store
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2024 Alexis Balayre

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# AI-Powered Meeting Summarizer

## Overview

The **AI-Powered Meeting Summarizer** is a Gradio-powered application that converts audio recordings of meetings into transcripts and provides concise summaries, using `whisper.cpp` for audio-to-text conversion and `Ollama` for text summarization. This tool is ideal for quickly extracting key points, decisions, and action items from meetings.

*(Screenshot: the Gradio web interface of the Meeting Summarizer.)*

Demo video: https://github.com/user-attachments/assets/2f1de19d-0feb-4a35-a6ab-f9be8dabf512

## Features

- **Audio-to-Text Conversion**: Uses `whisper.cpp` to convert audio files into text.
- **Text Summarization**: Uses models from the `Ollama` server to summarize the transcript.
- **Multiple Model Support**: Supports different Whisper models (`base`, `small`, `medium`, `large-v3`) and any available model from the Ollama server.
- **Translation**: Allows translation of non-English audio to English using Whisper.
- **Gradio Interface**: Provides a user-friendly web interface to upload audio files, view summaries, and download transcripts.

## Requirements

- Python 3.x
- [FFmpeg](https://www.ffmpeg.org/) (for audio processing)
- [Whisper.cpp](https://github.com/ggerganov/whisper.cpp) (for audio-to-text conversion)
- [Ollama server](https://ollama.com/) (for text summarization)
- [Gradio](https://www.gradio.app/) (for the web interface)
- [Requests](https://requests.readthedocs.io/en/latest/) (for handling API calls to the Ollama server)

## Pre-Installation

Before running the application, make sure Ollama is running on your local machine or on a server. You can follow the instructions in the [Ollama repository](https://github.com/ollama/ollama) to set up the server. Don't forget to download and run a model from the Ollama server:

```bash
# To install and run Llama 3.2
ollama run llama3.2
```
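To confirm the server is reachable before launching the app, you can query the same `/api/tags` endpoint that `main.py` uses to populate its model dropdown. A minimal check, assuming the default server address:

```python
import requests

OLLAMA_SERVER_URL = "http://localhost:11434"  # default address; adjust if your server differs

# List the models installed on the Ollama server (the same endpoint main.py calls).
response = requests.get(f"{OLLAMA_SERVER_URL}/api/tags", timeout=5)
response.raise_for_status()
for model in response.json()["models"]:
    print(model["model"])
```

If this prints at least one model name, the summarization dropdown in the app will have something to work with.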
## Installation

Follow the steps below to set up and run the application:

### Step 1: Clone the Repository

```bash
git clone https://github.com/AlexisBalayre/AI-Powered-Meeting-Summarizer
cd AI-Powered-Meeting-Summarizer
```

### Step 2: Run the Setup Script

To install all necessary dependencies (including a Python virtual environment, `whisper.cpp`, FFmpeg, and Python packages) and run the application, execute the provided setup script:

```bash
chmod +x run_meeting_summarizer.sh
./run_meeting_summarizer.sh
```

This script will:

- Create and activate a Python virtual environment.
- Install the necessary Python packages (`requests` and `gradio`).
- Check whether `FFmpeg` is installed and install it if missing.
- Clone and build `whisper.cpp`.
- Download the required Whisper model (default: `small`).
- **Run the `main.py` script**, which starts the Gradio interface for the application.

### Step 3: Accessing the Application

Once the setup and execution are complete, Gradio will provide a URL (typically `http://127.0.0.1:7860`). Open this URL in your web browser to access the Meeting Summarizer interface.

Alternatively, after setup, you can activate the virtual environment and run the Python script manually:

```bash
# Activate the virtual environment
source .venv/bin/activate

# Run the main.py script
python main.py
```

## Usage

### Uploading an Audio File

1. **Upload an Audio File**: Click on the audio upload area and select an audio file in any supported format (e.g., `.wav`, `.mp3`).
2. **Provide Context (Optional)**: You can provide additional context for better summarization (e.g., "Meeting about AI and Ethics").
3. **Select Whisper Model**: Choose one of the available Whisper models (`base`, `small`, `medium`, `large-v3`) for audio-to-text conversion.
4. **Select Summarization Model**: Choose a model from the available options retrieved from the `Ollama` server.

### Viewing Results

- After uploading an audio file, you will get a **Summary** of the transcript generated by the selected models.
- You can also **download the full transcript** as a text file by clicking the provided link.

## Customization

### Changing the Whisper Model

By default, the Whisper model used is `small`. You can modify this in the `run_meeting_summarizer.sh` script by changing the `WHISPER_MODEL` variable:

```bash
WHISPER_MODEL="medium"
```

Alternatively, you can select a different Whisper model from the dropdown in the Gradio interface. The list of available models is generated dynamically from the `.bin` files found in the `whisper.cpp/models` directory, as shown in the sketch below.
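This detection is what `get_available_whisper_models` in `main.py` implements. A trimmed-down sketch of the same logic, assuming the default directory layout (it omits the official-model-name whitelist that `main.py` also applies):

```python
import os

WHISPER_MODEL_DIR = "./whisper.cpp/models"

# Collect downloaded models by stripping the "ggml-" prefix and ".bin" suffix,
# skipping whisper.cpp's bundled test models (this mirrors main.py's filter).
models = sorted(
    os.path.splitext(f)[0].replace("ggml-", "")
    for f in os.listdir(WHISPER_MODEL_DIR)
    if f.endswith(".bin") and "test" not in f
)
print(models)  # e.g. ['small'] after the default setup
```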
### Downloading Additional Whisper Models

To download a different Whisper model (e.g., `base`, `medium`, `large-v3`), follow these steps:

1. Navigate to the `whisper.cpp` directory:

   ```bash
   cd whisper.cpp
   ```

2. Use the provided script to download the desired model. For example, to download the `base` model, run:

   ```bash
   ./models/download-ggml-model.sh base
   ```

   For the `large-v3` model, run:

   ```bash
   ./models/download-ggml-model.sh large-v3
   ```

   This downloads the corresponding `.bin` file into the `whisper.cpp/models` directory.

3. Once downloaded, the new model will automatically appear in the model dropdown when you restart the application.

### Configuring Translation

By default, the `whisper.cpp` command in `main.py` passes no language flag, so Whisper assumes the audio is in English. The spoken language is controlled by the `-l` flag: use `-l auto` to let Whisper detect the language automatically, or pass a specific language code. For example:

```bash
./whisper.cpp/main -m ./whisper.cpp/models/ggml-{WHISPER_MODEL}.bin -l fr -f "{audio_file_wav}"
```

This tells Whisper to treat the audio as French. To translate non-English audio into English, add the `-tr` (`--translate`) flag to the command, as in the sketch below.
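Here is a minimal sketch of how the command built in `translate_and_summarize` could be parameterized with these options; `build_whisper_command`, `language`, and `translate` are illustrative names, not part of the current code:

```python
# Sketch only: main.py currently passes neither -l nor -tr to whisper.cpp.
def build_whisper_command(
    model_name: str, audio_file_wav: str, language: str = "auto", translate: bool = False
) -> list[str]:
    cmd = [
        "./whisper.cpp/main",
        "-m", f"./whisper.cpp/models/ggml-{model_name}.bin",
        "-l", language,  # spoken language; "auto" asks Whisper to detect it
        "-f", audio_file_wav,
    ]
    if translate:
        cmd.append("-tr")  # translate the transcription into English
    return cmd
```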
## License

This project is licensed under the MIT License. See the `LICENSE` file for details.

## Acknowledgements

- **whisper.cpp** by Georgi Gerganov for the audio-to-text conversion.
- **Gradio** for the interactive web interface framework.
- **Ollama** for providing large language models for summarization.
--------------------------------------------------------------------------------
/main.py:
--------------------------------------------------------------------------------
import subprocess
import os
import gradio as gr
import requests
import json

OLLAMA_SERVER_URL = "http://localhost:11434"  # Replace this with your actual Ollama server URL if different
WHISPER_MODEL_DIR = "./whisper.cpp/models"  # Directory where Whisper models are stored


def get_available_models() -> list[str]:
    """
    Retrieves a list of all available models from the Ollama server and extracts the model names.

    Returns:
        A list of model names available on the Ollama server.
    """
    response = requests.get(f"{OLLAMA_SERVER_URL}/api/tags")
    if response.status_code == 200:
        models = response.json()["models"]
        llm_model_names = [model["model"] for model in models]  # Extract model names
        return llm_model_names
    else:
        raise Exception(
            f"Failed to retrieve models from Ollama server: {response.text}"
        )


def get_available_whisper_models() -> list[str]:
    """
    Retrieves a list of available Whisper models based on the downloaded .bin files in the
    whisper.cpp/models directory. Filters out test models and only includes official
    Whisper models (e.g., base, small, medium, large).

    Returns:
        A list of available Whisper model names (e.g., 'base', 'small', 'medium', 'large-v3').
    """
    # List of acceptable official Whisper models
    valid_models = ["base", "small", "medium", "large", "large-v3"]

    # Get the list of model files in the models directory
    model_files = [f for f in os.listdir(WHISPER_MODEL_DIR) if f.endswith(".bin")]

    # Filter out test models and models that aren't in the valid list
    whisper_models = [
        os.path.splitext(f)[0].replace("ggml-", "")
        for f in model_files
        if any(valid_model in f for valid_model in valid_models) and "test" not in f
    ]

    # Remove any potential duplicates
    whisper_models = list(set(whisper_models))

    return whisper_models


def summarize_with_model(llm_model_name: str, context: str, text: str) -> str:
    """
    Uses a specified model on the Ollama server to generate a summary.
    Handles streaming responses by processing each line of the response.

    Args:
        llm_model_name (str): The name of the model to use for summarization.
        context (str): Optional context for the summary, provided by the user.
        text (str): The transcript text to summarize.

    Returns:
        str: The generated summary text from the model.
    """
    prompt = f"""You are given a transcript from a meeting, along with some optional context.

Context: {context if context else 'No additional context provided.'}

The transcript is as follows:

{text}

Please summarize the transcript."""

    headers = {"Content-Type": "application/json"}
    data = {"model": llm_model_name, "prompt": prompt}

    response = requests.post(
        f"{OLLAMA_SERVER_URL}/api/generate", json=data, headers=headers, stream=True
    )

    if response.status_code == 200:
        full_response = ""
        try:
            # Process the streaming response line by line
            for line in response.iter_lines():
                if line:
                    # Decode each line and parse it as a JSON object
                    decoded_line = line.decode("utf-8")
                    json_line = json.loads(decoded_line)
                    # Extract the "response" part from each JSON object
                    full_response += json_line.get("response", "")
                    # If "done" is True, stop reading the stream
                    if json_line.get("done", False):
                        break
            return full_response
        except json.JSONDecodeError:
            print("Error: Response contains invalid JSON data.")
            return f"Failed to parse the response from the server. Raw response: {response.text}"
    else:
        raise Exception(
            f"Failed to summarize with model {llm_model_name}: {response.text}"
        )
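# Example usage (illustrative values, assumes an Ollama server with llama3.2 pulled):
#   summary = summarize_with_model(
#       "llama3.2",
#       "Weekly engineering sync",
#       "Alice: the release is on track for Friday. Bob: docs still need review.",
#   )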
def preprocess_audio_file(audio_file_path: str) -> str:
    """
    Converts the input audio file to WAV format with a 16kHz sample rate and a mono channel.

    Args:
        audio_file_path (str): Path to the input audio file.

    Returns:
        str: The path to the preprocessed WAV file.
    """
    output_wav_file = f"{os.path.splitext(audio_file_path)[0]}_converted.wav"

    # Ensure ffmpeg converts to a 16kHz sample rate and a mono channel.
    # Passing the arguments as a list avoids shell-quoting issues with file paths.
    cmd = ["ffmpeg", "-y", "-i", audio_file_path, "-ar", "16000", "-ac", "1", output_wav_file]
    subprocess.run(cmd, check=True)

    return output_wav_file
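# Example (hypothetical filename): preprocess_audio_file("meeting.mp3") runs
#   ffmpeg -y -i meeting.mp3 -ar 16000 -ac 1 meeting_converted.wav
# and returns "meeting_converted.wav".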
def translate_and_summarize(
    audio_file_path: str, context: str, whisper_model_name: str, llm_model_name: str
) -> tuple[str, str]:
    """
    Transcribes the audio file into text using the whisper.cpp model and generates a summary
    using Ollama. Also provides the transcript file for download.

    Args:
        audio_file_path (str): Path to the input audio file.
        context (str): Optional context to include in the summary.
        whisper_model_name (str): Whisper model to use for audio-to-text conversion.
        llm_model_name (str): Model to use for summarizing the transcript.

    Returns:
        tuple[str, str]: A tuple containing the summary and the path to the transcript file for download.
    """
    output_file = "output.txt"

    print("Processing audio file:", audio_file_path)

    # Convert the input file to WAV format if necessary
    audio_file_wav = preprocess_audio_file(audio_file_path)

    print("Audio preprocessed:", audio_file_wav)

    # Call the whisper.cpp binary and redirect the transcript to the output file
    whisper_command = f'./whisper.cpp/main -m ./whisper.cpp/models/ggml-{whisper_model_name}.bin -f "{audio_file_wav}" > {output_file}'
    subprocess.run(whisper_command, shell=True, check=True)

    print("whisper.cpp executed successfully")

    # Read the output from the transcript
    with open(output_file, "r") as f:
        transcript = f.read()

    # Save the transcript to a downloadable file
    transcript_file = "transcript.txt"
    with open(transcript_file, "w") as transcript_f:
        transcript_f.write(transcript)

    # Generate a summary of the transcript using the selected Ollama model
    summary = summarize_with_model(llm_model_name, context, transcript)

    # Clean up temporary files
    os.remove(audio_file_wav)
    os.remove(output_file)

    # Return the summary text and the path to the downloadable transcript
    return summary, transcript_file


# Gradio interface
def gradio_app(
    audio, context: str, whisper_model_name: str, llm_model_name: str
) -> tuple[str, str]:
    """
    Gradio application to handle file upload, model selection, and summary generation.

    Args:
        audio: The uploaded audio file.
        context (str): Optional context provided by the user.
        whisper_model_name (str): The selected Whisper model name.
        llm_model_name (str): The selected language model for summarization.

    Returns:
        tuple[str, str]: A tuple containing the summary text and a downloadable transcript file.
    """
    return translate_and_summarize(audio, context, whisper_model_name, llm_model_name)


# Main function to launch the Gradio interface
if __name__ == "__main__":
    # Retrieve available models for the Gradio dropdown inputs
    ollama_models = get_available_models()  # Retrieve models from the Ollama server
    whisper_models = (
        get_available_whisper_models()
    )  # Dynamically detect downloaded Whisper models

    # Preselect the first model of each list by default, if one is available
    iface = gr.Interface(
        fn=gradio_app,
        inputs=[
            gr.Audio(type="filepath", label="Upload an audio file"),
            gr.Textbox(
                label="Context (optional)",
                placeholder="Provide any additional context for the summary",
            ),
            gr.Dropdown(
                choices=whisper_models,
                label="Select a Whisper model for audio-to-text conversion",
                value=whisper_models[0] if whisper_models else None,
            ),
            gr.Dropdown(
                choices=ollama_models,
                label="Select a model for summarization",
                value=ollama_models[0] if ollama_models else None,
            ),
        ],
        outputs=[
            gr.Textbox(
                label="Summary",
                show_copy_button=True,
            ),  # Display the summary generated by the Ollama model
            gr.File(
                label="Download Transcript"
            ),  # Provide the transcript as a downloadable file
        ],
        analytics_enabled=False,
        title="Meeting Summarizer",
        description="Upload an audio file of a meeting and get a summary of the key concepts discussed.",
    )

    iface.launch(debug=True)
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
aiofiles==23.2.1
annotated-types==0.7.0
anyio==4.6.0
attrs==24.2.0
cattrs==24.1.2
certifi==2024.8.30
charset-normalizer==3.3.2
click==8.1.7
contourpy==1.3.0
cycler==0.12.1
fastapi==0.115.0
ffmpy==0.4.0
filelock==3.16.1
fonttools==4.54.1
fsspec==2024.9.0
gradio==4.44.1
gradio_client==1.3.0
h11==0.14.0
httpcore==1.0.5
httpx==0.27.2
huggingface-hub==0.25.1
idna==3.10
importlib_resources==6.4.5
Jinja2==3.1.4
kiwisolver==1.4.7
llvmlite==0.43.0
markdown-it-py==3.0.0
MarkupSafe==2.1.5
matplotlib==3.9.2
mdurl==0.1.2
more-itertools==10.5.0
mpmath==1.3.0
networkx==3.3
numba==0.60.0
numpy==2.0.2
openai-whisper==20240930
orjson==3.10.7
packaging==24.1
pandas==2.2.3
pillow==10.4.0
protobuf==3.20.1
pyaml==24.9.0
pydantic==2.9.2
pydantic_core==2.23.4
pydub==0.25.1
Pygments==2.18.0
pyparsing==3.1.4
python-dateutil==2.9.0.post0
python-multipart==0.0.12
pytz==2024.2
PyYAML==6.0.2
regex==2024.9.11
requests==2.32.3
rich==13.9.0
ruff==0.6.8
safetensors==0.4.5
semantic-version==2.10.0
setuptools==75.1.0
shellingham==1.5.4
six==1.16.0
sniffio==1.3.1
starlette==0.38.6
sympy==1.13.3
tiktoken==0.7.0
tokenizers==0.20.0
tomlkit==0.12.0
torch==2.4.1
tqdm==4.66.5
transformers==4.45.1
typer==0.12.5
typing_extensions==4.12.2
tzdata==2024.2
urllib3==2.2.3
uvicorn==0.31.0
websockets==12.0
--------------------------------------------------------------------------------
/run_meeting_summarizer.sh:
--------------------------------------------------------------------------------
#!/bin/bash

# Set up variables
VENV_DIR=".venv"
PYTHON_VERSION="python3"
WHISPER_CPP_REPO="https://github.com/ggerganov/whisper.cpp.git"
WHISPER_CPP_DIR="whisper.cpp"
WHISPER_MODEL="small" # Change to "base", "medium", "large-v3", etc. as needed
PYTHON_SCRIPT="main.py" # Change this to the name of your Python script

# Step 1: Check if Python is installed
if ! command -v "$PYTHON_VERSION" &>/dev/null; then
    echo "Python3 is not installed. Please install Python 3.x to continue."
    exit 1
fi

# Step 2: Create a virtual environment if it doesn't exist
if [ ! -d "$VENV_DIR" ]; then
    echo "Creating a virtual environment in $VENV_DIR..."
    "$PYTHON_VERSION" -m venv "$VENV_DIR"
fi

# Step 3: Activate the virtual environment
echo "Activating the virtual environment..."
source "$VENV_DIR/bin/activate"

# Step 4: Upgrade pip
echo "Upgrading pip..."
pip install --upgrade pip

# Step 5: Install Python dependencies
echo "Installing required Python dependencies (requests, gradio)..."
pip install requests gradio

# Step 6: Install FFmpeg if not installed
if ! command -v ffmpeg &>/dev/null; then
    echo "FFmpeg is not installed. Installing FFmpeg..."
    if [[ "$OSTYPE" == "darwin"* ]]; then
        # For macOS, using Homebrew
        if ! command -v brew &>/dev/null; then
            echo "Homebrew is not installed. Please install Homebrew from https://brew.sh/ and rerun this script."
            exit 1
        fi
        brew install ffmpeg
    elif [[ "$OSTYPE" == "linux-gnu"* ]]; then
        sudo apt-get update
        sudo apt-get install -y ffmpeg
    fi
else
    echo "FFmpeg is already installed."
fi

# Step 7: Clone whisper.cpp if not already present
if [ ! -d "$WHISPER_CPP_DIR" ]; then
    echo "Cloning whisper.cpp repository..."
    git clone "$WHISPER_CPP_REPO"
fi

# Step 8: Build whisper.cpp
echo "Building whisper.cpp..."
cd "$WHISPER_CPP_DIR"
if ! make; then
    echo "Failed to build whisper.cpp. Please check for errors."
    exit 1
fi
cd ..

# Step 9: Download the Whisper model if it is not already present
if [ ! -f "./$WHISPER_CPP_DIR/models/ggml-$WHISPER_MODEL.bin" ]; then
    echo "Downloading the '$WHISPER_MODEL' Whisper model for whisper.cpp..."
    ./$WHISPER_CPP_DIR/models/download-ggml-model.sh "$WHISPER_MODEL"
else
    echo "Whisper model '$WHISPER_MODEL' already downloaded."
fi

# Step 10: Run the Python script
if [ -f "$PYTHON_SCRIPT" ]; then
    echo "Running the Python script..."
    python "$PYTHON_SCRIPT"
else
    echo "Python script '$PYTHON_SCRIPT' not found. Please ensure the script exists."
    exit 1
fi

# Optional: Deactivate the environment after the script exits
deactivate
--------------------------------------------------------------------------------