├── .gitignore
├── requirements.txt
├── .streamlit
│   └── config.toml
├── LICENSE
├── README.md
└── mlx_whisper_transcribe.py
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
pages/local_video*
local_video*
__pycache__
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
streamlit
streamlit-lottie
mlx
mlx-whisper
requests
numpy
--------------------------------------------------------------------------------
/.streamlit/config.toml:
--------------------------------------------------------------------------------
[theme]
primaryColor="#F63366"
backgroundColor="#FFFFFF"
secondaryBackgroundColor="#F0F2F6"
textColor="#262730"
font="sans serif"

[server]
maxUploadSize=8000
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2022 Batuhan Yılmaz

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Apple MLX Powered Video Transcription

This Streamlit application allows users to upload video files and generate accurate transcripts using Apple's MLX framework.

Follow me on X: [@RayFernando1337](https://x.com/rayfernando1337/)

YouTube: [@RayFernando1337](https://www.youtube.com/@rayfernando1337)

[Watch the demo video](https://github.com/user-attachments/assets/937ad360-6df2-4ea7-a3d0-6d9b22a6404a)

### Planned Features (Work in Progress)

- Translation to English and transcription.

## Important Note

⚠️ This application is designed to run on Apple Silicon (M series) Macs only. It utilizes the MLX framework, which is optimized for Apple's custom chips.

## Getting Started

### Prerequisites

- An Apple Silicon (M series) Mac
- Conda package manager

If you don't have Conda installed on your Mac, you can follow the [Ultimate Guide to Installing Miniforge for AI Development on M1 Macs](https://www.rayfernando.ai/ultimate-guide-installing-miniforge-ai-development-m1-macs) for a comprehensive setup process.

### Installation

1. Clone the repository:
```
git clone https://github.com/RayFernando1337/MLX-Auto-Subtitled-Video-Generator.git
cd MLX-Auto-Subtitled-Video-Generator
```

2. Create a new Conda environment with Python 3.12:
```
conda create -n mlx-whisper python=3.12
conda activate mlx-whisper
```

3. Install the required dependencies:
```
pip install -r requirements.txt
```

4. Install FFmpeg (required for audio processing):
```
brew install ffmpeg
```

Note: If you don't have Homebrew installed, you can install it by running the following command in your terminal:
```
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
```

After installation, follow the instructions provided in the terminal to add Homebrew to your PATH. For more information about Homebrew, visit [brew.sh](https://brew.sh/).

### Running the Application

To run the Streamlit application, use the following command:

`streamlit run mlx_whisper_transcribe.py`

## Features

- Upload video files (MP4, AVI, MOV, MKV)
- Choose between transcription and translation tasks
- Select from multiple Whisper models
- Generate VTT and SRT subtitle files
- Download transcripts as a ZIP file

## How It Works

1. Upload a video file
2. Select the task (Transcribe or Translate)
3. Choose a Whisper model
4. Click the task button to process the video
5. View the results and download the generated transcripts

## Models

The application supports the following Whisper models:

- Tiny (Q4)
- Large v3
- Small English (Q4)
- Small (FP32)

Each model has different capabilities and processing speeds. Experiment with different models to find the best balance between accuracy and performance for your needs.

## Troubleshooting

If you encounter any issues, please check the following:

- Ensure you're using an Apple Silicon Mac
- Verify that all dependencies are correctly installed
- Check the console output for any error messages

For any persistent problems, please open an issue in the repository.
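Before opening an issue, it can help to confirm the two hard requirements (an Apple Silicon Mac and FFmpeg on your `PATH`) directly from Python. This is a small standalone sketch, not part of the app, and the `check_prerequisites` helper is a name of my choosing:

```python
import platform
import shutil


def check_prerequisites() -> dict:
    """Report whether this machine meets the app's hard requirements."""
    return {
        # Apple Silicon Macs report 'arm64' on macOS ('Darwin').
        "apple_silicon": platform.system() == "Darwin" and platform.machine() == "arm64",
        # FFmpeg must be on PATH for audio extraction.
        "ffmpeg_on_path": shutil.which("ffmpeg") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'OK' if ok else 'MISSING'}")
```

If either line prints `MISSING`, fix that prerequisite before digging further.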
## Acknowledgements

This project is a fork of the [original Auto-Subtitled Video Generator](https://github.com/BatuhanYilmaz26/Auto-Subtitled-Video-Generator) by Batuhan Yılmaz. I deeply appreciate his contribution to the open-source community.
--------------------------------------------------------------------------------
/mlx_whisper_transcribe.py:
--------------------------------------------------------------------------------
import streamlit as st
from streamlit_lottie import st_lottie
import mlx.core as mx
import mlx_whisper
import requests
from typing import Any, Dict, List, Optional
import pathlib
import os
import base64
import logging
from zipfile import ZipFile
import subprocess
import numpy as np

# Configure logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')

# Set up Streamlit page config
st.set_page_config(page_title="Auto Subtitled Video Generator", page_icon=":movie_camera:", layout="wide")

# Define constants
TASK_VERBS = {
    "Transcribe": "Transcribing",
    "Translate": "Translating"
}

DEVICE = "mps" if mx.metal.is_available() else "cpu"
MODELS = {
    "Tiny (Q4)": "mlx-community/whisper-tiny-mlx-q4",
    "Large v3": "mlx-community/whisper-large-v3-mlx",
    "Small English (Q4)": "mlx-community/whisper-small.en-mlx-q4",
    "Small (FP32)": "mlx-community/whisper-small-mlx-fp32"
}
APP_DIR = pathlib.Path(__file__).parent.absolute()
LOCAL_DIR = APP_DIR / "local_video"
LOCAL_DIR.mkdir(exist_ok=True)
SAVE_DIR = LOCAL_DIR / "output"
SAVE_DIR.mkdir(exist_ok=True)


@st.cache_data
def load_lottie_url(url: str) -> Optional[Dict[str, Any]]:
    try:
        r = requests.get(url, timeout=10)
        r.raise_for_status()
        return r.json()
    except requests.RequestException as e:
        logging.error(f"Failed to load Lottie animation: {e}")
        return None


def prepare_audio(audio_path: str) -> mx.array:
    # Decode the input file to 16 kHz mono 16-bit PCM on stdout, the format Whisper expects.
    command = [
        "ffmpeg",
        "-i", audio_path,
        "-f", "s16le",
        "-acodec", "pcm_s16le",
        "-ar", "16000",
        "-ac", "1",
        "-"
    ]

    process = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.DEVNULL)
    audio_data, _ = process.communicate()

    # Convert 16-bit integer samples to float32 in [-1.0, 1.0].
    audio_array = np.frombuffer(audio_data, dtype=np.int16)
    audio_array = audio_array.astype(np.float32) / 32768.0

    return mx.array(audio_array)


def process_audio(model_path: str, audio: mx.array, task: str) -> Dict[str, Any]:
    logging.info(f"Processing audio with model: {model_path}, task: {task}")

    task = task.lower()
    if task not in ("transcribe", "translate"):
        raise ValueError(f"Unsupported task: {task}")

    try:
        # mlx_whisper exposes a single transcribe() entry point; translation to
        # English is requested via the `task` decode option.
        results = mlx_whisper.transcribe(
            audio,
            path_or_hf_repo=model_path,
            task=task,
            fp16=False,
            verbose=True
        )
        logging.info(f"{task.capitalize()} completed successfully")
        return results
    except Exception as e:
        logging.error(f"Unexpected error in mlx_whisper.transcribe ({task}): {e}")
        raise


def format_timestamp(seconds: float, decimal_marker: str) -> str:
    # Render seconds as HH:MM:SS.mmm (VTT) or HH:MM:SS,mmm (SRT).
    hours = int(seconds // 3600)
    minutes = int(seconds % 3600 // 60)
    secs = seconds % 60
    return f"{hours:02d}:{minutes:02d}:{secs:06.3f}".replace(".", decimal_marker)


def write_subtitles(segments: List[Dict[str, Any]], fmt: str, output_file: str) -> None:
    with open(output_file, "w", encoding="utf-8") as f:
        if fmt == "vtt":
            f.write("WEBVTT\n\n")
            for segment in segments:
                # WebVTT requires full HH:MM:SS.mmm cue timings, not raw seconds.
                f.write(f"{format_timestamp(segment['start'], '.')} --> {format_timestamp(segment['end'], '.')}\n")
                f.write(f"{segment['text'].strip()}\n\n")
        elif fmt == "srt":
            for i, segment in enumerate(segments, start=1):
                f.write(f"{i}\n")
                f.write(f"{format_timestamp(segment['start'], ',')} --> {format_timestamp(segment['end'], ',')}\n")
                f.write(f"{segment['text'].strip()}\n\n")


def create_download_link(file_path: str, link_text: str) -> str:
    with open(file_path, "rb") as f:
        data = f.read()
    b64 = base64.b64encode(data).decode()
    # Embed the file as a base64 data URI so Streamlit can render the link via markdown.
    filename = os.path.basename(file_path)
    href = f'<a href="data:application/zip;base64,{b64}" download="{filename}">{link_text}</a>'
    return href


def main():
    col1, col2 = st.columns([1, 3])

    with col1:
        lottie = load_lottie_url("https://assets1.lottiefiles.com/packages/lf20_HjK9Ol.json")
        if lottie:
            st_lottie(lottie)

    with col2:
        st.markdown("""
        ## Apple MLX Powered Video Transcription

        Upload your video and get:
        - Accurate transcripts (SRT/VTT files)
        - Optional English translation
        - Lightning-fast processing

        ### Choose your task
        - 🎙️ Transcribe: Capture spoken words in the original language
        - 🌍 Translate: Convert speech to English subtitles
        """)

    input_file = st.file_uploader("Upload Video File", type=["mp4", "avi", "mov", "mkv"])
    task = st.selectbox("Select Task", list(TASK_VERBS.keys()), index=0)

    # Add model selection dropdown
    selected_model = st.selectbox("Select Whisper Model", list(MODELS.keys()), index=0)
    model_name = MODELS[selected_model]

    if input_file and st.button(task):
        with st.spinner(f"{TASK_VERBS[task]} the video using {selected_model} model..."):
            try:
                # Save uploaded file
                input_path = str(SAVE_DIR / "input.mp4")
                with open(input_path, "wb") as f:
                    f.write(input_file.read())

                # Prepare audio
                audio = prepare_audio(input_path)

                # Process audio
                results = process_audio(model_name, audio, task.lower())

                # Display results
                col3, col4 = st.columns(2)
                with col3:
                    st.video(input_file)

                # Write subtitles
                vtt_path = str(SAVE_DIR / "transcript.vtt")
                srt_path = str(SAVE_DIR / "transcript.srt")
                write_subtitles(results["segments"], "vtt", vtt_path)
                write_subtitles(results["segments"], "srt", srt_path)

                with col4:
                    st.text_area("Transcription", results["text"], height=300)
                    st.success(f"{task} completed successfully using {selected_model} model!")

                # Create zip file with outputs
                zip_path = str(SAVE_DIR / "transcripts.zip")
                with ZipFile(zip_path, "w") as zipf:
                    for file in [vtt_path, srt_path]:
                        zipf.write(file, os.path.basename(file))

                # Create download link
                st.markdown(create_download_link(zip_path, "Download Transcripts"), unsafe_allow_html=True)

            except Exception as e:
                st.error(f"An error occurred: {str(e)}")
                logging.exception("Error in main processing loop")


if __name__ == "__main__":
    main()
--------------------------------------------------------------------------------