├── .gitignore
├── requirements.txt
├── .streamlit
│   └── config.toml
├── LICENSE
├── README.md
└── mlx_whisper_transcribe.py
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
pages/local_video*
local_video*
__pycache__
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
streamlit
streamlit-lottie
mlx
mlx-whisper
requests
numpy
--------------------------------------------------------------------------------
/.streamlit/config.toml:
--------------------------------------------------------------------------------
[theme]
primaryColor="#F63366"
backgroundColor="#FFFFFF"
secondaryBackgroundColor="#F0F2F6"
textColor="#262730"
font="sans serif"

[server]
maxUploadSize=8000
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2022 Batuhan Yılmaz

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Apple MLX Powered Video Transcription

This Streamlit application allows users to upload video files and generate accurate transcripts using Apple's MLX framework.

Follow me on X: [@RayFernando1337](https://x.com/rayfernando1337/)

YouTube: [@RayFernando1337](https://www.youtube.com/@rayfernando1337)

[Watch the demo video](https://github.com/user-attachments/assets/937ad360-6df2-4ea7-a3d0-6d9b22a6404a)

### Planned Features (Work in Progress)

- Translation to English and transcription.

## Important Note

⚠️ This application is designed to run on Apple Silicon (M series) Macs only. It utilizes the MLX framework, which is optimized for Apple's custom chips.

## Getting Started

### Prerequisites

- An Apple Silicon (M series) Mac
- Conda package manager

If you don't have Conda installed on your Mac, you can follow the [Ultimate Guide to Installing Miniforge for AI Development on M1 Macs](https://www.rayfernando.ai/ultimate-guide-installing-miniforge-ai-development-m1-macs) for a comprehensive setup process.

### Installation

1. Clone the repository:
```
git clone https://github.com/RayFernando1337/MLX-Auto-Subtitled-Video-Generator.git
cd MLX-Auto-Subtitled-Video-Generator
```

2. Create a new Conda environment with Python 3.12:
```
conda create -n mlx-whisper python=3.12
conda activate mlx-whisper
```

3. Install the required dependencies:
```
pip install -r requirements.txt
```

4. Install FFmpeg (required for audio processing):
```
brew install ffmpeg
```

Note: If you don't have Homebrew installed, you can install it by running the following command in your terminal:
```
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
```

After installation, follow the instructions provided in the terminal to add Homebrew to your PATH. For more information about Homebrew, visit [brew.sh](https://brew.sh/).

### Running the Application

To run the Streamlit application, use the following command:

`streamlit run mlx_whisper_transcribe.py`

## Features

- Upload video files (MP4, AVI, MOV, MKV)
- Choose between transcription and translation tasks
- Select from multiple Whisper models
- Generate VTT and SRT subtitle files
- Download transcripts as a ZIP file

## How It Works

1. Upload a video file
2. Select the task (Transcribe or Translate)
3. Choose a Whisper model
4. Click the task button to process the video
5. View the results and download the generated transcripts

## Models

The application supports the following Whisper models:

- Tiny (Q4)
- Large v3
- Small English (Q4)
- Small (FP32)

Each model has different capabilities and processing speeds. Experiment with different models to find the best balance between accuracy and performance for your needs.

## Troubleshooting

If you encounter any issues, please check the following:

- Ensure you're using an Apple Silicon Mac
- Verify that all dependencies are correctly installed
- Check the console output for any error messages

For any persistent problems, please open an issue in the repository.
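Before opening an issue, it can help to confirm the two hard requirements (an Apple Silicon Mac and FFmpeg on your `PATH`) directly from Python. This is a small standalone sketch, not part of the app, and the `check_prerequisites` helper is a name of my choosing:

```python
import platform
import shutil


def check_prerequisites() -> dict:
    """Report whether this machine meets the app's hard requirements."""
    return {
        # Apple Silicon Macs report 'arm64' on macOS ('Darwin').
        "apple_silicon": platform.system() == "Darwin" and platform.machine() == "arm64",
        # FFmpeg must be on PATH for audio extraction.
        "ffmpeg_on_path": shutil.which("ffmpeg") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'OK' if ok else 'MISSING'}")
```

If either line prints `MISSING`, fix that prerequisite before digging further.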
## Acknowledgements

This project is a fork of the [original Auto-Subtitled Video Generator](https://github.com/BatuhanYilmaz26/Auto-Subtitled-Video-Generator) by Batuhan Yılmaz. I deeply appreciate his contribution to the open-source community.
--------------------------------------------------------------------------------
/mlx_whisper_transcribe.py:
--------------------------------------------------------------------------------
import streamlit as st
from streamlit_lottie import st_lottie
import mlx.core as mx
import mlx_whisper
import requests
from typing import Any, Dict, List, Optional
import pathlib
import os
import base64
import logging
from zipfile import ZipFile
import subprocess
import numpy as np

# Configure logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')

# Set up Streamlit page config
st.set_page_config(page_title="Auto Subtitled Video Generator", page_icon=":movie_camera:", layout="wide")

# Define constants
TASK_VERBS = {
    "Transcribe": "Transcribing",
    "Translate": "Translating"
}

DEVICE = "mps" if mx.metal.is_available() else "cpu"
MODELS = {
    "Tiny (Q4)": "mlx-community/whisper-tiny-mlx-q4",
    "Large v3": "mlx-community/whisper-large-v3-mlx",
    "Small English (Q4)": "mlx-community/whisper-small.en-mlx-q4",
    "Small (FP32)": "mlx-community/whisper-small-mlx-fp32"
}
APP_DIR = pathlib.Path(__file__).parent.absolute()
LOCAL_DIR = APP_DIR / "local_video"
LOCAL_DIR.mkdir(exist_ok=True)
SAVE_DIR = LOCAL_DIR / "output"
SAVE_DIR.mkdir(exist_ok=True)


@st.cache_data
def load_lottie_url(url: str) -> Optional[Dict[str, Any]]:
    try:
        r = requests.get(url, timeout=10)
        r.raise_for_status()
        return r.json()
    except requests.RequestException as e:
        logging.error(f"Failed to load Lottie animation: {e}")
        return None


def prepare_audio(audio_path: str) -> mx.array:
    # Decode the input file to 16 kHz mono 16-bit PCM on stdout, the format Whisper expects.
    command = [
        "ffmpeg",
        "-i", audio_path,
        "-f", "s16le",
        "-acodec", "pcm_s16le",
        "-ar", "16000",
        "-ac", "1",
        "-"
    ]

    process = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.DEVNULL)
    audio_data, _ = process.communicate()

    # Convert 16-bit integer samples to float32 in [-1.0, 1.0].
    audio_array = np.frombuffer(audio_data, dtype=np.int16)
    audio_array = audio_array.astype(np.float32) / 32768.0

    return mx.array(audio_array)


def process_audio(model_path: str, audio: mx.array, task: str) -> Dict[str, Any]:
    logging.info(f"Processing audio with model: {model_path}, task: {task}")

    task = task.lower()
    if task not in ("transcribe", "translate"):
        raise ValueError(f"Unsupported task: {task}")

    try:
        # mlx_whisper exposes a single transcribe() entry point; translation to
        # English is requested via the `task` decode option.
        results = mlx_whisper.transcribe(
            audio,
            path_or_hf_repo=model_path,
            task=task,
            fp16=False,
            verbose=True
        )
        logging.info(f"{task.capitalize()} completed successfully")
        return results
    except Exception as e:
        logging.error(f"Unexpected error in mlx_whisper.transcribe ({task}): {e}")
        raise


def format_timestamp(seconds: float, decimal_marker: str) -> str:
    # Render seconds as HH:MM:SS.mmm (VTT) or HH:MM:SS,mmm (SRT).
    hours = int(seconds // 3600)
    minutes = int(seconds % 3600 // 60)
    secs = seconds % 60
    return f"{hours:02d}:{minutes:02d}:{secs:06.3f}".replace(".", decimal_marker)


def write_subtitles(segments: List[Dict[str, Any]], fmt: str, output_file: str) -> None:
    with open(output_file, "w", encoding="utf-8") as f:
        if fmt == "vtt":
            f.write("WEBVTT\n\n")
            for segment in segments:
                # WebVTT requires full HH:MM:SS.mmm cue timings, not raw seconds.
                f.write(f"{format_timestamp(segment['start'], '.')} --> {format_timestamp(segment['end'], '.')}\n")
                f.write(f"{segment['text'].strip()}\n\n")
        elif fmt == "srt":
            for i, segment in enumerate(segments, start=1):
                f.write(f"{i}\n")
                f.write(f"{format_timestamp(segment['start'], ',')} --> {format_timestamp(segment['end'], ',')}\n")
                f.write(f"{segment['text'].strip()}\n\n")


def create_download_link(file_path: str, link_text: str) -> str:
    with open(file_path, "rb") as f:
        data = f.read()
    b64 = base64.b64encode(data).decode()
    # Embed the file as a base64 data URI so Streamlit can render the link via markdown.
    filename = os.path.basename(file_path)
    href = f'<a href="data:application/zip;base64,{b64}" download="{filename}">{link_text}</a>'
    return href


def main():
    col1, col2 = st.columns([1, 3])

    with col1:
        lottie = load_lottie_url("https://assets1.lottiefiles.com/packages/lf20_HjK9Ol.json")
        if lottie:
            st_lottie(lottie)

    with col2:
        st.markdown("""
        ## Apple MLX Powered Video Transcription

        Upload your video and get:
        - Accurate transcripts (SRT/VTT files)
        - Optional English translation
        - Lightning-fast processing

        ### Choose your task
        - 🎙️ Transcribe: Capture spoken words in the original language
        - 🌍 Translate: Convert speech to English subtitles
        """)

    input_file = st.file_uploader("Upload Video File", type=["mp4", "avi", "mov", "mkv"])
    task = st.selectbox("Select Task", list(TASK_VERBS.keys()), index=0)

    # Add model selection dropdown
    selected_model = st.selectbox("Select Whisper Model", list(MODELS.keys()), index=0)
    model_name = MODELS[selected_model]

    if input_file and st.button(task):
        with st.spinner(f"{TASK_VERBS[task]} the video using {selected_model} model..."):
            try:
                # Save uploaded file
                input_path = str(SAVE_DIR / "input.mp4")
                with open(input_path, "wb") as f:
                    f.write(input_file.read())

                # Prepare audio
                audio = prepare_audio(input_path)

                # Process audio
                results = process_audio(model_name, audio, task.lower())

                # Display results
                col3, col4 = st.columns(2)
                with col3:
                    st.video(input_file)

                # Write subtitles
                vtt_path = str(SAVE_DIR / "transcript.vtt")
                srt_path = str(SAVE_DIR / "transcript.srt")
                write_subtitles(results["segments"], "vtt", vtt_path)
                write_subtitles(results["segments"], "srt", srt_path)

                with col4:
                    st.text_area("Transcription", results["text"], height=300)
                    st.success(f"{task} completed successfully using {selected_model} model!")

                # Create zip file with outputs
                zip_path = str(SAVE_DIR / "transcripts.zip")
                with ZipFile(zip_path, "w") as zipf:
                    for file in [vtt_path, srt_path]:
                        zipf.write(file, os.path.basename(file))

                # Create download link
                st.markdown(create_download_link(zip_path, "Download Transcripts"), unsafe_allow_html=True)

            except Exception as e:
                st.error(f"An error occurred: {str(e)}")
                logging.exception("Error in main processing loop")


if __name__ == "__main__":
    main()
--------------------------------------------------------------------------------