├── .gitmodules
├── .github
│   ├── FUNDING.yml
│   └── ISSUE_TEMPLATE
│       ├── bug_report.md
│       └── feature_request.md
├── .gitignore
├── src
│   ├── unix_run.sh
│   ├── windows_run.bat
│   ├── static
│   │   ├── main.css
│   │   └── main.js
│   ├── run.py
│   ├── templates
│   │   └── index.html
│   └── main.py
├── requirements.txt
├── Dockerfile
├── LICENSE
├── README.md
└── CODE_OF_CONDUCT.md

/.gitmodules:
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
/.github/FUNDING.yml:
--------------------------------------------------------------------------------
github: Kabanosk

--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------

.gitignore
.idea/
__pycache__/
src/audio.mp3
venv/
data/

--------------------------------------------------------------------------------
/src/unix_run.sh:
--------------------------------------------------------------------------------
python3 -m pip install -r ../requirements.txt
git clone git@github.com:openai/whisper.git
python3 run.py
--------------------------------------------------------------------------------
/src/windows_run.bat:
--------------------------------------------------------------------------------
@echo off

python "-m" "pip" "install" "-r" "..\requirements.txt"
git "clone" "git@github.com:openai/whisper.git"
python "run.py"

--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
fastapi
uvicorn
jinja2
python-multipart
stable-ts

numpy
torch
tqdm
more-itertools
transformers>=4.19.0
ffmpeg-python==0.2.0
datetime
deep_translator

srt~=3.5.3
--------------------------------------------------------------------------------
/Dockerfile:
--------------------------------------------------------------------------------
FROM python:3.9
WORKDIR /app

RUN apt-get -y update
RUN apt-get -y upgrade
RUN apt-get install -y ffmpeg

COPY requirements.txt /app

RUN pip install uv
RUN uv pip install --system -r requirements.txt

COPY src/ /app

CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "80"]

--------------------------------------------------------------------------------
/.github/ISSUE_TEMPLATE/bug_report.md:
--------------------------------------------------------------------------------
---
name: Bug report
about: Create a report to help us improve
title: ''
labels: ''
assignees: Kabanosk

---

**Describe the bug**
A clear and concise description of what the bug is.

**OS (please complete the following information):**
 - [e.g. iOS]

**Additional context**
Add any other context about the problem here.
--------------------------------------------------------------------------------
/src/static/main.css:
--------------------------------------------------------------------------------
form {
    width: 100%;
    text-align: center;
}

input, audio {
    width: 25em;
    margin: 2em 0;
}

#interval {
    width: 10em;
}

#recorder {
    font-size: 1em;
    width: 20em;
}

#submit {
    border-radius: 3em;
    height: 5em;
}

p {
    font-size: 1.5em;
    white-space: pre-line;
}

--------------------------------------------------------------------------------
/src/run.py:
--------------------------------------------------------------------------------
from pathlib import Path
from time import sleep
import uvicorn
import webbrowser

from multiprocessing import Process


def open_browser():
    webbrowser.open('http://127.0.0.1:8000')


def run_localhost():
    # uvicorn serves on 127.0.0.1:8000 by default, matching the URL opened above.
    uvicorn.run('main:app')


if __name__ == '__main__':
    open_browser_proc = Process(target=open_browser)
    run_localhost_proc = Process(target=run_localhost)
    Path("../data").mkdir(parents=True, exist_ok=True)

    run_localhost_proc.start()
    # Give the server a moment to start before opening the browser.
    sleep(2)
    open_browser_proc.start()

--------------------------------------------------------------------------------
/.github/ISSUE_TEMPLATE/feature_request.md:
--------------------------------------------------------------------------------
---
name: Feature request
about: Suggest an idea for this project
title: ''
labels: enhancement
assignees: Kabanosk

---

**Is your feature request related to a problem? Please describe.**
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]

**Describe the solution you'd like**
A clear and concise description of what you want to happen.

**Describe alternatives you've considered**
A clear and concise description of any alternative solutions or features you've considered.

**Additional context**
Add any other context or screenshots about the feature request here.

--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2022 Wojciech Fiołka

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
## Website that converts speech to text with the Whisper model ([Official Repo](https://github.com/openai/whisper))

## Hosting the website on localhost

1. Clone the repo - `git clone https://github.com/Kabanosk/whisper-website.git`
2. Go to the repo directory - `cd whisper-website`
3. Create a virtual environment - `python3 -m venv venv`
4. Activate the environment - `source venv/bin/activate` or `. venv/bin/activate`
5. Install the requirements - `pip install -r requirements.txt`
6. Go to the src directory - `cd src`
7. Run the `run.py` file - `python3 run.py`
8. If the browser doesn't open automatically, go to `http://127.0.0.1:8000/`

## Run the website on localhost with Docker
### First time
1. Install [Docker](https://docs.docker.com/engine/install/)
2. Clone the repo - `git clone https://github.com/Kabanosk/whisper-website.git`
3. Go to the repo directory - `cd whisper-website`
4. Create the Docker image - `docker build -t app .`
5. Run the Docker container - `docker run --name app_container -p 80:80 app`
6. Go to your browser and type `http://127.0.0.1:80/`

### Next time

1. Start your Docker container - `docker start app_container`
2. Go to your browser and type `http://127.0.0.1:80/`

--------------------------------------------------------------------------------
/src/static/main.js:
--------------------------------------------------------------------------------
/* Moved the inline JavaScript to this file so it can keep the html cleaner */
/* script */
const recorder = document.getElementById('recorder');
const player = document.getElementById('player');

recorder.addEventListener('change', function (e) {
    const file = e.target.files[0];
    // Nothing to preview if the file dialog was cancelled.
    if (!file) {
        return;
    }
    const url = URL.createObjectURL(file);
    player.src = url;
});
/* /script */

/* script */
const conversionForm = document.getElementById('conversion-form');
const submitButton = document.getElementById('submit');
const spinner = document.getElementById('spinner');

conversionForm.addEventListener('submit', async (event) => {
    event.preventDefault();
    submitButton.disabled = true;
    spinner.classList.remove('d-none');

    try {
        const formData = new FormData(conversionForm);
        const response = await fetch('/download/', {
            method: 'POST',
            body: formData,
        });
        // Do not save an error page as a subtitle file.
        if (!response.ok) {
            throw new Error(`Conversion failed with status ${response.status}`);
        }

        const blob = await response.blob();
        const downloadUrl = URL.createObjectURL(blob);
        const link = document.createElement('a');
        link.href = downloadUrl;
        link.download = `${formData.get('filename')}.${formData.get('file_type')}`;
        document.body.appendChild(link);
        link.click();
        document.body.removeChild(link);
    } catch (error) {
        console.error(error);
    } finally {
        // Re-enable the form even when the request fails.
        submitButton.disabled = false;
        spinner.classList.add('d-none');
    }
});
/* /script */
--------------------------------------------------------------------------------
/src/templates/index.html:
--------------------------------------------------------------------------------
1 | 2 | 3 | 4 | 5 | 6 | Conversion by Whisper 7 | 8 | 9 | 10 | 11 |
12 |

Audio to Text Converter

13 |
14 |
15 | 16 | 17 |
18 | 19 |
20 | 21 | 29 |
30 | 31 |
32 | 33 | 37 |
38 | 39 |
40 | 41 | 42 |
43 | 44 |
45 | 46 | 47 |
48 | 49 |
50 | 51 | 64 |
65 | 66 |
67 | 68 | 72 |
73 | 74 | {% if text %} 75 |

{{ text }}

76 | {% endif %} 77 |
78 |
79 | 80 | 81 | 82 | 83 | 84 |
--------------------------------------------------------------------------------
/src/main.py:
--------------------------------------------------------------------------------
from datetime import timedelta
from pathlib import Path
from typing import Optional

from fastapi import FastAPI, Request, File, Form
from fastapi.responses import FileResponse, HTMLResponse
from fastapi.staticfiles import StaticFiles
from fastapi.templating import Jinja2Templates
import ffmpeg
import numpy as np
import srt
import stable_whisper
from deep_translator import GoogleTranslator

DEFAULT_MAX_CHARACTERS = 80


def get_audio_buffer(filename: str, start: int, length: int):
    """
    input: filename of the audio file, start time in seconds, length of the audio in seconds
    output: np array of the audio data which the model's transcribe function can take as input
    """
    out, _ = (
        ffmpeg.input(filename, threads=0)
        .output("-", format="s16le", acodec="pcm_s16le", ac=1, ar=16000, ss=start, t=length)
        .run(cmd=["ffmpeg", "-nostdin"], capture_stdout=True, capture_stderr=True)
    )

    return np.frombuffer(out, np.int16).flatten().astype(np.float32) / 32768.0


def transcribe_time_stamps(segments: list):
    """
    input: a list of segments from the model's transcribe function
    output: a string of the timestamps and the text of each segment
    """
    string = ""
    for seg in segments:
        string += " ".join([str(seg.start), "->", str(seg.end), ": ", seg.text.strip(), "\n"])
    return string


def split_text_by_punctuation(text: str, max_length: int):
    """
    input: a text and the maximum number of characters per chunk
    output: a list of chunks, split at the last punctuation mark or space that fits
    """
    # A non-positive limit would never make progress, so clamp it to one character.
    max_length = max(1, max_length)
    chunks = []
    while len(text) > max_length:
        # rfind returns -1 for a missing character, so the maximum is -1
        # when the first max_length characters contain no break point.
        split_pos = max(text.rfind(p, 0, max_length) for p in [",", ".", "?", "!", " "])

        if split_pos == -1:
            # No break point: cut hard so the chunk is exactly max_length characters.
            split_pos = max_length - 1

        chunks.append(text[:split_pos + 1].strip())
        text = text[split_pos + 1:].strip()

    if text:
        chunks.append(text)

    return chunks


def translate_text(text: str, translate_to: str):
    return GoogleTranslator(source='auto', target=translate_to).translate(text=text)


def make_srt_subtitles(segments: list, translate_to: str, max_chars: int):
    subtitles = []
    for seg in segments:
        start_time = seg.start
        end_time = seg.end

        text = (
            translate_text(seg.text.strip(), translate_to)
            if translate_to != "no_translation"
            else seg.text.strip()
        )

        text_chunks = split_text_by_punctuation(text or "", max_chars)
        if not text_chunks:
            # An empty segment has nothing to show; skipping it also avoids dividing by zero.
            continue

        # Each chunk gets an equal share of the segment's duration.
        duration = (end_time - start_time) / len(text_chunks)

        for j, chunk in enumerate(text_chunks):
            chunk_start = start_time + j * duration
            chunk_end = chunk_start + duration

            subtitle = srt.Subtitle(
                index=len(subtitles) + 1,
                start=timedelta(seconds=chunk_start),
                end=timedelta(seconds=chunk_end),
                content=chunk
            )
            subtitles.append(subtitle)

    return srt.compose(subtitles)


app = FastAPI(debug=True)

app.mount('/static', StaticFiles(directory='static'), name='static')
template = Jinja2Templates(directory='templates')


@app.get('/', response_class=HTMLResponse)
def index(request: Request):
    return template.TemplateResponse('index.html', {"request": request, "text": None})


@app.post('/download/')
async def download_subtitle(
    request: Request,
    file: bytes = File(),
    model_type: str = Form("tiny"),
    timestamps: Optional[str] = Form("False"),
    filename: str = Form("subtitles"),
    file_type: str = Form("srt"),
    max_characters: int = Form(DEFAULT_MAX_CHARACTERS),
    translate_to: str = Form('no_translation'),
):

    with open('audio.mp3', 'wb') as f:
        f.write(file)

    model = stable_whisper.load_model(model_type)
    result = model.transcribe("audio.mp3", regroup=False)

    # Form values arrive as strings, so "False" has to be parsed explicitly;
    # a plain truthiness check would treat it as True.
    with_timestamps = (timestamps or "").strip().lower() not in ("", "false", "off", "0", "no")

    # Keep only the last path component so a crafted filename cannot leave the working directory.
    safe_name = Path(filename).name or "subtitles"
    # Unknown file types fall back to SRT.
    extension = file_type if file_type in ("srt", "vtt") else "srt"
    subtitle_file = f"{safe_name}.{extension}"

    if not with_timestamps:
        content = result.text
    elif extension == "srt":
        content = make_srt_subtitles(result.segments, translate_to, max_characters)
    else:
        content = result.to_vtt()

    with open(subtitle_file, "w", encoding="utf-8") as f:
        f.write(content)

    # FileResponse sets the Content-Disposition header and closes the file when done.
    return FileResponse(
        subtitle_file,
        media_type="application/octet-stream",
        filename=subtitle_file,
    )
--------------------------------------------------------------------------------
/CODE_OF_CONDUCT.md:
--------------------------------------------------------------------------------
# Contributor Covenant Code of Conduct

## Our Pledge

We as members, contributors, and leaders pledge to make participation in our
community a harassment-free experience for everyone, regardless of age, body
size, visible or invisible disability, ethnicity, sex characteristics, gender
identity and expression, level of experience, education, socio-economic status,
nationality, personal appearance, race, religion, or sexual identity
and orientation.

We pledge to act and interact in ways that contribute to an open, welcoming,
diverse, inclusive, and healthy community.
## Our Standards

Examples of behavior that contributes to a positive environment for our
community include:

* Demonstrating empathy and kindness toward other people
* Being respectful of differing opinions, viewpoints, and experiences
* Giving and gracefully accepting constructive feedback
* Accepting responsibility and apologizing to those affected by our mistakes,
  and learning from the experience
* Focusing on what is best not just for us as individuals, but for the
  overall community

Examples of unacceptable behavior include:

* The use of sexualized language or imagery, and sexual attention or
  advances of any kind
* Trolling, insulting or derogatory comments, and personal or political attacks
* Public or private harassment
* Publishing others' private information, such as a physical or email
  address, without their explicit permission
* Other conduct which could reasonably be considered inappropriate in a
  professional setting

## Enforcement Responsibilities

Community leaders are responsible for clarifying and enforcing our standards of
acceptable behavior and will take appropriate and fair corrective action in
response to any behavior that they deem inappropriate, threatening, offensive,
or harmful.

Community leaders have the right and responsibility to remove, edit, or reject
comments, commits, code, wiki edits, issues, and other contributions that are
not aligned to this Code of Conduct, and will communicate reasons for moderation
decisions when appropriate.

## Scope

This Code of Conduct applies within all community spaces, and also applies when
an individual is officially representing the community in public spaces.
Examples of representing our community include using an official e-mail address,
posting via an official social media account, or acting as an appointed
representative at an online or offline event.

## Enforcement

Instances of abusive, harassing, or otherwise unacceptable behavior may be
reported to the community leaders responsible for enforcement at
.
All complaints will be reviewed and investigated promptly and fairly.

All community leaders are obligated to respect the privacy and security of the
reporter of any incident.

## Enforcement Guidelines

Community leaders will follow these Community Impact Guidelines in determining
the consequences for any action they deem in violation of this Code of Conduct:

### 1. Correction

**Community Impact**: Use of inappropriate language or other behavior deemed
unprofessional or unwelcome in the community.

**Consequence**: A private, written warning from community leaders, providing
clarity around the nature of the violation and an explanation of why the
behavior was inappropriate. A public apology may be requested.

### 2. Warning

**Community Impact**: A violation through a single incident or series
of actions.

**Consequence**: A warning with consequences for continued behavior. No
interaction with the people involved, including unsolicited interaction with
those enforcing the Code of Conduct, for a specified period of time. This
includes avoiding interactions in community spaces as well as external channels
like social media. Violating these terms may lead to a temporary or
permanent ban.

### 3. Temporary Ban

**Community Impact**: A serious violation of community standards, including
sustained inappropriate behavior.

**Consequence**: A temporary ban from any sort of interaction or public
communication with the community for a specified period of time. No public or
private interaction with the people involved, including unsolicited interaction
with those enforcing the Code of Conduct, is allowed during this period.
Violating these terms may lead to a permanent ban.

### 4. Permanent Ban

**Community Impact**: Demonstrating a pattern of violation of community
standards, including sustained inappropriate behavior, harassment of an
individual, or aggression toward or disparagement of classes of individuals.

**Consequence**: A permanent ban from any sort of public interaction within
the community.

## Attribution

This Code of Conduct is adapted from the [Contributor Covenant][homepage],
version 2.0, available at
https://www.contributor-covenant.org/version/2/0/code_of_conduct.html.

Community Impact Guidelines were inspired by [Mozilla's code of conduct
enforcement ladder](https://github.com/mozilla/diversity).

[homepage]: https://www.contributor-covenant.org

For answers to common questions about this code of conduct, see the FAQ at
https://www.contributor-covenant.org/faq. Translations are available at
https://www.contributor-covenant.org/translations.

--------------------------------------------------------------------------------
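The subtitle timing in `/src/main.py` (`split_text_by_punctuation` feeding `make_srt_subtitles`) can be tried out without Whisper, `srt`, or a running server. The block below is a minimal standalone sketch of that idea, not the repository's code: the names `split_text` and `spread_over_segment` are hypothetical, and the hard cut for text with no break point is an assumption about the intended behavior. It splits a segment's text at the last punctuation mark or space that fits, then gives each chunk an equal share of the segment's duration.

```python
from datetime import timedelta


def split_text(text: str, max_length: int) -> list[str]:
    """Split text into chunks of at most max_length characters (hypothetical helper)."""
    max_length = max(1, max_length)  # a non-positive limit would never make progress
    text = text.strip()
    chunks = []
    while len(text) > max_length:
        # str.rfind returns -1 for a missing character, so the maximum is -1
        # when the first max_length characters contain no break point.
        split_pos = max(text.rfind(p, 0, max_length) for p in ",.?! ")
        if split_pos == -1:
            split_pos = max_length - 1  # assumed fallback: hard cut at the limit
        chunks.append(text[:split_pos + 1].strip())
        text = text[split_pos + 1:].strip()
    if text:
        chunks.append(text)
    return chunks


def spread_over_segment(text: str, start: float, end: float, max_length: int):
    """Return (start, end, chunk) cues that share the segment's duration equally."""
    chunks = split_text(text, max_length)
    if not chunks:
        return []  # an empty segment yields no cues and avoids dividing by zero
    duration = (end - start) / len(chunks)
    return [
        (
            timedelta(seconds=start + i * duration),
            timedelta(seconds=start + (i + 1) * duration),
            chunk,
        )
        for i, chunk in enumerate(chunks)
    ]


if __name__ == "__main__":
    for cue_start, cue_end, chunk in spread_over_segment(
        "Hello world, this is a test.", 0.0, 3.0, 12
    ):
        print(cue_start, "->", cue_end, ":", chunk)
```

Equal shares keep the arithmetic simple, but a long chunk and a short one then stay on screen for the same time; weighting each share by chunk length would be one alternative.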