├── .DS_Store ├── .gitignore ├── LICENSE ├── README.md ├── README_ZH.md ├── Transcriptify.py ├── Transcriptify_ZH.py └── requirements.txt /.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/victorGPT/Transcriptify/06c8390a4f55060483871c128814e41e64639bd9/.DS_Store -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | compress_audio.py 2 | moviepy.editor.py 3 | openai_api_key.txt 4 | split_audio.py 5 | whisper_transcribe.py 6 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2023 victor-wu.eth 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 
22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | English | [中文](https://github.com/victorGPT/Transcriptify/blob/main/README_ZH.md) 2 | ## Transcriptify: A Python Script for Speech-to-Text 3 | 4 | Transcriptify is a Python script that converts speech to text. It was written with the help of ChatGPT and uses OpenAI's Whisper technology. The script has the following features: 5 | 6 | ### Features: 7 | 1. Automatically compresses audio files larger than 25 MB (because OpenAI only supports files under 25 MB), so that larger audio files can still be transcribed. 8 | 2. Writes the transcript as txt, vtt, srt, tsv, or json to meet different user needs. 9 | 3. Generates results quickly: transcribing an hour-long audio file takes only about three minutes. 10 | 11 | ### Instructions for use: 12 | Before using the script, make sure that the following are installed or available on your computer: 13 | - Python 14 | - The `openai` Python library 15 | - tqdm 16 | - ffmpeg, for processing audio 17 | - An OpenAI [API key](https://platform.openai.com/account/api-keys) 18 | 19 | If you don't know how to install any of these, don't worry: you can ask ChatGPT for help, as the script's author did. 20 | 21 | In summary, Transcriptify is a convenient and efficient Python script for speech-to-text that compresses large files and supports several output formats. Install the environment and libraries listed above before using it, or ask ChatGPT for help. 
22 | 23 | ## Version Update 24 | ### 1.0.1v 25 | 26 | 2023/04/13 27 | 28 | Changed the file-processing logic, which now follows this flow: 29 | 30 | ```mermaid 31 | graph LR 32 | A[Input File] --> B{Is the file size larger than 25 MB?} 33 | B -->|No| C(Convert speech to text) 34 | B -->|Yes| D[Compress the original file] 35 | D --> C 36 | C --> E{Is there a compressed file?} 37 | E -->|Yes| F[Delete the compressed file]-->G 38 | E -->|No| G[Finished] 39 | ``` -------------------------------------------------------------------------------- /README_ZH.md: -------------------------------------------------------------------------------- 1 | [English](https://github.com/victorGPT/Transcriptify/blob/main/README.md) | 中文 2 | ## Transcriptify: A Python Script for Speech-to-Text 3 | 4 | Transcriptify is a speech-to-text Python script built with ChatGPT and OpenAI's Whisper technology. The script has the following features: 5 | 6 | ### Features: 7 | 1. Automatically compresses audio files larger than 25 MB (because OpenAI only supports files under 25 MB), so that users can transcribe larger audio files. 8 | 2. Supports the output formats txt, vtt, srt, tsv, and json to meet different user needs. 9 | 3. Short processing time (an hour-long audio file takes only three minutes), so users get their results quickly. 10 | 11 | ### Before you start: 12 | Before using the script, make sure that the following environments and libraries are installed on your computer: 13 | - Python 14 | - OpenAI 15 | - tqdm 16 | - ffmpeg, for processing audio 17 | - An OpenAI [API key](https://platform.openai.com/account/api-keys) 18 | 19 | If you don't know how to install the environments and libraries above, feel free to ask ChatGPT, because that is exactly what the script's author did. 20 | 21 | In summary, Transcriptify is a convenient and fast Python script for speech-to-text that supports compression of large files and output in several subtitle formats. Before using the script, install the corresponding environment and libraries, or ask ChatGPT for help and support. 22 | 23 | ## Version Update 24 | ### 1.0.1v 25 | 26 | 2023/04/13 27 | 28 | Changed the file-processing logic; files are now processed according to the following flow: 29 | 30 | ```mermaid 31 | graph LR 32 | A[Input File] --> B{Is the file size larger than 25 MB?} 33 | B -->|No| C(Convert speech to text) 34 | B -->|Yes| D[Compress the original file] 35 | D --> C 36 | C --> E{Is there a compressed file?} 37 | E -->|Yes| F[Delete the compressed file]-->G 38 | E -->|No| G[Finished] 39 | ``` -------------------------------------------------------------------------------- /Transcriptify.py: -------------------------------------------------------------------------------- 1 | import os 2 | import time 3 | import openai 4 | from tqdm import tqdm 5 | import subprocess 6 | 7 | # Define API key file path 8 | API_KEY_PATH = 
os.path.expanduser("~/.openai") 9 | 10 | # Load API key from file if it exists 11 | if os.path.isfile(API_KEY_PATH): 12 | with open(API_KEY_PATH, "r") as f: 13 | api_key = f.read().strip() 14 | openai.api_key = api_key 15 | 16 | # Ask for API key if it's not loaded 17 | while not openai.api_key: 18 | api_key = input("Please enter your OpenAI API key: ").strip() 19 | openai.api_key = api_key 20 | # Save API key to file 21 | with open(API_KEY_PATH, "w") as f: 22 | f.write(api_key) 23 | 24 | # Get ffmpeg path 25 | if os.name == 'nt': # Windows system 26 | try: 27 | ffmpeg_path = subprocess.check_output(['where', 'ffmpeg']).decode().strip().splitlines()[0] # 'where' may list several matches; use the first 28 | except (subprocess.CalledProcessError, OSError, IndexError): 29 | print('Please make sure ffmpeg is installed and added to the PATH environment variable') 30 | raise SystemExit(1) 31 | else: # Linux and Mac systems 32 | try: 33 | ffmpeg_path = subprocess.check_output(['which', 'ffmpeg']).decode().strip() 34 | except (subprocess.CalledProcessError, OSError): 35 | print('Please make sure ffmpeg is installed and added to the PATH environment variable') 36 | raise SystemExit(1) 37 | _compressed_files = set() # paths created by compress_audio, deleted after transcription 38 | def compress_audio(input_file): 39 | # Get input file size 40 | file_size = os.path.getsize(input_file) 41 | if file_size <= 25000000: 42 | return input_file 43 | 44 | # Compress audio file using ffmpeg 45 | output_file = f"{os.path.splitext(input_file)[0]}_compressed.mp3" 46 | command = [ffmpeg_path, "-i", input_file, "-ac", "1", "-ar", "16000", "-ab", "32k", output_file] # argument list, so paths containing spaces work 47 | subprocess.run(command, check=True) 48 | _compressed_files.add(output_file) 49 | return output_file 50 | 51 | def transcribe_audio(input_file, output_file, response_format, progress_bar): 52 | with open(input_file, "rb") as f: 53 | transcript = openai.Audio.transcribe("whisper-1", f, response_format=response_format) 54 | 55 | progress_bar.update(100) 56 | 57 | with open(output_file, "w", encoding="utf-8") as f: 58 | f.write(transcript) 59 | 60 | # Remove the temporary file if compress_audio created this input 61 | if input_file in _compressed_files: 62 | os.remove(input_file) 63 | _compressed_files.discard(input_file) 64 | 65 | input_file = input("Please enter 
the audio file name (including extension): ").strip().strip("'\"") 66 | output_format = input("Please enter the output format (txt, vtt, srt, tsv, json, all): ").strip().strip("'\"") 67 | 68 | if output_format not in ["txt", "vtt", "srt", "tsv", "json", "all"]: 69 | print("Invalid output format.") 70 | raise SystemExit(1) 71 | 72 | file_name, _ = os.path.splitext(input_file) 73 | output_file = f"{file_name}_transcript.{output_format}" 74 | 75 | # Compress audio file if larger than 25MB 76 | input_file = compress_audio(input_file) 77 | 78 | progress_bar = tqdm(total=100, desc="Transcribing audio", ncols=80) 79 | 80 | start_time = time.time() 81 | 82 | transcribe_audio(input_file, output_file, output_format, progress_bar) 83 | 84 | progress_bar.close() 85 | 86 | elapsed_time = time.time() - start_time 87 | print(f"Audio transcription completed in {elapsed_time:.2f} seconds") 88 | 89 | print(f"Transcript saved to {output_file}") 90 | -------------------------------------------------------------------------------- /Transcriptify_ZH.py: -------------------------------------------------------------------------------- 1 | import os 2 | import time 3 | import openai 4 | from tqdm import tqdm 5 | import subprocess 6 | 7 | # Define API key file path 8 | API_KEY_PATH = os.path.expanduser("~/.openai") 9 | 10 | # Load API key from file if it exists 11 | if os.path.isfile(API_KEY_PATH): 12 | with open(API_KEY_PATH, "r") as f: 13 | api_key = f.read().strip() 14 | openai.api_key = api_key 15 | 16 | # Ask for API key if it's not loaded 17 | while not openai.api_key: 18 | api_key = input("Please enter your OpenAI API key: ").strip() 19 | openai.api_key = api_key 20 | # Save API key to file 21 | with open(API_KEY_PATH, "w") as f: 22 | f.write(api_key) 23 | 24 | # Get ffmpeg path 25 | if os.name == 'nt': # Windows system 26 | try: 27 | ffmpeg_path = subprocess.check_output(['where', 'ffmpeg']).decode().strip().splitlines()[0] # 'where' may list several matches; use the first 28 | except (subprocess.CalledProcessError, OSError, IndexError): 29 | print('Please make sure ffmpeg is installed and added to the PATH environment variable') 30 | raise SystemExit(1) 31 | else: # Linux and Mac systems 32 | try: 33 | ffmpeg_path = 
subprocess.check_output(['which', 'ffmpeg']).decode().strip() 34 | except (subprocess.CalledProcessError, OSError): 35 | print('Please make sure ffmpeg is installed and added to the PATH environment variable') 36 | raise SystemExit(1) 37 | _compressed_files = set() # paths created by compress_audio, deleted after transcription 38 | def compress_audio(input_file): 39 | # Get input file size 40 | file_size = os.path.getsize(input_file) 41 | if file_size <= 25000000: 42 | return input_file 43 | 44 | # Compress audio file using ffmpeg 45 | output_file = f"{os.path.splitext(input_file)[0]}_compressed.mp3" 46 | command = [ffmpeg_path, "-i", input_file, "-ac", "1", "-ar", "16000", "-ab", "32k", output_file] # argument list, so paths containing spaces work 47 | subprocess.run(command, check=True) 48 | _compressed_files.add(output_file) 49 | return output_file 50 | 51 | def transcribe_audio(input_file, output_file, response_format, progress_bar): 52 | with open(input_file, "rb") as f: 53 | transcript = openai.Audio.transcribe("whisper-1", f, response_format=response_format) 54 | 55 | progress_bar.update(100) 56 | 57 | with open(output_file, "w", encoding="utf-8") as f: 58 | f.write(transcript) 59 | 60 | # Remove the temporary file if compress_audio created this input 61 | if input_file in _compressed_files: 62 | os.remove(input_file) 63 | _compressed_files.discard(input_file) 64 | 65 | input_file = input("Please enter the audio file name (including extension): ").strip().strip("'\"") 66 | output_format = input("Please enter the output format (txt, vtt, srt, tsv, json, all): ").strip().strip("'\"") 67 | 68 | if output_format not in ["txt", "vtt", "srt", "tsv", "json", "all"]: 69 | print("Invalid output format.") 70 | raise SystemExit(1) 71 | 72 | file_name, _ = os.path.splitext(input_file) 73 | output_file = f"{file_name}_transcript.{output_format}" 74 | 75 | # Compress audio file if larger than 25MB 76 | input_file = compress_audio(input_file) 77 | 78 | progress_bar = tqdm(total=100, desc="Transcribing audio", ncols=80) 79 | 80 | start_time = time.time() 81 | 82 | transcribe_audio(input_file, output_file, output_format, progress_bar) 83 | 84 | progress_bar.close() 85 | 86 | elapsed_time = time.time() - start_time 87 | print(f"Audio transcription completed in {elapsed_time:.2f} seconds") 88 | 89 | print(f"Transcript saved to {output_file}") 90 | -------------------------------------------------------------------------------- /requirements.txt: 
-------------------------------------------------------------------------------- 1 | aiofiles==23.1.0 2 | aiohttp==3.8.4 3 | aiosignal==1.3.1 4 | altair==4.2.2 5 | anyio==3.6.2 6 | arxiv==1.4.3 7 | async-generator==1.10 8 | async-timeout==4.0.2 9 | asynctest==0.13.0 10 | attrs==22.2.0 11 | beautifulsoup4==4.12.2 12 | black==23.3.0 13 | blobfile==2.0.2 14 | Brotli==1.0.9 15 | bs4==0.0.1 16 | cachetools==5.3.0 17 | certifi==2022.12.7 18 | cfgv==3.3.1 19 | chardet==5.1.0 20 | charset-normalizer==2.0.12 21 | click==8.1.3 22 | colorama==0.4.6 23 | contourpy==1.0.7 24 | coverage==7.2.3 25 | cssselect==1.2.0 26 | cycler==0.11.0 27 | dataclasses-json==0.5.7 28 | decorator==4.4.2 29 | distlib==0.3.6 30 | dnspython==2.3.0 31 | docker==6.0.1 32 | docopt==0.6.2 33 | duckduckgo-search==2.8.6 34 | entrypoints==0.4 35 | exceptiongroup==1.1.1 36 | fastapi==0.95.1 37 | feedparser==6.0.10 38 | ffmpeg-python==0.2.0 39 | ffmpy==0.3.0 40 | filelock==3.11.0 41 | flake8==6.0.0 42 | Flask==2.2.3 43 | Flask-Cors==3.0.10 44 | fonttools==4.39.3 45 | frozenlist==1.3.3 46 | fsspec==2023.4.0 47 | future==0.18.3 48 | gitdb==4.0.10 49 | GitPython==3.1.31 50 | google-api-core==2.11.0 51 | google-api-python-client==2.85.0 52 | google-auth==2.17.3 53 | google-auth-httplib2==0.1.0 54 | googleapis-common-protos==1.59.0 55 | gptcache==0.1.18 56 | gradio==3.20.1 57 | gTTS==2.3.1 58 | h11==0.14.0 59 | httpcore==0.17.0 60 | httplib2==0.22.0 61 | httpx==0.24.0 62 | identify==2.5.22 63 | idna==3.4 64 | imageio==2.27.0 65 | imageio-ffmpeg==0.4.8 66 | inflate64==0.3.1 67 | iniconfig==2.0.0 68 | isort==5.12.0 69 | itsdangerous==2.1.2 70 | jieba==0.42.1 71 | Jinja2==3.1.2 72 | joblib==1.2.0 73 | jsonschema==4.17.3 74 | kiwisolver==1.4.4 75 | langchain==0.0.142 76 | linkify-it-py==2.0.0 77 | llama-index==0.5.23.post1 78 | llvmlite==0.39.1 79 | loguru==0.7.0 80 | lxml==4.9.2 81 | Markdown==3.4.3 82 | markdown-it-py==2.2.0 83 | MarkupSafe==2.1.2 84 | marshmallow==3.19.0 85 | marshmallow-enum==1.5.1 86 | 
matplotlib==3.7.1 87 | mccabe==0.7.0 88 | mdit-py-plugins==0.3.3 89 | mdurl==0.1.2 90 | more-itertools==9.1.0 91 | moviepy==1.0.3 92 | mpmath==1.3.0 93 | multidict==6.0.4 94 | multivolumefile==0.2.3 95 | mypy-extensions==1.0.0 96 | networkx==3.1 97 | nltk==3.8.1 98 | nodeenv==1.7.0 99 | numba==0.56.4 100 | numexpr==2.8.4 101 | numpy==1.23.5 102 | oauthlib==3.2.2 103 | openai==0.27.0 104 | openai-whisper @ git+https://github.com/openai/whisper.git@c09a7ae299c4c34c5839a76380ae407e7d785914 105 | openapi-schema-pydantic==1.2.4 106 | orjson==3.8.10 107 | outcome==1.2.0 108 | packaging==23.1 109 | pandas==2.0.0 110 | pathspec==0.11.1 111 | Pillow==9.4.0 112 | pinecone-client==2.2.1 113 | platformdirs==3.2.0 114 | playsound==1.2.2 115 | pluggy==1.0.0 116 | pre-commit==3.2.2 117 | proglog==0.1.10 118 | protobuf==4.22.3 119 | psutil==5.9.5 120 | py-cpuinfo==9.0.0 121 | py7zr==0.20.5 122 | pyasn1==0.4.8 123 | pyasn1-modules==0.2.8 124 | pybase64==1.2.3 125 | pybcj==1.0.1 126 | pycodestyle==2.10.0 127 | pycryptodome==3.17 128 | pycryptodomex==3.17 129 | pydantic==1.10.7 130 | pydub==0.25.1 131 | pyflakes==3.0.1 132 | PyMuPDF==1.21.1 133 | pyparsing==3.0.9 134 | pyppmd==1.0.0 135 | pyrsistent==0.19.3 136 | PySocks==1.7.1 137 | pytest==7.3.1 138 | pytest-asyncio==0.21.0 139 | pytest-benchmark==4.0.0 140 | pytest-cov==4.0.0 141 | pytest-integration==0.2.3 142 | pytest-mock==3.10.0 143 | python-dateutil==2.8.2 144 | python-dotenv==1.0.0 145 | python-multipart==0.0.6 146 | pytz==2023.3 147 | PyYAML==6.0 148 | pyzstd==0.15.7 149 | rarfile==4.0 150 | readability-lxml==0.8.1 151 | redis==4.5.4 152 | regex==2023.3.23 153 | requests==2.26.0 154 | requests-oauthlib==1.3.1 155 | rsa==4.9 156 | selenium==4.8.3 157 | sgmllib3k==1.0.0 158 | six==1.16.0 159 | smmap==5.0.0 160 | sniffio==1.3.0 161 | sortedcontainers==2.4.0 162 | soupsieve==2.4.1 163 | sourcery==1.2.0 164 | SQLAlchemy==1.4.47 165 | starlette==0.26.1 166 | sympy==1.11.1 167 | tenacity==8.2.2 168 | texttable==1.6.7 169 | 
tiktoken==0.2.0 170 | tomli==2.0.1 171 | toolz==0.12.0 172 | torch==2.0.0 173 | tqdm==4.65.0 174 | trio==0.22.0 175 | trio-websocket==0.10.2 176 | tweepy==4.13.0 177 | typing-inspect==0.8.0 178 | typing_extensions==4.5.0 179 | tzdata==2023.3 180 | uc-micro-py==1.0.1 181 | uritemplate==4.1.1 182 | urllib3==1.25.11 183 | uvicorn==0.21.1 184 | virtualenv==20.21.0 185 | webdriver-manager==3.8.6 186 | websocket-client==1.5.1 187 | websockets==11.0.2 188 | webvtt-py==0.4.6 189 | Werkzeug==2.2.3 190 | wsproto==1.2.0 191 | yarl==1.8.2 192 | --------------------------------------------------------------------------------
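Both scripts accept `all` as an output format, but they then build a single `<name>_transcript.all` path and pass `all` straight through as the `response_format`. The sketch below shows one possible way to expand the choice into one output path per format before any transcription call is made. `plan_outputs` and `SUPPORTED_FORMATS` are hypothetical names that do not exist in the repository, and the sketch does not check whether the transcription endpoint accepts each of these format names.

```python
import os

# Formats offered by the script's prompt, besides the "all" shortcut.
# Assumption: the list mirrors the prompt text, not the API's own options.
SUPPORTED_FORMATS = ["txt", "vtt", "srt", "tsv", "json"]


def plan_outputs(input_file, output_format):
    """Return (format, output_path) pairs for one transcription request.

    Hypothetical helper: "all" expands to every supported format, and any
    other value must be a single supported format.
    """
    if output_format == "all":
        formats = list(SUPPORTED_FORMATS)
    elif output_format in SUPPORTED_FORMATS:
        formats = [output_format]
    else:
        raise ValueError(f"Invalid output format: {output_format!r}")

    # Same naming scheme as the scripts: <name>_transcript.<format>
    file_name, _ = os.path.splitext(input_file)
    return [(fmt, f"{file_name}_transcript.{fmt}") for fmt in formats]
```

For example, `plan_outputs("talk.mp3", "all")` yields five pairs, starting with `("txt", "talk_transcript.txt")`; a caller could then loop over the pairs and transcribe once per format.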