├── .DS_Store
├── .gitignore
├── LICENSE
├── README.md
├── README_ZH.md
├── Transcriptify.py
├── Transcriptify_ZH.py
└── requirements.txt


--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | compress_audio.py
2 | moviepy.editor.py
3 | openai_api_key.txt
4 | split_audio.py
5 | whisper_transcribe.py
6 | 


--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
 1 | MIT License
 2 | 
 3 | Copyright (c) 2023 victor-wu.eth
 4 | 
 5 | Permission is hereby granted, free of charge, to any person obtaining a copy
 6 | of this software and associated documentation files (the "Software"), to deal
 7 | in the Software without restriction, including without limitation the rights
 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | English |[中文](https://github.com/victorGPT/Transcriptify/blob/main/README_ZH.md)
 2 | ## Transcriptify: A Python Script for Speech-to-Text
 3 | 
 4 | Transcriptify is a Python script that converts speech to text. It was written with help from ChatGPT and uses OpenAI's Whisper model through the OpenAI API. The script has the following features:
 5 | 
 6 | ### Features:
 7 | 1. Automatically compresses audio files larger than 25 MB (the OpenAI API only accepts files up to 25 MB), so larger audio files can still be transcribed.
 8 | 2. Supports txt, vtt, srt, tsv, and json output to meet different needs.
 9 | 3. Generates results quickly (an hour-long audio file takes only about three minutes to transcribe).
10 | 
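As a rough check on the first feature: at a constant bitrate, the compressed size is about bitrate × duration / 8. The sketch below assumes the 32 kbit/s mono setting and the 25,000,000-byte threshold found in `Transcriptify.py`. The helper names are illustrative and not part of the script, and real MP3 files carry a little extra overhead, so treat the results as estimates.

```python
# Rough size estimate for constant-bitrate audio (illustrative helpers,
# not part of Transcriptify itself).
SIZE_LIMIT_BYTES = 25_000_000   # threshold used in Transcriptify.py
BITRATE_BPS = 32_000            # matches the script's 32k ffmpeg setting


def estimated_size_bytes(duration_s: float, bitrate_bps: int = BITRATE_BPS) -> float:
    """Approximate file size: bits per second * seconds / 8 bits per byte."""
    return bitrate_bps * duration_s / 8


def max_duration_s(limit_bytes: int = SIZE_LIMIT_BYTES, bitrate_bps: int = BITRATE_BPS) -> float:
    """Longest recording that still fits under the limit at this bitrate."""
    return limit_bytes * 8 / bitrate_bps


if __name__ == "__main__":
    one_hour = estimated_size_bytes(3600)
    print(f"1 hour at 32 kbit/s is about {one_hour / 1e6:.1f} MB")
    print(f"Longest file under the limit: about {max_duration_s() / 60:.0f} minutes")
```

By this estimate an hour-long recording compresses to roughly 14.4 MB, while a recording much longer than about 104 minutes would still exceed the limit after compression and would need to be split first.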
11 | ### Instructions for use:
12 | Before using the script, make sure the following are installed or available on your computer:
13 | - Python
14 |     - the `openai` package
15 |     - the `tqdm` package
16 | - ffmpeg, for processing audio
17 | - An OpenAI [API key](https://platform.openai.com/account/api-keys)
18 | 
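One way to confirm the prerequisites listed above before running the script is a small check like the following. This is a minimal sketch: `missing_prerequisites` is a hypothetical helper, not something the repository provides. Note also that `Transcriptify.py` calls `openai.Audio.transcribe`, an interface of the pre-1.0 `openai` package (the repository's `requirements.txt` pins `openai==0.27.0`), so a newer `openai` release may be importable yet still not work with the script.

```python
import importlib.util
import shutil


def missing_prerequisites(modules=("openai", "tqdm"), tools=("ffmpeg",)):
    """Return the names of Python modules and command-line tools that cannot be found."""
    # find_spec returns None when a top-level module is not installed
    missing = [m for m in modules if importlib.util.find_spec(m) is None]
    # shutil.which returns None when a program is not on PATH
    missing += [t for t in tools if shutil.which(t) is None]
    return missing


if __name__ == "__main__":
    missing = missing_prerequisites()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All prerequisites found.")
```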
19 | If you don't know how to install these, don't worry: you can ask ChatGPT for help, as the script's author did.
20 | 
21 | In summary, Transcriptify is a convenient and efficient Python script for speech-to-text that supports compression of large files and output in multiple subtitle formats. Install the environment and libraries listed above before using the script, or ask ChatGPT for help.
22 | 
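Two of the subtitle formats mentioned above, srt and vtt, differ mainly in how they write timestamps: SRT uses a comma before the milliseconds, WebVTT a period. The script receives finished subtitles from the API, so the sketch below is only an illustration of that difference; the function names are made up for this example.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def vtt_timestamp(seconds: float) -> str:
    """Same layout as SRT, but WebVTT separates milliseconds with a period."""
    return srt_timestamp(seconds).replace(",", ".")


if __name__ == "__main__":
    print(srt_timestamp(3661.5))  # 01:01:01,500
    print(vtt_timestamp(3661.5))  # 01:01:01.500
```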
23 | ## Version Update
24 | ### 1.0.1v 
25 | 
26 | 2023/04/13
27 | 
28 | Changed the file-processing logic. Files now go through the following flow:
29 | 
30 | ```mermaid
31 | graph LR
32 |     A[Input File] --> B{Is the file larger than 25 MB?}
33 |     B -->|No| C(Convert speech to text)
34 |     B -->|Yes| D[Compress the original file]
35 |     D --> C
36 |     C --> E{Is there a compressed file?}
37 |     E -->|Yes| F[Delete the compressed file]-->G
38 |     E -->|No| G[Finished]
39 | ```
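The flow in the diagram can be sketched in Python as follows. This is a minimal illustration, not the script's actual code: `process` is a hypothetical function, and the `compress` and `transcribe` steps are passed in as callables so the control flow can be tried without ffmpeg or an API key.

```python
import os
import tempfile

SIZE_LIMIT_BYTES = 25_000_000  # threshold used in Transcriptify.py


def process(path, compress, transcribe, limit=SIZE_LIMIT_BYTES):
    """Follow the flowchart: compress if too large, transcribe, then clean up.

    `compress(path) -> new_path` and `transcribe(path) -> str` are supplied
    by the caller (assumed interfaces for this sketch).
    """
    compressed = None
    if os.path.getsize(path) > limit:
        compressed = compress(path)
    try:
        return transcribe(compressed or path)
    finally:
        # Delete the compressed copy only if this run created one
        if compressed and os.path.exists(compressed):
            os.remove(compressed)


if __name__ == "__main__":
    # Demo with stand-ins: a 10-byte "audio" file and a 5-byte limit
    def fake_compress(p):
        out = p + ".small"
        with open(out, "wb") as f:
            f.write(b"x")
        return out

    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "talk.wav")
        with open(src, "wb") as f:
            f.write(b"0123456789")
        text = process(src, fake_compress, lambda p: f"transcript of {os.path.basename(p)}", limit=5)
        print(text)             # transcript of talk.wav.small
        print(os.listdir(tmp))  # ['talk.wav']
```

Tracking the compressed path in a variable, rather than rebuilding its name afterwards, makes sure that only a file created during this run is deleted and that the original input is never touched.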


--------------------------------------------------------------------------------
/README_ZH.md:
--------------------------------------------------------------------------------
 1 | [English](https://github.com/victorGPT/Transcriptify/blob/main/README.md) |中文
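 2 | ## Transcriptify: A Python Script for Speech-to-Text
 3 | 
 4 | Transcriptify is a speech-to-text Python script built with ChatGPT and OpenAI's Whisper technology. The script has the following features:
 5 | 
 6 | ### Features:
 7 | 1. Automatically compresses audio files larger than 25 MB (because OpenAI only supports files under 25 MB), so that users can transcribe larger audio files.
 8 | 2. Supports txt, vtt, srt, tsv, and json subtitle output to meet different user needs.
 9 | 3. Short turnaround (an hour-long audio file takes only about three minutes), so users get results quickly.
10 | 
11 | ### Before you start:
12 | Before using the script, make sure the following environment and libraries are installed on your computer:
13 | - Python
14 |     - OpenAI
15 |     - tqdm
16 | - ffmpeg for processing audio
17 | - An OpenAI [api-key](https://platform.openai.com/account/api-keys)
18 | 
19 | If you don't know how to install the environment and libraries above, feel free to ask ChatGPT; that is exactly what the script's author did.
20 | 
21 | In short, Transcriptify is a convenient and fast speech-to-text Python script that supports compression of large files and output in multiple subtitle formats. Before using the script, install the required environment and libraries, or ask ChatGPT for help and support.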
22 | 
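23 | ## Version Update
24 | ### 1.0.1v
25 | 
26 | 2023/04/13
27 | 
28 | Changed the file-processing logic. File handling now follows this flow:
29 | 
30 | ```mermaid
31 | graph LR
32 |     A[Input File] --> B{Is the file larger than 25 MB?}
33 |     B -->|No| C(Convert speech to text)
34 |     B -->|Yes| D[Compress the original file]
35 |     D --> C
36 |     C --> E{Is there a compressed file?}
37 |     E -->|Yes| F[Delete the compressed file]-->G
38 |     E -->|No| G[Finished]
39 | ```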


--------------------------------------------------------------------------------
/Transcriptify.py:
--------------------------------------------------------------------------------
 1 | import os
 2 | import time
 3 | import openai
 4 | from tqdm import tqdm
 5 | import subprocess
 6 | 
 7 | # Define API key file path
 8 | API_KEY_PATH = os.path.expanduser("~/.openai")
 9 | 
10 | # Load API key from file if it exists
11 | if os.path.isfile(API_KEY_PATH):
12 |     with open(API_KEY_PATH, "r") as f:
13 |         api_key = f.read().strip()
14 |         openai.api_key = api_key
15 | 
16 | # Ask for API key if it's not loaded
17 | while not openai.api_key:
18 |     api_key = input("Please enter your OpenAI API key: ").strip()
19 |     openai.api_key = api_key
20 |     # Save API key to file
21 |     with open(API_KEY_PATH, "w") as f:
22 |         f.write(api_key)
23 | 
24 | # Get ffmpeg path
25 | if os.name == 'nt':  # Windows system
26 |     try:
27 |         ffmpeg_path = subprocess.check_output(['where', 'ffmpeg']).decode().splitlines()[0].strip()  # `where` may list several matches; use the first
28 |     except (subprocess.CalledProcessError, FileNotFoundError, IndexError):
29 |         print('Please make sure ffmpeg is installed and added to the PATH environment variable')
30 |         raise SystemExit(1)
31 | else:  # Linux and Mac systems
32 |     try:
33 |         ffmpeg_path = subprocess.check_output(['which', 'ffmpeg']).decode().strip()
34 |     except (subprocess.CalledProcessError, FileNotFoundError):
35 |         print('Please make sure ffmpeg is installed and added to the PATH environment variable')
36 |         raise SystemExit(1)
37 | 
38 | def compress_audio(input_file):
39 |     # Get input file size
40 |     file_size = os.path.getsize(input_file)
41 |     if file_size <= 25000000:
42 |         return input_file
43 | 
44 |     # Compress audio file using ffmpeg; arguments are passed as a list so paths with spaces work
45 |     output_file = f"{os.path.splitext(input_file)[0]}_compressed.mp3"
46 |     command = [ffmpeg_path, "-y", "-i", input_file, "-ac", "1", "-ar", "16000", "-b:a", "32k", output_file]
47 |     subprocess.run(command, check=True)
48 | 
49 |     return output_file
50 | 
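51 | # Map each output format to a response_format the API accepts (json, text, srt, verbose_json, vtt)
52 | API_FORMATS = {"txt": "text", "vtt": "vtt", "srt": "srt", "tsv": "verbose_json", "json": "json"}
53 | 
54 | def transcribe_audio(input_file, output_file, output_format):
55 |     import json  # only needed here, to serialize non-text responses
56 |     with open(input_file, "rb") as f:
57 |         transcript = openai.Audio.transcribe("whisper-1", f, response_format=API_FORMATS[output_format])
58 | 
59 |     if output_format == "tsv":
60 |         # The API has no tsv format, so build rows from verbose_json: start/end in milliseconds, then text
61 |         rows = ["start\tend\ttext"]
62 |         for segment in transcript["segments"]:
63 |             text = segment["text"].strip().replace("\t", " ")
64 |             rows.append(f"{round(segment['start'] * 1000)}\t{round(segment['end'] * 1000)}\t{text}")
65 |         transcript = "\n".join(rows) + "\n"
66 |     elif not isinstance(transcript, str):
67 |         # json responses arrive as a dict-like object rather than a string
68 |         transcript = json.dumps(transcript, ensure_ascii=False, indent=2)
69 | 
70 |     with open(output_file, "w", encoding="utf-8") as f:
71 |         f.write(transcript)
72 | 
73 | input_file = input("Please enter the audio file name (including extension): ").strip().strip("'\"")
74 | output_format = input("Please enter the output format (txt, vtt, srt, tsv, json, all): ").strip().strip("'\"").lower()
75 | 
76 | if output_format not in ["txt", "vtt", "srt", "tsv", "json", "all"]:
77 |     print("Invalid output format.")
78 |     raise SystemExit(1)
79 | 
80 | # "all" writes every format; note that each format is a separate API request
81 | formats = list(API_FORMATS) if output_format == "all" else [output_format]
82 | file_name, _ = os.path.splitext(input_file)
83 | 
84 | # Compress audio file if larger than 25MB
85 | audio_file = compress_audio(input_file)
86 | 
87 | start_time = time.time()
88 | 
89 | try:
90 |     for fmt in tqdm(formats, desc="Transcribing audio", ncols=80):
91 |         transcribe_audio(audio_file, f"{file_name}_transcript.{fmt}", fmt)
92 | finally:
93 |     # Remove the compressed copy only if one was created for this run
94 |     if audio_file != input_file and os.path.exists(audio_file):
95 |         os.remove(audio_file)
96 | 
97 | elapsed_time = time.time() - start_time
98 | print(f"Audio transcription completed in {elapsed_time:.2f} seconds")
99 | print("Transcript saved to " + ", ".join(f"{file_name}_transcript.{fmt}" for fmt in formats))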


--------------------------------------------------------------------------------
/Transcriptify_ZH.py:
--------------------------------------------------------------------------------
 1 | import os
 2 | import time
 3 | import openai
 4 | from tqdm import tqdm
 5 | import subprocess
 6 | 
 7 | # Define API key file path
 8 | API_KEY_PATH = os.path.expanduser("~/.openai")
 9 | 
10 | # Load API key from file if it exists
11 | if os.path.isfile(API_KEY_PATH):
12 |     with open(API_KEY_PATH, "r") as f:
13 |         api_key = f.read().strip()
14 |         openai.api_key = api_key
15 | 
16 | # Ask for API key if it's not loaded
17 | while not openai.api_key:
18 |     api_key = input("请输入 OpenAI API key：").strip()
19 |     openai.api_key = api_key
20 |     # Save API key to file
21 |     with open(API_KEY_PATH, "w") as f:
22 |         f.write(api_key)
23 | 
24 | # Get ffmpeg path
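25 | if os.name == 'nt':  # Windows system
26 |     try:
27 |         ffmpeg_path = subprocess.check_output(['where', 'ffmpeg']).decode().splitlines()[0].strip()  # `where` may list several matches; use the first
28 |     except (subprocess.CalledProcessError, FileNotFoundError, IndexError):
29 |         print('请确认ffmpeg已安装并配置到环境变量PATH中')  # "Please make sure ffmpeg is installed and on PATH"
30 |         raise SystemExit(1)
31 | else:  # Linux and Mac systems
32 |     try:
33 |         ffmpeg_path = subprocess.check_output(['which', 'ffmpeg']).decode().strip()
34 |     except (subprocess.CalledProcessError, FileNotFoundError):
35 |         print('请确认ffmpeg已安装并配置到环境变量PATH中')  # "Please make sure ffmpeg is installed and on PATH"
36 |         raise SystemExit(1)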
37 | 
38 | def compress_audio(input_file):
39 |     # Get input file size
40 |     file_size = os.path.getsize(input_file)
41 |     if file_size <= 25000000:
42 |         return input_file
43 | 
44 |     # Compress audio file using ffmpeg; arguments are passed as a list so paths with spaces work
45 |     output_file = f"{os.path.splitext(input_file)[0]}_compressed.mp3"
46 |     command = [ffmpeg_path, "-y", "-i", input_file, "-ac", "1", "-ar", "16000", "-b:a", "32k", output_file]
47 |     subprocess.run(command, check=True)
48 | 
49 |     return output_file
50 | 
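51 | # Map each output format to a response_format the API accepts (json, text, srt, verbose_json, vtt)
52 | API_FORMATS = {"txt": "text", "vtt": "vtt", "srt": "srt", "tsv": "verbose_json", "json": "json"}
53 | 
54 | def transcribe_audio(input_file, output_file, output_format):
55 |     import json  # only needed here, to serialize non-text responses
56 |     with open(input_file, "rb") as f:
57 |         transcript = openai.Audio.transcribe("whisper-1", f, response_format=API_FORMATS[output_format])
58 | 
59 |     if output_format == "tsv":
60 |         # The API has no tsv format, so build rows from verbose_json: start/end in milliseconds, then text
61 |         rows = ["start\tend\ttext"]
62 |         for segment in transcript["segments"]:
63 |             text = segment["text"].strip().replace("\t", " ")
64 |             rows.append(f"{round(segment['start'] * 1000)}\t{round(segment['end'] * 1000)}\t{text}")
65 |         transcript = "\n".join(rows) + "\n"
66 |     elif not isinstance(transcript, str):
67 |         # json responses arrive as a dict-like object rather than a string
68 |         transcript = json.dumps(transcript, ensure_ascii=False, indent=2)
69 | 
70 |     with open(output_file, "w", encoding="utf-8") as f:
71 |         f.write(transcript)
72 | 
73 | input_file = input("请输入音频文件名（包括扩展名）：").strip().strip("'\"")  # prompt: audio file name, including extension
74 | output_format = input("请输入输出格式（txt、vtt、srt、tsv、json、all）：").strip().strip("'\"").lower()  # prompt: output format
75 | 
76 | if output_format not in ["txt", "vtt", "srt", "tsv", "json", "all"]:
77 |     print("无效的输出格式。")  # "Invalid output format."
78 |     raise SystemExit(1)
79 | 
80 | # "all" writes every format; note that each format is a separate API request
81 | formats = list(API_FORMATS) if output_format == "all" else [output_format]
82 | file_name, _ = os.path.splitext(input_file)
83 | 
84 | # Compress audio file if larger than 25MB
85 | audio_file = compress_audio(input_file)
86 | 
87 | start_time = time.time()
88 | 
89 | try:
90 |     for fmt in tqdm(formats, desc="音频转录中", ncols=80):  # progress label: "Transcribing audio"
91 |         transcribe_audio(audio_file, f"{file_name}_transcript.{fmt}", fmt)
92 | finally:
93 |     # Remove the compressed copy only if one was created for this run
94 |     if audio_file != input_file and os.path.exists(audio_file):
95 |         os.remove(audio_file)
96 | 
97 | elapsed_time = time.time() - start_time
98 | print(f"音频转录完成，用时 {elapsed_time:.2f} 秒")  # "Audio transcription completed in ... seconds"
99 | print("音频转录已保存到 " + ", ".join(f"{file_name}_transcript.{fmt}" for fmt in formats))  # "Transcript saved to ..."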


--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
  1 | aiofiles==23.1.0
  2 | aiohttp==3.8.4
  3 | aiosignal==1.3.1
  4 | altair==4.2.2
  5 | anyio==3.6.2
  6 | arxiv==1.4.3
  7 | async-generator==1.10
  8 | async-timeout==4.0.2
  9 | asynctest==0.13.0
 10 | attrs==22.2.0
 11 | beautifulsoup4==4.12.2
 12 | black==23.3.0
 13 | blobfile==2.0.2
 14 | Brotli==1.0.9
 15 | bs4==0.0.1
 16 | cachetools==5.3.0
 17 | certifi==2022.12.7
 18 | cfgv==3.3.1
 19 | chardet==5.1.0
 20 | charset-normalizer==2.0.12
 21 | click==8.1.3
 22 | colorama==0.4.6
 23 | contourpy==1.0.7
 24 | coverage==7.2.3
 25 | cssselect==1.2.0
 26 | cycler==0.11.0
 27 | dataclasses-json==0.5.7
 28 | decorator==4.4.2
 29 | distlib==0.3.6
 30 | dnspython==2.3.0
 31 | docker==6.0.1
 32 | docopt==0.6.2
 33 | duckduckgo-search==2.8.6
 34 | entrypoints==0.4
 35 | exceptiongroup==1.1.1
 36 | fastapi==0.95.1
 37 | feedparser==6.0.10
 38 | ffmpeg-python==0.2.0
 39 | ffmpy==0.3.0
 40 | filelock==3.11.0
 41 | flake8==6.0.0
 42 | Flask==2.2.3
 43 | Flask-Cors==3.0.10
 44 | fonttools==4.39.3
 45 | frozenlist==1.3.3
 46 | fsspec==2023.4.0
 47 | future==0.18.3
 48 | gitdb==4.0.10
 49 | GitPython==3.1.31
 50 | google-api-core==2.11.0
 51 | google-api-python-client==2.85.0
 52 | google-auth==2.17.3
 53 | google-auth-httplib2==0.1.0
 54 | googleapis-common-protos==1.59.0
 55 | gptcache==0.1.18
 56 | gradio==3.20.1
 57 | gTTS==2.3.1
 58 | h11==0.14.0
 59 | httpcore==0.17.0
 60 | httplib2==0.22.0
 61 | httpx==0.24.0
 62 | identify==2.5.22
 63 | idna==3.4
 64 | imageio==2.27.0
 65 | imageio-ffmpeg==0.4.8
 66 | inflate64==0.3.1
 67 | iniconfig==2.0.0
 68 | isort==5.12.0
 69 | itsdangerous==2.1.2
 70 | jieba==0.42.1
 71 | Jinja2==3.1.2
 72 | joblib==1.2.0
 73 | jsonschema==4.17.3
 74 | kiwisolver==1.4.4
 75 | langchain==0.0.142
 76 | linkify-it-py==2.0.0
 77 | llama-index==0.5.23.post1
 78 | llvmlite==0.39.1
 79 | loguru==0.7.0
 80 | lxml==4.9.2
 81 | Markdown==3.4.3
 82 | markdown-it-py==2.2.0
 83 | MarkupSafe==2.1.2
 84 | marshmallow==3.19.0
 85 | marshmallow-enum==1.5.1
 86 | matplotlib==3.7.1
 87 | mccabe==0.7.0
 88 | mdit-py-plugins==0.3.3
 89 | mdurl==0.1.2
 90 | more-itertools==9.1.0
 91 | moviepy==1.0.3
 92 | mpmath==1.3.0
 93 | multidict==6.0.4
 94 | multivolumefile==0.2.3
 95 | mypy-extensions==1.0.0
 96 | networkx==3.1
 97 | nltk==3.8.1
 98 | nodeenv==1.7.0
 99 | numba==0.56.4
100 | numexpr==2.8.4
101 | numpy==1.23.5
102 | oauthlib==3.2.2
103 | openai==0.27.0
104 | openai-whisper @ git+https://github.com/openai/whisper.git@c09a7ae299c4c34c5839a76380ae407e7d785914
105 | openapi-schema-pydantic==1.2.4
106 | orjson==3.8.10
107 | outcome==1.2.0
108 | packaging==23.1
109 | pandas==2.0.0
110 | pathspec==0.11.1
111 | Pillow==9.4.0
112 | pinecone-client==2.2.1
113 | platformdirs==3.2.0
114 | playsound==1.2.2
115 | pluggy==1.0.0
116 | pre-commit==3.2.2
117 | proglog==0.1.10
118 | protobuf==4.22.3
119 | psutil==5.9.5
120 | py-cpuinfo==9.0.0
121 | py7zr==0.20.5
122 | pyasn1==0.4.8
123 | pyasn1-modules==0.2.8
124 | pybase64==1.2.3
125 | pybcj==1.0.1
126 | pycodestyle==2.10.0
127 | pycryptodome==3.17
128 | pycryptodomex==3.17
129 | pydantic==1.10.7
130 | pydub==0.25.1
131 | pyflakes==3.0.1
132 | PyMuPDF==1.21.1
133 | pyparsing==3.0.9
134 | pyppmd==1.0.0
135 | pyrsistent==0.19.3
136 | PySocks==1.7.1
137 | pytest==7.3.1
138 | pytest-asyncio==0.21.0
139 | pytest-benchmark==4.0.0
140 | pytest-cov==4.0.0
141 | pytest-integration==0.2.3
142 | pytest-mock==3.10.0
143 | python-dateutil==2.8.2
144 | python-dotenv==1.0.0
145 | python-multipart==0.0.6
146 | pytz==2023.3
147 | PyYAML==6.0
148 | pyzstd==0.15.7
149 | rarfile==4.0
150 | readability-lxml==0.8.1
151 | redis==4.5.4
152 | regex==2023.3.23
153 | requests==2.26.0
154 | requests-oauthlib==1.3.1
155 | rsa==4.9
156 | selenium==4.8.3
157 | sgmllib3k==1.0.0
158 | six==1.16.0
159 | smmap==5.0.0
160 | sniffio==1.3.0
161 | sortedcontainers==2.4.0
162 | soupsieve==2.4.1
163 | sourcery==1.2.0
164 | SQLAlchemy==1.4.47
165 | starlette==0.26.1
166 | sympy==1.11.1
167 | tenacity==8.2.2
168 | texttable==1.6.7
169 | tiktoken==0.2.0
170 | tomli==2.0.1
171 | toolz==0.12.0
172 | torch==2.0.0
173 | tqdm==4.65.0
174 | trio==0.22.0
175 | trio-websocket==0.10.2
176 | tweepy==4.13.0
177 | typing-inspect==0.8.0
178 | typing_extensions==4.5.0
179 | tzdata==2023.3
180 | uc-micro-py==1.0.1
181 | uritemplate==4.1.1
182 | urllib3==1.25.11
183 | uvicorn==0.21.1
184 | virtualenv==20.21.0
185 | webdriver-manager==3.8.6
186 | websocket-client==1.5.1
187 | websockets==11.0.2
188 | webvtt-py==0.4.6
189 | Werkzeug==2.2.3
190 | wsproto==1.2.0
191 | yarl==1.8.2
192 | 


--------------------------------------------------------------------------------