├── .gitignore ├── LICENSE ├── README.md ├── funs ├── 01.视频抽帧.py ├── 02.裁剪图像.py ├── 03.识别字幕.py └── 04.提取中间字幕.py ├── logo.png ├── main.py └── requirements.txt /.gitignore: -------------------------------------------------------------------------------- 1 | .idea 2 | config.txt 3 | test 4 | tmp_imgs 5 | test 6 | __pycache__ 7 | build 8 | dist 9 | *.spec 10 | output -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2023 CodeFly 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 
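22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # SubExtractor-OCR 2 | 3 | > Subtitle Extractor, OCR Edition 1.0 4 | 5 | ## Introduction 6 | 7 | SubExtractor-OCR is a video subtitle extractor based on OCR (Optical Character Recognition). Built with PyQt6, it extracts the on-screen subtitle text from a video so that you can obtain the video's script. 8 | 9 | It is an open-source, home-made tool created to take the pain out of transcribing video scripts. No more pausing the video to copy the text by hand; the subtitle extractor does it for you! 10 | 11 | ## Features 12 | 13 | - **OCR extraction**: uses optical character recognition to extract subtitle text from video. 14 | - **User-friendly interface**: built with PyQt6, with an intuitive, easy-to-use UI. 15 | 16 | - **Custom height range**: set the subtitle height range to adapt to different videos. 17 | ## Installation 18 | 19 | ### Download the package 20 | 21 | The package is the rar file published with the project; unpack it and run. 22 | 23 | Download link: [https://github.com/w-x-x-w/SubExtractor-OCR/releases](https://github.com/w-x-x-w/SubExtractor-OCR/releases) 24 | 25 | ## Notes 26 | 27 | Now for some bad news. Because this is version 1.0 of the subtitle extractor, some features are not supported yet, such as batch video processing and one-click automation. Improvements are already planned for future versions. 28 | 29 | ### Why is there no batch processing? 30 | 31 | Batch processing involves more complicated scenarios: videos with different dimensions can make the subtitle height range wrong. 32 | 33 | I am considering a feature that sets the subtitle height range per video; feel free to discuss it in the comments. 34 | 35 | ### Developer's notes 36 | 37 | The list of possible features is endless, and the perfectionist in me keeps looking for a better design in order to ship the best version possible. The current version is not perfect, but every step and function is implemented; only the user workflow still needs to be streamlined. If you have some coding background, feel free to try it and send suggestions. 38 | 39 | ### Usage 40 | 41 | 1. Select a video file. 42 | 2. Click the "Extract Frames" button and wait for frame extraction to finish. 43 | 3. Measure the subtitle height range. 44 | 4. Click the remaining buttons to complete the later steps. 45 | 5. Copy the text and enjoy your extracted subtitles! 46 | 47 | ### Demo 48 | 49 | In the running application, select a video file and click the frame-extraction button to start. The later steps are fast, and the final result appears in the text output. 50 | 1.0 51 | - Fixed recognition of subtitles that wrap onto a second line 52 | 53 | ### To do 54 | 55 | - Batch, automated processing 56 | - Run tasks in a separate thread instead of a single thread 57 | - Presets, so the height range does not have to be entered every time 58 | 59 | ### Final words 60 | 61 | Thank you for using Subtitle Extractor, OCR Edition 1.0! We will keep improving and updating it to give users a better experience. If you have questions or suggestions, please share them in the comments. We hope you enjoy this little tool and that it makes extracting video scripts easier. 62 | 63 | Subtitles usually do not wrap, right? 64 | 65 | Just crop a single line and stop overthinking it. 66 | 67 | ## Packaging commands 68 | 69 | `pyinstaller -w 视频字幕提取-OCR版-1.0.py -i ./logo.png` 70 | 71 | `pyinstaller -w 视频字幕提取-OCR版-2.0.py -i ./logo.png` 72 | 73 | ## Contributing 74 | 75 | If you find a problem or have a suggestion for improvement, please open an issue or create a pull request. Contributions are welcome! 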
76 | 77 | ## License 78 | 79 | SubExtractor-OCR is released under the [MIT License](LICENSE). 80 | -------------------------------------------------------------------------------- /funs/01.视频抽帧.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | # @Time : 2023/12/6 23:17 3 | # @QQ : 2942581284 4 | # @File : 01.视频抽帧.py 5 | import cv2 6 | import os 7 | 8 | def get_video_frames(video_path=r"D:\program\剪映\导出\如何给我们的PyQt6程序制作一个炫酷的充值按钮.mp4", 9 | tmp_imgs_path='tmp_imgs'): 10 | tmp_folder_path, file_extension = os.path.splitext(os.path.basename(video_path)) 11 | folder_path=os.path.join(tmp_imgs_path, tmp_folder_path) 12 | if not os.path.exists(folder_path): 13 | os.makedirs(folder_path) 14 | # Open the video file 15 | cap = cv2.VideoCapture(video_path) 16 | # Check whether the video was opened successfully 17 | if not cap.isOpened(): 18 | print("无法打开视频文件") 19 | else: 20 | # Get the total number of frames 21 | total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT)) 22 | print(f"视频总帧数: {total_frames}") 23 | frame_rate = int(cap.get(cv2.CAP_PROP_FPS)) 24 | print(f"视频帧率: {frame_rate} fps") 25 | # Frame interval: sample one frame per second; keep it at least 1 so the modulo below never divides by zero 26 | frame_interval = max(1, frame_rate // 1) 27 | # Loop over every frame of the video and save the sampled ones as images 28 | frame_count = 0 29 | while True: 30 | ret, frame = cap.read() 31 | if not ret: 32 | break 33 | if frame_count % frame_interval == 0: 34 | image_filename = f"frame_{frame_count:06d}.png" 35 | image_save_path=os.path.join(folder_path,image_filename) 36 | cv2.imencode(".png", frame)[1].tofile(image_save_path) 37 | print(f"保存图像:{image_filename},视频抽帧进度:{frame_count//frame_interval}/{total_frames//frame_interval}:{frame_count}/{total_frames}") 38 | frame_count += 1 39 | cap.release() 40 | 41 | get_video_frames() -------------------------------------------------------------------------------- /funs/02.裁剪图像.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | # @Time : 
2023/12/6 23:17 3 | # @QQ : 2942581284 4 | # @File : 02.裁剪图像.py 5 | import os 6 | import cv2 7 | import numpy as np 8 | 9 | 10 | def cv_imread(file_path): 11 | cv_img = cv2.imdecode(np.fromfile(file_path, dtype=np.uint8), -1) 12 | return cv_img 13 | 14 | 15 | # imgurl='测试.jpg' 16 | # img1 = cv_imread(imgurl) 17 | # cv2.imencode('.jpg', img1 )[1].tofile(imgurl) 18 | 19 | def cut_zimu_from_img( 20 | imgs_path=r'D:\文件夹\github_\myshare_github\Python-Project-Pro\视频文案提取-OCR字幕识别\tmp_imgs\【PyQtPySide界面美化】qt-material极简上手!'): 21 | zimu_folder_name = os.path.basename(imgs_path) + '-字幕' 22 | zimu_path = os.path.join(os.path.dirname(imgs_path), zimu_folder_name) 23 | if not os.path.exists(zimu_path): 24 | os.makedirs(zimu_path) 25 | # 获取文件夹中的所有图片文件 26 | image_files = [f for f in os.listdir(imgs_path)] 27 | # 定义裁剪区域的坐标 (x, y, width, height) 28 | height_min = 1800 29 | height_max = 1950 30 | width = 3800 31 | crop_area = (0, height_min, width, height_max - height_min) 32 | # 循环处理每个图片文件 33 | for image_file in image_files: 34 | # 拼接图片文件的完整路径 35 | image_path = os.path.normpath(os.path.join(imgs_path, image_file)) 36 | 37 | print(image_path) 38 | # 读取图片 39 | image = cv_imread(image_path) 40 | # image = cv2.imread(image_path) 41 | if image is not None: 42 | # 裁剪图片 43 | x, y, w, h = crop_area 44 | cropped_image = image[y:y + h, x:x + w] 45 | # 保存裁剪后的图片 46 | output_file = os.path.join(zimu_path, f"cropped_{image_file}") 47 | cv2.imencode(".png", cropped_image)[1].tofile(output_file) 48 | print(f"已保存裁剪后的图片:{output_file}") 49 | else: 50 | print(f"无法读取图片:{image_path}") 51 | 52 | cut_zimu_from_img() -------------------------------------------------------------------------------- /funs/03.识别字幕.py: -------------------------------------------------------------------------------- 1 | import os 2 | import json 3 | import time 4 | from wechat_ocr.ocr_manager import OcrManager, OCR_MAX_TASK_ID 5 | 6 | wechat_ocr_dir = 
r"C:\Users\86159\AppData\Roaming\Tencent\WeChat\XPlugin\Plugins\WeChatOCR\7061\extracted\WeChatOCR.exe" 7 | # wechat_ocr_dir = "C:\\Users\\Administrator\\AppData\\Roaming\\Tencent\\WeChat\\XPlugin\\Plugins\\WeChatOCR\\7057\\extracted\\WeChatOCR.exe" 8 | wechat_dir = r"D:\微信\WeChat\[3.9.8.15]" 9 | # wechat_dir = "D:\\GreenSoftware\\WeChat\\3.9.6.32" 10 | 11 | def ocr_result_callback(img_path: str, results: dict): 12 | zimu_imgs_path=os.path.dirname(img_path).replace('-字幕','-识别结果') 13 | zimu_imgs_name=os.path.basename(img_path) 14 | result_file = os.path.join(zimu_imgs_path,zimu_imgs_name)+".json" 15 | # result_file = os.path.basename(img_path) + ".json" 16 | print(f"识别成功,img_path: {img_path}, result_file: {result_file}") 17 | with open(result_file, 'w', encoding='utf-8') as f: 18 | f.write(json.dumps(results, ensure_ascii=False, indent=2)) 19 | 20 | def ocr_zimu(file_name=r'D:\文件夹\github_\myshare_github\Python-Project-Pro\视频文案提取-OCR字幕识别\tmp_imgs\如何给我们的PyQt6程序制作一个炫酷的充值按钮'): 21 | zimu_path=file_name+'-字幕' 22 | output_path=file_name+'-识别结果' 23 | if not os.path.exists(output_path): 24 | os.makedirs(output_path) 25 | ocr_manager = OcrManager(wechat_dir) 26 | # 设置WeChatOcr目录 27 | ocr_manager.SetExePath(wechat_ocr_dir) 28 | # 设置微信所在路径 29 | ocr_manager.SetUsrLibDir(wechat_dir) 30 | # 设置ocr识别结果的回调函数 31 | ocr_manager.SetOcrResultCallback(ocr_result_callback) 32 | # 启动ocr服务 33 | ocr_manager.StartWeChatOCR() 34 | # 开始识别图片 35 | image_files = [f'{f}' for f in os.listdir(zimu_path)] 36 | for image_name in image_files: 37 | image_path=os.path.join(zimu_path,image_name) 38 | ocr_manager.DoOCRTask(image_path) 39 | # time.sleep(1) 40 | while ocr_manager.m_task_id.qsize() != OCR_MAX_TASK_ID: 41 | pass 42 | # 识别输出结果 43 | ocr_manager.KillWeChatOCR() 44 | 45 | 46 | if __name__ == "__main__": 47 | ocr_zimu() 48 | -------------------------------------------------------------------------------- /funs/04.提取中间字幕.py: 
-------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | # @Time : 2023/12/6 20:25 3 | # @QQ : 2942581284 4 | # @File : 提取中间字幕.py 5 | import os 6 | import json 7 | 8 | def extract_zimu_from_file(file): 9 | with open(file, 'r', encoding='utf-8') as file: 10 | data = json.load(file) 11 | zimu_width = 3800 12 | if data: 13 | # print(data) 14 | ocrResult=data['ocrResult'] 15 | if ocrResult: 16 | ocrResult.sort(key=lambda x:-1*(x['location']['right']-x['location']['left'])) 17 | center=ocrResult[0]['location']['left']+ocrResult[0]['location']['right'] 18 | if abs(center - zimu_width) < 100: 19 | print(ocrResult[0]['text']) 20 | return ocrResult[0]['text'] 21 | else: 22 | return False 23 | 24 | def connect_center_result(file_path=r'D:\文件夹\github_\myshare_github\Python-Project-Pro\视频文案提取-OCR字幕识别\tmp_imgs\如何给我们的PyQt6程序制作一个炫酷的充值按钮'): 25 | res_list=[] 26 | zimu_path=file_path+'-识别结果' 27 | zimu_files=os.listdir(zimu_path) 28 | zimu_files.sort(key=lambda x:int(x[14:20])) 29 | for file in zimu_files: 30 | zimu_file_path=os.path.join(zimu_path,file) 31 | result=extract_zimu_from_file(zimu_file_path) 32 | if result: 33 | res_list.append(result) 34 | res_list=list(dict.fromkeys(res_list)) 35 | # res_list=handke_repeat_list(res_list) 36 | print(','.join(res_list)) 37 | 38 | connect_center_result() -------------------------------------------------------------------------------- /logo.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/cpython666/SubExtractor-OCR/88ef66f6eb1a94d0057adc28fd8a4973f6fd411d/logo.png -------------------------------------------------------------------------------- /main.py: -------------------------------------------------------------------------------- 1 | import sys 2 | import os 3 | import json 4 | import base64 5 | import requests 6 | import cv2 7 | import math 8 | from PySide6.QtWidgets import (QApplication, QMainWindow, QWidget, 
QVBoxLayout, 9 | QPushButton, QLabel, QFileDialog, QListWidget, QHBoxLayout, 10 | QLineEdit, QTextEdit, QProgressBar, QMessageBox) 11 | from PySide6.QtGui import QIcon, QGuiApplication, QDesktopServices 12 | from PySide6.QtCore import QUrl, Qt, QThread, Signal 13 | from qt_material import apply_stylesheet 14 | from dotenv import load_dotenv 15 | 16 | load_dotenv() 17 | UMI_OCR_URL = os.getenv("UMI_OCR_URL", 'http://127.0.0.1:1224/api/ocr') 18 | 19 | 20 | class SubtitleExtractor(QThread): 21 | """字幕提取的工作线程,负责处理视频并生成SRT和TXT字幕文件""" 22 | progress_extract_signal = Signal(int, int) # 抽取字幕图片进度信号 23 | progress_ocr_signal = Signal(int, int) # OCR识别进度信号 24 | progress_combine_signal = Signal(int, int) # 字幕合并进度信号 25 | finished_signal = Signal() # 任务完成信号 26 | message_signal = Signal(str) # 输出消息信号 27 | 28 | def __init__(self, parent=None): 29 | super().__init__(parent) 30 | self.frames_per_second = 1 # 默认每秒抽取1帧,可通过UI配置 31 | 32 | def set_video_paths(self, video_paths, frames_per_second=1): 33 | """设置视频路径和每秒抽取帧数""" 34 | self.video_paths = video_paths 35 | self.frames_per_second = frames_per_second 36 | 37 | def run(self): 38 | """线程运行主函数,依次处理每个视频""" 39 | for video_path in self.video_paths: 40 | self.video_path = video_path 41 | self.video_name = os.path.splitext(os.path.basename(video_path))[0] 42 | self.message_signal.emit(f"开始提取{self.video_name}的字幕!") 43 | self.setup_directories() 44 | self.extract_subtitle_frames() 45 | self.message_signal.emit("字幕区域裁剪完成,开始OCR识别~") 46 | self.perform_ocr() 47 | self.message_signal.emit("字幕识别完成,开始合并~") 48 | self.generate_srt_and_txt_files() 49 | self.message_signal.emit(f"字幕已输出至: {self.output_dir}/{self.video_name}.srt 和 {self.output_dir}/{self.video_name}.txt") 50 | self.message_signal.emit("请手动删除临时文件!") 51 | self.finished_signal.emit() 52 | 53 | def setup_directories(self): 54 | """初始化临时目录和输出目录""" 55 | self.subtitle_dir = f'tmp_imgs/{self.video_name}-字幕' 56 | self.ocr_result_dir = f'tmp_imgs/{self.video_name}-识别结果' 57 | self.output_dir = 
'output' 58 | os.makedirs(self.subtitle_dir, exist_ok=True) 59 | os.makedirs(self.ocr_result_dir, exist_ok=True) 60 | os.makedirs(self.output_dir, exist_ok=True) 61 | 62 | def extract_subtitle_frames(self): 63 | """Sample frames from the video, crop the subtitle region, and save each crop as an image""" 64 | cap = cv2.VideoCapture(self.video_path) 65 | if not cap.isOpened(): 66 | self.message_signal.emit('无法打开视频文件') 67 | return 68 | 69 | total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT)) 70 | frame_rate = int(cap.get(cv2.CAP_PROP_FPS)) 71 | duration = total_frames / frame_rate if frame_rate > 0 else 0 72 | self.message_signal.emit(f"视频总帧数: {total_frames}") 73 | self.message_signal.emit(f"视频帧率: {frame_rate} fps") 74 | self.message_signal.emit(f"视频时长: {duration:.2f} 秒") 75 | 76 | # Get the video width and height 77 | width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)) 78 | height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)) 79 | self.message_signal.emit(f"视频宽高: {width}x{height}") 80 | 81 | # Sampling interval in frames; at least 1, so a requested rate above the video fps (or a reported fps of 0) cannot cause division by zero 82 | frame_interval = max(1, frame_rate // self.frames_per_second) 83 | frame_count = 0 84 | extracted_count = 0 85 | total_extracted = total_frames // frame_interval 86 | 87 | # Zero-padding width for the frame number in file names 88 | padding_length = math.ceil(math.log10(total_frames)) if total_frames > 0 else 6 89 | 90 | # Read one frame to determine the crop height 91 | ret, frame = cap.read() 92 | if ret: 93 | crop_height_min = int(frame.shape[0] * 0.8) 94 | self.message_signal.emit(f"截取字幕高度区间: {crop_height_min} - {frame.shape[0]}") 95 | cap.set(cv2.CAP_PROP_POS_FRAMES, 0) # Rewind to the first frame 96 | 97 | while True: 98 | ret, frame = cap.read() 99 | if not ret: 100 | break 101 | if frame_count % frame_interval == 0: 102 | image_filename = f"frame_{frame_count:0{padding_length}d}.png" 103 | image_path = os.path.join(self.subtitle_dir, image_filename) 104 | cropped_frame = self.crop_subtitle_area(frame) 105 | cv2.imencode(".png", cropped_frame)[1].tofile(image_path) 106 | extracted_count += 1 107 | self.progress_extract_signal.emit(extracted_count, total_extracted) 108 | frame_count += 1 109 | cap.release() 110 | 111 | def crop_subtitle_area(self, frame): 
112 | """裁剪视频帧中的字幕区域(默认裁剪底部20%区域)""" 113 | height = frame.shape[0] 114 | height_min = int(height * 0.8) # 字幕通常在底部 115 | return frame[height_min:, :] 116 | 117 | def perform_ocr(self): 118 | """对字幕图片进行OCR识别并保存结果""" 119 | image_files = [f for f in os.listdir(self.subtitle_dir) if f.endswith('.png')] 120 | total = len(image_files) 121 | api_url = UMI_OCR_URL 122 | 123 | for idx, image_name in enumerate(image_files): 124 | image_path = os.path.join(self.subtitle_dir, image_name) 125 | with open(image_path, 'rb') as img_file: 126 | img_base64 = base64.b64encode(img_file.read()).decode('utf-8') 127 | data = { 128 | "base64": img_base64, 129 | "options": {"data.format": "dict"} 130 | # "options": {"data.format": "text"} 131 | } 132 | headers = {"Content-Type": "application/json"} 133 | 134 | data_str = json.dumps(data) 135 | retries = 3 136 | for attempt in range(retries): 137 | try: 138 | response = requests.post(api_url, data=data_str,headers=headers, timeout=10) 139 | if response.status_code == 200: 140 | result = response.json() 141 | print("result",result) 142 | text = self.parse_ocr_result(result) 143 | print("text",text) 144 | result_file = os.path.join(self.ocr_result_dir, f"{image_name}.json") 145 | with open(result_file, 'w', encoding='utf-8') as f: 146 | json.dump({'text': text}, f, ensure_ascii=False, indent=2) 147 | break 148 | elif response.status_code == 502: 149 | if attempt < retries - 1: 150 | self.message_signal.emit(f"API返回502,正在重试... (尝试 {attempt + 1}/{retries})") 151 | else: 152 | self.message_signal.emit(f"API调用失败: 502,重试 {retries} 次后仍失败") 153 | else: 154 | self.message_signal.emit(f"OCR API调用失败: {response.status_code}") 155 | break 156 | except Exception as e: 157 | if attempt < retries - 1: 158 | self.message_signal.emit(f"OCR处理出错: {str(e)},正在重试... 
(尝试 {attempt + 1}/{retries})") 159 | else: 160 | self.message_signal.emit(f"OCR处理出错: {str(e)},重试 {retries} 次后仍失败") 161 | self.progress_ocr_signal.emit(idx + 1, total) 162 | 163 | def parse_ocr_result(self, result): 164 | """解析UMI-OCR的返回结果,提取面积最大的一句话""" 165 | # 检查识别是否成功,code 为 100 表示成功 166 | if result.get('code') != 100: 167 | return '' 168 | 169 | # 获取 data 列表,若为空则返回空字符串 170 | data = result.get('data', []) 171 | if not data: 172 | return '' 173 | 174 | # 初始化最大面积和对应的文本 175 | max_area = 0 176 | max_text = '' 177 | 178 | # 遍历 data 中的每个文本项 179 | for item in data: 180 | box = item.get('box', []) 181 | # 确保 box 包含 4 个点(矩形) 182 | if len(box) != 4: 183 | continue 184 | 185 | # 计算矩形面积 186 | # box 格式为 [[x1, y1], [x2, y2], [x3, y3], [x4, y4]] 187 | # 宽 = x2 - x1,高 = y3 - y1 188 | width = box[1][0] - box[0][0] 189 | height = box[2][1] - box[0][1] 190 | area = width * height 191 | 192 | # 更新最大面积及其对应的文本 193 | if area > max_area: 194 | max_area = area 195 | max_text = item.get('text', '') 196 | 197 | return max_text 198 | 199 | def normalize_text(self, text): 200 | """归一化相近字符,例如将'—'替换为'-'""" 201 | normalization_map = {'—': '-'} 202 | for src, dst in normalization_map.items(): 203 | text = text.replace(src, dst) 204 | return text 205 | 206 | def generate_srt_and_txt_files(self): 207 | """合并OCR结果并生成带时间线的SRT文件和TXT文件""" 208 | ocr_files = [f for f in os.listdir(self.ocr_result_dir) if f.endswith('.json')] 209 | ocr_files.sort(key=lambda x: int(x.split('_')[1].split('.')[0])) 210 | total = len(ocr_files) 211 | 212 | subtitles = [] 213 | current_text = None 214 | start_frame = None 215 | frame_rate = int(cv2.VideoCapture(self.video_path).get(cv2.CAP_PROP_FPS)) 216 | 217 | for idx, file in enumerate(ocr_files): 218 | file_path = os.path.join(self.ocr_result_dir, file) 219 | text = self.parse_ocr_json(file_path) 220 | if text: 221 | normalized_text = self.normalize_text(text) 222 | frame_num = int(file.split('_')[1].split('.')[0]) 223 | if current_text is None: 224 | current_text = 
normalized_text 225 | start_frame = frame_num 226 | elif normalized_text != current_text: 227 | end_frame = frame_num - 1 228 | subtitles.append({ 229 | 'start': start_frame / frame_rate, 230 | 'end': end_frame / frame_rate, 231 | 'text': current_text 232 | }) 233 | current_text = normalized_text 234 | start_frame = frame_num 235 | self.progress_combine_signal.emit(idx + 1, total) 236 | 237 | # Append the last subtitle 238 | if current_text is not None: 239 | end_frame = int(ocr_files[-1].split('_')[1].split('.')[0]) 240 | subtitles.append({ 241 | 'start': start_frame / frame_rate, 242 | 'end': end_frame / frame_rate, 243 | 'text': current_text 244 | }) 245 | 246 | # Build the SRT and TXT contents 247 | srt_content = "" 248 | txt_content = "" 249 | for i, sub in enumerate(subtitles, start=1): 250 | start_time = self.format_srt_time(sub['start']) 251 | end_time = self.format_srt_time(sub['end']) 252 | srt_content += f"{i}\n{start_time} --> {end_time}\n{sub['text']}\n\n" 253 | txt_content += f"{sub['text']}\n" 254 | 255 | # Write the SRT file 256 | srt_path = os.path.join(self.output_dir, f"{self.video_name}.srt") 257 | with open(srt_path, 'w', encoding='utf-8') as f: 258 | f.write(srt_content) 259 | self.message_signal.emit("SRT文件生成完成!") 260 | 261 | # Write the TXT file 262 | txt_path = os.path.join(self.output_dir, f"{self.video_name}.txt") 263 | # De-duplicate lines while keeping their order; splitlines() avoids the trailing empty entry that split('\n') produced 264 | res_list = list(dict.fromkeys(txt_content.splitlines())) 265 | print(','.join(res_list)) 266 | txt_content=','.join(res_list) 267 | self.message_signal.emit(','.join(res_list)) 268 | with open(txt_path, 'w', encoding='utf-8') as f: 269 | f.write(txt_content) 270 | self.message_signal.emit("TXT文件生成完成!") 271 | 272 | def parse_ocr_json(self, file_path): 273 | """Read the OCR text stored in a JSON result file""" 274 | with open(file_path, 'r', encoding='utf-8') as f: 275 | data = json.load(f) 276 | return data.get('text', '') 277 | 278 | def format_srt_time(self, seconds): 279 | """Convert a number of seconds to the SRT time format (for example 00:00:01,000)""" 280 | hours = int(seconds // 3600) 281 | minutes = int((seconds % 3600) // 60) 282 | secs = 
int(seconds % 60) 283 | millis = int((seconds - int(seconds)) * 1000) 284 | return f"{hours:02}:{minutes:02}:{secs:02},{millis:03}" 285 | 286 | 287 | class SubtitleApp(QMainWindow): 288 | """主窗口,提供用户界面交互""" 289 | def __init__(self): 290 | super().__init__() 291 | self.selected_videos = [] 292 | self.output_dir = os.path.join(os.path.dirname(__file__), "output") 293 | self.init_ui() 294 | self.center_window() 295 | 296 | def center_window(self): 297 | """将窗口居中显示""" 298 | qr = self.frameGeometry() 299 | cp = QGuiApplication.primaryScreen().availableGeometry().center() 300 | qr.moveCenter(cp) 301 | self.move(qr.topLeft()) 302 | 303 | def init_ui(self): 304 | """初始化用户界面""" 305 | self.setWindowIcon(QIcon('logo.png')) 306 | self.setGeometry(0, 0, 800, 600) 307 | self.setWindowTitle('字幕提取器') 308 | layout = QVBoxLayout() 309 | 310 | # 文件选择区域 311 | file_layout = QHBoxLayout() 312 | self.file_label = QLabel('已选择文件:') 313 | file_layout.addWidget(self.file_label) 314 | self.file_list = QListWidget() 315 | file_layout.addWidget(self.file_list) 316 | file_btn_layout = QVBoxLayout() 317 | add_file_btn = QPushButton('添加文件') 318 | add_file_btn.clicked.connect(self.add_files) 319 | file_btn_layout.addWidget(add_file_btn) 320 | remove_file_btn = QPushButton('移除选中文件') 321 | remove_file_btn.clicked.connect(self.remove_selected_files) 322 | file_btn_layout.addWidget(remove_file_btn) 323 | file_layout.addLayout(file_btn_layout) 324 | layout.addLayout(file_layout) 325 | 326 | # 配置选项 327 | self.output_label = QLabel(f'字幕输出路径: {self.output_dir}/{{视频名称}}.srt 和 {{视频名称}}.txt') 328 | layout.addWidget(self.output_label) 329 | layout.addWidget(QLabel('每秒抽取帧数:')) 330 | self.fps_input = QLineEdit("1") 331 | layout.addWidget(self.fps_input) 332 | 333 | # 进度条 334 | self.progress_extract = QProgressBar(self) 335 | layout.addWidget(QLabel('字幕图片提取进度:')) 336 | layout.addWidget(self.progress_extract) 337 | self.progress_ocr = QProgressBar(self) 338 | layout.addWidget(QLabel('字幕OCR识别进度:')) 339 | 
layout.addWidget(self.progress_ocr) 340 | self.progress_combine = QProgressBar(self) 341 | layout.addWidget(QLabel('字幕合并进度:')) 342 | layout.addWidget(self.progress_combine) 343 | 344 | # 输出信息 345 | self.output_text = QTextEdit() 346 | self.output_text.setReadOnly(True) 347 | layout.addWidget(self.output_text) 348 | 349 | # 开始按钮 350 | start_btn = QPushButton('开始处理') 351 | start_btn.clicked.connect(self.start_processing) 352 | layout.addWidget(start_btn) 353 | 354 | # 作者信息 355 | layout.addLayout(self.create_author_link()) 356 | 357 | # 设置主窗口布局 358 | main_widget = QWidget() 359 | main_widget.setLayout(layout) 360 | self.setCentralWidget(main_widget) 361 | 362 | def start_processing(self): 363 | """启动字幕提取进程""" 364 | if not self.selected_videos: 365 | QMessageBox.information(self, '提示', '请先选择视频文件', QMessageBox.StandardButton.Ok) 366 | return 367 | try: 368 | fps = int(self.fps_input.text()) 369 | if fps <= 0: 370 | raise ValueError 371 | except ValueError: 372 | QMessageBox.information(self, '提示', '每秒抽取帧数必须是正整数', QMessageBox.StandardButton.Ok) 373 | return 374 | 375 | self.extractor = SubtitleExtractor(self) 376 | self.extractor.set_video_paths(self.selected_videos, fps) 377 | self.extractor.progress_extract_signal.connect(self.update_extract_progress) 378 | self.extractor.progress_ocr_signal.connect(self.update_ocr_progress) 379 | self.extractor.progress_combine_signal.connect(self.update_combine_progress) 380 | self.extractor.message_signal.connect(self.update_output) 381 | self.extractor.finished_signal.connect(self.reset_progress) 382 | self.reset_progress() 383 | self.extractor.start() 384 | 385 | def update_output(self, message): 386 | """更新输出信息""" 387 | self.output_text.append(message) 388 | 389 | def update_extract_progress(self, current, total): 390 | """更新字幕提取进度""" 391 | self.progress_extract.setRange(0, total) 392 | self.progress_extract.setValue(current) 393 | 394 | def update_ocr_progress(self, current, total): 395 | """更新OCR识别进度""" 396 | 
self.progress_ocr.setRange(0, total) 397 | self.progress_ocr.setValue(current) 398 | 399 | def update_combine_progress(self, current, total): 400 | """更新字幕合并进度""" 401 | self.progress_combine.setRange(0, total) 402 | self.progress_combine.setValue(current) 403 | 404 | def reset_progress(self): 405 | """重置进度条""" 406 | self.progress_extract.setValue(0) 407 | self.progress_ocr.setValue(0) 408 | self.progress_combine.setValue(0) 409 | 410 | def add_files(self): 411 | """添加视频文件""" 412 | files, _ = QFileDialog.getOpenFileNames(self, '选择视频文件', '', '视频文件 (*.mp4 *.avi *.mkv);;所有文件 (*)') 413 | if files: 414 | self.selected_videos.clear() 415 | self.selected_videos.extend(files) 416 | self.update_file_list() 417 | 418 | def update_file_list(self): 419 | """更新文件列表显示""" 420 | self.file_list.clear() 421 | for file in self.selected_videos: 422 | self.file_list.addItem(file) 423 | 424 | def remove_selected_files(self): 425 | """移除选中的文件""" 426 | selected_items = self.file_list.selectedItems() 427 | for item in selected_items: 428 | self.selected_videos.remove(item.text()) 429 | self.update_file_list() 430 | 431 | def open_url(self, url): 432 | """在浏览器中打开链接""" 433 | QDesktopServices.openUrl(QUrl(url)) 434 | 435 | def create_author_link(self, text="Made By", desc='派森斗罗【使用教程也在此处】', 436 | url='https://space.bilibili.com/1909782963'): 437 | """创建作者信息和链接""" 438 | layout = QHBoxLayout() 439 | label = QLabel(text) 440 | link_btn = QPushButton(desc, self) 441 | font = link_btn.font() 442 | font.setUnderline(True) 443 | link_btn.setFont(font) 444 | link_btn.setStyleSheet("border:0;") 445 | link_btn.clicked.connect(lambda: self.open_url(url)) 446 | layout.addStretch(1) 447 | layout.addWidget(label) 448 | layout.addWidget(link_btn) 449 | layout.addStretch(1) 450 | return layout 451 | 452 | 453 | if __name__ == '__main__': 454 | app = QApplication(sys.argv) 455 | app.setWindowIcon(QIcon('logo.png')) 456 | apply_stylesheet(app, theme='light_blue.xml') 457 | window = SubtitleApp() 458 | 
window.show() 459 | sys.exit(app.exec()) -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | PySide6==6.8.2.1 2 | opencv-python==4.7.0.72 3 | qt_material==2.14 4 | numpy==1.25.1 5 | python-dotenv==1.0.1 6 | requests --------------------------------------------------------------------------------
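The merge step in `main.py` (`generate_srt_and_txt_files` together with `format_srt_time`) turns per-frame OCR text into timed SRT cues. The sketch below illustrates that idea in isolation; it is not the repository's code. The names `to_srt_time` and `group_cues` are hypothetical, and two details are assumptions that differ from `main.py`: each cue ends at its last sampled frame rather than one frame before the next cue, and timestamps are computed in integer milliseconds to sidestep float truncation.

```python
from itertools import groupby


def to_srt_time(seconds: float) -> str:
    """Format a number of seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)  # work in integer milliseconds
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, millis = divmod(rem, 1000)
    return f"{hours:02}:{minutes:02}:{secs:02},{millis:03}"


def group_cues(samples, frame_rate):
    """Collapse (frame_number, text) samples into (start_s, end_s, text) cues.

    Samples with empty text are dropped first; after that, consecutive
    samples that carry the same text are merged into a single cue.
    """
    filled = [(number, text) for number, text in samples if text]
    cues = []
    for text, run in groupby(filled, key=lambda sample: sample[1]):
        frames = [number for number, _ in run]
        cues.append((frames[0] / frame_rate, frames[-1] / frame_rate, text))
    return cues


if __name__ == "__main__":
    # Hypothetical OCR output for a 25 fps video sampled once per second.
    samples = [(0, "hello"), (25, "hello"), (50, ""), (75, "world")]
    cues = group_cues(samples, frame_rate=25)
    for index, (start, end, text) in enumerate(cues, start=1):
        print(f"{index}\n{to_srt_time(start)} --> {to_srt_time(end)}\n{text}\n")
```

Filtering out empty results before grouping mirrors how `main.py` skips frames without text, so identical subtitles separated by a blank frame still merge into one cue.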