├── .gitignore
├── LICENSE
├── README.md
├── funs
    ├── 01.视频抽帧.py
    ├── 02.裁剪图像.py
    ├── 03.识别字幕.py
    └── 04.提取中间字幕.py
├── logo.png
├── main.py
└── requirements.txt


/.gitignore:
--------------------------------------------------------------------------------
 1 | .idea
 2 | config.txt
 3 | test
 4 | tmp_imgs
 5 | test
 6 | __pycache__
 7 | build
 8 | dist
 9 | *.spec
10 | output


--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
 1 | MIT License
 2 | 
 3 | Copyright (c) 2023 CodeFly
 4 | 
 5 | Permission is hereby granted, free of charge, to any person obtaining a copy
 6 | of this software and associated documentation files (the "Software"), to deal
 7 | in the Software without restriction, including without limitation the rights
 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | # SubExtractor-OCR
 2 | 
 3 | > Subtitle Extractor (OCR edition) 1.0
 4 | 
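 5 | ## Introduction
 6 | 
 7 | SubExtractor-OCR is a video subtitle extractor based on OCR (Optical Character Recognition). It is built with PySide6 and helps users extract the on-screen subtitle text from a video in order to obtain the video's script.
 8 | 
 9 | This is a home-made, open-source tool created to take the pain out of transcribing video scripts. There is no need to pause the video and copy the text by hand anymore; the subtitle extractor does it for you.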
10 | 
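11 | ## Highlights
12 | 
13 | - **OCR extraction**: uses optical character recognition to extract subtitle text from video frames.
14 | - **User-friendly interface**: built with PySide6, with an intuitive, easy-to-use UI.
15 | - **Custom height range**: set the vertical range in which subtitles appear, so the tool adapts to different videos.
16 | 
17 | ## Installation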
18 | 
19 | ### Download the package
20 | 
21 | The package is the rar file in the project; extract it and it is ready to use.
22 | 
23 | Download link: [https://github.com/w-x-x-w/SubExtractor-OCR/releases](https://github.com/w-x-x-w/SubExtractor-OCR/releases)
24 | 
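25 | ## Notes
26 | 
27 | Now for some bad news. Because this is version 1.0 of the subtitle extractor, some features are not supported yet, such as batch video processing and one-click automatic processing. Improvements are already planned, so look forward to a more polished future version.
28 | 
29 | ### Why is there no batch processing?
30 | 
31 | Batch processing involves more complex scenarios: videos with different dimensions can make a single subtitle height range wrong for some of them.
32 | 
33 | I am considering adding a way to set the subtitle height range for each video. Feel free to discuss it in the comments.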
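One possible way to approach the per-video height range is to store the subtitle band as fractions of the frame height instead of absolute pixels, so the same setting maps to different rows for each video. The sketch below is only an illustration: `subtitle_band` and its parameters are hypothetical names that do not exist in this repository, and the default of 0.8 simply mirrors the bottom-20% crop used in `main.py`.

```python
def subtitle_band(frame_height, top_frac=0.8, bottom_frac=1.0):
    """Convert a relative subtitle band into pixel rows for one video.

    Hypothetical helper: the fractions are measured from the top of the frame.
    """
    if not 0.0 <= top_frac < bottom_frac <= 1.0:
        raise ValueError("expected 0 <= top_frac < bottom_frac <= 1")
    top = int(frame_height * top_frac)
    bottom = int(frame_height * bottom_frac)
    return top, bottom


# The same relative band gives a different pixel range per resolution.
for height in (720, 1080, 2160):
    print(height, subtitle_band(height))
```

With a relative band, a batch of videos with mixed resolutions can share one setting, although videos whose subtitles sit at a different relative position would still need their own values.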
34 | 
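35 | ### Developer's notes
36 | 
37 | The list of possible features is endless, and as a perfectionist I keep looking for a better design so that I can offer the best version I can. The current version is not perfect, but every step and function is implemented; only the user workflow still needs tidying up. Readers with some coding background are welcome to try it and send suggestions.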
38 | 
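39 | ### Usage
40 | 
41 | 1. Select a video file.
42 | 2. Click the "视频抽帧" (Extract Frames) button and wait for frame extraction to finish.
43 | 3. Measure the subtitle height range.
44 | 4. Click the remaining buttons to run the later steps.
45 | 5. Copy the text and enjoy your extracted subtitles!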
46 | 
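47 | ### Demo
48 | 
49 | In the running application, select a video file and click the "视频抽帧" (Extract Frames) button to start. The later steps run quickly, and the final result appears in the extracted text.
50 | Version 1.0:
51 | - Fixed recognition of subtitles that wrap onto a second line
52 | 
53 | ### Planned improvements
54 | 
55 | - Automated batch processing
56 | - Running tasks in a separate worker thread instead of a single thread
57 | - Presets, so the height range does not have to be entered every time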
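The preset idea in the list above could be sketched as a small JSON file that remembers the last height range. This is only one way to do it: `save_preset`, `load_preset`, and the file name `preset.json` are hypothetical and are not part of this repository.

```python
import json
import os
import tempfile


def save_preset(path, height_min, height_max):
    """Store the subtitle height range so it can be reused next time."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"height_min": height_min, "height_max": height_max}, f)


def load_preset(path, default=(0, 0)):
    """Return the stored (height_min, height_max), or default if unavailable."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            data = json.load(f)
        return int(data["height_min"]), int(data["height_max"])
    except (OSError, ValueError, KeyError, TypeError):
        # Missing file, invalid JSON, or unexpected structure: use the default.
        return default


# Round trip in a temporary folder so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    preset_path = os.path.join(tmp, "preset.json")
    save_preset(preset_path, 1800, 1950)
    print(load_preset(preset_path))
```

Falling back to a default keeps the first run (when no preset exists yet) on the same code path as later runs.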
58 | 
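59 | ### Final words
60 | 
61 | Thank you for using Subtitle Extractor (OCR edition) 1.0! We will keep improving and updating it to give users a better experience. If you have any questions or suggestions, please share them in the comments. We hope you like this small tool and that it makes extracting video scripts easier and more pleasant!
62 | 
63 | Subtitles usually do not wrap onto a second line, do they?
64 | 
65 | So just crop a single line and do not overthink it.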
66 | 
67 | ## Packaging commands
68 | 
69 | `pyinstaller -w 视频字幕提取-OCR版-1.0.py -i ./logo.png`
70 | 
71 | `pyinstaller -w 视频字幕提取-OCR版-2.0.py -i ./logo.png`
72 | 
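73 | ## Contributing
74 | 
75 | If you find a problem or have a suggestion for improvement, please open an issue or create a pull request. Contributions are welcome!
76 | 
77 | ## License
78 | 
79 | SubExtractor-OCR is released under the [MIT License](LICENSE).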
80 | 


--------------------------------------------------------------------------------
/funs/01.视频抽帧.py:
--------------------------------------------------------------------------------
 1 | # -*- coding: utf-8 -*-
 2 | # @Time    : 2023/12/6 23:17
 3 | # @QQ  : 2942581284
 4 | # @File    : 01.视频抽帧.py
 5 | import cv2
 6 | import os
 7 | 
 8 | def get_video_frames(video_path=r"D:\program\剪映\导出\如何给我们的PyQt6程序制作一个炫酷的充值按钮.mp4",
 9 |                      tmp_imgs_path='tmp_imgs'):
10 |     tmp_folder_path, file_extension = os.path.splitext(os.path.basename(video_path))
11 |     folder_path = os.path.join(tmp_imgs_path, tmp_folder_path)
12 |     os.makedirs(folder_path, exist_ok=True)
13 |     # Open the video file
14 |     cap = cv2.VideoCapture(video_path)
15 |     # Check that the video opened successfully
16 |     if not cap.isOpened():
17 |         print("无法打开视频文件")
18 |     else:
19 |         # Read the total number of frames
20 |         total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
21 |         print(f"视频总帧数: {total_frames}")
22 |         frame_rate = int(cap.get(cv2.CAP_PROP_FPS))
23 |         print(f"视频帧率: {frame_rate} fps")
24 |         # Sample one frame per second; clamp to 1 so a reported FPS of 0 cannot cause a modulo by zero
25 |         frame_interval = max(1, frame_rate // 1)
26 |         # Loop over every frame and save the sampled ones as images
27 |         frame_count = 0
28 |         while True:
29 |             ret, frame = cap.read()
30 |             if not ret:
31 |                 break
32 |             if frame_count % frame_interval == 0:
33 |                 image_filename = f"frame_{frame_count:06d}.png"
34 |                 image_save_path = os.path.join(folder_path, image_filename)
35 |                 cv2.imencode(".png", frame)[1].tofile(image_save_path)
36 |                 print(f"保存图像:{image_filename},视频抽帧进度:{frame_count//frame_interval}/{total_frames//frame_interval}:{frame_count}/{total_frames}")
37 |             frame_count += 1
38 |     cap.release()
39 | 
40 | if __name__ == "__main__":
41 |     get_video_frames()
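The frame-sampling rule used by this script (keep a frame whenever its index is a multiple of the interval) can be isolated from OpenCV and checked on its own. The sketch below is an illustration only: `sampled_frame_indices` is a hypothetical helper that is not part of the repository, and it assumes the interval is the integer FPS divided by the requested samples per second, clamped to at least 1.

```python
def sampled_frame_indices(total_frames, fps, samples_per_second=1):
    """Return the frame indices that would be saved.

    The interval is clamped to 1 so that an FPS of 0, or more samples
    per second than the FPS, cannot produce a zero interval.
    """
    interval = max(1, int(fps) // max(1, samples_per_second))
    return list(range(0, total_frames, interval))


print(sampled_frame_indices(100, 25))      # one sample per second at 25 fps
print(sampled_frame_indices(90, 30, 3))    # three samples per second at 30 fps
print(len(sampled_frame_indices(10, 0)))   # FPS unknown: every frame is kept
```

Keeping the index arithmetic in a pure function makes the edge cases easy to test without opening a video file.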


--------------------------------------------------------------------------------
/funs/02.裁剪图像.py:
--------------------------------------------------------------------------------
 1 | # -*- coding: utf-8 -*-
 2 | # @Time    : 2023/12/6 23:17
 3 | # @QQ  : 2942581284
 4 | # @File    : 02.裁剪图像.py
 5 | import os
 6 | import cv2
 7 | import numpy as np
 8 | 
 9 | 
10 | def cv_imread(file_path):
11 |     cv_img = cv2.imdecode(np.fromfile(file_path, dtype=np.uint8), -1)
12 |     return cv_img
13 | 
14 | 
15 | # imgurl='测试.jpg'
16 | # img1 = cv_imread(imgurl)
17 | # cv2.imencode('.jpg', img1 )[1].tofile(imgurl)
18 | 
19 | def cut_zimu_from_img(
20 |         imgs_path=r'D:\文件夹\github_\myshare_github\Python-Project-Pro\视频文案提取-OCR字幕识别\tmp_imgs\【PyQtPySide界面美化】qt-material极简上手！'):
21 |     zimu_folder_name = os.path.basename(imgs_path) + '-字幕'
22 |     zimu_path = os.path.join(os.path.dirname(imgs_path), zimu_folder_name)
23 |     os.makedirs(zimu_path, exist_ok=True)
24 |     # List every file in the folder; files that cannot be decoded are reported and skipped below
25 |     image_files = os.listdir(imgs_path)
26 |     # Crop region as (x, y, width, height); these pixel values are hard-coded for the sample video
27 |     height_min = 1800
28 |     height_max = 1950
29 |     width = 3800
30 |     crop_area = (0, height_min, width, height_max - height_min)
31 |     # Process each image file in turn
32 |     for image_file in image_files:
33 |         # Build the full path of the image file
34 |         image_path = os.path.normpath(os.path.join(imgs_path, image_file))
35 | 
36 |         print(image_path)
37 |         # Read the image (cv_imread copes with non-ASCII paths)
38 |         image = cv_imread(image_path)
39 |         # image = cv2.imread(image_path)
40 |         if image is not None:
41 |             # Crop the image
42 |             x, y, w, h = crop_area
43 |             cropped_image = image[y:y + h, x:x + w]
44 |             # Save the cropped image
45 |             output_file = os.path.join(zimu_path, f"cropped_{image_file}")
46 |             cv2.imencode(".png", cropped_image)[1].tofile(output_file)
47 |             print(f"已保存裁剪后的图片：{output_file}")
48 |         else:
49 |             print(f"无法读取图片：{image_path}")
50 | 
51 | if __name__ == "__main__":
52 |     cut_zimu_from_img()
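Because the crop rectangle in this script is fixed in pixels, a smaller video can place the whole rectangle outside the frame, and array slicing past the image bounds yields an empty crop rather than an error. A minimal sketch of clamping the rectangle first is shown below; `clamp_crop` is a hypothetical helper, not a function from this repository, and it works on plain integers so it does not depend on OpenCV or NumPy.

```python
def clamp_crop(img_h, img_w, x, y, w, h):
    """Clamp an (x, y, w, h) crop to the image and return (x0, y0, x1, y1).

    The result can be used as image[y0:y1, x0:x1]; if x0 == x1 or
    y0 == y1 the crop is empty and the frame should be skipped.
    """
    x0 = min(max(x, 0), img_w)
    y0 = min(max(y, 0), img_h)
    x1 = min(max(x + w, x0), img_w)
    y1 = min(max(y + h, y0), img_h)
    return x0, y0, x1, y1


def is_empty(box):
    """True when the clamped box covers no pixels."""
    x0, y0, x1, y1 = box
    return x1 <= x0 or y1 <= y0


# The hard-coded region fits a 3840x2160 frame but not a 1920x1080 one.
for height, width in ((2160, 3840), (1080, 1920)):
    box = clamp_crop(height, width, 0, 1800, 3800, 150)
    print((width, height), box, "empty" if is_empty(box) else "ok")
```

Checking for an empty box before encoding gives a clear message about the mismatched height range instead of a failure later in the pipeline.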


--------------------------------------------------------------------------------
/funs/03.识别字幕.py:
--------------------------------------------------------------------------------
 1 | import os
 2 | import json
 3 | import time
 4 | from wechat_ocr.ocr_manager import OcrManager, OCR_MAX_TASK_ID
 5 | 
 6 | wechat_ocr_dir = r"C:\Users\86159\AppData\Roaming\Tencent\WeChat\XPlugin\Plugins\WeChatOCR\7061\extracted\WeChatOCR.exe"
 7 | # wechat_ocr_dir = "C:\\Users\\Administrator\\AppData\\Roaming\\Tencent\\WeChat\\XPlugin\\Plugins\\WeChatOCR\\7057\\extracted\\WeChatOCR.exe"
 8 | wechat_dir = r"D:\微信\WeChat\[3.9.8.15]"
 9 | # wechat_dir = "D:\\GreenSoftware\\WeChat\\3.9.6.32"
10 | 
11 | def ocr_result_callback(img_path: str, results: dict):
12 |     zimu_imgs_path=os.path.dirname(img_path).replace('-字幕','-识别结果')
13 |     zimu_imgs_name=os.path.basename(img_path)
14 |     result_file = os.path.join(zimu_imgs_path,zimu_imgs_name)+".json"
15 |     # result_file = os.path.basename(img_path) + ".json"
16 |     print(f"识别成功，img_path: {img_path}, result_file: {result_file}")
17 |     with open(result_file, 'w', encoding='utf-8') as f:
18 |         f.write(json.dumps(results, ensure_ascii=False, indent=2))
19 | 
20 | def ocr_zimu(file_name=r'D:\文件夹\github_\myshare_github\Python-Project-Pro\视频文案提取-OCR字幕识别\tmp_imgs\如何给我们的PyQt6程序制作一个炫酷的充值按钮'):
21 |     zimu_path=file_name+'-字幕'
22 |     output_path=file_name+'-识别结果'
23 |     if not os.path.exists(output_path):
24 |         os.makedirs(output_path)
25 |     ocr_manager = OcrManager(wechat_dir)
26 |     # Set the path to WeChatOCR.exe
27 |     ocr_manager.SetExePath(wechat_ocr_dir)
28 |     # Set the WeChat installation directory
29 |     ocr_manager.SetUsrLibDir(wechat_dir)
30 |     # Register the callback that receives OCR results
31 |     ocr_manager.SetOcrResultCallback(ocr_result_callback)
32 |     # Start the OCR service
33 |     ocr_manager.StartWeChatOCR()
34 |     # Submit every subtitle image for recognition
35 |     image_files = os.listdir(zimu_path)
36 |     for image_name in image_files:
37 |         image_path = os.path.join(zimu_path, image_name)
38 |         ocr_manager.DoOCRTask(image_path)
39 |     # Wait for outstanding tasks; sleep briefly on each check instead of spinning at full CPU
40 |     while ocr_manager.m_task_id.qsize() != OCR_MAX_TASK_ID:
41 |         time.sleep(0.1)
42 |     # All results have been delivered; stop the OCR service
43 |     ocr_manager.KillWeChatOCR()
44 | 
45 | 
46 | if __name__ == "__main__":
47 |     ocr_zimu()
48 | 
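The wait loop in this script polls a queue size until all OCR tasks have finished. A generic version of that pattern, with a sleep between checks and a timeout so a stalled OCR process cannot hang the script forever, might look like the sketch below. `wait_until` and `MAX_IDS` are hypothetical names; the example uses a plain `queue.Queue` as a stand-in and does not call the `wechat_ocr` package.

```python
import queue
import threading
import time


def wait_until(predicate, timeout=60.0, interval=0.05):
    """Poll predicate() until it is true; return False if the timeout expires."""
    deadline = time.monotonic() + timeout
    while not predicate():
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)  # yield the CPU between checks
    return True


MAX_IDS = 4  # hypothetical stand-in for OCR_MAX_TASK_ID
ids = queue.Queue()


def worker():
    # Simulate tasks finishing and handing their IDs back to the pool.
    for task_id in range(MAX_IDS):
        time.sleep(0.01)
        ids.put(task_id)


thread = threading.Thread(target=worker)
thread.start()
finished = wait_until(lambda: ids.qsize() == MAX_IDS, timeout=5.0)
thread.join()
print("all tasks finished:", finished)
```

Returning a boolean lets the caller decide whether to shut the OCR service down normally or to report that some images were never processed.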


--------------------------------------------------------------------------------
/funs/04.提取中间字幕.py:
--------------------------------------------------------------------------------
 1 | # -*- coding: utf-8 -*-
 2 | # @Time    : 2023/12/6 20:25
 3 | # @QQ  : 2942581284
 4 | # @File    : 提取中间字幕.py
 5 | import os
 6 | import json
 7 | 
 8 | def extract_zimu_from_file(file):
 9 |     with open(file, 'r', encoding='utf-8') as f:
10 |         data = json.load(f)
11 |     zimu_width = 3800  # frame width in pixels, hard-coded for the sample video
12 |     if data:
13 |         # Keep only the widest text box, and only if it is horizontally centred
14 |         ocrResult = data['ocrResult']
15 |         if ocrResult:
16 |             ocrResult.sort(key=lambda x: -1 * (x['location']['right'] - x['location']['left']))
17 |             # left + right is twice the box midpoint, so it is compared against the full width
18 |             center = ocrResult[0]['location']['left'] + ocrResult[0]['location']['right']
19 |             if abs(center - zimu_width) < 100:
20 |                 print(ocrResult[0]['text'])
21 |                 return ocrResult[0]['text']
22 |     return False
23 | 
24 | def connect_center_result(file_path=r'D:\文件夹\github_\myshare_github\Python-Project-Pro\视频文案提取-OCR字幕识别\tmp_imgs\如何给我们的PyQt6程序制作一个炫酷的充值按钮'):
25 |     res_list = []
26 |     zimu_path = file_path + '-识别结果'
27 |     zimu_files = os.listdir(zimu_path)
28 |     zimu_files.sort(key=lambda x: int(x[14:20]))  # frame number follows the 14-character "cropped_frame_" prefix
29 |     for file in zimu_files:
30 |         zimu_file_path = os.path.join(zimu_path, file)
31 |         result = extract_zimu_from_file(zimu_file_path)
32 |         if result:
33 |             res_list.append(result)
34 |     res_list = list(dict.fromkeys(res_list))  # drop repeated lines, keeping first-seen order
35 |     print('，'.join(res_list))
36 | 
37 | if __name__ == "__main__":
38 |     connect_center_result()
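The selection rule in this script (take the widest text box and accept it only when it is horizontally centred) can be written with an explicit midpoint, which may be easier to read than comparing `left + right` with the full width. The sketch below assumes the same `location`/`left`/`right`/`text` fields that the script reads; `pick_centered_text` and `dedupe_consecutive` are hypothetical helper names. A tolerance of 50 pixels on the midpoint corresponds to the script's threshold of 100 on the sum.

```python
from itertools import groupby


def pick_centered_text(boxes, frame_width, tolerance=50):
    """Return the text of the widest box if it is centred, else None."""
    if not boxes:
        return None
    widest = max(boxes, key=lambda b: b["location"]["right"] - b["location"]["left"])
    midpoint = (widest["location"]["left"] + widest["location"]["right"]) / 2
    if abs(midpoint - frame_width / 2) < tolerance:
        return widest["text"]
    return None


def dedupe_consecutive(lines):
    """Collapse runs of identical lines but keep later repeats."""
    return [text for text, _ in groupby(lines)]


boxes = [
    {"text": "logo", "location": {"left": 10, "right": 200}},
    {"text": "hello", "location": {"left": 1000, "right": 2800}},
]
print(pick_centered_text(boxes, frame_width=3800))

lines = ["a", "a", "b", "a"]
print(dedupe_consecutive(lines))       # keeps the second "a"
print(list(dict.fromkeys(lines)))      # drops every later repeat
```

`dict.fromkeys`, as used in the script, removes a line everywhere after its first appearance, so a sentence that is genuinely spoken twice is lost; collapsing only consecutive runs is an alternative that keeps it.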


--------------------------------------------------------------------------------
/logo.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cpython666/SubExtractor-OCR/88ef66f6eb1a94d0057adc28fd8a4973f6fd411d/logo.png


--------------------------------------------------------------------------------
/main.py:
--------------------------------------------------------------------------------
  1 | import sys
  2 | import os
  3 | import json
  4 | import base64
  5 | import requests
  6 | import cv2
  7 | import math
  8 | from PySide6.QtWidgets import (QApplication, QMainWindow, QWidget, QVBoxLayout,
  9 |                                QPushButton, QLabel, QFileDialog, QListWidget, QHBoxLayout,
 10 |                                QLineEdit, QTextEdit, QProgressBar, QMessageBox)
 11 | from PySide6.QtGui import QIcon, QGuiApplication, QDesktopServices
 12 | from PySide6.QtCore import QUrl, Qt, QThread, Signal
 13 | from qt_material import apply_stylesheet
 14 | from dotenv import load_dotenv
 15 | 
 16 | load_dotenv()
 17 | UMI_OCR_URL = os.getenv("UMI_OCR_URL", 'http://127.0.0.1:1224/api/ocr')
 18 | 
 19 | 
 20 | class SubtitleExtractor(QThread):
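 21 |     """Worker thread that processes videos and writes SRT and TXT subtitle files."""
 22 |     progress_extract_signal = Signal(int, int)  # progress of subtitle-image extraction
 23 |     progress_ocr_signal = Signal(int, int)      # progress of OCR recognition
 24 |     progress_combine_signal = Signal(int, int)  # progress of subtitle merging
 25 |     finished_signal = Signal()                  # emitted when all videos are done
 26 |     message_signal = Signal(str)                # log message for the UI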
 27 | 
 28 |     def __init__(self, parent=None):
 29 |         super().__init__(parent)
 30 |         self.frames_per_second = 1  # sample 1 frame per second by default; configurable from the UI
 31 | 
 32 |     def set_video_paths(self, video_paths, frames_per_second=1):
 33 |         """Set the video paths and the number of frames to sample per second."""
 34 |         self.video_paths = video_paths
 35 |         self.frames_per_second = frames_per_second
 36 | 
 37 |     def run(self):
 38 |         """Thread entry point: process each video in turn."""
 39 |         for video_path in self.video_paths:
 40 |             self.video_path = video_path
 41 |             self.video_name = os.path.splitext(os.path.basename(video_path))[0]
 42 |             self.message_signal.emit(f"开始提取{self.video_name}的字幕！")
 43 |             self.setup_directories()
 44 |             self.extract_subtitle_frames()
 45 |             self.message_signal.emit("字幕区域裁剪完成，开始OCR识别~")
 46 |             self.perform_ocr()
 47 |             self.message_signal.emit("字幕识别完成，开始合并~")
 48 |             self.generate_srt_and_txt_files()
 49 |             self.message_signal.emit(f"字幕已输出至: {self.output_dir}/{self.video_name}.srt 和 {self.output_dir}/{self.video_name}.txt")
 50 |             self.message_signal.emit("请手动删除临时文件！")
 51 |         self.finished_signal.emit()
 52 | 
 53 |     def setup_directories(self):
 54 |         """Create the temporary and output directories."""
 55 |         self.subtitle_dir = f'tmp_imgs/{self.video_name}-字幕'
 56 |         self.ocr_result_dir = f'tmp_imgs/{self.video_name}-识别结果'
 57 |         self.output_dir = 'output'
 58 |         os.makedirs(self.subtitle_dir, exist_ok=True)
 59 |         os.makedirs(self.ocr_result_dir, exist_ok=True)
 60 |         os.makedirs(self.output_dir, exist_ok=True)
 61 | 
 62 |     def extract_subtitle_frames(self):
 63 |         """Crop the subtitle region from sampled frames and save each one as an image."""
 64 |         cap = cv2.VideoCapture(self.video_path)
 65 |         if not cap.isOpened():
 66 |             self.message_signal.emit('无法打开视频文件')
 67 |             return
 68 | 
 69 |         total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
 70 |         frame_rate = int(cap.get(cv2.CAP_PROP_FPS))
 71 |         duration = total_frames / frame_rate if frame_rate > 0 else 0
 72 |         self.message_signal.emit(f"视频总帧数: {total_frames}")
 73 |         self.message_signal.emit(f"视频帧率: {frame_rate} fps")
 74 |         self.message_signal.emit(f"视频时长: {duration:.2f} 秒")
 75 | 
 76 |         # Read the frame width and height
 77 |         width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
 78 |         height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
 79 |         self.message_signal.emit(f"视频宽高: {width}x{height}")
 80 | 
 81 |         # Compute the sampling interval; clamp to 1 so it is never 0 (FPS of 0, or more samples per second than FPS)
 82 |         frame_interval = max(1, frame_rate // self.frames_per_second)
 83 |         frame_count = 0
 84 |         extracted_count = 0
 85 |         total_extracted = total_frames // frame_interval
 86 | 
 87 |         # Number of digits used to zero-pad the frame number
 88 |         padding_length = math.ceil(math.log10(total_frames)) if total_frames > 0 else 6
 89 | 
 90 |         # Read one frame to determine the crop height
 91 |         ret, frame = cap.read()
 92 |         if ret:
 93 |             crop_height_min = int(frame.shape[0] * 0.8)
 94 |             self.message_signal.emit(f"截取字幕高度区间: {crop_height_min} - {frame.shape[0]}")
 95 |         cap.set(cv2.CAP_PROP_POS_FRAMES, 0)  # rewind to the first frame
 96 | 
 97 |         while True:
 98 |             ret, frame = cap.read()
 99 |             if not ret:
100 |                 break
101 |             if frame_count % frame_interval == 0:
102 |                 image_filename = f"frame_{frame_count:0{padding_length}d}.png"
103 |                 image_path = os.path.join(self.subtitle_dir, image_filename)
104 |                 cropped_frame = self.crop_subtitle_area(frame)
105 |                 cv2.imencode(".png", cropped_frame)[1].tofile(image_path)
106 |                 extracted_count += 1
107 |                 self.progress_extract_signal.emit(extracted_count, total_extracted)
108 |             frame_count += 1
109 |         cap.release()
110 | 
111 |     def crop_subtitle_area(self, frame):
112 |         """Crop the subtitle region of a frame (by default the bottom 20%)."""
113 |         height = frame.shape[0]
114 |         height_min = int(height * 0.8)  # subtitles usually sit near the bottom
115 |         return frame[height_min:, :]
116 | 
117 |     def perform_ocr(self):
118 |         """Run OCR on every subtitle image and save the results."""
119 |         image_files = [f for f in os.listdir(self.subtitle_dir) if f.endswith('.png')]
120 |         total = len(image_files)
121 |         api_url = UMI_OCR_URL
122 | 
123 |         for idx, image_name in enumerate(image_files):
124 |             image_path = os.path.join(self.subtitle_dir, image_name)
125 |             with open(image_path, 'rb') as img_file:
126 |                 img_base64 = base64.b64encode(img_file.read()).decode('utf-8')
127 |             data = {
128 |                 "base64": img_base64,
129 |                 "options": {"data.format": "dict"}
130 |                 # "options": {"data.format": "text"}
131 |             }
132 |             headers = {"Content-Type": "application/json"}
133 | 
134 |             data_str = json.dumps(data)
135 |             retries = 3
136 |             for attempt in range(retries):
137 |                 try:
138 |                     response = requests.post(api_url, data=data_str,headers=headers, timeout=10)
139 |                     if response.status_code == 200:
140 |                         result = response.json()
141 |                         print("result",result)
142 |                         text = self.parse_ocr_result(result)
143 |                         print("text",text)
144 |                         result_file = os.path.join(self.ocr_result_dir, f"{image_name}.json")
145 |                         with open(result_file, 'w', encoding='utf-8') as f:
146 |                             json.dump({'text': text}, f, ensure_ascii=False, indent=2)
147 |                         break
148 |                     elif response.status_code == 502:
149 |                         if attempt < retries - 1:
150 |                             self.message_signal.emit(f"API返回502，正在重试... (尝试 {attempt + 1}/{retries})")
151 |                         else:
152 |                             self.message_signal.emit(f"API调用失败: 502，重试 {retries} 次后仍失败")
153 |                     else:
154 |                         self.message_signal.emit(f"OCR API调用失败: {response.status_code}")
155 |                         break
156 |                 except Exception as e:
157 |                     if attempt < retries - 1:
158 |                         self.message_signal.emit(f"OCR处理出错: {str(e)}，正在重试... (尝试 {attempt + 1}/{retries})")
159 |                     else:
160 |                         self.message_signal.emit(f"OCR处理出错: {str(e)}，重试 {retries} 次后仍失败")
161 |             self.progress_ocr_signal.emit(idx + 1, total)
162 | 
163 |     def parse_ocr_result(self, result):
164 |         """Parse the UMI-OCR response and return the text of the box with the largest area."""
165 |         # A code of 100 means recognition succeeded
166 |         if result.get('code') != 100:
167 |             return ''
168 | 
169 |         # Get the data list; return an empty string if it is empty
170 |         data = result.get('data', [])
171 |         if not data:
172 |             return ''
173 | 
174 |         # Track the largest area seen so far and its text
175 |         max_area = 0
176 |         max_text = ''
177 | 
178 |         # Visit every text item in data
179 |         for item in data:
180 |             box = item.get('box', [])
181 |             # Make sure the box has 4 points (a rectangle)
182 |             if len(box) != 4:
183 |                 continue
184 | 
185 |             # Compute the rectangle area
186 |             # box format: [[x1, y1], [x2, y2], [x3, y3], [x4, y4]]
187 |             # width = x2 - x1, height = y3 - y1
188 |             width = box[1][0] - box[0][0]
189 |             height = box[2][1] - box[0][1]
190 |             area = width * height
191 | 
192 |             # Update the largest area and its text
193 |             if area > max_area:
194 |                 max_area = area
195 |                 max_text = item.get('text', '')
196 | 
197 |         return max_text
198 | 
199 |     def normalize_text(self, text):
200 |         """Normalize look-alike characters, e.g. map the em dash (U+2014) to '-'."""
201 |         normalization_map = {'—': '-'}
202 |         for src, dst in normalization_map.items():
203 |             text = text.replace(src, dst)
204 |         return text
205 | 
206 |     def generate_srt_and_txt_files(self):
207 |         """Merge the OCR results and write a timestamped SRT file and a TXT file."""
208 |         ocr_files = [f for f in os.listdir(self.ocr_result_dir) if f.endswith('.json')]
209 |         ocr_files.sort(key=lambda x: int(x.split('_')[1].split('.')[0]))
210 |         total = len(ocr_files)
211 | 
212 |         subtitles = []
213 |         current_text = None
214 |         start_frame = None
215 |         frame_rate = int(cv2.VideoCapture(self.video_path).get(cv2.CAP_PROP_FPS)) or 1  # fall back to 1 to avoid dividing by zero when FPS is unreported
216 | 
217 |         for idx, file in enumerate(ocr_files):
218 |             file_path = os.path.join(self.ocr_result_dir, file)
219 |             text = self.parse_ocr_json(file_path)
220 |             if text:
221 |                 normalized_text = self.normalize_text(text)
222 |                 frame_num = int(file.split('_')[1].split('.')[0])
223 |                 if current_text is None:
224 |                     current_text = normalized_text
225 |                     start_frame = frame_num
226 |                 elif normalized_text != current_text:
227 |                     end_frame = frame_num - 1
228 |                     subtitles.append({
229 |                         'start': start_frame / frame_rate,
230 |                         'end': end_frame / frame_rate,
231 |                         'text': current_text
232 |                     })
233 |                     current_text = normalized_text
234 |                     start_frame = frame_num
235 |             self.progress_combine_signal.emit(idx + 1, total)
236 | 
237 |         # Append the last subtitle
238 |         if current_text is not None:
239 |             end_frame = int(ocr_files[-1].split('_')[1].split('.')[0])
240 |             subtitles.append({
241 |                 'start': start_frame / frame_rate,
242 |                 'end': end_frame / frame_rate,
243 |                 'text': current_text
244 |             })
245 | 
246 |         # Build the SRT and TXT content
247 |         srt_content = ""
248 |         txt_content = ""
249 |         for i, sub in enumerate(subtitles, start=1):
250 |             start_time = self.format_srt_time(sub['start'])
251 |             end_time = self.format_srt_time(sub['end'])
252 |             srt_content += f"{i}\n{start_time} --> {end_time}\n{sub['text']}\n\n"
253 |             txt_content += f"{sub['text']}\n"
254 | 
255 |         # Write the SRT file
256 |         srt_path = os.path.join(self.output_dir, f"{self.video_name}.srt")
257 |         with open(srt_path, 'w', encoding='utf-8') as f:
258 |             f.write(srt_content)
259 |         self.message_signal.emit("SRT文件生成完成！")
260 | 
261 |         # Write the TXT file
262 |         txt_path = os.path.join(self.output_dir, f"{self.video_name}.txt")
263 |         # De-duplicate lines while keeping order; skip the empty string left by the trailing newline
264 |         res_list = list(dict.fromkeys(line for line in txt_content.split('\n') if line))
265 |         print('，'.join(res_list))
266 |         txt_content = '，'.join(res_list)
267 |         self.message_signal.emit('，'.join(res_list))
268 |         with open(txt_path, 'w', encoding='utf-8') as f:
269 |             f.write(txt_content)
270 |         self.message_signal.emit("TXT文件生成完成！")
271 | 
272 |     def parse_ocr_json(self, file_path):
273 |         """Read the OCR text stored in a JSON result file."""
274 |         with open(file_path, 'r', encoding='utf-8') as f:
275 |             data = json.load(f)
276 |         return data.get('text', '')
277 | 
278 |     def format_srt_time(self, seconds):
279 |         """Convert seconds to the SRT time format (for example 00:00:01,000)."""
280 |         total_ms = int(round(seconds * 1000))  # work in integer milliseconds to avoid float truncation
281 |         hours, rem = divmod(total_ms, 3_600_000)
282 |         minutes, rem = divmod(rem, 60_000)
283 |         secs, millis = divmod(rem, 1000)
284 |         return f"{hours:02}:{minutes:02}:{secs:02},{millis:03}"
285 | 
286 | 
287 | class SubtitleApp(QMainWindow):
288 |     """Main window that provides the user interface."""
289 |     def __init__(self):
290 |         super().__init__()
291 |         self.selected_videos = []
292 |         self.output_dir = os.path.join(os.path.dirname(__file__), "output")
293 |         self.init_ui()
294 |         self.center_window()
295 | 
296 |     def center_window(self):
297 |         """Center the window on the screen."""
298 |         qr = self.frameGeometry()
299 |         cp = QGuiApplication.primaryScreen().availableGeometry().center()
300 |         qr.moveCenter(cp)
301 |         self.move(qr.topLeft())
302 | 
303 |     def init_ui(self):
304 |         """Build the user interface."""
305 |         self.setWindowIcon(QIcon('logo.png'))
306 |         self.setGeometry(0, 0, 800, 600)
307 |         self.setWindowTitle('字幕提取器')
308 |         layout = QVBoxLayout()
309 | 
310 |         # File selection area
311 |         file_layout = QHBoxLayout()
312 |         self.file_label = QLabel('已选择文件:')
313 |         file_layout.addWidget(self.file_label)
314 |         self.file_list = QListWidget()
315 |         file_layout.addWidget(self.file_list)
316 |         file_btn_layout = QVBoxLayout()
317 |         add_file_btn = QPushButton('添加文件')
318 |         add_file_btn.clicked.connect(self.add_files)
319 |         file_btn_layout.addWidget(add_file_btn)
320 |         remove_file_btn = QPushButton('移除选中文件')
321 |         remove_file_btn.clicked.connect(self.remove_selected_files)
322 |         file_btn_layout.addWidget(remove_file_btn)
323 |         file_layout.addLayout(file_btn_layout)
324 |         layout.addLayout(file_layout)
325 | 
326 |         # Options
327 |         self.output_label = QLabel(f'字幕输出路径: {self.output_dir}/{{视频名称}}.srt 和 {{视频名称}}.txt')
328 |         layout.addWidget(self.output_label)
329 |         layout.addWidget(QLabel('每秒抽取帧数:'))
330 |         self.fps_input = QLineEdit("1")
331 |         layout.addWidget(self.fps_input)
332 | 
333 |         # Progress bars
334 |         self.progress_extract = QProgressBar(self)
335 |         layout.addWidget(QLabel('字幕图片提取进度：'))
336 |         layout.addWidget(self.progress_extract)
337 |         self.progress_ocr = QProgressBar(self)
338 |         layout.addWidget(QLabel('字幕OCR识别进度：'))
339 |         layout.addWidget(self.progress_ocr)
340 |         self.progress_combine = QProgressBar(self)
341 |         layout.addWidget(QLabel('字幕合并进度：'))
342 |         layout.addWidget(self.progress_combine)
343 | 
344 |         # Log output
345 |         self.output_text = QTextEdit()
346 |         self.output_text.setReadOnly(True)
347 |         layout.addWidget(self.output_text)
348 | 
349 |         # Start button
350 |         start_btn = QPushButton('开始处理')
351 |         start_btn.clicked.connect(self.start_processing)
352 |         layout.addWidget(start_btn)
353 | 
354 |         # Author info
355 |         layout.addLayout(self.create_author_link())
356 | 
357 |         # Set the main window layout
358 |         main_widget = QWidget()
359 |         main_widget.setLayout(layout)
360 |         self.setCentralWidget(main_widget)
361 | 
362 |     def start_processing(self):
363 |         """Start the subtitle extraction worker."""
364 |         if not self.selected_videos:
365 |             QMessageBox.information(self, '提示', '请先选择视频文件', QMessageBox.StandardButton.Ok)
366 |             return
367 |         try:
368 |             fps = int(self.fps_input.text())
369 |             if fps <= 0:
370 |                 raise ValueError
371 |         except ValueError:
372 |             QMessageBox.information(self, '提示', '每秒抽取帧数必须是正整数', QMessageBox.StandardButton.Ok)
373 |             return
374 | 
375 |         self.extractor = SubtitleExtractor(self)
376 |         self.extractor.set_video_paths(self.selected_videos, fps)
377 |         self.extractor.progress_extract_signal.connect(self.update_extract_progress)
378 |         self.extractor.progress_ocr_signal.connect(self.update_ocr_progress)
379 |         self.extractor.progress_combine_signal.connect(self.update_combine_progress)
380 |         self.extractor.message_signal.connect(self.update_output)
381 |         self.extractor.finished_signal.connect(self.reset_progress)
382 |         self.reset_progress()
383 |         self.extractor.start()
384 | 
385 |     def update_output(self, message):
386 |         """Append a message to the log output."""
387 |         self.output_text.append(message)
388 | 
389 |     def update_extract_progress(self, current, total):
390 |         """Update the subtitle-image extraction progress."""
391 |         self.progress_extract.setRange(0, total)
392 |         self.progress_extract.setValue(current)
393 | 
394 |     def update_ocr_progress(self, current, total):
395 |         """Update the OCR recognition progress."""
396 |         self.progress_ocr.setRange(0, total)
397 |         self.progress_ocr.setValue(current)
398 | 
399 |     def update_combine_progress(self, current, total):
400 |         """Update the subtitle merging progress."""
401 |         self.progress_combine.setRange(0, total)
402 |         self.progress_combine.setValue(current)
403 | 
404 |     def reset_progress(self):
405 |         """Reset the progress bars."""
406 |         self.progress_extract.setValue(0)
407 |         self.progress_ocr.setValue(0)
408 |         self.progress_combine.setValue(0)
409 | 
410 |     def add_files(self):
411 |         """Add video files."""
412 |         files, _ = QFileDialog.getOpenFileNames(self, '选择视频文件', '', '视频文件 (*.mp4 *.avi *.mkv);;所有文件 (*)')
413 |         if files:
414 |             self.selected_videos.clear()
415 |             self.selected_videos.extend(files)
416 |             self.update_file_list()
417 | 
418 |     def update_file_list(self):
419 |         """Refresh the file list display."""
420 |         self.file_list.clear()
421 |         for file in self.selected_videos:
422 |             self.file_list.addItem(file)
423 | 
424 |     def remove_selected_files(self):
425 |         """Remove the selected files."""
426 |         selected_items = self.file_list.selectedItems()
427 |         for item in selected_items:
428 |             self.selected_videos.remove(item.text())
429 |         self.update_file_list()
430 | 
431 |     def open_url(self, url):
432 |         """Open a link in the browser."""
433 |         QDesktopServices.openUrl(QUrl(url))
434 | 
435 |     def create_author_link(self, text="Made By", desc='派森斗罗【使用教程也在此处】',
436 |                            url='https://space.bilibili.com/1909782963'):
437 |         """Create the author label and link button."""
438 |         layout = QHBoxLayout()
439 |         label = QLabel(text)
440 |         link_btn = QPushButton(desc, self)
441 |         font = link_btn.font()
442 |         font.setUnderline(True)
443 |         link_btn.setFont(font)
444 |         link_btn.setStyleSheet("border:0;")
445 |         link_btn.clicked.connect(lambda: self.open_url(url))
446 |         layout.addStretch(1)
447 |         layout.addWidget(label)
448 |         layout.addWidget(link_btn)
449 |         layout.addStretch(1)
450 |         return layout
451 | 
452 | 
453 | if __name__ == '__main__':
454 |     app = QApplication(sys.argv)
455 |     app.setWindowIcon(QIcon('logo.png'))
456 |     apply_stylesheet(app, theme='light_blue.xml')
457 |     window = SubtitleApp()
458 |     window.show()
459 |     sys.exit(app.exec())
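The merging step in `generate_srt_and_txt_files` turns per-frame OCR text into timed subtitle cues. The standalone sketch below shows one possible variant of that idea, not the method used above: `build_cues` and `to_srt` are hypothetical names, a cue is closed as soon as a sampled frame has no text (the code above instead carries the cue across empty frames), and each cue ends at the last frame where its text was seen.

```python
def format_srt_time(seconds):
    """Format seconds as HH:MM:SS,mmm using integer milliseconds."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, millis = divmod(rem, 1000)
    return f"{hours:02}:{minutes:02}:{secs:02},{millis:03}"


def build_cues(samples, fps):
    """Group (frame_number, text) samples into (start_s, end_s, text) cues.

    samples must be sorted by frame number; an empty text means that no
    subtitle was recognised in that frame.
    """
    cues = []
    current = None  # [start_frame, last_frame, text] of the open cue
    for frame, text in samples:
        if current and text == current[2]:
            current[1] = frame  # same subtitle is still on screen
            continue
        if current:
            cues.append(tuple(current))
        current = [frame, frame, text] if text else None
    if current:
        cues.append(tuple(current))
    return [(start / fps, end / fps, text) for start, end, text in cues]


def to_srt(cues):
    """Render cues as SRT text with 1-based sequence numbers."""
    blocks = []
    for number, (start, end, text) in enumerate(cues, start=1):
        blocks.append(f"{number}\n{format_srt_time(start)} --> {format_srt_time(end)}\n{text}\n")
    return "\n".join(blocks)


samples = [(0, "a"), (25, "a"), (50, ""), (75, "b")]
print(to_srt(build_cues(samples, fps=25)))
```

Working in integer milliseconds avoids the truncation that can occur when the fractional part of a float is multiplied by 1000 and passed to `int`.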


--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | PySide6==6.8.2.1
2 | opencv-python==4.7.0.72
3 | qt_material==2.14
4 | numpy==1.25.1
5 | python-dotenv==1.0.1
6 | requests


--------------------------------------------------------------------------------