├── LICENSE
├── README-zh.md
├── README.md
├── requirements.txt
├── settings.cfg.example
└── srt_translation.py


/LICENSE:
--------------------------------------------------------------------------------
 1 | MIT License
 2 | 
 3 | Copyright (c) 2023 jesselau76
 4 | 
 5 | Permission is hereby granted, free of charge, to any person obtaining a copy
 6 | of this software and associated documentation files (the "Software"), to deal
 7 | in the Software without restriction, including without limitation the rights
 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 


--------------------------------------------------------------------------------
/README-zh.md:
--------------------------------------------------------------------------------
 1 | # srt-GPT-translator
 2 | [En](https://github.com/jesselau76/srt-gpt-translator/blob/main/README.md) | [中文说明](https://github.com/jesselau76/srt-gpt-translator/blob/main/README-zh.md)
 3 | 
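 4 | This tool helps users translate SRT files into another language using the OpenAI API (model="gpt-3.5-turbo"). It supports bilingual subtitle output.
 5 | 
 6 | ## Features
 7 | - Subtitle blocks are translated in batches of at most 1024 characters so that context carries across neighboring lines.
 8 | - The result returned by the OpenAI API is checked against the original. If the format does not match, the batch is translated again; after three failed attempts, the original text of that batch is returned.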
 9 | 
10 | 
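11 | ## Installation
12 | 
13 | To use this tool, you need Python 3 installed on your system, along with the packages listed in `requirements.txt` (`openai`, `tqdm`, `chardet`).
14 | 
15 | You can install these packages by running the following command:
16 | 
17 | `pip install -r requirements.txt` 
18 | 
19 | Clone the repository:
20 | 
21 | `git clone https://github.com/jesselau76/srt-gpt-translator.git` 
22 | 
23 | Update to the latest version: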
24 | 
25 | ```
26 | cd srt-gpt-translator
27 | git pull
28 | pip install -r requirements.txt
29 | ```
30 | 
31 | ## Usage
32 | 
33 | To use this tool, first rename `settings.cfg.example` to `settings.cfg`.
34 | 
35 | ```
36 | cd srt-gpt-translator
37 | mv settings.cfg.example settings.cfg
38 | nano settings.cfg
39 | ```
40 | 
41 | `openai-apikey = sk-xxxxxxx` 
42 | 
43 | Replace `sk-xxxxxxx` with your OpenAI API key. Change the other options as needed, then press CTRL-X and confirm to save.
44 | 
45 | Run the command:
46 | ```
47 | python3 srt_translation.py [-h] [--test] filename
48 | 
49 | positional arguments:
50 |   filename    Name of the input file
51 | 
52 | options:
53 |   -h, --help  show this help message and exit
54 |   --test      Only translate the first 3 short texts
55 | ```
56 | 
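57 | Simply run the `srt_translation.py` script with the file you want to translate as an argument. For example, to translate an SRT file named `example.srt`, run the following command:
58 | 
59 | `python3 srt_translation.py example.srt` 
60 | 
61 | The script translates the text into the language set by the `target-language` option in `settings.cfg`.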
62 | 
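63 | ## Additional features
64 | 
65 | -   The code reads the OpenAI API key, target language, and other options from the `settings.cfg` file.
66 | -   A progress bar shows the progress of the SRT translation.
67 | -   A test mode is available: the `--test` option translates only the first 3 short texts to save API usage.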
68 | 
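69 | ## Configuration
70 | 
71 | The `settings.cfg` file contains several options that can be used to configure the behavior of the script:
72 | 
73 | -   `openai-apikey`: Your API key for the OpenAI API.
74 | -   `target-language`: The language you want to translate the text into (e.g. "English", "Chinese", "Japanese").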
75 | 
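76 | ## Output
77 | 
78 | The script writes two files:
79 | - An SRT file named after the input file with `_translated` appended. For example, if the input file is `example.srt`, the output file is `example_translated.srt`.
80 | - A bilingual SRT file named after the input file with `_translated_bilingual` appended. For example, if the input file is `example.srt`, the output file is `example_translated_bilingual.srt`.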
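81 | ## License
82 | 
83 | This tool is released under the MIT License.
84 | ## Disclaimer:
85 | 
86 | The SRT Translator tool is provided for educational and informational purposes only. The accuracy, reliability, and completeness of the translations generated by the OpenAI API model ("gpt-3.5-turbo") used in this tool cannot be guaranteed. Users of the SRT Translator tool are solely responsible for verifying the accuracy and usefulness of the translations obtained, and should not rely solely on them without further verification. The use of the SRT Translator tool is at the user's own risk, and the tool's developers and contributors shall not be liable for any damages or losses arising from its use. By using the SRT Translator tool, you agree to these terms and conditions.
87 | 
88 | If you have any concerns or suggestions about the use of this project, please contact us through the issues section.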
89 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | # srt-GPT-translator
 2 | [En](https://github.com/jesselau76/srt-gpt-translator/blob/main/README.md) | [中文说明](https://github.com/jesselau76/srt-gpt-translator/blob/main/README-zh.md)
 3 | 
 4 | This tool helps users translate an SRT file into another language using the OpenAI API (model="gpt-3.5-turbo"). It supports bilingual subtitle output.
 5 | 
 6 | ## Features
 7 | - Subtitle blocks are translated in batches of at most 1024 characters so that context carries across neighboring lines.
 8 | - The result returned by the OpenAI API is checked against the original. If the format does not match, the batch is translated again; after three failed attempts, the original text of that batch is returned.
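
The batching and format check described in the list above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a definitive implementation: `chunk_subtitles`, `index_lines`, and `same_structure` are hypothetical names, and the check assumes that "format" means the subtitle sequence numbers of the translation line up with those of the original.

```python
import re

MAX_CHARS = 1024  # batch size limit mentioned in the feature list


def chunk_subtitles(srt_text, max_chars=MAX_CHARS):
    """Group whole subtitle blocks into chunks of at most max_chars characters."""
    # The capturing group keeps the blank-line separators, so joining the
    # chunks again reproduces the input text exactly.
    pieces = re.split(r"(\n\s*\n)", srt_text)
    chunks, current = [], ""
    for piece in pieces:
        if len(current) + len(piece) <= max_chars:
            current += piece
        else:
            if current:
                chunks.append(current)
            # A single block longer than max_chars becomes its own chunk.
            current = piece
    if current:
        chunks.append(current)
    return chunks


def index_lines(text):
    """Return the lines made up only of digits (assumed to be SRT sequence numbers)."""
    stripped = (line.strip() for line in text.split("\n"))
    return [line for line in stripped if re.fullmatch(r"\d+", line)]


def same_structure(original, translated):
    """Accept a translation only if its sequence numbers match the original's."""
    return index_lines(original) == index_lines(translated)
```

A caller would translate each chunk, test the result with `same_structure`, and retry or fall back to the original chunk when the check fails. Note that a subtitle whose text is a bare number (for example `2023`) is counted as a sequence number by this simple check.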
 9 | 
10 | ## Installation
11 | 
12 | To use this tool, you need Python 3 installed on your system, along with the packages listed in `requirements.txt` (`openai`, `tqdm`, `chardet`).
13 | 
14 | 
15 | You can install these packages by running the following command:
16 | ```
17 | pip install -r requirements.txt
18 | ```
19 | 
20 | Clone the repository:
21 | 
22 | ```
23 | git clone https://github.com/jesselau76/srt-gpt-translator.git
24 | ```
25 | 
26 | Update to the latest version:
27 | ```
28 | cd srt-gpt-translator
29 | git pull
30 | pip install -r requirements.txt
31 | ```
32 | ## Usage
33 | 
34 | To use this tool, first rename `settings.cfg.example` to `settings.cfg`:
35 | ```
36 | cd srt-gpt-translator
37 | mv settings.cfg.example settings.cfg
38 | nano settings.cfg
39 | ```
40 | 
41 | ```
42 | openai-apikey = sk-xxxxxxx
43 | ```
44 | Replace `sk-xxxxxxx` with your OpenAI API key.
45 | Change the other options as needed, then press CTRL-X and confirm to save.
46 | 
47 | Run the command:
48 | ```
49 | python3 srt_translation.py [-h] [--test] filename
50 | 
51 | positional arguments:
52 |   filename    Name of the input file
53 | 
54 | options:
55 |   -h, --help  show this help message and exit
56 |   --test      Only translate the first 3 short texts
57 | ```
58 | 
59 | Simply run the `srt_translation.py` script with the file you want to translate as an argument. For example, to translate an SRT file named `example.srt`, run the following command:
60 | 
61 | ```
62 | python3 srt_translation.py example.srt
63 | ```
64 | 
65 | The script translates the text into the language set by the `target-language` option in `settings.cfg`.
66 | ## Additional features
67 | - The code reads the OpenAI API key, target language, and other options from the `settings.cfg` file.
68 | - A progress bar shows the progress of the SRT translation.
69 | - A test mode is available: the `--test` option translates only the first 3 short texts to save API usage.
70 | 
71 | ## Configuration
72 | 
73 | The `settings.cfg` file contains several options that can be used to configure the behavior of the script:
74 | 
75 | - `openai-apikey`: Your API key for the OpenAI API.
76 | - `target-language`: The language you want to translate the text into (e.g. "English", "Chinese", "Japanese").
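
As a rough illustration of how these two options can be read, the sketch below parses settings text with Python's standard `configparser` module. The `[option]` section name comes from `settings.cfg.example`; `load_options` and `is_placeholder_key` are hypothetical helpers added here for illustration, and the placeholder check is an assumption rather than something the script does.

```python
import configparser

# Same layout as settings.cfg.example, with the placeholder key from this README.
EXAMPLE_SETTINGS = """\
[option]
# API key for OpenAI API
openai-apikey = sk-xxxxxxx

# Target language for translation
target-language = Chinese
"""


def load_options(config_text):
    """Parse settings text and return (api_key, target_language)."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    api_key = parser.get("option", "openai-apikey")
    language = parser.get("option", "target-language")
    return api_key, language


def is_placeholder_key(api_key):
    """Hypothetical sanity check: True if the key was never filled in."""
    return api_key.strip() in {"", "sk-", "sk-xxxxxxx"}


api_key, language = load_options(EXAMPLE_SETTINGS)
print(language)                     # Chinese
print(is_placeholder_key(api_key))  # True
```

Checking for the placeholder before the first API call gives a clearer error than a failed request would.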
77 | 
78 | 
79 | ## Output
80 | 
81 | 
82 | The script writes two files:
83 | - An SRT file named after the input file with `_translated` appended. For example, if the input file is `example.srt`, the output file is `example_translated.srt`.
84 | - A bilingual SRT file named after the input file with `_translated_bilingual` appended. For example, if the input file is `example.srt`, the output file is `example_translated_bilingual.srt`.
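
The naming rule and the shape of a bilingual entry can be sketched as follows. This is an illustrative sketch: `output_paths` and `merge_block` are hypothetical names, and the bilingual layout (original text followed by the translated text under one shared index and timestamp) is an assumption about how the two versions are combined.

```python
import os


def output_paths(input_path):
    """Return (translated_path, bilingual_path) for an input subtitle file."""
    base, _ext = os.path.splitext(input_path)
    return base + "_translated.srt", base + "_translated_bilingual.srt"


def merge_block(original_block, translated_block):
    """Build one bilingual subtitle block from two single-language blocks.

    Lines 0 and 1 of an SRT block are the index and the timestamp, so the
    original block is kept whole and only the translated text lines are added.
    """
    original_lines = original_block.split("\n")
    translated_lines = translated_block.split("\n")
    return "\n".join(original_lines + translated_lines[2:])


print(output_paths("example.srt"))
# ('example_translated.srt', 'example_translated_bilingual.srt')
```

For instance, merging a block whose text is `Hello` with its translation `你好` yields a single block that shows both lines under the same timestamp.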
85 | 
86 | ## License
87 | 
88 | This tool is released under the MIT License.
89 | 
90 | ## Disclaimer:
91 | 
92 | The SRT Translator tool is provided for educational and informational purposes only. The accuracy, reliability, and completeness of the translations generated by the OpenAI API model ("gpt-3.5-turbo") used in this tool cannot be guaranteed. Users of the SRT Translator tool are solely responsible for verifying the accuracy and usefulness of the translations obtained, and should not rely solely on them without further verification. The use of the SRT Translator tool is at the user's own risk, and the tool's developers and contributors shall not be liable for any damages or losses arising from its use. By using the SRT Translator tool, you agree to these terms and conditions.
93 | 
94 | If you have any concerns or suggestions about the use of this project, please contact us through the issues section.
95 | 


--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | openai<1  # srt_translation.py calls the legacy openai.ChatCompletion API, removed in openai 1.0
2 | tqdm 
3 | chardet


--------------------------------------------------------------------------------
/settings.cfg.example:
--------------------------------------------------------------------------------
 1 | [option]
 2 | #API key for OpenAI API
 3 | openai-apikey = sk-
 4 | 
 5 | #Target language for translation, e.g. "English", "Chinese", "Japanese"
 6 | target-language = Chinese
 7 | 
 8 | 
 9 | 
10 | 


--------------------------------------------------------------------------------
/srt_translation.py:
--------------------------------------------------------------------------------
  1 | # -*- coding: utf-8 -*-
  2 | 
  3 | 
  4 | import re
  5 | import openai
  6 | from tqdm import tqdm
  7 | # import nltk
  8 | # nltk.download('punkt')
  9 | # from nltk.tokenize import sent_tokenize
 10 | 
 11 | import os
 12 | import tempfile
 13 | import shutil
 14 | 
 15 | import configparser
 16 | 
 17 | from io import StringIO
 18 | import random
 19 | import json
 20 | 
 21 | 
 22 | 
 23 | 
 24 | import chardet
 25 | 
 26 | with open('settings.cfg', 'rb') as f:
 27 |     content = f.read()
 28 |     encoding = chardet.detect(content)['encoding']
 29 |     
 30 | with open('settings.cfg', encoding=encoding) as f:
 31 |     config_text = f.read()
 32 |     config = configparser.ConfigParser()
 33 |     config.read_string(config_text)
 34 | 
 35 | # Read the OpenAI API key and the target language
 36 | openai_apikey = config.get('option', 'openai-apikey')
 37 | language_name = config.get('option', 'target-language')
 38 | 
 39 | # Set the OpenAI API key
 40 | openai.api_key = openai_apikey
 41 | import argparse
 42 | 
 43 | # Create the argument parser
 44 | parser = argparse.ArgumentParser()
 45 | parser.add_argument("filename", help="Name of the input file")
 46 | parser.add_argument("--test", help="Only translate the first 3 short texts", action="store_true")
 47 | args = parser.parse_args()
 48 | 
 49 | # Get the command-line arguments
 50 | filename = args.filename
 51 | base_filename, file_extension = os.path.splitext(filename)
 52 | new_filenametxt = base_filename + "_translated.srt"
 53 | new_filenametxt2 = base_filename + "_translated_bilingual.srt"
 54 | 
 55 | jsonfile = base_filename + "_process.json"
 56 | # Load previously translated text from the progress file
 57 | translated_dict = {}
 58 | try:
 59 |     with open(jsonfile, "r", encoding="utf-8") as f:
 60 |         translated_dict = json.load(f)
 61 | except FileNotFoundError:
 62 |     pass
 63 | 
 64 | 
 65 | 
 66 | def split_text(text):
 67 |     # Split the text into subtitle blocks, keeping the blank-line separators
 68 |     blocks = re.split(r'(\n\s*\n)', text)
 69 | 
 70 |     # Chunks collected so far
 71 |     short_text_list = []
 72 |     # Chunk currently being built
 73 |     short_text = ""
 74 |     # Walk through the subtitle blocks
 75 |     for block in blocks:
 76 |         # Add the block to the current chunk while it stays within 1024 characters
 77 |         if len(short_text + block) <= 1024:
 78 |             short_text += block
 79 |         # Otherwise store the current chunk and start a new one with this block
 80 |         else:
 81 |             short_text_list.append(short_text)
 82 |             short_text = block
 83 |     # Store the final chunk
 84 |     short_text_list.append(short_text)
 85 |     return short_text_list
 86 | 
 87 | 
 88 | 
 89 | 
 90 | def is_translation_valid(original_text, translated_text):
 91 |     def get_index_lines(text):
 92 |         lines = text.split('\n')
 93 |         index_lines = [line for line in lines if re.match(r'^\d+$', line.strip())]
 94 |         return index_lines
 95 | 
 96 |     original_index_lines = get_index_lines(original_text)
 97 |     translated_index_lines = get_index_lines(translated_text)
 98 | 
 99 |     print(original_text, original_index_lines)
100 |     print(translated_text, translated_index_lines)
101 | 
102 |     return original_index_lines == translated_index_lines
103 | def translate_text(text):
104 |     max_retries = 3
105 |     retries = 0
106 |     
107 |     while retries < max_retries:
108 |         try:
109 |             completion = openai.ChatCompletion.create(
110 |                 model="gpt-3.5-turbo",
111 |                 messages=[
112 |                     {
113 |                         "role": "user",
114 |                         "content": f"Translate the following subtitle text into {language_name}, but keep the subtitle number and timeline unchanged: \n{text}",
115 |                     }
116 |                 ],
117 |             )
118 |             t_text = (
119 |                 completion["choices"][0]
120 |                 .get("message")
121 |                 .get("content")
122 |                 .encode("utf8")
123 |                 .decode()
124 |             )
125 |             
126 |             if is_translation_valid(text, t_text):
127 |                 return t_text
128 |             else:
129 |                 retries += 1
130 |                 print(f"Invalid translation format. Retrying ({retries}/{max_retries})")
131 |         
132 |         except Exception as e:
133 |             import time
134 |             sleep_time = 60
135 |             retries += 1
136 |             print(e, f"will sleep {sleep_time} seconds, Retrying ({retries}/{max_retries})")
137 |             time.sleep(sleep_time)
138 | 
139 |     print(f"Unable to get a valid translation after {max_retries} retries. Returning the original text.")
140 |     return text
141 |     
142 | def translate_and_store(text):
143 |     
144 | 
145 |     # Return the cached result if this text was already translated
146 |     if text in translated_dict:
147 |         return translated_dict[text]
148 | 
149 |     # Otherwise translate it with translate_text and cache the result
150 |     translated_text = translate_text(text)
151 |     translated_dict[text] = translated_text
152 | 
153 |     # Save the cache to the JSON progress file
154 |     with open(jsonfile, "w", encoding="utf-8") as f:
155 |         json.dump(translated_dict, f, ensure_ascii=False, indent=4)
156 | 
157 |     return translated_text 
158 | 
159 | 
160 | text = ""
161 | 
162 | # Read the input file; only .srt files are supported
163 | 
164 | if filename.endswith('.srt'):
165 |     
166 |     with open(filename, 'r', encoding='utf-8') as file:
167 |         text = file.read()
168 |        
169 |            
170 |    
171 | else:
172 |     raise SystemExit("Unsupported file type")
173 | 
174 | 
175 |    
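176 | # Collapse runs of whitespace into a single space (disabled: it would also remove
177 | # the newlines that separate subtitle blocks)
178 | #text = re.sub(r"\s+", " ", text)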
179 | 
180 | 
181 | 
182 | 
183 | # Split the text into chunks of at most 1024 characters
184 | short_text_list = split_text(text)
185 | if args.test:
186 |     short_text_list = short_text_list[:3]
187 | # Accumulates the translated text
188 | translated_text = ""
189 | 
190 | # Translate each chunk in turn
191 | for short_text in tqdm(short_text_list):
192 |     print(short_text)
193 |     # Translate the current chunk
194 |     translated_short_text = translate_and_store(short_text)
195 |     
196 |     
197 |     # Append the translated chunk to the full translated text
198 |         
199 |     translated_text += f"{translated_short_text}\n\n"
200 |     #print(short_text)
201 |     print(translated_short_text)
202 |     
203 | 
204 | 
205 |     
206 | 
207 | def replace_text(text1, text2):
208 |     def split_blocks(text):
209 |         blocks = re.split(r'(\n\s*\n)', text.strip())
210 |         return [block.split('\n') for block in blocks if block.strip()]
211 | 
212 |     blocks1 = split_blocks(text1)
213 |     blocks2 = split_blocks(text2)
214 | 
215 |     replaced_lines = []
216 | 
217 |     for block1, block2 in zip(blocks1, blocks2):
218 |         replaced_lines.extend(block1[:2])  # Index and timestamp from the original
219 |         replaced_lines.extend(block2[2:])  # Translated content
220 |         replaced_lines.append('')  # Add an empty line
221 | 
222 |     return '\n'.join(replaced_lines).strip()
223 | 
224 | 
225 | def merge_text(text1, text2):
226 |     def split_blocks(text):
227 |         blocks = re.split(r'(\n\s*\n)', text.strip())
228 |         return [block.split('\n') for block in blocks if block.strip()]
229 | 
230 |     blocks1 = split_blocks(text1)
231 |     blocks2 = split_blocks(text2)
232 | 
233 |     merged_lines = []
234 | 
235 |     for block1, block2 in zip(blocks1, blocks2):
236 |         merged_lines.extend(block1[:2])  # Index and timestamp from the original
237 |         merged_lines.extend(block1[2:])  # Original content
238 |         merged_lines.extend(block2[2:])  # Translated content
239 |         merged_lines.append('')  # Add an empty line
240 | 
241 |     return '\n'.join(merged_lines).strip()
242 | 
243 | 
244 | result = replace_text(text, translated_text)
245 | # Write the translated subtitles to the output SRT file
246 | with open(new_filenametxt, "w", encoding="utf-8") as f:
247 |     f.write(result)
248 | 
249 | result2 = merge_text(text, translated_text)
250 | # Write the bilingual subtitles to the second output SRT file
251 | with open(new_filenametxt2, "w", encoding="utf-8") as f:
252 |     f.write(result2)
253 | 
254 | try:
255 |     os.remove(jsonfile)
256 |     print(f"File '{jsonfile}' has been deleted.")
257 | except FileNotFoundError:
258 |     print(f"File '{jsonfile}' not found. No file was deleted.")


--------------------------------------------------------------------------------