├── .dockerignore
├── .gitignore
├── Dockerfile
├── LICENSE
├── README.md
├── gen_caption.py
├── gui.py
├── hooks
│   ├── hook-whisper.py
│   └── hook-zhconv.py
├── m3u8dl.py
├── main.py
├── md
│   └── README
│       ├── image-20230926124922726.png
│       ├── image-20231018204208066.png
│       ├── image-20240409095211597.png
│       ├── image-20240409095831766.png
│       ├── image-20240409105228362.png
│       ├── image-20240409131033038.png
│       ├── image-20240413001454717.png
│       ├── image-20240413001734218.png
│       ├── image-20240413002004628.png
│       ├── image-20240413002242979.png
│       ├── image-20240529171253980.png
│       ├── image-20240529171540279.png
│       ├── image-20240529171709402.png
│       ├── image-20240809182344017.png
│       ├── image-20240809182406184.png
│       ├── image-20240809182413373.png
│       ├── image-20240809182420653.png
│       └── image-20240809183350633.png
├── requirements-whisper.txt
├── requirements.txt
├── templates
│   └── index.html
├── utils.py
├── webui
│   ├── script.js
│   └── styles.css
├── webui_interface.py
├── yhkt.ico
└── 项目详解.md
/.dockerignore:
--------------------------------------------------------------------------------
1 | __pycache__/
2 | output/
3 | old/
4 | *.zip
5 | *.exe
6 | build/
7 | dist/
8 | *.spec
9 | whisper_models/
10 | release_*/
11 | *.json
12 | .idea
13 | .DS_Store
14 |
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | __pycache__/
2 | output/
3 | old/
4 | *.zip
5 | *.exe
6 | build/
7 | dist/
8 | *.spec
9 | whisper_models/
10 | release_*/
11 | *.json
12 | .idea
13 | .DS_Store
14 | auth.txt
15 |
--------------------------------------------------------------------------------
/Dockerfile:
--------------------------------------------------------------------------------
1 | FROM python:3.8-slim
2 |
3 | WORKDIR /app
4 |
5 | # Write a new sources.list that points at the USTC Debian mirror
6 | RUN echo "deb http://mirrors.ustc.edu.cn/debian/ buster main contrib non-free" > /etc/apt/sources.list && \
7 |     echo "deb http://mirrors.ustc.edu.cn/debian/ buster-updates main contrib non-free" >> /etc/apt/sources.list && \
8 |     echo "deb http://mirrors.ustc.edu.cn/debian-security buster/updates main contrib non-free" >> /etc/apt/sources.list
9 |
10 |
11 | RUN apt-get update && \
12 |     apt-get install -y ffmpeg && \
13 |     rm -rf /var/lib/apt/lists/*
14 |
15 | COPY . /app
16 |
17 |
18 | RUN pip install Flask requests
19 |
20 | EXPOSE 5001
21 |
22 | VOLUME /app/output
23 |
24 | CMD ["python", "webui_interface.py"]
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2023 AuYang261
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # BIT_yanhe_download
2 |
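3 | ## Introduction
4 |
5 | This project downloads course videos from [Yanhe Classroom (yanhekt.cn)](https://www.yanhekt.cn/recordCourse). Yanhe Classroom is the online lecture platform of Beijing Institute of Technology; it hosts a large number of course recordings but offers no download feature. This project can download the camera and screen signals of a given course, including courses you do not have permission to view.
6 |
7 | For a detailed report on the project, see the [project walkthrough](./项目详解.md); it is for reference only.
8 |
9 | Suggestions and stars are welcome!
10 |
11 | ## Usage: downloading a specific course
12 |
13 | [Click here to download](https://github.com/AuYang261/BIT_yanhe_download/releases/latest/download/release_downloader.zip) and unzip the archive.
14 |
15 | Find the course you want on [Yanhe Classroom (yanhekt.cn)](https://www.yanhekt.cn/recordCourse). Taking the course at `https://www.yanhekt.cn/course/40524` as an example, copy the five-digit ID at the end of the address bar, 40524. Note that this must be the course-list link (starting with `yanhekt.cn/course/<five-digit ID>`), not the video-page link (starting with `yanhekt.cn/session/<six-digit ID>`).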
16 |
17 | 
18 |
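19 | ### Logging in to Yanhe Classroom
20 |
21 | The new version of Yanhe Classroom requires a login to view the course list, so log in to Yanhe Classroom yourself first. After logging in, enter the following code in the address bar of a Yanhe Classroom page (note that browsers automatically strip the `javascript:` prefix on paste, so you need to type it back in by hand after copying and pasting):
22 |
23 | ```
24 | javascript:alert(JSON.parse(localStorage.auth).token)
25 | ```
26 |
27 | ![image-20240529171253980](md/README/image-20240529171253980.png)
28 |
29 | Press Enter and a dialog box pops up; copy the authentication token it shows.
30 |
31 | ![image-20240529171540279](md/README/image-20240529171540279.png)
32 |
33 | Alternatively, press `F12` to open the console and enter the code above there to get the same authentication token.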
34 |
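35 | ### Web GUI
36 |
37 | Double-click `webui_interface.exe` to start the web server; a browser page opens automatically.
38 |
39 | Then create a new task on the page that opens.
40 |
41 | The download type can be the camera (the recording from the camera at the back of the classroom) or the computer screen (the screen signal of the classroom computer).
42 |
43 | You can choose whether to download the classroom Bluetooth microphone signal (only effective when the course has one); if the teacher did not use the Bluetooth microphone, this signal is silent.
44 |
45 | ![image-20240809182344017](md/README/image-20240809182344017.png)
46 |
47 | On first use, or when the previous login has expired, you need to enter the authentication token obtained above.
48 |
49 | If you have used this tool before (including through the other interfaces) and the login has not expired, the authentication token is saved automatically and does not need to be entered each time.
50 |
51 | ![image-20240809182406184](md/README/image-20240809182406184.png)
52 |
53 | Downloaded files are placed under `output/` in a folder named `<course name>-video` or `<course name>-screen`. If the Bluetooth audio was downloaded, it is saved as an `.aac` file with the same name in the same directory as the video.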
54 |
55 | 
56 |
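57 | ### Command-line GUI
58 |
59 | Open a command prompt (type cmd in the address bar of the folder where `release_downloader.zip` was extracted) and run `gui.exe` from it. Double-clicking the file directly may misalign the characters and make the text hard to read. It is best to maximize the command-prompt window so that nothing is cut off.
60 |
61 | ![image-20240413001454717](md/README/image-20240413001454717.png)
62 |
63 | First enter the ID of the course you want to download (40524) and press Enter (the numeric-keypad Enter key does not seem to work) to fetch the course's video list:
64 |
65 | ![image-20240413001734218](md/README/image-20240413001734218.png)
66 |
67 | As before, on first use or when the previous login has expired, enter the authentication token obtained above; this is not needed while the login is still valid.
68 |
69 | ![image-20240529171709402](md/README/image-20240529171709402.png)
70 |
71 |
72 |
73 | Use the up and down arrow keys to move the cursor and the space bar to select or deselect; at least one video must be selected. Press Enter to confirm when done. Press q to quit.
74 |
75 | After confirming, choose the signals to download; again, at least one must be selected. Press Enter to confirm.
76 |
77 | ![image-20240413002004628](md/README/image-20240413002004628.png)
78 |
79 | Then choose whether to download the classroom Bluetooth microphone signal and press Enter to confirm. The download starts. Press `ctrl+c` to stop.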
80 |
81 | 
82 |
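83 | ### Plain interactive mode
84 |
85 | If the GUIs above do not display properly, use the plain interactive mode directly. Double-click `main.exe` and enter the ID of the course you want to download (40524) and, if required, the authentication token. The course's video list is printed:
86 |
87 | ![image-20240409095831766](md/README/image-20240409095831766.png)
88 |
89 | Enter the indices of the videos you want, separated by ASCII commas (,), and press Enter. Then enter a number to choose between the camera signal and the screen signal; the camera signal is the default. Then choose whether to download the Bluetooth microphone signal. Press Enter to start downloading.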
90 |
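91 | ## Automatic subtitle generation
92 |
93 | This project can generate subtitles automatically, using OpenAI's [whisper](https://github.com/openai/whisper) project and its models to run speech-to-text locally.
94 |
95 | Running on a GPU is recommended, as it is slow otherwise; see [below](#dependencies) for the dependencies.
96 |
97 | Download the [subtitle generator gen_caption](https://github.com/AuYang261/BIT_yanhe_download/releases/tag/v2.0). Because the program is fairly large, it is published as a multi-volume archive. Download all volumes and extract them to get a `gen_caption.exe` executable, and place it in the directory where `release_downloader.zip` was extracted, alongside the `output/` directory that holds the videos, as shown below:
98 |
99 | ![image-20240409131033038](md/README/image-20240409131033038.png)
100 |
101 | After downloading the videos, double-click `gen_caption.exe` (the file is large, so startup takes a while), enter a number to choose the video, and press Enter. Then enter a number to choose the model size: models further down the list give better results but take longer; the base model is the default. The model (a few hundred MB) is downloaded automatically on first use, so please be patient. As shown below:
102 |
103 | ![image-20240409105228362](md/README/image-20240409105228362.png)
104 |
105 | Wait for the program to finish. The generated subtitle file is in `.srt` format and sits in the same directory as the video file; open the video in a player that supports subtitles (such as PotPlayer) to watch it with subtitles.
106 |
107 | _Tip: speech-to-text takes a while, so you can start watching the video first and reopen it once the subtitles are ready. With a GPU it takes roughly a few minutes; without one it takes longer._
108 |
109 | ## Dependencies
110 |
111 | - ffmpeg, already included in the Release. On Linux, install ffmpeg manually:
112 |
113 | ```bash
114 | sudo apt update
115 | sudo apt install ffmpeg
116 | ```
117 |
118 | - **To run automatic subtitle generation on a GPU, install CUDA first; see [CUDA installation](https://blog.csdn.net/chen565884393/article/details/127905428) for instructions.**
119 |
120 | _To run from a Python environment instead, install the following dependencies_
121 |
122 | - Python: [download](https://www.python.org/ftp/python/3.9.4/python-3.9.4-amd64.exe) and install it
123 | - The third-party Python library requests. Open a command prompt and run the following command to install it:
124 |
125 | ```bash
126 | pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple
127 | ```
128 |
129 | - Install the speech-to-text dependencies. (They depend on PyTorch; if PyTorch is not installed it is installed automatically, but as the CPU build. For the CUDA build of PyTorch, see the [PyTorch website](https://pytorch.org/get-started/locally/).)
130 |
131 | ```bash
132 | pip install -r requirements-whisper.txt -i https://pypi.tuna.tsinghua.edu.cn/simple
133 | ```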
134 |
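135 | ## Notes
136 |
137 | - Turn off any proxy on your machine; otherwise you will see an error similar to `check_hostname requires server_hostname`.
138 | - Courses you do not have permission for can be downloaded as long as you know the course link (that is, the course ID in it).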
139 |
140 | ## Packaging (developers only)
141 |
142 | To run without a Python environment, package the Python programs as executables. The Release is already packaged.
143 |
144 | Package with the following commands:
145 |
146 | ```bash
147 | # Install pyinstaller if it is not installed yet
148 | pip install pyinstaller
149 | # Package
150 | pyinstaller -F main.py -i yhkt.ico
151 | pyinstaller -F gui.py -i yhkt.ico
152 | pyinstaller -F webui_interface.py --add-data webui:webui --add-data templates:templates -i yhkt.ico
153 | pyinstaller -F gen_caption.py -i yhkt.ico
154 | ```
155 |
156 | Packaging `gen_caption.py` may fail with a recursion-depth error:
157 |
158 |
159 |
160 | For a fix, see [here](https://zhuanlan.zhihu.com/p/661325305): edit the `gen_caption.spec` configuration file in the project root and add the following code at the top of the file:
161 |
162 | ```python
163 | import sys ; sys.setrecursionlimit(sys.getrecursionlimit() * 5)
164 | ```
165 |
166 | Then package with the following command:
167 |
168 | ```bash
169 | pyinstaller --clean .\gen_caption.spec
170 | ```
171 |
172 | If, after packaging, running the program reports that a file under the Temp directory was not found:
173 |
174 | ![image-20240409165246133](md/README/image-20240409165246133.png)
175 |
176 | For a fix, see [this post](https://blog.csdn.net/qq_42324086/article/details/118280341): copy `hook-whisper.py` and `hook-zhconv.py` from the project's `hooks` directory into PyInstaller's hook directory (usually `<Python root>\Lib\site-packages\PyInstaller\hooks`).
177 |
--------------------------------------------------------------------------------
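The README's rule about which link to copy (a course-list link under `yanhekt.cn/course/`, not a video-page link under `yanhekt.cn/session/`) can be checked before a download is attempted. The sketch below is only an illustration: `extract_course_id` is a hypothetical helper that does not exist in this repository, and accepting a bare numeric ID is an assumption based on the README example.

```python
import re
from typing import Optional

# Hypothetical helper (not part of the repository): pull the course ID out of
# pasted text and reject video-page (session) links.
_COURSE_RE = re.compile(r"yanhekt\.cn/course/(\d+)")


def extract_course_id(text: str) -> Optional[str]:
    """Return the numeric course ID from a course-list link, or None."""
    text = text.strip()
    if text.isdigit():  # a bare ID such as "40524" was pasted
        return text
    match = _COURSE_RE.search(text)
    return match.group(1) if match else None


print(extract_course_id("https://www.yanhekt.cn/course/40524"))
print(extract_course_id("https://www.yanhekt.cn/session/123456"))
```

A session link yields `None`, which a caller could turn into a prompt to copy the course-list link instead.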
/gen_caption.py:
--------------------------------------------------------------------------------
1 | import os
2 | import sys
3 | import time
4 |
5 | import whisper
6 | from zhconv import convert  # Simplified/Traditional Chinese conversion
7 |
8 |
9 | def seconds_to_hmsm(seconds):
10 |     """
11 |     Convert a number of seconds to the SRT timestamp format HH:MM:SS,mmm.
12 |     @params:
13 |         seconds - Required : seconds (float)
14 |     """
15 |     hours = str(int(seconds // 3600))
16 |     minutes = str(int((seconds % 3600) // 60))
17 |     seconds = seconds % 60
18 |     milliseconds = str(int((seconds - int(seconds)) * 1000))  # keep three digits of milliseconds
19 |     seconds = str(int(seconds))
20 |     # Zero-pad each field
21 |     if len(hours) < 2:
22 |         hours = "0" + hours
23 |     if len(minutes) < 2:
24 |         minutes = "0" + minutes
25 |     if len(seconds) < 2:
26 |         seconds = "0" + seconds
27 |     if len(milliseconds) < 3:
28 |         milliseconds = "0" * (3 - len(milliseconds)) + milliseconds
29 |     return f"{hours}:{minutes}:{seconds},{milliseconds}"
30 |
31 |
32 | def main():
33 |     # Paths of the video files to transcribe
34 |     video_paths = []
35 |     if len(sys.argv) >= 2:
36 |         video_paths.append(sys.argv[1])
37 |     else:
38 |         files = []
39 |         for dirpath, dirnames, filenames in os.walk("."):
40 |             for filename in filenames:
41 |                 if filename.endswith(".mp4"):
42 |                     files.append(os.path.join(dirpath, filename).replace("\\", "/"))
43 |         for i, f in enumerate(files):
44 |             print(f"[{i}]: ", f)
45 |         # Parse the comma-separated indices with int() instead of eval()
46 |         raw = input("select a video file by input a num(split with ','): ")
47 |         input_list = [int(s) for s in raw.split(",") if s.strip()]
48 |         for i in input_list:
49 |             video_paths.append(files[i])
50 |     print("selected video files:", video_paths)
51 |     models = []
52 |     for model in whisper.available_models():
53 |         if ".en" in model:
54 |             continue
55 |         print(f"[{len(models)}]: ", model)
56 |         models.append(model)
57 |     model_index = input("select a model by input a num(default 'base'): ")
58 |     try:
59 |         model_name = models[int(model_index)]
60 |     except Exception:
61 |         model_name = "base"
62 |     print("selected model:", model_name)
63 |
64 |     for video_path in video_paths:
65 |         audio_path = os.path.splitext(video_path)[0] + ".m4a"
66 |         cmd = f'ffmpeg -i "{video_path}" -vn -ar {whisper.audio.SAMPLE_RATE} "{audio_path}"'
67 |         os.system(cmd)
68 |
69 |         model = whisper.load_model(model_name, download_root="whisper_models/")
70 |
71 |         start = time.time()
72 |         result = model.transcribe(audio_path, verbose=False, language="zh")
73 |         print("Time cost: ", time.time() - start)
74 |
75 |         # Write the subtitle file
76 |         with open(os.path.splitext(video_path)[0] + ".srt", "w", encoding="utf-8") as f:
77 |             i = 1
78 |             for r in result["segments"]:
79 |                 f.write(str(i) + "\n")
80 |                 f.write(
81 |                     seconds_to_hmsm(float(r["start"]))
82 |                     + " --> "
83 |                     + seconds_to_hmsm(float(r["end"]))
84 |                     + "\n"
85 |                 )
86 |                 i += 1
87 |                 f.write(
88 |                     convert(r["text"], "zh-cn") + "\n"
89 |                 )  # the result may be Traditional Chinese; convert to Simplified (zh-cn)
90 |                 f.write("\n")
91 |
92 |         # Delete the temporary audio file
93 |         os.remove(audio_path)
94 |
95 |
96 | if __name__ == "__main__":
97 |     main()
98 |
--------------------------------------------------------------------------------
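`seconds_to_hmsm` in `gen_caption.py` builds the SRT timestamp by converting each field to a string and padding it by hand. As a rough alternative sketch, the same `HH:MM:SS,mmm` layout can be produced from integer milliseconds with `divmod` and format specifiers. `srt_timestamp` is a hypothetical name, and unlike the original, which truncates the millisecond field, this version rounds to the nearest millisecond.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a duration in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    # Work in whole milliseconds so no fractional arithmetic remains afterwards.
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


print(srt_timestamp(3661.5))
```

Working in integer milliseconds sidesteps cases where a value such as 1.001 seconds loses its last millisecond to floating-point error when the fractional part is extracted by subtraction.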
/gui.py:
--------------------------------------------------------------------------------
1 | import curses
2 | import sys
3 |
4 | import m3u8dl
5 | import utils
6 |
7 | videoList = []
8 | courseName = ""
9 | professor = ""
10 | selected_videos = []
11 | selected_signal = []
12 | download_audio = []
13 |
14 | align = 0
15 |
16 |
17 | class Row:
18 |     def __init__(self, text, highlighted=False):
19 |         self.text = text
20 |         self.highlighted = highlighted
21 |
22 |
23 | def draw_line(stdscr, text, row):
24 |     # Insert a space after every Chinese (non-ASCII) character to work around its double display width
25 |     new_text = ""
26 |     for c in text:
27 |         new_text += c
28 |         if ord(c) > 127:
29 |             new_text += " "
30 |     stdscr.addnstr(row, align, new_text, get_cmd_window_size(stdscr)[1])
31 |
32 |
33 | def draw_menu(stdscr, options, checked, title, subtitle, current_row):
34 |     stdscr.clear()
35 |     height, width = get_cmd_window_size(stdscr)
36 |     draw_line(stdscr, title, 0)
37 |     draw_line(stdscr, subtitle, 1)
38 |     msg = []
39 |     for idx, option in enumerate(options):
40 |         checkmark = "[X]" if checked[idx] else "[ ]"
41 |         msg.append(Row(f"{checkmark} {option}", idx == current_row))
42 |     draw_multi_select(stdscr, msg, current_row)
43 |     draw_line(stdscr, "按上下键移动,按空格键选择/取消选择", height - 2)
44 |     draw_line(stdscr, "按回车键确认,按q键退出", height - 1)
45 |     stdscr.refresh()
46 |
47 |
48 | def draw_multi_select(stdscr, messages: list, center_row):
49 |     # Get the number of rows and columns of the screen
50 |     height, width = get_cmd_window_size(stdscr)
51 |
52 |     # Work out where the messages start so that they are vertically centered
53 |     total_messages = len(messages)
54 |     visible_messages = min(height - 5, total_messages)  # maximum number of messages the screen can show
55 |     start_row = max(2, (height // 2) - (visible_messages // 2))
56 |
57 |     # Determine the range of messages to display
58 |     start_index = min(
59 |         max(0, center_row - (visible_messages // 2)), total_messages - visible_messages
60 |     )
61 |     end_index = min(total_messages, start_index + visible_messages)
62 |
63 |     for i in range(start_index, end_index):
64 |         message = messages[i]
65 |         draw_line(
66 |             stdscr,
67 |             message.text + (" <=" if message.highlighted else ""),
68 |             start_row + (i - start_index),
69 |         )
70 |
71 |
72 | def multi_select(stdscr, options, title, subtitle="", checked=None):
73 |     # curses.curs_set(0)  # hide the cursor
74 |     if checked is None:
75 |         checked = [False] * len(options)
76 |     else:
77 |         checked = [bool(c) for c in checked]
78 |     current_row = 0
79 |     while True:
80 |         draw_menu(stdscr, options, checked, title, subtitle, current_row)
81 |         key = stdscr.getch()
82 |
83 |         if key == curses.KEY_DOWN:
84 |             current_row = (current_row + 1) % len(options)  # move down, wrapping around
85 |         elif key == curses.KEY_UP:
86 |             current_row = (current_row - 1) % len(options)  # move up, wrapping around
87 |         elif key == ord("q"):
88 |             sys.exit()  # quit the program
89 |         elif key == ord(" "):
90 |             checked[current_row] = not checked[current_row]  # toggle the current row's checkbox
91 |         elif key == curses.KEY_ENTER or key in [10, 13]:
92 |             break
93 |
94 |     # Collect the selection
95 |     return [i for i, c in enumerate(checked) if c]
96 |
97 |
98 | def config(stdscr):
99 |     global \
100 |         videoList, \
101 |         courseName, \
102 |         professor, \
103 |         selected_videos, \
104 |         selected_signal, \
105 |         download_audio
106 |
107 |     height, width = get_cmd_window_size(stdscr)
108 |
109 |     # Enable echo
110 |     curses.echo()
111 |     # Initialize color support
112 |     curses.start_color()
113 |     # Set up a color pair
114 |     curses.init_pair(1, curses.COLOR_BLACK, curses.COLOR_WHITE)
115 |     # Set up the window
116 |     stdscr.clear()
117 |     stdscr.refresh()
118 |
119 |     # stdscr.border(0)
120 |     # Prompt the user for input
121 |     url_base = "https://www.yanhekt.cn/course/"
122 |
123 |     draw_line(stdscr, "请输入课程编号(回车退出):", 0)
124 |
125 |     draw_line(stdscr, f"{url_base}", 1)
126 |
127 |     # Wait for the user to type a string, echoing it as they type
128 |     courseID = stdscr.getstr().decode("utf-8")
129 |     if not courseID:
130 |         sys.exit()
131 |
132 |     if not utils.read_auth() or not utils.test_auth(courseID=courseID):
133 |         stdscr.clear()
134 |         for i, line in enumerate(utils.auth_prompt()):
135 |             draw_line(stdscr, line, i)
136 |         auth = stdscr.getstr().decode("utf-8")
137 |         utils.write_auth(auth)
138 |         if not utils.test_auth(courseID=courseID):
139 |             stdscr.clear()
140 |             draw_line(stdscr, "身份验证失败", 0)
141 |             stdscr.getch()
142 |             sys.exit()
143 |     videoList, courseName, professor = utils.get_course_info(courseID=courseID)
144 |
145 |     selected_videos = []
146 |
147 |     while True:
148 |         selected_videos = multi_select(
149 |             stdscr,
150 |             [v["title"] for v in videoList],
151 |             f"课程名:{courseName},请选择要下载的视频:",
152 |         )
153 |         if not selected_videos:
154 |             stdscr.clear()
155 |             draw_line(stdscr, "请至少选择一个视频,按回车继续", 0)
156 |             stdscr.getch()
157 |         else:
158 |             break
159 |
160 |     selected_signal = []
161 |
162 |     while True:
163 |         selected_signal = multi_select(
164 |             stdscr, ["摄像头", "电脑屏幕"], "选择要下载的信号:"
165 |         )
166 |         if not selected_signal:
167 |             stdscr.clear()
168 |             draw_line(stdscr, "请至少选择一个信号,按回车继续", 0)
169 |             stdscr.getch()
170 |         else:
171 |             break
172 |
173 |     download_audio = multi_select(
174 |         stdscr,
175 |         ["下载蓝牙音频"],
176 |         "选择是否下载教室蓝牙话筒的音频(如果有的话):",
177 |         "若教师未使用教室蓝牙话筒则该音频无声音",
178 |         checked=[True],
179 |     )
180 |
181 |     stdscr.clear()
182 |
183 |
184 | def get_cmd_window_size(stdscr):
185 |     return stdscr.getmaxyx()
186 |
187 |
188 | @utils.print_help
189 | def main():
190 |     global align
191 |     align = 25
192 |     curses.wrapper(config)
193 |
194 |     fail = []
195 |     for i in selected_videos:
196 |         c = videoList[i]
197 |         name = courseName + "-" + professor + "-" + c["title"]
198 |         print(name)
199 |         try:
200 |             if 1 in selected_signal:
201 |                 path = f"output/{courseName}-screen"
202 |                 m3u8dl.M3u8Download(c["videos"][0]["vga"], path, name)
203 |             if 0 in selected_signal:
204 |                 path = f"output/{courseName}-video"
205 |                 m3u8dl.M3u8Download(c["videos"][0]["main"], path, name)
206 |             if download_audio and c["video_ids"]:
207 |                 audio_url = utils.get_audio_url(c["video_ids"][0])
208 |                 if audio_url:
209 |                     print("Downloading audio...")
210 |                     utils.download_audio(audio_url, path, name)
211 |                     print("Download audio successfully.")
212 |         except Exception as e:
213 |             print(e)
214 |             fail.append(name)
215 |             input(f"下载{name}失败,按回车键开始下一个")
216 |     if fail:
217 |         print("以下视频下载失败:")
218 |         for f in fail:
219 |             print(f)
220 |         input("按回车键退出")
221 |     else:
222 |         input("下载结束,按回车键退出")
223 |
224 |
225 | if __name__ == "__main__":
226 |     main()
227 |
--------------------------------------------------------------------------------
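The scrolling arithmetic in `draw_multi_select` in `gui.py` is easier to follow, and to test without a terminal, when it is separated from the curses calls. The sketch below lifts that arithmetic into a pure function; `visible_window` is a hypothetical name, and the `height - 5` reserve for the title and footer rows is copied from the code above.

```python
from typing import Tuple


def visible_window(total: int, height: int, center_row: int) -> Tuple[int, int]:
    """Return the [start, end) slice of items to draw around center_row."""
    # Rows left for items once the title and footer lines are reserved.
    visible = min(height - 5, total)
    # Center on the cursor, but clamp so the window never runs past either end.
    start = min(max(0, center_row - visible // 2), total - visible)
    end = min(total, start + visible)
    return start, end


print(visible_window(total=100, height=25, center_row=50))
```

With 100 items on a 25-row screen, the window holds 20 items and follows the cursor until it reaches the first or last 10 entries, where it stops scrolling.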
/hooks/hook-whisper.py:
--------------------------------------------------------------------------------
1 | from PyInstaller.utils.hooks import collect_data_files
2 |
3 | datas = collect_data_files("whisper")
4 |
--------------------------------------------------------------------------------
/hooks/hook-zhconv.py:
--------------------------------------------------------------------------------
1 | from PyInstaller.utils.hooks import collect_data_files
2 |
3 | datas = collect_data_files("zhconv")
4 |
--------------------------------------------------------------------------------
/m3u8dl.py:
--------------------------------------------------------------------------------
1 | import base64
2 | import os
3 | import queue
4 | import re
5 | import signal
6 | import sys
7 | import time
8 | from concurrent.futures import ThreadPoolExecutor
9 | from subprocess import run
10 |
11 | import requests
12 | import urllib3
13 |
14 | import utils
15 |
16 |
17 | class ThreadPoolExecutorWithQueueSizeLimit(ThreadPoolExecutor):
18 |     """
19 |     Thread pool with a bounded work queue.
20 |     The queue size is twice the number of worker threads.
21 |     """
22 |
23 |     def __init__(self, max_workers=None, *args, **kwargs):
24 |         super().__init__(max_workers, *args, **kwargs)
25 |         self._work_queue = queue.Queue(max_workers * 2)
26 |
27 |
28 | def make_sum():
29 |     ts_num = 0
30 |     while True:
31 |         yield ts_num
32 |         ts_num += 1
33 |
34 |
35 | def dummy_func(downloaded, total, merge_status):
36 |     return
37 |
38 |
39 | class M3u8Download:
40 |     """
41 |     :param url: full URL of the m3u8 file, e.g. "https://www.bilibili.com/example/index.m3u8"
42 |     :param name: file name to save the download under, e.g. "index"
43 |     :param max_workers: maximum number of worker threads
44 |     :param num_retries: number of retries
45 |     :param base64_key: base64-encoded key string
46 |     """
47 |
48 |     def __init__(
49 |         self,
50 |         url,
51 |         workDir,
52 |         name,
53 |         max_workers=32,
54 |         num_retries=99,
55 |         base64_key=None,
56 |         progress_callback=dummy_func,
57 |     ):
58 |         self._url = url
59 |         self._token = None
60 |         self._workDir = workDir
61 |         self._name = name
62 |         self._max_workers = max_workers
63 |         self._num_retries = num_retries
64 |         self._progress_callback = progress_callback
65 |         if not os.path.exists(os.path.join(os.getcwd(), self._workDir)):
66 |             os.makedirs(os.path.join(os.getcwd(), self._workDir))
67 |         self._file_path = os.path.join(os.getcwd(), self._workDir, self._name)
68 |         if os.path.exists(self._file_path + ".mp4"):
69 |             print(f"File '{self._file_path}.mp4' already exists, skip download")
70 |             self._progress_callback(100, 100, 2)
71 |             return
72 |         self._front_url = None
73 |         self._ts_url_list = []
74 |         self._success_sum = 0
75 |         self._ts_sum = 0
76 |         self._key = base64.b64decode(base64_key.encode()) if base64_key else None
77 |         self._headers = {
78 |             "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.82 Safari/537.36 Edg/93.0.961.52",
79 |             "Origin": "https://www.yanhekt.cn",
80 |             "referer": "https://www.yanhekt.cn/",
81 |         }
82 |         self.timestamp, self.signature = utils.getSignature()
83 |         urllib3.disable_warnings()
84 |
85 |         self._url = utils.encryptURL(self._url)
86 |
87 |         self.get_m3u8_info(self._url, self._num_retries)
88 |
89 |         def signal_handler(sig, frame):
90 |             print("Caught KeyboardInterrupt. Shutting down...")
91 |             os._exit(1)
92 |
93 |         signal.signal(signal.SIGINT, signal_handler)
94 |         print(f"Downloading: {self._name}", f"Save path: {self._file_path}", sep="\n")
95 |         with ThreadPoolExecutorWithQueueSizeLimit(self._max_workers) as pool:
96 |             pool.submit(self.updateSignatureLoop)
97 |             for k, ts_url in enumerate(self._ts_url_list):
98 |                 pool.submit(
99 |                     self.download_ts,
100 |                     ts_url,
101 |                     # The `.ts` extension is mandatory for FFmpeg 7.1.1+.
102 |                     # https://git.ffmpeg.org/gitweb/ffmpeg.git/commit/b753bac08f6881b2d3dea8f1ab84c81550f35897
103 |                     # https://git.ffmpeg.org/gitweb/ffmpeg.git/commit/6c4e56f07d1a703435854f2156c881885f7798da
104 |                     os.path.join(self._file_path, f"{k}.ts"),
105 |                     self._num_retries,
106 |                 )
107 |         if self._success_sum == self._ts_sum:
108 |             self._progress_callback(self._success_sum, self._ts_sum, 1)
109 |             self.output_mp4()
110 |             self.delete_file()
111 |             print(f"Download successfully --> {self._name}")
112 |             self._progress_callback(self._success_sum, self._ts_sum, 2)
113 |
114 |     def updateSignatureLoop(self):
115 |         while self._success_sum != self._ts_sum:
116 |             self.timestamp, self.signature = utils.getSignature()
117 |             time.sleep(10)
118 |
119 |     def get_m3u8_info(self, m3u8_url: str, num_retries: int) -> None:
120 |         """
121 |         Fetch and parse the m3u8 playlist.
122 |         """
123 |
124 |         if not self._token:
125 |             self._token = utils.getToken()
126 |         token = self._token
127 |         url = utils.add_signature_for_url(
128 |             m3u8_url, token, self.timestamp, self.signature
129 |         )
130 |         try:
131 |             with requests.get(
132 |                 url, timeout=(3, 30), verify=False, headers=self._headers
133 |             ) as res:
134 |                 if res.status_code != 200:
135 |                     raise Exception(f"Failed to get m3u8 info: {res.status_code}")
136 |                 self._front_url = res.url.split(res.request.path_url)[0]
137 |                 if "EXT-X-STREAM-INF" in res.text:  # treat as a top-level (master) M3U8 file
138 |                     for line in res.text.split("\n"):
139 |                         if "#" in line:
140 |                             continue
141 |                         elif line.startswith("http"):
142 |                             self._url = line
143 |                         elif line.startswith("/"):
144 |                             self._url = self._front_url + line
145 |                         else:
146 |                             self._url = self._url.rsplit("/", 1)[0] + "/" + line
147 |                     self.get_m3u8_info(self._url, self._num_retries)
148 |                 else:
149 |                     m3u8_text_str = res.text
150 |                     self.get_ts_url(m3u8_text_str)
151 |         except Exception as e:
152 |             print(e)
153 |             if num_retries > 0:
154 |                 self.get_m3u8_info(m3u8_url, num_retries - 1)
155 |
156 |     def get_ts_url(self, m3u8_text_str: str) -> None:
157 |         """
158 |         Collect the URL of every .ts segment.
159 |         """
160 |         if not os.path.exists(self._file_path):
161 |             os.mkdir(self._file_path)
162 |         new_m3u8_str = ""
163 |         ts = make_sum()
164 |         for line in m3u8_text_str.split("\n"):
165 |             if "#" in line:
166 |                 if "EXT-X-KEY" in line and "URI=" in line:
167 |                     if os.path.exists(os.path.join(self._file_path, "key")):
168 |                         continue
169 |                     key = self.download_key(line, 5)
170 |                     if key:
171 |                         new_m3u8_str += f"{key}\n"
172 |                         continue
173 |                 new_m3u8_str += f"{line}\n"
174 |                 if "EXT-X-ENDLIST" in line:
175 |                     break
176 |             else:
177 |                 if line.startswith("http"):
178 |                     self._ts_url_list.append(line)
179 |                 elif line.startswith("/"):
180 |                     self._ts_url_list.append(self._front_url + line)
181 |                 else:
182 |                     self._ts_url_list.append(self._url.rsplit("/", 1)[0] + "/" + line)
183 |                 new_m3u8_str += os.path.join(self._file_path, f"{next(ts)}.ts") + "\n"
184 |         self._ts_sum = next(ts)
185 |         with open(self._file_path + ".m3u8", "wb") as f:
186 |             f.write(new_m3u8_str.encode("utf-8"))
187 |
188 |     def download_ts(self, ts_url_original: str, name: str, num_retries: int) -> None:
189 |         """
190 |         Download a .ts segment.
191 |         """
192 |         if not self._token:
193 |             self._token = utils.getToken()
194 |         token = self._token
195 |         ts_url = utils.add_signature_for_url(
196 |             ts_url_original.split("\n")[0], token, self.timestamp, self.signature
197 |         )
198 |         try:
199 |             if not os.path.exists(name):
200 |                 with requests.get(
201 |                     ts_url,
202 |                     stream=True,
203 |                     timeout=(5, 60),
204 |                     verify=False,
205 |                     headers=self._headers,
206 |                 ) as res:
207 |                     if res.status_code == 200:
208 |                         with open(name, "wb") as ts:
209 |                             for chunk in res.iter_content(chunk_size=1024):
210 |                                 if chunk:
211 |                                     ts.write(chunk)
212 |                         self._success_sum += 1
213 |                         sys.stdout.write(
214 |                             "\r[%-25s](%d/%d)"
215 |                             % (
216 |                                 "*" * (100 * self._success_sum // self._ts_sum // 4),
217 |                                 self._success_sum,
218 |                                 self._ts_sum,
219 |                             )
220 |                         )
221 |                         sys.stdout.flush()
222 |                     elif num_retries > 0:  # stop retrying once the budget is used up
223 |                         self.download_ts(ts_url_original, name, num_retries - 1)
224 |             else:
225 |                 self._success_sum += 1
226 |
227 |             self._progress_callback(self._success_sum, self._ts_sum, 0)
228 |         except Exception:
229 |             if os.path.exists(name):
230 |                 os.remove(name)
231 |             if num_retries > 0:
232 |                 self.download_ts(ts_url_original, name, num_retries - 1)
233 |
234 |     def download_key(self, key_line, num_retries):
235 |         """
236 |         Download the key file.
237 |         """
238 |         mid_part = re.search(r"URI=[\'|\"].*?[\'|\"]", key_line).group()
239 |         may_key_url = mid_part[5:-1]
240 |         if self._key:
241 |             with open(os.path.join(self._file_path, "key"), "wb") as f:
242 |                 f.write(self._key)
243 |             return f'{key_line.split(mid_part)[0]}URI="./{self._name}/key"'
244 |         if may_key_url.startswith("http"):
245 |             true_key_url = may_key_url
246 |         elif may_key_url.startswith("/"):
247 |             true_key_url = self._front_url + may_key_url
248 |         else:
249 |             true_key_url = self._url.rsplit("/", 1)[0] + "/" + may_key_url
250 |         try:
251 |             with requests.get(
252 |                 true_key_url, timeout=(5, 30), verify=False, headers=self._headers
253 |             ) as res:
254 |                 with open(os.path.join(self._file_path, "key"), "wb") as f:
255 |                     f.write(res.content)
256 |             return f'{key_line.split(mid_part)[0]}URI="./{self._name}/key"{key_line.split(mid_part)[-1]}'
257 |         except Exception as e:
258 |             print(e)
259 |             if os.path.exists(os.path.join(self._file_path, "key")):
260 |                 os.remove(os.path.join(self._file_path, "key"))
261 |             print("加密视频,无法加载key,解密失败")
262 |             if num_retries > 0:
263 |                 return self.download_key(key_line, num_retries - 1)
264 |
265 |     def output_mp4(self) -> None:
266 |         """
267 |         Merge the .ts segments into an mp4 video; requires ffmpeg.
268 |         """
269 |         run(
270 |             [
271 |                 "ffmpeg",
272 |                 "-i", f"{self._file_path}.m3u8",
273 |                 "-acodec", "copy",
274 |                 "-vcodec", "copy",
275 |                 "-f", "mp4",
276 |                 f"{self._file_path}.mp4",
277 |             ],
278 |             check=True,
279 |         )
280 |
281 |     def delete_file(self):
282 |         file = os.listdir(self._file_path)
283 |         for item in file:
284 |             os.remove(os.path.join(self._file_path, item))
285 |         os.removedirs(self._file_path)
286 |         os.remove(self._file_path + ".m3u8")
287 |
--------------------------------------------------------------------------------
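`get_m3u8_info`, `get_ts_url` and `download_key` in `m3u8dl.py` each repeat the same three-way branch for playlist entries: absolute (`http...`), root-relative (`/...`), or relative to the playlist's directory. A possible consolidation, sketched here under the assumption that standard URL resolution is the intended behavior, is the standard library's `urllib.parse.urljoin`. `resolve_playlist_entry` is a hypothetical name, and the `example.com` URLs are placeholders rather than real endpoints.

```python
from urllib.parse import urljoin


def resolve_playlist_entry(playlist_url: str, entry: str) -> str:
    """Resolve one playlist line against the URL the playlist was fetched from."""
    # urljoin covers all three cases: absolute, root-relative, and relative.
    return urljoin(playlist_url, entry.strip())


base = "https://example.com/a/b/index.m3u8"  # placeholder playlist URL
print(resolve_playlist_entry(base, "seg0.ts"))
print(resolve_playlist_entry(base, "/x/seg1.ts"))
print(resolve_playlist_entry(base, "https://cdn.example.com/seg2.ts"))
```

One behavioral difference to keep in mind: `urljoin` drops the query string of the base URL when resolving a relative entry, whereas the string splitting in `m3u8dl.py` operates on the whole URL text, so the two may disagree for playlist URLs that carry query parameters.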
/main.py:
--------------------------------------------------------------------------------
1 | import os
2 | import sys
3 |
4 | import m3u8dl
5 | import utils
6 |
7 | headers = {
8 |     "Origin": "https://www.yanhekt.cn",
9 |     "xdomain-client": "web_user",
10 |     "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36 Edg/107.0.1418.26",
11 | }
12 |
13 |
14 | @utils.print_help
15 | def main():
16 |     if len(sys.argv) == 1:
17 |         courseID = input("输 入 课 程 ID: ")
18 |     else:
19 |         courseID = sys.argv[1]
20 |
21 |     if not utils.read_auth() or not utils.test_auth(courseID=courseID):
22 |         auth = input("。".join(utils.auth_prompt()))
23 |         utils.write_auth(auth)
24 |         if not utils.test_auth(courseID=courseID):
25 |             print("身份验证失败")
26 |             sys.exit()
27 |     videoList, courseName, professor = utils.get_course_info(courseID=courseID)
28 |
29 |     print(f"课 程 名: {courseName}")
30 |
31 |     for i, c in enumerate(videoList):
32 |         print(f"[{i}]: ", c["title"])
33 |
34 |     # Parse the comma-separated indices with int() instead of eval()
35 |     raw = input("选 择 课 程 编 号 (用 英 文 逗 号 ','分 隔, 例 如: 0,2,4): ")
36 |     index = [int(s) for s in raw.split(",") if s.strip()]
37 |     vga = input(
38 |         "选 择 下 载 摄 像 头 (1) 还 是 电 脑 屏 幕 (2)?(输 入 1 或 2, 默 认 摄 像 头):"
39 |     )
40 |     audio = input(
41 |         "是 否 下 载 教 室 蓝 牙 话 筒 的 音 频 ?若 教 师 未 使 用 蓝 牙 话 筒 则 该 音 频 无 声 音 (输 入 1不 下 载, 默 认 下 载):"
42 |     )
43 |     if not os.path.exists("output/"):
44 |         os.mkdir("output/")
45 |     for i in index:
46 |         c = videoList[i]
47 |         name = courseName + "-" + professor + "-" + c["title"]
48 |         print(name)
49 |         if vga == "2":
50 |             path = f"output/{courseName}-screen"
51 |             print("Downloading screen...")
52 |             m3u8dl.M3u8Download(c["videos"][0]["vga"], path, name)
53 |         else:
54 |             path = f"output/{courseName}-video"
55 |             print("Downloading video...")
56 |             m3u8dl.M3u8Download(c["videos"][0]["main"], path, name)
57 |         if audio == "" and c["video_ids"]:
58 |             audio_url = utils.get_audio_url(c["video_ids"][0])
59 |             if audio_url:
60 |                 print("Downloading audio...")
61 |                 utils.download_audio(audio_url, path, name)
62 |                 print("Download audio successfully.")
63 |
64 |
65 | if __name__ == "__main__":
66 |     main()
67 |
--------------------------------------------------------------------------------
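`main.py` asks for a comma-separated list of video indices and then uses each one to index `videoList`. A small validating parser, sketched below, could reject bad input before any download starts. `parse_indices` is a hypothetical helper that does not exist in the repository; accepting the full-width comma `,` alongside the ASCII comma is an added assumption, since the prompt only mentions ASCII commas.

```python
from typing import List


def parse_indices(raw: str, count: int) -> List[int]:
    """Parse '0,2,4' into [0, 2, 4], checking each index against count."""
    indices = []
    # Assumption: also tolerate the full-width comma often typed with a Chinese IME.
    for part in raw.replace(",", ",").split(","):
        part = part.strip()
        if not part:
            continue  # ignore empty fields such as a trailing comma
        value = int(part)  # raises ValueError on non-numeric input
        if not 0 <= value < count:
            raise ValueError(f"index {value} is outside 0..{count - 1}")
        indices.append(value)
    return indices


print(parse_indices("0, 2,4", 5))
```

Raising `ValueError` for both non-numeric and out-of-range input lets a caller catch a single exception type and ask the user to try again.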
/md/README/image-20230926124922726.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AuYang261/BIT_yanhe_download/e173cf593f4802fcc4401e71d6edf43bb0e65541/md/README/image-20230926124922726.png
--------------------------------------------------------------------------------
/md/README/image-20231018204208066.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AuYang261/BIT_yanhe_download/e173cf593f4802fcc4401e71d6edf43bb0e65541/md/README/image-20231018204208066.png
--------------------------------------------------------------------------------
/md/README/image-20240409095211597.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AuYang261/BIT_yanhe_download/e173cf593f4802fcc4401e71d6edf43bb0e65541/md/README/image-20240409095211597.png
--------------------------------------------------------------------------------
/md/README/image-20240409095831766.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AuYang261/BIT_yanhe_download/e173cf593f4802fcc4401e71d6edf43bb0e65541/md/README/image-20240409095831766.png
--------------------------------------------------------------------------------
/md/README/image-20240409105228362.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AuYang261/BIT_yanhe_download/e173cf593f4802fcc4401e71d6edf43bb0e65541/md/README/image-20240409105228362.png
--------------------------------------------------------------------------------
/md/README/image-20240409131033038.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AuYang261/BIT_yanhe_download/e173cf593f4802fcc4401e71d6edf43bb0e65541/md/README/image-20240409131033038.png
--------------------------------------------------------------------------------
/md/README/image-20240413001454717.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AuYang261/BIT_yanhe_download/e173cf593f4802fcc4401e71d6edf43bb0e65541/md/README/image-20240413001454717.png
--------------------------------------------------------------------------------
/md/README/image-20240413001734218.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AuYang261/BIT_yanhe_download/e173cf593f4802fcc4401e71d6edf43bb0e65541/md/README/image-20240413001734218.png
--------------------------------------------------------------------------------
/md/README/image-20240413002004628.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AuYang261/BIT_yanhe_download/e173cf593f4802fcc4401e71d6edf43bb0e65541/md/README/image-20240413002004628.png
--------------------------------------------------------------------------------
/md/README/image-20240413002242979.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AuYang261/BIT_yanhe_download/e173cf593f4802fcc4401e71d6edf43bb0e65541/md/README/image-20240413002242979.png
--------------------------------------------------------------------------------
/md/README/image-20240529171253980.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AuYang261/BIT_yanhe_download/e173cf593f4802fcc4401e71d6edf43bb0e65541/md/README/image-20240529171253980.png
--------------------------------------------------------------------------------
/md/README/image-20240529171540279.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AuYang261/BIT_yanhe_download/e173cf593f4802fcc4401e71d6edf43bb0e65541/md/README/image-20240529171540279.png
--------------------------------------------------------------------------------
/md/README/image-20240529171709402.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AuYang261/BIT_yanhe_download/e173cf593f4802fcc4401e71d6edf43bb0e65541/md/README/image-20240529171709402.png
--------------------------------------------------------------------------------
/md/README/image-20240809182344017.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AuYang261/BIT_yanhe_download/e173cf593f4802fcc4401e71d6edf43bb0e65541/md/README/image-20240809182344017.png
--------------------------------------------------------------------------------
/md/README/image-20240809182406184.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AuYang261/BIT_yanhe_download/e173cf593f4802fcc4401e71d6edf43bb0e65541/md/README/image-20240809182406184.png
--------------------------------------------------------------------------------
/md/README/image-20240809182413373.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AuYang261/BIT_yanhe_download/e173cf593f4802fcc4401e71d6edf43bb0e65541/md/README/image-20240809182413373.png
--------------------------------------------------------------------------------
/md/README/image-20240809182420653.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AuYang261/BIT_yanhe_download/e173cf593f4802fcc4401e71d6edf43bb0e65541/md/README/image-20240809182420653.png
--------------------------------------------------------------------------------
/md/README/image-20240809183350633.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AuYang261/BIT_yanhe_download/e173cf593f4802fcc4401e71d6edf43bb0e65541/md/README/image-20240809183350633.png
--------------------------------------------------------------------------------
/requirements-whisper.txt:
--------------------------------------------------------------------------------
1 | openai-whisper
2 | zhconv
3 |
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | requests
2 | windows-curses; sys_platform == "win32"
3 | Flask
--------------------------------------------------------------------------------
/templates/index.html:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 |