├── .gitignore ├── .gitattributes ├── figure ├── 0.png ├── 1.png ├── 2.png ├── favicon.ico └── title.jpg ├── config.json ├── config_example.json ├── LICENSE ├── README.md ├── download_parallel.py └── download.py /.gitignore: -------------------------------------------------------------------------------- 1 | *.exe 2 | *.json 3 | *.zip 4 | *.txt 5 | */ 6 | **/test -------------------------------------------------------------------------------- /.gitattributes: -------------------------------------------------------------------------------- 1 | # Auto detect text files and perform LF normalization 2 | * text=auto 3 | -------------------------------------------------------------------------------- /figure/0.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/EricZhu-42/PedagogySquare_Downloader/HEAD/figure/0.png -------------------------------------------------------------------------------- /figure/1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/EricZhu-42/PedagogySquare_Downloader/HEAD/figure/1.png -------------------------------------------------------------------------------- /figure/2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/EricZhu-42/PedagogySquare_Downloader/HEAD/figure/2.png -------------------------------------------------------------------------------- /figure/favicon.ico: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/EricZhu-42/PedagogySquare_Downloader/HEAD/figure/favicon.ico -------------------------------------------------------------------------------- /figure/title.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/EricZhu-42/PedagogySquare_Downloader/HEAD/figure/title.jpg 
-------------------------------------------------------------------------------- /config.json: -------------------------------------------------------------------------------- 1 | { 2 | "username": "your_username", 3 | "password": "your_password", 4 | "ext_expel_list": [], 5 | "cid_expel_list": [], 6 | "cid_include_list": [], 7 | "save_path": "", 8 | "keep_dirs": false 9 | } -------------------------------------------------------------------------------- /config_example.json: -------------------------------------------------------------------------------- 1 | { 2 | "username": "13500000000", 3 | "password": "somepassword", 4 | "ext_expel_list": ["mp4", "pdf"], 5 | "cid_include_list": [], 6 | "cid_expel_list": [12102], 7 | "save_path" : "", 8 | "keep_dirs": false 9 | } -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2020 EricZhu-42 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 |


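18 |
19 | # Pedagogy Square (教学立方) Courseware Downloader
20 |
21 | A batch courseware download script for the online teaching platform [Pedagogy Square (教学立方)](https://teaching.applysquare.com), built on **Python** + **Requests**
22 |
23 | > Created: 2020-03-30
24 | > Updated: 2022-03-06
25 |
26 | **✨ [2025/02/16] [@TwinklerG](https://github.com/TwinklerG) maintains a [Rust version](https://github.com/TwinklerG/PedagogySquare-Downloader-rs) of this project that supports multi-threaded parallel downloads. Feel free to try it.**
27 |
28 | ## Download
29 |
30 | **Updated: June 21, 2021**
31 |
32 | We recommend [downloading](https://github.com/EricZhu-42/PedagogySquare_Downloader/releases/download/v1.7/PedagogySquare_Downloader_20210621.zip) from the GitHub release; if the connection is slow, you can also [download](https://box.nju.edu.cn/f/4214a2459d4347df9394/?dl=1) from the mirror hosted in mainland China.
33 |
34 |
35 | ## Changelog
36 |
37 | ### February 19, 2025: Concurrency support
38 |
39 | Added `download_parallel.py`, a concurrent version of the download script that uses `tqdm` to display progress bars
40 |
41 | ### June 21, 2021: Feature update
42 |
43 | Added the `cid_include_list` option to `config.json`; when it is non-empty, only the listed courses are downloaded.
44 |
45 | ### April 26, 2021: Feature update
46 |
47 | Added the `keep_dirs` option to `config.json`, which controls whether the folder structure of the courseware is preserved when downloading (default: `false`)
48 |
49 | ### March 5, 2021: Feature update
50 |
51 | To make courseware easier to manage, `config.json` now has a `save_path` option (absolute or relative path) that sets the directory where downloaded files are saved.
52 |
53 | See the "Configuration File Reference" (配置文件说明) section below to change the download directory (empty by default, meaning files are saved in the same directory as the script)
54 |
55 | ### March 1, 2021: Packaged tool for Windows
56 |
57 | **TL;DR: this version is packaged, so it now works on Windows without installing a Python environment**
58 |
59 | Starting March 1, 2021, releases include a packaged **standalone executable** for **Windows** users. **The packaged download tool works without installing or configuring a Python environment.** To use it:
60 |
61 | 1. Download and extract the latest release package, then open the extracted folder
62 | 2. Following the "Minimal Configuration" (最简配置方案) section below, edit `config.json` and fill in your username, password, and other details
63 | 3. Double-click `run.bat` and wait for the courseware download to finish
64 |
65 | In addition, the new version of the script streamlines the configuration logic of `config.json`; reconfiguring according to the "Configuration File Reference" (配置文件说明) section below is recommended.
66 |
67 | > If the updated script does not work properly, try rolling back to the [previous version](https://github.com/EricZhu-42/PedagogySquare_Downloader/releases/download/v1.4_stable/PedagogySquare_Downloader_20200914.zip) to see whether the problem goes away, and contact the developer to report the issue. Thank you!
68 |
69 | ## Features
70 |
71 | 1. **One-click download** of all courseware for every course, quick and convenient
72 | 2. **Downloads courseware, videos, and other content that is not directly open for download**
73 | 3. 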
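**Deeply configurable** filtering by course and by file extension
74 |
75 | > This program is intended to help students download courseware and related teaching materials from the Pedagogy Square (教学立方) platform and to eliminate the repetitive work of downloading files
76 | > Please respect teachers' intellectual property and the fruits of their labor. Unless the teacher has given permission, do not distribute the downloaded files on the internet
77 | > If this program infringes on your rights, please contact the author to have the relevant code removed
78 |
79 | ## Requirements
80 |
81 | The environment and third-party module versions used during development are:
82 |
83 | - **Python** = 3.7.4
84 | - **Requests** = 2.22.0
85 | - **tqdm** = 4.66.2 (optional, used by the concurrent download version)
86 |
87 | To run the script properly, install Python and Requests at versions **no lower than** those listed above
88 |
89 | > Also: testing shows that **the script runs properly in terminal emulators on mobile devices**; for running Python scripts on Android, see the [tools and tutorial](https://orangex.orangex4.cool/) provided by [@OrangeX4](https://github.com/OrangeX4).
90 |
91 | ## Usage
92 |
93 | ### 1. Set up the environment (see other tutorials)
94 |
95 | 1. Install a suitable version of Python
96 |
97 | 2. Install a suitable version of the Python module **Requests** (managing it with [Anaconda](https://www.anaconda.com/) is recommended)
98 |
99 |
100 | ### 2. Edit the configuration file
101 |
102 | Edit `config.json` and fill in your username, password, and other details
103 |
104 | > For how to edit the configuration file, see the "Minimal Configuration" section
105 | > For a description of each parameter in the configuration file, see the "Configuration File Reference" section
106 |
107 | ### 3. Run the script
108 |
109 | Run `download.py`
110 |
111 | > Note: if downloads are very slow, the connection to the Pedagogy Square site may be unstable; try running the script again.
112 |
113 | ## Project structure
114 |
115 | | File                | Purpose                                 |
116 | | ------------------- | --------------------------------------- |
117 | | figure/             | Images used in this README              |
118 | | download.py         | Script entry point                      |
119 | | config.json         | Configuration file for run parameters   |
120 | | config_example.json | Sample configuration file for reference |
121 |
122 | ## Minimal Configuration (最简配置方案)
123 |
124 | ```json
125 | {
126 |     "username": "your_username",
127 |     "password": "your_password",
128 |     "ext_expel_list": [],
129 |     "cid_expel_list": [],
130 |     "cid_include_list": [],
131 |     "save_path": "",
132 |     "keep_dirs": false
133 | }
134 | ```
135 |
136 | Replace `your_username` and `your_password` (**keep the surrounding double quotes**) with your **phone number** and your **Pedagogy Square login password**; the other parameters need no changes.
137 |
138 | > Make sure the JSON file is well-formed; you can use the provided `config_example.json` as a reference.
139 |
140 | ## Configuration File Reference (配置文件说明)
141 |
142 | The parameters in `config.json` are briefly described below:
143 |
144 | | Parameter            | Type | Meaning                                                 |
145 | | -------------------- | ---- | ------------------------------------------------------- |
146 | | username             | str  | Pedagogy Square login username (usually a phone number) |
147 | | password             | str  | Pedagogy Square login password                          |
148 | | ext_expel_list       | list | File extensions to exclude                              |
149 | | cid_expel_list       | list | Course IDs of courses to exclude                        |
150 | | cid_include_list     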
| list | Course IDs of courses to keep (when non-empty, only these courses are downloaded) |
151 | | save_path            | str  | Download directory (absolute or relative path) |
152 | | keep_dirs            | bool | Whether to preserve the folder structure of the files |
153 |
154 | ### Notes:
155 |
156 | 1. Example: to skip the course with ID `12345` and to skip `pdf` and `mp4` files, set:
157 |    - `ext_expel_list = ["pdf", "mp4"]`
158 |    - `cid_expel_list = [12345]`
159 | 2. Before changing the download directory `save_path`, **create the folder** at that location first; if the folder cannot be found, an error message is printed and the downloaded files are saved in the default directory (the program's root directory).
160 | 3. The course ID can be found in the URL of the course home page, for example:
161 |    ![](./figure/0.png)
162 |    The ID of the course shown in the figure is **8261**
163 |
164 |
165 | ## Copyright
166 |
167 | Contact email: zhuxinhao00@gmail.com
168 |
169 | This project is open source under the MIT License
170 |
--------------------------------------------------------------------------------
/download_parallel.py:
--------------------------------------------------------------------------------
1 | # -*- coding: UTF-8 -*-
2 | """
3 | @FileName: download_parallel.py
4 |
5 | @Author: zhuxinhao00@gmail.com
6 | @Author: twinklerchn@gmail.com
7 |
8 | @Create date: 2025/2/19
9 |
10 | @Modified date: 2025/2/19
11 |
12 | @description: A script to download files automatically from teaching.applysquare.com concurrently
13 | """
14 |
15 | import hashlib
16 | import json
17 | import os
18 | import pathlib
19 | import time
20 | from concurrent.futures import ThreadPoolExecutor
21 | from tqdm import tqdm
22 |
23 | import requests
24 | import urllib3
25 |
26 | urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
27 |
28 |
29 | # Get Hex-md5 encoded password
30 | def hex_md5_stringify(raw_str: str):
31 |     md5_encoder = hashlib.md5()
32 |     md5_encoder.update(str(raw_str).encode("utf-8"))
33 |     return md5_encoder.hexdigest()
34 |
35 |
36 | # Replace characters that are illegal in Windows filenames with spaces
37 | def filename_filter(name: str):
38 |     illegal_list = list(r'/\:*?”"<>|')  # raw string: "\:" is not a valid escape sequence
39 |     for char in illegal_list:
40 |         name = name.replace(char, " ")
41 |     return name
42 |
43 |
44 | def construct_attchment_list(sess, token, pid, uid, cid, parent_dir):
45 |     attachment_list = list()
46 |     attachment_info_url = attachment_url_fmt.format(token, pid, 1, uid,
cid) 47 | r = sess.get(attachment_info_url, verify=False) 48 | info = r.json()["message"] 49 | file_num = info.get("count") 50 | 51 | current_page = 1 52 | # Add attachment path to attachment_list 53 | while len(attachment_list) < file_num: 54 | current_url = attachment_url_fmt.format(token, pid, current_page, uid, cid) 55 | r = sess.get(current_url, verify=False) 56 | info = r.json()["message"] 57 | attachment_list.extend(info.get("list")) 58 | current_page += 1 59 | for entry in attachment_list: 60 | entry["parent_dir"] = parent_dir 61 | return attachment_list 62 | 63 | 64 | # Load config from config.json 65 | with open("config.json", "r", encoding="utf-8") as f: 66 | config = json.loads(f.read()) 67 | user_name = config.get("username") 68 | user_passwd = config.get("password") 69 | ext_expel_list = config.get("ext_expel_list") 70 | cid_include_list = list(map(str, config.get("cid_include_list", []))) 71 | cid_expel_list = list(map(str, config.get("cid_expel_list", []))) 72 | save_path = config.get("save_path", "") 73 | keep_dirs = config.get("keep_dirs", False) 74 | 75 | if save_path: 76 | try: 77 | os.chdir(save_path) 78 | except Exception as e: 79 | print( 80 | 'Changing save_path failed for reason "{}", using default path instead.'.format( 81 | e 82 | ) 83 | ) 84 | time.sleep(1) 85 | 86 | print("Files will be saved to ", os.getcwd()) 87 | 88 | # Some metadata 89 | login_url = r"https://teaching.applysquare.com/Api/User/ajaxLogin" 90 | attachment_url_fmt = r"https://teaching.applysquare.com/Api/CourseAttachment/getList/token/{}?parent_id={}&page={}&plan_id=-1&uid={}&cid={}" 91 | course_info_url_fmt = r"https://teaching.applysquare.com/Api/Public/getIndexCourseList/token/{}?type=1&usertype=1&uid={}" 92 | attachment_detail_url_fmt = r"https://teaching.applysquare.com/Api/CourseAttachment/ajaxGetInfo/token/{}?id={}&uid={}&cid={}" 93 | 94 | # Init Requests session 95 | sess = requests.Session() 96 | 97 | # Login in 98 | print("Trying to log in, please wait ...") 
99 | login_request = sess.post( 100 | login_url, 101 | data={"email": user_name, "password": hex_md5_stringify(user_passwd)}, 102 | verify=False, 103 | ) 104 | 105 | login_response = login_request.json() 106 | login_info = login_response["message"] 107 | 108 | try: 109 | token = login_info["token"] 110 | except TypeError: 111 | print("Login Failed, please check your username & password") 112 | print("Login info received: {}".format(login_info)) 113 | exit() 114 | 115 | uid = login_info["uid"] 116 | print("Login successfully!") 117 | 118 | cid2name_dict = dict() 119 | course_info_url = course_info_url_fmt.format(token, uid) 120 | r = sess.get(course_info_url, verify=False) 121 | info = r.json()["message"] 122 | for entry in info: 123 | cid2name_dict[entry.get("cid")] = entry.get("name") 124 | 125 | cid_list = cid2name_dict.keys() 126 | 127 | 128 | def check_cid(cid): 129 | if len(cid_include_list) and cid not in cid_include_list: 130 | return False 131 | return cid not in cid_expel_list 132 | 133 | 134 | print("\nReady to download the following courses:") 135 | for cid, cname in cid2name_dict.items(): 136 | if not check_cid(cid): 137 | continue 138 | print("Course: {:8s}, CID={:6}".format(cname, cid)) 139 | 140 | 141 | def download_cid(cid): 142 | cid = str(cid) # Prevent bug caused by wrong type of cid 143 | 144 | if not check_cid(cid): 145 | return 146 | 147 | try: 148 | course_name = filename_filter(cid2name_dict[cid]) 149 | except KeyError: 150 | print( 151 | "Can't find course name for cid {}, maybe it's a legacy course?".format(cid) 152 | ) 153 | course_name = "CID_{}".format(cid) 154 | print("\nDownloading files of course {}".format(course_name)) 155 | 156 | # Create dir for this course 157 | root = pathlib.Path(os.getcwd()) / course_name 158 | if not root.exists() or root.is_file(): 159 | os.makedirs(root) 160 | 161 | # Construct attachment list, with some dirs in it 162 | course_attachment_list = construct_attchment_list( 163 | sess=sess, token=token, 
pid=0, uid=uid, cid=cid, parent_dir=pathlib.Path(".")
164 |     )
165 |
166 |     # Iteratively add files in dirs to global attachment list
167 |     dir_counter = 0
168 |     for entry in course_attachment_list:
169 |         if entry.get("ext") == "dir":
170 |             dir_counter += 1
171 |             # Add dir content to attachment list
172 |             dir_id = entry.get("id")
173 |             dir_name = filename_filter(entry.get("title")) if keep_dirs else ""
174 |             parent_dir = entry.get("parent_dir")
175 |             if not (root / parent_dir / dir_name).exists():
176 |                 os.makedirs(root / parent_dir / dir_name)
177 |             course_attachment_list.extend(
178 |                 construct_attchment_list(
179 |                     sess=sess,
180 |                     token=token,
181 |                     pid=dir_id,
182 |                     uid=uid,
183 |                     cid=cid,
184 |                     parent_dir=parent_dir / dir_name,
185 |                 )
186 |             )
187 |
188 |     print(
189 |         "Get {:d} files with {:d} dirs".format(
190 |             len(course_attachment_list) - dir_counter, dir_counter
191 |         )
192 |     )
193 |
194 |     def download_entry(entry):
195 |         ext = entry.get("ext")
196 |         if (ext == "dir") or (ext in ext_expel_list):
197 |             return
198 |
199 |         if ext in entry.get("title"):
200 |             filename = filename_filter(entry.get("title"))
201 |         else:
202 |             filename = filename_filter("{}.{}".format(entry.get("title"), ext))
203 |         filepath = root / entry.get("parent_dir") / filename
204 |
205 |         filesize = entry.get("size")
206 |
207 |         # Get download url for un-downloadable files
208 |         if entry.get("can_download") == "0":
209 |             attachment_detail_url = attachment_detail_url_fmt.format(
210 |                 token, entry.get("id"), uid, cid
211 |             )
212 |             r = sess.get(attachment_detail_url, verify=False)
213 |             info = r.json()["message"]
214 |             entry["path"] = info.get("path")
215 |
216 |         # Streaming, so we can iterate over the response
217 |         response = requests.get(entry.get("path").replace("amp;", ""), stream=True)
218 |
219 |         try:
220 |             content_size = int(response.headers["content-length"])  # int(), not eval(): never evaluate server-supplied text
221 |         except Exception:
222 |             print(
223 |                 "Failed to get content length of file {}, please download it manually.".format(
224 | filename 225 | ) 226 | ) 227 | return 228 | 229 | if filepath.exists() and filepath.is_file(): 230 | # If file is up-to date, continue; else, delete and re-download 231 | if os.path.getsize(filepath) == content_size: 232 | print("File {:\u3000<20} is up-to-date".format(filename)) 233 | return 234 | else: 235 | print("Updating File {}".format(filename)) 236 | os.remove(filepath) 237 | 238 | chunk_size = min(content_size, 10240) 239 | 240 | with tqdm(total=content_size, unit="B", unit_scale=True, desc=f"Downloading {filename}") as progress_bar: 241 | with open(filepath, "wb") as file: 242 | for data in response.iter_content(chunk_size): 243 | progress_bar.update(len(data)) 244 | file.write(data) 245 | 246 | # Download attachments 247 | with ThreadPoolExecutor(max_workers=8) as exe: 248 | exe.map(download_entry, course_attachment_list) 249 | 250 | 251 | with ThreadPoolExecutor(max_workers=8) as exe: 252 | exe.map(download_cid, cid_list) 253 | 254 | print("Done!") 255 | -------------------------------------------------------------------------------- /download.py: -------------------------------------------------------------------------------- 1 | # -*- coding: UTF-8 -*- 2 | """ 3 | @FileName: download.py 4 | 5 | @Author:zhuxinhao00@gmail.com 6 | 7 | @Create date: 2020/03/31 8 | 9 | @Modified date: 2021/09/14 10 | 11 | @description: A script to download file automatically from teaching.applysquare.com 12 | """ 13 | 14 | import hashlib 15 | import json 16 | import logging 17 | import os 18 | import pathlib 19 | import re 20 | import time 21 | from contextlib import closing 22 | 23 | import requests 24 | import urllib3 25 | urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning) 26 | 27 | 28 | # Get Hex-md5 encoded password 29 | def hex_md5_stringify(raw_str:str): 30 | md5_encoder = hashlib.md5() 31 | md5_encoder.update(str(raw_str).encode('utf-8')) 32 | return md5_encoder.hexdigest() 33 | 34 | # Function dealing with illegal characters of windows 
filename
35 | def filename_filter(name:str):
36 |     illegal_list = list(r'/\:*?”"<>|')  # raw string: "\:" is not a valid escape sequence
37 |     for char in illegal_list:
38 |         name = name.replace(char, ' ')
39 |     return name
40 |
41 | def construct_attchment_list(sess, token, pid, uid, cid, parent_dir):
42 |     attachment_list = list()
43 |     attachment_info_url = attachment_url_fmt.format(token, pid, 1, uid, cid)
44 |     r = sess.get(attachment_info_url, verify=False)
45 |     info = r.json()['message']
46 |     file_num = info.get('count')
47 |
48 |     current_page = 1
49 |     # Add attachment path to attachment_list
50 |     while len(attachment_list) < file_num:
51 |         current_url = attachment_url_fmt.format(token, pid, current_page, uid, cid)
52 |         r = sess.get(current_url, verify=False)
53 |         info = r.json()['message']
54 |         attachment_list.extend(info.get('list'))
55 |         current_page += 1
56 |     for entry in attachment_list:
57 |         entry["parent_dir"] = parent_dir
58 |     return attachment_list
59 |
60 | # Load config from config.json
61 | with open('config.json', 'r', encoding='utf-8') as f:
62 |     config = json.loads(f.read())
63 |     user_name = config.get('username')
64 |     user_passwd = config.get('password')
65 |     ext_expel_list = config.get('ext_expel_list', [])  # default to [] so a missing key does not break the "in" check later
66 |     cid_include_list = list(map(str, config.get('cid_include_list', [])))
67 |     cid_expel_list = list(map(str, config.get('cid_expel_list', [])))
68 |     save_path = config.get('save_path', "")
69 |     keep_dirs = config.get('keep_dirs', False)
70 |
71 | if save_path:
72 |     try:
73 |         os.chdir(save_path)
74 |     except Exception as e:
75 |         print('Changing save_path failed for reason \"{}\", using default path instead.'.format(e))
76 |         time.sleep(1)
77 |
78 | print("Files will be saved to ", os.getcwd())
79 |
80 | # Some metadata
81 | login_url = r'https://teaching.applysquare.com/Api/User/ajaxLogin'
82 | attachment_url_fmt = r'https://teaching.applysquare.com/Api/CourseAttachment/getList/token/{}?parent_id={}&page={}&plan_id=-1&uid={}&cid={}'
83 | course_info_url_fmt =
r'https://teaching.applysquare.com/Api/Public/getIndexCourseList/token/{}?type=1&usertype=1&uid={}' 84 | attachment_detail_url_fmt = r'https://teaching.applysquare.com/Api/CourseAttachment/ajaxGetInfo/token/{}?id={}&uid={}&cid={}' 85 | 86 | # Init Requests session 87 | sess = requests.Session() 88 | 89 | # Login in 90 | print("Trying to log in, please wait ...") 91 | login_request = sess.post(login_url, data={"email" : user_name, "password" : hex_md5_stringify(user_passwd)}, verify=False) 92 | 93 | login_response = login_request.json() 94 | login_info = login_response['message'] 95 | 96 | try: 97 | token = login_info['token'] 98 | except TypeError: 99 | print("Login Failed, please check your username & password") 100 | print("Login info received: {}".format(login_info)) 101 | exit() 102 | 103 | uid = login_info['uid'] 104 | print("Login successfully!") 105 | 106 | cid2name_dict = dict() 107 | course_info_url = course_info_url_fmt.format(token, uid) 108 | r = sess.get(course_info_url, verify=False) 109 | info = r.json()["message"] 110 | for entry in info: 111 | cid2name_dict[entry.get('cid')] = entry.get('name') 112 | 113 | cid_list = cid2name_dict.keys() 114 | 115 | def check_cid(cid): 116 | if len(cid_include_list) and cid not in cid_include_list: 117 | return False 118 | return cid not in cid_expel_list 119 | 120 | print("\nReady to download the following courses:") 121 | for cid, cname in cid2name_dict.items(): 122 | if not check_cid(cid): 123 | continue 124 | print("Course: {:8s}, CID={:6}".format(cname, cid)) 125 | 126 | for cid in cid_list: 127 | cid = str(cid) # Prevent bug caused by wrong type of cid 128 | 129 | if not check_cid(cid): 130 | continue 131 | 132 | try: 133 | course_name = filename_filter(cid2name_dict[cid]) 134 | except KeyError: 135 | print("Can't find course name for cid {}, maybe it's a legacy course?".format(cid)) 136 | course_name = "CID_{}".format(cid) 137 | print("\nDownloading files of course {}".format(course_name)) 138 | 139 | # 
Create dir for this course 140 | root= pathlib.Path(os.getcwd()) / course_name 141 | if not root.exists() or root.is_file(): 142 | os.makedirs(root) 143 | 144 | # Construct attachment list, with some dirs in it 145 | course_attachment_list = construct_attchment_list(sess=sess, token=token, pid=0, uid=uid, cid=cid, parent_dir=pathlib.Path(".")) 146 | 147 | # Iteratively add files in dirs to global attachment list 148 | dir_counter = 0 149 | for entry in course_attachment_list: 150 | if (entry.get('ext') == 'dir'): 151 | dir_counter += 1 152 | # Add dir content to attachment list 153 | dir_id = entry.get('id') 154 | dir_name = filename_filter(entry.get('title')) if keep_dirs else '' 155 | parent_dir = entry.get('parent_dir') 156 | if not (root/parent_dir/dir_name).exists(): 157 | os.makedirs(root/parent_dir/dir_name) 158 | course_attachment_list.extend(construct_attchment_list(sess=sess, token=token, pid=dir_id, uid=uid, cid=cid, parent_dir=parent_dir/dir_name)) 159 | 160 | print("Get {:d} files with {:d} dirs".format(len(course_attachment_list)-dir_counter, dir_counter)) 161 | 162 | # Download attachments 163 | for entry in course_attachment_list: 164 | ext = entry.get('ext') 165 | if (ext == 'dir') or (ext in ext_expel_list): 166 | continue 167 | 168 | if (ext in entry.get('title')): 169 | filename = filename_filter(entry.get('title')) 170 | else: 171 | filename = filename_filter("{}.{}".format(entry.get('title'), ext)) 172 | filepath = root/entry.get("parent_dir")/filename 173 | 174 | filesize = entry.get('size') 175 | 176 | # Get download url for un-downloadable files 177 | if (entry.get('can_download') == '0'): 178 | attachment_detail_url = attachment_detail_url_fmt.format(token, entry.get('id'), uid, cid) 179 | r = sess.get(attachment_detail_url, verify=False) 180 | info = r.json()['message'] 181 | entry['path'] = info.get('path') 182 | 183 | with closing(requests.get(entry.get('path').replace('amp;', ''), stream=True)) as res: 184 | 185 | try: 186 | 
content_size = int(res.headers['content-length'])  # int(), not eval(): never evaluate server-supplied text
187 |             except Exception:
188 |                 print("Failed to get content length of file {}, please download it manually.".format(filename))
189 |                 continue
190 |
191 |             if filepath.exists() and filepath.is_file():
192 |                 # If file is up-to date, continue; else, delete and re-download
193 |                 if os.path.getsize(filepath) == content_size:
194 |                     print("File {:\u3000<20} is up-to-date".format(filename))
195 |                     continue
196 |                 else:
197 |                     print("Updating File {}".format(filename))
198 |                     os.remove(filepath)
199 |
200 |             print("Downloading {:\u3000<20s}, filesize = {}".format(filename, filesize))
201 |             chunk_size = min(content_size, 10240)
202 |             with open(filepath, "wb") as f:
203 |                 chunk_count = 0
204 |                 start_time = time.time()
205 |                 # previous_time = time.time()
206 |                 # lag_counter = 0
207 |                 total = content_size / 1024 / 1024
208 |                 for data in res.iter_content(chunk_size=chunk_size):
209 |                     chunk_count += 1
210 |                     processed = (f.tell() + len(data)) / 1024 / 1024  # bytes already written plus this chunk; the last chunk may be shorter
211 |                     current_time = time.time()
212 |                     if chunk_count < 5:
213 |                         print(r" Total: {:.2f} MB Processed: {:.2f} MB ({:.2f}%)".format(total, processed, processed/total*100), end = '\r')
214 |                     else:
215 |                         remaining = (current_time-start_time)/processed*(total-processed)
216 |                         print(r" Total: {:.2f} MB Processed: {:.2f} MB ({:.2f}%), ETA {:.2f}s".format(total, processed, processed/total*100, remaining), end = '\r')
217 |                     f.write(data)
218 |
219 |                     # speed = chunk_size / 1.0 * (current_time - previous_time)
220 |                     # if speed < speed_threshold:
221 |                     #     lag_counter += 1
222 |                     # else:
223 |                     #     lag_counter = 0
224 |
225 |                     # if lag_counter > 10:
226 |                     #     print("Restart downloading of file {}".format(filename))
227 |                     #     attachment_list.append(entry)
228 |                     #     continue
229 |
230 |
231 | print("Done!")
232 |
--------------------------------------------------------------------------------
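
The filtering that `config.json` controls through `cid_include_list`, `cid_expel_list`, and `ext_expel_list` can be summarized in a standalone sketch. It mirrors the `check_cid` logic and the extension check used by both scripts above, but it takes the lists as parameters instead of reading module-level globals. `normalize_ids` and `check_ext` are hypothetical helper names for illustration and do not appear in the repository.

```python
from typing import Iterable, List


def normalize_ids(ids: Iterable) -> List[str]:
    """Convert course IDs to strings, as the scripts do with list(map(str, ...))."""
    return [str(i) for i in ids]


def check_cid(cid, include: List[str], expel: List[str]) -> bool:
    """Return True if the course should be downloaded.

    A non-empty include list acts as an allow-list; the expel list is
    applied afterwards. (Assumption: lists are passed in explicitly here,
    whereas the scripts read them from config.json at module level.)
    """
    cid = str(cid)  # IDs in config.json may be ints or strings
    if include and cid not in include:
        return False
    return cid not in expel


def check_ext(ext: str, ext_expel: List[str]) -> bool:
    """Return True if an attachment with this extension should be downloaded."""
    # Entries with ext == "dir" are folders, not files, so they are skipped.
    return ext != "dir" and ext not in ext_expel


if __name__ == "__main__":
    # Values taken from config_example.json in this repository.
    include = normalize_ids([])
    expel = normalize_ids([12102])
    print(check_cid(12102, include, expel))
    print(check_ext("docx", ["mp4", "pdf"]))
```

Converting every ID to a string before comparing means `12102` and `"12102"` in `config.json` behave the same way, which is the reason the scripts apply `str` to both lists.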