├── .github
│   └── workflows
│       └── build.yml
├── .gitignore
├── LICENSE
├── README.md
├── example_cookie.txt
├── hint.jpg
├── learn-old.py
├── learn-slow.py
├── learn-stdio.py
├── learn.py
├── learn_async.py
└── requirements.txt
/.github/workflows/build.yml:
--------------------------------------------------------------------------------
1 | name: PyInstaller
2 | on: [push]
3 | jobs:
4 | build:
5 | runs-on: ${{ matrix.os }}
6 | strategy:
7 | matrix:
8 | os: [macos-latest, windows-latest, ubuntu-latest]
9 | steps:
10 | - uses: actions/checkout@v1
11 | - name: Set up Python 3.7
12 | uses: actions/setup-python@v1
13 | with:
14 | python-version: 3.7
15 | - name: Install dependencies
16 | run: |
17 | python -m pip install --upgrade pip
18 | pip install -r requirements.txt
19 | pip install pyinstaller
20 | - name: build with pyinstaller
21 | run: |
22 | pyinstaller --onefile learn-stdio.py -n learn-stdio-${{ matrix.os }}
23 |       - name: Upload artifact
24 | uses: actions/upload-artifact@master
25 | with:
26 | name: learn-stdio
27 | path: dist/
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | __pycache__
2 | venv/
3 | .pass
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2019 n+e
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Automatic course download script for the new Tsinghua University Web Learning platform (learn.tsinghua.edu.cn)
2 |
3 | ## Features
4 |
5 | 0. Cross-platform: Windows/Mac/Linux builds that run on double-click ([details here](https://github.com/Trinkle23897/learn2018-autodown/releases))
6 | 1. Downloads all course announcements
7 | 2. Downloads all courseware
8 | 3. Downloads all homework files along with their grading feedback
9 | 4. Downloads all course discussions
10 | 5. Downloads course information
11 | 6. Incremental updates
12 | 7. Optional selection of which courses to download
13 | 8. Press Ctrl+C at any time to skip the file currently being downloaded
14 | 9. Downloads courses you TA
15 | 10. Supports cookie-based login
16 | 11. Frequent syncing raises your activity record on the backend; for example, the third entry below is mine:
17 |
18 | 
19 |
20 | ## Dependency
21 |
22 | python>=3.6 (the scripts use f-strings), bs4, tqdm, requests
23 |
24 | ```bash
25 | pip3 install -r requirements.txt --user -U
26 | ```
27 |
28 | ## Usage
29 |
30 | ### CLI download
31 |
32 | ```bash
33 | python learn-stdio.py
34 | ```
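
For reference, an interactive session looks roughly like this (the prompts come from `learn-stdio.py` and `learn_async.py`; the account, semester, and course shown are illustrative):

```
$ python learn-stdio.py
按回车选择默认选项 ...
同步所有学期的所有课程 [y/N]:
清空相同文件 [y/N]:
学期:
指定课程:
忽略课程:
下载路径(默认当前目录):
是否启用多进程?[y/N]
请输入INFO账号:2018012345
请输入INFO密码:
Sync 2023-2024-1 计算机网络安全技术
```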
35 |
36 | ### Original script options
37 |
38 | Download the current semester's courses (default):
39 |
40 | ```bash
41 | python learn_async.py
42 | ```
43 |
44 | Download courses from all semesters:
45 |
46 | ```bash
47 | python learn_async.py --all
48 | ```
49 |
50 | Download courses from specific semesters:
51 |
52 | ```bash
53 | python learn_async.py --semester 2018-2019-1 2018-2019-3
54 | ```
55 |
56 | Download specific courses:
57 |
58 | ```bash
59 | python learn_async.py --course 计算机网络安全技术 计算机组成原理
60 | ```
61 |
62 | Skip certain courses:
63 |
64 | ```bash
65 | python learn_async.py --ignore 数据结构 "实验室科研探究(1)"
66 | ```
67 |
68 | Remove identical duplicate files across all folders:
69 |
70 | ```bash
71 | python learn_async.py --clear --all
72 | ```
73 |
74 | Specify the download path:
75 |
76 | ```bash
77 | python learn_async.py --dist your_download_path
78 | ```
79 |
80 | Enable multi-process downloading:
81 |
82 | ```bash
83 | python learn_async.py --multi
84 | ```
85 |
86 | Enable multi-process downloading with an explicit process count (defaults to the number of CPU cores if unspecified):
87 |
88 | ```bash
89 | python learn_async.py --multi --processes 4
90 | ```
91 |
92 | All of the above options can be combined. For example, to concurrently update my sophomore-year courses into the `./download` directory while skipping 数据结构, 实验室科研探究, and 中国近现代史纲要 (their course files are too large):
93 |
94 | ```bash
95 | python learn_async.py --semester 2017-2018-1 2017-2018-2 2017-2018-3 --ignore 数据结构 "实验室科研探究(2)" 中国近现代史纲要 --multi --dist ./download
96 | ```
97 |
98 | **To skip the file currently being downloaded, press Ctrl+C.**
99 |
100 | ### Login options (disabled in learn-stdio)
101 |
102 | Tired of typing your INFO account and password every time? Create a file named `.pass` containing your INFO account and password to log in automatically, or run:
103 |
104 | ```bash
105 | python learn_async.py --_pass your_info_file
106 | ```
107 |
108 | The file format is:
109 |
110 | ```bash
111 | your_info_username
112 | your_info_password
113 | ```
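
On Linux/Mac you can create it in one line (a sketch; substitute your real credentials):

```bash
printf '%s\n' your_info_username your_info_password > .pass
```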
114 |
115 | Log in with a cookie instead of entering your INFO password:
116 |
117 | ```bash
118 | python learn_async.py --cookie your_cookie_filename
119 | ```
120 |
121 | See `example_cookie.txt` for the cookie file format.
122 |
123 | ## Common Issues
124 |
125 | - Stuck at login: a network issue; check that Pulse Secure is turned off, then rerun.
126 | - `500 : Internal Server Error`: pull the latest version of the script; Web Learning has enforced HTTPS since 2020/2/22.
127 | - `info_xxx.csv` shows garbled text when opened on a Mac: don't use Office; use macOS's built-in apps instead :) (a re-encoding sketch follows below)
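
For the last issue, a minimal re-encoding sketch (an assumption: the CSV was written as UTF-8 without a BOM, which Excel misreads; rewriting it as UTF-8 with a BOM usually helps):

```python
# rewrite_csv.py (hypothetical helper): re-save a CSV as UTF-8 with BOM
# so that Excel detects the encoding correctly
import sys

path = sys.argv[1]  # e.g. info_2019-2020-1.csv
data = open(path, encoding="utf-8").read()
open(path, "w", encoding="utf-8-sig").write(data)
```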
128 |
--------------------------------------------------------------------------------
/example_cookie.txt:
--------------------------------------------------------------------------------
1 | # Netscape HTTP Cookie File
2 | learn.tsinghua.edu.cn FALSE / FALSE JSESSIONID B1274E298A712E84F1346C2753AA4BC0.wlxt20181
3 |
--------------------------------------------------------------------------------
/hint.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Trinkle23897/learn2018-autodown/c000b00eafa0846341f27eb760419ca846d78ca2/hint.jpg
--------------------------------------------------------------------------------
/learn-old.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python3
2 | # -*- coding: utf-8 -*-
3 |
4 | import os, re, sys, bs4
5 | import urllib.request, urllib.parse, urllib.error  # 'import urllib' alone does not expose these submodules
6 | import getpass
7 | import http.cookiejar
8 | from bs4 import BeautifulSoup as bs
9 |
10 | url = 'https://learn.tsinghua.edu.cn/'
11 | user_agent = r'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36'
12 | headers = { 'User-Agent': user_agent, 'Connection': 'keep-alive' }
13 |
14 | cookie = http.cookiejar.MozillaCookieJar()
15 | handler = urllib.request.HTTPCookieProcessor(cookie)
16 | opener = urllib.request.build_opener(handler)
17 |
18 | def open_page(uri, values = {}):
19 | post_data = urllib.parse.urlencode(values).encode()
20 | request = urllib.request.Request(url + uri, post_data, headers)
21 | try:
22 | response = opener.open(request)
23 | return response
24 | except urllib.error.URLError as e:
25 | print(e.code, ':', e.reason)
26 |
27 | def get_page(uri, values = {}):
28 | data = open_page(uri, values)
29 | if data:
30 | return data.read().decode()
31 |
32 | def login(username, password):
33 | login_uri = 'MultiLanguage/lesson/teacher/loginteacher.jsp'
34 | values = { 'userid': username, 'userpass': password, 'submit1': '登陆' }
35 | successful = get_page(login_uri, values).find('loginteacher_action.jsp') != -1
36 | print('Login successfully' if successful else 'Login failed!')
37 | return successful
38 |
39 | def get_courses(typepage = 1):
40 | soup = bs(get_page('MultiLanguage/lesson/student/MyCourse.jsp?language=cn&typepage=' + str(typepage)), 'html.parser')
41 | ids = soup.findAll(href=re.compile("course_id="))
42 | courses = []
43 | for link in ids:
44 | href = link.get('href').split('course_id=')[-1]
45 | name = link.text.strip()
46 | courses.append((href, name))
47 | return courses
48 |
49 | def sync_file(path_prefix, course_id):
50 | if not os.path.exists(path_prefix):
51 | os.makedirs(path_prefix)
52 | soup = bs(get_page('MultiLanguage/lesson/student/download.jsp?course_id=' + str(course_id)), 'html.parser')
53 | for comment in soup(text=lambda text: isinstance(text, bs4.Comment)):
54 | link = bs(comment, 'html.parser').a
55 | name = link.text
56 | uri = comment.next.next.a.get('href')
57 | filename = link.get('onclick').split('getfilelink=')[-1].split('&id')[0]
58 | file_path = os.path.join(path_prefix, filename)
59 | if not os.path.exists(file_path):
60 | print('Download ', name)
61 | open(file_path, 'wb').write(open_page(uri).read())
62 |
63 | def sync_hw(path_prefix, course_id):
64 | if not os.path.exists(path_prefix):
65 | os.makedirs(path_prefix)
66 | root = bs(get_page('MultiLanguage/lesson/student/hom_wk_brw.jsp?course_id=' + str(course_id)), 'html.parser')
67 | for ele in root.findAll('a'):
68 | hw_path = os.path.join(path_prefix, ele.text)
69 | if not os.path.exists(hw_path):
70 | os.makedirs(hw_path)
71 | soup = bs(get_page('MultiLanguage/lesson/student/' + ele.get('href')), 'html.parser')
72 | for link in soup.findAll('a'):
73 | name = 'upload-'+link.text if link.parent.previous.previous.strip() == '上交作业附件' else link.text
74 | uri = link.get('href')
75 | file_path = os.path.join(hw_path, name)
76 | if not os.path.exists(file_path):
77 | print('Download ', name)
78 | open(file_path, 'wb').write(open_page(uri).read())
79 |
80 | if __name__ == '__main__':
81 | ignore = open('.ignore').read().split() if os.path.exists('.ignore') else []
82 | username = input('username: ')
83 | password = getpass.getpass('password: ')
84 | if login(username, password):
85 | typepage = 1 if '.py' in sys.argv[-1] else int(sys.argv[-1])
86 | courses = get_courses(typepage)
87 | for course_id, name in courses:
88 | if name in ignore:
89 | print('Skip ' + name)
90 | else:
91 | print('Sync '+ name)
92 | sync_file(name, course_id)
93 | sync_hw(name, course_id)
94 |
--------------------------------------------------------------------------------
/learn-slow.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python2
2 | # -*- coding: utf-8 -*-
3 |
4 | __author__ = "Trinkle23897"
5 | __copyright__ = "Copyright (C) 2019 Trinkle23897"
6 | __license__ = "MIT"
7 | __email__ = "463003665@qq.com"
8 |
9 | import os, sys, getpass, requests
10 | from time import sleep
11 | from selenium import webdriver
12 | from bs4 import BeautifulSoup as bs
13 | from selenium.webdriver.chrome.options import Options
14 |
15 | root_uri = 'http://learn2018.tsinghua.edu.cn'
16 | time_sleep = 0.05
17 | time_out = 5
18 |
19 | def wait_for_load(cond, driver): # wait for loading course info
20 | cnt = time_out / time_sleep # max try
21 | while cond(driver) and cnt > 0:
22 | sleep(time_sleep)
23 | cnt -= 1
24 |
25 | def load_course_cond(driver): # avoid null
26 | return len(bs(driver.page_source, 'html.parser').findAll(class_='title stu')) == 0
27 |
28 | def load_notice_cond(driver): # avoid '条数据'.count == 1
29 | return bs(driver.page_source, 'html.parser').text.count(u'条数据') < 2
30 |
31 | def load_notice_ele_cond(driver): # avoid single '\n'
32 | return len(bs(driver.page_source, 'html.parser').find(id='ggnr').text) < 2
33 |
34 | def load_course_file_cond(driver): # avoid no element in tabbox
35 | return bs(driver.page_source, 'html.parser').find(id='tabbox').text.count(u'电子教案') == 0
36 |
37 | def load_course_file_ele_cond(driver): # avoid no element in tabbox
38 | return len(bs(driver.page_source, 'html.parser').find(class_='playli').findAll('li')) == 0 and u'此类别没有课程文件' not in bs(driver.page_source, 'html.parser').text
39 |
40 | def load_hw_cond(driver):
41 | hw_html = bs(driver.page_source, 'html.parser')
42 | return len(hw_html.find(id='wtj').findAll('tr')) <= 2 and u'表中数据为空' not in hw_html.text
43 |
44 | def download(pwd, url, cookie, name):
45 | r = requests.get(url, cookies=cookie, stream=True)
46 | filename = r.headers['Content-Disposition'].split('filename="')[-1].split('"')[0]
47 | if filename in os.listdir(pwd):
48 | return
49 | print('Download %s' % name)
50 | open(os.path.join(pwd, filename), 'wb').write(r.content)
51 |
52 | if __name__ == "__main__":
53 | ignore = open('.ignore').read().split() if os.path.exists('.ignore') else []
54 | chrome_options = Options()
55 |     chrome_options.add_argument("--headless")  # comment this line out to watch the browser
56 | driver = webdriver.Chrome(chrome_options=chrome_options)
57 | print('Login ...')
58 | driver.get("http://learn.tsinghua.edu.cn/f/login")
59 | driver.find_element_by_name("i_user").send_keys(str(raw_input('Username: ')))
60 | driver.find_element_by_name("i_pass").send_keys(str(getpass.getpass('Password: ')))
61 | driver.find_element_by_id("loginButtonId").click()
62 | wait_for_load(load_course_cond, driver)
63 | print('\rLogin successfully!')
64 | # remember cookie for downloading files
65 | cookie = {}
66 | for c in driver.get_cookies():
67 | cookie[c[u'name'].encode('utf-8')] = c[u'value'].encode('utf-8')
70 | root = bs(driver.page_source, 'html.parser')
71 |     for course in root.findAll(class_='title stu'):
72 | if course.text in ignore:
73 | print('Skip ' + course.text)
74 | continue
75 | print('Sync ' + course.text)
76 | if not os.path.exists(course.text):
77 | os.mkdir(course.text)
78 | os.chdir(course.text)
79 | driver.get(root_uri + course.attrs['href'])
80 |
81 |     # Announcements (公告)
82 | if not os.path.exists('公告'):
83 | os.mkdir('公告')
84 | os.chdir('公告')
85 | driver.find_element_by_id("wlxt_kcgg_wlkc_ggb").click()
86 | wait_for_load(load_notice_cond, driver)
87 | all_notice = bs(driver.page_source, 'html.parser').find(id='table').findAll('a')
88 | for notice in all_notice:
89 | if os.path.exists(notice.attrs['title'].replace(u'/', u'、') + u'.txt'):
90 | continue
91 | driver.get(root_uri + notice.attrs['href'])
92 | wait_for_load(load_notice_ele_cond, driver)
93 | text = bs(driver.page_source, 'html.parser').find(id='ggnr').text
94 | open(notice.attrs['title'].replace(u'/', u'、') + u'.txt', 'w').write(text.encode('utf-8'))
95 |     os.chdir('..') # leave the announcements dir
96 |
97 |     # Files (文件)
98 | if not os.path.exists('文件'):
99 | os.mkdir('文件')
100 | os.chdir('文件')
101 | driver.find_element_by_id("wlxt_kj_wlkc_kjxxb").click()
102 | wait_for_load(load_course_file_cond, driver)
103 | all_tab = bs(driver.page_source, 'html.parser').find(id='tabbox').findAll('li')
104 | # print(all_tab)
105 | for tab in all_tab:
106 | driver.find_element_by_xpath('//li[@kjflid="%s"]' % tab.attrs['kjflid']).click()
107 | wait_for_load(load_course_file_cond, driver)
108 | wait_for_load(load_course_file_ele_cond, driver)
109 | all_file = bs(driver.page_source, 'html.parser').find(class_='playli').findAll('li')
110 | for file in all_file:
111 | download(os.getcwd(), root_uri + '/b/wlxt/kj/wlkc_kjxxb/student/downloadFile?sfgk=0&wjid=%s' % file.attrs['wjid'], cookie, file.attrs['kjbt'])
112 |     os.chdir('..') # leave the files dir
113 |
114 |     # Homework (作业)
115 | if not os.path.exists('作业'):
116 | os.mkdir('作业')
117 | os.chdir('作业')
118 | driver.find_element_by_id("wlxt_kczy_zy").click()
119 | wait_for_load(load_hw_cond, driver)
120 | hw_html = bs(driver.page_source, 'html.parser')
121 | for hw_list in [hw_html.find(id='wtj'), hw_html.find(id='yjwg'), hw_html.find(id='ypg')]:
122 | # print(hw_list)
123 | if u'表中数据为空' in hw_list.text:
124 | continue
125 | for hw in hw_list.findAll('tr')[1:]:
126 | driver.get(root_uri + hw.td.next_sibling.a.attrs['href'])
127 | html = bs(driver.page_source, 'html.parser')
128 | title = html.find(class_='detail').find(class_='right').text.strip()
129 | if not os.path.exists(title):
130 | os.mkdir(title)
131 | os.chdir(title)
132 | disc = html.find(class_='detail').find(class_='c55').text.strip()
133 | open('作业说明.txt', 'w').write(disc.encode('utf-8'))
134 | file_list = html.findAll(class_='ftitle')
135 | for f in file_list:
136 | download(os.getcwd(), root_uri + f.a.attrs['href'].split('downloadUrl=')[-1], cookie, f.text.replace('\n', ''))
137 | os.chdir('..') # leave sub_hw
138 |
139 |     os.chdir('..') # leave the homework dir
140 | os.chdir('..') # leave course
141 |
--------------------------------------------------------------------------------
/learn-stdio.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python3
2 |
3 | import time, argparse
4 | from learn_async import main
5 | import os
6 |
7 |
8 | def get(prompt, choices=None, default=None):
9 |     # Prompt until a valid answer arrives; empty input selects the default.
10 |     # When the default is a list, the answer is split on whitespace.
11 |     while True:
12 |         i = input(prompt)
13 |         if not i:
14 |             return default
15 |         if choices and i not in choices:
16 |             continue  # invalid choice, ask again
17 |         if default == []:
18 |             i = i.split()
19 |         return i
20 |
21 |
22 | def get_args():
23 | parser = argparse.ArgumentParser()
24 | args = parser.parse_args()
25 | print("按回车选择默认选项 ...")
26 | args.all = get(
27 | "同步所有学期的所有课程 [y/N]:", choices=["Y", "N", "y", "n"], default=None
28 | )
29 | if args.all in ["n", "N"]:
30 | args.all = None
31 | args.clear = get("清空相同文件 [y/N]:", choices=["Y", "N", "y", "n"], default=None)
32 | if args.clear in ["n", "N"]:
33 | args.clear = None
34 | args.semester = get("学期:", default=[])
35 | args.course = get("指定课程:", default=[])
36 | args.ignore = get("忽略课程:", default=[])
37 | args.dist = get("下载路径(默认当前目录):", default="")
38 |     if len(args.dist) != 0:
39 |         if not os.path.exists(args.dist):
40 |             create = get(
41 |                 f"路径{args.dist}不存在,是否创建? [Y/n]",
42 |                 choices=["Y", "N", "y", "n"],
43 |                 default="Y",
44 |             )
45 |             if create in ["y", "Y"]:
46 |                 os.makedirs(args.dist)
47 |             else:
48 |                 exit()
49 |     multi = get("是否启用多进程?[y/N]", choices=["Y", "N", "y", "n"], default="N")
50 |     if multi in ["y", "Y"]:
51 |         args.multi = True
52 |         args.processes = int(get("进程数(默认使用所有CPU核心数):", default=0)) or None  # mp.Pool needs an int, not a string
53 | else:
54 | args.multi = False
55 | args._pass = ".pass"
56 | args.cookie = ""
57 | args.http_proxy = ""
58 | args.https_proxy = ""
59 | args.username = ""
60 | args.password = ""
61 | return args
62 |
63 |
64 | if __name__ == "__main__":
65 | t = time.time()
66 | main(get_args())
67 | t = time.time() - t
68 | print("耗时: %02d:%02d:%02.0f" % (t // 3600, (t % 3600) // 60, t % 60))
69 | input("请按任意键退出")
70 |
--------------------------------------------------------------------------------
/learn.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python3
2 | # -*- coding: utf-8 -*-
3 |
4 | __author__ = "Trinkle23897"
5 | __copyright__ = "Copyright (C) 2019 Trinkle23897"
6 | __license__ = "MIT"
7 | __email__ = "463003665@qq.com"
8 |
9 | import os, csv, json, html, urllib, getpass, base64, hashlib, argparse, platform, subprocess
10 | from tqdm import tqdm
11 | import urllib.request, http.cookiejar
12 | from bs4 import BeautifulSoup as bs
13 |
14 | import ssl
15 |
16 | ssl._create_default_https_context = ssl._create_unverified_context
17 | global dist_path, url, user_agent, headers, cookie, opener, err404
18 | dist_path = url = user_agent = headers = cookie = opener = err404 = None
19 |
20 |
21 | def build_global(args):
22 | global dist_path, url, user_agent, headers, cookie, opener, err404
23 | dist_path = args.dist
24 | url = 'https://learn.tsinghua.edu.cn'
25 | user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36'
26 | headers = {'User-Agent': user_agent, 'Connection': 'keep-alive'}
27 | handlers = []
28 | if args.http_proxy:
29 | handlers.append(urllib.request.ProxyHandler({'http': args.http_proxy}))
30 | if args.https_proxy:
31 | handlers.append(urllib.request.ProxyHandler({'https': args.https_proxy}))
32 | cookie = http.cookiejar.MozillaCookieJar()
33 | handlers.append(urllib.request.HTTPCookieProcessor(cookie))
34 | opener = urllib.request.build_opener(*handlers)
35 | urllib.request.install_opener(opener)
36 | err404 = '\r\n\r\n\r\n'
37 |
38 | def get_xsrf_token():
39 | cookie_obj = cookie._cookies.get('learn.tsinghua.edu.cn', dict()).get('/', dict()).get('XSRF-TOKEN', None)
40 | return cookie_obj.value if cookie_obj else None
41 |
42 | def open_page(uri, values={}):
43 | post_data = urllib.parse.urlencode(values).encode() if values else None
44 | request = urllib.request.Request(uri if uri.startswith('http') else url + uri, post_data, headers)
45 | try:
46 | response = opener.open(request)
47 | return response
48 | except urllib.error.URLError as e:
49 | print(uri, e.code, ':', e.reason)
50 |
51 |
52 | def get_page(uri, values={}):
53 | data = open_page(uri, values)
54 | if data:
55 | return data.read().decode()
56 |
57 |
58 | def get_json(uri, values={}):
59 | xsrf_token = get_xsrf_token()
60 | if xsrf_token:
61 | if '?' not in uri:
62 | uri = uri + f'?_csrf={xsrf_token}'
63 | else:
64 | uri = uri + f'&_csrf={xsrf_token}'
65 | try:
66 | page = get_page(uri, values)
67 | result = json.loads(page)
68 | return result
69 | except:
70 | return {}
71 |
72 |
73 | def escape(s):
74 |     return html.unescape(s).replace(os.path.sep, '、').replace(':', '_').replace(' ', '_').replace('\t', '').replace('?', '.').replace('/', '_').replace('\'', '_').replace('<', '').replace('>', '').replace('#', '').replace(';', '').replace('*', '_').replace("\"", '_').replace('|', '')
75 |
76 |
77 | def login(username, password):
78 | login_uri = 'https://id.tsinghua.edu.cn/do/off/ui/auth/login/post/bb5df85216504820be7bba2b0ae1535b/0?/login.do'
79 | values = {'i_user': username, 'i_pass': password, 'atOnce': 'true'}
80 | info = get_page(login_uri, values)
81 | successful = 'SUCCESS' in info
82 | print('User %s login successfully' % (username) if successful else 'User %s login failed!' % (username))
83 | if successful:
84 | get_page(get_page(info.split('replace("')[-1].split('");\n')[0]).split('location="')[1].split('";\r\n')[0])
85 | return successful
86 |
87 |
88 | def get_courses(args):
89 | try:
90 | now = get_json('/b/kc/zhjw_v_code_xnxq/getCurrentAndNextSemester')['result']['xnxq']
91 | if args.all or args.course or args.semester:
92 | query_list = [x for x in get_json('/b/wlxt/kc/v_wlkc_xs_xktjb_coassb/queryxnxq') if x is not None]
93 | query_list.sort()
94 | if args.semester:
95 | query_list_ = [q for q in query_list if q in args.semester]
96 | if len(query_list_) == 0:
97 | print('Invalid semester, choices: ', query_list)
98 | return []
99 | query_list = query_list_
100 | else:
101 | query_list = [now]
102 | except:
103 | print('您被退学了!')
104 | return []
105 | courses = []
106 | for q in query_list:
107 | try:
108 | c_stu = get_json('/b/wlxt/kc/v_wlkc_xs_xkb_kcb_extend/student/loadCourseBySemesterId/' + q + '/zh/')['resultList']
109 | except:
110 | c_stu = []
111 | try:
112 | c_ta = get_json('/b/kc/v_wlkc_kcb/queryAsorCoCourseList/%s/0' % q)['resultList']
113 | except:
114 | c_ta = []
115 | current_courses = []
116 | for c in c_stu:
117 | c['jslx'] = '3'
118 | current_courses.append(c)
119 | for c in c_ta:
120 | c['jslx'] = '0'
121 | current_courses.append(c)
122 | courses += current_courses
123 | escape_c = []
124 |
125 | def escape_course_fn(c):
126 | return escape(c).replace(' ', '').replace('_', '').replace('(', '(').replace(')', ')')
127 |
128 | for c in courses:
129 | c['kcm'] = escape_course_fn(c['kcm'])
130 | escape_c.append(c)
131 | courses = escape_c
132 | if args.course:
133 | args.course = [escape_course_fn(c) for c in args.course]
134 | courses = [c for c in courses if c['kcm'] in args.course]
135 | if args.ignore:
136 | args.ignore = [escape_course_fn(c) for c in args.ignore]
137 | courses = [c for c in courses if c['kcm'] not in args.ignore]
138 | return courses
139 |
140 |
141 | class TqdmUpTo(tqdm):
142 | def update_to(self, b=1, bsize=1, tsize=None):
143 | if tsize is not None:
144 | self.total = tsize
145 | self.update(b * bsize - self.n)
146 |
147 |
148 | def download(uri, name):
149 | filename = escape(name)
150 |     if (os.path.exists(filename) and os.path.getsize(filename)) or 'Connection__close' in filename:
151 | return
152 | try:
153 | with TqdmUpTo(ascii=True, dynamic_ncols=True, unit='B', unit_scale=True, miniters=1, desc=filename) as t:
154 | urllib.request.urlretrieve(url + uri, filename=filename, reporthook=t.update_to, data=None)
155 | except:
156 | print('Could not download file %s ... removing broken file' % filename)
157 | if os.path.exists(filename):
158 | os.remove(filename)
159 | return
160 |
161 |
162 | def build_notify(s):
163 | tp = bs(base64.b64decode(s['ggnr']).decode('utf-8'), 'html.parser').text if s['ggnr'] else ''
164 | st = '题目: %s\n发布人: %s\n发布时间: %s\n\n内容: %s\n' % (s['bt'], s['fbr'], s['fbsjStr'], tp)
165 | return st
166 |
167 |
168 | def sync_notify(c):
169 | global dist_path
170 | pre = os.path.join(dist_path, c['kcm'], '公告')
171 | if not os.path.exists(pre):
172 | os.makedirs(pre)
173 | try:
174 | data = {'aoData': [{"name": "wlkcid", "value": c['wlkcid']}]}
175 | if c['_type'] == 'student':
176 | notify = get_json('/b/wlxt/kcgg/wlkc_ggb/student/pageListXs', data)['object']['aaData']
177 | else:
178 | notify = get_json('/b/wlxt/kcgg/wlkc_ggb/teacher/pageList', data)['object']['aaData']
179 | except:
180 | return
181 | for n in notify:
182 | if not os.path.exists(os.path.join(pre, escape(n['bt']))):
183 | os.makedirs(os.path.join(pre, escape(n['bt'])))
184 | path = os.path.join(os.path.join(pre, escape(n['bt'])), escape(n['bt']) + '.txt')
185 | open(path, 'w', encoding='utf-8').write(build_notify(n))
186 |
187 | if n.get('fjmc') is not None:
188 | html = get_page('/f/wlxt/kcgg/wlkc_ggb/%s/beforeViewXs?wlkcid=%s&id=%s' % (c['_type'], n['wlkcid'], n['ggid']))
189 | soup = bs(html, 'html.parser')
190 |
191 | link = soup.find('a', class_='ml-10')
192 |
193 | now = os.getcwd()
194 | os.chdir(os.path.join(pre, escape(n['bt'])))
195 | name = n['fjmc']
196 | download(link['href'], name=name)
197 | os.chdir(now)
198 |
199 |
200 | def sync_file(c):
201 | global dist_path
202 | now = os.getcwd()
203 | pre = os.path.join(dist_path, c['kcm'], '课件')
204 | if not os.path.exists(pre):
205 | os.makedirs(pre)
206 |
207 | if c['_type'] == 'student':
208 | files = get_json('/b/wlxt/kj/wlkc_kjxxb/student/kjxxbByWlkcidAndSizeForStudent?wlkcid=%s&size=0' % c['wlkcid'])['object']
209 | else:
210 | try:
211 | files = get_json('/b/wlxt/kj/v_kjxxb_wjwjb/teacher/queryByWlkcid?wlkcid=%s&size=0' % c['wlkcid'])['object']['resultsList']
212 | except: # None
213 | return
214 |
215 | rows = json.loads(get_page(f'/b/wlxt/kj/wlkc_kjflb/{c["_type"]}/pageList?_csrf={get_xsrf_token()}&wlkcid={c["wlkcid"]}'))['object']['rows']
216 |
217 | os.chdir(pre)
218 | for r in rows:
219 | if c['_type'] == 'student':
220 | row_files = get_json(f'/b/wlxt/kj/wlkc_kjxxb/{c["_type"]}/kjxxb/{c["wlkcid"]}/{r["kjflid"]}')['object']
221 | else:
222 | data = {'aoData': [
223 | {"name": "wlkcid", "value": c['wlkcid']},
224 | {"name": "kjflid","value": r["kjflid"]},
225 | {"name": "iDisplayStart","value": 0},
226 | {"name": "iDisplayLength","value": "-1"},
227 | ]}
228 | row_files = get_json('/b/wlxt/kj/v_kjxxb_wjwjb/teacher/pageList', data)['object']['aaData']
229 | if not os.path.exists(escape(r['bt'])):
230 | os.makedirs(escape(r['bt']))
231 | rnow = os.getcwd()
232 | os.chdir(escape(r['bt']))
233 | for rf in row_files:
234 | wjlx = None
235 | if c['_type'] == 'student':
236 | flag = False
237 | for f in files:
238 | if rf[7] == f['wjid']:
239 | flag = True
240 | wjlx = f['wjlx']
241 | break
242 | wjid = rf[7]
243 | name = rf[1]
244 | else:
245 | flag = True
246 | wjlx = rf['wjlx']
247 | wjid = rf['wjid']
248 | name = rf['bt']
249 | if flag:
250 | if wjlx:
251 | name += '.' + wjlx
252 | download(f'/b/wlxt/kj/wlkc_kjxxb/{c["_type"]}/downloadFile?sfgk=0&wjid={wjid}', name=name)
253 | else:
254 | print(f'文件{rf[1]}出错')
255 | os.chdir(rnow)
256 |
257 | os.chdir(now)
258 |
259 |
260 | def sync_info(c):
261 | global dist_path
262 | pre = os.path.join(dist_path, c['kcm'], '课程信息.txt')
263 | try:
264 | if c['_type'] == 'student':
265 | html = get_page('/f/wlxt/kc/v_kcxx_jskcxx/student/beforeXskcxx?wlkcid=%s&sfgk=-1' % c['wlkcid'])
266 | else:
267 | html = get_page('/f/wlxt/kc/v_kcxx_jskcxx/teacher/beforeJskcxx?wlkcid=%s&sfgk=-1' % c['wlkcid'])
268 | open(pre, 'w').write('\n'.join(bs(html, 'html.parser').find(class_='course-w').text.split()))
269 | except:
270 | return
271 |
272 |
273 | def append_hw_csv(fname, stu):
274 | try:
275 | f = [i for i in csv.reader(open(fname)) if i]
276 | except:
277 | f = [['学号', '姓名', '院系', '班级', '上交时间', '状态', '成绩', '批阅老师']]
278 | info_str = [stu['xh'], stu['xm'], stu['dwmc'], stu['bm'], stu['scsjStr'], stu['zt'], stu['cj'], stu['jsm']]
279 | xhs = [i[0] for i in f]
280 | if stu['xh'] in xhs:
281 | i = xhs.index(stu['xh'])
282 | f[i] = info_str
283 | else:
284 | f.append(info_str)
285 |     csv.writer(open(fname, 'w', newline='')).writerows(f)  # newline='' avoids blank rows on Windows
286 |
287 |
288 | def sync_hw(c):
289 | global dist_path
290 | now = os.getcwd()
291 | pre = os.path.join(dist_path, c['kcm'], '作业')
292 | if not os.path.exists(pre):
293 | os.makedirs(pre)
294 | data = {'aoData': [{"name": "wlkcid", "value": c['wlkcid']}]}
295 | if c['_type'] == 'student':
296 | hws = []
297 | for hwtype in ['zyListWj', 'zyListYjwg', 'zyListYpg']:
298 | try:
299 | hws += get_json('/b/wlxt/kczy/zy/student/%s' % hwtype, data)['object']['aaData']
300 | except:
301 | continue
302 | else:
303 | hws = get_json('/b/wlxt/kczy/zy/teacher/pageList', data)['object']['aaData']
304 | for hw in hws:
305 | path = os.path.join(pre, escape(hw['bt']))
306 | if not os.path.exists(path):
307 | os.makedirs(path)
308 | if c['_type'] == 'student':
309 | append_hw_csv(os.path.join(path, 'info_%s.csv' % c['wlkcid']), hw)
310 | page = bs(get_page('/f/wlxt/kczy/zy/student/viewCj?wlkcid=%s&zyid=%s&xszyid=%s' % (hw['wlkcid'], hw['zyid'], hw['xszyid'])), 'html.parser')
311 | files = page.findAll(class_='fujian')
312 | for i, f in enumerate(files):
313 | if len(f.findAll('a')) == 0:
314 | continue
315 | os.chdir(path) # to avoid filename too long
316 | name = f.findAll('a')[0].text
317 | if i >= 2 and not name.startswith(hw['xh']):
318 | name = hw['xh'] + '_' + name
319 | download('/b/wlxt/kczy/zy/%s/downloadFile/%s/%s' % (c['_type'], hw['wlkcid'], f.findAll('a')[-1].attrs['onclick'].split("ZyFile('")[-1][:-2]), name=name)
320 | os.chdir(now)
321 | else:
322 | print(hw['bt'])
323 | data = {'aoData': [{"name": "wlkcid", "value": c['wlkcid']}, {"name": "zyid", "value": hw['zyid']}]}
324 | stus = get_json('/b/wlxt/kczy/xszy/teacher/getDoneInfo', data)['object']['aaData']
325 | for stu in stus:
326 | append_hw_csv(os.path.join(path, 'info_%s.csv' % c['wlkcid']), stu)
327 | page = bs(get_page('/f/wlxt/kczy/xszy/teacher/beforePiYue?wlkcid=%s&xszyid=%s' % (stu['wlkcid'], stu['xszyid'])), 'html.parser')
328 | files = page.findAll(class_='wdhere')
329 | os.chdir(path) # to avoid filename too long
330 | for f in files:
331 | if f.text == '\n':
332 | continue
333 | try:
334 | id = f.findAll('span')[0].attrs['onclick'].split("'")[1]
335 | name = f.findAll('span')[0].text
336 | except:
337 | try:
338 | id = f.findAll('a')[-1].attrs['onclick'].split("'")[1]
339 | name = f.findAll('a')[0].text
340 | except: # another error
341 | continue
342 | if not name.startswith(stu['xh']):
343 | name = stu['xh'] + '_' + name
344 | download('/b/wlxt/kczy/xszy/teacher/downloadFile/%s/%s' % (stu['wlkcid'], id), name=name)
345 | os.chdir(now)
346 | stus = get_json('/b/wlxt/kczy/xszy/teacher/getUndoInfo', data)['object']['aaData']
347 | for stu in stus:
348 | append_hw_csv(os.path.join(path, 'info_%s.csv' % c['wlkcid']), stu)
349 |
350 |
351 | def build_discuss(s):
352 | return '课程:%s\n内容:%s\n学号:%s\n姓名:%s\n发布时间:%s\n最后回复:%s\n回复时间:%s\n' % (s['kcm'], s['bt'], s['fbr'], s['fbrxm'], s['fbsj'], s['zhhfrxm'], s['zhhfsj'])
353 |
354 |
355 | def sync_discuss(c):
356 | global dist_path
357 | pre = os.path.join(dist_path, c['kcm'], '讨论')
358 | if not os.path.exists(pre):
359 | os.makedirs(pre)
360 | try:
361 | disc = get_json('/b/wlxt/bbs/bbs_tltb/%s/kctlList?wlkcid=%s' % (c['_type'], c['wlkcid']))['object']['resultsList']
362 | except:
363 | return
364 | for d in disc:
365 | filename = os.path.join(pre, escape(d['bt']) + '.txt')
366 | if os.path.exists(filename):
367 | continue
368 | try:
369 | html = get_page('/f/wlxt/bbs/bbs_tltb/%s/viewTlById?wlkcid=%s&id=%s&tabbh=2&bqid=%s' % (c['_type'], d['wlkcid'], d['id'], d['bqid']))
370 | open(filename, 'w').write(build_discuss(d) + bs(html, 'html.parser').find(class_='detail').text)
371 | except:
372 | pass
373 |
374 |
375 | def gethash(fname):
376 | if platform.system() == 'Linux':
377 | return subprocess.check_output(['md5sum', fname]).decode().split()[0]
378 | hash_md5 = hashlib.md5()
379 | with open(fname, "rb") as f:
380 | for chunk in iter(lambda: f.read(4096), b""):
381 | hash_md5.update(chunk)
382 | return hash_md5.hexdigest()
383 |
384 |
385 | def dfs_clean(d):
386 | subdirs = [os.path.join(d, i) for i in os.listdir(d) if os.path.isdir(os.path.join(d, i))]
387 | for i in subdirs:
388 | dfs_clean(i)
389 | files = [os.path.join(d, i) for i in os.listdir(d) if os.path.isfile(os.path.join(d, i))]
390 | info = {}
391 | for f in files:
392 | if os.path.getsize(f):
393 | info[f] = {'size': os.path.getsize(f), 'time': os.path.getmtime(f), 'hash': '', 'rm': 0}
394 | info = list({k: v for k, v in sorted(info.items(), key=lambda item: item[1]['size'])}.items())
395 | for i in range(len(info)):
396 | for j in range(i):
397 | if info[i][1]['size'] == info[j][1]['size']:
398 | if info[i][1]['hash'] == '':
399 | info[i][1]['hash'] = gethash(info[i][0])
400 | if info[j][1]['hash'] == '':
401 | info[j][1]['hash'] = gethash(info[j][0])
402 | if info[i][1]['hash'] == info[j][1]['hash']:
403 | if info[i][1]['time'] < info[j][1]['time']:
404 | info[i][1]['rm'] = 1
405 | elif info[i][1]['time'] > info[j][1]['time']:
406 | info[j][1]['rm'] = 1
407 | elif len(info[i][0]) < len(info[j][0]):
408 | info[i][1]['rm'] = 1
409 | elif len(info[i][0]) > len(info[j][0]):
410 | info[j][1]['rm'] = 1
411 | rm = [i[0] for i in info if i[1]['rm'] or i[1]['size'] == 0]
412 | if rm:
413 | print('rmlist:', rm)
414 | for f in rm:
415 | os.remove(f)
416 |
417 |
418 | def clear(args):
419 | courses = [i for i in os.listdir('.') if os.path.isdir(i) and not i.startswith('.')]
420 | if args.all:
421 | pass
422 | else:
423 | if args.course:
424 | courses = [i for i in courses if i in args.course]
425 | if args.ignore:
426 | courses = [i for i in courses if i not in args.ignore]
427 | courses.sort()
428 | for i, c in enumerate(courses):
429 | print('Checking #%d %s' % (i + 1, c))
430 | for subdir in ['课件', '作业']:
431 | d = os.path.join(c, subdir)
432 | if os.path.exists(d):
433 | dfs_clean(d)
434 |
435 |
436 | def get_args():
437 | parser = argparse.ArgumentParser()
438 | parser.add_argument("--all", action='store_true')
439 | parser.add_argument("--clear", action='store_true', help='remove the duplicate course file')
440 | parser.add_argument("--semester", nargs='+', type=str, default=[])
441 | parser.add_argument("--ignore", nargs='+', type=str, default=[])
442 | parser.add_argument("--course", nargs='+', type=str, default=[])
443 | parser.add_argument('-p', "--_pass", type=str, default='.pass')
444 | parser.add_argument('-c', "--cookie", type=str, default='', help='Netscape HTTP Cookie File')
445 | parser.add_argument('-d', '--dist', type=str, default='', help='download path')
446 | parser.add_argument('--http_proxy', type=str, default='', help='http proxy')
447 | parser.add_argument('--https_proxy', type=str, default='', help='https proxy')
448 | args = parser.parse_args()
449 | return args
450 |
451 |
452 | def main(args):
453 | global dist_path
454 | build_global(args)
455 | assert (dist_path is not None) and (url is not None) and (user_agent is not None) and (headers is not None) and (cookie is not None) and (opener is not None) and (err404 is not None)
456 | if args.clear:
457 | clear(args)
458 | exit()
459 | args.login = False
460 | if args.cookie:
461 | cookie.load(args.cookie, ignore_discard=True, ignore_expires=True)
462 | args.login = (get_page('/b/wlxt/kc/v_wlkc_xs_xktjb_coassb/queryxnxq') != err404)
463 | print('login successfully' if args.login else 'login failed!')
464 | else:
465 | if os.path.exists(args._pass):
466 | username, password = open(args._pass).read().split()
467 | else:
468 | username = input('请输入INFO账号:')
469 | password = getpass.getpass('请输入INFO密码:')
470 | args.login = login(username, password)
471 | if args.login:
472 | courses = get_courses(args)
473 | for c in courses:
474 | c['_type'] = {'0': 'teacher', '3': 'student'}[c['jslx']]
475 | print('Sync ' + c['xnxq'] + ' ' + c['kcm'])
476 | if not os.path.exists(os.path.join(dist_path, c['kcm'])):
477 | os.makedirs(os.path.join(dist_path, c['kcm']))
478 | sync_info(c)
479 | sync_discuss(c)
480 | sync_notify(c)
481 | sync_file(c)
482 | sync_hw(c)
483 |
484 |
485 | if __name__ == '__main__':
486 | main(get_args())
487 |
--------------------------------------------------------------------------------
/learn_async.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python3
2 | # -*- coding: utf-8 -*-
3 |
4 | __author__ = "Trinkle23897"
5 | __copyright__ = "Copyright (C) 2019 Trinkle23897"
6 | __license__ = "MIT"
7 | __email__ = "463003665@qq.com"
8 | __modified_by__ = "zycccishere"
9 |
10 | import os, csv, json, html, urllib, getpass, base64, hashlib, argparse, platform, subprocess
11 | from tqdm import tqdm
12 | import urllib.request, http.cookiejar
13 | from bs4 import BeautifulSoup as bs
14 | import multiprocessing as mp
15 | from functools import partial
16 |
17 | import ssl
18 |
19 | ssl._create_default_https_context = ssl._create_unverified_context
20 | global dist_path, url, user_agent, headers, cookie, opener, err404
21 | dist_path = url = user_agent = headers = cookie = opener = err404 = None
22 |
23 |
24 | def build_global(args):
25 | global dist_path, url, user_agent, headers, cookie, opener, err404
26 | dist_path = args.dist
27 | url = "https://learn.tsinghua.edu.cn"
28 | user_agent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36"
29 | headers = {"User-Agent": user_agent, "Connection": "keep-alive"}
30 | handlers = []
31 |
32 |     # configure the SSL context: tolerate legacy server renegotiation and skip certificate verification
33 | context = ssl.create_default_context()
34 | context.options |= 0x4 # OP_LEGACY_SERVER_CONNECT
35 | context.check_hostname = False
36 | context.verify_mode = ssl.CERT_NONE
37 | handlers.append(urllib.request.HTTPSHandler(context=context))
38 |
39 | if args.http_proxy:
40 | handlers.append(urllib.request.ProxyHandler({"http": args.http_proxy}))
41 | if args.https_proxy:
42 | handlers.append(urllib.request.ProxyHandler({"https": args.https_proxy}))
43 | cookie = http.cookiejar.MozillaCookieJar()
44 | handlers.append(urllib.request.HTTPCookieProcessor(cookie))
45 | opener = urllib.request.build_opener(*handlers)
46 | urllib.request.install_opener(opener)
47 | err404 = '\r\n\r\n\r\n'
48 |
49 |
50 | def get_xsrf_token():
51 | cookie_obj = (
52 | cookie._cookies.get("learn.tsinghua.edu.cn", dict())
53 | .get("/", dict())
54 | .get("XSRF-TOKEN", None)
55 | )
56 | return cookie_obj.value if cookie_obj else None
57 |
58 |
59 | def open_page(uri, values={}):
60 | post_data = urllib.parse.urlencode(values).encode() if values else None
61 | request = urllib.request.Request(
62 | uri if uri.startswith("http") else url + uri, post_data, headers
63 | )
64 | try:
65 | response = opener.open(request)
66 | return response
67 | except urllib.error.URLError as e:
68 | if hasattr(e, "code"):
69 | print(uri, e.code, ":", e.reason)
70 | else:
71 | print(uri, ":", e.reason)
72 |
73 |
74 | def get_page(uri, values={}):
75 | data = open_page(uri, values)
76 | if data:
77 | return data.read().decode()
78 |
79 |
80 | def get_json(uri, values={}):
81 | xsrf_token = get_xsrf_token()
82 | if xsrf_token:
83 | if "?" not in uri:
84 | uri = uri + f"?_csrf={xsrf_token}"
85 | else:
86 | uri = uri + f"&_csrf={xsrf_token}"
87 | try:
88 | page = get_page(uri, values)
89 | result = json.loads(page)
90 | return result
91 | except:
92 | return {}
93 |
94 |
95 | def escape(s):
96 | return (
97 | html.unescape(s)
98 | .replace(os.path.sep, "、")
99 | .replace(":", "_")
100 | .replace(" ", "_")
101 | .replace("\t", "")
102 | .replace("?", ".")
103 | .replace("/", "_")
104 | .replace("'", "_")
105 | .replace("<", "")
106 | .replace(">", "")
107 | .replace("#", "")
108 | .replace(";", "")
109 | .replace("*", "_")
110 | .replace('"', "_")
112 | .replace("|", "")
113 | )
114 |
115 |
116 | def login(username, password):
117 | login_uri = "https://id.tsinghua.edu.cn/do/off/ui/auth/login/post/bb5df85216504820be7bba2b0ae1535b/0?/login.do"
118 | values = {"i_user": username, "i_pass": password, "atOnce": "true"}
119 | info = get_page(login_uri, values)
120 | successful = "SUCCESS" in info
126 |     if not successful:
127 |         print("User %s login failed!" % (username))
128 |         return False
129 |     # follow the post-login redirects to finish establishing the session
130 |     get_page(
131 |         get_page(info.split('replace("')[-1].split('");\n')[0])
132 |         .split('location="')[1]
133 |         .split('";\r\n')[0]
134 |     )
135 |     return True
136 |
137 |
138 | def get_courses(args):
139 | try:
140 | now = get_json("/b/kc/zhjw_v_code_xnxq/getCurrentAndNextSemester")["result"][
141 | "xnxq"
142 | ]
143 | if args.all or args.course or args.semester:
144 | query_list = [
145 | x
146 | for x in get_json("/b/wlxt/kc/v_wlkc_xs_xktjb_coassb/queryxnxq")
147 | if x is not None
148 | ]
149 | query_list.sort()
150 | if args.semester:
151 | query_list_ = [q for q in query_list if q in args.semester]
152 | if len(query_list_) == 0:
153 | print("Invalid semester, choices: ", query_list)
154 | return []
155 | query_list = query_list_
156 | else:
157 | query_list = [now]
158 | except:
159 | print("您被退学了!")
160 | return []
161 | courses = []
162 | for q in query_list:
163 | try:
164 | c_stu = get_json(
165 | "/b/wlxt/kc/v_wlkc_xs_xkb_kcb_extend/student/loadCourseBySemesterId/"
166 | + q
167 | + "/zh/"
168 | )["resultList"]
169 | except:
170 | c_stu = []
171 | try:
172 | c_ta = get_json("/b/kc/v_wlkc_kcb/queryAsorCoCourseList/%s/0" % q)[
173 | "resultList"
174 | ]
175 | except:
176 | c_ta = []
177 | current_courses = []
178 | for c in c_stu:
179 | c["jslx"] = "3"
180 | current_courses.append(c)
181 | for c in c_ta:
182 | c["jslx"] = "0"
183 | current_courses.append(c)
184 | courses += current_courses
185 | escape_c = []
186 |
187 | def escape_course_fn(c):
188 | return (
189 | escape(c)
190 | .replace(" ", "")
191 | .replace("_", "")
192 | .replace("(", "(")
193 | .replace(")", ")")
194 | )
195 |
196 | for c in courses:
197 | c["kcm"] = escape_course_fn(c["kcm"])
198 | escape_c.append(c)
199 | courses = escape_c
200 | if args.course:
201 | args.course = [escape_course_fn(c) for c in args.course]
202 | courses = [c for c in courses if c["kcm"] in args.course]
203 | if args.ignore:
204 | args.ignore = [escape_course_fn(c) for c in args.ignore]
205 | courses = [c for c in courses if c["kcm"] not in args.ignore]
206 | return courses
207 |
208 |
209 | class TqdmUpTo(tqdm):
210 | def update_to(self, b=1, bsize=1, tsize=None):
211 | if tsize is not None:
212 | self.total = tsize
213 | self.update(b * bsize - self.n)
214 |
215 |
216 | def download(uri, name, target_dir=None):
217 | filename = escape(name)
218 |
219 |     # build an absolute path inside the target directory
220 | if target_dir:
221 | filename = os.path.join(target_dir, filename)
222 |
223 | if (
224 | os.path.exists(filename)
225 | and os.path.getsize(filename)
226 | or "Connection__close" in filename
227 | ):
228 | return
229 |
230 | try:
231 | with TqdmUpTo(
232 | ascii=True,
233 | dynamic_ncols=True,
234 | unit="B",
235 | unit_scale=True,
236 | miniters=1,
237 | desc=filename,
238 | ) as t:
239 | urllib.request.urlretrieve(
240 | url + uri, filename=filename, reporthook=t.update_to, data=None
241 | )
242 | except Exception as e:
243 | print(
244 | f"Could not download file {filename} ... removing broken file. Error: {str(e)}"
245 | )
246 | if os.path.exists(filename):
247 | os.remove(filename)
248 | return
249 |
250 |
251 | def build_notify(s):
252 | tp = (
253 | bs(base64.b64decode(s["ggnr"]).decode("utf-8"), "html.parser").text
254 | if s["ggnr"]
255 | else ""
256 | )
257 | st = "题目: %s\n发布人: %s\n发布时间: %s\n\n内容: %s\n" % (
258 | s["bt"],
259 | s["fbr"],
260 | s["fbsjStr"],
261 | tp,
262 | )
263 | return st
264 |
265 |
266 | def makedirs_safe(directory):
267 | try:
268 | if not os.path.exists(directory):
269 | os.makedirs(directory)
270 | except FileExistsError:
271 | pass
272 |
273 |
274 | def sync_notify(c):
275 | global dist_path
276 | pre = os.path.join(dist_path, c["kcm"], "公告")
277 | makedirs_safe(pre)
278 | try:
279 | data = {"aoData": [{"name": "wlkcid", "value": c["wlkcid"]}]}
280 | if c["_type"] == "student":
281 | notify = get_json("/b/wlxt/kcgg/wlkc_ggb/student/pageListXs", data)[
282 | "object"
283 | ]["aaData"]
284 | else:
285 | notify = get_json("/b/wlxt/kcgg/wlkc_ggb/teacher/pageList", data)["object"][
286 | "aaData"
287 | ]
288 | except:
289 | return
290 | for n in notify:
291 | makedirs_safe(os.path.join(pre, escape(n["bt"])))
292 | path = os.path.join(
293 | os.path.join(pre, escape(n["bt"])), escape(n["bt"]) + ".txt"
294 | )
295 | open(path, "w", encoding="utf-8").write(build_notify(n))
296 |
297 | if n.get("fjmc") is not None:
298 | html = get_page(
299 | "/f/wlxt/kcgg/wlkc_ggb/%s/beforeViewXs?wlkcid=%s&id=%s"
300 | % (c["_type"], n["wlkcid"], n["ggid"])
301 | )
302 | soup = bs(html, "html.parser")
303 |
304 | link = soup.find("a", class_="ml-10")
305 |
306 | now = os.getcwd()
307 | os.chdir(os.path.join(pre, escape(n["bt"])))
308 | name = n["fjmc"]
309 | download(link["href"], name=name)
310 | os.chdir(now)
311 |
312 |
313 | def sync_file(c):
314 | global dist_path
315 | now = os.getcwd()
316 | pre = os.path.join(dist_path, c["kcm"], "课件")
317 | makedirs_safe(pre)
318 |
319 | if c["_type"] == "student":
320 | files = get_json(
321 | "/b/wlxt/kj/wlkc_kjxxb/student/kjxxbByWlkcidAndSizeForStudent?wlkcid=%s&size=0"
322 | % c["wlkcid"]
323 | )["object"]
324 | else:
325 | try:
326 | files = get_json(
327 | "/b/wlxt/kj/v_kjxxb_wjwjb/teacher/queryByWlkcid?wlkcid=%s&size=0"
328 | % c["wlkcid"]
329 | )["object"]["resultsList"]
330 | except: # None
331 | return
332 |
333 | rows = json.loads(
334 | get_page(
335 | f'/b/wlxt/kj/wlkc_kjflb/{c["_type"]}/pageList?_csrf={get_xsrf_token()}&wlkcid={c["wlkcid"]}'
336 | )
337 | )["object"]["rows"]
338 |
339 | os.chdir(pre)
340 | for r in rows:
341 | if c["_type"] == "student":
342 | row_files = get_json(
343 | f'/b/wlxt/kj/wlkc_kjxxb/{c["_type"]}/kjxxb/{c["wlkcid"]}/{r["kjflid"]}'
344 | )["object"]
345 | else:
346 | data = {
347 | "aoData": [
348 | {"name": "wlkcid", "value": c["wlkcid"]},
349 | {"name": "kjflid", "value": r["kjflid"]},
350 | {"name": "iDisplayStart", "value": 0},
351 | {"name": "iDisplayLength", "value": "-1"},
352 | ]
353 | }
354 | row_files = get_json("/b/wlxt/kj/v_kjxxb_wjwjb/teacher/pageList", data)[
355 | "object"
356 | ]["aaData"]
357 | makedirs_safe(escape(r["bt"]))
358 | rnow = os.getcwd()
359 | os.chdir(escape(r["bt"]))
360 | for rf in row_files:
361 | wjlx = None
362 | if c["_type"] == "student":
363 | flag = False
364 | for f in files:
365 | if rf[7] == f["wjid"]:
366 | flag = True
367 | wjlx = f["wjlx"]
368 | break
369 | wjid = rf[7]
370 | name = rf[1]
371 | else:
372 | flag = True
373 | wjlx = rf["wjlx"]
374 | wjid = rf["wjid"]
375 | name = rf["bt"]
376 | if flag:
377 | if wjlx:
378 | name += "." + wjlx
379 | download(
380 | f'/b/wlxt/kj/wlkc_kjxxb/{c["_type"]}/downloadFile?sfgk=0&wjid={wjid}',
381 | name=name,
382 | )
383 | else:
384 | print(f"文件{rf[1]}出错")
385 | os.chdir(rnow)
386 |
387 | os.chdir(now)
388 |
389 |
390 | def sync_info(c):
391 | global dist_path
392 | pre = os.path.join(dist_path, c["kcm"], "课程信息.txt")
393 | try:
394 | if c["_type"] == "student":
395 | html = get_page(
396 | "/f/wlxt/kc/v_kcxx_jskcxx/student/beforeXskcxx?wlkcid=%s&sfgk=-1"
397 | % c["wlkcid"]
398 | )
399 | else:
400 | html = get_page(
401 | "/f/wlxt/kc/v_kcxx_jskcxx/teacher/beforeJskcxx?wlkcid=%s&sfgk=-1"
402 | % c["wlkcid"]
403 | )
404 | open(pre, "w").write(
405 | "\n".join(bs(html, "html.parser").find(class_="course-w").text.split())
406 | )
407 | except:
408 | return
409 |
410 |
411 | def append_hw_csv(fname, stu):
412 | try:
413 | f = [i for i in csv.reader(open(fname)) if i]
414 | except:
415 | f = [["学号", "姓名", "院系", "班级", "上交时间", "状态", "成绩", "批阅老师"]]
416 | info_str = [
417 | stu["xh"],
418 | stu["xm"],
419 | stu["dwmc"],
420 | stu["bm"],
421 | stu["scsjStr"],
422 | stu["zt"],
423 | stu["cj"],
424 | stu["jsm"],
425 | ]
426 | xhs = [i[0] for i in f]
427 | if stu["xh"] in xhs:
428 | i = xhs.index(stu["xh"])
429 | f[i] = info_str
430 | else:
431 | f.append(info_str)
432 |     csv.writer(open(fname, "w", newline="")).writerows(f)  # newline="" avoids blank rows on Windows
433 |
434 |
435 | def sync_hw(c):
436 | global dist_path
437 | now = os.getcwd()
438 | pre = os.path.join(dist_path, c["kcm"], "作业")
439 | if not os.path.exists(pre):
440 | os.makedirs(pre)
441 | data = {"aoData": [{"name": "wlkcid", "value": c["wlkcid"]}]}
442 | if c["_type"] == "student":
443 | hws = []
444 | for hwtype in ["zyListWj", "zyListYjwg", "zyListYpg"]:
445 | try:
446 | hws += get_json("/b/wlxt/kczy/zy/student/%s" % hwtype, data)["object"][
447 | "aaData"
448 | ]
449 | except:
450 | continue
451 | else:
452 | hws = get_json("/b/wlxt/kczy/zy/teacher/pageList", data)["object"]["aaData"]
453 | for hw in hws:
454 | path = os.path.join(pre, escape(hw["bt"]))
455 | if not os.path.exists(path):
456 | os.makedirs(path)
457 | if c["_type"] == "student":
458 | append_hw_csv(os.path.join(path, "info_%s.csv" % c["wlkcid"]), hw)
459 | page = bs(
460 | get_page(
461 | "/f/wlxt/kczy/zy/student/viewCj?wlkcid=%s&zyid=%s&xszyid=%s"
462 | % (hw["wlkcid"], hw["zyid"], hw["xszyid"])
463 | ),
464 | "html.parser",
465 | )
466 | files = page.find_all(class_="fujian")
467 | for i, f in enumerate(files):
468 | if len(f.find_all("a")) == 0:
469 | continue
470 | os.chdir(path) # to avoid filename too long
471 | name = f.find_all("a")[0].text
472 | if i >= 2 and not name.startswith(hw["xh"]):
473 | name = hw["xh"] + "_" + name
474 | download(
475 | "/b/wlxt/kczy/zy/%s/downloadFile/%s/%s"
476 | % (
477 | c["_type"],
478 | hw["wlkcid"],
479 | f.find_all("a")[-1].attrs["onclick"].split("ZyFile('")[-1][:-2],
480 | ),
481 | name=name,
482 | )
483 | os.chdir(now)
484 | else:
485 | print(hw["bt"])
486 | data = {
487 | "aoData": [
488 | {"name": "wlkcid", "value": c["wlkcid"]},
489 | {"name": "zyid", "value": hw["zyid"]},
490 | ]
491 | }
492 | stus = get_json("/b/wlxt/kczy/xszy/teacher/getDoneInfo", data)["object"][
493 | "aaData"
494 | ]
495 | for stu in stus:
496 | append_hw_csv(os.path.join(path, "info_%s.csv" % c["wlkcid"]), stu)
497 | page = bs(
498 | get_page(
499 | "/f/wlxt/kczy/xszy/teacher/beforePiYue?wlkcid=%s&xszyid=%s"
500 | % (stu["wlkcid"], stu["xszyid"])
501 | ),
502 | "html.parser",
503 | )
504 | files = page.find_all(class_="wdhere")
505 | os.chdir(path) # to avoid filename too long
506 | for f in files:
507 | if f.text == "\n":
508 | continue
509 | try:
510 | id = f.find_all("span")[0].attrs["onclick"].split("'")[1]
511 | name = f.find_all("span")[0].text
512 | except:
513 | try:
514 | id = f.find_all("a")[-1].attrs["onclick"].split("'")[1]
515 | name = f.find_all("a")[0].text
516 | except: # another error
517 | continue
518 | if not name.startswith(stu["xh"]):
519 | name = stu["xh"] + "_" + name
520 | download(
521 | "/b/wlxt/kczy/xszy/teacher/downloadFile/%s/%s"
522 | % (stu["wlkcid"], id),
523 | name=name,
524 | )
525 | os.chdir(now)
526 | stus = get_json("/b/wlxt/kczy/xszy/teacher/getUndoInfo", data)["object"][
527 | "aaData"
528 | ]
529 | for stu in stus:
530 | append_hw_csv(os.path.join(path, "info_%s.csv" % c["wlkcid"]), stu)
531 |
532 |
533 | def build_discuss(s):
534 | return "课程:%s\n内容:%s\n学号:%s\n姓名:%s\n发布时间:%s\n最后回复:%s\n回复时间:%s\n" % (
535 | s["kcm"],
536 | s["bt"],
537 | s["fbr"],
538 | s["fbrxm"],
539 | s["fbsj"],
540 | s["zhhfrxm"],
541 | s["zhhfsj"],
542 | )
543 |
544 |
545 | def sync_discuss(c):
546 | global dist_path
547 | pre = os.path.join(dist_path, c["kcm"], "讨论")
548 | if not os.path.exists(pre):
549 | os.makedirs(pre)
550 | try:
551 | disc = get_json(
552 | "/b/wlxt/bbs/bbs_tltb/%s/kctlList?wlkcid=%s" % (c["_type"], c["wlkcid"])
553 | )["object"]["resultsList"]
554 | except:
555 | return
556 | for d in disc:
557 | filename = os.path.join(pre, escape(d["bt"]) + ".txt")
558 | if os.path.exists(filename):
559 | continue
560 | try:
561 | html = get_page(
562 | "/f/wlxt/bbs/bbs_tltb/%s/viewTlById?wlkcid=%s&id=%s&tabbh=2&bqid=%s"
563 | % (c["_type"], d["wlkcid"], d["id"], d["bqid"])
564 | )
565 | open(filename, "w").write(
566 | build_discuss(d) + bs(html, "html.parser").find(class_="detail").text
567 | )
568 | except:
569 | pass
570 |
571 |
572 | def gethash(fname):
573 | if platform.system() == "Linux":
574 | return subprocess.check_output(["md5sum", fname]).decode().split()[0]
575 | hash_md5 = hashlib.md5()
576 | with open(fname, "rb") as f:
577 | for chunk in iter(lambda: f.read(4096), b""):
578 | hash_md5.update(chunk)
579 | return hash_md5.hexdigest()
580 |
581 |
582 | def dfs_clean(d):
583 | subdirs = [
584 | os.path.join(d, i) for i in os.listdir(d) if os.path.isdir(os.path.join(d, i))
585 | ]
586 | for i in subdirs:
587 | dfs_clean(i)
588 | files = [
589 | os.path.join(d, i) for i in os.listdir(d) if os.path.isfile(os.path.join(d, i))
590 | ]
591 | info = {}
592 | for f in files:
593 | if os.path.getsize(f):
594 | info[f] = {
595 | "size": os.path.getsize(f),
596 | "time": os.path.getmtime(f),
597 | "hash": "",
598 | "rm": 0,
599 | }
600 | info = list(
601 | {
602 | k: v for k, v in sorted(info.items(), key=lambda item: item[1]["size"])
603 | }.items()
604 | )
605 | for i in range(len(info)):
606 | for j in range(i):
607 | if info[i][1]["size"] == info[j][1]["size"]:
608 | if info[i][1]["hash"] == "":
609 | info[i][1]["hash"] = gethash(info[i][0])
610 | if info[j][1]["hash"] == "":
611 | info[j][1]["hash"] = gethash(info[j][0])
612 | if info[i][1]["hash"] == info[j][1]["hash"]:
613 | if info[i][1]["time"] < info[j][1]["time"]:
614 | info[i][1]["rm"] = 1
615 | elif info[i][1]["time"] > info[j][1]["time"]:
616 | info[j][1]["rm"] = 1
617 | elif len(info[i][0]) < len(info[j][0]):
618 | info[i][1]["rm"] = 1
619 | elif len(info[i][0]) > len(info[j][0]):
620 | info[j][1]["rm"] = 1
621 | rm = [i[0] for i in info if i[1]["rm"] or i[1]["size"] == 0]
622 | if rm:
623 | print("rmlist:", rm)
624 | for f in rm:
625 | os.remove(f)
626 |
627 |
628 | def clear(args):
629 | courses = [i for i in os.listdir(".") if os.path.isdir(i) and not i.startswith(".")]
630 | if args.all:
631 | pass
632 | else:
633 | if args.course:
634 | courses = [i for i in courses if i in args.course]
635 | if args.ignore:
636 | courses = [i for i in courses if i not in args.ignore]
637 | courses.sort()
638 | for i, c in enumerate(courses):
639 | print("Checking #%d %s" % (i + 1, c))
640 | for subdir in ["课件", "作业"]:
641 | d = os.path.join(c, subdir)
642 | if os.path.exists(d):
643 | dfs_clean(d)
644 |
645 |
646 | def process_course(c, args):
647 |     # multiprocessing worker: rebuilds globals and logs in again inside each process (requires username/password)
648 |     build_global(args)
649 |     login(args.username, args.password)
650 |
651 | c["_type"] = {"0": "teacher", "3": "student"}[c["jslx"]]
652 | print("Sync " + c["xnxq"] + " " + c["kcm"])
653 |
654 | if not os.path.exists(os.path.join(dist_path, c["kcm"])):
655 | os.makedirs(os.path.join(dist_path, c["kcm"]))
656 | sync_info(c)
657 | sync_discuss(c)
658 | sync_notify(c)
659 | sync_file(c)
660 | sync_hw(c)
661 |
662 | return c["kcm"]
663 |
664 |
665 | def main(args):
666 | global dist_path
667 | build_global(args)
668 | assert (
669 | (dist_path is not None)
670 | and (url is not None)
671 | and (user_agent is not None)
672 | and (headers is not None)
673 | and (cookie is not None)
674 | and (opener is not None)
675 | and (err404 is not None)
676 | )
677 | if args.clear:
678 | clear(args)
679 | exit()
680 | args.login = False
681 | if args.cookie:
682 | cookie.load(args.cookie, ignore_discard=True, ignore_expires=True)
683 | args.login = get_page("/b/wlxt/kc/v_wlkc_xs_xktjb_coassb/queryxnxq") != err404
684 | print("login successfully" if args.login else "login failed!")
685 | else:
686 | if os.path.exists(args._pass):
687 | username, password = open(args._pass).read().split()
688 | else:
689 | if not args.username:
690 | args.username = input("请输入INFO账号:")
691 | if not args.password:
692 | args.password = getpass.getpass("请输入INFO密码:")
693 | args.login = login(args.username, args.password)
694 | if args.login:
695 | courses = get_courses(args)
696 | if args.multi:
697 |             # use the CPU core count if no process count was specified
698 | if not args.processes:
699 | args.processes = mp.cpu_count()
700 | print(f"启动多进程下载,进程数:{args.processes}")
701 | pool = mp.Pool(processes=args.processes)
702 | process_func = partial(process_course, args=args)
703 | for _ in tqdm(
704 | pool.imap_unordered(process_func, courses),
705 | total=len(courses),
706 | desc="处理课程",
707 | ):
708 | pass
709 |
710 | pool.close()
711 | pool.join()
712 | else:
713 |             # original single-process path
714 | for c in courses:
715 | c["_type"] = {"0": "teacher", "3": "student"}[c["jslx"]]
716 | print("Sync " + c["xnxq"] + " " + c["kcm"])
717 | if not os.path.exists(os.path.join(dist_path, c["kcm"])):
718 | os.makedirs(os.path.join(dist_path, c["kcm"]))
719 | sync_info(c)
720 | sync_discuss(c)
721 | sync_notify(c)
722 | sync_file(c)
723 | sync_hw(c)
724 |
725 |
726 | def get_args():
727 | parser = argparse.ArgumentParser()
728 | parser.add_argument("--all", action="store_true")
729 | parser.add_argument(
730 | "--clear", action="store_true", help="remove the duplicate course file"
731 | )
732 | parser.add_argument("--semester", nargs="+", type=str, default=[])
733 | parser.add_argument("--ignore", nargs="+", type=str, default=[])
734 | parser.add_argument("--course", nargs="+", type=str, default=[])
735 | parser.add_argument("-p", "--_pass", type=str, default=".pass")
736 | parser.add_argument(
737 | "-c", "--cookie", type=str, default="", help="Netscape HTTP Cookie File"
738 | )
739 | parser.add_argument("-d", "--dist", type=str, default="", help="download path")
740 | parser.add_argument("--http_proxy", type=str, default="", help="http proxy")
741 | parser.add_argument("--https_proxy", type=str, default="", help="https proxy")
742 | parser.add_argument("--username", type=str, default="", help="username")
743 | parser.add_argument("--password", type=str, default="", help="password")
744 | parser.add_argument("--multi", action="store_true", help="multi-process")
745 | parser.add_argument("--processes", type=int, help="concurrent processes")
746 | args = parser.parse_args()
747 | return args
748 |
749 |
750 | if __name__ == "__main__":
751 | main(get_args())
752 |
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | bs4
2 | tqdm
3 | requests
4 |
--------------------------------------------------------------------------------