├── assets
│   ├── post41_1.png
│   ├── post41_2.png
│   ├── post41_3.png
│   ├── post41_4.png
│   ├── post41_5.png
│   ├── post41_6.png
│   ├── post41_7.png
│   ├── post41_8.png
│   └── post41_9.png
├── header_sign.py
├── .gitignore
└── README.md

/assets/post41_1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/yokonsan/qcc_header_sign/HEAD/assets/post41_1.png
--------------------------------------------------------------------------------
/assets/post41_2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/yokonsan/qcc_header_sign/HEAD/assets/post41_2.png
--------------------------------------------------------------------------------
/assets/post41_3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/yokonsan/qcc_header_sign/HEAD/assets/post41_3.png
--------------------------------------------------------------------------------
/assets/post41_4.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/yokonsan/qcc_header_sign/HEAD/assets/post41_4.png
--------------------------------------------------------------------------------
/assets/post41_5.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/yokonsan/qcc_header_sign/HEAD/assets/post41_5.png
--------------------------------------------------------------------------------
/assets/post41_6.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/yokonsan/qcc_header_sign/HEAD/assets/post41_6.png
--------------------------------------------------------------------------------
/assets/post41_7.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/yokonsan/qcc_header_sign/HEAD/assets/post41_7.png
--------------------------------------------------------------------------------
/assets/post41_8.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/yokonsan/qcc_header_sign/HEAD/assets/post41_8.png
--------------------------------------------------------------------------------
/assets/post41_9.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/yokonsan/qcc_header_sign/HEAD/assets/post41_9.png
--------------------------------------------------------------------------------
/header_sign.py:
--------------------------------------------------------------------------------
#!/usr/bin/env python
# -*- coding:utf-8 -*-
"""
Generate the qcc request header.
"""
import hashlib
import hmac
import json
from urllib import parse


class SignTool(object):
    def __init__(self):
        # Lookup table: (character code % 20) -> key character.
        self.seeds = {
            "0": "W",
            "1": "l",
            "2": "k",
            "3": "B",
            "4": "Q",
            "5": "g",
            "6": "f",
            "7": "i",
            "8": "i",
            "9": "r",
            "10": "v",
            "11": "6",
            "12": "A",
            "13": "K",
            "14": "N",
            "15": "k",
            "16": "4",
            "17": "L",
            "18": "1",
            "19": "8"
        }
        self.n = 20

    def generate_map_result(self, s):
        """Derive the HMAC key: lowercase the path, double it, map each character through the seed table."""
        if not s:
            s = "/"
        s = s.lower()
        s = s + s
        k = ''
        for i in s:
            k += self.seeds[str(ord(i) % self.n)]
        return k

    @staticmethod
    def sign_with_hmac(key, s):
        """Return the hex HMAC-SHA512 digest of s under key."""
        return hmac.new(bytes(key, encoding='utf-8'), bytes(s, encoding='utf-8'), hashlib.sha512).hexdigest()

    def get_head_key(self, s):
        """Header name: characters 10..29 of the HMAC of the lowercased path."""
        s = s.lower()
        map_result = self.generate_map_result(s)
        key = self.sign_with_hmac(map_result, s)
        return key[10:10 + 20]

    def get_head_value(self, url, data=None):
        """Header value: HMAC of the doubled lowercased path plus the lowercased JSON body."""
        if not url:
            url = "/"
        if not data:
            data = {}
        key = url.lower()
        # Mirrors JSON.stringify(data).toLowerCase(). JSON.stringify emits no
        # whitespace, so compact separators are needed for non-empty data.
        data_s = json.dumps(data, ensure_ascii=False, separators=(',', ':')).lower()
        enc_data = key + key + data_s
        enc_key = self.generate_map_result(key)
        result = self.sign_with_hmac(enc_key, enc_data)
        return result

    def get_header(self, url):
        """Build the {header_name: header_value} pair for a full request URL."""
        paths = parse.urlparse(url)
        uri = paths.path + "?" + paths.query
        header_key = self.get_head_key(uri)
        header_val = self.get_head_value(uri)
        return {header_key: header_val}


sign_tool = SignTool()

if __name__ == '__main__':
    print(sign_tool.get_header('https://www.qcc.com/api/elib/getNewCompany?countyCode=&flag=&industry=&isSortAsc=false&pageSize=20&province=&sortField=startdate&startDateEnd='))

--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
#  Usually these files are written by a python script from a template
#  before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
.python-version

# pipenv
#   According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
#   However, in case of collaboration, if having platform-specific dependencies or dependencies
#   having no cross-platform support, pipenv may install dependencies that don't work, or not
#   install all needed dependencies.
#Pipfile.lock

# PEP 582; used by e.g. github.com/David-OConnor/pyflow
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

.idea/
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Cracking qcc's request-header anti-scraping check

A friend recently asked me how to get past an anti-scraping measure that [qcc](https://www.qcc.com/web/elib/newcompany) had added to its site.


## Analysis


I captured the traffic and took a quick look. The requests in this module are `ajax` requests, and every one of them carries a request header that looks like an authentication token. It looks like this:

![post41_1](./assets/post41_1.png)

Searching the page's `html` source gives no hint of where this value comes from, and none of the earlier requests carry it either. So it is almost certainly generated dynamically by `js` and attached to the request headers before the request goes to the backend.


With that settled, the next step is to find the `js` code responsible. Looking at the `html`, the page loads only a few `js` files:

![post41_2](./assets/post41_2.png)
![post41_3](./assets/post41_3.png)


Barring surprises, `jquery` can be ignored for now, so start with the files below it. Searching for the keyword `newcompany` more or less confirms which file it is. To stop people from debugging the front end, the site has scattered a large number of `debug` breakpoints through this `js` file. To get around that, use the packet-capture tool `Fiddler` as a local proxy: when the site requests this `js` file, answer with a local copy instead. That way all the breakpoint code can be deleted from the original `js` file, and normal debugging becomes possible.


## Debugging


In `Fiddler`'s `AutoResponder` tab, click to add a rule, drag the `main.8a7cb6af.js` file from the left pane to the right, and configure it as shown below:

![post41_4](./assets/post41_4.png)

Then refresh the page, and the endless breakpoints are gone. Now the real debugging can start: find where the request header is generated, and the logic that generates it.


Front-end debugging is tedious. Following the keyword step by step eventually leads to the key code:

![post41_5](./assets/post41_5.png)

Key code:
```javascript
var i = (0,
a.default)(t)
  , s = (0,
o.default)(t, e.data);

e.headers[i] = s,
```
Rewritten in human-readable form:


```javascript
var i = a.default(t), s = o.default(t, e.data)

e.headers[i] = s
```
Following the call stack back, the `e` here is the object that later carries the request headers, so `i` and `s` are the two values we need.
Stepping further in shows how these two values are generated.

![post41_6](./assets/post41_6.png)

The figure above shows the key logic that generates the initial value. Translated, it reads:
```javascript
var e = '/api/elib/getnewcompany?countycode=&flag=&industry=&issortasc=false&pagesize=20&province=&sortfield=startdate&startdateend=',
    t = e,
    n = t + t,
    i = ""
for (o=0; o