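├── README.md
└── read_requests_v2.22.0
    ├── img
    │   ├── image-20190902152129336.png
    │   └── image-20190906123617049.png
    └── read_requests_v2.22.0.md


/README.md:
--------------------------------------------------------------------------------

> Everyone has their own timetable; don't let anyone else dictate it.
>
> 1.01^365 = 37.78, 1.00^365 = 1, 0.99^365 = 0.03

## Contents

[read_requests_v2.22.0](./read_requests_v2.22.0/read_requests_v2.22.0.md) 2019-9-11

...

## The past

After graduating from university I stayed at the same company. Half a year in, a new department was set up and I started working with embedded Linux, responsible for the development and technical support of a core module. The department was small, so almost everything related to that product fell to me: drawing schematics, debugging drivers, travelling around the country to support customers. It was hard work, but I also grew quickly.

Gradually, product sales got on track, the department turned a profit, and we kept introducing higher-performance Android-platform products. Naturally, the old-platform products focused on retaining existing customers while the new-platform products went after new ones, and the product line grew from modules to complete devices. During that time I got into some application programming: together with colleagues I wrote a lightweight C++ GUI library, an Android-based smart management system, an OTA upgrade program, and so on.

In the second half of last year, the department needed a general-purpose production-test program for its existing products, to complete all hardware and software tests before a product leaves the production line. My first thought was a web service. Considering compatibility and portability, I chose the Django framework, and the task was finished on schedule. That was my first Python project.

There was a small episode along the way. A relative who runs a manufacturing business wanted to get into e-commerce as well and asked whether I could help find people. I thought it over: why not take it on as a side business? I happened to have friends in that line of work, so after talking it over with my wife I gladly agreed and took it on. Then came talking with friends, putting together a team, and paying salaries. Since I'm not from that industry I could hardly help with anything, so I mostly coordinated between the factory and the shop.

Before starting, I had already set a stop-loss line in my mind. Sure enough, because of product positioning, competition and so on, the shop showed no improvement at all after six months. Instead it attracted malicious suppression from competitors, fraud by malicious buyers, and more. Although I was reluctant at the time, I resolutely closed the shop and cut my losses. Since everything spent on the shop came from my own private savings, it had no real impact on the family, and after a while I got over it.

That experience made me realize deeply that starting a business is really, really hard, more than a few sentences can convey. All the more so because a different trade is a different world: many things are unexpected and out of your control.

Whether we are employed or running our own business, I believe a person needs enough capacity to withstand risk, so you have to keep improving and strengthening yourself. Failure is nothing to fear; it shows you clearly what you actually lack.

I'm also grateful for the experience: it let me feel my family's understanding and support, showed me my own shortcomings, and gave me a reason to break out of my routine life.

## Purpose

For some reason I've just reminisced about the past. Perhaps I'm only releasing some emotion, to keep myself clear-headed and not slack off.

Now I'm fully devoted to learning Python. I'm very interested in data analysis and machine learning, and I hope to study them in depth and gain a real skill.

With GitHub as an excellent platform, I will persistently record what I learn, read more source code from projects by Python experts, and absorb the best of it.

I enjoy exchanging ideas, and you're welcome to come and discuss.

## Finally

Let's encourage each other.

--------------------------------------------------------------------------------
/read_requests_v2.22.0/img/image-20190902152129336.png:
--------------------------------------------------------------------------------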
https://raw.githubusercontent.com/BigFlower666/learn_python/e74cb95e889ff03871427f05ce45009a4f0cd610/read_requests_v2.22.0/img/image-20190902152129336.png


--------------------------------------------------------------------------------
/read_requests_v2.22.0/img/image-20190906123617049.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/BigFlower666/learn_python/e74cb95e889ff03871427f05ce45009a4f0cd610/read_requests_v2.22.0/img/image-20190906123617049.png


--------------------------------------------------------------------------------
/read_requests_v2.22.0/read_requests_v2.22.0.md:
--------------------------------------------------------------------------------
## *-1-Preface*

Before writing this article I read a walkthrough of the requests source written by a developer abroad. Its line of thought was clear and it touched on concepts just enough without getting bogged down, and I thought it was quite good. Having had the idea, I got started.

## *-2-Preparation*

Because I'm used to working in the terminal, my editor lately has been vim plus plugins. Personally I think any editor you're comfortable with is fine, as long as it covers the following:

- Code completion
- Fast jump-to-definition and jump-back
- Quick browsing of the file tree and keyword search
- Split windows to show several pages of code
- ...

Source download:

```
$ git clone https://github.com/kennethreitz/requests
```

The tests in the current version of requests are built on pytest. To run them, we need to install all the libraries the test environment requires. requirements.txt in the source tree lists them in detail; install them with pip:

```
$ pip3 install -r requirements.txt
$ python3 -m pytest --version
This is pytest version 5.1.0, imported from /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pytest.py
setuptools registered plugins:
  pytest-httpbin-1.0.0 at /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pytest_httpbin/plugin.py
```

Then let's write a simple example to see how requests is used:

```
>>> import requests
>>> r = requests.get('http://www.baidu.com')
>>> r.status_code
200
>>> r.headers['content-type']
'text/html'
>>> r.encoding
'ISO-8859-1'
>>> r.content
b'...
```

## *-3-Code analysis*

### *step0 Choosing a direction*

Before analysing the requests source, let's sort out our approach. requests has seen 137 releases so far; its features have become more and more complete and its code more and more complex. If I set out to chew through all of the source at once, I doubt I would last three days. So I decided to start with something simple, learn by analogy, and work towards the goal step by step.

Next, open README.md to see which features requests currently supports:

```
Feature Support
---------------
Requests is ready for today's web.

- International Domains and URLs       # internationalized domain names and URLs
- Keep-Alive & Connection Pooling      # keep-alive and connection pooling
- Sessions with Cookie Persistence     # sessions with persistent cookies
- Browser-style SSL Verification       # browser-style SSL verification
- Basic/Digest Authentication          # basic and digest authentication
- Elegant Key/Value Cookies            # simple key/value cookies
- Automatic Decompression              # automatic decompression
- Automatic Content Decoding           # automatic content decoding
- Unicode Response Bodies              # Unicode response bodies
- Multipart File Uploads               # multipart file uploads
- HTTP(S) Proxy Support                # HTTP(S) proxy support
- Connection Timeouts                  # connection timeouts
- Streaming Downloads                  # streaming downloads
- `.netrc` Support                     # '.netrc' support
- Chunked Requests                     # chunked requests
```

For someone like me who isn't very familiar with HTTP and doesn't even understand some of the terms (my translation may also be off), it's enough for now to know these things exist, and to study them in depth later when actually using them.

#### *Starting from the unit tests*

Having seen the basic features of requests, we can start on the source. Where to begin? The source tree has a tests folder, in which the files starting with test are dedicated to testing the requests API:

```
$ ls
__init__.py  compat.py  pytestdebug.log  test_hooks.py  test_packages.py  test_structures.py
testserver  utils.py  __pycache__  conftest.py  test_help.py  test_lowlevel.py
test_requests.py  test_testserver.py  test_utils.py
```

We take test_requests.py as the entry point and pick the Basic/Digest Authentication material. Use git grep to search test_requests.py for Digest-related content:

```
$ git grep -n DIGEST tests/test_requests.py
tests/test_requests.py:587:    def test_DIGEST_HTTP_200_OK_GET(self, httpbin):
tests/test_requests.py:605:    def test_DIGEST_AUTH_RETURNS_COOKIE(self, httpbin):
tests/test_requests.py:616:    def test_DIGEST_AUTH_SETS_SESSION_COOKIES(self, httpbin):
tests/test_requests.py:625:    def test_DIGEST_STREAM(self, httpbin):
tests/test_requests.py:637:    def test_DIGESTAUTH_WRONG_HTTP_401_GET(self, httpbin):
tests/test_requests.py:654:    def test_DIGESTAUTH_QUOTES_QOP_VALUE(self, httpbin):
```

Let's analyse the first method. Find test_DIGEST_HTTP_200_OK_GET(self, httpbin):

```
# test_requests.py
def test_DIGEST_HTTP_200_OK_GET(self, httpbin):

    for authtype in self.digest_auth_algo:
        auth = HTTPDigestAuth('user', 'pass')
        url = httpbin('digest-auth', 'auth', 'user', 'pass', authtype, 'never')

        r = requests.get(url, auth=auth)
        assert r.status_code == 200

        r = requests.get(url)
        assert r.status_code == 401
        print(r.headers['WWW-Authenticate'])

        s = requests.session()
        s.auth = HTTPDigestAuth('user', 'pass')
        r = s.get(url)
        assert r.status_code == 200
```

### *step1 Source overview*

First, a rough look at what this test case does:

```
def test_DIGEST_HTTP_200_OK_GET(self, httpbin):
```

Definition of the test method; the argument passed in is httpbin.

```
for authtype in self.digest_auth_algo:
```

Iterate over the different digest authentication algorithms.

```
auth = HTTPDigestAuth('user', 'pass')
url = httpbin('digest-auth', 'auth', 'user', 'pass', authtype, 'never')
```

Set up the digest authentication variable auth and the url variable.

```
r = requests.get(url, auth=auth)
assert r.status_code == 200

r = requests.get(url)
assert r.status_code == 401
print(r.headers['WWW-Authenticate'])
```

Send GET requests to url: 200 means the request succeeded, 401 means unauthorized. This pair of requests checks that auth really is required.

```
s = requests.session()
s.auth = HTTPDigestAuth('user', 'pass')
r = s.get(url)
assert r.status_code == 200
```
A session object s is created and its auth is set as well. The difference from before is that this request is sent by the session object s.

### *step2 Source analysis*

#### *digest_auth_algo*

```
# test_requests.py
def test_DIGEST_HTTP_200_OK_GET(self, httpbin):
    ...
    for authtype in self.digest_auth_algo:
        ...

--------------------------------------------------------------------------------------------

# test_requests.py
class TestRequests:
    digest_auth_algo = ('MD5', 'SHA-256', 'SHA-512')
    ...
```

Before discussing the digest algorithms, a brief introduction to digest access authentication: it is a protocol-defined method by which a web server negotiates credentials with a web browser. When the browser sends a request it has to pass authentication information auth. The credentials are not sent in clear text: they are run through a digest (hash) algorithm together with values supplied by the server, such as the nonce, and only the resulting digest is sent to the server. If verification succeeds, the server returns "200" to tell the browser it may continue; if it fails, the server returns "401" to tell the browser it is not authorized.

The digest algorithm mentioned above is what this code selects; the test currently uses "MD5", "SHA-256" and "SHA-512". If you want to dig deeper, see [RFC 2069](https://tools.ietf.org/html/rfc2069) (since superseded by RFC 2617 and RFC 7616).

#### *HTTPDigestAuth*

```
# test_requests.py
def test_DIGEST_HTTP_200_OK_GET(self, httpbin):
    ...
    ...
    auth = HTTPDigestAuth('user', 'pass')
    ...

--------------------------------------------------------------------------------------------

# auth.py
class HTTPDigestAuth(AuthBase):
    """Attaches HTTP Digest Authentication to the given Request object."""

    def __init__(self, username, password):
        self.username = username
        self.password = password
        # Keep state in per-thread local storage
        self._thread_local = threading.local()

    def init_per_thread_state(self):
        # Ensure state is initialized just once per-thread
        ...

    ...
```

HTTPDigestAuth provides digest authentication for an HTTP request object. When the auth object is instantiated, the username and password needed for authentication are passed in.

threading.local() is used here to hold what looks like a global variable but is only visible to the current thread. Every thread has its own separate storage for the variable; the copies are logically isolated, and no thread can access another thread's copy.

We can demonstrate digest authentication with a live example:

```
>>> import requests
>>> from requests.auth import HTTPDigestAuth
>>> r = requests.get('http://httpbin.org/digest-auth/auth/user/pass', auth=HTTPDigestAuth('user', 'pass'))
>>> r.status_code
200
```

#### *httpbin*

```
# test_requests.py
def test_DIGEST_HTTP_200_OK_GET(self, httpbin):
    ...
    ...
    url = httpbin('digest-auth', 'auth', 'user', 'pass', authtype, 'never')
    ...

--------------------------------------------------------------------------------------------

# conftest.py
def prepare_url(value):
    # Issue #1483: Make sure the URL always has a trailing slash
    httpbin_url = value.url.rstrip('/') + '/'

    def inner(*suffix):
        return urljoin(httpbin_url, '/'.join(suffix))

    return inner


@pytest.fixture
def httpbin(httpbin):
    return prepare_url(httpbin)
```

This was my first encounter with pytest. After staring at it for a long time without understanding how it was being used, I went searching online.

When we run pytest on test_requests.py, every method in the file whose name starts with test is called and executed, including of course test_DIGEST_HTTP_200_OK_GET(self, httpbin). At this point you'll notice that httpbin shows up in many places. At first glance httpbin looks like a function, because url = httpbin('digest-auth', 'auth', 'user', 'pass', authtype, 'never'), yet the function def httpbin(httpbin) in conftest.py defines only one parameter, and that parameter is also called httpbin... By now I expect you are as confused as I was.

Fortunately pytest offers a set of debugging tools, and we can use them to find out what httpbin really is. First insert set_trace() calls into the code, as follows:

```
# test_requests.py
def test_DIGEST_HTTP_200_OK_GET(self, httpbin):
    ...
    ...
    url = httpbin('digest-auth', 'auth', 'user', 'pass', authtype, 'never')
    pytest.set_trace() # debug
    ...

--------------------------------------------------------------------------------------------

# conftest.py
def prepare_url(value):
    # Issue #1483: Make sure the URL always has a trailing slash
    httpbin_url = value.url.rstrip('/') + '/'

    def inner(*suffix):
        return urljoin(httpbin_url, '/'.join(suffix))

    return inner


@pytest.fixture
def httpbin(httpbin):
    pytest.set_trace() # debug
    return prepare_url(httpbin)
```

To keep things simple, I created a new test file test_tt.py that only calls test_DIGEST_HTTP_200_OK_GET(self, httpbin). The debug output is as follows:

```
$ python3 -m pytest test_tt.py
===================== test session starts ======================
platform darwin -- Python 3.6.5, pytest-2.8.7, py-1.4.31, pluggy-0.3.1
rootdir: /Users/xmy/requests-2.22.0, inifile: pytest.ini
plugins: mock-0.11.0, httpbin-1.0.0, cov-2.2.1
collected 1 items
test_tt.py
=====================PDB set_trace (IO-capturing turned off) =====================
> /Users/xmy/requests-2.22.0/tests/conftest.py(21)httpbin()
-> return prepare_url(httpbin)
(Pdb) httpbin                                           # debug
<pytest_httpbin.serve.Server object at 0x...>           # httpbin return
(Pdb) c                                                 # exit
===================== PDB set_trace (IO-capturing turned off)=====================
> /Users/xmy/requests-2.22.0/tests/test_tt.py(27)test_DIGEST_HTTP_200_OK_GET()
-> url = httpbin('digest-auth', 'auth', 'user', 'pass')
(Pdb) httpbin                                           # debug
<function prepare_url.<locals>.inner at 0x1096e7268>    # httpbin return
(Pdb) url                                               # debug
'http://127.0.0.1:62206/digest-auth/auth/user/pass'     # url return
(Pdb) c                                                 # exit
127.0.0.1 - - [29/Aug/2019 15:41:16] "GET /digest-auth/auth/user/pass HTTP/1.1" 401 0
127.0.0.1 - - [29/Aug/2019 15:41:16] "GET /digest-auth/auth/user/pass HTTP/1.1" 200 46
127.0.0.1 - - [29/Aug/2019 15:41:16] "GET /digest-auth/auth/user/pass HTTP/1.1" 401 0
Digest realm="me@kennethreitz.com",
nonce="46f593d6aedc8fe983c2430da4ddda3f", qop="auth", opaque="0512186cf2e776d90250d8558fd40e4a"
127.0.0.1 - - [29/Aug/2019 15:41:16] "GET /digest-auth/auth/user/pass HTTP/1.1" 401 0
127.0.0.1 - - [29/Aug/2019 15:41:16] "GET /digest-auth/auth/user/pass HTTP/1.1" 200 46
===================== 1 passed in 11.91 seconds =====================
```

In the PDB set_trace output we can see that the httpbin() function in conftest.py is called first. Typing httpbin at the (Pdb) prompt returns `<pytest_httpbin.serve.Server object at 0x...>`. Execution then continues into test_DIGEST_HTTP_200_OK_GET(), where typing httpbin returns `<function prepare_url.<locals>.inner at 0x1096e7268>`.

After debugging, the picture of httpbin is gradually becoming clearer:

- In test_DIGEST_HTTP_200_OK_GET(self, httpbin), the httpbin object is `<function prepare_url.<locals>.inner at 0x1096e7268>`, i.e. the inner(*suffix) function inside prepare_url(value) in the source. A closure is used here. What for? We'll come to that later.

- In httpbin(httpbin), the httpbin parameter is a `pytest_httpbin.serve.Server` object. Hmm? pytest_httpbin is a pytest plugin, so this must have something to do with how pytest makes the call. And what is Server? Let's look at its source:

```
# serve.py
...
from wsgiref.simple_server import WSGIServer, make_server, WSGIRequestHandler
...

class Server(object):
    """
    HTTP server running a WSGI application in its own thread.
    """
    port_envvar = 'HTTPBIN_HTTP_PORT'

    def __init__(self, host='127.0.0.1', port=0, application=None, **kwargs):
        self.app = application
        if self.port_envvar in os.environ:
            port = int(os.environ[self.port_envvar])
        self._server = make_server(
            host,
            port,
            self.app,
            handler_class=Handler,
            **kwargs
        )
        self.host = self._server.server_address[0]
        self.port = self._server.server_address[1]
        self.protocol = 'http'

        self._thread = threading.Thread(
            name=self.__class__,
            target=self._server.serve_forever,
        )
    ...

    ...
    def start(self):
        self._thread.start()
    ...

    ...
    @property
    def url(self):
        return '{0}://{1}:{2}'.format(self.protocol, self.host, self.port)
    ...
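
# ----------------------------------------------------------------------
# Illustrative sketch only, NOT part of pytest-httpbin's serve.py.
# It mirrors the pattern above using only the standard library: a WSGI
# server bound to port 0 (so the OS picks a free port) and served from a
# background thread. demo_app, QuietHandler and DemoServer are
# hypothetical names made up for this sketch.
import http.client
import threading
from wsgiref.simple_server import make_server, WSGIRequestHandler


class QuietHandler(WSGIRequestHandler):
    def log_message(self, *args):
        pass  # silence the per-request log lines


def demo_app(environ, start_response):
    # Smallest possible WSGI application: always answers 200 with "hello".
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'hello']


class DemoServer(object):
    def __init__(self, host='127.0.0.1', port=0, application=None):
        self._server = make_server(host, port, application,
                                   handler_class=QuietHandler)
        # server_address holds the port that was actually bound.
        self.host, self.port = self._server.server_address[:2]
        self._thread = threading.Thread(target=self._server.serve_forever)
        self._thread.daemon = True

    def start(self):
        self._thread.start()

    def stop(self):
        self._server.shutdown()
        self._server.server_close()

    @property
    def url(self):
        return 'http://{0}:{1}'.format(self.host, self.port)


demo_server = DemoServer(application=demo_app)
demo_server.start()
demo_conn = http.client.HTTPConnection(demo_server.host, demo_server.port, timeout=5)
demo_conn.request('GET', '/')
demo_body = demo_conn.getresponse().read()
demo_conn.close()
demo_server.stop()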
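```

So this is a local WSGI server used by pytest for network tests. The advantage is that we don't need an external network: a whole series of network tests can be done locally.

WSGI stands for Web Server Gateway Interface. It is a standard that sits between web applications and web servers. As long as we design a web application to the WSGI interface, we don't need to care about low-level details such as TCP connections or HTTP request handling; those are left entirely to the web server.

The logic of the code above is fairly clear. When the httpbin object is instantiated, \_\_init\_\_(self, host='127.0.0.1', port=0, application=None, \*\*kwargs) is called.

- host is the local loopback address "127.0.0.1".
- port defaults to 0, which lets the operating system pick a free port. If the environment variable HTTPBIN_HTTP_PORT is set, port = int(os.environ[self.port_envvar]) is used instead.
- application is the web application mentioned earlier; if none is passed in, it defaults to None.
- self._server = make_server(host, port, self.app, handler_class=Handler, \*\*kwargs) creates the local server.
- self.\_thread = threading.Thread(name=self.\_\_class\_\_, target=self._server.serve_forever) creates the thread that will listen for HTTP requests.

start(self) starts the thread.

url(self) uses the @property decorator so that the method can be accessed like an attribute. It returns the local server address "http://127.0.0.1:xxxx".


Hmm... that was a lot, yet we still haven't worked out how pytest does its job. That comes next. Before that, here is the earlier code again:

```
# test_requests.py
def test_DIGEST_HTTP_200_OK_GET(self, httpbin):
    ...
    ...
    url = httpbin('digest-auth', 'auth', 'user', 'pass', authtype, 'never')
    ...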

--------------------------------------------------------------------------------------------

# conftest.py
def prepare_url(value):
    # Issue #1483: Make sure the URL always has a trailing slash
    httpbin_url = value.url.rstrip('/') + '/'

    def inner(*suffix):
        return urljoin(httpbin_url, '/'.join(suffix))

    return inner


@pytest.fixture
def httpbin(httpbin):
    return prepare_url(httpbin)
```

The @pytest.fixture decorator declares a function as a fixture. If a test case's parameter list contains the name of a fixture, the fixture function is called before the test case runs and its return value is passed in. A fixture's scope can be specified with the scope parameter.

- @pytest.fixture(scope="function"): a function-scoped fixture is called before each test case that uses it and torn down after that test case finishes. If scope is not given, the default is "function".
- @pytest.fixture(scope="session"): a session-scoped fixture runs only once per test session, i.e. once before the test cases that need it, and its result is shared by all of them.

conftest.py can be thought of as a configuration file for pytest that manages shared setup in one place. It works together with fixtures: pytest picks up the fixtures defined in conftest.py and calls them before running the test cases that use them.

Things are getting clearer, but one step is still missing. We said earlier that the httpbin parameter of the fixture httpbin(httpbin) is a Server object, but when is that object created? It turns out that this httpbin is also a fixture, one that lives in the pytest-httpbin plugin.

```
# plugin.py
...
from httpbin import app as httpbin_app
...

...
@pytest.fixture(scope='session')
def httpbin(request):
    server = serve.Server(application=httpbin_app)
    server.start()
    request.addfinalizer(server.stop)
    return server
...
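
# ----------------------------------------------------------------------
# Illustrative sketch only, NOT part of pytest-httpbin's plugin.py.
# It replays, without pytest, how conftest.py's prepare_url closure keeps
# the server address available between calls. FakeServer is a
# hypothetical stand-in for the Server object, and port 8000 is an
# arbitrary assumption.
from urllib.parse import urljoin


class FakeServer(object):
    url = 'http://127.0.0.1:8000'


def prepare_url_sketch(value):
    # Captured once here; every later call to inner() reuses it.
    httpbin_url = value.url.rstrip('/') + '/'

    def inner(*suffix):
        return urljoin(httpbin_url, '/'.join(suffix))

    return inner


build_url = prepare_url_sketch(FakeServer())
digest_url = build_url('digest-auth', 'auth', 'user', 'pass', 'MD5', 'never')
plain_url = build_url('get')
# The captured variable lives on in the function's closure cell.
captured_base = build_url.__closure__[0].cell_contents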
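```

This is a "session"-scoped fixture. It first instantiates Server as server, passing application=httpbin_app. We mentioned the application parameter earlier: it points to our web application. Here httpbin_app is an alias for app imported from the httpbin package (from httpbin import app as httpbin_app), a web application written specifically for HTTP testing; we won't go into it here. server then calls start(), which starts the thread and brings up the WSGI server, and finally server is returned. All of this matches what we assumed earlier.

Now the mystery is finally solved. Let's summarize the whole flow once more:

- Run the pytest test program: python3 -m pytest test_tt.py.

- The "session"-scoped fixture httpbin(request) is called first: the WSGI server starts and server is returned, waiting to be injected into whatever depends on it.

- Before the test case runs, the "function"-scoped fixture httpbin(httpbin) is called and server is injected into it, i.e. httpbin(server). It then calls prepare_url(server), which obtains the WSGI server address httpbin_url = server.url.rstrip('/') + '/'. Finally it returns the inner(\*suffix) function, whose job is to combine httpbin_url with \*suffix into a complete URL via urljoin. This function, too, waits to be injected.

  (A note on the closure mentioned earlier. A closure needs three things: 1. a nested function; 2. the nested function must refer to a variable defined in the enclosing scope, i.e. a variable of the outer function; 3. the outer function must return the nested function. What a closure gives you: the nested function keeps access to the outer function's variables even after the outer function has returned. Here prepare_url(value) is the outer function, httpbin_url is the captured variable, and inner(\*suffix) is the nested function. When a test case is about to run, the fixture is called first and returns the nested function inner(\*suffix), which is then injected into the test case. After injection, the fixture value is exactly that inner(\*suffix) function. A test case may call it several times, and each call uses the captured httpbin_url, the WSGI server address, which is unique and does not change. Keeping httpbin_url available across those calls is why a closure is used.)

- The test case test_DIGEST_HTTP_200_OK_GET(self, httpbin) is executed with the fixture value injected: httpbin = inner.

- The url is obtained: url = inner('digest-auth', 'auth', 'user', 'pass', authtype, 'never'). urljoin(httpbin_url, '/'.join(suffix)) is called and finally returns url="http://127.0.0.1:xxxx/digest-auth/auth/user/pass/(authtype)/never", where authtype is MD5, SHA-256 or SHA-512.

Analysing httpbin took quite a while. A few short lines of code led to a lot of material and let us put a growing body of knowledge to practical use; I think that may be the charm of reading source code. Since words cannot fully convey the abstraction of code, some readers may still not have understood this part. For their sake, I drew a simple flowchart to help.

![image-20190902152129336](./img/image-20190902152129336.png)

This should be very clear now. Doesn't it all suddenly make sense? Surprised?

Finally, attentive readers may have noticed that the debug output earlier returned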
url="http://127.0.0.1:62206/digest-auth/auth/user/pass", rather than the url="http://127.0.0.1:62206/digest-auth/auth/user/pass/MD5/never" form used in our test case.

```
===================== PDB set_trace (IO-capturing turned off)=====================
> /Users/xmy/requests-2.22.0/tests/test_tt.py(27)test_DIGEST_HTTP_200_OK_GET()
-> url = httpbin('digest-auth', 'auth', 'user', 'pass')
(Pdb) httpbin                                           # debug
<function prepare_url.<locals>.inner at 0x1096e7268>    # httpbin return
(Pdb) url                                               # debug
'http://127.0.0.1:62206/digest-auth/auth/user/pass'     # url return
(Pdb) c                                                 # exit
```

That is because, with the URL from the test case, the test server returned a 404 error (resource not found), so I removed the last two arguments. This looks like a problem on the test server side, possibly a bug. I didn't think it mattered much, so I didn't dig into it. If anyone finds the answer, please let me know.

#### *get*

```
# test_requests.py
...
import requests
...

def test_DIGEST_HTTP_200_OK_GET(self, httpbin):
    ...
    ...
    r = requests.get(url, auth=auth)
    assert r.status_code == 200

    r = requests.get(url)
    assert r.status_code == 401
    print(r.headers['WWW-Authenticate'])
    ...

--------------------------------------------------------------------------------------------

# api.py
def get(url, params=None, **kwargs):
    r"""Sends a GET request.

    :param url: URL for the new :class:`Request` object.
    :param params: (optional) Dictionary, list of tuples or bytes to send
        in the query string for the :class:`Request`.
    :param \*\*kwargs: Optional arguments that ``request`` takes.
    :return: :class:`Response <Response>` object
    :rtype: requests.Response
    """

    kwargs.setdefault('allow_redirects', True)
    return request('get', url, params=params, **kwargs)
...
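
# ----------------------------------------------------------------------
# Illustrative sketch only, NOT part of requests' api.py.
# It isolates what kwargs.setdefault does in get(): the default is filled
# in only when the caller did not pass the key. get_sketch is a
# hypothetical helper that returns the keyword arguments instead of
# sending a request; the example.invalid URL is a placeholder.
def get_sketch(url, params=None, **kwargs):
    kwargs.setdefault('allow_redirects', True)
    return dict(kwargs, method='get', url=url, params=params)


defaulted = get_sketch('http://example.invalid/')
overridden = get_sketch('http://example.invalid/', allow_redirects=False)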
```

This function sends a GET request to the address given by url. Its parameters are:

- url: short for Uniform Resource Locator, i.e. the unique address on the internet of the thing being accessed.
- params: optional; a dictionary (a list of tuples or bytes also works) of query parameters for the request, which end up encoded into the URL.
- \*\*kwargs: the \*\* prefix collects any extra keyword arguments into a dictionary inside the function; they are passed on as optional arguments to request.

kwargs.setdefault('allow_redirects', True) sets a default key/value pair: if the key 'allow_redirects' is not present, it is inserted with the value True.

The function returns the result of calling request, i.e. a Response object.

#### *request*

```
# api.py
from . import sessions

def request(method, url, **kwargs):
    """Constructs and sends a :class:`Request <Request>`.

    :param method: method for the new :class:`Request` object: ``GET``, ``OPTIONS``,
        ``HEAD``, ``POST``, ``PUT``, ``PATCH``, or ``DELETE``.
    :param url: URL for the new :class:`Request` object.
    :param params: (optional) Dictionary, list of tuples or bytes to send
        in the query string for the :class:`Request`.
    :param data: (optional) Dictionary, list of tuples, bytes, or file-like
        object to send in the body of the :class:`Request`.
    :param json: (optional) A JSON serializable Python object to send in the body of the
        :class:`Request`.
    :param headers: (optional) Dictionary of HTTP Headers to send with the :class:`Request`.
    :param cookies: (optional) Dict or CookieJar object to send with the :class:`Request`.
    :param files: (optional) Dictionary of ``'name': file-like-objects`` (or ``{'name':
        file-tuple}``) for multipart encoding upload.
        ``file-tuple`` can be a 2-tuple ``('filename', fileobj)``, 3-tuple ``('filename',
        fileobj, 'content_type')`` or a 4-tuple ``('filename', fileobj, 'content_type', custom_headers)``,
        where ``'content-type'`` is a string
        defining the content type of the given file and ``custom_headers`` a dict-like
        object containing additional headers to add for the file.
    :param auth: (optional) Auth tuple to enable Basic/Digest/Custom HTTP Auth.
    :param timeout: (optional) How many seconds to wait for the server to send data
        before giving up, as a float, or a :ref:`(connect timeout, read
        timeout) <timeouts>` tuple.
    :type timeout: float or tuple
    :param allow_redirects: (optional) Boolean. Enable/disable
        GET/OPTIONS/POST/PUT/PATCH/DELETE/HEAD redirection. Defaults to ``True``.
    :type allow_redirects: bool
    :param proxies: (optional) Dictionary mapping protocol to the URL of the proxy.
    :param verify: (optional) Either a boolean, in which case it controls whether we verify
        the server's TLS certificate, or a string, in which case it must be a path
        to a CA bundle to use. Defaults to ``True``
    :param stream: (optional) if ``False``, the response content will be immediately
        downloaded.
    :param cert: (optional) if String, path to ssl client cert file (.pem). If Tuple,
        ('cert', 'key') pair.
    :return: :class:`Response <Response>` object
    :rtype: requests.Response

    Usage::
      >>> import requests
      >>> req = requests.request('GET', 'https://httpbin.org/get')
      <Response [200]>
    """

    # By using the 'with' statement we are sure the session is closed, thus we
    # avoid leaving sockets open which can trigger a ResourceWarning in some
    # cases, and look like a memory leak in others.
    with sessions.Session() as session:
        return session.request(method=method, url=url, **kwargs)

def get(url, params=None, **kwargs):
    ...
    return request('get', url, params=params, **kwargs)
...
```

The request function takes many parameters:

- method: required. Sets the HTTP method of the request, e.g. "GET", "OPTIONS", "HEAD", "POST", "PUT", "PATCH", "DELETE". Here we came in through get, so the value is "get".

- url: required. The unique address of the request target.

- params: optional. A dictionary of query parameters for the request, which end up encoded into the URL.

- data: optional. Form data to send in the request body.

- json: optional. A JSON-serializable object to send in the request body.

- headers: optional. A dictionary used to set the request headers.

- cookies: optional. Cookies to send with the request.

- files: optional. File objects to send, e.g. for file uploads.

- auth: optional. Authentication settings.

- timeout: optional. Timeout settings.

- allow_redirects: optional. Enables or disables redirects.

- proxies: optional. Proxy settings.

- verify: optional. Controls verification of the server's certificate.

- stream: optional. Controls whether the response body is streamed or downloaded immediately.

- cert: optional. Path to, or (cert, key) pair for, the client certificate.

with sessions.Session() as session: the with statement makes sure the session object is shut down correctly whether or not the body ran normally, so that an exception in the program cannot leave sockets open. We analyse the details in the next section.

Finally, the result of session.request(...) is returned.

#### *Session*

```
# api.py
from . import sessions

def request(method, url, **kwargs):
    """Constructs and sends a :class:`Request <Request>`."""
    ...
    with sessions.Session() as session:
        return session.request(method=method, url=url, **kwargs)
...

--------------------------------------------------------------------------------------------

# sessions.py
class Session(SessionRedirectMixin):
    """A Requests session.
    ...
    Provides cookie persistence, connection-pooling, and configuration.
    ...
    """

    ...
    def __init__(self):

        #: A case-insensitive dictionary of headers to be sent on each
        #: :class:`Request <Request>` sent from this
        #: :class:`Session <Session>`.
        self.headers = default_headers()

        #: Default Authentication tuple or object to attach to
        #: :class:`Request <Request>`.
        self.auth = None

        #: Dictionary mapping protocol or protocol and host to the URL of the proxy
        #: (e.g.
        #: {'http': 'foo.bar:3128', 'http://host.name': 'foo.bar:4012'}) to
        #: be used on each :class:`Request <Request>`.
        self.proxies = {}

        #: Event-handling hooks.
        self.hooks = default_hooks()

        #: Dictionary of querystring data to attach to each
        #: :class:`Request <Request>`. The dictionary values may be lists for
        #: representing multivalued query parameters.
        self.params = {}

        #: Stream response content default.
        self.stream = False

        #: SSL Verification default.
        self.verify = True

        #: SSL client certificate default, if String, path to ssl client
        #: cert file (.pem). If Tuple, ('cert', 'key') pair.
        self.cert = None

        #: Maximum number of redirects allowed. If the request exceeds this
        #: limit, a :class:`TooManyRedirects` exception is raised.
        #: This defaults to requests.models.DEFAULT_REDIRECT_LIMIT, which is
        #: 30.
        self.max_redirects = DEFAULT_REDIRECT_LIMIT

        #: Trust environment settings for proxy configuration, default
        #: authentication and similar.
        self.trust_env = True

        #: A CookieJar containing all currently outstanding cookies set on this
        #: session. By default it is a
        #: :class:`RequestsCookieJar <requests.cookies.RequestsCookieJar>`, but
        #: may be any other ``cookielib.CookieJar`` compatible object.
        self.cookies = cookiejar_from_dict({})

        # Default connection adapters.
        self.adapters = OrderedDict()
        self.mount('https://', HTTPAdapter())
        self.mount('http://', HTTPAdapter())

    def __enter__(self):
        return self

    def __exit__(self, *args):
        self.close()
    ...

    ...
    def request(self, method, url,
            params=None, data=None, headers=None, cookies=None, files=None,
            auth=None, timeout=None, allow_redirects=True, proxies=None,
            hooks=None, stream=None, verify=None, cert=None, json=None):
        """Constructs a :class:`Request <Request>`, prepares it and sends it.
        Returns :class:`Response <Response>` object.
        ...
        """
        # Create the Request.
        req = Request(
            method=method.upper(),
            url=url,
            headers=headers,
            files=files,
            data=data or {},
            json=json,
            params=params or {},
            auth=auth,
            cookies=cookies,
            hooks=hooks,
        )
        prep = self.prepare_request(req)

        proxies = proxies or {}

        settings = self.merge_environment_settings(
            prep.url, proxies, stream, verify, cert
        )

        # Send the request.
        send_kwargs = {
            'timeout': timeout,
            'allow_redirects': allow_redirects,
        }
        send_kwargs.update(settings)
        resp = self.send(prep, **send_kwargs)

        return resp

    def get(self, url, **kwargs):
        r"""Sends a GET request. Returns :class:`Response` object.

        :param url: URL for the new :class:`Request` object.
        :param \*\*kwargs: Optional arguments that ``request`` takes.
        :rtype: requests.Response
        """

        kwargs.setdefault('allow_redirects', True)
        return self.request('GET', url, **kwargs)
    ...

    ...
    def post(self, url, data=None, json=None, **kwargs):
        r"""Sends a POST request. Returns :class:`Response` object.

        :param url: URL for the new :class:`Request` object.
        :param data: (optional) Dictionary, list of tuples, bytes, or file-like
            object to send in the body of the :class:`Request`.
        :param json: (optional) json to send in the body of the :class:`Request`.
        :param \*\*kwargs: Optional arguments that ``request`` takes.
        :rtype: requests.Response
        """

        return self.request('POST', url, data=data, json=json, **kwargs)
    ...

    ...
    def close(self):
        """Closes all adapters and as such the session"""
        for v in self.adapters.values():
            v.close()
    ...
```

What is a Session? We have excerpted part of it; to summarize what it does: it supports persistent cookies, uses urllib3's connection pooling, holds configuration, supplies parameters to the request object, and has all the request methods. So this is where all of our settings really start to take effect: inside the Session object.

Session also inherits from SessionRedirectMixin, a class that implements the redirect-handling methods. Redirection means that when we request a resource from the server at the path given by url and the resource is not at that path, the server responds with the resource's new address, and we send the request again to the new url. We won't expand on this here; if related material comes up later, we'll analyse it then.

Next, let's analyse how Session is invoked.

As mentioned earlier, Session is used through a with statement. So what is with? It implements context management. And what is context management? As the previous section said, it guarantees that the object managed by with is shut down correctly whether or not the body ran normally. The general form of the with statement is:

```
with expression [as variable]:
    with-block
```

The [as variable] part of the with statement is optional. If as variable is given, variable is the object returned by the context manager's expression.\_\_enter\_\_() method. with-block is the body; when it finishes, the with statement automatically calls expression.\_\_exit\_\_() to clean up resources.

Now let's look at the with statement in the source and at the part that implements the context manager methods:

```
# api.py
from . import sessions

def request(method, url, **kwargs):
    """Constructs and sends a :class:`Request <Request>`."""
    ...
    with sessions.Session() as session:
        return session.request(method=method, url=url, **kwargs)
...

--------------------------------------------------------------------------------------------

# sessions.py
class Session(SessionRedirectMixin):

    ...
    def __enter__(self):
        return self

    def __exit__(self, *args):
        self.close()
    ...

    ...
    def close(self):
        """Closes all adapters and as such the session"""
        for v in self.adapters.values():
            v.close()
    ...
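
# ----------------------------------------------------------------------
# Illustrative sketch only, NOT part of requests' sessions.py.
# ToySession is a hypothetical class that copies the __enter__/__exit__
# pattern above to show that close() runs both when the with-block ends
# normally and when it raises.
class ToySession(object):
    def __init__(self):
        self.closed = False

    def __enter__(self):
        return self

    def __exit__(self, *args):
        self.close()  # returns None, so an exception is not swallowed

    def close(self):
        self.closed = True


with ToySession() as ok_session:
    # The name bound by "as" is whatever __enter__ returned.
    entered_same_type = isinstance(ok_session, ToySession)

try:
    with ToySession() as failing_session:
        raise RuntimeError('boom')
except RuntimeError:
    error_propagated = True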
844 | ``` 845 | 846 | Read alongside the with statement, this code is straightforward: 847 | 848 | - session = sessions.Session().\_\_enter\_\_(), which is the Session instance itself. 849 | - session.request(method=method, url=url, \*\*kwargs) is the body of the with statement. 850 | 851 | Once session.request returns, the Session instance's \_\_exit\_\_(self, \*args) method is called, which in turn runs close(self) and releases the resources held by the Session before control leaves the with statement. 852 | 853 | __This is what the with statement is for. Make sure you understand it, because it matters for the session discussion later on__. 854 | 855 | You may have noticed that once the with statement finishes, requests.get has finished too, and the request is complete. 856 | 857 | But our goal is to understand the whole path, so let's keep going. 858 | 859 | #### *Session.\_\_init\_\_* 860 | 861 | Go back to the moment the with statement evaluates sessions.Session(), just before \_\_enter\_\_(self) hands the instance to session. At that point the Session object is instantiated and its initializer \_\_init\_\_(self) runs: 862 | 863 | ``` 864 | # sessions.py 865 | class Session(SessionRedirectMixin): 866 | 867 | ... 868 | def __init__(self): 869 | 870 | #: A case-insensitive dictionary of headers to be sent on each 871 | #: :class:`Request <Request>` sent from this 872 | #: :class:`Session <Session>`. 873 | self.headers = default_headers() 874 | 875 | #: Default Authentication tuple or object to attach to 876 | #: :class:`Request <Request>`. 877 | self.auth = None 878 | 879 | #: Dictionary mapping protocol or protocol and host to the URL of the proxy 880 | #: (e.g. {'http': 'foo.bar:3128', 'http://host.name': 'foo.bar:4012'}) to 881 | #: be used on each :class:`Request <Request>`. 882 | self.proxies = {} 883 | 884 | #: Event-handling hooks. 885 | self.hooks = default_hooks() 886 | 887 | #: Dictionary of querystring data to attach to each 888 | #: :class:`Request <Request>`. The dictionary values may be lists for 889 | #: representing multivalued query parameters. 890 | self.params = {} 891 | 892 | #: Stream response content default. 893 | self.stream = False 894 | 895 | #: SSL Verification default. 896 | self.verify = True 897 | 898 | #: SSL client certificate default, if String, path to ssl client 899 | #: cert file (.pem). If Tuple, ('cert', 'key') pair. 900 | self.cert = None 901 | 902 | #: Maximum number of redirects allowed.
If the request exceeds this 903 | #: limit, a :class:`TooManyRedirects` exception is raised. 904 | #: This defaults to requests.models.DEFAULT_REDIRECT_LIMIT, which is 905 | #: 30. 906 | self.max_redirects = DEFAULT_REDIRECT_LIMIT 907 | 908 | #: Trust environment settings for proxy configuration, default 909 | #: authentication and similar. 910 | self.trust_env = True 911 | 912 | #: A CookieJar containing all currently outstanding cookies set on this 913 | #: session. By default it is a 914 | #: :class:`RequestsCookieJar <requests.cookies.RequestsCookieJar>`, but 915 | #: may be any other ``cookielib.CookieJar`` compatible object. 916 | self.cookies = cookiejar_from_dict({}) 917 | 918 | # Default connection adapters. 919 | self.adapters = OrderedDict() 920 | self.mount('https://', HTTPAdapter()) 921 | self.mount('http://', HTTPAdapter()) 922 | ... 923 | ``` 924 | 925 | The initializer mainly sets defaults for headers, auth, proxies, stream, verify, cookies, hooks and so on. For example, if a request does not pass the headers parameter, the defaults produced by default_headers() are used: 926 | 927 | ``` 928 | # sessions.py 929 | class Session(SessionRedirectMixin): 930 | ... 931 | def __init__(self): 932 | #: A case-insensitive dictionary of headers to be sent on each 933 | #: :class:`Request <Request>` sent from this 934 | #: :class:`Session <Session>`. 935 | self.headers = default_headers() 936 | ... 937 | ... 938 | 939 | -------------------------------------------------------------------------------------------- 940 | 941 | # utils.py 942 | ... 943 | def default_user_agent(name="python-requests"): 944 | """ 945 | Return a string representing the default user agent. 946 | 947 | :rtype: str 948 | """ 949 | return '%s/%s' % (name, __version__) 950 | 951 | def default_headers(): 952 | """ 953 | :rtype: requests.structures.CaseInsensitiveDict 954 | """ 955 | return CaseInsensitiveDict({ 956 | 'User-Agent': default_user_agent(), 957 | 'Accept-Encoding': ', '.join(('gzip', 'deflate')), 958 | 'Accept': '*/*', 959 | 'Connection': 'keep-alive', 960 | }) 961 | ...
962 | ``` 963 | 964 | Notice that the default 'User-Agent' header is set to "python-requests/" followed by the library version (python-requests/2.22.0 for this release). If you are writing a crawler to scrape a website, consider changing the User-Agent early, because the server may quickly reject requests that identify themselves as coming from a Python client. 965 | 966 | hooks initialization: 967 | 968 | ``` 969 | # sessions.py 970 | class Session(SessionRedirectMixin): 971 | ... 972 | def __init__(self): 973 | ... 974 | #: Event-handling hooks. 975 | self.hooks = default_hooks() 976 | ... 977 | ... 978 | 979 | -------------------------------------------------------------------------------------------- 980 | 981 | # hooks.py 982 | ... 983 | """ 984 | Available hooks: 985 | 986 | ``response``: 987 | The response generated from a Request. 988 | """ 989 | HOOKS = ['response'] 990 | 991 | def default_hooks(): 992 | return {event: [] for event in HOOKS} 993 | ... 994 | ``` 995 | 996 | hooks are event hooks: they let you intervene in part of the request process or handle events. requests has a hook system that lets you run your own code on the response a request produces, before it is handed back to the caller. default_hooks() in the source above uses a dict comprehension and returns {'response':[]}. Here is a brief description of how hooks are used. 997 | 998 | First, pass a dictionary {hook_name:callback_function} as the hooks parameter. hook_name is the name of the hook, here "response"; callback_function is the function called back when the event fires. It receives the Response object as its first argument and is defined as def callback_function(r, \*args, \*\*kwargs). 999 | 1000 | default_hooks() returns {'response':[]}, so the value under "response" is a list; in other words, one hook event can have several callbacks. Here is a small example. 1001 | 1002 | ``` 1003 | >>> def hooks1(r, *args, **kwargs): 1004 | ... print("hooks1 url=" + r.url) 1005 | ... 1006 | >>> def hooks2(r, *args, **kwargs): 1007 | ... print("hooks2 encoding=" + r.encoding) 1008 | ... 1009 | >>> hooks = dict(response=[hooks1, hooks2]) 1010 | >>> requests.get("http://httpbin.org", hooks=hooks) 1011 | hooks1 url=http://httpbin.org/ 1012 | hooks2 encoding=utf-8 1013 | <Response [200]> 1014 | ``` 1015 | 1016 | cookies initialization: 1017 | 1018 | ``` 1019 | # sessions.py 1020 | class Session(SessionRedirectMixin): 1021 | ... 1022 | def __init__(self): 1023 | ... 1024 | #: A CookieJar containing all currently outstanding cookies set on this 1025 | #: session.
By default it is a 1026 | #: :class:`RequestsCookieJar <requests.cookies.RequestsCookieJar>`, but 1027 | #: may be any other ``cookielib.CookieJar`` compatible object. 1028 | self.cookies = cookiejar_from_dict({}) 1029 | ... 1030 | ... 1031 | 1032 | -------------------------------------------------------------------------------------------- 1033 | 1034 | # cookies.py 1035 | ... 1036 | def cookiejar_from_dict(cookie_dict, cookiejar=None, overwrite=True): 1037 | """Returns a CookieJar from a key/value dictionary. 1038 | 1039 | :param cookie_dict: Dict of key/values to insert into CookieJar. 1040 | :param cookiejar: (optional) A cookiejar to add the cookies to. 1041 | :param overwrite: (optional) If False, will not replace cookies 1042 | already in the jar with new ones. 1043 | :rtype: CookieJar 1044 | """ 1045 | if cookiejar is None: 1046 | cookiejar = RequestsCookieJar() 1047 | 1048 | if cookie_dict is not None: 1049 | names_from_jar = [cookie.name for cookie in cookiejar] 1050 | for name in cookie_dict: 1051 | if overwrite or (name not in names_from_jar): 1052 | cookiejar.set_cookie(create_cookie(name, cookie_dict[name])) 1053 | 1054 | return cookiejar 1055 | ... 1056 | ``` 1057 | 1058 | The helper cookiejar_from_dict(cookie_dict, cookiejar=None, overwrite=True) inserts the cookies from a dictionary into a cookiejar and returns it. A CookieJar manages HTTP cookie values: it stores the cookies generated by HTTP requests and adds cookies to outgoing requests. Everything is kept in memory, so the cookies are lost once the CookieJar instance is destroyed. 1059 | 1060 | adapters initialization: 1061 | 1062 | ``` 1063 | # sessions.py 1064 | class Session(SessionRedirectMixin): 1065 | ... 1066 | def __init__(self): 1067 | ... 1068 | # Default connection adapters. 1069 | self.adapters = OrderedDict() 1070 | self.mount('https://', HTTPAdapter()) 1071 | self.mount('http://', HTTPAdapter()) 1072 | ... 1073 | ... 1074 | 1075 | ... 1076 | def mount(self, prefix, adapter): 1077 | """Registers a connection adapter to a prefix. 1078 | 1079 | Adapters are sorted in descending order by prefix length.
1080 | """ 1081 | self.adapters[prefix] = adapter 1082 | keys_to_move = [k for k in self.adapters if len(k) < len(prefix)] 1083 | 1084 | for key in keys_to_move: 1085 | self.adapters[key] = self.adapters.pop(key) 1086 | 1087 | -------------------------------------------------------------------------------------------- 1088 | 1089 | # adapters.py 1090 | class HTTPAdapter(BaseAdapter): 1091 | """The built-in HTTP Adapter for urllib3. 1092 | 1093 | Provides a general-case interface for Requests sessions to contact HTTP and 1094 | HTTPS urls by implementing the Transport Adapter interface. This class will 1095 | usually be created by the :class:`Session <Session>` class under the 1096 | covers. 1097 | ... 1098 | ... 1099 | ``` 1100 | 1101 | self.adapters = OrderedDict() binds adapters to a new ordered dictionary that stores the transport adapters. A transport adapter provides a mechanism for defining how to interact with an HTTP service. 1102 | 1103 | requests ships with one transport adapter, the HTTPAdapter shown in the source. It is built on the urllib3 library and provides the default HTTP and HTTPS interaction for requests. 1104 | 1105 | mount registers a specific transport adapter instance under a prefix. Once mounted, any HTTP request made through the session whose URL starts with that prefix uses that adapter. 1106 | 1107 | So every time a Session is instantiated, adapters are attached to it. Both the http:// and https:// prefixes use the same adapter class, HTTPAdapter, each with its own instance. 1108 | 1109 | The remaining parameters are initialized as their names suggest. 1110 | 1111 | #### *Session.request* 1112 | 1113 | After the Session object is instantiated and bound to session, its request method is called: 1114 | 1115 | ``` 1116 | # api.py 1117 | from . import sessions 1118 | 1119 | def request(method, url, **kwargs): 1120 | """Constructs and sends a :class:`Request <Request>`.""" 1121 | ... 1122 | with sessions.Session() as session: 1123 | return session.request(method=method, url=url, **kwargs) 1124 | ... 1125 | 1126 | -------------------------------------------------------------------------------------------- 1127 | 1128 | # sessions.py 1129 | class Session(SessionRedirectMixin): 1130 | """A Requests session. 1131 | Provides cookie persistence, connection-pooling, and configuration. 1132 | ... 1133 | """ 1134 | ...
1135 | def request(self, method, url, 1136 | params=None, data=None, headers=None, cookies=None, files=None, 1137 | auth=None, timeout=None, allow_redirects=True, proxies=None, 1138 | hooks=None, stream=None, verify=None, cert=None, json=None): 1139 | """Constructs a :class:`Request <Request>`, prepares it and sends it. 1140 | Returns :class:`Response <Response>` object. 1141 | ... 1142 | :rtype: requests.Response 1143 | """ 1144 | # Create the Request. 1145 | req = Request( 1146 | method=method.upper(), 1147 | url=url, 1148 | headers=headers, 1149 | files=files, 1150 | data=data or {}, 1151 | json=json, 1152 | params=params or {}, 1153 | auth=auth, 1154 | cookies=cookies, 1155 | hooks=hooks, 1156 | ) 1157 | prep = self.prepare_request(req) 1158 | 1159 | proxies = proxies or {} 1160 | 1161 | settings = self.merge_environment_settings( 1162 | prep.url, proxies, stream, verify, cert 1163 | ) 1164 | 1165 | # Send the request. 1166 | send_kwargs = { 1167 | 'timeout': timeout, 1168 | 'allow_redirects': allow_redirects, 1169 | } 1170 | send_kwargs.update(settings) 1171 | resp = self.send(prep, **send_kwargs) 1172 | 1173 | return resp 1174 | ... 1175 | ``` 1176 | 1177 | As its docstring says, request constructs a Request object, prepares it, sends it, and finally returns a Response object. Let's look at Request: 1178 | 1179 | ``` 1180 | # sessions.py 1181 | class Session(SessionRedirectMixin): 1182 | ... 1183 | def request(self, method, url, ...): 1184 | ... 1185 | # Create the Request. 1186 | req = Request( 1187 | method=method.upper(), 1188 | url=url, 1189 | headers=headers, 1190 | files=files, 1191 | data=data or {}, 1192 | json=json, 1193 | params=params or {}, 1194 | auth=auth, 1195 | cookies=cookies, 1196 | hooks=hooks, 1197 | ) 1198 | ... 1199 | ... 1200 | 1201 | -------------------------------------------------------------------------------------------- 1202 | 1203 | # models.py 1204 | ... 1205 | class Request(RequestHooksMixin): 1206 | """A user-created :class:`Request <Request>` object.
1207 | Used to prepare a :class:`PreparedRequest <PreparedRequest>`, which is sent to the server. 1208 | ... 1209 | """ 1210 | 1211 | def __init__(self, 1212 | method=None, url=None, headers=None, files=None, data=None, 1213 | params=None, auth=None, cookies=None, hooks=None, json=None): 1214 | 1215 | # Default empty dicts for dict params. 1216 | data = [] if data is None else data 1217 | files = [] if files is None else files 1218 | headers = {} if headers is None else headers 1219 | params = {} if params is None else params 1220 | hooks = {} if hooks is None else hooks 1221 | 1222 | self.hooks = default_hooks() 1223 | for (k, v) in list(hooks.items()): 1224 | self.register_hook(event=k, hook=v) 1225 | 1226 | self.method = method 1227 | self.url = url 1228 | self.headers = headers 1229 | self.files = files 1230 | self.data = data 1231 | self.json = json 1232 | self.params = params 1233 | self.auth = auth 1234 | self.cookies = cookies 1235 | ... 1236 | ... 1237 | ``` 1238 | 1239 | Request inherits from RequestHooksMixin, which provides the methods for registering and deregistering hooks. The initializer registers the hooks passed in and then stores the remaining parameters. A Request object exists to prepare for the PreparedRequest object created later. 1240 | 1241 | Once the Request instance is built, prepare_request is called: 1242 | 1243 | ``` 1244 | # sessions.py 1245 | 1246 | class Session(SessionRedirectMixin): 1247 | """A Requests session. 1248 | Provides cookie persistence, connection-pooling, and configuration. 1249 | ... 1250 | """ 1251 | 1252 | ... 1253 | def prepare_request(self, request): 1254 | """Constructs a :class:`PreparedRequest <PreparedRequest>` for 1255 | transmission and returns it. The :class:`PreparedRequest` has settings 1256 | merged from the :class:`Request <Request>` instance and those of the 1257 | :class:`Session`. 1258 | 1259 | :param request: :class:`Request` instance to prepare with this 1260 | session's settings. 1261 | :rtype: requests.PreparedRequest 1262 | """ 1263 | cookies = request.cookies or {} 1264 | 1265 | # Bootstrap CookieJar.
1266 | if not isinstance(cookies, cookielib.CookieJar): 1267 | cookies = cookiejar_from_dict(cookies) 1268 | 1269 | # Merge with session cookies 1270 | merged_cookies = merge_cookies( 1271 | merge_cookies(RequestsCookieJar(), self.cookies), cookies) 1272 | 1273 | # Set environment's basic authentication if not explicitly set. 1274 | auth = request.auth 1275 | if self.trust_env and not auth and not self.auth: 1276 | auth = get_netrc_auth(request.url) 1277 | 1278 | p = PreparedRequest() 1279 | p.prepare( 1280 | method=request.method.upper(), 1281 | url=request.url, 1282 | files=request.files, 1283 | data=request.data, 1284 | json=request.json, 1285 | headers=merge_setting(request.headers, self.headers, 1286 | dict_class=CaseInsensitiveDict), 1287 | params=merge_setting(request.params, self.params), 1288 | auth=merge_setting(auth, self.auth), 1289 | cookies=merged_cookies, 1290 | hooks=merge_hooks(request.hooks, self.hooks), 1291 | ) 1292 | return p 1293 | 1294 | def request(self, method, url, ...): 1295 | """Constructs a :class:`Request <Request>`, prepares it and sends it. 1296 | Returns :class:`Response <Response>` object. 1297 | ... 1298 | :rtype: requests.Response 1299 | """ 1300 | ... 1301 | prep = self.prepare_request(req) 1302 | ... 1303 | ... 1304 | ``` 1305 | 1306 | prepare_request(self, request) constructs the PreparedRequest object used for transmission and returns it. How is it built? By merging the settings of the Request instance with those stored on the Session: headers, params, auth, cookies and hooks, as the code above shows. (stream, verify, proxies and cert are merged separately by merge_environment_settings and passed to send.) Why are parameters split between the Request and the Session? My guess is that it relates to the Session's persistent parameters and connection pooling, so that settings stored by earlier requests can be reused. How the merging works is not covered here, because there are many details and the code is long. 1307 | 1308 | With the PreparedRequest built, the send method finally sends it: 1309 | 1310 | ``` 1311 | # sessions.py 1312 | 1313 | class Session(SessionRedirectMixin): 1314 | """A Requests session. 1315 | Provides cookie persistence, connection-pooling, and configuration. 1316 | ... 1317 | """ 1318 | ... 1319 | def request(self, method, url, ...): 1320 | """Constructs a :class:`Request <Request>`, prepares it and sends it. 1321 | Returns :class:`Response <Response>` object. 1322 | ...
1323 | :rtype: requests.Response 1324 | """ 1325 | ... 1326 | resp = self.send(prep, **send_kwargs) 1327 | return resp 1328 | ... 1329 | 1330 | ... 1331 | def send(self, request, **kwargs): 1332 | """Send a given PreparedRequest. 1333 | 1334 | :rtype: requests.Response 1335 | """ 1336 | ... 1337 | # Get the appropriate adapter to use 1338 | adapter = self.get_adapter(url=request.url) 1339 | 1340 | ... 1341 | 1342 | # Send the request 1343 | r = adapter.send(request, **kwargs) 1344 | ... 1345 | 1346 | return r 1347 | ... 1348 | ``` 1349 | 1350 | send takes the PreparedRequest and looks up the matching transport adapter from its url. That adapter is the HTTPAdapter discussed earlier; it is built on urllib3 and provides all of the HTTP and HTTPS interface methods requests needs, including send. After send has sent the request to the server, it waits for the server's reply and finally returns a Response object. 1351 | 1352 | That completes one full HTTP request. If you are feeling brave, you can dig further into send and study how the underlying urllib3 library works. 1353 | 1354 | Finally, a flow chart summarizing the method calls involved in an HTTP GET request: 1355 | 1356 | ![image-20190906123617049](./img/image-20190906123617049.png) 1357 | 1358 | 1359 | 1360 | #### *requests.session* 1361 | 1362 | I thought the source walkthrough ended here, but while reviewing it I found one question left over. Remember that I asked you to make sure you understand what the with statement does? The question below is related to it. Look at this part of the source: 1363 | 1364 | ``` 1365 | # test_requests.py 1366 | def test_DIGEST_HTTP_200_OK_GET(self, httpbin): 1367 | ... 1368 | ... 1369 | s = requests.session() 1370 | s.auth = HTTPDigestAuth('user', 'pass') 1371 | r = s.get(url) 1372 | assert r.status_code == 200 1373 | ... 1374 | 1375 | -------------------------------------------------------------------------------------------- 1376 | 1377 | # sessions.py 1378 | ... 1379 | class Session(SessionRedirectMixin): 1380 | """A Requests session. 1381 | Provides cookie persistence, connection-pooling, and configuration. 1382 | ... 1383 | """ 1384 | ... 1385 | def get(self, url, **kwargs): 1386 | r"""Sends a GET request. Returns :class:`Response` object. 1387 | 1388 | :param url: URL for the new :class:`Request` object. 1389 | :param \*\*kwargs: Optional arguments that ``request`` takes.
1390 | :rtype: requests.Response 1391 | """ 1392 | 1393 | kwargs.setdefault('allow_redirects', True) 1394 | return self.request('GET', url, **kwargs) 1395 | ... 1396 | ... 1397 | 1398 | def session(): 1399 | """ 1400 | Returns a :class:`Session` for context-management. 1401 | 1402 | :rtype: Session 1403 | """ 1404 | return Session() 1405 | ``` 1406 | 1407 | As the code shows, the test calls session(), which simply returns a Session object s, and then calls its get method. How does that differ from the usual requests.get call? The only difference seems to be that the usual call adds a with statement: 1408 | 1409 | ``` 1410 | # api.py 1411 | from . import sessions 1412 | 1413 | def request(method, url, **kwargs): 1414 | """Constructs and sends a :class:`Request <Request>`.""" 1415 | ... 1416 | with sessions.Session() as session: 1417 | return session.request(method=method, url=url, **kwargs) 1418 | 1419 | def get(url, params=None, **kwargs): 1420 | r"""Sends a GET request. 1421 | ... 1422 | :rtype: requests.Response 1423 | """ 1424 | kwargs.setdefault('allow_redirects', True) 1425 | return request('get', url, params=params, **kwargs) 1426 | ... 1427 | 1428 | -------------------------------------------------------------------------------------------- 1429 | 1430 | # sessions.py 1431 | ... 1432 | class Session(SessionRedirectMixin): 1433 | """A Requests session. 1434 | Provides cookie persistence, connection-pooling, and configuration. 1435 | ... 1436 | """ 1437 | 1438 | ... 1439 | def get(self, url, **kwargs): 1440 | r"""Sends a GET request. Returns :class:`Response` object.""" 1441 | 1442 | kwargs.setdefault('allow_redirects', True) 1443 | return self.request('GET', url, **kwargs) 1444 | ... 1445 | ...
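1446 | ``` 1447 | 1448 | Why do it this way? As noted earlier, a Session keeps its parameters, persists cookies, and uses urllib3's connection pool. When we send several requests to the same host, the underlying TCP connection is reused, which can improve performance noticeably, and we no longer have to set the same parameters on every request. But not every request needs a long-lived connection and retained parameters, and a session that is never closed risks leaving resources unreleased. That is why the one-shot API wraps the Session in a with statement, so that it is always closed properly. 1449 | 1450 | Finally, a demonstration of the difference between the two: 1451 | 1452 | ``` 1453 | >>> s = requests.session() 1454 | >>> s.get("http://httpbin.org/cookies/set/sessioncookie/123456789") 1455 | <Response [200]> 1456 | >>> r = s.get("http://httpbin.org/cookies") 1457 | >>> print(r.text) 1458 | { 1459 | "cookies": { 1460 | "sessioncookie": "123456789" 1461 | } 1462 | } 1463 | 1464 | -------------------------------------------------------------------------------------------- 1465 | 1466 | >>> requests.get("http://httpbin.org/cookies/set/sessioncookie/123456789") 1467 | <Response [200]> 1468 | >>> r = requests.get("http://httpbin.org/cookies") 1469 | >>> print(r.text) 1470 | { 1471 | "cookies": {} 1472 | } 1473 | ``` 1474 | 1475 | Happy end! 1476 | 1477 | ## *-4-Closing remarks* 1478 | 1479 | Reading source code is hard work. Writing this article took 12 days in total, including two weekends, so my efficiency has room to improve. Judging by the result, my knowledge of HTTP is far from sufficient, and I need to keep working to fill the gaps. 1480 | 1481 | If you have read this far, you must be a very patient person with a real love of technology. You are welcome to discuss any technical question with me so that we can help each other and grow together. I will also update this document regularly and fill in other topics. 1482 | 1483 | Finally, I wish you all good health and hope you reach your goals soon. 1484 | 1485 | --------------------------------------------------------------------------------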
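As a recap of the with discussion in the Session walkthrough, here is a minimal sketch that uses only the standard library. ToySession, its events list and one_shot_request are hypothetical names made up for illustration; they are not part of requests. The sketch only mirrors the \_\_enter\_\_ / \_\_exit\_\_ / close pattern quoted from sessions.py and the shape of api.request, and shows that close() still runs when the with block returns early.

```python
class ToySession:
    """Hypothetical stand-in for requests.Session (illustration only)."""

    def __init__(self):
        self.events = []      # records the order in which methods are called
        self.closed = False

    def __enter__(self):
        self.events.append("enter")
        return self           # same pattern as Session.__enter__

    def __exit__(self, *args):
        self.events.append("exit")
        self.close()          # same pattern as Session.__exit__

    def close(self):
        self.closed = True

    def request(self, method, url):
        # No network access: just echo what would have been requested.
        self.events.append("request")
        return "%s %s" % (method.upper(), url)


def one_shot_request(method, url):
    """Same shape as api.request: return from inside the with block."""
    session = ToySession()
    with session as s:
        # __exit__ runs after this return value is computed
        # and before the caller receives it.
        return s.request(method, url), session


result, session = one_shot_request("get", "http://example.invalid/")
print(result)          # GET http://example.invalid/
print(session.events)  # ['enter', 'request', 'exit']
print(session.closed)  # True
```

A long-lived session created with requests.session() skips this automatic cleanup, which is why it can keep cookies and pooled connections between calls and why it should be closed explicitly when no longer needed.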