├── .gitignore ├── LICENSE ├── README.md ├── common_question └── README.md ├── front_knowledge └── README.md ├── net_knowledge └── README.md ├── python_advance ├── about_asyncio │ ├── src │ │ ├── 2.py │ │ ├── asyncio_start.py │ │ ├── delay_time.py │ │ ├── ru.py │ │ └── te_sub.py │ └── 初识asyncio.md ├── about_celery │ └── ch1 │ │ ├── fot_ws.html │ │ ├── tasks.py │ │ └── ws_app.py ├── about_logger │ ├── logger_1.py │ ├── logger_3.py │ └── web_and_log.py ├── about_pyhton │ └── simple_interpreter │ │ ├── interpreter_1.py │ │ ├── interpreter_2.py │ │ └── interpreter_3.py ├── language_advance │ └── anyio │ │ └── compare.py ├── python-tips │ └── 向文本中插入文字 │ │ ├── README.md │ │ ├── a.txt │ │ ├── b.txt │ │ ├── base.py │ │ └── built_function.py ├── python周报 │ ├── README.md │ ├── issue#419.md │ ├── issue#420.md │ ├── issue#421.md │ ├── issue#422.md │ ├── issue#423.md │ ├── issue#424.md │ ├── issue#425.md │ ├── issue#426.md │ ├── issue#427.md │ ├── issue#428.md │ ├── issue#429.md │ ├── issue#430.md │ ├── issue#431.md │ ├── issue#432.md │ ├── issue#433.md │ ├── issue#434.md │ ├── issue#435.md │ ├── issue#436.md │ ├── issue#437.md │ ├── issue#438.md │ ├── issue#439.md │ ├── issue#440.md │ ├── issue#441.md │ ├── issue#442.md │ ├── issue#443.md │ ├── issue#444.md │ ├── issue#445.md │ ├── issue#446.md │ ├── issue#447.md │ ├── issue#448.md │ ├── issue#449.md │ ├── issue#450.md │ ├── issue#451.md │ ├── issue#452.md │ ├── issue#453.md │ ├── issue#454.md │ ├── issue#455.md │ ├── issue#456.md │ ├── issue#457.md │ ├── issue#458.md │ ├── issue#459.md │ ├── issue#460.md │ ├── issue#461.md │ ├── issue#462.md │ ├── issue#463.md │ ├── issue#464.md │ ├── issue#465.md │ ├── issue#466.md │ ├── issue#467.md │ ├── issue#468.md │ ├── issue#469.md │ ├── issue#470.md │ ├── issue#471.md │ ├── issue#472.md │ ├── issue#473.md │ ├── issue#474.md │ ├── issue#475.md │ ├── issue#476.md │ ├── issue#477.md │ ├── issue#478.md │ ├── issue#479.md │ ├── issue#480.md │ ├── issue#481.md │ ├── issue#482.md │ ├── 
issue#483.md │ ├── issue#484.md │ ├── issue#485.md │ ├── issue#486.md │ ├── issue#487.md │ ├── issue#488.md │ ├── issue#489.md │ ├── issue#490.md │ ├── issue#491.md │ ├── issue#492.md │ ├── issue#493.md │ ├── issue#494.md │ ├── issue#495.md │ ├── issue#496.md │ ├── issue#497.md │ ├── issue#498.md │ ├── issue#499.md │ ├── issue#500.md │ ├── issue#501.md │ ├── issue#502.md │ ├── issue#503.md │ ├── issue#504.md │ ├── issue#505.md │ ├── issue#506.md │ ├── issue#507.md │ ├── issue#508.md │ ├── issue#509.md │ ├── issue#510.md │ ├── issue#511.md │ ├── issue#512.md │ ├── issue#513.md │ ├── issue#514.md │ ├── issue#515.md │ ├── issue#516.md │ ├── issue#517.md │ ├── issue#518.md │ ├── issue#519.md │ ├── issue#520.md │ ├── issue#521.md │ ├── issue#522.md │ ├── issue#523.md │ ├── issue#524.md │ ├── issue#525.md │ ├── issue#526.md │ ├── issue#527.md │ ├── issue#528.md │ ├── material │ │ └── 420-reloading.gif │ └── template.md ├── requests请求重试 │ ├── README.md │ ├── decoration_built.py │ ├── decoration_simple.py │ ├── derector.py │ ├── flask_server.py │ ├── normal.py │ └── requests_built.py ├── 使用python客户端和服务器的功能测试实例 │ ├── README.md │ ├── client.py │ ├── flask_server.py │ ├── starletee_server.py │ ├── test_client.py │ ├── test_flask_client.py │ └── test_starletee_api.py ├── 在python脚本中运行脚本的几种方法 │ ├── README.md │ ├── bash_out.txt │ ├── check_alive.py │ ├── restart.py │ ├── run_bash.py │ └── test.sh ├── 在浏览器中运行python的几种主流方式 │ └── README.md └── 翻译计划 │ ├── Python101 │ └── iterators,generators,coroutines │ │ ├── README.md │ │ └── iterators,generators,coroutines.ipynb │ ├── 依赖注入 │ └── old_WAY.py │ └── 异步爬虫 │ ├── README.md │ └── src │ ├── asyncio_crawler.py │ ├── context_data.py │ ├── flask_server.py │ ├── generate_fn.py │ ├── local_server.py │ ├── queue_code.py │ ├── sub_yield_from_task.py │ ├── sub_yield_from_task2.py │ ├── task.py │ ├── yield_example.py │ └── yield_task.py ├── small_projects ├── README.md ├── convert_video │ ├── README.md │ ├── convert_fly_to_mp4.py │ └── 
video_clipping.py ├── email_sending │ ├── README.md │ ├── email_gui.py │ ├── gui.py │ ├── sche_email_sending.py │ ├── send_email.py │ └── test │ │ ├── read_csv_test.py │ │ └── users.csv ├── pdf_ppt_into_each_other │ ├── pdf_to_ppt.py │ └── ppt_to_pdf.py ├── rasa_ch_simple_example │ ├── __init__.py │ ├── actions.py │ ├── config.yml │ ├── credentials.yml │ ├── data │ │ ├── nlu.md │ │ └── stories.md │ ├── domain.yml │ ├── endpoints.yml │ └── tests │ │ └── conversation_tests.md ├── rasa_learn │ ├── ep2 │ │ ├── __init__.py │ │ ├── actions.py │ │ ├── channels.py │ │ ├── classification.py │ │ ├── config.yml │ │ ├── credentials.yml │ │ ├── data │ │ │ ├── nlu.md │ │ │ └── stories.md │ │ ├── domain.yml │ │ ├── endpoints.yml │ │ ├── run.py │ │ ├── tests │ │ │ └── conversation_tests.md │ │ └── train.py │ └── rasa-assistant │ │ ├── __init__.py │ │ ├── actions.py │ │ ├── config.yml │ │ ├── credentials.yml │ │ ├── data │ │ ├── nlu.md │ │ ├── response.md │ │ └── stories.md │ │ ├── domain.yml │ │ ├── endpoints.yml │ │ ├── results │ │ ├── confmat.png │ │ ├── failed_stories.md │ │ ├── hist.png │ │ ├── intent_report.json │ │ └── story_confmat.pdf │ │ └── tests │ │ ├── conversation_tests.md │ │ └── test_stories.md ├── 文字生成图片 │ ├── config.py │ ├── main.py │ └── requirements.txt └── 音视频分离 │ ├── extract_audio.py │ ├── get_video.py │ └── requirements ├── spider_project ├── README.md ├── ajax │ ├── README.md │ ├── images │ │ └── ajax-urls.png │ └── segmentfault │ │ ├── README.md │ │ ├── example.ipynb │ │ └── images │ │ ├── 1_s.png │ │ ├── ajax_data.png │ │ ├── recmmend.png │ │ ├── request_ana.png │ │ ├── s_index.png │ │ └── skill.png ├── asynchronous │ └── qiutan │ │ ├── data.js │ │ ├── main.py │ │ └── save_helper.py ├── douban_movie │ ├── README.md │ ├── code.py │ ├── douban_spider.ipynb │ └── images │ │ ├── capture_package_ready.png │ │ ├── confirm_request.png │ │ ├── data_target.png │ │ ├── index.png │ │ ├── re_get_data_1.png │ │ ├── re_name_poster.png │ │ ├── requests_ann.png │ │ ├── 
time_handle.png │ │ ├── total_data.png │ │ ├── unique1.png │ │ ├── unique2.png │ │ └── xpath1.png ├── dzdp │ ├── comments │ │ ├── __init__.py │ │ ├── comments_dzdp.py │ │ └── simple_crawl.py │ ├── details │ │ ├── __init__.py │ │ ├── crawler.py │ │ └── headers.py │ └── resource │ │ └── __init__.py ├── login │ └── wx_web │ │ ├── README.md │ │ ├── analyze.ipynb │ │ └── images │ │ ├── ana_key1.png │ │ ├── ana_key2.png │ │ ├── ana_login.png │ │ ├── ana_qr.png │ │ ├── index.png │ │ ├── lgin.png │ │ ├── login_ana.png │ │ ├── output_14_0.png │ │ ├── qr.png │ │ └── qr_url.png ├── multithreading │ ├── README.md │ ├── douban_top250.py │ └── mul_example │ │ ├── douban250.gif │ │ ├── mul.gif │ │ ├── mul_lock.py │ │ ├── mul_questiion.py │ │ └── mul_treading_exmple.py └── xiecheng │ └── hotel │ ├── __init__.py │ └── spider.py ├── spider_story └── first_day.md └── tools ├── README.md └── requests ├── README.md └── exmple.ipynb /.gitignore: -------------------------------------------------------------------------------- 1 | # Created by .ignore support plugin (hsz.mobi) 2 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2019 Dustyposa 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
### This project aims to teach you web scraping and networking through a series of fun stories. Unlike other scraping tutorials, which jump straight to code or cover networking in a single sentence, it should leave you with a deeper understanding of how crawlers work.
![](https://img.shields.io/badge/language-python3-orange.svg)
PS1: The networking material and the stories are still being planned, and the current story is only half finished. If you have any ideas, please get in touch.
PS2: If there is content you would like added sooner, feel free to say so.
#### important: If you like this project, please give it a 🌟!


- spider_project  # projects are a must
    - Data extraction projects
        - [Douban top 100 with multiple extractors](./spider_project/douban_movie/)
    - Login practice
        - [WeChat web login](./spider_project/login/wx_web)
    - [ajax asynchronous requests](./spider_project/ajax)
        - [SegmentFault recommendation API](./spider_project/ajax/segmentfault)
    - [Speeding up crawlers](./spider_project/multithreading)
        - [Multithreaded scraping of Douban top 250](./spider_project/multithreading)
        - [Qiutan crawler](./spider_project/asynchronous/qiutan)
- spider_story  # stories plus learning; we think you will like it
    + [Crawlers and networking.. to be completed](./spider_story/first_day.md)
- tools  # crawler tools that save you a lot of effort
    + [requests](./tools/requests)

- small_project
    - [Extracting audio from video](./small_projects/音视频分离)
    - [Sending email, with a GUI](./small_projects/email_sending)
    - [Video conversion and clipping](./small_projects/convert_video)
- common_question  # collects common questions and scraping tips
- net_knowledge  # detailed networking knowledge
- front_knowledge  # front-end knowledge; speeds up page analysis, essential for advanced spiders
- python_advance  # advanced Python
    - [Python weekly](./python_advance/python周报)
    - [Article translations](./python_advance/翻译计划)
        - [Asynchronous crawlers and how they work](./python_advance/翻译计划/异步爬虫)--[[related code](./python_advance/翻译计划/异步爬虫/src)]
        - [iterators,generators,coroutines](./python_advance/翻译计划/Python101/iterators,generators,coroutines)

Upcoming articles:
+ [ ] [Using tqdm]()
+ [ ] [Running Python scripts from a Python script (in progress...)](./python_advance/在python脚本中运行脚本的几种方法)
+ [ ] [Building a log management platform with Sentry]()
+ [ ] [Reading the `how do i` source code]()
+ [ ] [Mainstream ways to run `python` in the front end]()
+ [x] [Example of writing functional tests in Python](./python_advance/使用python客户端和服务器的功能测试实例)
+ [x] [From requests retries to a general-purpose retry decorator](./python_advance/requests请求重试)
+ [x] [Translation series: article on asynchronous crawling](./python_advance/翻译计划/异步爬虫)


--------------------------------------------------------------------------------
/common_question/README.md:
--------------------------------------------------------------------------------
This section collects common problems encountered when writing crawlers, along with their solutions.


--------------------------------------------------------------------------------
/front_knowledge/README.md:
--------------------------------------------------------------------------------
### Hey, this is the front-end knowledge camp. It will keep filling in the little spider's knowledge as the spider story develops!


--------------------------------------------------------------------------------
/net_knowledge/README.md:
--------------------------------------------------------------------------------
Each networking topic comes with the most detailed explanation we can provide.


--------------------------------------------------------------------------------
/python_advance/about_asyncio/src/2.py:
--------------------------------------------------------------------------------
import asyncio
import time
from typing import List


async def say_after(delay, what):
    await asyncio.sleep(delay)
    print(what)


async def synchronization_main():
    print(f"started at {time.strftime('%X')}")

    await say_after(1, 'hello')
    await say_after(2, 'world')

    print(f"finished at {time.strftime('%X')}")


async def wait_coroutine(coroutines):
    for i in coroutines:
        await i


async def asynchronization_main():
    print(f"started at {time.strftime('%X')}")
    coroutine_list = list()
    coroutine_list.append(asyncio.create_task(say_after(1, 'hello')))
    coroutine_list.append(asyncio.create_task(say_after(2, 'asyncio')))
    await wait_coroutine(coroutine_list)
    print(f"finished at {time.strftime('%X')}")


async def factorial(name, number):
    f = 1
    for i in range(2, number + 1):
        print(f"Task {name}: Compute factorial({i})...")
        await asyncio.sleep(1)
        f *= i
    print(f"Task {name}: factorial({number}) = {f}")


async def gather_main():
    # Schedule three calls *concurrently*:
    await asyncio.gather(
        factorial("A", 2),
        factorial("B", 3),
        factorial("C", 4),
    )


asyncio.run(gather_main())


--------------------------------------------------------------------------------
/python_advance/about_asyncio/src/asyncio_start.py:
--------------------------------------------------------------------------------
import asyncio


async def main():
    print('Hello ...')
    await asyncio.sleep(1)
    print('... World!')


# Python 3.7+
# Calling main() on its own only creates a coroutine object;
# asyncio.run() is what actually executes it.
if __name__ == '__main__':
    asyncio.run(main())


--------------------------------------------------------------------------------
/python_advance/about_asyncio/src/delay_time.py:
--------------------------------------------------------------------------------
import asyncio
import datetime

import requests


async def display_date():
    loop = asyncio.get_running_loop()
    end_time = loop.time() + 5.0
    while True:
        print(datetime.datetime.now())
        if (loop.time() + 1.0) >= end_time:
            break
        await asyncio.sleep(1)


asyncio.run(display_date())

headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36',
}
url = 'https://dwz.cn/Qk6kP0DS'
response = requests.get(url, headers=headers)
# response.history holds the intermediate redirect responses
redirect_responses = response.history
for resp in redirect_responses:
    print(f'redirect url: {resp.url}')


--------------------------------------------------------------------------------
/python_advance/about_asyncio/src/ru.py:
--------------------------------------------------------------------------------
import time


print("start1")
print("start1")
print("start1")
time.sleep(1)
print("start1")
time.sleep(1)
print("start1")
time.sleep(1)
input("ASDASD")
print("start1")
time.sleep(1)
print("start2")
print("start1")
time.sleep(1)
print("start2")
time.sleep(1)
print("start2")
print("start2")

time.sleep(1)
print("end")


--------------------------------------------------------------------------------
/python_advance/about_asyncio/src/te_sub.py:
--------------------------------------------------------------------------------
import asyncio


async def async_readline(proc: asyncio.subprocess.Process):
    while proc.returncode is None and not proc.stdout.at_eof():
        s = proc.stdin.get_extra_info("pipe")
        print(f"ssss: {s}")
        out = await proc.stdout.readline()
        print(F"OUT:{out}")


async def async_in(proc: asyncio.subprocess.Process):
    while proc.returncode is None:
        await asyncio.sleep(1)  # artificial delay
        # proc.stdin.write("321\n".encode("u8"))


async def main():
    proc = await asyncio.create_subprocess_exec(
        *["python3", "ru.py"],
        stdout=asyncio.subprocess.PIPE,
        stdin=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
        env={
            "PYTHONUNBUFFERED": '1',
        }
    )
    # print(f"return code: {proc.returncode}")
    # asyncio.wait() needs Task objects: passing bare coroutines was
    # deprecated in Python 3.8 and raises TypeError from 3.11 on.
    tasks = [
        asyncio.create_task(async_readline(proc)),
        asyncio.create_task(async_in(proc)),
    ]
    done, pending = await asyncio.wait(
        tasks,
        timeout=30,
        return_when=asyncio.FIRST_COMPLETED
    )
    print(f"0000: {[i.cancelled() for i in pending]}")

    print(f"len: {len(done)}, len2: {len(pending)}")
    print(f"done1111: {[i.result() for i in done]}")

    [(i.cancel(), print(type(i))) for i in pending]
    # await asyncio.wait(pending)
    print(f"done222: {[i.cancelled() for i in pending]}")
    # print(f"pending: {pending}")
    # print(proc.returncode)
    # proc.kill()

    # proc.terminate()
    # await proc.wait()
    print(await proc.communicate())

    print(proc.returncode)


if __name__ == "__main__":
    asyncio.run(main())


--------------------------------------------------------------------------------
/python_advance/about_asyncio/初识asyncio.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Dustyposa/goSpider/4f5bdab86c71f503ac3ca7fbf5f81a615e323759/python_advance/about_asyncio/初识asyncio.md


--------------------------------------------------------------------------------
/python_advance/about_celery/ch1/fot_ws.html:
--------------------------------------------------------------------------------


--------------------------------------------------------------------------------
/python_advance/about_celery/ch1/tasks.py:
--------------------------------------------------------------------------------
from typing import Any

from celery import Celery
from fastapi import WebSocket

app = Celery('tasks', broker='pyamqp://guest:guest@127.0.0.1')


# Celery exposes `app.task`; there is no `anyio_task` decorator.
# Note that Celery does not await coroutine functions, and a WebSocket
# object cannot be serialized into a task message, so this task remains
# an experiment rather than a working pattern.
@app.task
async def send_ws(ws: WebSocket, msg: Any):
    print("send msg.....")
    await ws.send_text(msg)
    print("send msg success!!!!")


@app.task
def add(a, b):
    return a + b


--------------------------------------------------------------------------------
/python_advance/about_celery/ch1/ws_app.py:
--------------------------------------------------------------------------------
from fastapi import WebSocket, FastAPI
import uvicorn

from tasks import send_ws, add

app = FastAPI()


@app.websocket("/ws")
async def ws_t(websocket: WebSocket) -> None:
    await websocket.accept()
    for i in range(4):
        send_ws.delay(websocket, f"this is {i} msg")


if __name__ == '__main__':
    # reload=True only works when the app is given as an import string
    uvicorn.run("ws_app:app", reload=True)


--------------------------------------------------------------------------------
/python_advance/about_logger/logger_1.py:
--------------------------------------------------------------------------------
import logging
from types import FrameType
from typing import cast

from loguru import logger


# print(logging.getLogger("uvicorn.asgi"))
# print(logging.getLogger("uvicorn.access"))


class InterceptHandler(logging.Handler):
    def emit(self, record: logging.LogRecord) -> None:  # pragma: no cover
        print(dir(record))
        # Get corresponding Loguru level if it exists
        # try:
        #     level = logger.level(record.levelname).name
        # except ValueError:
        #     level = str(record.levelno)
        # #
        # # # Find caller from where originated the logged message
        # frame, depth = logging.currentframe(), 2
        # while frame.f_code.co_filename == logging.__file__:  # noqa: WPS609
        #     frame = cast(FrameType, frame.f_back)
        #     depth += 1
        #
        record.msg += " test"
        print(record.getMessage())
        # # logger.log(20, record.getMessage())
        # logger.opt(depth=2, exception=record.exc_info)
        # logger.opt(depth=2, exception=record.exc_info).log(
        #     self.level,
        #     record.getMessage() + "test",
        #     record.msg,
        # )
        # ...
        # self.flush()


# logger.add(sys.stdout, format="{time:HH:mm:ss!UTC}", serialize=True, enqueue=True)
# logger.add(log_handler, format="{time:HH:mm:ss!UTC}", serialize=True, enqueue=True)
logger.add(InterceptHandler("INFO"))

logger.info("this is a logger")


--------------------------------------------------------------------------------
/python_advance/about_logger/logger_3.py:
--------------------------------------------------------------------------------
import logging
import sys
from datetime import datetime
from logging.handlers import TimedRotatingFileHandler, RotatingFileHandler

from types import FrameType
from typing import cast, Dict, Any
from pprint import pprint

from loguru import logger
from orjson import dumps


class CustomAppHandler(TimedRotatingFileHandler):
    def emit(self, record: logging.LogRecord) -> None:  # pragma: no cover
        self.format(record)
        # print(record.asctime)
        pprint(record.__dict__)
        # Get corresponding Loguru level if it exists
        # try:
        #     level = logger.level(record.levelname).name
        # except ValueError:
        #     level = str(record.levelno)
        # #
        # # # Find caller from where originated the logged message
        # frame, depth = logging.currentframe(), 2
        # while frame.f_code.co_filename == logging.__file__:  # noqa: WPS609
        #     frame = cast(FrameType, frame.f_back)
        #     depth += 1
        # #
        # # record.msg += " test"
        # # print(record.getMessage())
        # # # logger.log(20, record.getMessage())
        # # logger.opt(depth=2, exception=record.exc_info)
        # logger.opt(depth=depth, exception=record.exc_info).log(
        #     level,
        #     record.getMessage(),
        # )
        # ...
        # self.flush()
        # logger.opt(depth=1, exception=record.exc_info).log(self.level, record.getMessage())
        print(self.stream.write("123" + self.terminator))

    # def rotation_filename(self, path: str):
    #     return "_".join(path.split(".")) + ".log"


def format_record(record: Dict[str, Any]) -> str:
    return record["message"]


# handlers = [{"sink": CustomAppHandler("test.log")}]
handlers = [
    {"sink": sys.stderr, "level": logging.INFO},
    {"sink": CustomAppHandler("gunicorn_info", when="M"), "format": format_record}
    # {"sink": "asd.log", "level": logging.INFO, "rotation": "1 minute"}
]

#
logger.configure(handlers=handlers)
logger.add(sink=CustomAppHandler("test.log", when="M"))
# logger.add(sink=CustomAppHandler("test.log", when="M"), format=format_record)
# logger.configure(handlers=handlers)
logger.info("this is a logger, 加上中文")
# import time
# time.sleep(2 * 45)

logger.info("this is a logger2222222, 加上中文")


--------------------------------------------------------------------------------
/python_advance/about_logger/web_and_log.py:
--------------------------------------------------------------------------------
from typing import List, Optional
from starlette.requests import Request
from loguru import logger

import uvicorn
from starlette_context import context, middleware, plugins, header_keys

from fastapi import FastAPI, APIRouter, Depends
from fastapi.encoders import jsonable_encoder
from pydantic import BaseModel

get_method_name = "GET"
post_method_name = "POST"
trace_id_key = "unique_id"
default_trace_value = "unknown"
logger.add(lambda record: print(record, context[header_keys.HeaderKeys.request_id], context[trace_id_key]),
           level="INFO")
app = FastAPI()
app.add_middleware(
    middleware.ContextMiddleware,
    plugins=(
        plugins.RequestIdPlugin(),
    ),
)

router = APIRouter()


class Item(BaseModel):
    name: Optional[str] = None
    description: Optional[str] = None
    price: Optional[float] = None
    tax: float = 10.5
    tags: List[str] = []


items = {
    "foo": {"name": "Foo", "price": 50.2},
    "bar": {"name": "Bar", "description": "The bartenders", "price": 62, "tax": 20.2},
    "baz": {"name": "Baz", "description": None, "price": 50.2, "tax": 10.5, "tags": []},
}


@router.get("/items/{item_id}", response_model=Item)
async def read_item(item_id: str):
    logger.info(f"get: {item_id}")

    return items["foo"]


@router.post("/items/{item_id}", response_model=Item)
async def update_item(item_id: str, item: Item):
    update_item_encoded = jsonable_encoder(item)
    items[item_id] = update_item_encoded
    logger.info("request received")
    # print(context.data)
    # print(context[header_keys.HeaderKeys.request_id])
    return update_item_encoded


#
# class Item(BaseModel):
#     name: str
#     description: Optional[str] = Field(
#         None, title="The description of the item", max_length=300
#     )
#     price: float = Field(..., gt=0, description="The price must be greater than zero")
#     tax: Optional[float] = None
#
#
# @app.post("/items/{item_id}")
# async def update_item(item_id: int, item: Item = Body(..., embed=True)):
#     results = {"item_id": item_id, "item": item}
#     return results
#


async def set_trace_info(request: Request):
    # Pull the trace id from the query string (GET) or the JSON body (POST)
    method = request.method
    if method == get_method_name:
        r_params = request.query_params
    elif method == post_method_name:
        r_params = await request.json()
    else:
        r_params = {}
    context[trace_id_key] = r_params.get(trace_id_key, default_trace_value)


app.include_router(router, dependencies=[Depends(set_trace_info)])

if __name__ == '__main__':

    uvicorn.run(app)


--------------------------------------------------------------------------------
/python_advance/about_pyhton/simple_interpreter/interpreter_1.py:
--------------------------------------------------------------------------------
class Interpreter:
    def __init__(self):
        self.stack = []

    def LOAD_VALUE(self, val) -> None:
        self.stack.append(val)

    def PRINT_ANSWER(self) -> None:
        answer = self.stack.pop()
        print(answer)

    def ADD_TWO_VALUES(self) -> None:
        total = self.stack.pop() + self.stack.pop()
        self.stack.append(total)

    def run_code(self, what_to_execute) -> None:
        instructions, numbers = what_to_execute["instructions"], what_to_execute["numbers"]
        for each_step in instructions:
            step_name, argument = each_step
            if step_name == "LOAD_VALUE":
                getattr(self, step_name)(numbers[argument])
            elif step_name == "ADD_TWO_VALUES":
                getattr(self, step_name)()
            elif step_name == "PRINT_ANSWER":
                getattr(self, step_name)()


if __name__ == '__main__':
    # what_to_execute = {
    #     "instructions": [("LOAD_VALUE", 0),  # the first number
    #                      ("LOAD_VALUE", 1),  # the second number
    #                      ("ADD_TWO_VALUES", None),
    #                      ("PRINT_ANSWER", None)],
    #     "numbers": [7, 5]
    # }
    what_to_execute = {
        "instructions": [("LOAD_VALUE", 0),
                         ("LOAD_VALUE", 1),
                         ("ADD_TWO_VALUES", None),
                         ("LOAD_VALUE", 2),
                         ("ADD_TWO_VALUES", None),
                         ("PRINT_ANSWER", None)],
        "numbers": [7, 5, 8]}
    Interpreter().run_code(what_to_execute)


--------------------------------------------------------------------------------
/python_advance/about_pyhton/simple_interpreter/interpreter_2.py:
--------------------------------------------------------------------------------
from typing import Any


class Interpreter:
    def __init__(self):
        self.stack = []
        self.environment = {}

    def STORE_NAME(self, name) -> None:
        val = self.stack.pop()
        self.environment[name] = val

    def LOAD_NAME(self, name) -> None:
        val = self.environment[name]
        self.stack.append(val)

    def parse_argument(self, instruction, argument, what_to_execute) -> Any:
        """Resolve the actual argument for each instruction."""
        numbers = ["LOAD_VALUE"]
        names = ["LOAD_NAME", "STORE_NAME"]
        if instruction in numbers:
            argument = what_to_execute["numbers"][argument]
        elif instruction in names:
            argument = what_to_execute["names"][argument]

        return argument

    def LOAD_VALUE(self, val) -> None:
        self.stack.append(val)

    def PRINT_ANSWER(self) -> None:
        answer = self.stack.pop()
        print(answer)

    def ADD_TWO_VALUES(self) -> None:
        total = self.stack.pop() + self.stack.pop()
        self.stack.append(total)

    def run_code(self, what_to_execute) -> None:
        instructions = what_to_execute["instructions"]
        for each_step in instructions:
            step_name, argument = each_step
            argument = self.parse_argument(step_name, argument, what_to_execute)
            bytes_method = getattr(self, step_name)
            if argument is None:
                bytes_method()
            else:
                bytes_method(argument)


if __name__ == '__main__':
    what_to_execute = {
        "instructions": [("LOAD_VALUE", 0),
                         ("STORE_NAME", 0),
                         ("LOAD_VALUE", 1),
                         ("STORE_NAME", 1),
                         ("LOAD_NAME", 0),
                         ("LOAD_NAME", 1),
                         ("ADD_TWO_VALUES", None),
                         ("PRINT_ANSWER", None)],
        "numbers": [1, 2],
        "names": ["a", "b"]}
    Interpreter().run_code(what_to_execute)
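
The `STORE_NAME` and `LOAD_NAME` instructions used by the toy interpreter in `interpreter_2.py` share their names with opcodes that CPython emits for module-level code. As a rough cross-check that is not part of the original repository, the sketch below compiles the equivalent source with the built-in `compile` and lists the opcode names through the standard-library `dis` module. The helper name `opnames` is made up for illustration, and the full opcode list differs between Python versions, so only the two name-handling opcodes are looked for.

```python
import dis


def opnames(source: str) -> list:
    """Return the opcode names CPython emits for module-level source (hypothetical helper)."""
    code = compile(source, "<sketch>", "exec")
    return [instruction.opname for instruction in dis.get_instructions(code)]


if __name__ == "__main__":
    # Same program as the toy instruction list: a = 1, b = 2, print(a + b)
    names = opnames("a = 1\nb = 2\nprint(a + b)")
    # Check for the two opcodes that the toy interpreter also implements
    print("STORE_NAME" in names, "LOAD_NAME" in names)
```

The other opcodes in the real listing (constant loading, the addition, the call to `print`) are deliberately not checked, because their names have changed across CPython releases.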


--------------------------------------------------------------------------------
/python_advance/language_advance/anyio/compare.py:
--------------------------------------------------------------------------------
# NOTE: this file appears to target the older anyio 2.x API
# (awaitable tg.spawn, create_semaphore/create_lock/create_event factories).
import asyncio

import anyio


async def anyio_task(n):
    print(f"start: {n}")
    await anyio.sleep(n)
    print(f"end: {n}")


async def async_task(n):
    print(f"start: {n}")
    await asyncio.sleep(n)
    print(f"end: {n}")


async def anyio_main():
    try:
        async with anyio.create_task_group() as tg:
            await tg.spawn(anyio_task, 1)
            await tg.spawn(anyio_task, 2)
    finally:
        # e.g. release locks
        print('cleanup')


async def async_main():
    task1 = asyncio.create_task(async_task(1))
    task2 = asyncio.create_task(async_task(2))
    await asyncio.gather(task1, task2)
    # await asyncio.wait([task1, task2])
    print('cleanup')

from anyio import create_task_group, create_semaphore, sleep, run


async def use_resource(tasknum, semaphore):
    async with semaphore:
        print('Task number', tasknum, 'is now working with the shared resource')
        await sleep(1)


async def main():
    semaphore = create_semaphore(10)
    async with create_task_group() as tg:
        for num in range(10):
            await tg.spawn(use_resource, num, semaphore)
from anyio import create_task_group, create_lock, sleep, run


async def use_resource(tasknum, lock):
    # async with lock:
    print('Task number', tasknum, 'is now working with the shared resource')
    await sleep(1)


async def main():
    lock = create_lock()
    async with create_task_group() as tg:
        for num in range(4):
            await tg.spawn(use_resource, num, lock)

from anyio import create_task_group, create_event, run


async def notify(event):
    await event.set()


async def main():
    event = create_event()
    async with create_task_group() as tg:
        await tg.spawn(notify, event)
        await event.wait()
        print('Received notification!')

from anyio import create_task_group, create_condition, sleep, run


async def listen(tasknum, condition):
    async with condition:
        await condition.wait()
        print('Woke up task number', tasknum)


async def main():
    condition = create_condition()
    async with create_task_group() as tg:
        for tasknum in range(6):
            await tg.spawn(listen, tasknum, condition)

        await sleep(1)
        async with condition:
            await condition.notify(1)

        await sleep(1)
        async with condition:
            await condition.notify(2)

        await sleep(1)
        async with condition:
            await condition.notify_all()

if __name__ == '__main__':
    # run(main)
    # Only the last definition of main() (the condition example) is run here
    run(main)

    # anyio.run(anyio_main)
    # asyncio.run(async_main())


--------------------------------------------------------------------------------
/python_advance/python-tips/向文本中插入文字/README.md:
--------------------------------------------------------------------------------
## Inserting text into a file with Python
> This article only records the methods; hopefully it helps!

#### Method 1: plain file I/O
This relies mainly on reading and writing the file.
```python
def write_data_to_file(file: str, data: str) -> None:
    """
    Read the file and insert the data as the first line.
    :param file: path of the file to read
    :param data: data to insert
    :return:
    """
    with open(file, "r+", encoding="u8") as fp:
        tmp_data = fp.read()  # reads the whole file; do not use this approach for very large files
        fp.seek(0)  # move the cursor back to the start
        fp.write(data + "\n" + tmp_data)
```
It takes only a few lines, and the principle is simple.
**What if we want to check whether the data already exists and insert it only when it does not?**
A small change is enough:
```python
def write_data_to_file(file: str, data: str) -> None:
    """
    Read the file and check every line: if the data to insert is already
    present, do nothing; otherwise insert it as the first line.
    :param file: path of the file to read
    :param data: data to insert
    :return:
    """
    with open(file, "r+", encoding="u8") as fp:
        # walk through every line and check it
        for d in fp.readlines():
            if data in d:
                break  # already present, so stop
        else:
            fp.seek(0)
            tmp_data = fp.read()  # reads the whole file; do not use this approach for very large files
            fp.seek(0)  # move the cursor back to the start
            fp.write(data + "\n" + tmp_data)
```
This adds a check while reading the file's lines, which is also fairly simple.
**Taking it one step further: inserting at a given line.**
```python
def write_data_to_file(file: str, data: str, row: int = 1) -> None:
    """
    Read the file and insert at a given line.
    :param file: path of the file to read
    :param data: data to insert
    :param row: line number to insert at
    :return:
    """
    front_data = ""  # holds the part before the insertion point
    after_data = ""  # holds the part after the insertion point
    with open(file, "r+", encoding="u8") as fp:
        # walk through the lines, using enumerate to count line numbers
        for i, d in enumerate(fp.readlines(), start=1):
            if i >= row:
                after_data += d
            else:
                front_data += d

        fp.seek(0)  # go back to the start
        fp.write(front_data)
        fp.write(data + "\n")
        fp.write(after_data)
```
Again the code is fairly simple: split the content into the parts before and after the target line, then join them around the new data.
That is the plain approach.
Next, let's look at the standard-library module `fileinput`.
#### Method 2: inserting lines with `fileinput`
With the built-in `fileinput` module the code is even simpler:
```python
import fileinput


def replace_file_data(file_path: str, data: str) -> None:
    """
    Insert data as the first line.
    :param file_path: path to the file
    :param data: data to insert
    :return:
    """
    with fileinput.input(files=file_path, inplace=True) as fp:
        for d in fp:
            if fp.filelineno() == 1:
                # if this is line 1, write the new data first
                print(data)
            print(d, end="")
```

Note that `inplace` must be `True`: `fileinput` then redirects standard output into the file, which is why we simply use `print` here. The same approach also works for **deleting** a line: when the line number matches the one to remove, just skip the `print`.
One more small trick, this time for reading: `fileinput` can read several files in one go. The code looks like this:
```python
import fileinput
from typing import Iterable


def read_all_text(files: Iterable) -> None:
    """
    Read the data from all files.
    :param files: collection of file paths
    :return:
    """
    with fileinput.input(files=files) as f:
        for line in f:
            print(line, end="")
```
Then just call the function:

`read_all_text(["a.txt", "b.txt"])`

That's it for today's tips; I hope you found them useful.

--------------------------------------------------------------------------------
/python_advance/python-tips/向文本中插入文字/a.txt:
--------------------------------------------------------------------------------
insert one line2
insert one line5
insert one line4
insert one line3
insert one line2

--------------------------------------------------------------------------------
/python_advance/python-tips/向文本中插入文字/b.txt:
--------------------------------------------------------------------------------
add data
this is a
this is b
this is c
this is d
this is e
this is f

--------------------------------------------------------------------------------
/python_advance/python-tips/向文本中插入文字/base.py:
--------------------------------------------------------------------------------
# NOTE: the three variants below share one name, as in the README. Each
# definition replaces the previous one, so only the last (row-based) variant
# is callable when this file runs.
def write_data_to_file(file: str, data: str) -> None:
    """
    Read the file and insert the data as the first line.
    :param file: path of the file to read
    :param data: data to insert
    :return:
    """
    with open(file, "r+", encoding="u8") as fp:
        tmp_data = fp.read()  # reads the whole file; do not use this approach for very large files
        fp.seek(0)  # move the cursor back to the start
        fp.write(data + "\n" + tmp_data)


def write_data_to_file(file: str, data: str) -> None:
    """
    Read the file and check every line: if the data to insert is already
    present, do nothing; otherwise insert it as the first line.
    :param file: path of the file to read
    :param data: data to insert
    :return:
    """
    with open(file, "r+", encoding="u8") as fp:
        # walk through every line and check it
        for d in fp.readlines():
            if data in d:
                break  # already present, so stop
        else:
            fp.seek(0)
            tmp_data = fp.read()  # reads the whole file; do not use this approach for very large files
            fp.seek(0)  # move the cursor back to the start
            fp.write(data + "\n" + tmp_data)


def write_data_to_file(file: str, data: str, row: int = 1) -> None:
    """
    Read the file and insert at a given line.
    :param file: path of the file to read
    :param data: data to insert
    :param row: line number to insert at
    :return:
    """
    front_data = ""  # holds the part before the insertion point
    after_data = ""  # holds the part after the insertion point
    with open(file, "r+", encoding="u8") as fp:
        # walk through the lines, using enumerate to count line numbers
        for i, d in enumerate(fp.readlines(), start=1):
            if i >= row:
                after_data += d
            else:
                front_data += d

        fp.seek(0)  # go back to the start
        fp.write(front_data)
        fp.write(data + "\n")
        fp.write(after_data)


if __name__ == '__main__':
    write_data_to_file("./a.txt", "insert one line5", 2)

--------------------------------------------------------------------------------
/python_advance/python-tips/向文本中插入文字/built_function.py:
--------------------------------------------------------------------------------
import fileinput
from typing import Iterable


def replace_file_data(file_path: str, data: str) -> None:
    """
    Insert data as the first line.
    :param file_path: path to the file
    :param data: data to insert
    :return:
    """
    with fileinput.input(files=file_path, inplace=True) as fp:
        for d in fp:
            if fp.filelineno() == 1:
                # if this is line 1, write the new data first
                print(data)
            print(d, end="")


def read_all_text(files: Iterable) -> None:
    """
    Read the data from all files.
    :param files: collection of file paths
    :return:
    """
    with fileinput.input(files=files) as f:
        for line in f:
            print(line, end="")


if __name__ == '__main__':
    # read_all_text(["a.txt", "b.txt"])
    replace_file_data("b.txt", "add data")

--------------------------------------------------------------------------------
/python_advance/python周报/README.md:
--------------------------------------------------------------------------------
Translations of Python Weekly
--
2019/10
- [issue#419](./issue%23419.md)
- [issue#420](./issue%23420.md)


--------------------------------------------------------------------------------
/python_advance/python周报/issue#480.md:
--------------------------------------------------------------------------------
Title: pythonista-weekly : Pyw 480
Date: 2020-12-24 16:25
Tags: Weekly,pythonweekly,Zh
Slug: pyw-480

### Welcome to issue 480 of pythonista-weekly. Let us start!


>Original: [https://mailchi.mp/pythonweekly/python-weekly-issue-480](https://mailchi.mp/pythonweekly/python-weekly-issue-480)
>Translation: Dustyposa

**From the sponsor (PS: the sponsor of the original newsletter):**
[Get Your Weekly Dose of Programming](https://www.programmerweekly.com/?utm_source=pwad&utm_medium=newsletter) A weekly newsletter featuring the best hand curated news, articles, tutorials, talks, tools and libraries etc for programmers. [Join For Free](https://www.programmerweekly.com/?utm_source=pwad&utm_medium=newsletter)

### Articles, Tutorials and Talks

[Full tutorial: gender prediction with machine learning](https://www.youtube.com/playlist?list=PLor7ckKBRXkXnrOkw2qh-XT2XrxIZzYE3) ![img](https://mcusercontent.com/e2e180baf855ac797ef407fc7/images/af76283a-6e65-436c-967a-900427cf6399.png)
Learn how to train a model with `TensorFlow 2`, expose it as an `API` with `FastAPI`, and then build client apps for `iOS` and `Android` with `Flutter`.

[Flask - A list of useful “HOW TO’s”](https://blog.appseed.us/flask-how-to-code-simple-tasks/)
A curated list of tips and features for Flask developers: `authentication, SQLAlchemy, JSON, redirects.`

[Cyberpwned](https://nicolas-siplis.com/blog/cyberpwned)
Beating the Cyberpunk 2077 minigame with 50 lines of `Python` code.

[NumPy Illustrated: The Visual Guide to NumPy](https://t.co/sFCfGGxfmz)
Brush up on your `NumPy`, or learn it from scratch.

[Blockchain with Python](https://www.youtube.com/playlist?list=PLe4mIUXfbIqYlbm6tNK9NExxgACsoq5e5)
![img](https://mcusercontent.com/e2e180baf855ac797ef407fc7/images/af76283a-6e65-436c-967a-900427cf6399.png)
A series on creating a simple blockchain with `Python`.

[String Interning in Python](https://arpitbhayani.me/blogs/string-interning)
Every programming language aims to perform well, and `Python` is no exception. In this article we dig into `Python`'s internals and find out how `Python` uses a technique called `String Interning` to make its interpreter performant.

[Recreating grep in Python](https://kevingal.com/blog/cli-tools.html)
An example of how to build a `Python CLI` tool.

[Building My Own Chess Engine](https://healeycodes.com/building-my-own-chess-engine/)
Exploring the computational complexity of chess, with code snippets in `Python` so you can do the same.

[Production Machine Learning Monitoring: Outliers, Drift, Explainers & Statistical Performance](https://t.co/IJMxqFv3F8)
A practical deep dive on production monitoring architectures for machine learning at scale using real-time metrics, outlier detectors, drift detectors, metrics servers and explainers.

[Automating SQL injection with Python](https://bad-jubies.github.io/Blind-SQLi-1/)
This post shows you how to automate `SQL` injection with `Python`.

[How to create a fully automated AI-based trading system with Python](https://t.co/vqJLMb33mV)
An end-to-end project: get the data, train the model, place the order, get notified.

[Detecting ArUco markers with OpenCV and Python](https://www.pyimagesearch.com/2020/12/21/detecting-aruco-markers-with-opencv-and-python/)
In this tutorial you will learn how to detect `ArUco` markers in images and real-time video streams using `OpenCV` and `Python`.

[Optimizing models with the PyTorch JIT](https://lernapparat.de/jit-optimization-intro/)
This post shows how the `JIT` works, focusing on the parts that make fusion optimizations possible. It starts from a high-level view and moves into hands-on experiments that try to reveal some of the inner workings.

[Python Microservices Web App (with React, Django, Flask) - Full Course](https://www.youtube.com/watch?v=0iB5IPoTDts) ![img](https://mcusercontent.com/e2e180baf855ac797ef407fc7/images/af76283a-6e65-436c-967a-900427cf6399.png)
Using `Python` microservices lets you break your application into smaller parts that communicate with each other. This can make it simpler to scale the application according to traffic. The separation of concerns also makes it easier to work on just one part of the app at a time.

[Monitoring network calls in Python using the TIG stack](https://calendar.perfplanet.com/2020/monitoring-network-calls-in-python-using-tig-stack/)

[Performing OPTICS clustering with Python and Scikit-learn](https://www.machinecurve.com/index.php/2020/12/15/performing-optics-clustering-with-python-and-scikit-learn/)

[Five reasons to use Py.test](https://kracekumar.com/post/five-reason-to-use-pytest/)

### Interesting Projects, Tools and Libraries


[Hub](https://github.com/activeloopai/hub)
The fastest way to access and manage datasets, and to build scalable data pipelines, for `PyTorch/TensorFlow`.

[zoomout](https://github.com/lanewinfield/zoomout)
A pull switch (or bring-your-own button) that gets you out of video calls, fast.

[ZenML](https://github.com/maiot-io/zenml)
An extensible, open-source `MLOps` framework for production-ready machine learning pipelines.

[GWSL](https://github.com/Opticos/GWSL-Source)
GWSL automates the process of running X on top of WSL and over SSH.

[Riskfolio-Lib](https://github.com/dcajasn/Riskfolio-Lib)
A library for quantitative strategic asset allocation or portfolio optimization in `Python`.

[BackgroundMattingV2](https://github.com/PeterL1n/BackgroundMattingV2)
Real-time, high-resolution background matting.

[Questionary](https://github.com/tmbo/questionary)
`Questionary` is a `Python` library for effortlessly building pretty command-line interfaces.

[django-version-checks](https://github.com/adamchainz/django-version-checks)
System checks for your project's environment.

[argos-translate](https://github.com/argosopentech/argos-translate)
An open-source offline translation app written in `Python`. It uses `OpenNMT` for translation and `PyQt` for the `GUI`.

[django-pattern-library](https://github.com/torchbox/django-pattern-library)
A `UI` pattern library for `Django` templates.


Posa:

> ❤️ Happy Pythonic ;-(Posa's personal, no-liability broadcast)


----- divider -----

> If you spot any translation errors, please be sure to contact me. Thanks!




- First published: [pythonista-weekly~蠎周刊 ~汇集全球蠎事儿 ;-)](http://weekly.pychina.org/python-weekly/pyw-480.html)
- Improve this page: [issue-480.md](https://github.com/PyChina/weekly/blob/master/content/python-weekly/issue%23480.md)

--------------------------------------------------------------------------------
/python_advance/python周报/issue#484.md:
--------------------------------------------------------------------------------
Title: pythonista-weekly : Pyw 484
Date: 2021-01-29 16:25
Tags: Weekly,pythonweekly,Zh
Slug: pyw-484

### Welcome to issue 484 of pythonista-weekly. Let us start!


>Original: [https://mailchi.mp/pythonweekly/python-weekly-issue-484](https://mailchi.mp/pythonweekly/python-weekly-issue-484)
>Translation: Dustyposa

### News

[PIP is dropping support for Python 2](https://pip.pypa.io/en/stable/news/#id1)

### Articles, Tutorials and Talks

[Predicting hard drive failure with machine learning](https://datto.engineering/post/predicting-hard-drive-failure-with-machine-learning)
We have all experienced a hard drive failure, often as suddenly as booting up and finding that a bunch of files can no longer be accessed. It is not a fun experience, and even less so when you have an entire data center full of drives that all matter for keeping your business running. What if we could predict when one of those drives will fail and replace the hardware pre-emptively, before any data is lost?

[Algpt2 Part 2](https://bkkaggle.github.io/blog/algpt2/2020/07/17/ALGPT2-part-2.html)
How I (almost) replicated `OpenAI`'s `GPT-2` (124M version).

[12 requests per second](https://suade.org/dev/12-requests-per-second-with-python/)
A realistic look at Python web frameworks.

[Why we don't use Python's native enum](https://kodare.net/2020/11/17/why_we_dont_use_enums.html)
`Python`'s native `enum`s are fine for what they were designed for, but we found them limited: they do not let you grow in complexity and usage in a reasonable way.

[Using environment variables for application configuration in Python](https://doppler.com/blog/environment-variables-in-python)
Learn how experienced developers use environment variables in `Python`, including managing default values and type casting.

[SVM classifier and RBF kernel - how to make better models in Python](https://t.co/rPmFSNz2qc)
A complete explanation of the inner workings of Support Vector Machines (`SVM`) and the Radial Basis Function (`RBF`) kernel.

[Raspberry Pi & Python Powered Tank](https://www.linuxscrew.com/raspberry-pi-python-powered-tank)
This is part one of a two-part series and handles all of the hardware and wiring and a Python script for test firing the engines.

[Learn Python Data Analytics By Example: NYC Parking Violations](https://t.co/nwWbga5JhL)
A fun project with a detailed walkthrough of the data analysis steps, to help you learn `Python`, `pandas` and `matplotlib`.

[How to Develop a Neural Net for Predicting Car Insurance Payout](https://machinelearningmastery.com/predicting-car-insurance-payout/)
In this tutorial, you will discover how to develop a Multilayer Perceptron neural network model for the Swedish car insurance regression dataset.

[25 IPython Tips for Your Next Advent of Code](https://switowski.com/blog/25-ipython-tips-for-your-next-advent-of-code)

[Python's type checking renaissance](https://dafoster.net/articles/2021/01/26/python's-type-checking-renaissance/)

[Building a smart central heating system with a Raspberry Pi and Python](https://medium.com/python-in-plain-english/building-a-smart-central-heating-system-with-a-raspberry-pi-and-python-403c6ea0fd7e)

[Unravelling `for` statements](https://snarky.ca/unravelling-for-statements/)

### Interesting Projects, Tools and Libraries

[Edifice](https://www.pyedifice.org/)
`Edifice` is a `Python` library for building reactive `UI`s, inspired by modern `Javascript` libraries such as React. `Edifice` makes it easy to build fully reactive UIs without leaving `Python`, giving you the best of both worlds.

[Colorpedia](https://github.com/joowani/colorpedia)
A command-line tool for quickly looking up colors, shades and palettes.

[Scrapera](https://github.com/DarshanDeshpande/Scrapera)
`Scrapera` provides access to a variety of scraper scripts for the most commonly used machine learning and data science domains.

[Dryvo](https://github.com/AdamGold/Dryvo)
Driving lessons made simpler. A custom scheduling `API` built with `Python`.

[Timeflake](https://github.com/anthonynsimon/timeflake)
Timeflake is a 128-bit, roughly-ordered, URL-safe UUID.

[spokestack-python](https://github.com/spokestack/spokestack-python)
`Spokestack` is a library that lets users easily add a voice interface to `Python` applications.

[siuba](https://github.com/machow/siuba)
A `Python` library for using `dplyr`-like syntax with `pandas` and `SQL`.

[paperetl](https://github.com/neuml/paperetl)
An `ETL` process for medical and scientific papers.

[APT-Hunter](https://github.com/ahmedkhlief/APT-Hunter)
`APT-Hunter` is a threat-hunting tool for `Windows` event logs, built with a purple-team mindset to detect `APT` movements hidden in `Windows` event logs and to reduce the time needed to uncover suspicious activity.

[MushroomRL](https://github.com/MushroomRL/mushroom-rl)
`MushroomRL` is a `Python` reinforcement learning (RL) library whose modularity makes it easy to use well-known `Python` libraries for tensor computation (e.g. `PyTorch`, `Tensorflow`) and `RL` benchmarks (e.g. `OpenAI Gym`, `PyBullet`, `Deepmind Control Suite`).

[AeroPython](https://github.com/barbagroup/AeroPython)
Classical Aerodynamics of potential flow using Python and Jupyter Notebooks.

### Latest Releases

[Python in Visual Studio Code – January 2021 Release](https://devblogs.microsoft.com/python/python-in-visual-studio-code-january-2021-release/)
This was a short release where we closed a total of 13 issues, and it includes a data viewer when debugging and PYTHONPATH support with Pylance.


Posa:

> ❤️ Happy Pythonic ;-(Posa's personal, no-liability broadcast)


----- divider -----

> If you spot any translation errors, please be sure to contact me. Thanks!




- First published: [pythonista-weekly~蠎周刊 ~汇集全球蠎事儿 ;-)](http://weekly.pychina.org/python-weekly/pyw-484.html)
- Improve this page: [issue-484.md](https://github.com/PyChina/weekly/blob/master/content/python-weekly/issue%23484.md)

--------------------------------------------------------------------------------
/python_advance/python周报/issue#501.md:
--------------------------------------------------------------------------------
Title: pythonista-weekly : Pyw 501
Date: 2021-05-28 16:25
Tags: Weekly,pythonweekly,Zh
Slug: pyw-501

### Welcome to issue 501 of pythonista-weekly. Let us start!


>Original: [https://mailchi.mp/pythonweekly/python-weekly-issue-501](https://mailchi.mp/pythonweekly/python-weekly-issue-501)
>Translation: Dustyposa

**From the sponsor (PS: the sponsor of the original newsletter):**

[Microsoft adds enterprise support for PyTorch AI on Azure](https://www.zdnet.com/article/microsoft-adds-enterprise-support-for-pytorch-ai-on-azure/)
Microsoft makes Facebook's AI Python library more friendly for enterprises with patches and hotfixes for Windows 10.



### Articles, Tutorials and Talks

[Writing a Jinja-inspired template library in Python](https://notes.eatonphil.com/writing-a-template-library-in-python.html)
In this post we will build a minimal text templating library in `Python`, inspired by `Jinja`. It will be able to display variables and iterate over arrays.

[Shrinking your Python application’s Docker image: an overview](https://pythonspeed.com/articles/smaller-docker-images/)
In this article you will find an overview of the many techniques you can use to shrink your image, organized roughly by the logical order of packaging. The focus is on `Python`, though many of these techniques are more general.

[Visual Studio Code ❤️ Pytorch](https://www.youtube.com/watch?v=AwRBJa23Z-I) ![img](https://mcusercontent.com/e2e180baf855ac797ef407fc7/images/af76283a-6e65-436c-967a-900427cf6399.png)
Learn about some amazing features in `Visual Studio Code` that can level up your `PyTorch` work.

[Why the sad face?](https://lukasz.langa.pl/1d1a43c4-9c8a-4c5f-a366-7f22ce6a49fc/)
An article explaining one detail of the `Black` code style.

[Increase performance via range fields in Django querysets on PostgreSQL](https://nezhar.com/blog/increase-performance-via-range-fields-in-django-querysets-on-postgresql/)
A short story about database optimization from a `Django` application that uses `PostgreSQL`.

[Retool - Build Internal Tools, 10x faster. By developers for developers.](https://retool.com/?utm_source=pythonweekly&utm_medium=sponsorship)
Retool gives you a powerful set of building blocks: tables, lists, charts and more. Integrate with any datasource, REST API, gRPC, or Firebase and customize your app with JS. Save hundreds of hours.
SPONSOR

[Per-visitor Data With Sessions](https://www.mattlayman.com/understand-django/sessions/)
How does `Django` know whether a user is logged in? Where can the framework store data for the visitors of your app? In this article we answer those questions and look at a storage concept in `Django` called the `session`.

[pyFLAC: Real-time lossless audio compression in Python](https://tech-blog.sonos.com/posts/pyflac-real-time-lossless-audio-compression-in-python/)
The `pyFLAC` package lets `Sonos` transmit high-quality audio data more efficiently in internal research projects, reducing data rates by about 40%. Because it can be applied to raw audio streams in real time, it can support large datasets without extra post-processing from disk.

[Making a Modern Python Package with Poetry](https://aricodes.net/posts/python-package-from-scratch/)
Build a complete command-line application without any of the `setup.py` hassle.

[What is WSGI and Why Do You Need Gunicorn and Nginx in Django](https://apirobot.me/posts/what-is-wsgi-and-why-do-you-need-gunicorn-and-nginx-in-django)
In this article we discuss `WSGI`, `Gunicorn` and `Nginx`: why you need them and how they work together with `Django`.

[Making microrepos feel like monorepos with all-repos](https://www.youtube.com/watch?v=jYeSWGqrXKI) ![img](https://mcusercontent.com/e2e180baf855ac797ef407fc7/images/af76283a-6e65-436c-967a-900427cf6399.png)

[Line Search Optimization With Python](https://machinelearningmastery.com/line-search-optimization-with-python/)

[An introduction to Brython](https://stackabuse.com/an-introductory-guide-to-brython/)

### Interesting Projects, Tools and Libraries

[msticpy](https://github.com/microsoft/msticpy)
`msticpy` is a `CyberSecurity` toolkit for `Jupyter` notebooks that helps security analysts visualize and analyze data when investigating security incidents.

[Jerikan](https://github.com/jerikan-network/cmdb)
A configuration management system for network teams.

[pyWhat](https://github.com/bee-san/pyWhat)
`pyWhat` lets you easily identify emails, IP addresses and more. Feed it a `.pcap` file or some text and it will tell you what it is.

[voice2json](https://github.com/synesthesiam/voice2json)
Command-line tools for speech and intent recognition on `Linux`.

[QuickPotato](https://github.com/JoeyHendricks/QuickPotato)
Profile and test to gain insight into the performance of your `Python` code.

[kindle2notion](https://github.com/paperboi/kindle2notion)
Export all the clippings from your Kindle device to a
database in Notion.

[entangle](https://github.com/radiantone/entangle)
A lightweight (serverless) native `python` parallel processing framework based on simple decorators and call graphs.

[Tkinter-Designer](https://github.com/ParthJadhav/Tkinter-Designer)
Create beautiful `Tkinter` GUIs by dragging and dropping.

### Events

[Virtual: PyLadies Vancouver Meetup June 2021](https://www.meetup.com/PyLadies-Vancouver/events/278265968/)
There will be the following talks:

- Starting your quantum journey with Python and Q#
- A Whirlwind Tour of Python Environment Managers


[Virtual: DFW Pythoneers Meetup June 2021](https://www.meetup.com/dfwpython/events/zjnlcsyccjbfb/)
There will be a talk on debugging code with `AI`.

[Brisbane PyLadies Meetup June 2021 - Brisbane City](https://www.meetup.com/BrisbanePyLadies/events/278107618/)
There will be a talk, AI against Modern Slavery, How you can use your AI skills for Good.

Posa:

> ❤️ Happy Pythonic ;-(Posa's personal, no-liability broadcast)


----- divider -----

> If you spot any translation errors, please be sure to contact me. Thanks!




- First published: [pythonista-weekly~蠎周刊 ~汇集全球蠎事儿 ;-)](http://weekly.pychina.org/python-weekly/pyw-501.html)
- Improve this page: [issue-501.md](https://github.com/PyChina/weekly/blob/master/content/python-weekly/issue%23501.md)

--------------------------------------------------------------------------------
/python_advance/python周报/issue#504.md:
--------------------------------------------------------------------------------
Title: pythonista-weekly : Pyw 504
Date: 2021-06-18 16:25
Tags: Weekly,pythonweekly,Zh
Slug: pyw-504

### Welcome to issue 504 of pythonista-weekly. Let us start!


>Original: [https://mailchi.mp/pythonweekly/python-weekly-issue-504](https://mailchi.mp/pythonweekly/python-weekly-issue-504)
>Translation: Dustyposa

### Articles, Tutorials and Talks

[Building a WebAuthn Click Farm — Are CAPTCHAs Obsolete?](https://betterappsec.com/building-a-webauthn-click-farm-are-captchas-obsolete-bfab07bb798c)
How I built a click farm to “bypass” Cloudflare’s CAPTCHA killer with some cheap USB security keys, an Arduino, and a bit of python.

[2nd Year Calculus, But in Python](https://www.youtube.com/watch?v=Teb28OFMVFc) ![img](https://mcusercontent.com/e2e180baf855ac797ef407fc7/images/af76283a-6e65-436c-967a-900427cf6399.png)
This talk goes through all the formulas of second-year calculus and shows how to evaluate them symbolically in `Python`, no pencil or paper required.

[How to troubleshoot memory problems in Python](https://innovation.alteryx.com/how-to-troubleshoot-memory-problems-in-python/)
This post shows how we diagnosed and fixed a memory problem in `EvalML` (an open-source `AutoML` library).

[Data Analytics Crash Course: Teach Yourself in 30 Days](https://www.youtube.com/watch?v=jcTj6FgWOpo) ![img](https://mcusercontent.com/e2e180baf855ac797ef407fc7/images/af76283a-6e65-436c-967a-900427cf6399.png)
This course is an introduction to `Python`-based data analytics. You will gain a basic understanding of how `Python` works, enabling you to confidently find and work with data sources and to use the `Jupyter` environment to draw insights from data.

[Types in Python](https://auth0.com/blog/typing-in-python/)
A bird's-eye view of the typing features in `Python 3.x`.

[New Features in Python 3.10](https://www.youtube.com/watch?v=5-A435hIYio) ![img](https://mcusercontent.com/e2e180baf855ac797ef407fc7/images/af76283a-6e65-436c-967a-900427cf6399.png)
This talk covers some of the most interesting additions to `Python`: structural pattern matching, parenthesized context managers, more typing features, and new and improved error messages.

[Automated machine learning with PyCaret](https://t.co/xTG3kDqmhq)
Automate your machine learning workflow in fewer than ten lines of code.

[Continuous integration for data science with pytest, Github Actions, and Hypervector](https://blog.hypervector.io/posts/2021-5-12-int-github.html)
This guide outlines how to use the `Python` library `pytest`, `Github`'s continuous integration platform `Actions`, and `Hypervector` (an `API` for easily building data science test fixtures) to set up an automated check for data science features that runs every time a pull request is opened against the project.

[How to create a CRUD REST
API](https://auth0.com/blog/how-to-create-crud-rest-api-with-aws-chalice/)
Learn how to build a book database `API` with the `AWS Chalice` serverless technology.

[Excel, Python and the future of data science](https://www.infoworld.com/article/3620913/excel-python-and-the-future-of-data-science.html)
If the ubiquitous spreadsheet program is the gateway to data science, `Python` aims to be the next step.

[Implement API caching with Redis, Flask and Docker [Step-By-Step\]](https://levelup.gitconnected.com/implement-api-caching-with-redis-flask-and-docker-step-by-step-9139636cef24)
Do you want your `API` to be faster and more stable, with fewer requests hitting the server? That is where caching comes in. This article shows how to implement `API` caching with `Redis` on `Flask`.

[How I teach Python on the Raspberry Pi 400 at the public library](https://opensource.com/article/21/6/teach-python-raspberry-pi)

[Writing fast async HTTP requests in Python](https://blog.jonlu.ca/posts/async-python-http)

### Interesting Projects, Tools and Libraries

[Flagsmith](https://github.com/Flagsmith/flagsmith)
`Flagsmith` is an open-source, fully featured feature flag and remote config service. Use our hosted `API`, deploy to your own private cloud, or run on-premises.

[kcare-uchecker](https://github.com/cloudlinux/kcare-uchecker)
A simple tool to detect outdated shared libraries that are still linked to processes in memory.

[SimSwap](https://github.com/neuralchen/SimSwap)
An efficient framework for high-fidelity face swapping.

[plan2scene](https://3dlg-hcvc.github.io/plan2scene/)
Convert floor plans into 3D scenes.

[mimics](https://github.com/maarten-dp/mimics)
A small utility for deferring actions and operations on objects and classes.

[Nerd-Storage](https://github.com/0xHaru/Nerd-Storage)
A simple `LAN` storage.

[replit-cli](https://github.com/CoolCoderSJ/replit-cli)
Interact with `Replit` remotely through the `Replit CLI`.

[STCN](https://github.com/hkchengrex/STCN)
Rethinking space-time networks with improved memory coverage for efficient video object segmentation.

[pyrgg](https://github.com/sepandhaghighi/pyrgg)
`Python` random graph generator.

[epispot](https://github.com/epispot/epispot)
A `Python` package for the mathematical modeling of infectious diseases via compartmental models.

### Events

[Virtual: PyBerlin 29](https://www.meetup.com/PyBerlin/events/277644388/)
There will be the following talks:

- Writing efficient `Python` for data science
- Don't put your server in a nuclear silo


[Virtual: San Diego Python Meetup June
2021](https://www.meetup.com/pythonsd/events/fhtccsyccjbgc/)
There will be a talk on Lambda functions.

[Virtual: PyLadies Berlin](https://www.meetup.com/PyLadies-Berlin/events/277660370/)
There will be a workshop, Introduction to Web-Scraping & Regex.

[Virtual: Edmonton Python Meetup June 2021](https://www.meetup.com/startupedmonton/events/chqxhsyccjbfc/)


Posa:

> ❤️ Happy Pythonic ;-(Posa's personal, no-liability broadcast)


----- divider -----

> If you spot any translation errors, please be sure to contact me. Thanks!




- First published: [pythonista-weekly~蠎周刊 ~汇集全球蠎事儿 ;-)](http://weekly.pychina.org/python-weekly/pyw-504.html)
- Improve this page: [issue-504.md](https://github.com/PyChina/weekly/blob/master/content/python-weekly/issue%23504.md)

--------------------------------------------------------------------------------
/python_advance/python周报/issue#506.md:
--------------------------------------------------------------------------------
Title: pythonista-weekly : Pyw 506
Date: 2021-07-02 16:25
Tags: Weekly,pythonweekly,Zh
Slug: pyw-506

### Welcome to issue 506 of pythonista-weekly. Let us start!


>Original: [https://mailchi.mp/pythonweekly/python-weekly-issue-506](https://mailchi.mp/pythonweekly/python-weekly-issue-506)
>Translation: Dustyposa

**From the sponsor (PS: the sponsor of the original newsletter):**
[Get Your Weekly Dose of Programming](https://www.programmerweekly.com/?utm_source=pwad&utm_medium=newsletter)
A weekly newsletter featuring the best hand curated news, articles, tutorials, talks, tools and libraries etc for programmers.
[Join For Free](https://www.programmerweekly.com/?utm_source=pwad&utm_medium=newsletter)

### News

[Introducing GitHub Copilot: your AI pair programmer](https://copilot.github.com/)
`GitHub Copilot` is a new AI-powered pair programmer that works alongside human programmers to suggest new functions or lines of code, and supports a wide range of frameworks and languages, including `Python`.

### Articles, Tutorials and Talks


[Beautiful interactive tables for your Flask templates](https://blog.miguelgrinberg.com/post/beautiful-interactive-tables-for-your-flask-templates)
Rendering a table of data in a `Flask` template is a relatively simple task when the table is short, but it can be incredibly hard for larger tables that need features such as sorting, pagination and searching. This article shows how to integrate the `dataTables.js` library into your templates so that you can easily create fully featured tables.

[Data mining Meebits NFTs using Python and the OpenSea API](http://adilmoujahid.com/posts/2021/06/data-mining-meebits-nfts-python-opensea/)
In this tutorial we use `Python` and the `OpenSea API` to download and analyze transactions related to `Meebits`. Section 1 starts with a short introduction to `NFTs`, `Larva Lab` and `Meebits`. Section 2 covers how to download `Meebits` transactions with `python` and the `OpenSea API`, and then analyzes the data in order to understand sales trends and the behavior of some `Meebits` sellers and owners.

[Django Tutorial for Beginners [2021\]](https://www.youtube.com/watch?v=rHux0gMZ3Eg) ![img](https://mcusercontent.com/e2e180baf855ac797ef407fc7/images/af76283a-6e65-436c-967a-900427cf6399.png)
Learn `Django` for back-end development work. This `Django` tutorial teaches you everything you need to get started.

[Typeclasses in Python](https://sobolevn.me/2021/06/typeclasses-in-python)
This article introduces a new concept for `Python` developers: `typeclasses`. It is the concept behind a new `dry-python` library called `classes`.

[Functools - the power of higher-order functions in Python](https://t.co/TT900DomXt)
Take a tour of `Python`'s `functools` module and learn how to use its higher-order functions to implement caching, overloading and much more.

[Build A Machine Learning iOS App](https://www.youtube.com/watch?v=ca4RGvIY5cc) ![img](https://mcusercontent.com/e2e180baf855ac797ef407fc7/images/af76283a-6e65-436c-967a-900427cf6399.png)
Learn how to deploy a `PyTorch` model on `iOS` devices using `Torchscript`, and build a mobile `App` in `Swift` for image classification.

[How to start a production-ready Django project](https://simpleisbetterthancomplex.com/tutorial/2021/06/27/how-to-start-a-production-ready-django-project.html)

[Measuring the memory usage of a Pandas DataFrame](https://pythonspeed.com/articles/pandas-dataframe-series-memory-usage/)

### Interesting Projects, Tools and Libraries

[fds](https://github.com/DAGsHub/fds)
A `CLI` for data scientists that conveniently wraps `git` and `dvc` so that data and code can be version-controlled together.

[pytago](https://github.com/nottheswimmer/pytago)
Transpiles some `Python` into human-readable `Golang`.

[cloudproxy](https://github.com/claffin/cloudproxy)
Hide your scrapers' IP behind the cloud. Provision proxy servers across different cloud providers to improve your scraping success.

[Zero](https://github.com/Ananto30/zero)
A high-performance, fast `Python` microservice framework `(RPC + PubSub)`.

[TFace](https://github.com/Tencent/TFace)
A trusty face recognition research platform. It provides a high-performance distributed training framework and releases our efficient implementations of methods.

[DeepLab2](https://github.com/google-research/deeplab2)
`DeepLab2` is a `TensorFlow` library for deep labeling, aiming to provide a unified, state-of-the-art `TensorFlow` codebase for dense pixel labeling tasks.

[NL-Augmenter](https://github.com/GEM-benchmark/NL-Augmenter)
A Collaborative Repository of Natural Language Transformations.

### Latest Releases

[Python 3.9.6, 3.8.11, 3.7.11, and 3.6.14 are now available](https://blog.python.org/2021/06/python-396-3811-3711-and-3614-are-now.html)

### Events

[Virtual: PyMNtos Python Presentation Night #96](https://www.meetup.com/PyMNtos-Twin-Cities-Python-User-Group/events/278900280/)
There will be the following talks:

- Implementing streaming-style web streaming with `Python`
- Show and tell: `GetServiceBell.com`


[Virtual: PyLadiesStockholm Workshop July 2021](https://www.meetup.com/PyLadiesStockholm/events/278600570/)
We will talk about the basics of machine learning: some definitions, model training and testing, evaluation metrics, and ML problem types (regression, classification and clustering). Then we will walk through a hands-on ML application, the "Pima Indians Diabetes Prediction" project.

[PyData Tel Aviv July 2021](https://www.meetup.com/PyData-Tel-Aviv/events/278830929/)
There will be the following talks:

- Making `N` populations work in one model architecture
- Contextual Bandits for Pricing
- Offline metrics for recommender systems


Posa:

> ❤️ Happy Pythonic ;-(Posa's personal, no-liability broadcast)


----- divider -----

> If you spot any translation errors, please be sure to contact me. Thanks!




- First published: [pythonista-weekly~蠎周刊 ~汇集全球蠎事儿 ;-)](http://weekly.pychina.org/python-weekly/pyw-506.html)
- Improve this page: [issue-506.md](https://github.com/PyChina/weekly/blob/master/content/python-weekly/issue%23506.md)

--------------------------------------------------------------------------------
/python_advance/python周报/issue#509.md:
--------------------------------------------------------------------------------
Title: pythonista-weekly : Pyw 509
Date: 2021-07-23 16:11
Tags: Weekly,pythonweekly,Zh
Slug: pyw-509

### Welcome to issue 509 of pythonista-weekly. Let us start!


>Original: [https://mailchi.mp/pythonweekly/python-weekly-issue-509](https://mailchi.mp/pythonweekly/python-weekly-issue-509)
>Translation: Dustyposa

### Articles, Tutorials and Talks

[The 3 most important Fourier transforms in Python](https://www.youtube.com/watch?v=GKsCWivmlHg) ![img](https://mcusercontent.com/e2e180baf855ac797ef407fc7/images/af76283a-6e65-436c-967a-900427cf6399.png)
This video digs into the `sympy` and `scipy` libraries to understand Fourier analysis in `python`, looking specifically at (1) the Fourier transform, (2) the Fourier series, and (3) the discrete Fourier transform.

[Highly accurate protein structure prediction with AlphaFold](https://www.nature.com/articles/s41586-021-03819-2_reference.pdf)
Nature has published the latest on `AlphaFold`, a computational approach that uses machine learning to predict full-chain protein structures with high accuracy.

[Beautiful ideas in programming: generators and continuations](https://www.hhyu.org/posts/generator_and_continuation/)
This article summarizes what the author learned while trying to deeply understand two important programming concepts: `Python`'s generators and `Scheme`'s `continuation`s.

[namedtuple in a post-dataclasses world](https://death.andgravity.com/namedtuples)
`namedtuple` has been around for a long time, and over the years its convenience has seen it used far beyond its original purpose. With dataclasses now covering part of those use cases, what should `namedtuple` be used for? In this article we look at that question through a few examples from real code.

[Maps with Django (part 2): GeoDjango, PostGIS and Leaflet](https://www.paulox.net/2021/07/19/maps-with-django-part-2-geodjango-postgis-and-leaflet/)
A quick-start guide to creating a web map with the `Python`-based web framework `Django`, using its module `GeoDjango`, the `PostgreSQL` database with its spatial extension `PostGIS`, and `Leaflet`, a `JavaScript` library for interactive maps.

[Monitor your home's
temperature and humidity with Raspberry Pis and Prometheus](https://opensource.com/article/21/7/home-temperature-raspberry-pi-prometheus) 30 | 在 `Raspberry Pi` 上用 `Python` 测试一个 `Prometheus` 应用程序,收集温度传感器数据。 31 | 32 | [Making Sense Of Settings](https://www.mattlayman.com/understand-django/settings/) 33 | 所有的 `Django` 应用程序都需要进行配置,以便正常运行。在这篇文章中,我们将深入探讨 `Django` 如何让您使用设置模块来配置您的项目。我们还将探讨如何更有效地使用设置。 34 | 35 | [Complete Glossary of Keras Neural Network Layers (with Code)](https://analyticsarora.com/complete-glossary-of-keras-neural-network-layers-with-code) 36 | Learn the purpose and instantiation for Core layers, Pooling layers, Preprocessing layers, etc. 37 | 38 | ### 有趣的项目、工具和库 39 | 40 | [AlphaFold](https://github.com/deepmind/alphafold) 41 | 这个包提供了 `AlphaFold v2.0` 推理管道的实现。 42 | 43 | [MVT](https://github.com/mvt-project/mvt) 44 | `MVT` 是一个寻找智能手机设备感染迹象的取证工具。 45 | 46 | [MaskFormer](https://github.com/facebookresearch/MaskFormer) 47 | Per-Pixel Classification is Not All You Need for Semantic Segmentation. 

[shillelagh](https://github.com/betodealmeida/shillelagh/)
Makes it easy to query `API`s via `SQL`.

[Chameleon](https://github.com/klezVirus/chameleon)
`Chameleon` is yet another `PowerShell` obfuscation tool designed to bypass `AMSI` and commercial antivirus solutions.

[Parallelformers](https://github.com/tunib-ai/parallelformers)
An efficient model parallelization toolkit for deployment.

[PIX](https://github.com/deepmind/dm_pix)
`PIX` is an image processing library written in `JAX`, for use with `JAX`.

[byp4xx](https://github.com/lobuhi/byp4xx)
A `Python` script for bypassing `HTTP 40X` responses. Features: verb tampering, headers, #bugbountytips tricks, and 2454 User-Agents.

[Multimerge](https://github.com/sweeneyde/multimerge)
`Multimerge` is a `Python` package implementing an algorithm for lazily combining several sorted iterables into one longer sorted iterator. It is a drop-in replacement for `heapq.merge` in the `Python` standard library.

[CodeGen](https://github.com/facebookresearch/CodeGen)
Reference implementation of code generation projects from `Facebook AI Research`. A general toolkit for applying machine learning to code, from dataset creation to model training and evaluation. Comes with pretrained models.

[Muler](https://github.com/PizzaMyHeart/muler)
A drug information search engine built with `Flask`.

### Recent Releases


[Python in Visual Studio Code – July 2021 Release](https://devblogs.microsoft.com/python/python-in-visual-studio-code-july-2021-release/)
These are some of the notable changes introduced in this release:

- A faster way to configure project roots via a new Pylance quick fix
- Selecting a Python interpreter no longer changes settings
- New debugger features: step into targets and function breakpoints

### Events

[Virtual: Scale EDA & ML Workloads To Clusters & Back With Dask](https://www.meetup.com/PyData-Calgary/events/279465823/)
In this talk you will learn how to use `Dask` to scale your `PyData` workloads with minimal code changes, so you can focus on your work without having to learn a new `API`.

[Virtual: Machine Learning and its Potential to Improve Epilepsy Diagnosis](https://www.meetup.com/PyData-Edinburgh/events/279576429/)
This talk outlines how new applications of `ML` could soon give physiologists better quantitative tools to improve workflows and diagnostic accuracy.

[Virtual: PyData Berlin & PyData Hamburg July 2021 Meetup](https://www.meetup.com/PyData-Berlin/events/279302584/)
The following topics will be covered:

- AI based visual assistance system
- Introducing Elyra: Extending JupyterLab for AI


Posa:

> ❤️ Happy Pythonic ;-(Posa's private, no-liability broadcast)


----- divider -----

> If you find any translation errors, please be sure to contact me. Thank you!

- First published: [pythonista-weekly ~ Python Weekly ~ gathering Python news from around the world ;-)](http://weekly.pychina.org/python-weekly/pyw-509.html)
- Improve it: [issue-509.md](https://github.com/PyChina/weekly/blob/master/content/python-weekly/issue%23509.md)

--------------------------------------------------------------------------------
/python_advance/python周报/issue#512.md:
--------------------------------------------------------------------------------
Title: pythonista-weekly : Pyw 512
Date: 2021-08-13 16:11
Tags: Weekly,pythonweekly,Zh
Slug: pyw-512

### Welcome to issue 512 of pythonista weekly. Let us start!


>Original: [https://mailchi.mp/pythonweekly/python-weekly-issue-512](https://mailchi.mp/pythonweekly/python-weekly-issue-512)
>Translation: Dustyposa

**From our sponsor (PS: the sponsor of the original newsletter):**

[Django Day Copenhagen 2021 Call for Proposals](https://2021.djangoday.dk/cfp)
Submit before August 15th 2021 23:59:59 UTC+1.

### Articles, Tutorials and Talks

[How to Take Derivatives in Python](https://www.youtube.com/watch?v=DeeoiE22bZ8) ![img](https://mcusercontent.com/e2e180baf855ac797ef407fc7/images/af76283a-6e65-436c-967a-900427cf6399.png)
This video covers three different kinds of situations in which you need to take derivatives in `Python`: symbolic, numeric, and quasi-symbolic.

[What to do when you botch a release on PyPI](https://snarky.ca/what-to-do-when-you-botch-a-release-on-pypi/)
You published a release on `PyPI`, but something went wrong (we have all been there). The problem might be big enough that the whole release is broken, or it might just be a typo in the `README`. Fortunately, there are things you can do to handle each situation.

[YOLOV4 Training in 2 Lines of Code with QuickAI](https://www.youtube.com/watch?v=eeW1EzSQNuc) ![img](https://mcusercontent.com/e2e180baf855ac797ef407fc7/images/af76283a-6e65-436c-967a-900427cf6399.png)
In this video, we train a custom `YOLOV4` object detector with just 2 lines of code.

[Modernizing Object Storage for Cloud-Native Deployments](https://engineering.soroco.com/modernizing-object-storage-for-cloud-native-deployments/)
This article describes several ways of implementing object storage and their trade-offs, and how hybrid cloud object storage can run on premises where appropriate and use the cloud when needed to achieve self-scaling, self-healing and self-managing properties.

[PayPal Payments Tutorial with Django and React](https://justdjango.com/blog/django-react-paypal-payments)
Learn how to sell digital products with `PayPal` payments. In this tutorial you will learn how to use `Django` and `React` to sell a digital product with a simple product landing page.

[Deliver Clean and Safe Code for Your Python Applications](https://www.sonarsource.com/resources/white-papers/deliver-clean-safe-python-code/?utm_source=python-weekly&utm_medium=paid&utm_campaign=python)
Learn how static code analysis tools help the `Python` community identify (and fix) bugs and vulnerabilities in some well-known open-source `Python` projects. [Read More](https://www.sonarsource.com/resources/white-papers/deliver-clean-safe-python-code/?utm_source=python-weekly&utm_medium=paid&utm_campaign=python) SPONSOR

[Get Started with Libsodium in Python and Go](https://developer.okta.com/blog/2021/08/05/libsodium-encryption-go-python)
Learn how to encrypt a file in `Python` and decrypt it in `Go` using `libsodium`!

[Task-Oriented Conversational AI in Airbnb Customer Support](https://medium.com/airbnb-engineering/task-oriented-conversational-ai-in-airbnb-customer-support-5ebf49169eaa)
How `Airbnb` provides automated support to enhance the host and guest experience.

[Django REST Framework Recipes](https://tinystruggles.com/posts/drf_recipes/)

[Turn Any Image into a Van Gogh Painting with Neural Style Transfer in Python](https://www.youtube.com/watch?v=yn3bWvQZIIo)

[Apply conversion functions to data in SQLite columns with the sqlite-utils CLI tool](https://simonwillison.net/2021/Aug/6/sqlite-utils-convert/)

[Celery in production: Three more years of fixing bugs](https://t.co/KHZZvWc0nk)

### Interesting Projects, Tools and Libraries

[Evidently](https://github.com/evidentlyai/evidently)
Interactive reports for analyzing machine learning models during validation or production monitoring.

[Berowra](https://github.com/sampoder/berowra)
An open-source `CMS` built for hackers and hobbyists, running on `Deta Space`.

[AutoVideo](https://github.com/datamllab/autovideo)
An automated video action recognition system.

[Lona](https://github.com/fscherf/lona)
`Lona` is a web application framework designed for writing responsive web apps in `Python`.

[cmdpxl](https://github.com/knosmos/cmdpxl)
A totally practical command-line image editor.

[image-to-latex](https://github.com/kingyiusuen/image-to-latex)
Converts images of `LaTeX` math equations into `LaTeX` code.

[connector-x](https://github.com/sfu-db/connector-x)
`ConnectorX` enables you to load data from databases into `Python` in the fastest and most memory-efficient way.

[hAFL2](https://github.com/SafeBreach-Labs/hAFL2)
hAFL2 is a kAFL-based hypervisor fuzzer.

[Bagua](https://github.com/BaguaSys/bagua)
`Bagua` is a flexible, high-performance framework for developing distributed training algorithms.

### Recent Releases

[Python in Visual Studio Code – August 2021 Release](https://devblogs.microsoft.com/python/python-in-visual-studio-code-august-2021-release/)

### Events


[Virtual: San Francisco Python Meetup August 2021](https://www.meetup.com/sfpython/events/278047971)
The following topics will be covered:

- More than just numbers: working with dates and times in `pandas`
- Regression testing with `Touca`
- How we deal with bad code analysis


[Virtual: PyLadies Dublin Meetup August 2021](https://www.meetup.com/PyLadiesDublin/events/279318562/)
The following topics will be covered:

- Text classification with HuggingFace Transformers
- PySpark 101: Tips and Tricks


[Virtual: PyData T&T August 2021](https://www.meetup.com/pydata-t-t/events/280019534/)
There will be a talk, An Introduction to Recommender Systems in Python.

[Virtual: PyData São August 2021](https://www.meetup.com/PyData-Sao-Paulo/events/280022253/)
There will be a talk, Open Source causal impact analysis.


Posa:

> ❤️ Happy Pythonic ;-(Posa's private, no-liability broadcast)


----- divider -----

> If you find any translation errors, please be sure to contact me. Thank you!

- First published: [pythonista-weekly ~ Python Weekly ~ gathering Python news from around the world ;-)](http://weekly.pychina.org/python-weekly/pyw-512.html)
- Improve it: [issue-512.md](https://github.com/PyChina/weekly/blob/master/content/python-weekly/issue%23512.md)

--------------------------------------------------------------------------------
/python_advance/python周报/issue#513.md:
--------------------------------------------------------------------------------
Title: pythonista-weekly : Pyw 513
Date: 2021-08-20 16:11
Tags: Weekly,pythonweekly,Zh
Slug: pyw-513

### Welcome to issue 513 of pythonista weekly. Let us start!


>Original: [https://mailchi.mp/pythonweekly/python-weekly-issue-513](https://mailchi.mp/pythonweekly/python-weekly-issue-513)
>Translation: Dustyposa

### Articles, Tutorials and Talks


[Test-Driven Development in Python](https://www.youtube.com/watch?v=B1j6k2j2eJg) ![img](https://mcusercontent.com/e2e180baf855ac797ef407fc7/images/af76283a-6e65-436c-967a-900427cf6399.png)

Test-driven development (`TDD`), also known as red-green-refactor, is a common software development technique in which you write tests before writing the actual code. Using `Python` as the example, this video shows how to do `TDD` with the `unittest` package and shares some practical tips on software testing.

[Flask & React - From Zero to Full-Stack (with Samples)](https://blog.appseed.us/flask-react-full-stack-seed-projects/)
Learn how to use `Flask` and `React` to write full-stack products that can be extended easily. Includes open-source samples.

[Reining in the thundering herd.](https://blog.clubhouse.com/reining-in-the-thundering-herd-with-django-and-gunicorn/)
Getting to `80%` `CPU` utilization with `Django`.

[Retrieve and Analyze US Inflation Data with an API, Python and Tableau](https://t.co/kvvWMmkOeO)
Use `Python` and the US Bureau of Labor Statistics data `API` to retrieve Consumer Price Index data. Load the data into `Tableau` and compare it with the initial composite rates of Series I US savings bonds.

[Python Django and Google APIs](https://www.youtube.com/watch?v=_vCT42vDfgw) ![img](https://mcusercontent.com/e2e180baf855ac797ef407fc7/images/af76283a-6e65-436c-967a-900427cf6399.png)
In this full course, learn how to build a `Python Django` application that uses several Google `API`s.

[SonarLint Free and Open Source IDE Extension for Python Devs](https://www.sonarlint.org/?utm_source=python-weekly&utm_medium=paid&utm_campaign=sonarlint)
Working in VS Code, PyCharm, Visual Studio, or Eclipse? SonarLint helps you find & fix Code Quality and Code Security issues in your Python codebase! [Discover More](https://www.sonarlint.org/?utm_source=python-weekly&utm_medium=paid&utm_campaign=sonarlint) SPONSOR

[Automate Asana with a Python UI](https://anvil.works/articles/using-the-asana-api)
`Asana`'s `API` is a powerful tool. In this article you will learn to use it by playing the role of a manager who wants a simple way to update their team's deadlines when someone is absent. We will build an app that records team members' absences and automatically updates all of their task deadlines at the click of a button.

[String translate and maketrans methods](https://mathspp.com/blog/pydonts/string-translate-and-maketrans-methods)
In this article, learn about the `Python` string methods `translate` and `maketrans`.

[Reddit Interview Problems: The Game of Life](https://alexgolec.dev/reddit-interview-problems-the-game-of-life/)

[Reproducible Python Bytecode](https://vulns.xyz/2021/08/reproducible-python-bytecode/)

[Introducing the Python Launcher for Unix](https://snarky.ca/introducing-the-python-launcher-for-unix/)

[Feasibility, Use Cases, and Limitations of Pyodide](https://devblogs.microsoft.com/python/feasibility-use-cases-and-limitations-of-pyodide/)

[Pythonic monotonic](https://nedbatchelder.com/blog/202108/pythonic_monotonic.html)

### Interesting Projects, Tools and Libraries

[whython](https://github.com/NexInfinite/whython)
An almost completely customizable language made in `python`!

[burst](https://github.com/burstable-ai/burst)
A command-line tool for remotely executing code in the cloud.

[Inverno](https://github.com/werew/inverno)
`Inverno` is a flexible investment portfolio tracker.

[python-project-template](https://github.com/rochacbruno/python-project-template)
A `github` template for starting `Python` projects. It uses `github action`s to generate your project from the template.

[TorchDrug](https://github.com/DeepGraphLearning/torchdrug)
A powerful and flexible machine learning platform for drug discovery.

[darts](https://github.com/unit8co/darts)
A `python` library for easy manipulation and forecasting of time series.

[byexample](https://github.com/byexamples/byexample)
byexample is a literate programming engine where you mix ordinary text and snippets of code in the same file and then you execute them as regression tests.

[fastapi-azure-auth](https://github.com/Intility/fastapi-azure-auth)
Easy and secure use of `Azure AD` for your `FastAPI` `API`s.

[aws-ip-ranges](https://github.com/seligman/aws-ip-ranges)
Tracks the history and size of `AWS`'s `ip-ranges.json` file.

[DIY-ai-art](https://github.com/maxvfischer/DIY-ai-art)
How to build your own AI art installation from scratch.

[backdoors101](https://github.com/ebagdasa/backdoors101)
`Backdoors 101` is a `PyTorch` framework for state-of-the-art backdoor defenses and attacks on deep learning models.


Posa:

> ❤️ Happy Pythonic ;-(Posa's private, no-liability broadcast)


----- divider -----

> If you find any translation errors, please be sure to contact me. Thank you!


- First published: [pythonista-weekly ~ Python Weekly ~ gathering Python news from around the world ;-)](http://weekly.pychina.org/python-weekly/pyw-513.html)
- Improve it: [issue-513.md](https://github.com/PyChina/weekly/blob/master/content/python-weekly/issue%23513.md)

--------------------------------------------------------------------------------
/python_advance/python周报/issue#519.md:
--------------------------------------------------------------------------------
Title: pythonista-weekly : Pyw 519
Date: 2021-10-01 16:11
Tags: Weekly,pythonweekly,Zh
Slug: pyw-519


### Welcome to issue 519 of pythonista weekly. Let us start!


>Original: [https://mailchi.mp/pythonweekly/python-weekly-issue-519](https://mailchi.mp/pythonweekly/python-weekly-issue-519)
>Translation: Dustyposa

#### From our sponsor (PS: the sponsor of the original newsletter):

[Your Python Code: Powerful and Secure](https://www.sonarqube.org/developer-edition/?utm_source=pythonweekly&utm_medium=paid&utm_campaign=python&utm_content=primary-0930) SonarQube offers Python static analysis that’s powerful, fast, and accurate - out of the box! Find and review Security Hotspots and automatically detect Vulnerabilities in your code. Get started analyzing your Python projects today! **Start for free**.

### Articles, Tutorials and Talks


[Build a Blockchain Application with Python & FastAPI](https://www.youtube.com/watch?v=G5M4bsxR-7E) ![img](https://mcusercontent.com/e2e180baf855ac797ef407fc7/images/af76283a-6e65-436c-967a-900427cf6399.png)
Learn how to build a blockchain in `Python` and then access it through an `API` built with `FastAPI`.

[Python behind the scenes #13: the GIL and its effects on Python multithreading](https://tenthousandmeters.com/blog/python-behind-the-scenes-13-the-gil-and-its-effects-on-python-multithreading/)
This post is about the non-obvious effects of the `GIL`. Along the way, it discusses what the `GIL` really is, why it exists, how it works, and how it will affect `Python` concurrency in the future.

[Natural Language Processing with spaCy & Python](https://www.youtube.com/watch?v=dIUTsFT2MeQ) ![img](https://mcusercontent.com/e2e180baf855ac797ef407fc7/images/af76283a-6e65-436c-967a-900427cf6399.png)
In this `spaCy` tutorial, you will learn all about natural language processing and how to apply it to real-world problems using the `Python` `spaCy` library.

[New Testing Features in Django 4.0](https://adamj.eu/tech/2021/09/28/new-testing-features-in-django-4.0/)
`Django 4.0` released its first `alpha` last week, and the final release should come in December. It contains a large number of new features, which you can review in the release notes. This article takes a closer look at the changes to testing.

[3 Ways to Test Your API with Python](https://opensource.com/article/21/9/unit-test-python)
Unit testing can be daunting, but these `Python` modules will make your life much easier.

[Fetch the Flag CTF at SnykCon | Oct 5 | Register for free](https://snyk.io/snykcon/ctf/?utm_campaign=Event-SnykCon-2021&utm_medium=Paid-Email&utm_source=Python-Weekly&utm_content=ctf)
Ready to take your security skills to the next level? Compete in 20 hands-on hacking challenges & win prizes at Fetch the Flag CTF at SnykCon on Oct 5. [Register for free.](https://snyk.io/snykcon/ctf/?utm_campaign=Event-SnykCon-2021&utm_medium=Paid-Email&utm_source=Python-Weekly&utm_content=ctf) SPONSOR

[Conditional expressions](https://mathspp.com/blog/pydonts/conditional-expressions)
In this article, learn how to use `Python`'s conditional expressions.

[Pixel Shuffle Super Resolution with TensorFlow, Keras, and Deep Learning](https://www.pyimagesearch.com/2021/09/27/pixel-shuffle-super-resolution-with-tensorflow-keras-and-deep-learning/)
Learn about `Pixel Shuffle Super Resolution` and how to use it in your own projects and code.

[Python: Deep and Shallow Copy Object](https://stackabuse.com/python-deep-and-shallow-copy-object/)
In this article you will learn the difference between deep and shallow copies of objects in `Python`, and how to do both with the `copy` library.

[Unravelling data structure displays](https://snarky.ca/unravelling-data-structure-displays/)

[Python: Flask Development on Kubernetes with DevSpace](https://loft.sh/blog/python-flask-development-on-kubernetes-with-devspace/)

[4 Things Tutorials Don't Tell You About PyPI](https://blog.paoloamoroso.com/2021/09/4-things-tutorials-dont-tell-you-about.html)

[Type Check Your Django Application](https://kracekumar.com/post/type_check_your_django_app/)

### Interesting Projects, Tools and Libraries

[Muzic](https://github.com/microsoft/muzic)
`Muzic` is a research project on AI music that empowers music understanding and generation with deep learning and artificial intelligence.

[Imia](https://github.com/alex-oleshkevich/imia)
An authentication library for `Starlette` and `FastAPI`.

[objexplore](https://github.com/kylepollina/objexplore)
A terminal UI for inspecting and exploring `Python` objects.

[LiveSpeechPortraits](https://github.com/YuanxunLu/LiveSpeechPortraits)
Live Speech Portraits: real-time photorealistic talking-head animation.

[LibFewShot](https://github.com/RL-VIG/LibFewShot)
A comprehensive library for few-shot learning.

[RobustVideoMatting](https://github.com/PeterL1n/RobustVideoMatting)
Robust Video Matting in PyTorch, TensorFlow, TensorFlow.js, ONNX, CoreML!

[wonk](https://github.com/aminohealth/wonk)
`Wonk` is a tool for combining a set of `AWS` policy files into smaller compiled policy sets.

[MagInkCal](https://github.com/speedyg0nz/MagInkCal)
A magic e-ink calendar that automatically syncs to Google Calendar and runs on a battery-powered `Raspberry Pi Zero`.

Posa:

> ❤️ Happy Pythonic ;-(Posa's private, no-liability broadcast)


----- divider -----

> If you find any translation errors, please be sure to contact me. Thank you!


- First published: [pythonista-weekly ~ Python Weekly ~ gathering Python news from around the world ;-)](http://weekly.pychina.org/python-weekly/pyw-519.html)
- Improve it: [issue-519.md](https://github.com/PyChina/weekly/blob/master/content/python-weekly/issue%23519.md)

--------------------------------------------------------------------------------
/python_advance/python周报/material/420-reloading.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Dustyposa/goSpider/4f5bdab86c71f503ac3ca7fbf5f81a615e323759/python_advance/python周报/material/420-reloading.gif

--------------------------------------------------------------------------------
/python_advance/python周报/template.md:
--------------------------------------------------------------------------------
Title: pythonista-weekly : Pyw 444
Date: 2020-03-28 14:22
Tags: Weekly,pythonweekly,Zh
Slug: pyw-444

### Welcome to issue 444 of pythonista weekly. Let us start!


>Original: [https://mailchi.mp/pythonweekly/python-weekly-issue-444](https://mailchi.mp/pythonweekly/python-weekly-issue-444)
>Translation: Dustyposa

**From our sponsor (PS: the sponsor of the original newsletter):**
Monitor your python metrics, logs and cluster analytics with `Datadog`. With `Datadog`'s App Analytics you can drill into any dimension and find the information you need for dynamic diagnosis and fast troubleshooting. [Try it free for 14 days!](https://www.datadoghq.com/dg/apm/python-troubleshooting/?utm_source=Advertisement&utm_medium=Advertisement&utm_campaign=PythonWeekly-Troubleshooting)

[Vettery for Python developers](https://www.vettery.com/tech?utm_source=newsletter&utm_medium=pythonweekly&utm_term=tech&utm_content=grouped&utm_campaign=ad-77579)
Vettery is a hiring marketplace that is changing the way people get hired and hire. Ready to change jobs? Create a profile for free, name your own salary, and connect with hiring managers at top employers now!

###



### News



### Articles, Tutorials and Talks



### Interesting Projects, Tools and Libraries





### Events



#### Posa:

> ❤️ Happy Pythonic ;-(Posa's private, no-liability broadcast)

----- divider -----

> If you find any translation errors, please be sure to contact me. Thank you!

- First published: [pythonista-weekly ~ Python Weekly ~ gathering Python news from around the world ;-)](http://weekly.pychina.org/python-weekly/pyw-444.html)
- Improve it: [issue-444.md](https://github.com/PyChina/weekly/blob/master/content/python-weekly/issue%23444.md)

--------------------------------------------------------------------------------
/python_advance/requests请求重试/decoration_built.py:
--------------------------------------------------------------------------------
import functools
import time
import types
from typing import Callable

import requests
from requests.adapters import HTTPAdapter

from normal import BaseDictData


class RequestsRetry:
    def __init__(self, max_retry: int, func: Callable) -> None:
        """Note: the decorated function is the last argument passed in."""
        self.max_retry = max_retry
        functools.wraps(func)(self)  # keep the metadata of the original function
        self.func = func

    def __call__(self, *args, **kwargs) -> BaseDictData:
        """The decorator's processing logic."""
        session: requests.Session = kwargs.get("session", requests.Session())  # use the given session or create a new one
        # fall back to the value given to the decorator when the caller does not override it
        max_retry: int = kwargs.get("max_retry", self.max_retry)
        adapter: HTTPAdapter = HTTPAdapter(max_retries=max_retry)  # the adapter handles the retries for us
        session.mount("http://", adapter=adapter)  # mount the adapter; the first argument is the URL prefix it applies to
        kwargs.update(session=session)  # always hand the session with the adapter mounted to the wrapped function
        try:
            response: BaseDictData = self.func(*args, **kwargs)
        except requests.ConnectionError:
            # ConnectTimeout is a subclass of ConnectionError, and exhausted
            # read-timeout retries are also reported as ConnectionError
            print(f"All {max_retry} retries timed out; returning an empty value.")
            return {}
        else:
            return response

    def __get__(self, instance, owner) -> object:
        """Implementing this method lets the decorator be used on methods of a class."""
        if instance is None:
            return self
        return types.MethodType(self, instance)  # when accessed through an instance, bind it as self

    def itself(self, *args, **kwargs) -> BaseDictData:
        """Call the original function without any extra processing."""
        return self.func(*args, **kwargs)


def retry(max_retry: int = 3):
    """Decorator factory that adds the retry-count parameter."""
    # to avoid defining an extra function, use functools.partial to build the RequestsRetry instance
    return functools.partial(RequestsRetry, max_retry)


@retry()
def get_data(url: str, time_out: float = 3., **kwargs) -> BaseDictData:
    """
    Automatically retries on timeout errors, using the retry support built into requests.
    :param url: the URL to request
    :param time_out: timeout in seconds
    :param kwargs: optional keyword arguments
    :return: BaseDictData
    """
    session: requests.Session = kwargs.get("session", requests.Session())  # use the given session or create a new one
    params: BaseDictData = kwargs.get("params", {})  # whatever else was passed in, only this key is used
    headers: BaseDictData = kwargs.get("headers", {})  # same as above
    with session.get(url, params=params, headers=headers, timeout=time_out) as response:
        return response.json()


class MySpider:
    def __init__(self, func: Callable):
        self.func = func

    def __call__(self, *args, **kwargs):
        print("rest for one second")
        time.sleep(1)
        res_data = self.func(*args, **kwargs)
        return res_data

    def at_once_run(self, *args, **kwargs):
        print("now, run the function")
        return self.func(*args, **kwargs)


@MySpider
def spider():
    print("crawling")


if __name__ == '__main__':
    # res = get_data("http://127.0.0.1:5000/api/retry", time_out=1.)
    # print(res)
    # spider = MySpider(spider).at_once_run
    # spider = MySpider(spider)

    spider.at_once_run()

--------------------------------------------------------------------------------
/python_advance/requests请求重试/decoration_simple.py:
--------------------------------------------------------------------------------
from functools import wraps
from typing import Tuple, Type

import requests
import wrapt

from normal import BaseDictData


def strong_retry(
        max_retry: int = 3,
        exception: Tuple[Type[BaseException], ...] = (
                requests.ConnectTimeout,
                requests.ReadTimeout,
        )
):
    """
    A general-purpose retry decorator for functions.
    :param max_retry: maximum number of retries
    :param exception: exception types to catch
    :return:
    """

    def retry(func):
        @wraps(func)  # keep the metadata of the decorated function
        def closure(*args, **kwargs) -> BaseDictData:
            for i in range(max_retry + 1):
                try:
                    res = func(*args, **kwargs)
                except exception:
                    print(f"Attempt {i + 1} failed.")
                else:
                    return res
            return {}

        return closure

    return retry


def strong_common_retry(
        max_retry: int = 3,
        exception: Tuple[Type[BaseException], ...] = (
                requests.ConnectTimeout,
                requests.ReadTimeout
        )
):
    """
    A general-purpose retry decorator that also works on methods.
    :param max_retry: maximum number of retries
    :param exception: exception types to catch
    :return:
    """

    @wrapt.decorator  # keeps the metadata of the decorated function
    def wrapper(wrapped, instance, args, kwargs) -> BaseDictData:
        """

        :param wrapped: the decorated callable
        :param instance: the class instance if the decorated object is a regular method;
                         the class if it is a classmethod;
                         None if it is a class, function, or staticmethod
        :param args:
        :param kwargs:
        :return:
        """
        for i in range(max_retry + 1):
            try:
                res = wrapped(*args, **kwargs)
            except exception:
                print(f"Attempt {i + 1} failed.")
            else:
                return res
        return {}

    return wrapper


@strong_common_retry()
def get_data(url: str, time_out: float = 3., **kwargs) -> BaseDictData:
    """
    Fetch JSON from a URL; timeout errors are retried by the strong_common_retry decorator.
    :param url: the URL to request
    :param time_out: timeout in seconds
    :param kwargs: optional keyword arguments
    :return: BaseDictData
    """
    session: requests.Session = kwargs.get("session", requests.Session())  # use the given session or create a new one
    params: BaseDictData = kwargs.get("params", {})  # whatever else was passed in, only this key is used
    headers: BaseDictData = kwargs.get("headers", {})  # same as above
    with session.get(url, params=params, headers=headers, timeout=time_out) as response:
        return response.json()


if __name__ == '__main__':
    r = get_data("http://127.0.0.1:5000/api/retry", time_out=1.)
    print(r)

--------------------------------------------------------------------------------
/python_advance/requests请求重试/derector.py:
--------------------------------------------------------------------------------
import time


def count_fun_time(func):
    def wrapper(*args, **kwargs):
        start_time = time.time()
        res = func(*args, **kwargs)
        print(f"The function ran for {time.time() - start_time:.2f}s")
        return res

    return wrapper


def my_function(time_wait: int = 3):
    time.sleep(time_wait)
    print("done")


@count_fun_time
def your_function(time_wait: int = 3):
    time.sleep(time_wait)
    print("done")


my_function = count_fun_time(my_function)
my_function()
my_function(4)

your_function()
your_function(4)

--------------------------------------------------------------------------------
/python_advance/requests请求重试/flask_server.py:
--------------------------------------------------------------------------------
from time import sleep

from flask import Flask, jsonify, Response

app: Flask = Flask(__name__)

retry_count: int = 0  # counts incoming requests so retries can be observed


@app.route("/api/retry", methods=["GET"])
def retry_api() -> Response:
    """
    Endpoint that delays its response by 1s for the first two requests
    (response time > 1s); the third request is answered immediately.
    :return:
    """
    global retry_count
    retry_count += 1
    print(f"This is request number {retry_count}")
    if retry_count < 3:
        sleep(1)
    else:
        retry_count = 0  # reset the counter
    return jsonify({"msg": "That makes three!"})

# @app.route("/")


if __name__ == '__main__':
    # port 5000 matches the URL used by the client scripts in this directory
    app.run(host="0.0.0.0", port=5000)

--------------------------------------------------------------------------------
/python_advance/requests请求重试/normal.py:
--------------------------------------------------------------------------------
from typing import Dict, Any

import requests

BaseDictData = Dict[str, Any]


def get_data(url: str, max_retry: int = 0, time_out: float = 3., **kwargs) -> BaseDictData:
    """Fetch JSON from a URL, automatically retrying on timeout errors."""
    params: BaseDictData = kwargs.get("params", {})  # whatever else was passed in, only this key is used
    headers: BaseDictData = kwargs.get("headers", {})  # same as above
    for i in range(max_retry + 1):
        # one initial attempt plus up to max_retry retries
        try:
            response: requests.Response = requests.get(
                url=url,
                params=params,
                headers=headers,
                timeout=time_out,
            )
        except requests.Timeout:  # base class of both ConnectTimeout and ReadTimeout
            print(f"Request {i + 1} failed, retrying.")
        else:
            return response.json()  # no error, return right away

    print(f"All {max_retry + 1} requests failed; returning an empty value so later logic can handle it.")
    return {}


if __name__ == '__main__':
    print(get_data("http://localhost:5000/api/retry", max_retry=1, time_out=.01))

--------------------------------------------------------------------------------
/python_advance/requests请求重试/requests_built.py:
--------------------------------------------------------------------------------
import requests
from requests.adapters import HTTPAdapter

from normal import BaseDictData


def get_data(url: str, max_retry: int = 0, time_out: float = 1., **kwargs) -> BaseDictData:
    """
    Automatically retries on timeout errors, using the retry support built into requests.
    :param url: the URL to request
    :param max_retry: maximum number of retries
    :param time_out: timeout in seconds
    :param kwargs: optional keyword arguments
    :return: BaseDictData
    """
    session: requests.Session = kwargs.get("session", requests.Session())  # use the given session or create a new one
    params: BaseDictData = kwargs.get("params", {})  # whatever else was passed in, only this key is used
    headers: BaseDictData = kwargs.get("headers", {})  # same as above
    adapter: HTTPAdapter = HTTPAdapter(max_retries=max_retry)  # the adapter handles the retries for us
    # mount the adapter on the session; the first argument is a URL prefix,
    # so only requests whose URL starts with it go through this adapter
    session.mount("http://127.0.0.1", adapter=adapter)
    try:
        response: requests.Response = session.get(
            url,
            params=params,
            headers=headers,
            timeout=time_out
        )
    except requests.ConnectionError:
        print(f"All {max_retry + 1} requests failed; returning an empty value.")
    else:
        session.close()  # close the session; in the source this mainly closes every mounted adapter
        return response.json()
    return {}


if __name__ == '__main__':
    res = get_data("http://127.0.0.1:5000/api/retry", 1)
    print(res)

--------------------------------------------------------------------------------
/python_advance/使用python客户端和服务器的功能测试实例/client.py:
--------------------------------------------------------------------------------
import typing
from urllib import parse

import requests


MyResponse = typing.Dict[str, typing.List[str]]


class MySpider:
    """Builds product URLs from the ids returned by the server."""
    def __init__(self, url="http://example.com") -> None:
        self.url: str = url
        self.return_base_url: str = "http://shop.com/id/"

    def get_data(self):
        try:
            response: requests.Response = requests.get(self.url)
        except requests.exceptions.ConnectionError:
            # the connection failed or timed out
            return self._handle_data()
        else:
            response.raise_for_status()  # raises HTTPError for 4xx/5xx responses
            data = response.json()
            return self._handle_data(data)

    def _handle_data(self, data=None) -> MyResponse:
        """Process the response data."""
        if data:
            return_data: typing.List[str] = []
            all_id = data.get("all_id", [])
            for goods_id in all_id:
                return_data.append(parse.urljoin(self.return_base_url, goods_id))
            return {"data": return_data}
        return {"data": []}

--------------------------------------------------------------------------------
/python_advance/使用python客户端和服务器的功能测试实例/flask_server.py:
--------------------------------------------------------------------------------
from flask import Flask, jsonify

app = Flask(__name__)


@app.route('/api', methods=["GET"])
def msg_api():
    """Plain response."""
    return jsonify({'Hello': 'World!'})


# the <int:goods_id> converter supplies the goods_id argument of the view
@app.route('/goods/<int:goods_id>', methods=["GET"])
def query_goods(goods_id):
    """Route with an id."""
    return jsonify({"name": "cake", "id": goods_id})


@app.errorhandler(404)
def error_404_handing(error):
    """404 page."""
    return jsonify({"msg": "no route", "err": str(error)}), 404


if __name__ == '__main__':
    app.run()

--------------------------------------------------------------------------------
/python_advance/使用python客户端和服务器的功能测试实例/starletee_server.py:
--------------------------------------------------------------------------------

import uvicorn
from starlette.applications import Starlette
from starlette.responses import JSONResponse, FileResponse
from starlette.routing import Router, Mount
from starlette.staticfiles import StaticFiles


# app = Router(routes=[
#     Mount('/static', app=StaticFiles(directory='static'), name="static"),
# ])
app = Starlette()


@app.route('/api', methods=["GET"])
async def hello_api(request) -> JSONResponse:
    """Plain response."""
    return JSONResponse({'Hello': 'World!'})


@app.route('/goods/{goods_id:int}', methods=["GET"])
async def query_goods(request) -> JSONResponse:
    """Route with an id."""
    return JSONResponse({"name": "cake", "id": request.path_params.get("goods_id")})


@app.exception_handler(404)
async def not_found(request, exc) -> JSONResponse:
    """404 handler."""
    return JSONResponse(content={"msg": "no route"}, status_code=exc.status_code)


@app.route("/static/{path:str}")
def static_files(request):
    s = request.path_params.get("path")
    return FileResponse(path="static/" + s)


if __name__ == '__main__':

    uvicorn.run(app)

--------------------------------------------------------------------------------
/python_advance/使用python客户端和服务器的功能测试实例/test_client.py:
--------------------------------------------------------------------------------
import unittest
from unittest import mock

import requests
import requests_mock

from client import MySpider, MyResponse


class TestMySpider(unittest.TestCase):
    def setUp(self) -> None:
        self.spider: MySpider = MySpider()  # runs first, before each test_xxxxxx method

    def test_handle_data(self) -> None:
        """Test the data-handling logic."""
        return_data: MyResponse = {"data": []}  # the base data returned
        self.assertEqual(self.spider._handle_data(), return_data)  # called with None: check equality

        return_data.update(data=[
            "http://shop.com/id/23",
            "http://shop.com/id/32"
        ])  # build the expected value for normal input
        # called with normal input: check equality
        self.assertEqual(self.spider._handle_data({"all_id": ["23", "32"]}), return_data)

    @requests_mock.mock()
    def test_get_data(self, mocker) -> None:
        """Test the normal path."""
        shop_data: MyResponse = {"all_id": ["12", "123", "1234"]}
        mocker.get(requests_mock.ANY, json=shop_data)  # intercept requests.get

        spider_data: MyResponse = self.spider.get_data()  # get the normal return value
        response_data: MyResponse = {'data': ['http://shop.com/id/12', 'http://shop.com/id/123', 'http://shop.com/id/1234']}
        self.assertEqual(spider_data, response_data)  # check equality

        shop_data = {}
        mocker.get(requests_mock.ANY, json=shop_data)  # intercept requests.get
        spider_data = self.spider.get_data()
# 获取空返回值 39 | response_data: MyResponse = {'data': []} 40 | self.assertEqual(spider_data, response_data) # 比较是否相等 41 | 42 | @mock.patch.object(requests, "get", side_effect=requests.ConnectionError("No network")) 43 | def test_net_error(self, mocked) -> None: 44 | return_data: MyResponse = {"data": []} 45 | spider_data: MyResponse = self.spider.get_data() # 获取网络错误的返回值 46 | self.assertEqual(spider_data, return_data) 47 | 48 | 49 | if __name__ == '__main__': 50 | unittest.main() 51 | -------------------------------------------------------------------------------- /python_advance/使用python客户端和服务器的功能测试实例/test_flask_client.py: -------------------------------------------------------------------------------- 1 | import json 2 | import typing 3 | import unittest 4 | 5 | from flask_basic import app as my_app 6 | 7 | 8 | class TestApp(unittest.TestCase): 9 | 10 | def setUp(self) -> None: 11 | self.client = my_app.test_client() # 初始化客户端,app 自带的测试客户端 12 | 13 | def test_msg_api(self) -> None: 14 | response = self.client.get("/api") # 访问路由 15 | data: typing.Dict[str, typing.Any] = json.loads(response.data.decode("u8")) # 响应数据格式化 16 | self.assertEqual(data["Hello"], "World!") # 判断结果 17 | 18 | def test_goods_api(self) -> None: 19 | response = self.client.get("/goods/123") # 访问路由 20 | data: typing.Dict[str, typing.Any] = json.loads(response.data.decode("u8")) # 响应数据格式化 21 | self.assertEqual(data["name"], "cake") # 判断结果 22 | self.assertEqual(data["id"], 123) # 判断结果 23 | 24 | def test_404_page(self) -> None: 25 | response = self.client.get("/idontknow") # 访问路由 26 | self.assertEqual(response.status, "404 NOT FOUND") # 404 状态监测 27 | data: typing.Dict[str, typing.Any] = json.loads(response.data.decode("u8")) # 响应数据格式化 28 | self.assertEqual(data["msg"], "no route") # 返回数据监测 29 | 30 | 31 | if __name__ == '__main__': 32 | unittest.main() 33 | -------------------------------------------------------------------------------- /python_advance/使用python客户端和服务器的功能测试实例/test_starletee_api.py: 
-------------------------------------------------------------------------------- 1 | import typing 2 | import unittest 3 | 4 | from starlette.testclient import TestClient 5 | 6 | from starletee_server import app as my_app 7 | 8 | 9 | class TestApp(unittest.TestCase): 10 | 11 | def setUp(self) -> None: 12 | self.client = TestClient(my_app) # 初始化客户端,app 自带的测试客户端 13 | 14 | def test_msg_api(self) -> None: 15 | response = self.client.get("/api") # 访问路由 16 | data: typing.Dict[str, typing.Any] = response.json() # 响应数据格式化 17 | self.assertEqual(data["Hello"], "World!") # 判断结果 18 | 19 | def test_goods_api(self) -> None: 20 | response = self.client.get("/goods/123") # 访问路由 21 | data: typing.Dict[str, typing.Any] = response.json() # 响应数据格式化 22 | self.assertEqual(data["name"], "cake") # 判断结果 23 | self.assertEqual(data["id"], 123) # 判断结果 24 | 25 | def test_404_page(self) -> None: 26 | response = self.client.get("/idontknow") # 访问路由 27 | self.assertEqual(response.status_code, 404) # 404 状态监测 28 | data: typing.Dict[str, typing.Any] = response.json() # 响应数据格式化 29 | self.assertEqual(data["msg"], "no route") # 返回数据监测 30 | 31 | 32 | if __name__ == '__main__': 33 | unittest.main() 34 | -------------------------------------------------------------------------------- /python_advance/在python脚本中运行脚本的几种方法/bash_out.txt: -------------------------------------------------------------------------------- 1 | bash run start 2 | total 56 3 | -rw-r--r--@ 1 dustyposa staff 12011 Oct 20 15:40 README.md 4 | -rw-r--r--@ 1 dustyposa staff 0 Oct 20 15:40 bash_out.txt 5 | -rw-r--r--@ 1 dustyposa staff 29 Oct 20 14:57 check_alive.py 6 | -rw-r--r--@ 1 dustyposa staff 630 Oct 17 21:56 restart.py 7 | -rw-r--r--@ 1 dustyposa staff 1002 Oct 20 15:39 run_bash.py 8 | -rw-r--r--@ 1 dustyposa staff 67 Oct 20 15:14 test.sh 9 | sleep over 10 | -------------------------------------------------------------------------------- /python_advance/在python脚本中运行脚本的几种方法/check_alive.py: 
-------------------------------------------------------------------------------- 1 | import aiohttp 2 | import asyncio 3 | 4 | 5 | async def fetch(session, url): 6 | async with session.get(url) as response: 7 | return await response.text() 8 | 9 | 10 | async def main(): 11 | async with aiohttp.ClientSession() as session: 12 | html = await fetch(session, 'http://python.org') 13 | print(html) 14 | 15 | 16 | if __name__ == '__main__': 17 | asyncio.run(main()) 18 | -------------------------------------------------------------------------------- /python_advance/在python脚本中运行脚本的几种方法/restart.py: -------------------------------------------------------------------------------- 1 | import os 2 | import sys 3 | import time 4 | 5 | def run(select_data: str) -> None: 6 | if select_data == "a": 7 | print("程序休眠1s") 8 | time.sleep(1) 9 | elif select_data == "b": 10 | print("程序即将重启") 11 | os.execv(sys.executable, ["python3"] + sys.argv) # 或者 ["python3", __file__] 12 | elif select_data == "c": 13 | print("程序即将退出") 14 | sys.exit(0) 15 | 16 | 17 | if __name__ == '__main__': 18 | print("程序启动了!") 19 | print("请选择功能:", "A. sleep 1 s", "B. 重启程序", "C. 
结束程序", sep="\n") 20 | while True: 21 | select = input("请选择:").lower() 22 | run(select) 23 | -------------------------------------------------------------------------------- /python_advance/在python脚本中运行脚本的几种方法/run_bash.py: -------------------------------------------------------------------------------- 1 | import os 2 | import subprocess 3 | import time 4 | 5 | bash_cmd = "zsh test.sh" 6 | test_bash_file = "test.sh" 7 | zsh_file = "/bin/zsh" 8 | bash_cmd_list = ["bash", "test.sh"] 9 | python_cmd_list = ["python3", "check_alive.py"] 10 | 11 | 12 | def system_run() -> None: 13 | """os.system 运行""" 14 | print("os.system start!") 15 | os.system(bash_cmd) 16 | 17 | 18 | def os_popen_run() -> None: 19 | """使用os.popen 运行子进程""" 20 | print("Start") 21 | with os.popen(" ".join(python_cmd_list)) as pipe: 22 | for line in pipe.readlines(): 23 | print(line, end="") 24 | 25 | """ 26 | with os.popen(bash_cmd) as pipe, open("bash_out.txt", "w", encoding="u8") as fp: 27 | for line in pipe.readlines(): 28 | print(line, end="", file=fp) 29 | """ 30 | 31 | 32 | def subprocess_run() -> None: 33 | proc = subprocess.Popen( 34 | bash_cmd_list, 35 | stdout=subprocess.PIPE, 36 | ) 37 | print(proc.returncode) 38 | print(proc.poll()) 39 | print(proc.returncode) 40 | proc.stdout.close() 41 | print("*" * 50) 42 | print(proc.wait()) 43 | 44 | 45 | def os_exec_run() -> None: 46 | """替代当前进程的运行""" 47 | print("python 正在运行") 48 | time.sleep(5) 49 | print("python 运行完毕,执行 bash 脚本") 50 | os.execv(zsh_file, bash_cmd_list) # == os.execl(zsh_file, *bash_cmd_list) 51 | 52 | 53 | if __name__ == '__main__': 54 | system_run() 55 | os_exec_run() 56 | os_popen_run() 57 | subprocess_run() 58 | -------------------------------------------------------------------------------- /python_advance/在python脚本中运行脚本的几种方法/test.sh: -------------------------------------------------------------------------------- 1 | # #!/bin/zsh 2 | echo "bash run start" 3 | ls -l 4 | sleep 5 5 | echo "sleep over" 6 | 
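A possible companion to `run_bash.py` above: the sketch below uses `subprocess.run` to wait for a child process and collect its exit code and combined output in one call, something `os.system` and `os.popen` do not offer directly. The helper name `run_and_capture` and the 10-second default timeout are assumptions made for illustration and are not part of the repository. The child command uses `sys.executable` so the example does not depend on `test.sh`.

```python
import subprocess
import sys
from typing import List, Tuple


def run_and_capture(cmd: List[str], timeout: float = 10.0) -> Tuple[int, str]:
    """Run cmd, wait for it to finish, and return (exit code, output text).

    Hypothetical helper for illustration only.
    """
    completed = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,      # capture standard output
        stderr=subprocess.STDOUT,    # merge standard error into the same stream
        text=True,                   # decode bytes to str (Python 3.7+)
        timeout=timeout,             # raise subprocess.TimeoutExpired if exceeded
        check=False,                 # do not raise on a non-zero exit code
    )
    return completed.returncode, completed.stdout


if __name__ == "__main__":
    code, output = run_and_capture([sys.executable, "-c", "print('child says hi')"])
    print(f"exit code: {code}")
    print(output, end="")
```

Passing the command as a list avoids going through a shell, so arguments containing spaces need no extra quoting. To run the repository's script instead, the same helper could be called with `["bash", "test.sh"]`, assuming `bash` is on the `PATH`.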
-------------------------------------------------------------------------------- /python_advance/在浏览器中运行python的几种主流方式/README.md: -------------------------------------------------------------------------------- 1 | > 参考资料 2 | > https://pythontips.com/2019/05/22/running-python-in-the-browser/ 3 | > http://stromberg.dnsalias.org/~strombrg/pybrowser/python-browser.html 4 | 5 | -------------------------------------------------------------------------------- /python_advance/翻译计划/依赖注入/old_WAY.py: -------------------------------------------------------------------------------- 1 | class Mailer: 2 | def send(self, email: str): 3 | print(f"Sending mail to {email}") 4 | 5 | 6 | class RegisterService: 7 | def __init__(self): 8 | self.mailer = Mailer() 9 | 10 | def register(self, email: str): 11 | print(f"Registering user {email}") 12 | self.mailer.send(email) 13 | 14 | 15 | register_service = RegisterService() 16 | register_service.register("petru@pepy.tech") 17 | 18 | import pinject 19 | 20 | 21 | class Mailer: 22 | def send(self, email: str): 23 | print(f"Sending mail to {email}") 24 | 25 | 26 | class RegisterService: 27 | def __init__(self, mailer: Mailer): 28 | self.mailer = mailer 29 | 30 | def register(self, email: str): 31 | print(f"Registering user {email}") 32 | self.mailer.send(email) 33 | 34 | 35 | obj_graph = pinject.new_object_graph() 36 | register_service = obj_graph.provide(RegisterService) 37 | register_service.register("petru@pepy.tech") 38 | -------------------------------------------------------------------------------- /python_advance/翻译计划/异步爬虫/src/asyncio_crawler.py: -------------------------------------------------------------------------------- 1 | try: 2 | from asyncio import JoinableQueue as Queue, CancelledError 3 | except ImportError: 4 | # 在 Python 3.5,asyncio.JoinableQueue 并入到了 Queue 5 | from asyncio import Queue, CancelledError 6 | import asyncio 7 | 8 | import aiohttp 9 | 10 | loop = asyncio.get_event_loop() 11 | 12 | 13 | class Crawler: 14 | def 
__init__(self, root_url: str, max_redirect: int): 15 | self.max_tasks = 10 16 | self.max_redirect = max_redirect 17 | self.q = Queue() 18 | self.seen_urls = set() 19 | 20 | # aiohttp's ClientSession does connection pooling and HTTP keep-alives for us 21 | self.session = aiohttp.ClientSession(loop=loop) 22 | 23 | # Put (URL, max_redirect) in the queue; Queue.put is a coroutine, so use put_nowait here 24 | self.q.put_nowait((root_url, self.max_redirect)) 25 | 26 | @asyncio.coroutine 27 | def crawl(self): 28 | """Run the crawler until all work is done""" 29 | workers = [asyncio.Task(self.work()) 30 | for _ in range(self.max_tasks)] 31 | 32 | # When all work is done, exit 33 | yield from self.q.join() 34 | for w in workers: 35 | w.cancel() 36 | 37 | @asyncio.coroutine 38 | def work(self): 39 | while True: 40 | url, max_redirect = yield from self.q.get() 41 | 42 | # Download the page and add new links to self.q 43 | yield from self.fetch(url, max_redirect) 44 | self.q.task_done() 45 | 46 | @asyncio.coroutine 47 | def fetch(self, url: str, max_redirect: int): 48 | # Handle redirects ourselves 49 | response = yield from self.session.get( 50 | url, allow_redirects=False 51 | ) 52 | 53 | try: 54 | if is_redirect(response): 55 | if max_redirect > 0: 56 | next_url = response.headers['location'] 57 | if next_url in self.seen_urls: 58 | # We have already been down this path 59 | return 60 | 61 | # Remember that we have seen this URL 62 | self.seen_urls.add(next_url) 63 | 64 | # Follow the redirect, with one fewer redirect remaining 65 | self.q.put_nowait((next_url, max_redirect - 1)) 66 | else: 67 | links = yield from self.parse_links(response) 68 | # Python set logic 69 | for link in links.difference(self.seen_urls): 70 | self.q.put_nowait((link, self.max_redirect)) 71 | self.seen_urls.update(links) 72 | finally: 73 | # Return the connection to the pool 74 | yield from response.release() 75 | 76 | 77 | class Task: 78 | def __init__(self, coro): 79 | self.coro = coro 80 | f = Future() 81 | f.set_result(None) 82 | self.step(f) 83 | 84 | def step(self, future: Future) -> None: 85 | try: 86 | next_future = self.coro.send(future.result) 87 | except CancelledError: 88 | self.cancelled = True 89 | return 90 | except StopIteration as exc: 91 | 92 | # Task 用 
coro's 返回值 resolves 自己 93 | self.set_result(exc.value) 94 | return 95 | 96 | next_future.add_done_callback(self.step) 97 | 98 | def cancel(self): 99 | self.coro.throw(CancelledError) 100 | 101 | def __str__(self): 102 | return self.__class__.__name__ 103 | 104 | 105 | crawler = crawling.Crawler('http://xkcd.com', 106 | max_redirect=10) 107 | 108 | loop.run_until_complete(crawler.crawl()) 109 | -------------------------------------------------------------------------------- /python_advance/翻译计划/异步爬虫/src/context_data.py: -------------------------------------------------------------------------------- 1 | import dis 2 | 3 | 4 | def foo(): 5 | bar() 6 | 7 | 8 | def bar(): 9 | pass 10 | 11 | 12 | dis.dis(foo) 13 | -------------------------------------------------------------------------------- /python_advance/翻译计划/异步爬虫/src/flask_server.py: -------------------------------------------------------------------------------- 1 | from flask import Flask, jsonify, request 2 | from flask_cors import CORS 3 | app = Flask(__name__) 4 | CORS(app) 5 | 6 | 7 | @app.route('/api', methods=["GET"]) 8 | def msg_api(): 9 | """Plain JSON response""" 10 | return jsonify({'Hello': 'World!'}) 11 | 12 | 13 | @app.route('/goods/<int:goods_id>', methods=["GET"]) 14 | def query_goods(goods_id): 15 | """Route that takes an id""" 16 | return jsonify({"name": "cake", "id": goods_id}) 17 | 18 | 19 | @app.route('/po', methods=["POST"]) 20 | def po(): 21 | # json_data = request.form 22 | filename = request.form.get('name') 23 | request.files.get('data').save(filename) 24 | return jsonify({'stats': 'ok'}) 25 | 26 | 27 | @app.errorhandler(404) 28 | def error_404_handing(error): 29 | """404 page""" 30 | return jsonify({"msg": "no route", "err": str(error)}), 404 31 | 32 | 33 | if __name__ == '__main__': 34 | app.run(port=80) 35 | -------------------------------------------------------------------------------- /python_advance/翻译计划/异步爬虫/src/generate_fn.py: -------------------------------------------------------------------------------- 1 | def gen_fn(): 
2 | result = yield 1 3 | print(f'result of yield : {result}') 4 | result2 = yield 2 5 | print(f'result of 2nd yield : {result2}') 6 | return 'done' 7 | 8 | 9 | if __name__ == '__main__': 10 | gen = gen_fn() 11 | print(gen.send(None)) 12 | print(gen.send("first yield")) 13 | print(gen.send("2nd yield")) 14 | 15 | 16 | 17 | 18 | -------------------------------------------------------------------------------- /python_advance/翻译计划/异步爬虫/src/local_server.py: -------------------------------------------------------------------------------- 1 | import selectors 2 | import socket 3 | 4 | sel = selectors.DefaultSelector() 5 | 6 | 7 | def accept(sock, mask): 8 | conn, addr = sock.accept() # Should be ready 9 | print('accepted', conn, 'from', addr) 10 | conn.setblocking(False) 11 | sel.register(conn, selectors.EVENT_READ, read) 12 | 13 | 14 | def read(conn, mask): 15 | data = conn.recv(1000) # Should be ready 16 | if data: 17 | print('echoing', repr(data), 'to', conn) 18 | conn.send(data) # Hope it won't block 19 | else: 20 | print('closing', conn) 21 | sel.unregister(conn) 22 | conn.close() 23 | 24 | 25 | sock = socket.socket() 26 | sock.bind(('', 80)) 27 | sock.listen(100) 28 | sock.setblocking(False) 29 | sel.register(sock, selectors.EVENT_READ, accept) 30 | 31 | while True: 32 | events = sel.select() 33 | for key, mask in events: 34 | callback = key.data 35 | callback(key.fileobj, mask) 36 | -------------------------------------------------------------------------------- /python_advance/翻译计划/异步爬虫/src/queue_code.py: -------------------------------------------------------------------------------- 1 | import asyncio 2 | 3 | 4 | class Queue: 5 | def __init__(self): 6 | self._join_future = Future() 7 | self._unfinished_tasks = 0 8 | # ... 其他的初始条件 9 | 10 | def put_nowait(self, item): 11 | self._unfinished_tasks += 1 12 | # ... 
save item 13 | 14 | def task_done(self): 15 | self._unfinished_tasks -= 1 16 | if self._unfinished_tasks == 0: 17 | self._join_future.set_result(None) 18 | 19 | @asyncio.coroutine 20 | def join(self): 21 | if self._unfinished_tasks > 0: 22 | yield from self._join_future 23 | 24 | 25 | class EventLoop: 26 | def run_until_complete(self, coro): 27 | """Run until the coroutine is done""" 28 | task = Task(coro) 29 | task.add_done_callback(stop_callback) 30 | try: 31 | self.run_forever() 32 | except StopError: 33 | pass 34 | 35 | 36 | class StopError(BaseException): 37 | """Raised to stop the event loop""" 38 | 39 | 40 | def stop_callback(future): 41 | raise StopError 42 | 43 | -------------------------------------------------------------------------------- /python_advance/翻译计划/异步爬虫/src/sub_yield_from_task.py: -------------------------------------------------------------------------------- 1 | import socket 2 | from typing import Callable, Generator 3 | from selectors import DefaultSelector, EVENT_WRITE, EVENT_READ 4 | 5 | selector = DefaultSelector() # create the selector object 6 | 7 | stoped = False 8 | 9 | 10 | class Future: 11 | def __init__(self): 12 | self.result = None 13 | self._callbacks = [] 14 | 15 | def add_done_callback(self, fn: Callable) -> None: 16 | self._callbacks.append(fn) 17 | 18 | def set_result(self, result) -> None: 19 | self.result = result 20 | for fn in self._callbacks: 21 | fn(self) 22 | 23 | 24 | class Task: 25 | def __init__(self, coro): 26 | self.coro = coro 27 | f = Future() 28 | f.set_result(None) 29 | self.step(f) 30 | 31 | def step(self, future: Future) -> None: 32 | try: 33 | next_future = self.coro.send(future.result) 34 | except StopIteration: 35 | return 36 | 37 | next_future.add_done_callback(self.step) 38 | 39 | def __str__(self): 40 | return self.__class__.__name__ 41 | 42 | 43 | class Fetcher: 44 | def __init__(self, url: str) -> None: 45 | self.response = b"" 46 | self.url = url 47 | self.sock = None 48 | 49 | def fetch(self) -> Generator: 50 | self.sock = socket.socket() 51 | 
self.sock.setblocking(False) 52 | try: 53 | self.sock.connect(('localhost', 80)) 54 | except BlockingIOError: 55 | pass 56 | f = Future() 57 | 58 | def on_connected(): 59 | f.set_result(None) 60 | 61 | selector.register( 62 | self.sock.fileno(), 63 | EVENT_WRITE, 64 | on_connected 65 | ) 66 | yield f 67 | print('connected') 68 | selector.unregister(self.sock.fileno()) 69 | print('send data') 70 | request = f'GET {self.url} HTTP/1.1\r\nHost: httpbin.org\r\n\r\n' # 构建请求头 71 | # ... 连接逻辑同上,然后: 72 | self.sock.send(request.encode('ascii')) 73 | self.response = yield from read_all(self.sock) 74 | print('response:', self.response) 75 | 76 | def __str__(self): 77 | return self.__class__.__name__ 78 | 79 | 80 | def read(sock: socket.socket): 81 | f = Future() 82 | 83 | def on_readable(): 84 | f.set_result(sock.recv(4096)) 85 | 86 | selector.register( 87 | sock.fileno(), 88 | EVENT_READ, 89 | on_readable 90 | ) 91 | chunk = yield f # 读一个chunk 92 | selector.unregister(sock.fileno()) 93 | return chunk 94 | 95 | 96 | def read_all(sock: socket.socket): 97 | response = [] 98 | # 读取所有消息 99 | chunk = yield from read(sock) 100 | while chunk: 101 | response.append(chunk) 102 | chunk = yield from read(sock) 103 | 104 | return b''.join(response) 105 | 106 | 107 | def loop() -> None: 108 | while not stoped: 109 | try: 110 | events = selector.select() 111 | except OSError: 112 | # 当 select 中没有东西时,len == 0, 这里会报错 113 | print(len(selector.get_map().items()) == 0) 114 | break 115 | else: 116 | for event_key, event_mask in events: 117 | callback = event_key.data 118 | callback() 119 | 120 | 121 | if __name__ == '__main__': 122 | # 开始抓取 http://xkcd.com/353 123 | fetcher = Fetcher('/get') 124 | Task(fetcher.fetch()) 125 | 126 | loop() 127 | -------------------------------------------------------------------------------- /python_advance/翻译计划/异步爬虫/src/sub_yield_from_task2.py: -------------------------------------------------------------------------------- 1 | import socket 2 | from typing import 
Callable, Generator 3 | from selectors import DefaultSelector, EVENT_WRITE, EVENT_READ 4 | 5 | selector = DefaultSelector() # 创建选择器对象 6 | 7 | stoped = False 8 | 9 | 10 | class Future: 11 | def __init__(self): 12 | self.result = None 13 | self._callbacks = [] 14 | 15 | def add_done_callback(self, fn: Callable) -> None: 16 | self._callbacks.append(fn) 17 | 18 | def set_result(self, result) -> None: 19 | self.result = result 20 | for fn in self._callbacks: 21 | fn(self) 22 | 23 | # Future 类的方法 24 | def __iter__(self): 25 | # 告诉 Task 在这里继续 26 | yield self 27 | return self.result 28 | 29 | 30 | class Task: 31 | def __init__(self, coro): 32 | self.coro = coro 33 | f = Future() 34 | f.set_result(None) 35 | self.step(f) 36 | 37 | def step(self, future: Future) -> None: 38 | try: 39 | next_future = self.coro.send(future.result) 40 | except StopIteration: 41 | return 42 | 43 | next_future.add_done_callback(self.step) 44 | 45 | def __str__(self): 46 | return self.__class__.__name__ 47 | 48 | 49 | class Fetcher: 50 | def __init__(self, url: str) -> None: 51 | self.response = b"" 52 | self.url = url 53 | self.sock = None 54 | 55 | def fetch(self) -> Generator: 56 | self.sock = socket.socket() 57 | self.sock.setblocking(False) 58 | try: 59 | self.sock.connect(('localhost', 80)) 60 | except BlockingIOError: 61 | pass 62 | f = Future() 63 | 64 | def on_connected(): 65 | f.set_result(None) 66 | 67 | selector.register( 68 | self.sock.fileno(), 69 | EVENT_WRITE, 70 | on_connected 71 | ) 72 | yield f 73 | print('connected') 74 | selector.unregister(self.sock.fileno()) 75 | print('send data') 76 | request = f'GET {self.url} HTTP/1.1\r\nHost: httpbin.org\r\n\r\n' # 构建请求头 77 | # ... 
连接逻辑同上,然后: 78 | self.sock.send(request.encode('ascii')) 79 | self.response = yield from read_all(self.sock) 80 | print('response:', self.response) 81 | 82 | def __str__(self): 83 | return self.__class__.__name__ 84 | 85 | 86 | def read(sock: socket.socket): 87 | f = Future() 88 | 89 | def on_readable(): 90 | f.set_result(sock.recv(4096)) 91 | 92 | selector.register( 93 | sock.fileno(), 94 | EVENT_READ, 95 | on_readable 96 | ) 97 | chunk = yield f # 读一个chunk 98 | selector.unregister(sock.fileno()) 99 | return chunk 100 | 101 | 102 | def read_all(sock: socket.socket): 103 | response = [] 104 | # 读取所有消息 105 | chunk = yield from read(sock) 106 | while chunk: 107 | response.append(chunk) 108 | chunk = yield from read(sock) 109 | 110 | return b''.join(response) 111 | 112 | 113 | def loop() -> None: 114 | while not stoped: 115 | try: 116 | events = selector.select() 117 | except OSError: 118 | # 当 select 中没有东西时,len == 0, 这里会报错 119 | print(len(selector.get_map().items()) == 0) 120 | break 121 | else: 122 | for event_key, event_mask in events: 123 | callback = event_key.data 124 | callback() 125 | 126 | 127 | if __name__ == '__main__': 128 | # 开始抓取 http://xkcd.com/353 129 | fetcher = Fetcher('/get') 130 | Task(fetcher.fetch()) 131 | 132 | loop() 133 | -------------------------------------------------------------------------------- /python_advance/翻译计划/异步爬虫/src/task.py: -------------------------------------------------------------------------------- 1 | import socket 2 | from typing import Callable, Generator 3 | from selectors import DefaultSelector, EVENT_WRITE 4 | 5 | selector = DefaultSelector() # 创建选择器对象 6 | 7 | 8 | class Future: 9 | def __init__(self): 10 | self.result = None 11 | self._callbacks = [] 12 | 13 | def add_done_callback(self, fn: Callable) -> None: 14 | self._callbacks.append(fn) 15 | 16 | def set_result(self, result) -> None: 17 | self.result = result 18 | for fn in self._callbacks: 19 | fn(self) 20 | 21 | 22 | class Task: 23 | def __init__(self, coro): 
24 | self.coro = coro 25 | f = Future() 26 | f.set_result(None) 27 | self.step(f) 28 | 29 | def step(self, future: Future) -> None: 30 | try: 31 | next_future = self.coro.send(future.result) 32 | except StopIteration: 33 | return 34 | 35 | next_future.add_done_callback(self.step) 36 | 37 | 38 | class Fetcher: 39 | def __init__(self, url: str) -> None: 40 | self.response = b"" 41 | self.url = url 42 | self.sock = None 43 | 44 | def fetch(self) -> Generator: 45 | self.sock = socket.socket() 46 | self.sock.setblocking(False) 47 | try: 48 | self.sock.connect(('baidu.com', 80)) 49 | except BlockingIOError: 50 | pass 51 | f = Future() 52 | 53 | def on_connected(): 54 | f.set_result(None) 55 | 56 | selector.register( 57 | self.sock.fileno(), 58 | EVENT_WRITE, 59 | on_connected 60 | ) 61 | yield f 62 | selector.unregister(self.sock.fileno()) 63 | 64 | 65 | stoped = False 66 | 67 | 68 | def loop() -> None: 69 | while not stoped: 70 | events = selector.select() 71 | for event_key, event_mask in events: 72 | callback = event_key.data 73 | callback() 74 | 75 | 76 | if __name__ == '__main__': 77 | # 开始抓取 http://xkcd.com/353 78 | fetcher = Fetcher('/353/') 79 | Task(fetcher.fetch()) 80 | 81 | loop() 82 | -------------------------------------------------------------------------------- /python_advance/翻译计划/异步爬虫/src/yield_example.py: -------------------------------------------------------------------------------- 1 | import typing 2 | import time 3 | 4 | 5 | def a_yield() -> typing.Generator: 6 | print('start generator!') 7 | res = 'first step' 8 | print(res) 9 | send_data = yield res.split()[0] # first stop ='s right 10 | res = 'second step to send' 11 | print(f'res: {res}, send data: {send_data}') 12 | 13 | send_data = yield res.split()[0] # second stop ='s right 14 | res = 'third step' 15 | print(f'res: {res}, send data: {send_data}') 16 | 17 | yield res.split()[0] # third stop ='s right 18 | res = 'last step' 19 | print(res) 20 | return 'generator over!' 
21 | 22 | 23 | def print_yield_data(data: str): 24 | print(f'from generator get data:\n{data}') 25 | print('-' * 50) 26 | time.sleep(3) 27 | 28 | 29 | if __name__ == '__main__': 30 | test_generator_function = a_yield() 31 | yield_data = test_generator_function.send(None) # 第一次必须发送 None 32 | # # yield_data = test_generator_function.send(1) # 第一次必须发送 None 33 | print_yield_data(yield_data) 34 | yield_data = test_generator_function.send('hi generator!') # resume generator 35 | print_yield_data(yield_data) 36 | yield_data = next(test_generator_function) # resume generator 37 | print_yield_data(yield_data) 38 | try: 39 | next(test_generator_function) # resume generator 40 | except StopIteration as e: 41 | print("return data:", e, sep='\n') 42 | -------------------------------------------------------------------------------- /python_advance/翻译计划/异步爬虫/src/yield_task.py: -------------------------------------------------------------------------------- 1 | import socket 2 | from typing import Callable, Generator 3 | from selectors import DefaultSelector, EVENT_WRITE, EVENT_READ 4 | 5 | selector = DefaultSelector() # 创建选择器对象 6 | 7 | stoped = False 8 | 9 | 10 | class Future: 11 | def __init__(self): 12 | self.result = None 13 | self._callbacks = [] 14 | 15 | def add_done_callback(self, fn: Callable) -> None: 16 | self._callbacks.append(fn) 17 | 18 | def set_result(self, result) -> None: 19 | self.result = result 20 | for fn in self._callbacks: 21 | fn(self) 22 | 23 | 24 | class Task: 25 | def __init__(self, coro): 26 | self.coro = coro 27 | f = Future() 28 | f.set_result(None) 29 | self.step(f) 30 | 31 | def step(self, future: Future) -> None: 32 | try: 33 | next_future = self.coro.send(future.result) 34 | except StopIteration: 35 | return 36 | 37 | next_future.add_done_callback(self.step) 38 | 39 | 40 | class Fetcher: 41 | def __init__(self, url: str) -> None: 42 | self.response = b"" 43 | self.url = url 44 | self.sock = None 45 | 46 | def fetch(self) -> Generator: 47 | 
self.sock = socket.socket() 48 | self.sock.setblocking(False) 49 | try: 50 | self.sock.connect(('localhost', 80)) 51 | except BlockingIOError: 52 | pass 53 | f = Future() 54 | 55 | def on_connected(): 56 | f.set_result(None) 57 | 58 | selector.register( 59 | self.sock.fileno(), 60 | EVENT_WRITE, 61 | on_connected 62 | ) 63 | yield f 64 | print('connected') 65 | selector.unregister(self.sock.fileno()) 66 | print('send data') 67 | request = f'GET {self.url} HTTP/1.1\r\nHost: httpbin.org\r\n\r\n' # 构建请求头 68 | # ... 连接逻辑同上,然后: 69 | self.sock.send(request.encode('ascii')) 70 | 71 | while True: 72 | f = Future() 73 | 74 | def on_readable(): 75 | f.set_result(self.sock.recv(4096)) 76 | 77 | selector.register( 78 | self.sock.fileno(), 79 | EVENT_READ, 80 | on_readable 81 | ) 82 | chunk = yield f 83 | selector.unregister(self.sock.fileno()) 84 | if chunk: 85 | print('收到数据') 86 | self.response += chunk 87 | else: 88 | # 响应读取完成 89 | break 90 | print('response:', self.response) 91 | 92 | 93 | def loop() -> None: 94 | while not stoped: 95 | try: 96 | events = selector.select() 97 | except OSError: 98 | # 当 select 中没有东西时,len == 0, 这里会报错 99 | print(len(selector.get_map().items()) == 0) 100 | break 101 | else: 102 | for event_key, event_mask in events: 103 | callback = event_key.data 104 | callback() 105 | 106 | 107 | if __name__ == '__main__': 108 | # 开始抓取 http://xkcd.com/353 109 | fetcher = Fetcher('/get') 110 | Task(fetcher.fetch()) 111 | 112 | loop() 113 | -------------------------------------------------------------------------------- /small_projects/README.md: -------------------------------------------------------------------------------- 1 | - [从视频中提取音频](./音视频分离) 2 | - [邮件发送及GUI](./email_sending) 3 | - [视频转换与剪切](./convert_video) 4 | - [pdf_to_ppt](./pdf_ppt_into_each_other) 5 | 6 | -------------------------------------------------------------------------------- /small_projects/convert_video/README.md: 
-------------------------------------------------------------------------------- 1 | ## install 2 | ffmpeg 3 | > python -m pip install ffmpeg moviepy 4 | 5 | ### usage 6 | > `convert_fly_to_mp4.py` 7 | > Point it at the directory containing the `flv` files; an `mp4s` directory is then created automatically and the converted files are stored in it. 8 | > Set the `flv` directory with `flv_path = "XXX"` 9 | > 10 | > 11 | >`video_clipping.py` 12 | > Cuts `mp4` videos; an `outputs` directory is created to hold the resulting clips. 13 | > By default it cuts every 10 minutes; videos shorter than 20 minutes are not cut. 14 | -------------------------------------------------------------------------------- /small_projects/convert_video/convert_fly_to_mp4.py: -------------------------------------------------------------------------------- 1 | import os 2 | 3 | from pathlib import Path 4 | 5 | flv_path = r"F:\迅雷下载\ai_v" 6 | tmp_fly = Path(flv_path) 7 | 8 | mp4_base_p = tmp_fly.joinpath("mp4s") # output directory 9 | mp4_base_p.mkdir(exist_ok=True) # create it 10 | cmd = 'ffmpeg -i "{flv_p}" -c:v libx264 -crf 19 -strict experimental "{mp4_p}"' # base conversion command; paths are quoted so spaces survive 11 | for flv_file in Path(flv_path).glob("*.flv"): 12 | flv_p = flv_file.absolute() 13 | mp4_p = mp4_base_p.joinpath(f"{flv_file.stem}.mp4").absolute() 14 | with os.popen(cmd.format(flv_p=flv_p.__str__(), mp4_p=mp4_p.__str__())) as pro: 15 | for i in pro.readlines(): 16 | print(i) 17 | -------------------------------------------------------------------------------- /small_projects/convert_video/video_clipping.py: -------------------------------------------------------------------------------- 1 | import subprocess 2 | from pathlib import Path 3 | 4 | from moviepy.video.io.ffmpeg_tools import ffmpeg_extract_subclip 5 | 6 | 7 | def get_length(filename) -> float: 8 | # get the total duration of the video 9 | result = subprocess.run(["ffprobe", "-v", "error", "-show_entries", 10 | "format=duration", "-of", 11 | "default=noprint_wrappers=1:nokey=1", filename], 12 | stdout=subprocess.PIPE, 13 | stderr=subprocess.STDOUT) 14 | return float(result.stdout) 15 | 16 | 17 | mp4_path = "." 
outputs_dir_name = "outputs"


def main() -> None:
    (Path(mp4_path) / outputs_dir_name).mkdir(exist_ok=True)  # create the output folder
    for file in Path(mp4_path).glob("*.mp4"):
        cut_video(str(file.absolute()))


def cut_video(file_name: str) -> None:
    video_path = file_name
    tmp = Path(file_name)
    # Template for the clip paths; "{}" is filled with the clip index
    output_path = str((tmp.parent / outputs_dir_name).joinpath(tmp.stem + "{}").with_suffix(".mp4").absolute())
    cut_time = 5  # start offset, in seconds
    total_duration = get_length(file_name) - 20
    splice_time = 10 * 60  # length of each clip, in seconds
    index = 1
    while (total_duration - cut_time) / splice_time > 2:
        target = output_path.format(index)
        ffmpeg_extract_subclip(
            video_path,
            cut_time,
            cut_time + splice_time,
            targetname=target
        )
        # Report the clip that was just written, before moving to the next index
        print(f"{target}文件生成成功。")
        cut_time += splice_time
        index += 1
    # The remainder goes into one final clip
    ffmpeg_extract_subclip(video_path, cut_time, total_duration, targetname=output_path.format(index))
    print(f"{output_path.format(index)}文件生成成功。")


if __name__ == '__main__':
    main()
--------------------------------------------------------------------------------
/small_projects/email_sending/README.md:
--------------------------------------------------------------------------------
## Packaging command
```shell script
pyinstaller -w email_gui.py
```
### Install pyinstaller
```shell script
python -m pip install pyinstaller
```
#### Issues
- unicode_error
  + `chcp 65001`

- Authentication error
  + Check that SMTP access (`smtp server`) is enabled for your account with the third-party mail provider.

- `.env` format (used by `dotenv`)
  + ```ENV_VARIABLE_NAME=value```
- Other packaging options
> `pyinstaller -wF email_gui.py` # plain run + GUI + single-file bundle
> `pyinstaller -wF sche_email_sending.py -n 邮件小助手` # scheduler run + GUI + single-file bundle
> `pyinstaller -w sche_email_sending.py -n 邮件小助手` # multi-file + GUI + scheduler bundle
- Related modules
  + [yagmail](https://github.com/kootenpv/yagmail): email sending
  + [PySimpleGUI](https://pysimplegui.readthedocs.io/): GUI
  + [sched](https://docs.python.org/zh-cn/3/library/sched.html?highlight=sched): scheduling module (standard library)
  + [dotenv](https://github.com/theskumar/python-dotenv): environment variable loading
- File formats
  + Sender account file (xx.csv)
    ```
    user1,pwd1
    user2,pwd2
    ```
  + Recipient address file (xx.csv)
    ```
    email_address1
    email_address2
    email_address3
    email_address4
    ```
--------------------------------------------------------------------------------
/small_projects/email_sending/email_gui.py:
--------------------------------------------------------------------------------
import os
import csv
import sys
import time
from itertools import chain
from typing import Any, Dict, List

import PySimpleGUI as sg

from send_email import send_emails

sg.theme('BluePurple')

user_input_key = '-User-'
pwd_input_key = '-PWD-'

layout = [
    [sg.Text('请输入用户名和密码:')],
    [sg.Text('用户名:', size=(14, 1)), sg.Input(key=user_input_key)],
    [sg.Text('密码:', size=(14, 1)), sg.Input(key=pwd_input_key, password_char="*")],
    [sg.Text('邮件标题:', size=(14, 1), justification="left"), sg.Input(key="subject")],
    [sg.Text('发送的内容:', size=(14, 3), justification="left"), sg.Multiline(size=(60, 6), key="contents")],

    [sg.Text('待发送的用户:', size=(14, 1)), sg.Input(disabled=True), sg.FileBrowse("浏览", size=(8, 1), key="files")],
    [sg.Text('', key="send_state", text_color="red", justification="center", size=(40, 1))],
    # [sg.ProgressBar(500, orientation='h', size=(80, 20), key='progbar', style='winnative', relief='52%')],
    [sg.Button('确认', key='submit'), sg.Button('退出', key='Exit')],
]
window = sg.Window('邮件发送', layout)


def check_data(values: Dict[str, Any]) -> bool:
    # Every field must be filled in before sending
    for key in [user_input_key, pwd_input_key, "files", "contents", "subject"]:
        if not values.get(key):
            return False
    return True


# Refuse to run if this script (or the packaged executable) was last modified more than two days ago
if time.time() - os.stat(sys.argv[0]).st_mtime > 2 * 60 * 60 * 24:
    sys.exit(1)


def get_send_list(path: str) -> List[str]:
    # Flatten every cell of the CSV file into a single list of addresses
    with open(path, encoding="u8") as csv_file:
        reader = csv.reader(csv_file)
        return list(chain.from_iterable(reader))


while True:
    event, values = window.read()
    # Check for exit first: `values` may be None once the window has been closed
    if event in (None, 'Exit'):
        break
    elif event == "submit":
        if check_data(values):
            send_lists = get_send_list(values["files"])
            window["send_state"].update("发送中......")
            send_emails(
                user=values[user_input_key],
                pwd=values[pwd_input_key],
                contents=[
                    values["contents"],
                    # values["files"],
                ],
                send_list=send_lists,
                subject=values["subject"]
            )
            window["send_state"].update("发送完毕")

    # for i in range(500):
    #     window['progbar'].update_bar(i + 1)
    # if event == 'Show':
    # Update the "output" text element to be the value of "input" element
    # window['-OUTPUT-'].update(values['-IN-'])

window.close()
--------------------------------------------------------------------------------
/small_projects/email_sending/gui.py:
--------------------------------------------------------------------------------
import PySimpleGUI as sg

sg.theme('Light Blue 2')

layout = [[sg.Text('Enter 2 files to compare')],
          [sg.Text('File 1', size=(8, 1)), sg.Input(), sg.FileBrowse()],
          [sg.Text('File 2', size=(8, 1)), sg.Input(), sg.FileBrowse()],
          [sg.Submit(), sg.Cancel()]]

window = sg.Window('File Compare', layout)

event, values = window.read()
window.close()
print(f'You clicked {event}')
print(f'You chose filenames {values[0]} and {values[1]}')

--------------------------------------------------------------------------------
/small_projects/email_sending/send_email.py: -------------------------------------------------------------------------------- 1 | import time 2 | from typing import List, Union 3 | 4 | import yagmail 5 | 6 | 7 | def send_emails( 8 | user: str, 9 | pwd: str, 10 | contents: List[str], 11 | send_list: Union[List[str], str], 12 | subject: str 13 | ) -> None: 14 | if "vip" in user and "163" in user: 15 | host = "smtp.vip.163.com" 16 | else: 17 | host = f"smtp.{user.split('@')[1]}" 18 | try: 19 | yag = yagmail.SMTP(user=user, password=pwd, host=host) 20 | yag.send(to=send_list, subject=subject, contents=list(filter(bool, contents))) 21 | except Exception: 22 | file_name = f"send_fail_{user}_{time.strftime('%Y%m%d')}.txt" 23 | else: 24 | file_name = f"sended_{time.strftime('%Y%m%d')}.txt" 25 | finally: 26 | with open(file_name, "a") as fp: 27 | if isinstance(send_list, list): 28 | fp.writelines([i + "\n" for i in send_list]) 29 | else: 30 | fp.write(send_list + "\n") 31 | 32 | 33 | if __name__ == '__main__': 34 | import os 35 | from dotenv import load_dotenv 36 | 37 | load_dotenv() 38 | # 链接邮箱服务器 39 | yag = yagmail.SMTP(user=os.getenv("USER_NAME"), password=os.getenv("USER_PWD"), host='smtp.163.com') 40 | 41 | # 邮箱正文 42 | contents = [ 43 | '这是一封正常邮件,需要测试', 44 | '这是一封正常邮件,需要测试1', 45 | '这是一封正常邮件,需要测试2', 46 | '这是一封正常邮件,需要测试3', 47 | '这是一封正常邮件,需要测试4', 48 | 'You can find an audio file attached.', 49 | yagmail.inline("Snipaste_2020-03-16_16-44-15.png"), 50 | # "./Snipaste_2020-03-09_23-25-23.png", 51 | ] 52 | 53 | # 发送邮件 54 | yag.send(os.getenv("SEND_EMAIL"), '重要消息cc', contents) 55 | print("发送成功") 56 | -------------------------------------------------------------------------------- /small_projects/email_sending/test/read_csv_test.py: -------------------------------------------------------------------------------- 1 | from sche_email_sending import get_user_list, pack_send_email_and_user 2 | 3 | 4 | # print(get_user_list("../u.csv")) 5 | # print(get_user_list("../user.csv")) 6 | 
s = pack_send_email_and_user("../user.csv", "users.csv")
print(list(s))
--------------------------------------------------------------------------------
/small_projects/email_sending/test/users.csv:
--------------------------------------------------------------------------------
aaa,ddd
ccc,eee
ff,ggg
--------------------------------------------------------------------------------
/small_projects/pdf_ppt_into_each_other/pdf_to_ppt.py:
--------------------------------------------------------------------------------
from pathlib import Path
from pdf2image import convert_from_path
from pptx import Presentation
from io import BytesIO

pdf_file = "123.pdf"
print()
print("Converting file: " + pdf_file)
print()

# Prep presentation
prs = Presentation()
blank_slide_layout = prs.slide_layouts[6]

# Derive the output base name from the PDF file name
base_name = pdf_file.split(".pdf")[0]

# Convert PDF to list of images
print("Starting conversion...")
print(f"pdf_file: {pdf_file}")
print(f"pdf_file: {Path(pdf_file).absolute()}")
# slideimgs = convert_from_path(pdf_file, 300, fmt='ppm', thread_count=2)
slideimgs = convert_from_path(Path(pdf_file).absolute(), 300, fmt='ppm', thread_count=2)
print("...complete.")
print()

# Loop over slides
for i, slideimg in enumerate(slideimgs):
    if i % 10 == 0:
        print("Saving slide: " + str(i))

    # Write the page image into an in-memory buffer and rewind it for reading
    imagefile = BytesIO()
    slideimg.save(imagefile, format='tiff')
    imagefile.seek(0)
    width, height = slideimg.size

    # Resize the slide to the image; 9525 EMU per pixel (914400 EMU per inch at 96 pixels per inch)
    prs.slide_height = height * 9525
    prs.slide_width = width * 9525

    # Add slide
    slide = prs.slides.add_slide(blank_slide_layout)
    slide.shapes.add_picture(imagefile, 0, 0, width=width * 9525, height=height * 9525)

# Save Powerpoint
print()
print("Saving file: " + base_name + ".pptx")
prs.save(base_name + '.pptx')
print("Conversion complete. :)")
print()
--------------------------------------------------------------------------------
/small_projects/pdf_ppt_into_each_other/ppt_to_pdf.py:
--------------------------------------------------------------------------------
import sys
import os
import comtypes.client

# %% Get console arguments
# input_folder_path = sys.argv[1]
input_folder_path = "ppts"
# output_folder_path = sys.argv[2]
output_folder_path = "pdfs"

# %% Convert folder paths to Windows format
input_folder_path = os.path.abspath(input_folder_path)
output_folder_path = os.path.abspath(output_folder_path)

# %% Get files in input folder

input_file_paths = os.listdir(input_folder_path)

# %% Create one PowerPoint application object and reuse it for every file
powerpoint = comtypes.client.CreateObject("Powerpoint.Application")

# Make the PowerPoint window visible
powerpoint.Visible = 1

# %% Convert each file
for input_file_name in input_file_paths:

    # Skip if file does not have a PowerPoint extension
    if not input_file_name.lower().endswith((".ppt", ".pptx")):
        continue

    # Create input file path
    input_file_path = os.path.join(input_folder_path, input_file_name)

    # Open the powerpoint slides
    slides = powerpoint.Presentations.Open(input_file_path)

    # Get base file name
    file_name = os.path.splitext(input_file_name)[0]

    # Create output file path
    output_file_path = os.path.join(output_folder_path, file_name + ".pdf")

    # Save as PDF (formatType = 32)
    slides.SaveAs(output_file_path, 32)

    # Close the slide deck
    slides.Close()

# %% Shut PowerPoint down once all files are converted
powerpoint.Quit()
--------------------------------------------------------------------------------
/small_projects/rasa_ch_simple_example/__init__.py:
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/Dustyposa/goSpider/4f5bdab86c71f503ac3ca7fbf5f81a615e323759/small_projects/rasa_ch_simple_example/__init__.py -------------------------------------------------------------------------------- /small_projects/rasa_ch_simple_example/actions.py: -------------------------------------------------------------------------------- 1 | # This files contains your custom actions which can be used to run 2 | # custom Python code. 3 | # 4 | # See this guide on how to implement these action: 5 | # https://rasa.com/docs/rasa/core/actions/#custom-actions/ 6 | 7 | 8 | # This is a simple example for a custom action which utters "Hello World!" 9 | 10 | # from typing import Any, Text, Dict, List 11 | # 12 | # from rasa_sdk import Action, Tracker 13 | # from rasa_sdk.executor import CollectingDispatcher 14 | # 15 | # 16 | # class ActionHelloWorld(Action): 17 | # 18 | # def name(self) -> Text: 19 | # return "action_hello_world" 20 | # 21 | # def run(self, dispatcher: CollectingDispatcher, 22 | # tracker: Tracker, 23 | # domain: Dict[Text, Any]) -> List[Dict[Text, Any]]: 24 | # 25 | # dispatcher.utter_message(text="Hello World!") 26 | # 27 | # return [] 28 | -------------------------------------------------------------------------------- /small_projects/rasa_ch_simple_example/config.yml: -------------------------------------------------------------------------------- 1 | # Configuration for Rasa NLU. 2 | # https://rasa.com/docs/rasa/nlu/components/ 3 | language: en 4 | pipeline: 5 | - name: WhitespaceTokenizer 6 | - name: RegexFeaturizer 7 | - name: LexicalSyntacticFeaturizer 8 | - name: CountVectorsFeaturizer 9 | - name: CountVectorsFeaturizer 10 | analyzer: "char_wb" 11 | min_ngram: 1 12 | max_ngram: 4 13 | - name: DIETClassifier 14 | epochs: 100 15 | - name: EntitySynonymMapper 16 | - name: ResponseSelector 17 | epochs: 100 18 | 19 | # Configuration for Rasa Core. 
20 | # https://rasa.com/docs/rasa/core/policies/ 21 | policies: 22 | - name: MemoizationPolicy 23 | - name: TEDPolicy 24 | max_history: 5 25 | epochs: 100 26 | - name: MappingPolicy 27 | -------------------------------------------------------------------------------- /small_projects/rasa_ch_simple_example/credentials.yml: -------------------------------------------------------------------------------- 1 | # This file contains the credentials for the voice & chat platforms 2 | # which your bot is using. 3 | # https://rasa.com/docs/rasa/user-guide/messaging-and-voice-channels/ 4 | 5 | rest: 6 | # # you don't need to provide anything here - this channel doesn't 7 | # # require any credentials 8 | 9 | 10 | #facebook: 11 | # verify: "" 12 | # secret: "" 13 | # page-access-token: "" 14 | 15 | #slack: 16 | # slack_token: "" 17 | # slack_channel: "" 18 | 19 | #socketio: 20 | # user_message_evt: 21 | # bot_message_evt: 22 | # session_persistence: 23 | 24 | #mattermost: 25 | # url: "https:///api/v4" 26 | # token: "" 27 | # webhook_url: "" 28 | 29 | # This entry is needed if you are using Rasa X. The entry represents credentials 30 | # for the Rasa X "channel", i.e. Talk to your bot and Share with guest testers. 
31 | rasa: 32 | url: "http://localhost:5002/api" 33 | -------------------------------------------------------------------------------- /small_projects/rasa_ch_simple_example/data/nlu.md: -------------------------------------------------------------------------------- 1 | ## intent:greet 2 | - 嗨 3 | - 你好 4 | - 嘿 5 | - 早上好 6 | - 晚上好 7 | - 在这里 8 | 9 | ## intent:goodbye 10 | - 再见 11 | - 拜拜 12 | - 后会有期 13 | - 回见 14 | 15 | ## intent:affirm 16 | - 是的 17 | - 果然 18 | - 当然 19 | - 听起来不错 20 | - 正确 21 | 22 | ## intent:deny 23 | - 不 24 | - 绝不 25 | - 我不认为是这样 26 | - 不是那样的 27 | - 没门 28 | - 不是的 29 | 30 | ## intent:mood_great 31 | - 棒极了 32 | - 非常好 33 | - 棒 34 | - 无敌 35 | - 厉害了 36 | - 感觉很不错 37 | - 我很开心 38 | - 我很好 39 | 40 | ## intent:mood_unhappy 41 | - 呜 42 | - 悲伤 43 | - 不开心 44 | - 糟糕 45 | - 非常糟糕 46 | - 差极了 47 | - 坏极了 48 | - 不是非常好 49 | - 悲哀至极 50 | - 如此悲伤 51 | 52 | ## intent:bot_challenge 53 | - 你是一个机器人吗? 54 | - 你是人类吗? 55 | - 我在和机器人聊天? 56 | - 我在和人聊天? 57 | -------------------------------------------------------------------------------- /small_projects/rasa_ch_simple_example/data/stories.md: -------------------------------------------------------------------------------- 1 | ## happy path 2 | * greet 3 | - utter_greet 4 | * mood_great 5 | - utter_happy 6 | 7 | ## sad path 1 8 | * greet 9 | - utter_greet 10 | * mood_unhappy 11 | - utter_cheer_up 12 | - utter_did_that_help 13 | * affirm 14 | - utter_happy 15 | 16 | ## sad path 2 17 | * greet 18 | - utter_greet 19 | * mood_unhappy 20 | - utter_cheer_up 21 | - utter_did_that_help 22 | * deny 23 | - utter_goodbye 24 | 25 | ## say goodbye 26 | * goodbye 27 | - utter_goodbye 28 | 29 | ## bot challenge 30 | * bot_challenge 31 | - utter_iamabot 32 | -------------------------------------------------------------------------------- /small_projects/rasa_ch_simple_example/domain.yml: -------------------------------------------------------------------------------- 1 | intents: 2 | - greet 3 | - goodbye 4 | - affirm 5 | - deny 6 | - mood_great 7 | - 
mood_unhappy 8 | - bot_challenge 9 | 10 | responses: 11 | utter_greet: 12 | - text: "哈咯,你好吗?" 13 | 14 | utter_cheer_up: 15 | - text: "这里有东西可以让你开心一下:" 16 | image: "https://i.imgur.com/nGF1K8f.jpg" 17 | 18 | utter_did_that_help: 19 | - text: "有帮到你吗?" 20 | 21 | utter_happy: 22 | - text: "棒极了!祝贺" 23 | 24 | utter_goodbye: 25 | - text: "有缘再见" 26 | 27 | utter_iamabot: 28 | - text: "我是机器人哦,基于 Rasa." 29 | 30 | session_config: 31 | session_expiration_time: 60 32 | carry_over_slots_to_new_session: true 33 | -------------------------------------------------------------------------------- /small_projects/rasa_ch_simple_example/endpoints.yml: -------------------------------------------------------------------------------- 1 | # This file contains the different endpoints your bot can use. 2 | 3 | # Server where the models are pulled from. 4 | # https://rasa.com/docs/rasa/user-guide/configuring-http-api/#fetching-models-from-a-server/ 5 | 6 | #models: 7 | # url: http://my-server.com/models/default_core@latest 8 | # wait_time_between_pulls: 10 # [optional](default: 100) 9 | 10 | # Server which runs your custom actions. 11 | # https://rasa.com/docs/rasa/core/actions/#custom-actions/ 12 | 13 | #action_endpoint: 14 | # url: "http://localhost:5055/webhook" 15 | 16 | # Tracker store which is used to store the conversations. 17 | # By default the conversations are stored in memory. 18 | # https://rasa.com/docs/rasa/api/tracker-stores/ 19 | 20 | #tracker_store: 21 | # type: redis 22 | # url: 23 | # port: 24 | # db: 25 | # password: 26 | # use_ssl: 27 | 28 | #tracker_store: 29 | # type: mongod 30 | # url: 31 | # db: 32 | # username: 33 | # password: 34 | 35 | # Event broker which all conversation events should be streamed to. 
36 | # https://rasa.com/docs/rasa/api/event-brokers/ 37 | 38 | #event_broker: 39 | # url: localhost 40 | # username: username 41 | # password: password 42 | # queue: queue 43 | -------------------------------------------------------------------------------- /small_projects/rasa_ch_simple_example/tests/conversation_tests.md: -------------------------------------------------------------------------------- 1 | #### This file contains tests to evaluate that your bot behaves as expected. 2 | #### If you want to learn more, please see the docs: https://rasa.com/docs/rasa/user-guide/testing-your-assistant/ 3 | 4 | ## happy path 1 5 | * greet: hello there! 6 | - utter_greet 7 | * mood_great: amazing 8 | - utter_happy 9 | 10 | ## happy path 2 11 | * greet: hello there! 12 | - utter_greet 13 | * mood_great: amazing 14 | - utter_happy 15 | * goodbye: bye-bye! 16 | - utter_goodbye 17 | 18 | ## sad path 1 19 | * greet: hello 20 | - utter_greet 21 | * mood_unhappy: not good 22 | - utter_cheer_up 23 | - utter_did_that_help 24 | * affirm: yes 25 | - utter_happy 26 | 27 | ## sad path 2 28 | * greet: hello 29 | - utter_greet 30 | * mood_unhappy: not good 31 | - utter_cheer_up 32 | - utter_did_that_help 33 | * deny: not really 34 | - utter_goodbye 35 | 36 | ## sad path 3 37 | * greet: hi 38 | - utter_greet 39 | * mood_unhappy: very terrible 40 | - utter_cheer_up 41 | - utter_did_that_help 42 | * deny: no 43 | - utter_goodbye 44 | 45 | ## say goodbye 46 | * goodbye: bye-bye! 47 | - utter_goodbye 48 | 49 | ## bot challenge 50 | * bot_challenge: are you a bot? 
51 | - utter_iamabot 52 | -------------------------------------------------------------------------------- /small_projects/rasa_learn/ep2/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Dustyposa/goSpider/4f5bdab86c71f503ac3ca7fbf5f81a615e323759/small_projects/rasa_learn/ep2/__init__.py -------------------------------------------------------------------------------- /small_projects/rasa_learn/ep2/actions.py: -------------------------------------------------------------------------------- 1 | # This files contains your custom actions which can be used to run 2 | # custom Python code. 3 | # 4 | # See this guide on how to implement these action: 5 | # https://rasa.com/docs/rasa/core/actions/#custom-actions/ 6 | 7 | 8 | # This is a simple example for a custom action which utters "Hello World!" 9 | 10 | # from typing import Any, Text, Dict, List 11 | # 12 | # from rasa_sdk import Action, Tracker 13 | # from rasa_sdk.executor import CollectingDispatcher 14 | # 15 | # 16 | # class ActionHelloWorld(Action): 17 | # 18 | # def name(self) -> Text: 19 | # return "action_hello_world" 20 | # 21 | # def run(self, dispatcher: CollectingDispatcher, 22 | # tracker: Tracker, 23 | # domain: Dict[Text, Any]) -> List[Dict[Text, Any]]: 24 | # 25 | # dispatcher.utter_message(text="Hello World!") 26 | # 27 | # return [] 28 | -------------------------------------------------------------------------------- /small_projects/rasa_learn/ep2/channels.py: -------------------------------------------------------------------------------- 1 | import inspect 2 | import logging 3 | from asyncio import CancelledError 4 | from typing import Text, Callable, Awaitable, List, Dict 5 | 6 | from rasa.core.channels import RestInput, UserMessage, CollectingOutputChannel 7 | from rasa.server import get_tracker 8 | from sanic import Blueprint, response 9 | from sanic.request import Request 10 | from sanic.response import HTTPResponse 

logger = logging.getLogger(__name__)


class RobotInput(RestInput):

    @classmethod
    def name(cls) -> Text:
        return "rest"

    def blueprint(
            self, on_new_message: Callable[[UserMessage], Awaitable[None]]
    ) -> Blueprint:
        robot_webhook = Blueprint(
            "robot_webhook_{}".format(type(self).__name__),
            inspect.getmodule(self).__name__,
        )

        @robot_webhook.route("/", methods=["GET"])
        async def health(_: Request):
            return response.json({"status": "ok"})

        @robot_webhook.route("/webhook", methods=["POST"])
        async def receive(request: Request) -> HTTPResponse:
            sender_id = await self._extract_sender(request)
            text = self._extract_message(request)
            input_channel = self._extract_input_channel(request)
            metadata = self.get_metadata(request)
            collector = CollectingOutputChannel()
            # noinspection PyBroadException
            try:
                message = UserMessage(
                    text,
                    collector,
                    sender_id,
                    input_channel=input_channel,
                    metadata=metadata,
                )
                await on_new_message(
                    message
                )
            except CancelledError:
                logger.error(
                    "Message handling timed out for "
                    "user message '{}'.".format(text)
                )
            except Exception:
                logger.exception(
                    "An exception occurred while handling "
                    "user message '{}'.".format(text)
                )
            return response.json(self.__filter_confidence_below_threshold(
                request=request,
                message=collector.messages,
                sender_id=sender_id
            ))

        return robot_webhook

    def __filter_confidence_below_threshold(
            self,
            request: Request,
            message: List[Dict[str, str]],
            sender_id: str
    ) -> List[Dict[str, str]]:
        tracker = request.app.agent.tracker_store.get_or_create_tracker(sender_id)
        parse_data = tracker.latest_message.parse_data
        # The response selector output may be missing, so fall back to empty dicts
        response_selector = parse_data.get("response_selector") or {}
        ranking = response_selector.get("default", {}).get("ranking")
        if ranking:
            kv_ranking = self._turn_to_kv_dict(ranking)
            threshold = 0.80
            # Drop messages whose text is ranked below the threshold;
            # messages that are unranked or have no "text" key are kept
            return [
                item for item in message
                if kv_ranking.get(item.get("text"), 1.0) >= threshold
            ]

        return message

    @staticmethod
    def _turn_to_kv_dict(data: List[Dict[str, str]]) -> Dict[str, float]:
        # Map each ranked response name to its confidence
        return {item["name"]: item["confidence"] for item in data}
--------------------------------------------------------------------------------
/small_projects/rasa_learn/ep2/classification.py:
--------------------------------------------------------------------------------
from typing import Text, List, Any

from rasa.nlu.classifiers.classifier import IntentClassifier
from rasa.nlu.training_data import Message


class CustomPipeline(IntentClassifier):

    @classmethod
    def required_packages(cls) -> List[Text]:
        return []

    def process(self, message: Message, **kwargs: Any) -> None:
        # Attach placeholder fields to the parsed message under this component's name
        message.set(self.__class__.__name__, {
            "topic": {},
            "risk": {},
            "sensitive_words": {},
        }, add_to_output=True)

    @property
    def name(self):
        return self.__class__.__name__


if __name__ == '__main__':
    print(CustomPipeline.__name__)
--------------------------------------------------------------------------------
/small_projects/rasa_learn/ep2/config.yml:
--------------------------------------------------------------------------------
# Configuration for Rasa NLU.
2 | # https://rasa.com/docs/rasa/nlu/components/ 3 | language: en 4 | pipeline: 5 | - name: WhitespaceTokenizer 6 | - name: classification.CustomPipeline 7 | - name: RegexFeaturizer 8 | - name: LexicalSyntacticFeaturizer 9 | - name: CountVectorsFeaturizer 10 | - name: CountVectorsFeaturizer 11 | analyzer: "char_wb" 12 | min_ngram: 1 13 | max_ngram: 4 14 | - name: DIETClassifier 15 | epochs: 100 16 | - name: EntitySynonymMapper 17 | - name: ResponseSelector 18 | epochs: 100 19 | 20 | # Configuration for Rasa Core. 21 | # https://rasa.com/docs/rasa/core/policies/ 22 | policies: 23 | - name: MemoizationPolicy 24 | - name: TEDPolicy 25 | max_history: 2 26 | epochs: 10 27 | - name: MappingPolicy 28 | -------------------------------------------------------------------------------- /small_projects/rasa_learn/ep2/credentials.yml: -------------------------------------------------------------------------------- 1 | # This file contains the credentials for the voice & chat platforms 2 | # which your bot is using. 3 | # https://rasa.com/docs/rasa/user-guide/messaging-and-voice-channels/ 4 | 5 | #rest: 6 | # # you don't need to provide anything here - this channel doesn't 7 | # # require any credentials 8 | 9 | channels.RobotInput: 10 | # # custom channel 11 | 12 | #facebook: 13 | # verify: "" 14 | # secret: "" 15 | # page-access-token: "" 16 | 17 | #slack: 18 | # slack_token: "" 19 | # slack_channel: "" 20 | 21 | #socketio: 22 | # user_message_evt: 23 | # bot_message_evt: 24 | # session_persistence: 25 | 26 | #mattermost: 27 | # url: "https:///api/v4" 28 | # token: "" 29 | # webhook_url: "" 30 | 31 | # This entry is needed if you are using Rasa X. The entry represents credentials 32 | # for the Rasa X "channel", i.e. Talk to your bot and Share with guest testers. 
rasa:
  url: "http://localhost:5002/api"
--------------------------------------------------------------------------------
/small_projects/rasa_learn/ep2/data/nlu.md:
--------------------------------------------------------------------------------
## intent:greet
- hey
- hello
- hi
- good morning
- good evening
- hey there

## intent:goodbye
- bye
- goodbye
- see you around
- see you later

## intent:inform
- [Sitka](location)
- [Juneau](location)
- [Virginia](location)
- [Cusseta](location)
- [Chicago](location)
- [Tucson](location)
- [Columbus](location)
- [San Francisco](location)

## intent:search_provider
- I need a [hospital](fancility_type)
- find me a nearby [hospital](fancility_type)
- show me [home health agencies](fancility_type)
- [hospital](fancility_type)
- find me a nearby hospital in [San Francisco](location)
- I need a [home health agency](fancility_type)

## intent:affirm
- yes
- indeed
- of course
- that sounds good
- correct

## intent:deny
- no
- never
- I don't think so
- don't like that
- no way
- not really

## intent:mood_great
- perfect
- very good
- great
- amazing
- wonderful
- I am feeling very good
- I am great
- I'm good

## intent:mood_unhappy
- sad
- very sad
- unhappy
- bad
- very bad
- awful
- terrible
- not very good
- extremely sad
- so sad

## intent:bot_challenge
- are you a bot?
- are you a human?
- am I talking to a bot?
- am I talking to a human?
75 | 76 | 77 | ## intent:long 78 | - story 79 | -------------------------------------------------------------------------------- /small_projects/rasa_learn/ep2/data/stories.md: -------------------------------------------------------------------------------- 1 | ## happy path 2 | * greet 3 | - utter_greet 4 | * mood_great 5 | - utter_happy 6 | 7 | ## sad path 1 8 | * greet 9 | - utter_greet 10 | * mood_unhappy 11 | - utter_cheer_up 12 | - utter_did_that_help 13 | * affirm 14 | - utter_happy 15 | 16 | ## sad path 2 17 | * greet 18 | - utter_greet 19 | * mood_unhappy 20 | - utter_cheer_up 21 | - utter_did_that_help 22 | * deny 23 | - utter_goodbye 24 | 25 | ## say goodbye 26 | * goodbye 27 | - utter_goodbye 28 | - slot{"a": true} 29 | - utter_greet 30 | - utter_cheer_up 31 | 32 | ## GOODBYE 2 33 | * goodbye 34 | - utter_goodbye 35 | - slot{"a": false} 36 | - utter_iamabot 37 | - utter_did_that_help 38 | 39 | ## bot challenge 40 | * bot_challenge 41 | - utter_iamabot 42 | 43 | 44 | ## test long msg 45 | * long 46 | - utter_S3.3.2.3 47 | - utter_S3.3.3.1 48 | - utter_S3.3.3.2 49 | - utter_S3.3.3.3 50 | - utter_S3.3.3.4 51 | - utter_S3.3.3.5 52 | - utter_S3.3.3.6 53 | - utter_S3.3.3.7 54 | - utter_S3.4.1 55 | - utter_S3.4.2 56 | - utter_S3.4.3 57 | - utter_S3.4.4 58 | - utter_S3.4.5 59 | - utter_S3.4.6 60 | - utter_S3.4.7 61 | - utter_S3.4.8 62 | 63 | 64 | ## test text slot Path 1 65 | * S1_text 66 | - slot{"city": "Japan"} 67 | - utter_S3.3.3.1 68 | - utter_S3.3.3.2 69 | - utter_S3.3.3.3 70 | 71 | 72 | ## test text slot Path 2 73 | * S1_text 74 | - slot{"city": "UA"} 75 | - utter_S3.3.3.3 76 | - utter_S3.3.3.2 77 | - utter_S3.3.3.1 78 | 79 | 80 | ## test categorical slot Path 1 81 | * S2.1 82 | - slot{"risk_level": "a"} 83 | - utter_S3.3.3.1 84 | 85 | ## test categorical slot Path 2 86 | * S2.1 87 | - slot{"risk_level": "b"} 88 | - utter_S3.3.3.2 89 | 90 | ## test categorical slot Path 3 91 | * S2.1 92 | - slot{"risk_level": "c"} 93 | - utter_S3.3.3.3 94 | 95 | ## test 
float slot Path 1 96 | * S2.2 97 | - slot{"temperature": -100} 98 | - utter_S3.3.3.1 99 | 100 | ## test float slot Path 2 101 | * S2.2 102 | - slot{"temperature": 100} 103 | - utter_S3.3.3.2 104 | 105 | ## test float slot Path 3 106 | * S2.2 107 | - slot{"temperature": 50} 108 | - utter_S3.3.3.3 109 | 110 | ## test list slot Path 1 111 | * S2.3 112 | - slot{"old_items": ["o1", "o2", 3, 4]} 113 | - utter_S3.3.3.6 114 | 115 | ## test list slot Path 2 116 | * S2.3 117 | - slot{"old_items": []} 118 | - utter_S3.3.3.7 119 | 120 | ## TEST multi conditions for story Path1 121 | * S3.1 122 | - slot{"risk_level": "c", "a": true} 123 | - utter_S3.4.1 124 | 125 | 126 | ## TEST multi conditions for story Path2 127 | * S3.1 128 | - slot{"risk_level": "b", "a": false} 129 | - utter_S3.4.2 130 | 131 | 132 | ## TEST multi conditions for story Path3 133 | * S3.1 134 | - slot{"risk_level": "b", "a": true} 135 | - utter_S3.4.3 136 | 137 | 138 | ## TEST multi conditions for story Path4 139 | * S3.1 140 | - slot{"risk_level": "c", "a": false} 141 | - utter_S3.4.4 142 | 143 | 144 | ## TEST multi line conditions for story Path1 145 | * S3.2 146 | - slot{"risk_level": "c"} 147 | - utter_S3.4.4 148 | - slot{"a": false} 149 | - utter_S3.4.3 150 | 151 | ## TEST multi line conditions for story Path2 152 | * S3.2 153 | - slot{"risk_level": "b"} 154 | - utter_S3.4.8 155 | - slot{"a": false} 156 | - utter_S3.4.7 157 | 158 | ## TEST multi line conditions for story Path3 159 | * S3.2 160 | - slot{"risk_level": "c"} 161 | - utter_S3.4.4 162 | - slot{"a": true} 163 | - utter_happy 164 | 165 | ## TEST multi line conditions for story Path4 166 | * S3.2 167 | - slot{"risk_level": "b"} 168 | - utter_S3.4.8 169 | - slot{"a": true} 170 | - utter_S3.4.5 171 | 172 | ## TEST multi float slot Path 1 173 | * S4.1 174 | - slot{"smoking_amount": 0.0} 175 | - utter_S4.1 176 | 177 | ## TEST multi float slot Path 2 178 | * S4.1 179 | - slot{"smoking_amount": 0.1} 180 | - utter_S4.3 181 | 182 | 183 | ## TEST multi 
float slot Path 3 184 | * S4.1 185 | - utter_S4.2 186 | 187 | 188 | 189 | -------------------------------------------------------------------------------- /small_projects/rasa_learn/ep2/domain.yml: -------------------------------------------------------------------------------- 1 | intents: 2 | - greet 3 | - goodbye 4 | - affirm 5 | - deny 6 | - mood_great 7 | - mood_unhappy 8 | - bot_challenge 9 | - long 10 | - S1_text 11 | - S2.1 12 | - S2.2 13 | - S2.3 14 | - S3.1 15 | - S3.2 16 | - S4.1 17 | slots: 18 | a: 19 | type: bool 20 | city: 21 | type: text 22 | risk_level: 23 | type: categorical 24 | values: 25 | - a 26 | - b 27 | - c 28 | old_items: 29 | type: list 30 | temperature: 31 | type: float 32 | min_value: -100 33 | max_value: 100 34 | auto_fill: false 35 | smoking_amount: 36 | type: float 37 | min_value: 0.0 38 | max_value: 0.1 39 | auto_fill: false 40 | 41 | 42 | 43 | session_config: 44 | session_expiration_time: 60 45 | carry_over_slots_to_new_session: true 46 | 47 | responses: 48 | utter_greet: 49 | - text: "Hey! How are you?" 50 | 51 | utter_cheer_up: 52 | - text: "Here is something to cheer you up:" 53 | image: "https://i.imgur.com/nGF1K8f.jpg" 54 | 55 | utter_did_that_help: 56 | - text: "Did that help you?" 57 | 58 | utter_happy: 59 | - text: "Great, carry on!" 60 | 61 | utter_goodbye: 62 | - text: "Bye" 63 | 64 | utter_by: 65 | - text: "bybybybybybybybyby" 66 | 67 | utter_iamabot: 68 | - text: "I am a bot, powered by Rasa." 69 | 70 | utter_S3.3.2.3: 71 | - text: 哮喘控制好了,保证生活质量 72 | utter_S3.3.3.1: 73 | - text: 上次说的话白说了,把我的话当耳旁风 74 | utter_S3.3.3.2: 75 | - text: 药物没有按时/按量使用是不能发挥出最大药效的 76 | utter_S3.3.3.3: 77 | - text: 白白浪费了药钱不说 78 | utter_S3.3.3.4: 79 | - text: 哮喘控制不好,慢慢变成慢阻肺,你就后悔莫及了! 80 | utter_S3.3.3.5: 81 | - text: 我这么忙!还抽空管理你们,就是希望你可以好好控制 82 | utter_S3.3.3.6: 83 | - text: 哮喘控制不好,慢慢变成慢阻肺,你就后悔莫及了! 
84 | utter_S3.3.3.7: 85 | - text: 保证一个较好的生活质量 86 | utter_S3.4.1: 87 | - text: 你一直做不到按时/按量用药这件事啊 88 | utter_S3.4.2: 89 | - text: 还是要想办法解决的 90 | utter_S3.4.3: 91 | - text: 这里有一个用药提醒的小工具 92 | utter_S3.4.4: 93 | - text: 到了吃药时间会自动提醒你 94 | utter_S3.4.5: 95 | - text: 你每天进去打卡 96 | utter_S3.4.6: 97 | - text: 我这边的小护士也会后台督促你的 98 | utter_S3.4.7: 99 | - text: 到#每日用药记录打卡小程序 100 | utter_S3.4.8: 101 | - text: 为你们也是操碎了心啊 102 | utter_S4.3: 103 | - text: 0.1 及以上 104 | utter_S4.1: 105 | - text: 0.0 及以下 106 | utter_S4.2: 107 | - text: 0.0 - 0.1 之间 108 | -------------------------------------------------------------------------------- /small_projects/rasa_learn/ep2/endpoints.yml: -------------------------------------------------------------------------------- 1 | # This file contains the different endpoints your bot can use. 2 | 3 | # Server where the models are pulled from. 4 | # https://rasa.com/docs/rasa/user-guide/configuring-http-api/#fetching-models-from-a-server/ 5 | 6 | #models: 7 | # url: http://my-server.com/models/default_core@latest 8 | # wait_time_between_pulls: 10 # [optional](default: 100) 9 | 10 | # Server which runs your custom actions. 11 | # https://rasa.com/docs/rasa/core/actions/#custom-actions/ 12 | 13 | #action_endpoint: 14 | # url: "http://localhost:5055/webhook" 15 | 16 | # Tracker store which is used to store the conversations. 17 | # By default the conversations are stored in memory. 18 | # https://rasa.com/docs/rasa/api/tracker-stores/ 19 | 20 | #tracker_store: 21 | # type: redis 22 | # url: 23 | # port: 24 | # db: 25 | # password: 26 | # use_ssl: 27 | 28 | #tracker_store: 29 | # type: mongod 30 | # url: 31 | # db: 32 | # username: 33 | # password: 34 | 35 | # Event broker which all conversation events should be streamed to. 
36 | # https://rasa.com/docs/rasa/api/event-brokers/ 37 | 38 | #event_broker: 39 | # url: localhost 40 | # username: username 41 | # password: password 42 | # queue: queue 43 | -------------------------------------------------------------------------------- /small_projects/rasa_learn/ep2/run.py: -------------------------------------------------------------------------------- 1 | # -*- coding:utf-8 -*- 2 | import rasa 3 | from rasa.cli.shell import shell 4 | from rasa.constants import DEFAULT_DOMAIN_PATH, DEFAULT_ENDPOINTS_PATH, DEFAULT_MODELS_PATH 5 | from rasa.core.processor import MessageProcessor 6 | from rasa.core.tracker_store import TrackerStore 7 | from rasa.core.trackers import DialogueStateTracker 8 | from rasa.nlu.registry import component_classes, registered_components 9 | 10 | from small_projects.rasa_learn.ep2.train import PROJECT_PATH, CustomPipeline 11 | 12 | # component_classes.append(CustomPipeline) 13 | # registered_components[CustomPipeline.name] = CustomPipeline 14 | 15 | # class O: 16 | # model = "/Users/dustyposa/Documents/code/github/goSpider/small_projects/rasa_learn/ep2/models" 17 | # endpoints = "/Users/dustyposa/Documents/code/github/goSpider/small_projects/rasa_learn/ep2/endpoints.yml" 18 | # credentials = None 19 | # enable_api = False 20 | # 21 | # 22 | # shell(O()) 23 | from rasa.core.channels import RestInput 24 | 25 | # rasa.run(model="/Users/dustyposa/Documents/code/github/goSpider/small_projects/rasa_learn/ep2/models", 26 | # endpoints="/Users/dustyposa/Documents/code/github/goSpider/small_projects/rasa_learn/ep2/endpoints.yml") 27 | rasa.run(model=str((PROJECT_PATH / DEFAULT_MODELS_PATH)), 28 | endpoints=str((PROJECT_PATH / DEFAULT_ENDPOINTS_PATH)), 29 | credentials="./credentials.yml" 30 | ) 31 | from rasa.nlu.registry import Any 32 | 33 | -------------------------------------------------------------------------------- /small_projects/rasa_learn/ep2/tests/conversation_tests.md: 
-------------------------------------------------------------------------------- 1 | #### This file contains tests to evaluate that your bot behaves as expected. 2 | #### If you want to learn more, please see the docs: https://rasa.com/docs/rasa/user-guide/testing-your-assistant/ 3 | 4 | ## happy path 1 5 | * greet: hello there! 6 | - utter_greet 7 | * mood_great: amazing 8 | - utter_happy 9 | 10 | ## happy path 2 11 | * greet: hello there! 12 | - utter_greet 13 | * mood_great: amazing 14 | - utter_happy 15 | * goodbye: bye-bye! 16 | - utter_goodbye 17 | 18 | ## sad path 1 19 | * greet: hello 20 | - utter_greet 21 | * mood_unhappy: not good 22 | - utter_cheer_up 23 | - utter_did_that_help 24 | * affirm: yes 25 | - utter_happy 26 | 27 | ## sad path 2 28 | * greet: hello 29 | - utter_greet 30 | * mood_unhappy: not good 31 | - utter_cheer_up 32 | - utter_did_that_help 33 | * deny: not really 34 | - utter_goodbye 35 | 36 | ## sad path 3 37 | * greet: hi 38 | - utter_greet 39 | * mood_unhappy: very terrible 40 | - utter_cheer_up 41 | - utter_did_that_help 42 | * deny: no 43 | - utter_goodbye 44 | 45 | ## say goodbye 46 | * goodbye: bye-bye! 47 | - utter_goodbye 48 | 49 | ## bot challenge 50 | * bot_challenge: are you a bot? 
51 | - utter_iamabot 52 | -------------------------------------------------------------------------------- /small_projects/rasa_learn/ep2/train.py: -------------------------------------------------------------------------------- 1 | from pathlib import Path 2 | 3 | from rasa import train as rasa_train 4 | from rasa.constants import DEFAULT_MODELS_PATH, DEFAULT_DOMAIN_PATH, DEFAULT_CONFIG_PATH, DEFAULT_DATA_PATH 5 | from rasa.core.trackers import DialogueStateTracker 6 | from rasa.nlu.registry import component_classes, registered_components 7 | 8 | from small_projects.rasa_learn.ep2.classification import CustomPipeline 9 | 10 | component_classes.append(CustomPipeline) 11 | registered_components[CustomPipeline.name] = CustomPipeline 12 | 13 | 14 | def get_project_path() -> Path: 15 | tmp_path = Path(".").absolute() 16 | while tmp_path.name != "ep2": 17 | tmp_path = tmp_path.parent 18 | return tmp_path 19 | 20 | 21 | PROJECT_PATH = get_project_path() 22 | 23 | 24 | def train_gj_ai(): 25 | """Train all modules.""" 26 | project_path = PROJECT_PATH 27 | print(str((project_path / DEFAULT_DOMAIN_PATH))) 28 | return rasa_train( 29 | domain=str((project_path / DEFAULT_DOMAIN_PATH)), 30 | config=str((project_path / DEFAULT_CONFIG_PATH)), 31 | training_files=str((project_path / DEFAULT_DATA_PATH)), 32 | output=str(project_path / DEFAULT_MODELS_PATH), 33 | ) 34 | 35 | 36 | if __name__ == '__main__': 37 | train_gj_ai() 38 | -------------------------------------------------------------------------------- /small_projects/rasa_learn/rasa-assistant/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Dustyposa/goSpider/4f5bdab86c71f503ac3ca7fbf5f81a615e323759/small_projects/rasa_learn/rasa-assistant/__init__.py -------------------------------------------------------------------------------- /small_projects/rasa_learn/rasa-assistant/actions.py: 
-------------------------------------------------------------------------------- 1 | # This files contains your custom actions which can be used to run 2 | # custom Python code. 3 | # 4 | # See this guide on how to implement these action: 5 | # https://rasa.com/docs/rasa/core/actions/#custom-actions/ 6 | 7 | 8 | # This is a simple example for a custom action which utters "Hello World!" 9 | 10 | # from typing import Any, Text, Dict, List 11 | # 12 | # from rasa_sdk import Action, Tracker 13 | # from rasa_sdk.executor import CollectingDispatcher 14 | # 15 | # 16 | # class ActionHelloWorld(Action): 17 | # 18 | # def name(self) -> Text: 19 | # return "action_hello_world" 20 | # 21 | # def run(self, dispatcher: CollectingDispatcher, 22 | # tracker: Tracker, 23 | # domain: Dict[Text, Any]) -> List[Dict[Text, Any]]: 24 | # 25 | # dispatcher.utter_message(text="Hello World!") 26 | # 27 | # return [] 28 | -------------------------------------------------------------------------------- /small_projects/rasa_learn/rasa-assistant/config.yml: -------------------------------------------------------------------------------- 1 | # Configuration for Rasa NLU. 2 | # https://rasa.com/docs/rasa/nlu/components/ 3 | language: en 4 | pipeline: 5 | - name: WhitespaceTokenizer 6 | - name: RegexFeaturizer 7 | - name: LexicalSyntacticFeaturizer 8 | - name: CountVectorsFeaturizer 9 | - name: CountVectorsFeaturizer 10 | analyzer: "char_wb" 11 | min_ngram: 1 12 | max_ngram: 4 13 | - name: DIETClassifier 14 | epochs: 100 15 | - name: EntitySynonymMapper 16 | - name: ResponseSelector 17 | epochs: 100 18 | 19 | # Configuration for Rasa Core. 
20 | # https://rasa.com/docs/rasa/core/policies/ 21 | policies: 22 | - name: MemoizationPolicy 23 | - name: TEDPolicy 24 | max_history: 1 25 | epochs: 100 26 | - name: MappingPolicy 27 | -------------------------------------------------------------------------------- /small_projects/rasa_learn/rasa-assistant/credentials.yml: -------------------------------------------------------------------------------- 1 | # This file contains the credentials for the voice & chat platforms 2 | # which your bot is using. 3 | # https://rasa.com/docs/rasa/user-guide/messaging-and-voice-channels/ 4 | 5 | rest: 6 | # # you don't need to provide anything here - this channel doesn't 7 | # # require any credentials 8 | 9 | 10 | #facebook: 11 | # verify: "" 12 | # secret: "" 13 | # page-access-token: "" 14 | 15 | #slack: 16 | # slack_token: "" 17 | # slack_channel: "" 18 | 19 | #socketio: 20 | # user_message_evt: 21 | # bot_message_evt: 22 | # session_persistence: 23 | 24 | #mattermost: 25 | # url: "https:///api/v4" 26 | # token: "" 27 | # webhook_url: "" 28 | 29 | # This entry is needed if you are using Rasa X. The entry represents credentials 30 | # for the Rasa X "channel", i.e. Talk to your bot and Share with guest testers. 
31 | rasa: 32 | url: "http://localhost:5002/api" 33 | -------------------------------------------------------------------------------- /small_projects/rasa_learn/rasa-assistant/data/nlu.md: -------------------------------------------------------------------------------- 1 | 2 | ## intent:greet 3 | - Hi 4 | - Hey 5 | - Hi bot 6 | - Hey bot 7 | - Hello 8 | - Good morning 9 | - hi again 10 | - hi folks 11 | 12 | ## intent:bye 13 | - goodbye 14 | - goodnight 15 | - good bye 16 | - good night 17 | - see ya 18 | - toodle-oo 19 | - bye bye 20 | - gotta go 21 | - farewell 22 | 23 | ## intent:thank 24 | - Thanks 25 | - Thank you 26 | - Thank you so much 27 | - Thanks bot 28 | - Thanks for that 29 | - cheers 30 | 31 | ## intent: faq/ask_channels 32 | - What channels of communication does rasa support? 33 | - what channels do you support? 34 | - what chat channels does rasa uses 35 | - channels supported by Rasa 36 | - which messaging channels does rasa support? 37 | 38 | ## intent: faq/ask_languages 39 | - what language does rasa support? 40 | - which language do you support? 41 | - which languages supports rasa 42 | - can I use rasa also for another laguage? 43 | - languages supported 44 | 45 | ## intent: faq/ask_rasax 46 | - I want information about rasa x 47 | - i want to learn more about Rasa X 48 | - what is rasa x? 49 | - Can you tell me about rasa x? 50 | - Tell me about rasa x 51 | - tell me what is rasa x 52 | -------------------------------------------------------------------------------- /small_projects/rasa_learn/rasa-assistant/data/response.md: -------------------------------------------------------------------------------- 1 | ## ask channels 2 | * faq/ask_channels 3 | - We have a comprehensive list of [supported connectors](https://rasa.com/docs/core/connectors/), but if 4 | you don't see the one you're looking for, you can always create a custom connector by following 5 | [this guide](https://rasa.com/docs/rasa/user-guide/connectors/custom-connectors/). 
6 | 7 | ## ask languages 8 | * faq/ask_languages 9 | - You can use Rasa to build assistants in any language you want! 10 | 11 | ## ask rasa x 12 | * faq/ask_rasax 13 | - Rasa X is a tool to learn from real conversations and improve your assistant. Read more [here](https://rasa.com/docs/rasa-x/) 14 | -------------------------------------------------------------------------------- /small_projects/rasa_learn/rasa-assistant/data/stories.md: -------------------------------------------------------------------------------- 1 | ## greet 2 | * greet 3 | - utter_greet 4 | 5 | ## thank 6 | * thank 7 | - utter_noworries 8 | 9 | ## goodbye 10 | * bye 11 | - utter_bye 12 | 13 | ## Some question from FAQ 14 | * faq 15 | - respond_faq 16 | -------------------------------------------------------------------------------- /small_projects/rasa_learn/rasa-assistant/domain.yml: -------------------------------------------------------------------------------- 1 | 2 | intents: 3 | - greet 4 | - bye 5 | - thank 6 | - faq 7 | 8 | responses: 9 | utter_noworries: 10 | - text: No worries! 11 | utter_greet: 12 | - text: Hi 13 | utter_bye: 14 | - text: Bye! 15 | 16 | actions: 17 | - respond_faq 18 | 19 | -------------------------------------------------------------------------------- /small_projects/rasa_learn/rasa-assistant/endpoints.yml: -------------------------------------------------------------------------------- 1 | # This file contains the different endpoints your bot can use. 2 | 3 | # Server where the models are pulled from. 4 | # https://rasa.com/docs/rasa/user-guide/configuring-http-api/#fetching-models-from-a-server/ 5 | 6 | #models: 7 | # url: http://my-server.com/models/default_core@latest 8 | # wait_time_between_pulls: 10 # [optional](default: 100) 9 | 10 | # Server which runs your custom actions. 
11 | # https://rasa.com/docs/rasa/core/actions/#custom-actions/ 12 | 13 | #action_endpoint: 14 | # url: "http://localhost:5055/webhook" 15 | 16 | # Tracker store which is used to store the conversations. 17 | # By default the conversations are stored in memory. 18 | # https://rasa.com/docs/rasa/api/tracker-stores/ 19 | 20 | #tracker_store: 21 | # type: redis 22 | # url: 23 | # port: 24 | # db: 25 | # password: 26 | # use_ssl: 27 | 28 | #tracker_store: 29 | # type: mongod 30 | # url: 31 | # db: 32 | # username: 33 | # password: 34 | 35 | # Event broker which all conversation events should be streamed to. 36 | # https://rasa.com/docs/rasa/api/event-brokers/ 37 | 38 | #event_broker: 39 | # url: localhost 40 | # username: username 41 | # password: password 42 | # queue: queue 43 | -------------------------------------------------------------------------------- /small_projects/rasa_learn/rasa-assistant/results/confmat.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Dustyposa/goSpider/4f5bdab86c71f503ac3ca7fbf5f81a615e323759/small_projects/rasa_learn/rasa-assistant/results/confmat.png -------------------------------------------------------------------------------- /small_projects/rasa_learn/rasa-assistant/results/failed_stories.md: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /small_projects/rasa_learn/rasa-assistant/results/hist.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Dustyposa/goSpider/4f5bdab86c71f503ac3ca7fbf5f81a615e323759/small_projects/rasa_learn/rasa-assistant/results/hist.png -------------------------------------------------------------------------------- /small_projects/rasa_learn/rasa-assistant/results/intent_report.json: 
-------------------------------------------------------------------------------- 1 | { 2 | "greet": { 3 | "precision": 1.0, 4 | "recall": 1.0, 5 | "f1-score": 1.0, 6 | "support": 8, 7 | "confused_with": {} 8 | }, 9 | "thank": { 10 | "precision": 1.0, 11 | "recall": 1.0, 12 | "f1-score": 1.0, 13 | "support": 6, 14 | "confused_with": {} 15 | }, 16 | "bye": { 17 | "precision": 1.0, 18 | "recall": 1.0, 19 | "f1-score": 1.0, 20 | "support": 9, 21 | "confused_with": {} 22 | }, 23 | "accuracy": 1.0, 24 | "macro avg": { 25 | "precision": 1.0, 26 | "recall": 1.0, 27 | "f1-score": 1.0, 28 | "support": 23 29 | }, 30 | "weighted avg": { 31 | "precision": 1.0, 32 | "recall": 1.0, 33 | "f1-score": 1.0, 34 | "support": 23 35 | } 36 | } -------------------------------------------------------------------------------- /small_projects/rasa_learn/rasa-assistant/results/story_confmat.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Dustyposa/goSpider/4f5bdab86c71f503ac3ca7fbf5f81a615e323759/small_projects/rasa_learn/rasa-assistant/results/story_confmat.pdf -------------------------------------------------------------------------------- /small_projects/rasa_learn/rasa-assistant/tests/conversation_tests.md: -------------------------------------------------------------------------------- 1 | ## greet + goodbye 2 | * greet: Hi! 
3 | - utter_greet 4 | * bye: Bye 5 | - utter_bye 6 | 7 | ## greet + thanks 8 | * greet: Hello there 9 | - utter_greet 10 | * thank: thanks a bunch 11 | - utter_noworries 12 | 13 | ## greet + thanks + goodbye 14 | * greet: Hey 15 | - utter_greet 16 | * thank: thank you 17 | - utter_noworries 18 | * bye: bye bye 19 | - utter_bye 20 | -------------------------------------------------------------------------------- /small_projects/rasa_learn/rasa-assistant/tests/test_stories.md: -------------------------------------------------------------------------------- 1 | ## ask channels 2 | * faq: What messaging channels does Rasa support? 3 | - respond_faq 4 | 5 | ## ask languages 6 | * faq: Which languages can I build assistants in? 7 | - respond_faq 8 | 9 | ## ask rasa x 10 | * faq: What’s Rasa X? 11 | - respond_faq 12 | -------------------------------------------------------------------------------- /small_projects/文字生成图片/config.py: -------------------------------------------------------------------------------- 1 | HOT_LIST_URL = "https://tophub.today/" 2 | -------------------------------------------------------------------------------- /small_projects/文字生成图片/main.py: -------------------------------------------------------------------------------- 1 | from dataclasses import dataclass 2 | from typing import List 3 | 4 | import requests 5 | from parsel import Selector 6 | from PIL import Image, ImageDraw, ImageFont 7 | 8 | from small_projects.文字生成图片.config import HOT_LIST_URL 9 | 10 | font_size = 30 11 | font = ImageFont.truetype("AaTianShiZhuYi-2.ttf", size=font_size) 12 | line_spacing = font_size + 5 13 | 14 | 15 | # img = Image.new('RGB', (200, 100), (255, 255, 255)) 16 | # d = ImageDraw.Draw(img) 17 | # d.text((20, 20), 'Hello', fill=(255, 0, 0)) 18 | # 19 | # with open("res.png", "wb") as fp: 20 | # img.save(fp, 'png') 21 | # print("生成图片成功") 22 | 23 | @dataclass 24 | class SingleData: 25 | title: str 26 | heat: str 27 | 28 | 29 | @dataclass 30 | class HotData: 31 | 
name: str 32 | data: List[SingleData] 33 | scope: str 34 | 35 | 36 | class HotListCrawler: 37 | url = HOT_LIST_URL 38 | headers = { 39 | "user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6) AppleWebKit/537.36 " 40 | "(KHTML, like Gecko) Chrome/85.0.4183.83 Safari/537.36" 41 | } 42 | 43 | @classmethod 44 | def crawl(cls) -> List[HotData]: 45 | web_data = cls._get_web_data() 46 | return cls._parse_web_data(web_data) 47 | 48 | @classmethod 49 | def _get_web_data(cls) -> str: 50 | return requests.get(cls.url, headers=cls.headers).content.decode("u8") 51 | 52 | @classmethod 53 | def _parse_web_data(cls, web_data: str) -> List[HotData]: 54 | res = [] 55 | sel = Selector(text=web_data) 56 | for hot_data in sel.css(".cc-cd")[:4]: 57 | name = hot_data.css(".cc-cd-lb>span::text").get() 58 | scope = hot_data.css(".cc-cd-sb-st::text").get() 59 | data = [] 60 | for single_data in hot_data.css(".cc-cd-cb-ll")[:10]: 61 | title = single_data.css(".t::text").get() 62 | heat = single_data.css(".e::text").get() 63 | data.append(SingleData(title=title, heat=heat)) 64 | res.append(HotData( 65 | name=name, 66 | scope=scope, 67 | data=data 68 | )) 69 | return res 70 | 71 | 72 | class PngProducer: 73 | @classmethod 74 | def draw_from_hot_data(cls, data: List[HotData]): 75 | data_length = len(data) 76 | w, h = 1200, 600 77 | img = Image.new('RGB', (w, h * data_length), (255, 255, 255)) 78 | drawer = ImageDraw.Draw(img) 79 | base_x, base_y = 10, 10 80 | for hot_data in data: 81 | drawer.text((base_x, base_y), hot_data.name, fill=(255, 0, 0), 82 | font=font) 83 | base_y += line_spacing 84 | for single_data in hot_data.data: 85 | drawer.text((base_x, base_y), single_data.title, fill=(255, 0, 0), 86 | font=font) 87 | drawer.text((w - base_x - 5 * font_size, base_y), single_data.heat, fill=(255, 0, 0), 88 | font=font) 89 | base_y += line_spacing 90 | base_y += line_spacing * 3 91 | 92 | cls.save(img) 93 | 94 | @classmethod 95 | def save(cls, img): 96 | with open("res.png", "wb") as 
fp: 97 | img.save(fp, 'png') 98 | print("Image generated successfully") 99 | 100 | 101 | if __name__ == '__main__': 102 | data = HotListCrawler.crawl() 103 | PngProducer.draw_from_hot_data(data) 104 | -------------------------------------------------------------------------------- /small_projects/文字生成图片/requirements.txt: -------------------------------------------------------------------------------- 1 | pillow 2 | requests 3 | parsel 4 | -------------------------------------------------------------------------------- /small_projects/音视频分离/extract_audio.py: -------------------------------------------------------------------------------- 1 | from path import Path 2 | 3 | from moviepy.editor import VideoFileClip 4 | 5 | for file in Path("videos_files").files(): 6 | video = VideoFileClip(file.abspath()) 7 | 8 | audio = video.audio 9 | audio.write_audiofile(f"./audio_files/{file.stem}.mp3") 10 | print(f"Wrote file {file.stem}.mp3") 11 | -------------------------------------------------------------------------------- /small_projects/音视频分离/get_video.py: -------------------------------------------------------------------------------- 1 | from path import Path 2 | 3 | import requests 4 | 5 | 6 | def get_sources_to_max_num(url: str, max_num: int) -> None: 7 | for i in range(1, max_num + 1): 8 | if i < 10: 9 | tmp_url = url.format(f"0{i}") 10 | else: 11 | tmp_url = url.format(f"{i}") 12 | save_data(get_data(tmp_url), i) 13 | print(F"Saved file {i}") 14 | 15 | 16 | def get_data(url: str) -> bytes: 17 | return requests.get(url).content 18 | 19 | 20 | def save_data(data: bytes, index: int): 21 | Path(f"./videos_files/{index}.mp4").write_bytes(data) 22 | 23 | 24 | if __name__ == '__main__': 25 | url = "http://www.ynfn.gov.cn:7000/fnnews/py{}.mp4" 26 | get_sources_to_max_num(url, 81) 27 | -------------------------------------------------------------------------------- /small_projects/音视频分离/requirements: 
-------------------------------------------------------------------------------- 1 | moviepy 2 | 
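Note on `get_video.py` above: `get_data` returns the response body without checking the HTTP status, so an error page could end up saved as an `.mp4`, and each video is held fully in memory before it is written. The following is only a minimal sketch of one alternative, not the repository's code. It deliberately swaps `requests` for the standard library's `urllib` so that it has no third-party dependencies, and the helper names `numbered_urls` and `download` are hypothetical.

```python
import shutil
import urllib.request
from pathlib import Path
from typing import Iterator


def numbered_urls(template: str, max_num: int) -> Iterator[str]:
    """Yield the template filled with 01, 02, ..., max_num (zero-padded to two digits)."""
    for i in range(1, max_num + 1):
        yield template.format(f"{i:02d}")


def download(url: str, target: Path) -> None:
    """Stream url into target; HTTP errors raise instead of being saved to disk."""
    target.parent.mkdir(parents=True, exist_ok=True)
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses, and the
    # target file is only opened after the request has succeeded.
    with urllib.request.urlopen(url, timeout=30) as resp, target.open("wb") as fp:
        shutil.copyfileobj(resp, fp)  # copies in chunks, not all at once
```

Under these assumptions, a caller would loop over `enumerate(numbered_urls(url, 81), start=1)` and call `download(tmp_url, Path(f"./videos_files/{index}.mp4"))` for each pair.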
-------------------------------------------------------------------------------- /spider_project/README.md: -------------------------------------------------------------------------------- 1 | ## Learn by doing: projects are essential! 2 | ## Practice categories 3 | - #### Parsing practice 4 | - [Douban Movie Top100](./douban_movie/) 5 | - Parsing practice that mixes regular expressions, BeautifulSoup4, xpath, pyquery, and scrapy parsers 6 | - #### Login analysis practice 7 | - [WeChat web login](./login/wx_web/) 8 | 9 | - #### Speeding up crawlers 10 | - [Multithreaded scraping of Douban top250](./multithreading/) 11 | - Basic multithreading practice 12 | -------------------------------------------------------------------------------- /spider_project/ajax/README.md: -------------------------------------------------------------------------------- 1 | ## Scraping sites that load data asynchronously with ajax 2 | ### ajax in practice 3 | - #### [SegmentFault community](./segmentfault/) 4 | 5 | ### Introduction to ajax 6 | #### Have you ever run into this problem? You can clearly see the data on the web page, but the response to your own request does not contain it. Why is that? 7 | #### This is ajax asynchronous loading. It also improves page responsiveness, so many websites use it. This topic covers how to analyze and scrape asynchronously loaded data. 8 | 9 | #### First, let's look at how a normal request and an ajax request differ in a packet capture. 10 | For example: [Douban Movie Top100](https://movie.douban.com/tag/Top100) -- [project directory](../douban_movie/) 11 | The URL opened in the browser is: https://movie.douban.com/tag/Top100 12 | The request headers shown for the response are: 13 | ![Request URL](../douban_movie/images/requests_ann.png) 14 | **So here the data we need can be fetched by requesting the URL opened in the browser directly. 
For some sites, however, requesting that URL does not return the data: when the browser visits a URL, it also requests other URLs along the way, as shown below:** 15 | ![Extra requests](./images/ajax-urls.png) 16 | The data is in those extra requests. 17 | 18 | 19 | -------------------------------------------------------------------------------- /spider_project/ajax/images/ajax-urls.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Dustyposa/goSpider/4f5bdab86c71f503ac3ca7fbf5f81a615e323759/spider_project/ajax/images/ajax-urls.png -------------------------------------------------------------------------------- /spider_project/ajax/segmentfault/images/1_s.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/Dustyposa/goSpider/4f5bdab86c71f503ac3ca7fbf5f81a615e323759/spider_project/ajax/segmentfault/images/1_s.png -------------------------------------------------------------------------------- /spider_project/ajax/segmentfault/images/ajax_data.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Dustyposa/goSpider/4f5bdab86c71f503ac3ca7fbf5f81a615e323759/spider_project/ajax/segmentfault/images/ajax_data.png -------------------------------------------------------------------------------- /spider_project/ajax/segmentfault/images/recmmend.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Dustyposa/goSpider/4f5bdab86c71f503ac3ca7fbf5f81a615e323759/spider_project/ajax/segmentfault/images/recmmend.png -------------------------------------------------------------------------------- /spider_project/ajax/segmentfault/images/request_ana.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Dustyposa/goSpider/4f5bdab86c71f503ac3ca7fbf5f81a615e323759/spider_project/ajax/segmentfault/images/request_ana.png -------------------------------------------------------------------------------- /spider_project/ajax/segmentfault/images/s_index.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Dustyposa/goSpider/4f5bdab86c71f503ac3ca7fbf5f81a615e323759/spider_project/ajax/segmentfault/images/s_index.png -------------------------------------------------------------------------------- /spider_project/ajax/segmentfault/images/skill.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Dustyposa/goSpider/4f5bdab86c71f503ac3ca7fbf5f81a615e323759/spider_project/ajax/segmentfault/images/skill.png 
-------------------------------------------------------------------------------- /spider_project/asynchronous/qiutan/save_helper.py: -------------------------------------------------------------------------------- 1 | from dataclasses import dataclass 2 | 3 | import pymongo 4 | 5 | 6 | @dataclass 7 | class MongoDb: 8 | mongo_session = pymongo.MongoClient(host="localhost", port=27017) 9 | mongo_db = mongo_session["qiutan"] 10 | mongo_col = mongo_db["qiutan"] 11 | -------------------------------------------------------------------------------- /spider_project/douban_movie/images/capture_package_ready.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Dustyposa/goSpider/4f5bdab86c71f503ac3ca7fbf5f81a615e323759/spider_project/douban_movie/images/capture_package_ready.png -------------------------------------------------------------------------------- /spider_project/douban_movie/images/confirm_request.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Dustyposa/goSpider/4f5bdab86c71f503ac3ca7fbf5f81a615e323759/spider_project/douban_movie/images/confirm_request.png -------------------------------------------------------------------------------- /spider_project/douban_movie/images/data_target.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Dustyposa/goSpider/4f5bdab86c71f503ac3ca7fbf5f81a615e323759/spider_project/douban_movie/images/data_target.png -------------------------------------------------------------------------------- /spider_project/douban_movie/images/index.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Dustyposa/goSpider/4f5bdab86c71f503ac3ca7fbf5f81a615e323759/spider_project/douban_movie/images/index.png 
-------------------------------------------------------------------------------- /spider_project/douban_movie/images/re_get_data_1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Dustyposa/goSpider/4f5bdab86c71f503ac3ca7fbf5f81a615e323759/spider_project/douban_movie/images/re_get_data_1.png -------------------------------------------------------------------------------- /spider_project/douban_movie/images/re_name_poster.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Dustyposa/goSpider/4f5bdab86c71f503ac3ca7fbf5f81a615e323759/spider_project/douban_movie/images/re_name_poster.png -------------------------------------------------------------------------------- /spider_project/douban_movie/images/requests_ann.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Dustyposa/goSpider/4f5bdab86c71f503ac3ca7fbf5f81a615e323759/spider_project/douban_movie/images/requests_ann.png -------------------------------------------------------------------------------- /spider_project/douban_movie/images/time_handle.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Dustyposa/goSpider/4f5bdab86c71f503ac3ca7fbf5f81a615e323759/spider_project/douban_movie/images/time_handle.png -------------------------------------------------------------------------------- /spider_project/douban_movie/images/total_data.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Dustyposa/goSpider/4f5bdab86c71f503ac3ca7fbf5f81a615e323759/spider_project/douban_movie/images/total_data.png -------------------------------------------------------------------------------- /spider_project/douban_movie/images/unique1.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/Dustyposa/goSpider/4f5bdab86c71f503ac3ca7fbf5f81a615e323759/spider_project/douban_movie/images/unique1.png -------------------------------------------------------------------------------- /spider_project/douban_movie/images/unique2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Dustyposa/goSpider/4f5bdab86c71f503ac3ca7fbf5f81a615e323759/spider_project/douban_movie/images/unique2.png -------------------------------------------------------------------------------- /spider_project/douban_movie/images/xpath1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Dustyposa/goSpider/4f5bdab86c71f503ac3ca7fbf5f81a615e323759/spider_project/douban_movie/images/xpath1.png -------------------------------------------------------------------------------- /spider_project/dzdp/comments/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Dustyposa/goSpider/4f5bdab86c71f503ac3ca7fbf5f81a615e323759/spider_project/dzdp/comments/__init__.py -------------------------------------------------------------------------------- /spider_project/dzdp/comments/comments_dzdp.py: -------------------------------------------------------------------------------- 1 | import time 2 | from typing import Dict, Any 3 | 4 | import requests 5 | import pymongo 6 | from parsel import Selector 7 | from pathlib import Path 8 | 9 | from headers import COMMENTS_HEADERS 10 | 11 | 12 | class DBClient: 13 | def __init__(self, db_name: str) -> None: 14 | self.client = pymongo.MongoClient('mongodb://mongo:27017/') 15 | self.db = self.client[db_name] 16 | self.col = self.db["comments"] 17 | 18 | def insert_one(self, data: Dict[str, Any]) -> str: 19 | return self.col.insert_one(data) 20 | 21 | 22 
| class CommentsParser: 23 | def __init__(self, html: str) -> None: 24 | self.selector = Selector(text=html) 25 | self.check_recommend = 0 26 | 27 | @property 28 | def is_not_empty_page(self) -> bool: 29 | return len(self.selector.css(".reviews-wrapper")) != 0 30 | 31 | @property 32 | def has_recommend(self) -> bool: 33 | return len(self.selector.css(".review-recommend")) != 0 34 | 35 | 36 | class CommentsCrawler: 37 | def __init__(self, shop_id: str) -> None: 38 | self.shop_id = shop_id 39 | self.base_url = f"http://www.dianping.com/shop/{self.shop_id}/review_all/p" 40 | self.now_page = 8 41 | self.crawl_url = self.base_url + str(self.now_page) 42 | self.headers = COMMENTS_HEADERS 43 | self.db_client = DBClient("dzdp") 44 | self.session = requests.Session() 45 | self.session.headers.update(self.headers) 46 | self.now_content = b"" 47 | 48 | def increase_page(self) -> None: 49 | self.now_page += 1 50 | self.crawl_url = self.base_url + str(self.now_page) 51 | 52 | def run(self) -> None: 53 | while True: 54 | print(f"正在抓取:{self.crawl_url}") 55 | parser = self.crawl_page() 56 | if parser.is_not_empty_page and parser.has_recommend: 57 | self.db_client.insert_one({ 58 | "text": parser.selector.xpath("//body").get(), 59 | "page_nums": self.now_page, 60 | "shop_id": self.shop_id, 61 | "url": self.crawl_url, 62 | }) 63 | self.increase_page() 64 | else: 65 | print("没有数据了。") 66 | self.record_id() 67 | break 68 | time.sleep(15) 69 | 70 | def record_id(self) -> None: 71 | Path("fail_data").write_bytes(self.now_content) 72 | with open("crawled_db", "a") as fp: 73 | fp.write(f"{self.shop_id},{self.now_page}\n") 74 | 75 | def crawl_page(self) -> CommentsParser: 76 | if self.now_page > 1: 77 | self.session.headers["Referer"] = self.base_url + str(self.now_page - 1) 78 | # print(self.session.headers) 79 | response = self.session.get(self.crawl_url) 80 | response.raise_for_status() 81 | self.now_content = response.content 82 | text = response.text 83 | parser = 
CommentsParser(html=text)
        return parser


if __name__ == '__main__':
    CommentsCrawler("67408602").run()

--------------------------------------------------------------------------------
/spider_project/dzdp/comments/simple_crawl.py:
--------------------------------------------------------------------------------
import time

import requests
from parsel import Selector

from headers import COMMENTS_HEADERS

base_url = "http://www.dianping.com/shop/67408602/review_all/p{}"


for i in range(1, 10):
    if i > 1:
        # Pretend we arrived from the previous page.
        COMMENTS_HEADERS["Referer"] = base_url.format(i - 1)
    # Request page i (the original always requested page 1).
    res = requests.get(base_url.format(i), headers=COMMENTS_HEADERS)
    selector = Selector(text=res.text)
    if selector.css(".review-recommend").getall():
        print(selector.css(".review-recommend").getall())
    else:
        print(base_url.format(i))
        print(res.content.decode("u8"))
    time.sleep(5)

--------------------------------------------------------------------------------
/spider_project/dzdp/details/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Dustyposa/goSpider/4f5bdab86c71f503ac3ca7fbf5f81a615e323759/spider_project/dzdp/details/__init__.py

--------------------------------------------------------------------------------
/spider_project/dzdp/details/headers.py:
--------------------------------------------------------------------------------
import dataclasses

COMMON_HEADERS = {
    "Host": "www.dianping.com",
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:74.0) Gecko/20100101 Firefox/74.0",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
    "Accept-Language": "zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2",
    "Accept-Encoding": "gzip, deflate, br",
    "Connection": "keep-alive",
    "Cookie":
"_lxsdk_cuid=1727a33b5c1c8-0f6ff667f9d808-1b386257-1aeaa0-1727a33b5c1c8; _lxsdk=1727a33b5c1c8-0f6ff667f9d808-1b386257-1aeaa0-1727a33b5c1c8; _hc.v=03ed4f1e-4cbe-ee55-c828-0181b754de15.1591188111; ctu=c12d516a4e292a0c820453631e6e434f2dfd07abe0b169dc6c7dbf126ee4e7ce; cityid=1609; default_ab=shop%3AA%3A6; cy=1609; cye=shuangliu; ua=dpuser_1875991754; ll=7fd06e815b796be3df069dec7836c3df; uamo=15182154396; Hm_lvt_602b80cf8079ae6591966cc70a3940e7=1592536008,1592962151; lgtoken=0fdcbe38d-1e13-4c58-82cd-3d781bc470ae; dper=9cb77edf4723f5b65797487ada8377fd98c4b9aaaef144f6a0cfccf02d56325d46426989e309cfdb3ab3b19d14de9aef32e6fbd123f7c89d388d5088e31503da74560ad1ac1ec609c10ce92e6c81ce3c6ea5e7975c4d88579f002e26dec02cee; dplet=1d835fc21d5ba94e3b978ff6b0871148; Hm_lpvt_602b80cf8079ae6591966cc70a3940e7=1594018032; _lxsdk_s=17322e05d44-584-c43-e45%7C%7C40", 11 | "Upgrade-Insecure-Requests": "1", 12 | "Pragma": "no-cache", 13 | "Cache-Control": "no-cache", 14 | } 15 | 16 | ALL_REVIEW_PARAMS = {'shopId': 'H1aHdhkb51pd6oXh', 17 | '_token': '', 18 | 'originUrl': 'http://www.dianping.com/shop/H1aHdhkb51pd6oXh', 19 | 'shopType': 10, 20 | 'cityId': 8} 21 | -------------------------------------------------------------------------------- /spider_project/dzdp/resource/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Dustyposa/goSpider/4f5bdab86c71f503ac3ca7fbf5f81a615e323759/spider_project/dzdp/resource/__init__.py -------------------------------------------------------------------------------- /spider_project/login/wx_web/README.md: -------------------------------------------------------------------------------- 1 | 2 | ## 微信web 接口登录分析 3 | 网址: https://wx.qq.com/ 4 | 5 | #### 1. [网址分析](#1-网址分析) 6 | #### 2. [QRcode关键参数](#2-关键参数来源分析) 7 | #### 3. [获取二维码](#3-获取二维码) 8 | #### 4. [获取好友列表(待续)](#4-获取好友列表) 9 | #### [完整代码](./analyze.ipynb) 10 | 11 | ### 1. 
网址分析
Opening the URL shows the page below:
![web page](./images/index.png)
**So where does this QR code come from? Let's capture the traffic with F12 (the browser developer tools):**
![web page](./images/qr.png)
**It clearly comes from this request, so let's analyze the request:**
![web page](./images/ana_qr.png)
The key parameter in this request follows no obvious pattern, so let's search for where it originates.

### 2. 关键参数来源分析
Searching for the value with Ctrl + F gives the following result:
![request list](./images/ana_key1.png)
**The third request, which is the one we just opened, also carries the key parameter, so it cannot be where the parameter is generated. Let's look at the second request instead:**
![request that produces the key parameter](./images/ana_key2.png)
**Now let's try it in code:**


```python
>>> import requests
>>> import time
```


```python
# Build the query parameters
params = {
    "appid": "wx782c26e4c19acffb",
    "redirect_uri": "https://wx.qq.com/cgi-bin/mmwebwx-bin/webwxnewloginpage",
    "fun": "new",
    "lang": "zh_CN",
    "_": int(time.time() * 1000)  # 13-digit millisecond timestamp
}
# Build the request headers
headers = {
    "Referer": "https://wx.qq.com/",
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36",
}
url = "https://login.wx.qq.com/jslogin"
session = requests.Session()
```


```python
>>> response = session.get(url, params=params, headers=headers)
>>> response.text
```




    'window.QRLogin.code = 200; window.QRLogin.uuid = "AZGQ_G8qhA==";'



**This response clearly contains the parameter we need, so next we extract it:**


```python
>>> qr_key = response.text.split(";")[-2].split(" = ")[-1].strip('"')
>>> qr_key
```




    'AZGQ_G8qhA=='



### 3.
获取二维码

The request URL of the QR code is shown below:
![qrcode](./images/qr_url.png)
**Let's try requesting it:**


```python
>>> qr_url = "https://login.weixin.qq.com/qrcode/" + qr_key
>>> response = session.get(qr_url)
>>> response
```



```

```


**The body is binary data, of course, so we have to convert it:**


```python
>>> response.content[:100]
```



```
b'\xff\xd8\xff\xe0\x00\x10JFIF\x00\x01\x01\x00\x00\x01\x00\x01\x00\x00\xff\xdb\x00C\x00\x03\x02\x02\x03\x02\x02\x03\x03\x03\x03\x04\x03\x03\x04\x05\x08\x05\x05\x04\x04\x05\n\x07\x07\x06\x08\x0c\n\x0c\x0c\x0b\n\x0b\x0b\r\x0e\x12\x10\r\x0e\x11\x0e\x0b\x0b\x10\x16\x10\x11\x13\x14\x15\x15\x15\x0c\x0f\x17\x18\x16\x14\x18\x12\x14\x15\x14\xff\xc0\x00\x0b\x08\x01\xae\x01\xae\x01\x01'
```



```python
>>> from PIL import Image
>>> import io
```


```python
>>> ig = io.BytesIO(response.content)
>>> Image.open(ig)
```




![png](./images/output_14_0.png)



**Now the code can be scanned. Once it has been scanned, we move on to the next step of the analysis.**

### 4. 获取好友列表
After logging in, where does the friend data come from?
Going through the returned AJAX requests one by one, we find:
![friend data](./images/lgin.png)
The data comes from this request. Let's look at its request headers:
![friend data request](./images/ana_login.png)
There are three parameters, and one of them, "skey", looks special. As before, we search for it with Ctrl+F.


```python
>>> response = session.get("https://wx2.qq.com/cgi-bin/mmwebwx-bin/webwxgetcontact?r="+ str(time.time()).replace(".", "")[:13] +"&seq=0&skey=@crypt_4a9c92a1_c808f1ac793e52e77e907f76cf7ca086")
>>> response.json()
```



```
{
    'BaseResponse': {'Ret': 1, 'ErrMsg': ''},
    'MemberCount': 0,
    'MemberList': [],
    'Seq': 0
}
```


![login analysis](./images/login_ana.png)

--------------------------------------------------------------------------------
/spider_project/login/wx_web/images/ana_key1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Dustyposa/goSpider/4f5bdab86c71f503ac3ca7fbf5f81a615e323759/spider_project/login/wx_web/images/ana_key1.png

--------------------------------------------------------------------------------
/spider_project/login/wx_web/images/ana_key2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Dustyposa/goSpider/4f5bdab86c71f503ac3ca7fbf5f81a615e323759/spider_project/login/wx_web/images/ana_key2.png

--------------------------------------------------------------------------------
/spider_project/login/wx_web/images/ana_login.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Dustyposa/goSpider/4f5bdab86c71f503ac3ca7fbf5f81a615e323759/spider_project/login/wx_web/images/ana_login.png

--------------------------------------------------------------------------------
/spider_project/login/wx_web/images/ana_qr.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Dustyposa/goSpider/4f5bdab86c71f503ac3ca7fbf5f81a615e323759/spider_project/login/wx_web/images/ana_qr.png -------------------------------------------------------------------------------- /spider_project/login/wx_web/images/index.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Dustyposa/goSpider/4f5bdab86c71f503ac3ca7fbf5f81a615e323759/spider_project/login/wx_web/images/index.png -------------------------------------------------------------------------------- /spider_project/login/wx_web/images/lgin.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Dustyposa/goSpider/4f5bdab86c71f503ac3ca7fbf5f81a615e323759/spider_project/login/wx_web/images/lgin.png -------------------------------------------------------------------------------- /spider_project/login/wx_web/images/login_ana.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Dustyposa/goSpider/4f5bdab86c71f503ac3ca7fbf5f81a615e323759/spider_project/login/wx_web/images/login_ana.png -------------------------------------------------------------------------------- /spider_project/login/wx_web/images/output_14_0.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Dustyposa/goSpider/4f5bdab86c71f503ac3ca7fbf5f81a615e323759/spider_project/login/wx_web/images/output_14_0.png -------------------------------------------------------------------------------- /spider_project/login/wx_web/images/qr.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Dustyposa/goSpider/4f5bdab86c71f503ac3ca7fbf5f81a615e323759/spider_project/login/wx_web/images/qr.png -------------------------------------------------------------------------------- 
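/spider_project/login/wx_web/images/qr_url.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Dustyposa/goSpider/4f5bdab86c71f503ac3ca7fbf5f81a615e323759/spider_project/login/wx_web/images/qr_url.png

--------------------------------------------------------------------------------
/spider_project/multithreading/README.md:
--------------------------------------------------------------------------------
## Multithreaded crawlers

Here is a question: **suppose you must make 10 requests within 10 seconds, but each request has to be followed by a 1-second sleep, so the 10 requests already cost 10 s of sleeping. What would you do?**

Multithreaded crawling is one way to solve this.
The most time-consuming part of a crawler is the network I/O of requests and responses. If only one request is sent at a time, the crawler sits idle until that response arrives before it can send the next one, which wastes a lot of time. Running several threads lets those waits overlap and improves crawl throughput.

A multithreaded crawler example in Python
---
- [Source for the multithreaded Douban Top 250 crawl](./douban_top250.py) (the source is fairly simple; if you would like a walkthrough, please open an issue and I will add one later)
![Douban movie Top 250](./mul_example/douban250.gif)

Implementing multithreading in Python
---
#### 1. Running one function several times at once
Import the threading and time modules:
```python
>>> import threading
>>> import time
```
Define the function each thread will run:
```python
def fun(thread_name, others):
    for i in range(3):
        print(f"Thread {thread_name} is about to sleep for {others}s")
        time.sleep(others)  # sleep for the duration announced above
    print(f"this is a thread, my name is {thread_name}")
```
Start two threads and run them:
- The Thread object
    - target  # the function the thread runs
    - args  # positional arguments for the function
    - kwargs  # keyword arguments for the function
- Methods
    - .start()  # start running the thread
    - .join()  # make the main thread wait until the thread's function has finished; without it the main thread carries on immediately and may print its final message before the workers are done
```python
threading_list = []  # container for the threads

t1 = threading.Thread(target=fun, args=("thread-1", 2))  # create thread object 1
t2 = threading.Thread(target=fun, args=("thread-2", 1))  # create thread object 2
threading_list.append(t1)  # add to the container
threading_list.append(t2)
list(map(lambda x: x.start(), threading_list))  # start() launches each thread
list(map(lambda t: t.join(), threading_list))  # join() makes the main thread wait for each worker
print("Main thread finished.")
```
The result looks like this:
![multithreading example](./mul_example/mul.gif)
The two calls run at the same time: the total running time is well below the 6 + 3 = 9 s that running them one after the other would take, and the order in which the two threads' output interleaves is not deterministic.
That is exactly the effect we wanted!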
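To make the timing claim above checkable, here is a minimal sketch that starts one thread per job through `kwargs` and measures the wall-clock time. The names `nap` and `run_concurrently` and the short sleep durations are illustrative assumptions, not part of the original example.

```python
import threading
import time


def nap(thread_name, seconds, rounds=3):
    """Sleep `seconds` for `rounds` rounds, reporting after each one."""
    for _ in range(rounds):
        time.sleep(seconds)
        print(f"{thread_name} slept {seconds}s")


def run_concurrently(jobs):
    """Start one thread per (name, seconds) job; return the elapsed wall time."""
    threads = [
        threading.Thread(target=nap, kwargs={"thread_name": name, "seconds": seconds})
        for name, seconds in jobs
    ]
    started = time.perf_counter()
    for t in threads:
        t.start()  # all threads begin sleeping at roughly the same moment
    for t in threads:
        t.join()   # wait for every worker before stopping the clock
    return time.perf_counter() - started


if __name__ == "__main__":
    jobs = [("thread-1", 0.2), ("thread-2", 0.1)]
    elapsed = run_concurrently(jobs)
    serial = sum(seconds * 3 for _, seconds in jobs)
    print(f"elapsed {elapsed:.2f}s, versus {serial:.2f}s if run one after another")
```

Because the threads spend their time sleeping rather than computing, the elapsed time should be close to that of the slowest job, not the sum of all jobs.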
#### [Full code for the multithreading example](./mul_example/mul_treading_exmple.py)
#### 2. Synchronization problems with threads
Multithreading has pitfalls of its own, however. The next example shows one.

Define a global counter variable `count`:
```python
>>> count = 0
```

Define the counting function `count_fun`. It is very simple: it increments the global variable `count` one million times.
```python
def count_fun():
    global count
    for i in range(1000000):
        count += 1

```
The main program is similar to the previous example, except that no extra arguments are passed.
```python
threading_list = []

t1 = threading.Thread(target=count_fun)
t2 = threading.Thread(target=count_fun)
threading_list.append(t1)
threading_list.append(t2)

list(map(lambda x: x.start(), threading_list))
list(map(lambda t: t.join(), threading_list))
print(f"Main thread finished. count is : {count}")
```

This run produced the following output:
```
Main thread finished. count is : 1550720
```
#### [Code for the synchronization problem](./mul_example/mul_questiion.py)
The result is not 2,000,000. This is the thread **synchronization problem**.
The variable is shared, and `count += 1` is a read, an add, and a write rather than a single step. Both threads can read the same value before either has written its result back, and one of the two increments is then lost.
A simple fix is to guard every operation on the shared variable with a lock.
The code looks like this.
First, create a lock object:
```python
>>> lock = threading.Lock()
```
Then take the lock whenever the shared variable is modified:
```python
def count_fun():
    global count
    for i in range(1000000):
        with lock:  # hold the lock while touching the shared variable to keep the update thread-safe
            count += 1
```
The output is now:
```
Main thread finished. count is : 2000000
```
#### [Full code for the lock example](./mul_example/mul_lock.py)
With these two building blocks, we can write a multithreaded crawler!
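The two pieces above (several worker threads plus a lock around shared state) can be combined as in the following minimal sketch. It involves no network access: `fake_fetch` is a hypothetical stand-in for an HTTP request, and `claim_offset`, `PAGE_SIZE`, and `LAST_OFFSET` are names assumed for illustration only.

```python
import threading

PAGE_SIZE = 25      # assumed number of items per page
LAST_OFFSET = 225   # assumed offset of the final page

lock = threading.Lock()
next_offset = 0
results = []


def claim_offset():
    """Hand out the next page offset under the lock, or None when none are left."""
    global next_offset
    with lock:
        if next_offset > LAST_OFFSET:
            return None
        claimed = next_offset
        next_offset += PAGE_SIZE
        return claimed


def fake_fetch(offset):
    """Hypothetical stand-in for an HTTP request; returns a label for the page."""
    return f"page starting at {offset}"


def worker():
    """Keep claiming and processing pages until every offset has been handed out."""
    while True:
        offset = claim_offset()
        if offset is None:
            break
        page = fake_fetch(offset)
        with lock:  # the result list is shared as well
            results.append((offset, page))


def crawl(thread_count=4):
    """Run `thread_count` workers to completion and return the sorted results."""
    threads = [threading.Thread(target=worker) for _ in range(thread_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(results)


if __name__ == "__main__":
    for offset, page in crawl():
        print(offset, page)
```

Reading and advancing the offset inside the same `with lock:` block is the important design choice: it guarantees that no two workers ever claim the same page.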

--------------------------------------------------------------------------------
/spider_project/multithreading/douban_top250.py:
--------------------------------------------------------------------------------
import threading
import time

import requests
from scrapy import Selector

url = "https://movie.douban.com/top250"
lock = threading.Lock()
page_offset = 0


def get_data(url):
    """
    Fetch the raw pages.
    :param url: request URL
    :return:
    """
    global page_offset
    while True:
        with lock:  # read and advance the shared offset in one locked step
            if page_offset > 225:  # past the last page
                break
            start = page_offset
            page_offset += 25
        params = {"start": start, "filter": ""}
        response = requests.get(url, params=params)
        if response.status_code == requests.codes.ok:  # check the status code
            data = response.text  # text of the response body
            print(f"URL currently being parsed: {response.url}")
            parse_data(data)

        else:
            response.raise_for_status()  # 4xx 5xx


def parse_data(data):
    """
    Parse the data.
    :param data: raw page text to parse
    :return:
    """
    se = Selector(text=data)  # initialize the selector
    print("Sleeping for 3s...")
    time.sleep(3)
    # extract the data from the page
    base_se = se.css(".grid_view .item")
    for item in base_se:
        movie_title = "".join(item.css(".hd .title::text").extract()).replace("\xa0", "")  # strip non-breaking spaces
        other_title = item.css(".hd .other::text").extract_first().replace("\xa0", "")
        description = item.css(".bd>p::text").extract_first().replace("\xa0", "").strip()
        poster = item.css(".pic img").xpath("./@src").extract_first()
        point_num = item.css(".star>.rating_num::text").extract_first()
        point_people = item.css(".star").xpath("./span[last()]/text()").re(r"\d+")[0]
        print({"movie_title": movie_title,
               "other_title": other_title,
               "description": description,
               "poster": poster,
               "point_num": point_num,
               "point_people": point_people})


def main():
    """
    Entry point.
    :return: None
    """
    print("Initializing threads")
t_list = [threading.Thread(target=get_data, kwargs={"url": url}) for _ in range(4)] # 生产线程列表 66 | list(map(lambda x: x.start(), t_list)) 67 | print("线程已经全部开始工作,等待子线程结束。。。") 68 | list(map(lambda x: x.join(), t_list)) 69 | print("所有抓取已经完成。") 70 | 71 | 72 | if __name__ == "__main__": 73 | main() 74 | -------------------------------------------------------------------------------- /spider_project/multithreading/mul_example/douban250.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Dustyposa/goSpider/4f5bdab86c71f503ac3ca7fbf5f81a615e323759/spider_project/multithreading/mul_example/douban250.gif -------------------------------------------------------------------------------- /spider_project/multithreading/mul_example/mul.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Dustyposa/goSpider/4f5bdab86c71f503ac3ca7fbf5f81a615e323759/spider_project/multithreading/mul_example/mul.gif -------------------------------------------------------------------------------- /spider_project/multithreading/mul_example/mul_lock.py: -------------------------------------------------------------------------------- 1 | import threading 2 | 3 | lock = threading.Lock() 4 | count = 0 5 | 6 | 7 | def count_fun(): 8 | global count 9 | for _ in range(1000000): 10 | with lock: # 在执行对公用变量操作时,加上锁,以此来保证线程的安全性 11 | count += 1 12 | 13 | 14 | threading_list = [] 15 | 16 | t1 = threading.Thread(target=count_fun) 17 | t2 = threading.Thread(target=count_fun) 18 | threading_list.append(t1) 19 | threading_list.append(t2) 20 | 21 | list(map(lambda x: x.start(), threading_list)) 22 | list(map(lambda t: t.join(), threading_list)) 23 | print(f"主线程运行结束。count is : {count}") 24 | 25 | -------------------------------------------------------------------------------- /spider_project/multithreading/mul_example/mul_questiion.py: 
-------------------------------------------------------------------------------- 1 | import threading 2 | 3 | count = 0 4 | 5 | 6 | def count_fun(): 7 | global count 8 | for _ in range(1000000): 9 | count += 1 10 | 11 | 12 | threading_list = [] 13 | 14 | t1 = threading.Thread(target=count_fun) 15 | t2 = threading.Thread(target=count_fun) 16 | threading_list.append(t1) 17 | threading_list.append(t2) 18 | 19 | list(map(lambda x: x.start(), threading_list)) 20 | list(map(lambda t: t.join(), threading_list)) 21 | print(f"主线程运行结束。count is : {count}") 22 | 23 | -------------------------------------------------------------------------------- /spider_project/multithreading/mul_example/mul_treading_exmple.py: -------------------------------------------------------------------------------- 1 | import threading 2 | import time 3 | 4 | 5 | def fun(thread_name, others): 6 | for _ in range(3): 7 | print(f"线程{thread_name}准备休息{others}秒") 8 | time.sleep(2) 9 | print(f"this is a treading, my name is {thread_name}") 10 | 11 | 12 | threading_list = [] 13 | 14 | t1 = threading.Thread(target=fun, args=("thread-1", 2)) 15 | t2 = threading.Thread(target=fun, args=("thread-2", 1)) 16 | threading_list.append(t1) 17 | threading_list.append(t2) 18 | input() 19 | list(map(lambda x: x.start(), threading_list)) 20 | list(map(lambda t: t.join(), threading_list)) 21 | print("主线程运行结束。") 22 | -------------------------------------------------------------------------------- /spider_project/xiecheng/hotel/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Dustyposa/goSpider/4f5bdab86c71f503ac3ca7fbf5f81a615e323759/spider_project/xiecheng/hotel/__init__.py -------------------------------------------------------------------------------- /spider_project/xiecheng/hotel/spider.py: -------------------------------------------------------------------------------- 1 | from selenium import webdriver 2 | from 
selenium.webdriver.chrome.options import Options 3 | from time import sleep 4 | import csv 5 | import random 6 | 7 | chrome_options = Options() 8 | # chrome_options.add_argument("--disable-gpu") 9 | # chrome_options.add_argument("--headless") 10 | chrome_options.add_experimental_option('excludeSwitches', ['enable-automation']) 11 | chrome_options.add_argument("loglevel='3'") 12 | chrome_options.add_argument("blink-settings=imagesEnable=false") 13 | 14 | # start_url = "https://hotels.ectrip.com/hotel/chengdu28#ctm_ref=hd_hp_hc_lst_hi_i_a_7" 15 | start_url = "https://m.ctrip.com/webapp/hotel/chengdu28#ctm_ref=hd_hp_hc_lst_hi_i_a_7" 16 | get_cookie_url = "https://m.hotels.ctrip.com/" 17 | 18 | try: 19 | f = open("findtrip.csv", "w", newline="") 20 | fileheader = ["img_link", "hotel_name", "quality_stars", "address", "hotel_tage", "price", "judge_nums", 21 | "hotel_judge", "total_judgement_score", "recommend_kw"] 22 | cr = csv.DictWriter(f, fieldnames=fileheader) 23 | cr.writeheader() 24 | script = ''' 25 | Object.defineProperty(navigator, 'webdriver', { 26 | get: () => undefined 27 | }) 28 | ''' 29 | bor = webdriver.Chrome(executable_path="/Users/dustyposa/Downloads/chromedriver", options=chrome_options) 30 | bor.execute_cdp_cmd("Page.addScriptToEvaluateOnNewDocument", {"source": script}) 31 | # bor.get(get_cookie_url) 32 | # sleep(5) 33 | bor.get(start_url) 34 | while True: 35 | sleep(random.uniform(2, 3)) 36 | item_list = bor.find_elements_by_xpath("//div[@class='hotel_new_list J_HotelListBaseCell']/ul") 37 | for item in item_list: 38 | dic = {} 39 | dic["img_link"] = item.find_element_by_xpath(".//div[@class='dpic J_as_bottom']/img").get_attribute("src") 40 | dic["img_link"] = dic["img_link"] 41 | dic["hotel_name"] = item.find_element_by_xpath(".//div[@class='dpic J_as_bottom']/img").get_attribute("alt") 42 | dic["quality_stars"] = item.find_element_by_xpath(".//span[@class='hotel_ico']/span[last()]").get_attribute( 43 | "class") 44 | dic["quality_stars"] = 
int(dic["quality_stars"][-1])
            dic["address"] = item.find_element_by_xpath(".//p[@class='hotel_item_htladdress']").text
            dic["address"] = dic["address"].split("。")[0]
            dic["hotel_tage"] = item.find_element_by_xpath(".//span[@class='special_label']").text.replace("\n", ",")
            dic["price"] = item.find_element_by_xpath(".//span[@class='J_price_lowList']").text
            dic["judge_nums"] = item.find_element_by_xpath(".//span[@class='hotel_judgement']/span").text
            dic["hotel_judge"] = item.find_element_by_xpath(".//a[@class='hotel_judge']").get_attribute("title")
            dic["total_judgement_score"] = item.find_element_by_xpath(
                ".//span[@class='total_judgement_score']/span").text
            dic["recommend_kw"] = item.find_element_by_xpath(".//span[@class='recommend']").text.replace("\n", ",")
            # print(dic)
            cr.writerow(dic)
        # find_elements_* returns an empty list when nothing matches, whereas
        # find_element_* raises, so the "no next page" check below can actually fire.
        next_links = bor.find_elements_by_xpath("//a[text()='下一页']")
        if not next_links:
            break
        # print(next_links[0])
        next_links[0].click()
        print(bor.current_url)
        # sleep(random.uniform(0.6,1.3))

except Exception as e:
    print(e)

finally:
    f.close()

--------------------------------------------------------------------------------
/spider_story/first_day.md:
--------------------------------------------------------------------------------
## Confessions of a spider

Hey, everyone. I am the legendary spider. I spend my days crawling from one network to another, and after all this time I have seen a fair bit of the world, so let me tell you about the life of us spiders.

You have probably heard the word "crawler" many times, but what exactly is a crawler? Does the name spider sound a little magical? Of course, you can also call me `spider man!`

## What is a crawler

Before answering that question, let's first look at what a crawler can do.

- Collect the data you see on web pages.
- Simulate user behavior.
- Carry out network attacks.
- ...

From this list we can see that the most important thing for a crawler is the network.

It follows that networking knowledge matters a great deal.

Networking is a large and rather scattered subject, so to learn by doing, let's start straight away with the simplest possible crawler code.

## The first task

Hey, I am off to work again today, and today's job looks fairly easy. Come along and see what my daily work is like!

Today's task is as follows:

```python
import requests

url = "http://www.baidu.com"
response = requests.get(url)
print(response.text)
```

Is it really that simple!?

It only looks simple, so let's go through this first real piece of code.

### What is a web address

**url**

What exactly is this url?

We cannot answer that directly just yet. We need a little background first: how the network (www) is put together.

#### The network

In the beginning, computers were standalone machines with no connection between them. But what if you have written an article and I want to read it?

That is why storage media such as floppy disks and USB drives appeared. You copied your article onto one and handed it to me, and I read it on my own computer.

But I keep adding comments to your article and you keep revising it, so we have to copy and paste back and forth over and over, which is far too much trouble.

So someone had an idea: if a storage medium can be connected to my computer, could my computer be connected to yours as well? And so the network card was born!

With our two computers connected, they can exchange data directly through their network cards, which is far more convenient!



--------------------------------------------------------------------------------
/tools/README.md:
--------------------------------------------------------------------------------
## Pair up with common crawler tools and get twice the result for half the effort!

This section collects commonly used crawler tools. The examples are presented as notebooks and are meant to help you get started quickly.
---
#### Requests
- [requests](#%E5%9F%BA%E6%9C%AC%E4%BD%BF%E7%94%A8)
#### Parsing
- [xpath]()
- [beautiful]()
- [pyquery]()
- [scrapy response]()
#### Frameworks
- [Scrapy]()
- [PySpider]()
- [Crawley]()
- [Portia]()
- [Newspaper]()
- [Grab]()
- [Cola]()
#### Monitoring

#### Browser simulation
- [selenium]()

--------------------------------------------------------------------------------