├── docs_en ├── .nojekyll ├── imgs │ ├── code.jpg │ ├── logo.png │ ├── sb1.jpg │ ├── sb2.jpg │ ├── sb3.jpg │ ├── sb4.jpg │ ├── change1.png │ ├── change2.png │ ├── gitee_1.jpg │ ├── gitee_2.jpg │ ├── mixpage.jpg │ ├── webpage.jpg │ ├── color_logo.png │ ├── login_gitee1.jpg │ ├── login_gitee2.jpg │ ├── login_gitee3.jpg │ ├── 20230105105418.png │ └── find_browser_path.png ├── SessionPage │ ├── get_elements.md │ ├── introduction.md │ └── create_page_object.md ├── cooperation.md ├── ChromiumPage │ ├── get_elements.md │ ├── upload_files.md │ ├── introduction.md │ └── visit_web_page.md ├── features │ └── features_demos │ │ ├── download_file.md │ │ ├── get_element_attributes.md │ │ ├── switch_mode.md │ │ ├── compare_with_requests.md │ │ └── compare_with_selenium.md ├── get_start │ ├── installation.md │ ├── examples │ │ ├── control_browser.md │ │ ├── data_packets.md │ │ └── switch_mode.md │ ├── import.md │ └── before_start.md ├── WebPage │ ├── introduction.md │ ├── webpage_function.md │ ├── mode_switch.md │ └── create_page_object.md ├── MixPage │ └── introduction.md ├── demos │ ├── login_gitee.md │ ├── douban_book_pics.md │ ├── maoyan_TOP100.md │ ├── starbucks_pics.md │ └── multithreading_with_tabs.md ├── advance │ ├── settings.md │ ├── commands.md │ ├── errors.md │ ├── accelerate_reading.md │ ├── packaging.md │ └── tools.md ├── download │ └── introduction.md ├── get_elements │ ├── simplify.md │ ├── not_found.md │ ├── cheat_sheet.md │ └── introduction.md ├── history │ └── 1.x.md ├── usage_introduction.md └── Q&A.md ├── code2.jpg ├── requirements.txt ├── MANIFEST.in ├── DrissionPage ├── version.py ├── _functions │ ├── by.py │ ├── browser.pyi │ ├── cli.py │ ├── locator.pyi │ ├── settings.py │ ├── cookies.pyi │ ├── settings.pyi │ ├── keys.pyi │ ├── tools.pyi │ └── web.pyi ├── __init__.pyi ├── items.py ├── __init__.py ├── _elements │ ├── none_element.pyi │ └── none_element.py ├── _configs │ ├── configs.ini │ └── options_manage.pyi ├── _units │ ├── console.pyi │ ├── 
screencast.pyi │ ├── console.py │ ├── cookies_setter.py │ ├── cookies_setter.pyi │ ├── scroller.py │ ├── states.pyi │ └── clicker.pyi ├── errors.py ├── common.py ├── _pages │ ├── chromium_tab.py │ ├── chromium_tab.pyi │ └── chromium_page.py └── _base │ └── driver.pyi ├── .github └── FUNDING.yml ├── setup.py ├── .gitignore ├── LICENSE └── README.md /docs_en/.nojekyll: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /code2.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/g1879/DrissionPage/HEAD/code2.jpg -------------------------------------------------------------------------------- /docs_en/imgs/code.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/g1879/DrissionPage/HEAD/docs_en/imgs/code.jpg -------------------------------------------------------------------------------- /docs_en/imgs/logo.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/g1879/DrissionPage/HEAD/docs_en/imgs/logo.png -------------------------------------------------------------------------------- /docs_en/imgs/sb1.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/g1879/DrissionPage/HEAD/docs_en/imgs/sb1.jpg -------------------------------------------------------------------------------- /docs_en/imgs/sb2.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/g1879/DrissionPage/HEAD/docs_en/imgs/sb2.jpg -------------------------------------------------------------------------------- /docs_en/imgs/sb3.jpg: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/g1879/DrissionPage/HEAD/docs_en/imgs/sb3.jpg -------------------------------------------------------------------------------- /docs_en/imgs/sb4.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/g1879/DrissionPage/HEAD/docs_en/imgs/sb4.jpg -------------------------------------------------------------------------------- /docs_en/imgs/change1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/g1879/DrissionPage/HEAD/docs_en/imgs/change1.png -------------------------------------------------------------------------------- /docs_en/imgs/change2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/g1879/DrissionPage/HEAD/docs_en/imgs/change2.png -------------------------------------------------------------------------------- /docs_en/imgs/gitee_1.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/g1879/DrissionPage/HEAD/docs_en/imgs/gitee_1.jpg -------------------------------------------------------------------------------- /docs_en/imgs/gitee_2.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/g1879/DrissionPage/HEAD/docs_en/imgs/gitee_2.jpg -------------------------------------------------------------------------------- /docs_en/imgs/mixpage.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/g1879/DrissionPage/HEAD/docs_en/imgs/mixpage.jpg -------------------------------------------------------------------------------- /docs_en/imgs/webpage.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/g1879/DrissionPage/HEAD/docs_en/imgs/webpage.jpg 
-------------------------------------------------------------------------------- /docs_en/imgs/color_logo.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/g1879/DrissionPage/HEAD/docs_en/imgs/color_logo.png -------------------------------------------------------------------------------- /docs_en/imgs/login_gitee1.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/g1879/DrissionPage/HEAD/docs_en/imgs/login_gitee1.jpg -------------------------------------------------------------------------------- /docs_en/imgs/login_gitee2.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/g1879/DrissionPage/HEAD/docs_en/imgs/login_gitee2.jpg -------------------------------------------------------------------------------- /docs_en/imgs/login_gitee3.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/g1879/DrissionPage/HEAD/docs_en/imgs/login_gitee3.jpg -------------------------------------------------------------------------------- /docs_en/imgs/20230105105418.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/g1879/DrissionPage/HEAD/docs_en/imgs/20230105105418.png -------------------------------------------------------------------------------- /docs_en/imgs/find_browser_path.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/g1879/DrissionPage/HEAD/docs_en/imgs/find_browser_path.png -------------------------------------------------------------------------------- /docs_en/SessionPage/get_elements.md: -------------------------------------------------------------------------------- 1 | 🚄 Search for Elements 2 | --- 3 | 4 | Please refer to the "Search for Elements" section. 
5 | 6 | -------------------------------------------------------------------------------- /docs_en/cooperation.md: -------------------------------------------------------------------------------- 1 | --- 2 | id: cooper 3 | title: Business cooperation 4 | --- 5 | 6 | Undertake automated orders, contact QQ:178691442 -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | requests 2 | lxml 3 | cssselect 4 | DownloadKit>=2.0.7 5 | websocket-client 6 | click 7 | tldextract>=3.4.4 8 | psutil -------------------------------------------------------------------------------- /docs_en/ChromiumPage/get_elements.md: -------------------------------------------------------------------------------- 1 | 🚤 Find Elements 2 | --- 3 | 4 | Please refer to the "[Find Elements](https://g1879.gitee.io/drissionpagedocs/get_elements/get_ele_intro)" section. 5 | 6 | -------------------------------------------------------------------------------- /MANIFEST.in: -------------------------------------------------------------------------------- 1 | include DrissionPage/_configs/configs.ini 2 | include DrissionPage/_functions/suffixes.dat 3 | include DrissionPage/*.pyi 4 | include DrissionPage/*/*.py 5 | include DrissionPage/*/*.pyi -------------------------------------------------------------------------------- /DrissionPage/version.py: -------------------------------------------------------------------------------- 1 | # -*- coding:utf-8 -*- 2 | """ 3 | @Author : g1879 4 | @Contact : g1879@qq.com 5 | @Website : https://DrissionPage.cn 6 | @Copyright: (c) 2020 by g1879, Inc. All Rights Reserved. 
7 | """ 8 | __version__ = '4.1.1.2' 9 | -------------------------------------------------------------------------------- /docs_en/features/features_demos/download_file.md: -------------------------------------------------------------------------------- 1 | # Download Files 2 | 3 | DrissionPage comes with a convenient downloader that allows you to easily download files with just one line of code. 4 | 5 | ```python 6 | from DrissionPage import WebPage 7 | 8 | url = 'https://www.baidu.com/img/flexible/logo/pc/result.png' 9 | save_path = r'C:\download' 10 | 11 | page = WebPage('s') 12 | page.download(url, save_path) 13 | ``` 14 | 15 | -------------------------------------------------------------------------------- /DrissionPage/_functions/by.py: -------------------------------------------------------------------------------- 1 | # -*- coding:utf-8 -*- 2 | """ 3 | @Author : g1879 4 | @Contact : g1879@qq.com 5 | @Website : https://DrissionPage.cn 6 | @Copyright: (c) 2020 by g1879, Inc. All Rights Reserved. 7 | """ 8 | 9 | 10 | class By: 11 | ID = 'id' 12 | XPATH = 'xpath' 13 | LINK_TEXT = 'link text' 14 | PARTIAL_LINK_TEXT = 'partial link text' 15 | NAME = 'name' 16 | TAG_NAME = 'tag name' 17 | CLASS_NAME = 'class name' 18 | CSS_SELECTOR = 'css selector' 19 | -------------------------------------------------------------------------------- /DrissionPage/__init__.pyi: -------------------------------------------------------------------------------- 1 | # -*- coding:utf-8 -*- 2 | """ 3 | @Author : g1879 4 | @Contact : g1879@qq.com 5 | @Website : https://DrissionPage.cn 6 | @Copyright: (c) 2020 by g1879, Inc. All Rights Reserved. 
7 | """ 8 | from ._base.chromium import Chromium 9 | from ._configs.chromium_options import ChromiumOptions 10 | from ._configs.session_options import SessionOptions 11 | from ._pages.chromium_page import ChromiumPage 12 | from ._pages.session_page import SessionPage 13 | from ._pages.web_page import WebPage 14 | from .version import __version__ 15 | 16 | 17 | __all__ = ['WebPage', 'ChromiumPage', 'Chromium', 'ChromiumOptions', 'SessionOptions', 'SessionPage', '__version__'] 18 | -------------------------------------------------------------------------------- /docs_en/ChromiumPage/upload_files.md: -------------------------------------------------------------------------------- 1 | 🚤 File Upload 2 | --- 3 | 4 | There are two ways to upload a file: 5 | 6 | - Find the `<input>` element and enter the file path. 7 | 8 | - Intercept the file input box and automatically fill in the path. 9 | 10 | ## ✅️️ Traditional Method 11 | 12 | The first method is the traditional one: find the file upload control in the DOM and use the `input()` method of the element object to enter the path. 13 | 14 | The file upload control is an `<input>` element with the `type` attribute set to `'file'`, and the file path can be entered into the element. Its usage is the same as entering text. 15 | 16 | The only difference is that 17 | 18 | -------------------------------------------------------------------------------- /docs_en/get_start/installation.md: -------------------------------------------------------------------------------- 1 | 🌏 Installation 2 | --- 3 | 4 | ## ✅️️ System Requirements 5 | 6 | Operating System: Windows, Linux, or macOS. 7 | 8 | Python Version: 3.6 and above. 9 | 10 | Supported Browsers: Chromium-based browsers (such as Chrome and Edge).
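The Python requirement above can be checked before installing. The snippet below is only a minimal sketch: the helper name `python_is_supported` and the constant `MIN_PYTHON` are made up for illustration and are not part of DrissionPage.

```python
import sys

# Minimum version taken from the "Python Version: 3.6 and above" requirement.
MIN_PYTHON = (3, 6)


def python_is_supported(version_info=None):
    """Return True if the given (or the running) interpreter is new enough.

    `version_info` is any sequence whose first two items are major and minor;
    when omitted, the running interpreter's version is used.
    """
    source = sys.version_info if version_info is None else version_info
    major_minor = tuple(source)[:2]
    return major_minor >= MIN_PYTHON


print('Python version is supported:', python_is_supported())
```

If the check prints `False`, upgrade Python before running `pip install DrissionPage`.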
11 | 12 | --- 13 | 14 | ## ✅️️ Installation 15 | 16 | Please use pip to install DrissionPage: 17 | 18 | ```shell 19 | pip install DrissionPage 20 | ``` 21 | 22 | --- 23 | 24 | ## ✅️️ Upgrading 25 | 26 | ### 📌 Upgrade to the Latest Stable Version 27 | 28 | ```shell 29 | pip install DrissionPage --upgrade 30 | ``` 31 | 32 | --- 33 | 34 | ### 📌 Upgrade to a Specific Version 35 | 36 | ```shell 37 | pip install DrissionPage==4.0.0b17 38 | ``` 39 | 40 | -------------------------------------------------------------------------------- /DrissionPage/items.py: -------------------------------------------------------------------------------- 1 | # -*- coding:utf-8 -*- 2 | """ 3 | @Author : g1879 4 | @Contact : g1879@qq.com 5 | @Website : https://DrissionPage.cn 6 | @Copyright: (c) 2020 by g1879, Inc. All Rights Reserved. 7 | """ 8 | from ._elements.chromium_element import ChromiumElement, ShadowRoot 9 | from ._elements.none_element import NoneElement 10 | from ._elements.session_element import SessionElement 11 | from ._pages.chromium_frame import ChromiumFrame 12 | from ._pages.chromium_tab import ChromiumTab 13 | from ._pages.mix_tab import MixTab 14 | from ._pages.mix_tab import MixTab as WebPageTab 15 | 16 | __all__ = ['ChromiumElement', 'ShadowRoot', 'NoneElement', 'SessionElement', 'ChromiumFrame', 'ChromiumTab', 17 | 'MixTab', 'WebPageTab'] 18 | -------------------------------------------------------------------------------- /docs_en/features/features_demos/get_element_attributes.md: -------------------------------------------------------------------------------- 1 | ⭐ Get Element Attribute 2 | --- 3 | 4 | ```python 5 | # Continuing from previous code 6 | foot = page.ele('#footer-left') # Find element by id 7 | first_col = foot.ele('css:>div') # Find element within the subordinates using css selector (the first one) 8 | lnk = first_col.ele('text:命令学') # Find element using text content 9 | text = lnk.text # Get element text 10 | href = lnk.attr('href') # Get element 
attribute value 11 | 12 | print(text, href, '\n') 13 | 14 | # Concise chaining mode 15 | text = page('@id:footer-left')('css:>div')('text:命令学').text 16 | print(text) 17 | ``` 18 | 19 | **Output:** 20 | 21 | ```shell 22 | Learn Git Command https://oschina.gitee.io/learn-git-branching/ 23 | 24 | Learn Git Command 25 | ``` 26 | 27 | -------------------------------------------------------------------------------- /docs_en/WebPage/introduction.md: -------------------------------------------------------------------------------- 1 | 🛸 Overview 2 | --- 3 | 4 | The `WebPage` object integrates `SessionPage` and `ChromiumPage`, enabling communication between the two. 5 | 6 | It can control the browser and send/receive data packets, and synchronizes login information between the two. 7 | 8 | It has two modes: d and s, corresponding to controlling the browser and sending/receiving data packets, respectively. 9 | 10 | `WebPage` can flexibly switch between these two modes, allowing for interesting use cases. 11 | 12 | For example, if the website login code is very complex and using data packets is too complicated, we can use the browser to handle the login and then switch to the data packet mode to collect data. 13 | 14 | The logic for using both modes is the same and there is no difference compared to `ChromiumPage`, making it easy to get started. 15 | 16 | Diagram of the `WebPage` structure: 17 | 18 | ![](../imgs/webpage.jpg) 19 | 20 | -------------------------------------------------------------------------------- /docs_en/features/features_demos/switch_mode.md: -------------------------------------------------------------------------------- 1 | ⭐ Mode Switch 2 | --- 3 | 4 | Log in to the website using a browser and switch to reading the webpage with requests. They will share login information. 
5 | 6 | ```python 7 | from DrissionPage import WebPage 8 | from time import sleep 9 | 10 | # Create a page object with the default d mode 11 | page = WebPage() 12 | # Visit the personal center page (not logged in, redirect to the login page) 13 | page.get('https://gitee.com/profile') 14 | 15 | # Enter the account password to log in 16 | page.ele('@id:user_login').input('your_user_name') 17 | page.ele('@id:user_password').input('your_password\n') 18 | page.wait.load_start() 19 | 20 | # Switch to the s mode 21 | page.change_mode() 22 | # Output of session mode after login 23 | print('Logged in title:', page.title, '\n') 24 | ``` 25 | 26 | **Output:** 27 | 28 | ```shell 29 | Logged in title: Personal Information - Gitee.com 30 | ``` 31 | 32 | -------------------------------------------------------------------------------- /.github/FUNDING.yml: -------------------------------------------------------------------------------- 1 | # These are supported funding model platforms 2 | 3 | github: # Replace with up to 4 GitHub Sponsors-enabled usernames e.g., [user1, user2] 4 | patreon: # Replace with a single Patreon username 5 | open_collective: # Replace with a single Open Collective username 6 | ko_fi: # Replace with a single Ko-fi username 7 | tidelift: # Replace with a single Tidelift platform-name/package-name e.g., npm/babel 8 | community_bridge: # Replace with a single Community Bridge project-name e.g., cloud-foundry 9 | liberapay: # Replace with a single Liberapay username 10 | issuehunt: # Replace with a single IssueHunt username 11 | lfx_crowdfunding: # Replace with a single LFX Crowdfunding project-name e.g., cloud-foundry 12 | polar: # Replace with a single Polar username 13 | buy_me_a_coffee: # Replace with a single Buy Me a Coffee username 14 | thanks_dev: # Replace with a single thanks.dev username 15 | custom: # Replace with up to 4 custom sponsorship URLs e.g., ['link1', 'link2'] 16 | 
-------------------------------------------------------------------------------- /DrissionPage/__init__.py: -------------------------------------------------------------------------------- 1 | # -*- coding:utf-8 -*- 2 | """ 3 | @Author : g1879 4 | @Contact : g1879@qq.com 5 | @Website : https://DrissionPage.cn 6 | @Copyright: (c) 2020 by g1879, Inc. All Rights Reserved. 7 | 8 | Anyone may use or distribute the source code of this project in a personal capacity, but only for learning and lawful non-profit purposes. 9 | Individuals or organizations may not use this project, in source or binary form, for commercial purposes without authorization from the copyright holder. 10 | 11 | Use of this project is subject to the following terms; if any of them is violated during use, the authorization is automatically revoked. 12 | * DrissionPage must not be used in any project that may violate local laws or ethical constraints 13 | * DrissionPage must not be used in any project that may harm the interests of others 14 | * DrissionPage must not be used for attacks or harassment 15 | * Comply with the Robots protocol; DrissionPage must not be used to collect data that the law or a system's Robots protocol does not permit 16 | 17 | Users are solely responsible for everything they do with DrissionPage. 18 | The copyright holder has no involvement in any dispute or consequence arising from any use of DrissionPage, 19 | and bears none of the risks or losses resulting from the use of DrissionPage. 20 | The copyright holder accepts no liability for any loss caused by defects that may exist in DrissionPage. 21 | """ 22 | from ._base.chromium import Chromium 23 | from ._configs.chromium_options import ChromiumOptions 24 | from ._configs.session_options import SessionOptions 25 | from ._pages.chromium_page import ChromiumPage 26 | from ._pages.session_page import SessionPage 27 | from ._pages.web_page import WebPage 28 | from .version import __version__ 29 | -------------------------------------------------------------------------------- /DrissionPage/_elements/none_element.pyi: -------------------------------------------------------------------------------- 1 | # -*- coding:utf-8 -*- 2 | """ 3 | @Author : g1879 4 | @Contact : g1879@qq.com 5 | @Website : https://DrissionPage.cn 6 | @Copyright: (c) 2020 by g1879, Inc. All Rights Reserved. 7 | """ 8 | from typing import Any, Optional 9 | 10 | from .._base.base import BasePage 11 | 12 | 13 | class NoneElement(object): 14 | _none_ele_value: Any = ... 15 | _none_ele_return_value: Any = ... 16 | method: Optional[str] = ... 17 | args: Optional[dict] = ... 
18 | 19 | def __init__(self, 20 | page: BasePage = None, 21 | method: str = None, 22 | args: dict = None): 23 | """ 24 | :param page: page that contains the element 25 | :param method: method used to find the element 26 | :param args: arguments used to find the element 27 | """ 28 | ... 29 | 30 | def __call__(self, *args, **kwargs) -> NoneElement: ... 31 | 32 | def __repr__(self) -> str: ... 33 | 34 | def __getattr__(self, item: str) -> str: ... 35 | 36 | def __eq__(self, other: Any) -> bool: ... 37 | 38 | def __bool__(self) -> bool: ... 39 | -------------------------------------------------------------------------------- /DrissionPage/_configs/configs.ini: -------------------------------------------------------------------------------- 1 | [paths] 2 | download_path = 3 | tmp_path = 4 | 5 | [chromium_options] 6 | address = 127.0.0.1:9222 7 | browser_path = chrome 8 | arguments = ['--no-default-browser-check', '--disable-suggestions-ui', '--no-first-run', '--disable-infobars', '--disable-popup-blocking', '--hide-crash-restore-bubble', '--disable-features=PrivacySandboxSettings4'] 9 | extensions = [] 10 | prefs = {'profile.default_content_settings.popups': 0, 'profile.default_content_setting_values': {'notifications': 2}} 11 | flags = {} 12 | load_mode = normal 13 | user = Default 14 | auto_port = False 15 | system_user_path = False 16 | existing_only = False 17 | new_env = False 18 | 19 | [session_options] 20 | headers = {'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/603.3.8 (KHTML, like Gecko) Version/10.1.2 Safari/603.3.8', 'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8', 'connection': 'keep-alive', 'accept-charset': 'GB2312,utf-8;q=0.7,*;q=0.7'} 21 | 22 | [timeouts] 23 | base = 10 24 | page_load = 30 25 | script = 30 26 | 27 | [proxies] 28 | http = 29 | https = 30 | 31 | [others] 32 | retry_times = 3 33 | retry_interval = 2 34 | -------------------------------------------------------------------------------- /docs_en/features/features_demos/compare_with_requests.md: 
-------------------------------------------------------------------------------- 1 | ⭐ Comparison with requests 2 | --- 3 | 4 | The following snippets achieve the same result with each library, so you can compare the amount of code required: 5 | 6 | 🔸 Get element content 7 | 8 | ```python 9 | url = 'https://baike.baidu.com/item/python' 10 | 11 | # Using requests: 12 | import requests 13 | from lxml import etree 14 | headers = {'User-Agent':'Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.118 Safari/537.36'} 15 | response = requests.get(url, headers=headers) 16 | html = etree.HTML(response.text) 17 | element = html.xpath('//h1')[0] 18 | title = element.text 19 | 20 | # Using DrissionPage: 21 | from DrissionPage import WebPage 22 | page = WebPage('s') 23 | page.get(url) 24 | title = page('tag:h1').text 25 | ``` 26 | 27 | :::tip Tips 28 | DrissionPage comes with default headers 29 | ::: 30 | 31 | 🔸 Download file 32 | 33 | ```python 34 | url = 'https://www.baidu.com/img/flexible/logo/pc/result.png' 35 | save_path = r'C:\download' 36 | 37 | # Using requests: 38 | r = requests.get(url) 39 | with open(f'{save_path}\\img.png', 'wb') as fd: 40 | for chunk in r.iter_content(): 41 | fd.write(chunk) 42 | 43 | # Using DrissionPage: 44 | page.download(url, save_path, 'img') # Supports renaming, handles filename conflicts 45 | ``` 46 | 47 | -------------------------------------------------------------------------------- /docs_en/MixPage/introduction.md: -------------------------------------------------------------------------------- 1 | 🛠 Old Version (MixPage) 2 | --- 3 | 4 | Versions of this library prior to 3.0 were implemented as a wrapper around selenium. 5 | 6 | The page objects for this version are `MixPage` and `DriverPage`, corresponding to `WebPage` and `ChromiumPage` of DrissionPage. The usage is basically the same as the new version. 7 | 8 | After years of use, the old version has become quite stable. 
However, due to reliance on selenium, the development of functions has been greatly restricted. Moreover, with the iteration of versions, the new version has surpassed the old version comprehensively, and it is time for the old version to retire. 9 | 10 | Therefore, starting from version 3.0, the old version code has been separated from this repository and developed into an independent library. 11 | 12 | This is to commemorate the achievements it has made. 13 | 14 | Currently, the development of the old version has been frozen. Except for bug fixes, there will be no more functional modifications for the old version. 15 | 16 | Interested readers can take a look. 17 | 18 | --- 19 | 20 | Project address: [MixPage](https://gitee.com/g1879/MixPage) 21 | 22 | Documentation: [MixPage User Manual](http://g1879.gitee.io/mixpage) 23 | 24 | --- 25 | 26 | The structure of `MixPage` is as follows: 27 | 28 | ![](../imgs/mixpage.jpg) 29 | 30 | -------------------------------------------------------------------------------- /setup.py: -------------------------------------------------------------------------------- 1 | # -*- coding:utf-8 -*- 2 | from setuptools import setup, find_packages 3 | from DrissionPage import __version__ 4 | 5 | with open("README.md", "r", encoding='utf-8') as fh: 6 | long_description = fh.read() 7 | 8 | setup( 9 | name="DrissionPage", 10 | version=__version__, 11 | author="g1879", 12 | author_email="g1879@qq.com", 13 | description="Python based web automation tool. 
It can control the browser and send and receive data packets.", 14 | long_description=long_description, 15 | long_description_content_type="text/markdown", 16 | # license="BSD", 17 | keywords="DrissionPage", 18 | url="https://DrissionPage.cn", 19 | include_package_data=True, 20 | packages=find_packages(), 21 | zip_safe=False, 22 | install_requires=[ 23 | 'lxml', 24 | 'requests', 25 | 'cssselect', 26 | 'DownloadKit>=2.0.7', 27 | 'websocket-client', 28 | 'click', 29 | 'tldextract>=3.4.4', 30 | 'psutil' 31 | ], 32 | classifiers=[ 33 | "Programming Language :: Python :: 3.6", 34 | "Development Status :: 4 - Beta", 35 | "Topic :: Utilities", 36 | # "License :: OSI Approved :: BSD License", 37 | ], 38 | python_requires='>=3.6', 39 | entry_points={ 40 | 'console_scripts': [ 41 | 'dp = DrissionPage._functions.cli:main', 42 | ], 43 | }, 44 | ) 45 | -------------------------------------------------------------------------------- /DrissionPage/_functions/browser.pyi: -------------------------------------------------------------------------------- 1 | # -*- coding:utf-8 -*- 2 | """ 3 | @Author : g1879 4 | @Contact : g1879@qq.com 5 | @Website : https://DrissionPage.cn 6 | @Copyright: (c) 2020 by g1879, Inc. All Rights Reserved. 7 | """ 8 | from typing import Union 9 | 10 | from .._configs.chromium_options import ChromiumOptions 11 | 12 | 13 | def connect_browser(option: ChromiumOptions) -> bool: 14 | """Connect to or launch the browser 15 | :param option: ChromiumOptions object 16 | :return: whether an existing browser was taken over 17 | """ 18 | ... 19 | 20 | 21 | def get_launch_args(opt: ChromiumOptions) -> list: 22 | """Get the command-line launch arguments from ChromiumOptions 23 | :param opt: ChromiumOptions 24 | :return: list of launch arguments 25 | """ 26 | ... 27 | 28 | 29 | def set_prefs(opt: ChromiumOptions) -> None: 30 | """Process the prefs item of the launch configuration; currently this only works for an existing folder 31 | :param opt: ChromiumOptions 32 | :return: None 33 | """ 34 | ... 35 | 36 | 37 | def set_flags(opt: ChromiumOptions) -> None: 38 | """Process the flags item of the launch configuration 39 | :param opt: ChromiumOptions 40 | :return: None 41 | """ 42 | ... 
43 | 44 | 45 | def test_connect(ip: str, port: Union[int, str], timeout: float = 30) -> bool: 46 | """Test whether the browser is available 47 | :param ip: browser IP 48 | :param port: browser port 49 | :param timeout: timeout in seconds 50 | :return: whether the browser is available 51 | """ 52 | ... 53 | 54 | 55 | def get_chrome_path(ini_path: str) -> Union[str, None]: 56 | """Get the path of the chrome executable from the ini file or system variables 57 | :param ini_path: path to the ini file 58 | :return: file path 59 | """ 60 | ... 61 | -------------------------------------------------------------------------------- /docs_en/demos/login_gitee.md: -------------------------------------------------------------------------------- 1 | 🌠 Gitee Auto Login 2 | --- 3 | 4 | This example demonstrates how to log in to the Gitee website automatically by controlling the browser. 5 | 6 | ## ✅️️ Web Analysis 7 | 8 | URL: https://gitee.com/login 9 | 10 | ![](../imgs/login_gitee1.jpg) 11 | 12 | Press `F12` to view the code, and you can see that both input boxes can be located using the `id` attribute, as shown in the image. 13 | 14 | ![](../imgs/login_gitee2.jpg) 15 | 16 | --- 17 | 18 | ## ✅️️ Coding Idea 19 | 20 | Elements with the `id` attribute are easy to locate. Both input boxes can be directly located using the `id` attribute. 21 | The login button does not have an `id` attribute, but it is the first element with the `value` attribute set to `'登 录'`, so it can be located by that Chinese text, which also keeps the code readable. 22 | 23 | Since we are using a browser for logging in, we will use `ChromiumPage` to control the browser. 
24 | 25 | --- 26 | 27 | ## ✅️️ Sample Code 28 | 29 | ```python 30 | from DrissionPage import ChromiumPage 31 | 32 | # Create a page object in 'd' mode (default mode) 33 | page = ChromiumPage() 34 | # Navigate to the login page 35 | page.get('https://gitee.com/login') 36 | 37 | # Locate the account input box and enter the account 38 | page.ele('#user_login').input('Your account') 39 | # Locate the password input box and enter the password 40 | page.ele('#user_password').input('Your password') 41 | 42 | # Click the login button 43 | page.ele('@value=登 录').click() 44 | ``` 45 | 46 | --- 47 | 48 | ## ✅️️ Result 49 | 50 | Login successful. 51 | 52 | ![](../imgs/login_gitee3.jpg) 53 | 54 | -------------------------------------------------------------------------------- /DrissionPage/_functions/cli.py: -------------------------------------------------------------------------------- 1 | # -*- coding:utf-8 -*- 2 | """ 3 | @Author : g1879 4 | @Contact : g1879@qq.com 5 | @Website : https://DrissionPage.cn 6 | @Copyright: (c) 2020 by g1879, Inc. All Rights Reserved. 
7 | """ 8 | from click import command, option 9 | 10 | from .._functions.tools import configs_to_here as ch 11 | from .._configs.chromium_options import ChromiumOptions 12 | from .._pages.chromium_page import ChromiumPage 13 | 14 | 15 | @command() 16 | @option("-p", "--set-browser-path", help="Set the browser path") 17 | @option("-u", "--set-user-path", help="Set the user data path") 18 | @option("-c", "--configs-to-here", is_flag=True, help="Copy the default configuration file to the current path") 19 | @option("-l", "--launch-browser", default=-1, help="Launch the browser on the given port; 0 uses the value from the configuration file") 20 | def main(set_browser_path, set_user_path, configs_to_here, launch_browser): 21 | if set_browser_path: 22 | set_paths(browser_path=set_browser_path) 23 | 24 | if set_user_path: 25 | set_paths(user_data_path=set_user_path) 26 | 27 | if configs_to_here: 28 | ch() 29 | 30 | if launch_browser >= 0: 31 | port = f'127.0.0.1:{launch_browser}' if launch_browser else None 32 | ChromiumPage(port) 33 | 34 | 35 | def set_paths(browser_path=None, user_data_path=None): 36 | """Shortcut function for setting paths 37 | :param browser_path: path to the browser executable 38 | :param user_data_path: path to the user data 39 | :return: None 40 | """ 41 | co = ChromiumOptions() 42 | 43 | if browser_path is not None: 44 | co.set_browser_path(browser_path) 45 | 46 | if user_data_path is not None: 47 | co.set_user_data_path(user_data_path) 48 | 49 | co.save() 50 | 51 | 52 | if __name__ == '__main__': 53 | main() 54 | -------------------------------------------------------------------------------- /docs_en/demos/douban_book_pics.md: -------------------------------------------------------------------------------- 1 | 🌠 Download Douban Book Covers 2 | --- 3 | 4 | The Starbucks example uses the `download()` method to download images. This example demonstrates how to save images directly from the browser. 5 | 6 | This feature is a highlight of this library. It does not require any UI operations or re-downloading of images. Instead, it directly reads and saves images from the cache, making it very convenient to use. 
7 | 8 | ## ✅️️ Page Analysis 9 | 10 | Target URL: [https://book.douban.com/tag/小说](https://book.douban.com/tag/%E5%B0%8F%E8%AF%B4) 11 | 12 | By pressing `F12`, you can see that each book is contained in an element with the `class` attribute set to `subject-item`. You can retrieve these elements in a batch, then get the `<img>` element inside each one to save the image. 13 | 14 | --- 15 | 16 | ## ✅️️ Coding Approach 17 | 18 | To demonstrate the `save()` method of the element object, we will use browser operations to save the image files to the local `imgs` folder. 19 | 20 | --- 21 | 22 | ## ✅️️ Example Code 23 | 24 | The following code can be run directly. 25 | 26 | ```python 27 | from DrissionPage import ChromiumPage 28 | 29 | # Create a page object 30 | page = ChromiumPage() 31 | # Visit the target webpage 32 | page.get('https://book.douban.com/tag/小说?start=0&type=T') 33 | 34 | # Scrape 4 pages 35 | for _ in range(4): 36 | # Iterate through all the books on a single page 37 | for book in page.eles('.subject-item'): 38 | # Get the cover image object 39 | img = book('t:img') 40 | # Save the image 41 | img.save(r'.\imgs') 42 | 43 | # Click the next page 44 | page('后页>').click() 45 | page.wait.load_start() 46 | ``` 47 | 48 | --- 49 | 50 | ## ✅️️ Result 51 | 52 | ![](../imgs/20230105105418.png) 53 | 54 | -------------------------------------------------------------------------------- /docs_en/advance/settings.md: -------------------------------------------------------------------------------- 1 | ⚙️ Global Settings 2 | --- 3 | 4 | There are some global settings at runtime that can control certain behaviors of the program. 5 | 6 | ## ✅️️ Usage 7 | 8 | Global settings are located in the `DrissionPage.common` module. 9 | 10 | Use assignment to modify the properties of the `Settings` object.
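To see why plain assignment is enough, the pattern can be sketched in a few lines. This is only an illustration of class attributes acting as process-wide switches, not DrissionPage's actual implementation; `DemoSettings`, `raise_when_not_found`, and `find_item` are hypothetical names invented for this sketch.

```python
class DemoSettings:
    """Hypothetical stand-in for a global settings holder (not the real Settings class)."""
    raise_when_not_found = False


def find_item(items, key):
    """Return items[key]; on a miss, return None or raise, depending on the switch."""
    if key in items:
        return items[key]
    if DemoSettings.raise_when_not_found:
        raise KeyError(f'{key!r} not found')
    return None


data = {'a': 1}

# Switch is off (the default): a miss quietly returns None
first = find_item(data, 'b')

# Plain assignment on the class changes behavior for every later call
DemoSettings.raise_when_not_found = True
try:
    find_item(data, 'b')
    raised = False
except KeyError:
    raised = True
```

Because the switch lives on the class rather than on an instance, every caller that reads it sees the new value without any object being passed around.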
11 | 12 | Usage: 13 | 14 | ```python 15 | from DrissionPage.common import Settings 16 | 17 | Settings.raise_when_wait_failed = True 18 | ``` 19 | 20 | --- 21 | 22 | ## ✅️️ Settings Options 23 | 24 | ### 📌 `raise_when_ele_not_found` 25 | 26 | Sets whether to raise an exception when an element is not found. Default is `False`. 27 | 28 | --- 29 | 30 | ### 📌 `raise_when_click_failed` 31 | 32 | Sets whether to raise an exception when clicking fails. Default is `False`. 33 | 34 | --- 35 | 36 | ### 📌 `raise_when_wait_failed` 37 | 38 | Sets whether to raise an exception when waiting fails. Default is `False`. 39 | 40 | --- 41 | 42 | ### 📌 `singleton_tab_obj` 43 | 44 | Sets whether the Tab object should use the singleton pattern. Default is `True`. 45 | 46 | --- 47 | 48 | ## ✅️️ Examples 49 | 50 | This example configures the library to raise an exception immediately when an element is not found (instead of returning `NoneElement`). 51 | 52 | You can execute it directly to see the effect. 53 | 54 | ```python 55 | from DrissionPage import SessionPage 56 | from DrissionPage.common import Settings 57 | 58 | Settings.raise_when_ele_not_found = True 59 | 60 | page = SessionPage() 61 | page.get('https://www.baidu.com') 62 | ele = page('#abcd') 63 | ``` 64 | 65 | **Output:** 66 | 67 | ```shell 68 | ...omitted... 69 | DrissionPage.errors.ElementNotFoundError: 70 | Element not found. 71 | method: ele() 72 | args: {'locator': '#abcd'} 73 | ``` 74 | 75 | -------------------------------------------------------------------------------- /docs_en/download/introduction.md: -------------------------------------------------------------------------------- 1 | ⤵️ Overview 2 | --- 3 | 4 | DrissionPage provides powerful file download management capabilities. 5 | 6 | It can actively initiate download tasks and also manage download tasks triggered by the browser.
7 | 8 | ## ✅️️ `download()` method 9 | 10 | This method can actively initiate download tasks and provide features such as task management, multi-threading, large file chunking, automatic reconnection, and file name conflict handling. 11 | 12 | This method is supported by page objects, tab objects, and `