├── LICENSE
├── README.md
├── multithread
    └── __init__.py
├── requirements.txt
└── setup.py


/LICENSE:
--------------------------------------------------------------------------------
Copyright (c) 2019 DashLt

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# I made this years ago and you probably shouldn't use it. Keeping up for posterity.

# multithread

Multithread is an optionally asynchronous Python library for downloading files using several threads.

# Features

* Lightweight: one file, a little over 100 lines of code excluding license
* Extensive: the ability to pass your own sessions and your own arguments to each request means you don't need to wait for your desired feature to be implemented; anything you can do in aiohttp, you can do in multithread!
* Fast: benefit from the speed of aiohttp and multithreaded downloading!

# Installation

Requirements:

* Python 3.6+ (the code uses f-strings)
* aiohttp
* Tinche/aiofiles
* optional: tqdm

Use the package manager [pip](https://pip.pypa.io/en/stable/) to install multithread.
Progress bars are enabled by default and require tqdm: install multithread[progress] to get it, or pass `progress_bar=False`.

```bash
pip3 install multithread
```

# Usage

```python
import multithread

filename = "file"  # path to write the download to

download_object = multithread.Downloader("http://url.com/file", filename)
download_object.start()

# passing headers (you can pass any other arguments that aiohttp.ClientSession.request can take as well)
download_object = multithread.Downloader("http://url.com/file", filename, aiohttp_args={"headers": {"a": "b", "c": "d"}})
download_object.start()
```

# Documentation

# Downloader
```python
Downloader(self, url, file, threads=4, session=None, progress_bar=True, aiohttp_args={'method': 'GET'}, create_dir=True)
```

An optionally asynchronous multi-threaded downloader class using aiohttp

Attributes:

- url (str): The URL to download
- file (str or path-like object): The filename to write the download to.
- threads (int): The number of threads to use to download
- session (aiohttp.ClientSession): An existing session to use with aiohttp
- new_session (bool): True if a session was not passed, and the downloader created a new one
- progress_bar (bool): Whether to output a progress bar or not
- aiohttp_args (dict): Arguments to be passed in each aiohttp request.
  If you supply a Range header using this, it will be overwritten in fetch()

## \_\_init\_\_
```python
Downloader.__init__(self, url, file, threads=4, session=None, progress_bar=True, aiohttp_args={'method': 'GET'}, create_dir=True)
```
Assigns arguments to self for when asyncstart() or start() calls download.

All arguments are assigned directly to self except for:

- session: if not passed, a ClientSession is created
- aiohttp_args: if the key "method" does not exist, it is set to "GET"
- create_dir: see parameter description

Parameters:

- url (str): The URL to download
- file (str or path-like object): The filename to write the download to.
- threads (int): The number of threads to use to download
- session (aiohttp.ClientSession): An existing session to use with aiohttp
- progress_bar (bool): Whether to output a progress bar or not
- aiohttp_args (dict): Arguments to be passed in each aiohttp request. If you supply a Range header using this, it will be overwritten in fetch()
- create_dir (bool): If true, the directories encompassing the file will be created if they do not exist already.

## start
```python
Downloader.start(self)
```
Calls asyncstart() synchronously
## asyncstart
```python
Downloader.asyncstart(self)
```
Calls download(), then closes the session if the downloader created it
## fetch
```python
Downloader.fetch(self, progress=False, filerange=(0, ''))
```
Individual thread for fetching files: requests one byte range and writes it to the file at the matching offset.

Parameters:

- progress (bool or tqdm.tqdm): the progress bar (or lack thereof) to update
- filerange (tuple): the (start, end) byte range of the file to get, sent as an inclusive HTTP Range header

## download
```python
Downloader.download(self)
```
Sends a HEAD request to read Content-Length, splits the file into `threads` byte ranges, and runs fetch() on each range concurrently.
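
The range splitting described for `download()` and the `Range` header described for `fetch()` can be sketched without any network access. This is a minimal sketch under stated assumptions, not the library's own code: `split_ranges` and `range_header` are hypothetical helper names used only for illustration. HTTP byte ranges are inclusive, so the last range ends at `length - 1`.

```python
def split_ranges(length, parts):
    """Split `length` bytes into contiguous, inclusive (start, end) ranges.

    Hypothetical helper for illustration; not part of the multithread API.
    """
    if length < 1 or parts < 1:
        raise ValueError("length and parts must both be at least 1")
    parts = min(parts, length)  # never create an empty range
    base = length // parts
    ranges = []
    start = 0
    for _ in range(parts - 1):
        ranges.append((start, start + base - 1))
        start += base
    # The last range absorbs the remainder and ends at the final byte.
    ranges.append((start, length - 1))
    return ranges


def range_header(filerange):
    """Format an inclusive (start, end) pair as an HTTP Range header value."""
    return f"bytes={filerange[0]}-{filerange[1]}"


if __name__ == "__main__":
    for filerange in split_ranges(10, 4):
        print(filerange, range_header(filerange))
```

Each range maps to one `fetch()` call, which seeks to the range's start offset before writing, so the pieces can arrive in any order.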


# Contributing
Any and all pull requests are welcome. As this is a small project, there are no strict standards, but please try to keep your code clean to a reasonable standard. Alternatively, if you would like to clean *my* code, that would be more than welcome!

# License
[MIT](https://choosealicense.com/licenses/mit/)

--------------------------------------------------------------------------------
/multithread/__init__.py:
--------------------------------------------------------------------------------
"""An optionally asynchronous multi-threaded downloader module for Python."""
import asyncio
import aiohttp
import aiofiles
from pathlib import Path

name = "multithread"
__version__ = "1.0.1"

class Downloader:
    """
    An optionally asynchronous multi-threaded downloader class using aiohttp

    Attributes:

    - url (str): The URL to download
    - file (str or path-like object): The filename to write the download to.
    - threads (int): The number of threads to use to download
    - session (aiohttp.ClientSession): An existing session to use with aiohttp
    - new_session (bool): True if a session was not passed, and the downloader created a new one
    - progress_bar (bool): Whether to output a progress bar or not
    - aiohttp_args (dict): Arguments to be passed in each aiohttp request. If you supply a Range header using this, it will be overwritten in fetch()
    """
    def __init__(self, url, file, threads=4, session=None, progress_bar=True, aiohttp_args={"method": "GET"}, create_dir=True):
        """Assigns arguments to self for when asyncstart() or start() calls download.

        All arguments are assigned directly to self except for:

        - session: if not passed, a ClientSession is created
        - aiohttp_args: if the key "method" does not exist, it is set to "GET"
        - create_dir: see parameter description

        Parameters:

        - url (str): The URL to download
        - file (str or path-like object): The filename to write the download to.
        - threads (int): The number of threads to use to download
        - session (aiohttp.ClientSession): An existing session to use with aiohttp
        - progress_bar (bool): Whether to output a progress bar or not
        - aiohttp_args (dict): Arguments to be passed in each aiohttp request. If you supply a Range header using this, it will be overwritten in fetch()
        - create_dir (bool): If true, the directories encompassing the file will be created if they do not exist already.
        """
        self.url = url
        if create_dir:
            parent_directory = Path(file).parent
            parent_directory.mkdir(parents=True, exist_ok=True)
        self.file = file
        self.threads = threads
        if not session:
            self.session = aiohttp.ClientSession()
            self.new_session = True
        else:
            self.session = session
            self.new_session = False
        self.progress_bar = progress_bar
        # Copy so that neither the shared default dict nor the caller's dict is mutated.
        aiohttp_args = dict(aiohttp_args)
        if "method" not in aiohttp_args:
            aiohttp_args["method"] = "GET"
        self.aiohttp_args = aiohttp_args

    def start(self):
        """Calls asyncstart() synchronously"""
        loop = asyncio.get_event_loop()
        loop.run_until_complete(self.asyncstart())

    async def asyncstart(self):
        """Calls download(), then closes the session if this downloader created it"""
        try:
            await self.download()
        finally:
            # Close the session even when the download raises.
            if self.new_session:
                await self.session.close()

    async def fetch(self, progress=False, filerange=(0,"")):
        """Individual thread for fetching files.

        Parameters:

        - progress (bool or tqdm.tqdm): the progress bar (or lack thereof) to update
        - filerange (tuple): the range of the file to get
        """
        # "r+b" writes in place; "wb" would truncate what other fetch() calls wrote.
        # Fall back to "wb" only when the file does not exist yet.
        mode = "r+b" if Path(self.file).exists() else "wb"
        # Build per-call arguments so concurrent fetches do not share one Range header.
        request_args = dict(self.aiohttp_args)
        request_args["headers"] = dict(request_args.get("headers") or {})
        request_args["headers"]["Range"] = f"bytes={filerange[0]}-{filerange[1]}"
        async with aiofiles.open(self.file, mode) as fileobj:
            async with self.session.request(url=self.url, **request_args) as filereq:
                offset = filerange[0]
                await fileobj.seek(offset)
                async for chunk in filereq.content.iter_any():
                    if progress:
                        progress.update(len(chunk))
                    await fileobj.write(chunk)

    async def download(self):
        """Generates ranges and calls fetch() with them."""
        temp_args = self.aiohttp_args.copy()
        temp_args["method"] = "HEAD"
        async with self.session.request(url=self.url, **temp_args) as head:
            length = int(head.headers["Content-Length"])
        # Create (or empty) the output file once, before any fetch() writes to it.
        async with aiofiles.open(self.file, "wb"):
            pass
        start = -1
        base = length // self.threads
        ranges = list()
        for counter in range(self.threads - 1):
            ranges.append((start + 1, start + base))
            start += base
        # HTTP byte ranges are inclusive, so the last byte is length - 1.
        ranges.append((start + 1, length - 1))
        if self.progress_bar:
            from tqdm import tqdm
            with tqdm(total=length, unit_scale=True, unit="B") as progress:
                await asyncio.gather(*[self.fetch(progress, filerange) for filerange in ranges])
        else:
            await asyncio.gather(*[self.fetch(False, filerange) for filerange in ranges])
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
aiohttp
aiofiles
tqdm
--------------------------------------------------------------------------------
/setup.py:
--------------------------------------------------------------------------------
import setuptools

with open("README.md", "r") as fh:
    long_description = fh.read()

setuptools.setup(
    name="multithread",
    version="1.0.1",
    author="DashLt",
    description="An optionally asynchronous multithreaded downloader for python",
    long_description=long_description,
    long_description_content_type="text/markdown",
    url="https://github.com/DashLt/multithread",
    packages=setuptools.find_packages(),
    classifiers=[
        "Programming Language :: Python :: 3",
        "License :: OSI Approved :: MIT License",
        "Operating System :: OS Independent",
        "Development Status :: 4 - Beta",
        "Framework :: AsyncIO",
        "Topic :: Internet :: WWW/HTTP"
    ],
    # The package uses f-strings, which require Python 3.6 or newer.
    python_requires=">=3.6",
    install_requires=['aiohttp', 'aiofiles'],
    extras_require={
        "progress": ["tqdm"]
    }
)
--------------------------------------------------------------------------------