├── .gitignore
├── Makefile
├── README.cn.md
├── README.md
├── docs
│   └── p2p_flow.png
├── minion.py
├── minion.spec
├── peer
│   ├── __init__.py
│   ├── client.py
│   ├── excepts.py
│   ├── libs
│   │   ├── __init__.py
│   │   └── mrequests
│   │       └── __init__.py
│   ├── models.py
│   ├── server.py
│   └── utils.py
├── requirements.txt
├── setup.py
├── tests
│   ├── __init__.py
│   ├── minions.py
│   ├── peer_upload_server.py
│   ├── piecefile.py
│   ├── test_peer_server.py
│   └── test_tracker.py
└── tracker
    ├── __init__.py
    ├── manage.py
    ├── models.py
    ├── peer
    │   ├── __init__.py
    │   ├── admin.py
    │   ├── models.py
    │   ├── tests.py
    │   └── views.py
    ├── server.py
    └── tracker
        ├── __init__.py
        ├── settings.py
        ├── urls.py
        └── wsgi.py
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | # Python compiled files
2 | *.pyc
3 | tests/res_file.tgz
--------------------------------------------------------------------------------
/Makefile:
--------------------------------------------------------------------------------
1 | ARCH_FILES = peer setup.py minion.py
2 | 
3 | sources:
4 | 	git archive HEAD $(ARCH_FILES) -o minion.tar.gz
5 | 
--------------------------------------------------------------------------------
/README.cn.md:
--------------------------------------------------------------------------------
1 | ### Introduction
2 | 
3 | Minion implements a BitTorrent-like approach to boosting network transfer bandwidth. It is written in Python and designed on top of the HTTP protocol.
4 | 
5 | Minion is as easy to use as curl/wget. It also provides a Python API library.
6 | 
7 | ### Installation
8 | 
9 | The Minion command-line tool requires Python 2.7 and the requests library.
10 | 
11 | The Minion tracker requires Python 2.7, Django, and MySQL.
12 | 
13 | ```
14 | git clone git://github.com/alibaba/minion
15 | cd minion
16 | pip install -r requirements.txt
17 | python setup.py install
18 | ```
19 | 
20 | ### Usage
21 | 
22 | Deploy your own tracker service:
23 | 
24 | ```
25 | python tracker/manage.py syncdb
26 | python tracker/manage.py runserver # use wsgi/nginx in production
27 | ```
28 | 
29 | Start using the minion command-line tool:
30 | 
31 | ```
32 | minion get http://foo.bar/testfile \
33 |     --tracker some.tracker.server \
34 |     --dest-path=/tmp/tops1 \
35 |     --upload-rate 10M \
36 |     --download-rate 100M \
37 |     --callback delete \
38 |     --upload-time 5 \
39 |     --fallback
40 | ```
41 | 
42 | * --tracker specifies the address of the tracker server
43 | * --dest-path is the download destination; it may be a directory (the URL basename is used as the filename) or be omitted (current directory, URL basename as the filename)
44 | * --upload-time sets how long to keep uploading after the download finishes; defaults to 60 seconds
45 | * --download-rate / --upload-rate set the download/upload rates; a unit suffix such as 10M is accepted; defaults to 10M
46 | * --hash verifies the hash of the file after the download completes
47 | * --fallback downloads from the origin URL when no peer is available (origin downloads are not rate-limited)
48 | * --callback currently implements one option, delete, which removes the file after all work is done
49 | * --verbose set to 1 prints debug information and stores it in /tmp/minion.log
50 | 
51 | ### Architecture
52 | 
53 | Minion's workflow is shown below:
54 | 
55 | ![image](/docs/p2p_flow.png)
56 | 
57 | PEER: the client host that downloads resources
58 | TRACKER: the service that provides P4P resource information
59 | SOURCE: the URL of the resource
60 | 
61 | 
62 | 1. PEER0 asks TRACKER for the list of PEERs uploading SOURCE; an empty list is returned
63 | 2. PEER0 fetches the resource from SOURCE's origin URL
64 | 3. PEER0 reports to TRACKER that it is uploading the resource
65 | 4. PEER1 asks TRACKER for the list of PEERs uploading SOURCE; the returned list contains PEER0
66 | 5. PEER1 fetches the resource from PEER0
67 | 6. PEER1 reports to TRACKER that it is uploading the resource
68 | 
69 | ## License
70 | 
71 | Minion is released under the GPLv2 open-source license.
72 | 
73 | [English](/README.md)
74 | 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | ### Introduction
2 | 
3 | Minion implements a BitTorrent-like approach to maximizing the
4 | utilization of available network bandwidth. It is written in Python
5 | and built on the HTTP protocol.
6 | 
7 | Minion makes dispatching data as easy as curl/wget. It also
8 | provides a Python API library.
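
For example, the Python API can be driven like this (a minimal sketch; the
URL, tracker host, and destination path are placeholders, and the interface
is the Minions class defined in peer/client.py later in this listing):

```
from peer.client import Minions

# describe what to fetch, where to put it, and which tracker to ask
minion = Minions(
    'http://foo.bar/testfile',        # origin URL of the resource
    download_dest='/tmp/testfile',    # local file to write
    tracker='some.tracker.server',    # tracker host[:port]
    fallback=True)                    # use the origin URL if no peers exist

minion.download_res(thread=True)      # fetch pieces in a background thread
minion.wait_for_res()                 # block until the file is complete
minion.close()                        # stop uploading and unregister
```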
9 | 
10 | ### Installation
11 | 
12 | The minion CLI requires Python 2.7 and the requests library
13 | 
14 | The minion tracker requires Python 2.7, Django, and MySQL
15 | 
16 | ```
17 | git clone git://github.com/alibaba/minion
18 | cd minion
19 | pip install -r requirements.txt
20 | python setup.py install
21 | ```
22 | 
23 | ### Usage
24 | 
25 | Deploy your tracker server
26 | 
27 | (the tracker needs Django and MySQL)
28 | 
29 | ```
30 | python tracker/manage.py syncdb
31 | python tracker/manage.py runserver # or use wsgi/nginx in production
32 | ```
33 | 
34 | Try your minion CLI:
35 | 
36 | ```
37 | minion get http://foo.bar/testfile \
38 |     --tracker some.tracker.server \
39 |     --dest-path=/tmp/tops1 \
40 |     --upload-rate 10M \
41 |     --download-rate 100M \
42 |     --callback delete \
43 |     --upload-time 5 \
44 |     --fallback
45 | ```
46 | 
47 | * --tracker specify the tracker server
48 | * --dest-path specify the path data will be written to; defaults to the current dir
49 | * --upload-time specify how long to keep uploading after the download completes;
50 |   defaults to 60s
51 | * --download-rate / --upload-rate specify the rates for download/upload;
52 |   can be given with a unit, like 10M
53 | * --hash specify HASHTYPE:HASHSUM to verify the downloaded file
54 | * --fallback if no peer holds the resource, download directly from the origin URL
55 | * --callback currently supports one option, delete, which removes the file when all work is done
56 | * --verbose for more logging
57 | 
58 | ### Architecture
59 | 
60 | Minion works as pictured below (the tracker protocol is sketched at the end of this README)
61 | 
62 | ![image](/docs/p2p_flow.png)
63 | 
64 | 
65 | PEER: a host that downloads SOURCE
66 | TRACKER: the host that manages PEERs and SOURCEs
67 | SOURCE: the URL of some data
68 | 
69 | 
70 | Steps
71 | 
72 | 1. PEER0 asks TRACKER for the peers that are uploading SOURCE; TRACKER
73 |    returns an empty list
74 | 2. PEER0 gets SOURCE directly from the origin URL
75 | 3. PEER0 starts uploading SOURCE and registers that with TRACKER
76 | 4. PEER1 repeats step 1; the returned list now contains PEER0
77 | 5. PEER1 gets SOURCE from PEER0
78 | 6. PEER1 repeats step 3
79 | 
80 | ## License
81 | 
82 | Minion as a whole is released under the GNU General Public License
83 | version 2.
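
### Appendix: tracker protocol sketch

The peer/tracker exchange in the steps above is plain HTTP. A minimal
sketch with curl; the tracker host, peer IP, and port are placeholders,
and the response shape is inferred from the client code in peer/client.py:

```
# steps 1 and 4: ask the tracker which peers are uploading a resource
curl 'http://some.tracker.server/peer/?res=http%3A%2F%2Ffoo.bar%2Ftestfile'
# -> {"http://foo.bar/testfile": [["10.0.0.1", 9999]]}

# steps 3 and 6: register yourself as an uploader of the resource
curl -X POST 'http://some.tracker.server/peer/' \
     -H 'Content-Type: application/json' \
     -d '{"res": "http://foo.bar/testfile", "ip": "10.0.0.1", "port": 9999}'

# unregister when the upload window is over
curl -X DELETE 'http://some.tracker.server/peer/?res=http%3A%2F%2Ffoo.bar%2Ftestfile&ip=10.0.0.1&port=9999'
```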
84 | 
85 | [中文](/README.cn.md)
--------------------------------------------------------------------------------
/docs/p2p_flow.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alibaba/minion/e98530138bbac133c52d3e1ee05b8ec392bfb916/docs/p2p_flow.png
--------------------------------------------------------------------------------
/minion.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python
2 | 
3 | 
4 | import sys
5 | import os
6 | import time
7 | import errno
8 | import hashlib
9 | import argparse
10 | import urlparse
11 | import logging
12 | import traceback
13 | import tempfile
14 | import select
15 | import socket
16 | import mmap
17 | from multiprocessing import Process, Queue
18 | 
19 | from peer.client import Minions
20 | from peer.excepts import ChunkNotReady
21 | from peer.utils import logging_config, strip_url_qp
22 | 
23 | __VERSION__ = '0.5.6'
24 | __AUTHORS__ = [
25 |     "linxiulei@gmail.com"
26 | ]
27 | 
28 | logging_config("INFO")
29 | 
30 | logger = logging.getLogger('CLI')
31 | 
32 | STDOUT_SIZE = 8192 * 16
33 | 
34 | 
35 | class RETCODE(object):
36 |     success = 0
37 |     checksum_mismatch = 4
38 |     tracker_unavailable = 5
39 |     dns_fail = 6
40 |     origin_fail = 7
41 | 
42 | 
43 | class ChecksumMismatch(Exception):
44 |     pass
45 | 
46 | 
47 | def hash_file(filepath, hashtype):
48 |     BLOCKSIZE = 65536
49 |     hasher = getattr(hashlib, hashtype)()
50 |     f = open(filepath, 'rb')
51 |     buf = f.read(BLOCKSIZE)
52 |     while len(buf) > 0:
53 |         hasher.update(buf)
54 |         buf = f.read(BLOCKSIZE)
55 |     f.close()
56 |     return hasher.hexdigest()
57 | 
58 | 
59 | def cb_verify_hash(filepath, hashtype, hashsum):
60 |     real_hashsum = hash_file(filepath, hashtype)
61 |     if real_hashsum != hashsum:
62 |         raise ChecksumMismatch("actually %s, should be %s" %
63 |                                (real_hashsum, hashsum))
64 | 
65 | 
66 | class StoreUnitAction(argparse.Action):
67 |     def __call__(self, parser, namespace, values, option_string=None):
68 |         units = {
69 |             "K": 1024,
70 |             "M": 1024 ** 2,
71 |             "G": 1024 ** 3,
72 |             "T": 1024 ** 4,
73 |         }
74 |         for u in units.keys():
75 |             if values.endswith(u):
76 |                 setattr(namespace, self.dest, int(values[:-1]) * units[u])
77 |                 return
78 |         setattr(namespace, self.dest, int(values))  # no unit suffix: bytes
79 | 
80 | 
81 | class HashAction(argparse.Action):
82 |     def __call__(self, parser, namespace, value, option_string=None):
83 |         try:
84 |             hashtype, hashsum = value.split(":")
85 |             setattr(namespace, 'hash', value)
86 |             setattr(namespace, 'hash_type', hashtype)
87 |             setattr(namespace, 'hash_sum', hashsum)
88 |         except ValueError:
89 |             raise argparse.ArgumentError(self, "format like HASHTYPE:HASHSUM")
90 | 
91 |         try:
92 |             getattr(hashlib, hashtype)
93 |         except AttributeError:
94 |             raise argparse.ArgumentError(self, "unknown hashtype: %s" %
95 |                                          hashtype)
96 | 
97 | 
98 | def parse_args():
99 |     parser = argparse.ArgumentParser("P4P tool cli")
100 |     subparsers = parser.add_subparsers(help="sub-command help")
101 | 
102 |     parser_get = subparsers.add_parser("get", help="get data")
103 |     parser_get.add_argument("url", action="store")
104 |     parser_get.add_argument(
105 |         "--tracker", action="store", metavar="HOST",
106 |         default="p2p-tracker.alibaba-inc.com", dest="tracker",
107 |         help='specify tracker server')
108 |     parser_get.add_argument(
109 |         "--dest-path", metavar="localpath",
110 |         action="store", dest="dest_path",
111 |         help="download path; may be a directory")
112 |     parser_get.add_argument(
113 |         "--download-rate", action=StoreUnitAction,
114 |         metavar="int_with_unit", dest="download_rate",
115 |         help='network rate limit; can be given with a unit, e.g. 10M')
116 |     parser_get.add_argument(
117 |         "--upload-rate", action=StoreUnitAction,
118 |         metavar='int_with_unit', dest="upload_rate",
119 |         help='network rate limit; can be given with a unit, e.g. 10M')
120 |     parser_get.add_argument(
121 |         "--upload-time", action="store", type=int,
122 |         dest="upload_time", default=10,
123 |         help="how long to keep uploading after the download completes")
124 |     parser_get.add_argument(
125 |         "--fallback", action="store_true", default=False,
126 |         dest="fallback",
127 |         help="download from the origin source when no peer is available")
128 |     parser_get.add_argument(
129 |         "--callback", type=str, choices=['delete'],
130 |         dest="callback",
131 |         help="method invoked when all work is done")
132 |     parser_get.add_argument(
133 |         "--hash", action=HashAction, dest="hash",
134 |         metavar="HASHTYPE:HASHSUM",
135 |         help="specify hash type and hash sum to verify the downloaded file")
136 |     parser_get.add_argument(
137 |         "--ignore-qp", nargs='*', dest="ignore_qp",
138 |         help="ignore specified query params for resource uploading")
139 |     parser_get.add_argument(
140 |         "--data-stdout", dest="data_stdout",
141 |         action="store_true", default=False,
142 |         help="redirect the downloaded file to stdout")
143 |     parser_get.add_argument(
144 |         "--verbose", action="store", type=int,
145 |         default=0, dest="verbose",
146 |         help="verbosity level: 0=error, 1=info, 2=debug")
147 |     parser_get.add_argument(
148 |         "--logfile", action="store", dest="logfile",
149 |         help="specify logfile path")
150 | 
151 |     return parser.parse_args()
152 | 
153 | 
154 | class Daemon(object):
155 |     def __init__(self, args, q, sock, mm=None):
156 |         self.args = args
157 |         self.q = q
158 |         self.sock = sock
159 |         self.mm = mm
160 | 
161 |     def run(self):
162 |         try:
163 |             minion = get_minion(self.args)
164 |             if self.args.data_stdout:
165 |                 minion.download_res(thread=True)
166 | 
167 |                 # wait for the download to produce its first data
168 |                 while minion.check_download_thread():
169 |                     if minion.res_downloaded_size() > 0:
170 |                         break
171 |                     else:
172 |                         time.sleep(0.2)
173 | 
174 |                 cursor = minion.get_file_cursor()
175 | 
176 |                 while True:
177 |                     minion.check_download_thread()
178 |                     try:
179 |                         a = cursor.read(STDOUT_SIZE)
180 |                         if a:
181 |                             size = len(a)
182 |                             self.mm[:size] = a
183 |                             self.sock.send(str(size))
184 |                             self.sock.recv(10)
185 |                         else:
186 |                             break
187 |                     except ChunkNotReady:
188 |                         time.sleep(0.2)
189 | 
190 |             elif self.args.hash:
191 |                 minion.download_res(
192 |                     callback=cb_verify_hash,
193 |                     cb_kwargs={
194 |                         'filepath': self.args.dest_path,
195 |                         'hashtype': self.args.hash_type,
196 |                         'hashsum': self.args.hash_sum,
197 |                     })
198 |             else:
199 |                 minion.download_res()
200 |             self.q.put((0, True))
201 |         except Exception:
202 |             except_type, except_class, tb = sys.exc_info()
203 |             self.q.put(
204 |                 (1,
205 |                  (except_type,
206 |                   except_class,
207 |                   traceback.extract_tb(tb))))
208 |             return
209 | 
210 |         # minion.upload_res(path=args.dest_path)
211 |         poll_time = 10
212 |         remain_upload_time = self.args.upload_time
213 |         while minion.is_uploading() or minion.is_wait_uploading():
214 |             remain_upload_time -= poll_time
215 |             if remain_upload_time > 0:
216 |                 time.sleep(poll_time)
217 |             else:
218 |                 # time is up
219 |                 time.sleep(remain_upload_time + poll_time)
220 |                 break
221 | 
222 |         minion.close()
223 |         if self.args.callback == 'delete':
224 |             try:
225 |                 os.remove(self.args.dest_path)
226 |             except OSError as e:
227 |                 if e.errno == errno.ENOENT:
228 |                     logger.warn('file is already removed')
229 |                 else:
230 |                     raise
231 | 
232 |     def start(self):
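        # Daemonize without a double fork: start() spawns an intermediate
        # process, whose child() closes stdio (when logging to a file),
        # launches the real worker via Process(target=self.run), and then
        # SIGTERMs itself, so the worker is orphaned and keeps uploading
        # after the CLI process exits.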
233 |         def child():
234 |             if self.args.logfile:
235 |                 os.close(sys.stdout.fileno())
236 |                 os.close(sys.stderr.fileno())
237 | 
238 |             Process(target=self.run).start()
239 |             os.kill(os.getpid(), 15)
240 | 
241 |         Process(target=child).start()
242 | 
243 | 
244 | def get_minion(args):
245 |     if args.data_stdout:
246 |         disorder = False
247 |     else:
248 |         disorder = True
249 | 
250 |     if args.ignore_qp:
251 |         upload_res_url = strip_url_qp(args.url, args.ignore_qp)
252 |         minion = Minions(
253 |             args.url,
254 |             download_dest=args.dest_path,
255 |             download_rate=args.download_rate,
256 |             upload_rate=args.upload_rate,
257 |             upload_res_url=upload_res_url,
258 |             fallback=args.fallback,
259 |             tracker=args.tracker,
260 |             disorder=disorder)
261 |     else:
262 |         minion = Minions(
263 |             args.url,
264 |             download_dest=args.dest_path,
265 |             download_rate=args.download_rate,
266 |             upload_rate=args.upload_rate,
267 |             fallback=args.fallback,
268 |             tracker=args.tracker,
269 |             disorder=disorder)
270 |     return minion
271 | 
272 | 
273 | def get_filename_from_url(url):
274 |     return urlparse.urlparse(url).path.split('/')[-1]
275 | 
276 | 
277 | def pre_args(args):
278 |     if args.data_stdout:
279 |         if args.hash:
280 |             errmsg = "--hash is not supported with --data-stdout"
281 |             raise ValueError(errmsg)
282 | 
283 |         if not args.dest_path:
284 |             args.dest_path = tempfile.mktemp(prefix="/var/tmp/")
285 |             args.callback = "delete"
286 | 
287 |         if args.verbose > 0 and not args.logfile:
288 |             logger.warn("Should not use verbose > 0 with --data-stdout")
289 | 
290 |     if not args.dest_path:
291 |         args.dest_path = "."
292 | 
293 |     if os.path.isdir(args.dest_path):
294 |         filename = get_filename_from_url(args.url)
295 |         args.dest_path = os.path.join(args.dest_path, filename)
296 | 
297 | 
298 | if __name__ == '__main__':
299 |     retcode = 0
300 |     args = parse_args()
301 |     pre_args(args)
302 |     if args.verbose == 0:
303 |         logging_config("ERROR", args.logfile)
304 |     elif args.verbose == 1:
305 |         logging_config("INFO", args.logfile)
306 |     elif args.verbose == 2:
307 |         logging_config("DEBUG", args.logfile)
308 | 
309 |     msg_q = Queue()
310 |     try:
311 |         psock, csock = socket.socketpair()
312 |         if args.data_stdout:
313 |             mm = mmap.mmap(-1, STDOUT_SIZE)
314 |         else:
315 |             mm = None
316 |         d = Daemon(args, msg_q, csock, mm)
317 |         d.start()
318 |         # wait for the download to finish
319 |         while msg_q.empty():
320 |             rlist, _, _ = select.select([psock], [], [], 1)
321 |             if rlist:
322 |                 size = int(psock.recv(10))
323 |                 sys.stdout.write(mm[:size])
324 |                 psock.send("1")
325 |         ret = msg_q.get()
326 | 
327 |         if ret[0] == 0 and ret[1] is True:
328 |             logger.info('downloaded; uploading for %s second(s) in background' %
329 |                         args.upload_time)
330 |             retcode = 0
331 |         elif ret[0] == 1:
332 |             logger.error("Traceback (most recent call last):")
333 |             exc_type, exc_obj, exc_trace = ret[1]
334 |             logger.error("".join(traceback.format_list(exc_trace))
335 |                          + exc_type.__name__ + ": " + str(exc_obj))
336 |             retcode = 1
337 |             if exc_type.__name__ == 'ChecksumMismatch':
338 |                 logger.error(exc_obj.message)
339 |                 retcode = RETCODE.checksum_mismatch
340 |             elif exc_type.__name__ == 'TrackerUnavailable':
341 |                 logger.error("Tracker server: %s unavailable" % args.tracker)
342 |                 retcode = RETCODE.tracker_unavailable
343 |             elif exc_type.__name__ == 'gaierror':
344 |                 logger.error("Domain name resolution failed, check DNS")
345 |                 retcode = RETCODE.dns_fail
346 |             elif exc_type.__name__ == 'OriginURLConnectError':
347 |                 logger.error("Failed to get data from the origin URL")
348 |                 retcode = RETCODE.origin_fail
349 | 
350 |         sys.exit(retcode)
351 | 
except Exception as e: 352 | raise 353 | -------------------------------------------------------------------------------- /minion.spec: -------------------------------------------------------------------------------- 1 | Name: minion 2 | Version: 0.7.0 3 | Release: 1%{?dist} 4 | Summary: Alibaba P4P library and cli 5 | 6 | Group: Alibaba/OPS 7 | License: Alibaba 8 | URL: http://sam.alibaba-inc.com 9 | Source0: minion.tar.gz 10 | BuildRoot: %(mktemp -ud %{_tmppath}/%{name}-%{version}-%{release}-XXXXXX) 11 | 12 | Buildarch: noarch 13 | 14 | BuildRequires: python >= 2.7 15 | 16 | Requires: python >= 2.7 17 | Requires: python-requests 18 | 19 | %description 20 | Alibaba P4P library and cli, implemented by pure python. 21 | 22 | %prep 23 | %setup -q -c 24 | 25 | %build 26 | python setup.py 27 | 28 | %install 29 | rm -rf $RPM_BUILD_ROOT 30 | cp minion.py %{buildroot}/usr/bin/ 31 | ln -s /usr/bin/minion.py %{buildroot}/usr/bin/minion 32 | 33 | %files 34 | %_prefix 35 | /usr/bin/minion 36 | /usr/bin/minion.py 37 | 38 | %changelog 39 | * Thu Mar 5 2015 shi yu 40 | - build in the obs 41 | -------------------------------------------------------------------------------- /peer/__init__.py: -------------------------------------------------------------------------------- 1 | __all__ = ['client', 'excepts', 'models', 'server', 'utils'] 2 | -------------------------------------------------------------------------------- /peer/client.py: -------------------------------------------------------------------------------- 1 | 2 | import os 3 | import json 4 | import time 5 | import errno 6 | import socket 7 | import random 8 | import logging 9 | import threading 10 | import SocketServer 11 | import sched 12 | from hashlib import md5 13 | from urllib import urlencode 14 | from libs.mrequests import requests 15 | from requests.exceptions import ConnectionError,\ 16 | Timeout, HTTPError, ReadTimeout 17 | 18 | import gevent 19 | import gevent.pool 20 | 21 | from models import PieceFile, ExpireStorage, ExcThread 22 | from server import ResourceManager, handler_factory,\ 23 | GeventServer 24 | 25 | from threading import Thread, Lock 26 | from utils import sizeof_fmt_human, TokenBucket,\ 27 | get_res_length, http_download_to_piecefile,\ 28 | elapsed_time, pyinotify, is_gaierror, ListStore,\ 29 | generate_range_string 30 | from excepts import NoPeersFound, TrackerUnavailable,\ 31 | UnknowStrict, PeerOverload, OriginURLConnectError,\ 32 | PieceChecksumError, IncompleteRead, RateTooSlow 33 | 34 | reload(requests) 35 | 36 | THREADS = 100 37 | PROTOCOL = 'http://' 38 | PORT = 9999 39 | PIECE_SIZE = 1024 40 | 41 | SHORT_BLACK_TIME = 3 42 | LONG_BLACK_TIME = 30 43 | 44 | GET_PEER_INTERVAL = 0.4 45 | 46 | logger = logging.getLogger('minions') 47 | 48 | 49 | class Minions(object): 50 | TIME_TO_TRACKER = 60 51 | logger = logger 52 | 53 | def __init__(self, 54 | res_url, 55 | download_dest=None, 56 | tracker=None, 57 | upload_res_url=None, 58 | upload_filepath=None, 59 | download_rate=None, 60 | upload_rate=None, 61 | strict=None, 62 | fallback=False, 63 | disorder=True): 64 | """ 65 | @res_url origin url of resource 66 | @upload_res_url resource url in tracker or peers, 67 | which is not exactly as res_url sometime 68 | """ 69 | 70 | self.res_url = res_url 71 | self.download_dest = download_dest 72 | 73 | self.fallback = fallback 74 | self.download_rate = download_rate 75 | self.upload_rate = upload_rate 76 | self.strict = strict 77 | if upload_res_url: 78 | self.upload_res_url = upload_res_url 79 | else: 80 | 
self.upload_res_url = res_url 81 | 82 | if upload_filepath: 83 | self.upload_filepath = upload_filepath 84 | else: 85 | self.upload_filepath = download_dest 86 | 87 | if not tracker: 88 | raise ValueError("tracker required") 89 | 90 | self.tracker = tracker 91 | self.loader = None 92 | self._ex_ip = None 93 | self._adt_info = None 94 | self.port = None 95 | self._is_activated = True 96 | self.download_thread = None 97 | self.uploader = None 98 | self.token_bucket = None 99 | 100 | self.res_is_downloaded = False 101 | self.disorder = disorder 102 | 103 | self._peer_in_conn = set() 104 | 105 | self.piece_file = None 106 | self.notifier = None 107 | 108 | self._black_peerlist = ExpireStorage(expire_time=30) 109 | self._black_lock = Lock() 110 | 111 | self._lock = Lock() 112 | 113 | self.ready_upload_thread = None 114 | self._is_uploading = False 115 | self._wait_uploading = False 116 | 117 | # multiplexing session would be good for performance 118 | self._requests_session = requests.Session() 119 | 120 | def activate(self): 121 | with self._lock: 122 | self._is_activated = True 123 | 124 | def deactivate(self): 125 | with self._lock: 126 | self._is_activated = False 127 | 128 | def is_activated(self): 129 | return self._is_activated 130 | 131 | def close(self): 132 | self.deactivate() 133 | self.stop_upload() 134 | if self.notifier: 135 | self.stop_pyinotify() 136 | 137 | def is_uploading(self): 138 | if self.uploader and self._is_uploading: 139 | return True 140 | else: 141 | return False 142 | 143 | def is_wait_uploading(self): 144 | return self._wait_uploading 145 | 146 | # @mprofile 147 | def get_peers(self): 148 | count = 0 149 | retry_count = 0 150 | max_retry = 3 151 | if self.strict: 152 | adt_info = self._get_strict_info() 153 | 154 | if self.strict: 155 | res_url = PROTOCOL + self.tracker + "/peer/?%s&strict=%s" %\ 156 | (urlencode({'res': self.upload_res_url}), self.strict) 157 | else: 158 | res_url = PROTOCOL + self.tracker + "/peer/?%s" %\ 159 | (urlencode({'res': self.upload_res_url})) 160 | 161 | logger.info('Try get peers for getting resouces %s from %s' % 162 | (self.upload_res_url, res_url)) 163 | 164 | while True: 165 | try: 166 | if self.strict: 167 | r = self._requests_session.get( 168 | res_url, 169 | data=json.dumps({'adt_info': adt_info})) 170 | else: 171 | r = self._requests_session.get(res_url) 172 | 173 | if not r.ok: 174 | r.raise_for_status() 175 | 176 | logger.debug(r.content) 177 | 178 | ret = r.json() 179 | peers = ret[self.upload_res_url] 180 | except ConnectionError as e: 181 | if is_gaierror(e): 182 | logger.error( 183 | "resolve tracker server domain name : %s error" % 184 | self.tracker) 185 | raise e.args[0][1] 186 | else: 187 | logger.error('get peers from tracker error: %s' % e) 188 | raise TrackerUnavailable 189 | 190 | except HTTPError as e: 191 | logger.warn( 192 | 'get peers from tracker code error code: %s, msg: %s' % 193 | (e.response.status_code, e.message)) 194 | retry_count += 1 195 | if retry_count >= max_retry: 196 | logger.error('max retry times run out') 197 | raise TrackerUnavailable 198 | 199 | continue 200 | 201 | peers = [(ip, port) for ip, port in peers] 202 | for peer in peers[:]: 203 | # delete myself 204 | ip, port = peer 205 | peer_set = peer 206 | if (self._ex_ip, self.port) == peer_set: 207 | peers.remove(peer) 208 | 209 | if peer_set in self._black_peerlist: 210 | logger.debug('peer: %s:%s is in blacklist' % (ip, port)) 211 | peers.remove(peer) 212 | 213 | if self._ex_ip: 214 | if (self._ex_ip, self.port) in peers: 215 | 
peers.remove((self._ex_ip, self.port)) 216 | 217 | peers_num = len(peers) 218 | if len(peers) == 0 or peers == "" or \ 219 | peers is None or peers == 'null': 220 | logger.warn( 221 | 'Did not get any peers for %s' % 222 | self.upload_res_url) 223 | count += 1 224 | if count >= peers_num: 225 | raise NoPeersFound 226 | else: 227 | self.peers = peers 228 | logger.info('Get peers: %s from tracker' % peers) 229 | return peers 230 | 231 | def _listen_on_file(self): 232 | logger.debug('Listen on download file') 233 | s = sched.scheduler(time.time, time.sleep) 234 | 235 | def func(): 236 | if self.piece_file.has_unalloc() and self.is_activated(): 237 | logger.info("Already download %s" % 238 | self.piece_file.get_real_filesize()) 239 | s.enter(5, 1, func, ()) 240 | s.enter(5, 1, func, ()) 241 | 242 | Thread(target=s.run).start() 243 | 244 | def res_downloaded_size(self): 245 | if self.piece_file: 246 | return self.piece_file.get_real_filesize() 247 | else: 248 | return 0 249 | 250 | def download_res( 251 | self, 252 | rate=None, 253 | callback=None, 254 | cb_kwargs=None, 255 | thread=False): 256 | if thread: 257 | self.download_thread = ExcThread( 258 | target=self._download_res, 259 | kwargs={ 260 | 'filepath': self.download_dest, 261 | 'rate': rate, 262 | 'callback': callback, 263 | 'cb_kwargs': cb_kwargs, 264 | } 265 | ) 266 | self.download_thread.start() 267 | else: 268 | self._download_res( 269 | filepath=self.download_dest, 270 | rate=rate, 271 | callback=callback, 272 | cb_kwargs=cb_kwargs 273 | ) 274 | 275 | def _record_get_peer_ts(self): 276 | self._last_get_peer_time = time.time() 277 | 278 | def _get_last_get_peer_tv(self): 279 | return time.time() - self._last_get_peer_time 280 | 281 | def _download_res( 282 | self, 283 | filepath, 284 | rate, 285 | uploading=True, 286 | callback=None, 287 | cb_kwargs=None): 288 | try: 289 | peers = self.get_peers() 290 | self._record_get_peer_ts() 291 | peers_num = len(peers) 292 | count = 0 293 | 294 | # just get resource size 295 | while True: 296 | ip, port = peers[count] 297 | logger.info('get resource size') 298 | try: 299 | ret = self._requests_session.get( 300 | "{protocol}{ip}:{port}/?{res}" 301 | .format( 302 | protocol=PROTOCOL, ip=ip, 303 | port=port, 304 | res=urlencode({'res_url': self.upload_res_url})), 305 | stream=True, 306 | headers={"Range": "bytes=0-0"}, 307 | timeout=1) 308 | 309 | if ret.ok: 310 | #: bytes=0-1/17) 311 | content_range = ret.headers.get("Content-Range") 312 | res_length = content_range.split('/')[-1] 313 | break 314 | else: 315 | logger.warn( 316 | 'get piece from ip: %s port: %s error, code: %s ' % 317 | (ip, port, ret.status_code)) 318 | count += 1 319 | self.del_from_tracker(ip=ip, peer_port=port) 320 | except ConnectionError: 321 | logger.warn( 322 | 'get piece from ip: %s port: %s error ConnectionError' 323 | % (ip, port)) 324 | count += 1 325 | self.del_from_tracker(ip=ip, peer_port=port) 326 | except Timeout: 327 | logger.warn( 328 | 'get piece from ip: %s port: %s error Timeout' % 329 | (ip, port)) 330 | count += 1 331 | self.del_from_tracker(ip=ip, peer_port=port) 332 | finally: 333 | if count >= peers_num: 334 | logger.warn("No peers avaliable") 335 | peers = self.get_peers() 336 | peers_num = len(peers) 337 | count = 0 338 | 339 | logger.info('%s is size of %s' % 340 | (self.upload_res_url, sizeof_fmt_human(res_length))) 341 | 342 | self.piece_file = PieceFile(res_length, filepath) 343 | 344 | pool_work_num = 15 345 | pool_q_size = pool_work_num * 2 346 | pool = gevent.pool.Pool(pool_work_num) 347 | 
self.start_ready_upload_thread() 348 | 349 | if rate: 350 | self.download_rate = rate 351 | else: 352 | rate = self.download_rate 353 | 354 | if rate: 355 | self.token_bucket = TokenBucket(rate) 356 | 357 | while self.piece_file.has_unalloc(): 358 | args_list = list() 359 | for peer in peers: 360 | if peer not in self._peer_in_conn: 361 | args_list.append((peer, None)) 362 | [pool.apply_async(self._download_piece_thread, *args) 363 | for args in args_list[:pool_q_size]] 364 | # update peers if peer run out 365 | while pool.full(): 366 | gevent.sleep(0.2) 367 | 368 | if not self.piece_file.has_empty(): 369 | pool.join() 370 | 371 | logger.debug( 372 | 'test get_empty_block: %s' % 373 | self.piece_file.get_empty_piece()) 374 | 375 | logger.debug('peer in connection: %s' % self._peer_in_conn) 376 | if self.piece_file.has_unalloc(): 377 | try: 378 | tv = self._get_last_get_peer_tv() 379 | if tv < GET_PEER_INTERVAL: 380 | gevent.sleep(GET_PEER_INTERVAL - tv) 381 | g = gevent.spawn(self.get_peers) 382 | peers = g.get() 383 | self._record_get_peer_ts() 384 | except NoPeersFound: 385 | # if pool.workRequests: 386 | if pool_work_num - pool.free_count() > 0: 387 | # some remained piece maybe on the way 388 | pool.join() 389 | if self.piece_file.has_unalloc(): 390 | tv = self._get_last_get_peer_tv() 391 | if tv > GET_PEER_INTERVAL: 392 | gevent.sleep(GET_PEER_INTERVAL - tv) 393 | g = gevent.spawn(self.get_peers) 394 | peers = g.get() 395 | self._record_get_peer_ts() 396 | else: 397 | break 398 | else: 399 | logger.error("no worker running, and get no peers") 400 | raise 401 | else: 402 | break 403 | 404 | logger.info('File is complete, size: %s' % 405 | self.piece_file.get_real_filesize()) 406 | 407 | except NoPeersFound: 408 | if self.fallback: 409 | logger.info('Use fallback way to get resouce') 410 | try: 411 | res_length = get_res_length(self.res_url) 412 | except ConnectionError: 413 | raise OriginURLConnectError(self.res_url) 414 | logger.info( 415 | 'get resource length %s' % 416 | sizeof_fmt_human(res_length)) 417 | if not self.piece_file: 418 | self.piece_file = PieceFile(res_length, filepath) 419 | 420 | self.start_ready_upload_thread() 421 | http_download_to_piecefile( 422 | self.res_url, self.piece_file) 423 | else: 424 | self.deactivate() 425 | raise 426 | 427 | # self.piece_file.tofile() 428 | self.res_is_downloaded = True 429 | if callback: 430 | logger.info('Run callback') 431 | callback(**cb_kwargs) 432 | 433 | def res_is_ready(self): 434 | return self.res_is_downloaded 435 | 436 | def wait_for_res(self): 437 | while not self.res_is_downloaded: 438 | if not self.download_thread.is_alive(): 439 | logger.info('download thread exit in exceptions') 440 | self.download_thread.join() 441 | # raise DownloadError('download thread exit in exceptions') 442 | logger.debug( 443 | "Waiting for resource to complete, " 444 | "size of %s is %s now" % 445 | (self.res_url, self.res_downloaded_size())) 446 | time.sleep(2) 447 | return True 448 | 449 | def check_download_thread(self): 450 | if self.download_thread.is_alive(): 451 | return True 452 | else: 453 | self.download_thread.join() 454 | 455 | def _get_peer_block(self, ip, port): 456 | logger.debug('get peer block') 457 | ret = self._requests_session.get( 458 | "{protocol}{ip}:{port}/?{res}&pieces=all".format( 459 | protocol=PROTOCOL, 460 | ip=ip, 461 | port=port, 462 | res=urlencode({'res_url': self.upload_res_url}) 463 | ), 464 | timeout=2 465 | ) 466 | 467 | retjosn = ret.json() 468 | if retjosn['status'] == 'overload': 469 | raise 
PeerOverload 470 | elif retjosn['status'] == 'normal': 471 | return retjosn['result'] 472 | 473 | def _get_adt_info(self): 474 | if self._adt_info: 475 | return self._adt_info 476 | 477 | self._adt_info = {} 478 | return self._adt_info 479 | 480 | def set_adt_info(self, adt_info): 481 | self._adt_info = adt_info 482 | 483 | def _get_strict_info(self): 484 | if self.strict is None: 485 | pass 486 | elif self.strict == 'site': 487 | return {'site': self._get_adt_info()['site']} 488 | else: 489 | raise UnknowStrict("strick = %s" % self.strict) 490 | 491 | def _enter_piece_thread(self, ip, port): 492 | self._peer_in_conn.add((ip, port)) 493 | 494 | def _exit_piece_thread(self, ip, port): 495 | try: 496 | self._peer_in_conn.remove((ip, port)) 497 | except KeyError: 498 | pass 499 | 500 | def get_num_peer_in_conn(self): 501 | return len(self._peer_in_conn) 502 | 503 | def _download_piece_thread(self, ip, port): 504 | # this function would communicate with only ontpeer 505 | logger.debug("in _download_piece_thread") 506 | # TODO: if limit rate is less than 1MB/s 507 | 508 | try: 509 | while self.is_activated(): 510 | if not self.piece_file.has_empty(): 511 | logger.info( 512 | 'I have no piece untouched, just over this thread') 513 | self._exit_piece_thread(ip, port) 514 | break 515 | 516 | self._enter_piece_thread(ip, port) 517 | 518 | # get peer avaliable block 519 | try: 520 | available_blocks = self._get_peer_block(ip, port) 521 | except (ConnectionError, Timeout) as e: 522 | logger.info( 523 | "occurr %s when get block from ip: %s, " 524 | "port: %s is unavaliable, so unregister it" % 525 | (e, ip, port)) 526 | self.del_from_tracker(ip=ip, peer_port=port) 527 | self._exit_piece_thread(ip, port) 528 | break 529 | except PeerOverload as e: 530 | logger.warn( 531 | 'Peer ip: %s port: %s overload, let it go' % 532 | (ip, port)) 533 | self.add_to_blacklist((ip, port), SHORT_BLACK_TIME) 534 | self._exit_piece_thread(ip, port) 535 | break 536 | 537 | logger.debug( 538 | 'get available block: %s from %s:%s' % 539 | (available_blocks, ip, port)) 540 | 541 | needto_downlist = self.piece_file.get_unalloc_piece( 542 | available_blocks) 543 | 544 | logger.debug( 545 | "Need to download piece %s from %s:%s" % 546 | (needto_downlist, ip, port)) 547 | 548 | # time.sleep(random.randrange(10)) 549 | if not needto_downlist and self.piece_file.has_unalloc(): 550 | logger.info( 551 | "add peer: %s:%s to blacklist " 552 | "because that has no blocks I need" 553 | % (ip, port) 554 | ) 555 | self.add_to_blacklist((ip, port)) 556 | self._exit_piece_thread(ip, port) 557 | break 558 | elif not needto_downlist: 559 | self._exit_piece_thread(ip, port) 560 | break 561 | 562 | if self.disorder: 563 | needto_downlist = random.sample( 564 | needto_downlist, len(needto_downlist)) 565 | ls = ListStore(needto_downlist) 566 | 567 | while not ls.empty(): 568 | peer_in_conn = self.get_num_peer_in_conn() 569 | 570 | logger.debug('peer in connection: %s' % 571 | self._peer_in_conn) 572 | 573 | # this will reduce ls 574 | tobe_download_piece_idlist = \ 575 | self._get_pieces_once_from_ls(ls, peer_in_conn) 576 | 577 | if not tobe_download_piece_idlist: 578 | continue 579 | 580 | try: 581 | self._get_pieces_by_idlist( 582 | ip, port, tobe_download_piece_idlist) 583 | except Exception as e: 584 | if type(e) == Timeout: 585 | logger.info( 586 | "Get resource from ip: %s, port: %s Time Out" % 587 | (ip, port)) 588 | self.del_from_tracker(ip=ip, peer_port=port) 589 | elif type(e) == PeerOverload: 590 | self.add_to_blacklist((ip, port), 
SHORT_BLACK_TIME) 591 | logger.warn( 592 | 'Peer ip: %s port: %s overload, let it go' % 593 | (ip, port)) 594 | elif type(e) == socket.error: 595 | if e.errno == socket.errno.ECONNRESET: 596 | logger.warn( 597 | 'Peer ip: %s port: %s reset, let it go' % 598 | (ip, port)) 599 | else: 600 | logger.exception("Socket error occurs") 601 | elif type(e) == IncompleteRead: 602 | logger.error("incomplete iter_chunk") 603 | elif type(e) == RateTooSlow: 604 | pass 605 | else: 606 | logger.error( 607 | "occurr %s when get piece from " 608 | "ip: %s, port: %s" 609 | " is unavaliable, so unregister it" % 610 | (e, ip, port)) 611 | self.del_from_tracker(ip=ip, peer_port=port) 612 | self._exit_piece_thread(ip, port) 613 | return 614 | 615 | except Exception: 616 | logger.exception( 617 | "A Exception occurs without dealing in _download_piece_thread") 618 | 619 | def _get_ip(self): 620 | if self._ex_ip: 621 | return self._ex_ip 622 | 623 | s = socket.socket() 624 | if ':' in self.tracker: 625 | domain, port_str = self.tracker.split(':') 626 | tracker_tuple = (domain, int(port_str)) 627 | else: 628 | tracker_tuple = (self.tracker, 80) 629 | 630 | try: 631 | s.connect(tracker_tuple) 632 | except socket.error as e: 633 | if e.errno == errno.ECONNREFUSED or e.errno == errno.ETIMEDOUT: 634 | raise TrackerUnavailable 635 | else: 636 | raise 637 | self._ex_ip = s.getsockname()[0] 638 | return self._ex_ip 639 | 640 | def upload_res(self, path=None, piece_file=None, res_name=None, rate=None): 641 | if res_name: 642 | self.upload_res_url = res_name 643 | 644 | if path: 645 | self.upload_filepath = path 646 | 647 | if piece_file: 648 | self.piece_file = piece_file 649 | 650 | if not self.piece_file and not path and not piece_file: 651 | raise ValueError( 652 | "The file that is uploaded" 653 | "should be download before or specify local path") 654 | 655 | if not rate: 656 | rate = self.upload_rate 657 | 658 | if not self.piece_file and path: 659 | self.piece_file = PieceFile.from_exist_file(path) 660 | 661 | res_mng = ResourceManager() 662 | res_mng.add_res(self.upload_res_url, self.piece_file) 663 | 664 | handler = handler_factory(res_mng, rate) 665 | SocketServer.TCPServer.allow_reuse_address = True 666 | 667 | self._is_uploading = True 668 | if pyinotify: 669 | self._start_pyinotify_when_res_ready( 670 | self.piece_file, self.uploader) 671 | 672 | self.uploader = GeventServer((self._get_ip(), 0), handler) 673 | t = threading.Thread( 674 | target=self.uploader.serve_forever, 675 | kwargs={'poll_interval': 0.02}) 676 | t.start() 677 | 678 | ip, self.port = self.uploader.socket.getsockname() 679 | 680 | def func(): 681 | interval = 5 682 | while self.is_activated(): 683 | try: 684 | self.add_to_tracker() 685 | except ConnectionError: 686 | logger.warn( 687 | 'Tracker is down, stop registering myself to tracker') 688 | self.close() 689 | remain = Minions.TIME_TO_TRACKER 690 | while remain > 0 and self.is_activated(): 691 | remain -= interval 692 | time.sleep(interval) 693 | 694 | Thread(target=func).start() 695 | 696 | def stop_pyinotify(self): 697 | self._pyinotify_event = False 698 | 699 | def _start_pyinotify_when_res_ready(self, piece_file, uploader): 700 | from models import UploaderEventHandler 701 | 702 | def func(): 703 | while not self.res_is_downloaded and self.is_activated(): 704 | logger.debug( 705 | 'wait for resource is downloaded and start the notifier') 706 | time.sleep(0.2) 707 | 708 | self._pyinotify_event = False 709 | if self.is_activated(): 710 | logger.debug('start notifier') 711 | wm = 
pyinotify.WatchManager() 712 | mask = pyinotify.IN_DELETE | pyinotify.IN_MODIFY |\ 713 | pyinotify.IN_MOVED_TO | pyinotify.IN_MOVE_SELF 714 | 715 | self.notifier = pyinotify.Notifier( 716 | wm, UploaderEventHandler(self)) 717 | wm.add_watch( 718 | os.path.dirname(piece_file.filepath), mask, rec=False) 719 | 720 | self._pyinotify_event = True 721 | gevent.spawn(self.notifier.loop) 722 | 723 | while self._pyinotify_event: 724 | gevent.sleep(0.2) 725 | 726 | self.notifier.stop() 727 | 728 | threading.Thread(target=func).start() 729 | 730 | def _wait_first_block_and_upload(self): 731 | self._wait_uploading = True 732 | interval = 0.5 733 | while self.is_activated(): 734 | logger.debug('wait_first_block') 735 | if self.piece_file.get_pieces_avail(): 736 | logger.info('start upload') 737 | self.upload_res() 738 | self._wait_uploading = False 739 | break 740 | else: 741 | time.sleep(interval) 742 | 743 | def start_ready_upload_thread(self): 744 | if not self.ready_upload_thread: 745 | logger.debug( 746 | 'wait the first piece downloaded, and start the uploader') 747 | self.ready_upload_thread = Thread( 748 | target=self._wait_first_block_and_upload) 749 | self.ready_upload_thread.start() 750 | else: 751 | logger.debug('ready upload thread is already started') 752 | 753 | def uploader_termiante(self): 754 | if self.uploader: 755 | self.uploader.shutdown() 756 | self._is_uploading = False 757 | 758 | def stop_upload(self): 759 | logger.info('delete resource: %s' % self.upload_res_url) 760 | # service_manager.del_res(self.upload_res_name) 761 | if self.uploader: 762 | self.del_myself_from_tracker() 763 | self.uploader.shutdown() 764 | self.uploader.socket.close() 765 | self._is_uploading = False 766 | 767 | def add_to_tracker(self): 768 | # for ip in self.ips: 769 | ip = self._get_ip() 770 | adt_info = self._get_adt_info() 771 | data = {'res': self.upload_res_url, 'ip': ip, 'port': self.port} 772 | if adt_info: 773 | data['adt_info'] = adt_info 774 | 775 | logger.debug( 776 | 'Register myself as peer' 777 | ' which provider res: %s to tracker, ' 778 | 'ip: %s, port: %s, adt_info: %s' % 779 | (self.upload_res_url, 780 | ip, self.port, 781 | adt_info)) 782 | 783 | ret = requests.post("{protocal}{host}/peer/".format( 784 | protocal=PROTOCOL, 785 | host=self.tracker, 786 | ), 787 | data=json.dumps(data), 788 | headers={'Content-Type': 'application/json; charset=utf-8'}) 789 | if ret.ok: 790 | logger.debug('Register successfully') 791 | else: 792 | logger.error( 793 | 'Failed to post myself to tracker, reason: %s' % 794 | ret.reason) 795 | 796 | def del_myself_from_tracker(self): 797 | self.del_from_tracker(self._get_ip(), self.port) 798 | 799 | def del_from_tracker(self, ip, peer_port): 800 | logger.info( 801 | 'Unregister peer from tracker, ' 802 | 'ip:%s, port:%s, res:%s,' % 803 | (ip, peer_port, self.upload_res_url)) 804 | try: 805 | ret = requests.delete( 806 | "{protocal}{host}/peer/?{res}&ip={ip}&port={peer_port}" 807 | .format( 808 | protocal=PROTOCOL, 809 | host=self.tracker, 810 | res=urlencode({'res': self.upload_res_url}), 811 | ip=ip, 812 | peer_port=peer_port) 813 | ) 814 | if ret.ok: 815 | logger.info('Unregister successfully') 816 | else: 817 | logger.error( 818 | 'Failed to unregister, reason: %s' % ret.reason) 819 | except ConnectionError: 820 | logger.error( 821 | 'Unregister peer from tracker failed, maybe tracker is down') 822 | 823 | def add_to_blacklist(self, peer, time=None): 824 | with self._black_lock: 825 | self._black_peerlist.add(peer, time) 826 | 827 | def 
get_blacklist(self): 828 | return [peer for peer in self._black_peerlist] 829 | 830 | def get_file_cursor(self): 831 | return self.piece_file.get_cursor() 832 | 833 | def _get_slow_level(self): 834 | rate_affort = 1024 ** 2 835 | 836 | if self.download_rate is not None: 837 | slow_level = min(rate_affort, self.download_rate / 2) 838 | else: 839 | slow_level = rate_affort 840 | return slow_level 841 | 842 | def _get_pieces_by_idlist(self, ip, port, piece_idlist): 843 | slow_level = self._get_slow_level() 844 | 845 | still_empty_idlist = [piece_id for piece_id, size in piece_idlist] 846 | logger.debug("fetch piece list: %s at once" % 847 | still_empty_idlist) 848 | range_str = generate_range_string(piece_idlist, 849 | self.piece_file.piece_size) 850 | 851 | ret = self._requests_session.get("{protocol}{ip}:{port}/?{res}".format( 852 | protocol=PROTOCOL, 853 | ip=ip, 854 | port=port, 855 | res=urlencode({'res_url': self.upload_res_url}) 856 | ), 857 | stream=True, 858 | headers={"Range": range_str}, 859 | timeout=2) 860 | 861 | if not ret.ok: 862 | # requests not ok 863 | logger.warn( 864 | "error occurs when peer thread getting" 865 | " res, reason:%s" % 866 | (ret.reason)) 867 | 868 | self.del_from_tracker(ip=ip, peer_port=port) 869 | self.piece_file.empty_ids(still_empty_idlist) 870 | self._exit_piece_thread(ip, port) 871 | return 872 | 873 | rate_wait_interval = 0.1 874 | rate_wait_cmltime = 0 875 | md5sum = ret.headers.get('Content-MD5', None) 876 | 877 | if md5sum: 878 | logger.info('get Content-MD5 in headers') 879 | md5list = md5sum.split(',') 880 | hasher = md5() 881 | else: 882 | logger.warn('No Content-MD5 in headers') 883 | 884 | try: 885 | self._judge_peer_status(ret.headers) 886 | except PeerOverload: 887 | if self.get_num_peer_in_conn() > 12: 888 | ret.raw.close() 889 | ret.close() 890 | self.piece_file.empty_ids(still_empty_idlist) 891 | raise 892 | else: 893 | # We get no choice 894 | logger.info( 895 | "Peer ip: %s port: %s is overload, " 896 | "but keep using it" % (ip, port) 897 | ) 898 | 899 | piece_once_num = len(piece_idlist) 900 | offset = 0 901 | 902 | all_piece_tsize = 0 903 | 904 | for _, size in piece_idlist: 905 | all_piece_tsize += size 906 | 907 | piece_id, size = piece_idlist[offset] 908 | 909 | # list contain every read block for wrote in piecefile 910 | buf = [] 911 | 912 | # every single block recived length 913 | buflen = 0 914 | chunksize = 4096 * 2 * 16 # * 8 915 | 916 | t_recv = 0 917 | fetch_piece_time_start = time.time() 918 | try: 919 | for chunk in ret.iter_content(chunksize): 920 | recv_size = len(chunk) 921 | if self.token_bucket: 922 | while not self.token_bucket.consume(recv_size): 923 | gevent.sleep(rate_wait_interval) 924 | rate_wait_cmltime += rate_wait_interval 925 | 926 | # That stuff after iter_comtent is cpu-relative work, 927 | # so switch out for doing network relative work first 928 | 929 | buflen += recv_size 930 | t_recv += recv_size 931 | 932 | if buflen < size: 933 | # this block is incomplete 934 | 935 | buf.append(chunk) 936 | if md5sum: 937 | hasher.update(chunk) 938 | 939 | else: 940 | # one piece was all received 941 | # one block is complete 942 | overflow_size = buflen - size 943 | 944 | # and data is too much for previous block 945 | # leave is for next block 946 | if overflow_size: 947 | block = chunk[:-overflow_size] 948 | else: 949 | block = chunk 950 | 951 | buf.append(block) 952 | to_write_str = "".join(buf) 953 | if md5sum: 954 | hasher.update(block) 955 | md5sum_get = hasher.hexdigest() 956 | if md5sum_get != 
md5list[offset]: 957 | logger.warn( 958 | 'md5sum is mismatch from headers: %s and ' 959 | 'body: %s, so this peer is tainted ip: %s ' 960 | 'port: %s' % 961 | (md5list[offset], md5sum_get, ip, port)) 962 | 963 | logger.warn( 964 | "get len: %s ,expected len: %s" % 965 | (len(to_write_str), size)) 966 | 967 | raise PieceChecksumError( 968 | md5list[offset], md5sum_get) 969 | return 970 | # reset hasher for next new block 971 | hasher = md5() 972 | else: 973 | logger.warn("no MD5SUM") 974 | 975 | fetch_piece_time_end = time.time() 976 | fetch_time = fetch_piece_time_end - fetch_piece_time_start 977 | logger.debug('get piece time: %.4f from ip: %s port: %s' 978 | % (fetch_time, ip, port)) 979 | 980 | if md5sum: 981 | logger.info('Fill piece_id: %s md5sum %s' % 982 | (piece_id, md5sum_get)) 983 | else: 984 | logger.info('Fill piece_id: %s' % (piece_id)) 985 | 986 | with elapsed_time() as pfill: 987 | self.piece_file.fill(piece_id, 988 | to_write_str[0:size], 989 | md5sum_get) 990 | 991 | logger.debug('fill piece time: %.4f' % pfill.elapsed_time) 992 | # gevent.sleep(0) 993 | del still_empty_idlist[0] 994 | 995 | if size > 1024 ** 2 / 10 and\ 996 | self.get_num_peer_in_conn() > 12: 997 | # ignore block less than 100K 998 | rate_in_tranform = size / \ 999 | (fetch_time - rate_wait_cmltime) 1000 | 1001 | if rate_in_tranform < slow_level: 1002 | # less than 1MB/s 1003 | logger.debug( 1004 | "rate_wait_cmltime: %s " 1005 | "size: %s " 1006 | "fetch time: %s" 1007 | % (rate_wait_cmltime, size, fetch_time) 1008 | ) 1009 | logger.warn( 1010 | 'This peer of ip: %s port: %s is ' 1011 | 'too slow to get resource, rate: %s/s' % 1012 | (ip, port, sizeof_fmt_human(rate_in_tranform))) 1013 | 1014 | raise RateTooSlow(rate_in_tranform, slow_level) 1015 | 1016 | offset += 1 1017 | if offset >= piece_once_num: 1018 | # finish 1019 | # should not run here 1020 | break 1021 | piece_id, size = piece_idlist[offset] 1022 | if overflow_size: 1023 | overflow_buf = chunk[-overflow_size:] 1024 | buf = [overflow_buf] 1025 | buflen = len(overflow_buf) 1026 | else: 1027 | overflow_buf = "" 1028 | buf = [] 1029 | buflen = 0 1030 | 1031 | if md5sum: 1032 | hasher.update(overflow_buf) 1033 | 1034 | rate_wait_cmltime = 0 1035 | fetch_piece_time_start = time.time() 1036 | 1037 | if t_recv != all_piece_tsize: 1038 | raise IncompleteRead 1039 | 1040 | except (ConnectionError, Timeout, PeerOverload, 1041 | socket.error, IncompleteRead, RateTooSlow, 1042 | ReadTimeout) as e: 1043 | self.piece_file.empty_ids(still_empty_idlist) 1044 | raise e 1045 | 1046 | def _get_pieces_once_from_ls(self, ls, peer_in_conn): 1047 | pieces_once = 16 - peer_in_conn 1048 | if pieces_once <= 4: 1049 | pieces_once = 4 1050 | 1051 | # let's get pieces 1052 | piece_idlist = [] 1053 | remain_pieces_num = pieces_once 1054 | while True: 1055 | pieces = ls.getlist(remain_pieces_num) 1056 | if not pieces: 1057 | break 1058 | 1059 | for piece in self.piece_file.get_unalloc_piece_for_fetch( 1060 | [p_id for p_id, size in pieces]): 1061 | 1062 | piece_id, size = piece 1063 | 1064 | piece_idlist.append((piece_id, size)) 1065 | remain_pieces_num -= 1 1066 | 1067 | if remain_pieces_num == 0: 1068 | # get the amount we need 1069 | break 1070 | 1071 | return piece_idlist 1072 | 1073 | def _judge_peer_status(self, headers): 1074 | peer_status = headers.get("Minions-Status", None) 1075 | if peer_status: 1076 | if peer_status == 'overload': 1077 | if self.get_num_peer_in_conn() > 12: 1078 | raise PeerOverload 1079 | 1080 | 1081 | if __name__ == '__main__': 1082 | try: 
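        # Demo: seed two already-downloaded files as upload-only peers.
        # The resource names, tracker address, and path are example values;
        # upload_res(path=...) wraps an existing local file in a PieceFile
        # and serves it to other peers while registering with the tracker.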
1083 | minions_1 = Minions('AliOS5U7-x86-64.tgz', tracker='localhost:6000') 1084 | minions_1.upload_res(path='/home/aliclone/download/os/') 1085 | 1086 | minions_2 = Minions('AliOS6U2-x86-64.tgz', tracker='localhost:6000') 1087 | minions_2.upload_res(path='/home/aliclone/download/os/') 1088 | 1089 | except KeyboardInterrupt: 1090 | minions_1.uploader.terminate() 1091 | minions_2.uploader.terminate() 1092 | -------------------------------------------------------------------------------- /peer/excepts.py: -------------------------------------------------------------------------------- 1 | 2 | class NoPeersFound(Exception): 3 | pass 4 | 5 | class TrackerUnavailable(Exception): 6 | pass 7 | 8 | class UnknowStrict(Exception): 9 | pass 10 | 11 | class DownloadError(Exception): 12 | pass 13 | 14 | class PeerOverload(Exception): 15 | pass 16 | 17 | class ChunkNotReady(Exception): 18 | pass 19 | 20 | class OriginURLConnectError(Exception): 21 | pass 22 | 23 | class PieceChecksumError(Exception): 24 | pass 25 | 26 | class IncompleteRead(Exception): 27 | pass 28 | 29 | class RateTooSlow(Exception): 30 | pass 31 | -------------------------------------------------------------------------------- /peer/libs/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/alibaba/minion/e98530138bbac133c52d3e1ee05b8ec392bfb916/peer/libs/__init__.py -------------------------------------------------------------------------------- /peer/libs/mrequests/__init__.py: -------------------------------------------------------------------------------- 1 | 2 | from gevent import monkey 3 | 4 | monkey.patch_all(thread=False, select=False) 5 | 6 | import requests 7 | -------------------------------------------------------------------------------- /peer/models.py: -------------------------------------------------------------------------------- 1 | 2 | import os 3 | import sys 4 | import time 5 | from hashlib import md5 6 | from utils import sizeof_fmt_human, md5sum 7 | from threading import Thread, RLock 8 | from peer.excepts import ChunkNotReady 9 | 10 | 11 | class BitMap(dict): 12 | """ 13 | an bitmap descript which block is fetching/empty 14 | True mean fetching 15 | False mean empty 16 | delete mean filled 17 | """ 18 | def __init__(self, length, full=False, *args, **kwargs): 19 | super(BitMap, self).__init__(*args, **kwargs) 20 | if not full: 21 | for i in range(length): 22 | self[i] = False 23 | 24 | self.size = length 25 | 26 | def get_fetched_block(self): 27 | ret_list = [] 28 | for i in range(self.size): 29 | try: 30 | self[i] 31 | except KeyError: 32 | ret_list.append(i) 33 | return ret_list 34 | 35 | def get_fetching_block(self): 36 | for item in self.keys(): 37 | if self[item] is True: 38 | return item 39 | 40 | def get_empty_block(self): 41 | for item in self.keys(): 42 | if self[item] is False: 43 | return item 44 | 45 | def set_fetching_to_empty(self): 46 | for item in self.keys(): 47 | self[item] = False 48 | 49 | def fill(self, item): 50 | del self[item] 51 | 52 | def set_empty(self, blockid): 53 | self[blockid] = False 54 | 55 | def set_fetching(self, blockid): 56 | self[blockid] = True 57 | 58 | def filter_empty_block(self, block_list): 59 | ret_list = [] 60 | for blockid in block_list: 61 | try: 62 | if self[blockid] is False: 63 | ret_list.append(blockid) 64 | except KeyError: 65 | pass 66 | 67 | return ret_list 68 | 69 | def filter_fetching_block(self, block_list): 70 | ret_list = [] 71 | for blockid in block_list: 72 | if self[blockid] 
is True: 73 | ret_list.append(blockid) 74 | 75 | return ret_list 76 | 77 | def filter_exist_block(self, block_list): 78 | ret_list = [] 79 | for blockid in block_list: 80 | try: 81 | self[blockid] 82 | except KeyError: 83 | ret_list.append(blockid) 84 | 85 | return ret_list 86 | 87 | filter_fetched_block = filter_exist_block 88 | 89 | 90 | class PieceFileCursor(object): 91 | def __init__(self, piece_file, lock): 92 | self._piece_file = piece_file 93 | self._cursor_offset = 0 94 | self._start_offset = 0 95 | self._lock = lock 96 | self.md5 = md5() 97 | self.fileob = self._piece_file.fileob 98 | 99 | def write(self, buf): 100 | write_len = len(buf) 101 | self.md5.update(buf) 102 | 103 | with self._lock: 104 | self.fileob.seek(self._cursor_offset) 105 | self.fileob.write(buf) 106 | 107 | self._cursor_offset += write_len 108 | 109 | piece_id = self._start_offset / self._piece_file.piece_size 110 | if self._start_offset % self._piece_file.piece_size: 111 | piece_id += 1 112 | 113 | piece_id, size = self._piece_file.get_piece_info(piece_id) 114 | piece_start_offset = piece_id * self._piece_file.piece_size 115 | if (self._cursor_offset - piece_start_offset)\ 116 | >= size: 117 | self._piece_file.fileob.flush() 118 | 119 | self._piece_file.filled(piece_id, md5=self.md5.hexdigest()) 120 | self.md5 = md5() 121 | self._start_offset = piece_start_offset + size 122 | 123 | def read(self, length): 124 | if self.is_readable(length): 125 | with self._lock: 126 | self.fileob.seek(self._cursor_offset) 127 | content = self.fileob.read(length) 128 | self.seek(self._cursor_offset + len(content)) 129 | return content 130 | else: 131 | raise ChunkNotReady() 132 | 133 | def is_readable(self, length): 134 | start_piece_id = self._piece_file.get_piece_id(self._cursor_offset) 135 | end_piece_id = self._piece_file.get_piece_id( 136 | self._cursor_offset + length) 137 | 138 | for i in range(start_piece_id, end_piece_id + 1): 139 | if self._piece_file.is_filled(i): 140 | pass 141 | else: 142 | return False 143 | return True 144 | 145 | def seek(self, offset): 146 | self._cursor_offset = offset 147 | self._start_offset = offset 148 | 149 | def tell(self): 150 | return self._cursor_offset 151 | 152 | 153 | class PieceFile(object): 154 | def __init__(self, filesize, filepath, full=False, *args, **kwargs): 155 | self.max_len = int(filesize) 156 | self._lock = RLock() 157 | 158 | self.filepath = filepath 159 | self.full = full 160 | if full: 161 | self.fileob = file(filepath) 162 | else: 163 | self.fileob = file(filepath, 'w+') 164 | 165 | self.piece_size = 1024 ** 2 166 | 167 | piece_num = self.max_len / self.piece_size 168 | self.max_piece_id = piece_num 169 | 170 | last_piece_size = self.max_len % self.piece_size 171 | if last_piece_size >= 0: 172 | piece_num += 1 173 | self.last_piece_size = last_piece_size 174 | 175 | self.piece_map = BitMap(piece_num, full) 176 | 177 | self.piece_hash_map = dict() 178 | if full: 179 | for i in range(piece_num): 180 | hasher = md5() 181 | piece_id, piece_size = self.get_piece_info(i) 182 | for j in range(piece_size / 8192): 183 | chunk = self.fileob.read(8192) 184 | hasher.update(chunk) 185 | 186 | last_remain = piece_size % 8192 187 | 188 | if last_remain: 189 | chunk = self.fileob.read(last_remain) 190 | hasher.update(chunk) 191 | 192 | md5sum = hasher.hexdigest() 193 | self.piece_hash_map[piece_id] = md5sum 194 | 195 | def get_cursor(self, offset=0, piece_id=None): 196 | if piece_id: 197 | offset = piece_id * self.piece_size 198 | 199 | pfc = PieceFileCursor(self, self._lock) 200 | 
pfc.seek(offset) 201 | return pfc 202 | 203 | @classmethod 204 | def from_exist_file(cls, filepath): 205 | size = os.stat(filepath)[6] 206 | pf = cls(size, filepath, full=True) 207 | return pf 208 | 209 | def get_real_filesize(self, human=True): 210 | downloaded_size = self.max_len - len(self.piece_map.keys()) * \ 211 | self.piece_size 212 | 213 | if downloaded_size < 0: 214 | downloaded_size = 0 215 | 216 | if human: 217 | return sizeof_fmt_human(downloaded_size) 218 | else: 219 | return downloaded_size 220 | 221 | def fill(self, piece_id, buf, md5=None): 222 | with self._lock: 223 | start = piece_id * self.piece_size 224 | self.fileob.seek(start) 225 | self.fileob.write(buf) 226 | self.fileob.flush() 227 | if md5: 228 | piece_md5 = md5 229 | else: 230 | piece_md5 = md5sum(buf) 231 | 232 | self.filled(piece_id, md5=piece_md5) 233 | 234 | def filled(self, piece_id, md5=None): 235 | with self._lock: 236 | try: 237 | del self.piece_map[piece_id] 238 | except KeyError: 239 | pass 240 | 241 | if md5: 242 | self.piece_hash_map[piece_id] = md5 243 | 244 | def is_filled(self, piece_id): 245 | try: 246 | self.piece_map[piece_id] 247 | except KeyError: 248 | return True 249 | return False 250 | 251 | def get_piece_md5(self, piece_id): 252 | return self.piece_hash_map[piece_id] 253 | 254 | def empty(self, piece_id): 255 | with self._lock: 256 | self.piece_map.set_empty(piece_id) 257 | 258 | def empty_ids(self, piece_ids): 259 | with self._lock: 260 | for piece_id in piece_ids: 261 | self.piece_map.set_empty(piece_id) 262 | 263 | def get_unalloc_piece(self, piece_idlist=None): 264 | # It is thread safe 265 | with self._lock: 266 | if piece_idlist: 267 | ret = [] 268 | empty_piece_idlist = self.piece_map.filter_empty_block( 269 | piece_idlist) 270 | for piece_id in empty_piece_idlist: 271 | ret.append(self.get_piece_info(piece_id)) 272 | return ret 273 | 274 | piece_id = self.piece_map.get_empty_block() 275 | if piece_id == self.max_piece_id and self.last_piece_size > 0: 276 | return piece_id, self.last_piece_size 277 | elif piece_id is None: 278 | return None, None 279 | else: 280 | return piece_id, self.piece_size 281 | 282 | def get_unalloc_piece_for_fetch(self, piece_idlist=None): 283 | with self._lock: 284 | ret = self.get_unalloc_piece(piece_idlist) 285 | if piece_idlist: 286 | for piece_id, size in ret: 287 | self.piece_map.set_fetching(piece_id) 288 | else: 289 | piece_id, size = ret 290 | if piece_id is not None: 291 | self.piece_map.set_fetching(piece_id) 292 | return ret 293 | 294 | def get_unalloc_piece_by_id(self, piece_id): 295 | with self._lock: 296 | empty_piece_idlist = self.piece_map.filter_empty_block([piece_id]) 297 | try: 298 | piece_id = empty_piece_idlist[0] 299 | return self.get_piece_info(piece_id) 300 | except IndexError: 301 | return None, None 302 | 303 | def get_unalloc_piece_by_id_for_fetch(self, piece_id): 304 | with self._lock: 305 | piece_id, size = self.get_unalloc_piece_by_id(piece_id) 306 | if piece_id is not None: 307 | self.piece_map.set_fetching(piece_id) 308 | 309 | return piece_id, size 310 | 311 | def get_piece_info(self, piece_id): 312 | if piece_id == self.max_piece_id and self.last_piece_size > 0: 313 | return piece_id, self.last_piece_size 314 | else: 315 | return piece_id, self.piece_size 316 | 317 | def get_piece_id(self, offset): 318 | piece_id = offset / self.piece_size 319 | if offset % self.piece_size: 320 | piece_id += 1 321 | 322 | return piece_id 323 | 324 | def put_to_queue(self): 325 | for i in self.piece_map.keys(): 326 | self.queue.put(i) 327 
| 328 |     def get_empty_piece(self):
329 |         return self.piece_map.get_empty_block()
330 | 
331 |     def has_empty(self):
332 |         # empty means neither fetching nor filled
333 |         with self._lock:
334 |             if self.piece_map.get_empty_block() is not None:
335 |                 return True
336 |             else:
337 |                 return False
338 | 
339 |     def has_unalloc(self):
340 |         # NOTE: the result may already be stale when the caller acts on it
341 |         with self._lock:
342 |             return len(self.piece_map.keys()) > 0
343 | 
344 |     def get_pieces_avail(self, piece_idlist=None):
345 |         if piece_idlist:
346 |             return self.piece_map.filter_fetched_block(piece_idlist)
347 |         else:
348 |             return self.piece_map.get_fetched_block()
349 | 
350 |     def __len__(self):
351 |         return self.max_len
352 | 
353 | 
354 | class ExpireWrapper(object):
355 |     def __init__(self, obj, expire_time=2):
356 |         self._obj = obj
357 |         self.expire_time = time.time() + expire_time
358 |         self._is_expired = False
359 | 
360 |     def is_expired(self):
361 |         if not self._is_expired:
362 |             self._is_expired = (time.time() - self.expire_time) > 0
363 |         return self._is_expired
364 | 
365 |     def set_expire_time(self, expire_time=2):
366 |         self.expire_time = time.time() + expire_time
367 | 
368 |     def get_obj(self):
369 |         return self._obj
370 | 
371 |     def __str__(self):
372 |         return str(self._obj)
373 | 
374 |     def __repr__(self):
375 |         return repr(self._obj)
376 | 
377 | 
378 | class ExpireStorage(object):
379 |     def __init__(self, expire_time=2):
380 |         self._set = set()
381 |         self._expire_time = expire_time
382 | 
383 |     def add(self, obj, time=None):
384 |         if time:  # a caller-supplied TTL overrides the default
385 |             expire_time = time
386 |         else:
387 |             expire_time = self._expire_time
388 | 
389 |         if obj in self:
390 |             for ew_obj in self._set:
391 |                 if ew_obj.is_expired():
392 |                     continue
393 |                 else:
394 |                     if ew_obj.get_obj() == obj:
395 |                         ew_obj.set_expire_time(expire_time)
396 |         else:
397 |             ew_obj = ExpireWrapper(obj, expire_time)
398 |             self._set.add(ew_obj)
399 | 
400 |     def __iter__(self):
401 |         for ew_obj in self._set.copy():
402 |             if ew_obj.is_expired():
403 |                 continue
404 |             else:
405 |                 yield ew_obj.get_obj()
406 | 
407 |     def __str__(self):
408 |         return str(self._set)
409 | 
410 |     def __repr__(self):
411 |         return repr(self._set)
412 | 
413 | 
414 | class ExcThread(Thread):
415 |     def __init__(self, *args, **kwargs):
416 |         super(ExcThread, self).__init__(*args, **kwargs)
417 |         self._exc_info = None
418 | 
419 |     def run(self):
420 |         try:
421 |             super(ExcThread, self).run()
422 |         except:
423 |             self._exc_info = sys.exc_info()
424 | 
425 |     def join(self, timeout=None):
426 |         super(ExcThread, self).join(timeout)
427 |         if self._exc_info:
428 |             raise self._exc_info[0], self._exc_info[1], self._exc_info[2]
429 | 
430 | 
431 | try:
432 |     import pyinotify
433 | 
434 |     class UploaderEventHandler(pyinotify.ProcessEvent):
435 |         def __init__(self, minion):
436 |             super(UploaderEventHandler, self).__init__()
437 |             self.minion = minion
438 |             self.filepath = os.path.abspath(minion.piece_file.filepath)
439 | 
440 |         def is_my_filepath(self, filepath):
441 |             return self.filepath == filepath
442 | 
443 |         def stop_upload(self):
444 |             self.minion.logger.info('File %s was modified, stop uploading' %
445 |                                     self.filepath)
446 |             self.minion.stop_pyinotify()
447 |             self.minion.stop_upload()
448 | 
449 |         def process_IN_DELETE(self, event):
450 |             if self.is_my_filepath(event.pathname):
451 |                 self.minion.logger.info('DELETE')
452 |                 self.stop_upload()
453 | 
454 |         def process_IN_MODIFY(self, event):
455 |             if self.is_my_filepath(event.pathname):
456 |                 self.minion.logger.info('MODIFY')
457 |                 self.stop_upload()
458 | 
459 |         def
process_IN_MOVED_TO(self, event): 460 | if self.is_my_filepath(event.pathname): 461 | self.minion.logger.info('MOVED TO') 462 | self.stop_upload() 463 | 464 | def process_IN_MOVE_SELF(self, event): 465 | if self.is_my_filepath(event.pathname): 466 | self.minion.logger.info('MOVE self') 467 | self.stop_upload() 468 | 469 | except ImportError: 470 | pyinotify = None 471 | -------------------------------------------------------------------------------- /peer/server.py: -------------------------------------------------------------------------------- 1 | import os 2 | import sys 3 | import json 4 | import traceback 5 | import requests 6 | import threading 7 | import logging 8 | 9 | import gevent 10 | from gevent import monkey 11 | 12 | import SocketServer 13 | import BaseHTTPServer 14 | from urlparse import urlparse, parse_qsl 15 | from urllib import urlencode 16 | from SimpleHTTPServer import SimpleHTTPRequestHandler 17 | 18 | from peer.utils import TokenBucket, elapsed_time, sizeof_fmt_human 19 | from peer.utils import gevent_sendfile 20 | 21 | monkey.patch_select() 22 | logger = logging.getLogger('peer.server') 23 | 24 | 25 | class ResourceManager(dict): 26 | def add_res(self, res_dest, local_path): 27 | self[res_dest] = local_path 28 | 29 | def del_res(self, res_dest): 30 | del self[res_dest] 31 | 32 | 33 | res_mng = ResourceManager() 34 | 35 | 36 | class ServiceManager(object): 37 | def __init__(self): 38 | self.res_mng = res_mng 39 | SocketServer.TCPServer.allow_reuse_address = True 40 | self.httpd = None 41 | self.server_running = False 42 | self._lock = threading.RLock() 43 | 44 | def is_server_running(self): 45 | return self.server_running 46 | 47 | def run_server(self, rate=None): 48 | with self._lock: 49 | if self.server_running: 50 | return 51 | self.httpd = GeventServer(('0.0.0.0', 0), PeerHandler) 52 | t = threading.Thread( 53 | target=self.httpd.serve_forever, 54 | kwargs={'poll_interval': 0.01}) 55 | t.start() 56 | self.server_running = True 57 | 58 | def shutdown_server(self): 59 | with self._lock: 60 | if self.httpd: 61 | self.httpd.shutdown() 62 | self.server_running = False 63 | 64 | def get_httpd_server(self): 65 | return self.httpd 66 | 67 | def add_res(self, res_dest, local_path): 68 | with self._lock: 69 | self.res_mng.add_res(res_dest, local_path) 70 | 71 | def del_res(self, res_dest): 72 | with self._lock: 73 | self.res_mng.del_res(res_dest) 74 | if not self.res_mng.keys(): 75 | self.shutdown_server() 76 | 77 | 78 | service_manager = ServiceManager() 79 | 80 | 81 | def set_upload_rate(rate=None): 82 | if rate: 83 | PeerHandler.token_bucket = TokenBucket(rate) 84 | else: 85 | PeerHandler.token_bucket = None 86 | 87 | 88 | class ConnInfo(object): 89 | def __init__(self): 90 | self.conn_num = 0 91 | self._lock = threading.Lock() 92 | 93 | def conn_increase(self): 94 | with self._lock: 95 | self.conn_num += 1 96 | 97 | def conn_decrease(self): 98 | with self._lock: 99 | self.conn_num -= 1 100 | 101 | 102 | class ThreadingsServer(SocketServer.ThreadingMixIn, 103 | BaseHTTPServer.HTTPServer): 104 | 105 | request_queue_size = 1024 106 | 107 | 108 | class GeventMixIn: 109 | def process_request_thread(self, request, client_address): 110 | try: 111 | self.finish_request(request, client_address) 112 | self.shutdown_request(request) 113 | except: 114 | self.handle_error(request, client_address) 115 | self.shutdown_request(request) 116 | 117 | def process_request(self, request, client_address): 118 | gevent.spawn(self.process_request_thread, request, client_address) 119 | 120 | 121 
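# Note: GeventMixIn mirrors SocketServer.ThreadingMixIn, but process_request()
# spawns a gevent greenlet per connection instead of an OS thread, which keeps
# large numbers of keep-alive peer connections cheap. Only select() is
# monkey-patched above; the range-download path yields cooperatively through
# gevent_sendfile()/wait_write() from peer.utils. GeventServer further below
# plugs this mix-in into BaseHTTPServer.HTTPServer.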
| class PeerHandler(SimpleHTTPRequestHandler): 122 | 123 | protocol_version = "HTTP/1.1" 124 | 125 | res_mng = None 126 | token_bucket = None 127 | conn_info = None 128 | 129 | rbufsize = 4194304 130 | wbufsize = 4194304 131 | 132 | def log_message(self, format, *args): 133 | logger.info("%s %s- - [%s] %s" % 134 | (self.client_address[0], self.client_address[1], 135 | self.log_date_time_string(), 136 | format % args) 137 | ) 138 | 139 | def do_GET(self): 140 | ret = None 141 | with elapsed_time() as p: 142 | self.conn_info.conn_increase() 143 | try: 144 | logger.info('handler conn_info: %s' % self.conn_info.conn_num) 145 | if self.token_bucket: 146 | logger.info('Server network rate: %s/s' % 147 | sizeof_fmt_human(self.token_bucket.get_rate())) 148 | ret = self._do_GET() 149 | except Exception as e: 150 | ex_type, ex, tb = sys.exc_info() 151 | logger.error( 152 | 'PeerHandler Error %s, traceback: %s' % 153 | ( 154 | ex, traceback.format_list( 155 | traceback.extract_tb(tb)) 156 | ) 157 | ) 158 | self.send_error(500, "inner error %s" % type(e)) 159 | finally: 160 | self.conn_info.conn_decrease() 161 | 162 | logger.debug('access %s use time: %.4f' % (self.path, p.elapsed_time)) 163 | if p.elapsed_time > 0.4: 164 | logger.debug(self.headers["range"]) 165 | return ret 166 | 167 | def _do_GET(self): 168 | self.parse_param() 169 | res_url = self.GET.get('res_url', None) 170 | pieces = self.GET.get('pieces', None) 171 | 172 | realpath = None 173 | if res_url: 174 | try: 175 | piecefile = self.res_mng[res_url] 176 | except KeyError: 177 | self.send_error(404, "File not found") 178 | return 179 | else: 180 | # error raise 181 | self.send_error(404, "res_url should be specified") 182 | return 183 | 184 | if pieces: 185 | ret = {'status': 'normal'} 186 | 187 | if pieces == 'all': 188 | if self.token_bucket and\ 189 | self.token_bucket.get_rate_usage() > 0.75: 190 | ret['status'] = 'overload' 191 | 192 | else: 193 | ret['result'] = piecefile.get_pieces_avail() 194 | else: 195 | pieces_list = pieces.split(',') 196 | ret['result'] = piecefile.get_pieces_avail(pieces_list) 197 | 198 | json_result = json.dumps(ret) 199 | self.send_200_head(len(json_result)) 200 | self.wfile.write(json_result) 201 | return 202 | 203 | realpath = piecefile.filepath 204 | 205 | self.byte_ranges = [] 206 | f = self.send_head(realpath, piecefile) 207 | 208 | # sendfile will skip user space buffer 209 | self.wfile.flush() 210 | chunksize = 4096 * 2 * 16 211 | if f: 212 | if self.byte_ranges: 213 | wfileno = self.wfile.fileno() 214 | rfileno = f.fileno() 215 | for start, end in self.byte_ranges: 216 | 217 | left_size = end - start + 1 218 | while True: 219 | if left_size > 0: 220 | tmp_chunksize = min(left_size, chunksize) 221 | if self.token_bucket: 222 | while not self.token_bucket.consume( 223 | tmp_chunksize): 224 | gevent.sleep(0.02) 225 | 226 | gevent_sendfile( 227 | wfileno, rfileno, start, tmp_chunksize) 228 | left_size -= tmp_chunksize 229 | start += tmp_chunksize 230 | else: 231 | break 232 | else: 233 | self.copyfile(f, self.wfile) 234 | f.close() 235 | 236 | def parse_param(self): 237 | self.GET = {} 238 | query = urlparse(self.path).query 239 | qs = parse_qsl(query) 240 | if qs: 241 | for k, v in qs: 242 | self.GET[k] = v 243 | return True 244 | 245 | def _do_range(self, f, piecefile=None): 246 | # byte_range: bytes=1-199,2-33 247 | byte_range = self.headers['range'] 248 | 249 | byte_unit, file_range = byte_range.split('=') 250 | parts = file_range.split(',') 251 | t_length = 0 252 | md5_list = [] 253 | for part in 
parts: 254 | range_list = part.split('-') 255 | if len(range_list) == 2: 256 | range_pair = range_list 257 | 258 | fs = os.fstat(f.fileno()) 259 | if piecefile: 260 | content_len = str(piecefile.max_len) 261 | else: 262 | content_len = str(fs[6]) 263 | 264 | start, end = range_pair 265 | start = int(start) 266 | end = int(end) 267 | self.byte_ranges.append((start, end)) 268 | 269 | length = end - start + 1 270 | t_length += length 271 | 272 | md5 = None 273 | if piecefile: 274 | piece_id = start / piecefile.piece_size 275 | md5 = piecefile.get_piece_md5(piece_id) 276 | 277 | if md5: 278 | md5_list.append(md5) 279 | else: 280 | md5_list.append("") 281 | else: 282 | pass 283 | 284 | self.send_response(206) 285 | self.send_header("Content-Length", t_length) 286 | self.send_header("Content-Range", byte_range + '/' + content_len) 287 | if md5: 288 | self.send_header("Content-MD5", ",".join(md5_list)) 289 | 290 | if self.token_bucket and \ 291 | self.token_bucket.get_rate_usage() > 0.75: 292 | self.send_header("Minions-Status", "overload") 293 | self.send_header("Last-Modified", self.date_time_string(fs.st_mtime)) 294 | self.end_headers() 295 | return f 296 | 297 | def send_200_head(self, length): 298 | self.send_response(200) 299 | self.send_header("Location", self.path + "/") 300 | self.send_header("Content-Length", length) 301 | self.end_headers() 302 | return None 303 | 304 | def send_head(self, realpath=None, piecefile=None): 305 | if not realpath: 306 | path = self.translate_path(self.path) 307 | else: 308 | path = realpath 309 | f = None 310 | if os.path.isdir(path): 311 | if not self.path.endswith('/'): 312 | # redirect browser - doing basically what apache does 313 | self.send_response(301) 314 | self.send_header("Location", self.path + "/") 315 | self.end_headers() 316 | return None 317 | for index in "index.html", "index.htm": 318 | index = os.path.join(path, index) 319 | if os.path.exists(index): 320 | path = index 321 | break 322 | else: 323 | return self.list_directory(path) 324 | ctype = self.guess_type(path) 325 | try: 326 | # Always read in binary mode. Opening files in text mode may cause 327 | # newline translations, making the actual size of the content 328 | # transmitted *less* than the content-length! 
329 | f = open(path, 'rb') 330 | except IOError: 331 | self.send_error(404, "File not found") 332 | return None 333 | 334 | if self.headers.get('range', None): 335 | return self._do_range(f, piecefile) 336 | else: 337 | fs = os.fstat(f.fileno()) 338 | content_len = str(fs[6]) 339 | 340 | self.send_response(200) 341 | self.send_header("Content-type", ctype) 342 | self.send_header("Content-Length", content_len) 343 | self.send_header("Last-Modified", 344 | self.date_time_string(fs.st_mtime)) 345 | self.end_headers() 346 | return f 347 | 348 | 349 | class GeventServer(GeventMixIn, BaseHTTPServer.HTTPServer): 350 | request_queue_size = 1024 351 | 352 | 353 | def handler_factory(res_mng, rate): 354 | newclass = type("NewHanlder", (PeerHandler, object), {}) 355 | newclass.res_mng = res_mng 356 | newclass.conn_info = ConnInfo() 357 | if rate: 358 | newclass.token_bucket = TokenBucket(rate) 359 | return newclass 360 | 361 | 362 | if __name__ == '__main__': 363 | sm = ServiceManager() 364 | res_url = 'http://yum.tbsite.net/bigfile' 365 | sm.add_res(res_url, '/tmp/bigfile') 366 | sm.run_server() 367 | 368 | ip, port = sm.get_httpd_server().socket.getsockname() 369 | print 'server in %s:%s' % (ip, port) 370 | try: 371 | requests.get('http://localhost:%s?%s' % 372 | (port, urlencode({'res_url': res_url})), 373 | headers={"Range": "bytes=0-1"} 374 | ) 375 | finally: 376 | sm.shutdown_server() 377 | -------------------------------------------------------------------------------- /peer/utils.py: -------------------------------------------------------------------------------- 1 | 2 | import cgi 3 | import time 4 | import socket 5 | import hashlib 6 | import logging.config 7 | import urlparse 8 | import requests 9 | import pstats 10 | import StringIO 11 | import cProfile 12 | from threading import Lock 13 | from threading import Thread 14 | from errno import EAGAIN 15 | from sendfile import sendfile as original_sendfile 16 | from gevent.socket import wait_write 17 | 18 | try: 19 | import pyinotify 20 | except ImportError: 21 | pyinotify = None 22 | 23 | 24 | def _get_attach_filename(content_disposition): 25 | # content_disposition = attachment; filename=clonescripts.tgz" 26 | if content_disposition: 27 | content_type, params = cgi.parse_header( 28 | content_disposition) 29 | return params['filename'] 30 | return "" 31 | 32 | 33 | class ListStore(object): 34 | def __init__(self, list_obj): 35 | self._list = list_obj 36 | self._offset = 0 37 | self._length = len(list_obj) 38 | 39 | def getlist(self, num=1): 40 | new_list = self._list[self._offset:self._offset + num] 41 | self._offset += num 42 | return new_list 43 | 44 | def empty(self): 45 | return self._offset >= self._length 46 | 47 | def get(self): 48 | obj = self._list[self._offset] 49 | self._offset += 1 50 | return obj 51 | 52 | 53 | def get_res_length(url): 54 | r = requests.get(url, stream=True) 55 | r.close() 56 | if not r.ok: 57 | r.raise_for_status() 58 | 59 | length_str = r.headers['Content-Length'] 60 | 61 | return int(length_str) 62 | 63 | 64 | def http_download_to_piecefile(url, piecefile, thread=False): 65 | def func(url, piecefile): 66 | chunksize = 8192 67 | 68 | r = requests.get(url, stream=True) 69 | 70 | if not r.ok: 71 | r.raise_for_status() 72 | 73 | cursor = piecefile.get_cursor() 74 | t_size = 0 75 | for chunk in r.iter_content(chunksize): 76 | t_size += len(chunk) 77 | if chunk: 78 | cursor.write(chunk) 79 | else: 80 | break 81 | 82 | if thread: 83 | t = Thread(target=func, kwargs={'url': url, 'piecefile': piecefile}) 84 | 
t.start() 85 | else: 86 | func(url, piecefile) 87 | 88 | 89 | def sizeof_fmt_human(num): 90 | if type(num) == str: 91 | num = float(num) 92 | for x in ['bytes', 'KB', 'MB', 'GB', 'TB']: 93 | if num < 1024.0: 94 | return "%3.1f %s" % (num, x) 95 | num /= 1024.0 96 | 97 | 98 | def hash_file(filepath, hashtype): 99 | hash_obj = getattr(hashlib, hashtype) 100 | myhash = hash_obj() 101 | with file(filepath) as f: 102 | myslice = f.read(8196) 103 | while myslice: 104 | myhash.update(myslice) 105 | myslice = f.read(8196) 106 | 107 | return myhash.hexdigest() 108 | 109 | 110 | def logging_config(level='INFO', logfile=None): 111 | if level == "DEBUG": 112 | format_string = '%(asctime)s %(levelname)s %(name)s'\ 113 | ' %(thread)d %(message)s' 114 | else: 115 | format_string = '%(asctime)s %(levelname)s %(name)s'\ 116 | ' %(message)s' 117 | 118 | LOGGING = { 119 | 'version': 1, 120 | 'disable_existing_loggers': False, 121 | 'formatters': { 122 | 'default': { 123 | 'format': format_string, 124 | }, 125 | }, 126 | 'handlers': { 127 | 'console': { 128 | 'class': 'logging.StreamHandler', 129 | 'level': 'DEBUG', 130 | 'formatter': 'default' 131 | } 132 | }, 133 | 'root': { 134 | 'level': level, 135 | 'handlers': [] 136 | }, 137 | 'loggers': { 138 | 'requests': { 139 | 'level': 'ERROR' 140 | }, 141 | } 142 | } 143 | 144 | if logfile: 145 | LOGGING['handlers']['file'] = { 146 | 'class': 'logging.FileHandler', 147 | 'filename': logfile, 148 | 'level': level, 149 | 'mode': 'w', 150 | 'formatter': 'default' 151 | } 152 | 153 | LOGGING['root']['handlers'].append('file') 154 | else: 155 | LOGGING['root']['handlers'].append('console') 156 | 157 | logging.config.dictConfig(LOGGING) 158 | 159 | 160 | class TokenBucket(object): 161 | """An implementation of the token bucket algorithm. 162 | 163 | >>> bucket = TokenBucket(80, 0.5) 164 | >>> print bucket.consume(10) 165 | True 166 | >>> print bucket.consume(90) 167 | False 168 | """ 169 | 170 | stat_interval = 0.4 171 | 172 | def __init__(self, fill_rate): 173 | """ 174 | tokens is the total tokens in the bucket. fill_rate is the 175 | rate in tokens/second that the bucket will be refilled. 176 | """ 177 | self._tokens = 0 178 | self._last_consumed_tokens = 0 179 | self._consumed_tokens = 0 180 | self.fill_rate = float(fill_rate) 181 | self.timestamp = time.time() 182 | self._rate_ts = time.time() 183 | self.rate = 0 184 | self._lock = Lock() 185 | 186 | self._run_stat_thread = False 187 | 188 | def get_rate_usage(self): 189 | return self.rate / self.fill_rate 190 | 191 | def start_stat_thread(self): 192 | self._run_stat_thread = True 193 | 194 | def func(): 195 | while self._run_stat_thread: 196 | self.stat_rate() 197 | time.sleep(self.stat_interval) 198 | 199 | Thread(target=func).start() 200 | 201 | def stop_stat_thread(self): 202 | self._run_stat_thread = False 203 | 204 | def consume(self, tokens): 205 | """Consume tokens from the bucket. 
Returns True if there were 206 | sufficient tokens otherwise False.""" 207 | if tokens <= self.tokens: 208 | self._tokens -= tokens 209 | self._consumed_tokens += tokens 210 | else: 211 | return False 212 | return True 213 | 214 | def get_rate(self): 215 | self.stat_rate() 216 | return self.rate 217 | 218 | def stat_rate(self): 219 | now = time.time() 220 | with self._lock: 221 | delta_seconds = now - self._rate_ts 222 | if delta_seconds > self.stat_interval: 223 | self.rate = ( 224 | self._consumed_tokens - 225 | self._last_consumed_tokens 226 | ) / delta_seconds 227 | self._rate_ts = now 228 | self._last_consumed_tokens = self._consumed_tokens 229 | 230 | def get_tokens(self): 231 | now = time.time() 232 | delta_seconds = min(now - self.timestamp, 2) 233 | delta = self.fill_rate * delta_seconds 234 | self._tokens += delta 235 | self.timestamp = now 236 | return self._tokens 237 | 238 | tokens = property(get_tokens) 239 | 240 | 241 | class elapsed_time(object): 242 | def __init__(self): 243 | self.elapsed_time = 0 244 | 245 | def __enter__(self): 246 | self.begin = time.time() 247 | return self 248 | 249 | def __exit__(self, type, value, traceback): 250 | self.elapsed_time = time.time() - self.begin 251 | 252 | 253 | def md5sum(buf): 254 | m = hashlib.md5() 255 | m.update(buf) 256 | return m.hexdigest() 257 | 258 | 259 | def join_qsl(qsl): 260 | ret_qs = [] 261 | for k, v in qsl: 262 | if v: 263 | ret_qs.append("%s=%s" % (k, v)) 264 | else: 265 | ret_qs.append("%s" % k) 266 | return '&'.join(ret_qs) 267 | 268 | 269 | def strip_url_qp(url, qp): 270 | pr = urlparse.urlparse(url) 271 | qsl = urlparse.parse_qsl(pr.query, True) 272 | new_qsl = [(k, v) for k, v in qsl if k not in qp] 273 | url = urlparse.urlunparse( 274 | ( 275 | pr.scheme, 276 | pr.netloc, 277 | pr.path, 278 | pr.params, 279 | join_qsl(new_qsl), 280 | pr.fragment) 281 | ) 282 | return url 283 | 284 | 285 | def generate_range_string(piece_info, piece_size): 286 | range_str = "bytes=" 287 | value_list = [] 288 | for piece_id, size in piece_info: 289 | start = piece_id * piece_size 290 | end = start + size - 1 291 | value_list.append("%s-%s" % (start, end)) 292 | range_str += ",".join(value_list) 293 | return range_str 294 | 295 | 296 | def is_gaierror(e): 297 | if getattr(e, "args", None): 298 | try: 299 | if type(e.args[0][1]) == socket.gaierror: 300 | return True 301 | except KeyError: 302 | pass 303 | return False 304 | 305 | 306 | def mprofile(func): 307 | def func_wrapper(*args, **kwargs): 308 | pr = cProfile.Profile() 309 | pr.enable() 310 | ret = func(*args, **kwargs) 311 | # ... do something ... 312 | pr.disable() 313 | s = StringIO.StringIO() 314 | sortby = 'cumulative' 315 | ps = pstats.Stats(pr, stream=s).sort_stats(sortby) 316 | ps.print_stats() 317 | print s.getvalue() 318 | 319 | return ret 320 | return func_wrapper 321 | 322 | 323 | try: 324 | import GreenletProfiler 325 | 326 | except ImportError: 327 | GreenletProfiler = None 328 | 329 | 330 | def gprofile(func): 331 | def func_wrapper(*args, **kwargs): 332 | if GreenletProfiler: 333 | GreenletProfiler.set_clock_type('wall') 334 | GreenletProfiler.start() 335 | ret = func(*args, **kwargs) 336 | GreenletProfiler.stop() 337 | stats = GreenletProfiler.get_func_stats() 338 | stats.print_all() 339 | stats.save('profile.callgrind', type='callgrind') 340 | return ret 341 | return func(*args, **kwargs) 342 | return func_wrapper 343 | 344 | 345 | """An example how to use sendfile[1] with gevent. 
346 | [1] http://pypi.python.org/pypi/py-sendfile/ 347 | """ 348 | # pylint:disable=import-error 349 | 350 | 351 | def gevent_sendfile(out_fd, in_fd, offset, count): 352 | total_sent = 0 353 | sent = 0 354 | while total_sent < count: 355 | try: 356 | sent = original_sendfile( 357 | out_fd, in_fd, offset + total_sent, count - total_sent) 358 | total_sent += sent 359 | except OSError as ex: 360 | if ex.args[0] == EAGAIN: 361 | sent = 0 362 | # TODO: there is performance problem maybe 363 | wait_write(out_fd) 364 | else: 365 | raise 366 | return total_sent 367 | 368 | 369 | if __name__ == '__main__': 370 | from time import sleep 371 | bucket = TokenBucket(15) 372 | bucket.start_stat_thread() 373 | print "tokens =", bucket.tokens 374 | print "consume(10) =", bucket.consume(10) 375 | sleep(0.3) 376 | print "consume(10) =", bucket.consume(10) 377 | print "tokens =", bucket.tokens 378 | sleep(1) 379 | print "tokens =", bucket.tokens 380 | print "consume(90) =", bucket.consume(90) 381 | print "tokens =", bucket.tokens 382 | print 'consume speed = %s b/s' % bucket.get_rate() 383 | print 'sleep 1s' 384 | sleep(1) 385 | print 'consume speed = %s b/s' % bucket.get_rate() 386 | print 'get rate usage = %s' % bucket.get_rate_usage() 387 | bucket.stop_stat_thread() 388 | 389 | with elapsed_time() as p: 390 | time.sleep(1.123) 391 | 392 | print p.elapsed_time 393 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | Django==1.7.3 2 | MySQL-python==1.2.5 3 | uWSGI==2.0.9 4 | gevent==1.1.0 5 | greenlet==0.4.9 6 | pysendfile==2.0.1 7 | requests==2.18.4 8 | -------------------------------------------------------------------------------- /setup.py: -------------------------------------------------------------------------------- 1 | from distutils.core import setup 2 | 3 | setup(name='minion', 4 | version='0.7.0', 5 | description='P4P library', 6 | author='Eric', 7 | author_email='linxiulei@gmail.com', 8 | packages=['peer', 'peer.libs', 'peer.libs.mrequests'], 9 | ) 10 | -------------------------------------------------------------------------------- /tests/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/alibaba/minion/e98530138bbac133c52d3e1ee05b8ec392bfb916/tests/__init__.py -------------------------------------------------------------------------------- /tests/minions.py: -------------------------------------------------------------------------------- 1 | 2 | import os 3 | import time 4 | import unittest 5 | from threading import Thread 6 | from multiprocessing import Process 7 | from peer.libs.mrequests import requests 8 | from peer.client import Minions 9 | from peer.excepts import NoPeersFound 10 | from peer.utils import logging_config 11 | 12 | 13 | logging_config("WARN") 14 | 15 | TRACKER = 'localhost:5000' 16 | 17 | FILE_SIZE = 705 * 1024 * 1024 + 702 18 | 19 | 20 | class P4PTest(unittest.TestCase): 21 | def setUp(self): 22 | self.res_file = 'res_file.tgz' 23 | # create test file 24 | if not os.path.exists(self.res_file): 25 | with file(self.res_file, 'w+') as f: 26 | [f.write("0" * 1024) for i in xrange(FILE_SIZE/1024)] 27 | f.write("0" * (FILE_SIZE % 1024)) 28 | 29 | from tracker import runserver 30 | self.server_process = runserver('localhost', 5000) 31 | 32 | time.sleep(0.2) 33 | 34 | self.tmp_res_fpath = '/tmp/minions_1.tgz' 35 | 36 | def test_no_peers(self): 37 | 38 | with 
self.assertRaises(NoPeersFound): 39 | minion_1 = Minions('res_file.tgz', 40 | '/tmp/minions_1.tgz', 41 | tracker=TRACKER) 42 | minion_1.download_res() 43 | 44 | def _get_uploader_process(self): 45 | def _func(): 46 | minion_uploader = Minions('res_file.tgz', tracker=TRACKER) 47 | minion_uploader.upload_res('./res_file.tgz') 48 | while True: 49 | time.sleep(1000) 50 | 51 | p = Process(target=_func) 52 | return p 53 | 54 | def test_P4P_ignore_url_param(self): 55 | minion_uploader = Minions('res_file.tgz', tracker=TRACKER) 56 | minion_uploader.upload_res('./res_file.tgz') 57 | 58 | time.sleep(1) 59 | 60 | requests.get('http://localhost:5000/peer/?res=res_file.tgz') 61 | 62 | minions = [] 63 | try: 64 | for i in range(3): 65 | minion = Minions( 66 | 'res_file.tgz?a=%s&b=%s' % (i, i), 67 | '/tmp/res_file%s.tgz' % i, tracker=TRACKER, 68 | upload_res_url="res_file.tgz") 69 | minion.download_res(rate=20 * 1024 ** 2, thread=True) 70 | time.sleep(0.1) 71 | minions.append(minion) 72 | time.sleep(2) 73 | 74 | for m in minions: 75 | m.wait_for_res() 76 | 77 | except Exception: 78 | import traceback 79 | traceback.print_exc() 80 | raise 81 | finally: 82 | for m in minions: 83 | m.close() 84 | minion_uploader.close() 85 | 86 | # @gprofile 87 | def test_P4P(self): 88 | minion_uploader = Minions( 89 | 'res_file.tgz', upload_rate=150 * 1024 ** 2, tracker=TRACKER) 90 | minion_uploader.upload_res('./res_file.tgz') 91 | 92 | time.sleep(1) 93 | 94 | requests.get('http://localhost:5000/peer/?res=res_file.tgz') 95 | 96 | minions = [] 97 | try: 98 | for i in range(1): 99 | minion = Minions( 100 | 'res_file.tgz', 101 | '/tmp/res_file%s.tgz' % i, 102 | tracker=TRACKER) 103 | minions.append(minion) 104 | minion.download_res(rate=1500 * 1024 ** 2) 105 | time.sleep(0.1) 106 | time.sleep(2) 107 | 108 | for m in minions: 109 | m.wait_for_res() 110 | 111 | except Exception: 112 | import traceback 113 | traceback.print_exc() 114 | raise 115 | finally: 116 | for m in minions: 117 | m.close() 118 | minion_uploader.close() 119 | 120 | def test_fallback(self): 121 | import BaseHTTPServer 122 | from SimpleHTTPServer import SimpleHTTPRequestHandler 123 | 124 | server_address = ('', 8000) 125 | httpd = BaseHTTPServer.HTTPServer( 126 | server_address, 127 | SimpleHTTPRequestHandler) 128 | t = Thread(target=httpd.serve_forever) 129 | t.start() 130 | 131 | try: 132 | minion_1 = Minions( 133 | 'http://localhost:8000/res_file.tgz', 134 | '/tmp/res_file.tgz', 135 | fallback=True, 136 | tracker=TRACKER) 137 | 138 | minion_1.download_res() 139 | time.sleep(1) 140 | minion_2 = Minions( 141 | 'http://localhost:8000/res_file.tgz', 142 | '/tmp/res_file1.tgz', tracker=TRACKER) 143 | minion_2.download_res(rate=20 * 1024 ** 2) 144 | finally: 145 | minion_1.close() 146 | minion_2.close() 147 | httpd.shutdown() 148 | 149 | def test_get_peer_strict(self): 150 | minion_uploader = Minions('res_file.tgz', tracker=TRACKER) 151 | minion_uploader.set_adt_info({'site': 'mysite1'}) 152 | minion_uploader.upload_res('./res_file.tgz') 153 | 154 | minion_2 = None 155 | try: 156 | with self.assertRaises(NoPeersFound): 157 | minion_1 = Minions( 158 | 'res_file.tgz', 159 | '/tmp/res_file1.tgz', 160 | strict='site', 161 | tracker=TRACKER) 162 | minion_1.set_adt_info({'site': 'mysite2'}) 163 | minion_1.download_res() 164 | 165 | minion_2 = Minions( 166 | 'res_file.tgz', 167 | '/tmp/res_file2.tgz', 168 | strict='site', 169 | tracker=TRACKER) 170 | minion_2.set_adt_info({'site': 'mysite1'}) 171 | minion_2.download_res() 172 | finally: 173 | minion_uploader.close() 
174 | minion_1.close() 175 | if minion_2: 176 | minion_2.close() 177 | 178 | def test_runout_peer(self): 179 | minion_uploader = Minions('res_file.tgz', tracker=TRACKER) 180 | minion_uploader.upload_res('./res_file.tgz') 181 | 182 | time.sleep(1) 183 | 184 | requests.get('http://localhost:5000/peer/?res=res_file.tgz') 185 | 186 | with self.assertRaises(NoPeersFound): 187 | try: 188 | minion_1 = Minions( 189 | 'res_file.tgz', '/tmp/res_file.tgz', 190 | tracker=TRACKER) 191 | minion_1.download_res(rate=10 * 1024 ** 2, thread=True) 192 | time.sleep(2) 193 | minion_uploader.close() 194 | minion_1.wait_for_res() 195 | except Exception as e: 196 | import traceback 197 | traceback.print_exc() 198 | raise e 199 | finally: 200 | minion_1.close() 201 | minion_uploader.close() 202 | 203 | def tearDown(self): 204 | self.server_process.terminate() 205 | 206 | 207 | if __name__ == '__main__': 208 | suite = unittest.TestSuite() 209 | suite.addTest(P4PTest("test_no_peers")) 210 | suite.addTest(P4PTest("test_P4P")) 211 | 212 | runner = unittest.TextTestRunner() 213 | runner.run(suite) 214 | -------------------------------------------------------------------------------- /tests/peer_upload_server.py: -------------------------------------------------------------------------------- 1 | 2 | import os 3 | import unittest 4 | import requests 5 | import threading 6 | from urllib import urlencode 7 | from peer.models import PieceFile 8 | from peer.server import ResourceManager, handler_factory, GeventServer 9 | 10 | FILE_SIZE = 705 * 1024 * 1024 + 702 11 | 12 | 13 | class UploadServerTest(unittest.TestCase): 14 | def setUp(self): 15 | self.res_file = 'res_file.tgz' 16 | 17 | if not os.path.exists(self.res_file): 18 | with file(self.res_file, 'w+') as f: 19 | [f.write("0" * 1024) for i in xrange(FILE_SIZE/1024)] 20 | f.write("0" * (FILE_SIZE % 1024)) 21 | 22 | with file(self.res_file) as f: 23 | self.begin_slice = f.read(2) 24 | 25 | def test_server(self): 26 | res_mng = ResourceManager() 27 | res_url = 'http://testserver.localhost/bigfile' 28 | res_mng.add_res(res_url, PieceFile.from_exist_file('res_file.tgz')) 29 | handler = handler_factory(res_mng, None) 30 | uploader = GeventServer(("0.0.0.0", 0), handler) 31 | 32 | t = threading.Thread( 33 | target=uploader.serve_forever, 34 | kwargs={'poll_interval': 0.02}) 35 | t.start() 36 | 37 | ip, port = uploader.socket.getsockname() 38 | print 'server in %s:%s' % (ip, port) 39 | try: 40 | ret = requests.get( 41 | 'http://localhost:%s?%s' % 42 | (port, urlencode({'res_url': res_url})), 43 | headers={"Range": "bytes=0-1"} 44 | ) 45 | self.assertEqual(ret.content, self.begin_slice) 46 | 47 | ret = requests.get( 48 | 'http://localhost:%s?%s&%s' % 49 | ( 50 | port, urlencode({'res_url': res_url}), 51 | "pieces=1,2,3,400" 52 | ), 53 | ) 54 | self.assertEqual( 55 | ret.content, 56 | '{"status": "normal", "result": ["1", "2", "3", "400"]}') 57 | finally: 58 | uploader.shutdown() 59 | 60 | 61 | if __name__ == '__main__': 62 | unittest.main() 63 | -------------------------------------------------------------------------------- /tests/piecefile.py: -------------------------------------------------------------------------------- 1 | import time 2 | import unittest 3 | import threading 4 | from hashlib import md5 5 | 6 | from memory_profiler import profile 7 | 8 | from peer.models import PieceFile 9 | from peer.utils import hash_file, mprofile 10 | from peer.excepts import ChunkNotReady 11 | 12 | FILE_SIZE = 755 * 1024 * 1024 + 702 13 | FILE_PATH = '/tmp/piecefile' 14 | 15 | 16 | 
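# The helper below splits a total length into size-capped chunks, e.g.
# list(iter_length(10, 4)) == [4, 4, 2]. The cursor-write tests use it to
# feed buffer-sized writes until a whole piece has been written.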
def iter_length(length, size): 17 | left = length 18 | while True: 19 | if left > size: 20 | left -= size 21 | yield size 22 | else: 23 | yield left 24 | break 25 | 26 | 27 | class PieceFileTest(unittest.TestCase): 28 | # @profile 29 | def setUp(self): 30 | # self.buf = [(i % 10)+49 for i in range(1024)] 31 | self.buf = bytearray("0" * 1024 ** 2) 32 | self.buf[3] = "1" 33 | self.buf[30] = "1" 34 | self.piece_file = PieceFile(FILE_SIZE, FILE_PATH) 35 | 36 | @profile 37 | def test_all(self): 38 | a = list() 39 | for i in range(5): 40 | a.append(threading.Thread(target=self.func)) 41 | start = time.time() 42 | 43 | for i in a: 44 | i.start() 45 | 46 | for i in a: 47 | i.join() 48 | 49 | elasp = time.time() - start 50 | 51 | # print self.piece_file 52 | print 'use %s(s)' % elasp 53 | self._verify_file() 54 | 55 | def func(self): 56 | while True: 57 | piece_id, size = self.piece_file.get_unalloc_piece_for_fetch() 58 | if piece_id is not None: 59 | buf = self._mock_get_data() 60 | self.piece_file.fill(piece_id, buf[0:size]) 61 | else: 62 | break 63 | 64 | def func1(self): 65 | while True: 66 | piece_id, size = self.piece_file.get_unalloc_piece_for_fetch() 67 | buf = self._mock_get_data() 68 | if piece_id is not None: 69 | cursor = self.piece_file.get_cursor(piece_id=piece_id) 70 | cursor.write(buf[0:size]) 71 | else: 72 | break 73 | 74 | def random_write(self, piece_file): 75 | buf = self._mock_get_data() 76 | buflen = len(buf) 77 | while True: 78 | piece_id, size = piece_file.get_unalloc_piece_for_fetch() 79 | if piece_id is not None: 80 | cursor = piece_file.get_cursor(piece_id=piece_id) 81 | for i in iter_length(size, buflen): 82 | cursor.write(buf[0:i]) 83 | else: 84 | break 85 | 86 | @profile 87 | def test_cursor_write(self): 88 | a = list() 89 | for i in range(5): 90 | a.append(threading.Thread(target=self.func1)) 91 | start = time.time() 92 | 93 | for i in a: 94 | i.start() 95 | 96 | for i in a: 97 | i.join() 98 | 99 | elasp = time.time() - start 100 | 101 | # print self.piece_file 102 | print 'use %s(s)' % elasp 103 | self._verify_file() 104 | 105 | @mprofile 106 | def test_cursor_order_write(self): 107 | cursor = self.piece_file.get_cursor() 108 | while True: 109 | piece_id, size = self.piece_file.get_unalloc_piece_for_fetch() 110 | if piece_id is not None: 111 | offset = 0 112 | buf = self._mock_get_data() 113 | for i in iter_length(size, 8196): 114 | cursor.write(buf[offset:offset + i]) 115 | offset += i 116 | else: 117 | break 118 | 119 | self._verify_file() 120 | 121 | def test_cursor_read_fallback(self): 122 | pass 123 | 124 | # @profile 125 | def test_cursor_order_read(self): 126 | piece_file = PieceFile(FILE_SIZE, FILE_PATH) 127 | cursor = piece_file.get_cursor() 128 | a = list() 129 | 130 | for i in range(10): 131 | a.append( 132 | threading.Thread( 133 | target=self.random_write, args=(piece_file,))) 134 | 135 | for i in a: 136 | i.start() 137 | 138 | md5sum = md5() 139 | buf = self._mock_get_data() 140 | chunksize = len(buf) 141 | while True: 142 | if cursor.is_readable(chunksize): 143 | try: 144 | content = cursor.read(chunksize) 145 | if not content: 146 | break 147 | self.assertEqual(content, buf[0:len(content)]) 148 | md5sum.update(content) 149 | except ChunkNotReady: 150 | break 151 | else: 152 | time.sleep(0.2) 153 | 154 | self.assertEqual( 155 | md5sum.hexdigest(), '9e503d2e6a09f60dc8b28e8330b686e0') 156 | 157 | def _mock_get_data(self): 158 | return self.buf 159 | 160 | def _verify_file(self): 161 | self.assertEqual(len(self.piece_file), FILE_SIZE) 162 | hash_ret = 
hash_file(FILE_PATH, 'md5') 163 | print 'md5sum: %s' % hash_ret 164 | self.assertEqual(hash_ret, '9e503d2e6a09f60dc8b28e8330b686e0') 165 | 166 | def tearDown(self): 167 | pass 168 | 169 | 170 | if __name__ == '__main__': 171 | unittest.main() 172 | -------------------------------------------------------------------------------- /tests/test_peer_server.py: -------------------------------------------------------------------------------- 1 | import time 2 | import unittest 3 | import socket 4 | 5 | import gevent 6 | import requests 7 | 8 | from peer.client import Minions 9 | from peer.utils import logging_config 10 | 11 | logging_config("ERROR") 12 | 13 | host = "localhost" 14 | port = 5001 15 | peer_url = "http://%s:%s/peer/" % (host, port) 16 | TRACKER = 'localhost:5001' 17 | 18 | 19 | class PeerServerTest(unittest.TestCase): 20 | def setUp(self): 21 | from tracker import runserver 22 | self.server_process = runserver(host, port) 23 | time.sleep(1) 24 | 25 | def test_keepalive_timeout(self): 26 | minion_uploader = Minions('res_file.tgz', tracker=TRACKER) 27 | minion_uploader.upload_res('./res_file.tgz') 28 | time.sleep(1) 29 | port = minion_uploader.port 30 | import logging 31 | logging.basicConfig(level=logging.ERROR) 32 | 33 | def foo(session): 34 | size = 1024 ** 2 35 | r = session.get( 36 | "http://localhost:%s/?res_url=res_file.tgz" % port, 37 | stream=True, 38 | headers={"Range": "bytes=0-%s" % (size)}, 39 | timeout=2) 40 | # r.raw.close() 41 | r.content 42 | return r 43 | 44 | try: 45 | s = requests.Session() 46 | g = [] 47 | for i in range(500): 48 | g.append(gevent.spawn(foo, s)) 49 | 50 | gevent.sleep(1) 51 | for i in g: 52 | i.get() 53 | # print "------------" 54 | # r = s.get("http://localhost:%s/?res_url=res_file.tgz" % port, 55 | # stream=True, 56 | # headers={"Range": "bytes=0-%s" %(size)}, 57 | # timeout=1) 58 | 59 | # print len(r.raw.read(1023)) 60 | # len(r.raw.read(1024**2 + 1)) 61 | # time.sleep(1) 62 | # r.content 63 | # print r 64 | # r.raw.close() 65 | # r.close() 66 | # r.raw.close() 67 | # del r 68 | # r = s.get( 69 | # "http://localhost:%s/?res_url=res_file.tgz&pieces=all" % 70 | # port, 71 | # stream=True, 72 | # headers={"Range": "bytes=0-%s" % (size)}, 73 | # timeout=1) 74 | # #print len(r.raw.read(1024**2 * 2 + 1)) 75 | # r.close() 76 | 77 | # r.content 78 | # r = s.get( 79 | # "http://localhost:%s/?res_url=res_file.tgz&pieces=all" 80 | # % port, 81 | # stream=True, 82 | # headers={"Range": "bytes=0-%s" % (size)}, 83 | # timeout=1) 84 | # r.content 85 | # r.close() 86 | # print "------------" 87 | finally: 88 | print 'finally' 89 | minion_uploader.close() 90 | minion_uploader.stop_upload() 91 | 92 | def test_keepalive(self): 93 | minion_uploader = Minions('res_file.tgz', tracker=TRACKER) 94 | minion_uploader.upload_res('./res_file.tgz') 95 | time.sleep(1) 96 | 97 | port = minion_uploader.port 98 | 99 | try: 100 | # test if whether get block api interface works 101 | r = requests.get( 102 | "http://localhost:%s/?res_url=res_file.tgz&pieces=all" % port, 103 | stream=True, 104 | headers={"Range": "bytes=0-1"}, 105 | timeout=1) 106 | print r.content 107 | 108 | s = socket.socket() 109 | s.settimeout(0.2) 110 | s.connect(('localhost', port)) 111 | s.send( 112 | "GET /?res_url=res_file.tgz HTTP/1.1\r\n" 113 | "Host: localhost:62630\r\n" 114 | "Accept-Encoding: identity\r\n" 115 | "Content-Length: 0\r\nRange: bytes=0-1,3-4\r\n" 116 | "\r\n") 117 | print s.recv(200) 118 | s.recv(1024 ** 2 + 400) 119 | 120 | with self.assertRaises(socket.timeout): 121 | s.recv(1024) 122 
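            # The connection should still be open (HTTP/1.1 keep-alive), so a
            # second request on the same socket must get a fresh 206 response
            # without reconnecting.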
| 123 | s.send( 124 | "GET /?res_url=res_file.tgz HTTP/1.1\r\n" 125 | "Host: localhost:62630\r\n" 126 | "Accept-Encoding: identity\r\n" 127 | "Content-Length: 0\r\n" 128 | "Range: bytes=0-1,3-4\r\n" 129 | "\r\n") 130 | print s.recv(200) 131 | 132 | # conn = httplib.HTTPConnection("localhost", port) 133 | # conn.debuglevel = 1 134 | # conn.request( 135 | # "GET", "/?res_url=res_file.tgz", 136 | # "", headers={"Range": "bytes=0-1,3-4"}) 137 | # r = conn.getresponse() 138 | 139 | finally: 140 | minion_uploader.close() 141 | minion_uploader.stop_upload() 142 | 143 | def test_multipart(self): 144 | minion_uploader = Minions('res_file.tgz', tracker=TRACKER) 145 | minion_uploader.upload_res('./res_file.tgz') 146 | 147 | time.sleep(1) 148 | port = minion_uploader.port 149 | 150 | try: 151 | r = requests.get( 152 | "http://localhost:%s/?res_url=res_file.tgz" % port, 153 | headers={"Range": "bytes=0-1,3-4"}) 154 | 155 | self.assertEqual(len(r.content), 4) 156 | finally: 157 | minion_uploader.close() 158 | minion_uploader.stop_upload() 159 | 160 | def tearDown(self): 161 | self.server_process.terminate() 162 | -------------------------------------------------------------------------------- /tests/test_tracker.py: -------------------------------------------------------------------------------- 1 | import time 2 | import json 3 | import unittest 4 | import socket 5 | import struct 6 | 7 | import requests 8 | 9 | host = "localhost" 10 | port = 5001 11 | peer_url = "http://%s:%s/peer/" % (host, port) 12 | 13 | 14 | def int_to_ipstr(int_ip): 15 | return socket.inet_ntoa( 16 | struct.pack('I', socket.htonl(int_ip)) 17 | ) 18 | 19 | 20 | def ipstr_to_int(str_ip): 21 | return socket.ntohl( 22 | struct.unpack("I", socket.inet_aton(str(str_ip)))[0] 23 | ) 24 | 25 | 26 | class TrackerServerTest(unittest.TestCase): 27 | def setUp(self): 28 | from tracker import runserver 29 | self.server_process = runserver(host, port) 30 | time.sleep(1.) 
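        # Seed the tracker with one well-known peer so every test starts from
        # the same baseline state (runserver() in tracker/__init__.py flushes
        # the database first).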
31 | 32 | data = {'ip': "1.1.1.1", 'port': 80, "res": '1.tgz'} 33 | requests.post( 34 | peer_url, data=json.dumps(data), 35 | headers={'Content-Type': 'application/json; charset=utf-8'} 36 | ) 37 | 38 | def test_add_peer(self): 39 | data = {'ip': "1.1.1.1", 'port': 80, "res": '1.tgz'} 40 | r = requests.post( 41 | peer_url, data=json.dumps(data), 42 | headers={'Content-Type': 'application/json; charset=utf-8'}) 43 | r.raise_for_status() 44 | 45 | def test_get_peer_strict(self): 46 | data = { 47 | 'ip': "1.1.1.1", 'port': 80, "res": '1.tgz', 48 | 'adt_info': {'site': 'mysite10'} 49 | } 50 | 51 | r = requests.post( 52 | peer_url, data=json.dumps(data), 53 | headers={'Content-Type': 'application/json; charset=utf-8'}) 54 | 55 | data = { 56 | 'ip': "1.1.1.2", 'port': 80, "res": '1.tgz', 57 | 'adt_info': {'site': 'c.mysite10'} 58 | } 59 | 60 | r = requests.post( 61 | peer_url, data=json.dumps(data), 62 | headers={'Content-Type': 'application/json; charset=utf-8'}) 63 | 64 | print requests.get(peer_url+"?res=1.tgz&verbose=1").content 65 | r = requests.get( 66 | peer_url+"?res=1.tgz&strict=site", 67 | data=json.dumps({"adt_info": {"site": "mysite10"}})) 68 | 69 | content = { 70 | "1.tgz": [ 71 | [ 72 | "1.1.1.1", 73 | 80 74 | ]], 75 | "ret": 76 | 'success' 77 | } 78 | self.assertEqual(r.json(), content) 79 | 80 | def test_unknow(self): 81 | data = { 82 | 'ip': "1.1.1.1", 'port': 80, "res": '1.tgz', 83 | 'adt_info': {'site': 'unknow'} 84 | } 85 | 86 | requests.post( 87 | peer_url, data=json.dumps(data), 88 | headers={'Content-Type': 'application/json; charset=utf-8'}) 89 | 90 | data = { 91 | 'ip': "1.1.1.2", 'port': 80, "res": '1.tgz', 92 | 'adt_info': {'site': 'unknow'} 93 | } 94 | 95 | requests.post( 96 | peer_url, data=json.dumps(data), 97 | headers={'Content-Type': 'application/json; charset=utf-8'}) 98 | 99 | print requests.get(peer_url+"?res=1.tgz&verbose=1").content 100 | 101 | requests.get( 102 | peer_url+"?res=1.tgz&strict=site", 103 | data=json.dumps({"adt_info": {"site": "unknow"}})) 104 | 105 | # content = { 106 | # "1.tgz": [ 107 | # [ 108 | # "1.1.1.1", 109 | # 80 110 | # ]], 111 | # "ret" : 112 | # 'success' 113 | # } 114 | 115 | # self.assertEqual(r.json(), content) 116 | 117 | def test_get_peer(self): 118 | data = {'ip': "1.1.1.1", 'port': 80, "res": '1.tgz'} 119 | requests.post( 120 | peer_url, data=json.dumps(data), 121 | headers={'Content-Type': 'application/json; charset=utf-8'}) 122 | r = requests.get(peer_url+"?res=1.tgz") 123 | content = { 124 | "1.tgz": [ 125 | [ 126 | "1.1.1.1", 127 | 80 128 | ]], 129 | "ret": 130 | 'success' 131 | } 132 | self.assertEqual(r.json(), content) 133 | 134 | def test_get_peer_performance(self): 135 | data = {'ip': "1.1.1.1", 'port': 80, "res": '1.tgz'} 136 | for i in range(4000): 137 | data['port'] = i 138 | r = requests.post( 139 | peer_url, data=json.dumps(data), 140 | headers={ 141 | 'Content-Type': 'application/json; charset=utf-8'}) 142 | 143 | # ip_start = ipstr_to_int('1.1.1.1') 144 | # for i in range(5): 145 | # data = {'ip': int_to_ipstr(ip_start + i), 146 | # 'port': 80, "res": '1.tgz'} 147 | # r = requests.post(peer_url, data=json.dumps(data), 148 | # headers={'Content-Type': 'application/json; charset=utf-8'}) 149 | 150 | start = time.time() 151 | r = requests.get(peer_url+"?res=1.tgz&num=140") 152 | print r.content 153 | elasp = time.time() - start 154 | print 'use %s(s)' % elasp 155 | 156 | def test_get_peer_less_tw(self): 157 | for i in range(42): 158 | data = {'ip': "1.1.1.1" + str(i), 'port': 80, "res": '1.tgz'} 159 | r = 
requests.post( 160 | peer_url, data=json.dumps(data), 161 | headers={'Content-Type': 'application/json; charset=utf-8'}) 162 | 163 | r = requests.get(peer_url+"?res=1.tgz") 164 | self.assertEqual(len(r.json()['1.tgz']), 40) 165 | 166 | def test_del_peer(self): 167 | r = requests.delete(peer_url+"?res=1.tgz&ip=1.1.1.1&port=80") 168 | r.raise_for_status() 169 | 170 | def tearDown(self): 171 | self.server_process.terminate() 172 | 173 | 174 | if __name__ == '__main__': 175 | unittest.main() 176 | -------------------------------------------------------------------------------- /tracker/__init__.py: -------------------------------------------------------------------------------- 1 | import time 2 | import subprocess 3 | from multiprocessing import Process 4 | 5 | import django 6 | from django.core.management import call_command 7 | 8 | #django.conf.settings.configure("tracker.tracker.settings") 9 | 10 | def runserver(ip, port): 11 | subprocess.call(['python', 'tracker/manage.py', 'flush', '--noinput']) 12 | 13 | p = subprocess.Popen(['python', 'tracker/manage.py', 'runserver', 14 | '%s:%s' % (ip, port), '--noreload']) 15 | time.sleep(0.4) 16 | return p 17 | 18 | #if __name__ == '__main__': 19 | # runserver('localhost', 5001) 20 | -------------------------------------------------------------------------------- /tracker/manage.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | import os 3 | import sys 4 | 5 | if __name__ == "__main__": 6 | os.environ.setdefault("DJANGO_SETTINGS_MODULE", "tracker.settings") 7 | 8 | from django.core.management import execute_from_command_line 9 | 10 | execute_from_command_line(sys.argv) 11 | -------------------------------------------------------------------------------- /tracker/models.py: -------------------------------------------------------------------------------- 1 | import random 2 | from copy import copy 3 | 4 | class PeerList(dict): 5 | 6 | def add(self, res, ip_port): 7 | try: 8 | if ip_port in self[res]: 9 | return 10 | self[res].append(ip_port) 11 | except KeyError: 12 | self[res] = [ip_port] 13 | 14 | def delete(self, res, ip_port): 15 | try: 16 | self[res].remove(ip_port) 17 | except ValueError: 18 | pass 19 | 20 | def get_peers(self, res, num=None): 21 | try: 22 | peers = copy(self[res]) 23 | if num: 24 | peers_num = len(peers) 25 | sample_num = peers_num if peers_num < num else num 26 | sample = random.sample(xrange(peers_num), sample_num) 27 | return [peers[i] for i in sample] 28 | else: 29 | return peers 30 | except KeyError: 31 | return [] 32 | 33 | -------------------------------------------------------------------------------- /tracker/peer/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/alibaba/minion/e98530138bbac133c52d3e1ee05b8ec392bfb916/tracker/peer/__init__.py -------------------------------------------------------------------------------- /tracker/peer/admin.py: -------------------------------------------------------------------------------- 1 | from django.contrib import admin 2 | 3 | # Register your models here. 
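# e.g. admin.site.register(Peer) would expose the Peer model here, but
# django.contrib.admin is commented out of INSTALLED_APPS in settings.py,
# so the tracker currently ships without an admin UI.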
4 | 
--------------------------------------------------------------------------------
/tracker/peer/models.py:
--------------------------------------------------------------------------------
1 | import random
2 | from hashlib import md5
3 | 
4 | from django.db import models
5 | from collections import OrderedDict
6 | 
7 | class Peer(models.Model):
8 |     ip = models.IPAddressField(db_index=True)
9 |     port = models.IntegerField(db_index=True)
10 |     res = models.TextField(max_length=200)
11 |     res_md5 = models.CharField(max_length=32, db_index=True)
12 |     site = models.CharField(max_length=20, db_index=True)
13 | 
14 |     @classmethod
15 |     def new(cls, ip, port, res=None, res_md5=None, site=None):
16 |         if res:
17 |             m = md5()
18 |             m.update(res)
19 |             res_md5 = m.hexdigest()
20 | 
21 |         return cls(ip=ip,
22 |                    port=port,
23 |                    res=res,
24 |                    res_md5=res_md5,
25 |                    site=site)
26 | 
27 |     @classmethod
28 |     def get_peers(cls, res_md5, site=None, num=None):
29 |         peers = cls.objects.filter(res_md5=res_md5)
30 |         if site:
31 |             peers = peers.filter(site=site)
32 | 
33 |         if num:
34 |             peers_num = len(peers)
35 |             sample_num = min(peers_num, num)
36 |             sample = random.sample(xrange(peers_num), sample_num)
37 |             return [peers[i] for i in sample]
38 |         return peers
39 | 
40 |     def to_dict(self):
41 |         if self.site:
42 |             return [self.ip, self.port, {"site": self.site}]
43 |         else:
44 |             return [self.ip, self.port]
45 | 
--------------------------------------------------------------------------------
/tracker/peer/tests.py:
--------------------------------------------------------------------------------
1 | from django.test import TestCase
2 | 
3 | # Create your tests here.
4 | 
--------------------------------------------------------------------------------
/tracker/peer/views.py:
--------------------------------------------------------------------------------
1 | import json
2 | import random
3 | from hashlib import md5
4 | from .models import Peer
5 | 
6 | import cProfile, StringIO, pstats
7 | 
8 | from django.db import transaction
9 | from django.views.generic import View
10 | from django.shortcuts import render, HttpResponse
11 | 
12 | def random_sample(iter_obj, t_num, num):
13 |     sample_num = min(t_num, num)
14 |     _sample = random.sample(xrange(t_num), sample_num)
15 |     return [iter_obj[i] for i in _sample]
16 | 
17 | def get_random_sample_ids(t_num, num):
18 |     t_num = int(t_num)
19 |     num = int(num)
20 |     sample_num = min(t_num, num)
21 |     _sample = random.sample(xrange(t_num), sample_num)
22 |     return _sample
23 | 
24 | def mprofile(func):
25 |     def func_wrapper(*args, **kwargs):
26 |         pr = cProfile.Profile()
27 |         pr.enable()
28 |         ret = func(*args, **kwargs)
29 |         # ... do something ...
30 | pr.disable() 31 | s = StringIO.StringIO() 32 | sortby = 'cumulative' 33 | ps = pstats.Stats(pr, stream=s).sort_stats(sortby) 34 | ps.print_stats() 35 | print s.getvalue() 36 | 37 | return ret 38 | return func_wrapper 39 | 40 | 41 | class MD5Cache(object): 42 | def __init__(self): 43 | self._dict = {} 44 | 45 | def get_md5(self, res): 46 | try: 47 | return self._dict[res] 48 | except KeyError: 49 | m = md5() 50 | m.update(res) 51 | res_md5 = m.hexdigest() 52 | self._dict[res] = res_md5 53 | return res_md5 54 | 55 | md5_cache = MD5Cache() 56 | 57 | 58 | class PeerView(View): 59 | #@mprofile 60 | def get(self, request): 61 | #import cProfile, pstats, StringIO 62 | #pr = cProfile.Profile() 63 | #pr.enable() 64 | 65 | data = request.GET 66 | 67 | res = data['res'] 68 | num = data.get('num', 40) 69 | verbose = data.get('verbose', 0) 70 | 71 | ret = {} 72 | res_md5 = md5_cache.get_md5(res) 73 | peers = Peer.objects.filter(res_md5=res_md5) 74 | try: 75 | strict = request.GET['strict'] 76 | except KeyError: 77 | strict = False 78 | 79 | #with transaction.atomic(): 80 | if strict: 81 | adt_info = json.loads(request.body)['adt_info'] 82 | if strict == 'site': 83 | site = adt_info['site'] 84 | peers = peers.filter(site=site) 85 | 86 | # # solution 1 87 | # #peers = random_sample(peers, peers.count(), num) 88 | 89 | # # solution 2 90 | # #peers = peers.order_by('?')[:num] 91 | 92 | # solution 3 93 | pk_list = peers.values_list('pk', flat=True) 94 | pk_list.count() 95 | pk_list = list(pk_list) 96 | ids = get_random_sample_ids(len(pk_list), num) 97 | #from django.db import connection 98 | #print connection.queries 99 | #print time.time() - begin 100 | sample_pk_list = [pk_list[id] for id in ids] 101 | peers = peers.filter(id__in=sample_pk_list) 102 | 103 | #from django.db import connection 104 | #print connection.queries 105 | 106 | ret[res] = [] 107 | for p in peers: 108 | if verbose: 109 | ret[res].append(p.to_dict()) 110 | else: 111 | ret[res].append((p.ip, p.port)) 112 | 113 | ret['ret'] = 'success' 114 | #import StringIO 115 | #pr.disable() 116 | #s = StringIO.StringIO() 117 | #sortby = 'cumulative' 118 | #ps = pstats.Stats(pr, stream=s).sort_stats(sortby) 119 | #ps.print_stats() 120 | #print s.getvalue() 121 | return HttpResponse(json.dumps(ret)) 122 | 123 | def post(self, request): 124 | data = json.loads(request.body) 125 | 126 | ip = data['ip'] 127 | port = int(data['port']) 128 | res = data['res'] 129 | 130 | m = md5() 131 | m.update(res) 132 | res_md5 = m.hexdigest() 133 | 134 | hostname = None 135 | site = None 136 | try: 137 | adt_info = data['adt_info'] 138 | site = adt_info.get('site', None) 139 | hostname = adt_info.get('hostname', None) 140 | except KeyError: 141 | pass 142 | 143 | 144 | p, created = Peer.objects.get_or_create(ip=ip, 145 | port=port, 146 | res=res, 147 | res_md5=res_md5) 148 | 149 | if site: 150 | p.site = site 151 | 152 | p.save() 153 | return HttpResponse(json.dumps({'ret':'success'})) 154 | 155 | def delete(self, request): 156 | data = request.GET 157 | 158 | ip = data['ip'] 159 | port = int(data['port']) 160 | res = data['res'] 161 | 162 | Peer.objects.filter(ip=ip, port=port, res=res).delete() 163 | 164 | return HttpResponse(json.dumps({'ret':'success'})) 165 | 166 | class ResView(View): 167 | def get(self, request): 168 | data = request.GET 169 | response = {} 170 | peers = Peer.objects.values('res').distinct() 171 | response['resources'] = [] 172 | response['ret'] = 'success' 173 | for p in peers: 174 | response['resources'].append(p['res']) 175 | 176 | 
return HttpResponse(json.dumps(response)) 177 | -------------------------------------------------------------------------------- /tracker/server.py: -------------------------------------------------------------------------------- 1 | import json 2 | from flask import Flask 3 | from flask import request 4 | from models import PeerList 5 | 6 | peers = PeerList() 7 | app = Flask(__name__) 8 | 9 | from flask import Flask, request, session, g, redirect, url_for, \ 10 | abort, render_template, flash 11 | 12 | # configuration 13 | """ 14 | {'resource_name':[(ip, port), ]} 15 | """ 16 | 17 | @app.route("/peer/", methods=["GET", "POST", "DELETE"]) 18 | def peer(): 19 | adt_info = None 20 | response = dict() 21 | if request.method == "POST": 22 | data = request.json 23 | try: 24 | adt_info = data['adt_info'] 25 | except KeyError: 26 | pass 27 | 28 | ip = data['ip'] 29 | port = data['port'] 30 | port = int(port) 31 | res = data['res'] 32 | peers.add(res, (ip, port)) 33 | response['ret'] = 'success' 34 | 35 | elif request.method == "GET": 36 | res = request.args.get('res') 37 | if res: 38 | ava_peers = peers.get_peers(res, num=40) 39 | response[res] = ava_peers 40 | else: 41 | response = json.dumps(peers) 42 | 43 | elif request.method == "DELETE": 44 | res = request.args.get('res') 45 | ip = request.args.get('ip') 46 | port = request.args.get('port') 47 | port = int(port) 48 | peers.delete(res, (ip, port)) 49 | response['ret'] = 'success' 50 | 51 | return json.dumps(response, indent=4) 52 | 53 | @app.route("/res/") 54 | def res(): 55 | response = dict() 56 | if request.method == "GET": 57 | res = peers.keys() 58 | response['resources'] = res 59 | return json.dumps(response, indent=4) 60 | 61 | @app.route("/debug/") 62 | def debug(): 63 | import pdb 64 | pdb.set_trace() 65 | return 'debuging' 66 | 67 | if __name__ == '__main__': 68 | app.run(debug=True, host='0.0.0.0', port=6000, threaded=True) 69 | -------------------------------------------------------------------------------- /tracker/tracker/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/alibaba/minion/e98530138bbac133c52d3e1ee05b8ec392bfb916/tracker/tracker/__init__.py -------------------------------------------------------------------------------- /tracker/tracker/settings.py: -------------------------------------------------------------------------------- 1 | """ 2 | Django settings for tracker project. 3 | 4 | For more information on this file, see 5 | https://docs.djangoproject.com/en/1.7/topics/settings/ 6 | 7 | For the full list of settings and their values, see 8 | https://docs.djangoproject.com/en/1.7/ref/settings/ 9 | """ 10 | 11 | # Build paths inside the project like this: os.path.join(BASE_DIR, ...) 12 | import os 13 | BASE_DIR = os.path.dirname(os.path.dirname(__file__)) 14 | 15 | 16 | # Quick-start development settings - unsuitable for production 17 | # See https://docs.djangoproject.com/en/1.7/howto/deployment/checklist/ 18 | 19 | # SECURITY WARNING: keep the secret key used in production secret! 20 | SECRET_KEY = 'b^74&*90rf^r8mp20lzibf(*2mnv66*+*u0*5x*z^)(2_3w7!)' 21 | 22 | # SECURITY WARNING: don't run with debug turned on in production! 
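# (for a real deployment, set DEBUG = False and replace the wildcard
# ALLOWED_HOSTS below with the tracker's actual hostnames)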
/tracker/tracker/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alibaba/minion/e98530138bbac133c52d3e1ee05b8ec392bfb916/tracker/tracker/__init__.py
--------------------------------------------------------------------------------
/tracker/tracker/settings.py:
--------------------------------------------------------------------------------
"""
Django settings for tracker project.

For more information on this file, see
https://docs.djangoproject.com/en/1.7/topics/settings/

For the full list of settings and their values, see
https://docs.djangoproject.com/en/1.7/ref/settings/
"""

# Build paths inside the project like this: os.path.join(BASE_DIR, ...)
import os
BASE_DIR = os.path.dirname(os.path.dirname(__file__))


# Quick-start development settings - unsuitable for production
# See https://docs.djangoproject.com/en/1.7/howto/deployment/checklist/

# SECURITY WARNING: keep the secret key used in production secret!
SECRET_KEY = 'b^74&*90rf^r8mp20lzibf(*2mnv66*+*u0*5x*z^)(2_3w7!)'

# SECURITY WARNING: don't run with debug turned on in production!
DEBUG = True

TEMPLATE_DEBUG = True

ALLOWED_HOSTS = ['*']


# Application definition

INSTALLED_APPS = (
    #'django.contrib.admin',
    # the session/auth middleware below depends on these three apps,
    # so they must stay installed together
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    'peer',
)

MIDDLEWARE_CLASSES = (
    'django.contrib.sessions.middleware.SessionMiddleware',
    'django.middleware.common.CommonMiddleware',
    'django.middleware.csrf.CsrfViewMiddleware',
    'django.contrib.auth.middleware.AuthenticationMiddleware',
    'django.contrib.auth.middleware.SessionAuthenticationMiddleware',
    'django.contrib.messages.middleware.MessageMiddleware',
    'django.middleware.clickjacking.XFrameOptionsMiddleware',
)

ROOT_URLCONF = 'tracker.urls'

WSGI_APPLICATION = 'tracker.wsgi.application'


# Database
# https://docs.djangoproject.com/en/1.7/ref/settings/#databases

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'tracker',
        'USER': 'root',
        'PASSWORD': '',
        'HOST': 'localhost',
        'PORT': '3306',
        # 'TIMEOUT' is not a valid database setting; pass the driver's
        # connect timeout (in seconds) through OPTIONS instead
        'OPTIONS': {'connect_timeout': 2},
    },
}

# Internationalization
# https://docs.djangoproject.com/en/1.7/topics/i18n/

LANGUAGE_CODE = 'en-us'

TIME_ZONE = 'UTC'

USE_I18N = True

USE_L10N = True

USE_TZ = True


# Static files (CSS, JavaScript, Images)
# https://docs.djangoproject.com/en/1.7/howto/static-files/

STATIC_URL = '/static/'

LOGGING = {
    'version': 1,
    'handlers': {
        'console': {
            'level': 'DEBUG',
            'class': 'logging.StreamHandler',
        },
    },
    'loggers': {
        'django.request': {
            'handlers': ['console'],
            'propagate': True,
            'level': 'DEBUG',
        },
    },
}
--------------------------------------------------------------------------------
/tracker/tracker/urls.py:
--------------------------------------------------------------------------------
from django.conf.urls import patterns, url
from django.views.decorators.csrf import csrf_exempt

from peer.views import PeerView, ResView

urlpatterns = patterns('',
    url(r'^res/', csrf_exempt(ResView.as_view())),
    url(r'^peer/', csrf_exempt(PeerView.as_view())),
)
--------------------------------------------------------------------------------
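Both routes are CSRF-exempt because peers POST and DELETE from scripts rather than browser forms. A quick way to exercise them without deploying anything is Django's test client; this sketch assumes it is run from the tracker/ project directory, with the MySQL database from settings.py reachable and synced, and the peer values are made up for illustration:

```
import json
import os

import django
from django.test import Client

os.environ.setdefault("DJANGO_SETTINGS_MODULE", "tracker.settings")
django.setup()  # required on django >= 1.7 before touching models

c = Client()

# register a (hypothetical) peer serving a resource ...
c.post("/peer/",
       data=json.dumps({"ip": "10.0.0.1", "port": 8998,
                        "res": "http://foo.bar/testfile"}),
       content_type="application/json")

# ... then ask the tracker who serves that resource
resp = c.get("/peer/", {"res": "http://foo.bar/testfile"})
print json.loads(resp.content)  # peer list plus {'ret': 'success'}
```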
/tracker/tracker/wsgi.py:
--------------------------------------------------------------------------------
"""
WSGI config for tracker project.

It exposes the WSGI callable as a module-level variable named ``application``.

For more information on this file, see
https://docs.djangoproject.com/en/1.7/howto/deployment/wsgi/
"""

import os
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "tracker.settings")

from django.core.wsgi import get_wsgi_application
application = get_wsgi_application()
--------------------------------------------------------------------------------
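The README recommends running the tracker behind a real WSGI server and nginx in production. For a quick local smoke test that the entry point above imports and serves, Python's bundled wsgiref server is enough; this is development-only, assumes it is run from the tracker/ directory, and port 8000 is an arbitrary choice:

```
from wsgiref.simple_server import make_server

from tracker.wsgi import application

# development-only smoke test; use a real WSGI server + nginx in production
make_server("0.0.0.0", 8000, application).serve_forever()
```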