├── .gitignore
├── LICENSE
├── README.md
└── scp-chunk.py

/.gitignore:
--------------------------------------------------------------------------------
.DS_Store

*.py[cod]

# C extensions
*.so

# Packages
*.egg
*.egg-info
dist
build
eggs
parts
bin
var
sdist
develop-eggs
.installed.cfg
lib
lib64

# Installer logs
pip-log.txt

# Unit test / coverage reports
.coverage
.tox
nosetests.xml

# Translations
*.mo

# Mr Developer
.mr.developer.cfg
.project
.pydevproject
.idea
*.swp

*.mov
split_parts_*
chunks_*
*.bak
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
The MIT License (MIT)

Copyright (c) 2013 Sohonet Ltd

Permission is hereby granted, free of charge, to any person obtaining a copy of
this software and associated documentation files (the "Software"), to deal in
the Software without restriction, including without limitation the rights to
use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
the Software, and to permit persons to whom the Software is furnished to do so,
subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
scp-chunk
===================

## Why?

For transferring files over high-latency links. Depending on the TCP/IP stack and the version of ssh installed, latency can limit the speed that a single transfer will achieve on a per-connection basis. To work around this, scp-chunk transfers multiple chunks at the same time.

Uses the system python, without having to install any other python packages: just put this on the machine and go.

Can use rsync instead of scp.

## How it works

Split a large file into chunks, then transfer the chunks over multiple scp connections at once.
Then join the chunks back together at the remote end and check the checksum.
Then clean up all the chunks, at both the local and remote ends.
At peak it will use twice the disk space of the file being transferred, at each end.
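
For illustration, one run is roughly equivalent to the manual steps below (a sketch only: `bigfile.mov` and `ben@example.com` are placeholder names, scp-chunk does the chunking in python rather than with split, and it runs the scp step across multiple threads with retries rather than one chunk at a time):

    split -b 500M bigfile.mov bigfile.mov.                       # chunk locally
    openssl md5 bigfile.mov                                      # note the local checksum
    for c in bigfile.mov.*; do scp "$c" ben@example.com: ; done  # scp-chunk parallelises this
    ssh ben@example.com 'cat bigfile.mov.* > bigfile.mov'        # reassemble remotely
    ssh ben@example.com 'openssl md5 bigfile.mov'                # verify the checksum
    rm bigfile.mov.*; ssh ben@example.com 'rm bigfile.mov.*'     # clean up both ends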

## Requirements

Uses rsync or scp to transfer the files to the remote system in parallel, and expects the user to be pre-keyed to the remote systems.
[See this article on how to set that up](http://hocuspokus.net/2008/01/ssh-shared-key-setup-ssh-logins-without-passwords/)
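
If you are not pre-keyed yet, one common way to set it up (assuming OpenSSH's `ssh-keygen` and `ssh-copy-id` are available; `foo@example.com` is a placeholder) is:

    ssh-keygen -t rsa                # generate a key pair if you do not already have one
    ssh-copy-id foo@example.com      # append your public key to the remote authorized_keys
    ssh foo@example.com true         # should now log in without a password prompt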

### Goal

Use the system python, without having to install any other python packages, just the programs listed below.

It is expected that the remote shell will provide access to the following commands (a quick way to check is shown after the lists):

#### Remote system

* [openssl](https://linux.die.net/man/1/openssl) used to calculate the checksum: **openssl md5 \<file\>**
* [cat](https://linux.die.net/man/1/cat) used to reassemble the chunks: **cat \<chunk\> >> \<file\>**
* [rm](https://linux.die.net/man/1/rm) used to remove the chunks: **rm \<chunk\>**
* [rsync](https://linux.die.net/man/1/rsync) used to transfer the chunks when --use_rsync is given

#### Local system

* [scp](https://linux.die.net/man/1/scp) to copy files to the remote system.
* [rsync](https://linux.die.net/man/1/rsync) used to transfer the chunks when --use_rsync is given
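
To sanity-check that the remote shell really provides these, something like the following should work (a sketch; replace `foo@example.com` with your server):

    ssh foo@example.com 'for c in openssl cat rm rsync; do command -v "$c" || echo "missing: $c"; done'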

## Usage

    usage: scp-chunk.py [-h] [-c CYPHER] [-s SIZE] [-r RETRIES] [-t THREADS] [--use_rsync]
                        src srv dst

    Chunk a file and then kick off multiple SCP threads. Speeds up transfers over high latency links

    positional arguments:
      src                   source file
      srv                   remote server and user if required, e.g. foo@example.com
      dst                   directory (if remote home dir then specify .)

    optional arguments:
      -h, --help            show this help message and exit
      -c CYPHER, --cypher CYPHER
                            cypher to use for the transfer, see: man ssh
      -s SIZE, --size SIZE  size of chunks to transfer.
      -r RETRIES, --retries RETRIES
                            number of times to retry transfer.
      -t THREADS, --threads THREADS
                            number of threads (default 3)
      --use_rsync           Use rsync instead of scp, scp is being deprecated

## Example output

    python scp-chunk.py 2GB.mov ben@10.110.10.121 . --threads 10

    splitting file
    uploading MD5 (d8ce4123aaacaec671a854f6ec74d8c0) checksum to remote site
    starting transfers
    Starting chunk: chunk_.00000 1:5 remaining 4 retries 0
    Starting chunk: chunk_.00001 2:5 remaining 3 retries 0
    Starting chunk: chunk_.00002 3:5 remaining 2 retries 0
    Starting chunk: chunk_.00003 4:5 remaining 1 retries 0
    Starting chunk: chunk_.00004 5:5 remaining 0 retries 0
    Finished chunk: chunk_.00004 5:5 remaining 0
    Finished chunk: chunk_.00002 3:5 remaining 0
    Finished chunk: chunk_.00001 2:5 remaining 0
    Finished chunk: chunk_.00000 1:5 remaining 0
    Finished chunk: chunk_.00003 4:5 remaining 0
    re-assembling file at remote end
    processing chunk_.00004 -
    re-assembled
    checking remote file checksum
    PASSED checksums match
    cleaning up
    removing file chunks
    removing file chunk chunk_.00004 \
    transfer complete

The latency of the link used for the next transfer:

    PING transfer.example.com (xxx.xxx.xxx.xxx): 56 data bytes
    64 bytes from xxx.xxx.xxx.xxx: icmp_seq=0 ttl=58 time=151.308 ms
    64 bytes from xxx.xxx.xxx.xxx: icmp_seq=1 ttl=58 time=151.264 ms
    64 bytes from xxx.xxx.xxx.xxx: icmp_seq=2 ttl=58 time=151.449 ms
    64 bytes from xxx.xxx.xxx.xxx: icmp_seq=3 ttl=58 time=150.927 ms

    python scp-chunk.py /Stuff/23GBlargefile.mov ben@transfer.example.com /Store/ben_test/ --threads 10 --size 1G

    splitting file
    uploading MD5 (5e631de28dd45d1b05952c885a882be1) checksum to remote site
    copying /Stuff/23GBlargefile.mov.md5 to /Store/ben_test/23GBlargefile.mov.md5
    starting transfers
    Starting chunk: /Stuff/23GBlargefile.mov.00000 1:29 remaining 28 retries 0
    Starting chunk: /Stuff/23GBlargefile.mov.00001 2:29 remaining 27 retries 0
    Starting chunk: /Stuff/23GBlargefile.mov.00002 3:29 remaining 26 retries 0
    Starting chunk: /Stuff/23GBlargefile.mov.00003 4:29 remaining 25 retries 0
    Starting chunk: /Stuff/23GBlargefile.mov.00004 5:29 remaining 24 retries 0
    Starting chunk: /Stuff/23GBlargefile.mov.00005 6:29 remaining 23 retries 0
    Starting chunk: /Stuff/23GBlargefile.mov.00006 7:29 remaining 22 retries 0
    Starting chunk: /Stuff/23GBlargefile.mov.00007 8:29 remaining 21 retries 0
    Starting chunk: /Stuff/23GBlargefile.mov.00008 9:29 remaining 20 retries 0
    Starting chunk: /Stuff/23GBlargefile.mov.00009 10:29 remaining 19 retries 0
    Finished chunk: /Stuff/23GBlargefile.mov.00008 9:29 remaining 19
    Starting chunk: /Stuff/23GBlargefile.mov.00010 11:29 remaining 18 retries 0

    Finished chunk: /Stuff/23GBlargefile.mov.00019 20:29 remaining 2
    Starting chunk: /Stuff/23GBlargefile.mov.00027 28:29 remaining 1 retries 0
    Finished chunk: /Stuff/23GBlargefile.mov.00017 18:29 remaining 1
    Starting chunk: /Stuff/23GBlargefile.mov.00028 29:29 remaining 0 retries 0
    Finished chunk: /Stuff/23GBlargefile.mov.00014 15:29 remaining 0
    Finished chunk: /Stuff/23GBlargefile.mov.00028 29:29 remaining 0
    Finished chunk: /Stuff/23GBlargefile.mov.00020 21:29 remaining 0
    Finished chunk: /Stuff/23GBlargefile.mov.00024 25:29 remaining 0
    Finished chunk: /Stuff/23GBlargefile.mov.00021 22:29 remaining 0
    Finished chunk: /Stuff/23GBlargefile.mov.00025 26:29 remaining 0
    Finished chunk: /Stuff/23GBlargefile.mov.00023 24:29 remaining 0
    Finished chunk: /Stuff/23GBlargefile.mov.00022 23:29 remaining 0
    Finished chunk: /Stuff/23GBlargefile.mov.00026 27:29 remaining 0
    Finished chunk: /Stuff/23GBlargefile.mov.00027 28:29 remaining 0
    re-assembling file at remote end
    processing 23GBlargefile.mov /
    re-assembled
    checking remote file checksum
    PASSED checksums match
    cleaning up
    removing file chunks
    removing file chunk /Stuff/23GBlargefile.mov.00028 -
    transfer complete
    --------------------------------------------------------------------------------
    file size              :28.2 GB
    transfer rate          :25.7 MB/s
                           :205.9 Mb/s
    transfer time          :18 minutes 43 seconds
    local chunking time    :10 minutes 35 seconds
    remote reassembly time :4 minutes 20 seconds
    remote checksum time   :1 minute 28 seconds
    total transfer rate    :13.3 MB/s
                           :106.6 Mb/s
    total time             :36 minutes 8 seconds

It would be faster if the disks were not rubbish at the source end. SSD at each end would make it faster to chunk the file.

## Thank you

[Bytes-to-human / human-to-bytes converter](http://code.activestate.com/recipes/578019-bytes-to-human-human-to-bytes-converter/)

[Humanize time](https://github.com/liudmil-mitev/experiments/blob/master/time/humanize_time.py)
--------------------------------------------------------------------------------
/scp-chunk.py:
--------------------------------------------------------------------------------
__author__ = "Patrick Sumby and Ben Roeder"

import argparse
import hashlib
import os
import re
import subprocess
import sys
import time
from queue import Empty, Queue
from subprocess import CalledProcessError
from threading import Thread

# Raw string so the backslashes reach the regex engine intact.
winPath = re.compile(r"^(\w):[\\/](.*)$", re.IGNORECASE)

default_num_threads = 3
default_retries = 0
default_cypher = "aes128-ctr"
split_file_basename = "chunk_"
use_rsync = False
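# winPath matches Windows drive-letter paths so that, when rsync is used, they
# can be rewritten in POSIX style (see WorkerThread.upload_chunk below), e.g.
# with an illustrative path:
#   C:\Users\ben\file.mov  ->  /C/Users/ben/file.mov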

INTERVALS = [1, 60, 3600, 86400, 604800, 2419200, 29030400]
NAMES = [
    ("second", "seconds"),
    ("minute", "minutes"),
    ("hour", "hours"),
    ("day", "days"),
    ("week", "weeks"),
    ("month", "months"),
    ("year", "years"),
]


def humanize_time(amount, units):
    """
    Divide `amount` in time periods.
    Useful for making time intervals more human readable.

    >>> humanize_time(173, "hours")
    [(1, 'week'), (5, 'hours')]
    >>> humanize_time(17313, "seconds")
    [(4, 'hours'), (48, 'minutes'), (33, 'seconds')]
    >>> humanize_time(90, "weeks")
    [(1, 'year'), (10, 'months'), (2, 'weeks')]
    >>> humanize_time(42, "months")
    [(3, 'years'), (6, 'months')]
    >>> humanize_time(500, "days")
    [(1, 'year'), (5, 'months'), (3, 'weeks'), (3, 'days')]
    """
    result = []

    unit = list(map(lambda a: a[1], NAMES)).index(units)
    # Convert to seconds
    amount = amount * INTERVALS[unit]

    for i in range(len(NAMES) - 1, -1, -1):
        a = amount // INTERVALS[i]
        if a > 0:
            # NAMES[i][1 % a] picks the singular form when a == 1, else the plural.
            result.append((a, NAMES[i][1 % a]))
            amount -= a * INTERVALS[i]

    return result


def humanize_time_to_string(time):
    time_str = ""
    for t, units in time:
        time_str += str(t) + " " + str(units) + " "
    return time_str


# see: http://goo.gl/kTQMs
SYMBOLS = {
    "customary": ("B", "K", "M", "G", "T", "P", "E", "Z", "Y"),
    "customary_ext": ("byte", "kilo", "mega", "giga", "tera", "peta", "exa", "zetta", "iotta"),
    "iec": ("Bi", "Ki", "Mi", "Gi", "Ti", "Pi", "Ei", "Zi", "Yi"),
    "iec_ext": ("byte", "kibi", "mebi", "gibi", "tebi", "pebi", "exbi", "zebi", "yobi"),
}


def bytes2human(n, format="%(value).1f %(symbol)s", symbols="customary"):
    """
    Convert n bytes into a human readable string based on format.
    symbols can be either "customary", "customary_ext", "iec" or "iec_ext",
    see: http://goo.gl/kTQMs

    >>> bytes2human(0)
    '0.0 B'
    >>> bytes2human(0.9)
    '0.0 B'
    >>> bytes2human(1)
    '1.0 B'
    >>> bytes2human(1.9)
    '1.0 B'
    >>> bytes2human(1024)
    '1.0 K'
    >>> bytes2human(1048576)
    '1.0 M'
    >>> bytes2human(1099511627776127398123789121)
    '909.5 Y'

    >>> bytes2human(9856, symbols="customary")
    '9.6 K'
    >>> bytes2human(9856, symbols="customary_ext")
    '9.6 kilo'
    >>> bytes2human(9856, symbols="iec")
    '9.6 Ki'
    >>> bytes2human(9856, symbols="iec_ext")
    '9.6 kibi'

    >>> bytes2human(10000, "%(value).1f %(symbol)s/sec")
    '9.8 K/sec'

    >>> # precision can be adjusted by playing with %f operator
    >>> bytes2human(10000, format="%(value).5f %(symbol)s")
    '9.76562 K'
    """
    n = int(n)
    if n < 0:
        raise ValueError("n < 0")
    symbols = SYMBOLS[symbols]
    prefix = {}
    for i, s in enumerate(symbols[1:]):
        prefix[s] = 1 << (i + 1) * 10
    for symbol in reversed(symbols[1:]):
        if n >= prefix[symbol]:
            value = float(n) / prefix[symbol]
            return format % locals()
    return format % dict(symbol=symbols[0], value=n)


def human2bytes(s):
    """
    Attempts to guess the string format based on default symbols
    set and return the corresponding bytes as an integer.
    When unable to recognize the format ValueError is raised.

    >>> human2bytes('0 B')
    0
    >>> human2bytes('1 K')
    1024
    >>> human2bytes('1 M')
    1048576
    >>> human2bytes('1 Gi')
    1073741824
    >>> human2bytes('1 tera')
    1099511627776

    >>> human2bytes('0.5kilo')
    512
    >>> human2bytes('0.1 byte')
    0
    >>> human2bytes('1 k')  # k is an alias for K
    1024
    >>> human2bytes('12 foo')
    Traceback (most recent call last):
        ...
    ValueError: can't interpret '12 foo'
    """
    init = s
    num = ""
    while s and (s[0:1].isdigit() or s[0:1] == "."):
        num += s[0]
        s = s[1:]
    num = float(num)
    letter = s.strip()
    for _, sset in SYMBOLS.items():
        if letter in sset:
            break
    else:
        if letter == "k":
            # treat 'k' as an alias for 'K' as per: http://goo.gl/kTQMs
            sset = SYMBOLS["customary"]
            letter = letter.upper()
        else:
            raise ValueError("can't interpret {0!r}".format(init))
    prefix = {sset[0]: 1}
    for i, s in enumerate(sset[1:]):
        prefix[s] = 1 << (i + 1) * 10
    return int(num * prefix[letter])


def spinning_cursor():
    while True:
        for cursor in "|/-\\":
            yield cursor


spinner = spinning_cursor()


def spin(text):
    # Draw the text plus the next spinner frame, then backspace over it so the
    # next call overwrites it in place.
    sys.stdout.write(text + " " + next(spinner))
    sys.stdout.flush()
    back_spc = (len(text) + 2) * "\b"
    sys.stdout.write(back_spc)


def split_file_and_md5(file_name, prefix, max_size, padding_width=5, buff=1024 * 1024 * 5):
    """Split file_name into chunks of roughly max_size bytes, hashing as it goes.

    Returns ((file_name, file_md5), [(chunk_name, chunk_md5), ...]).
    """
    chunks = []
    file_md5 = hashlib.md5()
    (path, file_name_part) = os.path.split(file_name)
    # Read-only binary mode; "r+b" would open the source writable for no reason.
    with open(file_name, "rb") as src:
        suffix = 0
        while True:
            chunk_name = os.path.join(path, prefix + ".%0*d" % (padding_width, suffix))
            with open(chunk_name, "w+b") as tgt:
                chunk_md5 = hashlib.md5()
                written = 0
                # Copy up to max_size bytes, in buff-sized pieces, into this chunk.
                while written < max_size:
                    data = src.read(buff)
                    if data:
                        file_md5.update(data)
                        chunk_md5.update(data)
                        tgt.write(data)
                        written += buff
                        spin(chunk_name)
                    else:
                        # End of the source file: record the final chunk and return.
                        chunks.append((chunk_name, chunk_md5.hexdigest()))
                        return ((file_name, file_md5.hexdigest()), chunks)
            suffix += 1
            chunks.append((chunk_name, chunk_md5.hexdigest()))
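
# For reference, split_file_and_md5 returns the whole-file digest plus one
# (path, md5) pair per chunk, shaped like this (illustrative names and digests):
#   (("2GB.mov", "d8ce4123aaacaec671a854f6ec74d8c0"),
#    [("2GB.mov.00000", "9e107d9d372bb6826bd81d3542a419d6"),
#     ("2GB.mov.00001", "e4d909c290d0fb1ca068ffaddf22cbd0")])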

class WorkerThread(Thread):
    def __init__(self, file_queue, dst_file, remote_server, cypher):
        Thread.__init__(self)
        self.file_queue = file_queue
        self.dst_file = dst_file
        self.remote_server = remote_server
        self.cypher = cypher

    def run(self):
        while True:
            try:
                # The timeout guards against racing other workers for the last item.
                (src_file, dest_file, chunk_num, total_chunks, retries) = self.file_queue.get(timeout=1)
            except Empty:
                # Queue drained: this worker is finished.
                return
            try:
                print(
                    "Starting chunk: "
                    + src_file
                    + " "
                    + str(chunk_num)
                    + ":"
                    + str(total_chunks)
                    + " remaining "
                    + str(self.file_queue.qsize())
                    + " retries "
                    + str(retries)
                )
                res = self.upload_chunk(src_file, dest_file, use_rsync)
            except Exception:
                print("ERROR: exception while uploading " + src_file + " " + str(chunk_num))
                res = False
            if res:
                print(
                    "Finished chunk: "
                    + src_file
                    + " "
                    + str(chunk_num)
                    + ":"
                    + str(total_chunks)
                    + " remaining "
                    + str(self.file_queue.qsize())
                )
            else:
                retries = retries - 1
                if retries > 0:
                    print(
                        "Re-queuing failed chunk: "
                        + src_file
                        + " "
                        + str(chunk_num)
                        + " retries left "
                        + str(retries)
                    )
                    self.file_queue.put((src_file, dest_file, chunk_num, total_chunks, retries))
                else:
                    print("ERROR: FAILED to upload " + src_file + " " + str(chunk_num))
            # Every get() needs a matching task_done() or q.join() never returns;
            # a re-queued chunk was put() back as a fresh unit of work.
            self.file_queue.task_done()

    def upload_chunk(self, src_file, dest_file, use_rsync=False):
        try:
            if use_rsync:
                if winPath.match(src_file):
                    # Rewrite C:\path\file as /C/path/file for rsync.
                    src_file = winPath.sub(r"/\g<1>/\g<2>", src_file)
                    src_file = src_file.replace("\\", "/")
                subprocess.check_call(
                    [
                        "rsync",
                        "-Ptz",
                        "--inplace",
                        "--rsh=ssh",
                        "--timeout=30",
                        src_file,
                        self.remote_server + ":" + dest_file,
                    ]
                )
            else:
                subprocess.check_call(
                    [
                        "scp",
                        "-c" + self.cypher,
                        "-q",
                        "-oBatchMode=yes",
                        "-oConnectTimeout=30",
                        src_file,
                        self.remote_server + ":" + dest_file,
                    ]
                )
        except CalledProcessError:
            return False
        return True
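
# For reference, upload_chunk shells out to one of the following, shown here
# with the default cypher and illustrative host/paths:
#   scp -caes128-ctr -q -oBatchMode=yes -oConnectTimeout=30 2GB.mov.00000 ben@example.com:./2GB.mov.00000
#   rsync -Ptz --inplace --rsh=ssh --timeout=30 2GB.mov.00000 ben@example.com:./2GB.mov.00000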

def get_file_md5(filename, buffer_size=1024 * 1024 * 2):
    """Return the hex digest of a file without loading it all into memory"""
    digest = hashlib.md5()
    # Binary mode: text mode could fail or mangle arbitrary bytes.
    with open(filename, "rb") as fh:
        while True:
            buf = fh.read(buffer_size)
            if not buf:
                break
            digest.update(buf)
    return str(digest.hexdigest()).lower()


def human_sizes(size):
    """argparse type-checker for sizes like 500M or 1G; the original string is returned."""
    try:
        human2bytes(size)
    except ValueError:
        msg = "Invalid size " + str(size) + " try 1G"
        raise argparse.ArgumentTypeError(msg)
    return size


def main():
    start_time = time.time()
    # Read in arguments
    parser = argparse.ArgumentParser(
        description="Chunk a file and then kick off multiple SCP threads. "
        "Speeds up transfers over high latency links"
    )
    parser.add_argument(
        "-c", "--cypher", help="cypher to use for the transfer, see: man ssh", default=default_cypher, required=False
    )
    parser.add_argument(
        "-s", "--size", help="size of chunks to transfer.", default="500M", required=False, type=human_sizes
    )
    parser.add_argument(
        "-r", "--retries", help="number of times to retry transfer.", default=default_retries, required=False, type=int
    )
    parser.add_argument(
        "-t",
        "--threads",
        help="number of threads (default " + str(default_num_threads) + ")",
        default=default_num_threads,
        required=False,
        type=int,
    )
    parser.add_argument("src", help="source file")
    parser.add_argument("srv", help="remote server and user if required, e.g. foo@example.com")
    parser.add_argument("dst", help="directory (if remote home dir then specify .)")
    parser.add_argument(
        "--use_rsync", action="store_true", default=False, help="Use rsync instead of scp, scp is being deprecated"
    )

    args = parser.parse_args()

    ssh_crypto = args.cypher
    try:
        chunk_size = human2bytes(args.size)
    except ValueError as e:
        print("Invalid chunk size " + str(e))
        exit(1)
    num_threads = args.threads
    src_file = args.src
    dst_file = args.dst
    remote_server = args.srv
    retries = args.retries
    global use_rsync
    use_rsync = args.use_rsync
    (dest_path, _) = os.path.split(dst_file)
    if dest_path == "":
        dest_path = "~/"
    (_, src_filename) = os.path.split(src_file)
    remote_dest_file = os.path.join(dest_path, src_filename)
    remote_chunk_files = []

    # Check args for errors.
    if not os.path.exists(src_file):
        print("Error: Source file does not exist", src_file)
        exit(1)
    if not os.path.isfile(src_file):
        print("Error: Source is not a file", src_file)
        exit(1)

    src_file_size = os.stat(src_file).st_size
    # Split file and calc the file md5
    local_chunk_start_time = time.time()
    print("splitting file")
    spinner = spinning_cursor()
    sys.stdout.write(next(spinner))
    sys.stdout.flush()
    sys.stdout.write("\b")
    (src_file_info, chunk_infos) = split_file_and_md5(src_file, src_filename, chunk_size)
    src_file_md5 = src_file_info[1]
    local_chunk_end_time = time.time()
    print("uploading MD5 ({0!s}) checksum to remote site".format(src_file_md5))
    try:
        checksum_filename = src_file + ".md5"
        dest_checksum_filename = os.path.join(dest_path, src_filename + ".md5")
        with open(checksum_filename, "w+") as checksum_file:
            checksum_file.write(src_file_md5 + " " + src_filename)
        print("copying " + checksum_filename + " to " + dest_checksum_filename)
        subprocess.check_call(
            [
                "scp",
                "-c" + ssh_crypto,
                "-q",
                "-oBatchMode=yes",
                checksum_filename,
                remote_server + ":" + dest_checksum_filename,
            ]
        )
    except CalledProcessError as e:
        print(e.returncode)
        print("ERROR: Couldn't connect to remote server.")
        exit(1)
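
    # Each queue entry is one unit of work for a WorkerThread, of the form
    # (local chunk path, remote chunk path, chunk number, total chunks, retries left),
    # e.g. ("2GB.mov.00000", "./2GB.mov.00000", 1, 5, 0) with illustrative values.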
    # Fill the queue of files to transfer
    q = Queue()
    chunk_num = 1
    total_chunks = len(chunk_infos)
    for src_chunk_filename, chunk_md5 in chunk_infos:
        # create destination path
        (_, src_filename) = os.path.split(src_chunk_filename)
        dest_chunk_filename = os.path.join(dest_path, src_filename)
        remote_chunk_files.append((src_chunk_filename, dest_chunk_filename, chunk_md5))
        q.put((src_chunk_filename, dest_chunk_filename, chunk_num, total_chunks, retries))
        chunk_num = chunk_num + 1

    # Kick off threads
    transfer_start_time = time.time()
    print("starting transfers")
    for _ in range(num_threads):
        t = WorkerThread(q, dst_file, remote_server, ssh_crypto)
        t.daemon = True
        t.start()
    q.join()
    transfer_end_time = time.time()
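
    # ssh concatenates its arguments into a single remote shell command line, so
    # each call below runs, on the remote side (illustrative paths):
    #   cat ./2GB.mov.00000 >> ./2GB.mov
    # with ">" instead of ">>" for the first chunk so any existing file is truncated.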
    # join the chunks back together and check the md5
    print("re-assembling file at remote end")
    remote_chunk_start_time = time.time()
    chunk_count = 0
    for chunk_filename, chunk_md5 in chunk_infos:
        (path, remote_chunk_filename) = os.path.split(chunk_filename)
        remote_chunk_file = os.path.join(dest_path, remote_chunk_filename)
        spin("processing " + remote_chunk_filename)
        if chunk_count:
            cmd = remote_chunk_file + " >> " + remote_dest_file
        else:
            # truncate if the first chunk
            cmd = remote_chunk_file + " > " + remote_dest_file
        subprocess.call(["ssh", remote_server, "cat", cmd])
        chunk_count += 1
    print()
    print("re-assembled")
    remote_chunk_end_time = time.time()

    print("checking remote file checksum")
    remote_checksum_start_time = time.time()
    try:
        # use openssl to be cross platform (OSX, Linux)
        checksum = subprocess.check_output(["ssh", remote_server, "openssl", "md5", remote_dest_file])
        # output looks like: MD5(2GB.mov)= d8ce4123aaacaec671a854f6ec74d8c0
        remote_md5 = checksum.decode("utf-8").strip()
        if remote_md5.find(src_file_md5) != -1:
            print("PASSED checksums match")
        else:
            print("ERROR: MD5s do not match local(" + src_file_md5 + ") != (" + remote_md5 + ")")
            print("       File uploaded with errors - MD5 did not match.")
            print("       local and remote chunks not cleared up")
            exit(1)
    except CalledProcessError as e:
        print(e.returncode)
        print("ERROR: File uploaded with errors - MD5 did not match.")
        print("       local and remote chunks not cleared up")
        exit(1)

    remote_checksum_end_time = time.time()
    # clean up
    print("cleaning up")
    print("removing file chunks")
    for local_chunk, remote_chunk, chunk_md5 in remote_chunk_files:
        spin("removing file chunk " + local_chunk)
        os.remove(local_chunk)
        try:
            # check_call (not call) so a failed remote rm is actually reported
            subprocess.check_call(["ssh", remote_server, "rm", remote_chunk])
        except CalledProcessError as e:
            print(e.returncode)
            print("ERROR: failed to remove remote chunk " + remote_chunk)
    print("")
    print("transfer complete")
    end_time = time.time()
    # max(..., 1) guards the rate maths against sub-second timings
    transfer_secs = max(int(transfer_end_time - transfer_start_time), 1)
    total_secs = max(int(end_time - start_time), 1)
    print("-" * 80)
    print("file size              :" + bytes2human(src_file_size) + "B")
    print("transfer rate          :" + bytes2human(src_file_size / transfer_secs) + "B/s")
    print("                       :" + bytes2human((src_file_size * 8) / transfer_secs) + "b/s")
    print("transfer time          :" + str(humanize_time_to_string(humanize_time(transfer_secs, "seconds"))))
    print(
        "local chunking time    :"
        + str(humanize_time_to_string(humanize_time(int(local_chunk_end_time - local_chunk_start_time), "seconds")))
    )
    print(
        "remote reassembly time :"
        + str(humanize_time_to_string(humanize_time(int(remote_chunk_end_time - remote_chunk_start_time), "seconds")))
    )
    print(
        "remote checksum time   :"
        + str(
            humanize_time_to_string(
                humanize_time(int(remote_checksum_end_time - remote_checksum_start_time), "seconds")
            )
        )
    )
    print("total transfer rate    :" + bytes2human(src_file_size / total_secs) + "B/s")
    print("                       :" + bytes2human((src_file_size * 8) / total_secs) + "b/s")
    print("total time             :" + str(humanize_time_to_string(humanize_time(total_secs, "seconds"))))

    exit(0)


if __name__ == "__main__":
    main()
--------------------------------------------------------------------------------