├── requirements.txt
├── .gitignore
├── setup.py
├── README.md
└── torram

--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
bencode.py

--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
*.py[cod]

# C extensions
*.so

# Packages
*.egg
*.egg-info
dist
build
eggs
parts
bin
var
sdist
develop-eggs
.installed.cfg
lib
lib64

# Installer logs
pip-log.txt

# Unit test / coverage reports
.coverage
.tox
nosetests.xml

# Translations
*.mo

# Mr Developer
.mr.developer.cfg
.project
.pydevproject

--------------------------------------------------------------------------------
/setup.py:
--------------------------------------------------------------------------------
from setuptools import setup

setup(
    name="torram",
    version="1.0.1",
    install_requires=['bencode.py'],
    scripts=['torram'],

    # Metadata
    author="Volodymyr Buell (Buiel)",
    author_email="vbuell@gmail.com",
    url="https://github.com/vbuell/torrent-upstart",
    description=("Utility that recreates a torrent download folder "
                 "from fully and partially downloaded files."),
    # license="APL2",
)

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# torram

torram (formerly torrent-upstart) is a utility that recreates a torrent
download directory from fully and partially downloaded file(s). If several
partially downloaded sources of the same incomplete torrent are found, it
merges the downloaded chunks from all of them into the target files.

## Use cases

There are two major cases where this utility can save you traffic or help you
recover files:

#### The "hey, where are all the seeders?" situation

This is the most common situation: you have several torrents containing the
same file(s), but for some reason these torrents are dead, so one source is
75% done and another is 85% done... There is a good chance that you already
have enough data to combine the downloaded blocks from incomplete file #1 with
the downloaded blocks from file #2 and end up with a complete file. The more
sources you have, the better the result.

> :warning: After files are combined, do a "Force check" in your torrent
> client. Otherwise the torrent client won't know that the files were changed.

**Note**: I have had good results recovering some extremely rare artifacts by
merging partially downloaded files obtained from different p2p sources: a
torrent client and an mlDonkey temp directory. In that case just make sure
that both files are children of the *root* directory (see the Usage section).

#### The "I don't want to download it twice" situation

It's a good idea to scan your hard drive with this utility before starting a
download, so that already-downloaded files can be reused.

> :warning: Again, make sure that you "Force check" the torrent after you run
> this tool...

**Note**: Because of the way files are split into pieces in a .torrent, it is
sometimes impossible to verify the hash of a file's first and last piece:
pieces are hashed across file boundaries, so those pieces may also contain
data from neighbouring files. After a "Force check" you may therefore see the
file at 99% even though it is actually 100% complete.
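To make the arithmetic behind this note concrete, here is a minimal sketch
(the function name and the numbers are made up for illustration; torram does
this bookkeeping internally):

```python
def fully_contained_pieces(file_offset, file_length, piece_length):
    """Indices of the pieces that lie entirely inside a single file.

    Pieces that straddle the file's first or last byte also cover data from
    neighbouring files, so they can't be verified from this file alone.
    """
    first = -(-file_offset // piece_length)             # ceil: first piece starting inside the file
    last = (file_offset + file_length) // piece_length  # exclusive: last piece ending inside the file
    return range(first, last)

# A 1000-byte file at global offset 300 in a torrent with 256-byte pieces:
# piece 1 straddles the file's start and piece 5 straddles its end,
# so only pieces 2-4 can be verified.
print(list(fully_contained_pieces(300, 1000, 256)))  # -> [2, 3, 4]
```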
## How does it work

A .torrent file describes one or several files, each defined as a record of
fields (file name, file size, and so on), plus error-detection information in
the form of a SHA-1 hash for every fixed-size piece of the payload. This
error-detection part is what allows us to do the trick: find pieces on your
filesystem whose hashes match the ones in the .torrent and generate the output
file from those pieces.

So the algorithm is:

* scan the *root* directory (the last positional argument) recursively and
  find potential candidates for each file of the .torrent using two simple
  matching rules: the file should either be named as expected or have the
  expected file size
* split each candidate file into pieces and calculate the hash of every piece
  (see the sketch after this list) in order to find the blocks that can be
  copied into the destination file
* if multiple candidates with reusable blocks are found, prompt the user to
  choose one of these methods: merge from multiple files (the default), use
  one of the files without merging, or skip the file
* based on the user's selection, create the file (or skip creation) and place
  it in the output directory (`-o` argument)
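As an illustration of the verification step, here is a minimal sketch of piece
matching, simplified to a single-file torrent (the function name is
hypothetical; the real tool additionally maps pieces onto the files of a
multi-file torrent):

```python
import hashlib

def good_pieces(candidate_path, pieces, piece_length):
    """Return one bool per piece: does the candidate's data hash correctly?

    `pieces` is the raw 'pieces' value from the .torrent info dict:
    a concatenation of 20-byte SHA-1 digests, one per piece.
    """
    results = []
    with open(candidate_path, 'rb') as f:
        for i in range(len(pieces) // 20):
            expected = pieces[i * 20:(i + 1) * 20]
            data = f.read(piece_length)  # the last piece may be shorter
            results.append(hashlib.sha1(data).digest() == expected)
    return results
```

Pieces marked `False` in one candidate can then be filled in from any
candidate where the same index is `True`, which is exactly the merge operation
described above.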
## Features

* Combine several partially downloaded sources into one (yes, it does work!)
* Autodetect the output directory (qBittorrent only for now)
* Use symlinks instead of copying files (danger... danger... danger...)
* Manual and automatic modes (several levels of automation; see the `-s` and
  `-ss` switches)

## Requirements

* Python >= 3.4
* bencode.py (https://pypi.python.org/pypi/bencode)
* PyQt4 (only if you use qBittorrent output directory autodetection, switch
  `--autodetect_output_dir`)

## Usage

```
> ./torram --help
usage: torram [-h] [--symlink] [--minsize MINSIZE] [-v] [-o OUTPUT_DIR] [-a]
              [-c] [-s] [--fileext FILE_EXT] [--version]
              torrentFile root

Recreate download directory for .torrent file from fully and partially
downloaded file(s).

positional arguments:
  torrentFile           .torrent file to analyze
  root                  Directory to recursively search files in

optional arguments:
  -h, --help            show this help message and exit
  --symlink             Use symlinks instead of copying files. (Caution: You
                        may lose your data if files are not actually
                        identical.)
  --minsize MINSIZE     Minimum file size in bytes to be recoverable.
                        Default: 1048576
  -v, --verbose         Be verbose (multiple levels: use -vv or -vvv to see
                        more)
  -o OUTPUT_DIR, --output_dir OUTPUT_DIR
                        Output directory where to place recreated files
  -a, --autodetect_output_dir
                        Autodetect output directory (only qBittorrent
                        currently supported)
  -c, --use_color       Use ANSI colors in the output
  -s, --autoskip        Don't ask questions if there is only one viable
                        choice. Use -ss for even more automation
  --fileext FILE_EXT    Extension to be added to output files (e.g. .!qB for
                        incomplete qBittorrent files)
  --version             show program's version number and exit
```
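For example, a typical session might look like this (the .torrent name and
the paths are hypothetical):

```
> ./torram rare-album.torrent ~/downloads -o ~/restored --fileext .!qB
```

After the run, add the torrent to your client with `~/restored` as its
download location and do a "Force check".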
--------------------------------------------------------------------------------
/torram:
--------------------------------------------------------------------------------
#!/usr/bin/env python3
import hashlib
import io
import os
import re
import shutil
import stat
import sys
import tempfile

import bencode


DELUGE_DIR = '~/.config/deluge/state'
QBITTORRENT_RESUME_CONF = '~/.config/qBittorrent/qBittorrent-resume.conf'
FILES_DIR = '~'
MINIMUM_FILESIZE_TO_SEARCH = 1024 * 1024

VERSION = '1.0.1'  # keep in sync with setup.py


class AnsiFormatter(object):
    CODES = {'RED': "\033[31m",
             'BOLD': "\033[1m",
             'YELLOW': "\033[33m",
             'INVERT': "\033[40m\033[37m",
             'GREEN': "\033[32m",
             'BLUE': "\033[34m",
             'BLACK2': "\033[90m",
             'BLACK1': "\033[37m",
             'BLACK0': "\033[97m",
             }

    def format(self, txt, *code):
        return ''.join([self.CODES[c] for c in code]) + txt + "\033[0m"


class BaseFormatter(object):
    def format(self, txt, *code):
        return txt


class FileInfo:
    """Per-candidate state: path, piece check results, and piece grid offset."""
    def __init__(self):
        self.path = None
        self.chunks = []
        self.num_of_good_chunks = 0
        self.start_offset = None
        self.isOriginal = False
        self.isHardlink = False


def suggest_method(file_infos):
    """Suggest the default action: [S]kip, [M]erge, or the index of a single candidate."""
    fullest_file_idx = None
    fullest_file_rate = 0
    downloaded_file_rate = 0
    mixed_pieces = []

    # Calculate the success rate per candidate
    for idx, file_info in enumerate(file_infos):
        if len(file_info.chunks) == 0:
            return 'S'
        num_of_success = file_info.num_of_good_chunks
        rate = float(num_of_success) / len(file_info.chunks)
        if file_info.isOriginal:
            downloaded_file_rate = rate
        if rate > fullest_file_rate:
            fullest_file_rate = rate
            fullest_file_idx = idx

    # Calculate the success rate of a merge of all candidates:
    # a piece is recoverable if at least one candidate has a good copy of it
    if len(file_infos) > 1:
        for piece_results in zip(*[fi.chunks for fi in file_infos]):
            mixed_pieces.append(any(piece_results))
        num_of_success = sum(mixed_pieces)

        if float(num_of_success) / len(mixed_pieces) > fullest_file_rate:
            pattern = 'Got {0} of {1} good pieces from {2} files.'
            print(fmt.format(pattern.format(num_of_success, len(mixed_pieces), len(file_infos)), 'RED', 'BOLD'))
            return 'M'

    if downloaded_file_rate >= fullest_file_rate:
        return 'S'
    return str(fullest_file_idx)


def get_similarity_rate_and_color(success_blocks, all_blocks):
    if all_blocks == 0:
        return 'BLACK2', 'Bad'

    rate = float(success_blocks) / all_blocks
    if rate > 0.9:
        return 'GREEN', 'Excellent'
    if rate > 0.5:
        return 'YELLOW', 'Good'
    if rate > 0.01:
        return 'YELLOW', 'Poor'
    return 'BLACK2', 'Bad'


def get_file_sizes(info):
    global args
    if 'files' in info:  # file lengths from a multi-file torrent
        return [fi['length'] for fi in info['files'] if fi['length'] > args.minsize]
    else:  # a single-file torrent
        return [info['length']]


def get_possible_files(rootdir, sizes):
    """Find candidate files in the directory tree, indexed by file size."""
    filesizes = {}
    # Build up a dict with the file size as key and a list of file paths as value.
    for path, dirs, files in os.walk(rootdir):
        for filename in files:
            filepath = os.path.join(path, filename)
            filesize = os.lstat(filepath).st_size
            if args.verbose >= 2:
                print("File registered: %s (%d bytes)" % (filepath, filesize))
            if filesize in sizes:
                filesizes.setdefault(filesize, []).append(filepath)
    return filesizes


def remove_hard_links(files):
    """Keep only one path per inode, so hardlinked copies are not checked twice."""
    inode_to_filename = {}
    for filename in files:
        stat_info = os.stat(filename)
        inode_to_filename[stat_info[stat.ST_INO]] = filename
    return inode_to_filename.values()


def load_qbittorrent_conf(info_hash):
    """Load the torrent's save path from the qBittorrent settings."""
    from PyQt4 import QtCore

    settings = QtCore.QSettings(os.path.expanduser(QBITTORRENT_RESUME_CONF), QtCore.QSettings.IniFormat)

    root = settings.value('torrents').toPyObject()
    record = root[QtCore.QString(info_hash)]
    path = record[QtCore.QString('save_path')]

    if isinstance(path, QtCore.QString):
        path = str(path)

    return path


def check_file_chunk(expected_hash, offset, length, filename):
    """Check whether the file's data at (offset, length) hashes to expected_hash."""
    with open(filename, "rb") as sfile:
        sfile.seek(offset)
        piece = sfile.read(length)

    piece_hash = hashlib.sha1(piece).digest()
    return piece_hash == expected_hash


def get_chunk(filesizes, global_offset):
    """
    Translate an offset in the torrent's continuous payload into a
    (file index, offset within that file) pair.

    >>> get_chunk([], 0)
    (0, 0)
    >>> get_chunk([100], 0)
    (0, 0)
    >>> get_chunk([100], 100)
    (1, 0)
    >>> get_chunk([50, 50, 30], 100)
    (2, 0)
    """
    file_offset = 0
    file_idx = 0

    # Find the file containing the global offset
    for filesize in filesizes:
        if file_offset + filesize > global_offset:
            break

        file_offset += filesize
        file_idx += 1

    return (file_idx, global_offset - file_offset)


def construct_file(file_infos, piece_length, dest_filename):
    # Start from the candidate with the most good pieces...
    max_num_of_good_chunks = max(fi.num_of_good_chunks for fi in file_infos)
    best_file = next(fi for fi in file_infos if fi.num_of_good_chunks == max_num_of_good_chunks)

    f = tempfile.NamedTemporaryFile(mode='w+b', delete=False)
    file_name = f.name
    print("Temporary file:", file_name)
    f.close()
    print("Copy file:", best_file.path)
    shutil.copy(best_file.path, file_name)

    # ...then overwrite each piece with the first candidate that has a good copy of it
    if len(file_infos) > 1:
        with open(file_name, 'r+b') as f:
            for chunk_idx, chunks_merged in enumerate(zip(*[fi.chunks for fi in file_infos])):
                for i, piece_is_good in enumerate(chunks_merged):
                    if piece_is_good:
                        file_info = file_infos[i]
                        chunk_offset = file_info.start_offset + chunk_idx * piece_length
                        with open(file_info.path, 'rb') as src:
                            src.seek(chunk_offset)
                            f.seek(chunk_offset)
                            f.write(src.read(piece_length))
                        break

    print("Move temporary file to", dest_filename)
    shutil.move(file_name, dest_filename)


def ensure_dir_exists(f):
    d = os.path.dirname(f)
    if not os.path.exists(d):
        os.makedirs(d)
def guess_file(torrent_file_info, file_idx, files, pieces, piece_length, files_sizes_array, basedir):
    global args
    global fmt
    global save_path

    if 'path' in torrent_file_info:
        file_name = os.path.join(*torrent_file_info['path'])
    else:
        file_name = torrent_file_info['name']

    file_length = torrent_file_info['length']
    destination_path = os.path.join(save_path, basedir, file_name)

    if file_length in files:
        print("Processing file: " + fmt.format(str(file_name), 'BLUE', 'BOLD'))
        file_infos = []

        add_incomplete_file_with_different_size(destination_path, files[file_length])

        if args.autoskip and len(files[file_length]) < 2 and \
                any(file.startswith(destination_path) for file in files[file_length]):
            print("Only one file found. Thus, skipping...")
            return

        uniq_filenames = remove_hard_links(files[file_length])

        for file_number, file in enumerate(files[file_length]):
            file_info = FileInfo()
            file_info.path = file

            if file.startswith(destination_path):
                number_to_show = ' * '
                file_info.isOriginal = True
            else:
                number_to_show = ' ' + str(file_number) + ' '
            if file in uniq_filenames:
                pieces.seek(0)
                sys.stdout.write(fmt.format(number_to_show, 'BLACK0', 'BOLD') + str(file))

                num_of_checks = 0
                num_of_successes = 0
                offset = 0
                pieces_list = []
                while True:
                    expected_hash = pieces.read(20)
                    if not expected_hash:
                        break
                    idx, file_offset = get_chunk(files_sizes_array, offset * piece_length)

                    if idx == file_idx:
                        # Remember where the first piece of this file starts
                        # (test against None: an offset of 0 is a valid value)
                        if file_info.start_offset is None:
                            file_info.start_offset = file_offset
                        num_of_checks += 1
                        hash_result = check_file_chunk(expected_hash, file_offset, piece_length, file)
                        pieces_list.append(hash_result)
                        if hash_result:
                            num_of_successes += 1

                    offset += 1

                file_info.chunks = pieces_list
                file_info.num_of_good_chunks = num_of_successes
                file_infos.append(file_info)

                color_code, result_message = get_similarity_rate_and_color(num_of_successes, num_of_checks)
                pattern = ' [{0} of {1}] ({2})'
                print(fmt.format(pattern.format(num_of_successes, num_of_checks, result_message), color_code))
            else:
                sys.stdout.write(fmt.format(number_to_show, 'BLACK0', 'BOLD') + str(file) + " hardlink -> skipped\n")

        suggestion = suggest_method(file_infos)
        while True:
            user_input = ''
            if args.autoskip < 2 and not (suggestion == 'S' and args.autoskip > 0):
                user_input = input(fmt.format('Choose file number or [S]kip/co[M]bine/[A]uto [/S/M/A] ({0}) '.format(suggestion), 'INVERT'))

            if user_input == '':
                user_input = suggestion
            if re.match('^[0-9]+$', user_input):
                src_path = file_infos[int(user_input)].path
                ensure_dir_exists(destination_path)
                if args.symlink:
                    print('Symlinking:', destination_path, src_path)
                    os.symlink(os.path.abspath(src_path), destination_path + args.file_ext)
                else:
                    print('Copying:', destination_path, src_path)
                    shutil.copyfile(src_path, destination_path + args.file_ext)
                break
            elif user_input.upper() == 'M':
                print('Creating mixed file from multiple sources')
                ensure_dir_exists(destination_path)
                construct_file(file_infos, piece_length, destination_path + args.file_ext)
                break
            elif user_input.upper() == 'S':
                print('Skipping...')
                break
            elif user_input.upper() == 'A':
                print('Autoselect default option')
                args.autoskip = 2
            else:
                print('Mmmm?')

    else:
        if args.verbose > 0:
            print("Skipping file: %s (%d bytes)" % (file_name, file_length))
def add_incomplete_file_with_different_size(filepath, candidates):
    """Add partial downloads (e.g. 'file.ext.!qB') living next to the expected file."""
    parent = os.path.dirname(filepath)
    if not os.path.isdir(parent):
        return
    for filename in os.listdir(parent):
        curr_filepath = os.path.join(parent, filename)
        if curr_filepath.startswith(filepath) and curr_filepath not in candidates:
            candidates.append(curr_filepath)


def main():
    global args
    global save_path

    # Read the torrent metainfo
    with open(args.torrentfile, "rb") as torrent_file:
        metainfo = bencode.bdecode(torrent_file.read())
    info = metainfo['info']
    pieces = io.BytesIO(info['pieces'])

    info_hash = hashlib.sha1(bencode.bencode(info)).hexdigest()
    print("Hash:", info_hash)

    sizes = get_file_sizes(info)
    print('Sizes:', sizes)

    if args.autodetect_output_dir:
        save_path = load_qbittorrent_conf(info_hash)
        print(save_path)
    else:
        save_path = args.output_dir

    # Find candidate files
    print('Searching for candidate files...')
    files = get_possible_files(os.path.expanduser(args.root), sizes)

    # Check files one by one
    if 'files' in info:
        files_sizes_array = [f['length'] for f in info['files']]
        for file_idx, f in enumerate(info['files']):
            guess_file(f, file_idx, files, pieces, info['piece length'], files_sizes_array, info['name'])
    else:
        files_sizes_array = [info['length']]
        guess_file(info, 0, files, pieces, info['piece length'], files_sizes_array, '')


if __name__ == "__main__":
    from argparse import ArgumentParser

    parser = ArgumentParser(description='Recreate download directory for .torrent file from fully and partially downloaded file(s).')
    parser.add_argument('torrentfile', metavar='torrentFile', help='.torrent file to analyze')
    parser.add_argument('root', metavar='root', help='Directory to recursively search files in')
    parser.add_argument('--symlink', action='store_true',
                        help='Use symlinks instead of copying files. '
                             '(Caution: You may lose your data if files are not actually identical.)')
    parser.add_argument('--minsize', type=int, default=MINIMUM_FILESIZE_TO_SEARCH,
                        help='Minimum file size in bytes to be recoverable. Default: ' + str(MINIMUM_FILESIZE_TO_SEARCH))
    parser.add_argument('-v', '--verbose', action='count', default=0,
                        help='Be verbose (multiple levels: use -vv or -vvv to see more)')
    parser.add_argument('-o', '--output_dir', dest='output_dir',
                        help='Output directory where to place recreated files')
    parser.add_argument('-a', '--autodetect_output_dir', dest='autodetect_output_dir', action='store_true',
                        help='Autodetect output directory (only qBittorrent currently supported)')
    parser.add_argument('-c', '--use_color', dest='use_color', action='store_true', help='Use ANSI colors in the output')
    parser.add_argument('-s', '--autoskip', dest='autoskip', action='count', default=0,
                        help="Don't ask questions if there is only one viable choice. Use -ss for even more automation")
    parser.add_argument('--fileext', dest='file_ext', default='',
                        help='Extension to be added to output files (e.g. .!qB for incomplete qBittorrent files)')
    parser.add_argument('--version', action='version', version=VERSION)
    args = parser.parse_args()

    if args.use_color:
        fmt = AnsiFormatter()
    else:
        fmt = BaseFormatter()

    main()
--------------------------------------------------------------------------------