├── requirements.txt
├── .gitignore
├── setup.py
├── README.md
└── torram

--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
bencode.py

--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
*.py[cod]

# C extensions
*.so

# Packages
*.egg
*.egg-info
dist
build
eggs
parts
bin
var
sdist
develop-eggs
.installed.cfg
lib
lib64

# Installer logs
pip-log.txt

# Unit test / coverage reports
.coverage
.tox
nosetests.xml

# Translations
*.mo

# Mr Developer
.mr.developer.cfg
.project
.pydevproject

--------------------------------------------------------------------------------
/setup.py:
--------------------------------------------------------------------------------
from setuptools import setup

setup(
    name="torram",
    version="1.0.1",
    install_requires=['bencode.py'],
    scripts=['torram'],

    # Metadata
    author="Volodymyr Buell (Buiel)",
    author_email="vbuell@gmail.com",
    url="https://github.com/vbuell/torrent-upstart",
    description=("Utility that recreates a torrent download folder "
                 "from fully and partially downloaded files."),
    # license="APL2",
)

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# torram

torram (formerly torrent-upstart) is a utility that recreates a torrent
download directory from fully and partially downloaded file(s). If several
partially downloaded sources of the same incomplete torrent are found, it
merges the downloaded chunks from all of them into the target files.

## Use cases

There are two major cases where this utility can save you traffic or help you
recover files:

#### The "hey, where are all the seeders?" situation

This is the most common situation: you have several torrents containing the
same file(s), but for some reason these torrents are dead, so one source is
75% done and another is 85% done... There is a good chance that you already
have enough data to combine the downloaded blocks from incomplete file #1 with
the downloaded blocks from file #2 and end up with a complete file. The more
sources you have, the better the result.

> :warning: After files are combined, do a "Force check" in your torrent
> client. Otherwise the torrent client won't know that the files were changed.

**Note**: I have had good results recovering some extremely rare artifacts by
merging partially downloaded files obtained from different p2p sources: a
torrent client and an mlDonkey temp directory. In that case just make sure
that both files are children of the *root* directory (see the Usage section).

#### The "I don't want to download it twice" situation

It's a good idea to scan your hard drive with this utility before starting a
download, so that already-downloaded files can be reused.

> :warning: Again, make sure that you "Force check" the torrent after you run
> this tool...

**Note**: Because of the way files are split into pieces in a .torrent, it is
sometimes impossible to verify the hash of a file's first and last piece:
pieces are hashed across file boundaries, so those pieces may also contain
data from neighbouring files. After a "Force check" you may therefore see the
file at 99% even though it is actually 100% complete.
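To make the arithmetic behind this note concrete, here is a minimal sketch
(the function name and the numbers are made up for illustration; torram does
this bookkeeping internally):

```python
def fully_contained_pieces(file_offset, file_length, piece_length):
    """Indices of the pieces that lie entirely inside a single file.

    Pieces that straddle the file's first or last byte also cover data from
    neighbouring files, so they can't be verified from this file alone.
    """
    first = -(-file_offset // piece_length)             # ceil: first piece starting inside the file
    last = (file_offset + file_length) // piece_length  # exclusive: last piece ending inside the file
    return range(first, last)

# A 1000-byte file at global offset 300 in a torrent with 256-byte pieces:
# piece 1 straddles the file's start and piece 5 straddles its end,
# so only pieces 2-4 can be verified.
print(list(fully_contained_pieces(300, 1000, 256)))  # -> [2, 3, 4]
```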
## How does it work

A .torrent file describes one or several files, each defined as a record of
fields (file name, file size, and so on), plus error-detection information in
the form of a SHA-1 hash for every fixed-size piece of the payload. This
error-detection part is what allows us to do the trick: find pieces on your
filesystem whose hashes match the ones in the .torrent and generate the output
file from those pieces.

So the algorithm is:

* scan the *root* directory (the last positional argument) recursively and
  find potential candidates for each file of the .torrent using two simple
  matching rules: the file should either be named as expected or have the
  expected file size
* split each candidate file into pieces and calculate the hash of every piece
  (see the sketch after this list) in order to find the blocks that can be
  copied into the destination file
* if multiple candidates with reusable blocks are found, prompt the user to
  choose one of these methods: merge from multiple files (the default), use
  one of the files without merging, or skip the file
* based on the user's selection, create the file (or skip creation) and place
  it in the output directory (`-o` argument)
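As an illustration of the verification step, here is a minimal sketch of piece
matching, simplified to a single-file torrent (the function name is
hypothetical; the real tool additionally maps pieces onto the files of a
multi-file torrent):

```python
import hashlib

def good_pieces(candidate_path, pieces, piece_length):
    """Return one bool per piece: does the candidate's data hash correctly?

    `pieces` is the raw 'pieces' value from the .torrent info dict:
    a concatenation of 20-byte SHA-1 digests, one per piece.
    """
    results = []
    with open(candidate_path, 'rb') as f:
        for i in range(len(pieces) // 20):
            expected = pieces[i * 20:(i + 1) * 20]
            data = f.read(piece_length)  # the last piece may be shorter
            results.append(hashlib.sha1(data).digest() == expected)
    return results
```

Pieces marked `False` in one candidate can then be filled in from any
candidate where the same index is `True`, which is exactly the merge operation
described above.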
## Features

* Combine several partially downloaded sources into one (yes, it does work!)
* Autodetect the output directory (qBittorrent only for now)
* Use symlinks instead of copying files (danger... danger... danger...)
* Manual and automatic modes (several levels of automation; see the `-s` and
  `-ss` switches)

## Requirements

* Python >= 3.4
* bencode.py (https://pypi.python.org/pypi/bencode)
* PyQt4 (only if you use qBittorrent output directory autodetection, switch
  `--autodetect_output_dir`)

## Usage

```
> ./torram --help
usage: torram [-h] [--symlink] [--minsize MINSIZE] [-v] [-o OUTPUT_DIR] [-a]
              [-c] [-s] [--fileext FILE_EXT] [--version]
              torrentFile root

Recreate download directory for .torrent file from fully and partially
downloaded file(s).

positional arguments:
  torrentFile           .torrent file to analyze
  root                  Directory to recursively search files in

optional arguments:
  -h, --help            show this help message and exit
  --symlink             Use symlinks instead of copying files. (Caution: You
                        may lose your data if files are not actually
                        identical.)
  --minsize MINSIZE     Minimum file size in bytes to be recoverable.
                        Default: 1048576
  -v, --verbose         Be verbose (multiple levels: use -vv or -vvv to see
                        more)
  -o OUTPUT_DIR, --output_dir OUTPUT_DIR
                        Output directory where to place recreated files
  -a, --autodetect_output_dir
                        Autodetect output directory (only qBittorrent
                        currently supported)
  -c, --use_color       Use ANSI colors in the output
  -s, --autoskip        Don't ask questions if there is only one viable
                        choice. Use -ss for even more automation
  --fileext FILE_EXT    Extension to be added to output files (e.g. .!qB for
                        incomplete qBittorrent files)
  --version             show program's version number and exit
```
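For example, a typical session might look like this (the .torrent name and
the paths are hypothetical):

```
> ./torram rare-album.torrent ~/downloads -o ~/restored --fileext .!qB
```

After the run, add the torrent to your client with `~/restored` as its
download location and do a "Force check".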
--------------------------------------------------------------------------------
/torram:
--------------------------------------------------------------------------------
#!/usr/bin/env python3
import hashlib
import io
import os
import re
import shutil
import stat
import sys
import tempfile

import bencode


DELUGE_DIR = '~/.config/deluge/state'
QBITTORRENT_RESUME_CONF = '~/.config/qBittorrent/qBittorrent-resume.conf'
FILES_DIR = '~'
MINIMUM_FILESIZE_TO_SEARCH = 1024 * 1024

VERSION = '1.0.1'  # keep in sync with setup.py


class AnsiFormatter(object):
    CODES = {'RED': "\033[31m",
             'BOLD': "\033[1m",
             'YELLOW': "\033[33m",
             'INVERT': "\033[40m\033[37m",
             'GREEN': "\033[32m",
             'BLUE': "\033[34m",
             'BLACK2': "\033[90m",
             'BLACK1': "\033[37m",
             'BLACK0': "\033[97m",
             }

    def format(self, txt, *code):
        return ''.join([self.CODES[c] for c in code]) + txt + "\033[0m"


class BaseFormatter(object):
    def format(self, txt, *code):
        return txt


class FileInfo:
    """Per-candidate state: path, piece check results, and piece grid offset."""
    def __init__(self):
        self.path = None
        self.chunks = []
        self.num_of_good_chunks = 0
        self.start_offset = None
        self.isOriginal = False
        self.isHardlink = False


def suggest_method(file_infos):
    """Suggest the default action: [S]kip, [M]erge, or the index of a single candidate."""
    fullest_file_idx = None
    fullest_file_rate = 0
    downloaded_file_rate = 0
    mixed_pieces = []

    # Calculate the success rate per candidate
    for idx, file_info in enumerate(file_infos):
        if len(file_info.chunks) == 0:
            return 'S'
        num_of_success = file_info.num_of_good_chunks
        rate = float(num_of_success) / len(file_info.chunks)
        if file_info.isOriginal:
            downloaded_file_rate = rate
        if rate > fullest_file_rate:
            fullest_file_rate = rate
            fullest_file_idx = idx

    # Calculate the success rate of a merge of all candidates:
    # a piece is recoverable if at least one candidate has a good copy of it
    if len(file_infos) > 1:
        for piece_results in zip(*[fi.chunks for fi in file_infos]):
            mixed_pieces.append(any(piece_results))
        num_of_success = sum(mixed_pieces)

        if float(num_of_success) / len(mixed_pieces) > fullest_file_rate:
            pattern = 'Got {0} of {1} good pieces from {2} files.'
            print(fmt.format(pattern.format(num_of_success, len(mixed_pieces), len(file_infos)), 'RED', 'BOLD'))
            return 'M'

    if downloaded_file_rate >= fullest_file_rate:
        return 'S'
    return str(fullest_file_idx)


def get_similarity_rate_and_color(success_blocks, all_blocks):
    if all_blocks == 0:
        return 'BLACK2', 'Bad'

    rate = float(success_blocks) / all_blocks
    if rate > 0.9:
        return 'GREEN', 'Excellent'
    if rate > 0.5:
        return 'YELLOW', 'Good'
    if rate > 0.01:
        return 'YELLOW', 'Poor'
    return 'BLACK2', 'Bad'


def get_file_sizes(info):
    global args
    if 'files' in info:  # file lengths from a multi-file torrent
        return [fi['length'] for fi in info['files'] if fi['length'] > args.minsize]
    else:  # a single-file torrent
        return [info['length']]


def get_possible_files(rootdir, sizes):
    """Find candidate files in the directory tree, indexed by file size."""
    filesizes = {}
    # Build up a dict with the file size as key and a list of file paths as value.
    for path, dirs, files in os.walk(rootdir):
        for filename in files:
            filepath = os.path.join(path, filename)
            filesize = os.lstat(filepath).st_size
            if args.verbose >= 2:
                print("File registered: %s (%d bytes)" % (filepath, filesize))
            if filesize in sizes:
                filesizes.setdefault(filesize, []).append(filepath)
    return filesizes


def remove_hard_links(files):
    """Keep only one path per inode, so hardlinked copies are not checked twice."""
    inode_to_filename = {}
    for filename in files:
        stat_info = os.stat(filename)
        inode_to_filename[stat_info[stat.ST_INO]] = filename
    return inode_to_filename.values()


def load_qbittorrent_conf(info_hash):
    """Load the torrent's save path from the qBittorrent settings."""
    from PyQt4 import QtCore

    settings = QtCore.QSettings(os.path.expanduser(QBITTORRENT_RESUME_CONF), QtCore.QSettings.IniFormat)

    root = settings.value('torrents').toPyObject()
    record = root[QtCore.QString(info_hash)]
    path = record[QtCore.QString('save_path')]

    if isinstance(path, QtCore.QString):
        path = str(path)

    return path


def check_file_chunk(expected_hash, offset, length, filename):
    """Check whether the file's data at (offset, length) hashes to expected_hash."""
    with open(filename, "rb") as sfile:
        sfile.seek(offset)
        piece = sfile.read(length)

    piece_hash = hashlib.sha1(piece).digest()
    return piece_hash == expected_hash


def get_chunk(filesizes, global_offset):
    """
    Translate an offset in the torrent's continuous payload into a
    (file index, offset within that file) pair.

    >>> get_chunk([], 0)
    (0, 0)
    >>> get_chunk([100], 0)
    (0, 0)
    >>> get_chunk([100], 100)
    (1, 0)
    >>> get_chunk([50, 50, 30], 100)
    (2, 0)
    """
    file_offset = 0
    file_idx = 0

    # Find the file containing the global offset
    for filesize in filesizes:
        if file_offset + filesize > global_offset:
            break

        file_offset += filesize
        file_idx += 1

    return (file_idx, global_offset - file_offset)


def construct_file(file_infos, piece_length, dest_filename):
    # Start from the candidate with the most good pieces...
    max_num_of_good_chunks = max(fi.num_of_good_chunks for fi in file_infos)
    best_file = next(fi for fi in file_infos if fi.num_of_good_chunks == max_num_of_good_chunks)

    f = tempfile.NamedTemporaryFile(mode='w+b', delete=False)
    file_name = f.name
    print("Temporary file:", file_name)
    f.close()
    print("Copy file:", best_file.path)
    shutil.copy(best_file.path, file_name)

    # ...then overwrite each piece with the first candidate that has a good copy of it
    if len(file_infos) > 1:
        with open(file_name, 'r+b') as f:
            for chunk_idx, chunks_merged in enumerate(zip(*[fi.chunks for fi in file_infos])):
                for i, piece_is_good in enumerate(chunks_merged):
                    if piece_is_good:
                        file_info = file_infos[i]
                        chunk_offset = file_info.start_offset + chunk_idx * piece_length
                        with open(file_info.path, 'rb') as src:
                            src.seek(chunk_offset)
                            f.seek(chunk_offset)
                            f.write(src.read(piece_length))
                        break

    print("Move temporary file to", dest_filename)
    shutil.move(file_name, dest_filename)


def ensure_dir_exists(f):
    d = os.path.dirname(f)
    if not os.path.exists(d):
        os.makedirs(d)
def guess_file(torrent_file_info, file_idx, files, pieces, piece_length, files_sizes_array, basedir):
    global args
    global fmt
    global save_path

    if 'path' in torrent_file_info:
        file_name = os.path.join(*torrent_file_info['path'])
    else:
        file_name = torrent_file_info['name']

    file_length = torrent_file_info['length']
    destination_path = os.path.join(save_path, basedir, file_name)

    if file_length in files:
        print("Processing file: " + fmt.format(str(file_name), 'BLUE', 'BOLD'))
        file_infos = []

        add_incomplete_file_with_different_size(destination_path, files[file_length])

        if args.autoskip and len(files[file_length]) < 2 and \
                any(file.startswith(destination_path) for file in files[file_length]):
            print("Only one file found. Thus, skipping...")
            return

        uniq_filenames = remove_hard_links(files[file_length])

        for file_number, file in enumerate(files[file_length]):
            file_info = FileInfo()
            file_info.path = file

            if file.startswith(destination_path):
                number_to_show = ' * '
                file_info.isOriginal = True
            else:
                number_to_show = ' ' + str(file_number) + ' '
            if file in uniq_filenames:
                pieces.seek(0)
                sys.stdout.write(fmt.format(number_to_show, 'BLACK0', 'BOLD') + str(file))

                num_of_checks = 0
                num_of_successes = 0
                offset = 0
                pieces_list = []
                while True:
                    expected_hash = pieces.read(20)
                    if not expected_hash:
                        break
                    idx, file_offset = get_chunk(files_sizes_array, offset * piece_length)

                    if idx == file_idx:
                        # Remember where the first piece of this file starts
                        # (test against None: an offset of 0 is a valid value)
                        if file_info.start_offset is None:
                            file_info.start_offset = file_offset
                        num_of_checks += 1
                        hash_result = check_file_chunk(expected_hash, file_offset, piece_length, file)
                        pieces_list.append(hash_result)
                        if hash_result:
                            num_of_successes += 1

                    offset += 1

                file_info.chunks = pieces_list
                file_info.num_of_good_chunks = num_of_successes
                file_infos.append(file_info)

                color_code, result_message = get_similarity_rate_and_color(num_of_successes, num_of_checks)
                pattern = ' [{0} of {1}] ({2})'
                print(fmt.format(pattern.format(num_of_successes, num_of_checks, result_message), color_code))
            else:
                sys.stdout.write(fmt.format(number_to_show, 'BLACK0', 'BOLD') + str(file) + " hardlink -> skipped\n")

        suggestion = suggest_method(file_infos)
        while True:
            user_input = ''
            if args.autoskip < 2 and not (suggestion == 'S' and args.autoskip > 0):
                user_input = input(fmt.format('Choose file number or [S]kip/co[M]bine/[A]uto [/S/M/A] ({0}) '.format(suggestion), 'INVERT'))

            if user_input == '':
                user_input = suggestion
            if re.match('^[0-9]+$', user_input):
                src_path = file_infos[int(user_input)].path
                ensure_dir_exists(destination_path)
                if args.symlink:
                    print('Symlinking:', destination_path, src_path)
                    os.symlink(os.path.abspath(src_path), destination_path + args.file_ext)
                else:
                    print('Copying:', destination_path, src_path)
                    shutil.copyfile(src_path, destination_path + args.file_ext)
                break
            elif user_input.upper() == 'M':
                print('Creating mixed file from multiple sources')
                ensure_dir_exists(destination_path)
                construct_file(file_infos, piece_length, destination_path + args.file_ext)
                break
            elif user_input.upper() == 'S':
                print('Skipping...')
                break
            elif user_input.upper() == 'A':
                print('Autoselect default option')
                args.autoskip = 2
            else:
                print('Mmmm?')

    else:
        if args.verbose > 0:
            print("Skipping file: %s (%d bytes)" % (file_name, file_length))
def add_incomplete_file_with_different_size(filepath, candidates):
    """Add partial downloads (e.g. 'file.ext.!qB') living next to the expected file."""
    parent = os.path.dirname(filepath)
    if not os.path.isdir(parent):
        return
    for filename in os.listdir(parent):
        curr_filepath = os.path.join(parent, filename)
        if curr_filepath.startswith(filepath) and curr_filepath not in candidates:
            candidates.append(curr_filepath)


def main():
    global args
    global save_path

    # Read the torrent metainfo
    with open(args.torrentfile, "rb") as torrent_file:
        metainfo = bencode.bdecode(torrent_file.read())
    info = metainfo['info']
    pieces = io.BytesIO(info['pieces'])

    info_hash = hashlib.sha1(bencode.bencode(info)).hexdigest()
    print("Hash:", info_hash)

    sizes = get_file_sizes(info)
    print('Sizes:', sizes)

    if args.autodetect_output_dir:
        save_path = load_qbittorrent_conf(info_hash)
        print(save_path)
    else:
        save_path = args.output_dir

    # Find candidate files
    print('Searching for candidate files...')
    files = get_possible_files(os.path.expanduser(args.root), sizes)

    # Check files one by one
    if 'files' in info:
        files_sizes_array = [f['length'] for f in info['files']]
        for file_idx, f in enumerate(info['files']):
            guess_file(f, file_idx, files, pieces, info['piece length'], files_sizes_array, info['name'])
    else:
        files_sizes_array = [info['length']]
        guess_file(info, 0, files, pieces, info['piece length'], files_sizes_array, '')


if __name__ == "__main__":
    from argparse import ArgumentParser

    parser = ArgumentParser(description='Recreate download directory for .torrent file from fully and partially downloaded file(s).')
    parser.add_argument('torrentfile', metavar='torrentFile', help='.torrent file to analyze')
    parser.add_argument('root', metavar='root', help='Directory to recursively search files in')
    parser.add_argument('--symlink', action='store_true',
                        help='Use symlinks instead of copying files. '
                             '(Caution: You may lose your data if files are not actually identical.)')
    parser.add_argument('--minsize', type=int, default=MINIMUM_FILESIZE_TO_SEARCH,
                        help='Minimum file size in bytes to be recoverable. Default: ' + str(MINIMUM_FILESIZE_TO_SEARCH))
    parser.add_argument('-v', '--verbose', action='count', default=0,
                        help='Be verbose (multiple levels: use -vv or -vvv to see more)')
    parser.add_argument('-o', '--output_dir', dest='output_dir',
                        help='Output directory where to place recreated files')
    parser.add_argument('-a', '--autodetect_output_dir', dest='autodetect_output_dir', action='store_true',
                        help='Autodetect output directory (only qBittorrent currently supported)')
    parser.add_argument('-c', '--use_color', dest='use_color', action='store_true', help='Use ANSI colors in the output')
    parser.add_argument('-s', '--autoskip', dest='autoskip', action='count', default=0,
                        help="Don't ask questions if there is only one viable choice. Use -ss for even more automation")
    parser.add_argument('--fileext', dest='file_ext', default='',
                        help='Extension to be added to output files (e.g. .!qB for incomplete qBittorrent files)')
    parser.add_argument('--version', action='version', version=VERSION)
    args = parser.parse_args()

    if args.use_color:
        fmt = AnsiFormatter()
    else:
        fmt = BaseFormatter()

    main()
--------------------------------------------------------------------------------