├── .gitignore ├── LICENSE-MIT.txt ├── README.md ├── bhlmake.py └── bhlreco.py /.gitignore: -------------------------------------------------------------------------------- 1 | misc_tools/ 2 | *.pyc 3 | *.zip 4 | *.dat 5 | *.bhl 6 | *.db3 7 | note.txt -------------------------------------------------------------------------------- /LICENSE-MIT.txt: -------------------------------------------------------------------------------- 1 | Copyright (c) 2017 Marco Pontello 2 | 3 | Permission is hereby granted, free of charge, to any person obtaining a copy 4 | of this software and associated documentation files (the "Software"), to deal 5 | in the Software without restriction, including without limitation the rights 6 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 7 | copies of the Software, and to permit persons to whom the Software is 8 | furnished to do so, subject to the following conditions: 9 | 10 | The above copyright notice and this permission notice shall be included in all 11 | copies or substantial portions of the Software. 12 | 13 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 14 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 15 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 16 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 17 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 18 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 19 | SOFTWARE. -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # BlockHashLoc 2 | 3 | The purpose of BlockHashLoc is to enable the recovery of files after total loss of File System structures, or without even knowing what FS was used in the first place. 
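The core mechanism (detailed below) can be sketched in a few lines of Python. This is a simplified illustration, not the actual tools: it ignores the last partial block (which the real BHL format stores separately, zlib-compressed) and scans the image only at block-aligned offsets, while the real tool supports a configurable scan step and offset:

```python
import hashlib

def block_hashes(path, blocksize=512):
    """SHA-256 of every full fixed-size block of a file."""
    hashes = []
    with open(path, "rb") as f:
        while True:
            buf = f.read(blocksize)
            if len(buf) < blocksize:
                break  # last partial block is handled separately by BHL
            hashes.append(hashlib.sha256(buf).digest())
    return hashes

def locate_blocks(imgpath, wanted, blocksize=512):
    """Scan an image and map each known block hash to its offset."""
    found = {}
    wanted = set(wanted)
    with open(imgpath, "rb") as f:
        pos = 0
        while True:
            buf = f.read(blocksize)
            if len(buf) < blocksize:
                break
            digest = hashlib.sha256(buf).digest()
            if digest in wanted:
                found[digest] = pos
            pos += blocksize
    return found
```

With the offsets in hand, the original file is rebuilt simply by writing each located block at position `block_number * blocksize` of the output file.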
4 | 5 | The way it can recover a given file is by keeping a (small) parallel BHL file with a list of crypto-hashes of all the blocks (of selectable size) that compose it. So it's then possible to read blocks from a disk image/volume, calculate their hashes, compare them with the saved ones and rebuild the original file. 6 | 7 | With adequately sized blocks (512 bytes, 4KB, etc., depending on the media and File System), this lets one recover a file regardless of the FS used, the FS integrity, or the fragmentation level. 8 | 9 | This project is related to [SeqBox](https://github.com/MarcoPon/SeqBox). The main differences are: 10 | 11 | - SeqBox creates a stand-alone file container with the above-listed recovery characteristics. 12 | 13 | - BHL achieves the same effect with a (small) parallel file that can be stored separately (on other media, or in the cloud), or alongside the original as a SeqBox file (so that it can be recovered too, as the first step); it can thus be used to add a degree of recoverability to existing files. 14 | 15 | **N.B.** 16 | 17 | The tools are still in beta and certainly not speed-optimized, but they are already functional and the BHL file format is considered final. 18 | 19 | ## Demo tour 20 | 21 | BlockHashLoc is composed of two separate tools: 22 | - BHLMake: creates BHL files with block hashes and metadata 23 | - BHLReco: recovers files by searching for the block hashes contained in a set of BHL files 24 | 25 | There are in some cases many parameters, but the defaults are sensible, so it's generally pretty simple. 26 | 27 | Here's a practical example. Let's see how 2 photos can be recovered from a fragmented floppy disk that has lost its FAT (and any other system section).
The 2 JPEGs weigh about 450KB and 680KB: 28 | 29 | ![Manu01](http://i.imgur.com/QKxgT5r.jpg) ![Manu02](http://i.imgur.com/jfQLlx1.jpg) 30 | 31 | We start by creating the BHL files, and then proceed to test them to make sure they are all right: 32 | 33 | ``` 34 | c:\t>bhlmake *.jpg 35 | creating file 'Manu01.jpg.bhl'... 36 | BHL file size: 29582 - blocks: 913 - ratio: 6.3% 37 | creating file 'Manu02.jpg.bhl'... 38 | BHL file size: 43936 - blocks: 1363 - ratio: 6.3% 39 | 40 | c:\t>bhlreco -t -bhl *.bhl 41 | reading BHL file 'Manu01.jpg.bhl'... 42 | reading BHL file 'Manu02.jpg.bhl'... 43 | BHL file(s) OK! 44 | 45 | ``` 46 | 47 | Now we put both JPEGs in a floppy disk image that has gone through various cycles of file updates and deletions. At this point the BHL files could be kept somewhere else (another disk, some online storage, etc.), or put in the same disk image after being encoded in one or more [SeqBox](https://github.com/MarcoPon/SeqBox) recoverable container(s) - because, obviously, there's no use in making BHL files if they can be lost too. 48 | As a result the data is laid out like this: 49 | 50 | ![Disk Layout](http://i.imgur.com/3MUOAjk.png) 51 | 52 | The photos are in green, and the two SBX files in blue. 53 | Then with a hex editor we zap the first system sectors and the FAT (in red), making the disk image unreadable! 54 | Time for recovery! 55 | 56 | We start with the free (GPL v2+) [PhotoRec](http://www.cgsecurity.org/wiki/PhotoRec), which is the go-to tool for this kind of job. Parameters are set to "Paranoid : YES (Brute force enabled)" & "Keep corrupted files : Yes", to search the entire data area. 57 | As the files are fragmented, we know we can't expect miracles. The starting sectors of the photos will surely be found, but as soon as the first contiguous fragment ends, it's anyone's guess. 58 | 59 | ![PhotoRec results](http://i.imgur.com/y9phKLX.png) 60 | 61 | As expected, something has been recovered.
But the 2 file sizes are off (32KB and 340KB). The very first parts of the photos are OK, but then they degrade quickly as other random blocks of data were mixed in. We have all seen JPEGs ending up like this: 62 | 63 | ![Manu01](http://i.imgur.com/bCtYJpW.jpg) ![Manu02](http://i.imgur.com/EmOid42.jpg) 64 | 65 | Other popular recovery tools lead to the same results. It's not anyone's fault: it's just not possible to know how the various fragments are concatenated without an index or some kind of list (there are approaches based on file-type validators that can, at least in some cases, differentiate between spurious and *valid* blocks, but that's beside the point). 66 | 67 | But having the BHL files at hand, it's a different story. Each of the blocks referenced in the BHL files can't be fragmented, and they can all be located anywhere in the disk just by calculating the hash of every block until all the matching ones are found. 68 | 69 | So, the first thing we need is to obtain the BHL files, either by getting them from some alternate storage, or by recovering the [SeqBox](https://github.com/MarcoPon/SeqBox) containers from the same disk image and extracting them. 70 | 71 | Then we can run BHLReco and begin the scanning process: 72 | 73 | ``` 74 | c:\t>bhlreco disk.IMA -bhl *.bhl 75 | creating ':memory:' database... 76 | reading BHL file 'Manu01.jpg.bhl'... 77 | updating db... 78 | reading BHL file 'Manu02.jpg.bhl'... 79 | updating db... 80 | scan step: 512 81 | scanning file 'disk.IMA'... 82 | 90.4% - tot: 2274 - found: 2274 - 40.65MB/s 83 | scan completed. 84 | creating file 'Manu01.jpg'... 85 | hash match! 86 | creating file 'Manu02.jpg'... 87 | hash match! 88 | 89 | files restored: 2 - with errors: 0 - files missing: 0 90 | ``` 91 | 92 | All files have been recovered, with no errors! 93 | Time for a quick visual check: 94 | 95 | ![Manu01](http://i.imgur.com/qEB9wBQ.jpg) ![Manu02](http://i.imgur.com/s6spyFq.jpg) 96 | 97 | N.B.
Here's a [7-Zip archive](http://mark0.net/download/bhldemo-diskimage.7z) with the disk image and the 2 BHL files used in the demo (1.2MB). 98 | 99 | 100 | 101 | ## Tech spec 102 | 103 | Byte order: Big Endian 104 | 105 | Hash: SHA-256 106 | 107 | ### BHL file structure 108 | 109 | | section | desc | note | 110 | | ---------- | ------------------------------------ | --------- | 111 | | Header | Signature & version | | 112 | | Metadata | Misc info | | 113 | | Hash | Blocks hash list & final hash | | 114 | | Last block | zlib compressed last block remainder | if needed | 115 | 116 | 117 | ### Header 118 | 119 | | pos | to pos | size | desc | 120 | |---- | --- | ---- | --------------------------------- | 121 | | 0 | 12 | 13 | Signature = 'BlockHashLoc' + 0x1a | 122 | | 13 | 13 | 1 | Version byte | 123 | | 14 | 17 | 4 | Block size | 124 | | 18 | 25 | 8 | File size | 125 | 126 | ### Metadata 127 | 128 | | pos | to pos | size | desc | 129 | |---- | ------ | ---- | --------------------- | 130 | | 26 | 29 | 4 | Metadata section size | 131 | | 30 | var | var | Encoded metadata list | 132 | 133 | ### Hash 134 | 135 | | pos | to pos | size | desc | 136 | |---- | ------ | ---- | --------------------------------- | 137 | | var | var | 32 | 1st block hash | 138 | | ... | ... | 32 | ... | 139 | | var | var | 32 | Last block hash | 140 | | var | var | 32 | Hash of all previous block hashes | 141 | 142 | 143 | ### Versions 144 | 145 | Currently the only version is 1. 146 | 147 | ### Metadata encoding 148 | 149 | | Bytes | Field | 150 | | ----- | ----- | 151 | | 3 | ID | 152 | | 1 | Len | 153 | | n | Data | 154 | 155 | #### IDs 156 | 157 | | ID | Desc | 158 | | --- | --- | 159 | | FNM | filename (utf-8) | 160 | | FDT | date & time (8 bytes, seconds since epoch) | 161 | 162 | (other IDs may be added...)
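To make the layout above concrete, here is a minimal reader for the fixed part of a BHL file, written directly from the tables. It is a sketch for illustration (no error handling beyond the signature check); the actual parser lives in bhlreco.py:

```python
BHL_MAGIC = b"BlockHashLoc\x1a"

def read_bhl_header(f):
    """Parse the header and metadata of a BHL file (integers are big endian)."""
    if f.read(13) != BHL_MAGIC:                   # pos 0-12: signature
        raise ValueError("not a BHL file")
    version = f.read(1)[0]                        # pos 13: version byte
    blocksize = int.from_bytes(f.read(4), "big")  # pos 14-17: block size
    filesize = int.from_bytes(f.read(8), "big")   # pos 18-25: file size
    metasize = int.from_bytes(f.read(4), "big")   # pos 26-29: metadata size
    raw = f.read(metasize)                        # encoded metadata list
    meta, p = {}, 0
    while p + 4 <= len(raw):                      # 3-byte ID + 1-byte Len + Data
        mid, mlen = raw[p:p+3], raw[p+3]
        data = raw[p+4:p+4+mlen]
        p += 4 + mlen
        if mid == b"FNM":
            meta["filename"] = data.decode("utf-8")
        elif mid == b"FDT":
            meta["filedatetime"] = int.from_bytes(data, "big")
    return version, blocksize, filesize, meta
```

After the metadata section come one 32-byte SHA-256 digest per block, the hash of all the preceding digests, and (only when `filesize % blocksize != 0`) the zlib-compressed last block remainder.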
163 | 164 | 165 | ## Links 166 | 167 | - [BlockHashLoc home page](http://mark0.net/) 168 | - [BlockHashLoc GitHub repository](https://github.com/MarcoPon/BlockHashLoc) 169 | 170 | ## Credits 171 | 172 | The idea of collecting & scanning for block hashes was something I had considered while developing [SeqBox](https://github.com/MarcoPon/SeqBox), before settling on a stand-alone file container instead of the original file plus a parallel one. 173 | 174 | Then the concept resurfaced during a nice discussion on Slashdot with user JoeyRoxx, and after some consideration I decided to put some work into that too, seeing how the two approaches could both be useful (in different situations) and even complement each other nicely. 175 | 176 | ## Contacts 177 | 178 | If you need more info, want to get in touch, or donate: [Marco Pontello](http://mark0.net/contacts-e.html) 179 | 180 | **Bitcoin**: 1Mark1tF6QGj112F5d3fQALGf41YfzXEK3 181 | 182 | ![Qr-Code](http://mark0.net/images/qrcode.png) -------------------------------------------------------------------------------- /bhlmake.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | 3 | #-------------------------------------------------------------------------- 4 | # BHLMake - BlockHashLoc Maker 5 | # 6 | # Created: 04/05/2017 7 | # 8 | # Copyright (C) 2017 Marco Pontello - http://mark0.net/ 9 | # 10 | # Licence: 11 | # This program is free software: you can redistribute it and/or modify 12 | # it under the terms of the GNU Affero General Public License as 13 | # published by the Free Software Foundation, either version 3 of the 14 | # License, or (at your option) any later version. 15 | # 16 | # This program is distributed in the hope that it will be useful, 17 | # but WITHOUT ANY WARRANTY; without even the implied warranty of 18 | # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 19 | # GNU Affero General Public License for more details.
20 | # 21 | # You should have received a copy of the GNU Affero General Public License 22 | # along with this program. If not, see <http://www.gnu.org/licenses/>. 23 | # 24 | #-------------------------------------------------------------------------- 25 | 26 | import os 27 | import sys 28 | import hashlib 29 | import argparse 30 | from time import time 31 | import zlib 32 | import fnmatch 33 | 34 | PROGRAM_VER = "0.7.1b" 35 | BHL_VER = 1 36 | 37 | def get_cmdline(): 38 | """Evaluate command line parameters, usage & help.""" 39 | parser = argparse.ArgumentParser( 40 | description="create BHL file(s)", 41 | formatter_class=argparse.ArgumentDefaultsHelpFormatter, 42 | prefix_chars='-', fromfile_prefix_chars='@') 43 | parser.add_argument("-v", "--version", action='version', 44 | version='BlockHashLoc ' + 45 | 'Maker v%s - (C) 2017 by M.Pontello' % PROGRAM_VER) 46 | parser.add_argument("filename", action="store", nargs="+", 47 | help="file to process") 48 | parser.add_argument("-d", action="store", dest="destpath", 49 | help="destination path", default="", metavar="path") 50 | parser.add_argument("-b", "--blocksize", type=int, default=512, 51 | help="block size", metavar="n") 52 | parser.add_argument("-c", "--continue", action="store_true", default=False, 53 | help="continue on block errors", dest="cont") 54 | parser.add_argument("-r", "--recurse", action="store_true", default=False, 55 | help="recurse subdirs") 56 | res = parser.parse_args() 57 | return res 58 | 59 | 60 | def errexit(errlev=1, mess=""): 61 | """Display an error and exit.""" 62 | if mess != "": 63 | sys.stderr.write("%s: error: %s\n" % 64 | (os.path.split(sys.argv[0])[1], mess)) 65 | sys.exit(errlev) 66 | 67 | 68 | def buildBHL(filename, bhlfilename, blocksize): 69 | filesize = os.path.getsize(filename) 70 | fin = open(filename, "rb", buffering=1024*1024) 71 | print("creating file '%s'..."
% bhlfilename) 72 | open(bhlfilename, 'w').close() 73 | fout = open(bhlfilename, "wb", buffering=1024*1024) 74 | 75 | #write header 76 | fout.write(b"BlockHashLoc\x1a") 77 | fout.write(bytes([BHL_VER])) 78 | fout.write(blocksize.to_bytes(4, byteorder='big', signed=False)) 79 | fout.write(filesize.to_bytes(8, byteorder='big', signed=False)) 80 | 81 | #write metadata 82 | metadata = b"" 83 | bb = os.path.split(filename)[1].encode() 84 | bb = b"FNM" + bytes([len(bb)]) + bb 85 | metadata += bb 86 | bb = int(os.path.getmtime(filename)).to_bytes(8, byteorder='big') 87 | bb = b"FDT" + bytes([len(bb)]) + bb 88 | metadata += bb 89 | 90 | metadata = len(metadata).to_bytes(4, byteorder='big') + metadata 91 | fout.write(metadata) 92 | 93 | #read blocks and calc hashes 94 | globalhash = hashlib.sha256() 95 | blocksnum = 0 96 | ticks = 0 97 | updatetime = time() 98 | bufferz = b"" 99 | while True: 100 | buffer = fin.read(blocksize) 101 | if len(buffer) < blocksize: 102 | if len(buffer) == 0: 103 | break 104 | else: 105 | #compressed blob with last block remainder 106 | bufferz = zlib.compress(buffer, 9) 107 | blockhash = hashlib.sha256() 108 | blockhash.update(buffer) 109 | digest = blockhash.digest() 110 | globalhash.update(digest) 111 | fout.write(digest) 112 | blocksnum += 1 113 | 114 | #some progress update 115 | if time() > updatetime: 116 | print("%.1f%%" % (fin.tell()*100.0/filesize), " ", 117 | end="\r", flush=True) 118 | updatetime = time() + .1 119 | 120 | #write hash of hashes and block remainder (if present) 121 | fout.write(globalhash.digest()) 122 | if len(bufferz): 123 | fout.write(bufferz) 124 | 125 | fin.close() 126 | fout.close() 127 | 128 | #show stats about the file just created 129 | bhlfilesize = os.path.getsize(bhlfilename) 130 | overhead = bhlfilesize * 100 / filesize 131 | print(" BHL file size: %i - blocks: %i - ratio: %.1f%%" % 132 | (bhlfilesize, blocksnum, overhead)) 133 | 134 | 135 | def main(): 136 | 137 | cmdline = get_cmdline() 138 | blocksize = 
cmdline.blocksize 139 | 140 | #build list of files to process 141 | filenames = [] 142 | for filespec in cmdline.filename: 143 | filepath, filename = os.path.split(filespec) 144 | if not filepath: 145 | filepath = "." 146 | if not filename: 147 | filename = "*" 148 | for wroot, wdirs, wfiles in os.walk(filepath): 149 | if not cmdline.recurse: 150 | wdirs[:] = [] 151 | for fn in fnmatch.filter(wfiles, filename): 152 | filenames.append(os.path.join(wroot, fn)) 153 | filenames = sorted(set(filenames), key=os.path.getsize) 154 | 155 | bhlok = 0 156 | bhlerr = 0 157 | 158 | for filename in filenames: 159 | if not os.path.exists(filename): 160 | errexit(1, "file '%s' not found" % (filename)) 161 | 162 | destpath = cmdline.destpath 163 | if not destpath: 164 | bhlfilename = os.path.split(filename)[1] + ".bhl" 165 | else: 166 | if not os.path.isdir(destpath): 167 | destpath = os.path.split(filename)[0] 168 | bhlfilename = os.path.join(destpath, 169 | os.path.split(filename)[1] + ".bhl") 170 | 171 | try: 172 | buildBHL(filename, bhlfilename, blocksize) 173 | bhlok += 1 174 | except Exception: 175 | if cmdline.cont: 176 | bhlerr += 1 177 | print(" warning: can't create BHL file!") 178 | else: 179 | errexit(1, "can't create BHL file '%s'" % (bhlfilename)) 180 | 181 | if len(cmdline.filename) > 1 and bhlerr > 0: 182 | print("\nBHL files created: %i - errors: %i" % (bhlok, bhlerr)) 183 | 184 | 185 | if __name__ == '__main__': 186 | main() 187 | -------------------------------------------------------------------------------- /bhlreco.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | 3 | #-------------------------------------------------------------------------- 4 | # BHLReco - BlockHashLoc Recover 5 | # 6 | # Created: 06/05/2017 7 | # 8 | # Copyright (C) 2017 Marco Pontello - http://mark0.net/ 9 | # 10 | # Licence: 11 | # This program is free software: you can redistribute it and/or modify 12 | # it under the terms of the
GNU Affero General Public License as 13 | # published by the Free Software Foundation, either version 3 of the 14 | # License, or (at your option) any later version. 15 | # 16 | # This program is distributed in the hope that it will be useful, 17 | # but WITHOUT ANY WARRANTY; without even the implied warranty of 18 | # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 19 | # GNU Affero General Public License for more details. 20 | # 21 | # You should have received a copy of the GNU Affero General Public License 22 | # along with this program. If not, see <http://www.gnu.org/licenses/>. 23 | # 24 | #-------------------------------------------------------------------------- 25 | 26 | import os 27 | import sys 28 | import hashlib 29 | import argparse 30 | import time 31 | import zlib 32 | import sqlite3 33 | import glob 34 | 35 | PROGRAM_VER = "0.7.17b" 36 | BHL_VER = 1 37 | BHL_MAGIC = b"BlockHashLoc\x1a" 38 | 39 | def get_cmdline(): 40 | """Evaluate command line parameters, usage & help.""" 41 | parser = argparse.ArgumentParser( 42 | description="recover files using the info stored in BHL file(s)", 43 | formatter_class=argparse.ArgumentDefaultsHelpFormatter, 44 | prefix_chars='-+', fromfile_prefix_chars='@') 45 | parser.add_argument("-v", "--version", action='version', 46 | version='BlockHashLoc ' + 47 | 'Recover v%s - (C) 2017 by M.Pontello' % PROGRAM_VER) 48 | parser.add_argument("imgfilename", action="store", nargs="*", 49 | help="image(s)/volume(s) to scan") 50 | parser.add_argument("-db", "--database", action="store", dest="dbfilename", 51 | metavar="filename", 52 | help="temporary db with recovery info", 53 | default=":memory:") 54 | parser.add_argument("-bhl", action="store", nargs="+", dest="bhlfilename", 55 | help="BHL file(s)", metavar="filename", required=True) 56 | parser.add_argument("-d", action="store", dest="destpath", 57 | help="destination path", default="", metavar="path") 58 | parser.add_argument("-o", "--offset", type=int, default=0, 59 | help=("offset from the start"), metavar="n") 60 |
parser.add_argument("-st", "--step", type=int, default=0, 61 | help=("scan step"), metavar="n") 62 | parser.add_argument("-t","--test", action="store_true", default=False, 63 | help="only test BHL file(s)") 64 | res = parser.parse_args() 65 | return res 66 | 67 | 68 | def errexit(errlev=1, mess=""): 69 | """Display an error and exit.""" 70 | if mess != "": 71 | sys.stderr.write("%s: error: %s\n" % 72 | (os.path.split(sys.argv[0])[1], mess)) 73 | sys.exit(errlev) 74 | 75 | 76 | def mcd(nums): 77 | """Greatest common divisor of the block sizes: a scan step good for all of them""" 78 | res = min(nums) 79 | while res > 0: 80 | ok = 0 81 | for n in nums: 82 | if n % res != 0: 83 | break 84 | else: 85 | ok += 1 86 | if ok == len(nums): 87 | break 88 | res -= 1 89 | return res if res > 0 else 1 90 | 91 | 92 | def metadataDecode(data): 93 | """Decode metadata""" 94 | metadata = {} 95 | p = 0 96 | while p < (len(data)-3): 97 | metaid = data[p:p+3] 98 | p += 3 99 | metalen = data[p] 100 | metabb = data[p+1:p+1+metalen] 101 | p = p + 1 + metalen 102 | if metaid == b'FNM': 103 | metadata["filename"] = metabb.decode('utf-8') 104 | elif metaid == b'FDT': 105 | metadata["filedatetime"] = int.from_bytes(metabb, byteorder='big') 106 | return metadata 107 | 108 | 109 | class RecDB(): 110 | """Helper class to access Sqlite3 DB with recovery info""" 111 | 112 | def __init__(self, dbfilename): 113 | self.connection = sqlite3.connect(dbfilename) 114 | self.cursor = self.connection.cursor() 115 | 116 | def Commit(self): 117 | self.connection.commit() 118 | 119 | def CreateTables(self): 120 | c = self.cursor 121 | c.execute("CREATE TABLE bhl_files (id INTEGER, blocksize INTEGER, size INTEGER, name TEXT, datetime INTEGER, lastblock BLOB, hash BLOB)") 122 | c.execute("CREATE TABLE bhl_hashlist (hash BLOB, fileid INTEGER, sourceid INTEGER, num INTEGER, pos INTEGER)") 123 | c.execute("CREATE INDEX hash ON bhl_hashlist (hash)") 124 | self.connection.commit() 125 | 126 | def SetFileData(self, fid=0, fblocksize=0, fsize=0, fname="",
fdatetime=0, flastblock=b"", fhash=b""): 127 | c = self.cursor 128 | c.execute("INSERT INTO bhl_files (id, blocksize, size, name, datetime, lastblock, hash) VALUES (?, ?, ?, ?, ?, ?, ?)", 129 | (fid, fblocksize, fsize, fname, fdatetime, flastblock, fhash)) 130 | self.connection.commit() 131 | 132 | def AddHash(self, fhash=0, fid=0, fnum=0): 133 | c = self.cursor 134 | c.execute("INSERT INTO bhl_hashlist (hash, fileid, num) VALUES (?, ?, ?)", 135 | (fhash, fid, fnum)) 136 | 137 | def SetHashPos(self, fhash=0, sid=0, pos=0): 138 | c = self.cursor 139 | c.execute("UPDATE bhl_hashlist SET pos = ?, sourceid = ? WHERE hash = ? AND pos IS NULL", 140 | (pos, sid, fhash)) 141 | return c.rowcount 142 | 143 | def GetFileInfo(self, fid): 144 | c = self.cursor 145 | data = {} 146 | c.execute("SELECT * FROM bhl_files WHERE id = ?", (fid,)) 147 | res = c.fetchone() 148 | if res: 149 | data["blocksize"] = res[1] 150 | data["filesize"] = res[2] 151 | data["filename"] = res[3] 152 | data["filedatetime"] = res[4] 153 | data["lastblock"] = res[5] 154 | data["hash"] = res[6] 155 | return data 156 | 157 | def GetWriteList(self, fid): 158 | c = self.cursor 159 | data = [] 160 | c.execute("SELECT num, sourceid, pos FROM bhl_hashlist WHERE fileid = ? AND pos IS NOT NULL ORDER BY num", (fid,)) 161 | return c.fetchall() 162 | 163 | 164 | def uniquifyFileName(filename): 165 | count = 0 166 | uniq = "" 167 | name,ext = os.path.splitext(filename) 168 | while os.path.exists(filename): 169 | count += 1 170 | uniq = "(%i)" % count 171 | filename = name + uniq + ext 172 | return filename 173 | 174 | 175 | def getFileSize(filename): 176 | """Calc file size - works on devices too""" 177 | ftemp = os.open(filename, os.O_RDONLY) 178 | try: 179 | return os.lseek(ftemp, 0, os.SEEK_END) 180 | finally: 181 | os.close(ftemp) 182 | 183 | 184 | def main(): 185 | 186 | cmdline = get_cmdline() 187 | 188 | globalblocksnum = 0 189 | bhlfileid = 0 190 | sizelist = [] 191 | 192 | if not len(cmdline.imgfilename) and
not cmdline.test: 193 | errexit(1, "no image file/volume specified!") 194 | 195 | #build list of BHL files to process 196 | bhlfilenames = [] 197 | for filename in cmdline.bhlfilename: 198 | if os.path.isdir(filename): 199 | filename = os.path.join(filename, "*") 200 | bhlfilenames += glob.glob(filename) 201 | bhlfilenames = [filename for filename in bhlfilenames 202 | if not os.path.isdir(filename)] 203 | bhlfilenames = sorted(set(bhlfilenames)) 204 | 205 | if len(bhlfilenames) == 0: 206 | errexit(1, "no BHL file(s) found!") 207 | 208 | #prepare database 209 | if not cmdline.test: 210 | dbfilename = cmdline.dbfilename 211 | print("creating '%s' database..." % (dbfilename)) 212 | if dbfilename.upper() != ":MEMORY:": 213 | open(dbfilename, 'w').close() 214 | db = RecDB(dbfilename) 215 | db.CreateTables() 216 | 217 | #process all BHL files 218 | for bhlfilename in bhlfilenames: 219 | if not os.path.exists(bhlfilename): 220 | errexit(1, "BHL file '%s' not found" % (bhlfilename)) 221 | bhlfilesize = os.path.getsize(bhlfilename) 222 | 223 | #read hashes in memory 224 | blocklist = {} 225 | print("reading BHL file '%s'..." 
% bhlfilename) 226 | fin = open(bhlfilename, "rb", buffering=1024*1024) 227 | if BHL_MAGIC != fin.read(13): 228 | errexit(1, "not a valid BHL file") 229 | #check ver 230 | bhlver = ord(fin.read(1)) 231 | blocksize = int.from_bytes(fin.read(4), byteorder='big') 232 | if not blocksize in sizelist: 233 | sizelist.append(blocksize) 234 | filesize = int.from_bytes(fin.read(8), byteorder='big') 235 | lastblocksize = filesize % blocksize 236 | totblocksnum = (filesize + blocksize-1) // blocksize 237 | 238 | #parse metadata section 239 | metasize = int.from_bytes(fin.read(4), byteorder='big') 240 | metadata = metadataDecode(fin.read(metasize)) 241 | 242 | #read all block hashes 243 | globalhash = hashlib.sha256() 244 | updatetime = time.time() 245 | for block in range(totblocksnum): 246 | digest = fin.read(32) 247 | globalhash.update(digest) 248 | if digest in blocklist: 249 | blocklist[digest].append(block) 250 | else: 251 | blocklist[digest] = [block] 252 | #some progress update 253 | if time.time() > updatetime: 254 | print("%.1f%%" % (fin.tell()*100.0/bhlfilesize), " ", 255 | end="\r", flush=True) 256 | updatetime = time.time() + .1 257 | 258 | lastblockdigest = digest 259 | 260 | #verify the hashes read 261 | digest = fin.read(32) 262 | if globalhash.digest() != digest: 263 | errexit(1, "hashes block corrupt!") 264 | 265 | #read and check last blocks 266 | if lastblocksize: 267 | totblocksnum -= 1 268 | buffer = fin.read(bhlfilesize-fin.tell()+1) 269 | lastblockbuffer = zlib.decompress(buffer) 270 | blockhash = hashlib.sha256() 271 | blockhash.update(lastblockbuffer) 272 | if blockhash.digest() != lastblockdigest: 273 | errexit(1, "last block corrupt!") 274 | #remove lastblock from the list 275 | del blocklist[lastblockdigest] 276 | else: 277 | lastblockbuffer = b"" 278 | print("100% ", end="\r", flush=True) 279 | 280 | globalblocksnum += totblocksnum 281 | 282 | #put data in the DB 283 | #hashes 284 | if not cmdline.test: 285 | print("updating db...") 286 | 
updatetime = time.time() 287 | i = 0 288 | for digest in blocklist: 289 | for pos in blocklist[digest]: 290 | db.AddHash(fhash=digest, fid=bhlfileid, fnum=pos) 291 | i+= 1 292 | #some progress update 293 | if time.time() > updatetime: 294 | print("%.1f%%" % (i*100.0/len(blocklist)), " ", 295 | end="\r", flush=True) 296 | db.Commit() 297 | updatetime = time.time() + .1 298 | 299 | #file info 300 | db.SetFileData(fid=bhlfileid, fblocksize=blocksize, fsize=filesize, 301 | fname=metadata["filename"], 302 | fdatetime=metadata["filedatetime"], 303 | flastblock=lastblockbuffer, 304 | fhash=globalhash.digest()) 305 | bhlfileid +=1 306 | 307 | if cmdline.test: 308 | print("BHL file(s) OK!") 309 | errexit(0) 310 | 311 | #select an adequate scan step 312 | maxblocksize = max(sizelist) 313 | scanstep = cmdline.step 314 | if scanstep == 0: 315 | scanstep = mcd(sizelist) 316 | print("scan step:", scanstep) 317 | offset = cmdline.offset 318 | 319 | #build list of image files to process 320 | imgfilenames = [] 321 | for filename in cmdline.imgfilename: 322 | if os.path.isdir(filename): 323 | filename = os.path.join(filename, "*") 324 | imgfilenames += glob.glob(filename) 325 | imgfilenames = [filename for filename in imgfilenames 326 | if not os.path.isdir(filename)] 327 | imgfilenames = sorted(set(imgfilenames)) 328 | 329 | #start scanning process... 330 | blocksfound = 0 331 | for imgfileid in range(len(imgfilenames)): 332 | imgfilename = imgfilenames[imgfileid] 333 | if not os.path.exists(imgfilename): 334 | errexit(1, "image file/volume '%s' not found" % (imgfilename)) 335 | imgfilesize = getFileSize(imgfilename) 336 | 337 | print("scanning file '%s'..." 
% imgfilename) 338 | fin = open(imgfilename, "rb", buffering=1024*1024) 339 | 340 | updatetime = time.time() - 1 341 | starttime = time.time() 342 | writelist = {} 343 | docommit = False 344 | 345 | for pos in range(offset, imgfilesize, scanstep): 346 | fin.seek(pos, 0) 347 | buffer = fin.read(maxblocksize) 348 | if len(buffer) > 0: 349 | #need to check for all sizes 350 | for size in sizelist: 351 | blockhash = hashlib.sha256() 352 | blockhash.update(buffer[:size]) 353 | digest = blockhash.digest() 354 | num = db.SetHashPos(fhash=digest, sid=imgfileid, pos=pos) 355 | if num: 356 | docommit = True 357 | blocksfound += num 358 | 359 | #status update 360 | if ((time.time() > updatetime) or (globalblocksnum == blocksfound) or 361 | (imgfilesize-pos-len(buffer) == 0) ): 362 | etime = (time.time()-starttime) 363 | if etime == 0: 364 | etime = .001 365 | print(" %.1f%% - tot: %i - found: %i - %.2fMB/s" % 366 | ((pos+len(buffer)-1)*100/imgfilesize, 367 | globalblocksnum, blocksfound, pos/(1024*1024)/etime), 368 | end = "\r", flush=True) 369 | updatetime = time.time() + .2 370 | if docommit: 371 | db.Commit() 372 | docommit = False 373 | #break early if all the work is done 374 | if blocksfound == globalblocksnum: 375 | break 376 | fin.close() 377 | print() 378 | 379 | print("scan completed.") 380 | 381 | filesrestored = 0 382 | filesrestorederr = 0 383 | filesmissing= 0 384 | 385 | #open all the sources 386 | finlist = {} 387 | for imgfileid in range(len(imgfilenames)): 388 | finlist[imgfileid] = open(imgfilenames[imgfileid], "rb") 389 | 390 | #start rebuilding files... 
391 | for fid in range(len(bhlfilenames)): 392 | fileinfo = db.GetFileInfo(fid) 393 | filename = fileinfo["filename"] 394 | filename = os.path.join(cmdline.destpath, filename) 395 | filesize = fileinfo["filesize"] 396 | 397 | #get list of blocks num & positions 398 | blocksize = fileinfo["blocksize"] 399 | lastblock = fileinfo["lastblock"] 400 | writelist = db.GetWriteList(fid) 401 | totblocksnum = filesize // blocksize 402 | 403 | if len(writelist) > 0 or totblocksnum == 0: 404 | print("creating file '%s'..." % filename) 405 | open(filename, 'w').close() 406 | fout = open(filename, "wb") 407 | 408 | if len(writelist) < totblocksnum: 409 | print("file incomplete! blocks missing: %i" % 410 | (totblocksnum - len(writelist))) 411 | 412 | filehash = hashlib.sha256() 413 | for data in writelist: 414 | blocknum = data[0] 415 | imgid = data[1] 416 | pos = data[2] 417 | finlist[imgid].seek(pos) 418 | buffer = finlist[imgid].read(blocksize) 419 | fout.seek(blocknum*blocksize) 420 | fout.write(buffer) 421 | blockhash = hashlib.sha256() 422 | blockhash.update(buffer) 423 | filehash.update(blockhash.digest()) 424 | if lastblock: 425 | fout.write(lastblock) 426 | blockhash = hashlib.sha256() 427 | blockhash.update(lastblock) 428 | filehash.update(blockhash.digest()) 429 | fout.close() 430 | if "filedatetime" in fileinfo: 431 | os.utime(filename, 432 | (int(time.time()), fileinfo["filedatetime"])) 433 | filesrestored += 1 434 | 435 | if filehash.digest() == fileinfo["hash"]: 436 | print("hash match!") 437 | else: 438 | print("hash mismatch! decoded file corrupted/incomplete!") 439 | filesrestorederr += 1 440 | 441 | else: 442 | print("nothing found for file '%s'" % filename) 443 | filesmissing += 1 444 | 445 | print("\nfiles restored: %i - with errors: %i - files missing: %i" % 446 | (filesrestored, filesrestorederr, filesmissing)) 447 | 448 | 449 | if __name__ == '__main__': 450 | main() 451 | --------------------------------------------------------------------------------