├── .gitignore
├── LICENSE-MIT.txt
├── README.md
├── sbx-logo.svg
├── sbxdec.py
├── sbxenc.py
├── sbxreco.py
├── sbxscan.py
├── seqbox.bt
├── seqbox.py
└── todo.txt


/.gitignore:
--------------------------------------------------------------------------------
1 | misc_tools/
2 | *.pyc
3 | *.zip
4 | *.dat
5 | *.sbx
6 | *.seqbox
7 | *.db3


--------------------------------------------------------------------------------
/LICENSE-MIT.txt:
--------------------------------------------------------------------------------
1 | Copyright (c) 2017 Marco Pontello
2 |
3 | Permission is hereby granted, free of charge, to any person obtaining a copy
4 | of this software and associated documentation files (the "Software"), to deal
5 | in the Software without restriction, including without limitation the rights
6 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
7 | copies of the Software, and to permit persons to whom the Software is
8 | furnished to do so, subject to the following conditions:
9 |
10 | The above copyright notice and this permission notice shall be included in all
11 | copies or substantial portions of the Software.
12 |
13 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
14 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
15 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
16 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
17 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
18 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
19 | SOFTWARE.


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # SeqBox - Sequenced Box container
2 | ### A single file container/archive that can be reconstructed even after total loss of file system structures.
3 | ![SBX-Logo](http://i.imgur.com/Ewper2w.png)
4 |
5 | An SBX container exists both as a normal file in a mounted file system, and as a collection of recognizable blocks at a lower level.
6 |
7 | SBX blocks have a size that is a sub-multiple of, or equal to, that of a sector, so they can survive any level of fragmentation. Each block has a minimal header that includes a unique file identifier, block sequence number, checksum and version.
8 | Additional, non-critical info/metadata is contained in block 0 (name, file size, crypto-hash, other attributes, etc.).
9 |
10 | If disaster strikes, recovery can be performed by simply scanning a volume/image, reading sector-sized slices and checking block signatures and then CRCs to detect valid SBX blocks. The blocks can then be grouped by UID, sorted by sequence number and reassembled to form the original SeqBox containers.
11 |
12 | ![It's Magic](http://i.imgur.com/DQZDO0P.gif)
13 |
14 | It's also possible, and entirely transparent, to keep multiple copies of a container, on the same or different media, to increase the chances of recoverability. In case of corrupted blocks, all the good ones can be collected and reassembled from all available sources.
15 |
16 | The UID can be anything, as long as it's unique for the specific application. It could be randomly generated (probably the most common option), or a hash of the file content, or a simple sequence, etc.
17 |
18 | Overhead is minimal: for SBX v1 it is 16B/512B (+1 optional 512B block), or < 3.5%.
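The recovery procedure described above (scan sector-sized slices, check signature and CRC, group by UID, sort by sequence number) can be sketched in a few lines. This is only an illustration, not the code of SBXScan/SBXReco: the helper names `make_block`, `scan` and `reassemble` are hypothetical, the header layout is taken from the Tech spec section of this README, and using `binascii.crc_hqx` with the version byte as starting value is an assumption about the exact CRC-16-CCITT variant.

```python
import binascii
import struct
from collections import defaultdict

BLOCK_SIZE = 512                      # SBX v1 block size (see Tech spec)
HEADER = struct.Struct(">3sBH6sI")    # signature, version, CRC, UID, sequence number (big endian)


def make_block(uid, seq, payload, ver=1):
    """Build one v1 block for the demo (hypothetical helper, not the real encoder)."""
    # Everything after the CRC field: UID, sequence number, data padded with 0x1a.
    body = uid + struct.pack(">I", seq) + payload.ljust(BLOCK_SIZE - HEADER.size, b"\x1a")
    crc = binascii.crc_hqx(body, ver)  # assumption: version byte is the CRC start value
    return b"SBx" + bytes([ver]) + struct.pack(">H", crc) + body


def scan(image):
    """Return {uid: {seq: block}} for every valid v1 block found on a 512-byte boundary."""
    found = defaultdict(dict)
    for pos in range(0, len(image) - BLOCK_SIZE + 1, BLOCK_SIZE):
        block = image[pos:pos + BLOCK_SIZE]
        sig, ver, crc, uid, seq = HEADER.unpack_from(block)
        if sig != b"SBx" or ver != 1:
            continue                   # not an SBX v1 block
        if binascii.crc_hqx(block[6:], ver) != crc:
            continue                   # signature matched by chance, or block is damaged
        found[uid].setdefault(seq, block)  # keep the first good copy of each block
    return found


def reassemble(blocks):
    """Concatenate the blocks of one UID in sequence order."""
    return b"".join(blocks[seq] for seq in sorted(blocks))
```

Because each block carries its own UID and sequence number, the order in which blocks appear in the scanned image does not matter, and duplicate copies of a block are harmless.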
19 |
20 | ## Demo tour
21 |
22 | The two main tools are obviously the encoder & decoder:
23 | - SBXEnc: encode a file to an SBX container
24 | - SBXDec: decode SBX back to the original file; can also show info on a container and test its integrity against a crypto-hash
25 |
26 | The other two are the recovery tools:
27 | - SBXScan: scan a set of files (raw images, or even block devices on Linux) to build an SQLite db with the necessary recovery info
28 | - SBXReco: rebuild SBX files using data collected by SBXScan
29 |
30 | In some cases there are many parameters, but the defaults are sensible, so usage is generally pretty simple.
31 |
32 | Now to a practical example: let's see how 2 photos and their 2 SBX encoded versions go through a fragmented floppy disk that has lost its FAT (and any other system part). We start with the 2 pictures, about 200KB and 330KB:
33 |
34 | ![Castle](http://i.imgur.com/Qf0qrUp.jpg) ![Lake](http://i.imgur.com/9rH6tMf.jpg)
35 |
36 | We encode using SBXEnc, and then test the new file with SBXDec, to be sure all is OK:
37 |
38 | ```
39 | C:\t>sbxenc Lake.jpg
40 | hashing file 'Lake.jpg'...
41 | SHA256 3cfc376b6362444d2d25ebedb19e7594000f2ce2bdbb521d98f6c59b5adebfdc
42 | creating file 'Lake.jpg.sbx'...
43 | 100%
44 | SBX file size: 343040 - blocks: 670 - overhead: 3.4%
45 |
46 | C:\t>sbxdec -t Lake.jpg.sbx
47 | decoding 'Lake.jpg.sbx'...
48 | metadata block found!
49 | SBX decoding complete
50 | SHA256 3cfc376b6362444d2d25ebedb19e7594000f2ce2bdbb521d98f6c59b5adebfdc
51 | hash match!
52 | ```
53 |
54 | Same for the other file. Now we put both the JPEG and the SBX files on a floppy disk image already about half full, that has gone through various cycles of updating and deleting. As a result the data is laid out like this:
55 |
56 | ![Disk Layout](http://i.imgur.com/cBoXONY.png)
57 |
58 | Normal files (pictures included) are in green, and the two SBX in different shades of blue.
59 | Then with a hex editor we zap the first system sectors and the FAT (in red)!
60 | Time for recovery!
61 |
62 | We start with the free (GPL v2+) [PhotoRec](http://www.cgsecurity.org/wiki/PhotoRec), which is the go-to tool for this kind of job. Parameters are set to "Paranoid : YES (Brute force enabled)" & "Keep corrupted files : Yes", to search the entire data area.
63 | As the files are fragmented, we know we can't expect miracles. The starting sector of the photos will surely be found, but as soon as the first contiguous fragment ends, it's anyone's guess.
64 |
65 | ![PhotoRec results](http://i.imgur.com/qa0PySP.png)
66 |
67 | As expected, something has been recovered. But the sizes of the 2 files are off (280KB and 400KB). The very first parts of the photos are OK, but then they degrade quickly as other random blocks of data were mixed in. We have all seen JPEGs ending up like this:
68 |
69 | ![Castle](http://i.imgur.com/kP0jwyC.jpg) ![Lake](http://i.imgur.com/GyOonct.jpg)
70 |
71 | Other popular recovery tools lead to the same results. It's not anyone's fault: it's just not possible to know how the various fragments are concatenated without an index or some kind of list (there are approaches based on file type validators that can in at least some cases differentiate between spurious and *valid* blocks, but that's beside the point).
72 |
73 | But with an SBX file it's a different story. Each of its blocks can't be fragmented any further, and contains all the data needed to put it in its proper place in the sequence. So let's proceed with the recovery of the SBX files.
74 | To spice things up, the disk image file is run through a scrambler that swaps variable-sized blocks of sectors around. The resulting layout is now this:
75 |
76 | ![Scrambled](http://i.imgur.com/jmOWult.png)
77 |
78 | Pretty nightmarish!
Now on to SBXScan to search for pieces of SBX files around, and SBXReco to get a report of the collected data:
79 |
80 | ```
81 | C:\t\recovered\sbx>sbxscan \t\scrambled.IMA
82 | creating 'sbxscan.db3' database...
83 | scanning file/device '\t\scrambled.IMA' (1/1)...
84 | 100.0% blocks: 1087 - meta: 2 - files: 2 - 89.97MB/s
85 | scan completed!
86 |
87 | C:\t\recovered\sbx>sbxreco sbxscan.db3 -i
88 | opening 'sbxscan.db3' recovery info database...
89 |
90 | "UID", "filesize", "sbxname", "filename"
91 | "2818b123c00b", 206292, "Castle.jpg.sbx", "Castle.jpg"
92 | "76fe4a49ebf2", 331774, "Lake.jpg.sbx", "Lake.jpg"
93 | ```
94 |
95 | The 2 SBX containers have been found, with all the metadata. So the original file sizes are also known, along with the names of the SBX files and the original ones. At this point it would be possible to recover single files or a group of them, by UID or name, but we opt to recover everything:
96 |
97 | ```
98 | C:\t\recovered\sbx>sbxreco sbxscan.db3 --all
99 | opening 'sbxscan.db3' recovery info database...
100 | recovering SBX files...
101 | UID 2818b123c00b (1/2)
102 | blocks: 417 - size: 213504 bytes
103 | to: 'Castle.jpg.sbx'
104 | 100.0% (missing blocks: 0)
105 | UID 76fe4a49ebf2 (2/2)
106 | blocks: 670 - size: 343040 bytes
107 | to: 'Lake.jpg.sbx'
108 | 100.0% (missing blocks: 0)
109 |
110 | done.
111 | all SBx files recovered with no errors!
112 | ```
113 |
114 | All SBX files seem to have been recovered correctly. We start decoding:
115 |
116 | ```
117 | C:\t\recovered\sbx>sbxdec Lake.jpg.sbx
118 | decoding 'Lake.jpg.sbx'...
119 | metadata block found!
120 | creating file 'Lake.jpg'...
121 | SBX decoding complete
122 | SHA256 3cfc376b6362444d2d25ebedb19e7594000f2ce2bdbb521d98f6c59b5adebfdc
123 | hash match!
124 | ```
125 |
126 | And sure enough:
127 |
128 | ![Castle](http://i.imgur.com/Qf0qrUp.jpg) ![Lake](http://i.imgur.com/9rH6tMf.jpg)
129 |
130 | N.B.
Here's a [7-Zip archive](http://mark0.net/download/sbxdemo-diskimages.7z) with the 2 disk images used in the demo (542KB).
131 |
132 | ## Possible / hypothetical / ideal use cases
133 | - **Last step of a backup**. After creating a compressed archive of something, the archive could be SeqBox encoded to increase recovery chances in the event of software/hardware issues that cause logical / file system damage.
134 | - **Exchange data between different systems**. Regardless of the file system used, an SBX container can always be read/extracted.
135 | - **Long term storage**. Since each block is CRC tagged, and a crypto-hash of the original content is stored, bitrot can be easily detected. In addition, if multiple copies are stored, on the same or different media, the container can be correctly restored with a high degree of probability even if all the copies are subject to some damage (in different blocks).
136 | - **Encoding of photos on an SD card**. Loss of images on perfectly functioning SD cards is a known occurrence in the photography world, for example when low on battery and maybe with a camera/firmware with suboptimal monitoring & management strategies. If the photo files are fragmented, recovery tools can usually help only to a point.
137 | - **On-disk format for a File System**. The trade-off in file size and performance (both should be fairly minimal anyway) could be interesting for some applications. Maybe it could be a simple option (like compression in many FS). I plan to build a simple/toy FS with FUSE to test the concept, time permitting.
138 | - **Easy file splitting**. Probably less interesting, but a SeqBox container can also be split with no particular precautions aside from doing that on block size boundaries. Additionally, there's no need to use special naming conventions, numbered files, etc., as the SBX container can be reassembled exactly like when doing a recovery.
139 | - **Data hiding**.
SeqBox containers (or even fragments of them) can be put inside other files (for example at the end of a JPEG, in the middle of a document, etc.), sprayed somewhere in the unused space, between partitions, and so on.
140 | Incidentally, that means that if you are in the digital forensics sector, now you have one more thing to check for!
141 | If a password is used, the entire SBX file is *mangled* to look pseudo-random, and SBXScan, SBXReco & SBXDec will not be able to recognize it unless the same password is provided.
142 |
143 | ## Tests
144 |
145 | SeqBox recoverability has been practically tested with a number of File Systems. The procedure involved using a Virtual Machine (or a full blown emulator) to format a small disk image with a certain FS, filling it with a number of small files, then deleting some of them randomly to free enough space to copy a series of SBX files. This way every SBX file ends up fragmented into a lot of smaller pieces. Then the image was quick-formatted, wipefs-ed and the VM shut down.
146 | After that, from the host OS, recovery of the SBX files was attempted using SBXScan & SBXReco on the disk image.
147 |
148 | - **Working**: [ADFS](https://en.wikipedia.org/wiki/Advanced_Disc_Filing_System), [AFS](https://www.alteeve.com/w/Ami_File_Safe), [AFS](https://en.wikipedia.org/wiki/AtheOS_File_System), [AFFS](https://en.wikipedia.org/wiki/Amiga_Fast_File_System), [APFS](https://en.wikipedia.org/wiki/Apple_File_System), [BeFS](https://en.wikipedia.org/wiki/Be_File_System), [BtrFS](https://en.wikipedia.org/wiki/Btrfs), [EXT2/3/4](https://en.wikipedia.org/wiki/Extended_file_system), [F2FS](https://en.wikipedia.org/wiki/F2FS), [FATnn/VFAT/exFAT](https://en.wikipedia.org/wiki/File_Allocation_Table), [HAMMER](https://en.wikipedia.org/wiki/HAMMER), [HFS](https://en.wikipedia.org/wiki/Hierarchical_File_System), [HFS+](https://en.wikipedia.org/wiki/HFS_Plus), [HPFS](https://en.wikipedia.org/wiki/High_Performance_File_System), [JFS](https://en.wikipedia.org/wiki/JFS_(file_system)), [MFS](https://en.wikipedia.org/wiki/Macintosh_File_System), [MINIX FS](https://en.wikipedia.org/wiki/MINIX_file_system), [NTFS](https://en.wikipedia.org/wiki/NTFS), [ProDOS](https://en.wikipedia.org/wiki/Apple_ProDOS), [PFS](https://en.wikipedia.org/wiki/Professional_File_System), [ReFS](https://en.wikipedia.org/wiki/ReFS), [ReiserFS](https://en.wikipedia.org/wiki/ReiserFS), [UFS](https://en.wikipedia.org/wiki/Unix_File_System), [XFS](https://en.wikipedia.org/wiki/XFS), [YAFFS](https://en.wikipedia.org/wiki/YAFFS), [ZFS](https://en.wikipedia.org/wiki/ZFS).
149 |
150 | - **Not working**: [OFS](https://en.wikipedia.org/wiki/Amiga_Old_File_System) (due to 488 data bytes per 512-byte sector)
151 |
152 |
153 | **N.B.** Obviously SBX blocks can't be found if File System encryption is used. The same goes for compression (mostly, but not always). Striping/RAID, instead, is usually not a problem.
154 |
155 | Being written in Python 3, SeqBox tools are naturally multi-platform and have been tested successfully on various versions of Windows, on OS X & macOS, on some Linux distros on both x86 and ARM, on FreeBSD and on Android (via QPython).
156 |
157 | ***
158 |
159 | ## Tech spec
160 | Byte order: Big Endian
161 | ### Common blocks header:
162 |
163 | | pos | to pos | size | desc |
164 | |---- | --- | ---- | ----------------------------------- |
165 | | 0 | 2 | 3 | Recoverable Block signature = 'SBx' |
166 | | 3 | 3 | 1 | Version byte |
167 | | 4 | 5 | 2 | CRC-16-CCITT of the rest of the block (Version is used as starting value) |
168 | | 6 | 11 | 6 | file UID |
169 | | 12 | 15 | 4 | Block sequence number |
170 |
171 | ### Block 0
172 |
173 | | pos | to pos | size | desc |
174 | |---- | -------- | ---- | ---------------- |
175 | | 16 | n | var | encoded metadata |
176 | | n+1 | blockend | var | padding (0x1a) |
177 |
178 | ### Blocks > 0 & < last:
179 |
180 | | pos | to pos | size | desc |
181 | |---- | -------- | ---- | ---------------- |
182 | | 16 | blockend | var | data |
183 |
184 | ### Blocks == last:
185 |
186 | | pos | to pos | size | desc |
187 | |---- | -------- | ---- | ---------------- |
188 | | 16 | n | var | data |
189 | | n+1 | blockend | var | padding (0x1a) |
190 |
191 | ### Versions:
192 | N.B. Current versions differ only by blocksize.
193 |
194 | | ver | blocksize | note |
195 | |---- | --------- | ------- |
196 | | 1 | 512 | default |
197 | | 2 | 128 | |
198 | | 3 | 4096 | |
199 |
200 | ### Metadata encoding:
201 |
202 | | Bytes | Field |
203 | | ----- | ----- |
204 | | 3 | ID |
205 | | 1 | Len |
206 | | n | Data |
207 |
208 | #### IDs
209 |
210 | | ID | Desc |
211 | | --- | --- |
212 | | FNM | filename (utf-8) |
213 | | SNM | sbx filename (utf-8) |
214 | | FSZ | filesize (8 bytes) |
215 | | FDT | date & time (8 bytes, seconds since epoch) |
216 | | SDT | sbx date & time (8 bytes) |
217 | | HSH | crypto hash (SHA256, using [Multihash](http://multiformats.io) protocol) |
218 | | PID | parent UID (*not used at the moment*)|
219 |
220 | (other IDs for file dates, attributes, etc. will be added...)
221 |
222 | ## Final notes
223 | The code was quickly hacked together in spare slices of time to verify the basic idea, so it's not optimized for speed and would benefit from some refactoring, in time.
224 | Still, the current block format is stable and some precautions have been taken to ensure that any encoded file can be correctly decoded. For example, the SHA256 hash that is stored as metadata is calculated before any other file operation.
225 | So, as long as a newly created SBX file is checked as OK with SBXDec, it should be OK.
226 | Also, SBXEnc and SBXDec by default don't overwrite files, and SBXReco gives unique names to the recovered ones.
227 | Finally, the file content is not altered in any way (except if a password is used), just re-framed.
228 |
229 | ## Related tools
230 |
231 | Check my [BlockHashLoc](https://github.com/MarcoPon/BlockHashLoc) for a different/synergistic approach to obtaining a similar degree of recoverability, but using a small, parallel file of hashes instead of a standalone container. It's probably more suited to protecting existing files, when it isn't practical to touch/re-encode them.
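Going back to the metadata encoding table in the Tech spec (3-byte ID, 1-byte length, then the data), a minimal sketch of how such records could be packed and unpacked follows. The function names `encode_meta` and `decode_meta` are hypothetical and are not taken from `seqbox.py`; stopping at the first 0x1a byte is an assumption based on the block 0 padding described above, and the big-endian 8-byte file size follows the "Byte order: Big Endian" note.

```python
import struct


def encode_meta(fields):
    """Pack (ID, data) pairs as ID (3 bytes) + Len (1 byte) + Data (Len bytes)."""
    out = bytearray()
    for key, value in fields:
        kid = key.encode("ascii")
        if len(kid) != 3:
            raise ValueError("metadata IDs are 3 bytes long")
        if len(value) > 255:
            raise ValueError("data does not fit a 1-byte length field")
        out += kid + bytes([len(value)]) + value
    return bytes(out)


def decode_meta(buf):
    """Unpack records until the data ends or 0x1a padding starts (assumed terminator)."""
    fields = []
    pos = 0
    while pos + 4 <= len(buf) and buf[pos] != 0x1A:
        kid = buf[pos:pos + 3].decode("ascii")
        length = buf[pos + 3]
        fields.append((kid, buf[pos + 4:pos + 4 + length]))
        pos += 4 + length
    return fields


# Example records: file name as UTF-8, file size as an 8-byte big-endian integer.
example = [("FNM", "Lake.jpg".encode("utf-8")),
           ("FSZ", struct.pack(">Q", 331774))]
blob = encode_meta(example)
```

With a 1-byte length field, no single metadata item can exceed 255 bytes, which is worth keeping in mind for long file names.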
232 |
233 | ## Links
234 |
235 | - [SeqBox home page](http://mark0.net/soft-seqbox-e.html)
236 | - [SeqBox GitHub repository](https://github.com/MarcoPon/SeqBox)
237 |
238 | ## Contacts
239 |
240 | If you need more info, want to get in touch, or donate: [Marco Pontello](http://mark0.net/contacts-e.html)
241 |
242 | **Bitcoin**: 1Mark1tF6QGj112F5d3fQALGf41YfzXEK3
243 |
244 | ![Qr-Code](http://mark0.net/images/qrcode.png)


--------------------------------------------------------------------------------
/sbx-logo.svg:
--------------------------------------------------------------------------------
(SVG markup not preserved; only the title "SBX Logo" and the letters S, B, X survive)


--------------------------------------------------------------------------------
/sbxdec.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python3
2 |
3 | #--------------------------------------------------------------------------
4 | # SBXDec - Sequenced Box container Decoder
5 | #
6 | # Created: 03/03/2017
7 | #
8 | # Copyright (C) 2017 Marco Pontello - http://mark0.net/
9 | #
10 | # Licence:
11 | # This program is free software: you can redistribute it and/or modify
12 | # it under the terms of the GNU Affero General Public License as
13 | # published by the Free Software Foundation, either version 3 of the
14 | # License, or (at your option) any later version.
15 | #
16 | # This program is distributed in the hope that it will be useful,
17 | # but WITHOUT ANY WARRANTY; without even the implied warranty of
18 | # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
19 | # GNU Affero General Public License for more details.
20 | # 21 | # You should have received a copy of the GNU Affero General Public License 22 | # along with this program. If not, see . 23 | # 24 | #-------------------------------------------------------------------------- 25 | 26 | import os 27 | import sys 28 | import hashlib 29 | import argparse 30 | import binascii 31 | import time 32 | 33 | import seqbox 34 | 35 | PROGRAM_VER = "1.0.2" 36 | 37 | 38 | def get_cmdline(): 39 | """Evaluate command line parameters, usage & help.""" 40 | parser = argparse.ArgumentParser( 41 | description="decode a SeqBox container", 42 | formatter_class=argparse.ArgumentDefaultsHelpFormatter, 43 | prefix_chars='-+') 44 | parser.add_argument("-v", "--version", action='version', 45 | version='SeqBox - Sequenced Box container - ' + 46 | 'Decoder v%s - (C) 2017 by M.Pontello' % PROGRAM_VER) 47 | parser.add_argument("sbxfilename", action="store", help="SBx container") 48 | parser.add_argument("filename", action="store", nargs='?', 49 | help="target/decoded file") 50 | parser.add_argument("-t","--test", action="store_true", default=False, 51 | help="test container integrity") 52 | parser.add_argument("-i", "--info", action="store_true", default=False, 53 | help="show informations/metadata") 54 | parser.add_argument("-c", "--continue", action="store_true", default=False, 55 | help="continue on block errors", dest="cont") 56 | parser.add_argument("-o", "--overwrite", action="store_true", default=False, 57 | help="overwrite existing file") 58 | parser.add_argument("-p", "--password", type=str, default="", 59 | help="encrypt with password", metavar="pass") 60 | res = parser.parse_args() 61 | return res 62 | 63 | 64 | def errexit(errlev=1, mess=""): 65 | """Display an error and exit.""" 66 | if mess != "": 67 | sys.stderr.write("%s: error: %s\n" % 68 | (os.path.split(sys.argv[0])[1], mess)) 69 | sys.exit(errlev) 70 | 71 | 72 | def lastEofCount(data): 73 | count = 0 74 | for b in range(len(data)): 75 | if data[-b-1] != 0x1a: 76 | break 77 | count 
+=1 78 | return count 79 | 80 | 81 | def main(): 82 | 83 | cmdline = get_cmdline() 84 | 85 | sbxfilename = cmdline.sbxfilename 86 | filename = cmdline.filename 87 | 88 | if not os.path.exists(sbxfilename): 89 | errexit(1, "sbx file '%s' not found" % (sbxfilename)) 90 | sbxfilesize = os.path.getsize(sbxfilename) 91 | 92 | print("decoding '%s'..." % (sbxfilename)) 93 | fin = open(sbxfilename, "rb", buffering=1024*1024) 94 | 95 | #check magic and get version 96 | header = fin.read(4) 97 | fin.seek(0, 0) 98 | if cmdline.password: 99 | e = seqbox.EncDec(cmdline.password, len(header)) 100 | header= e.xor(header) 101 | if header[:3] != b"SBx": 102 | errexit(1, "not a SeqBox file!") 103 | sbxver = header[3] 104 | 105 | sbx = seqbox.SbxBlock(ver=sbxver, pswd=cmdline.password) 106 | metadata = {} 107 | trimfilesize = False 108 | 109 | hashtype = 0 110 | hashlen = 0 111 | hashdigest = b"" 112 | hashcheck = False 113 | 114 | buffer = fin.read(sbx.blocksize) 115 | 116 | try: 117 | sbx.decode(buffer) 118 | except seqbox.SbxDecodeError as err: 119 | if cmdline.cont == False: 120 | print(err) 121 | errexit(errlev=1, mess="invalid block at offset 0x0") 122 | 123 | if sbx.blocknum > 1: 124 | errexit(errlev=1, mess="blocks missing or out of order at offset 0x0") 125 | elif sbx.blocknum == 0: 126 | print("metadata block found!") 127 | metadata = sbx.metadata 128 | if "filesize" in metadata: 129 | trimfilesize = True 130 | if "hash" in metadata: 131 | hashtype = metadata["hash"][0] 132 | if hashtype == 0x12: 133 | hashlen = metadata["hash"][1] 134 | hashdigest = metadata["hash"][2:2+hashlen] 135 | hashcheck = True 136 | 137 | else: 138 | #first block is data, so reset from the start 139 | print("no metadata available") 140 | fin.seek(0, 0) 141 | 142 | #display some info and stop 143 | if cmdline.info: 144 | print("\nSeqBox container info:") 145 | print(" file size: %i bytes" % (sbxfilesize)) 146 | print(" blocks: %i" % (sbxfilesize / sbx.blocksize)) 147 | print(" version: %i" % 
(sbx.ver)) 148 | print(" UID: %s" % (binascii.hexlify(sbx.uid).decode())) 149 | if metadata: 150 | print("metadata:") 151 | if "sbxname" in metadata: 152 | print(" SBX name : '%s'" % (metadata["sbxname"])) 153 | if "filename" in metadata: 154 | print(" file name: '%s'" % (metadata["filename"])) 155 | if "filesize" in metadata: 156 | print(" file size: %i bytes" % (metadata["filesize"])) 157 | if "sbxdatetime" in metadata: 158 | print(" SBX date&time : %s" % 159 | (time.strftime("%Y-%m-%d %H:%M:%S", 160 | time.localtime(metadata["sbxdatetime"])))) 161 | if "filedatetime" in metadata: 162 | print(" file date&time: %s" % 163 | (time.strftime("%Y-%m-%d %H:%M:%S", 164 | time.localtime(metadata["filedatetime"])))) 165 | if "hash" in metadata: 166 | if hashtype == 0x12: 167 | print(" SHA256: %s" % (binascii.hexlify( 168 | hashdigest).decode())) 169 | else: 170 | print(" hash type not recognized!") 171 | sys.exit(0) 172 | 173 | #evaluate target filename 174 | if not cmdline.test: 175 | if not filename: 176 | if "filename" in metadata: 177 | filename = metadata["filename"] 178 | else: 179 | filename = os.path.split(sbxfilename)[1] + ".out" 180 | elif os.path.isdir(filename): 181 | if "filename" in metadata: 182 | filename = os.path.join(filename, metadata["filename"]) 183 | else: 184 | filename = os.path.join(filename, 185 | os.path.split(sbxfilename)[1] + ".out") 186 | 187 | if os.path.exists(filename) and not cmdline.overwrite: 188 | errexit(1, "target file '%s' already exists!" % (filename)) 189 | print("creating file '%s'..." 
% (filename)) 190 | fout= open(filename, "wb", buffering=1024*1024) 191 | 192 | if hashtype == 0x12: 193 | d = hashlib.sha256() 194 | lastblocknum = 0 195 | 196 | filesize = 0 197 | blockmiss = 0 198 | updatetime = time.time() 199 | while True: 200 | buffer = fin.read(sbx.blocksize) 201 | if len(buffer) < sbx.blocksize: 202 | break 203 | 204 | try: 205 | sbx.decode(buffer) 206 | if sbx.blocknum > lastblocknum+1: 207 | if cmdline.cont: 208 | blockmiss += 1 209 | lastblocknum += 1 210 | else: 211 | errexit(errlev=1, mess="block %i out of order or missing" 212 | % (lastblocknum+1)) 213 | lastblocknum += 1 214 | if trimfilesize: 215 | filesize += sbx.datasize 216 | if filesize > metadata["filesize"]: 217 | sbx.data = sbx.data[:-(filesize - metadata["filesize"])] 218 | if hashcheck: 219 | d.update(sbx.data) 220 | if not cmdline.test: 221 | fout.write(sbx.data) 222 | 223 | except seqbox.SbxDecodeError as err: 224 | if cmdline.cont: 225 | blockmiss += 1 226 | lastblocknum += 1 227 | else: 228 | print(err) 229 | errexit(errlev=1, mess="invalid block at offset %s" % 230 | (hex(fin.tell()-sbx.blocksize))) 231 | 232 | #some progress report 233 | if time.time() > updatetime: 234 | print(" %.1f%%" % (fin.tell()*100.0/sbxfilesize), 235 | end="\r", flush=True) 236 | updatetime = time.time() + .1 237 | 238 | fin.close() 239 | if not cmdline.test: 240 | fout.close() 241 | if metadata: 242 | if "filedatetime" in metadata: 243 | os.utime(filename, 244 | (int(time.time()), metadata["filedatetime"])) 245 | 246 | print("SBX decoding complete") 247 | if blockmiss: 248 | errexit(1, "missing blocks: %i" % blockmiss) 249 | 250 | if hashcheck: 251 | if hashtype == 0x12: 252 | print("SHA256", d.hexdigest()) 253 | 254 | if d.digest() == hashdigest: 255 | print("hash match!") 256 | else: 257 | errexit(1, "hash mismatch! 
decoded file corrupted!") 258 | else: 259 | print("can't check integrity via hash!") 260 | #if filesize unknown, estimate based on 0x1a padding at block's end 261 | if not trimfilesize: 262 | c = lastEofCount(sbx.data[-4:]) 263 | print("EOF markers at the end of last block: %i/4" % c) 264 | 265 | 266 | if __name__ == '__main__': 267 | main() 268 | -------------------------------------------------------------------------------- /sbxenc.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | 3 | #-------------------------------------------------------------------------- 4 | # SBXEnc - Sequenced Box container Encoder 5 | # 6 | # Created: 10/02/2017 7 | # 8 | # Copyright (C) 2017 Marco Pontello - http://mark0.net/ 9 | # 10 | # Licence: 11 | # This program is free software: you can redistribute it and/or modify 12 | # it under the terms of the GNU Affero General Public License as 13 | # published by the Free Software Foundation, either version 3 of the 14 | # License, or (at your option) any later version. 15 | # 16 | # This program is distributed in the hope that it will be useful, 17 | # but WITHOUT ANY WARRANTY; without even the implied warranty of 18 | # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 19 | # GNU Affero General Public License for more details. 20 | # 21 | # You should have received a copy of the GNU Affero General Public License 22 | # along with this program. If not, see . 
23 | # 24 | #-------------------------------------------------------------------------- 25 | 26 | import os 27 | import sys 28 | import hashlib 29 | import argparse 30 | import binascii 31 | from functools import partial 32 | from time import time 33 | 34 | import seqbox 35 | 36 | PROGRAM_VER = "1.0.2" 37 | 38 | def get_cmdline(): 39 | """Evaluate command line parameters, usage & help.""" 40 | parser = argparse.ArgumentParser( 41 | description="create a SeqBox container", 42 | formatter_class=argparse.ArgumentDefaultsHelpFormatter, 43 | prefix_chars='-+') 44 | parser.add_argument("-v", "--version", action='version', 45 | version='SeqBox - Sequenced Box container - ' + 46 | 'Encoder v%s - (C) 2017 by M.Pontello' % PROGRAM_VER) 47 | parser.add_argument("filename", action="store", 48 | help="file to encode") 49 | parser.add_argument("sbxfilename", action="store", nargs='?', 50 | help="SBX container") 51 | parser.add_argument("-o", "--overwrite", action="store_true", default=False, 52 | help="overwrite existing file") 53 | parser.add_argument("-nm","--nometa", action="store_true", default=False, 54 | help="exclude matadata block") 55 | parser.add_argument("-uid", action="store", default="r", type=str, 56 | help="use random or custom UID (up to 12 hexdigits)") 57 | parser.add_argument("-sv", "--sbxver", type=int, default=1, 58 | help="SBX blocks version", metavar="n") 59 | parser.add_argument("-p", "--password", type=str, default="", 60 | help="encrypt with password", metavar="pass") 61 | res = parser.parse_args() 62 | return res 63 | 64 | 65 | def errexit(errlev=1, mess=""): 66 | """Display an error and exit.""" 67 | if mess != "": 68 | sys.stderr.write("%s: error: %s\n" % 69 | (os.path.split(sys.argv[0])[1], mess)) 70 | sys.exit(errlev) 71 | 72 | 73 | def getsha256(filename): 74 | """SHA256 used to verify the integrity of the encoded file""" 75 | with open(filename, mode='rb') as fin: 76 | d = hashlib.sha256() 77 | for buf in iter(partial(fin.read, 1024*1024), b''): 
78 | d.update(buf) 79 | return d.digest() 80 | 81 | 82 | def main(): 83 | 84 | cmdline = get_cmdline() 85 | 86 | filename = cmdline.filename 87 | sbxfilename = cmdline.sbxfilename 88 | if not sbxfilename: 89 | sbxfilename = os.path.split(filename)[1] + ".sbx" 90 | elif os.path.isdir(sbxfilename): 91 | sbxfilename = os.path.join(sbxfilename, 92 | os.path.split(filename)[1] + ".sbx") 93 | if os.path.exists(sbxfilename) and not cmdline.overwrite: 94 | errexit(1, "SBX file '%s' already exists!" % (sbxfilename)) 95 | 96 | #parse eventual custom uid 97 | uid = cmdline.uid 98 | if uid !="r": 99 | uid = uid[-12:] 100 | try: 101 | uid = int(uid, 16).to_bytes(6, byteorder='big') 102 | except: 103 | errexit(1, "invalid UID") 104 | 105 | if not os.path.exists(filename): 106 | errexit(1, "file '%s' not found" % (filename)) 107 | filesize = os.path.getsize(filename) 108 | 109 | fout = open(sbxfilename, "wb", buffering=1024*1024) 110 | 111 | #calc hash - before all processing, and not while reading the file, 112 | #just to be cautious 113 | if not cmdline.nometa: 114 | print("hashing file '%s'..." % (filename)) 115 | sha256 = getsha256(filename) 116 | print("SHA256",binascii.hexlify(sha256).decode()) 117 | 118 | fin = open(filename, "rb", buffering=1024*1024) 119 | print("creating file '%s'..." 
% sbxfilename) 120 | 121 | sbx = seqbox.SbxBlock(uid=uid, ver=cmdline.sbxver, pswd=cmdline.password) 122 | 123 | #write metadata block 0 124 | if not cmdline.nometa: 125 | sbx.metadata = {"filesize":filesize, 126 | "filename":os.path.split(filename)[1], 127 | "sbxname":os.path.split(sbxfilename)[1], 128 | "filedatetime":int(os.path.getmtime(filename)), 129 | "sbxdatetime":int(time()), 130 | "hash":b'\x12\x20'+sha256} #multihash 131 | fout.write(sbx.encode()) 132 | 133 | #write all other blocks 134 | ticks = 0 135 | updatetime = time() 136 | while True: 137 | buffer = fin.read(sbx.datasize) 138 | if len(buffer) < sbx.datasize: 139 | if len(buffer) == 0: 140 | break 141 | sbx.blocknum += 1 142 | sbx.data = buffer 143 | fout.write(sbx.encode()) 144 | 145 | #some progress update 146 | if time() > updatetime: 147 | print("%.1f%%" % (fin.tell()*100.0/filesize), " ", 148 | end="\r", flush=True) 149 | updatetime = time() + .1 150 | 151 | print("100% ") 152 | fin.close() 153 | fout.close() 154 | 155 | totblocks = sbx.blocknum if cmdline.nometa else sbx.blocknum + 1 156 | sbxfilesize = totblocks * sbx.blocksize 157 | overhead = 100.0 * sbxfilesize / filesize - 100 if filesize > 0 else 0 158 | print("SBX file size: %i - blocks: %i - overhead: %.1f%%" % 159 | (sbxfilesize, totblocks, overhead)) 160 | 161 | 162 | if __name__ == '__main__': 163 | main() 164 | -------------------------------------------------------------------------------- /sbxreco.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | 3 | #-------------------------------------------------------------------------- 4 | # SBXReco - Sequenced Box container Recover 5 | #s 6 | # Created: 08/03/2017 7 | # 8 | # Copyright (C) 2017 Marco Pontello - http://mark0.net/ 9 | # 10 | # Licence: 11 | # This program is free software: you can redistribute it and/or modify 12 | # it under the terms of the GNU Affero General Public License as 13 | # published by the Free 
# Software Foundation, either version 3 of the
# License, or (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU Affero General Public License for more details.
#
# You should have received a copy of the GNU Affero General Public License
# along with this program. If not, see .
#
#--------------------------------------------------------------------------

import os
import sys
import argparse
import binascii
import sqlite3
import time

import seqbox

PROGRAM_VER = "1.0.2"

def get_cmdline():
    """Evaluate command line parameters, usage & help."""
    parser = argparse.ArgumentParser(
             description="recover SeqBox containers",
             formatter_class=argparse.ArgumentDefaultsHelpFormatter,
             prefix_chars='-+',
             fromfile_prefix_chars='@')
    parser.add_argument("-v", "--version", action='version',
                        version='SeqBox - Sequenced Box container - ' +
                        'Recover v%s - (C) 2017 by M.Pontello' % PROGRAM_VER)
    parser.add_argument("dbfilename", action="store", metavar="filename",
                        help="database with recovery info")
    parser.add_argument("destpath", action="store", nargs="?", metavar="path",
                        help="destination path for recovered sbx files")
    parser.add_argument("--all", action="store_true", help="recover all")
    parser.add_argument("--file", action="store", nargs="+", metavar="filename",
                        help="original filename(s) to recover")
    parser.add_argument("--sbx", action="store", nargs="+", metavar="filename",
                        help="SBX filename(s) to recover")
    parser.add_argument("--uid", action="store", nargs="+", metavar="uid",
                        help="UID(s) to recover")
    parser.add_argument("-f", "--fill", action="store_true", default=False,
                        help="fill-in missing blocks")
    parser.add_argument("-i", "--info", action="store_true", default=False,
                        help="show info on recoverable sbx file(s)")
    parser.add_argument("-p", "--password", type=str, default="",
                        help="encrypt with password", metavar="pass")
    parser.add_argument("-o", "--overwrite", action="store_true", default=False,
                        help="overwrite existing sbx file(s)")
    res = parser.parse_args()
    return res


def errexit(errlev=1, mess=""):
    """Display an error and exit."""
    if mess != "":
        sys.stderr.write("%s: error: %s\n" %
                         (os.path.split(sys.argv[0])[1], mess))
    sys.exit(errlev)


class RecDB():
    """Helper class to access Sqlite3 DB with recovery info"""

    def __init__(self, dbfilename):
        self.connection = sqlite3.connect(dbfilename)
        self.cursor = self.connection.cursor()

    def GetMetaFromUID(self, uid):
        meta = {}
        c = self.cursor
        c.execute("SELECT * from sbx_meta where uid = '%i'" % uid)
        res = c.fetchone()
        if res:
            meta["filesize"] = res[1]
            meta["filename"] = res[2]
            meta["sbxname"] = res[3]
            meta["filedatetime"] = res[4]
            meta["sbxdatetime"] = res[5]
        return meta

    def GetUIDFromFileName(self, filename):
        c = self.cursor
        #bound parameter: names containing quotes must not break the query
        c.execute("select uid from sbx_meta where name = ?", (filename,))
        res = c.fetchone()
        if res:
            return(res[0])

    def GetUIDFromSbxName(self, sbxname):
        c = self.cursor
        #bound parameter: names containing quotes must not break the query
        c.execute("select uid from sbx_meta where sbxname = ?", (sbxname,))
        res = c.fetchone()
        if res:
            return(res[0])

    def GetBlocksCountFromUID(self, uid):
        c = self.cursor
        c.execute("SELECT uid from sbx_blocks where uid = '%i' group by num order by num" % (uid))
        return len(c.fetchall())

    def GetBlocksListFromUID(self, uid):
        c = self.cursor
        c.execute("SELECT num, fileid, pos from sbx_blocks where \
uid = '%i' group by num order by num" % (uid))
        return c.fetchall()

    def GetUIDDataList(self):
        c = self.cursor
        c.execute("SELECT * from sbx_uids")
        res = {row[0]:row[1] for row in c.fetchall()}
        return res

    def GetSourcesList(self):
        c = self.cursor
        c.execute("SELECT * FROM sbx_source")
        return c.fetchall()


def uniquifyFileName(filename):
    count = 0
    uniq = ""
    name,ext = os.path.splitext(filename)
    while os.path.exists(filename):
        count += 1
        uniq = "(%i)" % count
        filename = name + uniq + ext
    return filename


def report(db, uidDataList, blocksizes):
    """Create a report with the info obtained by SbxScan"""
    #just the basic info in CSV format for the moment

    print('\n"UID", "filesize", "sbxname", "filename", "filedatetime"')

    for uid in uidDataList:
        hexdigits = binascii.hexlify(uid.to_bytes(6, byteorder="big")).decode()
        metadata = db.GetMetaFromUID(uid)
        blocksnum = db.GetBlocksCountFromUID(uid)
        filename = metadata["filename"] if "filename" in metadata else ""
        sbxname = metadata["sbxname"] if "sbxname" in metadata else ""
        if "filesize" in metadata:
            filesize = metadata["filesize"]
        else:
            filesize = blocksnum * blocksizes[uidDataList[uid]]
        filedatetime = "n/a"
        if "filedatetime" in metadata:
            if metadata["filedatetime"] >= 0:
                filedatetime = time.strftime("%Y-%m-%d %H:%M:%S",
                                             time.localtime(metadata["filedatetime"]))

        print('"%s", %i, "%s", "%s", "%s"' %
              (hexdigits, filesize, sbxname, filename, filedatetime))


def report_err(db, uiderrlist, uidDataList, blocksizes):
    """Create a report with recovery errors"""
    #just the basic info in CSV format for the moment

    print('\n"UID", "blocks", "errs", "filesize", "sbxname", "filename"')
    for info in uiderrlist:
        uid = info[0]
        errblocks = info[1]
        hexdigits = binascii.hexlify(uid.to_bytes(6, byteorder="big")).decode()
        metadata = db.GetMetaFromUID(uid)
        blocksnum = db.GetBlocksCountFromUID(uid)
        filename = metadata["filename"] if "filename" in metadata else ""
        sbxname = metadata["sbxname"] if "sbxname" in metadata else ""

        if "filesize" in metadata:
            filesize = metadata["filesize"]
        else:
            filesize = blocksnum * blocksizes[uidDataList[uid]]

        print('"%s", %i, %i, %i, "%s", "%s"' %
              (hexdigits, blocksnum, errblocks, filesize, sbxname, filename))


def main():

    cmdline = get_cmdline()

    dbfilename = cmdline.dbfilename
    if not os.path.exists(dbfilename) or os.path.isdir(dbfilename):
        errexit(1,"file '%s' not found!" % (dbfilename))

    #open database
    print("opening '%s' recovery info database..." % (dbfilename))
    db = RecDB(dbfilename)

    #get data on all uids present
    uidDataList = db.GetUIDDataList()

    #get blocksizes for every supported SBx version
    blocksizes = {}
    for v in seqbox.supported_vers:
        blocksizes[v] = seqbox.SbxBlock(ver=v).blocksize

    #info/report
    if cmdline.info:
        report(db, uidDataList, blocksizes)
        errexit(0)

    #build a list of uids to recover:
    uidRecoList = []
    if cmdline.all:
        uidRecoList = list(uidDataList)
    else:
        if cmdline.uid:
            for hexuid in cmdline.uid:
                if len(hexuid) % 2 != 0:
                    errexit(1, "invalid UID!")
                uid = int.from_bytes(binascii.unhexlify(hexuid),
                                     byteorder="big")
                if db.GetBlocksCountFromUID(uid):
                    uidRecoList.append(uid)
                else:
                    errexit(1,"no recoverable UID '%s'" % (hexuid))
        if cmdline.sbx:
            for sbxname in cmdline.sbx:
                uid = db.GetUIDFromSbxName(sbxname)
                if uid:
                    uidRecoList.append(uid)
                else:
                    errexit(1,"no recoverable \
sbx file '%s'" % (sbxname))
        if cmdline.file:
            for filename in cmdline.file:
                uid = db.GetUIDFromFileName(filename)
                if uid:
                    uidRecoList.append(uid)
                else:
                    errexit(1,"no recoverable file '%s'" % (filename))

    if len(uidRecoList) == 0:
        errexit(1, "nothing to recover!")

    print("recovering SBX files...")
    uid_list = sorted(set(uidRecoList))

    #open all the sources
    finlist = {}
    for key, value in db.GetSourcesList():
        finlist[key] = open(value, "rb")

    uidcount = 0
    totblocks = 0
    totblockserr = 0
    uiderrlist = []
    #iterate the de-duplicated list, so a UID selected twice (e.g. by
    #both --uid and --sbx) is recovered once and the counter stays right
    for uid in uid_list:
        uidcount += 1
        sbxver = uidDataList[uid]
        sbx = seqbox.SbxBlock(ver=sbxver, pswd=cmdline.password)
        hexuid = binascii.hexlify(uid.to_bytes(6, byteorder="big")).decode()
        print("UID %s (%i/%i)" % (hexuid, uidcount, len(uid_list)))

        blocksnum = db.GetBlocksCountFromUID(uid)
        print(" blocks: %i - size: %i bytes" %
              (blocksnum, blocksnum * sbx.blocksize))
        meta = db.GetMetaFromUID(uid)
        if "sbxname" in meta:
            sbxname = meta["sbxname"]
        else:
            #use hex uid as name if no metadata present
            sbxname = (binascii.hexlify(uid.to_bytes(6, byteorder="big")).decode() +
                       ".sbx")
        if cmdline.destpath:
            sbxname = os.path.join(cmdline.destpath, sbxname)
        print(" to: '%s'" % sbxname)

        if not cmdline.overwrite:
            sbxname = uniquifyFileName(sbxname)
        fout = open(sbxname, "wb", buffering = 1024*1024)

        blockdatalist = db.GetBlocksListFromUID(uid)
        #read 1 block to initialize the correct block parameters
        #(needed for filling in missing blocks)
        blockdata = blockdatalist[0]
        fin = finlist[blockdata[1]]
        bpos = blockdata[2]
        fin.seek(bpos, 0)
        try:
            sbx.decode(fin.read(sbx.blocksize))
        except seqbox.SbxDecodeError as err:
            print(err)
            errexit(1, "invalid \
block at offset %s file '%s'" %
                    (hex(bpos), fin.name))

        lastblock = -1
        ticks = 0
        missingblocks = 0
        updatetime = time.time() -1
        maxbnum = blockdatalist[-1][0]
        #loop through the block list and recreate SBx file
        for blockdata in blockdatalist:
            bnum = blockdata[0]
            #check for missing blocks and fill in
            if bnum != lastblock +1 and bnum != 1:
                for b in range(lastblock+1, bnum):
                    #no point in an empty block 0 with no metadata
                    if b > 0 and cmdline.fill:
                        sbx.blocknum = b
                        sbx.data = bytes(sbx.datasize)
                        buffer = sbx.encode()
                        fout.write(buffer)
                    missingblocks += 1

            fin = finlist[blockdata[1]]
            bpos = blockdata[2]
            fin.seek(bpos, 0)
            buffer = fin.read(sbx.blocksize)
            fout.write(buffer)
            lastblock = bnum

            #some progress report
            if time.time() > updatetime or bnum == maxbnum:
                print(" %.1f%%" % (bnum*100.0/maxbnum), " ",
                      "(missing blocks: %i)" % missingblocks,
                      end="\r", flush=True)
                updatetime = time.time() + .5

        fout.close()
        #set sbx date&time
        if "sbxdatetime" in meta:
            if meta["sbxdatetime"] >= 0:
                os.utime(sbxname, (int(time.time()), meta["sbxdatetime"]))

        print()
        if missingblocks > 0:
            uiderrlist.append((uid, missingblocks))
            totblockserr += missingblocks

    print("\ndone.")
    if len(uiderrlist) == 0:
        print("all SBx files recovered with no errors!")
    else:
        print("errors detected in %i SBx file(s)!"
              % len(uiderrlist))
        report_err(db, uiderrlist, uidDataList, blocksizes)


if __name__ == '__main__':
    main()

--------------------------------------------------------------------------------
/sbxscan.py:
--------------------------------------------------------------------------------
#!/usr/bin/env python3

#--------------------------------------------------------------------------
# SBXScan - Sequenced Box container Scanner
#
# Created: 06/03/2017
#
# Copyright (C) 2017 Marco Pontello - http://mark0.net/
#
# Licence:
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as
# published by the Free Software Foundation, either version 3 of the
# License, or (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU Affero General Public License for more details.
#
# You should have received a copy of the GNU Affero General Public License
# along with this program. If not, see .
#
#--------------------------------------------------------------------------

import os
import sys
import argparse
import binascii
from time import sleep, time
import sqlite3

import seqbox

PROGRAM_VER = "1.0.1"

def get_cmdline():
    """Evaluate command line parameters, usage & help."""
    parser = argparse.ArgumentParser(
             description=("scan files/devices for SBx blocks and create a "+
                          "detailed report plus an index to be used with "+
                          "SBXReco"),
             formatter_class=argparse.ArgumentDefaultsHelpFormatter,
             prefix_chars='-', fromfile_prefix_chars='@')
    parser.add_argument("-v", "--version", action='version',
                        version='SeqBox - Sequenced Box container - ' +
                        'Scanner v%s - (C) 2017 by M.Pontello' % PROGRAM_VER)
    parser.add_argument("filename", action="store", nargs="+",
                        help="file(s) to scan")
    parser.add_argument("-d", "--database", action="store", dest="dbfilename",
                        metavar="filename",
                        help="where to save recovery info",
                        default="sbxscan.db3")
    parser.add_argument("-o", "--offset", type=int, default=0,
                        help=("offset from the start"), metavar="n")
    parser.add_argument("-st", "--step", type=int, default=0,
                        help=("scan step"), metavar="n")
    parser.add_argument("-b", "--buffer", type=int, default=1024,
                        help=("read buffer in KB"), metavar="n")
    parser.add_argument("-sv", "--sbxver", type=int, default=1,
                        help="SBX blocks version to search for", metavar="n")
    parser.add_argument("-p", "--password", type=str, default="",
                        help="encrypt with password", metavar="pass")
    res = parser.parse_args()
    return res


def errexit(errlev=1, mess=""):
    """Display an error and exit."""
    if mess != "":
        sys.stderr.write("%s: error: %s\n" %
                         (os.path.split(sys.argv[0])[1], mess))
    sys.exit(errlev)


def getFileSize(filename):
    """Calc file size - works
    on devices too"""
    ftemp = os.open(filename, os.O_RDONLY)
    try:
        return os.lseek(ftemp, 0, os.SEEK_END)
    finally:
        os.close(ftemp)


def main():

    cmdline = get_cmdline()

    filenames = []
    for filename in cmdline.filename:
        if os.path.exists(filename):
            filenames.append(filename)
        else:
            errexit(1, "file '%s' not found!" % (filename))
    filenames = sorted(set(filenames), key=os.path.getsize)

    dbfilename = cmdline.dbfilename
    if os.path.isdir(dbfilename):
        dbfilename = os.path.join(dbfilename, "sbxscan.db3")

    #create database tables
    print("creating '%s' database..." % (dbfilename))
    if os.path.exists(dbfilename):
        os.remove(dbfilename)
    conn = sqlite3.connect(dbfilename)
    c = conn.cursor()
    c.execute("CREATE TABLE sbx_source (id INTEGER, name TEXT)")
    c.execute("CREATE TABLE sbx_meta (uid INTEGER, size INTEGER, name TEXT, sbxname TEXT, datetime INTEGER, sbxdatetime INTEGER, fileid INTEGER)")
    c.execute("CREATE TABLE sbx_uids (uid INTEGER, ver INTEGER)")
    c.execute("CREATE TABLE sbx_blocks (uid INTEGER, num INTEGER, fileid INTEGER, pos INTEGER )")
    c.execute("CREATE INDEX blocks ON sbx_blocks (uid, num, pos)")

    #scan all the files/devices
    sbx = seqbox.SbxBlock(ver=cmdline.sbxver,pswd=cmdline.password)
    offset = cmdline.offset
    filenum = 0
    uids = {}
    magic = b'SBx' + bytes([cmdline.sbxver])
    if cmdline.password:
        magic = seqbox.EncDec(cmdline.password, len(magic)).xor(magic)
    scanstep = cmdline.step
    if scanstep == 0:
        scanstep = sbx.blocksize

    for filename in filenames:
        filenum += 1
        print("scanning file/device '%s' (%i/%i)..."
              % (filename, filenum, len(filenames)))
        filesize = getFileSize(filename)

        c.execute("INSERT INTO sbx_source (id, name) VALUES (?, ?)",
                  (filenum, filename))
        conn.commit()

        fin = open(filename, "rb", buffering=cmdline.buffer*1024)
        blocksfound = 0
        blocksmetafound = 0
        updatetime = time() - 1
        starttime = time()
        docommit = False
        for pos in range(offset, filesize, scanstep):
            fin.seek(pos, 0)
            buffer = fin.read(sbx.blocksize)
            #check for magic
            if buffer[:4] == magic:
                #check for valid block
                try:
                    sbx.decode(buffer)
                    #update uids table & list
                    if not sbx.uid in uids:
                        uids[sbx.uid] = True
                        c.execute(
                            "INSERT INTO sbx_uids (uid, ver) VALUES (?, ?)",
                            (int.from_bytes(sbx.uid, byteorder='big'),
                             sbx.ver))
                        docommit = True

                    #update blocks table
                    blocksfound+=1
                    c.execute(
                        "INSERT INTO sbx_blocks (uid, num, fileid, pos) VALUES (?, ?, ?, ?)",
                        (int.from_bytes(sbx.uid, byteorder='big'),
                         sbx.blocknum, filenum, pos))
                    docommit = True

                    #update meta table
                    if sbx.blocknum == 0:
                        blocksmetafound += 1
                        if not "filedatetime" in sbx.metadata:
                            sbx.metadata["filedatetime"] = -1
                            sbx.metadata["sbxdatetime"] = -1

                        c.execute(
                            "INSERT INTO sbx_meta (uid , size, name, sbxname, datetime, sbxdatetime, fileid) VALUES (?, ?, ?, ?, ?, ?, ?)",
                            (int.from_bytes(sbx.uid, byteorder='big'),
                             sbx.metadata["filesize"],
                             sbx.metadata["filename"], sbx.metadata["sbxname"],
                             sbx.metadata["filedatetime"], sbx.metadata["sbxdatetime"],
                             filenum))
                        docommit = True

                except seqbox.SbxDecodeError:
                    pass

            #status update
            if (time() > updatetime) or (pos >= filesize - scanstep):
                etime = (time()-starttime)
                if etime == 0:
                    etime = 1
                print("%5.1f%% blocks: %i - meta: %i - files: %i - \
%.2fMB/s" %
                      #max() avoids a division by zero on a one-block file
                      (pos*100.0/max(filesize-scanstep, 1), blocksfound,
                      blocksmetafound, len(uids), pos/(1024*1024)/etime),
                      end = "\r", flush=True)
                if docommit:
                    conn.commit()
                    docommit = False
                updatetime = time() + .5

        fin.close()
        print()

    c.close()
    conn.close()

    print("scan completed!")


if __name__ == '__main__':
    main()

--------------------------------------------------------------------------------
/seqbox.bt:
--------------------------------------------------------------------------------
//--------------------------------------
//--- 010 Editor v6.0.3 Binary Template
//
// File: seqbox.bt
// Author: Marco Pontello
// Revision: 1
// Purpose: Explore SeqBox container
// https://github.com/MarcoPon/SeqBox
//--------------------------------------

local int BLOCKSIZE = 512;

BigEndian();
DisplayFormatHex();

struct BLOCK {
    struct HEADER {
        struct MAGIC {
            char signature[3];
        } magic;
        byte version;
        short CRC16;
        struct UID {
            byte uid[6];
        } uid;
        int blocknum;
    } header;
    byte data[BLOCKSIZE - sizeof(header)];
} block[FileSize() / BLOCKSIZE];


--------------------------------------------------------------------------------
/seqbox.py:
--------------------------------------------------------------------------------
#!/usr/bin/env python3

#--------------------------------------------------------------------------
# SeqBox - Sequenced Box container module
#
# Created: 03/03/2017
#
# Copyright (C) 2017 Marco Pontello - http://mark0.net/
#
# Licence:
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as
# published by the Free Software Foundation, either version 3 of the
# License, or (at your
# option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU Affero General Public License for more details.
#
# You should have received a copy of the GNU Affero General Public License
# along with this program. If not, see .
#
#--------------------------------------------------------------------------

import os
import sys
import binascii
import random
import hashlib

supported_vers = [1, 2, 3]


#Some custom exceptions
class SbxError(Exception):
    pass

class SbxDecodeError(SbxError):
    pass


class SbxBlock():
    """
    Implement a basic SBX block
    """

    def __init__(self, ver=1, uid="r", pswd=""):
        self.ver = ver
        if ver == 1:
            self.blocksize = 512
            self.hdrsize = 16
        elif ver == 2:
            #mostly a test to double check that all tools work correctly
            #with different blocks versions/parameters.
            #or it could be good for CP/M!
            # :)
            self.blocksize = 128
            self.hdrsize = 16
        elif ver == 3:
            #and another one for big blocks, to be used only if absolutely
            #sure that the SBX file will not be used on a system with
            #smaller blocks
            self.blocksize = 4096
            self.hdrsize = 16
        else:
            raise SbxError("version %i not supported" % ver)
        self.datasize = self.blocksize - self.hdrsize
        self.magic = b'SBx' + bytes([ver])
        self.blocknum = 0


        if uid == "r":
            random.seed()
            self.uid = random.getrandbits(6*8).to_bytes(6, byteorder='big')
        else:
            self.uid = (b'\x00'*6 + uid)[-6:]

        if pswd:
            self.encdec = EncDec(pswd, self.blocksize)
        else:
            self.encdec = False

        self.parent_uid = 0
        self.metadata = {}
        self.data = b""

    def __str__(self):
        return "SBX Block ver: '%i', size: %i, hdr size: %i, data: %i" % \
               (self.ver, self.blocksize, self.hdrsize, self.datasize)

    def encode(self):
        if self.blocknum == 0:
            self.data = b""
            if "filename" in self.metadata:
                bb = self.metadata["filename"].encode()
                self.data += b"FNM" + bytes([len(bb)]) + bb
            if "sbxname" in self.metadata:
                bb = self.metadata["sbxname"].encode()
                self.data += b"SNM" + bytes([len(bb)]) + bb
            if "filesize" in self.metadata:
                bb = self.metadata["filesize"].to_bytes(8, byteorder='big')
                self.data += b"FSZ" + bytes([len(bb)]) + bb
            if "filedatetime" in self.metadata:
                bb = self.metadata["filedatetime"].to_bytes(8, byteorder='big')
                self.data += b"FDT" + bytes([len(bb)]) + bb
            if "sbxdatetime" in self.metadata:
                bb = self.metadata["sbxdatetime"].to_bytes(8, byteorder='big')
                self.data += b"SDT" + bytes([len(bb)]) + bb
            if "hash" in self.metadata:
                bb = self.metadata["hash"]
                self.data += b"HSH" + bytes([len(bb)]) + bb

        data = self.data + b'\x1A' * (self.datasize - len(self.data))
        buffer = (self.uid +
                  self.blocknum.to_bytes(4, byteorder='big') +
                  data)
        crc = binascii.crc_hqx(buffer, self.ver).to_bytes(2,byteorder='big')
        block = self.magic + crc + buffer
        if self.encdec:
            block = self.encdec.xor(block)
        return block

    def decode(self, buffer):
        #start setting an invalid block number
        self.blocknum = -1
        #undo the password encoding, if any
        if self.encdec:
            buffer = self.encdec.xor(buffer)
        #check the basics
        if len(buffer) != self.blocksize:
            raise SbxDecodeError("bad block size")
        if buffer[:3] != self.magic[:3]:
            raise SbxDecodeError("not an SBX block")
        if not buffer[3] in supported_vers:
            raise SbxDecodeError("block v%i not supported" % buffer[3])

        #check CRC of rest of the block
        crc = int.from_bytes(buffer[4:6], byteorder='big')
        if crc != binascii.crc_hqx(buffer[6:], self.ver):
            raise SbxDecodeError("bad CRC")

        self.parent_uid = 0

        self.uid = buffer[6:12]
        self.blocknum = int.from_bytes(buffer[12:16], byteorder='big')
        self.data = buffer[16:]

        self.metadata = {}

        if self.blocknum == 0:
            #decode meta data
            p = 0
            while p < (len(self.data)-3):
                metaid = self.data[p:p+3]
                p+=3
                if metaid == b"\x1a\x1a\x1a":
                    break
                else:
                    metalen = self.data[p]
                    metabb = self.data[p+1:p+1+metalen]
                    p = p + 1 + metalen
                    if metaid == b'FNM':
                        self.metadata["filename"] = metabb.decode('utf-8')
                    if metaid == b'SNM':
                        self.metadata["sbxname"] = metabb.decode('utf-8')
                    if metaid == b'FSZ':
                        self.metadata["filesize"] = int.from_bytes(metabb, byteorder='big')
                    if metaid == b'FDT':
                        self.metadata["filedatetime"] = int.from_bytes(metabb, byteorder='big')
                    if metaid == b'SDT':
                        self.metadata["sbxdatetime"] = int.from_bytes(metabb, byteorder='big')
                    if metaid == b'HSH':
                        self.metadata["hash"] = metabb
        return True


class EncDec():
    """Simple encoding/decoding function"""
    #it's not meant as 'strong encryption', but just to hide the presence
    #of SBX blocks on a simple scan
    def __init__(self, key, size):
        #key is kept as a bigint because a xor between two bigint is faster
        #than byte-by-byte
        d = hashlib.sha256()
        key = key.encode()
        tempkey = key
        while len(tempkey) < size:
            d.update(tempkey)
            key = d.digest()
            tempkey += key
        self.key = int(binascii.hexlify(tempkey[:size]), 16)
    def xor(self, buffer):
        num = int.from_bytes(buffer, byteorder='big') ^ self.key
        #keep the leading zero bytes: going through hex()/unhexlify drops
        #them and fails on an odd number of hex digits
        length = max(len(buffer), (num.bit_length() + 7) // 8)
        return num.to_bytes(length, byteorder='big')


def main():
    print("SeqBox module!")
    sys.exit(0)

if __name__ == '__main__':
    main()

--------------------------------------------------------------------------------
/todo.txt:
--------------------------------------------------------------------------------
- 27/02/2017 blake2 would be better than SHA256, but requires Python >=3.6.
  SHA256 should be good enough for the moment.

- 12/03/2017 check if struct.pack&unpack is faster than to/from_bytes

- 21/05/2017 prioritize metadata order; check if there's enough space
  to write them
--------------------------------------------------------------------------------
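
The block layout described in seqbox.bt and implemented by SbxBlock (3-byte signature, version byte, CRC16, 6-byte UID, 4-byte block number, then data) can also be expressed with the struct module, which is the alternative the todo list mentions. The following is only a minimal sketch for unencrypted SBX v1 blocks: pack_block, unpack_block, HEADER and BLOCKSIZE are hypothetical names used for illustration and are not part of seqbox.py.

```python
import binascii
import struct

# Big-endian header: signature, version, CRC16, UID, block number (16 bytes).
HEADER = struct.Struct(">3sBH6sI")
BLOCKSIZE = 512  # SBX v1 block size


def pack_block(uid, blocknum, data, ver=1):
    """Build one unencrypted SBX v1 block (hypothetical helper)."""
    datasize = BLOCKSIZE - HEADER.size
    if len(uid) != 6 or len(data) > datasize:
        raise ValueError("bad UID length or too much data for one block")
    # Pad the data with 0x1A, as SbxBlock.encode() does.
    payload = data + b"\x1a" * (datasize - len(data))
    # The CRC covers everything after the CRC field, seeded with the version.
    body = uid + blocknum.to_bytes(4, byteorder="big") + payload
    crc = binascii.crc_hqx(body, ver)
    return HEADER.pack(b"SBx", ver, crc, uid, blocknum) + payload


def unpack_block(block):
    """Return (uid, blocknum, data); raise ValueError on an invalid block."""
    if len(block) != BLOCKSIZE:
        raise ValueError("bad block size")
    magic, ver, crc, uid, blocknum = HEADER.unpack_from(block)
    if magic != b"SBx":
        raise ValueError("not an SBX block")
    if crc != binascii.crc_hqx(block[6:], ver):
        raise ValueError("bad CRC")
    return uid, blocknum, block[HEADER.size:]


block = pack_block(b"\x00\x01\x02\x03\x04\x05", 7, b"hello")
uid, blocknum, data = unpack_block(block)
print(uid.hex(), blocknum, data[:5])
```

Whether this is actually faster than the to_bytes/from_bytes calls used in seqbox.py is left open here, as in the todo list; the sketch only shows that both approaches describe the same 16-byte header.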