├── .gitignore
├── LICENSE-MIT.txt
├── README.md
├── sbx-logo.svg
├── sbxdec.py
├── sbxenc.py
├── sbxreco.py
├── sbxscan.py
├── seqbox.bt
├── seqbox.py
└── todo.txt
/.gitignore:
--------------------------------------------------------------------------------
1 | misc_tools/
2 | *.pyc
3 | *.zip
4 | *.dat
5 | *.sbx
6 | *.seqbox
7 | *.db3
--------------------------------------------------------------------------------
/LICENSE-MIT.txt:
--------------------------------------------------------------------------------
1 | Copyright (c) 2017 Marco Pontello
2 |
3 | Permission is hereby granted, free of charge, to any person obtaining a copy
4 | of this software and associated documentation files (the "Software"), to deal
5 | in the Software without restriction, including without limitation the rights
6 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
7 | copies of the Software, and to permit persons to whom the Software is
8 | furnished to do so, subject to the following conditions:
9 |
10 | The above copyright notice and this permission notice shall be included in all
11 | copies or substantial portions of the Software.
12 |
13 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
14 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
15 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
16 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
17 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
18 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
19 | SOFTWARE.
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # SeqBox - Sequenced Box container
2 | ### A single file container/archive that can be reconstructed even after total loss of file system structures.
3 | 
4 |
5 | An SBX container exists both as a normal file in a mounted file system, and as a collection of recognizable blocks at a lower level.
6 |
7 | SBX blocks have a size that is a sub-multiple of (or equal to) that of a sector, so they can survive any level of fragmentation. Each block has a minimal header that includes a unique file identifier, block sequence number, checksum and version.
8 | Additional, non-critical info/metadata is contained in block 0 (like name, file size, crypto-hash, other attributes, etc.).
9 |
10 | If disaster strikes, recovery can be performed simply by scanning a volume/image, reading sector-sized slices and checking block signatures and then CRCs to detect valid SBX blocks. The blocks can then be grouped by UID, sorted by sequence number and reassembled to form the original SeqBox containers.
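
In code terms, and using the seqbox module included in this repository, the core of that scan-and-reassemble loop could look like the following minimal sketch (the image name, the fixed v1 block size and the absence of progress reporting or error handling are simplifying assumptions; SBXScan and SBXReco do the real work):

```
import seqbox

found = {}                          #uid -> {blocknum: raw block}
sbx = seqbox.SbxBlock(ver=1)
with open("image.dd", "rb") as fin:
    while True:
        buffer = fin.read(sbx.blocksize)
        if len(buffer) < sbx.blocksize:
            break
        try:
            sbx.decode(buffer)      #checks signature, version and CRC
        except seqbox.SbxDecodeError:
            continue                #not a valid SBX block: skip it
        found.setdefault(sbx.uid, {})[sbx.blocknum] = buffer

#group by UID, sort by sequence number, reassemble the containers
for uid, blocks in found.items():
    with open(uid.hex() + ".sbx", "wb") as fout:
        for num in sorted(blocks):
            fout.write(blocks[num])
```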
11 |
12 | 
13 |
14 | It's also possible and entirely transparent to keep multiple copies of a container, in the same or different media, to increase the chances of recoverability. In case of corrupted blocks, all the good ones can be collected and reassembled from all available sources.
15 |
16 | The UID can be anything, as long as it's unique for the specific application. It could be randomly generated (probably the most common option), a hash of the file content, a simple sequence, etc.
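
For reference, the random option used by the bundled tools boils down to 6 random bytes (the 12-hexdigit UIDs shown later in the demo); deriving the UID from a content hash, sketched below, is just a hypothetical alternative:

```
import random
import hashlib

#random 6-byte UID, as SBXEnc does by default
uid = random.getrandbits(6*8).to_bytes(6, byteorder='big')

#hypothetical alternative: derive the UID from the file content
with open("Lake.jpg", "rb") as fin:
    uid = hashlib.sha256(fin.read()).digest()[:6]
```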
17 |
18 | Overhead is minimal: for SBX v1 it's 16B for every 512B block (+ 1 optional 512B metadata block), or < 3.5%.
19 |
20 | ## Demo tour
21 |
22 | The two main tools are obviously the encoder & decoder:
23 | - SBXEnc: encode a file to a SBX container
24 | - SBXDec: decode an SBX container back to the original file; it can also show info on a container and test its integrity against a crypto-hash
25 |
26 | The other two are the recovery tools:
27 | - SBXScan: scan a set of files (raw images, or even block devices on Linux) to build a Sqlite db with the necessary recovery info
28 | - SBXReco: rebuild SBX files using data collected by SBXScan
29 |
30 | There are in some cases many parameters, but the defaults are sensible, so usage is generally pretty simple.
31 |
32 | Now to a practical example: let's see how 2 photos and their 2 SBX-encoded versions go through a fragmented floppy disk that has lost its FAT (and every other system area). We start with the 2 pictures, about 200KB and 330KB:
33 |
34 |  
35 |
36 | We encode using SBXEnc, and then test the new file with SBXDec, to be sure all is OK:
37 |
38 | ```
39 | C:\t>sbxenc Lake.jpg
40 | hashing file 'Lake.jpg'...
41 | SHA256 3cfc376b6362444d2d25ebedb19e7594000f2ce2bdbb521d98f6c59b5adebfdc
42 | creating file 'Lake.jpg.sbx'...
43 | 100%
44 | SBX file size: 343040 - blocks: 670 - overhead: 3.4%
45 |
46 | C:\t>sbxdec -t Lake.jpg.sbx
47 | decoding 'Lake.jpg.sbx'...
48 | metadata block found!
49 | SBX decoding complete
50 | SHA256 3cfc376b6362444d2d25ebedb19e7594000f2ce2bdbb521d98f6c59b5adebfdc
51 | hash match!
52 | ```
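
Those figures follow directly from the format: a 512-byte v1 block carries 512 - 16 = 496 data bytes, plus there is one optional metadata block. A quick back-of-the-envelope check against the output above (Lake.jpg is 331774 bytes, as the recovery report further down confirms):

```
import math

filesize = 331774                            #Lake.jpg
datasize = 512 - 16                          #payload bytes per v1 block
blocks = math.ceil(filesize / datasize) + 1  #+1 for the optional metadata block
sbxsize = blocks * 512
print(blocks, sbxsize, "%.1f%%" % (sbxsize * 100.0 / filesize - 100))
#-> 670 343040 3.4%
```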
53 |
54 | Same for the other file. Now we put both the JPEGs and the SBX files in a floppy disk image that is already about half full and has gone through various cycles of updating and deleting. As a result, the data is laid out like this:
55 |
56 | 
57 |
58 | Normal files (pictures included) are in green, and the two SBX files are in different shades of blue.
59 | Then, with a hex editor, we zap the first system sectors and the FAT (in red)!
60 | Time for recovery!
61 |
62 | We start with the free (GPL v2+) [PhotoRec](http://www.cgsecurity.org/wiki/PhotoRec), which is the go-to tool for this kind of job. Parameters are set to "Paranoid : YES (Brute force enabled)" & "Keep corrupted files : Yes", to search the entire data area.
63 | As the files are fragmented, we know we can't expect miracles. The starting sectors of the photos will surely be found, but as soon as the first contiguous fragment ends, it's anyone's guess.
64 |
65 | 
66 |
67 | As expected, something has been recovered. But the 2 file sizes are off (280KB and 400KB). The very first parts of the photos are OK, but then they degrade quickly as other random blocks of data were mixed in. We have all seen JPEGs ending up like this:
68 |
69 |  
70 |
71 | Other popular recovery tools lead to the same results. It's not anyone's fault: it's just not possible to know how the various fragments are concatenated without an index or some kind of list (there are approaches based on file-type validators that can, at least in some cases, differentiate between spurious and *valid* blocks, but that's beside the point).
72 |
73 | But with an SBX file it's a different story. Each of its blocks can't be fragmented any further, and contains all the data needed to put it back in its proper place in the sequence. So let's proceed with the recovery of the SBX files.
74 | To spice things up, the disk image file is run through a scrambler that swaps variable-sized blocks of sectors around. The resulting layout is now this:
75 |
76 | 
77 |
78 | Pretty nightmarish! Now on to SBXScan to search for pieces of SBX files around, and SBXReco to get a report of the collected data:
79 |
80 | ```
81 | C:\t\recovered\sbx>sbxscan \t\scrambled.IMA
82 | creating 'sbxscan.db3' database...
83 | scanning file/device '\t\scrambled.IMA' (1/1)...
84 | 100.0% blocks: 1087 - meta: 2 - files: 2 - 89.97MB/s
85 | scan completed!
86 |
87 | C:\t\recovered\sbx>sbxreco sbxscan.db3 -i
88 | opening 'sbxscan.db3' recovery info database...
89 |
90 | "UID", "filesize", "sbxname", "filename"
91 | "2818b123c00b", 206292, "Castle.jpg.sbx", "Castle.jpg"
92 | "76fe4a49ebf2", 331774, "Lake.jpg.sbx", "Lake.jpg"
93 | ```
94 |
95 | The 2 SBX containers have been found, with all the metadata. So the original file sizes are also known, along with the names of the SBX files and of the original ones. At this point it would be possible to recover single files or a group of them, by UID or by name, but we opt to recover everything:
96 |
97 | ```
98 | C:\t\recovered\sbx>sbxreco sbxscan.db3 --all
99 | opening 'sbxscan.db3' recovery info database...
100 | recovering SBX files...
101 | UID 2818b123c00b (1/2)
102 | blocks: 417 - size: 213504 bytes
103 | to: 'Castle.jpg.sbx'
104 | 100.0% (missing blocks: 0)
105 | UID 76fe4a49ebf2 (2/2)
106 | blocks: 670 - size: 343040 bytes
107 | to: 'Lake.jpg.sbx'
108 | 100.0% (missing blocks: 0)
109 |
110 | done.
111 | all SBx files recovered with no errors!
112 | ```
113 |
114 | All SBX files seem to have been recovered correctly. We start decoding:
115 |
116 | ```
117 | C:\t\recovered\sbx>sbxdec Lake.jpg.sbx
118 | decoding 'Lake.jpg.sbx'...
119 | metadata block found!
120 | creating file 'Lake.jpg'...
121 | SBX decoding complete
122 | SHA256 3cfc376b6362444d2d25ebedb19e7594000f2ce2bdbb521d98f6c59b5adebfdc
123 | hash match!
124 | ```
125 |
126 | And sure enough:
127 |
128 |  
129 |
130 | N.B. Here's a [7-Zip archive](http://mark0.net/download/sbxdemo-diskimages.7z) with the 2 disk images used in the demo (542KB).
131 |
132 | ## Possible / hypothetical / ideal use cases
133 | - **Last step of a backup**. After creating a compressed archive of something, the archive could be SeqBox-encoded to increase recovery chances in the event of software/hardware issues that cause logical / file system damage.
134 | - **Exchange data between different systems**. Regardless of the file system used, an SBX container can always be read/extracted.
135 | - **Long term storage**. Since each block is CRC-tagged, and a crypto-hash of the original content is stored, bitrot can be easily detected. In addition, if multiple copies are stored, on the same or different media, the container can be correctly restored with a high degree of probability even if all the copies have suffered some damage (in different blocks).
136 | - **Encoding of photos on an SD card**. Loss of images on perfectly functioning SD cards is a known occurrence in the photography world, for example when low on battery and maybe with camera firmware that has suboptimal monitoring & management strategies. If the photo files are fragmented, recovery tools can usually help only up to a point.
137 | - **On-disk format for a File System**. The trade-off in file size and performance (both should be fairly minimal anyway) could be interesting for some applications. Maybe it could be a simple option (like compression in many file systems). I plan to build a simple/toy FS with FUSE to test the concept, time permitting.
138 | - **Easy file splitting**. Probably less interesting, but a SeqBox container can also be split with no particular precautions, aside from doing so on block-size boundaries. Additionally, there's no need for special naming conventions, file numbering, etc., as the SBX container can be reassembled exactly as in a recovery.
139 | - **Data hiding**. SeqBox containers (or even fragments of them) can be put inside other files (for example at the end of a JPEG, in the middle of a document, etc.), sprayed somewhere in the unused space, between partitions, and so on.
140 | Incidentally, that means that if you are in the digital forensics sector, now you have one more thing to check for!
141 | If a password is used, the entire SBX file is *mangled* to look pseudo-random, and SBXScan, SBXReco & SBXDec will not be able to recognize it unless the same password is provided.
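
The *mangling* is intentionally lightweight: the whole block is XORed with a keystream derived from the password via iterated SHA256 (see the EncDec class in seqbox.py), so without the password not even the 'SBx' signature is visible. A minimal sketch of the effect, with a made-up password:

```
import seqbox

sbx = seqbox.SbxBlock(ver=1, pswd="mypassword")   #made-up password
sbx.blocknum = 1
sbx.data = b"some data"
block = sbx.encode()            #the whole block is XORed with the keystream
print(block[:4])                #no 'SBx' signature in sight

plain = seqbox.SbxBlock(ver=1)  #without the password...
try:
    plain.decode(block)
except seqbox.SbxDecodeError as err:
    print("not recognized:", err)   #...the block looks like random data
```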
142 |
143 | ## Tests
144 |
145 | SeqBox recoverability has been practically tested with a number of file systems. The procedure involved using a virtual machine (or a full-blown emulator) to format a small disk image with a certain FS, filling it with a number of small files, then deleting some of them at random to free enough space to copy in a series of SBX files. This way every SBX file ends up fragmented into a lot of smaller pieces. Then the image was quick-formatted, wipefs-ed and the VM shut down.
146 | After that, from the host OS, recovery of the SBX files was attempted using SBXScan & SBXReco on the disk image.
147 |
148 | - **Working**: [ADFS](https://en.wikipedia.org/wiki/Advanced_Disc_Filing_System), [AFS](https://www.alteeve.com/w/Ami_File_Safe), [AFS](https://en.wikipedia.org/wiki/AtheOS_File_System), [AFFS](https://en.wikipedia.org/wiki/Amiga_Fast_File_System), [APFS](https://en.wikipedia.org/wiki/Apple_File_System), [BeFS](https://en.wikipedia.org/wiki/Be_File_System), [BtrFS](https://en.wikipedia.org/wiki/Btrfs), [EXT2/3/4](https://en.wikipedia.org/wiki/Extended_file_system), [F2FS](https://en.wikipedia.org/wiki/F2FS), [FATnn/VFAT/exFAT](https://en.wikipedia.org/wiki/File_Allocation_Table), [HAMMER](https://en.wikipedia.org/wiki/HAMMER), [HFS](https://en.wikipedia.org/wiki/Hierarchical_File_System), [HFS+](https://en.wikipedia.org/wiki/HFS_Plus), [HPFS](https://en.wikipedia.org/wiki/High_Performance_File_System), [JFS](https://en.wikipedia.org/wiki/JFS_(file_system)), [MFS](https://en.wikipedia.org/wiki/Macintosh_File_System), [MINIX FS](https://en.wikipedia.org/wiki/MINIX_file_system), [NTFS](https://en.wikipedia.org/wiki/NTFS), [ProDOS](https://en.wikipedia.org/wiki/Apple_ProDOS), [PFS](https://en.wikipedia.org/wiki/Professional_File_System), [ReFS](https://en.wikipedia.org/wiki/ReFS), [ReiserFS](https://en.wikipedia.org/wiki/ReiserFS), [UFS](https://en.wikipedia.org/wiki/Unix_File_System), [XFS](https://en.wikipedia.org/wiki/XFS), [YAFFS](https://en.wikipedia.org/wiki/YAFFS), [ZFS](https://en.wikipedia.org/wiki/ZFS).
149 |
150 | - **Not working**: [OFS](https://en.wikipedia.org/wiki/Amiga_Old_File_System) (due to 488 data bytes per 512 bytes sector)
151 |
152 |
153 | **N.B.** Obviously SBX blocks can't be found if file system encryption is used; the same mostly (but not always) goes for compression. Striping/RAID, instead, is usually not a problem.
154 |
155 | Being written in Python 3, SeqBox tools are naturally multi-platform and have been tested successfully on various versions of Windows, on OS X & macOS, on some Linux distros (both x86 and ARM), on FreeBSD and on Android (via QPython).
156 |
157 | ***
158 |
159 | ## Tech spec
160 | Byte order: Big Endian
161 | ### Common blocks header:
162 |
163 | | pos | to pos | size | desc |
164 | |---- | --- | ---- | ----------------------------------- |
165 | | 0 | 2 | 3 | Recoverable Block signature = 'SBx' |
166 | | 3 | 3 | 1 | Version byte |
167 | | 4 | 5 | 2 | CRC-16-CCITT of the rest of the block (Version is used as starting value) |
168 | | 6 | 11 | 6 | file UID |
169 | | 12 | 15 | 4 | Block sequence number |
170 |
171 | ### Block 0
172 |
173 | | pos | to pos | size | desc |
174 | |---- | -------- | ---- | ---------------- |
175 | | 16 | n | var | encoded metadata |
176 | | n+1 | blockend | var | padding (0x1a) |
177 |
178 | ### Blocks > 0 & < last:
179 |
180 | | pos | to pos | size | desc |
181 | |---- | -------- | ---- | ---------------- |
182 | | 16 | blockend | var | data |
183 |
184 | ### Blocks == last:
185 |
186 | | pos | to pos | size | desc |
187 | |---- | -------- | ---- | ---------------- |
188 | | 16 | n | var | data |
189 | | n+1 | blockend | var | padding (0x1a) |
190 |
191 | ### Versions:
192 | N.B. Current versions differ only by block size.
193 |
194 | | ver | blocksize | note |
195 | |---- | --------- | ------- |
196 | | 1 | 512 | default |
197 | | 2 | 128 | |
198 | | 3 | 4096 | |
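
For reference, checking a raw block against the layout above needs nothing beyond the standard library: binascii.crc_hqx computes exactly the CRC-16-CCITT variant used here, seeded with the version byte. A minimal parsing sketch (password handling left out):

```
import binascii

BLOCKSIZES = {1: 512, 2: 128, 3: 4096}

def parse_block(raw):
    """Return (version, uid, blocknum, data) of a raw SBX block, or raise."""
    if raw[:3] != b"SBx" or raw[3] not in BLOCKSIZES:
        raise ValueError("not an SBX block")
    ver = raw[3]
    if len(raw) != BLOCKSIZES[ver]:
        raise ValueError("bad block size")
    crc = int.from_bytes(raw[4:6], byteorder="big")
    if crc != binascii.crc_hqx(raw[6:], ver):   #CRC of the rest, seeded with version
        raise ValueError("bad CRC")
    uid = raw[6:12]
    blocknum = int.from_bytes(raw[12:16], byteorder="big")
    return ver, uid, blocknum, raw[16:]
```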
199 |
200 | ### Metadata encoding:
201 |
202 | | Bytes | Field |
203 | | ----- | ----- |
204 | | 3 | ID |
205 | | 1 | Len |
206 | | n | Data |
207 |
208 | #### IDs
209 |
210 | | ID | Desc |
211 | | --- | --- |
212 | | FNM | filename (utf-8) |
213 | | SNM | sbx filename (utf-8) |
214 | | FSZ | filesize (8 bytes) |
215 | | FDT | date & time (8 bytes, seconds since epoch) |
216 | | SDT | sbx date & time (8 bytes) |
217 | | HSH | crypto hash (SHA256, using [Multihash](http://multiformats.io) protocol) |
218 | | PID | parent UID (*not used at the moment*)|
219 |
220 | (other IDs for file dates, attributes, etc. will be added...)
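
So block 0's payload is simply a sequence of ID/Len/Data triplets terminated by the 0x1a padding; the HSH value itself starts with the two Multihash bytes 0x12 0x20 (SHA256, 32-byte digest). A minimal decoding sketch, where payload is assumed to be the data area of block 0:

```
def parse_metadata(payload):
    """Walk the ID/Len/Data triplets of block 0 and return them as a dict."""
    meta = {}
    p = 0
    while p < len(payload) - 3:
        metaid = payload[p:p+3]
        if metaid == b"\x1a\x1a\x1a":   #reached the padding
            break
        length = payload[p+3]
        meta[metaid.decode()] = payload[p+4:p+4+length]
        p += 4 + length
    return meta

#e.g. meta["FSZ"] is the 8-byte big-endian file size,
#     meta["HSH"] is b'\x12\x20' + the 32-byte SHA256 digest
```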
221 |
222 | ## Final notes
223 | The code was quickly hacked together in spare slices of time to verify the basic idea, so it's not optimized for speed and will benefit from some refactoring, in time.
224 | Still, the current block format is stable and some precautions have been taken to ensure that any encoded file could be correctly decoded. For example, the SHA256 hash that is stored as metadata is calculated before any other file operation.
225 | So, as long as a newly created SBX file is checked as OK with SBXDec, it should be OK.
226 | Also, SBXEnc and SBXDec by default don't overwrite files, and SBXReco uniquifies the recovered ones.
227 | Finally, the file content is not altered in any way (except if a password is used), just re-framed.
228 |
229 | ## Related tools
230 |
231 | Check my [BlockHashLoc](https://github.com/MarcoPon/BlockHashLoc) for a different/synergic approach to obtaining a similar degree of recoverability, but using a parallel, small hashes file instead of a standalone container. It's probably better suited to protecting existing files, when it isn't practical to touch/re-encode them.
232 |
233 | ## Links
234 |
235 | - [SeqBox home page](http://mark0.net/soft-seqbox-e.html)
236 | - [SeqBox GitHub repository](https://github.com/MarcoPon/SeqBox)
237 |
238 | ## Contacts
239 |
240 | If you need more info, want to get in touch, or donate: [Marco Pontello](http://mark0.net/contacts-e.html)
241 |
242 | **Bitcoin**: 1Mark1tF6QGj112F5d3fQALGf41YfzXEK3
243 |
244 | 
--------------------------------------------------------------------------------
/sbx-logo.svg:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 |
--------------------------------------------------------------------------------
/sbxdec.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python3
2 |
3 | #--------------------------------------------------------------------------
4 | # SBXDec - Sequenced Box container Decoder
5 | #
6 | # Created: 03/03/2017
7 | #
8 | # Copyright (C) 2017 Marco Pontello - http://mark0.net/
9 | #
10 | # Licence:
11 | # This program is free software: you can redistribute it and/or modify
12 | # it under the terms of the GNU Affero General Public License as
13 | # published by the Free Software Foundation, either version 3 of the
14 | # License, or (at your option) any later version.
15 | #
16 | # This program is distributed in the hope that it will be useful,
17 | # but WITHOUT ANY WARRANTY; without even the implied warranty of
18 | # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
19 | # GNU Affero General Public License for more details.
20 | #
21 | # You should have received a copy of the GNU Affero General Public License
22 | # along with this program. If not, see <http://www.gnu.org/licenses/>.
23 | #
24 | #--------------------------------------------------------------------------
25 |
26 | import os
27 | import sys
28 | import hashlib
29 | import argparse
30 | import binascii
31 | import time
32 |
33 | import seqbox
34 |
35 | PROGRAM_VER = "1.0.2"
36 |
37 |
38 | def get_cmdline():
39 | """Evaluate command line parameters, usage & help."""
40 | parser = argparse.ArgumentParser(
41 | description="decode a SeqBox container",
42 | formatter_class=argparse.ArgumentDefaultsHelpFormatter,
43 | prefix_chars='-+')
44 | parser.add_argument("-v", "--version", action='version',
45 | version='SeqBox - Sequenced Box container - ' +
46 | 'Decoder v%s - (C) 2017 by M.Pontello' % PROGRAM_VER)
47 | parser.add_argument("sbxfilename", action="store", help="SBx container")
48 | parser.add_argument("filename", action="store", nargs='?',
49 | help="target/decoded file")
50 | parser.add_argument("-t","--test", action="store_true", default=False,
51 | help="test container integrity")
52 | parser.add_argument("-i", "--info", action="store_true", default=False,
53 | help="show informations/metadata")
54 | parser.add_argument("-c", "--continue", action="store_true", default=False,
55 | help="continue on block errors", dest="cont")
56 | parser.add_argument("-o", "--overwrite", action="store_true", default=False,
57 | help="overwrite existing file")
58 | parser.add_argument("-p", "--password", type=str, default="",
59 | help="encrypt with password", metavar="pass")
60 | res = parser.parse_args()
61 | return res
62 |
63 |
64 | def errexit(errlev=1, mess=""):
65 | """Display an error and exit."""
66 | if mess != "":
67 | sys.stderr.write("%s: error: %s\n" %
68 | (os.path.split(sys.argv[0])[1], mess))
69 | sys.exit(errlev)
70 |
71 |
72 | def lastEofCount(data):
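"""Count the trailing 0x1a (EOF/padding) bytes at the end of data."""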
73 | count = 0
74 | for b in range(len(data)):
75 | if data[-b-1] != 0x1a:
76 | break
77 | count +=1
78 | return count
79 |
80 |
81 | def main():
82 |
83 | cmdline = get_cmdline()
84 |
85 | sbxfilename = cmdline.sbxfilename
86 | filename = cmdline.filename
87 |
88 | if not os.path.exists(sbxfilename):
89 | errexit(1, "sbx file '%s' not found" % (sbxfilename))
90 | sbxfilesize = os.path.getsize(sbxfilename)
91 |
92 | print("decoding '%s'..." % (sbxfilename))
93 | fin = open(sbxfilename, "rb", buffering=1024*1024)
94 |
95 | #check magic and get version
96 | header = fin.read(4)
97 | fin.seek(0, 0)
98 | if cmdline.password:
99 | e = seqbox.EncDec(cmdline.password, len(header))
100 | header= e.xor(header)
101 | if header[:3] != b"SBx":
102 | errexit(1, "not a SeqBox file!")
103 | sbxver = header[3]
104 |
105 | sbx = seqbox.SbxBlock(ver=sbxver, pswd=cmdline.password)
106 | metadata = {}
107 | trimfilesize = False
108 |
109 | hashtype = 0
110 | hashlen = 0
111 | hashdigest = b""
112 | hashcheck = False
113 |
114 | buffer = fin.read(sbx.blocksize)
115 |
116 | try:
117 | sbx.decode(buffer)
118 | except seqbox.SbxDecodeError as err:
119 | if cmdline.cont == False:
120 | print(err)
121 | errexit(errlev=1, mess="invalid block at offset 0x0")
122 |
123 | if sbx.blocknum > 1:
124 | errexit(errlev=1, mess="blocks missing or out of order at offset 0x0")
125 | elif sbx.blocknum == 0:
126 | print("metadata block found!")
127 | metadata = sbx.metadata
128 | if "filesize" in metadata:
129 | trimfilesize = True
130 | if "hash" in metadata:
131 | hashtype = metadata["hash"][0]
132 | if hashtype == 0x12:
133 | hashlen = metadata["hash"][1]
134 | hashdigest = metadata["hash"][2:2+hashlen]
135 | hashcheck = True
136 |
137 | else:
138 | #first block is data, so reset from the start
139 | print("no metadata available")
140 | fin.seek(0, 0)
141 |
142 | #display some info and stop
143 | if cmdline.info:
144 | print("\nSeqBox container info:")
145 | print(" file size: %i bytes" % (sbxfilesize))
146 | print(" blocks: %i" % (sbxfilesize / sbx.blocksize))
147 | print(" version: %i" % (sbx.ver))
148 | print(" UID: %s" % (binascii.hexlify(sbx.uid).decode()))
149 | if metadata:
150 | print("metadata:")
151 | if "sbxname" in metadata:
152 | print(" SBX name : '%s'" % (metadata["sbxname"]))
153 | if "filename" in metadata:
154 | print(" file name: '%s'" % (metadata["filename"]))
155 | if "filesize" in metadata:
156 | print(" file size: %i bytes" % (metadata["filesize"]))
157 | if "sbxdatetime" in metadata:
158 | print(" SBX date&time : %s" %
159 | (time.strftime("%Y-%m-%d %H:%M:%S",
160 | time.localtime(metadata["sbxdatetime"]))))
161 | if "filedatetime" in metadata:
162 | print(" file date&time: %s" %
163 | (time.strftime("%Y-%m-%d %H:%M:%S",
164 | time.localtime(metadata["filedatetime"]))))
165 | if "hash" in metadata:
166 | if hashtype == 0x12:
167 | print(" SHA256: %s" % (binascii.hexlify(
168 | hashdigest).decode()))
169 | else:
170 | print(" hash type not recognized!")
171 | sys.exit(0)
172 |
173 | #evaluate target filename
174 | if not cmdline.test:
175 | if not filename:
176 | if "filename" in metadata:
177 | filename = metadata["filename"]
178 | else:
179 | filename = os.path.split(sbxfilename)[1] + ".out"
180 | elif os.path.isdir(filename):
181 | if "filename" in metadata:
182 | filename = os.path.join(filename, metadata["filename"])
183 | else:
184 | filename = os.path.join(filename,
185 | os.path.split(sbxfilename)[1] + ".out")
186 |
187 | if os.path.exists(filename) and not cmdline.overwrite:
188 | errexit(1, "target file '%s' already exists!" % (filename))
189 | print("creating file '%s'..." % (filename))
190 | fout= open(filename, "wb", buffering=1024*1024)
191 |
192 | if hashtype == 0x12:
193 | d = hashlib.sha256()
194 | lastblocknum = 0
195 |
196 | filesize = 0
197 | blockmiss = 0
198 | updatetime = time.time()
199 | while True:
200 | buffer = fin.read(sbx.blocksize)
201 | if len(buffer) < sbx.blocksize:
202 | break
203 |
204 | try:
205 | sbx.decode(buffer)
206 | if sbx.blocknum > lastblocknum+1:
207 | if cmdline.cont:
208 | blockmiss += 1
209 | lastblocknum += 1
210 | else:
211 | errexit(errlev=1, mess="block %i out of order or missing"
212 | % (lastblocknum+1))
213 | lastblocknum += 1
214 | if trimfilesize:
215 | filesize += sbx.datasize
216 | if filesize > metadata["filesize"]:
217 | sbx.data = sbx.data[:-(filesize - metadata["filesize"])]
218 | if hashcheck:
219 | d.update(sbx.data)
220 | if not cmdline.test:
221 | fout.write(sbx.data)
222 |
223 | except seqbox.SbxDecodeError as err:
224 | if cmdline.cont:
225 | blockmiss += 1
226 | lastblocknum += 1
227 | else:
228 | print(err)
229 | errexit(errlev=1, mess="invalid block at offset %s" %
230 | (hex(fin.tell()-sbx.blocksize)))
231 |
232 | #some progress report
233 | if time.time() > updatetime:
234 | print(" %.1f%%" % (fin.tell()*100.0/sbxfilesize),
235 | end="\r", flush=True)
236 | updatetime = time.time() + .1
237 |
238 | fin.close()
239 | if not cmdline.test:
240 | fout.close()
241 | if metadata:
242 | if "filedatetime" in metadata:
243 | os.utime(filename,
244 | (int(time.time()), metadata["filedatetime"]))
245 |
246 | print("SBX decoding complete")
247 | if blockmiss:
248 | errexit(1, "missing blocks: %i" % blockmiss)
249 |
250 | if hashcheck:
251 | if hashtype == 0x12:
252 | print("SHA256", d.hexdigest())
253 |
254 | if d.digest() == hashdigest:
255 | print("hash match!")
256 | else:
257 | errexit(1, "hash mismatch! decoded file corrupted!")
258 | else:
259 | print("can't check integrity via hash!")
260 | #if filesize unknown, estimate based on 0x1a padding at block's end
261 | if not trimfilesize:
262 | c = lastEofCount(sbx.data[-4:])
263 | print("EOF markers at the end of last block: %i/4" % c)
264 |
265 |
266 | if __name__ == '__main__':
267 | main()
268 |
--------------------------------------------------------------------------------
/sbxenc.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python3
2 |
3 | #--------------------------------------------------------------------------
4 | # SBXEnc - Sequenced Box container Encoder
5 | #
6 | # Created: 10/02/2017
7 | #
8 | # Copyright (C) 2017 Marco Pontello - http://mark0.net/
9 | #
10 | # Licence:
11 | # This program is free software: you can redistribute it and/or modify
12 | # it under the terms of the GNU Affero General Public License as
13 | # published by the Free Software Foundation, either version 3 of the
14 | # License, or (at your option) any later version.
15 | #
16 | # This program is distributed in the hope that it will be useful,
17 | # but WITHOUT ANY WARRANTY; without even the implied warranty of
18 | # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
19 | # GNU Affero General Public License for more details.
20 | #
21 | # You should have received a copy of the GNU Affero General Public License
22 | # along with this program. If not, see <http://www.gnu.org/licenses/>.
23 | #
24 | #--------------------------------------------------------------------------
25 |
26 | import os
27 | import sys
28 | import hashlib
29 | import argparse
30 | import binascii
31 | from functools import partial
32 | from time import time
33 |
34 | import seqbox
35 |
36 | PROGRAM_VER = "1.0.2"
37 |
38 | def get_cmdline():
39 | """Evaluate command line parameters, usage & help."""
40 | parser = argparse.ArgumentParser(
41 | description="create a SeqBox container",
42 | formatter_class=argparse.ArgumentDefaultsHelpFormatter,
43 | prefix_chars='-+')
44 | parser.add_argument("-v", "--version", action='version',
45 | version='SeqBox - Sequenced Box container - ' +
46 | 'Encoder v%s - (C) 2017 by M.Pontello' % PROGRAM_VER)
47 | parser.add_argument("filename", action="store",
48 | help="file to encode")
49 | parser.add_argument("sbxfilename", action="store", nargs='?',
50 | help="SBX container")
51 | parser.add_argument("-o", "--overwrite", action="store_true", default=False,
52 | help="overwrite existing file")
53 | parser.add_argument("-nm","--nometa", action="store_true", default=False,
54 | help="exclude matadata block")
55 | parser.add_argument("-uid", action="store", default="r", type=str,
56 | help="use random or custom UID (up to 12 hexdigits)")
57 | parser.add_argument("-sv", "--sbxver", type=int, default=1,
58 | help="SBX blocks version", metavar="n")
59 | parser.add_argument("-p", "--password", type=str, default="",
60 | help="encrypt with password", metavar="pass")
61 | res = parser.parse_args()
62 | return res
63 |
64 |
65 | def errexit(errlev=1, mess=""):
66 | """Display an error and exit."""
67 | if mess != "":
68 | sys.stderr.write("%s: error: %s\n" %
69 | (os.path.split(sys.argv[0])[1], mess))
70 | sys.exit(errlev)
71 |
72 |
73 | def getsha256(filename):
74 | """SHA256 used to verify the integrity of the encoded file"""
75 | with open(filename, mode='rb') as fin:
76 | d = hashlib.sha256()
77 | for buf in iter(partial(fin.read, 1024*1024), b''):
78 | d.update(buf)
79 | return d.digest()
80 |
81 |
82 | def main():
83 |
84 | cmdline = get_cmdline()
85 |
86 | filename = cmdline.filename
87 | sbxfilename = cmdline.sbxfilename
88 | if not sbxfilename:
89 | sbxfilename = os.path.split(filename)[1] + ".sbx"
90 | elif os.path.isdir(sbxfilename):
91 | sbxfilename = os.path.join(sbxfilename,
92 | os.path.split(filename)[1] + ".sbx")
93 | if os.path.exists(sbxfilename) and not cmdline.overwrite:
94 | errexit(1, "SBX file '%s' already exists!" % (sbxfilename))
95 |
96 | #parse eventual custom uid
97 | uid = cmdline.uid
98 | if uid !="r":
99 | uid = uid[-12:]
100 | try:
101 | uid = int(uid, 16).to_bytes(6, byteorder='big')
102 | except:
103 | errexit(1, "invalid UID")
104 |
105 | if not os.path.exists(filename):
106 | errexit(1, "file '%s' not found" % (filename))
107 | filesize = os.path.getsize(filename)
108 |
109 | fout = open(sbxfilename, "wb", buffering=1024*1024)
110 |
111 | #calc hash - before all processing, and not while reading the file,
112 | #just to be cautious
113 | if not cmdline.nometa:
114 | print("hashing file '%s'..." % (filename))
115 | sha256 = getsha256(filename)
116 | print("SHA256",binascii.hexlify(sha256).decode())
117 |
118 | fin = open(filename, "rb", buffering=1024*1024)
119 | print("creating file '%s'..." % sbxfilename)
120 |
121 | sbx = seqbox.SbxBlock(uid=uid, ver=cmdline.sbxver, pswd=cmdline.password)
122 |
123 | #write metadata block 0
124 | if not cmdline.nometa:
125 | sbx.metadata = {"filesize":filesize,
126 | "filename":os.path.split(filename)[1],
127 | "sbxname":os.path.split(sbxfilename)[1],
128 | "filedatetime":int(os.path.getmtime(filename)),
129 | "sbxdatetime":int(time()),
130 | "hash":b'\x12\x20'+sha256} #multihash
131 | fout.write(sbx.encode())
132 |
133 | #write all other blocks
134 | ticks = 0
135 | updatetime = time()
136 | while True:
137 | buffer = fin.read(sbx.datasize)
138 | if len(buffer) < sbx.datasize:
139 | if len(buffer) == 0:
140 | break
141 | sbx.blocknum += 1
142 | sbx.data = buffer
143 | fout.write(sbx.encode())
144 |
145 | #some progress update
146 | if time() > updatetime:
147 | print("%.1f%%" % (fin.tell()*100.0/filesize), " ",
148 | end="\r", flush=True)
149 | updatetime = time() + .1
150 |
151 | print("100% ")
152 | fin.close()
153 | fout.close()
154 |
155 | totblocks = sbx.blocknum if cmdline.nometa else sbx.blocknum + 1
156 | sbxfilesize = totblocks * sbx.blocksize
157 | overhead = 100.0 * sbxfilesize / filesize - 100 if filesize > 0 else 0
158 | print("SBX file size: %i - blocks: %i - overhead: %.1f%%" %
159 | (sbxfilesize, totblocks, overhead))
160 |
161 |
162 | if __name__ == '__main__':
163 | main()
164 |
--------------------------------------------------------------------------------
/sbxreco.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python3
2 |
3 | #--------------------------------------------------------------------------
4 | # SBXReco - Sequenced Box container Recover
5 | #
6 | # Created: 08/03/2017
7 | #
8 | # Copyright (C) 2017 Marco Pontello - http://mark0.net/
9 | #
10 | # Licence:
11 | # This program is free software: you can redistribute it and/or modify
12 | # it under the terms of the GNU Affero General Public License as
13 | # published by the Free Software Foundation, either version 3 of the
14 | # License, or (at your option) any later version.
15 | #
16 | # This program is distributed in the hope that it will be useful,
17 | # but WITHOUT ANY WARRANTY; without even the implied warranty of
18 | # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
19 | # GNU Affero General Public License for more details.
20 | #
21 | # You should have received a copy of the GNU Affero General Public License
22 | # along with this program. If not, see <http://www.gnu.org/licenses/>.
23 | #
24 | #--------------------------------------------------------------------------
25 |
26 | import os
27 | import sys
28 | import argparse
29 | import binascii
30 | import sqlite3
31 | import time
32 |
33 | import seqbox
34 |
35 | PROGRAM_VER = "1.0.2"
36 |
37 | def get_cmdline():
38 | """Evaluate command line parameters, usage & help."""
39 | parser = argparse.ArgumentParser(
40 | description="recover SeqBox containers",
41 | formatter_class=argparse.ArgumentDefaultsHelpFormatter,
42 | prefix_chars='-+',
43 | fromfile_prefix_chars='@')
44 | parser.add_argument("-v", "--version", action='version',
45 | version='SeqBox - Sequenced Box container - ' +
46 | 'Recover v%s - (C) 2017 by M.Pontello' % PROGRAM_VER)
47 | parser.add_argument("dbfilename", action="store", metavar="filename",
48 | help="database with recovery info")
49 | parser.add_argument("destpath", action="store", nargs="?", metavar="path",
50 | help="destination path for recovered sbx files")
51 | parser.add_argument("--all", action="store_true", help="recover all")
52 | parser.add_argument("--file", action="store", nargs="+", metavar="filename",
53 | help="original filename(s) to recover")
54 | parser.add_argument("--sbx", action="store", nargs="+", metavar="filename",
55 | help="SBX filename(s) to recover")
56 | parser.add_argument("--uid", action="store", nargs="+", metavar="uid",
57 | help="UID(s) to recover")
58 | parser.add_argument("-f", "--fill", action="store_true", default=False,
59 | help="fill-in missing blocks")
60 | parser.add_argument("-i", "--info", action="store_true", default=False,
61 | help="show info on recoverable sbx file(s)")
62 | parser.add_argument("-p", "--password", type=str, default="",
63 | help="encrypt with password", metavar="pass")
64 | parser.add_argument("-o", "--overwrite", action="store_true", default=False,
65 | help="overwrite existing sbx file(s)")
66 | res = parser.parse_args()
67 | return res
68 |
69 |
70 | def errexit(errlev=1, mess=""):
71 | """Display an error and exit."""
72 | if mess != "":
73 | sys.stderr.write("%s: error: %s\n" %
74 | (os.path.split(sys.argv[0])[1], mess))
75 | sys.exit(errlev)
76 |
77 |
78 | class RecDB():
79 | """Helper class to access Sqlite3 DB with recovery info"""
80 |
81 | def __init__(self, dbfilename):
82 | self.connection = sqlite3.connect(dbfilename)
83 | self.cursor = self.connection.cursor()
84 |
85 | def GetMetaFromUID(self, uid):
86 | meta = {}
87 | c = self.cursor
88 | c.execute("SELECT * from sbx_meta where uid = '%i'" % uid)
89 | res = c.fetchone()
90 | if res:
91 | meta["filesize"] = res[1]
92 | meta["filename"] = res[2]
93 | meta["sbxname"] = res[3]
94 | meta["filedatetime"] = res[4]
95 | meta["sbxdatetime"] = res[5]
96 | return meta
97 |
98 | def GetUIDFromFileName(self, filename):
99 | c = self.cursor
100 | c.execute("select uid from sbx_meta where name = '%s'" % (filename))
101 | res = c.fetchone()
102 | if res:
103 | return(res[0])
104 |
105 | def GetUIDFromSbxName(self, sbxname):
106 | c = self.cursor
107 | c.execute("select uid from sbx_meta where sbxname = '%s'" % (sbxname))
108 | res = c.fetchone()
109 | if res:
110 | return(res[0])
111 |
112 | def GetBlocksCountFromUID(self, uid):
113 | c = self.cursor
114 | c.execute("SELECT uid from sbx_blocks where uid = '%i' group by num order by num" % (uid))
115 | return len(c.fetchall())
116 |
117 | def GetBlocksListFromUID(self, uid):
118 | c = self.cursor
119 | c.execute("SELECT num, fileid, pos from sbx_blocks where uid = '%i' group by num order by num" % (uid))
120 | return c.fetchall()
121 |
122 | def GetUIDDataList(self):
123 | c = self.cursor
124 | c.execute("SELECT * from sbx_uids")
125 | res = {row[0]:row[1] for row in c.fetchall()}
126 | return res
127 |
128 | def GetSourcesList(self):
129 | c = self.cursor
130 | c.execute("SELECT * FROM sbx_source")
131 | return c.fetchall()
132 |
133 |
134 | def uniquifyFileName(filename):
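"""Append a (n) counter before the extension until the name doesn't collide with an existing file."""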
135 | count = 0
136 | uniq = ""
137 | name,ext = os.path.splitext(filename)
138 | while os.path.exists(filename):
139 | count += 1
140 | uniq = "(%i)" % count
141 | filename = name + uniq + ext
142 | return filename
143 |
144 |
145 | def report(db, uidDataList, blocksizes):
146 | """Create a report with the info obtained by SbxScan"""
147 | #just the basic info in CSV format for the moment
148 |
149 | print('\n"UID", "filesize", "sbxname", "filename", "filedatetime"')
150 |
151 | for uid in uidDataList:
152 | hexdigits = binascii.hexlify(uid.to_bytes(6, byteorder="big")).decode()
153 | metadata = db.GetMetaFromUID(uid)
154 | blocksnum = db.GetBlocksCountFromUID(uid)
155 | filename = metadata["filename"] if "filename" in metadata else ""
156 | sbxname = metadata["sbxname"] if "sbxname" in metadata else ""
157 | if "filesize" in metadata:
158 | filesize = metadata["filesize"]
159 | else:
160 | filesize = blocksnum * blocksizes[uidDataList[uid]]
161 | filedatetime = "n/a"
162 | if "filedatetime" in metadata:
163 | if metadata["filedatetime"] >= 0:
164 | filedatetime = time.strftime("%Y-%m-%d %H:%M:%S",
165 | time.localtime(metadata["filedatetime"]))
166 |
167 | print('"%s", %i, "%s", "%s", "%s"' %
168 | (hexdigits, filesize, sbxname, filename, filedatetime))
169 |
170 |
171 | def report_err(db, uiderrlist, uidDataList, blocksizes):
172 | """Create a report with recovery errors"""
173 | #just the basic info in CSV format for the moment
174 |
175 | print('\n"UID", "blocks", "errs", "filesize", "sbxname", "filename"')
176 | for info in uiderrlist:
177 | uid = info[0]
178 | errblocks = info[1]
179 | hexdigits = binascii.hexlify(uid.to_bytes(6, byteorder="big")).decode()
180 | metadata = db.GetMetaFromUID(uid)
181 | blocksnum = db.GetBlocksCountFromUID(uid)
182 | filename = metadata["filename"] if "filename" in metadata else ""
183 | sbxname = metadata["sbxname"] if "sbxname" in metadata else ""
184 |
185 | if "filesize" in metadata:
186 | filesize = metadata["filesize"]
187 | else:
188 | filesize = blocksnum * blocksizes[uidDataList[uid]]
189 |
190 | print('"%s", %i, %i, %i, "%s", "%s"' %
191 | (hexdigits, blocksnum, errblocks, filesize, sbxname, filename))
192 |
193 |
194 | def main():
195 |
196 | cmdline = get_cmdline()
197 |
198 | dbfilename = cmdline.dbfilename
199 | if not os.path.exists(dbfilename) or os.path.isdir(dbfilename):
200 | errexit(1,"file '%s' not found!" % (dbfilename))
201 |
202 | #open database
203 | print("opening '%s' recovery info database..." % (dbfilename))
204 | db = RecDB(dbfilename)
205 |
206 | #get data on all uids present
207 | uidDataList = db.GetUIDDataList()
208 |
209 | #get blocksizes for every supported SBx version
210 | blocksizes = {}
211 | for v in seqbox.supported_vers:
212 | blocksizes[v] = seqbox.SbxBlock(ver=v).blocksize
213 |
214 | #info/report
215 | if cmdline.info:
216 | report(db, uidDataList, blocksizes)
217 | errexit(0)
218 |
219 | #build a list of uids to recover:
220 | uidRecoList = []
221 | if cmdline.all:
222 | uidRecoList = list(uidDataList)
223 | else:
224 | if cmdline.uid:
225 | for hexuid in cmdline.uid:
226 | if len(hexuid) % 2 != 0:
227 | errexit(1, "invalid UID!")
228 | uid = int.from_bytes(binascii.unhexlify(hexuid),
229 | byteorder="big")
230 | if db.GetBlocksCountFromUID(uid):
231 | uidRecoList.append(uid)
232 | else:
233 | errexit(1,"no recoverable UID '%s'" % (hexuid))
234 | if cmdline.sbx:
235 | for sbxname in cmdline.sbx:
236 | uid = db.GetUIDFromSbxName(sbxname)
237 | if uid:
238 | uidRecoList.append(uid)
239 | else:
240 | errexit(1,"no recoverable sbx file '%s'" % (sbxname))
241 | if cmdline.file:
242 | for filename in cmdline.file:
243 | uid = db.GetUIDFromFileName(filename)
244 | if uid:
245 | uidRecoList.append(uid)
246 | else:
247 | errexit(1,"no recoverable file '%s'" % (filename))
248 |
249 | if len(uidRecoList) == 0:
250 | errexit(1, "nothing to recover!")
251 |
252 | print("recovering SBX files...")
253 | uid_list = sorted(set(uidRecoList))
254 |
255 | #open all the sources
256 | finlist = {}
257 | for key, value in db.GetSourcesList():
258 | finlist[key] = open(value, "rb")
259 |
260 | uidcount = 0
261 | totblocks = 0
262 | totblockserr = 0
263 | uiderrlist = []
264 | for uid in uidRecoList:
265 | uidcount += 1
266 | sbxver = uidDataList[uid]
267 | sbx = seqbox.SbxBlock(ver=sbxver, pswd=cmdline.password)
268 | hexuid = binascii.hexlify(uid.to_bytes(6, byteorder="big")).decode()
269 | print("UID %s (%i/%i)" % (hexuid, uidcount, len(uid_list)))
270 |
271 | blocksnum = db.GetBlocksCountFromUID(uid)
272 | print(" blocks: %i - size: %i bytes" %
273 | (blocksnum, blocksnum * sbx.blocksize))
274 | meta = db.GetMetaFromUID(uid)
275 | if "sbxname" in meta:
276 | sbxname = meta["sbxname"]
277 | else:
278 | #use hex uid as name if no metadata present
279 | sbxname = (binascii.hexlify(uid.to_bytes(6, byteorder="big")).decode() +
280 | ".sbx")
281 | if cmdline.destpath:
282 | sbxname = os.path.join(cmdline.destpath, sbxname)
283 | print(" to: '%s'" % sbxname)
284 |
285 | if not cmdline.overwrite:
286 | sbxname = uniquifyFileName(sbxname)
287 | fout = open(sbxname, "wb", buffering = 1024*1024)
288 |
289 | blockdatalist = db.GetBlocksListFromUID(uid)
290 | #read 1 block to initialize the correct block parameters
291 | #(needed for filling in missing blocks)
292 | blockdata = blockdatalist[0]
293 | fin = finlist[blockdata[1]]
294 | bpos = blockdata[2]
295 | fin.seek(bpos, 0)
296 | try:
297 | sbx.decode(fin.read(sbx.blocksize))
298 | except seqbox.SbxDecodeError as err:
299 | print(err)
300 | errexit(1, "invalid block at offset %s file '%s'" %
301 | (hex(bpos), fin.name))
302 |
303 | lastblock = -1
304 | ticks = 0
305 | missingblocks = 0
306 | updatetime = time.time() -1
307 | maxbnum = blockdatalist[-1][0]
308 | #loop trough the block list and recreate SBx file
309 | for blockdata in blockdatalist:
310 | bnum = blockdata[0]
311 | #check for missing blocks and fill in
312 | if bnum != lastblock +1 and bnum != 1:
313 | for b in range(lastblock+1, bnum):
314 | #no point in an empty block 0 with no metadata
315 | if b > 0 and cmdline.fill:
316 | sbx.blocknum = b
317 | sbx.data = bytes(sbx.datasize)
318 | buffer = sbx.encode()
319 | fout.write(buffer)
320 | missingblocks += 1
321 |
322 | fin = finlist[blockdata[1]]
323 | bpos = blockdata[2]
324 | fin.seek(bpos, 0)
325 | buffer = fin.read(sbx.blocksize)
326 | fout.write(buffer)
327 | lastblock = bnum
328 |
329 | #some progress report
330 | if time.time() > updatetime or bnum == maxbnum:
331 | print(" %.1f%%" % (bnum*100.0/maxbnum), " ",
332 | "(missing blocks: %i)" % missingblocks,
333 | end="\r", flush=True)
334 | updatetime = time.time() + .5
335 |
336 | fout.close()
337 | #set sbx date&time
338 | if "sbxdatetime" in meta:
339 | if meta["sbxdatetime"] >= 0:
340 | os.utime(sbxname, (int(time.time()), meta["sbxdatetime"]))
341 |
342 | print()
343 | if missingblocks > 0:
344 | uiderrlist.append((uid, missingblocks))
345 | totblockserr += missingblocks
346 |
347 | print("\ndone.")
348 | if len(uiderrlist) == 0:
349 | print("all SBx files recovered with no errors!")
350 | else:
351 | print("errors detected in %i SBx file(s)!" % len(uiderrlist))
352 | report_err(db, uiderrlist, uidDataList, blocksizes)
353 |
354 |
355 | if __name__ == '__main__':
356 | main()
357 |
--------------------------------------------------------------------------------
/sbxscan.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python3
2 |
3 | #--------------------------------------------------------------------------
4 | # SBXScan - Sequenced Box container Scanner
5 | #
6 | # Created: 06/03/2017
7 | #
8 | # Copyright (C) 2017 Marco Pontello - http://mark0.net/
9 | #
10 | # Licence:
11 | # This program is free software: you can redistribute it and/or modify
12 | # it under the terms of the GNU Affero General Public License as
13 | # published by the Free Software Foundation, either version 3 of the
14 | # License, or (at your option) any later version.
15 | #
16 | # This program is distributed in the hope that it will be useful,
17 | # but WITHOUT ANY WARRANTY; without even the implied warranty of
18 | # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
19 | # GNU Affero General Public License for more details.
20 | #
21 | # You should have received a copy of the GNU Affero General Public License
22 | # along with this program. If not, see <http://www.gnu.org/licenses/>.
23 | #
24 | #--------------------------------------------------------------------------
25 |
26 | import os
27 | import sys
28 | import argparse
29 | import binascii
30 | from time import sleep, time
31 | import sqlite3
32 |
33 | import seqbox
34 |
35 | PROGRAM_VER = "1.0.1"
36 |
37 | def get_cmdline():
38 | """Evaluate command line parameters, usage & help."""
39 | parser = argparse.ArgumentParser(
40 | description=("scan files/devices for SBx blocks and create a "+
41 | "detailed report plus an index to be used with "+
42 | "SBXScan"),
43 | formatter_class=argparse.ArgumentDefaultsHelpFormatter,
44 | prefix_chars='-', fromfile_prefix_chars='@')
45 | parser.add_argument("-v", "--version", action='version',
46 | version='SeqBox - Sequenced Box container - ' +
47 | 'Scanner v%s - (C) 2017 by M.Pontello' % PROGRAM_VER)
48 | parser.add_argument("filename", action="store", nargs="+",
49 | help="file(s) to scan")
50 | parser.add_argument("-d", "--database", action="store", dest="dbfilename",
51 | metavar="filename",
52 | help="where to save recovery info",
53 | default="sbxscan.db3")
54 | parser.add_argument("-o", "--offset", type=int, default=0,
55 | help=("offset from the start"), metavar="n")
56 | parser.add_argument("-st", "--step", type=int, default=0,
57 | help=("scan step"), metavar="n")
58 | parser.add_argument("-b", "--buffer", type=int, default=1024,
59 | help=("read buffer in KB"), metavar="n")
60 | parser.add_argument("-sv", "--sbxver", type=int, default=1,
61 | help="SBX blocks version to search for", metavar="n")
62 | parser.add_argument("-p", "--password", type=str, default="",
63 | help="encrypt with password", metavar="pass")
64 | res = parser.parse_args()
65 | return res
66 |
67 |
68 | def errexit(errlev=1, mess=""):
69 | """Display an error and exit."""
70 | if mess != "":
71 | sys.stderr.write("%s: error: %s\n" %
72 | (os.path.split(sys.argv[0])[1], mess))
73 | sys.exit(errlev)
74 |
75 |
76 | def getFileSize(filename):
77 | """Calc file size - works on devices too"""
78 | ftemp = os.open(filename, os.O_RDONLY)
79 | try:
80 | return os.lseek(ftemp, 0, os.SEEK_END)
81 | finally:
82 | os.close(ftemp)
83 |
84 |
85 | def main():
86 |
87 | cmdline = get_cmdline()
88 |
89 | filenames = []
90 | for filename in cmdline.filename:
91 | if os.path.exists(filename):
92 | filenames.append(filename)
93 | else:
94 | errexit(1, "file '%s' not found!" % (filename))
95 | filenames = sorted(set(filenames), key=os.path.getsize)
96 |
97 | dbfilename = cmdline.dbfilename
98 | if os.path.isdir(dbfilename):
99 | dbfilename = os.path.join(dbfilename, "sbxscan.db3")
100 |
101 | #create database tables
102 | print("creating '%s' database..." % (dbfilename))
103 | if os.path.exists(dbfilename):
104 | os.remove(dbfilename)
105 | conn = sqlite3.connect(dbfilename)
106 | c = conn.cursor()
107 | c.execute("CREATE TABLE sbx_source (id INTEGER, name TEXT)")
108 | c.execute("CREATE TABLE sbx_meta (uid INTEGER, size INTEGER, name TEXT, sbxname TEXT, datetime INTEGER, sbxdatetime INTEGER, fileid INTEGER)")
109 | c.execute("CREATE TABLE sbx_uids (uid INTEGER, ver INTEGER)")
110 | c.execute("CREATE TABLE sbx_blocks (uid INTEGER, num INTEGER, fileid INTEGER, pos INTEGER )")
111 | c.execute("CREATE INDEX blocks ON sbx_blocks (uid, num, pos)")
112 |
113 | #scan all the files/devices
114 | sbx = seqbox.SbxBlock(ver=cmdline.sbxver,pswd=cmdline.password)
115 | offset = cmdline.offset
116 | filenum = 0
117 | uids = {}
118 | magic = b'SBx' + bytes([cmdline.sbxver])
119 | if cmdline.password:
120 | magic = seqbox.EncDec(cmdline.password, len(magic)).xor(magic)
121 | scanstep = cmdline.step
122 | if scanstep == 0:
123 | scanstep = sbx.blocksize
124 |
125 | for filename in filenames:
126 | filenum += 1
127 | print("scanning file/device '%s' (%i/%i)..." %
128 | (filename, filenum, len(filenames)))
129 | filesize = getFileSize(filename)
130 |
131 | c.execute("INSERT INTO sbx_source (id, name) VALUES (?, ?)",
132 | (filenum, filename))
133 | conn.commit()
134 |
135 | fin = open(filename, "rb", buffering=cmdline.buffer*1024)
136 | blocksfound = 0
137 | blocksmetafound = 0
138 | updatetime = time() - 1
139 | starttime = time()
140 | docommit = False
141 | for pos in range(offset, filesize, scanstep):
142 | fin.seek(pos, 0)
143 | buffer = fin.read(sbx.blocksize)
144 | #check for magic
145 | if buffer[:4] == magic:
146 | #check for valid block
147 | try:
148 | sbx.decode(buffer)
149 | #update uids table & list
150 | if not sbx.uid in uids:
151 | uids[sbx.uid] = True
152 | c.execute(
153 | "INSERT INTO sbx_uids (uid, ver) VALUES (?, ?)",
154 | (int.from_bytes(sbx.uid, byteorder='big'),
155 | sbx.ver))
156 | docommit = True
157 |
158 | #update blocks table
159 | blocksfound+=1
160 | c.execute(
161 | "INSERT INTO sbx_blocks (uid, num, fileid, pos) VALUES (?, ?, ?, ?)",
162 | (int.from_bytes(sbx.uid, byteorder='big'),
163 | sbx.blocknum, filenum, pos))
164 | docommit = True
165 |
166 | #update meta table
167 | if sbx.blocknum == 0:
168 | blocksmetafound += 1
169 | if not "filedatetime" in sbx.metadata:
170 | sbx.metadata["filedatetime"] = -1
171 | sbx.metadata["sbxdatetime"] = -1
172 |
173 | c.execute(
174 | "INSERT INTO sbx_meta (uid , size, name, sbxname, datetime, sbxdatetime, fileid) VALUES (?, ?, ?, ?, ?, ?, ?)",
175 | (int.from_bytes(sbx.uid, byteorder='big'),
176 | sbx.metadata["filesize"],
177 | sbx.metadata["filename"], sbx.metadata["sbxname"],
178 | sbx.metadata["filedatetime"], sbx.metadata["sbxdatetime"],
179 | filenum))
180 | docommit = True
181 |
182 | except seqbox.SbxDecodeError:
183 | pass
184 |
185 | #status update
186 | if (time() > updatetime) or (pos >= filesize - scanstep):
187 | etime = (time()-starttime)
188 | if etime == 0:
189 | etime = 1
190 | print("%5.1f%% blocks: %i - meta: %i - files: %i - %.2fMB/s" %
191 | (pos*100.0/(filesize-scanstep), blocksfound,
192 | blocksmetafound, len(uids), pos/(1024*1024)/etime),
193 | end = "\r", flush=True)
194 | if docommit:
195 | conn.commit()
196 | docommit = False
197 | updatetime = time() + .5
198 |
199 | fin.close()
200 | print()
201 |
202 | c.close()
203 | conn.close()
204 |
205 | print("scan completed!")
206 |
207 |
208 | if __name__ == '__main__':
209 | main()
210 |
--------------------------------------------------------------------------------
/seqbox.bt:
--------------------------------------------------------------------------------
1 | //--------------------------------------
2 | //--- 010 Editor v6.0.3 Binary Template
3 | //
4 | // File: seqbox.bt
5 | // Author: Marco Pontello
6 | // Revision: 1
7 | // Purpose: Explore SeqBox container
8 | // https://github.com/MarcoPon/SeqBox
9 | //--------------------------------------
10 |
11 | local int BLOCKSIZE = 512;
12 |
13 | BigEndian();
14 | DisplayFormatHex();
15 |
16 | struct BLOCK {
17 | struct HEADER {
18 | struct MAGIC {
19 | char signature[3];
20 | } magic;
21 | byte version;
22 | short CRC16;
23 | struct UID {
24 | byte uid[6];
25 | } uid;
26 | int blocknum;
27 | } header;
28 | byte data[BLOCKSIZE - sizeof(header)];
29 | } block[FileSize() / BLOCKSIZE];
30 |
31 |
32 |
--------------------------------------------------------------------------------
/seqbox.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python3
2 |
3 | #--------------------------------------------------------------------------
4 | # SeqBox - Sequenced Box container module
5 | #
6 | # Created: 03/03/2017
7 | #
8 | # Copyright (C) 2017 Marco Pontello - http://mark0.net/
9 | #
10 | # Licence:
11 | # This program is free software: you can redistribute it and/or modify
12 | # it under the terms of the GNU Affero General Public License as
13 | # published by the Free Software Foundation, either version 3 of the
14 | # License, or (at your option) any later version.
15 | #
16 | # This program is distributed in the hope that it will be useful,
17 | # but WITHOUT ANY WARRANTY; without even the implied warranty of
18 | # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
19 | # GNU Affero General Public License for more details.
20 | #
21 | # You should have received a copy of the GNU Affero General Public License
22 | # along with this program. If not, see <http://www.gnu.org/licenses/>.
23 | #
24 | #--------------------------------------------------------------------------
25 |
26 | import os
27 | import sys
28 | import binascii
29 | import random
30 | import hashlib
31 |
32 | supported_vers = [1, 2, 3]
33 |
34 |
35 | #Some custom exceptions
36 | class SbxError(Exception):
37 | pass
38 |
39 | class SbxDecodeError(SbxError):
40 | pass
41 |
42 |
43 | class SbxBlock():
44 | """
45 | Implement a basic SBX block
46 | """
47 |
48 | def __init__(self, ver=1, uid="r", pswd=""):
49 | self.ver = ver
50 | if ver == 1:
51 | self.blocksize = 512
52 | self.hdrsize = 16
53 | elif ver == 2:
54 | #mostly a test to double check that all tools works correctly
55 | #with different blocks versions/parameters.
56 | #or it could be good for CP/M! :)
57 | self.blocksize = 128
58 | self.hdrsize = 16
59 | elif ver == 3:
60 | #and another one for big blocks, to be used only if absolutely
61 | #sure that the SBX file will not be used on a system with
62 | #smaller blocks
63 | self.blocksize = 4096
64 | self.hdrsize = 16
65 | else:
66 | raise SbxError("version %i not supported" % ver)
67 | self.datasize = self.blocksize - self.hdrsize
68 | self.magic = b'SBx' + bytes([ver])
69 | self.blocknum = 0
70 |
71 |
72 | if uid == "r":
73 | random.seed()
74 | self.uid = random.getrandbits(6*8).to_bytes(6, byteorder='big')
75 | else:
76 | self.uid = (b'\x00'*6 + uid)[-6:]
77 |
78 | if pswd:
79 | self.encdec = EncDec(pswd, self.blocksize)
80 | else:
81 | self.encdec = False
82 |
83 | self.parent_uid = 0
84 | self.metadata = {}
85 | self.data = b""
86 |
87 | def __str__(self):
88 | return "SBX Block ver: '%i', size: %i, hdr size: %i, data: %i" % \
89 | (self.ver, self.blocksize, self.hdrsize, self.datasize)
90 |
91 | def encode(self):
92 | if self.blocknum == 0:
93 | self.data = b""
94 | if "filename" in self.metadata:
95 | bb = self.metadata["filename"].encode()
96 | self.data += b"FNM" + bytes([len(bb)]) + bb
97 | if "sbxname" in self.metadata:
98 | bb = self.metadata["sbxname"].encode()
99 | self.data += b"SNM" + bytes([len(bb)]) + bb
100 | if "filesize" in self.metadata:
101 | bb = self.metadata["filesize"].to_bytes(8, byteorder='big')
102 | self.data += b"FSZ" + bytes([len(bb)]) + bb
103 | if "filedatetime" in self.metadata:
104 | bb = self.metadata["filedatetime"].to_bytes(8, byteorder='big')
105 | self.data += b"FDT" + bytes([len(bb)]) + bb
106 | if "sbxdatetime" in self.metadata:
107 | bb = self.metadata["sbxdatetime"].to_bytes(8, byteorder='big')
108 | self.data += b"SDT" + bytes([len(bb)]) + bb
109 | if "hash" in self.metadata:
110 | bb = self.metadata["hash"]
111 | self.data += b"HSH" + bytes([len(bb)]) + bb
112 |
113 | data = self.data + b'\x1A' * (self.datasize - len(self.data))
114 | buffer = (self.uid +
115 | self.blocknum.to_bytes(4, byteorder='big') +
116 | data)
117 | crc = binascii.crc_hqx(buffer, self.ver).to_bytes(2,byteorder='big')
118 | block = self.magic + crc + buffer
119 | if self.encdec:
120 | block = self.encdec.xor(block)
121 | return block
122 |
123 | def decode(self, buffer):
124 | #start setting an invalid block number
125 | self.blocknum = -1
126 | #decode eventual password
127 | if self.encdec:
128 | buffer = self.encdec.xor(buffer)
129 | #check the basics
130 | if len(buffer) != self.blocksize:
131 | raise SbxDecodeError("bad block size")
132 | if buffer[:3] != self.magic[:3]:
133 | raise SbxDecodeError("not an SBX block")
134 | if not buffer[3] in supported_vers:
135 | raise SbxDecodeError("block v%i not supported" % buffer[3])
136 |
137 | #check CRC of rest of the block
138 | crc = int.from_bytes(buffer[4:6], byteorder='big')
139 | if crc != binascii.crc_hqx(buffer[6:], self.ver):
140 | raise SbxDecodeError("bad CRC")
141 |
142 | self.parent_uid = 0
143 |
144 | self.uid = buffer[6:12]
145 | self.blocknum = int.from_bytes(buffer[12:16], byteorder='big')
146 | self.data = buffer[16:]
147 |
148 | self.metadata = {}
149 |
150 | if self.blocknum == 0:
151 | #decode meta data
152 | p = 0
153 | while p < (len(self.data)-3):
154 | metaid = self.data[p:p+3]
155 | p+=3
156 | if metaid == b"\x1a\x1a\x1a":
157 | break
158 | else:
159 | metalen = self.data[p]
160 | metabb = self.data[p+1:p+1+metalen]
161 | p = p + 1 + metalen
162 | if metaid == b'FNM':
163 | self.metadata["filename"] = metabb.decode('utf-8')
164 | if metaid == b'SNM':
165 | self.metadata["sbxname"] = metabb.decode('utf-8')
166 | if metaid == b'FSZ':
167 | self.metadata["filesize"] = int.from_bytes(metabb, byteorder='big')
168 | if metaid == b'FDT':
169 | self.metadata["filedatetime"] = int.from_bytes(metabb, byteorder='big')
170 | if metaid == b'SDT':
171 | self.metadata["sbxdatetime"] = int.from_bytes(metabb, byteorder='big')
172 | if metaid == b'HSH':
173 | self.metadata["hash"] = metabb
174 | return True
175 |
176 |
177 | class EncDec():
178 | """Simple encoding/decoding function"""
179 | #it's not meant as 'strong encryption', but just to hide the presence
180 | #of SBX blocks on a simple scan
181 | def __init__(self, key, size):
182 | #key is kept as a bigint because a xor between two bigint is faster
183 | #than byte-by-byte
184 | d = hashlib.sha256()
185 | key = key.encode()
186 | tempkey = key
187 | while len(tempkey) < size:
188 | d.update(tempkey)
189 | key = d.digest()
190 | tempkey += key
191 | self.key = int(binascii.hexlify(tempkey[:size]), 16)
192 | def xor(self, buffer):
193 | num = int(binascii.hexlify(buffer), 16) ^ self.key
194 | return num.to_bytes(len(buffer), byteorder='big') #keeps the buffer length even with leading zero bytes
195 |
196 |
197 | def main():
198 | print("SeqBox module!")
199 | sys.exit(0)
200 |
201 | if __name__ == '__main__':
202 | main()
203 |
--------------------------------------------------------------------------------
/todo.txt:
--------------------------------------------------------------------------------
1 | - 27/02/2017 blake2 would be better than SHA256, but requires Python >=3.6.
2 | SHA256 should be good enough for the moment.
3 |
4 | - 12/03/2017 check if struct.pack&unpack is faster than to/from_bytes
5 |
6 | - 21/05/2017 prioritize metadata order; check if there's enough space
7 | to write them
--------------------------------------------------------------------------------