├── LICENSE.md
├── README.md
├── catalog.sqlite
├── requirements.txt
└── squid.py

/LICENSE.md:
--------------------------------------------------------------------------------
                                 Apache License
                           Version 2.0, January 2004
                        http://www.apache.org/licenses/

   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION

   1. Definitions.

      "License" shall mean the terms and conditions for use, reproduction,
      and distribution as defined by Sections 1 through 9 of this document.

      "Licensor" shall mean the copyright owner or entity authorized by
      the copyright owner that is granting the License.

      "Legal Entity" shall mean the union of the acting entity and all
      other entities that control, are controlled by, or are under common
      control with that entity. For the purposes of this definition,
      "control" means (i) the power, direct or indirect, to cause the
      direction or management of such entity, whether by contract or
      otherwise, or (ii) ownership of fifty percent (50%) or more of the
      outstanding shares, or (iii) beneficial ownership of such entity.

      "You" (or "Your") shall mean an individual or Legal Entity
      exercising permissions granted by this License.

      "Source" form shall mean the preferred form for making modifications,
      including but not limited to software source code, documentation
      source, and configuration files.

      "Object" form shall mean any form resulting from mechanical
      transformation or translation of a Source form, including but
      not limited to compiled object code, generated documentation,
      and conversions to other media types.

      "Work" shall mean the work of authorship, whether in Source or
      Object form, made available under the License, as indicated by a
      copyright notice that is included in or attached to the work
      (an example is provided in the Appendix below).

      "Derivative Works" shall mean any work, whether in Source or Object
      form, that is based on (or derived from) the Work and for which the
      editorial revisions, annotations, elaborations, or other modifications
      represent, as a whole, an original work of authorship. For the purposes
      of this License, Derivative Works shall not include works that remain
      separable from, or merely link (or bind by name) to the interfaces of,
      the Work and Derivative Works thereof.

      "Contribution" shall mean any work of authorship, including
      the original version of the Work and any modifications or additions
      to that Work or Derivative Works thereof, that is intentionally
      submitted to Licensor for inclusion in the Work by the copyright owner
      or by an individual or Legal Entity authorized to submit on behalf of
      the copyright owner. For the purposes of this definition, "submitted"
      means any form of electronic, verbal, or written communication sent
      to the Licensor or its representatives, including but not limited to
      communication on electronic mailing lists, source code control systems,
      and issue tracking systems that are managed by, or on behalf of, the
      Licensor for the purpose of discussing and improving the Work, but
      excluding communication that is conspicuously marked or otherwise
      designated in writing by the copyright owner as "Not a Contribution."

      "Contributor" shall mean Licensor and any individual or Legal Entity
      on behalf of whom a Contribution has been received by Licensor and
      subsequently incorporated within the Work.

   2. Grant of Copyright License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      copyright license to reproduce, prepare Derivative Works of,
      publicly display, publicly perform, sublicense, and distribute the
      Work and such Derivative Works in Source or Object form.

   3. Grant of Patent License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      (except as stated in this section) patent license to make, have made,
      use, offer to sell, sell, import, and otherwise transfer the Work,
      where such license applies only to those patent claims licensable
      by such Contributor that are necessarily infringed by their
      Contribution(s) alone or by combination of their Contribution(s)
      with the Work to which such Contribution(s) was submitted. If You
      institute patent litigation against any entity (including a
      cross-claim or counterclaim in a lawsuit) alleging that the Work
      or a Contribution incorporated within the Work constitutes direct
      or contributory patent infringement, then any patent licenses
      granted to You under this License for that Work shall terminate
      as of the date such litigation is filed.

   4. Redistribution. You may reproduce and distribute copies of the
      Work or Derivative Works thereof in any medium, with or without
      modifications, and in Source or Object form, provided that You
      meet the following conditions:

      (a) You must give any other recipients of the Work or
          Derivative Works a copy of this License; and

      (b) You must cause any modified files to carry prominent notices
          stating that You changed the files; and

      (c) You must retain, in the Source form of any Derivative Works
          that You distribute, all copyright, patent, trademark, and
          attribution notices from the Source form of the Work,
          excluding those notices that do not pertain to any part of
          the Derivative Works; and

      (d) If the Work includes a "NOTICE" text file as part of its
          distribution, then any Derivative Works that You distribute must
          include a readable copy of the attribution notices contained
          within such NOTICE file, excluding those notices that do not
          pertain to any part of the Derivative Works, in at least one
          of the following places: within a NOTICE text file distributed
          as part of the Derivative Works; within the Source form or
          documentation, if provided along with the Derivative Works; or,
          within a display generated by the Derivative Works, if and
          wherever such third-party notices normally appear. The contents
          of the NOTICE file are for informational purposes only and
          do not modify the License. You may add Your own attribution
          notices within Derivative Works that You distribute, alongside
          or as an addendum to the NOTICE text from the Work, provided
          that such additional attribution notices cannot be construed
          as modifying the License.

      You may add Your own copyright statement to Your modifications and
      may provide additional or different license terms and conditions
      for use, reproduction, or distribution of Your modifications, or
      for any such Derivative Works as a whole, provided Your use,
      reproduction, and distribution of the Work otherwise complies with
      the conditions stated in this License.

   5. Submission of Contributions. Unless You explicitly state otherwise,
      any Contribution intentionally submitted for inclusion in the Work
      by You to the Licensor shall be under the terms and conditions of
      this License, without any additional terms or conditions.
      Notwithstanding the above, nothing herein shall supersede or modify
      the terms of any separate license agreement you may have executed
      with Licensor regarding such Contributions.

   6. Trademarks. This License does not grant permission to use the trade
      names, trademarks, service marks, or product names of the Licensor,
      except as required for reasonable and customary use in describing the
      origin of the Work and reproducing the content of the NOTICE file.

   7. Disclaimer of Warranty. Unless required by applicable law or
      agreed to in writing, Licensor provides the Work (and each
      Contributor provides its Contributions) on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
      implied, including, without limitation, any warranties or conditions
      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
      PARTICULAR PURPOSE. You are solely responsible for determining the
      appropriateness of using or redistributing the Work and assume any
      risks associated with Your exercise of permissions under this License.

   8. Limitation of Liability. In no event and under no legal theory,
      whether in tort (including negligence), contract, or otherwise,
      unless required by applicable law (such as deliberate and grossly
      negligent acts) or agreed to in writing, shall any Contributor be
      liable to You for damages, including any direct, indirect, special,
      incidental, or consequential damages of any character arising as a
      result of this License or out of the use or inability to use the
      Work (including but not limited to damages for loss of goodwill,
      work stoppage, computer failure or malfunction, or any and all
      other commercial damages or losses), even if such Contributor
      has been advised of the possibility of such damages.

   9. Accepting Warranty or Additional Liability. While redistributing
      the Work or Derivative Works thereof, You may choose to offer,
      and charge a fee for, acceptance of support, warranty, indemnity,
      or other liability obligations and/or rights consistent with this
      License. However, in accepting such obligations, You may act only
      on Your own behalf and on Your sole responsibility, not on behalf
      of any other Contributor, and only if You agree to indemnify,
      defend, and hold each Contributor harmless for any liability
      incurred by, or claims asserted against, such Contributor by reason
      of your accepting any such warranty or additional liability.

   END OF TERMS AND CONDITIONS

   APPENDIX: How to apply the Apache License to your work.

      To apply the Apache License to your work, attach the following
      boilerplate notice, with the fields enclosed by brackets "{}"
      replaced with your own identifying information. (Don't include
      the brackets!) The text should be enclosed in the appropriate
      comment syntax for the file format. We also recommend that a
      file or class name and description of purpose be included on the
      same "printed page" as the copyright notice for easier
      identification within third-party archives.

   Copyright 2015 Ryan Benson

   Licensed under the Apache License, Version 2.0 (the "License");
   you may not use this file except in compliance with the License.
   You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
SQUID (SQLite Unknown Identifier)
=========

"Fuzzy matching" for SQLite databases (see the [SQUID presentation from OSDFCon 2015](https://www.osdfcon.org/presentations/2015/Ryan-Benson_OSDF-Squid.pdf))

SQUID (SQLite Unknown Identifier) is a tool that compares unknown SQLite databases to a catalog of 'known' databases to find exact and near matches. Even if a program updates and its database structure changes, there's a good chance SQUID will be able to identify it as related to that application.

SQUID is made up of a Python script (squid.py) and a SQLite file of known databases (catalog.sqlite).
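
To make the "fuzzy" part concrete, here is a small, self-contained sketch of the matching idea. It is a simplified Python 3 re-implementation written for illustration, not code that ships with SQUID (squid.py itself uses Python 2 syntax); `schema_of()`, `similarity()`, and the `urls` example table are hypothetical names. It reads every table's columns with `PRAGMA table_info` and scores them with the same default weights that `compare_dbs()` in squid.py uses: 12 per matching table name, 6 per matching column name, and 1 for each matching column attribute (type, NOT NULL flag, default value).

```python
import sqlite3

# Weights mirror the defaults in compare_dbs() in squid.py (illustrative sketch, not SQUID's code).
TABLE_WEIGHT = 12
COLUMN_WEIGHT = 6
ATTRIBUTE_WEIGHT = 1
ATTRIBUTES = ("type", "not_null", "default_value")


def schema_of(conn):
    """Return {table: {column: {"type", "not_null", "default_value"}}} for an open connection."""
    structure = {}
    for (table,) in conn.execute("SELECT name FROM sqlite_master WHERE type='table'").fetchall():
        quoted = '"{}"'.format(table.replace('"', '""'))
        structure[table] = {}
        # PRAGMA table_info yields one (cid, name, type, notnull, dflt_value, pk) row per column
        for _cid, name, col_type, not_null, default, _pk in conn.execute("PRAGMA table_info({})".format(quoted)):
            structure[table][name] = {"type": col_type, "not_null": not_null, "default_value": default}
    return structure


def similarity(candidate, known):
    """Percentage of the available points that the candidate schema earns against a known schema."""
    earned = 0     # plays the role of candidate_score in squid.py
    available = 0  # plays the role of known_score in squid.py
    for table, columns in candidate.items():
        if table not in known:
            # An unknown table and its columns only enlarge the denominator
            available += TABLE_WEIGHT + COLUMN_WEIGHT * len(columns)
            continue
        earned += TABLE_WEIGHT
        for column, attrs in columns.items():
            if column not in known[table]:
                available += COLUMN_WEIGHT
                continue
            earned += COLUMN_WEIGHT
            for attribute in ATTRIBUTES:
                available += ATTRIBUTE_WEIGHT
                if attrs[attribute] == known[table][column][attribute]:
                    earned += ATTRIBUTE_WEIGHT
    # Every table and column of the known schema also counts toward the denominator
    for columns in known.values():
        available += TABLE_WEIGHT + COLUMN_WEIGHT * len(columns)
    return 100.0 * earned / available if available else 0.0


if __name__ == "__main__":
    old = sqlite3.connect(":memory:")
    old.execute("CREATE TABLE urls (id INTEGER PRIMARY KEY, url TEXT, title TEXT)")
    new = sqlite3.connect(":memory:")  # hypothetical later release that adds one column
    new.execute("CREATE TABLE urls (id INTEGER PRIMARY KEY, url TEXT, title TEXT, "
                "visit_count INTEGER NOT NULL DEFAULT 0)")
    print("{:.1f}%".format(similarity(schema_of(new), schema_of(old))))  # prints 86.7%
```

An identical schema scores exactly 100%. Adding one column to an otherwise identical table earns 39 of 45 available points (about 86.7%) rather than failing outright, because unmatched candidate tables and columns only grow the denominator; that is how a changed schema can still be tied back to the application it came from.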

#### Examples:

Scan a folder of carved SQLite databases to determine which applications they are associated with:
> C:\\squid.py --compare "C:\carving\recovered_SQLite_DBs"

Scan a user's AppData folder to locate interesting databases and name the report:
> C:\\squid.py --compare "C:\Users\Ryan\AppData" --output "Ryan_AppData"

Scan iOS backups and save the report to a different drive:
> C:\\squid.py --compare "C:\Users\Ryan\AppData\Roaming\Apple Computer\MobileSync\Backup" --output "X:\Reports\Ryan_iOS_Backups"

Teach SQUID about a new version of Chrome:
> C:\\squid.py --learn "C:\Users\Ryan\AppData\Local\Google\Chrome\User Data\Default" --program "Google Chrome" --version "47" --family "Web Browser"


#### Command Line Options:

| Option          | Description                                             |
| --------------- | ------------------------------------------------------- |
| -c or --compare | Compare to the catalog of known databases. If -c points to a file, just that file will be compared. If -c points to a directory, the contents of that directory and all subdirectories will be scanned and compared. |
| -o or --output  | File name of the XLSX report (without extension) with match details. If -o is not given, the file will be named "SQUID Matches (YYYY-MM-DDTHH-MM-SS)". |
| -l or --learn   | Learn the structure of the indicated database(s) and add it to the catalog. If -l points to a file, just that single database will be added. If -l points to a directory, the contents of that directory will be scanned and added. Subdirectories will NOT be added. |
| -n or --name    | Name of the database from --learn. If -n is not given, the name of the SQLite file from -l will be entered in the catalog. |
| -f or --family  | Program family (Web Browser, Chat, etc.). Use with --learn. |
| -p or --program | Program the database is associated with. Use with --learn. |
| -v or --version | Version of the program the database is associated with. Use with --learn. |

#### Requirements:

Python 2 (squid.py uses Python 2 syntax such as `print` statements and `raw_input`)

XlsxWriter (pip install xlsxwriter)

--------------------------------------------------------------------------------
/catalog.sqlite:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/obsidianforensics/SQUID/ec5f2df54771094bcf54b98787b6f32927db110d/catalog.sqlite
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
xlsxwriter>=0.7.7

--------------------------------------------------------------------------------
/squid.py:
--------------------------------------------------------------------------------
#!/usr/bin/env python

import sqlite3
import os
import sys
import json
import time
import hashlib
import argparse
import textwrap
import xlsxwriter

__author__ = "Ryan Benson"
__version__ = "0.5.0"
__email__ = "ryan@obsidianforensics.com"


def dict_factory(cursor, row):
    # Return each row as a {column_name: value} dict instead of a tuple
    d = {}
    for idx, col in enumerate(cursor.description):
        d[col[0]] = row[idx]
    return d


class squid(object):
    def __init__(self, db_name=None, structure=None, path=None, program_family=None, program_name=None,
                 program_version=None, squid_id=None):
        self.db_name = db_name
        # Use None as the default so instances never share one mutable dict
        self.structure = structure if structure is not None else {}
        self.path = path
        self.program_family = program_family
        self.program_name = program_name
        self.program_version = program_version
        self.squid_id = squid_id

    def build_structure(self):

        self.structure = {}

        # Connect to SQLite db
        try:
            db = sqlite3.connect(self.path)
            cursor = db.cursor()
        except:
            return

        # Find the names of each table in the db
        try:
            cursor.execute("SELECT name FROM sqlite_master WHERE type='table'")
            tables = cursor.fetchall()
        except sqlite3.OperationalError:
            # print("\nSQLite3 error; is the file open? If so, please close it and try again.")
            return
        except:
            # Not a SQLite database (or unreadable); leave the structure empty
            # print "Couldn't open {}".format(self.path)
            return

        # For each table, find all the columns in it
        for table in tables:
            try:
                table_name = str(table[0])
                # Quote the table name so names with spaces, dashes, or SQL keywords don't break the PRAGMA
                cursor.execute('PRAGMA table_info("{}")'.format(table_name.replace('"', '""')))
                # Each row is (cid, name, type, notnull, dflt_value, pk)
                columns = cursor.fetchall()

                # Build a dict of tables -> columns -> column attributes
                self.structure[table_name] = {}
                for column in columns:
                    self.structure[table_name][str(column[1])] = {
                        'type': str(column[2]),
                        'not_null': column[3],
                        'default_value': column[4],
                    }
            except:
                return


def compare_dbs(candidate, known):
    # These values are used to compute how similar two databases are, based on how many tables, columns, and column
    # attributes are shared between them. These initial values are set to give table name matches the most weight at
    # 12, each column name match half that weight at 6, and all three attributes total a weight of 3 (1 each). These
    # weights can be modified as you see fit to tweak the comparison equation.
    TABLE_WEIGHT = 12
    COLUMN_WEIGHT = 6
    # SQUID considers three attributes for each column: type, default_value, and not_null.
    ATTRIBUTE_WEIGHT = 1
    attributes = ['type', 'default_value', 'not_null']
    # candidate_score counts the points the candidate earns; known_score counts every point that was available
    # (earned or not), so the similarity percentage is 100 * candidate_score / known_score.
    # Initialize both the scores to 0.
    candidate_score = 0
    known_score = 0

    # For every table in the candidate DB
    for candidate_table in candidate.structure:
        # if the candidate table name matches a table name in the known DB
        if candidate_table in known.structure:
            # increase the candidate's score
            candidate_score += TABLE_WEIGHT
            # for each column in the matching table in the candidate DB
            for candidate_column in candidate.structure[candidate_table]:
                # if the candidate column name matches a column in the known table
                if candidate_column in known.structure[candidate_table]:
                    # increase the candidate's score
                    candidate_score += COLUMN_WEIGHT
                    # for each attribute SQUID is tracking
                    for attribute in attributes:
                        # if the attribute value for the candidate column matches the known column
                        if candidate.structure[candidate_table][candidate_column][attribute] == \
                                known.structure[candidate_table][candidate_column][attribute]:
                            # increase the candidate score
                            candidate_score += ATTRIBUTE_WEIGHT
                        # increase the known score, regardless
                        known_score += ATTRIBUTE_WEIGHT

                else:
                    # increase the known score instead
                    known_score += COLUMN_WEIGHT

        # if the candidate table name doesn't match any known table name
        else:
            # increase the known score instead
            known_score += TABLE_WEIGHT
            # and also increase the known score for all the columns in that non-matching table
            for candidate_column in candidate.structure[candidate_table]:
                known_score += COLUMN_WEIGHT
            # TODO: also add the attribute weights (3 * ATTRIBUTE_WEIGHT per column) here?

    # add the points for each table and column in the known DB to the known score
    for known_table in known.structure:
        known_score += TABLE_WEIGHT
        for known_column in known.structure[known_table]:
            known_score += COLUMN_WEIGHT

    return candidate_score, known_score, "{:.1f}".format(100 * float(candidate_score) / known_score)


def learn_db(db_name, new_database_path, program_family, program_name, program_version):
    new_database = squid(db_name, path=new_database_path, program_family=program_family, program_name=program_name,
                         program_version=program_version)
    new_database.build_structure()
    new_database.db_name = os.path.split(new_database.db_name)[1]

    if new_database.structure != {}:
        # Connect to SQUID db (catalog.sqlite, next to this script)
        database_path = os.path.join(os.path.dirname(os.path.realpath(sys.argv[0])), 'catalog.sqlite')
        db = sqlite3.connect(database_path)
        with db:
            db.row_factory = dict_factory
            cursor = db.cursor()
            cursor.execute("CREATE TABLE IF NOT EXISTS known_databases("
                           "program_family TEXT,"
                           "program_name TEXT,"
                           "program_version TEXT,"
                           "db_name TEXT,"
                           "structure TEXT,"
                           "structure_md5 TEXT)")

            m = hashlib.md5()
            m.update(json.dumps(new_database.structure))

            # Check if we already have a database with this structure
            cursor.execute("SELECT * FROM known_databases WHERE structure_md5 = :candidate",
                           {'candidate': str(m.hexdigest())})
            matches = cursor.fetchall()

            # If there is only one match in the catalog, ask if we should add to it, make a new entry, or do nothing.
            if len(matches) == 1:
                existing_versions = json.loads(matches[0]['program_version'])
                existing_versions_friendly = ', '.join(map(str, existing_versions))
                print "\n - A database with this structure is already in the catalog."
                print " DB Name: {}".format(matches[0]['db_name'])
                print " Program: {}".format(matches[0]['program_name'])
                print " Version: {}".format(existing_versions_friendly)
                if program_version not in existing_versions:
                    update = raw_input(" Would you like to add this new version number to this existing entry in "
                                       "the\n SQUID catalog? \n (Y)es or (N)o? ")
                    # Slice instead of indexing so an empty answer doesn't raise IndexError
                    if update[:1].lower() == 'y':
                        new_versions = json.loads(matches[0]['program_version'])
                        new_versions.append(program_version)
                        try:
                            new_versions.sort(key=float)
                        except (TypeError, ValueError):
                            # Non-numeric versions (e.g. "47.0.2526") fall back to a plain sort
                            new_versions.sort()
                        cursor.execute("UPDATE known_databases SET program_version = ? WHERE (structure_md5 = ? "
                                       "AND program_name = ?)", (json.dumps(new_versions), matches[0]['structure_md5'],
                                                                matches[0]['program_name']))
                    elif update[:1].lower() == 'n':
                        add_new = raw_input(" Would you like to add this database as a new entry in the SQUID "
                                            "catalog? \n (Y)es or (N)o? ")
                        if add_new[:1].lower() == 'y':
                            version_list = [new_database.program_version]
                            cursor.execute("INSERT INTO known_databases (program_family, program_name, program_version,"
                                           " db_name, structure, structure_md5) VALUES (?, ?, ?, ?, ?, ?)",
                                           (new_database.program_family, new_database.program_name,
                                            json.dumps(version_list), new_database.db_name,
                                            json.dumps(new_database.structure), m.hexdigest()))
                else:
                    print textwrap.fill("Not adding as version {} is already associated with this catalog entry"
                                        .format(program_version), width=75, initial_indent=" ", subsequent_indent=" ")

            # If there is more than one match in the catalog, ask which entry to add the version to, or skip.
            elif len(matches) > 1:
                print "\n {} databases with this structure are already in the catalog.\n".format(len(matches))
                for count, match in enumerate(matches):
                    existing_versions = json.loads(match['program_version'])
                    if isinstance(existing_versions, list):
                        existing_versions = ', '.join(map(str, existing_versions))
                    print " {}. DB Name: {}".format(count + 1, match['db_name'])
                    print " Program: {}".format(match['program_name'])
                    print " Version: {}".format(existing_versions)

                update = raw_input(" Which entry would you like to add this new version number to? \n [Enter # or "
                                   "(N)one] ")
                try:
                    # Parse the whole answer (not just update[0]) so entries numbered 10 or higher can be chosen
                    choice = int(update)
                except ValueError:
                    # "N", "None", an empty answer, or other non-numeric input: skip this database
                    choice = None

                if choice is not None and 0 < choice <= len(matches):
                    chosen = matches[choice - 1]
                    new_versions = json.loads(chosen['program_version'])
                    if not isinstance(new_versions, list):
                        new_versions = [new_versions]
                    if program_version not in new_versions:
                        new_versions.append(program_version)
                        new_versions.sort()
                        cursor.execute("UPDATE known_databases SET program_version = ? WHERE (structure_md5 = ? "
                                       "AND program_name = ?)",
                                       (json.dumps(new_versions), chosen['structure_md5'], chosen['program_name']))
                elif choice is not None:
                    print " Invalid entry. Skipping this database."

            # No database with this structure is in the catalog yet, so add a new entry
            else:
                version_list = [new_database.program_version]
                cursor.execute("INSERT INTO known_databases (program_family, program_name, program_version, db_name, "
                               "structure, structure_md5) VALUES (?, ?, ?, ?, ?, ?)",
                               (new_database.program_family, new_database.program_name,
                                json.dumps(version_list), new_database.db_name,
                                json.dumps(new_database.structure), m.hexdigest()))
                print
                print textwrap.fill("- Learned {} from {}\n".format(db_name, new_database_path), width=75,
                                    initial_indent=" ", subsequent_indent=" ", replace_whitespace=False)


def learn_program(program_path, program_family, program_name, program_version):
    # Learn every database directly inside program_path (subdirectories are not descended into)
    listing = os.listdir(program_path)
    for potential_db in listing:
        learn_db(potential_db, os.path.join(program_path, potential_db), program_family, program_name, program_version)


def compare_to_known(candidate_db, squid_reference_database):
    top_three_matches = []
    short_columns = "{:>25} {:>5}% {:<25} {:<18}"

    def print_short_comparison(score, known_db, candidate_db_name):
        # Truncate a lightweight copy: shortening known_db itself would also shorten the names that
        # write_xlsx() later puts in the report. "or ''" guards against entries learned without a name.
        known_db = squid(db_name=known_db.db_name or '', program_name=known_db.program_name or '')
        if len(candidate_db_name) > 25:
            candidate_db_name = candidate_db_name[:23] + ".."
        if len(known_db.db_name) > 25:
            known_db.db_name = known_db.db_name[:23] + ".."
        if len(known_db.program_name) > 22:
            known_db.program_name = known_db.program_name[:20] + ".."
261 | print(short_columns.format(candidate_db_name, score, known_db.db_name, known_db.program_name)) 262 | 263 | def add_rank(rankings, score, known_squid): 264 | if len(rankings) < 3: 265 | rankings.append({'score': score, 'squid': known_squid}) 266 | rankings = sorted(rankings, key=lambda k: k['score'], reverse=True) 267 | else: 268 | if rankings[2]['score'] < score: 269 | rankings.pop() 270 | rankings.append({'score': score, 'squid': known_squid}) 271 | rankings = sorted(rankings, key=lambda k: k['score'], reverse=True) 272 | 273 | return rankings 274 | 275 | # Connect to SQUID db 276 | squid_db = sqlite3.connect(squid_reference_database) 277 | with squid_db: 278 | squid_db.row_factory = dict_factory 279 | cursor = squid_db.cursor() 280 | cursor.execute("SELECT db_name, structure, rowid AS squid_id, program_family, program_name, program_version " 281 | "FROM known_databases") 282 | for known_db in cursor: 283 | # Convert 'structure' from string to JSON 284 | known_db['structure'] = json.loads(known_db['structure']) 285 | # Create a squid from the database row 286 | known_squid = squid(**known_db) 287 | score = compare_dbs(candidate_db, known_squid) 288 | top_three_matches = add_rank(top_three_matches, float(score[2]), known_squid) 289 | 290 | # If the match is over 90%, just print 291 | if top_three_matches[0]['score'] > 90: 292 | print_short_comparison(top_three_matches[0]['score'], top_three_matches[0]['squid'], candidate_db.db_name) 293 | 294 | return top_three_matches 295 | 296 | 297 | def compare_each(known_db, dirname, names): 298 | for file_name in names: 299 | file_path = os.path.join(dirname, file_name) 300 | candidate = squid(db_name=file_name, path=file_path) 301 | candidate.build_structure() 302 | if candidate.structure != {}: 303 | top_three = compare_to_known(candidate, known_db) 304 | results.append({'file_name': file_name, 'file_path': file_path, 'top_three': top_three}) 305 | 306 | 307 | def write_xlsx(output): 308 | def 
friendly_version(version_json): 309 | version_list = json.loads(version_json) 310 | if isinstance(version_list, list): 311 | if len(version_list) > 1: 312 | return str(version_list[0]) + " - " + str(version_list[-1]) 313 | else: 314 | return version_list[0] 315 | else: 316 | return version_list 317 | workbook = xlsxwriter.Workbook(output + '.xlsx') 318 | w = workbook.add_worksheet('Matches') 319 | 320 | # Define cell formats 321 | title_header_format = workbook.add_format({'font_color': 'white', 'bg_color': 'gray', 'bold': 'true'}) 322 | center_header_format = workbook.add_format({'font_color': 'black', 'align': 'center', 'bg_color': 'gray', 323 | 'bold': 'true'}) 324 | header_format = workbook.add_format({'font_color': 'black', 'bg_color': 'gray', 'bold': 'true'}) 325 | black_percent_format = workbook.add_format({'font_color': 'black', 'num_format': '0.0%', 'left': 1}) 326 | 327 | # Title bar 328 | w.merge_range('A1:B1', "SQUID (v%s)" % __version__, title_header_format) 329 | w.merge_range('C1:G1', 'Match 1', center_header_format) 330 | w.merge_range('H1:L1', 'Match 2', center_header_format) 331 | w.merge_range('M1:Q1', 'Match 3', center_header_format) 332 | 333 | # Write column headers 334 | w.write(1, 0, "Name", header_format) 335 | w.write(1, 1, "Path", header_format) 336 | w.write(1, 2, "Match%", header_format) 337 | w.write(1, 3, "DB Name", header_format) 338 | w.write(1, 4, "Program Name", header_format) 339 | w.write(1, 5, "Version", header_format) 340 | w.write(1, 6, "Category", header_format) 341 | 342 | w.write(1, 7, "Match%", header_format) 343 | w.write(1, 8, "DB Name", header_format) 344 | w.write(1, 9, "Program Name", header_format) 345 | w.write(1, 10, "Version", header_format) 346 | w.write(1, 11, "Category", header_format) 347 | 348 | w.write(1, 12, "Match%", header_format) 349 | w.write(1, 13, "DB Name", header_format) 350 | w.write(1, 14, "Program Name", header_format) 351 | w.write(1, 15, "Version", header_format) 352 | w.write(1, 16, 
"Category", header_format) 353 | 354 | #Set column widths 355 | w.set_column('A:A', 25) # Name 356 | w.set_column('B:B', 35) # Path 357 | 358 | # Match 1 359 | w.set_column('C:C', 8, black_percent_format) # Match % 360 | w.set_column('D:D', 20) # DB Name 361 | w.set_column('E:E', 16) # Program Name 362 | w.set_column('F:F', 10) # Program Version 363 | w.set_column('G:G', 15) # Program Family 364 | 365 | # Match 2 366 | w.set_column('H:H', 8, black_percent_format) # Match % 367 | w.set_column('I:I', 20) # DB Name 368 | w.set_column('J:J', 16) # Program Name 369 | w.set_column('K:K', 10) # Program Version 370 | w.set_column('L:L', 15) # Program Family 371 | 372 | # Match 3 373 | w.set_column('M:M', 8, black_percent_format) # Match % 374 | w.set_column('N:N', 20) # DB Name 375 | w.set_column('O:O', 16) # Program Name 376 | w.set_column('P:P', 10) # Program Version 377 | w.set_column('Q:Q', 15) # Program Family 378 | 379 | print 380 | print textwrap.fill("Writing match details to \"{}.xlsx\".".format(output), width=75, 381 | initial_indent=" ", subsequent_indent=" ") 382 | row_number = 2 383 | for item in results: 384 | for counter, match in enumerate(item['top_three']): 385 | if item['top_three'][counter]['score'] == 0.0: 386 | item['top_three'][counter]['squid'].db_name = '-' 387 | item['top_three'][counter]['squid'].program_name = '-' 388 | item['top_three'][counter]['squid'].program_version = "[\"-\"]" 389 | item['top_three'][counter]['squid'].program_family = '-' 390 | w.write(row_number, 0, item['file_name']) 391 | w.write(row_number, 1, item['file_path']) 392 | w.write(row_number, 2, item['top_three'][0]['score'] / 100) 393 | w.write(row_number, 3, item['top_three'][0]['squid'].db_name) 394 | w.write(row_number, 4, item['top_three'][0]['squid'].program_name) 395 | w.write(row_number, 5, friendly_version(item['top_three'][0]['squid'].program_version)) 396 | w.write(row_number, 6, item['top_three'][0]['squid'].program_family) 397 | w.write(row_number, 7, 
                item['top_three'][1]['score'] / 100)
        w.write(row_number, 8, item['top_three'][1]['squid'].db_name)
        w.write(row_number, 9, item['top_three'][1]['squid'].program_name)
        w.write(row_number, 10, friendly_version(item['top_three'][1]['squid'].program_version))
        w.write(row_number, 11, item['top_three'][1]['squid'].program_family)
        w.write(row_number, 12, item['top_three'][2]['score'] / 100)
        w.write(row_number, 13, item['top_three'][2]['squid'].db_name)
        w.write(row_number, 14, item['top_three'][2]['squid'].program_name)
        w.write(row_number, 15, friendly_version(item['top_three'][2]['squid'].program_version))
        w.write(row_number, 16, item['top_three'][2]['squid'].program_family)

        row_number += 1

    # Formatting
    w.freeze_panes(2, 0)  # Freeze the title and header rows
    w.autofilter(1, 0, row_number - 1, 16)  # Filter from the header row to the last data row

    workbook.close()


def parse_args():
    description = textwrap.fill("SQUID (SQLite Unknown Identifier) is a tool that compares unknown SQLite databases "
                                "to a catalog of 'known' databases to find exact and near matches. Even if a "
                                "program updates and changes its database structure, there's a good chance SQUID will "
                                "be able to identify it.", width=75, initial_indent=" ", subsequent_indent=" ")
    usage1 = textwrap.fill("squid.py --compare c:\\carved_databases", width=75, initial_indent=" ")
    usage2 = textwrap.fill("squid.py --learn \"C:\\Users\\Ryan\\AppData\\Local\\Google\\Chrome\\User Data\\Default\" "
                           "--program \"Chrome\" --version \"47\" --family \"Web Browser\"", width=75,
                           initial_indent=" ", subsequent_indent=" ", replace_whitespace=True)
    pre = description + "\n\n Example Usage:\n" + usage1 + "\n" + usage2

    class MyParser(argparse.ArgumentParser):
        # On an argument error, print the full help text rather than only the usage line
        def error(self, message):
            sys.stderr.write('error: %s\n' % message)
            self.print_help()
            sys.exit(2)

    parser = MyParser(
        formatter_class=argparse.RawDescriptionHelpFormatter,
        description=pre)

    main_group = parser.add_mutually_exclusive_group(required=True)

    main_group.add_argument('-c', '--compare',
                            help='Compare to catalog of known databases. If -c points to a file, just that file will '
                                 'be compared. If -c points to a directory, the contents of that directory and all '
                                 'subdirectories will be scanned and compared.')

    main_group.add_argument('-l', '--learn',
                            help='Learn the structure of the indicated database(s) and add to catalog. If -l points to '
                                 'a file, just that single database will be added. If -l points to a directory, the '
                                 'contents of that directory will be scanned and added. Subdirectories will NOT '
                                 'be added.')
    parser.add_argument('-n', '--name',
                        help='Name of the database from --learn. If -n is not given, the name of the SQLite '
                             'file from -l will be entered in the catalog.')
    parser.add_argument('-f', '--family',
                        help='Program family (Web Browser, Chat, etc.). Use with --learn.')
    parser.add_argument('-p', '--program',
                        help='Program the database is associated with. Use with --learn.')
    parser.add_argument('-v', '--version',
                        help='Version of the program the database is associated with. Use with --learn.')
    parser.add_argument('-o', '--output',
                        help='File name of XLSX report (without extension) with match details. If -o is not given, '
                             'the file will be named "SQUID Matches (YYYY-MM-DDTHH-MM-SS)".')

    args = vars(parser.parse_args())
    if not args['name']:
        args['name'] = args['learn']
    if not args['name'] and not args['learn']:
        args['name'] = args['compare']
    if not args['output']:
        args['output'] = "SQUID Matches ({})".format(time.strftime('%Y-%m-%dT%H-%M-%S'))
    return args


def main():
    args = parse_args()

    print
    print '-' * 78
    print " SQUID v{} - SQLite Unknown Identifier".format(__version__)
    print '-' * 78 + '\n'

    if args['compare']:
        global results
        results = []
        short_columns = "{:>25} {:>5}% {:<25} {:<18}"

        if os.path.isdir(str(args['compare']).rstrip(os.sep)):
            print textwrap.fill("Scanning {} and any subdirectories for SQLite DBs.\n".format(args['compare']),
                                width=75, initial_indent=" ", subsequent_indent=" ")
            print "\n"
            print textwrap.fill("Below are any high-confidence (90+%) matches; a complete list of the top three matches"
                                " for each SQLite DB is in \"{}.xlsx\".".format(args['output']), width=75,
                                initial_indent=" ", subsequent_indent=" ")
            print "\n"
            print '-' * 78
            print(short_columns.format('Candidate SQLite DB', 'Match', 'Known DB Name', 'Known Program'))
            print '-' * 78
            os.path.walk(args['compare'], compare_each, 'catalog.sqlite')
            print '-' * 78
            write_xlsx(args['output'])
        else:
            print "Comparing {} to known SQLite DBs.\n".format(args['compare'])
            print '-' * 78
            print(short_columns.format('Candidate SQLite DB', 'Match', 'DB Name', 'Program'))
            print '-' * 78
            candidate_db = squid(args['name'], path=args['compare'])
            candidate_db.build_structure()
            catalog_path = os.path.join(os.path.dirname(os.path.realpath(sys.argv[0])), 'catalog.sqlite')
            compare_to_known(candidate_db, catalog_path)
            print '-' * 78 + '\n'

    elif args['learn']:
        if os.path.isdir(str(args['learn']).rstrip(os.sep)):
            print textwrap.fill("Scanning {} for SQLite DBs.\n".format(args['learn']),
                                width=75, initial_indent=" ", subsequent_indent=" ")
            print
            learn_program(args['learn'], args['family'], args['program'], args['version'])
        else:
            print textwrap.fill("Learning structure of {}.\n".format(args['learn']),
                                width=75, initial_indent=" ", subsequent_indent=" ")
            print
            learn_db(args['name'], args['learn'], args['family'], args['program'], args['version'])


if __name__ == "__main__":
    main()

--------------------------------------------------------------------------------
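squid.py is Python 2 code: it uses `print` statements and `os.path.walk`, which was removed in Python 3. The block below is a minimal sketch of Python 3 equivalents for two pieces of this section, not part of SQUID itself. `friendly_version` mirrors the nested helper in `write_xlsx`, except that it always returns a string (an assumption made for consistent spreadsheet cells). `iter_sqlite_files` is a hypothetical `os.walk`-based scanner that picks out candidate databases by their 16-byte `SQLite format 3\0` file header; how SQUID's own `compare_each` decides which files to compare is not shown here, so the header check is an assumption.

```python
import json
import os


def friendly_version(version_json):
    """Python 3 sketch of SQUID's friendly_version helper.

    A JSON list with several entries becomes "first - last", a one-element
    list yields its single entry, and any other JSON value is returned as-is.
    Assumption: unlike the original, every result is converted to str.
    """
    versions = json.loads(version_json)
    if isinstance(versions, list):
        if len(versions) > 1:
            return "{} - {}".format(versions[0], versions[-1])
        return str(versions[0])
    return str(versions)


# Every SQLite 3 database file begins with this 16-byte header string.
SQLITE_HEADER = b"SQLite format 3\x00"


def iter_sqlite_files(top):
    """Hypothetical os.walk-based replacement for the os.path.walk call.

    Yields every path under `top`, including subdirectories, whose first
    16 bytes match the SQLite header. This is not SQUID's compare_each logic.
    """
    for dirpath, _dirnames, filenames in os.walk(top):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as handle:
                    if handle.read(16) == SQLITE_HEADER:
                        yield path
            except OSError:
                # Skip files that cannot be opened (permissions, broken links, etc.)
                continue
```

For example, `friendly_version('["45", "46", "47"]')` returns `"45 - 47"`, and `for path in iter_sqlite_files("carved_databases"): ...` would replace the `os.path.walk(...)` callback with a plain loop. Checking the header rather than the file extension matters for carved or renamed databases, which often lack a `.sqlite` or `.db` suffix.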