├── .gitignore ├── LICENSE ├── README.md ├── ccl_chromium_reader ├── __init__.py ├── ccl_chromium_cache.py ├── ccl_chromium_filesystem.py ├── ccl_chromium_history.py ├── ccl_chromium_indexeddb.py ├── ccl_chromium_localstorage.py ├── ccl_chromium_notifications.py ├── ccl_chromium_profile_folder.py ├── ccl_chromium_sessionstorage.py ├── ccl_chromium_snss2.py ├── ccl_shared_proto_db_downloads.py ├── common.py ├── download_common.py ├── profile_folder_protocols.py ├── serialization_formats │ ├── __init__.py │ ├── ccl_blink_value_deserializer.py │ ├── ccl_easy_chromium_pickle.py │ ├── ccl_protobuff.py │ └── ccl_v8_value_deserializer.py └── storage_formats │ ├── __init__.py │ └── ccl_leveldb.py ├── pyproject.toml ├── requirements.txt └── tools_and_utilities ├── Chromium_dump_local_storage.py ├── Chromium_dump_session_storage.py ├── benchmark.py ├── ccl_chrome_audit.py ├── dump_indexeddb_details.py ├── dump_leveldb.py └── extras ├── make_many_indexeddb_databases.html ├── make_test_indexeddb.html └── make_webstorage.html /.gitignore: -------------------------------------------------------------------------------- 1 | /*.bin 2 | /__pycache__ 3 | /__ignore__ 4 | /.idea 5 | env/ 6 | dist 7 | *.egg-info -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Copyright 2020, CCL Forensics 2 | 3 | Permission is hereby granted, free of charge, to any person obtaining a copy of 4 | this software and associated documentation files (the "Software"), to deal in 5 | the Software without restriction, including without limitation the rights to 6 | use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies 7 | of the Software, and to permit persons to whom the Software is furnished to do 8 | so, subject to the following conditions: 9 | 10 | The above copyright notice and this permission notice shall be included in all 11 | copies or substantial portions of the Software. 12 | 13 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 14 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 15 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 16 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 17 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 18 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 19 | SOFTWARE. -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # ccl_chromium_reader 2 | This repository contains a package of (sometimes partial) 3 | re-implementations of the technologies used by Chrome/Chromium/Chrome-esque 4 | applications to store data in a range of data-stores in Python. These libraries 5 | provide programmatic access to these data-stores with a digital forensics slant 6 | (e.g. for most artefacts, offsets or IDs for the data are provided so that they 7 | can be located and manually checked). 8 | 9 | The technologies supported are: 10 | * Snappy decompression 11 | * LevelDB 12 | * Protobuf 13 | * Pickles 14 | * V8 object deserialization 15 | * Blink object deserialization 16 | * IndexedDB 17 | * Web Storage (Local Storage and Session Storage) 18 | * Cache (both Block File and Simple formats) 19 | * SNSS Session files (partial support) 20 | * FileSystem API 21 | * Notifications API (Platform Notifications) 22 | * Downloads (from shared_proto_db) 23 | * History 24 | 25 | Additionally, there are a number of utility scripts included such as: 26 | * `ccl_chromium_cache.py` - using the cache library as a command line tool dumps 27 | the cache and all HTTP header information. 28 | * `ccl_chrome_audit.py` - a tool which can be used to scan the data-stored supported 29 | by the included libraries, plus a couple more, for records related to a host - 30 | designed as a research tool into data stored by web apps. 31 | 32 | 33 | ## Python Versions 34 | The code in this library was written and tested using Python 3.10. It *should* work 35 | with 3.9, but uses language features which were not present in earlier versions. 36 | Some parts of the library will probably work OK going back a few versions, but if 37 | you report bugs related to any version before 3.10, the first question will be: can 38 | you upgrade to 3.10? 39 | 40 | ## A Note On Requirements 41 | This repository contains a `requirements.txt` in the pip format. Other than `Brotli` 42 | The dependencies listed are only required for the `ccl_chrome_audit.py` script or 43 | when using the `ccl_chromium_cache` module as a script for dumping the cache; the 44 | libraries work using only the other scripts in this repository and the Python 45 | standard library. 46 | 47 | ## Documentation 48 | The documentation in the libraries is currently sparser than ideal, but some 49 | recent work has been undertaken to add more usage strings and fill in some gaps 50 | in the type-hints. We welcome pull requests to fill in gaps in the documentation. 51 | 52 | ## ccl_chrome_audit 53 | This script audits multiple data stores in a Chrom(e|ium) profile folder based on 54 | a fragment (regex) of a host name. It is designed to aid in research into web apps 55 | by quickly highlighting what data related to that domain is stored where (also of 56 | us with Electron apps etc.) 57 | 58 | ### Caveats 59 | At the moment, the script is designed primarily for use on Windows and on the 60 | host where the data was populated (this is because of the Cookie decryption being 61 | achieved using DPAPI). 62 | 63 | ### Usage 64 | ``` 65 | ccl_chrome_audit.py [cache folder (for mobile)] 66 | ``` 67 | 68 | ### Current Supported Data Sources 69 | * Bookmarks 70 | * History 71 | * Downloads (from History) 72 | * Downloads (from shared_proto_db) 73 | * Favicons 74 | * Cache 75 | * Cookies 76 | * Local Storage 77 | * Session Storage 78 | * IndexedDb 79 | * File System API 80 | * Platform Notifications 81 | * Logins 82 | * Sessions (SNSS) 83 | 84 | 85 | ## ChromiumProfileFolder 86 | The `ChromiumProfileFolder` class is intended to act as a convenient entry-point to 87 | much of the useful functionality in the package. It performs on-demand loading of 88 | data, so the "start-up cost" of using this object over the individual modules 89 | is near-zero, but with the advantage of better searching and filtering 90 | functionality built in and an easier interface to bring together data from these 91 | different sources. 92 | 93 | In this version `ChromiumProfileFolder` supports the following data-stores: 94 | * History 95 | * Cache 96 | * IndexedDB 97 | * Local Storage 98 | * Session Storage 99 | 100 | To use the object, simply pass the path of the profile folder into the constructor 101 | (the object supports the context manager interface): 102 | 103 | ```python 104 | import pathlib 105 | from ccl_chromium_reader import ChromiumProfileFolder 106 | 107 | profile_path = pathlib.Path("profile path goes here") 108 | 109 | with ChromiumProfileFolder(profile_path) as profile: 110 | ... # do things with the profile 111 | ``` 112 | 113 | Most of the methods of the `ChromiumProfileFolder` object which retrieve data can 114 | search/filter through a `KeySearch` interface which in essence is on of: 115 | * a `str`, in which case the search will try to exactly match the value 116 | * a collection of `str` (e.g., `list` or `tuple`), in which case the search will 117 | try to exactly match one of the values contained therein 118 | * a `re.pattern` in which case the search attempts to match the pattern anywhere 119 | in the search (same as `re.search`) 120 | * a function which takes a `str` and returns a `bool` indicating whether it's a 121 | match. 122 | 123 | ```python 124 | import re 125 | import pathlib 126 | from ccl_chromium_reader import ChromiumProfileFolder 127 | 128 | profile_path = pathlib.Path("profile path goes here") 129 | 130 | with ChromiumProfileFolder(profile_path) as profile: 131 | # Match one of two possible hosts exactly, then a regular expression for the key 132 | for ls_rec in profile.iter_local_storage( 133 | storage_key=["http://not-a-real-url1.com", "http://not-a-real-url2.com"], 134 | script_key=re.compile(r"message\d{1,3}?-text")): 135 | print(ls_rec.value) 136 | 137 | # Match all urls which end with "&read=1" 138 | for hist_rec in profile.iterate_history_records(url=lambda x: x.endswith("&read=1")): 139 | print(hist_rec.title, hist_rec.url) 140 | 141 | ``` 142 | 143 | ## IndexedDB 144 | The `ccl_chromium_indexeddb.py` library processes IndexedDB data found in Chrome et al. 145 | 146 | ### Blog 147 | Read a blog on the subject here: https://www.cclsolutionsgroup.com/post/indexeddb-on-chromium 148 | 149 | ### Caveats 150 | There is a fair amount of work yet to be done in terms of documentation, but 151 | the modules should be fine for pulling data out of IndexedDB, with the following 152 | caveats: 153 | 154 | #### LevelDB deleted data 155 | The LevelDB module will spit out live and deleted/old versions of records 156 | indiscriminately; it's possible to differentiate between them with some 157 | work, but that hasn't really been baked into the modules as they currently 158 | stand. So you are getting deleted data "for free" currently...whether you 159 | want it or not. 160 | 161 | #### Blink data types 162 | I am fairly satisfied that all the possible V8 object types are accounted for 163 | (but I'm happy to be shown otherwise and get that fixed of course!), but it 164 | is likely that the hosted Blink objects aren't all there yet; so if you hit 165 | upon an error coming from inside ccl_blink_value_deserializer and can point 166 | me towards test data, I'd be very thankful! 167 | 168 | #### Cyclic references 169 | It is noted in the V8 source that recursive referencing is possible in the 170 | serialization, we're not yet accounting for that so if Python throws a 171 | `RecursionError` that's likely what you're seeing. The plan is to use a 172 | similar approach to ccl_bplist where the collection types are subclassed and 173 | do Just In Time resolution of the items, but that isn't done yet. 174 | 175 | ## Using the modules 176 | There are two methods for accessing records - a more pythonic API using a set of 177 | wrapper objects and a raw API which doesn't mask the underlying workings. There is 178 | unlikely to be much benefit to using the raw API in most cases, so the wrapper objects 179 | are recommended unless you have a compelling reason otherwise. 180 | 181 | ### Wrapper API 182 | ```python 183 | import sys 184 | from ccl_chromium_reader import ccl_chromium_indexeddb 185 | 186 | # assuming command line arguments are paths to the .leveldb and .blob folders 187 | leveldb_folder_path = sys.argv[1] 188 | blob_folder_path = sys.argv[2] 189 | 190 | # open the indexedDB: 191 | wrapper = ccl_chromium_indexeddb.WrappedIndexDB(leveldb_folder_path, blob_folder_path) 192 | 193 | # You can check the databases present using `wrapper.database_ids` 194 | 195 | # Databases can be accessed from the wrapper in a number of ways: 196 | db = wrapper[2] # accessing database using id number 197 | db = wrapper["MyTestDatabase"] # accessing database using name (only valid for single origin indexedDB instances) 198 | db = wrapper["MyTestDatabase", "file__0@1"] # accessing the database using name and origin 199 | # NB using name and origin is likely the preferred option in most cases 200 | 201 | # The wrapper object also supports checking for databases using `in` 202 | 203 | # You can check for object store names using `db.object_store_names` 204 | 205 | # Object stores can be accessed from the database in a number of ways: 206 | obj_store = db[1] # accessing object store using id number 207 | obj_store = db["store"] # accessing object store using name 208 | 209 | # Records can then be accessed by iterating the object store in a for-loop 210 | for record in obj_store.iterate_records(): 211 | print(record.user_key) 212 | print(record.value) 213 | 214 | # if this record contained a FileInfo object somewhere linking 215 | # to data stored in the blob dir, we could access that data like 216 | # so (assume the "file" key in the record value is our FileInfo): 217 | with record.get_blob_stream(record.value["file"]) as f: 218 | file_data = f.read() 219 | 220 | # By default, any errors in decoding records will bubble an exception 221 | # which might be painful when iterating records in a for-loop, so either 222 | # passing True into the errors_to_stdout argument and/or by passing in an 223 | # error handler function to bad_deserialization_data_handler, you can 224 | # perform logging rather than crashing: 225 | 226 | for record in obj_store.iterate_records( 227 | errors_to_stdout=True, 228 | bad_deserializer_data_handler= lambda k,v: print(f"error: {k}, {v}")): 229 | print(record.user_key) 230 | print(record.value) 231 | ``` 232 | 233 | ### Raw access API 234 | ```python 235 | import sys 236 | from ccl_chromium_reader import ccl_chromium_indexeddb 237 | 238 | # assuming command line arguments are paths to the .leveldb and .blob folders 239 | leveldb_folder_path = sys.argv[1] 240 | blob_folder_path = sys.argv[2] 241 | 242 | # open the database: 243 | db = ccl_chromium_indexeddb.IndexedDb(leveldb_folder_path, blob_folder_path) 244 | 245 | # there can be multiple databases, so we need to iterate through them (NB 246 | # DatabaseID objects contain additional metadata, they aren't just ints): 247 | for db_id_meta in db.global_metadata.db_ids: 248 | # and within each database, there will be multiple object stores so we 249 | # will need to know the maximum object store number (this process will be 250 | # cleaned up in future releases): 251 | max_objstore_id = db.get_database_metadata( 252 | db_id_meta.dbid_no, 253 | ccl_chromium_indexeddb.DatabaseMetadataType.MaximumObjectStoreId) 254 | 255 | # if the above returns None, then there are no stores in this db 256 | if max_objstore_id is None: 257 | continue 258 | 259 | # there may be multiple object stores, so again, we iterate through them 260 | # this time based on the id number. Object stores start at id 1 and the 261 | # max_objstore_id is inclusive: 262 | for obj_store_id in range(1, max_objstore_id + 1): 263 | # now we can ask the indexeddb wrapper for all records for this db 264 | # and object store: 265 | for record in db.iterate_records(db_id_meta.dbid_no, obj_store_id): 266 | print(f"key: {record.user_key}") 267 | print(f"key: {record.value}") 268 | 269 | # if this record contained a FileInfo object somewhere linking 270 | # to data stored in the blob dir, we could access that data like 271 | # so (assume the "file" key in the record value is our FileInfo): 272 | with record.get_blob_stream(record.value["file"]) as f: 273 | file_data = f.read() 274 | ``` 275 | 276 | ## Local Storage 277 | `ccl_chromium_localstorage` contains functionality to read the Local Storage data from 278 | a Chromium/Chrome profile folder. 279 | 280 | ### Blog 281 | Read a blog on the subject here: https://www.cclsolutionsgroup.com/post/chromium-session-storage-and-local-storage 282 | 283 | ### Using the module 284 | 285 | An example showing how to iterate all records, grouped by host is shown below: 286 | ```python 287 | import sys 288 | import pathlib 289 | from ccl_chromium_reader import ccl_chromium_localstorage 290 | 291 | level_db_in_dir = pathlib.Path(sys.argv[1]) 292 | 293 | # Create the LocalStoreDb object which is used to access the data 294 | with ccl_chromium_localstorage.LocalStoreDb(level_db_in_dir) as local_storage: 295 | for storage_key in local_storage.iter_storage_keys(): 296 | print(f"Getting records for {storage_key}") 297 | 298 | for record in local_storage.iter_records_for_storage_key(storage_key): 299 | # we can attempt to associate this record with a batch, which may 300 | # provide an approximate timestamp (withing 5-60 seconds) for this 301 | # record. 302 | batch = local_storage.find_batch(record.leveldb_seq_number) 303 | timestamp = batch.timestamp if batch else None 304 | print(record.leveldb_seq_number, record.script_key, record.value, sep="\t") 305 | 306 | ``` 307 | 308 | ## Session Storage 309 | `ccl_chromium_sessionstorage` contains functionality to read the Session Storage data from 310 | a Chromium/Chrome profile folder. 311 | 312 | ### Blog 313 | Read a blog on the subject here: https://www.cclsolutionsgroup.com/post/chromium-session-storage-and-local-storage 314 | 315 | ### Using the module 316 | An example showing how to iterate all records, grouped by host is shown below: 317 | 318 | ```python 319 | import sys 320 | import pathlib 321 | from ccl_chromium_reader import ccl_chromium_sessionstorage 322 | 323 | level_db_in_dir = pathlib.Path(sys.argv[1]) 324 | 325 | # Create the SessionStoreDb object which is used to access the data 326 | with ccl_chromium_sessionstorage.SessionStoreDb(level_db_in_dir) as session_storage: 327 | for host in session_storage.iter_hosts(): 328 | print(f"Getting records for {host}") 329 | for record in session_storage.iter_records_for_host(host): 330 | print(record.leveldb_sequence_number, record.key, record.value) 331 | 332 | ``` 333 | 334 | ## Cache 335 | `ccl_chromium_cache` contains functionality for reading Chromium cache data (both 336 | block file and simple cache formats). It can be used to programmatically access 337 | cache data and metadata (including http headers). 338 | 339 | ### CLI 340 | Executing the module as a script allows you to dump a cache (either format) and 341 | collate all metadata into a csv file. 342 | 343 | ``` 344 | USAGE: ccl_chromium_cache.py 345 | 346 | ``` 347 | 348 | ### Using the module 349 | The main() function (which provides the CLI) in the module shows the full 350 | process of detecting the cache type, reading data and metadata from the cache. 351 | 352 | 353 | 354 | 355 | -------------------------------------------------------------------------------- /ccl_chromium_reader/__init__.py: -------------------------------------------------------------------------------- 1 | from .ccl_chromium_profile_folder import ChromiumProfileFolder 2 | from .common import KeySearch 3 | -------------------------------------------------------------------------------- /ccl_chromium_reader/ccl_chromium_filesystem.py: -------------------------------------------------------------------------------- 1 | """ 2 | Copyright 2022, CCL Forensics 3 | 4 | Permission is hereby granted, free of charge, to any person obtaining a copy of 5 | this software and associated documentation files (the "Software"), to deal in 6 | the Software without restriction, including without limitation the rights to 7 | use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies 8 | of the Software, and to permit persons to whom the Software is furnished to do 9 | so, subject to the following conditions: 10 | 11 | The above copyright notice and this permission notice shall be included in all 12 | copies or substantial portions of the Software. 13 | 14 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 15 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 16 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 17 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 18 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 19 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 20 | SOFTWARE. 21 | """ 22 | 23 | __version__ = "0.8" 24 | __description__ = "Library for reading Chrome/Chromium File System API data" 25 | __contact__ = "Alex Caithness" 26 | 27 | import dataclasses 28 | import os 29 | import sys 30 | import pathlib 31 | import datetime 32 | import re 33 | import typing 34 | import types 35 | import functools 36 | 37 | from .storage_formats import ccl_leveldb 38 | from .serialization_formats.ccl_easy_chromium_pickle import EasyPickleIterator 39 | 40 | 41 | @dataclasses.dataclass(frozen=True) 42 | class FileInfo: 43 | _owner: "FileSystem" = dataclasses.field(repr=False) 44 | origin: str 45 | folder_id: str 46 | is_persistent: bool 47 | seq_no: int 48 | file_id: int 49 | parent_id: int 50 | data_path: str 51 | name: str 52 | timestamp: datetime.datetime 53 | 54 | @classmethod 55 | def from_pickle( 56 | cls, owner: "FileSystem", origin: str, folder_id: str, is_persistent: bool, 57 | seq_no: int, file_id: int, data: bytes): 58 | with EasyPickleIterator(data) as reader: 59 | parent_id = reader.read_uint64() 60 | data_path = reader.read_string() 61 | name = reader.read_string() 62 | timestamp = reader.read_datetime() 63 | 64 | return cls(owner, origin, folder_id, is_persistent, seq_no, file_id, parent_id, data_path, name, timestamp) 65 | 66 | def get_local_storage_path(self) -> pathlib.Path: 67 | return self._owner.get_local_path_for_fileinfo(self) 68 | 69 | @property 70 | def is_stored_locally(self) -> bool: 71 | return self.get_local_storage_path().exists() 72 | 73 | 74 | class OriginStorage: 75 | def __init__( 76 | self, 77 | owner: "FileSystem", 78 | origin: str, 79 | folder_id: str, 80 | persistent_files: typing.Optional[typing.Mapping[int, FileInfo]], 81 | persistent_deleted_file_ids: typing.Optional[typing.Iterable[int]], 82 | temporary_files: typing.Optional[typing.Mapping[int, FileInfo]], 83 | temporary_deleted_file_ids: typing.Optional[typing.Iterable[int]]): 84 | self._owner = owner 85 | self._origin = origin 86 | self._folder_id = folder_id 87 | self._persistent_files = types.MappingProxyType(persistent_files or {}) 88 | self._persistent_deleted_file_ids = set(persistent_deleted_file_ids or []) 89 | self._temporary_files = types.MappingProxyType(temporary_files or {}) 90 | self._temporary_deleted_file_ids = set(temporary_deleted_file_ids or []) 91 | 92 | self._persistent_file_listing_lookup = types.MappingProxyType(self._make_file_listing_lookup(True)) 93 | self._temporary_file_listing_lookup = types.MappingProxyType(self._make_file_listing_lookup(False)) 94 | 95 | self._file_listing_lookup_reverse: dict[str, list[str]] = {} 96 | for k, v in self._persistent_file_listing_lookup.items(): 97 | self._file_listing_lookup_reverse.setdefault(v, []) 98 | self._file_listing_lookup_reverse[v].append(f"p_{k}") 99 | 100 | for k, v in self._temporary_file_listing_lookup.items(): 101 | self._file_listing_lookup_reverse.setdefault(v, []) 102 | self._file_listing_lookup_reverse[v].append(f"t_{k}") 103 | self._file_listing_lookup_reverse = types.MappingProxyType( 104 | self._file_listing_lookup_reverse) 105 | 106 | def _make_file_listing_lookup(self, persistent=True) -> dict[int, str]: 107 | files = self._persistent_files if persistent else self._temporary_files 108 | file_listing_lookup: dict[int, str] = {} 109 | 110 | for file_info in files.values(): 111 | if not file_info.data_path: 112 | continue 113 | path_parts = [] 114 | current = file_info 115 | while current.file_id != current.parent_id: 116 | path_parts.insert(0, current.name) 117 | current = files.get(current.parent_id, "") 118 | 119 | path_parts.insert(0, "p" if persistent else "t") 120 | # path_parts.insert(0, self._origin) 121 | # path_parts.insert(0, "") 122 | file_listing_lookup[file_info.file_id] = "/".join(path_parts) 123 | 124 | return file_listing_lookup 125 | 126 | def get_file_listing(self) -> typing.Iterable[tuple[str, FileInfo]]: 127 | for file_id in self._persistent_file_listing_lookup: 128 | yield self._persistent_file_listing_lookup[file_id], self._persistent_files[file_id] 129 | for file_id in self._temporary_file_listing_lookup: 130 | yield self._temporary_file_listing_lookup[file_id], self._temporary_files[file_id] 131 | 132 | def _get_file_info_from_path(self, path) -> typing.Iterable[FileInfo]: 133 | file_keys = self._file_listing_lookup_reverse[str(path)] 134 | for key in file_keys: 135 | p_or_t, file_id = key.split("_", 1) 136 | yield self._persistent_files[int(file_id)] if p_or_t == "p" else self._temporary_files[int(file_id)] 137 | 138 | 139 | class FileSystem: 140 | def __init__(self, path: typing.Union[os.PathLike, str]): 141 | """ 142 | Constructor for the File System API access (the entry point for most processing scripts) 143 | :param path: the path of the File System API storage 144 | """ 145 | self._root = pathlib.Path(path) 146 | self._origins = self._get_origins() 147 | self._origins_reverse = {} 148 | for origin, folders in self._origins.items(): 149 | for folder in folders: 150 | self._origins_reverse[folder] = origin 151 | 152 | def _get_origins(self) -> dict[str, tuple]: 153 | result = {} 154 | with ccl_leveldb.RawLevelDb(self._root / "Origins") as db: 155 | for record in db.iterate_records_raw(): 156 | if record.state != ccl_leveldb.KeyState.Live: 157 | continue 158 | if record.user_key.startswith(b"ORIGIN:"): 159 | _, origin = record.user_key.split(b":", 1) 160 | origin = origin.decode("utf-8") 161 | result.setdefault(origin, []) 162 | result[origin].append(record.value.decode("utf-8")) 163 | 164 | return {k: tuple(v) for (k, v) in result.items()} 165 | 166 | def get_origins(self) -> typing.Iterable[str]: 167 | """ 168 | Yields the origins for this File System API 169 | :return: Yields the origins in this File System API 170 | """ 171 | yield from self._origins.keys() 172 | 173 | def get_folders_for_origin(self, origin) -> tuple[str, ...]: 174 | """ 175 | Returns the folder ids which are used by the origin (host/domain) 176 | :param origin: 177 | :return: a tuple of strings which are the folder id(s) for this origin 178 | """ 179 | return self._origins[origin] 180 | 181 | def get_storage_for_folder(self, folder_id) -> OriginStorage: 182 | """ 183 | Get the OriginStorage object for the folder 184 | :param folder_id: a folder id (such as those returned by get_folders_for_origin) 185 | :return: OriginStorage for the folder_id 186 | """ 187 | return self._build_file_graph(folder_id) 188 | 189 | @functools.cache 190 | def _build_file_graph(self, folder_id) -> OriginStorage: 191 | persistent_files: typing.Optional[dict[int, FileInfo]] = {} 192 | persistent_deleted_files: typing.Optional[dict[int, int]] = {} # file_id: seq_no 193 | temporary_files: typing.Optional[dict[int, FileInfo]] = {} 194 | temporary_deleted_files: typing.Optional[dict[int, int]] = {} # file_id: seq_no 195 | 196 | origin = self._origins_reverse[folder_id] 197 | 198 | for p_or_t in ("p", "t"): 199 | db_path = self._root / folder_id / p_or_t / "Paths" 200 | if not db_path.exists(): 201 | continue 202 | files: dict[int, FileInfo] = persistent_files if p_or_t == "p" else temporary_files 203 | deleted_files: dict[int, int] = persistent_deleted_files if p_or_t == "p" else temporary_deleted_files 204 | with ccl_leveldb.RawLevelDb(db_path) as db: 205 | # TODO: we can infer file modified (created?) times using the parent's modified times maybe 206 | for record in db.iterate_records_raw(): 207 | if re.match(b"[0-9]+", record.user_key) is not None: 208 | if record.state == ccl_leveldb.KeyState.Live: 209 | file_id = int(record.user_key.decode("utf-8")) 210 | file_info = FileInfo.from_pickle( 211 | self, origin, folder_id, p_or_t == "p", record.seq, file_id, record.value) 212 | 213 | # undelete a file if more recent than deletion record: 214 | if file_id in deleted_files and deleted_files[file_id] < file_info.seq_no: 215 | deleted_files.pop(file_id) 216 | 217 | if old_file_info := files.get(file_id): 218 | if old_file_info.seq_no < file_info.seq_no: 219 | # TODO: any reason to keep older records (other than for the timestamps as above?) 220 | files[file_id] = file_info 221 | else: 222 | files[file_id] = file_info 223 | else: 224 | if old_file_info := files.get(file_id): 225 | if old_file_info.seq_no < record.seq: 226 | deleted_files[file_id] = record.seq 227 | else: 228 | deleted_files[file_id] = record.seq 229 | 230 | return OriginStorage( 231 | self, origin, folder_id, 232 | persistent_files, persistent_deleted_files.keys(), 233 | temporary_files, temporary_deleted_files.keys()) 234 | 235 | def get_local_path_for_fileinfo(self, file_info: FileInfo): 236 | """ 237 | Returns the path on the local file system for the FilInfo object 238 | :param file_info: 239 | :return: the path on the local file system for the FilInfo object 240 | """ 241 | path = self._root / file_info.folder_id / ("p" if file_info.is_persistent else "t") / file_info.data_path 242 | return path 243 | 244 | def get_file_stream_for_fileinfo(self, file_info: FileInfo) -> typing.Optional[typing.BinaryIO]: 245 | """ 246 | Returns a file object from the local file system for the FilInfo object 247 | :param file_info: 248 | :return: a file object from the local file system for the FilInfo object 249 | """ 250 | path = self.get_local_path_for_fileinfo(file_info) 251 | if path.exists(): 252 | return path.open("rb") 253 | return None 254 | 255 | 256 | class FileSystemUtils: 257 | @staticmethod 258 | def print_origin_to_folder(fs_folder: typing.Union[os.PathLike, str]) -> None: 259 | """ 260 | utility function to print out origins in the File System API to their folders 261 | :param fs_folder: the path of the File System API storage 262 | :return: None 263 | """ 264 | fs = FileSystem(fs_folder) 265 | for origin in sorted(fs.get_origins()): 266 | print(f"{origin}: {','.join(fs.get_folders_for_origin(origin))}") 267 | 268 | @staticmethod 269 | def print_folder_to_origin(fs_folder: typing.Union[os.PathLike, str]) -> None: 270 | """ 271 | utility function to print out folders in the File System API to their Origin 272 | :param fs_folder: the path of the File System API storage 273 | :return: None 274 | """ 275 | fs = FileSystem(fs_folder) 276 | result = {} 277 | for origin in fs.get_origins(): 278 | for folder in fs.get_folders_for_origin(origin): 279 | result[folder] = origin 280 | 281 | for folder in sorted(result.keys()): 282 | print(f"{folder}: {result[folder]}") 283 | 284 | @staticmethod 285 | def print_all_files(fs_folder: typing.Union[os.PathLike, str]) -> None: 286 | """ 287 | utility function to print out all files in the File System API 288 | :param fs_folder: the path of the File System API storage 289 | :return: None 290 | """ 291 | fs = FileSystem(fs_folder) 292 | for origin in sorted(fs.get_origins()): 293 | for folder in fs.get_folders_for_origin(origin): 294 | storage = fs.get_storage_for_folder(folder) 295 | for file_path, file_info in storage.get_file_listing(): 296 | print("/".join([origin, file_path])) 297 | 298 | 299 | if __name__ == "__main__": 300 | FileSystemUtils.print_all_files(sys.argv[1]) 301 | 302 | 303 | -------------------------------------------------------------------------------- /ccl_chromium_reader/ccl_chromium_history.py: -------------------------------------------------------------------------------- 1 | """ 2 | Copyright 2024, CCL Forensics 3 | Permission is hereby granted, free of charge, to any person obtaining a copy of 4 | this software and associated documentation files (the "Software"), to deal in 5 | the Software without restriction, including without limitation the rights to 6 | use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies 7 | of the Software, and to permit persons to whom the Software is furnished to do 8 | so, subject to the following conditions: 9 | 10 | The above copyright notice and this permission notice shall be included in all 11 | copies or substantial portions of the Software. 12 | 13 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 14 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 15 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 16 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 17 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 18 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 19 | SOFTWARE. 20 | """ 21 | 22 | import dataclasses 23 | import datetime 24 | import math 25 | import pathlib 26 | import sqlite3 27 | import enum 28 | import re 29 | import struct 30 | import typing 31 | import collections.abc as col_abc 32 | 33 | from .common import KeySearch, is_keysearch_hit 34 | from .download_common import Download, DownloadSource 35 | 36 | __version__ = "0.6" 37 | __description__ = "Module to access the chrom(e|ium) history database" 38 | __contact__ = "Alex Caithness" 39 | 40 | EPOCH = datetime.datetime(1601, 1, 1) 41 | 42 | 43 | def parse_chromium_time(microseconds: int) -> datetime.datetime: 44 | return EPOCH + datetime.timedelta(microseconds=microseconds) 45 | 46 | 47 | def encode_chromium_time(datetime_value: datetime.datetime) -> int: 48 | return math.floor((datetime_value - EPOCH).total_seconds() * 1000000) 49 | 50 | 51 | class PageTransitionCoreEnum(enum.IntEnum): 52 | # chrome/common/page_transition_types.h 53 | link = 0 54 | typed = 1 55 | auto_bookmark = 2 56 | auto_subframe = 3 57 | manual_subframe = 4 58 | generated = 5 59 | start_page = 6 60 | form_submit = 7 61 | reload = 8 62 | keyword = 9 63 | keyword_generated = 10 64 | 65 | 66 | class PageTransitionQualifierEnum(enum.IntFlag): 67 | blocked = 0x00800000 68 | forward_back = 0x01000000 69 | from_address_bar = 0x02000000 70 | home_page = 0x04000000 71 | from_api = 0x08000000 72 | chain_start = 0x10000000 73 | chain_end = 0x20000000 74 | client_redirect = 0x40000000 75 | server_redirect = 0x80000000 76 | 77 | 78 | @dataclasses.dataclass(frozen=True) 79 | class PageTransition: 80 | core: PageTransitionCoreEnum 81 | qualifier: PageTransitionQualifierEnum 82 | 83 | @classmethod 84 | def from_int(cls, val): 85 | # database stores values signed, python needs unsigned 86 | if val < 0: 87 | val, = struct.unpack(">I", struct.pack(">i", val)) 88 | 89 | core = PageTransitionCoreEnum(val & 0xff) 90 | qual = PageTransitionQualifierEnum(val & 0xffffff00) 91 | 92 | return cls(core, qual) 93 | 94 | 95 | @dataclasses.dataclass(frozen=True) 96 | class HistoryRecord: 97 | _owner: "HistoryDatabase" = dataclasses.field(repr=False) 98 | rec_id: int 99 | url: str 100 | title: str 101 | visit_time: datetime.datetime 102 | visit_duration: datetime.timedelta 103 | transition: PageTransition 104 | from_visit_id: int 105 | opener_visit_id: int 106 | 107 | @property 108 | def record_location(self) -> str: 109 | return f"SQLite Rowid: {self.rec_id}" 110 | 111 | @property 112 | def has_parent(self) -> bool: 113 | return self.from_visit_id != 0 or self.opener_visit_id != 0 114 | 115 | @property 116 | def parent_visit_id(self) -> int: 117 | return self.opener_visit_id or self.from_visit_id 118 | 119 | def get_parent(self) -> typing.Optional["HistoryRecord"]: 120 | """ 121 | Get the parent visit for this record (based on the from_visit field in the database), 122 | or None if there isn't one. 123 | """ 124 | 125 | return self._owner.get_parent_of(self) 126 | 127 | def get_children(self) -> col_abc.Iterable["HistoryRecord"]: 128 | """ 129 | Get the children visits for this record (based on their from_visit field in the database). 130 | """ 131 | return self._owner.get_children_of(self) 132 | 133 | 134 | class HistoryDatabase: 135 | _HISTORY_QUERY = """ 136 | SELECT 137 | "visits"."id", 138 | "urls"."url", 139 | "urls"."title", 140 | "visits"."visit_time", 141 | "visits"."from_visit", 142 | "visits"."opener_visit", 143 | "visits"."transition", 144 | "visits"."visit_duration", 145 | CASE 146 | WHEN "visits"."opener_visit" != 0 THEN "visits"."opener_visit" 147 | ELSE "visits"."from_visit" 148 | END "parent_id" 149 | 150 | FROM "visits" 151 | LEFT JOIN "urls" ON "visits"."url" = "urls"."id" 152 | """ 153 | 154 | _WHERE_URL_EQUALS_PREDICATE = """"urls"."url" = ?""" 155 | 156 | _WHERE_URL_REGEX_PREDICATE = """"urls"."url" REGEXP ?""" 157 | 158 | _WHERE_URL_IN_PREDICATE = """"urls"."url" IN ({parameter_question_marks})""" 159 | 160 | _WHERE_VISIT_TIME_EARLIEST_PREDICATE = """"visits"."visit_time" >= ?""" 161 | 162 | _WHERE_VISIT_TIME_LATEST_PREDICATE = """"visits"."visit_time" <= ?""" 163 | 164 | _WHERE_VISIT_ID_EQUALS_PREDICATE = """"visits"."id" = ?""" 165 | 166 | #_WHERE_FROM_VISIT_EQUALS_PREDICATE = """"visits"."from_visit" = ?""" 167 | 168 | #_WHERE_OPENER_VISIT_EQUALS_PREDICATE = """"visits"."opener_visit" = ?""" 169 | 170 | _WHERE_PARENT_ID_EQUALS_PREDICATE = """"parent_id" = ?""" 171 | 172 | _DOWNLOADS_QUERY = """ 173 | SELECT 174 | "downloads"."id", 175 | "downloads"."guid", 176 | "downloads"."current_path", 177 | "downloads"."target_path", 178 | "downloads"."start_time", 179 | "downloads"."received_bytes", 180 | "downloads"."total_bytes", 181 | "downloads"."state", 182 | "downloads"."danger_type", 183 | "downloads"."interrupt_reason", 184 | "downloads"."hash", 185 | "downloads"."end_time", 186 | "downloads"."opened", 187 | "downloads"."last_access_time", 188 | "downloads"."transient", 189 | "downloads"."referrer", 190 | "downloads"."site_url", 191 | "downloads"."embedder_download_data", 192 | "downloads"."tab_url", 193 | "downloads"."tab_referrer_url", 194 | "downloads"."http_method", 195 | "downloads"."mime_type", 196 | "downloads"."original_mime_type" 197 | FROM "downloads"; 198 | """ 199 | 200 | _DOWNLOADS_URL_CHAINS_QUEREY = """ 201 | SELECT "downloads_url_chains"."id", 202 | "downloads_url_chains"."chain_index", 203 | "downloads_url_chains"."url" 204 | FROM "downloads_url_chains" 205 | WHERE "downloads_url_chains"."id" = ? 206 | ORDER BY "downloads_url_chains"."chain_index"; 207 | """ 208 | 209 | def __init__(self, db_path: pathlib.Path): 210 | self._conn = sqlite3.connect(db_path.absolute().as_uri() + "?mode=ro", uri=True) 211 | self._conn.row_factory = sqlite3.Row 212 | self._conn.create_function("regexp", 2, lambda y, x: 1 if re.search(y, x) is not None else 0) 213 | 214 | def _row_to_record(self, row: sqlite3.Row) -> HistoryRecord: 215 | return HistoryRecord( 216 | self, 217 | row["id"], 218 | row["url"], 219 | row["title"], 220 | parse_chromium_time(row["visit_time"]), 221 | datetime.timedelta(microseconds=row["visit_duration"]), 222 | PageTransition.from_int(row["transition"]), 223 | row["from_visit"], 224 | row["opener_visit"] 225 | ) 226 | 227 | def get_parent_of(self, record: HistoryRecord) -> typing.Optional[HistoryRecord]: 228 | if record.from_visit_id == 0 and record.opener_visit_id == 0: 229 | return None 230 | 231 | parent_id = record.opener_visit_id if record.opener_visit_id != 0 else record.from_visit_id 232 | 233 | query = HistoryDatabase._HISTORY_QUERY 234 | query += f" WHERE {HistoryDatabase._WHERE_VISIT_ID_EQUALS_PREDICATE};" 235 | cur = self._conn.cursor() 236 | cur.execute(query, (parent_id,)) 237 | row = cur.fetchone() 238 | cur.close() 239 | if row: 240 | return self._row_to_record(row) 241 | 242 | def get_children_of(self, record: HistoryRecord) -> col_abc.Iterable[HistoryRecord]: 243 | query = HistoryDatabase._HISTORY_QUERY 244 | predicate = HistoryDatabase._WHERE_PARENT_ID_EQUALS_PREDICATE 245 | query += f" WHERE {predicate};" 246 | cur = self._conn.cursor() 247 | cur.execute(query, (record.rec_id,)) 248 | for row in cur: 249 | yield self._row_to_record(row) 250 | 251 | cur.close() 252 | 253 | def get_record_with_id(self, visit_id: int) -> typing.Optional[HistoryRecord]: 254 | query = HistoryDatabase._HISTORY_QUERY 255 | query += f" WHERE {HistoryDatabase._WHERE_VISIT_ID_EQUALS_PREDICATE};" 256 | cur = self._conn.cursor() 257 | cur.execute(query, (visit_id,)) 258 | row = cur.fetchone() 259 | cur.close() 260 | if row: 261 | return self._row_to_record(row) 262 | 263 | def iter_history_records( 264 | self, url: typing.Optional[KeySearch], *, 265 | earliest: typing.Optional[datetime.datetime]=None, latest: typing.Optional[datetime.datetime]=None 266 | ) -> col_abc.Iterable[HistoryRecord]: 267 | 268 | predicates = [] 269 | parameters = [] 270 | 271 | if url is None: 272 | pass # no predicate 273 | elif isinstance(url, str): 274 | predicates.append(HistoryDatabase._WHERE_URL_EQUALS_PREDICATE) 275 | parameters.append(url) 276 | elif isinstance(url, re.Pattern): 277 | predicates.append(HistoryDatabase._WHERE_URL_REGEX_PREDICATE) 278 | parameters.append(url.pattern) 279 | elif isinstance(url, col_abc.Collection): 280 | predicates.append( 281 | HistoryDatabase._WHERE_URL_IN_PREDICATE.format( 282 | parameter_question_marks=",".join("?" for _ in range(len(url))))) 283 | parameters.extend(url) 284 | elif isinstance(url, col_abc.Callable): 285 | pass # we have to call this function across every 286 | else: 287 | raise TypeError(f"Unexpected type: {type(url)} (expects: {KeySearch})") 288 | 289 | if earliest is not None: 290 | predicates.append(HistoryDatabase._WHERE_VISIT_TIME_EARLIEST_PREDICATE) 291 | parameters.append(encode_chromium_time(earliest)) 292 | 293 | if latest is not None: 294 | predicates.append(HistoryDatabase._WHERE_VISIT_TIME_LATEST_PREDICATE) 295 | parameters.append(encode_chromium_time(latest)) 296 | 297 | query = HistoryDatabase._HISTORY_QUERY 298 | if predicates: 299 | query += f" WHERE {' AND '.join(predicates)}" 300 | 301 | query += ";" 302 | cur = self._conn.cursor() 303 | for row in cur.execute(query, parameters): 304 | if not isinstance(url, col_abc.Callable) or url(row["url"]): 305 | yield self._row_to_record(row) 306 | 307 | cur.close() 308 | 309 | def iter_downloads( 310 | self, 311 | download_url: typing.Optional[KeySearch]=None, 312 | tab_url: typing.Optional[KeySearch]=None) -> col_abc.Iterable[Download]: 313 | downloads_cur = self._conn.cursor() 314 | chain_cur = self._conn.cursor() 315 | 316 | downloads_cur.execute(HistoryDatabase._DOWNLOADS_QUERY) 317 | 318 | for download in downloads_cur: 319 | chain_cur.execute(HistoryDatabase._DOWNLOADS_URL_CHAINS_QUEREY, (download["id"],)) 320 | chain = tuple(x["url"] for x in chain_cur) 321 | 322 | if download_url is not None and not any(is_keysearch_hit(download_url, x) for x in chain): 323 | continue 324 | 325 | if (tab_url is not None and 326 | not is_keysearch_hit(tab_url, download["tab_url"]) and 327 | not is_keysearch_hit(tab_url, download["tab_referrer_url"])): 328 | continue 329 | 330 | yield Download( 331 | DownloadSource.history_db, 332 | download["id"], 333 | download["guid"], 334 | download["hash"].hex(), 335 | chain, 336 | download["tab_url"], 337 | download["tab_referrer_url"], 338 | download["target_path"], 339 | download["mime_type"], 340 | download["original_mime_type"], 341 | download["total_bytes"], 342 | parse_chromium_time(download["start_time"]), 343 | parse_chromium_time(download["end_time"]) 344 | ) 345 | 346 | downloads_cur.close() 347 | chain_cur.close() 348 | 349 | def close(self): 350 | self._conn.close() 351 | 352 | def __enter__(self): 353 | return self 354 | 355 | def __exit__(self, exc_type, exc_val, exc_tb): 356 | self.close() 357 | 358 | -------------------------------------------------------------------------------- /ccl_chromium_reader/ccl_chromium_localstorage.py: -------------------------------------------------------------------------------- 1 | """ 2 | Copyright 2021-2024, CCL Forensics 3 | Permission is hereby granted, free of charge, to any person obtaining a copy of 4 | this software and associated documentation files (the "Software"), to deal in 5 | the Software without restriction, including without limitation the rights to 6 | use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies 7 | of the Software, and to permit persons to whom the Software is furnished to do 8 | so, subject to the following conditions: 9 | 10 | The above copyright notice and this permission notice shall be included in all 11 | copies or substantial portions of the Software. 12 | 13 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 14 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 15 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 16 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 17 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 18 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 19 | SOFTWARE. 20 | """ 21 | 22 | import io 23 | import bisect 24 | import re 25 | import sys 26 | import pathlib 27 | import types 28 | import typing 29 | import collections.abc as col_abc 30 | import dataclasses 31 | import datetime 32 | 33 | from .storage_formats import ccl_leveldb 34 | from .common import KeySearch 35 | 36 | __version__ = "0.5" 37 | __description__ = "Module for reading the Chromium leveldb localstorage format" 38 | __contact__ = "Alex Caithness" 39 | 40 | """ 41 | See: https://source.chromium.org/chromium/chromium/src/+/main:components/services/storage/dom_storage/local_storage_impl.cc 42 | Meta keys: 43 | Key = "META:" + storage_key (the host) 44 | Value = protobuff: 1=timestamp (varint); 2=size in bytes (varint) 45 | 46 | Record keys: 47 | Key = "_" + storage_key + "\\x0" + script_key 48 | Value = record_value 49 | 50 | """ 51 | 52 | _META_PREFIX = b"META:" 53 | _RECORD_KEY_PREFIX = b"_" 54 | _CHROME_EPOCH = datetime.datetime(1601, 1, 1, 0, 0, 0) 55 | 56 | EIGHT_BIT_ENCODING = "iso-8859-1" 57 | 58 | 59 | def from_chrome_timestamp(microseconds: int) -> datetime.datetime: 60 | return _CHROME_EPOCH + datetime.timedelta(microseconds=microseconds) 61 | 62 | 63 | def decode_string(raw: bytes) -> str: 64 | """ 65 | decodes a type-prefixed string - prefix of: 0=utf-16-le; 1=an extended ascii codepage (likely dependant on locale) 66 | :param raw: raw prefixed-string data 67 | :return: decoded string 68 | """ 69 | prefix = raw[0] 70 | if prefix == 0: 71 | return raw[1:].decode("utf-16-le") 72 | elif prefix == 1: 73 | return raw[1:].decode(EIGHT_BIT_ENCODING) 74 | else: 75 | raise ValueError("Unexpected prefix, please contact developer") 76 | 77 | 78 | @dataclasses.dataclass(frozen=True) 79 | class StorageMetadata: 80 | storage_key: str 81 | timestamp: datetime.datetime 82 | size_in_bytes: int 83 | leveldb_seq_number: int 84 | 85 | @classmethod 86 | def from_protobuff(cls, storage_key: str, data: bytes, seq: int): 87 | with io.BytesIO(data) as stream: 88 | # This is a simple protobuff, so we'll read it directly, but with checks, rather than add a dependency 89 | ts_tag = ccl_leveldb.read_le_varint(stream) 90 | if (ts_tag & 0x07) != 0 or (ts_tag >> 3) != 1: 91 | raise ValueError("Unexpected tag when reading StorageMetadata from protobuff") 92 | timestamp = from_chrome_timestamp(ccl_leveldb.read_le_varint(stream)) 93 | 94 | size_tag = ccl_leveldb.read_le_varint(stream) 95 | if (size_tag & 0x07) != 0 or (size_tag >> 3) != 2: 96 | raise ValueError("Unexpected tag when reading StorageMetadata from protobuff") 97 | size = ccl_leveldb.read_le_varint(stream) 98 | 99 | return cls(storage_key, timestamp, size, seq) 100 | 101 | 102 | @dataclasses.dataclass(frozen=True) 103 | class LocalStorageRecord: 104 | storage_key: str 105 | script_key: str 106 | value: str 107 | leveldb_seq_number: int 108 | is_live: bool 109 | 110 | @property 111 | def record_location(self) -> str: 112 | return f"Leveldb Seq: {self.leveldb_seq_number}" 113 | 114 | 115 | class LocalStorageBatch: 116 | def __init__(self, meta: StorageMetadata, end_seq: int): 117 | self._meta = meta 118 | self._end = end_seq 119 | 120 | @property 121 | def storage_key(self) -> str: 122 | return self._meta.storage_key 123 | 124 | @property 125 | def timestamp(self) -> datetime.datetime: 126 | return self._meta.timestamp 127 | 128 | @property 129 | def start(self): 130 | return self._meta.leveldb_seq_number 131 | 132 | @property 133 | def end(self): 134 | return self._end 135 | 136 | def __repr__(self): 137 | return f"(storage_key={self.storage_key}, timestamp={self.timestamp}, start={self.start}, end={self.end})" 138 | 139 | 140 | class LocalStoreDb: 141 | def __init__(self, in_dir: pathlib.Path): 142 | if not in_dir.is_dir(): 143 | raise IOError("Input directory is not a directory") 144 | 145 | self._ldb = ccl_leveldb.RawLevelDb(in_dir) 146 | 147 | self._storage_details = {} # storage_key: {seq_number: StorageMetadata} 148 | self._flat_items = [] # [StorageMetadata|LocalStorageRecord] - used to batch items up 149 | self._records = {} # storage_key: {script_key: {seq_number: LocalStorageRecord}} 150 | 151 | for record in self._ldb.iterate_records_raw(): 152 | if record.user_key.startswith(_META_PREFIX) and record.state == ccl_leveldb.KeyState.Live: 153 | # Only live records for metadata - not sure what we can reliably infer from deleted keys 154 | storage_key = record.user_key.removeprefix(_META_PREFIX).decode(EIGHT_BIT_ENCODING) 155 | self._storage_details.setdefault(storage_key, {}) 156 | metadata = StorageMetadata.from_protobuff(storage_key, record.value, record.seq) 157 | self._storage_details[storage_key][record.seq] = metadata 158 | self._flat_items.append(metadata) 159 | elif record.user_key.startswith(_RECORD_KEY_PREFIX): 160 | # We include deleted records here because we need them to build batches 161 | storage_key_raw, script_key_raw = record.user_key.removeprefix(_RECORD_KEY_PREFIX).split(b"\x00", 1) 162 | storage_key = storage_key_raw.decode(EIGHT_BIT_ENCODING) 163 | script_key = decode_string(script_key_raw) 164 | 165 | try: 166 | value = decode_string(record.value) if record.state == ccl_leveldb.KeyState.Live else None 167 | except UnicodeDecodeError as e: 168 | # Some sites play games to test the browser's capabilities like encoding half of a surrogate pair 169 | print(f"Error decoding record value at seq no {record.seq}; " 170 | f"{storage_key} {script_key}: {record.value}") 171 | continue 172 | 173 | self._records.setdefault(storage_key, {}) 174 | self._records[storage_key].setdefault(script_key, {}) 175 | 176 | ls_record = LocalStorageRecord( 177 | storage_key, script_key, value, record.seq, record.state == ccl_leveldb.KeyState.Live) 178 | self._records[storage_key][script_key][record.seq] = ls_record 179 | self._flat_items.append(ls_record) 180 | 181 | self._storage_details = types.MappingProxyType(self._storage_details) 182 | self._records = types.MappingProxyType(self._records) 183 | 184 | self._all_storage_keys = frozenset(self._storage_details.keys() | self._records.keys()) # because deleted data. 185 | self._flat_items.sort(key=lambda x: x.leveldb_seq_number) 186 | 187 | # organise batches - this is made complex and slow by having to account for missing/deleted data 188 | # we're looking for a StorageMetadata followed by sequential (in terms of seq number) LocalStorageRecords 189 | # with the same storage key. Everything that falls within that chain can safely be considered a batch. 190 | # Any break in sequence numbers or storage key is a fail and can't be considered part of a batch. 191 | self._batches = {} 192 | current_meta: typing.Optional[StorageMetadata] = None 193 | current_end = 0 194 | for item in self._flat_items: # pre-sorted 195 | if isinstance(item, LocalStorageRecord): 196 | if current_meta is None: 197 | # no currently valid metadata so we can't attribute this record to anything 198 | continue 199 | elif item.leveldb_seq_number - current_end != 1 or item.storage_key != current_meta.storage_key: 200 | # this record breaks a chain, so bundle up what we have and clear everything out 201 | self._batches[current_meta.leveldb_seq_number] = LocalStorageBatch(current_meta, current_end) 202 | current_meta = None 203 | current_end = 0 204 | else: 205 | # contiguous and right storage key, include in the current chain 206 | current_end = item.leveldb_seq_number 207 | elif isinstance(item, StorageMetadata): 208 | if current_meta is not None: 209 | # this record breaks a chain, so bundle up what we have, set new start 210 | self._batches[current_meta.leveldb_seq_number] = LocalStorageBatch(current_meta, current_end) 211 | current_meta = item 212 | current_end = item.leveldb_seq_number 213 | else: 214 | raise ValueError 215 | 216 | if current_meta is not None: 217 | self._batches[current_meta.leveldb_seq_number] = LocalStorageBatch(current_meta, current_end) 218 | 219 | self._batch_starts = tuple(sorted(self._batches.keys())) 220 | 221 | def iter_storage_keys(self) -> col_abc.Iterable[str]: 222 | yield from self._storage_details.keys() 223 | 224 | def contains_storage_key(self, storage_key: str) -> bool: 225 | return storage_key in self._all_storage_keys 226 | 227 | def iter_script_keys(self, storage_key: str) -> col_abc.Iterable[str]: 228 | if storage_key not in self._all_storage_keys: 229 | raise KeyError(storage_key) 230 | if storage_key not in self._records: 231 | raise StopIteration 232 | yield from self._records[storage_key].keys() 233 | 234 | def contains_script_key(self, storage_key: str, script_key: str) -> bool: 235 | return script_key in self._records.get(storage_key, {}) 236 | 237 | def find_batch(self, seq: int) -> typing.Optional[LocalStorageBatch]: 238 | """ 239 | Finds the batch that a record with the given sequence number belongs to 240 | :param seq: leveldb sequence id 241 | :return: the batch containing the given sequence number or None if no batch contains it 242 | """ 243 | 244 | i = bisect.bisect_left(self._batch_starts, seq) - 1 245 | if i < 0: 246 | return None 247 | start = self._batch_starts[i] 248 | batch = self._batches[start] 249 | if batch.start <= seq <= batch.end: 250 | return batch 251 | else: 252 | return None 253 | 254 | def iter_all_records(self, include_deletions=False) -> col_abc.Iterable[LocalStorageRecord]: 255 | """ 256 | :param include_deletions: if True, records related to deletions will be included 257 | (these will have None as values). 258 | :return: iterable of LocalStorageRecords 259 | """ 260 | for storage_key, script_dict in self._records.items(): 261 | for script_key, values in script_dict.items(): 262 | for seq, value in values.items(): 263 | if value.is_live or include_deletions: 264 | yield value 265 | 266 | def _iter_records_for_storage_key( 267 | self, storage_key: str, include_deletions=False) -> col_abc.Iterable[LocalStorageRecord]: 268 | """ 269 | :param storage_key: storage key (host) for the records 270 | :param include_deletions: if True, records related to deletions will be included 271 | (these will have None as values). 272 | :return: iterable of LocalStorageRecords 273 | """ 274 | if not self.contains_storage_key(storage_key): 275 | raise KeyError(storage_key) 276 | for script_key, values in self._records[storage_key].items(): 277 | for seq, value in values.items(): 278 | if value.is_live or include_deletions: 279 | yield value 280 | 281 | def _search_storage_keys(self, storage_key: KeySearch) -> list[str]: 282 | if isinstance(storage_key, str): 283 | return [storage_key] 284 | elif isinstance(storage_key, re.Pattern): 285 | return [x for x in self._all_storage_keys if storage_key.search(x)] 286 | elif isinstance(storage_key, col_abc.Collection): 287 | return list(set(storage_key) & self._all_storage_keys) 288 | elif isinstance(storage_key, col_abc.Callable): 289 | return [x for x in self._all_storage_keys if storage_key(x)] 290 | else: 291 | raise TypeError(f"Unexpected type: {type(storage_key)} (expects: {KeySearch})") 292 | 293 | def iter_records_for_storage_key( 294 | self, storage_key: KeySearch, *, 295 | include_deletions=False, raise_on_no_result=True) -> col_abc.Iterable[LocalStorageRecord]: 296 | """ 297 | :param storage_key: storage key (host) for the records. This can be one of: a single string; 298 | a collection of strings; a regex pattern; a function that takes a string (the host) and returns a bool. 299 | :param include_deletions: if True, records related to deletions will be included 300 | :param raise_on_no_result: if True (the default) if no matching storage keys are found, raise a KeyError 301 | (these will have None as values). 302 | :return: iterable of LocalStorageRecords 303 | """ 304 | if isinstance(storage_key, str): 305 | if raise_on_no_result and not self.contains_storage_key(storage_key): 306 | raise KeyError(storage_key) 307 | yield from self._iter_records_for_storage_key(storage_key, include_deletions) 308 | elif isinstance(storage_key, re.Pattern): 309 | matched_keys = self._search_storage_keys(storage_key) 310 | if raise_on_no_result and not matched_keys: 311 | raise KeyError(f"Pattern: {storage_key.pattern}") 312 | for key in matched_keys: 313 | yield from self._iter_records_for_storage_key(key, include_deletions) 314 | elif isinstance(storage_key, col_abc.Collection): 315 | matched_keys = self._search_storage_keys(storage_key) 316 | if raise_on_no_result and not matched_keys: 317 | raise KeyError(storage_key) 318 | for key in matched_keys: 319 | yield from self._iter_records_for_storage_key(key, include_deletions) 320 | elif isinstance(storage_key, col_abc.Callable): 321 | matched_keys = self._search_storage_keys(storage_key) 322 | if raise_on_no_result and not matched_keys: 323 | raise KeyError(storage_key) 324 | for key in matched_keys: 325 | yield from self._iter_records_for_storage_key(key, include_deletions) 326 | else: 327 | raise TypeError(f"Unexpected type for storage key: {type(storage_key)} (expects: {KeySearch})") 328 | 329 | def _iter_records_for_script_key( 330 | self, storage_key: str, script_key: str, include_deletions=False) -> col_abc.Iterable[LocalStorageRecord]: 331 | """ 332 | :param storage_key: storage key (host) for the records 333 | :param script_key: script defined key for the records 334 | :param include_deletions: if True, records related to deletions will be included 335 | :return: iterable of LocalStorageRecords 336 | """ 337 | if not self.contains_script_key(storage_key, script_key): 338 | raise KeyError((storage_key, script_key)) 339 | for seq, value in self._records[storage_key][script_key].items(): 340 | if value.is_live or include_deletions: 341 | yield value 342 | 343 | def iter_records_for_script_key( 344 | self, storage_key: KeySearch, script_key: KeySearch, *, 345 | include_deletions=False, raise_on_no_result=True) -> col_abc.Iterable[LocalStorageRecord]: 346 | """ 347 | :param storage_key: storage key (host) for the records. This can be one of: a single string; 348 | a collection of strings; a regex pattern; a function that takes a string and returns a bool. 349 | :param script_key: script defined key for the records. This can be one of: a single string; 350 | a collection of strings; a regex pattern; a function that takes a string and returns a bool. 351 | :param include_deletions: if True, records related to deletions will be included 352 | :param raise_on_no_result: if True (the default) if no matching storage keys are found, raise a KeyError 353 | (these will have None as values). 354 | :return: iterable of LocalStorageRecords 355 | """ 356 | 357 | if isinstance(storage_key, str) and isinstance(script_key, str): 358 | if raise_on_no_result and not self.contains_script_key(storage_key, script_key): 359 | raise KeyError((storage_key, script_key)) 360 | yield from self._iter_records_for_script_key(storage_key, script_key, include_deletions=include_deletions) 361 | else: 362 | matched_storage_keys = self._search_storage_keys(storage_key) 363 | if raise_on_no_result and not matched_storage_keys: 364 | raise KeyError((storage_key, script_key)) 365 | 366 | yielded = False 367 | for matched_storage_key in matched_storage_keys: 368 | if isinstance(script_key, str): 369 | matched_script_keys = [script_key] 370 | elif isinstance(script_key, re.Pattern): 371 | matched_script_keys = [x for x in self._records[matched_storage_key].keys() if script_key.search(x)] 372 | elif isinstance(script_key, col_abc.Collection): 373 | script_key_set = set(script_key) 374 | matched_script_keys = list(self._records[matched_storage_key].keys() & script_key_set) 375 | elif isinstance(script_key, col_abc.Callable): 376 | matched_script_keys = [x for x in self._records[matched_storage_key].keys() if script_key(x)] 377 | else: 378 | raise TypeError(f"Unexpected type for script key: {type(script_key)} (expects: {KeySearch})") 379 | 380 | for key in matched_script_keys: 381 | for seq, value in self._records[matched_storage_key][key].items(): 382 | if value.is_live or include_deletions: 383 | yielded = True 384 | yield value 385 | 386 | if not yielded: 387 | raise KeyError((storage_key, script_key)) 388 | 389 | def iter_metadata(self) -> col_abc.Iterable[StorageMetadata]: 390 | """ 391 | :return: iterable of StorageMetaData 392 | """ 393 | for meta in self._flat_items: 394 | if isinstance(meta, StorageMetadata): 395 | yield meta 396 | 397 | def iter_metadata_for_storage_key(self, storage_key: str) -> col_abc.Iterable[StorageMetadata]: 398 | """ 399 | :param storage_key: storage key (host) for the metadata 400 | :return: iterable of StorageMetadata 401 | """ 402 | if storage_key not in self._all_storage_keys: 403 | raise KeyError(storage_key) 404 | if storage_key not in self._storage_details: 405 | return None 406 | for seq, meta in self._storage_details[storage_key].items(): 407 | yield meta 408 | 409 | def iter_batches(self) -> col_abc.Iterable[LocalStorageBatch]: 410 | yield from self._batches.values() 411 | 412 | def close(self): 413 | self._ldb.close() 414 | 415 | def __contains__(self, item: typing.Union[str, tuple[str, str]]) -> bool: 416 | """ 417 | :param item: either the host as a str or a tuple of the host and a key (both str) 418 | :return: if item is a str, returns true if that host is present, if item is a tuple of (str, str), returns True 419 | if that host and key pair are present 420 | """ 421 | 422 | if isinstance(item, str): 423 | return item in self._all_storage_keys 424 | elif isinstance(item, tuple) and len(item) == 2: 425 | host, key = item 426 | return host in self._all_storage_keys and key in self._records[host] 427 | else: 428 | raise TypeError("item must be a string or a tuple of (str, str)") 429 | 430 | def __iter__(self): 431 | """ 432 | iterates the hosts (storage keys) present 433 | """ 434 | yield from self._all_storage_keys 435 | 436 | def __enter__(self) -> "LocalStoreDb": 437 | return self 438 | 439 | def __exit__(self, exc_type, exc_val, exc_tb): 440 | self.close() 441 | 442 | 443 | def main(args): 444 | in_ldb_path = pathlib.Path(args[0]) 445 | local_store = LocalStoreDb(in_ldb_path) 446 | 447 | for rec in local_store.iter_all_records(): 448 | batch = local_store.find_batch(rec.leveldb_seq_number) 449 | print(rec, batch) 450 | 451 | 452 | if __name__ == '__main__': 453 | main(sys.argv[1:]) 454 | 455 | -------------------------------------------------------------------------------- /ccl_chromium_reader/ccl_chromium_notifications.py: -------------------------------------------------------------------------------- 1 | """ 2 | Copyright 2022, CCL Forensics 3 | 4 | Permission is hereby granted, free of charge, to any person obtaining a copy of 5 | this software and associated documentation files (the "Software"), to deal in 6 | the Software without restriction, including without limitation the rights to 7 | use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies 8 | of the Software, and to permit persons to whom the Software is furnished to do 9 | so, subject to the following conditions: 10 | 11 | The above copyright notice and this permission notice shall be included in all 12 | copies or substantial portions of the Software. 13 | 14 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 15 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 16 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 17 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 18 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 19 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 20 | SOFTWARE. 21 | """ 22 | 23 | import datetime 24 | import enum 25 | import io 26 | import os 27 | import struct 28 | import sys 29 | import pathlib 30 | import dataclasses 31 | import typing 32 | 33 | from .storage_formats import ccl_leveldb 34 | from .serialization_formats import ccl_blink_value_deserializer, ccl_v8_value_deserializer, ccl_protobuff as pb 35 | 36 | __version__ = "0.3" 37 | __description__ = "Library for reading Chrome/Chromium notifications (Platform Notifications)" 38 | __contact__ = "Alex Caithness" 39 | 40 | # See content/browser/notifications/notification_database.cc 41 | # and content/browser/notifications/notification_database_data.proto 42 | 43 | EPOCH = datetime.datetime(1601, 1, 1) 44 | 45 | 46 | # stolen from ccl_chromium_indexeddb 20230907 47 | @dataclasses.dataclass(frozen=True) 48 | class BlinkTrailer: 49 | # third_party/blink/renderer/bindings/core/v8/serialization/trailer_reader.h 50 | offset: int 51 | length: int 52 | 53 | TRAILER_SIZE: typing.ClassVar[int] = 13 54 | MIN_WIRE_FORMAT_VERSION_FOR_TRAILER: typing.ClassVar[int] = 21 55 | 56 | @classmethod 57 | def from_buffer(cls, buffer, trailer_offset: int): 58 | tag, offset, length = struct.unpack(">cQI", buffer[trailer_offset: trailer_offset + BlinkTrailer.TRAILER_SIZE]) 59 | if tag != ccl_blink_value_deserializer.Constants.tag_kTrailerOffsetTag: 60 | raise ValueError( 61 | f"Trailer doesn't start with kTrailerOffsetTag " 62 | f"(expected: 0x{ccl_blink_value_deserializer.Constants.tag_kTrailerOffsetTag.hex()}; " 63 | f"got: 0x{tag.hex()}") 64 | 65 | return BlinkTrailer(offset, length) 66 | 67 | 68 | class ClosedReason(enum.IntEnum): 69 | USER = 0 70 | DEVELOPER = 1 71 | UNKNOWN = 2 72 | 73 | 74 | class ActionType(enum.IntEnum): 75 | BUTTON = 0 76 | TEXT = 1 77 | 78 | 79 | class Direction(enum.IntEnum): 80 | LEFT_TO_RIGHT = 0 81 | RIGHT_TO_LEFT = 1 82 | AUTO = 2 83 | 84 | 85 | def read_datetime(stream): 86 | ms = pb.read_le_varint(stream) 87 | return EPOCH + datetime.timedelta(microseconds=ms) 88 | 89 | 90 | NotificationAction_Structure = { 91 | 1: pb.ProtoDecoder("action", pb.read_string), 92 | 2: pb.ProtoDecoder("title", pb.read_string), 93 | 3: pb.ProtoDecoder("icon", pb.read_string), 94 | 4: pb.ProtoDecoder("type", lambda x: ActionType(pb.read_le_varint(x))), 95 | 5: pb.ProtoDecoder("placeholder", pb.read_string), 96 | } 97 | 98 | NotificationData_Structure = { 99 | 1: pb.ProtoDecoder("title", pb.read_string), 100 | 2: pb.ProtoDecoder("closed_reason", lambda x: Direction(pb.read_le_varint(x))), 101 | 3: pb.ProtoDecoder("lang", pb.read_string), 102 | 4: pb.ProtoDecoder("body", pb.read_string), 103 | 5: pb.ProtoDecoder("tag", pb.read_string), 104 | 6: pb.ProtoDecoder("icon", pb.read_string), 105 | 7: pb.ProtoDecoder("silent", lambda x: pb.read_le_varint(x) != 0), 106 | 8: pb.ProtoDecoder("data", pb.read_blob), 107 | 9: pb.ProtoDecoder("vibration", pb.read_blob), 108 | 10: pb.ProtoDecoder( 109 | "actions", lambda x: pb.read_embedded_protobuf(x, NotificationAction_Structure, use_friendly_tag=True)), 110 | 11: pb.ProtoDecoder("require_interaction", lambda x: pb.read_le_varint(x) != 0), 111 | 12: pb.ProtoDecoder("timestamp", read_datetime), 112 | 13: pb.ProtoDecoder("renotify", lambda x: pb.read_le_varint(x) != 0), 113 | 14: pb.ProtoDecoder("badge", pb.read_string), 114 | 15: pb.ProtoDecoder("image", pb.read_string), 115 | 16: pb.ProtoDecoder("show_trigger_timestamp", read_datetime), 116 | } 117 | 118 | NotificationDatabaseDataProto_Structure = { 119 | 1: pb.ProtoDecoder("persistent_notification_id", pb.read_le_varint), 120 | 2: pb.ProtoDecoder("origin", pb.read_string), 121 | 3: pb.ProtoDecoder("service_worker_registration_id", pb.read_le_varint), 122 | 4: pb.ProtoDecoder( 123 | "notification_data", lambda x: pb.read_embedded_protobuf(x, NotificationData_Structure, use_friendly_tag=True)), 124 | 5: pb.ProtoDecoder("notification_id", pb.read_string), 125 | 6: pb.ProtoDecoder("replaced_existing_notification", lambda x: pb.read_le_varint(x) != 0), 126 | 7: pb.ProtoDecoder("num_clicks", pb.read_le_varint32), 127 | 8: pb.ProtoDecoder("num_action_button_clicks", pb.read_le_varint32), 128 | 9: pb.ProtoDecoder("creation_time_millis", read_datetime), 129 | 10: pb.ProtoDecoder("time_until_first_click_millis", pb.read_le_varint), 130 | 11: pb.ProtoDecoder("time_until_last_click_millis", pb.read_le_varint), 131 | 12: pb.ProtoDecoder("time_until_close_millis", pb.read_le_varint), 132 | 13: pb.ProtoDecoder("closed_reason", lambda x: ClosedReason(pb.read_le_varint(x))), 133 | 14: pb.ProtoDecoder("has_triggered", lambda x: pb.read_le_varint(x) != 0), 134 | 15: pb.ProtoDecoder("is_shown_by_browser", lambda x: pb.read_le_varint(x) != 0), 135 | } 136 | 137 | 138 | @dataclasses.dataclass(frozen=True) 139 | class LevelDbInfo: 140 | user_key: bytes 141 | origin_file: os.PathLike 142 | seq_no: int 143 | 144 | 145 | @dataclasses.dataclass(frozen=True) 146 | class NotificationAction: 147 | action: typing.Optional[str] 148 | title: typing.Optional[str] 149 | icon: typing.Optional[str] 150 | action_type: typing.Optional[ActionType] 151 | placeholder: typing.Optional[str] 152 | 153 | 154 | @dataclasses.dataclass(frozen=True) 155 | class ChromiumNotification: 156 | level_db_info: LevelDbInfo 157 | origin: str 158 | persistent_notification_id: int 159 | notification_id: str 160 | title: typing.Optional[str] 161 | body: typing.Optional[str] 162 | data: typing.Optional[typing.Any] 163 | timestamp: datetime.datetime 164 | creation_time: datetime.datetime # from creation_time_millis 165 | closed_reason: ClosedReason 166 | time_until_first_click_millis: int 167 | time_until_last_click_millis: int 168 | time_until_close_millis: int 169 | 170 | tag: typing.Optional[str] 171 | image: typing.Optional[str] 172 | icon: typing.Optional[str] 173 | badge: typing.Optional[str] 174 | 175 | actions: typing.Optional[typing.Iterable[NotificationAction]] 176 | 177 | 178 | class NotificationReader: 179 | def __init__(self, notification_input_path: pathlib.Path): 180 | self._db = ccl_leveldb.RawLevelDb(notification_input_path) 181 | 182 | def close(self): 183 | self._db.close() 184 | 185 | def __enter__(self): 186 | return self 187 | 188 | def __exit__(self, exc_type, exc_val, exc_tb): 189 | self._db.close() 190 | 191 | def read_notifications(self) -> typing.Iterable[ChromiumNotification]: 192 | blink_deserializer = ccl_blink_value_deserializer.BlinkV8Deserializer() 193 | for record in self._db.iterate_records_raw(): 194 | if record.state != ccl_leveldb.KeyState.Live: 195 | continue 196 | 197 | key = record.user_key.decode("utf-8") 198 | record_type, key_info = key.split(":", 1) 199 | origin, key_id = key_info.split("\0", 1) 200 | level_db_info = LevelDbInfo(record.user_key, record.origin_file, record.seq) 201 | if record_type == "DATA": 202 | with io.BytesIO(record.value) as stream: 203 | root = pb.ProtoObject( 204 | 0x2, 205 | "root", 206 | pb.read_protobuff(stream, NotificationDatabaseDataProto_Structure, use_friendly_tag=True)) 207 | 208 | data = root.only("notification_data").only("data").value 209 | if data: 210 | if data[0] != 0xff: 211 | print(key) 212 | print(data) 213 | raise ValueError("Missing blink tag at the start of data") 214 | blink_version, blink_version_bytes = pb._read_le_varint(io.BytesIO(data[1:])) 215 | data_start = 1 + len(blink_version_bytes) 216 | if blink_version >= BlinkTrailer.MIN_WIRE_FORMAT_VERSION_FOR_TRAILER: 217 | trailer = BlinkTrailer.from_buffer(data, data_start) # TODO: do something with the trailer? 218 | data_start += BlinkTrailer.TRAILER_SIZE 219 | 220 | with io.BytesIO(data[data_start:]) as obj_raw: 221 | try: 222 | deserializer = ccl_v8_value_deserializer.Deserializer( 223 | obj_raw, host_object_delegate=blink_deserializer.read) 224 | except ValueError: 225 | print("Error record:") 226 | print(level_db_info, key) 227 | raise 228 | data = deserializer.read() 229 | 230 | yield ChromiumNotification( 231 | level_db_info, 232 | root.only("origin").value, 233 | root.only("persistent_notification_id").value, 234 | root.only("notification_id").value, 235 | root.only("notification_data").only("title").value, 236 | root.only("notification_data").only("body").value, 237 | data, 238 | root.only("notification_data").only("timestamp").value, 239 | root.only("creation_time_millis").value, 240 | root.only("closed_reason").value, 241 | root.only("time_until_first_click_millis").value, 242 | root.only("time_until_last_click_millis").value, 243 | root.only("time_until_close_millis").value, 244 | root.only("notification_data").only("tag").value, 245 | root.only("notification_data").only("image").value, 246 | root.only("notification_data").only("icon").value, 247 | root.only("notification_data").only("badge").value, 248 | tuple( 249 | NotificationAction( 250 | x.only("action").value, 251 | x.only("title").value, 252 | x.only("icon").value, 253 | x.only("type").value, 254 | x.only("placeholder").value 255 | ) 256 | for x in root["notification_data"][0]["actions"]) 257 | ) 258 | 259 | 260 | if __name__ == '__main__': 261 | if len(sys.argv) < 2: 262 | print(f"USAGE: {pathlib.Path(sys.argv[0]).name} ") 263 | exit(1) 264 | 265 | _reader = NotificationReader(pathlib.Path(sys.argv[1])) 266 | _blink_deserializer = ccl_blink_value_deserializer.BlinkV8Deserializer() 267 | for notification in _reader.read_notifications(): 268 | print(notification) 269 | -------------------------------------------------------------------------------- /ccl_chromium_reader/ccl_chromium_sessionstorage.py: -------------------------------------------------------------------------------- 1 | """ 2 | Copyright 2021, CCL Forensics 3 | Permission is hereby granted, free of charge, to any person obtaining a copy of 4 | this software and associated documentation files (the "Software"), to deal in 5 | the Software without restriction, including without limitation the rights to 6 | use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies 7 | of the Software, and to permit persons to whom the Software is furnished to do 8 | so, subject to the following conditions: 9 | 10 | The above copyright notice and this permission notice shall be included in all 11 | copies or substantial portions of the Software. 12 | 13 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 14 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 15 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 16 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 17 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 18 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 19 | SOFTWARE. 20 | """ 21 | 22 | import sys 23 | import pathlib 24 | import typing 25 | import dataclasses 26 | import re 27 | import collections.abc as col_abc 28 | from types import MappingProxyType 29 | 30 | from .storage_formats import ccl_leveldb 31 | from .common import KeySearch 32 | 33 | __version__ = "0.6" 34 | __description__ = "Module for reading the Chromium leveldb sessionstorage format" 35 | __contact__ = "Alex Caithness" 36 | 37 | # See: https://source.chromium.org/chromium/chromium/src/+/main:components/services/storage/dom_storage/session_storage_metadata.cc 38 | # et al 39 | 40 | _NAMESPACE_PREFIX = b"namespace-" 41 | _MAP_ID_PREFIX = b"map-" 42 | 43 | log = None 44 | 45 | 46 | @dataclasses.dataclass(frozen=True) 47 | class SessionStoreValue: 48 | host: typing.Optional[str] 49 | key: str 50 | value: str 51 | # guid: typing.Optional[str] 52 | leveldb_sequence_number: int 53 | is_deleted: bool = False 54 | 55 | @property 56 | def record_location(self) -> str: 57 | return f"Leveldb Seq: {self.leveldb_sequence_number}" 58 | 59 | 60 | class SessionStoreDb: 61 | # todo: get all grouped by namespace by host? 62 | # todo: get all grouped by namespace by host.key? 63 | # todo: consider refactoring to only getting metadata on first pass and everything else on demand? 64 | def __init__(self, in_dir: pathlib.Path): 65 | if not in_dir.is_dir(): 66 | raise IOError("Input directory is not a directory") 67 | 68 | self._ldb = ccl_leveldb.RawLevelDb(in_dir) 69 | 70 | # If performance is a concern we should refactor this, but slow and steady for now 71 | 72 | # First collect the namespace (session/tab guid + host) and map-ids together 73 | self._map_id_to_host = {} # map_id: host 74 | self._deleted_keys = set() 75 | 76 | for rec in self._ldb.iterate_records_raw(): 77 | if rec.user_key.startswith(_NAMESPACE_PREFIX): 78 | if rec.user_key == _NAMESPACE_PREFIX: 79 | continue # bogus entry near the top usually 80 | try: 81 | key = rec.user_key.decode("utf-8") 82 | except UnicodeDecodeError: 83 | print(f"Invalid namespace key: {rec.user_key}") 84 | continue 85 | 86 | split_key = key.split("-", 2) 87 | if len(split_key) != 3: 88 | print(f"Invalid namespace key: {key}") 89 | continue 90 | 91 | _, guid, host = split_key 92 | 93 | if not host: 94 | continue # TODO investigate why this happens 95 | 96 | # normalize host to lower just in case 97 | host = host.lower() 98 | guid_host_pair = guid, host 99 | 100 | if rec.state == ccl_leveldb.KeyState.Deleted: 101 | self._deleted_keys.add(guid_host_pair) 102 | else: 103 | try: 104 | map_id = rec.value.decode("utf-8") 105 | except UnicodeDecodeError: 106 | print(f"Invalid namespace value: {key}") 107 | continue 108 | 109 | if not map_id: 110 | continue # TODO: investigate why this happens/do we want to keep the host around somewhere? 111 | 112 | #if map_id in self._map_id_to_host_guid and self._map_id_to_host_guid[map_id] != guid_host_pair: 113 | if map_id in self._map_id_to_host and self._map_id_to_host[map_id] != host: 114 | print("Map ID Collision!") 115 | print(f"map_id: {map_id}") 116 | print(f"Old host: {self._map_id_to_host[map_id]}") 117 | print(f"New host: {guid_host_pair}") 118 | raise ValueError("map_id collision") 119 | else: 120 | self._map_id_to_host[map_id] = host 121 | 122 | # freeze stuff 123 | self._map_id_to_host = MappingProxyType(self._map_id_to_host) 124 | 125 | self._deleted_keys = frozenset(self._deleted_keys) 126 | self._deleted_keys_lookup: dict[str, tuple] = {} 127 | 128 | self._host_lookup = {} # {host: {ss_key: [SessionStoreValue, ...]}} 129 | self._orphans = [] # list of tuples of key, value where we can't get the host 130 | for rec in self._ldb.iterate_records_raw(): 131 | if rec.user_key.startswith(_MAP_ID_PREFIX): 132 | try: 133 | key = rec.user_key.decode("utf-8") 134 | except UnicodeDecodeError: 135 | print(f"Invalid map id key: {rec.user_key}") 136 | continue 137 | 138 | # if rec.state == ccl_leveldb.KeyState.Deleted: 139 | # continue # TODO: do we want to keep the key around because the presence is important? 140 | 141 | split_key = key.split("-", 2) 142 | if len(split_key) != 3: 143 | print(f"Invalid map id key: {key}") 144 | continue 145 | 146 | _, map_id, ss_key = split_key 147 | 148 | if not split_key: 149 | # TODO what does it mean when there is no key here? 150 | # The value will also be a single number (encoded utf-8) 151 | continue 152 | 153 | try: 154 | value = rec.value.decode("UTF-16-LE") if rec.state == ccl_leveldb.KeyState.Live else None 155 | except UnicodeDecodeError: 156 | print(f"Error decoding value for {key}") 157 | print(f"Raw Value: {rec.value}") 158 | continue 159 | 160 | host = self._map_id_to_host.get(map_id) 161 | if not host: 162 | self._orphans.append( 163 | (ss_key, 164 | SessionStoreValue(None, ss_key, value, rec.seq, rec.state == ccl_leveldb.KeyState.Deleted) 165 | )) 166 | else: 167 | self._host_lookup.setdefault(host, {}) 168 | self._host_lookup[host].setdefault(ss_key, []) 169 | self._host_lookup[host][ss_key].append( 170 | SessionStoreValue(host, ss_key, value, rec.seq, rec.state == ccl_leveldb.KeyState.Deleted)) 171 | 172 | def __contains__(self, item: typing.Union[str, typing.Tuple[str, str]]) -> bool: 173 | """ 174 | :param item: either the host as a str or a tuple of the host and a key (both str) 175 | :return: if item is a str, returns true if that host is present, if item is a tuple of (str, str), returns True 176 | if that host and key pair are present 177 | """ 178 | 179 | if isinstance(item, str): 180 | return item in self._host_lookup 181 | elif isinstance(item, tuple) and len(item) == 2: 182 | host, key = item 183 | return host in self._host_lookup and key in self._host_lookup[host] 184 | else: 185 | raise TypeError("item must be a string or a tuple of (str, str)") 186 | 187 | def iter_hosts(self) -> typing.Iterable[str]: 188 | """ 189 | :return: yields the hosts present in this SessionStorage 190 | """ 191 | yield from self._host_lookup.keys() 192 | 193 | def get_all_for_host(self, host: str) -> dict[str, tuple[SessionStoreValue, ...]]: 194 | """ 195 | DEPRECATED 196 | :param host: the host (domain name) for the session storage 197 | :return: a dictionary where the keys are storage keys and the values are tuples of SessionStoreValue objects 198 | for that key. Multiple values may be returned as deleted or old values may be recovered. 199 | """ 200 | if host not in self: 201 | return {} 202 | result_raw = dict(self._host_lookup[host]) 203 | for ss_key in result_raw: 204 | result_raw[ss_key] = tuple(result_raw[ss_key]) 205 | return result_raw 206 | 207 | def _search_host(self, host: KeySearch) -> list[str]: 208 | if isinstance(host, str): 209 | return [host] 210 | elif isinstance(host, re.Pattern): 211 | return [x for x in self._host_lookup if host.search(x)] 212 | elif isinstance(host, col_abc.Collection): 213 | return list(set(host) & self._host_lookup.keys()) 214 | elif isinstance(host, col_abc.Callable): 215 | return [x for x in self._host_lookup if host(x)] 216 | else: 217 | raise TypeError(f"Unexpected type: {type(host)} (expects: {KeySearch})") 218 | 219 | def iter_records_for_host( 220 | self, host: KeySearch, *, 221 | include_deletions=False, raise_on_no_result=True) -> col_abc.Iterable[SessionStoreValue]: 222 | """ 223 | :param host: storage key (host) for the records. This can be one of: a single string; 224 | a collection of strings; a regex pattern; a function that takes a string (each host) and returns a bool. 225 | :param include_deletions: if True, records related to deletions will be included 226 | :param raise_on_no_result: if True (the default) if no matching storage keys are found, raise a KeyError 227 | (these will have None as values). 228 | :return: iterable of SessionStoreValue 229 | """ 230 | if isinstance(host, str): 231 | if raise_on_no_result and host not in self._host_lookup: 232 | raise KeyError(host) 233 | for records in self._host_lookup[host].values(): 234 | for rec in records: 235 | if include_deletions or not rec.is_deleted: 236 | yield rec 237 | elif isinstance(host, re.Pattern) or isinstance(host, col_abc.Collection) or isinstance(host, col_abc.Callable): 238 | found_hosts = self._search_host(host) 239 | if raise_on_no_result and not found_hosts: 240 | raise KeyError(host) 241 | for found_host in found_hosts: 242 | for records in self._host_lookup[found_host].values(): 243 | for rec in records: 244 | if include_deletions or not rec.is_deleted: 245 | yield rec 246 | else: 247 | raise TypeError(f"Unexpected type for host: {type(host)} (expects: {KeySearch})") 248 | 249 | def iter_all_records(self, *, include_deletions=False, include_orphans=False): 250 | """ 251 | Returns all records recovered from session storage 252 | :param include_deletions: if True, records related to deletions will be included 253 | :param include_orphans: if True, records which cannot be associated with a host will be included 254 | """ 255 | for host in self.iter_hosts(): 256 | yield from self.iter_records_for_host(host, include_deletions=include_deletions) 257 | if include_orphans: 258 | yield from (x[1] for x in self.iter_orphans()) 259 | 260 | def get_session_storage_key(self, host: str, key: str) -> tuple[SessionStoreValue, ...]: 261 | """ 262 | DEPRECATED 263 | :param host: the host (domain name) for the session storage 264 | :param key: the storage key 265 | :return: a tuple of SessionStoreValue matching the host and key. Multiple values may be returned as deleted or 266 | old values may be recovered. 267 | """ 268 | if (host, key) not in self: 269 | return tuple() 270 | return tuple(self._host_lookup[host][key]) 271 | 272 | def iter_records_for_session_storage_key( 273 | self, host: KeySearch, key: KeySearch, *, 274 | include_deletions=False, raise_on_no_result=True) -> col_abc.Iterable[SessionStoreValue]: 275 | """ 276 | :param host: storage key (host) for the records. This can be one of: a single string; 277 | a collection of strings; a regex pattern; a function that takes a string (each host) and returns a bool. 278 | :param key: script defined key for the records. This can be one of: a single string; 279 | a collection of strings; a regex pattern; a function that takes a string and returns a bool. 280 | :param include_deletions: if True, records related to deletions will be included 281 | :param raise_on_no_result: if True (the default) if no matching storage keys are found, raise a KeyError 282 | (these will have None as values). 283 | :return: iterable of LocalStorageRecords 284 | """ 285 | if isinstance(host, str) and isinstance(key, str): 286 | if host not in self._host_lookup or key not in self._host_lookup[host]: 287 | if raise_on_no_result: 288 | raise KeyError((host, key)) 289 | else: 290 | return [] 291 | 292 | yield from (r for r in self._host_lookup[host][key] if include_deletions or not r.is_deleted) 293 | 294 | else: 295 | found_hosts = self._search_host(host) 296 | if raise_on_no_result and not found_hosts: 297 | raise KeyError((host, key)) 298 | 299 | yielded = False 300 | for found_host in found_hosts: 301 | if isinstance(key, str): 302 | matched_keys = [key] 303 | elif isinstance(key, re.Pattern): 304 | matched_keys = [x for x in self._host_lookup[found_host].keys() if key.search(x)] 305 | elif isinstance(key, col_abc.Collection): 306 | script_key_set = set(key) 307 | matched_keys = list(self._host_lookup[found_host].keys() & script_key_set) 308 | elif isinstance(key, col_abc.Callable): 309 | matched_keys = [x for x in self._host_lookup[found_host].keys() if key(x)] 310 | else: 311 | raise TypeError(f"Unexpected type for script key: {type(key)} (expects: {KeySearch})") 312 | 313 | for matched_key in matched_keys: 314 | for rec in self._host_lookup[found_host][matched_key]: 315 | if include_deletions or not rec.is_deleted: 316 | yielded = True 317 | yield rec 318 | 319 | if not yielded and raise_on_no_result: 320 | raise KeyError((host, key)) 321 | 322 | def iter_orphans(self) -> typing.Iterable[tuple[str, SessionStoreValue]]: 323 | """ 324 | Returns records which have been orphaned from their host (domain name) where it cannot be recovered. The keys 325 | may be named uniquely enough that the host may be inferred. 326 | :return: yields tuples of (session key, SessionStoreValue) 327 | """ 328 | yield from self._orphans 329 | 330 | def __getitem__(self, item: typing.Union[str, typing.Tuple[str, str]]) -> typing.Union[ 331 | dict[str, tuple[SessionStoreValue, ...]], tuple[SessionStoreValue, ...]]: 332 | if item not in self: 333 | raise KeyError(item) 334 | 335 | if isinstance(item, str): 336 | return self.get_all_for_host(item) 337 | elif isinstance(item, tuple) and len(item) == 2: 338 | return self.get_session_storage_key(*item) 339 | else: 340 | raise TypeError("item must be a string or a tuple of (str, str)") 341 | 342 | def __iter__(self) -> typing.Iterable[str]: 343 | """ 344 | iterates the hosts present 345 | """ 346 | return self.iter_hosts() 347 | 348 | def close(self): 349 | self._ldb.close() 350 | 351 | def __enter__(self) -> "SessionStoreDb": 352 | return self 353 | 354 | def __exit__(self, exc_type, exc_val, exc_tb): 355 | self.close() 356 | 357 | 358 | def main(args): 359 | ldb_in_dir = pathlib.Path(args[0]) 360 | ssdb = SessionStoreDb(ldb_in_dir) 361 | 362 | print("Hosts in db:") 363 | for host in ssdb: 364 | print(host) 365 | 366 | 367 | if __name__ == '__main__': 368 | main(sys.argv[1:]) 369 | -------------------------------------------------------------------------------- /ccl_chromium_reader/ccl_chromium_snss2.py: -------------------------------------------------------------------------------- 1 | """ 2 | Copyright 2022, CCL Forensics 3 | 4 | Permission is hereby granted, free of charge, to any person obtaining a copy of 5 | this software and associated documentation files (the "Software"), to deal in 6 | the Software without restriction, including without limitation the rights to 7 | use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies 8 | of the Software, and to permit persons to whom the Software is furnished to do 9 | so, subject to the following conditions: 10 | 11 | The above copyright notice and this permission notice shall be included in all 12 | copies or substantial portions of the Software. 13 | 14 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 15 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 16 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 17 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 18 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 19 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 20 | SOFTWARE. 21 | """ 22 | 23 | import dataclasses 24 | import enum 25 | import struct 26 | import sys 27 | import os 28 | import pathlib 29 | import datetime 30 | import types 31 | import typing 32 | from .serialization_formats.ccl_easy_chromium_pickle import EasyPickleIterator, EasyPickleException 33 | 34 | __version__ = "0.2" 35 | __description__ = "Module for reading Chromium SNSS files" 36 | __contact__ = "Alex Caithness" 37 | 38 | 39 | class TabRestoreIdType(enum.Enum): 40 | # components/sessions/core/tab_restore_service_impl.cc 41 | CommandUpdateTabNavigation = 1 42 | CommandRestoredEntry = 2 43 | CommandWindowDeprecated = 3 44 | CommandSelectedNavigationInTab = 4 45 | CommandPinnedState = 5 46 | CommandSetExtensionAppID = 6 47 | CommandSetWindowAppName = 7 48 | CommandSetTabUserAgentOverride = 8 49 | CommandWindow = 9 50 | CommandSetTabGroupData = 10 51 | CommandSetTabUserAgentOverride2 = 11 52 | CommandSetWindowUserTitle = 12 53 | CommandCreateGroup = 13 54 | CommandAddTabExtraData = 14 55 | 56 | UnusedCommand = 255 57 | 58 | 59 | class SessionRestoreIdType(enum.Enum): 60 | # components/sessions/core/session_service_commands.cc 61 | CommandSetTabWindow = 0 62 | CommandSetWindowBounds = 1 # // OBSOLETE Superseded by kCommandSetWindowBounds3. 63 | CommandSetTabIndexInWindow = 2 64 | CommandTabNavigationPathPrunedFromBack = 5 # // OBSOLETE: Superseded by kCommandTabNavigationPathPruned instead 65 | CommandUpdateTabNavigation = 6 66 | CommandSetSelectedNavigationIndex = 7 67 | CommandSetSelectedTabInIndex = 8 68 | CommandSetWindowType = 9 69 | CommandSetWindowBounds2 = 10 # // OBSOLETE Superseded by kCommandSetWindowBounds3. Except for data migration. 70 | CommandTabNavigationPathPrunedFromFront = 11 # // Superseded kCommandTabNavigationPathPruned instead 71 | CommandSetPinnedState = 12 72 | CommandSetExtensionAppID = 13 73 | CommandSetWindowBounds3 = 14 74 | CommandSetWindowAppName = 15 75 | CommandTabClosed = 16 76 | CommandWindowClosed = 17 77 | CommandSetTabUserAgentOverride = 18 # // OBSOLETE: Superseded by kCommandSetTabUserAgentOverride2. 78 | CommandSessionStorageAssociated = 19 79 | CommandSetActiveWindow = 20 80 | CommandLastActiveTime = 21 81 | CommandSetWindowWorkspace = 22 # // OBSOLETE Superseded by kCommandSetWindowWorkspace2. 82 | CommandSetWindowWorkspace2 = 23 83 | CommandTabNavigationPathPruned = 24 84 | CommandSetTabGroup = 25 85 | CommandSetTabGroupMetadata = 26 # // OBSOLETE Superseded by kCommandSetTabGroupMetadata2. 86 | CommandSetTabGroupMetadata2 = 27 87 | CommandSetTabGuid = 28 88 | CommandSetTabUserAgentOverride2 = 29 89 | CommandSetTabData = 30 90 | CommandSetWindowUserTitle = 31 91 | CommandSetWindowVisibleOnAllWorkspaces = 32 92 | CommandAddTabExtraData = 33 93 | CommandAddWindowExtraData = 34 94 | 95 | # Edge has custom command types. These are what I have seen so far. 96 | # None of these types appear to be related to browsing data at the moment (typically only a few bytes long). 97 | EdgeCommandUnknown131 = 131 98 | EdgeCommandUnknown132 = 132 99 | 100 | UnusedCommand = 255 101 | 102 | 103 | class PageTransition: 104 | # ui/base/page_transition_types.h 105 | _core_mask = 0xff 106 | _qualifier_mask = 0xffffff00 107 | _core_transitions = { 108 | 0: "Link", 109 | 1: "Typed", 110 | 2: "AutoBookmark", 111 | 3: "AutoSubframe", 112 | 4: "ManualSubframe", 113 | 5: "Generated", 114 | 6: "AutoToplevel", 115 | 7: "FormSubmit", 116 | 8: "Reload", 117 | 9: "Keyword", 118 | 10: "KeywordGenerated" 119 | } 120 | _qualifiers = { 121 | 0x00800000: "Blocked", 122 | 0x01000000: "ForwardBack", 123 | 0x02000000: "FromAddressBar", 124 | 0x04000000: "HomePage", 125 | 0x08000000: "FromApi", 126 | 0x10000000: "ChainStart", 127 | 0x20000000: "ChainEnd", 128 | 0x40000000: "ClientRedirect", 129 | 0x80000000: "ServerRedirect" 130 | } 131 | 132 | def __init__(self, value): 133 | self._value = value 134 | if value < 0: 135 | # signed to unsigned 136 | value += (0x80000000 * 2) 137 | self._core_transition = PageTransition._core_transitions[value & PageTransition._core_mask] 138 | self._qualifiers = [] 139 | for flag in PageTransition._qualifiers: 140 | if (value & PageTransition._qualifier_mask) & flag > 0: 141 | self._qualifiers.append(PageTransition._qualifiers[flag]) 142 | 143 | def __str__(self): 144 | return "; ".join([self._core_transition] + self._qualifiers) 145 | 146 | def __repr__(self): 147 | return "".format(self._value, str(self)) 148 | 149 | @property 150 | def core_transition(self) -> str: 151 | return self._core_transition 152 | 153 | @property 154 | def qualifiers(self) -> typing.Iterable[str]: 155 | yield from self._qualifiers 156 | 157 | @property 158 | def value(self): 159 | return self._value 160 | 161 | 162 | class SnssError(Exception): 163 | ... 164 | 165 | 166 | @dataclasses.dataclass(frozen=True) 167 | class SessionCommand: 168 | offset: int 169 | id_type: typing.Union[SessionRestoreIdType, TabRestoreIdType] 170 | 171 | 172 | @dataclasses.dataclass(frozen=True) 173 | class NavigationEntry(SessionCommand): 174 | # components/sessions/core/serialized_navigation_entry.cc 175 | index: int 176 | url: str 177 | title: str 178 | page_state_raw: bytes # replace with completed PageState object 179 | transition_type: PageTransition 180 | has_post_data: typing.Optional[bool] = None 181 | referrer_url: typing.Optional[str] = None 182 | original_request_url: typing.Optional[str] = None 183 | is_overriding_user_agent: typing.Optional[bool] = None 184 | timestamp: typing.Optional[datetime.datetime] = None 185 | http_status: typing.Optional[int] = None 186 | referrer_policy: typing.Optional[int] = None 187 | extended_map: typing.Optional[types.MappingProxyType] = None 188 | task_id: typing.Optional[int] = None 189 | parent_task_id: typing.Optional[int] = None 190 | root_task_id: typing.Optional[int] = None 191 | session_id: typing.Optional[int] = None 192 | 193 | @classmethod 194 | def from_pickle( 195 | cls, pickle, id_type: typing.Union[SessionRestoreIdType, TabRestoreIdType], 196 | offset: int, session_id: typing.Optional[int]=None) -> "NavigationEntry": 197 | index = pickle.read_int32() 198 | url = pickle.read_string() 199 | title = pickle.read_string16() 200 | page_state_length = pickle.read_int32() 201 | # https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/common/page_state/page_state_serialization.cc;drc=d1e1301c82bef37d30796e2d6098856b851d90a4;l=897 202 | page_state_raw = pickle.read_aligned(page_state_length) 203 | transition_type = PageTransition(pickle.read_uint32()) 204 | 205 | try: 206 | type_mask = pickle.read_uint32() 207 | except EasyPickleException: 208 | # very old versions of data end here, so we return a partial object here 209 | return cls(offset, id_type, index, url, title, page_state_raw, transition_type) 210 | 211 | has_post_data = (type_mask & 0x01) > 0 212 | referrer_url = pickle.read_string() 213 | _ = pickle.read_int32() # referrer policy, not used 214 | original_request_url = pickle.read_string() 215 | is_overriding_user_agent = pickle.read_bool() 216 | timestamp = pickle.read_datetime() 217 | _ = pickle.read_string16() # search terms, not used 218 | http_status = pickle.read_int32() 219 | referrer_policy = pickle.read_int32() 220 | 221 | extended_map_size = pickle.read_int32() 222 | extended_map = {} 223 | for _ in range(extended_map_size): 224 | key = pickle.read_string() 225 | value = pickle.read_string() 226 | extended_map[key] = value 227 | 228 | extended_map = types.MappingProxyType(extended_map) 229 | 230 | task_id = None 231 | parent_task_id = None 232 | root_task_id = None 233 | 234 | try: 235 | # these might not exist in older files, so no big deal if we can't get them 236 | task_id = pickle.read_int64() 237 | parent_task_id = pickle.read_int64() 238 | root_task_id = pickle.read_int64() 239 | 240 | child_task_id_count = pickle.read_int32() 241 | if child_task_id_count != 0: 242 | raise SnssError("Child tasks should not be present when reading NavigationEntry") 243 | except EasyPickleException: 244 | pass 245 | 246 | return cls( 247 | offset, id_type, 248 | index, url, title, page_state_raw, transition_type, has_post_data, referrer_url, original_request_url, 249 | is_overriding_user_agent, timestamp, http_status, referrer_policy, extended_map, task_id, parent_task_id, 250 | root_task_id, session_id 251 | ) 252 | 253 | 254 | class UnprocessedEntry(SessionCommand): 255 | ... 256 | 257 | 258 | class SnssFileType(enum.Enum): 259 | Session = 1 260 | Tab = 2 261 | 262 | 263 | class SnssFile: 264 | def __init__(self, file_type: SnssFileType, stream: typing.BinaryIO): 265 | # components/sessions/core/command_storage_backend.cc 266 | self._f = stream 267 | if file_type == SnssFileType.Session: 268 | self._id_type = SessionRestoreIdType 269 | elif file_type == SnssFileType.Tab: 270 | self._id_type = TabRestoreIdType 271 | else: 272 | raise ValueError("file_type is an unknown SnssFileType or is not SnssFileType") 273 | 274 | self._file_type = file_type 275 | header = self._f.read(8) 276 | if header[0:4] != b"SNSS": 277 | raise SnssError(f"Invalid magic; expected SNSS; got {header[0:4]}") 278 | self._version, = struct.unpack(" SnssFileType: 284 | return self._file_type 285 | 286 | def reset(self): 287 | self._f.seek(8, os.SEEK_SET) 288 | 289 | def _get_next_session_command(self) -> typing.Optional[SessionCommand]: 290 | # components/sessions/core/command_storage_backend.cc 291 | start_offset = self._f.tell() 292 | length_raw = self._f.read(2) 293 | if not length_raw: 294 | return None # eof 295 | length, = struct.unpack(" typing.Iterable[SessionCommand]: 312 | self.reset() 313 | while command := self._get_next_session_command(): 314 | yield command 315 | 316 | 317 | def main(args): 318 | in_path = pathlib.Path(args[0]) 319 | if in_path.name.startswith("Session_"): 320 | file_type = SnssFileType.Session 321 | elif in_path.name.startswith("Tabs_"): 322 | file_type = SnssFileType.Tab 323 | else: 324 | raise ValueError("File name does not start with Session or Tabs") 325 | with in_path.open("rb") as f: 326 | snss_file = SnssFile(file_type, f) 327 | for command in snss_file.iter_session_commands(): 328 | print(command) 329 | 330 | 331 | if __name__ == '__main__': 332 | main(sys.argv[1:]) 333 | -------------------------------------------------------------------------------- /ccl_chromium_reader/ccl_shared_proto_db_downloads.py: -------------------------------------------------------------------------------- 1 | """ 2 | Copyright 2022, CCL Forensics 3 | 4 | Permission is hereby granted, free of charge, to any person obtaining a copy of 5 | this software and associated documentation files (the "Software"), to deal in 6 | the Software without restriction, including without limitation the rights to 7 | use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies 8 | of the Software, and to permit persons to whom the Software is furnished to do 9 | so, subject to the following conditions: 10 | 11 | The above copyright notice and this permission notice shall be included in all 12 | copies or substantial portions of the Software. 13 | 14 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 15 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 16 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 17 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 18 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 19 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 20 | SOFTWARE. 21 | """ 22 | 23 | __version__ = "0.3" 24 | __description__ = "A module for reading downloads from the Chrome/Chromium shared_proto_db leveldb data store" 25 | __contact__ = "Alex Caithness" 26 | 27 | import datetime 28 | import io 29 | import os 30 | import pathlib 31 | import sys 32 | import typing 33 | 34 | from .storage_formats import ccl_leveldb 35 | from .serialization_formats import ccl_protobuff as pb 36 | from .download_common import Download 37 | 38 | CHROME_EPOCH = datetime.datetime(1601, 1, 1, 0, 0, 0) 39 | 40 | 41 | def chrome_milli_time(milliseconds: typing.Optional[int], allow_none=True) -> typing.Optional[datetime.datetime]: 42 | if milliseconds is not None: 43 | if milliseconds == 0xffffffffffffffff: 44 | return CHROME_EPOCH 45 | else: 46 | return CHROME_EPOCH + datetime.timedelta(milliseconds=milliseconds) 47 | elif allow_none: 48 | return None 49 | raise ValueError("milliseconds cannot be None") 50 | 51 | 52 | def read_datetime(stream) -> typing.Optional[datetime.datetime]: 53 | return chrome_milli_time(pb.read_le_varint(stream)) 54 | 55 | 56 | # https://source.chromium.org/chromium/chromium/src/+/main:components/download/database/proto/download_entry.proto;l=86 57 | 58 | HttpRequestHeader_Structure = { 59 | 1: pb.ProtoDecoder("key", pb.read_string), 60 | 2: pb.ProtoDecoder("value", pb.read_string) 61 | } 62 | 63 | ReceivedSlice_Structure = { 64 | 1: pb.ProtoDecoder("offset", pb.read_le_varint), 65 | 2: pb.ProtoDecoder("received_bytes", pb.read_le_varint), 66 | 3: pb.ProtoDecoder("finished", lambda x: pb.read_le_varint(x) != 0) 67 | } 68 | 69 | InProgressInfo_Structure = { 70 | 1: pb.ProtoDecoder("url_chain", pb.read_string), # string 71 | 2: pb.ProtoDecoder("referrer_url", pb.read_string), # string 72 | 3: pb.ProtoDecoder("site_url", pb.read_string), # string // deprecated 73 | 4: pb.ProtoDecoder("tab_url", pb.read_string), # string 74 | 5: pb.ProtoDecoder("tab_referrer_url", pb.read_string), # string 75 | 6: pb.ProtoDecoder("fetch_error_body", lambda x: pb.read_le_varint(x) != 0), # bool 76 | 7: pb.ProtoDecoder("request_headers", lambda x: pb.read_embedded_protobuf(x, HttpRequestHeader_Structure, True)), # HttpRequestHeader 77 | 8: pb.ProtoDecoder("etag", pb.read_string), # string 78 | 9: pb.ProtoDecoder("last_modified", pb.read_string), # string 79 | 10: pb.ProtoDecoder("total_bytes", pb.read_le_varint), # int64: 80 | 11: pb.ProtoDecoder("mime_type", pb.read_string), # string 81 | 12: pb.ProtoDecoder("original_mime_type", pb.read_string), # string 82 | 13: pb.ProtoDecoder("current_path", pb.read_blob), # bytes // Serialized pickles to support string16: TODO 83 | 14: pb.ProtoDecoder("target_path", pb.read_blob), # bytes // Serialized pickles to support string16: TODO 84 | 15: pb.ProtoDecoder("received_bytes", pb.read_le_varint), # int64: 85 | 16: pb.ProtoDecoder("start_time", read_datetime), # int64: 86 | 17: pb.ProtoDecoder("end_time", read_datetime), # int64: 87 | 18: pb.ProtoDecoder("received_slices", lambda x: pb.read_embedded_protobuf(x, ReceivedSlice_Structure, True)), # ReceivedSlice 88 | 19: pb.ProtoDecoder("hash", pb.read_blob), # string 89 | 20: pb.ProtoDecoder("transient", lambda x: pb.read_le_varint(x) != 0), # bool 90 | 21: pb.ProtoDecoder("state", pb.read_le_varint32), # int32: 91 | 22: pb.ProtoDecoder("danger_type", pb.read_le_varint32), # int32: 92 | 23: pb.ProtoDecoder("interrupt_reason", pb.read_le_varint32), # int32: 93 | 24: pb.ProtoDecoder("paused", lambda x: pb.read_le_varint(x) != 0), # bool 94 | 25: pb.ProtoDecoder("metered", lambda x: pb.read_le_varint(x) != 0), # bool 95 | 26: pb.ProtoDecoder("bytes_wasted", pb.read_le_varint), # int64: 96 | 27: pb.ProtoDecoder("auto_resume_count", pb.read_le_varint32), # int32: 97 | # 28: pb.ProtoDecoder("download_schedule", None) # DownloadSchedule // // Deprecated. 98 | # 29: pb.ProtoDecoder("reroute_info", pb), # enterprise_connectors.DownloadItemRerouteInfo TODO 99 | 30: pb.ProtoDecoder("credentials_mode", pb.read_le_varint32), # int32: // network::mojom::CredentialsMode 100 | 31: pb.ProtoDecoder("range_request_from", pb.read_le_varint), # int64: 101 | 32: pb.ProtoDecoder("range_request_to", pb.read_le_varint), # int64: 102 | 33: pb.ProtoDecoder("serialized_embedder_download_data", pb.read_string) # string 103 | } 104 | 105 | DownloadInfo_structure = { 106 | 1: pb.ProtoDecoder("guid", pb.read_string), 107 | 2: pb.ProtoDecoder("id", pb.read_le_varint32), 108 | # 3 UkmInfo 109 | 4: pb.ProtoDecoder("in_progress_info", lambda x: pb.read_embedded_protobuf(x, InProgressInfo_Structure, True)) 110 | } 111 | 112 | DownloadDbEntry_structure = { 113 | 1: pb.ProtoDecoder("download_info", lambda x: pb.read_embedded_protobuf(x, DownloadInfo_structure, True)) 114 | } 115 | 116 | 117 | def read_downloads( 118 | shared_proto_db_folder: typing.Union[str, os.PathLike], 119 | *, handle_errors=False, utf16_paths=True) -> typing.Iterator[Download]: 120 | ldb_path = pathlib.Path(shared_proto_db_folder) 121 | with ccl_leveldb.RawLevelDb(ldb_path) as ldb: 122 | for rec in ldb.iterate_records_raw(): 123 | if rec.state != ccl_leveldb.KeyState.Live: 124 | continue 125 | 126 | key = rec.user_key 127 | record_type, specific_key = key.split(b"_", 1) 128 | if record_type == b"21": 129 | with io.BytesIO(rec.value) as f: 130 | obj = pb.ProtoObject( 131 | 0xa, "root", pb.read_protobuff(f, DownloadDbEntry_structure, use_friendly_tag=True)) 132 | try: 133 | download = Download.from_pb(rec.seq, obj, target_path_is_utf_16=utf16_paths) 134 | except ValueError as ex: 135 | print(f"Error reading a download: {ex}", file=sys.stderr) 136 | if handle_errors: 137 | continue 138 | else: 139 | raise 140 | 141 | yield download 142 | 143 | 144 | def report_downloads( 145 | shared_proto_db_folder: typing.Union[str, os.PathLike], 146 | out_csv_path: typing.Union[str, os.PathLike], utf16_paths=True): 147 | 148 | with pathlib.Path(out_csv_path).open("tx", encoding="utf-8", newline="") as out: 149 | writer = csv.writer(out, csv.excel, quoting=csv.QUOTE_ALL, quotechar="\"", escapechar="\\") 150 | writer.writerow([ 151 | "seq no", 152 | "guid", 153 | "start time", 154 | "end time", 155 | "tab url", 156 | "tab referrer url", 157 | "download url chain", 158 | "target path", 159 | "hash", 160 | "total bytes", 161 | "mime type", 162 | "original mime type" 163 | ]) 164 | for download in read_downloads(shared_proto_db_folder, handle_errors=True, utf16_paths=utf16_paths): 165 | writer.writerow([ 166 | str(download.level_db_seq_no), 167 | str(download.guid), 168 | download.start_time, 169 | download.end_time, 170 | download.tab_url, 171 | download.tab_referrer_url, 172 | " -> ".join(download.url_chain), 173 | download.target_path, 174 | download.hash, 175 | download.total_bytes, 176 | download.mime_type, 177 | download.original_mime_type 178 | ]) 179 | 180 | 181 | if __name__ == '__main__': 182 | import csv 183 | if len(sys.argv) < 3: 184 | print(f"USAGE: {pathlib.Path(sys.argv[0]).name} [-u8]") 185 | print() 186 | print("-u8\tutf-8 target paths (use this if target paths appear garbled in the output)") 187 | print() 188 | exit(1) 189 | report_downloads(sys.argv[1], sys.argv[2], "-u8" not in sys.argv[3:]) 190 | -------------------------------------------------------------------------------- /ccl_chromium_reader/common.py: -------------------------------------------------------------------------------- 1 | import re 2 | import typing 3 | import collections.abc as col_abc 4 | 5 | 6 | KeySearch = typing.Union[str, re.Pattern, col_abc.Collection[str], col_abc.Callable[[str], bool]] 7 | 8 | 9 | def is_keysearch_hit(search: KeySearch, value: str): 10 | if isinstance(search, str): 11 | return value == search 12 | elif isinstance(search, re.Pattern): 13 | return search.search(value) is not None 14 | elif isinstance(search, col_abc.Collection): 15 | return value in set(search) 16 | elif isinstance(search, col_abc.Callable): 17 | return search(value) 18 | else: 19 | raise TypeError(f"Unexpected type: {type(search)} (expects: {KeySearch})") -------------------------------------------------------------------------------- /ccl_chromium_reader/download_common.py: -------------------------------------------------------------------------------- 1 | import dataclasses 2 | import datetime 3 | import struct 4 | import enum 5 | 6 | from .serialization_formats import ccl_protobuff as pb 7 | 8 | 9 | class DownloadSource(enum.Enum): 10 | shared_proto_db = 1 11 | history_db = 2 12 | 13 | 14 | @dataclasses.dataclass(frozen=True) 15 | class Download: # TODO: all of the parameters 16 | record_source: DownloadSource 17 | record_id: int 18 | guid: str 19 | hash: str 20 | url_chain: tuple[str, ...] 21 | tab_url: str 22 | tab_referrer_url: str 23 | target_path: str 24 | mime_type: str 25 | original_mime_type: str 26 | total_bytes: str 27 | start_time: datetime.datetime 28 | end_time: datetime.datetime 29 | 30 | @property 31 | def level_db_seq_no(self): 32 | if self.record_source == DownloadSource.shared_proto_db: 33 | return self.record_id 34 | 35 | @property 36 | def record_location(self) -> str: 37 | if self.record_source == DownloadSource.shared_proto_db: 38 | return f"Leveldb Seq: {self.record_id}" 39 | elif self.record_source == DownloadSource.history_db: 40 | return f"SQLite Rowid: {self.record_id}" 41 | raise NotImplementedError() 42 | 43 | @property 44 | def url(self) -> str: 45 | return self.url_chain[-1] 46 | 47 | @property 48 | def file_size(self) -> int: 49 | return int(self.total_bytes) 50 | 51 | @classmethod 52 | def from_pb(cls, seq: int, proto: pb.ProtoObject, *, target_path_is_utf_16=True): 53 | if not proto.only("download_info").value: 54 | raise ValueError("download_info is empty") 55 | target_path_raw = proto.only("download_info").only("in_progress_info").only("target_path").value 56 | path_proto_length, path_char_count = struct.unpack(" str: 12 | raise NotImplementedError() 13 | 14 | 15 | @typing.runtime_checkable 16 | class LocalStorageRecordProtocol(HasRecordLocationProtocol, typing.Protocol): 17 | @property 18 | def storage_key(self) -> str: 19 | raise NotImplementedError() 20 | 21 | @property 22 | def script_key(self) -> str: 23 | raise NotImplementedError() 24 | 25 | @property 26 | def value(self) -> str: 27 | raise NotImplementedError() 28 | 29 | 30 | @typing.runtime_checkable 31 | class SessionStorageRecordProtocol(HasRecordLocationProtocol, typing.Protocol): 32 | host: typing.Optional[str] 33 | key: str 34 | value: str 35 | 36 | 37 | @typing.runtime_checkable 38 | class HistoryRecordProtocol(HasRecordLocationProtocol, typing.Protocol): 39 | url: str 40 | title: str 41 | visit_time: datetime.datetime 42 | # TODO: Assess whether the parent/child visits can be part of the protocol 43 | 44 | 45 | @typing.runtime_checkable 46 | class IdbKeyProtocol(typing.Protocol): 47 | raw_key: bytes 48 | value: typing.Any 49 | 50 | 51 | @typing.runtime_checkable 52 | class IndexedDbRecordProtocol(HasRecordLocationProtocol, typing.Protocol): 53 | key: IdbKeyProtocol 54 | value: typing.Any 55 | 56 | 57 | class CacheMetadataProtocol(typing.Protocol): 58 | request_time: datetime.datetime 59 | http_header_attributes: typing.Iterable[tuple[str, str]] 60 | 61 | def get_attribute(self, attribute: str) -> list[str]: 62 | raise NotImplementedError() 63 | 64 | 65 | class CacheKeyProtocol(typing.Protocol): 66 | raw_key: str 67 | url: str 68 | 69 | 70 | class CacheRecordProtocol(typing.Protocol): 71 | key: CacheKeyProtocol 72 | metadata: CacheMetadataProtocol 73 | data: bytes 74 | metadata_location: typing.Any 75 | data_location: typing.Any 76 | was_decompressed: bool 77 | 78 | 79 | class DownloadRecordProtocol(HasRecordLocationProtocol, typing.Protocol): 80 | url: str 81 | start_time: typing.Optional[datetime.datetime] 82 | end_time: typing.Optional[datetime.datetime] 83 | target_path: typing.Optional[str] 84 | file_size: int 85 | 86 | 87 | @typing.runtime_checkable 88 | class BrowserProfileProtocol(typing.Protocol): 89 | def close(self): 90 | raise NotImplementedError() 91 | 92 | def iter_local_storage_hosts(self) -> col_abc.Iterable[str]: 93 | """ 94 | Iterates the hosts in this profile's local storage 95 | """ 96 | raise NotImplementedError() 97 | 98 | def iter_local_storage( 99 | self, storage_key: typing.Optional[KeySearch] = None, script_key: typing.Optional[KeySearch] = None, *, 100 | include_deletions=False, raise_on_no_result=False) -> col_abc.Iterable[LocalStorageRecordProtocol]: 101 | """ 102 | Iterates this profile's local storage records 103 | 104 | :param storage_key: storage key (host) for the records. This can be one of: a single string; 105 | a collection of strings; a regex pattern; a function that takes a string and returns a bool. 106 | :param script_key: script defined key for the records. This can be one of: a single string; 107 | a collection of strings; a regex pattern; a function that takes a string and returns a bool. 108 | :param include_deletions: if True, records related to deletions will be included 109 | :param raise_on_no_result: if True (the default) if no matching storage keys are found, raise a KeyError 110 | (these will have None as values). 111 | :return: 112 | """ 113 | raise NotImplementedError() 114 | 115 | def iter_session_storage_hosts(self) -> col_abc.Iterable[str]: 116 | """ 117 | Iterates this profile's session storage hosts 118 | """ 119 | raise NotImplementedError() 120 | 121 | def iter_session_storage( 122 | self, host: typing.Optional[KeySearch] = None, key: typing.Optional[KeySearch] = None, *, 123 | include_deletions=False, raise_on_no_result=False) -> col_abc.Iterable[SessionStorageRecordProtocol]: 124 | """ 125 | Iterates this profile's session storage records 126 | 127 | :param host: storage key (host) for the records. This can be one of: a single string; 128 | a collection of strings; a regex pattern; a function that takes a string (each host) and 129 | returns a bool; or None (the default) in which case all hosts are considered. 130 | :param key: script defined key for the records. This can be one of: a single string; 131 | a collection of strings; a regex pattern; a function that takes a string and returns a bool; or 132 | None (the default) in which case all keys are considered. 133 | :param include_deletions: if True, records related to deletions will be included (these will have None as 134 | values). 135 | :param raise_on_no_result: if True, if no matching storage keys are found, raise a KeyError 136 | 137 | :return: iterable of SessionStoreValue 138 | """ 139 | raise NotImplementedError() 140 | 141 | def iter_indexeddb_hosts(self) -> col_abc.Iterable[str]: 142 | """ 143 | Iterates the hosts present in the Indexed DB folder. These values are what should be used to load the databases 144 | directly. 145 | """ 146 | raise NotImplementedError() 147 | 148 | def get_indexeddb(self, host: str): 149 | """ 150 | Returns the database with the host provided. Should be one of the values returned by 151 | :func:`~iter_indexeddb_hosts`. The database will be opened on-demand if it hasn't previously been opened. 152 | 153 | :param host: the host to get 154 | """ 155 | # TODO typehint return type once it's also abstracted 156 | raise NotImplementedError() 157 | 158 | def iter_indexeddb_records( 159 | self, host_id: typing.Optional[KeySearch], database_name: typing.Optional[KeySearch] = None, 160 | object_store_name: typing.Optional[KeySearch] = None, *, 161 | raise_on_no_result=False, include_deletions=False, 162 | bad_deserializer_data_handler=None) -> col_abc.Iterable[IndexedDbRecordProtocol]: 163 | """ 164 | Iterates indexeddb records in this profile. 165 | 166 | :param host_id: the host for the records, relates to the host-named folder in the IndexedDB folder. The 167 | possible values for this profile are returned by :func:`~iter_indexeddb_hosts`. This can be one of: 168 | a single string; a collection of strings; a regex pattern; a function that takes a string (each host) and 169 | returns a bool; or None in which case all hosts are considered. Be cautious with supplying a parameter 170 | which will lead to unnecessary databases being opened as this has a set-up time for the first time it 171 | is opened. 172 | :param database_name: the database name for the records. This can be one of: a single string; a collection 173 | of strings; a regex pattern; a function that takes a string (each host) and returns a bool; or None (the 174 | default) in which case all hosts are considered. 175 | :param object_store_name: the object store name of the records. This can be one of: a single string; 176 | a collection of strings; a regex pattern; a function that takes a string (each host) and returns a bool; 177 | or None (the default) in which case all hosts are considered. 178 | :param raise_on_no_result: if True, if no matching storage keys are found, raise a KeyError 179 | :param include_deletions: if True, records related to deletions will be included (these will have None as 180 | values). 181 | :param bad_deserializer_data_handler: a callback function which will be executed by the underlying 182 | indexeddb reader if invalid data is encountered during reading a record, rather than raising an exception. 183 | The function should take two arguments: an IdbKey object (which is the key of the bad record) and a bytes 184 | object (which is the raw data). The return value of the callback is ignored by the calling code. If this is 185 | None (the default) then any bad data will cause an exception to be raised. 186 | """ 187 | raise NotImplementedError() 188 | 189 | def iterate_history_records( 190 | self, url: typing.Optional[KeySearch]=None, *, 191 | earliest: typing.Optional[datetime.datetime]=None, 192 | latest: typing.Optional[datetime.datetime]=None) -> col_abc.Iterable[HistoryRecordProtocol]: 193 | """ 194 | Iterates history records for this profile. 195 | 196 | :param url: a URL to search for. This can be one of: a single string; a collection of strings; 197 | a regex pattern; a function that takes a string (each host) and returns a bool; or None (the 198 | default) in which case all hosts are considered. 199 | :param earliest: an optional datetime which will be used to exclude records before this date. 200 | NB the date should be UTC to match the database. If None, no lower limit will be placed on 201 | timestamps. 202 | :param latest: an optional datetime which will be used to exclude records after this date. 203 | NB the date should be UTC to match the database. If None, no upper limit will be placed on 204 | timestamps. 205 | """ 206 | # TODO typehint return type once it's also abstracted 207 | raise NotImplementedError() 208 | 209 | def iterate_cache( 210 | self, 211 | url: typing.Optional[KeySearch]=None, *, decompress=True, omit_cached_data=False, 212 | **kwargs: typing.Union[bool, KeySearch]) -> col_abc.Iterable[CacheRecordProtocol]: 213 | """ 214 | Iterates cache records for this profile. 215 | 216 | :param url: a URL to search for. This can be one of: a single string; a collection of strings; 217 | a regex pattern; a function that takes a string (each host) and returns a bool; or None (the 218 | default) in which case all records are considered. 219 | :param decompress: if True (the default), data from the cache which is compressed (as per the 220 | content-encoding header field) will be decompressed when read if the compression format is 221 | supported (currently deflate, gzip and brotli are supported). 222 | :param omit_cached_data: does not collect the cached data and omits it from each `CacheResult` 223 | object. Should be faster in cases when only metadata recovery is required. 224 | :param kwargs: further keyword arguments are used to search based upon header fields. The 225 | keyword should be the header field name, with underscores replacing hyphens (e.g., 226 | content-encoding, becomes content_encoding). The value should be one of: a Boolean (in which 227 | case only records with this field present will be included if True, and vice versa); a single 228 | string; a collection of strings; a regex pattern; a function that takes a string (the value) 229 | and returns a bool. 230 | """ 231 | raise NotImplementedError() 232 | 233 | def iter_downloads( 234 | self, *, download_url: typing.Optional[KeySearch]=None, 235 | tab_url: typing.Optional[KeySearch]=None) -> col_abc.Iterable[DownloadRecordProtocol]: 236 | """ 237 | Iterates download records for this profile 238 | 239 | :param download_url: A URL related to the downloaded resource. This can be one of: a single string; 240 | a collection of strings; a regex pattern; a function that takes a string (each host) and returns a bool; 241 | or None (the default) in which case all records are considered. 242 | :param tab_url: A URL related to the page the user was accessing when this download was started. 243 | This can be one of: a single string; a collection of strings; a regex pattern; a function that takes 244 | a string (each host) and returns a bool; or None (the default) in which case all records are considered. 245 | """ 246 | raise NotImplementedError() 247 | 248 | @property 249 | def path(self) -> pathlib.Path: 250 | """The input path of this browser profile""" 251 | raise NotImplementedError() 252 | 253 | @property 254 | def local_storage(self): 255 | """The local storage object for this browser profile""" 256 | raise NotImplementedError() 257 | 258 | @property 259 | def session_storage(self): 260 | """The session storage object for this browser profile""" 261 | raise NotImplementedError() 262 | 263 | @property 264 | def cache(self): 265 | """The cache for this browser profile""" 266 | raise NotImplementedError() 267 | 268 | @property 269 | def history(self): 270 | """The history for this browser profile""" 271 | raise NotImplementedError() 272 | 273 | @property 274 | def browser_type(self) -> str: 275 | """The name of the browser type for this profile""" 276 | raise NotImplementedError() 277 | -------------------------------------------------------------------------------- /ccl_chromium_reader/serialization_formats/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/cclgroupltd/ccl_chromium_reader/552516720761397c4d482908b6b8b08130b313a1/ccl_chromium_reader/serialization_formats/__init__.py -------------------------------------------------------------------------------- /ccl_chromium_reader/serialization_formats/ccl_blink_value_deserializer.py: -------------------------------------------------------------------------------- 1 | """ 2 | Copyright 2020, CCL Forensics 3 | 4 | Permission is hereby granted, free of charge, to any person obtaining a copy of 5 | this software and associated documentation files (the "Software"), to deal in 6 | the Software without restriction, including without limitation the rights to 7 | use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies 8 | of the Software, and to permit persons to whom the Software is furnished to do 9 | so, subject to the following conditions: 10 | 11 | The above copyright notice and this permission notice shall be included in all 12 | copies or substantial portions of the Software. 13 | 14 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 15 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 16 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 17 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 18 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 19 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 20 | SOFTWARE. 21 | """ 22 | 23 | import sys 24 | import enum 25 | import typing 26 | from dataclasses import dataclass 27 | 28 | from . import ccl_v8_value_deserializer 29 | 30 | # See: https://chromium.googlesource.com/chromium/src/third_party/+/master/blink/renderer/bindings/core/v8/serialization 31 | # https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/renderer/bindings/modules/v8/serialization/v8_script_value_serializer_for_modules.cc 32 | 33 | 34 | # WebCoreStrings are read as (length:uint32_t, string:UTF8[length]). 35 | # RawStrings are read as (length:uint32_t, string:UTF8[length]). 36 | # RawUCharStrings are read as 37 | # (length:uint32_t, string:UChar[length/sizeof(UChar)]). 38 | # RawFiles are read as 39 | # (path:WebCoreString, url:WebCoreStrng, type:WebCoreString). 40 | # There is a reference table that maps object references (uint32_t) to 41 | # v8::Values. 42 | # Tokens marked with (ref) are inserted into the reference table and given the 43 | # next object reference ID after decoding. 44 | # All tags except InvalidTag, PaddingTag, ReferenceCountTag, VersionTag, 45 | # GenerateFreshObjectTag and GenerateFreshArrayTag push their results to the 46 | # deserialization stack. 47 | # There is also an 'open' stack that is used to resolve circular references. 48 | # Objects or arrays may contain self-references. Before we begin to deserialize 49 | # the contents of these values, they are first given object reference IDs (by 50 | # GenerateFreshObjectTag/GenerateFreshArrayTag); these reference IDs are then 51 | # used with ObjectReferenceTag to tie the recursive knot. 52 | 53 | __version__ = "0.3" 54 | __description__ = "Partial reimplementation of the Blink Javascript Object Serialization" 55 | __contact__ = "Alex Caithness" 56 | 57 | __DEBUG = False 58 | 59 | 60 | def log(msg, debug_only=True): 61 | if __DEBUG or not debug_only: 62 | caller_name = sys._getframe(1).f_code.co_name 63 | caller_line = sys._getframe(1).f_code.co_firstlineno 64 | print(f"{caller_name} ({caller_line}):\t{msg}") 65 | 66 | 67 | class BlobIndexType(enum.Enum): 68 | Blob = 0 69 | File = 1 70 | 71 | 72 | @dataclass 73 | class BlobIndex: 74 | index_type: BlobIndexType 75 | index_id: int 76 | 77 | 78 | @dataclass(frozen=True) 79 | class NativeFileHandle: 80 | is_dir: bool 81 | name: str 82 | token_index: int 83 | 84 | 85 | @dataclass(frozen=True) 86 | class CryptoKey: 87 | sub_type: "V8CryptoKeySubType" 88 | algorithm_type: typing.Optional["V8CryptoKeyAlgorithm"] 89 | hash_type: typing.Optional["V8CryptoKeyAlgorithm"] 90 | asymmetric_key_type: typing.Optional["V8AsymmetricCryptoKeyType"] 91 | byte_length: typing.Optional[int] 92 | public_exponent: typing.Optional[bytes] 93 | named_curve_type: typing.Optional["V8CryptoNamedCurve"] 94 | key_usage: "V8CryptoKeyUsage" 95 | key_data: bytes 96 | 97 | 98 | class Constants: 99 | tag_kMessagePortTag = b"M" # index:int -> MessagePort. Fills the result with 100 | # transferred MessagePort. 101 | tag_kMojoHandleTag = b"h" # index:int -> MojoHandle. Fills the result with 102 | # transferred MojoHandle. 103 | tag_kBlobTag = b"b" # uuid:WebCoreString, type:WebCoreString, size:uint64_t -> 104 | # Blob (ref) 105 | tag_kBlobIndexTag = b"i" # index:int32_t -> Blob (ref) 106 | tag_kFileTag = b"f" # file:RawFile -> File (ref) 107 | tag_kFileIndexTag = b"e" # index:int32_t -> File (ref) 108 | tag_kDOMFileSystemTag = b"d" # type : int32_t, name:WebCoreString, 109 | # uuid:WebCoreString -> FileSystem (ref) 110 | tag_kNativeFileSystemFileHandleTag = b"n" # name:WebCoreString, index:uint32_t 111 | # -> NativeFileSystemFileHandle (ref) 112 | tag_kNativeFileSystemDirectoryHandleTag = b"N" # name:WebCoreString, index:uint32_t -> 113 | # NativeFileSystemDirectoryHandle (ref) 114 | tag_kFileListTag = b"l" # length:uint32_t, files:RawFile[length] -> FileList (ref) 115 | tag_kFileListIndexTag = b"L" # length:uint32_t, files:int32_t[length] -> FileList (ref) 116 | tag_kImageDataTag = b"#" # tags terminated by ImageSerializationTag::kEnd (see 117 | # SerializedColorParams.h), width:uint32_t, 118 | # height:uint32_t, pixelDataLength:uint64_t, 119 | # data:byte[pixelDataLength] 120 | # -> ImageData (ref) 121 | tag_kImageBitmapTag = b"g" # tags terminated by ImageSerializationTag::kEnd (see 122 | # SerializedColorParams.h), width:uint32_t, 123 | # height:uint32_t, pixelDataLength:uint32_t, 124 | # data:byte[pixelDataLength] 125 | # -> ImageBitmap (ref) 126 | tag_kImageBitmapTransferTag = b"G" # index:uint32_t -> ImageBitmap. For ImageBitmap transfer 127 | tag_kOffscreenCanvasTransferTag = b"H" # index, width, height, id, 128 | # filter_quality::uint32_t -> 129 | # OffscreenCanvas. For OffscreenCanvas 130 | # transfer 131 | tag_kReadableStreamTransferTag = b"r" # index:uint32_t 132 | tag_kTransformStreamTransferTag = b"m" # index:uint32_t 133 | tag_kWritableStreamTransferTag = b"w" # index:uint32_t 134 | tag_kDOMPointTag = b"Q" # x:Double, y:Double, z:Double, w:Double 135 | tag_kDOMPointReadOnlyTag = b"W" # x:Double, y:Double, z:Double, w:Double 136 | tag_kDOMRectTag = b"E" # x:Double, y:Double, width:Double, height:Double 137 | tag_kDOMRectReadOnlyTag = b"R" # x:Double, y:Double, width:Double, height:Double 138 | tag_kDOMQuadTag = b"T" # p1:Double, p2:Double, p3:Double, p4:Double 139 | tag_kDOMMatrixTag = b"Y" # m11..m44: 16 Double 140 | tag_kDOMMatrixReadOnlyTag = b"U" # m11..m44: 16 Double 141 | tag_kDOMMatrix2DTag = b"I" # a..f: 6 Double 142 | tag_kDOMMatrix2DReadOnlyTag = b"O" # a..f: 6 Double 143 | tag_kCryptoKeyTag = b"K" # subtag:byte, props, usages:uint32_t, 144 | # keyDataLength:uint32_t, keyData:byte[keyDataLength] 145 | # If subtag=AesKeyTag: 146 | # props = keyLengthBytes:uint32_t, algorithmId:uint32_t 147 | # If subtag=HmacKeyTag: 148 | # props = keyLengthBytes:uint32_t, hashId:uint32_t 149 | # If subtag=RsaHashedKeyTag: 150 | # props = algorithmId:uint32_t, type:uint32_t, 151 | # modulusLengthBits:uint32_t, 152 | # publicExponentLength:uint32_t, 153 | # publicExponent:byte[publicExponentLength], 154 | # hashId:uint32_t 155 | # If subtag=EcKeyTag: 156 | # props = algorithmId:uint32_t, type:uint32_t, 157 | # namedCurve:uint32_t 158 | tag_kRTCCertificateTag = b"k" # length:uint32_t, pemPrivateKey:WebCoreString, 159 | # pemCertificate:WebCoreString 160 | tag_kRTCEncodedAudioFrameTag = b"A" # uint32_t -> transferred audio frame ID 161 | tag_kRTCEncodedVideoFrameTag = b"V" # uint32_t -> transferred video frame ID 162 | tag_kVideoFrameTag = b"v" # uint32_t -> transferred video frame ID 163 | 164 | # The following tags were used by the Shape Detection API implementation 165 | # between M71 and M81. During these milestones, the API was always behind 166 | # a flag. Usage was removed in https:#crrev.com/c/2040378. 167 | tag_kDeprecatedDetectedBarcodeTag = b"B" 168 | tag_kDeprecatedDetectedFaceTag = b"F" 169 | tag_kDeprecatedDetectedTextTag = b"t" 170 | 171 | tag_kDOMExceptionTag = b"x" # name:String,message:String,stack:String 172 | tag_kVersionTag = b"\xff" # version:uint32_t -> Uses this as the file version. 173 | tag_kTrailerOffsetTag = b"\xfe" # offset:uint64_t (fixed width, network order) from buffer, start size:uint32_t (fixed width, network order) 174 | tag_kTrailerRequiresInterfacesTag = b"\xA0" 175 | 176 | 177 | class V8CryptoKeySubType(enum.IntEnum): 178 | """ 179 | See: third_party/blink/renderer/bindings/modules/v8/serialization/web_crypto_sub_tags.h 180 | Used by the kCryptoKeyTag type 181 | """ 182 | AesKey = 1 183 | HmacKey = 2 184 | # ID 3 was used by RsaKeyTag, while still behind experimental flag. 185 | RsaHashedKey = 4 186 | EcKey = 5 187 | NoParamsKey = 6 188 | 189 | 190 | class V8CryptoKeyAlgorithm(enum.IntEnum): 191 | """ 192 | See: third_party/blink/renderer/bindings/modules/v8/serialization/web_crypto_sub_tags.h 193 | Used by the kCryptoKeyTag type 194 | """ 195 | AesCbcTag = 1 196 | HmacTag = 2 197 | RsaSsaPkcs1v1_5Tag = 3 198 | # ID 4 was used by RsaEs, while still behind experimental flag. 199 | Sha1Tag = 5 200 | Sha256Tag = 6 201 | Sha384Tag = 7 202 | Sha512Tag = 8 203 | AesGcmTag = 9 204 | RsaOaepTag = 10 205 | AesCtrTag = 11 206 | AesKwTag = 12 207 | RsaPssTag = 13 208 | EcdsaTag = 14 209 | EcdhTag = 15 210 | HkdfTag = 16 211 | Pbkdf2Tag = 17 212 | 213 | 214 | class V8AsymmetricCryptoKeyType(enum.IntEnum): 215 | Public = 1 216 | Private = 2 217 | 218 | 219 | class V8CryptoNamedCurve(enum.IntEnum): 220 | """ 221 | See: third_party/blink/renderer/bindings/modules/v8/serialization/web_crypto_sub_tags.h 222 | Used by the kCryptoKeyTag type 223 | """ 224 | P256 = 1 225 | P384 = 2 226 | P521 = 3 227 | 228 | 229 | class V8CryptoKeyUsage(enum.IntFlag): 230 | """ 231 | See: third_party/blink/renderer/bindings/modules/v8/serialization/web_crypto_sub_tags.h 232 | Used by the kCryptoKeyTag type 233 | """ 234 | kExtractableUsage = 1 << 0 235 | kEncryptUsage = 1 << 1 236 | kDecryptUsage = 1 << 2 237 | kSignUsage = 1 << 3 238 | kVerifyUsage = 1 << 4 239 | kDeriveKeyUsage = 1 << 5 240 | kWrapKeyUsage = 1 << 6 241 | kUnwrapKeyUsage = 1 << 7 242 | kDeriveBitsUsage = 1 << 8 243 | 244 | 245 | class BlinkV8Deserializer: 246 | def _read_varint(self, stream) -> int: 247 | return ccl_v8_value_deserializer.read_le_varint(stream)[0] 248 | 249 | def _read_varint32(self, stream) -> int: 250 | return ccl_v8_value_deserializer.read_le_varint(stream, is_32bit=True)[0] 251 | 252 | def _read_utf8_string(self, stream: typing.BinaryIO) -> str: 253 | length = self._read_varint32(stream) 254 | raw_string = stream.read(length) 255 | if len(raw_string) != length: 256 | raise ValueError("Could not read all of the utf-8 data") 257 | return raw_string.decode("utf-8") 258 | 259 | # def _read_uint32(self, stream: typing.BinaryIO) -> int: 260 | # raw = stream.read(4) 261 | # if len(raw) < 4: 262 | # raise ValueError("Could not read enough data when reading int32") 263 | # return struct.unpack(" BlobIndex: 266 | return BlobIndex(BlobIndexType.File, self._read_varint(stream)) 267 | 268 | def _read_blob_index(self, stream: typing.BinaryIO) -> BlobIndex: 269 | return BlobIndex(BlobIndexType.Blob, self._read_varint(stream)) 270 | 271 | def _read_file_list_index(self, stream: typing.BinaryIO) -> typing.Iterable[BlobIndex]: 272 | length = self._read_varint(stream) 273 | result = [self._read_file_index(stream) for _ in range(length)] 274 | return result 275 | 276 | def _read_native_file_handle(self, is_dir: bool, stream: typing.BinaryIO) -> NativeFileHandle: 277 | return NativeFileHandle(is_dir, self._read_utf8_string(stream), self._read_varint(stream)) 278 | 279 | def _read_crypto_key(self, stream: typing.BinaryIO): 280 | sub_type = V8CryptoKeySubType(stream.read(1)[0]) 281 | 282 | if sub_type == V8CryptoKeySubType.AesKey: 283 | algorithm_id = V8CryptoKeyAlgorithm(self._read_varint32(stream)) 284 | byte_length = self._read_varint32(stream) 285 | params = { 286 | "algorithm_type": algorithm_id, 287 | "byte_length": byte_length, 288 | "hash_type": None, 289 | "named_curve_type": None, 290 | "asymmetric_key_type": None, 291 | "public_exponent": None 292 | } 293 | elif sub_type == V8CryptoKeySubType.HmacKey: 294 | byte_length = self._read_varint32(stream) 295 | hash_id = V8CryptoKeyAlgorithm(self._read_varint32(stream)) 296 | params = { 297 | "byte_length": byte_length, 298 | "hash_type": hash_id, 299 | "algorithm_type": None, 300 | "named_curve_type": None, 301 | "asymmetric_key_type": None, 302 | "public_exponent": None 303 | } 304 | elif sub_type == V8CryptoKeySubType.RsaHashedKey: 305 | algorithm_id = V8CryptoKeyAlgorithm(self._read_varint32(stream)) 306 | asymmetric_key_type = V8AsymmetricCryptoKeyType(stream.read(1)[0]) 307 | length_bytes = self._read_varint32(stream) 308 | public_exponent_length = self._read_varint32(stream) 309 | public_exponent = stream.read(public_exponent_length) 310 | if len(public_exponent) != public_exponent_length: 311 | raise ValueError(f"Could not read all of public exponent data") 312 | hash_id = V8CryptoKeyAlgorithm(self._read_varint32(stream)) 313 | params = { 314 | "algorithm_type": algorithm_id, 315 | "asymmetric_key_type": asymmetric_key_type, 316 | "byte_length": length_bytes, 317 | "public_exponent": public_exponent, 318 | "hash_type": hash_id, 319 | "named_curve_type": None 320 | } 321 | 322 | elif sub_type == V8CryptoKeySubType.EcKey: 323 | algorithm_id = V8CryptoKeyAlgorithm(self._read_varint32(stream)) 324 | asymmetric_key_type = V8AsymmetricCryptoKeyType(stream.read(1)[0]) 325 | named_curve = V8CryptoNamedCurve(self._read_varint32(stream)) 326 | params = { 327 | "algorithm_type": algorithm_id, 328 | "asymmetric_key_type": asymmetric_key_type, 329 | "named_curve_type": named_curve, 330 | "hash_type": None, 331 | "byte_length": None, 332 | "public_exponent": None 333 | } 334 | elif sub_type == V8CryptoKeySubType.NoParamsKey: 335 | algorithm_id = V8CryptoKeyAlgorithm(self._read_varint32(stream)) 336 | params = { 337 | "algorithm_type": algorithm_id, 338 | "hash_type": None, 339 | "asymmetric_key_type": None, 340 | "byte_length": None, 341 | "named_curve_type": None, 342 | "public_exponent": None 343 | } 344 | else: 345 | raise ValueError(f"Unknown V8CryptoKeySubType {sub_type}") 346 | 347 | params["key_usage"] = V8CryptoKeyUsage(self._read_varint32(stream)) 348 | key_length = self._read_varint32(stream) 349 | key_data = stream.read(key_length) 350 | if len(key_data) < key_length: 351 | raise ValueError("Could not read all key data") 352 | 353 | params["key_data"] = key_data 354 | return CryptoKey(sub_type, **params) 355 | 356 | def _not_implemented(self, stream): 357 | raise NotImplementedError() 358 | 359 | def read(self, stream: typing.BinaryIO) -> typing.Any: 360 | tag = stream.read(1) 361 | 362 | func = { 363 | Constants.tag_kMessagePortTag: lambda x: self._not_implemented(x), 364 | Constants.tag_kMojoHandleTag: lambda x: self._not_implemented(x), 365 | Constants.tag_kBlobTag: lambda x: self._not_implemented(x), 366 | Constants.tag_kBlobIndexTag: lambda x: self._read_blob_index(x), 367 | Constants.tag_kFileTag: lambda x: self._not_implemented(x), 368 | Constants.tag_kFileIndexTag: lambda x: self._read_file_index(x), 369 | Constants.tag_kDOMFileSystemTag: lambda x: self._not_implemented(x), 370 | Constants.tag_kNativeFileSystemFileHandleTag: lambda x: self._read_native_file_handle(False, x), 371 | Constants.tag_kNativeFileSystemDirectoryHandleTag: lambda x: self._read_native_file_handle(True, x), 372 | Constants.tag_kFileListTag: lambda x: self._not_implemented(x), 373 | Constants.tag_kFileListIndexTag: lambda x: self._read_file_list_index(x), 374 | Constants.tag_kImageDataTag: lambda x: self._not_implemented(x), 375 | Constants.tag_kImageBitmapTag: lambda x: self._not_implemented(x), 376 | Constants.tag_kImageBitmapTransferTag: lambda x: self._not_implemented(x), 377 | Constants.tag_kOffscreenCanvasTransferTag: lambda x: self._not_implemented(x), 378 | Constants.tag_kReadableStreamTransferTag: lambda x: self._not_implemented(x), 379 | Constants.tag_kTransformStreamTransferTag: lambda x: self._not_implemented(x), 380 | Constants.tag_kWritableStreamTransferTag: lambda x: self._not_implemented(x), 381 | Constants.tag_kDOMPointTag: lambda x: self._not_implemented(x), 382 | Constants.tag_kDOMPointReadOnlyTag: lambda x: self._not_implemented(x), 383 | Constants.tag_kDOMRectTag: lambda x: self._not_implemented(x), 384 | Constants.tag_kDOMRectReadOnlyTag: lambda x: self._not_implemented(x), 385 | Constants.tag_kDOMQuadTag: lambda x: self._not_implemented(x), 386 | Constants.tag_kDOMMatrixTag: lambda x: self._not_implemented(x), 387 | Constants.tag_kDOMMatrixReadOnlyTag: lambda x: self._not_implemented(x), 388 | Constants.tag_kDOMMatrix2DTag: lambda x: self._not_implemented(x), 389 | Constants.tag_kDOMMatrix2DReadOnlyTag: lambda x: self._not_implemented(x), 390 | Constants.tag_kCryptoKeyTag: lambda x: self._read_crypto_key(x), 391 | Constants.tag_kRTCCertificateTag: lambda x: self._not_implemented(x), 392 | Constants.tag_kRTCEncodedAudioFrameTag: lambda x: self._not_implemented(x), 393 | Constants.tag_kRTCEncodedVideoFrameTag: lambda x: self._not_implemented(x), 394 | Constants.tag_kVideoFrameTag: lambda x: self._not_implemented(x), 395 | Constants.tag_kDOMExceptionTag: lambda x: self._not_implemented(x) 396 | }.get(tag) 397 | 398 | if func is None: 399 | raise ValueError(f"Unknown tag: {tag}") 400 | 401 | return func(stream) 402 | -------------------------------------------------------------------------------- /ccl_chromium_reader/serialization_formats/ccl_easy_chromium_pickle.py: -------------------------------------------------------------------------------- 1 | """ 2 | Copyright 2022, CCL Forensics 3 | 4 | Permission is hereby granted, free of charge, to any person obtaining a copy of 5 | this software and associated documentation files (the "Software"), to deal in 6 | the Software without restriction, including without limitation the rights to 7 | use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies 8 | of the Software, and to permit persons to whom the Software is furnished to do 9 | so, subject to the following conditions: 10 | 11 | The above copyright notice and this permission notice shall be included in all 12 | copies or substantial portions of the Software. 13 | 14 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 15 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 16 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 17 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 18 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 19 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 20 | SOFTWARE. 21 | """ 22 | 23 | import io 24 | import datetime 25 | import struct 26 | import os 27 | 28 | 29 | __version__ = "0.1" 30 | __description__ = "Module for reading Chromium Pickles." 31 | __contact__ = "Alex Caithness" 32 | 33 | 34 | class EasyPickleException(Exception): 35 | ... 36 | 37 | 38 | class EasyPickleIterator: 39 | """ 40 | A pythonic implementation of the PickleIterator object used in various places in Chrom(e|ium). 41 | """ 42 | def __init__(self, data: bytes, alignment: int=4): 43 | """ 44 | Takes a bytes buffer and wraps the EasyPickleIterator around it 45 | :param data: the data to be wrapped 46 | :param alignment: (optional) the number of bytes to align reads to (default: 4) 47 | """ 48 | self._f = io.BytesIO(data) 49 | self._alignment = alignment 50 | 51 | self._pickle_length = self.read_uint32() 52 | if len(data) != self._pickle_length + 4: 53 | raise EasyPickleException("pickle length invalid") 54 | 55 | def __enter__(self) -> "EasyPickleIterator": 56 | return self 57 | 58 | def __exit__(self, exc_type, exc_val, exc_tb): 59 | self.close() 60 | 61 | def close(self): 62 | self._f.close() 63 | 64 | def read_aligned(self, length: int) -> bytes: 65 | """ 66 | reads the number of bytes specified by the length parameter. Aligns the buffer afterwards if required. 67 | :param length: the length od data to be read 68 | :return: the data read (without the alignment padding) 69 | """ 70 | raw = self._f.read(length) 71 | if len(raw) != length: 72 | raise EasyPickleException(f"Tried to read {length} bytes but only got {len(raw)}") 73 | 74 | align_count = self._alignment - (length % self._alignment) 75 | if align_count != self._alignment: 76 | self._f.seek(align_count, os.SEEK_CUR) 77 | 78 | return raw 79 | 80 | def read_uint16(self) -> int: 81 | raw = self.read_aligned(2) 82 | return struct.unpack(" int: 85 | raw = self.read_aligned(4) 86 | return struct.unpack(" int: 89 | raw = self.read_aligned(8) 90 | return struct.unpack(" int: 93 | raw = self.read_aligned(2) 94 | return struct.unpack(" int: 97 | raw = self.read_aligned(4) 98 | return struct.unpack(" int: 101 | raw = self.read_aligned(8) 102 | return struct.unpack(" bool: 105 | raw = self.read_int32() 106 | if raw == 0: 107 | return False 108 | elif raw == 1: 109 | return True 110 | else: 111 | raise EasyPickleException("bools should only contain 0 or 1") 112 | 113 | def read_single(self) -> float: 114 | raw = self.read_aligned(4) 115 | return struct.unpack(" float: 118 | raw = self.read_aligned(8) 119 | return struct.unpack(" str: 122 | length = self.read_uint32() 123 | raw = self.read_aligned(length) 124 | return raw.decode("utf-8") 125 | 126 | def read_string16(self) -> str: 127 | length = self.read_uint32() * 2 # character count 128 | raw = self.read_aligned(length) 129 | return raw.decode("utf-16-le") 130 | 131 | def read_datetime(self) -> datetime.datetime: 132 | return datetime.datetime(1601, 1, 1) + datetime.timedelta(microseconds=self.read_uint64()) 133 | 134 | -------------------------------------------------------------------------------- /ccl_chromium_reader/serialization_formats/ccl_protobuff.py: -------------------------------------------------------------------------------- 1 | """ 2 | Copyright 2022, CCL Forensics 3 | 4 | Permission is hereby granted, free of charge, to any person obtaining a copy of 5 | this software and associated documentation files (the "Software"), to deal in 6 | the Software without restriction, including without limitation the rights to 7 | use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies 8 | of the Software, and to permit persons to whom the Software is furnished to do 9 | so, subject to the following conditions: 10 | 11 | The above copyright notice and this permission notice shall be included in all 12 | copies or substantial portions of the Software. 13 | 14 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 15 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 16 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 17 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 18 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 19 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 20 | SOFTWARE. 21 | """ 22 | 23 | import sys 24 | import struct 25 | import io 26 | import typing 27 | 28 | __version__ = "0.8" 29 | __description__ = "Module for naive parsing of Protocol Buffers" 30 | __contact__ = "Alex Caithness" 31 | 32 | DEBUG = False 33 | 34 | 35 | class Empty: 36 | value = None 37 | 38 | 39 | class ProtoObject: 40 | def __init__(self, tag, name, value): 41 | self.tag = tag 42 | self.name = name 43 | self.value = value 44 | self.wire = tag & 0x07 45 | 46 | def __str__(self): 47 | if self.name: 48 | return "{0} ({1}): {2}".format( 49 | self.tag if self.tag > 0x7f else hex(self.tag), self.name, repr(self.value)) 50 | else: 51 | return "{0}: {1}".format( 52 | self.tag if self.tag > 0x7f else hex(self.tag), repr(self.value)) 53 | __repr__ = __str__ 54 | 55 | @property 56 | def friendly_tag(self) -> int: 57 | """:return the "real" tag (i.e. the one that would be seen inside the .proto schema)""" 58 | return self.tag >> 3 59 | 60 | def get_items_by_tag(self, tag_id: int) -> list[typing.Any]: 61 | """ 62 | :param tag_id: the tag id for the child items 63 | :return: list of child items with this tag number 64 | """ 65 | if not isinstance(self.value, list): 66 | raise ValueError("This object does not support child items") 67 | if not isinstance(tag_id, int): 68 | raise TypeError("Expected type: int; actual type: {0}".format(type(tag_id))) 69 | return [x for x in self.value if x.tag == tag_id] 70 | 71 | def get_items_by_name(self, name: str) -> list[typing.Any]: 72 | """ 73 | :param name: the field name for the child items 74 | :return: list of child items with this name 75 | """ 76 | if not isinstance(self.value, list): 77 | raise ValueError("This object does not support child items") 78 | if not isinstance(name, str): 79 | raise TypeError("Expected type: str; actual type: {0}".format(type(name))) 80 | return [x for x in self.value if x.name == name] 81 | 82 | def only(self, name: str, default=Empty): 83 | """ 84 | Returns a single item which matches the name parameter. Use this to streamline getting non-repeating items 85 | :param name: the name of the child item 86 | :param default: optional: the value to return if the item is not present (default: None) 87 | :return: the single child item 88 | :exception: ValueError: if there is more than one child item which matches this name 89 | """ 90 | got = self.get_items_by_name(name) 91 | if len(got) == 0: 92 | return default 93 | elif len(got) == 1: 94 | return got[0] 95 | else: 96 | raise ValueError("More than one value with this key") 97 | 98 | def __getitem__(self, key: typing.Union[str, int]) -> list[typing.Any]: 99 | if isinstance(key, str): 100 | return self.get_items_by_name(key) 101 | elif isinstance(key, int): 102 | return self.get_items_by_tag(key) 103 | else: 104 | raise TypeError("Key should be int or str; actual type: {0}".format(type(key))) 105 | 106 | def __len__(self) -> int: 107 | return self.value.__len__() 108 | 109 | def __iter__(self): 110 | if not isinstance(self.value, list): 111 | raise ValueError("This object does not support child items") 112 | else: 113 | yield from (x.tag for x in self.value) 114 | 115 | 116 | class ProtoDecoder: 117 | def __init__(self, object_name, func): 118 | self.func = func 119 | self.object_name = object_name 120 | 121 | def __call__(self, arg): 122 | return self.func(arg) 123 | 124 | 125 | def _read_le_varint(stream: typing.BinaryIO, is_32bit=False) -> typing.Optional[typing.Tuple[int, bytes]]: 126 | # this only outputs unsigned 127 | limit = 5 if is_32bit else 10 128 | i = 0 129 | result = 0 130 | underlying_bytes = [] 131 | while i < limit: # 64 bit max possible? 132 | raw = stream.read(1) 133 | if len(raw) < 1: 134 | return None 135 | tmp, = raw 136 | underlying_bytes.append(tmp) 137 | result |= ((tmp & 0x7f) << (i * 7)) 138 | if (tmp & 0x80) == 0: 139 | break 140 | i += 1 141 | return result, bytes(underlying_bytes) 142 | 143 | 144 | def read_le_varint(stream: typing.BinaryIO, is_32bit=False) -> typing.Optional[int]: 145 | x = _read_le_varint(stream, is_32bit) 146 | if x is None: 147 | return None 148 | else: 149 | return x[0] 150 | 151 | 152 | def read_le_varint32(stream: typing.BinaryIO) -> typing.Optional[int]: 153 | return read_le_varint(stream, True) 154 | 155 | 156 | def read_tag( 157 | stream: typing.BinaryIO, 158 | tag_mappings: dict[int, typing.Callable[[typing.BinaryIO], typing.Any]], 159 | log_out=sys.stderr, use_friendly_tag=False) -> typing.Optional[ProtoObject]: 160 | tag_id = read_le_varint(stream) 161 | if tag_id is None: 162 | return None 163 | decoder = tag_mappings.get(tag_id if not use_friendly_tag else tag_id >> 3) 164 | name = None 165 | if isinstance(decoder, ProtoDecoder): 166 | name = decoder.object_name 167 | 168 | available_wirebytes = io.BytesIO(_get_bytes_for_wiretype(tag_id, stream)) 169 | 170 | tag_value = decoder(available_wirebytes) if decoder else _fallback_decode( 171 | tag_id, available_wirebytes, log_out) 172 | 173 | return ProtoObject(tag_id, name, tag_value) 174 | 175 | 176 | def read_protobuff( 177 | stream: typing.BinaryIO, 178 | tag_mappings: dict [int, typing.Callable[[typing.BinaryIO], typing.Any]], 179 | use_friendly_tag=False) -> list[ProtoObject]: 180 | result = [] 181 | while True: 182 | tag = read_tag(stream, tag_mappings, use_friendly_tag=use_friendly_tag) 183 | if tag is None: 184 | break 185 | result.append(tag) 186 | 187 | return result 188 | 189 | 190 | def read_blob(stream: typing.BinaryIO) -> bytes: 191 | blob_length = read_le_varint(stream) 192 | blob = stream.read(blob_length) 193 | return blob 194 | 195 | 196 | def read_string(stream: typing.BinaryIO) -> str: 197 | raw_string = read_blob(stream) 198 | string = raw_string.decode("utf-8") 199 | return string 200 | 201 | 202 | def read_double(stream: typing.BinaryIO) -> float: 203 | return struct.unpack(" int: 207 | return struct.unpack(" int: 211 | return struct.unpack(" list[ProtoObject]: 215 | blob_blob = read_blob(stream) 216 | blob_stream = io.BytesIO(blob_blob) 217 | return read_protobuff(blob_stream, mappings, use_friendly_tag) 218 | 219 | 220 | def read_fixed_blob(stream: typing.BinaryIO, length: int) -> bytes: 221 | data = stream.read(length) 222 | if len(data) != length: 223 | raise ValueError("Couldn't read enough data") 224 | return data 225 | 226 | 227 | _fallback_wire_types = { 228 | 0: read_le_varint, 229 | 1: lambda x: read_fixed_blob(x, 8), 230 | 2: read_blob, 231 | 5: lambda x: read_fixed_blob(x, 4) 232 | } 233 | 234 | _wire_type_friendly_names = { 235 | 0: "Varint", 236 | 1: "64-Bit", 237 | 2: "Length Delimited", 238 | 5: "32-Bit" 239 | } 240 | 241 | 242 | def _get_bytes_for_wiretype(tag_id: int, stream: typing.BinaryIO): 243 | wire_type = tag_id & 0x07 244 | if wire_type == 0: 245 | read_bytes = [] 246 | for i in range(10): 247 | x = stream.read(1)[0] 248 | read_bytes.append(x) 249 | if x & 0x80 == 0: 250 | break 251 | buffer = bytes(read_bytes) 252 | elif wire_type == 1: 253 | buffer = stream.read(8) 254 | elif wire_type == 2: 255 | l, b = _read_le_varint(stream) 256 | available_bytes = stream.read(l) 257 | if len(available_bytes) < l: 258 | raise ValueError("Stream too short") 259 | buffer = b + available_bytes 260 | elif wire_type == 5: 261 | buffer = stream.read(4) 262 | else: 263 | raise ValueError("Invalid wiretype") 264 | 265 | return buffer 266 | 267 | 268 | def _fallback_decode(tag_id, stream, log): 269 | fallback_func = _fallback_wire_types.get(tag_id & 0x07) 270 | if not fallback_func: 271 | raise ValueError("No appropriate fallback function for tag {0} (wire type {1})".format( 272 | tag_id, tag_id & 0x07)) 273 | if DEBUG: 274 | log.write("Tag {0} ({1}) not defined, using fallback decoding.\n".format( 275 | tag_id if tag_id > 0x7f else hex(tag_id), _wire_type_friendly_names[tag_id & 0x07])) 276 | return fallback_func(stream) 277 | -------------------------------------------------------------------------------- /ccl_chromium_reader/storage_formats/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/cclgroupltd/ccl_chromium_reader/552516720761397c4d482908b6b8b08130b313a1/ccl_chromium_reader/storage_formats/__init__.py -------------------------------------------------------------------------------- /pyproject.toml: -------------------------------------------------------------------------------- 1 | [build-system] 2 | requires = ["setuptools>=61.0"] 3 | build-backend = "setuptools.build_meta" 4 | 5 | [tool.setuptools] 6 | packages = [ 7 | "ccl_chromium_reader", 8 | "ccl_chromium_reader.serialization_formats", 9 | "ccl_chromium_reader.storage_formats" 10 | ] 11 | 12 | [project] 13 | name = "ccl_chromium_reader" 14 | version = "0.3.14" 15 | authors = [ 16 | { name="Alex Caithness", email="research@cclsolutionsgroup.com" }, 17 | ] 18 | description = "(Sometimes partial) Python re-implementations of the technologies involved in reading various data sources in Chrome-esque applications." 19 | readme = "README.md" 20 | requires-python = ">=3.10" 21 | classifiers = [ 22 | "Programming Language :: Python :: 3", 23 | "License :: OSI Approved :: MIT License", 24 | "Operating System :: OS Independent", 25 | "Development Status :: 4 - Beta", 26 | ] 27 | keywords = ["digital forensics", "dfir", "chrome", "chromium", "browser"] 28 | dependencies = [ 29 | "Brotli", 30 | "ccl_simplesnappy @ git+https://github.com/cclgroupltd/ccl_simplesnappy.git" 31 | ] 32 | 33 | [project.urls] 34 | Homepage = "https://github.com/cclgroupltd/ccl_chromium_reader" 35 | Issues = "https://github.com/cclgroupltd/ccl_chromium_reader/issues" -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/cclgroupltd/ccl_chromium_reader/552516720761397c4d482908b6b8b08130b313a1/requirements.txt -------------------------------------------------------------------------------- /tools_and_utilities/Chromium_dump_local_storage.py: -------------------------------------------------------------------------------- 1 | """ 2 | Copyright 2021, CCL Forensics 3 | Permission is hereby granted, free of charge, to any person obtaining a copy of 4 | this software and associated documentation files (the "Software"), to deal in 5 | the Software without restriction, including without limitation the rights to 6 | use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies 7 | of the Software, and to permit persons to whom the Software is furnished to do 8 | so, subject to the following conditions: 9 | 10 | The above copyright notice and this permission notice shall be included in all 11 | copies or substantial portions of the Software. 12 | 13 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 14 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 15 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 16 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 17 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 18 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 19 | SOFTWARE. 20 | """ 21 | 22 | import sys 23 | import pathlib 24 | import datetime 25 | import sqlite3 26 | from ccl_chromium_reader import ccl_chromium_localstorage 27 | 28 | __version__ = "0.1" 29 | __description__ = "Dumps a Chromium localstorage leveldb to sqlite for review" 30 | __contact__ = "Alex Caithness" 31 | 32 | DB_SCHEMA = """ 33 | CREATE TABLE storage_keys ("_id" INTEGER PRIMARY KEY AUTOINCREMENT, "storage_key" TEXT); 34 | CREATE TABLE batches ("start_ldbseq" INTEGER PRIMARY KEY, 35 | "end_ldbseq" INTEGER, 36 | "storage_key" INTEGER, 37 | "timestamp" INTEGER); 38 | CREATE TABLE records ("_id" INTEGER PRIMARY KEY AUTOINCREMENT, 39 | "storage_key" INTEGER, 40 | "key" TEXT, 41 | "value" TEXT, 42 | "batch" INTEGER, 43 | "ldbseq" INTEGER); 44 | CREATE INDEX "storage_keys_storage_key" ON "storage_keys" ("storage_key"); 45 | 46 | CREATE VIEW "records_view" AS 47 | SELECT 48 | storage_keys.storage_key AS "storage_key", 49 | records."key" AS "key", 50 | records.value AS "value", 51 | datetime(batches."timestamp", 'unixepoch') AS "batch_timestamp", 52 | records.ldbseq AS "ldbseq" 53 | FROM records 54 | INNER JOIN storage_keys ON records.storage_key = storage_keys._id 55 | INNER JOIN batches ON records.batch = batches.start_ldbseq 56 | ORDER BY records.ldbseq; 57 | """ 58 | 59 | INSERT_STORAGE_KEY_SQL = """INSERT INTO "storage_keys" ("storage_key") VALUES (?);""" 60 | INSERT_BATCH_SQL = """INSERT INTO "batches" ("start_ldbseq", "end_ldbseq", "storage_key", "timestamp") 61 | VALUES (?, ?, ?, ?);""" 62 | INSERT_RECORD_SQL = """INSERT INTO "records" ("storage_key", "key", "value", "batch", "ldbseq") 63 | VALUES (?, ?, ?, ?, ?);""" 64 | 65 | 66 | def main(args): 67 | level_db_in_dir = pathlib.Path(args[0]) 68 | db_out_path = pathlib.Path(args[1]) 69 | 70 | if db_out_path.exists(): 71 | raise ValueError("output database already exists") 72 | 73 | local_storage = ccl_chromium_localstorage.LocalStoreDb(level_db_in_dir) 74 | out_db = sqlite3.connect(db_out_path) 75 | out_db.executescript(DB_SCHEMA) 76 | cur = out_db.cursor() 77 | 78 | storage_keys_lookup = {} 79 | for storage_key in local_storage.iter_storage_keys(): 80 | cur.execute(INSERT_STORAGE_KEY_SQL, (storage_key,)) 81 | cur.execute("SELECT last_insert_rowid();") 82 | storage_key_id = cur.fetchone()[0] 83 | storage_keys_lookup[storage_key] = storage_key_id 84 | 85 | for batch in local_storage.iter_batches(): 86 | cur.execute( 87 | INSERT_BATCH_SQL, 88 | (batch.start, batch.end, storage_keys_lookup[batch.storage_key], 89 | batch.timestamp.replace(tzinfo=datetime.timezone.utc).timestamp())) 90 | 91 | for record in local_storage.iter_all_records(): 92 | batch = local_storage.find_batch(record.leveldb_seq_number) 93 | batch_id = batch.start if batch is not None else None 94 | cur.execute( 95 | INSERT_RECORD_SQL, 96 | ( 97 | storage_keys_lookup[record.storage_key], record.script_key, record.value, 98 | batch_id, record.leveldb_seq_number 99 | ) 100 | ) 101 | 102 | cur.close() 103 | out_db.commit() 104 | out_db.close() 105 | 106 | 107 | if __name__ == '__main__': 108 | if len(sys.argv) != 3: 109 | print(f"{pathlib.Path(sys.argv[0]).name} ") 110 | exit(1) 111 | main(sys.argv[1:]) 112 | -------------------------------------------------------------------------------- /tools_and_utilities/Chromium_dump_session_storage.py: -------------------------------------------------------------------------------- 1 | """ 2 | Copyright 2021, CCL Forensics 3 | Permission is hereby granted, free of charge, to any person obtaining a copy of 4 | this software and associated documentation files (the "Software"), to deal in 5 | the Software without restriction, including without limitation the rights to 6 | use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies 7 | of the Software, and to permit persons to whom the Software is furnished to do 8 | so, subject to the following conditions: 9 | 10 | The above copyright notice and this permission notice shall be included in all 11 | copies or substantial portions of the Software. 12 | 13 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 14 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 15 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 16 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 17 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 18 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 19 | SOFTWARE. 20 | """ 21 | 22 | import sys 23 | import pathlib 24 | import sqlite3 25 | from ccl_chromium_reader import ccl_chromium_sessionstorage 26 | 27 | __version__ = "0.1" 28 | __description__ = "Dumps a Chromium sessionstorage leveldb to sqlite for review" 29 | __contact__ = "Alex Caithness" 30 | 31 | DB_SCHEMA = """ 32 | CREATE TABLE "hosts" ("_id" INTEGER PRIMARY KEY AUTOINCREMENT, "host" TEXT); 33 | CREATE TABLE "guids" ("_id" INTEGER PRIMARY KEY AUTOINCREMENT, "guid" TEXT); 34 | CREATE TABLE "items" ("_id" INTEGER PRIMARY KEY AUTOINCREMENT, 35 | "host" INTEGER, 36 | "guid" INTEGER, 37 | "ldbseq" INTEGER, 38 | "key" TEXT, 39 | "value" TEXT); 40 | CREATE INDEX "item_host" ON "items" ("host"); 41 | CREATE INDEX "item_ldbseq" ON "items" ("ldbseq"); 42 | 43 | CREATE VIEW items_view AS 44 | SELECT "items"."ldbseq", 45 | "hosts"."host", 46 | "items"."key", 47 | "items"."value", 48 | "guids"."guid" 49 | FROM "items" 50 | LEFT JOIN "hosts" ON "items"."host" = "hosts"."_id" 51 | LEFT JOIN "guids" ON "items"."guid" = "guids"."_id" 52 | ORDER BY "items"."ldbseq"; 53 | """ 54 | 55 | INSERT_HOST_SQL = """INSERT INTO "hosts" ("host") VALUES (?);""" 56 | INSERT_ITEM_SQL = """INSERT INTO "items" (host, guid, ldbseq, key, value) VALUES (?, ?, ?, ?, ?);""" 57 | 58 | 59 | def main(args): 60 | level_db_in_dir = pathlib.Path(args[0]) 61 | db_out_path = pathlib.Path(args[1]) 62 | 63 | if db_out_path.exists(): 64 | raise ValueError("output database already exists") 65 | 66 | session_storage = ccl_chromium_sessionstorage.SessionStoreDb(level_db_in_dir) 67 | out_db = sqlite3.connect(db_out_path) 68 | out_db.executescript(DB_SCHEMA) 69 | cur = out_db.cursor() 70 | for host in session_storage.iter_hosts(): 71 | cur.execute(INSERT_HOST_SQL, (host,)) 72 | cur.execute("SELECT last_insert_rowid();") 73 | host_id = cur.fetchone()[0] 74 | host_kvs = session_storage.get_all_for_host(host) 75 | 76 | for key, values in host_kvs.items(): 77 | for value in values: 78 | cur.execute(INSERT_ITEM_SQL, (host_id, None, value.leveldb_sequence_number, key, value.value)) 79 | 80 | for key, value in session_storage.iter_orphans(): 81 | cur.execute(INSERT_ITEM_SQL, (None, None, value.leveldb_sequence_number, key, value.value)) 82 | 83 | cur.close() 84 | out_db.commit() 85 | out_db.close() 86 | 87 | 88 | if __name__ == '__main__': 89 | if len(sys.argv) != 3: 90 | print(f"{pathlib.Path(sys.argv[0]).name} ") 91 | exit(1) 92 | main(sys.argv[1:]) 93 | -------------------------------------------------------------------------------- /tools_and_utilities/benchmark.py: -------------------------------------------------------------------------------- 1 | import sys 2 | import pathlib 3 | from ccl_chromium_reader import ccl_chromium_indexeddb 4 | import time 5 | 6 | 7 | def main(args): 8 | start = time.time() 9 | ldb_path = pathlib.Path(args[0]) 10 | wrapper = ccl_chromium_indexeddb.WrappedIndexDB(ldb_path) 11 | 12 | for db_info in wrapper.database_ids: 13 | db = wrapper[db_info.dbid_no] 14 | print("------Database------") 15 | print(f"db_number={db.db_number}; name={db.name}; origin={db.origin}") 16 | print() 17 | print("\t---Object Stores---") 18 | for obj_store_name in db.object_store_names: 19 | obj_store = db[obj_store_name] 20 | print(f"\tobject_store_id={obj_store.object_store_id}; name={obj_store.name}") 21 | try: 22 | one_record = next(obj_store.iterate_records()) 23 | except StopIteration: 24 | one_record = None 25 | print() 26 | end = time.time() 27 | print("Elapsed time: {} seconds.".format(int(end-start))) 28 | 29 | 30 | if __name__ == '__main__': 31 | if len(sys.argv) < 2: 32 | print(f"USAGE: {pathlib.Path(sys.argv[0]).name} ") 33 | exit(1) 34 | 35 | main(sys.argv[1:]) 36 | -------------------------------------------------------------------------------- /tools_and_utilities/dump_indexeddb_details.py: -------------------------------------------------------------------------------- 1 | """ 2 | Copyright 2020-2024, CCL Forensics 3 | 4 | Permission is hereby granted, free of charge, to any person obtaining a copy of 5 | this software and associated documentation files (the "Software"), to deal in 6 | the Software without restriction, including without limitation the rights to 7 | use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies 8 | of the Software, and to permit persons to whom the Software is furnished to do 9 | so, subject to the following conditions: 10 | 11 | The above copyright notice and this permission notice shall be included in all 12 | copies or substantial portions of the Software. 13 | 14 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 15 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 16 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 17 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 18 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 19 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 20 | SOFTWARE. 21 | """ 22 | 23 | import sys 24 | import pathlib 25 | from ccl_chromium_reader import ccl_chromium_indexeddb 26 | 27 | 28 | def main(args): 29 | ldb_path = pathlib.Path(args[0]) 30 | wrapper = ccl_chromium_indexeddb.WrappedIndexDB(ldb_path) 31 | 32 | for db_info in wrapper.database_ids: 33 | db = wrapper[db_info.dbid_no] 34 | print("------Database------") 35 | print(f"db_number={db.db_number}; name={db.name}; origin={db.origin}") 36 | print() 37 | print("\t---Object Stores---") 38 | for obj_store_name in db.object_store_names: 39 | obj_store = db[obj_store_name] 40 | print(f"\tobject_store_id={obj_store.object_store_id}; name={obj_store.name}") 41 | try: 42 | one_record = next(obj_store.iterate_records()) 43 | except StopIteration: 44 | one_record = None 45 | if one_record is not None: 46 | print("\tExample record:") 47 | print(f"\tkey: {one_record.key}") 48 | print(f"\tvalue: {one_record.value}") 49 | else: 50 | print("\tNo records") 51 | print() 52 | print() 53 | 54 | 55 | if __name__ == '__main__': 56 | if len(sys.argv) < 2: 57 | print(f"USAGE: {pathlib.Path(sys.argv[0]).name} ") 58 | exit(1) 59 | main(sys.argv[1:]) 60 | -------------------------------------------------------------------------------- /tools_and_utilities/dump_leveldb.py: -------------------------------------------------------------------------------- 1 | import sys 2 | import csv 3 | from ccl_chromium_reader.storage_formats import ccl_leveldb 4 | import pathlib 5 | 6 | ENCODING = "iso-8859-1" 7 | 8 | 9 | def main(args): 10 | input_path = args[0] 11 | output_path = "leveldb_dump.csv" 12 | if len(args) > 1: 13 | output_path = args[1] 14 | 15 | leveldb_records = ccl_leveldb.RawLevelDb(input_path) 16 | 17 | with open(output_path, "w", encoding="utf-8", newline="") as file1: 18 | writes = csv.writer(file1, quoting=csv.QUOTE_ALL) 19 | writes.writerow( 20 | [ 21 | "key-hex", "key-text", "value-hex", "value-text", "origin_file", 22 | "file_type", "offset", "seq", "state", "was_compressed" 23 | ]) 24 | 25 | for record in leveldb_records.iterate_records_raw(): 26 | writes.writerow([ 27 | record.user_key.hex(" ", 1), 28 | record.user_key.decode(ENCODING, "replace"), 29 | record.value.hex(" ", 1), 30 | record.value.decode(ENCODING, "replace"), 31 | str(record.origin_file), 32 | record.file_type.name, 33 | record.offset, 34 | record.seq, 35 | record.state.name, 36 | record.was_compressed 37 | ]) 38 | 39 | 40 | if __name__ == '__main__': 41 | if len(sys.argv) < 2: 42 | print(f"Usage: {pathlib.Path(sys.argv[0]).name} [outpath.csv]") 43 | exit(1) 44 | print() 45 | print("+--------------------------------------------------------+") 46 | print("|Please note: keys and values in leveldb are binary blobs|") 47 | print("|so any text seen in the output of this script might not |") 48 | print("|represent the entire meaning of the data. The output of |") 49 | print("|this script should be considered as a preview of the |") 50 | print("|data only. |") 51 | print("+--------------------------------------------------------+") 52 | print() 53 | main(sys.argv[1:]) 54 | -------------------------------------------------------------------------------- /tools_and_utilities/extras/make_many_indexeddb_databases.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 115 | 116 | 117 | 118 |

119 | 120 | 121 | -------------------------------------------------------------------------------- /tools_and_utilities/extras/make_test_indexeddb.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 286 | 287 | 288 | 289 | 290 |

291 | 292 | 293 |

294 | 295 | 296 | -------------------------------------------------------------------------------- /tools_and_utilities/extras/make_webstorage.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 57 | 58 | 59 | 60 | 61 | 62 | 63 | 64 | 65 | 66 | 67 | 68 | --------------------------------------------------------------------------------