├── .gitignore
├── LICENSE
├── README.md
├── ccl_chromium_reader
    ├── __init__.py
    ├── ccl_chromium_cache.py
    ├── ccl_chromium_filesystem.py
    ├── ccl_chromium_history.py
    ├── ccl_chromium_indexeddb.py
    ├── ccl_chromium_localstorage.py
    ├── ccl_chromium_notifications.py
    ├── ccl_chromium_profile_folder.py
    ├── ccl_chromium_sessionstorage.py
    ├── ccl_chromium_snss2.py
    ├── ccl_shared_proto_db_downloads.py
    ├── common.py
    ├── download_common.py
    ├── profile_folder_protocols.py
    ├── serialization_formats
    │   ├── __init__.py
    │   ├── ccl_blink_value_deserializer.py
    │   ├── ccl_easy_chromium_pickle.py
    │   ├── ccl_protobuff.py
    │   └── ccl_v8_value_deserializer.py
    └── storage_formats
    │   ├── __init__.py
    │   └── ccl_leveldb.py
├── pyproject.toml
├── requirements.txt
└── tools_and_utilities
    ├── Chromium_dump_local_storage.py
    ├── Chromium_dump_session_storage.py
    ├── benchmark.py
    ├── ccl_chrome_audit.py
    ├── dump_indexeddb_details.py
    ├── dump_leveldb.py
    └── extras
        ├── make_many_indexeddb_databases.html
        ├── make_test_indexeddb.html
        └── make_webstorage.html


/.gitignore:
--------------------------------------------------------------------------------
1 | /*.bin
2 | /__pycache__
3 | /__ignore__
4 | /.idea
5 | env/
6 | dist
7 | *.egg-info


--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
 1 | Copyright 2020, CCL Forensics
 2 | 
 3 | Permission is hereby granted, free of charge, to any person obtaining a copy of
 4 | this software and associated documentation files (the "Software"), to deal in
 5 | the Software without restriction, including without limitation the rights to
 6 | use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
 7 | of the Software, and to permit persons to whom the Software is furnished to do
 8 | so, subject to the following conditions:
 9 | 
10 | The above copyright notice and this permission notice shall be included in all
11 | copies or substantial portions of the Software.
12 | 
13 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
14 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
15 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
16 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
17 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
18 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
19 | SOFTWARE.


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
  1 | # ccl_chromium_reader
  2 | This repository contains a package of (sometimes partial)
  3 | re-implementations of the technologies used by Chrome/Chromium/Chrome-esque
  4 | applications to store data in a range of data-stores in Python. These libraries 
  5 | provide programmatic access to these data-stores with a digital forensics slant
  6 | (e.g. for most artefacts, offsets or IDs for the data are provided so that they 
  7 | can be located and manually checked).
  8 | 
  9 | The technologies supported are:
 10 | * Snappy decompression
 11 | * LevelDB
 12 | * Protobuf
 13 | * Pickles
 14 | * V8 object deserialization
 15 | * Blink object deserialization
 16 | * IndexedDB
 17 | * Web Storage (Local Storage and Session Storage)
 18 | * Cache (both Block File and Simple formats)
 19 | * SNSS Session files (partial support)
 20 | * FileSystem API
 21 | * Notifications API (Platform Notifications)
 22 | * Downloads (from shared_proto_db)
 23 | * History
 24 | 
 25 | Additionally, there are a number of utility scripts included such as:
 26 | * `ccl_chromium_cache.py` - using the cache library as a command line tool dumps
 27 |   the cache and all HTTP header information.
 28 | * `ccl_chrome_audit.py` - a tool which can be used to scan the data-stored supported
 29 |   by the included libraries, plus a couple more, for records related to a host -
 30 |   designed as a research tool into data stored by web apps.
 31 | 
 32 | 
 33 | ## Python Versions
 34 | The code in this library was written and tested using Python 3.10. It *should* work
 35 | with 3.9, but uses language features which were not present in earlier versions.
 36 | Some parts of the library will probably work OK going back a few versions, but if
 37 | you report bugs related to any version before 3.10, the first question will be: can
 38 | you upgrade to 3.10?
 39 | 
 40 | ## A Note On Requirements
 41 | This repository contains a `requirements.txt` in the pip format. Other than `Brotli` 
 42 | The dependencies listed are only required for the `ccl_chrome_audit.py` script or 
 43 | when using the `ccl_chromium_cache` module as a script for dumping the cache; the 
 44 | libraries work using only the other scripts in this repository and the Python 
 45 | standard library.
 46 | 
 47 | ## Documentation
 48 | The documentation in the libraries is currently sparser than ideal, but some 
 49 | recent work has been undertaken to add more usage strings and fill in some gaps
 50 | in the type-hints. We welcome pull requests to fill in gaps in the documentation.
 51 | 
 52 | ## ccl_chrome_audit
 53 | This script audits multiple data stores in a Chrom(e|ium) profile folder based on
 54 | a fragment (regex) of a host name. It is designed to aid in research into web apps
 55 | by quickly highlighting what data related to that domain is stored where (also of
 56 | us with Electron apps etc.)
 57 | 
 58 | ### Caveats
 59 | At the moment, the script is designed primarily for use on Windows and on the 
 60 | host where the data was populated (this is because of the Cookie decryption being
 61 | achieved using DPAPI). 
 62 | 
 63 | ### Usage
 64 | ```
 65 | ccl_chrome_audit.py <chrome profile folder> [cache folder (for mobile)]
 66 | ```
 67 | 
 68 | ### Current Supported Data Sources
 69 | * Bookmarks
 70 | * History
 71 | * Downloads (from History)
 72 | * Downloads (from shared_proto_db)
 73 | * Favicons
 74 | * Cache
 75 | * Cookies
 76 | * Local Storage
 77 | * Session Storage
 78 | * IndexedDb
 79 | * File System API
 80 | * Platform Notifications 
 81 | * Logins
 82 | * Sessions (SNSS)
 83 | 
 84 | 
 85 | ## ChromiumProfileFolder
 86 | The `ChromiumProfileFolder` class is intended to act as a convenient entry-point to
 87 | much of the useful functionality in the package. It performs on-demand loading of 
 88 | data, so the "start-up cost" of using this object over the individual modules 
 89 | is near-zero, but with the advantage of better searching and filtering 
 90 | functionality built in and an easier interface to bring together data from these
 91 | different sources.
 92 | 
 93 | In this version `ChromiumProfileFolder` supports the following data-stores:
 94 | * History
 95 | * Cache
 96 | * IndexedDB
 97 | * Local Storage
 98 | * Session Storage
 99 | 
100 | To use the object, simply pass the path of the profile folder into the constructor
101 | (the object supports the context manager interface):
102 | 
103 | ```python
104 | import pathlib
105 | from ccl_chromium_reader import ChromiumProfileFolder
106 | 
107 | profile_path = pathlib.Path("profile path goes here")
108 | 
109 | with ChromiumProfileFolder(profile_path) as profile:
110 |     ...  # do things with the profile
111 | ```
112 | 
113 | Most of the methods of the `ChromiumProfileFolder` object which retrieve data can 
114 | search/filter through a `KeySearch` interface which in essence is on of: 
115 | * a `str`, in which case the search will try to exactly match the value
116 | * a collection of `str` (e.g., `list` or `tuple`), in which case the search will
117 |   try to exactly match one of the values contained therein
118 | * a `re.pattern` in which case the search attempts to match the pattern anywhere
119 |   in the search (same as `re.search`)
120 | * a function which takes a `str` and returns a `bool` indicating whether it's a
121 |   match.
122 | 
123 | ```python
124 | import re
125 | import pathlib
126 | from ccl_chromium_reader import ChromiumProfileFolder
127 | 
128 | profile_path = pathlib.Path("profile path goes here")
129 | 
130 | with ChromiumProfileFolder(profile_path) as profile:
131 |     # Match one of two possible hosts exactly, then a regular expression for the key
132 |     for ls_rec in profile.iter_local_storage(
133 |             storage_key=["http://not-a-real-url1.com", "http://not-a-real-url2.com"], 
134 |             script_key=re.compile(r"message\d{1,3}?-text")):
135 |         print(ls_rec.value)
136 |         
137 |     # Match all urls which end with "&read=1"
138 |     for hist_rec in profile.iterate_history_records(url=lambda x: x.endswith("&read=1")):
139 |         print(hist_rec.title, hist_rec.url)
140 | 
141 | ```
142 | 
143 | ## IndexedDB
144 | The `ccl_chromium_indexeddb.py` library processes IndexedDB data found in Chrome et al. 
145 | 
146 | ### Blog
147 | Read a blog on the subject here: https://www.cclsolutionsgroup.com/post/indexeddb-on-chromium
148 | 
149 | ### Caveats
150 | There is a fair amount of work yet to be done in terms of documentation, but 
151 | the modules should be fine for pulling data out of IndexedDB, with the following
152 | caveats:
153 | 
154 | #### LevelDB deleted data
155 | The LevelDB module will spit out live and deleted/old versions of records
156 | indiscriminately; it's possible to differentiate between them with some
157 | work, but that hasn't really been baked into the modules as they currently
158 | stand. So you are getting deleted data "for free" currently...whether you
159 | want it or not.
160 | 
161 | #### Blink data types
162 | I am fairly satisfied that all the possible V8 object types are accounted for
163 | (but I'm happy to be shown otherwise and get that fixed of course!), but it
164 | is likely that the hosted Blink objects aren't all there yet; so if you hit
165 | upon an error coming from inside ccl_blink_value_deserializer and can point
166 | me towards test data, I'd be very thankful!
167 | 
168 | #### Cyclic references
169 | It is noted in the V8 source that recursive referencing is possible in the
170 | serialization, we're not yet accounting for that so if Python throws a
171 | `RecursionError` that's likely what you're seeing. The plan is to use a 
172 | similar approach to ccl_bplist where the collection types are subclassed and
173 | do Just In Time resolution of the items, but that isn't done yet.
174 | 
175 | ## Using the modules
176 | There are two methods for accessing records - a more pythonic API using a set of 
177 | wrapper objects and a raw API which doesn't mask the underlying workings. There is
178 | unlikely to be much benefit to using the raw API in most cases, so the wrapper objects
179 | are recommended unless you have a compelling reason otherwise.
180 | 
181 | ### Wrapper API
182 | ```python
183 | import sys
184 | from ccl_chromium_reader import ccl_chromium_indexeddb
185 | 
186 | # assuming command line arguments are paths to the .leveldb and .blob folders
187 | leveldb_folder_path = sys.argv[1]
188 | blob_folder_path = sys.argv[2]
189 | 
190 | # open the indexedDB:
191 | wrapper = ccl_chromium_indexeddb.WrappedIndexDB(leveldb_folder_path, blob_folder_path)
192 | 
193 | # You can check the databases present using `wrapper.database_ids`
194 | 
195 | # Databases can be accessed from the wrapper in a number of ways:
196 | db = wrapper[2]  # accessing database using id number
197 | db = wrapper["MyTestDatabase"]  # accessing database using name (only valid for single origin indexedDB instances)
198 | db = wrapper["MyTestDatabase", "file__0@1"]  # accessing the database using name and origin
199 | # NB using name and origin is likely the preferred option in most cases
200 | 
201 | # The wrapper object also supports checking for databases using `in`
202 | 
203 | # You can check for object store names using `db.object_store_names`
204 | 
205 | # Object stores can be accessed from the database in a number of ways:
206 | obj_store = db[1]  # accessing object store using id number
207 | obj_store = db["store"]  # accessing object store using name
208 | 
209 | # Records can then be accessed by iterating the object store in a for-loop
210 | for record in obj_store.iterate_records():
211 |     print(record.user_key)
212 |     print(record.value)
213 | 
214 |     # if this record contained a FileInfo object somewhere linking
215 |     # to data stored in the blob dir, we could access that data like
216 |     # so (assume the "file" key in the record value is our FileInfo):
217 |     with record.get_blob_stream(record.value["file"]) as f:
218 |         file_data = f.read()
219 | 
220 | # By default, any errors in decoding records will bubble an exception 
221 | # which might be painful when iterating records in a for-loop, so either
222 | # passing True into the errors_to_stdout argument and/or by passing in an 
223 | # error handler function to bad_deserialization_data_handler, you can 
224 | # perform logging rather than crashing:
225 | 
226 | for record in obj_store.iterate_records(
227 |         errors_to_stdout=True, 
228 |         bad_deserializer_data_handler= lambda k,v: print(f"error: {k}, {v}")):
229 |     print(record.user_key)
230 |     print(record.value)
231 | ```
232 | 
233 | ### Raw access API
234 | ```python
235 | import sys
236 | from ccl_chromium_reader import ccl_chromium_indexeddb
237 | 
238 | # assuming command line arguments are paths to the .leveldb and .blob folders
239 | leveldb_folder_path = sys.argv[1]
240 | blob_folder_path = sys.argv[2]
241 | 
242 | # open the database:
243 | db = ccl_chromium_indexeddb.IndexedDb(leveldb_folder_path, blob_folder_path)
244 | 
245 | # there can be multiple databases, so we need to iterate through them (NB 
246 | # DatabaseID objects contain additional metadata, they aren't just ints):
247 | for db_id_meta in db.global_metadata.db_ids:
248 |     # and within each database, there will be multiple object stores so we
249 |     # will need to know the maximum object store number (this process will be
250 |     # cleaned up in future releases):
251 |     max_objstore_id = db.get_database_metadata(
252 |             db_id_meta.dbid_no, 
253 |             ccl_chromium_indexeddb.DatabaseMetadataType.MaximumObjectStoreId)
254 |     
255 |     # if the above returns None, then there are no stores in this db
256 |     if max_objstore_id is None:
257 |         continue
258 | 
259 |     # there may be multiple object stores, so again, we iterate through them
260 |     # this time based on the id number. Object stores start at id 1 and the
261 |     # max_objstore_id is inclusive:
262 |     for obj_store_id in range(1, max_objstore_id + 1):
263 |         # now we can ask the indexeddb wrapper for all records for this db
264 |         # and object store:
265 |         for record in db.iterate_records(db_id_meta.dbid_no, obj_store_id):
266 |             print(f"key: {record.user_key}")
267 |             print(f"key: {record.value}")
268 | 
269 |             # if this record contained a FileInfo object somewhere linking
270 |             # to data stored in the blob dir, we could access that data like
271 |             # so (assume the "file" key in the record value is our FileInfo):
272 |             with record.get_blob_stream(record.value["file"]) as f:
273 |                 file_data = f.read()
274 | ```
275 | 
276 | ## Local Storage
277 | `ccl_chromium_localstorage` contains functionality to read the Local Storage data from
278 | a Chromium/Chrome profile folder.
279 | 
280 | ### Blog
281 | Read a blog on the subject here: https://www.cclsolutionsgroup.com/post/chromium-session-storage-and-local-storage
282 | 
283 | ### Using the module
284 | 
285 | An example showing how to iterate all records, grouped by host is shown below:
286 | ```python
287 | import sys
288 | import pathlib
289 | from ccl_chromium_reader import ccl_chromium_localstorage
290 | 
291 | level_db_in_dir = pathlib.Path(sys.argv[1])
292 | 
293 | # Create the LocalStoreDb object which is used to access the data
294 | with ccl_chromium_localstorage.LocalStoreDb(level_db_in_dir) as local_storage:
295 |     for storage_key in local_storage.iter_storage_keys():
296 |         print(f"Getting records for {storage_key}")
297 |       
298 |         for record in local_storage.iter_records_for_storage_key(storage_key):
299 |             # we can attempt to associate this record with a batch, which may
300 |             # provide an approximate timestamp (withing 5-60 seconds) for this
301 |             # record.
302 |             batch = local_storage.find_batch(record.leveldb_seq_number)
303 |             timestamp = batch.timestamp if batch else None
304 |             print(record.leveldb_seq_number, record.script_key, record.value, sep="\t")
305 | 
306 | ```
307 | 
308 | ## Session Storage
309 | `ccl_chromium_sessionstorage` contains functionality to read the Session Storage data from
310 | a Chromium/Chrome profile folder.
311 | 
312 | ### Blog
313 | Read a blog on the subject here: https://www.cclsolutionsgroup.com/post/chromium-session-storage-and-local-storage
314 | 
315 | ### Using the module
316 | An example showing how to iterate all records, grouped by host is shown below:
317 | 
318 | ```python
319 | import sys
320 | import pathlib
321 | from ccl_chromium_reader import ccl_chromium_sessionstorage
322 | 
323 | level_db_in_dir = pathlib.Path(sys.argv[1])
324 | 
325 | # Create the SessionStoreDb object which is used to access the data
326 | with ccl_chromium_sessionstorage.SessionStoreDb(level_db_in_dir) as session_storage: 
327 |     for host in session_storage.iter_hosts():
328 |         print(f"Getting records for {host}")
329 |         for record in session_storage.iter_records_for_host(host):
330 |           print(record.leveldb_sequence_number, record.key, record.value)
331 | 
332 | ```
333 | 
334 | ## Cache
335 | `ccl_chromium_cache` contains functionality for reading Chromium cache data (both 
336 | block file and simple cache formats). It can be used to programmatically access 
337 | cache data and metadata (including http headers).
338 | 
339 | ### CLI
340 | Executing the module as a script allows you to dump a cache (either format) and 
341 | collate all metadata into a csv file.
342 | 
343 | ```
344 | USAGE: ccl_chromium_cache.py <cache input dir> <out dir>
345 | 
346 | ```
347 | 
348 | ### Using the module
349 | The main() function (which provides the CLI) in the module shows the full 
350 | process of detecting the cache type, reading data and metadata from the cache.
351 | 
352 | 
353 | 
354 | 
355 | 


--------------------------------------------------------------------------------
/ccl_chromium_reader/__init__.py:
--------------------------------------------------------------------------------
1 | from .ccl_chromium_profile_folder import ChromiumProfileFolder
2 | from .common import KeySearch
3 | 


--------------------------------------------------------------------------------
/ccl_chromium_reader/ccl_chromium_filesystem.py:
--------------------------------------------------------------------------------
  1 | """
  2 | Copyright 2022, CCL Forensics
  3 | 
  4 | Permission is hereby granted, free of charge, to any person obtaining a copy of
  5 | this software and associated documentation files (the "Software"), to deal in
  6 | the Software without restriction, including without limitation the rights to
  7 | use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
  8 | of the Software, and to permit persons to whom the Software is furnished to do
  9 | so, subject to the following conditions:
 10 | 
 11 | The above copyright notice and this permission notice shall be included in all
 12 | copies or substantial portions of the Software.
 13 | 
 14 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 15 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 16 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 17 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 18 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 19 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 20 | SOFTWARE.
 21 | """
 22 | 
 23 | __version__ = "0.8"
 24 | __description__ = "Library for reading Chrome/Chromium File System API data"
 25 | __contact__ = "Alex Caithness"
 26 | 
 27 | import dataclasses
 28 | import os
 29 | import sys
 30 | import pathlib
 31 | import datetime
 32 | import re
 33 | import typing
 34 | import types
 35 | import functools
 36 | 
 37 | from .storage_formats import ccl_leveldb
 38 | from .serialization_formats.ccl_easy_chromium_pickle import EasyPickleIterator
 39 | 
 40 | 
 41 | @dataclasses.dataclass(frozen=True)
 42 | class FileInfo:
 43 |     _owner: "FileSystem" = dataclasses.field(repr=False)
 44 |     origin: str
 45 |     folder_id: str
 46 |     is_persistent: bool
 47 |     seq_no: int
 48 |     file_id: int
 49 |     parent_id: int
 50 |     data_path: str
 51 |     name: str
 52 |     timestamp: datetime.datetime
 53 | 
 54 |     @classmethod
 55 |     def from_pickle(
 56 |             cls, owner: "FileSystem", origin: str, folder_id: str, is_persistent: bool,
 57 |             seq_no: int, file_id: int, data: bytes):
 58 |         with EasyPickleIterator(data) as reader:
 59 |             parent_id = reader.read_uint64()
 60 |             data_path = reader.read_string()
 61 |             name = reader.read_string()
 62 |             timestamp = reader.read_datetime()
 63 | 
 64 |         return cls(owner, origin, folder_id, is_persistent, seq_no, file_id, parent_id, data_path, name, timestamp)
 65 | 
 66 |     def get_local_storage_path(self) -> pathlib.Path:
 67 |         return self._owner.get_local_path_for_fileinfo(self)
 68 | 
 69 |     @property
 70 |     def is_stored_locally(self) -> bool:
 71 |         return self.get_local_storage_path().exists()
 72 | 
 73 | 
 74 | class OriginStorage:
 75 |     def __init__(
 76 |             self,
 77 |             owner: "FileSystem",
 78 |             origin: str,
 79 |             folder_id: str,
 80 |             persistent_files: typing.Optional[typing.Mapping[int, FileInfo]],
 81 |             persistent_deleted_file_ids: typing.Optional[typing.Iterable[int]],
 82 |             temporary_files: typing.Optional[typing.Mapping[int, FileInfo]],
 83 |             temporary_deleted_file_ids: typing.Optional[typing.Iterable[int]]):
 84 |         self._owner = owner
 85 |         self._origin = origin
 86 |         self._folder_id = folder_id
 87 |         self._persistent_files = types.MappingProxyType(persistent_files or {})
 88 |         self._persistent_deleted_file_ids = set(persistent_deleted_file_ids or [])
 89 |         self._temporary_files = types.MappingProxyType(temporary_files or {})
 90 |         self._temporary_deleted_file_ids = set(temporary_deleted_file_ids or [])
 91 | 
 92 |         self._persistent_file_listing_lookup = types.MappingProxyType(self._make_file_listing_lookup(True))
 93 |         self._temporary_file_listing_lookup = types.MappingProxyType(self._make_file_listing_lookup(False))
 94 | 
 95 |         self._file_listing_lookup_reverse: dict[str, list[str]] = {}
 96 |         for k, v in self._persistent_file_listing_lookup.items():
 97 |             self._file_listing_lookup_reverse.setdefault(v, [])
 98 |             self._file_listing_lookup_reverse[v].append(f"p_{k}")
 99 | 
100 |         for k, v in self._temporary_file_listing_lookup.items():
101 |             self._file_listing_lookup_reverse.setdefault(v, [])
102 |             self._file_listing_lookup_reverse[v].append(f"t_{k}")
103 |         self._file_listing_lookup_reverse = types.MappingProxyType(
104 |             self._file_listing_lookup_reverse)
105 | 
106 |     def _make_file_listing_lookup(self, persistent=True) -> dict[int, str]:
107 |         files = self._persistent_files if persistent else self._temporary_files
108 |         file_listing_lookup: dict[int, str] = {}
109 | 
110 |         for file_info in files.values():
111 |             if not file_info.data_path:
112 |                 continue
113 |             path_parts = []
114 |             current = file_info
115 |             while current.file_id != current.parent_id:
116 |                 path_parts.insert(0, current.name)
117 |                 current = files.get(current.parent_id, "<MISSING PATH SEGMENT>")
118 | 
119 |             path_parts.insert(0, "p" if persistent else "t")
120 |             # path_parts.insert(0, self._origin)
121 |             # path_parts.insert(0, "")
122 |             file_listing_lookup[file_info.file_id] = "/".join(path_parts)
123 | 
124 |         return file_listing_lookup
125 | 
126 |     def get_file_listing(self) -> typing.Iterable[tuple[str, FileInfo]]:
127 |         for file_id in self._persistent_file_listing_lookup:
128 |             yield self._persistent_file_listing_lookup[file_id], self._persistent_files[file_id]
129 |         for file_id in self._temporary_file_listing_lookup:
130 |             yield self._temporary_file_listing_lookup[file_id], self._temporary_files[file_id]
131 | 
132 |     def _get_file_info_from_path(self, path) -> typing.Iterable[FileInfo]:
133 |         file_keys = self._file_listing_lookup_reverse[str(path)]
134 |         for key in file_keys:
135 |             p_or_t, file_id = key.split("_", 1)
136 |             yield self._persistent_files[int(file_id)] if p_or_t == "p" else self._temporary_files[int(file_id)]
137 | 
138 | 
139 | class FileSystem:
140 |     def __init__(self, path: typing.Union[os.PathLike, str]):
141 |         """
142 |         Constructor for the File System API access (the entry point for most processing scripts)
143 |         :param path: the path of the File System API storage
144 |         """
145 |         self._root = pathlib.Path(path)
146 |         self._origins = self._get_origins()
147 |         self._origins_reverse = {}
148 |         for origin, folders in self._origins.items():
149 |             for folder in folders:
150 |                 self._origins_reverse[folder] = origin
151 | 
152 |     def _get_origins(self) -> dict[str, tuple]:
153 |         result = {}
154 |         with ccl_leveldb.RawLevelDb(self._root / "Origins") as db:
155 |             for record in db.iterate_records_raw():
156 |                 if record.state != ccl_leveldb.KeyState.Live:
157 |                     continue
158 |                 if record.user_key.startswith(b"ORIGIN:"):
159 |                     _, origin = record.user_key.split(b":", 1)
160 |                     origin = origin.decode("utf-8")
161 |                     result.setdefault(origin, [])
162 |                     result[origin].append(record.value.decode("utf-8"))
163 | 
164 |         return {k: tuple(v) for (k, v) in result.items()}
165 | 
166 |     def get_origins(self) -> typing.Iterable[str]:
167 |         """
168 |         Yields the origins for this File System API
169 |         :return: Yields the origins in this File System API
170 |         """
171 |         yield from self._origins.keys()
172 | 
173 |     def get_folders_for_origin(self, origin) -> tuple[str, ...]:
174 |         """
175 |         Returns the folder ids which are used by the origin (host/domain)
176 |         :param origin:
177 |         :return: a tuple of strings which are the folder id(s) for this origin
178 |         """
179 |         return self._origins[origin]
180 | 
181 |     def get_storage_for_folder(self, folder_id) -> OriginStorage:
182 |         """
183 |         Get the OriginStorage object for the folder
184 |         :param folder_id: a folder id (such as those returned by get_folders_for_origin)
185 |         :return: OriginStorage for the folder_id
186 |         """
187 |         return self._build_file_graph(folder_id)
188 | 
189 |     @functools.cache
190 |     def _build_file_graph(self, folder_id) -> OriginStorage:
191 |         persistent_files: typing.Optional[dict[int, FileInfo]] = {}
192 |         persistent_deleted_files: typing.Optional[dict[int, int]] = {}  # file_id: seq_no
193 |         temporary_files: typing.Optional[dict[int, FileInfo]] = {}
194 |         temporary_deleted_files: typing.Optional[dict[int, int]] = {}  # file_id: seq_no
195 | 
196 |         origin = self._origins_reverse[folder_id]
197 | 
198 |         for p_or_t in ("p", "t"):
199 |             db_path = self._root / folder_id / p_or_t / "Paths"
200 |             if not db_path.exists():
201 |                 continue
202 |             files: dict[int, FileInfo] = persistent_files if p_or_t == "p" else temporary_files
203 |             deleted_files: dict[int, int] = persistent_deleted_files if p_or_t == "p" else temporary_deleted_files
204 |             with ccl_leveldb.RawLevelDb(db_path) as db:
205 |                 # TODO: we can infer file modified (created?) times using the parent's modified times maybe
206 |                 for record in db.iterate_records_raw():
207 |                     if re.match(b"[0-9]+", record.user_key) is not None:
208 |                         if record.state == ccl_leveldb.KeyState.Live:
209 |                             file_id = int(record.user_key.decode("utf-8"))
210 |                             file_info = FileInfo.from_pickle(
211 |                                 self, origin, folder_id, p_or_t == "p", record.seq, file_id, record.value)
212 | 
213 |                             # undelete a file if more recent than deletion record:
214 |                             if file_id in deleted_files and deleted_files[file_id] < file_info.seq_no:
215 |                                 deleted_files.pop(file_id)
216 | 
217 |                             if old_file_info := files.get(file_id):
218 |                                 if old_file_info.seq_no < file_info.seq_no:
219 |                                     # TODO: any reason to keep older records (other than for the timestamps as above?)
220 |                                     files[file_id] = file_info
221 |                             else:
222 |                                 files[file_id] = file_info
223 |                         else:
224 |                             if old_file_info := files.get(file_id):
225 |                                 if old_file_info.seq_no < record.seq:
226 |                                     deleted_files[file_id] = record.seq
227 |                             else:
228 |                                 deleted_files[file_id] = record.seq
229 | 
230 |         return OriginStorage(
231 |             self, origin, folder_id,
232 |             persistent_files, persistent_deleted_files.keys(),
233 |             temporary_files, temporary_deleted_files.keys())
234 | 
235 |     def get_local_path_for_fileinfo(self, file_info: FileInfo):
236 |         """
237 |         Returns the path on the local file system for the FilInfo object
238 |         :param file_info:
239 |         :return: the path on the local file system for the FilInfo object
240 |         """
241 |         path = self._root / file_info.folder_id / ("p" if file_info.is_persistent else "t") / file_info.data_path
242 |         return path
243 | 
244 |     def get_file_stream_for_fileinfo(self, file_info: FileInfo) -> typing.Optional[typing.BinaryIO]:
245 |         """
246 |         Returns a file object from the local file system for the FilInfo object
247 |         :param file_info:
248 |         :return: a file object from the local file system for the FilInfo object
249 |         """
250 |         path = self.get_local_path_for_fileinfo(file_info)
251 |         if path.exists():
252 |             return path.open("rb")
253 |         return None
254 | 
255 | 
256 | class FileSystemUtils:
257 |     @staticmethod
258 |     def print_origin_to_folder(fs_folder: typing.Union[os.PathLike, str]) -> None:
259 |         """
260 |         utility function to print out origins in the File System API to their folders
261 |         :param fs_folder: the path of the File System API storage
262 |         :return: None
263 |         """
264 |         fs = FileSystem(fs_folder)
265 |         for origin in sorted(fs.get_origins()):
266 |             print(f"{origin}: {','.join(fs.get_folders_for_origin(origin))}")
267 | 
268 |     @staticmethod
269 |     def print_folder_to_origin(fs_folder: typing.Union[os.PathLike, str]) -> None:
270 |         """
271 |         utility function to print out folders in the File System API to their Origin
272 |         :param fs_folder: the path of the File System API storage
273 |         :return: None
274 |         """
275 |         fs = FileSystem(fs_folder)
276 |         result = {}
277 |         for origin in fs.get_origins():
278 |             for folder in fs.get_folders_for_origin(origin):
279 |                 result[folder] = origin
280 | 
281 |         for folder in sorted(result.keys()):
282 |             print(f"{folder}: {result[folder]}")
283 | 
284 |     @staticmethod
285 |     def print_all_files(fs_folder: typing.Union[os.PathLike, str]) -> None:
286 |         """
287 |         utility function to print out all files in the File System API
288 |         :param fs_folder: the path of the File System API storage
289 |         :return: None
290 |         """
291 |         fs = FileSystem(fs_folder)
292 |         for origin in sorted(fs.get_origins()):
293 |             for folder in fs.get_folders_for_origin(origin):
294 |                 storage = fs.get_storage_for_folder(folder)
295 |                 for file_path, file_info in storage.get_file_listing():
296 |                     print("/".join([origin, file_path]))
297 | 
298 | 
299 | if __name__ == "__main__":
300 |     FileSystemUtils.print_all_files(sys.argv[1])
301 | 
302 | 
303 | 


--------------------------------------------------------------------------------
/ccl_chromium_reader/ccl_chromium_history.py:
--------------------------------------------------------------------------------
  1 | """
  2 | Copyright 2024, CCL Forensics
  3 | Permission is hereby granted, free of charge, to any person obtaining a copy of
  4 | this software and associated documentation files (the "Software"), to deal in
  5 | the Software without restriction, including without limitation the rights to
  6 | use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
  7 | of the Software, and to permit persons to whom the Software is furnished to do
  8 | so, subject to the following conditions:
  9 | 
 10 | The above copyright notice and this permission notice shall be included in all
 11 | copies or substantial portions of the Software.
 12 | 
 13 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 14 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 15 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 16 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 17 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 18 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 19 | SOFTWARE.
 20 | """
 21 | 
 22 | import dataclasses
 23 | import datetime
 24 | import math
 25 | import pathlib
 26 | import sqlite3
 27 | import enum
 28 | import re
 29 | import struct
 30 | import typing
 31 | import collections.abc as col_abc
 32 | 
 33 | from .common import KeySearch, is_keysearch_hit
 34 | from .download_common import Download, DownloadSource
 35 | 
 36 | __version__ = "0.6"
 37 | __description__ = "Module to access the chrom(e|ium) history database"
 38 | __contact__ = "Alex Caithness"
 39 | 
 40 | EPOCH = datetime.datetime(1601, 1, 1)
 41 | 
 42 | 
 43 | def parse_chromium_time(microseconds: int) -> datetime.datetime:
 44 |     return EPOCH + datetime.timedelta(microseconds=microseconds)
 45 | 
 46 | 
 47 | def encode_chromium_time(datetime_value: datetime.datetime) -> int:
 48 |     return math.floor((datetime_value - EPOCH).total_seconds() * 1000000)
 49 | 
 50 | 
 51 | class PageTransitionCoreEnum(enum.IntEnum):
 52 |     # chrome/common/page_transition_types.h
 53 |     link = 0
 54 |     typed = 1
 55 |     auto_bookmark = 2
 56 |     auto_subframe = 3
 57 |     manual_subframe = 4
 58 |     generated = 5
 59 |     start_page = 6
 60 |     form_submit = 7
 61 |     reload = 8
 62 |     keyword = 9
 63 |     keyword_generated = 10
 64 | 
 65 | 
 66 | class PageTransitionQualifierEnum(enum.IntFlag):
 67 |     blocked = 0x00800000
 68 |     forward_back = 0x01000000
 69 |     from_address_bar = 0x02000000
 70 |     home_page = 0x04000000
 71 |     from_api = 0x08000000
 72 |     chain_start = 0x10000000
 73 |     chain_end = 0x20000000
 74 |     client_redirect = 0x40000000
 75 |     server_redirect = 0x80000000
 76 | 
 77 | 
 78 | @dataclasses.dataclass(frozen=True)
 79 | class PageTransition:
 80 |     core: PageTransitionCoreEnum
 81 |     qualifier: PageTransitionQualifierEnum
 82 | 
 83 |     @classmethod
 84 |     def from_int(cls, val):
 85 |         # database stores values signed, python needs unsigned
 86 |         if val < 0:
 87 |             val, = struct.unpack(">I", struct.pack(">i", val))
 88 | 
 89 |         core = PageTransitionCoreEnum(val & 0xff)
 90 |         qual = PageTransitionQualifierEnum(val & 0xffffff00)
 91 | 
 92 |         return cls(core, qual)
 93 | 
 94 | 
 95 | @dataclasses.dataclass(frozen=True)
 96 | class HistoryRecord:
 97 |     _owner: "HistoryDatabase" = dataclasses.field(repr=False)
 98 |     rec_id: int
 99 |     url: str
100 |     title: str
101 |     visit_time: datetime.datetime
102 |     visit_duration: datetime.timedelta
103 |     transition: PageTransition
104 |     from_visit_id: int
105 |     opener_visit_id: int
106 | 
107 |     @property
108 |     def record_location(self) -> str:
109 |         return f"SQLite Rowid: {self.rec_id}"
110 | 
111 |     @property
112 |     def has_parent(self) -> bool:
113 |         return self.from_visit_id != 0 or self.opener_visit_id != 0
114 | 
115 |     @property
116 |     def parent_visit_id(self) -> int:
117 |         return self.opener_visit_id or self.from_visit_id
118 | 
119 |     def get_parent(self) -> typing.Optional["HistoryRecord"]:
120 |         """
121 |         Get the parent visit for this record (based on the from_visit field in the database),
122 |         or None if there isn't one.
123 |         """
124 | 
125 |         return self._owner.get_parent_of(self)
126 | 
127 |     def get_children(self) -> col_abc.Iterable["HistoryRecord"]:
128 |         """
129 |         Get the children visits for this record (based on their from_visit field in the database).
130 |         """
131 |         return self._owner.get_children_of(self)
132 | 
133 | 
134 | class HistoryDatabase:
135 |     _HISTORY_QUERY = """
136 |     SELECT
137 |       "visits"."id",
138 |       "urls"."url",
139 |       "urls"."title",
140 |       "visits"."visit_time",
141 |       "visits"."from_visit",
142 |       "visits"."opener_visit",
143 |       "visits"."transition",
144 |       "visits"."visit_duration",
145 |       CASE 
146 |           WHEN "visits"."opener_visit" != 0 THEN "visits"."opener_visit"
147 |           ELSE "visits"."from_visit"
148 |       END "parent_id"
149 |       
150 |     FROM "visits"
151 |       LEFT JOIN "urls" ON "visits"."url" = "urls"."id"
152 |     """
153 | 
154 |     _WHERE_URL_EQUALS_PREDICATE = """"urls"."url" = ?"""
155 | 
156 |     _WHERE_URL_REGEX_PREDICATE = """"urls"."url" REGEXP ?"""
157 | 
158 |     _WHERE_URL_IN_PREDICATE = """"urls"."url" IN ({parameter_question_marks})"""
159 | 
160 |     _WHERE_VISIT_TIME_EARLIEST_PREDICATE = """"visits"."visit_time" >= ?"""
161 | 
162 |     _WHERE_VISIT_TIME_LATEST_PREDICATE = """"visits"."visit_time" <= ?"""
163 | 
164 |     _WHERE_VISIT_ID_EQUALS_PREDICATE = """"visits"."id" = ?"""
165 | 
166 |     #_WHERE_FROM_VISIT_EQUALS_PREDICATE = """"visits"."from_visit" = ?"""
167 | 
168 |     #_WHERE_OPENER_VISIT_EQUALS_PREDICATE = """"visits"."opener_visit" = ?"""
169 | 
170 |     _WHERE_PARENT_ID_EQUALS_PREDICATE = """"parent_id" = ?"""
171 | 
172 |     _DOWNLOADS_QUERY = """
173 |     SELECT 
174 |       "downloads"."id",
175 |       "downloads"."guid",
176 |       "downloads"."current_path",
177 |       "downloads"."target_path",
178 |       "downloads"."start_time",
179 |       "downloads"."received_bytes",
180 |       "downloads"."total_bytes",
181 |       "downloads"."state",
182 |       "downloads"."danger_type",
183 |       "downloads"."interrupt_reason",
184 |       "downloads"."hash",
185 |       "downloads"."end_time",
186 |       "downloads"."opened",
187 |       "downloads"."last_access_time",
188 |       "downloads"."transient",
189 |       "downloads"."referrer",
190 |       "downloads"."site_url",
191 |       "downloads"."embedder_download_data",
192 |       "downloads"."tab_url",
193 |       "downloads"."tab_referrer_url",
194 |       "downloads"."http_method",
195 |       "downloads"."mime_type",
196 |       "downloads"."original_mime_type"
197 |     FROM "downloads";
198 |     """
199 | 
200 |     _DOWNLOADS_URL_CHAINS_QUEREY = """
201 |     SELECT "downloads_url_chains"."id",
202 |       "downloads_url_chains"."chain_index",
203 |       "downloads_url_chains"."url"
204 |     FROM "downloads_url_chains"
205 |     WHERE "downloads_url_chains"."id" = ?
206 |     ORDER BY "downloads_url_chains"."chain_index";
207 |     """
208 | 
209 |     def __init__(self, db_path: pathlib.Path):
210 |         self._conn = sqlite3.connect(db_path.absolute().as_uri() + "?mode=ro", uri=True)
211 |         self._conn.row_factory = sqlite3.Row
212 |         self._conn.create_function("regexp", 2, lambda y, x: 1 if re.search(y, x) is not None else 0)
213 | 
214 |     def _row_to_record(self, row: sqlite3.Row) -> HistoryRecord:
215 |         return HistoryRecord(
216 |             self,
217 |             row["id"],
218 |             row["url"],
219 |             row["title"],
220 |             parse_chromium_time(row["visit_time"]),
221 |             datetime.timedelta(microseconds=row["visit_duration"]),
222 |             PageTransition.from_int(row["transition"]),
223 |             row["from_visit"],
224 |             row["opener_visit"]
225 |         )
226 | 
227 |     def get_parent_of(self, record: HistoryRecord) -> typing.Optional[HistoryRecord]:
228 |         if record.from_visit_id == 0 and record.opener_visit_id == 0:
229 |             return None
230 | 
231 |         parent_id = record.opener_visit_id if record.opener_visit_id != 0 else record.from_visit_id
232 | 
233 |         query = HistoryDatabase._HISTORY_QUERY
234 |         query += f" WHERE {HistoryDatabase._WHERE_VISIT_ID_EQUALS_PREDICATE};"
235 |         cur = self._conn.cursor()
236 |         cur.execute(query, (parent_id,))
237 |         row = cur.fetchone()
238 |         cur.close()
239 |         if row:
240 |             return self._row_to_record(row)
241 | 
242 |     def get_children_of(self, record: HistoryRecord) -> col_abc.Iterable[HistoryRecord]:
243 |         query = HistoryDatabase._HISTORY_QUERY
244 |         predicate = HistoryDatabase._WHERE_PARENT_ID_EQUALS_PREDICATE
245 |         query += f" WHERE {predicate};"
246 |         cur = self._conn.cursor()
247 |         cur.execute(query, (record.rec_id,))
248 |         for row in cur:
249 |             yield self._row_to_record(row)
250 | 
251 |         cur.close()
252 | 
253 |     def get_record_with_id(self, visit_id: int) -> typing.Optional[HistoryRecord]:
254 |         query = HistoryDatabase._HISTORY_QUERY
255 |         query += f" WHERE {HistoryDatabase._WHERE_VISIT_ID_EQUALS_PREDICATE};"
256 |         cur = self._conn.cursor()
257 |         cur.execute(query, (visit_id,))
258 |         row = cur.fetchone()
259 |         cur.close()
260 |         if row:
261 |             return self._row_to_record(row)
262 | 
263 |     def iter_history_records(
264 |             self, url: typing.Optional[KeySearch], *,
265 |             earliest: typing.Optional[datetime.datetime]=None, latest: typing.Optional[datetime.datetime]=None
266 |     ) -> col_abc.Iterable[HistoryRecord]:
267 | 
268 |         predicates = []
269 |         parameters = []
270 | 
271 |         if url is None:
272 |             pass  # no predicate
273 |         elif isinstance(url, str):
274 |             predicates.append(HistoryDatabase._WHERE_URL_EQUALS_PREDICATE)
275 |             parameters.append(url)
276 |         elif isinstance(url, re.Pattern):
277 |             predicates.append(HistoryDatabase._WHERE_URL_REGEX_PREDICATE)
278 |             parameters.append(url.pattern)
279 |         elif isinstance(url, col_abc.Collection):
280 |             predicates.append(
281 |                 HistoryDatabase._WHERE_URL_IN_PREDICATE.format(
282 |                     parameter_question_marks=",".join("?" for _ in range(len(url)))))
283 |             parameters.extend(url)
284 |         elif isinstance(url, col_abc.Callable):
285 |             pass  # we have to call this function across every
286 |         else:
287 |             raise TypeError(f"Unexpected type: {type(url)} (expects: {KeySearch})")
288 | 
289 |         if earliest is not None:
290 |             predicates.append(HistoryDatabase._WHERE_VISIT_TIME_EARLIEST_PREDICATE)
291 |             parameters.append(encode_chromium_time(earliest))
292 | 
293 |         if latest is not None:
294 |             predicates.append(HistoryDatabase._WHERE_VISIT_TIME_LATEST_PREDICATE)
295 |             parameters.append(encode_chromium_time(latest))
296 | 
297 |         query = HistoryDatabase._HISTORY_QUERY
298 |         if predicates:
299 |             query += f" WHERE {' AND '.join(predicates)}"
300 | 
301 |         query += ";"
302 |         cur = self._conn.cursor()
303 |         for row in cur.execute(query, parameters):
304 |             if not isinstance(url, col_abc.Callable) or url(row["url"]):
305 |                 yield self._row_to_record(row)
306 | 
307 |         cur.close()
308 | 
309 |     def iter_downloads(
310 |             self,
311 |             download_url: typing.Optional[KeySearch]=None,
312 |             tab_url: typing.Optional[KeySearch]=None) -> col_abc.Iterable[Download]:
313 |         downloads_cur = self._conn.cursor()
314 |         chain_cur = self._conn.cursor()
315 | 
316 |         downloads_cur.execute(HistoryDatabase._DOWNLOADS_QUERY)
317 | 
318 |         for download in downloads_cur:
319 |             chain_cur.execute(HistoryDatabase._DOWNLOADS_URL_CHAINS_QUEREY, (download["id"],))
320 |             chain = tuple(x["url"] for x in chain_cur)
321 | 
322 |             if download_url is not None and not any(is_keysearch_hit(download_url, x) for x in chain):
323 |                 continue
324 | 
325 |             if (tab_url is not None and
326 |                     not is_keysearch_hit(tab_url, download["tab_url"]) and
327 |                     not is_keysearch_hit(tab_url, download["tab_referrer_url"])):
328 |                 continue
329 | 
330 |             yield Download(
331 |                 DownloadSource.history_db,
332 |                 download["id"],
333 |                 download["guid"],
334 |                 download["hash"].hex(),
335 |                 chain,
336 |                 download["tab_url"],
337 |                 download["tab_referrer_url"],
338 |                 download["target_path"],
339 |                 download["mime_type"],
340 |                 download["original_mime_type"],
341 |                 download["total_bytes"],
342 |                 parse_chromium_time(download["start_time"]),
343 |                 parse_chromium_time(download["end_time"])
344 |             )
345 | 
346 |         downloads_cur.close()
347 |         chain_cur.close()
348 | 
349 |     def close(self):
350 |         self._conn.close()
351 | 
352 |     def __enter__(self):
353 |         return self
354 | 
355 |     def __exit__(self, exc_type, exc_val, exc_tb):
356 |         self.close()
357 | 
358 | 


--------------------------------------------------------------------------------
/ccl_chromium_reader/ccl_chromium_localstorage.py:
--------------------------------------------------------------------------------
  1 | """
  2 | Copyright 2021-2024, CCL Forensics
  3 | Permission is hereby granted, free of charge, to any person obtaining a copy of
  4 | this software and associated documentation files (the "Software"), to deal in
  5 | the Software without restriction, including without limitation the rights to
  6 | use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
  7 | of the Software, and to permit persons to whom the Software is furnished to do
  8 | so, subject to the following conditions:
  9 | 
 10 | The above copyright notice and this permission notice shall be included in all
 11 | copies or substantial portions of the Software.
 12 | 
 13 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 14 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 15 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 16 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 17 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 18 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 19 | SOFTWARE.
 20 | """
 21 | 
 22 | import io
 23 | import bisect
 24 | import re
 25 | import sys
 26 | import pathlib
 27 | import types
 28 | import typing
 29 | import collections.abc as col_abc
 30 | import dataclasses
 31 | import datetime
 32 | 
 33 | from .storage_formats import ccl_leveldb
 34 | from .common import KeySearch
 35 | 
 36 | __version__ = "0.5"
 37 | __description__ = "Module for reading the Chromium leveldb localstorage format"
 38 | __contact__ = "Alex Caithness"
 39 | 
 40 | """
 41 | See: https://source.chromium.org/chromium/chromium/src/+/main:components/services/storage/dom_storage/local_storage_impl.cc
 42 | Meta keys:
 43 |     Key = "META:" + storage_key (the host)
 44 |     Value = protobuff: 1=timestamp (varint); 2=size in bytes (varint)
 45 |     
 46 | Record keys:
 47 |     Key = "_" + storage_key + "\\x0" + script_key
 48 |     Value = record_value
 49 |     
 50 | """
 51 | 
 52 | _META_PREFIX = b"META:"
 53 | _RECORD_KEY_PREFIX = b"_"
 54 | _CHROME_EPOCH = datetime.datetime(1601, 1, 1, 0, 0, 0)
 55 | 
 56 | EIGHT_BIT_ENCODING = "iso-8859-1"
 57 | 
 58 | 
 59 | def from_chrome_timestamp(microseconds: int) -> datetime.datetime:
 60 |     return _CHROME_EPOCH + datetime.timedelta(microseconds=microseconds)
 61 | 
 62 | 
 63 | def decode_string(raw: bytes) -> str:
 64 |     """
 65 |     decodes a type-prefixed string - prefix of: 0=utf-16-le; 1=an extended ascii codepage (likely dependant on locale)
 66 |     :param raw: raw prefixed-string data
 67 |     :return: decoded string
 68 |     """
 69 |     prefix = raw[0]
 70 |     if prefix == 0:
 71 |         return raw[1:].decode("utf-16-le")
 72 |     elif prefix == 1:
 73 |         return raw[1:].decode(EIGHT_BIT_ENCODING)
 74 |     else:
 75 |         raise ValueError("Unexpected prefix, please contact developer")
 76 | 
 77 | 
 78 | @dataclasses.dataclass(frozen=True)
 79 | class StorageMetadata:
 80 |     storage_key: str
 81 |     timestamp: datetime.datetime
 82 |     size_in_bytes: int
 83 |     leveldb_seq_number: int
 84 | 
 85 |     @classmethod
 86 |     def from_protobuff(cls, storage_key: str, data: bytes, seq: int):
 87 |         with io.BytesIO(data) as stream:
 88 |             # This is a simple protobuff, so we'll read it directly, but with checks, rather than add a dependency
 89 |             ts_tag = ccl_leveldb.read_le_varint(stream)
 90 |             if (ts_tag & 0x07) != 0 or (ts_tag >> 3) != 1:
 91 |                 raise ValueError("Unexpected tag when reading StorageMetadata from protobuff")
 92 |             timestamp = from_chrome_timestamp(ccl_leveldb.read_le_varint(stream))
 93 | 
 94 |             size_tag = ccl_leveldb.read_le_varint(stream)
 95 |             if (size_tag & 0x07) != 0 or (size_tag >> 3) != 2:
 96 |                 raise ValueError("Unexpected tag when reading StorageMetadata from protobuff")
 97 |             size = ccl_leveldb.read_le_varint(stream)
 98 | 
 99 |             return cls(storage_key, timestamp, size, seq)
100 | 
101 | 
102 | @dataclasses.dataclass(frozen=True)
103 | class LocalStorageRecord:
104 |     storage_key: str
105 |     script_key: str
106 |     value: str
107 |     leveldb_seq_number: int
108 |     is_live: bool
109 | 
110 |     @property
111 |     def record_location(self) -> str:
112 |         return f"Leveldb Seq: {self.leveldb_seq_number}"
113 | 
114 | 
115 | class LocalStorageBatch:
116 |     def __init__(self, meta: StorageMetadata, end_seq: int):
117 |         self._meta = meta
118 |         self._end = end_seq
119 | 
120 |     @property
121 |     def storage_key(self) -> str:
122 |         return self._meta.storage_key
123 | 
124 |     @property
125 |     def timestamp(self) -> datetime.datetime:
126 |         return self._meta.timestamp
127 | 
128 |     @property
129 |     def start(self):
130 |         return self._meta.leveldb_seq_number
131 | 
132 |     @property
133 |     def end(self):
134 |         return self._end
135 | 
136 |     def __repr__(self):
137 |         return f"(storage_key={self.storage_key}, timestamp={self.timestamp}, start={self.start}, end={self.end})"
138 | 
139 | 
140 | class LocalStoreDb:
141 |     def __init__(self, in_dir: pathlib.Path):
142 |         if not in_dir.is_dir():
143 |             raise IOError("Input directory is not a directory")
144 | 
145 |         self._ldb = ccl_leveldb.RawLevelDb(in_dir)
146 | 
147 |         self._storage_details = {}  # storage_key: {seq_number: StorageMetadata}
148 |         self._flat_items = []       # [StorageMetadata|LocalStorageRecord]   - used to batch items up
149 |         self._records = {}          # storage_key: {script_key: {seq_number: LocalStorageRecord}}
150 | 
151 |         for record in self._ldb.iterate_records_raw():
152 |             if record.user_key.startswith(_META_PREFIX) and record.state == ccl_leveldb.KeyState.Live:
153 |                 # Only live records for metadata - not sure what we can reliably infer from deleted keys
154 |                 storage_key = record.user_key.removeprefix(_META_PREFIX).decode(EIGHT_BIT_ENCODING)
155 |                 self._storage_details.setdefault(storage_key, {})
156 |                 metadata = StorageMetadata.from_protobuff(storage_key, record.value, record.seq)
157 |                 self._storage_details[storage_key][record.seq] = metadata
158 |                 self._flat_items.append(metadata)
159 |             elif record.user_key.startswith(_RECORD_KEY_PREFIX):
160 |                 # We include deleted records here because we need them to build batches
161 |                 storage_key_raw, script_key_raw = record.user_key.removeprefix(_RECORD_KEY_PREFIX).split(b"\x00", 1)
162 |                 storage_key = storage_key_raw.decode(EIGHT_BIT_ENCODING)
163 |                 script_key = decode_string(script_key_raw)
164 | 
165 |                 try:
166 |                     value = decode_string(record.value) if record.state == ccl_leveldb.KeyState.Live else None
167 |                 except UnicodeDecodeError as e:
168 |                     # Some sites play games to test the browser's capabilities like encoding half of a surrogate pair
169 |                     print(f"Error decoding record value at seq no {record.seq}; "
170 |                           f"{storage_key} {script_key}:  {record.value}")
171 |                     continue
172 | 
173 |                 self._records.setdefault(storage_key, {})
174 |                 self._records[storage_key].setdefault(script_key, {})
175 | 
176 |                 ls_record = LocalStorageRecord(
177 |                     storage_key, script_key, value, record.seq, record.state == ccl_leveldb.KeyState.Live)
178 |                 self._records[storage_key][script_key][record.seq] = ls_record
179 |                 self._flat_items.append(ls_record)
180 | 
181 |         self._storage_details = types.MappingProxyType(self._storage_details)
182 |         self._records = types.MappingProxyType(self._records)
183 | 
184 |         self._all_storage_keys = frozenset(self._storage_details.keys() | self._records.keys())  # because deleted data.
185 |         self._flat_items.sort(key=lambda x: x.leveldb_seq_number)
186 | 
187 |         # organise batches - this is made complex and slow by having to account for missing/deleted data
188 |         # we're looking for a StorageMetadata followed by sequential (in terms of seq number) LocalStorageRecords
189 |         # with the same storage key. Everything that falls within that chain can safely be considered a batch.
190 |         # Any break in sequence numbers or storage key is a fail and can't be considered part of a batch.
191 |         self._batches = {}
192 |         current_meta: typing.Optional[StorageMetadata] = None
193 |         current_end = 0
194 |         for item in self._flat_items:  # pre-sorted
195 |             if isinstance(item, LocalStorageRecord):
196 |                 if current_meta is None:
197 |                     # no currently valid metadata so we can't attribute this record to anything
198 |                     continue
199 |                 elif item.leveldb_seq_number - current_end != 1 or item.storage_key != current_meta.storage_key:
200 |                     # this record breaks a chain, so bundle up what we have and clear everything out
201 |                     self._batches[current_meta.leveldb_seq_number] = LocalStorageBatch(current_meta, current_end)
202 |                     current_meta = None
203 |                     current_end = 0
204 |                 else:
205 |                     # contiguous and right storage key, include in the current chain
206 |                     current_end = item.leveldb_seq_number
207 |             elif isinstance(item, StorageMetadata):
208 |                 if current_meta is not None:
209 |                     # this record breaks a chain, so bundle up what we have, set new start
210 |                     self._batches[current_meta.leveldb_seq_number] = LocalStorageBatch(current_meta, current_end)
211 |                 current_meta = item
212 |                 current_end = item.leveldb_seq_number
213 |             else:
214 |                 raise ValueError
215 | 
216 |         if current_meta is not None:
217 |             self._batches[current_meta.leveldb_seq_number] = LocalStorageBatch(current_meta, current_end)
218 | 
219 |         self._batch_starts = tuple(sorted(self._batches.keys()))
220 | 
221 |     def iter_storage_keys(self) -> col_abc.Iterable[str]:
222 |         yield from self._storage_details.keys()
223 | 
224 |     def contains_storage_key(self, storage_key: str) -> bool:
225 |         return storage_key in self._all_storage_keys
226 | 
227 |     def iter_script_keys(self, storage_key: str) -> col_abc.Iterable[str]:
228 |         if storage_key not in self._all_storage_keys:
229 |             raise KeyError(storage_key)
230 |         if storage_key not in self._records:
231 |             raise StopIteration
232 |         yield from self._records[storage_key].keys()
233 | 
234 |     def contains_script_key(self, storage_key: str, script_key: str) -> bool:
235 |         return script_key in self._records.get(storage_key, {})
236 | 
237 |     def find_batch(self, seq: int) -> typing.Optional[LocalStorageBatch]:
238 |         """
239 |         Finds the batch that a record with the given sequence number belongs to
240 |         :param seq: leveldb sequence id
241 |         :return: the batch containing the given sequence number or None if no batch contains it
242 |         """
243 | 
244 |         i = bisect.bisect_left(self._batch_starts, seq) - 1
245 |         if i < 0:
246 |             return None
247 |         start = self._batch_starts[i]
248 |         batch = self._batches[start]
249 |         if batch.start <= seq <= batch.end:
250 |             return batch
251 |         else:
252 |             return None
253 | 
254 |     def iter_all_records(self, include_deletions=False) -> col_abc.Iterable[LocalStorageRecord]:
255 |         """
256 |         :param include_deletions: if True, records related to deletions will be included
257 |         (these will have None as values).
258 |         :return: iterable of LocalStorageRecords
259 |         """
260 |         for storage_key, script_dict in self._records.items():
261 |             for script_key, values in script_dict.items():
262 |                 for seq, value in values.items():
263 |                     if value.is_live or include_deletions:
264 |                         yield value
265 | 
266 |     def _iter_records_for_storage_key(
267 |             self, storage_key: str, include_deletions=False) -> col_abc.Iterable[LocalStorageRecord]:
268 |         """
269 |         :param storage_key: storage key (host) for the records
270 |         :param include_deletions: if True, records related to deletions will be included
271 |         (these will have None as values).
272 |         :return: iterable of LocalStorageRecords
273 |         """
274 |         if not self.contains_storage_key(storage_key):
275 |             raise KeyError(storage_key)
276 |         for script_key, values in self._records[storage_key].items():
277 |             for seq, value in values.items():
278 |                 if value.is_live or include_deletions:
279 |                     yield value
280 | 
281 |     def _search_storage_keys(self, storage_key: KeySearch) -> list[str]:
282 |         if isinstance(storage_key, str):
283 |             return [storage_key]
284 |         elif isinstance(storage_key, re.Pattern):
285 |             return [x for x in self._all_storage_keys if storage_key.search(x)]
286 |         elif isinstance(storage_key, col_abc.Collection):
287 |             return list(set(storage_key) & self._all_storage_keys)
288 |         elif isinstance(storage_key, col_abc.Callable):
289 |             return [x for x in self._all_storage_keys if storage_key(x)]
290 |         else:
291 |             raise TypeError(f"Unexpected type: {type(storage_key)} (expects: {KeySearch})")
292 | 
293 |     def iter_records_for_storage_key(
294 |             self, storage_key: KeySearch, *,
295 |             include_deletions=False, raise_on_no_result=True) -> col_abc.Iterable[LocalStorageRecord]:
296 |         """
297 |         :param storage_key: storage key (host) for the records. This can be one of: a single string;
298 |         a collection of strings; a regex pattern; a function that takes a string (the host) and returns a bool.
299 |         :param include_deletions: if True, records related to deletions will be included
300 |         :param raise_on_no_result: if True (the default) if no matching storage keys are found, raise a KeyError
301 |         (these will have None as values).
302 |         :return: iterable of LocalStorageRecords
303 |         """
304 |         if isinstance(storage_key, str):
305 |             if raise_on_no_result and not self.contains_storage_key(storage_key):
306 |                 raise KeyError(storage_key)
307 |             yield from self._iter_records_for_storage_key(storage_key, include_deletions)
308 |         elif isinstance(storage_key, re.Pattern):
309 |             matched_keys = self._search_storage_keys(storage_key)
310 |             if raise_on_no_result and not matched_keys:
311 |                 raise KeyError(f"Pattern: {storage_key.pattern}")
312 |             for key in matched_keys:
313 |                 yield from self._iter_records_for_storage_key(key, include_deletions)
314 |         elif isinstance(storage_key, col_abc.Collection):
315 |             matched_keys = self._search_storage_keys(storage_key)
316 |             if raise_on_no_result and not matched_keys:
317 |                 raise KeyError(storage_key)
318 |             for key in matched_keys:
319 |                 yield from self._iter_records_for_storage_key(key, include_deletions)
320 |         elif isinstance(storage_key, col_abc.Callable):
321 |             matched_keys = self._search_storage_keys(storage_key)
322 |             if raise_on_no_result and not matched_keys:
323 |                 raise KeyError(storage_key)
324 |             for key in matched_keys:
325 |                 yield from self._iter_records_for_storage_key(key, include_deletions)
326 |         else:
327 |             raise TypeError(f"Unexpected type for storage key: {type(storage_key)} (expects: {KeySearch})")
328 | 
329 |     def _iter_records_for_script_key(
330 |             self, storage_key: str, script_key: str, include_deletions=False) -> col_abc.Iterable[LocalStorageRecord]:
331 |         """
332 |         :param storage_key: storage key (host) for the records
333 |         :param script_key: script defined key for the records
334 |         :param include_deletions: if True, records related to deletions will be included
335 |         :return: iterable of LocalStorageRecords
336 |         """
337 |         if not self.contains_script_key(storage_key, script_key):
338 |             raise KeyError((storage_key, script_key))
339 |         for seq, value in self._records[storage_key][script_key].items():
340 |             if value.is_live or include_deletions:
341 |                 yield value
342 | 
343 |     def iter_records_for_script_key(
344 |         self, storage_key: KeySearch, script_key: KeySearch, *,
345 |             include_deletions=False, raise_on_no_result=True) -> col_abc.Iterable[LocalStorageRecord]:
346 |         """
347 |         :param storage_key: storage key (host) for the records. This can be one of: a single string;
348 |         a collection of strings; a regex pattern; a function that takes a string and returns a bool.
349 |         :param script_key: script defined key for the records. This can be one of: a single string;
350 |         a collection of strings; a regex pattern; a function that takes a string and returns a bool.
351 |         :param include_deletions: if True, records related to deletions will be included
352 |         :param raise_on_no_result: if True (the default) if no matching storage keys are found, raise a KeyError
353 |         (these will have None as values).
354 |         :return: iterable of LocalStorageRecords
355 |         """
356 | 
357 |         if isinstance(storage_key, str) and isinstance(script_key, str):
358 |             if raise_on_no_result and not self.contains_script_key(storage_key, script_key):
359 |                 raise KeyError((storage_key, script_key))
360 |             yield from self._iter_records_for_script_key(storage_key, script_key, include_deletions=include_deletions)
361 |         else:
362 |             matched_storage_keys = self._search_storage_keys(storage_key)
363 |             if raise_on_no_result and not matched_storage_keys:
364 |                 raise KeyError((storage_key, script_key))
365 | 
366 |             yielded = False
367 |             for matched_storage_key in matched_storage_keys:
368 |                 if isinstance(script_key, str):
369 |                     matched_script_keys = [script_key]
370 |                 elif isinstance(script_key, re.Pattern):
371 |                     matched_script_keys = [x for x in self._records[matched_storage_key].keys() if script_key.search(x)]
372 |                 elif isinstance(script_key, col_abc.Collection):
373 |                     script_key_set = set(script_key)
374 |                     matched_script_keys = list(self._records[matched_storage_key].keys() & script_key_set)
375 |                 elif isinstance(script_key, col_abc.Callable):
376 |                     matched_script_keys = [x for x in self._records[matched_storage_key].keys() if script_key(x)]
377 |                 else:
378 |                     raise TypeError(f"Unexpected type for script key: {type(script_key)} (expects: {KeySearch})")
379 | 
380 |                 for key in matched_script_keys:
381 |                     for seq, value in self._records[matched_storage_key][key].items():
382 |                         if value.is_live or include_deletions:
383 |                             yielded = True
384 |                             yield value
385 | 
386 |             if not yielded:
387 |                 raise KeyError((storage_key, script_key))
388 | 
389 |     def iter_metadata(self) -> col_abc.Iterable[StorageMetadata]:
390 |         """
391 |         :return: iterable of StorageMetaData
392 |         """
393 |         for meta in self._flat_items:
394 |             if isinstance(meta, StorageMetadata):
395 |                 yield meta
396 | 
397 |     def iter_metadata_for_storage_key(self, storage_key: str) -> col_abc.Iterable[StorageMetadata]:
398 |         """
399 |         :param storage_key: storage key (host) for the metadata
400 |         :return: iterable of StorageMetadata
401 |         """
402 |         if storage_key not in self._all_storage_keys:
403 |             raise KeyError(storage_key)
404 |         if storage_key not in self._storage_details:
405 |             return None
406 |         for seq, meta in self._storage_details[storage_key].items():
407 |             yield meta
408 | 
409 |     def iter_batches(self) -> col_abc.Iterable[LocalStorageBatch]:
410 |         yield from self._batches.values()
411 | 
412 |     def close(self):
413 |         self._ldb.close()
414 | 
415 |     def __contains__(self, item: typing.Union[str, tuple[str, str]]) -> bool:
416 |         """
417 |         :param item: either the host as a str or a tuple of the host and a key (both str)
418 |         :return: if item is a str, returns true if that host is present, if item is a tuple of (str, str), returns True
419 |             if that host and key pair are present
420 |         """
421 | 
422 |         if isinstance(item, str):
423 |             return item in self._all_storage_keys
424 |         elif isinstance(item, tuple) and len(item) == 2:
425 |             host, key = item
426 |             return host in self._all_storage_keys and key in self._records[host]
427 |         else:
428 |             raise TypeError("item must be a string or a tuple of (str, str)")
429 | 
430 |     def __iter__(self):
431 |         """
432 |         iterates the hosts (storage keys) present
433 |         """
434 |         yield from self._all_storage_keys
435 | 
436 |     def __enter__(self) -> "LocalStoreDb":
437 |         return self
438 | 
439 |     def __exit__(self, exc_type, exc_val, exc_tb):
440 |         self.close()
441 | 
442 | 
443 | def main(args):
444 |     in_ldb_path = pathlib.Path(args[0])
445 |     local_store = LocalStoreDb(in_ldb_path)
446 | 
447 |     for rec in local_store.iter_all_records():
448 |         batch = local_store.find_batch(rec.leveldb_seq_number)
449 |         print(rec, batch)
450 | 
451 | 
452 | if __name__ == '__main__':
453 |     main(sys.argv[1:])
454 | 
455 | 


--------------------------------------------------------------------------------
/ccl_chromium_reader/ccl_chromium_notifications.py:
--------------------------------------------------------------------------------
  1 | """
  2 | Copyright 2022, CCL Forensics
  3 | 
  4 | Permission is hereby granted, free of charge, to any person obtaining a copy of
  5 | this software and associated documentation files (the "Software"), to deal in
  6 | the Software without restriction, including without limitation the rights to
  7 | use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
  8 | of the Software, and to permit persons to whom the Software is furnished to do
  9 | so, subject to the following conditions:
 10 | 
 11 | The above copyright notice and this permission notice shall be included in all
 12 | copies or substantial portions of the Software.
 13 | 
 14 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 15 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 16 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 17 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 18 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 19 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 20 | SOFTWARE.
 21 | """
 22 | 
 23 | import datetime
 24 | import enum
 25 | import io
 26 | import os
 27 | import struct
 28 | import sys
 29 | import pathlib
 30 | import dataclasses
 31 | import typing
 32 | 
 33 | from .storage_formats import ccl_leveldb
 34 | from .serialization_formats import ccl_blink_value_deserializer, ccl_v8_value_deserializer, ccl_protobuff as pb
 35 | 
 36 | __version__ = "0.3"
 37 | __description__ = "Library for reading Chrome/Chromium notifications (Platform Notifications)"
 38 | __contact__ = "Alex Caithness"
 39 | 
 40 | # See content/browser/notifications/notification_database.cc
 41 | #   and content/browser/notifications/notification_database_data.proto
 42 | 
 43 | EPOCH = datetime.datetime(1601, 1, 1)
 44 | 
 45 | 
 46 | # stolen from ccl_chromium_indexeddb 20230907
 47 | @dataclasses.dataclass(frozen=True)
 48 | class BlinkTrailer:
 49 |     # third_party/blink/renderer/bindings/core/v8/serialization/trailer_reader.h
 50 |     offset: int
 51 |     length: int
 52 | 
 53 |     TRAILER_SIZE: typing.ClassVar[int] = 13
 54 |     MIN_WIRE_FORMAT_VERSION_FOR_TRAILER: typing.ClassVar[int] = 21
 55 | 
 56 |     @classmethod
 57 |     def from_buffer(cls, buffer, trailer_offset: int):
 58 |         tag, offset, length = struct.unpack(">cQI", buffer[trailer_offset: trailer_offset + BlinkTrailer.TRAILER_SIZE])
 59 |         if tag != ccl_blink_value_deserializer.Constants.tag_kTrailerOffsetTag:
 60 |             raise ValueError(
 61 |                 f"Trailer doesn't start with kTrailerOffsetTag "
 62 |                 f"(expected: 0x{ccl_blink_value_deserializer.Constants.tag_kTrailerOffsetTag.hex()}; "
 63 |                 f"got: 0x{tag.hex()}")
 64 | 
 65 |         return BlinkTrailer(offset, length)
 66 | 
 67 | 
 68 | class ClosedReason(enum.IntEnum):
 69 |     USER = 0
 70 |     DEVELOPER = 1
 71 |     UNKNOWN = 2
 72 | 
 73 | 
 74 | class ActionType(enum.IntEnum):
 75 |     BUTTON = 0
 76 |     TEXT = 1
 77 | 
 78 | 
 79 | class Direction(enum.IntEnum):
 80 |     LEFT_TO_RIGHT = 0
 81 |     RIGHT_TO_LEFT = 1
 82 |     AUTO = 2
 83 | 
 84 | 
 85 | def read_datetime(stream):
 86 |     ms = pb.read_le_varint(stream)
 87 |     return EPOCH + datetime.timedelta(microseconds=ms)
 88 | 
 89 | 
 90 | NotificationAction_Structure = {
 91 |     1: pb.ProtoDecoder("action", pb.read_string),
 92 |     2: pb.ProtoDecoder("title", pb.read_string),
 93 |     3: pb.ProtoDecoder("icon", pb.read_string),
 94 |     4: pb.ProtoDecoder("type", lambda x: ActionType(pb.read_le_varint(x))),
 95 |     5: pb.ProtoDecoder("placeholder", pb.read_string),
 96 | }
 97 | 
 98 | NotificationData_Structure = {
 99 |     1: pb.ProtoDecoder("title", pb.read_string),
100 |     2: pb.ProtoDecoder("closed_reason", lambda x: Direction(pb.read_le_varint(x))),
101 |     3: pb.ProtoDecoder("lang", pb.read_string),
102 |     4: pb.ProtoDecoder("body", pb.read_string),
103 |     5: pb.ProtoDecoder("tag", pb.read_string),
104 |     6: pb.ProtoDecoder("icon", pb.read_string),
105 |     7: pb.ProtoDecoder("silent", lambda x: pb.read_le_varint(x) != 0),
106 |     8: pb.ProtoDecoder("data", pb.read_blob),
107 |     9: pb.ProtoDecoder("vibration", pb.read_blob),
108 |     10: pb.ProtoDecoder(
109 |         "actions", lambda x: pb.read_embedded_protobuf(x, NotificationAction_Structure, use_friendly_tag=True)),
110 |     11: pb.ProtoDecoder("require_interaction", lambda x: pb.read_le_varint(x) != 0),
111 |     12: pb.ProtoDecoder("timestamp", read_datetime),
112 |     13: pb.ProtoDecoder("renotify", lambda x: pb.read_le_varint(x) != 0),
113 |     14: pb.ProtoDecoder("badge", pb.read_string),
114 |     15: pb.ProtoDecoder("image", pb.read_string),
115 |     16: pb.ProtoDecoder("show_trigger_timestamp", read_datetime),
116 | }
117 | 
118 | NotificationDatabaseDataProto_Structure = {
119 |     1: pb.ProtoDecoder("persistent_notification_id", pb.read_le_varint),
120 |     2: pb.ProtoDecoder("origin", pb.read_string),
121 |     3: pb.ProtoDecoder("service_worker_registration_id", pb.read_le_varint),
122 |     4: pb.ProtoDecoder(
123 |         "notification_data", lambda x: pb.read_embedded_protobuf(x, NotificationData_Structure, use_friendly_tag=True)),
124 |     5: pb.ProtoDecoder("notification_id", pb.read_string),
125 |     6: pb.ProtoDecoder("replaced_existing_notification", lambda x: pb.read_le_varint(x) != 0),
126 |     7: pb.ProtoDecoder("num_clicks", pb.read_le_varint32),
127 |     8: pb.ProtoDecoder("num_action_button_clicks", pb.read_le_varint32),
128 |     9: pb.ProtoDecoder("creation_time_millis", read_datetime),
129 |     10: pb.ProtoDecoder("time_until_first_click_millis", pb.read_le_varint),
130 |     11: pb.ProtoDecoder("time_until_last_click_millis", pb.read_le_varint),
131 |     12: pb.ProtoDecoder("time_until_close_millis", pb.read_le_varint),
132 |     13: pb.ProtoDecoder("closed_reason", lambda x: ClosedReason(pb.read_le_varint(x))),
133 |     14: pb.ProtoDecoder("has_triggered", lambda x: pb.read_le_varint(x) != 0),
134 |     15: pb.ProtoDecoder("is_shown_by_browser", lambda x: pb.read_le_varint(x) != 0),
135 | }
136 | 
137 | 
138 | @dataclasses.dataclass(frozen=True)
139 | class LevelDbInfo:
140 |     user_key: bytes
141 |     origin_file: os.PathLike
142 |     seq_no: int
143 | 
144 | 
145 | @dataclasses.dataclass(frozen=True)
146 | class NotificationAction:
147 |     action: typing.Optional[str]
148 |     title: typing.Optional[str]
149 |     icon: typing.Optional[str]
150 |     action_type: typing.Optional[ActionType]
151 |     placeholder: typing.Optional[str]
152 | 
153 | 
154 | @dataclasses.dataclass(frozen=True)
155 | class ChromiumNotification:
156 |     level_db_info: LevelDbInfo
157 |     origin: str
158 |     persistent_notification_id: int
159 |     notification_id: str
160 |     title: typing.Optional[str]
161 |     body: typing.Optional[str]
162 |     data: typing.Optional[typing.Any]
163 |     timestamp: datetime.datetime
164 |     creation_time: datetime.datetime  # from creation_time_millis
165 |     closed_reason: ClosedReason
166 |     time_until_first_click_millis: int
167 |     time_until_last_click_millis: int
168 |     time_until_close_millis: int
169 | 
170 |     tag: typing.Optional[str]
171 |     image: typing.Optional[str]
172 |     icon: typing.Optional[str]
173 |     badge: typing.Optional[str]
174 | 
175 |     actions: typing.Optional[typing.Iterable[NotificationAction]]
176 | 
177 | 
178 | class NotificationReader:
179 |     def __init__(self, notification_input_path: pathlib.Path):
180 |         self._db = ccl_leveldb.RawLevelDb(notification_input_path)
181 | 
182 |     def close(self):
183 |         self._db.close()
184 | 
185 |     def __enter__(self):
186 |         return self
187 | 
188 |     def __exit__(self, exc_type, exc_val, exc_tb):
189 |         self._db.close()
190 | 
191 |     def read_notifications(self) -> typing.Iterable[ChromiumNotification]:
192 |         blink_deserializer = ccl_blink_value_deserializer.BlinkV8Deserializer()
193 |         for record in self._db.iterate_records_raw():
194 |             if record.state != ccl_leveldb.KeyState.Live:
195 |                 continue
196 | 
197 |             key = record.user_key.decode("utf-8")
198 |             record_type, key_info = key.split(":", 1)
199 |             origin, key_id = key_info.split("\0", 1)
200 |             level_db_info = LevelDbInfo(record.user_key, record.origin_file, record.seq)
201 |             if record_type == "DATA":
202 |                 with io.BytesIO(record.value) as stream:
203 |                     root = pb.ProtoObject(
204 |                         0x2,
205 |                         "root",
206 |                         pb.read_protobuff(stream, NotificationDatabaseDataProto_Structure, use_friendly_tag=True))
207 | 
208 |                 data = root.only("notification_data").only("data").value
209 |                 if data:
210 |                     if data[0] != 0xff:
211 |                         print(key)
212 |                         print(data)
213 |                         raise ValueError("Missing blink tag at the start of data")
214 |                     blink_version, blink_version_bytes = pb._read_le_varint(io.BytesIO(data[1:]))
215 |                     data_start = 1 + len(blink_version_bytes)
216 |                     if blink_version >= BlinkTrailer.MIN_WIRE_FORMAT_VERSION_FOR_TRAILER:
217 |                         trailer = BlinkTrailer.from_buffer(data, data_start)  # TODO: do something with the trailer?
218 |                         data_start += BlinkTrailer.TRAILER_SIZE
219 | 
220 |                     with io.BytesIO(data[data_start:]) as obj_raw:
221 |                         try:
222 |                             deserializer = ccl_v8_value_deserializer.Deserializer(
223 |                                 obj_raw, host_object_delegate=blink_deserializer.read)
224 |                         except ValueError:
225 |                             print("Error record:")
226 |                             print(level_db_info, key)
227 |                             raise
228 |                         data = deserializer.read()
229 | 
230 |                 yield ChromiumNotification(
231 |                     level_db_info,
232 |                     root.only("origin").value,
233 |                     root.only("persistent_notification_id").value,
234 |                     root.only("notification_id").value,
235 |                     root.only("notification_data").only("title").value,
236 |                     root.only("notification_data").only("body").value,
237 |                     data,
238 |                     root.only("notification_data").only("timestamp").value,
239 |                     root.only("creation_time_millis").value,
240 |                     root.only("closed_reason").value,
241 |                     root.only("time_until_first_click_millis").value,
242 |                     root.only("time_until_last_click_millis").value,
243 |                     root.only("time_until_close_millis").value,
244 |                     root.only("notification_data").only("tag").value,
245 |                     root.only("notification_data").only("image").value,
246 |                     root.only("notification_data").only("icon").value,
247 |                     root.only("notification_data").only("badge").value,
248 |                     tuple(
249 |                         NotificationAction(
250 |                             x.only("action").value,
251 |                             x.only("title").value,
252 |                             x.only("icon").value,
253 |                             x.only("type").value,
254 |                             x.only("placeholder").value
255 |                         )
256 |                         for x in root["notification_data"][0]["actions"])
257 |                 )
258 | 
259 | 
260 | if __name__ == '__main__':
261 |     if len(sys.argv) < 2:
262 |         print(f"USAGE: {pathlib.Path(sys.argv[0]).name} <Platform Notifications Folder>")
263 |         exit(1)
264 | 
265 |     _reader = NotificationReader(pathlib.Path(sys.argv[1]))
266 |     _blink_deserializer = ccl_blink_value_deserializer.BlinkV8Deserializer()
267 |     for notification in _reader.read_notifications():
268 |         print(notification)
269 | 


--------------------------------------------------------------------------------
/ccl_chromium_reader/ccl_chromium_sessionstorage.py:
--------------------------------------------------------------------------------
  1 | """
  2 | Copyright 2021, CCL Forensics
  3 | Permission is hereby granted, free of charge, to any person obtaining a copy of
  4 | this software and associated documentation files (the "Software"), to deal in
  5 | the Software without restriction, including without limitation the rights to
  6 | use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
  7 | of the Software, and to permit persons to whom the Software is furnished to do
  8 | so, subject to the following conditions:
  9 | 
 10 | The above copyright notice and this permission notice shall be included in all
 11 | copies or substantial portions of the Software.
 12 | 
 13 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 14 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 15 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 16 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 17 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 18 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 19 | SOFTWARE.
 20 | """
 21 | 
 22 | import sys
 23 | import pathlib
 24 | import typing
 25 | import dataclasses
 26 | import re
 27 | import collections.abc as col_abc
 28 | from types import MappingProxyType
 29 | 
 30 | from .storage_formats import ccl_leveldb
 31 | from .common import KeySearch
 32 | 
 33 | __version__ = "0.6"
 34 | __description__ = "Module for reading the Chromium leveldb sessionstorage format"
 35 | __contact__ = "Alex Caithness"
 36 | 
 37 | # See: https://source.chromium.org/chromium/chromium/src/+/main:components/services/storage/dom_storage/session_storage_metadata.cc
 38 | # et al
 39 | 
 40 | _NAMESPACE_PREFIX = b"namespace-"
 41 | _MAP_ID_PREFIX = b"map-"
 42 | 
 43 | log = None
 44 | 
 45 | 
 46 | @dataclasses.dataclass(frozen=True)
 47 | class SessionStoreValue:
 48 |     host: typing.Optional[str]
 49 |     key: str
 50 |     value: str
 51 |     # guid: typing.Optional[str]
 52 |     leveldb_sequence_number: int
 53 |     is_deleted: bool = False
 54 | 
 55 |     @property
 56 |     def record_location(self) -> str:
 57 |         return f"Leveldb Seq: {self.leveldb_sequence_number}"
 58 | 
 59 | 
 60 | class SessionStoreDb:
 61 |     # todo: get all grouped by namespace by host?
 62 |     # todo: get all grouped by namespace by host.key?
 63 |     # todo: consider refactoring to only getting metadata on first pass and everything else on demand?
 64 |     def __init__(self, in_dir: pathlib.Path):
 65 |         if not in_dir.is_dir():
 66 |             raise IOError("Input directory is not a directory")
 67 | 
 68 |         self._ldb = ccl_leveldb.RawLevelDb(in_dir)
 69 | 
 70 |         # If performance is a concern we should refactor this, but slow and steady for now
 71 | 
 72 |         # First collect the namespace (session/tab guid  + host) and map-ids together
 73 |         self._map_id_to_host = {}  # map_id: host
 74 |         self._deleted_keys = set()
 75 | 
 76 |         for rec in self._ldb.iterate_records_raw():
 77 |             if rec.user_key.startswith(_NAMESPACE_PREFIX):
 78 |                 if rec.user_key == _NAMESPACE_PREFIX:
 79 |                     continue  # bogus entry near the top usually
 80 |                 try:
 81 |                     key = rec.user_key.decode("utf-8")
 82 |                 except UnicodeDecodeError:
 83 |                     print(f"Invalid namespace key: {rec.user_key}")
 84 |                     continue
 85 | 
 86 |                 split_key = key.split("-", 2)
 87 |                 if len(split_key) != 3:
 88 |                     print(f"Invalid namespace key: {key}")
 89 |                     continue
 90 | 
 91 |                 _, guid, host = split_key
 92 | 
 93 |                 if not host:
 94 |                     continue  # TODO investigate why this happens
 95 | 
 96 |                 # normalize host to lower just in case
 97 |                 host = host.lower()
 98 |                 guid_host_pair = guid, host
 99 | 
100 |                 if rec.state == ccl_leveldb.KeyState.Deleted:
101 |                     self._deleted_keys.add(guid_host_pair)
102 |                 else:
103 |                     try:
104 |                         map_id = rec.value.decode("utf-8")
105 |                     except UnicodeDecodeError:
106 |                         print(f"Invalid namespace value: {key}")
107 |                         continue
108 | 
109 |                     if not map_id:
110 |                         continue  # TODO: investigate why this happens/do we want to keep the host around somewhere?
111 | 
112 |                     #if map_id in self._map_id_to_host_guid and self._map_id_to_host_guid[map_id] != guid_host_pair:
113 |                     if map_id in self._map_id_to_host and self._map_id_to_host[map_id] != host:
114 |                         print("Map ID Collision!")
115 |                         print(f"map_id: {map_id}")
116 |                         print(f"Old host: {self._map_id_to_host[map_id]}")
117 |                         print(f"New host: {guid_host_pair}")
118 |                         raise ValueError("map_id collision")
119 |                     else:
120 |                         self._map_id_to_host[map_id] = host
121 | 
122 |         # freeze stuff
123 |         self._map_id_to_host = MappingProxyType(self._map_id_to_host)
124 | 
125 |         self._deleted_keys = frozenset(self._deleted_keys)
126 |         self._deleted_keys_lookup: dict[str, tuple] = {}
127 | 
128 |         self._host_lookup = {}  # {host: {ss_key: [SessionStoreValue, ...]}}
129 |         self._orphans = []  #  list of tuples of key, value where we can't get the host
130 |         for rec in self._ldb.iterate_records_raw():
131 |             if rec.user_key.startswith(_MAP_ID_PREFIX):
132 |                 try:
133 |                     key = rec.user_key.decode("utf-8")
134 |                 except UnicodeDecodeError:
135 |                     print(f"Invalid map id key: {rec.user_key}")
136 |                     continue
137 | 
138 |                 # if rec.state == ccl_leveldb.KeyState.Deleted:
139 |                 #     continue  # TODO: do we want to keep the key around because the presence is important?
140 | 
141 |                 split_key = key.split("-", 2)
142 |                 if len(split_key) != 3:
143 |                     print(f"Invalid map id key: {key}")
144 |                     continue
145 | 
146 |                 _, map_id, ss_key = split_key
147 | 
148 |                 if not split_key:
149 |                     # TODO what does it mean when there is no key here?
150 |                     #      The value will also be a single number (encoded utf-8)
151 |                     continue
152 | 
153 |                 try:
154 |                     value = rec.value.decode("UTF-16-LE") if rec.state == ccl_leveldb.KeyState.Live else None
155 |                 except UnicodeDecodeError:
156 |                     print(f"Error decoding value for {key}")
157 |                     print(f"Raw Value: {rec.value}")
158 |                     continue
159 | 
160 |                 host = self._map_id_to_host.get(map_id)
161 |                 if not host:
162 |                     self._orphans.append(
163 |                         (ss_key,
164 |                          SessionStoreValue(None, ss_key, value, rec.seq, rec.state == ccl_leveldb.KeyState.Deleted)
165 |                          ))
166 |                 else:
167 |                     self._host_lookup.setdefault(host, {})
168 |                     self._host_lookup[host].setdefault(ss_key, [])
169 |                     self._host_lookup[host][ss_key].append(
170 |                         SessionStoreValue(host, ss_key, value, rec.seq, rec.state == ccl_leveldb.KeyState.Deleted))
171 | 
172 |     def __contains__(self, item: typing.Union[str, typing.Tuple[str, str]]) -> bool:
173 |         """
174 |         :param item: either the host as a str or a tuple of the host and a key (both str)
175 |         :return: if item is a str, returns true if that host is present, if item is a tuple of (str, str), returns True
176 |             if that host and key pair are present
177 |         """
178 | 
179 |         if isinstance(item, str):
180 |             return item in self._host_lookup
181 |         elif isinstance(item, tuple) and len(item) == 2:
182 |             host, key = item
183 |             return host in self._host_lookup and key in self._host_lookup[host]
184 |         else:
185 |             raise TypeError("item must be a string or a tuple of (str, str)")
186 | 
187 |     def iter_hosts(self) -> typing.Iterable[str]:
188 |         """
189 |         :return: yields the hosts present in this SessionStorage
190 |         """
191 |         yield from self._host_lookup.keys()
192 | 
193 |     def get_all_for_host(self, host: str) -> dict[str, tuple[SessionStoreValue, ...]]:
194 |         """
195 |         DEPRECATED
196 |         :param host: the host (domain name) for the session storage
197 |         :return: a dictionary where the keys are storage keys and the values are tuples of SessionStoreValue objects
198 |             for that key. Multiple values may be returned as deleted or old values may be recovered.
199 |         """
200 |         if host not in self:
201 |             return {}
202 |         result_raw = dict(self._host_lookup[host])
203 |         for ss_key in result_raw:
204 |             result_raw[ss_key] = tuple(result_raw[ss_key])
205 |         return result_raw
206 | 
207 |     def _search_host(self, host: KeySearch) -> list[str]:
208 |         if isinstance(host, str):
209 |             return [host]
210 |         elif isinstance(host, re.Pattern):
211 |             return [x for x in self._host_lookup if host.search(x)]
212 |         elif isinstance(host, col_abc.Collection):
213 |             return list(set(host) & self._host_lookup.keys())
214 |         elif isinstance(host, col_abc.Callable):
215 |             return [x for x in self._host_lookup if host(x)]
216 |         else:
217 |             raise TypeError(f"Unexpected type: {type(host)} (expects: {KeySearch})")
218 | 
219 |     def iter_records_for_host(
220 |             self, host: KeySearch, *,
221 |             include_deletions=False, raise_on_no_result=True) -> col_abc.Iterable[SessionStoreValue]:
222 |         """
223 |         :param host: storage key (host) for the records. This can be one of: a single string;
224 |         a collection of strings; a regex pattern; a function that takes a string (each host) and returns a bool.
225 |         :param include_deletions: if True, records related to deletions will be included
226 |         :param raise_on_no_result: if True (the default) if no matching storage keys are found, raise a KeyError
227 |         (these will have None as values).
228 |         :return: iterable of SessionStoreValue
229 |         """
230 |         if isinstance(host, str):
231 |             if raise_on_no_result and host not in self._host_lookup:
232 |                 raise KeyError(host)
233 |             for records in self._host_lookup[host].values():
234 |                 for rec in records:
235 |                     if include_deletions or not rec.is_deleted:
236 |                         yield rec
237 |         elif isinstance(host, re.Pattern) or isinstance(host, col_abc.Collection) or isinstance(host, col_abc.Callable):
238 |             found_hosts = self._search_host(host)
239 |             if raise_on_no_result and not found_hosts:
240 |                 raise KeyError(host)
241 |             for found_host in found_hosts:
242 |                 for records in self._host_lookup[found_host].values():
243 |                     for rec in records:
244 |                         if include_deletions or not rec.is_deleted:
245 |                             yield rec
246 |         else:
247 |             raise TypeError(f"Unexpected type for host: {type(host)} (expects: {KeySearch})")
248 | 
249 |     def iter_all_records(self, *, include_deletions=False, include_orphans=False):
250 |         """
251 |         Returns all records recovered from session storage
252 |         :param include_deletions: if True, records related to deletions will be included
253 |         :param include_orphans: if True, records which cannot be associated with a host will be included
254 |         """
255 |         for host in self.iter_hosts():
256 |             yield from self.iter_records_for_host(host, include_deletions=include_deletions)
257 |         if include_orphans:
258 |             yield from (x[1] for x in self.iter_orphans())
259 | 
260 |     def get_session_storage_key(self, host: str, key: str) -> tuple[SessionStoreValue, ...]:
261 |         """
262 |         DEPRECATED
263 |         :param host: the host (domain name) for the session storage
264 |         :param key: the storage key
265 |         :return: a tuple of SessionStoreValue matching the host and key. Multiple values may be returned as deleted or
266 |             old values may be recovered.
267 |         """
268 |         if (host, key) not in self:
269 |             return tuple()
270 |         return tuple(self._host_lookup[host][key])
271 | 
272 |     def iter_records_for_session_storage_key(
273 |             self, host: KeySearch, key: KeySearch, *,
274 |             include_deletions=False, raise_on_no_result=True) -> col_abc.Iterable[SessionStoreValue]:
275 |         """
276 |         :param host: storage key (host) for the records. This can be one of: a single string;
277 |         a collection of strings; a regex pattern; a function that takes a string (each host) and returns a bool.
278 |         :param key: script defined key for the records. This can be one of: a single string;
279 |         a collection of strings; a regex pattern; a function that takes a string and returns a bool.
280 |         :param include_deletions: if True, records related to deletions will be included
281 |         :param raise_on_no_result: if True (the default) if no matching storage keys are found, raise a KeyError
282 |         (these will have None as values).
283 |         :return: iterable of LocalStorageRecords
284 |         """
285 |         if isinstance(host, str) and isinstance(key, str):
286 |             if host not in self._host_lookup or key not in self._host_lookup[host]:
287 |                 if raise_on_no_result:
288 |                     raise KeyError((host, key))
289 |                 else:
290 |                     return []
291 | 
292 |             yield from (r for r in self._host_lookup[host][key] if include_deletions or not r.is_deleted)
293 | 
294 |         else:
295 |             found_hosts = self._search_host(host)
296 |             if raise_on_no_result and not found_hosts:
297 |                 raise KeyError((host, key))
298 | 
299 |             yielded = False
300 |             for found_host in found_hosts:
301 |                 if isinstance(key, str):
302 |                     matched_keys = [key]
303 |                 elif isinstance(key, re.Pattern):
304 |                     matched_keys = [x for x in self._host_lookup[found_host].keys() if key.search(x)]
305 |                 elif isinstance(key, col_abc.Collection):
306 |                     script_key_set = set(key)
307 |                     matched_keys = list(self._host_lookup[found_host].keys() & script_key_set)
308 |                 elif isinstance(key, col_abc.Callable):
309 |                     matched_keys = [x for x in self._host_lookup[found_host].keys() if key(x)]
310 |                 else:
311 |                     raise TypeError(f"Unexpected type for script key: {type(key)} (expects: {KeySearch})")
312 | 
313 |                 for matched_key in matched_keys:
314 |                     for rec in self._host_lookup[found_host][matched_key]:
315 |                         if include_deletions or not rec.is_deleted:
316 |                             yielded = True
317 |                             yield rec
318 | 
319 |             if not yielded and raise_on_no_result:
320 |                 raise KeyError((host, key))
321 | 
322 |     def iter_orphans(self) -> typing.Iterable[tuple[str, SessionStoreValue]]:
323 |         """
324 |         Returns records which have been orphaned from their host (domain name) where it cannot be recovered. The keys
325 |             may be named uniquely enough that the host may be inferred.
326 |         :return: yields tuples of (session key, SessionStoreValue)
327 |         """
328 |         yield from self._orphans
329 | 
330 |     def __getitem__(self, item: typing.Union[str, typing.Tuple[str, str]]) -> typing.Union[
331 |             dict[str, tuple[SessionStoreValue, ...]], tuple[SessionStoreValue, ...]]:
332 |         if item not in self:
333 |             raise KeyError(item)
334 | 
335 |         if isinstance(item, str):
336 |             return self.get_all_for_host(item)
337 |         elif isinstance(item, tuple) and len(item) == 2:
338 |             return self.get_session_storage_key(*item)
339 |         else:
340 |             raise TypeError("item must be a string or a tuple of (str, str)")
341 | 
342 |     def __iter__(self) -> typing.Iterable[str]:
343 |         """
344 |         iterates the hosts present
345 |         """
346 |         return self.iter_hosts()
347 | 
348 |     def close(self):
349 |         self._ldb.close()
350 | 
351 |     def __enter__(self) -> "SessionStoreDb":
352 |         return self
353 | 
354 |     def __exit__(self, exc_type, exc_val, exc_tb):
355 |         self.close()
356 | 
357 | 
358 | def main(args):
359 |     ldb_in_dir = pathlib.Path(args[0])
360 |     ssdb = SessionStoreDb(ldb_in_dir)
361 | 
362 |     print("Hosts in db:")
363 |     for host in ssdb:
364 |         print(host)
365 | 
366 | 
367 | if __name__ == '__main__':
368 |     main(sys.argv[1:])
369 | 


--------------------------------------------------------------------------------
/ccl_chromium_reader/ccl_chromium_snss2.py:
--------------------------------------------------------------------------------
  1 | """
  2 | Copyright 2022, CCL Forensics
  3 | 
  4 | Permission is hereby granted, free of charge, to any person obtaining a copy of
  5 | this software and associated documentation files (the "Software"), to deal in
  6 | the Software without restriction, including without limitation the rights to
  7 | use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
  8 | of the Software, and to permit persons to whom the Software is furnished to do
  9 | so, subject to the following conditions:
 10 | 
 11 | The above copyright notice and this permission notice shall be included in all
 12 | copies or substantial portions of the Software.
 13 | 
 14 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 15 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 16 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 17 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 18 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 19 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 20 | SOFTWARE.
 21 | """
 22 | 
 23 | import dataclasses
 24 | import enum
 25 | import struct
 26 | import sys
 27 | import os
 28 | import pathlib
 29 | import datetime
 30 | import types
 31 | import typing
 32 | from .serialization_formats.ccl_easy_chromium_pickle import EasyPickleIterator, EasyPickleException
 33 | 
 34 | __version__ = "0.2"
 35 | __description__ = "Module for reading Chromium SNSS files"
 36 | __contact__ = "Alex Caithness"
 37 | 
 38 | 
 39 | class TabRestoreIdType(enum.Enum):
 40 |     # components/sessions/core/tab_restore_service_impl.cc
 41 |     CommandUpdateTabNavigation = 1
 42 |     CommandRestoredEntry = 2
 43 |     CommandWindowDeprecated = 3
 44 |     CommandSelectedNavigationInTab = 4
 45 |     CommandPinnedState = 5
 46 |     CommandSetExtensionAppID = 6
 47 |     CommandSetWindowAppName = 7
 48 |     CommandSetTabUserAgentOverride = 8
 49 |     CommandWindow = 9
 50 |     CommandSetTabGroupData = 10
 51 |     CommandSetTabUserAgentOverride2 = 11
 52 |     CommandSetWindowUserTitle = 12
 53 |     CommandCreateGroup = 13
 54 |     CommandAddTabExtraData = 14
 55 | 
 56 |     UnusedCommand = 255
 57 | 
 58 | 
 59 | class SessionRestoreIdType(enum.Enum):
 60 |     # components/sessions/core/session_service_commands.cc
 61 |     CommandSetTabWindow = 0
 62 |     CommandSetWindowBounds = 1  # // OBSOLETE Superseded by kCommandSetWindowBounds3.
 63 |     CommandSetTabIndexInWindow = 2
 64 |     CommandTabNavigationPathPrunedFromBack = 5  # // OBSOLETE: Superseded by kCommandTabNavigationPathPruned instead
 65 |     CommandUpdateTabNavigation = 6
 66 |     CommandSetSelectedNavigationIndex = 7
 67 |     CommandSetSelectedTabInIndex = 8
 68 |     CommandSetWindowType = 9
 69 |     CommandSetWindowBounds2 = 10  # // OBSOLETE Superseded by kCommandSetWindowBounds3. Except for data migration.
 70 |     CommandTabNavigationPathPrunedFromFront = 11  # // Superseded kCommandTabNavigationPathPruned instead
 71 |     CommandSetPinnedState = 12
 72 |     CommandSetExtensionAppID = 13
 73 |     CommandSetWindowBounds3 = 14
 74 |     CommandSetWindowAppName = 15
 75 |     CommandTabClosed = 16
 76 |     CommandWindowClosed = 17
 77 |     CommandSetTabUserAgentOverride = 18  # // OBSOLETE: Superseded by kCommandSetTabUserAgentOverride2.
 78 |     CommandSessionStorageAssociated = 19
 79 |     CommandSetActiveWindow = 20
 80 |     CommandLastActiveTime = 21
 81 |     CommandSetWindowWorkspace = 22  # // OBSOLETE Superseded by kCommandSetWindowWorkspace2.
 82 |     CommandSetWindowWorkspace2 = 23
 83 |     CommandTabNavigationPathPruned = 24
 84 |     CommandSetTabGroup = 25
 85 |     CommandSetTabGroupMetadata = 26  # // OBSOLETE Superseded by kCommandSetTabGroupMetadata2.
 86 |     CommandSetTabGroupMetadata2 = 27
 87 |     CommandSetTabGuid = 28
 88 |     CommandSetTabUserAgentOverride2 = 29
 89 |     CommandSetTabData = 30
 90 |     CommandSetWindowUserTitle = 31
 91 |     CommandSetWindowVisibleOnAllWorkspaces = 32
 92 |     CommandAddTabExtraData = 33
 93 |     CommandAddWindowExtraData = 34
 94 | 
 95 |     # Edge has custom command types. These are what I have seen so far.
 96 |     # None of these types appear to be related to browsing data at the moment (typically only a few bytes long).
 97 |     EdgeCommandUnknown131 = 131
 98 |     EdgeCommandUnknown132 = 132
 99 | 
100 |     UnusedCommand = 255
101 | 
102 | 
103 | class PageTransition:
104 |     # ui/base/page_transition_types.h
105 |     _core_mask = 0xff
106 |     _qualifier_mask = 0xffffff00
107 |     _core_transitions = {
108 |             0: "Link",
109 |             1: "Typed",
110 |             2: "AutoBookmark",
111 |             3: "AutoSubframe",
112 |             4: "ManualSubframe",
113 |             5: "Generated",
114 |             6: "AutoToplevel",
115 |             7: "FormSubmit",
116 |             8: "Reload",
117 |             9: "Keyword",
118 |             10: "KeywordGenerated"
119 |     }
120 |     _qualifiers = {
121 |             0x00800000: "Blocked",
122 |             0x01000000: "ForwardBack",
123 |             0x02000000: "FromAddressBar",
124 |             0x04000000: "HomePage",
125 |             0x08000000: "FromApi",
126 |             0x10000000: "ChainStart",
127 |             0x20000000: "ChainEnd",
128 |             0x40000000: "ClientRedirect",
129 |             0x80000000: "ServerRedirect"
130 |     }
131 | 
132 |     def __init__(self, value):
133 |         self._value = value
134 |         if value < 0:
135 |             # signed to unsigned
136 |             value += (0x80000000 * 2)
137 |         self._core_transition = PageTransition._core_transitions[value & PageTransition._core_mask]
138 |         self._qualifiers = []
139 |         for flag in PageTransition._qualifiers:
140 |             if (value & PageTransition._qualifier_mask) & flag > 0:
141 |                 self._qualifiers.append(PageTransition._qualifiers[flag])
142 | 
143 |     def __str__(self):
144 |         return "; ".join([self._core_transition] + self._qualifiers)
145 | 
146 |     def __repr__(self):
147 |         return "<ChromeTransition ({0}): {1})>".format(self._value, str(self))
148 | 
149 |     @property
150 |     def core_transition(self) -> str:
151 |         return self._core_transition
152 | 
153 |     @property
154 |     def qualifiers(self) -> typing.Iterable[str]:
155 |         yield from self._qualifiers
156 | 
157 |     @property
158 |     def value(self):
159 |         return self._value
160 | 
161 | 
162 | class SnssError(Exception):
163 |     ...
164 | 
165 | 
166 | @dataclasses.dataclass(frozen=True)
167 | class SessionCommand:
168 |     offset: int
169 |     id_type: typing.Union[SessionRestoreIdType, TabRestoreIdType]
170 | 
171 | 
172 | @dataclasses.dataclass(frozen=True)
173 | class NavigationEntry(SessionCommand):
174 |     # components/sessions/core/serialized_navigation_entry.cc
175 |     index: int
176 |     url: str
177 |     title: str
178 |     page_state_raw: bytes  # replace with completed PageState object
179 |     transition_type: PageTransition
180 |     has_post_data: typing.Optional[bool] = None
181 |     referrer_url: typing.Optional[str] = None
182 |     original_request_url: typing.Optional[str] = None
183 |     is_overriding_user_agent: typing.Optional[bool] = None
184 |     timestamp: typing.Optional[datetime.datetime] = None
185 |     http_status: typing.Optional[int] = None
186 |     referrer_policy: typing.Optional[int] = None
187 |     extended_map: typing.Optional[types.MappingProxyType] = None
188 |     task_id: typing.Optional[int] = None
189 |     parent_task_id: typing.Optional[int] = None
190 |     root_task_id: typing.Optional[int] = None
191 |     session_id: typing.Optional[int] = None
192 | 
193 |     @classmethod
194 |     def from_pickle(
195 |             cls, pickle, id_type: typing.Union[SessionRestoreIdType, TabRestoreIdType],
196 |             offset: int, session_id: typing.Optional[int]=None) -> "NavigationEntry":
197 |         index = pickle.read_int32()
198 |         url = pickle.read_string()
199 |         title = pickle.read_string16()
200 |         page_state_length = pickle.read_int32()
201 |         # https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/common/page_state/page_state_serialization.cc;drc=d1e1301c82bef37d30796e2d6098856b851d90a4;l=897
202 |         page_state_raw = pickle.read_aligned(page_state_length)
203 |         transition_type = PageTransition(pickle.read_uint32())
204 | 
205 |         try:
206 |             type_mask = pickle.read_uint32()
207 |         except EasyPickleException:
208 |             # very old versions of data end here, so we return a partial object here
209 |             return cls(offset, id_type, index, url, title, page_state_raw, transition_type)
210 | 
211 |         has_post_data = (type_mask & 0x01) > 0
212 |         referrer_url = pickle.read_string()
213 |         _ = pickle.read_int32()  # referrer policy, not used
214 |         original_request_url = pickle.read_string()
215 |         is_overriding_user_agent = pickle.read_bool()
216 |         timestamp = pickle.read_datetime()
217 |         _ = pickle.read_string16()  # search terms, not used
218 |         http_status = pickle.read_int32()
219 |         referrer_policy = pickle.read_int32()
220 | 
221 |         extended_map_size = pickle.read_int32()
222 |         extended_map = {}
223 |         for _ in range(extended_map_size):
224 |             key = pickle.read_string()
225 |             value = pickle.read_string()
226 |             extended_map[key] = value
227 | 
228 |         extended_map = types.MappingProxyType(extended_map)
229 | 
230 |         task_id = None
231 |         parent_task_id = None
232 |         root_task_id = None
233 | 
234 |         try:
235 |             # these might not exist in older files, so no big deal if we can't get them
236 |             task_id = pickle.read_int64()
237 |             parent_task_id = pickle.read_int64()
238 |             root_task_id = pickle.read_int64()
239 | 
240 |             child_task_id_count = pickle.read_int32()
241 |             if child_task_id_count != 0:
242 |                 raise SnssError("Child tasks should not be present when reading NavigationEntry")
243 |         except EasyPickleException:
244 |             pass
245 | 
246 |         return cls(
247 |             offset, id_type,
248 |             index, url, title, page_state_raw, transition_type, has_post_data, referrer_url, original_request_url,
249 |             is_overriding_user_agent, timestamp, http_status, referrer_policy, extended_map, task_id, parent_task_id,
250 |             root_task_id, session_id
251 |         )
252 | 
253 | 
254 | class UnprocessedEntry(SessionCommand):
255 |     ...
256 | 
257 | 
258 | class SnssFileType(enum.Enum):
259 |     Session = 1
260 |     Tab = 2
261 | 
262 | 
263 | class SnssFile:
264 |     def __init__(self, file_type: SnssFileType, stream: typing.BinaryIO):
265 |         # components/sessions/core/command_storage_backend.cc
266 |         self._f = stream
267 |         if file_type == SnssFileType.Session:
268 |             self._id_type = SessionRestoreIdType
269 |         elif file_type == SnssFileType.Tab:
270 |             self._id_type = TabRestoreIdType
271 |         else:
272 |             raise ValueError("file_type is an unknown SnssFileType or is not SnssFileType")
273 | 
274 |         self._file_type = file_type
275 |         header = self._f.read(8)
276 |         if header[0:4] != b"SNSS":
277 |             raise SnssError(f"Invalid magic; expected SNSS; got {header[0:4]}")
278 |         self._version, = struct.unpack("<I", header[4:8])
279 |         if self._version not in (1, 3):
280 |             raise SnssError(f"Expected version 1 or 3, got version {self._version}")
281 | 
282 |     @property
283 |     def file_type(self) -> SnssFileType:
284 |         return self._file_type
285 | 
286 |     def reset(self):
287 |         self._f.seek(8, os.SEEK_SET)
288 | 
289 |     def _get_next_session_command(self) -> typing.Optional[SessionCommand]:
290 |         # components/sessions/core/command_storage_backend.cc
291 |         start_offset = self._f.tell()
292 |         length_raw = self._f.read(2)
293 |         if not length_raw:
294 |             return None  # eof
295 |         length, = struct.unpack("<H", length_raw)
296 |         data = self._f.read(length)
297 |         if len(data) != length:
298 |             raise ValueError(f"Could not get enough data reading record starting at {start_offset}")
299 |         record_id_type = self._id_type(data[0])
300 | 
301 |         # components/sessions/core/session_service_commands.cc, components/sessions/core/base_session_service_commands.cc
302 |         # components/sessions/core/tab_restore_service_impl.cc
303 |         if record_id_type in (SessionRestoreIdType.CommandUpdateTabNavigation, TabRestoreIdType.CommandUpdateTabNavigation):  # 6, 1
304 |             with EasyPickleIterator(data[1:]) as pickle:
305 |                 session_id = pickle.read_int32()
306 |                 nav = NavigationEntry.from_pickle(pickle, record_id_type, start_offset, session_id)
307 |                 return nav
308 |         else:
309 |             return UnprocessedEntry(start_offset, record_id_type)
310 | 
311 |     def iter_session_commands(self) -> typing.Iterable[SessionCommand]:
312 |         self.reset()
313 |         while command := self._get_next_session_command():
314 |             yield command
315 | 
316 | 
317 | def main(args):
318 |     in_path = pathlib.Path(args[0])
319 |     if in_path.name.startswith("Session_"):
320 |         file_type = SnssFileType.Session
321 |     elif in_path.name.startswith("Tabs_"):
322 |         file_type = SnssFileType.Tab
323 |     else:
324 |         raise ValueError("File name does not start with Session or Tabs")
325 |     with in_path.open("rb") as f:
326 |         snss_file = SnssFile(file_type, f)
327 |         for command in snss_file.iter_session_commands():
328 |             print(command)
329 | 
330 | 
331 | if __name__ == '__main__':
332 |     main(sys.argv[1:])
333 | 


--------------------------------------------------------------------------------
/ccl_chromium_reader/ccl_shared_proto_db_downloads.py:
--------------------------------------------------------------------------------
  1 | """
  2 | Copyright 2022, CCL Forensics
  3 | 
  4 | Permission is hereby granted, free of charge, to any person obtaining a copy of
  5 | this software and associated documentation files (the "Software"), to deal in
  6 | the Software without restriction, including without limitation the rights to
  7 | use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
  8 | of the Software, and to permit persons to whom the Software is furnished to do
  9 | so, subject to the following conditions:
 10 | 
 11 | The above copyright notice and this permission notice shall be included in all
 12 | copies or substantial portions of the Software.
 13 | 
 14 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 15 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 16 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 17 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 18 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 19 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 20 | SOFTWARE.
 21 | """
 22 | 
 23 | __version__ = "0.3"
 24 | __description__ = "A module for reading downloads from the Chrome/Chromium shared_proto_db leveldb data store"
 25 | __contact__ = "Alex Caithness"
 26 | 
 27 | import datetime
 28 | import io
 29 | import os
 30 | import pathlib
 31 | import sys
 32 | import typing
 33 | 
 34 | from .storage_formats import ccl_leveldb
 35 | from .serialization_formats import ccl_protobuff as pb
 36 | from .download_common import Download
 37 | 
 38 | CHROME_EPOCH = datetime.datetime(1601, 1, 1, 0, 0, 0)
 39 | 
 40 | 
 41 | def chrome_milli_time(milliseconds: typing.Optional[int], allow_none=True) -> typing.Optional[datetime.datetime]:
 42 |     if milliseconds is not None:
 43 |         if milliseconds == 0xffffffffffffffff:
 44 |             return CHROME_EPOCH
 45 |         else:
 46 |             return CHROME_EPOCH + datetime.timedelta(milliseconds=milliseconds)
 47 |     elif allow_none:
 48 |         return None
 49 |     raise ValueError("milliseconds cannot be None")
 50 | 
 51 | 
 52 | def read_datetime(stream) -> typing.Optional[datetime.datetime]:
 53 |     return chrome_milli_time(pb.read_le_varint(stream))
 54 | 
 55 | 
 56 | # https://source.chromium.org/chromium/chromium/src/+/main:components/download/database/proto/download_entry.proto;l=86
 57 | 
 58 | HttpRequestHeader_Structure = {
 59 |     1: pb.ProtoDecoder("key", pb.read_string),
 60 |     2: pb.ProtoDecoder("value", pb.read_string)
 61 | }
 62 | 
 63 | ReceivedSlice_Structure = {
 64 |     1: pb.ProtoDecoder("offset", pb.read_le_varint),
 65 |     2: pb.ProtoDecoder("received_bytes", pb.read_le_varint),
 66 |     3: pb.ProtoDecoder("finished", lambda x: pb.read_le_varint(x) != 0)
 67 | }
 68 | 
 69 | InProgressInfo_Structure = {
 70 |     1: pb.ProtoDecoder("url_chain", pb.read_string),  # string
 71 |     2: pb.ProtoDecoder("referrer_url", pb.read_string),  # string
 72 |     3: pb.ProtoDecoder("site_url", pb.read_string),  # string  // deprecated
 73 |     4: pb.ProtoDecoder("tab_url", pb.read_string),  # string
 74 |     5: pb.ProtoDecoder("tab_referrer_url", pb.read_string),  # string
 75 |     6: pb.ProtoDecoder("fetch_error_body", lambda x: pb.read_le_varint(x) != 0),  # bool
 76 |     7: pb.ProtoDecoder("request_headers", lambda x: pb.read_embedded_protobuf(x, HttpRequestHeader_Structure, True)),  # HttpRequestHeader
 77 |     8: pb.ProtoDecoder("etag", pb.read_string),  # string
 78 |     9: pb.ProtoDecoder("last_modified", pb.read_string),  # string
 79 |     10: pb.ProtoDecoder("total_bytes", pb.read_le_varint),  # int64:
 80 |     11: pb.ProtoDecoder("mime_type", pb.read_string),  # string
 81 |     12: pb.ProtoDecoder("original_mime_type", pb.read_string),  # string
 82 |     13: pb.ProtoDecoder("current_path", pb.read_blob),  # bytes  // Serialized pickles to support string16: TODO
 83 |     14: pb.ProtoDecoder("target_path", pb.read_blob),  # bytes   // Serialized pickles to support string16: TODO
 84 |     15: pb.ProtoDecoder("received_bytes", pb.read_le_varint),  # int64:
 85 |     16: pb.ProtoDecoder("start_time", read_datetime),  # int64:
 86 |     17: pb.ProtoDecoder("end_time", read_datetime),  # int64:
 87 |     18: pb.ProtoDecoder("received_slices", lambda x: pb.read_embedded_protobuf(x, ReceivedSlice_Structure, True)),  # ReceivedSlice
 88 |     19: pb.ProtoDecoder("hash", pb.read_blob),  # string
 89 |     20: pb.ProtoDecoder("transient", lambda x: pb.read_le_varint(x) != 0),  # bool
 90 |     21: pb.ProtoDecoder("state", pb.read_le_varint32),  # int32:
 91 |     22: pb.ProtoDecoder("danger_type", pb.read_le_varint32),  # int32:
 92 |     23: pb.ProtoDecoder("interrupt_reason", pb.read_le_varint32),  # int32:
 93 |     24: pb.ProtoDecoder("paused", lambda x: pb.read_le_varint(x) != 0),  # bool
 94 |     25: pb.ProtoDecoder("metered", lambda x: pb.read_le_varint(x) != 0),  # bool
 95 |     26: pb.ProtoDecoder("bytes_wasted", pb.read_le_varint),  # int64:
 96 |     27: pb.ProtoDecoder("auto_resume_count", pb.read_le_varint32),  # int32:
 97 |     # 28: pb.ProtoDecoder("download_schedule", None)  # DownloadSchedule  // // Deprecated.
 98 |     # 29: pb.ProtoDecoder("reroute_info", pb),  # enterprise_connectors.DownloadItemRerouteInfo TODO
 99 |     30: pb.ProtoDecoder("credentials_mode", pb.read_le_varint32),  # int32:  // network::mojom::CredentialsMode
100 |     31: pb.ProtoDecoder("range_request_from", pb.read_le_varint),  # int64:
101 |     32: pb.ProtoDecoder("range_request_to", pb.read_le_varint),  # int64:
102 |     33: pb.ProtoDecoder("serialized_embedder_download_data", pb.read_string)  # string
103 | }
104 | 
105 | DownloadInfo_structure = {
106 |     1: pb.ProtoDecoder("guid", pb.read_string),
107 |     2: pb.ProtoDecoder("id", pb.read_le_varint32),
108 |     # 3 UkmInfo
109 |     4: pb.ProtoDecoder("in_progress_info", lambda x: pb.read_embedded_protobuf(x, InProgressInfo_Structure, True))
110 | }
111 | 
112 | DownloadDbEntry_structure = {
113 |     1: pb.ProtoDecoder("download_info", lambda x: pb.read_embedded_protobuf(x, DownloadInfo_structure, True))
114 | }
115 | 
116 | 
117 | def read_downloads(
118 |         shared_proto_db_folder: typing.Union[str, os.PathLike],
119 |         *, handle_errors=False, utf16_paths=True) -> typing.Iterator[Download]:
120 |     ldb_path = pathlib.Path(shared_proto_db_folder)
121 |     with ccl_leveldb.RawLevelDb(ldb_path) as ldb:
122 |         for rec in ldb.iterate_records_raw():
123 |             if rec.state != ccl_leveldb.KeyState.Live:
124 |                 continue
125 | 
126 |             key = rec.user_key
127 |             record_type, specific_key = key.split(b"_", 1)
128 |             if record_type == b"21":
129 |                 with io.BytesIO(rec.value) as f:
130 |                     obj = pb.ProtoObject(
131 |                         0xa, "root", pb.read_protobuff(f, DownloadDbEntry_structure, use_friendly_tag=True))
132 |                 try:
133 |                     download = Download.from_pb(rec.seq, obj, target_path_is_utf_16=utf16_paths)
134 |                 except ValueError as ex:
135 |                     print(f"Error reading a download: {ex}", file=sys.stderr)
136 |                     if handle_errors:
137 |                         continue
138 |                     else:
139 |                         raise
140 | 
141 |                 yield download
142 | 
143 | 
144 | def report_downloads(
145 |         shared_proto_db_folder: typing.Union[str, os.PathLike],
146 |         out_csv_path: typing.Union[str, os.PathLike], utf16_paths=True):
147 | 
148 |     with pathlib.Path(out_csv_path).open("tx", encoding="utf-8", newline="") as out:
149 |         writer = csv.writer(out, csv.excel, quoting=csv.QUOTE_ALL, quotechar="\"", escapechar="\\")
150 |         writer.writerow([
151 |             "seq no",
152 |             "guid",
153 |             "start time",
154 |             "end time",
155 |             "tab url",
156 |             "tab referrer url",
157 |             "download url chain",
158 |             "target path",
159 |             "hash",
160 |             "total bytes",
161 |             "mime type",
162 |             "original mime type"
163 |         ])
164 |         for download in read_downloads(shared_proto_db_folder, handle_errors=True, utf16_paths=utf16_paths):
165 |             writer.writerow([
166 |                 str(download.level_db_seq_no),
167 |                 str(download.guid),
168 |                 download.start_time,
169 |                 download.end_time,
170 |                 download.tab_url,
171 |                 download.tab_referrer_url,
172 |                 " -> ".join(download.url_chain),
173 |                 download.target_path,
174 |                 download.hash,
175 |                 download.total_bytes,
176 |                 download.mime_type,
177 |                 download.original_mime_type
178 |             ])
179 | 
180 | 
181 | if __name__ == '__main__':
182 |     import csv
183 |     if len(sys.argv) < 3:
184 |         print(f"USAGE: {pathlib.Path(sys.argv[0]).name} <shared_proto_db folder> <out.csv> [-u8]")
185 |         print()
186 |         print("-u8\tutf-8 target paths (use this if target paths appear garbled in the output)")
187 |         print()
188 |         exit(1)
189 |     report_downloads(sys.argv[1], sys.argv[2], "-u8" not in sys.argv[3:])
190 | 


--------------------------------------------------------------------------------
/ccl_chromium_reader/common.py:
--------------------------------------------------------------------------------
 1 | import re
 2 | import typing
 3 | import collections.abc as col_abc
 4 | 
 5 | 
 6 | KeySearch = typing.Union[str, re.Pattern, col_abc.Collection[str], col_abc.Callable[[str], bool]]
 7 | 
 8 | 
 9 | def is_keysearch_hit(search: KeySearch, value: str):
10 |     if isinstance(search, str):
11 |         return value == search
12 |     elif isinstance(search, re.Pattern):
13 |         return search.search(value) is not None
14 |     elif isinstance(search, col_abc.Collection):
15 |         return value in set(search)
16 |     elif isinstance(search, col_abc.Callable):
17 |         return search(value)
18 |     else:
19 |         raise TypeError(f"Unexpected type: {type(search)} (expects: {KeySearch})")


--------------------------------------------------------------------------------
/ccl_chromium_reader/download_common.py:
--------------------------------------------------------------------------------
 1 | import dataclasses
 2 | import datetime
 3 | import struct
 4 | import enum
 5 | 
 6 | from .serialization_formats import ccl_protobuff as pb
 7 | 
 8 | 
 9 | class DownloadSource(enum.Enum):
10 |     shared_proto_db = 1
11 |     history_db = 2
12 | 
13 | 
14 | @dataclasses.dataclass(frozen=True)
15 | class Download:  # TODO: all of the parameters
16 |     record_source: DownloadSource
17 |     record_id: int
18 |     guid: str
19 |     hash: str
20 |     url_chain: tuple[str, ...]
21 |     tab_url: str
22 |     tab_referrer_url: str
23 |     target_path: str
24 |     mime_type: str
25 |     original_mime_type: str
26 |     total_bytes: str
27 |     start_time: datetime.datetime
28 |     end_time: datetime.datetime
29 | 
30 |     @property
31 |     def level_db_seq_no(self):
32 |         if self.record_source == DownloadSource.shared_proto_db:
33 |             return self.record_id
34 | 
35 |     @property
36 |     def record_location(self) -> str:
37 |         if self.record_source == DownloadSource.shared_proto_db:
38 |             return f"Leveldb Seq: {self.record_id}"
39 |         elif self.record_source == DownloadSource.history_db:
40 |             return f"SQLite Rowid: {self.record_id}"
41 |         raise NotImplementedError()
42 | 
43 |     @property
44 |     def url(self) -> str:
45 |         return self.url_chain[-1]
46 | 
47 |     @property
48 |     def file_size(self) -> int:
49 |         return int(self.total_bytes)
50 | 
51 |     @classmethod
52 |     def from_pb(cls, seq: int, proto: pb.ProtoObject, *, target_path_is_utf_16=True):
53 |         if not proto.only("download_info").value:
54 |             raise ValueError("download_info is empty")
55 |         target_path_raw = proto.only("download_info").only("in_progress_info").only("target_path").value
56 |         path_proto_length, path_char_count = struct.unpack("<II", target_path_raw[0:8])
57 |         if path_proto_length != len(target_path_raw) - 4:
58 |             raise ValueError("Invalid pickle for target path")
59 |         if target_path_is_utf_16:
60 |             target_path = target_path_raw[8: 8 + (path_char_count * 2)].decode("utf-16-le")
61 |         else:
62 |             target_path = target_path_raw[8: 8 + path_char_count].decode("utf-8")
63 | 
64 |         return cls(
65 |             DownloadSource.shared_proto_db,
66 |             seq,
67 |             proto.only("download_info").only("guid").value,
68 |             proto.only("download_info").only("in_progress_info").only("hash").value.hex(),
69 |             tuple(x.value for x in proto.only("download_info").only("in_progress_info")["url_chain"]),
70 |             proto.only("download_info").only("in_progress_info").only("tab_url").value,
71 |             proto.only("download_info").only("in_progress_info").only("tab_url_referrer").value,
72 |             target_path,
73 |             proto.only("download_info").only("in_progress_info").only("mime_type").value,
74 |             proto.only("download_info").only("in_progress_info").only("original_mime_type").value,
75 |             proto.only("download_info").only("in_progress_info").only("total_bytes").value,
76 |             proto.only("download_info").only("in_progress_info").only("start_time").value,
77 |             proto.only("download_info").only("in_progress_info").only("end_time").value,
78 |         )
79 | 


--------------------------------------------------------------------------------
/ccl_chromium_reader/profile_folder_protocols.py:
--------------------------------------------------------------------------------
  1 | import datetime
  2 | import pathlib
  3 | import typing
  4 | import collections.abc as col_abc
  5 | 
  6 | from .common import KeySearch, is_keysearch_hit
  7 | 
  8 | 
  9 | class HasRecordLocationProtocol(typing.Protocol):
 10 |     @property
 11 |     def record_location(self) -> str:
 12 |         raise NotImplementedError()
 13 | 
 14 | 
 15 | @typing.runtime_checkable
 16 | class LocalStorageRecordProtocol(HasRecordLocationProtocol, typing.Protocol):
 17 |     @property
 18 |     def storage_key(self) -> str:
 19 |         raise NotImplementedError()
 20 | 
 21 |     @property
 22 |     def script_key(self) -> str:
 23 |         raise NotImplementedError()
 24 | 
 25 |     @property
 26 |     def value(self) -> str:
 27 |         raise NotImplementedError()
 28 | 
 29 | 
 30 | @typing.runtime_checkable
 31 | class SessionStorageRecordProtocol(HasRecordLocationProtocol, typing.Protocol):
 32 |     host: typing.Optional[str]
 33 |     key: str
 34 |     value: str
 35 | 
 36 | 
 37 | @typing.runtime_checkable
 38 | class HistoryRecordProtocol(HasRecordLocationProtocol, typing.Protocol):
 39 |     url: str
 40 |     title: str
 41 |     visit_time: datetime.datetime
 42 |     # TODO: Assess whether the parent/child visits can be part of the protocol
 43 | 
 44 | 
 45 | @typing.runtime_checkable
 46 | class IdbKeyProtocol(typing.Protocol):
 47 |     raw_key: bytes
 48 |     value: typing.Any
 49 | 
 50 | 
 51 | @typing.runtime_checkable
 52 | class IndexedDbRecordProtocol(HasRecordLocationProtocol, typing.Protocol):
 53 |     key: IdbKeyProtocol
 54 |     value: typing.Any
 55 | 
 56 | 
 57 | class CacheMetadataProtocol(typing.Protocol):
 58 |     request_time: datetime.datetime
 59 |     http_header_attributes: typing.Iterable[tuple[str, str]]
 60 | 
 61 |     def get_attribute(self, attribute: str) -> list[str]:
 62 |         raise NotImplementedError()
 63 | 
 64 | 
 65 | class CacheKeyProtocol(typing.Protocol):
 66 |     raw_key: str
 67 |     url: str
 68 | 
 69 | 
 70 | class CacheRecordProtocol(typing.Protocol):
 71 |     key: CacheKeyProtocol
 72 |     metadata: CacheMetadataProtocol
 73 |     data: bytes
 74 |     metadata_location: typing.Any
 75 |     data_location: typing.Any
 76 |     was_decompressed: bool
 77 | 
 78 | 
 79 | class DownloadRecordProtocol(HasRecordLocationProtocol, typing.Protocol):
 80 |     url: str
 81 |     start_time: typing.Optional[datetime.datetime]
 82 |     end_time: typing.Optional[datetime.datetime]
 83 |     target_path: typing.Optional[str]
 84 |     file_size: int
 85 | 
 86 | 
 87 | @typing.runtime_checkable
 88 | class BrowserProfileProtocol(typing.Protocol):
 89 |     def close(self):
 90 |         raise NotImplementedError()
 91 | 
 92 |     def iter_local_storage_hosts(self) -> col_abc.Iterable[str]:
 93 |         """
 94 |         Iterates the hosts in this profile's local storage
 95 |         """
 96 |         raise NotImplementedError()
 97 | 
 98 |     def iter_local_storage(
 99 |             self, storage_key: typing.Optional[KeySearch] = None, script_key: typing.Optional[KeySearch] = None, *,
100 |             include_deletions=False, raise_on_no_result=False) -> col_abc.Iterable[LocalStorageRecordProtocol]:
101 |         """
102 |         Iterates this profile's local storage records
103 | 
104 |         :param storage_key: storage key (host) for the records. This can be one of: a single string;
105 |         a collection of strings; a regex pattern; a function that takes a string and returns a bool.
106 |         :param script_key: script defined key for the records. This can be one of: a single string;
107 |         a collection of strings; a regex pattern; a function that takes a string and returns a bool.
108 |         :param include_deletions: if True, records related to deletions will be included
109 |         :param raise_on_no_result: if True (the default) if no matching storage keys are found, raise a KeyError
110 |         (these will have None as values).
111 |         :return:
112 |         """
113 |         raise NotImplementedError()
114 | 
115 |     def iter_session_storage_hosts(self) -> col_abc.Iterable[str]:
116 |         """
117 |         Iterates this profile's session storage hosts
118 |         """
119 |         raise NotImplementedError()
120 | 
121 |     def iter_session_storage(
122 |             self, host: typing.Optional[KeySearch] = None, key: typing.Optional[KeySearch] = None, *,
123 |             include_deletions=False, raise_on_no_result=False) -> col_abc.Iterable[SessionStorageRecordProtocol]:
124 |         """
125 |         Iterates this profile's session storage records
126 | 
127 |         :param host: storage key (host) for the records. This can be one of: a single string;
128 |         a collection of strings; a regex pattern; a function that takes a string (each host) and
129 |         returns a bool; or None (the default) in which case all hosts are considered.
130 |         :param key: script defined key for the records. This can be one of: a single string;
131 |         a collection of strings; a regex pattern; a function that takes a string and returns a bool; or
132 |         None (the default) in which case all keys are considered.
133 |         :param include_deletions: if True, records related to deletions will be included (these will have None as
134 |         values).
135 |         :param raise_on_no_result: if True, if no matching storage keys are found, raise a KeyError
136 | 
137 |         :return: iterable of SessionStoreValue
138 |         """
139 |         raise NotImplementedError()
140 | 
141 |     def iter_indexeddb_hosts(self) -> col_abc.Iterable[str]:
142 |         """
143 |         Iterates the hosts present in the Indexed DB folder. These values are what should be used to load the databases
144 |         directly.
145 |         """
146 |         raise NotImplementedError()
147 | 
148 |     def get_indexeddb(self, host: str):
149 |         """
150 |         Returns the database with the host provided. Should be one of the values returned by
151 |         :func:`~iter_indexeddb_hosts`. The database will be opened on-demand if it hasn't previously been opened.
152 | 
153 |         :param host: the host to get
154 |         """
155 |         # TODO typehint return type once it's also abstracted
156 |         raise NotImplementedError()
157 | 
158 |     def iter_indexeddb_records(
159 |             self, host_id: typing.Optional[KeySearch], database_name: typing.Optional[KeySearch] = None,
160 |             object_store_name: typing.Optional[KeySearch] = None, *,
161 |             raise_on_no_result=False, include_deletions=False,
162 |             bad_deserializer_data_handler=None) -> col_abc.Iterable[IndexedDbRecordProtocol]:
163 |         """
164 |         Iterates indexeddb records in this profile.
165 | 
166 |         :param host_id: the host for the records, relates to the host-named folder in the IndexedDB folder. The
167 |         possible values for this profile are returned by :func:`~iter_indexeddb_hosts`. This can be one of:
168 |         a single string; a collection of strings; a regex pattern; a function that takes a string (each host) and
169 |         returns a bool; or None in which case all hosts are considered. Be cautious with supplying a parameter
170 |         which will lead to unnecessary databases being opened as this has a set-up time for the first time it
171 |         is opened.
172 |         :param database_name: the database name for the records. This can be one of: a single string; a collection
173 |         of strings; a regex pattern; a function that takes a string (each host) and returns a bool; or None (the
174 |         default) in which case all hosts are considered.
175 |         :param object_store_name: the object store name of the records. This can be one of: a single string;
176 |         a collection of strings; a regex pattern; a function that takes a string (each host) and returns a bool;
177 |         or None (the default) in which case all hosts are considered.
178 |         :param raise_on_no_result: if True, if no matching storage keys are found, raise a KeyError
179 |         :param include_deletions: if True, records related to deletions will be included (these will have None as
180 |         values).
181 |         :param bad_deserializer_data_handler: a callback function which will be executed by the underlying
182 |         indexeddb reader if invalid data is encountered during reading a record, rather than raising an exception.
183 |         The function should take two arguments: an IdbKey object (which is the key of the bad record) and a bytes
184 |         object (which is the raw data). The return value of the callback is ignored by the calling code. If this is
185 |         None (the default) then any bad data will cause an exception to be raised.
186 |         """
187 |         raise NotImplementedError()
188 | 
189 |     def iterate_history_records(
190 |             self, url: typing.Optional[KeySearch]=None, *,
191 |             earliest: typing.Optional[datetime.datetime]=None,
192 |             latest: typing.Optional[datetime.datetime]=None) -> col_abc.Iterable[HistoryRecordProtocol]:
193 |         """
194 |         Iterates history records for this profile.
195 | 
196 |         :param url: a URL to search for. This can be one of: a single string; a collection of strings;
197 |         a regex pattern; a function that takes a string (each host) and returns a bool; or None (the
198 |         default) in which case all hosts are considered.
199 |         :param earliest: an optional datetime which will be used to exclude records before this date.
200 |         NB the date should be UTC to match the database. If None, no lower limit will be placed on
201 |         timestamps.
202 |         :param latest: an optional datetime which will be used to exclude records after this date.
203 |         NB the date should be UTC to match the database. If None, no upper limit will be placed on
204 |         timestamps.
205 |         """
206 |         # TODO typehint return type once it's also abstracted
207 |         raise NotImplementedError()
208 | 
209 |     def iterate_cache(
210 |             self,
211 |             url: typing.Optional[KeySearch]=None, *, decompress=True, omit_cached_data=False,
212 |             **kwargs: typing.Union[bool, KeySearch]) -> col_abc.Iterable[CacheRecordProtocol]:
213 |         """
214 |         Iterates cache records for this profile.
215 | 
216 |         :param url: a URL to search for. This can be one of: a single string; a collection of strings;
217 |         a regex pattern; a function that takes a string (each host) and returns a bool; or None (the
218 |         default) in which case all records are considered.
219 |         :param decompress: if True (the default), data from the cache which is compressed (as per the
220 |         content-encoding header field) will be decompressed when read if the compression format is
221 |         supported (currently deflate, gzip and brotli are supported).
222 |         :param omit_cached_data: does not collect the cached data and omits it from each `CacheResult`
223 |         object. Should be faster in cases when only metadata recovery is required.
224 |         :param kwargs: further keyword arguments are used to search based upon header fields. The
225 |         keyword should be the header field name, with underscores replacing hyphens (e.g.,
226 |         content-encoding, becomes content_encoding). The value should be one of: a Boolean (in which
227 |         case only records with this field present will be included if True, and vice versa); a single
228 |         string; a collection of strings; a regex pattern; a function that takes a string (the value)
229 |         and returns a bool.
230 |         """
231 |         raise NotImplementedError()
232 | 
233 |     def iter_downloads(
234 |             self, *, download_url: typing.Optional[KeySearch]=None,
235 |             tab_url: typing.Optional[KeySearch]=None) -> col_abc.Iterable[DownloadRecordProtocol]:
236 |         """
237 |         Iterates download records for this profile
238 | 
239 |         :param download_url: A URL related to the downloaded resource. This can be one of: a single string;
240 |         a collection of strings; a regex pattern; a function that takes a string (each host) and returns a bool;
241 |         or None (the default) in which case all records are considered.
242 |         :param tab_url: A URL related to the page the user was accessing when this download was started.
243 |         This can be one of: a single string; a collection of strings; a regex pattern; a function that takes
244 |         a string (each host) and returns a bool; or None (the default) in which case all records are considered.
245 |         """
246 |         raise NotImplementedError()
247 | 
248 |     @property
249 |     def path(self) -> pathlib.Path:
250 |         """The input path of this browser profile"""
251 |         raise NotImplementedError()
252 | 
253 |     @property
254 |     def local_storage(self):
255 |         """The local storage object for this browser profile"""
256 |         raise NotImplementedError()
257 | 
258 |     @property
259 |     def session_storage(self):
260 |         """The session storage object for this browser profile"""
261 |         raise NotImplementedError()
262 | 
263 |     @property
264 |     def cache(self):
265 |         """The cache for this browser profile"""
266 |         raise NotImplementedError()
267 | 
268 |     @property
269 |     def history(self):
270 |         """The history for this browser profile"""
271 |         raise NotImplementedError()
272 | 
273 |     @property
274 |     def browser_type(self) -> str:
275 |         """The name of the browser type for this profile"""
276 |         raise NotImplementedError()
277 | 


--------------------------------------------------------------------------------
/ccl_chromium_reader/serialization_formats/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cclgroupltd/ccl_chromium_reader/552516720761397c4d482908b6b8b08130b313a1/ccl_chromium_reader/serialization_formats/__init__.py


--------------------------------------------------------------------------------
/ccl_chromium_reader/serialization_formats/ccl_blink_value_deserializer.py:
--------------------------------------------------------------------------------
  1 | """
  2 | Copyright 2020, CCL Forensics
  3 | 
  4 | Permission is hereby granted, free of charge, to any person obtaining a copy of
  5 | this software and associated documentation files (the "Software"), to deal in
  6 | the Software without restriction, including without limitation the rights to
  7 | use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
  8 | of the Software, and to permit persons to whom the Software is furnished to do
  9 | so, subject to the following conditions:
 10 | 
 11 | The above copyright notice and this permission notice shall be included in all
 12 | copies or substantial portions of the Software.
 13 | 
 14 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 15 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 16 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 17 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 18 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 19 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 20 | SOFTWARE.
 21 | """
 22 | 
 23 | import sys
 24 | import enum
 25 | import typing
 26 | from dataclasses import dataclass
 27 | 
 28 | from . import ccl_v8_value_deserializer
 29 | 
 30 | # See: https://chromium.googlesource.com/chromium/src/third_party/+/master/blink/renderer/bindings/core/v8/serialization
 31 | #      https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/renderer/bindings/modules/v8/serialization/v8_script_value_serializer_for_modules.cc
 32 | 
 33 | 
 34 | # WebCoreStrings are read as (length:uint32_t, string:UTF8[length]).
 35 | # RawStrings are read as (length:uint32_t, string:UTF8[length]).
 36 | # RawUCharStrings are read as
 37 | #     (length:uint32_t, string:UChar[length/sizeof(UChar)]).
 38 | # RawFiles are read as
 39 | #     (path:WebCoreString, url:WebCoreStrng, type:WebCoreString).
 40 | # There is a reference table that maps object references (uint32_t) to
 41 | # v8::Values.
 42 | # Tokens marked with (ref) are inserted into the reference table and given the
 43 | # next object reference ID after decoding.
 44 | # All tags except InvalidTag, PaddingTag, ReferenceCountTag, VersionTag,
 45 | # GenerateFreshObjectTag and GenerateFreshArrayTag push their results to the
 46 | # deserialization stack.
 47 | # There is also an 'open' stack that is used to resolve circular references.
 48 | # Objects or arrays may contain self-references. Before we begin to deserialize
 49 | # the contents of these values, they are first given object reference IDs (by
 50 | # GenerateFreshObjectTag/GenerateFreshArrayTag); these reference IDs are then
 51 | # used with ObjectReferenceTag to tie the recursive knot.
 52 | 
 53 | __version__ = "0.3"
 54 | __description__ = "Partial reimplementation of the Blink Javascript Object Serialization"
 55 | __contact__ = "Alex Caithness"
 56 | 
 57 | __DEBUG = False
 58 | 
 59 | 
 60 | def log(msg, debug_only=True):
 61 |     if __DEBUG or not debug_only:
 62 |         caller_name = sys._getframe(1).f_code.co_name
 63 |         caller_line = sys._getframe(1).f_code.co_firstlineno
 64 |         print(f"{caller_name} ({caller_line}):\t{msg}")
 65 | 
 66 | 
 67 | class BlobIndexType(enum.Enum):
 68 |     Blob = 0
 69 |     File = 1
 70 | 
 71 | 
 72 | @dataclass
 73 | class BlobIndex:
 74 |     index_type: BlobIndexType
 75 |     index_id: int
 76 | 
 77 | 
 78 | @dataclass(frozen=True)
 79 | class NativeFileHandle:
 80 |     is_dir: bool
 81 |     name: str
 82 |     token_index: int
 83 | 
 84 | 
 85 | @dataclass(frozen=True)
 86 | class CryptoKey:
 87 |     sub_type: "V8CryptoKeySubType"
 88 |     algorithm_type: typing.Optional["V8CryptoKeyAlgorithm"]
 89 |     hash_type: typing.Optional["V8CryptoKeyAlgorithm"]
 90 |     asymmetric_key_type: typing.Optional["V8AsymmetricCryptoKeyType"]
 91 |     byte_length: typing.Optional[int]
 92 |     public_exponent: typing.Optional[bytes]
 93 |     named_curve_type: typing.Optional["V8CryptoNamedCurve"]
 94 |     key_usage: "V8CryptoKeyUsage"
 95 |     key_data: bytes
 96 | 
 97 | 
 98 | class Constants:
 99 |     tag_kMessagePortTag = b"M"  # index:int -> MessagePort. Fills the result with
100 |                                 # transferred MessagePort.
101 |     tag_kMojoHandleTag = b"h"   # index:int -> MojoHandle. Fills the result with
102 |                                 # transferred MojoHandle.
103 |     tag_kBlobTag = b"b"         # uuid:WebCoreString, type:WebCoreString, size:uint64_t ->
104 |                                 # Blob (ref)
105 |     tag_kBlobIndexTag = b"i"    # index:int32_t -> Blob (ref)
106 |     tag_kFileTag = b"f"         # file:RawFile -> File (ref)
107 |     tag_kFileIndexTag = b"e"    # index:int32_t -> File (ref)
108 |     tag_kDOMFileSystemTag = b"d"  # type : int32_t, name:WebCoreString,
109 |                                   # uuid:WebCoreString -> FileSystem (ref)
110 |     tag_kNativeFileSystemFileHandleTag = b"n"  # name:WebCoreString, index:uint32_t
111 |                                                # -> NativeFileSystemFileHandle (ref)
112 |     tag_kNativeFileSystemDirectoryHandleTag = b"N"  # name:WebCoreString, index:uint32_t ->
113 |                                                    # NativeFileSystemDirectoryHandle (ref)
114 |     tag_kFileListTag = b"l"                     # length:uint32_t, files:RawFile[length] -> FileList (ref)
115 |     tag_kFileListIndexTag = b"L"                # length:uint32_t, files:int32_t[length] -> FileList (ref)
116 |     tag_kImageDataTag = b"#"                   # tags terminated by ImageSerializationTag::kEnd (see
117 |                                                # SerializedColorParams.h), width:uint32_t,
118 |                                                # height:uint32_t, pixelDataLength:uint64_t,
119 |                                                # data:byte[pixelDataLength]
120 |                                                # -> ImageData (ref)
121 |     tag_kImageBitmapTag = b"g"        # tags terminated by ImageSerializationTag::kEnd (see
122 |                                       # SerializedColorParams.h), width:uint32_t,
123 |                                       # height:uint32_t, pixelDataLength:uint32_t,
124 |                                       # data:byte[pixelDataLength]
125 |                                       # -> ImageBitmap (ref)
126 |     tag_kImageBitmapTransferTag = b"G"      # index:uint32_t -> ImageBitmap. For ImageBitmap transfer
127 |     tag_kOffscreenCanvasTransferTag = b"H"  # index, width, height, id,
128 |                                             # filter_quality::uint32_t ->
129 |                                             # OffscreenCanvas. For OffscreenCanvas
130 |                                             # transfer
131 |     tag_kReadableStreamTransferTag = b"r"    # index:uint32_t
132 |     tag_kTransformStreamTransferTag = b"m"   # index:uint32_t
133 |     tag_kWritableStreamTransferTag = b"w"    # index:uint32_t
134 |     tag_kDOMPointTag = b"Q"                  # x:Double, y:Double, z:Double, w:Double
135 |     tag_kDOMPointReadOnlyTag = b"W"          # x:Double, y:Double, z:Double, w:Double
136 |     tag_kDOMRectTag = b"E"                   # x:Double, y:Double, width:Double, height:Double
137 |     tag_kDOMRectReadOnlyTag = b"R"           # x:Double, y:Double, width:Double, height:Double
138 |     tag_kDOMQuadTag = b"T"                   # p1:Double, p2:Double, p3:Double, p4:Double
139 |     tag_kDOMMatrixTag = b"Y"                 # m11..m44: 16 Double
140 |     tag_kDOMMatrixReadOnlyTag = b"U"         # m11..m44: 16 Double
141 |     tag_kDOMMatrix2DTag = b"I"               # a..f: 6 Double
142 |     tag_kDOMMatrix2DReadOnlyTag = b"O"       # a..f: 6 Double
143 |     tag_kCryptoKeyTag = b"K"                 # subtag:byte, props, usages:uint32_t,
144 |     # keyDataLength:uint32_t, keyData:byte[keyDataLength]
145 |     #                 If subtag=AesKeyTag:
146 |     #                     props = keyLengthBytes:uint32_t, algorithmId:uint32_t
147 |     #                 If subtag=HmacKeyTag:
148 |     #                     props = keyLengthBytes:uint32_t, hashId:uint32_t
149 |     #                 If subtag=RsaHashedKeyTag:
150 |     #                     props = algorithmId:uint32_t, type:uint32_t,
151 |     #                     modulusLengthBits:uint32_t,
152 |     #                     publicExponentLength:uint32_t,
153 |     #                     publicExponent:byte[publicExponentLength],
154 |     #                     hashId:uint32_t
155 |     #                 If subtag=EcKeyTag:
156 |     #                     props = algorithmId:uint32_t, type:uint32_t,
157 |     #                     namedCurve:uint32_t
158 |     tag_kRTCCertificateTag = b"k"  # length:uint32_t, pemPrivateKey:WebCoreString,
159 |     # pemCertificate:WebCoreString
160 |     tag_kRTCEncodedAudioFrameTag = b"A"  # uint32_t -> transferred audio frame ID
161 |     tag_kRTCEncodedVideoFrameTag = b"V"  # uint32_t -> transferred video frame ID
162 |     tag_kVideoFrameTag = b"v"            # uint32_t -> transferred video frame ID
163 | 
164 |     # The following tags were used by the Shape Detection API implementation
165 |     # between M71 and M81. During these milestones, the API was always behind
166 |     # a flag. Usage was removed in https:#crrev.com/c/2040378.
167 |     tag_kDeprecatedDetectedBarcodeTag = b"B"
168 |     tag_kDeprecatedDetectedFaceTag = b"F"
169 |     tag_kDeprecatedDetectedTextTag = b"t"
170 | 
171 |     tag_kDOMExceptionTag = b"x"  # name:String,message:String,stack:String
172 |     tag_kVersionTag = b"\xff"  # version:uint32_t -> Uses this as the file version.
173 |     tag_kTrailerOffsetTag = b"\xfe" # offset:uint64_t (fixed width, network order) from buffer, start size:uint32_t (fixed width, network order)
174 |     tag_kTrailerRequiresInterfacesTag = b"\xA0"
175 | 
176 | 
177 | class V8CryptoKeySubType(enum.IntEnum):
178 |     """
179 |     See: third_party/blink/renderer/bindings/modules/v8/serialization/web_crypto_sub_tags.h
180 |     Used by the kCryptoKeyTag type
181 |     """
182 |     AesKey = 1
183 |     HmacKey = 2
184 |     # ID 3 was used by RsaKeyTag, while still behind experimental flag.
185 |     RsaHashedKey = 4
186 |     EcKey = 5
187 |     NoParamsKey = 6
188 | 
189 | 
190 | class V8CryptoKeyAlgorithm(enum.IntEnum):
191 |     """
192 |     See: third_party/blink/renderer/bindings/modules/v8/serialization/web_crypto_sub_tags.h
193 |     Used by the kCryptoKeyTag type
194 |     """
195 |     AesCbcTag = 1
196 |     HmacTag = 2
197 |     RsaSsaPkcs1v1_5Tag = 3
198 |     # ID 4 was used by RsaEs, while still behind experimental flag.
199 |     Sha1Tag = 5
200 |     Sha256Tag = 6
201 |     Sha384Tag = 7
202 |     Sha512Tag = 8
203 |     AesGcmTag = 9
204 |     RsaOaepTag = 10
205 |     AesCtrTag = 11
206 |     AesKwTag = 12
207 |     RsaPssTag = 13
208 |     EcdsaTag = 14
209 |     EcdhTag = 15
210 |     HkdfTag = 16
211 |     Pbkdf2Tag = 17
212 | 
213 | 
214 | class V8AsymmetricCryptoKeyType(enum.IntEnum):
215 |     Public = 1
216 |     Private = 2
217 | 
218 | 
219 | class V8CryptoNamedCurve(enum.IntEnum):
220 |     """
221 |     See: third_party/blink/renderer/bindings/modules/v8/serialization/web_crypto_sub_tags.h
222 |     Used by the kCryptoKeyTag type
223 |     """
224 |     P256 = 1
225 |     P384 = 2
226 |     P521 = 3
227 | 
228 | 
229 | class V8CryptoKeyUsage(enum.IntFlag):
230 |     """
231 |     See: third_party/blink/renderer/bindings/modules/v8/serialization/web_crypto_sub_tags.h
232 |     Used by the kCryptoKeyTag type
233 |     """
234 |     kExtractableUsage = 1 << 0
235 |     kEncryptUsage = 1 << 1
236 |     kDecryptUsage = 1 << 2
237 |     kSignUsage = 1 << 3
238 |     kVerifyUsage = 1 << 4
239 |     kDeriveKeyUsage = 1 << 5
240 |     kWrapKeyUsage = 1 << 6
241 |     kUnwrapKeyUsage = 1 << 7
242 |     kDeriveBitsUsage = 1 << 8
243 | 
244 | 
245 | class BlinkV8Deserializer:
246 |     def _read_varint(self, stream) -> int:
247 |         return ccl_v8_value_deserializer.read_le_varint(stream)[0]
248 | 
249 |     def _read_varint32(self, stream) -> int:
250 |         return ccl_v8_value_deserializer.read_le_varint(stream, is_32bit=True)[0]
251 | 
252 |     def _read_utf8_string(self, stream: typing.BinaryIO) -> str:
253 |         length = self._read_varint32(stream)
254 |         raw_string = stream.read(length)
255 |         if len(raw_string) != length:
256 |             raise ValueError("Could not read all of the utf-8 data")
257 |         return raw_string.decode("utf-8")
258 | 
259 |     # def _read_uint32(self, stream: typing.BinaryIO) -> int:
260 |     #     raw = stream.read(4)
261 |     #     if len(raw) < 4:
262 |     #         raise ValueError("Could not read enough data when reading int32")
263 |     #     return struct.unpack("<I", raw)[0]
264 | 
265 |     def _read_file_index(self, stream: typing.BinaryIO) -> BlobIndex:
266 |         return BlobIndex(BlobIndexType.File, self._read_varint(stream))
267 | 
268 |     def _read_blob_index(self, stream: typing.BinaryIO) -> BlobIndex:
269 |         return BlobIndex(BlobIndexType.Blob, self._read_varint(stream))
270 | 
271 |     def _read_file_list_index(self, stream: typing.BinaryIO) -> typing.Iterable[BlobIndex]:
272 |         length = self._read_varint(stream)
273 |         result = [self._read_file_index(stream) for _ in range(length)]
274 |         return result
275 | 
276 |     def _read_native_file_handle(self, is_dir: bool, stream: typing.BinaryIO) -> NativeFileHandle:
277 |         return NativeFileHandle(is_dir, self._read_utf8_string(stream), self._read_varint(stream))
278 | 
279 |     def _read_crypto_key(self, stream: typing.BinaryIO):
280 |         sub_type = V8CryptoKeySubType(stream.read(1)[0])
281 | 
282 |         if sub_type == V8CryptoKeySubType.AesKey:
283 |             algorithm_id = V8CryptoKeyAlgorithm(self._read_varint32(stream))
284 |             byte_length = self._read_varint32(stream)
285 |             params = {
286 |                 "algorithm_type": algorithm_id,
287 |                 "byte_length": byte_length,
288 |                 "hash_type": None,
289 |                 "named_curve_type": None,
290 |                 "asymmetric_key_type": None,
291 |                 "public_exponent": None
292 |             }
293 |         elif sub_type == V8CryptoKeySubType.HmacKey:
294 |             byte_length = self._read_varint32(stream)
295 |             hash_id = V8CryptoKeyAlgorithm(self._read_varint32(stream))
296 |             params = {
297 |                 "byte_length": byte_length,
298 |                 "hash_type": hash_id,
299 |                 "algorithm_type": None,
300 |                 "named_curve_type": None,
301 |                 "asymmetric_key_type": None,
302 |                 "public_exponent": None
303 |             }
304 |         elif sub_type == V8CryptoKeySubType.RsaHashedKey:
305 |             algorithm_id = V8CryptoKeyAlgorithm(self._read_varint32(stream))
306 |             asymmetric_key_type = V8AsymmetricCryptoKeyType(stream.read(1)[0])
307 |             length_bytes = self._read_varint32(stream)
308 |             public_exponent_length = self._read_varint32(stream)
309 |             public_exponent = stream.read(public_exponent_length)
310 |             if len(public_exponent) != public_exponent_length:
311 |                 raise ValueError(f"Could not read all of public exponent data")
312 |             hash_id = V8CryptoKeyAlgorithm(self._read_varint32(stream))
313 |             params = {
314 |                 "algorithm_type": algorithm_id,
315 |                 "asymmetric_key_type": asymmetric_key_type,
316 |                 "byte_length": length_bytes,
317 |                 "public_exponent": public_exponent,
318 |                 "hash_type": hash_id,
319 |                 "named_curve_type": None
320 |             }
321 | 
322 |         elif sub_type == V8CryptoKeySubType.EcKey:
323 |             algorithm_id = V8CryptoKeyAlgorithm(self._read_varint32(stream))
324 |             asymmetric_key_type = V8AsymmetricCryptoKeyType(stream.read(1)[0])
325 |             named_curve = V8CryptoNamedCurve(self._read_varint32(stream))
326 |             params = {
327 |                 "algorithm_type": algorithm_id,
328 |                 "asymmetric_key_type": asymmetric_key_type,
329 |                 "named_curve_type": named_curve,
330 |                 "hash_type": None,
331 |                 "byte_length": None,
332 |                 "public_exponent": None
333 |             }
334 |         elif sub_type == V8CryptoKeySubType.NoParamsKey:
335 |             algorithm_id = V8CryptoKeyAlgorithm(self._read_varint32(stream))
336 |             params = {
337 |                 "algorithm_type": algorithm_id,
338 |                 "hash_type": None,
339 |                 "asymmetric_key_type": None,
340 |                 "byte_length": None,
341 |                 "named_curve_type": None,
342 |                 "public_exponent": None
343 |             }
344 |         else:
345 |             raise ValueError(f"Unknown V8CryptoKeySubType {sub_type}")
346 | 
347 |         params["key_usage"] = V8CryptoKeyUsage(self._read_varint32(stream))
348 |         key_length = self._read_varint32(stream)
349 |         key_data = stream.read(key_length)
350 |         if len(key_data) < key_length:
351 |             raise ValueError("Could not read all key data")
352 | 
353 |         params["key_data"] = key_data
354 |         return CryptoKey(sub_type, **params)
355 | 
356 |     def _not_implemented(self, stream):
357 |         raise NotImplementedError()
358 | 
359 |     def read(self, stream: typing.BinaryIO) -> typing.Any:
360 |         tag = stream.read(1)
361 | 
362 |         func = {
363 |             Constants.tag_kMessagePortTag: lambda x: self._not_implemented(x),
364 |             Constants.tag_kMojoHandleTag: lambda x: self._not_implemented(x),
365 |             Constants.tag_kBlobTag: lambda x: self._not_implemented(x),
366 |             Constants.tag_kBlobIndexTag: lambda x: self._read_blob_index(x),
367 |             Constants.tag_kFileTag: lambda x: self._not_implemented(x),
368 |             Constants.tag_kFileIndexTag: lambda x: self._read_file_index(x),
369 |             Constants.tag_kDOMFileSystemTag: lambda x: self._not_implemented(x),
370 |             Constants.tag_kNativeFileSystemFileHandleTag: lambda x: self._read_native_file_handle(False, x),
371 |             Constants.tag_kNativeFileSystemDirectoryHandleTag: lambda x: self._read_native_file_handle(True, x),
372 |             Constants.tag_kFileListTag: lambda x: self._not_implemented(x),
373 |             Constants.tag_kFileListIndexTag: lambda x: self._read_file_list_index(x),
374 |             Constants.tag_kImageDataTag: lambda x: self._not_implemented(x),
375 |             Constants.tag_kImageBitmapTag: lambda x: self._not_implemented(x),
376 |             Constants.tag_kImageBitmapTransferTag: lambda x: self._not_implemented(x),
377 |             Constants.tag_kOffscreenCanvasTransferTag: lambda x: self._not_implemented(x),
378 |             Constants.tag_kReadableStreamTransferTag: lambda x: self._not_implemented(x),
379 |             Constants.tag_kTransformStreamTransferTag: lambda x: self._not_implemented(x),
380 |             Constants.tag_kWritableStreamTransferTag: lambda x: self._not_implemented(x),
381 |             Constants.tag_kDOMPointTag: lambda x: self._not_implemented(x),
382 |             Constants.tag_kDOMPointReadOnlyTag: lambda x: self._not_implemented(x),
383 |             Constants.tag_kDOMRectTag: lambda x: self._not_implemented(x),
384 |             Constants.tag_kDOMRectReadOnlyTag: lambda x: self._not_implemented(x),
385 |             Constants.tag_kDOMQuadTag: lambda x: self._not_implemented(x),
386 |             Constants.tag_kDOMMatrixTag: lambda x: self._not_implemented(x),
387 |             Constants.tag_kDOMMatrixReadOnlyTag: lambda x: self._not_implemented(x),
388 |             Constants.tag_kDOMMatrix2DTag: lambda x: self._not_implemented(x),
389 |             Constants.tag_kDOMMatrix2DReadOnlyTag: lambda x: self._not_implemented(x),
390 |             Constants.tag_kCryptoKeyTag: lambda x: self._read_crypto_key(x),
391 |             Constants.tag_kRTCCertificateTag: lambda x: self._not_implemented(x),
392 |             Constants.tag_kRTCEncodedAudioFrameTag: lambda x: self._not_implemented(x),
393 |             Constants.tag_kRTCEncodedVideoFrameTag: lambda x: self._not_implemented(x),
394 |             Constants.tag_kVideoFrameTag: lambda x: self._not_implemented(x),
395 |             Constants.tag_kDOMExceptionTag: lambda x: self._not_implemented(x)
396 |         }.get(tag)
397 | 
398 |         if func is None:
399 |             raise ValueError(f"Unknown tag: {tag}")
400 | 
401 |         return func(stream)
402 | 


--------------------------------------------------------------------------------
/ccl_chromium_reader/serialization_formats/ccl_easy_chromium_pickle.py:
--------------------------------------------------------------------------------
  1 | """
  2 | Copyright 2022, CCL Forensics
  3 | 
  4 | Permission is hereby granted, free of charge, to any person obtaining a copy of
  5 | this software and associated documentation files (the "Software"), to deal in
  6 | the Software without restriction, including without limitation the rights to
  7 | use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
  8 | of the Software, and to permit persons to whom the Software is furnished to do
  9 | so, subject to the following conditions:
 10 | 
 11 | The above copyright notice and this permission notice shall be included in all
 12 | copies or substantial portions of the Software.
 13 | 
 14 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 15 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 16 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 17 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 18 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 19 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 20 | SOFTWARE.
 21 | """
 22 | 
 23 | import io
 24 | import datetime
 25 | import struct
 26 | import os
 27 | 
 28 | 
 29 | __version__ = "0.1"
 30 | __description__ = "Module for reading Chromium Pickles."
 31 | __contact__ = "Alex Caithness"
 32 | 
 33 | 
 34 | class EasyPickleException(Exception):
 35 |     ...
 36 | 
 37 | 
 38 | class EasyPickleIterator:
 39 |     """
 40 |     A pythonic implementation of the PickleIterator object used in various places in Chrom(e|ium).
 41 |     """
 42 |     def __init__(self, data: bytes, alignment: int=4):
 43 |         """
 44 |         Takes a bytes buffer and wraps the EasyPickleIterator around it
 45 |         :param data: the data to be wrapped
 46 |         :param alignment: (optional) the number of bytes to align reads to (default: 4)
 47 |         """
 48 |         self._f = io.BytesIO(data)
 49 |         self._alignment = alignment
 50 | 
 51 |         self._pickle_length = self.read_uint32()
 52 |         if len(data) != self._pickle_length + 4:
 53 |             raise EasyPickleException("pickle length invalid")
 54 | 
 55 |     def __enter__(self) -> "EasyPickleIterator":
 56 |         return self
 57 | 
 58 |     def __exit__(self, exc_type, exc_val, exc_tb):
 59 |         self.close()
 60 | 
 61 |     def close(self):
 62 |         self._f.close()
 63 | 
 64 |     def read_aligned(self, length: int) -> bytes:
 65 |         """
 66 |         reads the number of bytes specified by the length parameter. Aligns the buffer afterwards if required.
 67 |         :param length: the length od data to be read
 68 |         :return: the data read (without the alignment padding)
 69 |         """
 70 |         raw = self._f.read(length)
 71 |         if len(raw) != length:
 72 |             raise EasyPickleException(f"Tried to read {length} bytes but only got {len(raw)}")
 73 | 
 74 |         align_count = self._alignment - (length % self._alignment)
 75 |         if align_count != self._alignment:
 76 |             self._f.seek(align_count, os.SEEK_CUR)
 77 | 
 78 |         return raw
 79 | 
 80 |     def read_uint16(self) -> int:
 81 |         raw = self.read_aligned(2)
 82 |         return struct.unpack("<H", raw)[0]
 83 | 
 84 |     def read_uint32(self) -> int:
 85 |         raw = self.read_aligned(4)
 86 |         return struct.unpack("<I", raw)[0]
 87 | 
 88 |     def read_uint64(self) -> int:
 89 |         raw = self.read_aligned(8)
 90 |         return struct.unpack("<Q", raw)[0]
 91 | 
 92 |     def read_int16(self) -> int:
 93 |         raw = self.read_aligned(2)
 94 |         return struct.unpack("<h", raw)[0]
 95 | 
 96 |     def read_int32(self) -> int:
 97 |         raw = self.read_aligned(4)
 98 |         return struct.unpack("<i", raw)[0]
 99 | 
100 |     def read_int64(self) -> int:
101 |         raw = self.read_aligned(8)
102 |         return struct.unpack("<q", raw)[0]
103 | 
104 |     def read_bool(self) -> bool:
105 |         raw = self.read_int32()
106 |         if raw == 0:
107 |             return False
108 |         elif raw == 1:
109 |             return True
110 |         else:
111 |             raise EasyPickleException("bools should only contain 0 or 1")
112 | 
113 |     def read_single(self) -> float:
114 |         raw = self.read_aligned(4)
115 |         return struct.unpack("<f", raw)[0]
116 | 
117 |     def read_double(self) -> float:
118 |         raw = self.read_aligned(8)
119 |         return struct.unpack("<d", raw)[0]
120 | 
121 |     def read_string(self) -> str:
122 |         length = self.read_uint32()
123 |         raw = self.read_aligned(length)
124 |         return raw.decode("utf-8")
125 | 
126 |     def read_string16(self) -> str:
127 |         length = self.read_uint32() * 2  # character count
128 |         raw = self.read_aligned(length)
129 |         return raw.decode("utf-16-le")
130 | 
131 |     def read_datetime(self) -> datetime.datetime:
132 |         return datetime.datetime(1601, 1, 1) + datetime.timedelta(microseconds=self.read_uint64())
133 | 
134 | 


--------------------------------------------------------------------------------
/ccl_chromium_reader/serialization_formats/ccl_protobuff.py:
--------------------------------------------------------------------------------
  1 | """
  2 | Copyright 2022, CCL Forensics
  3 | 
  4 | Permission is hereby granted, free of charge, to any person obtaining a copy of
  5 | this software and associated documentation files (the "Software"), to deal in
  6 | the Software without restriction, including without limitation the rights to
  7 | use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
  8 | of the Software, and to permit persons to whom the Software is furnished to do
  9 | so, subject to the following conditions:
 10 | 
 11 | The above copyright notice and this permission notice shall be included in all
 12 | copies or substantial portions of the Software.
 13 | 
 14 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 15 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 16 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 17 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 18 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 19 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 20 | SOFTWARE.
 21 | """
 22 | 
 23 | import sys
 24 | import struct
 25 | import io
 26 | import typing
 27 | 
 28 | __version__ = "0.8"
 29 | __description__ = "Module for naive parsing of Protocol Buffers"
 30 | __contact__ = "Alex Caithness"
 31 | 
 32 | DEBUG = False
 33 | 
 34 | 
 35 | class Empty:
 36 |     value = None
 37 | 
 38 | 
 39 | class ProtoObject:
 40 |     def __init__(self, tag, name, value):
 41 |         self.tag = tag
 42 |         self.name = name
 43 |         self.value = value
 44 |         self.wire = tag & 0x07
 45 | 
 46 |     def __str__(self):
 47 |         if self.name:
 48 |             return "{0} ({1}): {2}".format(
 49 |                 self.tag if self.tag > 0x7f else hex(self.tag), self.name, repr(self.value))
 50 |         else:
 51 |             return "{0}: {1}".format(
 52 |                 self.tag if self.tag > 0x7f else hex(self.tag), repr(self.value))
 53 |     __repr__ = __str__
 54 | 
 55 |     @property
 56 |     def friendly_tag(self) -> int:
 57 |         """:return the "real" tag (i.e. the one that would be seen inside the .proto schema)"""
 58 |         return self.tag >> 3
 59 | 
 60 |     def get_items_by_tag(self, tag_id: int) -> list[typing.Any]:
 61 |         """
 62 |         :param tag_id: the tag id for the child items
 63 |         :return: list of child items with this tag number
 64 |         """
 65 |         if not isinstance(self.value, list):
 66 |             raise ValueError("This object does not support child items")
 67 |         if not isinstance(tag_id, int):
 68 |             raise TypeError("Expected type: int; actual type: {0}".format(type(tag_id)))
 69 |         return [x for x in self.value if x.tag == tag_id]
 70 | 
 71 |     def get_items_by_name(self, name: str) -> list[typing.Any]:
 72 |         """
 73 |         :param name: the field name for the child items
 74 |         :return: list of child items with this name
 75 |         """
 76 |         if not isinstance(self.value, list):
 77 |             raise ValueError("This object does not support child items")
 78 |         if not isinstance(name, str):
 79 |             raise TypeError("Expected type: str; actual type: {0}".format(type(name)))
 80 |         return [x for x in self.value if x.name == name]
 81 | 
 82 |     def only(self, name: str, default=Empty):
 83 |         """
 84 |         Returns a single item which matches the name parameter. Use this to streamline getting non-repeating items
 85 |         :param name: the name of the child item
 86 |         :param default: optional: the value to return if the item is not present (default: None)
 87 |         :return: the single child item
 88 |         :exception: ValueError: if there is more than one child item which matches this name
 89 |         """
 90 |         got = self.get_items_by_name(name)
 91 |         if len(got) == 0:
 92 |             return default
 93 |         elif len(got) == 1:
 94 |             return got[0]
 95 |         else:
 96 |             raise ValueError("More than one value with this key")
 97 | 
 98 |     def __getitem__(self, key: typing.Union[str, int]) -> list[typing.Any]:
 99 |         if isinstance(key, str):
100 |             return self.get_items_by_name(key)
101 |         elif isinstance(key, int):
102 |             return self.get_items_by_tag(key)
103 |         else:
104 |             raise TypeError("Key should be int or str; actual type: {0}".format(type(key)))
105 | 
106 |     def __len__(self) -> int:
107 |         return self.value.__len__()
108 | 
109 |     def __iter__(self):
110 |         if not isinstance(self.value, list):
111 |             raise ValueError("This object does not support child items")
112 |         else:
113 |             yield from (x.tag for x in self.value)
114 | 
115 | 
116 | class ProtoDecoder:
117 |     def __init__(self, object_name, func):
118 |         self.func = func
119 |         self.object_name = object_name
120 | 
121 |     def __call__(self, arg):
122 |         return self.func(arg)
123 | 
124 | 
125 | def _read_le_varint(stream: typing.BinaryIO, is_32bit=False) -> typing.Optional[typing.Tuple[int, bytes]]:
126 |     # this only outputs unsigned
127 |     limit = 5 if is_32bit else 10
128 |     i = 0
129 |     result = 0
130 |     underlying_bytes = []
131 |     while i < limit:  # 64 bit max possible?
132 |         raw = stream.read(1)
133 |         if len(raw) < 1:
134 |             return None
135 |         tmp, = raw
136 |         underlying_bytes.append(tmp)
137 |         result |= ((tmp & 0x7f) << (i * 7))
138 |         if (tmp & 0x80) == 0:
139 |             break
140 |         i += 1
141 |     return result, bytes(underlying_bytes)
142 | 
143 | 
144 | def read_le_varint(stream: typing.BinaryIO, is_32bit=False) -> typing.Optional[int]:
145 |     x = _read_le_varint(stream, is_32bit)
146 |     if x is None:
147 |         return None
148 |     else:
149 |         return x[0]
150 | 
151 | 
152 | def read_le_varint32(stream: typing.BinaryIO) -> typing.Optional[int]:
153 |     return read_le_varint(stream, True)
154 | 
155 | 
156 | def read_tag(
157 |         stream: typing.BinaryIO,
158 |         tag_mappings: dict[int, typing.Callable[[typing.BinaryIO], typing.Any]],
159 |         log_out=sys.stderr, use_friendly_tag=False) -> typing.Optional[ProtoObject]:
160 |     tag_id = read_le_varint(stream)
161 |     if tag_id is None: 
162 |         return None
163 |     decoder = tag_mappings.get(tag_id if not use_friendly_tag else tag_id >> 3)
164 |     name = None
165 |     if isinstance(decoder, ProtoDecoder):
166 |         name = decoder.object_name
167 |     
168 |     available_wirebytes = io.BytesIO(_get_bytes_for_wiretype(tag_id, stream))
169 | 
170 |     tag_value = decoder(available_wirebytes) if decoder else _fallback_decode(
171 |         tag_id, available_wirebytes, log_out)
172 |     
173 |     return ProtoObject(tag_id, name, tag_value)
174 | 
175 | 
176 | def read_protobuff(
177 |         stream: typing.BinaryIO,
178 |         tag_mappings: dict [int, typing.Callable[[typing.BinaryIO], typing.Any]],
179 |         use_friendly_tag=False) -> list[ProtoObject]:
180 |     result = []
181 |     while True:
182 |         tag = read_tag(stream, tag_mappings, use_friendly_tag=use_friendly_tag)
183 |         if tag is None:
184 |             break
185 |         result.append(tag)
186 | 
187 |     return result
188 | 
189 | 
190 | def read_blob(stream: typing.BinaryIO) -> bytes:
191 |     blob_length = read_le_varint(stream)
192 |     blob = stream.read(blob_length)
193 |     return blob
194 | 
195 | 
196 | def read_string(stream: typing.BinaryIO) -> str:
197 |     raw_string = read_blob(stream)
198 |     string = raw_string.decode("utf-8")
199 |     return string
200 | 
201 | 
202 | def read_double(stream: typing.BinaryIO) -> float:
203 |     return struct.unpack("<d", stream.read(8))[0]
204 | 
205 | 
206 | def read_long(stream: typing.BinaryIO) -> int:
207 |     return struct.unpack("<q", stream.read(8))[0]
208 | 
209 | 
210 | def read_int(stream: typing.BinaryIO) -> int:
211 |     return struct.unpack("<i", stream.read(4))[0]
212 | 
213 | 
214 | def read_embedded_protobuf(stream: typing.BinaryIO, mappings, use_friendly_tag=False) -> list[ProtoObject]:
215 |     blob_blob = read_blob(stream)
216 |     blob_stream = io.BytesIO(blob_blob)
217 |     return read_protobuff(blob_stream, mappings, use_friendly_tag)
218 | 
219 | 
220 | def read_fixed_blob(stream: typing.BinaryIO, length: int) -> bytes:
221 |     data = stream.read(length)
222 |     if len(data) != length:
223 |         raise ValueError("Couldn't read enough data")
224 |     return data
225 | 
226 | 
227 | _fallback_wire_types = {
228 |     0: read_le_varint,
229 |     1: lambda x: read_fixed_blob(x, 8),
230 |     2: read_blob,
231 |     5: lambda x: read_fixed_blob(x, 4)
232 |     }
233 | 
234 | _wire_type_friendly_names = {
235 |     0: "Varint",
236 |     1: "64-Bit",
237 |     2: "Length Delimited",
238 |     5: "32-Bit"
239 |     }
240 | 
241 | 
242 | def _get_bytes_for_wiretype(tag_id: int, stream: typing.BinaryIO):
243 |     wire_type = tag_id & 0x07
244 |     if wire_type == 0:
245 |         read_bytes = []
246 |         for i in range(10):
247 |             x = stream.read(1)[0]
248 |             read_bytes.append(x)
249 |             if x & 0x80 == 0:
250 |                 break
251 |         buffer = bytes(read_bytes)
252 |     elif wire_type == 1:
253 |         buffer = stream.read(8)
254 |     elif wire_type == 2:
255 |         l, b = _read_le_varint(stream)
256 |         available_bytes = stream.read(l)
257 |         if len(available_bytes) < l:
258 |             raise ValueError("Stream too short")
259 |         buffer = b + available_bytes
260 |     elif wire_type == 5:
261 |         buffer = stream.read(4)
262 |     else:
263 |         raise ValueError("Invalid wiretype")
264 | 
265 |     return buffer
266 | 
267 | 
268 | def _fallback_decode(tag_id, stream, log):
269 |     fallback_func = _fallback_wire_types.get(tag_id & 0x07)
270 |     if not fallback_func:
271 |         raise ValueError("No appropriate fallback function for tag {0} (wire type {1})".format(
272 |             tag_id, tag_id & 0x07))
273 |     if DEBUG:
274 |         log.write("Tag {0} ({1}) not defined, using fallback decoding.\n".format(
275 |             tag_id if tag_id > 0x7f else hex(tag_id), _wire_type_friendly_names[tag_id & 0x07]))
276 |     return fallback_func(stream)
277 | 


--------------------------------------------------------------------------------
/ccl_chromium_reader/storage_formats/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cclgroupltd/ccl_chromium_reader/552516720761397c4d482908b6b8b08130b313a1/ccl_chromium_reader/storage_formats/__init__.py


--------------------------------------------------------------------------------
/pyproject.toml:
--------------------------------------------------------------------------------
 1 | [build-system]
 2 | requires = ["setuptools>=61.0"]
 3 | build-backend = "setuptools.build_meta"
 4 | 
 5 | [tool.setuptools]
 6 | packages = [
 7 |     "ccl_chromium_reader",
 8 |     "ccl_chromium_reader.serialization_formats",
 9 |     "ccl_chromium_reader.storage_formats"
10 | ]
11 | 
12 | [project]
13 | name = "ccl_chromium_reader"
14 | version = "0.3.14"
15 | authors = [
16 |   { name="Alex Caithness", email="research@cclsolutionsgroup.com" },
17 | ]
18 | description = "(Sometimes partial) Python re-implementations of the technologies involved in reading various data sources in Chrome-esque applications."
19 | readme = "README.md"
20 | requires-python = ">=3.10"
21 | classifiers = [
22 |     "Programming Language :: Python :: 3",
23 |     "License :: OSI Approved :: MIT License",
24 |     "Operating System :: OS Independent",
25 |     "Development Status :: 4 - Beta",
26 | ]
27 | keywords = ["digital forensics", "dfir", "chrome", "chromium", "browser"]
28 | dependencies = [
29 |     "Brotli",
30 |     "ccl_simplesnappy @ git+https://github.com/cclgroupltd/ccl_simplesnappy.git"
31 | ]
32 | 
33 | [project.urls]
34 | Homepage = "https://github.com/cclgroupltd/ccl_chromium_reader"
35 | Issues = "https://github.com/cclgroupltd/ccl_chromium_reader/issues"


--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cclgroupltd/ccl_chromium_reader/552516720761397c4d482908b6b8b08130b313a1/requirements.txt


--------------------------------------------------------------------------------
/tools_and_utilities/Chromium_dump_local_storage.py:
--------------------------------------------------------------------------------
  1 | """
  2 | Copyright 2021, CCL Forensics
  3 | Permission is hereby granted, free of charge, to any person obtaining a copy of
  4 | this software and associated documentation files (the "Software"), to deal in
  5 | the Software without restriction, including without limitation the rights to
  6 | use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
  7 | of the Software, and to permit persons to whom the Software is furnished to do
  8 | so, subject to the following conditions:
  9 | 
 10 | The above copyright notice and this permission notice shall be included in all
 11 | copies or substantial portions of the Software.
 12 | 
 13 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 14 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 15 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 16 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 17 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 18 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 19 | SOFTWARE.
 20 | """
 21 | 
 22 | import sys
 23 | import pathlib
 24 | import datetime
 25 | import sqlite3
 26 | from ccl_chromium_reader import ccl_chromium_localstorage
 27 | 
 28 | __version__ = "0.1"
 29 | __description__ = "Dumps a Chromium localstorage leveldb to sqlite for review"
 30 | __contact__ = "Alex Caithness"
 31 | 
 32 | DB_SCHEMA = """
 33 | CREATE TABLE storage_keys ("_id" INTEGER PRIMARY KEY AUTOINCREMENT, "storage_key" TEXT);
 34 | CREATE TABLE batches ("start_ldbseq" INTEGER PRIMARY KEY, 
 35 |                       "end_ldbseq" INTEGER,
 36 |                       "storage_key" INTEGER,
 37 |                       "timestamp" INTEGER);
 38 | CREATE TABLE records ("_id" INTEGER PRIMARY KEY AUTOINCREMENT,
 39 |                       "storage_key" INTEGER,
 40 |                       "key" TEXT,
 41 |                       "value" TEXT,
 42 |                       "batch" INTEGER,
 43 |                       "ldbseq" INTEGER);
 44 | CREATE INDEX "storage_keys_storage_key" ON "storage_keys" ("storage_key");
 45 | 
 46 | CREATE VIEW "records_view" AS
 47 |     SELECT 
 48 |       storage_keys.storage_key AS "storage_key",
 49 |       records."key"  AS "key",
 50 |       records.value AS "value",
 51 |       datetime(batches."timestamp", 'unixepoch') AS "batch_timestamp",
 52 |       records.ldbseq AS "ldbseq"
 53 |     FROM records
 54 |       INNER JOIN storage_keys ON records.storage_key = storage_keys._id
 55 |       INNER JOIN batches ON records.batch = batches.start_ldbseq
 56 |     ORDER BY records.ldbseq;
 57 | """
 58 | 
 59 | INSERT_STORAGE_KEY_SQL = """INSERT INTO "storage_keys" ("storage_key") VALUES (?);"""
 60 | INSERT_BATCH_SQL = """INSERT INTO "batches" ("start_ldbseq", "end_ldbseq", "storage_key", "timestamp") 
 61 |                       VALUES (?, ?, ?, ?);"""
 62 | INSERT_RECORD_SQL = """INSERT INTO "records" ("storage_key", "key", "value", "batch", "ldbseq")
 63 |                        VALUES (?, ?, ?, ?, ?);"""
 64 | 
 65 | 
 66 | def main(args):
 67 |     level_db_in_dir = pathlib.Path(args[0])
 68 |     db_out_path = pathlib.Path(args[1])
 69 | 
 70 |     if db_out_path.exists():
 71 |         raise ValueError("output database already exists")
 72 | 
 73 |     local_storage = ccl_chromium_localstorage.LocalStoreDb(level_db_in_dir)
 74 |     out_db = sqlite3.connect(db_out_path)
 75 |     out_db.executescript(DB_SCHEMA)
 76 |     cur = out_db.cursor()
 77 | 
 78 |     storage_keys_lookup = {}
 79 |     for storage_key in local_storage.iter_storage_keys():
 80 |         cur.execute(INSERT_STORAGE_KEY_SQL, (storage_key,))
 81 |         cur.execute("SELECT last_insert_rowid();")
 82 |         storage_key_id = cur.fetchone()[0]
 83 |         storage_keys_lookup[storage_key] = storage_key_id
 84 | 
 85 |     for batch in local_storage.iter_batches():
 86 |         cur.execute(
 87 |             INSERT_BATCH_SQL,
 88 |             (batch.start, batch.end, storage_keys_lookup[batch.storage_key],
 89 |              batch.timestamp.replace(tzinfo=datetime.timezone.utc).timestamp()))
 90 | 
 91 |     for record in local_storage.iter_all_records():
 92 |         batch = local_storage.find_batch(record.leveldb_seq_number)
 93 |         batch_id = batch.start if batch is not None else None
 94 |         cur.execute(
 95 |             INSERT_RECORD_SQL,
 96 |             (
 97 |                 storage_keys_lookup[record.storage_key], record.script_key, record.value,
 98 |                 batch_id, record.leveldb_seq_number
 99 |             )
100 |         )
101 | 
102 |     cur.close()
103 |     out_db.commit()
104 |     out_db.close()
105 | 
106 | 
107 | if __name__ == '__main__':
108 |     if len(sys.argv) != 3:
109 |         print(f"{pathlib.Path(sys.argv[0]).name} <leveldb dir> <out.db>")
110 |         exit(1)
111 |     main(sys.argv[1:])
112 | 


--------------------------------------------------------------------------------
/tools_and_utilities/Chromium_dump_session_storage.py:
--------------------------------------------------------------------------------
 1 | """
 2 | Copyright 2021, CCL Forensics
 3 | Permission is hereby granted, free of charge, to any person obtaining a copy of
 4 | this software and associated documentation files (the "Software"), to deal in
 5 | the Software without restriction, including without limitation the rights to
 6 | use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
 7 | of the Software, and to permit persons to whom the Software is furnished to do
 8 | so, subject to the following conditions:
 9 | 
10 | The above copyright notice and this permission notice shall be included in all
11 | copies or substantial portions of the Software.
12 | 
13 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
14 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
15 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
16 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
17 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
18 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
19 | SOFTWARE.
20 | """
21 | 
22 | import sys
23 | import pathlib
24 | import sqlite3
25 | from ccl_chromium_reader import ccl_chromium_sessionstorage
26 | 
27 | __version__ = "0.1"
28 | __description__ = "Dumps a Chromium sessionstorage leveldb to sqlite for review"
29 | __contact__ = "Alex Caithness"
30 | 
31 | DB_SCHEMA = """
32 | CREATE TABLE "hosts" ("_id" INTEGER PRIMARY KEY AUTOINCREMENT, "host" TEXT);
33 | CREATE TABLE "guids" ("_id" INTEGER PRIMARY KEY AUTOINCREMENT, "guid" TEXT);
34 | CREATE TABLE "items" ("_id" INTEGER PRIMARY KEY AUTOINCREMENT, 
35 |                       "host" INTEGER, 
36 |                       "guid" INTEGER, 
37 |                       "ldbseq" INTEGER, 
38 |                       "key" TEXT, 
39 |                       "value" TEXT);
40 | CREATE INDEX "item_host" ON "items" ("host");  
41 | CREATE INDEX "item_ldbseq" ON "items" ("ldbseq");
42 | 
43 | CREATE VIEW items_view AS 
44 |     SELECT "items"."ldbseq",
45 |       "hosts"."host",
46 |       "items"."key",
47 |       "items"."value",
48 |       "guids"."guid"
49 |     FROM "items"
50 |       LEFT JOIN "hosts" ON "items"."host" = "hosts"."_id"
51 |       LEFT JOIN "guids" ON "items"."guid" = "guids"."_id"
52 |     ORDER BY "items"."ldbseq";
53 | """
54 | 
55 | INSERT_HOST_SQL = """INSERT INTO "hosts" ("host") VALUES (?);"""
56 | INSERT_ITEM_SQL = """INSERT INTO "items" (host, guid, ldbseq, key, value) VALUES (?, ?, ?, ?, ?);"""
57 | 
58 | 
59 | def main(args):
60 |     level_db_in_dir = pathlib.Path(args[0])
61 |     db_out_path = pathlib.Path(args[1])
62 | 
63 |     if db_out_path.exists():
64 |         raise ValueError("output database already exists")
65 | 
66 |     session_storage = ccl_chromium_sessionstorage.SessionStoreDb(level_db_in_dir)
67 |     out_db = sqlite3.connect(db_out_path)
68 |     out_db.executescript(DB_SCHEMA)
69 |     cur = out_db.cursor()
70 |     for host in session_storage.iter_hosts():
71 |         cur.execute(INSERT_HOST_SQL, (host,))
72 |         cur.execute("SELECT last_insert_rowid();")
73 |         host_id = cur.fetchone()[0]
74 |         host_kvs = session_storage.get_all_for_host(host)
75 | 
76 |         for key, values in host_kvs.items():
77 |             for value in values:
78 |                 cur.execute(INSERT_ITEM_SQL, (host_id, None, value.leveldb_sequence_number, key, value.value))
79 | 
80 |     for key, value in session_storage.iter_orphans():
81 |         cur.execute(INSERT_ITEM_SQL, (None, None, value.leveldb_sequence_number, key, value.value))
82 | 
83 |     cur.close()
84 |     out_db.commit()
85 |     out_db.close()
86 | 
87 | 
88 | if __name__ == '__main__':
89 |     if len(sys.argv) != 3:
90 |         print(f"{pathlib.Path(sys.argv[0]).name} <leveldb dir> <out.db>")
91 |         exit(1)
92 |     main(sys.argv[1:])
93 | 


--------------------------------------------------------------------------------
/tools_and_utilities/benchmark.py:
--------------------------------------------------------------------------------
 1 | import sys
 2 | import pathlib
 3 | from ccl_chromium_reader import ccl_chromium_indexeddb
 4 | import time
 5 | 
 6 | 
 7 | def main(args):
 8 |     start = time.time()
 9 |     ldb_path = pathlib.Path(args[0])
10 |     wrapper = ccl_chromium_indexeddb.WrappedIndexDB(ldb_path)
11 | 
12 |     for db_info in wrapper.database_ids:
13 |         db = wrapper[db_info.dbid_no]
14 |         print("------Database------")
15 |         print(f"db_number={db.db_number}; name={db.name}; origin={db.origin}")
16 |         print()
17 |         print("\t---Object Stores---")
18 |         for obj_store_name in db.object_store_names:
19 |             obj_store = db[obj_store_name]
20 |             print(f"\tobject_store_id={obj_store.object_store_id}; name={obj_store.name}")
21 |             try:
22 |                 one_record = next(obj_store.iterate_records())
23 |             except StopIteration:
24 |                 one_record = None
25 |         print()
26 |     end = time.time()
27 |     print("Elapsed time: {} seconds.".format(int(end-start)))
28 | 
29 | 
30 | if __name__ == '__main__':
31 |     if len(sys.argv) < 2:
32 |         print(f"USAGE: {pathlib.Path(sys.argv[0]).name} <ldb dir path>")
33 |         exit(1)
34 | 
35 |     main(sys.argv[1:])
36 | 


--------------------------------------------------------------------------------
/tools_and_utilities/dump_indexeddb_details.py:
--------------------------------------------------------------------------------
 1 | """
 2 | Copyright 2020-2024, CCL Forensics
 3 | 
 4 | Permission is hereby granted, free of charge, to any person obtaining a copy of
 5 | this software and associated documentation files (the "Software"), to deal in
 6 | the Software without restriction, including without limitation the rights to
 7 | use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
 8 | of the Software, and to permit persons to whom the Software is furnished to do
 9 | so, subject to the following conditions:
10 | 
11 | The above copyright notice and this permission notice shall be included in all
12 | copies or substantial portions of the Software.
13 | 
14 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
15 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
16 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
17 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
18 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
19 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
20 | SOFTWARE.
21 | """
22 | 
23 | import sys
24 | import pathlib
25 | from ccl_chromium_reader import ccl_chromium_indexeddb
26 | 
27 | 
28 | def main(args):
29 |     ldb_path = pathlib.Path(args[0])
30 |     wrapper = ccl_chromium_indexeddb.WrappedIndexDB(ldb_path)
31 | 
32 |     for db_info in wrapper.database_ids:
33 |         db = wrapper[db_info.dbid_no]
34 |         print("------Database------")
35 |         print(f"db_number={db.db_number}; name={db.name}; origin={db.origin}")
36 |         print()
37 |         print("\t---Object Stores---")
38 |         for obj_store_name in db.object_store_names:
39 |             obj_store = db[obj_store_name]
40 |             print(f"\tobject_store_id={obj_store.object_store_id}; name={obj_store.name}")
41 |             try:
42 |                 one_record = next(obj_store.iterate_records())
43 |             except StopIteration:
44 |                 one_record = None
45 |             if one_record is not None:
46 |                 print("\tExample record:")
47 |                 print(f"\tkey: {one_record.key}")
48 |                 print(f"\tvalue: {one_record.value}")
49 |             else:
50 |                 print("\tNo records")
51 |             print()
52 |         print()
53 | 
54 | 
55 | if __name__ == '__main__':
56 |     if len(sys.argv) < 2:
57 |         print(f"USAGE: {pathlib.Path(sys.argv[0]).name} <ldb dir path>")
58 |         exit(1)
59 |     main(sys.argv[1:])
60 | 


--------------------------------------------------------------------------------
/tools_and_utilities/dump_leveldb.py:
--------------------------------------------------------------------------------
 1 | import sys
 2 | import csv
 3 | from ccl_chromium_reader.storage_formats import ccl_leveldb
 4 | import pathlib
 5 | 
 6 | ENCODING = "iso-8859-1"
 7 | 
 8 | 
 9 | def main(args):
10 |     input_path = args[0]
11 |     output_path = "leveldb_dump.csv"
12 |     if len(args) > 1:
13 |         output_path = args[1]
14 | 
15 |     leveldb_records = ccl_leveldb.RawLevelDb(input_path)
16 | 
17 |     with open(output_path, "w", encoding="utf-8", newline="") as file1:
18 |         writes = csv.writer(file1, quoting=csv.QUOTE_ALL)
19 |         writes.writerow(
20 |             [
21 |                 "key-hex", "key-text", "value-hex", "value-text", "origin_file",
22 |                 "file_type", "offset", "seq", "state", "was_compressed"
23 |             ])
24 | 
25 |         for record in leveldb_records.iterate_records_raw():
26 |             writes.writerow([
27 |                 record.user_key.hex(" ", 1),
28 |                 record.user_key.decode(ENCODING, "replace"),
29 |                 record.value.hex(" ", 1),
30 |                 record.value.decode(ENCODING, "replace"),
31 |                 str(record.origin_file),
32 |                 record.file_type.name,
33 |                 record.offset,
34 |                 record.seq,
35 |                 record.state.name,
36 |                 record.was_compressed
37 |             ])
38 | 
39 | 
40 | if __name__ == '__main__':
41 |     if len(sys.argv) < 2:
42 |         print(f"Usage: {pathlib.Path(sys.argv[0]).name} <indir path> [outpath.csv]")
43 |         exit(1)
44 |     print()
45 |     print("+--------------------------------------------------------+")
46 |     print("|Please note: keys and values in leveldb are binary blobs|")
47 |     print("|so any text seen in the output of this script might not |")
48 |     print("|represent the entire meaning of the data. The output of |")
49 |     print("|this script should be considered as a preview of the    |")
50 |     print("|data only.                                              |")
51 |     print("+--------------------------------------------------------+")
52 |     print()
53 |     main(sys.argv[1:])
54 | 


--------------------------------------------------------------------------------
/tools_and_utilities/extras/make_many_indexeddb_databases.html:
--------------------------------------------------------------------------------
  1 | <!DOCTYPE html>
  2 | <html>
  3 |     <head>
  4 |         <meta charset="utf-8">
  5 |         <script>            
  6 |             const databaseCount = 260;
  7 |             const objectStoreCount = 260; // only made in the first database, all others only get one.
  8 | 
  9 |             var setupDb = async () =>{
 10 |                 console.log("setupDb entered");
 11 |                 let db;
 12 |                 let request;
 13 |                 for(let i = 0; i < databaseCount ; i++){
 14 |                     request = indexedDB.open(`MyTestDatabase${i}`, 1);
 15 |                     request.onerror = function(event) {
 16 |                         console.log(event.target);
 17 |                     };
 18 |                     request.onsuccess = function(event) {
 19 |                         db = event.target.result;  
 20 |                         console.log("setupDb>request.onsuccess");
 21 | 
 22 |                     };
 23 |                     request.onupgradeneeded = function(event) {                     
 24 |                         console.log("setupDb>request.onupgradeneeded");
 25 |                         // Save the IDBDatabase interface 
 26 |                         let db = event.target.result;
 27 |                         
 28 |                         if(i == 0){
 29 |                             //make lots
 30 |                             for(let j = 0; j < objectStoreCount; j++){
 31 |                                 let objectStore = db.createObjectStore(`store${j}`, { keyPath: "id" });    
 32 |                             }
 33 |                         } else{
 34 |                             // Create an objectStore for this database
 35 |                             let objectStore = db.createObjectStore("store", { keyPath: "id" });    
 36 |                         }
 37 |                         
 38 |                     }
 39 |                 }
 40 |                 
 41 |             }
 42 | 
 43 |             var writeRecord = async () =>{
 44 |                 // let db;
 45 |                 // let tx;
 46 |                 // let objectStore;
 47 |                 // let addRequest;
 48 | 
 49 |                 console.log("writeRecord entered");
 50 |                 for(let i = 0; i < databaseCount; i++){
 51 |                     let request = indexedDB.open(`MyTestDatabase${i}`, 1);
 52 |                     request.onerror = function(event) {
 53 |                         console.log(event.target);
 54 |                         };
 55 |                     request.onsuccess = function(event) {
 56 |                         let db = event.target.result;
 57 |                         
 58 |                         if(i == 0){
 59 |                             // do lots
 60 |                             for(let j = 0; j < objectStoreCount; j++){
 61 |                                 let tx = db.transaction(`store${j}`, "readwrite");
 62 |                                 let objectStore = tx.objectStore(`store${j}`);
 63 |                                 
 64 |                                 let addRequest = objectStore.add({"id": "the id", "test_record": `record in db ${i}, store ${j}`});
 65 |                                 addRequest.onsuccess = (ev) => console.log(`record added`);
 66 |                                 addRequest.onerror = (ev) => console.log(`failed to add record ${ev.target.errorCode}`);      
 67 |                             }
 68 |                         } else{
 69 |                             // do one
 70 |                             let tx = db.transaction("store", "readwrite");
 71 |                             let objectStore = tx.objectStore("store");
 72 |                             
 73 |                             let addRequest = objectStore.add({"id": "the id", "test_record": `record in db ${i}`});
 74 |                             addRequest.onsuccess = (ev) => console.log(`record added`);
 75 |                             addRequest.onerror = (ev) => console.log(`failed to add record ${ev.target  }`);  
 76 |                         }
 77 |                              
 78 |                     }
 79 |                 }
 80 |             }
 81 | 
 82 |             var clearStore = async () =>{
 83 |                 let db;
 84 |                 let tx;
 85 |                 let objectStore;
 86 | 
 87 |                 console.log("clearStore entered");
 88 | 
 89 |                 for(let i = 0; i < databaseCount; i++){
 90 |                     let request = indexedDB.open(`MyTestDatabase${i}`, 1);
 91 |                     request.onerror = function(event) {
 92 |                         console.log(event.target);
 93 |                         };
 94 |                     request.onsuccess = function(event) {
 95 |                         db = event.target.result;
 96 |                         
 97 |                         if(i == 0){
 98 |                             // do lots
 99 |                             for(let j = 0; j < objectStoreCount; j++){
100 |                                 let tx = db.transaction(`store${j}`, "readwrite");
101 |                                 objectStore = tx.objectStore(`store${j}`);
102 |                                 objectStore.clear();
103 |                             }
104 |                         } else{
105 |                             // do one
106 |                             tx = db.transaction("store", "readwrite");
107 |                             objectStore = tx.objectStore("store");
108 |                             objectStore.clear()
109 |                         }
110 |                              
111 |                     }
112 |                 }
113 |             }
114 |         </script>
115 |     </head>
116 |         <button type="button" onclick="setupDb()">Setup DB</button>
117 |         <button type="button" onclick="writeRecord()">Write Record(s)</button>
118 |         <br><br>
119 |         <button type="button" onclick="clearStore()">Clear Store</button>
120 |     </body>
121 | </html>


--------------------------------------------------------------------------------
/tools_and_utilities/extras/make_test_indexeddb.html:
--------------------------------------------------------------------------------
  1 | <!DOCTYPE html>
  2 | <html>
  3 |     <head>
  4 |         <meta charset="utf-8">
  5 |         <script>
  6 |             let aRepeatedString = "this string literal is repeated";
  7 |             let aRepeatedStringObject = new String("this string object is repeated");
  8 |             
  9 |             let parent = {"parent": null, "children": []};
 10 |             let child = {"parent": parent, children: []};
 11 |             parent.children.push(child);
 12 | 
 13 |             let linked_one = {}
 14 |             let linked_two = {"prev2": linked_one}
 15 |             let linked_three = {"prev3": linked_two, "next3": linked_one}
 16 |             linked_one["prev1"] = linked_three;
 17 |             linked_one["next1"] = linked_two;
 18 | 
 19 |             let sparse = Array(100);
 20 |             sparse[32] = "ELEMENT AT INDEX 32";
 21 |             sparse[92] = "ELEMENT AT INDEX 92"
 22 | 
 23 |             let ab = new ArrayBuffer(128);
 24 |             let int32buff = new Int32Array(ab);
 25 |             int32buff.set([1, 2, 3, 4, 5, 6], 0);
 26 | 
 27 |             let _objects;
 28 |             let rsaKeyPair; 
 29 |             let ecdsaKeyPair;
 30 |             let hmacKey;
 31 |             let aesKey;
 32 |             
 33 | 
 34 |             Promise.all([
 35 |                 window.crypto.subtle.generateKey(
 36 |                     {
 37 |                         name: "RSA-OAEP",
 38 |                         modulusLength: 4096,
 39 |                         publicExponent: new Uint8Array([1, 0, 1]),
 40 |                         hash: "SHA-256"
 41 |                     },
 42 |                     true,
 43 |                     ["encrypt", "decrypt"]
 44 |                     ).then(x => rsaKeyPair = x),
 45 |                 window.crypto.subtle.generateKey(
 46 |                     {
 47 |                         name: "ECDSA",
 48 |                         namedCurve: "P-384"
 49 |                     },
 50 |                     true,
 51 |                     ["sign", "verify"]
 52 |                     ).then(x => ecdsaKeyPair = x),
 53 |                 window.crypto.subtle.generateKey(
 54 |                     {
 55 |                         name: "HMAC",
 56 |                         hash: {name: "SHA-512"}
 57 |                     },
 58 |                     true,
 59 |                     ["sign", "verify"]
 60 |                     ).then(x => hmacKey = x),
 61 |                 window.crypto.subtle.generateKey(
 62 |                     {
 63 |                         name: "AES-GCM",
 64 |                         length: 256
 65 |                     },
 66 |                     true,
 67 |                     ["encrypt", "decrypt"]
 68 |                     ).then(x => aesKey = x)
 69 |             ]).then(x =>{
 70 |                 _objects = [
 71 |                     {
 72 |                         "id": "basics",
 73 |                         "true": true,
 74 |                         "false": false,
 75 |                         "null": null,
 76 |                         "undefined": undefined,
 77 |                         "string_1a": aRepeatedString,
 78 |                         "string_1b": aRepeatedString,
 79 |                         "string_2a": aRepeatedStringObject,
 80 |                         "string_2b": aRepeatedStringObject,
 81 |                         "the_number_100": 100,
 82 |                         "the_number_1000000000": 1000000000,
 83 |                         "the_number_1.5": 1.5,
 84 |                         "aRegex": /[A-z]{3}/,
 85 |                         "date": new Date(2022, 10, 21, 17, 0, 0)
 86 |                     },
 87 |                     {
 88 |                         "id": "the_one_with_bigints",
 89 |                         "a_BigInt": BigInt("1000"),
 90 |                         "a_neg_bigInt": BigInt("-1000"),
 91 |                         "a_hugeInt": BigInt("100000000000000000000000"),
 92 |                         "beeegInt": 0x0102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f202122232425262728292a2b2c2d2e2fn,
 93 |                     },
 94 |                     {
 95 |                         "id": "the_one_with_collections",
 96 |                         "dense_array": ["one", "two", "three", "four"],
 97 |                         "sparse_array": sparse,
 98 |                         "inner_object": {"key1": "value1", "key2": "value2", "key3": "value3"},
 99 |                         "map": new Map([["map_key1", "map_value1"], ["map_key2", "map_value2"], ["map_key3", "map_value3"]]),
100 |                         "set": new Set(["set_value1", "set_value2", "set_value3", "set_value4"])
101 |                     },
102 |                     {
103 |                         "id": "the_one_with_array_buffers",
104 |                         "array_buffer": ab,
105 |                         "int32_buffer": int32buff
106 |                     },
107 |                     {
108 |                         "id": "the_one_with_cyclic_references",
109 |                         "one_layer": parent,
110 |                         "three_item_cyclic_linked_list": linked_one,
111 |                     },
112 |                     {
113 |                         "id": "the_one_with_unicode",
114 |                         "all_ascii": "hello world",
115 |                         "ascii_plus_latin1": "hélló wórld",
116 |                         "ascii_plus_emoji": "hell😮 world",
117 |                         "all_unicode": "😛😫😋😎"
118 |                     },
119 |                     {
120 |                         "id": "the_one_with_primitives_and_objects",
121 |                         "string_primitive": "hello primitive",
122 |                         "string_object": new String("hello object"),
123 |                         "bool_primitive_true": true,
124 |                         "bool_object_true" : new Boolean(1),
125 |                         "number_primitive_1000": 1000,
126 |                         "number_object_1000": new Number("1000"),
127 |                         "bigint_primitive_abcdefabcdefabcdefabcdef": 0xabcdefabcdefabcdefabcdefn,
128 |                         "bigint_object_abcdefabcdefabcdefabcdef": BigInt("0xabcdefabcdefabcdefabcdef")
129 |                     },
130 |                     {
131 |                         "id": ["an", "array", "of", "text"],
132 |                         "text": "primary key is an array of text"
133 |                     },
134 |                     {
135 |                         "id": 1000,
136 |                         "text": "primary key is the integer 1000"
137 |                     },
138 |                     {
139 |                         "id": ["an", "array", "of", "text", "and", "an", "integer", 1000],
140 |                         "text": "primary is an array of text with a number at the end"
141 |                     },
142 |                     {
143 |                         "id": "a_big_record_to_test_kIDBWrapThreshold_in_chrome",
144 |                         "big_array": [...Array(65536).keys()]
145 |                     },
146 |                     {
147 |                         "id": "the_one_with_crypto_objects",
148 |                         "rsa": rsaKeyPair,
149 |                         "ecdsa": ecdsaKeyPair,
150 |                         "hmac": hmacKey,
151 |                         "aes": aesKey
152 |                     },
153 |                     {
154 |                         "id": "the_one_with_a_blob",
155 |                         "blob": new Blob(
156 |                                 ["here is a blob, a lovely lovely blob, which is different to a file, but also somewhat similar",
157 |                                 "This is the second item in the blob which is used to construct the blob"
158 |                                 ],
159 |                                 {type: "text/plain"}
160 |                                 )
161 |                     },
162 |                     {
163 |                         "id": [["foo"]],
164 |                         "desc": "key is [[\"foo\"]]"
165 |                     },
166 |                     {
167 |                         "id": [[]],
168 |                         "desc": "key is [[]]"
169 |                     },
170 |                     {
171 |                         "id": new Date(2024, 4, 1),
172 |                         "desc": "key is new Date(2024, 4, 1)"
173 |                     },
174 |                     {
175 |                         "id": [[[1,2], 3, [[4], 5, 6], [7, [8, 9]]], 10],
176 |                         "desc": "key is [[[1,2], 3, [[4], 5, 6], [7, [8, 9]]], 10]"
177 |                     }
178 | 
179 |                 ];
180 |             });
181 |             
182 | 
183 |             
184 | 
185 | 
186 |             var setupDb = async () =>{
187 |                 console.log("setupDb entered");
188 |                 let db;
189 |                 let request = indexedDB.open("MyTestDatabase", 1);
190 |                 //var imageBitmap = await createImageBitmap(document.querySelector("#anImage"));
191 |                 request.onerror = function(event) {
192 |                     console.log(event.target);
193 |                     };
194 |                 request.onsuccess = function(event) {
195 |                     db = event.target.result;  
196 |                     console.log("setupDb>request.onsuccess");
197 | 
198 |                 };
199 |                 request.onupgradeneeded = function(event) {                     
200 |                     console.log("setupDb>request.onupgradeneeded");
201 |                     // Save the IDBDatabase interface 
202 |                     let db = event.target.result;
203 | 
204 |                     // Create an objectStore for this database
205 |                     let objectStore = db.createObjectStore("store", { keyPath: "id" });
206 |                 }
207 |             }
208 | 
209 |             var writeRecord = async () =>{
210 |                 let db;
211 |                 let tx;
212 |                 let objectStore;
213 |                 let addRequest;
214 | 
215 |                 console.log("writeRecord entered");
216 | 
217 |                 let request = indexedDB.open("MyTestDatabase", 1);
218 |                 //var imageBitmap = await createImageBitmap(document.querySelector("#anImage"));
219 |                 request.onerror = function(event) {
220 |                     console.log(event.target);
221 |                     };
222 |                 request.onsuccess = function(event) {
223 |                     db = event.target.result;
224 |                     tx = db.transaction("store", "readwrite");
225 |                     objectStore = tx.objectStore("store");
226 |                     
227 |                     for(let obj of _objects){
228 |                         addRequest = objectStore.add(obj);
229 |                         addRequest.onsuccess = (ev) => console.log(`record added ${obj}`);
230 |                         addRequest.onerror = (ev) => console.log(`failed to add record ${ev.target.errorCode}\n${obj}`);
231 |                     }
232 | 
233 |                     //Add the file(s)
234 |                     let fileInput1 = document.querySelector("#myfiles1");
235 |                     let files1 = fileInput1.files;
236 |                     let file1 = files1[0];
237 |                     
238 |                     let fileInput2 = document.querySelector("#myfiles2");
239 |                     let files2 = fileInput2.files;
240 |                     let file2 = files2[0];
241 | 
242 |                     addRequest = objectStore.add({
243 |                         "id": "the one with files",
244 |                         "one file": file1,
245 |                         "two file": file2,
246 |                         "the first file again": file1
247 | 
248 |                         //"file list": files  //Firefox won't serialize these?
249 |                     });
250 |                     
251 |                     addRequest.onsuccess = (ev) => console.log(`file record added`);
252 |                     addRequest.onerror = (ev) => console.log(`failed to add file record ${ev.target.errorCode}`);
253 | 
254 |                     // Add a big long record plus a file to show how mozilla deals with external data + a file
255 | 
256 |                     addRequest = objectStore.add({
257 |                         "id": "a_big_record_to_test_kIDBWrapThreshold_in_chrome_plus_a_file_to_check_how_mozilla_does_that",
258 |                         "big_array": [...Array(65536).keys()],
259 |                         "a_file": file1
260 |                     },)
261 | 
262 |                 };
263 |             }
264 | 
265 |             var clearStore = async () =>{
266 |                 let db;
267 |                 let tx;
268 |                 let objectStore;
269 | 
270 |                 console.log("clearStore entered");
271 | 
272 |                 let request = indexedDB.open("MyTestDatabase", 1);
273 |                 
274 |                 //var imageBitmap = await createImageBitmap(document.querySelector("#anImage"));
275 |                 request.onerror = function(event) {
276 |                     console.log(event.target);
277 |                     };
278 |                 request.onsuccess = function(event) {
279 |                     db = event.target.result;
280 |                     tx = db.transaction("store", "readwrite");
281 |                     objectStore = tx.objectStore("store");
282 |                     objectStore.clear();
283 |                 };
284 |             }
285 |         </script>
286 |     </head>
287 |     <body>
288 |         <input id="myfiles1" multiple type="file">
289 |         <input id="myfiles2" type="file">
290 |         <br><br>
291 |         <button type="button" onclick="setupDb()">Setup DB</button>
292 |         <button type="button" onclick="writeRecord()">Write Record(s)</button>
293 |         <br><br>
294 |         <button type="button" onclick="clearStore()">Clear Store</button>
295 |     </body>
296 | </html>


--------------------------------------------------------------------------------
/tools_and_utilities/extras/make_webstorage.html:
--------------------------------------------------------------------------------
 1 | <html>
 2 |     <head>
 3 |         <script>
 4 |             const KNOWN_LS_KEY = "ls_a_known_key";
 5 |             const KNOWN_SS_KEY = "ss_a_known_key";
 6 | 
 7 |             function makeLs(){
 8 |                 const tsNow = Date.now().toString();
 9 |                 localStorage.setItem(KNOWN_LS_KEY, `initial value set at ${tsNow}`);
10 |                 localStorage.setItem(`ls_property_with_ascii_${tsNow}`, "just ascii");
11 |                 localStorage.setItem(`ls_property_with_latin1_${tsNow}`, "latin-1 in a café");
12 |                 localStorage.setItem(`ls_property_in_greek_${tsNow}`, "τοπική αποθήκευση");
13 |                 localStorage.setItem(`ls_property_with_emoji_${tsNow}`, "😎🆒");
14 |                 localStorage.setItem(`ls_property_which_whould_be_shorter_as_utf_16_${tsNow}`, "\uff11\uff12\uff13\uff14")
15 |                 localStorage.setItem(`ls_property_which_is_long_${tsNow}`, "here is some long data which contains this string repeated a thousand times |".repeat(1000));
16 |                 console.log(`localstorage set with timestamp: ${tsNow}`);
17 |             }
18 |             
19 |             function makeSs(){
20 |                 const tsNow = Date.now().toString();
21 |                 sessionStorage.setItem(KNOWN_SS_KEY, `initial value set at ${tsNow}`);
22 |                 sessionStorage.setItem(`ss_property_with_ascii_${tsNow}`, "just ascii");
23 |                 sessionStorage.setItem(`ss_property_with_latin1_${tsNow}`, "latin-1 in a café");
24 |                 sessionStorage.setItem(`ss_property_in_greek_${tsNow}`, "τοπική αποθήκευση");
25 |                 sessionStorage.setItem(`ss_property_with_emoji_${tsNow}`, "😎🆒");
26 |                 sessionStorage.setItem(`ss_property_which_whould_be_shorter_as_utf_16_${tsNow}`, "\uff11\uff12\uff13\uff14")
27 |                 sessionStorage.setItem(`ss_property_which_is_long_${tsNow}`, "here is some long data which contains this string repeated a thousand times |".repeat(1000));
28 |                 console.log(`sessionstorage set with timestamp: ${tsNow}`);
29 |             }
30 | 
31 |             function updateLs(){
32 |                 const tsNow = Date.now().toString();
33 |                 localStorage.setItem(KNOWN_LS_KEY, `updated value set at ${tsNow}`);
34 |                 console.log(`localstorage updated with timestamp: ${tsNow}`);
35 |             }
36 | 
37 |             function updateSs(){
38 |                 const tsNow = Date.now().toString();
39 |                 sessionStorage.setItem(KNOWN_SS_KEY, `updated value set at ${tsNow}`);
40 |                 console.log(`sessionstorage updated with timestamp: ${tsNow}`);
41 |             }
42 | 
43 |             function readLs(){
44 |                 const tsNow = Date.now().toString();
45 |                 let lsVal = localStorage.getItem(KNOWN_LS_KEY);
46 |                 console.log(`Got local storage value ${KNOWN_LS_KEY} at ${tsNow}`);
47 |             }
48 | 
49 |             function readSs(){
50 |                 const tsNow = Date.now().toString();
51 |                 let ssVal = sessionStorage.getItem(KNOWN_SS_KEY);
52 |                 console.log(`Got session storage value ${KNOWN_SS_KEY} at ${tsNow}`);
53 |             }
54 | 
55 |             
56 |         </script>
57 |     </head>
58 |     <body>
59 |         <button onclick="makeLs()">Make Localstorage Records</button>
60 |         <button onclick="makeSs()">Make Sessionstorage Records</button>
61 |         <button onclick="localStorage.clear()">Clear Localstorage</button>
62 |         <button onclick="sessionStorage.clear()">Clear Sessionstorage</button>
63 |         <button onclick="updateLs()">Update Localstorage</button>
64 |         <button onclick="updateSs()">Update Sessionstorage</button>
65 |         <button onclick="readLs()">Read Localstorage</button>
66 |         <button onclick="readSs()">Read Sessionstorage</button>
67 |     </body>
68 | </html>


--------------------------------------------------------------------------------