├── README.md
├── img
│   ├── 1.png
│   ├── 2.png
│   ├── 3.png
│   └── 4.png
├── msidump.py
├── requirements.txt
└── test-cases
    ├── README.md
    ├── putty-backdoored.msi.bin
    ├── sample1-run-autoruns64.msi.bin
    ├── sample2-run-calc-script.msi.bin
    ├── sample3-run-calc-shellcode-via-dotnet.msi.bin
    └── sample4-customaction-run-calc.msi.bin


/README.md:
--------------------------------------------------------------------------------
# `msidump`

**MSI Dump** - a tool that analyzes malicious MSI installation packages, extracts files, streams and binary data, and incorporates a YARA scanner.

On macro-enabled Office documents we can quickly use [oletools mraptor](https://github.com/decalage2/oletools/blob/master/oletools/mraptor.py) to determine whether a document is malicious. If we want to dissect it further, we can bring in [oletools olevba](https://github.com/decalage2/oletools/blob/master/oletools/olevba.py) or [oledump](https://github.com/DidierStevens/DidierStevensSuite/blob/master/oledump.py).

To dissect malicious MSI files, so far we had only one tool, the reliable and trustworthy [lessmsi](https://github.com/activescott/lessmsi).
However, `lessmsi` doesn't implement the features I was looking for:

- Quick triage
- Binary data extraction
- YARA scanning

This is where `msidump` comes into play.


## Features

This tool helps with quick triage as well as detailed examination of malicious MSI corpora.
It lets us:

- Quickly determine whether a file is suspicious or not
- List all MSI tables as well as dump specific records
- Extract Binary data, all files from CABs, scripts from CustomActions
- Scan all inner data and records with YARA rules
- Use `file`/MIME type deduction to determine inner data type

It was created as a companion tool to the blog post I released here:

- [MSI Shenanigans. 
Part 1 - Offensive Capabilities Overview](https://mgeeky.tech/msi-shenanigans-part-1/)


### Limitations

- The program is still in an early alpha version; things are expected to break, and the triaging/parsing logic is likely to change.
- Because this tool relies heavily on the Win32 COM `WindowsInstaller.Installer` interfaces, currently **it is not possible to support native Linux** platforms. Maybe `wine python msidump.py` could help, but I haven't tried that yet.


## Use Cases

1. Perform a quick triage of a suspicious MSI, augmented with a YARA rule:

```
cmd> python msidump.py evil.msi -y rules.yara
```

![1.png](img/1.png)

Here we can see that the input MSI is injected with a suspicious **VBScript** and contains numerous executables.


2. Now we want to take a closer look at this VBScript by extracting only that record.

We see from the triage table that it was present in the `Binary` table. Let's get it:

```
python msidump.py putty-backdoored.msi -l binary -i UBXtHArj
```

We can specify which record to dump either by its name/ID or by its index number (here that would be 7).

![2.png](img/2.png)

Let's have a look at another example. This time there is an executable stored in the `Binary` table that will be executed during installation:

![3.png](img/3.png)

To extract that file we're gonna go with

```
python msidump.py evil2.msi -x binary -i lmskBju -O extracted
```

Where
- `-x binary` tells the tool to extract the contents of the `Binary` table
- `-i lmskBju` specifies exactly which record to extract
- `-O extracted` sets the output directory

![4.png](img/4.png)


For the best output experience, run the tool in a **maximized console window** or redirect output to a file:

```
python msidump.py [...] -o analysis.log
```

## Full Usage

```
PS D:\> python .\msidump.py --help
options:
  -h, --help            show this help message and exit

Required arguments:
  infile                Input MSI file (or directory) for analysis.

Options:
  -q, --quiet           Surpress banner and unnecessary information. In triage mode, will display only verdict.
  -v, --verbose         Verbose mode.
  -d, --debug           Debug mode.
  -N, --nocolor         Dont use colors in text output.
  -n PRINT_LEN, --print-len PRINT_LEN
                        When previewing data - how many bytes to include in preview/hexdump. Default: 128
  -f {text,json,csv}, --format {text,json,csv}
                        Output format: text, json, csv. Default: text
  -o path, --outfile path
                        Redirect program output to this file.
  -m, --mime            When sniffing inner data type, report MIME types

Analysis Modes:
  -l what, --list what  List specific table contents. See help message to learn what can be listed.
  -x what, --extract what
                        Extract data from MSI. For what can be extracted, refer to help message.

Analysis Specific options:
  -i number|name, --record number|name
                        Can be a number or name. In --list mode, specifies which record to dump/display entirely. In --extract mode dumps only this particular record to --outdir
  -O path, --outdir path
                        When --extract mode is used, specifies output location where to extract data.
  -y path, --yara path  Path to YARA rule/directory with rules. YARA will be matched against Binary data, streams and inner files

------------------------------------------------------

- What can be listed:
    --list CustomAction     - Specific table
    --list Registry,File    - List multiple tables
    --list stats            - Print MSI database statistics
    --list all              - All tables and their contents
    --list olestream        - Prints all OLE streams & storages.
                              To display CABs embedded in MSI try: --list _Streams
    --list cabs             - Lists embedded CAB files
    --list binary           - Lists binary data embedded in MSI for its own purposes.
                              That typically includes EXEs, DLLs, VBS/JS scripts, etc

- What can be extracted:
    --extract all           - Extracts Binary data, all files from CABs, scripts from CustomActions
    --extract binary        - Extracts Binary data
    --extract files         - Extracts files
    --extract cabs          - Extracts cabinets
    --extract scripts       - Extracts scripts

------------------------------------------------------
```

## TODO

- Triaging logic is still a bit flaky and I'm not very proud of it, so it will be subject to repeated redesign and refinement
- Test it on a wider corpus of samples
- Add support for password-protected input ZIP archives
- Add support for ingesting an entire directory of YARA rules instead of working with a single file only
- Currently, the tool matches malicious `CustomAction Type`s based on assessing their numbers, which is prone to being evaded.
    - It needs to be reworked to properly consume the Type number and decompose it [into flags](https://learn.microsoft.com/en-us/windows/win32/msi/summary-list-of-all-custom-action-types)


## Tool's Name

Apparently, when naming my tool, I didn't think to check whether the name was already taken.
There is another tool named `msidump`, part of the GNOME [msitools](https://gitlab.gnome.org/GNOME/msitools) package:

- [msidump](https://wiki.gnome.org/msitools)

---

### ☕ Show Support ☕

This and other projects are the outcome of sleepless nights and **plenty of hard work**. If you like what I do and appreciate that I always give back to the community,
[Consider buying me a coffee](https://github.com/sponsors/mgeeky) _(or better a beer)_ just to say thank you! 
💪

---

```
Mariusz Banach / mgeeky, (@mariuszbit)

```

--------------------------------------------------------------------------------
/img/1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mgeeky/msidump/40833694ebba0188f4f6e0d0bf5fd89a223775be/img/1.png

--------------------------------------------------------------------------------
/img/2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mgeeky/msidump/40833694ebba0188f4f6e0d0bf5fd89a223775be/img/2.png

--------------------------------------------------------------------------------
/img/3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mgeeky/msidump/40833694ebba0188f4f6e0d0bf5fd89a223775be/img/3.png

--------------------------------------------------------------------------------
/img/4.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mgeeky/msidump/40833694ebba0188f4f6e0d0bf5fd89a223775be/img/4.png

--------------------------------------------------------------------------------
/msidump.py:
--------------------------------------------------------------------------------
#!/usr/bin/python3
#
# Written by Mariusz Banach, @mariuszbit / mgeeky
#

import sys
import os
import re
import glob
import pefile
import argparse
import hashlib
import random
import string
import tempfile
import textwrap
import cabarchive
import shutil
import atexit
import urllib.parse     # quote_plus() is used when reporting COM errors
from collections import OrderedDict
from textwrap import fill

if sys.platform != 'win32':
    print('\n\n[!] FATAL: This script can only be used on Windows systems as it works with Win32 COM/OLE interfaces.\n\n')
    # Bail out here: the pywin32 imports below cannot succeed on other platforms.
    sys.exit(1)

import pythoncom
import win32com.client
from win32com.shell import shell, shellcon
from win32com.client import constants

USE_SSDEEP = False

try:
    import ssdeep
    USE_SSDEEP = True
except Exception:
    # ssdeep is optional; fuzzy hashing is simply skipped when it is missing.
    quiet = False
    # for a in sys.argv:
    #     if a == '-q' or a == '--quiet':
    #         quiet = True
    #         break
    # if not quiet:
    #     print("[!] 'ssdeep' not installed. Will not use it.")

try:
    import colorama
    import magic
    import yara
    import olefile
    from prettytable import PrettyTable

except ImportError as e:
    print(f'\n[!] Requirements not installed: {e}\n\tInstall them with:\n\tcmd> pip install -r requirements.txt\n')
    sys.exit(1)

#########################################################

VERSION = '0.2'

#########################################################

options = {
    'debug' : False,
    'verbose' : False,
    'format' : 'text',
}

logger = None

try:
    colorama.init()
except Exception:
    pass

class Logger:
    colors_map = {
        'red': colorama.Fore.RED,
        'green': colorama.Fore.GREEN,
        'yellow': colorama.Fore.YELLOW,
        'blue': colorama.Fore.BLUE,
        'magenta': colorama.Fore.MAGENTA,
        'cyan': colorama.Fore.CYAN,
        'white': colorama.Fore.WHITE,
        'grey': colorama.Fore.WHITE,
        'reset': colorama.Style.RESET_ALL,
    }

    def __init__(self, opts):
        self.opts = opts

    @staticmethod
    def colorize(txt, col):
        if type(txt) is not str:
            txt = str(txt)
        if col not in Logger.colors_map or options.get('nocolor', False):
            return txt
        return Logger.colors_map[col] + txt + Logger.colors_map['reset']

    @staticmethod
    def stripColors(txt):
        # Matches ANSI escape sequences so they can be removed from text.
        ansi_escape = re.compile(r'\x1B(?:[@-Z\\-_]|\[[0-?]*[ -/]*[@-~])')
        result = ansi_escape.sub('', txt)
        return result

    def fatal(self, txt):
        self.text('[!] ' + txt, color='red')
        sys.exit(1)

    def info(self, txt):
        self.text('[.] ' + txt, color='yellow')

    def err(self, txt):
        self.text('[-] ' + txt, color='red')

    def ok(self, txt):
        self.text('[+] ' + txt, color='green')

    def verbose(self, txt):
        if self.opts.get('verbose', False) or self.opts.get('debug', False):
            self.text('[>] ' + txt, color='cyan')

    def dbg(self, txt):
        if self.opts.get('debug', False):
            self.text('[dbg] ' + txt, color='magenta')

    def text(self, txt, color='none'):
        if color != 'none':
            txt = Logger.colorize(txt, color)

        if not self.opts.get('quiet', False):
            print(txt)


class MSIDumper:
    # https://learn.microsoft.com/pl-pl/windows/win32/msi/custom-action-return-processing-options?redirectedfrom=MSDN
    CustomActionReturnType = {
        'check' : 0,
        'ignore' : 64,
        'asyncWait' : 128,
        'asyncNoWait' : 192,
    }

    # https://learn.microsoft.com/en-us/windows/win32/msi/custom-action-execution-scheduling-options
    CustomActionExecuteType = {
        'always' : 0,
        'firstSequence' : 256,
        'oncePerProcess' : 512,
        'clientRepeat' : 768
    }

    #
    # https://learn.microsoft.com/en-us/windows/win32/msi/custom-action-in-script-execution-options
    # Deferred, rollback and commit custom actions can only be placed between InstallInitialize and InstallFinalize
    #
    CustomActionInScriptExecute = {
        'immediate' : 0,
        'deferred' : 1024,    # msidbCustomActionTypeInScript (0x400)
        'rollback' : 1280,
        'commit' : 1536,
        'deferred-no-impersonate' : 3072,
        'rollback-no-impersonate' : 3328,
        'commit-no-impersonate' : 3584,
    }

    # https://learn.microsoft.com/en-us/windows/win32/msi/summary-list-of-all-custom-action-types
    CustomActionNativeTypes = {
        'dll-in-binary-table' : 1,
        'exe-in-binary-table' : 2,
        'jscript-in-binary-table' : 5,
        'vbscript-in-binary-table' : 6,
        'dll-installed-with-product' : 17,
        'exe-installed-with-product' : 18,
        'jscript-installed-with-product' : 21,
        'vbscript-installed-with-product' : 22,
        'exe-with-directory-path-in-target' : 34,
        'directory-set' : 35,
        'jscript-in-sequence-table' : 37,
        'vbscript-in-sequence-table' : 38,
        'exe-command-line' : 50,
        'jscript-with-funcname-in-property' : 53,
        'vbscript-with-funcname-in-property' : 54,
    }

    OpenMode = {
        'msiOpenDatabaseModeReadOnly' : 0,
        'msiOpenDatabaseModeTransact' : 1,
    }

    SkipColumns = (
        'extendedtype',
    )

    ListModes = (
        'all', 'olestream', 'cabs', 'binary', 'stats', 'olestreams',
    )

    ExtractModes = (
        'all', 'binary', 'files', 'cabs', 'scripts',
    )

    KnownCOMErrors = {
        0x80004005 : 'Could not process input database',
    }

    KnownTables = (
        'ActionText', 'AdminExecuteSequence', 'AdminUISequence', 'AdvtExecuteSequence', 'AdvtUISequence',
        'AppId', 'AppSearch', 'BBControl', 'Billboard', 'Binary', 'BindImage', 'CCPSearch', 'CheckBox',
        'Class', 'ComboBox', 'CompLocator', 'Complus', 'Component', 'Condition', 'Control', 'ControlCondition',
        'ControlEvent', 'CreateFolder', 'CustomAction', 'Dialog', 'Directory', 'DrLocator',
        'DuplicateFile', 'Environment', 'Error', 'EventMapping', 'Extension', 'Feature', 'FeatureComponents',
        'File', 'FileSFPCatalog', 'Font', 'Icon', 'IniFile', 'IniLocator', 'InstallExecuteSequence',
        'InstallUISequence', 'IsolatedComponent', 'LaunchCondition', 'ListBox', 'ListView', 'LockPermissions',
        'Media', 'MIME', 'MoveFile', 'MsiAssembly', 
'MsiAssemblyName', 'MsiDigitalCertificate', 215 | 'MsiDigitalSignature', 'MsiEmbeddedChainer', 'MsiEmbeddedUI', 'MsiFileHash', 'MsiLockPermissionsEx', 216 | 'MsiPackageCertificate', 'MsiPatchCertificate', 'MsiPatchHeaders', 'MsiPatchMetadata', 'MsiPatchOldAssemblyFile', 217 | 'MsiPatchOldAssemblyName', 'MsiPatchSequence', 'MsiServiceConfig', 'MsiServiceConfigFailureActions', 218 | 'MsiSFCBypass', 'MsiShortcutProperty', 'ODBCAttribute', 'ODBCDataSource', 'ODBCDriver', 'ODBCSourceAttribute', 219 | 'ODBCTranslator', 'Patch', 'PatchPackage', 'ProgId', 'Property', 'PublishComponent', 'RadioButton', 220 | 'Registry', 'RegLocator', 'RemoveFile', 'RemoveIniFile', 'RemoveRegistry', 'ReserveCost', 'SelfReg', 221 | 'ServiceControl', 'ServiceInstall', 'SFPCatalog', 'Shortcut', 'Signature', 'TextStyle', 'TypeLib', 'UIText', 222 | 'Upgrade', 'Verb', '_Columns', '_Storages', '_Streams', '_Tables', '_TransformView', '_Validation', 223 | ) 224 | 225 | ImportantTables = ( 226 | 'CustomAction', 'InstallExecuteSequence', '_Streams', 'Media', 'InstallUISequence', 'Binary', '_TransformView', 227 | 'Component', 'Registry', 'Shortcut', 'RemoveFile', 'File', 228 | ) 229 | 230 | SuspiciousTables = ( 231 | 'CustomAction', 'Binary', '_Streams', 232 | ) 233 | 234 | # 235 | # Approach based on assessing CustomAction Type numbers is prone to being evaded. 
236 | # TODO: Rework it to properly consume Type number and decompose it onto flags: 237 | # https://learn.microsoft.com/en-us/windows/win32/msi/summary-list-of-all-custom-action-types 238 | # 239 | CustomActionTypes = { 240 | 'Execute' : { 241 | 'color' : 'red', 242 | 'types': (1250, 3298, 226), 243 | 'desc' : 'Will execute system commands or other executables', 244 | }, 245 | 'VBScript' : { 246 | 'color' : 'red', 247 | 'types': (1126, 102), 248 | 'desc' : 'Will run VBScript in-memory', 249 | }, 250 | 'JScript' : { 251 | 'color' : 'red', 252 | 'types': (1125, 101), 253 | 'desc' : 'Will run JScript in-memory', 254 | }, 255 | 'Run-Exe' : { 256 | 'color' : 'red', 257 | 'types': (1218, 194), 258 | 'desc' : 'Will extract executable from inner Binary table, drop it to:\n C:\\Windows\\Installer\\MSIXXXX.tmp\nand then run it.', 259 | }, 260 | 'Load-DLL' : { 261 | 'color' : 'red', 262 | 'types': (65, ), 263 | 'desc' : 'Will load DLL in memory and invoke its exported function.\nThat may also include .NET DLL', 264 | }, 265 | 'Run-Dropped-File' : { 266 | 'color' : 'red', 267 | 'types': (1746,), 268 | 'desc' : 'Will run file extracted as a result of installation', 269 | }, 270 | 'Set-Directory' : { 271 | 'color' : 'cyan', 272 | 'types': (51,), 273 | 'desc' : 'Will set Directory to a specific path', 274 | }, 275 | } 276 | 277 | MimeTypesThatIncreasSuspiciousScore = ( 278 | "application/hta", 279 | "application/js", 280 | "application/msword", 281 | "application/vnd.ms-excel", 282 | "application/vnd.ms-powerpoint", 283 | "application/vns.ms-appx", 284 | "application/x-ms-shortcut", 285 | "application/x-vbs", 286 | 'application/vnd.ms-excel', 287 | 'application/vnd.openxmlformats-officedocument.presentationml.presentation', 288 | 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet', 289 | 'application/vnd.openxmlformats-officedocument.wordprocessingml.document', 290 | 'application/x-dosexec', 291 | ) 292 | 293 | RecognizedInnerFileTypes = { 294 | 'cabinet' : { 295 
| 'indicator' : 'MS Cabinet archive (.CAB)', 296 | 'safe-extension' : '.cab', 297 | 'color' : 'yellow', 298 | 'magic' : ('Microsoft Cabinet',) 299 | }, 300 | 'executable' : { 301 | 'indicator' : 'PE executable (EXE)', 302 | 'safe-extension' : '.exe.bin', 303 | 'color' : 'red', 304 | 'magic' : ( 305 | 'executable (console)', 306 | 'executable (GUI)', 307 | ) 308 | }, 309 | 'dll' : { 310 | 'indicator' : 'PE executable (DLL)', 311 | 'safe-extension' : '.dll.bin', 312 | 'color' : 'red', 313 | 'magic' : ( 314 | 'executable (DLL)', 315 | ) 316 | }, 317 | 'unsure-executable' : { 318 | 'indicator' : 'PE executable (?)', 319 | 'safe-extension' : '.exe.bin', 320 | 'color' : 'red', 321 | 'min-keywords' : 3, 322 | 'keywords' : ( 323 | 'This program', 'cannot be', 'run in', 'dos mode', 324 | ), 325 | }, 326 | 'unsure-cabinet' : { 327 | 'indicator' : 'CAB archive (?)', 328 | 'safe-extension' : '.cab', 329 | 'color' : 'yellow', 330 | 'min-keywords' : 1, 331 | 'keywords' : ( 332 | 'MSCF', 333 | ), 334 | }, 335 | 'unsure-vbscript' : { 336 | 'indicator' : 'VBScript (?)', 337 | 'safe-extension' : '.vbs.bin', 338 | 'color' : 'red', 339 | 'printable' : True, 340 | 'min-keywords' : 3, 341 | 'keywords' : ( 342 | 'dim', 'function ', 'sub ', 'createobject', 'getobject', 'with', 'string', 343 | 'object', 'set', 'then', 'end if', 'end function', 'end sub' 344 | ), 345 | 'not-keywords' : ( 346 | '' 442 | 443 | if type(data) is str: 444 | data = data.encode() 445 | 446 | for i in range(0, num, 16): 447 | line = '' 448 | line += '%04x | ' % (addr + i) 449 | n += 16 450 | 451 | for j in range(n-16, n): 452 | if j >= len(data): break 453 | line += '%02x ' % (int(data[j]) & 0xff) 454 | 455 | line += ' ' * (3 * 16 + 7 - len(line)) + ' | ' 456 | 457 | for j in range(n-16, n): 458 | if j >= len(data): break 459 | c = data[j] if not (data[j] < 0x20 or data[j] > 0x7e) else '.' 
460 | line += '%c' % c 461 | 462 | lines.append(line) 463 | return '\n'.join(lines) 464 | 465 | def parseCOMException(self, message, error, additional=''): 466 | code = error.hresult + 2**32 467 | code2 = 0 468 | 469 | try: 470 | code2 = error.excepinfo[-1] + 2**32 471 | except: 472 | pass 473 | 474 | if code2 != 0: 475 | if code in MSIDumper.KnownCOMErrors: 476 | additional += MSIDumper.KnownCOMErrors[code] 477 | 478 | if code2 in MSIDumper.KnownCOMErrors: 479 | additional += MSIDumper.KnownCOMErrors[code2] 480 | 481 | self.logger.err(f'''{message}: 482 | 483 | {error} 484 | 485 | HRESULT 1: 0x{code:08X} <-- General exception code 486 | 487 | HRESULT 2: 0x{code2:08X} <-- COM exception code. Google up that error number: 488 | https://google.com/?q={urllib.parse.quote_plus(f"COM exception 0x{code2:08X}")} 489 | 490 | {additional} 491 | ''') 492 | 493 | else: 494 | if code in MSIDumper.KnownCOMErrors: 495 | additional += MSIDumper.KnownCOMErrors[code] 496 | 497 | self.logger.err(f'''{message}: 498 | 499 | {error} 500 | 501 | HRESULT: 0x{code:08X} <-- General exception code 502 | 503 | {additional} 504 | ''') 505 | 506 | def open(self, infile): 507 | self.infile = os.path.abspath(os.path.normpath(infile)) 508 | self.outdir = os.path.abspath(os.path.normpath(self.options.get('outdir', ''))) 509 | 510 | if not os.path.isfile(self.infile): 511 | self.logger.fatal(f'Input file does not exist: {self.infile}') 512 | 513 | mode = MSIDumper.OpenMode['msiOpenDatabaseModeReadOnly'] 514 | 515 | if self.disinfectionMode: 516 | self.logger.fatal('MSI Disinfection is not yet implemented.') 517 | mode = MSIDumper.OpenMode['constants.msiOpenDatabaseModeTransact'] 518 | 519 | self.initCOM() 520 | 521 | try: 522 | self.logger.dbg(f'Opening database {self.infile} ...') 523 | self.nativedb = self.installer.OpenDatabase( 524 | self.infile, 525 | mode 526 | ) 527 | 528 | return True 529 | 530 | except pythoncom.com_error as error: 531 | if self.options['debug']: 532 | 
self.parseCOMException( 533 | message=f"Could not open MSI database natively via COM", 534 | error=error 535 | ) 536 | 537 | return False 538 | 539 | def close(self): 540 | if self.nativedb is not None: 541 | self.nativedb = None 542 | 543 | if self.installer is not None: 544 | try: 545 | self.installer.Release() 546 | except: 547 | pass 548 | 549 | self.installer = None 550 | 551 | def initCOM(self): 552 | if self.installer is not None: 553 | return 554 | 555 | try: 556 | # 557 | # Logic borrowed from: 558 | # https://github.com/orestis/python/blob/master/Tools/msi/msilib.py#L60 559 | # 560 | 561 | self.logger.dbg('Initializing COM and instantiating WindowsInstaller.Installer ...') 562 | pythoncom.CoInitialize() 563 | 564 | win32com.client.gencache.EnsureModule('{000C1092-0000-0000-C000-000000000046}', 1033, 1, 0) 565 | 566 | self.installer = win32com.client.Dispatch( 567 | 'WindowsInstaller.Installer', 568 | resultCLSID='{000C1090-0000-0000-C000-000000000046}' 569 | ) 570 | 571 | if self.installer is None: 572 | self.logger.fatal('Could not instantiate WindowsInstaller.Installer!') 573 | 574 | except Exception as e: 575 | self.logger.fatal(f'Could not instantiate WindowsInstaller.Installer. 
Exception:\n\n\t{e}') 576 | 577 | def collectEntries(self, table, dontSort = False): 578 | entries = [] 579 | 580 | try: 581 | entries = self._collectEntries( 582 | table, 583 | dontSort 584 | ) 585 | except Exception as e: 586 | self.logger.dbg(f'Error: Table {table} did not contain any records.') 587 | 588 | if self.options.get('debug', False) and table.lower() != '_streams': 589 | raise 590 | 591 | return entries 592 | 593 | def _collectEntries(self, table, dontSort = False): 594 | assert self.nativedb is not None, "Database is not opened" 595 | entries = [] 596 | 597 | view = self.nativedb.OpenView(f"SELECT * FROM {table}") 598 | view.Execute(None) 599 | 600 | types = view.ColumnInfo(constants.msiColumnInfoTypes) 601 | names = view.ColumnInfo(constants.msiColumnInfoNames) 602 | columns = [] 603 | 604 | for i in range(1, types.FieldCount+1): 605 | t = types.StringData(i) 606 | n = names.StringData(i) 607 | 608 | if t[0] in 'slSL': 609 | columns.append((n, 'str')) 610 | elif t[0] in 'iI': 611 | columns.append((n, 'int')) 612 | elif t[0] == 'v': 613 | columns.append((n, 'bin')) 614 | else: 615 | self.logger.dbg(f'Unsupported column type: table {table}, column: {i}. 
Type: {t}, Name: {n}') 616 | columns.append((n, '?')) 617 | 618 | while True: 619 | r = view.Fetch() 620 | if not r: 621 | break 622 | 623 | rec = OrderedDict() 624 | for i in range(1, r.FieldCount+1): 625 | val = None 626 | name = columns[i-1][0] 627 | 628 | if r.IsNull(i): 629 | val = '' 630 | 631 | elif columns[i-1][1] == 'str': 632 | try: 633 | val = r.StringData(i) 634 | 635 | except Exception as e: 636 | txt = f'Could not convert {table} column {columns[i-1][0]} value to string (type: {columns[i-1][1]}): {e}' 637 | if txt not in self.errorsCache: 638 | self.logger.dbg(txt) 639 | self.errorsCache.add(txt) 640 | val = '' 641 | 642 | elif columns[i-1][1] == 'int': 643 | try: 644 | val = r.IntegerData(i) 645 | except Exception as e: 646 | txt = f'Could not convert {table} column {columns[i-1][0]} value to integer (type: {columns[i-1][1]}): {e}' 647 | if txt not in self.errorsCache: 648 | self.logger.dbg(txt) 649 | self.errorsCache.add(txt) 650 | val = 0 651 | 652 | elif columns[i-1][1] == 'bin': 653 | size = r.DataSize(i) 654 | val = r.ReadStream(i, size, constants.msiReadStreamBytes) 655 | 656 | rec[columns[i-1][0].lower()] = val 657 | 658 | entries.append(rec) 659 | 660 | view.Close() 661 | 662 | if not dontSort and table in MSIDumper.TableSortBy: 663 | entries = sorted(entries, key=lambda x: list(x.values())[MSIDumper.TableSortBy[table]] ) 664 | 665 | self.logger.dbg(f'Collected {len(entries)} entries from {table} ...') 666 | return entries 667 | 668 | def getMaxValueFromTable(self, table, columnNum): 669 | maxVal = -1 670 | entries = self.collectEntries(table) 671 | 672 | for entry in entries: 673 | if maxVal < entry[columnNum]: 674 | maxVal = entry[columnNum] 675 | 676 | return maxVal 677 | 678 | def analyse(self): 679 | assert self.nativedb is not None, "Database is not opened" 680 | 681 | try: 682 | ret = self.analysisWorker() 683 | 684 | if self.grade > 0: 685 | self.verdict = f'[.] 
Verdict: {Logger.colorize("SUSPICIOUS", "red")}' 686 | 687 | self.logger.verbose(f'Verdict grade: {self.grade}') 688 | 689 | return ret 690 | 691 | except Exception as e: 692 | if self.nativedb is not None: 693 | self.nativedb = None 694 | 695 | if self.options['debug']: 696 | raise 697 | else: 698 | self.logger.err(f'Could not analyse input MSI. Enable --debug to learn more. Exception: {e}') 699 | 700 | return False 701 | 702 | finally: 703 | pass 704 | 705 | def listTable(self, table): 706 | if ',' in table: 707 | output = '' 708 | tables = table.split(',') 709 | for t in tables: 710 | output += f'{Logger.colorize("[+]", "green")} Listing: {Logger.colorize(t, "green")}\n\n' 711 | 712 | out = self._listTable(t) 713 | if out is not None: 714 | output += str(out) + '\n' 715 | 716 | return output 717 | else: 718 | return self._listTable(table) 719 | 720 | def _listTable(self, table): 721 | assert self.nativedb is not None, "Database is not opened" 722 | 723 | records = None 724 | 725 | if table == 'streams': table = '_Streams' 726 | if table == 'stream': table = '_Streams' 727 | if table == 'binary': table = 'Binary' 728 | if table == 'cabs': table = 'Media' 729 | if table == 'olestreams':table = 'olestream' 730 | 731 | if table.lower() not in [x.lower() for x in MSIDumper.KnownTables + MSIDumper.ListModes]: 732 | tb = PrettyTable(['1','2','3']) 733 | tb.header = False 734 | vals = list(MSIDumper.KnownTables + MSIDumper.ListModes) 735 | i = 0 736 | while i + 3 < len(vals): 737 | tb.add_row([vals[i+0], vals[i+1], vals[i+2]]) 738 | i += 3 739 | 740 | if i < len(vals): 741 | for j in range(len(vals)-i): 742 | tb.add_row([vals[i+j], '', '']) 743 | 744 | self.logger.fatal(f'Unsupported --list setting: {table}\n Pick one/combination of following --list values:\n\n{tb}\n') 745 | 746 | if table.lower() in [x.lower() for x in MSIDumper.KnownTables]: 747 | try: 748 | if table not in MSIDumper.KnownTables: 749 | for t in MSIDumper.KnownTables: 750 | if table.lower() == 
t.lower(): 751 | table = t 752 | break 753 | 754 | index = self.options.get('record', -1) 755 | if index != -1: 756 | records0 = self.collectEntries(table) 757 | 758 | try: 759 | index = int(index) 760 | if index < 1 or index > len(records0): 761 | self.logger.fatal(f'Invalid --record specified. There were only {len(records0)} records returned from {table}.\n\t\tUse value between --record 1 and --record {len(records0)}') 762 | records = [ records0[index-1], ] 763 | except ValueError: 764 | records = [] 765 | for a in records0: 766 | vals = list(a.values()) 767 | if len(vals) > 0 and vals[0].lower() == index.lower(): 768 | records.append(a) 769 | break 770 | 771 | if len(records) == 0: 772 | self.logger.fatal(f'Invalid --record specified. Could not find {table} record entry based on its index number nor ID name.') 773 | else: 774 | records = self.collectEntries(table) 775 | 776 | except Exception as e: 777 | self.logger.err(f'Exception occurred while enumerating {table} entries: {e}') 778 | 779 | if self.options.get('debug', False): 780 | raise 781 | else: 782 | table = table.lower() 783 | 784 | try: 785 | if table == 'stats': 786 | records = self.collectStats() 787 | elif table == 'all': 788 | return self.collectAll() 789 | elif table == 'olestream': 790 | records = self.collectStreams() 791 | else: 792 | self.logger.fatal(f'Unsupported --list setting: {table}') 793 | 794 | except Exception as e: 795 | self.logger.err(f'Exception occurred while pulling MSI metadata {table}: {e}') 796 | 797 | if self.options.get('debug', False): 798 | raise 799 | 800 | if records is not None: 801 | self.tableSpecificHighlighting(table, records) 802 | return self.printTable(table, records) 803 | 804 | else: 805 | if table in MSIDumper.KnownTables: 806 | return f'No records found in {Logger.colorize(table, "green")} table.' 807 | else: 808 | return f'No {Logger.colorize(table, "green")} metadata was extracted.' 
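# NOTE (illustrative sketch, not part of msidump): the CustomAction highlighting
# matches whole Type numbers against fixed tuples, which the TODO already flags
# as easy to evade. One possible direction is to split the Type value into its
# documented bit fields first. The helper name, constant names and result keys
# below are hypothetical, and higher option bits (64-bit script, hidden target,
# and so on) are deliberately ignored to keep the sketch short.

```python
# Hypothetical helper: split a CustomAction Type value into bit fields.
BASE_TYPE_MASK = 0x3F     # basic type + source location, e.g. 2, 34, 38
RETURN_MASK = 0xC0        # return processing options
SCHEDULE_MASK = 0x300     # execution scheduling (immediate actions only)
IN_SCRIPT = 0x400         # deferred (in-script) execution
ROLLBACK = 0x100          # only meaningful together with IN_SCRIPT
COMMIT = 0x200            # only meaningful together with IN_SCRIPT
NO_IMPERSONATE = 0x800    # run without impersonating the user

RETURN_NAMES = {0: 'check', 64: 'ignore', 128: 'asyncWait', 192: 'asyncNoWait'}
SCHEDULE_NAMES = {0: 'always', 256: 'firstSequence',
                  512: 'oncePerProcess', 768: 'clientRepeat'}


def decompose_custom_action_type(value):
    """Return a dict describing the flags packed into a CustomAction Type."""
    result = {
        'base': value & BASE_TYPE_MASK,
        'return': RETURN_NAMES[value & RETURN_MASK],
        'no_impersonate': bool(value & NO_IMPERSONATE),
    }
    if value & IN_SCRIPT:
        # For in-script actions, bits 0x100/0x200 select rollback/commit.
        if value & ROLLBACK:
            result['phase'] = 'rollback'
        elif value & COMMIT:
            result['phase'] = 'commit'
        else:
            result['phase'] = 'deferred'
    else:
        # For immediate actions, the same bits carry scheduling options.
        result['phase'] = 'immediate'
        result['schedule'] = SCHEDULE_NAMES[value & SCHEDULE_MASK]
    return result


if __name__ == '__main__':
    # Sample values taken from the CustomActionTypes table in this file.
    for sample in (1250, 3298, 1746, 65):
        print(sample, decompose_custom_action_type(sample))
```

# With such a split, matching could key on the 'base' field (for example 34 or 38)
# regardless of which return/scheduling flags an author adds on top.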
809 | 810 | def tableSpecificHighlighting(self, table, records): 811 | if table.lower() == 'customaction': 812 | for i in range(len(records)): 813 | rec = records[i] 814 | for k, v in rec.items(): 815 | if k == 'type': 816 | col = '' 817 | for a, b in MSIDumper.CustomActionTypes.items(): 818 | if v in b['types']: 819 | col = b['color'] 820 | break 821 | if col != '': 822 | records[i][k] = Logger.colorize(v, col) 823 | records[i]['source'] = Logger.colorize(records[i]['source'], col) 824 | 825 | if table.lower() == 'binary': 826 | for i in range(len(records)): 827 | records[i]['Magic type'] = self.sniffDataType(records[i]['data'], color=True) 828 | 829 | def extract(self, what): 830 | assert self.nativedb is not None, "Database is not opened" 831 | 832 | what = what.lower() 833 | 834 | if what == 'script': 835 | what = 'scripts' 836 | 837 | if what not in [x.lower() for x in MSIDumper.ExtractModes]: 838 | self.logger.fatal(f'Unsupported --extract setting: {what}') 839 | 840 | self.outdir = os.path.normpath(os.path.abspath(self.options.get('outdir', ''))) 841 | if len(self.outdir) == 0: 842 | self.outdir = os.getcwd() 843 | 844 | if not os.path.isdir(self.outdir): 845 | os.makedirs(self.outdir) 846 | 847 | if what == 'all': 848 | return self.extractAll() 849 | elif what == 'binary': 850 | return self.extractBinary() 851 | elif what == 'files': 852 | return self.extractFiles() 853 | elif what == 'cabs': 854 | return self.extractCABs() 855 | elif what == 'scripts': 856 | return self.extractScripts() 857 | 858 | def extractAll(self): 859 | output = '' 860 | 861 | outs = self.extractBinary() 862 | if len(outs) > 0: 863 | output += outs + '\n' 864 | 865 | outs = self.extractFiles() 866 | if len(outs) > 0: 867 | output += outs + '\n' 868 | 869 | outs = self.extractCABs() 870 | if len(outs) > 0: 871 | output += outs + '\n' 872 | 873 | outs = self.extractScripts() 874 | if len(outs) > 0: 875 | output += outs + '\n' 876 | 877 | output += f'\nExtracted in total 
{self.extractedCount} objects.\n' 878 | 879 | return output 880 | 881 | def sanitizeName(self, name): 882 | windowsNames = ( 883 | 'CON', 'PRN', 'AUX', 'NUL', 'COM1', 'COM2', 884 | 'COM3', 'COM4', 'COM5', 'COM6', 'COM7', 885 | 'COM8', 'COM9', 'LPT1', 'LPT2', 'LPT3', 'LPT4', 886 | 'LPT5', 'LPT6', 'LPT7', 'LPT8', 'LPT9', 887 | ) 888 | 889 | for a in ('..', '\\', '/', '"', "'", '?', '*', ':'): 890 | name = name.replace(a, '') 891 | 892 | for a in windowsNames: 893 | name = name.replace(a, '') 894 | 895 | if len(name) == 0: 896 | name = 'bin-' + ''.join(random.choices(string.ascii_uppercase + string.digits, k=5)) 897 | 898 | return name 899 | 900 | def extractBinary(self): 901 | binary = self.collectEntries('Binary') 902 | num = 0 903 | output = '' 904 | 905 | self.logger.verbose('Extracting data from Binary table...') 906 | 907 | if len(binary) == 0: 908 | self.logger.err('Input MSI does not contain any embedded Binary data.') 909 | 910 | for elem in binary: 911 | sniffed = self.sniffDataType(elem['data']) 912 | name = self.sanitizeName(elem['name']) + self.sniffDataExt(sniffed) 913 | outp = os.path.join(self.outdir, name) 914 | 915 | with open(outp, 'wb') as f: 916 | f.write(elem['data'].encode()) 917 | 918 | num += 1 919 | output += f'\n{Logger.colorize("[+]","green")} Extracted {Logger.colorize(len(elem["data"]),"green")} bytes of {Logger.colorize(elem["name"],"green")} object to: {Logger.colorize(outp,"yellow")}' 920 | 921 | self.extractedCount += num 922 | if num > 0 and self.options.get('extract', '') != 'all': 923 | output += f'\n\nExtracted in total {num} objects.\n' 924 | 925 | return output 926 | 927 | def extractCab(self, infile, outdir, files): 928 | with open(infile, "rb") as f: 929 | arc = cabarchive.CabArchive(f.read()) 930 | 931 | self.logger.verbose('Extracting Cabinets from MSI...') 932 | 933 | output = f'Extracting files from CAB ({infile}):\n\n' 934 | num = 0 935 | 936 | for k, v in arc.items(): 937 | fn = v.filename 938 | 939 | for _file in files: 
940 | if fn == _file['file']: 941 | fn = _file['filename'] 942 | 943 | p, ext = os.path.splitext(fn) 944 | if ext.lower() in MSIDumper.DangerousExtensions: 945 | fn += '.bin' 946 | 947 | lp = os.path.join(outdir, fn) 948 | 949 | lp1 = os.path.join(outdir, os.path.dirname(lp)) 950 | if not os.path.isdir(lp1): 951 | output += f'\t{Logger.colorize("[+]","green")} Creating temp dir: {lp1}\n' 952 | os.makedirs(lp1, exist_ok=True) 953 | 954 | output += f'{Logger.colorize("[+]","green")} {v.filename:20} => {lp}\n' 955 | with open(lp, 'wb') as f: 956 | f.write(v.buf) 957 | num += 1 958 | 959 | return num, output 960 | 961 | def extractFiles(self, overrideOutdir=''): 962 | outdir = self.outdir 963 | if len(overrideOutdir) > 0: 964 | dirpath = overrideOutdir 965 | else: 966 | dirpath = tempfile.mkdtemp() 967 | 968 | self.outdir = dirpath 969 | self.extractCABs() 970 | self.outdir = outdir 971 | 972 | self.logger.verbose('Extracting files from MSI...') 973 | 974 | cabsNum = 0 975 | num = 0 976 | output = '' 977 | files = self.collectEntries('File') 978 | 979 | path = os.path.join(dirpath, '*.cab') 980 | for cab in glob.glob(path, recursive=True): 981 | cabPath = os.path.join(path, cab) 982 | cabsNum += 1 983 | outp = os.path.join(dirpath, os.path.basename(cabPath).replace('.cab', '')) 984 | 985 | try: 986 | num0, output0 = self.extractCab(cabPath, outp, files) 987 | num += num0 988 | output += output0 989 | 990 | except Exception as e: 991 | self.logger.err(f'Could not extract files from CABinet: {cabPath}. 
Error: {e}') 992 | if self.options.get('debug', False): 993 | raise 994 | finally: 995 | if os.path.isfile(cabPath): 996 | os.remove(cabPath) 997 | 998 | if dirpath != overrideOutdir: 999 | shutil.rmtree(dirpath) 1000 | 1001 | self.extractedCount += num 1002 | if num > 0 and self.options.get('extract', '') != 'all': 1003 | output += f'\nExtracted in total {num} files from {cabsNum} cabinets.\n' 1004 | 1005 | return output 1006 | 1007 | def extractCABs(self): 1008 | binary = self.collectEntries('Binary') 1009 | num = 0 1010 | output = '' 1011 | 1012 | if len(binary) == 0: 1013 | self.logger.err('Input MSI does not contain any embedded Binary data.') 1014 | 1015 | for elem in binary: 1016 | sniffed = self.sniffDataType(elem['data']) 1017 | if '.cab' not in sniffed.lower(): 1018 | continue 1019 | 1020 | name = self.sanitizeName(elem['name']) + '.cab' 1021 | outp = os.path.join(self.outdir, name) 1022 | 1023 | with open(outp, 'wb') as f: 1024 | f.write(elem['data'].encode()) 1025 | 1026 | num += 1 1027 | 1028 | # source: https://github.com/decalage2/oletools/blob/master/oletools/oledir.py#L245 1029 | ole = olefile.OleFileIO(self.infile) 1030 | for entry in ole.listdir(): 1031 | name = entry[-1] 1032 | name = repr(name)[1:-1] 1033 | entry_id = ole._find(entry) 1034 | try: 1035 | size = ole.get_size(entry) 1036 | except: 1037 | size = '-' 1038 | 1039 | data0 = ole.openstream(entry).getvalue() 1040 | data = data0.decode(errors='ignore') 1041 | 1042 | sniffed = self.sniffDataType(data) 1043 | if '.cab' not in sniffed.lower(): 1044 | continue 1045 | 1046 | name = f'ole-stream-{entry_id}.cab' 1047 | outp = os.path.join(self.outdir, name) 1048 | 1049 | with open(outp, 'wb') as f: 1050 | f.write(data0) 1051 | 1052 | num += 1 1053 | output += f'\n{Logger.colorize("[+]","green")} Extracted {Logger.colorize(len(data0), "green")} bytes of {Logger.colorize(name,"green")} object to: {Logger.colorize(outp,"yellow")}' 1054 | 1055 | self.extractedCount += num 1056 | if 
num > 0 and self.options.get('extract', '') != 'all': 1057 | output += f'\n\nExtracted in total {num} objects.\n' 1058 | 1059 | return output 1060 | 1061 | def extractScripts(self): 1062 | binary = self.collectEntries('Binary') 1063 | actions = self.collectEntries('CustomAction') 1064 | num = 0 1065 | output = '' 1066 | 1067 | self.logger.verbose('Extracting scripts from CustomAction and Binary tables...') 1068 | 1069 | if len(binary) == 0: 1070 | self.logger.err('Input MSI does not contain any embedded Binary data.') 1071 | 1072 | for elem in actions: 1073 | sniffed = self.sniffDataType(elem['target']) 1074 | if 'vbscript' not in sniffed.lower() and 'jscript' not in sniffed.lower(): 1075 | continue 1076 | 1077 | name = self.sanitizeName(elem['action']) 1078 | outp = os.path.join(self.outdir, name) + self.sniffDataExt(sniffed) 1079 | 1080 | with open(outp, 'wb') as f: 1081 | f.write(elem['target'].encode()) 1082 | 1083 | num += 1 1084 | output += f'\n{Logger.colorize("[+]","green")} Extracted {Logger.colorize(len(elem["target"]),"green")} bytes of {Logger.colorize(elem["action"],"green")} CustomAction script to: {Logger.colorize(outp,"yellow")}' 1085 | 1086 | for elem in binary: 1087 | sniffed = self.sniffDataType(elem['data']) 1088 | if 'vbscript' not in sniffed.lower() and 'jscript' not in sniffed.lower(): 1089 | continue 1090 | 1091 | name = self.sanitizeName(elem['name']) 1092 | outp = os.path.join(self.outdir, name) + self.sniffDataExt(sniffed) 1093 | 1094 | with open(outp, 'wb') as f: 1095 | f.write(elem['data'].encode()) 1096 | 1097 | num += 1 1098 | output += f'\n{Logger.colorize("[+]","green")} Extracted {Logger.colorize(len(elem["data"]),"green")} bytes of {Logger.colorize(elem["name"],"green")} binary object script to: {Logger.colorize(outp,"yellow")}' 1099 | 1100 | self.extractedCount += num 1101 | if num > 0 and self.options.get('extract', '') != 'all': 1102 | output += f'\n\nExtracted in total {num} objects.\n' 1103 | 1104 | return output 1105 | 1106 
| def formatTable(self, tbl, table, records): 1107 | if self.maxWidth > -1 and len(records) > 0: 1108 | for k in records[0].keys(): 1109 | tbl._max_width[k] = self.maxWidth 1110 | 1111 | tbl.align['YARA Results'] = 'l' 1112 | 1113 | if table.lower() in self.specificTableAlignment.keys(): 1114 | for k, v in self.specificTableAlignment[table.lower()].items(): 1115 | tbl.align[k] = v 1116 | 1117 | if table.lower() in [x.lower() for x in MSIDumper.TableSortBy] and len(records) > 0: 1118 | tbl.sortby = list(records[0].keys())[MSIDumper.TableSortBy[table]] 1119 | 1120 | return tbl 1121 | 1122 | def collectAll(self): 1123 | output = '' 1124 | 1125 | self.logger.info('Dumping all MSI tables...') 1126 | 1127 | for table in MSIDumper.KnownTables: 1128 | recs = self.collectEntries(table) 1129 | 1130 | if not self.options.get('verbose', False) and len(recs) == 0 and table not in MSIDumper.ImportantTables: 1131 | continue 1132 | 1133 | output += '\n\n' 1134 | output += Logger.colorize(f'===============[ {table} : {len(recs)} records ]===============', 'green') 1135 | output += '\n\n' 1136 | output += self.printTable(table, recs) 1137 | 1138 | return output 1139 | 1140 | def collectStreams(self): 1141 | records = [] 1142 | 1143 | ole = olefile.OleFileIO(self.infile) 1144 | for entry in ole.listdir(storages=True): 1145 | name = entry[-1] 1146 | name = repr(name)[1:-1] 1147 | entry_id = ole._find(entry) 1148 | try: 1149 | size = ole.get_size(entry) 1150 | except: 1151 | size = '-' 1152 | typeid = ole.get_type(entry) 1153 | clsid = ole.getclsid(entry) 1154 | 1155 | data0 = ole.openstream(entry).getvalue() 1156 | data = data0.decode(errors='ignore') 1157 | sniffed = self.sniffDataType(data, color=True) 1158 | 1159 | records.append({ 1160 | 'entry_id' : entry_id, 1161 | 'data type' : sniffed, 1162 | 'name' : Logger.colorize(name, 'yellow'), 1163 | 'size' : size, 1164 | 'typeid' : typeid, 1165 | 'CLSID' : clsid, 1166 | }) 1167 | 1168 | return sorted(records, key=lambda x: 
x['entry_id']) 1169 | 1170 | def collectStats(self): 1171 | records = [] 1172 | hashes = ( 1173 | 'md5', 'sha1', 'sha256', 'ssdeep' 1174 | ) 1175 | 1176 | self.logger.info('Computing MSI file hashes...') 1177 | 1178 | with open(self.infile, 'rb') as f: 1179 | data = f.read() 1180 | 1181 | for h in hashes: 1182 | if h == 'ssdeep': 1183 | if USE_SSDEEP: 1184 | hsh = ssdeep.hash(data) 1185 | else: 1186 | hsh = 'err: ssdeep module not installed' 1187 | else: 1188 | m = hashlib.new(h) 1189 | m.update(data) 1190 | hsh = m.hexdigest() 1191 | 1192 | records.append({ 1193 | 'type' : Logger.colorize(f'Hash {h}', 'cyan'), 1194 | 'value' : Logger.colorize(hsh, 'cyan'), 1195 | }) 1196 | 1197 | del data 1198 | 1199 | self.logger.info('Collecting MSI tables stats...') 1200 | 1201 | for table in MSIDumper.KnownTables: 1202 | recs = self.collectEntries(table) 1203 | val = f'{len(recs)} records' 1204 | 1205 | if table in MSIDumper.SuspiciousTables: 1206 | table = Logger.colorize(table, 'red') 1207 | val = Logger.colorize(val, 'red') 1208 | 1209 | elif table in MSIDumper.ImportantTables: 1210 | table = Logger.colorize(table, 'yellow') 1211 | val = Logger.colorize(val, 'yellow') 1212 | 1213 | else: 1214 | if len(recs) == 0 and not self.options.get('verbose', False): 1215 | continue 1216 | 1217 | records.append({ 1218 | 'type' : table, 1219 | 'value' : val, 1220 | }) 1221 | 1222 | return records 1223 | 1224 | def analysisWorker(self): 1225 | self.processActions() 1226 | self.lookForIOCs() 1227 | 1228 | return self.printReport() 1229 | 1230 | def normalizeDataForOutput(self, val, num=0, table=''): 1231 | if num == 0: 1232 | num = self.options.get('print_len', MSIDumper.DefaultTableWidth) 1233 | 1234 | if num != -1: 1235 | val = val[:num] 1236 | 1237 | printable = MSIDumper.isprintable(val) 1238 | 1239 | if not printable and table not in ('olestream', ): 1240 | printable2 = MSIDumper.isprintable(Logger.stripColors(val)) 1241 | if not printable2: 1242 | val = MSIDumper.hexdump(val) + '\n' 
1243 | 1244 | return val 1245 | 1246 | def cleanString(self, txt): 1247 | txt = txt.replace('\r', '') 1248 | txt = txt.replace('\t', ' ') 1249 | 1250 | if self.options.get('format', 'text') in ('csv', 'json'): 1251 | txt = Logger.stripColors(txt) 1252 | txt = ''.join(filter(lambda x: x in string.printable, txt)) 1253 | txt = txt.replace('\n', ' ') 1254 | txt = re.sub(r'\s+', ' ', txt) 1255 | 1256 | return txt 1257 | 1258 | def printTable(self, table, records): 1259 | if len(records) == 0: 1260 | return f'\n\nNo records found in table {Logger.colorize(table, "green")}.' 1261 | 1262 | yaraColumn = '' 1263 | self.logger.dbg(f'Dumping {table} table results...') 1264 | 1265 | rules = None 1266 | if len(self.options.get('yara', '')) > 0 and table != 'YARA Results': 1267 | yaraColumn = 'YARA Results' 1268 | matchesReport = [] 1269 | rules = self.initYara() 1270 | 1271 | if len(records) == 1 and (self.options.get('record', '') != -1 and len(self.options.get('record', '')) > 0): 1272 | output = '' 1273 | 1274 | for k, v in records[0].items(): 1275 | k0 = Logger.colorize(k, "green") 1276 | output += f'\n- {k0:20} : ' 1277 | 1278 | if type(v) is str: 1279 | v = self.normalizeDataForOutput(v, -1, table=table) 1280 | 1281 | if len(v) < 50: 1282 | output += v 1283 | else: 1284 | spacer = Logger.colorize('=' * MSIDumper.DefaultTableWidth, 'yellow') 1285 | output += '\n\n' + spacer + '\n\n' + v + '\n\n' + spacer + '\n' 1286 | else: 1287 | output += str(v) 1288 | 1289 | if table.lower() in ('binary', ): 1290 | output += '\n' 1291 | 1292 | output += '\n' 1293 | 1294 | if len(yaraColumn) > 0: 1295 | k0 = Logger.colorize(yaraColumn, "green") 1296 | output += f'\n- {k0:20} : ' 1297 | 1298 | for k, v in records[0].items(): 1299 | if type(v) is not str: 1300 | continue 1301 | matches = rules.match(data = v) 1302 | if matches: 1303 | ms = '' 1304 | for m in matches: 1305 | ms += f'- {m.rule}\n' 1306 | output += Logger.colorize(f'YARA rule match on column {k}:', 'green') + '\n' + ms + '\n' 
1307 | else: 1308 | output = '' 1309 | numCol = ['#',] 1310 | yarCol = [] 1311 | if table == 'olestream': 1312 | numCol = [] 1313 | 1314 | if len(yaraColumn) > 0: 1315 | yarCol = [yaraColumn, ] 1316 | 1317 | tbl = PrettyTable(numCol + list(records[0].keys()) + yarCol) 1318 | num = 0 1319 | 1320 | index = self.options.get('record', -1) 1321 | if index != -1: 1322 | num = index - 1 1323 | 1324 | tbl = self.formatTable(tbl, table, records) 1325 | 1326 | for rec in records: 1327 | num += 1 1328 | vals = [] 1329 | i = 0 1330 | for v in [num, ] + list(rec.values()): 1331 | if i == 0 and 'entry_id' in rec.keys(): 1332 | i += 1 1333 | continue 1334 | if type(v) is str: 1335 | v = self.normalizeDataForOutput(v, table=table) 1336 | s = self.cleanString(v).strip() 1337 | n = '' 1338 | 1339 | if table.lower() in ('binary', ): 1340 | n = '\n' 1341 | 1342 | vals.append(s + n) 1343 | else: 1344 | vals.append(v) 1345 | i += 1 1346 | 1347 | if len(yaraColumn) > 0: 1348 | i = 0 1349 | val = '' 1350 | for v in list(rec.values()): 1351 | if type(v) is not str: 1352 | i += 1 1353 | continue 1354 | matches = rules.match(data = v) 1355 | if matches: 1356 | ms = '' 1357 | for m in matches: 1358 | ms += f'- {m.rule}\n' 1359 | k = list(rec.keys())[i] 1360 | val += Logger.colorize(f'YARA rule match on column {k}:', 'green') + '\n' + ms + '\n' 1361 | i += 1 1362 | vals.append(val) 1363 | 1364 | if self.options['format'] == 'csv': 1365 | tbl.add_row([str(x).replace(self.csvDelim, '') for x in vals]) 1366 | else: 1367 | tbl.add_row(vals) 1368 | 1369 | if self.options['format'] == 'text': 1370 | output += str(tbl) 1371 | 1372 | if table != 'YARA Results' and self: 1373 | output += f'\n\n[.] Found {Logger.colorize(str(len(records)), "green")} records in {Logger.colorize(table, "green")} table.' 
1374 | 1375 | output += '\n' 1376 | 1377 | elif self.options['format'] == 'json': 1378 | output += str(tbl.get_json_string()) 1379 | 1380 | elif self.options['format'] == 'csv': 1381 | output += str(tbl.get_csv_string(delimiter=self.csvDelim, escapechar='\\')) 1382 | 1383 | # elif self.options['format'] == 'html': 1384 | # output += str(tbl.get_html_string()) 1385 | 1386 | return output 1387 | 1388 | def printReport(self): 1389 | output = '' 1390 | cols = [ 1391 | '#', 1392 | 'threat', 1393 | 'location', 1394 | 'context', 1395 | 'description' 1396 | ] 1397 | tbl = PrettyTable(cols) 1398 | tbl = self.formatTable(tbl, 'report', self.report) 1399 | 1400 | num = 0 1401 | 1402 | for report in self.report: 1403 | num += 1 1404 | rec = [ 1405 | num, 1406 | report['name'], 1407 | report['location'], 1408 | report['context'], 1409 | report['desc'], 1410 | ] 1411 | vals = [] 1412 | for v in rec: 1413 | if type(v) is str: 1414 | vals.append(self.cleanString(v)) 1415 | else: 1416 | vals.append(v) 1417 | 1418 | if self.options['format'] == 'csv': 1419 | tbl.add_row([str(x).replace(self.csvDelim, '') for x in vals]) 1420 | else: 1421 | tbl.add_row(vals) 1422 | 1423 | if self.options['format'] == 'text': 1424 | output += str(tbl) 1425 | 1426 | elif self.options['format'] == 'json': 1427 | output += str(tbl.get_json_string()) 1428 | 1429 | elif self.options['format'] == 'csv': 1430 | output += str(tbl.get_csv_string(delimiter=self.csvDelim, escapechar='\\')) 1431 | 1432 | # elif self.options['format'] == 'html': 1433 | # output += str(tbl.get_html_string()) 1434 | 1435 | return output 1436 | 1437 | def printRecord(self, rec, indent=''): 1438 | out = '' 1439 | keyLen = -1 1440 | 1441 | if type(rec) is str: 1442 | return rec 1443 | 1444 | for k, v in rec.items(): 1445 | if len(k) > keyLen: 1446 | keyLen = len(Logger.colorize(k, 'yellow')) + 1 1447 | 1448 | if self.format == 'text': 1449 | for k, v in rec.items(): 1450 | if k.lower() in MSIDumper.SkipColumns: 1451 | continue 1452 | 
1453 | if type(v) is str or type(v) is bytes: 1454 | printable = MSIDumper.isprintable(v) 1455 | 1456 | if not printable and v[0] != '\x1b': 1457 | v = '\n\n' + MSIDumper.hexdump(v) + '\n' 1458 | 1459 | if self.options.get('record', -1) == -1 and len(v) > 256: 1460 | v = '\n\n' + v[:256].strip() + '\n\t[CUT FOR BREVITY]\n' 1461 | 1462 | k = Logger.colorize(k, 'yellow') 1463 | out += indent + f'- {k:{keyLen}}: {v}\n' 1464 | 1465 | elif self.format == 'csv': 1466 | out = self.csvDelim.join([str(x).replace(self.csvDelim, '')[:self.maxWidth] for x in rec.values()]) 1467 | 1468 | return out 1469 | 1470 | @staticmethod 1471 | def isValidPE(data): 1472 | pe = None 1473 | try: 1474 | pe = pefile.PE(data=data.encode(), fast_load=True) 1475 | _format = MSIDumper.RecognizedInnerFileTypes['executable']['indicator'] 1476 | 1477 | if pe.is_dll(): 1478 | _format = MSIDumper.RecognizedInnerFileTypes['dll']['indicator'] 1479 | 1480 | pe.close() 1481 | return (True, _format) 1482 | except pefile.PEFormatError as e: 1483 | logger.dbg(f'pefile error: {e}') 1484 | return (False, '') 1485 | finally: 1486 | if pe: 1487 | pe.close() 1488 | 1489 | def sniffDataExt(self, sniffed): 1490 | for k, v in MSIDumper.RecognizedInnerFileTypes.items(): 1491 | if v['indicator'].lower() == sniffed.lower(): 1492 | return MSIDumper.RecognizedInnerFileTypes[k]['safe-extension'] 1493 | 1494 | return '' 1495 | 1496 | def gradeFoundIndicator(self, indicator, data='', color='', mime=''): 1497 | if color != '': 1498 | if color == 'red': 1499 | return 1 1500 | 1501 | if mime != '' and mime.lower() in MSIDumper.MimeTypesThatIncreasSuspiciousScore: 1502 | return 1 1503 | 1504 | return 0 1505 | 1506 | def sniffDataType(self, data, color=False): 1507 | mime = self.options.get('mime', False) 1508 | magicOut = 'data' 1509 | try: 1510 | magicOut = magic.from_buffer(data, mime=mime) 1511 | except Exception as e: 1512 | self.logger.dbg(f'Magic failed fingerprinting data: {e}') 1513 | 1514 
| pe, petype = MSIDumper.isValidPE(data) 1515 | if pe: 1516 | if mime and magicOut in ('data', 'application/octet-stream'): 1517 | indicator = 'application/x-dosexec' 1518 | if color: 1519 | indicator = Logger.colorize(petype, 'red') 1520 | self.grade += self.gradeFoundIndicator(indicator, data, color='red') 1521 | return indicator 1522 | 1523 | for format, predicate in MSIDumper.RecognizedInnerFileTypes.items(): 1524 | indicator = predicate.get('indicator', '') 1525 | predColor = predicate.get('color', '') 1526 | 1527 | if format == 'unsure-executable': 1528 | if data[:2] != 'MZ' and data[:2] != 'ZM': 1529 | continue 1530 | elif format == 'unsure-cabinet': 1531 | if data[:4] != 'MSCF': 1532 | continue 1533 | 1534 | if mime: 1535 | indicator = magicOut 1536 | 1537 | if color: 1538 | indicator = Logger.colorize(indicator, predColor) 1539 | 1540 | magicVals = predicate.get('magic', []) 1541 | if len(magicVals) > 0: 1542 | for m in magicVals: 1543 | if m.lower() in magicOut.lower(): 1544 | self.grade += self.gradeFoundIndicator(indicator, data, color=predColor) 1545 | return indicator 1546 | 1547 | keywords = predicate.get('keywords', []) 1548 | minkeywords = predicate.get('min-keywords', 0) 1549 | 1550 | printable = predicate.get('printable', 0) 1551 | printableMet = False 1552 | if printable: 1553 | if MSIDumper.isprintable(data): 1554 | printableMet = True 1555 | 1556 | if printable and not printableMet: 1557 | continue 1558 | 1559 | if len(keywords) > 0 and minkeywords > 0: 1560 | skip = False 1561 | found = 0 1562 | for keyword in keywords: 1563 | if re.search(r'\b' + re.escape(keyword) + r'\b', data, re.I): 1564 | found += 1 1565 | 1566 | if found >= minkeywords: 1567 | foundNots = 0 1568 | notkeywords = predicate.get('not-keywords', []) 1569 | 1570 | if len(notkeywords) > 0: 1571 | for keyword in notkeywords: 1572 | if re.search(r'\b' + re.escape(keyword) + r'\b', data, re.I): 1573 | foundNots += 1 1574 | 1575 | if foundNots == 0: 1576 | self.grade += 
self.gradeFoundIndicator(indicator, data, color=predColor) 1577 | return indicator 1578 | 1579 | if magicOut == 'data': 1580 | return '' 1581 | 1582 | return magicOut 1583 | 1584 | def lookForIOCs(self): 1585 | binary = self.collectEntries('Binary') 1586 | customActions = self.collectEntries('CustomAction') 1587 | i = 0 1588 | 1589 | streams = self.collectEntries('_Streams') 1590 | if len(streams) == 0: 1591 | self.report.append({ 1592 | 'name' : Logger.colorize('Missing _Streams', 'yellow'), 1593 | 'location' : f'_Streams table', 1594 | 'context' : '', 1595 | 'desc' : f'Typically MSIs contain a _Streams table referring to .CAB archives.\nThis sample however didn\'t contain such a table, making it unusual/mangled.\n', 1596 | }) 1597 | 1598 | for data in binary: 1599 | i += 1 1600 | sniffed = self.sniffDataType(data['data'], color=True) 1601 | 1602 | if len(sniffed) > 0: 1603 | data['size'] = len(data['data']) 1604 | runByCa = False 1605 | desc = '' 1606 | 1607 | caIndex = 0 1608 | for ca in customActions: 1609 | caIndex += 1 1610 | if ca['source'] == data['name']: 1611 | runByCa = True 1612 | desc = f'\nThat data will be used during installation by CustomAction {Logger.colorize(caIndex, "yellow")}. {Logger.colorize(ca["action"], "yellow")}' 1613 | break 1614 | 1615 | if not runByCa: 1616 | self.grade -= 1 1617 | sniffed = Logger.stripColors(sniffed) 1618 | sniffed = Logger.colorize(sniffed, 'yellow') 1619 | desc = '\nHowever that data doesn\'t seem to be used in CustomActions, decreasing impact.' 1620 | 1621 | self.report.append({ 1622 | 'name' : sniffed, 1623 | 'location' : f'Binary table', 1624 | 'context' : self.printRecord(data), 1625 | 'desc' : f'MSI contains {sniffed} data in Binary table entry {Logger.colorize(str(i), "yellow")}. 
{Logger.colorize(data["name"], "yellow")}' + desc, 1626 | }) 1627 | 1628 | def processActions(self): 1629 | actions = self.collectEntries('CustomAction') 1630 | execSeq = self.collectEntries('InstallExecuteSequence') 1631 | uiSeq = self.collectEntries('InstallUISequence') 1632 | 1633 | for action in actions: 1634 | self.logger.dbg(f'Parsing CustomAction {action["action"]} ...') 1635 | 1636 | for suspAction, data in MSIDumper.CustomActionTypes.items(): 1637 | if action['type'] in data['types']: 1638 | desc = data['desc'] 1639 | color = MSIDumper.CustomActionTypes[suspAction].get('color', 'white') 1640 | 1641 | fieldToHighlight = '' 1642 | 1643 | if 'vbscript' in suspAction.lower() or 'jscript' in suspAction.lower(): 1644 | if len(action['source']) > 0: 1645 | fieldToHighlight = 'source' 1646 | self.grade += self.gradeFoundIndicator(suspAction, color=color) 1647 | desc += f".\nScript is located in {Logger.colorize(action['source'],'yellow')} Binary table record." 1648 | 1649 | elif 'run-dll' in suspAction.lower(): 1650 | fieldToHighlight = 'source' 1651 | self.grade += self.gradeFoundIndicator(suspAction, color=color) 1652 | desc += f".\nDLL is located in {Logger.colorize(action['source'],'yellow')} Binary table record." 1653 | 1654 | elif 'run-exe' in suspAction.lower(): 1655 | fieldToHighlight = 'source' 1656 | self.grade += self.gradeFoundIndicator(suspAction, color=color) 1657 | desc += f"\nEXE is located in {Logger.colorize(action['source'],'yellow')} Binary table record." 
1658 | 1659 | elif 'set-directory' in suspAction.lower(): 1660 | fieldToHighlight = 'target' 1661 | 1662 | elif 'execute' in suspAction.lower(): 1663 | fieldToHighlight = 'target' 1664 | self.grade += self.gradeFoundIndicator(suspAction, color=color) 1665 | desc += f".\nCommand that will be executed:\ncmd> {Logger.colorize(action['target'],'red')}" 1666 | 1667 | foundInSeq = False 1668 | for seq in execSeq: 1669 | if seq['action'] == action['action']: 1670 | foundInSeq = True 1671 | cond = '' 1672 | if len(seq['condition']) > 0: 1673 | cond = f" with condition:\n- {Logger.colorize(seq['condition'],'yellow')}" 1674 | 1675 | desc += f"\nThat action is scheduled to run in {Logger.colorize('InstallExecuteSequence','yellow')} table" + cond + '\n' 1676 | break 1677 | 1678 | for seq in uiSeq: 1679 | if seq['action'] == action['action']: 1680 | foundInSeq = True 1681 | cond = '' 1682 | if len(seq['condition']) > 0: 1683 | cond = f" with condition:\n- {Logger.colorize(seq['condition'],'yellow')}" 1684 | 1685 | desc += f"\nThat action is scheduled to run in {Logger.colorize('InstallUISequence','yellow')} table" + cond + '\n' 1686 | break 1687 | 1688 | if not foundInSeq: 1689 | self.grade -= 1 1690 | color = 'yellow' 1691 | desc += '\nHowever that action doesn\'t seem to be invoked anywhere, decreasing impact.' 
1692 | 1693 | if len(fieldToHighlight) > 0: 1694 | action[fieldToHighlight] = Logger.colorize(action[fieldToHighlight], color) 1695 | 1696 | self.report.append({ 1697 | 'name' : Logger.colorize(suspAction, color), 1698 | 'location' : f'CustomAction table', 1699 | 'context' : self.printRecord(action), 1700 | 'desc' : desc 1701 | }) 1702 | break 1703 | 1704 | def initYara(self): 1705 | yaraPath = self.options.get('yara', '') 1706 | if len(yaraPath) == 0: 1707 | return None 1708 | 1709 | yaraPath = os.path.abspath(os.path.normpath(yaraPath)) 1710 | 1711 | if not os.path.isfile(yaraPath) and not os.path.isdir(yaraPath): 1712 | self.logger.fatal(f'Specified --yara path does not exist.') 1713 | 1714 | rules = None 1715 | try: 1716 | rules = yara.compile(yaraPath) 1717 | except Exception as e: 1718 | self.logger.fatal(f'Could not compile YARA rules. Exception: {e}') 1719 | 1720 | return rules 1721 | 1722 | def yaraScan(self, scanBinary=True, scanActions=True, scanFiles=True): 1723 | matchesReport = [] 1724 | rules = self.initYara() 1725 | 1726 | if scanBinary: 1727 | binary = self.collectEntries('Binary') 1728 | output = '' 1729 | 1730 | if len(binary) > 0: 1731 | i = 0 1732 | for elem in binary: 1733 | i += 1 1734 | matches = rules.match(data = elem['data'].encode()) 1735 | if matches: 1736 | matchesReport.append({ 1737 | 'where' : f'Binary record {Logger.colorize(i, "yellow")}. 
{Logger.colorize(elem["name"], "yellow")}', 1738 | 'rules' : '\n'.join([x.rule for x in matches]) 1739 | }) 1740 | 1741 | if scanActions: 1742 | actions = self.collectEntries('CustomAction') 1743 | output = '' 1744 | 1745 | if len(actions) > 0: 1746 | i = 0 1747 | for elem in actions: 1748 | sniffed = self.sniffDataType(elem['target']) 1749 | if 'vbscript' not in sniffed.lower() and 'jscript' not in sniffed.lower(): 1750 | continue 1751 | i += 1 1752 | matches = rules.match(data = elem['target']) 1753 | if matches: 1754 | matchesReport.append({ 1755 | 'where' : f'CustomAction record {Logger.colorize(i, "yellow")}. {Logger.colorize(elem["action"], "yellow")}', 1756 | 'rules' : '\n'.join([x.rule for x in matches]) 1757 | }) 1758 | 1759 | if scanFiles: 1760 | try: 1761 | dirpath = tempfile.mkdtemp() 1762 | self.logger.verbose(f'Extracting all files from MSI into temp dir: {dirpath} ...') 1763 | 1764 | out = self.extractFiles(overrideOutdir = dirpath) 1765 | 1766 | for _file in glob.glob(os.path.join(dirpath, '**/*.*'), recursive=True): 1767 | path = os.path.join(dirpath, _file) 1768 | 1769 | matches = rules.match(path) 1770 | if matches: 1771 | matchesReport.append({ 1772 | 'where' : f'File extracted from MSI: {Logger.colorize(os.path.basename(path), "yellow")}', 1773 | 'rules' : '\n'.join([x.rule for x in matches]) 1774 | }) 1775 | 1776 | except Exception as e: 1777 | self.logger.err(f'Could not extract files from MSI for YARA scanning. 
Exception: {e}') 1778 | if self.options.get('debug', False): 1779 | raise 1780 | 1781 | finally: 1782 | if os.path.isdir(dirpath): 1783 | shutil.rmtree(dirpath) 1784 | 1785 | if len(matchesReport) > 0: 1786 | output += Logger.colorize(f'[+] Got {len(matchesReport)} YARA rules matches on this MSI:\n\n', 'green') 1787 | output += self.printTable('YARA Results', matchesReport) 1788 | 1789 | return output 1790 | 1791 | def getoptions(): 1792 | global logger 1793 | global options 1794 | 1795 | epilog = f''' 1796 | 1797 | ------------------------------------------------------ 1798 | 1799 | - What can be listed: 1800 | --list CustomAction - Specific table 1801 | --list Registry,File - List multiple tables 1802 | --list stats - Print MSI database statistics 1803 | --list all - All tables and their contents 1804 | --list olestream - Prints all OLE streams & storages. 1805 | To display CABs embedded in MSI try: --list _Streams 1806 | --list cabs - Lists embedded CAB files 1807 | --list binary - Lists binary data embedded in MSI for its own purposes. 
1808 | That typically includes EXEs, DLLs, VBS/JS scripts, etc 1809 | 1810 | - What can be extracted: 1811 | --extract all - Extracts Binary data, all files from CABs, scripts from CustomActions 1812 | --extract binary - Extracts Binary data 1813 | --extract files - Extracts files 1814 | --extract cabs - Extracts cabinets 1815 | --extract scripts - Extracts scripts 1816 | 1817 | ------------------------------------------------------ 1818 | 1819 | ''' 1820 | 1821 | usage = '\nUsage: msidump.py [options] \n' 1822 | opts = argparse.ArgumentParser( 1823 | usage=usage, 1824 | formatter_class=argparse.RawDescriptionHelpFormatter, 1825 | epilog=textwrap.dedent(epilog) 1826 | ) 1827 | 1828 | req = opts.add_argument_group('Required arguments') 1829 | req.add_argument('infile', help='Input MSI file (or directory) for analysis.') 1830 | 1831 | opt = opts.add_argument_group('Options') 1832 | opt.add_argument('-q', '--quiet', default=False, action='store_true', help='Suppress banner and unnecessary information. In triage mode, will display only verdict.') 1833 | opt.add_argument('-v', '--verbose', default=False, action='store_true', help='Verbose mode.') 1834 | opt.add_argument('-d', '--debug', default=False, action='store_true', help='Debug mode.') 1835 | opt.add_argument('-N', '--nocolor', default=False, action='store_true', help='Do not use colors in text output.') 1836 | opt.add_argument('-n', '--print-len', default=MSIDumper.DefaultTableWidth, type=int, help='When previewing data - how many bytes to include in preview/hexdump. Default: 128') 1837 | opt.add_argument('-f', '--format', default='text', choices=['text', 'json', 'csv'], help='Output format: text, json, csv. 
Default: text') 1838 | opt.add_argument('-o', '--outfile', metavar='path', default='', help='Redirect program output to this file.') 1839 | opt.add_argument('-m', '--mime', default=False, action='store_true', help='When sniffing inner data type, report MIME types') 1840 | 1841 | mod = opts.add_argument_group('Analysis Modes') 1842 | mod.add_argument('-l', '--list', metavar='what', default='', help='List specific table contents. See help message to learn what can be listed.') 1843 | mod.add_argument('-x', '--extract', metavar='what', default='', help='Extract data from MSI. For what can be extracted, refer to help message.') 1844 | 1845 | spec = opts.add_argument_group('Analysis Specific options') 1846 | spec.add_argument('-i', '--record', metavar='number|name', type=str, default=-1, help='Can be a number or name. In --list mode, specifies which record to dump/display entirely. In --extract mode dumps only this particular record to --outdir') 1847 | spec.add_argument('-O', '--outdir', metavar='path', default='', help='When --extract mode is used, specifies output location where to extract data.') 1848 | spec.add_argument('-y', '--yara', metavar='path', default='', help='Path to YARA rule/directory with rules. 
YARA will be matched against Binary data, streams and inner files') 1849 | 1850 | args = opts.parse_args() 1851 | options.update(vars(args)) 1852 | 1853 | logger = Logger(options) 1854 | 1855 | if len(args.list) > 0: 1856 | if args.list.lower() not in [x.lower() for x in MSIDumper.ListModes + MSIDumper.KnownTables] and ',' not in args.list: 1857 | logger.err(f'WARNING: Requested {args.list} table is not recognized: parser will probably crash!') 1858 | 1859 | args.infile = os.path.abspath(os.path.normpath(args.infile)) 1860 | 1861 | if not os.path.isfile(args.infile) and not os.path.isdir(args.infile): 1862 | logger.fatal(f'--infile does not exist!') 1863 | 1864 | exclusive = sum([len(args.list) > 0, len(args.extract) > 0]) 1865 | if exclusive > 1: 1866 | logger.fatal(f'--list and --extract are mutually exclusive options. Pick one.') 1867 | 1868 | if len(args.extract) > 0 and len(args.outdir) == 0: 1869 | logger.fatal('-O/--outdir telling where to extract files to is required when working in --extract mode.') 1870 | 1871 | options.update(vars(args)) 1872 | return args 1873 | 1874 | @atexit.register 1875 | def goodbye(): 1876 | try: 1877 | colorama.deinit() 1878 | except: 1879 | pass 1880 | 1881 | def terminalWidth(): 1882 | n = shutil.get_terminal_size((80, 20)) # pass fallback 1883 | return n.columns 1884 | 1885 | def banner(): 1886 | print(f''' 1887 | _ _ 1888 | _ __ ___ ___(_) __| |_ _ _ __ ___ _ __ 1889 | | '_ ` _ \/ __| |/ _` | | | | '_ ` _ \| '_ \ 1890 | | | | | | \__ \ | (_| | |_| | | | | | | |_) | 1891 | |_| |_| |_|___/_|\__,_|\__,_|_| |_| |_| .__/ 1892 | |_| 1893 | version: {Logger.colorize(VERSION, "green")} 1894 | author : Mariusz Banach (mgeeky, @mariuszbit) 1895 | binary-offensive.com 1896 | ''') 1897 | 1898 | def processFile(args, path): 1899 | msir = MSIDumper(options, logger) 1900 | 1901 | if not msir.open(path): 1902 | logger.err(f'Could not open database (use -d to learn more): {path}') 1903 | return '' 1904 | 1905 | report = '' 1906 | if not 
args.quiet and args.format == 'text': 1907 | report += f'{Logger.colorize("[+]","green")} Analyzing : {path}\n\n' 1908 | 1909 | if len(args.list) > 0: 1910 | report += msir.listTable(args.list) 1911 | 1912 | elif len(args.extract) > 0: 1913 | report += msir.extract(args.extract) 1914 | 1915 | else: 1916 | rep = msir.analyse() 1917 | 1918 | if len(args.yara) > 0: 1919 | rep += '\n\n' + msir.yaraScan() 1920 | 1921 | if not args.quiet: 1922 | report += str(rep) 1923 | 1924 | if args.format == 'text': 1925 | report += '\n\n' + msir.verdict.strip() + '\n' 1926 | 1927 | elif args.format == 'text': 1928 | verd = msir.verdict.strip() 1929 | pos = verd.find(':') 1930 | if pos != -1: 1931 | verd = verd[pos+1:].strip() 1932 | 1933 | report += verd + ' : ' + path 1934 | 1935 | if args.format == 'text': 1936 | logger.ok(f'Database processed : {path}') 1937 | msir.close() 1938 | 1939 | return report 1940 | 1941 | def processDir(args, infile): 1942 | report = '' 1943 | 1944 | logger.verbose(f'Process files from directory: {infile}') 1945 | 1946 | for file in glob.glob(os.path.join(infile, '**/**'), recursive=True): 1947 | path = os.path.join(infile, file) 1948 | if os.path.isfile(path): 1949 | try: 1950 | report += processFile(args, path) 1951 | report += '\n\n' 1952 | 1953 | except Exception as e: 1954 | logger.err('Analysis of "{}" failed. 
Exception: {}'.format( 1955 | path, str(e) 1956 | )) 1957 | 1958 | return report 1959 | 1960 | def main(): 1961 | global options 1962 | args = getoptions() 1963 | if not args: 1964 | return False 1965 | 1966 | if not args.quiet and args.format == 'text': 1967 | banner() 1968 | 1969 | if len(args.outfile) > 0: 1970 | options['nocolor'] = True 1971 | 1972 | options['max_width'] = terminalWidth() 1973 | 1974 | if os.path.isfile(args.infile): 1975 | report = processFile(args, args.infile) 1976 | 1977 | else: 1978 | report = processDir(args, args.infile) 1979 | 1980 | if len(args.outfile) > 0: 1981 | with open(args.outfile, 'wb') as f: 1982 | rep = Logger.stripColors(report) 1983 | f.write(rep.encode()) 1984 | else: 1985 | print(report) 1986 | 1987 | if __name__ == '__main__': 1988 | main() 1989 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | olefile 2 | colorama 3 | yara-python 4 | prettytable>=3.5 5 | pefile 6 | cabarchive 7 | pywin32 8 | python-magic 9 | python-magic-bin; sys_platform == "win32" or sys_platform == "darwin" 10 | 11 | # ssdeep is optional 12 | #ssdeep -------------------------------------------------------------------------------- /test-cases/README.md: -------------------------------------------------------------------------------- 1 | ## msidump test cases 2 | 3 | - `sample1-run-autoruns64.msi.bin` - launches MS Sysinternals Autoruns64.exe from `C:\Windows\Installer\MSXXXX.msi` 4 | - `sample2-run-calc-script.msi.bin` - executes VBScript that runs `calc` over `Wscript.Shell.Exec` method 5 | - `sample3-run-calc-shellcode-via-dotnet.msi.bin` - bundles specially crafted CustomAction .NET DLL, that when executed, runs shellcode which spawns `calc` 6 | - `sample4-customaction-run-calc.msi.bin` - simple MSI that runs system commands after installation is complete, here runs `calc` 7 | - `putty-backdoored.msi.bin` - runs 
`calc` during PuTTY installation 8 | 9 | All these installers install themselves to `%LOCALAPPDATA%\VcRedist` directory. 10 | 11 | You can uninstall them with: 12 | 13 | ``` 14 | msiexec /q /x file.msi 15 | ``` 16 | -------------------------------------------------------------------------------- /test-cases/putty-backdoored.msi.bin: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/mgeeky/msidump/40833694ebba0188f4f6e0d0bf5fd89a223775be/test-cases/putty-backdoored.msi.bin -------------------------------------------------------------------------------- /test-cases/sample1-run-autoruns64.msi.bin: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/mgeeky/msidump/40833694ebba0188f4f6e0d0bf5fd89a223775be/test-cases/sample1-run-autoruns64.msi.bin -------------------------------------------------------------------------------- /test-cases/sample2-run-calc-script.msi.bin: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/mgeeky/msidump/40833694ebba0188f4f6e0d0bf5fd89a223775be/test-cases/sample2-run-calc-script.msi.bin -------------------------------------------------------------------------------- /test-cases/sample3-run-calc-shellcode-via-dotnet.msi.bin: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/mgeeky/msidump/40833694ebba0188f4f6e0d0bf5fd89a223775be/test-cases/sample3-run-calc-shellcode-via-dotnet.msi.bin -------------------------------------------------------------------------------- /test-cases/sample4-customaction-run-calc.msi.bin: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/mgeeky/msidump/40833694ebba0188f4f6e0d0bf5fd89a223775be/test-cases/sample4-customaction-run-calc.msi.bin 
--------------------------------------------------------------------------------
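Note on `getoptions()` in `msidump.py`: the `--list`/`--extract` exclusivity is enforced there by hand, by counting how many modes were given. The sketch below is not part of the repository; it only illustrates how the same two constraints could be expressed with the standard library's `argparse.add_mutually_exclusive_group()`. The names `build_parser`, `parse` and the `msidump-sketch` program name are hypothetical, and only the options relevant to the check are reproduced.

```python
import argparse


def build_parser():
    # Hypothetical, trimmed-down parser: only the options involved in the check.
    parser = argparse.ArgumentParser(prog='msidump-sketch')
    parser.add_argument('infile', help='Input MSI file (or directory) for analysis.')

    # argparse rejects the command line itself when both modes are supplied.
    modes = parser.add_mutually_exclusive_group()
    modes.add_argument('-l', '--list', metavar='what', default='')
    modes.add_argument('-x', '--extract', metavar='what', default='')

    parser.add_argument('-O', '--outdir', metavar='path', default='')
    return parser


def parse(argv):
    parser = build_parser()
    args = parser.parse_args(argv)

    # --extract needs a destination; parser.error() prints usage and exits with code 2.
    if args.extract and not args.outdir:
        parser.error('-O/--outdir is required when working in --extract mode.')
    return args


if __name__ == '__main__':
    print(parse(['evil.msi', '-x', 'binary', '-O', 'extracted']))
```

One trade-off of this variant: errors are reported through `argparse` (usage text plus exit code 2) rather than through the tool's own `logger.fatal`, so the output style would differ from the rest of the program.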