├── README.md
├── img
│   ├── 1.png
│   ├── 2.png
│   ├── 3.png
│   └── 4.png
├── msidump.py
├── requirements.txt
└── test-cases
    ├── README.md
    ├── putty-backdoored.msi.bin
    ├── sample1-run-autoruns64.msi.bin
    ├── sample2-run-calc-script.msi.bin
    ├── sample3-run-calc-shellcode-via-dotnet.msi.bin
    └── sample4-customaction-run-calc.msi.bin


/README.md:
--------------------------------------------------------------------------------
  1 | # `msidump`
  2 | 
  3 | **MSI Dump** - a tool that analyzes malicious MSI installation packages, extracts files, streams, and binary data, and incorporates a YARA scanner.
  4 | 
  5 | For macro-enabled Office documents, we can quickly use [oletools mraptor](https://github.com/decalage2/oletools/blob/master/oletools/mraptor.py) to determine whether a document is malicious. If we want to dissect it further, we can bring in [oletools olevba](https://github.com/decalage2/oletools/blob/master/oletools/olevba.py) or [oledump](https://github.com/DidierStevens/DidierStevensSuite/blob/master/oledump.py).
  6 | 
  7 | To dissect malicious MSI files, so far we have had only one tool, the reliable and trustworthy [lessmsi](https://github.com/activescott/lessmsi).
  8 | However, `lessmsi` doesn't implement the features I was looking for:
  9 | 
 10 | - Quick triage
 11 | - Binary data extraction
 12 | - YARA scanning
 13 | 
 14 | This is where `msidump` comes into play.
 15 | 
 16 | 
 17 | ## Features
 18 | 
 19 | This tool helps with quick triage as well as detailed examination of malicious MSI corpora.
 20 | It lets us:
 21 | 
 22 | - Quickly determine whether a file is suspicious
 23 | - List all MSI tables and dump specific records
 24 | - Extract Binary data, all files from CABs, and scripts from CustomActions
 25 | - Scan all inner data and records with YARA rules
 26 | - Use `file`/MIME type deduction to determine the inner data type
 27 | 
 28 | It was created as a companion tool to the blog post I released here:
 29 | 
 30 | - [MSI Shenanigans. Part 1 - Offensive Capabilities Overview](https://mgeeky.tech/msi-shenanigans-part-1/)
 31 | 
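The type deduction listed above appears to be backed, for script content, by keyword counting. The sketch below is only an illustration of that idea: the keyword lists are copied from the `unsure-vbscript` and `unsure-jscript` entries of `RecognizedInnerFileTypes` in `msidump.py`, while the helper name `guess_script_type` and the matching rules (case-insensitive substring hits, first rule reaching `min-keywords` wins) are assumptions and may differ from the tool's actual logic.

```python
# Keyword lists copied from RecognizedInnerFileTypes in msidump.py.
# The matching strategy itself is an assumption made for illustration.
RULES = {
    'unsure-vbscript': {
        'min-keywords': 3,
        'keywords': (
            'dim', 'function ', 'sub ', 'createobject', 'getobject', 'with',
            'string', 'object', 'set', 'then', 'end if', 'end function', 'end sub',
        ),
        'not-keywords': ('<?xml',),
    },
    'unsure-jscript': {
        'min-keywords': 3,
        'keywords': (
            'var', 'activexobject', 'try {', 'try{', '}catch', '} catch',
            'return ', 'function ',
        ),
        'not-keywords': (),
    },
}


def guess_script_type(text, rules=RULES):
    """Return the first rule name whose keyword count reaches the threshold."""
    lowered = text.lower()
    for name, rule in rules.items():
        # A single excluded keyword disqualifies the rule entirely.
        if any(bad in lowered for bad in rule['not-keywords']):
            continue
        hits = sum(1 for kw in rule['keywords'] if kw in lowered)
        if hits >= rule['min-keywords']:
            return name
    return None


vbs = 'Dim sh\nSet sh = CreateObject("WScript.Shell")\nsh.Run "calc.exe"'
js = 'var sh = new ActiveXObject("WScript.Shell"); try { sh.Run("calc"); } catch (e) {}'
print(guess_script_type(vbs))
print(guess_script_type(js))
```

Substring counting like this is cheap but easy to fool, which is presumably why the corresponding indicators in the source carry a question mark (`VBScript (?)`, `JScript (?)`).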
 32 | 
 33 | ### Limitations
 34 | 
 35 | - The program is still in an early alpha version, so expect things to break and the triaging/parsing logic to change.
 36 | - Because this tool relies heavily on the Win32 COM `WindowsInstaller.Installer` interfaces, **it is currently not possible to support native Linux** platforms. Maybe `wine python msidump.py` could help, but I haven't tried that yet.
 37 | 
 38 | 
 39 | ## Use Cases
 40 | 
 41 | 1. Perform a quick triage of a suspicious MSI, augmented with a YARA rule:
 42 | 
 43 | ```
 44 | cmd> python msidump.py evil.msi -y rules.yara
 45 | ```
 46 | 
 47 | ![1.png](img/1.png)
 48 | 
 49 | Here we can see that the input MSI is injected with a suspicious **VBScript** and contains numerous executables.
 50 | 
 51 | 
 52 | 2. Now we want to take a closer look at this VBScript by extracting only that record.
 53 | 
 54 | We see from the triage table that it is present in the `Binary` table. Let's extract it:
 55 | 
 56 | ```
 57 | python msidump.py putty-backdoored.msi -l binary -i UBXtHArj
 58 | ```
 59 | 
 60 | We can specify which record to dump either by its name/ID or by its index number (here that would be 7).
 61 | 
 62 | ![2.png](img/2.png)
 63 | 
 64 | Let's have a look at another example. This time there is an executable stored in the `Binary` table that will be executed during installation:
 65 | 
 66 | ![3.png](img/3.png)
 67 | 
 68 | To extract that file, we're going to run:
 69 | 
 70 | ```
 71 | python msidump.py evil2.msi -x binary -i lmskBju -O extracted
 72 | ```
 73 | 
 74 | Where:
 75 | - `-x binary` tells the tool to extract the contents of the `Binary` table
 76 | - `-i lmskBju` specifies exactly which record to extract
 77 | - `-O extracted` sets the output directory
 78 | 
 79 | ![4.png](img/4.png)
 80 | 
 81 | 
 82 | For the best output experience, run the tool in a **maximized console window** or redirect the output to a file:
 83 | 
 84 | ```
 85 | python msidump.py [...] -o analysis.log
 86 | ```
 87 | 
 88 | ## Full Usage
 89 | 
 90 | ```
 91 | PS D:\> python .\msidump.py --help
 92 | options:
 93 |   -h, --help            show this help message and exit
 94 | 
 95 | Required arguments:
 96 |   infile                Input MSI file (or directory) for analysis.
 97 | 
 98 | Options:
 99 |   -q, --quiet           Surpress banner and unnecessary information. In triage mode, will display only verdict.
100 |   -v, --verbose         Verbose mode.
101 |   -d, --debug           Debug mode.
102 |   -N, --nocolor         Dont use colors in text output.
103 |   -n PRINT_LEN, --print-len PRINT_LEN
104 |                         When previewing data - how many bytes to include in preview/hexdump. Default: 128
105 |   -f {text,json,csv}, --format {text,json,csv}
106 |                         Output format: text, json, csv. Default: text
107 |   -o path, --outfile path
108 |                         Redirect program output to this file.
109 |   -m, --mime            When sniffing inner data type, report MIME types
110 | 
111 | Analysis Modes:
112 |   -l what, --list what  List specific table contents. See help message to learn what can be listed.
113 |   -x what, --extract what
114 |                         Extract data from MSI. For what can be extracted, refer to help message.
115 | 
116 | Analysis Specific options:
117 |   -i number|name, --record number|name
118 |                         Can be a number or name. In --list mode, specifies which record to dump/display entirely. In --extract mode dumps only this particular record to --outdir
119 |   -O path, --outdir path
120 |                         When --extract mode is used, specifies output location where to extract data.
121 |   -y path, --yara path  Path to YARA rule/directory with rules. YARA will be matched against Binary data, streams and inner files
122 | 
123 | ------------------------------------------------------
124 | 
125 | - What can be listed:
126 |     --list CustomAction     - Specific table
127 |     --list Registry,File    - List multiple tables
128 |     --list stats            - Print MSI database statistics
129 |     --list all              - All tables and their contents
130 |     --list olestream        - Prints all OLE streams & storages.
131 |                               To display CABs embedded in MSI try: --list _Streams
132 |     --list cabs             - Lists embedded CAB files
133 |     --list binary           - Lists binary data embedded in MSI for its own purposes.
134 |                               That typically includes EXEs, DLLs, VBS/JS scripts, etc
135 | 
136 | - What can be extracted:
137 |     --extract all           - Extracts Binary data, all files from CABs, scripts from CustomActions
138 |     --extract binary        - Extracts Binary data
139 |     --extract files         - Extracts files
140 |     --extract cabs          - Extracts cabinets
141 |     --extract scripts       - Extracts scripts
142 | 
143 | ------------------------------------------------------
144 | ```
145 | 
146 | ## TODO
147 | 
148 | - The triaging logic is still a bit flaky and I'm not very proud of it, so it will be subject to ongoing redesign and refinement
149 | - Test it on a wider corpus of test samples
150 | - Add support for password-protected input ZIP archives
151 | - Add support for ingesting an entire directory full of YARA rules instead of working with a single file only
152 | - Currently, the tool matches malicious `CustomAction Type`s by comparing their raw numbers against known values, which is prone to evasion.
153 |   - It needs to be reworked to properly consume the Type number and decompose it [into flags](https://learn.microsoft.com/en-us/windows/win32/msi/summary-list-of-all-custom-action-types)
154 | 
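The flag decomposition described in the last TODO item could look roughly like the sketch below. It is not part of `msidump.py`: the function name `decompose_custom_action_type` and the returned dictionary layout are made up for illustration, the base-type names are a subset of `CustomActionNativeTypes`, and the bit masks follow the Windows Installer custom action type constants (`0x3F` base type, `0xC0` return processing, `0x300` scheduling, `0x400` in-script, `0x800` no-impersonate).

```python
# Subset of base types (low 6 bits), names taken from CustomActionNativeTypes.
BASE_TYPES = {
    1: 'dll-in-binary-table',
    2: 'exe-in-binary-table',
    5: 'jscript-in-binary-table',
    6: 'vbscript-in-binary-table',
    18: 'exe-installed-with-product',
    34: 'exe-with-directory-path-in-target',
    37: 'jscript-in-sequence-table',
    38: 'vbscript-in-sequence-table',
    50: 'exe-command-line',
}

# Values match CustomActionReturnType and CustomActionExecuteType.
RETURN_TYPES = {0: 'check', 64: 'ignore', 128: 'asyncWait', 192: 'asyncNoWait'}
SCHEDULING = {0: 'always', 256: 'firstSequence', 512: 'oncePerProcess', 768: 'clientRepeat'}
IN_SCRIPT = {0: 'deferred', 256: 'rollback', 512: 'commit'}


def decompose_custom_action_type(type_number):
    """Split a CustomAction Type number into its base type and option flags."""
    base = type_number & 0x3F
    result = {
        'base': base,
        'base_name': BASE_TYPES.get(base, 'unknown'),
        'return': RETURN_TYPES[type_number & 0xC0],
        'no_impersonate': bool(type_number & 0x800),
    }
    schedule_bits = type_number & 0x300
    if type_number & 0x400:
        # With the in-script bit set, bits 0x100/0x200 select rollback/commit.
        result['execute'] = IN_SCRIPT.get(schedule_bits, 'unknown')
    else:
        result['execute'] = 'immediate'
        result['schedule'] = SCHEDULING[schedule_bits]
    return result


for number in (1250, 3298, 1746, 65):
    print(number, decompose_custom_action_type(number))
```

With this approach, type numbers such as 1250 and 3298 (both listed under `Execute` in `CustomActionTypes`) resolve to the same base type 34 and differ only in the no-impersonate flag, so a detection rule could key on the base type rather than on every flag combination.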
155 | 
156 | ## Tool's Name
157 | 
158 | Apparently, when naming my tool, I didn't think to check whether the name was already taken.
159 | There is another tool named `msidump` that is part of the [msitools](https://gitlab.gnome.org/GNOME/msitools) GNOME package:
160 | 
161 | - [msidump](https://wiki.gnome.org/msitools)
162 | 
163 | ---
164 | 
165 | ### ☕ Show Support ☕
166 | 
167 | This and other projects are the outcome of sleepless nights and **plenty of hard work**. If you like what I do and appreciate that I always give back to the community,
168 | [consider buying me a coffee](https://github.com/sponsors/mgeeky) _(or better, a beer)_ just to say thank you! 💪
169 | 
170 | ---
171 | 
172 | ```
173 | Mariusz Banach / mgeeky, (@mariuszbit)
174 | <mb [at] binary-offensive.com>
175 | ```
176 | 


--------------------------------------------------------------------------------
/img/1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mgeeky/msidump/40833694ebba0188f4f6e0d0bf5fd89a223775be/img/1.png


--------------------------------------------------------------------------------
/img/2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mgeeky/msidump/40833694ebba0188f4f6e0d0bf5fd89a223775be/img/2.png


--------------------------------------------------------------------------------
/img/3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mgeeky/msidump/40833694ebba0188f4f6e0d0bf5fd89a223775be/img/3.png


--------------------------------------------------------------------------------
/img/4.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mgeeky/msidump/40833694ebba0188f4f6e0d0bf5fd89a223775be/img/4.png


--------------------------------------------------------------------------------
/msidump.py:
--------------------------------------------------------------------------------
   1 | #!/usr/bin/python3
   2 | #
   3 | # Written by Mariusz Banach <mb@binary-offensive.com>, @mariuszbit / mgeeky
   4 | #
   5 | 
   6 | import sys
   7 | import os
   8 | import re
   9 | import glob
  10 | import pefile
  11 | import argparse
  12 | import hashlib
  13 | import random
  14 | import string
  15 | import tempfile
  16 | import textwrap
  17 | import cabarchive
  18 | import shutil
  19 | import atexit
  20 | import urllib.parse
  21 | from collections import OrderedDict
  22 | from textwrap import fill
  23 | 
  24 | if sys.platform != 'win32':
  25 |     print('\n\n[!] FATAL: This script can only be used on Windows systems as it works with Win32 COM/OLE interfaces.\n\n')
  26 |     sys.exit(1)
  27 | import pythoncom
  28 | import win32com.client
  29 | from win32com.shell import shell, shellcon
  30 | from win32com.client import constants
  31 | 
  32 | USE_SSDEEP = False
  33 | 
  34 | try:
  35 |     import ssdeep
  36 |     USE_SSDEEP = True
  37 | except ImportError:
  38 |     quiet = False
  39 |     # for a in sys.argv:
  40 |     #     if a == '-q' or a == '--quiet':
  41 |     #         quiet = True
  42 |     #         break
  43 |     # if not quiet:
  44 |     #     print("[!] 'ssdeep' not installed. Will not use it.")
  45 | 
  46 | try:
  47 |     import colorama
  48 |     import magic
  49 |     import yara
  50 |     import olefile
  51 |     from prettytable import PrettyTable
  52 | 
  53 | except ImportError as e:
  54 |     print(f'\n[!] Requirements not installed: {e}\n\tInstall them with:\n\tcmd> pip install -r requirements.txt\n')
  55 |     sys.exit(1)
  56 | 
  57 | #########################################################
  58 | 
  59 | VERSION = '0.2'
  60 | 
  61 | #########################################################
  62 | 
  63 | options = {
  64 |     'debug'     : False,
  65 |     'verbose'   : False,
  66 |     'format'    : 'text',
  67 | }
  68 | 
  69 | logger = None
  70 | 
  71 | try:
  72 |     colorama.init()
  73 | except Exception:
  74 |     pass
  75 | 
  76 | class Logger:
  77 |     colors_map = {
  78 |         'red':      colorama.Fore.RED, 
  79 |         'green':    colorama.Fore.GREEN, 
  80 |         'yellow':   colorama.Fore.YELLOW,
  81 |         'blue':     colorama.Fore.BLUE, 
  82 |         'magenta':  colorama.Fore.MAGENTA, 
  83 |         'cyan':     colorama.Fore.CYAN,
  84 |         'white':    colorama.Fore.WHITE, 
  85 |         'grey':     colorama.Fore.WHITE,
  86 |         'reset':    colorama.Style.RESET_ALL,
  87 |     }
  88 |     
  89 |     def __init__(self, opts):
  90 |         self.opts = opts
  91 | 
  92 |     @staticmethod
  93 |     def colorize(txt, col):
  94 |         if not isinstance(txt, str):
  95 |             txt = str(txt)
  96 |         if col not in Logger.colors_map or options.get('nocolor', False):
  97 |             return txt
  98 |         return Logger.colors_map[col] + txt + Logger.colors_map['reset']
  99 | 
 100 |     @staticmethod
 101 |     def stripColors(txt):
 102 |         ansi_escape = re.compile(r'\x1B(?:[@-Z\\-_]|\[[0-?]*[ -/]*[@-~])')
 103 |         result = ansi_escape.sub('', txt)
 104 |         return result
 105 | 
 106 |     def fatal(self, txt):
 107 |         self.text('[!] ' + txt, color='red')
 108 |         sys.exit(1)
 109 | 
 110 |     def info(self, txt):
 111 |         self.text('[.] ' + txt, color='yellow')
 112 | 
 113 |     def err(self, txt):
 114 |         self.text('[-] ' + txt, color='red')
 115 | 
 116 |     def ok(self, txt):
 117 |         self.text('[+] ' + txt, color='green')
 118 | 
 119 |     def verbose(self, txt):
 120 |         if self.opts.get('verbose', False) or self.opts.get('debug', False):
 121 |             self.text('[>] ' + txt, color='cyan')
 122 | 
 123 |     def dbg(self, txt):
 124 |         if self.opts.get('debug', False):
 125 |             self.text('[dbg] ' + txt, color='magenta')
 126 | 
 127 |     def text(self, txt, color='none'):
 128 |         if color != 'none':
 129 |             txt = Logger.colorize(txt, color)
 130 | 
 131 |         if not self.opts.get('quiet', False):
 132 |             print(txt)
 133 | 
 134 | 
 135 | class MSIDumper:
 136 |     # https://learn.microsoft.com/pl-pl/windows/win32/msi/custom-action-return-processing-options?redirectedfrom=MSDN
 137 |     CustomActionReturnType = {
 138 |         'check' : 0,
 139 |         'ignore' : 64,
 140 |         'asyncWait' : 128,
 141 |         'asyncNoWait' : 192,
 142 |     }
 143 | 
 144 |     # https://learn.microsoft.com/en-us/windows/win32/msi/custom-action-execution-scheduling-options
 145 |     CustomActionExecuteType = {
 146 |         'always' : 0,
 147 |         'firstSequence' : 256,
 148 |         'oncePerProcess' : 512,
 149 |         'clientRepeat' : 768
 150 |     }
 151 | 
 152 |     #
 153 |     # https://learn.microsoft.com/en-us/windows/win32/msi/custom-action-in-script-execution-options
 154 |     # Deferred, rollback and commit custom actions can only be placed between InstallInitialize and InstallFinalize
 155 |     #
 156 |     CustomActionInScriptExecute = {
 157 |         'immediate' : 0,
 158 |         'deferred' : 1024,
 159 |         'rollback' : 1280,
 160 |         'commit' : 1536,
 161 |         'deferred-no-impersonate' : 3072,
 162 |         'rollback-no-impersonate' : 3328,
 163 |         'commit-no-impersonate' : 3584,
 164 |     }
 165 | 
 166 |     # https://learn.microsoft.com/en-us/windows/win32/msi/summary-list-of-all-custom-action-types
 167 |     CustomActionNativeTypes = {
 168 |         'dll-in-binary-table' : 1,
 169 |         'exe-in-binary-table' : 2,
 170 |         'jscript-in-binary-table' : 5,
 171 |         'vbscript-in-binary-table' : 6,
 172 |         'dll-installed-with-product' : 17,
 173 |         'exe-installed-with-product' : 18,
 174 |         'jscript-installed-with-product' : 21,
 175 |         'vbscript-installed-with-product' : 22,
 176 |         'exe-with-directory-path-in-target' : 34,
 177 |         'directory-set' : 35,
 178 |         'jscript-in-sequence-table' : 37,
 179 |         'vbscript-in-sequence-table' : 38,
 180 |         'exe-command-line' : 50,
 181 |         'jscript-with-funcname-in-property' : 53,
 182 |         'vbscript-with-funcname-in-property' : 54,
 183 |     }
 184 | 
 185 |     OpenMode = {
 186 |         'msiOpenDatabaseModeReadOnly' : 0,
 187 |         'msiOpenDatabaseModeTransact' : 1,
 188 |     }
 189 | 
 190 |     SkipColumns = (
 191 |         'extendedtype',
 192 |     )
 193 | 
 194 |     ListModes = (
 195 |         'all', 'olestream', 'cabs', 'binary', 'stats', 'olestreams',
 196 |     )
 197 | 
 198 |     ExtractModes = (
 199 |         'all', 'binary', 'files', 'cabs', 'scripts',
 200 |     )
 201 | 
 202 |     KnownCOMErrors = {
 203 |         0x80004005 : 'Could not process input database',
 204 |     }
 205 | 
 206 |     KnownTables = (
 207 |         'ActionText', 'AdminExecuteSequence', 'AdminUISequence', 'AdvtExecuteSequence', 'AdvtUISequence',
 208 |         'AppId', 'AppSearch', 'BBControl', 'Billboard', 'Binary', 'BindImage', 'CCPSearch', 'CheckBox',
 209 |         'Class', 'ComboBox', 'CompLocator', 'Complus', 'Component', 'Condition', 'Control', 'ControlCondition',
 210 |         'ControlEvent', 'CreateFolder', 'CustomAction', 'Dialog', 'Directory', 'DrLocator',
 211 |         'DuplicateFile', 'Environment', 'Error', 'EventMapping', 'Extension', 'Feature', 'FeatureComponents',
 212 |         'File', 'FileSFPCatalog', 'Font', 'Icon', 'IniFile', 'IniLocator', 'InstallExecuteSequence',
 213 |         'InstallUISequence', 'IsolatedComponent', 'LaunchCondition', 'ListBox', 'ListView', 'LockPermissions',
 214 |         'Media', 'MIME', 'MoveFile', 'MsiAssembly', 'MsiAssemblyName', 'MsiDigitalCertificate',
 215 |         'MsiDigitalSignature', 'MsiEmbeddedChainer', 'MsiEmbeddedUI', 'MsiFileHash', 'MsiLockPermissionsEx',
 216 |         'MsiPackageCertificate', 'MsiPatchCertificate', 'MsiPatchHeaders', 'MsiPatchMetadata', 'MsiPatchOldAssemblyFile',
 217 |         'MsiPatchOldAssemblyName', 'MsiPatchSequence', 'MsiServiceConfig', 'MsiServiceConfigFailureActions',
 218 |         'MsiSFCBypass', 'MsiShortcutProperty', 'ODBCAttribute', 'ODBCDataSource', 'ODBCDriver', 'ODBCSourceAttribute',
 219 |         'ODBCTranslator', 'Patch', 'PatchPackage', 'ProgId', 'Property', 'PublishComponent', 'RadioButton',
 220 |         'Registry', 'RegLocator', 'RemoveFile', 'RemoveIniFile', 'RemoveRegistry', 'ReserveCost', 'SelfReg',
 221 |         'ServiceControl', 'ServiceInstall', 'SFPCatalog', 'Shortcut', 'Signature', 'TextStyle', 'TypeLib', 'UIText',
 222 |         'Upgrade', 'Verb', '_Columns', '_Storages', '_Streams', '_Tables', '_TransformView', '_Validation',
 223 |     )
 224 | 
 225 |     ImportantTables = (
 226 |         'CustomAction', 'InstallExecuteSequence', '_Streams', 'Media', 'InstallUISequence', 'Binary', '_TransformView',
 227 |         'Component', 'Registry', 'Shortcut', 'RemoveFile', 'File',
 228 |     )
 229 | 
 230 |     SuspiciousTables = (
 231 |         'CustomAction', 'Binary', '_Streams', 
 232 |     )
 233 | 
 234 |     #
 235 |     # Approach based on assessing CustomAction Type numbers is prone to being evaded.
 236 |     # TODO: Rework it to properly consume Type number and decompose it onto flags:
 237 |     #  https://learn.microsoft.com/en-us/windows/win32/msi/summary-list-of-all-custom-action-types
 238 |     #
 239 |     CustomActionTypes = {
 240 |         'Execute' : {
 241 |             'color' : 'red',
 242 |             'types': (1250, 3298, 226),
 243 |             'desc' : 'Will execute system commands or other executables',
 244 |         },
 245 |         'VBScript' : {
 246 |             'color' : 'red',
 247 |             'types': (1126, 102),
 248 |             'desc' : 'Will run VBScript in-memory',
 249 |         }, 
 250 |         'JScript' : {
 251 |             'color' : 'red',
 252 |             'types': (1125, 101),
 253 |             'desc' : 'Will run JScript in-memory',
 254 |         },
 255 |         'Run-Exe' : {
 256 |             'color' : 'red',
 257 |             'types': (1218, 194),
 258 |             'desc' : 'Will extract executable from inner Binary table, drop it to:\n  C:\\Windows\\Installer\\MSIXXXX.tmp\nand then run it.',
 259 |         },
 260 |         'Load-DLL' : {
 261 |             'color' : 'red',
 262 |             'types': (65, ),
 263 |             'desc' : 'Will load DLL in memory and invoke its exported function.\nThat may also include .NET DLL',
 264 |         },
 265 |         'Run-Dropped-File' : {
 266 |             'color' : 'red',
 267 |             'types': (1746,),
 268 |             'desc' : 'Will run file extracted as a result of installation',
 269 |         },
 270 |         'Set-Directory' : {
 271 |             'color' : 'cyan',
 272 |             'types': (51,),
 273 |             'desc' : 'Will set Directory to a specific path',
 274 |         },
 275 |     }
 276 | 
 277 |     MimeTypesThatIncreasSuspiciousScore = (
 278 |         "application/hta",
 279 |         "application/js",
 280 |         "application/msword",
 281 |         "application/vnd.ms-excel",
 282 |         "application/vnd.ms-powerpoint",
 283 |         "application/vnd.ms-appx",
 284 |         "application/x-ms-shortcut",
 285 |         "application/x-vbs",
 286 |         'application/vnd.ms-excel', 
 287 |         'application/vnd.openxmlformats-officedocument.presentationml.presentation', 
 288 |         'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet', 
 289 |         'application/vnd.openxmlformats-officedocument.wordprocessingml.document',
 290 |         'application/x-dosexec',
 291 |     )
 292 | 
 293 |     RecognizedInnerFileTypes = {
 294 |         'cabinet' : {
 295 |             'indicator' : 'MS Cabinet archive (.CAB)',
 296 |             'safe-extension' : '.cab',
 297 |             'color' : 'yellow',
 298 |             'magic' : ('Microsoft Cabinet',)
 299 |         },
 300 |         'executable' : {
 301 |             'indicator' : 'PE executable (EXE)',
 302 |             'safe-extension' : '.exe.bin',
 303 |             'color' : 'red',
 304 |             'magic' : (
 305 |                 'executable (console)', 
 306 |                 'executable (GUI)', 
 307 |             )
 308 |         },
 309 |         'dll' : {
 310 |             'indicator' : 'PE executable (DLL)',
 311 |             'safe-extension' : '.dll.bin',
 312 |             'color' : 'red',
 313 |             'magic' : (
 314 |                 'executable (DLL)', 
 315 |             )
 316 |         },
 317 |         'unsure-executable' : {
 318 |             'indicator' : 'PE executable (?)',
 319 |             'safe-extension' : '.exe.bin',
 320 |             'color' : 'red',
 321 |             'min-keywords' : 3,
 322 |             'keywords' : (
 323 |                 'This program', 'cannot be', 'run in', 'dos mode',
 324 |             ),
 325 |         },
 326 |         'unsure-cabinet' : {
 327 |             'indicator' : 'CAB archive (?)',
 328 |             'safe-extension' : '.cab',
 329 |             'color' : 'yellow',
 330 |             'min-keywords' : 1,
 331 |             'keywords' : (
 332 |                 'MSCF',
 333 |             ),
 334 |         },
 335 |         'unsure-vbscript' : {
 336 |             'indicator' : 'VBScript (?)',
 337 |             'safe-extension' : '.vbs.bin',
 338 |             'color' : 'red',
 339 |             'printable' : True,
 340 |             'min-keywords' : 3,
 341 |             'keywords' : (
 342 |                 'dim', 'function ', 'sub ', 'createobject', 'getobject', 'with', 'string',
 343 |                 'object', 'set', 'then', 'end if', 'end function', 'end sub'
 344 |             ),
 345 |             'not-keywords' : (
 346 |                 '<?xml',
 347 |             )
 348 |         },
 349 |         'unsure-jscript' : {
 350 |             'indicator' : 'JScript (?)',
 351 |             'safe-extension' : '.js.bin',
 352 |             'color' : 'red',
 353 |             'printable' : True,
 354 |             'min-keywords' : 3,
 355 |             'keywords' : (
 356 |                 'var', 'activexobject', 'try {', 'try{', '}catch', '} catch', 'return ',
 357 |                 'function ',
 358 |             ),
 359 |             'not-keywords' : (
 360 |             )
 361 |         }
 362 |     }
 363 | 
 364 |     DangerousExtensions = (
 365 |         '.lnk', '.exe', '.cpl', '.xll', '.url', '.vbs', '.ps1', '.bat', '.psm', 
 366 |         '.wsc', '.wsf', '.dll', '.js', '.vbe', '.jse', '.hta', '.msi', '.cmd',
 367 |     )
 368 | 
 369 |     TableSortBy = {
 370 |         'InstallExecuteSequence' : 2,
 371 |         'InstallUISequence' : 2,
 372 |         'File' : 7,
 373 |         'Feature' : 4,
 374 |         'Media' : 0,
 375 |     }
 376 | 
 377 |     DefaultTableWidth = 128
 378 | 
 379 |     def __init__(self, options, logger):
 380 |         self.options = options
 381 |         self.logger = logger
 382 |         self.disinfectionMode = False
 383 |         self.report = []
 384 |         self.infile = ''
 385 |         self.csvDelim = ','
 386 |         self.maxWidth = self.options.get('print_len', -1)
 387 |         self.format = self.options.get('format', 'text')
 388 |         self.errorsCache = set()
 389 |         self.nativedb = None
 390 |         self.outdir = ''
 391 |         self.verdict = f'[.] Verdict: {Logger.colorize("Benign", "green")}'
 392 |         self.installer = None
 393 |         self.extractedCount = 0
 394 |         self.grade = 0
 395 | 
 396 |         self.specificTableAlignment = {
 397 |             'stats' : {
 398 |                 'type' : 'r',
 399 |                 'value' : 'l',
 400 |             },
 401 |             'report' : {
 402 |                 'description': 'l',
 403 |                 'context': 'l',
 404 |             }
 405 |         }
 406 | 
 407 |     @staticmethod
 408 |     def isprintable(data):
 409 |         if type(data) is str:
 410 |             data = data.encode()
 411 |         for a in data:
 412 |             if a not in string.printable.encode():
 413 |                 return False
 414 |         return True
 415 | 
 416 |     @staticmethod
 417 |     def fromHexdumpToRaw(txt):
 418 |         raw = []
 419 |         if not re.match(r'[0-9a-f]+ \| [0-9a-f]{2}.*', txt.split('\n')[0], re.I):
 420 |             return txt.encode()
 421 | 
 422 |         for line in txt.split('\n'):
 423 |             line = line.strip()
 424 | 
 425 |             if re.match(r'[0-9a-f]+ \| [0-9a-f]{2}.*', line, re.I):
 426 |                 parts = line.split('|')
 427 |                 bytesPart = parts[1].strip()
 428 | 
 429 |                 for m in re.finditer(r'([0-9a-f]{2})', bytesPart, re.I):
 430 |                     raw.append(int(m.group(1), 16))
 431 |         return bytes(raw)
 432 | 
 433 |     @staticmethod
 434 |     def hexdump(data, addr = 0, num = 0):
 435 |         n = 0
 436 |         lines = []
 437 | 
 438 |         # Encode first so that lengths and offsets are counted in bytes.
 439 |         if type(data) is str:
 440 |             data = data.encode()
 441 | 
 442 |         if len(data) == 0:
 443 |             return '<empty>'
 444 |         if num == 0: num = len(data)
 445 | 
 446 |         for i in range(0, num, 16):
 447 |             line = ''
 448 |             line += '%04x | ' % (addr + i)
 449 |             n += 16
 450 | 
 451 |             for j in range(n-16, n):
 452 |                 if j >= len(data): break
 453 |                 line += '%02x ' % (int(data[j]) & 0xff)
 454 | 
 455 |             line += ' ' * (3 * 16 + 7 - len(line)) + ' | '
 456 | 
 457 |             for j in range(n-16, n):
 458 |                 if j >= len(data): break
 459 |                 c = data[j] if not (data[j] < 0x20 or data[j] > 0x7e) else '.'
 460 |                 line += '%c' % c
 461 | 
 462 |             lines.append(line)
 463 |         return '\n'.join(lines)
 464 | 
 465 |     def parseCOMException(self, message, error, additional=''):
 466 |         code = error.hresult & 0xFFFFFFFF
 467 |         code2 = 0
 468 | 
 469 |         try:
 470 |             code2 = error.excepinfo[-1] & 0xFFFFFFFF
 471 |         except:
 472 |             pass
 473 | 
 474 |         if code2 != 0:
 475 |             if code in MSIDumper.KnownCOMErrors:
 476 |                 additional += MSIDumper.KnownCOMErrors[code]
 477 | 
 478 |             if code2 in MSIDumper.KnownCOMErrors:
 479 |                 additional += MSIDumper.KnownCOMErrors[code2]
 480 | 
 481 |             self.logger.err(f'''{message}:
 482 | 
 483 |     {error}
 484 | 
 485 |     HRESULT 1: 0x{code:08X}          <-- General exception code
 486 | 
 487 |     HRESULT 2: 0x{code2:08X}          <-- COM exception code. Google up that error number: 
 488 |                                         https://google.com/?q={urllib.parse.quote_plus(f"COM exception 0x{code2:08X}")}
 489 | 
 490 |     {additional}
 491 | ''')
 492 | 
 493 |         else:
 494 |             if code in MSIDumper.KnownCOMErrors:
 495 |                 additional += MSIDumper.KnownCOMErrors[code]
 496 | 
 497 |             self.logger.err(f'''{message}:
 498 | 
 499 |     {error}
 500 | 
 501 |     HRESULT: 0x{code:08X}          <-- General exception code
 502 | 
 503 |     {additional}
 504 | ''')
 505 | 
 506 |     def open(self, infile):
 507 |         self.infile = os.path.abspath(os.path.normpath(infile))
 508 |         self.outdir = os.path.abspath(os.path.normpath(self.options.get('outdir', '')))
 509 | 
 510 |         if not os.path.isfile(self.infile):
 511 |             self.logger.fatal(f'Input file does not exist: {self.infile}')
 512 | 
 513 |         mode = MSIDumper.OpenMode['msiOpenDatabaseModeReadOnly']
 514 | 
 515 |         if self.disinfectionMode:
 516 |             self.logger.fatal('MSI Disinfection is not yet implemented.')
 517 |             mode = MSIDumper.OpenMode['msiOpenDatabaseModeTransact']
 518 | 
 519 |         self.initCOM()
 520 | 
 521 |         try:
 522 |             self.logger.dbg(f'Opening database {self.infile} ...')
 523 |             self.nativedb = self.installer.OpenDatabase(
 524 |                 self.infile, 
 525 |                 mode
 526 |             )
 527 | 
 528 |             return True
 529 | 
 530 |         except pythoncom.com_error as error:
 531 |             if self.options['debug']:
 532 |                 self.parseCOMException(
 533 |                     message=f"Could not open MSI database natively via COM",
 534 |                     error=error
 535 |                 )
 536 | 
 537 |             return False
 538 | 
 539 |     def close(self):
 540 |         if self.nativedb is not None:
 541 |             self.nativedb = None
 542 |         
 543 |         if self.installer is not None:
 544 |             try:
 545 |                 self.installer.Release()
 546 |             except:
 547 |                 pass
 548 | 
 549 |             self.installer = None
 550 | 
 551 |     def initCOM(self):
 552 |         if self.installer is not None:
 553 |             return
 554 | 
 555 |         try:
 556 |             #
 557 |             # Logic borrowed from:
 558 |             #   https://github.com/orestis/python/blob/master/Tools/msi/msilib.py#L60
 559 |             #
 560 | 
 561 |             self.logger.dbg('Initializing COM and instantiating WindowsInstaller.Installer ...')
 562 |             pythoncom.CoInitialize()
 563 | 
 564 |             win32com.client.gencache.EnsureModule('{000C1092-0000-0000-C000-000000000046}', 1033, 1, 0)
 565 | 
 566 |             self.installer = win32com.client.Dispatch(
 567 |                 'WindowsInstaller.Installer',
 568 |                 resultCLSID='{000C1090-0000-0000-C000-000000000046}'
 569 |             )
 570 | 
 571 |             if self.installer is None:
 572 |                 self.logger.fatal('Could not instantiate WindowsInstaller.Installer!')
 573 | 
 574 |         except Exception as e:
 575 |             self.logger.fatal(f'Could not instantiate WindowsInstaller.Installer. Exception:\n\n\t{e}')
 576 | 
 577 |     def collectEntries(self, table, dontSort = False):
 578 |         entries = []
 579 | 
 580 |         try:
 581 |             entries = self._collectEntries(
 582 |                 table, 
 583 |                 dontSort
 584 |             )
 585 |         except Exception as e:
 586 |             self.logger.dbg(f'Error: could not collect records from table {table}: {e}')
 587 | 
 588 |             if self.options.get('debug', False) and table.lower() != '_streams':
 589 |                 raise
 590 | 
 591 |         return entries
 592 | 
 593 |     def _collectEntries(self, table, dontSort = False):
 594 |         assert self.nativedb is not None, "Database is not opened"
 595 |         entries = []
 596 | 
 597 |         view = self.nativedb.OpenView(f"SELECT * FROM {table}")
 598 |         view.Execute(None)
 599 | 
 600 |         types = view.ColumnInfo(constants.msiColumnInfoTypes)
 601 |         names = view.ColumnInfo(constants.msiColumnInfoNames)
 602 |         columns = []
 603 | 
 604 |         for i in range(1, types.FieldCount+1):
 605 |             t = types.StringData(i)
 606 |             n = names.StringData(i)
 607 | 
 608 |             if t[0] in 'slSL':
 609 |                 columns.append((n, 'str'))
 610 |             elif t[0] in 'iI':
 611 |                 columns.append((n, 'int'))
 612 |             elif t[0] == 'v':
 613 |                 columns.append((n, 'bin'))
 614 |             else:
 615 |                 self.logger.dbg(f'Unsupported column type: table {table}, column: {i}. Type: {t}, Name: {n}')
 616 |                 columns.append((n, '?'))
 617 | 
 618 |         while True:
 619 |             r = view.Fetch() 
 620 |             if not r:
 621 |                 break
 622 | 
 623 |             rec = OrderedDict()
 624 |             for i in range(1, r.FieldCount+1):
 625 |                 val = None
 626 |                 name = columns[i-1][0]
 627 | 
 628 |                 if r.IsNull(i):
 629 |                     val = ''
 630 | 
 631 |                 elif columns[i-1][1] == 'str': 
 632 |                     try:
 633 |                         val = r.StringData(i)
 634 | 
 635 |                     except Exception as e:
 636 |                         txt = f'Could not convert {table} column {columns[i-1][0]} value to string (type: {columns[i-1][1]}): {e}'
 637 |                         if txt not in self.errorsCache:
 638 |                             self.logger.dbg(txt)
 639 |                             self.errorsCache.add(txt)
 640 |                         val = ''
 641 | 
 642 |                 elif columns[i-1][1] == 'int': 
 643 |                     try:
 644 |                         val = r.IntegerData(i)
 645 |                     except Exception as e:
 646 |                         txt = f'Could not convert {table} column {columns[i-1][0]} value to integer (type: {columns[i-1][1]}): {e}'
 647 |                         if txt not in self.errorsCache:
 648 |                             self.logger.dbg(txt)
 649 |                             self.errorsCache.add(txt)
 650 |                         val = 0
 651 | 
 652 |                 elif columns[i-1][1] == 'bin': 
 653 |                     size = r.DataSize(i)
 654 |                     val = r.ReadStream(i, size, constants.msiReadStreamBytes)
 655 | 
 656 |                 rec[columns[i-1][0].lower()] = val
 657 | 
 658 |             entries.append(rec)
 659 | 
 660 |         view.Close()
 661 | 
 662 |         if not dontSort and table in MSIDumper.TableSortBy:
 663 |             entries = sorted(entries, key=lambda x: list(x.values())[MSIDumper.TableSortBy[table]] )
 664 | 
 665 |         self.logger.dbg(f'Collected {len(entries)} entries from {table} ...')
 666 |         return entries
 667 | 
 668 |     def getMaxValueFromTable(self, table, columnNum):
 669 |         maxVal = -1
 670 |         entries = self.collectEntries(table)
 671 | 
 672 |         for entry in entries:
 673 |             if maxVal < entry[columnNum]:
 674 |                 maxVal = entry[columnNum]
 675 | 
 676 |         return maxVal
 677 | 
 678 |     def analyse(self):
 679 |         assert self.nativedb is not None, "Database is not opened"
 680 | 
 681 |         try:
 682 |             ret = self.analysisWorker()
 683 | 
 684 |             if self.grade > 0:
 685 |                 self.verdict = f'[.] Verdict: {Logger.colorize("SUSPICIOUS", "red")}'
 686 | 
 687 |             self.logger.verbose(f'Verdict grade: {self.grade}')
 688 | 
 689 |             return ret
 690 | 
 691 |         except Exception as e:
 692 |             if self.nativedb is not None:
 693 |                 self.nativedb = None
 694 | 
 695 |             if self.options['debug']: 
 696 |                 raise
 697 |             else:
 698 |                 self.logger.err(f'Could not analyse input MSI. Enable --debug to learn more. Exception: {e}')
 699 | 
 700 |             return False
 701 | 
 702 |         finally:
 703 |             pass
 704 | 
 705 |     def listTable(self, table):
 706 |         if ',' in table:
 707 |             output = ''
 708 |             tables = table.split(',')
 709 |             for t in tables:
 710 |                 output += f'{Logger.colorize("[+]", "green")} Listing: {Logger.colorize(t, "green")}\n\n'
 711 | 
 712 |                 out = self._listTable(t)
 713 |                 if out is not None:
 714 |                     output += str(out) + '\n'
 715 | 
 716 |             return output
 717 |         else:
 718 |             return self._listTable(table)
 719 | 
 720 |     def _listTable(self, table):
 721 |         assert self.nativedb is not None, "Database is not opened"
 722 | 
 723 |         records = None
 724 | 
 725 |         if table == 'streams':  table = '_Streams'
 726 |         if table == 'stream':   table = '_Streams'
 727 |         if table == 'binary':   table = 'Binary'
 728 |         if table == 'cabs':     table = 'Media'
 729 |         if table == 'olestreams': table = 'olestream'
 730 | 
 731 |         if table.lower() not in [x.lower() for x in MSIDumper.KnownTables + MSIDumper.ListModes]:
 732 |             tb = PrettyTable(['1','2','3'])
 733 |             tb.header = False
 734 |             vals = list(MSIDumper.KnownTables + MSIDumper.ListModes)
 735 |             i = 0
 736 |             while i + 3 <= len(vals):
 737 |                 tb.add_row([vals[i+0], vals[i+1], vals[i+2]])
 738 |                 i += 3
 739 | 
 740 |             if i < len(vals):
 741 |                 for j in range(len(vals)-i):
 742 |                     tb.add_row([vals[i+j], '', ''])
 743 | 
 744 |             self.logger.fatal(f'Unsupported --list setting: {table}\n    Pick one/combination of following --list values:\n\n{tb}\n')
 745 | 
 746 |         if table.lower() in [x.lower() for x in MSIDumper.KnownTables]:
 747 |             try:
 748 |                 if table not in MSIDumper.KnownTables:
 749 |                     for t in MSIDumper.KnownTables:
 750 |                         if table.lower() == t.lower():
 751 |                             table = t
 752 |                             break
 753 | 
 754 |                 index = self.options.get('record', -1)
 755 |                 if index != -1:
 756 |                     records0 = self.collectEntries(table)
 757 | 
 758 |                     try:
 759 |                         index = int(index)
 760 |                         if index < 1 or index > len(records0):
 761 |                             self.logger.fatal(f'Invalid --record specified. There were only {len(records0)} records returned from {table}.\n\t\tUse value between --record 1 and --record {len(records0)}')
 762 |                         records = [ records0[index-1], ]
 763 |                     except ValueError:
 764 |                         records = []
 765 |                         for a in records0:
 766 |                             vals = list(a.values())
 767 |                             if len(vals) > 0 and str(vals[0]).lower() == index.lower():
 768 |                                 records.append(a)
 769 |                                 break
 770 | 
 771 |                         if len(records) == 0:
 772 |                             self.logger.fatal(f'Invalid --record specified. Could not find {table} record entry based on its index number nor ID name.')
 773 |                 else:
 774 |                     records = self.collectEntries(table)  
 775 |      
 776 |             except Exception as e:
 777 |                 self.logger.err(f'Exception occurred while enumerating {table} entries: {e}')
 778 | 
 779 |                 if self.options.get('debug', False):
 780 |                     raise
 781 |         else:
 782 |             table = table.lower()
 783 | 
 784 |             try:
 785 |                 if table == 'stats':
 786 |                     records = self.collectStats()
 787 |                 elif table == 'all':
 788 |                     return self.collectAll()
 789 |                 elif table == 'olestream':
 790 |                     records = self.collectStreams()
 791 |                 else:
 792 |                     self.logger.fatal(f'Unsupported --list setting: {table}')
 793 | 
 794 |             except Exception as e:
 795 |                 self.logger.err(f'Exception occurred while pulling MSI metadata {table}: {e}')
 796 | 
 797 |                 if self.options.get('debug', False):
 798 |                     raise
 799 | 
 800 |         if records is not None:
 801 |             self.tableSpecificHighlighting(table, records)
 802 |             return self.printTable(table, records)
 803 | 
 804 |         else:
 805 |             if table in MSIDumper.KnownTables:
 806 |                 return f'No records found in {Logger.colorize(table, "green")} table.'
 807 |             else:
 808 |                 return f'No {Logger.colorize(table, "green")} metadata was extracted.'
 809 | 
 810 |     def tableSpecificHighlighting(self, table, records):
 811 |         if table.lower() == 'customaction':
 812 |             for i in range(len(records)):
 813 |                 rec = records[i]
 814 |                 for k, v in rec.items():
 815 |                     if k == 'type':
 816 |                         col = ''
 817 |                         for a, b in MSIDumper.CustomActionTypes.items():
 818 |                             if v in b['types']:
 819 |                                 col = b['color']
 820 |                                 break
 821 |                         if col != '':
 822 |                             records[i][k] = Logger.colorize(v, col)
 823 |                             records[i]['source'] = Logger.colorize(records[i]['source'], col)
 824 | 
 825 |         if table.lower() == 'binary':
 826 |             for i in range(len(records)):
 827 |                 records[i]['Magic type'] = self.sniffDataType(records[i]['data'], color=True)
 828 |         
 829 |     def extract(self, what):
 830 |         assert self.nativedb is not None, "Database is not opened"
 831 | 
 832 |         what = what.lower()
 833 | 
 834 |         if what == 'script':
 835 |             what = 'scripts'
 836 | 
 837 |         if what not in [x.lower() for x in MSIDumper.ExtractModes]:
 838 |             self.logger.fatal(f'Unsupported --extract setting: {what}')
 839 | 
 840 |         self.outdir = os.path.normpath(os.path.abspath(self.options.get('outdir', '')))
 841 |         if len(self.outdir) == 0:
 842 |             self.outdir = os.getcwd()
 843 | 
 844 |         if not os.path.isdir(self.outdir):
 845 |             os.makedirs(self.outdir)
 846 | 
 847 |         if what == 'all':
 848 |             return self.extractAll()
 849 |         elif what == 'binary':
 850 |             return self.extractBinary()
 851 |         elif what == 'files':
 852 |             return self.extractFiles()
 853 |         elif what == 'cabs':
 854 |             return self.extractCABs()
 855 |         elif what == 'scripts':
 856 |             return self.extractScripts()
 857 | 
 858 |     def extractAll(self):
 859 |         output = ''
 860 | 
 861 |         outs = self.extractBinary()
 862 |         if len(outs) > 0:
 863 |             output += outs + '\n'
 864 |         
 865 |         outs = self.extractFiles()
 866 |         if len(outs) > 0:
 867 |             output += outs + '\n'
 868 | 
 869 |         outs = self.extractCABs()
 870 |         if len(outs) > 0:
 871 |             output += outs + '\n'
 872 | 
 873 |         outs = self.extractScripts()
 874 |         if len(outs) > 0:
 875 |             output += outs + '\n'
 876 | 
 877 |         output += f'\nExtracted in total {self.extractedCount} objects.\n'
 878 | 
 879 |         return output
 880 | 
 881 |     def sanitizeName(self, name):
 882 |         windowsNames = (
 883 |             'CON', 'PRN', 'AUX', 'NUL', 'COM1', 'COM2', 
 884 |             'COM3', 'COM4', 'COM5', 'COM6', 'COM7', 
 885 |             'COM8', 'COM9', 'LPT1', 'LPT2', 'LPT3', 'LPT4', 
 886 |             'LPT5', 'LPT6', 'LPT7', 'LPT8', 'LPT9', 
 887 |         )
 888 | 
 889 |         for a in ('..', '\\', '/', '"', "'", '?', '*', ':'):
 890 |             name = name.replace(a, '')
 891 | 
 892 |         if name.split('.')[0].upper() in windowsNames:
 893 |             name = '_' + name
 894 |         
 895 |         if len(name) == 0:
 896 |             name = 'bin-' + ''.join(random.choices(string.ascii_uppercase + string.digits, k=5))
 897 | 
 898 |         return name
 899 | 
 900 |     def extractBinary(self):
 901 |         binary = self.collectEntries('Binary')
 902 |         num = 0
 903 |         output = ''
 904 | 
 905 |         self.logger.verbose('Extracting data from Binary table...')
 906 | 
 907 |         if len(binary) == 0:
 908 |             self.logger.err('Input MSI does not contain any embedded Binary data.')
 909 | 
 910 |         for elem in binary:
 911 |             sniffed = self.sniffDataType(elem['data'])
 912 |             name = self.sanitizeName(elem['name']) + self.sniffDataExt(sniffed)
 913 |             outp = os.path.join(self.outdir, name)
 914 | 
 915 |             with open(outp, 'wb') as f:
 916 |                 f.write(elem['data'].encode('latin-1'))  # msiReadStreamBytes yields one character per byte
 917 | 
 918 |             num += 1
 919 |             output += f'\n{Logger.colorize("[+]","green")} Extracted {Logger.colorize(len(elem["data"]),"green")} bytes of {Logger.colorize(elem["name"],"green")} object to: {Logger.colorize(outp,"yellow")}'
 920 | 
 921 |         self.extractedCount += num
 922 |         if num > 0 and self.options.get('extract', '') != 'all':
 923 |             output += f'\n\nExtracted in total {num} objects.\n'
 924 | 
 925 |         return output
 926 | 
 927 |     def extractCab(self, infile, outdir, files):
 928 |         with open(infile, "rb") as f:
 929 |             arc = cabarchive.CabArchive(f.read())
 930 | 
 931 |         self.logger.verbose('Extracting Cabinets from MSI...')
 932 | 
 933 |         output = f'Extracting files from CAB ({infile}):\n\n'
 934 |         num = 0
 935 | 
 936 |         for k, v in arc.items():
 937 |             fn = v.filename
 938 | 
 939 |             for _file in files:
 940 |                 if fn == _file['file']:
 941 |                     fn = _file['filename']
 942 | 
 943 |             p, ext = os.path.splitext(fn)
 944 |             if ext.lower() in MSIDumper.DangerousExtensions:
 945 |                 fn += '.bin'
 946 | 
 947 |             lp = os.path.join(outdir, fn)
 948 | 
 949 |             lp1 = os.path.dirname(lp)
 950 |             if not os.path.isdir(lp1):
 951 |                 output += f'\t{Logger.colorize("[+]","green")} Creating temp dir: {lp1}\n'
 952 |                 os.makedirs(lp1, exist_ok=True)
 953 | 
 954 |             output += f'{Logger.colorize("[+]","green")} {v.filename:20} => {lp}\n'
 955 |             with open(lp, 'wb') as f:
 956 |                 f.write(v.buf)
 957 |                 num += 1
 958 | 
 959 |         return num, output
 960 | 
 961 |     def extractFiles(self, overrideOutdir=''):
 962 |         outdir = self.outdir
 963 |         if len(overrideOutdir) > 0:
 964 |             dirpath = overrideOutdir
 965 |         else:
 966 |             dirpath = tempfile.mkdtemp()
 967 | 
 968 |         self.outdir = dirpath
 969 |         self.extractCABs()
 970 |         self.outdir = outdir
 971 | 
 972 |         self.logger.verbose('Extracting files from MSI...')
 973 | 
 974 |         cabsNum = 0
 975 |         num = 0
 976 |         output = ''
 977 |         files = self.collectEntries('File')
 978 | 
 979 |         path = os.path.join(dirpath, '*.cab')
 980 |         for cab in glob.glob(path, recursive=True):
 981 |             cabPath = cab
 982 |             cabsNum += 1
 983 |             outp = os.path.join(overrideOutdir or outdir, os.path.splitext(os.path.basename(cabPath))[0])
 984 | 
 985 |             try:
 986 |                 num0, output0 = self.extractCab(cabPath, outp, files)
 987 |                 num += num0
 988 |                 output += output0
 989 | 
 990 |             except Exception as e:
 991 |                 self.logger.err(f'Could not extract files from CABinet: {cabPath}. Error: {e}')
 992 |                 if self.options.get('debug', False):
 993 |                     raise
 994 |             finally:
 995 |                 if os.path.isfile(cabPath):
 996 |                     os.remove(cabPath)
 997 | 
 998 |         if dirpath != overrideOutdir:
 999 |             shutil.rmtree(dirpath)
1000 | 
1001 |         self.extractedCount += num
1002 |         if num > 0 and self.options.get('extract', '') != 'all':
1003 |             output += f'\nExtracted in total {num} files from {cabsNum} cabinets.\n'
1004 | 
1005 |         return output
1006 | 
1007 |     def extractCABs(self):
1008 |         binary = self.collectEntries('Binary')
1009 |         num = 0
1010 |         output = ''
1011 | 
1012 |         if len(binary) == 0:
1013 |             self.logger.err('Input MSI does not contain any embedded Binary data.')
1014 | 
1015 |         for elem in binary:
1016 |             sniffed = self.sniffDataType(elem['data'])
1017 |             if '.cab' not in sniffed.lower():
1018 |                 continue
1019 | 
1020 |             name = self.sanitizeName(elem['name']) + '.cab'
1021 |             outp = os.path.join(self.outdir, name)
1022 | 
1023 |             with open(outp, 'wb') as f:
1024 |                 f.write(elem['data'].encode('latin-1'))  # msiReadStreamBytes yields one character per byte
1025 | 
1026 |             num += 1
1027 | 
1028 |         # source: https://github.com/decalage2/oletools/blob/master/oletools/oledir.py#L245
1029 |         ole = olefile.OleFileIO(self.infile)
1030 |         for entry in ole.listdir():
1031 |             name = entry[-1]
1032 |             name = repr(name)[1:-1]
1033 |             entry_id = ole._find(entry)
1034 |             try:
1035 |                 size = ole.get_size(entry)
1036 |             except Exception:
1037 |                 size = '-'
1038 | 
1039 |             data0 = ole.openstream(entry).getvalue()
1040 |             data = data0.decode(errors='ignore')
1041 | 
1042 |             sniffed = self.sniffDataType(data)
1043 |             if '.cab' not in sniffed.lower():
1044 |                 continue
1045 | 
1046 |             fname = f'ole-stream-{entry_id}.cab'
1047 |             outp = os.path.join(self.outdir, fname)
1048 | 
1049 |             with open(outp, 'wb') as f:
1050 |                 f.write(data0)
1051 | 
1052 |             num += 1
1053 |             output += f'\n{Logger.colorize("[+]","green")} Extracted {Logger.colorize(len(data0), "green")} bytes of {Logger.colorize(name,"green")} OLE stream to: {Logger.colorize(outp,"yellow")}'
1054 | 
1055 |         self.extractedCount += num
1056 |         if num > 0 and self.options.get('extract', '') != 'all':
1057 |             output += f'\n\nExtracted in total {num} objects.\n'
1058 | 
1059 |         return output
1060 | 
1061 |     def extractScripts(self):
1062 |         binary = self.collectEntries('Binary')
1063 |         actions = self.collectEntries('CustomAction')
1064 |         num = 0
1065 |         output = ''
1066 | 
1067 |         self.logger.verbose('Extracting scripts from CustomAction and Binary tables...')
1068 | 
1069 |         if len(binary) == 0:
1070 |             self.logger.err('Input MSI does not contain any embedded Binary data.')
1071 | 
1072 |         for elem in actions:
1073 |             sniffed = self.sniffDataType(elem['target'])
1074 |             if 'vbscript' not in sniffed.lower() and 'jscript' not in sniffed.lower():
1075 |                 continue
1076 | 
1077 |             name = self.sanitizeName(elem['action'])
1078 |             outp = os.path.join(self.outdir, name) + self.sniffDataExt(sniffed)
1079 | 
1080 |             with open(outp, 'wb') as f:
1081 |                 f.write(elem['target'].encode())
1082 | 
1083 |             num += 1
1084 |             output += f'\n{Logger.colorize("[+]","green")} Extracted {Logger.colorize(len(elem["target"]),"green")} bytes of {Logger.colorize(elem["action"],"green")} CustomAction script to: {Logger.colorize(outp,"yellow")}'
1085 | 
1086 |         for elem in binary:
1087 |             sniffed = self.sniffDataType(elem['data'])
1088 |             if 'vbscript' not in sniffed.lower() and 'jscript' not in sniffed.lower():
1089 |                 continue
1090 |                 
1091 |             name = self.sanitizeName(elem['name'])
1092 |             outp = os.path.join(self.outdir, name) + self.sniffDataExt(sniffed)
1093 | 
1094 |             with open(outp, 'wb') as f:
1095 |                 f.write(elem['data'].encode('latin-1'))  # msiReadStreamBytes yields one character per byte
1096 | 
1097 |             num += 1
1098 |             output += f'\n{Logger.colorize("[+]","green")} Extracted {Logger.colorize(len(elem["data"]),"green")} bytes of {Logger.colorize(elem["name"],"green")} binary object script to: {Logger.colorize(outp,"yellow")}'
1099 | 
1100 |         self.extractedCount += num
1101 |         if num > 0 and self.options.get('extract', '') != 'all':
1102 |             output += f'\n\nExtracted in total {num} objects.\n'
1103 | 
1104 |         return output
1105 | 
1106 |     def formatTable(self, tbl, table, records):
1107 |         if self.maxWidth > -1 and len(records) > 0:
1108 |             for k in records[0].keys():
1109 |                 tbl._max_width[k] = self.maxWidth
1110 | 
1111 |         tbl.align['YARA Results'] = 'l'
1112 | 
1113 |         if table.lower() in self.specificTableAlignment.keys():
1114 |             for k, v in self.specificTableAlignment[table.lower()].items():
1115 |                 tbl.align[k] = v
1116 | 
1117 |         if table in MSIDumper.TableSortBy and len(records) > 0:
1118 |             tbl.sortby = list(records[0].keys())[MSIDumper.TableSortBy[table]]
1119 | 
1120 |         return tbl
1121 | 
1122 |     def collectAll(self):
1123 |         output = ''
1124 | 
1125 |         self.logger.info('Dumping all MSI tables...')
1126 | 
1127 |         for table in MSIDumper.KnownTables:
1128 |             recs = self.collectEntries(table)
1129 | 
1130 |             if not self.options.get('verbose', False) and len(recs) == 0 and table not in MSIDumper.ImportantTables:
1131 |                 continue
1132 | 
1133 |             output += '\n\n'
1134 |             output += Logger.colorize(f'===============[ {table} : {len(recs)} records ]===============', 'green')
1135 |             output += '\n\n'
1136 |             output += self.printTable(table, recs)
1137 | 
1138 |         return output
1139 |     
1140 |     def collectStreams(self):
1141 |         records = []
1142 | 
1143 |         ole = olefile.OleFileIO(self.infile)
1144 |         for entry in ole.listdir(storages=True):
1145 |             name = entry[-1]
1146 |             name = repr(name)[1:-1]
1147 |             entry_id = ole._find(entry)
1148 |             try:
1149 |                 size = ole.get_size(entry)
1150 |             except Exception:
1151 |                 size = '-'
1152 |             typeid = ole.get_type(entry)
1153 |             clsid = ole.getclsid(entry)
1154 |             
1155 |             data0 = ole.openstream(entry).getvalue() if typeid == olefile.STGTY_STREAM else b''  # storages carry no stream data
1156 |             data = data0.decode(errors='ignore')
1157 |             sniffed = self.sniffDataType(data, color=True)
1158 | 
1159 |             records.append({
1160 |                 'entry_id' : entry_id,
1161 |                 'data type' : sniffed,
1162 |                 'name' : Logger.colorize(name, 'yellow'),
1163 |                 'size' : size,
1164 |                 'typeid' : typeid,
1165 |                 'CLSID' : clsid,
1166 |             })
1167 | 
1168 |         return sorted(records, key=lambda x: x['entry_id'])
1169 | 
1170 |     def collectStats(self):
1171 |         records = []
1172 |         hashes = (
1173 |             'md5', 'sha1', 'sha256', 'ssdeep'
1174 |         )
1175 | 
1176 |         self.logger.info('Computing MSI file hashes...')
1177 | 
1178 |         with open(self.infile, 'rb') as f:
1179 |             data = f.read()
1180 | 
1181 |             for h in hashes:
1182 |                 if h == 'ssdeep':
1183 |                     if USE_SSDEEP:
1184 |                         hsh = ssdeep.hash(data)
1185 |                     else:
1186 |                         hsh = 'err: ssdeep module not installed'
1187 |                 else:
1188 |                     m = hashlib.new(h)
1189 |                     m.update(data)
1190 |                     hsh = m.hexdigest()
1191 | 
1192 |                 records.append({
1193 |                     'type' : Logger.colorize(f'Hash {h}', 'cyan'),
1194 |                     'value' : Logger.colorize(hsh, 'cyan'),
1195 |                 })
1196 | 
1197 |         del data
1198 | 
1199 |         self.logger.info('Collecting MSI tables stats...')
1200 | 
1201 |         for table in MSIDumper.KnownTables:
1202 |             recs = self.collectEntries(table)
1203 |             val = f'{len(recs)} records'
1204 | 
1205 |             if table in MSIDumper.SuspiciousTables:
1206 |                 table = Logger.colorize(table, 'red')
1207 |                 val = Logger.colorize(val, 'red')
1208 | 
1209 |             elif table in MSIDumper.ImportantTables:
1210 |                 table = Logger.colorize(table, 'yellow')
1211 |                 val = Logger.colorize(val, 'yellow')
1212 | 
1213 |             else:
1214 |                 if len(recs) == 0 and not self.options.get('verbose', False):
1215 |                     continue
1216 | 
1217 |             records.append({
1218 |                 'type' : table,
1219 |                 'value' : val,
1220 |             })
1221 | 
1222 |         return records
1223 | 
1224 |     def analysisWorker(self):
1225 |         self.processActions()
1226 |         self.lookForIOCs()
1227 | 
1228 |         return self.printReport()
1229 | 
1230 |     def normalizeDataForOutput(self, val, num=0, table=''):
1231 |         if num == 0:
1232 |             num = self.options.get('print_len', MSIDumper.DefaultTableWidth)
1233 | 
1234 |         if num != -1:
1235 |             val = val[:num]
1236 | 
1237 |         printable = MSIDumper.isprintable(val)
1238 | 
1239 |         if not printable and table not in ('olestream', ):
1240 |             printable2 = MSIDumper.isprintable(Logger.stripColors(val))
1241 |             if not printable2:
1242 |                 val = MSIDumper.hexdump(val) + '\n'
1243 | 
1244 |         return val
1245 |     
1246 |     def cleanString(self, txt):
1247 |         txt = txt.replace('\r', '')
1248 |         txt = txt.replace('\t', '  ')
1249 | 
1250 |         if self.options.get('format', 'text') in ('csv', 'json'):
1251 |             txt = Logger.stripColors(txt)
1252 |             txt = ''.join(filter(lambda x: x in string.printable, txt))
1253 |             txt = txt.replace('\n', ' ')
1254 |             txt = re.sub(r'\s+', ' ', txt)
1255 |         
1256 |         return txt
1257 | 
1258 |     def printTable(self, table, records):
1259 |         if len(records) == 0:
1260 |             return f'\n\nNo records found in table {Logger.colorize(table, "green")}.'
1261 | 
1262 |         yaraColumn = ''
1263 |         self.logger.dbg(f'Dumping {table} table results...')
1264 | 
1265 |         rules = None
1266 |         if len(self.options.get('yara', '')) > 0 and table != 'YARA Results':
1267 |             yaraColumn = 'YARA Results'
1268 |             matchesReport = []
1269 |             rules = self.initYara()
1270 | 
1271 |         if len(records) == 1 and str(self.options.get('record', -1)) not in ('-1', ''):
1272 |             output = ''
1273 | 
1274 |             for k, v in records[0].items():
1275 |                 k0 = Logger.colorize(k, "green")
1276 |                 output += f'\n- {k0:20} : '
1277 | 
1278 |                 if type(v) is str:
1279 |                     v = self.normalizeDataForOutput(v, -1, table=table)
1280 | 
1281 |                     if len(v) < 50:
1282 |                         output += v
1283 |                     else:
1284 |                         spacer = Logger.colorize('=' * MSIDumper.DefaultTableWidth, 'yellow')
1285 |                         output += '\n\n' + spacer + '\n\n' + v + '\n\n' + spacer + '\n'
1286 |                 else:
1287 |                     output += str(v)
1288 | 
1289 |                 if table.lower() in ('binary', ):
1290 |                     output += '\n'
1291 | 
1292 |             output += '\n'
1293 | 
1294 |             if len(yaraColumn) > 0:
1295 |                 k0 = Logger.colorize(yaraColumn, "green")
1296 |                 output += f'\n- {k0:20} : '
1297 | 
1298 |                 for k, v in records[0].items(): 
1299 |                     if type(v) is not str:
1300 |                         continue
1301 |                     matches = rules.match(data = v)
1302 |                     if matches:
1303 |                         ms = ''
1304 |                         for m in matches:
1305 |                             ms += f'- {m.rule}\n'
1306 |                         output += Logger.colorize(f'YARA rule match on column {k}:', 'green') + '\n' + ms + '\n'
1307 |         else:
1308 |             output = ''
1309 |             numCol = ['#',]
1310 |             yarCol = []
1311 |             if table == 'olestream':
1312 |                 numCol = []
1313 | 
1314 |             if len(yaraColumn) > 0:
1315 |                 yarCol = [yaraColumn, ]
1316 | 
1317 |             tbl = PrettyTable(numCol + list(records[0].keys()) + yarCol)
1318 |             num = 0
1319 | 
1320 |             index = self.options.get('record', -1)
1321 |             if str(index).isdigit():
1322 |                 num = int(index) - 1
1323 | 
1324 |             tbl = self.formatTable(tbl, table, records)
1325 | 
1326 |             for rec in records:
1327 |                 num += 1
1328 |                 vals = []
1329 |                 i = 0
1330 |                 for v in [num, ] + list(rec.values()):
1331 |                     if i == 0 and 'entry_id' in rec.keys():
1332 |                         i += 1
1333 |                         continue
1334 |                     if type(v) is str:
1335 |                         v = self.normalizeDataForOutput(v, table=table)
1336 |                         s = self.cleanString(v).strip()
1337 |                         n = ''
1338 | 
1339 |                         if table.lower() in ('binary', ):
1340 |                             n = '\n'
1341 | 
1342 |                         vals.append(s + n)
1343 |                     else:
1344 |                         vals.append(v)
1345 |                     i += 1
1346 | 
1347 |                 if len(yaraColumn) > 0:
1348 |                     i = 0
1349 |                     val = ''
1350 |                     for v in list(rec.values()): 
1351 |                         if type(v) is not str:
1352 |                             i += 1
1353 |                             continue
1354 |                         matches = rules.match(data = v)
1355 |                         if matches:
1356 |                             ms = ''
1357 |                             for m in matches:
1358 |                                 ms += f'- {m.rule}\n'
1359 |                             k = list(rec.keys())[i]
1360 |                             val += Logger.colorize(f'YARA rule match on column {k}:', 'green') + '\n' + ms + '\n'
1361 |                         i += 1
1362 |                     vals.append(val)
1363 | 
1364 |                 if self.options['format'] == 'csv':
1365 |                     tbl.add_row([str(x).replace(self.csvDelim, '') for x in vals])
1366 |                 else:
1367 |                     tbl.add_row(vals)
1368 | 
1369 |             if self.options['format'] == 'text':
1370 |                 output += str(tbl)
1371 | 
1372 |                 if table != 'YARA Results':
1373 |                     output += f'\n\n[.] Found {Logger.colorize(str(len(records)), "green")} records in {Logger.colorize(table, "green")} table.'
1374 | 
1375 |                 output += '\n'
1376 | 
1377 |             elif self.options['format'] == 'json':
1378 |                 output += str(tbl.get_json_string())
1379 |             
1380 |             elif self.options['format'] == 'csv':
1381 |                 output += str(tbl.get_csv_string(delimiter=self.csvDelim, escapechar='\\'))
1382 |             
1383 |             # elif self.options['format'] == 'html':
1384 |             #     output += str(tbl.get_html_string())
1385 |             
1386 |         return output
1387 | 
1388 |     def printReport(self):
1389 |         output = ''
1390 |         cols = [
1391 |             '#',
1392 |             'threat',
1393 |             'location',
1394 |             'context',
1395 |             'description'
1396 |         ]
1397 |         tbl = PrettyTable(cols)
1398 |         tbl = self.formatTable(tbl, 'report', self.report)
1399 | 
1400 |         num = 0
1401 | 
1402 |         for report in self.report:
1403 |             num += 1
1404 |             rec = [
1405 |                 num,
1406 |                 report['name'],
1407 |                 report['location'],
1408 |                 report['context'],
1409 |                 report['desc'],
1410 |             ]
1411 |             vals = []
1412 |             for v in rec:
1413 |                 if type(v) is str:
1414 |                     vals.append(self.cleanString(v))
1415 |                 else:
1416 |                     vals.append(v)
1417 | 
1418 |             if self.options['format'] == 'csv':
1419 |                 tbl.add_row([str(x).replace(self.csvDelim, '') for x in vals])
1420 |             else:
1421 |                 tbl.add_row(vals)
1422 | 
1423 |         if self.options['format'] == 'text':
1424 |             output += str(tbl)
1425 | 
1426 |         elif self.options['format'] == 'json':
1427 |             output += str(tbl.get_json_string())
1428 |         
1429 |         elif self.options['format'] == 'csv':
1430 |             output += str(tbl.get_csv_string(delimiter=self.csvDelim, escapechar='\\'))
1431 |         
1432 |         # elif self.options['format'] == 'html':
1433 |         #     output += str(tbl.get_html_string())
1434 | 
1435 |         return output
1436 | 
1437 |     def printRecord(self, rec, indent=''):
1438 |         out = ''
1439 |         keyLen = -1
1440 | 
1441 |         if type(rec) is str:
1442 |             return rec
1443 | 
1444 |         for k, v in rec.items():
1445 |             if len(k) > keyLen:
1446 |                 keyLen = len(Logger.colorize(k, 'yellow')) + 1
1447 | 
1448 |         if self.format == 'text':
1449 |             for k, v in rec.items():
1450 |                 if k.lower() in MSIDumper.SkipColumns:
1451 |                     continue
1452 | 
1453 |                 if type(v) is str or type(v) is bytes:
1454 |                     printable = MSIDumper.isprintable(v)
1455 | 
1456 |                     if not printable and v[:1] != '\x1b':
1457 |                         v = '\n\n' + MSIDumper.hexdump(v) + '\n'
1458 | 
1459 |                     if self.options.get('record', -1) == -1 and len(v) > 256: 
1460 |                         v = '\n\n' + v[:256].strip() + '\n\t[CUT FOR BREVITY]\n'
1461 | 
1462 |                 k = Logger.colorize(k, 'yellow')
1463 |                 out += indent + f'- {k:{keyLen}}: {v}\n'
1464 | 
1465 |         elif self.format == 'csv':
1466 |             out = self.csvDelim.join([str(x).replace(self.csvDelim, '')[:self.maxWidth] for x in rec.values()])
1467 | 
1468 |         return out
1469 | 
1470 |     @staticmethod
1471 |     def isValidPE(data):
1472 |         pe = None
1473 |         try:
1474 |             pe = pefile.PE(data=data.encode(), fast_load=True)
1475 |             _format = MSIDumper.RecognizedInnerFileTypes['executable']['indicator']
1476 | 
1477 |             if pe.OPTIONAL_HEADER.DllCharacteristics != 0:
1478 |                 _format = MSIDumper.RecognizedInnerFileTypes['dll']['indicator']
1479 | 
1480 |             pe.close()
1481 |             return (True, _format)
1482 |         except pefile.PEFormatError as e:
1483 |             logger.dbg(f'pefile error: {e}')
1484 |             return (False, '')
1485 |         finally:
1486 |             if pe:
1487 |                 pe.close()
1488 | 
1489 |     def sniffDataExt(self, sniffed):
1490 |         for k, v in MSIDumper.RecognizedInnerFileTypes.items():
1491 |             if v['indicator'].lower() == sniffed.lower():
1492 |                 return MSIDumper.RecognizedInnerFileTypes[k]['safe-extension']
1493 | 
1494 |         return ''
1495 | 
1496 |     def gradeFoundIndicator(self, indicator, data='', color='', mime=''):
1497 |         if color != '':
1498 |             if color == 'red':
1499 |                 return 1
1500 |         
1501 |         if mime != '' and mime.lower() in MSIDumper.MimeTypesThatIncreasSuspiciousScore:
1502 |             return 1
1503 | 
1504 |         return 0
1505 | 
1506 |     def sniffDataType(self, data, color=False):
1507 |         mime = self.options.get('mime', False)
1508 |         magicOut = 'data'
1509 |         try:
1510 |             magicOut = magic.from_buffer(data, mime=mime)
1511 |         except Exception as e:
1512 |             self.logger.dbg(f'Magic failed fingerprinting data: {e}')
1513 | 
1514 |         pe, petype = MSIDumper.isValidPE(data)
1515 |         if pe:
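1516 |             # Bind indicator on every path (it was unset when neither the MIME nor the color branch ran).
1517 |             indicator = petype if not mime else ('application/x-dosexec' if magicOut in ('data', 'application/octet-stream') else magicOut)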
1518 |             if color:
1519 |                 indicator = Logger.colorize(petype, 'red')
1520 |             self.grade += self.gradeFoundIndicator(indicator, data, color='red')
1521 |             return indicator
1522 | 
1523 |         for format, predicate in MSIDumper.RecognizedInnerFileTypes.items():
1524 |             indicator = predicate.get('indicator', '')
1525 |             predColor = predicate.get('color', '')
1526 | 
1527 |             if format == 'unsure-executable':
1528 |                 if data[:2] != 'MZ' and data[:2] != 'ZM':
1529 |                     continue
1530 |             elif format == 'unsure-cabinet':
1531 |                 if data[:4] != 'MSCF':
1532 |                     continue
1533 | 
1534 |             if mime:
1535 |                 indicator = magicOut
1536 | 
1537 |             if color:
1538 |                 indicator = Logger.colorize(indicator, predColor)
1539 |                 
1540 |             magicVals = predicate.get('magic', [])
1541 |             if len(magicVals) > 0:
1542 |                 for m in magicVals:
1543 |                     if m.lower() in magicOut.lower():
1544 |                         self.grade += self.gradeFoundIndicator(indicator, data, color=predColor)
1545 |                         return indicator
1546 | 
1547 |             keywords = predicate.get('keywords', [])
1548 |             minkeywords = predicate.get('min-keywords', 0)
1549 |             
1550 |             printable = predicate.get('printable', 0)
1551 |             printableMet = False
1552 |             if printable:
1553 |                 if MSIDumper.isprintable(data):
1554 |                     printableMet = True
1555 | 
1556 |             if printable and not printableMet:
1557 |                 continue
1558 | 
1559 |             if len(keywords) > 0 and minkeywords > 0:
1560 |                 skip = False
1561 |                 found = 0
1562 |                 for keyword in keywords:
1563 |                     if re.search(r'\b' + re.escape(keyword) + r'\b', data, re.I):
1564 |                         found += 1
1565 | 
1566 |                 if found >= minkeywords:
1567 |                     foundNots = 0
1568 |                     notkeywords = predicate.get('not-keywords', [])
1569 | 
1570 |                     if len(notkeywords) > 0:
1571 |                         for keyword in notkeywords:
1572 |                             if re.search(r'\b' + re.escape(keyword) + r'\b', data, re.I):
1573 |                                 foundNots += 1
1574 | 
1575 |                     if foundNots == 0:
1576 |                         self.grade += self.gradeFoundIndicator(indicator, data, color=predColor)
1577 |                         return indicator
1578 | 
1579 |         if magicOut == 'data':
1580 |             return ''
1581 | 
1582 |         return magicOut
1583 | 
1584 |     def lookForIOCs(self):
1585 |         binary = self.collectEntries('Binary')
1586 |         customActions = self.collectEntries('CustomAction')
1587 |         i = 0
1588 | 
1589 |         streams = self.collectEntries('_Streams')
1590 |         if len(streams) == 0:
1591 |             self.report.append({
1592 |                 'name' : Logger.colorize('Missing _Streams', 'yellow'),
1593 |                 'location' : f'_Streams table',
1594 |                 'context' : '',
1595 |                 'desc' : 'Typically MSIs contain a _Streams table referencing .CAB archives.\nThis sample, however, has no such table, which makes it unusual or mangled.\n',
1596 |             })
1597 | 
1598 |         for data in binary:
1599 |             i += 1
1600 |             sniffed = self.sniffDataType(data['data'], color=True)
1601 | 
1602 |             if len(sniffed) > 0:
1603 |                 data['size'] = len(data['data'])
1604 |                 runByCa = False
1605 |                 desc = ''
1606 | 
1607 |                 caNum = 0
1608 |                 for ca in customActions:
1609 |                     caNum += 1
1610 |                     if ca['source'] == data['name']:
1611 |                         runByCa = True
1612 |                         desc = f'\nThat data will be used during installation by CustomAction {Logger.colorize(str(caNum), "yellow")}. {Logger.colorize(ca["action"], "yellow")}'
1613 |                         break
1614 | 
1615 |                 if not runByCa:
1616 |                     self.grade -= 1
1617 |                     sniffed = Logger.stripColors(sniffed)
1618 |                     sniffed = Logger.colorize(sniffed, 'yellow')
1619 |                     desc = '\nHowever that data doesn\'t seem to be used in CustomActions, decreasing impact.'
1620 | 
1621 |                 self.report.append({
1622 |                     'name' : sniffed,
1623 |                     'location' : f'Binary table',
1624 |                     'context' : self.printRecord(data),
1625 |                     'desc' : f'MSI contains {sniffed} data in Binary table entry {Logger.colorize(str(i), "yellow")}. {Logger.colorize(data["name"], "yellow")}' + desc,
1626 |                 })
1627 | 
1628 |     def processActions(self):
1629 |         actions = self.collectEntries('CustomAction')
1630 |         execSeq = self.collectEntries('InstallExecuteSequence')
1631 |         uiSeq = self.collectEntries('InstallUISequence')
1632 | 
1633 |         for action in actions:
1634 |             self.logger.dbg(f'Parsing CustomAction {action["action"]} ...')
1635 | 
1636 |             for suspAction, data in MSIDumper.CustomActionTypes.items():
1637 |                 if action['type'] in data['types']:
1638 |                     desc = data['desc']
1639 |                     color = MSIDumper.CustomActionTypes[suspAction].get('color', 'white')
1640 | 
1641 |                     fieldToHighlight = ''
1642 | 
1643 |                     if 'vbscript' in suspAction.lower() or 'jscript' in suspAction.lower():
1644 |                         if len(action['source']) > 0:
1645 |                             fieldToHighlight = 'source'
1646 |                             self.grade += self.gradeFoundIndicator(suspAction, color=color)
1647 |                             desc += f".\nScript is located in {Logger.colorize(action['source'],'yellow')} Binary table record."
1648 | 
1649 |                     elif 'run-dll' in suspAction.lower():
1650 |                         fieldToHighlight = 'source'
1651 |                         self.grade += self.gradeFoundIndicator(suspAction, color=color)
1652 |                         desc += f".\nDLL is located in {Logger.colorize(action['source'],'yellow')} Binary table record."
1653 |                     
1654 |                     elif 'run-exe' in suspAction.lower():
1655 |                         fieldToHighlight = 'source'
1656 |                         self.grade += self.gradeFoundIndicator(suspAction, color=color)
1657 |                         desc += f".\nEXE is located in {Logger.colorize(action['source'],'yellow')} Binary table record."
1658 | 
1659 |                     elif 'set-directory' in suspAction.lower():
1660 |                         fieldToHighlight = 'target'
1661 | 
1662 |                     elif 'execute' in suspAction.lower():
1663 |                         fieldToHighlight = 'target'
1664 |                         self.grade += self.gradeFoundIndicator(suspAction, color=color)
1665 |                         desc += f".\nCommand that will be executed:\ncmd> {Logger.colorize(action['target'],'red')}"
1666 | 
1667 |                     foundInSeq = False
1668 |                     for seq in execSeq:
1669 |                         if seq['action'] == action['action']:
1670 |                             foundInSeq = True
1671 |                             cond = ''
1672 |                             if len(seq['condition']) > 0:
1673 |                                 cond = f" with condition:\n- {Logger.colorize(seq['condition'],'yellow')}"
1674 | 
1675 |                             desc += f"\nThat action is scheduled to run in {Logger.colorize('InstallExecuteSequence','yellow')} table" + cond + '\n'
1676 |                             break
1677 | 
1678 |                     for seq in uiSeq:
1679 |                         if seq['action'] == action['action']:
1680 |                             foundInSeq = True
1681 |                             cond = ''
1682 |                             if len(seq['condition']) > 0:
1683 |                                 cond = f" with condition:\n- {Logger.colorize(seq['condition'],'yellow')}"
1684 | 
1685 |                             desc += f"\nThat action is scheduled to run in {Logger.colorize('InstallUISequence','yellow')} table" + cond + '\n'
1686 |                             break
1687 | 
1688 |                     if not foundInSeq:
1689 |                         self.grade -= 1
1690 |                         color = 'yellow'
1691 |                         desc += '\nHowever that action doesn\'t seem to be invoked anywhere, decreasing impact.'
1692 | 
1693 |                     if len(fieldToHighlight) > 0:
1694 |                         action[fieldToHighlight] = Logger.colorize(action[fieldToHighlight], color)
1695 | 
1696 |                     self.report.append({
1697 |                         'name' : Logger.colorize(suspAction, color),
1698 |                         'location' : f'CustomAction table',
1699 |                         'context' : self.printRecord(action),
1700 |                         'desc' : desc
1701 |                     })
1702 |                     break
1703 | 
1704 |     def initYara(self):
1705 |         yaraPath = self.options.get('yara', '')
1706 |         if len(yaraPath) == 0:
1707 |             return None
1708 | 
1709 |         yaraPath = os.path.abspath(os.path.normpath(yaraPath))
1710 | 
1711 |         if not os.path.isfile(yaraPath) and not os.path.isdir(yaraPath):
1712 |             self.logger.fatal(f'Specified --yara path does not exist.')
1713 | 
1714 |         rules = None
1715 |         try:
1716 |             rules = yara.compile(yaraPath)
1717 |         except Exception as e:
1718 |             self.logger.fatal(f'Could not compile YARA rules. Exception: {e}')
1719 | 
1720 |         return rules
1721 | 
1722 |     def yaraScan(self, scanBinary=True, scanActions=True, scanFiles=True):
1723 |         matchesReport, output = [], ''
1724 |         rules = self.initYara()
1725 | 
1726 |         if scanBinary:
1727 |             binary = self.collectEntries('Binary')
1728 |             output = ''
1729 | 
1730 |             if len(binary) > 0:
1731 |                 i = 0
1732 |                 for elem in binary:
1733 |                     i += 1
1734 |                     matches = rules.match(data = elem['data'].encode())
1735 |                     if matches:
1736 |                         matchesReport.append({
1737 |                             'where' : f'Binary record {Logger.colorize(i, "yellow")}. {Logger.colorize(elem["name"], "yellow")}',
1738 |                             'rules' : '\n'.join([x.rule for x in matches])
1739 |                         })
1740 | 
1741 |         if scanActions:
1742 |             actions = self.collectEntries('CustomAction')
1743 |             output = ''
1744 | 
1745 |             if len(actions) > 0:
1746 |                 i = 0
1747 |                 for elem in actions:
1748 |                     i += 1
1749 |                     sniffed = self.sniffDataType(elem['target'])
1750 |                     if 'vbscript' not in sniffed.lower() and 'jscript' not in sniffed.lower():
1751 |                         continue
1752 |                     matches = rules.match(data = elem['target'])
1753 |                     if matches:
1754 |                         matchesReport.append({
1755 |                             'where' : f'CustomAction record {Logger.colorize(i, "yellow")}. {Logger.colorize(elem["action"], "yellow")}',
1756 |                             'rules' : '\n'.join([x.rule for x in matches])
1757 |                         })
1758 | 
1759 |         if scanFiles:
1760 |             try:
1761 |                 dirpath = tempfile.mkdtemp()
1762 |                 self.logger.verbose(f'Extracting all files from MSI into temp dir: {dirpath} ...')
1763 | 
1764 |                 out = self.extractFiles(overrideOutdir = dirpath)
1765 | 
1766 |                 for path in glob.glob(os.path.join(dirpath, '**', '*'), recursive=True):
1767 |                     # glob already yields full paths; skip directories, keep files without an extension.
1768 |                     if not os.path.isfile(path):
1769 |                         continue
1770 |                     matches = rules.match(path)
1771 |                     if matches:
1772 |                         matchesReport.append({
1773 |                             'where' : f'File extracted from MSI: {Logger.colorize(os.path.basename(path), "yellow")}',
1774 |                             'rules' : '\n'.join([x.rule for x in matches])
1775 |                         })
1775 | 
1776 |             except Exception as e:
1777 |                 self.logger.err(f'Could not extract files from MSI for YARA scanning. Exception: {e}')
1778 |                 if self.options.get('debug', False):
1779 |                     raise
1780 | 
1781 |             finally:
1782 |                 if os.path.isdir(dirpath):
1783 |                     shutil.rmtree(dirpath)
1784 | 
1785 |         if len(matchesReport) > 0:
1786 |             output += Logger.colorize(f'[+] Got {len(matchesReport)} YARA rule matches on this MSI:\n\n', 'green')
1787 |             output += self.printTable('YARA Results', matchesReport)
1788 | 
1789 |         return output
1790 | 
1791 | def getoptions():
1792 |     global logger
1793 |     global options
1794 | 
1795 |     epilog = f'''
1796 | 
1797 | ------------------------------------------------------
1798 | 
1799 | - What can be listed:
1800 |     --list CustomAction     - Specific table
1801 |     --list Registry,File    - List multiple tables
1802 |     --list stats            - Print MSI database statistics
1803 |     --list all              - All tables and their contents
1804 |     --list olestream        - Prints all OLE streams & storages. 
1805 |                               To display CABs embedded in MSI try: --list _Streams
1806 |     --list cabs             - Lists embedded CAB files
1807 |     --list binary           - Lists binary data embedded in MSI for its own purposes.
1808 |                               That typically includes EXEs, DLLs, VBS/JS scripts, etc
1809 | 
1810 | - What can be extracted:
1811 |     --extract all           - Extracts Binary data, all files from CABs, scripts from CustomActions
1812 |     --extract binary        - Extracts Binary data
1813 |     --extract files         - Extracts files
1814 |     --extract cabs          - Extracts cabinets
1815 |     --extract scripts       - Extracts scripts
1816 | 
1817 | ------------------------------------------------------
1818 | 
1819 | '''
1820 | 
1821 |     usage = '\nUsage: msidump.py [options] <infile.msi>\n'
1822 |     opts = argparse.ArgumentParser(
1823 |         usage=usage,
1824 |         formatter_class=argparse.RawDescriptionHelpFormatter,
1825 |         epilog=textwrap.dedent(epilog)
1826 |     )
1827 | 
1828 |     req = opts.add_argument_group('Required arguments')
1829 |     req.add_argument('infile', help='Input MSI file (or directory) for analysis.')
1830 |     
1831 |     opt = opts.add_argument_group('Options')
1832 |     opt.add_argument('-q', '--quiet', default=False, action='store_true', help='Suppress banner and unnecessary information. In triage mode, will display only verdict.')
1833 |     opt.add_argument('-v', '--verbose', default=False, action='store_true', help='Verbose mode.')
1834 |     opt.add_argument('-d', '--debug', default=False, action='store_true', help='Debug mode.')
1835 |     opt.add_argument('-N', '--nocolor', default=False, action='store_true', help="Don't use colors in text output.")
1836 |     opt.add_argument('-n', '--print-len', default=MSIDumper.DefaultTableWidth, type=int, help='When previewing data - how many bytes to include in preview/hexdump. Default: 128')
1837 |     opt.add_argument('-f', '--format', default='text', choices=['text', 'json', 'csv'], help='Output format: text, json, csv. Default: text')
1838 |     opt.add_argument('-o', '--outfile', metavar='path', default='', help='Redirect program output to this file.')
1839 |     opt.add_argument('-m', '--mime', default=False, action='store_true', help='When sniffing inner data type, report MIME types')
1840 |     
1841 |     mod = opts.add_argument_group('Analysis Modes')
1842 |     mod.add_argument('-l', '--list', metavar='what', default='', help='List specific table contents. See help message to learn what can be listed.')
1843 |     mod.add_argument('-x', '--extract', metavar='what', default='', help='Extract data from MSI. For what can be extracted, refer to help message.')
1844 | 
1845 |     spec = opts.add_argument_group('Analysis Specific options')
1846 |     spec.add_argument('-i', '--record', metavar='number|name', type=str, default=-1, help='Can be a number or name. In --list mode, specifies which record to dump/display entirely. In --extract mode dumps only this particular record to --outdir')
1847 |     spec.add_argument('-O', '--outdir', metavar='path', default='', help='When --extract mode is used, specifies output location where to extract data.')
1848 |     spec.add_argument('-y', '--yara', metavar='path', default='', help='Path to YARA rule/directory with rules. YARA will be matched against Binary data, streams and inner files')
1849 | 
1850 |     args = opts.parse_args()
1851 |     options.update(vars(args))
1852 | 
1853 |     logger = Logger(options)
1854 | 
1855 |     if len(args.list) > 0:
1856 |         if args.list.lower() not in [x.lower() for x in MSIDumper.ListModes + MSIDumper.KnownTables] and ',' not in args.list:
1857 |             logger.err(f'WARNING: Requested {args.list} table is not recognized: parser will probably crash!')
1858 | 
1859 |     args.infile = os.path.abspath(os.path.normpath(args.infile))
1860 | 
1861 |     if not os.path.isfile(args.infile) and not os.path.isdir(args.infile):
1862 |         logger.fatal('Input file or directory does not exist!')
1863 | 
1864 |     exclusive = sum([len(args.list) > 0, len(args.extract) > 0])
1865 |     if exclusive > 1:
1866 |         logger.fatal(f'--list and --extract are mutually exclusive options. Pick one.')
1867 | 
1868 |     if len(args.extract) > 0 and len(args.outdir) == 0:
1869 |         logger.fatal('-O/--outdir (where to extract files) is required when working in --extract mode.')
1870 | 
1871 |     options.update(vars(args))
1872 |     return args
1873 | 
1874 | @atexit.register
1875 | def goodbye():
1876 |     try:
1877 |         colorama.deinit()
1878 |     except Exception:
1879 |         pass
1880 | 
1881 | def terminalWidth():
1882 |     n = shutil.get_terminal_size((80, 20))  # pass fallback
1883 |     return n.columns
1884 | 
1885 | def banner():
1886 |     print(rf'''
1887 |                    _     _                       
1888 |      _ __ ___  ___(_) __| |_   _ _ __ ___  _ __  
1889 |     | '_ ` _ \/ __| |/ _` | | | | '_ ` _ \| '_ \ 
1890 |     | | | | | \__ \ | (_| | |_| | | | | | | |_) |
1891 |     |_| |_| |_|___/_|\__,_|\__,_|_| |_| |_| .__/ 
1892 |                                         |_|    
1893 |     version: {Logger.colorize(VERSION, "green")}
1894 |     author : Mariusz Banach (mgeeky, @mariuszbit)
1895 |              binary-offensive.com
1896 | ''')
1897 | 
1898 | def processFile(args, path):
1899 |     msir = MSIDumper(options, logger)
1900 | 
1901 |     if not msir.open(path):
1902 |         logger.err(f'Could not open database (use -d to learn more): {path}')
1903 |         return ''
1904 | 
1905 |     report = ''
1906 |     if not args.quiet and args.format == 'text':
1907 |         report += f'{Logger.colorize("[+]","green")} Analyzing : {path}\n\n'
1908 | 
1909 |     if len(args.list) > 0:
1910 |         report += msir.listTable(args.list)
1911 | 
1912 |     elif len(args.extract) > 0:
1913 |         report += msir.extract(args.extract)
1914 | 
1915 |     else:
1916 |         rep = msir.analyse()
1917 | 
1918 |         if len(args.yara) > 0:
1919 |             rep += '\n\n' + msir.yaraScan()
1920 | 
1921 |         if not args.quiet:
1922 |             report += str(rep)
1923 | 
1924 |             if args.format == 'text':
1925 |                 report += '\n\n' + msir.verdict.strip() + '\n'
1926 | 
1927 |         elif args.format == 'text':
1928 |             verd = msir.verdict.strip()
1929 |             pos = verd.find(':')
1930 |             if pos != -1:
1931 |                 verd = verd[pos+1:].strip()
1932 | 
1933 |             report += verd + ' : ' + path
1934 | 
1935 |     if args.format == 'text':
1936 |         logger.ok(f'Database processed : {path}')
1937 |     msir.close()
1938 | 
1939 |     return report
1940 | 
1941 | def processDir(args, infile):
1942 |     report = ''
1943 | 
1944 |     logger.verbose(f'Process files from directory: {infile}')
1945 | 
1946 |     for file in glob.glob(os.path.join(infile, '**/**'), recursive=True):
1947 |         path = os.path.join(infile, file)
1948 |         if os.path.isfile(path):
1949 |             try:
1950 |                 report += processFile(args, path)
1951 |                 report += '\n\n'
1952 | 
1953 |             except Exception as e:
1954 |                 logger.err('Analysis of "{}" failed. Exception: {}'.format(
1955 |                     path, str(e)
1956 |                 ))
1957 | 
1958 |     return report
1959 | 
1960 | def main():
1961 |     global options
1962 |     args = getoptions()
1963 |     if not args:
1964 |         return False
1965 | 
1966 |     if not args.quiet and args.format == 'text':
1967 |         banner()
1968 | 
1969 |     if len(args.outfile) > 0:
1970 |         options['nocolor'] = True
1971 | 
1972 |     options['max_width'] = terminalWidth()
1973 | 
1974 |     if os.path.isfile(args.infile):
1975 |         report = processFile(args, args.infile)
1976 | 
1977 |     else:
1978 |         report = processDir(args, args.infile)
1979 | 
1980 |     if len(args.outfile) > 0:
1981 |         with open(args.outfile, 'wb') as f:
1982 |             rep = Logger.stripColors(report)
1983 |             f.write(rep.encode())
1984 |     else:
1985 |         print(report)
1986 | 
1987 | if __name__ == '__main__':
1988 |     main()
1989 | 


--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
 1 | olefile
 2 | colorama
 3 | yara-python
 4 | prettytable>=3.5
 5 | pefile
 6 | cabarchive
 7 | pywin32
 8 | python-magic
 9 | python-magic-bin; sys_platform == "win32" or sys_platform == "darwin"
10 | 
11 | # ssdeep is optional
12 | #ssdeep 


--------------------------------------------------------------------------------
/test-cases/README.md:
--------------------------------------------------------------------------------
 1 | ## msidump test cases
 2 | 
 3 | - `sample1-run-autoruns64.msi.bin` - launches MS Sysinternals Autoruns64.exe from `C:\Windows\Installer\MSXXXX.msi`
 4 | - `sample2-run-calc-script.msi.bin` - executes a VBScript that runs `calc` via the `Wscript.Shell.Exec` method
 5 | - `sample3-run-calc-shellcode-via-dotnet.msi.bin` - bundles a specially crafted CustomAction .NET DLL that, when executed, runs shellcode that spawns `calc`
 6 | - `sample4-customaction-run-calc.msi.bin` - simple MSI that runs system commands after installation completes; here it runs `calc`
 7 | - `putty-backdoored.msi.bin` - runs `calc` during PuTTY installation
 8 | 
 9 | All these installers install themselves to the `%LOCALAPPDATA%\VcRedist` directory.
10 | 
11 | You can uninstall them with:
12 | 
13 | ```
14 | msiexec /q /x file.msi
15 | ```
16 | 
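The samples listed above can also be triaged in one pass. The snippet below is only a sketch and is not part of `msidump` itself: the helper name `build_triage_commands` and the default script path `msidump.py` are assumptions made for illustration, while the `-q` flag and the positional input file are taken from the `msidump.py` help text. It only prints the commands, so nothing is executed until you run them yourself.

```python
import shlex
import sys
from pathlib import Path

# Sample file names exactly as listed in this README.
SAMPLES = [
    "sample1-run-autoruns64.msi.bin",
    "sample2-run-calc-script.msi.bin",
    "sample3-run-calc-shellcode-via-dotnet.msi.bin",
    "sample4-customaction-run-calc.msi.bin",
    "putty-backdoored.msi.bin",
]


def build_triage_commands(test_dir, script="msidump.py"):
    """Return one argv list per sample: <python> <script> -q <sample path>.

    `script` is a hypothetical location of msidump.py; adjust it to your checkout.
    """
    test_dir = Path(test_dir)
    return [
        [sys.executable, script, "-q", str(test_dir / name)]
        for name in SAMPLES
    ]


if __name__ == "__main__":
    # Print shell-quoted commands instead of running them.
    for argv in build_triage_commands("test-cases"):
        print(shlex.join(argv))
```

Since `msidump.py` accepts a directory as its input, passing the `test-cases` directory directly should give a similar result; the per-file form is convenient when each verdict needs to be captured separately.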


--------------------------------------------------------------------------------
/test-cases/putty-backdoored.msi.bin:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mgeeky/msidump/40833694ebba0188f4f6e0d0bf5fd89a223775be/test-cases/putty-backdoored.msi.bin


--------------------------------------------------------------------------------
/test-cases/sample1-run-autoruns64.msi.bin:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mgeeky/msidump/40833694ebba0188f4f6e0d0bf5fd89a223775be/test-cases/sample1-run-autoruns64.msi.bin


--------------------------------------------------------------------------------
/test-cases/sample2-run-calc-script.msi.bin:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mgeeky/msidump/40833694ebba0188f4f6e0d0bf5fd89a223775be/test-cases/sample2-run-calc-script.msi.bin


--------------------------------------------------------------------------------
/test-cases/sample3-run-calc-shellcode-via-dotnet.msi.bin:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mgeeky/msidump/40833694ebba0188f4f6e0d0bf5fd89a223775be/test-cases/sample3-run-calc-shellcode-via-dotnet.msi.bin


--------------------------------------------------------------------------------
/test-cases/sample4-customaction-run-calc.msi.bin:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mgeeky/msidump/40833694ebba0188f4f6e0d0bf5fd89a223775be/test-cases/sample4-customaction-run-calc.msi.bin


--------------------------------------------------------------------------------