├── .gitignore ├── BUILDING.md ├── README.md ├── bitwise_expr_lookup_tbl.cpp ├── bitwise_expr_lookup_tbl.hpp ├── consts.hpp ├── equiv_class.cpp ├── equiv_class.hpp ├── file.cpp ├── file.hpp ├── generate_oracle.bat ├── generate_oracle.sh ├── goomba.cfg ├── goomba.cpp ├── heuristics.cpp ├── heuristics.hpp ├── images ├── mba1_after.png └── mba1_before.png ├── lin_conj_exprs.hpp ├── linear_exprs.cpp ├── linear_exprs.hpp ├── makefile ├── mcode_emu.hpp ├── minsn_template.hpp ├── msynth_parser.cpp ├── msynth_parser.hpp ├── optimizer.cpp ├── optimizer.hpp ├── simp_lin_conj_exprs.hpp ├── smt_convert.cpp ├── smt_convert.hpp ├── tests └── idb │ ├── mba_challenge.i64 │ └── nonlinear.o.i64 ├── z3++_no_warn.h └── z3 └── readme.txt /.gitignore: -------------------------------------------------------------------------------- 1 | .gitignore 2 | r32.bat 3 | tests/ 4 | -------------------------------------------------------------------------------- /BUILDING.md: -------------------------------------------------------------------------------- 1 | 2 | # Building gooMBA 3 | 4 | ## Dependencies 5 | 6 | gooMBA requires the IDA SDK (8.2 or later) and the [z3 library](https://github.com/Z3Prover/z3). 7 | 8 | ## Building 9 | 10 | 1. After unpacking and setting up the SDK, copy the gooMBA source tree under the SDK's `plugins` directory, 11 | for example `C:\idasdk_pro82\plugins\goomba`. 12 | 13 | 2. Download and extract the [z3 build for your OS](https://github.com/Z3Prover/z3/releases) into the `z3` subdirectory. 14 | 15 | Under it, you should have `bin` and `include` directories: 16 | 17 | z3/bin/ 18 | z3/include/ 19 | 20 | Alternatively, set `Z3_BIN` and `Z3_INCLUDE` to point to those directories elsewhere. 21 | 22 | 3. Build the necessary version of gooMBA, for example: 23 | 24 | ```make -j``` for 32-bit IDA 25 | ```make __EA64__=1 -j``` for IDA64 26 | 27 | 4. 
Copy the generated files from the SDK's `bin` directory to your IDA install (or [user directory](https://hex-rays.com/blog/igors-tip-of-the-week-33-idas-user-directory-idausr/)): 28 | 29 | On Windows: 30 | 31 | * `C:\idasdk_pro82\bin\plugins\goomba*` -> `C:\Program Files\IDA Pro 8.2\plugins\` 32 | * `C:\idasdk_pro82\bin\cfg\goomba.cfg` -> `C:\Program Files\IDA Pro 8.2\cfg\` 33 | * `C:\idasdk_pro82\bin\libz3.*` -> `C:\Program Files\IDA Pro 8.2\` 34 | 35 | On Linux: 36 | 37 | * `/path/to/idasdk_pro82/bin/plugins/goomba*` -> `/path/to/ida82/plugins/` 38 | * `/path/to/idasdk_pro82/bin/cfg/goomba.cfg` -> `/path/to/ida82/cfg/` 39 | * `/path/to/idasdk_pro82/bin/libz3.*` -> `/path/to/ida82/` 40 | 41 | On macOS: 42 | 43 | * `/path/to/idasdk_pro82/bin/plugins/goomba*` -> `/path/to/ida82/ida.app/Contents/MacOS/plugins/` 44 | * `/path/to/idasdk_pro82/bin/cfg/goomba.cfg` -> `/path/to/ida82/ida.app/Contents/MacOS/cfg/` 45 | * `/path/to/idasdk_pro82/bin/libz3.*` -> `/path/to/ida82/ida.app/Contents/MacOS/` 46 | * `/path/to/idasdk_pro82/bin/libz3.*` -> `/path/to/ida82/ida64.app/Contents/MacOS/` 47 | 48 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # gooMBA 2 | 3 | gooMBA is a Hex-Rays Decompiler plugin that simplifies Mixed Boolean-Arithmetic 4 | (MBA) expressions. It combines several heuristics and algorithms to 5 | achieve orders-of-magnitude better performance than existing state-of-the-art 6 | solutions. 7 | 8 | More information on the inner workings of this tool is available in our [blog 9 | post](https://hex-rays.com/blog/deobfuscation-with-goomba/). 
10 | 11 | ## Core Features 12 | - Full integration with the Hex-Rays Decompiler 13 | - Simplifies linear MBAs, including opaque predicates 14 | - Handles sign extension for linear functions 15 | - Verifies soundness of simplifications using the z3 SMT solver 16 | - Simplifies non-linear MBAs with the use of a function fingerprint oracle 17 | 18 | ## Usage 19 | 20 | By default, the plugin does not run automatically. You can invoke the plugin 21 | by right-clicking in the pseudocode view and selecting "Run gooMBA Optimizer". 22 | In addition, you can set up a keyboard shortcut in IDA by opening Options -> 23 | Shortcuts... and adding a shortcut for the `goomba:run` action. 24 | 25 | Several usage options are available within `goomba.cfg`. You can set up a 26 | fingerprint oracle, configure the z3 proof timeout, choose the desired behavior when 27 | timeouts occur, and choose to make the plugin run automatically without needing 28 | to be invoked from the right-click menu. 29 | 30 | ## Demo 31 | 32 | The sample database `tests/idb/mba_challenge.i64` was created from the `mba_challenge` binary. The functions 33 | `mba1`, `mba2`, `mba3`, `mba`, and `solve_me` contain MBA expressions of varying complexity. 34 | 35 | For example, the `mba1` function's initial pseudocode: 36 | ![mba1 initial pseudocode](./images/mba1_before.png) 37 | 38 | And after running gooMBA optimization: 39 | ![mba1 pseudocode optimized](./images/mba1_after.png) 40 | 41 | 42 | ## Fingerprint oracle 43 | 44 | The oracle can be used for simplifying non-linear MBAs. 45 | The input for generating it is a list of candidate expressions in [msynth](https://github.com/mrphrazer/msynth) syntax. 46 | You can use `generate_oracle.sh` or `generate_oracle.bat` to generate a binary 47 | oracle file which can then be used by the plugin by specifying the path to it 48 | in `goomba.cfg` (parameter `MBA_ORACLE_PATH`). 
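To illustrate the idea behind the fingerprint oracle, here is a minimal standalone sketch (not the plugin's code). It assumes that a fingerprint is a hash of an expression's outputs on a fixed set of test inputs, which is how `compute_fingerprint_from_outputs` in `equiv_class.hpp` summarizes output behavior; the two FNV constants are taken from that file. Everything else (`expr_t`, `make_testcases`, `fingerprint`, `demo_same_class`, and the xorshift input generator) is hypothetical and exists only for this example. The test-case count of 128 and the 5 inputs per test case mirror `TCS_PER_EQUIV_CLASS` and `CANDIDATE_EXPR_NUMINPUTS` in `consts.hpp`.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical stand-ins for the plugin's testcase_t and for an expression.
using testcase_t = std::vector<uint64_t>;
using expr_t = std::function<uint64_t(const testcase_t &)>;

// FNV-1a-style hash over 64-bit outputs; constants as in equiv_class.hpp.
uint64_t fingerprint_from_outputs(const std::vector<uint64_t> &outputs)
{
  const uint64_t FNV_BASIS = 0xcbf29ce484222325ull;
  const uint64_t FNV_PRIME = 0x100000001b3ull;
  uint64_t sum = FNV_BASIS;
  for ( uint64_t c : outputs )
  {
    sum ^= c;
    sum *= FNV_PRIME;
  }
  return sum;
}

// Assumption: a simple xorshift generator replaces the plugin's own
// random value source, so that this sketch is deterministic.
std::vector<testcase_t> make_testcases(size_t n, size_t ninputs)
{
  uint64_t s = 0x9e3779b97f4a7c15ull;
  std::vector<testcase_t> tcs(n, testcase_t(ninputs));
  for ( auto &tc : tcs )
    for ( auto &v : tc )
    {
      s ^= s << 13;
      s ^= s >> 7;
      s ^= s << 17;
      v = s;
    }
  return tcs;
}

// Evaluate the expression on every test case and hash the outputs.
uint64_t fingerprint(const expr_t &f, const std::vector<testcase_t> &tcs)
{
  std::vector<uint64_t> outs;
  outs.reserve(tcs.size());
  for ( const auto &tc : tcs )
    outs.push_back(f(tc));
  return fingerprint_from_outputs(outs);
}

// x + y and (x ^ y) + 2*(x & y) are equal modulo 2^64, so they must
// produce the same fingerprint on any set of test cases.
bool demo_same_class()
{
  auto tcs = make_testcases(128, 5);
  expr_t plain = [](const testcase_t &v) { return v[0] + v[1]; };
  expr_t mba = [](const testcase_t &v) { return (v[0] ^ v[1]) + 2 * (v[0] & v[1]); };
  return fingerprint(plain, tcs) == fingerprint(mba, tcs);
}
```

Equal fingerprints only suggest equivalence, since unrelated expressions could collide; this is presumably why candidates found through the oracle are still verified (the feature list mentions z3 soundness checks) before they replace the original expression.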
49 | 50 | A large pre-computed oracle is available [here](https://hex-rays.com/products/ida/support/freefiles/goomba-oracle.7z) 51 | 52 | NOTE: oracle files generated with IDA 8.2 can only be used with 64-bit binaries, otherwise you may hit internal error 30661. 53 | 54 | ## Obtaining gooMBA 55 | 56 | Please see the [releases](https://github.com/HexRaysSA/goomba/releases) section for `goomba` builds that will work with IDA Pro & IDA Teams v8.2. 57 | 58 | Starting with version 8.3, `goomba` is shipped with IDA Pro & IDA Teams. 59 | -------------------------------------------------------------------------------- /bitwise_expr_lookup_tbl.cpp: -------------------------------------------------------------------------------- 1 | /* 2 | * Copyright (c) 2023 by Hex-Rays, support@hex-rays.com 3 | * ALL RIGHTS RESERVED. 4 | * 5 | * gooMBA plugin for Hex-Rays Decompiler. 6 | * 7 | */ 8 | 9 | #include "z3++_no_warn.h" 10 | #include "bitwise_expr_lookup_tbl.hpp" 11 | 12 | bw_expr_tbl_t bw_expr_tbl_t::instance; 13 | 14 | bw_expr_tbl_t::bw_expr_tbl_t() 15 | { 16 | minsn_templates_t X; 17 | X.push_back(std::make_shared(0)); 18 | X.push_back(std::make_shared(1)); 19 | X.push_back(std::make_shared(2)); 20 | 21 | minsn_template_ptr_t zero = std::make_shared(0ull); 22 | 23 | // note that all expressions are ordered by the numeric value of the instruction trace 24 | // see lin_conj_exprs.hpp for more info on ordering. 
25 | auto &onevar = tbl.push_back(); 26 | onevar.push_back(zero); // [0 0] 27 | onevar.push_back(X[0]); // [0 1] 28 | 29 | auto &twovar = tbl.push_back(); 30 | twovar.push_back(zero); // [0 0 0 0] 31 | twovar.push_back(X[0]&~X[1]); // [0 1 0 0] 32 | twovar.push_back(~(X[0]|~X[1])); // [0 0 1 0] 33 | twovar.push_back(X[0]^X[1]); // [0 1 1 0] 34 | twovar.push_back(X[0]&X[1]); // [0 0 0 1] 35 | twovar.push_back(X[0]); // [0 1 0 1] 36 | twovar.push_back(X[1]); // [0 0 1 1] 37 | twovar.push_back(X[0]|X[1]); // [0 1 1 1] 38 | 39 | auto &threevar = tbl.push_back(); 40 | threevar.push_back(zero); // [0 0 0 0 0 0 0 0] 41 | threevar.push_back(~(~X[0]|(X[1]|X[2]))); // [0 1 0 0 0 0 0 0] 42 | threevar.push_back(~(X[0]|(~X[1]|X[2]))); // [0 0 1 0 0 0 0 0] 43 | threevar.push_back(~X[2]&(X[0]^X[1])); // [0 1 1 0 0 0 0 0] 44 | threevar.push_back(~(~X[0]|(~X[1]|X[2]))); // [0 0 0 1 0 0 0 0] 45 | threevar.push_back(X[0]&~X[2]); // [0 1 0 1 0 0 0 0] 46 | threevar.push_back(X[1]&~X[2]); // [0 0 1 1 0 0 0 0] 47 | threevar.push_back(X[2]^(X[0]|(X[1]|X[2]))); // [0 1 1 1 0 0 0 0] 48 | threevar.push_back(~X[0]&(~X[1]&X[2])); // [0 0 0 0 1 0 0 0] 49 | threevar.push_back(~X[1]&(X[0]^X[2])); // [0 1 0 0 1 0 0 0] 50 | threevar.push_back(~X[0]&(X[1]^X[2])); // [0 0 1 0 1 0 0 0] 51 | threevar.push_back(~(X[0]&X[1])&(X[0]^(X[1]^X[2]))); // [0 1 1 0 1 0 0 0] 52 | threevar.push_back(~(X[0]^X[1])&(X[0]^X[2])); // [0 0 0 1 1 0 0 0] 53 | threevar.push_back(X[2]^(X[0]|(X[1]&X[2]))); // [0 1 0 1 1 0 0 0] 54 | threevar.push_back(~(X[0]&~X[1])&(X[1]^X[2])); // [0 0 1 1 1 0 0 0] 55 | threevar.push_back(X[2]^(X[0]|X[1])); // [0 1 1 1 1 0 0 0] 56 | threevar.push_back(X[0]&(~X[1]&X[2])); // [0 0 0 0 0 1 0 0] 57 | threevar.push_back(X[0]&~X[1]); // [0 1 0 0 0 1 0 0] 58 | threevar.push_back((X[0]^X[1])&~(X[0]^X[2])); // [0 0 1 0 0 1 0 0] 59 | threevar.push_back(X[1]^(X[0]|(X[1]&X[2]))); // [0 1 1 0 0 1 0 0] 60 | threevar.push_back(X[0]&(X[1]^X[2])); // [0 0 0 1 0 1 0 0] 61 | 
threevar.push_back(~(~X[0]|(X[1]&X[2]))); // [0 1 0 1 0 1 0 0] 62 | threevar.push_back((X[0]|X[1])&(X[1]^X[2])); // [0 0 1 1 0 1 0 0] 63 | threevar.push_back((X[0]&X[1])^~(X[0]^(~X[1]|X[2]))); // [0 1 1 1 0 1 0 0] 64 | threevar.push_back(~(X[1]|~X[2])); // [0 0 0 0 1 1 0 0] 65 | threevar.push_back(X[1]^(X[0]|(X[1]|X[2]))); // [0 1 0 0 1 1 0 0] 66 | threevar.push_back(~(X[0]&X[1])&(X[1]^X[2])); // [0 0 1 0 1 1 0 0] 67 | threevar.push_back(X[1]^(X[0]|X[2])); // [0 1 1 0 1 1 0 0] 68 | threevar.push_back((X[0]|~X[1])&(X[1]^X[2])); // [0 0 0 1 1 1 0 0] 69 | threevar.push_back((X[0]&X[2])^(X[0]^(~X[1]&X[2]))); // [0 1 0 1 1 1 0 0] 70 | threevar.push_back(X[1]^X[2]); // [0 0 1 1 1 1 0 0] 71 | threevar.push_back((X[0]&~X[1])|(X[1]^X[2])); // [0 1 1 1 1 1 0 0] 72 | threevar.push_back(~X[0]&(X[1]&X[2])); // [0 0 0 0 0 0 1 0] 73 | threevar.push_back((X[0]^X[1])&(X[0]^X[2])); // [0 1 0 0 0 0 1 0] 74 | threevar.push_back(~(X[0]|~X[1])); // [0 0 1 0 0 0 1 0] 75 | threevar.push_back(X[1]^~(~X[0]|(~X[1]&X[2]))); // [0 1 1 0 0 0 1 0] 76 | threevar.push_back(X[1]&(X[0]^X[2])); // [0 0 0 1 0 0 1 0] 77 | threevar.push_back(X[2]^(X[0]|(~X[1]&X[2]))); // [0 1 0 1 0 0 1 0] 78 | threevar.push_back(X[1]&~(X[0]&X[2])); // [0 0 1 1 0 0 1 0] 79 | threevar.push_back(X[1]^~(~X[0]|(X[1]^X[2]))); // [0 1 1 1 0 0 1 0] 80 | threevar.push_back(~(X[0]|~X[2])); // [0 0 0 0 1 0 1 0] 81 | threevar.push_back(X[2]^(X[0]&(~X[1]|X[2]))); // [0 1 0 0 1 0 1 0] 82 | threevar.push_back(~X[0]&(X[1]|X[2])); // [0 0 1 0 1 0 1 0] 83 | threevar.push_back(X[0]^(X[1]|X[2])); // [0 1 1 0 1 0 1 0] 84 | threevar.push_back(X[2]^(X[0]&(X[1]|X[2]))); // [0 0 0 1 1 0 1 0] 85 | threevar.push_back(X[0]^X[2]); // [0 1 0 1 1 0 1 0] 86 | threevar.push_back((X[0]&X[2])^(X[1]|X[2])); // [0 0 1 1 1 0 1 0] 87 | threevar.push_back(X[2]^~(~X[0]&(~X[1]|X[2]))); // [0 1 1 1 1 0 1 0] 88 | threevar.push_back(X[2]&(X[0]^X[1])); // [0 0 0 0 0 1 1 0] 89 | threevar.push_back(X[1]^~(~X[0]&(~X[1]|X[2]))); // [0 1 0 0 0 1 1 0] 90 | 
threevar.push_back(X[1]^(X[0]&(X[1]|X[2]))); // [0 0 1 0 0 1 1 0] 91 | threevar.push_back(X[0]^X[1]); // [0 1 1 0 0 1 1 0] 92 | threevar.push_back((X[0]|X[1])&~(X[0]^(X[1]^X[2]))); // [0 0 0 1 0 1 1 0] 93 | threevar.push_back(X[0]^(X[1]&X[2])); // [0 1 0 1 0 1 1 0] 94 | threevar.push_back(X[1]^(X[0]&X[2])); // [0 0 1 1 0 1 1 0] 95 | threevar.push_back(X[1]^(X[0]&(~X[1]|X[2]))); // [0 1 1 1 0 1 1 0] 96 | threevar.push_back(X[2]&~(X[0]&X[1])); // [0 0 0 0 1 1 1 0] 97 | threevar.push_back(X[1]^(X[0]|(X[1]^X[2]))); // [0 1 0 0 1 1 1 0] 98 | threevar.push_back((X[0]&X[1])^(X[1]|X[2])); // [0 0 1 0 1 1 1 0] 99 | threevar.push_back(X[1]^(X[0]|(~X[1]&X[2]))); // [0 1 1 0 1 1 1 0] 100 | threevar.push_back(X[2]^(X[0]&X[1])); // [0 0 0 1 1 1 1 0] 101 | threevar.push_back(X[2]^~(~X[0]|(~X[1]&X[2]))); // [0 1 0 1 1 1 1 0] 102 | threevar.push_back(~(X[0]|~X[1])|(X[1]^X[2])); // [0 0 1 1 1 1 1 0] 103 | threevar.push_back((X[0]^X[1])|(X[0]^X[2])); // [0 1 1 1 1 1 1 0] 104 | threevar.push_back(X[0]&(X[1]&X[2])); // [0 0 0 0 0 0 0 1] 105 | threevar.push_back(~(~X[0]|(X[1]^X[2]))); // [0 1 0 0 0 0 0 1] 106 | threevar.push_back(X[1]&~(X[0]^X[2])); // [0 0 1 0 0 0 0 1] 107 | threevar.push_back((X[0]|X[1])&(X[0]^(X[1]^X[2]))); // [0 1 1 0 0 0 0 1] 108 | threevar.push_back(X[0]&X[1]); // [0 0 0 1 0 0 0 1] 109 | threevar.push_back(~(~X[0]|(~X[1]&X[2]))); // [0 1 0 1 0 0 0 1] 110 | threevar.push_back(X[1]&(X[0]|~X[2])); // [0 0 1 1 0 0 0 1] 111 | threevar.push_back((X[1]&~X[2])|~(~X[0]|(~X[1]&X[2]))); // [0 1 1 1 0 0 0 1] 112 | threevar.push_back(X[2]&~(X[0]^X[1])); // [0 0 0 0 1 0 0 1] 113 | threevar.push_back((X[0]|~X[1])&(X[0]^(X[1]^X[2]))); // [0 1 0 0 1 0 0 1] 114 | threevar.push_back(~(X[0]&~X[1])&(X[0]^(X[1]^X[2]))); // [0 0 1 0 1 0 0 1] 115 | threevar.push_back(X[0]^(X[1]^X[2])); // [0 1 1 0 1 0 0 1] 116 | threevar.push_back(X[1]^(~X[0]&(X[1]|X[2]))); // [0 0 0 1 1 0 0 1] 117 | threevar.push_back(X[0]^(~X[1]&X[2])); // [0 1 0 1 1 0 0 1] 118 | threevar.push_back(X[1]^~(X[0]|~X[2])); 
// [0 0 1 1 1 0 0 1] 119 | threevar.push_back((X[0]&X[1])|(X[0]^(X[1]^X[2]))); // [0 1 1 1 1 0 0 1] 120 | threevar.push_back(X[0]&X[2]); // [0 0 0 0 0 1 0 1] 121 | threevar.push_back(X[0]&(~X[1]|X[2])); // [0 1 0 0 0 1 0 1] 122 | threevar.push_back(X[2]^(~X[0]&(X[1]|X[2]))); // [0 0 1 0 0 1 0 1] 123 | threevar.push_back(~(X[0]^(~X[1]|X[2]))); // [0 1 1 0 0 1 0 1] 124 | threevar.push_back(X[0]&(X[1]|X[2])); // [0 0 0 1 0 1 0 1] 125 | threevar.push_back(X[0]); // [0 1 0 1 0 1 0 1] 126 | threevar.push_back((X[0]&X[2])|(X[1]&~X[2])); // [0 0 1 1 0 1 0 1] 127 | threevar.push_back(~(~X[0]&(~X[1]|X[2]))); // [0 1 1 1 0 1 0 1] 128 | threevar.push_back(X[2]&(X[0]|~X[1])); // [0 0 0 0 1 1 0 1] 129 | threevar.push_back((X[1]&~X[2])^(X[0]|(X[1]^X[2]))); // [0 1 0 0 1 1 0 1] 130 | threevar.push_back(X[2]^~(X[0]|~X[1])); // [0 0 1 0 1 1 0 1] 131 | threevar.push_back((X[0]&~X[1])|(X[0]^(X[1]^X[2]))); // [0 1 1 0 1 1 0 1] 132 | threevar.push_back((X[0]&X[1])|~(X[1]|~X[2])); // [0 0 0 1 1 1 0 1] 133 | threevar.push_back(X[0]|(~X[1]&X[2])); // [0 1 0 1 1 1 0 1] 134 | threevar.push_back((X[0]&X[1])|(X[1]^X[2])); // [0 0 1 1 1 1 0 1] 135 | threevar.push_back(X[0]|(X[1]^X[2])); // [0 1 1 1 1 1 0 1] 136 | threevar.push_back(X[1]&X[2]); // [0 0 0 0 0 0 1 1] 137 | threevar.push_back((X[0]|X[1])&~(X[1]^X[2])); // [0 1 0 0 0 0 1 1] 138 | threevar.push_back(X[1]&~(X[0]&~X[2])); // [0 0 1 0 0 0 1 1] 139 | threevar.push_back(X[1]^(X[0]&~X[2])); // [0 1 1 0 0 0 1 1] 140 | threevar.push_back(X[1]&(X[0]|X[2])); // [0 0 0 1 0 0 1 1] 141 | threevar.push_back((X[0]&X[2])^(X[0]^(X[1]&X[2]))); // [0 1 0 1 0 0 1 1] 142 | threevar.push_back(X[1]); // [0 0 1 1 0 0 1 1] 143 | threevar.push_back(X[1]|(X[0]&~X[2])); // [0 1 1 1 0 0 1 1] 144 | threevar.push_back(X[2]&~(X[0]&~X[1])); // [0 0 0 0 1 0 1 1] 145 | threevar.push_back(X[2]^(X[0]&~X[1])); // [0 1 0 0 1 0 1 1] 146 | threevar.push_back((X[1]&X[2])|(~X[0]&(X[1]|X[2]))); // [0 0 1 0 1 0 1 1] 147 | threevar.push_back(~(X[0]|~X[1])|(X[0]^(X[1]^X[2]))); // 
[0 1 1 0 1 0 1 1] 148 | threevar.push_back(X[1]^(~X[0]&(X[1]^X[2]))); // [0 0 0 1 1 0 1 1] 149 | threevar.push_back(X[2]^~(~X[0]|(X[1]&X[2]))); // [0 1 0 1 1 0 1 1] 150 | threevar.push_back(X[1]|~(X[0]|~X[2])); // [0 0 1 1 1 0 1 1] 151 | threevar.push_back(X[1]|(X[0]^X[2])); // [0 1 1 1 1 0 1 1] 152 | threevar.push_back(X[2]&(X[0]|X[1])); // [0 0 0 0 0 1 1 1] 153 | threevar.push_back((X[0]&X[1])^(X[0]^(X[1]&X[2]))); // [0 1 0 0 0 1 1 1] 154 | threevar.push_back(X[1]^(X[0]&(X[1]^X[2]))); // [0 0 1 0 0 1 1 1] 155 | threevar.push_back(X[1]^~(~X[0]|(X[1]&X[2]))); // [0 1 1 0 0 1 1 1] 156 | threevar.push_back((X[1]&X[2])|(X[0]&(X[1]|X[2]))); // [0 0 0 1 0 1 1 1] 157 | threevar.push_back(X[0]|(X[1]&X[2])); // [0 1 0 1 0 1 1 1] 158 | threevar.push_back(X[1]|(X[0]&X[2])); // [0 0 1 1 0 1 1 1] 159 | threevar.push_back(X[0]|X[1]); // [0 1 1 1 0 1 1 1] 160 | threevar.push_back(X[2]); // [0 0 0 0 1 1 1 1] 161 | threevar.push_back(X[2]|(X[0]&~X[1])); // [0 1 0 0 1 1 1 1] 162 | threevar.push_back(X[2]|~(X[0]|~X[1])); // [0 0 1 0 1 1 1 1] 163 | threevar.push_back(X[2]|(X[0]^X[1])); // [0 1 1 0 1 1 1 1] 164 | threevar.push_back(X[2]|(X[0]&X[1])); // [0 0 0 1 1 1 1 1] 165 | threevar.push_back(X[0]|X[2]); // [0 1 0 1 1 1 1 1] 166 | threevar.push_back(X[1]|X[2]); // [0 0 1 1 1 1 1 1] 167 | threevar.push_back(X[0]|(X[1]|X[2])); // [0 1 1 1 1 1 1 1] 168 | } 169 | -------------------------------------------------------------------------------- /bitwise_expr_lookup_tbl.hpp: -------------------------------------------------------------------------------- 1 | /* 2 | * Copyright (c) 2023 by Hex-Rays, support@hex-rays.com 3 | * ALL RIGHTS RESERVED. 4 | * 5 | * gooMBA plugin for Hex-Rays Decompiler. 6 | * 7 | */ 8 | 9 | #pragma once 10 | #include "minsn_template.hpp" 11 | 12 | // bw_expr_tbl_t is a singleton class that maintains a lookup table mapping 13 | // boolean function evaluation traces (i.e. I/O behavior) to the shortest 14 | // representation of each boolean function. 
15 | // for instance, if you found a boolean function f(x, y) with the following 16 | // behavior: f(0, 0) = 0, f(0, 1) = 0, f(1, 0) = 0, f(1, 1) = 1, then you 17 | // can query this object to find that f(x, y) = x & y. 18 | // note that we do not consider any functions that return 1 on the all-zeros 19 | // input. 20 | class bw_expr_tbl_t 21 | { 22 | qvector tbl; 23 | 24 | public: 25 | static bw_expr_tbl_t instance; 26 | 27 | // do not call directly, use instance instead 28 | bw_expr_tbl_t(); 29 | 30 | // eval_trace is a bitmap whose i'th bit contains the 31 | // boolean function's evaluation on the i'th conjunction, 32 | // where conjunctions are ordered in the same way as in lin_conj_exprs.hpp 33 | minsn_template_ptr_t lookup(int nvars, uint64_t bit_trace) 34 | { 35 | QASSERT(30698, (bit_trace & 1) == 0); 36 | QASSERT(30699, nvars <= 3); 37 | QASSERT(30700, nvars >= 1); 38 | QASSERT(30701, bit_trace < (1ull << (1ull << (nvars)))); 39 | return tbl[nvars-1][bit_trace >> 1]; 40 | // since the 0th conjunction is never considered, all vector indices are 41 | // divided by 2. See the corresponding .cpp file for more info. 42 | } 43 | }; 44 | -------------------------------------------------------------------------------- /consts.hpp: -------------------------------------------------------------------------------- 1 | /* 2 | * Copyright (c) 2023 by Hex-Rays, support@hex-rays.com 3 | * ALL RIGHTS RESERVED. 4 | * 5 | * gooMBA plugin for Hex-Rays Decompiler. 6 | * 7 | */ 8 | 9 | #pragma once 10 | #define ACTION_NAME "goomba:run" 11 | // Z3_TIMEOUT_MS defines the amount of time we allow the z3 theorem prover to 12 | // take to prove any given statement 13 | #define Z3_TIMEOUT_MS 1000 14 | 15 | // Only used for *generating* oracles: how many test cases to run against each 16 | // function to generate fingerprints. 
Note that an existing oracle will report 17 | // its own number, and the below constant will not be used 18 | #define TCS_PER_EQUIV_CLASS 128 19 | // The number of inputs used when evaluating functions for fingerprinting 20 | #define CANDIDATE_EXPR_NUMINPUTS 5 21 | // The maximum number of candidates to consider which have the same fingerprint 22 | // as the expression being simplified 23 | #define EQUIV_CLASS_MAX_CANDIDATES 10 24 | // The maximum number of fingerprints to consider for each expression being 25 | // simplified -- this number is greater than one since we consider every 26 | // possible assignment of input variables 27 | #define EQUIV_CLASS_MAX_FINGERPRINTS 50 -------------------------------------------------------------------------------- /equiv_class.cpp: -------------------------------------------------------------------------------- 1 | /* 2 | * Copyright (c) 2023 by Hex-Rays, support@hex-rays.com 3 | * ALL RIGHTS RESERVED. 4 | * 5 | * gooMBA plugin for Hex-Rays Decompiler. 
6 | * 7 | */ 8 | 9 | #include "z3++_no_warn.h" 10 | #include "equiv_class.hpp" 11 | #include "optimizer.hpp" 12 | 13 | 14 | //------------------------------------------------------------------------- 15 | // replaces all references to abstract mop_l's with variables from new_vars 16 | minsn_t *make_concrete_minsn(ea_t ea, const minsn_t &minsn, const mopvec_t &new_vars, int newsz) 17 | { 18 | struct mop_reassigner_t : public mop_visitor_t 19 | { 20 | const mopvec_t &new_vars; 21 | ea_t ea; 22 | mop_reassigner_t(ea_t e, const mopvec_t &nm) 23 | : new_vars(nm), ea(e) {} 24 | int idaapi visit_mop(mop_t *op, const tinfo_t *, bool) 25 | { 26 | if ( op->t == mop_l ) 27 | { 28 | int idx = op->l->idx; 29 | if ( idx >= new_vars.size() ) 30 | return -1; 31 | op->t = mop_d; 32 | op->d = resize_mop(ea, new_vars.at(idx), op->size, false); 33 | } 34 | return 0; 35 | } 36 | }; 37 | 38 | minsn_t *res = nullptr; 39 | minsn_t *copy = new minsn_t(minsn); 40 | 41 | mop_reassigner_t mr(ea, new_vars); 42 | int code = copy->for_all_ops(mr); 43 | if ( code >= 0 ) 44 | { 45 | copy->setaddr(ea); 46 | 47 | // resize res to the correct output size 48 | mop_t res_mop; 49 | res_mop.create_from_insn(copy); 50 | res = resize_mop(ea, res_mop, newsz, false); 51 | } 52 | delete copy; 53 | return res; 54 | } 55 | 56 | //------------------------------------------------------------------------- 57 | static void create_var_mapping(var_mapping_t &dest, const mopvec_t &mops) 58 | { 59 | for ( size_t i = 0; i < mops.size(); i++ ) 60 | dest.insert( { mops[i], i } ); 61 | } 62 | 63 | //------------------------------------------------------------------------- 64 | void equiv_class_finder_t::find_candidates(minsn_set_t &dest, const minsn_t &insn) 65 | { 66 | std::set seen; 67 | int num_fingerprints = 0; // includes duplicate fingerprints 68 | int num_candidates = 0; 69 | 70 | mopvec_t input_mops = get_input_mops(insn); 71 | do 72 | { 73 | var_mapping_t mapping; 74 | create_var_mapping(mapping, input_mops); 75 | 
76 | func_fingerprint_t fingerprint = compute_fingerprint(insn, &mapping); 77 | msg("Computed fingerprint %" FMT_64 "x\n", fingerprint); 78 | 79 | num_fingerprints++; 80 | if ( num_fingerprints > EQUIV_CLASS_MAX_FINGERPRINTS ) 81 | break; 82 | 83 | if ( !seen.insert(fingerprint).second ) 84 | continue; // already seen 85 | 86 | const minsn_set_t *equiv_class = find_equiv_class(fingerprint); 87 | if ( equiv_class != nullptr ) 88 | { 89 | for ( const auto &mi : *equiv_class ) 90 | { 91 | num_candidates++; 92 | // msg("Fingerprint matches: %s\n", mi->dstr()); 93 | minsn_t *concrete = make_concrete_minsn(insn.ea, *mi, input_mops, insn.d.size); 94 | 95 | if ( concrete != nullptr ) 96 | dest.insert(concrete); 97 | 98 | if ( num_candidates >= EQUIV_CLASS_MAX_CANDIDATES ) 99 | break; 100 | } 101 | } 102 | 103 | } while ( std::next_permutation(input_mops.begin(), input_mops.end()) ); 104 | } 105 | -------------------------------------------------------------------------------- /equiv_class.hpp: -------------------------------------------------------------------------------- 1 | /* 2 | * Copyright (c) 2023 by Hex-Rays, support@hex-rays.com 3 | * ALL RIGHTS RESERVED. 4 | * 5 | * gooMBA plugin for Hex-Rays Decompiler. 
6 | * 7 | */ 8 | 9 | #pragma once 10 | #include 11 | #include "msynth_parser.hpp" 12 | #include "heuristics.hpp" 13 | #include "linear_exprs.hpp" 14 | #include "consts.hpp" 15 | 16 | struct minsn_with_mapping_t; 17 | 18 | typedef std::set minsn_set_t; 19 | 20 | typedef qvector output_behavior_t; 21 | typedef qvector testcase_t; 22 | typedef std::map var_mapping_t; 23 | typedef uint64 func_fingerprint_t; 24 | typedef std::map equiv_class_map_t; 25 | 26 | #define CHECK_SERIALIZATION_CONSISTENCY true 27 | 28 | //------------------------------------------------------------------------- 29 | // output behavior is summarized as a list of uint64's, each corresponding to a test case 30 | inline func_fingerprint_t compute_fingerprint_from_outputs(const output_behavior_t &outputs) 31 | { 32 | // FNV-1a, as per Wikipedia 33 | const uint64 FNV_BASIS = 0xcbf29ce484222325; 34 | const uint64 FNV_PRIME = 0x100000001b3; 35 | uint64 sum = FNV_BASIS; 36 | for ( uint64 c : outputs ) 37 | { 38 | sum ^= c; 39 | sum *= FNV_PRIME; 40 | } 41 | return sum; 42 | } 43 | 44 | //------------------------------------------------------------------------- 45 | inline void gen_testcase(testcase_t *tc) 46 | { 47 | tc->resize(CANDIDATE_EXPR_NUMINPUTS); 48 | for ( auto &v : *tc ) 49 | v = gen_rand_mcode_val(8).val; 50 | } 51 | 52 | //------------------------------------------------------------------------- 53 | class equiv_class_finder_t 54 | { 55 | public: 56 | equiv_class_map_t equiv_classes; 57 | qvector testcases; 58 | 59 | //------------------------------------------------------------------------- 60 | // helper_emu_t evaluates expressions for a given test case and variable mapping 61 | struct helper_emu_t : public mcode_emulator_t 62 | { 63 | const testcase_t &tc; 64 | const var_mapping_t *var_mapping; // maps variables to input index 65 | // assigning a nullptr var_mapping indicates that the indexing should be done 66 | // according to the abstract mop's self-declared index 67 | 68 | 
helper_emu_t (const testcase_t &t, const var_mapping_t *vm) 69 | : tc(t), var_mapping(vm) {} 70 | 71 | virtual mcode_val_t get_var_val(const mop_t &mop) override 72 | { 73 | if ( var_mapping == nullptr ) 74 | { 75 | // the instruction must be abstract, get the index from the mop itself 76 | QASSERT(30706, mop.t == mop_l); 77 | return mcode_val_t(tc[mop.l->idx], mop.size); 78 | } 79 | return mcode_val_t(tc.at(var_mapping->at(mop)), mop.size); 80 | } 81 | }; 82 | 83 | virtual ~equiv_class_finder_t() {} 84 | 85 | //------------------------------------------------------------------------- 86 | equiv_class_finder_t() 87 | { 88 | testcases.resize(TCS_PER_EQUIV_CLASS); 89 | for ( auto &tc : testcases ) 90 | gen_testcase(&tc); 91 | } 92 | 93 | //------------------------------------------------------------------------- 94 | // mapping = nullptr means the instruction is abstract (all terminal mops 95 | // have type mop_l), and mop indices will be retrieved by querying mop.l->idx 96 | func_fingerprint_t compute_fingerprint( 97 | const minsn_t &ins, 98 | const var_mapping_t *mapping = nullptr) 99 | { 100 | output_behavior_t res; 101 | res.reserve(testcases.size()); 102 | for ( const auto &tc : testcases ) 103 | { 104 | helper_emu_t emu(tc, mapping); 105 | res.push_back(emu.minsn_value(ins).val); 106 | } 107 | return compute_fingerprint_from_outputs(res); 108 | } 109 | 110 | //------------------------------------------------------------------------- 111 | func_fingerprint_t compute_fingerprint_from_serialization( 112 | uchar *buf, uint32 sz, 113 | int version = -1, 114 | const var_mapping_t *mapping = nullptr) 115 | { 116 | if ( version == -1 ) // use current serialization version 117 | { 118 | bytevec_t bv; 119 | version = minsn_t(0).serialize(&bv); 120 | } 121 | minsn_t minsn(0); 122 | minsn.deserialize(buf, sz, version); 123 | 124 | return compute_fingerprint(minsn, mapping); 125 | } 126 | 127 | //------------------------------------------------------------------------- 128 
| // computes the fingerprint of the abstract minsn and adds it to the index 129 | void add_abstract_minsn(minsn_t *ins) 130 | { 131 | auto fingerprint = compute_fingerprint(*ins); 132 | auto it = equiv_classes.find(fingerprint); 133 | if ( it != equiv_classes.end() ) 134 | { 135 | // check if semantically equivalent expression already exists 136 | for ( const auto &o : it->second ) 137 | if ( probably_equivalent(*o, *ins) ) 138 | return; 139 | it->second.insert(ins); 140 | } 141 | else 142 | { 143 | minsn_set_t new_entry; 144 | new_entry.insert(ins); 145 | equiv_classes.insert( { fingerprint, new_entry } ); 146 | } 147 | } 148 | 149 | //------------------------------------------------------------------------- 150 | virtual const minsn_set_t *find_equiv_class(func_fingerprint_t fingerprint) 151 | { 152 | auto p = equiv_classes.find(fingerprint); 153 | if ( p != equiv_classes.end() ) 154 | return &p->second; 155 | return nullptr; 156 | } 157 | 158 | //------------------------------------------------------------------------- 159 | // find candidate minsns that match the fingerprint of the given minsn 160 | // before being added, these are made concrete -- the abstract mop_l's are 161 | // replaced by real mops from the input insn 162 | void find_candidates(minsn_set_t &dest, const minsn_t &insn); 163 | }; 164 | 165 | //------------------------------------------------------------------------- 166 | struct equiv_class_idx_entry_t 167 | { 168 | func_fingerprint_t fingerprint; 169 | uint64_t offset; 170 | // offset relative to the beginning of where minsns are stored within the oracle file 171 | 172 | bool operator<(const equiv_class_idx_entry_t &o) const 173 | { 174 | return fingerprint < o.fingerprint; 175 | } 176 | }; 177 | 178 | //------------------------------------------------------------------------- 179 | struct equiv_class_idx_t 180 | { 181 | qvector index; 182 | 183 | //------------------------------------------------------------------------- 184 | void 
read_from_file(FILE *file) 185 | { 186 | uint32 idx_sz; 187 | if ( qfread(file, &idx_sz, sizeof(idx_sz)) != sizeof(idx_sz) ) 188 | INTERR(30719); 189 | CASSERT(sizeof(equiv_class_idx_entry_t) == 16); 190 | 191 | index.resize_noinit(idx_sz); 192 | size_t nbytes = idx_sz * sizeof(equiv_class_idx_entry_t); 193 | if ( qfread(file, index.begin(), nbytes) != nbytes ) 194 | INTERR(0); 195 | } 196 | 197 | //------------------------------------------------------------------------- 198 | size_t find(func_fingerprint_t fp) 199 | { 200 | equiv_class_idx_entry_t key; 201 | key.fingerprint = fp; 202 | auto p = std::lower_bound(index.begin(), index.end(), key); 203 | if ( p == index.end() || p->fingerprint != fp ) 204 | return -1; 205 | return p->offset; 206 | } 207 | }; 208 | 209 | //------------------------------------------------------------------------- 210 | // lazy-loading collection of equivalence classes 211 | struct equiv_class_finder_lazy_t : public equiv_class_finder_t 212 | { 213 | FILE *file; 214 | qoff64_t fsize; 215 | uint32 format_version; // format version used to serialize minsn_t's 216 | equiv_class_idx_t index; 217 | uint64 minsns_offset; // offset at which the minsns table begins 218 | 219 | virtual ~equiv_class_finder_lazy_t() { qfclose(file); } 220 | 221 | //------------------------------------------------------------------------- 222 | //lint -sem(equiv_class_finder_lazy_t::equiv_class_finder_lazy_t, custodial(1)) 223 | equiv_class_finder_lazy_t(FILE *f) : file(f) 224 | { 225 | fsize = qfsize(file); 226 | 227 | // read in the format version 228 | if ( qfread(file, &format_version, sizeof(format_version)) != sizeof(format_version) ) 229 | INTERR(30716); 230 | 231 | // read and validate the number of the test cases 232 | uint32 n_tcs; 233 | if ( qfread(file, &n_tcs, sizeof(n_tcs)) != sizeof(n_tcs) ) 234 | INTERR(30717); 235 | if ( n_tcs > fsize ) 236 | INTERR(0); 237 | 238 | // read in the test cases 239 | testcases.resize(n_tcs); 240 | for ( auto &new_tc : 
testcases ) 241 | { 242 | new_tc.resize(CANDIDATE_EXPR_NUMINPUTS); 243 | for ( uint64 &new_inp : new_tc ) 244 | if ( qfread(file, &new_inp, sizeof(new_inp)) != sizeof(new_inp) ) 245 | INTERR(30718); 246 | } 247 | 248 | // read in the index 249 | index.read_from_file(file); 250 | 251 | minsns_offset = qftell(file); 252 | // msg("minsns offset %llu", minsns_offset); 253 | } 254 | 255 | //------------------------------------------------------------------------- 256 | // populates the equiv_classes map with the minsn set included in the file 257 | // for the given fingerprint 258 | void read_minsn_set_from_file(func_fingerprint_t fp) 259 | { 260 | int64 idx_lookup = index.find(fp); 261 | if ( idx_lookup < 0 ) 262 | return; // fingerprint doesn't exist in oracle 263 | if ( equiv_classes.count(fp) != 0 ) 264 | return; // we already loaded in the equiv class 265 | 266 | uint64 minsn_offset = minsns_offset + idx_lookup; 267 | if ( qfseek(file, minsn_offset, SEEK_SET) != 0 ) 268 | INTERR(30722); 269 | 270 | uint32 n_minsns; 271 | if ( qfread(file, &n_minsns, sizeof(n_minsns)) != sizeof(n_minsns) ) 272 | INTERR(30723); 273 | if ( n_minsns > fsize ) // sanity check 274 | INTERR(0); 275 | 276 | bytevec_t bv; 277 | minsn_set_t &set = equiv_classes[fp]; 278 | for ( uint32 i = 0; i < n_minsns; i++ ) 279 | { 280 | uint32 minsn_sz; 281 | if ( qfread(file, &minsn_sz, sizeof(minsn_sz)) != sizeof(minsn_sz) ) 282 | INTERR(30724); 283 | if ( minsn_sz > fsize ) // sanity check 284 | INTERR(0); 285 | bv.resize(minsn_sz); 286 | if ( qfread(file, bv.begin(), minsn_sz) != minsn_sz ) 287 | INTERR(30725); 288 | minsn_t *minsn = new minsn_t(0); 289 | minsn->deserialize(bv.begin(), minsn_sz, format_version); 290 | set.insert(minsn); 291 | } 292 | } 293 | 294 | //------------------------------------------------------------------------- 295 | const minsn_set_t *find_equiv_class(func_fingerprint_t fingerprint) override 296 | { 297 | read_minsn_set_from_file(fingerprint); 298 | return 
equiv_class_finder_t::find_equiv_class(fingerprint); 299 | } 300 | 301 | //------------------------------------------------------------------------- 302 | bool optimize(minsn_t &insn); 303 | }; 304 | -------------------------------------------------------------------------------- /file.cpp: -------------------------------------------------------------------------------- 1 | /* 2 | * Copyright (c) 2023 by Hex-Rays, support@hex-rays.com 3 | * ALL RIGHTS RESERVED. 4 | * 5 | * gooMBA plugin for Hex-Rays Decompiler. 6 | * 7 | */ 8 | 9 | #include "z3++_no_warn.h" 10 | #include 11 | #include 12 | #include "file.hpp" 13 | #include "msynth_parser.hpp" 14 | #include "simp_lin_conj_exprs.hpp" 15 | #include "heuristics.hpp" 16 | #include "equiv_class.hpp" 17 | 18 | //------------------------------------------------------------------------- 19 | // In fact this function is not really needed. The user can simply turn on 20 | // the timestamp display in the output window. 21 | static qstring curtime() 22 | { 23 | char buf[64]; 24 | char *ptr = buf; 25 | char *end = buf + sizeof(buf); 26 | qtime64_t ts = qtime64(); 27 | ptr += qstrftime64(ptr, end-ptr, "%H:%M:%S", ts); 28 | uint32 msecs = get_usecs(ts) / 1000; 29 | qsnprintf(ptr, end-ptr, ".%03d", msecs); 30 | return qstring(buf); 31 | } 32 | 33 | //------------------------------------------------------------------------- 34 | void create_minsns_file(FILE *msynth_in, FILE *minsns_out) 35 | { 36 | qstring line; 37 | int n_proc = 0; 38 | int n_written = 0; 39 | while ( qgetline(&line, msynth_in) >= 0 ) 40 | { 41 | n_proc++; 42 | if ( line.size() == 0 ) 43 | continue; 44 | if ( n_proc % REPORT_FREQ == 0 ) 45 | msg("%s: Processed %d, Wrote %d\n", curtime().c_str(), n_proc, n_written); 46 | mopvec_t default_vars; 47 | //------------------------------------------------------------------------- 48 | // an *abstract* mop is a mop_l that does not refer to anything within a 49 | // specific program, it is a placeholder for minsn templates 
50 | for ( int i = 0; i < CANDIDATE_EXPR_NUMINPUTS; i++ ) 51 | { 52 | mop_t new_var; 53 | new_var.t = mop_l; 54 | new_var.l = new lvar_ref_t(nullptr, i); 55 | new_var.size = 8; 56 | default_vars.push_back(new_var); 57 | } 58 | 59 | msynth_expr_parser_t mep(line.c_str(), default_vars); 60 | minsn_t *insn = mep.parse_next_expr(); 61 | 62 | bytevec_t bv; 63 | insn->serialize(&bv); 64 | uint32 bv_sz = bv.size(); 65 | qfwrite(minsns_out, &bv_sz, sizeof(bv_sz)); 66 | qfwrite(minsns_out, bv.begin(), bv_sz); 67 | n_written++; 68 | 69 | delete insn; 70 | } 71 | 72 | msg("%s: Processed %d, Wrote %d\n", curtime().c_str(), n_proc, n_written); 73 | } 74 | 75 | //------------------------------------------------------------------------- 76 | // bytevec comparison based on length 77 | struct bv_len_cmptr_t 78 | { 79 | inline bool operator()(const bytevec_t &a, const bytevec_t &b) const 80 | { 81 | auto asz = a.size(); 82 | auto bsz = b.size(); 83 | return std::tie(asz, a) < std::tie(bsz, b); 84 | } 85 | }; 86 | typedef std::set<bytevec_t, bv_len_cmptr_t> bvset_t; 87 | 88 | //------------------------------------------------------------------------- 89 | inline size_t bv_sz_on_disk(const bytevec_t &bv) 90 | { 91 | return sizeof(uint32) + bv.size(); 92 | } 93 | 94 | //------------------------------------------------------------------------- 95 | static void write_bv_to_disk(FILE *fout, const bytevec_t &bv) 96 | { 97 | uint32 bv_sz = bv.size(); 98 | qfwrite(fout, &bv_sz, sizeof(bv_sz)); 99 | qfwrite(fout, bv.begin(), bv_sz); 100 | } 101 | 102 | //------------------------------------------------------------------------- 103 | static size_t bvset_sz_on_disk(const bvset_t &bvset) 104 | { 105 | size_t res = sizeof(uint32); 106 | for ( const auto &bv : bvset ) 107 | res += bv_sz_on_disk(bv); 108 | return res; 109 | } 110 | 111 | //------------------------------------------------------------------------- 112 | static void write_bvset_to_disk(FILE *fout, const bvset_t &bvset) 113 | { 114 | uint32 bvset_sz = 
bvset.size(); 115 | qfwrite(fout, &bvset_sz, sizeof(bvset_sz)); 116 | for ( const auto &bv : bvset ) 117 | write_bv_to_disk(fout, bv); 118 | } 119 | 120 | //------------------------------------------------------------------------- 121 | bool create_oracle_file(FILE *minsns_in, FILE *oracle_out) 122 | { 123 | // begin by loading the minsns from the file and generating fingerprints 124 | // keeping full minsns in memory would take too much space, so we store them as strings 125 | // and use string length as a proxy for complexity 126 | std::map<func_fingerprint_t, bvset_t> oracle; 127 | equiv_class_finder_t ecf; 128 | 129 | int n_proc = 0; 130 | while ( true ) 131 | { 132 | if ( n_proc % REPORT_FREQ == 0 ) 133 | msg("%s: Processed %d, #Fingerprints %" FMT_Z "\n", curtime().c_str(), n_proc, oracle.size()); 134 | n_proc++; 135 | uint32 minsn_sz; 136 | if ( qfread(minsns_in, &minsn_sz, sizeof(minsn_sz)) != sizeof(minsn_sz) ) 137 | break; 138 | if ( minsn_sz > qfsize(minsns_in) ) // sanity check on minsn_sz 139 | { 140 | msg("Wrong instruction size %d in the oracle file, stopped reading it\n", minsn_sz); 141 | return false; 142 | } 143 | bytevec_t buf; 144 | buf.resize(minsn_sz); 145 | if ( qfread(minsns_in, buf.begin(), minsn_sz) != minsn_sz ) 146 | break; 147 | 148 | func_fingerprint_t fp = ecf.compute_fingerprint_from_serialization(buf.begin(), minsn_sz); 149 | 150 | if ( oracle.count(fp) == 0 ) 151 | oracle.insert( { fp, std::set<bytevec_t, bv_len_cmptr_t>() } ); 152 | 153 | oracle[fp].insert(buf); 154 | } 155 | 156 | msg("%s: Processed %d, #Fingerprints %" FMT_Z "\n", curtime().c_str(), n_proc, oracle.size()); 157 | 158 | // write the resulting oracle to the file 159 | // begin by writing the format version 160 | { 161 | bytevec_t bv; 162 | uint32 format_version = minsn_t(0).serialize(&bv); 163 | qfwrite(oracle_out, &format_version, sizeof(format_version)); 164 | } 165 | 166 | // write the ecf's test cases to file 167 | uint32 n_tcs = ecf.testcases.size(); 168 | qfwrite(oracle_out, &n_tcs, sizeof(n_tcs)); 169 | for ( 
const testcase_t &tc : ecf.testcases ) 170 | for ( const uint64 input : tc ) 171 | qfwrite(oracle_out, &input, sizeof(input)); 172 | 173 | msg("Wrote test cases to file\n"); 174 | 175 | // write the index to file 176 | // the index is a list of entries, each consisting of a uint64 (fingerprint) and a uint64 (offset) 177 | uint32 index_sz = oracle.size(); 178 | qfwrite(oracle_out, &index_sz, sizeof(index_sz)); 179 | qoff64_t current_offset = 0; 180 | int n_written = 0; 181 | for ( const auto &entry : oracle ) 182 | { 183 | if ( n_written % REPORT_FREQ == 0 ) 184 | msg("%s: Wrote %d index entries\n", curtime().c_str(), n_written); 185 | n_written++; 186 | 187 | auto fingerprint = entry.first; 188 | auto bvset = entry.second; 189 | qfwrite(oracle_out, &fingerprint, sizeof(fingerprint)); 190 | qfwrite(oracle_out, &current_offset, sizeof(current_offset)); 191 | 192 | current_offset += bvset_sz_on_disk(bvset); 193 | } 194 | 195 | msg("Size of oracle on disk: %llu\n", current_offset); 196 | msg("Current file position: %llu\n", qftell(oracle_out)); 197 | 198 | // write the actual microinstructions to disk 199 | n_written = 0; 200 | for ( const auto &entry : oracle ) 201 | { 202 | if ( n_written % REPORT_FREQ == 0 ) 203 | msg("%s: Wrote %d microinstruction vectors\n", curtime().c_str(), n_written); 204 | n_written++; 205 | 206 | write_bvset_to_disk(oracle_out, entry.second); 207 | } 208 | 209 | msg("%s: Wrote %d microinstruction vectors\n", curtime().c_str(), n_written); 210 | msg("Current file position: %" FMT_64 "u\n", qftell(oracle_out)); 211 | return true; 212 | } 213 | -------------------------------------------------------------------------------- /file.hpp: -------------------------------------------------------------------------------- 1 | /* 2 | * Copyright (c) 2023 by Hex-Rays, support@hex-rays.com 3 | * ALL RIGHTS RESERVED. 4 | * 5 | * gooMBA plugin for Hex-Rays Decompiler. 
6 | * 7 | */ 8 | 9 | #pragma once 10 | #include 11 | 12 | // functions that convert huge files in a streaming fashion without using too much memory 13 | 14 | const int REPORT_FREQ = 10000; // how often we should report progress in the log 15 | // generates a file that is just a list of minsns 16 | void create_minsns_file(FILE *msynth_in, FILE *minsns_out); 17 | // given a minsns file, fingerprints each minsn and serializes it into the oracle 18 | bool create_oracle_file(FILE *minsns_in, FILE *oracle_out); -------------------------------------------------------------------------------- /generate_oracle.bat: -------------------------------------------------------------------------------- 1 | @if "%DEBUG%" == "" @echo off 2 | @rem ########################################################################## 3 | @rem 4 | @rem gooMBA oracle file generation script 5 | @rem 6 | @rem ########################################################################## 7 | @rem Set local scope for the variables with windows NT shell 8 | if "%OS%"=="Windows_NT" setlocal 9 | 10 | if .%1 == . goto usage 11 | set VD_MSYNTH_PATH=%~f1 12 | echo generating minsns file (step 1/2)... 13 | idat64 -A -Llog.txt tests/idb/mba_challenge.i64 14 | set VD_MSYNTH_PATH= 15 | set VD_MBA_MINSNS_PATH=%~dpnx1.b 16 | echo generating oracle file (step 2/2)... 17 | idat64 -A -Llog.txt tests/idb/mba_challenge.i64 18 | echo. >> log.txt 19 | echo finished! 20 | move %~dpnx1.b.c %~dpn1.oracle 21 | echo finished! 
Result is in %~dpn1.oracle 22 | tail log.txt 23 | exit /b 24 | :usage 25 | echo "Usage: generate_oracle.bat all_combined.txt" 26 | -------------------------------------------------------------------------------- /generate_oracle.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | # usage: ./generate_oracle.sh all_combined.txt 4 | # after the script finishes running, the oracle file will be available in all_combined.txt.oracle 5 | 6 | ( 7 | VD_MSYNTH_PATH=`realpath $1` ida64 -A -S`realpath script.py` -Llog.txt tests/idb/mba_challenge.i64 8 | VD_MBA_MINSNS_PATH=`realpath $1.b` ida64 -A -S`realpath script.py` -Llog.txt tests/idb/mba_challenge.i64 9 | mv $1.b.c $1.oracle 10 | echo -e "\nfinished! Result is in $1.oracle" >> log.txt 11 | ) & 12 | 13 | tail -F log.txt -------------------------------------------------------------------------------- /goomba.cfg: -------------------------------------------------------------------------------- 1 | 2 | // This configuration file is used by the mixed_bool_arith plugin, which 3 | // provides deobfuscation functionality for expressions obfuscated with 4 | // mixed boolean arithmetic expressions. 5 | 6 | // By default, the plugin only engages through a right-click menu option. 7 | // Set the below option to YES to make the plugin engage automatically 8 | // when the decompiler is invoked. 9 | MBA_RUN_AUTOMATICALLY = NO 10 | // The timeout in ms for z3 proofs. Set this to 0 to disable z3 proofs 11 | // entirely and assume simplifications are correct after heuristic checks. 12 | MBA_Z3_TIMEOUT = 1000 13 | // When z3 times out, should the simplification be assumed correct? 14 | MBA_Z3_ASSUME_TIMEOUTS_CORRECT = YES 15 | // Path to an MBA oracle. Leave this empty to disable the function 16 | // fingerprinting algorithm and use only linear methods. 
17 | MBA_ORACLE_PATH = ""; 18 | -------------------------------------------------------------------------------- /goomba.cpp: -------------------------------------------------------------------------------- 1 | /* 2 | * Copyright (c) 2023 by Hex-Rays, support@hex-rays.com 3 | * ALL RIGHTS RESERVED. 4 | * 5 | * gooMBA plugin for Hex-Rays Decompiler. 6 | * It deobfuscates the MBA (mixed boolean arithmetic) expressions. 7 | * 8 | */ 9 | 10 | #include 11 | 12 | #include "z3++_no_warn.h" 13 | #include "consts.hpp" 14 | #include "optimizer.hpp" 15 | #include "equiv_class.hpp" 16 | #include "file.hpp" 17 | #include 18 | #include 19 | 20 | struct plugin_ctx_t; 21 | 22 | //-------------------------------------------------------------------------- 23 | // returns true if the environment variables indicate the plugin should 24 | // always be enabled (i.e. in testing environments) 25 | inline bool always_on(void) 26 | { 27 | return qgetenv("VD_MBA_AUTO"); 28 | } 29 | 30 | //-------------------------------------------------------------------------- 31 | struct action_handler : public action_handler_t 32 | { 33 | plugin_ctx_t *plugmod; 34 | 35 | action_handler(plugin_ctx_t *_plugmod) : plugmod(_plugmod) {} 36 | 37 | virtual int idaapi activate(action_activation_ctx_t *ctx) override; 38 | virtual action_state_t idaapi update(action_update_ctx_t *) override 39 | { 40 | return AST_ENABLE; 41 | }; 42 | }; 43 | 44 | //-------------------------------------------------------------------------- 45 | //lint -e{958} padding of 7 bytes needed to align member on a 8 byte boundary 46 | struct plugin_ctx_t : public plugmod_t 47 | { 48 | bool run_automatically = false; 49 | qstring oracle_path; 50 | 51 | action_handler ah; 52 | optimizer_t optimizer; 53 | bool plugmod_active = false; 54 | plugin_ctx_t(); 55 | ~plugin_ctx_t() { term_hexrays_plugin(); } 56 | virtual bool idaapi run(size_t) override; 57 | }; 58 | 59 | //-------------------------------------------------------------------------- 60 
| static plugmod_t *idaapi init() 61 | { 62 | if ( !init_hexrays_plugin() ) 63 | return nullptr; // no decompiler 64 | 65 | const char *hxver = get_hexrays_version(); 66 | msg("Hex-rays version %s has been detected, %s ready to use\n", 67 | hxver, PLUGIN.wanted_name); 68 | 69 | plugin_ctx_t *plugmod = new plugin_ctx_t; 70 | 71 | const cfgopt_t cfgopts[] = 72 | { 73 | cfgopt_t("MBA_RUN_AUTOMATICALLY", &plugmod->run_automatically, 1), 74 | cfgopt_t("MBA_Z3_TIMEOUT", &plugmod->optimizer.z3_timeout), 75 | cfgopt_t("MBA_ORACLE_PATH", &plugmod->oracle_path), 76 | cfgopt_t("MBA_Z3_ASSUME_TIMEOUTS_CORRECT", &plugmod->optimizer.z3_assume_timeouts_correct, 1) 77 | }; 78 | 79 | read_config_file("goomba", cfgopts, qnumber(cfgopts), nullptr); 80 | 81 | if ( plugmod->oracle_path.empty() ) 82 | qgetenv("VD_MBA_ORACLE_PATH", &plugmod->oracle_path); 83 | 84 | if ( !plugmod->oracle_path.empty() ) 85 | { 86 | const char *path = plugmod->oracle_path.c_str(); 87 | FILE *fin = qfopen(path, "rb"); 88 | if ( fin != nullptr ) 89 | { 90 | plugmod->optimizer.equiv_classes = new equiv_class_finder_lazy_t(fin); 91 | msg("%s: loaded MBA oracle\n", path); 92 | } 93 | else 94 | { 95 | msg("%s: %s\n", path, qstrerror(-1)); 96 | } 97 | } 98 | 99 | qstring ifpath; 100 | if ( qgetenv("VD_MSYNTH_PATH", &ifpath) ) 101 | { 102 | qstring ofpath = ifpath + ".b"; 103 | FILE *fin = qfopen(ifpath.c_str(), "r"); 104 | if ( fin == nullptr ) 105 | error("%s: failed to open for reading", ifpath.c_str()); 106 | FILE *fout = qfopen(ofpath.c_str(), "wb"); 107 | if ( fout == nullptr ) 108 | error("%s: failed to open for writing", ofpath.c_str()); 109 | create_minsns_file(fin, fout); 110 | qfclose(fin); 111 | qfclose(fout); 112 | // do not save the IDB 113 | set_database_flag(DBFL_KILL); 114 | qexit(0); 115 | } 116 | 117 | if ( qgetenv("VD_MBA_MINSNS_PATH", &ifpath) ) 118 | { 119 | qstring ofpath = ifpath + ".c"; 120 | FILE *fin = qfopen(ifpath.c_str(), "rb"); 121 | if ( fin == nullptr ) 122 | error("%s: failed to 
open for reading", ifpath.c_str()); 123 | FILE *fout = qfopen(ofpath.c_str(), "wb"); 124 | if ( fout == nullptr ) 125 | error("%s: failed to open for writing", ofpath.c_str()); 126 | bool ok = create_oracle_file(fin, fout); 127 | qfclose(fin); 128 | qfclose(fout); 129 | if ( !ok ) 130 | error("%s: failed to process", ifpath.c_str()); 131 | // do not save the IDB 132 | set_database_flag(DBFL_KILL); 133 | qexit(0); 134 | } 135 | 136 | return plugmod; 137 | } 138 | 139 | //-------------------------------------------------------------------------- 140 | int idaapi action_handler::activate(action_activation_ctx_t *ctx) 141 | { 142 | vdui_t *vu = get_widget_vdui(ctx->widget); 143 | if ( vu != nullptr ) 144 | { 145 | plugmod->plugmod_active = true; 146 | vu->refresh_view(true); 147 | return 1; 148 | } 149 | return 0; 150 | } 151 | 152 | //-------------------------------------------------------------------------- 153 | // This callback handles various hexrays events. 154 | static ssize_t idaapi callback(void *ud, hexrays_event_t event, va_list va) 155 | { 156 | plugin_ctx_t *plugmod = (plugin_ctx_t *) ud; 157 | switch ( event ) 158 | { 159 | case hxe_microcode: // microcode has been generated 160 | { 161 | mba_t *mba = va_arg(va, mba_t *); 162 | if ( always_on() || plugmod->run_automatically ) 163 | plugmod->plugmod_active = true; 164 | if ( plugmod->plugmod_active ) 165 | mba->set_mba_flags2(MBA2_PROP_COMPLEX); // increase acceptable complexity 166 | } 167 | break; 168 | 169 | case hxe_populating_popup: 170 | { 171 | TWidget *widget = va_arg(va, TWidget *); 172 | TPopupMenu *popup = va_arg(va, TPopupMenu *); 173 | attach_action_to_popup(widget, popup, ACTION_NAME); 174 | } 175 | break; 176 | 177 | case hxe_glbopt: 178 | if ( plugmod->plugmod_active ) 179 | { 180 | mba_t *mba = va_arg(va, mba_t *); 181 | 182 | struct ida_local insn_optimize_t : public minsn_visitor_t 183 | { 184 | optimizer_t &optimizer; 185 | int cnt = 0; 186 | insn_optimize_t ( optimizer_t &o ) : 
optimizer(o) {} 187 | int idaapi visit_minsn() override 188 | { 189 | // msg("Optimizing %s\n", curins->dstr()); 190 | if ( optimizer.optimize_insn_recurse(curins) ) 191 | { 192 | cnt++; 193 | blk->mark_lists_dirty(); 194 | mba->dump_mba(true, "vd_mba success %a", curins->ea); 195 | } 196 | return 0; 197 | } 198 | }; 199 | 200 | insn_optimize_t visitor(plugmod->optimizer); 201 | mba->for_all_topinsns(visitor); 202 | 203 | if ( visitor.cnt != 0 ) 204 | { 205 | mba->verify(true); 206 | msg("Completed mba optimization pass, improved %d expressions\n", visitor.cnt); 207 | } 208 | plugmod->plugmod_active = false; 209 | mba->clr_mba_flags2(MBA2_PROP_COMPLEX); 210 | return MERR_LOOP; // restart optimization 211 | } 212 | break; 213 | 214 | default: 215 | break; 216 | } 217 | return 0; 218 | } 219 | 220 | //-------------------------------------------------------------------------- 221 | plugin_ctx_t::plugin_ctx_t() : ah(this) 222 | { 223 | install_hexrays_callback(callback, this); 224 | register_action(ACTION_DESC_LITERAL_PLUGMOD( 225 | ACTION_NAME, 226 | "De-obfuscate arithmetic expressions", 227 | &ah, 228 | this, 229 | nullptr, 230 | "Attempt to simplify Mixed Boolean Arithmetic-obfuscated expressions using gooMBA", 231 | -1)); 232 | } 233 | 234 | //-------------------------------------------------------------------------- 235 | bool idaapi plugin_ctx_t::run(size_t) 236 | { 237 | return true; 238 | } 239 | 240 | //-------------------------------------------------------------------------- 241 | static char comment[] = "gooMBA plugin for Hex-Rays decompiler"; 242 | 243 | //-------------------------------------------------------------------------- 244 | // 245 | // PLUGIN DESCRIPTION BLOCK 246 | // 247 | //-------------------------------------------------------------------------- 248 | plugin_t PLUGIN = 249 | { 250 | IDP_INTERFACE_VERSION, 251 | PLUGIN_MULTI // The plugin can work with multiple idbs in parallel 252 | | PLUGIN_HIDE, // no menu items in Edit, Plugins 253 | 
init, // initialize 254 | nullptr, 255 | nullptr, 256 | comment, // long comment about the plugin 257 | nullptr, // multiline help about the plugin 258 | "gooMBA plugin", // the preferred short name of the plugin 259 | nullptr, // the preferred hotkey to run the plugin 260 | }; 261 | -------------------------------------------------------------------------------- /heuristics.cpp: -------------------------------------------------------------------------------- 1 | /* 2 | * Copyright (c) 2023 by Hex-Rays, support@hex-rays.com 3 | * ALL RIGHTS RESERVED. 4 | * 5 | * gooMBA plugin for Hex-Rays Decompiler. 6 | * 7 | */ 8 | 9 | #include "z3++_no_warn.h" 10 | #include "heuristics.hpp" 11 | 12 | //------------------------------------------------------------------------- 13 | inline uint64 rand64() 14 | { 15 | uint32 r1 = rand(); 16 | uint32 r2 = rand(); 17 | return uint64(r1) << 32 | uint64(r2); 18 | } 19 | 20 | //------------------------------------------------------------------------- 21 | mcode_val_t gen_rand_mcode_val(int size) 22 | { 23 | if ( rand() > SPECIAL_PROBABILITY * RAND_MAX ) 24 | { 25 | // select from uniform random distribution 26 | return mcode_val_t(rand64(), size); 27 | } 28 | else 29 | { 30 | // select from special cases 31 | return mcode_val_t(SPECIAL[rand() % NUM_SPECIAL], size); 32 | } 33 | } 34 | 35 | //------------------------------------------------------------------------- 36 | // guesses whether or not the instruction is MBA 37 | bool is_mba(const minsn_t &insn) 38 | { 39 | struct mba_opc_counter_t : public minsn_visitor_t 40 | { 41 | int bool_cnt = 0; 42 | int arith_cnt = 0; 43 | int idaapi visit_minsn(void) override 44 | { 45 | switch ( curins->opcode ) 46 | { 47 | case m_neg: 48 | case m_add: 49 | case m_sub: 50 | case m_mul: 51 | case m_udiv: 52 | case m_sdiv: 53 | case m_umod: 54 | case m_smod: 55 | case m_shl: 56 | case m_shr: 57 | arith_cnt++; 58 | break; 59 | case m_bnot: 60 | case m_or: 61 | case m_and: 62 | case m_xor: 63 | case m_sar: 
64 | bool_cnt++; 65 | break; 66 | default: 67 | return 0; 68 | } 69 | return bool_cnt >= MIN_MBA_BOOL_OPS && arith_cnt >= MIN_MBA_ARITH_OPS; 70 | } 71 | }; 72 | 73 | if ( is_mcode_xdsu(insn.opcode) ) 74 | return false; // exclude xdsu, it is better to optimize its operand 75 | 76 | if ( insn.d.size > 8 ) 77 | return false; // we only support 64-bit math 78 | 79 | mba_opc_counter_t cntr; 80 | return CONST_CAST(minsn_t*)(&insn)->for_all_insns(cntr) != 0; 81 | } 82 | 83 | //------------------------------------------------------------------------- 84 | // runs a battery of random test cases against both expressions to see if they are equivalent 85 | bool probably_equivalent(const minsn_t &insn, const candidate_expr_t &expr) 86 | { 87 | for ( int i = 0; i < NUM_TEST_CASES; i++ ) 88 | { 89 | mcode_emu_rand_vals_t emu; 90 | mcode_val_t insn_eval = emu.minsn_value(insn); 91 | mcode_val_t expr_eval = expr.evaluate(emu); 92 | 93 | if ( insn_eval != expr_eval ) 94 | return false; 95 | } 96 | 97 | return true; 98 | } 99 | 100 | //------------------------------------------------------------------------- 101 | // runs a battery of random test cases against both expressions to see if they are equivalent 102 | bool probably_equivalent(const minsn_t &a, const minsn_t &b) 103 | { 104 | for ( int i = 0; i < NUM_TEST_CASES; i++ ) 105 | { 106 | mcode_emu_rand_vals_t emu; 107 | mcode_val_t insn_eval = emu.minsn_value(a); 108 | mcode_val_t expr_eval = emu.minsn_value(b); 109 | 110 | if ( insn_eval != expr_eval ) 111 | return false; 112 | } 113 | 114 | return true; 115 | } 116 | 117 | //------------------------------------------------------------------------- 118 | // estimates the "complexity" of a given instruction 119 | int score_complexity(const minsn_t &insn) 120 | { 121 | struct ida_local complexity_counter_t : public minsn_visitor_t 122 | { 123 | int cnt = 0; 124 | int idaapi visit_minsn() override 125 | { 126 | cnt++; 127 | return 0; 128 | } 129 | }; 130 | complexity_counter_t cc; 
131 | CONST_CAST(minsn_t&)(insn).for_all_insns(cc); 132 | return cc.cnt; 133 | } 134 | -------------------------------------------------------------------------------- /heuristics.hpp: -------------------------------------------------------------------------------- 1 | /* 2 | * Copyright (c) 2023 by Hex-Rays, support@hex-rays.com 3 | * ALL RIGHTS RESERVED. 4 | * 5 | * gooMBA plugin for Hex-Rays Decompiler. 6 | * 7 | */ 8 | 9 | #pragma once 10 | #include "mcode_emu.hpp" 11 | #include "linear_exprs.hpp" 12 | 13 | const uint64 SPECIAL[] = { 0, 1, 0xffffffffffffffff }; 14 | const int NUM_SPECIAL = qnumber(SPECIAL); 15 | const double SPECIAL_PROBABILITY = 0.2; // probability of selecting a special number when sampling 16 | 17 | // an expression must have at least this many subinstructions of each type to count as an MBA 18 | const int MIN_MBA_BOOL_OPS = 1; 19 | const int MIN_MBA_ARITH_OPS = 1; 20 | 21 | // number of test cases to run when checking if an instruction matches the candidate expression's behavior 22 | const int NUM_TEST_CASES = 256; 23 | 24 | //------------------------------------------------------------------------- 25 | mcode_val_t gen_rand_mcode_val(int size); 26 | 27 | //------------------------------------------------------------------------- 28 | // emulates the microcode, assigning random values to unknown variables 29 | // (but keeping them consistent across executions) 30 | struct mcode_emu_rand_vals_t : public mcode_emulator_t 31 | { 32 | std::map<const mop_t, mcode_val_t> assigned_vals; 33 | 34 | mcode_val_t get_var_val(const mop_t &mop) override 35 | { 36 | // check that the mop is indeed a variable 37 | mopt_t t = mop.t; 38 | QASSERT(30672, t == mop_r || t == mop_S || t == mop_v || t == mop_l); 39 | 40 | auto assignment = assigned_vals.find(mop); 41 | if ( assignment != assigned_vals.end() ) 42 | return assignment->second; 43 | 44 | mcode_val_t new_val = gen_rand_mcode_val(mop.size); 45 | assigned_vals.insert( { mop, new_val } ); 46 | return new_val; 47 | } 48 | }; 49 | 
50 | //------------------------------------------------------------------------- 51 | bool is_mba(const minsn_t &insn); 52 | 53 | //------------------------------------------------------------------------- 54 | bool probably_equivalent(const minsn_t &insn, const candidate_expr_t &expr); 55 | bool probably_equivalent(const minsn_t &a, const minsn_t &b); 56 | 57 | //------------------------------------------------------------------------- 58 | // estimates the "complexity" of a given instruction 59 | int score_complexity(const minsn_t &insn); 60 | 61 | struct minsn_complexity_cmptr_t 62 | { 63 | bool operator()(const minsn_t *a, const minsn_t *b) const 64 | { 65 | auto score_a = score_complexity(*a); 66 | auto score_b = score_complexity(*b); 67 | return score_a < score_b; 68 | } 69 | }; 70 | 71 | inline mopvec_t get_input_mops(const minsn_t &insn) 72 | { 73 | default_zero_mcode_emu_t emu; 74 | emu.minsn_value(insn); // populate emu.assigned_vals 75 | 76 | mopvec_t res; 77 | res.reserve(emu.assigned_vals.size()); 78 | for ( auto const &entry : emu.assigned_vals ) 79 | res.push_back(entry.first); 80 | 81 | std::sort(res.begin(), res.end()); 82 | return res; 83 | } -------------------------------------------------------------------------------- /images/mba1_after.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/HexRaysSA/goomba/bf1e49866f3cbf605b1069f053edd9d126de1372/images/mba1_after.png -------------------------------------------------------------------------------- /images/mba1_before.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/HexRaysSA/goomba/bf1e49866f3cbf605b1069f053edd9d126de1372/images/mba1_before.png -------------------------------------------------------------------------------- /lin_conj_exprs.hpp: -------------------------------------------------------------------------------- 1 | /* 2 | * Copyright (c) 2023 by 
Hex-Rays, support@hex-rays.com 3 | * ALL RIGHTS RESERVED. 4 | * 5 | * gooMBA plugin for Hex-Rays Decompiler. 6 | * 7 | */ 8 | 9 | #pragma once 10 | #include 11 | #include "linear_exprs.hpp" 12 | #include "mcode_emu.hpp" 13 | 14 | typedef qvector<mcode_val_t> coeff_vector_t; 15 | typedef qvector<mcode_val_t> eval_trace_t; 16 | const int LIN_CONJ_MAX_VARS = 16; 17 | 18 | // represents a linear combination of conjunctions 19 | class lin_conj_expr_t : public candidate_expr_t 20 | { 21 | protected: 22 | mopvec_t mops; 23 | coeff_vector_t coeffs; 24 | eval_trace_t eval_trace; 25 | 26 | public: 27 | //------------------------------------------------------------------------- 28 | const char *dstr() const override 29 | { 30 | static char buf[MAXSTR]; 31 | char *ptr = buf; 32 | char *end = buf + sizeof(buf); 33 | 34 | ptr += qsnprintf(ptr, end-ptr, "0x%" FMT_64 "x", coeffs[0].val); 35 | for ( uint32 i = 1; i < coeffs.size(); i++ ) 36 | { 37 | if ( coeffs[i].val == 0 ) 38 | continue; 39 | ptr += qsnprintf(ptr, end-ptr, " + 0x%" FMT_64 "x(", coeffs[i].val); 40 | ptr = print_assignment(ptr, end, i); 41 | APPEND(ptr, end, ")"); 42 | } 43 | return buf; 44 | } 45 | 46 | //------------------------------------------------------------------------- 47 | // each boolean assignment is represented as a uint32, where the nth bit 48 | // represents the 0/1 value of the corresponding variable 49 | char *print_assignment(char *ptr, char *end, uint32 assn) const 50 | { 51 | bool first_printed = false; 52 | for ( int i = 0; i < mops.size(); i++ ) 53 | { 54 | if ( ((assn >> i) & 1) != 0 ) 55 | { 56 | if ( first_printed ) 57 | APPCHAR(ptr, end, '&'); 58 | APPEND(ptr, end, mops[i].dstr()); 59 | first_printed = true; 60 | } 61 | } 62 | return ptr; 63 | } 64 | 65 | //------------------------------------------------------------------------- 66 | // each boolean assignment is represented as a uint32, where the nth bit 67 | // represents the 0/1 value of the corresponding variable 68 | void apply_assignment(uint32 assn, 
std::map<const mop_t, mcode_val_t> &dest) 69 | { 70 | // recall std::map keeps keys in sorted order 71 | int curr_idx = 0; 72 | for ( auto &kv : dest ) 73 | { 74 | kv.second.val = (assn >> curr_idx) & 1; 75 | curr_idx++; 76 | } 77 | } 78 | 79 | //------------------------------------------------------------------------- 80 | // the i'th index in output_vals contains the output value corresponding to 81 | // the i'th assignment, where the i'th assignment is defined as in 82 | // apply_assignment. 83 | // the return value of this function is the corresponding coefficients in 84 | // the linear combination of conjunctions that would yield the output 85 | // behavior. The coefficients are ordered based on the same indexing pattern. 86 | void compute_coeffs(coeff_vector_t &dest, const qvector<mcode_val_t> &output_vals) 87 | { 88 | dest = coeff_vector_t(); 89 | dest.reserve(output_vals.size()); 90 | dest.push_back(output_vals[0]); // the zero coeff = the zero assignment 91 | 92 | // we can think of the problem as solving the linear equation Ax = y, 93 | // where y is the output_vals and x is the desired coefficient set. 94 | // A is defined as the binary matrix where row numbers represent 95 | // assignments and columns represent conjunctions. See the SiMBA paper 96 | // for more details. 97 | // We do an additional simplification, noting that 98 | // A_{ij} = (i & j) == j. Also, we use forward substitution since A is a 99 | // lower-triangular matrix. 
100 | 101 | for ( uint32 i = 1; i < output_vals.size(); i++ ) 102 | { 103 | mcode_val_t curr_coeff = output_vals[i]; 104 | for ( uint32 j = 0; j < i; j++ ) 105 | { 106 | if ( (i & j) == j ) 107 | curr_coeff = curr_coeff - dest[j]; 108 | } 109 | dest.push_back(curr_coeff); 110 | } 111 | } 112 | 113 | //------------------------------------------------------------------------- 114 | void recompute_coeffs() 115 | { 116 | compute_coeffs(coeffs, eval_trace); 117 | } 118 | 119 | //------------------------------------------------------------------------- 120 | mcode_val_t evaluate(mcode_emulator_t &emu) const override 121 | { 122 | minsn_t *minsn = to_minsn(0); 123 | mcode_val_t res = emu.minsn_value(*minsn); 124 | delete minsn; 125 | return res; 126 | } 127 | 128 | //------------------------------------------------------------------------- 129 | // eliminates all variables that are not needed in the expression 130 | void eliminate_variables() 131 | { 132 | for ( int i = 0; i < mops.size(); i++ ) 133 | { 134 | if ( can_eliminate_variable(i) ) 135 | { 136 | eliminate_variable(i); 137 | i--; // the mop at mop[i] no longer exists 138 | } 139 | } 140 | } 141 | 142 | //------------------------------------------------------------------------- 143 | // creates a linear combination of conjunctions based on the minsn behavior 144 | lin_conj_expr_t(const minsn_t &insn) 145 | { 146 | default_zero_mcode_emu_t emu; 147 | mcode_val_t const_term = emu.minsn_value(insn); 148 | 149 | int nvars = emu.assigned_vals.size(); 150 | if ( nvars > LIN_CONJ_MAX_VARS ) 151 | throw "lin_conj_expr_t: too many input variables"; 152 | 153 | uint32 max_assignment = 1 << nvars; 154 | // we have already gotten the value for the all-zeroes assignment, which is const_term 155 | eval_trace.push_back(const_term); 156 | eval_trace.reserve(max_assignment); 157 | 158 | for ( uint32 assn = 1; assn < max_assignment; assn++ ) 159 | { 160 | apply_assignment(assn, emu.assigned_vals); 161 | mcode_val_t output_val = 
emu.minsn_value(insn); 162 | 163 | eval_trace.push_back(output_val); 164 | } 165 | 166 | compute_coeffs(coeffs, eval_trace); 167 | mops.reserve(emu.assigned_vals.size()); 168 | for ( const auto &kv : emu.assigned_vals ) 169 | mops.push_back(kv.first); 170 | 171 | QASSERT(30679, coeffs.size() == (1ull << mops.size())); 172 | } 173 | 174 | //------------------------------------------------------------------------- 175 | z3::expr to_smt(z3_converter_t &cvtr) const override 176 | { 177 | minsn_t *minsn = to_minsn(0); 178 | z3::expr res = cvtr.minsn_to_expr(*minsn); 179 | delete minsn; 180 | return res; 181 | } 182 | 183 | //------------------------------------------------------------------------- 184 | // converts an assignment to the corresponding conjunction. e.g. 185 | // 0b1101 => x0 & x2 & x3 186 | minsn_t *assn_to_minsn(uint32 assn, int size, ea_t ea) const 187 | { 188 | QASSERT(30680, assn != 0); 189 | minsn_t *res = nullptr; 190 | 191 | for ( int i = 0; i < mops.size(); i++ ) 192 | { 193 | if ( ((assn >> i) & 1) != 0 ) 194 | { 195 | if ( res == nullptr ) 196 | { 197 | res = resize_mop(ea, mops[i], size, false); 198 | } 199 | else 200 | { 201 | minsn_t *new_res = new minsn_t(ea); 202 | new_res->opcode = m_and; 203 | new_res->l.create_from_insn(res); 204 | minsn_t *rsz = resize_mop(ea, mops[i], size, false); 205 | new_res->r.create_from_insn(rsz); 206 | delete rsz; 207 | new_res->d.size = size; 208 | 209 | delete res; 210 | res = new_res; 211 | } 212 | } 213 | } 214 | 215 | QASSERT(30681, res->opcode != m_ldc); 216 | 217 | return res; 218 | } 219 | 220 | //------------------------------------------------------------------------- 221 | minsn_t *to_minsn(ea_t ea) const override 222 | { 223 | minsn_t *res = new minsn_t(ea); 224 | res->opcode = m_ldc; 225 | res->l.make_number(coeffs[0].val, coeffs[0].size, ea); 226 | res->r.zero(); 227 | res->d.size = coeffs[0].size; 228 | 229 | for ( uint32 assn = 1; assn < coeffs.size(); assn++ ) 230 | { 231 | auto coeff = 
coeffs[assn]; 232 | if ( coeff.val == 0 ) 233 | continue; 234 | 235 | // mul = coeff * F(mops) 236 | minsn_t mul(ea); 237 | mul.opcode = m_mul; 238 | mul.l.make_number(coeff.val, coeff.size); 239 | minsn_t *F = assn_to_minsn(assn, coeff.size, ea); 240 | mul.r.create_from_insn(F); 241 | delete F; 242 | mul.d.size = coeff.size; 243 | 244 | // add = res + mul 245 | minsn_t *add = new minsn_t(ea); 246 | add->opcode = m_add; 247 | add->l.create_from_insn(res); 248 | add->r.create_from_insn(&mul); 249 | add->d.size = coeff.size; 250 | 251 | delete res; // mop_t::create_from_insn makes a copy of the insn 252 | res = add; 253 | } 254 | 255 | return res; 256 | } 257 | 258 | private: 259 | //------------------------------------------------------------------------- 260 | // returns true if the variable can be eliminated safely 261 | // i.e. all terms containing it have coeff = 0 262 | bool can_eliminate_variable(int idx) 263 | { 264 | for ( uint32 assn = 0; assn < coeffs.size(); assn++ ) 265 | { 266 | if ( ((assn >> idx) & 1) != 0 && coeffs[assn].val != 0 ) 267 | return false; 268 | } 269 | return true; 270 | } 271 | 272 | //------------------------------------------------------------------------- 273 | // removes the variable from the expression 274 | // make sure to check can_eliminate_variable before calling 275 | void eliminate_variable(int idx) 276 | { 277 | coeff_vector_t new_coeffs; 278 | eval_trace_t new_evals; 279 | new_coeffs.reserve(coeffs.size() / 2); 280 | new_evals.reserve(coeffs.size() / 2); 281 | for ( uint32 assn = 0; assn < coeffs.size(); assn++ ) 282 | { 283 | if ( ((assn >> idx) & 1) == 0 ) 284 | { 285 | new_coeffs.push_back(coeffs[assn]); 286 | new_evals.push_back(eval_trace[assn]); 287 | } 288 | else 289 | { 290 | QASSERT(30682, coeffs[assn].val == 0); 291 | } 292 | } 293 | coeffs = new_coeffs; 294 | eval_trace = new_evals; 295 | mops.erase(mops.begin() + idx); 296 | } 297 | }; 298 | 
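The forward substitution used by `compute_coeffs` in `lin_conj_exprs.hpp` can be tried outside IDA. The following is a minimal sketch, assuming plain `uint64_t` values (wrapping modulo 2^64) in place of `mcode_val_t`; the names `sketch_compute_coeffs` and `sketch_xor_outputs` are hypothetical and not part of gooMBA.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical standalone version of compute_coeffs (not part of gooMBA).
// outputs[i] is the value of the expression under assignment i, where
// bit k of i is the 0/1 value given to variable k.
// The result c satisfies: outputs[i] == sum of c[j] over every j whose set
// bits are a subset of the set bits of i. c[0] is the constant term and
// c[j] is the coefficient of the conjunction of the variables selected by j.
inline std::vector<uint64_t> sketch_compute_coeffs(const std::vector<uint64_t> &outputs)
{
  std::vector<uint64_t> coeffs;
  coeffs.reserve(outputs.size());
  for ( uint32_t i = 0; i < outputs.size(); i++ )
  {
    uint64_t c = outputs[i];
    for ( uint32_t j = 0; j < i; j++ )
    {
      if ( (i & j) == j )   // conjunction j evaluates to 1 under assignment i
        c -= coeffs[j];     // unsigned arithmetic wraps, like mcode_val_t
    }
    coeffs.push_back(c);
  }
  return coeffs;
}

// Builds the truth table of x ^ y on 0/1 inputs (x is bit 0, y is bit 1).
inline std::vector<uint64_t> sketch_xor_outputs()
{
  std::vector<uint64_t> out;
  for ( uint32_t assn = 0; assn < 4; assn++ )
  {
    uint64_t x = assn & 1;
    uint64_t y = (assn >> 1) & 1;
    out.push_back(x ^ y);
  }
  return out;
}
```

Feeding the table from `sketch_xor_outputs` into `sketch_compute_coeffs` yields the coefficients of `x + y - 2*(x & y)`, the familiar rewriting of `x ^ y`. The sketch only reproduces the coefficient solve; it does not check that the result is equivalent on full-width inputs.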
-------------------------------------------------------------------------------- /linear_exprs.cpp: -------------------------------------------------------------------------------- 1 | /* 2 | * Copyright (c) 2023 by Hex-Rays, support@hex-rays.com 3 | * ALL RIGHTS RESERVED. 4 | * 5 | * gooMBA plugin for Hex-Rays Decompiler. 6 | * 7 | */ 8 | 9 | #include "z3++_no_warn.h" 10 | #include "linear_exprs.hpp" 11 | 12 | //------------------------------------------------------------------------- 13 | const char *linear_expr_t::dstr() const 14 | { 15 | static char buf[MAXSTR]; 16 | char *ptr = buf; 17 | char *end = buf + sizeof(buf); 18 | 19 | ptr += qsnprintf(ptr, end-ptr, "0x%" FMT_64 "x", const_term.val); 20 | for ( const auto &term : coeffs ) 21 | { 22 | if ( term.second.val == 0 ) 23 | continue; 24 | ptr += qsnprintf(ptr, end-ptr, " + 0x%" FMT_64 "x*", term.second.val); 25 | if ( term.first.size < const_term.size ) 26 | { 27 | ptr += qsnprintf(ptr, end-ptr, "%s(%s)", 28 | sext.count(term.first) ? 
"SEXT" : "ZEXT", 29 | term.first.dstr()); 30 | } 31 | else if ( term.first.size > const_term.size ) 32 | { 33 | ptr += qsnprintf(ptr, end-ptr, "TRUNC(%s)", term.first.dstr()); 34 | } 35 | else 36 | { 37 | APPEND(ptr, end, term.first.dstr()); 38 | } 39 | } 40 | return buf; 41 | } 42 | 43 | //------------------------------------------------------------------------- 44 | linear_expr_t::linear_expr_t(const minsn_t &insn) // creates a linear expression based on the instruction behavior 45 | { 46 | default_zero_mcode_emu_t emu; 47 | const_term = emu.minsn_value(insn); // the value when all variables are assigned to zero 48 | 49 | for ( auto &p : emu.assigned_vals ) 50 | { 51 | mop_t mop = p.first; 52 | p.second = mcode_val_t(1, mop.size); 53 | mcode_val_t coeff = emu.minsn_value(insn) - const_term; 54 | 55 | if ( mop.size < const_term.size ) 56 | { 57 | // check if a sign extension is necessary 58 | p.second = mcode_val_t(-1, mop.size); 59 | mcode_val_t eval = emu.minsn_value(insn); // eval = const + (-1)*coeff if x was sign extended 60 | 61 | if ( const_term - eval == coeff ) 62 | sext.insert(mop); 63 | } 64 | 65 | coeffs.insert( { mop, coeff } ); // use the coeff measured with the variable set to 1 66 | p.second = mcode_val_t(0, mop.size); 67 | } 68 | } 69 | 70 | //------------------------------------------------------------------------- 71 | mcode_val_t linear_expr_t::evaluate(mcode_emulator_t &emu) const 72 | { 73 | mcode_val_t res = const_term; 74 | 75 | for ( const auto &term : coeffs ) 76 | { 77 | const mop_t &mop = term.first; 78 | const mcode_val_t &coeff = term.second; 79 | mcode_val_t mop_val = emu.get_var_val(mop); 80 | 81 | // extend the value to 64 bits first 82 | uint64 ext_val = sext.count(mop) ? 
mop_val.signed_val() : mop_val.val; 83 | 84 | res = res + coeff * mcode_val_t(ext_val, coeff.size); 85 | } 86 | 87 | return res; 88 | } 89 | 90 | //------------------------------------------------------------------------- 91 | z3::expr linear_expr_t::to_smt(z3_converter_t &cvtr) const 92 | { 93 | z3::expr res = cvtr.mcode_val_to_expr(const_term); 94 | 95 | for ( const auto &term : coeffs ) 96 | { 97 | const mop_t &mop = term.first; 98 | const mcode_val_t &coeff = term.second; 99 | z3::expr mop_expr = cvtr.mop_to_expr(mop); 100 | 101 | z3::expr ext_expr = cvtr.bv_resize_to_len(mop_expr, const_term.size * 8, sext.count(mop) != 0); 102 | 103 | res = res 104 | + cvtr.mcode_val_to_expr(coeff) * ext_expr; 105 | } 106 | 107 | return res; 108 | } 109 | 110 | //------------------------------------------------------------------------- 111 | minsn_t *linear_expr_t::to_minsn(ea_t ea) const 112 | { 113 | minsn_t *res = new minsn_t(ea); 114 | res->opcode = m_ldc; 115 | res->l.make_number(const_term.val, const_term.size); 116 | res->r.zero(); 117 | res->d.size = const_term.size; 118 | 119 | for ( const auto &term : coeffs ) 120 | { 121 | const mop_t &mop = term.first; 122 | const mcode_val_t &coeff = term.second; 123 | 124 | if ( coeff.val == 0 ) 125 | continue; 126 | 127 | // mul = coeff * ext(mop) 128 | minsn_t mul(ea); 129 | mul.opcode = m_mul; 130 | mul.l.make_number(coeff.val, coeff.size); 131 | minsn_t *rsz = resize_mop(ea, mop, const_term.size, sext.count(mop) != 0); 132 | mul.r.create_from_insn(rsz); 133 | delete rsz; 134 | 135 | mul.d.size = const_term.size; 136 | 137 | // add = res + mul 138 | minsn_t *add = new minsn_t(ea); 139 | add->opcode = m_add; 140 | add->l.create_from_insn(res); 141 | add->r.create_from_insn(&mul); 142 | add->d.size = const_term.size; 143 | 144 | delete res; // mop_t::create_from_insn makes a copy of the insn 145 | res = add; 146 | } 147 | 148 | return res; 149 | } 150 | 
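The probing strategy of the `linear_expr_t` constructor in `linear_exprs.cpp` (evaluate with every variable at zero to get the constant term, then set one variable at a time to 1 to get its coefficient) can be sketched without the IDA SDK. This is an illustration under stated assumptions: all values are 64-bit, so the sign-extension check is omitted, and `sketch_linear_t`, `sketch_fit_linear` and `sketch_eval` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical stand-in for linear_expr_t: const_term + sum(coeffs[i] * x_i).
struct sketch_linear_t
{
  uint64_t const_term = 0;
  std::vector<uint64_t> coeffs;
};

// Fits a linear candidate to the black-box function f over nvars inputs.
// f plays the role of emu.minsn_value(insn); the vector plays the role of
// emu.assigned_vals.
inline sketch_linear_t sketch_fit_linear(
        const std::function<uint64_t(const std::vector<uint64_t> &)> &f,
        size_t nvars)
{
  std::vector<uint64_t> in(nvars, 0);
  sketch_linear_t res;
  res.const_term = f(in);                 // value when all variables are zero
  for ( size_t i = 0; i < nvars; i++ )
  {
    in[i] = 1;                            // probe variable i alone
    res.coeffs.push_back(f(in) - res.const_term);
    in[i] = 0;                            // restore before the next probe
  }
  return res;
}

// Evaluates the fitted candidate on arbitrary inputs (wrapping arithmetic).
inline uint64_t sketch_eval(const sketch_linear_t &e, const std::vector<uint64_t> &in)
{
  uint64_t r = e.const_term;
  for ( size_t i = 0; i < e.coeffs.size(); i++ )
    r += e.coeffs[i] * in[i];
  return r;
}
```

For an input such as `(x ^ y) + 2*(x & y) + 5`, the fit returns a constant term of 5 and a coefficient of 1 for each variable, i.e. `x + y + 5`. The fit is only a candidate: it matches the original at the probed points, and equivalence on all inputs has to be established separately.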
-------------------------------------------------------------------------------- /linear_exprs.hpp: -------------------------------------------------------------------------------- 1 | /* 2 | * Copyright (c) 2023 by Hex-Rays, support@hex-rays.com 3 | * ALL RIGHTS RESERVED. 4 | * 5 | * gooMBA plugin for Hex-Rays Decompiler. 6 | * 7 | */ 8 | 9 | #pragma once 10 | #include <hexrays.hpp> 11 | 12 | #include "smt_convert.hpp" 13 | #include "mcode_emu.hpp" 14 | 15 | //------------------------------------------------------------------------- 16 | class candidate_expr_t 17 | { 18 | public: 19 | virtual ~candidate_expr_t() {} 20 | virtual mcode_val_t evaluate(mcode_emulator_t &emu) const = 0; 21 | virtual z3::expr to_smt(z3_converter_t &converter) const = 0; 22 | virtual minsn_t *to_minsn(ea_t ea) const = 0; 23 | virtual const char *dstr() const = 0; 24 | }; 25 | 26 | //------------------------------------------------------------------------- 27 | // resize_mop generates a minsn that resizes the source operand (truncates or extends) 28 | inline minsn_t *resize_mop(ea_t ea, const mop_t &mop, int dest_sz, bool sext) 29 | { 30 | minsn_t *res = new minsn_t(ea); 31 | if ( dest_sz == mop.size ) 32 | res->opcode = m_mov; 33 | else if ( dest_sz < mop.size ) 34 | res->opcode = m_low; 35 | else 36 | res->opcode = sext ? 
m_xds : m_xdu; 37 | 38 | res->l = mop; 39 | res->d.size = dest_sz; 40 | return res; 41 | } 42 | 43 | //------------------------------------------------------------------------- 44 | // this emulator automatically assigns variables to 0 45 | // after the first run, the assigned_vals field can be modified 46 | // and the emulation can be rerun to obtain coefficients 47 | class default_zero_mcode_emu_t : public mcode_emulator_t 48 | { 49 | public: 50 | std::map<mop_t, mcode_val_t> assigned_vals; 51 | 52 | mcode_val_t get_var_val(const mop_t &mop) override 53 | { 54 | // check that the mop is indeed a variable 55 | mopt_t t = mop.t; 56 | QASSERT(30695, t == mop_r || t == mop_S || t == mop_v || t == mop_l); 57 | 58 | auto p = assigned_vals.find(mop); 59 | if ( p != assigned_vals.end() ) 60 | return p->second; 61 | 62 | mcode_val_t new_val = mcode_val_t(0, mop.size); 63 | assigned_vals.insert( { mop, new_val } ); 64 | return new_val; 65 | } 66 | }; 67 | 68 | //------------------------------------------------------------------------- 69 | class linear_expr_t : public candidate_expr_t 70 | { 71 | public: 72 | mcode_val_t const_term { 0, 1 }; 73 | std::map<mop_t, mcode_val_t> coeffs; 74 | std::set<mop_t> sext; 75 | 76 | const char *dstr() const override; 77 | linear_expr_t(const minsn_t &insn); 78 | mcode_val_t evaluate(mcode_emulator_t &emu) const override; 79 | z3::expr to_smt(z3_converter_t &cvtr) const override; 80 | minsn_t *to_minsn(ea_t ea) const override; 81 | }; 82 | -------------------------------------------------------------------------------- /makefile: -------------------------------------------------------------------------------- 1 | PROC=goomba 2 | 3 | GOALS += $(R)libz3$(DLLEXT) 4 | O2=heuristics 5 | O3=smt_convert 6 | O4=linear_exprs 7 | O5=msynth_parser 8 | O6=bitwise_expr_lookup_tbl 9 | O7=optimizer 10 | O8=equiv_class 11 | O9=file 12 | 13 | CONFIGS=goomba.cfg 14 | include ../plugin.mak 15 | 16 | ifeq ($(THIRD_PARTY),) 17 | # building outside of Hex-Rays tree, use a local z3 build 18 | Z3_BIN = 
z3/bin/ 19 | Z3_INCLUDE = z3/include/ 20 | endif 21 | 22 | ifdef __MAC__ 23 | POSTACTION=install_name_tool -change libz3.dylib @executable_path/libz3.dylib $@ 24 | endif 25 | 26 | ifdef __NT__ 27 | # link to the import library on Windows 28 | STDLIBS += $(Z3_BIN)libz3.lib 29 | else 30 | # link directly to dylib/shared object on Unix 31 | STDLIBS += -L$(R) -lz3 32 | endif 33 | 34 | $(F)$(PROC)$(O): CC_INCP += $(Z3_INCLUDE) $(Z3_INCLUDE)c++ 35 | $(F)$(O2)$(O): CC_INCP += $(Z3_INCLUDE) $(Z3_INCLUDE)c++ 36 | $(F)$(O3)$(O): CC_INCP += $(Z3_INCLUDE) $(Z3_INCLUDE)c++ 37 | $(F)$(O4)$(O): CC_INCP += $(Z3_INCLUDE) $(Z3_INCLUDE)c++ 38 | $(F)$(O5)$(O): CC_INCP += $(Z3_INCLUDE) $(Z3_INCLUDE)c++ 39 | $(F)$(O6)$(O): CC_INCP += $(Z3_INCLUDE) $(Z3_INCLUDE)c++ 40 | $(F)$(O7)$(O): CC_INCP += $(Z3_INCLUDE) $(Z3_INCLUDE)c++ 41 | $(F)$(O8)$(O): CC_INCP += $(Z3_INCLUDE) $(Z3_INCLUDE)c++ 42 | $(F)$(O9)$(O): CC_INCP += $(Z3_INCLUDE) $(Z3_INCLUDE)c++ 43 | $(F)$(PROC)$(O): $(R)libz3$(DLLEXT) 44 | 45 | $(R)libz3$(DLLEXT): $(Z3_BIN)libz3$(DLLEXT) 46 | $(Q)$(CP) $? 
$@ 47 | 48 | # MAKEDEP dependency list ------------------ 49 | $(F)bitwise_expr_lookup_tbl$(O): $(I)bitrange.hpp $(I)bytes.hpp \ 50 | $(I)config.hpp $(I)fpro.h $(I)funcs.hpp $(I)gdl.hpp \ 51 | $(I)hexrays.hpp $(I)ida.hpp $(I)idp.hpp $(I)ieee.h \ 52 | $(I)kernwin.hpp $(I)lines.hpp $(I)llong.hpp \ 53 | $(I)loader.hpp $(I)nalt.hpp $(I)name.hpp $(I)netnode.hpp \ 54 | $(I)pro.h $(I)range.hpp $(I)segment.hpp $(I)typeinf.hpp \ 55 | $(I)ua.hpp $(I)xref.hpp bitwise_expr_lookup_tbl.cpp \ 56 | bitwise_expr_lookup_tbl.hpp consts.hpp linear_exprs.hpp \ 57 | mcode_emu.hpp minsn_template.hpp smt_convert.hpp \ 58 | z3++_no_warn.h 59 | $(F)equiv_class$(O): $(I)bitrange.hpp $(I)bytes.hpp $(I)config.hpp \ 60 | $(I)fpro.h $(I)funcs.hpp $(I)gdl.hpp $(I)hexrays.hpp \ 61 | $(I)ida.hpp $(I)idp.hpp $(I)ieee.h $(I)kernwin.hpp \ 62 | $(I)lines.hpp $(I)llong.hpp $(I)loader.hpp $(I)nalt.hpp \ 63 | $(I)name.hpp $(I)netnode.hpp $(I)pro.h $(I)range.hpp \ 64 | $(I)segment.hpp $(I)typeinf.hpp $(I)ua.hpp $(I)xref.hpp \ 65 | bitwise_expr_lookup_tbl.hpp consts.hpp equiv_class.cpp \ 66 | equiv_class.hpp heuristics.hpp lin_conj_exprs.hpp \ 67 | linear_exprs.hpp mcode_emu.hpp minsn_template.hpp \ 68 | msynth_parser.hpp optimizer.hpp simp_lin_conj_exprs.hpp \ 69 | smt_convert.hpp z3++_no_warn.h 70 | $(F)file$(O) : $(I)bitrange.hpp $(I)bytes.hpp $(I)config.hpp $(I)fpro.h \ 71 | $(I)funcs.hpp $(I)gdl.hpp $(I)hexrays.hpp $(I)ida.hpp \ 72 | $(I)idp.hpp $(I)ieee.h $(I)kernwin.hpp $(I)lines.hpp \ 73 | $(I)llong.hpp $(I)loader.hpp $(I)nalt.hpp $(I)name.hpp \ 74 | $(I)netnode.hpp $(I)pro.h $(I)range.hpp $(I)segment.hpp \ 75 | $(I)typeinf.hpp $(I)ua.hpp $(I)xref.hpp \ 76 | bitwise_expr_lookup_tbl.hpp consts.hpp equiv_class.hpp \ 77 | file.cpp file.hpp heuristics.hpp lin_conj_exprs.hpp \ 78 | linear_exprs.hpp mcode_emu.hpp minsn_template.hpp \ 79 | msynth_parser.hpp simp_lin_conj_exprs.hpp \ 80 | smt_convert.hpp z3++_no_warn.h 81 | $(F)goomba$(O) : $(I)bitrange.hpp $(I)bytes.hpp $(I)config.hpp $(I)err.h \ 82 | 
$(I)fpro.h $(I)funcs.hpp $(I)gdl.hpp $(I)hexrays.hpp \ 83 | $(I)ida.hpp $(I)idp.hpp $(I)ieee.h $(I)kernwin.hpp \ 84 | $(I)lines.hpp $(I)llong.hpp $(I)loader.hpp $(I)nalt.hpp \ 85 | $(I)name.hpp $(I)netnode.hpp $(I)pro.h $(I)range.hpp \ 86 | $(I)segment.hpp $(I)typeinf.hpp $(I)ua.hpp $(I)xref.hpp \ 87 | bitwise_expr_lookup_tbl.hpp consts.hpp equiv_class.hpp \ 88 | file.hpp goomba.cpp heuristics.hpp lin_conj_exprs.hpp \ 89 | linear_exprs.hpp mcode_emu.hpp minsn_template.hpp \ 90 | msynth_parser.hpp optimizer.hpp simp_lin_conj_exprs.hpp \ 91 | smt_convert.hpp z3++_no_warn.h 92 | $(F)heuristics$(O): $(I)bitrange.hpp $(I)bytes.hpp $(I)config.hpp \ 93 | $(I)fpro.h $(I)funcs.hpp $(I)gdl.hpp $(I)hexrays.hpp \ 94 | $(I)ida.hpp $(I)idp.hpp $(I)ieee.h $(I)kernwin.hpp \ 95 | $(I)lines.hpp $(I)llong.hpp $(I)loader.hpp $(I)nalt.hpp \ 96 | $(I)name.hpp $(I)netnode.hpp $(I)pro.h $(I)range.hpp \ 97 | $(I)segment.hpp $(I)typeinf.hpp $(I)ua.hpp $(I)xref.hpp \ 98 | heuristics.cpp heuristics.hpp linear_exprs.hpp \ 99 | mcode_emu.hpp smt_convert.hpp z3++_no_warn.h 100 | $(F)linear_exprs$(O): $(I)bitrange.hpp $(I)bytes.hpp $(I)config.hpp \ 101 | $(I)fpro.h $(I)funcs.hpp $(I)gdl.hpp $(I)hexrays.hpp \ 102 | $(I)ida.hpp $(I)idp.hpp $(I)ieee.h $(I)kernwin.hpp \ 103 | $(I)lines.hpp $(I)llong.hpp $(I)loader.hpp $(I)nalt.hpp \ 104 | $(I)name.hpp $(I)netnode.hpp $(I)pro.h $(I)range.hpp \ 105 | $(I)segment.hpp $(I)typeinf.hpp $(I)ua.hpp $(I)xref.hpp \ 106 | linear_exprs.cpp linear_exprs.hpp mcode_emu.hpp \ 107 | smt_convert.hpp z3++_no_warn.h 108 | $(F)msynth_parser$(O): $(I)bitrange.hpp $(I)bytes.hpp $(I)config.hpp \ 109 | $(I)fpro.h $(I)funcs.hpp $(I)gdl.hpp $(I)hexrays.hpp \ 110 | $(I)ida.hpp $(I)idp.hpp $(I)ieee.h $(I)kernwin.hpp \ 111 | $(I)lines.hpp $(I)llong.hpp $(I)loader.hpp $(I)nalt.hpp \ 112 | $(I)name.hpp $(I)netnode.hpp $(I)pro.h $(I)range.hpp \ 113 | $(I)segment.hpp $(I)typeinf.hpp $(I)ua.hpp $(I)xref.hpp \ 114 | consts.hpp linear_exprs.hpp mcode_emu.hpp \ 115 | minsn_template.hpp 
msynth_parser.cpp msynth_parser.hpp \ 116 | smt_convert.hpp z3++_no_warn.h 117 | $(F)optimizer$(O): $(I)bitrange.hpp $(I)bytes.hpp $(I)config.hpp \ 118 | $(I)fpro.h $(I)funcs.hpp $(I)gdl.hpp $(I)hexrays.hpp \ 119 | $(I)ida.hpp $(I)idp.hpp $(I)ieee.h $(I)kernwin.hpp \ 120 | $(I)lines.hpp $(I)llong.hpp $(I)loader.hpp $(I)nalt.hpp \ 121 | $(I)name.hpp $(I)netnode.hpp $(I)pro.h $(I)range.hpp \ 122 | $(I)segment.hpp $(I)typeinf.hpp $(I)ua.hpp $(I)xref.hpp \ 123 | bitwise_expr_lookup_tbl.hpp consts.hpp equiv_class.hpp \ 124 | heuristics.hpp lin_conj_exprs.hpp linear_exprs.hpp \ 125 | mcode_emu.hpp minsn_template.hpp msynth_parser.hpp \ 126 | optimizer.cpp optimizer.hpp simp_lin_conj_exprs.hpp \ 127 | smt_convert.hpp z3++_no_warn.h 128 | $(F)smt_convert$(O): $(I)bitrange.hpp $(I)bytes.hpp $(I)config.hpp \ 129 | $(I)fpro.h $(I)funcs.hpp $(I)gdl.hpp $(I)hexrays.hpp \ 130 | $(I)ida.hpp $(I)idp.hpp $(I)ieee.h $(I)kernwin.hpp \ 131 | $(I)lines.hpp $(I)llong.hpp $(I)loader.hpp $(I)nalt.hpp \ 132 | $(I)name.hpp $(I)netnode.hpp $(I)pro.h $(I)range.hpp \ 133 | $(I)segment.hpp $(I)typeinf.hpp $(I)ua.hpp $(I)xref.hpp \ 134 | mcode_emu.hpp smt_convert.cpp smt_convert.hpp \ 135 | z3++_no_warn.h 136 | -------------------------------------------------------------------------------- /mcode_emu.hpp: -------------------------------------------------------------------------------- 1 | /* 2 | * Copyright (c) 2023 by Hex-Rays, support@hex-rays.com 3 | * ALL RIGHTS RESERVED. 4 | * 5 | * gooMBA plugin for Hex-Rays Decompiler. 
6 | * This file implements a simple microcode emulator class 7 | * 8 | */ 9 | 10 | #pragma once 11 | #include <hexrays.hpp> 12 | 13 | //------------------------------------------------------------------------- 14 | // truncate v to w bytes 15 | inline uint64 trunc(uint64 v, int w) 16 | { 17 | QASSERT(30660, w == 1 || w == 2 || w == 4 || w == 8); 18 | return v & make_mask<uint64>(w * 8); 19 | } 20 | 21 | //------------------------------------------------------------------------- 22 | struct mcode_val_t 23 | { 24 | uint64 val; 25 | int size; // in bytes 26 | 27 | //------------------------------------------------------------------------- 28 | void check_size_equal(const mcode_val_t &o) const 29 | { 30 | QASSERT(30661, size == o.size); 31 | } 32 | 33 | //------------------------------------------------------------------------- 34 | mcode_val_t(uint64 v, int s) : val(trunc(v, s)), size(s) {} 35 | 36 | //------------------------------------------------------------------------- 37 | int64 signed_val() const 38 | { 39 | return extend_sign(val, size, true); 40 | } 41 | 42 | //------------------------------------------------------------------------- 43 | mcode_val_t sext(int target_sz) const 44 | { 45 | QASSERT(30662, target_sz >= size); 46 | return mcode_val_t(signed_val(), target_sz); 47 | } 48 | 49 | //------------------------------------------------------------------------- 50 | mcode_val_t zext(int target_sz) const 51 | { 52 | QASSERT(30663, target_sz >= size); 53 | return mcode_val_t(val, target_sz); 54 | } 55 | 56 | //------------------------------------------------------------------------- 57 | mcode_val_t low(int target_sz) const 58 | { 59 | QASSERT(30664, target_sz <= size); 60 | return mcode_val_t(val, target_sz); 61 | } 62 | 63 | //------------------------------------------------------------------------- 64 | mcode_val_t high(int target_sz) const 65 | { 66 | QASSERT(30665, target_sz <= size); 67 | int bytes_to_remove = size - target_sz; 68 | return mcode_val_t(right_ushift(val, 8 * 
bytes_to_remove), target_sz); 69 | } 70 | 71 | //------------------------------------------------------------------------- 72 | bool operator==(const mcode_val_t &o) const 73 | { 74 | return size == o.size && val == o.val; 75 | } 76 | 77 | //------------------------------------------------------------------------- 78 | bool operator!=(const mcode_val_t &o) const 79 | { 80 | return !(*this == o); 81 | } 82 | 83 | //------------------------------------------------------------------------- 84 | bool operator<(const mcode_val_t &o) const 85 | { 86 | QASSERT(30702, size == o.size); 87 | return val < o.val; 88 | } 89 | 90 | //------------------------------------------------------------------------- 91 | mcode_val_t operator+(const mcode_val_t &o) const 92 | { 93 | check_size_equal(o); 94 | return mcode_val_t(val + o.val, size); 95 | } 96 | 97 | //------------------------------------------------------------------------- 98 | mcode_val_t operator-(const mcode_val_t &o) const 99 | { 100 | check_size_equal(o); 101 | return mcode_val_t(val - o.val, size); 102 | } 103 | 104 | //------------------------------------------------------------------------- 105 | mcode_val_t operator*(const mcode_val_t &o) const 106 | { 107 | check_size_equal(o); 108 | return mcode_val_t(val * o.val, size); 109 | } 110 | 111 | //------------------------------------------------------------------------- 112 | mcode_val_t operator/(const mcode_val_t &o) const 113 | { 114 | check_size_equal(o); 115 | if ( o.val == 0 ) 116 | throw "division by zero occurred when emulating instruction"; 117 | return mcode_val_t(val / o.val, size); 118 | } 119 | 120 | //------------------------------------------------------------------------- 121 | mcode_val_t sdiv(const mcode_val_t &o) const 122 | { 123 | check_size_equal(o); 124 | if ( o.val == 0 ) 125 | throw "division by zero occurred when emulating instruction"; 126 | int64 res; 127 | uint64 l = val; 128 | uint64 r = o.val; 129 | switch ( size ) 130 | { 131 | case 1: 
res = int8(l) / int8(r); break; 132 | case 2: res = int16(l) / int16(r); break; 133 | case 4: res = int32(l) / int32(r); break; 134 | case 8: res = int64(l) / int64(r); break; 135 | default: INTERR(30666); 136 | } 137 | 138 | return mcode_val_t(res, size); 139 | } 140 | 141 | //------------------------------------------------------------------------- 142 | mcode_val_t operator%(const mcode_val_t &o) const 143 | { 144 | check_size_equal(o); 145 | if ( o.val == 0 ) 146 | throw "division by zero occurred when emulating instruction"; 147 | return mcode_val_t(val % o.val, size); 148 | } 149 | 150 | //------------------------------------------------------------------------- 151 | mcode_val_t smod(const mcode_val_t &o) const 152 | { 153 | check_size_equal(o); 154 | if ( o.val == 0 ) 155 | throw "division by zero occurred when emulating instruction"; 156 | int64 res = -1; 157 | uint64 l = val; 158 | uint64 r = o.val; 159 | switch ( size ) 160 | { 161 | case 1: res = int8(l) % int8(r); break; 162 | case 2: res = int16(l) % int16(r); break; 163 | case 4: res = int32(l) % int32(r); break; 164 | case 8: res = int64(l) % int64(r); break; 165 | default: QASSERT(30667, false); 166 | } 167 | 168 | return mcode_val_t(res, size); 169 | } 170 | 171 | //------------------------------------------------------------------------- 172 | mcode_val_t operator<<(const mcode_val_t &o) const 173 | { 174 | return mcode_val_t(left_shift(val, o.val), size); 175 | } 176 | 177 | //------------------------------------------------------------------------- 178 | mcode_val_t operator>>(const mcode_val_t &o) const 179 | { 180 | return mcode_val_t(right_ushift(val, o.val), size); 181 | } 182 | 183 | //------------------------------------------------------------------------- 184 | mcode_val_t sar(const mcode_val_t &o) const 185 | { 186 | return mcode_val_t(right_sshift(signed_val(), o.val), size); 187 | } 188 | 189 | //------------------------------------------------------------------------- 190 | 
mcode_val_t operator|(const mcode_val_t &o) const 191 | { 192 | check_size_equal(o); 193 | return mcode_val_t(val | o.val, size); 194 | } 195 | 196 | //------------------------------------------------------------------------- 197 | mcode_val_t operator&(const mcode_val_t &o) const 198 | { 199 | check_size_equal(o); 200 | return mcode_val_t(val & o.val, size); 201 | } 202 | 203 | //------------------------------------------------------------------------- 204 | mcode_val_t operator^(const mcode_val_t &o) const 205 | { 206 | check_size_equal(o); 207 | return mcode_val_t(val ^ o.val, size); 208 | } 209 | 210 | //------------------------------------------------------------------------- 211 | mcode_val_t operator-() const 212 | { 213 | return mcode_val_t(-val, size); 214 | } 215 | 216 | //------------------------------------------------------------------------- 217 | mcode_val_t operator!() const 218 | { 219 | return mcode_val_t(!val, size); 220 | } 221 | 222 | //------------------------------------------------------------------------- 223 | mcode_val_t operator~() const 224 | { 225 | return mcode_val_t(~val, size); 226 | } 227 | }; 228 | 229 | //------------------------------------------------------------------------- 230 | class mcode_emulator_t 231 | { 232 | public: 233 | // base classes with virtual functions should have a virtual dtr 234 | virtual ~mcode_emulator_t() {} 235 | // returns the value assigned to a register, stack, global, or local variable 236 | virtual mcode_val_t get_var_val(const mop_t &mop) = 0; 237 | 238 | //------------------------------------------------------------------------- 239 | mcode_val_t mop_value(const mop_t &mop) 240 | { 241 | if ( mop.size > 8 ) 242 | throw "too big mop size in mcode emulator"; 243 | switch ( mop.t ) 244 | { 245 | case mop_n: 246 | return mcode_val_t(mop.nnn->value, mop.size); 247 | case mop_d: 248 | return minsn_value(*mop.d); 249 | case mop_r: // register 250 | case mop_S: // stack variable 251 | case mop_v: // 
global variable 252 | case mop_l: // local variable 253 | return get_var_val(mop); 254 | default: 255 | throw "unhandled mop type in mcode emulator"; 256 | } 257 | } 258 | 259 | //------------------------------------------------------------------------- 260 | mcode_val_t minsn_value(const minsn_t &insn) 261 | { 262 | if ( insn.is_fpinsn() ) 263 | { 264 | msg("Emulator does not support floating point\n"); 265 | throw "Emulator does not support floating point"; 266 | } 267 | switch ( insn.opcode ) 268 | { 269 | case m_ldc: 270 | case m_mov: 271 | return mop_value(insn.l); 272 | case m_neg: 273 | return -mop_value(insn.l); 274 | case m_lnot: 275 | return !mop_value(insn.l); 276 | case m_bnot: 277 | return ~mop_value(insn.l); 278 | case m_xds: 279 | return mop_value(insn.l).sext(insn.d.size); 280 | case m_xdu: 281 | return mop_value(insn.l).zext(insn.d.size); 282 | case m_low: 283 | return mop_value(insn.l).low(insn.d.size); 284 | case m_high: 285 | return mop_value(insn.l).high(insn.d.size); 286 | case m_add: 287 | return mop_value(insn.l) + mop_value(insn.r); 288 | case m_sub: 289 | return mop_value(insn.l) - mop_value(insn.r); 290 | case m_mul: 291 | return mop_value(insn.l) * mop_value(insn.r); 292 | case m_udiv: 293 | return mop_value(insn.l) / mop_value(insn.r); 294 | case m_sdiv: 295 | return mop_value(insn.l).sdiv(mop_value(insn.r)); 296 | case m_umod: 297 | return mop_value(insn.l) % mop_value(insn.r); 298 | case m_smod: 299 | return mop_value(insn.l).smod(mop_value(insn.r)); 300 | case m_or: 301 | return mop_value(insn.l) | mop_value(insn.r); 302 | case m_and: 303 | return mop_value(insn.l) & mop_value(insn.r); 304 | case m_xor: 305 | return mop_value(insn.l) ^ mop_value(insn.r); 306 | case m_shl: 307 | return mop_value(insn.l) << mop_value(insn.r); 308 | case m_shr: 309 | return mop_value(insn.l) >> mop_value(insn.r); 310 | case m_sar: 311 | return mop_value(insn.l).sar(mop_value(insn.r)); 312 | case m_sets: 313 | return mcode_val_t(mop_value(insn.l).signed_val() < 0, 
insn.d.size); 314 | case m_setnz: 315 | return mcode_val_t(mop_value(insn.l) != mop_value(insn.r), insn.d.size); 316 | case m_setz: 317 | return mcode_val_t(mop_value(insn.l) == mop_value(insn.r), insn.d.size); 318 | case m_setae: 319 | return mcode_val_t(mop_value(insn.l).val >= mop_value(insn.r).val, insn.d.size); 320 | case m_setb: 321 | return mcode_val_t(mop_value(insn.l).val < mop_value(insn.r).val, insn.d.size); 322 | case m_seta: 323 | return mcode_val_t(mop_value(insn.l).val > mop_value(insn.r).val, insn.d.size); 324 | case m_setbe: 325 | return mcode_val_t(mop_value(insn.l).val <= mop_value(insn.r).val, insn.d.size); 326 | case m_setg: 327 | return mcode_val_t(mop_value(insn.l).signed_val() > mop_value(insn.r).signed_val(), insn.d.size); 328 | case m_setge: 329 | return mcode_val_t(mop_value(insn.l).signed_val() >= mop_value(insn.r).signed_val(), insn.d.size); 330 | case m_setl: 331 | return mcode_val_t(mop_value(insn.l).signed_val() < mop_value(insn.r).signed_val(), insn.d.size); 332 | case m_setle: 333 | return mcode_val_t(mop_value(insn.l).signed_val() <= mop_value(insn.r).signed_val(), insn.d.size); 334 | default: 335 | msg("Unhandled opcode in emulator %d\n", insn.opcode); 336 | throw "Unhandled opcode"; 337 | } 338 | } 339 | }; -------------------------------------------------------------------------------- /minsn_template.hpp: -------------------------------------------------------------------------------- 1 | /* 2 | * Copyright (c) 2023 by Hex-Rays, support@hex-rays.com 3 | * ALL RIGHTS RESERVED. 4 | * 5 | * gooMBA plugin for Hex-Rays Decompiler. 
6 | * 7 | */ 8 | 9 | #pragma once 10 | #include <hexrays.hpp> 11 | #include "linear_exprs.hpp" 12 | #include "consts.hpp" 13 | 14 | //------------------------------------------------------------------------- 15 | struct default_mops_t 16 | { 17 | mopvec_t mops; 18 | 19 | static default_mops_t *get_instance() 20 | { 21 | if ( instance == nullptr ) 22 | instance = new default_mops_t(); 23 | return instance; 24 | } 25 | 26 | private: 27 | static default_mops_t *instance; 28 | default_mops_t() 29 | { 30 | for ( int i = 0; i < CANDIDATE_EXPR_NUMINPUTS; i++ ) 31 | { 32 | mop_t new_var; 33 | new_var.t = mop_l; 34 | new_var.l = new lvar_ref_t(nullptr, i); 35 | new_var.size = 8; 36 | mops.push_back(new_var); 37 | } 38 | } 39 | }; 40 | 41 | //------------------------------------------------------------------------- 42 | // a minsn template has no defined size or assigned terminal mops 43 | class minsn_template_t 44 | { 45 | public: 46 | // caller is responsible for freeing the minsn_t * 47 | virtual minsn_t *synthesize(ea_t ea, int size, const qvector<mop_t> &mops) const = 0; 48 | virtual ~minsn_template_t() {} 49 | 50 | const char *dstr() const 51 | { 52 | minsn_t *insn = synthesize(0, 8, default_mops_t::get_instance()->mops); 53 | const char *res = insn->dstr(); 54 | delete insn; 55 | return res; 56 | } 57 | }; 58 | 59 | typedef std::shared_ptr<minsn_template_t> minsn_template_ptr_t; 60 | typedef qvector<minsn_template_ptr_t> minsn_templates_t; 61 | 62 | //------------------------------------------------------------------------- 63 | struct mt_constant_t : public minsn_template_t 64 | { 65 | uint64_t val; 66 | 67 | mt_constant_t(uint64_t v) : val(v) {} 68 | minsn_t *synthesize(ea_t ea, int size, const qvector<mop_t>&) const override 69 | { 70 | minsn_t *res = new minsn_t(ea); 71 | res->opcode = m_ldc; 72 | res->l.make_number(val, size, ea); 73 | res->r.zero(); 74 | res->d.size = size; 75 | return res; 76 | } 77 | }; 78 | 79 | //------------------------------------------------------------------------- 80 | struct mt_varref_t : public 
minsn_template_t
{
  int var_idx;

  mt_varref_t(int v) : var_idx(v) {}
  minsn_t *synthesize(ea_t ea, int size, const qvector<mop_t> &mops) const override
  {
    QASSERT(30704, var_idx < mops.size());
    return resize_mop(ea, mops[var_idx], size, false);
  }
};

//-------------------------------------------------------------------------
struct mt_comp_t : public minsn_template_t
{
  mcode_t opc;
  minsn_templates_t operands;

  mt_comp_t(mcode_t op, minsn_templates_t opr) : opc(op), operands(opr) {}

  minsn_t *synthesize(ea_t ea, int size, const qvector<mop_t> &mops) const override
  {
    minsn_t *res = new minsn_t(ea);
    res->opcode = opc;
    res->l.zero();
    res->r.zero();

    if ( operands.size() >= 1 )
    {
      minsn_t *l = operands[0]->synthesize(ea, size, mops);
      res->l.create_from_insn(l);
      delete l;
    }
    if ( operands.size() >= 2 )
    {
      minsn_t *r = operands[1]->synthesize(ea, size, mops);
      res->r.create_from_insn(r);
      delete r;
    }

    res->d.size = size;
    return res;
  }
};

inline minsn_template_ptr_t make_un(mcode_t opc, minsn_template_ptr_t a)
{
  minsn_templates_t operands;
  operands.push_back(a);
  return std::make_shared<mt_comp_t>(opc, operands);
}

inline minsn_template_ptr_t make_bin(mcode_t opc, minsn_template_ptr_t a, minsn_template_ptr_t b)
{
  minsn_templates_t operands;
  operands.push_back(a);
  operands.push_back(b);
  return std::make_shared<mt_comp_t>(opc, operands);
}

inline minsn_template_ptr_t operator+(minsn_template_ptr_t a, minsn_template_ptr_t b)
{
  return make_bin(m_add, a, b);
}
inline minsn_template_ptr_t operator*(minsn_template_ptr_t a, minsn_template_ptr_t b)
{
  return make_bin(m_mul, a, b);
}
inline minsn_template_ptr_t operator&(minsn_template_ptr_t a,
                                      minsn_template_ptr_t b)
{
  return make_bin(m_and, a, b);
}
inline minsn_template_ptr_t operator|(minsn_template_ptr_t a, minsn_template_ptr_t b)
{
  return make_bin(m_or, a, b);
}
inline minsn_template_ptr_t operator^(minsn_template_ptr_t a, minsn_template_ptr_t b)
{
  return make_bin(m_xor, a, b);
}
inline minsn_template_ptr_t operator~(minsn_template_ptr_t a)
{
  return make_un(m_bnot, a);
}

--------------------------------------------------------------------------------
/msynth_parser.cpp:
--------------------------------------------------------------------------------
/*
 *      Copyright (c) 2023 by Hex-Rays, support@hex-rays.com
 *      ALL RIGHTS RESERVED.
 *
 *      gooMBA plugin for Hex-Rays Decompiler.
 *
 */

#include "z3++_no_warn.h"
#include "msynth_parser.hpp"
#include "minsn_template.hpp"

default_mops_t *default_mops_t::instance = nullptr;

minsn_t *msynth_expr_parser_t::parse_next_expr()
{
  if ( *next == '~' )
  {
    next++;
    minsn_t *res = new minsn_t(0);
    res->opcode = m_bnot;
    minsn_t *next_expr = parse_next_expr();
    res->l.create_from_insn(next_expr);
    delete next_expr;
    next_expr = nullptr;
    res->d.size = res->l.size;
    return res;
  }

  // ExprInt(val: uint64, bitw: int)
  {
    int nread;
    uint64 val;
    int bitw;
    int sr = qsscanf(next, "ExprInt(%" FMT_64 "u, %d)%n", &val, &bitw, &nread);
    if ( sr == 2 )
    {
      next += nread;

      minsn_t *res = new minsn_t(0);
      res->opcode = m_ldc;
      res->l.make_number(val, bitw/8);
      res->r.zero();
      res->d.size = bitw/8;
      return res;
    }
  }

  // ExprId(id: str, bitw: int)
  {
    int nread;
    int varnum, bitw;
    int sr = qsscanf(next, "ExprId(\"p%d\", %d)%n", &varnum, &bitw, &nread);
    if ( sr == 2 )
    {
      next += nread;
      minsn_t *res =
        new minsn_t(0);
      res->opcode = bitw == 64 ? m_mov : m_low;
      res->l = vars[varnum];
      res->d.size = bitw/8;
      return res;
    }
  }

  // ExprOp(op: str, expr*)
  {
    int sc = strncmp(next, "ExprOp", 6);
    if ( sc == 0 )
    {
      int nread;
      next += 6;
      char op[3];
      int sr = qsscanf(next, "(\"%2[^\"]\"%n", op, &nread);
      QASSERT(30688, sr == 1);
      next += nread;

      minsnptrs_t args;
      while ( *next != ')' )
      {
        sc = strncmp(next, ", ", 2);
        QASSERT(30689, sc == 0);
        next += 2;

        args.push_back(parse_next_expr());
      }

      next++; // consume the ')'

      // - can be either unary or binary
      if ( streq(op, "-") )
      {
        if ( args.size() == 1 )
          return make_un(m_neg, &args);
        if ( args.size() == 2 )
          return make_bin(m_sub, &args);
        INTERR(30690);
      }
      else
      {
        mcode_t code = get_binop(op);
        if ( code != m_nop )
          return make_bin(code, &args);
      }
      INTERR(30691);
    }
  }

  // ExprSlice(expr, low, hi)
  {
    int sc = strncmp(next, "ExprSlice", 9);
    if ( sc == 0 )
    {
      next += 9;
      QASSERT(30692, *next == '(');
      next++;
      minsn_t *to_slice = parse_next_expr();
      int lo, hi, nread;
      int sr = qsscanf(next, ", %d, %d)%n", &lo, &hi, &nread);
      QASSERT(30693, sr == 2);
      next += nread;
      minsn_t *res = make_slice(to_slice, lo, hi);
      delete to_slice;
      return res;
    }
  }

  INTERR(30694);
}

--------------------------------------------------------------------------------
/msynth_parser.hpp:
--------------------------------------------------------------------------------
/*
 *      Copyright (c) 2023 by Hex-Rays, support@hex-rays.com
 *      ALL RIGHTS RESERVED.
 *
 *      gooMBA plugin for Hex-Rays Decompiler.
 *
 */

#pragma once
#include <hexrays.hpp>
#include "linear_exprs.hpp"

//-------------------------------------------------------------------------
struct bin_op_t
{
  const char *text;
  mcode_t opcode;
};

static const bin_op_t bin_ops[] =
{
  { "+",  m_add  }, // "-" is handled separately since it can also be unary
  { "*",  m_mul  },
  { "/",  m_udiv },
  { "&",  m_and  },
  { "|",  m_or   },
  { "^",  m_xor  },
  { "<<", m_shl  },
};

//-------------------------------------------------------------------------
inline mcode_t get_binop(const char *op)
{
  for ( size_t i=0; i < qnumber(bin_ops); i++ )
    if ( streq(bin_ops[i].text, op) )
      return bin_ops[i].opcode;
  return m_nop;
}

//-------------------------------------------------------------------------
class msynth_expr_parser_t
{
public:
  const char *next;
  const mopvec_t &vars;


  //-------------------------------------------------------------------------
  void init_from_arg(mop_t *op, minsn_t **pp_ins)
  {
    minsn_t *ins = *pp_ins;
    op->create_from_insn(ins);
    delete ins;
    *pp_ins = nullptr;
  }

  //-------------------------------------------------------------------------
  minsn_t *make_un(mcode_t opcode, minsnptrs_t *args)
  {
    QASSERT(30683, args->size() == 1);
    minsn_t *res = new minsn_t(0);
    res->opcode = opcode;
    init_from_arg(&res->l, args->begin() + 0);
    res->d.size = res->l.size;
    return res;
  }

  //-------------------------------------------------------------------------
  minsn_t *make_bin(mcode_t opcode, minsnptrs_t *args)
  {
    QASSERT(30684, args->size() == 2);
    minsn_t *res = new minsn_t(0);
    res->opcode = opcode;
    init_from_arg(&res->l, args->begin() + 0);
    init_from_arg(&res->r, args->begin() + 1);
    if ( opcode == m_shl && res->r.size != 1 )
      res->r.change_size(1);
    res->d.size = res->l.size;
    return res;
  }

  //-------------------------------------------------------------------------
  minsn_t *make_slice(minsn_t *src, int lo, int hi)
  {
    QASSERT(30686, lo == 0);
    QASSERT(30687, hi == 8 || hi == 16 || hi == 32);

    minsn_t *res = new minsn_t(0);
    res->opcode = m_low;
    res->l.create_from_insn(src);
    res->d.size = hi / 8;
    return res;
  }

  minsn_t *parse_next_expr();

public:
  msynth_expr_parser_t(const char *s, const mopvec_t &v) : next(s), vars(v) {}
};

--------------------------------------------------------------------------------
/optimizer.cpp:
--------------------------------------------------------------------------------
/*
 *      Copyright (c) 2023 by Hex-Rays, support@hex-rays.com
 *      ALL RIGHTS RESERVED.
 *
 *      gooMBA plugin for Hex-Rays Decompiler.
 *
 */

#include <chrono>

#include "z3++_no_warn.h"
#include "optimizer.hpp"

//--------------------------------------------------------------------------
// check whether or not we should skip the proving step of optimization
inline bool skip_proofs()
{
  return qgetenv("VD_MBA_SKIP_PROOFS");
}

//--------------------------------------------------------------------------
inline void set_cmt(ea_t ea, const char *cmt)
{
  func_t *pfn = get_func(ea);
  set_func_cmt(pfn, cmt, false);
}

//--------------------------------------------------------------------------
static bool check_and_substitute(
        minsn_t *insn,
        minsn_t *cand_insn,
        uint z3_timeout,
        bool z3_assume_timeouts_correct)
{
  bool ok = false;
  int original_score = score_complexity(*insn);
  int candidate_score = score_complexity(*cand_insn);
  msg("Testing candidate %s\n", cand_insn->dstr());
  if ( candidate_score > original_score )
  {
    msg("Candidate (%d) is not simpler than original (%d), skipping\n", candidate_score, original_score);
  }
  else
  {
    z3_converter_t converter;
    if ( probably_equivalent(*insn, *cand_insn) )
    {
      msg("Instruction is probably equivalent to candidate\n");
      if ( skip_proofs() || z3_timeout == 0 )
      {
        set_cmt(insn->ea, "goomba: z3 proof skipped, simplification assumed correct");
        ok = true;
      }
      else
      {
        z3::expr lge = converter.minsn_to_expr(*cand_insn);
        z3::expr ie = converter.minsn_to_expr(*insn);
        z3::solver s(converter.context);
        s.set("timeout", z3_timeout);
        s.add(lge != ie);
        z3::check_result res = s.check();
        msg("SMT check result: %d\n", res);

        if ( res == z3::check_result::unsat )
        {
          ok = true;
        }

        if ( z3_assume_timeouts_correct && res == z3::check_result::unknown )
        {
          set_cmt(insn->ea, "goomba: z3 proof timed out, simplification assumed correct");
          ok = true;
        }
      }
    }
    else
    {
      msg("Candidate not equivalent, skipping\n");
    }
  }

  if ( ok )
    substitute(insn, cand_insn);

  return ok;
}

//--------------------------------------------------------------------------
bool optimizer_t::optimize_insn_recurse(minsn_t *insn)
{
  if ( optimize_insn(insn) )
    return true;

  bool result = false;

  if ( insn->l.is_insn() )
    result |= optimize_insn_recurse(insn->l.d);

  if ( insn->r.is_insn() )
    result |= optimize_insn_recurse(insn->r.d);

  return result;
}

//--------------------------------------------------------------------------
bool optimizer_t::optimize_insn(minsn_t *insn)
{
  bool success = false;
  auto start_time = std::chrono::high_resolution_clock::now();

  if ( insn->has_side_effects(true) )
  {
    // msg("Instruction has side effects, skipping\n");
  }
  else
  {
    if ( is_mba(*insn) )
    {
      msg("Found MBA instruction %s\n", insn->dstr());

      try
      {
        minsn_set_t candidate_set; // recall minsn_set_t is automatically sorted by complexity
        auto equiv_class_start = std::chrono::high_resolution_clock::now();
        if ( equiv_classes != nullptr )
          equiv_classes->find_candidates(candidate_set, *insn);
        auto equiv_class_end = std::chrono::high_resolution_clock::now();

        auto linear_start = equiv_class_end;
        linear_expr_t linear_guess(*insn);
        // msg("Linear guess %s\n", linear_guess.dstr());
        candidate_set.insert(linear_guess.to_minsn(insn->ea));
        auto linear_end = std::chrono::high_resolution_clock::now();

        auto lin_conj_start = linear_end;
        lin_conj_expr_t lin_conj_guess(*insn);
        simp_lin_conj_expr_t simp_lin_conj_expr_t(lin_conj_guess);
        // msg("Simplified lin conj guess %s\n", simp_lin_conj_expr_t.dstr());
        candidate_set.insert(simp_lin_conj_expr_t.to_minsn(insn->ea));
        auto lin_conj_end = std::chrono::high_resolution_clock::now();

        for ( minsn_t *cand : candidate_set )
        {
          cand->optimize_solo(); // get rid of useless mov(#0) operands
          if ( check_and_substitute(insn, cand, z3_timeout, z3_assume_timeouts_correct) )
          {
            if ( qgetenv("VD_MBA_LOG_PERF") )
            {
              int nvars = get_input_mops(*insn).size();
              msg("Equiv class time: %d %" FMT_64 "d us\n", nvars,
                  std::chrono::duration_cast<std::chrono::microseconds>(equiv_class_end - equiv_class_start).count());
              msg("Linear time: %d %" FMT_64 "d us\n", nvars,
                  std::chrono::duration_cast<std::chrono::microseconds>(linear_end - linear_start).count());
              msg("Lin conj time: %d %" FMT_64 "d us\n", nvars,
                  std::chrono::duration_cast<std::chrono::microseconds>(lin_conj_end - lin_conj_start).count());
            }
            success = true;
            goto finish;
          }
        }
      }
      catch ( const char *&e )
      {
        msg("err: %s\n", e);
        return false;
      }
    }
  }

finish:
  if ( success )
  {
    auto end_time = std::chrono::high_resolution_clock::now();
    msg("Time taken: %" FMT_64 "d us\n",
        std::chrono::duration_cast<std::chrono::microseconds>(end_time - start_time).count());
  }

  return success;
}

--------------------------------------------------------------------------------
/optimizer.hpp:
--------------------------------------------------------------------------------
/*
 *      Copyright (c) 2023 by Hex-Rays, support@hex-rays.com
 *      ALL RIGHTS RESERVED.
 *
 *      gooMBA plugin for Hex-Rays Decompiler.
 *
 */

#pragma once

#include "equiv_class.hpp"
#include "smt_convert.hpp"
#include "heuristics.hpp"
#include "lin_conj_exprs.hpp"
#include "simp_lin_conj_exprs.hpp"

//--------------------------------------------------------------------------
inline void substitute(minsn_t *insn, minsn_t *cand)
{
  cand->d = insn->d;
  insn->swap(*cand);
}

//--------------------------------------------------------------------------
class optimizer_t
{
public:
  uint z3_timeout = 1000;
  bool z3_assume_timeouts_correct = true;
  equiv_class_finder_t *equiv_classes = nullptr;
  bool optimize_insn(minsn_t *insn); // attempts to replace the instruction with a simpler version
  bool optimize_insn_recurse(minsn_t *insn); // attempts to optimize the instruction, and if it fails, optimizes its subinstructions
};

--------------------------------------------------------------------------------
/simp_lin_conj_exprs.hpp:
--------------------------------------------------------------------------------
/*
 *      Copyright (c) 2023 by Hex-Rays, support@hex-rays.com
 *      ALL RIGHTS RESERVED.
 *
 *      gooMBA plugin for Hex-Rays Decompiler.
 *
 */

#pragma once
#include <memory>
#include <set>
#include "lin_conj_exprs.hpp"
#include "minsn_template.hpp"
#include "bitwise_expr_lookup_tbl.hpp"

//-------------------------------------------------------------------------
// represents a simplified linear combination of conjunctions,
// essentially just a lin_conj_expr with more bitwise expressions
// other than just conjunctions
class simp_lin_conj_expr_t : public lin_conj_expr_t
{
  minsn_template_ptr_t non_conj_term = std::make_shared<mt_constant_t>(0ull);
  qvector<mcode_val_t> range; // sorted lowest to highest

  //-------------------------------------------------------------------------
  void recompute_range()
  {
    std::set<mcode_val_t> new_range;

    for ( const auto &mval : eval_trace )
      new_range.insert(mval);

    range.qclear();
    for ( auto &mval : new_range )
      range.push_back(mval);
  }

  //-------------------------------------------------------------------------
  // returns a bitfield where the i'th bit indicates whether the i'th evaluation
  // returns the value of pos
  uint64 eval_trace_to_bit_trace(const eval_trace_t &src_trace, mcode_val_t pos)
  {
    QASSERT(30703, src_trace.size() <= 64);

    uint64 res = 0;
    for ( int i = 0; i < src_trace.size(); i++ )
    {
      if ( src_trace[i] == pos )
        res |= (1ull << i);
    }

    return res;
  }

  //-------------------------------------------------------------------------
  bool reset_eval_trace()
  {
    for ( auto &et : eval_trace )
      et.val = 0;
    recompute_coeffs();
    recompute_range();
    return true;
  }

public:
  //-------------------------------------------------------------------------
  simp_lin_conj_expr_t(const lin_conj_expr_t &o) : lin_conj_expr_t(o)
  {
    eliminate_variables();
    recompute_range();
    simplify();
  }

  //-------------------------------------------------------------------------
  const char *dstr() const override
  {
    static char res[MAXSTR];

    minsn_t *ins = non_conj_term->synthesize(0, coeffs[0].size, mops);
    qsnprintf(res, sizeof(res), "%s + %s", lin_conj_expr_t::dstr(), ins->dstr());
    delete ins;
    return res;
  }

  // (1) A constant expression would lead to all variables getting eliminated by eliminate_variables,
  // so there's no need for a simplification step here.

  //-------------------------------------------------------------------------
  // (2) If F has two unique entries and its first entry is zero, we replace the nonzero element a by
  // 1, find the lookup table's entry for the corresponding truth vector and multiply the found
  // expression by a.
  bool simp_2()
  {
    if ( range.size() != 2 )
      return false;
    if ( eval_trace[0].val != 0 )
      return false;

    mcode_val_t a = range[1];

    uint64 bit_trace = eval_trace_to_bit_trace(eval_trace, a);
    auto minsn_template = bw_expr_tbl_t::instance.lookup(mops.size(), bit_trace);

    non_conj_term = non_conj_term
                  + std::make_shared<mt_constant_t>(a.val) * minsn_template;

    return reset_eval_trace();
  }

  //-------------------------------------------------------------------------
  // (3) If F has two unique entries a and b, both of them are nonzero, w.l.o.g., b = 2a mod 2^n, and
  // F's first entry is a, we can express the result in terms of a negated single expression. We
  // replace all occurrences of a by zeros and that of b by ones, find the corresponding expression
  // in the lookup table, negate it, and multiply it by -a.
  bool simp_3()
  {
    if ( range.size() != 2 )
      return false;

    mcode_val_t a = eval_trace[0];
    mcode_val_t b = range[0] == a ?
                    range[1] : range[0];

    if ( a * mcode_val_t(2, b.size) != b )
      return false;

    uint64 bit_trace = eval_trace_to_bit_trace(eval_trace, b);
    auto minsn_template = bw_expr_tbl_t::instance.lookup(mops.size(), bit_trace);

    non_conj_term = non_conj_term
                  + std::make_shared<mt_constant_t>(-a.val) * ~minsn_template;

    return reset_eval_trace();
  }

  //-------------------------------------------------------------------------
  // (4) If F has two unique entries a and b, but the previous cases do not apply, and F's very first
  // entry is a, we first identify a as the constant term. Then we find an expression with ones
  // exactly where F has the entry b in the lookup table, multiply it by b - a and add the term to
  // the constant.
  bool simp_4()
  {
    if ( range.size() != 2 )
      return false;

    mcode_val_t a = eval_trace[0];
    mcode_val_t b = range[0] == a ? range[1] : range[0];

    uint64 bit_trace = eval_trace_to_bit_trace(eval_trace, b);
    auto minsn_template = bw_expr_tbl_t::instance.lookup(mops.size(), bit_trace);

    non_conj_term = non_conj_term
                  + std::make_shared<mt_constant_t>(a.val)
                  + std::make_shared<mt_constant_t>((b-a).val) * minsn_template;

    return reset_eval_trace();
  }

  //-------------------------------------------------------------------------
  // (5) If F has two unique nonzero entries a and b and its first one is zero, we split it into two vectors
  // with ones where F has entries a or b, resp., find the corresponding expressions in the lookup
  // table, multiply them by a and b, resp., and add the terms together.
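  //
  // Worked example for (5), for illustration only (not part of the original comment).
  // It assumes two variables x and y and that F lists the evaluations in the order
  // (x,y) = (0,0), (1,0), (0,1), (1,1); the ordering actually used by eval_trace may differ.
  //   F = (0, 3, 5, 3)            unique entries {0, 3, 5}, first entry 0, so a = 3, b = 5
  //   entries equal to a: 1, 3    truth vector (0,1,0,1), i.e. x
  //   entries equal to b: 2       truth vector (0,0,1,0), i.e. ~x & y
  //   result: 3*x + 5*(~x & y)    which evaluates to 0, 3, 5, 3 on the four inputs above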
  bool simp_5()
  {
    if ( range.size() != 3 )
      return false;
    if ( eval_trace[0].val != 0ull )
      return false;

    mcode_val_t a = range[1];
    mcode_val_t b = range[2];

    uint64 a_bit_trace = eval_trace_to_bit_trace(eval_trace, a);
    uint64 b_bit_trace = eval_trace_to_bit_trace(eval_trace, b);
    auto a_minsn_template = bw_expr_tbl_t::instance.lookup(mops.size(), a_bit_trace);
    auto b_minsn_template = bw_expr_tbl_t::instance.lookup(mops.size(), b_bit_trace);

    non_conj_term = non_conj_term
                  + std::make_shared<mt_constant_t>(a.val) * a_minsn_template
                  + std::make_shared<mt_constant_t>(b.val) * b_minsn_template;

    return reset_eval_trace();
  }

  //-------------------------------------------------------------------------
  // (6) If F has three unique nonzero entries a, b and c and its first one is 0, we try to express one
  // of them as a sum of the others modulo 2^n, e.g., a = b + c. In that case we split F into two
  // vectors with ones where F has entries b or c, resp., or a, find the corresponding expressions in
  // the lookup table, multiply them by b and c, resp., and add the terms together.
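  //
  // Worked example for (6), for illustration only (not part of the original comment).
  // It assumes two variables x and y and that F lists the evaluations in the order
  // (x,y) = (0,0), (1,0), (0,1), (1,1); the ordering actually used by eval_trace may differ.
  //   F = (0, 3, 5, 8)                 8 = 3 + 5, so a = 8, b = 3, c = 5
  //   entries equal to a or b: 1, 3    truth vector (0,1,0,1), i.e. x, multiplied by b = 3
  //   entries equal to a or c: 2, 3    truth vector (0,0,1,1), i.e. y, multiplied by c = 5
  //   result: 3*x + 5*y                which evaluates to 0, 3, 5, 8 on the four inputs above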
  bool simp_6()
  {
    if ( range.size() != 4 )
      return false;
    if ( eval_trace[0].val != 0ull )
      return false;

    mcode_val_t a = range[1];
    mcode_val_t b = range[2];
    mcode_val_t c = range[3];

    // make sure that a = b + c
    if ( b == a + c )
      qswap(a, b);
    else if ( c == a + b )
      qswap(a, c);
    else if ( a != b + c )
      return false;

    QASSERT(30705, a == b + c);

    uint64 a_bit_trace = eval_trace_to_bit_trace(eval_trace, a);
    uint64 b_bit_trace = eval_trace_to_bit_trace(eval_trace, b);
    uint64 c_bit_trace = eval_trace_to_bit_trace(eval_trace, c);
    auto ab_minsn_template = bw_expr_tbl_t::instance.lookup(mops.size(), a_bit_trace | b_bit_trace);
    auto ac_minsn_template = bw_expr_tbl_t::instance.lookup(mops.size(), a_bit_trace | c_bit_trace);

    non_conj_term = non_conj_term
                  + std::make_shared<mt_constant_t>(b.val) * ab_minsn_template
                  + std::make_shared<mt_constant_t>(c.val) * ac_minsn_template;

    return reset_eval_trace();
  }

  //-------------------------------------------------------------------------
  // (7) If F has three unique nonzero entries a, b and c, its first one is 0 and the previous case does
  // not apply, we split it into three vectors with ones where F has entries a, b or c, resp., find the
  // corresponding expressions in the lookup table, multiply them by a, b and c, resp., and add the
  // terms together.
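  //
  // Worked example for (7), for illustration only (not part of the original comment).
  // It assumes two variables x and y and that F lists the evaluations in the order
  // (x,y) = (0,0), (1,0), (0,1), (1,1); the ordering actually used by eval_trace may differ.
  //   F = (0, 3, 5, 7)            no entry is the sum of the other two, so (6) does not apply
  //   entries equal to 3: 1       truth vector (0,1,0,0), i.e. x & ~y
  //   entries equal to 5: 2       truth vector (0,0,1,0), i.e. ~x & y
  //   entries equal to 7: 3       truth vector (0,0,0,1), i.e. x & y
  //   result: 3*(x & ~y) + 5*(~x & y) + 7*(x & y), which evaluates to 0, 3, 5, 7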
  bool simp_7()
  {
    if ( range.size() != 4 )
      return false;
    if ( eval_trace[0].val != 0ull )
      return false;

    mcode_val_t a = range[1];
    mcode_val_t b = range[2];
    mcode_val_t c = range[3];

    uint64 a_bit_trace = eval_trace_to_bit_trace(eval_trace, a);
    uint64 b_bit_trace = eval_trace_to_bit_trace(eval_trace, b);
    uint64 c_bit_trace = eval_trace_to_bit_trace(eval_trace, c);
    auto a_minsn_template = bw_expr_tbl_t::instance.lookup(mops.size(), a_bit_trace);
    auto b_minsn_template = bw_expr_tbl_t::instance.lookup(mops.size(), b_bit_trace);
    auto c_minsn_template = bw_expr_tbl_t::instance.lookup(mops.size(), c_bit_trace);

    non_conj_term = non_conj_term
                  + std::make_shared<mt_constant_t>(a.val) * a_minsn_template
                  + std::make_shared<mt_constant_t>(b.val) * b_minsn_template
                  + std::make_shared<mt_constant_t>(c.val) * c_minsn_template;

    return reset_eval_trace();
  }

  //-------------------------------------------------------------------------
  // (8) If F has four unique entries and its first entry a is nonzero, we move a into the
  // constant term, subtract it from every entry of F and run the simplification again.
  bool simp_8()
  {
    if ( range.size() != 4 )
      return false;
    if ( eval_trace[0].val == 0ull )
      return false;

    mcode_val_t a = eval_trace[0];

    non_conj_term = non_conj_term + std::make_shared<mt_constant_t>(a.val);

    for ( int i = 0; i < eval_trace.size(); i++ )
      eval_trace[i] = eval_trace[i] - a;
    recompute_coeffs();
    recompute_range();
    return simplify(); // start again
  }

  //-------------------------------------------------------------------------
  bool simplify()
  {
    if ( mops.size() < 1 || mops.size() > 3 )
      return false;
    if ( simp_2() )
      return true;
    if ( simp_3() )
      return true;
    if ( simp_4() )
      return true;
    if ( simp_5() )
      return true;
    if ( simp_6() )
      return true;
    if ( simp_7() )
      return true;
    if ( simp_8() )
      return true;
    return false;
  }

  //-------------------------------------------------------------------------
  minsn_t *to_minsn(ea_t ea) const override
  {
    minsn_t *res = new minsn_t(ea);
    minsn_t *l = lin_conj_expr_t::to_minsn(ea);
    minsn_t *r = non_conj_term->synthesize(ea, coeffs[0].size, mops);

    res->opcode = m_add;
    res->l.create_from_insn(l);
    res->r.create_from_insn(r);
    res->d.size = coeffs[0].size;

    delete l;
    delete r;
    return res;
  }
};

--------------------------------------------------------------------------------
/smt_convert.cpp:
--------------------------------------------------------------------------------
/*
 *      Copyright (c) 2023 by Hex-Rays, support@hex-rays.com
 *      ALL RIGHTS RESERVED.
 *
 *      gooMBA plugin for Hex-Rays Decompiler.
 *
 */

#include "z3++_no_warn.h"
#include "smt_convert.hpp"

//--------------------------------------------------------------------------
z3::expr z3_converter_t::create_new_z3_var(const mop_t &mop)
{
  const char *name = build_new_varname();
  return context.bv_const(name, mop.size * 8);
}

//--------------------------------------------------------------------------
z3::expr z3_converter_t::var_to_expr(const mop_t &mop)
{
  if ( assigned_vars.count(mop) )
    return assigned_vars.at(mop);

  // mop has not yet been assigned a z3 var, make one now
  z3::expr new_var = create_new_z3_var(mop);
  input_vars.push_back(new_var);
  assigned_vars.insert( { mop, new_var } );
  return new_var;
}

//--------------------------------------------------------------------------
z3::expr z3_converter_t::mop_to_expr(const mop_t &mop)
{
  switch ( mop.t )
  {
    case mop_n: // immediate value
      {
        int bytesz = mop.size;
        uint64_t value = mop.nnn->value;
        return context.bv_val(value, bytesz * 8); // z3 counts size in bits
      }

    case mop_d: // result of another instruction
      return minsn_to_expr(*mop.d);

    case mop_r: // register
    case mop_S: // stack variable
    case mop_v: // global variable
      {
        auto p = assigned_vars.find(mop);
        if ( p != assigned_vars.end() )
          return p->second;

        // mop has not yet been assigned a z3 var, make one now
        const char *name = build_new_varname();
        z3::expr new_var = context.bv_const(name, mop.size * 8);
        input_vars.push_back(new_var);
        assigned_vars.insert( { mop, new_var } );
        return new_var;
      }
    default:
      INTERR(30696); // it is better to check this before running z3, when detecting mba
  }
}

//--------------------------------------------------------------------------
z3::expr z3_converter_t::minsn_to_expr(const minsn_t &insn)
{
  switch ( insn.opcode )
  {
    case m_ldc: // load constant
    case m_mov: // move
      return mop_to_expr(insn.l);
    case m_neg:
      return -mop_to_expr(insn.l);
    case m_lnot:
      {
        int bitsz = insn.l.size * 8;
        z3::expr bool_res = mop_to_expr(insn.l) == context.bv_val(0, bitsz);
        // !x === (x == 0)
        return bool_to_bv(bool_res, bitsz);
      }
    case m_bnot:
      return ~mop_to_expr(insn.l);
    case m_xds: // signed extension
    case m_xdu: // unsigned (zero) extension
      {
        auto e = mop_to_expr(insn.l);
        int orig_bitsz = e.get_sort().bv_size();
        int dest_bitsz = insn.d.size * 8;
        QASSERT(30674, dest_bitsz >= orig_bitsz);
        if ( insn.opcode == m_xdu )
          return z3::zext(e, dest_bitsz - orig_bitsz);
        else
          return z3::sext(e, dest_bitsz - orig_bitsz);
      }
    case m_low:
      {
        auto dest_bitsz = insn.d.size * 8;
        return mop_to_expr(insn.l).extract(dest_bitsz - 1, 0);
      }
    case m_high:
      {
        auto src_bitsz = insn.l.size * 8;
        auto dest_bitsz = insn.d.size * 8;
        return mop_to_expr(insn.l).extract(src_bitsz - 1,
                                           src_bitsz - dest_bitsz);
      }
    case m_add:
      return mop_to_expr(insn.l) + mop_to_expr(insn.r);
    case m_sub:
      return mop_to_expr(insn.l) - mop_to_expr(insn.r);
    case m_mul:
      return mop_to_expr(insn.l) * mop_to_expr(insn.r);
    case m_udiv:
      return z3::udiv(mop_to_expr(insn.l), mop_to_expr(insn.r));
    case m_sdiv:
      return mop_to_expr(insn.l) / mop_to_expr(insn.r);
    case m_umod:
      return mop_to_expr(insn.l) % mop_to_expr(insn.r);
    case m_smod:
      return z3::smod(mop_to_expr(insn.l), mop_to_expr(insn.r));
    case m_or:
      return mop_to_expr(insn.l) | mop_to_expr(insn.r);
    case m_and:
      return mop_to_expr(insn.l) & mop_to_expr(insn.r);
    case m_xor:
      return mop_to_expr(insn.l) ^ mop_to_expr(insn.r);
    case m_shl:
      return z3::shl(
        mop_to_expr(insn.l),
        bv_zext_to_len(mop_to_expr(insn.r), insn.l.size * 8));
    case m_shr:
      return z3::lshr(
        mop_to_expr(insn.l),
        bv_zext_to_len(mop_to_expr(insn.r), insn.l.size * 8));
    case m_sar:
      return z3::ashr(
        mop_to_expr(insn.l),
        bv_zext_to_len(mop_to_expr(insn.r), insn.l.size * 8));
    case m_sets: // get sign bit of expression
      return bool_to_bv(mop_to_expr(insn.l) < 0, insn.d.size * 8);
    // TODO: m_seto, m_setp
    case m_setnz:
      return bool_to_bv(mop_to_expr(insn.l) != mop_to_expr(insn.r), insn.d.size * 8);
    case m_setz:
      return bool_to_bv(mop_to_expr(insn.l) == mop_to_expr(insn.r), insn.d.size * 8);
    case m_setae:
      return bool_to_bv(z3::uge(mop_to_expr(insn.l), mop_to_expr(insn.r)), insn.d.size * 8);
    case m_setb:
      return bool_to_bv(z3::ult(mop_to_expr(insn.l), mop_to_expr(insn.r)), insn.d.size * 8);
    case m_seta:
      return bool_to_bv(z3::ugt(mop_to_expr(insn.l), mop_to_expr(insn.r)), insn.d.size * 8);
    case m_setbe:
      return bool_to_bv(z3::ule(mop_to_expr(insn.l), mop_to_expr(insn.r)), insn.d.size * 8);
    case
m_setg: 157 | return bool_to_bv(z3::sgt(mop_to_expr(insn.l), mop_to_expr(insn.r)), insn.d.size * 8); 158 | case m_setge: 159 | return bool_to_bv(z3::sge(mop_to_expr(insn.l), mop_to_expr(insn.r)), insn.d.size * 8); 160 | case m_setl: 161 | return bool_to_bv(z3::slt(mop_to_expr(insn.l), mop_to_expr(insn.r)), insn.d.size * 8); 162 | case m_setle: 163 | return bool_to_bv(z3::sle(mop_to_expr(insn.l), mop_to_expr(insn.r)), insn.d.size * 8); 164 | default: 165 | INTERR(30697); // it is better to check this before running z3, when detecting mba 166 | } 167 | } 168 | -------------------------------------------------------------------------------- /smt_convert.hpp: -------------------------------------------------------------------------------- 1 | /* 2 | * Copyright (c) 2023 by Hex-Rays, support@hex-rays.com 3 | * ALL RIGHTS RESERVED. 4 | * 5 | * gooMBA plugin for Hex-Rays Decompiler. 6 | * 7 | */ 8 | 9 | #pragma once 10 | #include "z3++_no_warn.h" 11 | #include "mcode_emu.hpp" 12 | 13 | //------------------------------------------------------------------------- 14 | class z3_converter_t 15 | { 16 | char namebuf[12]; 17 | int next_free_varnum = 0; 18 | const char *build_new_varname() 19 | { 20 | qsnprintf(namebuf, sizeof(namebuf), "y%d", next_free_varnum++); 21 | return namebuf; 22 | } 23 | 24 | public: 25 | z3::context context; 26 | z3::expr_vector input_vars; 27 | 28 | // the next integer we can use to generate a z3 variable name 29 | std::map assigned_vars; 30 | 31 | z3_converter_t() : input_vars(context) { namebuf[0] = '\0'; } 32 | virtual ~z3_converter_t() {} 33 | 34 | // create_new_z3_var is called when var_to_expr fails to find an assigned_var in the cache 35 | virtual z3::expr create_new_z3_var(const mop_t &mop); 36 | z3::expr var_to_expr(const mop_t &mop); // for terminal mops, i.e. 
stack vars, registers, global vars 37 | z3::expr mop_to_expr(const mop_t &mop); 38 | z3::expr minsn_to_expr(const minsn_t &insn); 39 | 40 | //------------------------------------------------------------------------- 41 | z3::expr bool_to_bv(z3::expr boolean, uint bitsz) 42 | { 43 | return z3::ite(boolean, context.bv_val(1, bitsz), context.bv_val(0, bitsz)); 44 | } 45 | 46 | //------------------------------------------------------------------------- 47 | z3::expr bv_zext_to_len(z3::expr bv, uint target_bitsz) 48 | { 49 | uint orig_bitsz = bv.get_sort().bv_size(); 50 | if ( target_bitsz == orig_bitsz ) 51 | return bv; // no need to extend 52 | return z3::zext(bv, target_bitsz - orig_bitsz); 53 | } 54 | 55 | //------------------------------------------------------------------------- 56 | z3::expr bv_sext_to_len(z3::expr bv, uint target_bitsz) 57 | { 58 | uint orig_bitsz = bv.get_sort().bv_size(); 59 | if ( target_bitsz == orig_bitsz ) 60 | return bv; // no need to extend 61 | return z3::sext(bv, target_bitsz - orig_bitsz); 62 | } 63 | 64 | //------------------------------------------------------------------------- 65 | z3::expr bv_resize_to_len(z3::expr bv, uint target_bitsz, bool sext) 66 | { 67 | uint orig_bitsz = bv.get_sort().bv_size(); 68 | if ( target_bitsz == orig_bitsz ) 69 | return bv; 70 | if ( target_bitsz < orig_bitsz ) 71 | return bv.extract(target_bitsz - 1, 0); 72 | else 73 | return sext 74 | ? 
bv_sext_to_len(bv, target_bitsz) 75 | : bv_zext_to_len(bv, target_bitsz); 76 | } 77 | 78 | //------------------------------------------------------------------------- 79 | z3::expr mcode_val_to_expr(mcode_val_t v) 80 | { 81 | return context.bv_val(uint64_t(v.val), v.size * 8); 82 | } 83 | }; 84 | -------------------------------------------------------------------------------- /tests/idb/mba_challenge.i64: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/HexRaysSA/goomba/bf1e49866f3cbf605b1069f053edd9d126de1372/tests/idb/mba_challenge.i64 -------------------------------------------------------------------------------- /tests/idb/nonlinear.o.i64: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/HexRaysSA/goomba/bf1e49866f3cbf605b1069f053edd9d126de1372/tests/idb/nonlinear.o.i64 -------------------------------------------------------------------------------- /z3++_no_warn.h: -------------------------------------------------------------------------------- 1 | #pragma once 2 | // using z3++.h directly leads to compiler warnings about shadowing declaractions 3 | #ifdef __GNUC__ 4 | # pragma GCC diagnostic push 5 | # pragma GCC diagnostic ignored "-Wshadow" 6 | #endif 7 | #include 8 | #ifdef __GNUC__ 9 | # pragma GCC diagnostic pop 10 | #endif 11 | -------------------------------------------------------------------------------- /z3/readme.txt: -------------------------------------------------------------------------------- 1 | bin and include directories of the z3 build should be extracted here 2 | --------------------------------------------------------------------------------