├── .gitignore
├── BUILDING.md
├── README.md
├── bitwise_expr_lookup_tbl.cpp
├── bitwise_expr_lookup_tbl.hpp
├── consts.hpp
├── equiv_class.cpp
├── equiv_class.hpp
├── file.cpp
├── file.hpp
├── generate_oracle.bat
├── generate_oracle.sh
├── goomba.cfg
├── goomba.cpp
├── heuristics.cpp
├── heuristics.hpp
├── images
│   ├── mba1_after.png
│   └── mba1_before.png
├── lin_conj_exprs.hpp
├── linear_exprs.cpp
├── linear_exprs.hpp
├── makefile
├── mcode_emu.hpp
├── minsn_template.hpp
├── msynth_parser.cpp
├── msynth_parser.hpp
├── optimizer.cpp
├── optimizer.hpp
├── simp_lin_conj_exprs.hpp
├── smt_convert.cpp
├── smt_convert.hpp
├── tests
│   └── idb
│       ├── mba_challenge.i64
│       └── nonlinear.o.i64
├── z3++_no_warn.h
└── z3
    └── readme.txt


/.gitignore:
--------------------------------------------------------------------------------
1 | .gitignore
2 | r32.bat
3 | tests/
4 | 


--------------------------------------------------------------------------------
/BUILDING.md:
--------------------------------------------------------------------------------
 1 | 
 2 | # Building gooMBA
 3 | 
 4 | ## Dependencies
 5 | 
 6 | gooMBA requires the IDA SDK (8.2 or later) and the [z3 library](https://github.com/Z3Prover/z3).
 7 | 
 8 | ## Building
 9 | 
10 | 1. After unpacking and setting up the SDK, copy the goomba source tree under the SDK's `plugins` directory,
11 | for example `C:\idasdk_pro82\plugins\goomba`.
12 | 
13 | 2. Download and extract [z3 build for your OS](https://github.com/Z3Prover/z3/releases) into the `z3` subdirectory.
14 | 
15 | Under it, you should have `bin` and `include` directories:
16 | 
17 |     z3/bin/
18 |     z3/include/
19 | 
20 | Alternatively, set `Z3_BIN` and `Z3_INCLUDE` to point to these directories if they are located elsewhere.
21 | 
22 | 3. Build the necessary version of gooMBA, for example:
23 | 
24 | ```make -j```  for 32-bit IDA
25 | ```make __EA64__=1 -j```  for IDA64
26 | 
27 | 4. Copy generated files from SDK's bin directory to your IDA install (or [user directory](https://hex-rays.com/blog/igors-tip-of-the-week-33-idas-user-directory-idausr/)):
28 | 
29 | On Windows:
30 | 
31 |  * `C:\idasdk_pro82\bin\plugins\goomba*` -> `C:\Program Files\IDA Pro 8.2\plugins\`
32 |  * `C:\idasdk_pro82\bin\cfg\goomba.cfg` -> `C:\Program Files\IDA Pro 8.2\cfg\`
33 |  * `C:\idasdk_pro82\bin\libz3.*` -> `C:\Program Files\IDA Pro 8.2\`
34 | 
35 | On Linux:
36 | 
37 |  * `/path/to/idasdk_pro82/bin/plugins/goomba*` -> `/path/to/ida82/plugins/`
38 |  * `/path/to/idasdk_pro82/bin/cfg/goomba.cfg` -> `/path/to/ida82/cfg/`
39 |  * `/path/to/idasdk_pro82/bin/libz3.*` -> `/path/to/ida82/`
40 | 
41 | On macOS:
42 | 
43 |  * `/path/to/idasdk_pro82/bin/plugins/goomba*` -> `/path/to/ida82/ida.app/Contents/MacOS/plugins/`
44 |  * `/path/to/idasdk_pro82/bin/cfg/goomba.cfg` -> `/path/to/ida82/ida.app/Contents/MacOS/cfg/`
45 |  * `/path/to/idasdk_pro82/bin/libz3.*` -> `/path/to/ida82/ida.app/Contents/MacOS/`
46 |  * `/path/to/idasdk_pro82/bin/libz3.*` -> `/path/to/ida82/ida64.app/Contents/MacOS/`
47 | 
48 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | # gooMBA
 2 | 
 3 | gooMBA is a Hex-Rays Decompiler plugin that simplifies Mixed Boolean-Arithmetic
 4 | (MBA) expressions. It combines several heuristics and algorithms to
 5 | achieve orders-of-magnitude better performance than existing state-of-the-art
 6 | solutions.
 7 | 
 8 | More information on the inner workings of this tool is available in our [blog
 9 | post](https://hex-rays.com/blog/deobfuscation-with-goomba/).
10 | 
11 | ## Core Features
12 | - Full integration with the Hex-Rays Decompiler
13 | - Simplifies linear MBAs, including opaque predicates
14 | - Handles sign extension for linear functions
15 | - Verifies soundness of simplifications using the z3 SMT solver
16 | - Simplifies non-linear MBAs with the use of a function fingerprint oracle
17 | 
18 | ## Usage
19 | 
20 | By default, the plugin does not run automatically. You can invoke the plugin
21 | by right clicking in the pseudocode view and selecting "Run gooMBA Optimizer".
22 | In addition, you can set up a keyboard shortcut in IDA by opening Options ->
23 | Shortcuts... and adding a shortcut for the `goomba:run` action.
24 | 
25 | Several options for usage are available within `goomba.cfg`. You can set up a
26 | fingerprint oracle, configure the z3 proof timeout, choose the desired behavior when
27 | timeouts occur, and choose to make the plugin run automatically without needing
28 | to be invoked from the right-click menu.
29 | 
30 | ## Demo
31 | 
32 | The sample database `tests/idb/mba_challenge.i64` was created from the `mba_challenge` binary. The functions
33 | `mba1`, `mba2`, `mba3`, `mba`, and `solve_me` contain MBA expressions of varying complexity.
34 | 
35 | For example, the `mba1` function's initial pseudocode:
36 | ![mba1 initial pseudocode](./images/mba1_before.png)
37 | 
38 | And after running gooMBA optimization:
39 | ![mba1 pseudocode optimized](./images/mba1_after.png)
40 | 
41 | 
42 | ## Fingerprint oracle
43 | 
44 | The oracle can be used for simplifying non-linear MBAs.
45 | The input for generating it is a list of candidate expressions in [msynth](https://github.com/mrphrazer/msynth) syntax.
46 | You can use `generate_oracle.sh` or `generate_oracle.bat` to generate a binary 
47 | oracle file which can then be used by the plugin by specifying the path to it 
48 | in `goomba.cfg` (parameter `MBA_ORACLE_PATH`).
49 | 
50 | A large pre-computed oracle is available [here](https://hex-rays.com/products/ida/support/freefiles/goomba-oracle.7z)
51 | 
52 | NOTE: oracle files generated with IDA 8.2 can only be used with 64-bit binaries, otherwise you may hit internal error 30661.
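
As a rough illustration of how a fingerprint oracle can match expressions by behavior, the sketch below hashes an expression's outputs on a fixed set of random inputs, in the spirit of `compute_fingerprint_from_outputs` in `equiv_class.hpp`. The names `expr_fn_t`, `make_testcases`, and `fingerprint_of` are hypothetical and not part of the plugin, and the fixed seed is an assumption made for reproducibility; the plugin itself evaluates microcode instructions rather than C++ callables.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <random>
#include <vector>

// FNV-1a-style fold over 64-bit output words, mirroring
// compute_fingerprint_from_outputs() in equiv_class.hpp.
inline uint64_t fingerprint_from_outputs(const std::vector<uint64_t> &outputs)
{
  const uint64_t FNV_BASIS = 0xcbf29ce484222325ull;
  const uint64_t FNV_PRIME = 0x100000001b3ull;
  uint64_t sum = FNV_BASIS;
  for ( uint64_t c : outputs )
  {
    sum ^= c;
    sum *= FNV_PRIME;
  }
  return sum;
}

// Hypothetical stand-in for an expression: maps input values to one output.
using expr_fn_t = std::function<uint64_t(const std::vector<uint64_t> &)>;

// Hypothetical helper: n test cases of ninputs random 64-bit values each.
// The fixed seed is an assumption so that runs are reproducible.
inline std::vector<std::vector<uint64_t>> make_testcases(size_t n, size_t ninputs)
{
  std::mt19937_64 rng(12345);
  std::vector<std::vector<uint64_t>> tcs(n, std::vector<uint64_t>(ninputs));
  for ( auto &tc : tcs )
    for ( auto &v : tc )
      v = rng();
  return tcs;
}

// Hypothetical helper: evaluates f on every test case and hashes the outputs.
inline uint64_t fingerprint_of(
        const expr_fn_t &f,
        const std::vector<std::vector<uint64_t>> &testcases)
{
  std::vector<uint64_t> outputs;
  outputs.reserve(testcases.size());
  for ( const auto &tc : testcases )
    outputs.push_back(f(tc));
  return fingerprint_from_outputs(outputs);
}
```

Two expressions that compute the same function, such as `x + y` and `(x ^ y) + 2 * (x & y)`, produce identical output vectors and therefore identical fingerprints. A matching fingerprint is only evidence of equivalence, which is why a separate soundness check (z3, per the feature list above) is still needed.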
53 | 
54 | ## Obtaining gooMBA
55 | 
56 | Please see the [releases](https://github.com/HexRaysSA/goomba/releases) section for `goomba` builds that will work with IDA Pro & IDA Teams v8.2.
57 | 
58 | Starting with version 8.3, `goomba` is shipped with IDA Pro & IDA Teams.
59 | 


--------------------------------------------------------------------------------
/bitwise_expr_lookup_tbl.cpp:
--------------------------------------------------------------------------------
  1 | /*
  2 |  *      Copyright (c) 2023 by Hex-Rays, support@hex-rays.com
  3 |  *      ALL RIGHTS RESERVED.
  4 |  *
  5 |  *      gooMBA plugin for Hex-Rays Decompiler.
  6 |  *
  7 |  */
  8 | 
  9 | #include "z3++_no_warn.h"
 10 | #include "bitwise_expr_lookup_tbl.hpp"
 11 | 
 12 | bw_expr_tbl_t bw_expr_tbl_t::instance;
 13 | 
 14 | bw_expr_tbl_t::bw_expr_tbl_t()
 15 | {
 16 |   minsn_templates_t X;
 17 |   X.push_back(std::make_shared<mt_varref_t>(0));
 18 |   X.push_back(std::make_shared<mt_varref_t>(1));
 19 |   X.push_back(std::make_shared<mt_varref_t>(2));
 20 | 
 21 |   minsn_template_ptr_t zero = std::make_shared<mt_constant_t>(0ull);
 22 | 
 23 |   // note that all expressions are ordered by the numeric value of the instruction trace
 24 |   // see lin_conj_exprs.hpp for more info on ordering.
 25 |   auto &onevar = tbl.push_back();
 26 |   onevar.push_back(zero); // [0 0]
 27 |   onevar.push_back(X[0]); // [0 1]
 28 | 
 29 |   auto &twovar = tbl.push_back();
 30 |   twovar.push_back(zero);           // [0 0 0 0]
 31 |   twovar.push_back(X[0]&~X[1]);     // [0 1 0 0]
 32 |   twovar.push_back(~(X[0]|~X[1]));  // [0 0 1 0]
 33 |   twovar.push_back(X[0]^X[1]);      // [0 1 1 0]
 34 |   twovar.push_back(X[0]&X[1]);      // [0 0 0 1]
 35 |   twovar.push_back(X[0]);           // [0 1 0 1]
 36 |   twovar.push_back(X[1]);           // [0 0 1 1]
 37 |   twovar.push_back(X[0]|X[1]);      // [0 1 1 1]
 38 | 
 39 |   auto &threevar = tbl.push_back();
 40 |   threevar.push_back(zero);                                   // [0 0 0 0 0 0 0 0]
 41 |   threevar.push_back(~(~X[0]|(X[1]|X[2])));                   // [0 1 0 0 0 0 0 0]
 42 |   threevar.push_back(~(X[0]|(~X[1]|X[2])));                   // [0 0 1 0 0 0 0 0]
 43 |   threevar.push_back(~X[2]&(X[0]^X[1]));                      // [0 1 1 0 0 0 0 0]
 44 |   threevar.push_back(~(~X[0]|(~X[1]|X[2])));                  // [0 0 0 1 0 0 0 0]
 45 |   threevar.push_back(X[0]&~X[2]);                             // [0 1 0 1 0 0 0 0]
 46 |   threevar.push_back(X[1]&~X[2]);                             // [0 0 1 1 0 0 0 0]
 47 |   threevar.push_back(X[2]^(X[0]|(X[1]|X[2])));                // [0 1 1 1 0 0 0 0]
 48 |   threevar.push_back(~X[0]&(~X[1]&X[2]));                     // [0 0 0 0 1 0 0 0]
 49 |   threevar.push_back(~X[1]&(X[0]^X[2]));                      // [0 1 0 0 1 0 0 0]
 50 |   threevar.push_back(~X[0]&(X[1]^X[2]));                      // [0 0 1 0 1 0 0 0]
 51 |   threevar.push_back(~(X[0]&X[1])&(X[0]^(X[1]^X[2])));        // [0 1 1 0 1 0 0 0]
 52 |   threevar.push_back(~(X[0]^X[1])&(X[0]^X[2]));               // [0 0 0 1 1 0 0 0]
 53 |   threevar.push_back(X[2]^(X[0]|(X[1]&X[2])));                // [0 1 0 1 1 0 0 0]
 54 |   threevar.push_back(~(X[0]&~X[1])&(X[1]^X[2]));              // [0 0 1 1 1 0 0 0]
 55 |   threevar.push_back(X[2]^(X[0]|X[1]));                       // [0 1 1 1 1 0 0 0]
 56 |   threevar.push_back(X[0]&(~X[1]&X[2]));                      // [0 0 0 0 0 1 0 0]
 57 |   threevar.push_back(X[0]&~X[1]);                             // [0 1 0 0 0 1 0 0]
 58 |   threevar.push_back((X[0]^X[1])&~(X[0]^X[2]));               // [0 0 1 0 0 1 0 0]
 59 |   threevar.push_back(X[1]^(X[0]|(X[1]&X[2])));                // [0 1 1 0 0 1 0 0]
 60 |   threevar.push_back(X[0]&(X[1]^X[2]));                       // [0 0 0 1 0 1 0 0]
 61 |   threevar.push_back(~(~X[0]|(X[1]&X[2])));                   // [0 1 0 1 0 1 0 0]
 62 |   threevar.push_back((X[0]|X[1])&(X[1]^X[2]));                // [0 0 1 1 0 1 0 0]
 63 |   threevar.push_back((X[0]&X[1])^~(X[0]^(~X[1]|X[2])));       // [0 1 1 1 0 1 0 0]
 64 |   threevar.push_back(~(X[1]|~X[2]));                          // [0 0 0 0 1 1 0 0]
 65 |   threevar.push_back(X[1]^(X[0]|(X[1]|X[2])));                // [0 1 0 0 1 1 0 0]
 66 |   threevar.push_back(~(X[0]&X[1])&(X[1]^X[2]));               // [0 0 1 0 1 1 0 0]
 67 |   threevar.push_back(X[1]^(X[0]|X[2]));                       // [0 1 1 0 1 1 0 0]
 68 |   threevar.push_back((X[0]|~X[1])&(X[1]^X[2]));               // [0 0 0 1 1 1 0 0]
 69 |   threevar.push_back((X[0]&X[2])^(X[0]^(~X[1]&X[2])));        // [0 1 0 1 1 1 0 0]
 70 |   threevar.push_back(X[1]^X[2]);                              // [0 0 1 1 1 1 0 0]
 71 |   threevar.push_back((X[0]&~X[1])|(X[1]^X[2]));               // [0 1 1 1 1 1 0 0]
 72 |   threevar.push_back(~X[0]&(X[1]&X[2]));                      // [0 0 0 0 0 0 1 0]
 73 |   threevar.push_back((X[0]^X[1])&(X[0]^X[2]));                // [0 1 0 0 0 0 1 0]
 74 |   threevar.push_back(~(X[0]|~X[1]));                          // [0 0 1 0 0 0 1 0]
 75 |   threevar.push_back(X[1]^~(~X[0]|(~X[1]&X[2])));             // [0 1 1 0 0 0 1 0]
 76 |   threevar.push_back(X[1]&(X[0]^X[2]));                       // [0 0 0 1 0 0 1 0]
 77 |   threevar.push_back(X[2]^(X[0]|(~X[1]&X[2])));               // [0 1 0 1 0 0 1 0]
 78 |   threevar.push_back(X[1]&~(X[0]&X[2]));                      // [0 0 1 1 0 0 1 0]
 79 |   threevar.push_back(X[1]^~(~X[0]|(X[1]^X[2])));              // [0 1 1 1 0 0 1 0]
 80 |   threevar.push_back(~(X[0]|~X[2]));                          // [0 0 0 0 1 0 1 0]
 81 |   threevar.push_back(X[2]^(X[0]&(~X[1]|X[2])));               // [0 1 0 0 1 0 1 0]
 82 |   threevar.push_back(~X[0]&(X[1]|X[2]));                      // [0 0 1 0 1 0 1 0]
 83 |   threevar.push_back(X[0]^(X[1]|X[2]));                       // [0 1 1 0 1 0 1 0]
 84 |   threevar.push_back(X[2]^(X[0]&(X[1]|X[2])));                // [0 0 0 1 1 0 1 0]
 85 |   threevar.push_back(X[0]^X[2]);                              // [0 1 0 1 1 0 1 0]
 86 |   threevar.push_back((X[0]&X[2])^(X[1]|X[2]));                // [0 0 1 1 1 0 1 0]
 87 |   threevar.push_back(X[2]^~(~X[0]&(~X[1]|X[2])));             // [0 1 1 1 1 0 1 0]
 88 |   threevar.push_back(X[2]&(X[0]^X[1]));                       // [0 0 0 0 0 1 1 0]
 89 |   threevar.push_back(X[1]^~(~X[0]&(~X[1]|X[2])));             // [0 1 0 0 0 1 1 0]
 90 |   threevar.push_back(X[1]^(X[0]&(X[1]|X[2])));                // [0 0 1 0 0 1 1 0]
 91 |   threevar.push_back(X[0]^X[1]);                              // [0 1 1 0 0 1 1 0]
 92 |   threevar.push_back((X[0]|X[1])&~(X[0]^(X[1]^X[2])));        // [0 0 0 1 0 1 1 0]
 93 |   threevar.push_back(X[0]^(X[1]&X[2]));                       // [0 1 0 1 0 1 1 0]
 94 |   threevar.push_back(X[1]^(X[0]&X[2]));                       // [0 0 1 1 0 1 1 0]
 95 |   threevar.push_back(X[1]^(X[0]&(~X[1]|X[2])));               // [0 1 1 1 0 1 1 0]
 96 |   threevar.push_back(X[2]&~(X[0]&X[1]));                      // [0 0 0 0 1 1 1 0]
 97 |   threevar.push_back(X[1]^(X[0]|(X[1]^X[2])));                // [0 1 0 0 1 1 1 0]
 98 |   threevar.push_back((X[0]&X[1])^(X[1]|X[2]));                // [0 0 1 0 1 1 1 0]
 99 |   threevar.push_back(X[1]^(X[0]|(~X[1]&X[2])));               // [0 1 1 0 1 1 1 0]
100 |   threevar.push_back(X[2]^(X[0]&X[1]));                       // [0 0 0 1 1 1 1 0]
101 |   threevar.push_back(X[2]^~(~X[0]|(~X[1]&X[2])));             // [0 1 0 1 1 1 1 0]
102 |   threevar.push_back(~(X[0]|~X[1])|(X[1]^X[2]));              // [0 0 1 1 1 1 1 0]
103 |   threevar.push_back((X[0]^X[1])|(X[0]^X[2]));                // [0 1 1 1 1 1 1 0]
104 |   threevar.push_back(X[0]&(X[1]&X[2]));                       // [0 0 0 0 0 0 0 1]
105 |   threevar.push_back(~(~X[0]|(X[1]^X[2])));                   // [0 1 0 0 0 0 0 1]
106 |   threevar.push_back(X[1]&~(X[0]^X[2]));                      // [0 0 1 0 0 0 0 1]
107 |   threevar.push_back((X[0]|X[1])&(X[0]^(X[1]^X[2])));         // [0 1 1 0 0 0 0 1]
108 |   threevar.push_back(X[0]&X[1]);                              // [0 0 0 1 0 0 0 1]
109 |   threevar.push_back(~(~X[0]|(~X[1]&X[2])));                  // [0 1 0 1 0 0 0 1]
110 |   threevar.push_back(X[1]&(X[0]|~X[2]));                      // [0 0 1 1 0 0 0 1]
111 |   threevar.push_back((X[1]&~X[2])|~(~X[0]|(~X[1]&X[2])));     // [0 1 1 1 0 0 0 1]
112 |   threevar.push_back(X[2]&~(X[0]^X[1]));                      // [0 0 0 0 1 0 0 1]
113 |   threevar.push_back((X[0]|~X[1])&(X[0]^(X[1]^X[2])));        // [0 1 0 0 1 0 0 1]
114 |   threevar.push_back(~(X[0]&~X[1])&(X[0]^(X[1]^X[2])));       // [0 0 1 0 1 0 0 1]
115 |   threevar.push_back(X[0]^(X[1]^X[2]));                       // [0 1 1 0 1 0 0 1]
116 |   threevar.push_back(X[1]^(~X[0]&(X[1]|X[2])));               // [0 0 0 1 1 0 0 1]
117 |   threevar.push_back(X[0]^(~X[1]&X[2]));                      // [0 1 0 1 1 0 0 1]
118 |   threevar.push_back(X[1]^~(X[0]|~X[2]));                     // [0 0 1 1 1 0 0 1]
119 |   threevar.push_back((X[0]&X[1])|(X[0]^(X[1]^X[2])));         // [0 1 1 1 1 0 0 1]
120 |   threevar.push_back(X[0]&X[2]);                              // [0 0 0 0 0 1 0 1]
121 |   threevar.push_back(X[0]&(~X[1]|X[2]));                      // [0 1 0 0 0 1 0 1]
122 |   threevar.push_back(X[2]^(~X[0]&(X[1]|X[2])));               // [0 0 1 0 0 1 0 1]
123 |   threevar.push_back(~(X[0]^(~X[1]|X[2])));                   // [0 1 1 0 0 1 0 1]
124 |   threevar.push_back(X[0]&(X[1]|X[2]));                       // [0 0 0 1 0 1 0 1]
125 |   threevar.push_back(X[0]);                                   // [0 1 0 1 0 1 0 1]
126 |   threevar.push_back((X[0]&X[2])|(X[1]&~X[2]));               // [0 0 1 1 0 1 0 1]
127 |   threevar.push_back(~(~X[0]&(~X[1]|X[2])));                  // [0 1 1 1 0 1 0 1]
128 |   threevar.push_back(X[2]&(X[0]|~X[1]));                      // [0 0 0 0 1 1 0 1]
129 |   threevar.push_back((X[1]&~X[2])^(X[0]|(X[1]^X[2])));        // [0 1 0 0 1 1 0 1]
130 |   threevar.push_back(X[2]^~(X[0]|~X[1]));                     // [0 0 1 0 1 1 0 1]
131 |   threevar.push_back((X[0]&~X[1])|(X[0]^(X[1]^X[2])));        // [0 1 1 0 1 1 0 1]
132 |   threevar.push_back((X[0]&X[1])|~(X[1]|~X[2]));              // [0 0 0 1 1 1 0 1]
133 |   threevar.push_back(X[0]|(~X[1]&X[2]));                      // [0 1 0 1 1 1 0 1]
134 |   threevar.push_back((X[0]&X[1])|(X[1]^X[2]));                // [0 0 1 1 1 1 0 1]
135 |   threevar.push_back(X[0]|(X[1]^X[2]));                       // [0 1 1 1 1 1 0 1]
136 |   threevar.push_back(X[1]&X[2]);                              // [0 0 0 0 0 0 1 1]
137 |   threevar.push_back((X[0]|X[1])&~(X[1]^X[2]));               // [0 1 0 0 0 0 1 1]
138 |   threevar.push_back(X[1]&~(X[0]&~X[2]));                     // [0 0 1 0 0 0 1 1]
139 |   threevar.push_back(X[1]^(X[0]&~X[2]));                      // [0 1 1 0 0 0 1 1]
140 |   threevar.push_back(X[1]&(X[0]|X[2]));                       // [0 0 0 1 0 0 1 1]
141 |   threevar.push_back((X[0]&X[2])^(X[0]^(X[1]&X[2])));         // [0 1 0 1 0 0 1 1]
142 |   threevar.push_back(X[1]);                                   // [0 0 1 1 0 0 1 1]
143 |   threevar.push_back(X[1]|(X[0]&~X[2]));                      // [0 1 1 1 0 0 1 1]
144 |   threevar.push_back(X[2]&~(X[0]&~X[1]));                     // [0 0 0 0 1 0 1 1]
145 |   threevar.push_back(X[2]^(X[0]&~X[1]));                      // [0 1 0 0 1 0 1 1]
146 |   threevar.push_back((X[1]&X[2])|(~X[0]&(X[1]|X[2])));        // [0 0 1 0 1 0 1 1]
147 |   threevar.push_back(~(X[0]|~X[1])|(X[0]^(X[1]^X[2])));       // [0 1 1 0 1 0 1 1]
148 |   threevar.push_back(X[1]^(~X[0]&(X[1]^X[2])));               // [0 0 0 1 1 0 1 1]
149 |   threevar.push_back(X[2]^~(~X[0]|(X[1]&X[2])));              // [0 1 0 1 1 0 1 1]
150 |   threevar.push_back(X[1]|~(X[0]|~X[2]));                     // [0 0 1 1 1 0 1 1]
151 |   threevar.push_back(X[1]|(X[0]^X[2]));                       // [0 1 1 1 1 0 1 1]
152 |   threevar.push_back(X[2]&(X[0]|X[1]));                       // [0 0 0 0 0 1 1 1]
153 |   threevar.push_back((X[0]&X[1])^(X[0]^(X[1]&X[2])));         // [0 1 0 0 0 1 1 1]
154 |   threevar.push_back(X[1]^(X[0]&(X[1]^X[2])));                // [0 0 1 0 0 1 1 1]
155 |   threevar.push_back(X[1]^~(~X[0]|(X[1]&X[2])));              // [0 1 1 0 0 1 1 1]
156 |   threevar.push_back((X[1]&X[2])|(X[0]&(X[1]|X[2])));         // [0 0 0 1 0 1 1 1]
157 |   threevar.push_back(X[0]|(X[1]&X[2]));                       // [0 1 0 1 0 1 1 1]
158 |   threevar.push_back(X[1]|(X[0]&X[2]));                       // [0 0 1 1 0 1 1 1]
159 |   threevar.push_back(X[0]|X[1]);                              // [0 1 1 1 0 1 1 1]
160 |   threevar.push_back(X[2]);                                   // [0 0 0 0 1 1 1 1]
161 |   threevar.push_back(X[2]|(X[0]&~X[1]));                      // [0 1 0 0 1 1 1 1]
162 |   threevar.push_back(X[2]|~(X[0]|~X[1]));                     // [0 0 1 0 1 1 1 1]
163 |   threevar.push_back(X[2]|(X[0]^X[1]));                       // [0 1 1 0 1 1 1 1]
164 |   threevar.push_back(X[2]|(X[0]&X[1]));                       // [0 0 0 1 1 1 1 1]
165 |   threevar.push_back(X[0]|X[2]);                              // [0 1 0 1 1 1 1 1]
166 |   threevar.push_back(X[1]|X[2]);                              // [0 0 1 1 1 1 1 1]
167 |   threevar.push_back(X[0]|(X[1]|X[2]));                       // [0 1 1 1 1 1 1 1]
168 | }
169 | 


--------------------------------------------------------------------------------
/bitwise_expr_lookup_tbl.hpp:
--------------------------------------------------------------------------------
 1 | /*
 2 |  *      Copyright (c) 2023 by Hex-Rays, support@hex-rays.com
 3 |  *      ALL RIGHTS RESERVED.
 4 |  *
 5 |  *      gooMBA plugin for Hex-Rays Decompiler.
 6 |  *
 7 |  */
 8 | 
 9 | #pragma once
10 | #include "minsn_template.hpp"
11 | 
12 | // bw_expr_tbl_t is a singleton class that maintains a lookup table mapping
13 | // boolean function evaluation traces (i.e. I/O behavior) to the shortest
14 | // representation of each boolean function.
15 | // for instance, if you found a boolean function f(x, y) with the following
16 | // behavior: f(0, 0) = 0, f(0, 1) = 0, f(1, 0) = 0, f(1, 1) = 1, then you
17 | // can query this object to find that f(x, y) = x & y.
18 | // note that we do not consider any functions that return 1 on the all-zeros
19 | // input.
20 | class bw_expr_tbl_t
21 | {
22 |   qvector<minsn_templates_t> tbl;
23 | 
24 | public:
25 |   static bw_expr_tbl_t instance;
26 | 
27 |   // do not call directly, use instance instead
28 |   bw_expr_tbl_t();
29 | 
30 |   // bit_trace is a bitmap whose i'th bit contains the
31 |   // boolean function's evaluation on the i'th conjunction,
32 |   // where conjunctions are ordered in the same way as in lin_conj_exprs.hpp
33 |   minsn_template_ptr_t lookup(int nvars, uint64_t bit_trace)
34 |   {
35 |     QASSERT(30698, (bit_trace & 1) == 0);
36 |     QASSERT(30699, nvars <= 3);
37 |     QASSERT(30700, nvars >= 1);
38 |     QASSERT(30701, bit_trace < (1ull << (1ull << (nvars))));
39 |     return tbl[nvars-1][bit_trace >> 1];
40 |     // since the 0th conjunction is never considered, all vector indices are
41 |     // divided by 2. See the corresponding .cpp file for more info.
42 |   }
43 | };
44 | 
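
The indexing used by `lookup` can be illustrated with a small standalone sketch. It assumes that bit `i` of the trace holds the function's value on the input where variable `j` equals bit `j` of `i`; this ordering is inferred from the bracketed traces in `bitwise_expr_lookup_tbl.cpp`, since `lin_conj_exprs.hpp` is not shown here. The names `bool_fn_t`, `compute_bit_trace`, and `table_index` are hypothetical.

```cpp
#include <cstdint>
#include <functional>

// Hypothetical stand-in for a boolean function of up to three variables.
// Functions of fewer variables simply ignore the trailing arguments.
using bool_fn_t = std::function<bool(bool, bool, bool)>;

// Builds the evaluation trace: bit i is f evaluated on input i, where
// bit j of i is the value of variable j (assumed ordering, see above).
inline uint64_t compute_bit_trace(int nvars, const bool_fn_t &f)
{
  uint64_t trace = 0;
  for ( uint64_t i = 0; i < (1ull << nvars); i++ )
  {
    bool x0 = (i & 1) != 0;
    bool x1 = (i & 2) != 0;
    bool x2 = (i & 4) != 0;
    if ( f(x0, x1, x2) )
      trace |= 1ull << i;
  }
  return trace;
}

// Bit 0 (the all-zeros input) is always clear for the functions stored in
// the table, so it is dropped to obtain the index within tbl[nvars-1].
inline uint64_t table_index(uint64_t bit_trace)
{
  return bit_trace >> 1;
}
```

Under this assumption, `x & y` with two variables has trace `0b1000` and lands at index 4, which is where `X[0]&X[1]` sits in `twovar`.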


--------------------------------------------------------------------------------
/consts.hpp:
--------------------------------------------------------------------------------
 1 | /*
 2 |  *      Copyright (c) 2023 by Hex-Rays, support@hex-rays.com
 3 |  *      ALL RIGHTS RESERVED.
 4 |  *
 5 |  *      gooMBA plugin for Hex-Rays Decompiler.
 6 |  *
 7 |  */
 8 | 
 9 | #pragma once
10 | #define ACTION_NAME "goomba:run"
11 | // Z3_TIMEOUT_MS defines the amount of time we allow the z3 theorem prover to
12 | // take to prove any given statement
13 | #define Z3_TIMEOUT_MS 1000
14 | 
15 | // Only used for *generating* oracles: how many test cases to run against each
16 | // function to generate fingerprints. Note that an existing oracle will report
17 | // its own number, and the below constant will not be used
18 | #define TCS_PER_EQUIV_CLASS 128
19 | // The number of inputs used when evaluating functions for fingerprinting
20 | #define CANDIDATE_EXPR_NUMINPUTS 5
21 | // The maximum number of candidates to consider which have the same fingerprint
22 | // as the expression being simplified
23 | #define EQUIV_CLASS_MAX_CANDIDATES 10
24 | // The maximum number of fingerprints to consider for each expression being
25 | // simplified -- this number is greater than one since we consider every
26 | // possible assignment of input variables
27 | #define EQUIV_CLASS_MAX_FINGERPRINTS 50


--------------------------------------------------------------------------------
/equiv_class.cpp:
--------------------------------------------------------------------------------
  1 | /*
  2 |  *      Copyright (c) 2023 by Hex-Rays, support@hex-rays.com
  3 |  *      ALL RIGHTS RESERVED.
  4 |  *
  5 |  *      gooMBA plugin for Hex-Rays Decompiler.
  6 |  *
  7 |  */
  8 | 
  9 | #include "z3++_no_warn.h"
 10 | #include "equiv_class.hpp"
 11 | #include "optimizer.hpp"
 12 | 
 13 | 
 14 | //-------------------------------------------------------------------------
 15 | // replaces all references to abstract mop_l's with variables from new_vars
 16 | minsn_t *make_concrete_minsn(ea_t ea, const minsn_t &minsn, const mopvec_t &new_vars, int newsz)
 17 | {
 18 |   struct mop_reassigner_t : public mop_visitor_t
 19 |   {
 20 |     const mopvec_t &new_vars;
 21 |     ea_t ea;
 22 |     mop_reassigner_t(ea_t e, const mopvec_t &nm)
 23 |       : new_vars(nm), ea(e) {}
 24 |     int idaapi visit_mop(mop_t *op, const tinfo_t *, bool)
 25 |     {
 26 |       if ( op->t == mop_l )
 27 |       {
 28 |         int idx = op->l->idx;
 29 |         if ( idx >= new_vars.size() )
 30 |           return -1;
 31 |         op->t = mop_d;
 32 |         op->d = resize_mop(ea, new_vars.at(idx), op->size, false);
 33 |       }
 34 |       return 0;
 35 |     }
 36 |   };
 37 | 
 38 |   minsn_t *res = nullptr;
 39 |   minsn_t *copy = new minsn_t(minsn);
 40 | 
 41 |   mop_reassigner_t mr(ea, new_vars);
 42 |   int code = copy->for_all_ops(mr);
 43 |   if ( code >= 0 )
 44 |   {
 45 |     copy->setaddr(ea);
 46 | 
 47 |     // resize res to the correct output size
 48 |     mop_t res_mop;
 49 |     res_mop.create_from_insn(copy);
 50 |     res = resize_mop(ea, res_mop, newsz, false);
 51 |   }
 52 |   delete copy;
 53 |   return res;
 54 | }
 55 | 
 56 | //-------------------------------------------------------------------------
 57 | static void create_var_mapping(var_mapping_t &dest, const mopvec_t &mops)
 58 | {
 59 |   for ( size_t i = 0; i < mops.size(); i++ )
 60 |     dest.insert( { mops[i], i } );
 61 | }
 62 | 
 63 | //-------------------------------------------------------------------------
 64 | void equiv_class_finder_t::find_candidates(minsn_set_t &dest, const minsn_t &insn)
 65 | {
 66 |   std::set<func_fingerprint_t> seen;
 67 |   int num_fingerprints = 0; // includes duplicate fingerprints
 68 |   int num_candidates = 0;
 69 | 
 70 |   mopvec_t input_mops = get_input_mops(insn);
 71 |   do
 72 |   {
 73 |     var_mapping_t mapping;
 74 |     create_var_mapping(mapping, input_mops);
 75 | 
 76 |     func_fingerprint_t fingerprint = compute_fingerprint(insn, &mapping);
 77 |     msg("Computed fingerprint %" FMT_64 "x\n", fingerprint);
 78 | 
 79 |     num_fingerprints++;
 80 |     if ( num_fingerprints > EQUIV_CLASS_MAX_FINGERPRINTS )
 81 |       break;
 82 | 
 83 |     if ( !seen.insert(fingerprint).second )
 84 |       continue; // already seen
 85 | 
 86 |     const minsn_set_t *equiv_class = find_equiv_class(fingerprint);
 87 |     if ( equiv_class != nullptr )
 88 |     {
 89 |       for ( const auto &mi : *equiv_class )
 90 |       {
 91 |         num_candidates++;
 92 | //        msg("Fingerprint matches: %s\n", mi->dstr());
 93 |         minsn_t *concrete = make_concrete_minsn(insn.ea, *mi, input_mops, insn.d.size);
 94 | 
 95 |         if ( concrete != nullptr )
 96 |           dest.insert(concrete);
 97 | 
 98 |         if ( num_candidates >= EQUIV_CLASS_MAX_CANDIDATES )
 99 |           break;
100 |       }
101 |     }
102 | 
103 |   } while ( std::next_permutation(input_mops.begin(), input_mops.end()) );
104 | }
105 | 
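
The permutation loop in `find_candidates` can be sketched without the Hex-Rays SDK. The version below is a simplified illustration, not the plugin's code: expressions are plain C++ callables, the oracle is a map from fingerprint to a descriptive string, and `toy_fingerprint` and `find_candidate_names` are hypothetical names. It starts from a sorted ordering so that `std::next_permutation` visits every assignment of variables to input slots.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <set>
#include <string>
#include <vector>

using inputs_t = std::vector<uint64_t>;
using expr_fn_t = std::function<uint64_t(const inputs_t &)>;

// Toy fingerprint: FNV-1a-style fold over the outputs on fixed test cases.
inline uint64_t toy_fingerprint(const expr_fn_t &f, const std::vector<inputs_t> &tcs)
{
  uint64_t sum = 0xcbf29ce484222325ull;
  for ( const auto &tc : tcs )
  {
    sum ^= f(tc);
    sum *= 0x100000001b3ull;
  }
  return sum;
}

// Tries every assignment of the target's variables to abstract input slots,
// skipping fingerprints already seen and stopping after max_fingerprints.
inline std::vector<std::string> find_candidate_names(
        const expr_fn_t &target,
        size_t nvars,
        const std::map<uint64_t, std::string> &oracle,
        const std::vector<inputs_t> &tcs,
        size_t max_fingerprints)
{
  std::vector<size_t> order(nvars);
  for ( size_t i = 0; i < nvars; i++ )
    order[i] = i; // sorted start: next_permutation covers all orderings

  std::set<uint64_t> seen;
  std::vector<std::string> found;
  size_t tried = 0;
  do
  {
    if ( ++tried > max_fingerprints )
      break;

    // variable i of the target reads test-case slot order[i]
    expr_fn_t remapped = [&](const inputs_t &tc)
    {
      inputs_t permuted(nvars);
      for ( size_t i = 0; i < nvars; i++ )
        permuted[i] = tc[order[i]];
      return target(permuted);
    };

    uint64_t fp = toy_fingerprint(remapped, tcs);
    if ( !seen.insert(fp).second )
      continue; // already seen; the loop condition still advances the order

    auto it = oracle.find(fp);
    if ( it != oracle.end() )
      found.push_back(it->second);
  } while ( std::next_permutation(order.begin(), order.end()) );
  return found;
}
```

Trying permutations matters for non-commutative candidates: an obfuscated form of `y - x` only matches an oracle entry recorded as `v0 - v1` once the two variables are swapped.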


--------------------------------------------------------------------------------
/equiv_class.hpp:
--------------------------------------------------------------------------------
  1 | /*
  2 |  *      Copyright (c) 2023 by Hex-Rays, support@hex-rays.com
  3 |  *      ALL RIGHTS RESERVED.
  4 |  *
  5 |  *      gooMBA plugin for Hex-Rays Decompiler.
  6 |  *
  7 |  */
  8 | 
  9 | #pragma once
 10 | #include <hexrays.hpp>
 11 | #include "msynth_parser.hpp"
 12 | #include "heuristics.hpp"
 13 | #include "linear_exprs.hpp"
 14 | #include "consts.hpp"
 15 | 
 16 | struct minsn_with_mapping_t;
 17 | 
 18 | typedef std::set<minsn_t*, minsn_complexity_cmptr_t> minsn_set_t;
 19 | 
 20 | typedef qvector<uint64> output_behavior_t;
 21 | typedef qvector<uint64> testcase_t;
 22 | typedef std::map<mop_t, int> var_mapping_t;
 23 | typedef uint64 func_fingerprint_t;
 24 | typedef std::map<func_fingerprint_t, minsn_set_t> equiv_class_map_t;
 25 | 
 26 | #define CHECK_SERIALIZATION_CONSISTENCY true
 27 | 
 28 | //-------------------------------------------------------------------------
 29 | // output behavior is summarized as a list of uint64's, each corresponding to a test case
 30 | inline func_fingerprint_t compute_fingerprint_from_outputs(const output_behavior_t &outputs)
 31 | {
 32 |   // FNV-1a, as per Wikipedia
 33 |   const uint64 FNV_BASIS = 0xcbf29ce484222325;
 34 |   const uint64 FNV_PRIME = 0x100000001b3;
 35 |   uint64 sum = FNV_BASIS;
 36 |   for ( uint64 c : outputs )
 37 |   {
 38 |     sum ^= c;
 39 |     sum *= FNV_PRIME;
 40 |   }
 41 |   return sum;
 42 | }
 43 | 
 44 | //-------------------------------------------------------------------------
 45 | inline void gen_testcase(testcase_t *tc)
 46 | {
 47 |   tc->resize(CANDIDATE_EXPR_NUMINPUTS);
 48 |   for ( auto &v : *tc )
 49 |     v = gen_rand_mcode_val(8).val;
 50 | }
 51 | 
 52 | //-------------------------------------------------------------------------
 53 | class equiv_class_finder_t
 54 | {
 55 | public:
 56 |   equiv_class_map_t equiv_classes;
 57 |   qvector<testcase_t> testcases;
 58 | 
 59 |   //-------------------------------------------------------------------------
 60 |   // helper_emu_t evaluates expressions for a given test case and variable mapping
 61 |   struct helper_emu_t : public mcode_emulator_t
 62 |   {
 63 |     const testcase_t &tc;
 64 |     const var_mapping_t *var_mapping; // maps variables to input index
 65 |     // assigning a nullptr var_mapping indicates that the indexing should be done
 66 |     // according to the abstract mop's self-declared index
 67 | 
 68 |     helper_emu_t (const testcase_t &t, const var_mapping_t *vm)
 69 |       : tc(t), var_mapping(vm) {}
 70 | 
 71 |     virtual mcode_val_t get_var_val(const mop_t &mop) override
 72 |     {
 73 |       if ( var_mapping == nullptr )
 74 |       {
 75 |         // the instruction must be abstract, get the index from the mop itself
 76 |         QASSERT(30706, mop.t == mop_l);
 77 |         return mcode_val_t(tc[mop.l->idx], mop.size);
 78 |       }
 79 |       return mcode_val_t(tc.at(var_mapping->at(mop)), mop.size);
 80 |     }
 81 |   };
 82 | 
 83 |   virtual ~equiv_class_finder_t() {}
 84 | 
 85 |   //-------------------------------------------------------------------------
 86 |   equiv_class_finder_t()
 87 |   {
 88 |     testcases.resize(TCS_PER_EQUIV_CLASS);
 89 |     for ( auto &tc : testcases )
 90 |       gen_testcase(&tc);
 91 |   }
 92 | 
 93 |   //-------------------------------------------------------------------------
 94 |   // mapping = nullptr means the instruction is abstract (all terminal mops
 95 |   // have type mop_l), and mop indices will be retrieved by querying mop.l->idx
 96 |   func_fingerprint_t compute_fingerprint(
 97 |         const minsn_t &ins,
 98 |         const var_mapping_t *mapping = nullptr)
 99 |   {
100 |     output_behavior_t res;
101 |     res.reserve(testcases.size());
102 |     for ( const auto &tc : testcases )
103 |     {
104 |       helper_emu_t emu(tc, mapping);
105 |       res.push_back(emu.minsn_value(ins).val);
106 |     }
107 |     return compute_fingerprint_from_outputs(res);
108 |   }
109 | 
110 |   //-------------------------------------------------------------------------
111 |   func_fingerprint_t compute_fingerprint_from_serialization(
112 |         uchar *buf, uint32 sz,
113 |         int version = -1,
114 |         const var_mapping_t *mapping = nullptr)
115 |   {
116 |     if ( version == -1 ) // use current serialization version
117 |     {
118 |       bytevec_t bv;
119 |       version = minsn_t(0).serialize(&bv);
120 |     }
121 |     minsn_t minsn(0);
122 |     minsn.deserialize(buf, sz, version);
123 | 
124 |     return compute_fingerprint(minsn, mapping);
125 |   }
126 | 
127 |   //-------------------------------------------------------------------------
128 |   // computes the fingerprint of the abstract minsn and adds it to the index
129 |   void add_abstract_minsn(minsn_t *ins)
130 |   {
131 |     auto fingerprint = compute_fingerprint(*ins);
132 |     auto it = equiv_classes.find(fingerprint);
133 |     if ( it != equiv_classes.end() )
134 |     {
135 |       // check if semantically equivalent expression already exists
136 |       for ( const auto &o : it->second )
137 |         if ( probably_equivalent(*o, *ins) )
138 |           return;
139 |       it->second.insert(ins);
140 |     }
141 |     else
142 |     {
143 |       minsn_set_t new_entry;
144 |       new_entry.insert(ins);
145 |       equiv_classes.insert( { fingerprint, new_entry } );
146 |     }
147 |   }
148 | 
149 |   //-------------------------------------------------------------------------
150 |   virtual const minsn_set_t *find_equiv_class(func_fingerprint_t fingerprint)
151 |   {
152 |     auto p = equiv_classes.find(fingerprint);
153 |     if ( p != equiv_classes.end() )
154 |       return &p->second;
155 |     return nullptr;
156 |   }
157 | 
158 |   //-------------------------------------------------------------------------
159 |   // find candidate minsns that match the fingerprint of the given minsn
160 |   // before being added, these are made concrete -- the abstract mop_l's are
161 |   // replaced by real mops from the input insn
162 |   void find_candidates(minsn_set_t &dest, const minsn_t &insn);
163 | };
164 | 
165 | //-------------------------------------------------------------------------
166 | struct equiv_class_idx_entry_t
167 | {
168 |   func_fingerprint_t fingerprint;
169 |   uint64_t offset;
170 |   // offset relative to the beginning of where minsns are stored within the oracle file
171 | 
172 |   bool operator<(const equiv_class_idx_entry_t &o) const
173 |   {
174 |     return fingerprint < o.fingerprint;
175 |   }
176 | };
177 | 
178 | //-------------------------------------------------------------------------
179 | struct equiv_class_idx_t
180 | {
181 |   qvector<equiv_class_idx_entry_t> index;
182 | 
183 |   //-------------------------------------------------------------------------
184 |   void read_from_file(FILE *file)
185 |   {
186 |     uint32 idx_sz;
187 |     if ( qfread(file, &idx_sz, sizeof(idx_sz)) != sizeof(idx_sz) )
188 |       INTERR(30719);
189 |     CASSERT(sizeof(equiv_class_idx_entry_t) == 16);
190 | 
191 |     index.resize_noinit(idx_sz);
192 |     size_t nbytes = idx_sz * sizeof(equiv_class_idx_entry_t);
193 |     if ( qfread(file, index.begin(), nbytes) != nbytes )
194 |       INTERR(0);
195 |   }
196 | 
197 |   //-------------------------------------------------------------------------
198 |   size_t find(func_fingerprint_t fp)
199 |   {
200 |     equiv_class_idx_entry_t key;
201 |     key.fingerprint = fp;
202 |     auto p = std::lower_bound(index.begin(), index.end(), key);
203 |     if ( p == index.end() || p->fingerprint != fp )
204 |       return -1;
205 |     return p->offset;
206 |   }
207 | };
208 | 
209 | //-------------------------------------------------------------------------
210 | // lazy-loading collection of equivalence classes
211 | struct equiv_class_finder_lazy_t : public equiv_class_finder_t
212 | {
213 |   FILE *file;
214 |   qoff64_t fsize;
215 |   uint32 format_version; // format version used to serialize minsn_t's
216 |   equiv_class_idx_t index;
217 |   uint64 minsns_offset; // offset at which the minsns table begins
218 | 
219 |   virtual ~equiv_class_finder_lazy_t() { qfclose(file); }
220 | 
221 |   //-------------------------------------------------------------------------
222 |   //lint -sem(equiv_class_finder_lazy_t::equiv_class_finder_lazy_t, custodial(1))
223 |   equiv_class_finder_lazy_t(FILE *f) : file(f)
224 |   {
225 |     fsize = qfsize(file);
226 | 
227 |     // read in the format version
228 |     if ( qfread(file, &format_version, sizeof(format_version)) != sizeof(format_version) )
229 |       INTERR(30716);
230 | 
231 |     // read and validate the number of the test cases
232 |     uint32 n_tcs;
233 |     if ( qfread(file, &n_tcs, sizeof(n_tcs)) != sizeof(n_tcs) )
234 |       INTERR(30717);
235 |     if ( n_tcs > fsize )
236 |       INTERR(0);
237 | 
238 |     // read in the test cases
239 |     testcases.resize(n_tcs);
240 |     for ( auto &new_tc : testcases )
241 |     {
242 |       new_tc.resize(CANDIDATE_EXPR_NUMINPUTS);
243 |       for ( uint64 &new_inp : new_tc )
244 |         if ( qfread(file, &new_inp, sizeof(new_inp)) != sizeof(new_inp) )
245 |           INTERR(30718);
246 |     }
247 | 
248 |     // read in the index
249 |     index.read_from_file(file);
250 | 
251 |     minsns_offset = qftell(file);
252 | //    msg("minsns offset %llu", minsns_offset);
253 |   }
254 | 
255 |   //-------------------------------------------------------------------------
256 |   // populates the equiv_classes map with the minsn set included in the file
257 |   // for the given fingerprint
258 |   void read_minsn_set_from_file(func_fingerprint_t fp)
259 |   {
260 |     int64 idx_lookup = index.find(fp);
261 |     if ( idx_lookup < 0 )
262 |       return; // fingerprint doesn't exist in oracle
263 |     if ( equiv_classes.count(fp) != 0 )
264 |       return; // we already loaded in the equiv class
265 | 
266 |     uint64 minsn_offset = minsns_offset + idx_lookup;
267 |     if ( qfseek(file, minsn_offset, SEEK_SET) != 0 )
268 |       INTERR(30722);
269 | 
270 |     uint32 n_minsns;
271 |     if ( qfread(file, &n_minsns, sizeof(n_minsns)) != sizeof(n_minsns) )
272 |       INTERR(30723);
273 |     if ( n_minsns > fsize ) // sanity check
274 |       INTERR(0);
275 | 
276 |     bytevec_t bv;
277 |     minsn_set_t &set = equiv_classes[fp];
278 |     for ( uint32 i = 0; i < n_minsns; i++ )
279 |     {
280 |       uint32 minsn_sz;
281 |       if ( qfread(file, &minsn_sz, sizeof(minsn_sz)) != sizeof(minsn_sz) )
282 |         INTERR(30724);
283 |       if ( minsn_sz > fsize ) // sanity check
284 |         INTERR(0);
285 |       bv.resize(minsn_sz);
286 |       if ( qfread(file, bv.begin(), minsn_sz) != minsn_sz )
287 |         INTERR(30725);
288 |       minsn_t *minsn = new minsn_t(0);
289 |       minsn->deserialize(bv.begin(), minsn_sz, format_version);
290 |       set.insert(minsn);
291 |     }
292 |   }
293 | 
294 |   //-------------------------------------------------------------------------
295 |   const minsn_set_t *find_equiv_class(func_fingerprint_t fingerprint) override
296 |   {
297 |     read_minsn_set_from_file(fingerprint);
298 |     return equiv_class_finder_t::find_equiv_class(fingerprint);
299 |   }
300 | 
301 |   //-------------------------------------------------------------------------
302 |   bool optimize(minsn_t &insn);
303 | };
304 | 
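
The index lookup performed by `equiv_class_idx_t::find` can be illustrated outside the IDA SDK. The following is a minimal sketch, assuming the index is sorted by fingerprint as the use of `std::lower_bound` implies; `idx_entry_t` and `find_offset` are hypothetical names introduced for illustration and are not part of gooMBA. The sketch returns `std::optional` (C++17) instead of the `-1` sentinel used above, because an offset of 0 is a valid result.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-in for equiv_class_idx_entry_t: a fingerprint plus the
// offset of its equivalence class, relative to the start of the minsn table.
struct idx_entry_t
{
  std::uint64_t fingerprint;
  std::uint64_t offset;
  bool operator<(const idx_entry_t &o) const
  {
    return fingerprint < o.fingerprint;
  }
};

// Binary search over an index sorted by fingerprint.
// Returns the stored offset, or std::nullopt if the fingerprint is absent.
inline std::optional<std::uint64_t> find_offset(
        const std::vector<idx_entry_t> &index,
        std::uint64_t fp)
{
  assert(std::is_sorted(index.begin(), index.end())); // precondition
  idx_entry_t key{fp, 0};
  auto p = std::lower_bound(index.begin(), index.end(), key);
  if ( p == index.end() || p->fingerprint != fp )
    return std::nullopt;
  return p->offset;
}
```

A caller would add the returned offset to the position where the minsn table begins before seeking, as `read_minsn_set_from_file` does with `minsns_offset`.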


--------------------------------------------------------------------------------
/file.cpp:
--------------------------------------------------------------------------------
  1 | /*
  2 |  *      Copyright (c) 2023 by Hex-Rays, support@hex-rays.com
  3 |  *      ALL RIGHTS RESERVED.
  4 |  *
  5 |  *      gooMBA plugin for Hex-Rays Decompiler.
  6 |  *
  7 |  */
  8 | 
  9 | #include "z3++_no_warn.h"
 10 | #include <hexrays.hpp>
 11 | #include <fpro.h>
 12 | #include "file.hpp"
 13 | #include "msynth_parser.hpp"
 14 | #include "simp_lin_conj_exprs.hpp"
 15 | #include "heuristics.hpp"
 16 | #include "equiv_class.hpp"
 17 | 
 18 | //-------------------------------------------------------------------------
 19 | // In fact this function is not really needed. The user can simply turn on
 20 | // the timestamp display in the output window.
 21 | static qstring curtime()
 22 | {
 23 |   char buf[64];
 24 |   char *ptr = buf;
 25 |   char *end = buf + sizeof(buf);
 26 |   qtime64_t ts = qtime64();
 27 |   ptr += qstrftime64(ptr, end-ptr, "%H:%M:%S", ts);
 28 |   uint32 msecs = get_usecs(ts) / 1000;
 29 |   qsnprintf(ptr, end-ptr, ".%03d", msecs);
 30 |   return qstring(buf);
 31 | }
 32 | 
 33 | //-------------------------------------------------------------------------
 34 | void create_minsns_file(FILE *msynth_in, FILE *minsns_out)
 35 | {
 36 |   qstring line;
 37 |   int n_proc = 0;
 38 |   int n_written = 0;
 39 |   while ( qgetline(&line, msynth_in) >= 0 )
 40 |   {
 41 |     n_proc++;
 42 |     if ( line.size() == 0 )
 43 |       continue;
 44 |     if ( n_proc % REPORT_FREQ == 0 )
 45 |       msg("%s: Processed %d, Wrote %d\n", curtime().c_str(), n_proc, n_written);
 46 |     mopvec_t default_vars;
 47 |     //-------------------------------------------------------------------------
 48 |     // an *abstract* mop is a mop_l that does not refer to anything within a
 49 |     // specific program, it is a placeholder for minsn templates
 50 |     for ( int i = 0; i < CANDIDATE_EXPR_NUMINPUTS; i++ )
 51 |     {
 52 |       mop_t new_var;
 53 |       new_var.t = mop_l;
 54 |       new_var.l = new lvar_ref_t(nullptr, i);
 55 |       new_var.size = 8;
 56 |       default_vars.push_back(new_var);
 57 |     }
 58 | 
 59 |     msynth_expr_parser_t mep(line.c_str(), default_vars);
 60 |     minsn_t *insn = mep.parse_next_expr();
 61 | 
 62 |     bytevec_t bv;
 63 |     insn->serialize(&bv);
 64 |     uint32 bv_sz = bv.size();
 65 |     qfwrite(minsns_out, &bv_sz, sizeof(bv_sz));
 66 |     qfwrite(minsns_out, bv.begin(), bv_sz);
 67 |     n_written++;
 68 | 
 69 |     delete insn;
 70 |   }
 71 | 
 72 |   msg("%s: Processed %d, Wrote %d\n", curtime().c_str(), n_proc, n_written);
 73 | }
 74 | 
 75 | //-------------------------------------------------------------------------
 76 | // bytevec comparison based on length
 77 | struct bv_len_cmptr_t
 78 | {
 79 |   inline bool operator()(const bytevec_t &a, const bytevec_t &b) const
 80 |   {
 81 |     auto asz = a.size();
 82 |     auto bsz = b.size();
 83 |     return std::tie(asz, a) < std::tie(bsz, b);
 84 |   }
 85 | };
 86 | typedef std::set<bytevec_t, bv_len_cmptr_t> bvset_t;
 87 | 
 88 | //-------------------------------------------------------------------------
 89 | inline size_t bv_sz_on_disk(const bytevec_t &bv)
 90 | {
 91 |   return sizeof(uint32) + bv.size();
 92 | }
 93 | 
 94 | //-------------------------------------------------------------------------
 95 | static void write_bv_to_disk(FILE *fout, const bytevec_t &bv)
 96 | {
 97 |   uint32 bv_sz = bv.size();
 98 |   qfwrite(fout, &bv_sz, sizeof(bv_sz));
 99 |   qfwrite(fout, bv.begin(), bv_sz);
100 | }
101 | 
102 | //-------------------------------------------------------------------------
103 | static size_t bvset_sz_on_disk(const bvset_t &bvset)
104 | {
105 |   size_t res = sizeof(uint32);
106 |   for ( const auto &bv : bvset )
107 |     res += bv_sz_on_disk(bv);
108 |   return res;
109 | }
110 | 
111 | //-------------------------------------------------------------------------
112 | static void write_bvset_to_disk(FILE *fout, const bvset_t &bvset)
113 | {
114 |   uint32 bvset_sz = bvset.size();
115 |   qfwrite(fout, &bvset_sz, sizeof(bvset_sz));
116 |   for ( const auto &bv : bvset )
117 |     write_bv_to_disk(fout, bv);
118 | }
119 | 
120 | //-------------------------------------------------------------------------
121 | bool create_oracle_file(FILE *minsns_in, FILE *oracle_out)
122 | {
123 |   // begin by loading the minsns from the file and generating fingerprints
124 |   // keeping full minsns in memory would take too much space, so we store them as strings
125 |   // and use string length as a proxy for complexity
126 |   std::map<func_fingerprint_t, bvset_t> oracle;
127 |   equiv_class_finder_t ecf;
128 | 
129 |   int n_proc = 0;
130 |   while ( true )
131 |   {
132 |     if ( n_proc % REPORT_FREQ == 0 )
133 |       msg("%s: Processed %d, #Fingerprints %" FMT_Z "\n", curtime().c_str(), n_proc, oracle.size());
134 |     n_proc++;
135 |     uint32 minsn_sz;
136 |     if ( qfread(minsns_in, &minsn_sz, sizeof(minsn_sz)) != sizeof(minsn_sz) )
137 |       break;
138 |     if ( minsn_sz > qfsize(minsns_in) ) // sanity check on minsn_sz
139 |     {
140 |       msg("Wrong instruction size %u in the minsns file, stopped reading it\n", minsn_sz);
141 |       return false;
142 |     }
143 |     bytevec_t buf;
144 |     buf.resize(minsn_sz);
145 |     if ( qfread(minsns_in, buf.begin(), minsn_sz) != minsn_sz )
146 |       break;
147 | 
148 |     func_fingerprint_t fp = ecf.compute_fingerprint_from_serialization(buf.begin(), minsn_sz);
149 | 
150 |     if ( oracle.count(fp) == 0 )
151 |       oracle.insert( { fp, std::set<bytevec_t, bv_len_cmptr_t>() } );
152 | 
153 |     oracle[fp].insert(buf);
154 |   }
155 | 
156 |   msg("%s: Processed %d, #Fingerprints %" FMT_Z "\n", curtime().c_str(), n_proc, oracle.size());
157 | 
158 |   // write the resulting oracle to the file
159 |   // begin by writing the format version
160 |   {
161 |     bytevec_t bv;
162 |     uint32 format_version = minsn_t(0).serialize(&bv);
163 |     qfwrite(oracle_out, &format_version, sizeof(format_version));
164 |   }
165 | 
166 |   // write the ecf's test cases to file
167 |   uint32 n_tcs = ecf.testcases.size();
168 |   qfwrite(oracle_out, &n_tcs, sizeof(n_tcs));
169 |   for ( const testcase_t &tc : ecf.testcases )
170 |     for ( const uint64 input : tc )
171 |       qfwrite(oracle_out, &input, sizeof(input));
172 | 
173 |   msg("Wrote test cases to file\n");
174 | 
175 |   // write the index to file
176 |   // the index is a list of entries, each consisting of a uint64 (fingerprint) and a uint64 (offset)
177 |   uint32 index_sz = oracle.size();
178 |   qfwrite(oracle_out, &index_sz, sizeof(index_sz));
179 |   qoff64_t current_offset = 0;
180 |   int n_written = 0;
181 |   for ( const auto &entry : oracle )
182 |   {
183 |     if ( n_written % REPORT_FREQ == 0 )
184 |       msg("%s: Wrote %d index entries\n", curtime().c_str(), n_written);
185 |     n_written++;
186 | 
187 |     auto fingerprint = entry.first;
188 |     const auto &bvset = entry.second; // reference: avoid copying the whole set
189 |     qfwrite(oracle_out, &fingerprint, sizeof(fingerprint));
190 |     qfwrite(oracle_out, &current_offset, sizeof(current_offset));
191 | 
192 |     current_offset += bvset_sz_on_disk(bvset);
193 |   }
194 | 
195 |   msg("Size of oracle on disk: %" FMT_64 "u\n", current_offset);
196 |   msg("Current file position: %" FMT_64 "u\n", qftell(oracle_out));
197 | 
198 |   // write the actual microinstructions to disk
199 |   n_written = 0;
200 |   for ( const auto &entry : oracle )
201 |   {
202 |     if ( n_written % REPORT_FREQ == 0 )
203 |       msg("%s: Wrote %d microinstruction vectors\n", curtime().c_str(), n_written);
204 |     n_written++;
205 | 
206 |     write_bvset_to_disk(oracle_out, entry.second);
207 |   }
208 | 
209 |   msg("%s: Wrote %d microinstruction vectors\n", curtime().c_str(), n_written);
210 |   msg("Current file position: %" FMT_64 "u\n", qftell(oracle_out));
211 |   return true;
212 | }
213 | 
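
The minsn table layout written by `create_oracle_file` can be summarized with a small standalone sketch. It assumes what the code above shows: each set is stored as a `uint32` count followed by length-prefixed serializations, and sets are written back to back in ascending fingerprint order. The names `bytes_t`, `len_first_cmp_t`, `set_size_on_disk` and `compute_offsets` are hypothetical and use standard containers in place of the IDA SDK types.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <set>
#include <vector>

using bytes_t = std::vector<std::uint8_t>; // stand-in for bytevec_t

// Shorter serializations sort first; ties are broken lexicographically.
struct len_first_cmp_t
{
  bool operator()(const bytes_t &a, const bytes_t &b) const
  {
    if ( a.size() != b.size() )
      return a.size() < b.size();
    return a < b;
  }
};
using byteset_t = std::set<bytes_t, len_first_cmp_t>;

// Bytes occupied by one set on disk:
// a uint32 count, then (uint32 length + payload) for every entry.
inline std::size_t set_size_on_disk(const byteset_t &s)
{
  assert(s.size() <= 0xffffffffu); // the count is stored as a uint32
  std::size_t res = sizeof(std::uint32_t);
  for ( const bytes_t &bv : s )
    res += sizeof(std::uint32_t) + bv.size();
  return res;
}

// Offset of each fingerprint's set, relative to the start of the minsn table.
inline std::map<std::uint64_t, std::uint64_t> compute_offsets(
        const std::map<std::uint64_t, byteset_t> &oracle)
{
  std::map<std::uint64_t, std::uint64_t> offsets;
  std::uint64_t cur = 0;
  for ( const auto &entry : oracle )
  {
    offsets[entry.first] = cur;
    cur += set_size_on_disk(entry.second);
  }
  return offsets;
}
```

Because the comparator orders by length first, the shortest serialization of each equivalence class is stored first, which matches the comment above about using length as a proxy for complexity.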


--------------------------------------------------------------------------------
/file.hpp:
--------------------------------------------------------------------------------
 1 | /*
 2 |  *      Copyright (c) 2023 by Hex-Rays, support@hex-rays.com
 3 |  *      ALL RIGHTS RESERVED.
 4 |  *
 5 |  *      gooMBA plugin for Hex-Rays Decompiler.
 6 |  *
 7 |  */
 8 | 
 9 | #pragma once
10 | #include <hexrays.hpp>
11 | 
12 | // functions that convert huge files in a streaming fashion without using too much memory
13 | 
14 | const int REPORT_FREQ = 10000; // how often we should report progress in the log
15 | // generates a file that is just a list of minsns
16 | void create_minsns_file(FILE *msynth_in, FILE *minsns_out);
17 | // given a minsns file, fingerprints each minsn and serializes it into the oracle
18 | bool create_oracle_file(FILE *minsns_in, FILE *oracle_out);


--------------------------------------------------------------------------------
/generate_oracle.bat:
--------------------------------------------------------------------------------
 1 | @if "%DEBUG%" == "" @echo off
 2 | @rem ##########################################################################
 3 | @rem
 4 | @rem  gooMBA oracle file generation script
 5 | @rem
 6 | @rem ##########################################################################
 7 | @rem Set local scope for the variables with windows NT shell
 8 | if "%OS%"=="Windows_NT" setlocal
 9 | 
10 | if .%1 == . goto usage
11 | set VD_MSYNTH_PATH=%~f1
12 | echo generating minsns file (step 1/2)...
13 | idat64 -A -Llog.txt tests/idb/mba_challenge.i64
14 | set VD_MSYNTH_PATH=
15 | set VD_MBA_MINSNS_PATH=%~dpnx1.b
16 | echo generating oracle file (step 2/2)...
17 | idat64 -A -Llog.txt tests/idb/mba_challenge.i64
18 | echo. >> log.txt
19 | echo finished!
20 | move %~dpnx1.b.c %~dpn1.oracle
21 | echo finished! Result is in %~dpn1.oracle
22 | tail log.txt
23 | exit /b
24 | :usage
25 | echo "Usage: generate_oracle.bat all_combined.txt"
26 | 


--------------------------------------------------------------------------------
/generate_oracle.sh:
--------------------------------------------------------------------------------
 1 | #!/bin/bash
 2 | 
 3 | # usage: ./generate_oracle.sh all_combined.txt
 4 | # after the script finishes running, the oracle file will be available in all_combined.txt.oracle
 5 | 
 6 | (
 7 |   VD_MSYNTH_PATH=`realpath $1` ida64 -A -S`realpath script.py` -Llog.txt tests/idb/mba_challenge.i64
 8 |   VD_MBA_MINSNS_PATH=`realpath $1.b` ida64 -A -S`realpath script.py` -Llog.txt tests/idb/mba_challenge.i64
 9 |   mv $1.b.c $1.oracle
10 |   echo -e "\nfinished! Result is in $1.oracle" >> log.txt
11 | ) &
12 | 
13 | tail -F log.txt


--------------------------------------------------------------------------------
/goomba.cfg:
--------------------------------------------------------------------------------
 1 | 
 2 | // This configuration file is used by the gooMBA plugin, which
 3 | // deobfuscates expressions that were obfuscated with mixed boolean
 4 | // arithmetic (MBA).
 5 | 
 6 | // By default, the plugin only engages through a right-click menu option.
 7 | // Set the below option to YES to make the plugin engage automatically
 8 | // when the decompiler is invoked.
 9 | MBA_RUN_AUTOMATICALLY = NO
10 | // The timeout in ms for z3 proofs. Set this to 0 to disable z3 proofs
11 | // entirely and assume simplifications are correct after heuristic checks.
12 | MBA_Z3_TIMEOUT = 1000
13 | // When z3 times out, should the simplification be assumed correct?
14 | MBA_Z3_ASSUME_TIMEOUTS_CORRECT = YES
15 | // Path to an MBA oracle. Leave this empty to disable the function
16 | // fingerprinting algorithm and use only linear methods.
17 | MBA_ORACLE_PATH = "";
18 | 


--------------------------------------------------------------------------------
/goomba.cpp:
--------------------------------------------------------------------------------
  1 | /*
  2 |  *      Copyright (c) 2023 by Hex-Rays, support@hex-rays.com
  3 |  *      ALL RIGHTS RESERVED.
  4 |  *
  5 |  *      gooMBA plugin for Hex-Rays Decompiler.
  6 |  *      It deobfuscates MBA (mixed boolean arithmetic) expressions.
  7 |  *
  8 |  */
  9 | 
 10 | #include <chrono>
 11 | 
 12 | #include "z3++_no_warn.h"
 13 | #include "consts.hpp"
 14 | #include "optimizer.hpp"
 15 | #include "equiv_class.hpp"
 16 | #include "file.hpp"
 17 | #include <hexrays.hpp>
 18 | #include <err.h>
 19 | 
 20 | struct plugin_ctx_t;
 21 | 
 22 | //--------------------------------------------------------------------------
 23 | // returns true if the environment variables indicate the plugin should
 24 | // always be enabled (i.e. in testing environments)
 25 | inline bool always_on(void)
 26 | {
 27 |   return qgetenv("VD_MBA_AUTO");
 28 | }
 29 | 
 30 | //--------------------------------------------------------------------------
 31 | struct action_handler : public action_handler_t
 32 | {
 33 |   plugin_ctx_t *plugmod;
 34 | 
 35 |   action_handler(plugin_ctx_t *_plugmod) : plugmod(_plugmod) {}
 36 | 
 37 |   virtual int idaapi activate(action_activation_ctx_t *ctx) override;
 38 |   virtual action_state_t idaapi update(action_update_ctx_t *) override
 39 |   {
 40 |     return AST_ENABLE;
 41 |   };
 42 | };
 43 | 
 44 | //--------------------------------------------------------------------------
 45 | //lint -e{958} padding of 7 bytes needed to align member on a 8 byte boundary
 46 | struct plugin_ctx_t : public plugmod_t
 47 | {
 48 |   bool run_automatically = false;
 49 |   qstring oracle_path;
 50 | 
 51 |   action_handler ah;
 52 |   optimizer_t optimizer;
 53 |   bool plugmod_active = false;
 54 |   plugin_ctx_t();
 55 |   ~plugin_ctx_t() { term_hexrays_plugin(); }
 56 |   virtual bool idaapi run(size_t) override;
 57 | };
 58 | 
 59 | //--------------------------------------------------------------------------
 60 | static plugmod_t *idaapi init()
 61 | {
 62 |   if ( !init_hexrays_plugin() )
 63 |     return nullptr; // no decompiler
 64 | 
 65 |   const char *hxver = get_hexrays_version();
 66 |   msg("Hex-rays version %s has been detected, %s ready to use\n",
 67 |       hxver, PLUGIN.wanted_name);
 68 | 
 69 |   plugin_ctx_t *plugmod = new plugin_ctx_t;
 70 | 
 71 |   const cfgopt_t cfgopts[] =
 72 |   {
 73 |     cfgopt_t("MBA_RUN_AUTOMATICALLY", &plugmod->run_automatically, 1),
 74 |     cfgopt_t("MBA_Z3_TIMEOUT", &plugmod->optimizer.z3_timeout),
 75 |     cfgopt_t("MBA_ORACLE_PATH", &plugmod->oracle_path),
 76 |     cfgopt_t("MBA_Z3_ASSUME_TIMEOUTS_CORRECT", &plugmod->optimizer.z3_assume_timeouts_correct, 1)
 77 |   };
 78 | 
 79 |   read_config_file("goomba", cfgopts, qnumber(cfgopts), nullptr);
 80 | 
 81 |   if ( plugmod->oracle_path.empty() )
 82 |     qgetenv("VD_MBA_ORACLE_PATH", &plugmod->oracle_path);
 83 | 
 84 |   if ( !plugmod->oracle_path.empty() )
 85 |   {
 86 |     const char *path = plugmod->oracle_path.c_str();
 87 |     FILE *fin = qfopen(path, "rb");
 88 |     if ( fin != nullptr )
 89 |     {
 90 |       plugmod->optimizer.equiv_classes = new equiv_class_finder_lazy_t(fin);
 91 |       msg("%s: loaded MBA oracle\n", path);
 92 |     }
 93 |     else
 94 |     {
 95 |       msg("%s: %s\n", path, qstrerror(-1));
 96 |     }
 97 |   }
 98 | 
 99 |   qstring ifpath;
100 |   if ( qgetenv("VD_MSYNTH_PATH", &ifpath) )
101 |   {
102 |     qstring ofpath = ifpath + ".b";
103 |     FILE *fin = qfopen(ifpath.c_str(), "r");
104 |     if ( fin == nullptr )
105 |       error("%s: failed to open for reading", ifpath.c_str());
106 |     FILE *fout = qfopen(ofpath.c_str(), "wb");
107 |     if ( fout == nullptr )
108 |       error("%s: failed to open for writing", ofpath.c_str());
109 |     create_minsns_file(fin, fout);
110 |     qfclose(fin);
111 |     qfclose(fout);
112 |     // do not save the IDB
113 |     set_database_flag(DBFL_KILL);
114 |     qexit(0);
115 |   }
116 | 
117 |   if ( qgetenv("VD_MBA_MINSNS_PATH", &ifpath) )
118 |   {
119 |     qstring ofpath = ifpath + ".c";
120 |     FILE *fin = qfopen(ifpath.c_str(), "rb");
121 |     if ( fin == nullptr )
122 |       error("%s: failed to open for reading", ifpath.c_str());
123 |     FILE *fout = qfopen(ofpath.c_str(), "wb");
124 |     if ( fout == nullptr )
125 |       error("%s: failed to open for writing", ofpath.c_str());
126 |     bool ok = create_oracle_file(fin, fout);
127 |     qfclose(fin);
128 |     qfclose(fout);
129 |     if ( !ok )
130 |       error("%s: failed to process", ifpath.c_str());
131 |     // do not save the IDB
132 |     set_database_flag(DBFL_KILL);
133 |     qexit(0);
134 |   }
135 | 
136 |   return plugmod;
137 | }
138 | 
139 | //--------------------------------------------------------------------------
140 | int idaapi action_handler::activate(action_activation_ctx_t *ctx)
141 | {
142 |   vdui_t *vu = get_widget_vdui(ctx->widget);
143 |   if ( vu != nullptr )
144 |   {
145 |     plugmod->plugmod_active = true;
146 |     vu->refresh_view(true);
147 |     return 1;
148 |   }
149 |   return 0;
150 | }
151 | 
152 | //--------------------------------------------------------------------------
153 | // This callback handles various hexrays events.
154 | static ssize_t idaapi callback(void *ud, hexrays_event_t event, va_list va)
155 | {
156 |   plugin_ctx_t *plugmod = (plugin_ctx_t *) ud;
157 |   switch ( event )
158 |   {
159 |     case hxe_microcode: // microcode has been generated
160 |       {
161 |         mba_t *mba = va_arg(va, mba_t *);
162 |         if ( always_on() || plugmod->run_automatically )
163 |           plugmod->plugmod_active = true;
164 |         if ( plugmod->plugmod_active )
165 |           mba->set_mba_flags2(MBA2_PROP_COMPLEX); // increase acceptable complexity
166 |       }
167 |       break;
168 | 
169 |     case hxe_populating_popup:
170 |       {
171 |         TWidget *widget = va_arg(va, TWidget *);
172 |         TPopupMenu *popup = va_arg(va, TPopupMenu *);
173 |         attach_action_to_popup(widget, popup, ACTION_NAME);
174 |       }
175 |       break;
176 | 
177 |     case hxe_glbopt:
178 |       if ( plugmod->plugmod_active )
179 |       {
180 |         mba_t *mba = va_arg(va, mba_t *);
181 | 
182 |         struct ida_local insn_optimize_t : public minsn_visitor_t
183 |         {
184 |           optimizer_t &optimizer;
185 |           int cnt = 0;
186 |           insn_optimize_t ( optimizer_t &o ) : optimizer(o) {}
187 |           int idaapi visit_minsn() override
188 |           {
189 | //            msg("Optimizing %s\n", curins->dstr());
190 |             if ( optimizer.optimize_insn_recurse(curins) )
191 |             {
192 |               cnt++;
193 |               blk->mark_lists_dirty();
194 |               mba->dump_mba(true, "vd_mba success %a", curins->ea);
195 |             }
196 |             return 0;
197 |           }
198 |         };
199 | 
200 |         insn_optimize_t visitor(plugmod->optimizer);
201 |         mba->for_all_topinsns(visitor);
202 | 
203 |         if ( visitor.cnt != 0 )
204 |         {
205 |           mba->verify(true);
206 |           msg("Completed mba optimization pass, improved %d expressions\n", visitor.cnt);
207 |         }
208 |         plugmod->plugmod_active = false;
209 |         mba->clr_mba_flags2(MBA2_PROP_COMPLEX);
210 |         return MERR_LOOP; // restart optimization
211 |       }
212 |       break;
213 | 
214 |     default:
215 |       break;
216 |   }
217 |   return 0;
218 | }
219 | 
220 | //--------------------------------------------------------------------------
221 | plugin_ctx_t::plugin_ctx_t() : ah(this)
222 | {
223 |   install_hexrays_callback(callback, this);
224 |   register_action(ACTION_DESC_LITERAL_PLUGMOD(
225 |                     ACTION_NAME,
226 |                     "De-obfuscate arithmetic expressions",
227 |                     &ah,
228 |                     this,
229 |                     nullptr,
230 |                     "Attempt to simplify Mixed Boolean Arithmetic-obfuscated expressions using gooMBA",
231 |                     -1));
232 | }
233 | 
234 | //--------------------------------------------------------------------------
235 | bool idaapi plugin_ctx_t::run(size_t)
236 | {
237 |   return true;
238 | }
239 | 
240 | //--------------------------------------------------------------------------
241 | static char comment[] = "gooMBA plugin for Hex-Rays decompiler";
242 | 
243 | //--------------------------------------------------------------------------
244 | //
245 | //      PLUGIN DESCRIPTION BLOCK
246 | //
247 | //--------------------------------------------------------------------------
248 | plugin_t PLUGIN =
249 | {
250 |   IDP_INTERFACE_VERSION,
251 |   PLUGIN_MULTI          // The plugin can work with multiple idbs in parallel
252 | | PLUGIN_HIDE,          // no menu items in Edit, Plugins
253 |   init,                 // initialize
254 |   nullptr,
255 |   nullptr,
256 |   comment,              // long comment about the plugin
257 |   nullptr,              // multiline help about the plugin
258 |   "gooMBA plugin",         // the preferred short name of the plugin
259 |   nullptr,              // the preferred hotkey to run the plugin
260 | };
261 | 


--------------------------------------------------------------------------------
/heuristics.cpp:
--------------------------------------------------------------------------------
  1 | /*
  2 |  *      Copyright (c) 2023 by Hex-Rays, support@hex-rays.com
  3 |  *      ALL RIGHTS RESERVED.
  4 |  *
  5 |  *      gooMBA plugin for Hex-Rays Decompiler.
  6 |  *
  7 |  */
  8 | 
  9 | #include "z3++_no_warn.h"
 10 | #include "heuristics.hpp"
 11 | 
 12 | //-------------------------------------------------------------------------
 13 | inline uint64 rand64()
 14 | {
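 15 |   uint32 r1 = rand(); // NB: rand() yields only 15..31 random bits, so
 16 |   uint32 r2 = rand(); // the top bits of each 32-bit half are always zero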
 17 |   return uint64(r1) << 32 | uint64(r2);
 18 | }
 19 | 
 20 | //-------------------------------------------------------------------------
 21 | mcode_val_t gen_rand_mcode_val(int size)
 22 | {
 23 |   if ( rand() > SPECIAL_PROBABILITY * RAND_MAX )
 24 |   {
 25 |     // select from uniform random distribution
 26 |     return mcode_val_t(rand64(), size);
 27 |   }
 28 |   else
 29 |   {
 30 |     // select from special cases
 31 |     return mcode_val_t(SPECIAL[rand() % NUM_SPECIAL], size);
 32 |   }
 33 | }
 34 | 
 35 | //-------------------------------------------------------------------------
 36 | // guesses whether or not the instruction is MBA
 37 | bool is_mba(const minsn_t &insn)
 38 | {
 39 |   struct mba_opc_counter_t : public minsn_visitor_t
 40 |   {
 41 |     int bool_cnt = 0;
 42 |     int arith_cnt = 0;
 43 |     int idaapi visit_minsn(void) override
 44 |     {
 45 |       switch ( curins->opcode )
 46 |       {
 47 |         case m_neg:
 48 |         case m_add:
 49 |         case m_sub:
 50 |         case m_mul:
 51 |         case m_udiv:
 52 |         case m_sdiv:
 53 |         case m_umod:
 54 |         case m_smod:
 55 |         case m_shl:
 56 |         case m_shr:
 57 |           arith_cnt++;
 58 |           break;
 59 |         case m_bnot:
 60 |         case m_or:
 61 |         case m_and:
 62 |         case m_xor:
 63 |         case m_sar:
 64 |           bool_cnt++;
 65 |           break;
 66 |         default:
 67 |           return 0;
 68 |       }
 69 |       return bool_cnt >= MIN_MBA_BOOL_OPS && arith_cnt >= MIN_MBA_ARITH_OPS;
 70 |     }
 71 |   };
 72 | 
 73 |   if ( is_mcode_xdsu(insn.opcode) )
 74 |     return false; // exclude xdsu, it is better to optimize its operand
 75 | 
 76 |   if ( insn.d.size > 8 )
 77 |     return false; // we only support values up to 64 bits wide
 78 | 
 79 |   mba_opc_counter_t cntr;
 80 |   return CONST_CAST(minsn_t*)(&insn)->for_all_insns(cntr) != 0;
 81 | }
 82 | 
 83 | //-------------------------------------------------------------------------
 84 | // runs a battery of random test cases against both expressions to see if they are equivalent
 85 | bool probably_equivalent(const minsn_t &insn, const candidate_expr_t &expr)
 86 | {
 87 |   for ( int i = 0; i < NUM_TEST_CASES; i++ )
 88 |   {
 89 |     mcode_emu_rand_vals_t emu;
 90 |     mcode_val_t insn_eval = emu.minsn_value(insn);
 91 |     mcode_val_t expr_eval = expr.evaluate(emu);
 92 | 
 93 |     if ( insn_eval != expr_eval )
 94 |       return false;
 95 |   }
 96 | 
 97 |   return true;
 98 | }
 99 | 
100 | //-------------------------------------------------------------------------
101 | // runs a battery of random test cases against both instructions to see if they are equivalent
102 | bool probably_equivalent(const minsn_t &a, const minsn_t &b)
103 | {
104 |   for ( int i = 0; i < NUM_TEST_CASES; i++ )
105 |   {
106 |     mcode_emu_rand_vals_t emu;
107 |     mcode_val_t insn_eval = emu.minsn_value(a);
108 |     mcode_val_t expr_eval = emu.minsn_value(b);
109 | 
110 |     if ( insn_eval != expr_eval )
111 |       return false;
112 |   }
113 | 
114 |   return true;
115 | }
116 | 
117 | //-------------------------------------------------------------------------
118 | // estimates the "complexity" of a given instruction
119 | int score_complexity(const minsn_t &insn)
120 | {
121 |   struct ida_local complexity_counter_t : public minsn_visitor_t
122 |   {
123 |     int cnt = 0;
124 |     int idaapi visit_minsn() override
125 |     {
126 |       cnt++;
127 |       return 0;
128 |     }
129 |   };
130 |   complexity_counter_t cc;
131 |   CONST_CAST(minsn_t&)(insn).for_all_insns(cc);
132 |   return cc.cnt;
133 | }
134 | 
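
The randomized check behind `probably_equivalent` can be sketched without the microcode emulator. This is an illustration under stated assumptions, not the plugin's implementation: expressions are modelled as two-input callables, `draw_input` and `probably_equal` are hypothetical names, and `std::mt19937_64` with a fixed seed replaces `rand()` so that runs are reproducible. The corner cases and their 0.2 probability mirror `SPECIAL` and `SPECIAL_PROBABILITY` from heuristics.hpp.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <random>

using expr_t = std::function<std::uint64_t(std::uint64_t, std::uint64_t)>;

// Draws a 64-bit input: a corner case (0, 1, all ones) with probability
// special_prob, otherwise a uniformly random value.
inline std::uint64_t draw_input(std::mt19937_64 &rng, double special_prob = 0.2)
{
  static const std::uint64_t special[] = { 0, 1, ~std::uint64_t(0) };
  assert(special_prob >= 0.0 && special_prob <= 1.0);
  if ( std::bernoulli_distribution(special_prob)(rng) )
    return special[std::uniform_int_distribution<int>(0, 2)(rng)];
  return rng();
}

// Returns false as soon as one input pair distinguishes a from b.
// A true result means "no counterexample found", not a proof.
inline bool probably_equal(
        const expr_t &a,
        const expr_t &b,
        int num_tests = 256,
        std::uint64_t seed = 1)
{
  std::mt19937_64 rng(seed);
  for ( int i = 0; i < num_tests; i++ )
  {
    // each variable keeps one value for the whole trial
    std::uint64_t x = draw_input(rng);
    std::uint64_t y = draw_input(rng);
    if ( a(x, y) != b(x, y) )
      return false;
  }
  return true;
}
```

For example, `(x ^ y) + 2 * (x & y)` and `x + y` agree on every input, so the check accepts them, while `x | y` and `x + y` differ whenever the inputs share a set bit and are rejected after a few trials. A passing check is only evidence of equivalence, which is why the plugin can additionally run a z3 proof, controlled by `MBA_Z3_TIMEOUT`.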


--------------------------------------------------------------------------------
/heuristics.hpp:
--------------------------------------------------------------------------------
 1 | /*
 2 |  *      Copyright (c) 2023 by Hex-Rays, support@hex-rays.com
 3 |  *      ALL RIGHTS RESERVED.
 4 |  *
 5 |  *      gooMBA plugin for Hex-Rays Decompiler.
 6 |  *
 7 |  */
 8 | 
 9 | #pragma once
10 | #include "mcode_emu.hpp"
11 | #include "linear_exprs.hpp"
12 | 
13 | const uint64 SPECIAL[] = { 0, 1, 0xffffffffffffffff };
14 | const int NUM_SPECIAL = qnumber(SPECIAL);
15 | const double SPECIAL_PROBABILITY = 0.2; // probability of selecting a special number when sampling
16 | 
17 | // an expression must have at least this many subinstructions of each type to count as an MBA
18 | const int MIN_MBA_BOOL_OPS = 1;
19 | const int MIN_MBA_ARITH_OPS = 1;
20 | 
21 | // number of test cases to run when checking if an instruction matches the candidate expression's behavior
22 | const int NUM_TEST_CASES = 256;
23 | 
24 | //-------------------------------------------------------------------------
25 | mcode_val_t gen_rand_mcode_val(int size);
26 | 
27 | //-------------------------------------------------------------------------
28 | // emulates the microcode, assigning random values to unknown variables
29 | // (but keeping them consistent across executions)
30 | struct mcode_emu_rand_vals_t : public mcode_emulator_t
31 | {
32 |   std::map<const mop_t, mcode_val_t> assigned_vals;
33 | 
34 |   mcode_val_t get_var_val(const mop_t &mop) override
35 |   {
36 |     // check that the mop is indeed a variable
37 |     mopt_t t = mop.t;
38 |     QASSERT(30672, t == mop_r || t == mop_S || t == mop_v || t == mop_l);
39 | 
40 |     auto assignment = assigned_vals.find(mop);
41 |     if ( assignment != assigned_vals.end() )
42 |       return assignment->second;
43 | 
44 |     mcode_val_t new_val = gen_rand_mcode_val(mop.size);
45 |     assigned_vals.insert( { mop, new_val } );
46 |     return new_val;
47 |   }
48 | };
49 | 
50 | //-------------------------------------------------------------------------
51 | bool is_mba(const minsn_t &insn);
52 | 
53 | //-------------------------------------------------------------------------
54 | bool probably_equivalent(const minsn_t &insn, const candidate_expr_t &expr);
55 | bool probably_equivalent(const minsn_t &a, const minsn_t &b);
56 | 
57 | //-------------------------------------------------------------------------
58 | // estimates the "complexity" of a given instruction
59 | int score_complexity(const minsn_t &insn);
60 | 
61 | struct minsn_complexity_cmptr_t
62 | {
63 |   bool operator()(const minsn_t *a, const minsn_t *b) const
64 |   {
65 |     auto score_a = score_complexity(*a);
66 |     auto score_b = score_complexity(*b);
67 |     return score_a < score_b;
68 |   }
69 | };
70 | 
71 | inline mopvec_t get_input_mops(const minsn_t &insn)
72 | {
73 |   default_zero_mcode_emu_t emu;
74 |   emu.minsn_value(insn); // populate emu.assigned_vals
75 | 
76 |   mopvec_t res;
77 |   res.reserve(emu.assigned_vals.size());
78 |   for ( auto const &entry : emu.assigned_vals )
79 |     res.push_back(entry.first);
80 | 
81 |   std::sort(res.begin(), res.end());
82 |   return res;
83 | }
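The header above combines two ideas: sampling that favors special values (0, 1, all ones) with a fixed probability, and a map that keeps each variable's random value stable for the lifetime of one emulator. The body of `gen_rand_mcode_val` is not shown in this file, so the sketch below is only one plausible way to do it. `rand_val_sampler_t` and `rand_assignment_t` are hypothetical names, and variables are keyed by string instead of `mop_t`.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <random>
#include <string>

// Hypothetical sampler: returns a "special" value with probability p_special,
// otherwise a uniformly random value, truncated to size_bytes.
struct rand_val_sampler_t
{
  std::mt19937_64 rng;
  double p_special;

  explicit rand_val_sampler_t(uint64_t seed, double p = 0.2)
    : rng(seed), p_special(p) {}

  uint64_t sample(int size_bytes)
  {
    assert(size_bytes == 1 || size_bytes == 2 || size_bytes == 4 || size_bytes == 8);
    static const uint64_t special[] = { 0, 1, 0xffffffffffffffffULL };
    uint64_t mask = size_bytes == 8 ? ~0ULL : ((1ULL << (size_bytes * 8)) - 1);
    std::uniform_real_distribution<double> coin(0.0, 1.0);
    if ( coin(rng) < p_special )
    {
      std::uniform_int_distribution<int> pick(0, 2);
      return special[pick(rng)] & mask;
    }
    return rng() & mask;
  }
};

// Variables are keyed by name here; the plugin keys them by mop_t.
struct rand_assignment_t
{
  rand_val_sampler_t sampler;
  std::map<std::string, uint64_t> assigned;

  explicit rand_assignment_t(uint64_t seed) : sampler(seed) {}

  // the first lookup draws a value, later lookups return the same one
  uint64_t get(const std::string &var, int size_bytes)
  {
    auto it = assigned.find(var);
    if ( it != assigned.end() )
      return it->second;
    uint64_t v = sampler.sample(size_bytes);
    assigned.insert({ var, v });
    return v;
  }
};
```

Memoizing the draw matters because the same variable may appear many times in one expression, and every occurrence must evaluate to the same value within a test case.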


--------------------------------------------------------------------------------
/images/mba1_after.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/HexRaysSA/goomba/bf1e49866f3cbf605b1069f053edd9d126de1372/images/mba1_after.png


--------------------------------------------------------------------------------
/images/mba1_before.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/HexRaysSA/goomba/bf1e49866f3cbf605b1069f053edd9d126de1372/images/mba1_before.png


--------------------------------------------------------------------------------
/lin_conj_exprs.hpp:
--------------------------------------------------------------------------------
  1 | /*
  2 |  *      Copyright (c) 2023 by Hex-Rays, support@hex-rays.com
  3 |  *      ALL RIGHTS RESERVED.
  4 |  *
  5 |  *      gooMBA plugin for Hex-Rays Decompiler.
  6 |  *
  7 |  */
  8 | 
  9 | #pragma once
 10 | #include <hexrays.hpp>
 11 | #include "linear_exprs.hpp"
 12 | #include "mcode_emu.hpp"
 13 | 
 14 | typedef qvector<mcode_val_t> coeff_vector_t;
 15 | typedef qvector<mcode_val_t> eval_trace_t;
 16 | const int LIN_CONJ_MAX_VARS = 16;
 17 | 
 18 | // represents a linear combination of conjunctions
 19 | class lin_conj_expr_t : public candidate_expr_t
 20 | {
 21 | protected:
 22 |   mopvec_t mops;
 23 |   coeff_vector_t coeffs;
 24 |   eval_trace_t eval_trace;
 25 | 
 26 | public:
 27 |   //-------------------------------------------------------------------------
 28 |   const char *dstr() const override
 29 |   {
 30 |     static char buf[MAXSTR];
 31 |     char *ptr = buf;
 32 |     char *end = buf + sizeof(buf);
 33 | 
 34 |     ptr += qsnprintf(ptr, end-ptr, "0x%" FMT_64 "x", coeffs[0].val);
 35 |     for ( uint32 i = 1; i < coeffs.size(); i++ )
 36 |     {
 37 |       if ( coeffs[i].val == 0 )
 38 |         continue;
 39 |       ptr += qsnprintf(ptr, end-ptr, " + 0x%" FMT_64 "x(", coeffs[i].val);
 40 |       ptr = print_assignment(ptr, end, i);
 41 |       APPEND(ptr, end, ")");
 42 |     }
 43 |     return buf;
 44 |   }
 45 | 
 46 |   //-------------------------------------------------------------------------
 47 |   // each boolean assignment is represented as a uint32, where the nth bit
 48 |   // represents the 0/1 value of the corresponding variable
 49 |   char *print_assignment(char *ptr, char *end, uint32 assn) const
 50 |   {
 51 |     bool first_printed = false;
 52 |     for ( int i = 0; i < mops.size(); i++ )
 53 |     {
 54 |       if ( ((assn >> i) & 1) != 0 )
 55 |       {
 56 |         if ( first_printed )
 57 |           APPCHAR(ptr, end, '&');
 58 |         APPEND(ptr, end, mops[i].dstr());
 59 |         first_printed = true;
 60 |       }
 61 |     }
 62 |     return ptr;
 63 |   }
 64 | 
 65 |   //-------------------------------------------------------------------------
 66 |   // each boolean assignment is represented as a uint32, where the nth bit
 67 |   // represents the 0/1 value of the corresponding variable
 68 |   void apply_assignment(uint32 assn, std::map<const mop_t, mcode_val_t> &dest)
 69 |   {
 70 |     // recall std::map keeps keys in sorted order
 71 |     int curr_idx = 0;
 72 |     for ( auto &kv : dest )
 73 |     {
 74 |       kv.second.val = (assn >> curr_idx) & 1;
 75 |       curr_idx++;
 76 |     }
 77 |   }
 78 | 
 79 |   //-------------------------------------------------------------------------
 80 |   // the i'th index in output_vals contains the output value corresponding to
 81 |   // the i'th assignment, where the i'th assignment is defined as in
 82 |   // apply_assignment.
 83 |   // on return, dest holds the corresponding coefficients in
 84 |   // the linear combination of conjunctions that would yield the output
 85 |   // behavior. The coefficients are ordered based on the same indexing pattern.
 86 |   void compute_coeffs(coeff_vector_t &dest, const qvector<mcode_val_t> &output_vals)
 87 |   {
 88 |     dest = coeff_vector_t();
 89 |     dest.reserve(output_vals.size());
 90 |     dest.push_back(output_vals[0]); // the zero coeff = the zero assignment
 91 | 
 92 |     // we can think of the problem as solving the linear equation Ax = y,
 93 |     // where y is the output_vals and x is the desired coefficient set.
 94 |     // A is defined as the binary matrix where row numbers represent
 95 |     // assignments and columns represent conjunctions. See the SiMBA paper
 96 |     // for more details.
 97 |     // We do an additional simplification, noting that
 98 |     // A_{ij} = (i & j) == j. Also, we use forward substitution since A is a
 99 |     // lower-triangular matrix.
100 | 
101 |     for ( uint32 i = 1; i < output_vals.size(); i++ )
102 |     {
103 |       mcode_val_t curr_coeff = output_vals[i];
104 |       for ( uint32 j = 0; j < i; j++ )
105 |       {
106 |         if ( (i & j) == j )
107 |           curr_coeff = curr_coeff - dest[j];
108 |       }
109 |       dest.push_back(curr_coeff);
110 |     }
111 |   }
112 | 
113 |   //-------------------------------------------------------------------------
114 |   void recompute_coeffs()
115 |   {
116 |     compute_coeffs(coeffs, eval_trace);
117 |   }
118 | 
119 |   //-------------------------------------------------------------------------
120 |   mcode_val_t evaluate(mcode_emulator_t &emu) const override
121 |   {
122 |     minsn_t *minsn = to_minsn(0);
123 |     mcode_val_t res = emu.minsn_value(*minsn);
124 |     delete minsn;
125 |     return res;
126 |   }
127 | 
128 |   //-------------------------------------------------------------------------
129 |   // eliminates all variables that are not needed in the expression
130 |   void eliminate_variables()
131 |   {
132 |     for ( int i = 0; i < mops.size(); i++ )
133 |     {
134 |       if ( can_eliminate_variable(i) )
135 |       {
136 |         eliminate_variable(i);
137 |         i--; // the mop at mop[i] no longer exists
138 |       }
139 |     }
140 |   }
141 | 
142 |   //-------------------------------------------------------------------------
143 |   // creates a linear combination of conjunctions based on the minsn behavior
144 |   lin_conj_expr_t(const minsn_t &insn)
145 |   {
146 |     default_zero_mcode_emu_t emu;
147 |     mcode_val_t const_term = emu.minsn_value(insn);
148 | 
149 |     int nvars = emu.assigned_vals.size();
150 |     if ( nvars > LIN_CONJ_MAX_VARS )
151 |       throw "lin_conj_expr_t: too many input variables";
152 | 
153 |     uint32 max_assignment = 1 << nvars;
154 |     // we have already gotten the value for the all-zeroes assignment, which is const_term
155 |     eval_trace.push_back(const_term);
156 |     eval_trace.reserve(max_assignment);
157 | 
158 |     for ( uint32 assn = 1; assn < max_assignment; assn++ )
159 |     {
160 |       apply_assignment(assn, emu.assigned_vals);
161 |       mcode_val_t output_val = emu.minsn_value(insn);
162 | 
163 |       eval_trace.push_back(output_val);
164 |     }
165 | 
166 |     compute_coeffs(coeffs, eval_trace);
167 |     mops.reserve(emu.assigned_vals.size());
168 |     for ( const auto &kv : emu.assigned_vals )
169 |       mops.push_back(kv.first);
170 | 
171 |     QASSERT(30679, coeffs.size() == (1ull << mops.size()));
172 |   }
173 | 
174 |   //-------------------------------------------------------------------------
175 |   z3::expr to_smt(z3_converter_t &cvtr) const override
176 |   {
177 |     minsn_t *minsn = to_minsn(0);
178 |     z3::expr res = cvtr.minsn_to_expr(*minsn);
179 |     delete minsn;
180 |     return res;
181 |   }
182 | 
183 |   //-------------------------------------------------------------------------
184 |   // converts an assignment to the corresponding conjunction. e.g.
185 |   // 0b1101 => x0 & x2 & x3
186 |   minsn_t *assn_to_minsn(uint32 assn, int size, ea_t ea) const
187 |   {
188 |     QASSERT(30680, assn != 0);
189 |     minsn_t *res = nullptr;
190 | 
191 |     for ( int i = 0; i < mops.size(); i++ )
192 |     {
193 |       if ( ((assn >> i) & 1) != 0 )
194 |       {
195 |         if ( res == nullptr )
196 |         {
197 |           res = resize_mop(ea, mops[i], size, false);
198 |         }
199 |         else
200 |         {
201 |           minsn_t *new_res = new minsn_t(ea);
202 |           new_res->opcode = m_and;
203 |           new_res->l.create_from_insn(res);
204 |           minsn_t *rsz = resize_mop(ea, mops[i], size, false);
205 |           new_res->r.create_from_insn(rsz);
206 |           delete rsz;
207 |           new_res->d.size = size;
208 | 
209 |           delete res;
210 |           res = new_res;
211 |         }
212 |       }
213 |     }
214 | 
215 |     QASSERT(30681, res->opcode != m_ldc);
216 | 
217 |     return res;
218 |   }
219 | 
220 |   //-------------------------------------------------------------------------
221 |   minsn_t *to_minsn(ea_t ea) const override
222 |   {
223 |     minsn_t *res = new minsn_t(ea);
224 |     res->opcode = m_ldc;
225 |     res->l.make_number(coeffs[0].val, coeffs[0].size, ea);
226 |     res->r.zero();
227 |     res->d.size = coeffs[0].size;
228 | 
229 |     for ( uint32 assn = 1; assn < coeffs.size(); assn++ )
230 |     {
231 |       auto coeff = coeffs[assn];
232 |       if ( coeff.val == 0 )
233 |         continue;
234 | 
235 |       // mul = coeff * F(mops)
236 |       minsn_t mul(ea);
237 |       mul.opcode = m_mul;
238 |       mul.l.make_number(coeff.val, coeff.size);
239 |       minsn_t *F = assn_to_minsn(assn, coeff.size, ea);
240 |       mul.r.create_from_insn(F);
241 |       delete F;
242 |       mul.d.size = coeff.size;
243 | 
244 |       // add = res + mul
245 |       minsn_t *add = new minsn_t(ea);
246 |       add->opcode = m_add;
247 |       add->l.create_from_insn(res);
248 |       add->r.create_from_insn(&mul);
249 |       add->d.size = coeff.size;
250 | 
251 |       delete res; // mop_t::create_from_insn makes a copy of the insn
252 |       res = add;
253 |     }
254 | 
255 |     return res;
256 |   }
257 | 
258 | private:
259 |   //-------------------------------------------------------------------------
260 |   // returns true if the variable can be eliminated safely
261 |   // i.e. all terms containing it have coeff = 0
262 |   bool can_eliminate_variable(int idx)
263 |   {
264 |     for ( uint32 assn = 0; assn < coeffs.size(); assn++ )
265 |     {
266 |       if ( ((assn >> idx) & 1) != 0 && coeffs[assn].val != 0 )
267 |         return false;
268 |     }
269 |     return true;
270 |   }
271 | 
272 |   //-------------------------------------------------------------------------
273 |   // removes the variable from the expression
274 |   // make sure to check can_eliminate_variable before calling
275 |   void eliminate_variable(int idx)
276 |   {
277 |     coeff_vector_t new_coeffs;
278 |     eval_trace_t new_evals;
279 |     new_coeffs.reserve(coeffs.size() / 2);
280 |     new_evals.reserve(coeffs.size() / 2);
281 |     for ( uint32 assn = 0; assn < coeffs.size(); assn++ )
282 |     {
283 |       if ( ((assn >> idx) & 1) == 0 )
284 |       {
285 |         new_coeffs.push_back(coeffs[assn]);
286 |         new_evals.push_back(eval_trace[assn]);
287 |       }
288 |       else
289 |       {
290 |         QASSERT(30682, coeffs[assn].val == 0);
291 |       }
292 |     }
293 |     coeffs = new_coeffs;
294 |     eval_trace = new_evals;
295 |     mops.erase(mops.begin() + idx);
296 |   }
297 | };
298 | 
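The forward substitution in `compute_coeffs` and the test in `can_eliminate_variable` can be tried on plain integers. This is a minimal sketch under stated assumptions: values are 64-bit and wrap modulo 2^64, and `conj_coeffs_sketch` and `var_unused_sketch` are hypothetical names that do not exist in the plugin.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// outputs[i] is the expression's value under assignment i, where bit k of i
// is the 0/1 value of variable k. Arithmetic is modulo 2^64.
inline std::vector<uint64_t> conj_coeffs_sketch(const std::vector<uint64_t> &outputs)
{
  assert(!outputs.empty());
  assert((outputs.size() & (outputs.size() - 1)) == 0); // size is a power of two
  std::vector<uint64_t> coeffs;
  coeffs.reserve(outputs.size());
  for ( uint32_t i = 0; i < outputs.size(); i++ )
  {
    uint64_t c = outputs[i];
    // subtract the coefficient of every conjunction j that is a proper
    // subset of i, since those conjunctions also evaluate to 1 under i
    for ( uint32_t j = 0; j < i; j++ )
      if ( (i & j) == j )
        c -= coeffs[j];
    coeffs.push_back(c);
  }
  return coeffs;
}

// A variable is removable when every conjunction containing it has coeff 0.
inline bool var_unused_sketch(const std::vector<uint64_t> &coeffs, int idx)
{
  for ( uint32_t assn = 0; assn < coeffs.size(); assn++ )
    if ( ((assn >> idx) & 1) != 0 && coeffs[assn] != 0 )
      return false;
  return true;
}
```

For example, `x | y` evaluated on the assignments 00, 01, 10, 11 gives the outputs 0, 1, 1, 1, and the sketch recovers the coefficients 0, 1, 1, -1 (mod 2^64), that is `x + y - (x & y)`. An expression that ignores `y` yields zero coefficients for every conjunction containing `y`, so `var_unused_sketch` reports that variable as removable.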


--------------------------------------------------------------------------------
/linear_exprs.cpp:
--------------------------------------------------------------------------------
  1 | /*
  2 |  *      Copyright (c) 2023 by Hex-Rays, support@hex-rays.com
  3 |  *      ALL RIGHTS RESERVED.
  4 |  *
  5 |  *      gooMBA plugin for Hex-Rays Decompiler.
  6 |  *
  7 |  */
  8 | 
  9 | #include "z3++_no_warn.h"
 10 | #include "linear_exprs.hpp"
 11 | 
 12 | //-------------------------------------------------------------------------
 13 | const char *linear_expr_t::dstr() const
 14 | {
 15 |   static char buf[MAXSTR];
 16 |   char *ptr = buf;
 17 |   char *end = buf + sizeof(buf);
 18 | 
 19 |   ptr += qsnprintf(ptr, end-ptr, "0x%" FMT_64 "x", const_term.val);
 20 |   for ( const auto &term : coeffs )
 21 |   {
 22 |     if ( term.second.val == 0 )
 23 |       continue;
 24 |     ptr += qsnprintf(ptr, end-ptr, " + 0x%" FMT_64 "x*", term.second.val);
 25 |     if ( term.first.size < const_term.size )
 26 |     {
 27 |       ptr += qsnprintf(ptr, end-ptr, "%s(%s)",
 28 |                        sext.count(term.first) ? "SEXT" : "ZEXT",
 29 |                        term.first.dstr());
 30 |     }
 31 |     else if ( term.first.size > const_term.size )
 32 |     {
 33 |       ptr += qsnprintf(ptr, end-ptr, "TRUNC(%s)", term.first.dstr());
 34 |     }
 35 |     else
 36 |     {
 37 |       APPEND(ptr, end, term.first.dstr());
 38 |     }
 39 |   }
 40 |   return buf;
 41 | }
 42 | 
 43 | //-------------------------------------------------------------------------
 44 | linear_expr_t::linear_expr_t(const minsn_t &insn) // creates a linear expression based on the instruction behavior
 45 | {
 46 |   default_zero_mcode_emu_t emu;
 47 |   const_term = emu.minsn_value(insn); // the value when all variables are assigned to zero
 48 | 
 49 |   for ( auto &p : emu.assigned_vals )
 50 |   {
 51 |     mop_t mop = p.first;
 52 |     p.second = mcode_val_t(1, mop.size);
 53 |     mcode_val_t coeff = emu.minsn_value(insn) - const_term;
 54 | 
 55 |     if ( mop.size < const_term.size )
 56 |     {
 57 |       // check if a sign extension is necessary
 58 |       p.second = mcode_val_t(-1, mop.size);
 59 |       mcode_val_t eval = emu.minsn_value(insn); // eval = const + (-1)*coeff if x was sign extended
 60 | 
 61 |       if ( const_term - eval == coeff )
 62 |         sext.insert(mop);
 63 |     }
 64 | 
 65 |     coeffs.insert( { mop, coeff } );
 66 |     p.second = mcode_val_t(0, mop.size);
 67 |   }
 68 | }
 69 | 
 70 | //-------------------------------------------------------------------------
 71 | mcode_val_t linear_expr_t::evaluate(mcode_emulator_t &emu) const
 72 | {
 73 |   mcode_val_t res = const_term;
 74 | 
 75 |   for ( const auto &term : coeffs )
 76 |   {
 77 |     const mop_t &mop = term.first;
 78 |     const mcode_val_t &coeff = term.second;
 79 |     mcode_val_t mop_val = emu.get_var_val(mop);
 80 | 
 81 |     // extend the value to 64 bits first
 82 |     uint64 ext_val = sext.count(mop) ? mop_val.signed_val() : mop_val.val;
 83 | 
 84 |     res = res + coeff * mcode_val_t(ext_val, coeff.size);
 85 |   }
 86 | 
 87 |   return res;
 88 | }
 89 | 
 90 | //-------------------------------------------------------------------------
 91 | z3::expr linear_expr_t::to_smt(z3_converter_t &cvtr) const
 92 | {
 93 |   z3::expr res = cvtr.mcode_val_to_expr(const_term);
 94 | 
 95 |   for ( const auto &term : coeffs )
 96 |   {
 97 |     const mop_t &mop = term.first;
 98 |     const mcode_val_t &coeff = term.second;
 99 |     z3::expr mop_expr = cvtr.mop_to_expr(mop);
100 | 
101 |     z3::expr ext_expr = cvtr.bv_resize_to_len(mop_expr, const_term.size * 8, sext.count(mop) != 0);
102 | 
103 |     res = res
104 |         + cvtr.mcode_val_to_expr(coeff) * ext_expr;
105 |   }
106 | 
107 |   return res;
108 | }
109 | 
110 | //-------------------------------------------------------------------------
111 | minsn_t *linear_expr_t::to_minsn(ea_t ea) const
112 | {
113 |   minsn_t *res = new minsn_t(ea);
114 |   res->opcode = m_ldc;
115 |   res->l.make_number(const_term.val, const_term.size);
116 |   res->r.zero();
117 |   res->d.size = const_term.size;
118 | 
119 |   for ( const auto &term : coeffs )
120 |   {
121 |     const mop_t &mop = term.first;
122 |     const mcode_val_t &coeff = term.second;
123 | 
124 |     if ( coeff.val == 0 )
125 |       continue;
126 | 
127 |     // mul = coeff * ext(mop)
128 |     minsn_t mul(ea);
129 |     mul.opcode = m_mul;
130 |     mul.l.make_number(coeff.val, coeff.size);
131 |     minsn_t *rsz = resize_mop(ea, mop, const_term.size, sext.count(mop) != 0);
132 |     mul.r.create_from_insn(rsz);
133 |     delete rsz;
134 | 
135 |     mul.d.size = const_term.size;
136 | 
137 |     // add = res + mul
138 |     minsn_t *add = new minsn_t(ea);
139 |     add->opcode = m_add;
140 |     add->l.create_from_insn(res);
141 |     add->r.create_from_insn(&mul);
142 |     add->d.size = const_term.size;
143 | 
144 |     delete res; // mop_t::create_from_insn makes a copy of the insn
145 |     res = add;
146 |   }
147 | 
148 |   return res;
149 | }
150 | 
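The constructor above recovers a linear candidate by probing: it evaluates the instruction with all variables at zero to get the constant term, then sets one variable at a time to 1 to get its coefficient. The sketch below shows that probing on plain 64-bit functions. It is an illustration only: `expr_fn_t`, `linear_fit_t`, `fit_linear_sketch`, and `eval_linear_sketch` are hypothetical names, all variables are assumed to have the full 8-byte width, and the sign-extension probe for narrower operands is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical stand-in for an expression over n 64-bit variables.
using expr_fn_t = std::function<uint64_t(const std::vector<uint64_t> &)>;

struct linear_fit_t
{
  uint64_t const_term;
  std::vector<uint64_t> coeffs;
};

inline linear_fit_t fit_linear_sketch(const expr_fn_t &f, size_t nvars)
{
  std::vector<uint64_t> in(nvars, 0);
  linear_fit_t fit;
  fit.const_term = f(in);              // value with every variable at zero
  for ( size_t i = 0; i < nvars; i++ )
  {
    in[i] = 1;
    fit.coeffs.push_back(f(in) - fit.const_term); // slope along variable i
    in[i] = 0;                         // restore before probing the next one
  }
  return fit;
}

inline uint64_t eval_linear_sketch(const linear_fit_t &fit, const std::vector<uint64_t> &in)
{
  assert(in.size() == fit.coeffs.size());
  uint64_t res = fit.const_term;
  for ( size_t i = 0; i < in.size(); i++ )
    res += fit.coeffs[i] * in[i];      // wraps modulo 2^64
  return res;
}
```

The fit is only a candidate. If the probed expression is not actually linear, the fitted form will disagree with it on other inputs, so it still has to be checked by random testing or by an SMT query.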


--------------------------------------------------------------------------------
/linear_exprs.hpp:
--------------------------------------------------------------------------------
 1 | /*
 2 |  *      Copyright (c) 2023 by Hex-Rays, support@hex-rays.com
 3 |  *      ALL RIGHTS RESERVED.
 4 |  *
 5 |  *      gooMBA plugin for Hex-Rays Decompiler.
 6 |  *
 7 |  */
 8 | 
 9 | #pragma once
10 | #include <hexrays.hpp>
11 | 
12 | #include "smt_convert.hpp"
13 | #include "mcode_emu.hpp"
14 | 
15 | //-------------------------------------------------------------------------
16 | class candidate_expr_t
17 | {
18 | public:
19 |   virtual ~candidate_expr_t() {}
20 |   virtual mcode_val_t evaluate(mcode_emulator_t &emu) const = 0;
21 |   virtual z3::expr to_smt(z3_converter_t &converter) const = 0;
22 |   virtual minsn_t *to_minsn(ea_t ea) const = 0;
23 |   virtual const char *dstr() const = 0;
24 | };
25 | 
26 | //-------------------------------------------------------------------------
27 | // resize_mop generates a minsn that resizes the source operand (truncates or extends)
28 | inline minsn_t *resize_mop(ea_t ea, const mop_t &mop, int dest_sz, bool sext)
29 | {
30 |   minsn_t *res = new minsn_t(ea);
31 |   if ( dest_sz == mop.size )
32 |     res->opcode = m_mov;
33 |   else if ( dest_sz < mop.size )
34 |     res->opcode = m_low;
35 |   else
36 |     res->opcode = sext ? m_xds : m_xdu;
37 | 
38 |   res->l = mop;
39 |   res->d.size = dest_sz;
40 |   return res;
41 | }
42 | 
43 | //-------------------------------------------------------------------------
44 | // this emulator automatically assigns variables to 0
45 | // after the first run, the assigned_vals field can be modified
46 | // and the emulation can be rerun to obtain coefficients
47 | class default_zero_mcode_emu_t : public mcode_emulator_t
48 | {
49 | public:
50 |   std::map<const mop_t, mcode_val_t> assigned_vals;
51 | 
52 |   mcode_val_t get_var_val(const mop_t &mop) override
53 |   {
54 |     // check that the mop is indeed a variable
55 |     mopt_t t = mop.t;
56 |     QASSERT(30695, t == mop_r || t == mop_S || t == mop_v || t == mop_l);
57 | 
58 |     auto p = assigned_vals.find(mop);
59 |     if ( p != assigned_vals.end() )
60 |       return p->second;
61 | 
62 |     mcode_val_t new_val = mcode_val_t(0, mop.size);
63 |     assigned_vals.insert( { mop, new_val } );
64 |     return new_val;
65 |   }
66 | };
67 | 
68 | //-------------------------------------------------------------------------
69 | class linear_expr_t : public candidate_expr_t
70 | {
71 | public:
72 |   mcode_val_t const_term { 0, 1 };
73 |   std::map<mop_t, mcode_val_t> coeffs;
74 |   std::set<mop_t> sext;
75 | 
76 |   const char *dstr() const override;
77 |   linear_expr_t(const minsn_t &insn);
78 |   mcode_val_t evaluate(mcode_emulator_t &emu) const override;
79 |   z3::expr to_smt(z3_converter_t &cvtr) const override;
80 |   minsn_t *to_minsn(ea_t ea) const override;
81 | };
82 | 
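`resize_mop` chooses between `m_mov`, `m_low`, `m_xds`, and `m_xdu` by comparing the source and destination sizes. The sketch below mirrors that decision on plain integers and applies it to a value. The names `resize_kind_t`, `pick_resize_sketch`, `mask_bytes`, and `resize_val_sketch` are hypothetical, and sizes are assumed to be between 1 and 8 bytes.

```cpp
#include <cassert>
#include <cstdint>

// Mirrors the four cases: m_mov, m_low, m_xds, m_xdu.
enum class resize_kind_t { copy, truncate, sign_extend, zero_extend };

inline resize_kind_t pick_resize_sketch(int src_sz, int dest_sz, bool sext)
{
  if ( dest_sz == src_sz )
    return resize_kind_t::copy;
  if ( dest_sz < src_sz )
    return resize_kind_t::truncate;
  return sext ? resize_kind_t::sign_extend : resize_kind_t::zero_extend;
}

// all-ones mask covering sz bytes
inline uint64_t mask_bytes(int sz)
{
  return sz == 8 ? ~0ULL : ((1ULL << (sz * 8)) - 1);
}

inline uint64_t resize_val_sketch(uint64_t v, int src_sz, int dest_sz, bool sext)
{
  assert(src_sz >= 1 && src_sz <= 8 && dest_sz >= 1 && dest_sz <= 8);
  v &= mask_bytes(src_sz);
  if ( pick_resize_sketch(src_sz, dest_sz, sext) == resize_kind_t::sign_extend )
  {
    // sign extension implies src_sz < dest_sz <= 8, so the shift is in range
    uint64_t sign_bit = 1ULL << (src_sz * 8 - 1);
    if ( (v & sign_bit) != 0 )
      v |= ~mask_bytes(src_sz);        // fill the upper bits with ones
  }
  return v & mask_bytes(dest_sz);
}
```

With these definitions, the byte `0x80` widened to 4 bytes becomes `0xffffff80` when sign-extended and stays `0x80` when zero-extended. The `sext` flag only matters when the destination is wider than the source.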


--------------------------------------------------------------------------------
/makefile:
--------------------------------------------------------------------------------
  1 | PROC=goomba
  2 | 
  3 | GOALS += $(R)libz3$(DLLEXT)
  4 | O2=heuristics
  5 | O3=smt_convert
  6 | O4=linear_exprs
  7 | O5=msynth_parser
  8 | O6=bitwise_expr_lookup_tbl
  9 | O7=optimizer
 10 | O8=equiv_class
 11 | O9=file
 12 | 
 13 | CONFIGS=goomba.cfg
 14 | include ../plugin.mak
 15 | 
 16 | ifeq ($(THIRD_PARTY),)
 17 |   # building outside of Hex-Rays tree, use a local z3 build unless overridden
 18 |   Z3_BIN ?= z3/bin/
 19 |   Z3_INCLUDE ?= z3/include/
 20 | endif
 21 | 
 22 | ifdef __MAC__
 23 |   POSTACTION=install_name_tool -change libz3.dylib @executable_path/libz3.dylib $@
 24 | endif
 25 | 
 26 | ifdef __NT__
 27 |   # link to the import library on Windows
 28 |   STDLIBS += $(Z3_BIN)libz3.lib
 29 | else
 30 |   # link directly to dylib/shared object on Unix
 31 |   STDLIBS += -L$(R) -lz3
 32 | endif
 33 | 
 34 | $(F)$(PROC)$(O): CC_INCP += $(Z3_INCLUDE) $(Z3_INCLUDE)c++
 35 | $(F)$(O2)$(O): CC_INCP += $(Z3_INCLUDE) $(Z3_INCLUDE)c++
 36 | $(F)$(O3)$(O): CC_INCP += $(Z3_INCLUDE) $(Z3_INCLUDE)c++
 37 | $(F)$(O4)$(O): CC_INCP += $(Z3_INCLUDE) $(Z3_INCLUDE)c++
 38 | $(F)$(O5)$(O): CC_INCP += $(Z3_INCLUDE) $(Z3_INCLUDE)c++
 39 | $(F)$(O6)$(O): CC_INCP += $(Z3_INCLUDE) $(Z3_INCLUDE)c++
 40 | $(F)$(O7)$(O): CC_INCP += $(Z3_INCLUDE) $(Z3_INCLUDE)c++
 41 | $(F)$(O8)$(O): CC_INCP += $(Z3_INCLUDE) $(Z3_INCLUDE)c++
 42 | $(F)$(O9)$(O): CC_INCP += $(Z3_INCLUDE) $(Z3_INCLUDE)c++
 43 | $(F)$(PROC)$(O): $(R)libz3$(DLLEXT)
 44 | 
 45 | $(R)libz3$(DLLEXT): $(Z3_BIN)libz3$(DLLEXT)
 46 | 	$(Q)$(CP) $? $@
 47 | 
 48 | # MAKEDEP dependency list ------------------
 49 | $(F)bitwise_expr_lookup_tbl$(O): $(I)bitrange.hpp $(I)bytes.hpp             \
 50 |                   $(I)config.hpp $(I)fpro.h $(I)funcs.hpp $(I)gdl.hpp       \
 51 |                   $(I)hexrays.hpp $(I)ida.hpp $(I)idp.hpp $(I)ieee.h        \
 52 |                   $(I)kernwin.hpp $(I)lines.hpp $(I)llong.hpp               \
 53 |                   $(I)loader.hpp $(I)nalt.hpp $(I)name.hpp $(I)netnode.hpp  \
 54 |                   $(I)pro.h $(I)range.hpp $(I)segment.hpp $(I)typeinf.hpp   \
 55 |                   $(I)ua.hpp $(I)xref.hpp bitwise_expr_lookup_tbl.cpp       \
 56 |                   bitwise_expr_lookup_tbl.hpp consts.hpp linear_exprs.hpp   \
 57 |                   mcode_emu.hpp minsn_template.hpp smt_convert.hpp          \
 58 |                   z3++_no_warn.h
 59 | $(F)equiv_class$(O): $(I)bitrange.hpp $(I)bytes.hpp $(I)config.hpp          \
 60 |                   $(I)fpro.h $(I)funcs.hpp $(I)gdl.hpp $(I)hexrays.hpp      \
 61 |                   $(I)ida.hpp $(I)idp.hpp $(I)ieee.h $(I)kernwin.hpp        \
 62 |                   $(I)lines.hpp $(I)llong.hpp $(I)loader.hpp $(I)nalt.hpp   \
 63 |                   $(I)name.hpp $(I)netnode.hpp $(I)pro.h $(I)range.hpp      \
 64 |                   $(I)segment.hpp $(I)typeinf.hpp $(I)ua.hpp $(I)xref.hpp   \
 65 |                   bitwise_expr_lookup_tbl.hpp consts.hpp equiv_class.cpp    \
 66 |                   equiv_class.hpp heuristics.hpp lin_conj_exprs.hpp         \
 67 |                   linear_exprs.hpp mcode_emu.hpp minsn_template.hpp         \
 68 |                   msynth_parser.hpp optimizer.hpp simp_lin_conj_exprs.hpp   \
 69 |                   smt_convert.hpp z3++_no_warn.h
 70 | $(F)file$(O)    : $(I)bitrange.hpp $(I)bytes.hpp $(I)config.hpp $(I)fpro.h  \
 71 |                   $(I)funcs.hpp $(I)gdl.hpp $(I)hexrays.hpp $(I)ida.hpp     \
 72 |                   $(I)idp.hpp $(I)ieee.h $(I)kernwin.hpp $(I)lines.hpp      \
 73 |                   $(I)llong.hpp $(I)loader.hpp $(I)nalt.hpp $(I)name.hpp    \
 74 |                   $(I)netnode.hpp $(I)pro.h $(I)range.hpp $(I)segment.hpp   \
 75 |                   $(I)typeinf.hpp $(I)ua.hpp $(I)xref.hpp                   \
 76 |                   bitwise_expr_lookup_tbl.hpp consts.hpp equiv_class.hpp    \
 77 |                   file.cpp file.hpp heuristics.hpp lin_conj_exprs.hpp       \
 78 |                   linear_exprs.hpp mcode_emu.hpp minsn_template.hpp         \
 79 |                   msynth_parser.hpp simp_lin_conj_exprs.hpp                 \
 80 |                   smt_convert.hpp z3++_no_warn.h
 81 | $(F)goomba$(O)  : $(I)bitrange.hpp $(I)bytes.hpp $(I)config.hpp $(I)err.h   \
 82 |                   $(I)fpro.h $(I)funcs.hpp $(I)gdl.hpp $(I)hexrays.hpp      \
 83 |                   $(I)ida.hpp $(I)idp.hpp $(I)ieee.h $(I)kernwin.hpp        \
 84 |                   $(I)lines.hpp $(I)llong.hpp $(I)loader.hpp $(I)nalt.hpp   \
 85 |                   $(I)name.hpp $(I)netnode.hpp $(I)pro.h $(I)range.hpp      \
 86 |                   $(I)segment.hpp $(I)typeinf.hpp $(I)ua.hpp $(I)xref.hpp   \
 87 |                   bitwise_expr_lookup_tbl.hpp consts.hpp equiv_class.hpp    \
 88 |                   file.hpp goomba.cpp heuristics.hpp lin_conj_exprs.hpp     \
 89 |                   linear_exprs.hpp mcode_emu.hpp minsn_template.hpp         \
 90 |                   msynth_parser.hpp optimizer.hpp simp_lin_conj_exprs.hpp   \
 91 |                   smt_convert.hpp z3++_no_warn.h
 92 | $(F)heuristics$(O): $(I)bitrange.hpp $(I)bytes.hpp $(I)config.hpp           \
 93 |                   $(I)fpro.h $(I)funcs.hpp $(I)gdl.hpp $(I)hexrays.hpp      \
 94 |                   $(I)ida.hpp $(I)idp.hpp $(I)ieee.h $(I)kernwin.hpp        \
 95 |                   $(I)lines.hpp $(I)llong.hpp $(I)loader.hpp $(I)nalt.hpp   \
 96 |                   $(I)name.hpp $(I)netnode.hpp $(I)pro.h $(I)range.hpp      \
 97 |                   $(I)segment.hpp $(I)typeinf.hpp $(I)ua.hpp $(I)xref.hpp   \
 98 |                   heuristics.cpp heuristics.hpp linear_exprs.hpp            \
 99 |                   mcode_emu.hpp smt_convert.hpp z3++_no_warn.h
100 | $(F)linear_exprs$(O): $(I)bitrange.hpp $(I)bytes.hpp $(I)config.hpp         \
101 |                   $(I)fpro.h $(I)funcs.hpp $(I)gdl.hpp $(I)hexrays.hpp      \
102 |                   $(I)ida.hpp $(I)idp.hpp $(I)ieee.h $(I)kernwin.hpp        \
103 |                   $(I)lines.hpp $(I)llong.hpp $(I)loader.hpp $(I)nalt.hpp   \
104 |                   $(I)name.hpp $(I)netnode.hpp $(I)pro.h $(I)range.hpp      \
105 |                   $(I)segment.hpp $(I)typeinf.hpp $(I)ua.hpp $(I)xref.hpp   \
106 |                   linear_exprs.cpp linear_exprs.hpp mcode_emu.hpp           \
107 |                   smt_convert.hpp z3++_no_warn.h
108 | $(F)msynth_parser$(O): $(I)bitrange.hpp $(I)bytes.hpp $(I)config.hpp        \
109 |                   $(I)fpro.h $(I)funcs.hpp $(I)gdl.hpp $(I)hexrays.hpp      \
110 |                   $(I)ida.hpp $(I)idp.hpp $(I)ieee.h $(I)kernwin.hpp        \
111 |                   $(I)lines.hpp $(I)llong.hpp $(I)loader.hpp $(I)nalt.hpp   \
112 |                   $(I)name.hpp $(I)netnode.hpp $(I)pro.h $(I)range.hpp      \
113 |                   $(I)segment.hpp $(I)typeinf.hpp $(I)ua.hpp $(I)xref.hpp   \
114 |                   consts.hpp linear_exprs.hpp mcode_emu.hpp                 \
115 |                   minsn_template.hpp msynth_parser.cpp msynth_parser.hpp    \
116 |                   smt_convert.hpp z3++_no_warn.h
117 | $(F)optimizer$(O): $(I)bitrange.hpp $(I)bytes.hpp $(I)config.hpp            \
118 |                   $(I)fpro.h $(I)funcs.hpp $(I)gdl.hpp $(I)hexrays.hpp      \
119 |                   $(I)ida.hpp $(I)idp.hpp $(I)ieee.h $(I)kernwin.hpp        \
120 |                   $(I)lines.hpp $(I)llong.hpp $(I)loader.hpp $(I)nalt.hpp   \
121 |                   $(I)name.hpp $(I)netnode.hpp $(I)pro.h $(I)range.hpp      \
122 |                   $(I)segment.hpp $(I)typeinf.hpp $(I)ua.hpp $(I)xref.hpp   \
123 |                   bitwise_expr_lookup_tbl.hpp consts.hpp equiv_class.hpp    \
124 |                   heuristics.hpp lin_conj_exprs.hpp linear_exprs.hpp        \
125 |                   mcode_emu.hpp minsn_template.hpp msynth_parser.hpp        \
126 |                   optimizer.cpp optimizer.hpp simp_lin_conj_exprs.hpp       \
127 |                   smt_convert.hpp z3++_no_warn.h
128 | $(F)smt_convert$(O): $(I)bitrange.hpp $(I)bytes.hpp $(I)config.hpp          \
129 |                   $(I)fpro.h $(I)funcs.hpp $(I)gdl.hpp $(I)hexrays.hpp      \
130 |                   $(I)ida.hpp $(I)idp.hpp $(I)ieee.h $(I)kernwin.hpp        \
131 |                   $(I)lines.hpp $(I)llong.hpp $(I)loader.hpp $(I)nalt.hpp   \
132 |                   $(I)name.hpp $(I)netnode.hpp $(I)pro.h $(I)range.hpp      \
133 |                   $(I)segment.hpp $(I)typeinf.hpp $(I)ua.hpp $(I)xref.hpp   \
134 |                   mcode_emu.hpp smt_convert.cpp smt_convert.hpp             \
135 |                   z3++_no_warn.h
136 | 


--------------------------------------------------------------------------------
/mcode_emu.hpp:
--------------------------------------------------------------------------------
  1 | /*
  2 |  *      Copyright (c) 2023 by Hex-Rays, support@hex-rays.com
  3 |  *      ALL RIGHTS RESERVED.
  4 |  *
  5 |  *      gooMBA plugin for Hex-Rays Decompiler.
  6 |  *      This file implements a simple microcode emulator class
  7 |  *
  8 |  */
  9 | 
 10 | #pragma once
 11 | #include <hexrays.hpp>
 12 | 
 13 | //-------------------------------------------------------------------------
 14 | // truncate v to w bytes
 15 | inline uint64 trunc(uint64 v, int w)
 16 | {
 17 |   QASSERT(30660, w == 1 || w == 2 || w == 4 || w == 8);
 18 |   return v & make_mask<uint64>(w * 8);
 19 | }
 20 | 
 21 | //-------------------------------------------------------------------------
 22 | struct mcode_val_t
 23 | {
 24 |   uint64 val;
 25 |   int size; // in bytes
 26 | 
 27 |   //-------------------------------------------------------------------------
 28 |   void check_size_equal(const mcode_val_t &o) const
 29 |   {
 30 |     QASSERT(30661, size == o.size);
 31 |   }
 32 | 
 33 |   //-------------------------------------------------------------------------
 34 |   mcode_val_t(uint64 v, int s) : val(trunc(v, s)), size(s) {}
 35 | 
 36 |   //-------------------------------------------------------------------------
 37 |   int64 signed_val() const
 38 |   {
 39 |     return extend_sign(val, size, true);
 40 |   }
 41 | 
 42 |   //-------------------------------------------------------------------------
 43 |   mcode_val_t sext(int target_sz) const
 44 |   {
 45 |     QASSERT(30662, target_sz >= size);
 46 |     return mcode_val_t(signed_val(), target_sz);
 47 |   }
 48 | 
 49 |   //-------------------------------------------------------------------------
 50 |   mcode_val_t zext(int target_sz) const
 51 |   {
 52 |     QASSERT(30663, target_sz >= size);
 53 |     return mcode_val_t(val, target_sz);
 54 |   }
 55 | 
 56 |   //-------------------------------------------------------------------------
 57 |   mcode_val_t low(int target_sz) const
 58 |   {
 59 |     QASSERT(30664, target_sz <= size);
 60 |     return mcode_val_t(val, target_sz);
 61 |   }
 62 | 
 63 |   //-------------------------------------------------------------------------
 64 |   mcode_val_t high(int target_sz) const
 65 |   {
 66 |     QASSERT(30665, target_sz <= size);
 67 |     int bytes_to_remove = size - target_sz;
 68 |     return mcode_val_t(right_ushift<uint64>(val, 8 * bytes_to_remove), target_sz);
 69 |   }
 70 | 
 71 |   //-------------------------------------------------------------------------
 72 |   bool operator==(const mcode_val_t &o) const
 73 |   {
 74 |     return size == o.size && val == o.val;
 75 |   }
 76 | 
 77 |   //-------------------------------------------------------------------------
 78 |   bool operator!=(const mcode_val_t &o) const
 79 |   {
 80 |     return !(*this == o);
 81 |   }
 82 | 
 83 |   //-------------------------------------------------------------------------
 84 |   bool operator<(const mcode_val_t &o) const
 85 |   {
 86 |     QASSERT(30702, size == o.size);
 87 |     return val < o.val;
 88 |   }
 89 | 
 90 |   //-------------------------------------------------------------------------
 91 |   mcode_val_t operator+(const mcode_val_t &o) const
 92 |   {
 93 |     check_size_equal(o);
 94 |     return mcode_val_t(val + o.val, size);
 95 |   }
 96 | 
 97 |   //-------------------------------------------------------------------------
 98 |   mcode_val_t operator-(const mcode_val_t &o) const
 99 |   {
100 |     check_size_equal(o);
101 |     return mcode_val_t(val - o.val, size);
102 |   }
103 | 
104 |   //-------------------------------------------------------------------------
105 |   mcode_val_t operator*(const mcode_val_t &o) const
106 |   {
107 |     check_size_equal(o);
108 |     return mcode_val_t(val * o.val, size);
109 |   }
110 | 
111 |   //-------------------------------------------------------------------------
112 |   mcode_val_t operator/(const mcode_val_t &o) const
113 |   {
114 |     check_size_equal(o);
115 |     if ( o.val == 0 )
116 |       throw "division by zero occurred when emulating instruction";
117 |     return mcode_val_t(val / o.val, size);
118 |   }
119 | 
120 |   //-------------------------------------------------------------------------
121 |   mcode_val_t sdiv(const mcode_val_t &o) const
122 |   {
123 |     check_size_equal(o);
124 |     if ( o.val == 0 )
125 |       throw "division by zero occurred when emulating instruction";
126 |     int64 res;
127 |     uint64 l = val;
128 |     uint64 r = o.val;
129 |     switch ( size )
130 |     {
131 |       case 1: res = int8(l)  / int8(r); break;
132 |       case 2: res = int16(l) / int16(r); break;
133 |       case 4: res = int32(l) / int32(r); break;
134 |       case 8: res = int64(l) / int64(r); break;
135 |       default: INTERR(30666);
136 |     }
137 | 
138 |     return mcode_val_t(res, size);
139 |   }
140 | 
141 |   //-------------------------------------------------------------------------
142 |   mcode_val_t operator%(const mcode_val_t &o) const
143 |   {
144 |     check_size_equal(o);
145 |     if ( o.val == 0 )
146 |       throw "division by zero occurred when emulating instruction";
147 |     return mcode_val_t(val % o.val, size);
148 |   }
149 | 
150 |   //-------------------------------------------------------------------------
151 |   mcode_val_t smod(const mcode_val_t &o) const
152 |   {
153 |     check_size_equal(o);
154 |     if ( o.val == 0 )
155 |       throw "division by zero occurred when emulating instruction";
156 |     int64 res = -1;
157 |     uint64 l = val;
158 |     uint64 r = o.val;
159 |     switch ( size )
160 |     {
161 |       case 1: res = int8(l)  % int8(r); break;
162 |       case 2: res = int16(l) % int16(r); break;
163 |       case 4: res = int32(l) % int32(r); break;
164 |       case 8: res = int64(l) % int64(r); break;
165 |       default: QASSERT(30667, false);
166 |     }
167 | 
168 |     return mcode_val_t(res, size);
169 |   }
170 | 
171 |   //-------------------------------------------------------------------------
172 |   mcode_val_t operator<<(const mcode_val_t &o) const
173 |   {
174 |     return mcode_val_t(left_shift<uint64>(val, o.val), size);
175 |   }
176 | 
177 |   //-------------------------------------------------------------------------
178 |   mcode_val_t operator>>(const mcode_val_t &o) const
179 |   {
180 |     return mcode_val_t(right_ushift<uint64>(val, o.val), size);
181 |   }
182 | 
183 |   //-------------------------------------------------------------------------
184 |   mcode_val_t sar(const mcode_val_t &o) const
185 |   {
186 |     return mcode_val_t(right_sshift<int64>(signed_val(), o.val), size);
187 |   }
188 | 
189 |   //-------------------------------------------------------------------------
190 |   mcode_val_t operator|(const mcode_val_t &o) const
191 |   {
192 |     check_size_equal(o);
193 |     return mcode_val_t(val | o.val, size);
194 |   }
195 | 
196 |   //-------------------------------------------------------------------------
197 |   mcode_val_t operator&(const mcode_val_t &o) const
198 |   {
199 |     check_size_equal(o);
200 |     return mcode_val_t(val & o.val, size);
201 |   }
202 | 
203 |   //-------------------------------------------------------------------------
204 |   mcode_val_t operator^(const mcode_val_t &o) const
205 |   {
206 |     check_size_equal(o);
207 |     return mcode_val_t(val ^ o.val, size);
208 |   }
209 | 
210 |   //-------------------------------------------------------------------------
211 |   mcode_val_t operator-() const
212 |   {
213 |     return mcode_val_t(-val, size);
214 |   }
215 | 
216 |   //-------------------------------------------------------------------------
217 |   mcode_val_t operator!() const
218 |   {
219 |     return mcode_val_t(!val, size);
220 |   }
221 | 
222 |   //-------------------------------------------------------------------------
223 |   mcode_val_t operator~() const
224 |   {
225 |     return mcode_val_t(~val, size);
226 |   }
227 | };
228 | 
229 | //-------------------------------------------------------------------------
230 | class mcode_emulator_t
231 | {
232 | public:
233 |   // base classes with virtual functions should have a virtual destructor
234 |   virtual ~mcode_emulator_t() {}
235 |   // returns the value assigned to a register, stack, global, or local variable
236 |   virtual mcode_val_t get_var_val(const mop_t &mop) = 0;
237 | 
238 |   //-------------------------------------------------------------------------
239 |   mcode_val_t mop_value(const mop_t &mop)
240 |   {
241 |     if ( mop.size > 8 )
242 |       throw "too big mop size in mcode emulator";
243 |     switch ( mop.t )
244 |     {
245 |       case mop_n:
246 |         return mcode_val_t(mop.nnn->value, mop.size);
247 |       case mop_d:
248 |         return minsn_value(*mop.d);
249 |       case mop_r: // register
250 |       case mop_S: // stack variable
251 |       case mop_v: // global variable
252 |       case mop_l: // local variable
253 |         return get_var_val(mop);
254 |       default:
255 |         throw "unhandled mop type in mcode emulator";
256 |     }
257 |   }
258 | 
259 |   //-------------------------------------------------------------------------
260 |   mcode_val_t minsn_value(const minsn_t &insn)
261 |   {
262 |     if ( insn.is_fpinsn() )
263 |     {
264 |       msg("Emulator does not support floating point\n");
265 |       throw "Emulator does not support floating point";
266 |     }
267 |     switch ( insn.opcode )
268 |     {
269 |       case m_ldc:
270 |       case m_mov:
271 |         return mop_value(insn.l);
272 |       case m_neg:
273 |         return -mop_value(insn.l);
274 |       case m_lnot:
275 |         return !mop_value(insn.l);
276 |       case m_bnot:
277 |         return ~mop_value(insn.l);
278 |       case m_xds:
279 |         return mop_value(insn.l).sext(insn.d.size);
280 |       case m_xdu:
281 |         return mop_value(insn.l).zext(insn.d.size);
282 |       case m_low:
283 |         return mop_value(insn.l).low(insn.d.size);
284 |       case m_high:
285 |         return mop_value(insn.l).high(insn.d.size);
286 |       case m_add:
287 |         return mop_value(insn.l) + mop_value(insn.r);
288 |       case m_sub:
289 |         return mop_value(insn.l) - mop_value(insn.r);
290 |       case m_mul:
291 |         return mop_value(insn.l) * mop_value(insn.r);
292 |       case m_udiv:
293 |         return mop_value(insn.l) / mop_value(insn.r);
294 |       case m_sdiv:
295 |         return mop_value(insn.l).sdiv(mop_value(insn.r));
296 |       case m_umod:
297 |         return mop_value(insn.l) % mop_value(insn.r);
298 |       case m_smod:
299 |         return mop_value(insn.l).smod(mop_value(insn.r));
300 |       case m_or:
301 |         return mop_value(insn.l) | mop_value(insn.r);
302 |       case m_and:
303 |         return mop_value(insn.l) & mop_value(insn.r);
304 |       case m_xor:
305 |         return mop_value(insn.l) ^ mop_value(insn.r);
306 |       case m_shl:
307 |         return mop_value(insn.l) << mop_value(insn.r);
308 |       case m_shr:
309 |         return mop_value(insn.l) >> mop_value(insn.r);
310 |       case m_sar:
311 |         return mop_value(insn.l).sar(mop_value(insn.r));
312 |       case m_sets:
313 |         return mcode_val_t(mop_value(insn.l).signed_val() < 0, insn.d.size);
314 |       case m_setnz:
315 |         return mcode_val_t(mop_value(insn.l) != mop_value(insn.r), insn.d.size);
316 |       case m_setz:
317 |         return mcode_val_t(mop_value(insn.l) == mop_value(insn.r), insn.d.size);
318 |       case m_setae:
319 |         return mcode_val_t(mop_value(insn.l).val >= mop_value(insn.r).val, insn.d.size);
320 |       case m_setb:
321 |         return mcode_val_t(mop_value(insn.l).val < mop_value(insn.r).val, insn.d.size);
322 |       case m_seta:
323 |         return mcode_val_t(mop_value(insn.l).val > mop_value(insn.r).val, insn.d.size);
324 |       case m_setbe:
325 |         return mcode_val_t(mop_value(insn.l).val <= mop_value(insn.r).val, insn.d.size);
326 |       case m_setg:
327 |         return mcode_val_t(mop_value(insn.l).signed_val() > mop_value(insn.r).signed_val(), insn.d.size);
328 |       case m_setge:
329 |         return mcode_val_t(mop_value(insn.l).signed_val() >= mop_value(insn.r).signed_val(), insn.d.size);
330 |       case m_setl:
331 |         return mcode_val_t(mop_value(insn.l).signed_val() < mop_value(insn.r).signed_val(), insn.d.size);
332 |       case m_setle:
333 |         return mcode_val_t(mop_value(insn.l).signed_val() <= mop_value(insn.r).signed_val(), insn.d.size);
334 |       default:
335 |         msg("Unhandled opcode in emulator %d\n", insn.opcode);
336 |         throw "Unhandled opcode";
337 |     }
338 |   }
339 | };
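
The width handling in `mcode_val_t` can be tried outside IDA with a small stand-in. The sketch below is not part of gooMBA: `val_t` and its `mask`, `umod` and `smod` members are hypothetical names, and plain `<cstdint>` arithmetic replaces the SDK helpers `make_mask` and `extend_sign`. It only illustrates how truncation, sign extension, and unsigned versus signed remainder behave on a fixed-width value.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for mcode_val_t: a value kept in a uint64_t and
// truncated to `size` bytes (1, 2, 4 or 8). Not the plugin's own type.
struct val_t
{
  uint64_t val;
  int size; // in bytes

  val_t(uint64_t v, int s) : val(v & mask(s)), size(s)
  {
    assert(s == 1 || s == 2 || s == 4 || s == 8);
  }

  // all-ones mask covering the low `s` bytes
  static uint64_t mask(int s)
  {
    return s >= 8 ? ~uint64_t(0) : (uint64_t(1) << (s * 8)) - 1;
  }

  // interpret the low `size` bytes as a two's complement number
  int64_t signed_val() const
  {
    if ( size == 8 )
      return int64_t(val);
    uint64_t sign = uint64_t(1) << (size * 8 - 1);
    return int64_t((val ^ sign) - sign);
  }

  // widen, replicating the sign bit
  val_t sext(int target) const { return val_t(uint64_t(signed_val()), target); }
  // widen, filling with zeroes
  val_t zext(int target) const { return val_t(val, target); }

  // unsigned remainder; the caller must ensure o.val != 0
  val_t umod(const val_t &o) const { return val_t(val % o.val, size); }
  // signed remainder; the caller must ensure o.val != 0
  val_t smod(const val_t &o) const
  {
    return val_t(uint64_t(signed_val() % o.signed_val()), size);
  }
};
```

For the one-byte value `0xFB`, `umod` by 3 treats it as 251 and `smod` by 3 treats it as -5, so the two remainders differ even though the operand bits are the same.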


--------------------------------------------------------------------------------
/minsn_template.hpp:
--------------------------------------------------------------------------------
  1 | /*
  2 |  *      Copyright (c) 2023 by Hex-Rays, support@hex-rays.com
  3 |  *      ALL RIGHTS RESERVED.
  4 |  *
  5 |  *      gooMBA plugin for Hex-Rays Decompiler.
  6 |  *
  7 |  */
  8 | 
  9 | #pragma once
 10 | #include <hexrays.hpp>
 11 | #include "linear_exprs.hpp"
 12 | #include "consts.hpp"
 13 | 
 14 | //-------------------------------------------------------------------------
 15 | struct default_mops_t
 16 | {
 17 |   mopvec_t mops;
 18 | 
 19 |   static default_mops_t *get_instance()
 20 |   {
 21 |     if ( instance == nullptr )
 22 |       instance = new default_mops_t();
 23 |     return instance;
 24 |   }
 25 | 
 26 | private:
 27 |   static default_mops_t *instance;
 28 |   default_mops_t()
 29 |   {
 30 |     for ( int i = 0; i < CANDIDATE_EXPR_NUMINPUTS; i++ )
 31 |     {
 32 |       mop_t new_var;
 33 |       new_var.t = mop_l;
 34 |       new_var.l = new lvar_ref_t(nullptr, i);
 35 |       new_var.size = 8;
 36 |       mops.push_back(new_var);
 37 |     }
 38 |   }
 39 | };
 40 | 
 41 | //-------------------------------------------------------------------------
 42 | // a minsn template has no defined size or assigned terminal mops
 43 | class minsn_template_t
 44 | {
 45 | public:
 46 |   // caller is responsible for freeing the minsn_t *
 47 |   virtual minsn_t *synthesize(ea_t ea, int size, const qvector<mop_t> &mops) const = 0;
 48 |   virtual ~minsn_template_t() {}
 49 | 
 50 |   const char *dstr() const
 51 |   {
 52 |     minsn_t *insn = synthesize(0, 8, default_mops_t::get_instance()->mops);
 53 |     const char *res = insn->dstr();
 54 |     delete insn;
 55 |     return res;
 56 |   }
 57 | };
 58 | 
 59 | typedef std::shared_ptr<minsn_template_t> minsn_template_ptr_t;
 60 | typedef qvector<minsn_template_ptr_t> minsn_templates_t;
 61 | 
 62 | //-------------------------------------------------------------------------
 63 | struct mt_constant_t : public minsn_template_t
 64 | {
 65 |   uint64_t val;
 66 | 
 67 |   mt_constant_t(uint64_t v) : val(v) {}
 68 |   minsn_t *synthesize(ea_t ea, int size, const qvector<mop_t>&) const override
 69 |   {
 70 |     minsn_t *res = new minsn_t(ea);
 71 |     res->opcode = m_ldc;
 72 |     res->l.make_number(val, size, ea);
 73 |     res->r.zero();
 74 |     res->d.size = size;
 75 |     return res;
 76 |   }
 77 | };
 78 | 
 79 | //-------------------------------------------------------------------------
 80 | struct mt_varref_t : public minsn_template_t
 81 | {
 82 |   int var_idx;
 83 | 
 84 |   mt_varref_t(int v) : var_idx(v) {}
 85 |   minsn_t *synthesize(ea_t ea, int size, const qvector<mop_t> &mops) const override
 86 |   {
 87 |     QASSERT(30704, var_idx < mops.size());
 88 |     return resize_mop(ea, mops[var_idx], size, false);
 89 |   }
 90 | };
 91 | 
 92 | //-------------------------------------------------------------------------
 93 | struct mt_comp_t : public minsn_template_t
 94 | {
 95 |   mcode_t opc;
 96 |   minsn_templates_t operands;
 97 | 
 98 |   mt_comp_t(mcode_t op, minsn_templates_t opr) : opc(op), operands(opr) {}
 99 | 
100 |   minsn_t *synthesize(ea_t ea, int size, const qvector<mop_t> &mops) const override
101 |   {
102 |     minsn_t *res = new minsn_t(ea);
103 |     res->opcode = opc;
104 |     res->l.zero();
105 |     res->r.zero();
106 | 
107 |     if ( operands.size() >= 1 )
108 |     {
109 |       minsn_t *l = operands[0]->synthesize(ea, size, mops);
110 |       res->l.create_from_insn(l);
111 |       delete l;
112 |     }
113 |     if ( operands.size() >= 2 )
114 |     {
115 |       minsn_t *r = operands[1]->synthesize(ea, size, mops);
116 |       res->r.create_from_insn(r);
117 |       delete r;
118 |     }
119 | 
120 |     res->d.size = size;
121 |     return res;
122 |   }
123 | };
124 | 
125 | inline minsn_template_ptr_t make_un(mcode_t opc, minsn_template_ptr_t a)
126 | {
127 |   minsn_templates_t operands;
128 |   operands.push_back(a);
129 |   return std::make_shared<mt_comp_t>(opc, operands);
130 | }
131 | 
132 | inline minsn_template_ptr_t make_bin(mcode_t opc, minsn_template_ptr_t a, minsn_template_ptr_t b)
133 | {
134 |   minsn_templates_t operands;
135 |   operands.push_back(a);
136 |   operands.push_back(b);
137 |   return std::make_shared<mt_comp_t>(opc, operands);
138 | }
139 | 
140 | inline minsn_template_ptr_t operator+(minsn_template_ptr_t a, minsn_template_ptr_t b)
141 | {
142 |   return make_bin(m_add, a, b);
143 | }
144 | inline minsn_template_ptr_t operator*(minsn_template_ptr_t a, minsn_template_ptr_t b)
145 | {
146 |   return make_bin(m_mul, a, b);
147 | }
148 | inline minsn_template_ptr_t operator&(minsn_template_ptr_t a, minsn_template_ptr_t b)
149 | {
150 |   return make_bin(m_and, a, b);
151 | }
152 | inline minsn_template_ptr_t operator|(minsn_template_ptr_t a, minsn_template_ptr_t b)
153 | {
154 |   return make_bin(m_or, a, b);
155 | }
156 | inline minsn_template_ptr_t operator^(minsn_template_ptr_t a, minsn_template_ptr_t b)
157 | {
158 |   return make_bin(m_xor, a, b);
159 | }
160 | inline minsn_template_ptr_t operator~(minsn_template_ptr_t a)
161 | {
162 |   return make_un(m_bnot, a);
163 | }
164 | 
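
The operator overloads above let a template be written as an ordinary C++ expression over shared pointers. The following standalone sketch mirrors that pattern without the Hex-Rays SDK. All names in it (`node_t`, `constant_t`, `varref_t`, `binop_t`, `var`, `num`, `mba_add_template`) are hypothetical, and the nodes evaluate to a `uint64_t` instead of synthesizing a `minsn_t`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical expression node; evaluates directly instead of emitting microcode.
struct node_t
{
  virtual ~node_t() {}
  virtual uint64_t eval(const std::vector<uint64_t> &vars) const = 0;
};
typedef std::shared_ptr<node_t> node_ptr_t;

struct constant_t : node_t
{
  uint64_t val;
  explicit constant_t(uint64_t v) : val(v) {}
  uint64_t eval(const std::vector<uint64_t> &) const override { return val; }
};

struct varref_t : node_t
{
  size_t idx; // index into the variable vector, bound only at eval time
  explicit varref_t(size_t i) : idx(i) {}
  uint64_t eval(const std::vector<uint64_t> &vars) const override
  {
    assert(idx < vars.size());
    return vars[idx];
  }
};

struct binop_t : node_t
{
  char op;
  node_ptr_t l, r;
  binop_t(char o, node_ptr_t a, node_ptr_t b) : op(o), l(a), r(b) {}
  uint64_t eval(const std::vector<uint64_t> &vars) const override
  {
    uint64_t a = l->eval(vars), b = r->eval(vars);
    switch ( op )
    {
      case '+': return a + b;
      case '*': return a * b;
      case '&': return a & b;
      case '|': return a | b;
      default:  return a ^ b; // '^'
    }
  }
};

inline node_ptr_t var(size_t i) { return std::make_shared<varref_t>(i); }
inline node_ptr_t num(uint64_t v) { return std::make_shared<constant_t>(v); }
inline node_ptr_t operator+(node_ptr_t a, node_ptr_t b) { return std::make_shared<binop_t>('+', a, b); }
inline node_ptr_t operator*(node_ptr_t a, node_ptr_t b) { return std::make_shared<binop_t>('*', a, b); }
inline node_ptr_t operator&(node_ptr_t a, node_ptr_t b) { return std::make_shared<binop_t>('&', a, b); }
inline node_ptr_t operator^(node_ptr_t a, node_ptr_t b) { return std::make_shared<binop_t>('^', a, b); }

// (v0 ^ v1) + 2 * (v0 & v1), a well-known MBA form of v0 + v1
inline node_ptr_t mba_add_template()
{
  return (var(0) ^ var(1)) + num(2) * (var(0) & var(1));
}
```

The same tree can be evaluated against any variable vector, for example `mba_add_template()->eval({5, 9})`, which is the property the real templates rely on when they defer size and operands to `synthesize`.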


--------------------------------------------------------------------------------
/msynth_parser.cpp:
--------------------------------------------------------------------------------
  1 | /*
  2 |  *      Copyright (c) 2023 by Hex-Rays, support@hex-rays.com
  3 |  *      ALL RIGHTS RESERVED.
  4 |  *
  5 |  *      gooMBA plugin for Hex-Rays Decompiler.
  6 |  *
  7 |  */
  8 | 
  9 | #include "z3++_no_warn.h"
 10 | #include "msynth_parser.hpp"
 11 | #include "minsn_template.hpp"
 12 | 
 13 | default_mops_t *default_mops_t::instance = nullptr;
 14 | 
 15 | minsn_t *msynth_expr_parser_t::parse_next_expr()
 16 | {
 17 |   if ( *next == '~' )
 18 |   {
 19 |     next++;
 20 |     minsn_t *res = new minsn_t(0);
 21 |     res->opcode = m_bnot;
 22 |     minsn_t *next_expr = parse_next_expr();
 23 |     res->l.create_from_insn(next_expr);
 24 |     delete next_expr;
 25 |     next_expr = nullptr;
 26 |     res->d.size = res->l.size;
 27 |     return res;
 28 |   }
 29 | 
 30 |   // ExprInt(val: uint64, bitw: int)
 31 |   {
 32 |     int nread;
 33 |     uint64 val;
 34 |     int bitw;
 35 |     int sr = qsscanf(next, "ExprInt(%" FMT_64 "u, %d)%n", &val, &bitw, &nread);
 36 |     if ( sr == 2 )
 37 |     {
 38 |       next += nread;
 39 | 
 40 |       minsn_t *res = new minsn_t(0);
 41 |       res->opcode = m_ldc;
 42 |       res->l.make_number(val, bitw/8);
 43 |       res->r.zero();
 44 |       res->d.size = bitw/8;
 45 |       return res;
 46 |     }
 47 |   }
 48 | 
 49 |   // ExprId(id: str, bitw: int)
 50 |   {
 51 |     int nread;
 52 |     int varnum, bitw;
 53 |     int sr = qsscanf(next, "ExprId(\"p%d\", %d)%n", &varnum, &bitw, &nread);
 54 |     if ( sr == 2 )
 55 |     {
 56 |       next += nread;
 57 |       minsn_t *res = new minsn_t(0);
 58 |       res->opcode = bitw == 64 ? m_mov : m_low;
 59 |       res->l = vars[varnum];
 60 |       res->d.size = bitw/8;
 61 |       return res;
 62 |     }
 63 |   }
 64 | 
 65 |   // ExprOp(op: str, expr*)
 66 |   {
 67 |     int sc = strncmp(next, "ExprOp", 6);
 68 |     if ( sc == 0 )
 69 |     {
 70 |       int nread;
 71 |       next += 6;
 72 |       char op[3];
 73 |       int sr = qsscanf(next, "(\"%2[^\"]\"%n", op, &nread);
 74 |       QASSERT(30688, sr == 1);
 75 |       next += nread;
 76 | 
 77 |       minsnptrs_t args;
 78 |       while ( *next != ')' )
 79 |       {
 80 |         sc = strncmp(next, ", ", 2);
 81 |         QASSERT(30689, sc == 0);
 82 |         next += 2;
 83 | 
 84 |         args.push_back(parse_next_expr());
 85 |       }
 86 | 
 87 |       next++; // consume the ')'
 88 | 
 89 |       // - can be either unary or binary
 90 |       if ( streq(op, "-") )
 91 |       {
 92 |         if ( args.size() == 1 )
 93 |           return make_un(m_neg, &args);
 94 |         if ( args.size() == 2 )
 95 |           return make_bin(m_sub, &args);
 96 |         INTERR(30690);
 97 |       }
 98 |       else
 99 |       {
100 |         mcode_t code = get_binop(op);
101 |         if ( code != m_nop )
102 |           return make_bin(code, &args);
103 |       }
104 |       INTERR(30691);
105 |     }
106 |   }
107 | 
108 |   // ExprSlice(expr, low, hi)
109 |   {
110 |     int sc = strncmp(next, "ExprSlice", 9);
111 |     if ( sc == 0 )
112 |     {
113 |       next += 9;
114 |       QASSERT(30692, *next == '(');
115 |       next++;
116 |       minsn_t *to_slice = parse_next_expr();
117 |       int lo, hi, nread;
118 |       int sr = qsscanf(next, ", %d, %d)%n", &lo, &hi, &nread);
119 |       QASSERT(30693, sr == 2);
120 |       next += nread;
121 |       minsn_t *res = make_slice(to_slice, lo, hi);
122 |       delete to_slice;
123 |       return res;
124 |     }
125 |   }
126 | 
127 |   INTERR(30694);
128 | }
129 | 
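
The recursive-descent shape of `parse_next_expr` can be exercised without the SDK. The sketch below is an illustration under stated assumptions, not the plugin's parser: `msynth_eval_t` and `eval_msynth` are hypothetical names, every value is treated as 64-bit regardless of the `bitw` field, only `ExprInt`, `ExprId` and `ExprOp` are handled, and the expression is evaluated directly instead of being turned into a `minsn_t`. It uses the standard `sscanf` in place of `qsscanf`.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical one-pass parser and evaluator for msynth-style expression strings.
struct msynth_eval_t
{
  const char *next;           // current position in the input
  std::vector<uint64_t> vars; // values for p0, p1, ...

  static uint64_t apply(const std::string &op, const std::vector<uint64_t> &a)
  {
    if ( op == "-" && a.size() == 1 )
      return 0 - a[0]; // unary minus
    if ( a.size() != 2 )
      throw std::runtime_error("unexpected operand count");
    if ( op == "+" )  return a[0] + a[1];
    if ( op == "-" )  return a[0] - a[1];
    if ( op == "*" )  return a[0] * a[1];
    if ( op == "&" )  return a[0] & a[1];
    if ( op == "|" )  return a[0] | a[1];
    if ( op == "^" )  return a[0] ^ a[1];
    if ( op == "<<" ) return a[1] < 64 ? a[0] << a[1] : 0;
    throw std::runtime_error("unknown operator");
  }

  uint64_t parse()
  {
    int nread = 0, bitw = 0, varnum = 0;
    unsigned long long val = 0;

    // ExprInt(val, bitw); nread stays 0 if the closing ')' is missing
    if ( sscanf(next, "ExprInt(%llu, %d)%n", &val, &bitw, &nread) == 2 && nread > 0 )
    {
      next += nread;
      return val;
    }

    // ExprId("pN", bitw)
    nread = 0;
    if ( sscanf(next, "ExprId(\"p%d\", %d)%n", &varnum, &bitw, &nread) == 2 && nread > 0 )
    {
      next += nread;
      if ( varnum < 0 || size_t(varnum) >= vars.size() )
        throw std::runtime_error("variable index out of range");
      return vars[varnum];
    }

    // ExprOp("op", expr, ...)
    if ( strncmp(next, "ExprOp(\"", 8) == 0 )
    {
      next += 8;
      std::string op;
      while ( *next != '\0' && *next != '"' )
        op += *next++;
      if ( *next != '"' )
        throw std::runtime_error("unterminated operator name");
      next++; // consume the closing quote
      std::vector<uint64_t> args;
      while ( *next != ')' )
      {
        if ( strncmp(next, ", ", 2) != 0 )
          throw std::runtime_error("expected ', '");
        next += 2;
        args.push_back(parse());
      }
      next++; // consume the ')'
      return apply(op, args);
    }
    throw std::runtime_error("unrecognized expression");
  }
};

inline uint64_t eval_msynth(const char *s, std::vector<uint64_t> vars)
{
  msynth_eval_t p{ s, std::move(vars) };
  return p.parse();
}
```

Malformed input raises `std::runtime_error` in this sketch, whereas the plugin code above reports the same conditions through `QASSERT` and `INTERR`.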


--------------------------------------------------------------------------------
/msynth_parser.hpp:
--------------------------------------------------------------------------------
  1 | /*
  2 |  *      Copyright (c) 2023 by Hex-Rays, support@hex-rays.com
  3 |  *      ALL RIGHTS RESERVED.
  4 |  *
  5 |  *      gooMBA plugin for Hex-Rays Decompiler.
  6 |  *
  7 |  */
  8 | 
  9 | #pragma once
 10 | #include <hexrays.hpp>
 11 | #include "linear_exprs.hpp"
 12 | 
 13 | //-------------------------------------------------------------------------
 14 | struct bin_op_t
 15 | {
 16 |   const char *text;
 17 |   mcode_t opcode;
 18 | };
 19 | 
 20 | static const bin_op_t bin_ops[] =
 21 | {
 22 |   { "+",  m_add  }, // "-" is handled separately since it can also be unary
 23 |   { "*",  m_mul  },
 24 |   { "/",  m_udiv },
 25 |   { "&",  m_and  },
 26 |   { "|",  m_or   },
 27 |   { "^",  m_xor  },
 28 |   { "<<", m_shl  },
 29 | };
 30 | 
 31 | //-------------------------------------------------------------------------
 32 | inline mcode_t get_binop(const char *op)
 33 | {
 34 |   for ( size_t i=0; i < qnumber(bin_ops); i++ )
 35 |     if ( streq(bin_ops[i].text, op) )
 36 |       return bin_ops[i].opcode;
 37 |   return m_nop;
 38 | }
 39 | 
 40 | //-------------------------------------------------------------------------
 41 | class msynth_expr_parser_t
 42 | {
 43 | public:
 44 |   const char *next;
 45 |   const mopvec_t &vars;
 46 | 
 47 | 
 48 |   //-------------------------------------------------------------------------
 49 |   void init_from_arg(mop_t *op, minsn_t **pp_ins)
 50 |   {
 51 |     minsn_t *ins = *pp_ins;
 52 |     op->create_from_insn(ins);
 53 |     delete ins;
 54 |     *pp_ins = nullptr;
 55 |   }
 56 | 
 57 |   //-------------------------------------------------------------------------
 58 |   minsn_t *make_un(mcode_t opcode, minsnptrs_t *args)
 59 |   {
 60 |     QASSERT(30683, args->size() == 1);
 61 |     minsn_t *res = new minsn_t(0);
 62 |     res->opcode = opcode;
 63 |     init_from_arg(&res->l, args->begin() + 0);
 64 |     res->d.size = res->l.size;
 65 |     return res;
 66 |   }
 67 | 
 68 |   //-------------------------------------------------------------------------
 69 |   minsn_t *make_bin(mcode_t opcode, minsnptrs_t *args)
 70 |   {
 71 |     QASSERT(30684, args->size() == 2);
 72 |     minsn_t *res = new minsn_t(0);
 73 |     res->opcode = opcode;
 74 |     init_from_arg(&res->l, args->begin() + 0);
 75 |     init_from_arg(&res->r, args->begin() + 1);
 76 |     if ( opcode == m_shl && res->r.size != 1 )
 77 |       res->r.change_size(1);
 78 |     res->d.size = res->l.size;
 79 |     return res;
 80 |   }
 81 | 
 82 |   //-------------------------------------------------------------------------
 83 |   minsn_t *make_slice(minsn_t *src, int lo, int hi)
 84 |   {
 85 |     QASSERT(30686, lo == 0);
 86 |     QASSERT(30687, hi == 8 || hi == 16 || hi == 32);
 87 | 
 88 |     minsn_t *res = new minsn_t(0);
 89 |     res->opcode = m_low;
 90 |     res->l.create_from_insn(src);
 91 |     res->d.size = hi / 8;
 92 |     return res;
 93 |   }
 94 | 
 95 |   minsn_t *parse_next_expr();
 96 | 
 97 | public:
 98 |   msynth_expr_parser_t(const char *s, const mopvec_t &v) : next(s), vars(v) {}
 99 | };
100 | 


--------------------------------------------------------------------------------
/optimizer.cpp:
--------------------------------------------------------------------------------
  1 | /*
  2 |  *      Copyright (c) 2023 by Hex-Rays, support@hex-rays.com
  3 |  *      ALL RIGHTS RESERVED.
  4 |  *
  5 |  *      gooMBA plugin for Hex-Rays Decompiler.
  6 |  *
  7 |  */
  8 | 
  9 | #include <chrono>
 10 | 
 11 | #include "z3++_no_warn.h"
 12 | #include "optimizer.hpp"
 13 | 
 14 | //--------------------------------------------------------------------------
 15 | // check whether or not we should skip the proving step of optimization
 16 | inline bool skip_proofs()
 17 | {
 18 |   return qgetenv("VD_MBA_SKIP_PROOFS");
 19 | }
 20 | 
 21 | //--------------------------------------------------------------------------
 22 | inline void set_cmt(ea_t ea, const char *cmt)
 23 | {
 24 |   func_t *pfn = get_func(ea);
 25 |   set_func_cmt(pfn, cmt, false);
 26 | }
 27 | 
 28 | //--------------------------------------------------------------------------
 29 | static bool check_and_substitute(
 30 |         minsn_t *insn,
 31 |         minsn_t *cand_insn,
 32 |         uint z3_timeout,
 33 |         bool z3_assume_timeouts_correct)
 34 | {
 35 |   bool ok = false;
 36 |   int original_score = score_complexity(*insn);
 37 |   int candidate_score = score_complexity(*cand_insn);
 38 |   msg("Testing candidate %s\n", cand_insn->dstr());
 39 |   if ( candidate_score > original_score )
 40 |   {
 41 |     msg("Candidate (%d) is not simpler than original (%d), skipping\n", candidate_score, original_score);
 42 |   }
 43 |   else
 44 |   {
 45 |     z3_converter_t converter;
 46 |     if ( probably_equivalent(*insn, *cand_insn) )
 47 |     {
 48 |       msg("Instruction is probably equivalent to candidate\n");
 49 |       if ( skip_proofs() || z3_timeout == 0 )
 50 |       {
 51 |         set_cmt(insn->ea, "goomba: z3 proof skipped, simplification assumed correct");
 52 |         ok = true;
 53 |       }
 54 |       else
 55 |       {
 56 |         z3::expr lge = converter.minsn_to_expr(*cand_insn);
 57 |         z3::expr ie = converter.minsn_to_expr(*insn);
 58 |         z3::solver s(converter.context);
 59 |         s.set("timeout", z3_timeout);
 60 |         s.add(lge != ie);
 61 |         z3::check_result res = s.check();
 62 |         msg("SMT check result: %d\n", res);
 63 | 
 64 |         if ( res == z3::check_result::unsat )
 65 |         {
 66 |           ok = true;
 67 |         }
 68 | 
 69 |         if ( z3_assume_timeouts_correct && res == z3::check_result::unknown )
 70 |         {
 71 |           set_cmt(insn->ea, "goomba: z3 proof timed out, simplification assumed correct");
 72 |           ok = true;
 73 |         }
 74 |       }
 75 |     }
 76 |     else
 77 |     {
 78 |       msg("Candidate not equivalent, skipping\n");
 79 |     }
 80 |   }
 81 | 
 82 |   if ( ok )
 83 |     substitute(insn, cand_insn);
 84 | 
 85 |   return ok;
 86 | }
 87 | 
 88 | //--------------------------------------------------------------------------
 89 | bool optimizer_t::optimize_insn_recurse(minsn_t *insn)
 90 | {
 91 |   if ( optimize_insn(insn) )
 92 |     return true;
 93 | 
 94 |   bool result = false;
 95 | 
 96 |   if ( insn->l.is_insn() )
 97 |     result |= optimize_insn_recurse(insn->l.d);
 98 | 
 99 |   if ( insn->r.is_insn() )
100 |     result |= optimize_insn_recurse(insn->r.d);
101 | 
102 |   return result;
103 | }
104 | 
105 | //--------------------------------------------------------------------------
106 | bool optimizer_t::optimize_insn(minsn_t *insn)
107 | {
108 |   bool success = false;
109 |   auto start_time = std::chrono::high_resolution_clock::now();
110 | 
111 |   if ( insn->has_side_effects(true) )
112 |   {
113 | //    msg("Instruction has side effects, skipping\n");
114 |   }
115 |   else
116 |   {
117 |     if ( is_mba(*insn) )
118 |     {
119 |       msg("Found MBA instruction %s\n", insn->dstr());
120 | 
121 |       try
122 |       {
123 |         minsn_set_t candidate_set; // recall minsn_set_t is automatically sorted by complexity
124 |         auto equiv_class_start = std::chrono::high_resolution_clock::now();
125 |         if ( equiv_classes != nullptr )
126 |           equiv_classes->find_candidates(candidate_set, *insn);
127 |         auto equiv_class_end = std::chrono::high_resolution_clock::now();
128 | 
129 |         auto linear_start = equiv_class_end;
130 |         linear_expr_t linear_guess(*insn);
131 | //        msg("Linear guess %s\n", linear_guess.dstr());
132 |         candidate_set.insert(linear_guess.to_minsn(insn->ea));
133 |         auto linear_end = std::chrono::high_resolution_clock::now();
134 | 
135 |         auto lin_conj_start = linear_end;
136 |         lin_conj_expr_t lin_conj_guess(*insn);
137 |         simp_lin_conj_expr_t simp_lin_conj_expr_t(lin_conj_guess);
138 | //        msg("Simplified lin conj guess %s\n", simp_lin_conj_expr_t.dstr());
139 |         candidate_set.insert(simp_lin_conj_expr_t.to_minsn(insn->ea));
140 |         auto lin_conj_end = std::chrono::high_resolution_clock::now();
141 | 
142 |         for ( minsn_t *cand : candidate_set )
143 |         {
144 |           cand->optimize_solo(); // get rid of useless mov(#0) operands
145 |           if ( check_and_substitute(insn, cand, z3_timeout, z3_assume_timeouts_correct) )
146 |           {
147 |             if ( qgetenv("VD_MBA_LOG_PERF") )
148 |             {
149 |               int nvars = get_input_mops(*insn).size();
150 |               msg("Equiv class time: %d %" FMT_64 "d us\n", nvars,
151 |                 std::chrono::duration_cast<std::chrono::microseconds>(equiv_class_end - equiv_class_start).count());
152 |               msg("Linear time: %d %" FMT_64 "d us\n", nvars,
153 |                 std::chrono::duration_cast<std::chrono::microseconds>(linear_end - linear_start).count());
154 |               msg("Lin conj time: %d %" FMT_64 "d us\n", nvars,
155 |                 std::chrono::duration_cast<std::chrono::microseconds>(lin_conj_end - lin_conj_start).count());
156 |             }
157 |             success = true;
158 |             goto finish;
159 |           }
160 |         }
161 |       }
162 |       catch ( const char *&e )
163 |       {
164 |         msg("err: %s\n", e);
165 |         return false;
166 |       }
167 |     }
168 |   }
169 | 
170 | finish:
171 |   if ( success )
172 |   {
173 |     auto end_time = std::chrono::high_resolution_clock::now();
174 |     msg("Time taken: %" FMT_64 "d us\n",
175 |       std::chrono::duration_cast<std::chrono::microseconds>(end_time - start_time).count());
176 |   }
177 | 
178 |   return success;
179 | }
180 | 
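
`check_and_substitute` screens a candidate with `probably_equivalent` before spending time on a Z3 proof. The implementation of `probably_equivalent` is not shown in this file, so the sketch below is only an assumption about the general technique: compare two expressions on a few corner cases and on random inputs, and reject the candidate on the first mismatch. The names `expr2_t`, `probably_equivalent_sketch`, `mba_add`, `plain_add` and `plain_or` are hypothetical, and the expressions are plain two-argument functions, not microcode.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <random>

// Hypothetical two-variable expression over 64-bit values.
typedef std::function<uint64_t(uint64_t, uint64_t)> expr2_t;

// Returns false as soon as the two expressions disagree on any tested input.
// A true result is only evidence of equivalence, not a proof.
inline bool probably_equivalent_sketch(const expr2_t &a, const expr2_t &b, int ntests = 256)
{
  // corner cases first: they tend to expose sign and carry mistakes
  const uint64_t corners[] = { 0, 1, ~uint64_t(0), uint64_t(1) << 63 };
  for ( uint64_t x : corners )
    for ( uint64_t y : corners )
      if ( a(x, y) != b(x, y) )
        return false;

  std::mt19937_64 rng(12345); // fixed seed keeps the check reproducible
  for ( int i = 0; i < ntests; i++ )
  {
    uint64_t x = rng();
    uint64_t y = rng();
    if ( a(x, y) != b(x, y) )
      return false;
  }
  return true;
}

// an MBA-obfuscated addition and two simpler candidates
inline uint64_t mba_add(uint64_t x, uint64_t y)   { return (x ^ y) + 2 * (x & y); }
inline uint64_t plain_add(uint64_t x, uint64_t y) { return x + y; }
inline uint64_t plain_or(uint64_t x, uint64_t y)  { return x | y; }
```

Here `probably_equivalent_sketch(mba_add, plain_add)` passes while `probably_equivalent_sketch(mba_add, plain_or)` fails on the corner case `x = y = 1`. Because random testing cannot rule out a rare counterexample, a passing candidate would still go on to the solver step, as the code above does unless proofs are skipped.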


--------------------------------------------------------------------------------
/optimizer.hpp:
--------------------------------------------------------------------------------
 1 | /*
 2 |  *      Copyright (c) 2023 by Hex-Rays, support@hex-rays.com
 3 |  *      ALL RIGHTS RESERVED.
 4 |  *
 5 |  *      gooMBA plugin for Hex-Rays Decompiler.
 6 |  *
 7 |  */
 8 | 
 9 | #pragma once
10 | 
11 | #include "equiv_class.hpp"
12 | #include "smt_convert.hpp"
13 | #include "heuristics.hpp"
14 | #include "lin_conj_exprs.hpp"
15 | #include "simp_lin_conj_exprs.hpp"
16 | 
17 | //--------------------------------------------------------------------------
18 | inline void substitute(minsn_t *insn, minsn_t *cand)
19 | {
20 |   cand->d = insn->d;
21 |   insn->swap(*cand);
22 | }
23 | 
24 | //--------------------------------------------------------------------------
25 | class optimizer_t
26 | {
27 | public:
28 |   uint z3_timeout = 1000;
29 |   bool z3_assume_timeouts_correct = true;
30 |   equiv_class_finder_t *equiv_classes = nullptr;
31 |   bool optimize_insn(minsn_t *insn); // attempts to replace the instruction with a simpler version
32 |   bool optimize_insn_recurse(minsn_t *insn); // attempts to optimize the instruction, and if it fails, optimizes its subinstructions
33 | };
34 | 
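The comment on `optimize_insn_recurse` describes a "try the whole instruction first, then fall back to its subinstructions" strategy. The sketch below illustrates that control flow on a toy expression tree. It is an illustration only: `node_t`, `make_leaf`, `make_bin`, `optimize_node` and `optimize_recurse` are hypothetical names, constant folding stands in for the real MBA simplification, and the actual implementation in optimizer.cpp may differ.

```cpp
#include <memory>
#include <utility>

// Hypothetical expression node standing in for minsn_t.
struct node_t
{
  char op = 'c';                  // '+', '*', 'c' (constant leaf) or 'v' (variable leaf)
  long value = 0;                 // meaningful only when op == 'c'
  std::unique_ptr<node_t> l, r;   // operands; null for leaves
};

inline std::unique_ptr<node_t> make_leaf(char kind, long v)
{
  auto n = std::make_unique<node_t>();
  n->op = kind;
  n->value = v;
  return n;
}

inline std::unique_ptr<node_t> make_bin(char op, std::unique_ptr<node_t> l, std::unique_ptr<node_t> r)
{
  auto n = std::make_unique<node_t>();
  n->op = op;
  n->l = std::move(l);
  n->r = std::move(r);
  return n;
}

// Toy whole-node rewrite: fold '+' or '*' applied to two constant leaves.
inline bool optimize_node(node_t *n)
{
  if ( n->op != '+' && n->op != '*' )
    return false;
  if ( !n->l || !n->r || n->l->op != 'c' || n->r->op != 'c' )
    return false;
  n->value = n->op == '+' ? n->l->value + n->r->value
                          : n->l->value * n->r->value;
  n->op = 'c';
  n->l.reset();
  n->r.reset();
  return true;
}

// Try the node itself first; only when that fails, descend into both operands.
inline bool optimize_recurse(node_t *n)
{
  if ( n == nullptr )
    return false;
  if ( optimize_node(n) )
    return true;
  bool changed = optimize_recurse(n->l.get());
  if ( optimize_recurse(n->r.get()) )
    changed = true;
  return changed;
}
```

For `x + (2 * 3)` the top-level rewrite fails because the left operand is a variable, so the recursion folds only the right operand and leaves `x + 6`.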


--------------------------------------------------------------------------------
/simp_lin_conj_exprs.hpp:
--------------------------------------------------------------------------------
  1 | /*
  2 |  *      Copyright (c) 2023 by Hex-Rays, support@hex-rays.com
  3 |  *      ALL RIGHTS RESERVED.
  4 |  *
  5 |  *      gooMBA plugin for Hex-Rays Decompiler.
  6 |  *
  7 |  */
  8 | 
  9 | #pragma once
 10 | #include <memory>
 11 | #include <hexrays.hpp>
 12 | #include "lin_conj_exprs.hpp"
 13 | #include "minsn_template.hpp"
 14 | #include "bitwise_expr_lookup_tbl.hpp"
 15 | 
 16 | //-------------------------------------------------------------------------
 17 | // represents a simplified linear combination of conjunctions,
 18 | // essentially just a lin_conj_expr with more bitwise expressions
 19 | // other than just conjunctions
 20 | class simp_lin_conj_expr_t : public lin_conj_expr_t
 21 | {
 22 |   minsn_template_ptr_t non_conj_term = std::make_shared<mt_constant_t>(0ull);
 23 |   qvector<mcode_val_t> range; // sorted lowest to highest
 24 | 
 25 |   //-------------------------------------------------------------------------
 26 |   void recompute_range()
 27 |   {
 28 |     std::set<mcode_val_t> new_range;
 29 | 
 30 |     for ( const auto &mval : eval_trace )
 31 |       new_range.insert(mval);
 32 | 
 33 |     range.qclear();
 34 |     for ( auto &mval : new_range )
 35 |       range.push_back(mval);
 36 |   }
 37 | 
 38 |   //-------------------------------------------------------------------------
 39 |   // returns a bitfield where the i'th bit indicates whether the i'th evaluation
 40 |   // returns the value of pos
 41 |   uint64 eval_trace_to_bit_trace(const eval_trace_t &src_trace, mcode_val_t pos)
 42 |   {
 43 |     QASSERT(30703, src_trace.size() <= 64);
 44 | 
 45 |     uint64 res = 0;
 46 |     for ( int i = 0; i < src_trace.size(); i++ )
 47 |     {
 48 |       if ( src_trace[i] == pos )
 49 |         res |= (1ull << i);
 50 |     }
 51 | 
 52 |     return res;
 53 |   }
 54 | 
 55 |   //-------------------------------------------------------------------------
 56 |   bool reset_eval_trace()
 57 |   {
 58 |     for ( auto &et : eval_trace )
 59 |       et.val = 0;
 60 |     recompute_coeffs();
 61 |     recompute_range();
 62 |     return true;
 63 |   }
 64 | 
 65 | public:
 66 |   //-------------------------------------------------------------------------
 67 |   simp_lin_conj_expr_t(const lin_conj_expr_t &o) : lin_conj_expr_t(o)
 68 |   {
 69 |     eliminate_variables();
 70 |     recompute_range();
 71 |     simplify();
 72 |   }
 73 | 
 74 |   //-------------------------------------------------------------------------
 75 |   const char *dstr() const override
 76 |   {
 77 |     static char res[MAXSTR];
 78 | 
 79 |     minsn_t *ins = non_conj_term->synthesize(0, coeffs[0].size, mops);
 80 |     qsnprintf(res, sizeof(res), "%s + %s", lin_conj_expr_t::dstr(), ins->dstr());
 81 |     delete ins;
 82 |     return res;
 83 |   }
 84 | 
 85 |   // (1) A constant expression would lead to all variables getting eliminated by eliminate_variables,
 86 |   // so there's no need for a simplification step here.
 87 | 
 88 |   //-------------------------------------------------------------------------
 89 |   // (2) If F has two unique entries and its first entry is zero, we replace the nonzero element a by
 90 |   // 1, find the lookup table's entry for the corresponding truth vector and multiply the found
 91 |   // expression by a.
 92 |   bool simp_2()
 93 |   {
 94 |     if ( range.size() != 2 )
 95 |       return false;
 96 |     if ( eval_trace[0].val != 0 )
 97 |       return false;
 98 | 
 99 |     mcode_val_t a = range[1];
100 | 
101 |     uint64 bit_trace = eval_trace_to_bit_trace(eval_trace, a);
102 |     auto minsn_template = bw_expr_tbl_t::instance.lookup(mops.size(), bit_trace);
103 | 
104 |     non_conj_term = non_conj_term
105 |                   + std::make_shared<mt_constant_t>(a.val) * minsn_template;
106 | 
107 |     return reset_eval_trace();
108 |   }
109 | 
110 |   //-------------------------------------------------------------------------
111 |   // (3) If F has two unique entries a and b, both of them are nonzero, w.l.o.g., b = 2a mod 2^n, and
112 |   // F's first entry is a, we can express the result in terms of a negated single expression. We
113 |   // replace all occurrences of a by zeros and those of b by ones, find the corresponding expression
114 |   // in the lookup table, negate it, and multiply it by -a.
115 |   bool simp_3()
116 |   {
117 |     if ( range.size() != 2 )
118 |       return false;
119 | 
120 |     mcode_val_t a = eval_trace[0];
121 |     mcode_val_t b = range[0] == a ? range[1] : range[0];
122 | 
123 |     if ( a * mcode_val_t(2, b.size) != b )
124 |       return false;
125 | 
126 |     uint64 bit_trace = eval_trace_to_bit_trace(eval_trace, b);
127 |     auto minsn_template = bw_expr_tbl_t::instance.lookup(mops.size(), bit_trace);
128 | 
129 |     non_conj_term = non_conj_term
130 |                   + std::make_shared<mt_constant_t>(-a.val) * ~minsn_template;
131 | 
132 |     return reset_eval_trace();
133 |   }
134 | 
135 |   //-------------------------------------------------------------------------
136 |   // (4) If F has two unique entries a and b, but the previous cases do not apply, and F's very first
137 |   // entry is a, we first identify a as the constant term. Then we find an expression with ones
138 |   // exactly where F has the entry b in the lookup table, multiply it by b - a and add the term to
139 |   // the constant.
140 |   bool simp_4()
141 |   {
142 |     if ( range.size() != 2 )
143 |       return false;
144 | 
145 |     mcode_val_t a = eval_trace[0];
146 |     mcode_val_t b = range[0] == a ? range[1] : range[0];
147 | 
148 |     uint64 bit_trace = eval_trace_to_bit_trace(eval_trace, b);
149 |     auto minsn_template = bw_expr_tbl_t::instance.lookup(mops.size(), bit_trace);
150 | 
151 |     non_conj_term = non_conj_term
152 |                   + std::make_shared<mt_constant_t>(a.val)
153 |                   + std::make_shared<mt_constant_t>((b-a).val) * minsn_template;
154 | 
155 |     return reset_eval_trace();
156 |   }
157 | 
158 |   //-------------------------------------------------------------------------
159 |   // (5) If F has two unique nonzero entries a and b and its first one is zero, we split it into two vectors
160 |   // with ones where F has entries a or b, resp., find the corresponding expressions in the lookup
161 |   // table, multiply them by a and b, resp., and add the terms together.
162 |   bool simp_5()
163 |   {
164 |     if ( range.size() != 3 )
165 |       return false;
166 |     if ( eval_trace[0].val != 0ull )
167 |       return false;
168 | 
169 |     mcode_val_t a = range[1];
170 |     mcode_val_t b = range[2];
171 | 
172 |     uint64 a_bit_trace = eval_trace_to_bit_trace(eval_trace, a);
173 |     uint64 b_bit_trace = eval_trace_to_bit_trace(eval_trace, b);
174 |     auto a_minsn_template = bw_expr_tbl_t::instance.lookup(mops.size(), a_bit_trace);
175 |     auto b_minsn_template = bw_expr_tbl_t::instance.lookup(mops.size(), b_bit_trace);
176 | 
177 |     non_conj_term = non_conj_term
178 |                   + std::make_shared<mt_constant_t>(a.val) * a_minsn_template
179 |                   + std::make_shared<mt_constant_t>(b.val) * b_minsn_template;
180 | 
181 |     return reset_eval_trace();
182 |   }
183 | 
184 |   //-------------------------------------------------------------------------
185 |   // (6) If F has three unique nonzero entries a, b and c and its first one is 0, we try to express one
186 |   // of them as a sum of the others modulo 2^n, e.g., a = b + c. In that case we split F into two
187 |   // vectors with ones where F has entries b or c, resp., or a, find the corresponding expressions in
188 |   // the lookup table, multiply them by b and c, resp., and add the terms together.
189 |   bool simp_6()
190 |   {
191 |     if ( range.size() != 4 )
192 |       return false;
193 |     if ( eval_trace[0].val != 0ull )
194 |       return false;
195 | 
196 |     mcode_val_t a = range[1];
197 |     mcode_val_t b = range[2];
198 |     mcode_val_t c = range[3];
199 | 
200 |     // make sure that a = b + c
201 |     if ( b == a + c )
202 |       qswap(a, b);
203 |     else if ( c == a + b )
204 |       qswap(a, c);
205 |     else if ( a != b + c )
206 |       return false;
207 | 
208 |     QASSERT(30705, a == b + c);
209 | 
210 |     uint64 a_bit_trace = eval_trace_to_bit_trace(eval_trace, a);
211 |     uint64 b_bit_trace = eval_trace_to_bit_trace(eval_trace, b);
212 |     uint64 c_bit_trace = eval_trace_to_bit_trace(eval_trace, c);
213 |     auto ab_minsn_template = bw_expr_tbl_t::instance.lookup(mops.size(), a_bit_trace | b_bit_trace);
214 |     auto ac_minsn_template = bw_expr_tbl_t::instance.lookup(mops.size(), a_bit_trace | c_bit_trace);
215 | 
216 |     non_conj_term = non_conj_term
217 |                   + std::make_shared<mt_constant_t>(b.val) * ab_minsn_template
218 |                   + std::make_shared<mt_constant_t>(c.val) * ac_minsn_template;
219 | 
220 |     return reset_eval_trace();
221 |   }
222 | 
223 |   //-------------------------------------------------------------------------
224 |   // (7) If F has three unique nonzero entries a, b and c, its first one is 0 and the previous case does
225 |   // not apply, we split it into three vectors with ones where F has entries a, b or c, resp., find the
226 |   // corresponding expressions in the lookup table, multiply them by a, b and c, resp., and add the
227 |   // terms together.
228 |   bool simp_7()
229 |   {
230 |     if ( range.size() != 4 )
231 |       return false;
232 |     if ( eval_trace[0].val != 0ull )
233 |       return false;
234 | 
235 |     mcode_val_t a = range[1];
236 |     mcode_val_t b = range[2];
237 |     mcode_val_t c = range[3];
238 | 
239 |     uint64 a_bit_trace = eval_trace_to_bit_trace(eval_trace, a);
240 |     uint64 b_bit_trace = eval_trace_to_bit_trace(eval_trace, b);
241 |     uint64 c_bit_trace = eval_trace_to_bit_trace(eval_trace, c);
242 |     auto a_minsn_template = bw_expr_tbl_t::instance.lookup(mops.size(), a_bit_trace);
243 |     auto b_minsn_template = bw_expr_tbl_t::instance.lookup(mops.size(), b_bit_trace);
244 |     auto c_minsn_template = bw_expr_tbl_t::instance.lookup(mops.size(), c_bit_trace);
245 | 
246 |     non_conj_term = non_conj_term
247 |                   + std::make_shared<mt_constant_t>(a.val) * a_minsn_template
248 |                   + std::make_shared<mt_constant_t>(b.val) * b_minsn_template
249 |                   + std::make_shared<mt_constant_t>(c.val) * c_minsn_template;
250 | 
251 |     return reset_eval_trace();
252 |   }
253 | 
254 |   //-------------------------------------------------------------------------
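255 |   bool simp_8() // (8) F has four unique entries and a nonzero first entry a: move a into the constant term, subtract it from F, and retry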
256 |   {
257 |     if ( range.size() != 4 )
258 |       return false;
259 |     if ( eval_trace[0].val == 0ull )
260 |       return false;
261 | 
262 |     mcode_val_t a = eval_trace[0];
263 | 
264 |     non_conj_term = non_conj_term + std::make_shared<mt_constant_t>(a.val);
265 | 
266 |     for ( int i = 0; i < eval_trace.size(); i++ )
267 |       eval_trace[i] = eval_trace[i] - a;
268 |     recompute_coeffs();
269 |     recompute_range();
270 |     return simplify(); // start again
271 |   }
272 | 
273 |   //-------------------------------------------------------------------------
274 |   bool simplify()
275 |   {
276 |     if ( mops.size() < 1 || mops.size() > 3 )
277 |       return false;
278 |     if ( simp_2() )
279 |       return true;
280 |     if ( simp_3() )
281 |       return true;
282 |     if ( simp_4() )
283 |       return true;
284 |     if ( simp_5() )
285 |       return true;
286 |     if ( simp_6() )
287 |       return true;
288 |     if ( simp_7() )
289 |       return true;
290 |     if ( simp_8() )
291 |       return true;
292 |     return false;
293 |   }
294 | 
295 |   //-------------------------------------------------------------------------
296 |   minsn_t *to_minsn(ea_t ea) const override
297 |   {
298 |     minsn_t *res = new minsn_t(ea);
299 |     minsn_t *l = lin_conj_expr_t::to_minsn(ea);
300 |     minsn_t *r = non_conj_term->synthesize(ea, coeffs[0].size, mops);
301 | 
302 |     res->opcode = m_add;
303 |     res->l.create_from_insn(l);
304 |     res->r.create_from_insn(r);
305 |     res->d.size = coeffs[0].size;
306 | 
307 |     delete l;
308 |     delete r;
309 |     return res;
310 |   }
311 | };
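The two-value cases above (`simp_2` to `simp_4`) all reduce a signature vector F to a constant plus a coefficient times a 0/1 truth vector. The sketch below replays the case (4) arithmetic on plain 64-bit integers, without the IDA SDK. `bit_trace`, `two_value_split_t`, `split_two_values` and `rebuild` are hypothetical names introduced for illustration; the real code works on `mcode_val_t` and passes the truth bits to `bw_expr_tbl_t::instance.lookup`, which this sketch does not model.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Bitfield whose i-th bit is set when the i-th entry of sig equals pos
// (plain-integer analogue of eval_trace_to_bit_trace).
inline uint64_t bit_trace(const std::vector<uint64_t> &sig, uint64_t pos)
{
  assert(sig.size() <= 64); // one bit per evaluation
  uint64_t res = 0;
  for ( size_t i = 0; i < sig.size(); i++ )
    if ( sig[i] == pos )
      res |= 1ull << i;
  return res;
}

// Hypothetical result: sig[i] == constant + coeff * (bit i of truth_bits)
struct two_value_split_t
{
  uint64_t constant = 0;
  uint64_t coeff = 0;
  uint64_t truth_bits = 0;
};

// Case (4)-style split. Fails unless sig holds exactly two distinct values.
inline bool split_two_values(const std::vector<uint64_t> &sig, two_value_split_t *out)
{
  if ( sig.empty() )
    return false;
  uint64_t a = sig[0];   // first entry becomes the constant term
  uint64_t b = a;
  bool have_b = false;
  for ( uint64_t v : sig )
  {
    if ( v == a )
      continue;
    if ( have_b && v != b )
      return false;      // a third distinct value: not a two-value vector
    b = v;
    have_b = true;
  }
  if ( !have_b )
    return false;        // constant vector
  out->constant = a;
  out->coeff = b - a;    // unsigned arithmetic wraps modulo 2^64
  out->truth_bits = bit_trace(sig, b);
  return true;
}

// Rebuild entry i of the original vector from the split.
inline uint64_t rebuild(const two_value_split_t &s, size_t i)
{
  return s.constant + s.coeff * ((s.truth_bits >> i) & 1);
}
```

For F = {5, 9, 9, 5} the split yields constant 5, coefficient 4 and truth bits `0b0110`; the coefficient may wrap around, as in F = {7, 3, 7, 3}, and the reconstruction still holds modulo 2^64.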


--------------------------------------------------------------------------------
/smt_convert.cpp:
--------------------------------------------------------------------------------
  1 | /*
  2 |  *      Copyright (c) 2023 by Hex-Rays, support@hex-rays.com
  3 |  *      ALL RIGHTS RESERVED.
  4 |  *
  5 |  *      gooMBA plugin for Hex-Rays Decompiler.
  6 |  *
  7 |  */
  8 | 
  9 | #include "z3++_no_warn.h"
 10 | #include "smt_convert.hpp"
 11 | 
 12 | //--------------------------------------------------------------------------
 13 | z3::expr z3_converter_t::create_new_z3_var(const mop_t &mop)
 14 | {
 15 |   const char *name = build_new_varname();
 16 |   return context.bv_const(name, mop.size * 8);
 17 | }
 18 | 
 19 | //--------------------------------------------------------------------------
 20 | z3::expr z3_converter_t::var_to_expr(const mop_t &mop)
 21 | {
 22 |   if ( assigned_vars.count(mop) )
 23 |     return assigned_vars.at(mop);
 24 | 
 25 |   // mop has not yet been assigned a z3 var, make one now
 26 |   z3::expr new_var = create_new_z3_var(mop);
 27 |   input_vars.push_back(new_var);
 28 |   assigned_vars.insert( { mop, new_var } );
 29 |   return new_var;
 30 | }
 31 | 
 32 | //--------------------------------------------------------------------------
 33 | z3::expr z3_converter_t::mop_to_expr(const mop_t &mop)
 34 | {
 35 |   switch ( mop.t )
 36 |   {
 37 |     case mop_n: // immediate value
 38 |       {
 39 |         int bytesz = mop.size;
 40 |         uint64_t value = mop.nnn->value;
 41 |         return context.bv_val(value, bytesz * 8); // z3 counts size in bits
 42 |       }
 43 | 
 44 |     case mop_d: // result of another instruction
 45 |       return minsn_to_expr(*mop.d);
 46 | 
 47 |     case mop_r: // register
 48 |     case mop_S: // stack variable
 49 |     case mop_v: // global variable
 50 |       {
 51 |         auto p = assigned_vars.find(mop);
 52 |         if ( p != assigned_vars.end() )
 53 |           return p->second;
 54 | 
 55 |         // mop has not yet been assigned a z3 var, make one now
 56 |         const char *name = build_new_varname();
 57 |         z3::expr new_var = context.bv_const(name, mop.size * 8);
 58 |         input_vars.push_back(new_var);
 59 |         assigned_vars.insert( { mop, new_var } );
 60 |         return new_var;
 61 |       }
 62 |     default:
 63 |       INTERR(30696); // it is better to check this before running z3, when detecting mba
 64 |   }
 65 | }
 66 | 
 67 | //--------------------------------------------------------------------------
 68 | z3::expr z3_converter_t::minsn_to_expr(const minsn_t &insn)
 69 | {
 70 |   switch ( insn.opcode )
 71 |   {
 72 |     case m_ldc: // load constant
 73 |     case m_mov: // move
 74 |       return mop_to_expr(insn.l);
 75 |     case m_neg:
 76 |       return -mop_to_expr(insn.l);
 77 |     case m_lnot:
 78 |       {
 79 |         int bitsz = insn.l.size * 8;
 80 |         z3::expr bool_res = mop_to_expr(insn.l) == context.bv_val(0, bitsz);
 81 |         // !x === (x == 0)
 82 |         return bool_to_bv(bool_res, bitsz);
 83 |       }
 84 |     case m_bnot:
 85 |       return ~mop_to_expr(insn.l);
 86 |     case m_xds: // signed extension
 87 |     case m_xdu: // unsigned (zero) extension
 88 |       {
 89 |         auto e = mop_to_expr(insn.l);
 90 |         int orig_bitsz = e.get_sort().bv_size();
 91 |         int dest_bitsz = insn.d.size * 8;
 92 |         QASSERT(30674, dest_bitsz >= orig_bitsz);
 93 |         if ( insn.opcode == m_xdu )
 94 |           return z3::zext(e, dest_bitsz - orig_bitsz);
 95 |         else
 96 |           return z3::sext(e, dest_bitsz - orig_bitsz);
 97 |       }
 98 |     case m_low:
 99 |       {
100 |         auto dest_bitsz = insn.d.size * 8;
101 |         return mop_to_expr(insn.l).extract(dest_bitsz - 1, 0);
102 |       }
103 |     case m_high:
104 |       {
105 |         auto src_bitsz = insn.l.size * 8;
106 |         auto dest_bitsz = insn.d.size * 8;
107 |         return mop_to_expr(insn.l).extract(src_bitsz - 1, src_bitsz - dest_bitsz);
108 |       }
109 |     case m_add:
110 |       return mop_to_expr(insn.l) + mop_to_expr(insn.r);
111 |     case m_sub:
112 |       return mop_to_expr(insn.l) - mop_to_expr(insn.r);
113 |     case m_mul:
114 |       return mop_to_expr(insn.l) * mop_to_expr(insn.r);
115 |     case m_udiv:
116 |       return z3::udiv(mop_to_expr(insn.l), mop_to_expr(insn.r));
117 |     case m_sdiv:
118 |       return mop_to_expr(insn.l) / mop_to_expr(insn.r);
119 |     case m_umod:
120 |       return z3::urem(mop_to_expr(insn.l), mop_to_expr(insn.r)); // unsigned remainder (bvurem)
121 |     case m_smod:
122 |       return z3::smod(mop_to_expr(insn.l), mop_to_expr(insn.r));
123 |     case m_or:
124 |       return mop_to_expr(insn.l) | mop_to_expr(insn.r);
125 |     case m_and:
126 |       return mop_to_expr(insn.l) & mop_to_expr(insn.r);
127 |     case m_xor:
128 |       return mop_to_expr(insn.l) ^ mop_to_expr(insn.r);
129 |     case m_shl:
130 |       return z3::shl(
131 |         mop_to_expr(insn.l),
132 |         bv_zext_to_len(mop_to_expr(insn.r), insn.l.size * 8));
133 |     case m_shr:
134 |       return z3::lshr(
135 |         mop_to_expr(insn.l),
136 |         bv_zext_to_len(mop_to_expr(insn.r), insn.l.size * 8));
137 |     case m_sar:
138 |       return z3::ashr(
139 |         mop_to_expr(insn.l),
140 |         bv_zext_to_len(mop_to_expr(insn.r), insn.l.size * 8));
141 |     case m_sets: // get sign bit of expression
142 |       return bool_to_bv(mop_to_expr(insn.l) < 0, insn.d.size * 8);
143 |     // TODO: m_seto, m_setp
144 |     case m_setnz:
145 |       return bool_to_bv(mop_to_expr(insn.l) != mop_to_expr(insn.r), insn.d.size * 8);
146 |     case m_setz:
147 |       return bool_to_bv(mop_to_expr(insn.l) == mop_to_expr(insn.r), insn.d.size * 8);
148 |     case m_setae:
149 |       return bool_to_bv(z3::uge(mop_to_expr(insn.l), mop_to_expr(insn.r)), insn.d.size * 8);
150 |     case m_setb:
151 |       return bool_to_bv(z3::ult(mop_to_expr(insn.l), mop_to_expr(insn.r)), insn.d.size * 8);
152 |     case m_seta:
153 |       return bool_to_bv(z3::ugt(mop_to_expr(insn.l), mop_to_expr(insn.r)), insn.d.size * 8);
154 |     case m_setbe:
155 |       return bool_to_bv(z3::ule(mop_to_expr(insn.l), mop_to_expr(insn.r)), insn.d.size * 8);
156 |     case m_setg:
157 |       return bool_to_bv(z3::sgt(mop_to_expr(insn.l), mop_to_expr(insn.r)), insn.d.size * 8);
158 |     case m_setge:
159 |       return bool_to_bv(z3::sge(mop_to_expr(insn.l), mop_to_expr(insn.r)), insn.d.size * 8);
160 |     case m_setl:
161 |       return bool_to_bv(z3::slt(mop_to_expr(insn.l), mop_to_expr(insn.r)), insn.d.size * 8);
162 |     case m_setle:
163 |       return bool_to_bv(z3::sle(mop_to_expr(insn.l), mop_to_expr(insn.r)), insn.d.size * 8);
164 |     default:
165 |       INTERR(30697); // it is better to check this before running z3, when detecting mba
166 |   }
167 | }
168 | 
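The `m_umod` and `m_smod` cases map onto bit-vector remainder operations, and the SMT-LIB bit-vector theory defines three of them (`bvurem`, `bvsrem`, `bvsmod`) whose results differ as soon as an operand is negative. The sketch below emulates the three on 8-bit values with plain C++ so the difference can be checked by hand. The helper names `urem8`, `srem8` and `smod8` are hypothetical, and the divisor is assumed to be nonzero.

```cpp
#include <cstdint>

// Unsigned remainder (bvurem), assuming b != 0.
inline uint8_t urem8(uint8_t a, uint8_t b)
{
  return uint8_t(a % b);
}

// Signed remainder (bvsrem): the result takes the sign of the dividend,
// which matches the behavior of C++ '%' on signed integers.
inline uint8_t srem8(uint8_t a, uint8_t b)
{
  int x = int8_t(a);
  int y = int8_t(b);
  return uint8_t(x % y);
}

// Signed modulo (bvsmod): a nonzero result takes the sign of the divisor.
inline uint8_t smod8(uint8_t a, uint8_t b)
{
  int x = int8_t(a);
  int y = int8_t(b);
  int r = x % y;
  if ( r != 0 && ((r < 0) != (y < 0)) )
    r += y; // shift the remainder to the divisor's side of zero
  return uint8_t(r);
}
```

With a = 0xF9 (-7 as a signed byte) and b = 3, the three helpers return 0, 0xFF (-1) and 2 respectively, so picking the wrong operation silently changes what the solver proves.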


--------------------------------------------------------------------------------
/smt_convert.hpp:
--------------------------------------------------------------------------------
 1 | /*
 2 |  *      Copyright (c) 2023 by Hex-Rays, support@hex-rays.com
 3 |  *      ALL RIGHTS RESERVED.
 4 |  *
 5 |  *      gooMBA plugin for Hex-Rays Decompiler.
 6 |  *
 7 |  */
 8 | 
 9 | #pragma once
10 | #include "z3++_no_warn.h"
11 | #include "mcode_emu.hpp"
12 | 
13 | //-------------------------------------------------------------------------
14 | class z3_converter_t
15 | {
16 |   char namebuf[12];
 17 |   int next_free_varnum = 0; // the next integer we can use to generate a z3 variable name
 18 |   const char *build_new_varname()
 19 |   {
 20 |     qsnprintf(namebuf, sizeof(namebuf), "y%d", next_free_varnum++);
 21 |     return namebuf;
 22 |   }
 23 | 
 24 | public:
 25 |   z3::context context;
 26 |   z3::expr_vector input_vars;
 27 | 
 28 |   // cache mapping each terminal mop to the z3 variable created for it
29 |   std::map<mop_t, z3::expr> assigned_vars;
30 | 
31 |   z3_converter_t() : input_vars(context) { namebuf[0] = '\0'; }
32 |   virtual ~z3_converter_t() {}
33 | 
34 |   // create_new_z3_var is called when var_to_expr fails to find an assigned_var in the cache
35 |   virtual z3::expr create_new_z3_var(const mop_t &mop);
36 |   z3::expr var_to_expr(const mop_t &mop); // for terminal mops, i.e. stack vars, registers, global vars
37 |   z3::expr mop_to_expr(const mop_t &mop);
38 |   z3::expr minsn_to_expr(const minsn_t &insn);
39 | 
40 |   //-------------------------------------------------------------------------
41 |   z3::expr bool_to_bv(z3::expr boolean, uint bitsz)
42 |   {
43 |     return z3::ite(boolean, context.bv_val(1, bitsz), context.bv_val(0, bitsz));
44 |   }
45 | 
46 |   //-------------------------------------------------------------------------
47 |   z3::expr bv_zext_to_len(z3::expr bv, uint target_bitsz)
48 |   {
49 |     uint orig_bitsz = bv.get_sort().bv_size();
50 |     if ( target_bitsz == orig_bitsz )
51 |       return bv; // no need to extend
52 |     return z3::zext(bv, target_bitsz - orig_bitsz);
53 |   }
54 | 
55 |   //-------------------------------------------------------------------------
56 |   z3::expr bv_sext_to_len(z3::expr bv, uint target_bitsz)
57 |   {
58 |     uint orig_bitsz = bv.get_sort().bv_size();
59 |     if ( target_bitsz == orig_bitsz )
60 |       return bv; // no need to extend
61 |     return z3::sext(bv, target_bitsz - orig_bitsz);
62 |   }
63 | 
64 |   //-------------------------------------------------------------------------
65 |   z3::expr bv_resize_to_len(z3::expr bv, uint target_bitsz, bool sext)
66 |   {
67 |     uint orig_bitsz = bv.get_sort().bv_size();
68 |     if ( target_bitsz == orig_bitsz )
69 |       return bv;
70 |     if ( target_bitsz < orig_bitsz )
71 |       return bv.extract(target_bitsz - 1, 0);
72 |     else
73 |       return sext
74 |            ? bv_sext_to_len(bv, target_bitsz)
75 |            : bv_zext_to_len(bv, target_bitsz);
76 |   }
77 | 
78 |   //-------------------------------------------------------------------------
79 |   z3::expr mcode_val_to_expr(mcode_val_t v)
80 |   {
81 |     return context.bv_val(uint64_t(v.val), v.size * 8);
82 |   }
83 | };
84 | 
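`bv_resize_to_len` combines three behaviors: truncation when the target is narrower, and zero or sign extension when it is wider. The sketch below mirrors that decision on plain 64-bit integers so the bit patterns are easy to inspect. `low_mask` and `resize_bits` are hypothetical helpers, widths are given in bits and assumed to lie between 1 and 64, and no z3 types are involved.

```cpp
#include <cstdint>

// Mask with the low `bits` bits set (bits in 1..64).
inline uint64_t low_mask(unsigned bits)
{
  return bits >= 64 ? ~0ull : (1ull << bits) - 1;
}

// Resize a from_bits-wide value to to_bits: truncate when narrowing,
// otherwise zero-extend or sign-extend depending on sext.
inline uint64_t resize_bits(uint64_t v, unsigned from_bits, unsigned to_bits, bool sext)
{
  v &= low_mask(from_bits);                 // ignore anything above the source width
  if ( to_bits <= from_bits )
    return v & low_mask(to_bits);           // same effect as extract(to_bits - 1, 0)
  bool negative = ((v >> (from_bits - 1)) & 1) != 0;
  if ( sext && negative )
    v |= low_mask(to_bits) & ~low_mask(from_bits); // replicate the sign bit upward
  return v;
}
```

For example, widening the byte 0x80 to 16 bits gives 0xFF80 with sign extension and 0x0080 with zero extension, while narrowing 0x1234 to 8 bits keeps only 0x34.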


--------------------------------------------------------------------------------
/tests/idb/mba_challenge.i64:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/HexRaysSA/goomba/bf1e49866f3cbf605b1069f053edd9d126de1372/tests/idb/mba_challenge.i64


--------------------------------------------------------------------------------
/tests/idb/nonlinear.o.i64:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/HexRaysSA/goomba/bf1e49866f3cbf605b1069f053edd9d126de1372/tests/idb/nonlinear.o.i64


--------------------------------------------------------------------------------
/z3++_no_warn.h:
--------------------------------------------------------------------------------
 1 | #pragma once
 2 | // using z3++.h directly leads to compiler warnings about shadowing declarations
 3 | #ifdef __GNUC__
 4 | #  pragma GCC diagnostic push
 5 | #  pragma GCC diagnostic ignored "-Wshadow"
 6 | #endif
 7 | #include <z3++.h>
 8 | #ifdef __GNUC__
 9 | #  pragma GCC diagnostic pop
10 | #endif
11 | 


--------------------------------------------------------------------------------
/z3/readme.txt:
--------------------------------------------------------------------------------
1 | bin and include directories of the z3 build should be extracted here
2 | 


--------------------------------------------------------------------------------