├── 00-llvm ├── README.md ├── hello.c └── run ├── 01-model ├── example.c ├── model-1.yml ├── README.md ├── model-2.yml ├── model-3.yml └── run ├── 02-simple-example ├── README.md ├── run └── source.c ├── 14-python-scripting ├── linked-list ├── README.md ├── run └── script.py ├── 03-stack-o0 ├── README.md ├── source.c └── run ├── 06-klee ├── README.md ├── support.c ├── use-after-free.c ├── Makefile └── run ├── 08-codeql ├── README.md ├── use-after-free.c └── run ├── 07-clang-static-analyzer ├── README.md ├── use-after-free.c └── run ├── 13-indirect-branch-handling ├── README.md ├── run └── switch-jump-table.S ├── 12-obfuscation-cfg-flattening ├── README.md ├── run └── analysis │ └── CFGFlatteningPass.cpp ├── 09-automated-type-recovery ├── README.md ├── linked-lists.c └── run ├── 10-obfuscation-code-mutation-stack-traffic ├── README.md ├── sum42.S ├── extract └── run ├── 11-obfuscation-opaque-branch-conditions ├── README.md ├── add-or-shift.c └── run ├── run-all-examples ├── 04-taint-example ├── README.md ├── run ├── source.c └── analysis │ └── TaintAnalysis.cpp ├── run ├── 05-loop-example ├── README.md ├── source.c ├── run └── analysis │ └── LoopInfo.cpp ├── README.md ├── container ├── common └── Dockerfile /00-llvm/README.md: -------------------------------------------------------------------------------- 1 | This is a simple example to show some LLVM IR emitted by `clang`. 2 | -------------------------------------------------------------------------------- /01-model/example.c: -------------------------------------------------------------------------------- 1 | int main(int argc, char *argv[]) { 2 | return argc * 3; 3 | } 4 | -------------------------------------------------------------------------------- /02-simple-example/README.md: -------------------------------------------------------------------------------- 1 | An example of the IR produced by rev.ng from a simple program. 
2 | -------------------------------------------------------------------------------- /14-python-scripting/linked-list: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/revng/demos/HEAD/14-python-scripting/linked-list -------------------------------------------------------------------------------- /03-stack-o0/README.md: -------------------------------------------------------------------------------- 1 | A simple example showing how rev.ng can optimize away superfluous usage of the stack present in the input program. 2 | -------------------------------------------------------------------------------- /06-klee/README.md: -------------------------------------------------------------------------------- 1 | This example compiles a simple program with a use-after-free bug and uses KLEE to identify it given the IR produced by rev.ng. 2 | 3 | -------------------------------------------------------------------------------- /14-python-scripting/README.md: -------------------------------------------------------------------------------- 1 | This example compiles a simple program using a custom linked-list and uses it via the rev.ng Python scripting APIs. 2 | -------------------------------------------------------------------------------- /08-codeql/README.md: -------------------------------------------------------------------------------- 1 | This example compiles a simple program with a use-after-free bug and uses CodeQL to identify it given the decompiled C produced by rev.ng. 2 | -------------------------------------------------------------------------------- /07-clang-static-analyzer/README.md: -------------------------------------------------------------------------------- 1 | This example compiles a simple program with a use-after-free bug and uses the Clang Static Analyzer to identify it given the decompiled C produced by rev.ng. 
2 | -------------------------------------------------------------------------------- /13-indirect-branch-handling/README.md: -------------------------------------------------------------------------------- 1 | This example compiles a snippet of assembly with an indirect jump employing a jump table and shows the internals of how rev.ng can effectively devirtualize such an indirect jump. 2 | -------------------------------------------------------------------------------- /00-llvm/hello.c: -------------------------------------------------------------------------------- 1 | int add_1_or_2(int add_1, int argument) { 2 | int result; 3 | if (add_1) { 4 | result = argument + 1; 5 | } else { 6 | result = argument + 2; 7 | } 8 | return result; 9 | } 10 | -------------------------------------------------------------------------------- /12-obfuscation-cfg-flattening/README.md: -------------------------------------------------------------------------------- 1 | This example implements a simple CFG flattening on LLVM IR and shows that it can be easily undone by employing off-the-shelf LLVM transformations, specifically the `jump-threading` pass. 2 | -------------------------------------------------------------------------------- /03-stack-o0/source.c: -------------------------------------------------------------------------------- 1 | long myfunction(long value) { 2 | long doubled = value; 3 | doubled = doubled * 2; 4 | return doubled; 5 | } 6 | 7 | int main(int argc, char *argv[]) { 8 | myfunction(argc); 9 | return 0; 10 | } 11 | -------------------------------------------------------------------------------- /09-automated-type-recovery/README.md: -------------------------------------------------------------------------------- 1 | This example compiles a simple program using a custom linked-list and shows how rev.ng can correctly recover such a data structure and emit pretty accesses to the data structure in the decompiled code. 
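
To make "recovering the data structure" concrete, here is a minimal sketch in C. It only assumes the `struct node` layout from `linked-lists.c`; the helper names `load_pointer_at` and `next_of` are hypothetical and are not part of rev.ng or of this repository. The first helper mimics what the code looks like before type recovery (a pointer-sized load at a fixed byte offset), the second what it can look like once the field is known.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DATA_LEN 5

/* Same shape as the node type in linked-lists.c. */
struct node {
  int64_t data[DATA_LEN];
  struct node *next;
};

/* Untyped view (hypothetical helper): read the pointer stored `offset`
   bytes into an object, as a binary does before any type is known. */
struct node *load_pointer_at(const void *base, size_t offset) {
  assert(base != NULL);
  struct node *value;
  memcpy(&value, (const unsigned char *)base + offset, sizeof value);
  return value;
}

/* Typed view (hypothetical helper): the same access once the field
   has been recovered. */
struct node *next_of(const struct node *n) {
  assert(n != NULL);
  return n->next;
}
```

With this layout, `next` sits 40 bytes into the node (5 elements of 8 bytes each), so `load_pointer_at(n, 40)` and `next_of(n)` read the same pointer; the second form is the kind of access the README calls "pretty".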
2 | -------------------------------------------------------------------------------- /10-obfuscation-code-mutation-stack-traffic/README.md: -------------------------------------------------------------------------------- 1 | This example compiles a snippet of assembly with some superfluous stack accesses and some obfuscated computation and shows how the IR produced by rev.ng can be easily deobfuscated with off-the-shelf LLVM optimizations. 2 | -------------------------------------------------------------------------------- /11-obfuscation-opaque-branch-conditions/README.md: -------------------------------------------------------------------------------- 1 | This example compiles a simple program where the condition of an `if` statement is the result of a function call that, however, always produces the same value. 2 | 3 | The example shows how such an obfuscation technique can be easily bypassed by inlining the called function. 4 | -------------------------------------------------------------------------------- /run-all-examples: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | 3 | set -euo pipefail 4 | 5 | cd /demo 6 | 7 | for EXAMPLE in $(find -mindepth 1 -maxdepth 1 -type d -regex '\./[0-9]+-.*' | sort); do 8 | echo "Running example $EXAMPLE" >&2 9 | pushd "$EXAMPLE" >& /dev/null 10 | ./run 11 | popd >& /dev/null 12 | done 13 | -------------------------------------------------------------------------------- /04-taint-example/README.md: -------------------------------------------------------------------------------- 1 | This example compiles a simple C program making an HTTP request and feeds it to an analysis checking if the result of a sensitive API call `get_location` is sent over the network (`send`). 2 | 3 | The goal of this example is to show how to write a simple LLVM analysis on code generated by rev.ng. 
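
As a rough illustration of the idea behind such a check, the sketch below propagates a "tainted" flag through a toy straight-line program. This is not the code in `analysis/TaintAnalysis.cpp`, which works on LLVM IR; the `struct step` encoding, the `leaks_location` function and the `format`/`get_time` callee names are all assumptions made up for this example. Only `get_location` and `send` come from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_VALUES 16

/* One step of a toy program: `dest = callee(src)`.
   Values are numbered; a negative index means "no value". */
struct step {
  const char *callee;
  int dest;
  int src;
};

/* Returns true if a value derived from get_location() reaches send(). */
bool leaks_location(const struct step *steps, size_t count) {
  bool tainted[MAX_VALUES] = { false };

  for (size_t i = 0; i < count; ++i) {
    const struct step *s = &steps[i];
    assert(s->dest < MAX_VALUES && s->src < MAX_VALUES);

    bool src_tainted = s->src >= 0 && tainted[s->src];

    /* Sink: a tainted value is handed to send(). */
    if (strcmp(s->callee, "send") == 0 && src_tainted)
      return true;

    /* Source or propagation: the result is tainted if it comes from
       get_location() or is computed from a tainted operand. */
    if (s->dest >= 0)
      tainted[s->dest] =
        src_tainted || strcmp(s->callee, "get_location") == 0;
  }

  return false;
}
```

A program such as `v0 = get_location(); v1 = format(v0); send(v1)` is reported as leaking, while one that sends a value unrelated to `v0` is not. A real analysis follows def-use chains in the IR instead of a flat list, but the source/propagate/sink structure is the same.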
4 | -------------------------------------------------------------------------------- /run: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | 3 | set -euo pipefail 4 | 5 | if command -v podman > /dev/null; then 6 | DOCKER=podman 7 | elif command -v docker > /dev/null; then 8 | DOCKER=docker 9 | else 10 | echo "Neither podman nor docker are available" >&2 11 | exit 1 12 | fi 13 | 14 | "$DOCKER" build -t revng-demo . 15 | ./container /demo/run-all-examples 16 | -------------------------------------------------------------------------------- /11-obfuscation-opaque-branch-conditions/add-or-shift.c: -------------------------------------------------------------------------------- 1 | long obfuscated_add_42(long value) { 2 | return value + 42; 3 | } 4 | 5 | long add_or_shift(long arg1, long arg2) { 6 | if (obfuscated_add_42(33) > 50) { 7 | return arg1 + arg2; 8 | } else { 9 | return arg1 << arg2; 10 | } 11 | } 12 | 13 | long _start(long a, long b) { 14 | return add_or_shift(a, b); 15 | } 16 | -------------------------------------------------------------------------------- /01-model/model-1.yml: -------------------------------------------------------------------------------- 1 | --- 2 | Architecture: x86_64 3 | DefaultABI: SystemV_x86_64 4 | Segments: 5 | - StartAddress: "0x400000:Generic64" 6 | VirtualSize: 7 7 | StartOffset: 0 8 | FileSize: 7 9 | IsReadable: true 10 | IsWriteable: false 11 | IsExecutable: true 12 | Functions: 13 | - Entry: "0x400000:Code_x86_64" 14 | ... 15 | -------------------------------------------------------------------------------- /05-loop-example/README.md: -------------------------------------------------------------------------------- 1 | This example compiles a simple program with a loop and runs on it an LLVM analysis dumping information about the loop. 
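
The sketch below shows, under stated assumptions, the kind of facts such an analysis reports for the loop in `source.c` (16 iterations, with an `if (I < 8)` inside). LLVM derives them statically, without running the code; this sketch simply executes an equivalent loop and records what it observes. The names `loop_facts`, `widen` and `observe_loop` are hypothetical and do not appear in `analysis/LoopInfo.cpp`.

```c
#include <assert.h>
#include <limits.h>

/* Inclusive range of values observed for the induction variable. */
struct range {
  unsigned min;
  unsigned max;
};

struct loop_facts {
  unsigned trip_count;      /* number of iterations executed */
  struct range then_branch; /* values of I where I < split */
  struct range else_branch; /* values of I where I >= split */
};

/* Extend a range so that it also covers `value`. */
static void widen(struct range *r, unsigned value) {
  if (value < r->min)
    r->min = value;
  if (value > r->max)
    r->max = value;
}

/* Run `for (I = 0; I < bound; ++I) if (I < split) ... else ...`
   and record the trip count and the per-branch ranges of I. */
struct loop_facts observe_loop(unsigned bound, unsigned split) {
  assert(split <= bound);
  struct loop_facts facts = { 0, { UINT_MAX, 0 }, { UINT_MAX, 0 } };

  for (unsigned I = 0; I < bound; ++I) {
    widen(I < split ? &facts.then_branch : &facts.else_branch, I);
    ++facts.trip_count;
  }

  return facts;
}
```

For `observe_loop(16, 8)` the trip count is 16, the first branch sees `I` in 0..7 and the second sees `I` in 8..15.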
2 | 3 | The goal of this example is to show that LLVM is able to 1) identify the loop, 2) predict the number of iterations and 3) provide constraints on the values that the induction variable assumes in different parts of the code (the two branches of the `if` statement). 4 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # rev.ng demos 2 | 3 | This repository contains a collection of rev.ng demos. 4 | 5 | Check out the subdirectories: each one contains a brief description. 6 | 7 | ## How to run all demos 8 | 9 | The demos run inside a container. To run them all: 10 | 11 | ``` 12 | ./run 13 | ``` 14 | 15 | To run an individual demo: 16 | 17 | ``` 18 | podman build -t revng-demo . 19 | ./container 20 | /demo/01-model/run 21 | ``` 22 | -------------------------------------------------------------------------------- /container: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | 3 | set -euo pipefail 4 | 5 | SCRIPT_DIR=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd ) 6 | 7 | if command -v podman > /dev/null; then 8 | DOCKER=podman 9 | elif command -v docker > /dev/null; then 10 | DOCKER=docker 11 | else 12 | echo "Neither podman nor docker are available" >&2 13 | exit 1 14 | fi 15 | 16 | "$DOCKER" run -it --rm -v "$SCRIPT_DIR:/demo" revng-demo "$@" 17 | -------------------------------------------------------------------------------- /03-stack-o0/run: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | 3 | SCRIPT_DIR=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd ) 4 | source "${SCRIPT_DIR}/../common" 5 | 6 | cd "$SCRIPT_DIR" 7 | 8 | "${CLANG}" -fno-stack-protector source.c -o binary 9 | 10 | rm -rf resume/ 11 | revng \ 12 | artifact \ 13 | --enable-remote-debug-info \ 14 | "cleanup-ir" \ 15 | 
--analyze \ 16 | binary \ 17 | --resume resume \ 18 | -o /dev/null \ 19 | --progress 20 | -------------------------------------------------------------------------------- /01-model/README.md: -------------------------------------------------------------------------------- 1 | This example creates a raw binary containing some x86-64 executable code (not an ELF) and loads it into rev.ng. 2 | In order to do this, a model is composed step by step: first with only the minimal information required to emit the `disassemble` artifact, then with a function prototype to be able to emit the `decompile` artifact. 3 | 4 | Finally, the example shows how to exploit rev.ng's analyses to emit decompiled code in a single shot starting from an ELF executable. 5 | -------------------------------------------------------------------------------- /02-simple-example/run: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | 3 | SCRIPT_DIR=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd ) 4 | source "${SCRIPT_DIR}/../common" 5 | 6 | cd "$SCRIPT_DIR" 7 | 8 | "${CLANG}" -O2 -fno-stack-protector source.c -o binary 9 | 10 | rm -rf resume/ 11 | revng \ 12 | artifact \ 13 | --enable-remote-debug-info \ 14 | "cleanup-ir" \ 15 | --analyze \ 16 | binary \ 17 | --resume resume \ 18 | -o /dev/null \ 19 | --progress 20 | -------------------------------------------------------------------------------- /06-klee/support.c: -------------------------------------------------------------------------------- 1 | #include <stdlib.h> 2 | 3 | extern int local_main(int argc, const char ** argv); 4 | 5 | int main(int argc, const char **argv) { 6 | return local_main(argc, argv); 7 | } 8 | 9 | void *dynamic_malloc(size_t size) { 10 | return malloc(size); 11 | } 12 | 13 | void dynamic_free(void *ptr) { 14 | free(ptr); 15 | } 16 | 17 | int dynamic_strtol(const char *nptr, char **endptr, int base) { 18 | return strtol(nptr, endptr, base); 19 | } 20 | 
21 | void rcu_init(void) { } 22 | -------------------------------------------------------------------------------- /06-klee/use-after-free.c: -------------------------------------------------------------------------------- 1 | #include <stdlib.h> 2 | 3 | #define NOINLINE __attribute__((weak)) __attribute__((noinline)) 4 | 5 | NOINLINE 6 | void my_free(void *p) { 7 | free(p); 8 | } 9 | 10 | NOINLINE 11 | int do_stuff(int condition) { 12 | int *p = malloc(sizeof(int)); 13 | if (condition > 4) { 14 | my_free(p); 15 | // programmer forgot to return 0; 16 | } 17 | // potential use after free 18 | *p = 3; 19 | int result = *p; 20 | // potential double free 21 | my_free(p); 22 | return result; 23 | } 24 | 25 | NOINLINE 26 | int main(int argc, char **argv) { 27 | return do_stuff(strtol(argv[1], NULL, 10)); 28 | } 29 | -------------------------------------------------------------------------------- /00-llvm/run: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | 3 | SCRIPT_DIR=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd ) 4 | source "${SCRIPT_DIR}/../common" 5 | 6 | cd "$SCRIPT_DIR" 7 | 8 | "${CLANG}" \ 9 | -Xclang -disable-O0-optnone \ 10 | -S \ 11 | -emit-llvm \ 12 | hello.c \ 13 | -o hello.ll 14 | 15 | echo "" 16 | echo "=======" 17 | echo "Vanilla" 18 | echo "=======" 19 | echo "" 20 | cat hello.ll | extract_llvm_function add_1_or_2 | pretty_llvm 21 | 22 | echo "" 23 | echo "=========" 24 | echo "Optimized" 25 | echo "=========" 26 | echo "" 27 | "${OPT}" -O2 hello.ll -S | extract_llvm_function add_1_or_2 | pretty_llvm 28 | -------------------------------------------------------------------------------- /08-codeql/use-after-free.c: -------------------------------------------------------------------------------- 1 | #include <stdlib.h> 2 | 3 | #define NOINLINE __attribute__((weak)) __attribute__((noinline)) 4 | 5 | NOINLINE 6 | void my_free(void *p) { 7 | free(p); 8 | } 9 | 10 | NOINLINE 11 | int do_stuff(int 
condition) { 12 | 13 | int *p = malloc(sizeof(int)); 14 | 15 | if (condition > 4) { 16 | my_free(p); 17 | // programmer forgot to return 0; 18 | } 19 | 20 | // potential use after free 21 | *p = 3; 22 | int result = *p; 23 | 24 | // potential double free 25 | my_free(p); 26 | 27 | return result; 28 | } 29 | 30 | NOINLINE 31 | int main(int argc, char **argv) { 32 | 33 | return do_stuff(strtol(argv[1], NULL, 10)); 34 | 35 | } 36 | -------------------------------------------------------------------------------- /07-clang-static-analyzer/use-after-free.c: -------------------------------------------------------------------------------- 1 | #include <stdlib.h> 2 | 3 | #define NOINLINE __attribute__((weak)) __attribute__((noinline)) 4 | 5 | NOINLINE 6 | void my_free(void *p) { 7 | free(p); 8 | } 9 | 10 | NOINLINE 11 | int do_stuff(int condition) { 12 | 13 | int *p = malloc(sizeof(int)); 14 | 15 | if (condition > 4) { 16 | my_free(p); 17 | // programmer forgot to return 0; 18 | } 19 | 20 | // potential use after free 21 | *p = 3; 22 | int result = *p; 23 | 24 | // potential double free 25 | my_free(p); 26 | 27 | return result; 28 | } 29 | 30 | NOINLINE 31 | int main(int argc, char **argv) { 32 | 33 | return do_stuff(strtol(argv[1], NULL, 10)); 34 | 35 | } 36 | -------------------------------------------------------------------------------- /14-python-scripting/run: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | 3 | SCRIPT_DIR=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd ) 4 | source "${SCRIPT_DIR}/../common" 5 | 6 | cd "$SCRIPT_DIR" 7 | 8 | # TODO: make this script work out of the container 9 | # TODO: remove the binary in favor of building it from source 10 | 11 | # Make llvm-config available 12 | rm -f llvm-config 13 | ln -s $(which $(which "$LLVM_CONFIG")) llvm-config 14 | export PATH="$PWD:$PATH" 15 | 16 | # Disable fetching debug info 17 | export REVNG_OPTIONS="--debug-info=no" 18 | 19 | 
python3.11 -m venv venv 20 | source venv/bin/activate 21 | pip install revng ipython 22 | 23 | # export OPENAI_API_KEY="..." 24 | 25 | ./script.py 26 | -------------------------------------------------------------------------------- /13-indirect-branch-handling/run: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | 3 | SCRIPT_DIR=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd ) 4 | source "${SCRIPT_DIR}/../common" 5 | 6 | cd "$SCRIPT_DIR" 7 | 8 | BINARY="switch-jump-table" 9 | 10 | # Compile the obfuscated assembly 11 | gcc -no-pie -masm=intel "${BINARY}".S -nostdlib -o "${BINARY}" 12 | 13 | JMP_ADDRESS=$(objdump -Mintel -d "${BINARY}" | grep jmp.*PTR | awk '{ print $1 }' | sed 's|:||') 14 | 15 | # Lift the binary with revng 16 | revng artifact lift -o /dev/null "${BINARY}" --analyze --dump-vm-at="0x$JMP_ADDRESS" |& tee output 17 | 18 | mv "$(grep -o /tmp/dfg-.*.dot output | head -n1)" dfg.dot 19 | mv "$(grep -o /tmp/cfeg-.*.dot output | head -n1)" cfeg.dot 20 | -------------------------------------------------------------------------------- /05-loop-example/source.c: -------------------------------------------------------------------------------- 1 | #include 2 | 3 | #define WEAK __attribute__((weak)) 4 | 5 | char encrypt_byte_1(char value) WEAK; 6 | char encrypt_byte_1(char value) { 7 | return value ^ 0xFF; 8 | } 9 | 10 | char encrypt_byte_2(char value) WEAK; 11 | char encrypt_byte_2(char value) { 12 | return value ^ 0x77; 13 | } 14 | 15 | char encrypt(char *data) WEAK; 16 | char encrypt(char *data) { 17 | char checksum = 0; 18 | 19 | for (unsigned I = 0; I < 16; ++I) { 20 | if (I < 8) { 21 | data[I] = encrypt_byte_1(data[I]); 22 | } else { 23 | data[I] = encrypt_byte_2(data[I]); 24 | } 25 | 26 | checksum += data[I]; 27 | } 28 | 29 | return checksum; 30 | } 31 | 32 | int main(int argc, char *argv[]) { 33 | encrypt(argv[0]); 34 | } 35 | 
-------------------------------------------------------------------------------- /06-klee/Makefile: -------------------------------------------------------------------------------- 1 | use-after-free.bin: use-after-free.c 2 | gcc -O2 -fno-stack-protector -fno-optimize-sibling-calls use-after-free.c -o use-after-free.bin 3 | 4 | use-after-free.ll: use-after-free.bin 5 | revng artifact cleanup-ir --analyze --debug-names --progress use-after-free.bin -o use-after-free.ll 6 | 7 | cfg.dot: use-after-free.ll 8 | opt-16 -passes=dot-cfg use-after-free.ll 9 | 10 | use-after-free.bc: use-after-free.ll 11 | clang-16 -O2 -fno-stack-protector support.c -Xclang -emit-llvm -S -o support.ll 12 | llvm-link-16 use-after-free.ll support.ll -o use-after-free.bc 13 | 14 | run-klee: use-after-free.bc 15 | klee --posix-runtime --libc=klee use-after-free.bc --sym-args 1 1 2 16 | 17 | clean: 18 | -rm -rf klee* *.ll *.bc *.bin .*.dot 19 | 20 | .PHONY: clean run-klee 21 | -------------------------------------------------------------------------------- /13-indirect-branch-handling/switch-jump-table.S: -------------------------------------------------------------------------------- 1 | .intel_syntax noprefix 2 | .type myfunc, @function 3 | .globl myfunc 4 | myfunc: 5 | mov rax, QWORD PTR [rax] 6 | cmp rax, 0x5 7 | ja end 8 | jmp QWORD PTR [rax*8+jumptable] 9 | one: 10 | mov rax, 0x3 11 | ret 12 | two: 13 | mov rax, 0x5 14 | ret 15 | three: 16 | mov rax, 0x6 17 | ret 18 | four: 19 | mov rax, 0x7 20 | ret 21 | five: 22 | mov rax, 0x8 23 | ret 24 | end: 25 | ret 26 | 27 | .type _start, @function 28 | .globl _start 29 | _start: 30 | mov rax, rdi 31 | call myfunc 32 | mov rdi, rax 33 | ret 34 | 35 | .section .rodata 36 | jumptable: 37 | .quad one 38 | .quad two 39 | .quad three 40 | .quad four 41 | .quad five 42 | .quad end 43 | -------------------------------------------------------------------------------- /05-loop-example/run: 
-------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | 3 | SCRIPT_DIR=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd ) 4 | source "${SCRIPT_DIR}/../common" 5 | 6 | cd "$SCRIPT_DIR" 7 | 8 | "${CLANG}" -O2 -fno-stack-protector -fno-unroll-loops source.c -o binary 9 | 10 | rm -rf resume/ 11 | revng \ 12 | artifact \ 13 | --enable-remote-debug-info \ 14 | "cleanup-ir" \ 15 | --analyze \ 16 | binary \ 17 | --resume resume \ 18 | -o /dev/null \ 19 | --progress 20 | 21 | g++ \ 22 | -o "analysis/libLoopInfo.so" \ 23 | -fPIC \ 24 | -shared \ 25 | -O2 \ 26 | analysis/LoopInfo.cpp \ 27 | $("${LLVM_CONFIG}" --cxxflags --ldflags) \ 28 | -g 29 | 30 | zstdcat resume/cleanup-ir/module.bc.zstd | \ 31 | "${OPT}" \ 32 | -load-pass-plugin="analysis/libLoopInfo.so" \ 33 | -passes="LoopInfo" \ 34 | -o /dev/null 35 | -------------------------------------------------------------------------------- /04-taint-example/run: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | 3 | SCRIPT_DIR=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd ) 4 | source "${SCRIPT_DIR}/../common" 5 | 6 | cd "$SCRIPT_DIR" 7 | 8 | "${CLANG}" -O2 -fno-stack-protector -fno-builtin source.c -o binary 9 | 10 | rm -rf resume/ 11 | revng \ 12 | artifact \ 13 | --enable-remote-debug-info \ 14 | "cleanup-ir" \ 15 | --analyze \ 16 | binary \ 17 | --debug-names \ 18 | --resume resume \ 19 | --progress | \ 20 | zstdcat | \ 21 | "${OPT}" -S > module.ll 22 | 23 | g++ \ 24 | -o "analysis/libTaintAnalysis.so" \ 25 | -fPIC \ 26 | -shared \ 27 | -O2 \ 28 | analysis/TaintAnalysis.cpp \ 29 | $("${LLVM_CONFIG}" --cxxflags --ldflags) \ 30 | -g 31 | 32 | "${OPT}" \ 33 | -load-pass-plugin="analysis/libTaintAnalysis.so" \ 34 | -passes="TaintAnalysis" \ 35 | module.ll \ 36 | -o /dev/null 37 | -------------------------------------------------------------------------------- 
/10-obfuscation-code-mutation-stack-traffic/sum42.S: -------------------------------------------------------------------------------- 1 | .intel_syntax noprefix 2 | .type _start, @function 3 | .section .text 4 | .global _start 5 | _start: 6 | push rdi # Push arg1 on stack 7 | mov rdx, QWORD PTR [rsp] # Put arg1 from stack to rdx 8 | movabs rax, 0x8712baf28d22ab23 # Materialize magic mask in rax 9 | xor rdx, rax # Xor arg1 in rdx with magic mask 10 | movabs rax, 0x88712baf28d22ab2 # Materialize magic mask in rax 11 | xor rdx, rax # Xor rdx with magic mask 12 | movabs rax, 0xf63915da5f08191 # Materialize magic mask in rax 13 | xor rdx, rax # Xor rdx with magic mask: the result is again arg1 14 | mov QWORD PTR [rsp], rdx # Put back arg1 on the stack 15 | add QWORD PTR [rsp], 42 # Add 42 to arg1 directly on the stack 16 | pop rax # Pop result into the rax return register 17 | ret # Return 18 | -------------------------------------------------------------------------------- /01-model/model-2.yml: -------------------------------------------------------------------------------- 1 | --- 2 | Architecture: x86_64 3 | DefaultABI: SystemV_x86_64 4 | Segments: 5 | - StartAddress: "0x400000:Generic64" 6 | VirtualSize: 7 7 | StartOffset: 0 8 | FileSize: 7 9 | IsReadable: true 10 | IsWriteable: false 11 | IsExecutable: true 12 | Functions: 13 | - Entry: "0x400000:Code_x86_64" 14 | Prototype: 15 | Kind: DefinedType 16 | Definition: "/TypeDefinitions/1-CABIFunctionDefinition" 17 | TypeDefinitions: 18 | - Kind: CABIFunctionDefinition 19 | ABI: SystemV_x86_64 20 | ID: 1 21 | Arguments: 22 | - Index: 0 23 | Type: 24 | Kind: PrimitiveType 25 | PrimitiveKind: Unsigned 26 | Size: 8 27 | - Index: 1 28 | Type: 29 | Kind: PrimitiveType 30 | PrimitiveKind: Unsigned 31 | Size: 8 32 | ReturnType: 33 | Kind: PrimitiveType 34 | PrimitiveKind: Unsigned 35 | Size: 8 36 | -------------------------------------------------------------------------------- 
/10-obfuscation-code-mutation-stack-traffic/extract: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | 3 | SCRIPT_DIR=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd ) 4 | source "${SCRIPT_DIR}/../common" 5 | 6 | FUNCTION="$1" 7 | RESUME="$2" 8 | 9 | function extract() { 10 | STEP="$1" 11 | FUNC="$2" 12 | zstdcat "${RESUME}"/"${STEP}"/module.bc.zstd | \ 13 | # Make very minimal cleanup on it with off-the-shelf LLVM passes 14 | "${OPT}" --enable-new-pm=false -dse | \ 15 | # Extract the function we want 16 | extract_llvm_function "$FUNC" | \ 17 | # Remove noisy stuff to make the output more terse for demo purpose 18 | # all the cleanup below here can be skipped 19 | pretty_llvm | \ 20 | grep -v newFuncRoot: 21 | } 22 | 23 | extract lift root > 1-lift.ll 24 | extract isolate "${FUNCTION}" > 2-isolate.ll 25 | extract enforce-abi "${FUNCTION}" > 3-enforce-abi.ll 26 | extract segregate-stack-accesses "${FUNCTION}" > 4-segregate-stack-accesses.ll 27 | extract cleanup-ir "${FUNCTION}" > 5-cleanup-ir.ll 28 | -------------------------------------------------------------------------------- /09-automated-type-recovery/linked-lists.c: -------------------------------------------------------------------------------- 1 | // 2 | // This file is distributed under the MIT License. See LICENSE.md for details. 
3 | // 4 | 5 | #include 6 | #include 7 | #include 8 | 9 | #define WEAK __attribute__((weak)) 10 | 11 | #define DATA_LEN 5 12 | 13 | struct node { 14 | int64_t data[DATA_LEN]; 15 | struct node *next; 16 | }; 17 | 18 | WEAK struct node *init_list() { 19 | return NULL; 20 | } 21 | 22 | WEAK void release_list(struct node *n) { 23 | free(n); 24 | } 25 | 26 | WEAK int64_t sum(struct node *n) { 27 | int64_t result = 0; 28 | for (int i = 0; i < DATA_LEN; ++i) 29 | result += n->data[i]; 30 | return result; 31 | } 32 | 33 | WEAK int64_t compute(struct node *n __attribute__((nonnull))) { 34 | int64_t result = 0; 35 | while (n) { 36 | result += sum(n); 37 | n = n->next; 38 | } 39 | return result; 40 | } 41 | 42 | int main() { 43 | struct node *head = init_list(); 44 | int64_t result = compute(head); 45 | release_list(head); 46 | return result != 42; 47 | } 48 | -------------------------------------------------------------------------------- /07-clang-static-analyzer/run: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | 3 | SCRIPT_DIR=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd ) 4 | source "${SCRIPT_DIR}/../common" 5 | 6 | cd "$SCRIPT_DIR" 7 | 8 | # Compile to binary, with optimizations, but disabling stack protector and 9 | # optimizations around tail calls, to ease source-to-binary matching for demo 10 | gcc -O2 -fno-stack-protector -fno-optimize-sibling-calls use-after-free.c -o use-after-free 11 | 12 | # Lift with revng, generate all the decompiled C code as a single tar.gz, and unpack it into a single directory 13 | revng artifact --debug-names emit-recompilable-archive --analyze use-after-free -o - | tar xfz - 14 | 15 | # Invoke clang-static-analyzer on the decompiled C code 16 | "${CLANG}" \ 17 | -Wno-int-conversion \ 18 | -o /dev/null \ 19 | -c \ 20 | decompiled/functions.c \ 21 | --analyze \ 22 | -ffreestanding \ 23 | -Xanalyzer -analyzer-disable-checker \ 24 | -Xanalyzer deadcode \ 25 | 
-Xanalyzer -analyzer-output=html \ 26 | -o clang-static-analyzer-results 27 | -------------------------------------------------------------------------------- /01-model/model-3.yml: -------------------------------------------------------------------------------- 1 | --- 2 | Architecture: x86_64 3 | DefaultABI: SystemV_x86_64 4 | Segments: 5 | - StartAddress: "0x400000:Generic64" 6 | VirtualSize: 7 7 | StartOffset: 0 8 | FileSize: 7 9 | IsReadable: true 10 | IsWriteable: false 11 | IsExecutable: true 12 | Functions: 13 | - Entry: "0x400000:Code_x86_64" 14 | Name: sum 15 | Prototype: 16 | Kind: DefinedType 17 | Definition: "/TypeDefinitions/1-CABIFunctionDefinition" 18 | TypeDefinitions: 19 | - Kind: CABIFunctionDefinition 20 | ABI: SystemV_x86_64 21 | ID: 1 22 | Arguments: 23 | - Index: 0 24 | Name: first_addend 25 | Type: 26 | Kind: PrimitiveType 27 | PrimitiveKind: Unsigned 28 | Size: 8 29 | - Index: 1 30 | Name: second_addend 31 | Type: 32 | Kind: PrimitiveType 33 | PrimitiveKind: Unsigned 34 | Size: 8 35 | ReturnType: 36 | Kind: PrimitiveType 37 | PrimitiveKind: Unsigned 38 | Size: 8 39 | -------------------------------------------------------------------------------- /08-codeql/run: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | 3 | SCRIPT_DIR=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd ) 4 | source "${SCRIPT_DIR}/../common" 5 | 6 | cd "$SCRIPT_DIR" 7 | 8 | # Compile to binary, with optimizations, but disabling stack protector and 9 | # optimizations around tail calls, to ease source-to-binary matching for demo 10 | gcc -O2 -fno-stack-protector -fno-optimize-sibling-calls use-after-free.c -o use-after-free.bin 11 | 12 | # Lift with revng, generate all the decompiled C code as a single tar.gz, and unpack it into a single directory 13 | revng artifact --debug-names emit-recompilable-archive --analyze use-after-free.bin -o - | tar xfz - 14 | 15 | # CodeQL can't see across functions by 
default. Unwrap calls to my_free. 16 | sed 's/ my_free/ free/g' decompiled/functions.c > decompiled/functions.normalized.c 17 | 18 | # Invoke codeql on the decompiled C code 19 | codeql database create --overwrite codeql.db --language=cpp --command='gcc -c decompiled/functions.normalized.c -o /dev/null' 20 | codeql database analyze codeql.db --download codeql/cpp-queries --verbose --format=csv --output=codeql.output.csv --threads=16 --rerun 21 | -------------------------------------------------------------------------------- /10-obfuscation-code-mutation-stack-traffic/run: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | 3 | SCRIPT_DIR=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd ) 4 | source "${SCRIPT_DIR}/../common" 5 | 6 | cd "$SCRIPT_DIR" 7 | 8 | RESUME_DIRECTORY="resume-directory" 9 | BINARY="sum42" 10 | 11 | # Compile the obfuscated assembly 12 | gcc -no-pie -masm=intel "${BINARY}".S -nostdlib -o "${BINARY}" 13 | 14 | # Lift the binary with revng 15 | rm -rf "${RESUME_DIRECTORY}" 16 | revng artifact cleanup-ir --resume="${RESUME_DIRECTORY}" -o /dev/null "${BINARY}" --analyses=revng-initial-auto-analysis,revng-c-initial-auto-analysis 17 | 18 | # Extract LLVM IR at different stages of the decompilation pipeline, to show that 19 | # the dead code and bogus stack traffic are gradually optimized away by the default 20 | # revng decompilation pipeline. 
21 | ./extract "local_0x$(address_of "${BINARY}" _start):Code_x86_64" "${RESUME_DIRECTORY}" 22 | 23 | echo "" 24 | echo "==========" 25 | echo "Obfuscated" 26 | echo "==========" 27 | echo "" 28 | cat 3-enforce-abi.ll 29 | 30 | echo "" 31 | echo "============" 32 | echo "Deobfuscated" 33 | echo "============" 34 | echo "" 35 | cat 5-cleanup-ir.ll 36 | -------------------------------------------------------------------------------- /06-klee/run: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | 3 | SCRIPT_DIR=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd ) 4 | source "${SCRIPT_DIR}/../common" 5 | 6 | cd "$SCRIPT_DIR" 7 | 8 | # Compile to binary, with optimizations, but disabling stack protector and 9 | # optimizations around tail calls, to ease source-to-binary matching for demo 10 | gcc -O2 -fno-stack-protector -fno-optimize-sibling-calls use-after-free.c -o use-after-free.bin 11 | 12 | # Lift with revng and generate clean IR 13 | revng artifact cleanup-ir --analyze --debug-names use-after-free.bin | zstdcat > use-after-free.ll 14 | 15 | # Dump the CFG for visualization 16 | "${OPT}" -passes=dot-cfg use-after-free.ll 17 | # Can be visualized with xdot .local_do_stuff.dot 18 | 19 | # Compile and link with support library, used to show klee that calls to 20 | # dynamic_* in the lifted LLVM IR are actually just wrappers around libc. 21 | "${CLANG}" -O2 -fno-stack-protector support.c -Xclang -emit-llvm -S -o support.ll 22 | "${LLVM_LINK}" use-after-free.ll support.ll -o use-after-free.bc 23 | 24 | # Invoke klee, instructing it that we're running in a POSIX environment, telling 25 | # it to use its own model of the libc, and to consider the CLI arguments as 26 | # symbolic, always passing at least 1, and at most 1, with exactly 2 bytes of 27 | # length.
28 | klee --posix-runtime --libc=klee use-after-free.bc --sym-args 1 1 2 29 | -------------------------------------------------------------------------------- /01-model/run: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | 3 | SCRIPT_DIR=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd ) 4 | source "${SCRIPT_DIR}/../common" 5 | 6 | cd "$SCRIPT_DIR" 7 | 8 | # An empty model is a valid model 9 | revng model opt /dev/null -verify -Y 10 | 11 | # Create a simple raw binary containing x86-64 code 12 | printf '\x48\x01\xf7\x48\x89\xf8\xc3\x90' > sum 13 | 14 | # Disassemble the program above 15 | objdump -D -Mintel,x86-64 -b binary -m i386:x86-64 sum 16 | 17 | # Disassemble using rev.ng 18 | revng artifact disassemble sum --model model-1.yml | revng ptml --color 19 | 20 | # Show differences between model versions 21 | git diff --no-index model-1.yml model-2.yml | cat || true 22 | 23 | # Decompile the program 24 | revng artifact decompile sum --model model-2.yml | revng ptml --color 25 | 26 | # Show differences between model versions 27 | git diff --no-index model-2.yml model-3.yml | cat || true 28 | 29 | # Decompile again 30 | revng artifact decompile sum --model model-3.yml | revng ptml --color 31 | 32 | # Create a simple C program 33 | cat > example.c < /dev/null && pwd ) 4 | source "${SCRIPT_DIR}/../common" 5 | 6 | cd "$SCRIPT_DIR" 7 | 8 | # Compile to binary, with optimizations, but disabling stack protector and 9 | # optimizations around tail calls, to ease source-to-binary matching for demo. 10 | # Also disable loop unrolling to enable showing array recovery. 
11 | "${CLANG}" -O2 -Wno-pointer-bool-conversion -fno-stack-protector -fno-optimize-sibling-calls -fno-unroll-loops linked-lists.c -o linked-lists.bin 12 | 13 | # Lift with revng generate all decompiled C code in a single tar.gz, and unpack it in a single directory 14 | # This DOES NOT include type recovery 15 | revng analyze revng-initial-auto-analysis --resume=resume-without-type-recovery -o /dev/null linked-lists.bin 16 | revng analyze detect-stack-size --resume=resume-without-type-recovery -o /dev/null linked-lists.bin 17 | revng analyze convert-functions-to-cabi --resume=resume-without-type-recovery -o /dev/null linked-lists.bin 18 | mkdir -p decompiled-without-type-recovery 19 | revng artifact emit-recompilable-archive --resume=resume-without-type-recovery linked-lists.bin -o - | tar xfz - -C decompiled-without-type-recovery 20 | 21 | # Lift with revng generate all decompiled C code in a single tar.gz, and unpack it in a single directory 22 | # This includes type recovery 23 | mkdir -p decompiled-with-type-recovery 24 | revng artifact emit-recompilable-archive --analyze linked-lists.bin -o - | tar xfz - -C decompiled-with-type-recovery 25 | -------------------------------------------------------------------------------- /12-obfuscation-cfg-flattening/run: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | 3 | SCRIPT_DIR=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd ) 4 | source "${SCRIPT_DIR}/../common" 5 | 6 | cd "$SCRIPT_DIR" 7 | 8 | RESUME_DIRECTORY="resume-directory" 9 | BINARY="calc-x86-64-static" 10 | 11 | # Get the example binary 12 | rm -rf "${BINARY}" 13 | wget https://rev.ng/downloads/"${BINARY}" 14 | 15 | # Lift the binary with revng 16 | rm -rf "${RESUME_DIRECTORY}" 17 | revng artifact cleanup-ir --resume="${RESUME_DIRECTORY}" -o /dev/null "${BINARY}" --analyses=revng-initial-auto-analysis,revng-c-initial-auto-analysis 18 | 19 | # Build the CFG flattening pass 20 | g++ 
\ 21 | -o "analysis/libCFGFlatteningPass.so" \ 22 | -fPIC \ 23 | -shared \ 24 | -O2 \ 25 | analysis/CFGFlatteningPass.cpp \ 26 | $("${LLVM_CONFIG}" --cxxflags --ldflags) \ 27 | -g 28 | 29 | # Extract only the function we want to look at for the demo. 30 | # For a simpler one use --func="local_0x404050:Code_x86_64" 31 | # For a larger one use --func="local_0x401fb3:Code_x86_64" 32 | zstdcat "${RESUME_DIRECTORY}"/cleanup-ir/module.bc.zstd | \ 33 | "${LLVM_EXTRACT}" --func="local_0x401185:Code_x86_64" | \ 34 | "${OPT}" -passes=strip -S -o original.ll 35 | 36 | "${OPT}" original.ll \ 37 | -load="analysis/libCFGFlatteningPass.so" \ 38 | -enable-new-pm=false --flatten-cfg \ 39 | -S -o flattened.ll 40 | 41 | "${OPT}" flattened.ll \ 42 | --enable-new-pm=false \ 43 | --jump-threading \ 44 | --jump-threading-across-loop-headers \ 45 | -o unflattened.ll 46 | -------------------------------------------------------------------------------- /02-simple-example/source.c: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | 4 | #define WEAK __attribute__((weak)) 5 | 6 | uint64_t a(uint64_t arg1, 7 | uint64_t arg2, 8 | uint64_t arg3, 9 | uint64_t arg4, 10 | uint64_t arg5, 11 | uint64_t arg6, 12 | uint64_t arg7, 13 | uint64_t arg8) WEAK; 14 | 15 | uint64_t b(uint64_t arg1, 16 | uint64_t arg2, 17 | uint64_t arg3, 18 | uint64_t arg4, 19 | uint64_t arg5, 20 | uint64_t arg6, 21 | uint64_t arg7, 22 | uint64_t arg8) WEAK; 23 | 24 | typedef struct { 25 | uint64_t a; 26 | uint64_t b; 27 | } TwoInts; 28 | 29 | TwoInts *global_pointer = NULL; 30 | 31 | uint64_t a(uint64_t arg1, 32 | uint64_t arg2, 33 | uint64_t arg3, 34 | uint64_t arg4, 35 | uint64_t arg5, 36 | uint64_t arg6, 37 | uint64_t arg7, 38 | uint64_t arg8) { 39 | TwoInts integers; 40 | integers.a = arg1 + 1; 41 | integers.b = arg2 + 1; 42 | global_pointer = &integers; 43 | 44 | return b(arg1 + 1, 45 | arg2 + 2, 46 | arg3 + 3, 47 | arg4 + 4, 48 | arg5 + 5, 49 | arg6 + 6, 50 | arg7 + 7, 
51 | arg8 + 8) + 1; 52 | } 53 | 54 | uint64_t b(uint64_t arg1, 55 | uint64_t arg2, 56 | uint64_t arg3, 57 | uint64_t arg4, 58 | uint64_t arg5, 59 | uint64_t arg6, 60 | uint64_t arg7, 61 | uint64_t arg8) { 62 | return arg1 * arg2 * arg3 * arg4 * arg5 * arg6 * arg7 * arg8; 63 | } 64 | 65 | int main() { 66 | return 0; 67 | } 68 | -------------------------------------------------------------------------------- /04-taint-example/source.c: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | #include 4 | #include 5 | #include 6 | #include 7 | #include 8 | 9 | char* get_location() __attribute__((weak)); 10 | 11 | char* get_location() { 12 | return "latitude=123.456&longitude=456.789"; 13 | } 14 | 15 | void make_http_request(const char* location) { 16 | const char* host = "example.com"; 17 | const int port = 80; 18 | const char* request_line = "GET /endpoint?"; 19 | const char* http_version = " HTTP/1.1\r\nHost: "; 20 | const char* header_end = "\r\n\r\n"; 21 | 22 | size_t request_len = strlen(request_line) + strlen(location) + strlen(http_version) + strlen(host) + strlen(header_end) + 1; 23 | char request[request_len]; 24 | 25 | strcpy(request, request_line); 26 | strcat(request, location); 27 | strcat(request, http_version); 28 | strcat(request, host); 29 | strcat(request, header_end); 30 | 31 | int sockfd = socket(AF_INET, SOCK_STREAM, 0); 32 | struct hostent* server = gethostbyname(host); 33 | struct sockaddr_in server_addr; 34 | bzero((char*)&server_addr, sizeof(server_addr)); 35 | server_addr.sin_family = AF_INET; 36 | bcopy((char*)server->h_addr, (char*)&server_addr.sin_addr.s_addr, server->h_length); 37 | server_addr.sin_port = htons(port); 38 | connect(sockfd, (struct sockaddr*)&server_addr, sizeof(server_addr)); 39 | send(sockfd, request, strlen(request), 0); 40 | 41 | char response[1024]; 42 | int n; 43 | while ((n = recv(sockfd, response, sizeof(response) - 1, 0)) > 0) { 44 | response[n] = '\0'; 45 | 
printf("%s", response); 46 | } 47 | 48 | close(sockfd); 49 | } 50 | 51 | int main() { 52 | char* location = get_location(); 53 | make_http_request(location); 54 | return 0; 55 | } 56 | -------------------------------------------------------------------------------- /Dockerfile: -------------------------------------------------------------------------------- 1 | FROM docker.io/library/ubuntu:22.04 2 | 3 | ARG REVNG_BRANCH=master 4 | 5 | ENV DEBIAN_FRONTEND=noninteractive 6 | RUN apt-get update -y && \ 7 | apt-get upgrade -y && \ 8 | apt-get install -y --no-install-recommends \ 9 | bash \ 10 | build-essential \ 11 | ca-certificates \ 12 | cmake \ 13 | curl \ 14 | git \ 15 | gnupg2 \ 16 | less \ 17 | libcurl4-openssl-dev \ 18 | libgoogle-perftools-dev \ 19 | libsqlite3-dev \ 20 | libz3-dev \ 21 | nano \ 22 | ninja-build \ 23 | python3-pip \ 24 | python3.11 \ 25 | python3.11-venv \ 26 | wget \ 27 | zlib1g-dev \ 28 | zstd \ 29 | && \ 30 | python3 -m pip install --user lit && \ 31 | wget -O - https://apt.llvm.org/llvm-snapshot.gpg.key | apt-key add - && \ 32 | echo 'deb http://apt.llvm.org/jammy/ llvm-toolchain-jammy-16 main' >> /etc/apt/sources.list && \ 33 | apt-get update -y && \ 34 | apt-get install -y --no-install-recommends clang-16 llvm-16 llvm-16-dev && \ 35 | rm -rf /root/.cache /var/lib/apt/lists/* /var/log/* /var/cache/* 36 | 37 | ENV PATH="${PATH}:/root/.local/bin" 38 | 39 | # Build and install klee 40 | RUN mkdir /klee && \ 41 | cd /klee && \ 42 | git clone https://github.com/klee/klee && \ 43 | cd klee && \ 44 | git checkout v3.1 && \ 45 | mkdir build && \ 46 | cd build && \ 47 | cmake \ 48 | ../ \ 49 | -GNinja \ 50 | -DCMAKE_BUILD_TYPE=Release \ 51 | -DCMAKE_INSTALL_PREFIX=/klee \ 52 | -DENABLE_KLEE_ASSERTS=OFF \ 53 | -DENABLE_SOLVER_Z3=On \ 54 | -DENABLE_POSIX_RUNTIME=ON \ 55 | -Wno-dev && \ 56 | ninja install && \ 57 | rm -rf /klee/klee 58 | 59 | ENV PATH="${PATH}:/klee/bin" 60 | 61 | # Install rev.ng 62 | RUN cd / && curl -L -s 
'https://rev.ng/downloads/revng-distributable/'"$REVNG_BRANCH"'/install.sh' | bash 63 | ENV PATH="${PATH}:/revng" 64 | -------------------------------------------------------------------------------- /11-obfuscation-opaque-branch-conditions/run: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | 3 | SCRIPT_DIR=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd ) 4 | source "${SCRIPT_DIR}/../common" 5 | 6 | cd "$SCRIPT_DIR" 7 | 8 | RESUME_DIRECTORY="resume-directory" 9 | BINARY="add-or-shift" 10 | 11 | # Produce assembly 12 | gcc -masm=intel -O1 -fno-inline -fno-asynchronous-unwind-tables -fno-exceptions -no-pie "${BINARY}".c -nostdlib -S -o "${BINARY}".S 13 | 14 | # Compile 15 | gcc -O1 -fno-inline -no-pie "${BINARY}".c -nostdlib -o "${BINARY}" 16 | FUNCTION="local_0x$(address_of "${BINARY}" add_or_shift):Code_x86_64" 17 | 18 | # Lift the binary with revng 19 | rm -rf "${RESUME_DIRECTORY}" 20 | revng artifact cleanup-ir --resume="${RESUME_DIRECTORY}" -o /dev/null "${BINARY}" --analyses=revng-initial-auto-analysis,revng-c-initial-auto-analysis 21 | 22 | function extract_the_function() { 23 | # extract the function we want 24 | extract_llvm_function "${FUNCTION}" | \ 25 | # remove noisy stuff to make the output more terse for demo purpose 26 | # all the cleanup below here can be skipped 27 | pretty_llvm | \ 28 | grep -v newFuncRoot: 29 | } 30 | 31 | # Extract the cleaned up LLVM IR. 32 | # At this stage the obfuscation is still present 33 | zstdcat "${RESUME_DIRECTORY}"/cleanup-ir/module.bc.zstd | \ 34 | extract_the_function > obfuscated.ll 35 | 36 | # Apply -O2 optimization pipeline. 37 | # This triggers inlining, constant propagation, and CFG simplification to remove 38 | # dead branches, resulting in deobfuscated code. 
39 | zstdcat "${RESUME_DIRECTORY}"/cleanup-ir/module.bc.zstd | \ 40 | # run O2 optimization pipeline, to deobfuscate the branch 41 | "${OPT}" -O2 | \ 42 | extract_the_function > deobfuscated.ll 43 | 44 | echo "" 45 | echo "==========" 46 | echo "Obfuscated" 47 | echo "==========" 48 | echo "" 49 | cat obfuscated.ll 50 | 51 | echo "" 52 | echo "==========" 53 | echo "Deobfuscated" 54 | echo "==========" 55 | echo "" 56 | cat deobfuscated.ll 57 | -------------------------------------------------------------------------------- /14-python-scripting/script.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3.11 2 | 3 | import os 4 | import sys 5 | 6 | from revng.model import MetaAddress 7 | from revng.model.metaaddress import MetaAddressType 8 | from revng.project import CLIProject 9 | 10 | from IPython import embed 11 | 12 | def print_functions(ast): 13 | from pycparser import c_ast 14 | class FuncDefVisitor(c_ast.NodeVisitor): 15 | def visit_FuncDef(self, node): 16 | print(node.decl.name) 17 | 18 | visitor = FuncDefVisitor() 19 | visitor.visit(ast) 20 | 21 | def main(): 22 | project = CLIProject() 23 | project.import_and_analyze("linked-list") 24 | 25 | # 26 | # Binary-wise actions 27 | # 28 | print(project.model.serialize()) 29 | 30 | project.model.get_artifact("decompile-to-single-file").print() 31 | 32 | # 33 | # Function-wise actions 34 | # 35 | address = MetaAddress(0x400930, Type=MetaAddressType.Code_x86_64) 36 | function = project.model.Functions[address] 37 | 38 | # Disassemble 39 | function.get_artifact("disassemble").print() 40 | 41 | # Decompile 42 | function.get_artifact("decompile").print() 43 | 44 | # LLVM IR 45 | module = function.get_artifact("cleanup-ir").module() 46 | for f in module.iter_functions(): 47 | if f.is_declaration(): 48 | continue 49 | print(f"Function {f.name.decode('utf8')}:") 50 | for bb in f.iter_basic_blocks(): 51 | print(f" Basic block {bb.name.decode('utf8')}") 52 | for 
instruction in bb.iter_instructions(): 53 | instruction.dump() 54 | print() 55 | 56 | # Make changes 57 | function.Name = "process_array" 58 | prototype = function.Prototype.Definition.get() 59 | argument = prototype.Arguments[0] 60 | argument.Name = "list_head" 61 | argument.Comment = "The head of the linked list" 62 | project.model.commit() 63 | 64 | function.get_artifact("decompile").print() 65 | 66 | # Delete all other functions 67 | project.model.Functions = [function] 68 | project.model.commit() 69 | project.model.get_artifact("decompile-to-single-file").print() 70 | 71 | # Parse C 72 | recompilable_archive = project.model.get_artifact("emit-recompilable-archive") 73 | print_functions(recompilable_archive.parse()) 74 | 75 | # Use LLM 76 | if "OPENAI_API_KEY" in os.environ: 77 | project.model.analyze("llm-rename") 78 | function.get_artifact("decompile").print() 79 | 80 | 81 | if __name__ == "__main__": 82 | sys.exit(main()) 83 | -------------------------------------------------------------------------------- /05-loop-example/analysis/LoopInfo.cpp: -------------------------------------------------------------------------------- 1 | #include <optional> 2 | 3 | #include "llvm/Analysis/AssumptionCache.h" 4 | #include "llvm/ADT/ArrayRef.h" 5 | #include "llvm/IR/PassManager.h" 6 | #include "llvm/Passes/PassBuilder.h" 7 | #include "llvm/Passes/PassPlugin.h" 8 | #include "llvm/Analysis/LazyValueInfo.h" 9 | 10 | using namespace llvm; 11 | 12 | class LoopInfoPass : public PassInfoMixin<LoopInfoPass> { 13 | public: 14 | PreservedAnalyses run(Loop &L, LoopAnalysisManager &AM, 15 | LoopStandardAnalysisResults &AR, LPMUpdater &U) { 16 | auto &FAM = AM.getResult<FunctionAnalysisManagerLoopProxy>(L, AR); 17 | AssumptionCache AC(*L.getHeader()->getParent()); 18 | LazyValueInfo LVI(&AC, &L.getHeader()->getParent()->getParent()->getDataLayout(), nullptr); 19 | 20 | auto *Phi = L.getInductionVariable(AR.SE); 21 | const SCEV *IVSCEV = nullptr; 22 | std::optional<ConstantRange> IVRange; 23 | if (Phi != nullptr) { 24 | IVSCEV = AR.SE.getSCEV(Phi); 25 | 
IVRange = AR.SE.getUnsignedRange(AR.SE.getSCEV(Phi)); 26 | } 27 | 28 | dbgs() << "Found a loop!\n"; 29 | 30 | if (const SCEV *BTC = AR.SE.getBackedgeTakenCount(&L)) { 31 | dbgs() << " BackedgeTakenCount: "; 32 | BTC->dump(); 33 | } 34 | 35 | dbgs() << " Depth: " << L.getLoopDepth() << "\n"; 36 | 37 | dbgs() << "\n"; 38 | 39 | dbgs() << " The loop is composed by:\n"; 40 | SmallVector<BasicBlock *> ExitingBlocks; 41 | L.getExitingBlocks(ExitingBlocks); 42 | for (BasicBlock *BB : L.blocks()) { 43 | dbgs() << " " << BB->getName().str(); 44 | 45 | if (BB == L.getHeader()) 46 | dbgs() << " (header)"; 47 | 48 | if (llvm::find(ExitingBlocks, BB) != ExitingBlocks.end()) 49 | dbgs() << " (exiting)"; 50 | 51 | if (Phi != nullptr) { 52 | dbgs() << " (i.v. range: " << LVI.getConstantRange(Phi, BB->getTerminator()).intersectWith(*IVRange) << ")"; 53 | } 54 | 55 | dbgs() << "\n"; 56 | } 57 | dbgs() << "\n"; 58 | 59 | SmallVector<BasicBlock *> ExitBlocks; 60 | L.getExitBlocks(ExitBlocks); 61 | llvm::dbgs() << " Exit blocks:\n"; 62 | for (BasicBlock *BB : ExitBlocks) { 63 | llvm::dbgs() << " " << BB->getName().str() << "\n"; 64 | } 65 | dbgs() << "\n"; 66 | 67 | if (Phi != nullptr) { 68 | dbgs() << " The induction variable is:\n LLVM IR: "; 69 | Phi->dump(); 70 | 71 | if (IVSCEV != nullptr) { 72 | dbgs() << " SCEV: "; 73 | IVSCEV->dump(); 74 | dbgs() << " Range: " << *IVRange << "\n"; 75 | } 76 | 77 | } 78 | 79 | return PreservedAnalyses::all(); 80 | } 81 | }; 82 | 83 | // This part is the new way of registering your pass 84 | extern "C" ::llvm::PassPluginLibraryInfo LLVM_ATTRIBUTE_WEAK 85 | llvmGetPassPluginInfo() { 86 | return {LLVM_PLUGIN_API_VERSION, "LoopInfo", "v0.1", [](PassBuilder &PB) { 87 | PB.registerPipelineParsingCallback( 88 | [](StringRef Name, LoopPassManager &LPM, 89 | ArrayRef<PassBuilder::PipelineElement>) { 90 | if (Name == "LoopInfo") { 91 | LPM.addPass(LoopInfoPass()); 92 | return true; 93 | } 94 | return false; 95 | }); 96 | }}; 97 | } 98 | 
-------------------------------------------------------------------------------- /04-taint-example/analysis/TaintAnalysis.cpp: -------------------------------------------------------------------------------- 1 | #include <queue> 2 | #include <set> 3 | #include "llvm/ADT/ArrayRef.h" 4 | #include "llvm/IR/PassManager.h" 5 | #include "llvm/Passes/PassBuilder.h" 6 | #include "llvm/Passes/PassPlugin.h" 7 | 8 | using namespace llvm; 9 | 10 | class TaintAnalysisPass : public PassInfoMixin<TaintAnalysisPass> { 11 | public: 12 | PreservedAnalyses run(Module &M, ModuleAnalysisManager &AM) { 13 | Function *Source = M.getFunction("local_get_location"); 14 | Function *Sink = M.getFunction("local_send_"); 15 | 16 | std::queue<Use *> Queue; 17 | std::set<Use *> Tainted; 18 | 19 | // Lambda to taint uses 20 | auto TaintUses = [&](Value *V) { 21 | for (Use &U : V->uses()) { 22 | bool New = Tainted.insert(&U).second; 23 | if (New) 24 | Queue.push(&U); 25 | } 26 | }; 27 | 28 | // Enqueue callers 29 | for (User *User : Source->users()) 30 | if (auto *Call = dyn_cast<CallBase>(User)) 31 | TaintUses(Call); 32 | 33 | while (not Queue.empty()) { 34 | Use *Current = Queue.front(); 35 | Queue.pop(); 36 | 37 | // Dump for debug purposes 38 | Current->getUser()->dump(); 39 | 40 | // Check if the use is used as a call argument 41 | if (auto *Call = dyn_cast<CallBase>(Current->getUser())) { 42 | if (Call->isArgOperand(Current)) { 43 | // Handle actual arguments 44 | unsigned Index = Call->getArgOperandNo(Current); 45 | 46 | // Obtain callee name 47 | StringRef CalleeName; 48 | Function *Callee = Call->getCalledFunction(); 49 | if (Callee != nullptr) 50 | CalleeName = Callee->getName(); 51 | 52 | // If we have the definition of the function move from actual argument 53 | // to formal argument (indirect calls have no known callee) 54 | if (Callee != nullptr and not Callee->empty()) 55 | TaintUses(Callee->getArg(Index)); 56 | 57 | // Handle well-known functions 58 | if (Index == 1 and 59 | (CalleeName == "local_strcat_" 60 | or CalleeName == "local_memcpy_" 61 | or CalleeName == "local_strcpy_")) { 62 | // Propagate from second 
argument to the first... 63 | TaintUses(Call->getArgOperand(0)); 64 | 65 | // and to the return value from the call 66 | TaintUses(Call); 67 | } 68 | 69 | if (Callee == Sink) { 70 | llvm::dbgs() << "The result of get_location is used in a call to " 71 | << "send\n"; 72 | Call->dump(); 73 | return PreservedAnalyses::none(); 74 | } 75 | 76 | } 77 | } 78 | } 79 | 80 | return PreservedAnalyses::none(); 81 | } 82 | }; 83 | 84 | // Register pass 85 | extern "C" ::llvm::PassPluginLibraryInfo LLVM_ATTRIBUTE_WEAK 86 | llvmGetPassPluginInfo() { 87 | return {LLVM_PLUGIN_API_VERSION, "TaintAnalysis", "v0.1", [](PassBuilder &PB) { 88 | PB.registerPipelineParsingCallback( 89 | [](StringRef Name, ModulePassManager &FPM, 90 | ArrayRef<PassBuilder::PipelineElement>) { 91 | if (Name == "TaintAnalysis") { 92 | FPM.addPass(TaintAnalysisPass()); 93 | return true; 94 | } 95 | return false; 96 | }); 97 | }}; 98 | } 99 | -------------------------------------------------------------------------------- /12-obfuscation-cfg-flattening/analysis/CFGFlatteningPass.cpp: -------------------------------------------------------------------------------- 1 | #include <map> 2 | 3 | #include "llvm/ADT/STLExtras.h" 4 | #include "llvm/IR/IRBuilder.h" 5 | #include "llvm/IR/InstIterator.h" 6 | #include "llvm/IR/Instructions.h" 7 | #include "llvm/IR/Verifier.h" 8 | #include "llvm/Pass.h" 9 | 10 | using namespace llvm; 11 | 12 | class CFGFlatteningPass : public FunctionPass { 13 | 14 | public: 15 | static char ID; 16 | 17 | CFGFlatteningPass() : FunctionPass(ID) {} 18 | 19 | bool runOnFunction(Function &F) override; 20 | }; 21 | 22 | bool CFGFlatteningPass::runOnFunction(Function &F) { 23 | 24 | LLVMContext &C = F.getContext(); 25 | IRBuilder<> Builder { C }; 26 | 27 | auto *Int64 = IntegerType::getInt64Ty(C); 28 | 29 | std::map<BasicBlock *, ConstantInt *> BlockIndex; 30 | for (const auto &Group : llvm::enumerate(F)) { 31 | auto *Index = ConstantInt::get(Int64, Group.index()); 32 | BlockIndex[&Group.value()] = Index; 33 | } 34 | 35 | BasicBlock *OriginalEntry = 
&F.getEntryBlock(); 36 | 37 | unsigned NumBlocks = F.size(); 38 | BasicBlock *AllocaBlock = BasicBlock::Create(C, "entry", &F, &F.getEntryBlock()); 39 | BasicBlock *SwitchBlock = BasicBlock::Create(C, "switch", &F); 40 | 41 | Builder.SetInsertPoint(AllocaBlock); 42 | auto *SwitchVariable = Builder.CreateAlloca(Int64); 43 | Builder.CreateStore(BlockIndex.at(OriginalEntry), SwitchVariable); 44 | Builder.CreateBr(SwitchBlock); 45 | 46 | Builder.SetInsertPoint(SwitchBlock); 47 | auto *Condition = Builder.CreateLoad(Int64, SwitchVariable); 48 | auto *Switch = Builder.CreateSwitch(Condition, OriginalEntry, NumBlocks); 49 | 50 | for (Instruction &I : llvm::make_early_inc_range(instructions(F))) { 51 | 52 | auto *PHI = dyn_cast<PHINode>(&I); 53 | if (not PHI) 54 | continue; 55 | 56 | Builder.SetInsertPoint(&*F.getEntryBlock().begin()); 57 | auto *Alloca = Builder.CreateAlloca(PHI->getType()); 58 | 59 | Builder.SetInsertPoint(&*std::next(PHI->getIterator())); 60 | auto *Load = Builder.CreateLoad(PHI->getType(), Alloca); 61 | 62 | for (unsigned Index = 0; Index < PHI->getNumIncomingValues(); ++Index) { 63 | Builder.SetInsertPoint(&*std::prev(PHI->getIncomingBlock(Index)->end())); 64 | Builder.CreateStore(PHI->getIncomingValue(Index), Alloca); 65 | } 66 | 67 | PHI->replaceAllUsesWith(Load); 68 | PHI->eraseFromParent(); 69 | } 70 | 71 | for (BasicBlock &B :F) { 72 | 73 | auto It = BlockIndex.find(&B); 74 | if (It == BlockIndex.end()) 75 | continue; 76 | 77 | Switch->addCase(It->second, &B); 78 | 79 | for (Instruction &I : llvm::make_early_inc_range(B)) { 80 | AllocaInst *NewAlloca = nullptr; 81 | for (Use &U : llvm::make_early_inc_range(I.uses())) { 82 | 83 | auto *UseInst = cast<Instruction>(U.getUser()); 84 | if (UseInst->getParent() == &B) 85 | continue; 86 | 87 | if (not NewAlloca) { 88 | Builder.SetInsertPoint(&*AllocaBlock->begin()); 89 | NewAlloca = Builder.CreateAlloca(I.getType()); 90 | 91 | Builder.SetInsertPoint(&B, std::next(I.getIterator())); 92 | Builder.CreateStore(&I, NewAlloca); 93 
| } 94 | 95 | Builder.SetInsertPoint(UseInst); 96 | U.set(Builder.CreateLoad(I.getType(), NewAlloca)); 97 | } 98 | } 99 | 100 | auto *Terminator = B.getTerminator(); 101 | unsigned NumSuccessors = Terminator->getNumSuccessors(); 102 | for (unsigned SuccIndex = 0; SuccIndex < NumSuccessors; ++SuccIndex) { 103 | BasicBlock *NewSuccessor = BasicBlock::Create(C, "", &F); 104 | Builder.SetInsertPoint(NewSuccessor); 105 | Builder.CreateStore(BlockIndex.at(Terminator->getSuccessor(SuccIndex)), SwitchVariable); 106 | Builder.CreateBr(SwitchBlock); 107 | Terminator->setSuccessor(SuccIndex, NewSuccessor); 108 | } 109 | } 110 | 111 | assert(not llvm::verifyFunction(F, &dbgs())); 112 | return true; 113 | } 114 | 115 | char CFGFlatteningPass::ID = 0; 116 | 117 | using Register = RegisterPass<CFGFlatteningPass>; 118 | static Register X("flatten-cfg", "CFG Flattening Pass", true, true); 119 | 120 | --------------------------------------------------------------------------------