├── README-KR.md
└── README.md


/README-KR.md:
--------------------------------------------------------------------------------
# Themida VM Analysis

## Table of Contents

1. [Overview](#overview)
2. [Themida VM Context Structure (`VM_CONTEXT`)](#themida-vm-context-structure-vm_context)
3. [Handler Operation Principles and Characteristics](#handler-operation-principles-and-characteristics)
4. [Handler Code Analysis Example](#handler-code-analysis-example)

---

## Overview

The Themida (and WinLicense family) protector obfuscates binary code by running it inside a **virtual machine (VM)**. It translates x64 code into a custom bytecode and mixes in **virtual registers**, **virtual stack** operations, and **anti-debugging** techniques, which makes analysis difficult.

The key steps in reverse engineering it are:

1. **Understanding the VM CONTEXT**
   - On VM entry, Themida backs up the real CPU registers and flags, and from then on works with its own VM registers and stack.
2. **Handler analysis**
   - Each **virtualized instruction** (ADD, SUB, SHIFT, ROTATE, PUSH, POP, CALL, RET, ...) is processed by a handler, and the handlers themselves are obfuscated.
3. **Bytecode dispatch**
   - The address of the next handler is computed dynamically from the VM bytecode (a word-sized opcode) and the handler table.
4. **Lifting (devirtualization)** ideas
   - Semi-automatic or automatic recovery of **"bytecode → IR / original x86 code"** with Python + Triton or similar tools.

---

## Themida VM Context Structure (`VM_CONTEXT`)

The Themida VM uses a VM CONTEXT of roughly **0x200 bytes**. The field offsets inside the VM CONTEXT are chosen at random when Themida protects a binary.

```c
// Differs from the actual layout.
struct VM_CONTEXT
{
  char VM_CONTEXT_START;
  char field_1;
  char field_2;
  char field_3;
  char field_4;
  char field_5;
  char field_6;
  char field_7;
  char field_8;
  char field_9;
  char field_A;
  char field_B;
  char field_C;
  char field_D;
  __unaligned __declspec(align(1)) __int64 vm_opnd_maybe_1;
  __unaligned __declspec(align(1)) __int64 field_16;
  __unaligned __declspec(align(1)) int vm_some_key2;
  __unaligned __declspec(align(1)) __int64 vm_opaque_022;
  __unaligned __declspec(align(1)) int field_2A;
  __unaligned __declspec(align(1)) __int64 vReg_R11;
  __unaligned __declspec(align(1)) __int64 vm_opaque_036;
  __unaligned __declspec(align(1)) int vm_some_key1;
  char vm_unknown_flag;
  char field_43[12];
  char vm_branch_flag_maybe;
  __int64 vm_scratch_qword;
  int vm_some_key3;
  __unaligned __declspec(align(1)) __int64 field_5C;
  __unaligned __declspec(align(1)) __int64 vm_opnd_maybe_2;
  char vm_instruction_opcode5;
  __unaligned __declspec(align(1)) __int16 vm_opaque_06D;
  __unaligned __declspec(align(1)) __int64 vReg_R15;
  char vm_opaque_077[12];
  __unaligned __declspec(align(1)) __int64 vm_handlerScratch;
  __unaligned __declspec(align(1)) __int16 vm_opaque_08B;
  __unaligned __declspec(align(1)) __int64 vm_instruction_pointer;
  __unaligned __declspec(align(1)) int vm_some_key5;
  __unaligned __declspec(align(1)) __int64 vReg_R13;
  char vm_instruction_opcode2;
  __unaligned __declspec(align(1)) __int64 vm_some_key4;
  __unaligned __declspec(align(1)) __int64 vReg_RDI;
  __int16 field_B2;
  int vm_some_key7;
  char vm_instruction_opcode1;
  __unaligned __declspec(align(1)) __int16 vm_some_key6;
  __unaligned __declspec(align(1)) __int64 vReg_RBP;
  __unaligned __declspec(align(1)) __int64 vReg_RBX;
  __unaligned __declspec(align(1)) int pad;
  __unaligned __declspec(align(1)) __int64 vm_left_value_1;
  char vm_instruction_opcode4;
  int vm_opaque_0D8;
  __unaligned __declspec(align(1)) __int64 vReg_R9;
  __unaligned __declspec(align(1)) __int64 vReg_R10;
  __unaligned __declspec(align(1)) __int64 *vm_handlerTable;
  int vm_opaque_0F4;
  __int64 vReg_RCX;
  __int64 vm_opaque_100;
  int vm_handlerKey;
  __unaligned __declspec(align(1)) __int64 vReg_R8;
  __unaligned __declspec(align(1)) __int64 vReg_R14;
  __unaligned __declspec(align(1)) __int64 vm_opnd_maybe_3;
  __unaligned __declspec(align(1)) __int64 vReg_RAX;
  char vm_instruction_opcode3;
  char vm_opaque_12D[12];
  __unaligned __declspec(align(1)) __int64 vReg_R12;
  char vm_opaque_141[12];
  __unaligned __declspec(align(1)) __int64 vReg_RSI;
  char vm_operation_result_selector_maybe;
  __int16 vm_opaque_156;
  int vm_spinlock;
  __unaligned __declspec(align(1)) __int64 vm_stack_pointer;
  __unaligned __declspec(align(1)) __int64 vReg_RDX;
  __unaligned __declspec(align(1)) __int64 vm_left_value_2;
  char vm_opaque_174[140];
};
```

- **`vm_handlerTable`**: table used to look up a handler address from a bytecode index (opcode).
- **`vm_instruction_pointer`**: apparently the pointer to the bytecode currently being interpreted (VIP).
- **`vm_stack_pointer`**: the virtual stack pointer (VSP).

---

## Themida Operation Flow Overview

### VM ENTER:

The original context is pushed onto the stack.

```
// Simplified example. Differs from the actual implementation.
push r8
push r9
push r10
push r11
push r12
push r13
push r14
push r15
push rdi
push rsi
push rbp
push rbx
push rdx
push rcx
push rax
pushfq                      ; eflags
push some_key
push first_handler_offset
push retaddr
...
```

Because all virtualized code shares a single global VM CONTEXT, Themida acquires a spinlock before entering the VM.
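When emulating or tracing VM entry, the acquire loop in the listing that follows can be read as a compare-and-swap spin. The sketch below is a minimal Python model of that reading, not Themida's code: `cmpxchg32`, `acquire`, `release`, the `mem` dictionary, the lock offset `0x158`, and the owner value `1` are all assumptions made for illustration.

```python
def cmpxchg32(mem, addr, eax, src):
    """Model of `lock cmpxchg [addr], src`; returns (zf, new_eax)."""
    current = mem.get(addr, 0)
    if current == eax:
        mem[addr] = src          # values matched: store src, ZF = 1
        return True, eax
    return False, current        # mismatch: ZF = 0, EAX receives the current value


def acquire(mem, lock_addr, owner, max_spins=1000):
    """Spin until the lock word changes from 0 to `owner`; return the spin count."""
    for spins in range(max_spins):
        zf, _ = cmpxchg32(mem, lock_addr, 0, owner)   # xor eax, eax ; lock cmpxchg
        if zf:
            return spins                               # jz: lock acquired
        # pause ; jmp back to the loop head
    raise TimeoutError("lock still held")


def release(mem, lock_addr):
    """VM EXIT side: clear the lock word (assumed to be a plain store of 0)."""
    mem[lock_addr] = 0


mem = {}
LOCK = 0x158    # hypothetical offset of vm_spinlock inside VM_CONTEXT
print(acquire(mem, LOCK, owner=1))   # lock was free, so no spinning: prints 0
```

A single-threaded emulator can usually treat the lock as always free; the model mainly helps recognize the loop so it is not mistaken for a handler.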

```
.themida:1400931D0 loc_1400931D0:
.themida:1400931D0     xor     eax, eax
.themida:1400931D2     lock cmpxchg [rbx+rbp], ecx   ; rbx = spinlock offset, rbp = start address of VM_CONTEXT
.themida:1400931D7     jz      loc_1400931E4
.themida:1400931DD     pause
.themida:1400931DF     jmp     loc_1400931D0
```

The context saved on the stack is then moved into the VM CONTEXT.

```
// Simplified example. Differs from the actual implementation.
pop qword ptr [r9]  ;eflags to VM_CONTEXT
pop qword ptr [r14] ;rax
pop qword ptr [r14] ;rcx
pop qword ptr [r15] ;rdx
pop qword ptr [r15] ;rbx
pop qword ptr [r15] ;rbp
pop qword ptr [rsi] ;rsi
pop qword ptr [rsi] ;rdi
pop qword ptr [r12] ;r15
pop qword ptr [r14] ;r14
pop qword ptr [r14] ;r13
pop qword ptr [r15] ;r12
pop qword ptr [r15] ;r11
pop qword ptr [r15] ;r10
pop qword ptr [r15] ;r9
pop qword ptr [r14] ;r8
...
```

### VM HANDLER:

In short, this stage interprets and executes the bytecode. Note that Themida's bytecode is not a plain byte array: as the dispatch code later in this section shows, it is read as 16-bit and 32-bit fields at fixed offsets from the instruction pointer.

Example handlers:

```
// Simplified example. Differs from the actual implementation.
// mov operation
if ( v0->vm_instruction_opcode1 == (char)0xEA )
{
  v84 = v0->vm_opnd_maybe_1 + (v0->vm_some_key4 ^ v0->vm_opnd_maybe_2) - 0x6C1BFD75;
  v85 = v0->vm_left_value_1 + v0->vm_opnd_maybe_3 + 0x2787FB21 - v0->vm_some_key4;
  if ( v0->vm_instruction_opcode3 == 0x75 )
    v84 = (char)v85;
  if ( v0->vm_instruction_opcode3 == 0x76 )
    v84 = (__int16)(LOWORD(v0->vm_left_value_1) + LOWORD(v0->vm_opnd_maybe_3) - 0x4DF - LOWORD(v0->vm_some_key4));
  if ( v0->vm_instruction_opcode3 == 0x77 )
    v84 = (unsigned int)v85;
  v50 = v84 + v85;
  __readeflags();
  if ( v0->vm_operation_result_selector_maybe <= 0x8Eu )
    v0->vm_left_value_1 = v84 ^ 0x2787FB21;
  else
    v0->vm_left_value_2 = v84 + 0x116ABA2E;
}

// shr operation
if ( v0->vm_instruction_opcode1 == 0x6D )
{
  v214 = v0->vm_opnd_maybe_1 + (v0->vm_some_key4 ^ v0->vm_opnd_maybe_2) - 0x6C1BFD75;
  v215 = v0->vm_left_value_1 + v0->vm_opnd_maybe_3 + 0x2787FB21 - v0->vm_some_key4;
  v237 = *(_QWORD *)(&v0->VM_CONTEXT_START + *(unsigned __int16 *)(v0->vm_instruction_pointer + 0xB));
  if ( v0->vm_instruction_opcode2 == (char)0xA2 )
  {
    __writeeflags(v237);
    LOBYTE(v214) = (unsigned __int8)v214 >> v215;
    v216 = __readeflags();
    v237 = v216;
  }
  if ( v0->vm_instruction_opcode2 == (char)0xA3 )
  {
    __writeeflags(v237);
    LOWORD(v214) = (unsigned __int16)v214 >> v215;
    v217 = __readeflags();
    v237 = v217;
  }
  if ( v0->vm_instruction_opcode2 == (char)0xA4 )
  {
    __writeeflags(v237);
    v214 = (unsigned int)v214 >> v215;
    v218 = __readeflags();
    v237 = v218;
  }
  if ( v0->vm_instruction_opcode2 == (char)0xA5 )
  {
    __writeeflags(v237);
    v214 >>= v215;
    __readeflags();
  }
  vm_operation_result_selector_maybe = v0->vm_operation_result_selector_maybe;
  if ( vm_operation_result_selector_maybe <= 0x8Eu )
    v0->vm_left_value_1 = v214 ^ 0x2787FB21;
  else
    v0->vm_left_value_2 = v214 + 0x116ABA2E;
}
```

Arithmetic handlers usually have a `pushfq` (`__readeflags`) immediately after the operation, which captures the resulting flags.

```
// Simplified example. Differs from the actual implementation.
// push operation
if ( vm_instruction_opcode1 == 0x6A )
{
  if ( (_BYTE)p_vm_instruction_opcode5 == 2 )
  {
    LOWORD(v34) = v18;
  }
  else
  {
    LOWORD(v34) = HIWORD(v18);
    v20 = ((_BYTE)v0 - 0x48) & (v18 + v20) ^ 0xF;
  }
}

...

// stack pointer adjust
v26 = (p_vm_instruction_opcode5 | 0x47) - 4;
v27 = v0->vm_instruction_opcode1 - 0x6A;
if ( v0->vm_instruction_opcode1 == 0x6A )
{
  if ( v24 == 2 )
  {
    *(_QWORD *)vm_stack_pointer -= 2LL;
  }
  else
  {
    *(_QWORD *)vm_stack_pointer -= 8LL;
    LOBYTE(v26) = v27 ^ (((v27 ^ v26) - 0xF) | 0x28);
  }
  v26 = (unsigned __int8)v26 & 0xA8;
}
LOBYTE(p_vm_instruction_opcode5) = v0->vm_instruction_opcode1;
v28 = p_vm_instruction_opcode5 | (v26 + 0x8000FBFFLL);
if ( (_BYTE)p_vm_instruction_opcode5 == 0x1C )
{
  vm_opnd_maybe_1 = v0->vm_opnd_maybe_1;
  v28 = 0x400 - vm_opnd_maybe_1;
  if ( (char *)(vm_opnd_maybe_1 - 0x31AC9D7C) != vm_stack_pointer )
  {
    if ( v24 == 2 )
      *(_QWORD *)vm_stack_pointer += 2LL;
    else
      *(_QWORD *)vm_stack_pointer += 8LL;
    v28 = 0x3E4LL;
  }
}
```

Themida has no separate dispatcher: each handler jumps directly to the next handler.

```
// Simplified example. Differs from the actual implementation.
vmctx->vm_handlerKey -= 0x5AC92481;
vmctx->vm_handlerKey ^= 0x3F8BFC4F;
vmctx->vm_handlerKey ^= 0x2DDDE2;

v1 = *(unsigned __int16 *)(vmctx->vm_instruction_pointer + 4);
index = v1 - vmctx->vm_handlerKey;
vmctx->vm_handlerKey &= index;

next_handler = vmctx->vm_handlerTable[(unsigned __int16)index];

// VIP update
vmctx->vm_instruction_pointer += *(int *)vmctx->vm_instruction_pointer;

jump next_handler;
```

### VM EXIT:

The registers held in the VM CONTEXT are restored to the real registers, and the spinlock is released.

The exit path therefore performs a VM CONTEXT -> STACK -> REAL CONTEXT transfer.

```
// Simplified example. Differs from the actual implementation.
.themida:14000AEA2     pop     r8
.themida:14000AEA4     pop     r9
.themida:14000AEA6     pop     r10
.themida:14000AEA8     pop     r11
.themida:14000AEAA     pop     r12
.themida:14000AEAC     pop     r13
.themida:14000AEAE     pop     r14
.themida:14000AEB0     pop     r15
.themida:14000AEB2     pop     rdi
.themida:14000AEB3     pop     rsi
.themida:14000AEB4     pop     rbp
.themida:14000AEB5     pop     rbx
.themida:14000AEB6     pop     rdx
.themida:14000AEB7     pop     rcx
.themida:14000AEB8     pop     rax
.themida:14000AEB9     popfq
.themida:14000AEBA     popfq
.themida:14000AEBB     retn    0
```

---

## Handler Operation Principles and Characteristics

1. **Fetching the opcode from the bytecode**
   - Read 2 bytes with `*(unsigned __int16*)(vm_bytecodePtr + offset)` and combine them with `vm_handlerKey` using XOR, ADD, and similar operations.
   - Mask the result with `& 0xffff` and scale it by 8 (the size of one table entry) to index the **handler table** and obtain the "next handler" address.
2. **VM CONTEXT manipulation**
   - Arithmetic and logical operations are applied to `vReg_RAX`, `vReg_RBX`, `vm_flagsA/B`, `field_0[...]` (the stack), and so on.
   - Flags are either simulated in fields such as `vm_flagsA` instead of EFLAGS, or handled partly through pushfq/popfq and `__readeflags()`/`__writeeflags()` as additional obfuscation.
3. **Opcodes**
   - Logic such as "if (vm_instruction_opcodeN == 0x16) then pop 2 bytes from field_0[...]" is common.
   - Some code splits an opcode into its upper and lower 4 bits (`(value & 0xF0)>>4`, `(value & 0xF)`), so a main-opcode / sub-opcode scheme may exist.
4. **Moving to the next bytecode**
   - `vm_bytecodePtr += *(int*)(vm_bytecodePtr + someOffset)`
   - The offset differs per handler, e.g. +6, +3, +4.

Note: `vm_bytecodePtr` here appears to correspond to `vm_instruction_pointer` in the struct above; `vm_flagsA/B` and `field_0[...]` do not appear in that listing under those names.

---

## Handler Code Analysis Example

Below is a simplified form of an actual (obfuscated) handler:

```c
__int64 __fastcall sub_14003411C(VM_CONTEXT *v0) {
    if ((v0->vm_handlerKey & 2) != 0)
        v0->vm_flagsA += 0x7660110;
    // swap(vm_handlerKey, vm_flagsA)
    int tmp = v0->vm_handlerKey;
    v0->vm_handlerKey = v0->vm_flagsA;
    v0->vm_flagsA = tmp;

    // ... various VM instruction steps (operand fetch, execution, etc.)

    // bytecode fetch?
    unsigned int opcode = *(unsigned __int16*)v0->vm_bytecodePtr;
    // next handler = handlerTable[opcode], i.e. byte offset opcode * 8
    __int64 nextHandler = *(__int64*)((char*)v0->vm_handlerTable + (opcode & 0xFFFF) * 8);

    // vm_bytecodePtr += *(int*)(v0->vm_bytecodePtr + 3)
    v0->vm_bytecodePtr += *(int*)(v0->vm_bytecodePtr + 3);

    // call/jmp nextHandler
    return ((handlerFuncType)nextHandler)(...);
}
```

Thus a **handler** runs in the order "(1) update the VM CONTEXT → (2) extract the opcode from the bytecode → (3) compute the next handler address → (4) advance the bytecode pointer."

---

## Themida Analyzer Development Ideas

1. Initial analysis phase
   - Load the binary and start executing the virtualized code at a given address.
   - Make all registers symbolic and emulate the instructions with the Triton engine.
   - Identify the key offsets in the VM context (the vip/vsp slots, etc.).
   - Symbolize vip/vsp so that subsequent memory accesses can be traced.
2. Handler pattern matching
   - Analyze the AST of the value written by the last store instruction of each handler.
   - Match it against known patterns to determine the operation the handler implements.
   - Pattern examples:
     - "[vsp] + [vsp]" → ADD
     - "~[vsp] | ~[vsp]" → NAND
     - "[vsp] >> ([vsp] & 0x3f)" → SHR
3. Control flow analysis and transformation
   - Convert the identified handlers into basic blocks.
   - Slice on the RIP register to find the address of the next basic block.
   - Connect the basic blocks into a control flow graph.
   - Convert all basic blocks to LLVM IR to recover executable code.
4. Optimization phase (optional)
   - Apply LLVM optimization passes.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Themida VM Analysis

## Table of Contents

1. [Overview](#overview)
2. [Themida VM Context Structure (`VM_CONTEXT`)](#themida-vm-context-structure-vm_context)
3. [Handler Operation Principles and Characteristics](#handler-operation-principles-and-characteristics)
4. [Handler Code Analysis Example](#handler-code-analysis-example)

---

## Overview

The Themida (and WinLicense family) protector obfuscates binary code by running it inside a **virtual machine (VM)**. It translates x64 code into a custom bytecode and mixes in **virtual registers**, **virtual stack** operations, and **anti-debugging** techniques, which makes analysis difficult.

The key steps in reverse engineering it are:

1. **Understanding the VM CONTEXT**
   - On VM entry, Themida backs up the real CPU registers and flags, and from then on works with its own VM registers and stack.
2. **Handler analysis**
   - Analyze the handlers that process each **virtualized instruction** (ADD, SUB, SHIFT, ROTATE, PUSH, POP, CALL, RET, ...).
3. **Bytecode dispatch**
   - The address of the next handler is computed dynamically from the VM bytecode (a word-sized opcode) and the handler table.
4. **Implementing lifting (devirtualization)**
   - Semi-automatic or automatic recovery of **"bytecode → IR / original x86 code"** with Python + Triton or similar tools.

---

## Themida VM Context Structure (`VM_CONTEXT`)

The Themida VM uses a VM CONTEXT of roughly **0x200 bytes**. The field offsets inside the VM CONTEXT are chosen at random when Themida protects a binary.

```c
// This is a simplified example. Differs from the actual implementation.
struct VM_CONTEXT
{
  char VM_CONTEXT_START;
  char field_1;
  char field_2;
  char field_3;
  char field_4;
  char field_5;
  char field_6;
  char field_7;
  char field_8;
  char field_9;
  char field_A;
  char field_B;
  char field_C;
  char field_D;
  __unaligned __declspec(align(1)) __int64 vm_opnd_maybe_1;
  __unaligned __declspec(align(1)) __int64 field_16;
  __unaligned __declspec(align(1)) int vm_some_key2;
  __unaligned __declspec(align(1)) __int64 vm_opaque_022;
  __unaligned __declspec(align(1)) int field_2A;
  __unaligned __declspec(align(1)) __int64 vReg_R11;
  __unaligned __declspec(align(1)) __int64 vm_opaque_036;
  __unaligned __declspec(align(1)) int vm_some_key1;
  char vm_unknown_flag;
  char field_43[12];
  char vm_branch_flag_maybe;
  __int64 vm_scratch_qword;
  int vm_some_key3;
  __unaligned __declspec(align(1)) __int64 field_5C;
  __unaligned __declspec(align(1)) __int64 vm_opnd_maybe_2;
  char vm_instruction_opcode5;
  __unaligned __declspec(align(1)) __int16 vm_opaque_06D;
  __unaligned __declspec(align(1)) __int64 vReg_R15;
  char vm_opaque_077[12];
  __unaligned __declspec(align(1)) __int64 vm_handlerScratch;
  __unaligned __declspec(align(1)) __int16 vm_opaque_08B;
  __unaligned __declspec(align(1)) __int64 vm_instruction_pointer;
  __unaligned __declspec(align(1)) int vm_some_key5;
  __unaligned __declspec(align(1)) __int64 vReg_R13;
  char vm_instruction_opcode2;
  __unaligned __declspec(align(1)) __int64 vm_some_key4;
  __unaligned __declspec(align(1)) __int64 vReg_RDI;
  __int16 field_B2;
  int vm_some_key7;
  char vm_instruction_opcode1;
  __unaligned __declspec(align(1)) __int16 vm_some_key6;
  __unaligned __declspec(align(1)) __int64 vReg_RBP;
  __unaligned __declspec(align(1)) __int64 vReg_RBX;
  __unaligned __declspec(align(1)) int pad;
  __unaligned __declspec(align(1)) __int64 vm_left_value_1;
  char vm_instruction_opcode4;
  int vm_opaque_0D8;
  __unaligned __declspec(align(1)) __int64 vReg_R9;
  __unaligned __declspec(align(1)) __int64 vReg_R10;
  __unaligned __declspec(align(1)) __int64 *vm_handlerTable;
  int vm_opaque_0F4;
  __int64 vReg_RCX;
  __int64 vm_opaque_100;
  int vm_handlerKey;
  __unaligned __declspec(align(1)) __int64 vReg_R8;
  __unaligned __declspec(align(1)) __int64 vReg_R14;
  __unaligned __declspec(align(1)) __int64 vm_opnd_maybe_3;
  __unaligned __declspec(align(1)) __int64 vReg_RAX;
  char vm_instruction_opcode3;
  char vm_opaque_12D[12];
  __unaligned __declspec(align(1)) __int64 vReg_R12;
  char vm_opaque_141[12];
  __unaligned __declspec(align(1)) __int64 vReg_RSI;
  char vm_operation_result_selector_maybe;
  __int16 vm_opaque_156;
  int vm_spinlock;
  __unaligned __declspec(align(1)) __int64 vm_stack_pointer;
  __unaligned __declspec(align(1)) __int64 vReg_RDX;
  __unaligned __declspec(align(1)) __int64 vm_left_value_2;
  char vm_opaque_174[140];
};
```

- **`vm_handlerTable`**: table used to look up a handler address from a bytecode index (opcode).
- **`vm_instruction_pointer`**: apparently the pointer to the bytecode currently being interpreted (VIP).
- **`vm_stack_pointer`**: the virtual stack pointer (VSP).

---

## Themida Operation Flow Overview

### VM ENTER:

The original context is pushed onto the stack.

```
// Simplified example. Differs from the actual implementation.
push r8
push r9
push r10
push r11
push r12
push r13
push r14
push r15
push rdi
push rsi
push rbp
push rbx
push rdx
push rcx
push rax
pushfq                      ; eflags
push some_key
push first_handler_offset
push retaddr
...
```

Because all virtualized code shares a single global VM CONTEXT, Themida acquires a spinlock before entering the VM.

```
.themida:1400931D0 loc_1400931D0:
.themida:1400931D0     xor     eax, eax
.themida:1400931D2     lock cmpxchg [rbx+rbp], ecx   ; rbx = spinlock offset, rbp = start address of VM_CONTEXT
.themida:1400931D7     jz      loc_1400931E4
.themida:1400931DD     pause
.themida:1400931DF     jmp     loc_1400931D0
```

The context saved on the stack is then moved into the VM CONTEXT.

```
// Simplified example. Differs from the actual implementation.
pop qword ptr [r9]  ;eflags to VM_CONTEXT
pop qword ptr [r14] ;rax
pop qword ptr [r14] ;rcx
pop qword ptr [r15] ;rdx
pop qword ptr [r15] ;rbx
pop qword ptr [r15] ;rbp
pop qword ptr [rsi] ;rsi
pop qword ptr [rsi] ;rdi
pop qword ptr [r12] ;r15
pop qword ptr [r14] ;r14
pop qword ptr [r14] ;r13
pop qword ptr [r15] ;r12
pop qword ptr [r15] ;r11
pop qword ptr [r15] ;r10
pop qword ptr [r15] ;r9
pop qword ptr [r14] ;r8
...
```

### VM HANDLER:

In short, this stage interprets and executes the bytecode. Note that Themida's bytecode is not a plain byte array: as the dispatch code later in this section shows, it is read as 16-bit and 32-bit fields at fixed offsets from the instruction pointer.

Example handlers:

```
// Simplified example. Differs from the actual implementation.
// mov operation
if ( v0->vm_instruction_opcode1 == (char)0xEA )
{
  v84 = v0->vm_opnd_maybe_1 + (v0->vm_some_key4 ^ v0->vm_opnd_maybe_2) - 0x6C1BFD75;
  v85 = v0->vm_left_value_1 + v0->vm_opnd_maybe_3 + 0x2787FB21 - v0->vm_some_key4;
  if ( v0->vm_instruction_opcode3 == 0x75 )
    v84 = (char)v85;
  if ( v0->vm_instruction_opcode3 == 0x76 )
    v84 = (__int16)(LOWORD(v0->vm_left_value_1) + LOWORD(v0->vm_opnd_maybe_3) - 0x4DF - LOWORD(v0->vm_some_key4));
  if ( v0->vm_instruction_opcode3 == 0x77 )
    v84 = (unsigned int)v85;
  v50 = v84 + v85;
  __readeflags();
  if ( v0->vm_operation_result_selector_maybe <= 0x8Eu )
    v0->vm_left_value_1 = v84 ^ 0x2787FB21;
  else
    v0->vm_left_value_2 = v84 + 0x116ABA2E;
}

// shr operation
if ( v0->vm_instruction_opcode1 == 0x6D )
{
  v214 = v0->vm_opnd_maybe_1 + (v0->vm_some_key4 ^ v0->vm_opnd_maybe_2) - 0x6C1BFD75;
  v215 = v0->vm_left_value_1 + v0->vm_opnd_maybe_3 + 0x2787FB21 - v0->vm_some_key4;
  v237 = *(_QWORD *)(&v0->VM_CONTEXT_START + *(unsigned __int16 *)(v0->vm_instruction_pointer + 0xB));
  if ( v0->vm_instruction_opcode2 == (char)0xA2 )
  {
    __writeeflags(v237);
    LOBYTE(v214) = (unsigned __int8)v214 >> v215;
    v216 = __readeflags();
    v237 = v216;
  }
  if ( v0->vm_instruction_opcode2 == (char)0xA3 )
  {
    __writeeflags(v237);
    LOWORD(v214) = (unsigned __int16)v214 >> v215;
    v217 = __readeflags();
    v237 = v217;
  }
  if ( v0->vm_instruction_opcode2 == (char)0xA4 )
  {
    __writeeflags(v237);
    v214 = (unsigned int)v214 >> v215;
    v218 = __readeflags();
    v237 = v218;
  }
  if ( v0->vm_instruction_opcode2 == (char)0xA5 )
  {
    __writeeflags(v237);
    v214 >>= v215;
    __readeflags();
  }
  vm_operation_result_selector_maybe = v0->vm_operation_result_selector_maybe;
  if ( vm_operation_result_selector_maybe <= 0x8Eu )
    v0->vm_left_value_1 = v214 ^ 0x2787FB21;
  else
    v0->vm_left_value_2 = v214 + 0x116ABA2E;
}
```

Arithmetic handlers usually have a `pushfq` (`__readeflags`) immediately after the operation, which captures the resulting flags.

```
// Simplified example. Differs from the actual implementation.
// push operation
if ( vm_instruction_opcode1 == 0x6A )
{
  if ( (_BYTE)p_vm_instruction_opcode5 == 2 )
  {
    LOWORD(v34) = v18;
  }
  else
  {
    LOWORD(v34) = HIWORD(v18);
    v20 = ((_BYTE)v0 - 0x48) & (v18 + v20) ^ 0xF;
  }
}

...

// stack pointer adjust
v26 = (p_vm_instruction_opcode5 | 0x47) - 4;
v27 = v0->vm_instruction_opcode1 - 0x6A;
if ( v0->vm_instruction_opcode1 == 0x6A )
{
  if ( v24 == 2 )
  {
    *(_QWORD *)vm_stack_pointer -= 2LL;
  }
  else
  {
    *(_QWORD *)vm_stack_pointer -= 8LL;
    LOBYTE(v26) = v27 ^ (((v27 ^ v26) - 0xF) | 0x28);
  }
  v26 = (unsigned __int8)v26 & 0xA8;
}
LOBYTE(p_vm_instruction_opcode5) = v0->vm_instruction_opcode1;
v28 = p_vm_instruction_opcode5 | (v26 + 0x8000FBFFLL);
if ( (_BYTE)p_vm_instruction_opcode5 == 0x1C )
{
  vm_opnd_maybe_1 = v0->vm_opnd_maybe_1;
  v28 = 0x400 - vm_opnd_maybe_1;
  if ( (char *)(vm_opnd_maybe_1 - 0x31AC9D7C) != vm_stack_pointer )
  {
    if ( v24 == 2 )
      *(_QWORD *)vm_stack_pointer += 2LL;
    else
      *(_QWORD *)vm_stack_pointer += 8LL;
    v28 = 0x3E4LL;
  }
}
```

Themida has no separate dispatcher: each handler jumps directly to the next handler.

```
// Simplified example. Differs from the actual implementation.
vmctx->vm_handlerKey -= 0x5AC92481;
vmctx->vm_handlerKey ^= 0x3F8BFC4F;
vmctx->vm_handlerKey ^= 0x2DDDE2;

v1 = *(unsigned __int16 *)(vmctx->vm_instruction_pointer + 4);
index = v1 - vmctx->vm_handlerKey;
vmctx->vm_handlerKey &= index;

next_handler = vmctx->vm_handlerTable[(unsigned __int16)index];

// VIP update
vmctx->vm_instruction_pointer += *(int *)vmctx->vm_instruction_pointer;

jump next_handler;
```

### VM EXIT:

The registers held in the VM CONTEXT are restored to the real registers, and the spinlock is released.

The exit path therefore performs a VM CONTEXT -> STACK -> REAL CONTEXT transfer.

```
// Simplified example. Differs from the actual implementation.
.themida:14000AEA2     pop     r8
.themida:14000AEA4     pop     r9
.themida:14000AEA6     pop     r10
.themida:14000AEA8     pop     r11
.themida:14000AEAA     pop     r12
.themida:14000AEAC     pop     r13
.themida:14000AEAE     pop     r14
.themida:14000AEB0     pop     r15
.themida:14000AEB2     pop     rdi
.themida:14000AEB3     pop     rsi
.themida:14000AEB4     pop     rbp
.themida:14000AEB5     pop     rbx
.themida:14000AEB6     pop     rdx
.themida:14000AEB7     pop     rcx
.themida:14000AEB8     pop     rax
.themida:14000AEB9     popfq
.themida:14000AEBA     popfq
.themida:14000AEBB     retn    0
```

---

## Handler Operation Principles and Characteristics

1. **Fetching the opcode from the bytecode**
   - Read 2 bytes with `*(unsigned __int16*)(vm_bytecodePtr + offset)` and combine them with `vm_handlerKey` using XOR, ADD, and similar operations.
   - Mask the result with `& 0xffff` and scale it by 8 (the size of one table entry) to index the **handler table** and obtain the "next handler" address.
2. **VM CONTEXT manipulation**
   - Arithmetic and logical operations are applied to `vReg_RAX`, `vReg_RBX`, `vm_flagsA/B`, `field_0[...]` (the stack), and so on.
   - Flags are either simulated in fields such as `vm_flagsA` instead of EFLAGS, or handled partly through pushfq/popfq and `__readeflags()`/`__writeeflags()` as additional obfuscation.
3. **Opcodes**
   - Logic such as "if (vm_instruction_opcodeN == 0x16) then pop 2 bytes from field_0[...]" is common.
   - Some code splits an opcode into its upper and lower 4 bits (`(value & 0xF0)>>4`, `(value & 0xF)`), so a main-opcode / sub-opcode scheme may exist.
4. **Moving to the next bytecode**
   - `vm_bytecodePtr += *(int*)(vm_bytecodePtr + someOffset)`
   - The offset differs per handler, e.g. +6, +3, +4.

Note: `vm_bytecodePtr` here appears to correspond to `vm_instruction_pointer` in the struct above; `vm_flagsA/B` and `field_0[...]` do not appear in that listing under those names.

---

## Handler Code Analysis Example

Below is a simplified form of an actual (obfuscated) handler:

```c
__int64 __fastcall sub_14003411C(VM_CONTEXT *v0) {
    if ((v0->vm_handlerKey & 2) != 0)
        v0->vm_flagsA += 0x7660110;
    // swap(vm_handlerKey, vm_flagsA)
    int tmp = v0->vm_handlerKey;
    v0->vm_handlerKey = v0->vm_flagsA;
    v0->vm_flagsA = tmp;

    // ... various VM instruction steps (operand fetch, execution, etc.)

    // bytecode fetch
    unsigned int opcode = *(unsigned __int16*)v0->vm_bytecodePtr;
    // next handler = handlerTable[opcode], i.e. byte offset opcode * 8
    __int64 nextHandler = *(__int64*)((char*)v0->vm_handlerTable + (opcode & 0xFFFF) * 8);

    // vm_bytecodePtr += *(int*)(v0->vm_bytecodePtr + 3)
    v0->vm_bytecodePtr += *(int*)(v0->vm_bytecodePtr + 3);

    // call/jmp nextHandler
    return ((handlerFuncType)nextHandler)(...);
}
```

Thus a **handler** runs in the order "(1) update the VM CONTEXT → (2) extract the opcode from the bytecode → (3) compute the next handler address → (4) advance the bytecode pointer."

---

## Themida Analyzer Development Ideas

1. Initial analysis phase
   - Load the binary and start executing the virtualized code at a given address.
   - Make all registers symbolic and emulate the instructions with the Triton engine.
   - Identify the key offsets in the VM context (the vip/vsp slots, etc.).
   - Symbolize vip/vsp so that subsequent memory accesses can be traced.
2. Handler pattern matching
   - Analyze the AST of the value written by the last store instruction of each handler.
   - Match it against known patterns to determine the operation the handler implements.
   - Pattern examples:
     - "[vsp] + [vsp]" → ADD
     - "~[vsp] | ~[vsp]" → NAND
     - "[vsp] >> ([vsp] & 0x3f)" → SHR
3. Control flow analysis and transformation
   - Convert the identified handlers into basic blocks.
   - Slice on the RIP register to find the address of the next basic block.
   - Connect the basic blocks into a control flow graph.
   - Convert all basic blocks to LLVM IR to recover executable code.
4. Optimization phase (optional)
   - Apply LLVM optimization passes.
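
The handler pattern-matching step listed above could look roughly like the sketch below. It works on a toy expression tree rather than on Triton's AST objects, so everything in it is an assumption made for illustration: the tuple encoding, the `vsp+N` slot names, and the helpers `is_stack_load` and `classify` are hypothetical and would need to be mapped onto whatever AST the symbolic engine actually produces.

```python
# Toy expression tree (hypothetical encoding, not Triton's AST):
#   ("load", "vsp+0")        value read from a virtual stack slot
#   ("const", 0x3F)          immediate
#   ("not", x)               bitwise NOT
#   (op, lhs, rhs)           binary node, op in {"add", "or", "and", "shr"}
VSP0 = ("load", "vsp+0")
VSP1 = ("load", "vsp+8")


def is_stack_load(node):
    """True if the node is a read from a virtual stack slot."""
    return node[0] == "load" and node[1].startswith("vsp")


def classify(node):
    """Return a mnemonic for the value a handler stores, or None if unknown."""
    op = node[0]
    if op == "add" and is_stack_load(node[1]) and is_stack_load(node[2]):
        return "ADD"                        # [vsp] + [vsp]
    if op == "or" and all(
        child[0] == "not" and is_stack_load(child[1]) for child in node[1:]
    ):
        return "NAND"                       # ~a | ~b equals ~(a & b)
    if op == "shr" and is_stack_load(node[1]):
        count = node[2]
        if (count[0] == "and" and is_stack_load(count[1])
                and count[2] == ("const", 0x3F)):
            return "SHR"                    # [vsp] >> ([vsp] & 0x3f)
    return None


print(classify(("add", VSP0, VSP1)))                              # ADD
print(classify(("or", ("not", VSP0), ("not", VSP1))))             # NAND
print(classify(("shr", VSP0, ("and", VSP1, ("const", 0x3F)))))    # SHR
```

Matching on structure rather than on instruction sequences is the point of the approach: the obfuscation changes the instructions inside a handler, while the expression that reaches the final store, once simplified, keeps a recognizable shape.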