├── .gitignore ├── DOCS.md ├── LICENSE ├── README.md ├── TODO.md ├── demos ├── objdump │ └── objdump.py └── patching │ ├── patch.py │ ├── patched │ ├── thing │ └── thing.c ├── dispatch ├── __init__.py ├── analysis │ ├── __init__.py │ ├── arm_analyzer.py │ ├── base_analyzer.py │ └── x86_analyzer.py ├── constructs.py ├── enums.py ├── formats │ ├── SectionDoubleP.py │ ├── __init__.py │ ├── base_executable.py │ ├── elf_executable.py │ ├── macho_executable.py │ ├── pe_executable.py │ └── section.py └── util │ ├── __init__.py │ └── trie.py ├── setup.py └── tests ├── analyze_one.py ├── binaries ├── arm32 │ ├── conditions-static.elf │ ├── conditions.elf │ ├── functions-static.elf │ ├── functions.elf │ ├── hello-static.elf │ ├── hello.elf │ ├── switch-static.elf │ ├── switch.elf │ ├── test2-static.elf │ └── test2.elf ├── src │ ├── conditions.c │ ├── functions.c │ ├── hello.c │ ├── switch.c │ └── test2.c ├── x86 │ ├── conditions-static.elf │ ├── conditions.elf │ ├── conditions.macho │ ├── conditions.pe │ ├── functions-static.elf │ ├── functions.elf │ ├── functions.macho │ ├── functions.pe │ ├── hello-static.elf │ ├── hello.elf │ ├── hello.macho │ ├── hello.pe │ ├── switch-static.elf │ ├── switch.elf │ ├── switch.macho │ ├── switch.pe │ ├── test2-static.elf │ ├── test2.elf │ ├── test2.macho │ └── test2.pe └── x86_64 │ ├── conditions-static.elf │ ├── conditions.elf │ ├── conditions.macho │ ├── conditions.pe │ ├── functions-static.elf │ ├── functions.elf │ ├── functions.macho │ ├── functions.pe │ ├── hello-static.elf │ ├── hello.elf │ ├── hello.macho │ ├── hello.pe │ ├── switch-static.elf │ ├── switch.elf │ ├── switch.macho │ ├── switch.pe │ ├── test2-static.elf │ ├── test2.elf │ ├── test2.macho │ └── test2.pe ├── test_analysis.py └── test_injection.py /.gitignore: -------------------------------------------------------------------------------- 1 | *.pyc 2 | .idea/ 3 | build/ 4 | dist/ 5 | dispatch.egg-info/ 6 | 
-------------------------------------------------------------------------------- /DOCS.md: -------------------------------------------------------------------------------- 1 | # dispatch Docs 2 | 3 | Though this code is reasonably well commented/logged (esp. the base classes), I wanted to write up some basic theory and broad docs to make contributing easier. 4 | 5 | So here we go. 6 | 7 | ## Class breakdown 8 | 9 | Larger classes (i.e. Executable and Analyzer) have a single base class which defines required functions for the subclasses and also provides basic implementations for a few helper functions. 10 | 11 | ### Executable 12 | 13 | The executable class should be subclassed for each file format to be supported. Currently, we provide executable parsers for the 3 most common binary types: 14 | 15 | * [ELF](./dispatch/formats/elf_executable.py) (used by Linux, Solaris, the BSDs, etc.) 16 | * [PE](./dispatch/formats/pe_executable.py) (used by Windows) 17 | * [MachO](./dispatch/formats/macho_executable.py) (used by OS X) 18 | 19 | It is preferred to use existing (license compatible) libraries to do the low-level executable parsing to reduce errors that we could make and keep the codebase small. 20 | 21 | #### Purpose 22 | 23 | The executable classes are responsible for parsing the executable, handing off the "chunks" of the binary to the analyzer, and doing the binary rewriting part of the patching. 24 | 25 | The executable classes currently extract and keep the following: 26 | 27 | * Segments/sections 28 | * Referenced libraries 29 | * Symbol table(s) 30 | * Strings 31 | 32 | The executable classes also keep an array for the functions of the binary, however it is up to the analyzer to identify and store those. 33 | 34 | ### Analyzer 35 | 36 | The analyzer class should be subclassed for each architecture to be supported. If two architectures are very similar (e.g. x86/i386 and x86\_64), they should be put into one file. 
Also, if possible, a superset architecture (e.g. x86\_64) should subclass the "simpler" subset architecture (e.g. x86). 37 | 38 | We currently provide analysis classes for 4 architectures: 39 | 40 | * [x86](./dispatch/analysis/x86_analyzer.py) 41 | * [x86\_64](./dispatch/analysis/x86_analyzer.py) 42 | * [ARM](./dispatch/analysis/arm_analyzer.py) 43 | * [AArch64](./dispatch/analysis/arm_analyzer.py) (a.k.a. ARM64) 44 | 45 | Currently, all of these analyzers are based around the [capstone engine](https://github.com/aquynh/capstone), but any disassembler could be used with minimal effort required to switch. 46 | 47 | #### Purpose 48 | 49 | The analyzer classes are responsible for doing the actual analysis of binaries: 50 | 51 | * Disassembling the binary 52 | * Identifying constructs in a binary (e.g. functions, basic blocks, jump tables) 53 | * Generating CFGs 54 | 55 | The analyzer also provides architecture-specific helper methods and constants for use in patching (e.g. `REG_NAMES`, `IP_REGS`, `SP_REGS`, `NOP_INSTRUCTION`). 56 | 57 | ## Loading & Analysis Flow 58 | 59 | The following is a breakdown of what happens when a binary is loaded and analyzed: 60 | 61 | 1. `read_executable` (in [\_\_init\_\_.py](./dispatch/__init__.py)) identifies the binary format based on starting magic bytes. 62 | 2. The initializer for the found format is called, which loads the binary into its helper (e.g. [pyelftools](https://github.com/eliben/pyelftools)) for parsing. 63 | 3. The format initializer parses out some basic information from the loaded binary and stores it for further use (e.g. the sections/segments of the binary, which segment is the main read&executable segment, etc.) 64 | 4. `analyze()` (defined in [base\_analyzer.py](./dispatch/analysis/base_analyzer.py)) is called by a script on the returned executable instance, which... 65 | 5. Disassembles the binary into a Trie for quick lookups 66 | 6. Asks the executable to parse and store the symbol table 67 | 7. 
Identifies functions through a couple of methods (see below) 68 | 8. Populates the (empty) functions with Instructions 69 | 9. Does basic block analysis on the (now populated) Functions 70 | 10. Marks cross-references 71 | 11. Marks strings 72 | 73 | Once this is done, everything in the binary has been set up and can be used. 74 | 75 | ## Implementation Notes 76 | 77 | ### Function analysis 78 | 79 | Currently, functions are marked in two ways: 80 | 81 | 1. Through symbol tables (if applicable) 82 | 2. Through prologue/epilogue matching 83 | 84 | Since symbol tables and prologue/epilogue matching occur at different times, the binary's `.functions` array is filled with what are essentially placeholder functions (i.e. functions without instructions stored) until the functions are formally populated (step 8 above). 85 | 86 | The need for this two-step find-and-fill process will be completely removed soon when a single structure represents all bytes in the binary along with what they represent. 87 | Basically, instead of a Function having a normal array, the array will actually just be a view into this backing data structure (since the offset and size are already known). 88 | This will fix a lot of potential issues stemming from arrays not being synchronized and whatnot, and will allow for something like the following to work: 89 | 90 | ```python 91 | main = executable.function_named('main') 92 | main.bbs[0].instructions[0] = '\xcc' 93 | main.save('modified') 94 | ``` 95 | 96 | 97 | ### Patching 98 | 99 | #### ELF 100 | 101 | As noted, we use a method derived from [http://vxheavens.com/lib/vsc01.html](http://vxheavens.com/lib/vsc01.html). 102 | 103 | #### MachO 104 | 105 | MachO binaries are very kind and provide us with room to just drop in a new section because of the large amount of padding after the headers and before the rest of the binary. 
106 | All we have to do is create the new load command and have it point to the end of the executable where we drop our (address aligned) injected code. 107 | 108 | #### PE 109 | 110 | Since we are already using [pefile](https://github.com/erocarrera/pefile), we are able to let [SectionDoubleP](http://git.n0p.cc/?p=SectionDoubleP.git;a=summary) do the heavy lifting of adding a new section. 111 | 112 | 113 | ### Why a Trie? 114 | 115 | Because it gives us fast (i.e. sub-linear time) lookups by address, while also providing a way to get ranges of the binary without a linear search. 116 | 117 | ### X-Ref detection 118 | 119 | Currently we do _very_ simplistic x-ref detection by finding any instruction operands that happen to be immediates (i.e. constant values encoded in the instruction) and that happen to land in mapped virtual memory. 120 | While this is potentially error-prone, it seems to work very well in practice, and so we haven't seen a need to improve it yet. 121 | 122 | ### String detection 123 | 124 | Similar to x-ref detection, string detection is very simplistic: any time 3 or more printable characters appear in a row in certain sections, the run is marked as a string. 125 | Again, while this is definitely error-prone, it seems to end up working just fine in almost all cases so far, so we haven't seen a need to improve it. 
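To make the string heuristic concrete, here is a minimal standalone sketch of the idea. It is not the analyzer's actual code: `find_strings`, `base_vaddr` and `min_len` are names made up for this illustration, and the threshold of 3 simply follows the description above (the analyzer may use a different one).

```python
import re
import string


def find_strings(data, base_vaddr=0, min_len=3):
    """Return {vaddr: text} for each run of at least min_len printable bytes.

    Hypothetical helper for illustration only; not part of the dispatch API.
    """
    # Escape the printable set so ']', '\\' and '^' stay literal in the class
    printable = re.escape(string.printable.encode("ascii"))
    pattern = re.compile(b"[" + printable + b"]{" + str(min_len).encode("ascii") + b",}")
    return {base_vaddr + m.start(): m.group().decode("ascii")
            for m in pattern.finditer(data)}


# Pretend this is the raw content of a data section mapped at 0x1000
section = b"\x00\x01Winner!\x00\xff\xfeNope!\x00ab\x00"
found = find_strings(section, base_vaddr=0x1000)

for vaddr in sorted(found):
    print("{:#x}: {!r}".format(vaddr, found[vaddr]))
```

Note that the two-character run `ab` is skipped because it is shorter than `min_len`, which is the kind of trade-off the threshold controls: a lower value finds more real strings but also more noise.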
-------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2016 NYU OSIRIS Lab 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 
22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | Dispatch 2 | ======== 3 | 4 | Programmatic binary disassembly and patching 5 | 6 | ## Features 7 | * Support for all 3 common executable formats (ELF, MachO, PE) 8 | * Support for x86(-64) and ARM (including AArch64) 9 | * MIPS eventually 10 | 11 | ## Quick Example 12 | ```python 13 | import dispatch 14 | ex = dispatch.read_executable('/bin/cat') 15 | ex.analyze() 16 | print ex.functions 17 | ``` 18 | -------------------------------------------------------------------------------- /TODO.md: -------------------------------------------------------------------------------- 1 | * Store read/written registers in Instruction 2 | * Use these to properly implement references_ip() and references_sp() 3 | * Change MachO and PE replace_instructions() to the new format (args: vaddr, asm) 4 | * Load binary from stream 5 | * Stop CFG flow after call to exit() 6 | * ARM analysis 7 | * Improve x86 function analysis with flow analysis 8 | * Generators for common instructions for all platforms (jump, call) 9 | * Shift to having a single mmap backed instance of the binary with everything providing views into that data 10 | -------------------------------------------------------------------------------- /demos/objdump/objdump.py: -------------------------------------------------------------------------------- 1 | from dispatch import * 2 | from sys import argv 3 | 4 | def main(): 5 | if len(argv) < 2: 6 | print "Usage: python objdump.py [binary]" 7 | return 8 | 9 | exe = read_executable(argv[1]) 10 | exe.analyze() 11 | 12 | for function in exe.iter_functions(): 13 | print "{:08x} <{}>:".format(function.address, function.name) 14 | for ins in function.instructions: 15 | ins_bytes = ' '.join(["{:02x}".format(x) for x in ins.raw]) 16 | print " {:08x}\t{:<20}\t{!s}".format(ins.address, ins_bytes, ins) 17 | 
print "" # newline for space 18 | 19 | if __name__ == '__main__': 20 | main() 21 | -------------------------------------------------------------------------------- /demos/patching/patch.py: -------------------------------------------------------------------------------- 1 | import os 2 | from dispatch import * 3 | 4 | # Read in our executable 5 | exe = read_executable('thing') 6 | # ... and analyze it 7 | exe.analyze() 8 | 9 | # Find the main function (main for Linux, _main for OS X) 10 | main = exe.function_named('main') or exe.function_named('_main') 11 | 12 | for i in main.instructions: 13 | # Find the first jne which happens to be the "winner" check 14 | if i.mnemonic == 'jne': 15 | ins = i 16 | exe.replace_at(i.address, '') # NOP it out 17 | exe.save('patched') # Save 18 | os.system("chmod +x patched") # and make the patched binary executable 19 | 20 | break 21 | -------------------------------------------------------------------------------- /demos/patching/patched: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/osirislab/dispatch/c73765c0a2586cc6ef388eb43ba929bcb6c06b24/demos/patching/patched -------------------------------------------------------------------------------- /demos/patching/thing: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/osirislab/dispatch/c73765c0a2586cc6ef388eb43ba929bcb6c06b24/demos/patching/thing -------------------------------------------------------------------------------- /demos/patching/thing.c: -------------------------------------------------------------------------------- 1 | #include <stdio.h> 2 | 3 | int main() { 4 | int i; 5 | scanf("%d", &i); 6 | if (i == 0x1337) { 7 | printf("Winner!"); 8 | } else { 9 | printf("Nope!"); 10 | } 11 | return 0; 12 | } 13 | -------------------------------------------------------------------------------- /dispatch/__init__.py: 
-------------------------------------------------------------------------------- 1 | import logging 2 | import os 3 | 4 | from .formats.elf_executable import ELFExecutable 5 | from .formats.pe_executable import PEExecutable 6 | from .formats.macho_executable import MachOExecutable 7 | 8 | from .enums import * 9 | 10 | MAGICS = {'\x7f\x45\x4c\x46': FORMAT.ELF, 11 | '\x4d\x5a': FORMAT.PE, 12 | '\x50\x45\x00\x00': FORMAT.PE, 13 | '\xFE\xED\xFA\xCE': FORMAT.MACH_O, 14 | '\xFE\xED\xFA\xCF': FORMAT.MACH_O, 15 | '\xCE\xFA\xED\xFE': FORMAT.MACH_O, 16 | '\xCF\xFA\xED\xFE': FORMAT.MACH_O} 17 | 18 | def _identify_format(fh): 19 | maxlen = max([len(m) for m in MAGICS]) 20 | 21 | fh.seek(0) 22 | header = fh.read(maxlen) 23 | 24 | for m in MAGICS: 25 | if header.startswith(m): 26 | return MAGICS[m] 27 | 28 | return None 29 | 30 | def read_executable(file_path): 31 | if not os.path.exists(file_path): 32 | raise Exception('No such file') 33 | 34 | fmt = _identify_format(open(file_path, 'rb')) 35 | 36 | if fmt == FORMAT.ELF: 37 | exe = ELFExecutable(file_path) 38 | elif fmt == FORMAT.PE: 39 | exe = PEExecutable(file_path) 40 | elif fmt == FORMAT.MACH_O: 41 | exe = MachOExecutable(file_path) 42 | else: 43 | raise Exception('Could not determine executable format.') 44 | 45 | logging.info('Extracting symbol table') 46 | exe._extract_symbol_table() 47 | 48 | return exe -------------------------------------------------------------------------------- /dispatch/analysis/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/osirislab/dispatch/c73765c0a2586cc6ef388eb43ba929bcb6c06b24/dispatch/analysis/__init__.py -------------------------------------------------------------------------------- /dispatch/analysis/arm_analyzer.py: -------------------------------------------------------------------------------- 1 | import capstone 2 | from capstone import * 3 | from capstone.arm_const import * 4 | from Queue import Queue 
5 | import struct 6 | 7 | from ..constructs import * 8 | from .base_analyzer import BaseAnalyzer 9 | 10 | class ARM_Analyzer(BaseAnalyzer): 11 | def __init__(self, executable): 12 | super(ARM_Analyzer, self).__init__(executable) 13 | 14 | if self.executable.entry_point() & 0x1: 15 | self._disassembler = Cs(CS_ARCH_ARM, CS_MODE_THUMB) 16 | else: 17 | self._disassembler = Cs(CS_ARCH_ARM, CS_MODE_ARM) 18 | 19 | self._disassembler.detail = True 20 | self._disassembler.skipdata = True 21 | 22 | self.REG_NAMES = dict([(v,k[8:].lower()) for k,v in capstone.arm_const.__dict__.iteritems() if k.startswith('ARM_REG')]) 23 | self.IP_REGS = set([11]) 24 | self.SP_REGS = set([12]) 25 | self.NOP_INSTRUCTION = '\x00\x00\x00\x00' 26 | 27 | def _gen_ins_map(self): 28 | # Again, since ARM binaries can have code using both instruction sets, we basically have to make a CFG and 29 | # disassemble each BB as we find them. 30 | 31 | # vaddr -> disassembly type 32 | bb_disasm_mode = {} 33 | 34 | # If we find a constants table (used for pc-relative ld's), mark it as a known end because it always comes after 35 | # the end of a BB/function 36 | known_ends = set() 37 | 38 | entry = self.executable.entry_point() 39 | 40 | if entry & 0b1: 41 | initial_mode = CS_MODE_THUMB 42 | else: 43 | initial_mode = CS_MODE_ARM 44 | 45 | entry &= ~0b1 46 | 47 | to_analyze = Queue() 48 | to_analyze.put((entry, initial_mode, )) 49 | 50 | bb_disasm_mode[entry] = initial_mode 51 | 52 | # TODO: make this much cleaner, not use raw mnemonic checks, etc 53 | while not to_analyze.empty(): 54 | start_vaddr, mode = to_analyze.get() 55 | 56 | self._disassembler.mode = mode 57 | 58 | logging.debug('Analyzing code at address {} in {} mode' 59 | .format(hex(start_vaddr), 'thumb' if mode == CS_MODE_THUMB else 'arm')) 60 | 61 | # Stop at either the next BB listed or the end of the section 62 | cur_section = self.executable.section_containing_vaddr(start_vaddr) 63 | section_end_vaddr = cur_section.vaddr + cur_section.size 64 
| end_vaddr = min([a for a in bb_disasm_mode if a > start_vaddr] or [section_end_vaddr]) 65 | 66 | # Force the low bit 0 67 | start_vaddr &= ~0b1 68 | 69 | code = self.executable.get_binary_vaddr_range(start_vaddr, end_vaddr) 70 | 71 | for ins in self._disassembler.disasm(code, start_vaddr): 72 | if ins.id == 0: # We hit a data byte, so we must have gotten to the end of this bb/function 73 | break 74 | elif ins.address in known_ends: # At a constants table, so we know we're at the end of a bb/function 75 | break 76 | 77 | our_ins = instruction_from_cs_insn(ins, self.executable) 78 | self.ins_map[ins.address] = our_ins 79 | 80 | if self._insn_is_epilogue(our_ins): 81 | break 82 | 83 | # Branch immediate 84 | if ins.mnemonic.startswith('b') and ins.operands[-1].type == CS_OP_IMM: 85 | jump_dst = ins.operands[-1].imm 86 | 87 | if self.executable.vaddr_is_executable(jump_dst) and jump_dst not in bb_disasm_mode: 88 | if 'x' in ins.mnemonic: 89 | next_mode = CS_MODE_THUMB if jump_dst & 0x1 else CS_MODE_ARM 90 | else: 91 | next_mode = mode 92 | 93 | jump_dst &= ~0b1 94 | 95 | logging.debug('Found branch to address {} in instruction at {}' 96 | .format(hex(int(jump_dst)), hex(int(ins.address)))) 97 | bb_disasm_mode[jump_dst] = next_mode 98 | to_analyze.put((jump_dst, next_mode, )) 99 | 100 | # load/move function address as in the case of libc_start_main 101 | elif ins.mnemonic.startswith('ld') or ins.mnemonic.startswith('mov'): 102 | # load/move immediate 103 | if ins.operands[-1].type == CS_OP_IMM and self.executable.vaddr_is_executable(ins.operands[-1].imm): 104 | referenced_addr = ins.operands[-1].imm 105 | if referenced_addr not in bb_disasm_mode: 106 | logging.debug('Found reference to executable address {} in instruction at {}' 107 | .format(hex(int(referenced_addr)), hex(int(ins.address)))) 108 | 109 | next_mode = CS_MODE_THUMB if referenced_addr & 0x1 else CS_MODE_ARM 110 | referenced_addr &= ~0b1 111 | bb_disasm_mode[referenced_addr] = next_mode 112 | 
to_analyze.put((referenced_addr, next_mode, )) 113 | 114 | # load/move PC-relative entry 115 | elif ins.operands[-1].type == CS_OP_MEM and ins.operands[-1].mem.base == ARM_REG_PC: 116 | ''' 117 | ARM THUMB Instruction Set sec. 5.6.1: 118 | 119 | Note: The value specified by #Imm is a full 10-bit address, but must always be word-aligned 120 | (ie with bits 1:0 set to 0), since the assembler places #Imm >> 2 in field Word8. 121 | 122 | Note: The value of the PC will be 4 bytes greater than the address of this instruction, but bit 123 | 1 of the PC is forced to 0 to ensure it is word aligned. 124 | ''' 125 | ptr = (ins.address + 4 + ins.operands[-1].mem.disp) & (~0b11) 126 | 127 | known_ends.add(ptr) 128 | 129 | referenced_bytes = self.executable.get_binary_vaddr_range(ptr, ptr + self.executable.address_length()) 130 | referenced_addr = struct.unpack(self.executable.pack_endianness + self.executable.address_pack_type, 131 | referenced_bytes)[0] 132 | 133 | if self.executable.vaddr_is_executable(referenced_addr): 134 | logging.debug('Found reference to address {} through const table at {} in instruction at {}' 135 | .format(hex(int(referenced_addr)), hex(int(ptr)), hex(int(ins.address)))) 136 | 137 | if referenced_addr not in bb_disasm_mode: 138 | next_mode = CS_MODE_THUMB if referenced_addr & 0x1 else CS_MODE_ARM 139 | referenced_addr &= ~0b1 140 | bb_disasm_mode[referenced_addr] = next_mode 141 | to_analyze.put((referenced_addr, next_mode, )) 142 | 143 | self._disasm_mode = bb_disasm_mode 144 | 145 | def disassemble_range(self, start_vaddr, end_vaddr): 146 | if start_vaddr & 0x1: 147 | self._disassembler.mode = CS_MODE_THUMB 148 | else: 149 | self._disassembler.mode = CS_MODE_ARM 150 | 151 | start_vaddr &= ~0b1 152 | 153 | size = end_vaddr - start_vaddr 154 | self.executable.binary.seek(self.executable.vaddr_binary_offset(start_vaddr)) 155 | 156 | instructions = [] 157 | 158 | for ins in self._disassembler.disasm(self.executable.binary.read(size), start_vaddr): 159 | 
if ins.id: 160 | instructions.append(instruction_from_cs_insn(ins, self.executable)) 161 | 162 | return instructions 163 | 164 | def _insn_is_epilogue(self, ins): 165 | """ 166 | Determines whether the instruction is a typical function epilogue 167 | :param ins: Instruction to test 168 | :return: True if the instruction is an epilogue 169 | """ 170 | 171 | # b** {..., lr} 172 | if ins.mnemonic.startswith('b') and ins.operands[0].type == Operand.REG and \ 173 | ins.operands[0].reg == ARM_REG_LR: 174 | return True 175 | 176 | # pop {..., pc} 177 | elif ins.mnemonic == 'pop' and \ 178 | any(o.reg == ARM_REG_PC for o in ins.operands if o.type == Operand.REG): 179 | return True 180 | 181 | return False 182 | 183 | def _identify_functions(self): 184 | STATE_NOT_IN_FUNC, STATE_IN_FUNCTION = 0, 1 185 | 186 | state = STATE_NOT_IN_FUNC 187 | 188 | cur_func = None 189 | 190 | for cur_ins in self.ins_map: 191 | if cur_ins.address in self.executable.functions: 192 | state = STATE_IN_FUNCTION 193 | cur_func = self.executable.functions[cur_ins.address] 194 | 195 | logging.debug('Analyzing function {} with pre-populated size {}'.format(cur_func, cur_func.size)) 196 | 197 | if not cur_func.size: 198 | # Function from symtab has no size, so start to keep track of it 199 | cur_func.size += cur_ins.size 200 | 201 | elif cur_func and cur_func.contains_address(cur_ins.address): 202 | # ARM sometimes stores pointers to various things after the function body, but this data is included in 203 | # ELF's (and maybe others) symbol size, so we have to actively look for the actual end of the function. 
204 | 205 | if self._insn_is_epilogue(cur_ins): 206 | state = STATE_NOT_IN_FUNC 207 | logging.debug('Identified function epilogue at {}'.format(hex(cur_ins.address))) 208 | cur_func.size -= (cur_func.address + cur_func.size) - (cur_ins.address + cur_ins.size) 209 | cur_func = None 210 | 211 | elif state == STATE_NOT_IN_FUNC and cur_ins.mnemonic == 'push' and \ 212 | any(o.reg == ARM_REG_LR for o in cur_ins.operands if o.type == Operand.REG): 213 | 214 | state = STATE_IN_FUNCTION 215 | logging.debug( 216 | 'Identified function by prologue at {} with prologue instruction {}'.format(hex(cur_ins.address), 217 | cur_ins)) 218 | 219 | cur_func = Function(cur_ins.address, 220 | cur_ins.size, 221 | 'sub_' + hex(cur_ins.address)[2:], 222 | self.executable) 223 | 224 | elif state == STATE_IN_FUNCTION and self._insn_is_epilogue(cur_ins): 225 | state = STATE_NOT_IN_FUNC 226 | cur_func.size += cur_ins.size 227 | 228 | logging.debug('Identified function epilogue at {}'.format(hex(cur_ins.address))) 229 | 230 | self.executable.functions[cur_func.address] = cur_func 231 | 232 | cur_func = None 233 | 234 | elif state == STATE_IN_FUNCTION: 235 | cur_func.size += cur_ins.size 236 | 237 | def cfg(self): 238 | edges = set() 239 | 240 | for f in self.executable.iter_functions(): 241 | if f.type == Function.NORMAL_FUNC: 242 | for ins in f.instructions: 243 | if ins.is_call() and ins.operands[-1].type == Operand.IMM: 244 | call_addr = ins.operands[-1].imm 245 | if self.executable.vaddr_is_executable(call_addr): 246 | edge = CFGEdge(ins.address, call_addr, CFGEdge.CALL) 247 | edges.add(edge) 248 | 249 | for cur_bb in f.bbs: 250 | last_ins = cur_bb.instructions[-1] 251 | 252 | if last_ins.is_jump(): 253 | if last_ins.operands[-1].type == Operand.IMM: 254 | jmp_addr = last_ins.operands[-1].imm 255 | 256 | if self.executable.vaddr_is_executable(jmp_addr): 257 | if last_ins.mnemonic == 'b' or last_ins.mnemonic == 'bx': 258 | edge = CFGEdge(last_ins.address, jmp_addr, CFGEdge.DEFAULT) 259 | 
edges.add(edge) 260 | else: # Conditional jump 261 | # True case 262 | edge = CFGEdge(last_ins.address, jmp_addr, CFGEdge.COND_JUMP, True) 263 | edges.add(edge) 264 | 265 | # Default/fall-through case 266 | next_addr = last_ins.address + last_ins.size 267 | edge = CFGEdge(last_ins.address, next_addr, CFGEdge.COND_JUMP, False) 268 | edges.add(edge) 269 | elif last_ins != f.instructions[-1]: 270 | # Otherwise, if we're just at the end of a BB that's not the end of the function, just fall 271 | # through to the next instruction 272 | edge = CFGEdge(last_ins.address, last_ins.address + last_ins.size, CFGEdge.DEFAULT) 273 | edges.add(edge) 274 | 275 | return edges 276 | 277 | 278 | class ARM_64_Analyzer(ARM_Analyzer): 279 | def __init__(self, executable): 280 | super(ARM_64_Analyzer, self).__init__(executable) 281 | 282 | if self.executable.entry_point() & 0x1: 283 | self._disassembler = Cs(CS_ARCH_ARM64, CS_MODE_THUMB) 284 | else: 285 | self._disassembler = Cs(CS_ARCH_ARM64, CS_MODE_ARM) 286 | 287 | self._disassembler.detail = True 288 | self._disassembler.skipdata = True 289 | 290 | self.REG_NAMES = dict([(v,k[10:].lower()) for k,v in capstone.arm64_const.__dict__.iteritems() if k.startswith('ARM64_REG')]) 291 | self.IP_REGS = set() 292 | self.SP_REGS = set([4, 5]) 293 | self.NOP_INSTRUCTION = '\x1F\x20\x03\xD5' 294 | -------------------------------------------------------------------------------- /dispatch/analysis/base_analyzer.py: -------------------------------------------------------------------------------- 1 | import logging 2 | import re 3 | import string 4 | from ..util.trie import Trie 5 | 6 | from ..constructs import * 7 | 8 | class BaseAnalyzer(object): 9 | ''' 10 | The analyzers are responsible for taking raw instructions from the executable and transforming them 11 | into higher-level constructs. This includes identifying functions, basic blocks, etc. 
12 | 13 | The analyzers also provide some helper methods (ins_*) which are quick ways to determine what an instruction does. 14 | This can include determining if an instruction is sensitive to location, is a call/jump, etc. 15 | ''' 16 | def __init__(self, executable): 17 | self.executable = executable 18 | 19 | self.ins_map = Trie() 20 | 21 | def __repr__(self): 22 | return '<{} for {} {} \'{}\'>'.format(self.__class__.__name__, 23 | self.executable.architecture, 24 | self.executable.__class__.__name__, 25 | self.executable.fp) 26 | 27 | def _gen_ins_map(self): 28 | ''' 29 | Generates the instruction lookup map (self.ins_map, a Trie keyed by address) 30 | :return: None 31 | ''' 32 | raise NotImplementedError() 33 | 34 | def disassemble_range(self, start_vaddr, end_vaddr): 35 | ''' 36 | Return an array of instructions disassembled between start and end 37 | :param start_vaddr: The virtual address to start disassembly at 38 | :param end_vaddr: The virtual address to stop disassembly at (exclusive) 39 | :return: Array of disassembled instructions 40 | ''' 41 | raise NotImplementedError() 42 | 43 | def _identify_functions(self): 44 | ''' 45 | Iterates through instructions and identifies functions by prologues and epilogues 46 | :return: None 47 | ''' 48 | raise NotImplementedError() 49 | 50 | def _populate_func_instructions(self): 51 | ''' 52 | Iterates through all found functions and adds the instructions inside each function to its Function object 53 | :return: None 54 | ''' 55 | for f in self.executable.iter_functions(): 56 | # some formats (such as macho) have special functions 57 | # that don't actually exist in the binary, so we ignore them 58 | if f.address in self.ins_map: 59 | f.instructions = self.ins_map[f.address : f.address+f.size] 60 | else: 61 | f.instructions = [] 62 | 63 | def _identify_strings(self): 64 | ''' 65 | Extracts all strings from the executable and stores them in the strings dict (addr -> string) 66 | :return: None 67 | ''' 68 | # 
https://stackoverflow.com/questions/6804582/extract-strings-from-a-binary-file-in-python 69 | chars = re.escape(string.printable) # escape ']', '\' and '^' so they stay literal inside the character class 70 | shortest_run = 2 71 | regexp = '[%s]{%d,}' % (chars, shortest_run) 72 | pattern = re.compile(regexp) 73 | 74 | for section in self.executable.iter_string_sections(): 75 | for found_string in pattern.finditer(section.raw): 76 | vaddr = section.vaddr + found_string.start() 77 | self.executable.strings[vaddr] = String(found_string.group(), vaddr, self.executable) 78 | 79 | 80 | def _mark_xrefs(self): 81 | ''' 82 | Identify all the xrefs from the executable and store them in the xrefs dict (addr -> set of referencing addrs) 83 | :return: None 84 | ''' 85 | for ins in self.ins_map: 86 | for operand in ins.operands: 87 | if operand is not None and operand.type == Operand.IMM and self.executable.vaddr_binary_offset(operand.imm) is not None: 88 | if operand.imm in self.executable.xrefs: 89 | self.executable.xrefs[operand.imm].add(ins.address) 90 | else: 91 | self.executable.xrefs[operand.imm] = set([ins.address]) 92 | 93 | def analyze(self): 94 | ''' 95 | Run the analysis subroutines. 
96 | Generates the instruction map, extracts symbol tables, identifies functions/BBs, and "prettifies" instruction op_str's 97 | :return: None 98 | ''' 99 | logging.info('Generating instruction map') 100 | self._gen_ins_map() 101 | 102 | logging.info('Extracting symbol table') 103 | self.executable._extract_symbol_table() 104 | 105 | logging.info('Identifying functions') 106 | self._identify_functions() 107 | 108 | # TODO: CFA 109 | 110 | logging.info('Populating function instructions') 111 | self._populate_func_instructions() 112 | logging.info('Identifying basic blocks') 113 | for func in self.executable.iter_functions(): 114 | func.do_bb_analysis() 115 | logging.info('Marking XRefs') 116 | self._mark_xrefs() 117 | 118 | logging.info('Identifying strings') 119 | self._identify_strings() 120 | 121 | def cfg(self): 122 | ''' 123 | Creates a control flow graph for the binary 124 | :return: List of CFGEdges that describe the edges of the graph. 125 | ''' 126 | raise NotImplementedError() 127 | -------------------------------------------------------------------------------- /dispatch/analysis/x86_analyzer.py: -------------------------------------------------------------------------------- 1 | import capstone 2 | from capstone import * 3 | from capstone.x86_const import * 4 | import logging 5 | import collections 6 | import struct 7 | 8 | from ..constructs import * 9 | from .base_analyzer import BaseAnalyzer 10 | 11 | class X86_Analyzer(BaseAnalyzer): 12 | def __init__(self, executable): 13 | super(X86_Analyzer, self).__init__(executable) 14 | 15 | self._disassembler = Cs(CS_ARCH_X86, CS_MODE_32) 16 | self._disassembler.detail = True 17 | self._disassembler.skipdata = True 18 | 19 | self.REG_NAMES = dict([(v,k[8:].lower()) for k,v in capstone.x86_const.__dict__.iteritems() if k.startswith('X86_REG')]) 20 | self.IP_REGS = set([26, 34, 41]) 21 | self.SP_REGS = set([6, 7, 20, 30, 36, 44, 47, 48]) 22 | self.NOP_INSTRUCTION = '\x90' 23 | 24 | def _gen_ins_map(self): 25 | 
for section in self.executable.sections_to_disassemble(): 26 | for ins in self._disassembler.disasm(section.raw, section.vaddr): 27 | if ins.id: # .byte "instructions" have an id of 0 28 | self.ins_map[ins.address] = instruction_from_cs_insn(ins, self.executable) 29 | 30 | def disassemble_range(self, start_vaddr, end_vaddr): 31 | size = end_vaddr - start_vaddr 32 | self.executable.binary.seek(self.executable.vaddr_binary_offset(start_vaddr)) 33 | 34 | instructions = [] 35 | 36 | for ins in self._disassembler.disasm(self.executable.binary.read(size), start_vaddr): 37 | if ins.id: 38 | instructions.append(instruction_from_cs_insn(ins, self.executable)) 39 | else: 40 | print ins 41 | 42 | return instructions 43 | 44 | def ins_modifies_esp(self, instruction): 45 | return 'pop' in instruction.mnemonic or 'push' in instruction.mnemonic \ 46 | or (bool(instruction.operands) and instruction.operands[0].type == Operand.REG and instruction.operands[0].reg in self.SP_REGS) 47 | 48 | def _identify_functions(self): 49 | """ 50 | This has to take into account 3 possibilities: 51 | 52 | 1) No symbols whatsoever. Here we basically end up just doing basic prologue/epilogue analysis and hoping that 53 | the functions aren't weird and are relatively predictable. 54 | 55 | 2) Symbols with no size. We use the symbols we have as known starting points (replacing the prologue) but still 56 | look for an epilogue (or the start of another function) to signal the end of the function. 57 | 58 | 3) Symbols with size. 
59 | """ 60 | 61 | STATE_NOT_IN_FUNC, STATE_IN_PROLOGUE, STATE_IN_FUNCTION = 0, 1, 2 62 | 63 | state = STATE_NOT_IN_FUNC 64 | 65 | cur_func = None 66 | 67 | ops = [] 68 | 69 | for cur_ins in self.ins_map: 70 | if cur_ins.address in self.executable.functions: 71 | state = STATE_IN_FUNCTION 72 | cur_func = self.executable.functions[cur_ins.address] 73 | 74 | logging.debug('Analyzing function {} with pre-populated size {}'.format(cur_func, cur_func.size)) 75 | 76 | if not cur_func.size: 77 | # Function from symtab has no size, so start to keep track of it 78 | cur_func.size += cur_ins.size 79 | 80 | elif cur_func and cur_func.contains_address(cur_ins.address): 81 | # Current function under analysis has a pre-populated size so just continue on until we get to the end 82 | continue 83 | 84 | # Windows sometimes puts `mov edi, edi` as the first instruction in a function for hot patching, so we check 85 | # for this case to make sure the function we detect starts at the correct address. 86 | # https://blogs.msdn.microsoft.com/oldnewthing/20110921-00/?p=9583 87 | elif state == STATE_NOT_IN_FUNC and cur_ins.mnemonic == 'mov' and \ 88 | cur_ins.operands[0].type == Operand.REG and \ 89 | cur_ins.operands[0].reg == X86_REG_EDI and \ 90 | cur_ins.operands[1].type == Operand.REG and \ 91 | cur_ins.operands[1].reg == X86_REG_EDI: 92 | 93 | state = STATE_IN_PROLOGUE 94 | ops.append(cur_ins) 95 | 96 | elif state in (STATE_NOT_IN_FUNC, STATE_IN_PROLOGUE) and cur_ins.mnemonic == 'push' and \ 97 | cur_ins.operands[0].type == Operand.REG and \ 98 | cur_ins.operands[0].reg in (X86_REG_EBP, X86_REG_RBP): 99 | 100 | state = STATE_IN_PROLOGUE 101 | ops.append(cur_ins) 102 | 103 | elif state == STATE_IN_PROLOGUE and \ 104 | cur_ins.mnemonic == 'mov' and \ 105 | cur_ins.operands[0].type == Operand.REG and \ 106 | cur_ins.operands[0].reg in (X86_REG_EBP, X86_REG_RBP) and \ 107 | cur_ins.operands[1].type == Operand.REG and \ 108 | cur_ins.operands[1].reg in self.SP_REGS: 109 | 110 | 111 | 
state = STATE_IN_FUNCTION 112 | ops.append(cur_ins) 113 | 114 | logging.debug('Identified function by prologue at {} with prologue ops {}'.format(hex(cur_ins.address), ops)) 115 | cur_func = Function(ops[0].address, 116 | sum(i.size for i in ops), 117 | 'sub_'+hex(ops[0].address)[2:], 118 | self.executable) 119 | ops = [] 120 | 121 | elif state == STATE_IN_FUNCTION and 'ret' in cur_ins.mnemonic: 122 | state = STATE_NOT_IN_FUNC 123 | cur_func.size += cur_ins.size 124 | 125 | logging.debug('Identified function epilogue at {}'.format(hex(cur_ins.address))) 126 | 127 | self.executable.functions[cur_func.address] = cur_func 128 | 129 | cur_func = None 130 | 131 | elif state == STATE_IN_FUNCTION: 132 | cur_func.size += cur_ins.size 133 | 134 | 135 | def cfg(self): 136 | edges = set() 137 | 138 | for f in self.executable.iter_functions(): 139 | if f.type == Function.NORMAL_FUNC: 140 | for ins in f.instructions: 141 | #TODO: understand non-immediates here 142 | if ins.is_call() and ins.operands[-1].type == Operand.IMM: 143 | call_addr = ins.operands[-1].imm 144 | if self.executable.vaddr_is_executable(call_addr): 145 | edge = CFGEdge(ins.address, call_addr, CFGEdge.CALL) 146 | edges.add(edge) 147 | 148 | for cur_bb in f.bbs: 149 | last_ins = cur_bb.instructions[-1] 150 | 151 | if last_ins.is_jump(): 152 | if last_ins.operands[-1].type == Operand.IMM: 153 | jmp_addr = last_ins.operands[-1].imm 154 | 155 | if self.executable.vaddr_is_executable(jmp_addr): 156 | if last_ins.mnemonic == 'jmp': 157 | edge = CFGEdge(last_ins.address, jmp_addr, CFGEdge.DEFAULT) 158 | edges.add(edge) 159 | else: # Conditional jump 160 | # True case 161 | edge = CFGEdge(last_ins.address, jmp_addr, CFGEdge.COND_JUMP, True) 162 | edges.add(edge) 163 | 164 | # Default/fall-through case 165 | next_addr = last_ins.address + last_ins.size 166 | edge = CFGEdge(last_ins.address, next_addr, CFGEdge.COND_JUMP, False) 167 | edges.add(edge) 168 | elif last_ins != f.instructions[-1]: 169 | # Otherwise, if we're 
just at the end of a BB that's not the end of the function, just fall 170 | # through to the next instruction 171 | edge = CFGEdge(last_ins.address, last_ins.address + last_ins.size, CFGEdge.DEFAULT) 172 | edges.add(edge) 173 | 174 | edges.update(self._do_jump_table_detection(f)) 175 | 176 | return edges 177 | 178 | def _do_jump_table_detection(self, f): 179 | # Basic idea is to label each BB as one of these types based on its contents 180 | class BB_TYPE: 181 | NONE = 0 # Seemingly not associated with a switch 182 | VALUE = 1 # A simple value compare (cmp/test/sub and je/jne) 183 | RANGE = 2 # A range compare (cmp and jl/jle/jg/jge) 184 | TABLE = 3 # A jump to a jump table (anything with [_+_*(4,8)]) 185 | 186 | 187 | # BB address -> (type, important instruction) 188 | bb_types = {} 189 | 190 | for bb in f.iter_bbs(): 191 | bb_type = (BB_TYPE.NONE, None) 192 | 193 | # Table detection 194 | # NOTE: We *should* do full register tracing if this instruction is a mov/lea, 195 | # but we can relatively safely assume that the jump at the end of the BB 196 | # will be a `jmp {reg}` if this is indeed a jump table 197 | # NOTE: Value tables will be marked as a TABLE, but sanity checking later on 198 | # prevents values from being interpreted as jump destinations 199 | for i, ins in enumerate(bb.instructions): 200 | if any(o.type == Operand.MEM and o.scale in [4,8] for o in ins.operands): 201 | bb_type = (BB_TYPE.TABLE, ins) 202 | break 203 | 204 | # Range detection 205 | cmp_ins = None 206 | for i, ins in enumerate(bb.instructions): 207 | if ins.mnemonic == 'cmp': # Anything else? Is sub used in ranges in clang? 
208 | cmp_ins = ins 209 | 210 | # TODO: replace `in ...` with `not in` 211 | elif cmp_ins and ins.is_jump() and ins.mnemonic in ('jb','jnae','jnb','jae','jbe', 212 | 'jna','ja','jnbe','jl','jnge', 213 | 'jge','jnl','jle','jng','jg','jnle'): 214 | bb_type = (BB_TYPE.RANGE, cmp_ins) 215 | 216 | # Value detection 217 | cmp_ins = None 218 | for i, ins in enumerate(bb.instructions): 219 | if ins.mnemonic in ('cmp', 'test', 'sub'): # TODO: Properly check for clang's use of `sub` 220 | cmp_ins = ins 221 | 222 | elif cmp_ins and ins.mnemonic in ('je', 'jne'): 223 | bb_type = (BB_TYPE.VALUE, cmp_ins) 224 | 225 | logging.debug("Marking BB at {} as type {}".format(hex(bb.address), bb_type)) 226 | bb_types[bb.address] = bb_type 227 | 228 | 229 | # Start address of table -> (type, scale, {relative location}) 230 | table_types = {} 231 | 232 | class TABLE_TYPE: 233 | ADDR_REL = 0 # Values in the table are relative to a constant loaded elsewhere 234 | ABS = 1 # Values in the table are absolute 235 | 236 | ins_to_table = [] 237 | 238 | # TODO: Look for _CSWTCH symbols 239 | 240 | for bb in f.iter_bbs(): 241 | if bb_types[bb.address][0] == BB_TYPE.TABLE: 242 | for ins in bb.instructions: 243 | # Special-case the various ways of doing a jump table 244 | 245 | # Option 1 (seemingly most common): lea {reg}, {ip-rel const} 246 | # NOTE: This could either be a jump table or a value table 247 | if ins.mnemonic == 'lea' and ins.operands[1].type == Operand.MEM: 248 | insn_with_mem_op = bb_types[bb.address][1] 249 | if len(insn_with_mem_op.operands) > 1 and insn_with_mem_op.operands[1].type == Operand.MEM: 250 | table_scale = insn_with_mem_op.operands[1].scale 251 | 252 | table_addr = ins.address + ins.size + ins.operands[1].disp 253 | 254 | logging.debug("Marking table at {} as an ADDR_REL table".format(hex(table_addr))) 255 | table_types[table_addr] = (TABLE_TYPE.ADDR_REL, table_scale, ins.address + ins.size) 256 | ins_to_table.append((ins.address, table_addr)) 257 | break 258 | 259 | # 
Option 2: offset is directly in the jumps mem. operand 260 | elif bb_types[bb.address][1].operands[-1].type == Operand.MEM: 261 | mem_offset = bb_types[bb.address][1].operands[-1].disp 262 | if mem_offset: 263 | logging.debug("Marking table at {} as an ABS table".format(hex(mem_offset))) 264 | table_types[mem_offset] = (TABLE_TYPE.ABS, bb_types[bb.address][1].operands[-1].scale) 265 | ins_to_table.append((ins.address, mem_offset)) 266 | break 267 | 268 | logging.debug("Couldn't find anything with a table offset in BB at {}".format(hex(bb.address))) 269 | 270 | 271 | # Add the end of the segment as an upper bound 272 | 273 | table_types[self.executable.executable_segment_vaddr() + self.executable.executable_segment_size()] = None 274 | 275 | # http://stackoverflow.com/questions/32030412/twos-complement-sign-extension-python 276 | def sign_extend(value, bits): 277 | sign_bit = 1 << (bits - 1) 278 | return (value & (sign_bit - 1)) - (value & sign_bit) 279 | 280 | # Start address of table -> [destination addresses] 281 | table_values = collections.defaultdict(list) 282 | 283 | table_addrs = sorted(table_types.keys()) 284 | 285 | for start_a, end_a in zip(table_addrs[:-1], table_addrs[1:]): 286 | if not table_types[start_a]: 287 | continue 288 | t_type = table_types[start_a][0] 289 | scale = table_types[start_a][1] 290 | for addr in range(start_a, end_a, scale): 291 | # sometimes, our addr+scale ends up not being in the executable, 292 | # usually because they compute a relative offset and then add a 293 | # base address to it. For now, we'll just skip the address. 294 | # TODO: Is there a way to do this without basically implementing 295 | # symbolic execution? 
296 | try: 297 | raw = self.executable.get_binary_vaddr_range(addr, addr+scale) 298 | data_val = struct.unpack(self.executable.pack_endianness+('i' if scale == 4 else 'q'), raw)[0] 299 | except KeyError: 300 | logging.warning("Invalid vaddrs requested during jump table analysis, skipping this vaddr: {:08x}".format(addr)) 301 | continue 302 | if t_type == TABLE_TYPE.ADDR_REL: 303 | addr_bit_len = 8*self.executable.address_length() 304 | abs_val = (start_a+sign_extend(data_val, addr_bit_len)) & (2**addr_bit_len - 1) 305 | else: 306 | abs_val = data_val 307 | 308 | # Only add the values if they land us in the executable segment 309 | # TODO: Be smarter here. Add restrictions to make sure that the table doesn't extend 310 | # past the end of a section/segment before making sure the address is valid 311 | valid_start = self.executable.executable_segment_vaddr() 312 | valid_end = valid_start + self.executable.executable_segment_size() 313 | 314 | if valid_start <= abs_val < valid_end: 315 | table_values[start_a].append(abs_val) 316 | else: 317 | break 318 | 319 | edges = set() 320 | for addr, table in ins_to_table: 321 | for dst in table_values[table]: 322 | edges.add(CFGEdge(addr, dst, CFGEdge.SWITCH)) 323 | 324 | return edges 325 | 326 | class X86_64_Analyzer(X86_Analyzer): 327 | def __init__(self, executable): 328 | super(X86_64_Analyzer, self).__init__(executable) 329 | 330 | self._disassembler = Cs(CS_ARCH_X86, CS_MODE_64) 331 | self._disassembler.detail = True 332 | self._disassembler.skipdata = True 333 | -------------------------------------------------------------------------------- /dispatch/constructs.py: -------------------------------------------------------------------------------- 1 | import subprocess 2 | import logging 3 | import capstone 4 | import string 5 | from enums import * 6 | 7 | import ctypes 8 | 9 | class Function(object): 10 | NORMAL_FUNC = 0 11 | DYNAMIC_FUNC = 1 12 | 13 | def __init__(self, address, size, name, executable, 
type=NORMAL_FUNC): 14 | self.address = int(address) 15 | self.size = int(size) 16 | self.name = name 17 | self.type = type 18 | self._executable = executable 19 | 20 | # BELOW: Helpers used to explore the binary. 21 | # NOTE: These should *not* be directly modified at this time. 22 | # Instead, executable.replace_at should be used. 23 | self.instructions = [] # Sequential list of instructions 24 | self.bbs = [] # Sequential list of basic blocks. BB instructions are auto-populated from our instructions 25 | 26 | def __repr__(self): 27 | return ''.format(self.name, hex(self.address)) 28 | 29 | def do_bb_analysis(self): 30 | if self.instructions: 31 | bb_ends = set([self.instructions[-1].address + self.instructions[-1].size]) 32 | 33 | for i in range(len(self.instructions) - 1): 34 | cur = self.instructions[i] 35 | next = self.instructions[i + 1] 36 | 37 | if cur.is_jump(): 38 | bb_ends.add(next.address) 39 | if cur.operands[0].type == Operand.IMM: 40 | bb_ends.add(cur.operands[0].imm) 41 | 42 | bb_ends = sorted(list(bb_ends)) 43 | bb_instructions = [] 44 | 45 | for ins in self.instructions: 46 | if ins.address == bb_ends[0] and bb_instructions: 47 | bb = BasicBlock(self, 48 | bb_instructions[0].address, 49 | bb_instructions[-1].address + bb_instructions[-1].size - bb_instructions[0].address) 50 | bb.instructions = bb_instructions 51 | self.bbs.append(bb) 52 | 53 | bb_ends = bb_ends[1:] 54 | bb_instructions = [ins] 55 | else: 56 | bb_instructions.append(ins) 57 | 58 | # There will always be one BB left over which "ends" at the first address of the next function, so be 59 | # sure to add it 60 | 61 | bb = BasicBlock(self, 62 | bb_instructions[0].address, 63 | bb_instructions[-1].address + bb_instructions[-1].size - bb_instructions[0].address) 64 | bb.instructions = bb_instructions 65 | self.bbs.append(bb) 66 | 67 | def contains_address(self, address): 68 | return self.address <= address < self.address + self.size 69 | 70 | def iter_bbs(self): 71 | for bb in self.bbs: 
72 | yield bb 73 | 74 | def print_disassembly(self): 75 | for i in self.instructions: 76 | print hex(i.address) + ' ' + str(i) 77 | 78 | def demangle(self): 79 | if self.name.startswith('_Z'): 80 | p = subprocess.Popen(['c++filt', '-n', self.name], stdout=subprocess.PIPE) 81 | demangled, _ = p.communicate() 82 | return demangled.replace('\n','') 83 | elif self.name.startswith('@'): 84 | # TODO: MSVC demangling (look at wine debugger source) 85 | return self.name 86 | else: 87 | logging.debug('Call to demangle with a non-reserved function name') 88 | 89 | 90 | class BasicBlock(object): 91 | def __init__(self, parent_func, address, size): 92 | self.parent = parent_func 93 | self.address = int(address) 94 | self.size = int(size) 95 | self.offset = self.parent.address - self.address 96 | self.instructions = [] 97 | 98 | def __repr__(self): 99 | return ''.format(hex(self.address)) 100 | 101 | def print_disassembly(self): 102 | for i in self.instructions: 103 | print hex(i.address) + ' ' + str(i) 104 | 105 | 106 | class Instruction(object): 107 | GRP_CALL = 0 108 | GRP_JUMP = 1 109 | 110 | def __init__(self, address, size, raw, mnemonic, operands, groups, backend_instruction, executable): 111 | self.address = int(address) 112 | self.size = int(size) 113 | self.raw = raw 114 | self.mnemonic = mnemonic 115 | self.operands = operands 116 | self.groups = groups 117 | self._backend_instruction = backend_instruction 118 | self._executable = executable 119 | 120 | self.comment = '' 121 | 122 | def __repr__(self): 123 | return ''.format(hex(self.address)) 124 | 125 | def __str__(self): 126 | s = self.mnemonic + ' ' + self.nice_op_str() 127 | if self.comment: 128 | s += '; "{}"'.format(self.comment) 129 | if self.address in self._executable.xrefs: 130 | s += '; XREF={}'.format(', '.join(hex(a).rstrip('L') for a in self._executable.xrefs[self.address])) 131 | # TODO: Print nice function relative offsets if the xref is in a function 132 | 133 | return s 134 | 135 | def is_call(self): 136 
| return Instruction.GRP_CALL in self.groups 137 | 138 | def is_jump(self): 139 | return Instruction.GRP_JUMP in self.groups 140 | 141 | def redirects_flow(self): 142 | return self.is_jump() or self.is_call() 143 | 144 | def references_ip(self): 145 | implicit_read, implicit_written = self._backend_instruction.regs_access() 146 | ops_direct = [op.used_regs() for op in self.operands] 147 | if ops_direct: 148 | explicit_accessed = set.union(*ops_direct) 149 | else: 150 | explicit_accessed = set() 151 | all_accessed = set.union(explicit_accessed, implicit_read, implicit_written) 152 | return bool(self._executable.analyzer.IP_REGS.intersection(all_accessed)) 153 | 154 | def references_sp(self): 155 | implicit_read, implicit_written = self._backend_instruction.regs_access() 156 | ops_direct = [op.used_regs() for op in self.operands] 157 | if ops_direct: 158 | explicit_accessed = set.union(*ops_direct) 159 | else: 160 | explicit_accessed = set() 161 | all_accessed = set.union(explicit_accessed, implicit_read, implicit_written) 162 | return bool(self._executable.analyzer.SP_REGS.intersection(all_accessed)) 163 | 164 | def references_seg_reg(self): 165 | ''' 166 | Returns whether our instruction uses segmentation registers (fs, gs, etc on x86[_64]) 167 | Mostly seen on x86[_64] stack canaries 168 | :return: Whether this instruction references the segmentation registers 169 | ''' 170 | operand_refs_seg_reg = lambda op: op.type == Operand.MEM and op.seg_reg 171 | 172 | return any(operand_refs_seg_reg(op) for op in self.operands) 173 | 174 | def op_str(self): 175 | return ', '.join(str(op) for op in self.operands) 176 | 177 | def nice_op_str(self): 178 | ''' 179 | Returns the operand string "nicely formatted." I.e. replaces addresses with function names (and function 180 | relative offsets) if appropriate. 
181 | :return: The nicely formatted operand string 182 | ''' 183 | op_strings = [str(op) for op in self.operands] 184 | 185 | # If this is an immediate call or jump, try to put a name to where we're calling/jumping to 186 | if self.is_call() or self.is_jump(): 187 | # jump/call destination will always be the last operand (even with conditional ARM branch instructions) 188 | operand = self.operands[-1] 189 | # TODO: Don't only do this when we've got an IMM operation 190 | if operand.type == Operand.IMM: 191 | if operand.imm in self._executable.functions: 192 | op_strings[-1] = self._executable.functions[operand.imm].name 193 | elif self._executable.vaddr_is_executable(operand.imm): 194 | for func in self._executable.iter_functions(): 195 | if func.contains_address(operand.imm): 196 | diff = operand.imm - func.address 197 | op_strings[-1] = func.name+'+'+hex(diff) 198 | break 199 | else: # TODO: Limit this to only be sensible instructions (e.g. mov, push, etc.) 200 | for i, operand in enumerate(self.operands): 201 | if operand.type == Operand.IMM and operand.imm in self._executable.strings: 202 | referenced_string = self._executable.strings[operand.imm] 203 | op_strings[i] = referenced_string.short_name 204 | self.comment = referenced_string.string.strip() 205 | 206 | return ', '.join(op_strings) 207 | 208 | 209 | class Operand(object): 210 | IMM = 0 211 | FP = 1 212 | REG = 2 213 | MEM = 3 214 | 215 | def __init__(self, type, size, instruction, **kwargs): 216 | self.type = type 217 | self.size = size 218 | self._instruction = instruction 219 | if self.type == Operand.IMM: 220 | self.imm = int(kwargs.get('imm')) 221 | elif self.type == Operand.FP: 222 | self.fp = float(kwargs.get('fp')) 223 | elif self.type == Operand.REG: 224 | self.reg = kwargs.get('reg') 225 | elif self.type == Operand.MEM: 226 | self.base = kwargs.get('base') 227 | self.index = kwargs.get('index') 228 | self.scale = int(kwargs.get('scale', 1)) 229 | self.disp = int(kwargs.get('disp', 0)) 230 | 
self.seg_reg = kwargs.get('seg_reg') 231 | else: 232 | raise ValueError('Type is not one of Operand.{IMM,FP,REG,MEM}') 233 | 234 | def _get_simplified(self): 235 | # Auto-simplify ip-relative operands to their actual address 236 | if self.type == Operand.MEM and self.base in self._instruction._executable.analyzer.IP_REGS and self.index == 0: 237 | addr = self._instruction.address + self._instruction.size + self.index * self.scale + self.disp 238 | return Operand(Operand.MEM, self.size, self._instruction, disp=addr) 239 | 240 | return self 241 | 242 | def used_regs(self): 243 | if self.type == Operand.REG: 244 | return set([self.reg]) 245 | elif self.type == Operand.MEM: 246 | return set([self.base, self.index]) 247 | else: 248 | return set() 249 | 250 | def __str__(self): 251 | sizes = { 252 | 1: 'byte ptr', 253 | 2: 'word ptr', 254 | 4: 'dword ptr', 255 | 8: 'qword ptr' 256 | } 257 | if self.type == Operand.IMM: 258 | return sizes.get(self.size, '') + ' ' + hex(self.imm) 259 | elif self.type == Operand.FP: 260 | return str(self.fp) 261 | elif self.type == Operand.REG: 262 | return self._instruction._executable.analyzer.REG_NAMES[self.reg] 263 | elif self.type == Operand.MEM: 264 | simplified = self._get_simplified() 265 | 266 | s = '' 267 | if self.seg_reg: 268 | s += self._instruction._executable.analyzer.REG_NAMES[simplified.seg_reg] 269 | s += ':' 270 | 271 | s += '[' 272 | 273 | show_plus = False 274 | if simplified.base: 275 | s += self._instruction._executable.analyzer.REG_NAMES[simplified.base] 276 | show_plus = True 277 | if simplified.index: 278 | if show_plus: 279 | s += ' + ' 280 | 281 | s += self._instruction._executable.analyzer.REG_NAMES[simplified.index] 282 | if simplified.scale > 1: 283 | s += '*' 284 | s += str(simplified.scale) 285 | 286 | show_plus = True 287 | if simplified.disp: 288 | if show_plus: 289 | s += ' + ' 290 | s += hex(simplified.disp) 291 | 292 | s += ']' 293 | 294 | return sizes.get(self.size, '') + ' ' + s 295 | 296 | 297 | def 
operand_from_cs_op(csOp, instruction): 298 | size = csOp.size if hasattr(csOp, 'size') else None 299 | if csOp.type == capstone.CS_OP_IMM: 300 | return Operand(Operand.IMM, size, instruction, imm=csOp.imm) 301 | elif csOp.type == capstone.CS_OP_FP: 302 | return Operand(Operand.FP, size, instruction, fp=csOp.fp) 303 | elif csOp.type == capstone.CS_OP_REG: 304 | return Operand(Operand.REG, size, instruction, reg=csOp.reg) 305 | elif csOp.type == capstone.CS_OP_MEM: 306 | return Operand(Operand.MEM, size, instruction, base=csOp.mem.base, index=csOp.mem.index, scale=csOp.mem.scale, disp=csOp.mem.disp, seg_reg=csOp.reg) 307 | 308 | 309 | def instruction_from_cs_insn(csInsn, executable): 310 | groups = [] 311 | 312 | if executable.architecture in (ARCHITECTURE.ARM, ARCHITECTURE.ARM_64): 313 | if csInsn.mnemonic.startswith('bl'): 314 | groups.append(Instruction.GRP_CALL) 315 | elif csInsn.mnemonic.startswith('b'): 316 | groups.append(Instruction.GRP_JUMP) 317 | else: 318 | if capstone.CS_GRP_JUMP in csInsn.groups: 319 | groups.append(Instruction.GRP_JUMP) 320 | if capstone.CS_GRP_CALL in csInsn.groups: 321 | groups.append(Instruction.GRP_CALL) 322 | 323 | instruction = Instruction(csInsn.address, csInsn.size, csInsn.bytes, csInsn.mnemonic, [], groups, csInsn, executable) 324 | 325 | # We manually pull out the instruction details here so that capstone doesn't deepcopy everything which burns time 326 | # and memory 327 | detail = ctypes.cast(csInsn._raw.detail, ctypes.POINTER(capstone._cs_detail)).contents 328 | 329 | if executable.architecture == ARCHITECTURE.X86 or executable.architecture == ARCHITECTURE.X86_64: 330 | detail = detail.arch.x86 331 | elif executable.architecture == ARCHITECTURE.ARM: 332 | detail = detail.arch.arm 333 | elif executable.architecture == ARCHITECTURE.ARM_64: 334 | detail = detail.arch.arm64 335 | 336 | operands = [operand_from_cs_op(detail.operands[i], instruction) for i in range(detail.op_count)] 337 | 338 | instruction.operands = operands 339 
| 340 | return instruction 341 | 342 | 343 | class String(object): 344 | def __init__(self, s, vaddr, executable): 345 | self.string = s 346 | self.short_name = reduce(lambda s, r: s.replace(r, ''), ' '+string.punctuation, self.string)[:8] 347 | self.vaddr = vaddr 348 | self._executable = executable 349 | 350 | def __repr__(self): 351 | return ''.format(self.string, hex(self.vaddr)) 352 | 353 | def __str__(self): 354 | return self.string 355 | 356 | 357 | class CFGEdge(object): 358 | # Edge with no special information. Could be from a default fall-through, unconditional jump, etc. 359 | DEFAULT = 0 360 | 361 | # Edge from a conditional jump. Two of these should be added for each cond. jump, one for the True, and one for False 362 | COND_JUMP = 1 363 | 364 | # Edge from a switch/jump table. One edge should be added for each entry, and the corresponding key set as the value 365 | SWITCH = 2 366 | 367 | # Edge from a call instruction. 368 | CALL = 3 369 | 370 | def __init__(self, src, dst, type, value=None): 371 | self.src = src 372 | self.dst = dst 373 | self.type = type 374 | self.value = value 375 | 376 | def __eq__(self, other): 377 | if isinstance(other, CFGEdge) and self.src == other.src and self.dst == other.dst and self.type == other.type: 378 | return True 379 | return False 380 | 381 | def __ne__(self, other): 382 | return not self.__eq__(other) 383 | 384 | def __repr__(self): 385 | return ''.format(hex(self.src), hex(self.dst)) 386 | -------------------------------------------------------------------------------- /dispatch/enums.py: -------------------------------------------------------------------------------- 1 | class FORMAT: 2 | ELF = 'ELF' 3 | PE = 'PE' 4 | MACH_O = 'MachO' 5 | 6 | class ARCHITECTURE: 7 | X86 = 'x86' 8 | X86_64 = 'x86-64' 9 | ARM = 'ARM' 10 | ARM_64 = 'ARM64' -------------------------------------------------------------------------------- /dispatch/formats/SectionDoubleP.py: 
-------------------------------------------------------------------------------- 1 | """ Tested with pefile 1.2.10-123 on 32bit PE executable files. 2 | 3 | An implementation to push or pop a section header to the section table of a PE file. 4 | For further information refer to the docstrings of pop_back/push_back. 5 | 6 | by n0p 7 | """ 8 | 9 | import pefile 10 | 11 | class SectionDoublePError(Exception): 12 | pass 13 | 14 | class SectionDoubleP: 15 | def __init__(self, pe): 16 | self.pe = pe 17 | 18 | def __adjust_optional_header(self): 19 | """ Recalculates the SizeOfImage, SizeOfCode, SizeOfInitializedData and 20 | SizeOfUninitializedData of the optional header. 21 | """ 22 | 23 | # SizeOfImage = ((VirtualAddress + VirtualSize) of the new last section) 24 | self.pe.OPTIONAL_HEADER.SizeOfImage = (self.pe.sections[-1].VirtualAddress + 25 | self.pe.sections[-1].Misc_VirtualSize) 26 | 27 | self.pe.OPTIONAL_HEADER.SizeOfCode = 0 28 | self.pe.OPTIONAL_HEADER.SizeOfInitializedData = 0 29 | self.pe.OPTIONAL_HEADER.SizeOfUninitializedData = 0 30 | 31 | # Recalculating the sizes by iterating over every section and checking if 32 | # the appropriate characteristics are set. 33 | for section in self.pe.sections: 34 | if section.Characteristics & 0x00000020: 35 | # Section contains code. 36 | self.pe.OPTIONAL_HEADER.SizeOfCode += section.SizeOfRawData 37 | if section.Characteristics & 0x00000040: 38 | # Section contains initialized data. 39 | self.pe.OPTIONAL_HEADER.SizeOfInitializedData += section.SizeOfRawData 40 | if section.Characteristics & 0x00000080: 41 | # Section contains uninitialized data. 42 | self.pe.OPTIONAL_HEADER.SizeOfUninitializedData += section.SizeOfRawData 43 | 44 | def __add_header_space(self): 45 | """ To make space for a new section header a buffer filled with nulls is added at the 46 | end of the headers. The buffer has the size of one file alignment. 
47 | The data between the last section header and the end of the headers is copied to 48 | the new space (everything moved by the size of one file alignment). If any data 49 | directory entry points to the moved data the pointer is adjusted. 50 | """ 51 | 52 | FileAlignment = self.pe.OPTIONAL_HEADER.FileAlignment 53 | SizeOfHeaders = self.pe.OPTIONAL_HEADER.SizeOfHeaders 54 | 55 | data = '\x00' * FileAlignment 56 | 57 | # Adding the null buffer. 58 | self.pe.__data__ = (self.pe.__data__[:SizeOfHeaders] + data + 59 | self.pe.__data__[SizeOfHeaders:]) 60 | 61 | section_table_offset = (self.pe.DOS_HEADER.e_lfanew + 4 + 62 | self.pe.FILE_HEADER.sizeof() + self.pe.FILE_HEADER.SizeOfOptionalHeader) 63 | 64 | # Copying the data between the last section header and SizeOfHeaders to the newly allocated 65 | # space. 66 | new_section_offset = section_table_offset + self.pe.FILE_HEADER.NumberOfSections*0x28 67 | size = SizeOfHeaders - new_section_offset 68 | data = self.pe.get_data(new_section_offset, size) 69 | self.pe.set_bytes_at_offset(new_section_offset + FileAlignment, data) 70 | 71 | # Filling the space, from which the data was copied from, with NULLs. 72 | self.pe.set_bytes_at_offset(new_section_offset, '\x00' * FileAlignment) 73 | 74 | data_directory_offset = section_table_offset - self.pe.OPTIONAL_HEADER.NumberOfRvaAndSizes * 0x8 75 | 76 | # Checking data directories if anything points to the space between the last section header 77 | # and the former SizeOfHeaders. If that's the case the pointer is increased by FileAlignment. 78 | for data_offset in xrange(data_directory_offset, section_table_offset, 0x8): 79 | data_rva = self.pe.get_dword_from_offset(data_offset) 80 | 81 | if new_section_offset <= data_rva and data_rva < SizeOfHeaders: 82 | self.pe.set_dword_at_offset(data_offset, data_rva + FileAlignment) 83 | 84 | SizeOfHeaders_offset = (self.pe.DOS_HEADER.e_lfanew + 4 + 85 | self.pe.FILE_HEADER.sizeof() + 0x3C) 86 | 87 | # Adjusting the SizeOfHeaders value. 
88 | self.pe.set_dword_at_offset(SizeOfHeaders_offset, SizeOfHeaders + FileAlignment) 89 | 90 | section_raw_address_offset = section_table_offset + 0x14 91 | 92 | # The raw addresses of the sections are adjusted. 93 | for section in self.pe.sections: 94 | if section.PointerToRawData != 0: 95 | self.pe.set_dword_at_offset(section_raw_address_offset, section.PointerToRawData+FileAlignment) 96 | 97 | section_raw_address_offset += 0x28 98 | 99 | # All changes in this method were made to the raw data (__data__). To make these changes 100 | # accessible in self.pe, __data__ has to be parsed again. Since a new pefile is parsed during 101 | # the init method, the easiest way is to replace self.pe with a new pefile based on __data__ 102 | # of the old self.pe. 103 | self.pe = pefile.PE(data=self.pe.__data__) 104 | 105 | def __is_null_data(self, data): 106 | """ Checks if the given data contains just null bytes. 107 | """ 108 | 109 | for char in data: 110 | if char != '\x00': 111 | return False 112 | return True 113 | 114 | def push_back(self, Name=".NewSec", VirtualSize=0x00000000, VirtualAddress=0x00000000, 115 | RawSize=0x00000000, RawAddress=0x00000000, RelocAddress=0x00000000, 116 | Linenumbers=0x00000000, RelocationsNumber=0x0000, LinenumbersNumber=0x0000, 117 | Characteristics=0xE00000E0, Data=""): 118 | """ Adds the section, specified by the function's parameters, at the end of the section 119 | table. 120 | If the space to add an additional section header is insufficient, a buffer is inserted 121 | after SizeOfHeaders. Data between the last section header and the end of SizeOfHeaders 122 | is copied to +1 FileAlignment. Data directory entries pointing to this data are fixed. 123 | 124 | A call with no parameters creates the same section header as LordPE does. But for the 125 | binary to be executable without errors a VirtualSize > 0 has to be set. 
126 | 127 | If a RawSize > 0 is set or Data is given the data gets aligned to the FileAlignment and 128 | is attached at the end of the file. 129 | """ 130 | 131 | if self.pe.FILE_HEADER.NumberOfSections == len(self.pe.sections): 132 | 133 | FileAlignment = self.pe.OPTIONAL_HEADER.FileAlignment 134 | SectionAlignment = self.pe.OPTIONAL_HEADER.SectionAlignment 135 | 136 | if len(Name) > 8: 137 | raise SectionDoublePError("The name is too long for a section.") 138 | 139 | if ( VirtualAddress < (self.pe.sections[-1].Misc_VirtualSize + 140 | self.pe.sections[-1].VirtualAddress) 141 | or VirtualAddress % SectionAlignment != 0): 142 | 143 | if (self.pe.sections[-1].Misc_VirtualSize % SectionAlignment) != 0: 144 | VirtualAddress = \ 145 | (self.pe.sections[-1].VirtualAddress + self.pe.sections[-1].Misc_VirtualSize - 146 | (self.pe.sections[-1].Misc_VirtualSize % SectionAlignment) + SectionAlignment) 147 | else: 148 | VirtualAddress = \ 149 | (self.pe.sections[-1].VirtualAddress + self.pe.sections[-1].Misc_VirtualSize) 150 | 151 | if VirtualSize < len(Data): 152 | VirtualSize = len(Data) 153 | 154 | if (len(Data) % FileAlignment) != 0: 155 | # Padding the data of the section. 156 | Data += '\x00' * (FileAlignment - (len(Data) % FileAlignment)) 157 | 158 | if RawSize != len(Data): 159 | if ( RawSize > len(Data) 160 | and (RawSize % FileAlignment) == 0): 161 | Data += '\x00' * (RawSize - (len(Data) % RawSize)) 162 | else: 163 | RawSize = len(Data) 164 | 165 | 166 | section_table_offset = (self.pe.DOS_HEADER.e_lfanew + 4 + 167 | self.pe.FILE_HEADER.sizeof() + self.pe.FILE_HEADER.SizeOfOptionalHeader) 168 | 169 | # If the new section header exceeds the SizeOfHeaders there won't be enough space 170 | # for an additional section header. Besides that it's checked if the 0x28 bytes 171 | # (size of one section header) after the last current section header are filled 172 | # with nulls/ are free to use. 
173 | if ( self.pe.OPTIONAL_HEADER.SizeOfHeaders < 174 | section_table_offset + (self.pe.FILE_HEADER.NumberOfSections+1)*0x28 175 | or not self.__is_null_data(self.pe.get_data(section_table_offset + 176 | (self.pe.FILE_HEADER.NumberOfSections)*0x28, 0x28))): 177 | 178 | # Checking if more space can be added. 179 | if self.pe.OPTIONAL_HEADER.SizeOfHeaders < self.pe.sections[0].VirtualAddress: 180 | 181 | self.__add_header_space() 182 | else: 183 | raise SectionDoublePError("No more space can be added for the section header.") 184 | 185 | 186 | # The validity check of RawAddress is done after space for a new section header may 187 | # have been added because if space had been added the PointerToRawData of the previous 188 | # section would have changed. 189 | if (RawAddress != (self.pe.sections[-1].PointerToRawData + 190 | self.pe.sections[-1].SizeOfRawData)): 191 | RawAddress = \ 192 | (self.pe.sections[-1].PointerToRawData + self.pe.sections[-1].SizeOfRawData) 193 | 194 | 195 | # Appending the data of the new section to the file. 196 | if len(Data) > 0: 197 | self.pe.__data__ = (self.pe.__data__[:RawAddress] + Data + \ 198 | self.pe.__data__[RawAddress:]) 199 | 200 | section_offset = section_table_offset + self.pe.FILE_HEADER.NumberOfSections*0x28 201 | 202 | # Manually writing the data of the section header to the file. 
203 | self.pe.set_bytes_at_offset(section_offset, Name) 204 | self.pe.set_dword_at_offset(section_offset+0x08, VirtualSize) 205 | self.pe.set_dword_at_offset(section_offset+0x0C, VirtualAddress) 206 | self.pe.set_dword_at_offset(section_offset+0x10, RawSize) 207 | self.pe.set_dword_at_offset(section_offset+0x14, RawAddress) 208 | self.pe.set_dword_at_offset(section_offset+0x18, RelocAddress) 209 | self.pe.set_dword_at_offset(section_offset+0x1C, Linenumbers) 210 | self.pe.set_word_at_offset(section_offset+0x20, RelocationsNumber) 211 | self.pe.set_word_at_offset(section_offset+0x22, LinenumbersNumber) 212 | self.pe.set_dword_at_offset(section_offset+0x24, Characteristics) 213 | 214 | self.pe.FILE_HEADER.NumberOfSections +=1 215 | 216 | # Parsing the section table of the file again to add the new section to the sections 217 | # list of pefile. 218 | self.pe.parse_sections(section_table_offset) 219 | 220 | self.__adjust_optional_header() 221 | else: 222 | raise SectionDoublePError("The NumberOfSections specified in the file header and the " + 223 | "size of the sections list of pefile don't match.") 224 | 225 | return self.pe -------------------------------------------------------------------------------- /dispatch/formats/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/osirislab/dispatch/c73765c0a2586cc6ef388eb43ba929bcb6c06b24/dispatch/formats/__init__.py -------------------------------------------------------------------------------- /dispatch/formats/base_executable.py: -------------------------------------------------------------------------------- 1 | import os 2 | import logging 3 | 4 | try: 5 | from keystone import * 6 | except: 7 | logging.warning('Keystone assembler not found so assembling will not work') 8 | 9 | from StringIO import StringIO 10 | 11 | from ..analysis.x86_analyzer import * 12 | from ..analysis.arm_analyzer import * 13 | 14 | from ..enums import * 15 | 16 | class 
BaseExecutable(object): 17 | ''' 18 | The executable classes expose the raw binary in higher-level chunks. 19 | They automatically lift the .text segment (or equiv.) for quick use, keep a map of offset->vaddrs for lookups in 20 | the rewriting process, and more. You can think of them as the middle-man that sits between the disassembly and the 21 | underlying binary. 22 | ''' 23 | def __init__(self, file_path): 24 | if not os.path.exists(file_path): 25 | raise Exception('No such file') 26 | 27 | self.fp = file_path 28 | self.binary = StringIO(open(self.fp, 'rb').read()) 29 | 30 | self.architecture = None 31 | self.pack_endianness = None 32 | 33 | self.helper = None 34 | 35 | self.analyzer = None 36 | self.libraries = [] 37 | self.functions = {} # Vaddr: Function 38 | self.strings = {} 39 | self.xrefs = {} 40 | 41 | self.next_injection_vaddr = None 42 | 43 | def __repr__(self): 44 | return '<{} {} \'{}\'>'.format(self.architecture, self.__class__.__name__, self.fp) 45 | 46 | def _identify_arch(self): 47 | ''' 48 | Identifies the architecture that the executable is compiled for. 49 | :return: The ARCHITECTURE value for the executable, or None if it is not recognized. 50 | ''' 51 | raise NotImplementedError() 52 | 53 | def is_64_bit(self): 54 | ''' 55 | Determines if the executable is 64 bit or 32 bit. 56 | :return: True if the executable is 64 bit, otherwise False. 57 | ''' 58 | return self.architecture in (ARCHITECTURE.X86_64, ARCHITECTURE.ARM_64) 59 | 60 | def address_length(self): 61 | ''' 62 | :return: Number of bytes an address in this executable will have (i.e. 4 for 32 bit, 8 for 64 bit) 63 | ''' 64 | return 8 if self.is_64_bit() else 4 65 | 66 | def entry_point(self): 67 | ''' 68 | Gets the entry point of the executable. 69 | :return: The entry point of the executable. 70 | ''' 71 | raise NotImplementedError() 72 | 73 | def sections_to_disassemble(self): 74 | ''' 75 | Iterates through each section in the executable that is supposed to be disassembled. 
76 | :return: Iterator 77 | ''' 78 | for s in self.sections: 79 | if s.executable: 80 | yield s 81 | 82 | def iter_string_sections(self): 83 | ''' 84 | Returns the section(s) with strings in this executable 85 | :return: Section(s) with strings. 86 | ''' 87 | raise NotImplementedError() 88 | 89 | def vaddr_is_executable(self, vaddr): 90 | ''' 91 | Determine if the given virtual address is in a mapped executable memory segment. 92 | :param vaddr: Virtual address to check 93 | :return: True if the vaddr is in an executable segment, False otherwise 94 | ''' 95 | for section in self.sections: 96 | if section.executable and section.contains_vaddr(vaddr): 97 | return True 98 | 99 | return False 100 | 101 | def section_containing_vaddr(self, vaddr): 102 | for section in self.sections: 103 | if section.contains_vaddr(vaddr): 104 | return section 105 | 106 | return None 107 | 108 | def function_containing_vaddr(self, vaddr): 109 | for f in self.iter_functions(): 110 | if f.contains_address(vaddr): 111 | return f 112 | 113 | return None 114 | 115 | def bb_containing_vaddr(self, vaddr): 116 | for f in self.iter_functions(): 117 | for bb in f.bbs: 118 | if bb.address <= vaddr < bb.address + bb.size: 119 | return bb 120 | 121 | return None 122 | 123 | def vaddr_binary_offset(self, vaddr): 124 | ''' 125 | Gets the offset in the binary file for a given virtual address. 126 | :param vaddr: The virtual address to get the offset for. 127 | :return: The offset in the binary of the virtual address. 128 | ''' 129 | for section in self.sections: 130 | if section.contains_vaddr(vaddr): 131 | return section.offset + vaddr - section.vaddr 132 | 133 | return None 134 | 135 | def _extract_symbol_table(self): 136 | ''' 137 | Extracts the symbol table from the binary and creates named functions as appropriate. 138 | Called from the analyzer in the main analysis function. 
139 | :return: None 140 | ''' 141 | raise NotImplementedError() 142 | 143 | def get_binary(self): 144 | ''' 145 | Gets the entire binary. 146 | :return: The raw bytes of the entire binary. 147 | ''' 148 | return self.binary.getvalue() 149 | 150 | def get_binary_vaddr_range(self, start, end): 151 | ''' 152 | Gets the raw bytes from the binary within a virtual address range 153 | :param start: Starting virtual address 154 | :param end: Ending virtual address 155 | :raises: KeyError if either the start or end virtual addresses do not actually exist in the binary 156 | :return: The bytes in the binary between the two virtual addresses 157 | ''' 158 | start_offset = self.vaddr_binary_offset(start) 159 | end_offset = self.vaddr_binary_offset(end) 160 | # An offset of 0 is valid, so compare against None explicitly; if either lookup failed, raise an error 161 | if start_offset is not None and end_offset is not None: 162 | return self.get_binary()[start_offset:end_offset] 163 | 164 | bad_addr = start if start_offset is None else end # which address triggered our error 165 | raise KeyError("Vaddr is not in binary: {:x}".format(bad_addr)) 166 | 167 | def analyze(self): 168 | ''' 169 | Creates an analyzer for the binary and then runs the initial analysis routine. 170 | :return: The created analyzer object. 
171 | ''' 172 | if self.architecture == ARCHITECTURE.X86: 173 | self.analyzer = X86_Analyzer(self) 174 | elif self.architecture == ARCHITECTURE.X86_64: 175 | self.analyzer = X86_64_Analyzer(self) 176 | elif self.architecture == ARCHITECTURE.ARM: 177 | self.analyzer = ARM_Analyzer(self) 178 | elif self.architecture == ARCHITECTURE.ARM_64: 179 | self.analyzer = ARM_64_Analyzer(self) 180 | 181 | if self.analyzer: 182 | self.analyzer.analyze() 183 | else: 184 | logging.error('Could not create analyzer for {}'.format(self)) 185 | 186 | return self.analyzer 187 | 188 | def _ks_symbol_resolver(self, symbol, value): 189 | f = self.function_named(symbol) 190 | 191 | if f: 192 | value[0] = f.address # value is an out-pointer supplied by keystone; rebinding the name would have no effect 193 | return True 194 | 195 | return False 196 | 197 | def assemble(self, s, vaddr=0): 198 | ''' 199 | Assemble the given string relative to the given virtual address 200 | :param s: String of assembly commands to be assembled 201 | :param vaddr: Virtual address the code is assembled relative to 202 | :return: A bytearray with the resulting machine code 203 | ''' 204 | if self.architecture == ARCHITECTURE.X86: 205 | ks = Ks(KS_ARCH_X86, KS_MODE_32) 206 | elif self.architecture == ARCHITECTURE.X86_64: 207 | ks = Ks(KS_ARCH_X86, KS_MODE_64) 208 | elif self.architecture == ARCHITECTURE.ARM: 209 | ks = Ks(KS_ARCH_ARM, KS_MODE_ARM) 210 | elif self.architecture == ARCHITECTURE.ARM_64: 211 | ks = Ks(KS_ARCH_ARM64, KS_MODE_LITTLE_ENDIAN) # keystone only accepts little-endian mode for ARM64 212 | else: 213 | logging.error('Could not create assembler for {}'.format(self)) 214 | raise Exception('Architecture not supported') 215 | 216 | ks.sym_resolver = self._ks_symbol_resolver 217 | 218 | encoding, count = ks.asm(s, vaddr) 219 | 220 | return bytearray(encoding) 221 | 222 | def prepare_for_injection(self): 223 | ''' 224 | Prepares the binary for code injection, creating sections/segments where needed. 
225 | This should *always* be called before inject() is called, as it provides the initial values for 226 | next_injection_vaddr which may be required to do certain IP-relative computations. 227 | :return: None 228 | ''' 229 | raise NotImplementedError() 230 | 231 | def inject(self, asm, update_entry=False): 232 | ''' 233 | Injects the given assembly into the binary, optionally updating the entry point if the injected code is to run 234 | before initialization. 235 | :param asm: The assembly to inject. 236 | :param update_entry: Whether or not to update the binary entry point to point to the injected code. 237 | :return: (offset of injected assembly in binary, virtual address of injected assembly) 238 | ''' 239 | raise NotImplementedError() 240 | 241 | def hook(self, vaddr, asm): 242 | ''' 243 | Patches the given binary to call `asm` at `vaddr`. 244 | :param vaddr: The virtual address of the instruction to hook/patch 245 | :param asm: The assembly (either string of assembly, bytearray of assembled opcodes, or list of Instructions) to 246 | be written 247 | :return: The virtual address of the created hook 248 | ''' 249 | 250 | # TODO: Move below to its own function and use in replace_instruction and inject 251 | if type(asm) not in [str, list, bytearray]: 252 | raise ValueError('asm is not a valid type. 
Must be str, list, or bytearray') 253 | 254 | if self.next_injection_vaddr is None: 255 | self.prepare_for_injection() 256 | 257 | # We first replace the original instruction with a call to a new code chunk 258 | jmp = self.assemble('call '+hex(self.next_injection_vaddr), vaddr) #TODO: Architecture independent 259 | overwritten_instructions = self.replace_at(vaddr, jmp) 260 | 261 | if type(asm) == str: 262 | # Assemble with keystone 263 | pulled_list = [x.mnemonic + ' ' + x.op_str() for x in overwritten_instructions] 264 | asm = ' ; '.join(pulled_list) + ' ; ' + asm 265 | asm = self.assemble(asm, self.next_injection_vaddr) 266 | new_chunk = asm 267 | else: 268 | # TODO: reassemble to fix offsets in overwritten instructions 269 | # A list of Instruction objects is joined into raw bytes; a bytearray is already raw and is used as-is 270 | asm = asm if type(asm) == bytearray else sum((ins.raw for ins in asm), bytearray()) 271 | new_chunk = sum((ins.raw for ins in overwritten_instructions), bytearray()) + asm 272 | 273 | # Then we inject that new code chunk. This is composed of the instructions we wrote over to create the call 274 | # as well as the assembly we actually want to call 275 | hook_addr = self.inject(new_chunk) 276 | logging.debug('Replaced instruction at {} with call to {}'.format(vaddr, hook_addr)) 277 | 278 | return hook_addr 279 | 280 | 281 | def iter_functions(self): 282 | ''' 283 | Iterates over the functions in this executable 284 | :return: Iterator 285 | ''' 286 | return iter(self.functions.values()) 287 | 288 | def function_named(self, name): 289 | ''' 290 | Finds a function with a given name if it exists. 291 | :param name: The name of the function to search for. 292 | :return: The function if it is found, else None. 293 | ''' 294 | for func in self.iter_functions(): 295 | if func.name == name or func.name == 'sub_'+name or func.name == name+'@PLT': 296 | return func 297 | 298 | return None 299 | 300 | def replace_at(self, vaddr, new_asm): 301 | ''' 302 | Replaces an instruction with the given assembly. 
303 | :param vaddr: The address of the existing instruction(s) to overwrite. 304 | :param new_asm: The new assembly that will be written over the old instruction. 305 | :return: The original instruction(s) that was/were overwritten 306 | ''' 307 | 308 | if not vaddr in self.analyzer.ins_map: 309 | raise Exception('Starting virtual address to replace must be an existing instruction') 310 | 311 | # Find all instructions we will be overwriting, and warn if they are referenced elsewhere. 312 | 313 | # If an instruction is being referenced elsewhere (most likely in a jump), it's possible that the jump 314 | # (or whatever it is) will end up going to the middle of our replaced asm which can obviously make the program 315 | # behave unexpectedly. 316 | overwritten_insns = self.analyzer.ins_map[vaddr:vaddr + max(len(new_asm), 1)] 317 | for ins in overwritten_insns: 318 | if ins.address in self.xrefs: 319 | logging.warning('{} will be overwritten but there are xrefs to it: {}'.format(ins, 320 | self.xrefs[ins.address])) 321 | 322 | # Write the new bytes 323 | offset = self.vaddr_binary_offset(vaddr) 324 | self.binary.seek(offset) 325 | logging.debug('Replacing instruction(s) at offset {}'.format(offset)) 326 | self.binary.write(new_asm) 327 | 328 | # Find how much is left over in the original instruction(s) and NOP them out 329 | overwritten_size = sum(i.size for i in overwritten_insns) 330 | padding = self.analyzer.NOP_INSTRUCTION * ((overwritten_size - len(new_asm)) / len(self.analyzer.NOP_INSTRUCTION)) 331 | self.binary.write(padding) 332 | 333 | # Disassemble the new instructions 334 | new_instructions = self.analyzer.disassemble_range(vaddr, vaddr + len(new_asm)) 335 | 336 | func = self.function_containing_vaddr(vaddr) 337 | 338 | insert_point = func.instructions.index(overwritten_insns[0]) 339 | 340 | # Remove the old instructions from the function 341 | for ins in overwritten_insns: 342 | func.instructions.remove(ins) 343 | 344 | # Insert the new instructions where 
we just removed the old ones 345 | func.instructions = func.instructions[:insert_point] + new_instructions + func.instructions[insert_point:] 346 | 347 | # Re-analyze the function for BBs 348 | func.do_bb_analysis() 349 | 350 | # Finally clear the instructions out from the global instruction map 351 | for ins in overwritten_insns: 352 | del self.analyzer.ins_map[ins.address] 353 | 354 | for ins in new_instructions: 355 | self.analyzer.ins_map[ins.address] = ins 356 | 357 | return overwritten_insns 358 | 359 | def save(self, file_name): 360 | with open(file_name, 'wb') as f: 361 | f.write(self.get_binary()) 362 | -------------------------------------------------------------------------------- /dispatch/formats/elf_executable.py: -------------------------------------------------------------------------------- 1 | from elftools.elf.elffile import ELFFile 2 | from elftools.elf.enums import * 3 | from elftools.elf.constants import * 4 | from elftools.elf.sections import SymbolTableSection 5 | import logging 6 | 7 | from .base_executable import * 8 | from .section import * 9 | 10 | INJECTION_SIZE = 0x1000 11 | 12 | class ELFExecutable(BaseExecutable): 13 | def __init__(self, file_path): 14 | super(ELFExecutable, self).__init__(file_path) 15 | 16 | self.helper = ELFFile(self.binary) 17 | 18 | self.architecture = self._identify_arch() 19 | 20 | if self.architecture is None: 21 | raise Exception('Architecture is not recognized') 22 | 23 | logging.debug('Initialized {} {} with file \'{}\''.format(self.architecture, type(self).__name__, file_path)) 24 | 25 | self.pack_endianness = '<' if self.helper.little_endian else '>' 26 | self.address_pack_type = 'I' if self.helper.elfclass == 32 else 'Q' 27 | 28 | self.sections = [section_from_elf_section(s) for s in self.helper.iter_sections()] 29 | 30 | self.executable_segment = [s for s in self.helper.iter_segments() if s['p_type'] == 'PT_LOAD' and s['p_flags'] & 0x1][0] 31 | 32 | dyn = self.helper.get_section_by_name('.dynamic') 33 
| if dyn: 34 | self.libraries = [t.needed for t in dyn.iter_tags() if t['d_tag'] == 'DT_NEEDED'] 35 | 36 | self.next_injection_offset = None 37 | 38 | def _identify_arch(self): 39 | machine = self.helper.get_machine_arch() 40 | if machine == 'x86': 41 | return ARCHITECTURE.X86 42 | elif machine == 'x64': 43 | return ARCHITECTURE.X86_64 44 | elif machine == 'ARM': 45 | return ARCHITECTURE.ARM 46 | elif machine == 'AArch64': 47 | return ARCHITECTURE.ARM_64 48 | else: 49 | return None 50 | 51 | def entry_point(self): 52 | return self.helper['e_entry'] 53 | 54 | def executable_segment_vaddr(self): 55 | return self.executable_segment['p_vaddr'] 56 | 57 | def executable_segment_size(self): 58 | # TODO: Maybe limit this because we use this as part of our injection method? 59 | return self.executable_segment['p_memsz'] 60 | 61 | def iter_string_sections(self): 62 | STRING_SECTIONS = ['.rodata', '.data', '.bss'] 63 | for s in self.sections: 64 | if s.name in STRING_SECTIONS: 65 | yield s 66 | 67 | def _extract_symbol_table(self): 68 | # Add in symbols from the PLT/rela.plt 69 | # .rela.plt contains indexes to reference both .dynsym (symbol names) and .plt (jumps to GOT) 70 | 71 | if self.is_64_bit(): 72 | reloc_section = self.helper.get_section_by_name('.rela.plt') 73 | else: 74 | reloc_section = self.helper.get_section_by_name('.rel.plt') 75 | 76 | if reloc_section: 77 | dynsym = self.helper.get_section(reloc_section['sh_link']) # .dynsym 78 | if isinstance(dynsym, SymbolTableSection): 79 | plt = self.helper.get_section_by_name('.plt') 80 | for idx, reloc in enumerate(reloc_section.iter_relocations()): 81 | # Get the symbol's name from dynsym 82 | symbol_name = dynsym.get_symbol(reloc['r_info_sym']).name 83 | 84 | # The address of this function in the PLT is the base PLT offset + the index of the relocation. 85 | # However, since there is the extra "trampoline" entity at the top of the PLT, we need to add one to the 86 | # index to account for it. 
87 | 88 | # While sh_entsize is sometimes defined, it appears to be incorrect in some cases so we just ignore that 89 | # and calculate it based off of the total size / num_relocations (plus the trampoline entity) 90 | entsize = (plt['sh_size'] / (reloc_section.num_relocations() + 1)) 91 | 92 | plt_addr = plt['sh_addr'] + ((idx+1) * entsize) 93 | 94 | logging.debug('Directly adding PLT function {} at vaddr {}'.format(symbol_name, hex(plt_addr))) 95 | 96 | f = Function(plt_addr, 97 | entsize, 98 | symbol_name + '@PLT', 99 | self, 100 | type=Function.DYNAMIC_FUNC) 101 | self.functions[plt_addr] = f 102 | else: 103 | logging.debug('.rel(a).plt section had sh_link to {}. Not parsing symbols...'.format(dynsym)) 104 | 105 | if self.helper.get_section_by_name('.dynsym'): 106 | for symbol in self.helper.get_section_by_name('.dynsym').iter_symbols(): 107 | if symbol.entry['st_info']['type'] == 'STT_FUNC' and symbol.entry['st_size'] > 0: 108 | vaddr = symbol.entry['st_value'] 109 | if vaddr not in self.functions: 110 | logging.debug('Adding function from .dynsym directly at vaddr {}'.format(vaddr)) 111 | f = Function(vaddr, 112 | symbol.entry['st_size'], 113 | symbol.name, 114 | self, 115 | type=Function.DYNAMIC_FUNC) 116 | self.functions[vaddr] = f 117 | 118 | 119 | # Some things in the symtab have st_size = 0 which confuses analysis later on. To solve this, we keep track of 120 | # where each address is in the `function_vaddrs` set and go back after all symbols have been iterated to compute 121 | # size by taking the difference between the current address and the next recorded address. 122 | 123 | # We do this for each executable section so that the produced functions cannot span multiple sections. 
124 | 125 | for section in self.helper.iter_sections(): 126 | if self.executable_segment.section_in_segment(section): 127 | name_for_addr = {} 128 | 129 | function_vaddrs = set([section['sh_addr'] + section['sh_size']]) 130 | 131 | symbol_table = self.helper.get_section_by_name('.symtab') 132 | if symbol_table: 133 | for symbol in symbol_table.iter_symbols(): 134 | if symbol['st_info']['type'] == 'STT_FUNC' and symbol['st_shndx'] != 'SHN_UNDEF': 135 | if section['sh_addr'] <= symbol['st_value'] < section['sh_addr'] + section['sh_size']: 136 | name_for_addr[symbol['st_value']] = symbol.name 137 | function_vaddrs.add(symbol['st_value']) 138 | 139 | if symbol['st_size']: 140 | logging.debug('Eagerly adding function {} from .symtab at vaddr {} with size {}' 141 | .format(symbol.name, hex(symbol['st_value']), hex(symbol['st_size']))) 142 | f = Function(symbol['st_value'], 143 | symbol['st_size'], 144 | symbol.name, 145 | self) 146 | self.functions[symbol['st_value']] = f 147 | 148 | 149 | function_vaddrs = sorted(list(function_vaddrs)) 150 | 151 | for cur_addr, next_addr in zip(function_vaddrs[:-1], function_vaddrs[1:]): 152 | # If st_size was set, we already added the function above, so don't add it again. 
153 | if cur_addr not in self.functions: 154 | func_name = name_for_addr[cur_addr] 155 | size = next_addr - cur_addr 156 | logging.debug('Lazily adding function {} from .symtab at vaddr {} with size {}' 157 | .format(func_name, hex(cur_addr), hex(size))) 158 | f = Function(cur_addr, 159 | next_addr - cur_addr, 160 | name_for_addr[cur_addr], 161 | self, 162 | type=Function.DYNAMIC_FUNC) 163 | self.functions[cur_addr] = f 164 | 165 | # TODO: Automatically find and label main from call to libc_start_main 166 | 167 | def prepare_for_injection(self): 168 | """ 169 | Derived from http://vxheavens.com/lib/vsc01.html 170 | """ 171 | modified = StringIO(self.binary.getvalue()) 172 | 173 | # Add INJECTION_SIZE to the section header list offset to make room for our injected code 174 | elf_hdr = self.helper.header.copy() 175 | elf_hdr.e_shoff += INJECTION_SIZE 176 | logging.debug('Changing e_shoff to {}'.format(elf_hdr.e_shoff)) 177 | 178 | modified.seek(0) 179 | modified.write(self.helper.structs.Elf_Ehdr.build(elf_hdr)) 180 | 181 | # Find the main RX LOAD segment and also adjust other segment offsets along the way 182 | executable_segment = None 183 | 184 | for segment_idx, segment in enumerate(self.helper.iter_segments()): 185 | segment_hdr = segment.header.copy() 186 | segment_hdr_offset = self.helper._segment_offset(segment_idx) 187 | 188 | if executable_segment is not None: 189 | # Already past the executable segment, so just update the offset if needed (i.e. don't update things 190 | # that come before the expanded section) 191 | if segment_hdr.p_offset > last_exec_section['sh_offset']: 192 | segment_hdr.p_offset += INJECTION_SIZE 193 | 194 | elif segment['p_type'] == 'PT_LOAD' and segment['p_flags'] & P_FLAGS.PF_X: 195 | # Found the executable LOAD segment. 196 | # Make room for our injected code. 
197 | 198 | logging.debug('Found executable LOAD segment at index {}'.format(segment_idx)) 199 | executable_segment = segment 200 | 201 | last_exec_section_idx = max([idx for idx in range(self.helper.num_sections()) if 202 | executable_segment.section_in_segment(self.helper.get_section(idx))]) 203 | last_exec_section = self.helper.get_section(last_exec_section_idx) 204 | 205 | segment_hdr.p_flags |= P_FLAGS.PF_X | P_FLAGS.PF_W | P_FLAGS.PF_R 206 | segment_hdr.p_filesz += INJECTION_SIZE 207 | segment_hdr.p_memsz += INJECTION_SIZE 208 | 209 | logging.debug('Rewriting segment filesize and memsize to {} and {}'.format( 210 | segment_hdr.p_filesz, segment_hdr.p_memsz) 211 | ) 212 | 213 | modified.seek(segment_hdr_offset) 214 | modified.write(self.helper.structs.Elf_Phdr.build(segment_hdr)) 215 | 216 | if executable_segment is None: 217 | logging.error("Could not locate an executable LOAD segment. Cannot continue injection.") 218 | return False 219 | 220 | logging.debug('Last section in executable LOAD segment is at index {} ({})'.format(last_exec_section_idx, 221 | last_exec_section.name)) 222 | 223 | self.next_injection_offset = last_exec_section['sh_offset'] + last_exec_section['sh_size'] 224 | self.next_injection_vaddr = last_exec_section['sh_addr'] + last_exec_section['sh_size'] 225 | 226 | # Update sh_size for the section we grew 227 | section_header_offset = self.helper._section_offset(last_exec_section_idx) 228 | section_header = last_exec_section.header.copy() 229 | 230 | section_header.pflags = P_FLAGS.PF_R | P_FLAGS.PF_W | P_FLAGS.PF_X # Hack to make it so we can RWX the page 231 | section_header.sh_size += INJECTION_SIZE 232 | 233 | modified.seek(section_header_offset) 234 | modified.write(self.helper.structs.Elf_Shdr.build(section_header)) 235 | 236 | # Update sh_offset for each section past the last section in the executable segment 237 | for section_idx in range(last_exec_section_idx + 1, self.helper.num_sections()): 238 | section_header_offset = 
self.helper._section_offset(section_idx) 239 | section_header = self.helper.get_section(section_idx).header.copy() 240 | 241 | section_header.sh_offset += INJECTION_SIZE 242 | logging.debug('Rewriting section {}\'s offset to {}'.format(section_idx, section_header.sh_offset)) 243 | 244 | modified.seek(section_header_offset) 245 | modified.write(self.helper.structs.Elf_Shdr.build(section_header)) 246 | 247 | # TODO: Architecture-specific padding 248 | # Should be something that won't immediately crash, but can be caught (e.g. SIGTRAP on x86) 249 | modified = StringIO(modified.getvalue()[:self.next_injection_offset] + 250 | '\xCC'*INJECTION_SIZE + 251 | modified.getvalue()[self.next_injection_offset:]) 252 | 253 | self.binary = modified 254 | self.helper = ELFFile(self.binary) 255 | 256 | return True 257 | 258 | def inject(self, asm, update_entry=False): 259 | if self.next_injection_offset is None or self.next_injection_vaddr is None: 260 | logging.warning( 261 | 'prepare_for_injection() was not called before inject(). Calling now, but this may cause unexpected behavior') 262 | self.prepare_for_injection() 263 | 264 | for segment in self.helper.iter_segments(): 265 | if segment['p_type'] == 'PT_LOAD' and segment['p_flags'] & P_FLAGS.PF_X: 266 | injection_section_idx = max(i for i in range(self.helper.num_sections()) if segment.section_in_segment(self.helper.get_section(i))) 267 | break 268 | 269 | injection_section = self.helper.get_section(injection_section_idx) 270 | 271 | # If we haven't injected code before or need to expand the section again for this injection, go ahead and 272 | # shift stuff around. 273 | if injection_section['sh_offset'] + injection_section['sh_size'] < self.next_injection_offset + len(asm): 274 | logging.debug('Automatically expanding injection section to accommodate for assembly') 275 | 276 | # NOTE: Could this change the destination address for the code that gets injected? 
277 | self.prepare_for_injection() 278 | injection_section = self.helper.get_section(injection_section_idx) 279 | 280 | used_code_len = len(injection_section.data().rstrip('\xCC')) 281 | self.next_injection_offset = injection_section['sh_offset'] + used_code_len 282 | self.next_injection_vaddr = injection_section['sh_addr'] + used_code_len 283 | 284 | # "Inject" the assembly 285 | logging.debug('Injecting {} bytes of assembly at offset {}'.format(len(asm), self.next_injection_offset)) 286 | self.binary.seek(self.next_injection_offset) 287 | self.binary.write(asm) 288 | 289 | # Update e_entry if requested 290 | if update_entry: 291 | logging.debug('Rewriting ELF entry address to {}'.format(self.next_injection_vaddr)) 292 | elf_hdr = self.helper.header 293 | elf_hdr.e_entry = self.next_injection_vaddr 294 | 295 | self.binary.seek(0) 296 | self.binary.write(self.helper.structs.Elf_Ehdr.build(elf_hdr)) 297 | 298 | self.helper = ELFFile(self.binary) 299 | 300 | self.next_injection_vaddr += len(asm) 301 | self.next_injection_offset += len(asm) 302 | 303 | return self.next_injection_vaddr - len(asm) 304 | -------------------------------------------------------------------------------- /dispatch/formats/macho_executable.py: -------------------------------------------------------------------------------- 1 | import logging 2 | import struct 3 | 4 | from macholib.MachO import MachO 5 | from macholib.mach_o import * 6 | 7 | from .base_executable import * 8 | from .section import * 9 | 10 | INJECTION_SEGMENT_NAME = 'INJECT' 11 | INJECTION_SECTION_NAME = 'inject' 12 | 13 | class MachOExecutable(BaseExecutable): 14 | def __init__(self, file_path): 15 | super(MachOExecutable, self).__init__(file_path) 16 | 17 | self.helper = MachO(self.fp) 18 | 19 | if self.helper.fat: 20 | raise Exception('MachO fat binaries are not supported at this time') 21 | 22 | self.architecture = self._identify_arch() 23 | 24 | if self.architecture is None: 25 | raise Exception('Architecture is not 
recognized') 26 | 27 | logging.debug('Initialized {} {} with file \'{}\''.format(self.architecture, type(self).__name__, file_path)) 28 | 29 | self.pack_endianness = self.helper.headers[0].endian 30 | 31 | self.sections = [] 32 | for lc, cmd, data in self.helper.headers[0].commands: 33 | if lc.cmd in (LC_SEGMENT, LC_SEGMENT_64): 34 | for section in data: 35 | self.sections.append(section_from_macho_section(section, cmd)) 36 | 37 | self.executable_segment = [cmd for lc, cmd, _ in self.helper.headers[0].commands 38 | if lc.cmd in (LC_SEGMENT, LC_SEGMENT_64) and cmd.initprot & 0x4][0] 39 | 40 | self.libraries = [fp.rstrip('\x00') for lc, cmd, fp in self.helper.headers[0].commands if lc.cmd == LC_LOAD_DYLIB] 41 | 42 | def _identify_arch(self): 43 | if self.helper.headers[0].header.cputype == 0x7: 44 | return ARCHITECTURE.X86 45 | elif self.helper.headers[0].header.cputype == 0x01000007: 46 | return ARCHITECTURE.X86_64 47 | elif self.helper.headers[0].header.cputype == 0xc: 48 | return ARCHITECTURE.ARM 49 | elif self.helper.headers[0].header.cputype == 0x0100000c: 50 | return ARCHITECTURE.ARM_64 51 | else: 52 | return None 53 | 54 | def executable_segment_vaddr(self): 55 | return self.executable_segment.vmaddr 56 | 57 | def executable_segment_size(self): 58 | return self.executable_segment.vmsize 59 | 60 | def entry_point(self): 61 | for lc, cmd, _ in self.helper.headers[0].commands: 62 | if lc.cmd == LC_MAIN: 63 | return cmd.entryoff 64 | return 65 | 66 | def _extract_symbol_table(self): 67 | ordered_symbols = [] 68 | 69 | symtab_command = self.helper.headers[0].getSymbolTableCommand() 70 | 71 | if symtab_command: 72 | self.binary.seek(symtab_command.stroff) 73 | symbol_strings = self.binary.read(symtab_command.strsize) 74 | 75 | self.binary.seek(symtab_command.symoff) 76 | 77 | for i in range(symtab_command.nsyms): 78 | if self.is_64_bit(): 79 | symbol = nlist_64.from_fileobj(self.binary, _endian_=self.pack_endianness) 80 | else: 81 | symbol = 
nlist.from_fileobj(self.binary, _endian_=self.pack_endianness) 82 | 83 | symbol_name = symbol_strings[symbol.n_un:].split('\x00')[0] 84 | 85 | if symbol.n_type & N_STAB == 0: 86 | is_ext = symbol.n_type & N_EXT and symbol.n_value == 0 87 | 88 | # Ignore Apple's hack for radar bug 5614542 89 | if not is_ext and symbol_name != 'radr://5614542': 90 | size = 0 91 | logging.debug('Adding function {} from the symtab at vaddr {} with size {}' 92 | .format(symbol_name, hex(symbol.n_value), hex(size))) 93 | f = Function(symbol.n_value, size, symbol_name, self) 94 | self.functions[symbol.n_value] = f 95 | 96 | ordered_symbols.append(symbol_name) 97 | 98 | dysymtab_command = self.helper.headers[0].getDynamicSymbolTableCommand() 99 | if dysymtab_command: 100 | self.binary.seek(dysymtab_command.indirectsymoff) 101 | indirect_symbols = self.binary.read(dysymtab_command.nindirectsyms*4) 102 | 103 | sym_offsets = struct.unpack(self.pack_endianness + 'I'*dysymtab_command.nindirectsyms, indirect_symbols) 104 | 105 | for lc, cmd, sections in self.helper.headers[0].commands: 106 | if lc.cmd in (LC_SEGMENT, LC_SEGMENT_64) and cmd.initprot & 0x4: 107 | for section in sections: 108 | if section.flags & S_NON_LAZY_SYMBOL_POINTERS == S_NON_LAZY_SYMBOL_POINTERS \ 109 | or section.flags & S_LAZY_SYMBOL_POINTERS == S_LAZY_SYMBOL_POINTERS \ 110 | or section.flags & S_SYMBOL_STUBS == S_SYMBOL_STUBS: 111 | 112 | logging.debug('Parsing dynamic entries in {}.{}'.format(section.segname, section.sectname)) 113 | 114 | if section.flags & S_SYMBOL_STUBS: 115 | stride = section.reserved2 116 | else: 117 | stride = (64 if self.is_64_bit() else 32) 118 | 119 | count = section.size / stride 120 | 121 | for i in range(count): 122 | addr = self.executable_segment.vmaddr + section.offset + (i * stride) 123 | idx = sym_offsets[i + section.reserved1] 124 | if idx == 0x40000000: 125 | symbol_name = "INDIRECT_SYMBOL_ABS" 126 | elif idx == 0x80000000: 127 | symbol_name = "INDIRECT_SYMBOL_LOCAL" 128 | else: 129 | 
symbol_name = ordered_symbols[idx] 130 | logging.debug('Adding function {} from the dynamic symtab at vaddr {} with size {}' 131 | .format(symbol_name, hex(addr), hex(stride))) 132 | f = Function(addr, stride, symbol_name, self, type=Function.DYNAMIC_FUNC) 133 | self.functions[addr] = f 134 | 135 | def iter_string_sections(self): 136 | STRING_SECTIONS = ['__const', '__cstring', '__objc_methname', '__objc_classname'] 137 | for s in self.sections: 138 | if s.name in STRING_SECTIONS: 139 | yield s 140 | 141 | def prepare_for_injection(self): 142 | # Total size of the stuff we're going to be adding in the middle of the binary 143 | offset = 72+80 if self.is_64_bit() else 56+68 # 1 segment header + 1 section header 144 | 145 | fileoff = (self.binary.len & ~0xfff) + 0x1000 146 | 147 | vmaddr = self.function_named('__mh_execute_header').address + fileoff 148 | 149 | logging.debug('Creating new MachOSegment at vaddr {}'.format(hex(vmaddr))) 150 | new_segment = segment_command_64() if self.is_64_bit() else segment_command() 151 | new_segment._endian_ = self.pack_endianness 152 | new_segment.segname = INJECTION_SEGMENT_NAME 153 | new_segment.fileoff = fileoff 154 | new_segment.filesize = 0 155 | new_segment.vmaddr = vmaddr 156 | new_segment.vmsize = 0x1000 157 | new_segment.maxprot = 0x7 #RWX 158 | new_segment.initprot = 0x5 # RX 159 | new_segment.flags = 0 160 | new_segment.nsects = 1 161 | 162 | logging.debug('Creating new MachOSection at vaddr {}'.format(hex(vmaddr))) 163 | new_section = section_64() if self.is_64_bit() else section() 164 | new_section._endian_ = self.pack_endianness 165 | new_section.sectname = INJECTION_SECTION_NAME 166 | new_section.segname = new_segment.segname 167 | new_section.addr = new_segment.vmaddr 168 | new_section.size = 0 169 | new_section.offset = new_segment.fileoff 170 | new_section.align = 4 171 | new_section.flags = 0x80000400 172 | 173 | lc = load_command() 174 | lc._endian_ = self.pack_endianness 175 | lc.cmd = LC_SEGMENT_64 if 
self.is_64_bit() else LC_SEGMENT 176 | lc.cmdsize = offset 177 | 178 | self.helper.headers[0].commands.append((lc, new_segment, [new_section])) 179 | 180 | self.helper.headers[0].header.ncmds += 1 181 | self.helper.headers[0].header.sizeofcmds += offset 182 | 183 | return new_segment 184 | 185 | def inject(self, asm, update_entry=False): 186 | found = [s for lc,s,_ in self.helper.headers[0].commands if lc.cmd in (LC_SEGMENT, LC_SEGMENT_64) and s.segname == INJECTION_SEGMENT_NAME] 187 | if found: 188 | injection_vaddr = found[0].vmaddr 189 | else: 190 | logging.warning( 191 | 'prepare_for_injection() was not called before inject(). This may cause unexpected behavior') 192 | inject_seg = self.prepare_for_injection() 193 | injection_vaddr = inject_seg.vmaddr 194 | 195 | if update_entry: 196 | for lc, cmd, _ in self.helper.headers[0].commands: 197 | if lc.cmd == LC_MAIN: 198 | cmd.entryoff = injection_vaddr 199 | break 200 | 201 | self.binary.seek(0) 202 | 203 | for lc, segment, sections in self.helper.headers[0].commands: 204 | if lc.cmd in (LC_SEGMENT, LC_SEGMENT_64) and segment.segname == INJECTION_SEGMENT_NAME: 205 | injection_offset = segment.fileoff + segment.filesize 206 | segment.filesize += len(asm) 207 | if segment.filesize + len(asm) > segment.vmsize: 208 | segment.vmsize += 0x1000 209 | for section in sections: 210 | if section.sectname == INJECTION_SECTION_NAME: 211 | section.size += len(asm) 212 | self.next_injection_vaddr = section.addr + section.size 213 | 214 | self.helper.headers[0].write(self.binary) 215 | 216 | self.binary.seek(injection_offset) 217 | self.binary.write(asm) 218 | 219 | return injection_vaddr 220 | -------------------------------------------------------------------------------- /dispatch/formats/pe_executable.py: -------------------------------------------------------------------------------- 1 | import pefile 2 | from .SectionDoubleP import SectionDoubleP 3 | 4 | from .base_executable import * 5 | from .section import * 6 | 7 | 
SECTION_SIZE = 0x1000 8 | 9 | class PEExecutable(BaseExecutable): 10 | def __init__(self, file_path): 11 | super(PEExecutable, self).__init__(file_path) 12 | 13 | self.helper = pefile.PE(self.fp) 14 | 15 | self.architecture = self._identify_arch() 16 | 17 | if self.architecture is None: 18 | raise Exception('Architecture is not recognized') 19 | 20 | logging.debug('Initialized {} {} with file \'{}\''.format(self.architecture, type(self).__name__, file_path)) 21 | 22 | self.pack_endianness = '<' 23 | 24 | self.sections = [section_from_pe_section(s, self.helper) for s in self.helper.sections] 25 | 26 | if hasattr(self.helper, 'DIRECTORY_ENTRY_IMPORT'): 27 | self.libraries = [dll.dll for dll in self.helper.DIRECTORY_ENTRY_IMPORT] 28 | else: 29 | self.libraries = [] 30 | 31 | def _identify_arch(self): 32 | machine = pefile.MACHINE_TYPE[self.helper.FILE_HEADER.Machine] 33 | if machine == 'IMAGE_FILE_MACHINE_I386': 34 | return ARCHITECTURE.X86 35 | elif machine == 'IMAGE_FILE_MACHINE_AMD64': 36 | return ARCHITECTURE.X86_64 37 | elif machine == 'IMAGE_FILE_MACHINE_ARM': 38 | return ARCHITECTURE.ARM 39 | else: 40 | return None 41 | 42 | def entry_point(self): 43 | return self.helper.OPTIONAL_HEADER.AddressOfEntryPoint 44 | 45 | def get_binary(self): 46 | return self.helper.write() 47 | 48 | def iter_string_sections(self): 49 | STRING_SECTIONS = ['.rdata'] 50 | for s in self.sections: 51 | if s.name in STRING_SECTIONS: 52 | yield s 53 | 54 | def _extract_symbol_table(self): 55 | # Load in stuff from the IAT if it exists 56 | if hasattr(self.helper, 'DIRECTORY_ENTRY_IMPORT'): 57 | for dll in self.helper.DIRECTORY_ENTRY_IMPORT: 58 | for imp in dll.imports: 59 | if imp.name: 60 | name = imp.name + '@' + dll.dll 61 | else: 62 | name = 'ordinal_' + str(imp.ordinal) + '@' + dll.dll 63 | 64 | self.functions[imp.address] = Function(imp.address, 65 | self.address_length(), 66 | name, 67 | self) 68 | 69 | # Load in information from the EAT if it exists 70 | if hasattr(self.helper, 
'DIRECTORY_ENTRY_EXPORT'): 71 | for symbol in self.helper.DIRECTORY_ENTRY_EXPORT.symbols: 72 | if symbol.address not in self.functions: 73 | self.functions[symbol.address] = Function(symbol.address, 74 | 0, 75 | symbol.name, 76 | self) 77 | else: 78 | self.functions[symbol.address].name = symbol.name 79 | 80 | def prepare_for_injection(self): 81 | sdp = SectionDoubleP(self.helper) 82 | to_inject = '\x00' * SECTION_SIZE 83 | self.helper = sdp.push_back(Name='.inject', Characteristics=0x60000020, Data=to_inject) 84 | self.next_injection_vaddr = self.helper.sections[-1].VirtualAddress + self.helper.OPTIONAL_HEADER.ImageBase 85 | 86 | def inject(self, asm, update_entry=False): 87 | has_injection_section = [s for s in self.helper.sections if s.Name.startswith('.inject')] 88 | 89 | if not has_injection_section: 90 | logging.warning( 91 | 'prepare_for_injection() was not called before inject(). This may cause unexpected behavior') 92 | self.prepare_for_injection() 93 | 94 | inject_rva = self.next_injection_vaddr - self.helper.OPTIONAL_HEADER.ImageBase 95 | self.helper.set_bytes_at_rva(inject_rva, asm) 96 | 97 | if update_entry: 98 | self.helper.OPTIONAL_HEADER.AddressOfEntryPoint = inject_rva 99 | 100 | self.next_injection_vaddr += len(asm) 101 | 102 | return inject_rva + self.helper.OPTIONAL_HEADER.ImageBase 103 | 104 | def replace_at(self, vaddr, new_asm): 105 | # Identical to the implementation in base_executable except for the commented section 106 | 107 | if not vaddr in self.analyzer.ins_map: 108 | raise Exception('Starting virtual address to replace must be an existing instruction') 109 | 110 | overwritten_insns = self.analyzer.ins_map[vaddr:vaddr + max(len(new_asm), 1)] 111 | for ins in overwritten_insns: 112 | if ins.address in self.xrefs: 113 | logging.warning('{} will be overwritten but there are xrefs to it: {}'.format(ins, 114 | self.xrefs[ins.address])) 115 | 116 | logging.debug('Replacing instruction(s) at vaddr {}'.format(vaddr)) 117 | 118 | # Since we're 
using pefile to keep track of the (changed) binary, use pefile's methods to write the new asm 119 | self.helper.set_bytes_at_rva(vaddr - self.helper.OPTIONAL_HEADER.ImageBase, new_asm) 120 | 121 | overwritten_size = sum(i.size for i in overwritten_insns) 122 | padding = self.analyzer.NOP_INSTRUCTION * ((overwritten_size - len(new_asm)) / len(self.analyzer.NOP_INSTRUCTION)) 123 | self.helper.set_bytes_at_rva(vaddr - self.helper.OPTIONAL_HEADER.ImageBase + len(new_asm), padding) 124 | 125 | new_instructions = self.analyzer.disassemble_range(vaddr, vaddr + len(new_asm)) 126 | 127 | func = self.function_containing_vaddr(vaddr) 128 | 129 | insert_point = func.instructions.index(overwritten_insns[0]) 130 | 131 | for ins in overwritten_insns: 132 | func.instructions.remove(ins) 133 | 134 | func.instructions = func.instructions[:insert_point] + new_instructions + func.instructions[insert_point:] 135 | 136 | func.do_bb_analysis() 137 | 138 | for ins in overwritten_insns: 139 | del self.analyzer.ins_map[ins.address] 140 | 141 | for ins in new_instructions: 142 | self.analyzer.ins_map[ins.address] = ins 143 | 144 | return overwritten_insns -------------------------------------------------------------------------------- /dispatch/formats/section.py: -------------------------------------------------------------------------------- 1 | class Section(object): 2 | ''' 3 | Represents a section from an executable. All common executable formats have nearly the exact same idea of a 4 | section, so we just put it into a unified class for easy, consistent access 5 | ''' 6 | def __init__(self): 7 | self.name = '' 8 | self.vaddr = 0 9 | self.offset = 0 10 | self.size = 0 11 | self.raw = None 12 | 13 | self.readable = False 14 | self.writable = False 15 | self.executable = False 16 | 17 | self.orig_section = None 18 | 19 | def __repr__(self): 20 | return '
'.format(self.name, hex(self.vaddr)) 21 | 22 | def contains_vaddr(self, vaddr): 23 | return self.vaddr <= vaddr < self.vaddr + self.size 24 | 25 | def section_from_elf_section(elf_section): 26 | s = Section() 27 | s.name = elf_section.name 28 | s.vaddr = elf_section['sh_addr'] 29 | s.offset = elf_section['sh_offset'] 30 | s.size = elf_section['sh_size'] 31 | s.raw = elf_section.data() 32 | 33 | s.writable = bool(elf_section['sh_flags'] & 0x1) 34 | s.readable = bool(elf_section['sh_flags'] & 0x2) 35 | s.executable = bool(elf_section['sh_flags'] & 0x4) 36 | 37 | s.orig_section = elf_section 38 | 39 | return s 40 | 41 | def section_from_macho_section(macho_section, macho_segment): 42 | s = Section() 43 | s.name = macho_section.sectname.rstrip('\x00') 44 | s.vaddr = macho_section.addr 45 | s.offset = macho_section.offset 46 | s.size = macho_section.size 47 | if hasattr(macho_section, 'section_data'): 48 | s.raw = macho_section.section_data 49 | else: 50 | s.raw = '' 51 | s.readable = bool(macho_segment.initprot & 0x1) 52 | s.writable = bool(macho_segment.initprot & 0x2) 53 | s.executable = bool(macho_segment.initprot & 0x4) 54 | 55 | s.orig_section = macho_section 56 | 57 | return s 58 | 59 | def section_from_pe_section(pe_section, pe): 60 | s = Section() 61 | s.name = pe_section.Name.strip('\x00') 62 | s.vaddr = pe_section.VirtualAddress + pe.OPTIONAL_HEADER.ImageBase 63 | s.offset = pe_section.PointerToRawData 64 | s.size = pe_section.SizeOfRawData 65 | s.raw = pe_section.get_data() 66 | 67 | s.writable = bool(pe_section.Characteristics & 0x80000000) 68 | s.readable = bool(pe_section.Characteristics & 0x40000000) 69 | s.executable = bool(pe_section.Characteristics & 0x20000000) 70 | 71 | s.orig_section = pe_section 72 | 73 | return s 74 | 75 | 76 | 77 | -------------------------------------------------------------------------------- /dispatch/util/__init__.py: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/osirislab/dispatch/c73765c0a2586cc6ef388eb43ba929bcb6c06b24/dispatch/util/__init__.py

--------------------------------------------------------------------------------
/dispatch/util/trie.py:
--------------------------------------------------------------------------------
from ..constructs import Instruction

class Trie(object):
    BUCKET_LEN = 1
    BUCKET_MASK = (2**BUCKET_LEN)-1
    def __init__(self):
        self.children = [None for _ in range(2**Trie.BUCKET_LEN)]
        self.value = None

    def __setitem__(self, key, value):
        assert type(value) == Instruction

        node = self
        for bucket in [(key >> i) & Trie.BUCKET_MASK for \
                       i in range(64, -1, -Trie.BUCKET_LEN)]:
            if not node.children[bucket]:
                node.children[bucket] = Trie()
            node = node.children[bucket]

        node.value = value

    def __getitem__(self, item):
        if type(item) in (int, long):
            node = self
            for bucket in [(item >> i) & Trie.BUCKET_MASK for \
                           i in range(64, -1, -Trie.BUCKET_LEN)]:
                if not node.children[bucket]:
                    raise KeyError()
                node = node.children[bucket]

            return node.value

        elif type(item) == slice:
            start = item.start
            stop = item.stop
            if start is None:
                start = 0
            if stop is None:
                # 64 bits max address. Seems big enough for practical purposes
                stop = 0xFFFFFFFFFFFFFFFF
            uncommon_bits = (stop ^ start).bit_length()

            node = self
            for bucket in [(start >> i) & Trie.BUCKET_MASK for \
                           i in range(64, uncommon_bits, -Trie.BUCKET_LEN)]:
                if not node.children[bucket]:
                    raise KeyError()
                node = node.children[bucket]

            return [v for v in iter(node) if start <= v.address < stop][::item.step]

    def __iter__(self):
        if self.value:
            yield self.value
        for child in filter(None, self.children):
            for v in child:
                yield v

    def __contains__(self, item):
        node = self
        for bucket in [(item >> i) & Trie.BUCKET_MASK for \
                       i in range(64, -1, -Trie.BUCKET_LEN)]:
            if not node.children[bucket]:
                return False
            node = node.children[bucket]
        return True

    def __delitem__(self, key):
        node = self
        for bucket in [(key >> i) & Trie.BUCKET_MASK for \
                       i in range(64, -1, -Trie.BUCKET_LEN)]:
            if not node.children[bucket]:
                raise KeyError()
            node = node.children[bucket]

        node.value = None

--------------------------------------------------------------------------------
/setup.py:
--------------------------------------------------------------------------------
from setuptools import setup

setup(
    name='dispatch',
    version='0.9',
    author='NYU OSIRIS Lab',
    url='https://github.com/isislab/dispatch',
    description='Programmatic disassembly and patching from NYU\'s OSIRIS lab',
    packages=['dispatch', 'dispatch.util', 'dispatch.formats', 'dispatch.analysis'],
    install_requires=[
        'capstone>3.0',
        'keystone-engine',
        'pyelftools',
        'pefile',
        'macholib'
    ]
)

--------------------------------------------------------------------------------
/tests/analyze_one.py:
--------------------------------------------------------------------------------
from dispatch import *
import logging, sys
logging.basicConfig(level=logging.INFO)

exe = read_executable(sys.argv[1])
exe.analyze()
exe.analyzer.cfg()
print "passed"

--------------------------------------------------------------------------------
/tests/binaries/arm32/conditions-static.elf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/osirislab/dispatch/c73765c0a2586cc6ef388eb43ba929bcb6c06b24/tests/binaries/arm32/conditions-static.elf

--------------------------------------------------------------------------------
/tests/binaries/arm32/conditions.elf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/osirislab/dispatch/c73765c0a2586cc6ef388eb43ba929bcb6c06b24/tests/binaries/arm32/conditions.elf

--------------------------------------------------------------------------------
/tests/binaries/arm32/functions-static.elf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/osirislab/dispatch/c73765c0a2586cc6ef388eb43ba929bcb6c06b24/tests/binaries/arm32/functions-static.elf

--------------------------------------------------------------------------------
/tests/binaries/arm32/functions.elf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/osirislab/dispatch/c73765c0a2586cc6ef388eb43ba929bcb6c06b24/tests/binaries/arm32/functions.elf

--------------------------------------------------------------------------------
/tests/binaries/arm32/hello-static.elf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/osirislab/dispatch/c73765c0a2586cc6ef388eb43ba929bcb6c06b24/tests/binaries/arm32/hello-static.elf

--------------------------------------------------------------------------------
/tests/binaries/arm32/hello.elf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/osirislab/dispatch/c73765c0a2586cc6ef388eb43ba929bcb6c06b24/tests/binaries/arm32/hello.elf

--------------------------------------------------------------------------------
/tests/binaries/arm32/switch-static.elf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/osirislab/dispatch/c73765c0a2586cc6ef388eb43ba929bcb6c06b24/tests/binaries/arm32/switch-static.elf

--------------------------------------------------------------------------------
/tests/binaries/arm32/switch.elf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/osirislab/dispatch/c73765c0a2586cc6ef388eb43ba929bcb6c06b24/tests/binaries/arm32/switch.elf

--------------------------------------------------------------------------------
/tests/binaries/arm32/test2-static.elf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/osirislab/dispatch/c73765c0a2586cc6ef388eb43ba929bcb6c06b24/tests/binaries/arm32/test2-static.elf

--------------------------------------------------------------------------------
/tests/binaries/arm32/test2.elf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/osirislab/dispatch/c73765c0a2586cc6ef388eb43ba929bcb6c06b24/tests/binaries/arm32/test2.elf

--------------------------------------------------------------------------------
/tests/binaries/src/conditions.c:
--------------------------------------------------------------------------------
#include <stdio.h>

int main() {
    int x = 10;
    if (x == 100) {
        printf("x is 100\n");
    } else {
        printf("x is not 100\n");
    }
    return 0;
}

--------------------------------------------------------------------------------
/tests/binaries/src/functions.c:
--------------------------------------------------------------------------------
#include <stdio.h>

// To make Visual Studio happy ;)
int mut_rec2(int);

int add_two(int x) {
    return x + 2;
}

int subtract_two(int x) {
    return add_two(x) - 4;
}

int fib(int n) {
    if (n == 0) return 0;
    if (n == 1) return 1;
    return fib(n-1) + fib(n-2);
}

int mut_rec1(int n) {
    if (n == 0) return 0;
    return mut_rec2(n-1);
}

int mut_rec2(int n) {
    if (n == 0) return 1;
    return mut_rec1(n-1);
}

int main() {
    fib(10);
    printf("%d\n", add_two(1000));
    printf("%d\n", subtract_two(10));
    mut_rec1(10);
}

--------------------------------------------------------------------------------
/tests/binaries/src/hello.c:
--------------------------------------------------------------------------------
#include <stdio.h>

int main() {
    int a = 1000000;
    a += 10000;
    printf("Hello World\n");
}

--------------------------------------------------------------------------------
/tests/binaries/src/switch.c:
--------------------------------------------------------------------------------
int main() {
    int x = 10;
    switch (x) {
        case 0:
            x = 1000003;
            break;
        case 1:
            x = 10;
            break;
        case 2:
            x = 320;
            break;
        case 3:
            x = 12021;
            break;
        case 4:
            x = 11983;
            break;
        case 5:
            x = 12028;
            break;
        case 6:
            x = 11985;
            break;
        case 7:
            x = 12002;
            break;
        case 8:
            x = 12019;
            break;
        case 9:
            x = 12048;
            break;
        case 10:
            x = 12082;
            break;
        case 11:
            x = 12100;
            break;
        case 12:
            x = 12106;
            break;
        case 13:
            x = 12173;
            break;
        case 14:
            x = 12235;
            break;
        case 15:
            x = 12248;
            break;
        case 16:
            x = 12333;
            break;
        default:
            break;
    }
    return 0;
}

--------------------------------------------------------------------------------
/tests/binaries/src/test2.c:
--------------------------------------------------------------------------------
#include <stdio.h>

int main() {
    int i = 101;
    if (i < 100) {
        printf("lesser!\n");
    } else {
        printf("Greater!\n");
    }
    return 0;
}

--------------------------------------------------------------------------------
/tests/binaries/x86/conditions-static.elf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/osirislab/dispatch/c73765c0a2586cc6ef388eb43ba929bcb6c06b24/tests/binaries/x86/conditions-static.elf

--------------------------------------------------------------------------------
/tests/binaries/x86/conditions.elf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/osirislab/dispatch/c73765c0a2586cc6ef388eb43ba929bcb6c06b24/tests/binaries/x86/conditions.elf

--------------------------------------------------------------------------------
/tests/binaries/x86/conditions.macho:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/osirislab/dispatch/c73765c0a2586cc6ef388eb43ba929bcb6c06b24/tests/binaries/x86/conditions.macho

--------------------------------------------------------------------------------
/tests/binaries/x86/conditions.pe:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/osirislab/dispatch/c73765c0a2586cc6ef388eb43ba929bcb6c06b24/tests/binaries/x86/conditions.pe

--------------------------------------------------------------------------------
/tests/binaries/x86/functions-static.elf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/osirislab/dispatch/c73765c0a2586cc6ef388eb43ba929bcb6c06b24/tests/binaries/x86/functions-static.elf -------------------------------------------------------------------------------- /tests/binaries/x86/functions.elf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/osirislab/dispatch/c73765c0a2586cc6ef388eb43ba929bcb6c06b24/tests/binaries/x86/functions.elf -------------------------------------------------------------------------------- /tests/binaries/x86/functions.macho: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/osirislab/dispatch/c73765c0a2586cc6ef388eb43ba929bcb6c06b24/tests/binaries/x86/functions.macho -------------------------------------------------------------------------------- /tests/binaries/x86/functions.pe: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/osirislab/dispatch/c73765c0a2586cc6ef388eb43ba929bcb6c06b24/tests/binaries/x86/functions.pe -------------------------------------------------------------------------------- /tests/binaries/x86/hello-static.elf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/osirislab/dispatch/c73765c0a2586cc6ef388eb43ba929bcb6c06b24/tests/binaries/x86/hello-static.elf -------------------------------------------------------------------------------- /tests/binaries/x86/hello.elf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/osirislab/dispatch/c73765c0a2586cc6ef388eb43ba929bcb6c06b24/tests/binaries/x86/hello.elf -------------------------------------------------------------------------------- /tests/binaries/x86/hello.macho: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/osirislab/dispatch/c73765c0a2586cc6ef388eb43ba929bcb6c06b24/tests/binaries/x86/hello.macho -------------------------------------------------------------------------------- /tests/binaries/x86/hello.pe: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/osirislab/dispatch/c73765c0a2586cc6ef388eb43ba929bcb6c06b24/tests/binaries/x86/hello.pe -------------------------------------------------------------------------------- /tests/binaries/x86/switch-static.elf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/osirislab/dispatch/c73765c0a2586cc6ef388eb43ba929bcb6c06b24/tests/binaries/x86/switch-static.elf -------------------------------------------------------------------------------- /tests/binaries/x86/switch.elf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/osirislab/dispatch/c73765c0a2586cc6ef388eb43ba929bcb6c06b24/tests/binaries/x86/switch.elf -------------------------------------------------------------------------------- /tests/binaries/x86/switch.macho: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/osirislab/dispatch/c73765c0a2586cc6ef388eb43ba929bcb6c06b24/tests/binaries/x86/switch.macho -------------------------------------------------------------------------------- /tests/binaries/x86/switch.pe: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/osirislab/dispatch/c73765c0a2586cc6ef388eb43ba929bcb6c06b24/tests/binaries/x86/switch.pe -------------------------------------------------------------------------------- /tests/binaries/x86/test2-static.elf: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/osirislab/dispatch/c73765c0a2586cc6ef388eb43ba929bcb6c06b24/tests/binaries/x86/test2-static.elf -------------------------------------------------------------------------------- /tests/binaries/x86/test2.elf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/osirislab/dispatch/c73765c0a2586cc6ef388eb43ba929bcb6c06b24/tests/binaries/x86/test2.elf -------------------------------------------------------------------------------- /tests/binaries/x86/test2.macho: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/osirislab/dispatch/c73765c0a2586cc6ef388eb43ba929bcb6c06b24/tests/binaries/x86/test2.macho -------------------------------------------------------------------------------- /tests/binaries/x86/test2.pe: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/osirislab/dispatch/c73765c0a2586cc6ef388eb43ba929bcb6c06b24/tests/binaries/x86/test2.pe -------------------------------------------------------------------------------- /tests/binaries/x86_64/conditions-static.elf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/osirislab/dispatch/c73765c0a2586cc6ef388eb43ba929bcb6c06b24/tests/binaries/x86_64/conditions-static.elf -------------------------------------------------------------------------------- /tests/binaries/x86_64/conditions.elf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/osirislab/dispatch/c73765c0a2586cc6ef388eb43ba929bcb6c06b24/tests/binaries/x86_64/conditions.elf -------------------------------------------------------------------------------- /tests/binaries/x86_64/conditions.macho: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/osirislab/dispatch/c73765c0a2586cc6ef388eb43ba929bcb6c06b24/tests/binaries/x86_64/conditions.macho -------------------------------------------------------------------------------- /tests/binaries/x86_64/conditions.pe: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/osirislab/dispatch/c73765c0a2586cc6ef388eb43ba929bcb6c06b24/tests/binaries/x86_64/conditions.pe -------------------------------------------------------------------------------- /tests/binaries/x86_64/functions-static.elf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/osirislab/dispatch/c73765c0a2586cc6ef388eb43ba929bcb6c06b24/tests/binaries/x86_64/functions-static.elf -------------------------------------------------------------------------------- /tests/binaries/x86_64/functions.elf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/osirislab/dispatch/c73765c0a2586cc6ef388eb43ba929bcb6c06b24/tests/binaries/x86_64/functions.elf -------------------------------------------------------------------------------- /tests/binaries/x86_64/functions.macho: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/osirislab/dispatch/c73765c0a2586cc6ef388eb43ba929bcb6c06b24/tests/binaries/x86_64/functions.macho -------------------------------------------------------------------------------- /tests/binaries/x86_64/functions.pe: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/osirislab/dispatch/c73765c0a2586cc6ef388eb43ba929bcb6c06b24/tests/binaries/x86_64/functions.pe -------------------------------------------------------------------------------- /tests/binaries/x86_64/hello-static.elf: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/osirislab/dispatch/c73765c0a2586cc6ef388eb43ba929bcb6c06b24/tests/binaries/x86_64/hello-static.elf -------------------------------------------------------------------------------- /tests/binaries/x86_64/hello.elf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/osirislab/dispatch/c73765c0a2586cc6ef388eb43ba929bcb6c06b24/tests/binaries/x86_64/hello.elf -------------------------------------------------------------------------------- /tests/binaries/x86_64/hello.macho: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/osirislab/dispatch/c73765c0a2586cc6ef388eb43ba929bcb6c06b24/tests/binaries/x86_64/hello.macho -------------------------------------------------------------------------------- /tests/binaries/x86_64/hello.pe: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/osirislab/dispatch/c73765c0a2586cc6ef388eb43ba929bcb6c06b24/tests/binaries/x86_64/hello.pe -------------------------------------------------------------------------------- /tests/binaries/x86_64/switch-static.elf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/osirislab/dispatch/c73765c0a2586cc6ef388eb43ba929bcb6c06b24/tests/binaries/x86_64/switch-static.elf -------------------------------------------------------------------------------- /tests/binaries/x86_64/switch.elf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/osirislab/dispatch/c73765c0a2586cc6ef388eb43ba929bcb6c06b24/tests/binaries/x86_64/switch.elf -------------------------------------------------------------------------------- /tests/binaries/x86_64/switch.macho: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/osirislab/dispatch/c73765c0a2586cc6ef388eb43ba929bcb6c06b24/tests/binaries/x86_64/switch.macho -------------------------------------------------------------------------------- /tests/binaries/x86_64/switch.pe: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/osirislab/dispatch/c73765c0a2586cc6ef388eb43ba929bcb6c06b24/tests/binaries/x86_64/switch.pe -------------------------------------------------------------------------------- /tests/binaries/x86_64/test2-static.elf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/osirislab/dispatch/c73765c0a2586cc6ef388eb43ba929bcb6c06b24/tests/binaries/x86_64/test2-static.elf -------------------------------------------------------------------------------- /tests/binaries/x86_64/test2.elf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/osirislab/dispatch/c73765c0a2586cc6ef388eb43ba929bcb6c06b24/tests/binaries/x86_64/test2.elf -------------------------------------------------------------------------------- /tests/binaries/x86_64/test2.macho: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/osirislab/dispatch/c73765c0a2586cc6ef388eb43ba929bcb6c06b24/tests/binaries/x86_64/test2.macho -------------------------------------------------------------------------------- /tests/binaries/x86_64/test2.pe: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/osirislab/dispatch/c73765c0a2586cc6ef388eb43ba929bcb6c06b24/tests/binaries/x86_64/test2.pe -------------------------------------------------------------------------------- /tests/test_analysis.py: 
--------------------------------------------------------------------------------
import dispatch
import glob
import logging

logging.basicConfig(level=logging.INFO)

# Analyze every test binary of each supported format and build its CFG.
# The glob pattern is relative, so run this script from the tests/ directory.
binary_types = ['macho', 'elf', 'pe']
for bin_type in binary_types:
    print("~~ Testing binary type: {} ~~".format(bin_type))
    for f in glob.glob('binaries/*/*.{}'.format(bin_type)):
        print("Testing {}...".format(f))
        executable = dispatch.read_executable(f)
        executable.analyze()
        executable.analyzer.cfg()
        print("Passed {}!".format(f))
        print('')
    print('')

--------------------------------------------------------------------------------
/tests/test_injection.py:
--------------------------------------------------------------------------------
import dispatch

import logging, sys

if len(sys.argv) != 3:
    # print() with a single argument behaves the same on Python 2 and Python 3
    print("Usage: {} input_binary output_binary".format(sys.argv[0]))
    sys.exit(1)


logging.basicConfig(level=logging.DEBUG)

# Load in the executable with read_executable (pass filename)
executable = dispatch.read_executable(sys.argv[1])

# Invoke the analyzer to find functions
executable.analyze()

# Prepare the executable for code injection
executable.prepare_for_injection()

instrumentation = '\xcc\xc3'  # Sample x86 instrumentation - INT 3 (SIGTRAP), RET
instrumentation_vaddr = executable.inject(instrumentation)
logging.debug('Injected instrumentation asm at {}'.format(hex(instrumentation_vaddr)))

for function in executable.iter_functions():
    replaced_instruction = None
    for instruction in function.instructions:
        # Only consider instructions of at least 5 bytes (the size of an x86 call/jmp with a
        # 32-bit relative offset) that do not depend on control flow, the stack pointer, or
        # the instruction pointer.
        if instruction.size >= 5 \
                and not instruction.redirects_flow() \
                and not instruction.references_sp() \
                and not instruction.references_ip():
            logging.debug('In {} - Found candidate replacement instruction at {}: {} {}'
                          .format(function, hex(instruction.address), instruction.mnemonic,
                                  instruction.op_str()))

            replaced_instruction = instruction
            break

    if not replaced_instruction:
        logging.warning('Could not find instruction to replace in {}'.format(function))
    else:
        # Given a candidate instruction, replace it with a call to a new "function" that contains just that one
        # instruction and a jmp to the instrumentation code.

        hook_addr = executable.hook(replaced_instruction.address, 'jmp {}'.format(instrumentation_vaddr))
        logging.info('Replaced instruction at address {} to call hook at {}'.format(hex(replaced_instruction.address),
                                                                                     hex(hook_addr)))

executable.save(sys.argv[2])

--------------------------------------------------------------------------------
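The injection test above only replaces instructions that are at least 5 bytes long. One likely reason is that an x86 `jmp` or `call` with a 32-bit relative offset occupies exactly 5 bytes: one opcode byte plus a 4-byte little-endian displacement measured from the end of the instruction. The sketch below shows that encoding in isolation. It is not taken from dispatch, and `encode_jmp_rel32` is a hypothetical helper name used only for illustration.

```python
import struct

JMP_REL32_OPCODE = b'\xe9'  # x86 near jmp with a 32-bit relative displacement
JMP_REL32_SIZE = 5          # 1 opcode byte + 4 displacement bytes


def encode_jmp_rel32(source_vaddr, target_vaddr):
    """Encode `jmp target` as it would be placed at source_vaddr.

    Hypothetical helper, not part of dispatch. The displacement is relative
    to the address of the *next* instruction, i.e. source_vaddr + 5.
    """
    displacement = target_vaddr - (source_vaddr + JMP_REL32_SIZE)
    # '<i' is a signed 32-bit little-endian integer; struct.error is raised
    # if the target is too far away to reach with a rel32 jump.
    return JMP_REL32_OPCODE + struct.pack('<i', displacement)


if __name__ == '__main__':
    forward = encode_jmp_rel32(0x1000, 0x2000)
    backward = encode_jmp_rel32(0x2000, 0x1000)
    print(' '.join('{:02x}'.format(b) for b in bytearray(forward)))
    print(' '.join('{:02x}'.format(b) for b in bytearray(backward)))
```

Any instruction shorter than 5 bytes could not be overwritten with such a jump without clobbering the instruction that follows it, which is consistent with the `instruction.size >= 5` check in the test.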